[gridengine users] Odd commlib HOST_NOT_RESOLVABLE error

Skylar Thompson skylar2 at uw.edu
Fri Aug 24 14:33:32 UTC 2018

Can you run strace on the command while it's failing? Something like "strace
-e trace=open,connect qstat > /dev/null" should at least give a pointer to
where the failure is occurring. My first thought is that nscd is caching
a negative response for a few minutes, rather than retrying.
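
Since the error says it is the server host that cannot resolve the client,
it may also be worth polling forward and reverse lookups from the qmaster
while the problem comes and goes. Below is a rough sketch of that idea, not
anything SGE ships: check_host and poll are names I made up, and the
interval and round counts are arbitrary. It goes through the system
resolver (so through nscd, if that is running), which may or may not match
what commlib does internally.

```python
import socket
import time


def check_host(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name -> address -> name and report which step fails, if any.

    The forward/reverse parameters exist only so the lookups can be
    swapped out for testing; by default the system resolver is used.
    """
    try:
        addr = forward(name)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return {"host": name, "ok": False, "stage": "forward", "detail": str(exc)}
    try:
        rname, _aliases, _addrs = reverse(addr)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return {"host": name, "ok": False, "stage": "reverse",
                "detail": f"{addr}: {exc}"}
    return {"host": name, "ok": True, "stage": None,
            "detail": f"{addr} -> {rname}"}


def poll(names, interval=30, rounds=20, sleep=time.sleep, check=check_host):
    """Check every name once per round and log failures with a timestamp."""
    failures = []
    for i in range(rounds):
        for name in names:
            result = check(name)
            if not result["ok"]:
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                print(f"{stamp} {result['host']}: {result['stage']} "
                      f"lookup failed ({result['detail']})")
                failures.append(result)
        if i < rounds - 1:
            sleep(interval)  # wait before the next round
    return failures
```

Calling poll(["client01.example.org"]) on the qmaster (the host name here is
a placeholder) would print a timestamped line each time a lookup fails. If
the failures line up with the 5-10 minute windows and are all in the
"reverse" stage, that would point at reverse DNS or a cached negative answer
rather than at SGE itself.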

On Fri, Aug 24, 2018 at 10:25:42AM -0400, Valerio Luccio wrote:
> Hello all,
> we have a rather old installation of SGE that has been running for years
> without any problems. In the last 2-3 weeks I've been experiencing an
> odd problem: when issuing any command (qsub, qstat, qping, etc) I get
> the following error:
>     error: commlib error: access denied (server host resolves destination host "<server address>" as "(HOST_NOT_RESOLVABLE)")
>     error: unable to contact qmaster using port 6444 on host "<server address>"
> There are several odd things about this:
>   * Nothing has changed on the server or the clients in the months
>     before the error started appearing.
>   * This happens from most of the clients, but not all.
>   * The error persists for 5-10 minutes, and then everything works fine.
>   * Both gethostbyname and gethostbyaddr return the correct values from
>     the client while the error occurs (I haven't had a chance to try
>     them from the master during these episodes).
> I get a feeling that this has something to do with DNS and reverse
> lookup, but I don't know where to start debugging it.
> Does anyone have any idea what I should look at?
> Thanks,
> -- 
> Valerio Luccio             (212) 998-8736
> Center for Brain Imaging   4 Washington Place, Room 157
> New York University        New York, NY 10003
>     "In an open world, who needs windows or gates ?"


-- Skylar Thompson (skylar2 at u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine
