[gridengine users] qrsh commlib error with separate submit host

John Kloss john.kloss at gmail.com
Tue Sep 10 15:55:40 UTC 2013


>> error: commlib error: local host name error (IP based host name resolving "Levi-Montalcini01" doesn't match client host name from connect message "Levi-Montalcini86")
>> $

Is your submit host multi-homed?  I have had issues with a
multi-homed submit host, say hostA, which connects to two networks
via

hostA-int -> "grid network"
hostA-ext -> "gateway network"

where "gateway network" and "grid network" do not route to each other
because they are isolated.

The hostname hostA used to contact a compute node was hostA-ext, but
the compute node couldn't reach hostA-ext; it could only reach
hostA-int.  I had to change the hostname for hostA to hostA-int (in
/etc/hostname, /etc/sysconfig/network, /etc/nodename, etc., depending
on the OS) so that IP/hostname resolution matched on the "grid
network".
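
A rough way to see which local address the submit host would use to
reach a given compute node is sketched below in Python.  This is only
an illustration, not part of Grid Engine: the function name
source_address_for is made up, and the port number is arbitrary
because connecting a UDP socket sends no packets.  It only asks the
kernel which interface it would route through.

```python
import socket


def source_address_for(target_host, port=9):
    """Return the local IP the kernel would use to reach target_host.

    Hypothetical helper for illustration. Connecting a UDP socket does
    not transmit anything; it only selects a route and a source address.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((target_host, port))
        return s.getsockname()[0]
```

Calling source_address_for("node01") on the submit host (node01 being
a placeholder for one of your compute nodes) and then reverse-resolving
the result should show whether traffic leaves via the hostA-int or the
hostA-ext address.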

Or perhaps your submit host's local hostname does not match what your
name lookup mechanism (DNS, NIS, etc.) returns.  That is, your submit
host thinks its name is hostA.localhost and DNS thinks it's
hostA-submit.somenet.com.

What do you get when you run, on the submit host,

hostname

vs.

nslookup <submit_hostname>

?
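
The same comparison can be sketched in Python, which makes it easy to
run on several hosts.  This is a minimal sketch under assumptions, not
what commlib itself does: the names check_name_consistency and
short_name are made up for illustration, and comparing only the short
(domain-stripped, lower-cased) names is an assumption that may not
match how your installation is configured.  It resolves the local
hostname to an IP, resolves that IP back to a name, and reports
whether the two agree.

```python
import socket


def short_name(name):
    """Lower-case a host name and drop any domain part (assumed policy)."""
    return name.strip().lower().split(".")[0]


def check_name_consistency(local_name=None,
                           forward=socket.gethostbyname,
                           reverse=socket.gethostbyaddr):
    """Compare the local host name with the name its IP resolves back to.

    forward maps name -> IP, reverse maps IP -> (name, aliases, addresses).
    Both default to the system resolver and can be replaced for testing.
    """
    if local_name is None:
        local_name = socket.gethostname()
    ip = forward(local_name)
    resolved_name = reverse(ip)[0]
    return {
        "local_name": local_name,
        "ip": ip,
        "resolved_name": resolved_name,
        "match": short_name(local_name) == short_name(resolved_name),
    }
```

Running print(check_name_consistency()) on the submit host and again
on a compute node (passing the submit host's name as local_name)
should show whether both sides agree on the name.  A "match" of False
points at the kind of mismatch the error message describes.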

Thanks.

  John.


