[gridengine users] qrsh commlib error with separate submit host

Burian, John John.Burian at nationwidechildrens.org
Tue Sep 10 15:00:04 UTC 2013


I have an OGS 2011.11p1 cluster. The primary submit host is a separate machine from the queue master. When I try to use qrsh from the submit node, I get a commlib error (Levi-Montalcini01 is the queue master, Levi-Montalcini86 is a compute node):

$ qrsh  -verbose
Your job 590725 ("QRLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 590725 has been successfully scheduled.
Establishing builtin session to host Levi-Montalcini86 ...
error: commlib error: local host name error (IP based host name resolving "Levi-Montalcini01" doesn't match client host name from connect message "Levi-Montalcini86")
$
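The error reports a disagreement between the host name obtained by resolving an IP address and the host name a client sent in its connect message. One way to narrow this down is to check, on each machine involved (submit host, queue master, compute node), that every name survives a forward (name to IP) and reverse (IP to name) lookup round trip. The Python script below is a minimal, hypothetical sketch of that check. It is not a Grid Engine tool. The function names and the rule of comparing short names case-insensitively are assumptions, and Grid Engine's own comparison may behave differently depending on how the cluster is configured.

```python
#!/usr/bin/env python3
"""Hypothetical helper: check that each host name survives a forward
(name -> IP) and reverse (IP -> name) lookup round trip."""
import socket
import sys
from typing import Callable, Optional, Tuple


def short_name(host: str) -> str:
    # Assumed comparison rule: ignore case, a trailing dot and any domain suffix.
    return host.strip().rstrip(".").split(".")[0].lower()


def check_host(
    name: str,
    resolve_forward: Callable[[str], str] = socket.gethostbyname,
    resolve_reverse: Callable[[str], Tuple[str, list, list]] = socket.gethostbyaddr,
) -> Optional[Tuple[str, str, bool]]:
    """Return (ip, reverse_name, matches), or None if either lookup fails."""
    try:
        ip = resolve_forward(name)
        reverse_name, _aliases, _addrs = resolve_reverse(ip)
    except OSError:  # socket.gaierror and socket.herror both subclass OSError
        return None
    return ip, reverse_name, short_name(reverse_name) == short_name(name)


if __name__ == "__main__":
    # With no arguments, check this machine's own name (e.g. run on the submit host).
    hosts = sys.argv[1:] or [socket.gethostname()]
    for host in hosts:
        result = check_host(host)
        if result is None:
            print(f"{host}: lookup failed")
        else:
            ip, reverse_name, ok = result
            status = "OK" if ok else "MISMATCH"
            print(f"{host} -> {ip} -> {reverse_name}: {status}")
```

For example, saving it under an arbitrary name such as check_names.py and running `python3 check_names.py Levi-Montalcini01 Levi-Montalcini86` on each machine prints one line per name. A MISMATCH or a failed lookup on one machine but not the others suggests that machine's /etc/hosts or DNS entries are worth comparing against the rest of the cluster.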

When I use qrsh from the queue master, it works fine:

$ qrsh -verbose
Your job 590750 ("QRLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 590750 has been successfully scheduled.
Establishing builtin session to host Levi-Montalcini88 ...
Levi-Montalcini88|~>

During the failed attempt, I see traffic from the compute node back to the queue master, but no traffic to the submit node from either the queue master or the compute node. Is qrsh from a separate submit node expected to work?

Thanks,
John
