[gridengine users] qhost and qsub working, qlogin and qrsh not working

François-Michel L'Heureux fmlheureux at datacratic.com
Fri Aug 29 14:01:57 UTC 2014


Hi Reuti

Thanks for your reply.

I ran some tests, and it seems that the connection flow is actually the
opposite. On the submit host, I activated debugging and launched qrsh.

I got

     9   4931         main     starting commlib server
    10   4931         main     trying to create commlib handle
    11   4931         main     (*handle)->connect_port = 0
    12   4931         main     (*handle)->service_port = 46780
    13   4931         main     B E F O R E     S E N D I N G! ! ! ! ! ! ! ! ! ! ! ! ! !
    14   4931         main     =====================================================
    15   4931         main     sge_set_auth_info: username(uid) = root(0), groupname = root(0)
    16   4931         main     JSV client context
    17   4931         main     JSV list for current thread updated
    18   4931         main     job id is: 32
    19   4931         main     R E A D I N G    J O B ! ! ! ! ! ! ! ! ! ! !
    20   4931         main     ============================================
    21   4931         main     random polling set to 3
    22   4931         main     waiting for connection

Then I pressed Ctrl+Z, ran "netstat -plnt", and got

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:46780           0.0.0.0:*               LISTEN      4931/qrsh

So it seems that it is in fact the submit host that opens a port and waits
for the exec host to connect.
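
The pattern seen in the debug output and netstat listing (a listening port
that is only known once the process is running) can be illustrated with a
minimal Python sketch. This is not SGE's code, only an analogy: the names
open_ephemeral_listener and demo are hypothetical, the "exec host" is
simulated by a local client socket, and the sketch binds to 127.0.0.1
instead of 0.0.0.0 to keep the demo local.

```python
import socket


def open_ephemeral_listener(host="127.0.0.1"):
    """Bind to port 0 so the kernel picks a free port, then listen on it."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))          # port 0 = "any free port"
    srv.listen(1)
    port = srv.getsockname()[1]  # the port is only known after bind()
    return srv, port


def demo():
    """Simulate the listener side waiting for a peer to connect back."""
    srv, port = open_ephemeral_listener()
    # Stand-in for the remote side connecting to the advertised port.
    client = socket.create_connection(("127.0.0.1", port))
    conn, _addr = srv.accept()
    client.sendall(b"hello")
    data = b""
    while len(data) < 5:         # read until the 5-byte message is complete
        chunk = conn.recv(5 - len(data))
        if not chunk:
            break
        data += chunk
    for s in (conn, client, srv):
        s.close()
    return port, data


if __name__ == "__main__":
    port, data = demo()
    print("listened on port", port, "received", data)
```

Because the port number differs on every run, a port mapping that must be
declared before the container starts cannot anticipate it, which is the
difficulty described in this message.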

Since Docker doesn't currently allow binding ports on demand, I think I have
to declare my attempt a failure. (A very instructive one, though!) It may
work someday if Docker changes the way it handles ports, but in its current
state I don't see what can be done. I'll keep thinking about it...

Thanks a lot.
Mich


2014-08-29 6:27 GMT-04:00 Reuti <reuti at staff.uni-marburg.de>:

> Hi,
>
> On 28.08.2014 at 23:28, François-Michel L'Heureux wrote:
>
> > Ok, I'm pretty sure now my issue is related to port exposure.
> > Is there a known/configurable port range used by SGE? There seems to be
> > none...
>
> SGE will instruct the shepherd on the selected exechost to start a daemon
> (either builtin/sshd/rshd) solely for your job and listen on a randomly
> selected port. On the other side, the submission host will get the
> information to connect to the exechost on exactly this port.
>
> I'm also not aware of any option to restrict this to a range of ports
> (besides doing it somewhere in the source).
>
> -- Reuti
>
>
> > 2014-08-28 16:57 GMT-04:00 François-Michel L'Heureux <fmlheureux at datacratic.com>:
> > Hello!
> >
> > I'll start by admitting that I'm working on a somewhat complex setup:
> > I'm trying to submit jobs to SGE from a docker environment.
> >
> > So far, I've managed to mount the proper directory, run the installation
> > and so on. qhost and qsub work perfectly well. qrsh and qlogin, however,
> > don't. When I check /opt/sge6/default/spool/qmaster/messages I have
> >
> > 08/28/2014 20:41:31|worker|master|W|job 23.1 failed on host master assumedly after job because: can't read usage file for job 23.1
> >
> > for all of my qrsh/qlogin attempts.
> >
> > Googling the error did not help much. Has anyone ever encountered that?
> >
> > Thanks a lot!
> > Mich
> >
>
>