[gridengine users] qlogin with ssh

Wiegers, Bert Bert.Wiegers at t-systems-sfr.com
Wed Dec 4 16:19:39 UTC 2013


Hi *,

We are using a qlogin wrapper script, as mentioned below.
It looks as though this setup prevents SGE from reaching the terminate_method.
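
For readers unfamiliar with this kind of setup, here is a minimal sketch of what such a qlogin wrapper typically looks like. It is not the exact script used here: only the ssh command line is taken from this thread. The function name qlogin_wrapper and the SSH override variable are made up for illustration, and the sketch assumes that SGE passes the target host and port as the first and second argument.

```shell
#!/bin/sh
# Hypothetical qlogin wrapper sketch; not the exact script used in this thread.
# Assumption: SGE passes the target host as $1 and the port as $2.
# SSH is an illustrative override so the command line can be inspected
# without opening a connection; it defaults to /usr/bin/ssh.
qlogin_wrapper() {
    HOST=$1
    PORT=$2
    "${SSH:-/usr/bin/ssh}" -Y -p "$PORT" "$HOST"
}

# Run the wrapper only when both arguments were supplied.
if [ "$#" -ge 2 ]; then
    qlogin_wrapper "$1" "$2"
fi
```

Setting SSH=echo before calling qlogin_wrapper prints the arguments that would be handed to ssh instead of connecting, which may help when checking what the wrapper actually runs.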

Bert

> -----Original Message-----
> From: users-bounces at gridengine.org [mailto:users-bounces at gridengine.org] On Behalf Of Wiegers, Bert
> Sent: Tuesday, December 03, 2013 9:01 AM
> To: users at gridengine.org
> Subject: Re: [gridengine users] qlogin with ssh
> 
> Hi Reuti,
> 
> The process tree looks like this:
> root     20939  0.0  0.0 1242552 5892 ?        Sl   Nov14  18:57 /export/opt/SGE-8.1.6/bin/lx-amd64/sge_execd
> root     33874 99.7  0.0  34164  2828 ?        R    08:47   0:22  \_ sge_shepherd-18003 -bg
> root     33882  0.0  0.0  98156  3836 pts/1    Ss+  08:47   0:00      \_ sshd: xxxxxx [priv]
> xxxxxx 33884  0.0  0.0  98156  2044 pts/1    S+   08:47   0:00          \_ sshd: xxxxxx@pts/2
> xxxxxx 33885  1.1  0.0  14556  3260 pts/2    SNs  08:47   0:00              \_ -tcsh
> It stays the same for as long as I am logged in to the node.
> 
> The job is still listed in qstat.
> 
> In the messages of the scheduler I find these hints:
> 12/03/2013 08:52:31|schedu|service0|W|job 18003.1 should have finished since 90s
> 
> When I log out afterwards, I see this in the messages:
> 12/03/2013 08:58:42|worker|service0|I|removing trigger to terminate job 18003.1
> 12/03/2013 08:58:42|worker|service0|W|job 18003.1 failed on host XY qmaster enforced h_rt, h_cpu, or h_vmem limit because: <unknown reason>
> 
> Bert
> 
> 
> 
> > -----Original Message-----
> > From: Reuti [mailto:reuti at staff.uni-marburg.de]
> > Sent: Monday, December 02, 2013 6:43 PM
> > To: Wiegers, Bert
> > Cc: users at gridengine.org
> > Subject: Re: [gridengine users] qlogin with ssh
> >
> > Hi,
> >
> > Am 02.12.2013 um 18:28 schrieb Wiegers, Bert:
> >
> > > We are running SGE 8.1.6.
> > > We have configured some interactive queues and use qlogin with the
> > > wrapper script (... /usr/bin/ssh -Y -p $PORT $HOST).
> > > In our setup the user is forced to request h_rt.
> > > Unfortunately, qlogin does not care if the walltime is overdue.
> > > The shepherd seems to be unable to kill the qlogin sessions while the
> > > user is still connected to the node.
> > > Does anyone have a solution or a workaround for this?
> >
> > Is the `sshd` a child of the `shepherd`, i.e. something like:
> >
> > $ ps -e f
> > ...
> >  6656 ?        Sl    56:23 /usr/sge/bin/lx24-x86/sge_execd
> >  9391 ?        S      0:00  \_ sge_shepherd-10502 -bg
> >  9392 ?        Ss     0:00      \_ sshd: reuti [priv]
> >  9398 ?        S      0:00          \_ sshd: reuti@pts/2
> >  9405 pts/2    Ss     0:00              \_ -bash
> >
> > What does the process tree look like after "h_rt" has expired - did the job vanish from the `qstat` output too?
> >
> > -- Reuti
> 
> _______________________________________________
> users mailing list
> users at gridengine.org
> https://gridengine.org/mailman/listinfo/users