[gridengine users] qrsh session failed to execute prolog script?

Reuti reuti at staff.uni-marburg.de
Wed Jan 9 08:36:02 UTC 2019


> On 09.01.2019 at 01:14, Derrick Lin <klin938 at gmail.com> wrote:
> Hi guys,
> I just brought up a new SGE cluster, but somehow the qrsh session does not work:
> tester@login-gpu:~$ qrsh
> ^Cerror: error while waiting for builtin IJS connection: "got select timeout"
> After I hit Enter, the session just hangs forever instead of bringing me to a compute node. I have to press Ctrl+C to terminate it, which gives the above error.
> I noticed that SGE did send my qrsh request to a compute node, as I could tell from qstat:
> ---------------------------------------------------------------------------------
> short.q@zeta-4-15.local        BIP   0/1/80         0.01     lx-amd64
>      15 0.55500 QRLOGIN    tester       r    01/09/2019 10:47:13     1
> We have a prolog script configured globally. The script deals with local disk quota and writes all of its output to a per-job log file. So I went to that compute node and checked, and found that a log file had been created but was empty.
> So my thinking so far is that my qrsh is stuck because the prolog script did not run to completion.

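Is there any statement in the prolog that could be waiting for stdin? In a batch job there is simply no stdin, so such a statement would return immediately and the prolog would continue, whereas in an interactive session it could block. This could be tested by submitting a batch job with the "-i" option.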
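
If a command in the prolog does turn out to read from stdin, one possible workaround is to run it with stdin attached to /dev/null, so that it sees end-of-file at once instead of waiting. A minimal sketch follows; the helper name run_without_stdin is made up for illustration and is not part of SGE, and the real prolog may need a different fix:

```shell
#!/bin/sh
# Hypothetical helper for a prolog script (not an SGE feature):
# run a command with stdin redirected from /dev/null, so any read
# from stdin returns end-of-file immediately instead of blocking.
run_without_stdin() {
    "$@" </dev/null
}

# Example: a bare "cat" would wait for input on an open stdin;
# wrapped like this it returns right away with no output.
run_without_stdin cat
```

Each suspect command in the prolog would be wrapped the same way, e.g. run_without_stdin some_command arg1 arg2. Wrapping commands one at a time also helps to narrow down which one is blocking.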

-- Reuti

> qsub jobs are working fine.
> Any ideas would be appreciated.
> Cheers,
> Derrick