[gridengine users] sge_execd dies silently with 0 exit status
reuti at staff.uni-marburg.de
Thu Sep 12 16:12:47 UTC 2013
On 12.09.2013 at 17:50, Edward Ned Harvey wrote:
> I'm having a heck of a time figuring out why.
> On RHEL 6, the /etc/init.d/sgeexecd.myclustername script is run at startup, or via sudo after startup:
> sudo /etc/init.d/sgeexecd.myclustername start
> It just says "OK" and no other output, yet the daemon isn't running.
> I added the "-x" option to "#!/bin/sh -x" so I can debug it …
> I see it gets up to the "exec 1> /dev/null 2>&1" which effectively eliminates any further debug output…
> So I comment out that line and run again.
> Now I can see it launches sge_execd, and the exit status is 0, so the "touch" on the following line does indeed create the lock file.
> The "qping" loop immediately after that in the script … exits with 0 status, on the first try.
> And still, there is no process running at the end of that script.
> I modified the startup script to run qping 5 times unconditionally. The first run exits with 0, and all subsequent runs exit with 1. This means the daemon is indeed running for a very short time, but it dies in less than a second.
> Any ideas what the problem is?
Please have a look at your /tmp. When execd is unable to start, it writes the reason to a file there.
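
One way to spot such a file is to list whatever was written to /tmp in the last few minutes, right after running the init script. This is only a sketch: the helper name recent_files and the 10-minute default are illustrative choices, not part of Grid Engine, and the thread does not state the file's name, so nothing is filtered by name here. It relies on the -maxdepth and -mmin options of find, which GNU find (as shipped with RHEL 6) supports.

```shell
#!/bin/sh
# recent_files: hypothetical helper, not part of Grid Engine.
# Lists regular files directly inside a directory that were modified
# within the last N minutes, in long format.
# Defaults (assumptions): directory /tmp, window of 10 minutes.
recent_files() {
    dir=${1:-/tmp}
    minutes=${2:-10}
    # -maxdepth 1 : do not descend into subdirectories
    # -mmin -N    : modified less than N minutes ago
    # ls is only invoked if at least one file matches
    find "$dir" -maxdepth 1 -type f -mmin "-$minutes" -exec ls -lt {} +
}
```

Typical use would be to run "sudo /etc/init.d/sgeexecd.myclustername start" and then "recent_files /tmp 10", and read any file that looks like it came from execd.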
> This is a machine on which we recently reinstalled the OS, and we're reinstalling sgeexecd by the same process used for the original installation.
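
The unconditional polling described in the quoted message (run the check several times and look at every exit status, instead of stopping at the first success) can be sketched as a small shell function. The name poll_status and its arguments are made up for illustration; the actual loop in the sgeexecd init script may look different.

```shell
#!/bin/sh
# poll_status: hypothetical helper, not taken from the sgeexecd script.
# Usage: poll_status TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND exactly TRIES times, regardless of success or failure,
# and prints the exit status of each run.
poll_status() {
    tries=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$tries" ]; do
        # Discard the command's own output; only its status matters here.
        "$@" >/dev/null 2>&1
        echo "try $i: exit status $?"
        i=$((i + 1))
        # Sleep between runs, but not after the last one.
        if [ "$i" -le "$tries" ]; then
            sleep "$delay"
        fi
    done
    return 0
}
```

Assuming the execd port is available in SGE_EXECD_PORT and the local host name is the one execd registers under, a call might look like: poll_status 5 1 qping -info "$(hostname)" "$SGE_EXECD_PORT" execd 1. A status of 0 on the first try followed by 1 on later tries would match the behaviour reported above.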