[gridengine users] C|!!!!!!!!!! got NULL element for EH_name !!!!!!!!!!

Marshall2, John (SSC/SPC) john.marshall2 at canada.ca
Sat Nov 10 21:48:44 UTC 2018


I've never seen this, but I would start with:
1) run qmaster under strace during the restart to see at which point it is
dying (e.g., while loading a config file)
2) look for any reference to the name of the host you deleted in the spool
area and do some cleanup
3) clean out the jobs spool area
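
For step 2, here is a minimal Python sketch of a read-only scan that lists every spool entry whose name or contents mention the deleted host. The function name `find_host_references` and both of its arguments are hypothetical, and the spool location is an assumption you would have to supply for your own installation. It assumes the spool files can be searched as raw bytes, which may not be reliable for every spooling method. It only reports matches; any cleanup is left to you, ideally after backing up the spool area.

```python
import os


def find_host_references(spool_dir, hostname):
    """Return sorted paths under spool_dir whose name or contents mention hostname.

    Read-only: nothing is modified or deleted.
    """
    needle = hostname.encode()
    hits = []
    for root, dirs, files in os.walk(spool_dir):
        # Directories named after the host (if any) are reported as well.
        for name in dirs:
            if hostname in name:
                hits.append(os.path.join(root, name))
        for name in files:
            path = os.path.join(root, name)
            if hostname in name:
                hits.append(path)
                continue
            if not os.path.isfile(path):
                # Skip sockets, FIFOs, broken symlinks and the like.
                continue
            try:
                with open(path, "rb") as fh:
                    if needle in fh.read():
                        hits.append(path)
            except OSError:
                # Unreadable files are skipped rather than treated as errors.
                continue
    return sorted(hits)
```

Calling something like `find_host_references("/path/to/qmaster/spool", "hostname")` (both values are placeholders) returns the list of paths to inspect by hand.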


On Sat, 2018-11-10 at 16:23 -0500, Daniel Povey wrote:
Has anyone found this error, and managed to fix it?
I am in a very difficult situation.
I deleted a host (qconf -de hostname) thinking that the machine no longer existed, but it did exist, and there was a job in 'dr' state there.
After I attempted to force-delete that job (qdel -f job-id), the queue master died with out-of-memory, and now I can't restart qmaster.

So now I don't know how to fix it. Am I just completely lost now?


