[gridengine users] SoGE 8.1.8 - sge_qmaster fails inconsistently and fail-over occurs quite often - best practices debugging and resolving the issue.

Coleman, Marcus [JRDUS Non-J&J] mcolem19 at its.jnj.com
Fri Jul 1 20:53:52 UTC 2016


Hi, sorry if I am replying incorrectly...

You stated that you received the following message:

06/14/2016 04:31:43|worker|mtlxsge001|E|execd@mtlx168.yok.mtl.com reports running job (89975.1/master) in queue "idle.q@mtlx168.yok.mtl.com" that was not supposed to be there - killing

1.  Can you please reply with the output of qacct -j 89975?

2.  Can you also provide the output of qconf -sconf?

3.  Can you also provide the output of ethtool -S <interface used by SGE>?
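The job number in (1) comes straight from the log line you quoted. If you want to pull such job numbers out of the qmaster messages file yourself, a minimal sketch could look like this. It assumes each line is pipe-delimited as date|thread|host|severity|text, which is what the line above shows. The field names and helper functions are my own, not anything shipped with SoGE.

```python
import re
from typing import NamedTuple, Optional


class QmasterMessage(NamedTuple):
    """One line of a qmaster messages file (field names are assumed labels)."""
    timestamp: str
    thread: str
    host: str
    severity: str
    text: str


def parse_message(line: str) -> QmasterMessage:
    # Split on the first four '|' only, so any '|' inside the message text is kept.
    # Raises ValueError if the line has fewer than five fields.
    timestamp, thread, host, severity, text = line.rstrip("\n").split("|", 4)
    return QmasterMessage(timestamp, thread, host, severity, text)


# Matches the "reports running job (89975.1/master)" wording from the quoted line;
# group 1 is the leading job number, which is what qacct -j takes.
UNEXPECTED_JOB = re.compile(r"reports running job \((\d+)\.\d+/[^)]*\)")


def unexpected_job_id(msg: QmasterMessage) -> Optional[str]:
    """Return the job number from a 'not supposed to be there' message, else None."""
    if "not supposed to be there" not in msg.text:
        return None
    match = UNEXPECTED_JOB.search(msg.text)
    return match.group(1) if match else None


if __name__ == "__main__":
    line = ('06/14/2016 04:31:43|worker|mtlxsge001|E|execd@mtlx168.yok.mtl.com '
            'reports running job (89975.1/master) in queue '
            '"idle.q@mtlx168.yok.mtl.com" that was not supposed to be there - killing')
    job = unexpected_job_id(parse_message(line))
    if job is not None:
        print(f"qacct -j {job}")  # prints: qacct -j 89975
```

Run against the line you posted, it prints the qacct command from (1). Looping it over every line of the messages file would give you a list of all the jobs that were killed this way.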

Thanks! If you have already resolved your issue, please disregard this.


