[gridengine users] SoGE 8.1.8 - sge_qmaster fails inconsistently and fail-over occurs quite often - best practices debugging and resolving the issue.
Coleman, Marcus [JRDUS Non-J&J]
mcolem19 at its.jnj.com
Fri Jul 1 20:53:52 UTC 2016
Hi, sorry if I am replying incorrectly...
You stated you received the message:
06/14/2016 04:31:43|worker|mtlxsge001|E|execd@mtlx168.yok.mtl.com reports running job (89975.1/master) in queue "idle.q@mtlx168.yok.mtl.com" that was not supposed to be there - killing
1. Can you please reply with the output of qacct -j 89975...
2. Also can you please provide your output for qconf -sconf
3. Also can you please provide your output for ethtool -S <interface used with sge>
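In case it is easier, the three outputs above could be gathered into a single file with something like the sketch below. The interface name eth0, the output file name sge_diag.txt, and the run helper are only placeholders I made up for illustration; please substitute the interface your SGE traffic actually uses.

```shell
#!/bin/sh
# Sketch only: collect the three requested outputs into one file.
JOB_ID=89975              # job number taken from the log message above
IFACE=${IFACE:-eth0}      # assumption: replace with the interface used with SGE
OUT=${OUT:-sge_diag.txt}  # assumption: any writable file name will do

run() {
    # Write the command line as a header, then its stdout and stderr.
    printf '### %s\n' "$*" >> "$OUT"
    "$@" >> "$OUT" 2>&1 || printf '(command exited with status %s)\n' "$?" >> "$OUT"
}

: > "$OUT"                # start with an empty file
run qacct -j "$JOB_ID"
run qconf -sconf
run ethtool -S "$IFACE"
```

Run it on the qmaster host (or wherever qacct and qconf are in your PATH) and attach the resulting file to your reply.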
Thanks! If you have resolved your issue please disregard...