[gridengine users] reschedule_unknown and state "t"
reuti at staff.uni-marburg.de
Thu Nov 8 12:29:29 UTC 2012
On 02.11.2012 at 15:56, William Hay wrote:
> I submitted an array job with -r y. One of the tasks was transferring to a node (state t) when that node went down but despite max_unheard+reschedule_unknown being exceeded neither that task nor another task on the same node was rescheduled. A manual qmod -rq seems to work but just working would be better.
But if the node crashes while all jobs are in state "r", it is working for you? And there was no checkpointing environment in the way?
Was the array task still shown in state "t" the whole time?
> Is this a known problem?
It's hard to provoke.