[gridengine users] reschedule_unknown and state "t"

William Hay w.hay at ucl.ac.uk
Thu Nov 8 12:59:36 UTC 2012


There was a checkpointing environment.  The same thing also seems to happen
during the first few minutes a job is running, but not afterwards.


On 8 November 2012 12:29, Reuti <reuti at staff.uni-marburg.de> wrote:

> Am 02.11.2012 um 15:56 schrieb William Hay:
>
> > I submitted an array job with -r y.  One of the tasks was transferring
> > to a node (state t) when that node went down but despite
> > max_unheard+reschedule_unknown being exceeded neither that task nor another
> > task on the same node was rescheduled.  A manual qmod -rq seems to work but
> > just working would be better.
>
> But if the node crashes while all jobs are in state "r", it is working for
> you - there was no checkpointing environment in the way?
>
> The array task was still shown in state "t" all the time?
>
>
> > Is this a known problem?
>
> It's hard to provoke.
>
> - Reuti
>
>