[gridengine users] Jobs died through signal XCPU while not exceeding limit

Reuti reuti at staff.uni-marburg.de
Fri Oct 19 19:51:32 UTC 2012


BTW:

If you set:

$ qconf -sconf
...
loglevel                     log_info
...

you will get an entry in the messages file of the node when SGE discovers such a condition.

-- Reuti


On 19.10.2012 at 20:47, Reuti wrote:

> On 19.10.2012 at 19:43, Jérémie Dubois-Lacoste wrote:
> 
>> AFAIR, when vmem is exceeded, the abort message says KILL,
>> not XCPU. But anyway 433M is below the limit (soft 450,
>> hard 480), so I don't think the memory is involved here.
> 
> Was the limit defined with M or m?
> 
> M = base 1024 (1024 * 1024 bytes)
> m = base 1000 (1000 * 1000 bytes)
> 
> -- Reuti
> 
> (man sge_types)
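
To make the M versus m point concrete, here is a small sketch of the arithmetic. It assumes the multipliers from sge_types(5) (lowercase = powers of 1000, uppercase = powers of 1024) and that the "Max vmem" figure in the e-mail is reported in base 1024; the helper name to_bytes is made up for illustration and is not part of SGE.

```python
# Multipliers as described in sge_types(5):
# lowercase = powers of 1000, uppercase = powers of 1024.
MULTIPLIERS = {
    "k": 1000, "K": 1024,
    "m": 1000**2, "M": 1024**2,
    "g": 1000**3, "G": 1024**3,
}


def to_bytes(spec: str) -> float:
    """Convert a memory specifier such as '450m' or '433.707M' to bytes.

    Hypothetical helper for illustration only.
    """
    suffix = spec[-1]
    if suffix in MULTIPLIERS:
        return float(spec[:-1]) * MULTIPLIERS[suffix]
    return float(spec)  # no suffix: plain bytes


# Assumption: the accounting e-mail reports Max vmem in base 1024.
max_vmem = to_bytes("433.707M")

# Compare the reported usage against both possible spellings of the limits.
for soft, hard in (("450M", "480M"), ("450m", "480m")):
    over_soft = max_vmem > to_bytes(soft)
    over_hard = max_vmem > to_bytes(hard)
    print(f"soft={soft} hard={hard}: over soft={over_soft}, over hard={over_hard}")
```

Under these assumptions, limits written as 450M/480M are not reached, while limits written as 450m/480m put the job above the soft limit but still below the hard one.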
> 
> 
>> 2012/10/19 Reuti <reuti at staff.uni-marburg.de>:
>>> Am 19.10.2012 um 19:01 schrieb Jérémie Dubois-Lacoste:
>>> 
>>>> One user on our cluster is having this problem, that I've never
>>>> seen before. According to him there is some randomness, the
>>>> same job may succeed or fail from time to time.
>>>> When the job aborts, he gets this e-mail:
>>>> 
>>>> Start Time       = 10/19/2012 15:25:17
>>>> End Time         = 10/19/2012 17:07:20
>>>> CPU              = 01:40:35
>>>> Max vmem         = 433.707M
>>> 
>>> XCPU is also sent if s_vmem is exceeded.
>>> 
>>> -- Reuti
>>> 
>>> 
>>>> failed assumedly after job because:
>>>> job 5433573.1 died through signal XCPU (24)
>>>> 
>>>> So the job ran for 1h40, then got killed.
>>>> 
>>>> But the queue that he submitted to has a CPU time limit
>>>> of one week. Among the output of "qconf -sq <queue>":
>>>> s_cpu                 168:00:00
>>>> h_cpu                 169:00:00
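
As a quick sanity check on the numbers quoted here, the following sketch converts the reported CPU time and the two queue limits to seconds. The helper name hms_to_seconds is hypothetical; the values are the ones from the e-mail and the "qconf -sq" output.

```python
def hms_to_seconds(value: str) -> int:
    """Convert an 'hours:minutes:seconds' string to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


used = hms_to_seconds("01:40:35")    # CPU line from the job e-mail
s_cpu = hms_to_seconds("168:00:00")  # soft limit from qconf -sq
h_cpu = hms_to_seconds("169:00:00")  # hard limit from qconf -sq

print(used, s_cpu, h_cpu)
print(f"fraction of s_cpu used: {used / s_cpu:.2%}")
print("over s_cpu:", used > s_cpu)
```

The job used about 1% of the soft CPU limit, so the queue's s_cpu setting alone does not explain the XCPU signal.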
>>>> 
>>>> Any idea?
>>>> 
>>>> Jérémie
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>> users at gridengine.org
>>>> https://gridengine.org/mailman/listinfo/users
>>>> 




