[gridengine users] Jobs died through signal XCPU while not exceeding limit

Reuti reuti at staff.uni-marburg.de
Fri Oct 19 17:17:14 UTC 2012


On 19.10.2012 at 19:01, Jérémie Dubois-Lacoste wrote:

> One user on our cluster is having a problem that I've never
> seen before. According to him it is somewhat random: the
> same job may succeed or fail from one run to the next.
> When the job aborts, he gets this e-mail:
> 
> Start Time       = 10/19/2012 15:25:17
> End Time         = 10/19/2012 17:07:20
> CPU              = 01:40:35
> Max vmem         = 433.707M

It's also sent if s_vmem is exceeded.

-- Reuti


> failed assumedly after job because:
> job 5433573.1 died through signal XCPU (24)
> 
> So the job ran for 1h40 and then got killed.
> 
> But the queue he submitted to has a CPU time limit
> of one week. From the output of "qconf -sq <queue>":
> s_cpu                 168:00:00
> h_cpu                 169:00:00
> 
> Any idea?
> 
> Jérémie
> 
> _______________________________________________
> users mailing list
> users at gridengine.org
> https://gridengine.org/mailman/listinfo/users
> 
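A minimal sketch of the check implied by the reply, assuming (as stated above) that exceeding either s_cpu or s_vmem delivers SIGXCPU to the job. It compares the "CPU" and "Max vmem" values from the job mail against whatever soft limits were actually in effect: the queue's s_cpu shown in the post, plus an s_vmem that could come from the queue configuration (not shown in the excerpt) or from a per-job request such as `qsub -l s_vmem=...`. The names `sigxcpu_suspects`, `Usage` and `SoftLimits` are hypothetical, and the 400M limit in the usage example is purely illustrative. The size parser follows the Grid Engine convention that upper-case K/M/G are powers of 1024 and lower-case k/m/g are powers of 1000.

```python
"""Hypothetical helper: which soft limits could explain a SIGXCPU?

Assumption (from the reply in this thread): exceeding either s_cpu or
s_vmem sends SIGXCPU, so both limits have to be checked.
"""
from dataclasses import dataclass
from typing import List, Optional

# Grid Engine size suffixes: upper case = 1024-based, lower case = 1000-based.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3,
          "k": 1000, "m": 1000 ** 2, "g": 1000 ** 3}


def parse_hms(text: str) -> int:
    """Convert 'HH:MM:SS' (hours may exceed 24) into seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_mem(text: str) -> float:
    """Convert a size such as '433.707M' or '4G' into bytes."""
    suffix = text[-1]
    if suffix in _UNITS:
        return float(text[:-1]) * _UNITS[suffix]
    return float(text)


@dataclass
class Usage:
    cpu: str        # "CPU" line from the job mail, e.g. "01:40:35"
    max_vmem: str   # "Max vmem" line from the job mail, e.g. "433.707M"


@dataclass
class SoftLimits:
    s_cpu: Optional[str] = None    # None stands for INFINITY / not set
    s_vmem: Optional[str] = None


def sigxcpu_suspects(usage: Usage, limits: SoftLimits) -> List[str]:
    """Return the soft limits that the reported usage reached or exceeded."""
    suspects = []
    if limits.s_cpu is not None and parse_hms(usage.cpu) >= parse_hms(limits.s_cpu):
        suspects.append("s_cpu")
    if limits.s_vmem is not None and parse_mem(usage.max_vmem) >= parse_mem(limits.s_vmem):
        suspects.append("s_vmem")
    return suspects


if __name__ == "__main__":
    usage = Usage(cpu="01:40:35", max_vmem="433.707M")
    # Queue limit from the post: s_cpu alone cannot explain the signal.
    print(sigxcpu_suspects(usage, SoftLimits(s_cpu="168:00:00")))
    # Hypothetical per-job request such as "-l s_vmem=400M".
    print(sigxcpu_suspects(usage, SoftLimits(s_cpu="168:00:00", s_vmem="400M")))
```

With only the queue's s_cpu of 168:00:00, 1h40 of CPU time is nowhere near the limit, so the first call returns an empty list; only a memory limit below the observed 433.707M peak would account for the signal. Since the peak vmem can vary between runs of the same job, this would also fit the reported randomness.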
