[gridengine users] Jobs died through signal XCPU while not exceeding limit
reuti at staff.uni-marburg.de
Fri Oct 19 17:17:14 UTC 2012
On 19.10.2012 at 19:01, Jérémie Dubois-Lacoste wrote:
> One user on our cluster is having this problem, that I've never
> seen before. According to him there is some randomness, the
> same job may succeed or fail from time to time.
> When the job aborts he gets this e-mail:
> Start Time = 10/19/2012 15:25:17
> End Time = 10/19/2012 17:07:20
> CPU = 01:40:35
> Max vmem = 433.707M
XCPU is also sent if s_vmem is exceeded, not only s_cpu.
> failed assumedly after job because:
> job 5433573.1 died through signal XCPU (24)
> So the job was running for 1h40, then got killed.
> But the queue that he submitted to has a CPU time limit
> of one week. Among the output of "qconf -sq <queue>":
> s_cpu 168:00:00
> h_cpu 169:00:00
> Any idea?
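
To make the check concrete, here is a rough sketch in Python that compares the usage figures from the e-mail against the soft limits of the queue and lists which of them could explain the XCPU. The helper names (parse_mem, parse_time, parse_queue_conf, xcpu_suspects) are made up for this illustration, and the s_vmem value of 400M in the sample is purely hypothetical, since the s_vmem line of the queue was not posted. Size suffixes are interpreted as I understand sge_types(1): lowercase means powers of 1000, uppercase powers of 1024.

```python
# Sketch only: helper names are invented for this example, and the
# s_vmem value in SAMPLE is hypothetical (it was not shown in the thread).

# Size multipliers as described in sge_types(1).
MEM_UNITS = {"k": 1000, "K": 1024,
             "m": 1000**2, "M": 1024**2,
             "g": 1000**3, "G": 1024**3}


def parse_mem(text):
    """Convert a size such as '433.707M' or 'INFINITY' to bytes."""
    text = text.strip()
    if text.upper() == "INFINITY":
        return float("inf")
    if text[-1] in MEM_UNITS:
        return float(text[:-1]) * MEM_UNITS[text[-1]]
    return float(text)


def parse_time(text):
    """Convert 'h:m:s', plain seconds, or 'INFINITY' to seconds."""
    text = text.strip()
    if text.upper() == "INFINITY":
        return float("inf")
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_queue_conf(output):
    """Turn 'name value' lines (as printed by qconf -sq) into a dict."""
    conf = {}
    for line in output.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2:
            conf[fields[0]] = fields[1].strip()
    return conf


def xcpu_suspects(conf, cpu, max_vmem):
    """Return the soft limits whose violation would send SIGXCPU."""
    suspects = []
    if parse_time(cpu) > parse_time(conf.get("s_cpu", "INFINITY")):
        suspects.append("s_cpu")
    if parse_mem(max_vmem) > parse_mem(conf.get("s_vmem", "INFINITY")):
        suspects.append("s_vmem")
    return suspects


SAMPLE = """\
s_cpu    168:00:00
h_cpu    169:00:00
s_vmem   400M
"""

# CPU and Max vmem are taken from the e-mail quoted above.
print(xcpu_suspects(parse_queue_conf(SAMPLE), "01:40:35", "433.707M"))
```

With the hypothetical 400M limit this prints ['s_vmem'], as the CPU time of 01:40:35 is far below s_cpu. Keep in mind that the limit in effect may also come from a resource request at submission time (e.g. -l s_vmem=...) rather than from the queue definition, so both places are worth checking.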