[gridengine users] jobs getting killed (failed assumedly after job because: job 311263.1 died through signal KILL (9))
Mike Hanby
mhanby at uab.edu
Mon Oct 3 17:41:04 UTC 2011
Is the hard run time limit (h_rt) being reached sometimes but not others?
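
One quick check uses only the numbers in the report quoted below. The job ran from 19:03:31 to 21:04:10, and the exit status was 137. The Python sketch below does the arithmetic: it computes the wallclock time and decodes the exit status using the usual shell convention of 128 + signal number. The two-hour limit in it is purely an assumption for illustration, not something stated in the report. The real h_rt, if one is set, would have to be read from the queue configuration (for example `qconf -sq regular.q`) or from the job's own resource request.

```python
from datetime import datetime

# Values copied from the SGE abort mail quoted below.
START = "09/24/2011 19:03:31"
END = "09/24/2011 21:04:10"
EXIT_STATUS = 137

FMT = "%m/%d/%Y %H:%M:%S"


def elapsed_seconds(start, end, fmt=FMT):
    """Wallclock seconds between two timestamps in the report's format."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def signal_from_exit_status(status):
    """Shell convention: a status above 128 means death by signal (status - 128)."""
    return status - 128 if status > 128 else None


wallclock = elapsed_seconds(START, END)
signo = signal_from_exit_status(EXIT_STATUS)

# ASSUMPTION: a hypothetical 2-hour hard limit, used only for comparison.
# The actual h_rt of regular.q is not given anywhere in this thread.
ASSUMED_H_RT = 2 * 3600

print("wallclock seconds:", wallclock)
print("signal number:", signo)
print("seconds past assumed limit:", wallclock - ASSUMED_H_RT)
```

If the wallclock time lands just past a round limit like this while the CPU time is only 29 seconds, that would be consistent with a run time limit being enforced rather than with a crash, but it is only a hint until the configured limit is confirmed.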
> -----Original Message-----
> From: users-bounces at gridengine.org [mailto:users-
> bounces at gridengine.org] On Behalf Of Peskin, Eric
> Sent: Monday, October 03, 2011 11:14 AM
> To: users at gridengine.org
> Subject: [gridengine users] jobs getting killed (failed assumedly after
> job because: job 311263.1 died through signal KILL (9))
>
> All,
>
> I have a user running qmake jobs. Intermittently, the job fails and
> SGE says it was killed with signal 9. The user did not kill it. We
> (the sysadmins) did not kill it. How can I figure out what is going
> on? The worst part is that this problem is intermittent. Exactly the
> same command works sometimes but fails sometimes. I have appended the
> message from SGE below. Any suggestions would be greatly appreciated.
>
> Thanks,
> Eric Peskin
>
> From: root [root at local]
> Sent: Saturday, September 24, 2011 9:04 PM
> To: Tang, Zuojian
> Subject: Job 311263 (qmake) Aborted
>
> Job 311263 (qmake) Aborted
> Exit Status = 137
> Signal = KILL
> User = tangz01
> Queue = regular.q at compute-0-13.local
> Host = compute-0-13.local
> Start Time = 09/24/2011 19:03:31
> End Time = 09/24/2011 21:04:10
> CPU = 00:00:29
> Max vmem = 2.579G
> failed assumedly after job because:
> job 311263.1 died through signal KILL (9)
>