[gridengine users] jobs getting killed (failed assumedly after job because: job 311263.1 died through signal KILL (9))
Eric.Peskin at nyumc.org
Wed Oct 19 13:09:31 UTC 2011
The user launches her job with
nohup qmake -m be -M <emailaddress> -- -j 32 all &
(Where <emailaddress> above is replaced with her actual email address, omitted here for privacy.)
She has a file ~/.sge_request that specifies the following:
So I do not see anything here that mentions h_rt.
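For anyone checking the same thing on their own cluster, here is a minimal sketch that searches the default request files for any mention of h_rt. It assumes the usual sge_request locations (cluster-wide, per-user, and per-working-directory). The function name find_h_rt and the /opt/sge fallback path are hypothetical, so adjust them to the local installation.

```shell
#!/bin/sh
# Report any h_rt mentions in SGE default-request files.
# Assumption: the paths passed in are the sge_request files that apply
# to the job; SGE_ROOT and SGE_CELL may differ on a given installation.
find_h_rt() {
    for f in "$@"; do
        [ -r "$f" ] || continue      # skip files that do not exist
        # -H prints the file name, -n the line number of each match
        grep -Hn 'h_rt' "$f"
    done
    return 0                         # "no match" is not an error here
}

# Hypothetical fallback /opt/sge is used only if SGE_ROOT is unset.
find_h_rt "${SGE_ROOT:-/opt/sge}/${SGE_CELL:-default}/common/sge_request" \
          "$HOME/.sge_request" \
          "./.sge_request"
```

No output means none of the readable files mention h_rt. This does not cover limits given on the command line or embedded in a job script. For a job that is still pending or running, qstat -j <job_id> should also list the hard resource requests that were actually recorded for it.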
On Oct 19, 2011, at 8:29 AM, Esztermann, Ansgar wrote:
> On Oct 15, 2011, at 1:07 , Peskin, Eric wrote:
>> Sorry for the delayed reply. I have appended the SGE logs below. The very first line is actually a different user's job that happens to have the same number pop up. I am not sure what the best way to grep for these is:
> awk -F : '$5==311263' logfilename
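A sketch of the same filter as a small shell function, with the job ID and field number passed in rather than hard-coded. Which field holds the job number depends on the log being read, so the field number here is an assumption to verify against the actual file. The name lines_for_job is hypothetical.

```shell
#!/bin/sh
# Print only the lines of a colon-separated log whose given field
# equals the job ID.
# Note the comparison operator "==": a single "=" would assign to the
# field and the pattern would then match every line.
lines_for_job() {
    # $1 = job ID, $2 = field number, $3 = log file
    awk -F: -v id="$1" -v col="$2" '$col == id' "$3"
}

# Usage (field 5, as in the quoted command):
#   lines_for_job 311263 5 logfilename
```

Comparing a single field avoids the false hits a plain grep for the number would give when the same digits appear in a timestamp or in another job's ID.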
> As to the real problem, someone mentioned h_rt, and you checked that no limit had been configured for the queues, but what about the jobs? Could the users inadvertently have given a low h_rt?
> Ansgar Esztermann
> Max-Planck-Institut für biophysikalische Chemie, Abteilung 105