[gridengine users] Jobs being killed when another jobs start
reuti at staff.uni-marburg.de
Fri May 4 15:23:02 UTC 2012
On 04.05.2012 at 09:52, Winkler, Ursula (ursula.winkler at uni-graz.at) wrote:
> I have a problem with SGE: it looks like (though I'm really not sure) running jobs are killed when other jobs are submitted that should otherwise wait in the queue because no free cores are available. It happens repeatedly, which raises suspicion. Unfortunately, the job log files only say that the jobs terminated badly. "qacct -j <job-no>" reports that the exit status of the jobs is "7". I could not find out what this error code means. Does anybody know?
Well, can you tell us more about your setup? In principle it is possible, for example, to define a suspend method in the queue configuration that kills the job instead of suspending it.
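One way to check this is to look at the queue configuration for custom methods. The following is only a sketch: `custom_methods` is a hypothetical helper name, `all.q` is an assumed queue name, and it assumes the configuration is printed one `attribute value` pair per line with `NONE` as the default for both methods.

```shell
# Hypothetical helper: reads a queue configuration (as printed by
# `qconf -sq <queue>`) on stdin and prints any suspend or terminate
# method that is set to something other than the default NONE.
custom_methods() {
    awk '$1 ~ /^(suspend_method|terminate_method)$/ && $2 != "NONE" { print }'
}

# Usage on the cluster (all.q is an assumed queue name):
#   qconf -sq all.q | custom_methods
```

If this prints nothing, neither method is customized for that queue and the cause is probably elsewhere.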
Exit status 7 corresponds to SIGBUS on Linux (`kill -l` lists all signals), which might even point to a hardware error.
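To look up the name yourself, something like the following should work. It is a minimal sketch: `signal_name` is a made-up helper, and it assumes the reported status is either a plain signal number or the common shell convention of 128 plus the signal number. The mapping of numbers to names depends on the platform.

```shell
# Print the signal name for a numeric exit status.
# Statuses above 128 are treated as 128 + signal number.
signal_name() {
    status=$1
    if [ "$status" -gt 128 ]; then
        status=$((status - 128))
    fi
    kill -l "$status"
}

# The status reported by qacct in this thread:
signal_name 7
```

On an x86 Linux machine this should print BUS for 7.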