[gridengine users] difference between a task reschedule and a task kill in the epilog?
Lars van der bijl
lars at realisestudio.com
Wed Apr 4 15:42:04 UTC 2012
On 4 April 2012 17:14, Reuti <reuti at staff.uni-marburg.de> wrote:
> Well, in both cases it is killed of course. You could set loglevel to log_info and search the messages file of the qmaster for entries like:
> 04/04/2012 17:03:07|worker|pc15370|W|job 3963.1 failed on host pc15370 rescheduling because: manual/auto rescheduling
> 04/04/2012 17:03:07|worker|pc15370|W|rescheduling job 3963.1
> 04/04/2012 17:03:46|worker|pc15370|I|reuti has deleted job 396
I might have to rotate the file before I try something like that;
it's currently 117 MB.
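Reuti's suggestion of scanning the qmaster messages file can be sketched in Python. This is a minimal sketch, assuming the pipe-separated line format shown in the log excerpt above and assuming a rescheduled task is always logged as `rescheduling job <job>.<task>`. The file is read line by line, so even a 117 MB file is never loaded into memory at once. The messages file (often `$SGE_ROOT/$SGE_CELL/spool/qmaster/messages` with classic spooling, but check your installation) must be readable from wherever the script runs, which for an epilog means the execution host. The script name `was_rescheduled.py` and its exit-code convention are hypothetical.

```python
import re
import sys

# Matches qmaster lines such as
#   04/04/2012 17:03:07|worker|pc15370|W|rescheduling job 3963.1
# i.e. timestamp|thread|host|level|message, capturing timestamp, job and task id.
RESCHEDULE_RE = re.compile(
    r"^([^|]*)\|[^|]*\|[^|]*\|[^|]*\|rescheduling job (\d+)\.(\d+)\s*$"
)


def reschedule_events(lines):
    """Yield (timestamp, job_id, task_id) for each 'rescheduling job J.T' entry."""
    for line in lines:
        match = RESCHEDULE_RE.match(line)
        if match:
            yield match.group(1), int(match.group(2)), int(match.group(3))


def was_rescheduled(path, job_id, task_id):
    """True if the messages file at path logs a reschedule of job_id.task_id."""
    # Iterating the file object streams it line by line instead of reading it whole.
    with open(path, errors="replace") as f:
        return any(
            job == job_id and task == task_id
            for _, job, task in reschedule_events(f)
        )


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: was_rescheduled.py MESSAGES_FILE JOB_ID TASK_ID  (hypothetical helper)
    found = was_rescheduled(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))
    sys.exit(0 if found else 1)
```

Called from an epilog as, for example, `python3 was_rescheduled.py /path/to/messages "$JOB_ID" 1`, an exit status of 0 would mean a reschedule entry was found. It is not guaranteed that the qmaster has written the entry by the time the epilog runs, so treat a negative result with caution.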
> Then you can act on this. Does it happen often that you want to reschedule a job? I wonder whether using a checkpointing environment would help (even if we don't intend to do any actual checkpointing). There you can define a migration procedure in migr_command.
No, it's not something I want to happen often, but it does happen. One
thing I'm still struggling with, on a related note, is that a task keeps
running even after it has been rescheduled, so both copies end up
writing their outputs. Would a checkpointing environment let us make
sure the task (and its child processes) is killed with kill -9 when it's
rescheduled?
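A migr_command along the lines Reuti suggests could kill the rescheduled task together with all of its child processes. The sketch below is an illustration under assumptions, not documented SGE behaviour: it assumes Linux (it walks `/proc` to find descendants) and assumes the checkpoint environment passes the job's pid as the first argument (ckpt_conf(5) describes substitution variables such as `$job_pid` for these commands; verify this on your version). When SGE actually invokes migr_command is governed by the checkpoint environment's configuration. `kill_tree.py` is a hypothetical name. The tree is frozen with SIGSTOP before anything gets SIGKILL, so no member can fork a new child between the scan and the kill; processes already orphaned away from the tree are not found, so this is best-effort.

```python
import os
import signal
import sys


def _children_by_parent():
    """Map each parent pid to its child pids by scanning /proc (Linux only)."""
    children = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # Field 2 (comm) is parenthesised and may contain spaces, so parse from
        # the last ')': the remaining fields start with state, then ppid.
        ppid = int(stat[stat.rfind(")") + 2:].split()[1])
        children.setdefault(ppid, []).append(int(entry))
    return children


def descendants(pid):
    """Return every descendant pid of pid (children, grandchildren, ...)."""
    children = _children_by_parent()
    found, stack = [], [pid]
    while stack:
        for child in children.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return found


def _send(pid, sig):
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        pass  # the process is already gone


def kill_tree(pid):
    """SIGKILL pid and all its descendants; return the pids that were signalled."""
    # Freeze the tree first: a stopped process cannot fork, so rescanning until
    # no new descendants appear gives a stable set to kill.
    stopped = [pid]
    _send(pid, signal.SIGSTOP)
    while True:
        new = [p for p in descendants(pid) if p not in stopped]
        if not new:
            break
        for p in new:
            _send(p, signal.SIGSTOP)
            stopped.append(p)
    for p in stopped:
        _send(p, signal.SIGKILL)  # SIGKILL also terminates stopped processes
    return stopped


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: kill_tree.py PID   (hypothetical migr_command helper)
    kill_tree(int(sys.argv[1]))
```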
> -- Reuti
> On 04.04.2012 at 16:33, Lars van der bijl wrote:
>> Is there a way to tell the difference?
>> If I reschedule a job, I get these values in the usage file in the epilog.
>> If I kill the job, I get this.
>> Does anyone know of a way to tell the difference from the epilog?
>> users mailing list
>> users at gridengine.org
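The usage values Lars saw did not survive in the archive, so a practical first step is to capture the usage file in both cases and compare them. This sketch assumes the epilog can reach the file as `$SGE_JOB_SPOOL_DIR/usage` and that it holds one `key=value` pair per line; both are assumptions to check against a real job. `parse_usage` and `diff_usage` are hypothetical helpers.

```python
import os


def parse_usage(path):
    """Parse a file of key=value lines into a dict; lines without '=' are skipped."""
    usage = {}
    with open(path) as f:
        for line in f:
            key, sep, value = line.strip().partition("=")
            if sep:
                usage[key] = value
    return usage


def diff_usage(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    # In the epilog: dump the current task's usage file so the rescheduled and
    # killed cases can be captured and compared afterwards with diff_usage.
    spool_dir = os.environ.get("SGE_JOB_SPOOL_DIR")
    if spool_dir:
        usage = parse_usage(os.path.join(spool_dir, "usage"))
        for key, value in sorted(usage.items()):
            print(f"{key}={value}")
```

Once one capture from a rescheduled task and one from a killed task exist, `diff_usage` lists exactly the fields an epilog could branch on.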