[gridengine users] -notify and killing jobs

Reuti reuti at staff.uni-marburg.de
Mon Mar 6 11:29:05 UTC 2017


> On 06.03.2017 at 10:36, Julien Nicoulaud <julien.nicoulaud at gmail.com> wrote:
> I run jobs with -notify and a long notify time of 30 minutes, as the jobs can have a very long cleanup.
> This works fine, when using "qdel" USR2 is sent and handled by my jobs.
> But in some cases, I would like to force kill the job immediately (by sending the KILL signal).
> I cannot find any way to do this. Any idea?

Unfortunately, -notify takes no y/n argument, so its setting can't be changed afterwards with `qalter`. There are two similar ways to get rid of such jobs anyway:

1. Abuse a checkpointing interface to kill the job by rescheduling it (the checkpointing environment must be attached to the queue and requested at job submission).

$ qconf -sckpt killer
ckpt_name          killer
interface          userdefined
ckpt_command       none
migr_command       none
restart_command    none
clean_command      none
ckpt_dir           /tmp
signal             none
when               x
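
A minimal sketch of registering the `killer` environment above from a file and attaching it to a queue, assuming Grid Engine admin rights. The function name `setup_killer_ckpt` and the default queue `all.q` are illustrative assumptions. Jobs must still request the environment at submission, e.g. `qsub -notify -ckpt killer job.sh`.

```shell
#!/bin/sh
# Sketch: register the "killer" checkpoint environment and add it to a
# queue's ckpt_list. Assumes admin rights; "all.q" is only an example name.

setup_killer_ckpt() {
    queue=${1:-all.q}
    ckpt_file=$(mktemp) || return 1
    # Same attributes as the `qconf -sckpt killer` listing.
    cat > "$ckpt_file" <<'EOF'
ckpt_name          killer
interface          userdefined
ckpt_command       none
migr_command       none
restart_command    none
clean_command      none
ckpt_dir           /tmp
signal             none
when               x
EOF
    # Add the environment from the file, then append it to the queue.
    qconf -Ackpt "$ckpt_file" &&
        qconf -aattr queue ckpt_list killer "$queue"
    status=$?
    rm -f "$ckpt_file"
    return "$status"
}
```

Usage, e.g.: `. ./setup_killer_ckpt.sh && setup_killer_ckpt all.q` (file name is a placeholder).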

The running job can then be checkpointed with `qmod -sj <job_id>`. This sends a SIGKILL to the job and reschedules it. While it is waiting again, you can use the usual `qdel` to remove it from the waiting list.
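
The checkpoint route can be scripted roughly as follows. This is a sketch under assumptions: the job was submitted with `-ckpt killer`, `qstat -s p` lists the rescheduled job once it is waiting again, and the helper names and the 60-second limit are hypothetical choices.

```shell
#!/bin/sh
# Sketch: force-kill a running -notify job submitted with "-ckpt killer".

# Succeeds if the job currently appears among the pending jobs.
is_pending() {
    qstat -s p -u '*' | awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}

force_kill() {
    job_id=$1
    # Suspending triggers the "when x" checkpoint: the job is killed
    # and put back into the waiting list.
    qmod -sj "$job_id" || return 1
    # Wait (up to ~60 s, an arbitrary limit) until it is pending again.
    tries=0
    until is_pending "$job_id"; do
        tries=$((tries + 1))
        if [ "$tries" -ge 60 ]; then
            echo "job $job_id is still not pending, giving up" >&2
            return 1
        fi
        sleep 1
    done
    # An ordinary qdel now removes the waiting job.
    qdel "$job_id"
}
```

Usage, e.g.: `. ./force_kill.sh && force_kill 12345` (file name and job id are placeholders).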

(2. Not optimal: submit the jobs with "-r y" and reschedule them with `qmod -rj <job_id>`. While the job is waiting again, you can use `qdel` on it. But the job's processes will continue on the node although the job vanished from the job list. As discussed on this list before, it can take some time until they really cease operation.)
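
The "-r y" alternative can be sketched the same way, assuming the job was submitted as rerunnable (e.g. `qsub -notify -r y job.sh`). The function name and the 60-attempt limit are hypothetical, and the caveat above still applies: processes may linger on the node after `qdel`.

```shell
#!/bin/sh
# Sketch: reschedule a rerunnable job, then delete it once it is waiting.

reschedule_then_delete() {
    job_id=$1
    qmod -rj "$job_id" || return 1
    # Poll the pending-job list; give up after 60 attempts (arbitrary).
    n=0
    while ! qstat -s p -u '*' | awk -v id="$job_id" '$1 == id { f = 1 } END { exit !f }'; do
        n=$((n + 1))
        [ "$n" -lt 60 ] || { echo "job $job_id not pending" >&2; return 1; }
        sleep 1
    done
    qdel "$job_id"
}
```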

-- Reuti