[gridengine users] Removing 1.4 BILLION tasks job array
jfarran at uci.edu
Wed Aug 7 23:40:24 UTC 2019
A user accidentally submitted a job array with 1.4 BILLION tasks on our
HPC cluster. How can I remove it?
I can neither qdel nor qhold the job, because doing so crashes SGE.
I can restart SGE just fine, but the job remains.
I removed the SGE job script itself from /var/spool/sge/job_scripts and
restarted SGE, but the job remains.
The only thing I can do is remove tasks either one at a time or in
groups. That works, but at 1.4 BILLION tasks it will take a while.
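
For reference, a rough sketch of the removal in groups. The job id 12345 and the chunk size of one million are placeholders, not values from our cluster, and the sketch assumes qdel accepts a task range via -t and that SGE tolerates ranges of that size. The helper chunk_ranges is a name made up here; it only prints the ranges, and the qdel loop is left commented out.

```shell
#!/bin/sh
# Print "start-end" task ranges that cover 1..total in steps of chunk.
# chunk_ranges is a hypothetical helper, not part of SGE.
chunk_ranges() {
    total=$1
    chunk=$2
    start=1
    while [ "$start" -le "$total" ]; do
        end=$((start + chunk - 1))
        # Clamp the last range to the final task id.
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        echo "${start}-${end}"
        start=$((end + 1))
    done
}

# Assumed usage (12345 is a placeholder job id; stop on the first failure):
# chunk_ranges 1400000000 1000000 | while read -r range; do
#     qdel 12345 -t "$range" || break
# done
```

With these placeholder values the loop would issue 1,400 qdel calls instead of 1.4 billion; the chunk size would need tuning to whatever qmaster can handle without crashing.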
I have added max_aj_tasks to the SGE configuration to prevent this in the future.
# qconf -sconf|grep tasks
Any help appreciated.