[gridengine users] Removing 1.4 BILLION tasks job array

Joseph Farran jfarran at uci.edu
Wed Aug 7 23:40:24 UTC 2019


Howdy.

A user accidentally submitted a job array with 1.4 BILLION tasks on our
HPC cluster. How can I remove it?

I cannot qdel or qhold the job because either command crashes SGE.
I can restart SGE just fine, but the job remains.

I removed the SGE job script itself from /var/spool/sge/job_scripts and
restarted SGE, but the job remains.

The only thing I can do is remove tasks either one at a time or in
groups, which works, but at 1.4 BILLION tasks that will take a while.
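
For illustration, the group-wise removal could be scripted roughly as
below. This is only a sketch under assumptions: the job ID 123, the chunk
size, and the helper names chunk_ranges and delete_in_chunks are all
hypothetical, and it assumes that qdel accepts a task range via -t. It
prints the qdel commands by default rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print one "start-end" task range per line,
# covering first..last in chunks of at most "size" tasks.
chunk_ranges() {
    first=$1; last=$2; size=$3
    start=$first
    while [ "$start" -le "$last" ]; do
        end=$((start + size - 1))
        if [ "$end" -gt "$last" ]; then
            end=$last
        fi
        echo "${start}-${end}"
        start=$((end + 1))
    done
}

# Hypothetical wrapper: issue one qdel per chunk.
# Dry run by default: QDEL_CMD falls back to "echo qdel", so the
# commands are only printed. Set QDEL_CMD=qdel to really delete.
delete_in_chunks() {
    job_id=$1; first=$2; last=$3; size=$4
    chunk_ranges "$first" "$last" "$size" | while read -r range; do
        ${QDEL_CMD:-echo qdel} "$job_id" -t "$range"
    done
}

# Example (assumed job ID and sizes, prints commands only):
#   delete_in_chunks 123 1 1400000000 100000
```

With a chunk size of 100000 that is still 14000 qdel calls for 1.4
BILLION tasks, so the chunk size is a trade-off between the number of
calls and how much work each call hands to SGE.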

Added max_aj_tasks to the SGE configuration to prevent this in the future.

    # qconf -sconf|grep tasks
    max_aj_tasks                 100000


Any help appreciated.

Thank you,
Joseph
