[gridengine users] Finished jobs still appear as running in queue
Nicolás Serrano Martínez-Santos
nserrano at dsic.upv.es
Wed Nov 27 09:24:23 UTC 2013
Excerpts from Reuti's message of 2013-11-26 19:37:34 +0100:
> But the process is also gone from the node, and not in some uninterruptible kernel sleep?
It is gone.
> What's in the script: /scripts/sgeepilog.sh - anything what could hang?
Please find it attached. However, the wait does not return -1 only in the
epilog; it sometimes does so in the main script as well.
> Are you using -notify and s_rt at the same time? At least for the CPU time I spot 36000 as s_cpu which I suggest to remove. It has no direct effect as you have a h_cpu in addition anyway. Having -notify and a soft warning at the same time could result in a warning for the warning and the job is never killed but warned every 90 seconds or so. Maybe something similar is happening when you have s_cpu and s_rt being triggered almost at the same time.
We are not using those two options. This is what the typical qstat output for a job looks like:
submission_time: Tue Nov 19 17:30:06 2013
hard resource_list: h_cpu=72000,h_rt=72000,h_vmem=5120M
mail_list: nserrano at dsic.upv.es
jid_predecessor_list (req): cart_700.standard.triphoneme.train-init
job-array tasks: 1-500:1
usage 334: cpu=10:08:02, mem=182410.00000 GBs, io=0.00000, vmem=5.000G, maxvmem=5.000G
scheduling info: queue instance "gpus at hpcg1.cc.upv.es" dropped because it is disabled
queue instance "gpus at hpcg2.cc.upv.es" dropped because it is disabled
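As a rough cross-check of the numbers above, the reported CPU usage can be converted to seconds and compared with the limits. This is only a sketch: the helper name hms_to_seconds is made up for illustration, the s_cpu value of 36000 is the one Reuti mentions (it does not appear in the qstat output itself), and the days field handling assumes qstat prints long runtimes as d:hh:mm:ss.

```python
def hms_to_seconds(value: str) -> int:
    """Convert a qstat-style 'hh:mm:ss' (or 'd:hh:mm:ss') string to seconds.

    Hypothetical helper, not part of Grid Engine.
    """
    parts = [int(p) for p in value.split(":")]
    # Rightmost field is seconds, then minutes, hours, days.
    multipliers = [1, 60, 3600, 86400]
    return sum(p * m for p, m in zip(reversed(parts), multipliers))


cpu_used = hms_to_seconds("10:08:02")  # usage line of task 334
h_cpu = 72000                          # hard limit from hard resource_list
s_cpu = 36000                          # soft limit mentioned by Reuti (assumed)

print(cpu_used)          # 36482
print(cpu_used > s_cpu)  # True
print(cpu_used > h_cpu)  # False
```

If that s_cpu value really applies to the queue, this task would be just past the soft CPU limit while still well below the hard one.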
Another peculiarity of the cluster is that all jobs are submitted with -R y. Could that also cause a problem? I read about it in one of your mails,
but I don't think it is related to this problem.
> -- Reuti
> > until the process is deleted with "-f".
> > In <qmaster spool>/messages there are references to these jobs such as:
> > 11/25/2013 10:11:41|schedu|mainnode|W|job 312363.9 should have finished since 10483s
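To get an overview of which tasks are affected, warnings like the one above can be collected from the messages file with a short script. This is a minimal sketch that assumes every line follows the pipe-separated layout shown above (timestamp|component|host|level|message); the names find_overdue_jobs and OverdueJob are hypothetical and not part of Grid Engine.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches the scheduler warning text seen in the messages file.
OVERDUE_RE = re.compile(r"job (\d+)\.(\d+) should have finished since (\d+)s")


class OverdueJob(NamedTuple):
    timestamp: str
    job_id: int
    task_id: int
    overdue_seconds: int


def find_overdue_jobs(lines: Iterable[str]) -> List[OverdueJob]:
    """Return one entry per 'should have finished' warning in the given lines."""
    found = []
    for line in lines:
        fields = line.rstrip("\n").split("|", 4)
        if len(fields) != 5:
            continue  # not in the assumed timestamp|component|host|level|message layout
        timestamp, _component, _host, _level, message = fields
        match = OVERDUE_RE.search(message)
        if match:
            job_id, task_id, seconds = map(int, match.groups())
            found.append(OverdueJob(timestamp, job_id, task_id, seconds))
    return found


sample = [
    "11/25/2013 10:11:41|schedu|mainnode|W|job 312363.9 should have finished since 10483s",
]
for job in find_overdue_jobs(sample):
    print(job.job_id, job.task_id, job.overdue_seconds)  # 312363 9 10483
```

In practice the lines would come from the messages file itself, e.g. find_overdue_jobs(open(path)), with path pointing at the qmaster spool directory.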
> > Do you have any hint of what the problem could be?
> > Thanks in advance,
> > --
> > NiCo
> > <trace>