[gridengine users] Using multiple queues inherits s_rt & h_rt
jfarran at uci.edu
Thu May 28 19:27:07 UTC 2015
I am not sure if this is a bug or the way Grid Engine works.
We have several queues that our users submit jobs to. One of them,
"free64", has a 3-day wall-clock limit:
$ qconf -sq free64 | grep "_rt"
while another queue, "bio", does not:
$ qconf -sq bio | grep "_rt"
When a user submits a job to both queues ("-q free64,bio"), jobs that
run longer than 3 days are killed whether they land on the "free64" or
the "bio" queue. Why are jobs that land on the "bio" queue being killed
after 3 days?
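To compare the limits of every queue in a "-q" list at once, a minimal sketch like the one below could help. It assumes the Grid Engine "qconf" client is on PATH and that queue_conf prints one "<attribute> <value>" pair per line; the function name show_rt_limits is hypothetical.

```shell
# Hypothetical helper: print s_rt and h_rt for each queue in a
# comma-separated list, the same form passed to "qsub -q free64,bio".
# Assumes "qconf" is on PATH.
show_rt_limits() {
    # Turn "free64,bio" into "free64 bio" and walk each queue name.
    for q in $(printf '%s\n' "$1" | tr ',' ' '); do
        # Keep only the wall-clock limit attributes, prefixed by the queue name.
        qconf -sq "$q" | awk -v q="$q" '$1 == "s_rt" || $1 == "h_rt" { print q, $1, $2 }'
    done
}
```

Usage: `show_rt_limits free64,bio` prints one line per queue and attribute, e.g. the queue name, "h_rt", and its value.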
The jobs are also using GE checkpoint restart:
$ qconf -sckpt restart
Is checkpoint restart the cause of this? My guess is that a job that
first landed on the free64 queue picked up the 3-day wall-clock limit,
and when it was restarted on the bio queue, it inherited that 3-day
limit from free64. If this is what is happening, is it a bug? Is there
a workaround?
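One way to test this guess might be to list, for a single job, the queue and wall-clock time of each accounting record, to see whether a job that was killed in bio had earlier run in free64. The sketch below assumes "qacct" is on PATH, that a restarted job leaves more than one accounting record, and that each record lists qname before ru_wallclock; the function name job_queue_history is hypothetical.

```shell
# Hypothetical helper: for one job ID, print "<queue> <wall-clock>" for
# each accounting record reported by "qacct -j". Assumes "qacct" is on PATH.
job_queue_history() {
    qacct -j "$1" | awk '
        $1 == "qname"        { q = $2 }        # remember the queue of the current record
        $1 == "ru_wallclock" { print q, $2 }   # emit that queue with its wall-clock time
    '
}
```

Usage: `job_queue_history <job_id>`; two lines, the first naming free64 and the second naming bio, would be consistent with the restart hypothesis.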