[gridengine users] Forgetting the Subordinate Queue
reuti at staff.uni-marburg.de
Sun Mar 17 09:14:14 UTC 2013
On 17.03.2013 at 07:22, Joseph Farran wrote:
> On 1/4/2013 10:37 AM, Reuti wrote:
>> On 02.01.2013 at 05:08, Joseph Farran wrote:
>>> Hello Reuti.
>>> Yes, the job(s) are not suspending (S) as they normally do. So it's not the queue, but the jobs.
>> But is the queue in suspended state (qstat -f)?
> Sorry Reuti, missed your question.
> Yes, the queue is SUSPENDED, but the jobs continue to run. Here is one example:
> free64@compute-14-18.local     BIP   0/4/64   11.21   lx-amd64   S
>  242709 0.00355 CMAPNN     mengfant     r     03/15/2013 02:27:23     2 20
>  242709 0.00355 CMAPNN     mengfant     r     03/15/2013 02:27:23     2 33
Were these slave tasks of a parallel job?
> Any idea why it keeps forgetting to suspend? It only happens once in a while, but it overloads the nodes when it does.
>> -- Reuti
>>> Normally, as soon as one or more core jobs enter the node through the queue, the subordinate jobs suspend immediately. Once in a while, the jobs that go in through the subordinate queue do not suspend as they should.
>>> On 1/1/2013 7:04 AM, Reuti wrote:
>>>> Engine Forgets and does not suspend and the node is overloaded.
>>>> The queue is not going into the "S" state or the jobs therein are just not suspended?
>>>> -- Reuti
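
The symptom described in this thread (a queue instance marked "S" whose jobs still show state "r") could be spotted automatically from saved "qstat -f" output. The following is only a minimal sketch, not a tested monitoring tool. It assumes the column layout of the example quoted above: the queue instance name is the first field and contains "@", the queue states are the sixth field, and on job lines the job ID is the first field and the job state the fifth. The function name is hypothetical.

```python
def running_jobs_on_suspended_queues(qstat_text):
    """Return (queue_instance, job_id) pairs for jobs still in state 'r'
    on a queue instance whose states column contains 'S'.

    Assumes the 'qstat -f' layout shown in this thread; other versions
    or output options may need different field positions.
    """
    findings = []
    current_queue = None
    queue_suspended = False
    for line in qstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "@" in fields[0]:
            # Queue instance line: name, qtype, slots, load, arch, [states]
            current_queue = fields[0]
            states = fields[5] if len(fields) > 5 else ""
            queue_suspended = "S" in states
        elif fields[0].isdigit() and len(fields) >= 5:
            # Job line: job-ID, priority, name, user, state, ...
            if queue_suspended and fields[4] == "r":
                findings.append((current_queue, fields[0]))
        # Header, separator, and any other lines are ignored.
    return findings


if __name__ == "__main__":
    import sys

    # Usage (assumed): qstat -f | python check_suspend.py
    for queue, job_id in running_jobs_on_suspended_queues(sys.stdin.read()):
        print(f"{queue}: job {job_id} is still running")
```

Each task of a job appears on its own line in the output, so the same job ID can be reported more than once for one queue instance.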