[gridengine users] Queues using less than slots=

Reuti reuti at staff.uni-marburg.de
Thu Nov 8 07:38:56 UTC 2012


On 08.11.2012 at 07:36, Joseph Farran wrote:

> Ok,
> 
> I found one clue. "qstat" and "qstat -f" report different numbers of cores (slots) in use:
> 
> "qstat" reports 25 + 32 + 32 cores, while "qstat -f" reports 25 + 15 + 10 cores:
> 
> qstat -f   ( for compute-2-6 )
> bio@compute-2-6.local          BIP   0/50/64        3.79 lx-amd64
>  45647 0.54310 QRLOGIN    user1        r     11/07/2012 15:55:04    25
>  40044 0.55421 SNPtable   user2        r     11/06/2012 11:13:18    15
>  40279 0.55421 SNPtable   user2        r     11/06/2012 14:50:25    10

In total, 50 slots are in use on this node. This is the 50 in the summary 0/50/64 (reserved/used/total), split across the three jobs: 25 + 15 + 10.
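
The same check can be scripted. The following is only a minimal sketch: it assumes the column layout of the excerpt quoted above (summary line with resv/used/tot. as the third field, job lines ending in the slot count, no array task column), and the function name check_queue_instance is made up for illustration.

```python
# Excerpt copied from the "qstat -f" output quoted above.
QSTAT_F = """\
bio@compute-2-6.local          BIP   0/50/64        3.79 lx-amd64
  45647 0.54310 QRLOGIN    user1        r     11/07/2012 15:55:04    25
  40044 0.55421 SNPtable   user2        r     11/06/2012 11:13:18    15
  40279 0.55421 SNPtable   user2        r     11/06/2012 14:50:25    10
"""


def check_queue_instance(text):
    """Return (used, total, per_job) for one queue instance block.

    Hypothetical helper: assumes the layout shown in QSTAT_F.
    """
    lines = text.strip().splitlines()
    # Summary line: the third whitespace-separated field is resv/used/tot.
    resv, used, total = (int(x) for x in lines[0].split()[2].split("/"))
    # Job lines: first field is the job id, last field is the number of
    # slots the job holds in this queue instance.
    per_job = {int(l.split()[0]): int(l.split()[-1]) for l in lines[1:]}
    return used, total, per_job


used, total, per_job = check_queue_instance(QSTAT_F)
print(per_job, sum(per_job.values()), used, total)
# prints: {45647: 25, 40044: 15, 40279: 10} 50 50 64
```

The per-job slots add up to the "used" value of the summary, so the "qstat -f" view of this node is consistent with itself.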


> $ qstat | grep compute-2-6
>  45647 0.54310 QRLOGIN    user1        r     11/07/2012 15:55:04 bio@compute-2-6.local             25
>  40044 0.55421 SNPtable   user2        r     11/06/2012 11:13:18 bio@compute-2-6.local             32
>  40279 0.55421 SNPtable   user2        r     11/06/2012 14:50:25 bio@compute-2-6.local             32

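This shows only the location of the master queue for each job, together with the job's total slot count. It does not show how the slots are allocated inside the cluster, which depends on the allocation_rule defined in the PE. To see the allocation per queue instance, use: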

$ qstat -g t

As a job can run on other machines too, its slots on all machines must be considered.
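
As a worked example with the numbers quoted in this thread (a sketch only; the dictionaries are typed in by hand from the two outputs above, and slots_elsewhere is a made-up helper name), the difference between the two views gives the slots each job should hold outside compute-2-6:

```python
# Slots per job as listed by plain "qstat" (job-wide total, shown at the
# master queue).
job_totals = {45647: 25, 40044: 32, 40279: 32}
# Slots per job as listed under bio@compute-2-6.local by "qstat -f".
on_this_node = {45647: 25, 40044: 15, 40279: 10}


def slots_elsewhere(totals, local):
    """Slots of each job that are not in the given queue instance."""
    return {job: totals[job] - local.get(job, 0) for job in totals}


elsewhere = slots_elsewhere(job_totals, on_this_node)
for job, n in elsewhere.items():
    print(f"job {job}: {on_this_node[job]} slots here, {n} elsewhere")
# prints:
# job 45647: 25 slots here, 0 elsewhere
# job 40044: 15 slots here, 17 elsewhere
# job 40279: 10 slots here, 22 elsewhere
```

If this reading is right, "qstat -g t" should list the remaining 17 and 22 slots of the two SNPtable jobs on other queue instances.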

-- Reuti


> So it looks like SGE is confused.    How can I fix this?
> 
> 
> On 11/7/2012 9:25 PM, Joseph Farran wrote:
>> Hi.
>> 
>> I am using SGE 8.1.2 with several queues, and recently several of my 64-slot queues have stopped scheduling the full 64 cores.
>> 
>> If I submit 64 1-core jobs, only 57 or so are scheduled per node instead of 64. If I submit 4 16-core PE jobs, only 3 of them are scheduled on a node instead of 4 (4 x 16 = 64).
>> 
>> This worked fine before, so I think SGE just lost track or something. I tried restarting SGE, with the same symptoms. My queues do show "slots=64". The compute nodes do not have any special settings.
>> 
>> Is there a way to tell SGE to re-count cores per node, or to reset SGE without disrupting running jobs?
>> 
>> Joseph
>> 
>> 
>> 
>> _______________________________________________
>> users mailing list
>> users at gridengine.org
>> https://gridengine.org/mailman/listinfo/users