[gridengine users] Queues using less than slots=
reuti at staff.uni-marburg.de
Thu Nov 8 07:38:56 UTC 2012
On 08.11.2012 at 07:36, Joseph Farran wrote:
> I found one clue. "qstat" and "qstat -f" are reporting different numbers of cores (slots) in use:
> "qstat" is reporting 25 + 32 + 32 cores while "qstat -f" reports 25 + 15 + 10 cores:
> qstat -f ( for compute-2-6 )
> bio@compute-2-6.local          BIP   0/50/64        3.79     lx-amd64
>   45647 0.54310 QRLOGIN    user1        r     11/07/2012 15:55:04    25
>   40044 0.55421 SNPtable   user2        r     11/06/2012 11:13:18    15
>   40279 0.55421 SNPtable   user2        r     11/06/2012 14:50:25    10
In total, 50 slots are in use on this node. That matches the 0/50/64 summary (reserved/used/total), and the 50 are split across the three jobs: 25 + 15 + 10.
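The bookkeeping for this node can be checked mechanically. The following is a minimal Python sketch, not part of SGE: it takes the "qstat -f" excerpt quoted in this thread, reads the used and total counts from the slash-separated summary field, and compares the used count with the sum of the per-job slot counts. The function name `check_queue_instance` is hypothetical, and the parsing assumes the slot count is the last field of each job line, as in the excerpt (array jobs, which add a task-ID column, are not handled).

```python
import re

# Excerpt copied from the post; the queue instance is written queue@host
# (the list archive renders "@" as " at ").
QSTAT_F = """\
bio@compute-2-6.local BIP 0/50/64 3.79 lx-amd64
45647 0.54310 QRLOGIN user1 r 11/07/2012 15:55:04 25
40044 0.55421 SNPtable user2 r 11/06/2012 11:13:18 15
40279 0.55421 SNPtable user2 r 11/06/2012 14:50:25 10
"""


def check_queue_instance(text):
    """Return (used, total, per_job_sum) for one queue-instance block."""
    lines = [ln for ln in text.splitlines() if ln.strip()]

    # The summary field is the only header token of the form n/n/n.
    used = total = None
    for tok in lines[0].split():
        m = re.fullmatch(r"(\d+)/(\d+)/(\d+)", tok)
        if m:
            used, total = int(m.group(2)), int(m.group(3))
            break
    if used is None:
        raise ValueError("no n/n/n summary field found in header line")

    # Assumption: the last field of every job line is its slot count.
    per_job_sum = sum(int(ln.split()[-1]) for ln in lines[1:])
    return used, total, per_job_sum


used, total, per_job_sum = check_queue_instance(QSTAT_F)
print(f"used={used} total={total} sum of job slots={per_job_sum}")
```

For the excerpt in this thread the used count and the per-job sum are both 50, so the "qstat -f" listing is consistent with itself.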
> $ qstat | grep compute-2-6
>   45647 0.54310 QRLOGIN    user1        r     11/07/2012 15:55:04 bio@compute-2-6.local    25
>   40044 0.55421 SNPtable   user2        r     11/06/2012 11:13:18 bio@compute-2-6.local    32
>   40279 0.55421 SNPtable   user2        r     11/06/2012 14:50:25 bio@compute-2-6.local    32
This output lists only the master queue of each job, together with the job's total slot count. It does not show how those slots are distributed across the cluster, which depends on the allocation_rule defined in the PE. To see the allocation per machine, use:
$ qstat -g t
As a job can run on other machines too, the slots on all machines must be counted to arrive at the job's total.
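To make the difference concrete, the numbers from the two listings in this thread can be subtracted job by job. This small Python sketch uses only the figures quoted above; the dictionary and function names are made up for illustration. It assumes the plain "qstat" value is the job's total across all hosts and the "qstat -f" value is the share granted on compute-2-6.

```python
# Slots per job as reported by plain "qstat" (assumed: whole job, all hosts).
total_slots = {45647: 25, 40044: 32, 40279: 32}

# Slots per job as listed by "qstat -f" under the compute-2-6 queue instance.
local_slots = {45647: 25, 40044: 15, 40279: 10}


def slots_elsewhere(total, local):
    """Slots each job would hold on hosts other than this one."""
    return {job: total[job] - local[job] for job in total}


remote = slots_elsewhere(total_slots, local_slots)
for job, n in remote.items():
    print(f"job {job}: {local_slots[job]} slots here, {n} on other hosts")

print("slots used on this node:", sum(local_slots.values()))
```

Under that assumption, jobs 40044 and 40279 hold 17 and 22 slots on other machines, and compute-2-6 has 50 of its 64 slots in use, not the 89 that adding up the plain "qstat" column would suggest.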
> So it looks like SGE is confused. How can I fix this?
> On 11/7/2012 9:25 PM, Joseph Farran wrote:
>> I am using SGE 8.1.2 with several queues and recently, several of my 64-slot queues are not scheduling the full 64 cores.
>> So if I submit 64 1-core jobs, only 57 or so are scheduled per node instead of 64. If I submit 4 16-core PE jobs, only 3 of them are scheduled on a node instead of 4 (16 x 4 = 64).
>> This was working before just fine, so I think SGE just lost track or something. I tried restarting SGE with same symptoms. My queues do show "slots=64". The compute nodes do not have any special settings.
>> Is there a way to tell SGE to re-count cores per node, or to reset SGE without disrupting running jobs?
>> users mailing list
>> users at gridengine.org