[gridengine users] SGE PE scheduler problem: doesn't pick least-used nodes?
alex at phillisoft.co.uk
Wed Mar 16 10:32:31 UTC 2011
We have a cluster of 1920 cores spread over 160 nodes (12 cores/node).
We only run one code, in one queue, with jobs of between 48 and 256 cores
using an MPI PE.
When benchmarking our code we found a 14-15% speedup by running on 6
cores/node, compared with 12 cores/node.
We also found that if we ran on 6 cores/node, with a second job on the
other 6 cores/node, we still had a 5-6% speedup.
So I have configured our MPI PE with allocation_rule = 6, and this
works. However, as the cluster fills up, the scheduler starts a second
job on some nodes before all the nodes are busy.
How can we configure the scheduler to run one job on every node
before starting a second job on any of them?
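The behaviour asked for above can be modelled as a simple placement rule: split each job into 6-slot chunks (as allocation_rule = 6 does) and give each chunk to a distinct host, least-used host first. The sketch below is a hypothetical Python model, not Grid Engine's scheduler; `place_job` and its parameters are illustrative names. For simplicity the model only accepts slot counts that are multiples of 6, so it uses 48-slot jobs. With 160 hosts, 20 such jobs (960 slots) occupy every host once before any host receives a second chunk.

```python
from typing import List

HOSTS = 160          # nodes in the cluster (from the post)
CORES_PER_HOST = 12  # cores per node (from the post)
PER_HOST = 6         # slots per host, mirroring allocation_rule = 6


def place_job(used: List[int], slots: int, per_host: int = PER_HOST) -> List[int]:
    """Hypothetical least-used-first placement of one job.

    Splits the job into chunks of `per_host` slots and puts each chunk on a
    distinct host, preferring hosts with the fewest slots already in use.
    Returns the chosen host indices, or an empty list if the job does not fit.
    """
    if slots % per_host != 0:
        # Simplification of this model only, not a claim about SGE.
        raise ValueError("slot count must be a multiple of per_host")
    chunks = slots // per_host
    # Hosts with room for another chunk, least-used first (ties broken by index).
    candidates = sorted(
        (h for h in range(len(used)) if used[h] + per_host <= CORES_PER_HOST),
        key=lambda h: (used[h], h),
    )
    if len(candidates) < chunks:
        return []
    chosen = candidates[:chunks]
    for h in chosen:
        used[h] += per_host
    return chosen


if __name__ == "__main__":
    used = [0] * HOSTS
    for _ in range(20):      # 20 jobs x 48 slots = 960 = 160 hosts x 6
        place_job(used, 48)
    print(max(used))         # 6: no host is doubled up yet
    place_job(used, 48)      # the 21st job has to share hosts
    print(max(used))         # 12
```

Under this ordering, doubling up only starts once every host already holds one chunk, which is the behaviour the question is after.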
I have tried defining the number of slots as a complex value on the
execution hosts, and I’ve tried -np_load_avg, np_load_avg, slots, and -slots
as the load_formula, but I can’t get it to work.
I also can’t set the allocation rule to $pe_slots, as we only want to run
on 6 cores/node, not 12.
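The sign in a load_formula matters because of how hosts are ranked. Assuming hosts are tried in ascending order of the formula value (as with the default np_load_avg, where less-loaded hosts come first), a key that grows with the number of used slots spreads jobs out, while its negation packs them together. The sketch below is a hypothetical Python illustration using a made-up per-host "used slots" count and made-up host names; whether Grid Engine's `slots` value in a load_formula counts used or free slots is not established here.

```python
from typing import Callable, Dict, List


def host_order(used: Dict[str, int], formula: Callable[[int], float]) -> List[str]:
    """Order hosts by a load-formula-like key; lower values are tried first.

    `used` maps a (hypothetical) host name to the slots already in use there.
    Ties are broken by host name so the order is deterministic.
    """
    return sorted(used, key=lambda host: (formula(used[host]), host))


if __name__ == "__main__":
    used = {"node001": 6, "node002": 0, "node003": 12}
    print(host_order(used, lambda u: u))   # spread: emptiest host first
    print(host_order(used, lambda u: -u))  # pack: fullest host first
```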
Any suggestions?