[gridengine users] Making the fair-share policy/scheduler algorithm "more fair"

Jake Carroll jake.carroll at uq.edu.au
Wed May 1 00:50:05 UTC 2013


Mark,

Thanks for the response. This is opening up a bunch of cool ideas for us.

We're trying to get our heads around how the scaling factors actually
work, however.

For example, if a host's policy sets the scale factor for mem to 1.0, but we
could perhaps set it to 0.50, what does that actually *mean*? What does
changing the scale factor do, and what impact does it have on the way the
scheduler utilises memory on that node? We're trying to get a better handle
on the semantics of this thing.
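
Concretely, our tentative reading (please correct us if this is off) is that
usage_scaling only rescales the usage a job is *charged* in the fair-share
accounting, rather than changing what the scheduler will actually dispatch
to the host. So a purely hypothetical entry like:

# hypothetical values, for illustration only
[root at cluster ~]# qconf -me compute-0-0
...
usage_scaling         cpu=1.000000,mem=0.500000,io=1.000000
...

would, we think, mean that each GB-second of memory a job consumes on that
host is recorded at half weight against the owner's share-tree usage. Is
that right?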

For example, we have a "small" node and a "large" node in the same queue,
like so:

[root at cluster ~]# qconf -se compute-0-0
hostname              compute-0-0.local
load_scaling          NONE
complex_values        virtual_free=92G,h_vmem=92G
load_values           arch=lx26-amd64,num_proc=24,mem_total=96865.863281M, \
                      swap_total=0.000000M,virtual_total=96865.863281M, \
                      load_avg=0.150000,load_short=0.000000, \
                      load_medium=0.150000,load_long=0.490000, \
                      mem_free=95567.398438M,swap_free=0.000000M, \
                      virtual_free=95567.398438M,mem_used=1298.464844M, \
                      swap_used=0.000000M,virtual_used=1298.464844M, \
                      cpu=0.000000, \
                      m_topology=SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT, \
                      m_topology_inuse=SCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTT, \
                      m_socket=2,m_core=12,np_load_avg=0.006250, \
                      np_load_short=0.000000,np_load_medium=0.006250, \
                      np_load_long=0.020417
processors            24
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         NONE
report_variables      NONE

[root at cluster ~]# qconf -se compute-1-0
hostname              compute-1-0.local
load_scaling          NONE
complex_values        virtual_free=373G,h_vmem=373G
load_values           arch=lx26-amd64,num_proc=80,mem_total=387739.152344M, \
                      swap_total=0.000000M,virtual_total=387739.152344M, \
                      load_avg=2.000000,load_short=2.000000, \
                      load_medium=2.000000,load_long=2.000000, \
                      mem_free=298652.855469M,swap_free=0.000000M, \
                      virtual_free=298652.855469M,mem_used=89086.296875M, \
                      swap_used=0.000000M,virtual_used=89086.296875M, \
                      cpu=2.500000, \
                      m_topology=SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT, \
                      m_topology_inuse=SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT, \
                      m_socket=4,m_core=40,np_load_avg=0.025000, \
                      np_load_short=0.025000,np_load_medium=0.025000, \
                      np_load_long=0.025000
processors            80
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         NONE
report_variables      NONE


So, how do the scale factors actually impact the scheduler's use of the
node?
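
For instance, with the two hosts above (92G vs 373G of usable memory as
configured), would the intended use of usage_scaling be to scale memory
usage on the big box down by the ratio of the two (roughly 0.25), i.e.
something hypothetically like:

# hypothetical values, for illustration only
[root at cluster ~]# qconf -me compute-1-0
...
usage_scaling         cpu=1.000000,mem=0.250000,io=1.000000
...

so that a GB-second charged on compute-1-0 represents about the same
fraction of a machine as a GB-second on compute-0-0? Or is that the wrong
mental model entirely?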

Thanks.

--JC

On 30/04/13 9:12 PM, "Mark Dixon" <m.c.dixon at leeds.ac.uk> wrote:

>On Fri, 26 Apr 2013, Jake Carroll wrote:
>...
>> Anyway. What I would really like to know is, if it's possible to weight
>> and "fair-share" based on something other than slots utilisation. Can a
>> user weight on memory utilisation for example? What I'd really like to
>> be able to do is prioritise and weight users down who slam the HPC
>> environment with big high memory jobs, such that they are
>> de-prioritised
>> once their jobs have run, so it gives other users a fair swing at the
>> lovely DIMM modules too.
>...
>
>We use the share tree here, rather than the functional policy, so this
>might not be applicable.
>
>By default, the "usage" of a job is wholly based on slots*seconds. You can
>introduce memory (in gigabytes*seconds) by editing the "usage_weight_list"
>parameter in "qconf -ssconf". We certainly did :)
>
>See the sched_conf man page for more details.
>
>If you don't have the same amount of RAM everywhere, you might also want
>to play with "usage_scaling" parameters in the execd host definitions.
>
>Good luck :)
>
>Mark
>-- 
>-----------------------------------------------------------------
>Mark Dixon                       Email    : m.c.dixon at leeds.ac.uk
>HPC/Grid Systems Support         Tel (int): 35429
>Information Systems Services     Tel (ext): +44(0)113 343 5429
>University of Leeds, LS2 9JT, UK
>-----------------------------------------------------------------
