[gridengine users] Understanding load_formula and load calculations for queue overloads..
reuti at staff.uni-marburg.de
Mon Feb 29 15:55:01 UTC 2016
> Am 28.02.2016 um 21:51 schrieb Ben Daniel Pere <ben.pere at gmail.com>:
> Each job starting on a machine will contribute 1 to the adjustment, which decays over time to 0, in your case in 7:30 minutes. The 38.23 is the sum of all these adjustments for all jobs that started in the last 7:30 minutes, while each job has its own individual contribution to this sum. If no job started in the last 7:30 minutes on a machine, it should read 0.50 * 0.000000. This value is then divided by 56 before being added to 0.965536.
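The decay described above can be sketched as follows; this is a minimal illustration of the behaviour, not SGE source code, and all names and values (a 0.50 adjustment, 56 processors, a 7:30 decay time) are taken from the thread or invented for the example:

```python
# Linear decay of per-job load adjustments, as described above.
DECAY_TIME = 7 * 60 + 30   # load_adjustment_decay_time 0:7:30, in seconds
ADJUSTMENT = 0.50          # job_load_adjustments np_load_avg=0.50
NUM_PROC = 56              # processors on the hypothetical machine

def job_contribution(start_time, now):
    """Each job contributes 1.0 at start and decays linearly to 0."""
    age = now - start_time
    if age >= DECAY_TIME:
        return 0.0
    return 1.0 - age / DECAY_TIME

def adjusted_np_load(np_load_avg, job_start_times, now):
    # The sum over all recent jobs is what shows up as e.g.
    # "0.50 * 38.230000"; it is scaled by the adjustment and divided
    # by the processor count before being added to np_load_avg.
    raw_sum = sum(job_contribution(t, now) for t in job_start_times)
    return np_load_avg + ADJUSTMENT * raw_sum / NUM_PROC
```

With 40 jobs all started just now, `adjusted_np_load(0.965536, starts, now)` would add 0.50 * 40 / 56 on top of the base load average.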
> I actually realized the 38.23 while writing this email, noticed the decay time, and started to read about it. Still, what made me send the question was that I don't see where the load_formula kicks into play here; the minus num_procs seems to be completely ignored, so I'm probably missing something. What is it?
It's the other way round. The load used in the load_formula is already adjusted. You adjust individual values, not the result of any computation already made with them.
The computed load_formula will then be used to sort the machines.
> > load_formula is load_avg-num_proc and load_adjustments are 0.5:
> What was the reason to implement it this way? With a fully loaded machine, subtracting num_proc would read zero, which doesn't reflect the actual use of the machine.
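A worked example of that objection, for a hypothetical 56-core machine (values are illustrative only):

```python
# Sort value produced by the load_formula "load_avg - num_proc"
# on a hypothetical 56-core machine.
num_proc = 56

def sort_value(load_avg):
    return load_avg - num_proc

print(sort_value(0.0))    # idle machine
print(sort_value(56.0))   # fully loaded machine reads exactly 0
print(sort_value(72.8))   # ~30% oversubscribed
```

An idle machine still sorts below a busy one, but the absolute value no longer tells you what fraction of the machine is in use, which is the point being made above.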
> No one remembers. I talked with the people who configured it; they have absolutely no idea :) "probably copy pasted from somewhere online" <-- real quote.
> - The job_load_adjustments setting handles the fact that a job isn't using the granted resources instantly, which is not happening in your case.
> I would also assume it's good for "starting engines": since load_avg is the 5-minute load, submitting a huge array after some idle time will make all jobs see almost zero load on the machine. I wouldn't mind bombing the machine because we only have 1 slot per core, so I'm not really worried about killing the CPU, but I can see the logic in it even for always-intensive jobs.
> - alarm_threshold in the queue definition takes care of the case where you want to oversubscribe a machine intentionally, e.g. because your parallel job doesn't scale well
> We basically have 2 kinds of queues: a workhorse queue "all.q" with 1 slot per core, and an interactive queue which also has 1 slot per core but gets a better priority. We set the load_thresholds to 1.3 to allow 30% oversubscription, so interactive jobs can always run. We never ever put our nodes in alarm mode; we use Zabbix to monitor machine health and automatically take a node out of the cluster (by disabling all of its queues) in cases of "mess" (disk failures, out of space, mounting issues, stuff like that).
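A queue configuration in the spirit of that setup might look like the fragment below (as shown by `qconf -sq all.q`); the slot count and threshold value are hypothetical examples, not the poster's actual configuration:

```
qname            all.q
slots            56
load_thresholds  np_load_avg=1.3
```

With `np_load_avg=1.3`, the queue only goes into alarm once the per-processor load exceeds 1.3, leaving ~30% headroom for the higher-priority interactive queue on the same hosts.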
Are these interactive jobs generating load, or are they used only to allow users to peek at a machine?