[gridengine users] Understanding load_formula and load calculations for queue overloads..

Reuti reuti at staff.uni-marburg.de
Sun Feb 28 19:00:29 UTC 2016


On 28.02.2016 at 17:03, Ben Daniel Pere wrote:

> I'm looking into several cases where jobs don't enter our queues even though the load is lower than the threshold, and I noticed a different calculation there that I can't figure out.
> Turning on logging, I see the following in qstat -j for a job that should enter but doesn't:
> queue instance "all.q at n38.--.com" dropped because it is overloaded: np_load_avg=1.306875 (= 0.965536 + 0.50 * 38.230000 with nproc=56) >= 1.30

Each job starting on a machine contributes 1 to the adjustment, and that contribution decays to 0 over time, in your case over 7:30 minutes. The 38.23 is the sum of these contributions from all jobs started in the last 7:30 minutes, with each job having its own individual contribution to the sum. If no job started on a machine in the last 7:30 minutes, it should read 0.50 * 0.000000. The product is then divided by nproc (56) before being added to 0.965536: 0.50 * 38.23 / 56 = 0.341339, and 0.965536 + 0.341339 = 1.306875.
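The arithmetic behind the qstat -j message can be checked with a short Python sketch. The function names are made up for illustration, and the linear shape of the decay in job_contribution() is an assumption; the post only says that each contribution goes from 1 to 0 within the decay time.

```python
# Reproduce the figure from the "dropped because it is overloaded" message.
# NOTE: the linear decay in job_contribution() is an assumption for
# illustration; the post only states that a contribution decays from 1 to 0.

def job_contribution(age_s, decay_s):
    """Adjustment weight of one job: 1 at start, 0 once decay_s seconds have passed."""
    if decay_s <= 0:
        return 0.0
    return max(0.0, 1.0 - age_s / decay_s)

def adjusted_np_load(raw_np_load, adjustment, contribution_sum, nproc):
    """Raw np_load_avg plus the per-core share of the pending adjustments."""
    return raw_np_load + adjustment * contribution_sum / nproc

DECAY_S = 7 * 60 + 30  # load_adjustment_decay_time 0:7:30, in seconds

# Values taken from the message for n38.
value = adjusted_np_load(0.965536, 0.50, 38.23, 56)
print(f"{value:.6f}")   # compare with the 1.306875 reported by qstat -j
print(value >= 1.30)    # the queue instance is dropped when this holds
```

Under the assumed linear decay, a job started 225 seconds ago would still count as job_contribution(225, DECAY_S) = 0.5 towards the 38.23 sum.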

> load_formula is load_avg-num_proc and load_adjustments are 0.5:
> $ qconf -ssconf
> algorithm                         default
> schedule_interval                 00:00:01
> maxujobs                          0
> queue_sort_method                 load
> job_load_adjustments              np_load_avg=0.50,load_avg=0.50
> load_adjustment_decay_time        0:7:30
> load_formula                      load_avg-num_proc

What was the reason for implementing it this way? On a fully loaded machine, load_avg minus num_proc reads zero, which doesn't reflect the actual use of the machine.

> Given the n38 example, I see the average load is 0.965536, but I have absolutely ZERO idea where that 38.23 comes from. num_proc is 56, load_avg is less than that, so where does 38.23 come from?
> Also, I should note all our jobs take 1 full CPU, and they start doing so about 10 seconds after starting. What should I set the decay time to? We took 7:30

When your jobs use the granted CPU almost instantly and you don't intend to overload machines on purpose, you need neither job_load_adjustments nor an alarm_threshold in the queue definition.

- job_load_adjustments compensates for jobs that don't use their granted resources right away, which doesn't apply in your case.

- alarm_threshold in the queue definition covers the case where you oversubscribe a machine on purpose because your parallel jobs don't scale well. E.g. you run 72 slots on a 64-core machine and expect an average load of 1 from a certain mix of parallel and serial jobs on this machine. Running 72 serial jobs instead would lead to a noticeable oversubscription; the alarm_threshold would catch this and prevent further jobs from being started on this machine. If you have only serial jobs, or parallel jobs which scale well, it doesn't need to be set.
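To put numbers on the 72-slot example, here is a small Python sketch. The threshold of 1.10 is a hypothetical value chosen for illustration, not one from this thread, and the load figures assume that every serial job keeps one core fully busy.

```python
# Normalized load for the 72-slots-on-64-cores example.
# THRESHOLD is a hypothetical value for illustration only.

def np_load(load_avg, nproc):
    """Load average normalized by the number of cores."""
    return load_avg / nproc

CORES = 64
THRESHOLD = 1.10  # hypothetical alarm threshold on np_load_avg

expected_mix = np_load(64.0, CORES)  # 72 slots, mixed jobs, load of about 64
all_serial = np_load(72.0, CORES)    # 72 serial jobs, each using a full core

for label, value in (("expected mix", expected_mix), ("72 serial jobs", all_serial)):
    print(label, value, "alarm" if value >= THRESHOLD else "ok")
```

With these assumed numbers the intended mix stays below the threshold, while 72 serial jobs push the normalized load to 1.125 and the machine would stop receiving further jobs.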

(BTW: we use alarm_threshold only to put a machine into alarm state, so that no further jobs are dispatched to it when the free space in /tmp falls below 1 GB)

-- Reuti

> minutes as a default we found somewhere..
