[gridengine users] current state of the art for tuning scheduler for very high job flow rates?

Reuti reuti at staff.uni-marburg.de
Wed Jul 24 14:11:52 UTC 2013


On 24.07.2013 at 15:17, ChrisDag wrote:

> Hi folks,
> 
> Thanks to a 20GB accounting file with 73 million entries that I was able to push through the S-GAE webapp (a process worthy of a writeup/blog post of its own ...), I've got some interesting multi-year data about a cluster that is ready for another round of tuning and optimization.
> 
> The most interesting data bits:
> 
> - Millions of jobs per month. Average 1.5m or so, but we saw as many as 2.6 million jobs in one month
> - Average job duration is incredibly short: average execution time looks like 40-50 seconds
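
For scale, a rough back-of-the-envelope check of these figures. It assumes a 30-day month and a 45 s mean runtime (the midpoint of the quoted 40-50 s) and uses Little's law (L = lambda * W) to estimate how many jobs are running at once on average. These are monthly averages; real submissions are bursty, so the peak rate the scheduler has to keep up with will be higher.

```shell
#!/bin/sh
# Back-of-the-envelope throughput check using the figures quoted above.
# Assumptions: 30-day month, 45 s mean runtime (midpoint of 40-50 s).

# Mean arrival rate in jobs per second: jobs / (days * 86400 s).
jobs_per_sec() {
    awk -v jobs="$1" -v days="$2" 'BEGIN { printf "%.2f\n", jobs / (days * 86400) }'
}

# Little's law (L = lambda * W): average number of jobs running at once.
busy_slots() {
    awk -v jobs="$1" -v days="$2" -v runtime="$3" \
        'BEGIN { printf "%.0f\n", jobs / (days * 86400) * runtime }'
}

echo "peak month:    $(jobs_per_sec 2600000 30) jobs/s, ~$(busy_slots 2600000 30 45) running"
echo "average month: $(jobs_per_sec 1500000 30) jobs/s, ~$(busy_slots 1500000 30 45) running"
```

Under these assumptions the peak month averages about 1 job/s with roughly 45 jobs running at any time, and the average month about 0.58 jobs/s with roughly 26.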
> 
> Gut feeling is that the first thing this cluster will need is a reinstall so that we can tune the scheduler into "scheduling on demand" mode. However, it's been a few years since I seriously had to deal with a system running at this job throughput rate.
> 
> Has anything changed with respect to the current state of the art? I'm thinking as a baseline:
> 
> - Reinstall so we can set scheduling on demand behavior for the scheduler

Why reinstall? The relevant parameters should be these:

$ qconf -ssconf
...
schedule_interval                 0:2:0
...
flush_submit_sec                  4
flush_finish_sec                  4

in $SGE_ROOT/util/install_modules/inst_schedd_{normal,max,high}.conf
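
These settings can also be changed on a live cluster with qconf, either interactively with qconf -msconf or non-interactively as below. A minimal sketch, assuming manager rights and a POSIX shell with awk: dump the current settings with qconf -ssconf, rewrite the parameters with a small helper, and load the result with qconf -Msconf. The helper set_sconf_param is hypothetical (not part of Grid Engine), and the values used are simply the ones quoted above, not tuned recommendations.

```shell
#!/bin/sh
# Sketch: change scheduler parameters without reinstalling.
# set_sconf_param is a hypothetical helper, not part of Grid Engine.

# Replace the value of one parameter in a scheduler config read on stdin;
# all other lines pass through unchanged.
# Usage: set_sconf_param NAME VALUE < old.conf > new.conf
set_sconf_param() {
    awk -v name="$1" -v value="$2" '
        $1 == name { printf "%-33s %s\n", name, value; next }
        { print }
    '
}

# On the cluster (needs manager rights), something along these lines:
#   qconf -ssconf > /tmp/sconf.old
#   set_sconf_param schedule_interval 0:2:0 < /tmp/sconf.old |
#       set_sconf_param flush_submit_sec 4 |
#       set_sconf_param flush_finish_sec 4 > /tmp/sconf.new
#   qconf -Msconf /tmp/sconf.new
```

Matching on the first whitespace-separated field keeps the edit independent of the exact column alignment qconf uses in its output.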

-- Reuti


> - Force local spooling and switch to binary if they are using classic mode
> - Strongly work with users/developers to increase average job duration
> 
> 
> -dag
> 
> 
> _______________________________________________
> users mailing list
> users at gridengine.org
> https://gridengine.org/mailman/listinfo/users
