[gridengine users] current state of the art for tuning scheduler for very high job flow rates?
reuti at staff.uni-marburg.de
Wed Jul 24 14:11:52 UTC 2013
On 24.07.2013 at 15:17, ChrisDag wrote:
> Hi folks,
> Thanks to a 20 GB accounting file with 73 million entries that I was able to push through the S-GAE webapp (a process worthy of a write-up/blog post of its own...), I've got some interesting multi-year data about a cluster that is ready for another round of tuning and optimization.
> The most interesting data bits:
> - Millions of jobs per month. The average is 1.5 million or so, but we saw as many as 2.6 million jobs in one month
> - Average job duration is incredibly short: average execution time looks like 40-50 seconds
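A rough back-of-the-envelope check of what these figures mean for the scheduler, assuming a 30-day month and taking 45 s (the middle of the quoted 40-50 s range) as the mean runtime W. Little's law (L = λW) then gives the average number of concurrently running jobs; these are monthly averages, so bursts will be higher:

```latex
\lambda_{\text{peak}} = \frac{2.6\times10^{6}\ \text{jobs}}{30 \times 86\,400\ \text{s}} \approx 1.0\ \text{jobs/s},
\qquad
\lambda_{\text{avg}} = \frac{1.5\times10^{6}\ \text{jobs}}{2.592\times10^{6}\ \text{s}} \approx 0.58\ \text{jobs/s}

L_{\text{peak}} = \lambda_{\text{peak}} W \approx 1.0 \times 45 \approx 45\ \text{running jobs},
\qquad
L_{\text{avg}} = \lambda_{\text{avg}} W \approx 0.58 \times 45 \approx 26\ \text{running jobs}
```

In other words, the scheduler has to dispatch roughly one job per second, sustained. With a purely periodic scheduling interval of, say, 15 s, a freed slot can sit idle for up to 15 s, which is a third of the mean job runtime.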
> Gut feeling is that the first thing this cluster will need is a reinstall so that we can tune the scheduler into "scheduling on demand" mode. However, it's been a few years since I seriously had to deal with a system running at this job throughput rate.
> Has anything changed with respect to the current state of the art? I'm thinking as a baseline:
> - Reinstall so we can set scheduling-on-demand behavior for the scheduler
Why reinstall? It should just be a matter of these parameters:
$ qconf -ssconf
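A minimal sketch of making that change on a running cluster, assuming the parameters in question are the standard sched_conf(5) keys `flush_submit_sec` and `flush_finish_sec` (a non-zero value triggers a scheduler run that many seconds after a job is submitted or finishes) plus `schedule_interval` as the periodic fallback. The specific values (1 s, 1 s, 0:2:0) are illustrative assumptions, not tuned recommendations. The helper name `set_on_demand` is hypothetical; it rewrites a configuration read on stdin so the result can be loaded back with `qconf -Msconf`.

```shell
#!/bin/sh
# Sketch: switch a Grid Engine scheduler configuration to "scheduling on demand".
# ASSUMPTION: the values below are illustrative starting points only.

# Hypothetical helper: rewrite selected sched_conf(5) keys in a config on stdin.
# Assigning $2 makes awk rebuild the line with single spaces, which qconf accepts.
set_on_demand() {
  awk -v submit=1 -v finish=1 -v interval=0:2:0 '
    $1 == "flush_submit_sec"  { $2 = submit }    # run scheduler 1 s after a submit
    $1 == "flush_finish_sec"  { $2 = finish }    # run scheduler 1 s after a job ends
    $1 == "schedule_interval" { $2 = interval }  # periodic fallback run
    { print }                                    # all other keys pass through unchanged
  '
}

# Only touch the live cluster when qconf is actually available.
if command -v qconf >/dev/null 2>&1; then
  tmp=$(mktemp)
  qconf -ssconf | set_on_demand > "$tmp"   # dump current config and edit it
  qconf -Msconf "$tmp"                     # load the edited config back
  rm -f "$tmp"
fi
```

Because the change goes through `qconf` at runtime, nothing here requires reinstalling the qmaster; `qconf -msconf` does the same thing interactively in an editor.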
> - Force local spooling, and switch to binary (Berkeley DB) spooling if they are using classic spooling
> - Work closely with users/developers to increase average job duration
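Before deciding on the spooling changes, it helps to check what the cell currently uses. A minimal sketch, assuming the standard layout where the qmaster's spooling method is recorded as `spooling_method` (`classic` or `berkeleydb`) in `$SGE_ROOT/$SGE_CELL/common/bootstrap`, and the execution daemons' spool location is the `execd_spool_dir` entry of the global configuration. The fallback path `/opt/sge` and the helper name `spooling_method` are hypothetical.

```shell
#!/bin/sh
# Sketch: report which spooling method a Grid Engine cell uses.
# ASSUMPTION: standard bootstrap location; /opt/sge is only a placeholder default.

# Hypothetical helper: print the "spooling_method" value from a bootstrap file.
spooling_method() {
  awk '$1 == "spooling_method" { print $2; exit }' "$1"
}

bootstrap="${SGE_ROOT:-/opt/sge}/${SGE_CELL:-default}/common/bootstrap"
if [ -r "$bootstrap" ]; then
  method=$(spooling_method "$bootstrap")
  echo "qmaster spooling method: ${method:-unknown}"
  if [ "$method" = "classic" ]; then
    echo "classic (flat-file) spooling in use; berkeleydb may cope better with high job rates"
  fi
fi

# Execution daemons spool to execd_spool_dir; at high job rates this should be
# a local disk rather than a shared filesystem. Show it if qconf is available.
if command -v qconf >/dev/null 2>&1; then
  qconf -sconf | awk '$1 == "execd_spool_dir"'
fi
```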
> users mailing list
> users at gridengine.org