[gridengine users] suggestions on setting up queues
moloney at ohsu.edu
Fri Jan 16 21:37:23 UTC 2015
I have found resource quotas to be pretty unreliable once you start writing even moderately complex rules. Instead, I do almost everything through resource, host, and queue configuration. With a server-side JSV (job submission verifier) you can do pretty much anything without needing resource quotas.
The way I have it set up is to have various time limits (30 minutes, 1 hour, 2 hours, ..., over 48 hours). Jobs with the shortest time limit can use 100% of the cluster resources; jobs with longer time limits can use only a fraction of them (e.g. 90% for 1-hour jobs). I don't differentiate between interactive and batch jobs, since people can always use an "interactive" session for non-interactive processing if all the batch slots are full. I use the fair-share system to balance usage between users. Finally, I provide a higher-priority "urgent.q" on which each user can use at most one slot, for up to 12 hours.
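To make the tiering idea concrete, here is a small Python model of the admission rule, not Grid Engine configuration. Only the 100% (30-minute) and 90% (1-hour) figures come from the post; the other percentages and the function names are hypothetical. The post also doesn't say whether each cap covers only its own tier or that tier plus all longer ones; this sketch assumes the latter, so shorter jobs always find headroom.

```python
from bisect import bisect_left

# Upper time limit (seconds) of each bounded tier, shortest first.
# Anything longer than the last entry falls into the "over 48 hours" tier.
TIER_LIMITS = [30 * 60, 60 * 60, 2 * 3600, 48 * 3600]

# Maximum share of cluster slots (percent) per tier, including the final
# unbounded tier. Only 100 and 90 come from the post; the rest are
# hypothetical placeholders.
TIER_PERCENT = [100, 90, 80, 50, 25]


def tier_for(h_rt: int) -> int:
    """Index of the shortest tier whose limit covers the requested h_rt."""
    return bisect_left(TIER_LIMITS, h_rt)


def slot_cap(tier: int, total_slots: int) -> int:
    """Most slots that jobs in this tier (or any longer tier) may hold."""
    return total_slots * TIER_PERCENT[tier] // 100


def can_start(h_rt: int, slots: int, in_use_by_tier: list[int], total_slots: int) -> bool:
    """Check a new job against every cap that applies to it.

    Assumption: the cap for tier k limits all running jobs in tier k or any
    longer tier combined, so shorter jobs always keep some headroom.
    """
    tier = tier_for(h_rt)
    for k in range(tier + 1):
        used = sum(in_use_by_tier[k:])
        if used + slots > slot_cap(k, total_slots):
            return False
    return True


# 100-slot cluster with 85 slots held by 1-hour jobs.
in_use = [0, 85, 0, 0, 0]
print(can_start(3600, 5, in_use, 100))   # True: 90 <= 90
print(can_start(3600, 6, in_use, 100))   # False: 91 > 90
print(can_start(1800, 10, in_use, 100))  # True: short jobs may use 100%
```

Because each job is checked against every cap at or below its own tier, long jobs can never crowd out the slots reserved for shorter ones, while short jobs can still fill the whole cluster.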
Advanced Imaging Research Center
Oregon Health & Science University
From: users-bounces at gridengine.org [users-bounces at gridengine.org] on behalf of Prentice Bisbal [prentice.bisbal at rutgers.edu]
Sent: Friday, January 16, 2015 12:56 PM
To: users at gridengine.org
Subject: Re: [gridengine users] suggestions on setting up queues
I'd be careful about setting up too many queues. The more complicated you make it, the harder it is for your users to use. I'd start with the following. My apologies if you've already done some of these steps:
0. Find some way to monitor your scheduler's behavior, and figure out what you want to see happen. Without some kind of goals and metrics, how will you know if your changes are working as desired?
1. Require users to specify a wallclock time when running jobs; this is required for step 2. Don't set a default wallclock time. Configure SGE to fail a job immediately if a wallclock time isn't specified. I did this a long time ago, but I've forgotten how. I believe that if you make '-w e' a default option for qsub (e.g. 'qsub -w e ......'), jobs that do not specify h_rt will fail immediately. This will get your users to remember to always set h_rt.
2. Turn on backfill scheduling.
3. Look into fairshare scheduling.
Only after you've taken these three steps would I look into making additional queues.
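Steps 1 and 2 are linked: a backfill scheduler can only let a small job jump ahead if it knows when that job will finish. The Python sketch below models both ideas outside of SGE. `require_h_rt` rejects a `qsub -l` resource list that lacks `h_rt`, and `can_backfill` applies the generic EASY-backfill rule to one waiting job. All function names are hypothetical, time parsing assumes the `[[hours:]minutes:]seconds` form only, and this is not how Grid Engine's scheduler is implemented internally.

```python
def parse_time(value: str) -> int:
    """Convert an h_rt value in [[hours:]minutes:]seconds form to seconds."""
    parts = value.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unrecognised time value: {value!r}")
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + int(part)
    return seconds


def require_h_rt(resources: str) -> int:
    """Return the requested h_rt in seconds, or reject the job.

    `resources` is the comma-separated list given to `qsub -l`,
    e.g. "h_rt=1:30:00,h_vmem=2G".
    """
    for item in resources.split(","):
        name, _, value = item.partition("=")
        if name.strip() == "h_rt" and value.strip():
            return parse_time(value.strip())
    raise ValueError("job rejected: no wall-clock limit (-l h_rt=...) requested")


def can_backfill(now: int, h_rt: int, slots: int, free_now: int,
                 reservation_start: int, spare_at_reservation: int) -> bool:
    """EASY-style backfill test for one waiting job.

    The highest-priority job holds a reservation starting at
    `reservation_start`; `spare_at_reservation` is how many slots will still
    be free then, after the reserved job takes what it needs. A smaller job
    may jump ahead only if it cannot delay that reservation.
    """
    if slots > free_now:
        return False
    finishes_before_reservation = now + h_rt <= reservation_start
    fits_beside_reservation = slots <= spare_at_reservation
    return finishes_before_reservation or fits_beside_reservation


h_rt = require_h_rt("h_rt=1:00:00,h_vmem=2G")    # 3600 seconds
print(can_backfill(0, h_rt, 4, 8, 7200, 0))      # True: done before 7200
print(can_backfill(0, 3 * h_rt, 4, 8, 7200, 0))  # False: would delay it
```

Without a wall-clock limit, `finishes_before_reservation` cannot be computed at all, which is why the h_rt requirement has to come first. A full backfill pass would also subtract each admitted job from `spare_at_reservation` before testing the next one.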
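For step 3, the core of fair-share scheduling is comparing each user's target share with the usage they have actually received, with older usage counting for less. The Python sketch below decays usage exponentially with a half-life and ranks users by how far they fall below their share. It is a simplified model under those assumptions, not Grid Engine's share-tree formula, and the function names and record format are hypothetical.

```python
from math import fsum


def decayed_usage(records, now: float, half_life: float) -> float:
    """Sum past usage, halving the weight of each record every `half_life`.

    `records` is an iterable of (timestamp, amount) pairs, e.g. slot-hours.
    """
    return fsum(amount * 0.5 ** ((now - t) / half_life) for t, amount in records)


def fairshare_order(shares: dict, usage_records: dict, now: float, half_life: float) -> list:
    """Order users from most to least under-served relative to their share."""
    usage = {u: decayed_usage(usage_records.get(u, []), now, half_life) for u in shares}
    total_usage = fsum(usage.values()) or 1.0  # avoid dividing by zero
    total_shares = fsum(shares.values())

    def deficit(user):
        # Positive: the user has received less than their target fraction.
        return shares[user] / total_shares - usage[user] / total_usage

    return sorted(shares, key=deficit, reverse=True)


shares = {"alice": 1, "bob": 1}                    # equal targets
records = {"alice": [(0, 100)], "bob": [(0, 10)]}  # alice used 10x more
print(fairshare_order(shares, records, now=0, half_life=24))  # ['bob', 'alice']
```

The half-life controls how long heavy usage is "remembered": a short one forgives bursts quickly, a long one keeps heavy users at lower priority for longer.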
On 01/16/2015 02:50 PM, Stephen Spencer wrote:
With the number of users on our clusters growing, it's becoming less realistic to say "play fair 'cause you're not the only user of the cluster."
I'm looking for suggestions on setting up queues, both the "why" and "how," that will allow more of our users access to the cluster.
What I'm thinking of is a multi-queue approach:
* some limited number of "interactive" slots (and they'd be time-limited)
* a queue for jobs with short time duration - the "express" queue
* a queue for jobs that will run longer... but only so many of these per user
Any and all suggestions are welcome.
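The three-queue layout proposed above can be modeled as a single admission function. The Python sketch below is only an illustration of the policy, not SGE configuration: the queue names, the 4-hour interactive limit, the 2-hour express limit, and the four-jobs-per-user cap on long jobs are all hypothetical values.

```python
from collections import Counter
from dataclasses import dataclass, field

# All limits and queue names below are hypothetical placeholders.
INTERACTIVE_MAX_SECONDS = 4 * 3600   # interactive sessions are time-limited
EXPRESS_MAX_SECONDS = 2 * 3600       # "express" queue for short jobs
LONG_JOBS_PER_USER = 4               # concurrent long jobs allowed per user


@dataclass
class QueueState:
    interactive_free: int                                   # unused interactive slots
    long_running: Counter = field(default_factory=Counter)  # user -> running long jobs


def admit(state: QueueState, user: str, h_rt: int, interactive: bool):
    """Return the queue a job may start in now, or None if it cannot be admitted.

    Releasing slots when jobs finish is left out to keep the sketch short.
    """
    if interactive:
        if h_rt > INTERACTIVE_MAX_SECONDS or state.interactive_free == 0:
            return None
        state.interactive_free -= 1
        return "interactive"
    if h_rt <= EXPRESS_MAX_SECONDS:
        return "express"
    if state.long_running[user] >= LONG_JOBS_PER_USER:
        return None  # stays pending until one of this user's long jobs ends
    state.long_running[user] += 1
    return "long"


state = QueueState(interactive_free=1)
print([admit(state, "ann", 8 * 3600, False) for _ in range(5)])
# ['long', 'long', 'long', 'long', None]
print(admit(state, "bob", 8 * 3600, False))  # long: the cap is per user
```

Keeping the per-user cap on the long queue only means one user's backlog of long jobs cannot block everyone else, while short work still flows through the express queue.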
spencer at cs.washington.edu