[gridengine users] suggestions on setting up queues

Skylar Thompson skylar2 at u.washington.edu
Tue Jan 20 14:59:38 UTC 2015


Hi Stephen,

Rather than separate queues for different hardware resources, I would
recommend using requestable complexes, unless you really need something
that only a queue can provide (suspend configuration, resubmit, etc.).
You can use urgency values on the complexes to ensure that jobs that depend
on a given resource have a high enough priority to run. Resource reservations
can help too, although note that the overhead of scheduling reservations
can be quite high if you have lots of running jobs.
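
As a rough sketch of the complex-based approach, assuming a hypothetical
consumable named "gpu" (the name, the per-host count of 2, and the urgency
value of 1000 are illustrative, not taken from your cluster), the pieces
would look something like this:

```
# Line added to the complex list via "qconf -mc":
#name  shortcut  type  relop  requestable  consumable  default  urgency
gpu    gpu       INT   <=     YES          YES         0        1000

# Attribute set on each GPU host via "qconf -me <hostname>":
complex_values        gpu=2

# Job submission requesting one GPU; "-R y" asks for a resource reservation:
qsub -l gpu=1 -R y job.sh
```

Reservations are only honored if max_reservation in the scheduler
configuration ("qconf -msconf") is greater than zero.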

On Sun, Jan 18, 2015 at 12:20:36PM -0800, Stephen Spencer wrote:
> Chris (and all who've responded),
> 
> Thank you for the responses. It's given me some directions to explore.
> 
> At present, I have just the default queue on one cluster, and two queues -
> one for the machines with GPUs, and one for machines without GPUs - on the
> other cluster.
> 
> The "fairshare-by-user" policy sounds very interesting; users typically
> only complain when one user is monopolizing the cluster, submitting
> hundreds of jobs, forcing anyone else to join the end of the line with
> their jobs.
> 
> Best,
> Stephen
> 
> On Fri, Jan 16, 2015 at 12:51 PM, Chris Dagdigian <dag at sonsorol.org> wrote:
> 
> >
> > Queues are just a piece of the puzzle when it comes to handling resource
> > allocation on a multi-user system. What (if any) scheduling policies and
> > resource quotas are you currently using?
> >
> > That said, you are using the queue methods in a good way. There are certain
> > things that can really only be done on a per-queue basis, and at the top of
> > the list would be ACL protection and the ability to impose hard or soft
> > wallclock limits.
> >
> > A fairshare-by-user policy with the queue structure you set up would be a
> > decent starting point from which you can gather more data and user feedback.
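> >
> > A minimal sketch of a functional fairshare-by-user setup, assuming every
> > user gets an equal share (the numeric values are illustrative starting
> > points, not tuned recommendations):
> >
> > ```
> > # Global cluster configuration, edited with "qconf -mconf":
> > enforce_user                 auto
> > auto_user_fshare             100
> >
> > # Scheduler configuration, edited with "qconf -msconf":
> > weight_tickets_functional    10000
> > ```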
> >
> > Thoughts
> >
> >  - A resource quota would perfectly handle the "only N jobs per user can
> > run in the long-job.q cluster queue ..." requirement.
> >
> >  - I've had little success putting wallclock limits on interactive queues;
> > there are legit business/scientific reasons in many cases for a long
> > running interactive session. You might want to poll the users or collect
> > data on this. In a few different environments I've had decent success by
> > leaving interactive queue slots unrestricted but putting a resource quota
> > around how many slots a single user can consume. It's also pretty easy to
> > set up tools that would allow you to dynamically adjust the size/count of
> > the interactive slot pool to account for changing demand - it's
> > particularly easy when used with SGE hostgroup objects.
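> >
> > As a sketch, assuming an interactive queue named interactive.q and
> > per-user limits of 10 and 4 slots (all hypothetical values), a resource
> > quota set added with "qconf -arqs" might look like:
> >
> > ```
> > {
> >    name         per_user_slot_limits
> >    description  "Cap slots per user in the long and interactive queues"
> >    enabled      TRUE
> >    limit        users {*} queues long-job.q to slots=10
> >    limit        users {*} queues interactive.q to slots=4
> > }
> > ```
> >
> > Note that this caps slots rather than jobs; the two are the same only
> > when every job uses a single slot.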
> >
> > My $.02
> >
> > Stephen Spencer <spencer at cs.washington.edu> wrote on
> > January 16, 2015 at 2:50 PM:
> >> Good morning.
> >>
> >> With the number of users on our clusters growing, it's becoming less
> >> realistic to say "play fair 'cause you're not the only user of the cluster."
> >>
> >> I'm looking for suggestions on setting up queues, both the "why" and
> >> "how," that will allow more of our users access to the cluster.
> >>
> >> What I'm thinking of is a multi-queue approach:
> >>
> >>   * some limited number of "interactive" slots (and they'd be
> >>     time-limited)
> >>   * a queue for jobs with short time duration - the "express" queue
> >>   * a queue for jobs that will run longer... but only so many of these
> >>     per user
> >>
> >> Any and all suggestions are welcome.
> >>
> >> Thank you!
> >>
> >> Best,
> >> --
> >> Stephen Spencer
> >> spencer at cs.washington.edu
> >> _______________________________________________
> >> users mailing list
> >> users at gridengine.org
> >> https://gridengine.org/mailman/listinfo/users
> >>
> >
> 
> 
> -- 
> Stephen Spencer
> spencer at cs.washington.edu



-- 
-- Skylar Thompson (skylar2 at u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine


