[gridengine users] Functional share policy question
beckerjes at mail.nih.gov
Tue Nov 27 21:43:16 UTC 2012
On Tue, Nov 27, 2012 at 04:27:49PM -0500, Allan Tran wrote:
>I'm running SGE 8.1.2 from https://arc.liv.ac.uk/trac/SGE. Things are running fine by default. Now I need to set up a share policy, but I'm not sure of the best way to approach this.
>The scenario is that we have different groups of users, and I need to give each of them a defined share of resources (slots) so that at any given time each group has a guaranteed number of slots.
>Say I have 120 slots (10 x 12 core procs) and 5 groups; math 20% (or 2 nodes), chem 50% (5), cs 10%, bio 10% (1) and other 10% (1).
>1. If the cluster is idle, any user in any group can get whatever they ask for.
>2. If the cluster is busy with math users running on all 120 slots and a chem user then needs 50 slots, 50 slots' worth of math jobs will be suspended to allow the chem user to run. If any other group then needs to run, more math jobs will be suspended, but math will be guaranteed at least 20 slots.
>Does that make sense?
>I was thinking of enabling the functional share policy and setting it up by following these instructions (http://docs.oracle.com/cd/E19080-01/n1.grid.eng6/817-5677/i999885/index.html)
>However, I'm not quite clear on how the number of functional tickets translates to SGE slots. Will jobs be suspended or resumed by default with this setup? Or does it even do what I'm after here?
>Thanks for your response and advice.
Functional shares alone won't suspend any jobs. They are used for
scheduling jobs, to try to balance job distribution as best they can
according to the ticket policy you've set.
With 120 total slots in a single queue, and assuming sufficient jobs from
each "group," SGE will try to allocate 24 SLOTS to math, 60 SLOTS to chem,
12 to CS, 12 to Bio, and 12 to "other." Note that I said "slots" and
not "nodes." Unless there's a good reason to not "mix" jobs from
different groups on the same node, don't try to segregate things.
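The slot numbers above are just the percentages applied to 120 slots.
Here is a minimal Python sketch of that arithmetic. It is not how the
SGE scheduler is implemented; the function name and the optional
"active" argument are hypothetical, and renormalizing over only the
groups that currently have jobs is a simplifying assumption about how
unused shares get redistributed.

```python
def functional_targets(total_slots, shares, active=None):
    """Split total_slots in proportion to each group's functional share.

    shares: mapping of group name -> share (any positive numbers).
    active: optional set of groups that currently have jobs; groups
            outside it get 0 and their share is spread over the rest
            (an assumption made for illustration only).
    """
    if active is None:
        active = set(shares)
    total_share = sum(s for g, s in shares.items() if g in active)
    return {
        g: (total_slots * s / total_share if g in active else 0.0)
        for g, s in shares.items()
    }


shares = {"math": 20, "chem": 50, "cs": 10, "bio": 10, "other": 10}

# Every group has enough jobs: the split quoted in the message.
targets = functional_targets(120, shares)
for group, slots in targets.items():
    print(f"{group:6s}{slots:6.1f} slots")

# Only math and chem have jobs: the other groups' share is reused.
busy = functional_targets(120, shares, active={"math", "chem"})
```

With all five groups busy this gives 24/60/12/12/12. These are targets
the scheduler aims for when dispatching, not hard reservations.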
As noted, functional shares won't suspend any jobs; they deal only with
scheduling and dispatch. You can suspend jobs by other means, though,
including suspend thresholds and subordinate queues.
Incidentally, the "share tree" works basically the same way as
functional shares, except that it takes past usage into account.
Functional shares *only* look at the current state of the queues *right
now*. This may or may not be appropriate for your circumstances.
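To make the difference concrete, here is a toy Python comparison. It is
only an illustration: the scoring formula (share fraction minus fraction
of past usage) and the usage numbers are made up for this example and
are not SGE's actual share tree calculation.

```python
def functional_score(shares):
    """Score from configured shares only; past usage is ignored."""
    total = sum(shares.values())
    return {g: s / total for g, s in shares.items()}


def share_tree_score(shares, past_usage):
    """Toy model: share fraction minus the fraction of past usage.

    A positive score means the group has used less than its share and
    should be favored; a negative score means it has used more.
    """
    total_share = sum(shares.values())
    total_usage = sum(past_usage.values()) or 1.0
    return {
        g: shares[g] / total_share - past_usage.get(g, 0.0) / total_usage
        for g in shares
    }


shares = {"math": 20, "chem": 50, "cs": 10, "bio": 10, "other": 10}
# Hypothetical history: math has consumed most of the cluster recently.
past_usage = {"math": 900.0, "chem": 50.0, "cs": 50.0, "bio": 0.0, "other": 0.0}

func = functional_score(shares)
tree = share_tree_score(shares, past_usage)

for g in shares:
    print(f"{g:6s} functional={func[g]:.2f}  share-tree(toy)={tree[g]:+.2f}")
```

In the functional view math keeps its 20% regardless of history, while
in the toy share tree view math drops to the bottom until the other
groups catch up.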
You might want to look into "resource quotas" as well, to keep a given
group from taking over the cluster.
NHGRI Linux support (Digicon Contractor)