[gridengine users] Functional share policy question
reuti at staff.uni-marburg.de
Tue Nov 27 22:08:02 UTC 2012
On 27.11.2012 at 22:43, Jesse Becker wrote:
> On Tue, Nov 27, 2012 at 04:27:49PM -0500, Allan Tran wrote:
>> I'm running SGE 8.1.2 from https://arc.liv.ac.uk/trac/SGE. Things are running fine by default. Now I need to set up a share policy, but I'm not sure how best to approach this.
>> The scenario is that we have different groups of users, and I need to give each of them a defined share of resources (slots) so that at any given time each group has a guaranteed number of slots.
>> Say I have 120 slots (10 x 12 core procs) and 5 groups; math 20% (or 2 nodes), chem 50% (5), cs 10%, bio 10% (1) and other 10% (1).
>> 1. If the cluster is idle, any user in any group can get whatever they ask for.
>> 2. If the cluster is busy with math users occupying all 120 slots and a chem user then needs 50 slots, 50 slots' worth of math jobs will be suspended to let the chem jobs run. If any other group then needs to run, more math jobs will be suspended, but math is guaranteed to keep at least 20 slots.
>> Does it make sense?
>> I was thinking of enabling the functional share policy and setting it up following these instructions (http://docs.oracle.com/cd/E19080-01/n1.grid.eng6/817-5677/i999885/index.html)
>> However, I'm not quite clear on how the number of functional tickets translates to SGE slots. Will jobs be suspended or resumed by default with this setup? Or does it even do what I'm after here?
>> Thanks for your response and advice.
> Functional shares (alone) won't suspend any jobs. They are used for
> scheduling jobs, to try to balance job distribution as best they can
> according to the ticket policy you've set.
Yes, once a job is allowed to start, it is allowed to run to completion. There is nothing in SGE to reschedule or suspend a job according to the tickets.
One small addition: a functional policy can also change the "nice" value of running jobs to achieve a certain distribution (which only has an effect if nodes are oversubscribed, of course). The relevant settings are "reprioritize_interval" in the scheduler configuration and, for now, additionally "reprioritize" in the SGE configuration (see `man sge_priority`).
> With 120 total slots in a single queue, and assuming sufficient jobs from
> each "group," SGE will try to allocate 24 SLOTS to math, 60 SLOTS to chem,
> 12 to CS, 12 to Bio, and 12 to "other." Note that I said "slots" and
> not "nodes." Unless there's a good reason to not "mix" jobs from
> different groups on the same node, don't try to segregate things.
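The slot numbers quoted above are plain proportional arithmetic. A minimal Python sketch of it follows, using the group names and percentages from the original question. The function name `target_slots` is only illustrative. This is not how the scheduler computes tickets internally, and the results are targets SGE aims for, not guarantees.

```python
# Illustrative arithmetic only: maps share percentages to target slot counts.
# This is not SGE's internal ticket calculation.
TOTAL_SLOTS = 120  # 10 nodes x 12 cores, from the original question

# Share percentages per group, as given in the question.
shares = {"math": 20, "chem": 50, "cs": 10, "bio": 10, "other": 10}


def target_slots(shares, total_slots):
    """Return each group's proportional slot target (integer division)."""
    total_shares = sum(shares.values())
    return {
        group: total_slots * share // total_shares
        for group, share in shares.items()
    }


targets = target_slots(shares, TOTAL_SLOTS)
for group, slots in targets.items():
    print(f"{group}: {slots} slots")
```

With these inputs the targets are 24, 60, 12, 12 and 12 slots, and they only apply while every group has enough pending jobs to fill its share.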
> Functional shares also won't inherently suspend any jobs; it deals with
> scheduling and dispatch. You can suspend jobs via other means though,
> including load threshold and subordinate queues.
> Incidentally, the "share tree" works basically the same way as
> functional shares, except that it takes past usage into account.
> Functional shares *only* look at current state of the queues *right
> now*. This may, or may not be appropriate for your circumstance.
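To make the difference concrete, here is a toy Python model. It is emphatically not SGE's real ticket formula: the damping rule and both function names are assumptions chosen only to show the idea that a history-aware policy lowers the weight of a group that has already consumed more than its share, while a functional policy ignores history.

```python
# Toy model only. The formulas are assumptions for illustration and do not
# reproduce SGE's functional or share-tree ticket calculations.

def functional_weights(shares):
    """Weights depend only on the configured shares (no history)."""
    total = sum(shares.values())
    return {group: share / total for group, share in shares.items()}


def history_aware_weights(shares, past_usage):
    """Damp each group's weight by its past usage relative to its target."""
    target = functional_weights(shares)
    total_usage = sum(past_usage.values())
    raw = {}
    for group, t in target.items():
        used = past_usage.get(group, 0.0) / total_usage if total_usage else 0.0
        # The more a group has used relative to its target, the lower its weight.
        raw[group] = t / (1.0 + used / t)
    norm = sum(raw.values())
    return {group: value / norm for group, value in raw.items()}


shares = {"math": 20, "chem": 50, "cs": 10, "bio": 10, "other": 10}
# Hypothetical history: math has consumed all of the recent usage.
past_usage = {"math": 1000.0, "chem": 0.0, "cs": 0.0, "bio": 0.0, "other": 0.0}

print(functional_weights(shares)["math"])
print(history_aware_weights(shares, past_usage)["math"])
```

In this toy, math's history-aware weight falls below its functional weight because it used everything recently. With no recorded usage the two functions return the same weights.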
> You might want to look into "resource quotas" as well, to keep a given
> group from taking over the cluster.
> Jesse Becker
> NHGRI Linux support (Digicon Contractor)