[gridengine users] Minimum allocation for the duration of an array job

Stephen Willey stephen at esstec.co.uk
Mon Sep 26 10:00:02 UTC 2011


Interesting idea, but...

Let's consider a farm of 24 slots with 5 users trying to get 8 slots each:

limit users !A to slots=16
limit users !B to slots=16
limit users !C to slots=16

That means D and/or E could submit a job and get up to 16 slots.  The
rules would have to be sequentially created as follows:

limit users !A to slots=16
limit users !A and !B to slots=8
limit users !A and !B and !C to slots=0

And then of course when A finishes before B you'd have to rejig all
the rules, so I don't think that'd work in practice.
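
To make the rejigging problem concrete, here is a minimal Python sketch that generates the sequential rules above from the list of users who currently hold slots. The function name and arguments are made up for illustration, and the "!A and !B" form is simply copied from the rules above, not checked against real RQS syntax:

```python
def cascading_limits(total_slots, slots_per_user, running_users):
    """Build one limit line per reserved user, each excluding one more user.

    The "!A and !B" notation mirrors the message above; it is not
    verified Grid Engine RQS syntax.
    """
    lines = []
    for i in range(len(running_users)):
        # Exclude the first i+1 users from this rule.
        excluded = " and ".join("!" + u for u in running_users[: i + 1])
        # Everyone else shares whatever those users have not reserved.
        remaining = max(0, total_slots - slots_per_user * (i + 1))
        lines.append(f"limit users {excluded} to slots={remaining}")
    return lines


if __name__ == "__main__":
    for line in cascading_limits(24, 8, ["A", "B", "C"]):
        print(line)
```

Calling it again with ["B", "C"] once A has finished produces a different rule on every line, which is exactly the churn that makes this approach awkward.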

The second part though, where you grab a whole machine... that might work...

limit users !A hosts box1 to slots=0
limit users !B hosts box2 to slots=0
limit users !C hosts box3 to slots=0

and then remove once the job's done.
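
A rough sketch of how such a per-host lock could be written out as a rule set, assuming the usual name/description/enabled/limit layout of a resource quota set. The "lock_<host>" naming scheme and the function itself are hypothetical:

```python
def host_lock_rqs(user, host):
    """Return (rule set name, rule set text) that keeps everyone but `user` off `host`.

    The "lock_<host>" name is an illustrative convention, not something
    Grid Engine requires.
    """
    name = f"lock_{host}"
    text = "\n".join([
        "{",
        f"   name         {name}",
        f'   description  "{host} held for {user}"',
        "   enabled      TRUE",
        # Same limit line as in the message above.
        f"   limit        users !{user} hosts {host} to slots=0",
        "}",
    ])
    return name, text


if __name__ == "__main__":
    for user, host in [("A", "box1"), ("B", "box2"), ("C", "box3")]:
        name, text = host_lock_rqs(user, host)
        print(text)
```

The text could then presumably be saved to a file and loaded with qconf -Arqs, and removed by name with qconf -drqs once the job is done; check both options against the qconf man page for your version.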

I guess that'd work.  I was just hoping for something a little less hacky :-)
Thanks for the help Reuti,

Stephen



On Mon, Sep 26, 2011 at 10:34 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
> On 24.09.2011 at 14:49, Stephen Willey wrote:
>
>>>> An example:
>>>>
>>>> 10 machines with X slots each
>>>> 15 users (a,b,c,d,...o) submit array jobs with 1000 tasks, each of
>>>> which requires X slots.
>>>
>>> Is this X not the same X as above? Or are you always using machines
>>> exclusively, one per array task?
>>
>> In this particular case, the machines are being used exclusively but
>> that was really just to give a simple example.  It's definitely not
>> always the case.
>
> I could think of the following implementation, but it needs an external cron job that checks the jobs actually running. When it discovers that user A has an array job running which uses 8 slots out of the 216 in the cluster for each instance, it will create a new RQS which reads:
>
> limit users !A to slots=208
>
> As only the first matching rule in an RQS is applied, you will need one RQS per user (not many lines in one and the same RQS). It could also be extended to include a particular host, e.g. host Z having 12 cores:
>
> limit users !A hosts Z to slots=4
>
> Unfortunately no "jobs" rule is implemented, which would allow this to be done even better.
>
> -- Reuti
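
The arithmetic behind the two rules Reuti quotes can be sketched as follows. This only builds the limit lines, one rule set per user as he suggests. The "reserve_..." names and the function signature are assumptions for illustration, and detecting the running array jobs (the cron job part) is left out:

```python
def reservation_rules(user, slots_per_task, cluster_slots, host_cores=None):
    """Map a hypothetical rule set name to the limit line it would contain.

    One entry per rule set, since each user needs a rule set of their own.
    """
    rules = {
        # Everyone except `user` may use the cluster minus one task's worth of slots.
        f"reserve_{user}": (
            f"limit users !{user} to slots={cluster_slots - slots_per_task}"
        )
    }
    # Optional per-host variant: leave one task's worth of cores free on each host.
    for host, cores in (host_cores or {}).items():
        rules[f"reserve_{user}_{host}"] = (
            f"limit users !{user} hosts {host} to slots={cores - slots_per_task}"
        )
    return rules


if __name__ == "__main__":
    for name, limit in reservation_rules("A", 8, 216, {"Z": 12}).items():
        print(name, "->", limit)
```

With the numbers from the quoted message (8 slots per task, 216 slots in the cluster, 12 cores on host Z) this yields slots=208 and slots=4.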



-- 
Stephen

http://lensframephoto.com


