[gridengine users] A couple of questions...

Jesse Becker beckerjes at mail.nih.gov
Wed Jun 29 13:34:18 UTC 2011


On Wed, Jun 29, 2011 at 09:01:11AM -0400, Vic wrote:
>
>> Can you alter your bsub -> qsub translator to include a '-P <project>'
>> string?  That way you can use an RQS to throttle the number of jobs
>> running at a time.
>
>That's the sort of thing I'm thinking about - but unfortunately, we have
>several users (who might be running several compilations each), so to use
>that sort of throttling would require some nifty scripting to pick up a
>new project per run; I'm not sure how to create that uniqueness yet. But
>this is the sort of ugliness I'm toying with... :-)

You can set limits per user as well via RQS.
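
As a rough sketch (the rule name and the figure of 20 slots are made up
for illustration, not taken from any real setup), a per-user rule could
be written to a file and loaded with 'qconf -Arqs'.  The braces in
'{*}' make the limit apply to each user individually rather than to all
users added together:

```shell
# Hypothetical rule: cap every individual user at 20 slots.
rqs_file=$(mktemp)
cat > "$rqs_file" <<'EOF'
{
   name         per_user_slots
   description  "Cap the slots any single user can occupy"
   enabled      TRUE
   limit        users {*} to slots=20
}
EOF

# Load the rule only where the SGE tools exist; otherwise just show it.
if command -v qconf >/dev/null 2>&1; then
    qconf -Arqs "$rqs_file"
else
    cat "$rqs_file"
fi
```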

Also, there is a 'max_u_jobs' setting (qconf -sconf) that limits the
number of active (not yet finished) jobs a user can have in the system
at a time, and a scheduler-side 'maxujobs' (qconf -ssconf) that limits
how many of them can be running.  These are separate restrictions from
RQS (you can use them at the same time, and the lower number should
apply).
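
To see what is currently set, something along these lines should do.
The function name is just a placeholder; 'maxujobs' is the
scheduler-side counterpart shown by 'qconf -ssconf':

```shell
# Print the per-user job limits, if the SGE tools are installed here.
show_user_job_limits() {
    if ! command -v qconf >/dev/null 2>&1; then
        echo "qconf not found" >&2
        return 1
    fi
    qconf -sconf  | grep -w max_u_jobs   # global cluster configuration
    qconf -ssconf | grep -w maxujobs     # scheduler configuration
}

show_user_job_limits || true
```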

Another advantage to using a project for these jobs is to get accounting
information, even if you don't use a resource quota to limit the number
of them running.  Projects are also taken into account by the scheduler
if you use either functional shares or a share tree configuration.
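
For instance, assuming a project called 'quartus' already exists (it
could be created with 'qconf -aprj') and the translator is what builds
the qsub command line, the change might be as small as the sketch
below.  The project name, job name, script name and helper function are
all placeholders:

```shell
# Hypothetical helper: build the qsub command line with a project attached.
PROJECT=${PROJECT:-quartus}

build_qsub_cmd() {
    # "$@" = whatever arguments the translator would have passed to qsub
    printf 'qsub -P %s' "$PROJECT"
    printf ' %s' "$@"
    printf '\n'
}

build_qsub_cmd -N fit_step compile.sh
# Accounting for the project can later be pulled with: qacct -P quartus
```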


>
>> You also mentioned that this was "well into the iterator".  If you use
>> qsub, you essentially dump all of the jobs into the queue, and let SGE
>> deal with them when it can.  You could also use qrsh or "qsub -sync y".
>> These will both block until the job is actually complete.
>
>I can't easily change the invocation sequence; Quartus is separating the
>jobs & dispatching each one to the grid. So if I let them all go without
>dependency settings, a couple of runs will completely swamp the grid for
>several hours, and no-one else can run anything. If I limit each user,
>that user becomes idle as soon as he's committed a run to the grid (won't
>be able to run anything else).

They can still *submit* as many jobs as they want (up to max_u_jobs,
max_jobs, and similar limits); SGE accepts them all, but only a certain
number of them will run at a given time.

>
>It isn't pretty, is it?

Something similar happens here at $day_job, where a single user can swamp the
cluster, given the chance.  We address this in several ways:

1) Set a global per-user limit on the total number of slots a user can
occupy at a time (we use about 60%)

2) Limit specific groups/projects further, as needed.

3) We encourage the users to submit "more, smaller" jobs to SGE when
possible.  This creates a higher job throughput, and a faster "churn"
rate, which leads to a more fair distribution of CPU cycles (we mostly
use functional shares for this).
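
Points 1 and 2 could be combined in one rule set, roughly like this.
Every number and name here is invented for the example (a pretend
200-slot cluster, a 'quartus' project capped at 40 slots).  Within a
rule set the first matching limit wins, so the narrower project limit
goes first:

```shell
# Assumed numbers, purely for illustration.
TOTAL_SLOTS=200      # pretend cluster size
USER_PCT=60          # per-user share of the cluster, in percent
PROJECT=quartus      # hypothetical project to restrict further
PROJECT_SLOTS=40     # hypothetical cap for that project

user_slots=$(( TOTAL_SLOTS * USER_PCT / 100 ))

# First matching limit applies: project jobs hit the project cap,
# everything else falls through to the per-user cap.
rqs_text=$(cat <<EOF
{
   name         fair_use
   description  "Per-user cap plus a tighter cap for one project"
   enabled      TRUE
   limit        projects $PROJECT to slots=$PROJECT_SLOTS
   limit        users {*} to slots=$user_slots
}
EOF
)
printf '%s\n' "$rqs_text"
```

The printed text can be saved to a file and loaded with 'qconf -Arqs'.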


Another option would be to push all of these Quartus jobs into a
specific queue, and make it subordinate to a different queue used for
non-Quartus jobs (project restrictions can help here too).
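
A sketch of that, with invented queue names ('main.q' for everyone
else, 'quartus.q' for the Quartus jobs).  The subordinate_list is set
on the superordinate queue and names the queue to suspend; a threshold
of 1 means quartus.q on a host is suspended as soon as one main.q slot
is busy there.  This only prints the command so it can be checked
before running it by hand:

```shell
# Hypothetical queue names and threshold.
SUPER_Q=main.q       # queue for non-Quartus work
SUB_Q=quartus.q      # queue that gets suspended
THRESHOLD=1          # busy main.q slots per host that trigger suspension

cmd="qconf -mattr queue subordinate_list $SUB_Q=$THRESHOLD $SUPER_Q"
echo "$cmd"
```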


-- 
Jesse Becker
NHGRI Linux support (Digicon Contractor)


