[gridengine users] A couple of questions...

Jesse Becker beckerjes at mail.nih.gov
Wed Jun 29 14:02:04 UTC 2011

On Wed, Jun 29, 2011 at 09:43:33AM -0400, Vic wrote:
>> You can set limits per user as well via RQS.
>Yes, but I don't want to set limits per user; I want to set limits *per
>Each user must be able to submit jobs after his Quartus jobs have been
>queued, and have those jobs start immediately (subject to resource, of
>course). Additionally, users might sometimes need to run multiple Quartus
>runs and have those run independently of each other - so, for example, we
>might have one user running two instances, with four jobs executing from
>each instance and 12 jobs from each queuing.

Sure, that makes sense.  I don't think your situation is all that rare.

>> They can *submit* as many jobs as they want, in that they are sent to
>> SGE (up to the limits set by max_u_jobs, max_jobs, and other limits),
>> but only a certain number of them will run at a given time.
>Indeed - but if I limit the total number of jobs a user can run to a
>sensible level, then that user can't do any work after submitting a
>Quartus run. We might as well send him home...

Not quite.  Create two queues:  one called "Quart" for Quartus jobs and
one called "Gallon" for non-Quartus.  Both queues exist on all hosts.
Make Quart subordinate to Gallon.  Set the scheduler to use the lowest
loaded systems first.

Thus, you can run as many Quartus jobs as you want, but when someone
wants to run something else, it will go to the Gallon queue, and suspend
the Quartus jobs if it needs to.
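
As a rough sketch of the settings involved (not a tested config): the
subordinate relationship lives in the Gallon queue's configuration, and
the "least loaded first" behaviour lives in the scheduler configuration.
The threshold of 1 is my assumption here; pick whatever suits your slot
counts. Edit these with "qconf -mq Gallon" and "qconf -msconf".

```
# Excerpt from the Gallon queue configuration (qconf -sq Gallon)
qname              Gallon
subordinate_list   Quart=1

# Excerpt from the scheduler configuration (qconf -ssconf)
queue_sort_method  load
load_formula       np_load_avg
```

With "Quart=1", the Quart queue instance on a host is suspended as soon
as one Gallon slot on that host is in use.  If you leave the threshold
off, Quart is only suspended once all of Gallon's slots on that host
are full.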

Or you could buy more compute nodes.

>> 3) We encourage the users to submit "more, smaller" jobs to SGE when
>> possible.  This creates a higher job throughput, and a faster "churn"
>> rate, which leads to a more fair distribution of CPU cycles (we mostly
>> use functional shares for this).
>This isn't something over which I have any control; we're using commercial
>tools, and they submit jobs that are the size they are. Typically, we
>expect 4hrs+ per job, but 24 hours isn't unknown.

Sure, we've the same issue here as well.  Not all jobs are "short," but
we try.

Jesse Becker
NHGRI Linux support (Digicon Contractor)
