[gridengine users] RQS Help

Ray Spence r3spence at gmail.com
Wed Jun 27 21:46:20 UTC 2012


Hi Reuti,

I think I'm coming to understand how SGE must be configured to restrict
job memory usage. Our goal is to have one common queue with no
memory/slots limits and one higher-priority queue with memory and slots
limits (h_vmem=128G, slots=32). My understanding is that the only way to
do this is to make h_vmem and slots globally (?) consumable via qconf -mc.
Once I do that, I must set a default limit in the next column.
(I found that if I left the h_vmem default at 0, all jobs got killed..)
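
For reference, the h_vmem line from qconf -mc currently looks roughly like
this (quoting from memory, so the spacing is approximate):

   h_vmem     h_vmem     MEMORY     <=    YES    YES    248g    0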

So, how can I override that global (?) default on my higher-priority
queue? Can I do this in the queue config via qconf -mq? I've tried
setting the second column of the h_vmem line to "2G", but that doesn't
seem to work..
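
For concreteness, the h_vmem line in qconf -mq high.q now reads roughly
like this (just a sketch from memory):

   h_vmem                2G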

Thanks again,
Ray


On Tue, Jun 26, 2012 at 1:59 PM, Reuti <reuti at staff.uni-marburg.de> wrote:

> Hi,
>
> Am 26.06.2012 um 20:57 schrieb Ray Spence:
>
> > Back on the list. Please see below -
> >
> > On Tue, Jun 26, 2012 at 11:41 AM, Reuti <reuti at staff.uni-marburg.de>
> wrote:
> > Am 26.06.2012 um 20:30 schrieb Ray Spence:
> >
> > > Reuti,
> > >
> > > Thank you so much for making RQS/h_vmem clear.
> > >
> > > I hope I'm not taking advantage of you here - I apologize if so.
> >
> > No, but please ask on the list or register. Therefore I didn't forward
> your last posting, as it was sent from an unknown address.
> >
> >
> > > I have another question regarding slots. Our cluster has 4 nodes each
> with 32 cores. My
> > > assumption is that SGE should be able to run 128 total jobs at any
> time. I see only
> > > 1 job running per node with many jobs in qw. I think I need to change
> the "slots" value
> > > in the queue config? Here is what I have (still early on in learning
> SGE config..)
> > > from qconf -sq <queue>
> > >
> > > slots                1,[scf-sm00.Stat.Berkeley.EDU=32], \
> > >                       [scf-sm01.Stat.Berkeley.EDU=32], \
> > >                       [scf-sm02.Stat.Berkeley.EDU=32], \
> > >                       [scf-sm03.Stat.Berkeley.EDU=32]
> >
> > It's a matter of taste: the above is correct. If you have identical
> nodes, you can even shorten it to:
> >
> > slots      32
> >
> > Great, we do - I'll try that!
> >
> >
> > It's the number of slots per queue instance.
> >
> >
> > > Should the "1" be 32? 128? Or where do I tell SGE to use all 32 cores?
> >
> > As the default memory consumption is 248g, only one job can run at a
> time I would say.
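> >
> > (Each host offers h_vmem=248g and a job without an explicit request
> > consumes the full 248g default, so 248g / 248g = one job per node, just
> > to spell out the arithmetic.)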
> >
> > Ok, this is not what we want at all. It is more important to use all 32
> cores/node than to attempt any ram usage control. I'm going to back out
> the h_vmem complex setting in order to run 128 jobs at a time. Should I
> reset the h_vmem complex back to not consumable (NO)
>
> If you want to control the usage of memory and avoid oversubscription of
> it, it needs to stay consumable.
>
>
> > or keep it consumable but set its default to, say, 1G? If I do that,
> won't users have to request a higher h_vmem amount upon job submission?
>
> Sure, if they want to use more than 1G they have to request more. They
> have to predict what they need. There is no crystal ball inside SGE which
> could look ahead to predict the necessary memory for a job.
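>
> A request for, say, 8 GB would then look something like this (job.sh just
> stands for whatever the actual script is):
>
>    qsub -l h_vmem=8G job.sh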
>
> -- Reuti
>
>
> >
> > Try to submit jobs with "sleep 120" or so for which you requested less
> memory on the command line. The actual memory in use can be checked with:
> >
> > qhost -F h_vmem
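> >
> > Such a test submission could e.g. look like this (just a sketch; any
> > request below the limit will do):
> >
> >    qsub -b y -l h_vmem=1G sleep 120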
> >
> >
> > -- Reuti
> >
> >
> > > Thank you again,
> > > Ray
> > >
> > >
> > > On Tue, Jun 26, 2012 at 11:14 AM, Reuti <reuti at staff.uni-marburg.de>
> wrote:
> > > Am 26.06.2012 um 19:42 schrieb Ray Spence:
> > >
> > > > Hi Reuti,
> > > >
> > > > I'll respond in-line:
> > > >
> > > > On Mon, Jun 25, 2012 at 4:21 PM, Reuti <reuti at staff.uni-marburg.de>
> wrote:
> > > > Hi,
> > > >
> > > > Am 26.06.2012 um 00:57 schrieb Ray Spence:
> > > >
> > > > > I apologize for more questions, but I'm not getting to where our
> group wants our new cluster to be. In order to limit all of a given user's
> jobs in a specified queue to a total amount of physical ram (h_vmem), I see
> no other solution than an RQS. Is this true?
> > > >
> > > > Correct. h_vmem is a hard limit, while others prefer virtual_free as
> guidance for SGE; the latter is not enforced:
> > > >
> > > >
> http://www.gridengine.info/2009/12/01/adding-memory-requirement-awareness-to-the-scheduler/
> > > >
> > > >
> > > > I've read this info from you. When you say "Use the one you defined
> in your qsub command by requesting it with the -l option..." I take you to
> mean that once I've made a given memory complex (h_vmem, virtual_free,
> etc.) consumable (qconf -mc), then in order to enforce any limit on that
> complex users must request a numeric value for it at job submission. I
> think I'm repeating myself here.. Your info here is what led me to pose
> my question in the first place.
> > > >
> > > >
> > > > > Using qconf -mq <queue> will limit each job in <queue> but not
> each user's total
> > > > > memory footprint across all his jobs, correct?
> > > >
> > > > Correct.
> > > >
> > > >
> > > > > The node-level limit does not do what
> > > > > we want here..
> > > >
> > > > Correct, it's the memory usage across all queues, and thus all jobs,
> on a node.
> > > >
> > > >
> > > > > I have this RQS in place:
> > > > >
> > > > > {
> > > > >    name         high.q-h_vmem
> > > > >    description  "high.q h_vmem limited to 128G"
> > > >
> > > > The quotation marks are not necessary.
> > > >
> > > >
> > > > >    enabled      TRUE
> > > > >    limit        users {*} queues high.q to h_vmem=128g
> > > > > }
> > > >
> > > > You made h_vmem consumable and attached a value per exechost?
> > > >
> > > > Yes - via qconf -mc, here is what the h_vmem line looks like:
> > > >
> > > > h_vmem              h_vmem     MEMORY      <=    YES         YES    248g     0
> > >
> > > It should be set to a default you expect a typical job to consume. We
> set it to 2g here, and users can increase the per-job limit up to the one
> set in the queue definition.
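> > >
> > > In other words, something along these lines for the complex (just a
> > > sketch with the 2g default):
> > >
> > > h_vmem     h_vmem     MEMORY     <=    YES    YES    2g    0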
> > >
> > >
> > > > (should the "default" value here be different from 248? see below..
> Must it be 0? Must it NOT be 0?)
> > > >
> > > > and via qconf -me I've set h_vmem to be a little less (248G) than
> the installed ram (256G)
> > > > on each of the cluster's 4 nodes:
> > > >
> > > > qconf -se <cluster_node>
> > > > hostname              <>
> > > > load_scaling          NONE
> > > > complex_values        slots=32,h_vmem=248G
> > > > .....
> > > >
> > > >
> > > >
> > > > > which would seem to accomplish our goal. However, jobs submitted
> to high.q under this RQS without stating any h_vmem need at submission,
> but which are written to exceed the memory limit, do in fact exceed it.
> > > >
> > > > Correct, the RQS will check the job request for h_vmem, but there is
> no relation back, i.e. the RQS will not limit the job's actual memory
> usage. Specifying only an overall limit per user would even make it hard
> for the RQS to decide what per-job limit to set at all. Or if the overall
> limit is exceeded: which job should be killed?
> > > >
> > > >
> > > > > Worse, jobs submitted to high.q with an h_vmem need set below the
> RQS limit but which are written to exceed the limit successfully gobble up a
> > > > > forbidden amount of ram.
> > > >
> > > > I don't get this sentence. Can you make an example?
> > > >
> > > > I have a simple shell script that runs the Linux tool stress, which
> asks the system for some amount of ram; here is the relevant line:
> > > >
> > > > /usr/bin/stress -v --cpu 1 --io 2 --vm 1 --vm-bytes 150G --vm-hang 0
> > > >
> > > > which ramps up to occupy 150GB by reading and dirtying ram. The
> --vm-hang 0 part tells stress to simply stop and hang around indefinitely
> once it has occupied 150GB. This script succeeds if I do not state an
> h_vmem request at job submission
> > >
> > > ...as the default is 248g
> > >
> > > > or if I ask for h_vmem under
> > > > the RQS limit of 128G.
> > >
> > > NB: g = base 1000, G = base 1024 (man sge_types)
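> > >
> > > (So a limit of 128G means 128 * 1024^3 = 137,438,953,472 bytes, while
> > > 128g means 128 * 1000^3 = 128,000,000,000 bytes.)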
> > >
> > >
> > > > It seems if RQS is satisfied upon job submission
> > >
> > > No, at job start.
> > >
> > >
> > > > then it does not
> > > > monitor ram usage once a job is running
> > >
> > > RQS will never monitor running jobs.
> > >
> > >
> > > > - you say as much in this response.
> > >
> > > You can check with:
> > >
> > > $ ulimit -aH
> > > $ ulimit -aS
> > >
> > > what was set by SGE for the limits. In addition, SGE's execd (not the
> RQS) will monitor the usage against what was requested by -l h_vmem=... or
> set by the default in the complex definition (man queue_conf, section
> RESOURCE LIMITS). This is done by the execd, which doesn't know anything
> about other users' jobs on other nodes.
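> > >
> > > A quick way to see the limits in action is a throwaway job script
> > > along these lines (just a sketch):
> > >
> > > #!/bin/sh
> > > # print the hard and soft limits the execd applied to this job
> > > ulimit -aH
> > > ulimit -aS
> > > # then idle for a while so the job can be inspected with qstat/qhost
> > > sleep 120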
> > >
> > >
> > > > > Regarding ram usage: I have tested and read enough on RQS and the
> various ways to configure SGE to conclude that RQS doesn't actually monitor
> ram usage once a job has been submitted?
> > > >
> > > > It will monitor the requested RAM to decide whether any submitted
> job is eligible to start. The requests of all running jobs, added up,
> should never exceed the h_vmem limit.
> > > >
> > > > But here (second sentence) you imply that, with an h_vmem value in an
> RQS, SGE does indeed monitor a user's running jobs to see if the
> cumulative ram usage exceeds the RQS
> > >
> > > Not the usage. It's a consumable, so it will just add up all h_vmem
> requests at the time a job starts and allow it to run or not.
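> > >
> > > E.g., just to illustrate the bookkeeping: with the 128g RQS limit, a
> > > user with two running jobs that requested 60g each could start a third
> > > job only if it requests 8g or less; what the jobs actually use plays
> > > no role.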
> > >
> > > -- Reuti
> > >
> > >
> > > > h_vmem limit? Is this true, and also that SGE will not kill any job
> to get a user's ram footprint back below the RQS limit? Is the monitoring
> used only to determine whether a submitted job may run, i.e. whether that
> job's h_vmem request plus the user's current ram usage are together below
> the RQS limit?
> > >
> > >
> > >
> >
> >
>
>