[gridengine users] RQS Help

Reuti reuti at staff.uni-marburg.de
Tue Jun 26 20:59:02 UTC 2012


Hi,

Am 26.06.2012 um 20:57 schrieb Ray Spence:

> Back on the list. Please see below - 
> 
> On Tue, Jun 26, 2012 at 11:41 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
> Am 26.06.2012 um 20:30 schrieb Ray Spence:
> 
> > Reuti,
> >
> > Thank you so much for making RQS/h_vmem clear.
> >
> > I hope I'm not taking advantage of you here - I apologize if so.
> 
> No, but please ask on the list or register. Therefore I didn't forward your last posting, as it was sent from an unknown address.
> 
> 
> > I have another question regarding slots. Our cluster has 4 nodes each with 32 cores. My
> > assumption is that SGE should be able to run 128 total jobs at any time. I see only
> > 1 job running per node with many jobs in qw. I think I need to change the "slots" value
> > in the queue config? Here is what I have (still early on in learning SGE config..)
> > from qconf -sq <queue>
> >
> > slots                1,[scf-sm00.Stat.Berkeley.EDU=32], \
> >                       [scf-sm01.Stat.Berkeley.EDU=32], \
> >                       [scf-sm02.Stat.Berkeley.EDU=32], \
> >                       [scf-sm03.Stat.Berkeley.EDU=32]
> 
> It's a matter of taste: the above is correct. If you have identical nodes, you can even shorten it to:
> 
> slots      32
> 
> Great, we do - I'll try that!
>  
> 
> It's the number of slots per queue instance.
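> 
> A quick sketch of the change (assuming the cluster queue is the high.q mentioned below; adjust the name to yours):
> 
>   qconf -mq high.q
> 
> then replace the per-host list with the single line
> 
>   slots                 32
> 
> so every queue instance (i.e. every node) offers 32 slots.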
> 
> 
> > should the "1" be 32? 128? Or, where is it that I tell SGE to use all 32 cores?
> 
> As the default memory consumption is 248g, only one job can run at a time per node, I would say.
> 
> Ok, this is not what we want at all. It is more important to use all 32 cores/node than to attempt
> any ram usage control. I'm going to back out the h_vmem complex setting in order to run 128 jobs at a time. Should I reset the h_vmem complex back to not consumable (NO),

If you want to control the usage of memory and avoid oversubscribing it, h_vmem needs to stay consumable.
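
To illustrate with the numbers from below (a rough sketch, not exact bookkeeping): each node offers h_vmem=248G in its complex_values. With the complex default of 248g, a job that requests nothing is booked with 248g, so the first job already consumes (nearly) the whole host value and no second job can start on that node. With a default of, say, 2g, a non-requesting job is booked with only 2g, and 32 such jobs easily fit into a node's 32 slots (32 x 2g = 64g, well below 248G).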


> or keep it consumable but set its default to, say, 1G? If I do that, won't users have to request
> a higher h_vmem amount upon job submission?

Sure, if they want to use more than 1G they have to request more. They have to predict what they need. There is no crystal ball inside SGE which could look ahead and predict the necessary memory for a job.
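
For example (a sketch, with a hypothetical script myjob.sh and the default lowered to 1G): a job that needs up to 8 GiB would be submitted as

qsub -l h_vmem=8G myjob.sh

SGE then books 8G of the consumable on the chosen node and sets the corresponding hard limit for the job.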

-- Reuti


> 
> Try to submit jobs with "sleep 120" or so, for which you requested less memory on the command line. The actual memory consumption can be checked with:
> 
> qhost -F h_vmem
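> 
> A minimal test job (just a sketch) could be:
> 
> qsub -b y -l h_vmem=1G sleep 120
> 
> which submits the sleep binary directly with a 1G request, so you can watch the consumable decrease in the qhost output while a few of these are running.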
> 
> 
> -- Reuti
> 
> 
> > Thank you again,
> > Ray
> >
> >
> > On Tue, Jun 26, 2012 at 11:14 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
> > Am 26.06.2012 um 19:42 schrieb Ray Spence:
> >
> > > Hi Reuti,
> > >
> > > I'll respond in-line:
> > >
> > > On Mon, Jun 25, 2012 at 4:21 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
> > > Hi,
> > >
> > > Am 26.06.2012 um 00:57 schrieb Ray Spence:
> > >
> > > > I apologize for more questions, but I'm not getting to where our group wants our new
> > > > cluster to be. In order to limit all of a given user's jobs in a specified queue to a total
> > > > amount of physical ram (h_vmem), I see no other solution than an RQS. Is this true?
> > >
> > > Correct. h_vmem is a hard limit, while others prefer virtual_free as a guidance for SGE; the latter is not enforced:
> > >
> > > http://www.gridengine.info/2009/12/01/adding-memory-requirement-awareness-to-the-scheduler/
> > >
> > >
> > > I've read this info from you. When you say "Use the one you defined in your qsub command by requesting it with the -l option..." I take you to mean that once I've made a given memory complex (h_vmem, virtual_free, etc.) consumable (qconf -mc), users must request a value for that complex at job submission in order for any limit on it to be enforced. I think
> > > I'm repeating myself here.. Your info here is what led me to pose my question in the first place.
> > >
> > >
> > > > Using qconf -mq <queue> will limit each job in <queue> but not each user's total
> > > > memory footprint across all his jobs, correct?
> > >
> > > Correct.
> > >
> > >
> > > > The node-level limit does not do what
> > > > we want here..
> > >
> > > Correct, it's the memory usage across all queues and thus all jobs on a node.
> > >
> > >
> > > > I have this RQS in place:
> > > >
> > > > {
> > > >    name         high.q-h_vmem
> > > >    description  "high.q h_vmem limited to 128G"
> > >
> > > The quotation marks are not necessary.
> > >
> > >
> > > >    enabled      TRUE
> > > >    limit        users {*} queues high.q to h_vmem=128g
> > > > }
> > >
> > > You made h_vmem consumable and attached a value per exechost?
> > >
> > > Yes - via qconf -mc, here is what the h_vmem line looks like:
> > >
> > > h_vmem              h_vmem     MEMORY      <=    YES         YES        248g     0
> >
> > It should be set to a default you expect a typical job to consume. We set it to 2g here, and users can increase the per-job limit up to the one set in the queue definition.
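> >
> > With such a default, the line from qconf -mc would look like this (a sketch, keeping your other columns unchanged):
> >
> > h_vmem              h_vmem     MEMORY      <=    YES         YES        2g       0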
> >
> >
> > > (should the "default" value here be different from 248? see below.. Must it be 0? Must it NOT be 0?)
> > >
> > > and via qconf -me I've set h_vmem to be a little less (248G) than the installed ram (256G)
> > > on each of the cluster's 4 nodes:
> > >
> > > qconf -se <cluster_node>
> > > hostname              <>
> > > load_scaling          NONE
> > > complex_values        slots=32,h_vmem=248G
> > > .....
> > >
> > >
> > >
> > > > which would seem to accomplish our goal. However, jobs submitted to high.q against this
> > > > RQS without stating h_vmem needs at submission, but which are written to exceed the memory limit, do in fact exceed it.
> > >
> > > Correct, the RQS will check the job's h_vmem request, but there is no relation back, i.e. the RQS will not limit the job's actual memory usage. Specifying only an overall limit per user would even make it hard for the RQS to decide what per-job limit to set at all. Or, if the overall limit is exceeded: which job should be killed?
> > >
> > >
> > > > Worse, jobs submitted to high.q with an h_vmem need set below the RQS limit but which are written to exceed the limit successfully gobble up a
> > > > forbidden amount of ram.
> > >
> > > I don't get this sentence. Can you make an example?
> > >
> > > I have a simple shell script that runs the Linux tool stress, which asks the system for some amount of ram; here is that line:
> > >
> > > /usr/bin/stress -v --cpu 1 --io 2 --vm 1 --vm-bytes 150G --vm-hang 0
> > >
> > > which ramps up to occupy 150GB by reading and dirtying ram. The --vm-hang 0 part tells
> > > stress to simply stop and hang around indefinitely once stress has occupied 150GB. This
> > > script succeeds if I do not state an h_vmem request at job submission
> >
> > ...as the default is 248g
> >
> > > or if I ask for h_vmem under
> > > the RQS limit of 128G.
> >
> > NB: g = base 1000, G = base 1024 (man sge_types)
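> >
> > As a quick worked example: 128g = 128 x 10^9 = 128,000,000,000 bytes, whereas 128G = 128 x 2^30 = 137,438,953,472 bytes, i.e. roughly 7% more.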
> >
> >
> > > It seems if RQS is satisfied upon job submission
> >
> > No, at job start.
> >
> >
> > > then it does not
> > > monitor ram usage once a job is running
> >
> > RQS will never monitor running jobs.
> >
> >
> > > - you say as much in this response.
> >
> > You can check with:
> >
> > $ ulimit -aH
> > $ ulimit -aS
> >
> > what limits were set by SGE. In addition, SGE's execd (not the RQS) will monitor the usage against what was requested by -l h_vmem=... or set by the default in the complex definition (man queue_conf, section RESOURCE LIMITS). The execd doesn't know anything about other users' jobs on other nodes.
> >
> >
> > > > Regarding ram usage: I have tested and read enough on RQS and the various ways to configure SGE to conclude that RQS doesn't actually monitor ram usage once a job has been submitted?
> > >
> > > It will monitor the requested RAM to decide whether any submitted job is eligible to start. The requests of all running jobs, added up, should never exceed the h_vmem limit.
> > >
> > > But here (second sentence) you imply that, with an h_vmem value in an RQS, SGE does indeed monitor a user's running jobs to see if the cumulative ram usage exceeds the RQS
> >
> > Not the usage. It's a consumable, so it will just add up all h_vmem requests at the time a job starts and allow it to run or not.
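> >
> > A small worked example with the numbers from your setup (a sketch of the bookkeeping only): with the RQS limit of h_vmem=128g per user in high.q, a first job submitted with -l h_vmem=100g may start (100g <= 128g). A second job of the same user requesting -l h_vmem=50g stays in qw, because 100g + 50g would exceed the 128g limit; it starts only once the first job has finished and its 100g are returned to the user's budget. What either job actually uses at runtime plays no role in this accounting.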
> >
> > -- Reuti
> >
> >
> > > h_vmem limit? Is this true but also that SGE will not kill any job to get a user's ram footprint down below the RQS? The monitoring is used only to determine if a submitted job may be run if that submitted job's h_vmem request and that user's current ram usage are together below the RQS limit?

More information about the users mailing list