[gridengine users] default value in complex cannot be changed via a user request

Reuti reuti at staff.uni-marburg.de
Sat Apr 11 09:36:25 UTC 2015


Am 11.04.2015 um 03:05 schrieb Marlies Hankel:

> Dear all,
> 
> Yes, I set a default of 1G value for h_vmem in the global complex.

You mean `qconf -me global`? That sets memory which is available only once for the complete cluster. The default value for the complex is set in `qconf -mc`.
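For illustration (the values below are made up, not taken from your cluster), the two places look roughly like this:

    qconf -se global (the pseudo exec host "global"; memory granted once for the whole cluster):

       complex_values        h_vmem=1200G

    qconf -sc (the complex definition; the "default" column is what a job gets per slot if it requests nothing):

       #name      shortcut   type    relop requestable consumable default  urgency
       h_vmem     h_vmem     MEMORY  <=    YES         YES        1G       0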


> The queue has INFINITY and each node has an h_vmem of 120G. There is nothing set in sge_request.
> 
> Here is the error I get
> 
> 04/09/2015 12:41:54|  main|cpu-1-4|W|job 4 exceeds job hard limit "h_vmem" of queue "all.q at cpu-1-4.local" (7002386432.00000 > limit:1073741824.00000) - sending SIGKILL
> 04/09/2015 12:41:55|  main|cpu-1-4|W|job 4 exceeds job hard limit "h_vmem" of queue "all.q at cpu-1-4.local" (6889209856.00000 > limit:1073741824.00000) - sending SIGKILL
> 
> 
> For this I asked for
> #$ -pe openmpi 10
> #$ -l h_vmem=1G
> 
> Checking ulimits via the script gives the expected 10G.

So the job started on one node only? What is the complete definition of this PE?
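For reference, `qconf -sp openmpi` will print it; the relevant lines of a typical Open MPI PE (just a guess at what yours contains) would be:

   pe_name            openmpi
   slots              9999
   allocation_rule    $fill_up
   control_slaves     TRUE
   job_is_first_task  FALSE

The allocation_rule matters here: with $fill_up or $pe_slots all 10 slots can end up on one host, and the per-slot request of 1G should then be multiplied to a 10G limit on that host; with $round_robin the slots are spread out and each host only gets its share.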

> 
> So for some reason there is a limit of 1G there. Checking qacct, it shows that the total maxvmem used is just over 3G, so asking for 10G should be plenty.
> 
> [root at queue ~]# qconf -sc
> #name               shortcut   type        relop requestable consumable default  urgency
> #----------------------------------------------------------------------------------------
> h_vmem              h_vmem     MEMORY      <=    YES         YES        1G       0

Yep, this 1G should work.
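The numbers in your log also fit that picture: the enforced limit of 1073741824 bytes is exactly 1 GiB, i.e. the value for a single slot, while the job only reached about 7002386432 bytes ≈ 6.5 GiB, well below the 10 x 1 GiB = 10 GiB that 10 slots on one host should have been granted. So it looks as if the per-slot request wasn't multiplied by the number of granted slots.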

If it persists, maybe it's a problem in OGS, as I haven't noticed it in the other forks.
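To narrow it down, you could check what was actually recorded for the job (the job id and user name below are placeholders):

   qacct -j <jobid> | egrep "category|granted_pe|slots|maxvmem"

The "category" line (if your version records it) should contain the submitted requests, e.g. "-l h_vmem=1G -pe openmpi 10". And while a fresh test job is still running:

   qstat -j <jobid> | grep "hard resource_list"   (the per-slot request as the scheduler sees it)
   qstat -g t -u <username>                        (how the slots were distributed across the hosts)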

-- Reuti


> [root at queue ~]# qconf -sq all.q
> h_vmem                INFINITY
> 
> [root at queue ~]# qconf -se cpu-1-4.local
> complex_values        h_vmem=120G
> 
> Marlies
> 
> On 04/10/2015 07:43 PM, Reuti wrote:
>>> Am 10.04.2015 um 05:59 schrieb Marlies Hankel<m.hankel at uq.edu.au>:
>>> 
>>> Dear all,
>>> 
>>> I ran into some trouble with the default value of h_vmem. I set it to be consumable=yes and also set a default value of 1G. When I submitted a job asking, for example, for 10 slots with 1G per slot, the job crashed with an error in the queue logs saying that the h_vmem needed by the job (around 3G) was over the hard limit of the queue (local host instance) of 1G. I would have thought that the request of 1G per slot, so 10G in total, would override this and give enough memory for the job.
>>> 
>>> Setting the default value to 6G resolved the problem,
>> You refer to the setting on a queue level? This is the limit per process. There is also a column for the default value in the complex definition for each consumable complex. This can be set to 1G and users can override it, as long as they stay below the limit on a queue (or exechost) level.
>> 
>> -- Reuti
>> 
>> 
>>> but as we might be dealing with larger-memory jobs in the future, I would like to find a proper fix for this. I am running SGE as installed by ROCKS 6.1.1 (OGS/Grid Engine 2011.11), and the only thing I changed was to set h_vmem to consumable=yes and to set the relevant h_vmem values for each host.
>>> 
>>> I want users to request memory and jobs to be killed if they exceed the requested amount, so h_vmem seemed to be the way to go. But how do I set a small default value that users can change if they need more? Or should I set it to forced without a default and force users to request it?
>>> 
>>> Thanks in advance
>>> 
>>> Marlies
>>> 
>>> -- 
>>> 
>>> ------------------
>>> 
>>> Dr. Marlies Hankel
>>> Research Fellow, Theory and Computation Group
>>> Australian Institute for Bioengineering and Nanotechnology (Bldg 75)
>>> eResearch Analyst, Research Computing Centre and Queensland Cyber Infrastructure Foundation
>>> The University of Queensland
>>> Qld 4072, Brisbane, Australia
>>> Tel: +61 7 334 63996 | Fax: +61 7 334 63992 | mobile:0404262445
>>> Email: m.hankel at uq.edu.au | www.theory-computation.uq.edu.au
>>> 
>>> 
>>> Notice: If you receive this e-mail by mistake, please notify me,
>>> and do not make any use of its contents. I do not waive any
>>> privilege, confidentiality or copyright associated with it. Unless
>>> stated otherwise, this e-mail represents only the views of the
>>> Sender and not the views of The University of Queensland.
>>> 
>>> 
>>> _______________________________________________
>>> users mailing list
>>> users at gridengine.org
>>> https://gridengine.org/mailman/listinfo/users
> 
> -- 
> 
> ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms
> 
> Please note change of work hours: Monday, Wednesday and Friday
> 
> Dr. Marlies Hankel
> Research Fellow
> High Performance Computing, Quantum Dynamics & Nanotechnology
> Theory and Computational Molecular Sciences Group
> Room 229 Australian Institute for Bioengineering and Nanotechnology  (75)
> The University of Queensland
> Qld 4072, Brisbane
> Australia
> Tel: +61 (0)7-33463996
> Fax: +61 (0)7-334 63992
> mobile:+61 (0)404262445
> Email: m.hankel at uq.edu.au
> http://web.aibn.uq.edu.au/cbn/
> 
> ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms
> 




