[gridengine users] how create and monitor a new consumable?

Reuti reuti at staff.uni-marburg.de
Mon Apr 20 10:43:14 UTC 2015


Hi,

> Am 20.04.2015 um 07:18 schrieb Marlies Hankel <m.hankel at uq.edu.au>:
> 
> Hi,
> 
> Yes, changing where $TMPDIR points, from /tmp to /scratch, is the way I would like to do it. That way users can just use the $TMPDIR variable to access it.

Yes, this is the preferred way in SGE. The directory will be created when the job starts and removed when the job finishes. This applies to each task of an array job; otherwise it would be impossible for each array task instance to get its own scratch directory.

Note that for parallel jobs the scratch directory will be created when `qrsh` starts a task on a slave node and removed when that slave task finishes. Hence, for parallel jobs that issue several `mpiexec` invocations and rely on data from a previous invocation, special measures must be taken to create a persistent directory on each of the slave nodes.
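
Just for completeness: pointing $TMPDIR to /scratch is done per queue via its tmpdir attribute, e.g. (assuming your queue is called all.q):

# qconf -mattr queue tmpdir /scratch all.q
# qconf -sq all.q | grep tmpdir
tmpdir                /scratch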


> So I can then set up a consumable in qconf -mc
> 
> scratch             scratch    MEMORY      <=    YES YES        0        0
> 
> and then use a load sensor to monitor its usage (I would set a default or force a request). Would the above setup mean the request is per process or per job? Do I need to change the second YES to JOB to have it per job?

This depends on which behavior you prefer: should the requested amount be multiplied or not? I think there is no general rule to do it only one way.
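
For illustration (only a sketch, the values are made up): with the consumable column left at YES the request is multiplied by the number of slots granted to the job; with JOB it is debited only once per job, i.e. instead of the line above you would use:

scratch             scratch    MEMORY      <=    YES JOB        0        0

A job submitted e.g. with `qsub -pe mpi 8 -l scratch=50G job.sh` (the PE name is made up) would then be charged 50G in total, instead of 8 * 50G with the per-slot variant. To force a request, FORCED can be used in the requestable column, or a non-zero default can be set in the default column.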


> How would I then set up the load sensor? I found a howto with an example script that uses /tmp (http://gridscheduler.sourceforge.net/howto/loadsensor.html). If I set the load sensor script to be executed on every host will it know which JOB_ID to use etc as the full path of $TMPDIR will somehow include these. I am no scripting person, I can read them but cannot write them.
> 
> So instead of /tmp I would have something like $JOB_ID.$SGE_TASK_ID.$QUEUE.

Yes.


> Would that actually work?

Well, load sensors are per host, not per job. So you would like the load sensor to kill a job once it exceeds the requested space in its $TMPDIR? Then the load sensor should, as a side effect, look for all jobs on a node, compute the space used by each job, and kill the jobs if necessary. Whether it delivers any value to SGE for a defined complex wouldn't matter; it's more like a cron job assisting SGE to do the right thing and to honor the limits.
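
To give an idea, below is a minimal, untested sketch along the lines of the howto you found. The complex name "scratch", the layout of /scratch and the parsing of the requested amount are assumptions and will need adjusting:

#!/bin/sh
# Load sensor sketch: reports the free space in /scratch and, as a side
# effect, checks the per-job directories and could kill jobs over their limit.

HOST=`hostname`

while read input; do
    # the execd writes "quit" to the sensor's stdin when it should terminate
    if [ "$input" = "quit" ]; then
        exit 0
    fi

    # free space in /scratch in bytes, reported for the complex "scratch"
    FREE=`df -P -k /scratch | awk 'NR==2 {print $4 * 1024}'`

    # side effect: walk the <job_id>.<task_id>.<queue> directories
    for dir in /scratch/[0-9]*.*; do
        [ -d "$dir" ] || continue
        job=`basename "$dir" | cut -d. -f1`
        used=`du -sk "$dir" | awk '{print $1 * 1024}'`
        # the requested amount would have to be parsed from "qstat -j $job"
        # (hard resource_list entry "scratch"), then for example:
        # [ "$used" -gt "$requested" ] && qdel "$job"
    done

    echo "begin"
    echo "$HOST:scratch:$FREE"
    echo "end"
done

exit 0

If /scratch is the shared filesystem, "global" could be used instead of $HOST in the report line; but as said above, for the kill logic the reported value doesn't really matter.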

-- Reuti


> Not sure if QUEUE is the right one to use, will that resolve into just the queue or actually the queue instance? So if the job runs over several nodes (so queue instances) do I need to check several directories or will it all be under one top directory with the JOB_ID?
> 
> Also, while this was under PBS Pro, I have had the bad experience that individual task temporary files were not deleted even after the task had finished, and so the array job's directory with temporary files just grew and grew and would only be deleted once the whole array job had finished. How does this work in SGE: will individual task TMPDIR directories be deleted once the task is finished, and will individual task directories reside within one directory for the job (so JOB_ID), or are they all different?
> 
> Thanks for your help
> 
> Marlies
> 
> On 04/10/2015 07:40 PM, Reuti wrote:
>> Hi Marlies,
>> 
>>> Am 10.04.2015 um 06:05 schrieb Marlies Hankel <m.hankel at uq.edu.au>:
>>> 
>>> Dear all,
>>> 
>>> I am using OGS/Grid Engine 2011.11 as installed under ROCKS 6.1.1.
>>> 
>>> We have a global storage which hosts all home directories and which also has a large space which I would like to use as scratch space for jobs. As the storage is faster and so much bigger than the small local disk I would like $TMPDIR to default to /scratch. It will also help with some of our MPI applications that need global scratch space.
>>> 
>>> I know I can set the tmp directory to /scratch but I would also like to have the space as a consumable complex that can be requested by the users if scratch space is needed.
>> This is just like any other consumable, with the type being MEMORY, calling it "disk" or the like. As it's a global consumable, its total available value must be defined in `qconf -me global`.
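
For illustration (assuming, say, 10 TB of scratch in total), the relevant line in `qconf -me global` would look like:

complex_values        scratch=10T

or non-interactively: qconf -mattr exechost complex_values scratch=10T global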
>> 
>> Any limits on file size set via "ulimit" would be per process. So other means are necessary to control the overall size. There are two ways to control it, but it's not built into SGE:
>> 
>> - looking up the additional group ID for each job controlled by SGE and adding up all open files by checking `lsof`. If the size passes the defined limit, the process could be killed. Already closed files will be missed, of course. Or, maybe easier:
>> 
>> - For the /scratch directory the upper level permissions can be set to:
>> 
>> node: # ls -dl /scratch
>> drwxr-xr-t 7 root root 4096 2015-04-10 11:25 /scratch
>> 
>> Hence no one can write at this level, but it allows SGE to create a directory there (as the root user) and change the ownership, with the usual result that the jobs can write into their dedicated $TMPDIR. A background process could then check the size of each of the <job_id>.<task_id>.<queue> directories much more easily than before and act accordingly by issuing a `qdel`. For simplicity this can be put into the load sensor, which outputs the consumed space for the defined complex. It's possible to have a consumable which also gets a value from a load sensor; the tighter restriction will be used, be it the internal bookkeeping of SGE or the actual load value.
>> 
>> -- Reuti
>> 
>> 
>>> I also would like jobs to be killed if they go over the requested amount.
>>> 
>>> I can set up a complex named scratch and make this consumable but how to I make sure that jobs do not go over the requested amount? Or is there already a complex that would do this I could use?
>>> 
>>> Thanks in advance
>>> 
>>> Marlies
>>> 
>>> -- 
>>> 
>>> ------------------
>>> 
>>> Dr. Marlies Hankel
>>> Research Fellow, Theory and Computation Group
>>> Australian Institute for Bioengineering and Nanotechnology (Bldg 75)
>>> eResearch Analyst, Research Computing Centre and Queensland Cyber Infrastructure Foundation
>>> The University of Queensland
>>> Qld 4072, Brisbane, Australia
>>> Tel: +61 7 334 63996 | Fax: +61 7 334 63992 | mobile:0404262445
>>> Email: m.hankel at uq.edu.au | www.theory-computation.uq.edu.au
>>> 
>>> 
>>> Notice: If you receive this e-mail by mistake, please notify me,
>>> and do not make any use of its contents. I do not waive any
>>> privilege, confidentiality or copyright associated with it. Unless
>>> stated otherwise, this e-mail represents only the views of the
>>> Sender and not the views of The University of Queensland.
>>> 
>>> 
>>> _______________________________________________
>>> users mailing list
>>> users at gridengine.org
>>> https://gridengine.org/mailman/listinfo/users
> 




