[gridengine users] resource types -- changing BOOL to INT but keeping qsub unchanged

Reuti reuti at staff.uni-marburg.de
Fri Dec 22 23:02:06 UTC 2017


On 22.12.2017 at 23:55, bergman at merctech.com wrote:

> In the message dated: Thu, 21 Dec 2017 23:17:52 +0100,
> The pithy ruminations from Reuti on 
> <Re: [gridengine users] resource types -- changing BOOL to INT but keeping qsub unchanged> were:
> => Hi,
> => 
> => On 21.12.2017 at 22:46, bergman at merctech.com wrote:
> => 
> 
> => > 
> => > I'm considering changing "gpu" to an INT (set to the number of GPUs/node),
> => > making it a consumable resource, and updating our JSV (in perl) so that
> => > if the job is submitted as
> => > 
> => > 	qsub -l gpu foobar
> => > 
> => > it will be altered to the equivalent of
> => > 
> => > 	qsub -l gpu=1 foobar
> => > 
> => > to keep things easy for users.
> => > 
> => > Any suggestions about this plan?
> => 
> => Even with "-w n" you will face a "missing value for request" error, I fear, as AFAIK it is checked before the JSV is called*. In the past I proposed changing the default value for an integer request without a number to one (it's quite easy to find the place in the source where a BOOL without a value is expanded), but the idea was rejected.
> => 
> 
> Well, I tried the changes:
> 
> 	qconf -sc | grep gpu
> 	gpu                 cuda       INT         <=    YES         JOB        0        1000
> 
> 
> And submitted a job:
> 
> 	qsub -l gpu ./smi.qsub
> 
> And it seems to have been accepted by qsub (note the change to "gpu=1" from our JSV):
> 
> 	qstat -j 737215|grep gpu
> 	hard resource_list:         gpu=1,h_vmem=4g,h_stack=256m
> 
> 
> Perhaps the "missing value for request" check only applies to certain
> SGE versions? I left out mentioning that we're running SoGE 8.1.6.

Aha, interesting. This might have been changed in SoGE. Since you have this running: could you please print the value of gpu before your JSV assigns a value? Was it just 0, or already set to 1 by default?
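
To make the question concrete, here is a minimal sketch of the rewrite logic in plain shell. It logs each gpu entry as received and only then fills in the default. Note the assumptions: normalize_gpu_request is a hypothetical helper that works on the comma-separated hard resource list passed as a string; it is not the Perl JSV in use here, and a real JSV would read and set the request through the JSV helper library shipped with SGE instead.

```shell
# Hypothetical helper (not part of SGE): rewrite a bare "gpu" entry in a
# comma-separated hard resource list to "gpu=1". Entries that already
# carry a value are left untouched.
# The function body runs in a subshell so the IFS change stays local.
normalize_gpu_request() (
    IFS=,
    out=""
    for entry in $1; do
        case $entry in
            gpu|gpu=*)
                # Log the entry exactly as received, before any change.
                printf 'gpu request as received: %s\n' "$entry" >&2
                ;;
        esac
        case $entry in
            gpu) entry="gpu=1" ;;   # bare request: supply the default count
        esac
        out="${out:+$out,}$entry"
    done
    printf '%s\n' "$out"
)
```

Called as normalize_gpu_request "gpu,h_vmem=4g,h_stack=256m", it writes the rewritten list to stdout and the original gpu entry to stderr, so the log shows whether the request arrived bare or already carried a number.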

-- Reuti


> 
> 
> => But: do you need to know which GPU will be used? Univa GE has a named
> 
> Yeah, that was going to be another post.
> 
> => resource. With SGE it might help to have one queue with one slot per GPU,
> => and from the name (i.e. suffix) of the granted queue name you know which
> => GPU you have to use.
> 
> True, but even with that info, there doesn't seem to be any universal
> way to tell an arbitrary GPU job which GPU to use -- they all default
> to device 0.
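
For CUDA applications specifically (so not a universal answer), the queue-suffix idea could be sketched as below. Assumptions: the queues are named "<name>_<index>" (a hypothetical convention), gpu_index_from_queue is a made-up helper, and the application honours CUDA_VISIBLE_DEVICES. The job script reads the granted queue name from the QUEUE variable that SGE sets in the job environment.

```shell
# Hypothetical helper: derive a GPU index from a queue name whose suffix
# encodes the device, e.g. "gpu_1" -> 1. Fails if there is no numeric suffix.
gpu_index_from_queue() {
    suffix=${1##*_}
    case $suffix in
        ''|*[!0-9]*) return 1 ;;   # no numeric suffix: refuse to guess
    esac
    printf '%s\n' "$suffix"
}

# In the job script: restrict the job to the device named by the queue.
# Nothing is exported if the queue name does not follow the convention.
if idx=$(gpu_index_from_queue "${QUEUE:-}"); then
    CUDA_VISIBLE_DEVICES=$idx
    export CUDA_VISIBLE_DEVICES
fi
```

With the variable set, the CUDA runtime exposes only that device to the job, and the job sees it as device 0, which fits programs that always default to device 0.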
> 
> Our likely solution will be to install 1 GPU/node, except for a few nodes
> with multiple GPUs where any job requesting that node gets all GPUs,
> and the job is expected to manage the multiple devices.
> 
> Thanks,
> 
> Mark
> 
> => 
> => -- Reuti
> => 
> => *) The "-w e" check is even performed twice: once before the JSV and once after. In my opinion this is not optimal, as it prevents submitting a completely malformed request and putting things in order inside the JSV. Admittedly, one problem is the fields that are fed to the JSV: how would one express a missing integer value (other than through IEEE-style values such as NaN)?
> => 
> => 
> => > 
> => > Thanks,
> => > 
> => > Mark
> 




