[gridengine users] resource types -- changing BOOL to INT but keeping qsub unchanged

bergman at merctech.com bergman at merctech.com
Fri Dec 22 22:55:26 UTC 2017

In the message dated: Thu, 21 Dec 2017 23:17:52 +0100,
The pithy ruminations from Reuti on 
<Re: [gridengine users] resource types -- changing BOOL to INT but keeping qsub unchanged> were:
=> Hi,
=> Am 21.12.2017 um 22:46 schrieb bergman at merctech.com:

=> > 
=> > I'm considering changing "gpu" to an INT (set to the number of GPUs/node),
=> > making it a consumable resource, and updating our JSV (in perl) so that
=> > if the job is submitted as
=> > 
=> > 	qsub -l gpu foobar
=> > 
=> > it will be altered to the equivalent of
=> > 
=> > 	qsub -l gpu=1 foobar
=> > 
=> > to keep things easy for users.
=> > 
=> > Any suggestions about this plan?
=> Even with "-w n" you will face a "missing value for request", I fear, as AFAIK it's checked before the JSV is called*. In the past I had the idea of changing the default value for an integer request without a number to one (it's quite easy to find the place in the source where a BOOL without a value is expanded), but it was rejected.

Well, I tried the changes:

	qconf -sc | grep gpu
	gpu                 cuda       INT         <=    YES         JOB        0        1000

And submitted a job:

	qsub -l gpu ./smi.qsub

And it seems to have been accepted by qsub (note the change to "gpu=1" from our JSV):

	qstat -j 737215|grep gpu
	hard resource_list:         gpu=1,h_vmem=4g,h_stack=256m

Perhaps the "missing value for request" check only applies to certain
SGE versions? I neglected to mention that we're running SoGE 8.1.6.
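
For anyone wanting to try the same thing, here is a minimal sketch of the kind
of rewrite involved. It is in Python rather than Perl and is not our actual JSV.
It only shows the string handling on a comma-separated hard resource list, not
the JSV protocol itself. The function name default_gpu_request and its
parameters are made up for illustration.

```python
def default_gpu_request(l_hard, resource="gpu", default="1"):
    """Give a bare `resource` entry in a -l style list a default value.

    Hypothetical helper: entries that already carry a value are left alone.
    """
    rewritten = []
    for item in l_hard.split(","):
        name, sep, value = item.partition("=")
        # A bare "gpu" (no "=") or an empty "gpu=" gets the default count.
        if name.strip() == resource and (not sep or value.strip() == ""):
            item = f"{resource}={default}"
        rewritten.append(item)
    return ",".join(rewritten)


print(default_gpu_request("gpu,h_vmem=4g,h_stack=256m"))
# prints: gpu=1,h_vmem=4g,h_stack=256m
```

An explicit request such as "gpu=2" passes through unchanged, so users who
need more than one GPU can still ask for it.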

=> But: do you need to know which GPU will be used? Univa GE has a named

Yeah, that was going to be another post.

=> resource. With SGE it might help to have one queue with one slot per GPU,
=> and from the name (i.e. suffix) of the granted queue name you know which
=> GPU you have to use.

True, but even with that info, there doesn't seem to be any universal
way to tell an arbitrary GPU job which GPU to use -- they all default
to device 0.
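
For the subset of jobs that are CUDA programs, the queue-suffix idea could be
wired up roughly as below. This is only a sketch under assumptions: the queue
naming scheme (a trailing digit, e.g. "gpu_1") is hypothetical, as are the
function names, and it relies on the application honoring the
CUDA_VISIBLE_DEVICES environment variable, so it is not a universal answer.

```python
import re


def device_from_queue(queue_name):
    """Return the trailing digits of a queue name, or None if there are none.

    Assumes a hypothetical naming scheme where the suffix is the GPU index.
    """
    match = re.search(r"(\d+)$", queue_name)
    return match.group(1) if match else None


def gpu_environment(queue_name, base_env):
    """Copy base_env and pin the job to the GPU named by the queue suffix."""
    env = dict(base_env)
    device = device_from_queue(queue_name)
    if device is not None:
        # Only CUDA applications look at this variable.
        env["CUDA_VISIBLE_DEVICES"] = device
    return env


print(gpu_environment("gpu_1", {}))
# prints: {'CUDA_VISIBLE_DEVICES': '1'}
```

In a job wrapper, the queue name would come from the job's environment and the
resulting dictionary would be passed to whatever launches the real program.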

Our likely solution will be to install 1 GPU/node, except for a few nodes
with multiple GPUs where any job requesting that node gets all GPUs,
and the job is expected to manage the multiple devices.



=> -- Reuti
=> *) The "-w e" check is even performed twice: once before the JSV and once after. In my opinion this is not optimal, as it prevents submitting a completely malformed request and putting things in order inside the JSV. Admittedly, one problem is the fields that are fed to the JSV: how would one express a missing integer value (other than IEEE-style values such as NaN)?
=> > 
=> > Thanks,
=> > 
=> > Mark
