[gridengine users] resource types -- changing BOOL to INT but keeping qsub unchanged
bergman at merctech.com
Fri Dec 22 22:55:26 UTC 2017
In the message dated: Thu, 21 Dec 2017 23:17:52 +0100,
The pithy ruminations from Reuti on
<Re: [gridengine users] resource types -- changing BOOL to INT but keeping qsub unchanged> were:
=> Am 21.12.2017 um 22:46 schrieb bergman at merctech.com:
=> > I'm considering changing "gpu" to an INT (set to the number of GPUs/node),
=> > making it a consumable resource, and updating our JSV (in perl) so that
=> > if the job is submitted as
=> > qsub -l gpu foobar
=> > it will be altered to the equivalent of
=> > qsub -l gpu=1 foobar
=> > to keep things easy for users.
=> > Any suggestions about this plan?
=> Even with "-w n" you will face a "missing value for request", I fear, as AFAIK it's checked before the JSV is called*. In the past I proposed changing the default value for an integer request without a number to one (it's quite easy to find in the source where a BOOL without a value is expanded), but it was denied.
Well, I tried the changes:
qconf -sc | grep gpu
gpu cuda INT <= YES JOB 0 1000
And submitted a job:
qsub -l gpu ./smi.qsub
And it seems to have been accepted by qsub (note the change to "gpu=1" from our JSV):
qstat -j 737215|grep gpu
hard resource_list: gpu=1,h_vmem=4g,h_stack=256m
Perhaps the "missing value for request" check only applies to certain
SGE versions? I left out mentioning that we're running SoGE 8.1.6.
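As a rough illustration of the rewrite described above (a Python sketch, not our actual Perl JSV): the function name default_gpu_request is hypothetical, and it assumes the hard resource list arrives as a comma-separated string like the one qstat shows. It accepts either the resource name "gpu" or its shortcut "cuda" from the qconf output.

```python
def default_gpu_request(hard_requests, names=("gpu", "cuda"), default="1"):
    """Give a bare GPU request a default count.

    hard_requests is assumed to be a comma-separated string such as
    "gpu,h_vmem=4g" (an assumption about how the request is presented).
    """
    fixed = []
    for item in hard_requests.split(","):
        name, _, value = item.partition("=")
        name = name.strip()
        # A bare name ("gpu") or an empty value ("gpu=") gets the default.
        if name in names and value.strip() == "":
            item = f"{name}={default}"
        fixed.append(item)
    return ",".join(fixed)


print(default_gpu_request("gpu,h_vmem=4g,h_stack=256m"))
# prints: gpu=1,h_vmem=4g,h_stack=256m
```

An explicit count such as "gpu=2" is left untouched, so users who already ask for a specific number of GPUs see no change.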
=> But: do you need to know which GPU will be used? Univa GE has a named
Yeah, that was going to be another post.
=> resource. With SGE it might help to have one queue with one slot per GPU,
=> and from the name (i.e. suffix) of the granted queue name you know which
=> GPU you have to use.
True, but even with that info, there doesn't seem to be any universal
way to tell an arbitrary GPU job which GPU to use -- they all default
to device 0.
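For CUDA-based programs specifically, the queue-suffix idea could be combined with the CUDA_VISIBLE_DEVICES environment variable, which limits the devices the CUDA runtime sees. The following is only a sketch under stated assumptions: the queue naming scheme ("gpu_0", "gpu_1", ...) and both function names are hypothetical, and it relies on SGE exporting the granted queue name as QUEUE in the job environment. It does nothing for non-CUDA GPU jobs.

```python
import os
import re


def gpu_index_from_queue(queue_name):
    """Return the trailing integer of a queue name, or None if there is none."""
    match = re.search(r"(\d+)$", queue_name)
    return int(match.group(1)) if match else None


def pin_cuda_device(env):
    """Set CUDA_VISIBLE_DEVICES in env from the queue name in env["QUEUE"].

    Assumes a hypothetical one-queue-per-GPU layout where the numeric
    suffix of the queue name is the GPU index on that node.
    """
    index = gpu_index_from_queue(env.get("QUEUE", ""))
    if index is not None:
        env["CUDA_VISIBLE_DEVICES"] = str(index)
    return env


if __name__ == "__main__":
    # In a job wrapper, this would run before the real program is started.
    pin_cuda_device(os.environ)
```

A queue name with no numeric suffix (for example "all.q") leaves the environment unchanged, so the job falls back to its usual default device.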
Our likely solution will be to install 1 GPU/node, except for a few nodes
with multiple GPUs where any job requesting that node gets all GPUs,
and the job is expected to manage the multiple devices.
=> -- Reuti
=> *) The "-w e" check is even performed twice: once before the JSV and once after. In my opinion this is not optimal, as it prevents submitting a completely malformed request and putting things in order inside the JSV. Admittedly, one problem is the fields that are fed to the JSV: how would a missing integer value be expressed (other than IEEE-style values such as NaN)?
=> > Thanks,
=> > Mark