[gridengine users] resource types -- changing BOOL to INT but keeping qsub unchanged

Reuti reuti at staff.uni-marburg.de
Thu Dec 21 22:17:52 UTC 2017


On 21.12.2017 at 22:46, bergman at merctech.com wrote:

> In our cluster, we've got several different types of GPUs.
> Some jobs simply need any GPU, while others require a specific type.
> Previously, we had "gpu" declared as a BOOLEAN attribute on each GPU-node
> and had the GPU type (i.e., TITANX, P100, etc.) declared as an INT attribute
> holding the number of GPUs of that type per node.
> For example:
> 	qconf -aattr exechost complex_values gpu=TRUE,TITANX=1 node1
> 	qconf -aattr exechost complex_values gpu=TRUE,TITANX=1 node2
> 	qconf -aattr exechost complex_values gpu=TRUE,P100=2 node3
> 	qconf -aattr exechost complex_values gpu=TRUE,P40=1 node4
> A user could submit:
> 	qsub -l gpu myjob
> and it could run on any of the nodes, or a user could run:
> 	qsub -l TITANX=1 myjob
> and it could run on node1 or node2.
> However... this led to over-subscription as the 'gpu' BOOLEAN isn't a
> consumable resource.
> I'm considering changing "gpu" to an INT (set to the number of GPUs/node),
> making it a consumable resource, and updating our JSV (in perl) so that
> if the job is submitted as
> 	qsub -l gpu foobar
> it will be altered to the equivalent of
> 	qsub -l gpu=1 foobar
> to keep things easy for users.
> Any suggestions about this plan?

Even with "-w n" you will get a "missing value for request" error, I fear, because AFAIK the request is checked before the JSV is called*. In the past I proposed changing the default for an integer request given without a number to one (it's quite easy to find the place in the source where a BOOL without a value is expanded), but the proposal was rejected.

But: do you need to know which GPU will be used? Univa GE has a named resource for this. With SGE it might help to define one queue per GPU, each with a single slot; the name (i.e. the suffix) of the granted queue then tells you which GPU to use.
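A minimal sketch of how a job script could pick its GPU from the granted queue name. The naming scheme (gpu0.q, gpu1.q, ...) and the helper gpu_index_from_queue are hypothetical, and using CUDA_VISIBLE_DEVICES to pin the device is an assumption about the GPU software on the node; only $QUEUE, which SGE sets in the job environment to the granted cluster queue, comes from SGE itself.

```shell
#!/bin/sh
# Hypothetical naming scheme: one single-slot queue per GPU,
# named gpu0.q, gpu1.q, ... (adjust the pattern to your own names).

# Print the numeric suffix of a queue name; fail if there is none.
gpu_index_from_queue() {
    name=${1%.q}    # drop an optional ".q" ending
    idx=$(printf '%s\n' "$name" | sed -n 's/^.*[^0-9]\([0-9][0-9]*\)$/\1/p')
    [ -n "$idx" ] || return 1
    printf '%s\n' "$idx"
}

# Inside a running job, SGE exports the granted queue name as $QUEUE.
# Assumption: the suffix matches the device index the GPU runtime expects.
if [ -n "${QUEUE:-}" ] && idx=$(gpu_index_from_queue "$QUEUE"); then
    CUDA_VISIBLE_DEVICES=$idx
    export CUDA_VISIBLE_DEVICES
fi
```

Sourced at the top of a job script, this leaves the environment untouched when the job lands in a queue without a numeric suffix, so non-GPU queues are unaffected.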

-- Reuti

*) The "-w e" check is even performed twice: once before the JSV and once after. In my opinion this is not optimal, as it makes it impossible to submit a completely malformed request and put things in order inside the JSV. Admittedly, one problem is the set of fields that are fed to the JSV: how would a missing integer value be expressed (other than in the IEEE ways, like NaN and the like)?

> Thanks,
> Mark