[gridengine users] resource types -- changing BOOL to INT but keeping qsub unchanged

bergman at merctech.com
Thu Dec 21 21:46:13 UTC 2017


In our cluster, we've got several different types of GPUs.

Some jobs simply need any GPU, while others require a specific type.

Previously, we had "gpu" declared as a BOOLEAN attribute on each GPU node
and had each GPU type (e.g., TITANX, P100, etc.) declared as an INT attribute
set to the number of GPUs of that type on the node.

For example:

	qconf -aattr exechost complex_values gpu=TRUE,TITANX=1 node1
	qconf -aattr exechost complex_values gpu=TRUE,TITANX=1 node2
	qconf -aattr exechost complex_values gpu=TRUE,P100=2 node3
	qconf -aattr exechost complex_values gpu=TRUE,P40=1 node4

A user could submit:
	qsub -l gpu myjob
and it could run on any of the nodes, or a user could run:
	qsub -l TITANX=1 myjob
and it could run on node1 or node2.

However, this led to over-subscription, because the 'gpu' BOOLEAN isn't a
consumable resource: jobs requesting "-l gpu" never reduce a node's GPU count.

I'm considering changing "gpu" to an INT (set to the number of GPUs/node),
making it a consumable resource, and updating our JSV (written in Perl) so that
if the job is submitted as

	qsub -l gpu foobar

it will be altered to the equivalent of

	qsub -l gpu=1 foobar

to keep things easy for users.
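
To make the rewrite concrete, here is a rough sketch of just the
string-handling step, written in shell rather than Perl so it can be tried
on its own. The function name normalize_gpu_request is made up for
illustration, and the sketch assumes the hard resource list reaches the JSV
as a comma-separated string in which a valueless request shows up either as
a bare "gpu" or as "gpu=TRUE"; I have not verified which form is actually
passed, so both are handled. Reading and writing the list through the JSV
interface itself is not shown.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): rewrite a bare or
# boolean "gpu" request in a comma-separated resource list into the
# consumable form "gpu=1". All other entries pass through unchanged.
# The body runs in a subshell, so the IFS and globbing changes stay local.
normalize_gpu_request() (
    set -f          # no filename expansion on entries such as h=node*
    IFS=,           # split the list on commas
    out=
    for item in $1; do
        case $item in
            # Assumption: a valueless request appears in one of these forms.
            gpu|gpu=TRUE|gpu=true) item="gpu=1" ;;
        esac
        # Re-join with commas, without a leading comma on the first entry.
        out=${out:+$out,}$item
    done
    printf '%s\n' "$out"
)

# Example calls:
normalize_gpu_request 'gpu'                 # prints: gpu=1
normalize_gpu_request 'h_vmem=4G,gpu=TRUE'  # prints: h_vmem=4G,gpu=1
normalize_gpu_request 'gpu=2,TITANX=1'      # prints: gpu=2,TITANX=1
```

An explicit count such as gpu=2 is left alone, so only the old-style
valueless requests are changed.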

Any suggestions about this plan?

Thanks,

Mark


