[gridengine users] Consumable configuration best practices question for hundreds of resources for specific group of nodes

William Hay w.hay at ucl.ac.uk
Mon Mar 30 08:41:10 UTC 2015

On Sun, 29 Mar 2015 08:50:15 +0000
Yuri Burmachenko <yuribu at mellanox.com> wrote:

> Users will care about which cells they are using.

Could you confirm that my understanding below is correct:
The users of this system care which cells they need to use for reasons other than avoiding oversubscription of the cell. 
Cell 25 is fundamentally different from cell 39 even when both are free.  
The users want to be able to tell the scheduler which cells to use rather than being able to write a job script that can read a list of cells
to use from a file or similar.

If all the above is true then your 300 different complex_values are probably unavoidable but it won't be pretty.
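
To give a sense of what that involves, here is a minimal Python sketch that generates the 300 complex definitions instead of typing them by hand. The names cell1..cell300 are my assumption, not anything from your site. The columns follow the table that qconf -mc edits (name, shortcut, type, relop, requestable, consumable, default, urgency); check the output against complex(5) on your version before loading it.

```python
# Hypothetical naming scheme: one INT consumable per cell, cell1..cell300.
NUM_CELLS = 300


def complex_lines(num_cells=NUM_CELLS):
    """Return one complex definition line per cell.

    Columns: name shortcut type relop requestable consumable default urgency
    """
    return [
        f"cell{i} cell{i} INT <= YES YES 0 0"
        for i in range(1, num_cells + 1)
    ]


if __name__ == "__main__":
    # Print the lines so they can be pasted into the editor opened by qconf -mc.
    print("\n".join(complex_lines()))
```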

> Our partial solution should allow the users to control/monitor/request/free these cells.
> I looked into the links https://arc.liv.ac.uk/trac/SGE/ticket/1426 and http://gridengine.eu/grid-engine-internals/102-univa-grid-engine-810-features-part-2-better-resource-management-with-the-rsmap-complex-2012-05-25 - I see that many consumable resources can be attached on host basis with RSMAP.

Not entirely.  AIUI (and we're not Univa customers), RSMAP resources can be associated with queues or the global host as well.  Also, you request the number of resources you want, but UGE assigns the specific resources (cells in your case) that your job will use.  If I'm understanding you correctly, that won't work for you.

> We need to be able to attach these 300 consumable resources as shared between 4 nodes – is it possible? Maybe a separate queue for these 4 particular hosts with list of complex consumable resources?

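That doesn't work because resources defined on a cluster queue exist separately for each queue instance, so each of the 4 hosts would get its own copy of every cell rather than sharing a single one.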

Grid Engine doesn't have a simple way to associate a resource with a group of hosts other than the cluster as a whole.  What you can do is define resource availability on the global pseudo host, then add a restriction by some means to prevent usage other than on the hosts in question:
* You could define your queue configuration so that all queues on all other nodes have 0 of the resource available, while the queues on the nodes with access say nothing about availability and therefore see the full amount defined on the global host.
* You could define the resources as having 0 availability on the hosts other than the ones in question (i.e. in the host configuration rather than the queue configuration).
* You could probably also do the same with resource quotas.

The first of the above is probably simplest/least work assuming your existing queue configuration is simple.
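
As a rough sketch of the first option, the Python below builds the two complex_values entries involved: one unit of each cell on the global host, and a per-hostgroup override that zeroes every cell on the other nodes. The cell names and the @nocells hostgroup are hypothetical, and I'm assuming the queue's complex_values is currently NONE. The bracketed override syntax is as I understand it from queue_conf(5), so please verify it on your installation.

```python
def global_complex_values(num_cells):
    """complex_values entry for the global pseudo host: one unit of each cell."""
    return ",".join(f"cell{i}=1" for i in range(1, num_cells + 1))


def queue_complex_values(num_cells, blocked_hostgroup="@nocells"):
    """complex_values entry for a cluster queue.

    Queue instances on hosts in blocked_hostgroup (hypothetical name) get 0 of
    every cell; all other queue instances say nothing and so fall back to the
    amounts defined on the global host.
    """
    zeros = ",".join(f"cell{i}=0" for i in range(1, num_cells + 1))
    return f"NONE,[{blocked_hostgroup}={zeros}]"


if __name__ == "__main__":
    # Intended for pasting into the editors opened by
    # `qconf -me global` and `qconf -mq <queue>` respectively.
    print("complex_values", global_complex_values(300))
    print("complex_values", queue_complex_values(300))
```

Here @nocells would contain every host except your 4 nodes, so only those 4 see the cells defined on the global host.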

> All cells are different and users will need to know which one they need to request. At this stage they all should be distinct.

OK.  If users request a lot of different cells for individual jobs, this will probably lead to long delays before jobs start.  Said users will almost certainly want to request a dynamic reservation for their jobs (-R y).
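
For illustration, a submission asking for two specific cells with a reservation might be assembled as in the sketch below. The cell names and job.sh are hypothetical; -R y and -l are the standard qsub options.

```python
import shlex


def qsub_command(cells, script):
    """Build a qsub argv requesting one unit of each named cell, with a reservation."""
    resources = ",".join(f"cell{c}=1" for c in cells)
    return ["qsub", "-R", "y", "-l", resources, script]


if __name__ == "__main__":
    # Show the command line a user would type for cells 25 and 39.
    print(shlex.join(qsub_command([25, 39], "job.sh")))
```

Note that reservations only take effect if max_reservation in the scheduler configuration is non-zero.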

William Hay <w.hay at ucl.ac.uk>