[gridengine users] preventing certain jobs from being suspended (subordinated)

Reuti reuti at staff.uni-marburg.de
Thu Sep 5 12:11:51 UTC 2019


> On 05.09.2019 at 13:57, Tina Friedrich <tina.friedrich at it.ox.ac.uk> wrote:
> 
> We had this problem lots, and I can't quite remember how I solved it - I 
> think it might've been either a JSV or a qsub wrapper that shoves all 
> GPU jobs into the superordinate queue.
> 
> Now that I'm thinking about this again - does the subordinate queue 
> setting accept 'queue@@hostgroup' syntax like everything else? Don't 
> remember if I ever tried that.

Yes, the subordination can be limited to certain machines only:

subordinate_list      NONE,[@intel2667v4=short]

-- Reuti
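
The qsub wrapper idea mentioned at the top of this message could be sketched roughly as below. This is only an illustration, not the wrapper that was actually used: the complex name "gpu" comes from the "-l gpu" request in the original question, while the queue name "short.q" as the target and both function names are assumptions. It also scans every "-l" pair naively, so it does not tell qsub options apart from arguments meant for the job script.

```python
GPU_COMPLEX = "gpu"        # assumed complex name, from "-l gpu" in the thread
TARGET_QUEUE = "short.q"   # assumed name of the superordinate queue


def requests_gpu(args, complex_name=GPU_COMPLEX):
    """Return True if any "-l" resource list in args names the GPU complex."""
    for flag, value in zip(args, args[1:]):
        if flag != "-l":
            continue
        # A resource list looks like "h_rt=1:00:00,gpu=1" or just "gpu".
        for item in value.split(","):
            if item.split("=", 1)[0].strip() == complex_name:
                return True
    return False


def route_gpu_job(args, queue=TARGET_QUEUE):
    """Prepend "-q <queue>" for GPU jobs that do not name a queue themselves."""
    if requests_gpu(args) and "-q" not in args:
        return ["-q", queue] + list(args)
    return list(args)
```

A wrapper script would pass its own argument list through route_gpu_job() and then call the real qsub with the result. A JSV could apply the same check on the server side, which would also cover submissions that bypass the wrapper.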


> Tina
> 
> On 04/09/2019 21:52, Reuti wrote:
>> 
>> On 04.09.2019 at 21:58, bergman at merctech.com wrote:
>> 
>>> Our SoGE (8.1.6) configuration has essentially two queues: one for "all"
>>> jobs and one for "short jobs". The all.q is subordinate to the short.q,
>>> and short jobs can suspend a job in the general queue. At the moment, the
>>> all.q has nodes with & without GPU resources (not ideal, not permanent,
>>> probably to be replaced in the future with multiple queues, but it's
>>> what we have now).
>>> 
>>> Our GPU jobs do not stop or free resources when suspended (OK, the CPU
>>> portion may respond correctly to SIGSTOP, but the GPU portion keeps
>>> running).
>>> 
>>> Is there any way, with our current number of queues, to exempt jobs
>>> using a GPU resource complex (-l gpu) from being suspended by short jobs?
>> 
>> Not that I'm aware of. Almost 10 years ago I had a similar idea:
>> 
>> https://arc.liv.ac.uk/trac/SGE/ticket/735
>> 
>> -- Reuti
>> 
>> _______________________________________________
>> users mailing list
>> users at gridengine.org
>> https://gridengine.org/mailman/listinfo/users
>> 


