[gridengine users] preemption mechanism with virtual resources
reuti at staff.uni-marburg.de
Sat Mar 23 17:57:49 UTC 2013
On 22.03.2013 at 17:55, HUMMEL Michel wrote:
> Thank you for taking the time to help me.
> In our cluster we have critical resources named DCh (specific hardware used through TCP sockets).
> The DChs are limited in number, and if a high priority job is submitted to the cluster it must be able to take DChs from lower priority jobs if needed (this is what I mean by preemption).
> To do this I tried to use:
> - A "global complex" to manage the DChs. (works fine)
> - The parameter subordinate_list on lists which were sharing the "global complex"
SGE has no automatic mechanism for this. What can be done is to script it outside of SGE:
- define a BOOL complex "urgent":
urgent urgent BOOL == FORCED NO 0 0
- request "-l urgent=FALSE" for all normal jobs (e.g. by sge_request or a JSV)
- request "-l urgent=TRUE" for the really urgent jobs
- the value of "urgent" needs to be defined on the global level (`qconf -me global`) so that it reads "complex_values urgent=FALSE"
- in case an urgent job is waiting:
a) set the global value "complex_values urgent=TRUE" *)
b) `qdel` any of the normal jobs, so that the global count of the consumable "DCh" increases again
c) due to the defined "urgent" complex only this type of job can start right now
d) after the job started, the global value of "urgent" can be reverted again
HTH - Reuti
*) can be scripted by: `qconf -mattr exechost complex_values urgent=TRUE global`
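
A minimal sketch of steps a), b) and d) as shell functions, assuming the "urgent" complex is set up as described above. The helper `run`, the `DRY_RUN` switch and the function names are made up for illustration; only the `qconf -mattr` and `qdel` calls come from the steps themselves. Which normal jobs to delete, and how to detect that the urgent job has started (step c), are left to the caller.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1 the command is only printed,
# otherwise it is executed as given.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# a) from now on only jobs requesting "-l urgent=TRUE" can be scheduled
open_urgent_window() {
    run qconf -mattr exechost complex_values urgent=TRUE global
}

# b) delete the chosen normal jobs (job ids passed as arguments),
#    so that the global count of the consumable "DCh" increases again
free_dch() {
    for job_id in "$@"; do
        run qdel "$job_id"
    done
}

# d) revert the global value once the urgent job has started
close_urgent_window() {
    run qconf -mattr exechost complex_values urgent=FALSE global
}
```

A possible call sequence would be `open_urgent_window`, then `free_dch` with the ids of the jobs to sacrifice, then `close_urgent_window` after the urgent job shows up as running. Running it once with `DRY_RUN=1` shows the commands without touching the cluster.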
PS: In principle a high "urgency" value could also put the urgent job on top of the list of waiting jobs, but this might be influenced by other priority settings too and is not safe IMO.
PPS: Without an additional complex it would be necessary to put all other jobs on `qhold` and `qrls` them afterwards. Getting a proper list of the jobs which should be affected by this could be more complicated.
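
A rough sketch of that `qhold`/`qrls` variant, under the assumption that the job ids can be taken from the first column of plain `qstat` output (e.g. `qstat -s p -u '*'` for the pending jobs of all users). The function names, the `run` helper and the `DRY_RUN` switch are hypothetical; the list of held ids has to be kept somewhere so that the same jobs can be released later.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1 the command is only printed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Read qstat output on stdin and print the job ids found in column 1,
# leaving out the urgent job whose id is given as $1.
# Header and separator lines are skipped because column 1 is not numeric.
pending_job_ids() {
    awk -v skip="$1" '$1 ~ /^[0-9]+$/ && $1 != skip { print $1 }'
}

# Put a user hold on every job id given as argument.
hold_jobs() {
    for job_id in "$@"; do
        run qhold "$job_id"
    done
}

# Release the hold again on every job id given as argument.
release_jobs() {
    for job_id in "$@"; do
        run qrls "$job_id"
    done
}
```

Something like `ids=$(qstat -s p -u '*' | pending_job_ids 4711)` followed by `hold_jobs $ids`, and `release_jobs $ids` once the urgent job (here with the made-up id 4711) is running, would cover the pending jobs. Jobs submitted in between are not caught by this, which is part of why the extra complex is easier to handle.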
> The objective was to allow a high priority job to requeue a lower priority job, but it doesn't seem to work with jobs asking for complex resources.
> Do you have any solution?
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Friday, 22 March 2013 17:17
> To: HUMMEL Michel
> Cc: users at gridengine.org
> Subject: Re: [gridengine users] preemption mechanism with virtual resources
> On 22.03.2013 at 15:59, HUMMEL Michel wrote:
>> Sorry for the mistake, it works when queues are sharing the same resources (nodes).
>> But I cannot make it work with a global consumable resource:
>> I am trying to create a global resource which can be preempted by high priority job.
> Resources will never be preempted in advance, as SGE can't look ahead. Suspension by subordination is the result of a new job starting on an exechost, but neither slots nor memory nor anything similar will be freed. E.g. licenses already used up on a global level will still be in use from SGE's point of view.
> What behavior do you need in detail?
> -- Reuti
>> Thank you for helping. Is it clearer now?
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: Friday, 22 March 2013 15:37
>> To: HUMMEL Michel
>> Cc: users at gridengine.org
>> Subject: Re: [gridengine users] preemption mechanism with virtual resources
>> On 22.03.2013 at 15:02, HUMMEL Michel wrote:
>>> I am testing the preemption mechanism of OSG using the subordinate_list parameter on lists which are using the same resources.
>> Lists of machines, i.e. hostgroups?
>>> It works like a charm with a classical resource like CPU, but it doesn't seem to work with virtual resources.
>>> Has anyone successfully configured the preemption mechanism with virtual resources?
>> What do you mean by virtual resources?
>> -- Reuti