[gridengine users] Abaqus job suspension & Olesen FlexLM integration
reuti at staff.uni-marburg.de
Tue Nov 22 10:27:47 UTC 2016
> On 22.11.2016 at 09:52, Goes, Patrick <patrick.goes at arcelormittal.com> wrote:
> To optimize the utilization of our pool of Abaqus licenses, I want to implement some form of preemptive scheduling, where urgent jobs can force suspension of less urgent ones.
> I have used Marc Olesen’s qlicserver (run as a load sensor) to account for external uses of licenses, which works fine.
> I have created an Abaqus-dedicated queue with custom suspend, resume and terminate methods.
> The abaqus jobs that run on execution hosts are suspended all right, and the licenses are released correctly as far as the FlexLM license server is concerned, but for SGE they are *not* released, and consequently, no new jobs that need them are executed.
> Since qlicserver compares the license use reported by the FlexLM server with that of SGE to determine the external (non-SGE) uses, and adjusts (reduces) the SGE complex accordingly, it seems possible to do something similar for suspended jobs: they are reported by qstat but not by FlexLM. Their license use count could be used to increase the SGE complex as long as they are suspended.
> As far as I know, this would require an extension of the qlicserver.
> Or am I missing existing possibilities?
> Are any of you aware of similar efforts in that direction? Or of alternative solutions?
Unfortunately no. Although custom suspend and resume procedures can be defined, and these can even correct any overbooking of licenses, there is no "look-ahead" feature in SGE. This means that SGE can't see that the available licenses will be increased by X if job Y is going to be suspended. So the job which would lead to the suspension of another job is never scheduled.
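
For illustration only, the accounting proposed in the question could be sketched roughly as below. This is not qlicserver's actual code: the function name and all four parameters are hypothetical, and it assumes that FlexLM reports only licenses that are really checked out, while SGE keeps debiting suspended jobs against the consumable. It also does not add any look-ahead to the scheduler; it only shows how the complex value would have to be corrected once jobs are already suspended.

```python
def adjusted_complex(total, flexlm_in_use, sge_running, sge_suspended):
    """Return the value to set on the SGE license consumable (hypothetical helper).

    total          -- licenses served by the FlexLM server
    flexlm_in_use  -- licenses FlexLM currently reports as checked out
    sge_running    -- licenses requested by running (not suspended) SGE jobs
    sge_suspended  -- licenses requested by suspended SGE jobs
    """
    # External (non-SGE) use: whatever FlexLM sees beyond the running SGE jobs.
    # Clamp at zero in case the two reports are momentarily out of step.
    external = max(0, flexlm_in_use - sge_running)
    # SGE still counts suspended jobs against the consumable although they
    # hold no FlexLM license, so their count is added back (assumption).
    return total - external + sge_suspended


if __name__ == "__main__":
    # 10 licenses; SGE jobs: 3 running, 4 suspended; 2 used outside SGE.
    value = adjusted_complex(total=10, flexlm_in_use=5, sge_running=3, sge_suspended=4)
    # SGE debits running + suspended requests from the complex value.
    free_in_sge = value - (3 + 4)
    print(value, free_in_sge)
```

With these example numbers the licenses SGE considers free match the licenses FlexLM actually has free (10 minus 5 in use). Without the correction for suspended jobs, SGE would see only one free license.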
> Thank you, best regards,
> Patrick L. Goes