[gridengine users] PE Job Suspend / Resume
rayrayson at gmail.com
Wed Jun 13 06:19:24 UTC 2012
On Wed, Jun 13, 2012 at 1:47 AM, Erik Soyez
<E.Soyez at science-computing.de> wrote:
> You probably need some kind of cronjob to suspend and unsuspend your
> parallel jobs correctly. Or does anyone have a patch for this?
So was it really working when you tried it with SGE 6.2u5?
I have not looked in detail into the code that handles parallel job
suspension (we were working on "near-by" code in 2008, and Shannon was
also looking into suspending parallel jobs at that time, so we just
relied on him to debug that code :-D ).
However, in order to properly handle the case you mentioned, the
qmaster will need to keep track of the number of times subordination
happens to a job. And I can already think of issues if the accounting
code is not accurate enough.
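To make the counting idea concrete: a job that has been subordinated more than once should only be resumed after every one of those suspensions has been lifted. Below is a minimal sketch in Python, assuming a simple per-job counter. The class and method names are hypothetical and this is not the actual qmaster code.

```python
class SubordinationTracker:
    """Hypothetical per-job suspend counter (illustration only, not SGE code)."""

    def __init__(self):
        # job_id -> number of subordinations currently in effect for that job
        self._counts = {}

    def subordinate(self, job_id):
        """Record one more subordination.

        Returns True only for the first one, i.e. when the job actually
        has to be suspended now.
        """
        self._counts[job_id] = self._counts.get(job_id, 0) + 1
        return self._counts[job_id] == 1

    def unsubordinate(self, job_id):
        """Lift one subordination.

        Returns True only when the last one is lifted, i.e. when the job
        may actually be resumed.
        """
        current = self._counts.get(job_id, 0)
        if current == 0:
            # This is the kind of accounting error that would cause trouble.
            raise ValueError("job %s is not subordinated" % job_id)
        if current == 1:
            del self._counts[job_id]
            return True
        self._counts[job_id] = current - 1
        return False

    def is_suspended(self, job_id):
        """A job stays suspended while at least one subordination is active."""
        return self._counts.get(job_id, 0) > 0
```

If the counter ever drifts (a missed increment or a double decrement), the job either resumes too early or never resumes, which is the accuracy concern above.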
Do you know if other batch systems handle the case you mentioned correctly?
> Regards, Erik Soyez.
> On Tue, 12 Jun 2012, Joseph Farran wrote:
>> Well, for our needs, we *REALLY* need Parallel Job suspension. It's
>> not even a choice for us.
>> If Torque/Maui can do it, I am sure OGE can do it without issues.
>> Can someone please tell me what patch I need to install to un-break /
>> turn-on Parallel job suspension?
>> If you guys are that paranoid about PE suspension, how about adding an
>> on/off flag for this since the code is already there and let the admin pick?
>> On 06/12/2012 06:52 AM, Dave Love wrote:
>>> "Joseph A. Farran"<jfarran at uci.edu> writes:
>>>> If you guys are taking requests, *please* add suspension and ignore old
>>>> Sun recommendation.
>>> Support for suspension exists, it's just broken (per the issue Reuti
>>> pointed to). The use of | is clearly wrong, but the other bit isn't
>>> clear. It's one of the available patches I wanted to understand before
>>> applying (and had forgotten about). Can anyone cast more light on it?