[gridengine dev] [PATCH] Fix PE task array job failure due to missing job script on execd

Joachim Gabler jgabler at univa.com
Wed Dec 14 10:57:10 UTC 2011


Hi Mark,

Thank you for this comprehensive and well-documented patch.

The code looks good to me except for one small thing:
you are using the JG_tag_slave_job field in the
JAT_granted_destin_identifier_list to detect whether the job is master
or slave on the host.
JG_tag_slave_job is just a tag that sge_qmaster uses for bookkeeping
during job delivery, job finish and cleanup. The algorithms in
sge_qmaster might change, which would then break your fix.

Checking the jatep for the JSLAVE state should give you the same
information, something like:
if (lGetUlong(jatep, JAT_status) != JSLAVE) {
    master_jobs++;
}
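
To show the counting logic in isolation, here is a rough, self-contained
sketch. It is not Grid Engine code: struct task and TASK_SLAVE are
hypothetical stand-ins for the jatep element and the JSLAVE state, the
numeric values are made up, and count_master_tasks is an invented helper
name. In execd the status would be read with lGetUlong(jatep, JAT_status)
instead of a struct member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real task states; values are illustrative. */
enum task_status { TASK_RUNNING = 1, TASK_SLAVE = 2 };

/* Hypothetical stand-in for a ja_task element (jatep). */
struct task {
    unsigned long status; /* plays the role of JAT_status */
};

/* Count the tasks on this host that are not in the slave state. */
static size_t count_master_tasks(const struct task *tasks, size_t n)
{
    size_t master_jobs = 0;

    assert(tasks != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* Same test as suggested above: anything not slave counts as master. */
        if (tasks[i].status != TASK_SLAVE) {
            master_jobs++;
        }
    }
    return master_jobs;
}
```

The point of keying on the task state is that it does not depend on
qmaster-internal bookkeeping tags.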

Is there a way to reliably reproduce the issue?
Or some set-up with a high probability of seeing it?

Thanks,

    Joachim


Mark Dixon wrote:
> Commit message from the attached patch (prepared against 6.2u5) 
> reproduced below... please let me know if this is useful and/or if 
> there are any problems with it :)
>
> Cheers,
>
> Mark
