[gridengine users] e-mail when job array finishes

Fritz Ferstl fferstl at univa.com
Thu Nov 7 17:56:04 UTC 2013



> Fritz Ferstl <mailto:fferstl at univa.com>
> 7. November 2013 18:26
> Hmm ... you could use a job context variable (see the qsub/qalter 
> man page on job context vars and the options to set them) for instance 
> as a counter to ascertain in a prolog that a particular task really is 
> the last from an array job.
I meant "epilog", not "prolog" here.
>
> A JSV could insert the option to set the job context variable to the 
> number of tasks (to save the user from having to do it) and then the 
> epilog of the tasks could decrease that counter by 1 unless the 
> current count is 1 in which case the task is the last and it can send 
> the e-mail.
>
> This approach has a number of pitfalls which would need to be 
> addressed, however. If more than one task were to finish at about the 
> same time then there'd be a race condition during the step of 
> decreasing the counter. It is not an atomic operation. You have to 
> read the content of the context var first, then decrease it and then 
> set it back in the job context. You probably could address this with 
> another context var serving as a lock, though. (That's several client 
> communications with the qmaster per task, however, and on a throughput 
> cluster certainly not a good idea for performance reasons.)
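
A minimal sketch of that counter logic as an epilog helper. It assumes the counter lives in a context variable named "remaining" (a hypothetical name, set for instance with "qsub -ac remaining=10 -t 1-10 ...") and that "qstat -j" prints the job context on a line starting with "context:". Both assumptions should be checked against your installation. The read/decrement/write sequence is left unlocked on purpose, so it has exactly the race described above.

```shell
#!/bin/sh
# Sketch only: decrement a job context counter from an epilog.
# "remaining" is a hypothetical context variable name.

# Print the value of context variable $2 of job $1.
# Assumption: `qstat -j` shows a line like "context:   remaining=3,foo=bar".
get_ctx() {
    qstat -j "$1" | sed -n 's/^context:[[:space:]]*//p' | tr ',' '\n' |
        sed -n "s/^$2=//p" | head -n 1
}

# Return 0 if this task is the last one, 1 after decrementing the
# counter, 2 if the counter could not be read. NOT atomic.
decrement_or_last() {
    job_id=$1
    count=$(get_ctx "$job_id" remaining)
    [ -n "$count" ] || return 2
    if [ "$count" -le 1 ]; then
        return 0
    fi
    # Write the decremented value back into the job context.
    qalter -ac "remaining=$((count - 1))" "$job_id" >/dev/null
    return 1
}
```

In an epilog this could be used as: decrement_or_last "$JOB_ID" && send the e-mail. Each call costs one qstat and usually one qalter request to the qmaster, which is the performance concern mentioned above.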
>
> Another issue arises if the epilog script of a task does not get 
> executed properly, e.g. because the node goes down while the epilog 
> is running. In that case the counter never reaches 1 and no e-mail is 
> sent. That should happen very rarely, though.
>
> It's still somewhat cumbersome to do all this and hence I'd go for 
> Reuti's solution if it was me. A wrapper script for job submission 
> could ensure that the dependent dummy job gets submitted without a 
> need for the end user to think about it.
>
> Cheers,
>
> Fritz
>
>
> Txema Heredia <mailto:txema.llistes at gmail.com>
> 7. November 2013 17:42
> On 07/11/13 16:32, Reuti wrote:
>> Hi,
>>
>> On 07.11.2013 at 15:28, Arnau Bria wrote:
>>
>>> I'd like to get an e-mail when a job array finishes.
>>>
>>> I was looking at
>>> http://comments.gmane.org/gmane.comp.clustering.gridengine.users/19962
>>> and did a simple condition: when SGE_TASK_ID = SGE_TASK_LAST, then e-mail.
>>> But someone told me that the last task may not be the last one to
>>> finish; e.g., in an array of 10 tasks, the 3rd may be the one that
>>> finishes last, so I wouldn't get the e-mail when the array finishes
>>> but when the last-numbered task finishes.
>>>
>>>
>>> So I'm thinking about how to manage this, and I'm wondering if there's
>>> another solution than running/parsing qstat every time a task finishes
>>> to guess whether it's the last one.
>>>
>>> * I'd like to leave this in the job side, nothing like "daemons"
>>>   running in the server or even prolog... (if possible).
>>>
>>> Anyone with something more elegant?
>> Instead of getting an email from a particular task, you could submit 
>> a follow-up job with -hold_jid which depends on this one. For this 
>> follow-up job you will then get just one email. Depending on your 
>> cluster setup, it might be necessary to have some kind of dummy queue 
>> with a CPU time limit of 10 seconds or so, which will always accept 
>> jobs (e.g. via a forced "mail_only" boolean complex; it could even 
>> reside on the master node).
>>
>> -- Reuti
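
The follow-up job idea could be wrapped roughly like this. It is a sketch, not a tested site wrapper: submit_with_mail and job_id_from_terse are hypothetical helper names, and it assumes that "qsub -terse" prints the job ID followed by the task range (e.g. 12345.1-10:1) for array jobs. A site with a dedicated mail-only queue would add its own resource request to the second qsub call.

```shell
#!/bin/sh
# Sketch only: submit an array job plus a held dummy job that sends the mail.

# Strip the task range from `qsub -terse` output: "12345.1-10:1" -> "12345".
job_id_from_terse() {
    printf '%s\n' "${1%%.*}"
}

# Usage: submit_with_mail ADDRESS qsub-args...
submit_with_mail() {
    addr=$1
    shift
    terse=$(qsub -terse "$@") || return 1
    jid=$(job_id_from_terse "$terse")
    # The dummy job is held until every task of $jid has finished;
    # "-m e" then sends a single mail when the dummy job itself ends.
    qsub -hold_jid "$jid" -m e -M "$addr" -N "done_$jid" -b y /bin/true
}
```

A call such as: submit_with_mail user@example.org -t 1-10 job.sh would then replace the plain qsub call, so the end user does not have to submit the dependent job by hand.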
>>
> That is a good solution, but it relies on the users, i.e. it is not 
> automated.
>
> The best "automatic" solution would be an epilog script that checks 
> whether "qstat | grep $JOB_ID | wc -l" returns 1 and acts accordingly. 
> But then again you have the problem of two tasks finishing at once 
> (rare), and of users who request "-m e" in their task jobs.
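
A sketch of such an epilog check, assuming the epilog runs as the job owner (so that plain qstat lists the job) and that JOB_ID is set in the epilog environment. The function names are hypothetical. It matches the job-ID column exactly instead of using grep, so job 1234 is not confused with job 12345, but it does nothing about two tasks finishing at the same moment.

```shell
#!/bin/sh
# Sketch only: decide in an epilog whether the finishing task is the last one.

# Count the qstat lines whose first column is exactly job ID $1.
remaining_lines() {
    qstat | awk -v id="$1" '$1 == id { n++ } END { print n + 0 }'
}

# Return 0 if only one line (the task that is just finishing) is left.
is_last_task() {
    [ "$(remaining_lines "$1")" -eq 1 ]
}
```

In the epilog this could be used as: is_last_task "$JOB_ID" && send the e-mail.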
>
>
>>> TIA,
>>> Arnau
>>> _______________________________________________
>>> users mailing list
>>> users at gridengine.org
>>> https://gridengine.org/mailman/listinfo/users

-- 

Fritz Ferstl | CTO and Business Development, EMEA
Univa Corporation <http://www.univa.com/> | The Data Center Optimization Company
E-Mail: fferstl at univa.com | Phone: +49.9471.200.195 | Mobile: +49.170.819.7390

Where Grid Engine lives

*Visit us at SC13 at booth #4101*!
