[gridengine users] Message in stderr after exceeding resources
reuti at staff.uni-marburg.de
Wed Mar 2 19:32:19 UTC 2011
On 02.03.2011 at 20:16, Chris Jewell wrote:
> On 2 Mar 2011, at 18:59, Reuti wrote:
>> On 02.03.2011 at 19:37, Chris Jewell wrote:
>>> I was wondering if it was possible to get GE to output an error message to the stderr file in response to a job being killed due to it exceeding a resource request?
>> yep, it's sometimes not easy to investigate why a job was killed, as you have to check the messages file of the appropriate nodes. As you have only SMP jobs in the parallel case, there is only one machine to check, and the relevant entry can be attached to the email which is sent to the user. Please find attached a mail-wrapper which uses a local messages file, but it can be adjusted to reflect your path. In case you face a race condition where the email is sent before there is an entry in the messages file, a `sleep 5` or similar should help.
> Thanks for that, Reuti. I'm a little confused as to where to include it in the config -- do you mean to replace the mailer on the host with it?
You can place the mailer.sh anywhere you like, as long as all nodes have access to it. I usually put it in a directory $SGE_ROOT/cluster. Then enter the path to this script as the "mailer" in SGE's configuration with `qconf -mconf`.
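As an illustration only, the relevant line in the editor opened by `qconf -mconf` might then look like the fragment below. The path /opt/sge/cluster/mailer.sh is a made-up example of an expanded $SGE_ROOT/cluster location; use the absolute path that applies to your installation.

```
mailer                       /opt/sge/cluster/mailer.sh
```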
> Would it be possible to write to the stderr with an epilogue script, harvesting the same line from the messages file, I wonder?
My experience is: no. At the time the epilog runs, the entry in the messages file has not been written yet, so I decided to put it in the mail wrapper, which is used to send the email after the job has left the node.
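For readers without the attachment, here is a minimal sketch of what such a mail wrapper could look like. It is not the attached script. It assumes that SGE calls the mailer as `mailer -s <subject> <recipient>` with the body on stdin, that the subject starts with "Job <id>", and that the messages entry contains the text "job <id>". The function names, MESSAGES_FILE, REAL_MAILER and MAIL_DELAY are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Sketch of a mail wrapper; all paths and names below are assumptions.

# Assumed location of the node-local execd messages file (adjust to your spool path).
MESSAGES_FILE="${MESSAGES_FILE:-/var/spool/sge/$(uname -n)/messages}"
# The real mail program the wrapper hands the message to (assumption).
REAL_MAILER="${REAL_MAILER:-/usr/bin/mail}"

# Extract the job id from a subject such as "Job 12345 (name) Aborted".
job_id_from_subject() {
    printf '%s\n' "$1" | sed -n 's/^Job[^0-9]*\([0-9][0-9]*\).*/\1/p'
}

# Print the lines of the messages file ($2) that mention job id $1.
messages_for_job() {
    [ -r "$2" ] || return 0
    # Require a non-digit (or end of line) after the id so 123 does not match 1234.
    grep -E "job $1([^0-9]|\$)" "$2" || true
}

# Forward the mail, with the matching messages entries appended to the body.
wrap_mail() {
    subject="$2"                       # arguments are: -s <subject> <recipient>
    jobid=$(job_id_from_subject "$subject")
    sleep "${MAIL_DELAY:-5}"           # give execd time to write the entry
    {
        cat                            # original mail body from stdin
        if [ -n "$jobid" ]; then
            echo
            echo "Entries in $MESSAGES_FILE for job $jobid:"
            messages_for_job "$jobid" "$MESSAGES_FILE"
        fi
    } | "$REAL_MAILER" "$@"
}
```

To use it as mailer.sh, the script would end with a call to `wrap_mail "$@"`.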
It's of course possible to "abuse" the mail-wrapper to also append the line to the stderr of the job. But the setup is a little bit convoluted, as you have to save the name of the stderr file in a persistent file which will survive the end of the job - after the job you can't retrieve its name any longer. This can be done in a job prolog, and we use it to include some entries of the job context in the email later on. Let me know if you need some directions to set it up.
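A rough sketch of that bookkeeping, under several assumptions: the prolog sees JOB_ID and SGE_STDERR_PATH in its environment, the mail wrapper runs as a user who may write to the job's stderr file, and STATE_DIR is a persistent directory visible to both. The function names and STATE_DIR are hypothetical, and array tasks are not handled separately.

```shell
#!/bin/sh
# Sketch only: remember each job's stderr path so it can be used after the job ended.

# Persistent directory for the per-job records (assumed path).
STATE_DIR="${STATE_DIR:-/var/spool/sge/jobinfo}"

# Prolog side: store the stderr path ($2) for job id ($1).
save_stderr_path() {
    mkdir -p "$STATE_DIR" && printf '%s\n' "$2" > "$STATE_DIR/$1.stderr_path"
}

# Mail-wrapper side: append the text on stdin to the stderr file of job id ($1),
# then remove the record so the directory does not fill up.
append_to_job_stderr() {
    record="$STATE_DIR/$1.stderr_path"
    [ -r "$record" ] || return 0       # nothing saved for this job
    target=$(cat "$record")
    if [ -n "$target" ]; then
        cat >> "$target"
    fi
    rm -f "$record"
}
```

The prolog would call `save_stderr_path "$JOB_ID" "$SGE_STDERR_PATH"`, and the mail wrapper would pipe the harvested messages line into `append_to_job_stderr <jobid>` before sending the email.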
> Dr Chris Jewell
> Department of Statistics
> University of Warwick
> CV4 7AL
> Tel: +44 (0)24 7615 0778