[gridengine users] Message in stderr after exceeding resources
Mark Dixon
m.c.dixon at leeds.ac.uk
Mon Mar 14 14:09:54 UTC 2011
On Fri, 4 Mar 2011, Dave Love wrote:
...
>> I've been meaning to dust it off and bring it up to date for 6.2 and where
>> the client spool is local to each compute node (but I arrange for the
>> messages file to end up in the central location anyway),
>
> What's your recommended recipe if one has enough pressure on the spool
> to make the local spooling worthwhile (which we definitely don't)?
* Create a $SGE_ROOT/$SGE_CELL/spool/<HOSTNAME> directory
* Replace the messages file in the local spool directory with a symlink
  pointing to $SGE_ROOT/$SGE_CELL/spool/<HOSTNAME>/messages
The intention is to gain many of the performance advantages of a local
spool, without having to change how you debug job failures.
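As a rough illustration, the two steps above could be scripted along these lines. This is only a sketch under assumptions: the function name and both path arguments are hypothetical, the local spool directory is assumed to exist already, and it is presumably safest to run it while the execd on that node is stopped. Any existing messages content is appended to the central file before the symlink is created.

```python
import os
import shutil
from pathlib import Path


def link_messages_to_central(local_spool: Path, central_spool: Path) -> Path:
    """Replace local_spool/messages with a symlink to central_spool/messages.

    local_spool   -- the node-local execd spool directory (assumed to exist)
    central_spool -- e.g. $SGE_ROOT/$SGE_CELL/spool/<HOSTNAME> on shared storage
    """
    # Step 1: create the per-host directory in the central spool area.
    central_spool.mkdir(parents=True, exist_ok=True)
    central_msgs = central_spool / "messages"
    local_msgs = local_spool / "messages"

    # Nothing to do if the messages file is already a symlink.
    if local_msgs.is_symlink():
        return central_msgs

    if local_msgs.exists():
        # Keep what has been logged so far by appending it to the central file.
        with local_msgs.open("rb") as src, central_msgs.open("ab") as dst:
            shutil.copyfileobj(src, dst)
        local_msgs.unlink()
    else:
        central_msgs.touch()

    # Step 2: point the local messages path at the central file.
    os.symlink(central_msgs, local_msgs)
    return central_msgs
```

After this, anything appended to the messages path in the local spool lands in the central file, so the usual place to look when debugging job failures stays the same.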
However, I'm not doing this yet, as we're hitting a bug (which I keep
meaning to log in a bugtracker and/or try to fix - sound familiar?) where
lots of logging info relating to failed job cleanup is being written
continually. For now, I'm instead copying the data across to the central
spool area at hourly intervals (with cron).
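For what it's worth, an hourly copy of that kind might look something like the sketch below. It is not the actual script in use here: the function name, the temporary file name and the crontab entry in the comment are all hypothetical. The copy goes to a temporary file first and is then renamed, so anyone reading the central copy never sees a half-written file.

```python
import os
import shutil
from pathlib import Path

# Hypothetical crontab entry on each compute node (script path is made up):
#   0 * * * * /usr/local/sbin/copy_sge_messages.py


def copy_messages(local_spool: Path, central_spool: Path) -> Path:
    """Copy local_spool/messages to central_spool/messages.

    Both directory arguments are assumptions about the site layout.
    """
    central_spool.mkdir(parents=True, exist_ok=True)
    target = central_spool / "messages"
    tmp = central_spool / "messages.tmp"

    # Copy to a temporary name in the same directory, then rename into place.
    shutil.copy2(local_spool / "messages", tmp)
    os.replace(tmp, target)
    return target
```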
Mark
--
-----------------------------------------------------------------
Mark Dixon Email : m.c.dixon at leeds.ac.uk
HPC/Grid Systems Support Tel (int): 35429
Information Systems Services Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------