[gridengine users] Message in stderr after exceeding resources

Mark Dixon m.c.dixon at leeds.ac.uk
Mon Mar 14 14:09:54 UTC 2011


On Fri, 4 Mar 2011, Dave Love wrote:
...
>> I've been meaning to dust it off and bring it up to date for 6.2 and where
>> the client spool is local to each compute node (but I arrange for the
>> messages file to end up in the central location anyway),
>
> What's your recommended recipe if one has enough pressure on the spool
> to make the local spooling worthwhile (which we definitely don't)?

* Create a $SGE_ROOT/$SGE_CELL/spool/<HOSTNAME> directory

* Replace the messages file in the local spool directory with a symlink 
pointing to $SGE_ROOT/$SGE_CELL/spool/<HOSTNAME>/messages

The intention is to gain many of the performance advantages of a local 
spool, without having to change how you debug job failures.
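
A minimal sketch of the two steps above, in Python. It is an illustration
under assumptions, not the script actually in use: the function name and
its parameters are hypothetical, the local spool directory is passed in
because its location depends on the site's execd_spool_dir setting, and
appending any existing local messages to the central file before linking
is an added precaution that the recipe does not mention. It is probably
safest to run it while the execution daemon on that node is stopped.

```python
import os
import shutil
from pathlib import Path


def link_messages_to_central(local_spool: Path, central_spool: Path,
                             hostname: str) -> Path:
    """Make <local_spool>/messages a symlink to <central_spool>/<hostname>/messages.

    All three arguments are assumptions of this sketch; central_spool would
    normally be $SGE_ROOT/$SGE_CELL/spool. local_spool must already exist.
    """
    # Step 1: create the per-host directory in the central spool area.
    central_dir = central_spool / hostname
    central_dir.mkdir(parents=True, exist_ok=True)
    central_messages = central_dir / "messages"
    local_messages = local_spool / "messages"

    if local_messages.is_symlink():
        # Already linked to the right place: nothing to do.
        if os.readlink(local_messages) == str(central_messages):
            return central_messages
        local_messages.unlink()
    elif local_messages.exists():
        # Keep what has been logged so far by appending it centrally
        # (an addition of this sketch, not part of the original recipe).
        with open(local_messages, "rb") as src, open(central_messages, "ab") as dst:
            shutil.copyfileobj(src, dst)
        local_messages.unlink()

    # Step 2: replace the local messages file with a symlink.
    central_messages.touch(exist_ok=True)
    local_messages.symlink_to(central_messages)
    return central_messages
```

For a node called node01 whose local spool lives in the (hypothetical)
directory /var/spool/sge/node01, the call would look like
link_messages_to_central(Path("/var/spool/sge/node01"),
Path(os.environ["SGE_ROOT"]) / os.environ["SGE_CELL"] / "spool", "node01").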

However, I'm not doing this yet, because we're hitting a bug (which I keep
meaning to log in a bugtracker and/or try to fix - sound familiar?) where
lots of logging info relating to failed job cleanup is written continually.
For now I'm copying data across to the central spool area at hourly
intervals (with cron) instead.
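
A rough sketch of what such an hourly copy could look like, again in
Python. The message does not say which files are copied or how, so this
assumes it is only the messages file; the function name, its parameters
and the temporary file name are all hypothetical. The copy is written to
a temporary name first and then renamed, so that anyone reading the
central file never sees a half-written copy.

```python
import os
import shutil
from pathlib import Path


def copy_messages_to_central(local_spool: Path, central_spool: Path,
                             hostname: str) -> Path:
    """Copy <local_spool>/messages to <central_spool>/<hostname>/messages.

    Assumes the messages file is the only data that needs to be copied;
    central_spool would normally be $SGE_ROOT/$SGE_CELL/spool.
    """
    central_dir = central_spool / hostname
    central_dir.mkdir(parents=True, exist_ok=True)

    source = local_spool / "messages"
    target = central_dir / "messages"
    tmp = central_dir / "messages.tmp"  # hypothetical temporary name

    # Copy contents and timestamps, then rename into place in one step.
    shutil.copy2(source, tmp)
    os.replace(tmp, target)
    return target
```

Wrapped in a small script, this could be run from a crontab entry such
as the following, where the script path is made up for illustration:

    0 * * * * python3 /usr/local/sbin/copy_sge_messages.py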

Mark
-- 
-----------------------------------------------------------------
Mark Dixon                       Email    : m.c.dixon at leeds.ac.uk
HPC/Grid Systems Support         Tel (int): 35429
Information Systems Services     Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------

