[gridengine users] deciding spool directory location

Chris Dagdigian dag at sonsorol.org
Thu Jan 12 18:50:09 UTC 2012


Hi Dale,

> We are trying to determine where the spool directory should reside based on performance
> versus ease of administration. Can somebody explain how ease of administration would
> be made easier?

Here is a short answer:

When the spool directory is shared it is far easier for an administrator 
to troubleshoot node-specific job issues. This is because you can 
see/access all of the spool/<nodename>/messages files in one convenient 
location without having to hop to a specific machine.

When spool is not shared your spool data and messages are on local disk 
on the compute nodes. This means that you have to connect to that node 
in order to read or examine the files.
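
To make the shared-spool convenience concrete, here is a minimal sketch 
of scanning every node's messages file from one place. It assumes the 
spool/<nodename>/messages layout described above. The function name, the 
example path in the usage note and the "|E|" marker (my assumption about 
how error-level lines are tagged in the messages file) are illustrative 
only, so adjust them to whatever your files actually contain.

```python
from pathlib import Path


def collect_error_lines(spool_root, marker="|E|"):
    """Return {nodename: [matching lines]} for each <spool_root>/<nodename>/messages.

    The "|E|" default marker is an assumption about the messages file
    format; pass a different substring if your files look different.
    """
    hits = {}
    # One glob over the shared spool reaches every node's messages file.
    for messages in sorted(Path(spool_root).glob("*/messages")):
        node = messages.parent.name
        with messages.open(errors="replace") as fh:
            matched = [line.rstrip("\n") for line in fh if marker in line]
        if matched:
            hits[node] = matched
    return hits
```

Calling collect_error_lines("/path/to/shared/spool") (a hypothetical 
path) from any host that mounts the share gives a per-node summary. With 
local spooling the same check means logging in to each node in turn.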

More detail ...


The decision to do shared or not-shared generally revolves around the 
power of your NFS server, what else is talking on that same 
network/subnet/vlan/wire and probably more importantly how many jobs you 
might be running through your system during a day. The number of jobs 
entering and exiting the system is the real factor in how often and how 
hard your spool share is getting hit. Some of my pharma clusters run 
hours-long jobs and might only do a few hundred or thousand jobs per 
day. Another biotech cluster of similar size might be doing 150,000 jobs 
per day running short chemical simulations.
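
As a rough back-of-the-envelope check, the daily job counts above can be 
turned into average job rates. This is only a sketch: the 1,000 jobs/day 
figure is an illustrative pick from the "few hundred or thousand" range, 
and averages hide bursts, so treat the numbers as a lower bound on peak 
load rather than a measurement of spool traffic.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def jobs_per_second(jobs_per_day):
    """Average job throughput, assuming jobs are spread evenly over 24 hours."""
    return jobs_per_day / SECONDS_PER_DAY


# Illustrative workloads; 1_000 is an assumed value within the range quoted above.
workloads = [
    ("hours-long pharma jobs", 1_000),
    ("short chemical simulations", 150_000),
]

for label, per_day in workloads:
    rate = jobs_per_second(per_day)
    print(f"{label}: {per_day} jobs/day is about {rate:.3f} jobs/s on average")
```

At 150,000 jobs per day the average works out to roughly 1.7 jobs per 
second, more than a hundred times the rate of the long-job cluster, 
which is why the two sites stress a shared spool so differently.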

My gut answer is usually to do shared-spool first and only move away 
from that if performance demands it. Changing the spooling location 
post-install is not a huge deal.
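
Before changing anything it helps to know where the execution daemons 
spool today. The sketch below assumes that "qconf -sconf" prints the 
global configuration as whitespace-separated key/value lines and that 
the execd spool location appears there as execd_spool_dir; check both 
against the man pages for your Grid Engine version. The helper names are 
hypothetical.

```python
import subprocess


def parse_sge_conf(text):
    """Parse 'key   value' lines into a dict, skipping comments and blanks."""
    conf = {}
    for line in text.splitlines():
        if line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def current_execd_spool_dir():
    """Ask qconf for the global config and return execd_spool_dir, if present.

    Assumes qconf is on PATH and the SGE environment is already sourced.
    """
    out = subprocess.run(
        ["qconf", "-sconf"], check=True, capture_output=True, text=True
    ).stdout
    return parse_sge_conf(out).get("execd_spool_dir")
```

Splitting the parsing from the qconf call keeps the parser usable on a 
saved copy of the configuration, which is handy for comparing settings 
before and after a move.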

I'm also a classic spooling zealot. I hate berkeleydb spooling and even 
on the 2000-core cluster that does 150,000 jobs per day we still use 
classic spooling on an NFS-shared SGE Root and spool. We are, however, 
using Isilon scale-out NAS for the NFS and that means we have no real 
performance issues at all.

My $.02

-Chris




