[gridengine users] deciding spool directory location
dag at sonsorol.org
Thu Jan 12 18:50:09 UTC 2012
> We are trying to determine where the spool directory should reside based on performance
> versus ease of administration. Can somebody explain how ease of administration would
> be made easier?
Here is a short answer:
When the spool directory is shared it is far easier for an administrator
to troubleshoot node-specific job issues. This is because you can
see/access all of the spool/<nodename>/messages files in one convenient
location without having to hop to a specific machine.
When spool is not shared your spool data and messages are on local disk
on the compute nodes. This means that you have to connect to that node
in order to read or examine the files.
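
To make the shared-spool case concrete, here is a rough Python sketch of
the kind of one-stop check it allows: walk every <nodename>/messages file
under the shared spool directory and pull out the error lines. Treat the
details as assumptions rather than facts about your install: the spool
path is whatever you pass in, the function name is made up for this
example, and I am assuming the usual pipe-delimited messages layout
(timestamp|component|host|level|text) with E and C marking errors and
critical entries.

```python
import sys
from pathlib import Path


def collect_problem_lines(spool_root, levels=("E", "C")):
    """Return {node_name: [lines]} for each <spool_root>/<node>/messages file.

    Assumes each log line looks like: timestamp|component|host|level|text
    Lines that do not match that layout are skipped.
    """
    results = {}
    for messages_file in sorted(Path(spool_root).glob("*/messages")):
        node = messages_file.parent.name
        matches = []
        with messages_file.open(errors="replace") as handle:
            for line in handle:
                text = line.rstrip("\n")
                fields = text.split("|", 4)
                if len(fields) == 5 and fields[3].strip() in levels:
                    matches.append(text)
        if matches:
            results[node] = matches
    return results


if __name__ == "__main__":
    # Hypothetical usage: python check_spool.py /path/to/shared/spool
    for root in sys.argv[1:]:
        for node, lines in collect_problem_lines(root).items():
            print(f"== {node} ==")
            for entry in lines:
                print(entry)
```

With local spooling the same check means logging in to each compute node
(or fanning out over ssh) before you can read anything.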
More detail ...
The decision to do shared or not-shared generally revolves around the
power of your NFS server, what else is talking on that same
network/subnet/vlan/wire and probably more importantly how many jobs you
might be running through your system during a day. The number of jobs
entering and exiting the system is the real factor in how often and how
hard your spool share is getting hit. Some of my pharma clusters run
hours-long jobs and might only do a few hundred or thousand jobs per
day. Another biotech cluster of similar size might be doing 150,000 jobs
per day running short chemical simulations.
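
For a feel of what those daily totals mean, a quick back-of-the-envelope
conversion to average jobs per second helps. This is only a sketch: the
function name is made up, the 1,000 figure is just a stand-in for "a few
hundred or thousand", and real load is bursty, so peaks will be well
above the average. It also says nothing about how many spool writes each
job causes, which I am not going to guess at here.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_jobs_per_second(jobs_per_day):
    """Convert a daily job count into an average per-second rate."""
    return jobs_per_day / SECONDS_PER_DAY


# Illustrative daily totals taken from the two cluster types described above.
for label, jobs_per_day in [("long-job cluster", 1_000),
                            ("short-job cluster", 150_000)]:
    rate = average_jobs_per_second(jobs_per_day)
    print(f"{label}: {rate:.3f} jobs/s on average")
```

The short-job cluster works out to a bit under two jobs starting every
second around the clock, versus roughly one every minute and a half on
the long-job cluster.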
My gut answer is usually to do shared-spool first and only move away
from that if performance demands it. Changing the spooling location
post-install is not a huge deal.
I'm also a classic spooling zealot. I hate Berkeley DB spooling and even
on the 2000-core cluster that does 150,000 jobs per day we still use
classic spooling on an NFS-shared SGE root and spool. We are, however,
using Isilon scale-out NAS for the NFS and that means we have no real
performance issues at all.