[gridengine users] Scheduler Performance
Ansgar.Esztermann at mpi-bpc.mpg.de
Mon Mar 14 17:25:59 UTC 2011
> I/O on the $SGE_ROOT directory can certainly cause the problems you
> report. I would take a look at what your disks are doing with "iostat -x"
> if I were you. You might see a large number of small I/O requests: we
> certainly did.
There are many small requests, but they seem to go to /var, not $SGE_ROOT. Of course, they might come from some process other than SGE: our cluster management software uses MySQL, which also uses /var.
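
To attribute the small requests to a filesystem, it helps to know which block device backs each directory before reading the per-device numbers from "iostat -x". A minimal sketch, assuming a POSIX df and awk; the helper name backing_device and the /opt/sge fallback are placeholders, not part of SGE:

```shell
#!/bin/sh
# Print the device (or NFS export) that backs a given path.
# "df -P" gives the portable one-line-per-filesystem format; field 1 is the device.
backing_device() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# SGE_ROOT is normally exported by the SGE settings file;
# /opt/sge is only a placeholder default for this sketch.
for dir in /var "${SGE_ROOT:-/opt/sge}"; do
    if [ -d "$dir" ]; then
        printf '%s is on %s\n' "$dir" "$(backing_device "$dir")"
    fi
done

# Next steps, left as comments because they need the sysstat package:
#   iostat -x 5 3     # extended per-device statistics, 3 samples 5 s apart
#   pidstat -d 5 3    # per-process I/O, to tell MySQL and SGE apart
```

If /var and $SGE_ROOT turn out to share a device, the iostat figures alone cannot separate them, and the per-process view is the more useful one.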
> * If $SGE_ROOT is not local to the qmaster, MONITOR=1 can itself generate
> a large number of small I/Os and be a significant contributor to the
> problem. Replacing common/schedule with a symlink to a disk local to the
> qmaster resolved many "slow running" problems for us.
> * Do your compute nodes spool to local disk, or to an NFS share?
> ("qconf -sconf | grep execd_spool_dir")
> * Is $SGE_ROOT local to the qmaster?
I was about to write "yes", but that's not entirely true: it's on DRBD.
> * Are you using classic or BDB spooling?
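
The symlink workaround for common/schedule quoted above could be sketched roughly as follows. This is an illustration under assumptions, not a tested procedure: the function name relocate_schedule and the target directory /local/sge are hypothetical, and I assume the schedule file sits in $SGE_ROOT/$SGE_CELL/common and that the move is done as the SGE admin user while the qmaster is not writing to the file.

```shell
#!/bin/sh
# relocate_schedule COMMON_DIR LOCAL_DIR
# Move COMMON_DIR/schedule to LOCAL_DIR and leave a symlink in its place,
# so that MONITOR=1 output lands on a disk local to the qmaster.
relocate_schedule() {
    common_dir=$1
    local_dir=$2

    mkdir -p "$local_dir" || return 1

    # Already relocated: nothing to do.
    if [ -L "$common_dir/schedule" ]; then
        return 0
    fi

    if [ -f "$common_dir/schedule" ]; then
        # Keep the existing contents.
        mv "$common_dir/schedule" "$local_dir/schedule" || return 1
    else
        # No schedule file yet: create an empty one for the link to point at.
        : > "$local_dir/schedule" || return 1
    fi

    ln -s "$local_dir/schedule" "$common_dir/schedule"
}

# Example call (hypothetical target path):
#   relocate_schedule "$SGE_ROOT/$SGE_CELL/common" /local/sge
#
# Related checks for the questions in this thread:
#   qconf -sconf | grep execd_spool_dir    # where the execds spool (from the quoted mail)
#   qconf -ssconf | grep params            # whether MONITOR=1 is set for the scheduler
#   grep spooling_method "$SGE_ROOT/$SGE_CELL/common/bootstrap"
#       # classic vs. berkeleydb; the bootstrap location is assumed to be the default
```

The function is idempotent: a second call finds the symlink and returns without touching anything.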
Max-Planck-Institut für biophysikalische Chemie, Abteilung 105