[gridengine users] Son of Grid Engine 8.0.0c available

Dave Love d.love at liverpool.ac.uk
Wed Nov 9 18:18:10 UTC 2011

A new, less stable release has been available from
<http://arc.liv.ac.uk/downloads/SGE/releases/8.0.0c/> for a while, but I
couldn't post the announcement until now; the release notes are appended.
Major changes include basing qmake and qtcsh on the current gmake and
tcsh sources, and a reworked core binding implementation.

I'm interested in feedback about the core binding, in particular.  It
now uses hwloc <http://www.open-mpi.org/projects/hwloc/>, so it should
work on all platforms likely to be of interest, including
Magny-Cours-style NUMA.  However, I've only used it on Red Hat 5 with
Westmere and several Opteron generations, so it could do with wider
testing.  To build it, you need the hwloc library, release 1.1 or later,
either from the hwloc site above or from your OS distribution.  (The
Red Hat 5 RPMs of hwloc that I used are in the download area.)

A new binding specification for distributed parallel jobs, discussed
previously, is now implemented in the form "-binding [...] linear:slots"
(or simply "-binding linear").  It binds available cores in sequence, up
to the number of slots the job uses on each node.  That does the right
thing when an MPI job uses different numbers of cores on different nodes
without exclusive access.  (When testing with something like hwloc-ps(1),
note that binding all cores on a node filled by one job appears as no
binding, as with the previous implementation.)
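
As a rough illustration of the rule, here is a hypothetical Python sketch, not the Grid Engine code.  It takes per-host slot counts from a PE_HOSTFILE-style listing, assumed here to be "host slots queue processor-range" per line, and on each host picks that many cores in ascending order from the cores not already bound.  The free-core lists are made-up inputs; the real implementation gets that information via hwloc and its own bookkeeping.

```python
"""Hypothetical sketch of "-binding linear:slots" core selection.

Not Grid Engine's implementation: it only illustrates the rule of binding
the first free cores in sequence, as many as the job has slots on a node.
"""
from typing import Dict, List


def parse_pe_hostfile(text: str) -> Dict[str, int]:
    """Return {host: slots} from PE_HOSTFILE-style lines.

    Each line is assumed to be "host slots queue processor-range";
    only the first two fields are used here.
    """
    slots: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or short lines
        host, n = fields[0], int(fields[1])
        # A host may appear once per queue, so accumulate its slots.
        slots[host] = slots.get(host, 0) + n
    return slots


def linear_slots(slots: Dict[str, int],
                 free_cores: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Pick, per host, the first `slots` free cores in ascending order.

    `free_cores` (cores not bound by other jobs) is a hypothetical input.
    If fewer cores are free than slots, this sketch simply binds what is
    free; Grid Engine's actual behaviour in that case may differ.
    """
    binding: Dict[str, List[int]] = {}
    for host, n in slots.items():
        available = sorted(free_cores.get(host, []))
        binding[host] = available[:n]
    return binding


if __name__ == "__main__":
    hostfile = (
        "node01 4 all.q@node01 UNDEFINED\n"
        "node02 2 all.q@node02 UNDEFINED\n"
    )
    free = {"node01": list(range(8)), "node02": [2, 3, 6, 7]}
    print(linear_slots(parse_pe_hostfile(hostfile), free))
    # {'node01': [0, 1, 2, 3], 'node02': [2, 3]}
```

With 8 free cores and an 8-slot job on one node, the sketch binds every core, which is the case that looks like no binding in hwloc-ps.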

"-binding linear:slots" should be a reasonable default; I have it in
sge_request, together with these OpenMPI defaults:

  rmaps_base_schedule_policy = core
  orte_process_binding = core
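
For reference, a sketch of a cluster-wide $SGE_ROOT/$SGE_CELL/common/sge_request
holding just that default (per-user defaults can go in ~/.sge_request
instead; lines starting with # are comments):

  # default options for all submitted jobs
  -binding linear:slots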

Binding still deals only with cores (as in OpenMPI, for instance).  I
resisted a simple extension to cover threads for various reasons.  One
is the need for a more general specification of job placement, though
I'm not sure what form that should take.  I don't think it could be as
general as OAR's but, for instance, it might make it possible to avoid
the PQS interface (which now builds) that you might otherwise use for
network-topology-aware scheduling.
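
To make the core/thread distinction concrete, here is a Linux-only Python sketch that does not use hwloc (hwloc does this portably, and much more).  It reads each logical CPU's physical_package_id and core_id from sysfs and groups the hardware threads by core.  It assumes core_id is unique within a package, which holds on common x86 systems but is not guaranteed.

```python
"""Sketch: group Linux logical CPUs (hardware threads) into physical cores.

Independent of hwloc; reads sysfs directly and assumes core_id is unique
within a physical package (common on x86, but not guaranteed).
"""
import glob
import os
from collections import defaultdict
from typing import Dict, List, Tuple


def group_threads_by_core(
        topology: Dict[int, Tuple[int, int]]) -> Dict[Tuple[int, int], List[int]]:
    """Map (package_id, core_id) -> ascending list of logical CPU numbers."""
    cores: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for cpu, key in sorted(topology.items()):
        cores[key].append(cpu)
    return dict(cores)


def read_sysfs_topology(
        root: str = "/sys/devices/system/cpu") -> Dict[int, Tuple[int, int]]:
    """Return {cpu: (physical_package_id, core_id)} from sysfs, if present."""
    topo: Dict[int, Tuple[int, int]] = {}
    # "cpu[0-9]*" matches cpu0, cpu12, ... but not cpufreq or cpuidle.
    for path in glob.glob(os.path.join(root, "cpu[0-9]*", "topology")):
        cpu = int(os.path.basename(os.path.dirname(path))[3:])
        try:
            with open(os.path.join(path, "physical_package_id")) as f:
                pkg = int(f.read())
            with open(os.path.join(path, "core_id")) as f:
                core = int(f.read())
        except (OSError, ValueError):
            continue  # topology files missing or unreadable
        topo[cpu] = (pkg, core)
    return topo


if __name__ == "__main__":
    cores = group_threads_by_core(read_sysfs_topology())
    threads = sum(len(cpus) for cpus in cores.values())
    print(f"{len(cores)} cores, {threads} hardware threads")
```

On a machine with two threads per core, binding "per core" means each bound unit covers both of that core's logical CPUs.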

Release notes

* Bug fixes

  * Man and other documentation fixes
  * Build/installation fixes (particularly for Red Hat 6 and Linux 3)
  * Fix group ids for submitted jobs [U]
  * Fix default value of boolean with JSV [U]
  * Windows fixes for helper crashes and Vista GUI jobs [U]
  * Ensure parallel jobs are dispatched to the least loaded host [U]
  * Correct ownership of qsub -pty output file; was owned by admin user [U]
  * Fix format of Windows loadcheck.exe output [U]
  * Read from stderr even if stdout is already closed in IJS [U]
  * Fix PDC_INTERVAL=NEVER execd parameter [U]
  * Fix accounting information for Windows GUI jobs [U]
  * Increase default MAX_DYN_EC qmaster param [U]
  * Fix qsub -sync y error message and enforce MAX_DYN_EC correctly [U]
  * Fix job validation (-w e) behaviour [#716] [U]
  * Fix qrsh input redirection [U]
  * Avoid warning when submitting a qrsh job [U]
  * Print start time in qstat -j -xml output [U]
  * Don't raise an error changing resource request on waiting job [#806]
  * Don't exit 0 on error with qconf -secl or -sep
  * Include string.h in drmaa.h [#712]
  * Fix process-scheduler-log with host aliases

* Enhancements

  * Base qmake and qtcsh on the current gmake and tcsh source [#289,
    #504, #832]
  * Support "-binding linear" and "-binding linear:slots"
  * Use the hwloc library for all topology information and core
    binding, supporting more operating systems (now: AIX, Darwin,
    FreeBSD, GNU/Linux, HPUX, MS Windows, OSF/1, Solaris), and more
    hardware types (specifically AMD Magny-Cours and similar)
  * Add task number to execd "exceeds job ... limit" message

* Other changes (possibly-incompatible)

  * Modify default paths in build files and elsewhere [U]
  * Assorted message fixes
  * In RPMs, move qsched to qmaster package, and separate drmaa4ruby
  * Default to newijs in load_sge_config.sh
  * Default to sh, not csh for configured shell
