[gridengine users] Simplifying Parallel Environments

Mark Dixon m.c.dixon at leeds.ac.uk
Thu Feb 2 16:52:01 UTC 2012


On Wed, 1 Feb 2012, Brian Smith wrote:

> I've started a github page for some tools I've put together from various
> bits of code, how-tos, etc. to simplify the setup of parallel
> environments so that they work universally for all MPI implementations
> (on x86_64 Linux) w/ tight-integration support (no support for ssh yet).
>  The syntax for submitting parallel jobs becomes more similar to
> LSF/PBS/Torque and provides for easy configuration of your task layout
> (ppn,nodes,pcpus,pcpus_min,pcpus_max).  We use a JSV to make the magic
> happen.  We create PEs tied to queues since our queues often delineate
> changes in the underlying communication fabrics available.
...

Hi Brian,

I'm glad to see you've written it as an optional alternative to the normal 
way of requesting resources ;)

A few things you might want to consider for future developments:

1) Some sites encode things like interconnect topology in the PE name as 
well as ppn. Perhaps you should read the requested PE and append a 
suffix, instead of overwriting it?

2) Your implementation clearly works well with the "-l exclusive" feature, 
giving users a simplified way to experiment and find the optimum ppn for 
their code, or do mixed-mode parallel programming. Unfortunately, AFAIK 
this doesn't get accounted for properly in usage policy calculations. 
Until an execd_params "ACCT_EXCLUSIVE_USAGE" or similar option appears in 
your favourite GE variant, you might want to try the obvious sorts of ugly 
kludges around this.

3) Personally, I'm really not a fan of managing the machinefile and rsh 
wrappers using the PE's start_proc_args / stop_proc_args. I find that an 
mpirun wrapper script provides a much cleaner and more powerful way to 
achieve this (and many other improvements).
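
To illustrate point 1, here is a minimal sketch of appending to the requested PE name rather than replacing it. The function name and the "_ppn<N>" naming convention are hypothetical, not taken from Brian's tools. A real JSV would still need to read and write the PE through whatever JSV interface it is built on.

```python
def pe_with_suffix(requested_pe, ppn):
    """Return the PE name the user asked for with a ppn suffix appended.

    The "_ppn<N>" convention is an assumption made for illustration only.
    """
    if ppn < 1:
        raise ValueError("ppn must be a positive integer")
    suffix = "_ppn%d" % ppn
    # Do not append twice if the name already carries this suffix.
    if requested_pe.endswith(suffix):
        return requested_pe
    return requested_pe + suffix
```

With this, a site-specific name such as "ib_qdr" would become "ib_qdr_ppn8" and keep its topology information.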
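
As a rough illustration of point 3, the core of an mpirun wrapper might look something like the sketch below. It assumes the usual $PE_HOSTFILE layout of one line per host, with the hostname in the first column and the granted slot count in the second. The function names and the optional ppn cap are hypothetical. It is not a complete wrapper, only the hostfile-to-machinefile step.

```python
def parse_pe_hostfile(text):
    """Return a list of (host, slots) pairs from the contents of $PE_HOSTFILE."""
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        hosts.append((fields[0], int(fields[1])))
    return hosts


def machinefile_lines(hosts, ppn=None):
    """Return one hostname per MPI rank.

    If ppn is given, it caps the number of ranks placed on each host
    (hypothetical option, useful for mixed-mode jobs).
    """
    lines = []
    for host, slots in hosts:
        count = slots if ppn is None else min(slots, ppn)
        lines.extend([host] * count)
    return lines
```

A wrapper would read the file named by $PE_HOSTFILE, write these lines to a temporary machinefile, and then exec the real mpirun with whatever machinefile option that MPI implementation expects.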

All the best,

Mark
-- 
-----------------------------------------------------------------
Mark Dixon                       Email    : m.c.dixon at leeds.ac.uk
HPC/Grid Systems Support         Tel (int): 35429
Information Systems Services     Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------

