[gridengine users] Parallel Environment
reuti at staff.uni-marburg.de
Tue Jun 5 17:50:48 UTC 2012
On 05.06.2012 at 04:29, Rayson Ho wrote:
> On Mon, Jun 4, 2012 at 9:08 PM, Joseph Farran <jfarran at uci.edu> wrote:
>> Telling OGE to use the "mpich" Parallel Environment requesting 64 slots
>> Is there a way of having a generic/default PE environment so one can simply
>> #$ -pe 64
> You can do something similar by defining a generic PE and you can then
> use a generic name.
> Keep in mind that a PE is a somewhat overloaded interface, and it
> defines a lot of things:
> So if for example the mpich PE
In fact, Open MPI and MPICH2 can share the same PE nowadays. You can also use it for HP-MPI, but you would need to convert the list of machines to a format HP-MPI understands (recent versions accept one in the MPICH1 format). In the same way, you could even reformat the list of machines into many other formats and create several machine files in $TMPDIR. The user then just needs to pick the correct one for his application; you could name the file "hpmpi_machines" or similar. As $TMPDIR is removed by SGE after the job anyway, the extra machine files won't matter.
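As an illustration of creating several machine files, here is a minimal sketch that could run from a start_proc_args script or at the top of a job script. It assumes the usual $PE_HOSTFILE layout of one line per host with the host name in the first column and the granted slots in the second. The function names and the file name "mpich2_machines" are made up for this example, and whether a given MPI version accepts a given format should be checked against its documentation.

```shell
#!/bin/sh
# Sketch: derive several machine-file formats from SGE's $PE_HOSTFILE.
# Assumed layout of each line: <host> <slots> <queue> <processor range>

# MPICH1 style: the host name repeated once per granted slot.
to_mpich1_machines() {
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$1"
}

# "host:slots" style, one line per host (hypothetical second format).
to_host_colon_slots() {
    awk '{ print $1 ":" $2 }' "$1"
}

# Only write files when running inside a parallel job.
if [ -n "${PE_HOSTFILE:-}" ] && [ -n "${TMPDIR:-}" ]; then
    to_mpich1_machines  "$PE_HOSTFILE" > "$TMPDIR/machines"
    to_mpich1_machines  "$PE_HOSTFILE" > "$TMPDIR/hpmpi_machines"
    to_host_colon_slots "$PE_HOSTFILE" > "$TMPDIR/mpich2_machines"
fi
```

The user's job script would then hand the matching file to its own launcher, and all of the files disappear together with $TMPDIR when the job ends.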
Such a generic PE won't work if daemons need to be started in the start/stop_proc_args (as for PVM, LAM-MPI or MPICH2 versions < 1.3). Nevertheless, you could use some kind of flag to select one of the startup mechanisms. Drawback: if you touch the PE scripts and mess them up, all parallel applications will fail. Therefore I would prefer one PE for each kind of daemon startup (I say "would" because daemons are no longer used in recent versions).
> and a Hadoop PE are both tight (i.e.
> OGS/GE can control the slave tasks, perform job control & accounting,
> etc), then you can in theory use a common PE for both. And while we
> are on this topic, if you want a tight PE, you will need to use qrsh
> to invoke remote tasks.
..., unless the parallel library already uses it by default, as Open MPI and MPICH2 do when they detect that they are running inside OGE.
> Also, a generic "threaded" PE can be used for OpenMP applications,
> Intel TBB programs, & user-threaded applications as the PE definitions
> are likely to be very similar.
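For the threaded case, a job script could look like the minimal sketch below. The PE name "threaded" is only the generic name suggested above and would have to be defined by the administrator (typically with allocation_rule $pe_slots, so that all slots land on one host). The application name in the last comment is hypothetical.

```shell
#!/bin/sh
#$ -pe threaded 8
#$ -cwd

# SGE sets NSLOTS to the number of granted slots; fall back to 1 outside SGE.
OMP_NUM_THREADS="${NSLOTS:-1}"
export OMP_NUM_THREADS

echo "running with $OMP_NUM_THREADS threads"
# ./my_openmp_program    # hypothetical application binary
```

The same script works for any slot count, since the thread count follows whatever the scheduler granted rather than a hard-coded number.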
>> In our environment, there are *many* users with their own parallel type of
>> programs and I would like to have a generic PE as the default, but I don't know
>> if you can get away with not specifying a PE name?
> I haven't tried it myself, but it looks like you can use a JSV or a qsub
> wrapper for it if you think you really don't want to specify a PE:
> But I would probably tell my users that it is a required parameter and
> thus tell them not to be lazy!
Yes, but the PE requested by the user must already exist (even with a slot count of 0), as the `qsub` parameters will be verified before *and* after the call to a JSV. (This could be a point of discussion: if you want to correct/change/adjust the resource requests in a JSV, OGE should verify the parameters only *after* the adjustment.)
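Because of this verification order, the qsub wrapper route may be the simpler one for a bare `-pe 64`. The following is only a sketch under a few assumptions: "generic" is a hypothetical PE name that already exists in the cluster, the function name is made up, and QSUB must point to the real qsub binary if the wrapper itself is installed under the name "qsub".

```shell
#!/bin/sh
# Sketch of a qsub wrapper: turns "-pe 64" into "-pe generic 64".
# "generic" is a hypothetical PE name and must already exist in the cluster.
DEFAULT_PE="${DEFAULT_PE:-generic}"
QSUB="${QSUB:-qsub}"

qsub_with_default_pe() {
    n=$#
    # Rotate through the original arguments once, appending each to the end.
    while [ "$n" -gt 0 ]; do
        arg=$1
        shift
        set -- "$@" "$arg"
        # If "-pe" is followed directly by a number, insert the default PE name.
        if [ "$arg" = "-pe" ] && [ "$n" -gt 1 ]; then
            case "$1" in
                [0-9]*) set -- "$@" "$DEFAULT_PE" ;;
            esac
        fi
        n=$((n - 1))
    done
    "$QSUB" "$@"
}
```

A wrapper script would end with `qsub_with_default_pe "$@"`. Note the limits of this approach: it only sees command-line options, not `#$ -pe` lines embedded in the job script, and a PE whose name starts with a digit would be mistaken for a slot count.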
>> One other question. We have a cluster here running sge6.2 and it has a PE
>> for mpich with "Allocation Rule" set to "Round Robin". Should this not be
>> set to "Fill Up" to fill up all cores on the first node before continuing
>> with the next node? I am trying to understand if this was an
>> oversight when it was set up or if there is a reason for this?
> It depends on the MPI application - some applications perform better
> when some MPI tasks can talk to their peers via shared memory, while
> other MPI applications are better off when they can use more memory
> per node. So you may want to talk to the person who set up the cluster
> why he did it that way.
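To make the difference between the two rules concrete, here is a toy simulation. It is not how the scheduler is implemented: it ignores load, queue ordering and everything else the real scheduler considers, and the function name and the host:free-slots argument format are invented for this example.

```shell
#!/bin/sh
# Toy model of two allocation rules.
# Usage: allocate <fill_up|round_robin> <slots wanted> <host:free slots>...
allocate() {
    rule=$1
    want=$2
    shift 2
    printf '%s\n' "$@" | awk -F: -v rule="$rule" -v want="$want" '
        { host[NR] = $1; free[NR] = $2 + 0; used[NR] = 0 }
        END {
            want = want + 0
            if (rule == "fill_up") {
                # Take as many slots as possible from each host in turn.
                for (i = 1; i <= NR && want > 0; i++) {
                    take = (free[i] < want) ? free[i] : want
                    used[i] = take
                    want -= take
                }
            } else {
                # Hand out one slot per host per pass until done.
                progress = 1
                while (want > 0 && progress) {
                    progress = 0
                    for (i = 1; i <= NR && want > 0; i++)
                        if (used[i] < free[i]) { used[i]++; want--; progress = 1 }
                }
            }
            for (i = 1; i <= NR; i++) printf "%s %d\n", host[i], used[i]
        }'
}
```

For six slots on three hosts with four free cores each, `allocate fill_up 6 node01:4 node02:4 node03:4` puts 4 tasks on node01 and 2 on node02, while `allocate round_robin 6 ...` puts 2 on every host. The first favours shared-memory communication, the second leaves more memory per task.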
> BTW, we also have an archive of scheduler related blog entries at:
>> users mailing list
>> users at gridengine.org