[gridengine users] control_slaves on PE
reuti at staff.uni-marburg.de
Wed Jan 14 11:49:44 UTC 2015
On 14.01.2015 at 11:50, Roberto Nunnari <roberto.nunnari at supsi.ch> wrote:
> On 14.01.2015 at 11:05, Reuti wrote:
>> On 14.01.2015 at 10:09, Roberto Nunnari wrote:
>>> man sge_pe states:
>>> This parameter can be set to TRUE or FALSE (the default). It indicates whether Oracle Grid Engine is the creator of the slave tasks of a parallel application via sge_execd(8) and sge_shepherd(8) and thus has full control over all processes in a parallel application, which enables capabilities such as resource limitation and correct accounting. However, to gain control over the slave tasks of a parallel application, a sophisticated PE interface is required, which works closely together with Oracle Grid Engine facilities. Such PE interfaces are available through your local Oracle Grid Engine support office.
>>> Does that mean that you need to buy some software from Oracle in order to take advantage of 'control_slaves TRUE' ?
>> It mainly refers to the fact that it depends on the parallel application whether any preparation is necessary, e.g. supplying scripts for start/stop_proc_args, or setting up or tuning the started application not to do nasty things like jumping out of the process tree.
>> Technically, its value must be set to TRUE so that a started job script is allowed to perform `qrsh -inherit ...` to reach other nodes without any `rsh`/`ssh` at all (in my clusters `ssh` is available for admin staff only).
> Interesting.. once I tried to do the same, but a program stopped working.. so I implemented a (half) solution where ssh is for admins only on the master node, and for all users on the execution nodes.
The PE must be set up for each type of parallel library. Therefore several ones can be defined, and the user has to use the correct one for the intended type of job. Was this failing with Open MPI or a different parallel library?
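For illustration, a tightly integrated PE definition for Open MPI might look like the following (the PE name `openmpi_tight` is just an example; check your own `qconf -sp` output):

```shell
$ qconf -sp openmpi_tight
pe_name            openmpi_tight
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
```

With `control_slaves TRUE` the slave tasks are started via `qrsh -inherit` under sge_execd/sge_shepherd, so resource limits and accounting cover the whole job; `job_is_first_task FALSE` is the usual setting when `mpiexec` itself does not count as a compute task.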
>> While these scripts were mandatory for many parallel applications in the past, MPICH and Open MPI (./configure --with-sge for the latter) in their current versions support SGE out of the box.
>> For Open MPI you can look for the value:
>> $ ompi_info | grep grid
>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
> Yes. It's like that, thank you. :-)
>> whether it's set up in your version. Care must be taken with Open MPI 1.8 and newer, as by default they issue a core binding independent of SGE's one and always start at socket/core 0/0, i.e. if more than one Open MPI job is running on a node it's necessary to either switch off Open MPI's core binding (and/or use SGE's one) or reformat the core list granted by SGE so that it can be used by Open MPI.
> humm.. I see that on CentOS 6.6 they introduced openmpi 1.8.1..
> # ompi_info | grep grid
> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.8.1)
> while on CentOS 6.4:
> # ompi_info | grep grid
> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.5.4)
> ..so does that mean that even though it's version 1.8.1, it doesn't use the default core binding that breaks SGE?
Yep. The best way would be to reformat the SGE-generated core list in $PE_HOSTFILE (`qsub -binding pe linear:1 ...`) and feed it to Open MPI.
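As a sketch of that reformatting step (assuming the fourth $PE_HOSTFILE column holds the granted cores as `socket,core` pairs separated by `:` — check sge_pe(5) for your version's exact format), one could build an Open MPI rankfile like this:

```shell
#!/bin/sh
# Turn the SGE-granted core list in $PE_HOSTFILE into an Open MPI rankfile.
# Assumed line format (illustrative): host  slots  queue  socket,core:socket,core:...
awk '{
    host = $1
    n = split($4, pairs, ":")
    for (i = 1; i <= n; i++) {
        split(pairs[i], sc, ",")
        printf "rank %d=%s slot=%s:%s\n", rank++, host, sc[1], sc[2]
    }
}' "$PE_HOSTFILE" > "$TMPDIR/rankfile"

# then e.g.: mpiexec --rankfile "$TMPDIR/rankfile" ./my_app
```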
The second-best solution is...
> I rephrase my question: if I upgrade my execution nodes from CentOS 6.4 (which uses openmpi 1.5.4) to CentOS 6.6 (which uses openmpi 1.8.1), will SGE PE jobs continue to work, or will they need some tweaks?
> You talk about 'switch off openmpi's core binding and/or use SGE's one'.. how do you do it? at build time or at run time? What's the command line switch?
...to turn off Open MPI's binding, as it always starts at socket/core 0/0 as said:
$ mpiexec --bind-to none ...
The SGE binding is still in effect and not overridden or removed. The difference: SGE binds all processes to a set of cores, and inside this set of cores the Linux kernel can schedule the processes to other cores (but it should do its best to reuse the caches). Open MPI's binding is exactly 1:1, but the correct binding depends highly on the MPI job, especially when threads are used in conjunction with Open MPI.
> Thank you and best regards.
>> -- Reuti
>>> In my production environment, I have four PEs; two are set as 'control_slaves FALSE' and two as 'control_slaves TRUE'.. and as far as I know, all of them behave as expected.. it has been like that for about 9 years, since I inherited the SGE cluster..
>>> Can anybody cast some light on it, please?
>>> my present environment:
>>> - OGE 6.2u7
>>> - on the execution nodes: openmpi 1.5.4
>>> - on the master node: openmpi 1.4
>>> Thank you and best regards.
>>> users mailing list
>>> users at gridengine.org