[gridengine users] control_slaves on PE

Roberto Nunnari roberto.nunnari at supsi.ch
Wed Jan 14 10:19:42 UTC 2015


On 14.01.2015 10:09, Roberto Nunnari wrote:
> Hi.
>
> man sge_pe states:
>
> control_slaves
>    This parameter can be set to TRUE or FALSE (the default). It
> indicates whether Oracle Grid Engine is the creator of the slave tasks
> of a parallel application via sge_execd(8) and sge_shepherd(8) and
> thus has full control over all processes in a parallel application,
> which enables capabilities such as resource limitation and correct
> accounting. However, to gain control over the slave tasks of a parallel
> application, a sophisticated PE interface is required, which
> works closely together with Oracle Grid Engine facilities. Such PE
> interfaces are available through your local Oracle Grid Engine support
> office.
>
> Does that mean that you need to buy some software from Oracle in order
> to take advantage of 'control_slaves TRUE'?
>
> In my production environment, I have four PEs: two are set to
> 'control_slaves FALSE' and two to 'control_slaves TRUE'. As far as I
> know, all of them behave as expected, and it has been that way for about
> 9 years, since I inherited the SGE cluster.
>
> Can anybody shed some light on this, please?
>
> my present environment:
> - OGE 6.2u7
> - on the execution nodes: openmpi 1.5.4
> - on the master node: openmpi 1.4
>
> Thank you and best regards.
> Robi

I can add that on the execution nodes, jobs launched with a PE
configured with 'control_slaves TRUE' have a process hierarchy that
runs under sge_execd and sge_shepherd:

sge       1977     1  0  2014 ?        04:50:50 /opt/sge/bin/lx24-amd64/sge_execd
sge      23594  1977  0 Jan12 ?        00:00:00     sge_shepherd-1668440 -bg
user1    23596 23594  0 Jan12 ?        00:00:00       -sh /opt/sge/default/spool/node21/job_scripts/1668440
user1    23702 23596 99 Jan12 ?        11-23:30:46 /homea/user1/opt/myprog -v -ntomp 6 -nice 0 -gpu_id 0 -plumed plumed.dat -s topo

So it seems that in this case it is working properly.
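
For anyone who wants to repeat this check on other nodes, here is a minimal sketch in Python that walks the parent chain of a PID in `ps -ef` style output and reports whether both sge_execd and an sge_shepherd appear among its ancestors. The helper names (parse_ps, ancestry, under_sge_control) are made up for illustration, and the sample rows are the listing above with the wrapped lines joined. It only inspects process ancestry; it says nothing about how the PE itself is configured.

```python
# Sample rows in `ps -ef` layout (UID PID PPID C STIME TTY TIME CMD),
# copied from the listing in this message.
PS_OUTPUT = """\
sge       1977     1  0  2014 ?        04:50:50 /opt/sge/bin/lx24-amd64/sge_execd
sge      23594  1977  0 Jan12 ?        00:00:00     sge_shepherd-1668440 -bg
user1    23596 23594  0 Jan12 ?        00:00:00       -sh /opt/sge/default/spool/node21/job_scripts/1668440
user1    23702 23596 99 Jan12 ?        11-23:30:46 /homea/user1/opt/myprog -v -ntomp 6 -nice 0 -gpu_id 0 -plumed plumed.dat -s topo
"""


def parse_ps(text):
    """Return {pid: (ppid, command)} for every well-formed row."""
    procs = {}
    for line in text.splitlines():
        # Seven leading columns, then the command with its arguments.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # skips a header row such as "UID PID PPID ..."
        procs[int(fields[1])] = (int(fields[2]), fields[7])
    return procs


def ancestry(procs, pid):
    """Commands from the given PID up to the topmost known ancestor."""
    chain = []
    seen = set()  # guards against a malformed, cyclic listing
    while pid in procs and pid not in seen:
        seen.add(pid)
        ppid, cmd = procs[pid]
        chain.append(cmd)
        pid = ppid
    return chain


def under_sge_control(procs, pid):
    """True if both sge_execd and an sge_shepherd are in the PID's chain."""
    names = [cmd.split()[0].rsplit("/", 1)[-1] for cmd in ancestry(procs, pid)]
    has_execd = any(name == "sge_execd" for name in names)
    has_shepherd = any(name.startswith("sge_shepherd") for name in names)
    return has_execd and has_shepherd


if __name__ == "__main__":
    procs = parse_ps(PS_OUTPUT)
    print(under_sge_control(procs, 23702))
```

On a live node the same functions could be fed the text captured from `ps -ef` instead of the sample string.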
Robi



