[gridengine users] Fwd: Round Robin x Fill Up

Sergio Mafra sergiohmafra at gmail.com
Sat Jul 27 14:07:55 UTC 2013


Appending to previous message.

If I change to $fill_up and submit the same job using only 16 of the 32
available slots, here is the output:

 2842 ?        S      0:00  \_ sge_shepherd-2 -bg
 2844 ?        Ss     0:00      \_ mpiexec newave170502_L
 2845 ?        S      0:00          \_ /usr/bin/hydra_pmi_proxy
--control-port master:45562 --demux poll --pgid 0 --retries 10 --proxy-id 0
 2847 ?        S      0:00          |   \_ newave170502_L
 2846 ?        Z      0:00          \_ [qrsh] <defunct>
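
For reference, the only thing changed between the two runs is the PE's
allocation rule. A rough sketch of what the relevant part of the PE
definition would look like, assuming the PE is named "orte" as in the qsub
line further down; the slot count is just the 32 mentioned above and the
remaining fields are omitted:

$ qconf -sp orte
pe_name            orte
slots              32
allocation_rule    $fill_up
control_slaves     TRUE
...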


---------- Forwarded message ----------
From: Sergio Mafra <sergiohmafra at gmail.com>
Date: Sat, Jul 27, 2013 at 10:58 AM
Subject: Re: [gridengine users] Round Robin x Fill Up
To: Reuti <reuti at staff.uni-marburg.de>
Cc: "users at gridengine.org" <users at gridengine.org>


Hi Reuti,

>Do you start in your job script any `mpiexec` resp. `mpirun` or is this
>issued already inside the application you started? The question is,
>whether there is any additional "-hostlist", "-machinefile" or alike given
>as argument to this command and invalidating the generated $PE_HOSTFILE
>of SGE.

The job is started with mpiexec like this:
$ qsub -N $nameofthecase -b y -pe orte $1 -cwd mpiexec newave170502_L
where newave170502_L is the name of the MPI app.
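
As a quick sanity check, independent of the app itself, one could submit a
tiny job that only prints the host file SGE generates and compare it with
the nodes the tasks end up on. This is only a sketch; the script name and
the slot count of 16 are made up:

$ cat check_hostfile.sh
#!/bin/sh
#$ -pe orte 16
#$ -cwd
# print the hosts and slot counts granted to this job by SGE
cat $PE_HOSTFILE

$ qsub -N hostfile_check check_hostfile.sh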

>You can also try the following:
>
>- revert the PE definition to allocate by $round_robin
>- submit a job
>- SSH to the master node of the parallel job
>- issue:
>
>ps -e f --cols=500
>
>(f w/o -)

>- somewhere should be the `mpiexec` resp. `mpirun` command. Can you please
>post this line, it should be a child of the started job script.

Here is the output:

2382 ?        Sl     0:00 /opt/sge6/bin/linux-x64/sge_execd
 2817 ?        S      0:00  \_ sge_shepherd-1 -bg
 2819 ?        Ss     0:00      \_ mpiexec newave170502_L
 2820 ?        S      0:00          \_ /usr/bin/hydra_pmi_proxy
--control-port master:40945 --demux poll --pgid 0 --retries 10 --proxy-id 0
 2822 ?        R      0:30          |   \_ newave170502_L
 2821 ?        Sl     0:00          \_ /opt/sge6/bin/linux-x64/qrsh
-inherit -V node001 "/usr/bin/hydra_pmi_proxy" --control-port master:40945
--demux poll --pgid 0 --retries 10 --proxy-id 1
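
While such a job is running, the distribution SGE actually granted can also
be checked from the submit host, e.g. with:

$ qstat -g t

which roughly lists, per queue instance, the slots granted to the job
labelled as MASTER or SLAVE. Comparing that with the hosts on which
hydra_pmi_proxy shows up should tell whether the MPI library honours the
allocation.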

All best,

Sergio


On Sat, Jul 27, 2013 at 10:13 AM, Reuti <reuti at staff.uni-marburg.de> wrote:

> Hi,
>
> Am 26.07.2013 um 23:26 schrieb Sergio Mafra:
>
> > Hi Reuti,
> >
> > Thanks for your prompt answer.
> > Regarding your questions:
> >
> > > How does your application read the list of granted machines?
> > > Did you compile MPI on your own (which implementation in detail)?
> >
> > I've got no control over, or documentation for, this app. It was designed
> by an Electrical Research Center for our purposes.
> >
> > > PS: I assume that with $round_robin access was simply allowed to all
> (or at least: many) nodes.
> >
> > Yes. That's correct.
> >
> > > As now hosts are first filled before access to another one is granted,
> you might see the effect of the former (possibly wrong) distribution of
> slave tasks to the nodes.
> >
> > So I understand that the app should be recompiled to take advantage of
> the $fill_up option?
>
> Not necessarily: the version of MPI in use is obviously prepared to run under
> the control of SGE, as it uses `qrsh -inherit ...` to start slave tasks on
> other nodes. Unfortunately it does so also on machines/slots which weren't
> granted to this job, which results in the error you mentioned first.
>
> Do you start in your job script any `mpiexec` resp. `mpirun` or is this
> issued already inside the application you started? The question is, whether
> there is any additional "-hostlist", "-machinefile" or alike given as
> argument to this command and invalidating the generated $PE_HOSTFILE of SGE.
>
> The MPI library should detect the granted allocation automatically, as it
> already recognizes that it's started under SGE.
>
> You can also try the following:
>
> - revert the PE definition to allocate by $round_robin
> - submit a job
> - SSH to the master node of the parallel job
> - issue:
>
> ps -e f --cols=500
>
> (f w/o -)
>
> - somewhere should be the `mpiexec` resp. `mpirun` command. Can you please
> post this line, it should be a child of the started job script.
>
> -- Reuti
>
>
> > All the best,
> >
> > Sergio
> >
> >
> > On Fri, Jul 26, 2013 at 10:06 AM, Reuti <reuti at staff.uni-marburg.de>
> wrote:
> > Hi,
> >
> > Am 26.07.2013 um 14:22 schrieb Sergio Mafra:
> >
> > > I'm using MIT StarCluster with mpich2 and OGE. Everything's ok.
> > > But when I tried to change the strategy of distributing work from
> Round Robin (default) to Fill Up... my problems began.
> > > OGE keeps telling me that some nodes cannot receive tasks...
> >
> > On the one hand this is a good sign, as it confirms that your PE is
> defined to control slave tasks on the nodes.
> >
> >
> > > "Error: executing task of job 9 failed: execution daemon on host
> "node002" didn't accept task"It seems that my mpi app always tries to run
> in all nodes of the cluster, no matter if OGE doesn't allow it to do it.
> > > Does anybody knows of a workaround ?
> >
> > This indicates that your application tries to use a node in the cluster
> which wasn't granted to this job by SGE.
> >
> > How does your application read the list of granted machines?
> >
> > Did you compile MPI on your own (which implementation in detail)?
> >
> > -- Reuti
> >
> > PS: I assume that with $round_robin access was simply allowed to all (or
> at least: many) nodes. As now hosts are first filled before access to
> another one is granted, you might see the effect of the former (possibly
> wrong) distribution of slave tasks to the nodes.
> >
>
>