[gridengine users] Round Robin x Fill Up

Reuti reuti at staff.uni-marburg.de
Sun Jul 28 12:27:18 UTC 2013


On 27.07.2013, at 22:29, Sergio Mafra wrote:

> Hi Reuti,
> 
> It seems that the previous tests were wrong.
> I realize that your doubts were right: there was only one slot busy despite all 16 being granted.
> 
> So I changed the job launcher to:
> 
> $qsub -N $nameofthecase -b y -pe orte 20 -cwd mpiexec -np 20 newave170502_L

Aha, the "-np 20" option shouldn't be necessary at all. Maybe it was a bug in MPICH2 1.4 at that time that it did not detect the granted slots on its own.

- Was MPICH2 1.4 also the version that was used to compile the application?

- As 1.4 is somewhat old, I suggest updating at least to 1.4.1p1:

http://www.mpich.org/static/downloads/1.4.1p1/

You can compile it with an installation prefix of ~/local/mpich2-1.4.1p1 or similar and then use that version for both compilation and execution.
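
For example, a minimal build sketch (assuming a standard GNU toolchain; the exact prefix is only a suggestion):

$ wget http://www.mpich.org/static/downloads/1.4.1p1/mpich2-1.4.1p1.tar.gz
$ tar xzf mpich2-1.4.1p1.tar.gz
$ cd mpich2-1.4.1p1
$ ./configure --prefix=$HOME/local/mpich2-1.4.1p1
$ make && make install

Afterwards put $HOME/local/mpich2-1.4.1p1/bin first in your PATH (also inside the job, e.g. by submitting with -V) and recompile the application with the mpicc/mpif90 from this installation.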

You could also try the latest release from http://www.mpich.org/ or even Open MPI from http://www.open-mpi.org

-- Reuti


> Note that (for some reason) it's mandatory to tell both the PE and mpiexec that there are 20 slots to use.
> 
> Doing that, this is the output for a job with 20 slots:
> 
> $round_robin:
> 
> job with 20 slots
> job launched as 
> $qsub -N $nameofthecase -b y -pe orte 20 -cwd mpiexec -np 20 newave170502_L
> 
> $ ps -e f --cols=500
>  2390 ?        Sl     0:00 /opt/sge6/bin/linux-x64/sge_execd
>  2835 ?        S      0:00  \_ sge_shepherd-1 -bg
>  2837 ?        Ss     0:00      \_ mpiexec -np 20 newave170502_L
>  2838 ?        S      0:00          \_ /usr/bin/hydra_pmi_proxy --control-port master:46220 --demux poll --pgid 0 --retries 10 --proxy-id 0
>  2840 ?        R      1:18          |   \_ newave170502_L
>  2841 ?        S      0:54          |   \_ newave170502_L
>  2842 ?        S      1:07          |   \_ newave170502_L
>  2843 ?        S      0:52          |   \_ newave170502_L
>  2844 ?        S      1:07          |   \_ newave170502_L
>  2845 ?        S      1:08          |   \_ newave170502_L
>  2846 ?        S      0:00          |   \_ newave170502_L
>  2847 ?        S      0:00          |   \_ newave170502_L
>  2848 ?        S      0:00          |   \_ newave170502_L
>  2849 ?        S      0:00          |   \_ newave170502_L
>  2839 ?        Sl     0:00          \_ /opt/sge6/bin/linux-x64/qrsh -inherit -V node001 "/usr/bin/hydra_pmi_proxy" --control-port master:46220 --demux poll --pgid 0 --retries 10 --proxy-id 1
> 
>  
> $ mpiexec --version
>  HYDRA build details:
>     Version:                                 1.4
>     Release Date:                            Thu Jun 16 16:41:08 CDT 2011
>     CC:                              gcc  -I/build/buildd/mpich2-1.4/src/mpl/include -I/build/buildd/mpich2-1.4/src/mpl/include -I/build/buildd/mpich2-1.4/src/openpa/src -I/build/buildd/mpich2-1.4/src/openpa/src -I/build/buildd/mpich2-1.4/src/mpid/ch3/include -I/build/buildd/mpich2-1.4/src/mpid/ch3/include -I/build/buildd/mpich2-1.4/src/mpid/common/datatype -I/build/buildd/mpich2-1.4/src/mpid/common/datatype -I/build/buildd/mpich2-1.4/src/mpid/common/locks -I/build/buildd/mpich2-1.4/src/mpid/common/locks -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/include -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/include -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/include -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/include -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/utils/monitor -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/utils/monitor -I/build/buildd/mpich2-1.4/src/util/wrappers -I/build/buildd/mpich2-1.4/src/util/wrappers  -g -O2 -g -O2 -Wall -O2  -Wl,-Bsymbolic-functions  -lrt -lcr -lpthread
>     CXX:
>     F77:
>     F90:                             gfortran  -Wl,-Bsymbolic-functions  -lrt -lcr -lpthread
>     Configure options:                       '--build=x86_64-linux-gnu' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--libexecdir=${prefix}/lib/mpich2' '--srcdir=.' '--disable-maintainer-mode' '--disable-dependency-tracking' '--disable-silent-rules' '--enable-shared' '--prefix=/usr' '--enable-fc' '--disable-rpath' '--sysconfdir=/etc/mpich2' '--includedir=/usr/include/mpich2' '--docdir=/usr/share/doc/mpich2' '--with-hwloc-prefix=system' '--enable-checkpointing' '--with-hydra-ckpointlib=blcr' 'build_alias=x86_64-linux-gnu' 'MPICH2LIB_CFLAGS=-g -O2 -g -O2 -Wall' 'MPICH2LIB_CXXFLAGS=-g -O2 -g -O2 -Wall' 'MPICH2LIB_FFLAGS=-g -O2' 'MPICH2LIB_FCFLAGS=' 'LDFLAGS=-Wl,-Bsymbolic-functions ' 'CPPFLAGS= -I/build/buildd/mpich2-1.4/src/mpl/include -I/build/buildd/mpich2-1.4/src/mpl/include -I/build/buildd/mpich2-1.4/src/openpa/src -I/build/buildd/mpich2-1.4/src/openpa/src -I/build/buildd/mpich2-1.4/src/mpid/ch3/include -I/build/buildd/mpich2-1.4/src/mpid/ch3/include -I/build/buildd/mpich2-1.4/src/mpid/common/datatype -I/build/buildd/mpich2-1.4/src/mpid/common/datatype -I/build/buildd/mpich2-1.4/src/mpid/common/locks -I/build/buildd/mpich2-1.4/src/mpid/common/locks -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/include -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/include -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/include -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/include -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/utils/monitor -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/utils/monitor -I/build/buildd/mpich2-1.4/src/util/wrappers -I/build/buildd/mpich2-1.4/src/util/wrappers' 'FFLAGS= -g -O2 -O2' 'FC=gfortran' 'CFLAGS= -g -O2 -g -O2 -Wall -O2' 'CXXFLAGS= -g -O2 -g -O2 -Wall -O2' '--disable-option-checking' 'CC=gcc' 'LIBS=-lrt -lcr -lpthread '
>     Process Manager:                         pmi
>     Launchers available:                     ssh rsh fork slurm ll lsf sge none persist
>     Binding libraries available:             hwloc plpa
>     Resource management kernels available:   none slurm ll lsf sge pbs
>     Checkpointing libraries available:       blcr
>     Demux engines available:                 poll select
> 
> $ ps -eLf
> sgeadmin  2837  2835  2837  0    1 19:49 ?        00:00:00 mpiexec -np 20 newave170502_L
> sgeadmin  2838  2837  2838  0    1 19:49 ?        00:00:00 /usr/bin/hydra_pmi_proxy --control-port master:46220 --demux poll --pgid 0 --retries 10 --proxy-id 0
> sgeadmin  2839  2837  2839  0    3 19:49 ?        00:00:00 /opt/sge6/bin/linux-x64/qrsh -inherit -V node001 "/usr/bin/hydra_pmi_proxy" --control-port master:46220 --demux poll -
> sgeadmin  2839  2837  2850  0    3 19:49 ?        00:00:00 /opt/sge6/bin/linux-x64/qrsh -inherit -V node001 "/usr/bin/hydra_pmi_proxy" --control-port master:46220 --demux poll -
> sgeadmin  2839  2837  2851  0    3 19:49 ?        00:00:00 /opt/sge6/bin/linux-x64/qrsh -inherit -V node001 "/usr/bin/hydra_pmi_proxy" --control-port master:46220 --demux poll -
> sgeadmin  2840  2838  2840 98    1 19:49 ?        00:04:32 newave170502_L
> sgeadmin  2841  2838  2841 89    1 19:49 ?        00:04:05 newave170502_L
> sgeadmin  2842  2838  2842 93    1 19:49 ?        00:04:18 newave170502_L
> sgeadmin  2843  2838  2843 88    1 19:49 ?        00:04:03 newave170502_L
> sgeadmin  2844  2838  2844 93    1 19:49 ?        00:04:19 newave170502_L
> sgeadmin  2845  2838  2845 94    1 19:49 ?        00:04:20 newave170502_L
> sgeadmin  2846  2838  2846 69    1 19:49 ?        00:03:11 newave170502_L
> sgeadmin  2847  2838  2847 69    1 19:49 ?        00:03:11 newave170502_L
> sgeadmin  2848  2838  2848 69    1 19:49 ?        00:03:11 newave170502_L
> sgeadmin  2849  2838  2849 69    1 19:49 ?        00:03:11 newave170502_L
> sgeadmin  2858  2491  2858  0    1 19:54 pts/0    00:00:00 ps -eLf
> 
> $ cat /etc/hosts
> 127.0.0.1 ubuntu
> 
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
> # Added by cloud-init
> 127.0.1.1       ip-10-17-48-113.ec2.internal ip-10-17-48-113
> 10.17.48.113 master
> 10.17.48.210 node001
> 
> $ which mpiexec
> /usr/bin/mpiexec
> 
> $ cat newave.tim (this is output from the MPI app showing that 20 processors are being used)
> Programa Newave
> Versao 17.5.2
> Caso: PMO JANEIRO - 2011  29/12/2010 CVAR L25 A25 niveis para 31/12 NW Versao 17.5.x
> Data: 27-07-2013
> Hora: 19h 49min 28.425sec
> Numero de Processadores:   20 (<-- number of processors)
> 
> Everything runs fine. The job is divided equally between the 2 servers, occupying 10 slots on each one.
> 
> Now, if I change the PE to $fill_up and submit the same 20-slot job, something weird happens.
> 
> Let's see:
> 
> $fill_up
> job with 20 slots
> job launched as 
> $qsub -N $NOMECASO -b y -pe orte 20 -cwd mpiexec -np 20 newave170502_L
> 
> $ ps -e f --cols=500
>  2390 ?        Sl     0:01 /opt/sge6/bin/linux-x64/sge_execd
>  2890 ?        S      0:00  \_ sge_shepherd-2 -bg
>  2892 ?        Ss     0:00      \_ mpiexec -np 20 newave170502_L
>  2893 ?        S      0:00          \_ /usr/bin/hydra_pmi_proxy --control-port master:37827 --demux poll --pgid 0 --retries 10 --proxy-id 0
>  2895 ?        R      0:31          |   \_ newave170502_L
>  2896 ?        R      0:24          |   \_ newave170502_L
>  2897 ?        R      0:24          |   \_ newave170502_L
>  2898 ?        R      0:24          |   \_ newave170502_L
>  2899 ?        R      0:24          |   \_ newave170502_L
>  2900 ?        R      0:24          |   \_ newave170502_L
>  2901 ?        S      0:00          |   \_ newave170502_L
>  2902 ?        S      0:00          |   \_ newave170502_L
>  2903 ?        S      0:00          |   \_ newave170502_L
>  2904 ?        S      0:00          |   \_ newave170502_L
>  2894 ?        Sl     0:00          \_ /opt/sge6/bin/linux-x64/qrsh -inherit -V node001 "/usr/bin/hydra_pmi_proxy" --control-port master:37827 --demux poll --pgid 0 --retries 10 --proxy-id 1
> 
>  $ qstat -f
>  queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> all.q at master                   BIP   0/16/16        8.20     linux-x64
>       2 0.55500 pmo_2011-0 sgeadmin     r     07/27/2013 20:01:11    16
> ---------------------------------------------------------------------------------
> all.q at node001                  BIP   0/4/16         8.24     linux-x64
>       2 0.55500 pmo_2011-0 sgeadmin     r     07/27/2013 20:01:11     4
> 
> *** As you can see, the scheduler filled up the first server and used 4 slots on the second, but
> MPI placed 10 processes on the first server and 10 on the other one.
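> 
> (For reference, the per-host grant can also be cross-checked with:
> 
> $ qstat -g t
> 
> which shows the MASTER/SLAVE task distribution of the job per queue instance.)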
> 
> If I resubmit it, now with 16 slots:
> 
> job with 16 slots
> 
> $ ps -e f --cols=500
>  2932 ?        S      0:00  \_ sge_shepherd-3 -bg
>  2934 ?        Ss     0:00      \_ mpiexec -np 16 newave170502_L
>  2935 ?        S      0:00          \_ /usr/bin/hydra_pmi_proxy --control-port master:50693 --demux poll --pgid 0 --retries 10 --proxy-id 0
>  2937 ?        S      0:00          |   \_ newave170502_L
>  2938 ?        S      0:00          |   \_ newave170502_L
>  2939 ?        S      0:00          |   \_ newave170502_L
>  2940 ?        S      0:00          |   \_ newave170502_L
>  2941 ?        S      0:00          |   \_ newave170502_L
>  2942 ?        S      0:00          |   \_ newave170502_L
>  2943 ?        S      0:00          |   \_ newave170502_L
>  2944 ?        S      0:00          |   \_ newave170502_L
>  2936 ?        Z      0:00          \_ [qrsh] <defunct>
> 
> $ qstat -f
> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> all.q at master                   BIP   0/16/16        4.39     linux-x64
>       3 0.55500 pmo_2011-0 sgeadmin     r     07/27/2013 20:12:26    16
> ---------------------------------------------------------------------------------
> all.q at node001                  BIP   0/0/16         4.67     linux-x64
> 
> $ ps -eLf
> sgeadmin  2934  2932  2934  0    1 20:12 ?        00:00:00 mpiexec -np 16 newave170502_L
> sgeadmin  2935  2934  2935  0    1 20:12 ?        00:00:00 /usr/bin/hydra_pmi_proxy --control-port master:50693 --demux poll --pgid 0 --retries 10 --proxy-id 0
> sgeadmin  2936  2934  2936  0    1 20:12 ?        00:00:00 [qrsh] <defunct>
> sgeadmin  2937  2935  2937  0    1 20:12 ?        00:00:00 newave170502_L
> sgeadmin  2938  2935  2938  0    1 20:12 ?        00:00:00 newave170502_L
> sgeadmin  2939  2935  2939  0    1 20:12 ?        00:00:00 newave170502_L
> sgeadmin  2940  2935  2940  0    1 20:12 ?        00:00:00 newave170502_L
> sgeadmin  2941  2935  2941  0    1 20:12 ?        00:00:00 newave170502_L
> sgeadmin  2942  2935  2942  0    1 20:12 ?        00:00:00 newave170502_L
> sgeadmin  2943  2935  2943  0    1 20:12 ?        00:00:00 newave170502_L
> sgeadmin  2944  2935  2944  0    1 20:12 ?        00:00:00 newave170502_L
> sgeadmin  2949  2491  2949  0    1 20:14 pts/0    00:00:00 ps -eLf
> 
> *** Again, as you can see, the scheduler filled up the first server and used no slots on the second, but
> MPI placed 8 processes on the first server and tried to place 8 on the other one, which failed with an error...
> 
> Comments?
> 
> 
> All the best, and thank you so much for your time and effort in helping with this one...
> 
> 
> Sergio
> 
> 
> On Sat, Jul 27, 2013 at 3:58 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
> On 27.07.2013, at 16:25, Sergio Mafra wrote:
> 
> > Reuti,
> >
> > Aggregating all data...
> >
> > My cluster has 2 servers (master and node001), with 16 slots each one.
> >
> > My mpi app is newave170502_L
> >
> > I ran 3 tests:
> >
> > 1. $round_robin using 32 slots: (ran ok)
> >
> >  2382 ?        Sl     0:00 /opt/sge6/bin/linux-x64/sge_execd
> >  2817 ?        S      0:00  \_ sge_shepherd-1 -bg
> >  2819 ?        Ss     0:00      \_ mpiexec newave170502_L
> >  2820 ?        S      0:00          \_ /usr/bin/hydra_pmi_proxy --control-port master:40945 --demux poll --pgid 0 --retries 10 --proxy-id 0
> >  2822 ?        R      0:30          |   \_ newave170502_L
> >  2821 ?        Sl     0:00          \_ /opt/sge6/bin/linux-x64/qrsh -inherit -V node001 "/usr/bin/hydra_pmi_proxy" --control-port master:40945 --demux poll --pgid 0 --ret
> 
> As both nodes are used, this will succeed. I wonder why there is only one `newave170502_L` process; it should show 16 on each machine as children of the particular `hydra_pmi_proxy`.
> 
> What is the output of:
> 
> mpiexec --version
> 
> Maybe the application is using threads in addition. Does:
> 
> ps -eLf
> 
> list more instances of the application?
> 
> 
> 2. $fill_up with 16 slots: (aborted with error: executing task of job 2 failed: execution daemon on host "node001" didn't accept task)
> >
> >  2842 ?        S      0:00  \_ sge_shepherd-2 -bg
> >  2844 ?        Ss     0:00      \_ mpiexec newave170502_L
> >  2845 ?        S      0:00          \_ /usr/bin/hydra_pmi_proxy --control-port master:45562 --demux poll --pgid 0 --retries 10 --proxy-id 0
> >  2847 ?        S      0:00          |   \_ newave170502_L
> >  2846 ?        Z      0:00          \_ [qrsh] <defunct>
> 
> SGE allocated all slots to "master" and none to "node001": as the submitted job can get the required number of slots from one machine alone, there is no need to spread any task to "node001". The question is: why is your application (or even the `mpiexec`) trying to do so? There have been cases where SGE was misled due to contradictory entries in:
> 
> /etc/hosts
> 
> having two or more different names for each machine.
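> 
> For comparison, an unambiguous layout (using the addresses already shown in your file; the loopback line should not carry the cluster host names) would be roughly:
> 
> 127.0.0.1    localhost
> 10.17.48.113 master
> 10.17.48.210 node001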
> 
> - What is the content of this file on your machines?
> 
> - Is
> 
> > 3. $fill_up with 18 slots (ran ok):
> >
> >  2382 ?        Sl     0:01 /opt/sge6/bin/linux-x64/sge_execd
> >  2861 ?        Sl     0:00  \_ sge_shepherd-3 -bg
> >  2862 ?        Ss     0:00      \_ /opt/sge6/utilbin/linux-x64/qrsh_starter /opt/sge6/default/spool/exec_spool_local/master/active_jobs/3.1/1.master
> >  2869 ?        S      0:00          \_ /usr/bin/hydra_pmi_proxy --control-port node001:36673 --demux poll --pgid 0 --retries 10 --proxy-id 0
> >  2870 ?        R      0:24              \_ newave170502_L
> 
> While in former times (with the old MPICH(1)) each slave task needed its own `qrsh -inherit ...`, nowadays only one is used per slave node and all additional processes on the master or any slave node are forks.
> 
> I guess even 17 would work, as it would need at least one slot from the other machine.
> 
> - Is there any comment in the output of your application about how many processes were started for a computation?
> 
> - Is the `mpiexec` a plain binary, or some kind of wrapper script?
> 
> file `which mpiexec`
> 
> If it's a symbolic link, it should point to mpiexec.hydra, and the check can be repeated on that target.
> 
> -- Reuti
> 
> 
> > ---------- Forwarded message ----------
> > From: Sergio Mafra <sergiohmafra at gmail.com>
> > Date: Sat, Jul 27, 2013 at 11:07 AM
> > Subject: Fwd: [gridengine users] Round Robin x Fill Up
> > To: Reuti <reuti at staff.uni-marburg.de>, "users at gridengine.org" <users at gridengine.org>
> >
> >
> > Appending to previous message.
> >
> > If I change to $fill_up and submit the same job using only 16 of the 32 available slots, here comes the output:
> >
> >  2842 ?        S      0:00  \_ sge_shepherd-2 -bg
> >  2844 ?        Ss     0:00      \_ mpiexec newave170502_L
> >  2845 ?        S      0:00          \_ /usr/bin/hydra_pmi_proxy --control-port master:45562 --demux poll --pgid 0 --retries 10 --proxy-id 0
> >  2847 ?        S      0:00          |   \_ newave170502_L
> >  2846 ?        Z      0:00          \_ [qrsh] <defunct>
> > ---------- Forwarded message ----------
> > From: Sergio Mafra <sergiohmafra at gmail.com>
> > Date: Sat, Jul 27, 2013 at 10:58 AM
> > Subject: Re: [gridengine users] Round Robin x Fill Up
> > To: Reuti <reuti at staff.uni-marburg.de>
> > Cc: "users at gridengine.org" <users at gridengine.org>
> >
> >
> > Hi Reuti,
> >
> > > Do you start any `mpiexec` resp. `mpirun` in your job script, or is this issued already inside the application you started? The question is whether there is any additional "-hostlist", "-machinefile" or the like given as an argument to this command, invalidating the generated $PE_HOSTFILE of SGE.
> >
> > The job is started using mpiexec, in this way:
> > $ qsub -N $nameofthecase -b y -pe orte $1 -cwd mpiexec newave170502_L
> > where newave170502_L is the name of the MPI app.
> >
> > >You can also try the following:
> > >
> > >- revert the PE definition to allocate by $round_robin
> > >- submit a job
> > >- SSH to the master node of the parallel job
> > >- issue:
> > >
> > >ps -e f --cols=500
> > >
> > >(f w/o -)
> >
> > > - Somewhere there should be the `mpiexec` resp. `mpirun` command. Can you please post this line? It should be a child of the started job script.
> >
> > Here comes the output:
> >
> > 2382 ?        Sl     0:00 /opt/sge6/bin/linux-x64/sge_execd
> >  2817 ?        S      0:00  \_ sge_shepherd-1 -bg
> >  2819 ?        Ss     0:00      \_ mpiexec newave170502_L
> >  2820 ?        S      0:00          \_ /usr/bin/hydra_pmi_proxy --control-port master:40945 --demux poll --pgid 0 --retries 10 --proxy-id 0
> >  2822 ?        R      0:30          |   \_ newave170502_L
> >  2821 ?        Sl     0:00          \_ /opt/sge6/bin/linux-x64/qrsh -inherit -V node001 "/usr/bin/hydra_pmi_proxy" --control-port master:40945 --demux poll --pgid 0 --retries 10 --proxy-id 1
> >
> > All best,
> >
> > Sergio
> >
> >
> > On Sat, Jul 27, 2013 at 10:13 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
> > Hi,
> >
> > On 26.07.2013, at 23:26, Sergio Mafra wrote:
> >
> > > Hi Reuti,
> > >
> > > Thanks for your prompt answer.
> > > Regarding your questions:
> > >
> > > > How does your application read the list of granted machines?
> > > > Did you compile MPI on your own (which implementation in detail)?
> > >
> > > I've got no control over this app and no documentation for it. It was designed by an Electrical Research Center for our purposes.
> > >
> > > > PS: I assume that with $round_robin access was simply allowed to all (or at least many) nodes.
> > >
> > > Yes, that's correct.
> > >
> > > > As hosts are now filled up first before access to another one is granted, you might see the effect of the former (possibly wrong) distribution of slave tasks to the nodes.
> > >
> > > So I understand that the app should be recompiled to take advantage of the $fill_up option?
> >
> > Not necessarily; the version of MPI in use is obviously prepared to run under the control of SGE, as it uses `qrsh -inherit ...` to start slave tasks on other nodes. Unfortunately it does so also on machines/slots which weren't granted to this job, which results in the error you mentioned first.
> >
> > Do you start any `mpiexec` resp. `mpirun` in your job script, or is this issued already inside the application you started? The question is whether there is any additional "-hostlist", "-machinefile" or the like given as an argument to this command, invalidating the generated $PE_HOSTFILE of SGE.
> >
> > The MPI library should detect the granted allocation automatically, as it already recognizes that it's started under SGE.
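> >
> > (A way to see what SGE actually handed over, as a rough sketch: submit a tiny script with the same "-pe orte ..." request that just does
> >
> > #!/bin/sh
> > cat "$PE_HOSTFILE"
> >
> > Each line of that file names a host and the number of slots granted there, and a correctly integrated `mpiexec` should follow exactly this allocation.)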
> >
> > You can also try the following:
> >
> > - revert the PE definition to allocate by $round_robin
> > - submit a job
> > - SSH to the master node of the parallel job
> > - issue:
> >
> > ps -e f --cols=500
> >
> > (f w/o -)
> >
> > - Somewhere there should be the `mpiexec` resp. `mpirun` command. Can you please post this line? It should be a child of the started job script.
> >
> > -- Reuti
> >
> >
> > > All the best,
> > >
> > > Sergio
> > >
> > >
> > > On Fri, Jul 26, 2013 at 10:06 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
> > > Hi,
> > >
> > > On 26.07.2013, at 14:22, Sergio Mafra wrote:
> > >
> > > > I'm using MIT StarCluster with MPICH2 and OGE. Everything's ok.
> > > > But when I tried to change the strategy of work distribution from Round Robin (default) to Fill Up, my problems began.
> > > > OGE keeps telling me that some nodes cannot receive tasks...
> > >
> > > On the one hand this is a good sign, as it confirms that your PE is defined to control slave tasks on the nodes.
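> > >
> > > For reference, a tightly integrated PE typically looks roughly like this (illustrative values only; check your actual definition with `qconf -sp orte`):
> > >
> > > pe_name            orte
> > > slots              9999
> > > allocation_rule    $fill_up
> > > control_slaves     TRUE
> > > job_is_first_task  FALSE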
> > >
> > >
> > > > "Error: executing task of job 9 failed: execution daemon on host "node002" didn't accept task"It seems that my mpi app always tries to run in all nodes of the cluster, no matter if OGE doesn't allow it to do it.
> > > > Does anybody knows of a workaround ?
> > >
> > > This indicates that your application tries to use a node in the cluster which wasn't granted to this job by SGE.
> > >
> > > How does your application read the list of granted machines?
> > >
> > > Did you compile MPI on your own (which implementation in detail)?
> > >
> > > -- Reuti
> > >
> > > PS: I assume that with $round_robin access was simply allowed to all (or at least many) nodes. As hosts are now filled up first before access to another one is granted, you might see the effect of the former (possibly wrong) distribution of slave tasks to the nodes.
> > >
> >
> >
> >
> >
> 
> 



