[gridengine users] Round Robin x Fill Up

Reuti reuti at staff.uni-marburg.de
Wed Jul 31 15:19:00 UTC 2013


On 31.07.2013 at 17:08, Sergio Mafra wrote:

> Reuti,
> 
> Found this link: http://mickey.ifp.illinois.edu/speechWiki/index.php?title=Simple_UserGuide_for_MPICH2&oldid=4403
> 
> It seems (I haven't tested it yet) that it uses an MPICH2 version prior to 1.4 (without the tight integration) and OGE with $fill_up.
> 
> Another question: is there any way to launch mpd with $fill_up instead of $round_robin?

My Howto applies to both (better: all) allocation rules available in SGE.
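
As a rough sketch (not taken from the Howto): whichever allocation rule the PE uses, the distribution SGE actually granted ends up in the file that $PE_HOSTFILE points to inside the job, so that is the place to check what $fill_up or $round_robin gave you. The helper name summarize_pe_hostfile is made up for illustration, and I assume the usual layout with the host name in the first column and the slot count in the second.

```shell
#!/bin/sh
# Hypothetical helper (the name is made up for this sketch): print
# "host slots" for every line of an SGE PE hostfile, then the total.
# Assumes column 1 is the host name and column 2 the granted slot count.
summarize_pe_hostfile() {
    awk '{ print $1, $2; total += $2 } END { print "total", total + 0 }' "$1"
}

# Inside a parallel job SGE sets PE_HOSTFILE; outside of one this does nothing.
if [ -n "${PE_HOSTFILE:-}" ]; then
    summarize_pe_hostfile "$PE_HOSTFILE"
fi
```

Placed in the job script ahead of the mpiexec line, this lists the hosts and slot counts that were granted, independent of the rule configured in the PE.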

-- Reuti


> All the best,
> 
> Sergio
> 
> 
> On Sun, Jul 28, 2013 at 2:07 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
> On 28.07.2013 at 18:10, Sergio Mafra wrote:
> 
> > Ok,
> >
> > In fact, the app was compiled with mpich2 version 1.08, and since it's distributed by a research center, we have no way to recompile it.
> 
> This is even worse, as there is no guarantee that an application built with 1.08 will work with 1.4 at all. MPI is an application programming interface (API) but not an application binary interface (ABI), so in general you can't swap in an MPI library of a different type or version *). The one used for compilation should be the same one used later during execution.
> 
> At the time of MPICH2 1.08, the default startup method for slave tasks required an mpd ring to be running already before the job started, and most likely this is statically linked into the application. This may result in running n serial instances of the application instead of one parallel application. I don't recall exactly when Hydra appeared in MPICH2.
> 
> The best option would be to ask them to recompile the application with a recent version of an MPI library, or for you to go back to MPICH2 1.08 and set up SGE for it as described in http://arc.liv.ac.uk/SGE/howto/mpich2-integration/mpich2-integration.html (I can't provide any support for this old version or setup, though.)
> 
> -- Reuti
> 
> 
> *) There are exceptions, e.g. in Open MPI, where the ABI stays the same between an odd-numbered (feature release) version and the following even-numbered (stable release) version of the library.
> 
> 
> > I'll try to upgrade the version of mpich2 and see what happens.
> >
> > Thanks again,
> >
> > Sergio
> >
> > On Sunday, July 28, 2013, Reuti wrote:
> > On 27.07.2013 at 22:29, Sergio Mafra wrote:
> >
> > > Hi Reuti,
> > >
> > > It seems that the previous tests were wrong.
> > > I realize that your doubts were justified. Only one slot was actually busy despite all 16 being deployed.
> > >
> > > I changed the job launcher to:
> > >
> > > $qsub -N $nameofthecase -b y -pe orte 20 -cwd mpiexec -np 20 newave170502_L
> >
> > Aha, the "-np 20" option shouldn't be necessary at all. Maybe MPICH2 1.4 had a bug at that time that kept it from detecting the granted slots.
> >
> > - Was MPICH2 1.4 also used to compile the application?
> >
> > - As 1.4 is somewhat old, I suggest updating at least to 1.4.1p1:
> >
> > http://www.mpich.org/static/downloads/1.4.1p1/
> >
> > You can compile it to be installed into ~/local/mpich2-1.4.1p1 or the like, and then use this version for both compilation and execution.
> >
> > You could also try the latest release from http://www.mpich.org/ or even http://www.open-mpi.org
> >
> > -- Reuti
> >
> >
> > > Note that (for some reason) it's mandatory to tell both the PE and MPI that there are 20 slots to use.
> > >
> > > Doing that, I get this output for a job with 20 slots:
> > >
> > > $round_robin:
> > >
> > > job with 20 slots
> > > job launched as
> > > $qsub -N $nameofthecase -b y -pe orte 20 -cwd mpiexec -np 20 newave170502_L
> > >
> > > $ ps -e f --cols=500
> > >  2390 ?        Sl     0:00 /opt/sge6/bin/linux-x64/sge_execd
> > >  2835 ?        S      0:00  \_ sge_shepherd-1 -bg
> > >  2837 ?        Ss     0:00      \_ mpiexec -np 20 newave170502_L
> > >  2838 ?        S      0:00          \_ /usr/bin/hydra_pmi_proxy --control-port master:46220 --demux poll --pgid 0 --retries 10 --proxy-id 0
> > >  2840 ?        R      1:18          |   \_ newave170502_L
> > >  2841 ?        S      0:54          |   \_ newave170502_L
> > >  2842 ?        S      1:07          |   \_ newave170502_L
> > >  2843 ?        S      0:52          |   \_ newave170502_L
> > >  2844 ?        S      1:07          |   \_ newave170502_L
> > >  2845 ?        S      1:08          |   \_ newave170502_L
> > >  2846 ?        S      0:00          |   \_ newave170502_L
> > >  2847 ?        S      0:00          |   \_ newave170502_L
> > >  2848 ?        S      0:00          |   \_ newave170502_L
> > >  2849 ?        S      0:00          |   \_ newave170502_L
> > >  2839 ?        Sl     0:00          \_ /opt/sge6/bin/linux-x64/qrsh -inherit -V node001 "/usr/bin/hydra_pmi_proxy" --control-port master:46220 --demux poll --pgid 0 --retries 10 --proxy-id 1
> > >
> > >
> > > $ mpiexec --version
> > >  HYDRA build details:
> > >     Version:                                 1.4
> > >     Release Date:                            Thu Jun 16 16:41:08 CDT 2011
> > >     CC:                              gcc  -I/build/buildd/mpich2-1.4/src/mpl/include -I/build/buildd/mpich2-1.4/src/mpl/include -I/build/buildd/mpich2-1.4/src/openpa/src -I/build/buildd/mpich2-1.4/src/openpa/src -I/build/buildd/mpich2-1.4/src/mpid/ch3/include -I/build/buildd/mpich2-1.4/src/mpid/ch3/include -I/build/buildd/mpich2-1.4/src/mpid/common/datatype -I/build/buildd/mpich2-1.4/src/mpid/common/datatype -I/build/buildd/mpich2-1.4/src/mpid/common/locks -I/build/buildd/mpich2-1.4/src/mpid/common/locks -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/include -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/include -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/include -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/include -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/utils/monitor -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/utils/monitor -I/build/buildd/mpich2-1.4/src/util/wrappers -I/build/buildd/mpich2-1.4/src/util/wrappers  -g -O2 -g -O2 -Wall -O2  -Wl,-Bsymbolic-functions  -lrt -lcr -lpthread
> > >     CXX:
> > >     F77:
> > >     F90:                             gfortran  -Wl,-Bsymbolic-functions  -lrt -lcr -lpthread
> > >     Configure options:                       '--build=x86_64-linux-gnu' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--libexecdir=${prefix}/lib/mpich2' '--srcdir=.' '--disable-maintainer-mode' '--disable-dependency-tracking' '-
> 
> 




