[gridengine users] Round Robin x Fill Up

Sergio Mafra sergiohmafra at gmail.com
Wed Jul 31 15:35:19 UTC 2013


Reuti,

I agree. Could you point me to the latest version of your how-to?

Thank you.

Sergio


On Wed, Jul 31, 2013 at 12:19 PM, Reuti <reuti at staff.uni-marburg.de> wrote:

> On 31.07.2013 at 17:08, Sergio Mafra wrote:
>
> > Reuti,
> >
> > Found this link:
> http://mickey.ifp.illinois.edu/speechWiki/index.php?title=Simple_UserGuide_for_MPICH2&oldid=4403
> >
> > It seems (I haven't tested it yet) that it uses an MPICH2 version prior to 1.4 (without the tight integration) and OGE with $fill_up.
> >
> > Other question: Is there any way to launch mpd with $fill_up instead of $round_robin?
>
> My Howto applies to both (or rather: all) allocation rules available in SGE.
>
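> For illustration only (a minimal sketch, not taken from the Howto itself):
> the allocation rule lives in the PE configuration, e.g. shown with
> "qconf -sp orte" for the PE named "orte" used in this thread:
>
>    pe_name            orte
>    slots              999
>    allocation_rule    $fill_up
>    control_slaves     TRUE
>    job_is_first_task  FALSE
>
> Changing $fill_up to $round_robin (via "qconf -mp orte") only changes how
> the granted slots are spread across the hosts; the startup mechanism of
> the parallel job stays the same.
>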
> -- Reuti
>
>
> > All the best,
> >
> > Sergio
> >
> >
> > On Sun, Jul 28, 2013 at 2:07 PM, Reuti <reuti at staff.uni-marburg.de>
> wrote:
> > On 28.07.2013 at 18:10, Sergio Mafra wrote:
> >
> > > Ok,
> > >
> > > In fact, the app was compiled with mpich2 version 1.08, and since it's distributed by a Research Center, we have no way to recompile it ourselves.
> >
> > This is even worse, as there is no guarantee that an application built with 1.08 will work with 1.4 at all. MPI is an application programming interface (API), not an application binary interface (ABI), hence the MPI library can't in general be exchanged for a different implementation or version *). The one used for compilation should be the same one used later on during execution.
> >
> > At the time of MPICH2 1.08 the default startup of slave tasks required an mpd ring to be running already before the job was started, and most likely this behaviour is statically linked into the application. This may result in n instances of the application running serially instead of one parallel application. I don't recall exactly when Hydra appeared in MPICH2.
> >
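> > As a rough sketch only (assuming the classic mpd-based startup of that
> > era), such an application would expect something along the lines of:
> >
> >    mpdboot -n <number-of-hosts> -f <hostfile>   # bring up the mpd ring first
> >    mpiexec -n 20 ./newave170502_L
> >    mpdallexit                                   # tear the ring down again
> >
> > whereas Hydra needs no daemon ring and starts the slave tasks directly
> > (under a tight integration via "qrsh -inherit").
> >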
> > The best would be to ask them to recompile the application with a recent version of an MPI library, or for you to go back to MPICH2 1.08 and set up SGE for it: http://arc.liv.ac.uk/SGE/howto/mpich2-integration/mpich2-integration.html But I can't provide any support for this old version or setup.
> >
> > -- Reuti
> >
> >
> > *) There are exceptions, e.g. in Open MPI, where the ABI stays the same between an odd (feature) release and the following even (stable) release of the library.
> >
> >
> > > I'll try to upgrade the version of mpich2 and see what happens.
> > >
> > > Thanks again,
> > >
> > > Sergio
> > >
> > > On Sunday, July 28, 2013, Reuti wrote:
> > > On 27.07.2013 at 22:29, Sergio Mafra wrote:
> > >
> > > > Hi Reuti,
> > > >
> > > > It seems that the previous tests were wrong.
> > > > I realize that your doubts were justified: only one slot was actually busy, despite all 16 being requested.
> > > >
> > > > I'd change the job launcher to:
> > > >
> > > > $qsub -N $nameofthecase -b y -pe orte 20 -cwd mpiexec -np 20 newave170502_L
> > >
> > > Aha, the "-np 20" option shouldn't be necessary at all. Maybe it was a bug in MPICH2 1.4 at that time that it didn't detect the granted slots.
> > >
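> > > Just as a sketch of what it should look like with a working tight
> > > integration (same job name and PE as above):
> > >
> > >    qsub -N $nameofthecase -b y -pe orte 20 -cwd mpiexec newave170502_L
> > >
> > > i.e. without "-np 20", as mpiexec is supposed to pick up the granted
> > > slots from the environment SGE prepares for the job.
> > >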
> > > - Was the MPICH2 1.4 version used also to compile the application?
> > >
> > > - As 1.4 is somewhat old, I suggest updating at least to 1.4.1p1:
> > >
> > > http://www.mpich.org/static/downloads/1.4.1p1/
> > >
> > > You can compile it to be installed into ~/local/mpich2-1.4.1p1 or the like, and then use this version for both compilation and execution.
> > >
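> > > A rough sketch of such a build (assuming the plain tarball from the URL
> > > above):
> > >
> > >    tar xzf mpich2-1.4.1p1.tar.gz && cd mpich2-1.4.1p1
> > >    ./configure --prefix=$HOME/local/mpich2-1.4.1p1
> > >    make && make install
> > >
> > > and then put $HOME/local/mpich2-1.4.1p1/bin first in your PATH for both
> > > compiling and running the application.
> > >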
> > > You could also try the latest MPICH from http://www.mpich.org/ or even Open MPI from
> http://www.open-mpi.org
> > >
> > > -- Reuti
> > >
> > >
> > > > Note that (for some reason) it's mandatory to tell both the PE and mpiexec that there are 20 slots to use.
> > > >
> > > > Doing that, this is the output for a job with 20 slots:
> > > >
> > > > $round_robin:
> > > >
> > > > job with 20 slots
> > > > job launched as
> > > > $qsub -N $nameofthecase -b y -pe orte 20 -cwd mpiexec -np 20 newave170502_L
> > > >
> > > > $ ps -e f --cols=500
> > > >  2390 ?        Sl     0:00 /opt/sge6/bin/linux-x64/sge_execd
> > > >  2835 ?        S      0:00  \_ sge_shepherd-1 -bg
> > > >  2837 ?        Ss     0:00      \_ mpiexec -np 20 newave170502_L
> > > >  2838 ?        S      0:00          \_ /usr/bin/hydra_pmi_proxy --control-port master:46220 --demux poll --pgid 0 --retries 10 --proxy-id 0
> > > >  2840 ?        R      1:18          |   \_ newave170502_L
> > > >  2841 ?        S      0:54          |   \_ newave170502_L
> > > >  2842 ?        S      1:07          |   \_ newave170502_L
> > > >  2843 ?        S      0:52          |   \_ newave170502_L
> > > >  2844 ?        S      1:07          |   \_ newave170502_L
> > > >  2845 ?        S      1:08          |   \_ newave170502_L
> > > >  2846 ?        S      0:00          |   \_ newave170502_L
> > > >  2847 ?        S      0:00          |   \_ newave170502_L
> > > >  2848 ?        S      0:00          |   \_ newave170502_L
> > > >  2849 ?        S      0:00          |   \_ newave170502_L
> > > >  2839 ?        Sl     0:00          \_ /opt/sge6/bin/linux-x64/qrsh -inherit -V node001 "/usr/bin/hydra_pmi_proxy" --control-port master:46220 --demux poll --pgid 0 --retries 10 --proxy-id 1
> > > >
> > > >
> > > > $ mpiexec --version
> > > >  HYDRA build details:
> > > >     Version:                                 1.4
> > > >     Release Date:                            Thu Jun 16 16:41:08 CDT 2011
> > > >     CC:                                      gcc
> > > >       -I/build/buildd/mpich2-1.4/src/mpl/include
> > > >       -I/build/buildd/mpich2-1.4/src/mpl/include
> > > >       -I/build/buildd/mpich2-1.4/src/openpa/src
> > > >       -I/build/buildd/mpich2-1.4/src/openpa/src
> > > >       -I/build/buildd/mpich2-1.4/src/mpid/ch3/include
> > > >       -I/build/buildd/mpich2-1.4/src/mpid/ch3/include
> > > >       -I/build/buildd/mpich2-1.4/src/mpid/common/datatype
> > > >       -I/build/buildd/mpich2-1.4/src/mpid/common/datatype
> > > >       -I/build/buildd/mpich2-1.4/src/mpid/common/locks
> > > >       -I/build/buildd/mpich2-1.4/src/mpid/common/locks
> > > >       -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/include
> > > >       -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/include
> > > >       -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/include
> > > >       -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/include
> > > >       -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/utils/monitor
> > > >       -I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/utils/monitor
> > > >       -I/build/buildd/mpich2-1.4/src/util/wrappers
> > > >       -I/build/buildd/mpich2-1.4/src/util/wrappers
> > > >       -g -O2 -g -O2 -Wall -O2 -Wl,-Bsymbolic-functions -lrt -lcr -lpthread
> > > >     CXX:
> > > >     F77:
> > > >     F90:                                     gfortran -Wl,-Bsymbolic-functions -lrt -lcr -lpthread
> > > >     Configure options:                       '--build=x86_64-linux-gnu' '--includedir=${prefix}/include'
> > > >       '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc'
> > > >       '--localstatedir=/var' '--libexecdir=${prefix}/lib/mpich2' '--srcdir=.'
> > > >       '--disable-maintainer-mode' '--disable-dependency-tracking' '-
> >
> >
>
>