[gridengine users] Fwd: error in parallel run openmpi for gridengine

Yong Wu wuy069 at gmail.com
Fri Apr 7 14:04:26 UTC 2017


Thanks for your reply.
First of all, I can run this job on multiple nodes without any resource
manager, and it also works fine under Torque. But the job does not run on
multiple nodes under gridengine, so I suspect the problem is related to the
gridengine parallel environment. However, I get the same error with the
orte, mpi, and mpich PEs of gridengine.

I will answer your questions below.

> Can you please post the output of the $PE_HOSTFILE and the converted
> test.nodes for a run, and the allocation you got: qstat -g t

The output of $PE_HOSTFILE:
compute-0-34.local 16 bgmnode.q@compute-0-34.local UNDEFINED
compute-0-67.local 8 bgmnode.q@compute-0-67.local UNDEFINED

The converted test.nodes:
$ cat test.nodes
compute-0-34
compute-0-34
compute-0-34
compute-0-34
compute-0-34
compute-0-34
compute-0-34
compute-0-34
compute-0-34
compute-0-34
compute-0-34
compute-0-34
compute-0-34
compute-0-34
compute-0-34
compute-0-34
compute-0-67
compute-0-67
compute-0-67
compute-0-67
compute-0-67
compute-0-67
compute-0-67
compute-0-67
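
One thing I notice when comparing the two listings: $PE_HOSTFILE carries the
full host names (compute-0-67.local), while the conversion in my job script
cuts off the domain part, so test.nodes only contains the short names
(compute-0-67). In case this mismatch matters, a variant of the conversion
that keeps the names exactly as SGE reports them would be (only a sketch, I
have not tried it with ORCA yet):

  # hypothetical replacement for PeHostfile2MachineFile in my job script below:
  # each $PE_HOSTFILE line is "host slots queue-instance processor-range";
  # print the unmodified host name once per slot
  awk '{ for (i = 0; i < $2; i++) print $1 }' $PE_HOSTFILE > $tdir/test.nodes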

$ qstat -g t
job-ID  prior   name       user         state submit/start at     queue                          master ja-task-ID
-------------------------------------------------------------------------------------------------------------------
  84462 0.60500 test       wuy          r     04/07/2017 21:37:18 bgmnode.q@compute-0-34.local   MASTER
                                                                   bgmnode.q@compute-0-34.local   SLAVE
                                                                   bgmnode.q@compute-0-34.local   SLAVE
                                                                   bgmnode.q@compute-0-34.local   SLAVE
                                                                   bgmnode.q@compute-0-34.local   SLAVE
                                                                   bgmnode.q@compute-0-34.local   SLAVE
                                                                   bgmnode.q@compute-0-34.local   SLAVE
                                                                   bgmnode.q@compute-0-34.local   SLAVE
                                                                   bgmnode.q@compute-0-34.local   SLAVE
                                                                   bgmnode.q@compute-0-34.local   SLAVE
                                                                   bgmnode.q@compute-0-34.local   SLAVE
                                                                   bgmnode.q@compute-0-34.local   SLAVE
                                                                   bgmnode.q@compute-0-34.local   SLAVE
                                                                   bgmnode.q@compute-0-34.local   SLAVE
                                                                   bgmnode.q@compute-0-34.local   SLAVE
                                                                   bgmnode.q@compute-0-34.local   SLAVE
                                                                   bgmnode.q@compute-0-34.local   SLAVE
  84462 0.60500 test       wuy          r     04/07/2017 21:37:18 bgmnode.q@compute-0-67.local   SLAVE
                                                                   bgmnode.q@compute-0-67.local   SLAVE
                                                                   bgmnode.q@compute-0-67.local   SLAVE
                                                                   bgmnode.q@compute-0-67.local   SLAVE
                                                                   bgmnode.q@compute-0-67.local   SLAVE
                                                                   bgmnode.q@compute-0-67.local   SLAVE
                                                                   bgmnode.q@compute-0-67.local   SLAVE
                                                                   bgmnode.q@compute-0-67.local   SLAVE

> The "mpivars.sh" seems not to be in the default Open MPI compilation.
Where is it coming from, what's inside?
The "mpivars.sh" is touched by me, and the content:
$ cat /share/apps/mpi/openmpi2.0.2-ifort/bin/mpivars.sh
# PATH
if test -z "`echo $PATH | grep /share/apps/mpi/openmpi2.0.2-ifort/bin`"; then
    PATH=/share/apps/mpi/openmpi2.0.2-ifort/bin:${PATH}
    export PATH
fi

# LD_LIBRARY_PATH
if test -z "`echo $LD_LIBRARY_PATH | grep /share/apps/mpi/openmpi2.0.2-ifort/lib`"; then
    LD_LIBRARY_PATH=/share/apps/mpi/openmpi2.0.2-ifort/lib${LD_LIBRARY_PATH:+:}${LD_LIBRARY_PATH}
    export LD_LIBRARY_PATH
fi

# MANPATH
if test -z "`echo $MANPATH | grep /share/apps/mpi/openmpi2.0.2-ifort/share/man`"; then
    MANPATH=/share/apps/mpi/openmpi2.0.2-ifort/share/man:${MANPATH}
    export MANPATH
fi
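
After sourcing it, I check that the intended installation is the one picked up
first (just the obvious sanity check, nothing ORCA-specific):

  $ which mpirun      # should point into /share/apps/mpi/openmpi2.0.2-ifort/bin
  $ mpirun --version  # should report Open MPI 2.0.2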

> Did you compile Open MPI with the "--with-sge" in the ./configure step?

Yes, I configured it with the Intel compiler and with the "--with-sge" option:
$ module load intel/compiler/2011.7.256
$ source /share/apps/mpi/openmpi2.0.2-ifort/bin/mpivars.sh
$ ompi_info | grep gridengine
                 MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component v2.0.2)
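
To double-check from inside a job that the allocation is really taken from
gridengine (and not from any hostfile), I can also run a trivial command with
the RAS verbosity turned up. This is only a sketch; the verbosity level is a
guess on my side:

  # inside the SGE job script, no -hostfile/-machinefile given
  mpirun --mca ras_base_verbose 5 -np $NSLOTS hostname

It should report the hosts and slot counts Open MPI obtains from SGE and then
run hostname once per granted slot.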

> Side note:

I create the same directory on each node and also use the NFS-shared directory
as the scratch directory. The job uses the following environment:
source /usr/share/Modules/init/sh
module load intel/compiler/2011.7.256
source /share/apps/mpi/openmpi2.0.2-ifort/bin/mpivars.sh
export RSH_COMMAND="ssh"

With this environment I can run the ORCA job normally on multiple nodes
without gridengine by typing the command
"/share/apps/orca4.0.0/orca test.inp &>test.log &".
But under the gridengine resource manager I get the error:
--------------------------------------------------------------------------
A hostfile was provided that contains at least one node not
present in the allocation:

  hostfile:  test.nodes
  node:      compute-0-67

If you are operating in a resource-managed environment, then only
nodes that are in the allocation can be used in the hostfile. You
may find relative node syntax to be a useful alternative to
specifying absolute node names see the orte_hosts man page for
further information.
--------------------------------------------------------------------------
I do not know why.
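
To narrow it down further, I will also try a quick check from inside the job
script (again only a sketch, not run yet): with control_slaves TRUE the slave
node should be reachable through SGE's tight integration, e.g.

  # compute-0-67.local is the slave host granted in the allocation above
  qrsh -inherit compute-0-67.local hostname

If that works, the tight integration to the second node itself should be fine
and the problem is limited to the hostfile handling.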

Best regards,
Yong Wu


2017-04-07 20:14 GMT+08:00 Reuti <reuti at staff.uni-marburg.de>:

> Hi,
>
> > Am 07.04.2017 um 09:42 schrieb Yong Wu <wuy069 at gmail.com>:
> >
> > Hi all,
> >   I submit a parallel ORCA (Quantum Chemistry Program) job on multiple
> nodes in Rocks SGE, and get the follow error information,
> > --------------------------------------------------------------------------
> > A hostfile was provided that contains at least one node not
> > present in the allocation:
> >
> >   hostfile:  test.nodes
> >   node:      compute-0-67
> >
> > If you are operating in a resource-managed environment, then only
> > nodes that are in the allocation can be used in the hostfile. You
> > may find relative node syntax to be a useful alternative to
> > specifying absolute node names see the orte_hosts man page for
> > further information.
> > --------------------------------------------------------------------------
>
> Although a nodefile is not necessary, it might point to a bug in Open MPI
> - see below to get rid of it. Can you please post the output of the
> $PE_HOSTFILE and the converted test.nodes for a run, and the allocation you
> got:
>
> qstat -g t
>
> (You can limit the output to your user account and all the lines belonging
> to the job in question.)
>
>
> > The ORCA program compiled with openmpi, here, I used orte parallel
> environment in Rocks SGE.
>
> Well, you can decide whether I answer here or on the ORCA list ;-)
>
>
> > $ qconf -sp orte
> > pe_name            orte
> > slots              9999
> > user_lists         NONE
> > xuser_lists        NONE
> > start_proc_args    /bin/true
> > stop_proc_args     /bin/true
> > allocation_rule    $fill_up
> > control_slaves     TRUE
> > job_is_first_task  FALSE
> > urgency_slots      min
> > accounting_summary TRUE
>
> This is fine.
>
>
> > The submitted sge script:
> >   #!/bin/bash
> >   # Job submission script:
> >   # Usage: qsub <this_script>
> >   #
> >   #$ -cwd
> >   #$ -j y
> >   #$ -o test.sge.o$JOB_ID
> >   #$ -S /bin/bash
> >   #$ -N test
> >   #$ -pe orte 24
> >   #$ -l h_vmem=3.67G
> >   #$ -l h_rt=1240:00:00
> >
> >   # go to work dir
> >   cd $SGE_O_WORKDIR
>
> There is a switch for it:
>
> #$ -cwd
>
>
> >
> >   # load the module env for ORCA
> >   source /usr/share/Modules/init/sh
> >   module load intel/compiler/2011.7.256
> >   source /share/apps/mpi/openmpi2.0.2-ifort/bin/mpivars.sh
>
> The "mpivars.sh" seems not to be in the default Open MPI compilation.
> Where is it coming from, what's inside?
>
> Did you compile Open MPI with the "--with-sge" in the ./configure step? In
> case you didn't compile it on your own, you should see something like this:
>
> $ ompi_info | grep grid
>                  MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component v2.1.0)
>
>
> >   export orcapath=/share/apps/orca4.0.0
> >   export RSH_COMMAND="ssh"
> >
> >   #create scratch dir on nfs dir
> >   tdir=/home/data/$SGE_O_LOGNAME/$JOB_ID
> >   mkdir -p $tdir
> >
> >   #cat $PE_HOSTFILE
> >
> >   PeHostfile2MachineFile()
> >   {
> >      cat $1 | while read line; do
> >         # echo $line
> >         host=`echo $line|cut -f1 -d" "|cut -f1 -d"."`
> >         nslots=`echo $line|cut -f2 -d" "`
> >         i=1
> >         while [ $i -le $nslots ]; do
> >            # add here code to map regular hostnames into ATM hostnames
> >            echo $host
> >            i=`expr $i + 1`
> >         done
> >      done
> >   }
> >
> >   PeHostfile2MachineFile $PE_HOSTFILE >> $tdir/test.nodes
>
> In former times, this conversion was done in the start_proc_args. Nowadays
> you neither need this conversion, nor any "machines" file, nor the
> "test.nodes" file any longer. Open MPI will detect on its own the correct
> number of slots to use on each node.
>
> There are only some multi-serial computations in ORCA, which need rsh/ssh
> and a nodefile (I have to check whether they don't just pull the
> information out of a `mpiexec`).
>
>
> >   cp ${SGE_O_WORKDIR}/test.inp $tdir
> >
> >   cd $tdir
>
> Side note:
>
> In ORCA there seem to be several types of jobs:
>
> - some types of ORCA jobs can compute happily in $TMPDIR using the scratch
> directory on the nodes (even in case the job needs more than one machine)
> - some need a shared scratch directory, like you create here in the shared
> /home
> - some will start several serial processes on the granted nodes by the
> defined $RSH_COMMAND
>
> -- Reuti
>
>
> >
> >   echo "ORCA job start at" `date`
> >
> >   time $orcapath/orca test.inp > ${SGE_O_WORKDIR}/test.log
> >
> >   rm ${tdir}/test.inp
> >   rm ${tdir}/test.*tmp 2>/dev/null
> >   rm ${tdir}/test.*tmp.* 2>/dev/null
> >   mv ${tdir}/test.* $SGE_O_WORKDIR
> >
> >   echo "ORCA job finished at" `date`
> >
> >   echo "Work Dir is : $SGE_O_WORKDIR"
> >
> >   rm -rf $tdir
> >   rm $SGE_O_WORKDIR/test.sge
> >
> >
> > However, the job can run normally on multiple nodes in Torque.
> >
> > Can someone help me? Thanks very much!
> >
> > Best regards!
> > Yong Wu
> > _______________________________________________
> > users mailing list
> > users at gridengine.org
> > https://gridengine.org/mailman/listinfo/users
>
>