[gridengine users] I can't run mpi jobs correctly
mahbube rustaee
rustaee at gmail.com
Wed Nov 23 08:07:24 UTC 2011
Thank you for your time, Mr. Reuti; I appreciate your kindness.
I compiled openmpi-1.4.2 with the Intel compilers, configured with --with-sge,
and it works correctly from the command line (Open MPI is integrated with SGE).
I modified the PE mpifillamd as follows:
pe_name mpifillamd
slots 9999
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $fill_up
control_slaves FALSE
job_is_first_task FALSE
urgency_slots min
accounting_summary FALSE
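One setting in this PE may be worth checking, offered as a sketch rather than a confirmed diagnosis: with tight integration, Open MPI starts its remote daemons through `qrsh -inherit`, and SGE only accepts such tasks when the PE has `control_slaves TRUE`. The helper name `pe_control_slaves` below is hypothetical; it reads a PE definition in the "key value" format printed by `qconf -sp` and prints the value of `control_slaves`. The sample input is a shortened copy of the PE above.

```shell
#!/bin/bash
# Hypothetical helper: print the control_slaves value from a PE definition
# read on stdin (the "key value" format printed by `qconf -sp <pe_name>`).
pe_control_slaves() {
    awk '$1 == "control_slaves" { print toupper($2) }'
}

# Sample input copied from the PE shown above. On a real cluster one would
# run:  qconf -sp mpifillamd | pe_control_slaves
sample_pe='pe_name mpifillamd
allocation_rule $fill_up
control_slaves FALSE
job_is_first_task FALSE'

value=$(printf '%s\n' "$sample_pe" | pe_control_slaves)
if [ "$value" = "TRUE" ]; then
    echo "control_slaves is TRUE: tasks started via qrsh -inherit are allowed"
else
    echo "control_slaves is $value: tasks started via qrsh -inherit are expected to be rejected"
fi
```

If this turns out to be the cause, the value can be edited with `qconf -mp mpifillamd`.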
I compiled my program with the new Open MPI build,
and my job script is:
#!/bin/bash
#$ -S /bin/bash
#$ -N Det2
#$ -cwd
#$ -j y
#$ -pe mpifillamd 100
. $HOME/.intelbash
. ~/openmpi_intel_1.4.2.sh
which mpirun
echo $LD_LIBRARY_PATH
mpirun -n $NSLOTS mpi-integ-sge-intel.comp
Output is:
/home/mrustaee/PF/openmpi-1.4.2/intel/bin/mpirun
/home/mrustaee/PF/openmpi-1.4.2/intel/lib:/opt/intel/Compiler/11.1/069/lib/intel64:/opt/intel/Compiler/11.1/069/ipp/em64t/sharedlib:/opt/intel/Compiler/11.1/069/mkl/lib/em64t:/opt/intel/Compiler/11.1/069/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Compiler/11.1/069/lib/intel64:/opt/intel/Compiler/11.1/069/ipp/em64t/sharedlib:/opt/intel/Compiler/11.1/069/mkl/lib/em64t:/opt/intel/Compiler/11.1/069/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
error: executing task of job 1227 failed: execution daemon on host
"amd-7-4.local" didn't accept task
--------------------------------------------------------------------------
A daemon (pid 31144) died unexpectedly with status 1 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
error: executing task of job 1227 failed: execution daemon on host
"amd-7-3.local" didn't accept task
mpirun: clean termination accomplished
-----------------------------------------
The library paths do not conflict with each other: .intelbash sets the Intel
library path and openmpi_intel_1.4.2.sh sets the Open MPI library path.
--------------------------------------------------------------------------------------
When I qsub a script without the -pe option and run the job with a hostfile, like this:
#!/bin/bash
#$ -S /bin/bash
#$ -N Det2
#$ -cwd
#$ -j y
. $HOME/.intelbash
. ~/openmpi_intel_1.4.2.sh
which mpirun
echo $LD_LIBRARY_PATH
mpirun -n 300 --hostfile machines mpi-integ-sge-intel.comp
everything works. The file machines lists the same hosts on which the job
submitted with -pe failed to run.
What is happening?
I can't track down the cause of that error.
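To compare the two runs, it may help to see which hosts and slot counts SGE actually grants to the -pe job. SGE writes these to the file named by `$PE_HOSTFILE`, one line per host with the columns host, slots, queue instance, and processor range. The following is a minimal sketch; the helper name `summarize_pe_hostfile` is hypothetical, and the sample slot counts and queue name are invented for illustration.

```shell
#!/bin/bash
# Hypothetical helper: summarise an SGE PE hostfile
# (columns: host, slots, queue instance, processor range).
summarize_pe_hostfile() {
    awk '{ printf "%s %d\n", $1, $2; total += $2 }
         END { printf "total %d\n", total }' "$1"
}

# Inside a job script one would call:  summarize_pe_hostfile "$PE_HOSTFILE"
# Here a made-up sample file stands in for it (slot counts and queue
# name are invented, not taken from the real cluster).
sample=$(mktemp)
cat > "$sample" <<'EOF'
amd-7-4.local 4 all.q@amd-7-4.local UNDEFINED
amd-7-3.local 2 all.q@amd-7-3.local UNDEFINED
EOF

summary=$(summarize_pe_hostfile "$sample")
echo "$summary"
rm -f "$sample"
```

The total on the last line should match `$NSLOTS`, and the host names can be checked against the entries in machines.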
Thx so much