[gridengine users] trouble running MPI jobs through SGE

Marlies Hankel m.hankel at uq.edu.au
Fri Apr 10 02:51:42 UTC 2015


Dear all,

I have a ROCKS 6.1.1 install with the SGE roll, so the base 
configuration came from the ROCKS install. The only changes I have made 
are making the h_vmem complex consumable, setting up a scratch complex, 
and setting h_vmem for all hosts.

I can run single CPU jobs fine and can execute simple things like

mpirun -np 40 hostname

but I cannot run proper MPI programs. I get the following error:

mpirun noticed that process rank 0 with PID 27465 on node phi-0-3 exited 
on signal 11 (Segmentation fault).

The queue error logs on the head node and the execution nodes 
(/opt/gridengine/default/spool/../messages) show nothing, and the .e, 
.o, .pe and .po files show nothing either. The error above appears in 
the standard output file of the program. I am trying VASP but have also 
tried a home-grown MPI code. Both have run out of the box via SGE for 
years on our old cluster (which was not ROCKS). I have tried the 
supplied orte PE (the programs are compiled with Open MPI 1.8.4, built 
with the Intel compilers and with --with-sge and --with-verbs) and have 
also tried a PE where I specify catch rsh and the startmpi and stopmpi 
scripts, but it made no difference. It seems as if the program does not 
even start. I am not even trying to run over several nodes yet.
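
For reference, a job script for this kind of single-node test might look 
roughly like the sketch below. It is an illustration only: the slot 
count, the h_vmem request and the MPI_APP variable are assumptions, not 
the actual submission used here. NSLOTS and JOB_ID are the variables SGE 
sets inside a job.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -j y
#$ -pe orte 4          # illustrative slot count (assumption)
#$ -l h_vmem=2G        # illustrative request for the consumable h_vmem (assumption)

# NSLOTS is set by SGE for parallel jobs; default to 1 when run by hand.
nslots="${NSLOTS:-1}"

# MPI_APP is a hypothetical variable naming the program to launch;
# "hostname" is the trivial test that already works.
app="${MPI_APP:-hostname}"

echo "slots granted: ${nslots}"

if [ -n "${JOB_ID:-}" ]; then
    # Inside an SGE job: let mpirun start one rank per granted slot.
    mpirun -np "$nslots" "$app"
else
    # Outside SGE: only print what would be run.
    echo "not inside an SGE job; would run: mpirun -np ${nslots} ${app}"
fi
```

Saved as, say, test.sh, it would be submitted with "qsub test.sh"; 
swapping "hostname" for the real binary is then a one-variable change.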

In addition, I can run the program (VASP) perfectly fine by ssh-ing to 
a node and running it from the command line, and also across several 
nodes via a hostfile. So VASP itself is working fine.

I had a look at env and made sure ulimits are set OK (need ulimit -s 
unlimited for VASP to work) but all looks OK.
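
A minimal sketch of one way to make that comparison from inside a job 
rather than from an ssh login, since the limits a batch job sees can 
differ from those of an interactive shell. The output file name and the 
set of limits recorded are illustrative assumptions, not the exact check 
described above.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -j y
# Hypothetical diagnostic: record the limits and environment this shell
# actually has, so a batch run can be diffed against an ssh session.

# JOB_ID is set by SGE inside a job; fall back to "interactive" elsewhere.
tag="${JOB_ID:-interactive}"
out="limits.${tag}.$(uname -n).txt"

{
    echo "stack_kb=$(ulimit -s)"     # stack size (VASP needs this unlimited)
    echo "vmem_kb=$(ulimit -v)"      # virtual memory limit
    echo "memlock_kb=$(ulimit -l)"   # locked memory limit
    echo "--- env ---"
    env | sort
} > "$out"

echo "wrote $out"
```

Running it once through qsub and once by hand on the same node, then 
diffing the two files, shows any limit or variable that differs between 
the two cases.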

Has anyone seen this problem before? Or do you have any suggestions on 
how to get some information about where it actually goes wrong?

Thanks in advance

Marlies

-- 

------------------

Dr. Marlies Hankel
Research Fellow, Theory and Computation Group
Australian Institute for Bioengineering and Nanotechnology (Bldg 75)
eResearch Analyst, Research Computing Centre and Queensland Cyber Infrastructure Foundation
The University of Queensland
Qld 4072, Brisbane, Australia
Tel: +61 7 334 63996 | Fax: +61 7 334 63992 | mobile:0404262445
Email: m.hankel at uq.edu.au | www.theory-computation.uq.edu.au


Notice: If you receive this e-mail by mistake, please notify me,
and do not make any use of its contents. I do not waive any
privilege, confidentiality or copyright associated with it. Unless
stated otherwise, this e-mail represents only the views of the
Sender and not the views of The University of Queensland.

