[gridengine users] Tight integration problem with mvapich2 2.0

William Hay w.hay at ucl.ac.uk
Tue Jan 6 11:38:23 UTC 2015

On Mon, 5 Jan 2015 14:20:08 +0000
Götz Waschk <goetz.waschk at gmail.com> wrote:

> Dear Gridengine experts,
> has any of you noticed a change in behaviour in mvapich2 2.0? I have upgraded from RHEL 6.5 to 6.6, which updated mvapich2 from 1.8 to 2.0rc1. With 1.8, the gridengine integration was working fine, but now not all processes are killed on job deletion. All MPI processes on the slave nodes (that is, the nodes not running the job script and the mpiexec command) continue to run after a qdel.
> Does anybody have an idea how to fix this, or a workaround?
> Regards, Götz

I don't know of a fix, but the first thing I would check is whether your job is actually tightly integrated (that is, whether the slave processes are started via grid engine).  To check this, log into a node running slave processes and see whether they are descended from an sge_shepherd.  If they are not, grid engine did not start them and so cannot kill them on qdel.
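
A rough sketch of that ancestry check in Python, in case it is easier than reading a process tree by eye. The function names are made up for illustration, and it assumes the shepherd's command name as reported by ps starts with "sge_shepherd"; adjust the name if your installation reports it differently.

```python
import subprocess


def read_process_table():
    """Return {pid: (ppid, command_name)} for all processes, as reported by ps."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    table = {}
    for line in out.splitlines():
        fields = line.split(None, 2)
        if len(fields) == 3:
            table[int(fields[0])] = (int(fields[1]), fields[2].strip())
    return table


def has_shepherd_ancestor(pid, table, name="sge_shepherd"):
    """Walk up the parent chain of pid; True if any ancestor's command
    name starts with `name` (assumed shepherd name, see above)."""
    seen = set()
    while pid in table and pid not in seen:
        seen.add(pid)  # guard against a malformed, cyclic table
        ppid = table[pid][0]
        parent = table.get(ppid)
        if parent is None:
            return False  # reached the top of the tree without a match
        if parent[1].startswith(name):
            return True
        pid = ppid
    return False
```

On a slave node you would call has_shepherd_ancestor(pid, read_process_table()) with the pid of one of the MPI processes. A result of False suggests the process was started outside grid engine (for example via ssh), so the job is not tightly integrated.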

William Hay <w.hay at ucl.ac.uk>