[gridengine users] Processes not exiting

Hay, William w.hay at ucl.ac.uk
Wed Nov 14 11:21:05 UTC 2018


On Tue, Nov 13, 2018 at 05:06:51PM -0700, admin at genome.arizona.edu wrote:
> We have a cluster with gridengine 6.5u2 and are noticing strange behavior when
> running MPI jobs.  Our application finishes, yet the processes continue
> to run and use up the CPU.  We did configure a parallel environment for MPI
> as follows:
> 
> pe_name            mpi
> slots              500
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_args     NONE
> allocation_rule    $round_robin
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary FALSE
> 
> Then we ran our application "Maker" like this:
> qsub -cwd -N <NAME> -b y -V -pe mpi <CPUs> /opt/mpich-install/bin/mpiexec maker <maker options>
> 
> It seems to run fine, and qstat shows it running.  Once it has completed,
> qstat is empty again and we have the desired output. However, the "maker"
> processes continue to run on the compute nodes until I log in to each
> node and "kill -9" them.  We did not have this problem when running
> mpiexec directly with Maker, or when running Maker in stand-alone mode
> (without MPI), so I guess it is a problem with our qsub command or parallel
> environment?  Any ideas?

Do you have ENABLE_ADDGRP_KILL set?  It can help kill processes left behind when a job exits.
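For reference, a minimal sketch of what enabling it might look like. It assumes you edit the global cluster configuration with "qconf -mconf" (or "qconf -mconf <hostname>" for a host-specific override), and that execd_params is currently empty. If it already holds other entries, add this one to the existing list rather than replacing it. According to sge_conf(5), the option makes Grid Engine use the additional group ID assigned to each job (from gid_range) to find the processes to terminate when a job is deleted or finishes. It defaults to off.

```text
execd_params                 ENABLE_ADDGRP_KILL=TRUE
```

Running "qconf -sconf" afterwards should show the new execd_params value.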

William