[gridengine users] Processes not exiting

admin at genome.arizona.edu
Wed Nov 14 00:06:51 UTC 2018


We have a cluster with gridengine 6.5u2 and are noticing strange behavior
when running MPI jobs. Our application finishes, yet its processes
continue to run and consume CPU. We configured a parallel
environment for MPI as follows:

pe_name            mpi
slots              500
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE

We then run our application, Maker, like this:
qsub -cwd -N <NAME> -b y -V -pe mpi <CPUs> \
    /opt/mpich-install/bin/mpiexec maker <maker options>
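
For comparison, the same submission can be written as a job script. This is a
minimal sketch, assuming MPICH's Hydra mpiexec (its -f hostfile and -n options)
and a POSIX sh. The job name "maker_job" and the slot count 16 are hypothetical
stand-ins for <NAME> and <CPUs>. The script reads the slots Grid Engine granted
from $PE_HOSTFILE and $NSLOTS and hands them to mpiexec explicitly. It only makes
the allocation explicit; it does not change how mpiexec starts the remote processes.

```shell
#!/bin/sh
# Hypothetical job script roughly equivalent to the qsub command above.
#$ -cwd
#$ -V
#$ -N maker_job
#$ -pe mpi 16

# Grid Engine writes one line per host to $PE_HOSTFILE:
#   hostname slots queue processor_range
# MPICH Hydra's hostfile format is "hostname:count", one host per line.
pe_hostfile_to_hydra() {
    awk '{ print $1 ":" $2 }' "$1"
}

# Only launch when actually running inside a Grid Engine parallel job.
if [ -n "${PE_HOSTFILE:-}" ] && [ -n "${NSLOTS:-}" ]; then
    hosts="${TMPDIR:-/tmp}/hydra_hosts.$$"
    pe_hostfile_to_hydra "$PE_HOSTFILE" > "$hosts"
    # Append the real <maker options> after "maker".
    /opt/mpich-install/bin/mpiexec -f "$hosts" -n "$NSLOTS" maker
fi
```

It would be submitted with "qsub maker_job.sh" (hypothetical file name); the
#$ lines take the place of the command-line options.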

It seems to run fine, and qstat shows it running. Once it has
completed, qstat is empty again and we have the desired output.
However, the "maker" processes keep running on the compute nodes
until I log in to each node and "kill -9" them. We did not have
this problem when running mpiexec directly with Maker, or when running Maker
in stand-alone mode (without MPI), so I suspect a problem with our
qsub command or parallel environment. Any ideas?

Thanks,
-- 
Chandler / Systems Administrator
Arizona Genomics Institute
www.genome.arizona.edu

