[gridengine users] Processes not exiting

Reuti reuti at staff.uni-marburg.de
Wed Nov 14 11:53:02 UTC 2018


> On 14.11.2018 at 01:06, admin at genome.arizona.edu wrote:
> We have a cluster running gridengine 6.5u2 and are noticing strange behavior when running MPI jobs.  Our application finishes, yet its processes continue to run and use up the CPU.  We configured a parallel environment for MPI as follows:
> pe_name            mpi
> slots              500
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_args     NONE
> allocation_rule    $round_robin
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary FALSE
> Then we ran our application "Maker" like this:
> qsub -cwd -N <NAME> -b y -V -pe mpi <CPUs> /opt/mpich-install/bin/mpiexec  maker <maker options>

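Which version of MPICH are you using? Maybe it's not tightly integrated with SGE. In that case the remote processes are started outside of SGE's control, and SGE cannot clean them up when the job ends.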
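
On the SGE side, tight integration needs "control_slaves TRUE" in the PE, which your listing already has. As a rough sketch, the filter below only confirms that one setting; the function name is made up for illustration, while `qconf -sp` is the standard command to print a PE. Whether MPICH's process manager really starts its remote processes through SGE is a separate question that depends on the MPICH version and how it was built.

```shell
# Hypothetical helper (not part of SGE): reads a PE definition, as printed by
# `qconf -sp <pe_name>`, on stdin and succeeds only if control_slaves is TRUE.
pe_allows_tight_integration() {
    awk '$1 == "control_slaves" { ok = ($2 == "TRUE") } END { exit !ok }'
}

# Usage on a submit or admin host (assumes the PE is called "mpi" as above):
#   qconf -sp mpi | pe_allows_tight_integration && echo "PE permits tight integration"
```

A passing check does not prove the integration is tight; it only rules out one common misconfiguration.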

-- Reuti

> It seems to run fine, and qstat shows it running.  Once it has completed, qstat is empty again and we have the desired output.  However, the "maker" processes continue to run on the compute nodes until I log in to each node and "kill -9" them.  We did not have this problem when running mpiexec directly with Maker, or when running Maker in stand-alone mode (without MPI), so I guess the problem is in our qsub command or parallel environment.  Any ideas?
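
To find the leftovers without hunting by hand, something like the following could be run on each node. It is only a heuristic sketch: it assumes the stray processes have been reparented to PID 1 after mpiexec exited, which may not hold on every system, and the function name is made up for illustration.

```shell
# Hypothetical helper: reads `ps -eo pid,ppid,user,comm` output on stdin and
# prints the PIDs of processes with the given command name whose parent is PID 1.
# Assumption: leftover processes were reparented to PID 1.
orphaned_pids() {
    awk -v name="$1" 'NR > 1 && $2 == 1 && $4 == name { print $1 }'
}

# Usage on a compute node:
#   ps -eo pid,ppid,user,comm | orphaned_pids maker
```

Compare the result with qstat before killing anything, since a job that is still running may own some of the listed processes.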
> Thanks,
> -- 
> Chandler / Systems Administrator
> Arizona Genomics Institute
> www.genome.arizona.edu
