[gridengine users] Processes not exiting

Feng Zhang prod.feng at gmail.com
Wed Nov 14 01:20:46 UTC 2018


Perhaps Maker itself does not handle signals properly?

You could try running the job from a script rather than running the
binary directly, to see whether that works. You can also add some
signal-handling commands to the script to check which signals arrive.
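
For example, something like the untested sketch below. The script name
(run_job.sh), the function names, and the choice of signals are only
assumptions for illustration; nothing here comes from Grid Engine or Maker.
It logs any signal it receives, passes TERM on to the child, and reports the
child's exit status. Note that SIGKILL cannot be trapped, so it will never
show up in this log.

```shell
#!/bin/sh
# run_job.sh: hypothetical wrapper, not part of Grid Engine or Maker.
# Assumption: forwarding TERM to the child is enough to make it exit.

child_pid=""
caught=""

# Log which signal arrived (and on which host), then pass TERM to the child.
forward_signal() {
    caught=$1
    echo "wrapper on $(uname -n): caught SIG$1" >&2
    if [ -n "$child_pid" ]; then
        kill -TERM "$child_pid" 2>/dev/null
    fi
}

# Start the command in the background so the traps can fire while we wait.
run_job() {
    caught=""
    for sig in TERM INT USR1 USR2; do
        trap "forward_signal $sig" "$sig"
    done
    "$@" &
    child_pid=$!
    wait "$child_pid"
    status=$?
    # A trapped signal interrupts wait; wait again to collect the real status.
    if [ -n "$caught" ]; then
        wait "$child_pid"
        status=$?
    fi
    echo "wrapper on $(uname -n): child exited with status $status" >&2
    return "$status"
}

# Run whatever command line was passed to the script.
if [ "$#" -gt 0 ]; then
    run_job "$@"
    exit $?
fi
```

You would then submit the script instead of the binary, probably without
"-b y" since the job is now a script, e.g.
qsub -cwd -N <NAME> -V -pe mpi <CPUs> run_job.sh /opt/mpich-install/bin/mpiexec maker <maker options>
and look at the job's stderr file to see what the wrapper logged.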

Best,

Feng

On Tue, Nov 13, 2018 at 7:07 PM <admin at genome.arizona.edu> wrote:
>
> We have a cluster with gridengine 6.5u2 and are noticing strange behavior
> when running MPI jobs.  Our application will finish, yet the processes
> continue to run and use up the CPU.  We did configure a parallel
> environment for MPI as follows:
>
> pe_name            mpi
> slots              500
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_args     NONE
> allocation_rule    $round_robin
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary FALSE
>
> Then we have run our application "Maker" like this,
> qsub -cwd -N <NAME> -b y -V -pe mpi <CPUs>
> /opt/mpich-install/bin/mpiexec  maker <maker options>
>
> It seems to run fine and qstat will show it running.  Once it has
> completed, qstat is empty again and we have the desired output.
> However, the "maker" processes have continued to run on the compute nodes
> until I log in to each node and "kill -9" the processes.  We did not have
> this problem when running mpiexec directly with Maker, or running Maker
> in stand-alone mode (without MPI), so I guess it is a problem with our
> qsub command or parallel environment?  Any ideas?
>
> Thanks,
> --
> Chandler / Systems Administrator
> Arizona Genomics Institute
> www.genome.arizona.edu
> _______________________________________________
> users mailing list
> users at gridengine.org
> https://gridengine.org/mailman/listinfo/users

