[gridengine users] Problem running MPI jobs

Reuti reuti at staff.uni-marburg.de
Sun Nov 20 23:57:26 UTC 2011


On 20.11.2011 at 12:37, mahbube rustaee wrote:

> 1) I run Intel MPI jobs. When $NSLOTS <= 50, qsub works fine, but for more than 50 slots either the output is empty
> or the job's output is:
> mpirun has exited due to process rank 4 with PID 23866 on
> node amd-7-5.local exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> [amd-7-5.local:23861] 199 more processes have sent help message help-mtl-psm.txt / unable to open endpoint
> [amd-7-5.local:23861] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> [amd-7-5.local:23861] 99 more processes have sent help message help-mpi-runtime / mpi_init:startup:internal-failure
> What configuration is missing?

The errors are from Open MPI, but above you state that you use Intel MPI. Hence the $PATH on the exechost might point to the wrong `mpiexec`, i.e. Open MPI's instead of Intel MPI's.

You can investigate this by running `which mpiexec` in your jobscript.
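A minimal sketch of such a diagnostic jobscript, assuming POSIX `sh` and a parallel environment named `mpi` with 64 slots (both hypothetical; substitute your site's PE name and slot count). It uses `command -v`, the POSIX counterpart of `which`, to print where `mpiexec` and `mpirun` resolve on the exechost inside the batch job:

```shell
#!/bin/sh
# Diagnostic SGE jobscript sketch. The PE name "mpi" and the slot count
# are hypothetical placeholders; use your cluster's parallel environment.
#$ -pe mpi 64
#$ -cwd
#$ -S /bin/sh

# Print where each MPI launcher resolves in $PATH on this host.
report_mpi_launchers() {
    for launcher in mpiexec mpirun; do
        path=$(command -v "$launcher") || path="not found in PATH"
        printf '%s -> %s\n' "$launcher" "$path"
    done
}

echo "host: $(uname -n)"
echo "slots: ${NSLOTS:-unset}"
echo "PATH: $PATH"
report_mpi_launchers

# Once the launcher is confirmed to be Intel MPI's, start the real job here, e.g.
# mpiexec -n "$NSLOTS" ./my_app   (./my_app is a placeholder binary)
```

Running `which mpiexec` in an interactive shell on the same exechost and comparing it with the path printed by the job shows whether the batch environment picks up a different MPI installation than the command line does.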

-- Reuti

> 2) When I run a job directly from the command line, the output is correct regardless of the number of slots and the program!
> I think some configuration of the OS or SGE is missing!
> Thx
> _______________________________________________
> users mailing list
> users at gridengine.org
> https://gridengine.org/mailman/listinfo/users
