[gridengine users] SGE + MPICH2 + /etc/ssh/sshd_config

Reuti reuti at Staff.Uni-Marburg.DE
Thu Sep 1 17:17:10 UTC 2011


Well, with all this information I don't see any reason why it switches to ssh.
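
To make that check concrete, here is a minimal sketch (not part of SGE; the function name non_builtin_methods is made up for this mail). It reads `qconf -sconf` output on stdin and prints every one of the six startup methods that is not set to builtin. For the configuration quoted below I would expect it to print nothing.

```shell
#!/bin/bash
# Hypothetical helper: list startup methods that are NOT "builtin".
# Input: the text printed by `qconf -sconf` (global or per-host), on stdin.
non_builtin_methods() {
    awk '
        # Only the six startup-method keys are of interest here.
        $1 ~ /^(qlogin|rlogin|rsh)_(command|daemon)$/ && $2 != "builtin" {
            key = $1
            $1 = ""              # drop the key, keep the full value
            sub(/^ +/, "")       # trim the gap left behind
            print key ": " $0
        }
    '
}
```

Usage would be `qconf -sconf | non_builtin_methods` for the global configuration, and `qconf -sconf <host> | non_builtin_methods` for each host listed by `qconf -sconfl`.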

I assume you also recompiled the application with the latest MPICH2.

You can try to use the `strings` command on the compiled binary (in case the MPICH2 libraries are statically linked in):

strings myapp | grep JOB_ID

and you can scan for SGE_ROOT and the like. If they are present in the binary (or the shared libraries), it should work.
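
If you want to check the binary and its shared libraries in one go, something like the following might do. It is only a sketch: the function names are made up, it uses `grep -a` in place of `strings | grep` (same idea, the file is searched as text), and it only looks for the two names mentioned above. A hit shows that the name occurs in the file, nothing more.

```shell
#!/bin/bash
# Names to look for; JOB_ID and SGE_ROOT are the two mentioned in this mail.
SGE_NAMES="JOB_ID SGE_ROOT"

# Report, for a single file, whether each name occurs anywhere in it.
# grep -a treats binary data as text; -F matches the name literally.
scan_file() {
    local file=$1 name
    for name in $SGE_NAMES; do
        if grep -aqF -- "$name" "$file"; then
            echo "$file: $name found"
        else
            echo "$file: $name missing"
        fi
    done
}

# Scan the binary itself, then every shared library that ldd resolves
# to an absolute path. For a static binary ldd lists nothing, so only
# the binary is scanned.
scan_binary_and_libs() {
    local bin=$1 lib
    scan_file "$bin"
    ldd "$bin" 2>/dev/null | awk '$2 == "=>" && $3 ~ /^\// { print $3 }' |
    while read -r lib; do
        scan_file "$lib"
    done
}
```

Called as `scan_binary_and_libs ./myapp`, it prints one line per file and name.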

-- Reuti


On 01.09.2011 at 19:10, Gowtham wrote:

> 
> On Thu, 1 Sep 2011, Reuti wrote:
> 
> | On 01.09.2011 at 18:42, Gowtham wrote:
> | 
> | > On Thu, 1 Sep 2011, Reuti wrote:
> | > 
> | > | So, all startup methods are set to builtin, this is perfect.
> | > | 
> | > | Can you please check inside the job script which mpiexec you are using, by running:
> | > | 
> | > | which mpiexec
> | > | 
> | > | MPICH2 should detect the presence of SGE automatically.
> | > | 
> | > | -- Reuti
> | > 
> | > 
> | > john: pauli-users, pauli-admins
> | > greg: pauli-users
> | > 
> | > 
> | > On front end, as 'john' and 'greg', I get
> | > 
> | > [john at pauli ~]$ which mpiexec
> | > /share/apps/mpich2/1.4/gcc/4.1.2/bin/mpiexec
> | 
> | It can be different in a job. As I said: can you please check inside a job script which mpiexec you are using, by running:
> | 
> | which mpiexec
> | 
> | -- Reuti
> 
> I used the following SGE script, named which_mpiexec :
> 
> #! /bin/bash
> # 
> #$ -cwd
> #$ -j y
> #$ -S /bin/bash
> #
> # Run 'which mpiexec'
> which mpiexec
> 
> 
> Submitted this to the queue as 'john' and 'greg' for their 
> respective home directories. 
> 
>  /home/john/test_runs/WhichMPIEXEC
>  /home/greg/test_runs/WhichMPIEXEC
> 
> In both cases, I get the following output:
> 
> [john at pauli WhichMPIEXEC]$ cat which_mpiexec.o45 
> /share/apps/mpich2/1.4/gcc/4.1.2/bin/mpiexec
> 
> [greg at pauli WhichMPIEXEC]$ cat which_mpiexec.o46
> /share/apps/mpich2/1.4/gcc/4.1.2/bin/mpiexec
> 
> 
> Does this need to be run as root as well?
> 
> 
> 
> | 
> | 
> | > [greg at pauli ~]$ which mpiexec
> | > /share/apps/mpich2/1.4/gcc/4.1.2/bin/mpiexec
> | > 
> | > 
> | > On compute node, as 'john', I get
> | > 
> | > [john at compute-0-0 ~]$ which mpiexec
> | > /share/apps/mpich2/1.4/gcc/4.1.2/bin/mpiexec
> | > 
> | > 
> | > On front end, as root, I get
> | > 
> | > [root at pauli ~]# which mpiexec
> | > /opt/openmpi/bin/mpiexec
> | > 
> | > 
> | > On compute node, as root I get
> | > 
> | > [root at compute-0-0 ~]# which mpiexec
> | > /opt/openmpi/bin/mpiexec
> | > 
> | > 
> | > | 
> | > | 
> | > | On 01.09.2011 at 17:12, Gowtham wrote:
> | > | 
> | > | > 
> | > | > On Thu, 1 Sep 2011, Reuti wrote:
> | > | > 
> | > | > | On 01.09.2011 at 16:13, Gowtham wrote:
> | > | > | 
> | > | > | > Please see in-line reply
> | > | > | > 
> | > | > | > | What is the setting of:
> | > | > | > | 
> | > | > | > | qrsh_command
> | > | > | > | qrsh_daemon
> | > | > | > | 
> | > | > | > | in `qconf -sconf`?
> | > | > | > 
> | > | > | > 
> | > | > | > On front end and compute nodes, I get nothing.
> | > | > | > 
> | > | > | > [root at pauli ~]# qconf -sconf | grep "qrsh"
> | > | > | > [root at pauli ~]#
> | > | > | > 
> | > | > | > [root at compute-0-0 ~]# qconf -sconf | grep qrsh
> | > | > | > [root at compute-0-0 ~]#
> | > | > | 
> | > | > | Can you please post the complete output.
> | > | > | 
> | > | > | Actually, the output on both machines must be the same, as the qmaster is contacted to deliver the value. But additional host-specific configurations might exist. You can list them with:
> | > | > | 
> | > | > | qconf -sconfl
> | > | > | 
> | > | > | and show one of them with, for example:
> | > | > | 
> | > | > | qconf -sconf compute-0-0
> | > | > 
> | > | > 
> | > | > [root at pauli ~]# qconf -sconfl
> | > | > pauli.local
> | > | > [root at pauli ~]#
> | > | > 
> | > | > Following one of your previous suggestions (about modifying 
> | > | > the headers of emails sent from SGE jobs), I did the 
> | > | > following from front end:
> | > | > 
> | > | >  qconf -mconf
> | > | > 
> | > | > change the mailer from /bin/mailer to 
> | > | > /share/apps/sbin/mailwrapper.sh - it contains (all in one 
> | > | > line) 
> | > | > 
> | > | > (cat; echo; echo; echo "Please do not reply to this email") | mail -s "pauli - $2" "$3" -- -f "DoNotReply at mtu.edu" -F "SGE Admin - pauli"
> | > | > 
> | > | > 
> | > | > Then did
> | > | > 
> | > | >  qconf -dconf compute-0-0
> | > | >  qconf -dconf compute-0-1
> | > | > 
> | > | > The emails I get from SGE do have the headers & body 
> | > | > modified appropriately by /share/apps/sbin/mailwrapper.sh
> | > | > 
> | > | > 
> | > | > 
> | > | > [root at pauli ~]# qconf -sconf
> | > | > #global:
> | > | > execd_spool_dir              /opt/gridengine/default/spool
> | > | > mailer                       /share/apps/sbin/mailwrapper.sh
> | > | > xterm                        /usr/bin/X11/xterm
> | > | > load_sensor                  none
> | > | > prolog                       none
> | > | > epilog                       none
> | > | > shell_start_mode             posix_compliant
> | > | > login_shells                 sh,ksh,csh,tcsh
> | > | > min_uid                      0
> | > | > min_gid                      0
> | > | > user_lists                   none
> | > | > xuser_lists                  none
> | > | > projects                     none
> | > | > xprojects                    none
> | > | > enforce_project              false
> | > | > enforce_user                 auto
> | > | > load_report_time             00:00:40
> | > | > max_unheard                  00:05:00
> | > | > reschedule_unknown           00:00:00
> | > | > loglevel                     log_warning
> | > | > administrator_mail           none
> | > | > set_token_cmd                none
> | > | > pag_cmd                      none
> | > | > token_extend_time            none
> | > | > shepherd_cmd                 none
> | > | > qmaster_params               none
> | > | > execd_params                 none
> | > | > reporting_params             accounting=true reporting=true \
> | > | >                              flush_time=00:00:15 joblog=true sharelog=00:00:00
> | > | > finished_jobs                100
> | > | > gid_range                    20000-20100
> | > | > qlogin_command               builtin
> | > | > qlogin_daemon                builtin
> | > | > rlogin_command               builtin
> | > | > rlogin_daemon                builtin
> | > | > rsh_command                  builtin
> | > | > rsh_daemon                   builtin
> | > | > max_aj_instances             2000
> | > | > max_aj_tasks                 75000
> | > | > max_u_jobs                   0
> | > | > max_jobs                     0
> | > | > max_advance_reservations     0
> | > | > auto_user_oticket            0
> | > | > auto_user_fshare             0
> | > | > auto_user_default_project    none
> | > | > auto_user_delete_time        86400
> | > | > delegated_file_staging       false
> | > | > reprioritize                 0
> | > | > jsv_url                      none
> | > | > jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
> | > | > 
> | > | > [root at pauli ~]#
> | > | > 
> | > | > 
> | > | > [root at compute-0-0 ~]# qconf -sconf
> | > | > #global:
> | > | > execd_spool_dir              /opt/gridengine/default/spool
> | > | > mailer                       /share/apps/sbin/mailwrapper.sh
> | > | > xterm                        /usr/bin/X11/xterm
> | > | > load_sensor                  none
> | > | > prolog                       none
> | > | > epilog                       none
> | > | > shell_start_mode             posix_compliant
> | > | > login_shells                 sh,ksh,csh,tcsh
> | > | > min_uid                      0
> | > | > min_gid                      0
> | > | > user_lists                   none
> | > | > xuser_lists                  none
> | > | > projects                     none
> | > | > xprojects                    none
> | > | > enforce_project              false
> | > | > enforce_user                 auto
> | > | > load_report_time             00:00:40
> | > | > max_unheard                  00:05:00
> | > | > reschedule_unknown           00:00:00
> | > | > loglevel                     log_warning
> | > | > administrator_mail           none
> | > | > set_token_cmd                none
> | > | > pag_cmd                      none
> | > | > token_extend_time            none
> | > | > shepherd_cmd                 none
> | > | > qmaster_params               none
> | > | > execd_params                 none
> | > | > reporting_params             accounting=true reporting=true \
> | > | >                              flush_time=00:00:15 joblog=true sharelog=00:00:00
> | > | > finished_jobs                100
> | > | > gid_range                    20000-20100
> | > | > qlogin_command               builtin
> | > | > qlogin_daemon                builtin
> | > | > rlogin_command               builtin
> | > | > rlogin_daemon                builtin
> | > | > rsh_command                  builtin
> | > | > rsh_daemon                   builtin
> | > | > max_aj_instances             2000
> | > | > max_aj_tasks                 75000
> | > | > max_u_jobs                   0
> | > | > max_jobs                     0
> | > | > max_advance_reservations     0
> | > | > auto_user_oticket            0
> | > | > auto_user_fshare             0
> | > | > auto_user_default_project    none
> | > | > auto_user_delete_time        86400
> | > | > delegated_file_staging       false
> | > | > reprioritize                 0
> | > | > jsv_url                      none
> | > | > jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
> | > | > 
> | > | > [root at compute-0-0 ~]#
> | > | > 
> | > | > [root at compute-0-1 ~]# qconf -sconf
> | > | > #global:
> | > | > execd_spool_dir              /opt/gridengine/default/spool
> | > | > mailer                       /share/apps/sbin/mailwrapper.sh
> | > | > xterm                        /usr/bin/X11/xterm
> | > | > load_sensor                  none
> | > | > prolog                       none
> | > | > epilog                       none
> | > | > shell_start_mode             posix_compliant
> | > | > login_shells                 sh,ksh,csh,tcsh
> | > | > min_uid                      0
> | > | > min_gid                      0
> | > | > user_lists                   none
> | > | > xuser_lists                  none
> | > | > projects                     none
> | > | > xprojects                    none
> | > | > enforce_project              false
> | > | > enforce_user                 auto
> | > | > load_report_time             00:00:40
> | > | > max_unheard                  00:05:00
> | > | > reschedule_unknown           00:00:00
> | > | > loglevel                     log_warning
> | > | > administrator_mail           none
> | > | > set_token_cmd                none
> | > | > pag_cmd                      none
> | > | > token_extend_time            none
> | > | > shepherd_cmd                 none
> | > | > qmaster_params               none
> | > | > execd_params                 none
> | > | > reporting_params             accounting=true reporting=true \
> | > | >                              flush_time=00:00:15 joblog=true sharelog=00:00:00
> | > | > finished_jobs                100
> | > | > gid_range                    20000-20100
> | > | > qlogin_command               builtin
> | > | > qlogin_daemon                builtin
> | > | > rlogin_command               builtin
> | > | > rlogin_daemon                builtin
> | > | > rsh_command                  builtin
> | > | > rsh_daemon                   builtin
> | > | > max_aj_instances             2000
> | > | > max_aj_tasks                 75000
> | > | > max_u_jobs                   0
> | > | > max_jobs                     0
> | > | > max_advance_reservations     0
> | > | > auto_user_oticket            0
> | > | > auto_user_fshare             0
> | > | > auto_user_default_project    none
> | > | > auto_user_delete_time        86400
> | > | > delegated_file_staging       false
> | > | > reprioritize                 0
> | > | > jsv_url                      none
> | > | > jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
> | > | > 
> | > | > [root at compute-0-1 ~]#
> | > | > 
> | > | > 
> | > | > 
> | > | > 
> | > | > 
> | > | > | 
> | > | > | -- Reuti
> | > | 
> | > | 
> | 
> | 



