[gridengine users] SGE + MPICH2 + /etc/ssh/sshd_config

Gowtham g at mtu.edu
Thu Sep 1 13:28:33 UTC 2011



This belongs to the "keeping our users from directly
SSHing into compute nodes" category.


On a test cluster (pauli), I have the following set up:

    0. 1 Front end and 2 compute nodes

       Each compute node has 4 cpu cores


    1. Rocks 5.4 (service pack 2) - all rolls except
       bio, condor and xen; runs SGE queuing system
       6.2u5

[root@pauli ~]# rocks list roll
NAME          VERSION ARCH   ENABLED
area51:       5.4     x86_64 yes
base:         5.4     x86_64 yes
ganglia:      5.4     x86_64 yes
hpc:          5.4     x86_64 yes
kernel:       5.4     x86_64 yes
os:           5.4     x86_64 yes
sge:          5.4     x86_64 yes
web-server:   5.4     x86_64 yes
service-pack: 5.4.2   x86_64 yes


    2. MPICH2 (1.4), compiled with GCC 4.1.2, is in

       /share/apps/mpich2/1.4/gcc/4.1.2

      Configure & make/make install commands were as
      follows

export CC="/usr/bin/gcc"
export CXX="/usr/bin/g++"
export FC="/usr/bin/gfortran"
export F77="/usr/bin/gfortran"

./configure --prefix=/share/apps/mpich2/1.4/gcc/4.1.2
make
make install

       I compiled a simple 'hello, world' C program

       mpicc -g -Wall hello_world.c -o hello_world.x

       and 'hello_world.x' runs fine.


    3. There are two groups on this cluster

       pauli-users  : all users belong to this group
       pauli-admins : only administrators belong to this one,
                      in addition to being part of pauli-users

       I created 3 user accounts (all belonging to
       pauli-users) and one more account that belongs to
       pauli-users & pauli-admins

       These groups & users were created before any compute
       node was added to the cluster


    4. The extend-compute.xml had the following lines in
       <post> section

<file name="/etc/ssh/sshd_config" mode="append">

# Block non-root, non-pauli-admins users from directly
# accessing this compute node
AllowGroups root pauli-admins
</file>

        xmllint -noout extend-compute.xml was run and
        no errors were found.

        rocks distribution was rebuilt and the compute
        nodes were added via the usual insert-ethers

        I ran 'rocks sync users'

        When I check the '/etc/ssh/sshd_config' file
        in compute nodes, I do see the line

AllowGroups root pauli-admins

        The '/etc/group' file on the compute nodes has lines
        corresponding to 'pauli-users' and 'pauli-admins'

pauli-users:x:500:
pauli-admins:x:501:john


    5. Attempts by 'john' to SSH into the compute nodes get
       through, while those by 'greg' (just a pauli-users
       member) are blocked
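
A minimal sketch for checking the group side of this by hand, assuming a group file in the standard 'name:password:gid:members' format. The function name 'listed_in_group' and the temporary sample file are hypothetical and only for illustration. It looks at the supplementary member list only, so a user whose primary group (from /etc/passwd) matches would not be detected.

```shell
#!/bin/bash
# listed_in_group USER GROUP FILE
# Exits 0 if USER appears in the comma-separated member list of GROUP
# in FILE (same layout as /etc/group), 1 otherwise.
# Hypothetical helper; primary groups from /etc/passwd are NOT checked.
listed_in_group() {
    local user="$1" group="$2" file="$3"
    awk -F: -v u="$user" -v g="$group" '
        $1 == g {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == u) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$file"
}

# Sample data copied from the two /etc/group lines shown in step 4.
sample="$(mktemp)"
printf '%s\n' 'pauli-users:x:500:' 'pauli-admins:x:501:john' > "$sample"

for u in john greg; do
    if listed_in_group "$u" pauli-admins "$sample"; then
        echo "$u: listed in pauli-admins"
    else
        echo "$u: not listed in pauli-admins"
    fi
done
```

On a real compute node the third argument would be /etc/group itself.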


    6. Now comes SGE

       I run 'hello_world.x' on 8 processors (spanning
       both compute nodes) via an SGE script, sge_test.sh:


#! /bin/bash
# 
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -pe mpich 8
#
# Run 'Hello, World!'
/share/apps/mpich2/1.4/gcc/4.1.2/bin/mpirun -n $NSLOTS \
   -f $TMP/machines /share/apps/bin/hello_world.x


      It produces the desired output when I run this as 'john'
      (a pauli-admins user)

      When I run it as a regular pauli-users user such as
      'greg', it hangs in 'r' state. 'sge_test.sh.po12' contains


-catch_rsh /opt/gridengine/default/spool/compute-0-0/active_jobs/12.1/pe_hostfile
compute-0-0
compute-0-0
compute-0-0
compute-0-0
compute-0-1
compute-0-1
compute-0-1
compute-0-1


      'sge_test.sh.o12' contains


Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-with-mic,password).
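
To reproduce the failing step outside SGE, the host list above can be reduced to one line per host with its slot count. This is a minimal sketch, assuming a machines file with one hostname per slot as in 'sge_test.sh.po12'. The function name 'slots_per_host' and the temporary sample file are hypothetical.

```shell
#!/bin/bash
# slots_per_host FILE
# Prints "hostname count" for each distinct hostname in FILE,
# sorted by hostname. Blank lines are ignored.
# Hypothetical helper for illustration only.
slots_per_host() {
    awk 'NF { n[$1]++ } END { for (h in n) print h, n[h] }' "$1" | sort
}

# Sample data copied from the host list in sge_test.sh.po12.
machines="$(mktemp)"
printf '%s\n' compute-0-0 compute-0-0 compute-0-0 compute-0-0 \
              compute-0-1 compute-0-1 compute-0-1 compute-0-1 > "$machines"

slots_per_host "$machines"
```

Each host printed could then be tried by hand as the affected user, for example with 'ssh -o BatchMode=yes <host> true', which fails right away instead of prompting for a password when the login is refused.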


Can someone please tell me whether I am doing something wrong
or missing something?

Thanks,
g

--
Gowtham
Advanced IT Research Support
Michigan Technological University

(906) 487/3593


