[gridengine users] SGE (univa 8.0.1) - anyone running SGE with Centrify active directory integration?
bergman at merctech.com
Tue Nov 22 21:32:01 UTC 2011
In the message dated: Tue, 22 Nov 2011 15:05:43 EST,
The pithy ruminations from Chris Dagdigian on
<[gridengine users] SGE (univa 8.0.1) - anyone running SGE with Centrify active
directory integration?> were:
=> Hi folks,
In the spirit of supporting the community of SGE users, rather than
splitting hairs over open-source vs. commercial, I'm going to offer some
possible answers. Of course, they are worth exactly as much as Chris is
paying me for support. :)
=> I'm hands-on with a shiny new cluster running Univa's 8.0.1 release and
=> am having some issues running jobs as a non-root user via an account
=> that lives in Active Directory.
We're running SGE 6.2u5 (Sun courtesy binaries) on a mixed CentOS 4.8 &
CentOS 5.7 cluster.
=> The cluster is the standard sort of RHEL 5.7 based system but we are
=> using Centrify and in particular the Centrify
=> NIS-gateway-to-ActiveDirectory to service the cluster nodes without
=> having to license centrify on all nodes in the cluster.
We're using Kerberos to integrate with AD for authentication, and NIS
for authorization. I.e., users must be in the local NIS tables in
order to log in, but passwords are stored solely within AD.
In our environment, AD authentication only takes place when users
log in to the head node. Once they are logged in, SGE connections between
nodes (qlogin, qsub) use passwordless SSH, but still depend on NIS for
authorization (i.e., user account information) on the compute nodes.
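Since the compute nodes only work if their passwd lookups actually reach NIS, one quick sanity check is the `passwd:` line in /etc/nsswitch.conf on each node. Here's a rough Python sketch of that check (not something we actually run; the helper name `nss_sources` is made up). It assumes the usual `passwd: files nis` style of configuration; a node using `compat` mode pulls NIS in through `+` entries in /etc/passwd and won't list `nis` there.

```python
import re
from typing import List


def nss_sources(nsswitch_text: str, database: str = "passwd") -> List[str]:
    """Return the lookup services listed for `database` in nsswitch.conf text."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        name, sep, rest = line.partition(":")
        if not sep or name.strip() != database:
            continue
        # Remove action blocks such as "[NOTFOUND=return]"; keep only service names.
        rest = re.sub(r"\[[^\]]*\]", " ", rest)
        return rest.split()
    return []  # database not configured at all


if __name__ == "__main__":
    try:
        with open("/etc/nsswitch.conf") as f:
            sources = nss_sources(f.read())
    except FileNotFoundError:
        sources = []
    print("passwd sources on this node:", sources or "(none found)")
    if "nis" not in sources:
        print("warning: NIS is not consulted for passwd lookups here")
```

Run it on a compute node where jobs fail and on one where they work; a difference in the output would be a strong hint.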
=> The user errors I see are familiar ones:
=> "can't get password entry for user "x". Either user does not exist or
=> NIS error!"
Yes, we've seen that too, when NIS is unavailable on some compute nodes.
=> The confusing thing is that I can SSH into compute nodes as the same
=> user and both password logins and passwordless SSH work perfectly. It's
Hmm... if NIS is broken, I'd expect SSH to fail. If NIS is
working but the NIS->AD integration is broken, I'd expect passwordless
SSH to succeed (and password logins to fail).
On the compute node, does
getent passwd $user
return the expected info?
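The same check from Python, as a minimal sketch: `pwd.getpwnam()` goes through the node's NSS configuration (files, NIS, ...) just like `getent passwd`. My assumption, based only on the wording of the error, is that SGE's "can't get password entry" message comes from a similar getpwnam()-style lookup failing on the execution host. The helper name `passwd_entry` is hypothetical.

```python
import pwd
import sys
from typing import Optional


def passwd_entry(user: str) -> Optional[pwd.struct_passwd]:
    """Look up `user` via NSS; return None when `getent passwd user` would print nothing."""
    try:
        return pwd.getpwnam(user)
    except KeyError:  # raised when no passwd entry is found
        return None


if __name__ == "__main__":
    for name in sys.argv[1:] or ["root"]:
        entry = passwd_entry(name)
        if entry is None:
            print(f"{name}: no passwd entry (NSS/NIS lookup failed)")
        else:
            print(f"{name}: uid={entry.pw_uid} home={entry.pw_dir} shell={entry.pw_shell}")
```

Running it as the SGE admin user on a failing node (e.g. from a qrsh session, if that works at all) would show whether the lookup itself is what breaks.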
Do you allow direct logins from outside the cluster to compute nodes?
The question is... do you really need NIS->AD integration to perform
authentication for SGE jobs? Is there a gateway (a cluster head node)
or some division between interactive and batch-only nodes? If
authorization & authentication take place on the interactive nodes, and SGE
can trust passwordless SSH between the interactive and compute nodes,
then you don't need the NIS->AD gateway on the compute nodes; an entry in
the NIS passwd table, giving the UID<->login mapping, home directory, etc.,
is enough.
=> only when running under SGE that the jobs fail.
This suggests to me that SGE is using a different authorization mechanism
than you use for login sessions.
Do you use SSH for SGE jobs (i.e., what are the values of 'qlogin_daemon'
and 'rsh_command' in the SGE config)?
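Those values show up in `qconf -sconf`. If they name ssh/sshd (for example `qlogin_daemon /usr/sbin/sshd -i`), the remote side goes through sshd and, as far as I know, its PAM stack; `builtin` means SGE's own mechanism. A rough Python sketch to pull them out (the helper name `parse_sconf` and the sample values are hypothetical, and the backslash-continuation handling is just a precaution for wrapped values):

```python
import shutil
import subprocess
from typing import Dict


def parse_sconf(text: str) -> Dict[str, str]:
    """Parse `qconf -sconf` style output ("name  value" per line) into a dict."""
    conf: Dict[str, str] = {}
    pending = ""
    for raw in text.splitlines():
        line = pending + raw.strip()
        if line.endswith("\\"):  # assumed: a long value wrapped with a trailing backslash
            pending = line[:-1] + " "
            continue
        pending = ""
        if not line or line.startswith("#"):  # skip blanks and the "#global:" header
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


REMOTE_KEYS = ("qlogin_command", "qlogin_daemon", "rsh_command", "rsh_daemon")

if __name__ == "__main__":
    if shutil.which("qconf") is None:
        print("qconf not on PATH; source the SGE settings.sh first")
    else:
        out = subprocess.run(["qconf", "-sconf"], capture_output=True,
                             text=True, check=True).stdout
        conf = parse_sconf(out)
        for key in REMOTE_KEYS:
            print(f"{key:16} {conf.get(key, '(not set)')}")
```

Host-specific overrides (`qconf -sconf <hostname>`) have the same format, so the parser should work on those too.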
=> If I had to guess I'd wonder first if SSHD was using Linux /etc/pam.d/
=> in a way that "works" while SGE is accessing PAM in some way that we
=> have not configured properly yet. That's only a guess though.
=> Does anyone have examples of SGE running via NIS authentication or via
=> Centrify? Any examples of PAM configuration that were needed to get NIS
=> users recognized under SGE?
I can give you our /etc/pam.d/system-auth config from the interactive
nodes, but we don't do anything special on the compute nodes...just
using NIS and passwordless SSH.