[gridengine users] Different ulimit settings given by different compute nodes with exactly the same /etc/security/limits.conf

Reuti reuti at staff.uni-marburg.de
Tue Jul 16 10:11:37 UTC 2019

> On 16.07.2019 at 02:33, Derrick Lin <klin938 at gmail.com> wrote:
> Thanks guys,
> >> Correct. The limits in place when sgeexecd is started are used (i.e. those of the root user).
> I tried simply restarting sgeexecd, but it did not change anything.
> In my /etc/security/limits.conf I have:
> * soft nofile 18000
> * hard nofile 20000
> That should apply to every account? The SGE daemons are run under user "sge".

They appear to run under sge, but they actually run under the root account (and should be started by root):

$ ps -e f -o user,ruser,command
sgeadmin root     /usr/sge/bin/lx24-em64t/sge_qmaster
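
To see which limits a running daemon actually has, one option is to read /proc/<pid>/limits for the sge_execd process on the exec host. A minimal sketch, assuming a Linux /proc filesystem and that the process is named sge_execd; the helper name nofile_limits is made up for illustration:

```shell
# Print the soft and hard "Max open files" values from a file in the
# /proc/<pid>/limits format (fields: Max open files <soft> <hard> files).
nofile_limits() {
    awk '/^Max open files/ { print $4, $5 }' "$1"
}

# Hypothetical usage on an exec host, run as root.
# pgrep -o picks the oldest matching process, -x matches the exact name:
#   pid=$(pgrep -o -x sge_execd) && nofile_limits "/proc/$pid/limits"
```

If the two numbers printed for the daemon differ from what limits.conf specifies, the daemon was probably started from an environment that had other limits in place.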

> >> Several ulimits can be set in the queue configuration, and so can differ for each queue or exechost.
> We don't have any ulimit settings in the queues or other parts of SGE; limits.conf is the only place where they are configured.
> It is so weird that most of the Compute Nodes pick up the settings correctly, only a few fail to pick up.

Do you log in to the node by SSH? Then you have to restart the SSH daemon too, as the login process inherits the values the SSH daemon got.

A change to the "nofile" setting should also be visible in the shell when you log in.
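
One way to check this is to compare the limits of the current shell against the values expected from limits.conf. A rough sketch; the function name check_nofile is only an illustration, and 18000/20000 are the values quoted above:

```shell
# Compare the current shell's open-files limits with the expected ones.
# Prints OK or MISMATCH with the node name; returns 1 on a mismatch.
check_nofile() {
    want_soft=$1
    want_hard=$2
    have_soft=$(ulimit -Sn)   # soft limit of this shell
    have_hard=$(ulimit -Hn)   # hard limit of this shell
    if [ "$have_soft" = "$want_soft" ] && [ "$have_hard" = "$want_hard" ]; then
        echo "OK $(uname -n): nofile soft=$have_soft hard=$have_hard"
    else
        echo "MISMATCH $(uname -n): nofile soft=$have_soft hard=$have_hard (expected $want_soft/$want_hard)"
        return 1
    fi
}

# Hypothetical usage, with the values from limits.conf:
#   check_nofile 18000 20000
```

Running the same check once in an SSH login shell and once inside a job script on the same node should show whether only the SGE side is affected.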

-- Reuti

> Currently, my only workaround is to rebuild the Compute Node (reinstall OS etc) so that it corrects this issue.
> >> Can you check the limits that are set in the sge_execd and sge_shepherd
> processes (/proc/<pid>/limits)?
> I tried to look it up, but I could not find the <pid> directory that corresponds to sgeexecd.
> Cheers,
> Derrick 
> On Thu, Jul 4, 2019 at 12:09 AM Skylar Thompson <skylar2 at uw.edu> wrote:
> Can you check the limits that are set in the sge_execd and sge_shepherd
> processes (/proc/<pid>/limits)? It's possible that the user who ran the
> execd init script had limits applied, which would carry over to the execd
> process.
> On Wed, Jul 03, 2019 at 12:36:00PM +1000, Derrick Lin wrote:
> > Hi guys,
> > 
> > We have custom settings for user open files in /etc/security/limits.conf on
> > all Compute Nodes. When checking whether the configuration is effective with
> > "ulimit -a" by SSH to each node, it reflects the correct settings.
> > 
> > But when we ran the same command through SGE (both qsub and qrsh), we found
> > that some Compute Nodes do not reflect the correct settings, while the rest
> > are fine.
> > 
> > I am wondering if this is SGE related? Any idea is welcome.
> > 
> > Cheers,
> > Derrick
> -- 
> -- Skylar Thompson (skylar2 at u.washington.edu)
> -- Genome Sciences Department, System Administrator
> -- Foege Building S046, (206)-685-7354
> -- University of Washington School of Medicine
