[gridengine users] Ulimit for max open files

Luis Huang lhuang at NYGENOME.ORG
Mon Jun 26 17:24:57 UTC 2017


To increase the maximum number of open files, we set execd_params via qconf -mconf and also raised the limit at the OS level:
execd_params                 H_DESCRIPTORS=262144,H_LOCKS=262144,H_MAXPROC=262144

On our execution nodes we can see that SGE sets a soft limit of 65535, even though we told it to use 262144.
After qlogin:
[root@p2node01 ~]# cat /proc/104694/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            unlimited            unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             262144               262144               processes
Max open files            65535                262144               files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            262144               262144               locks
Max pending signals       15023                15023                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
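
For anyone who wants to check the same values from inside a job rather than by eye, here is a minimal Python sketch that pulls the soft and hard values for one row out of text in the /proc/<pid>/limits layout shown above. The function name parse_limit and the SAMPLE string are made up for illustration; the parsing assumes the row name is followed by whitespace-separated soft and hard columns, as in the listing.

```python
# Sketch: extract soft/hard values for one row of /proc/<pid>/limits text.
# parse_limit and SAMPLE are hypothetical names used only for this example.

SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max processes             262144               262144               processes
Max open files            65535                262144               files
Max nice priority         0                    0
"""


def _to_int(value):
    """Map 'unlimited' to None, anything else to an int."""
    return None if value == "unlimited" else int(value)


def parse_limit(limits_text, name="Max open files"):
    """Return (soft, hard) for the row whose label is `name`."""
    for line in limits_text.splitlines():
        if line.startswith(name):
            # Everything after the label is: soft, hard, and an optional unit.
            fields = line[len(name):].split()
            return _to_int(fields[0]), _to_int(fields[1])
    raise KeyError(name)


if __name__ == "__main__":
    print(parse_limit(SAMPLE))
```

Inside a running job on Linux, the same function could be fed the contents of /proc/self/limits instead of SAMPLE.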

When running a PE smp job that requests 2 slots, the soft limit is set to 65535 * 2 = 131070. The slot count seems to act as a multiplier on the soft limit. If we request more than 4 slots, the product exceeds the hard limit and max open files is reset to the default of 1024. Our workaround is to set H_DESCRIPTORS=9362, because some of our exec nodes have 28 cores: 28 x 9362 = 262136, which stays just under the 262144 limit. I was wondering if there is a better way of doing this?
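
The arithmetic above can be sketched in a few lines of Python. This only models the behaviour observed in this post (per-slot value multiplied by the slot count, falling back to 1024 once the product exceeds the hard limit); it is not taken from the SGE source or documentation, and the function names are hypothetical.

```python
# Sketch of the observed scaling, under the assumptions stated above.
# effective_soft_limit and max_per_slot are hypothetical helper names.

DEFAULT_NOFILE = 1024  # fallback value reported in the post


def effective_soft_limit(per_slot, slots, hard_limit, default=DEFAULT_NOFILE):
    """Soft limit a job would end up with if the per-slot value is
    multiplied by the slot count and reset when it exceeds the hard limit."""
    scaled = per_slot * slots
    return scaled if scaled <= hard_limit else default


def max_per_slot(hard_limit, max_slots):
    """Largest per-slot value whose product with max_slots stays within
    the hard limit."""
    return hard_limit // max_slots


if __name__ == "__main__":
    hard = 262144
    for slots in (1, 2, 4, 5):
        print(slots, effective_soft_limit(65535, slots, hard))
    per_slot = max_per_slot(hard, 28)
    print(per_slot, per_slot * 28)
```

With a hard limit of 262144 and 28 slots, max_per_slot gives 9362, and 28 x 9362 = 262136.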

You might ask why we need 200k+ open files. Someone is using software that leaks file handles and does not fclose them properly. Their workaround is a dirty hack where the job sshes to localhost and bypasses the ulimit set by SGE.

Many thanks,
