[gridengine users] Notify when h_vmem is hit?
mgstauff at gmail.com
Mon Aug 25 20:27:11 UTC 2014
Version OGS/GE 2011.11p1 (Rocks 6.1)
I'm using h_vmem and s_vmem to limit memory usage for qsub and qlogin jobs.
A user's got some analyses running on nearly identical data sets that are
hitting memory limits and being killed, which is fine, but the messages are
inconsistent. Some instances report an exception from the app in question
saying that memory can't be allocated. This app (an in-house tool) sends
exceptions to stdout. Other instances just dump core and there's no message
about memory problems in either stdout or stderr logs.
h_vmem is 6000M and s_vmem is 5900M. It might be that the instances are
right up against the s_vmem limit when the failing memory allocation
occurs, and in some cases the requested amount triggers only the soft
limit, and in others it triggers both. So perhaps the instances where it
triggers the hard limit are the ones without the exception messages?
Unfortunately the stderr and stdout log filenames don't contain job ids.
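
For later runs, the job id could go into the log names and also be echoed
at the top of each log. A sketch, assuming the $JOB_ID and $JOB_NAME pseudo
variables that the qsub man page describes for -o and -e paths. The job
name and the logs directory are made up for illustration, and the directory
has to exist before the job starts:

```shell
#!/bin/sh
#$ -N vmem_test
#$ -l h_vmem=6000M,s_vmem=5900M
#$ -o $HOME/logs/$JOB_NAME.o$JOB_ID
#$ -e $HOME/logs/$JOB_NAME.e$JOB_ID

# JOB_ID is set by the scheduler inside a job; "unknown" is only a
# fallback for running this fragment outside the scheduler.
header="job_id=${JOB_ID:-unknown} host=$(uname -n)"

# Write the same header to both logs so they can be matched up later.
echo "$header"
echo "$header" >&2
```

With that in place, a core dump could at least be tied back to a specific
job and host.
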
However, in my first tests anyway, a qsub script that runs out of memory
shows an exception message, even when s_vmem is higher than h_vmem. So I'm
not sure about this line of reasoning.
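
One thing the next round of tests could check is whether a signal from the
soft limit is what kills the quiet instances. As far as I understand
queue_conf(5), exceeding s_vmem sends the job SIGXCPU, and the default
action for SIGXCPU on Linux is to terminate with a core dump, which would
look like the cases with no message. Below is a minimal sketch of a job
script fragment that logs the signal. The function and variable names are
made up, and I haven't verified whether the signal reaches the shell, the
application, or both:

```shell
#!/bin/sh
# Hypothetical fragment: record that SIGXCPU arrived.
# Assumption: the scheduler sends SIGXCPU when s_vmem is exceeded.

SOFT_LIMIT_HIT=0

on_soft_limit() {
    SOFT_LIMIT_HIT=1
    # JOB_ID is set by the scheduler; "unknown" is a fallback elsewhere.
    echo "job ${JOB_ID:-unknown}: got SIGXCPU (soft limit?) at $(date -u +%Y-%m-%dT%H:%M:%SZ)" >&2
}

trap on_soft_limit XCPU

# The real command would go here. The shell runs the trap only after a
# foreground command returns, so the message may appear after the
# application has already exited.
```

If the message shows up in stderr for the instances that dump core, that
would support the soft-limit explanation. If it never shows up, the signal
either isn't sent or isn't reaching the shell.
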
We're trying to figure it out and will run more tests, but I thought I'd
check here first to see if anyone's had this kind of experience. Thanks.