[gridengine users] cgroups Integration in OGS/GE 2011.11 update 1
m.c.dixon at leeds.ac.uk
Thu May 24 15:57:39 UTC 2012
On Thu, 24 May 2012, Rayson Ho wrote:
>> How are you mapping existing queue limits to cgroup limits?
>> memory.limit_in_bytes fits nicely onto h_rss (thanks for the suggestion
>> William), but the crucially important memory.memsw.limit_in_bytes (rss+swap)
>> doesn't seem to have an existing concept. Unless you're hijacking h_vmem?
> It's not really hijacking h_vmem - in the end, your memsw limit is the
> virtual size limit of the process/job.
My concern about this mainly centres around the fact that "virtual memory
size" already has a very specific meaning. It does not mean RAM usage +
swap and so isn't the same as memsw.
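The mapping under discussion can be made concrete with a small sketch (mine, not the actual OGS/GE code). Assuming SGE-style memory specifiers as in sge_types(1) - lowercase suffixes meaning powers of 1000, uppercase powers of 1024 - converting a queue limit such as h_rss=4G into the byte value one would write into memory.limit_in_bytes might look like:

```python
# Minimal sketch (not the actual OGS/GE implementation): convert an
# SGE-style memory specifier, per sge_types(1), into the byte count one
# would write into a cgroup file such as memory.limit_in_bytes.
SUFFIXES = {
    "k": 1000, "K": 1024,
    "m": 1000**2, "M": 1024**2,
    "g": 1000**3, "G": 1024**3,
}

def sge_mem_to_bytes(spec: str) -> int:
    """Parse e.g. '4G' -> 4294967296, '512M' -> 536870912, '1000' -> 1000."""
    if spec[-1] in SUFFIXES:
        return int(float(spec[:-1]) * SUFFIXES[spec[-1]])
    return int(spec)

# h_rss maps naturally onto memory.limit_in_bytes (RAM only), whereas
# memory.memsw.limit_in_bytes (RAM + swap) has no exact pre-existing
# counterpart - overloading h_vmem changes that attribute's meaning.
print(sge_mem_to_bytes("4G"))    # 4294967296
print(sge_mem_to_bytes("512M"))  # 536870912
```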
This has a practical impact:
1) It's not a "drop in" replacement. If upgrading gridengine on an
existing system, activating your cgroup code will cause an immediate
change in behaviour of jobs, without the user altering their submission
flags. People don't tend to like that sort of thing.
2) It's removing functionality. The old behaviour ensures that a job will
fail if it mallocs something that's too big: either it runs or it
doesn't, and it provides a decent error code you can handle rather than
simply dying. That could be important to some people. In contrast, the
new behaviour will permit the malloc, but then stop the job at some less
predictable point in the future, when it uses more than the permitted
amount of memory.
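The difference in failure mode can be demonstrated outside gridengine. Under an address-space rlimit (the mechanism behind the traditional h_vmem behaviour), an oversized allocation fails up front with a catchable error; under a cgroup memsw limit, the allocation succeeds and the job is stopped later, when the pages are actually touched. A rough illustration of the first half, using Python's resource module on Linux:

```python
# Sketch (not gridengine code): under an address-space rlimit - the
# mechanism behind the traditional h_vmem behaviour - an over-large
# allocation fails immediately with a catchable error, rather than the
# job being killed later by the cgroup's OOM handling.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (4 * 1024**3, hard))  # 4 GiB cap
try:
    buf = bytearray(8 * 1024**3)  # "malloc" 8 GiB: exceeds the cap
    outcome = "allocated"
except MemoryError:
    outcome = "MemoryError"       # clean, handleable failure
finally:
    resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
print(outcome)
```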
3) We've all got lots of users already using the old code. No matter what
we say to them, when presented with a new system most will ignore any
documentation written by admins and just copy their old job scripts across
and keep using them. Since the memory usage measured by the cgroup PDC
is likely to be much lower (but more accurate) than that measured by the
traditional PDC, unless we force users to read the documentation and
reassess their memory needs (e.g. with a JSV rejecting the use of h_vmem
unless you also have a "-l yes_I_really_mean_h_vmem"), we don't get an
immediate big improvement in throughput.
Personally, my concern mainly centres around (3).
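The JSV idea in (3) boils down to a simple check on the submitted resource list. A sketch of just that decision logic (hypothetical names throughout; a real JSV would speak the client/server-side JSV protocol rather than take a dict):

```python
# Hypothetical sketch of the gating logic from point (3) - not a real
# JSV. Reject any job requesting h_vmem unless it also carries an
# explicit opt-in resource, forcing users to reassess old job scripts.
def verify_job(resources: dict) -> tuple[str, str]:
    """resources: the job's -l requests, e.g. {'h_vmem': '4G'}."""
    if "h_vmem" in resources and "yes_I_really_mean_h_vmem" not in resources:
        return ("REJECT",
                "h_vmem now maps to a cgroup memsw limit; please re-read "
                "the docs and resubmit with -l yes_I_really_mean_h_vmem")
    return ("ACCEPT", "")

print(verify_job({"h_vmem": "4G"})[0])                                      # REJECT
print(verify_job({"h_vmem": "4G", "yes_I_really_mean_h_vmem": "true"})[0])  # ACCEPT
```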
In my view, using a new set of attributes (I don't care what they're
called), rather than overloading old ones, avoids all of these issues.
However, I freely admit that it makes the decision about what to do about
the accounting file somewhat less obvious.
> Open Grid Scheduler is "commercial open source", so when we ship GE
> 2011.11 update 1, you will get the source. We are only selling
> *optional* support, we don't sell our code under a commercial license,
That is fantastic news (sorry, I keep losing track of people's commercial
models): my sincere thanks for this.
>> Is there some way we can collaborate on this one?
> There are 2 issues that we need to solve first:
> 1) Copyright assignment - like any other open source projects, we do
> need to own the rights or else it is not safe for ISVs to use our
> code. So far, the external contributions are smaller and quite
> straightforward (in terms of the code change - the debugging behind
> that is often times much more complicated - eg. Brooks Davis'
> BeyondTrust AD fix in shepherd)... For larger contributions, we need
> to audit the code.
> Let me start another thread to follow up with this specific topic.
> 2) As we are shipping in less than 1 month, what do you plan to
> change? We are only bug fixing the cgroups integration code now, and
> we plan to add enhancements only in later update releases.
You clearly have a more complete and advanced implementation than what
I've done and was intending to do. You therefore have priority and I
presumably have little to offer you (believe it or not, this is great
news).
At this stage, on this specific feature, I'm hoping there can be a
discussion about how it is best presented to the end user (once I was
sure I was capable of doing a cgroup feature, my next port of call was
going to be this list to start that conversation).
To my mind, this in particular means the attribute names and the
accounting file. I hope that it could be open to consensus - after all, we
all have a stake in this.
All the best,
PS Forgot to say in my previous emails (where are my manners?) -
congratulations to all on the imminent release and the development work
that has gone into it, and thanks again: it's very pleasing to see this
sort of thing done under an open source model.
Mark Dixon Email : m.c.dixon at leeds.ac.uk
HPC/Grid Systems Support Tel (int): 35429
Information Systems Services Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK