[gridengine users] A couple of questions...
reuti at staff.uni-marburg.de
Wed Jun 29 07:48:23 UTC 2011
On 29.06.2011 at 09:30, Vic wrote:
> I've got a couple of things I need to find out. Many apologies if the
> answers are in the documentation, but I haven't been able to work stuff
> out for myself (and as I'm operating on a live grid, there's only so much
> experimentation I can do).
> 1) I have a queue full of two-core machines, and they get two slots per
> machine. I now have some machines with more cores (/proc/cpuinfo says 8
> cores, but they're probably not 8 real cores). If I create a queue using
> just those machines, I get 8 slots on each - but if I add them to my
> original queue, I only get two slots. The editing page in Qmon is greyed
> out. How do I tell SGE that I've got more slots on those instances?
You need to click on the lock icon to the left of the entry field first to allow editing it. A locked field means that the queue's global setting is used for that instance. In the textual output you should get something like
$ qconf -sq all.a
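As a rough illustration of how a per-host override in the `slots` attribute is read: a queue configuration can carry a default value followed by bracketed host-specific entries. The sketch below assumes a `slots` line of that shape; the host names `node08` and `node09`, the values, and the helper `slots_for_host` are placeholders invented for this example and are not taken from the real grid. It also assumes the line is not wrapped with a trailing backslash, as long lines in real `qconf -sq` output can be.

```shell
#!/bin/sh
# Hypothetical slots line in the style of `qconf -sq <queue>` output.
# node08 and node09 are placeholder host names, not real machines.
slots_line='slots                 2,[node08=8],[node09=8]'

# Print the slot count that applies to one host: the bracketed per-host
# override if there is one, otherwise the queue-wide default.
slots_for_host() {
    host=$1
    value=${slots_line#slots}                    # drop the attribute name
    value=$(printf '%s' "$value" | tr -d ' ')    # 2,[node08=8],[node09=8]
    default=${value%%,*}                         # first field is the default
    override=$(printf '%s\n' "$value" | tr ',' '\n' |
        sed -n "s/^\[$host=\(.*\)]$/\1/p")       # value of [host=N], if any
    printf '%s\n' "${override:-$default}"
}

slots_for_host node08   # prints 8 (per-host override)
slots_for_host node01   # prints 2 (falls back to the default)
```

Such an override can be entered in Qmon after unlocking the field, or by editing the queue in a text editor with `qconf -mq <queue>`.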
> 2) I have an application that uses SGE, but isn't well-behaved - it
> launches as many jobs as it can, and swamps the grid. There doesn't appear
> to be a way to tell it to limit the number of concurrent jobs launched. I
> want to cause those jobs beyond a certain limit to be queued, rather than
> started - but I don't want to do this with a simple ticket allocation,
> because my users might (legitimately) launch multiple instances of this
> tool, and also I don't want their other access to be penalised because
> they are running it. I have a script solution for this (involving many
> sets of complexes and a way to select one in turn), but it's horribly
> complicated, and I'm sure there's a better way. Any ideas?
Is this an in-house application or a commercial one that can be looked up on the Internet? Sometimes it is indeed tricky to confine an application to the limits and host allocation that were granted to it. You may have to provide a custom hostfile, try to catch the `rsh` or `ssh` calls to other machines, set environment variables...
Can you provide more information about the application's startup process?