[gridengine users] Limiting each user's slots across all nodes

David Trimboli trimboli at cshl.edu
Tue Mar 12 14:55:38 UTC 2019

On 3/5/2019 12:34 PM, David Trimboli wrote:
> On 3/5/2019 12:18 PM, Reuti wrote:
>>> On 05.03.2019 at 18:06, David Trimboli <trimboli at cshl.edu> wrote:
>>> I'm looking at SGE limits, and I'm not sure when something applies to all users or each user individually. I want to find out how to limit each user to a certain number of slots across the entire cluster (just one queue).
>>> I feel like this isn't it:
>>> {
>>>      Name           limit-user-slots
>>>      description    Limit each user to 10 slots
>>>      enabled        true
>>>      limit          users * queues {all.q} to slots=10
>> limit users {*} queues all.q to slots=10
>> In principle {all.q} wouldn't hurt as it means "for each entry in the list", and the only entry is all.q. But to lower the impact I would leave this out.
> Ohhhhhhh! I didn't realize that {} meant to apply to each entry in the 
> list. That gives me everything I need. Thanks to you and Bernd.
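
For anyone skimming the archive, here is a rough Python sketch of the distinction as I understand it from Reuti's explanation: without braces, the limit is one shared counter for everyone matched; with braces, each user gets their own counter. This is only a toy model, not how the scheduler is implemented, and the function name and the users alice, bob, and carol are made up for illustration.

```python
from collections import Counter


def admits(rule_kind, limit, running, user, request):
    """Return True if `user` may start `request` more slots.

    rule_kind "shared"   models `users *`   (one counter for all users together)
    rule_kind "per_user" models `users {*}` (a separate counter for each user)
    `running` maps a user name to the slots that user already holds.
    """
    if rule_kind == "shared":
        used = sum(running.values())
    elif rule_kind == "per_user":
        used = running.get(user, 0)
    else:
        raise ValueError("unknown rule_kind: " + repr(rule_kind))
    return used + request <= limit


# Hypothetical usage: alice holds 8 slots, bob holds 2, and the limit is 10.
running = Counter({"alice": 8, "bob": 2})
print(admits("shared", 10, running, "carol", 1))    # shared pool is already full
print(admits("per_user", 10, running, "carol", 1))  # carol has her own budget
print(admits("per_user", 10, running, "alice", 3))  # alice would reach 11
```

Under the shared reading, a third user is locked out as soon as the others fill the pool, which is why the braces matter for a per-user cap.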

Now a follow-up question. I implemented this rule to ensure that no
single user takes more than 90% of our available slots:

     name    limit90percent
     description    NONE
     enabled    TRUE
     limit    users {*} to slots=536
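
For reference, the 536 figure is just 90% of the cluster's 596 total slots, rounded down. A small Python check of that arithmetic (the function name is mine, purely for illustration):

```python
def per_user_cap(total_slots, percent):
    """Largest whole number of slots not exceeding `percent` of the total."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return total_slots * percent // 100


print(per_user_cap(596, 90))  # 536
```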

(Our cluster has a total of 596 slots.) This worked fine until someone 
tried to submit a parallel environment job with the -pe option. On 16 
out of our 24 nodes, it still worked. But if they sent a job hard-queued 
to one of the upper nodes 17–24, it would never run, with this in the 
scheduling info:

    cannot run because it exceeds limit "trimboli/////" in rule
    cannot run in PE "threads" because it only offers 0 slots

(My username is trimboli.) Now, it's quite possible that the upper nodes
are set up differently from the lower nodes. The upper eight nodes were
installed later than the others and have been treated differently in the
past. I'd like to find which setting on the upper nodes makes the
scheduler report 0 slots under this limit when a PE job is submitted.
Where can I look to find the culprit?
