[gridengine users] Repeated error message in logs from RQS rules

Simon Andrews simon.andrews at babraham.ac.uk
Fri Jul 14 08:36:06 UTC 2017


Can anyone shed any light on an error which is repeated thousands of times in my Grid Engine messages log?  It appears whenever a submitted job is prevented from running by an RQS rule I have set up.  The error I get is:

07/14/2017 09:27:08|schedu|rocks1|C|not a single host excluded in rqs_excluded_hosts()
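
One rough way to measure how much of the messages file this entry accounts for is to tally the log lines by component, level and text. The following is a minimal Python sketch, assuming every line uses the pipe-separated layout seen above (timestamp|component|host|level|message). The function name count_messages is hypothetical, and the sample list simply repeats the line above for illustration.

```python
from collections import Counter


def count_messages(lines):
    """Tally (component, level, message) triples from pipe-separated log lines.

    Assumed layout: timestamp|component|host|level|message.
    Lines that do not have five fields are skipped.
    """
    counts = Counter()
    for line in lines:
        # Split on the first four pipes only, so a message containing "|" stays whole.
        parts = line.rstrip("\n").split("|", 4)
        if len(parts) != 5:
            continue
        _timestamp, component, _host, level, message = parts
        counts[(component, level, message)] += 1
    return counts


# Illustrative input: the log line quoted above, repeated twice.
sample = [
    "07/14/2017 09:27:08|schedu|rocks1|C|not a single host excluded in rqs_excluded_hosts()",
    "07/14/2017 09:27:08|schedu|rocks1|C|not a single host excluded in rqs_excluded_hosts()",
]

for (component, level, message), n in count_messages(sample).most_common():
    print(n, component, level, message)
```

On a real system the list would be replaced by an open file handle on the messages file, whose path depends on the installation.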

The RQS rule set which triggers this looks like:

{
   name         per_user_slot_limit
   description  "limit the number of slots per user"
   enabled      TRUE
   limit        users {*} hosts {@interactive} to slots=8
   limit        users {andrewss} to slots=2
   limit        users {@bioinf} to slots=616
   limit        users {*} to slots=411
}

The rule seems to work: jobs are held, and then started as expected.  A job which fails to schedule reports scheduling info like this:

scheduling info:            cannot run in queue instance "all.q@compute-1-6.local" because it is not of type batch
                            cannot run in queue instance "all.q@compute-1-5.local" because it is not of type batch
                            cannot run in queue instance "all.q@compute-1-7.local" because it is not of type batch
                            cannot run in queue instance "all.q@compute-1-0.local" because it is not of type batch
                            cannot run in queue instance "all.q@compute-1-3.local" because it is not of type batch
                            cannot run because it exceeds limit "andrewss/////" in rule "per_user_slot_limit/3"
                            cannot run in queue instance "all.q@compute-1-4.local" because it is not of type batch
                            cannot run in queue instance "all.q@compute-1-1.local" because it is not of type batch
                            cannot run in queue instance "all.q@compute-1-2.local" because it is not of type batch
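
When many jobs are queued, the quota line is easy to lose among the queue-type messages. The following is a minimal Python sketch for pulling out only the quota hits, assuming the message wording is exactly as in the output above. The helper name find_quota_hits and the regular expression are illustrative and not part of Grid Engine.

```python
import re

# Assumed wording, taken from the scheduling info quoted above:
#   exceeds limit "<filter>" in rule "<rule set name>/<number>"
LIMIT_RE = re.compile(r'exceeds limit "([^"]*)" in rule "([^"/]+)/(\d+)"')


def find_quota_hits(text):
    """Return (filter, rule_set_name, number) for each quota message in text."""
    hits = []
    for line in text.splitlines():
        match = LIMIT_RE.search(line)
        if match:
            hits.append((match.group(1), match.group(2), int(match.group(3))))
    return hits


# The quota line from the output above, used as sample input.
sample = 'cannot run because it exceeds limit "andrewss/////" in rule "per_user_slot_limit/3"'
print(find_quota_hits(sample))
```

The text passed in would normally be the captured scheduling info of a pending job.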

So it's seeing the rule and is applying it correctly, but the spurious errors are causing my messages file to inflate quickly when there are a lot of queued jobs.

Can anyone suggest how to debug or fix this?  Searching for the specific error turns up nothing relevant beyond the library API it comes from.

This is using SGE-6.2u5p2-1.x86_64.

Thanks for any help you can offer!

Simon.

