[gridengine users] Strange behavior with functional scheduling

David Rosenstrauch darose at darose.net
Mon Oct 9 21:01:21 UTC 2017


I'm a bit of an SGE noob, so please bear with me.  We're in the process 
of a first-time SGE deployment for the users in our department.  Although 
we've been able to use SGE, submit jobs to the queues successfully, 
etc., we're running into issues trying to get fair-share scheduling 
(specifically functional scheduling) to work correctly.

We have very simple functional scheduling enabled, via the following 
configuration settings:

enforce_user                 auto
auto_user_fshare             100
weight_tickets_functional    10000
schedd_job_info              true

(In addition, the "weight_tickets_share" setting is set to 0, thereby 
disabling share tree scheduling.)

A colleague and I are testing this setup by submitting multiple jobs to 
one of our queues at the same time: I submit a large number of jobs 
(100) first, and he submits a smaller number (25) shortly afterwards.  
Our understanding is that the functional 
scheduling policy should prevent one user from having their jobs 
completely dominate a queue.  And so our expectation is that even though 
my jobs were submitted first, and there are more of them, the scheduler 
should wind up giving his jobs a higher priority so that he is not 
forced to wait until all of my jobs complete before his run.  (If he did 
have to wait, that would effectively be FIFO scheduling, not fair 
share.)

Although we aren't seeing FIFO scheduling, we're seeing close to it.  
One of his jobs (eventually) gets assigned a high number of tickets, and 
a higher priority, and gets scheduled and run.  But the rest of his jobs 
sit in the queue and don't get run until all of mine complete, which is 
not really fair share.

Although it does look like functional scheduling is happening to some 
extent (at least one of his jobs is getting prioritized ahead of mine) 
this scheduling behavior is not what we were expecting to see.  Our 
expectation was that one of his jobs would run for every 4 of mine (more 
or less), and that his jobs would not wind up queued up to run after 
mine complete.
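
To make that expectation concrete, here is a rough sketch of the model 
we have in mind.  It is only our simplified reading of the functional 
policy, not SGE's actual algorithm: it assumes the functional tickets 
are divided among users in proportion to their fshare, and that each 
user's portion is then split evenly across that user's pending jobs.  
The function name and the user names are made up for illustration, and 
every other scheduler setting is ignored.

```python
def functional_tickets_per_job(total_tickets, user_fshare, pending_jobs):
    """Simplified model (an assumption, not SGE's real algorithm).

    Split total_tickets among users in proportion to their fshare,
    then split each user's portion evenly over that user's pending jobs.
    Returns the tickets held by a single job of each user.
    """
    total_share = sum(user_fshare.values())
    per_job = {}
    for user, share in user_fshare.items():
        user_tickets = total_tickets * share / total_share
        per_job[user] = user_tickets / pending_jobs[user]
    return per_job


# Values taken from our setup: weight_tickets_functional = 10000,
# auto_user_fshare = 100 for both users, 100 jobs vs. 25 jobs.
tickets = functional_tickets_per_job(
    total_tickets=10000,
    user_fshare={"me": 100, "colleague": 100},
    pending_jobs={"me": 100, "colleague": 25},
)
print(tickets)
```

Under this model each of his pending jobs would carry more tickets than 
each of mine, which is why we did not expect his jobs to end up waiting 
behind all of mine.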


Any idea what might be going on here?  Do I have my system misconfigured 
for functional scheduling?  Or am I just misunderstanding how this is 
supposed to work?  I've already done quite a bit of googling and man 
page reading on the relevant topics and settings, but wasn't able to 
find a good explanation for the behavior we're seeing.  Any help greatly 
appreciated!

Thanks,

DR


