[gridengine users] Strange behavior with functional scheduling

David Rosenstrauch darose at darose.net
Mon Oct 9 22:06:56 UTC 2017


Hmmm ... just wondering: why the need for setting weight_tickets_share 
to 10000000 like you did, if we're not using share tree scheduling?  
(Actually, looking at it more closely, it looks like you're setting 
that twice - once to 10000000 and then later to 0.  I'm guessing the 
2nd value supersedes the first, so you're effectively setting it to 0.)
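
For what it's worth, the value that actually took effect can be 
confirmed by dumping the live scheduler configuration.  Something like 
the following should work (assuming standard qconf usage):

# Show the current scheduler configuration and grep for the
# duplicated setting to see its effective value:
qconf -ssconf | grep weight_tickets_share

# To change it, edit the scheduler configuration interactively:
qconf -msconf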

In any case, we have several of those other settings in our config, but 
with different values:

weight_tickets_share              0
weight_user                       0.250000
weight_project                    0.250000
weight_department                 0.250000
weight_job                        0.250000
weight_tickets_functional         10000
weight_tickets_share              0

Could these settings be causing our issue?  It seems unlikely, though, 
as we're not taking project or department into account in our 
scheduling.
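
In case it's useful for debugging, we've also been watching how the 
scheduler assigns tickets to the pending jobs.  Something like this 
should show the per-job ticket counts (assuming a reasonably stock SGE 
qstat):

# Extended job listing for all users; the ftckt column shows
# functional tickets, and tckts shows the total:
qstat -ext -u '*'

# Breakdown of each job's priority components:
qstat -pri -u '*'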

Thanks,

DR

On 2017-10-09 5:40 pm, Ian Kaufman wrote:
> I am pretty sure you need something like the following (courtesy of
> Reuti):
> 
> weight_tickets_share              10000000
> 
> weight_user                       0.900000
> weight_project                    0.000000
> weight_department                 0.000000
> weight_job                        0.100000
> weight_tickets_functional         100000
> weight_tickets_share              0
> 
> policy_hierarchy                  F
> 
> On Mon, Oct 9, 2017 at 2:01 PM, David Rosenstrauch <darose at darose.net>
> wrote:
> 
>> I'm a bit of an SGE noob, so please bear with me.  We're in the
>> process of a first-time SGE deploy for the users in our department.
>> Although we've been able to use SGE, submit jobs to the queues
>> successfully, etc., we're running into issues trying to get the
>> fair-share scheduling - specifically the functional scheduling - to
>> work correctly.
>> 
>> We have very simple functional scheduling enabled, via the following
>> configuration settings:
>> 
>> enforce_user                      auto
>> auto_user_fshare                  100
>> weight_tickets_functional         10000
>> schedd_job_info                   true
>> 
>> (In addition, the "weight_tickets_share" setting is set to 0,
>> thereby disabling share tree scheduling.)
>> 
>> A colleague and I are testing this setup by both submitting
>> multiple jobs to one of our queues simultaneously, with me first
>> submitting a large number of jobs (100) and him submitting a smaller
>> number (25) shortly afterwards.  Our understanding is that the
>> functional scheduling policy should prevent one user from having
>> their jobs completely dominate a queue.  And so our expectation is
>> that even though my jobs were submitted first, and there are more of
>> them, the scheduler should wind up giving his jobs a higher priority
>> so that he is not forced to wait until all of my jobs complete
>> before his run.  (If he did have to wait, that would effectively be
>> FIFO scheduling, not fair share.)
>> 
>> Although we aren't seeing FIFO scheduling, we're seeing something
>> close to it.
>> One of his jobs (eventually) gets assigned a high number of
>> tickets, and a higher priority, and gets scheduled and run.  But the
>> remaining several dozen sit in the queue and don't get run until all
>> of mine complete - which is not really fair share.
>> 
>> Although it does look like functional scheduling is happening to
>> some extent (at least one of his jobs is getting prioritized ahead
>> of mine), this scheduling behavior is not what we were expecting to
>> see.  Our expectation was that one of his jobs would run for every 4
>> of mine (more or less), and that his jobs would not wind up queued
>> up to run after mine complete.
>> 
>> Any idea what might be going on here?  Do I have my system
>> misconfigured for functional scheduling?  Or am I just
>> misunderstanding how this is supposed to work?  I've already done
>> quite a bit of googling and man page reading on the relevant topics
>> and settings, but wasn't able to find a good explanation for the
>> behavior we're seeing.  Any help greatly appreciated!
>> 
>> Thanks,
>> 
>> DR
>> _______________________________________________
>> users mailing list
>> users at gridengine.org
>> https://gridengine.org/mailman/listinfo/users
> 
> --
> Ian Kaufman
> Research Systems Administrator
> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu


