[gridengine users] Fair share policy

William Bryce bbryce at univa.com
Thu Feb 28 12:54:16 UTC 2019


Didn’t know that, Mark. That is great. I remember we saw more than one issue with share tree and array jobs, but they didn’t happen in the default share-tree configuration. I will have to check.

Regards 

Bill


> On Feb 28, 2019, at 4:32 AM, Mark Dixon <M.C.Dixon at leeds.ac.uk> wrote:
> 
> Hi Bill,
> 
> I fixed that share-tree-array-jobs priority problem some time ago, unless 
> you're thinking of a different one?
> 
> https://arc.liv.ac.uk/trac/SGE/ticket/435
> https://arc.liv.ac.uk/trac/SGE/changeset/4840/sge
> 
> We use share tree and array jobs all the time with no problems. It made it 
> into a Son of Gridengine release.
> 
> Best,
> 
> Mark
> 
>> On Wed, 27 Feb 2019, William Bryce wrote:
>> 
>> Hi Iyad,
>> 
>> Reuti is correct: the sge_priority man page explains how SGE calculates the
>> priority of jobs, and it includes the formula.  I will say that if you intend
>> to use share-tree with array jobs (i.e. qsub -t) then you will find
>> that the priority calculation is 'wrong' because it does not properly
>> account for array jobs.  The functional policy does not have
>> this issue - just the share tree policy.
>> 
>> Regards,
>> 
>> Bill.
>> 
>> 
>> On Wed, Feb 27, 2019 at 4:10 PM Kandalaft, Iyad (AAFC/AAC) <
>> iyad.kandalaft at canada.ca> wrote:
>> 
>>> Hi Reuti,
>>> 
>>> I'm implementing only a share-tree.  The docs somewhere state something
>>> along the lines of "use one or the other".
>>> I've seen the man page.  It explains most of the math but leaves out
>>> some key elements.  For example, how are "tickets" handed out and in what
>>> quantity (i.e. why do some jobs get 20000 tickets based on my configuration
>>> below)?  Also, the normalization function puts the values between 0 and 1,
>>> but based on what?  Number of tickets issued to the job divided by the
>>> total?
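
A rough sketch of the formula from sge_priority, using the weights posted
further down in this thread. The min-max normalization, the 0.5 fallback
when all values are equal, the 0.5 used for the normalized POSIX priority,
and the job names and ticket counts other than 20000 are assumptions made
for illustration. They are not taken from the man page or from this cluster.

```python
def normalize(value, lo, hi):
    """Min-max scale value into [0, 1] (assumed normalization)."""
    if hi == lo:
        return 0.5  # assumption: midpoint when every job has the same value
    return (value - lo) / (hi - lo)


def job_priority(npprio, nurg, ntix,
                 weight_priority=0.01, weight_urgency=0.01, weight_ticket=0.5):
    """prio = weight_priority*npprio + weight_urgency*nurg + weight_ticket*ntix."""
    return weight_priority * npprio + weight_urgency * nurg + weight_ticket * ntix


# Hypothetical pending jobs and their ticket counts.
tickets = {"job_a": 20000, "job_b": 5000, "job_c": 0}
lo, hi = min(tickets.values()), max(tickets.values())

# Normalized ticket value per job, then the combined priority.
ntix = {name: normalize(t, lo, hi) for name, t in tickets.items()}
prio = {name: job_priority(0.5, 0.0, ntix[name]) for name in tickets}

for name in sorted(prio, key=prio.get, reverse=True):
    print(f"{name}: ntix={ntix[name]:.2f} prio={prio[name]:.4f}")
```

If the scheduler does normalize this way, a job's normalized ticket value
depends on where it sits between the lowest and highest ticket counts among
the jobs, not on its fraction of the total.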
>>> 
>>> Thanks for your help.
>>> 
>>> Iyad Kandalaft
>>> 
>>> -----Original Message-----
>>> From: Reuti <reuti at staff.uni-marburg.de>
>>> Sent: Wednesday, February 27, 2019 4:00 PM
>>> To: Kandalaft, Iyad (AAFC/AAC) <iyad.kandalaft at canada.ca>
>>> Cc: users at gridengine.org
>>> Subject: Re: [gridengine users] Fair share policy
>>> 
>>> Hi,
>>> 
>>> there is a man page "man sge_priority". Which policy do you intend to use:
>>> share-tree (honors past usage) or functional (current use), or both?
>>> 
>>> -- Reuti
>>> 
>>> 
>>>> On 25.02.2019 at 15:03, Kandalaft, Iyad (AAFC/AAC) <iyad.kandalaft at canada.ca> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> I recently implemented a fair share policy using share tickets.  I’ve
>>>> been monitoring the cluster for a couple of days using qstat -pri -ext -u
>>>> "*" in order to see how the functional tickets are working and it seems to
>>>> have the intended effect.  There are some anomalies where some running jobs
>>>> have 0 tickets but still get scheduled since there are free resources; I
>>>> assume this is normal.
>>>> 
>>>> I’ll admit that I don’t fully understand the scheduling as it’s somewhat
>>>> complex.  So, I’m hoping someone can review the configuration to see if
>>>> they can find any glaring issues such as conflicting options.
>>>> 
>>>> I created a share-tree and gave all users an equal value of 10:
>>>> $ qconf -sstree
>>>> id=0
>>>> name=Root
>>>> type=0
>>>> shares=1
>>>> childnodes=1
>>>> id=1
>>>> name=default
>>>> type=0
>>>> shares=10
>>>> childnodes=NONE
>>>> 
>>>> I modified the scheduling by setting weight_tickets_share to
>>>> 1000000. I reduced weight_waiting_time, weight_priority and weight_urgency
>>>> to well below weight_ticket (what are good values?).
>>>> $ qconf -ssconf
>>>> algorithm                         default
>>>> schedule_interval                 0:0:15
>>>> maxujobs                          0
>>>> queue_sort_method                 seqno
>>>> job_load_adjustments              np_load_avg=0.50
>>>> load_adjustment_decay_time        0:7:30
>>>> load_formula                      np_load_avg
>>>> schedd_job_info                   false
>>>> flush_submit_sec                  0
>>>> flush_finish_sec                  0
>>>> params                            none
>>>> reprioritize_interval             0:0:0
>>>> halftime                          168
>>>> usage_weight_list                 cpu=0.700000,mem=0.200000,io=0.100000
>>>> compensation_factor               5.000000
>>>> weight_user                       0.250000
>>>> weight_project                    0.250000
>>>> weight_department                 0.250000
>>>> weight_job                        0.250000
>>>> weight_tickets_functional         0
>>>> weight_tickets_share              1000000
>>>> share_override_tickets            TRUE
>>>> share_functional_shares           TRUE
>>>> max_functional_jobs_to_schedule   200
>>>> report_pjob_tickets               TRUE
>>>> max_pending_tasks_per_job         50
>>>> halflife_decay_list               none
>>>> policy_hierarchy                  OFS
>>>> weight_ticket                     0.500000
>>>> weight_waiting_time               0.000010
>>>> weight_deadline                   3600000.000000
>>>> weight_urgency                    0.010000
>>>> weight_priority                   0.010000
>>>> max_reservation                   0
>>>> default_duration                  INFINITY
>>>> 
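
As a sketch of what halftime controls: the value 168 above is in hours, and
the past usage counted by the share tree is halved once per halftime. The
closed-form function below is an illustration only. Its name is hypothetical
and the scheduler's actual bookkeeping may differ.

```python
def decayed_usage(usage, hours_elapsed, halftime_hours=168.0):
    """Return usage after exponential decay with the given half-life in hours."""
    return usage * 0.5 ** (hours_elapsed / halftime_hours)


# How much of 1000 units of recorded usage still counts after some days.
for days in (0, 7, 14, 28):
    remaining = decayed_usage(1000.0, days * 24)
    print(f"after {days:2d} days: {remaining:.1f}")
```

Raising halftime makes old usage fade more slowly, so heavy past use counts
against a user for longer.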
>>>> I modified all the users to set the fshare to 1000:
>>>> $ qconf -muser XXX
>>>> 
>>>> I modified the general conf to set auto_user_fshare to 1000 and
>>>> auto_user_delete_time to 7776000 (90 days).  Halftime is set to the default 7
>>>> days (I assume I should increase this).  I don’t know if
>>>> auto_user_delete_time even matters.
>>>> $ qconf -sconf
>>>> #global:
>>>> execd_spool_dir              /opt/gridengine/default/spool
>>>> mailer                       /opt/gridengine/default/commond/mail_wrapper.py
>>>> xterm                        /usr/bin/xterm
>>>> load_sensor                  none
>>>> prolog                       none
>>>> epilog                       none
>>>> shell_start_mode             posix_compliant
>>>> login_shells                 sh,bash
>>>> min_uid                      100
>>>> min_gid                      100
>>>> user_lists                   none
>>>> xuser_lists                  none
>>>> projects                     none
>>>> xprojects                    none
>>>> enforce_project              false
>>>> enforce_user                 auto
>>>> load_report_time             00:00:40
>>>> max_unheard                  00:05:00
>>>> reschedule_unknown           00:00:00
>>>> loglevel                     log_info
>>>> administrator_mail           none
>>>> set_token_cmd                none
>>>> pag_cmd                      none
>>>> token_extend_time            none
>>>> shepherd_cmd                 none
>>>> qmaster_params               none
>>>> execd_params                 ENABLE_BINDING=true ENABLE_ADDGRP_KILL=true \
>>>>                             H_DESCRIPTORS=16K
>>>> reporting_params             accounting=true reporting=true \
>>>>                             flush_time=00:00:15 joblog=true sharelog=00:00:00
>>>> finished_jobs                100
>>>> gid_range                    20000-20100
>>>> qlogin_command               /opt/gridengine/bin/rocks-qlogin.sh
>>>> qlogin_daemon                /usr/sbin/sshd -i
>>>> rlogin_command               builtin
>>>> rlogin_daemon                builtin
>>>> rsh_command                  builtin
>>>> rsh_daemon                   builtin
>>>> max_aj_instances             2000
>>>> max_aj_tasks                 75000
>>>> max_u_jobs                   0
>>>> max_jobs                     0
>>>> max_advance_reservations     0
>>>> auto_user_oticket            0
>>>> auto_user_fshare             1000
>>>> auto_user_default_project    none
>>>> auto_user_delete_time        7776000
>>>> delegated_file_staging       false
>>>> reprioritize                 0
>>>> jsv_url                      none
>>>> jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
>>>> 
>>>> Thanks for your assistance.
>>>> 
>>>> Cheers
>>>> 
>>>> Iyad Kandalaft
>>>> 
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>> users at gridengine.org
>>>> https://gridengine.org/mailman/listinfo/users
>>> 
>>> 
>> 
>> 
>> -- 
>> *William Bryce* | VP of Products
>> Univa Corporation <http://www.univa.com/> - 130 Esna Park Drive, Second
>> Floor, Markham, Ontario, Canada
>> *Email* bbryce at univa.com | *Mobile: 647.974.2841* | *Office: 647.478.5974*


