[gridengine users] Fair share policy

Mark Dixon M.C.Dixon at leeds.ac.uk
Thu Feb 28 09:32:59 UTC 2019


Hi Bill,

I fixed that share-tree-array-jobs priority problem some time ago, unless 
you're thinking of a different one?

https://arc.liv.ac.uk/trac/SGE/ticket/435
https://arc.liv.ac.uk/trac/SGE/changeset/4840/sge

We use share tree and array jobs all the time with no problems. The fix made 
it into a Son of Grid Engine release.
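
For reference, here is a minimal sketch of the combination step that
sge_priority(5) describes: the job priority is a weighted sum of the
normalized POSIX priority, urgency and ticket values. The weights are the
weight_ticket, weight_urgency and weight_priority values from the scheduler
configuration quoted below. The min-max normalization across jobs, the 0.5
fallback when all jobs share a value, the function names and the sample jobs
are assumptions for illustration, not taken from the SGE source:

```python
def normalize(value, lowest, highest):
    """Map value into [0, 1]; assumed to be 0.5 when every job has the same value."""
    if highest == lowest:
        return 0.5
    return (value - lowest) / (highest - lowest)


def priorities(jobs, weight_ticket=0.5, weight_urgency=0.01, weight_priority=0.01):
    """jobs: list of (tickets, urgency, posix_priority) tuples.

    Returns one combined priority per job, in input order.
    """
    columns = list(zip(*jobs))
    bounds = [(min(c), max(c)) for c in columns]
    result = []
    for tickets, urgency, pprio in jobs:
        ntckts = normalize(tickets, *bounds[0])
        nurg = normalize(urgency, *bounds[1])
        npprio = normalize(pprio, *bounds[2])
        result.append(weight_priority * npprio
                      + weight_urgency * nurg
                      + weight_ticket * ntckts)
    return result


# Hypothetical pending jobs: (tickets, urgency, POSIX priority)
jobs = [(20000, 1000, 0), (5000, 1000, 0), (0, 1000, 0)]
for job, prio in zip(jobs, priorities(jobs)):
    print(job, round(prio, 5))
```

Under these assumptions the ticket count dominates the ordering, because
weight_ticket (0.5) is far larger than the other two weights (0.01 each).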

Best,

Mark
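
On the halftime question further down the thread: halftime is given in
hours, so 168 means past usage counts for half as much after one week. A
rough sketch of that decay and of the usage_weight_list combination follows.
It assumes a plain exponential decay and a simple weighted sum; the function
names and sample numbers are hypothetical, and the real scheduler's
bookkeeping may differ in detail:

```python
def decayed_usage(usage, hours_elapsed, halftime_hours=168):
    """Exponential decay: the usage value halves every halftime_hours."""
    return usage * 0.5 ** (hours_elapsed / halftime_hours)


def combined_usage(cpu, mem, io, weights=(0.7, 0.2, 0.1)):
    """Weighted sum mirroring usage_weight_list cpu=0.7,mem=0.2,io=0.1."""
    w_cpu, w_mem, w_io = weights
    return w_cpu * cpu + w_mem * mem + w_io * io


# Hypothetical usage of 1000 units, looked at one, two and four weeks later
for weeks in (1, 2, 4):
    print(weeks, decayed_usage(1000.0, weeks * 168))
```

A longer halftime makes the share tree remember heavy users for longer; a
shorter one makes it behave more like a policy based on current use.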

On Wed, 27 Feb 2019, William Bryce wrote:

> Hi Iyad,
>
> Reuti is correct: the sge_priority man page explains how SGE calculates the
> priority of jobs, including the formula.  I will say that if you intend
> to use share-tree with array jobs (i.e. qsub -t), then you will find
> that the priority calculation is 'wrong' because it does not properly
> account for array jobs.  The functional policy does not have
> this issue - just the share tree policy.
>
> Regards,
>
> Bill.
>
>
> On Wed, Feb 27, 2019 at 4:10 PM Kandalaft, Iyad (AAFC/AAC) <
> iyad.kandalaft at canada.ca> wrote:
>
>> Hi Reuti
>>
>> I'm implementing only a share-tree.  The docs somewhere state something
>> along the lines of "use one or the other".
>> I've seen the man page.  It explains most of the math but leaves out
>> some key elements.  For example, how are "tickets" handed out and in what
>> quantity (i.e. why do some jobs get 20000 tickets based on my configuration
>> below)?  Also, the normalization function puts the values between 0 and 1,
>> but based on what?  Number of tickets issued to the job divided by the
>> total?
>>
>> Thanks for your help.
>>
>> Iyad Kandalaft
>>
>> -----Original Message-----
>> From: Reuti <reuti at staff.uni-marburg.de>
>> Sent: Wednesday, February 27, 2019 4:00 PM
>> To: Kandalaft, Iyad (AAFC/AAC) <iyad.kandalaft at canada.ca>
>> Cc: users at gridengine.org
>> Subject: Re: [gridengine users] Fair share policy
>>
>> Hi,
>>
>> there is a man page "man sge_priority". Which policy do you intend to use:
>> share-tree (honors past usage) or functional (current use), or both?
>>
>> -- Reuti
>>
>>
>>> On 25.02.2019 at 15:03, Kandalaft, Iyad (AAFC/AAC) <iyad.kandalaft at canada.ca> wrote:
>>>
>>> Hi all,
>>>
>>> I recently implemented a fair share policy using share tickets.  I’ve
>>> been monitoring the cluster for a couple of days using qstat -pri -ext -u
>>> "*" in order to see how the functional tickets are working and it seems to
>>> have the intended effect.  There are some anomalies where some running jobs
>>> have 0 tickets but still get scheduled since there are free resources; I
>>> assume this is normal.
>>>
>>> I’ll admit that I don’t fully understand the scheduling as it’s somewhat
>>> complex.  So, I’m hoping someone can review the configuration to see if
>>> they can find any glaring issues such as conflicting options.
>>>
>>> I created a share-tree and gave all users an equal value of 10:
>>> $ qconf -sstree
>>> id=0
>>> name=Root
>>> type=0
>>> shares=1
>>> childnodes=1
>>> id=1
>>> name=default
>>> type=0
>>> shares=10
>>> childnodes=NONE
>>>
>>> I modified the scheduling by setting weight_tickets_share to
>>> 1000000.  I reduced weight_waiting_time, weight_priority and weight_urgency
>>> to well below weight_ticket (what are good values?).
>>> $ qconf -ssconf
>>> algorithm                         default
>>> schedule_interval                 0:0:15
>>> maxujobs                          0
>>> queue_sort_method                 seqno
>>> job_load_adjustments              np_load_avg=0.50
>>> load_adjustment_decay_time        0:7:30
>>> load_formula                      np_load_avg
>>> schedd_job_info                   false
>>> flush_submit_sec                  0
>>> flush_finish_sec                  0
>>> params                            none
>>> reprioritize_interval             0:0:0
>>> halftime                          168
>>> usage_weight_list                 cpu=0.700000,mem=0.200000,io=0.100000
>>> compensation_factor               5.000000
>>> weight_user                       0.250000
>>> weight_project                    0.250000
>>> weight_department                 0.250000
>>> weight_job                        0.250000
>>> weight_tickets_functional         0
>>> weight_tickets_share              1000000
>>> share_override_tickets            TRUE
>>> share_functional_shares           TRUE
>>> max_functional_jobs_to_schedule   200
>>> report_pjob_tickets               TRUE
>>> max_pending_tasks_per_job         50
>>> halflife_decay_list               none
>>> policy_hierarchy                  OFS
>>> weight_ticket                     0.500000
>>> weight_waiting_time               0.000010
>>> weight_deadline                   3600000.000000
>>> weight_urgency                    0.010000
>>> weight_priority                   0.010000
>>> max_reservation                   0
>>> default_duration                  INFINITY
>>>
>>> I modified all the users to set the fshare to 1000:
>>> $ qconf -muser XXX
>>>
>>> I modified the general conf to set auto_user_fshare to 1000 and
>>> auto_user_delete_time to 7776000 (90 days).  Halftime is set to the default 7
>>> days (I assume I should increase this).  I don’t know if
>>> auto_user_delete_time even matters.
>>> $ qconf -sconf
>>> #global:
>>> execd_spool_dir              /opt/gridengine/default/spool
>>> mailer                       /opt/gridengine/default/commond/mail_wrapper.py
>>> xterm                        /usr/bin/xterm
>>> load_sensor                  none
>>> prolog                       none
>>> epilog                       none
>>> shell_start_mode             posix_compliant
>>> login_shells                 sh,bash
>>> min_uid                      100
>>> min_gid                      100
>>> user_lists                   none
>>> xuser_lists                  none
>>> projects                     none
>>> xprojects                    none
>>> enforce_project              false
>>> enforce_user                 auto
>>> load_report_time             00:00:40
>>> max_unheard                  00:05:00
>>> reschedule_unknown           00:00:00
>>> loglevel                     log_info
>>> administrator_mail           none
>>> set_token_cmd                none
>>> pag_cmd                      none
>>> token_extend_time            none
>>> shepherd_cmd                 none
>>> qmaster_params               none
>>> execd_params                 ENABLE_BINDING=true ENABLE_ADDGRP_KILL=true \
>>>                              H_DESCRIPTORS=16K
>>> reporting_params             accounting=true reporting=true \
>>>                              flush_time=00:00:15 joblog=true sharelog=00:00:00
>>> finished_jobs                100
>>> gid_range                    20000-20100
>>> qlogin_command               /opt/gridengine/bin/rocks-qlogin.sh
>>> qlogin_daemon                /usr/sbin/sshd -i
>>> rlogin_command               builtin
>>> rlogin_daemon                builtin
>>> rsh_command                  builtin
>>> rsh_daemon                   builtin
>>> max_aj_instances             2000
>>> max_aj_tasks                 75000
>>> max_u_jobs                   0
>>> max_jobs                     0
>>> max_advance_reservations     0
>>> auto_user_oticket            0
>>> auto_user_fshare             1000
>>> auto_user_default_project    none
>>> auto_user_delete_time        7776000
>>> delegated_file_staging       false
>>> reprioritize                 0
>>> jsv_url                      none
>>> jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
>>>
>>> Thanks for your assistance.
>>>
>>> Cheers
>>>
>>> Iyad Kandalaft
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users at gridengine.org
>>> https://gridengine.org/mailman/listinfo/users
>>
>>
>>
>
>
> -- 
> *William Bryce* | VP of Products
> Univa Corporation <http://www.univa.com/> - 130 Esna Park Drive, Second
> Floor, Markham, Ontario, Canada
> *Email* bbryce at univa.com | *Mobile: 647.974.2841* | *Office: 647.478.5974*


