[gridengine users] RoundRobin scheduling among users

Reuti reuti at staff.uni-marburg.de
Wed Jan 27 11:25:18 UTC 2016


Hi,

> On 26.01.2016 at 19:30, Dan Hyatt <dhyatt at dsgmail.wustl.edu> wrote:
> 
>  I am looking to use this differently.
> The problem I am having is that I have users with 200-1000 jobs. I have 80 servers with almost 1000 cores.
> For my normal queue, I want the SGE PE to place up to 4 jobs per server: one job per server, round-robin, until it runs out of servers, then a second job per server, then a third, until all the jobs are allocated. (1 per server is fine, as long as it round-robins and then starts adding a second job per server, then a third, until it runs out of jobs.)
> 
> Does the allocation rule limit the number of jobs per server PER qsub, or total jobs allowed per server?

Not per se. But a fixed allocation rule of 16 on a machine with 16 cores has, of course, the effect that only one job lands there. Or two jobs with a fixed allocation rule of 8 on such a machine.
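
As a sketch (the PE name "fixed8" and the job script are made up here), such a PE differs only in its allocation rule:

allocation_rule    8

and submitting with

$ qsub -pe fixed8 8 job.sh

gives each job exactly 8 cores on one machine, so a 16-core host will hold at most two of them.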


> The problem I am having is that I get 20 jobs per server and overload a couple of servers

Why? Do the jobs request a PE with the proper number of cores? Is the job (i.e. the final application) able to honor the granted list of machines on which it should start its slaves?
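
For reference, a tightly integrated application can read the granted machines from the file $PE_HOSTFILE inside the job script, e.g. (a minimal sketch):

#!/bin/sh
# each line of $PE_HOSTFILE lists a hostname, the slots granted
# there, and the queue - the application must start its slaves
# only on these hosts
cat $PE_HOSTFILE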


> while 80 servers are running idle. Each has 10 cores and 128 GB of RAM, so they can handle up to 20 light jobs each.

What do you mean by "light" jobs? If you overload a machine, it might double the execution time of each (serial) job. In my opinion, overloading a machine (with an alarm threshold > 1) was/is meant for the case where a parallel job is badly parallelized and would often leave cores idle. It could even be arranged so that the parallel job gets a priority (i.e. nice value) of 0 in its queue definition, while the serial jobs which are supposed to soak up the idling cores get a priority of 19.
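
As a sketch (the queue names are made up), the relevant entry is the "priority" nice value in the two queue definitions:

$ qconf -sq parallel.q | grep priority
priority              0

$ qconf -sq serial.q | grep priority
priority              19

This way the niced serial jobs yield the cores whenever the parallel job can use them.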


> Also, for the heavy CPU jobs, I want a max of 4 jobs per server, so for pe_slots would I just put the integer 4 in there?

No, an "allocation_rule 4" would mean that each job gets exactly 4 cores per machine. Note that such a PE will only start jobs whose slot request is divisible by 4, i.e. a job requesting 13 slots would never run if it requests this particular PE.
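
For example (job.sh being a placeholder), with the "smp" PE below changed to "allocation_rule 4":

$ qsub -pe smp 8 job.sh    # 8 is divisible by 4: e.g. two machines with 4 slots each
$ qsub -pe smp 13 job.sh   # 13 is not divisible by 4: stays pending forever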

Unfortunately there is no default complex which could be limited to 4 per machine by an RQS, but you can set up a consumable complex with a default value of 1 and the consumable attribute set to "JOB". This can then be assigned and limited on an exechost level to 4 (this works unless a user fools the system and requests 0 for this complex, but a JSV could handle that). The complex could also be assigned an arbitrarily high value on the cluster level, and an RQS could limit it on certain machines.
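
A sketch of such a setup (the complex name "jobs_per_host" and the host name "node01" are made up here):

$ qconf -mc    # add a line; the columns are:
# name           shortcut  type  relop  requestable  consumable  default  urgency
jobs_per_host    jph       INT   <=     YES          JOB         1        0

Then limit it per exechost:

$ qconf -me node01
complex_values   jobs_per_host=4

Or assign it a high value on the global host and cap it with an RQS:

$ qconf -me global
complex_values   jobs_per_host=1000000

$ qconf -arqs
{
   name         max_jobs_per_host
   description  "at most 4 of these jobs per host"
   enabled      TRUE
   limit        hosts {*} to jobs_per_host=4
}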

-- Reuti


> Should I create a third PE, let's say "dan", with the desired settings?  When I tried this before it would throw errors.
> 
> 
> Am I correct that I want to change these settings? I suspect I really want to make a custom PE, since these are the defaults.
> 
> I was looking at http://linux.die.net/man/5/sge_pe and http://www.softpanorama.org/HPC/Grid_engine/parallel_environment.shtml but they seem to assume I already comprehend the details of each. Such as: can I only put one allocation rule setting per PE, and one PE per queue?
> 
> 
> [root at blade5-1-1 ~]# qconf -sp make
> pe_name            make
> slots              999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_args     NONE
> allocation_rule    $round_robin
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary TRUE
> qsort_args         NONE
> 
> [root at blade5-1-1 ~]# qconf -sp smp
> pe_name            smp
> slots              999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_args     NONE
> allocation_rule    $pe_slots
> control_slaves     TRUE
> job_is_first_task  TRUE
> urgency_slots      min
> accounting_summary TRUE
> qsort_args         NONE
> [root at blade5-1-1 ~]# echo $pe_slots
> 
> 
> 
> [root at blade5-1-1 ~]# qconf -sp DAN
> pe_name            DAN
> slots              999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_args     NONE
> allocation_rule    $round_robin
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary TRUE
> qsort_args         NONE
> 
> [root at blade5-1-1 ~]# qconf -sp smp
> pe_name            smp
> slots              999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_args     NONE
> allocation_rule    4
> control_slaves     TRUE
> job_is_first_task  TRUE
> urgency_slots      min
> accounting_summary TRUE
> qsort_args         NONE
> [root at blade5-1-1 ~]# echo $pe_slots
> 
>>>> Yep, we use functional tickets to accomplish this exact goal. Every user
>>>> gets 1000 functional tickets via auto_user_fshare in sge_conf(5), though
>>>> your exact number will depend on the number of tickets and weights you have
>>>> elsewhere in your policy configuration.
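>>>> As a sketch (the value is illustrative), that is set in the global
>>>> configuration:
>>>> 
>>>> $ qconf -mconf
>>>> enforce_user                 auto
>>>> auto_user_fshare             1000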
>>> Also, the waiting time should be set to 0 and the urgency given less weight (the default in the complex configuration is to grant 1000 urgency points per slot, meaning that jobs requesting more slots become more important):
>>> 
>>> weight_user                       0.900000
>>> weight_project                    0.000000
>>> weight_department                 0.000000
>>> weight_job                        0.100000
>>> weight_tickets_functional         1000000
>>> weight_tickets_share              0
>>> share_override_tickets            TRUE
>>> share_functional_shares           TRUE
>>> max_functional_jobs_to_schedule   200
>>> report_pjob_tickets               TRUE
>>> max_pending_tasks_per_job         50
>>> halflife_decay_list               none
>>> policy_hierarchy                  F
>>> weight_ticket                     1.000000
>>> weight_waiting_time               0.000000
>>> weight_deadline                   3600000.000000
>>> weight_urgency                    0.100000
>>> weight_priority                   1.000000
>>> max_reservation                   32
>>> default_duration                  8760:00:00
>> We actually do weight waiting time, but at half the value of both
>> functional and urgency tickets. We then give big urgency boosts to
>> difficult-to-schedule jobs (i.e. lots of memory or CPUs in one spot). It
>> took us a while to arrive at a decent mix of short-run / small jobs vs
>> long-run / big jobs, and it definitely will be a site-dependent decision.
>> 
> 
> _______________________________________________
> users mailing list
> users at gridengine.org
> https://gridengine.org/mailman/listinfo/users




