[gridengine users] Scheduler Performance

Rayson Ho rayrayson at gmail.com
Tue Mar 15 19:38:38 UTC 2011


On Tue, Mar 15, 2011 at 6:12 AM, Esztermann, Ansgar
<Ansgar.Esztermann at mpi-bpc.mpg.de> wrote:
> Thanks, that has helped a bit. I know now that most of the CPU time is spent in the dispatching stage. However, it is still unclear to me why dispatching should be such a time-consuming task.

Good, at least we are on the right track!! :-)

The stage is called "job dispatching" but it really is not about
sending job start requests to the execution hosts -- in fact, the
scheduler thread does not talk to the execution hosts directly. The
job dispatching stage
(daemons/qmaster/sge_sched_thread.c:dispatch_jobs()) in the scheduler
tries to find a queue instance (think of it as a host or slot) that is
suitable for running the job.
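
To make the idea of "matching" concrete, here is a toy sketch in Python. It is
not the code in dispatch_jobs(), which is far more involved; the class names,
fields and the first-fit policy are all assumptions made up for illustration.
It only shows the shape of the problem: each pending job is compared against
candidate queue instances until one satisfies its hard requests.

```python
from dataclasses import dataclass, field


@dataclass
class QueueInstance:
    # Hypothetical model of a queue instance, e.g. "all.q@node01".
    name: str
    free_slots: int
    resources: dict = field(default_factory=dict)


@dataclass
class Job:
    # Hypothetical model of a pending job and its hard resource requests.
    job_id: int
    slots: int
    hard_requests: dict = field(default_factory=dict)


def find_queue_instance(job, queue_instances):
    """Return the first queue instance that can run the job, or None."""
    for qi in queue_instances:
        if qi.free_slots < job.slots:
            continue  # not enough free slots on this instance
        # Every hard request must be met exactly (a simplification).
        if all(qi.resources.get(k) == v for k, v in job.hard_requests.items()):
            return qi
    return None


def dispatch(jobs, queue_instances):
    """Assign each job to a queue instance; unmatched jobs stay pending."""
    assignments = {}
    for job in jobs:
        qi = find_queue_instance(job, queue_instances)
        if qi is not None:
            qi.free_slots -= job.slots  # reserve the slots for this job
            assignments[job.job_id] = qi.name
    return assignments
```

Even in this toy version the work grows with (pending jobs) x (queue
instances), and no message is sent to any execution host while matching.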

With a few hundred jobs, grid engine (this applies to SGE forks like
Open Grid Scheduler or Son of GE -- as we have not changed the
scheduler code yet) can easily handle the load. But as your cluster is
spending 5 minutes to decide where the jobs should go, I'm curious
what kind of resource requirements they have, and most importantly,
do they have soft requests specified?
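
If you want to track the dispatching time over several scheduling runs, a
small script along these lines may help. It is only a sketch: the regular
expression is written against the single "job dispatching took" line quoted
below, so the exact format on other versions is an assumption, and the
function name is made up.

```python
import re

# Pattern modelled on the one PROF line quoted in this thread; other
# Grid Engine versions may format the line differently (assumption).
DISPATCH_RE = re.compile(
    r"PROF: job dispatching took (?P<seconds>\d+\.\d+) s "
    r"\((?P<fast>\d+) fast, (?P<fast_soft>\d+) fast_soft, "
    r"(?P<pe>\d+) pe, (?P<pe_soft>\d+) pe_soft, (?P<res>\d+) res\)"
)


def parse_dispatch_line(line):
    """Return the dispatching time and counters as a dict, or None."""
    match = DISPATCH_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    result = {"seconds": float(fields.pop("seconds"))}
    result.update({name: int(value) for name, value in fields.items()})
    return result


sample = (
    "03/15/2011 10:53:56|schedu|master1|P|PROF: job dispatching took "
    "327.370 s (0 fast, 0 fast_soft, 8 pe, 0 pe_soft, 4 res)"
)
print(parse_dispatch_line(sample))
```

Feeding every line of the file containing the PROF output through
parse_dispatch_line() and keeping the non-None results gives one record per
scheduling run.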

Rayson





>
> 03/15/2011 10:53:56|schedu|master1|P|PROF: job dispatching took 327.370 s (0 fast, 0 fast_soft, 8 pe, 0 pe_soft, 4 res)
> 03/15/2011 10:53:56|schedu|master1|P|PROF: parallel matching            878       262664         2634       159203       137234       159203       131007
> 03/15/2011 10:53:56|schedu|master1|P|PROF: sequential matching            0            0            0            0            0            0            0
> 03/15/2011 10:53:56|schedu|master1|P|PROF: create pending job orders: 0.000 s
> 03/15/2011 10:53:56|schedu|master1|P|PROF: scheduled in 327.450 (u 337.270 + s 7.960 = 345.230): 0 sequential, 0 parallel, 452 orders, 846 H, 214 Q, 839 QA, 10 J(qw), 431 J(r), 0 J(s), 0 J(h), 0 J(e), 0 J(x), 449 J(all), 57 C, 3 ACL, 149 PE, 12 U, 1 D, 0 PRJ, 1 ST, 0 CKPT, 0 RU, 1 gMes, 0 jMes, 452/3 pre-send, 0/0/0 pe-alg
>
> 03/15/2011 10:53:56|schedu|master1|P|PROF: send orders and cleanup took: 0.020 (u 0.020,s 0.000) s
> 03/15/2011 10:53:56|schedu|master1|P|PROF: schedd run took: 327.630 s (init: 0.000 s, copy: 0.130 s, run:327.470, free: 0.030 s, jobs: 449, categories: 43/0)
>
>
> A.
>
> --
> Ansgar Esztermann
> DV-Systemadministration
> Max-Planck-Institut für biophysikalische Chemie, Abteilung 105
>
>


