[gridengine users] Wrong scheduling behaviour with parallel jobs
Andreas Haupt
andreas.haupt at desy.de
Tue Mar 1 15:39:12 UTC 2011
Hi Dave,
On Wed, 2011-02-23 at 12:43 +0000, Dave Love wrote:
> Chris Jewell <chris.jewell at warwick.ac.uk> writes:
> > Dave Love <d.love at liverpool.ac.uk> writes:
> >> Yes <https://arc.liv.ac.uk/trac/SGE/ticket/1280>. I have a suspicion
> >> that restarting the qmaster has helped sometimes, but I'm not sure about
> >> that. I'd be interested if anyone has any more information/suggestions.
> >
> > For the record, I've just found that restarting qmaster sorts the issue.
>
> As it happens, I've just been in a position to confirm this.
>
> There had previously been reservations for the two largest, highest
> priority jobs in the system which had gone when I looked again and they
> have just reappeared on a restart.
How did you verify this? Via the scheduler logfile enabled with
"MONITOR=1" in the scheduler configuration?
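In case it helps anyone following along, here is a rough sketch of how such a check might be scripted. It assumes that "params MONITOR=1" has been set via "qconf -msconf", that the scheduler then writes its records to $SGE_ROOT/$SGE_CELL/common/schedule, and that each record has nine colon-separated fields (job id, task id, state, start time, duration, level character, object name, resource name, utilization), with "P" as the level character for parallel environments. The function name and the sample records are made up for illustration; please check the field layout against your own schedule file before relying on it.

```python
from collections import defaultdict

def reserved_pes(lines):
    """Map each job id to the set of PE names it holds reservations on.

    Assumes nine colon-separated fields per record:
    job:task:state:start:duration:level:object:resource:utilization
    """
    result = defaultdict(set)
    for line in lines:
        fields = line.strip().split(":")
        if len(fields) != 9:
            continue  # skip anything that does not match the assumed layout
        job_id, _task, state, _start, _dur, level, obj, _res, _util = fields
        # Keep only reservation records at the PE level (assumed level "P").
        if state == "RESERVING" and level == "P":
            result[job_id].add(obj)
    return dict(result)

# Hypothetical sample records, not taken from a real cluster.
sample = [
    "4711:1:RESERVING:1298991600:3600:P:mpi-a:slots:16.000000",
    "4711:1:RESERVING:1298991600:3600:Q:all.q@node01:slots:8.000000",
    "4712:1:RUNNING:1298990000:3600:P:mpi-b:slots:8.000000",
    "::::::::",
]
pes_by_job = reserved_pes(sample)
```

Run against the real file (for example with open(path) as f: reserved_pes(f)), a job submitted with a wildcard PE request that only ever shows one PE name here would match the behaviour described in this thread.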
In my case, restarting the master apparently doesn't help. Jobs still get
reservations on only one single PE if they use wildcards in the PE
name. Maybe this is a different problem from yours ... If a job
explicitly requests a single PE without wildcards, everything is
reserved correctly again for that single job - the rest still
suffer ... :-(
Cheers,
Andreas
PS: Where is the best place to put my debugging information so that it can
be used to finally solve this bug?
--
| Andreas Haupt | E-Mail: andreas.haupt at desy.de
| DESY Zeuthen | WWW: http://www-zeuthen.desy.de/~ahaupt
| Platanenallee 6 | Phone: +49/33762/7-7359
| D-15738 Zeuthen | Fax: +49/33762/7-7216