[gridengine users] Wrong scheduling behaviour with parallel jobs

Andreas Haupt andreas.haupt at desy.de
Tue Mar 1 15:39:12 UTC 2011


Hi Dave,

On Wed, 2011-02-23 at 12:43 +0000, Dave Love wrote:
> Chris Jewell <chris.jewell at warwick.ac.uk> writes:
> > Dave Love <d.love at liverpool.ac.uk> writes:
> >> Yes <https://arc.liv.ac.uk/trac/SGE/ticket/1280>.  I have a suspicion
> >> that restarting the qmaster has helped sometimes, but I'm not sure about
> >> that.  I'd be interested if anyone has any more information/suggestions.
> >
> > For the record, I've just found that restarting qmaster sorts the issue.
> 
> As it happens, I've just been in a position to confirm this.
> 
> There had previously been reservations for the two largest, highest
> priority jobs in the system which had gone when I looked again and they
> have just reappeared on a restart.

How did you verify this? Via the scheduler logfile enabled with
"MONITOR=1" in the scheduler configuration?

In my case restarting the master obviously doesn't help. Jobs still get
reservations on one single PE only if one uses wildcards in the PE
name. Maybe this is even a different problem from yours ... If a job
explicitly requests a single PE without wildcards, everything is
reserved correctly again for this single job - the rest still
suffers ... :-(
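
For anyone who wants to check this on their own cluster, here is a rough
sketch of how the reservations could be summarised per job. It assumes
MONITOR=1 is set in the scheduler "params" and that the resulting
schedule file (normally $SGE_ROOT/$SGE_CELL/common/schedule) uses the
colon-separated record layout described in sched_conf(5). The function
name and the sample records are made up for illustration; they are not
output from a real system.

```python
from collections import defaultdict


def reserved_pes(lines):
    """Map job id -> set of PE names that have a RESERVING record."""
    pes = defaultdict(set)
    for line in lines:
        fields = line.strip().split(":")
        # Assumed record layout (sched_conf(5), MONITOR):
        # job:task:state:start:duration:level:object:resource:utilization
        if len(fields) != 9:
            continue
        job, _task, state, _start, _dur, level, obj, _res, _util = fields
        # Level "P" is assumed to mark parallel environment records.
        if state == "RESERVING" and level == "P":
            pes[job].add(obj)
    return dict(pes)


# Hypothetical sample records, invented for this sketch.
sample = [
    "4711:1:RESERVING:1298991600:3600:P:mpi_a:slots:16.000000",
    "4711:1:RESERVING:1298991600:3600:Q:all.q@node01:slots:8.000000",
    "4712:1:RUNNING:1298990000:7200:P:mpi_b:slots:4.000000",
    "::::::::",
]

print(reserved_pes(sample))  # {'4711': {'mpi_a'}}
```

If a job submitted with a wildcard PE request only ever shows one PE
name in such a summary across scheduling intervals, that would match
what I am seeing here.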

Cheers,
Andreas

PS: Where is the best place to put my debugging information so that it
can be used to finally solve this bug?

-- 
| Andreas Haupt             | E-Mail: andreas.haupt at desy.de
|  DESY Zeuthen             | WWW:    http://www-zeuthen.desy.de/~ahaupt
|  Platanenallee 6          | Phone:  +49/33762/7-7359
|  D-15738 Zeuthen          | Fax:    +49/33762/7-7216


