[gridengine users] Resource Reservation logging
txema.llistes at gmail.com
Fri Oct 4 14:25:29 UTC 2013
I have a 27-node cluster. Currently all 320 slots are filled, every
one of them by a job requesting a single slot.
At the top of my waiting queue there are 28 different jobs requesting 3
to 12 cores using two different parallel environments. All of these jobs
request -R y. They are being ignored and overtaken by the myriad of
1-slot jobs behind them in the waiting queue.
I have enabled scheduler logging. During the last 4 hours, it has
logged 724 new jobs starting, across all 27 nodes. Not a single job on
the system requests -l h_rt, yet single-core jobs keep being
scheduled and all the parallel jobs are starving.
As far as I understand, backfilling is defeating my reservations, even
though no job requests any run time. However, if I set
"default_duration" to INFINITY, all the RESERVING log messages disappear.
Additionally, for some odd reason, I only see RESERVING messages for
jobs requesting a fixed number of slots (-pe whatever N). Jobs
requesting a slot range (-pe threaded 4-10) seem to reserve nothing.
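To make the backfilling suspicion above concrete, here is a minimal sketch of the mental model I am working from. It is not Grid Engine's actual scheduler code: it only assumes that a waiting job may use slots held for a reservation if its assumed run time ends before the reservation starts, and that jobs without a run-time request are assumed to run for default_duration. The function name and the numbers are hypothetical.

```python
import math

def can_backfill(now, assumed_duration, reservation_start):
    """Return True if a job started at `now` would be assumed to
    finish no later than the start of the reservation."""
    return now + assumed_duration <= reservation_start

# A job assumed to run 600 s fits in front of a reservation 3600 s away.
print(can_backfill(0, 600, 3600))       # True
# With an infinite assumed duration, no job ever fits.
print(can_backfill(0, math.inf, 3600))  # False
```

Under this model, an infinite default_duration should rule out backfilling altogether, which is why the behaviour I am seeing puzzles me.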
My scheduler configuration is as follows:
# qconf -ssconf
I have also tested it with params PROFILE=1 and default_duration
INFINITY. With those settings, however, not a single reservation is
logged in /opt/gridengine/default/common/schedule and new jobs keep
starting.
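For anyone who wants to check the same thing, this is a minimal sketch of how the schedule file can be tallied. It assumes the colon-separated layout described in sched_conf(5) (jobid:taskid:state:start_time:duration:level_char:object_name:resource_name:utilization, with "::::::::" separating scheduling intervals). The sample lines are made up for illustration and are not taken from my cluster.

```python
def count_states(lines):
    """Count distinct job ids per state in 'schedule' file lines."""
    jobs_by_state = {}
    for line in lines:
        fields = line.strip().split(":")
        # Skip blank lines and the "::::::::" interval separator.
        if len(fields) < 9 or not fields[0]:
            continue
        job_id, state = fields[0], fields[2]
        jobs_by_state.setdefault(state, set()).add(job_id)
    return {state: len(jobs) for state, jobs in jobs_by_state.items()}

# Made-up sample lines, for illustration only.
sample = [
    "::::::::",
    "101:1:STARTING:1380896729:600:Q:all.q@node01:slots:1.000000",
    "101:1:STARTING:1380896729:600:H:node01:slots:1.000000",
    "202:1:RESERVING:1380897329:600:P:threaded:slots:4.000000",
    "203:1:RESERVING:1380897329:600:P:threaded:slots:4.000000",
]
print(count_states(sample))  # {'STARTING': 1, 'RESERVING': 2}
```

To run it against the real file, pass an open file handle instead of the sample list, e.g. `count_states(open("/opt/gridengine/default/common/schedule"))`.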
What am I missing? Is it possible to disable backfilling? Are my
reservations really working?
Thanks in advance,