[gridengine users] master node selection and $fill_up behaviour revisited

Michael Weiser M.Weiser at science-computing.de
Tue Jun 28 13:10:03 UTC 2011


In July 2010 I asked on the users mailing list, back at SunSource, about a
peculiar regression in the master node selection behaviour of SGE 6.2u5
(see http://markmail.org/message/svuskq5qc6oe3axv). After some discussion
Andy pointed out that I was most likely hitting IZ 3148, which was fixed
in 6.2u6. Indeed, I was not able to trigger the bug in 6.2u6. That was
the worst outcome for me, because I couldn't upgrade.

Today I've tried a recent build of V800_BRANCH of
https://github.com/gridengine/gridengine.git and was able to reproduce
the bug just as with SGE 6.2u5.

Does anyone here have a handle on the issue and could help track it down
and fix it?
Has one of the other forks perhaps fixed the bug already?

In short: after some jobs have been run on an initially empty cluster, the
scheduler starts distributing, say, a two-slot $fill_up job over two
nodes even though one of them could have accommodated the whole job. An
example:

weiser@laudrup ~ $ qhost -j
global                  -               -    -    -    -     -       -       -       -       -
kempes                  lx-amd64        4    0    0    0  0.00   31.4G  249.7M   33.4G     0.0
   job-ID  prior   name       user         state submit/start at     queue      master ja-task-ID 
        17 0.51000 STDIN      weiser       r     06/28/2011 13:52:53 normal@kem MASTER        
        26 0.61000 STDIN      weiser       r     06/28/2011 13:56:53 normal@kem MASTER        
laudrup                 lx-amd64        2    0    0    0  0.04    7.7G  654.5M    1.9G  224.0K
        26 0.61000 STDIN      weiser       r     06/28/2011 13:56:53 normal@lau SLAVE         
maradonna               lx-amd64        4    0    0    0  0.00   31.4G  354.4M   33.4G     0.0
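
To make the expectation concrete, here is a minimal sketch of how I would
expect a fill-up style allocation to behave. It is not SGE's scheduler code:
fill_up_allocate is a hypothetical helper, the host order is assumed, and the
free-slot counts are made up for illustration rather than read from the
qhost output above.

```python
def fill_up_allocate(free_slots, wanted):
    """Greedy fill-up: take as many slots as possible from each host in turn.

    free_slots: list of (host, free) pairs in the order the scheduler
                would consider them (an assumption, not taken from SGE).
    wanted:     total number of slots the job requests.
    Returns a list of (host, granted) pairs, or None if the job does not fit.
    """
    allocation = []
    remaining = wanted
    for host, free in free_slots:
        if remaining == 0:
            break
        granted = min(free, remaining)
        if granted > 0:
            allocation.append((host, granted))
            remaining -= granted
    return allocation if remaining == 0 else None


# Hypothetical free-slot counts for the three hosts.
hosts = [("kempes", 3), ("laudrup", 2), ("maradonna", 4)]
print(fill_up_allocate(hosts, 2))  # -> [('kempes', 2)]
```

With these assumed numbers the whole two-slot job lands on the first host.
A split across two hosts, as with job 26 above, should only happen when the
first host has fewer free slots than the job requests.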

Thanks in advance,
