[gridengine users] master node selection and $fill_up behaviour revisited

Michael Weiser M.Weiser at science-computing.de
Tue Jun 28 13:10:03 UTC 2011


Hello,

In July 2010 I asked on the users mailing list back at SunSource about a
peculiar regression in the master node selection behaviour of SGE 6.2u5
(see http://markmail.org/message/svuskq5qc6oe3axv). After some discussion,
Andy pointed out that I was most likely hitting IZ 3148, which was fixed
in 6.2u6. And indeed, I was not able to trigger the bug in 6.2u6, which
was the worst outcome of all, because I couldn't upgrade.

Today I've tried a recent build of V800_BRANCH of
https://github.com/gridengine/gridengine.git and was able to reproduce
the bug just as with SGE 6.2u5.

Does anyone here have a handle on the issue and could help track it down
and fix it?
Has one of the other forks perhaps fixed the bug?

In short, after some jobs have been run on an initially empty cluster,
the scheduler will start distributing, say, a two-slot $fill_up job over
two nodes even though one of them could have accommodated the whole job.
An example, where job 26 is split across kempes and laudrup:

weiser@laudrup ~ $ qhost -j
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
kempes                  lx-amd64        4    0    0    0  0.00   31.4G  249.7M   33.4G     0.0
   job-ID  prior   name       user         state submit/start at     queue      master ja-task-ID 
   ----------------------------------------------------------------------------------------------
        17 0.51000 STDIN      weiser       r     06/28/2011 13:52:53 normal@kem MASTER        
        26 0.61000 STDIN      weiser       r     06/28/2011 13:56:53 normal@kem MASTER        
laudrup                 lx-amd64        2    0    0    0  0.04    7.7G  654.5M    1.9G  224.0K
        26 0.61000 STDIN      weiser       r     06/28/2011 13:56:53 normal@lau SLAVE         
maradonna               lx-amd64        4    0    0    0  0.00   31.4G  354.4M   33.4G     0.0
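
To make it easier to spot the behaviour when trying to reproduce it, here
is a minimal sketch in Python that scans `qhost -j` output and lists the
jobs that have tasks on more than one host. It assumes the layout shown
above (host rows unindented, job rows indented and starting with the
numeric job ID); the function names and the shortened SAMPLE text are
purely illustrative and not part of SGE.

```python
from collections import defaultdict

# Shortened, illustrative copy of the qhost -j output above (assumption:
# only the host name and the leading job ID matter for this check).
SAMPLE = """
kempes                  lx-amd64        4
   job-ID  prior   name       user         state
   ---------------------------------------------
        17 0.51000 STDIN      weiser       r
        26 0.61000 STDIN      weiser       r
laudrup                 lx-amd64        2
        26 0.61000 STDIN      weiser       r
maradonna               lx-amd64        4
"""


def hosts_per_job(qhost_output):
    """Map each job ID to the set of hosts it has tasks on."""
    jobs = defaultdict(set)
    current_host = None
    for line in qhost_output.splitlines():
        if not line.strip():
            continue
        fields = line.split()
        if not line[0].isspace():
            # Unindented rows name a host (or are header/separator rows,
            # which are never followed directly by job rows).
            current_host = fields[0]
        elif fields[0].isdigit() and current_host is not None:
            # Indented rows starting with a number are job entries.
            jobs[fields[0]].add(current_host)
    return dict(jobs)


def split_jobs(qhost_output):
    """Return only the jobs that are spread over more than one host."""
    return {
        job: sorted(hosts)
        for job, hosts in hosts_per_job(qhost_output).items()
        if len(hosts) > 1
    }


if __name__ == "__main__":
    print(split_jobs(SAMPLE))
```

Fed with the real output, e.g. via `qhost -j | python script.py` after
replacing SAMPLE with `sys.stdin.read()`, this only reports which jobs are
split; whether a single host would have had enough free slots still has to
be checked against the queue configuration.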

Thanks in advance,
-- 
Michael Weiser                science + computing ag
Senior Systems Engineer       Geschaeftsstelle Duesseldorf
                              Martinstrasse 47-55, Haus A
phone: +49 211 302 708 32     D-40223 Duesseldorf
fax:   +49 211 302 708 50     www.science-computing.de
-- 
Vorstand/Board of Management:
Dr. Bernd Finkbeiner, Dr. Roland Niemeier, 
Dr. Arno Steitz, Dr. Ingrid Zech
Vorsitzender des Aufsichtsrats/
Chairman of the Supervisory Board:
Philippe Miltin
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart
Registernummer/Commercial Register No.: HRB 382196 





