[gridengine users] Understanding Parallel Environment (whole nodes)

Joseph Farran jfarran at uci.edu
Sat Jun 9 01:15:13 UTC 2012


I am trying to set up my MPI Parallel Environment so that whole nodes are filled before the scheduler moves on to the next node when looking for cores.

Our nodes have 64 cores.   What I'd like is that if I ask for 128 cores (slots), one compute node is selected with all 64 of its cores, and then a second node with its 64 cores.

At the suggestion of Prakashan I setup all of my 64 cores nodes with:

     qconf -rattr exechost complex_values "slots=64" node

to hopefully tell OGE to use 64 cores per node and no more.

Using a simple Parallel Environment called "mpi" with "$fill_up" allocation rule, I am getting weird results.
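For reference, here is roughly what the PE looks like (output of `qconf -sp mpi`; the slot total and start/stop args are from memory, so treat this as a sketch):

```
pe_name            mpi
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
```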

When I ask for 128 cores with:

     #$ -pe mpi 128
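In full, the submission script is essentially this (the program name and mpirun invocation are placeholders, not the real job):

```
#!/bin/bash
#$ -N mpi_test
#$ -cwd
#$ -pe mpi 128

# $NSLOTS and $PE_HOSTFILE are set by OGE for the job
mpirun -np $NSLOTS ./my_mpi_program
```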

Sometimes I get two nodes at 64 cores each, which is correct:

    PE_HOSTFILE file: (/var/spool/oge/compute-2-5/active_jobs/83.1/pe_hostfile)
    compute-2-5.local 64 all.q at compute-2-5.local UNDEFINED
    compute-2-7.local 64 all.q at compute-2-7.local UNDEFINED

Other times, I get a sporadic mixture that I can't make sense of, like this one with three nodes at 54, 64, and 10 cores:

    PE_HOSTFILE file: (/var/spool/oge/compute-1-2/active_jobs/84.1/pe_hostfile)
    compute-1-2.local 54 all.q at compute-1-2.local UNDEFINED
    compute-2-4.local 64 all.q at compute-2-4.local UNDEFINED
    compute-2-6.local 10 all.q at compute-2-6.local UNDEFINED

What setting and/or load sensor is causing this?    What I am looking for is to fill up one 64-core node before the scheduler moves on to the next one, so that if I ask for 128 cores, I will always get two whole nodes.
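For what it's worth, the sge_pe(5) man page says allocation_rule can also be a fixed integer, which I understand forces exactly that many slots per host. Would switching to something like this guarantee whole nodes (assuming every node in the queue really has 64 slots free)?

```
# edit the PE and change allocation_rule from $fill_up to a fixed count
qconf -mp mpi
    ...
    allocation_rule    64
```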
