[gridengine users] again CPU allocation

Ulrich Hiller hiller at mpia-hd.mpg.de
Tue Jul 23 08:09:31 UTC 2013


Dear list members,
I have GE 2011.11p1 installed on an openSUSE 12.3 cluster.
Last week I asked about jobs not using all CPUs
(subject "CPU allocation").
I thought I had solved it by using the -host option of mpirun:
mpirun -np $NSLOTS -host host1,host2,.... -prefix python mycode.py
(or by using the hostfile option),
all of this embedded in a qsub command file:

#$ -cwd
#$ -pe * 24
#$ -r no
#$ -m n
#$ -l h_rt=240:00:00

Now I have run into another problem:
when a host, e.g. host3, crashes or shuts down (the whole host, not only
SGE), I have to change the host list in order to run the code.

That means I have to know in advance on which nodes I will be allocated
CPUs, so that I can give *only* those nodes in the host list to mpirun.
This entirely defeats the point of having a scheduler, because I can no
longer leave my code sitting in the queue and trust that it will run when
enough CPUs become free.
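
To make the goal concrete, here is a minimal sketch of the kind of job
script I am after, where the host list comes from the scheduler instead of
being typed in by hand. It rests on assumptions: that mpirun is Open MPI
(so that -hostfile accepts lines of the form "host slots=N"), and that Grid
Engine provides the PE_HOSTFILE, NSLOTS and TMPDIR variables inside a
parallel job. The helper name pe_to_hostfile is made up for illustration,
and the -prefix option from the command above is left out.

```shell
#!/bin/sh
#$ -cwd
#$ -pe * 24
#$ -r no
#$ -m n
#$ -l h_rt=240:00:00

# Hypothetical helper (not part of Grid Engine or Open MPI): convert the
# Grid Engine PE host file (columns: host, slots, queue, processor range)
# into Open MPI hostfile lines of the form "host slots=N".
pe_to_hostfile() {
    awk '{ print $1 " slots=" $2 }' "$1"
}

# Assumption: Grid Engine sets PE_HOSTFILE and NSLOTS inside a parallel
# job. Outside of a job (e.g. when trying the helper by hand) PE_HOSTFILE
# is unset and nothing is launched.
if [ -n "${PE_HOSTFILE:-}" ]; then
    machines="${TMPDIR:-/tmp}/machines.$$"
    pe_to_hostfile "$PE_HOSTFILE" > "$machines"
    # Only the hosts granted to this job end up in the hostfile.
    mpirun -np "$NSLOTS" -hostfile "$machines" python mycode.py
    rm -f "$machines"
fi
```

If mpirun was built with Grid Engine support it may pick up the allocation
by itself and the hostfile step would be unnecessary, but I cannot confirm
that for this installation.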

Does anybody have an idea how I can solve this problem?

With kind regards and thank you in advance for any help,
Ulrich


