[gridengine users] again CPU allocation
hiller at mpia-hd.mpg.de
Tue Jul 23 08:09:31 UTC 2013
Dear list members,
I have GE 2011.11p1 installed on an openSUSE 12.3 cluster.
Last week I asked about jobs not using all CPUs
(subject: CPU allocation).
I thought I had solved it by using the host option:
mpirun -np $NSLOTS -host host1,host2,.... -prefix python mycode.py
(or the hostfile option),
all of this embedded in a qsub command file:
#$ -pe * 24
#$ -r no
#$ -m n
#$ -l h_rt=240:00:00
Now I have run into another problem:
When a host, e.g. host3, crashes or shuts down (the whole host, not only
SGE), I have to change the host list in order to run the code.
That means I have to know in advance on which nodes I will be allocated
CPUs, so that I can give *only* those CPUs in the host list to mpirun.
This entirely defeats the point of having a scheduler, because I can no
longer leave my code sitting in the queue and trust that it will run when
enough CPUs become free.
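To make the goal concrete: what I would like is for the command file to
build the host list itself at run time, from whatever the scheduler has
granted. A rough sketch of the idea follows, assuming that Grid Engine
exports the path of a per-job host file in $PE_HOSTFILE and that each line
of that file starts with the host name (followed by slot count, queue and
processor range). The helper name pe_hosts is hypothetical, the other
mpirun options from my command above are left out, and I do not know
whether this is the intended way to do it.

```shell
#!/bin/sh
# Hypothetical helper: turn a Grid Engine PE host file into a
# comma-separated host list suitable for "mpirun -host".
# Assumption: each line looks like "<hostname> <slots> <queue> <range>".
pe_hosts() {
    # $1: path to the PE host file (assumed to be $PE_HOSTFILE in a job)
    awk '{ print $1 }' "$1" | paste -sd, -
}

# Launch only when running inside a parallel-environment job,
# i.e. when the scheduler has actually provided a host file.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    mpirun -np "$NSLOTS" -host "$(pe_hosts "$PE_HOSTFILE")" python mycode.py
fi
```

With something like this the host list would never be hard-coded, so a
crashed host3 would simply not appear in it.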
Does anybody have an idea how I can solve this problem?
With kind regards, and thank you in advance for any help,
Ulrich