[gridengine users] again CPU allocation
reuti at staff.uni-marburg.de
Tue Jul 23 09:29:22 UTC 2013
On 23.07.2013 at 10:09, Ulrich Hiller wrote:
> Dear list members,
> I have GE 2011.11p1 installed on an openSUSE 12.3 cluster.
> Last week I asked about jobs not using all CPUs
> (subject "CPU allocation").
> I thought I had solved it by using the host option:
> mpirun -np $NSLOTS -host host1,host2,.... -prefix python mycode.py
> (or using the hostfile option)
> all this embedded in a qsub command file:
> #$ -cwd
> #$ -pe * 24
> #$ -r no
> #$ -m n
> #$ -l h_rt=240:00:00
> Now I have run into another problem:
> When a host, say host3, crashes or shuts down (the whole host, not only
> SGE), I have to change the host list in order to run the code.
This boils down to what I mentioned before: `mpirun` should use the hosts which were granted by SGE. Hence no host list should be necessary on the command line at all, as recent versions of MPICH2 and Open MPI detect the granted hosts automatically.
> That means i have to know on which nodes i will be allocated CPUs so i
> can give *only* those CPUs in the host list to mpirun. This entirely
> defeats the point of having a scheduler, because now i can no longer
> leave my code sitting in the queue and trust that it will run when
> enough CPUs become free.
> Does anybody have an idea how I can solve this problem?
The Python application should use the granted machines.
Although I don't see how the specified host list should change the behavior of `mpirun` when it's already tightly integrated: you can reformat the generated $SGE_JOB_SPOOL_DIR/pe_hostfile into a so-called machine file (this was necessary in former times, before automatic tight integration) and pass that file to `mpirun` as its host file.
Besides doing it in the job script, such a task is often delegated to the procedure defined in start_proc_args of the PE (you may want to define a new PE for your application), so that it lives in one place and not in every script. There is an example in the directory $SGE_ROOT/mpi.
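The reformatting mentioned above could look roughly like the sketch below. It assumes the usual pe_hostfile layout of one line per host with the host name in the first column and the number of granted slots in the second, and it writes the classic machine file format with one host name per slot. The function name `pe_to_machinefile` and the file name `machines` are only illustrative, and the exact option for passing the file should be checked against the man page of your `mpirun`.

```shell
#!/bin/sh
# Sketch: convert an SGE pe_hostfile into a classic machine file.
# Assumed input line format: <host> <slots> <queue> <processor range>
# Output: the host name repeated once per granted slot.
# The function name is hypothetical, not part of SGE.
pe_to_machinefile() {
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$1"
}

# Possible use inside a job script (file name "machines" is an assumption):
#   pe_to_machinefile "$SGE_JOB_SPOOL_DIR/pe_hostfile" > "$TMPDIR/machines"
#   mpirun -np "$NSLOTS" -machinefile "$TMPDIR/machines" python mycode.py
```

Because the file is generated from what SGE actually granted, a host that is down never ends up in it, and the job script itself contains no host names.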
> With kind regards and thank you in advance for any help, ulrich