[gridengine users] serial and mpi jobs running on the same nodes
Winkler, Ursula (firstname.lastname@example.org)
ursula.winkler at uni-graz.at
Mon Jan 26 20:39:36 UTC 2015
Hi gridengine mailing list members,
I'm trying to find a solution for an environment running serial jobs as well as MPI jobs on
6 hosts, where each host has 32 cores/slots. Due to the small number of nodes, assigning
each kind of job to separate nodes (e.g. nodes 1-2 for serial, nodes 3-6 for MPI jobs) is
not an option, especially because the serial:MPI ratio varies considerably.
I tried setting up 2 queues, with "serial" as a subordinate queue to "mpi". But that
is only efficient if the MPI job(s) use ~32 slots per host. Otherwise there are serial
jobs which could run but remain unnecessarily suspended, because
the whole queue "serial" is suspended.
The other possible option would be slot-wise subordination, but that doesn't work either:
for subordination purposes, the scheduler is apparently not capable of figuring out how many
slots an MPI job actually requests, and so it stubbornly suspends only one serial job,
which of course causes core oversubscription.
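
To make the slot accounting concrete, here is a minimal Python sketch of one host under the two behaviours described above, plus the behaviour that would be wanted. It is only an illustration of the arithmetic, not Grid Engine's actual scheduling logic. The function names and the "ideal" policy are hypothetical, and the "slot-wise (observed)" case simply assumes that at most one serial job gets suspended, as described.

```python
CORES_PER_HOST = 32  # from the post: each host has 32 cores/slots


def busy_processes(serial_running, mpi_slots, suspended):
    """Number of processes actively competing for cores on one host."""
    return (serial_running - suspended) + mpi_slots


def compare(serial_running, mpi_slots, cores=CORES_PER_HOST):
    """Compare suspension policies for one host (illustrative model only)."""
    # Serial jobs that would have to be suspended to avoid oversubscription.
    needed = max(0, serial_running + mpi_slots - cores)
    policies = {
        # Queue-wise subordination: the whole "serial" queue is suspended.
        "queue-wise": serial_running,
        # Assumption: slot-wise subordination suspends at most one job.
        "slot-wise (observed)": min(1, needed),
        # Hypothetical ideal: suspend exactly as many jobs as MPI slots need.
        "ideal": needed,
    }
    result = {}
    for name, suspended in policies.items():
        busy = busy_processes(serial_running, mpi_slots, suspended)
        result[name] = {
            "suspended": suspended,
            "oversubscribed": max(0, busy - cores),
            "idle": max(0, cores - busy),
        }
    return result


if __name__ == "__main__":
    # Example: a host full of serial jobs, MPI job takes 8 slots there.
    for policy, numbers in compare(serial_running=32, mpi_slots=8).items():
        print(policy, numbers)
```

With 32 serial jobs running and an MPI job taking 8 slots on that host, the queue-wise case leaves 24 cores idle, while the observed slot-wise case leaves the host oversubscribed by 7 processes.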
Does anybody have an idea how to solve this problem in a satisfactory way?