[gridengine users] serial and mpi jobs running on the same nodes

Reuti reuti at staff.uni-marburg.de
Mon Jan 26 21:03:05 UTC 2015


Hi,

On 26.01.2015 at 21:39, Winkler, Ursula (ursula.winkler at uni-graz.at) wrote:

> I'm trying to find a solution for an environment running serial jobs as well as mpi jobs on
> 6 hosts where each host has 32 cores/slots. Due to the small number of nodes, assigning
> each sort of job to separate nodes (e.g. nodes 1-2 for serial, nodes 3-6 for mpi jobs) is
> not an option, especially because the ratio serial:mpi is quite variable.
> 
> I tried to set up 2 queues with "serial" as a subordinate queue to "mpi". But that
> is only efficient if the mpi job(s) use ~ 32 slots per host. Otherwise there are serial
> jobs which could run but remain unnecessarily in a suspended state due to the fact
> that the whole queue "serial" is suspended.
> 
> The other possible option should be the subordination of slots, but that doesn't work either
> because the scheduler (as far as subordination is concerned) is obviously not capable of figuring
> out how many slots an mpi job is actually requesting, and so stubbornly suspends only one serial
> job, which of course causes core oversubscription.
> 
> Does anybody have an idea how to solve this problem in a satisfying way?

Why not submit all jobs to one and the same queue?

It might be good to provide suitable reservation settings in the scheduler configuration:

$ qconf -ssconf
...
max_reservation                   20
default_duration                  8760:00:00

and submit the parallel jobs with "-R y" to avoid starvation. To use backfilling properly, a value for h_rt needs to be provided during submission too.
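
As a sketch of what such submissions could look like (the PE name "mpi", the slot count, the runtimes and the script names are only placeholders, not taken from your setup):

$ qsub -pe mpi 64 -R y -l h_rt=24:00:00 mpi_job.sh
$ qsub -l h_rt=02:00:00 serial_job.sh

The first job gets a reservation, so a steady stream of serial jobs cannot keep it waiting forever. The second one can be backfilled into the reserved slots if its h_rt ends before the reservation starts. Without h_rt a job is assumed to run for default_duration, i.e. 8760 hours with the settings above, and would not fit into any such gap.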

-- Reuti


