[gridengine users] Complex value consumption in parallel jobs

Simon Andrews simon.andrews at babraham.ac.uk
Tue Jan 27 12:03:48 UTC 2015


I've spent the morning tracking down a scheduling problem on our cluster
which arose from a misunderstanding of how complex values and parallel
environments interact.

In our setup we have configured h_vmem to be consumable so we can schedule
based on the memory requirements of the jobs.  We also have a parallel
environment set up for SMP jobs which allows the user to reserve multiple
cores on the same physical machine.

This morning we found a load of jobs which couldn't be scheduled despite
us appearing to have plenty of memory and cores free.  Other jobs with
similar memory requirements and numbers of cores were able to be
scheduled, but this one set of jobs would only stay queued.

We eventually figured out that this was because, when a job makes both a
pe request and an h_vmem request, the actual memory reservation is the
h_vmem value multiplied by the number of cores, so we were actually
requesting about 10X the memory we thought we were.  I can see that for
MPI-type jobs this makes plenty of sense, since their processes run
independently, potentially on different machines.  For SMP jobs, though,
we're really just running different threads, so it seems odd to have to
make our users calculate a 'memory per core' value rather than an overall
value for the job.
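
To make the arithmetic concrete, here is a minimal sketch of the
multiplication described above and of the 'divide the allocation'
workaround.  The function names, the 10-core job and the 20G figure are
illustrative assumptions, not values from our cluster.

```python
def total_reserved_gb(h_vmem_gb, slots):
    """Memory the scheduler reserves when h_vmem is consumed per slot."""
    return h_vmem_gb * slots


def per_slot_request_gb(intended_total_gb, slots):
    """h_vmem value to request so the whole job reserves intended_total_gb."""
    return intended_total_gb / slots


# Hypothetical SMP job: 10 cores, with 20G intended for the job as a whole.
slots = 10
intended_total_gb = 20.0

# Requesting the overall figure as h_vmem reserves 10X the intended memory.
print(total_reserved_gb(intended_total_gb, slots))

# Workaround: request the overall figure divided by the number of cores.
print(per_slot_request_gb(intended_total_gb, slots))
```

With these assumed numbers the first call gives the inflated reservation
and the second gives the per-core value the user would have to request
instead.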

Is there therefore any way to configure this behaviour within a pe?  I
couldn't see anything obvious in the pe or complex config, but this must
have been something people have addressed before.  For memory it's not so
bad in that we can at least just divide the allocation, but for something
like licenses where you only need one for a large SMP job I can't see how
you could set this up.

Any pointers would be greatly appreciated.

Thanks

Simon.





