[gridengine users] Parallel GE jobs on 48-way nodes

Reuti reuti at staff.uni-marburg.de
Tue Oct 11 20:31:41 UTC 2011


On 11.10.2011 at 14:37, William Hay wrote:

> On 11 October 2011 12:55, Reuti <reuti at staff.uni-marburg.de> wrote:
>> On 10.10.2011 at 20:46, Gerald Ragghianti wrote:
>> 
>>> We have a cluster consisting of 48-core compute nodes where we need to run parallel (MPI) jobs across nodes.  There is a hardware limitation on the QDR Infiniband cards that limits the available hardware contexts to 16 per card.  We have to ensure that we don't over-subscribe these hardware contexts because parallel jobs without available contexts will crash.  The difficulty is that the contexts needed for a job are a function of the number of compute nodes the job uses, not the number of job slots.
>> 
>> If I understand you correctly, you are looking for something like a complex with "consumable HOST" (instead of JOB or YES), i.e. one that is consumed once on each used exechost, independent of the total number of slots granted on that machine. Unfortunately, this has been discussed before but not implemented yet.
>> 
>> 
> I don't think per host consumables would be needed.  With a later
> version of grid engine 2 queues should be sufficient.
> 1 queue with an exclusive resource and multi-node PEs and one without
> either of those.  You'd have to add a slots resource at the host level
> to stop the host being overloaded and possibly use a JSV to ensure all
> jobs are appropriately directed.
> 
> Unfortunately I don't think 6.1 supports exclusive resources.

Yep, that would be a possible implementation.

As the OP mentioned, one could use a consumable complex on 6.1. If you add "complex_values network=16" and "load_thresholds network=15" to the queue, the queue is pushed into alarm state automatically, so you can avoid the load sensor. If you give the complex a default consumption of 1, it works out of the box (it's only subtracted where the complex is attached to a queue).

I.e. the other queue, for normal jobs, doesn't have it attached, and you select the special multi-node queue via the requested PE.

And as outlined: the overall slot count per node needs to be limited at the exechost level.
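
For illustration, the settings described above might look roughly like the fragments below. This is only a sketch: the queue name multinode.q, the PE name mpi_multinode and the host name node01 are made up, the shortcut "net" is an assumption, and the details should be checked against complex(5), queue_conf(5) and host_conf(5) for 6.1.

```
# qconf -mc  (complex definition; the shortcut "net" is an assumption)
#name     shortcut  type  relop  requestable  consumable  default  urgency
network   net       INT   <=     YES          YES         1        0

# qconf -mq multinode.q  (hypothetical queue, used only for multi-node PE jobs)
pe_list          mpi_multinode
complex_values   network=16
load_thresholds  network=15

# qconf -me node01  (hypothetical exechost; caps total slots across both queues)
complex_values   slots=48
```

The queue for normal jobs would simply leave "network" out of its complex_values and not list the multi-node PE in its pe_list.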

-- Reuti

