[gridengine users] Parallel GE jobs on 48-way nodes
w.hay at ucl.ac.uk
Wed Oct 12 07:34:29 UTC 2011
On 11 October 2011 23:33, Gerald Ragghianti <gragghia at utk.edu> wrote:
>> As the OP mentioned, one could use a consumable complex in 6.1. If you add "complex_values network=16" and "load_thresholds network=15" to the queue, the queue is pushed into alarm state automatically and you can avoid the load sensor. With a default consumption of 1 it works out of the box (the consumable is only subtracted if it is attached to a queue).
>> I.e. the other queues for normal jobs don't have it attached, and you select the special multi-node queue by the requested PE.
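For reference, the consumable setup described above might look roughly like the fragment below. This is only a sketch: the shortcut "net" is an assumed name, and the values are simply the ones quoted in the suggestion. The first part is a line in the complex list (edited with qconf -mc), the second part is the matching pair of attributes in the multi-node queue's configuration (edited with qconf -mq).

```
# complex list entry; the shortcut "net" is an assumption
#name     shortcut  type  relop  requestable  consumable  default  urgency
network   net       INT   <=     YES          YES         1        0

# attributes in the multi-node queue only
complex_values    network=16
load_thresholds   network=15
```

With a default consumption of 1, each slot granted in that queue takes one unit of "network". This is why a capacity of 16 caps the queue at 16 slots per host, the first problem raised in the reply below.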
> Unfortunately, I think there are two problems with this suggestion.
> 1. If I set network=16, then only 16 processors out of 48 will be usable
> by parallel jobs.
> 2. The use of a load threshold seems to prevent fill_up from working
> correctly, so even if I have network=48 for the queue complex and
> network=47 for the load threshold it will not use up all 48 slots before
> moving on to the next host. This seems to be due to the alarm state
> becoming active on the queues at inconsistent times during a single
> scheduling iteration. This would also affect the use of a custom load
> sensor, so I'm abandoning that idea.
> If we were to update to 6.2u5, what options would we then have?
My suggestion of a queue with an exclusive resource should work with 6.2u5.
Assuming that each multi-node job running on a node consumes one context,
you should be able to generalise the solution by adding one such
multi-node/exclusive queue per context, plus a single queue for
serial/single-node PEs.
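A rough sketch of what an exclusive resource could look like in 6.2u5 follows. The complex name "exclusive", the shortcut "excl", and the urgency value are assumptions rather than details from this thread, and the placeholders in angle brackets need to be filled in for the site. The complex is a boolean consumable with the EXCL relation operator, defined in the complex list (qconf -mc) and then attached to each multi-node queue (qconf -mq).

```
# complex list entry; name, shortcut and urgency are assumptions
#name       shortcut  type  relop  requestable  consumable  default  urgency
exclusive   excl      BOOL  EXCL   YES          YES         0        1000

# attribute in each multi-node/exclusive queue
complex_values    exclusive=true

# a multi-node job would then request the resource along with its PE
qsub -l exclusive=true -pe <pe_name> <slots> <job_script>
```

The queue for serial/single-node PEs would not have the complex attached, so jobs there are unaffected by it. Whether one exclusive queue per context behaves as intended with fill_up would need to be tested on the cluster.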
> Gerald Ragghianti
> Office of Information Technology - High Performance Computing
> Newton HPC Program http://newton.utk.edu/
> The University of Tennessee, 2309 Kingston Pike, Knoxville, TN 37919
> Phone: 865-974-2448
> | One Contact OIT: 865-974-9900 |
> | Many Solutions help.utk.edu |