[gridengine users] Parallel GE jobs on 48-way nodes

Gerald Ragghianti gragghia at utk.edu
Tue Oct 11 22:33:13 UTC 2011


> Like the OP mentioned, one could use a consumable complex for 6.1. If you add "complex_values network=16" to the queue, and "load_thresholds network=15" it will be pushed to alarm state automatically and you can avoid the load sensor. When you add a default consumption of 1, it works out-of-the-box (it's only subtracted if it's attached to a queue).
>
> I.e. the other queues for normal jobs don't have it attached, and you select the special multi-node queue by the requested PE.

Unfortunately, I think there are two problems with this suggestion.

1. If I set network=16, then only 16 processors out of 48 will be usable 
by parallel jobs.

2. The load threshold seems to prevent fill_up from working correctly.
Even with network=48 in the queue's complex_values and network=47 as
the load threshold, the scheduler does not use all 48 slots on a host
before moving on to the next host.  This seems to be because the alarm
state becomes active on the queues at inconsistent times within a
single scheduling iteration.  A custom load sensor would be affected in
the same way, so I'm abandoning that idea.
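
To make the arithmetic in point 1 concrete, here is a toy model (not
Grid Engine code).  It assumes the consumable is debited once per
granted slot, as with the default consumption of 1 described in the
quoted suggestion.  The function name usable_slots and its parameters
are made up for illustration only.

```python
# Toy model, not Grid Engine code: a per-host consumable that is debited
# once for every slot granted to a parallel job (assumed consumption of 1).

def usable_slots(slots_on_host: int, consumable_capacity: int,
                 per_slot_cost: int = 1) -> int:
    """Return how many slots a parallel job could occupy on one host."""
    if per_slot_cost <= 0:
        # Nothing is debited, so only the slot count limits the job.
        return slots_on_host
    # The job stops growing when either slots or the consumable run out.
    return min(slots_on_host, consumable_capacity // per_slot_cost)


if __name__ == "__main__":
    print(usable_slots(48, 16))  # 16: network=16 caps a 48-way node
    print(usable_slots(48, 48))  # 48: the cap must match the slot count
```

Under these assumptions, network has to be at least as large as the
slot count for a parallel job to be able to fill the whole node.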

If we were to update to 6.2u5, what options would we then have?

-- 
Gerald Ragghianti

Office of Information Technology - High Performance Computing
Newton HPC Program http://newton.utk.edu/
The University of Tennessee, 2309 Kingston Pike, Knoxville, TN 37919
Phone: 865-974-2448

/-------------------------------------\
| One Contact       OIT: 865-974-9900 |
| Many Solutions         help.utk.edu |
\-------------------------------------/



