[gridengine users] Having jobs run inter-node on local node disk to reduce I/O latency?
jake.carroll at uq.edu.au
Wed Jun 20 09:40:40 UTC 2012
Probably a fairly common question, and hopefully one with a simple answer.
I've currently got a situation where one of the storage arrays I'm using to share "big" NFS to my compute nodes is under a significant amount of 10GbE I/O strain. The array can't handle the concurrency I'm currently throwing at it.
To that end, I started contemplating forcing queues to "transfer" the working sets that jobs request from the storage onto local /scratch on each node ("inter-node"). Each node has some decently speedy 15K SAS spindles inside it. I thought it'd be nice to see if we could reduce latency and contention on the 10GbE-connected array a little by doing this.
We found this:
But I am sure there is a lot more to it.
I know of a configuration item called the "transfer" queue, but I have a feeling it has nothing to do with this and is used more as a mechanism to programmatically forward jobs to other SGE queues.
Looking for some guidance on how we might programmatically enforce, at job "wire up" time, the transfer of working sets to node-local /scratch to increase efficiency (perhaps?).
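
To make the idea concrete, here is a minimal sketch of the stage-in / run / stage-out pattern as a Python job wrapper. It is an illustration under assumptions, not a tested Grid Engine recipe: the function name run_staged is hypothetical, it assumes the job runs with the per-job TMPDIR that Grid Engine exports (falling back to /scratch), and it needs Python 3.8+ for dirs_exist_ok.

```python
import os
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_staged(src_dir, dest_dir, command, scratch_root=None):
    """Copy src_dir to node-local scratch, run command there, copy results back.

    Hypothetical helper for illustration. Returns the command's exit code.
    """
    # Assumption: use the per-job TMPDIR if the scheduler set one, else /scratch.
    scratch_root = scratch_root or os.environ.get("TMPDIR", "/scratch")
    work = Path(tempfile.mkdtemp(prefix="stage_", dir=scratch_root))
    try:
        local = work / "data"
        # Stage in: one bulk copy from the shared array to local disk.
        shutil.copytree(src_dir, local)
        # Run the job with the local copy as its working directory,
        # so its reads and writes hit the node's own spindles.
        result = subprocess.run(command, cwd=local)
        # Stage out: copies the whole working directory (inputs included)
        # back to shared storage.
        shutil.copytree(local, dest_dir, dirs_exist_ok=True)
        return result.returncode
    finally:
        # Always free the local disk, even if the job or a copy failed.
        shutil.rmtree(work, ignore_errors=True)
```

A job script would call something like run_staged("/nfs/project/input", "/nfs/project/output", ["./analysis", "input.dat"]), where the paths and the command are placeholders. The same three steps could presumably be moved into a queue prolog and epilog instead, so that users would not have to wrap their own jobs.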