[gridengine users] SGE supports heterogeneous network?

Reuti reuti at staff.uni-marburg.de
Tue Jan 27 10:48:19 UTC 2015


Hi,

> On 27.01.2015 at 02:15, Sangmin Park <dorimosiada at gmail.com> wrote:
> 
> We have three HPC systems called A, B, and C, all accessible through the login node. SGE is installed on the login node.
> The A and B systems each consist of a master node and compute nodes, connected by gigabit Ethernet. The C system, however, has the ideal configuration: its nodes are connected by InfiniBand rather than Ethernet.
> 
> Each HPC system has two kinds of network: one for management, using gigabit Ethernet, and another for computing, using gigabit Ethernet on the A and B systems and InfiniBand on the C system.
> SGE uses management network.
> 
> Problems arise in the C HPC system. SGE uses the management network.
> So, when a user submits a job through SGE, it may use the gigabit network instead of the InfiniBand network.

This is not necessarily related to SGE.

- SGE can be instructed to use any TCP/IP-based connection, be it eth1 or any other:

https://arc.liv.ac.uk/SGE/howto/multi_intrfcs.html

But for the sake of simplicity I tended to route this low traffic over eth0, where MPI should also run (in former times the MPI libraries just used the given hostname). The file transfer by NFS was then done on eth1, which was easy to adjust via export and mount.
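
Since the interface used follows from the hostname in such a setup, it can help to check that each per-interface name resolves to its own address. The following is only a minimal sketch: the names `node01` and `node01-ib` are hypothetical examples of a naming scheme, not something taken from the setup described in this thread.

```python
import socket


def resolve_all(names, resolver=socket.gethostbyname):
    """Map each hostname to its IPv4 address, or None if it does not resolve."""
    result = {}
    for name in names:
        try:
            result[name] = resolver(name)
        except OSError:
            # socket.gaierror is a subclass of OSError
            result[name] = None
    return result


def has_unique_addresses(resolved):
    """True if every name resolved and no two names share an address."""
    addresses = list(resolved.values())
    return None not in addresses and len(set(addresses)) == len(addresses)


if __name__ == "__main__":
    # Hypothetical per-interface names for one node (assumption).
    resolved = resolve_all(["node01", "node01-ib"])
    for name, address in resolved.items():
        print(name, "->", address)
    print("unique name per interface:", has_unique_addresses(resolved))
```

Run on a compute node, this prints the address behind each name, so a name that points to the wrong network shows up immediately.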

But nowadays this depends heavily on the MPI library used. While I always configured the nodes to have a unique name per interface, the Open MPI library, for example, tries to cope with the situation where all interfaces have the same name: it performs some kind of interface/network scan to find all possible routes between the granted nodes and afterwards uses a fixed distribution of the traffic. So it might use both the IB and the Gigabit network and split the traffic between them.
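
For Open MPI, one way to avoid such a split is to restrict the transports on the `mpirun` command line. The sketch below only builds that command line. It assumes an Open MPI 1.x-era build that includes the `openib` BTL (newer releases handle InfiniBand differently), and `./my_mpi_app` is a hypothetical program name. `NSLOTS` is the slot count SGE exports into the job environment.

```python
import os
import shlex


def build_mpirun_command(program, slots=None):
    """Return an mpirun argument list restricted to InfiniBand transports.

    Assumption: Open MPI 1.x with the openib BTL available. The BTL list
    "openib,self,sm" leaves out "tcp", so Ethernet is not used for MPI traffic.
    """
    if slots is None:
        # SGE sets NSLOTS to the number of granted slots inside a job.
        slots = os.environ.get("NSLOTS", "1")
    return ["mpirun", "--mca", "btl", "openib,self,sm",
            "-np", str(slots), program]


if __name__ == "__main__":
    # Hypothetical application name (assumption).
    command = build_mpirun_command("./my_mpi_app")
    print(" ".join(shlex.quote(arg) for arg in command))
```

With TCP left out of the list, a node without a working IB path should make the job fail at startup instead of quietly running over Gigabit, which is usually easier to notice.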

What parallel library do you intend to use? It is best to ask on the mailing list of the parallel library how to adjust the startup to use IB (and only IB) after the initial startup of the application.

I would appreciate it if you could post the results of your findings here.

-- Reuti


> To use the InfiniBand network, SGE has to work over the InfiniBand network in all cases.
> 
> Does SGE work well in heterogeneous network systems?
> 
> -Sangmin
