Hardware- and Software-Based Collective Communication on the Quadrics Network
Cambridge, Massachusetts, October 08-October 10
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/NCA.2001.962513
IEEE International Symposium on Netwo ...
The efficient implementation of collective communication patterns in a parallel machine is a challenging design effort that requires the solution of many problems. In this paper we present an in-depth description of how the Quadrics network supports both hardware- and software-based collectives. We describe the main features of the two building blocks of this network: a network interface that can perform zero-copy user-level communication, and a wormhole routing switch. We also focus on the routing and flow control algorithms, on deadlock avoidance, and on how the processing nodes are integrated in a global, virtual shared memory.

Experiments conducted on a 64-node AlphaServer cluster indicate that the time to complete the hardware-based barrier synchronization on the whole network is as low as 6 microseconds, with very good scalability. Good latency and scalability are also achieved with the software-based synchronization, which takes about 15 microseconds. For the broadcast, the hardware- and software-based implementations achieve similar performance: both can deliver messages of up to 256 bytes in 13 microseconds and sustain an asymptotic bandwidth of 288 Mbytes/sec on all the nodes.

The hardware-based barrier is almost insensitive to network congestion, with 93% of the synchronizations taking less than 20 microseconds when the network is flooded with background traffic of unicast messages. The software-based implementation, on the other hand, suffers significant performance degradation. Under high load the hardware broadcast maintains a reasonably good latency, delivering messages of up to 2KB in 200 microseconds, while the software broadcast suffers from slightly higher latencies inherited from the synchronization mechanism. With large messages, both broadcast algorithms experience a significant degradation of the sustained bandwidth.
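As an illustration of what a software-based barrier involves, the sketch below simulates a dissemination barrier, a generic textbook scheme. It is an assumption chosen for illustration only: the abstract does not say which algorithm the software barrier on the Quadrics network uses, and the function names here are hypothetical. The simulation only counts message rounds and checks that every node learns of every other node's arrival; it says nothing about the latencies reported above.

```python
def dissemination_rounds(n_nodes: int) -> int:
    """Rounds needed by a dissemination barrier: ceil(log2(n_nodes))."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without floats.
    return (n_nodes - 1).bit_length()


def simulate_barrier(n_nodes: int) -> list[set[int]]:
    """Simulate the rounds; return, per node, the set of nodes known to have arrived."""
    known = [{i} for i in range(n_nodes)]  # each node starts knowing only itself
    for r in range(dissemination_rounds(n_nodes)):
        step = 1 << r
        # All messages of a round are sent "simultaneously", so read from a snapshot.
        snapshot = [set(k) for k in known]
        for i in range(n_nodes):
            # In round r, node i signals node (i + 2^r) mod n and forwards what it knows.
            known[(i + step) % n_nodes] |= snapshot[i]
    return known


if __name__ == "__main__":
    n = 64  # cluster size mentioned in the abstract
    result = simulate_barrier(n)
    print("rounds:", dissemination_rounds(n))
    print("all nodes synchronized:", all(k == set(range(n)) for k in result))
```

A logarithmic number of rounds is one reason software barriers of this kind can scale reasonably well, although each round still depends on point-to-point messages that compete with other traffic.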
Citation:
Fabrizio Petrini, Salvador Coll, Eitan Frachtenberg, Adolfy Hoisie, "Hardware- and Software-Based Collective Communication on the Quadrics Network," nca, pp.0024, IEEE International Symposium on Network Computing and Applications (NCA'01), 2001