TRAM: Optimizing Fine-grained Communication with Topological Routing and Aggregation of Messages
Venue
International Conference on Parallel Processing (2014)
Publication Year
2014
Authors
Lukasz Wesolowski, Ramprasad Venkataraman, A Gupta, Jae-Seung Yeom, Keith Bisset, Yanhua Sun, Pritish Jetley, Thomas Quinn, Laxmikant Kale
Abstract
Fine-grained communication in supercomputing applications often limits performance
through high communication overhead and poor utilization of network bandwidth. This
paper presents the Topological Routing and Aggregation Module (TRAM), a library that
optimizes fine-grained communication performance by routing and dynamically
combining short messages. TRAM collects units of fine-grained communication from
the application and combines them into aggregated messages with a common
intermediate destination. It routes these messages along a virtual mesh topology
mapped onto the physical topology of the network. TRAM improves network bandwidth
utilization and reduces communication overhead. It is particularly effective in
optimizing patterns with global communication and large message counts, such as
all-to-all and many-to-many, as well as sparse, irregular, dynamic, or data-dependent
patterns. We demonstrate how TRAM improves performance through theoretical analysis
and experimental verification using benchmarks and scientific applications. We
present speedups on petascale systems of 6x for communication benchmarks and up to
4x for applications.
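
The aggregation and routing idea described in the abstract can be illustrated with a small sketch. This is not TRAM's actual API or implementation: the names `coords`, `rank_of`, `next_hop`, and `Aggregator`, the fixed-capacity buffers, and the dimension-ordered routing rule on a 2D virtual mesh are all assumptions chosen for illustration. Each item is buffered by its intermediate destination, and a buffer is sent as one aggregated message when it fills or is flushed.

```python
from collections import defaultdict

def coords(rank, dims):
    """Convert a linear rank into virtual-mesh coordinates (lowest dimension first)."""
    out = []
    for d in dims:
        out.append(rank % d)
        rank //= d
    return tuple(out)

def rank_of(coord, dims):
    """Convert virtual-mesh coordinates back into a linear rank."""
    rank, stride = 0, 1
    for c, d in zip(coord, dims):
        rank += c * stride
        stride *= d
    return rank

def next_hop(src, dst, dims):
    """Assumed routing rule: correct the first coordinate that differs from dst."""
    s, t = coords(src, dims), coords(dst, dims)
    for i in range(len(dims)):
        if s[i] != t[i]:
            hop = list(s)
            hop[i] = t[i]
            return rank_of(hop, dims)
    return dst  # already at the destination

class Aggregator:
    """Hypothetical per-rank aggregator with one buffer per intermediate destination."""

    def __init__(self, rank, dims, capacity):
        self.rank, self.dims, self.capacity = rank, dims, capacity
        self.buffers = defaultdict(list)
        self.sent = []  # records (intermediate_rank, [(final_dst, payload), ...])

    def insert(self, dst, payload):
        hop = next_hop(self.rank, dst, self.dims)
        buf = self.buffers[hop]
        buf.append((dst, payload))
        if len(buf) >= self.capacity:
            self.flush(hop)

    def flush(self, hop):
        items = self.buffers.pop(hop, [])
        if items:
            self.sent.append((hop, items))  # stands in for one network send

    def flush_all(self):
        for hop in list(self.buffers):
            self.flush(hop)

# Rank 0 on a 4x4 virtual mesh sends one small item to every other rank.
dims = (4, 4)
agg = Aggregator(rank=0, dims=dims, capacity=8)
for dst in range(1, 16):
    agg.insert(dst, payload=f"item-for-{dst}")
agg.flush_all()
print(len(agg.sent), "aggregated messages instead of 15 individual sends")
```

In this toy setup, items bound for ranks that share a mesh coordinate travel together to a common intermediate rank, which would then re-aggregate and forward them along the next dimension. A real library would also need to handle timeouts for partially filled buffers and the mapping of the virtual mesh onto the physical network, which this sketch omits.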
