Abstract
MPI_Allgather is an important collective operation used in applications such as matrix multiplication and basic linear algebra operations. As next-generation systems go multi-core, deployed clusters will support a high process count per node. Traditional implementations of Allgather use two separate channels: a network channel for communication across nodes and a shared memory channel for intra-node communication. An important drawback of this approach is that communication buffers are not shared across the channels, which results in extra copying of data within a node and sub-optimal performance. This is especially true for a collective involving a large number of processes with a high process density per node. In this paper, we propose a design that eliminates these extra copy costs by sharing the communication buffers for both intra- and inter-node communication. Further, we optimize performance by overlapping network operations with intra-node shared memory copies. On a 32-node, 2-way cluster, we observe an improvement of up to a factor of two for MPI_Allgather compared to the original implementation. We also observe overlap benefits of up to 43% for the 32x2 process configuration.
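The copy-elimination argument in the abstract can be illustrated with a small copy-count model. This is a simplifying sketch of ours, not the paper's analysis or code: the function names and the assumption that each block incurs one CPU copy per buffer hop are illustrative, and RDMA delivery into a registered shared buffer is modeled as costing no CPU copy.

```python
def copies_two_channel(nodes, procs_per_node):
    """Traditional two-channel design (illustrative model).

    Each process's block is copied into a per-node shared-memory
    buffer, copied again into a separate network buffer before
    transmission, and each arriving remote block is copied from the
    network buffer back into shared memory.
    """
    intra = nodes * procs_per_node                    # local block -> shm buffer
    to_net = nodes * procs_per_node                   # shm buffer -> network buffer
    from_net = nodes * (nodes - 1) * procs_per_node   # arriving blocks -> shm buffer
    return intra + to_net + from_net


def copies_shared_buffer(nodes, procs_per_node):
    """Proposed shared-buffer design (illustrative model).

    Intra- and inter-node paths share one registered buffer: each
    process writes its block exactly once, and (in this model) RDMA
    delivers remote blocks directly into the shared buffer with no
    additional CPU copy.
    """
    return nodes * procs_per_node


if __name__ == "__main__":
    # 32 nodes with 2 processes per node, as in the paper's evaluation.
    print(copies_two_channel(32, 2))    # block copies in the two-channel design
    print(copies_shared_buffer(32, 2))  # block copies with a shared buffer
```

Under this model the intra-node staging copies (`to_net` and `from_net`) vanish entirely in the shared-buffer design, and the gap widens with node count, which is consistent with the abstract's claim that the benefit grows with process count and density.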
This research is supported in part by Department of Energy’s Grant #DE-FC02-01ER25506; National Science Foundation’s grants #CCR-0204429, #CCR-0311542 and #CNS-0403342; grants from Intel and Mellanox; and equipment donations from Intel, Mellanox, AMD, Apple, Advanced Clustering and Sun Microsystems.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Mamidala, A.R., Vishnu, A., Panda, D.K. (2006). Efficient Shared Memory and RDMA Based Design for MPI_Allgather over InfiniBand. In: Mohr, B., Träff, J.L., Worringen, J., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2006. Lecture Notes in Computer Science, vol 4192. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846802_17
DOI: https://doi.org/10.1007/11846802_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39110-4
Online ISBN: 978-3-540-39112-8