Abstract
There has been significant research on collective communication operations, in particular MPI broadcast, on distributed-memory platforms. Most of this work optimizes collective operations for particular architectures by taking into account either their topology or their platform parameters. In this work we propose a simple yet general approach to optimizing the legacy MPI broadcast algorithms that are widely used in MPICH and Open MPI. We present a theoretical analysis and experimental results on IBM BlueGene/P and on a cluster of the Grid’5000 platform.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hasanov, K., Quintin, JN., Lastovetsky, A. (2014). High-Level Topology-Oblivious Optimization of MPI Broadcast Algorithms on Extreme-Scale Platforms. In: Lopes, L., et al. Euro-Par 2014: Parallel Processing Workshops. Euro-Par 2014. Lecture Notes in Computer Science, vol 8806. Springer, Cham. https://doi.org/10.1007/978-3-319-14313-2_35
DOI: https://doi.org/10.1007/978-3-319-14313-2_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14312-5
Online ISBN: 978-3-319-14313-2
eBook Packages: Computer Science, Computer Science (R0)