High-Level Topology-Oblivious Optimization of MPI Broadcast Algorithms on Extreme-Scale Platforms

Hasanov, Khalid; Quintin, Jean-Noël; Lastovetsky, Alexey

doi:10.1007/978-3-319-14313-2_35

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8806)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

There has been significant research on collective communication operations, in particular MPI broadcast, on distributed-memory platforms. Most of this research optimizes collective operations for particular architectures by taking into account either their topology or platform parameters. In this work we propose a simple yet general approach to optimizing legacy MPI broadcast algorithms, which are widely used in MPICH and Open MPI. Theoretical analysis and experimental results on IBM BlueGene/P and on a cluster of the Grid’5000 platform are presented.

Author information

Authors and Affiliations

University College Dublin, Belfield, Dublin 4, Ireland
Khalid Hasanov & Alexey Lastovetsky
Extreme Computing R&D, Bull SAS, Paris, France
Jean-Noël Quintin

Editor information

Editors and Affiliations

CRACS/INESC-TEC and FCUP, University of Porto, Rua do Campo Alegre, 1021, 4169-007, Porto, Portugal
Luís Lopes
Vilnius University, 08663, Vilnius, Lithuania
Julius Žilinskas
Inria Rennes - Bretagne Atlantique, 35042, Rennes, France
Alexandru Costan
Inria, Campus Universitaire de Beaulieu, 35042, Rennes, France
Roberto G. Cascella
MTA SZTAKI, Budapest, Hungary
Gabor Kecskemeti
Inria, LaBRI, France
Emmanuel Jeannot
University Magna Graecia of Catanzaro, 88100, Catanzaro, Italy
Mario Cannataro
University of Pisa, Italy
Laura Ricci
Faculty of Computer Science, University of Vienna, Wien, Austria
Siegfried Benkner
Universitat Politècnica de València, Spain
Salvador Petit
ISISLab - Dipartimento di Informatica, Università di Salerno, Italy
Vittorio Scarano
High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, 70550, Stuttgart, Germany
José Gracia
Vienna University of Technology, 1040, Vienna, Austria
Sascha Hunold
Tennessee Tech University and Oak Ridge National Laboratory, 38505, Cookeville, TN, USA
Stephen L. Scott
RWTH Aachen University, Aachen, Germany
Stefan Lankes
Department of Informatics and Mathematics, University of Passau, Germany
Christian Lengauer
Universidad Carlos III de Madrid, 28911, Leganés, Spain
Jesús Carretero
TU München, 85747, Garching bei München, Germany
Jens Breitbart
TU Vienna, 1040, Vienna, Austria
Michael Alexander

About this paper

Cite this paper

Hasanov, K., Quintin, JN., Lastovetsky, A. (2014). High-Level Topology-Oblivious Optimization of MPI Broadcast Algorithms on Extreme-Scale Platforms. In: Lopes, L., et al. Euro-Par 2014: Parallel Processing Workshops. Euro-Par 2014. Lecture Notes in Computer Science, vol 8806. Springer, Cham. https://doi.org/10.1007/978-3-319-14313-2_35

DOI: https://doi.org/10.1007/978-3-319-14313-2_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14312-5
Online ISBN: 978-3-319-14313-2
eBook Packages: Computer Science, Computer Science (R0)
