Full Bandwidth Broadcast, Reduction and Scan with Only Two Trees

Sanders, Peter; Speck, Jochen; Träff, Jesper Larsson

doi:10.1007/978-3-540-75416-9_10

Peter Sanders¹, Jochen Speck¹ & Jesper Larsson Träff²

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 4757)

Included in the following conference series:

European Parallel Virtual Machine / Message Passing Interface Users’ Group Meeting


Abstract

We present a new, simple algorithmic idea for exploiting the capability for bidirectional communication present in many modern interconnects for the collective MPI operations broadcast, reduction and scan. Our algorithms achieve up to twice the bandwidth of most previous and commonly used algorithms. In particular, our algorithms for reduction and scan are the best currently known. Experiments on clusters with Myrinet and InfiniBand interconnects show significant reductions in running time for broadcast and reduction; for reduction, the improvement even comes close to the best possible factor of two.
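The abstract does not spell out how two trees can yield full bandwidth. The following Python sketch illustrates the complementary-tree property usually associated with two-tree algorithms, under our own assumptions, and is not necessarily the construction used in the paper. Assume n non-root PEs labelled 0..n-1 with n even. A first tree T1 is built in in-order numbering so that exactly the even-numbered PEs are leaves, and a second tree T2 is obtained by relabelling every PE i as (i + 1) mod n. The helper names (`inorder_tree`, `shifted`, `interior`, `send_volume`) are hypothetical. If the message of m words is split into two halves and each half is pipelined down one tree, a PE forwards data only in the tree where it is interior. It therefore sends at most about m words while also receiving m words, whereas a single pipelined binary tree makes interior PEs send 2m. With links that overlap sending and receiving, this is roughly where a bandwidth gain of up to a factor of two can come from (latency terms ignored).

```python
from collections import defaultdict


def inorder_tree(n):
    """Binary tree on PEs 0..n-1 in in-order numbering (hypothetical helper).

    Every range [lo, hi) starts at an even index and its root is odd, so
    even-numbered PEs end up as leaves and odd-numbered PEs as interior nodes.
    Returns a parent map; the root maps to None.
    """
    parent = {}

    def build(lo, hi, par):
        if lo >= hi:
            return
        if hi - lo == 1:
            parent[lo] = par  # singleton range: an even-numbered leaf
            return
        r = (lo + hi) // 2
        if r % 2 == 0:
            r -= 1  # keep roots of non-trivial ranges odd
        parent[r] = par
        build(lo, r, r)
        build(r + 1, hi, r)

    build(0, n, None)
    return parent


def shifted(parent, n):
    """Second tree: relabel every PE i of the first tree as (i + 1) mod n."""
    return {(c + 1) % n: (None if p is None else (p + 1) % n)
            for c, p in parent.items()}


def interior(parent):
    """PEs that have at least one child, i.e. that forward data."""
    return {p for p in parent.values() if p is not None}


def send_volume(trees, m):
    """Words each PE sends when the m-word message is split evenly over the
    trees and every PE forwards its share once to each of its children."""
    share = m / len(trees)
    vol = defaultdict(float)
    for parent in trees:
        for p in parent.values():
            if p is not None:
                vol[p] += share
    return vol


if __name__ == "__main__":
    n = 10  # assumption: an even number of non-root PEs
    t1 = inorder_tree(n)
    t2 = shifted(t1, n)
    print(sorted(interior(t1)), sorted(interior(t2)))  # [1, 3, 5, 7, 9] [0, 2, 4, 6, 8]
    print(max(send_volume([t1], 1.0).values()))        # one tree: 2.0
    print(max(send_volume([t1, t2], 1.0).values()))    # two trees: 1.0
```

For n = 10 every PE forwards data in exactly one of the two trees. A reduction could plausibly reuse the same trees leaf-to-root, each tree combining one half of the vector. The step-by-step schedule that lets each PE send and receive in every step, and the handling of an odd number of PEs, are not modelled in this sketch.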



Author information

Authors and Affiliations

Universität Karlsruhe, Am Fasanengarten 5, D-76131 Karlsruhe, Germany
Peter Sanders & Jochen Speck
NEC Laboratories Europe, NEC Europe Ltd., Rathausallee 10, D-53757 Sankt Augustin, Germany
Jesper Larsson Träff


Editor information

Franck Cappello, Thomas Herault, Jack Dongarra


About this paper

Cite this paper

Sanders, P., Speck, J., Träff, J.L. (2007). Full Bandwidth Broadcast, Reduction and Scan with Only Two Trees. In: Cappello, F., Herault, T., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2007. Lecture Notes in Computer Science, vol 4757. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75416-9_10


DOI: https://doi.org/10.1007/978-3-540-75416-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75415-2
Online ISBN: 978-3-540-75416-9
eBook Packages: Computer Science, Computer Science (R0)
