Abstract
We present a new, simple algorithmic idea for exploiting the capability for bidirectional communication present in many modern interconnects for the collective MPI operations broadcast, reduction and scan. Our algorithms achieve up to twice the bandwidth of most previous and commonly used algorithms. In particular, our algorithms for reduction and scan are the currently best known. Experiments on clusters with Myrinet and InfiniBand interconnects show significant reductions in running time for broadcast and reduction, for reduction even close to the best possible factor of two.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Bar-Noy, A., Kipnis, S., Schieber, B.: Optimal multiple message broadcasting in telephone-like communication systems. Discrete Applied Mathematics 100(1-2), 1–15 (2000)
Barnett, M., Gupta, S., Payne, D.G., Schuler, L., van de Geijn, R., Watts, J.: Building a high-performance collective communication library. In: Supercomputing 1994, pp. 107–116 (1994)
Chan, E.W., Heimlich, M.F., Purkayastha, A., van de Geijn, R.A.: On optimizing collective communication. In: IEEE International Conference on Cluster Computing CLUSTER 2004, IEEE Computer Society Press, Los Alamitos (2004)
Gropp, W., Lusk, E., Doss, N., Skjellum, A.: A high-performance, portable imlementation of the MPI message passing interface standard. Parallel Computing 22(6), 789–828 (1996)
Happe, H.H., Vinter, B.: Improving TCP/IP multicasting with message segmentation. In: Communicating Process Architectures (CPA 2005) (2005)
Kwon, O.-H., Chwa, K.-Y.: Multiple message broadcasting in communication networks. Networks 26, 253–261 (1995)
Mayr, E.W., Plaxton, C.G.: Pipelined parallel prefix computations, and sorting on a pipelined hypercube. Journal of Parallel and Distributed Computing 17, 374–380 (1993)
Pjesivac-Grbovic, J., Angskun, T., Bosilca, G., Fagg, G.E., Gabriel, E., Dongarra, J.: Performance analysis of MPI collective operations. In: International Parallel and Distributed Processing Symposium (IPDPS 2005), Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO) (2005)
Rabenseifner, R., Träff, J.L.: More efficient reduction algorithms for message-passing parallel systems. In: Kranzlmüller, D., Kacsuk, P., Dongarra, J.J. (eds.) Recent Advances in Parallel Virtual Machine and Message Passing Interface. LNCS, vol. 3241, pp. 36–46. Springer, Heidelberg (2004)
Ritzdorf, H., Träff, J.L.: Collective operations in NEC’s high-performance MPI libraries. In: International Parallel and Distributed Processing Symposium (IPDPS 2006), p. 100 (2006)
Sanders, P., Träff, J.L.: Parallel prefix (scan) algorithms for MPI. In: Mohr, B., Träff, J.L., Worringen, J., Dongarra, J. (eds.) Recent Advances in Parallel Virtual Machine and Message Passing Interface. LNCS, vol. 4192, pp. 49–57. Springer, Heidelberg (2006)
Snir, M., Otto, S., Huss-Lederman, S., Walker, D., Dongarra, J.: MPI – The Complete Reference. In: The MPI Core, 2nd edn., MIT Press, Cambridge (1998)
Thakur, R., Gropp, W.D., Rabenseifner, R.: Improving the performance of collective operations in MPICH. International Journal on High Performance Computing Applications 19, 49–66 (2004)
Träff, J.L.: A simple work-optimal broadcast algorithm for message-passing parallel systems. In: Kranzlmüller, D., Kacsuk, P., Dongarra, J.J. (eds.) Recent Advances in Parallel Virtual Machine and Message Passing Interface. LNCS, vol. 3241, pp. 173–180. Springer, Heidelberg (2004)
Träff, J.L., Ripke, A.: An optimal broadcast algorithm adapted to SMP-clusters. In: Di Martino, B., Kranzlmüller, D., Dongarra, J.J. (eds.) Recent Advances in Parallel Virtual Machine and Message Passing Interface. LNCS, vol. 3666, pp. 48–56. Springer, Heidelberg (2005)
Träff, J.L., Ripke, A.: Optimal broadcast for fully connected networks. In: Yang, L.T., Rana, O.F., Di Martino, B., Dongarra, J.J. (eds.) HPCC 2005. LNCS, vol. 3726, pp. 45–56. Springer, Heidelberg (2005)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sanders, P., Speck, J., Träff, J.L. (2007). Full Bandwidth Broadcast, Reduction and Scan with Only Two Trees. In: Cappello, F., Herault, T., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2007. Lecture Notes in Computer Science, vol 4757. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75416-9_10
Download citation
DOI: https://doi.org/10.1007/978-3-540-75416-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75415-2
Online ISBN: 978-3-540-75416-9
eBook Packages: Computer ScienceComputer Science (R0)