Abstract
We develop and implement a new optimal broadcast algorithm for fully connected, bidirectional, one-ported networks under a linear communication cost model. For any number of processors p, the number of communication rounds required to broadcast N blocks of data is ⌈log p⌉ − 1 + N. For data of size m, assuming that sending and receiving m data units takes time α + βm, the best running time that can be achieved is \((\sqrt{(\lceil \log p\rceil - 1)\alpha} + \sqrt{\beta m})^{2}\), which meets the lower bound under the assumption that the m units are sent as N blocks. This improves on previously known (and implemented) results, which achieve this bound only when p is a power of two (or in other special cases). In particular, the algorithm is (theoretically) a factor of two better than the commonly used pipelined binary tree algorithm. The algorithm has a regular communication pattern based on simultaneous binomial-like trees, and when the number of blocks to be broadcast is one, it degenerates into a binomial tree broadcast. Thus the same algorithm can be used for all message sizes m. The algorithm has been incorporated into a state-of-the-art MPI (Message Passing Interface) library. We demonstrate significant practical improvements of up to a factor of 1.5 over several other commonly used broadcast algorithms.
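The round count and running time stated above can be checked numerically. The following is a minimal sketch of the cost model only, not of the broadcast algorithm itself. It assumes that log is the base-2 logarithm and that the m units are split into N equal blocks, so that each round costs α + βm/N; minimizing (⌈log p⌉ − 1 + N)(α + βm/N) over N then gives the closed form quoted in the abstract. All function names are hypothetical and chosen for illustration.

```python
import math


def ceil_log2(p: int) -> int:
    """Return ceil(log2(p)) for p >= 2 (assumption: log in the abstract is base 2)."""
    if p < 2:
        raise ValueError("the cost model is only meaningful for p >= 2")
    return (p - 1).bit_length()


def rounds(p: int, n_blocks: int) -> int:
    """Communication rounds to broadcast n_blocks blocks: ceil(log p) - 1 + N."""
    return ceil_log2(p) - 1 + n_blocks


def broadcast_time(p: int, m: float, alpha: float, beta: float, n_blocks: int) -> float:
    """Total time when m units are sent as n_blocks equal blocks (an assumption).

    Each round transfers one block of size m / n_blocks at cost alpha + beta * m / n_blocks.
    """
    return rounds(p, n_blocks) * (alpha + beta * m / n_blocks)


def optimal_blocks(p: int, m: float, alpha: float, beta: float) -> float:
    """Real-valued N minimizing broadcast_time; in practice it would be rounded."""
    return math.sqrt((ceil_log2(p) - 1) * beta * m / alpha)


def optimal_time(p: int, m: float, alpha: float, beta: float) -> float:
    """Closed form from the abstract: (sqrt((ceil(log p) - 1) * alpha) + sqrt(beta * m))^2."""
    return (math.sqrt((ceil_log2(p) - 1) * alpha) + math.sqrt(beta * m)) ** 2
```

With a single block, `rounds(p, 1)` equals ⌈log p⌉, the depth of a binomial tree, which is consistent with the degenerate case described in the abstract. The optimal block count is generally not an integer, so the closed form is a bound that a real choice of N approaches rather than always attains.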
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Träff, J.L., Ripke, A. (2005). Optimal Broadcast for Fully Connected Networks. In: Yang, L.T., Rana, O.F., Di Martino, B., Dongarra, J. (eds) High Performance Computing and Communications. HPCC 2005. Lecture Notes in Computer Science, vol 3726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557654_8
DOI: https://doi.org/10.1007/11557654_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29031-5
Online ISBN: 978-3-540-32079-1
eBook Packages: Computer Science, Computer Science (R0)