Abstract
In this note we give a simple bandwidth- and latency optimal algorithm for the problem of broadcasting m units of data from a distinguished root processor to all p–1 other processors in one-ported (hypercubic) message-passing systems. Assuming linear, uniform communication cost, the time for the broadcast to complete is O(m+log2 p), more precisely no processor is involved in more than ⌈log2 p⌉ communication operations (send, receive, and send-receive), and for any constant message size thresholdb each processor (except the root) sends at most m–b′+( ⌈log2 p⌉–ℓ)b′ units of data, where b′ is determined by the smallest ℓ≤ ⌈log2 p⌉ such that b′=m/2ℓ≤ b (the root sends 2m–b′+( ⌈log2 p⌉ –ℓ)b′ units of data). Non-root processors receive m units of data.
Building on known ideas, the salient features of the algorithm presented here is its simplicity of implementation, and smooth transition from latency to bandwidth dominated performance as data size m increases. The implementation performs very well in practice.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Barnett, M., Gupta, S., Payne, D.G., Schuler, L., van de Geijn, R., Watts, J.: Building a high-performance collective communication library. In: Supercomputing 1994, pp. 107–116 (1994)
Bruck, J., Coster, L.D., Dewulf, N., Ho, C.-T., Lauwereins, R.: On the design and implementation of broadcast and global combine operations using the postal model. IEEE Transactions on Parallel and Distributed Systems 7(3), 256–265 (1996)
Bruck, J., Ho, C.-T., Kipnis, S., Upfal, E., Weathersby, D.: Efficient algorithms for all-to-all communications in multiport message-passing systems. IEEE Transactions on Parallel and Distributed Systems 8(11), 1143–1156 (1997)
Culler, D.E., Karp, R.M., Patterson, D., Sahay, A., Santos, E.E., Schauser, K.E., Subramonian, R., von Eicken, T.: LogP: A practical model of parallel computation. Communications of the ACM 39(11), 78–85 (1996)
JáJá, J.: An Introduction to Parallel Algorithms. Addison-Wesley, Reading (1992)
Johnsson, S.L., Ho, C.-T.: Optimum broadcasting and personalized communication in hypercubes. IEEE Transactions on Computers 38(9), 1249–1268 (1989)
Sanders, P., Sibeyn, J.F.: A bandwidth latency tradeoff for broadcast and reduction. Information Processing Letters 86(1), 33–38 (2003)
Santos, E.E.: Optimal and near-optimal algorithms for k-item broadcast. Journal of Parallel and Distributed Computing 57(2), 121–139 (1999)
Snir, M., Otto, S., Huss-Lederman, S., Walker, D., Dongarra, J.: MPI – The Complete Reference, 2nd edn. The MPI Core, vol. 1. MIT Press, Cambridge (1998)
Thakur, R., Gropp, W.D.: Improving the performance of collective operations in MPICH. In: Dongarra, J., Laforenza, D., Orlando, S. (eds.) EuroPVM/MPI 2003. LNCS, vol. 2840, pp. 257–267. Springer, Heidelberg (2003)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Träff, J.L. (2004). A Simple Work-Optimal Broadcast Algorithm for Message-Passing Parallel Systems. In: Kranzlmüller, D., Kacsuk, P., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2004. Lecture Notes in Computer Science, vol 3241. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30218-6_28
Download citation
DOI: https://doi.org/10.1007/978-3-540-30218-6_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23163-9
Online ISBN: 978-3-540-30218-6
eBook Packages: Springer Book Archive