Skip to main content

A Simple Work-Optimal Broadcast Algorithm for Message-Passing Parallel Systems

  • Conference paper
Recent Advances in Parallel Virtual Machine and Message Passing Interface (EuroPVM/MPI 2004)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3241))

Abstract

In this note we give a simple bandwidth- and latency optimal algorithm for the problem of broadcasting m units of data from a distinguished root processor to all p–1 other processors in one-ported (hypercubic) message-passing systems. Assuming linear, uniform communication cost, the time for the broadcast to complete is O(m+log2 p), more precisely no processor is involved in more than ⌈log2 p⌉ communication operations (send, receive, and send-receive), and for any constant message size thresholdb each processor (except the root) sends at most mb′+( ⌈log2 p⌉–ℓ)b′ units of data, where b′ is determined by the smallest ℓ≤ ⌈log2 p⌉ such that b′=m/2b (the root sends 2mb′+( ⌈log2 p⌉ –ℓ)b′ units of data). Non-root processors receive m units of data.

Building on known ideas, the salient features of the algorithm presented here is its simplicity of implementation, and smooth transition from latency to bandwidth dominated performance as data size m increases. The implementation performs very well in practice.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Barnett, M., Gupta, S., Payne, D.G., Schuler, L., van de Geijn, R., Watts, J.: Building a high-performance collective communication library. In: Supercomputing 1994, pp. 107–116 (1994)

    Google Scholar 

  2. Bruck, J., Coster, L.D., Dewulf, N., Ho, C.-T., Lauwereins, R.: On the design and implementation of broadcast and global combine operations using the postal model. IEEE Transactions on Parallel and Distributed Systems 7(3), 256–265 (1996)

    Article  Google Scholar 

  3. Bruck, J., Ho, C.-T., Kipnis, S., Upfal, E., Weathersby, D.: Efficient algorithms for all-to-all communications in multiport message-passing systems. IEEE Transactions on Parallel and Distributed Systems 8(11), 1143–1156 (1997)

    Article  Google Scholar 

  4. Culler, D.E., Karp, R.M., Patterson, D., Sahay, A., Santos, E.E., Schauser, K.E., Subramonian, R., von Eicken, T.: LogP: A practical model of parallel computation. Communications of the ACM 39(11), 78–85 (1996)

    Article  Google Scholar 

  5. JáJá, J.: An Introduction to Parallel Algorithms. Addison-Wesley, Reading (1992)

    MATH  Google Scholar 

  6. Johnsson, S.L., Ho, C.-T.: Optimum broadcasting and personalized communication in hypercubes. IEEE Transactions on Computers 38(9), 1249–1268 (1989)

    Article  MathSciNet  Google Scholar 

  7. Sanders, P., Sibeyn, J.F.: A bandwidth latency tradeoff for broadcast and reduction. Information Processing Letters 86(1), 33–38 (2003)

    Article  MATH  MathSciNet  Google Scholar 

  8. Santos, E.E.: Optimal and near-optimal algorithms for k-item broadcast. Journal of Parallel and Distributed Computing 57(2), 121–139 (1999)

    Article  MATH  Google Scholar 

  9. Snir, M., Otto, S., Huss-Lederman, S., Walker, D., Dongarra, J.: MPI – The Complete Reference, 2nd edn. The MPI Core, vol. 1. MIT Press, Cambridge (1998)

    Google Scholar 

  10. Thakur, R., Gropp, W.D.: Improving the performance of collective operations in MPICH. In: Dongarra, J., Laforenza, D., Orlando, S. (eds.) EuroPVM/MPI 2003. LNCS, vol. 2840, pp. 257–267. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Träff, J.L. (2004). A Simple Work-Optimal Broadcast Algorithm for Message-Passing Parallel Systems. In: Kranzlmüller, D., Kacsuk, P., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2004. Lecture Notes in Computer Science, vol 3241. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30218-6_28

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-30218-6_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23163-9

  • Online ISBN: 978-3-540-30218-6

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics