Abstract
Barrier synchronization is commonly used to synchronize processors prior to a join operation and to enforce data dependencies during the execution of parallelized loops. Simple software implementations of barrier synchronization can result in memory hot-spots, especially in large-scale shared-memory multiprocessors containing hundreds of processors and memory modules communicating through an interconnection network. A software combining tree can be used to substantially reduce memory contention due to hot-spots. However, such an implementation results in O(log n) latency in recognition of barrier synchronization, where n is the number of processors. In this paper an adaptive software combining tree is used to implement a scalable barrier with O(1) recognition latency. The processors that arrive early at the barrier adapt the combining tree so that it has a structure appropriate for reducing the latency for the processors that arrive later. We also show how adaptive combining trees can be used to implement the fuzzy barrier. The fuzzy barrier mechanism reduces the idling of processors at barriers by allowing them to execute useful instructions while they wait.
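For orientation, the static software combining tree that the abstract takes as its baseline can be sketched roughly as follows. This is an illustrative sketch only, not the adaptive scheme proposed in the paper: the names `CombiningTreeBarrier`, `_Node`, `fan_in`, and `tid` are hypothetical, Python threads stand in for processors, and a condition variable stands in for the release flag a real multiprocessor implementation would spin on. Arrivals are counted at small per-node counters instead of one shared counter, which spreads out contention, but the last thread to arrive still climbs every level of the tree, which is where the O(log n) recognition latency comes from.

```python
import threading


class _Node:
    """One counter in the combining tree (hypothetical helper)."""

    def __init__(self, expected):
        self.expected = expected      # arrivals needed before moving up
        self.count = 0
        self.parent = None
        self.lock = threading.Lock()


class CombiningTreeBarrier:
    """Static combining tree barrier; a sketch, not the paper's algorithm."""

    def __init__(self, n, fan_in=2):
        self.fan_in = fan_in
        # Leaf level: each leaf collects up to fan_in threads.
        level = [_Node(min(fan_in, n - i)) for i in range(0, n, fan_in)]
        self.leaves = level
        # Build parent levels until a single root remains.
        while len(level) > 1:
            parents = []
            for i in range(0, len(level), fan_in):
                group = level[i:i + fan_in]
                parent = _Node(len(group))
                for child in group:
                    child.parent = parent
                parents.append(parent)
            level = parents
        self.generation = 0           # incremented once per barrier episode
        self.released = threading.Condition()

    def wait(self, tid):
        my_gen = self.generation      # cannot change until this thread arrives
        node = self.leaves[tid // self.fan_in]
        while node is not None:
            with node.lock:
                node.count += 1
                last = node.count == node.expected
                if last:
                    node.count = 0    # reset for the next episode
            if not last:
                break                 # someone else will carry the arrival up
            node = node.parent        # last arriver climbs one level
        else:
            # Climbed past the root: every thread has arrived, release all.
            with self.released:
                self.generation += 1
                self.released.notify_all()
            return
        with self.released:
            while self.generation == my_gen:
                self.released.wait()
```

Each of the n threads would call `barrier.wait(tid)` with its own index in `range(n)` at the end of every phase. Resetting each counter before release is safe here because no thread can begin the next episode until the generation has advanced.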
Cite this article
Gupta, R., Hill, C.R. A scalable implementation of barrier synchronization using an adaptive combining tree. Int J Parallel Prog 18, 161–180 (1989). https://doi.org/10.1007/BF01407897
DOI: https://doi.org/10.1007/BF01407897