Abstract
Run-time synchronization overhead is a crucial factor in limiting speedup for parallel computers. In this paper, we present a new two-phase algorithm for removing redundant dependencies and minimizing interprocessor synchronizations when scheduling an acyclic task graph onto a multiprocessor system. The first phase removes redundant dependencies before scheduling; the second phase eliminates interprocessor synchronizations after scheduling. In a simulation using randomly generated task graphs, on the average, 98.28% of the dependencies are eliminated in the first phase, and 65.86% of the remaining dependencies are eliminated during the second phase, for a total of 99.41% removed. The approach has also been applied to some benchmark task graphs. The two-phase algorithm, which hasO(n 3) time complexity andO(n 2) space complexity, utilizes a new algorithm which computes the transitive closure and reduction at the same time, storing results in a single matrix.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
D. Kuck, E. Davidson, D. Lawrie, and A. Sameh, Parallel Supercomputing Today and the Cedar Approach,Science 231:967–974 (1986).
A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir, The NYU Ultracomputer—Designing an MIMD Shared-memory and Parallel Machine,IEEE Trans. on Computers C-32:175–189 (1983).
C. C. Price and M. A. Salama, Scheduling of Precedence-Constrained Tasks on Multiprocessors,The Computer Journal 33(3):219–229 (1990).
E. Arnould, F. Bitz, E. Cooper, H. T. Kung, R. D. Sansom, and P. Steenkiste, The Design of Nectar: A Network Backplane for Heterogeneous Multicomputers,Proc. of the Third Int'l Conf. on Architectural Support for Progr. Languages and Oper. Syst., pp. 205–216 (1988).
W. C. Athas and C. L. Sietz, Multicomputers: Message-passing Concurrent Computers,IEEE Computer Magazine 46:9–24 (1988).
F. W. Burton, G. P. McKeown, and V. J. Rayward-Smith, Applications of UET Scheduling Theory to the Implementation of Declarative Languages,The Computer Journal 33(4):330–336 (1990).
H. Kasahara and S. Narita, Practical Multiprocessor Scheduling Algorithms for Efficient Parallel Processing,IEEE Trans. on Computers C-33(11):1023–1029 (1984).
V. P. Krothapalli and P. Sadayappan, Removal of Redundant Dependences in DOACROSS Loops with Constant DependencesProc. of the Third SIGPLAN Symp. on Principles and Practice of Parallel Programming 26:51–60 (1991).
Z. Li and W. Abu-Sufah, On Reducing Data Synchronization in Multiprocessed Loops,IEEE Trans. on Computers C-36(1):105–109 (1987).
J. Blazewicz, M. Drabowski, and J. Weglarz, Scheduling Multiprocessor Tasks to Minimize Schedule Length,IEEE Trans. on Computers C-35(5):389–393 (1986).
E. G. Coffman and R. L. Graham, Optimal Scheduling for Two-Processor Systems,Acta. Informatica 1:200–213 (1972).
H. N. Gabow, An Almost-Linear Algorithm for Two-Processor Scheduling,Journal of the ACM 29(3):207–227 (1982).
T. C. Hu, Parallel Sequencing and Assembly Line Problems,Operations Research 9:841–848 (1961).
H. F. Li, Scheduling Trees in Parallel/Pipelined Processing Environments,IEEE Trans. on Computers C-26(11):1101–1112 (1977).
V. F. Magirou and J. Z. Milis, An Algorithm for the Multiprocessor Assignment Problems,Oper. Res. Lett. 8(6):351–356 (1989).
S. P. Midkiff and D. A. Padua, Compiler Algorithms for Synchronizations,IEEE Trans. on Computers C-36(12):1485–1495 (1987).
P. L. Shaffer, Minimization of Interprocessor Synchronization in Multiprocessors with Shared and Private Memory,Int'l Conf. on Parallel Processing 3:138–142 (1989).
D. Bernstein, An Improved Approximation Algorithm for Scheduling Pipelined Machines,Int'l Conf. on Parallel Processing, pp. 430–433 (1988).
T.R. Gross, Code Optimization Techniques for Pipelined Architectures,COMPCON ′83, Spring, pp, 278–285 (1983).
H. Y. Chao and M. P. Harper, Scheduling a Superscalar Pipelined Processor Without Hardware Interlocks, Technical Report TR-EE 94-29, Purdue University (1994).
Y. H. Shiau and C. P. Chung, Adoptability and Effectiveness of Microcode Compaction Algorithms in Superscalar Processing,Parallel Computing 18(5):497–510 (1992).
T. Yoshimura and E. S. Kuh, Efficient Algorithms for Channel Routing,IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems CAD-1:25–35 (1982).
A. V. Aho, J. E. Hopcroft and J. D. Ullman,The Design and Analysis of Computer Algorithms, Addison-Wesley Publishing Company, San Francisco, California (1976).
S. Baase,Computer Algorithms, Addison-Wesley Publishing Company, San Diego, California (1988).
T. H. Cormen, C. E. Leiserson, and R. L. Rivest,Introduction to Algorithms, McGraw-Hill Book Company, New York (1990).
M. R. Garey and D. S. Johnson,Computers and Intractability, W. H. Freeman and Company, San Francisco, California (1979).
D. Gries, A. J. Martin, Jan L. A. van de Snepscheut, and J. T. Udding, An Algorithm for Transitive Reduction of an Acyclic Graph,Science of Computer Programming 12:151–155 (1989).
C. H. Papadimitriou and K. Steiglitz,Combinatorial Optimization: Algorithms and Complexity, Prentice-Hall Inc., San Francisco, California (1979).
K. K. Lee and H. W. Leong, An Improved Lower Bound for Channel Routing Problems,IEEE Int'l Symp. on Circuits and Systems 4: 11–14 (1991).
J. S. Wang and R. C. T. Lee, An Efficient Channel Routing Algorithm to Yield an Optimal Solution,IEEE Trans. on Computers 39(7):957–962 (1990).
T. L. Adam, K. M. Chandy, and J. R. Dickson, A Comparison of List Schedules for Parallel Processing Systems,Comm. ACM 17(12):685–690 (1974).
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Chao, HY., Harper, M.P. Minimizing redundant dependencies and interprocessor synchronizations. Int J Parallel Prog 23, 245–262 (1995). https://doi.org/10.1007/BF02577868
Received:
Issue Date:
DOI: https://doi.org/10.1007/BF02577868