Abstract
Just a few years ago, parallel computers were tightly coupled SIMD, VLIW, or MIMD machines. Now, they are clusters of workstations connected by communication networks of ever-higher bandwidth (e.g., Ethernet, FDDI, HiPPI, ATM). For these clusters, compiler research centers on techniques for hiding large synchronization and communication latencies; in general, on making parallel programs based on fine-grain aggregate operations fit an existing network execution model that is optimized for point-to-point block transfers.
In contrast, we suggest that the network execution model can and should be altered to more directly support fine-grain aggregate operations. By augmenting workstation hardware with a simple barrier mechanism (PAPERS: Purdue's Adapter for Parallel Execution and Rapid Synchronization), and with appropriate operating system hooks for its direct use from user processes, the user is given a variety of efficient aggregate operations and the compiler is provided with a more static (i.e., more predictable), lower-latency target execution model. This paper centers on compiler techniques that use this new target model to achieve more efficient parallel execution: first, techniques that statically schedule aggregate operations across processors; second, techniques that implement SIMD and VLIW execution.
This work was supported in part by the Office of Naval Research (ONR) under grant number N00014-91-J-4013 and by the National Science Foundation (NSF) under award number 9015696-CDA.
© 1995 Springer-Verlag Berlin Heidelberg
Dietz, H.G., Cohen, W.E., Muhammad, T., Mattox, T.I. (1995). Compiler techniques for fine-grain execution on workstation clusters using PAPERS. In: Pingali, K., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1994. Lecture Notes in Computer Science, vol 892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0025869
Print ISBN: 978-3-540-58868-9
Online ISBN: 978-3-540-49134-7