Abstract
Just a few years ago, parallel computers were tightly coupled SIMD, VLIW, or MIMD machines. Now, they are clusters of workstations connected by communication networks of ever-higher bandwidth (e.g., Ethernet, FDDI, HiPPI, ATM). For these clusters, compiler research centers on techniques for hiding large synchronization and communication latencies; in general, on making parallel programs based on fine-grain aggregate operations fit an existing network execution model that is optimized for point-to-point block transfers.
In contrast, we suggest that the network execution model can and should be altered to more directly support fine-grain aggregate operations. By augmenting workstation hardware with a simple barrier mechanism (PAPERS: Purdue's Adapter for Parallel Execution and Rapid Synchronization), and with appropriate operating system hooks for its direct use from user processes, the user is given a variety of efficient aggregate operations and the compiler is provided with a more static (i.e., more predictable), lower-latency target execution model. This paper centers on compiler techniques that use this new target model to achieve more efficient parallel execution: first, techniques that statically schedule aggregate operations across processors; second, techniques that implement SIMD and VLIW execution.
This work was supported in part by the Office of Naval Research (ONR) under grant number N00014-91-J-4013 and by the National Science Foundation (NSF) under award number 9015696-CDA.
© 1995 Springer-Verlag Berlin Heidelberg
Dietz, H.G., Cohen, W.E., Muhammad, T., Mattox, T.I. (1995). Compiler techniques for fine-grain execution on workstation clusters using PAPERS. In: Pingali, K., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1994. Lecture Notes in Computer Science, vol 892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0025869
Print ISBN: 978-3-540-58868-9
Online ISBN: 978-3-540-49134-7