Abstract
Scalar performance on processors with instruction-level parallelism (ILP) is often limited by control and data dependences. This paper describes a family of compiler techniques, called Critical Path Reduction (CPR) techniques, which reduce the length of critical paths through control and data dependences. Control CPR reduces the number of branches on the critical path and improves the performance of branch-intensive codes on processors with inadequate branch throughput or excessive branch latency. Data CPR reduces the number of arithmetic operations on the critical path. Optimization and scheduling are adapted to support CPR.
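To make the idea of shortening a data-dependence critical path concrete, the following minimal sketch compares a left-to-right sum with a reassociated, balanced sum of the same operands. It is only an illustration of height reduction by reassociation, not the transformation described in the paper; the helper names `chain`, `balanced`, `height`, and `evaluate` are hypothetical, and expressions are modelled as nested tuples purely for this example.

```python
from functools import reduce

def chain(operands):
    """Left-to-right sum ((a + b) + c) + d, built as nested tuples."""
    return reduce(lambda acc, x: ("+", acc, x), operands)

def balanced(operands):
    """Reassociated sum (a + b) + (c + d), built by splitting in half."""
    if len(operands) == 1:
        return operands[0]
    mid = len(operands) // 2
    return ("+", balanced(operands[:mid]), balanced(operands[mid:]))

def height(expr):
    """Number of dependent additions on the longest path to the result."""
    if not isinstance(expr, tuple):
        return 0  # a leaf operand contributes no operation
    return 1 + max(height(expr[1]), height(expr[2]))

def evaluate(expr):
    """Compute the value of an expression tree."""
    if not isinstance(expr, tuple):
        return expr
    return evaluate(expr[1]) + evaluate(expr[2])

values = [1, 2, 3, 4, 5, 6, 7, 8]
serial = chain(values)
tree = balanced(values)

print(height(serial), height(tree))      # critical-path lengths
print(evaluate(serial), evaluate(tree))  # both forms give the same sum
```

Both forms perform the same number of additions; only the longest chain of dependent additions changes, which is what matters when several operations can issue in parallel. The sketch uses integers, where reassociation preserves the result exactly; with floating-point operands, reordering additions can change rounding.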
Cite this article
Schlansker, M., Kathail, V. Techniques for critical path reduction of scalar programs. Int J Parallel Prog 25, 147–181 (1997). https://doi.org/10.1007/BF02700034