Abstract
Instruction level parallelism has become more and more important in today's microprocessor design. These microprocessors have multiple function units, which can execute more than one instruction at the same machine cycle to enhance the uniprocessor performance. Since the function units are usually pipelined in such microprocessors, branch misprediction penalty tremendously degrades the CPU performance. In order to reduce the branch misprediction penalty, predicated operation has been introduced in such microprocessor design as one of the new architectural features, which allows compilers to remove branches from programs.
The compiler converts programs with conditional branch to predicated code on these microprocessors with predicate operation support by using comparison instructions to set up Boolean predicates corresponding to branch conditions. Instructions previously guarded by a branch are guarded by a predicate. Predicates guard execution by either executing or nullifying the instruction according to the predicate's value.
In this report we present an algorithm to convert a control flow graph into predicated code. The algorithm shown in this report not only minimizes the number of predicates used for basic blocks in the control flow, but also moves the predicate assignments as early as possible to relax dependence constrains introduced by the if-conversion for the later phases of compiler such as the instruction scheduler and register allocator. The algorithm doesn't simply assign the predicate associated with a basic block to the instructions in the basic block. It assigns the predicate associated with the “farthest” dominator basic block to an instruction, to which the instruction can be speculatively moved without introducing compensation code in the instruction scheduling.
An optimization algorithm for if-converted code is discussed in the report as well. The algorithm is specially developed for predicated code optimization, which is different from the traditional optimizations. Common subexpression hoisting and sinking employs flow analysis, that extends the traditional common subexpression elimination technique. This algorithm requires traditional flow analysis information such as dominate and post-dominate relationship. Actually, the algorithm can be generalized on predicated code after if-conversion, which is particularly beneficial for single assignment DAG representation of the predicated code.
Preview
Unable to display preview. Download preview PDF.
References
J.A.Fisher, “Trace scheduling: A technique for global microcode compaction,” IEEE Transactions on Computers, vol. c-30, pp.478–490, July 1981.
P.G. Lowney, S.M. Freudenberger, T.J. Karzes, W.D. Lichtenstein, R.P. Nix, J.S. O'Donnell, and J.C. Ruttenberg, “The Multiflow trace scheduling compiler”, The Journal of Supercomputing, 1992.
B.R.Rau,D.W.L.Yen,W.Yen,and R.A.Towle, “The Cydra 5 departmental supercomputer,” IEEE Computer, pp.12–35, January 1989.
J.C.Dehnert and R.A.Towle, “Compiling for the Cydra 5”, The Journal of Supercomputing, 1992.
P.P.Chang, S.A.Mahlke, W.Y.Chen, N.J.Warter, and W.W.Hwu, “IMPACT: an architectural framework for multiple-instruction-issue processors”, In Proceedings 18th Annual International Symposium on Computer Architecture, Toronto, Canada, pp.266–275, May 1991.
A.Nicolau, “Percolation scheduling: a parallel compilation technique”, Technical Report TR 85-678, Department of Computer Science, Cornell, 1985.
D.W.Wall, “Limits of instruction-level parallelism”, Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems, pp.176–188, April 1991.
M.Butler,T.Y.Yeh,Y.Patt,M.Alsup,H.Scales,and M.Shebanow, “Single instruction stream parallelism is greater than two”, Proceedings of the 18th International Symposium on Computer Architecture, pp.276–286, May 1991.
S.Mahlke, R.Hank, R.Bringmann, J.Gyllenhaal, D.Gallagher, and W.M.Hwu, “Predicated Execution Support for Superscalar Processor', to be appear.
P.Tirumalai, M.Lee, and M.S.Schlansker, “Parallelization of loops with exits on pipelined architectures”, In Proceedings Supercomputing '90, pp.200–212, November 1990.
J.R.Allen,K.Kennedy,C.Porterfield,and J.Warren, “Conversion of control dependence to data dependence”, in Proceedings of the 10th ACM Symposium on Principles of Programming Languages, pp.177–189, January 1983.
J.C.H.Park and M.Schlansker, “On Predicated Execution”, Tech. Rep. HPL-91-58, Hewlett-Packard Software System Laboratory, May 1991.
A.Aho, R.Sethi, and J.Ullman, “Compilers:Prindples,Techniques,and Tools”, Reading MA: Addison-Wesley, 1988.
Hewlett-Packard Co. PA-RISC 1.1 Architecture and Instruction Set Reference Manual, Cupertino, CA: Hewlett-Packard Co. 1990.
J.Ferrante, K.J.Ottenstein, and J.D.Warren, “The program dependence graph and its use in optimization”, ACM Transactions on Programming Languages and Systems, vol.9, pp.319–349, July 1987.
N.Water, et al, “Enhanced modulo scheduling for loops with conditional branches”, Proceedings of the 25th Annual International Symposium on Microarchitecture, Portland, Oregon, 1992.
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fang, J.Z. (1997). Compiler algorithms on if-conversion, speculative predicates assignment and predicated code optimizations. In: Sehr, D., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1996. Lecture Notes in Computer Science, vol 1239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0017250
Download citation
DOI: https://doi.org/10.1007/BFb0017250
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63091-3
Online ISBN: 978-3-540-69128-0
eBook Packages: Springer Book Archive