Abstract
This paper describes a parallelization algorithm for programs consisting of arbitrary nestings of loops and sequences of loops. The code produced by our algorithm yields all the degrees of communication-free parallelism that can be obtained via loop fission, fusion, interchange, reversal, skewing, scaling, reindexing, and statement reordering. The algorithm first assigns the iterations of instructions in the program to processors via affine processor mappings. It then generates correct code by ensuring that the code executed by each processor is a subsequence of the original sequential execution sequence.
This research was supported in part by DARPA contract DABT63-91-K-0003 and an NSF Young Investigator award.
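As a rough illustration of what the abstract means by an affine processor mapping, the sketch below uses a hand-picked example that is not taken from the paper. It assumes a single statement `a[i][j] = a[i-1][j-1] + 1` in a doubly nested loop and a hypothetical mapping `p = i - j`. The function names (`processor_of`, `partitioned`, `is_communication_free`) are invented for this example, and the mapping is chosen by inspection rather than derived by the paper's algorithm. Each simulated processor visits its own iterations in the original loop order, so what it executes is a subsequence of the sequential execution.

```python
def sequential(n):
    """Reference execution: the original doubly nested loop."""
    a = [[0] * (n + 1) for _ in range(n + 1)]  # row 0 and column 0 are input data
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            a[i][j] = a[i - 1][j - 1] + 1
    return a


def processor_of(i, j):
    """Hypothetical affine processor mapping p = i - j (assumed, not from the paper)."""
    return i - j


def partitioned(n, order=None):
    """Simulate one processor at a time; each keeps the original iteration order."""
    a = [[0] * (n + 1) for _ in range(n + 1)]
    processors = list(range(-(n - 1), n)) if order is None else list(order)
    for p in processors:
        # The processor scans the original loop nest and runs only its own iterations,
        # so its execution is a subsequence of the sequential one.
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                if processor_of(i, j) == p:
                    a[i][j] = a[i - 1][j - 1] + 1
    return a


def is_communication_free(n):
    """Check that every value read was written on the same processor (or is input data)."""
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            src_i, src_j = i - 1, j - 1
            if src_i >= 1 and src_j >= 1:  # the source is a computed element, not input
                if processor_of(src_i, src_j) != processor_of(i, j):
                    return False
    return True
```

Because no processor reads a value written by another, the simulated processors can run in any order, for example `partitioned(5, order=range(4, -5, -1))`, and still reproduce the result of `sequential(5)`.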
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lim, A.W., Lam, M.S. (1995). Communication-free parallelization via affine transformations. In: Pingali, K., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1994. Lecture Notes in Computer Science, vol 892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0025873
DOI: https://doi.org/10.1007/BFb0025873
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58868-9
Online ISBN: 978-3-540-49134-7
eBook Packages: Springer Book Archive