Communication-free parallelization via affine transformations

Lim, Amy W.; Lam, Monica S.

doi:10.1007/BFb0025873

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 892)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

The paper describes a parallelization algorithm for programs consisting of arbitrary nestings of loops and sequences of loops. The code produced by our algorithm yields all the degrees of communication-free parallelism that can be obtained via loop fission, fusion, interchange, reversal, skewing, scaling, reindexing and statement reordering. The algorithm first assigns the iterations of instructions in the program to processors via affine processor mappings, then generates the correct code by ensuring that the code executed by each processor is a subsequence of the original sequential execution sequence.
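As an illustration of the idea only, and not the paper's algorithm, the following minimal sketch applies a hand-picked affine processor mapping to a toy loop nest. The loop body `a[i][j] = a[i-1][j-1] + 1`, the mapping `p = i - j`, and the names `sequential`, `processor_of`, and `partitioned` are all assumptions made for this example. Each iteration reads only data written by an iteration with the same value of `i - j`, so no processor needs data from another, and each processor runs its iterations in their original relative order.

```python
from collections import defaultdict


def sequential(n):
    """Reference execution of the toy loop nest in its original order."""
    a = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            a[i][j] = a[i - 1][j - 1] + 1
    return a


def processor_of(i, j):
    """Hypothetical affine processor mapping p(i, j) = i - j."""
    return i - j


def partitioned(n):
    """Run the same iterations grouped by processor.

    Iterations are appended in sequential order, so each processor's
    list is a subsequence of the original execution sequence.
    """
    work = defaultdict(list)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            work[processor_of(i, j)].append((i, j))

    a = [[0] * (n + 1) for _ in range(n + 1)]
    # Processors are visited in an arbitrary order to mimic independent
    # execution; the result does not depend on this order.
    for p in sorted(work, reverse=True):
        for i, j in work[p]:
            a[i][j] = a[i - 1][j - 1] + 1
    return a


if __name__ == "__main__":
    print(sequential(5) == partitioned(5))
```

The dependence in this toy loop has distance (1, 1), and `i - j` is constant along that direction, which is why the partition needs no communication. A different loop body would generally need a different mapping, or might admit none.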

This research was supported in part by DARPA contract DABT63-91-K-0003 and an NSF Young Investigator award.

Author information

Authors and Affiliations

Computer Systems Laboratory, Stanford University, Stanford, CA 94305
Amy W. Lim & Monica S. Lam

Editor information

Keshav Pingali, Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua

About this paper

Cite this paper

Lim, A.W., Lam, M.S. (1995). Communication-free parallelization via affine transformations. In: Pingali, K., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1994. Lecture Notes in Computer Science, vol 892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0025873

DOI: https://doi.org/10.1007/BFb0025873
Published: 09 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58868-9
Online ISBN: 978-3-540-49134-7
eBook Packages: Springer Book Archive
