Quantifying the Multi-Level Nature of Tiling Interactions

Mitchell, Nicholas; Högstedt, Karin; Carter, Larry; Ferrante, Jeanne

doi:10.1023/A:1018782528453

Published: December 1998

Volume 26, pages 641–670, (1998)

International Journal of Parallel Programming

Abstract

Optimizations, including tiling, often target a single level of memory or parallelism, such as cache. These optimizations usually operate on a level-by-level basis, guided by a cost function parameterized by features of that single level. The benefit of optimizations guided by these one-level cost functions decreases as architectures tend towards a hierarchy of memory and of parallelism. We have identified three common architectural scenarios where a single tiling choice could be improved by using information from multiple levels in concert. For each scenario, we derive multi-level cost functions which guide the optimal choice of tile size and shape, and quantify the improvement gained. We give both analysis and simulation results to support our points.
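For readers unfamiliar with the transformation the abstract refers to, the following is a minimal sketch of loop tiling applied to matrix multiplication. It is an illustration only and is not taken from the article: the function names `matmul_naive` and `matmul_tiled` and the single square `tile` parameter are assumptions made for this example, and the article's multi-level cost functions for choosing tile size and shape are not reproduced here.

```python
def matmul_naive(a, b, n):
    """Reference triple loop: c = a * b for n x n matrices (lists of lists)."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile):
    """Same computation, with each loop split into a tile loop and an
    intra-tile loop. `tile` is a hypothetical square tile edge; a one-level
    cost function would pick it from the parameters of a single memory level.
    """
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):          # tile loops walk tile origins
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                # Intra-tile loops stay inside one tile; min() clips the
                # partial tiles at the edge when tile does not divide n.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        acc = c[i][j]
                        for k in range(kk, min(kk + tile, n)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c
```

Both functions compute the same result for any positive `tile`; only the order in which the iteration space is visited changes, which is why tile size and shape can be chosen purely on cost grounds.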




Mitchell, N., Högstedt, K., Carter, L. et al. Quantifying the Multi-Level Nature of Tiling Interactions. International Journal of Parallel Programming 26, 641–670 (1998). https://doi.org/10.1023/A:1018782528453


