Abstract
The performance of linear relaxation codes depends strongly on efficient use of caches. This paper considers one time step of the Jacobi and Gauß-Seidel kernels on a 3D array and shows that tiling reduces the number of capacity misses to nearly the optimum. In particular, we prove that Ω(N³/(L√C)) capacity misses are needed for array size N × N × N, cache size C, and line size L. If cold misses are taken into account, tiling is off the lower bound by a factor of about 1 + 5/√(LC). The exact value depends on tile size and data layout. We show analytically that, for row-major storage order, rectangular tiles of shape (N − 2) × s × (sL/2) outperform square tiles.
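To make the tiling idea concrete, the following is a minimal Python sketch of one Jacobi time step on an N × N × N grid, once untiled and once with rectangular tiles that span the full interior in one dimension. It is an illustration under stated assumptions, not the paper's implementation. The 6-point averaging stencil, the choice of which axis stays untiled, the mapping of the tile extents s and sL/2 onto the two remaining axes, and all function names are assumptions made here. The two small helpers at the end only evaluate the expressions quoted in the abstract; the Ω bound hides a constant factor, and C and L are assumed to be measured in the same units.

```python
import math


def make_grid(n):
    """Build an n x n x n nested-list grid with deterministic values."""
    return [[[float((i * 7 + j * 3 + k) % 11) for k in range(n)]
             for j in range(n)] for i in range(n)]


def _stencil(a, i, j, k):
    # Assumption: 6-point nearest-neighbour average; the abstract does not
    # specify the stencil coefficients.
    return (a[i - 1][j][k] + a[i + 1][j][k] +
            a[i][j - 1][k] + a[i][j + 1][k] +
            a[i][j][k - 1] + a[i][j][k + 1]) / 6.0


def jacobi_step(a):
    """Untiled reference: one Jacobi sweep over the interior points."""
    n = len(a)
    b = [[row[:] for row in plane] for plane in a]  # boundary is copied
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                b[i][j][k] = _stencil(a, i, j, k)
    return b


def jacobi_step_tiled(a, tile_j, tile_k):
    """Tiled sweep: tiles cover all N - 2 interior planes along i and are
    tile_j x tile_k in the other two dimensions (k is the contiguous index
    in row-major order)."""
    n = len(a)
    b = [[row[:] for row in plane] for plane in a]
    for jj in range(1, n - 1, tile_j):
        for kk in range(1, n - 1, tile_k):
            for i in range(1, n - 1):
                for j in range(jj, min(jj + tile_j, n - 1)):
                    for k in range(kk, min(kk + tile_k, n - 1)):
                        b[i][j][k] = _stencil(a, i, j, k)
    return b


def tile_extents(s, line_size):
    """Assumed reading of the (N - 2) x s x (sL/2) shape: extent s along j
    and s * L / 2 along the contiguous k axis."""
    return s, max(1, (s * line_size) // 2)


def miss_growth_term(n, cache_size, line_size):
    """Growth term N^3 / (L * sqrt(C)) of the lower bound, without the
    constant hidden by the Omega notation."""
    return n ** 3 / (line_size * math.sqrt(cache_size))


def tiling_overhead_factor(cache_size, line_size):
    """Approximate factor 1 + 5 / sqrt(L * C) quoted in the abstract."""
    return 1 + 5 / math.sqrt(line_size * cache_size)
```

Because Jacobi reads only the old array, the tiled and untiled sweeps compute identical values; tiling changes only the order in which points are visited, which is what affects cache reuse. A Gauß-Seidel sweep updates in place, so reordering it needs more care and is not covered by this sketch.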
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Leopold, C. (2002). Tight Bounds on Capacity Misses for 3D Stencil Codes. In: Sloot, P.M.A., Hoekstra, A.G., Tan, C.J.K., Dongarra, J.J. (eds) Computational Science — ICCS 2002. ICCS 2002. Lecture Notes in Computer Science, vol 2329. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46043-8_85
DOI: https://doi.org/10.1007/3-540-46043-8_85
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43591-4
Online ISBN: 978-3-540-46043-5
eBook Packages: Springer Book Archive