Is Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors?

Jiang, Yunlian; Zhang, Eddy Z.; Tian, Kai; Shen, Xipeng

doi:10.1007/978-3-642-11970-5_15


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6011)

Included in the following conference series:

International Conference on Compiler Construction


Abstract

On Chip Multiprocessors (CMP), multiple cores commonly share certain levels of cache. The sharing increases contention for cache space and memory-to-chip bandwidth, further highlighting the importance of data locality analysis.

As a rigorous and hardware-independent locality metric, reuse distance has served a variety of locality analyses, program transformations, and performance predictions. However, previous studies have concentrated on sequential programs running on unicore processors. On CMP, accesses by different threads (or jobs) interact in the shared cache. How reuse distance applies to the new architecture remains an open question: in particular, how the interactions in shared cache affect the collection and application of reuse distance, and how reuse-distance-based locality analysis should adapt to such architecture changes.
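To make the metric concrete, the following is a minimal sketch of the traditional (single-thread) definition: the reuse distance of an access is the number of distinct addresses touched since the previous access to the same address. The function names `reuse_distances` and `lru_misses` are illustrative and do not come from the paper, and the miss count assumes a fully associative LRU cache measured in blocks, which is the standard setting in which reuse distance predicts misses.

```python
def reuse_distances(trace):
    """Return the reuse distance of every access in `trace`.

    A first-time access has no previous use, so it gets None
    (an infinite distance).
    """
    stack = []       # LRU stack: most recently used address is last
    distances = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            # Addresses above `addr` in the stack are exactly the distinct
            # addresses touched since its previous access.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)
        stack.append(addr)
    return distances


def lru_misses(trace, cache_blocks):
    """Count misses in a fully associative LRU cache of `cache_blocks` blocks.

    Assumption of this sketch: an access misses if it is a first-time
    access or if its reuse distance is at least the cache capacity.
    """
    return sum(
        1 for d in reuse_distances(trace) if d is None or d >= cache_blocks
    )


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "b"]
    print(reuse_distances(trace))
    print(lru_misses(trace, cache_blocks=2), lru_misses(trace, cache_blocks=3))
```

The linear stack search keeps the sketch short; it is not an efficient measurement method for long traces.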

This paper presents our explorations towards answering those questions. It first introduces the concept of concurrent reuse distance, a direct extension of the traditional concept of reuse distance in which the data references of all co-running threads (or jobs) are considered. It then discusses the properties of concurrent reuse distance, revealing the special challenges facing its collection and application on CMP platforms. Finally, it presents solutions to those challenges for a class of multithreading applications. The solutions center on a probabilistic model that connects concurrent reuse distance with the data locality of each individual thread. Experiments demonstrate the effectiveness of the proposed techniques in facilitating the use of concurrent reuse distance for CMP computing.
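The idea of counting references from all co-running threads can be illustrated with a small sketch. It is not the paper's measurement method or its probabilistic model: the round-robin interleaving is an assumption made only for illustration (real interleavings depend on timing), and the names `interleave` and `concurrent_reuse_distances` are hypothetical.

```python
from itertools import zip_longest


def reuse_distances(trace):
    """Distinct addresses touched since the previous access to each address."""
    stack, distances = [], []   # stack keeps addresses in LRU order
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)   # first access: infinite distance
        stack.append(addr)
    return distances


def interleave(*thread_traces):
    """Merge per-thread traces round-robin (an illustrative assumption)."""
    merged = []
    for step in zip_longest(*thread_traces):
        for tid, addr in enumerate(step):
            if addr is not None:     # this thread's trace has ended
                merged.append((tid, addr))
    return merged


def concurrent_reuse_distances(*thread_traces):
    """Reuse distances measured on the merged trace, grouped by thread."""
    merged = interleave(*thread_traces)
    dists = reuse_distances([addr for _, addr in merged])
    per_thread = [[] for _ in thread_traces]
    for (tid, _), d in zip(merged, dists):
        per_thread[tid].append(d)
    return per_thread


if __name__ == "__main__":
    t0 = ["a", "b", "a"]
    print(reuse_distances(t0))                                   # thread 0 alone
    print(concurrent_reuse_distances(t0, ["x", "y", "z"])[0])    # disjoint data
    print(concurrent_reuse_distances(t0, ["b", "a", "b"])[0])    # shared data
```

In this toy example, a co-runner touching disjoint data lengthens thread 0's reuse of `a`, while a co-runner touching the same data shortens thread 0's distances, which is the kind of interaction in shared cache that the abstract refers to.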



Author information

Authors and Affiliations

Computer Science Department, The College of William and Mary, Williamsburg, VA, USA
Yunlian Jiang, Eddy Z. Zhang, Kai Tian & Xipeng Shen


Editor information

Editors and Affiliations

Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
Rajiv Gupta


About this paper

Cite this paper

Jiang, Y., Zhang, E.Z., Tian, K., Shen, X. (2010). Is Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors?. In: Gupta, R. (eds) Compiler Construction. CC 2010. Lecture Notes in Computer Science, vol 6011. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11970-5_15

DOI: https://doi.org/10.1007/978-3-642-11970-5_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11969-9
Online ISBN: 978-3-642-11970-5
eBook Packages: Computer Science, Computer Science (R0)
