Shrinking L1 Instruction Caches to Improve Energy–Delay in SMT Embedded Processors

Ferrerón-Labari, Alexandra; Ortín-Obón, Marta; Suárez-Gracia, Darío; Alastruey-Benedé, Jesús; Viñals-Yúfera, Víctor

doi:10.1007/978-3-642-36424-2_22

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7767)

Included in the following conference series:

International Conference on Architecture of Computing Systems

Abstract

Instruction caches account for a high percentage of chip energy consumption, which makes them a critical issue for battery-powered embedded devices. We can potentially reduce the energy consumption of the first-level instruction cache (L1-I) by decreasing its size and associativity. However, demanding applications may then suffer a dramatic performance degradation, especially in superscalar multithreaded processors, where multiple threads access the L1-I in each cycle to fetch instructions.

We introduce iLP-NUCA (Instruction Light Power NUCA), a new instruction cache that replaces the conventional L2 and improves the Energy-Delay of the system. iLP-NUCA adds a new tree-based transport network topology that reduces latency and energy consumption compared with previous LP-NUCA implementations.

With iLP-NUCA, we reduce the size of the L1-I while outperforming conventional cache hierarchies and reducing overall energy consumption, regardless of the number of threads.

Author information

Authors and Affiliations

gaZ—DIIS—I3A, Universidad de Zaragoza, Spain
Alexandra Ferrerón-Labari, Marta Ortín-Obón, Darío Suárez-Gracia, Jesús Alastruey-Benedé & Víctor Viñals-Yúfera

Editor information

Editors and Affiliations

FIT, Czech Technical University, Thákurova 9, 160 00, Prague 6, Czech Republic
Hana Kubátová
Elektrotechnik und Informationstechnik, TU Darmstadt, Merckstraße 25, 64283, Darmstadt, Germany
Christian Hochberger
Department of Signal Processing, Institute of Information Theory and Automation, Pod Vodárenskou věží 4, 18208, Prague 8, Czech Republic
Martin Daněk
Intelligent Embedded Systems, University of Kassel, Wilhelmshöher Allee 73, 34121, Kassel, Germany
Bernhard Sick

About this paper

Cite this paper

Ferrerón-Labari, A., Ortín-Obón, M., Suárez-Gracia, D., Alastruey-Benedé, J., Viñals-Yúfera, V. (2013). Shrinking L1 Instruction Caches to Improve Energy–Delay in SMT Embedded Processors. In: Kubátová, H., Hochberger, C., Daněk, M., Sick, B. (eds) Architecture of Computing Systems – ARCS 2013. ARCS 2013. Lecture Notes in Computer Science, vol 7767. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36424-2_22

DOI: https://doi.org/10.1007/978-3-642-36424-2_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36423-5
Online ISBN: 978-3-642-36424-2
eBook Packages: Computer Science, Computer Science (R0)
