Abstract
Instruction caches are responsible for a high percentage of the chip energy consumption, becoming a critical issue for battery-powered embedded devices. We can potentially reduce the energy consumption of the first level instruction cache (L1-I) by decreasing its size and associativity. However, demanding applications may suffer a dramatic performance degradation, specially in superscalar multi-threaded processors, where, in each cycle, multiple threads access the L1-I to fetch instructions.
We introduce iLP-NUCA (Instruction Light Power NUCA), a new instruction cache that substitutes the conventional L2, improving the Energy-Delay of the system. iLP-NUCA adds a new tree-based transport network topology that reduces latency and energy consumption, regarding former LP-NUCA implementations.
With iLP-NUCA we reduce the size of the L1-I outperforming conventional cache hierarchies, and reducing the overall consumption, independently of the number of threads.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Tullsen, D.M., Eggers, S.J., Levy, H.M.: Simultaneous multithreading: maximizing on-chip parallelism. In: Proc. of the 22 nd Ann. Int. Symp. on Comp. Arch., pp. 392–403 (1995)
Montanaro, J., Witek, R., Anne, K., Black, A., Cooper, E., Dobberpuhl, D., Donahue, P., Eno, J., Farell, A., Hoeppner, G., Kruckemyer, D., Lee, T., Lin, P., Madden, L., Murray, D., Pearce, M., Santhanam, S., Snyder, K., Stephany, R., Thierauf, S.: A 160 MHz 32 b 0.5 W CMOS RISC microprocessor. In: Proc. of 1996 IEEE Int. Solid-State Circuits Conference Digest of Technical Papers, pp. 214–215, 447 (1996)
Segars, S.: Low power design techniques for microprocessors. ISSCC Tutorial note (February 2001)
Gwennap, L.: What’s inside the Krait. Microprocessor Report 26, 1–9 (2012)
Sundararajan, K.T., Jones, T.M., Topham, N.: Smart cache: A self adaptive cache architecture for energy efficiency. In: Proc. of the Int. Conference on Embedded Comp. Systems: Architectures, Modeling, and Simulation, pp. 41–50 (July 2011)
Zhang, C., Vahid, F., Najjar, W.: A highly configurable cache for low energy embedded systems. ACM Trans. Embed. Comput. Syst. 4, 363–387 (2005)
Bellas, N., Hajj, I., Polychronopoulos, C., Stamoulis, G.: Architectural and compiler techniques for energy reduction in high-performance microprocessors. IEEE Trans. on Very Large Scale Integration Systems 8, 317–326 (2000)
Kin, J., Gupta, M., Mangione-Smith, W.: The filter cache: an energy efficient memory structure. In: Proc. of the 30th Ann. IEEE/ACM Int. Symp. on Microarchitecture, pp. 184–193 (1997)
Suárez, D., Dimitrakopoulos, G., Monreal, T., Katevenis, M.G.H., Viñals, V.: LP-NUCA: Networks-in-cache for high- performance low-power embedded processors. IEEE Trans. on Very Large Scale Integration Systems 20, 1510–1523 (2012)
LSI Corporation: PowerPCTM processor (476FP) embedded core product brief (January 2010), http://www.lsi.com/DistributionSystem/AssetDocument/PPC476FP-PB-v7.pdf
Halfhill, T.R.: Netlogic broadens XLP family. Microprocessor Report 24, 1–11 (2010)
Byrne, J.: Freescale drops quad-core threshold. Microprocessor Report 26, 10–12 (2012)
Austin, T., Burger, D.: The simplescalar tool set, version 2.0. Technical Report CS-TR-97-1342, University of Wisconsin Madison (1997)
Muralimanohar, N., Balasubramonian, R., Jouppi, N.: Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0. In: Proc. of the 40th Ann. IEEE/ACM Int. Symp. on Microarchitecture, pp. 3–14 (2007)
Henning, J.L.: SPEC CPU2006 benchmark descriptions. SIGARCH Comput. Archit. News 34, 1–17 (2006)
Hamerly, G., Perelman, E., Lau, J., Calder, B.: SimPoint 3.0: Faster and more flexible program analysis. Journal of Instruction Level Parallelism (2005)
Suárez, D., Monreal, T., Viñals, V.: A comparison of cache hierarchies for SMT processors. In: Proc. of the 22nd Jornadas de Paralelismo (2011)
Wackerly, D., Mendenhall, W., Scheaffer, R.L.: Mathematical Statistics with Applications, 7th edn. Brooks/Cole Cengage Learning (2008)
Gabor, R., Weiss, S., Mendelson, A.: Fairness and throughput in switch on event multithreading. In: Proc. of the 39th Ann. IEEE/ACM Int. Symp. on Microarchitecture, pp. 149–160 (2006)
Li, Y., Brooks, D., Hu, Z., Skadron, K., Bose, P.: Understanding the energy efficiency of simultaneous multithreading. In: Proc. of the 2004 Int. Symp. on Low Power Electronics and Design, pp. 44–49 (2004)
Yang, C.L., Lee, C.H.: Hotspot cache: joint temporal and spatial locality exploitation for i-cache energy reduction. In: Proc. of the 2004 Int. Symp. on Low Power Electronics and Design, pp. 114–119 (2004)
Albonesi, D.H.: Selective cache ways: on-demand cache resource allocation. In: Proc. of the 32nd Ann. ACM/IEEE Int. Symp. on Microarchitecture, pp. 248–259 (1999)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ferrerón-Labari, A., Ortín-Obón, M., Suárez-Gracia, D., Alastruey-Benedé, J., Viñals-Yúfera, V. (2013). Shrinking L1 Instruction Caches to Improve Energy–Delay in SMT Embedded Processors. In: Kubátová, H., Hochberger, C., Daněk, M., Sick, B. (eds) Architecture of Computing Systems – ARCS 2013. ARCS 2013. Lecture Notes in Computer Science, vol 7767. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36424-2_22
Download citation
DOI: https://doi.org/10.1007/978-3-642-36424-2_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36423-5
Online ISBN: 978-3-642-36424-2
eBook Packages: Computer ScienceComputer Science (R0)