
Compute-in-Memory Architecture

  • Living reference work entry
Handbook of Computer Architecture

Abstract

In the era of big data and artificial intelligence, hardware advances in throughput and energy efficiency are essential for both cloud and edge computing. By merging the data storage and computing units, compute-in-memory (CIM) has become a desirable choice for data-centric applications, as it mitigates the memory wall bottleneck of the von Neumann architecture. In this chapter, recent architectural designs and the underlying circuit/device technologies for compute-in-memory are surveyed. The related design challenges and prospects are also discussed to provide an in-depth understanding of the interactions between algorithms/architectures and circuits/devices. The chapter is organized hierarchically: an overview of the field (Introduction section); the principle of compute-in-memory (section “DNN Basics and Corresponding CIM Principle”); the latest architecture and algorithm techniques, including network models, data flow, pipeline design, and quantization approaches (section “Architecture and Algorithm Techniques for CIM”); the related hardware support, including embedded memory technologies such as static random access memory and emerging nonvolatile memories, as well as peripheral circuit designs with a focus on analog-to-digital converters (section “Hardware Implementations for CIM Architecture”); and a summary and outlook for compute-in-memory architecture (Conclusion section).
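
The compute-in-memory principle that the chapter builds on is to perform a DNN layer's multiply-and-accumulate (MAC) operations inside the memory array itself: weights are stored as cell states (e.g., conductances), input activations drive the rows, and the currents accumulated on each column form the dot products, which analog-to-digital converters then digitize. As a rough illustration of this flow, the Python sketch below models one crossbar sub-array behaviorally; the function names, bit widths, and array sizes are hypothetical choices for illustration, not taken from the chapter.

```python
import numpy as np

def quantize(x, n_bits, x_max):
    """Uniformly quantize x to 2**n_bits levels over [0, x_max]."""
    levels = 2 ** n_bits - 1
    return np.round(np.clip(x, 0.0, x_max) / x_max * levels) / levels * x_max

def cim_mac(activations, weights, w_bits=4, adc_bits=6):
    """Behavioral model of one CIM crossbar sub-array computing MACs."""
    # Program weights into discrete conductance levels (w_bits per cell).
    g = quantize(weights, w_bits, weights.max())
    # Analog accumulation: each column current sums the input * conductance
    # contributions along its rows (Kirchhoff's current law on the bitline).
    column_currents = activations @ g
    # The ADC digitizes the analog partial sums with limited resolution.
    return quantize(column_currents, adc_bits, column_currents.max())

rng = np.random.default_rng(0)
x = rng.random(128)            # activations applied to 128 rows (wordlines)
w = rng.random((128, 64))      # a 128 x 64 weight sub-array
print(cim_mac(x, w)[:4])       # digitized partial sums of the first 4 columns
```

A real macro would additionally handle signed weights, multi-bit input encoding (e.g., bit-serial pulses), and partial-sum accumulation across sub-arrays, which the chapter's architecture and hardware sections discuss.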

Author information

Corresponding author

Correspondence to Shimeng Yu.

Copyright information

© 2023 Springer Nature Singapore Pte Ltd.

About this entry

Cite this entry

Jiang, H., Huang, S., Yu, S. (2023). Compute-in-Memory Architecture. In: Chattopadhyay, A. (eds) Handbook of Computer Architecture. Springer, Singapore. https://doi.org/10.1007/978-981-15-6401-7_62-1

  • DOI: https://doi.org/10.1007/978-981-15-6401-7_62-1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-6401-7

  • Online ISBN: 978-981-15-6401-7

  • eBook Packages: Springer Reference Engineering, Reference Module Computer Science and Engineering
