Abstract
The graphics processing unit (GPU) has become an indispensable computing engine for high-performance computing. With massive parallelism and easy programmability, GPUs have been quickly adopted by emerging computing domains including gaming, artificial intelligence, security, and virtual reality. Given this success in the market, GPU execution and GPU architecture are now essential topics in parallel computing. The goal of this chapter is to provide readers with a basic understanding of GPU architecture and its programming model. The chapter explores the historical background of current GPU architecture; the basics of various programming interfaces; core architecture components such as the shader pipeline, schedulers, and the memories that support SIMT execution; the types of GPU device memories and their performance characteristics; and examples of optimal data mapping to memories. Several recent studies that have advanced GPU architecture in terms of performance, energy efficiency, and reliability are also discussed.
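The SIMT programming model that the chapter summarizes can be sketched with a minimal CUDA vector-add kernel. This is a generic illustration, not code from the chapter: each thread computes one output element, and the grid/block/thread hierarchy maps threads onto the data. Unified memory (`cudaMallocManaged`) is used only to keep the sketch short; explicit host/device copies are the more general pattern.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Each thread handles one element; threads within a warp execute
// this kernel in lockstep under the SIMT execution model.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard partial last block
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // ceil(n / 256)
    vecAdd<<<blocks, threadsPerBlock>>>(a, b, c, n);
    cudaDeviceSynchronize();  // wait for the asynchronous kernel launch

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The launch configuration (256 threads per block here) is a tuning choice; the chapter's discussion of schedulers and memory mapping explains why such choices affect occupancy and memory coalescing.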
Copyright information
© 2023 Springer Nature Singapore Pte Ltd.
Cite this entry
Jeon, H. (2023). GPU Architecture. In: Chattopadhyay, A. (eds) Handbook of Computer Architecture. Springer, Singapore. https://doi.org/10.1007/978-981-15-6401-7_66-2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6401-7
Online ISBN: 978-981-15-6401-7
eBook Packages: Springer Reference Engineering, Reference Module Computer Science and Engineering
Chapter history
- Latest: GPU Architecture. Published 25 June 2023. DOI: https://doi.org/10.1007/978-981-15-6401-7_66-2
- Original: GPU Architecture. Published 16 May 2023. DOI: https://doi.org/10.1007/978-981-15-6401-7_66-1