Abstract
Since their inception more than thirty years ago, field-programmable gate arrays (FPGAs) have grown more complex, more capable, and more diverse in their applications. FPGAs can be reprogrammed at a fundamental level, changing the function and interconnection of millions of elements. By reconfiguring their hardware to match the application, FPGAs often achieve higher energy efficiency, lower latency, or faster time-to-market across a very wide range of application domains. A modern FPGA combines many components, from logic blocks, programmable routing, and memory blocks to networks-on-chip and processor subsystems. For best efficiency, each component must be carefully architected to match the needs of a wide range of applications and to mesh well with the other components. Designing these components involves many choices, from high-level architectural parameters down to transistor-level implementation details. This chapter describes the evolution of these FPGA components, their design principles, and their implementation challenges.
Copyright information
© 2023 Springer Nature Singapore Pte Ltd.
About this entry
Cite this entry
Boutros, A., Betz, V. (2023). Field-Programmable Gate Array Architecture. In: Chattopadhyay, A. (eds) Handbook of Computer Architecture. Springer, Singapore. https://doi.org/10.1007/978-981-15-6401-7_49-1
DOI: https://doi.org/10.1007/978-981-15-6401-7_49-1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6401-7
Online ISBN: 978-981-15-6401-7
eBook Packages: Springer Reference Engineering, Reference Module Computer Science and Engineering