11.1 Historical Perspective

Logic devices have amounted to about two thirds of the IC industry for many years. In logic devices, there has always been a tradeoff between the costs of developing the logic device in time and money, versus the cost of the end product in terms of performance, power, and cost (“PPC”) as illustrated in Fig. 11.1.

Fig. 11.1

Logic device tradeoff

In fundamental work at the Berkeley Wireless Research Center, followed by work at many other technology centers [1,2,3], this tradeoff has been characterized over two decades of designs and benchmarks (Fig. 11.2).

Fig. 11.2

Characterization of logic device tradeoff

In the early days of the FPGA market, two programming technologies competed: SRAM-based Look Up Table (LUT) and Anti-Fuse. The LUT eventually won because it allows easy technology scaling and unlimited reprogramming iterations. Yet, due to the severe PPC penalties of FPGA technology [4], its adoption remains limited (Fig. 11.3).

Fig. 11.3

The FPGA penalties

Adapting 3D technology to FPGA design could be cost-effective and might greatly reduce those PPC penalties.

11.2 Early Work on 3D FPGA

Early work on 3D FPGA assumed that forming the LUT SRAM on top of the FPGA logic would be technologically feasible and far less demanding than stacking two levels of logic. Tier Logic collaborated with Toshiba [5] to build the FPGA LUT SRAM from Thin Film Transistors (TFT) on top of the rest of the FPGA circuit. Tier Logic believed this could reduce the FPGA device area by about 20%, yet the effort failed and the project was shut down. A similar concept using RRAM [6] instead of TFT on top of the logic reported a potential 40% area reduction compared to a 2D FPGA, but was not pursued commercially.

CEA-Leti has been developing a sequential monolithic 3D technology called CoolCube™. As a benchmark, they evaluated [7] applying it to an FPGA with logic placed over memory, expecting a 55% area reduction compared to a 2D FPGA [9].
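The area-reduction figures above are easier to compare as density gains: if the same logic fits in a fraction (1 − r) of the 2D area, density scales by 1/(1 − r). The sketch below is only arithmetic on the quoted percentages; the function name `density_gain` is illustrative, and it assumes the logic content stays constant, which is an idealization.

```python
def density_gain(area_reduction: float) -> float:
    """Logic-density multiplier for a fractional area reduction.

    Assumes the same logic fits in (1 - area_reduction) of the 2D area,
    so density scales as the inverse of the remaining area fraction.
    """
    if not 0.0 <= area_reduction < 1.0:
        raise ValueError("area_reduction must be in [0, 1)")
    return 1.0 / (1.0 - area_reduction)


# Reported or expected area reductions versus a 2D FPGA (from the text).
reported = {
    "TFT SRAM over logic (Tier Logic/Toshiba)": 0.20,
    "RRAM over logic": 0.40,
    "CoolCube logic over memory (CEA-Leti)": 0.55,
}

for name, r in reported.items():
    print(f"{name}: {r:.0%} smaller -> {density_gain(r):.2f}x density")
```

Seen this way, a 55% area reduction corresponds to more than doubling the logic density, while 20% is a 1.25× gain.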

11.3 3D for Multi-configurations

Tabula, a recently failed start-up, had developed a unique type of FPGA: a real-time reconfigurable FPGA. The concept leveraged FPGA reconfigurability by storing multiple configurations on-chip and swapping them as needed. In effect, it attempted to compensate for the limited area efficiency of the FPGA by reusing the same chip real estate for multiple purposes on the fly. The company even called its product a 3D FPGA, with time as the third dimension. Tabula raised about $200M but eventually went out of business. An interesting extension of the Tabula structure has been suggested [8]: using monolithic 3D technology to build a multi-layer stack that holds the multiple FPGA configurations. Storing more than one configuration in a 3D stack could allow switching between configurations within just a few clock cycles without increasing the device footprint.
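The multi-configuration idea can be illustrated with a toy behavioral model: one LUT-4 whose truth table is stored in several configuration planes (one per stacked layer, by assumption), with a context selector choosing the active plane. The class `MultiContextLUT4` and its methods are hypothetical; this is not a model of Tabula's architecture or of the circuit in [8], only of the principle that more stored configurations do not enlarge the logic footprint.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MultiContextLUT4:
    """Toy LUT-4 with several stored configuration planes (hypothetical).

    Each plane is a 16-bit truth table. Switching context changes the
    function computed, while the logic itself remains a single LUT-4.
    """
    planes: List[int]
    active: int = 0

    def switch_context(self, ctx: int) -> None:
        if not 0 <= ctx < len(self.planes):
            raise IndexError("no such configuration plane")
        self.active = ctx

    def evaluate(self, a: int, b: int, c: int, d: int) -> int:
        # The four inputs form a 4-bit index into the active truth table.
        index = (d << 3) | (c << 2) | (b << 1) | a
        return (self.planes[self.active] >> index) & 1


# Plane 0 holds a 4-input AND (0x8000); plane 1 holds 4-input parity (0x6996).
lut = MultiContextLUT4(planes=[0x8000, 0x6996])
print(lut.evaluate(1, 1, 1, 1))   # AND of four ones
lut.switch_context(1)
print(lut.evaluate(1, 1, 1, 1))   # parity of four ones
```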

11.4 3D for FPGA-ASIC Dual Mode Concept

An interesting alternative to FPGA was developed by eASIC [10], recently acquired by Intel. The original concept pioneered by eASIC was that the key deficiency of FPGA is its Programmable Interconnect ("PIC") rather than its logic. Consequently, eASIC's early products used SRAM-based programmable LUT-4s with mask-defined via interconnection. Figure 11.4 illustrates the advantage of via-defined interconnect over PIC at the 45 nm node.

Fig. 11.4

Programmable interconnect versus mask-defined interconnect

Note that PIC occupies part of the base silicon fabric and consumes additional routing resources, because each programmable connection must go down from the interconnect levels (metal layers 3–6) to the base silicon and back up again.

Figure 11.5 illustrates the effectiveness of via-defined interconnect logic. It could potentially provide logic with only a 2–4× area penalty versus ASICs and a 2–3× power-speed penalty.

Fig. 11.5

Source eASIC web site

eASIC versus FPGA and versus ASIC.

Leveraging monolithic 3D technology could enable replacing eASIC's vias with electrically programmable anti-fuses, potentially enabling FPGA devices with better than 10× PPC improvement.

3D heterogeneous integration could help overcome some of the known limitations of anti-fuse technology. First, it allows a standard fab and process to be used for the base FPGA fabric. Second, it removes the area overhead of the anti-fuse high-voltage programming circuits from the base fabric by moving them to an upper level.

Replacing the via-defined interconnect fabric with a programmable anti-fuse interconnect fabric could be done with relatively low overhead (<20%), as illustrated in Fig. 11.6.

Fig. 11.6

Anti-fuse M × N fully populated crossbar interconnect structure
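A behavioral sketch of the fully populated M × N crossbar of Fig. 11.6 may help: every input line crosses every output line at one anti-fuse, so the structure holds M × N anti-fuses, and programming is one-time. The class `AntiFuseCrossbar` is hypothetical, and the rule that each output is driven by at most one input is an assumption added to avoid contention in the model.

```python
from typing import List, Optional, Set, Tuple


class AntiFuseCrossbar:
    """Toy model of a fully populated M x N anti-fuse crossbar.

    Every input line crosses every output line at one anti-fuse, so the
    structure holds M * N anti-fuses. Programming is one-time: a blown
    (conducting) anti-fuse cannot be reset.
    """

    def __init__(self, m_inputs: int, n_outputs: int) -> None:
        self.m = m_inputs
        self.n = n_outputs
        self.blown: Set[Tuple[int, int]] = set()

    @property
    def fuse_count(self) -> int:
        return self.m * self.n

    def program(self, i: int, j: int) -> None:
        """Blow the anti-fuse joining input i to output j."""
        if not (0 <= i < self.m and 0 <= j < self.n):
            raise IndexError("crosspoint out of range")
        # Assumption: each output is driven by at most one input.
        if any(out == j for _, out in self.blown):
            raise ValueError(f"output {j} is already driven")
        self.blown.add((i, j))

    def route(self, inputs: List[int]) -> List[Optional[int]]:
        """Value seen on each output line (None if left unconnected)."""
        outputs: List[Optional[int]] = [None] * self.n
        for i, j in self.blown:
            outputs[j] = inputs[i]
        return outputs


xb = AntiFuseCrossbar(4, 4)
xb.program(0, 2)   # connect input 0 to output 2
xb.program(3, 0)   # connect input 3 to output 0
print(xb.fuse_count, xb.route([1, 0, 1, 0]))
```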

An additional application of 3D heterogeneous integration is supporting a dual mode for custom logic: a field-programmable device for prototypes and low volumes, and a low-cost, compatible replacement device for high volumes, in which the anti-fuses are replaced by a mask-defined via layer (Fig. 11.7).

Fig. 11.7

Dual mode: FPGA for prototype and low volume, and mask-defined via for low cost

Removing the anti-fuses and the programming circuitry could reduce the cost of the high-volume part, at the relatively low expense of a single via mask.
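Whether the single via mask pays off is a simple break-even question: the mask is a one-time cost, and each unit of the via-programmed part saves the per-unit cost difference. The sketch below assumes a linear cost model; the function name and all numbers are hypothetical placeholders, not data from the text.

```python
import math


def break_even_volume(via_mask_cost: float,
                      programmable_unit_cost: float,
                      via_unit_cost: float) -> int:
    """Smallest volume at which the via-programmed part is no more expensive.

    Programmable part total: volume * programmable_unit_cost.
    Via-programmed part total: via_mask_cost + volume * via_unit_cost.
    """
    saving_per_unit = programmable_unit_cost - via_unit_cost
    if saving_per_unit <= 0:
        raise ValueError("the via-programmed part must be cheaper per unit")
    return math.ceil(via_mask_cost / saving_per_unit)


# Hypothetical: a 100k mask, and a unit cost that drops from 10 to 6.
print(break_even_volume(100_000, 10.0, 6.0))
```

Above the break-even volume, every additional unit widens the advantage of the mask-defined version, which is why the dual-mode flow targets high volumes.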

11.5 Utilizing 3D Memory Fabric for FPGA Fabric

The breakthrough introduced with 3D NAND technology was a new form of scaling: 3D scaling. In 3D scaling, more transistors (or memory cells) are produced for about the same manufacturing effort by having more layers in the starting substrate. In Chap. 10 we presented a variation called 3D NOR, which could be used to replace Stacked Capacitor DRAM technology. Here, a technology concept is presented that leverages 3D scaling for FPGA fabric. The technology has also been detailed in MonolithIC 3D, Inc. patent applications [11, 12]. The first structure [11] leverages a 3D NOR memory fabric with a single-crystal channel and vertically oriented word-lines. The second structure [12] leverages a 3D NOR memory fabric with a polycrystalline channel and horizontally oriented word-lines. The following description is based on the first structure. First, a generic structure is constructed using shared lithography and processing; it could later be programmed to function as an FPGA.
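Why 3D scaling lowers cost per device can be shown with a first-order model: the patterning steps are shared by the whole stack, while each added layer contributes only a deposition cost. The function `cost_per_cell` and its normalized costs are hypothetical, and the model deliberately ignores effects such as high-aspect-ratio etch becoming harder as layers are added.

```python
def cost_per_cell(layers: int, shared_cost: float, per_layer_cost: float,
                  cells_per_layer: float) -> float:
    """First-order cost per cell for a 3D-scaled stack (hypothetical model).

    shared_cost: lithography/etch steps done once for the whole stack,
        idealized here as independent of the number of layers.
    per_layer_cost: deposition cost added by each extra layer.
    """
    if layers < 1:
        raise ValueError("need at least one layer")
    total_cost = shared_cost + layers * per_layer_cost
    return total_cost / (layers * cells_per_layer)


# Normalized, made-up costs: shared patterning = 100, each layer = 5.
for n in (1, 8, 32, 128):
    print(n, cost_per_cell(n, shared_cost=100.0, per_layer_cost=5.0,
                           cells_per_layer=1e6))
```

In this model the shared patterning cost is amortized over more layers, so cost per cell falls toward the per-layer deposition cost alone.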

11.5.1 The Fabric

A key concept in leveraging the 3D NOR memory structure for FPGA applications is using flash memory for programmable logic [13,14,15] (Fig. 11.8).

Fig. 11.8

Flash cell is a programmable logic function
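One classic way flash-type cells act as programmable logic is a wired-NOR line: a precharged line is pulled low by any cell that is erased (low threshold) and has its gate driven high, while programmed (high-threshold) cells never conduct and so drop out of the function. With gates driven by true and complement inputs, the line computes a programmable product term. The sketch below is an idealized logic-level model under these assumptions, not the specific circuit of Fig. 11.8; the function `wired_nor` and the "AN" naming for NOT A are illustrative.

```python
from typing import Dict, Set


def wired_nor(inputs: Dict[str, int], erased: Set[str]) -> int:
    """One precharged line shared by several flash cells (idealized).

    Each cell's gate is driven by one literal: "A" for input A or "AN" for
    NOT A. An erased (low-Vt) cell conducts when its gate is high and pulls
    the line low; a programmed (high-Vt) cell never conducts. The line thus
    computes NOR over the literals of the erased cells.
    """
    literals: Dict[str, int] = {}
    for name, value in inputs.items():
        literals[name] = value
        literals[name + "N"] = 1 - value
    pulled_low = any(literals[cell] for cell in erased)
    return 0 if pulled_low else 1


# Erase only the cells on AN and BN: NOR(NOT A, NOT B) = A AND B.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, wired_nor({"A": a, "B": b}, erased={"AN", "BN"}))
```

Choosing which cells are erased is the "programming"; the same physical line can realize a different product term without any change in layout.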

A variation of the 3D NOR structure presented in Chap. 10 could start with either epitaxial growth of multilayer SiGe over silicon, for a single-crystal channel, or conventional multilayer deposition of polysilicon over oxide, as is common for 3D NAND. The structure is then etched to form ridges and valleys (Fig. 11.9).

Fig. 11.9

Multilayer substrate after etching forming ridges and valleys

Next, an Oxide-Nitride-Oxide (O/N/O) stack is deposited to prepare the structure for charge-trap memory operation. Then, gates and a staircase are formed, resulting in the structure illustrated in Fig. 11.10.

Fig. 11.10

Adding O/N/O, gates, and staircase access

The transistor schematic of one ridge is illustrated in Fig. 11.11.

Fig. 11.11

Transistor schematic along a ridge

11.5.2 Programmable LUT-n Memory

The above structure could be used to form logic functions such as Look-Up-Tables and programmable interconnect for FPGA applications. Figure 11.12 illustrates a LUT-2 formed in two layers of such a ridge.

Fig. 11.12

LUT-2 could be formed in a section of a 3D NOR structure

The LUT-2 gate inputs (A, AN, B, BN) are word-lines WL0–WL3 (Fig. 11.11). The X represents an additional variation in which a junction-less transistor ("JLT") is formed within the bit-line. The processing of such in-bit-line JLTs is detailed in PCT application WO 2017/053329. Such in-bit-line JLTs enable horizontal segmentation of the 3D NOR structure. The truth table of this LUT-2 structure is presented in Fig. 11.13, and a LUT-4 built from four such LUT-2s is shown in Fig. 11.14.

Fig. 11.13

Truth table of the programmable memory for LUT-2 function

Fig. 11.14

LUT-4 could be formed in a section of a 3D NOR ridge structure, having four LUT-2s vertically stacked within a ridge and an adjacent 4-to-1 selector
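The LUT-4 arrangement of Fig. 11.14 (four LUT-2s plus a 4-to-1 selector) is a decomposition on two of the four inputs: each LUT-2 holds one quarter of the 16-bit truth table, and the selector picks the quarter addressed by the remaining two inputs. The behavioral sketch below checks that idea at the logic level only; the function names and the bit ordering of the truth table are assumptions, not a description of the transistor-level circuit.

```python
from typing import List


def lut2(config: int, a: int, b: int) -> int:
    """LUT-2: four configuration bits, one selected by inputs (b, a)."""
    return (config >> ((b << 1) | a)) & 1


def mux4(values: List[int], s0: int, s1: int) -> int:
    """4-to-1 selector."""
    return values[(s1 << 1) | s0]


def lut4(config: int, a: int, b: int, c: int, d: int) -> int:
    """LUT-4 as four stacked LUT-2s (4 bits each) and a 4-to-1 selector.

    Slice k holds truth-table bits 4k..4k+3; inputs c and d choose the slice.
    """
    slices = [(config >> (4 * k)) & 0xF for k in range(4)]
    return mux4([lut2(s, a, b) for s in slices], c, d)


# 0x6996 is the truth table of 4-input parity (XOR).
print([lut4(0x6996, i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1)
       for i in range(16)])
```

Because slice k holds bits 4k to 4k+3, selecting slice (d, c) and then bit (b, a) inside it reads exactly the same bit as indexing the 16-bit table directly with (d, c, b, a).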

The 3D NOR structure is a 3D matrix of n-type transistors. Accordingly, the logic functions formed in it use only n-type transistors. A transferred layer on top could add full CMOS circuitry to complement the n-only programmable logic underneath. Logic circuits that use mainly n-type transistors have been proposed in the past [16]. One approach to reconstructing full-swing signals from n-type-only circuits is to use two complementary logic functions. Figure 11.15a, b illustrates the use of a LUT and its complementary LUT-N with a top-level CMOS circuit to reconstruct a full-swing logic output.

Fig. 11.15

a Two complementary LUT-4s with top-level control and reconstruction. b Optional top-level differential amplifier reconstruction circuit

For higher performance, a differential amplifier circuit could be used instead of the logic half-latch.
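The complementary-LUT approach can be sketched at a coarse voltage level. The model assumes an n-type-only path passes a strong 0 but only a degraded 1 of about VDD − VT; LUT-N stores the inverted truth table, so for every input exactly one rail is at the degraded high level and the other is at 0 V, and an idealized top-level reconstruction circuit (half-latch or differential amplifier) restores the winner to full swing. VDD, VT, and all function names are hypothetical values and names chosen for illustration.

```python
VDD = 0.8   # hypothetical supply voltage (V)
VT = 0.3    # hypothetical threshold drop of an n-type pass path (V)


def n_only_level(bit: int) -> float:
    """Voltage delivered by an n-type-only path: strong 0, degraded 1."""
    return VDD - VT if bit else 0.0


def lut_read(config: int, index: int) -> int:
    return (config >> index) & 1


def reconstruct(v_true: float, v_comp: float) -> float:
    """Idealized CMOS reconstruction: the higher rail wins at full swing."""
    return VDD if v_true > v_comp else 0.0


def complementary_lut(config: int, index: int, width: int = 16) -> float:
    """Read LUT and LUT-N (inverted table) and restore a full-swing output."""
    mask = (1 << width) - 1
    v_true = n_only_level(lut_read(config, index))
    v_comp = n_only_level(lut_read(~config & mask, index))
    return reconstruct(v_true, v_comp)


print(n_only_level(1), complementary_lut(0x6996, 1), complementary_lut(0x6996, 0))
```

Comparing the two rails, rather than testing one degraded rail against a fixed threshold, is what makes the output insensitive to the reduced high level of the n-only paths.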

11.5.3 Programmable Interconnect in Memory

Differential logic could be extended to differential signaling throughout the FPGA. This could help reduce power and improve speed but, far more importantly, it allows the 3D NOR fabric to be used for programmable routing. Differential interconnects offer lower voltage swings with better noise immunity, resulting in lower power. For years, interconnect delay has increased with scaling while gate delay has decreased, as illustrated in Fig. 15.2a, b. Yet the impact of interconnect on chip power was kept in check by scaling down the chip operating voltage, as part of what is known as Dennard scaling (Fig. 11.16).

Fig. 11.16

End of Dennard scaling [17]

The end of Dennard Scaling made power the limiting factor. The constant charge and discharge of the interconnect capacitance now dominates chip power and performance (Fig. 11.17).

Fig. 11.17

Interconnect chip power [18]

Yet, the industry has not adopted differential interconnect because it requires double the routing resources and additional support circuits. However, as power becomes a dominant problem, perhaps it is time for differential interconnects to take a central role in new chip architectures.
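The tradeoff between doubled wiring and reduced swing can be quantified with the usual first-order switching-power expression for a wire charged through a swing V_swing from a supply V_supply: P = α·C·V_supply·V_swing·f. The sketch compares one full-swing wire with a low-swing differential pair, assuming equal wire capacitance and that every data transition charges one wire of the pair (twice the 0-to-1 rate of the single-ended wire). The swing, supply, and the constants α, C, f are hypothetical; α, C, and f cancel in the ratio.

```python
def dynamic_power(alpha: float, c_farad: float, v_swing: float,
                  v_supply: float, f_hz: float) -> float:
    """First-order switching power: alpha * C * V_supply * V_swing * f."""
    return alpha * c_farad * v_supply * v_swing * f_hz


def differential_to_single_ended_ratio(vdd: float, v_swing: float,
                                       v_supply: float) -> float:
    """Power of a low-swing differential pair relative to one full-swing wire.

    Assumes equal wire capacitance and that each data transition charges
    one wire of the pair, i.e. twice the single-ended 0->1 charging rate.
    """
    alpha, c, f = 0.1, 200e-15, 1e9   # hypothetical; they cancel in the ratio
    single = dynamic_power(alpha, c, vdd, vdd, f)
    differential = 2 * dynamic_power(alpha, c, v_swing, v_supply, f)
    return differential / single


# Hypothetical 0.8 V logic with a 0.2 V interconnect swing.
print(differential_to_single_ended_ratio(0.8, 0.2, v_supply=0.8))  # swing taken from VDD
print(differential_to_single_ended_ratio(0.8, 0.2, v_supply=0.2))  # separate low-swing supply
```

Under these assumptions the ratio is 2·V_supply·V_swing/VDD², so the doubled wiring is more than repaid once the swing is a small fraction of VDD, especially if the swing is generated from a dedicated low supply.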

3D scaling for configurable logic using shared lithography and shared processing opens an interesting opportunity for a new type of interconnect technology. In 3D scaling, many layers are processed together, so many layers of interconnect could be fabricated efficiently as a generic 3D matrix and later programmed for specific interconnect functions.

For example, in a 3D fabric of 32 levels, the top 10 could be used for the LUT-4 illustrated in Fig. 11.14 and the bottom 22 for interconnect. The unused bit-lines of these 22 layers could serve as horizontal ("X" direction) segments of the interconnect fabric. Vertical ("Z" direction) segments could be formed by depositing conductive segments between the word-lines of the structure (see Figs. 11.11 and 11.18a, b).

Fig. 11.18

a Preparing the structure for Z segments, b Z segments with anti-fuses

The programmable connectivity structure could use RRAM or anti-fuse (One Time Programmable, "OTP") technology. Connectivity segments in the horizontal direction perpendicular to the bit-lines ("Y" direction) could be added using a technique known in 3D NAND as word-line replacement (Fig. 11.19).

Fig. 11.19

3D structure with programmable logic and X-Y-Z programmable connectivity
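Programming such a fabric amounts to choosing which junctions to program for each net. A minimal sketch, assuming the interconnect layers can be idealized as a regular 3D grid whose X, Y, and Z neighbours are joined by programmable junctions, is a breadth-first search that finds a route using the fewest junctions and avoids nodes already used by other nets. The grid abstraction and the names `route_net`, `Node`, and `STEPS` are hypothetical; a real router for this fabric would follow its actual segment topology.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# (x, y, z): x along the bit-lines, y across them, z = interconnect layer.
Node = Tuple[int, int, int]

STEPS = ((1, 0, 0), (-1, 0, 0),   # X segments (bit-lines)
         (0, 1, 0), (0, -1, 0),   # Y segments (word-line replacement)
         (0, 0, 1), (0, 0, -1))   # Z segments (vertical conductors)


def route_net(src: Node, dst: Node, size: Tuple[int, int, int],
              occupied: Set[Node]) -> Optional[List[Node]]:
    """Shortest path from src to dst through free grid nodes, or None.

    Each hop between neighbouring nodes stands for one programmable junction
    (anti-fuse or RRAM) that would be programmed to realize the net.
    """
    prev: Dict[Node, Optional[Node]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[Node] = []
            cur: Optional[Node] = node
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for dx, dy, dz in STEPS:
            nxt = (node[0] + dx, node[1] + dy, node[2] + dz)
            inside = all(0 <= nxt[k] < size[k] for k in range(3))
            if inside and nxt not in prev and nxt not in occupied:
                prev[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical 4 x 4 footprint over the 22 interconnect layers.
path = route_net((0, 0, 0), (3, 2, 5), (4, 4, 22), occupied=set())
print(f"{len(path) - 1} junctions programmed")
```

After a net is routed, its nodes can be added to `occupied` so later nets route around it, which mirrors how a one-time-programmable fabric is consumed net by net.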

The support circuitry on top could serve the differential interconnect just as it serves the differential logic.

The FPGA-in-memory fabric enables a multilayer (96–128 layers) memory, such as 3D NOR, in which the top 32 layers are used for programmable logic and the rest for memory. Logic-in-memory has recently become a popular concept, as it fits many AI-type applications very well. A 3D NOR with built-in FPGA could fit well in this emerging space.

As a standalone FPGA product, a 3D NOR-based FPGA could compete well with mask-defined standard-cell designs. The LUT-4 footprint could be about (10 × 100 nm) × (2 × 100 nm) = 0.2 µm², which represents a logic density of about 70 MGate/mm². The forecast for standard cells at the 7 nm node is about 20 MGate/mm².
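The density figures can be checked with a few lines of arithmetic. A 0.2 µm² footprint gives 5 million LUT-4s per mm², so the quoted 70 MGate/mm² implies about 14 gate-equivalents per LUT-4; that conversion factor is derived here from the quoted numbers, not stated in the text. The variable names are illustrative.

```python
# Footprint of one LUT-4, using the dimensions quoted in the text.
lut4_w_nm = 10 * 100                      # 10 x 100 nm
lut4_h_nm = 2 * 100                       # 2 x 100 nm
lut4_area_nm2 = lut4_w_nm * lut4_h_nm     # 200,000 nm^2 = 0.2 um^2

NM2_PER_MM2 = 10**12
lut4_per_mm2 = NM2_PER_MM2 // lut4_area_nm2          # 5,000,000 LUT-4s per mm^2

# Gate-equivalents per LUT-4 implied by the quoted 70 MGate/mm^2.
quoted_gates_per_mm2 = 70e6
implied_gates_per_lut4 = quoted_gates_per_mm2 / lut4_per_mm2   # 14.0

# Ratio to the ~20 MGate/mm^2 forecast for 7 nm standard cells.
advantage_vs_7nm = quoted_gates_per_mm2 / 20e6                 # 3.5

print(lut4_area_nm2, lut4_per_mm2, implied_gates_per_lut4, advantage_vs_7nm)
```

On these numbers the 3D NOR FPGA fabric would be roughly 3.5× denser than the 7 nm standard-cell forecast.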

11.6 Summary

A few alternative concepts have been presented for using 3D integration in FPGA applications. These alternatives use 3D technologies in different ways, resulting in different PPC, spanning the spectrum from an FPGA with about 2× better PPC, to one reaching about 0.4× of ASIC PPC, to the 3D NOR FPGA with potentially better PPC than ASICs.