6.1 Ultra-Low-Power CMOS in IoT Front-End Devices

The number of Internet-of-Things (IoT) connected devices (Footnote 1) was reported to be about 15 billion in 2015 and continues to increase at a high compound annual growth rate (CAGR) of ~10% or more [2,3,4], as shown in Fig. 6.1. In cyber-physical systems, the major functions of IoT front-end devices are sensing data from, and actuating on, the real physical world [5]. The actuating part may employ various types of devices such as displays and mechanical assemblies. The sensing part usually comprises sensors, analog front-end circuits, analog-to-digital converters, edge processors, and data-transmission circuits. The data transmission often uses wireless communication to avoid wiring, and for the same reason the sensing part preferably works without an external power supply, using batteries or energy harvesters (EH) instead. Since wireless communication consumes relatively high power, decreasing the data transmission rate is an effective way to reduce the power consumption of each IoT connected device when it operates from a standalone (battery or EH) power source. Edge processing with ultra-low power consumption has thus drawn much attention in recent years, as has reducing the latency of data communication, for example with 5G technology.

Fig. 6.1

Annual growth forecast of number of connected devices [2,3,4]

Continuous progress in semiconductor devices, especially CMOS integrated circuits (ICs) with aggressive miniaturization, has enabled both performance improvement and reduced power consumption. It is well known, however, that in recent generations the improvement in performance-per-power efficiency has slowed down, as the article named “The Free Lunch Is Over” [6] clearly described. As seen in Fig. 6.2, which shows the power and power-performance efficiency of top-class supercomputers [7], the maximum performance is dominated by the available power: the maximum power level of about 20 MW has been unchanged in recent years. This situation (power-limited performance) is common to systems ranging from a large-scale supercomputer to a tiny IoT sensor-node device. In a large system, the performance is dominated by the power available from the power grid and by the cooling capability; in a standalone wireless sensor node, the performance is determined simply by the battery life or the maximum power generated by the EH. The only way to increase the performance of each system, regardless of its size or capacity, is thus to improve the performance-per-power efficiency.

Fig. 6.2

Power consumption and power efficiency of top 500 supercomputers [7]

6.2 Available Power for a Standalone Sensor Node and Power Requirement for a Micro Controller Unit

In a battery-operated system, the battery life (or the charging interval for a rechargeable battery) is in general determined by the average current (Footnote 2). Figure 6.3 depicts the life of popular small-size batteries as a function of average current. For a battery life (or charging interval) of about one year, an average current of about 100 µA is required. For devices powered by EH, the situation is similar. The available power of popular EH sources is on the order of 100 µW [8]. Note that photovoltaic cells are the most powerful among the EH sources; under indoor light, however, their available power stays at the same level.

Fig. 6.3

Battery life as a function of average current
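The relation between battery life and average current can be sketched with a short calculation. The following is a minimal illustration that assumes an ideal battery (life = capacity / average current, with no self-discharge or voltage derating); the capacities used are hypothetical nominal values chosen for illustration and are not taken from Fig. 6.3.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 h, ignoring leap years


def battery_life_years(capacity_mah: float, avg_current_ua: float) -> float:
    """Ideal battery life in years: capacity divided by average current."""
    hours = capacity_mah * 1000.0 / avg_current_ua  # mAh -> uAh, then divide by uA
    return hours / HOURS_PER_YEAR


def max_average_current_ua(capacity_mah: float, target_years: float) -> float:
    """Largest average current (uA) that still reaches the target life."""
    return capacity_mah * 1000.0 / (target_years * HOURS_PER_YEAR)


if __name__ == "__main__":
    # Hypothetical nominal capacities, for illustration only.
    for name, capacity_mah in [("coin cell", 220.0), ("AAA cell", 1000.0)]:
        years = battery_life_years(capacity_mah, avg_current_ua=100.0)
        print(f"{name}: {years:.2f} years at 100 uA average")
```

Under this idealization, one year at 100 µA corresponds to a capacity of 876 mAh, which is why the average current has to be pushed to the 100 µA level or below for small batteries.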

In order to decrease the average current consumption to the above-mentioned level, it is useful to reduce the working duration of wireless communication, since its power consumption is in general at the level of tens to hundreds of mW. For most sensing nodes, the required data are not continuous, and thus intermittent operation of sensing and data transmission is a realistic solution. Reducing the power for data processing at the sensing node (edge) is important as well, and this is the main topic of this chapter. In a standalone sensor node, a microcontroller unit (MCU) is usually used.

Let us consider the power consumption level required of an MCU for this purpose. The power efficiency metrics for an MCU are in general the active current per clock frequency (µA/MHz) and the standby current (µA). Considering a target power consumption of 100 µW to 1 mW, an operation voltage of 1 V or less, and a clock frequency of 10 MHz or more, that is, the typical operation conditions of an MCU for the IoT sensor-node application, an active current level of 10–100 µA/MHz is required. Since the power consumption of the sensing node does not drop to zero, due to the leakage current, even in intermittent operation, the standby current of the MCU should be considerably low. Although most MCUs have multiple sleep modes to reduce the standby current, it is essential to reduce the leakage of the CMOS circuits themselves, and thus low-leakage CMOS device technology is important. In the next section, the factors that reduce both the active and standby powers (currents) of the CMOS unit circuit are briefly reviewed.
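The 10–100 µA/MHz requirement follows directly from the stated operating conditions. A minimal sketch of the arithmetic, using the values quoted above (the function name is illustrative only):

```python
def active_current_budget_ua_per_mhz(power_w: float, vdd_v: float, clock_mhz: float) -> float:
    """Active-current budget in uA/MHz for a given power, supply voltage, and clock."""
    current_ua = power_w / vdd_v * 1e6  # I = P / V, converted from A to uA
    return current_ua / clock_mhz


if __name__ == "__main__":
    for power_w in (100e-6, 1e-3):  # 100 uW and 1 mW targets
        budget = active_current_budget_ua_per_mhz(power_w, vdd_v=1.0, clock_mhz=10.0)
        print(f"{power_w * 1e6:.0f} uW at 1 V, 10 MHz: {budget:.0f} uA/MHz")
```

A lower supply voltage at the same power relaxes the current budget, whereas a higher clock frequency tightens it.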

6.3 Energy Efficient CMOS Operation

The power consumed by the CMOS circuit has two components, that is, active (dynamic, or switching) and leakage (or static) power. The power consumption of CMOS inverters, as a representative of CMOS circuits, can be expressed as

$$P = n\left( {\alpha C_{load} V_{dd}^{2} f + I_{leakage} V_{dd} } \right),$$
(6.1)

where n is the number of transistors, α is an activity factor (including the time-averaged active ratio of transistors), Cload is the load capacitance, Vdd is the operation (or supply) voltage, f is the clock frequency, and Ileakage is the leakage current. Note that the power due to the short-circuit current \(\left( {I_{sc} \propto \left( {V_{dd} /2 - V_{th} } \right)^{2} } \right)\) is omitted for simplicity. The energy consumed by a logic (switching) operation is more important than the power because it directly reflects the efficiency of the information processing. Dividing (6.1) by αf, the energy per single switching operation (energy per cycle) is thus written as

$$E = n\left( {C_{load} V_{dd}^{2} + I_{leakage} V_{dd} /\alpha f} \right).$$
(6.2)

The first and second terms correspond to active and leakage energies, respectively. In the same technology node (feature size of CMOS integrated circuits), Cload is constant and thus decreasing the Vdd is effective for decreasing the active energy. On the other hand, to minimize the leakage energy, the operation frequency should be taken into account. Let us consider the operation at the maximum operating frequency here that is determined by the propagation delay time tpd of CMOS circuits,

$$t_{pd} \propto C_{load} V_{dd} /\left( {V_{dd} - V_{th} } \right)^{m} ,$$
(6.3)

where Vth is the threshold voltage assuming symmetrical CMOS operation, Vth = Vthn = −Vthp, where Vthn and Vthp are the threshold voltages of the n- and p-type MOSFETs, respectively, and m is a factor that takes velocity saturation into account. For recent CMOS technologies, the m value is about 1.2.

The energy E in (6.2) is thus determined by Vdd, Vth, and f, where Cload and α are assumed to be constant. Although f can be set arbitrarily, in order to minimize E, f should be maximized within the range satisfying (6.3). As an example, E as a function of Vdd and Vth behaves as shown in Fig. 6.4 [9]. As indicated by the energy contours and the dashed line, the optimal combination of Vdd and Vth is determined by the required frequency. The absolute minimum energy point (MEP) is also shown in the graph; however, this point lies near the condition where Vdd is slightly less than Vth (so-called sub-threshold operation), and its frequency is very low. A practically useful approach is thus to follow the dashed line to increase f at the expense of an increase in E. This means that Vth should be controlled together with Vdd. The only way to control Vth within the conventional CMOS circuit operation scheme is to apply a back bias.

Fig. 6.4

Minimum energy point and energy contours © 2005 IEEE [9]
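The existence of a minimum energy point can be illustrated numerically. The following is a rough sketch, not the calculation behind Fig. 6.4: because (6.3) is not valid for Vdd below Vth, it swaps in an EKV-style interpolation for the drain current (exponential below Vth, square-law above), ignores drain-induced barrier lowering, and assumes a slope factor of 1.3 and a ratio of logic depth to activity factor of 200. All of these are assumptions chosen for illustration. With t_pd ∝ Cload·Vdd/I_on and f = 1/(depth·t_pd), the leakage term of (6.2) becomes Cload·Vdd²·(depth/α)·I_off/I_on, so the energy can be normalized by n·Cload.

```python
import math

THERMAL_VOLTAGE = 0.0259  # kT/q near 300 K, in volts


def drain_current(vgs: float, vth: float, slope_n: float = 1.3) -> float:
    """EKV-style interpolation in arbitrary units (assumed model, not from the text)."""
    x = (vgs - vth) / (2.0 * slope_n * THERMAL_VOLTAGE)
    return math.log1p(math.exp(x)) ** 2


def energy_per_cycle(vdd: float, vth: float, depth_over_alpha: float = 200.0) -> float:
    """Eq. (6.2) evaluated at f = f_max and normalized by n * C_load."""
    i_on = drain_current(vdd, vth)   # gate driven to Vdd
    i_off = drain_current(0.0, vth)  # gate at 0 V: leakage
    return vdd ** 2 * (1.0 + depth_over_alpha * i_off / i_on)


def minimum_energy_point(vth: float, v_lo: float = 0.10, v_hi: float = 1.20, steps: int = 1100):
    """Brute-force sweep of Vdd; returns (Vdd at the MEP, normalized energy)."""
    candidates = [v_lo + i * (v_hi - v_lo) / steps for i in range(steps + 1)]
    vdd_mep = min(candidates, key=lambda v: energy_per_cycle(v, vth))
    return vdd_mep, energy_per_cycle(vdd_mep, vth)


if __name__ == "__main__":
    for vth in (0.2, 0.3, 0.4):
        vdd_mep, e_mep = minimum_energy_point(vth)
        print(f"Vth = {vth:.2f} V: MEP at Vdd = {vdd_mep:.3f} V, E = {e_mep:.4f} (normalized)")
```

In this toy model with Vth = 0.3 V, the sweep places the minimum below Vth, which is qualitatively in line with the sub-threshold MEP described above; the absolute numbers depend entirely on the assumed parameters.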

6.4 Suitable CMOS Technology Node for IoT Front-End Devices

The miniaturization of CMOS technology has made it possible to decrease Cload, and together with decreasing Vdd, the active energy has been significantly reduced. The leakage energy, however, has instead increased in recent highly scaled generations due to tunnel leakage around the gate electrode, short-channel-effect-related subthreshold leakage, and so on. By taking the required frequency and the acceptable leakage-current level into account, one can determine the optimum CMOS technology node and its technology flavor, such as general purpose or low standby power. There are many reports on energy minimization in various technology generations. For example, following the conventional scaling model, the energy at the MEP decreases as the technology node shrinks from 65 to 22 nm [10], whereas in a low-energy dedicated design (subthreshold logic), the energy at the MEP hits the bottom at 90 nm [11]. The change of the MEP behavior with different technology nodes is schematically shown in Fig. 6.5 as another example. In this calculation, typical 65 and 28 nm processes are assumed, the former being a relatively low-leakage process and the latter a relatively high-performance process. The typical parameters are: relative Cload of 1 and 0.25, Vth of 0.25 and 0.15 V, and subthreshold swing of 75 and 85 mV/decade, respectively. For performance-dedicated applications, the more advanced process (28 nm in this figure) is preferred; its energy at the MEP is higher, and the voltage of the MEP is also higher (at 0.6 V). For low-energy dedicated applications, on the other hand, the voltage of the MEP is about 0.4 V or less, and the minimum energy can be lower, although the clock frequency is not so high. For IoT front-end device applications, the low-energy dedicated option will be preferred.

Fig. 6.5

Comparison of minimum energy points for high-performance and low-leakage processes

In addition to the above energy-performance trade-off, the production cost is another factor in choosing an adequate CMOS technology. In past generations, the most advanced, that is, the most miniaturized, process was the most cost-effective option. In the most recent few generations, the situation has been different. There is a report that the lowest cost for an IC die in high-volume production is achieved in the 65 nm process [12]. Although the situation may change with the maturity of the advanced CMOS processes and circuit design technologies, staying at the 65–40 nm nodes is currently the preferred option for IoT front-end device applications.

6.5 The Variability Issue that Hinders Low-Voltage Operation

As discussed in the previous section, decreasing the operation voltage Vdd to that of the MEP condition is an important approach for energy-efficient CMOS ICs. CMOS scaling, however, has brought another important issue: the statistical variability of transistor characteristics. In ultra-large-scale ICs built with the most advanced processes, the number of transistors exceeds billions. Decreasing the characteristic variability of transistors is thus a crucial problem for operating ICs without functional errors, especially in low-voltage operation, since the voltage margin must be minimized. There are many reports on the lowest operating voltage taking the variability into account. The lowest voltages of both logic and static random-access memory (SRAM) circuits have instead increased in recent CMOS technologies [10]. Among the many CMOS transistor characteristics, the variability of Vth is the most important. It is well known that the Vth variability is expressed as [13]

$$\sigma V_{th} = A_{VT} /\sqrt {LW} ,$$
(6.4)

and in the conventional bulk MOS transistor, it can be written as

$$\sigma V_{th} \propto t_{ox} N_{imp}^{1/4} /\sqrt {LW} ,$$
(6.5)

where σVth is the standard deviation of Vth, AVT is the Pelgrom coefficient, L is the channel length, W is the channel width, tox is the gate-oxide thickness, and Nimp is the impurity density of the channel region.

If we follow the ideal scaling rule [14], tox, L, and W decrease and Nimp increases at a constant rate by generation. This device scaling strategy inherently increases σVth slightly by generation [15]. Moreover, in the recent generations, tox has not been scaled down sufficiently due to the increase of gate leakage current. This further increases the σVth. It is thus very difficult to decrease the Vth variability with the conventional bulk MOS transistors. Using the transistor structures with a fully-depleted (FD) channel is a possible solution because, in these structures, the transistor characteristic can be controlled without increasing the channel impurity density Nimp, and thus σVth can be significantly decreased as indicated by (6.5).
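The slight increase of σVth under ideal scaling can be read directly from (6.5). As a worked sketch, assume a scaling factor k > 1 per generation (k is a symbol introduced here for illustration), so that tox, L, and W are divided by k and Nimp is multiplied by k:

$$\sigma V_{th}^{\prime} \propto \frac{\left( t_{ox}/k \right)\left( k N_{imp} \right)^{1/4}}{\sqrt{\left( L/k \right)\left( W/k \right)}} = k^{1/4}\,\frac{t_{ox} N_{imp}^{1/4}}{\sqrt{LW}} .$$

For the classic 0.7× linear shrink, that is, k = √2 (an assumed value), the factor is 2^{1/8} ≈ 1.09, or roughly a 9% increase in σVth per generation. If tox is not scaled at all, the factor becomes k^{5/4}, which illustrates why insufficient tox scaling worsens the variability further.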

6.6 SOTB Technology

The commercialization of FD transistors has a long history. There were several important proposals for transistor structures with an FD or reduced-impurity-density channel in the late 1980s to early 1990s: for example, the intrinsic-channel (epitaxially grown channel) structure of bulk MOSFETs [16], the planar double-gate structure [17], and the DELTA structure [18]. Note that DELTA is the original name of the structure that is now well known as the FinFET with three-dimensional channels. As a member of the family of planar FD transistors using an SOI (silicon on insulator) wafer with a very thin BOX (buried oxide) layer, with the added features of high Vth controllability and high compatibility with the existing bulk CMOS technology, the SOTB (Silicon on Thin Buried Oxide) transistor was proposed in 2004 [19]. The schematic cross section of the SOTB structure is shown in Fig. 6.6.

Fig. 6.6

Schematic cross section of SOTB transistors. Hybrid bulk transistors are also shown. SOTB transistors are used in low-voltage (< ~1.5 V) logic and analog circuits including SRAMs. Bulk transistors are used in peripheral, ESD-protection, high-voltage analog, and power circuits, on-chip flash memory, and reuse of legacy circuits

The advantages of SOTB transistors are listed as follows:

  1. Excellent short-channel-effect (SCE) immunity due to better electrostatic control enabled by thin SOI and BOX layers and an underlying ground plane (GP).

  2. Small Vth variability and low sensitivity to SOI-thickness variation due to a low Nimp SOI channel.

  3. Flexible Vth control by Nimp and depth profile of the GP region.

  4. Back-gate-bias control by applying voltages to the GP regions of p- and n-type transistors through the Vbp and Vbn terminals. The deep n-well region secures separation between the two GP regions if proper back-gate bias voltages are applied.

  5. A hybrid bulk transistor can be integrated on the same wafer by removing the SOI and BOX layers. The gate electrode and shallow-trench isolation of both SOTB and hybrid bulk transistors can be patterned together because the step-height difference between the SOTB and hybrid bulk regions is small due to the thin SOI and BOX layers.

  6. The planar layout of transistors and logic cells is the same as that of the existing bulk technology.

  7. High soft-error (single-event-upset) immunity against high-energy-particle irradiation such as alpha particles and neutrons due to the thin active (channel) layer separated from the substrate by the BOX layer.

Proper Vth control is important to solve the performance and power trade-offs as described in Sect. 6.3. In the SOTB technology, the Vths of different flavors such as those suitable for ultra-low-voltage (Vdd down to 0.4 V) or ultra-low leakage (off-current down to pA/µm level) operations are controlled by selecting proper high-k gate-stack materials and changing the impurity density of the GP region [20].

Back-gate-bias controllability is an important point of the SOTB transistor design. In the SOTB structure, the GP layers act as back-gate electrodes. To achieve high back-gate-bias controllability, it is important to thin down the BOX layer and to decrease the depletion-layer thickness in the GP layer over the whole range of bias voltages, because the depletion layer also acts as a dielectric layer between the channel and the back gate. In a typical SOTB transistor design, for example, the back-gate-bias coefficient (γ) (Footnote 3) is about 0.16 for a design with a 10-nm BOX thickness and a nearly uniform impurity-density profile of \(1 \times 10^{18}\) cm\(^{-3}\) in the GP region just below the BOX layer [15].

The range of the back-gate-bias voltage is limited by the leakage current between the two GP layers through the deep n-well. The voltage difference, Vbp − Vbn (see Fig. 6.6), depends on the back-bias voltage (Vbb) and the operation voltage (Vdd), where Vbp = Vdd − Vbb and Vbn = Vbb. Under reverse back biasing to increase Vth (Vbb < 0), Vbp is positive and Vbn is negative. In such a case, the junction between the nGP and pGP is reverse biased. Under the forward biasing condition Vbn > Vbp, that is, Vbb > Vdd/2, the junction is forward biased. The maximum applicable positive back-bias voltage is thus limited by the condition Vbn − Vbp < 0.5 V (the built-in potential of the pn junction); that is, Vbb < 0.25 V + Vdd/2. A significant amount of leakage current flows from the pGP to the nGP when this condition is not satisfied. To apply a higher forward Vbb, the flip-well structure (in which the conduction types of the GP layers in Fig. 6.6 are swapped with each other) was proposed [21]. The forward Vbb significantly increases the maximum clock frequency of circuits, but also increases the static leakage current. The flip-well technology is thus suitable for high-performance applications. In the SOTB technology described in this section, the normal conduction types of the GP layers are preferred since static leakage reduction by reverse back biasing is important for IoT device applications.
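The bias relations in this paragraph can be checked with a small helper. This is a minimal sketch that only encodes the relations stated above (Vbp = Vdd − Vbb, Vbn = Vbb, and a 0.5 V limit on Vbn − Vbp); the function names and the three state labels are introduced here for illustration.

```python
JUNCTION_LIMIT_V = 0.5  # built-in potential of the GP pn junction, as quoted in the text


def gp_voltages(vdd: float, vbb: float):
    """Return (Vbp, Vbn) for a given supply voltage and back-bias voltage."""
    return vdd - vbb, vbb


def max_forward_vbb(vdd: float) -> float:
    """Upper bound on Vbb from Vbn - Vbp < 0.5 V, i.e. Vbb < 0.25 V + Vdd / 2."""
    return JUNCTION_LIMIT_V / 2.0 + vdd / 2.0


def junction_state(vdd: float, vbb: float) -> str:
    """Classify the bias of the junction between the two GP regions."""
    vbp, vbn = gp_voltages(vdd, vbb)
    forward_voltage = vbn - vbp
    if forward_voltage <= 0.0:
        return "reverse biased"
    if forward_voltage < JUNCTION_LIMIT_V:
        return "forward biased, below limit"
    return "forward biased, leaky"


if __name__ == "__main__":
    vdd = 0.4
    print(f"Vdd = {vdd} V: forward Vbb must stay below {max_forward_vbb(vdd):.2f} V")
    for vbb in (-2.0, 0.0, 0.3, 0.5):
        print(f"  Vbb = {vbb:+.1f} V: {junction_state(vdd, vbb)}")
```

The sketch treats the 0.5 V limit as a hard threshold, whereas in a real junction the leakage current rises continuously with forward voltage.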

6.7 Reduction of Vth Variation and Ultra-Low-Voltage SRAM Operation

The Vth variation of SOTB transistors was demonstrated to be about half that of bulk transistors of the same size for both p- and n-types [22]. Recent results [23], including the effect of back biasing, are shown in Fig. 6.7. It should be noted that the Vth variation under reverse back biasing at Vbb = −2 V, which reduces the static leakage by a few orders of magnitude, is the same as that at Vbb = 0 V.

Fig. 6.7

Vth distribution of one-million n-type SOTB transistors compared with bulk transistors of the same size © 2017 IEEE [23]

It is known that low-voltage operation of SRAM is more difficult than that of general CMOS logic circuits, as indicated in [10]. It is thus important to investigate the lowest operation voltage (Vmin) of SRAM to verify the effect of the characteristic variability reduction. For an SOTB SRAM with a cell area of 0.54 µm² in the conventional 6-transistor layout, a Vmin of 0.37 V was reported [22]. It was demonstrated that this Vmin can be achieved by controlling Vbb regardless of temperature variation from −30 to 80 °C. Figure 6.8 shows the Vmin of SOTB SRAMs with different Vth flavors (high speed or low leakage). A lower Vmin than that of bulk SRAM, even at higher Vth, was demonstrated.

Fig. 6.8

Vmin (VDDmin) of SOTB SRAMs compared with bulk SRAMs © 2017 IEEE [23]

It should be noted that the SOTB SRAM can store the data at very small leakage current level (cell leakage about 1.2 pA [22]) by applying a proper reverse Vbb. Taking advantage of this feature, the SOTB SRAM can be used as a pseudo-nonvolatile memory in the specific applications.

6.8 Circuit Design Environment and Open Shuttle Activity

The design flow of SOTB ICs can be built on that of the bulk CMOS technology of the same technology node. The electronic design automation (EDA) tools and their file formats are exactly the same as the existing ones, from the register transfer level (RTL) to the layout (graphic database system: GDS). Circuits including both SOTB and hybrid bulk transistors can be designed at the same time.

The design (mask) layer, layout rules and their verification files (including the antenna effect) should be revised or added to match the characteristics of SOTB. There are a few additional points to be specially considered related to the back-gate biasing. The location and distance of the back-gate-bias voltage taps are important design points for compromising the back-gate voltage stability and the integration density. These are generally embedded in the layout rule file and the standard-cell layouts. In some applications using different back-bias voltages in fine grained back-bias domains (that will be shown in Sect. 6.11), the spacing between the deep-n-well islands is preferably decreased, and it is also a trade off among the size, the leakage current, and the range of the back-bias voltage [24].

The compact model of transistors is indispensable for the circuit design with high accuracy of both timing and power estimation. Currently available transistor models for bulk transistors cannot be used accurately for the SOTB technology because the transistor characteristics under various back-bias voltages cannot be reproduced. New SPICE (Simulation Program with Integrated Circuit Emphasis) models for the SOTB and related thin-BOX FDSOI transistors with back biasing, namely, HiSIM-SOTB [25] and BSIM-IMG [26], have thus been developed. Both models are based on a surface-potential expression and they represent well the behavior of transistor characteristics with varying back-bias voltages.

Detailed studies on circuit design, such as body-biasing schemes, delay variability reduction, signal voltage design for ultra-low-voltage macros, and energy minimization, have been reported [27,28,29,30,31]. Moreover, various types of circuit designs have been implemented using the SOTB shuttle service operated by the University of Tokyo in collaboration with Renesas Electronics Corp. from FY 2015 [32]. About 12 chip designs per shuttle run were fabricated in FY 2018. This shuttle is not restricted to academia and can also be used for commercial prototyping. Most of the circuit design examples mentioned in the later sections were demonstrated using this shuttle service.

6.9 MCU with Back-Gate Bias Control

A low-energy-consumption central processing unit (CPU) core for the MCU application was demonstrated using the 65-nm SOTB process [33]. The CPU core consists of an in-order 5-stage pipeline and 4 blocks of 32 kword × 9 data memory. The scale of integration for this 32-bit CPU core is 50.1 kgate of logic and 144 kB of SRAM arrays, and the area is 2.1 mm². As shown in Fig. 6.9, the core is functional down to Vdd = 0.22 V at 1 MHz clock frequency, whereas the same core fabricated by the conventional bulk process operates down to 0.5 V. The MEP of the SOTB core is 13.4 pJ/cycle at Vdd = 0.35 V as shown in Fig. 6.10, which corresponds to 38 µA/MHz. This is a good number for an IoT application chip (see Sect. 6.2). The energy is optimized by controlling the back-gate bias Vbb. The sleep current is only 0.14 µA at Vdd and Vbb of 0.35 and −2.5 V, respectively. Considering intermittent operation, the average current consumptions for activities (ratio of wakeup time) of 0.1, 1, and 10% are 0.52, 3.94, and 38.1 µA, respectively, and these are suited for both battery- and EH-powered operation. Note that the current consumption of the Vbb generator circuit should be taken into account for this type of operation, because the generator must work throughout the standby state. It was reported that this current consumption can be less than 1 µA [34], and thus it is negligibly small. These data, low energy and low sleep current, show that the SOTB technology is suitable for energy-efficient MCUs in IoT applications.

Fig. 6.9

Maximum operating frequency of 32-bit CPU core fabricated by SOTB and bulk technologies [33] © 2014 IEEE (same figure as Fig. 2.10 (a) in CHIPS 2020 Vol. 2)

Fig. 6.10

Energy per cycle of 32-bit CPU core fabricated by SOTB and bulk technologies [33] © 2014 IEEE (same figure as Fig. 2.10 (b) in CHIPS 2020 Vol. 2)
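The figures quoted for the SOTB CPU core [33] can be cross-checked with a short calculation. Energy per cycle divided by Vdd gives the charge per cycle, and 1 pC/cycle equals 1 µA/MHz. For the duty-cycled average, the sketch below assumes a 10 MHz clock during wakeup; this clock value is not stated in the text and is an assumption chosen because it reproduces the quoted averages.

```python
def active_current_ua_per_mhz(energy_pj_per_cycle: float, vdd_v: float) -> float:
    """Convert energy per cycle to active current per MHz (pJ / V = pC = uA/MHz)."""
    return energy_pj_per_cycle / vdd_v


def average_current_ua(activity: float, active_ua: float, sleep_ua: float) -> float:
    """Time-weighted average of active and sleep currents for a given wakeup ratio."""
    return activity * active_ua + (1.0 - activity) * sleep_ua


if __name__ == "__main__":
    per_mhz = active_current_ua_per_mhz(13.4, 0.35)
    print(f"MEP: {per_mhz:.1f} uA/MHz")

    assumed_clock_mhz = 10.0              # assumption, not given in the text
    active_ua = 38.0 * assumed_clock_mhz  # 38 uA/MHz as quoted
    for activity in (0.001, 0.01, 0.1):
        avg = average_current_ua(activity, active_ua, sleep_ua=0.14)
        print(f"activity {activity * 100:g}%: average {avg:.2f} uA")
```

Under the assumed 10 MHz clock, the averages round to the quoted 0.52, 3.94, and 38.1 µA. At low activity, the sleep current contributes a noticeable share of the average, which is why the low standby current matters for intermittent operation.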

An advanced MCU chip design equipped with an on-chip Vbb generator and various peripheral circuits was demonstrated [23]. Assuming an application with an EH power source and RF communication, the process and operating conditions of SOTB are slightly modified (with higher Vth) from those of [33]. The scale of integration for this MCU chip is 64 kgate of 32-bit CPU logic and 64 kB of SRAM. The maximum operation frequency can be controlled by Vdd and Vbb as shown in Fig. 6.11. At Vdd and Vbb of 0.75 and 0 V, respectively, the maximum frequency is 75 MHz and the active current is 37 µA/MHz. The leakage currents at Vbb = 0 and −1.5 V are 4.3 µA and 45 nA, respectively. In this design, due to the higher Vdd and frequency than in the design of the previous paragraph [33], the energy per cycle might be higher than at the MEP. Nevertheless, it provides a practically useful option (the active current is at the same level), as long as the available supply voltage matches the required Vdd (for example, using a dry cell with an end voltage of 0.8 V). In the above MCU designs, the nonvolatile memory macro usually used to store the program code is not implemented. In these chips, however, the code can be fetched from the SRAM, where it can be retained by taking advantage of the very small leakage current under reverse Vbb.

Fig. 6.11

Maximum clock frequency of MCU chip as a function of Vdd and Vbb © 2017 IEEE [23]

The energy and current consumption of the CPU cores are compared in Table 6.1. It is remarkable that both the active and standby currents for the SOTB CPUs are small.

Table 6.1 Comparison of energy and current consumption of various 32-bit CPU cores

6.10 MCU with Embedded Memory

In many applications using MCUs, the embedded nonvolatile memory is useful for storing the program code, parameters and the various data, from the sensors for example. Taking advantage of the hybrid bulk integration capability of the SOTB technology, the conventional embedded flash-memory macro can be integrated with the SOTB MCU core. The integration of a two-transistor type metal-oxide-nitride-oxide-silicon (MONOS) flash memory macro was demonstrated [37]. A new sense amplifier and a data transmission circuit were designed to utilize the SOTB’s low-energy and low-voltage capability. The memory operates at 64 MHz, and its read energy and current are 0.22 pJ/bit and 6.32 µA/MHz (32 bit bus).

Among the various embedded-memory candidates, a code memory using the atom switch, which has lower energy than conventional flash, was demonstrated [38]. The atom switch belongs to the family of resistive random-access memory (ReRAM) and utilizes a polymer electrolyte and metal (copper and ruthenium) electrodes. Its advantages are a writing voltage as low as 2 V and a high on-off ratio. A 32-bit MCU test chip with the atom-switch code memory was fabricated. The chip can operate at 25 MHz at Vdd = 0.39 V. The energies per cycle for the memory and in total (memory and logic) are 4.48 pJ (0.14 pJ/bit) and 18.26 pJ, respectively. The latter corresponds to an active current of 46.82 µA/MHz. Moreover, nonvolatile programmable-logic circuits can be embedded with the atom-switch technology on the SOTB CMOS platform [39]. Such a circuit acts as an off-loader that improves the total energy efficiency (the same processing with fewer clock cycles) compared with CPU-only circuits. Table 6.2 and Fig. 6.12 compare the performances of various MPU chips with different technologies and types of embedded flash memory.

Table 6.2 Comparison of various MPUs with embedded flash memories
Fig. 6.12

Benchmark of active and standby currents for various MCUs [23, 32, 42]

Finally in this section, the features and properties of the first commercial MCU chip of the SOTB technology are briefly described [42]. The CPU core is the Cortex M0+ (32 bit, two-stage pipeline) with a 1.5 MByte flash memory and 256 kByte SRAM. It operates up to 64 MHz, and the active and the standby currents are 20 µA/MHz and 200 nA, respectively. The energy performance seems to be improved from [23]. Various peripheral IP (intellectual property) cores are also embedded in the chip: analog-digital converters (ADCs), digital-analog converters (DACs), a temperature sensor, timers, serial interfaces, display interfaces, and security functions, as shown in Fig. 6.13. The unique feature of the chip is the embedded EH controller. Various types of harvesters and an energy-storage capacitor can be controlled by this chip. Due to the outstanding low-power performance, this chip seems to be a very suitable option to be used in the IoT front-end devices.

Fig. 6.13

Block diagram of commercial MCU chip on SOTB technology [42] (SRAM: Static Random-Access Memory, ADC: Analog-Digital Converter, Vref: Reference Voltage, DAC: Digital-Analog Converter, GPT: General PWM Timer, PWM: Pulse Width Modulation, LED: Light Emitting Diode, FIFO: First-In First-Out, SPI: Serial Peripheral Interface, IIC: Inter-Integrated Circuit, QSPI: Quad SPI, USB: Universal Serial Bus, DMA: Direct Memory Access, CRC: Cyclic Redundancy Check, IWDT: Independent Watchdog Timer, WDT: Watchdog Timer, TSIP: Trusted Secure IP, TRNG: True Random Number Generator, AES: Advanced Encryption Standard, MPU: Memory Protection Unit)

6.11 Reconfigurable Circuits

This section describes circuits in which back-bias control has a strong effect on the optimization of performance and power (especially static power). Reconfigurable circuits, such as the field-programmable gate array (FPGA), are widely used. It is well known that flexibility and power-performance efficiency are in a trade-off relationship. For example, hard-wired logic circuits such as the application-specific integrated circuit (ASIC) are overwhelmingly efficient compared to software-defined circuits such as the microprocessor. However, they offer no flexibility to change the function of the circuits. Moreover, the high production volume required for custom ICs such as ASICs is another obstacle for small-volume products. The FPGA is a good compromise in this trade-off and is thus widely used. To optimize the power efficiency of reconfigurable circuits, however, there is a problem to solve. In the design of hard-wired logic circuits, the designer can select the technology options, that is, the Vth flavors, for each specific part of the circuits. In general, the critical paths are found through timing analysis, and low-Vth transistors are used only in these critical paths. By selecting proper Vth options, the performance and power of the circuits can be optimized. In reconfigurable circuits, however, the speed requirement of each processing element (PE) is not determined at the time of the circuit design. In the conventional FPGA, therefore, all the PEs need to be set for the highest speed, that is, low Vth. Since not all the PEs need to work with full activity in most applications, there is a huge power loss in the conventional FPGA.

Independent back biasing of each PE is thus a powerful way to reduce the power consumption of reconfigurable circuits. The important insight for back biasing in these circuits is that only the PEs in the bottleneck process need to be sped up, while the other PEs are better slowed down (with reverse back biasing) to reduce the leakage power while securing the total performance (clock frequency).

A significant improvement in the power efficiency of FPGAs was demonstrated with independent back biasing for each PE in an FPGA named Flex-Power FPGA [43], using the 65 nm SOTB process. The schematic architecture of the Flex-Power FPGA is shown in Fig. 6.14. Each PE has a body bias (back bias) selector connected to the body bias voltage lines for p- and n-type SOTBs (Vbp and Vbn). By using the mapping tool specially designed for the Flex-Power FPGA, the circuit is mapped onto the look-up tables of the FPGA. At the same time, the critical paths are found and the proper body-bias-selector information is also mapped. As an example, the result for a 32-bit binary counter is shown in Figs. 6.15 and 6.16. The counter operates from 14 to 72 MHz at Vdd from 0.5 to 1.2 V, respectively. It should be noted that the frequency does not change with the reverse back-bias voltage (VRBB). This is because the above mapping software sets the reverse-bias flags only for the non-critical paths. The reverse bias reduces the static power by 59 and 80% at Vdd of 1.2 and 0.5 V, respectively, as shown in Fig. 6.16. Detailed analyses of the performance and power of the Flex-Power FPGA are described in [44].

Fig. 6.14
figure 14

Schematic FPGA architecture with independent back-biasing © 2016 IEICE [43]

Fig. 6.15
figure 15

Operation frequency of Flex-Power FPGA with different back-bias voltages © 2016 IEICE [43]

Fig. 6.16
figure 16

Static power reduction by back biasing in Flex-Power FPGA © 2016 IEICE [43]

Another significant power saving in reconfigurable circuits with back-biasing was demonstrated on a reconfigurable accelerator circuit named cool mega array (CMA) [45]. There are various types of reconfigurable circuits, such as the FPGA and the dynamic reconfigurable processor array (DRPA), with different time scales of the reconfiguration action. The CMA is designed as an off-loading processor for various image or sensing data in low-power battery-operated applications, and it reduces the power below that of the existing DRPAs (but without dynamic reconfigurability). The block diagram of the CMA is shown in Fig. 6.17. It has a large PE array without memory elements for mapping the data flow of the application program, and a small programmable microcontroller for the data management. Results for typical image processing (alpha blender, sepia filter, and gray-scale filter) are shown in Fig. 6.18. The maximum performance of 743 MOPS/mW, which corresponds to 1.35 pJ per operation cycle, is achieved at Vdd = 0.5 V with the optimized back-bias voltage. Note that the curve in this graph is similar to the behavior of energy per cycle versus Vdd shown in Figs. 6.5 and 6.10. The image processing on an evaluation board was demonstrated using lemon batteries [45] or indoor solar cells.
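The correspondence between 743 MOPS/mW and 1.35 pJ per operation is a unit conversion, which can be checked with a short Python sketch (the function name is chosen here only for illustration).

```python
def pj_per_operation(mops_per_mw):
    """Convert an efficiency in MOPS/mW into energy per operation in pJ."""
    ops_per_joule = mops_per_mw * 1e6 / 1e-3  # (operations/s) per watt = operations per joule
    return 1e12 / ops_per_joule               # joules per operation, expressed in pJ


print(round(pj_per_operation(743), 2))  # 1.35
```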

Fig. 6.17
figure 17

Block diagram of a CMA © 2015 IEEE [45] (ALU: arithmetic and logic unit, SEL: switching element)

Fig. 6.18
figure 18

Performance of the CMA for various image processing with and without back-biasing © 2015 IEEE [45]

In the back-bias operation of these reconfigurable circuits, the granularity of the back-bias domains is an important design point. Considering the effect of the back-biasing, it would be ideal to control every PE independently; however, this has a high area penalty. The optimization of the domain division size was investigated for the CMA [46]. The domain sizes range from 1 × 1 to 4 × 4 PEs, and the corresponding area penalties vary from 12% down to 1%. Figure 6.19 shows the power reduction ratio for various image processing algorithms, compared to the case in which all the PEs operate under zero back-bias voltage. The back-bias voltages are optimized for each algorithm and each power domain. Although the results differ slightly among the algorithms, because the usage of PEs is different, they clearly show that each algorithm has an optimum domain size that combines low power with a small area penalty.

Fig. 6.19
figure 19

Power reduction ratios under optimal back bias compared to zero back bias for CMA of various domain division sizes © 2016 IEEE [46] (alpha: 8-bit alpha blender and af: 24-bit RGB alpha blender)

6.12 Data Processing Circuits

Low-power data processors for data query, pattern matching, database operation, signal processing, etc. are important building blocks in the IoT edge processing. As well as the parallel operation by general-purpose computing on graphics processing units (GPGPU) and FPGA, dedicated data processing units are useful in terms of higher energy efficiency. In this section, the data processing circuits based on the content-addressable memory (CAM) and the coordinate-rotation digital computer (CORDIC) algorithm are described.

The CAM-based pattern matching system for two-dimensional image search is implemented on the SOTB technology [47]. The system consists of a CAM block, a shift circuit, multiplexers, an AND logic, and a finite-state machine (FSM) controller. The CAM memory block is designed by using the two-port SRAM macro of the 65-nm SOTB technology library. Back-bias flexibly controls the active performance under the operation state, and a reverse bias of −1.2 V reduces the leakage current down to 2 µA (0.2 mA with zero bias) under the standby state. Table 6.3 compares the performance of the system with that of the bulk 65-nm process. Significant increase of energy efficiency (more than ×5) with comparable search-time performance is achieved by the SOTB technology.

Table 6.3 Performance comparison of CAM-based pattern matching systems

Bitmap indexing is a kind of database indexing that is used for improving the speed of database retrieval, and it is useful for various data analytics. A bitmap index creator (BIC) chip with high energy efficiency was demonstrated [49]. The block diagram of the BIC core is shown in Fig. 6.20. This core is used to index N records by M given keys. The record R1 is fed into the CAM with all M keys. If R1 contains some of the keys, bit flags turn on (one by one for all the M keys) at the corresponding positions of the M × N bit matrix that is finally stored in the BI memory. The chip fabricated in the 65-nm SOTB technology operates at 41 MHz (at Vdd = 1.2 V) and 10 MHz (at Vdd = 0.4 V), where the energy consumptions are 163 and 19 pJ/cycle, respectively. A remarkably small standby power of 2.64 nW (0.31 pW/bit) is achieved at Vdd = 0.4 V with a reverse back bias of −2 V.
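The indexing operation can be illustrated in software with a minimal Python sketch. It only mimics the M × N bit matrix described above and says nothing about the CAM-based hardware of [49]; the function names and the representation of a record as a set of items are assumptions for illustration.

```python
def build_bitmap_index(records, keys):
    """Return an M x N bit matrix: entry [m][n] is 1 if record n contains key m."""
    return [[1 if key in record else 0 for record in records] for key in keys]


def records_with_all(bitmap, key_rows):
    """Indices of records containing every selected key (bitwise AND of the rows)."""
    n_records = len(bitmap[0])
    return [n for n in range(n_records) if all(bitmap[m][n] for m in key_rows)]


records = [{"red", "small"}, {"red", "large"}, {"blue", "small"}]
keys = ["red", "small", "blue"]
bitmap = build_bitmap_index(records, keys)
print(bitmap)                            # [[1, 1, 0], [1, 0, 1], [0, 0, 1]]
print(records_with_all(bitmap, [0, 1]))  # [0]
```

Once the bitmap is built, a query over several keys reduces to bitwise operations on its rows, which is why this kind of index speeds up database retrieval.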

Fig. 6.20
figure 20

Block diagram of the BIC core (TM: transpose matrix, CU: control unit, TU: transpose unit) © 2019 Elsevier [49]

An adaptive CORDIC-based FFT (fast Fourier transform) macro was implemented on the 65-nm SOTB technology [50]. By utilizing both forward and reverse back biasing, the active energy performance and the leakage can be optimized. The clock frequency is 43 MHz at Vdd = 1.0 V with zero back bias, where the energy is 10.27 pJ/cycle. The energy can be decreased to about 3 pJ/cycle by decreasing Vdd down to 0.5 V and controlling Vbb to satisfy the required delay. Table 6.4 compares the performance of the FFT macros. A remarkable reduction in energy is demonstrated by both the adaptive CORDIC architecture and the SOTB technology.
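As a small worked example, the energy per cycle and the clock frequency quoted above can be combined into an average power, since power equals energy per cycle times frequency. The Python helper below is only an illustrative calculation on the numbers in the text; the function name is an assumption.

```python
def average_power_uw(energy_pj_per_cycle, frequency_mhz):
    """Average power in microwatts: pJ/cycle x MHz = 1e-12 J x 1e6 1/s = 1e-6 W."""
    return energy_pj_per_cycle * frequency_mhz


# FFT macro at Vdd = 1.0 V, zero back bias: 10.27 pJ/cycle at 43 MHz
print(round(average_power_uw(10.27, 43), 2))  # 441.61 (microwatts)
```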

Table 6.4 Performance comparison of FFT macros [50]

6.13 Security Circuits

It is widely accepted that IoT devices should be robust against attacks, both via the network and by physical access to the device. There are various studies on security-related circuits. This section describes typical circuits, such as encryption circuits and physically unclonable functions (PUFs), with ultra-low power consumption suited for IoT devices.

The advanced encryption standard (AES) is widely used as an encryption method [52]. Area penalty, encryption speed, and low power are main issues on the AES encryption macros. The AES encryption circuits with a simple clock-gating technique were implemented by using the 65 nm SOTB process [53, 54]. The performances of the 8-bit AES encryption circuits are compared in Table 6.5. Significant energy reduction is achieved by the 65 nm SOTB technology while keeping the frequency relatively high.

Table 6.5 Comparison of performances of 8-bit AES encryption circuits

Signature generation circuits for elliptic-curve cryptography (ECC), which uses a smaller key size than the conventional RSA that is widely used for digital signatures, were developed as a suitable candidate for small IoT devices [58, 59]. By improving the signature generation architecture and optimizing Vdd and Vbb on the 65-nm SOTB technology, a smaller energy and a shorter signature generation time (Tsig) are demonstrated. Figure 6.21 plots the signature generation time, operating frequency, power consumption, and energy consumption per signature generation as a function of Vdd for the ECC circuits [58]. The minimum energy is 1.68 µJ at Vdd = 0.3 V and Tsig = 2.3 ms. On the process with the higher Vdd flavor [59], the signature generation is about 10 times faster while the energy is about twice as large. The performances of the ECC circuits with a Galois field of 256 bits are compared in Table 6.6. Among the circuits of the state-of-the-art technologies, the ECC circuits with the SOTB process are advantageous in both generation time and energy.

Fig. 6.21
figure 21

Signature generation time, operating frequency, power and energy consumptions for ECC circuits implemented on 65-nm SOTB technology © 2016 IEEE [58]

Table 6.6 Performance comparison of ECC circuits with a Galois field of 256 bits

Physically unclonable functions (PUFs) can be used for IC authentication, like a fingerprint, to prevent counterfeiting. Among various types of PUFs, the PUF using the silicon technology, in general, generates individual identification data extracted from the characteristic variability of each chip, such as the power-on initial value of SRAMs or the delays of gates. A low-power PUF macro is implemented on the 65-nm SOTB technology [63]. The circuit consists of two chains of selectors generating a delay variation and a flip-flop acting as an arbiter, as shown in Fig. 6.22. A concern arises in implementing the PUF on SOTB: the small variability of SOTB can deteriorate the uniqueness of the PUF. The result shows that the identification error rate is rather high in the same voltage range as the conventional bulk CMOS; however, by applying the reverse back-bias or decreasing Vdd, the error rate decreases because the delay variability increases. This means that the SOTB PUF can be used under conditions of lower voltage and lower power consumption than the conventional bulk PUF. A USB-stick-sized PUF module is also implemented by using this technology [64].
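The principle of a delay-based arbiter PUF with two selector chains can be sketched with a simple behavioral model in Python. This is a generic textbook-style model, not the circuit of [63]; the Gaussian delay distribution, the parameter `sigma`, and all function names are assumptions for illustration. Each chip is a set of randomly varying stage delays, a challenge bit decides whether the two signals go straight or cross at each stage, and the arbiter outputs which signal arrives first.

```python
import random


def make_chip(n_stages, sigma, seed):
    """Model one chip: per stage, four delays (top/bottom straight, top/bottom crossed)."""
    rng = random.Random(seed)
    return [[rng.gauss(1.0, sigma) for _ in range(4)] for _ in range(n_stages)]


def response(chip, challenge):
    """Return 1 if the signal on the top line reaches the arbiter first, else 0."""
    top = bottom = 0.0
    for (t_straight, b_straight, t_cross, b_cross), bit in zip(chip, challenge):
        if bit == 0:   # signals stay on their own lines
            top, bottom = top + t_straight, bottom + b_straight
        else:          # signals swap lines in this stage
            top, bottom = bottom + t_cross, top + b_cross
    return 1 if top < bottom else 0


challenge = [0, 1, 1, 0, 1, 0, 0, 1]
for seed in (1, 2, 3):  # three hypothetical chips
    print(seed, response(make_chip(8, 0.05, seed), challenge))
```

In such a model, a smaller `sigma` makes the two arrival times closer, so the response depends more on noise than on the chip. This is consistent with the concern stated above that a small transistor variability can be a disadvantage for a PUF.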

Fig. 6.22
figure 22

Block diagram of PUF circuit © 2017 IEEE [63]

6.14 Analog and RF Circuits

In this section, various analog and rf circuit implementations are described.

ADCs (analog-to-digital converters) are indispensable parts of MCUs, and successive-approximation-register (SAR) type or Δ-Σ type ADCs are frequently used. A very low power Δ-Σ modulator circuit was demonstrated [65]. Figure 6.23 shows the block diagram. By adequately controlling the back-bias voltages, the mid rail is tuned to half of Vdd with symmetrical operation of the inverters that drive the switched capacitors, and this enables operation at a very low Vdd. The modulator operates at Vdd = 0.5 V and achieves a power consumption of 910 nW (0.07 μW/MHz) and a conversion figure of merit (FoM) of 46 fJ/conversion.

Fig. 6.23
figure 23

Block diagram of the Δ-Σ modulator. Two integrators, 1 bit DAC, and a comparator are composed of back-gate controlled inverters without differential amplifiers © 2017 IEEE [65]

A voltage-controlled oscillator (VCO) with back-bias control was implemented on the 65-nm SOTB technology [66]. As shown in Fig. 6.24, the VCO consists of a ring oscillator. Figure 6.25 shows the oscillation frequency and the current consumption. They are controlled by the back-bias voltages, where Vc,dif = Vbp − Vbn and Vc,com = (Vbp + Vbn)/2 = Vdd/2. The oscillator operates at Vdd = 0.55 V with a tuning range from 377 to 556 MHz and achieves FoM = −158 dBc/Hz. This FoM value is the best among the CMOS ring-type VCOs operating below 1.0 V.
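The two definitions Vc,dif = Vbp − Vbn and Vc,com = (Vbp + Vbn)/2 = Vdd/2 can be inverted to obtain the individual back-bias voltages: Vbp = Vdd/2 + Vc,dif/2 and Vbn = Vdd/2 − Vc,dif/2. The Python sketch below only performs this algebra; the function name and the example control voltage of 0.2 V are assumptions, not values from [66].

```python
def back_bias_voltages(vc_dif, vdd):
    """Solve Vc,dif = Vbp - Vbn and Vc,com = (Vbp + Vbn)/2 = Vdd/2 for (Vbp, Vbn)."""
    vc_com = vdd / 2.0
    vbp = vc_com + vc_dif / 2.0
    vbn = vc_com - vc_dif / 2.0
    return vbp, vbn


vbp, vbn = back_bias_voltages(vc_dif=0.2, vdd=0.55)  # 0.2 V is a hypothetical control value
print(round(vbp, 3), round(vbn, 3))  # 0.375 0.175
```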

Fig. 6.24
figure 24

Block diagram of a ring-type VCO © 2017 IEEE [66]

Fig. 6.25
figure 25

Oscillation frequency and current consumption controlled by back-bias voltage © 2017 IEEE [66]

An ultra-low-power rf receiver and transmitter for wireless sensor nodes are described here. On-off keying (OOK) is a simple modulation scheme suitable for low-power applications. A 312–315 MHz receiver circuit was designed on the 65-nm SOTB technology [67]. Post-layout simulation showed −58.5 dBm sensitivity with 1.36 and 8.39 μW power consumption at data rates of 10 and 100 kbps, respectively. A code-modulated synchronized (CMS)-OOK transmitter with a normally-off intermittent operation scheme was implemented using the 65-nm SOTB technology to significantly reduce the power consumption of the transmitter [68]. The digital part is implemented on an FPGA. By employing the CMS scheme shown in Fig. 6.26, a ring-oscillator-type internal carrier oscillator with relatively high jitter can be used. This makes it possible to turn the carrier generator on quickly (reducing the on duration of rf transmission) and to reduce the power consumption in the intermittent operation. Also, the peak output power can be reduced by spreading the carrier frequency with a back bias of triangular waveform. Signal modulation via the back-bias terminals is a unique feature of SOTB for analog applications. As a result, a peak power spectral density of −62 dBm/MHz at 15 MHz bandwidth is achieved. The chip consumes 83 µW on average, corresponding to 83 nJ/bit at 1 kbps data transmission. (The analog part of the power amplifier operates at 1.0 V, and the rest operates at 0.75 V.)
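The energy per bit follows from dividing the average power by the data rate. The short Python sketch below reproduces the 83 nJ/bit figure of the transmitter and applies the same conversion to the simulated receiver powers; the function name is chosen only for illustration.

```python
def energy_per_bit_nj(power_uw, data_rate_kbps):
    """Energy per bit in nJ: microwatts / kbps = 1e-6 W / 1e3 bit/s = 1e-9 J/bit."""
    return power_uw / data_rate_kbps


print(energy_per_bit_nj(83, 1))                # 83.0 nJ/bit (transmitter)
print(round(energy_per_bit_nj(1.36, 10), 4))   # 0.136 nJ/bit (receiver, 10 kbps)
print(round(energy_per_bit_nj(8.39, 100), 4))  # 0.0839 nJ/bit (receiver, 100 kbps)
```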

Fig. 6.26
figure 26

Block diagram of a CMS-OOK transmitter © 2018 IEICE [68]

The dynamic threshold MOSFET (DTMOS) operation can be realized with the SOTB technology by applying the same signal to the back gates as to the front gates. An rf energy-harvesting circuit is implemented by using the SOTB DTMOS [69]. This harvester consists of three stages of cross-coupled rectifiers, as shown in Fig. 6.27, connected in series. The rf signals collected from an antenna are fed into the VIN terminals, and the rectifier outputs a dc voltage from the VDC terminal. The nodes (N1, N2, P1, and P2) are boosted by two additional floating nodes (not shown) of a structure similar to that in Fig. 6.27 to improve the rectifying operation in the small input power range. The experimental result shows that the output dc voltage exceeds 1000 mV at a 954-MHz input power of −9 dBm. With an 18-cm dipole antenna collecting rf in the laboratory environment, the output voltage is 130 mV.
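To put the input level in absolute terms, a power in dBm can be converted to milliwatts with P = 10^(dBm/10). The Python sketch below applies this standard conversion to the −9 dBm input power quoted above; the function name is an assumption.

```python
def dbm_to_mw(power_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (power_dbm / 10.0)


print(round(dbm_to_mw(-9) * 1000))  # 126 (microwatts)
```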

Fig. 6.27
figure 27

Circuit schematic of cross-coupled rectifier © 2019 IEEE [69]

The low-frequency noise characteristics of SOTB have been extensively studied [70]. Figure 6.28 shows distribution of the drain-current normalized current noise intensity for bulk and SOTB MOSFETs. Due to the low impurity density of the channel region, the variation in the noise characteristics is smaller than in bulk CMOS. Although the median value of noise is higher than in the bulk due to an additional interface between channel and the BOX layer, considering the variability tail, the noise characteristics of the SOTB is better.

Fig. 6.28
figure 28

Cumulative frequency distribution of drain-current normalized current noise intensity (S_id/I_d^2) in bulk and SOTB MOSFETs. a Weak inversion state and b strong inversion state © 2018 JSAP [70]

6.15 Soft-Error Reliability

There are various reliability issues in silicon CMOS. In FDSOI structures like SOTB, there are additional reliability issues such as the bias temperature instability related to the electric field in the SOTB transistor being different from that of the bulk CMOS [71, 72], the antenna effect (plasma damage during the fabrication process) [73, 74], and the gate-oxide reliability of the hybrid bulk CMOS fabricated on the surface exposed by removing the SOI and BOX layers [75]. In addition to these issues related to the transistor process, the soft error, especially the single event upset (SEU) of SRAMs and logic circuits, is a serious reliability problem for ICs. The SOI CMOS transistors inherently have a higher soft-error immunity than the bulk CMOS transistors because the BOX insulating layer in their structure prevents most of the charges generated by the ion incidence from flowing to the channel. This section describes the soft errors of SRAMs and logic circuits, and their combined effect on the chip-level soft-error immunity.

The SEU caused by alpha and neutron irradiation on the SOTB SRAM has been thoroughly studied in comparison with the bulk SRAM of the same footprint [76]. The SOTB SRAM can operate at a low voltage down to ~0.4 V [22]; however, this can increase the risk of a soft error compared with the conventional bulk SRAM operating at a higher Vdd such as 1.0 V. The measurement results for both alpha and neutron irradiation show that the soft-error immunity of the SOTB SRAM is superior to that of the bulk SRAM. In SRAMs that require high reliability, an error correction code (ECC) is implemented. If multiple memory cells in a row are upset at a time by a single particle incidence, however, there is some possibility that the ECC cannot work completely. The multiple cell upset (MCU: not the microcontroller unit in this section) is thus a significant point to be considered for the SRAM reliability. As shown in Fig. 6.29, the MCU rate (FIT: failure in time) for the SOTB SRAM is lower than that of the bulk SRAM. The complete dielectric separation between transistors by both the shallow trench isolation (STI) and the BOX layer in the FDSOI transistor contributes to reducing the risk of MCU. This result suggests that the SOTB SRAM is more robust even at 0.4 V than the bulk SRAM at 1.0 V. Moreover, the soft-error rate under the reverse back-bias condition is significantly reduced.
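Why a multiple cell upset can defeat an ECC may be illustrated with a single-error-correcting Hamming(7,4) code in Python. This small code is chosen here only as an example; the source does not state which ECC is used in the SRAMs, and practical memories typically use longer code words. A single flipped bit is corrected, whereas two flipped bits in the same code word lead to a wrong "correction".

```python
def encode(data):
    """Hamming(7,4): return the code word [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(codeword):
    """Correct at most one bit error and return the four data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # 1-based position of the suspected error, 0 if none
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
word = encode(data)

single = list(word)
single[4] ^= 1                 # one upset cell in the word
print(decode(single) == data)  # True: the single error is corrected

double = list(word)
double[4] ^= 1
double[5] ^= 1                 # two upset cells in the same word
print(decode(double) == data)  # False: the decoder miscorrects
```

A lower MCU rate therefore directly increases the fraction of upsets that the ECC can repair.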

Fig. 6.29
figure 29

Measured neutron-induced SEU and MCU as a function of supply voltage © 2015 IEEE [76]

Recently, the soft error due to muon irradiation has drawn much attention, especially for SRAMs fabricated by highly scaled processes. The muon soft-error rate for both the bulk and the SOTB SRAMs was studied [77]. The experimental results reveal that the effect of muons is not significant compared to neutron effects for the 65-nm technologies and that the SOTB is less sensitive to the muon irradiation than the bulk.

A new type of soft error was found on the SOTB SRAM [78]. In contrast to the above-mentioned superior results on the SOTB's soft-error immunity, a 100-fold increase in soft errors is observed under the reverse back-bias compared to the zero back-bias. A remarkable phenomenon is that the multiple-bit error occurs along the bit-line direction. In this direction, the p-well (p GP in Fig. 6.6) is common in an array of the SOTB SRAM. As schematically depicted in Fig. 6.30, electrons generated by the incident ions can modulate the potential of the p-well (p GP) layer, and this effect is significant if this layer is in the reverse bias state. Although this phenomenon is not favorable for low-power circuit operation, which tends to use the reverse biasing, the modeling of this soft error [79] can contribute to optimizing the triple-well structure and the BOX thickness, and the reliability will be improved further.

Fig. 6.30
figure 30

Schematic illustration to explain multiple cell upset through a p-well layer underneath the BOX layer © 2018 IEEE [78]

The soft error caused in the logic circuits can seriously affect the operation, because there is generally no recovery mechanism like the ECC for SRAMs, other than using redundant circuits with majority logic. In particular, the flip-flop (FF) circuit is known to be relatively weak among various logic circuits. The experimental results for alpha and neutron irradiation were reported [80]. Figure 6.31 shows the neutron results for D-type FF as a function of back-bias voltage. It is remarkable that the soft-error immunity of SOTB D-FF is about 20 times better than the bulk D-FF, and the immunity of the SOTB D-FF becomes stronger with reverse back-bias whereas that of the bulk D-FF slightly increases.

Fig. 6.31
figure 31

Soft-error rate by neutron irradiation for bulk and SOTB D-FFs © 2014 IEEE [80]

There are various FF structures for radiation hardening, such as the dual interlocked storage cell (DICE) latch [81]. In the FDSOI structures, for the same reason as the robustness against multiple cell upset in SRAMs, the impact of a single event in one transistor on the adjacent transistor is weaker for the SOTB transistor than for the bulk one. A series connection of two transistors, as in the stacked inverter structure [82], is thus an effective way to improve the soft-error immunity. There are trade-off relationships between the soft-error immunity of a circuit and its size and delay, because a soft-error-immune circuit tends to require additional transistors. Studies to resolve these trade-offs were reported on the SOTB circuits with various circuit topologies [83,84,85]. Figures 6.32 and 6.33 show the circuit schematics of the conventional transmission-gate FF (TGFF) and the feedback recovery FF (FRFF), respectively. The latter is considered to be a superior structure in terms of the above trade-offs among the SOI FFs, and its feature is the additional feedback lines, indicated by N2, with only two additional transistors. The area, delay, and power of the FRFF increase by only 6%, 6%, and 3%, respectively, from those of the conventional TGFF, and the average soft-error rate by neutrons is 1/3 of that of the TGFF. The average cross section over heavy ions (Ar and Kr) is also 1/2 of that of the TGFF.

Fig. 6.32
figure 32

Circuit schematic of conventional transmission-gate FF (TGFF) © 2019 Prof. Kobayashi [85]

Fig. 6.33
figure 33

Circuit schematic of feedback recovery FF (FRFF) © 2019 Prof. Kobayashi [85]

By combining the results of SRAM and FF, the chip-level soft-error rate (SER) was estimated [86]. Two types of typical processor chips were assumed: a high-performance processor of 6 × 6 mm² size with 50% SRAM area and an embedded (open RISC) processor of 1 × 1 mm² size with 91% SRAM area. The chip-level SERs for the bulk and SOTB chips operated at 0.5 and 1.0 V were calculated. Most (>95%) of the errors occur in the SRAM area when ECC is turned off. By applying ECC, the error rates of the SOTB and bulk chips were drastically reduced, by two orders and one order of magnitude, respectively. The smaller risk of MCU for SOTB enhances the effect of ECC. The results with ECC are shown in Fig. 6.34. A significant decrease in the chip-level SER for SOTB was demonstrated. When ECC is applied, the majority of errors occur in the FF area. Note that the data of the conventional D-FF are used in this estimation. By using highly immune FF structures as described in the previous paragraph, the chip-level SER is anticipated to be improved further.
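The structure of such a chip-level estimate can be sketched as an area-weighted sum of an SRAM contribution and a logic (FF) contribution. The Python model below is a simplified assumption and not necessarily the method of [86]; the per-area error rates and the ECC reduction factor in the example are arbitrary placeholder values, not measured data. Only the chip size and the SRAM area fraction are taken from the text.

```python
def chip_ser(area_mm2, sram_fraction, sram_ser_per_mm2, ff_ser_per_mm2, ecc_reduction=1.0):
    """Return (SRAM contribution, logic contribution) to the chip-level SER.

    sram_ser_per_mm2, ff_ser_per_mm2: hypothetical error rates per unit area (arbitrary units)
    ecc_reduction: factor by which ECC lowers the SRAM error rate (1.0 means ECC off)
    """
    sram = area_mm2 * sram_fraction * sram_ser_per_mm2 / ecc_reduction
    logic = area_mm2 * (1.0 - sram_fraction) * ff_ser_per_mm2
    return sram, logic


# 6 x 6 mm2 chip with 50% SRAM area; the rates 100 and 1 are placeholders only.
for ecc in (1.0, 100.0):
    sram, logic = chip_ser(36.0, 0.5, 100.0, 1.0, ecc_reduction=ecc)
    print(ecc, sram, logic, round(sram / (sram + logic), 2))
```

With these placeholder rates the SRAM dominates when ECC is off, and the logic part becomes comparable once the SRAM rate is reduced by two orders of magnitude. This only illustrates why hardened FFs become the next target after ECC is applied.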

Fig. 6.34
figure 34

Comparison of chip-level SER with ECC © 2019 IEEE [86]

6.16 Summary of SOTB Chip Implementation

The various examples of the SOTB chip implementation described in this chapter are summarized in this section.

  • Low-voltage SRAM of minimum Vdd down to 0.37 V in Sect. 6.6:

    • Reverse Vbb (back bias) enables data retention with very small leakage current.

  • MCU (microcontroller unit) in Sects. 6.9 and 6.10:

    • MEP (minimum energy point) operation at Vdd around 0.4 V with both small active and leakage currents.

    • MCU chip with embedded flash memory.

    • Commercial MCU chip with various IPs and embedded EH (energy harvesting) controller.

  • Reconfigurable circuits in Sect. 6.11:

    • FPGA (field-programmable gate array) with drastically reduced leakage current due to independent Vbb control on each processing element.

    • Reconfigurable accelerator circuit, CMA (cool mega array), with optimized Vbb and optimized domain size.

  • Data processing circuits in Sect. 6.12:

    • CAM (content-addressable memory) for pattern-matching systems and database operation.

    • FFT (fast Fourier transform) macro using the coordinate rotation digital computer (CORDIC) algorithm.

  • Security circuits in Sect. 6.13:

    • AES encryption circuits with enhanced performance and energy efficiency.

    • PUF (physically unclonable function) circuits, using small variability transistors.

  • Analog and rf circuits in Sect. 6.14:

    • Δ-Σ modulator with high conversion figure of merit.

    • VCO (voltage-controlled oscillator) with Vbb control.

    • OOK (on-off keying) receiver and transmitter for the IoT node.

    • RF energy harvester by using the SOTB as a dynamic threshold MOSFET.

    • Small noise variability of the SOTB transistors.

  • Soft-error immune SRAM and logic circuits in Sect. 6.15:

    • SRAMs with significantly reduced single-event as well as multiple-cell upsets.

    • Reduced soft-error rate for FF (flip-flop) circuits and circuit topologies to obtain further robustness.

6.17 Future Perspective

A drastic decrease in the connectivity cost of IoT devices and the popularization of prototyping tools, such as 3D printers with various easy-to-use 3D-CAD tools and tiny development boards with microcontrollers like Arduino and Raspberry Pi, have accelerated the democratization of manufacturing, and they have opened the door to the maker movement [87, 88] with open-source hardware. This will significantly accelerate the production of a wide variety of applications bridging the cyber and physical worlds through sensing, processing, networking, and actuating. Note that open-source hardware is not restricted to education and hobby use. For example, the industry-grade Raspberry Pi is already a strong candidate for use in various industrial control devices because of both low hardware and development costs.

In this context, ultra-low-power electronic devices, including microcontrollers and various accelerating engines, will be more important in the future. As the number of IoT connected devices increases, the required specifications of ICs will be upgraded to satisfy the need for higher edge-processing performance. Considering the limited power of most IoT devices, improving the energy efficiency is still important, as described in the first section. Let us quote the insightful words by Mark Horowitz, “Unfortunately, many of the magic bullets for decreasing energy without affecting performance have already been found and exploited. While there are no quick fixes, power growth must be addressed by application specific system level optimization, increasing use of specialized functional units and parallelism, and more adaptive control.” [89]. The authors consider that a highly optimized combination of dedicated functional logic engines, reconfigurable processors, and central processing units (microcontrollers), all with adaptive control, will be a golden solution. The adaptive control function and low-voltage operation capability of the SOTB technology should help each processing part work with the best energy efficiency. Some indications are believed to have been shown in Sects. 6.11–6.13.

For the logic engines and microcontrollers, an important factor is that the hardware should be released with an easy-to-use development environment in order to become a de facto standard. In the Arduino family, for example, the integrated development environment (IDE) is extremely easy to use, and this leads to a positive feedback of increasing users, growing user communities, and a further improved environment. It is known that developing a system that works on an FPGA is not easy, because it usually requires a hardware description language such as VHDL or Verilog HDL. The Arduino family, however, has already released an FPGA development environment integrated with the Arduino IDE [90]. This will encourage users to take advantage of the higher-performance hardware. Highly sophisticated design environments therefore also help highly energy-efficient logic engines to become popular.

Another important trend to be considered is novel computing architectures such as neuromorphic computing and quantum computing. Going back to “The Free Lunch Is Over” [6], it was shown that the increase of the clock frequency has already slowed down (currently, the maximum frequency is as high as 5 GHz). Furthermore, considering the MEP in Sects. 6.3 and 6.4, rather lower frequencies are preferred to improve the energy efficiency in the current logic-circuit framework with CMOS transistors. It is known, however, that the processing speed of the human brain is about 60 Hz [91], and its structure with massive and reconfigurable wiring might be another significant difference. A combination of the neuromorphic computing architecture and the 3D integration technology with moderate clock frequencies can thus be a new paradigm of high-efficiency computing. Quantum computing can be another new paradigm. Its inherently parallel computing architecture enables high-performance computing with slower clock frequencies.Footnote 4 In quantum computing with superconducting qubits, the cryogenic interface with the conventional electronic devices is important [93]. The FDSOI transistors can work at cryogenic temperatures (with a proper design) [94], and operation with minimum heat dissipation by the MEP operation will be an important design issue. These novel computing schemes offer new scenarios in which the SOTB technology can contribute to energy-efficient computing.