1 Introduction

As technology scaling shrinks device feature sizes, circuit lifetime reliability has become a major challenge in integrated circuit design, mainly due to transistor aging induced by the Bias Temperature Instability (BTI) mechanism [1]. BTI causes a gradual increase in the device threshold voltage (Vth) over the lifetime, which increases delay and can ultimately cause a circuit to violate its timing specifications. The impact of BTI on circuit delay degradation (and lifetime reliability) has been shown to be highly dependent on the operating temperature and the workload executed by the circuit [2, 3]. Moreover, circuit reliability is also affected by process-induced device variations (PV) [4, 5], which have a significant impact on circuit performance and make it more difficult to satisfy stringent reliability constraints during circuit design.

The conventional approach to assure circuit lifetime under BTI and PV effects is to add a worst-case guardband to the clock period. In such a way, correct signal propagation through the logic paths is assured. However, as devices continue to shrink, the required guardbands are becoming unacceptably large, leading to conservative designs with reduced performance [6, 7].

Various aging-aware design techniques already exist in the literature. In [8, 9], gate input-node reordering was proposed to mitigate the BTI-induced delay degradation of the paths. The idea was to manipulate the percentage of time the devices experience BTI stress, also known as the stress probability. However, the resulting degradation reduction may be insufficient to mitigate guardbands under both BTI and PV effects. Gate size optimization is a widely used approach to address aging and process variation issues. This method resizes the gates to achieve optimal trade-offs between delay, area, and lifetime reliability. In [10], the design optimization of a full-adder circuit based on extensive SPICE simulations was presented. However, SPICE-based optimization is computationally infeasible for large-scale integrated circuits. The works in [11, 12] built gate libraries robust to aging by sizing the transistors in a gate according to the stress probability that the devices experience. These approaches require more detailed guidance to determine where to place the robust gates within a circuit, as the gates with the largest delay degradation may not be the most influential on overall circuit timing. In [13], it is proposed to increase the size of all the gates lying in the critical paths of the circuit and having a delay degradation larger than a given threshold (e.g., 5%). That approach takes into account the maximal load capacitance that a gate of a given size can drive. However, not all the gates in the critical paths have the same impact on circuit delay degradation, so they should be treated differently. In [14, 15], an optimization problem minimizing circuit area for a given delay constraint is formulated and solved using Lagrangian relaxation. These methods may become complex for large circuits, especially if process parameter variations are considered.

The concept of gate criticality metrics under aging effects was introduced in [16]. Gate criticality metrics provide a fast estimation of how efficiently the delay degradation of the circuit improves at a given area or power cost when sizing a gate. Design actions can then be taken based on the metric scores. Different gate criticality metrics have been proposed in [5, 16, 17], and [18]. In [16, 17], the selected gates are replaced by their aging-robust counterparts from an aging-aware gate library (such as that in [12]). In [5, 18], the size of the gates with the highest metric score is iteratively increased until the desired timing constraint is met. However, these works do not consider decreasing the size of gates with little impact on delay to mitigate area overhead. Also, the metrics used do not consider the impact of sizing a gate on both the degradation and the standard deviation of the path delays (under PV), which may limit the efficiency of the optimization process.

Aging-aware circuit design optimization becomes a complex problem in scaled technologies because BTI-induced delay degradation strongly depends on the executed workload, which defines the stress probability of each transistor in the circuit. Unfortunately, the exact workload executed by a circuit over its lifetime is unpredictable and can hardly be known in advance at the design phase. Therefore, a major limitation of the aforementioned aging-aware optimization approaches is that they assume either a worst-case stress probability or a specific signal probability profile at the main circuit inputs for aging estimation. While the first approach leads to conservative designs with excessive area overhead, the second approach may not be reliable if the actual signal probabilities of the circuit differ from those used during circuit design. Recently, a sizing approach considering the distribution of path delay degradation for various workload profiles was proposed in [19]. The circuit is optimized based on the mean value of the delay degradation of the paths over a set of workloads, but this does not guarantee reliable operation. Furthermore, the effect of process variations was not considered.

This paper presents a methodology for guardband reduction by efficient selection and sizing of critical gates considering BTI aging and PV effects. This is an extension of our previous work in [20]. The proposed approach uses metrics to identify those gates providing efficient guardband reduction with as small as possible area overhead. The main contributions of this paper are:

  1. A multiple workload-aware sizing algorithm is proposed. The path delays are estimated for various workload scenarios at the main inputs. In this way, a more accurate estimation of the maximal path delay degradation is made. Then, the paths are optimized for the workload scenario that causes the largest delay degradation.

  2. New statistical gate sizing metrics are proposed. The metrics include the impact of gate sizing on the BTI delay degradation and the standard deviation of the delay. A fast approximation for the sensitivity of the statistical delay of a path with respect to the size of a gate is proposed. The optimization process considers sizing up gates to improve delay and sizing down gates to mitigate area overhead.

The rest of this paper is organized as follows: Section 2 explains path-based delay estimation under BTI-aging and Process Variations. Section 3 presents the proposed gate size optimization methodology. Section 4 presents the proposed metrics and the sizing heuristic for guardband reduction with low area cost. Section 5 presents the simulation results on ISCAS Benchmark circuits. Section 6 presents the conclusions of this work.

2 Delay Estimation Under BTI and Process Variations

2.1 Statistical Model for BTI Aging

Bias Temperature Instability (BTI) is the dominant aging mechanism in modern technologies. Negative-BTI (NBTI) affects PMOS transistors under a negative gate-to-source bias. Similarly, Positive-BTI (PBTI) affects NMOS transistors under a positive gate-to-source bias. NBTI was considered the major reliability issue before the 45 nm technology node. However, PBTI has become important since the introduction of the high-k metal gate dielectric in sub-45 nm technologies [21]. The BTI mechanism has two phases [22, 23]:

  1. Stress Phase: BTI is associated with the degradation of the Si-SiO2 interface of the device due to the breaking of weak Si-H bonds caused by the high vertical electric field and elevated temperatures. The released H atoms combine to form H2 species, which diffuse into the oxide, leaving an interface trap [22]. BTI is also associated with the trapping and de-trapping of charge carriers from the channel tunneling into pre-existing traps (defects) in the gate oxide [23]. These mechanisms manifest as a gradual increase in the device Vth during the stress phase.

  2. Recovery Phase: When stress is removed (|Vgs| = 0), some of the traps in the Si-SiO2 interface are passivated. Therefore, the Vth degradation accumulated during the stress phase is partially recovered.

The overall increase in Vth is a function of the percentage of time the device is under stress, also known as the stress probability, which strongly depends on the workload executed by the circuit. A power law is widely accepted to model this dependence [24,25,26]. A closed-form equation to calculate the BTI-induced Vth degradation is [26],

$$\begin{array}{@{}rcl@{}} \Delta V_{th,BTI}&\approx& K\cdot t_{ox}\cdot \sqrt{C_{ox}\cdot(V_{GS}-V_{TH0})} \cdot e^{\left( \frac{E_{ox}}{E_{0}}\right)}\\ &&\cdot e^{\left( \frac{-E_{a}}{kT}\right)}\cdot \alpha^{n}\cdot t^{n} \end{array} $$
(1)

where n is the time exponent, tox is the gate oxide thickness, Eox is the vertical electric field, T is the temperature, k is the Boltzmann constant, Cox is the oxide capacitance per unit of area, VTH0 is the initial (fresh) threshold voltage value, Ea and E0 are constants, α is the stress probability and K is a technology-dependent fitted constant, which can be different for NBTI and PBTI.

As can be observed in Eq. 1, the Vth deterioration depends on the initial Vth (VTH0). However, VTH0 becomes a random variable due to process variations. The impact of process variations on the long-term degradation of Vth can be accounted for by a first-order Taylor approximation of Eq. 1 [27],

$$ {\Delta} V_{th,BTI} =(1+S_{v} \cdot {\Delta} V_{th,PV})\cdot A\cdot \alpha^{n}\cdot t^{n} $$
(2)

where ΔVth,PV is the shift in VTH0 due to process variations, and A and Sv are fitted constants. Then, the total Vth variation of a transistor m corresponds to the summation of the contributions due to BTI (ΔVth,BTI) and Process variations (ΔVth,PV), as given by Eq. 3 [27],

$$ {\Delta} V_{th,m}= A_{m}\cdot {\alpha_{m}^{n}}\cdot t^{n} + (1+S_{V,m}\cdot A_{m}\cdot {\alpha_{m}^{n}}\cdot t^{n}) \cdot {\Delta} V_{th,PV,m} $$
(3)

Note that at the beginning of the lifetime (t = 0) the total variation in Vth is due only to process variations. However, as the circuit ages, BTI causes a shift in both the mean value and the variance of Vth [28].
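As an illustration of Eqs. 2 and 3, the following Python sketch evaluates the BTI-induced and total Vth shifts and the resulting mean and standard deviation when ΔVth,PV is taken as a zero-mean normal variable. The function names and all numeric values (A, n, Sv, σ of ΔVth,PV) are hypothetical placeholders chosen for illustration only; they are not the fitted constants of this work.

```python
def delta_vth_bti(a, alpha, t, n, s_v=0.0, dvth_pv=0.0):
    """Eq. 2: BTI-induced Vth shift for a given process-induced shift dvth_pv."""
    return (1.0 + s_v * dvth_pv) * a * (alpha ** n) * (t ** n)

def delta_vth_total(a, alpha, t, n, s_v, dvth_pv):
    """Eq. 3: sum of the BTI and process-variation contributions."""
    nominal_bti = a * (alpha ** n) * (t ** n)
    return nominal_bti + (1.0 + s_v * nominal_bti) * dvth_pv

def vth_shift_stats(a, alpha, t, n, s_v, sigma_pv):
    """Mean and sigma of Eq. 3, assuming dvth_pv ~ N(0, sigma_pv)."""
    nominal_bti = a * (alpha ** n) * (t ** n)
    return nominal_bti, abs(1.0 + s_v * nominal_bti) * sigma_pv

# Hypothetical parameter values, for illustration only.
A, N_EXP, S_V, SIGMA_PV = 3.0e-3, 0.2, -2.0, 0.02
T_10Y = 10 * 365 * 24 * 3600.0  # ten years in seconds

fresh = vth_shift_stats(A, 0.5, 0.0, N_EXP, S_V, SIGMA_PV)
aged = vth_shift_stats(A, 0.5, T_10Y, N_EXP, S_V, SIGMA_PV)
```

At t = 0 the mean shift is zero and the spread equals that of the process variations alone; for t > 0 both the mean and the spread change, which is the behavior described above.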

2.2 Aging-Aware Statistical Gate Delay Model

For Statistical Static Timing Analysis, the gate delay is modeled as a linear function of normally distributed random variables representing process parameters.

$$ D=D_{n}+{S_{W}^{D}} {\Delta} W+{S_{L}^{D}} {\Delta} L+S_{tox}^{D} {\Delta} t_{ox}+\sum\limits_{m=1}^{M} S_{V_{th,m}}^{D} {\Delta} V_{th,m} $$
(4)

where Dn is the nominal gate delay, \({S_{W}^{D}}\), \({S_{L}^{D}}\), \(S_{tox}^{D}\) and \(S_{V_{th}}^{D}\) are the gate delay sensitivities with respect to deviations in W, L, tox, and Vth, respectively. M is the number of transistors in the gate. Note that ΔVth is composed of two deviation components, one related to the time-zero variability and the other related to aging effects (See Eq. 3). This linear model is adequate for small enough variations as computational complexity remains low and the error due to discarded higher order terms can be neglected [29].
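To make the use of the linear model in Eq. 4 concrete, the sketch below computes the mean and standard deviation of a gate delay from its sensitivities. It assumes, for simplicity, that the parameter deviations are mutually independent normal variables; the function name, the parameter keys and the numbers are hypothetical.

```python
import math

def gate_delay_stats(d_nom, sensitivities, means, sigmas):
    """Mean and sigma of the linear delay model of Eq. 4.

    sensitivities, means and sigmas are dicts keyed by parameter name.
    Assumption: the parameter deviations are independent.
    """
    mean = d_nom + sum(s * means[p] for p, s in sensitivities.items())
    var = sum((s * sigmas[p]) ** 2 for p, s in sensitivities.items())
    return mean, math.sqrt(var)

# Hypothetical values (delay in ps, Vth in V, W and L in normalized units).
sens = {"W": -0.5, "L": 1.0, "Vth1": 100.0}
dev_mean = {"W": 0.0, "L": 0.0, "Vth1": 0.03}   # aging shifts the Vth mean
dev_sigma = {"W": 1.0, "L": 1.0, "Vth1": 0.02}
mu_d, sigma_d = gate_delay_stats(20.0, sens, dev_mean, dev_sigma)
```

With these numbers the aged mean delay is about 23 ps, where the 3 ps increase comes entirely from the nonzero mean of the Vth deviation.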

In order to use the Aging-Aware Statistical Gate Delay Model in a Statistical Static Timing Analysis tool, the parameters in Eq. 4 are pre-computed by accurate SPICE electrical simulations. For each gate type (e.g., INV, NANDs, NORs), HSPICE simulations are run at various design conditions given by combinations of the input transition time (SRIN), the gate size (K), the load capacitance (CL), and the operating temperature (Te). For each combination, the nominal gate delay and the gate delay sensitivities to process parameters are measured. Then, the extracted data is fitted using polynomials, which allow a fast and accurate estimation of the statistical gate delays using Eq. 4.

2.3 Statistical Delay of a Path

The statistical delay of a path is computed as the statistical sum of the random variables representing the delay of each gate in the path. Given the mean and standard deviation for a given aging time for all the gates in the path, the PDF of the path (Dp = N(μD,p,σD,p)) is obtained by:

$$ \mu_{D,p}= \mu_{Dn,p}+ \mu_{{\Delta} D,p} =\sum\limits_{i = 1}^{N} \mu_{Di} $$
(5a)
$$ \sigma_{D,p}=\sqrt{\sum_{i = 1}^{N}\sum_{j = 1}^{N} \rho_{ij} \cdot \sigma_{Di} \cdot \sigma_{Dj}} $$
(5b)

where μDn,p is the mean of the nominal delay of the path, μΔD,p is the mean of the delay degradation of the path, μDi is the mean of the aged delay of the gate i for the given aging time, σDi and σDj are the standard deviation of the aged delay of gates i and j, respectively. The parameter ρij is the correlation between gate delays, which depends on the spatial proximity of the gates in the circuit layout. The analytical model proposed in [30] is used to estimate the degree of spatial correlation between two gates. Note that the mean delay value of a path has a nominal component (μDn,p) and a component due to aging effects (μΔD,p). Also note that the standard deviation of the delay of a path depends on aging effects, as the threshold voltage variability changes due to aging (See Eq. 3).
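A minimal Python sketch of Eqs. 5a and 5b is given below. The correlation matrix is passed in directly; how it is derived from the layout (the model of [30]) is outside the scope of the sketch, and the numeric values are illustrative only.

```python
import math

def path_delay_stats(gate_means, gate_sigmas, rho):
    """Return (mu, sigma) of a path delay following Eqs. 5a and 5b.

    rho[i][j] is the delay correlation between gates i and j, with rho[i][i] == 1.
    """
    n = len(gate_means)
    mu = sum(gate_means)
    var = sum(rho[i][j] * gate_sigmas[i] * gate_sigmas[j]
              for i in range(n) for j in range(n))
    return mu, math.sqrt(var)

means = [10.0, 12.0, 8.0]
sigmas = [1.0, 2.0, 2.0]
uncorrelated = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
fully_correlated = [[1.0] * 3 for _ in range(3)]

stats_rho0 = path_delay_stats(means, sigmas, uncorrelated)
stats_rho1 = path_delay_stats(means, sigmas, fully_correlated)
```

The two limiting cases show the effect of spatial correlation: for uncorrelated gates the standard deviations add in quadrature, whereas for fully correlated gates they add linearly.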

3 Proposed Methodology for Guardband Reduction by Selection and Sizing of Critical Gates

The proposed optimization methodology consists of the three steps shown in Fig. 1. In the first step, those paths that may become critical under worst BTI conditions (worst stress probability and worst temperature) are identified. Those paths are called the Potential Critical Paths (PCPs) of the circuit. Similarly, the gates belonging to these paths are called the Critical Gates of the circuit. In this paper, only the PCPs are considered during design optimization. The non-PCPs are not considered for optimization as they would not trigger any aging-related issue.

Fig. 1. Flow of the proposed gate sizing optimization methodology

In the second step, a multiple workload-aware aging analysis of the PCP set is performed to estimate the specific workload that causes a realistic maximum aged delay on each PCP. In the third step, the PCPs are optimized using the proposed gate sizing metrics so that their realistic maximum aged delay satisfies a given target guardband (GBt) with low area cost.

3.1 PCP Identification Under Worst BTI Condition

Aging-Aware Statistical Static Timing Analysis (SSTA) is run assuming worst BTI conditions, i.e., the devices in the circuit are assumed to operate under near-static stress (α ≈ 1) and high temperature (T = 120 °C). Those paths with a μ + 3σ of the aged delay distribution greater than the nominal (without aging and PV) delay of the circuit are identified as Potential Critical Paths (PCPs).

The identification of PCPs under worst BTI conditions allows the optimization to focus on a reduced path set rather than on the entire circuit, reducing computational effort.
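The selection rule of this step can be sketched as a simple filter. The function name and the path statistics below are hypothetical; in practice the (μ, σ) pairs would come from the aging-aware SSTA run under worst BTI conditions.

```python
def identify_pcps(path_stats, d_nom, k_sigma=3.0):
    """Keep the paths whose mu + k_sigma * sigma aged delay exceeds d_nom."""
    return [name for name, (mu, sigma) in path_stats.items()
            if mu + k_sigma * sigma > d_nom]

# Hypothetical worst-case aged delay statistics (mu, sigma) per path, in ps.
worst_case_stats = {"p1": (95.0, 2.0), "p2": (80.0, 3.0), "p3": (99.0, 0.5)}
pcps = identify_pcps(worst_case_stats, d_nom=100.0)
```

Here only p1 and p3 are retained, since the μ + 3σ delay of p2 (89 ps) stays below the nominal circuit delay.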

3.2 Multiple Workload-Aware Aging Analysis

A workload corresponds to the set of consecutive bits applied to each main input of the circuit when executing a given program [31], and it is represented by the Signal Probability (SP) at the main circuit inputs (the probability of a node being at logic 1). The workload affects both the stress probability (α) of each device and its operating temperature [2, 3], which in turn influence BTI degradation, making circuit reliability analysis and optimization complex.

To address the unpredictability of the circuit workload at the design phase, we refine the workload conditions at which the delay of each PCP is evaluated during design optimization by performing a Multiple Workload-Aware Aging Analysis. The idea behind this step is to determine the workload at which a realistic maximum delay degradation of each PCP occurs. Figure 2 shows a histogram of the mean of the aged delay of a PCP in ISCAS circuit c2670 for 1000 different workload profiles. As can be seen, the maximum aged delay that the path can take over all tested workload profiles is much lower than the aged delay estimated using worst BTI conditions (α ≈ 1 and T = 120 °C). This is because the devices under the tested workload profiles experience more realistic BTI degradation conditions. Figure 2 also shows that the variation of path delay degradation due to the workload can be approximated by a Gaussian-like distribution, as was also found in [3, 19].

Fig. 2. Histogram of the delay of a PCP for various workload profiles (ISCAS c2670)

Since it is infeasible to evaluate the delay degradation for each path and for every possible combination of signal probabilities at the main inputs (representing a workload profile) for a state-of-the-art digital circuit, the following strategies are proposed to estimate an upper bound for the delay degradation of the paths at an acceptable computational cost:

  • The multiple workload-aware aging analysis is only performed over the PCP set.

  • For each PCP, only its mean delay degradation due to process variations is computed for the tested workloads.

  • If the delay degradation obtained for a PCP does not increase after testing a given number N of consecutive workload profiles, it is assumed that a good enough approximation of the maximum PCP aged delay has been obtained, and the PCP degradation is no longer computed for the subsequent workload profiles.

  • Once the workload that causes maximum delay degradation for each PCP is identified, SSTA is run to compute the deviation of the delay of the PCPs due to process variations.

Algorithm 1 describes the proposed multiple workload-aware aging analysis procedure. For a user-defined number of workload profiles (MaxWL), a set of signal probabilities at the main circuit inputs is generated and propagated to the internal nodes (function generate_propagate_SP()). A uniform random number generator between 0 and 1 is used to obtain the signal probability assigned to each input. Then, the stress probability (α) of each transistor in the circuit is computed (function compute_stress_probability()). Figure 3 illustrates the basic equations for signal probability propagation and stress probability computation for some basic gates. The formulas to propagate the signal probabilities for more complex gates can be easily derived from their truth tables. The operating temperature of each cell is also computed, as it strongly influences the BTI mechanism (function compute_temperature()). The temperature profile of the circuit is obtained from the power consumption profile using the electrical model given in [32],

$$ T_{i}=R_{J,i} \cdot P_{i} + R_{I-A}\cdot P_{total}+T_{A} $$
(6)

where Ti is the operating temperature of gate i, Pi is the power consumption (static and dynamic) of gate i, RJ,i is the junction-to-internal-air heat resistance, Ptotal is the total circuit power consumption, RI−A is the heat resistance from internal air to ambient, and TA is the ambient temperature [32]. Once the stress probability and operating temperature are obtained, the BTI-induced Vth shift of each device is computed (function computeVth,BTI()). Then, the mean value of the aged delay (μD,p) is computed for each PCP p whose flag variable PCP[p].MAX, which indicates that a good enough maximum aged delay of the path has been found, is not activated. If the obtained μD,p is the largest obtained for the workloads tested so far, the stress probability and temperature conditions of the devices in the path are stored. If the obtained μD,p has not exceeded the stored maximum for a user-defined number (N) of consecutive workload profiles, the flag variable (PCP[p].MAX) is activated, indicating that the currently stored conditions for the path p give a good enough estimation of the maximum aged delay of the path. This path is then not evaluated for the subsequent workload profiles. It is important to note that the workload that causes maximum path delay degradation can be different for each path.
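The search with early termination described above can be sketched in Python as follows. This is a simplified illustration, not the exact listing of Algorithm 1: the callback aged_mean_delay, the dictionary layout, and the toy delay function are hypothetical, and the sketch stores the input signal probabilities of the worst workload rather than the per-device stress probability and temperature.

```python
import random

def max_aged_delay_search(pcps, aged_mean_delay, max_wl, n_stop, n_inputs, seed=0):
    """Track, per PCP, the workload giving the largest mean aged delay.

    aged_mean_delay(path, sp_inputs) is a caller-supplied (hypothetical) function
    standing in for SP propagation, stress probability, temperature, Vth shift
    and mean path delay computation.
    """
    rng = random.Random(seed)
    best = {p: {"delay": float("-inf"), "workload": None, "stale": 0, "done": False}
            for p in pcps}
    for _ in range(max_wl):
        if all(state["done"] for state in best.values()):
            break
        # One workload profile: a uniform random signal probability per main input.
        sp_inputs = [rng.random() for _ in range(n_inputs)]
        for p, state in best.items():
            if state["done"]:  # corresponds to PCP[p].MAX being activated
                continue
            delay = aged_mean_delay(p, sp_inputs)
            if delay > state["delay"]:
                state.update(delay=delay, workload=sp_inputs, stale=0)
            else:
                state["stale"] += 1
                if state["stale"] >= n_stop:
                    state["done"] = True
    return best

def toy_delay(path, sp_inputs):
    """Toy stand-in: delay grows as the inputs spend more time at logic 0."""
    offset = 1.0 if path == "p1" else 0.0
    return 100.0 + offset + 10.0 * (1.0 - sum(sp_inputs) / len(sp_inputs))

result = max_aged_delay_search(["p0", "p1"], toy_delay,
                               max_wl=200, n_stop=20, n_inputs=4)
```

Each path keeps its own counter of consecutive non-improving workloads, so different paths can stop at different points and end up with different worst-case workloads.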

Algorithm 1. Multiple workload-aware aging analysis

Fig. 3. Signal probability propagation and stress probability computation rules
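For reference, the sketch below shows commonly used signal probability propagation rules for an inverter, a 2-input NAND and a 2-input NOR, assuming statistically independent inputs. The exact rules of Fig. 3 are not reproduced here; in particular, the stress probability rule (PMOS stressed while its input is at logic 0, NMOS stressed while its input is at logic 1) is a simplifying assumption that ignores the effect of transistor stacking.

```python
def sp_inv(a):
    """Output signal probability of an inverter."""
    return 1.0 - a

def sp_nand2(a, b):
    """Output signal probability of a 2-input NAND with independent inputs."""
    return 1.0 - a * b

def sp_nor2(a, b):
    """Output signal probability of a 2-input NOR with independent inputs."""
    return (1.0 - a) * (1.0 - b)

def stress_probabilities(sp_in):
    """Simplified (alpha_pmos, alpha_nmos) for the transistor pair driven by one input."""
    return 1.0 - sp_in, sp_in

# Propagate SP = 0.5 at two main inputs through a NAND2 followed by an inverter.
sp_n1 = sp_nand2(0.5, 0.5)
sp_out = sp_inv(sp_n1)
alpha_p, alpha_n = stress_probabilities(sp_n1)
```

In this example the inverter input is at logic 1 for 75% of the time, so under the stated assumption its NMOS device is stressed far more often than its PMOS device.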

Once the workload condition that causes maximum delay degradation for each PCP is identified, SSTA is run to compute the standard deviation of the delay of the PCPs. Then, the set of PCPs is reduced by discarding those paths whose maximum aged delay at the μ + 3σ corner does not exceed the nominal circuit delay. This process mitigates the computational effort required for design optimization. Moreover, the corresponding workload condition that causes a maximum delay degradation for each PCP is stored so that the path delay can be re-evaluated under such conditions if needed.

Figure 4 shows the behavior of the cumulative maximum delay degradation obtained for some paths of the circuit C1908 as a function of the number of tested workloads. As can be seen, the maximum delay degradation obtained for all the paths tends to saturate after some workload profiles are tested. This behavior suggests that only a moderate number of workload profiles needs to be analyzed to get a good estimation of the maximum aged delay that a path can take.

Fig. 4. Maximum delay degradation of some paths as a function of the number of tested workload profiles

4 Selection and Sizing of Critical Gates

This section presents the proposed methodology for selection and sizing of the critical gates to optimize the circuit to satisfy a reduced target guardband (GBt).

4.1 Guardband Computation

The first step for the selection and sizing of critical gates (see Fig. 1) is to compute the actual guardband of the circuit. Here, only the maximum aged delay of each PCP that was obtained from the multiple workload-aware aging analysis step is considered. The guardband that each PCP imposes (GBp) over the nominal circuit delay is defined as,

$$ GB_{p}=(\mu_{D,p}+ 3\sigma_{D,p})-D_{nom} $$
(7)

where μD,p and σD,p are the mean value and the standard deviation of the maximum aged delay of the PCP p, and Dnom is the nominal circuit delay (no BTI and no PV).
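Equation 7 translates directly into code. In the sketch below, the circuit-level guardband is taken as the largest guardband imposed by any PCP, which is an assumption made here for illustration; the path statistics are hypothetical.

```python
def path_guardband(mu, sigma, d_nom):
    """Eq. 7: guardband imposed by one PCP over the nominal circuit delay."""
    return (mu + 3.0 * sigma) - d_nom

# Hypothetical maximum aged delay statistics (mu, sigma) per PCP, in ps.
D_NOM = 100.0
aged_stats = {"p1": (104.0, 2.0), "p2": (101.0, 1.0), "p3": (103.5, 1.0)}
guardbands = {p: path_guardband(mu, sigma, D_NOM)
              for p, (mu, sigma) in aged_stats.items()}
# Assumption: the circuit guardband is set by the worst PCP.
circuit_guardband = max(guardbands.values())
```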

The methodology proposed in this work assures reliable circuit operation for a user-defined Target Guardband (GBt), which is smaller than the Initial Guardband, under the combined effect of aging and process variations.

4.2 Identification of Fast and Slow PCPs

The PCPs are then separated into two different subsets depending on the guardband imposed by each path, as illustrated in Fig. 5: a) the Slow-PCPs subset, which has negative slack (GBt − GBp < 0); and b) the Fast-PCPs subset, which has positive slack (GBt − GBp > 0). This classification is done to exploit the fact that different design actions can be taken over each PCP subset. Some gates in the Slow-PCPs are sized up to improve their delay, while some gates in the Fast-PCPs are sized down to take advantage of their slack to mitigate area overhead.

Fig. 5. Fast and Slow PCP sets
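The classification into Slow-PCPs and Fast-PCPs can be sketched as below, with the slack of a path defined as GBt − GBp. The function name and values are hypothetical; paths with exactly zero slack are placed in the Fast-PCPs subset here, a choice the text does not specify.

```python
def split_pcps(path_guardbands, gb_target):
    """Split PCPs into (slow, fast) dicts mapping each path to its slack."""
    slow, fast = {}, {}
    for path, gb in path_guardbands.items():
        slack = gb_target - gb
        # Assumption: zero slack counts as fast (the target is already met).
        (slow if slack < 0 else fast)[path] = slack
    return slow, fast

slow_pcps, fast_pcps = split_pcps({"p1": 10.0, "p2": 4.0, "p3": 6.5}, gb_target=6.0)
```

With a target guardband of 6 ps, p1 and p3 need to be sped up, while p2 has 2 ps of slack that can be traded for area.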

4.3 Evaluation of Sizing Metrics

Gate selection metrics are proposed to guide the optimization process. The metrics are intended to identify the best critical gates to be sized in each PCP subset to efficiently improve the circuit guardband.

4.3.1 Sensitivity of the Statistical Delay of a Path to a Gate Size

We define the sensitivity of the statistical delay of a path with respect to the size of a gate as the derivative of the μ + 3σ of the path delay distribution to a change in the size of the gate i in the path:

$$ \begin{array}{cll} & S^{Dp}_{Ki}& =\frac{\partial \mu_{Dp}}{\partial K_{i}} + 3\cdot \frac{\partial \sigma_{Dp} }{\partial K_{i}}\\ & & \\ & & =\left[\frac{\partial \mu_{Dn,p}}{\partial K_{i}} + \frac{\partial \mu_{{\Delta} D,p}} {\partial K_{i}} \right]+ 3\cdot\frac{\partial\sigma_{Dp}}{\partial K_{i}} \end{array} $$
(8)

where Ki is the size of the gate i in the path, μDp and σDp are the mean value and the standard deviation of the aged path delay obtained with Eqs. 5a and 5b, respectively. μDn,p and μΔD,p correspond to the mean value of the nominal (fresh) path delay and the mean value of the delay degradation of the path.

Equation 8 measures the impact of sizing a gate on the path delay. As can be seen, three components influence \(S^{Dp}_{Ki}\): 1) the component related to the nominal delay (no aging and no PV), 2) the component related to aging effects, and 3) the component related to process variations. Figure 6 shows these components for the example path shown in its inset. As can be observed, the component related to the nominal path delay is the largest. However, the components due to the impact of aging on the mean delay and the impact of process variations are also important. It is worth mentioning that the aging component depends on the degradation of the gate. A gate whose devices age more also exhibits a larger \(\frac {\partial \mu _{{\Delta } D,p}}{\partial K_{i}}\). It is also important to note that spatial correlation plays an important role in the magnitude of \(\frac {\partial \sigma _{Dp}}{\partial K_{i}}\). Figure 6 shows two cases: one in which all the gates in the path are placed far apart, so that their spatial correlation is almost zero (ρ = 0), and one in which all the gates are placed very close to each other, having full spatial correlation (ρ = 1). Therefore, gates that have a higher correlation with the other gates in the path may be preferable candidates for optimization.

Fig. 6. Example of the magnitude of the components of the sensitivity of the statistical delay of a path to sizing of a gate (Eq. 8)

The brute-force approach for computing Eq. 8 is to evaluate the statistical distribution of the aged path delay both for the current size of the gate and for the size changed by a small perturbation (i.e., the derivatives are computed numerically). In this way, for a path with N gates, the statistical delay of the path would have to be computed N + 1 times to obtain the sensitivity of the statistical delay of the path with respect to the size of each gate, which is computationally costly. Therefore, we propose some simplifications to evaluate Eq. 8 more efficiently, as explained next.

Figure 7b shows the derivative of the mean value and the standard deviation of the delay of each gate in the path shown in Fig. 7a with respect to a change in the size of the gate i in the path. As can be seen, only the timing responses of the gates i − 1, i, and i + 1 are significantly affected. We call this set of gates the path segment of gate i. As shown, both the mean and the standard deviation of the delay of gate i − 1 increase due to the larger input capacitance of the sized gate. On the other hand, the mean and the standard deviation of the delay of gate i + 1 decrease because its input signal switches faster as gate i becomes stronger. As expected, the mean value and the standard deviation of the delay of the sized gate itself are reduced the most when its size is increased. It should be noted that the change in the standard deviation of the delay of a gate is much smaller than the change in the mean value, as was observed before in Fig. 6. Based on these observations, the following approximations are made:

Fig. 7. A path example to illustrate the impact of sizing a gate on its neighboring gates in the path

Sensitivity of the Mean of the Path Delay to Gate Sizing

It is assumed that a change in the mean delay of a path is mainly due to a change in the mean delay of the gates in the path segment of the gate i. Therefore, we approximate \(\frac {\partial \mu _{Dp}}{\partial K_{i}}\) as,

$$ \begin{array}{ll} \frac{\partial \mu_{Dp}}{\partial K_{i}} & \approx \frac{\partial \mu_{D,i-1}}{\partial K_{i}}+ \frac{\partial \mu_{D,i}}{\partial K_{i}}+ \frac{\partial \mu_{D,i + 1}}{\partial K_{i}} \\ \\ & \approx \frac{\partial \mu_{D,i-1}}{\partial CL_{i-1}} \cdot \frac{\partial C_{in,i}}{\partial K_{i}} + \frac{\partial \mu_{D,i}}{\partial K_{i}}+ \frac{\partial \mu_{D,i + 1}}{\partial SRI_{i + 1}} \cdot \frac{\partial SRO_{i}}{\partial K_{i}} \end{array} $$
(9)

where μD,i−1, μD,i and μD,i+1 are the aged delays of the gates i − 1, i and i + 1 in the path segment of the gate being analyzed, CLi−1 is the load capacitance of the gate i − 1, Cin,i is the input capacitance of gate i, SRIi+1 is the signal transition time at the input of gate i + 1 and SROi is the signal transition time at the output of gate i, which is equal to SRIi+1.

Note that by using this approximation only the mean delay of the path segment of the gate i needs to be recomputed.

Sensitivity of the Standard Deviation of the Path Delay to Gate Sizing

It is assumed that the change in the standard deviation of the delay of a path due to sizing a gate i is mainly due to the change of the standard deviation of the delay of the gate i and its impact on the covariance with the other gates in the path. We can write:

$$ \begin{array}{ll} \frac{\partial \sigma_{D,p}}{\partial K_{i}} & =\frac{1}{2\sqrt{\sigma_{D,p}^{2}}} \cdot \frac{\partial \left[\sum_{i = 1}^{N}\sum_{j = 1}^{N} \rho_{ij} \cdot \sigma_{Di} \cdot \sigma_{Dj}\right]}{\partial K_{i}} \\ & \\ & \approx \frac{1}{2\sigma_{D,p}} \cdot \left( \frac{\partial\sigma_{Di}^{2}}{\partial K_{i}}+ 2\sum_{j\neq i}^{N} \frac{\partial \sigma_{Di}}{\partial K_{i}}\cdot \rho_{ij}\cdot\sigma_{Dj} \right) \\ & \\ & \approx \frac{1}{\sigma_{D,p}} \cdot \left( \frac{\partial\sigma_{Di}}{\partial K_{i}}\sum_{j = 1}^{N} \rho_{ij}\cdot\sigma_{Dj} \right) \end{array} $$
(10)

As can be observed, the sensitivity of the standard deviation of the path delay depends on the spatial correlation that the sized gate i has with each of the other gates in the path. Note that Eq. 10 only depends on the derivative of the standard deviation of the delay of the gate i with respect to the size of the gate itself. Therefore, to evaluate Eq. 10, only the standard deviation of the delay of the gate of interest i needs to be recomputed.
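The last line of Eq. 10 can be checked numerically against a finite-difference derivative of Eq. 5b. The sketch below does so under the same assumption as Eq. 10, namely that only the standard deviation of gate i changes with its size; the numeric values, including the per-unit-size derivative of σDi, are hypothetical.

```python
import math

def path_sigma(sigmas, rho):
    """Eq. 5b: standard deviation of the path delay (rho must be symmetric)."""
    n = len(sigmas)
    return math.sqrt(sum(rho[a][b] * sigmas[a] * sigmas[b]
                         for a in range(n) for b in range(n)))

def dsigma_path_dk(i, dsigma_i_dk, sigmas, rho):
    """Last line of Eq. 10: sensitivity of the path sigma to the size of gate i."""
    weighted = sum(rho[i][j] * sigmas[j] for j in range(len(sigmas)))
    return dsigma_i_dk * weighted / path_sigma(sigmas, rho)

sigmas = [1.0, 2.0, 1.5]
rho = [[1.0, 0.5, 0.2],
       [0.5, 1.0, 0.5],
       [0.2, 0.5, 1.0]]
i, dsigma_i_dk = 1, -0.1   # hypothetical: sizing up gate 1 lowers its sigma

closed_form = dsigma_path_dk(i, dsigma_i_dk, sigmas, rho)

# Finite-difference reference: perturb the size of gate i by a small step h.
h = 1e-6
perturbed = list(sigmas)
perturbed[i] += dsigma_i_dk * h
finite_diff = (path_sigma(perturbed, rho) - path_sigma(sigmas, rho)) / h
```

The two values agree to within the finite-difference error, while the closed form requires recomputing only the standard deviation of gate i instead of the whole path.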

4.3.2 Proposed Gate Sizing Metrics

The statistical sensitivity \(S^{Dp}_{Ki}\) reveals which gate has a larger impact on the μ + 3σ delay of the path. This parameter is combined with other important information of the gates to form the proposed gate sizing metrics.

Two gate sizing metrics are proposed to guide the optimization process: one that measures the benefit of sizing up a gate in the Slow-PCPs, and another that measures the benefit of sizing down a gate in the Fast-PCPs. For each gate i, the following two metrics (see Eq. 11) are evaluated:

$$ \begin{array}{cccc} M_{SU,i}=\frac{S^{D}_{Ki,AVG} \cdot |Slack^{-}_{i,AVG}| \cdot N_{i}}{{\Delta} A_{i}} & & M_{SD,i}= \frac{Slack^{+}_{i,AVG}\cdot {\Delta} A_{i} }{S^{D}_{Ki,AVG} \cdot N_{i}} \end{array} $$
(11)

where MSU,i and MSD,i are the sizing-up and sizing-down metrics, respectively. \(S^{D}_{Ki,AVG}\) is the average statistical delay sensitivity of the Ni paths passing through the gate i with respect to changes in gate size (Ki), Slacki,AVG is the average slack of the paths passing through the gate i, and ΔAi is the area impact of sizing the gate, which depends on the geometry of the cell layout. Note that each metric is evaluated for a different PCP set. For the sizing-up metric, Slacki,AVG takes a negative value as it is evaluated over the Slow-PCP set. On the other hand, Slacki,AVG takes a positive value for the sizing-down metric, where the Fast-PCPs are considered (see Fig. 5). The values of Ni and Slacki,AVG also differ depending on the PCP set being considered.

The metric score determines the delay-area trade-off of sizing a gate. The sizing-up metric score increases for gates influencing many paths, since sizing them improves several paths at once. The sizing-up metric score also increases for those gates in Slow-PCPs with large negative slacks, as those paths should be optimized with higher priority. A large average statistical path delay sensitivity with respect to gate sizing also increases the sizing-up metric score, as a large delay reduction can be obtained by increasing the gate size. Finally, the sizing-up metric score decreases for gates with a high area impact, because increasing the size of those gates is costly in area. A similar interpretation of the parameters applies to the sizing-down metric. In this case, the sizing-down metric score increases for those gates affecting few Fast-PCPs with low delay sensitivity to gate sizing (low impact on delay) and large positive slack. Also, gates with a large area impact are preferred due to the potential area savings when sizing down a gate.
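A direct transcription of Eq. 11 into Python is shown below. Two points are assumptions of this sketch rather than statements of the method: the sensitivity is passed as a magnitude, and the scores are normalized by their maximum so that they lie in [0, 1]. The function names and gate data are hypothetical, and each metric is meant to be evaluated only for gates lying on at least one path of the corresponding PCP subset.

```python
def sizing_up_metric(sens_avg, slack_neg_avg, n_paths, delta_area):
    """M_SU of Eq. 11, evaluated over the Slow-PCPs through the gate."""
    return sens_avg * abs(slack_neg_avg) * n_paths / delta_area

def sizing_down_metric(sens_avg, slack_pos_avg, n_paths, delta_area):
    """M_SD of Eq. 11, evaluated over the Fast-PCPs through the gate."""
    return slack_pos_avg * delta_area / (sens_avg * n_paths)

def normalize(scores):
    """Scale scores to [0, 1] by the largest one (assumption of this sketch)."""
    top = max(scores.values())
    return {g: s / top for g, s in scores.items()}

# Hypothetical gate data: (sensitivity magnitude, average slack, paths, area impact).
slow_data = {"g1": (2.0, -4.0, 3, 1.5), "g2": (0.5, -1.0, 1, 1.0)}
m_su_raw = {g: sizing_up_metric(*d) for g, d in slow_data.items()}
m_su = normalize(m_su_raw)
m_sd_g3 = sizing_down_metric(0.5, 2.0, 1, 2.0)
```

Gate g1 scores highest for sizing up because it is sensitive, lies on several Slow-PCPs with large negative slack, and has a moderate area impact.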

4.4 Sizing Heuristic

Algorithm 2 summarizes the sizing heuristic.


The sizing-up metric score MSU,i reflects the trade-off between the Slow-PCP delay reduction and the area cost of sizing each gate. Thus, the N gates with the highest MSU,i are picked and sized up in proportion to their respective scores: ΔK = step ⋅ MSU,i, where N is a user-defined number of gates sized at each iteration and step is the maximum size change that a gate can undergo in one iteration.

The sizing-down metric score MSD,i reflects the trade-off between the delay increase of the Fast-PCPs and the area reduction. However, the interdependence between Fast-PCPs and Slow-PCPs must be considered when selecting the gates to be sized down, because a gate with a high MSD,i may degrade the Slow-PCPs if it also has a high MSU,i score. Therefore, the following two conditions are applied to select the gates to be sized down:

  • Gates sized-up are not allowed to be sized-down in the same iteration.

  • Gates that have a sizing-up metric score (MSU,i) larger than a constraint (\(C_{M_{SU}}\)) are not allowed to be sized-down.

The constraint \(C_{M_{SU}}\) limits the negative impact that sizing down gates has on the delay of the Slow-PCPs. Its value changes dynamically along the sizing process. It is initially set to 1 (maximum) to maximize area savings, since any gate is then allowed to be sized down, and it is gradually reduced each time the delay of the Slow-PCPs does not improve in a given iteration, so that the guardband converges towards the desired target delay. The N gates with the highest MSD,i score that fulfill the aforementioned conditions are sized down according to the following rule: ΔK = −step ⋅ (1 − MSU,i) ⋅ MSD,i. Thus, the size reduction of a selected gate becomes smaller as its MSU,i score increases and larger as its MSD,i score increases.
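One iteration of the selection and sizing rules described above might look as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: both scores are assumed to be already normalized to [0, 1] (the text sets the constraint to a maximum of 1 but does not spell out the normalization), the function names are hypothetical, and the multiplicative `decay` used to tighten \(C_{M_{SU}}\) is an illustrative choice, since the text only says the constraint is "gradually reduced".

```python
def select_and_size(scores_up, scores_down, n, step, c_msu):
    """Return {gate: delta_K} for one iteration.

    scores_up / scores_down map a gate name to its normalized M_SU / M_SD
    score (assumed to lie in [0, 1]).
    """
    delta_k = {}

    # Size up the n gates with the highest M_SU: delta_K = step * M_SU
    for gate in sorted(scores_up, key=scores_up.get, reverse=True)[:n]:
        delta_k[gate] = step * scores_up[gate]

    # Size-down candidates: not sized up in this iteration, and M_SU <= C_MSU
    candidates = [
        gate for gate in scores_down
        if gate not in delta_k and scores_up.get(gate, 0.0) <= c_msu
    ]

    # Size down the n best candidates: delta_K = -step * (1 - M_SU) * M_SD
    for gate in sorted(candidates, key=scores_down.get, reverse=True)[:n]:
        m_su = scores_up.get(gate, 0.0)
        delta_k[gate] = -step * (1.0 - m_su) * scores_down[gate]

    return delta_k


def update_constraint(c_msu, slow_delay_improved, decay=0.9):
    """Tighten C_MSU when the Slow-PCP delay did not improve (decay is an assumption)."""
    return c_msu if slow_delay_improved else c_msu * decay
```

With `c_msu = 1.0` every gate that was not sized up is eligible for sizing down; as the constraint tightens, gates that still matter to the Slow-PCPs are progressively excluded.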

The size-down procedure is useful when the initial design has oversized gates due to a non-optimal design. Also, it becomes beneficial when a gate in a Fast-PCP is driven by a gate in a Slow-PCP. This may occur if the gate in the Fast-PCP was sized-up at the beginning of the optimization procedure (i.e., the gate was critical first), but then its importance to the remaining Slow-PCPs decreases.

Once the selected gates are sized, the PCPs timing information is updated (See Algorithm 2) under the conditions of temperature and stress probability of the devices that caused maximum aged path delay, obtained from the multiple workload-aware aging analysis steps.

5 Simulation Results on ISCAS Benchmark Circuits

The proposed gate sizing optimization technique for guardband reduction has been implemented in C++ and applied to ISCAS benchmark circuits designed using a 32 nm Synopsys Generic Technology [33]. The original design of each circuit is of minimum area, with all gates at their minimum dimensions.

5.1 Statistical Path Delay Sensitivity Approximation

Let us first analyze the accuracy of the proposed approximation for the sensitivity of the statistical delay of a path. For this analysis, the impact of sizing each gate on the μ + 3σ delay of the slowest path of ISCAS circuit C1908 was computed. Figure 8 shows the sensitivity of the statistical delay of the path with respect to the size of each gate in the path. Data is shown for both the statistical path delay sensitivity obtained with the proposed derivative approximation (see Eqs. 8, 9 and 10) and the exact derivative calculation, in which the statistical delay of the path is re-computed each time the size of a gate in the path is perturbed. As can be observed, the proposed approximation closely follows the derivative obtained with the exact computation.
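The perturbation-based reference calculation mentioned above amounts to a finite-difference derivative. The sketch below illustrates the idea only; the callable `statistical_delay` stands for whatever routine returns the μ + 3σ delay of a path for a given list of gate sizes, and both that name and the perturbation `eps` are assumptions made for the example.

```python
def perturbation_sensitivities(statistical_delay, sizes, eps=1e-3):
    """Forward-difference sensitivity of a path's statistical delay to each gate size.

    statistical_delay: hypothetical callable, list of gate sizes -> mu + 3*sigma delay.
    sizes: current size of each gate in the path.
    """
    base = statistical_delay(sizes)
    sensitivities = []
    for i in range(len(sizes)):
        perturbed = list(sizes)
        perturbed[i] += eps  # perturb one gate at a time
        # The path delay is re-computed for every perturbed gate
        sensitivities.append((statistical_delay(perturbed) - base) / eps)
    return sensitivities
```

This requires one full statistical delay evaluation per gate in the path, which is the cost that a closed-form approximation of the derivative avoids.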

Fig. 8 Statistical delay sensitivity of a path with respect to sizing each gate in the path. Longest path of the C1908 circuit

5.2 Optimization Results

Table 1 shows detailed results obtained by applying the proposed design optimization methodology to ISCAS 85/89 circuits. Circuits of different size and complexity were considered. The second and third columns give the total number of paths and gates in the circuits. Columns 4 to 7 show results related to the multiple workload-aware aging analysis step. The column labeled PCPs corresponds to the number of Potential Critical Paths, which are those paths whose μ + 3σ delay may become greater than the nominal delay of the circuit. These paths are the ones considered during selection and sizing of the gates. As can be observed, the number of PCPs does not depend on the total number of paths (e.g., the number of PCPs in c7552 and s1423 is very different, although these circuits have a similar number of paths). The number of PCPs changes depending on the susceptibility of each circuit to aging and on the circuit topology. Column 5 gives the number of gates belonging to the selected set of PCPs. These gates are called Critical Gates (CGs). The proposed heuristic uses the sizing metrics to identify which critical gates are most beneficial to size. Column 6 shows the initial guardband that would have to be added to the nominal delay to assure reliable circuit operation under the combined effect of aging and process variations. As can be seen, the guardband needed can be up to 45% of the nominal delay, which may be unacceptably large for high-performance state-of-the-art designs. Column 7 shows the CPU time spent in the multiple workload-aware aging analysis. This corresponds to the time for evaluating the PCP set for multiple workload profiles. It should be noted that the number of times each path is evaluated may differ, because the evaluation of a path stops once its maximal delay no longer increases when more workload profiles are analyzed.

Table 1 Optimization results using multiple workload-aware aging analysis

Columns 8 to 13 of Table 1 show the results obtained by applying the proposed methodology for selection and sizing of critical gates to reduce the initial guardband to a more acceptable target guardband of 20% (less stringent) and 10% (more stringent). The number of PCPs in the initial design that violate the corresponding target guardband (Slow-PCPs), the area overhead, and the CPU time for design improvement are given. When the guardband constraint is 20%, the area overhead for most of the circuits remains low because only some Slow-PCPs out of the whole PCP set need to be improved. However, when the target guardband becomes more stringent, the number of Slow-PCPs significantly increases for most of the circuits, depending on how balanced the delays of the PCPs are. The area overhead and the corresponding CPU time also increase for more stringent target guardbands, as further optimization is needed to achieve the target.
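The relation between the guardband and the Slow-/Fast-PCP split can be illustrated with a short sketch. It assumes that the required guardband is the relative excess of the largest aged μ + 3σ PCP delay over the nominal circuit delay, and that a PCP is "slow" when its delay exceeds the nominal delay plus the target guardband; the function names and the example delays are hypothetical.

```python
def required_guardband(pcp_delays, nominal_delay):
    """Guardband (as a fraction of the nominal delay) needed to cover all PCPs.

    pcp_delays maps a path name to its aged mu + 3*sigma delay (assumed input).
    """
    return (max(pcp_delays.values()) - nominal_delay) / nominal_delay


def split_pcps(pcp_delays, nominal_delay, target_gb):
    """Partition the PCPs into Slow-PCPs (violating the target) and Fast-PCPs."""
    limit = nominal_delay * (1.0 + target_gb)
    slow = {path for path, delay in pcp_delays.items() if delay > limit}
    fast = set(pcp_delays) - slow
    return slow, fast


# Illustrative numbers only: three PCPs, nominal delay normalized to 1.0
delays = {"p1": 1.45, "p2": 1.15, "p3": 1.05}
slow_20, fast_20 = split_pcps(delays, 1.0, 0.20)  # only p1 is slow
slow_10, fast_10 = split_pcps(delays, 1.0, 0.10)  # p1 and p2 are slow
```

The example shows why a more stringent target enlarges the Slow-PCP set: paths that met the 20% target fall on the wrong side of the 10% limit.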

5.3 Benefits of the Multiple Workload-Aware Aging Analysis

Tables 2 and 3 show the results for the cases when only a single workload and when worst BTI conditions are assumed for aging analysis, respectively. When only a single workload profile is used, the number of PCPs, the number of Critical Gates and the estimated guardband for the circuit are smaller than those obtained with our proposed multiple workload-aware aging analysis approach. This is because, in our approach, at least one of the tested workload profiles caused more aging in the PCPs than the workload profile assumed for the single workload case. Consequently, the area overhead when designing circuits under the single workload assumption is lower than the area overhead obtained with our proposal. The CPU time for design optimization is also slightly lower. However, if the workload profile that the optimized circuit experiences over its lifetime differs from the one used at design time, some of the paths may degrade enough to violate the timing specifications. When only the worst BTI is assumed (see Table 3), the number of PCPs, the number of Critical Gates and the estimated guardband for the circuit significantly increase, which results in significant area overhead and extra CPU time, since the PCPs receive more sizing than needed. For instance, consider circuit c2670, where 12.68% of the area overhead is saved when using our approach with respect to the design using worst-BTI conditions for a target guardband of 20%. The saved area increases to 49.19% for a stringent target guardband of 10%. A similar observation can be made for the other circuits.

Table 2 Results using a single workload for aging analysis
Table 3 Results using worst BTI condition (α and T) for aging analysis

The robustness of the optimized designs was analyzed for 20000 randomly generated workload profiles. For each workload profile, the corresponding stress probability and operating temperature of the devices were computed, and SSTA was performed to obtain the corresponding μ + 3σ delay of all the PCPs. Then, the maximum μ + 3σ delay among all the PCPs was identified, since this value corresponds to the maximum delay that the circuit can exhibit for the given signal probability profile. Figure 9 shows histograms of the μ + 3σ delay of circuit s298 for both the optimized design (GBt = 10%) using our proposed multiple workload-aware aging analysis and the optimized design using only a single workload profile for aging analysis. As can be seen, there are some workloads for which the μ + 3σ delay of the circuit may violate the allowed 10% guardband. However, the design optimized with the proposed approach violates the guardband for a significantly lower number of workloads, which demonstrates the benefit of the proposed approach. Table 4 shows the percentage of workloads for which the μ + 3σ delay of the circuits violated the specified guardband constraint of 10%. For most of the circuits, the design optimized using the multiple workload-aware aging analysis is significantly more robust than the design optimized using only a single workload. Therefore, the designs obtained with the proposed approach are more reliable.
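The robustness figure reported per circuit can be computed as sketched below, assuming the μ + 3σ delay of every PCP has already been obtained for each workload profile (the SSTA step itself is not modeled here). The function name and the input layout are hypothetical.

```python
def violation_percentage(pcp_delays_per_workload, nominal_delay, target_gb):
    """Percentage of workload profiles whose circuit delay exceeds the guardband.

    pcp_delays_per_workload: one list of mu + 3*sigma PCP delays per workload
    profile (assumed to come from a separate SSTA run).
    """
    limit = nominal_delay * (1.0 + target_gb)
    violations = 0
    for pcp_delays in pcp_delays_per_workload:
        # The circuit delay for a workload is the maximum over all its PCPs
        if max(pcp_delays) > limit:
            violations += 1
    return 100.0 * violations / len(pcp_delays_per_workload)
```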

Fig. 9 Histograms of the μ + 3σ corner of the circuit aged delay obtained for an exhaustive number (20000) of multiple signal probability groups at circuit main inputs

Table 4 Percentage of SP groups for which 10% of guardband may be violated

If broader coverage of the possible workloads is needed, designers can trade off the computational cost of a more exhaustive workload-aware aging analysis step against the degree of circuit reliability.

5.4 Gate Sizing Optimization Comparison

The efficiency of the proposed gate sizing optimization metrics and heuristic was compared against other aging-aware metrics proposed in [16] and [18], which are given in Eq. 12:

$$ \begin{array}{cccccc} M_{i,[16]}=\frac{N_{i}\cdot{\Delta} D_{i}}{max(N_{i}\cdot{\Delta} D_{i})}+\delta & & & & M_{i,[18]}= S^{Di}_{Ki} \cdot {\sum_{p}^{N}} {\Delta} D_{i} \end{array} $$
(12)

where Mi,[16] is the metric proposed in [16], Ni is the number of paths (PCPs) passing through the gate i, ΔDi is the delay degradation of the gate, and δ is a parameter that takes the value of 1 if the gate is in the slowest path of the circuit (the path with the largest negative slack). Mi,[18] is the metric in [18], \(S^{Di}_{Ki}\) is the delay sensitivity of the gate to changes in its size, N is the number of paths passing through the gate, and ΔDi is again the delay degradation of the gate.
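For reference, the two comparison metrics of Eq. 12 can be written as follows. This is a sketch based only on the formulas as printed: the function names are hypothetical, δ is taken as 0 for gates outside the slowest path (the text only states the value 1 for gates on it), and the sum in the metric of [18] is read literally, with ΔDi constant over the N paths through the gate.

```python
def metric_16(n_paths, delta_d, slowest_path_gates):
    """M_i,[16] = N_i * dD_i / max(N_i * dD_i) + delta, for every gate.

    n_paths, delta_d: dicts gate -> N_i and gate -> delay degradation dD_i.
    slowest_path_gates: set of gates on the slowest path (delta = 1 for these).
    """
    weights = {gate: n_paths[gate] * delta_d[gate] for gate in n_paths}
    largest = max(weights.values())
    return {
        gate: weights[gate] / largest + (1.0 if gate in slowest_path_gates else 0.0)
        for gate in weights
    }


def metric_18(sensitivity, n_paths, delta_d):
    """M_i,[18] = S^{Di}_{Ki} * sum over the N paths of dD_i (literal reading)."""
    return sensitivity * sum(delta_d for _ in range(n_paths))
```

Neither formula contains an area term or a path slack term, which is the main structural difference with respect to Eq. 11.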

Note that the metrics chosen for comparison are based on different characteristics of the gates. The metric in [16] focuses on identifying the gates that suffer the largest degradation and affect many paths. A similar metric has been proposed in [17] to improve the aged performance of critical paths in an ALU. The metric in [18] considers not only the gate delay degradation and the number of paths affected by the gate but also the delay sensitivity to gate sizing. This metric was shown to perform better than that proposed in [5]. The metrics in Eq. 12 were used in the proposed metric-guided design flow. Only the sizing-up heuristic was applied, since the approaches in [16] and [18] do not consider a metric for sizing down gates.

Table 5 shows the results for guardband constraints of 20% and 10%. For comparison purposes, the area overhead of our proposed approach is also given. It can be observed that our proposal gives designs with lower area overhead than those obtained using the metrics of [16] and [18]. This is because the proposed metrics include important parameters not taken into account in the others, such as the area impact and the path slack. Furthermore, the proposed metric uses a statistical sensitivity that takes into account the impact of sizing a gate on the nominal delay, the delay degradation and the variability due to process variations. Of the other two metrics, the one in [16] is the less efficient for gate sizing. This is because it only considers the delay degradation and the number of paths impacted by the gate, but not the path delay sensitivity to gate sizing. Therefore, it does not measure the potential delay improvement of sizing a gate. Although the metric in [18] includes the delay sensitivity, this sensitivity does not consider aging or process variation effects. Therefore, it may fail to identify the gates that are most beneficial for delay improvement.

Table 5 Percentage of area overhead for target guardbands of 10% and 20% using three selection and sizing methods

Figure 10 shows the number of iterations performed when using each of the metrics for gate sizing. An iteration consists of performing SSTA over all the PCPs to determine the current guardband required for the circuit and the Slow- and Fast-PCP subsets, evaluating the sizing metric for each gate in the PCPs, and applying the sizing heuristic. It can be observed that the proposed metrics require a larger number of iterations. This is because the proposed metrics select the gates that give an efficient delay-area trade-off, which are not necessarily the ones that improve the circuit delay fastest. On the other hand, the metric in [16] gives a higher priority to the gates in the longest PCP of the circuit, which results in a quick delay reduction but with increased area overhead.

Fig. 10 Number of iterations to achieve target guardband

6 Conclusion

A gate sizing optimization methodology for guardband reduction in the presence of aging due to BTI and process variations has been presented in this paper. Since the workload that a circuit experiences over its lifetime is unknown at the design phase, the proposed methodology calculates the maximum realistic aged delay of the circuit paths for various workload profiles at the main inputs, which define the stress probability of the devices. In this way, both the traditional worst BTI assumption and the unreliable specific workload assumption are avoided. It has been shown that a reasonable number of signal probability profiles is sufficient to obtain a good estimation of the maximum degraded delay of the circuit paths. For delay optimization towards the desired target guardband, gate metrics and a sizing heuristic have been proposed to select the best gates both for sizing up to improve delay and for sizing down to mitigate area overhead. An approximation for the statistical sensitivity of a path delay has been proposed to reduce the computational effort of statistical timing analysis and speed up metric evaluation. The application of the proposed methodology to ISCAS benchmark circuits has shown that gate sizing using the proposed approach to estimate the maximum aged delay of the circuit paths results in significant area savings compared to gate sizing under worst BTI assumptions. Furthermore, it has been shown that the obtained designs can operate reliably for workload profiles different from those used during design optimization. The results using the proposed metrics have been compared against the results using other gate metrics in the literature, and it has been shown that the proposed approach provides a better area-delay trade-off.