1 Introduction

As integrated circuit (IC) technology advances, the ever-increasing process variation has become a growing concern [5]. A complex IC, containing numerous memory components, is required to meet the design specification not only at the nominal process corner, but also under large-scale process variations. To achieve sufficiently high yield, the failure rate of each individual memory component must be extremely small. For instance, the failure rate of an SRAM bit-cell must be less than \(10^{-8}{\sim }10^{-6}\) for a typical SRAM design [2, 12]. For this reason, efficiently analyzing the rare failure events of individual memory components has become an important task for the IC design community.

The simplest way to estimate the failure probability is to apply the well-known crude Monte Carlo (CMC) technique [3]. CMC directly draws random samples from the probability density function (PDF) that models device-level variations, and performs a transistor-level simulation to evaluate the performance value for each sample. When CMC is applied to estimate an extremely small failure rate (e.g., \(10^{-8}{\sim }10^{-6}\)), most random samples do not fall into the failure region. Hence, a large number of samples (e.g., \(10^{7}{\sim }10^{9}\)) are needed to accurately estimate the small failure probability, which implies that CMC can be extremely expensive for our application of rare failure rate estimation.

To improve the sampling efficiency, importance sampling (IS) methods have been proposed in the literature [7, 13, 15, 17, 20]. Instead of sampling the original PDF, IS samples a distorted PDF to get more samples in the important failure region. The efficiency achieved by IS highly depends on the choice of the distorted PDF. The traditional IS methods apply several heuristics to construct a distorted PDF that can capture the most important failure region in the variation space. Such a goal, though easy to achieve in a low-dimensional variation space, is extremely difficult to fulfill when a large number of random variables are used to model process variations.

Another approach to improving the sampling efficiency, referred to as statistical blockade, has recently been proposed [18]. This approach first builds a classifier with a number of transistor-level simulations, and then draws random samples from the original PDF. Unlike CMC, where all the samples are evaluated by transistor-level simulations, statistical blockade only simulates the samples that, according to the classifier, are likely to fall into the failure region or close to the failure boundary. The efficiency achieved by this approach highly depends on the accuracy of the classifier. If the variation space is high-dimensional, a large number of transistor-level simulations are needed to build an accurate classifier, which quickly makes the statistical blockade method intractable.

In addition to the aforementioned statistical methods, several deterministic approaches have also been proposed to efficiently estimate the rare failure probability [10, 14]. These methods first find the failure boundary, and then calculate the failure probability by integrating the PDF over the failure region in the variation space. Though efficient in a low-dimensional variation space, these methods become computationally expensive in a high-dimensional space, where accurately determining the failure boundary is difficult, especially if the boundary has a complicated shape (e.g., non-convex or even discontinuous).

Most of these traditional methods [7, 9, 10, 13, 14, 15, 17, 18, 20, 22, 23] have been successfully applied to SRAM bit-cells to estimate their rare failure rates, where only a small number (e.g., 6∼20) of independent random variables are used to model process variations and, hence, the corresponding variation space is low-dimensional. It has been demonstrated in the literature that estimating the rare failure probability in a high-dimensional space (e.g., hundreds of independent random variables to model the device-level variations for SRAM) becomes increasingly important [21]. Unfortunately, such a high-dimensional problem cannot be efficiently handled by most traditional methods. This, in turn, poses an immediate need to develop a new CAD tool that accurately captures rare failure events in a high-dimensional variation space with low computational cost.

To address this technical challenge, we first describe a novel subset simulation (SUS) technique. The key idea of SUS, borrowed from the statistics community [1, 6, 11], is to express the rare failure probability as the product of several large conditional probabilities by introducing a number of intermediate failure events. As such, the original problem of rare failure probability estimation is cast into an equivalent problem of estimating a sequence of conditional probabilities via multiple phases. Since these conditional probabilities are relatively large, they are substantially easier to estimate than the original rare failure rate.

When implementing the SUS method, it is difficult, if not impossible, to directly draw random samples from the conditional PDFs and estimate the conditional probabilities, since these conditional PDFs are unknown in advance. To address this issue, a modified Metropolis (MM) algorithm is adopted from the literature [1] to generate random samples by constructing a number of Markov chains. The conditional probabilities of interest are then estimated from these random samples. Unlike most traditional techniques [7, 9, 10, 13, 14, 15, 17, 18, 20, 22, 23] that suffer from the dimensionality issue, SUS can be efficiently applied to high-dimensional problems, which will be demonstrated by the experimental results in Sect. 12.2.

To define the intermediate failure events required by SUS, the performance of interest (PoI) must be continuous. In other words, SUS can only analyze a continuous PoI. For many rare failure events, however, PoIs are discrete (e.g., the output of a voltage-mode sense amplifier). Realizing this limitation, we further describe a scaled-sigma sampling (SSS) approach to efficiently estimate the rare failure rates of discrete PoIs in a high-dimensional space. SSS is particularly developed to address the following two fundamental questions: (1) how to efficiently draw random samples from the rare failure region, and (2) how to estimate the rare failure rate based on these random samples. Unlike CMC, which directly samples the variation space so that only a few samples fall into the failure region, SSS draws random samples from a distorted PDF whose standard deviation (i.e., sigma) is scaled up. Conceptually, this is equivalent to increasing the magnitude of process variations. As a result, a large number of samples can now fall into the failure region. Once the distorted random samples are generated, an analytical model derived from the “soft maximum” (i.e., log-sum-exp) approximation is optimally fitted by applying maximum likelihood estimation (MLE). Next, the failure rate can be efficiently estimated from the fitted model.

The remainder of this chapter is organized as follows. In Sect. 12.2, we will summarize the SUS approach and, next, the SSS approach will be presented in Sect. 12.3. Finally, we conclude in Sect. 12.4.

2 Subset Simulation

Suppose that the vector

$$\displaystyle \begin{aligned} \setlength{\arraycolsep}{5pt} \mathbf{x} = \left[ \begin{array}{cccc} x_1 & x_2 & \cdots & x_M \end{array} \right]^T \end{aligned} $$
(12.1)

is an M-dimensional random variable modeling device-level process variations. In a process design kit, the random variables {x m;m = 1, 2, …, M} in (12.1) are typically modeled by a jointly Normal distribution [7, 9, 10, 13, 14, 15, 17, 18, 20, 22, 23]. Without loss of generality, we further assume that {x m;m = 1, 2, …, M} are mutually independent and standard Normal (i.e., with zero mean and unit variance), so that their joint PDF is

$$\displaystyle \begin{aligned} f( \mathbf{x} ) = \prod_{m=1}^M{p_m\left(x_m\right)} = \prod_{m=1}^M \left[ \dfrac{1}{\sqrt{2 \pi}} \cdot \exp{\left( - \dfrac{x_m^2}{2} \right)} \right] = \dfrac{\exp \left( - \left. \left\| \mathbf{x} \right\|{}_2^2 \middle / 2 \right. \right)}{ \left( \sqrt{2 \pi} \right)^M }, \end{aligned} $$
(12.2)

where \(p_m\left ( x_m \right )\) is the 1-D PDF for x m, and ∥•∥2 denotes the L 2-norm of a vector. Any correlated random variables that are jointly Normal can be transformed to the independent random variables {x m;m = 1, 2, …, M} by principal component analysis [3]. Then, the failure rate of a circuit can be mathematically represented as:

$$\displaystyle \begin{aligned} P_F = \Pr\left( \mathbf{x} \in \Omega \right) = \int_{\mathbf{x} \in \Omega} f(\mathbf{x}) \cdot d\mathbf{x}, \end{aligned} $$
(12.3)

where Ω denotes the failure region, i.e., the subset of the variation space where the PoI does not meet the specification.
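As a side note on the whitening step mentioned after (12.2), the following minimal sketch (Python/NumPy; the covariance matrix and sample count are hypothetical) shows how correlated jointly Normal variables can be mapped to independent standard Normal variables by principal component analysis:

```python
import numpy as np

# Hypothetical covariance matrix of M = 3 correlated, zero-mean Normal variables.
Sigma = np.array([[1.0, 0.6, 0.2],
                  [0.6, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])

# Eigendecomposition Sigma = U * diag(lam) * U^T (i.e., PCA of the covariance).
lam, U = np.linalg.eigh(Sigma)

rng = np.random.default_rng(0)
z = rng.multivariate_normal(np.zeros(3), Sigma, size=10000)  # correlated samples
x = (z @ U) / np.sqrt(lam)                                   # whitened: x ~ N(0, I)

print(np.cov(x, rowvar=False).round(2))  # approximately the identity matrix
```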

Instead of directly estimating the rare failure probability P F, SUS expresses P F as the product of several large conditional probabilities by introducing a number of intermediate failure events in the variation space. Without loss of generality, we define K intermediate failure events { Ωk;k = 1, 2, …, K} as:

$$\displaystyle \begin{aligned} \Omega_1 \supset \Omega_2 \supset \cdots \supset \Omega_{K-1} \supset \Omega_K = \Omega. \end{aligned} $$
(12.4)

Based on (12.4), we can express P F in (12.3) as:

$$\displaystyle \begin{aligned} P_F = \Pr \left( \mathbf{x} \in \Omega \right) = \Pr \left( \mathbf{x} \in \Omega_K, \mathbf{x} \in \Omega_{K-1} \right). \end{aligned} $$
(12.5)

Equation (12.5) can be re-written as:

$$\displaystyle \begin{aligned} P_F = \Pr \left( \mathbf{x} \in \Omega_K \left| \mathbf{x} \in \Omega_{K-1} \right. \right) \cdot \Pr \left( \mathbf{x} \in \Omega_{K-1} \right). \end{aligned} $$
(12.6)

Similarly, we can express \(\Pr \left ( \mathbf {x} \in \Omega _{K-1} \right )\) as:

$$\displaystyle \begin{aligned} \Pr \left( \mathbf{x} \in \Omega_{K-1} \right) = \Pr \left( \mathbf{x} \in \Omega_{K-1} \left| \mathbf{x} \in \Omega_{K-2} \right. \right) \cdot \Pr \left( \mathbf{x} \in \Omega_{K-2} \right). \end{aligned} $$
(12.7)

From (12.4), (12.6), and (12.7), we can easily derive:

$$\displaystyle \begin{aligned} P_F = \Pr \left( \mathbf{x} \in \Omega_1 \right) \cdot \prod_{k=2}^K \Pr \left( \mathbf{x} \in \Omega_k \left| \mathbf{x} \in \Omega_{k-1} \right. \right) = \prod_{k=1}^K P_k, \end{aligned} $$
(12.8)

where

$$\displaystyle \begin{aligned} P_1 = \Pr \left( \mathbf{x} \in \Omega_1 \right), \end{aligned} $$
(12.9)
$$\displaystyle \begin{aligned} P_k = \Pr \left( \mathbf{x} \in \Omega_k \left| \mathbf{x} \in \Omega_{k-1} \right. \right) \quad \left( k = 2, 3, \ldots, K \right). \end{aligned} $$
(12.10)

If { Ωk;k = 1, 2, …, K} are properly chosen, all the probabilities {P k;k = 1, 2, …, K} are large and can be efficiently estimated. Once {P k;k = 1, 2, …, K} are known, the rare failure probability P F can be easily calculated by (12.8).
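As a purely illustrative arithmetic example of (12.8) (the numbers are not taken from the experiments in this chapter), suppose that K = 7 intermediate failure events are chosen such that every probability in (12.9) and (12.10) equals 0.1. Then

$$\displaystyle \begin{aligned} P_F = \prod_{k=1}^{7} P_k = 0.1^{7} = 10^{-7}, \end{aligned} $$

i.e., a failure rate of \(10^{-7}\) is recovered by estimating seven probabilities that are each as large as 0.1.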

Note that the failure events { Ωk;k = 1, 2, …, K} are extremely difficult to specify in a high-dimensional variation space. For this reason, we do not directly define { Ωk;k = 1, 2, …, K} in the variation space. Instead, we utilize their corresponding subsets {F k;k = 1, 2, …, K} in the performance space:

$$\displaystyle \begin{aligned} F_k = \left\{ y(\mathbf{x}); \mathbf{x} \in \Omega_k \right\} \quad \left( k = 1, 2, \ldots, K \right), \end{aligned} $$
(12.11)

where y(x) denotes the PoI as a function of x. Since y(x) is typically a scalar, {F k;k = 1, 2, …, K} are just one-dimensional subsets of \(\mathbb {R}\) and, therefore, easy to specify. Once {F k;k = 1, 2, …, K} are determined, { Ωk;k = 1, 2, …, K} are implicitly known. For instance, to know whether a given x belongs to Ωk, we first run a transistor-level simulation to evaluate y(x). If y(x) belongs to F k, x is inside Ωk. Otherwise, x is outside Ωk.
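In code, the implicit membership test described above amounts to one transistor-level simulation followed by a one-dimensional comparison. The sketch below assumes, purely for illustration, that failure means the PoI drops below a threshold, so that each F k is an interval (−∞, y k]; the callable `simulate_poi` stands in for the circuit simulator:

```python
def in_omega_k(x, y_k, simulate_poi):
    """Check whether the sample x lies in the implicitly defined subset Omega_k.

    x            -- M-dimensional sample of the device-level variations
    y_k          -- intermediate threshold defining F_k = (-inf, y_k]
    simulate_poi -- transistor-level simulation returning the PoI value y(x)
    """
    return simulate_poi(x) <= y_k
```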

In what follows, we will use a simple 2-D example to intuitively illustrate the basic flow of SUS. Figure 12.1 shows this 2-D example where two random variables x = [x 1 x 2]T are used to model the device-level process variations, and Ω1 and Ω2 denote the first two subsets in (12.4). Note that Ω1 and Ω2 are depicted for illustration purposes in this example. In practice, we do not need to explicitly know Ω1 and Ω2, as previously explained.

Fig. 12.1 A 2-D example is used to illustrate the procedure of probability estimation via multiple phases by using SUS: (a) generating MC samples and estimating P 1 in the 1st phase, and (b) generating MCMC samples and estimating P 2 in the 2nd phase

Our objective is to estimate the probabilities {P k;k = 1, 2, …, K} via multiple phases. Starting from the 1st phase, we simply draw L 1 independent random samples {x (1, l);l = 1, 2, …, L 1} from the PDF f(x) to estimate P 1. Here, the superscript “1” of the symbol x (1, l) refers to the 1st phase. Among these L 1 samples, we identify a subset of samples \(\{ {\mathbf {x}}_F^{(1, t)}; t = 1, 2, \ldots , T_1 \}\) that fall into Ω1, where T 1 denotes the total number of samples in this subset. As shown in Fig. 12.1(a), the red points represent the samples that belong to Ω1 and the green points represent the samples that are out of Ω1. In this case, P 1 can be estimated as:

$$\displaystyle \begin{aligned} \displaystyle P_1^{\text{SUS}} = \dfrac{1}{L_1} \cdot \sum_{l=1}^{L_1} I_{\Omega_1} \left[ {\mathbf{x}}^{(1,l)} \right] = \dfrac{T_1}{L_1}, \end{aligned} $$
(12.12)

where \(P_1^{\text{SUS}}\) denotes the estimated value of P 1, and \(I_{\Omega _1} (\mathbf {x})\) represents the indicator function

$$\displaystyle \begin{aligned} I_{\Omega_1}(\mathbf{x}) = \left\{ \begin{array}{ll} 1 & \quad \mathbf{x} \in \Omega_1 \\ 0 & \quad \mathbf{x} \notin \Omega_1 \end{array} \right. . \end{aligned} $$
(12.13)

If P 1 is large, it can be accurately estimated with a small number of random samples (e.g., L 1 is around \(10^{2}{\sim }10^{3}\)).
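A minimal sketch of the 1st phase under the same threshold convention as above (failure means y(x) ≤ y 1; `simulate_poi` is again a placeholder for the circuit simulator):

```python
import numpy as np

def sus_phase1(M, L1, y_1, simulate_poi, rng):
    """Phase 1 of SUS: crude MC estimate of P1 = Pr(x in Omega_1), cf. (12.12)."""
    X = rng.standard_normal((L1, M))             # L1 i.i.d. samples from f(x)
    y = np.array([simulate_poi(x) for x in X])   # one transistor-level simulation per sample
    in_O1 = y <= y_1                             # indicator I_{Omega_1}, cf. (12.13)
    P1 = in_O1.mean()                            # T1 / L1
    seeds = X[in_O1]                             # the T1 samples that seed the 2nd phase
    return P1, seeds
```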

Next, in the 2nd phase, we need to estimate the conditional probability \(P_2 = \Pr (\mathbf {x} \in \Omega _2 | \mathbf {x} \in \Omega _1)\). Towards this goal, one simple idea is to directly draw random samples from the conditional PDF f(x|x ∈ Ω1) and then compute the mean of the indicator function \(I_{ \Omega _2 } (\mathbf {x})\)

$$\displaystyle \begin{aligned} I_{\Omega_2}(\mathbf{x}) = \left\{ \begin{array}{ll} 1 & \quad \mathbf{x} \in \Omega_2 \\ 0 & \quad \mathbf{x} \notin \Omega_2 \end{array} \right. . \end{aligned} $$
(12.14)

This approach, however, is practically infeasible since f(x|x ∈ Ω1) is unknown in advance. To address this issue, we apply a modified Metropolis (MM) algorithm [1] to generate a set of random samples that follow the conditional PDF f(x|x ∈ Ω1).

MM is a Markov chain Monte Carlo (MCMC) technique [3]. Starting from each of the samples \(\{ {\mathbf {x}}_F^{(1, t)}; t = 1, 2, \ldots , T_1\}\) that fall into Ω1 in the 1st phase, MM generates a sequence of samples that form a Markov chain. In other words, there are T 1 independently generated Markov chains in total and \({\mathbf {x}}_F^{(1, t)}\) is the 1st sample of the t-th Markov chain. To clearly explain the MM algorithm, we define the symbol \({\mathbf {x}}^{(2, t, 1)} = {\mathbf {x}}_F^{(1, t)}\), where t ∈{1, 2, …, T 1}. The superscripts “2” and “1” of x (2, t, 1) refer to the 2nd phase and the 1st sample of the Markov chain, respectively.

For our 2-D example, we start from \({\mathbf {x}}^{(2, 1, 1)}= [x_1^{(2, 1, 1)} \ x_2^{(2, 1, 1)}]^T\) to form the 1st Markov chain. To generate the 2nd sample x (2, 1, 2) from x (2, 1, 1), we first randomly sample a new value \(x_1^{NEW}\) from a 1-D transition PDF \(q_1[x_1^{NEW} | x_1^{(2, 1, 1)}]\) that must satisfy the following condition [1]:

$$\displaystyle \begin{aligned} q_1 \left[ x_1^{NEW} \left| x_1^{(2,1,1)} \right. \right] = q_1 \left[ x_1^{(2,1,1)} \left| x_1^{NEW} \right. \right] \end{aligned} $$
(12.15)

There are many possible ways to define \(q_1[x_1^{NEW} | x_1^{(2, 1, 1)}]\) in (12.15) [1]. For example, a 1-D Normal PDF can be used

$$\displaystyle \begin{aligned} q_1 \left[ x_1^{NEW} \left| x_1^{(2,1,1)} \right. \right] = \frac{1}{\sqrt{2\pi}\cdot\sigma_1}\cdot \exp \left\{ -\frac{\left[x_1^{NEW} - x_1^{(2,1,1)} \right]^2}{2\cdot\sigma_1^2} \right\}. \end{aligned} $$
(12.16)

where \(x_1^{(2, 1, 1)}\) and σ 1 are the mean and standard deviation of the distribution, respectively. Here, σ 1 is a parameter that is usually chosen empirically [19].

Next, we compute the ratio

$$\displaystyle \begin{aligned} r = \frac{p_1 \left( x_1^{NEW} \right)}{p_1 \left( x_1^{(2,1,1)}\right)}, \end{aligned} $$
(12.17)

where p 1(x 1) is the original PDF of the random variable x 1 shown in (12.2). A random sample u is then drawn from a 1-D uniform distribution with the following PDF:

$$\displaystyle \begin{aligned} f(u) = \left\{ \begin{array}{ll} 1 & \quad 0 \leq u \leq 1\\ 0 & \quad \text{Otherwise} \end{array} \right. , \end{aligned} $$
(12.18)

and the value of \(x_1^{(2, 1, 2)}\) is set as

$$\displaystyle \begin{aligned} x_1^{(2,1,2)} = \left\{ \begin{array}{ll} x_1^{NEW} & \quad u \leq \min(1,r)\\ x_1^{(2,1,1)} & \quad u > \min(1,r) \end{array} \right. . \end{aligned} $$
(12.19)

A similar procedure is applied to generate \(x_2^{(2, 1, 2)}\). Once \(x_1^{(2, 1, 2)}\) and \(x_2^{(2, 1, 2)}\) are determined, we form a candidate \({\mathbf {x}}^{NEW} = [ x_1^{(2, 1, 2)} \ x_2^{(2, 1, 2)} ]^T\) and use it to create the sample x (2, 1, 2)

$$\displaystyle \begin{aligned} {\mathbf{x}}^{(2,1,2)} = \left\{ \begin{array}{ll} {\mathbf{x}}^{NEW} & \quad {\mathbf{x}}^{NEW} \in \Omega_1\\ {\mathbf{x}}^{(2,1,1)} & \quad {\mathbf{x}}^{NEW} \notin \Omega_1 \end{array} \right. . \end{aligned} $$
(12.20)

By repeating the aforementioned steps, we can create other samples to complete the Markov chain {x (2, 1, l);l = 1, 2, …, L 2}, where L 2 denotes the length of the Markov chain in the 2nd phase. In addition, all other Markov chains can be similarly formed. Since there are T 1 Markov chains and each Markov chain contains L 2 samples, the total number of the MCMC samples is T 1 ⋅ L 2 for the 2nd phase. Figure 12.1(b) shows the sampling results in the 2nd phase for our 2-D example. In Fig. 12.1(b), the red points represent the initial samples {x (2, t, 1);t = 1, 2, …, T 1} of the Markov chains and they are obtained from the 1st phase. The yellow points represent the MCMC samples created via the MM algorithm in the 2nd phase. It has been proved in [1] that all these MCMC samples {x (2, t, l);t = 1, 2, …, T 1;l = 1, 2, …, L 2} in Fig. 12.1(b) approximately follow f(x|x ∈ Ω1). In other words, we have successfully generated a number of random samples that follow our desired distribution for the 2nd phase.
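The coordinate-wise update in (12.15)–(12.20) can be written compactly as follows. This is only a sketch of the MM transition (Normal transition PDFs with empirically chosen widths, and failure defined as y(x) ≤ y k); it mirrors the structure of the algorithm in [1] rather than any reference implementation:

```python
import numpy as np

def mm_step(x, y_k, sigma_prop, simulate_poi, rng):
    """One modified-Metropolis transition x -> x_next, conditioned on Omega_k."""
    cand = x.copy()
    for m in range(x.size):
        x_new = rng.normal(x[m], sigma_prop[m])      # symmetric 1-D proposal, cf. (12.16)
        r = np.exp(-(x_new**2 - x[m]**2) / 2.0)      # ratio of standard-Normal PDFs, cf. (12.17)
        if rng.uniform() <= min(1.0, r):             # coordinate-wise accept/reject, cf. (12.19)
            cand[m] = x_new
    # Keep the candidate only if it stays inside Omega_k, cf. (12.20).
    return cand if simulate_poi(cand) <= y_k else x
```

Starting from each seed \({\mathbf {x}}_F^{(1, t)}\), repeated calls to `mm_step` produce one Markov chain of length L 2.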

Among all the MCMC samples {x (2, t, l);t = 1, 2, …, T 1;l = 1, 2, …, L 2}, we further identify a subset of samples \( \{ {\mathbf {x}}_F^{(2, t)}; t = 1, 2, \ldots , T_2 \}\) that fall into Ω2, where T 2 denotes the total number of the samples in this subset. The conditional probability P 2 can be estimated as:

$$\displaystyle \begin{aligned} P_2^{\text{SUS}} = \dfrac{1}{T_1 \cdot L_2} \cdot \sum_{t=1}^{T_1} \sum_{l=1}^{L_2} I_{ \Omega_2 } \left[ {\mathbf{x}}^{(2,t,l)} \right] = \dfrac{T_2}{T_1 \cdot L_2} , \end{aligned} $$
(12.21)

where \(P_2^{\text{SUS}}\) denotes the estimated value of P 2.

By following the aforementioned idea, we can estimate all the probabilities {P k;k = 1, 2, …, K}. Once the values of {P k;k = 1, 2, …, K} are estimated, the rare failure rate P F is calculated by

$$\displaystyle \begin{aligned} P_F^{\text{SUS}} = \prod_{k=1}^K P_k^{\text{SUS}},\end{aligned} $$
(12.22)

where \(P_F^{\text{SUS}}\) represents the estimated value of P F by using SUS. If we have more than two random variables, estimating the probabilities {P k;k = 1, 2, …, K} can be pursued in a similar way [19].

To efficiently apply SUS, we must carefully choose the subsets {F k;k = 1, 2, …, K} so that each probability P k, where k ∈{1, 2, …, K}, is close to 0.1. In this case, even if the failure rate P F is extremely small (e.g., \(10^{-8} {\thicksim } 10^{-6}\)), SUS only needs a small number of (e.g., \(K = 6 {\thicksim } 8\)) phases to complete. Furthermore, it only requires a few hundred samples during each phase to accurately estimate the probability P k.
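Putting the pieces together, a possible top-level loop is sketched below, reusing `mm_step` from the previous sketch. The adaptive choice of intermediate thresholds, namely the empirical p 0 = 0.1 quantile of the simulated PoI values in each phase so that every P k is close to 0.1, is a common heuristic assumed here for illustration; the sketch also assumes that every phase produces at least a few seed samples and, for brevity, re-simulates the chain samples instead of caching the PoI values computed inside `mm_step`:

```python
import numpy as np

def subset_simulation(M, L, y_spec, simulate_poi, sigma_prop, rng, p0=0.1):
    """Estimate P_F = Pr(y(x) <= y_spec) via SUS; returns P_F^SUS and the per-phase P_k."""
    X = rng.standard_normal((L, M))                  # 1st phase: crude MC samples
    y = np.array([simulate_poi(x) for x in X])
    P_list = []
    while True:
        y_k = max(np.quantile(y, p0), y_spec)        # intermediate threshold defining F_k
        in_Ok = y <= y_k
        P_list.append(in_Ok.mean())                  # P_k^SUS, cf. (12.12) and (12.21)
        if y_k <= y_spec:                            # Omega_k coincides with Omega: done
            break
        seeds = X[in_Ok]                             # initial samples of the Markov chains
        chain_len = max(1, L // len(seeds))
        samples = []
        for x0 in seeds:
            x = x0
            for _ in range(chain_len):
                x = mm_step(x, y_k, sigma_prop, simulate_poi, rng)
                samples.append(x)
        X = np.array(samples)
        y = np.array([simulate_poi(x) for x in X])
    return float(np.prod(P_list)), P_list
```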

In addition, to quantitatively assess the accuracy of the proposed SUS estimator, we must estimate its confidence interval (CI). To this end, we need to know the distribution of \(P_F^{\text{SUS}}\). Since \(P_F^{\text{SUS}}\) is equal to the multiplication of \(\{P_k^{\text{SUS}}; k = 1, 2, \ldots , K\}\), we must carefully study the statistical property of \(P_k^{\text{SUS}}\) in order to derive the distribution for \(P_F^{\text{SUS}}\).

To be specific, \(P_1^{\text{SUS}}\) is calculated by using (12.12) with L 1 independent and identically distributed (i.i.d.) samples drawn from f(x). Hence, according to the central limit theorem (CLT) [16], \(P_1^{\text{SUS}}\) approximately follows a Normal distribution

$$\displaystyle \begin{aligned} P_1^{\text{SUS}} \sim N\left( P_1, v_1 \right),\end{aligned} $$
(12.23)

where the mean value P 1 is defined in (12.9) and the variance value v 1 can be approximated as [16]

$$\displaystyle \begin{aligned} v_1 \approx \frac{1}{L_1}\cdot P_1^{\text{SUS}} \cdot \left(1-P_1^{\text{SUS}}\right).\end{aligned} $$
(12.24)

On the other hand, the conditional probability \(P_k^{\text{SUS}}\), where k ∈{2, 3, …, K}, can be estimated by using the MCMC samples {x (k, t, l);t = 1, 2, …, T k−1;l = 1, 2, …, L k} created by MM:

$$\displaystyle \begin{aligned} P_k^{\text{SUS}} = \dfrac{1}{T_{k-1} \cdot L_k} \cdot \sum_{t=1}^{T_{k-1}} \sum_{l=1}^{L_k} I_{ \Omega_k } \left[ {\mathbf{x}}^{(k,t,l)} \right],\end{aligned} $$
(12.25)

where \(I_{ \Omega _k } \left [ \mathbf {x} \right ]\) represents the indicator function

$$\displaystyle \begin{aligned} I_{\Omega_k}(\mathbf{x}) = \left\{ \begin{array}{ll} 1 & \quad \mathbf{x} \in \Omega_k \\ 0 & \quad \mathbf{x} \notin \Omega_k \end{array} \right. . \end{aligned} $$
(12.26)

Since the MCMC samples {x (k, t, l);l = 1, 2, …, L k}, where t ∈{1, 2, …, T k−1}, are strongly correlated, they cannot be considered as i.i.d. samples. For this reason, we cannot directly apply CLT to derive the distribution for the estimator \(P_k^{\text{SUS}}\) in (12.25).

To address this issue, we define a set of new random variables

$$\displaystyle \begin{aligned} s^{(k,t)} = \dfrac{1}{L_k} \cdot \sum_{l=1}^{L_k} I_{ \Omega_k } \left[ {\mathbf{x}}^{(k,t,l)} \right], \end{aligned} $$
(12.27)

where t ∈{1, 2, …, T k−1}. Studying (12.27) reveals two important observations. First, s (k, t) only depends on the t-th Markov chain {x (k, t, l);l = 1, 2, …, L k}. Since different Markov chains are created from different initial samples {x (k, t, 1);t = 1, 2, …, T k−1}, the random variables {s (k, t);t = 1, 2, …, T k−1} are almost statistically independent. Second, since all initial samples {x (k, t, 1);t = 1, 2, …, T k−1} follow the same conditional PDF \(p(\mathbf {x} \left | \mathbf {x}\in \Omega _{k-1}\right .)\) and all the Markov chains are generated by following the same procedure, the random variables {s (k, t);t = 1, 2, …, T k−1} must be identically distributed. For these reasons, we can consider {s (k, t);t = 1, 2, …, T k−1} as a set of i.i.d. random variables.

Based on (12.27), \(P_k^{\text{SUS}}\) in (12.25), where k ∈{2, 3, …, K}, can be re-written as

$$\displaystyle \begin{aligned} P_k^{\text{SUS}} = \dfrac{1}{T_{k-1}} \cdot \sum_{t=1}^{T_{k-1}} s^{(k,t)} \end{aligned} $$
(12.28)

and, as a result, approximately follows a Normal distribution according to CLT:

$$\displaystyle \begin{aligned} P_k^{\text{SUS}} \sim N\left( P_k, v_k\right), \end{aligned} $$
(12.29)

where P k is defined in (12.10) and

$$\displaystyle \begin{aligned} v_k \approx \dfrac{1}{\left( T_{k-1} - 1\right) \cdot T_{k-1}} \cdot \sum_{t=1}^{T_{k-1}} \left[ s^{(k,t)} - P_k^{\text{SUS}}\right]^2. \end{aligned} $$
(12.30)
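A direct transcription of (12.27)–(12.30): assuming that the indicator values of the k-th phase are stored in a T k−1 × L k array `I` (one row per Markov chain), the estimate \(P_k^{\text{SUS}}\) and its variance v k are obtained as

```python
import numpy as np

def phase_estimate_and_variance(I):
    """I[t, l] = I_{Omega_k}(x^(k,t,l)), a 0/1 array of shape (T_{k-1}, L_k)."""
    s = I.mean(axis=1)                             # per-chain averages s^(k,t), cf. (12.27)
    T = I.shape[0]
    P_k = s.mean()                                 # P_k^SUS, cf. (12.28)
    v_k = np.sum((s - P_k) ** 2) / ((T - 1) * T)   # variance estimate, cf. (12.30)
    return P_k, v_k
```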

To further derive the distribution for \(P_F^{\text{SUS}}\) in (12.22) based on (12.23) and (12.29), we take the logarithm of both sides of (12.22), because a summation is much easier to handle than a product:

$$\displaystyle \begin{aligned} \log\left( P_F^{\text{SUS}}\right) = \sum_{k=1}^K \log \left( P_k^{\text{SUS}} \right). \end{aligned} $$
(12.31)

To derive the distribution of \(\{\log ( P_k^{\text{SUS}} ); k = 1, 2, \ldots , K\}\), we approximate the nonlinear function \(\log (\bullet )\) by the first-order Taylor expansion around the mean value P k of the random variable \(P_k^{\text{SUS}}\):

$$\displaystyle \begin{aligned} \log\left( P_k^{\text{SUS}}\right) \approx \log \left( P_k \right) + \dfrac{P_k^{\text{SUS}}-P_k}{P_k} \approx \log \left( P_k\right) + \dfrac{P_k^{\text{SUS}}-P_k}{P_k^{\text{SUS}}}. \end{aligned} $$
(12.32)

According to the linear approximation in (12.32), \(\log ( P_k^{\text{SUS}})\) follows a Normal distribution

$$\displaystyle \begin{aligned} \log\left( P_k^{\text{SUS}}\right) \sim N\left[ \log\left( P_k\right), v_{\log,k}\right], \end{aligned} $$
(12.33)

where

$$\displaystyle \begin{aligned} v_{\log,k} = \dfrac{v_k}{\left( P_k^{\text{SUS}}\right)^2}, \end{aligned} $$
(12.34)

and k ∈{1, 2, …, K}.

Since \(\log ( P_F^{\text{SUS}})\) is the summation of several “approximately” Normal random variables \(\{\log ( P_k^{\text{SUS}}); k = 1, 2, \ldots , K\}\), \(\log ( P_F^{\text{SUS}})\) also approximately follows a Normal distribution [16]

$$\displaystyle \begin{aligned} \log\left(P_F^{\text{SUS}}\right) \sim N\left\{ \text{MEAN}\left[ \log\left(P_F^{\text{SUS}}\right)\right], \text{VAR}\left[ \log\left(P_F^{\text{SUS}}\right)\right]\right\} \end{aligned} $$
(12.35)

Based on (12.8), (12.31), and (12.33), \(\text{MEAN}[\log (P_F^{\text{SUS}})]\) can be expressed as

$$\displaystyle \begin{aligned} \text{MEAN}\left[ \log\left(P_F^{\text{SUS}}\right)\right] = \sum_{k=1}^{K} \log\left(P_k\right) = \log\left( \prod_{k=1}^{K}P_k\right) = \log\left(P_F\right), \end{aligned} $$
(12.36)

and \(\text{VAR}[ \log (P_F^{\text{SUS}})]\) can be calculated as

$$\displaystyle \begin{aligned} \begin{aligned} &\text{VAR} \left[ \log\left(P_F^{\text{SUS}}\right)\right] = \text{VAR}\left[ \sum_{k=1}^{K} \log\left( P_k^{\text{SUS}} \right)\right] \\ &\quad = \sum_{k=1}^{K}v_{\log,k} + 2\cdot \sum_{i=1}^{K-1} \sum_{j=i+1}^{K}\text{COV}\left[ \log\left(P_i^{\text{SUS}}\right),\log\left(P_j^{\text{SUS}}\right)\right]\ \end{aligned}, \end{aligned} $$
(12.37)

where COV(•, •) denotes the covariance of two random variables.

When applying MCMC, we often observe that an MCMC sample is strongly correlated to its adjacent sample. However, the correlation quickly decreases as the distance between two MCMC samples increases. Therefore, we can assume that the samples used to estimate \(\log (P_i^{\text{SUS}})\) are weakly correlated to the samples used to estimate \(\log (P_j^{\text{SUS}})\), if the distance between i and j is greater than 1 (i.e., |i − j| > 1). Based on this assumption, (12.37) can be approximated as

$$\displaystyle \begin{aligned} \text{VAR} \left[ \log\left(P_F^{\text{SUS}}\right)\right] \approx \sum_{k=1}^{K}v_{\log,k} + 2\cdot \sum_{k=1}^{K-1} \text{COV}\left[ \log\left(P_k^{\text{SUS}}\right),\log\left(P_{k+1}^{\text{SUS}}\right)\right]. \end{aligned} $$
(12.38)

Accurately estimating the covariance between \(\log (P_k^{\text{SUS}})\) and \(\log (P_{k+1}^{\text{SUS}})\) is nontrivial. Here, we derive an upper bound for \(\text{COV}[\log (P_k^{\text{SUS}}), \log (P_{k+1}^{\text{SUS}})]\) [16]:

$$\displaystyle \begin{aligned} \text{COV}\left[ \log\left(P_k^{\text{SUS}}\right),\log\left(P_{k+1}^{\text{SUS}}\right)\right] \leq \sqrt{v_{\log,k}\cdot v_{\log,k+1}},\end{aligned} $$
(12.39)

where k ∈{1, 2, …, K − 1}. Substituting (12.39) into (12.38) yields

$$\displaystyle \begin{aligned} \text{VAR} \left[ \log\left(P_F^{\text{SUS}}\right)\right] \leq \sum_{k=1}^{K}v_{\log,k} + 2\cdot \sum_{k=1}^{K-1} \sqrt{v_{\log,k} \cdot v_{\log,k+1}} = v_{\log,\text{SUS}}.\end{aligned} $$
(12.40)

In this chapter, we approximate \(\text{VAR}[\log (P_F^{\text{SUS}})]\) by its upper bound \(v_{\log ,\text{SUS}}\) defined in (12.40) to provide a conservative estimate of the CI. Based on (12.36) and (12.40), (12.35) can be re-written as

$$\displaystyle \begin{aligned} \log\left( P_F^{\text{SUS}} \right) \sim N \left[ \log\left( P_F \right), v_{\log, \text{SUS}}\right].\end{aligned} $$
(12.41)

According to (12.41), we can derive the CI for any given confidence level. For instance, the 95% CI is expressed as

$$\displaystyle \begin{aligned} \left[ \exp\left(\log(P_F^{\text{SUS}})-1.96\cdot\sqrt{v_{\log,\text{SUS}}}\right), \exp\left(\log(P_F^{\text{SUS}})+1.96\cdot\sqrt{v_{\log,\text{SUS}}}\right)\right].\end{aligned} $$
(12.42)
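Combining (12.24)/(12.30), (12.34), and (12.40)–(12.42), the conservative 95% CI can be computed from the per-phase estimates alone. A short sketch (the arrays `P` and `v` hold \(\{P_k^{\text{SUS}}\}\) and {v k}):

```python
import numpy as np

def sus_confidence_interval(P, v, z=1.96):
    """Conservative CI for P_F^SUS from per-phase estimates P_k^SUS and variances v_k."""
    P = np.asarray(P, dtype=float)
    v_log = np.asarray(v, dtype=float) / P**2                                 # (12.34)
    v_log_sus = v_log.sum() + 2.0 * np.sum(np.sqrt(v_log[:-1] * v_log[1:]))   # (12.40)
    log_PF = np.log(P).sum()                                                  # (12.31)
    half = z * np.sqrt(v_log_sus)
    return np.exp(log_PF - half), np.exp(log_PF + half)                       # (12.42)
```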

To demonstrate the efficacy of SUS, we consider an SRAM column example designed in a 45 nm CMOS process, as shown in Fig. 12.2. In this example, our PoI is the read current I READ, which is defined as the difference between the bit-line currents I BL and \(I_{\text{BL}\_}\) (i.e., \(I_{\text{READ}} = I_{\text{BL}} - I_{\text{BL}\_}\)) when we start to read CELL <0>. If I READ is greater than a pre-defined specification, we consider the SRAM circuit as “PASS”. For process variation modeling, the local V TH mismatch of each transistor is considered as an independent Normal random variable. In total, we have 384 independent random variables (i.e., 64 bit-cells × 6 transistors per bit-cell = 384).

Fig. 12.2 The simplified schematic is shown for an SRAM column consisting of 64 bit-cells designed in a 45 nm CMOS process

We first run CMC with \(10^{9}\) random samples, and the estimated failure rate is \(1.1 \times 10^{-6}\), which is considered as the “golden” failure rate in this example. Next, we compare SUS with a traditional importance sampling technique, MNIS [17], where 2000 simulations are used to construct the distorted PDF. We repeatedly run MNIS and SUS 100 times each, with 6000 transistor-level simulations in each run. Figure 12.3 shows the 100 estimated 95% CIs for each method, where each blue bar represents the CI of a single run and the red line represents the “golden” failure rate.

Fig. 12.3 The 95% confidence intervals (blue bars) of the SRAM read current example are estimated from 100 repeated runs with 6000 transistor-level simulations in each run for: (a) MNIS and (b) SUS. The red line represents the “golden” failure rate

In this example, only one of the 100 CIs estimated by MNIS covers the “golden” failure rate, implying that MNIS fails to estimate the CIs accurately. This is an important limitation of MNIS and, more generally, of most importance sampling techniques, since the user cannot reliably know the actual “confidence” of the estimator in practice. For the SUS approach, however, 95 CIs out of 100 runs cover the “golden” failure rate. More importantly, the CIs estimated by SUS are relatively tight, which implies that SUS achieves substantially better accuracy than the traditional MNIS approach in this example.

Before ending this section, we would like to emphasize that, to define the subsets {F k;k = 1, 2, …, K} required by SUS, the PoI must be continuous. Realizing this limitation, we further describe a scaled-sigma sampling (SSS) approach to efficiently estimate the rare failure rates of discrete PoIs in a high-dimensional space, which will be presented in the next section.

3 Scaled-Sigma Sampling

Unlike the traditional importance sampling methods that must explicitly identify the high-probability failure region, SSS takes a completely different strategy to address the following questions: (1) how to efficiently draw random samples from the high-probability failure region, and (2) how to estimate the failure rate based on these random samples. In what follows, we will derive the mathematical formulation of SSS and highlight its novelties.

For the application of rare failure rate estimation, a failure event often occurs at the tail of the PDF f(x). Given (12.2), it implies that the failure region Ω is far away from the origin x = 0, as shown in Fig. 12.4(a). Since the failure rate is extremely small, the traditional CMC analysis cannot efficiently draw random samples from the failure region. Namely, many samples cannot reach the tail of f(x).

Fig. 12.4 The proposed SSS is illustrated by a 2-D example where the grey area Ω denotes the failure region and the circles represent the contour lines of the PDF. (a) Rare failure events occur at the tail of the original PDF f(x) and the failure region is far away from the origin x = 0. (b) The scaled PDF g(x) widely spreads over a large region and the scaled samples are likely to reach the far-away failure region

To address the aforementioned sampling issue, SSS applies a simple idea. Given f(x) in (12.2), we scale up the standard deviation of x by a scaling factor s (s > 1), yielding the following distribution:

$$\displaystyle \begin{aligned} g( \mathbf{x} ) = \prod_{m=1}^M \left[ \dfrac{\exp \left( - \left. x_m^2 \middle / 2s^2 \right. \right)}{\sqrt{2 \pi}s} \right] = \dfrac{\exp \left( - \left. \| \mathbf{x} \|{}_2^2 \middle / 2s^2 \right. \right)}{ \left( \sqrt{2 \pi} \cdot s\right)^M}. \end{aligned} $$
(12.43)

Once the standard deviation of x is increased by a factor of s, we conceptually increase the magnitude of process variations. Hence, the PDF g(x) widely spreads over a large region and the probability for a random sample to reach the far-away failure region increases, as shown in Fig. 12.4(b).

It is important to note that the mean of the scaled PDF g(x) remains 0, which is identical to the mean of the original PDF f(x). Hence, for a given sampling location x, the likelihood defined by the scaled PDF g(x) remains a monotonically decreasing function of the length ∥x∥2 of the vector x. Namely, a sampling location x is more (or less) likely to be reached if the distance between x and the origin 0 is smaller (or larger). It, in turn, implies that the high-probability failure region associated with the original PDF f(x) remains the high-probability failure region after the PDF is scaled to g(x), as shown in Fig. 12.4(a) and (b). Scaling the PDF from f(x) to g(x) does not change the location of the high-probability failure region; instead, it only makes the failure region easier to sample.
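Generating random samples from the scaled PDF g(x) in (12.43) is straightforward: one simply multiplies standard Normal samples by the scaling factor s. A one-line sketch:

```python
import numpy as np

def scaled_samples(M, N, s, rng):
    """Draw N samples from g(x) in (12.43), i.e., N(0, s^2 * I) in M dimensions."""
    return s * rng.standard_normal((N, M))
```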

Once the scaled random samples are drawn from g(x) in (12.43), we need to further estimate the failure rate P F defined in (12.3). To this end, one straightforward way is to apply the importance sampling method [3]. Such a simple approach, however, has been proved to be intractable when the dimensionality (i.e., M) of the variation space is high [21]. Namely, it does not fit the need of high-dimensional failure rate estimation in this chapter.

Instead of relying on the theory of importance sampling, SSS attempts to estimate the failure rate P F from a completely different avenue. We first take a look at the “scaled” failure rate P G corresponding to g(x):

$$\displaystyle \begin{aligned} P_G = \int_{\mathbf{x} \in \Omega} g(\mathbf{x}) \cdot d\mathbf{x} = \int_{-\infty}^{+\infty}I_{\Omega} (\mathbf{x}) \cdot g(\mathbf{x}) \cdot d\mathbf{x}, \end{aligned} $$
(12.44)

where I Ω(x) represents the indicator function:

$$\displaystyle \begin{aligned} I_{\Omega} (\mathbf{x}) = \left\{ \begin{array}{ll} 1 & \quad \mathbf{x} \in \Omega \\ 0 & \quad \mathbf{x} \notin \Omega \end{array} \right. . \end{aligned} $$
(12.45)

Our objective is to study the relation between the scaled failure rate P G in (12.44) and the original failure rate P F in (12.3). Towards this goal, we partition the M-dimensional variation space into a large number of identical hyper-rectangles, so that the scaled failure rate P G in (12.44) can be approximated as:

$$\displaystyle \begin{aligned} P_G \approx \sum_k I_{\Omega} \left[ {\mathbf{x}}^{(k)} \right] \cdot g \left[ {\mathbf{x}}^{(k)} \right] \cdot \Delta \mathbf{x}, \end{aligned} $$
(12.46)

where Δx denotes the volume of a hyper-rectangle. The approximation in (12.46) is accurate, if each hyper-rectangle is sufficiently small. Given the definition of I Ω(x) in (12.45), Eq. (12.46) can be re-written as:

$$\displaystyle \begin{aligned} P_G \approx \sum_{k \in \Omega} g\left[ {\mathbf{x}}^{(k)} \right] \cdot \Delta \mathbf{x}, \end{aligned} $$
(12.47)

where {k;k ∈ Ω} represents the set of all hyper-rectangles that fall into the failure region.

Substituting (12.43) into (12.47), we have

$$\displaystyle \begin{aligned} P_G \approx \dfrac{\Delta \mathbf{x}}{\left( \sqrt{2\pi} \cdot s\right)^M} \cdot \sum_{k \in \Omega} \exp \left[ - \dfrac{\left\|{\mathbf{x}}^{(k)} \right\|{}_2^2}{2s^2} \right]. \end{aligned} $$
(12.48)

Taking the logarithm on both sides of (12.48) yields:

$$\displaystyle \begin{aligned} \log P_G \approx \log \dfrac{\Delta \mathbf{x}}{\left( 2 \pi \right)^{\left. M \middle /2 \right. }} - M \cdot \log s + \underset{k \in \Omega}{\mathrm{lse}} \left[ - \dfrac{\left\|{\mathbf{x}}^{(k)} \right\|{}_2^2}{2s^2} \right], \end{aligned} $$
(12.49)

where

$$\displaystyle \begin{aligned} \underset{k \in \Omega}{\mathrm{lse}} \left[ \dfrac{- \left\|{\mathbf{x}}^{(k)} \right\|{}_2^2}{2s^2} \right] = \log \left\{ \sum_{k \in \Omega} \exp \left[ - \dfrac{\left\|{\mathbf{x}}^{(k)} \right\|{}_2^2}{2s^2} \right] \right\} \end{aligned} $$
(12.50)

stands for the log-sum-exp function. The function lse(•) in (12.50) is also known as the “soft maximum” in the mathematics literature [4]. It can be bounded by

$$\displaystyle \begin{aligned} \max_{k \in \Omega} \left[ - \dfrac{\Big\|{\mathbf{x}}^{(k)}\Big\|{}_2^2}{2s^2}\right] + \log \left(T\right) \geq \underset{k \in \Omega}{\mathrm{lse}} \left[ - \dfrac{\Big\|{\mathbf{x}}^{(k)}\Big\|{}_2^2}{2s^2} \right] \geq \max_{k \in \Omega} \left[ - \dfrac{\Big\|{\mathbf{x}}^{(k)}\Big\|{}_2^2}{2s^2}\right],\end{aligned} $$
(12.51)

where T denotes the total number of hyper-rectangles in Ω.

In general, there exist a number of (say, T 0) dominant hyper-rectangles that are much closer to the origin 0 than other hyper-rectangles in the set {x (k);k ∈ Ω}. Without loss of generality, we assume that the first T 0 hyper-rectangles {x (k);k = 1, 2, …, T 0} are dominant. Hence, we can approximate the function lse(•) in (12.50) as

$$\displaystyle \begin{aligned} \underset{k \in \Omega}{\mathrm{lse}} \left[ - \dfrac{\Big\|{\mathbf{x}}^{(k)}\Big\|{}_2^2}{2s^2} \right] \approx \log \left\{ \sum_{k =1}^{T_0} \exp \left[ - \dfrac{\left\|{\mathbf{x}}^{(k)} \right\|{}_2^2}{2s^2} \right] \right\}.\end{aligned} $$
(12.52)

We further assume that these dominant hyper-rectangles {x (k);k = 1, 2, …, T 0} have similar distances to the origin 0. Thus, Eq. (12.52) can be approximated by

$$\displaystyle \begin{aligned} \underset{k \in \Omega}{\mathrm{lse}} \left[ \dfrac{- \left\|{\mathbf{x}}^{(k)} \right\|{}_2^2}{2s^2} \right] \approx \max_{k \in \Omega} \left[ - \dfrac{\left\|{\mathbf{x}}^{(k)} \right\|{}_2^2}{2s^2} \right] + \log \left( T_0 \right) .\end{aligned} $$
(12.53)

Substituting (12.53) into (12.49) yields

$$\displaystyle \begin{aligned} \log P_G \approx \alpha + \beta \cdot \log s + \dfrac{\gamma}{s^2},\end{aligned} $$
(12.54)

where

$$\displaystyle \begin{aligned} \renewcommand\arraystretch{2} \begin{array}{rcl} \alpha & = & \log \dfrac{\varDelta \mathbf{x}}{\left( 2 \pi \right)^{\left. M \middle /2 \right. } } + \log\left( T_0 \right) \\ \beta & = & -M \\ \gamma & = & \displaystyle \max_{k \in \Omega} \left[ - \dfrac{\left\|{\mathbf{x}}^{(k)} \right\|{}_2^2}{2} \right] \end{array}. \renewcommand\arraystretch{1}\end{aligned} $$
(12.55)

Equation (12.54) reveals the important relation between the scaled failure rate P G and the scaling factor s. The approximation in (12.54) does not rely on any specific assumption of the failure region. It is valid, even if the failure region is non-convex or discontinuous.

While (12.55) gives the theoretical definitions of the model coefficients α, β and γ, finding their exact values is not trivial. For instance, the coefficient γ is determined by the hyper-rectangle that falls into the failure region Ω and is closest to the origin x = 0. In practice, without knowing the failure region Ω, we cannot directly find the value of γ. For this reason, we fit the analytical model in (12.54) by linear regression. Namely, we first estimate the scaled failure rates {P G,q;q = 1, 2, …, Q} by setting the scaling factor s to a number of different values {s q;q = 1, 2, …, Q}. As long as the scaling factors {s q;q = 1, 2, …, Q} are sufficiently large, the scaled failure rates {P G,q;q = 1, 2, …, Q} are large and can be accurately estimated with a small number of random samples. Next, the model coefficients α, β, and γ are fitted by linear regression based on the values of {(s q, P G,q);q = 1, 2, …, Q}. Once α, β, and γ are known, the original failure rate P F in (12.3) can be predicted by extrapolation. Namely, we substitute s = 1 into the analytical model in (12.54):

$$\displaystyle \begin{aligned} \log P_F^{\text{SSS}} = \alpha + \gamma,\end{aligned} $$
(12.56)

where \(P_F^{\text{SSS}}\) denotes the estimated value of P F by SSS. Applying the exponential function to both sides of (12.56), we have

$$\displaystyle \begin{aligned} P_F^{\text{SSS}} = \exp \left( \alpha + \gamma \right).\end{aligned} $$
(12.57)

To make the SSS method of practical utility, maximum likelihood estimation is applied to fit the model coefficients in (12.54). The MLE solution is obtained by solving an optimization problem and is considered statistically optimal for a given set of random samples.

Without loss of generality, we assume that N q scaled random samples {x (n);n = 1, 2, …, N q} are collected for the scaling factor s q. The scaled failure rate P G,q can be estimated by MC

$$\displaystyle \begin{aligned} P_{G,q}^{\text{MC}}=\dfrac{1}{N_q}\cdot\sum_{n=1}^{N_q}I_{\Omega}\left( {\mathbf{x}}^{(n)}\right),\end{aligned} $$
(12.58)

where I Ω(x) is the indicator function defined in (12.45). The variance of the estimator \(P_{G,q}^{\text{MC}}\) in (12.58) can be approximated as [16]

$$\displaystyle \begin{aligned} v_{G,q}^{\text{MC}}=P_{G,q}^{\text{MC}}\cdot \dfrac{1-P_{G,q}^{\text{MC}}}{N_q}.\end{aligned} $$
(12.59)

If the number of samples N q is sufficiently large, the estimator \(P_{G,q}^{\text{MC}}\) in (12.58) follows a Gaussian distribution according to CLT [16]

$$\displaystyle \begin{aligned} P_{G,q}^{\text{MC}} \sim \text{Gauss}\left(P_{G,q}, v_{G,q}^{\text{MC}}\right),\end{aligned} $$
(12.60)

where P G,q denotes the actual failure rate corresponding to the scaling factor s q.

Note that the model template in (12.54) is expressed for \(\log P_G\), instead of P G. To further derive the probability distribution for \(\log P_{G,q}^{\text{MC}}\), we adopt the first-order delta method from the statistics community [16]. Namely, we approximate the nonlinear function \(\log (\bullet )\) by the first-order Taylor expansion around the mean value \(P_{G,q}\) of the random variable \(P_{G,q}^{\text{MC}}\)

$$\displaystyle \begin{aligned} \log P_{G,q}^{\text{MC}} \approx \log P_{G,q} + \dfrac{P_{G,q}^{\text{MC}} - P_{G,q}}{P_{G,q}} \approx \log P_{G,q} + \dfrac{P_{G,q}^{\text{MC}} - P_{G,q}}{P_{G,q}^{\text{MC}}}. \end{aligned} $$
(12.61)

Based on the linear approximation in (12.61), \(\log P_{G,q}^{\text{MC}}\) follows the Gaussian distribution

$$\displaystyle \begin{aligned} \log P_{G,q}^{\text{MC}} \sim \text{Gauss}\left[ \log P_{G,q}, \ \dfrac{v_{G,q}^{\text{MC}}}{\left(P_{G,q}^{\text{MC}}\right)^2} \right]. \end{aligned} $$
(12.62)

Equation (12.62) is valid for all scaling factors {s q;q = 1, 2, …, Q}. In addition, since the scaled failure rates corresponding to different scaling factors are estimated by independent Monte Carlo simulations, the estimated failure rates \(\{ P_{G,q}^{\text{MC}}; q = 1, 2, \ldots , Q\}\) are mutually independent. Therefore, the Q-dimensional random variable

$$\displaystyle \begin{aligned} \log {\mathbf{P}}_G^{\text{MC}} = \left[ \begin{array}{cccc} \log P_{G,1}^{\text{MC}} \ \ &\ \log P_{G,2}^{\text{MC}} \ \ &\ \cdots \ \ &\ \log P_{G,Q}^{\text{MC}} \end{array} \right]^T \end{aligned} $$
(12.63)

satisfies the following jointly Gaussian distribution:

$$\displaystyle \begin{aligned} \log {\mathbf{P}}_G^{\text{MC}} \sim \text{Gauss}\left( \boldsymbol{\mu}_G, \boldsymbol{\Sigma}_G\right), \end{aligned} $$
(12.64)

where the mean vector μ G and the covariance matrix Σ G are equal to

$$\displaystyle \begin{aligned} \boldsymbol{\mu}_G= \left[ \begin{array}{cccc} \log P_{G,1} \ \ &\ \log P_{G,2} \ \ &\ \cdots \ \ &\ \log P_{G,Q} \end{array} \right]^T \end{aligned} $$
(12.65)
$$\displaystyle \begin{aligned} \boldsymbol{\Sigma}_G = \text{diag}\left[ \dfrac{v_{G,1}^{\text{MC}}}{\left( P_{G,1}^{\text{MC}}\right)^2}, \dfrac{v_{G,2}^{\text{MC}}}{\left( P_{G,2}^{\text{MC}}\right)^2}, \cdots, \dfrac{v_{G,Q}^{\text{MC}}}{\left( P_{G,Q}^{\text{MC}}\right)^2} \right], \end{aligned} $$
(12.66)

where diag(•) denotes a diagonal matrix.

The diagonal elements of the covariance matrix Σ G in (12.66) can be substantially different. In other words, the accuracy of \(\{ \log P_{G,q}^{\text{MC}}; q = 1, 2, \ldots , Q\}\) associated with different scaling factors {s q;q = 1, 2, …, Q} can be different, because the scaled failure rates {P G,q;q = 1, 2, …, Q} strongly depend on the scaling factors. In general, we can expect that if the scaling factor s q is small, the scaled failure rate P G,q is small and, hence, it is difficult to accurately estimate \(\log P_{G,q}\) from a small number of random samples. For this reason, instead of equally “trusting” the estimators \(\{ \log P_{G,q}^{\text{MC}}; q = 1, 2, \ldots , Q\}\), we must carefully model the “confidence” for each estimator \(\log P_{G,q}^{\text{MC}}\), as encoded by the covariance matrix Σ G in (12.66). Such “confidence” information will be fully exploited by the MLE framework to fit a statistically optimal model.

Since the scaled failure rates {P G,q;q = 1, 2, …, Q} follow the analytical model in (12.54), the mean vector μ G in (12.65) can be re-written as

$$\displaystyle \begin{aligned} \boldsymbol{\mu}_G = \alpha + \beta \cdot \left[ \begin{array}{c} \log s_1 \\ \log s_2 \\ \vdots \\ \log s_Q \end{array} \right] + \gamma \cdot \left[ \begin{array}{c} s_1^{-2} \\ s_2^{-2} \\ \vdots \\ s_Q^{-2} \end{array} \right] = \mathbf{A} \cdot \boldsymbol{\Theta},\end{aligned} $$
(12.67)

where

$$\displaystyle \begin{aligned} \mathbf{A} = \left[ \begin{array}{ccc} 1 \ \ \ & \log s_1 \ \ \ & s_1^{-2} \\ 1 \ \ \ & \log s_2 \ \ \ & s_2^{-2} \\ \vdots \ \ \ & \vdots \ \ \ & \vdots \\ 1 \ \ \ & \log s_Q \ \ \ & s_Q^{-2} \end{array} \right]\end{aligned} $$
(12.68)
$$\displaystyle \begin{aligned} \boldsymbol{\Theta} = \left[ \begin{array}{ccc} \alpha \ \ \ & \beta \ \ \ & \gamma \end{array} \right]^T.\end{aligned} $$
(12.69)

Equation (12.67) implies that the mean value of the Q-dimensional random variable \(\log {\mathbf {P}}_G^{\text{MC}}\) depends on the model coefficients α, β, and γ. Given \(\{P_{G,q}^{\text{MC}}; q = 1, 2, \ldots , Q\}\), the key idea of MLE is to find the optimal values of α, β, and γ so that the likelihood of observing \(\{P_{G,q}^{\text{MC}}; q = 1, 2, \ldots , Q\}\) is maximized.

Because the random variable \(\log {\mathbf {P}}_G^{\text{MC}}\) follows the jointly Gaussian distribution in (12.64), the likelihood associated with the estimated failure rates \(\{P_{G,q}^{\text{MC}}; q = 1, 2, \ldots , Q\}\) is proportional to

$$\displaystyle \begin{aligned} L \sim \exp \left[ -\dfrac{1}{2}\left( \log {\mathbf{P}}_G^{\text{MC}} - \boldsymbol{\mu}_G \right)^T \cdot \boldsymbol{\Sigma}_G^{-1} \cdot \left( \log{{\mathbf{P}}_G^{\text{MC}}} - \boldsymbol{\mu}_G \right)\right].\end{aligned} $$
(12.70)

Taking the logarithm for (12.70) yields

$$\displaystyle \begin{aligned} \log L \sim -\left( \log {\mathbf{P}}_G^{\text{MC}} - \boldsymbol{\mu}_G \right)^T \cdot \boldsymbol{\Sigma}_G^{-1} \cdot \left( \log{{\mathbf{P}}_G^{\text{MC}}} - \boldsymbol{\mu}_G \right).\end{aligned} $$
(12.71)

Substituting (12.67) into (12.71), we have

$$\displaystyle \begin{aligned} \log L \sim -\left( \log {\mathbf{P}}_G^{\text{MC}} - \mathbf{A}\cdot\boldsymbol{\Theta} \right)^T \cdot \boldsymbol{\Sigma}_G^{-1} \cdot \left( \log{{\mathbf{P}}_G^{\text{MC}}} - \mathbf{A}\cdot\boldsymbol{\Theta} \right). \end{aligned} $$
(12.72)

Note that the log-likelihood \(\log L\) in (12.72) depends on the model coefficients α, β, and γ, because the vector Θ is composed of these coefficients as shown in (12.69). Therefore, the MLE solution of α, β, and γ can be determined by maximizing the log-likelihood function

$$\displaystyle \begin{aligned} \max_{\boldsymbol{\Theta}} \qquad -\left( \log {\mathbf{P}}_G^{\text{MC}} - \mathbf{A}\cdot\boldsymbol{\Theta} \right)^T \cdot \boldsymbol{\Sigma}_G^{-1} \cdot \left( \log{{\mathbf{P}}_G^{\text{MC}}} - \mathbf{A}\cdot\boldsymbol{\Theta} \right). \end{aligned} $$
(12.73)

Since the covariance matrix Σ G is positive definite, the optimization in (12.73) is convex. In addition, since the log-likelihood \(\log L\) is simply a quadratic function of Θ, the unconstrained optimization in (12.73) can be directly solved by inspecting the first-order optimality condition [4]

$$\displaystyle \begin{aligned} \begin{aligned} &\dfrac{\partial}{\partial\boldsymbol{\Theta}} \left[ -\left( \log {\mathbf{P}}_G^{\text{MC}} - \mathbf{A}\cdot\boldsymbol{\Theta} \right)^T \cdot \boldsymbol{\Sigma}_G^{-1} \cdot \left( \log{{\mathbf{P}}_G^{\text{MC}}} - \mathbf{A}\cdot\boldsymbol{\Theta} \right)\right] \\ & \quad = 2\cdot {\mathbf{A}}^T \cdot \boldsymbol{\Sigma}_G^{-1} \cdot \left( \log {\mathbf{P}}_G^{\text{MC}} - \mathbf{A}\cdot\boldsymbol{\Theta}\right) = \mathbf{0} \end{aligned}. \end{aligned} $$
(12.74)

Based on the linear equation in (12.74), the optimal value of Θ can be determined by

$$\displaystyle \begin{aligned} \boldsymbol{\Theta} = \left( {\mathbf{A}}^T\cdot \boldsymbol{\Sigma}_G^{-1}\cdot \mathbf{A} \right)^{-1} \cdot {\mathbf{A}}^T \cdot \boldsymbol{\Sigma}_G^{-1} \cdot \log {\mathbf{P}}_G^{\text{MC}}. \end{aligned} $$
(12.75)

Studying (12.75) reveals an important fact that the estimators \(\{\log P_{G,q}^{\text{MC}}; q = 1, 2, \ldots , Q\}\) are weighted by the inverse of the covariance matrix Σ G. Namely, if the variance of the estimator \(\log P_{G,q}^{\text{MC}}\) is large, \(\log P_{G,q}^{\text{MC}}\) becomes non-critical when determining the optimal values of α, β, and γ. In other words, the MLE framework has optimally weighted the importance of \(\{\log P_{G,q}^{\text{MC}}; q = 1, 2, \ldots , Q\}\) based on the “confidence” level of these estimators. Once α, β, and γ are solved by MLE, the original failure rate P F can be estimated by (12.57).
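The estimation flow of (12.58)–(12.62), (12.67)–(12.69), (12.75), and (12.57) is sketched below. The scaling factors, sample counts, and the failure test `is_failure` are placeholders, and the sketch assumes that every estimated \(P_{G,q}^{\text{MC}}\) lies strictly between 0 and 1 so that its logarithm and the variance in (12.59) are well defined:

```python
import numpy as np

def sss_estimate(s_list, N_list, M, is_failure, rng):
    """Scaled-sigma sampling: fit log P_G = a + b*log(s) + c/s^2 by MLE, extrapolate to s = 1."""
    P_mc, var_log = [], []
    for s, N in zip(s_list, N_list):
        X = s * rng.standard_normal((N, M))            # samples from g(x), cf. (12.43)
        P = np.mean([is_failure(x) for x in X])        # P_{G,q}^MC, cf. (12.58)
        P_mc.append(P)
        var_log.append(P * (1.0 - P) / N / P**2)       # (12.59) and the delta method (12.62)
    s_arr = np.asarray(s_list, dtype=float)
    A = np.column_stack([np.ones_like(s_arr), np.log(s_arr), s_arr**-2])   # (12.68)
    W = np.diag(1.0 / np.asarray(var_log))             # Sigma_G^{-1}, cf. (12.66)
    y = np.log(P_mc)
    Theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)  # weighted least squares = MLE, cf. (12.75)
    alpha, beta, gamma = Theta
    return np.exp(alpha + gamma), Theta                # P_F^SSS, cf. (12.57)
```

For instance, with a handful of moderately large scaling factors (e.g., s_list = [2.0, 2.5, 3.0, 3.5], chosen here only for illustration) and a few hundred samples per factor, the fitted model is extrapolated back to s = 1 to obtain \(P_F^{\text{SSS}}\).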

To apply MLE, we need a set of pre-selected scaling factors {s q;q = 1, 2, …, Q}. In practice, appropriately choosing these scaling factors is a critical task due to several reasons. First, if these scaling factors are too large, the estimator \(P_F^{\text{SSS}}\) based on extrapolation in (12.57) would not be accurate, since the extrapolation point s = 1 is far away from the selected scaling factors. Second, if the scaling factors are too small, the scaled failure rates {P G,q;q = 1, 2, …, Q} are extremely small and they cannot be accurately estimated from a small number of scaled random samples. Third, the failure rates for different performances and/or specifications can be quite different. To estimate them both accurately and efficiently, we should choose small scaling factors for the performance metrics with large failure rates, but large scaling factors for the performance metrics with small failure rates. Hence, finding an appropriate set of scaling factors for all performances and/or specifications can be extremely challenging.

In this chapter, a number of evenly distributed scaling factors covering a relatively large range are empirically selected. For the performance metrics with large failure rates, the scaled failure rates corresponding to a number of small scaling factors can be used to fit the model template in (12.54). On the other hand, the scaled failure rates corresponding to a number of large scaling factors can be used for the performance metrics with small failure rates. As such, a broad range of performances and/or specifications can be accurately analyzed by the SSS method.

While the MLE algorithm is able to optimally estimate the model coefficients α, β, and γ and then predict the failure rate P F, it remains an open question how we can quantitatively assess the accuracy of our SSS method. Since SSS is based upon Monte Carlo simulation, a natural way for accuracy assessment is to calculate the confidence interval of the estimator \(P_F^{\text{SSS}}\). However, unlike the traditional estimator where a statistical metric is estimated by the average of multiple random samples and, hence, the confidence interval can be derived as a closed-form expression, our proposed estimator \(P_F^{\text{SSS}}\) is calculated by linear regression with nonlinear exponential/logarithmic transformation. Accurately estimating the confidence interval of \(P_F^{\text{SSS}}\) is not a trivial task.

To address the aforementioned challenge, a bootstrapping based technique [8] is developed to accurately estimate the CI of the SSS estimator. The key idea of bootstrapping is to re-generate a large number of random samples based on a statistical model without running additional transistor-level simulations. These random samples are then used to repeatedly calculate the value of \(P_F^{\text{SSS}}\) in (12.57). Based on these repeated runs, the statistics (hence, the confidence interval) of the estimator \(P_F^{\text{SSS}}\) can be accurately estimated.

In particular, we start from the estimated failure rates \(\{P_{G,q}^{\text{MC}}; q = 1, 2, \ldots , Q\}\). Each estimator \(P_{G,q}^{\text{MC}}\) follows the Gaussian distribution in (12.60). The actual mean P G,q in (12.60) is unknown; however, we can approximate its value by the estimated failure rate \(P_{G,q}^{\text{MC}}\). Once we know the statistical distribution of \(P_{G,q}^{\text{MC}}\), we can re-sample its distribution and generate N RS sampled values \(\{P_{G,q}^{MC(n)}; n = 1, 2, \ldots , N_{\text{RS}}\}\). This re-sampling idea is applied to all scaling factors {s q;q = 1, 2, …, Q}, thereby resulting in a large data set \(\{P_{G,q}^{MC(n)}; q = 1, 2, \ldots , Q; n = 1, 2, \ldots , N_{\text{RS}}\}\). Next, we repeatedly run SSS for N RS times and get N RS different failure rates \(\{P_F^{SSS(n)}; n = 1, 2, \ldots , N_{\text{RS}}\}\). The confidence interval of \(P_F^{\text{SSS}}\) can then be estimated from the statistics of these failure rate values.
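The bootstrap procedure described above can be sketched as follows. It reuses the estimated failure rates and their variances, keeps the MLE weights fixed across replicates for simplicity, and clips non-positive resampled values for numerical robustness; these are implementation choices of the sketch, not part of the original method:

```python
import numpy as np

def sss_bootstrap_ci(s_list, P_mc, v_mc, n_rs=1000, level=0.95, rng=None):
    """Bootstrap CI of P_F^SSS from the per-factor estimates P_{G,q}^MC and variances v_{G,q}^MC."""
    rng = rng if rng is not None else np.random.default_rng()
    s_arr = np.asarray(s_list, dtype=float)
    A = np.column_stack([np.ones_like(s_arr), np.log(s_arr), s_arr**-2])
    P_mc, v_mc = np.asarray(P_mc, dtype=float), np.asarray(v_mc, dtype=float)
    W = np.diag(P_mc**2 / v_mc)                        # Sigma_G^{-1} from the original run
    estimates = []
    for _ in range(n_rs):
        # Re-sample each P_{G,q}^MC from its Gaussian approximation (12.60); no new simulations.
        P_rs = np.clip(rng.normal(P_mc, np.sqrt(v_mc)), 1e-12, 1.0)
        Theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ np.log(P_rs))
        estimates.append(np.exp(Theta[0] + Theta[2]))  # P_F^SSS of this replicate, cf. (12.57)
    lo, hi = np.quantile(estimates, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi
```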

Note that to apply SSS, we only need a set of scaling factors and their corresponding scaled failure rates: {(s q, P G,q);q = 1, 2, …, Q}. As long as {s q;q = 1, 2, …, Q} are sufficiently large, {P G,q;q = 1, 2, …, Q} are not small probability values and, therefore, can be efficiently estimated by CMC. When applying CMC, we only need to determine whether the random samples belong to the failure region. Namely, the PoI does not have to be continuous. For this reason, SSS can be applied to estimate the rare failure rates of both continuous and discrete PoIs. However, since SUS exploits additional information from the continuous performance values, SUS is often preferred over SSS when we handle continuous PoIs.

To demonstrate the efficacy of SSS, we consider an SRAM column consisting of 64 bit-cells and a sense amplifier (SA) designed in a 45 nm CMOS process. Figure 12.5 shows the simplified circuit schematic of this SRAM column example. Similar to the SRAM read current example shown in Fig. 12.2, we consider the local V TH mismatch of each transistor as an independent Normal random variable. In total, we have 384 independent random variables. In this example, the output of the SA is considered as the PoI. If the output is correct, we consider the circuit as “PASS”. Hence, the PoI is binary, and we cannot apply SUS in this example. For comparison purposes, we run MNIS [17] and SSS 100 times each, with 6000 transistor-level simulations in each run. As shown in Fig. 12.6, there are 3 and 97 CIs out of 100 runs that cover the “golden” failure rate for MNIS and SSS, respectively. Here, the “golden” failure rate is estimated by CMC with \(10^{9}\) random samples. MNIS, again, fails to accurately estimate the corresponding CIs. SSS, however, successfully estimates the CIs. These results demonstrate that SSS is superior to the traditional MNIS method in this SRAM example, where the dimensionality of the variation space is more than a few hundred.

Fig. 12.5 The simplified schematic is shown for an SRAM column consisting of 64 bit-cells and a sense amplifier (SA) designed in a 45 nm CMOS process

Fig. 12.6 The 95% confidence intervals (blue bars) of the SRAM example are estimated from 100 repeated runs with 6000 transistor-level simulations in each run for: (a) MNIS and (b) SSS. The red line represents the “golden” failure rate

4 Conclusions

Rare failure event analysis in a high-dimensional variation space has attracted more and more attention due to aggressive technology scaling. To address this technical challenge, we summarize two novel approaches: SUS and SSS. Several SRAM examples are used to demonstrate the efficacy of SUS and SSS; more experimental results for both methods can be found in the recent publications [19, 21]. Both SUS and SSS are built upon a solid mathematical foundation and do not impose any specific assumption on the failure region. Hence, they can be generally applied to estimate the rare failure rates of a broad range of other circuits, e.g., D flip-flops (DFFs).