1 Introduction

Variable screening is useful in the design optimization process because it can select the essential design variables for accurate surrogate models and effective design optimization. In the formulation of a design optimization problem, a set of design variables that describe the system needs to be identified (Arora 2004). Design variables are selected to be as independent of each other as possible in the design space. The number of independent design variables is known as the degrees of freedom, and this is the dimensionality of the optimization problem. To obtain an appropriate optimum design, a minimum number of design variables is required. For this reason, it is better to identify as many design variables as possible and then fix some of them at certain values according to the variable screening result. The variable screening method can play a key role, especially in reliability-based design optimization (RBDO), because the RBDO process requires a larger number of analyses than the deterministic design optimization (DDO) process due to reliability analyses and the design sensitivities of probabilities of failure. To this end, surrogate models are usually used to reduce the number of analyses required in RBDO. Various surrogate model methods, such as the radial basis function (RBF), polynomial response surface (PRS), support vector regression (SVR), Kriging, and dynamic Kriging (DKG) methods, have been developed (Cressie 1991; Barton 1994; Jin et al. 2001; Simpson et al. 2001; Queipo et al. 2005; Wang and Shan 2007; Forrester et al. 2008; Forrester and Keane 2009; Zhao et al. 2011). However, even for surrogate models, the number of design variables is a critical factor because surrogate model generation is difficult for high-dimensional problems due to the curse of dimensionality.

Variable screening methods have been developed in various disciplines. In statistics, important variables were found to create an accurate surrogate model of a computer simulation using the maximum likelihood estimator (MLE) of the correlation parameters of a Gaussian process for a deterministic problem (Welch et al. 1992). Using a regression model, essential variables among candidate variables were efficiently identified based on data (Duarte Silva 2001; Wang 2009). In statistical learning theory in particular, various feature selection methods have been developed to choose a reduced number of input variables that represent an output effectively (Guyon and Elisseeff 2003). In addition, methods such as manifold learning have been used to preserve input information in a reduced dimension for efficient statistical analysis (Izenman 2008). In physics, a variable screening model was developed for the quasi-molecular treatment of ion-atom collisions (Eichler and Wille 1975). In engineering, a confidence interval on the coefficients of a linear surrogate model was proposed to detect key variables for a car crash DDO problem (Craig et al. 2005). A sampling-based sensitivity measure using a small amount of data was introduced to rank the importance of variables and was applied to the long-term performance of a geologic repository for high-level radioactive waste (Wu and Mohanty 2006). Moreover, design sensitivity methods can be extended to variable screening because vital variables have larger design sensitivities. In a deterministic problem, the design sensitivity, which is the rate of change of the performance measure at the design point, can be obtained using various methods (Choi and Kim 2005a, b) and is called local sensitivity analysis (LSA) (Reedijk 2000; Chen et al. 2005). For a reliability problem, the variability of the input random variables should be incorporated to assess the design sensitivity of a probabilistic constraint. The design sensitivity of the probabilistic constraint using the first-order reliability method (FORM) (Haldar and Mahadevan 2000; Ditlevsen and Madsen 1996; Hou 2004), the dimension reduction method (DRM) (Rahman and Wei 2008; Lee et al. 2010), or sampling-based stochastic sensitivity analysis (Lee et al. 2011a, b) could be used to identify important design variables. In addition, global sensitivity analysis (GSA), such as the correlation ratio (McKay et al. 1999), global sensitivity indices (Sobol 2001), and analytical GSA methods (Chen et al. 2005), can be used for variable screening as well (Mack et al. 2007).

However, previous works have limitations when directly applied to RBDO with surrogate models. If a method depends entirely on existing data (Duarte Silva 2001; Wang 2009; Guyon and Elisseeff 2003; Izenman 2008), it may not be possible to carry out RBDO because the design variables change during the optimization process. Selecting, from data, the input variables that best represent an output from among many possibly irrelevant variables (Guyon and Elisseeff 2003) is not the issue from the RBDO perspective: the relevant input variables and their relationship to the output are already known through computer-aided engineering (CAE) analyses such as the finite element method (FEM) or computational fluid dynamics (CFD). A method that uses CAE to find the variables that significantly affect output reliability is of more interest. Likewise, capturing the input information in a reduced set of variables (Izenman 2008) is not an issue for RBDO; how much the output uncertainty is affected by the input variables is the main interest. A method developed for a specific problem (Eichler and Wille 1975) is inadequate for broad applications. Variable screening and design sensitivity methods for deterministic problems (Welch et al. 1992; Craig et al. 2005; Choi and Kim 2005a, b) may not be applicable to RBDO because input randomness is not considered. Methods that require a very large number of analyses (McKay et al. 1999; Sobol 2001) could be ineffective for RBDO of computationally demanding problems and could become unstable when a sufficient number of analyses is not provided (Wu and Mohanty 2006). The design sensitivity of the probabilistic constraint using FORM or DRM (Haldar and Mahadevan 2000; Ditlevsen and Madsen 1996; Hou 2004; Rahman and Wei 2008; Lee et al. 2010) requires searching for the most probable point (MPP), which may be very difficult to obtain for a high-dimensional problem. If a method is developed based on the assumption that accurate full-dimensional surrogate models are available a priori (Chen et al. 2005; Lee et al. 2011a, b), then variable screening is hardly necessary: RBDO could be carried out directly using those surrogate models because they provide accurate responses and sensitivities, unless the optimization algorithm limits the number of design variables, which is not common. From the previous works, key desirable properties of a variable screening method for RBDO with a surrogate model were found: it should (1) be efficient, (2) consider input randomness, (3) not require a full-dimensional surrogate model, and (4) be applicable to a broad range of problems.

Therefore, the objective of this paper is to develop a variable screening method that satisfies the above desirable properties. The reliability analysis in RBDO captures the output variability induced by the input variability and the sensitivity of the performance function. A variable that induces larger output variability is more important in the RBDO process. In this paper, the partial output variance, which is the output variance when one random variable has variability while the others are fixed at their means, is used to find important design variables (Bae 2012). The partial output variance is simple to calculate and requires only a 1-D surrogate model for each design variable. The method introduced in this paper has strengths and weaknesses. Its main strength is its efficiency and practical applicability. Its weakness is accuracy; the interactions between the random variables are not fully captured. However, practical applicability is the focus of this paper because it is very important for large-scale problems. In the following sections, the proposed method is explained in detail, and its strengths and weaknesses are fully discussed. To demonstrate the effectiveness of the proposed method, analytical examples and a large-scale industrial problem are used.

2 Variable screening

As explained in the introduction, variable screening means finding the important variables among all random variables. Here, the word “important” can have different meanings depending on the problem being dealt with. In the following two sections, the difference between variable screening for DDO and for RBDO is explained. Based on this difference, the required properties of variable screening for RBDO are introduced.

2.1 Variable screening for DDO

A DDO problem can be formulated as

$$\begin{aligned} \text{minimize} &\quad \text{cost}\left(\mathbf{d}\right) \\ \text{subject to} &\quad G_{j}\left(\mathbf{d}\right)\le 0, \quad j=1,\ldots,NC \\ &\quad \mathbf{d}^{L}\le \mathbf{d}\le \mathbf{d}^{U}, \quad \mathbf{d}\in \mathbb{R}^{NDV} \end{aligned}$$
(1)

where $\mathbf{d}$, $G_{j}$, $NC$, and $NDV$ are the design variable vector, the $j$th constraint function, the number of constraints, and the number of design variables, respectively.

As stated before, in the DDO problem, the input design variables do not have uncertainty, and thus the design sensitivity can be used as a barometer to determine the importance ranking of design variables with respect to the performance measure. The question is: “Where should the importance ranking of design variables be determined?” or “Where should the design sensitivity be calculated?”

The LSA calculates the design sensitivity at a given design point (Reedijk 2000; Chen et al. 2005). Usually, LSA is used to provide the direction of design movement in the optimization process. For variable screening, LSA can provide the importance ranking of the design variables at the current design point. However, the importance ranking at the given design point could differ from the ranking at other design points if the performance measure is a nonlinear function of the design variables. On the other hand, GSA calculates an overall design sensitivity over the entire design domain; it is like a design sensitivity averaged over the design domain. Because it is an average, the importance ranking from GSA could be misleading at specific points or even in certain regions. Hence, both LSA and GSA have advantages and disadvantages for variable screening (Reedijk 2000).

2.2 Variable screening for RBDO

A general RBDO problem can be formulated as

$$\begin{aligned} \text{minimize} &\quad \text{cost}\left(\mathbf{d}\right) \\ \text{subject to} &\quad P_{F_{j}} = P\left[G_{j}\left(\mathbf{X}\right)>0\right]\le P_{F_{j}}^{Tar}, \quad j=1,\ldots,NC \\ &\quad \mathbf{d}^{L}\le \mathbf{d}\le \mathbf{d}^{U}, \quad \mathbf{d}\in \mathbb{R}^{NDV}, \quad \mathbf{X}\in \mathbb{R}^{NRV} \end{aligned}$$
(2)

where $\mathbf{d}$, $G_{j}$, $P_{F_{j}}^{Tar}$, $NC$, $NDV$, and $NRV$ are the design variable vector, the $j$th constraint function, the $j$th target probability of failure, the number of constraints, the number of design variables, and the number of random variables, respectively.

In the RBDO process, the design variable vector $\mathbf{d}$ is the mean vector of the corresponding random vector $\mathbf{X}$. Though the design variable $\mathbf{d}$ is deterministic, the design sensitivity for RBDO should consider the randomness of $\mathbf{X}$ because the constraints are based on the probabilistic performance measure $P[G_{j}(\mathbf{X})>0]$, as shown in (2). Therefore, the design sensitivity of the performance measure alone cannot be used as a barometer. The design sensitivity of the probabilistic performance measure can be obtained using several methods, such as FORM (Haldar and Mahadevan 2000; Ditlevsen and Madsen 1996; Hou 2004), DRM (Rahman and Wei 2008; Lee et al. 2010), and sampling-based stochastic sensitivity analysis (Lee et al. 2011a, b), and it can be used for variable screening. The design sensitivities obtained by those methods are LSA because they provide different sensitivities at different designs. The GSA method is also applicable for variable screening in RBDO problems, as it is in DDO problems. Again, both LSA and GSA methods have advantages and disadvantages.

Random parameters do not increase the dimensionality of the optimization problem because they are not random design variables. However, the surrogate model must still include the random parameters because they affect the output distribution. The main objective of this paper is to select important design variables so that accurate surrogate models can be generated and, at the same time, an appropriate optimum design (i.e., not a suboptimum) can be obtained in the RBDO process. Hence, once variable screening is done, the screened-out random design variables need to be fixed at deterministic values, not treated as random parameters. However, fixing a random variable as a deterministic variable reduces the total output variability.

Consider a simple example:

$$\begin{aligned} X_{i} &\sim N\left(5,3^{2}\right), \quad i=1,2,\ldots,10 \\ Y &= \sum\limits_{i=1}^{10} X_{i} \sim N\left(50,\left(3\sqrt{10}\right)^{2}\right) \end{aligned}$$
(3)

If the probabilistic performance measure is P[Y>60], then the reliability analysis result is

$$ P\left[Y>60\right]=1-\Phi\left(\frac{60-50}{3\sqrt{10}}\right)=0.1459 $$
(4)

However, if one dimension is reduced by screening out $X_{10}$ and fixing it at $X_{10}=\mu_{10}=5$ while the other variables remain random, then the output becomes

$$ \widetilde{Y}=\sum\limits_{i=1}^{9} X_{i} + 5 \sim N\left(50,9^{2}\right) $$
(5)

As a consequence, the reliability analysis result yields

$$ P\left[\widetilde{Y}>60\right]=1-\Phi\left(\frac{60-50}{9}\right)=0.1333 $$
(6)

From (4) and (6), the probability of failure decreases by 0.0126 (1.26 %) when one variable is screened out. A more fundamental problem is that this lost amount of 1.26 % cannot be estimated without the full-dimensional reliability analysis result of (4). On the other hand, assume instead that $X_{10}$ has a smaller variance of one. Then the full-dimensional reliability analysis yields

$$ P\left[Y>60\right]=1-\Phi\left(\frac{60-50}{\sqrt{82}}\right)=0.1347 $$
(7)

From (6) and (7), the difference is 0.0014 (0.14 %), which could be acceptable. Therefore, in this case, $X_{10}$ could be fixed at its mean value. As shown in the example, the output variability decreases whenever a random variable is fixed at a deterministic value, but some variables affect the output variability only by a small amount. The variable screening method for effective surrogate models for RBDO aims to find those variables that have small effects on the output variability. It is noted that random parameters are treated in the same way as random design variables in this paper. Even though the random parameters do not change during the RBDO process, they influence the output variability. Hence they should be considered in the variable screening process so that the reliability of the performance measure can be accurately approximated using the reduced dimension.
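These reliability values are straightforward to reproduce numerically. A minimal sketch using SciPy (an illustration, not part of the original study):

```python
# Numerical check of (4), (6), and (7) with the standard normal CDF.
from scipy.stats import norm
import numpy as np

# Full-dimensional case, eq. (4): Y ~ N(50, (3*sqrt(10))^2)
print(1 - norm.cdf((60 - 50) / (3 * np.sqrt(10))))  # 0.1459

# X10 fixed at its mean of 5, eq. (6): Y~ ~ N(50, 9^2)
print(1 - norm.cdf((60 - 50) / 9))                  # 0.1333

# X10 kept random with unit variance, eq. (7): Var(Y) = 9*9 + 1 = 82
print(1 - norm.cdf((60 - 50) / np.sqrt(82)))        # 0.1347
```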

3 Variable screening with 1-D surrogate model

The probability of failure cannot be determined solely from the output variability. To obtain an accurate probability of failure, the output distribution is needed; that is, all statistical information about the output is required. However, even when the input distribution is known, it is very difficult to obtain the complete output distribution since the performance measure could be implicit, nonlinear, or both. For example, for a given normal input distribution, the output distribution could be bimodal as well as asymmetric. Consequently, it is impractical to select a reduced number of input variables based solely on the probability of failure. As discussed in the previous sections, a screened-out variable is fixed at its mean value, so the change of the output mean is minimized. As a result, the output variability becomes the measure that determines the probability of failure. Of course, other statistical moments or parameters, such as skewness and kurtosis, could affect the probability of failure. However, none of these statistical moments can serve as a measure by itself. For example, a variable that induces larger (or smaller) output skewness may not be an important variable by itself, but it could be important when it induces both larger (or smaller) output skewness and very similar output variability. A combination of the moments could be considered as a measure, but there are too many possible combinations to consider. Hence, under the assumption that the output mean remains similar, the output variability is chosen in this paper as the measure for selecting vital variables for RBDO.

The output variability can be quantified by the output variance, as shown in the previous section. The exact output variance of a nonlinear implicit performance measure is very difficult to obtain; hence, an approximated output variance is used in this paper. In the following sections, the output variance is decomposed into partial output variances, which are the output variances when each input variable is random and the others are fixed at their mean values. Then, a method to find the design variables that have a large impact on the output variance is developed using hypothesis testing.

3.1 Approximated output variance

A univariate DRM is a well-known approximation method for statistical moments using multiple 1-D integrations (Rahman and Xu 2004). Consider a performance measure $Y$ and its realization $y$ subject to the input random vector $\mathbf{X}=\{X_{1},\ldots,X_{N}\}^{T}$:

$$ Y\left({\mathbf{X}} \right)=Y\left({X_{1} ,\ldots ,X_{N} } \right),\;\;y\left({\mathbf{x}} \right)=y\left({x_{1} ,\ldots ,x_{N} } \right) $$
(8)

Define a function $Y_{i}$, which is the performance measure when $X_{i}$ is random and the other variables are fixed at their mean values, as

$$ Y_{i} =Y\left({\mu_{1} ,\ldots ,\mu_{i-1} ,X_{i} ,\mu_{i+1} ,\ldots ,\mu_{N} } \right) $$
(9)

The realization of the performance measure at the input mean point $\boldsymbol{\mu}_{\mathbf{X}}$ is defined as

$$ y_{0} = y \left(\boldsymbol{\mu}_{\mathbf{X}}\right) $$
(10)

The $l$th statistical moment of $Y$, which is approximated using the univariate DRM, is defined as (Rahman and Xu 2004)

$$ m_{l} \cong E\left[ {\left\{ {\sum\limits_{i} {Y_{i} -\left({N-1} \right)y_{0} } } \right\}^{l}} \right] $$
(11)

Then, the output variance $\sigma_{Y}^{2}$ can be approximated as

$$ \sigma_{Y}^{2} \cong m_{2}-m_{1}^{2}=\sum\limits_{i} \sigma_{Y_{i}}^{2} + 2\sum\limits_{i>j} \rho_{Y_{i}Y_{j}}\sigma_{Y_{i}}\sigma_{Y_{j}} $$
(12)

where $\sigma_{Y_{i}}^{2}$ is the variance of (9), i.e., the partial output variance when only $X_{i}$ is random, and $\rho_{Y_{i}Y_{j}}$ is the correlation coefficient between $Y_{i}$ and $Y_{j}$. As shown in (12), the partial output variances $\sigma_{Y_{i}}^{2}$ are the main terms in the approximation of the output variance $\sigma_{Y}^{2}$. When $\sigma_{Y_{i}}^{2}$ is larger than the other partial output variances, it takes the largest portion of the output variance $\sigma_{Y}^{2}$. Therefore, if some $X_{i}$ produces a larger partial output variance than the others, then $X_{i}$ should be selected as an important variable. It is noted that the calculation of $\sigma_{Y_{i}}^{2}$ requires only 1-D integration, and thus only 1-D surrogate models are required.
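For independent inputs the correlation term in (12) vanishes, and the output variance reduces to the sum of the partial output variances. A minimal sketch checking this against Monte Carlo simulation, using an assumed additive test function (for which the univariate approximation is exact):

```python
# Univariate-DRM variance approximation (12) for independent inputs:
# sum of partial output variances vs. full Monte Carlo variance.
# The test function, means, and standard deviations are assumptions.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0, 0.5])
sigma = np.array([0.3, 0.2, 0.4])

def y(x):
    # Additive performance measure (assumed), so (12) is exact here
    return x[..., 0] ** 2 + np.sin(x[..., 1]) + 0.5 * x[..., 2] ** 3

# Partial output variances: one variable random at a time, others at mean
partial = []
for i in range(3):
    xs = np.tile(mu, (200_000, 1))
    xs[:, i] = rng.normal(mu[i], sigma[i], size=200_000)
    partial.append(y(xs).var(ddof=1))

# Reference: full Monte Carlo variance with all inputs random
x_all = rng.normal(mu, sigma, size=(200_000, 3))
print(sum(partial), y(x_all).var(ddof=1))  # the two values should agree
```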

Statistical correlation between $X_{i}$ and $X_{j}$ yields the term $\rho_{Y_{i}Y_{j}}\sigma_{Y_{i}}\sigma_{Y_{j}}$ in (12) and affects the output variance. When $X_{i}$ and $X_{j}$ are strongly correlated, one could be replaced by the other. To calculate the term $\rho_{Y_{i}Y_{j}}\sigma_{Y_{i}}\sigma_{Y_{j}}$, a two-dimensional surrogate model is required. If there are only a few correlated pairs, calculating the correlation term could be affordable. However, from a practical point of view, the partial output variance $\sigma_{Y_{i}}^{2}$ is the focus in this paper. Since we are looking for important variables, not the value of the output variance $\sigma_{Y}^{2}$ itself, the partial output variance is sufficient for variable screening. In Fig. 1, contours of independent, positively correlated (ρ=0.8), and negatively correlated (ρ=−0.8) probability density functions are shown. Correlation determines how the random variables are distributed inside the box (dotted line), whereas the size of the box is determined by the variances of $X_{1}$ and $X_{2}$. It can be seen that the primary effect on the output variance is the box size, and the distribution inside the box is secondary. Consequently, to perform variable screening efficiently, the first thing to consider is the box size, not the distribution of the random variables inside the box. Hence, the correlation term is not considered in this paper, for efficiency and practicality. It is noted that the statistical correlation between $X_{i}$ and $X_{j}$ will be considered in the reduced-dimensional RBDO if both variables are selected.

Fig. 1 Effect of variance and correlation of input random variables

The partial output variance $\sigma_{Y_{i}}^{2}$ is like LSA because it can have different values at different input mean points $\boldsymbol{\mu}_{\mathbf{X}}$, which is the current design point in the RBDO process. Hence, the variable screening result could change as the design point changes. There are several recommended points at which to perform variable screening using LSA. The first is the DDO optimum. As the DDO optimum is usually close to the RBDO optimum, the variable screening result at the DDO optimum is likely to be similar to the result at the RBDO optimum. A design point where most of the deterministic constraints are active can also be a good candidate. It is noted that the DDO optimum, or a design point where the constraints are active, could be obtained using the finite difference method in a practical engineering problem. Also, DDO could be carried out using the sensitivity obtained from a 1-D surrogate model because DDO requires only the deterministic LSA, which is 1-D.

3.2 Variable screening using hypothesis testing

Using the 1-D surrogate model, the partial output variance $\sigma_{Y_{i}}^{2}$ can be calculated approximately as

$$ s_{Y_{i}}^{2} =\frac{1}{ns-1}\sum\limits_{j=1}^{ns} \left\{ y_{i}\left(\mu_{1},\ldots,x_{i}^{(j)},\ldots,\mu_{N}\right)-\overline{y}_{i} \right\}^{2} $$
(13)

where \(x_{i}^{\left (j \right )} \) is the jth realization of the input random variable X i , ns is the number of samples, and \(\overline y_{i} \) is the mean of y i as,

$$ \overline{y}_{i} =\frac{1}{ns}\sum\limits_{j=1}^{ns} y_{i}\left(\mu_{1},\ldots,x_{i}^{(j)},\ldots,\mu_{N}\right) $$
(14)
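In code, (13)-(14) amount to pushing $ns$ realizations of $X_{i}$ through the 1-D surrogate and taking the unbiased sample variance. A minimal sketch (the surrogate, the sampler, and the parameter values are illustrative assumptions):

```python
# Sample partial output variance (13)-(14): evaluate ns realizations of
# X_i on the 1-D surrogate y_i (other inputs frozen at their means) and
# take the unbiased sample variance.
import numpy as np

def partial_output_variance(y_i, sample_x_i, ns, rng):
    """s_{Y_i}^2 from ns realizations of X_i evaluated on surrogate y_i."""
    x = sample_x_i(ns, rng)       # ns realizations of X_i
    return y_i(x).var(ddof=1)     # (13)-(14): unbiased sample variance

# Usage with an assumed quadratic 1-D surrogate and a normal input
rng = np.random.default_rng(1)
y_i = np.poly1d([0.8, -0.2, 1.5])                 # assumed quadratic fit
s2 = partial_output_variance(
    y_i, lambda n, r: r.normal(5.0, 3.0, n), ns=1000, rng=rng)
print(s2)
```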

As explained in the previous sections, the partial output variance $s_{Y_{i}}^{2}$ can be used to determine important design variables. To make the variable screening procedure systematic, hypothesis testing is applied in this paper. Hypothesis testing can prevent undesirable choices that could occur during the decision-making procedure. The calculated partial output variance $s_{Y_{i}}^{2}$ depends on the number of samples $ns$. When $ns$ is large enough, the variable screening result will be accurate, but it will require a large computational time; moreover, it is hard to determine what value of $ns$ is large enough. When $ns$ is small, the estimate will include statistical error. If the calculated $s_{Y_{i}}^{2}$ are distinct from each other and from the screening threshold value, then the effect of $ns$ may not be significant. However, $ns$ could cause an error when some $s_{Y_{i}}^{2}$ are similar to each other or are near the screening threshold value. Hypothesis testing can prevent this problem in a statistical manner by letting users control the error level.

Various hypothesis testing methods have been developed for decision-making problems (Rosner 2006). Among those methods, we need one that is not sensitive to the distribution type because the distribution type of $Y_{i}$ or $s_{Y_{i}}^{2}$ is not known in general. The one-sample t-test is based on the central limit theorem, which states that the sample mean of a non-normal distribution approximately follows a normal distribution for a large number of samples. The one-sample t-test is therefore not sensitive to the underlying distribution type, so it is used in this paper. As the t-test is a method for the sample mean, $s_{Y_{i}}^{2}$ is calculated $nr$ times to obtain its statistical moments as

$$ \overline{v}_{i} =\frac{1}{nr}\sum\limits_{k=1}^{nr} s_{Y_{i}}^{2(k)} $$
(15)
$$ s_{v_{i}}^{2} =\frac{1}{nr-1}\sum\limits_{k=1}^{nr} \left(s_{Y_{i}}^{2(k)}-\overline{v}_{i}\right)^{2} $$
(16)

where $s_{Y_{i}}^{2(k)}$ is the $k$th repetition of $s_{Y_{i}}^{2}$ and $nr$ is the number of repetitions. Now, the hypothesis is constructed:

$$ H_{0}:\;\overline{v}_{i}\le \mu_{0} \quad \text{versus} \quad H_{1}:\;\overline{v}_{i}>\mu_{0} $$
(17)

where $\mu_{0}$ is the criterion of the hypothesis testing. According to (17), a design variable whose $\overline{v}_{i}$ is greater than $\mu_{0}$ (i.e., $H_{1}$ is true) will be selected as an important variable. Using the one-sample t-test, the hypothesis can be tested by checking the following statement:

$$ \text{Reject}\;H_{0}\;\text{in favor of}\;H_{1}\;\text{if}\;q\ge t_{nr-1,1-\alpha} $$
(18)

where $\alpha$ is the significance level, $t_{nr-1,\cdot}$ denotes $t_{nr-1}^{-1}(\cdot)$, the inverse CDF of the t-distribution with $nr-1$ degrees of freedom, and the test statistic $q$ is defined as

$$ q\equiv \frac{\overline{v}_{i}-\mu_{0}}{s_{v_{i}}/\sqrt{nr}} $$
(19)

In (15) and (16), the uncertainty induced by $ns$ is transferred to $nr$. Hence, $ns$ can be a fixed number, whereas $nr$ should be decided appropriately. Also, $\mu_{0}$ needs to be identified in (17) and (19). $\mu_{0}$ is the key criterion that decides the important variables, and it should be a value relative to $\overline{v}_{i}$ because the relative difference between the partial output variances is what should be checked for variable screening. At the same time, $\mu_{0}$ needs to be statistically independent of $\overline{v}_{i}$ for reasonable hypothesis testing. In this paper, a preliminary test is proposed to obtain reasonable $nr$ and $\mu_{0}$ as follows. First, choose $nr_{0}$ large enough that the central limit theorem holds. Then, calculate the initial statistical moments of $s_{Y_{i}}^{2}$ as

$$ \overline{v}_{i}^{(0)} =\frac{1}{nr_{0}}\sum\limits_{k=1}^{nr_{0}} s_{Y_{i}}^{2(k)} $$
(20)
$$ s_{v_{i}}^{2(0)} =\frac{1}{nr_{0}-1}\sum\limits_{k=1}^{nr_{0}} \left(s_{Y_{i}}^{2(k)}-\overline{v}_{i}^{(0)}\right)^{2} $$
(21)

Using the values from (20), the testing criterion $\mu_{0}$ relative to $\overline{v}_{i}$ can be calculated as

$$ \mu_{0} =\frac{\gamma}{N}\sum\limits_{i=1}^{N} \overline{v}_{i}^{(0)} $$
(22)

where $\gamma$ is a user-selected constant. $nr$ is calculated by limiting the type II error ($H_{0}$ is accepted when $H_{1}$ is true) at the level of the false negative rate $\beta$ as (Rosner 2006)

$$ nr=\max\left(\frac{s_{v_{i}}^{2(0)}\left(t_{nr_{0}-1,1-\alpha}+t_{nr_{0}-1,1-\beta}\right)}{\overline{v}_{i}^{(0)}-\mu_{0}},\;nr_{0}\right) $$
(23)

In (23), $t_{nr-1,\cdot}$ should be used instead of $t_{nr_{0}-1,\cdot}$ for an accurate calculation of $nr$. However, (23) would then require the value of $nr$ on the right-hand side to calculate $nr$. To avoid this circularity, $t_{nr_{0}-1,\cdot}$ is used instead; it produces a conservative result because it is larger than $t_{nr-1,\cdot}$, since $nr$ is larger than $nr_{0}$ in (23) and $\alpha$ is usually small. Finally, $nr$ and $\mu_{0}$ are determined so that the proposed hypothesis testing can be utilized.
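A sketch of the whole testing step (15)-(23) is given below, assuming a matrix of repeated partial-variance estimates is already available; the helper name, the maximum over variables in (23), and the reuse of the initial repetitions are assumptions of this illustration.

```python
# Hypothesis-testing step (15)-(23) on precomputed partial variances
# s2[i, k]: variable i, repetition k (nr0 repetitions per variable).
import numpy as np
from scipy.stats import t

def screen_variables(s2, alpha=0.05, beta=0.05, gamma=1.0):
    s2 = np.asarray(s2)
    N, nr0 = s2.shape
    v_bar = s2.mean(axis=1)                      # (20): initial means
    s_v2 = s2.var(axis=1, ddof=1)                # (21): initial variances
    mu0 = gamma * v_bar.mean()                   # (22): threshold
    t_a = t.ppf(1 - alpha, nr0 - 1)              # t_{nr0-1, 1-alpha}
    t_b = t.ppf(1 - beta, nr0 - 1)               # t_{nr0-1, 1-beta}
    # (23) as printed, taken over all variables (assumption)
    nr = max(int(np.ceil(np.max(s_v2 * (t_a + t_b)
                                / np.abs(v_bar - mu0)))), nr0)
    # (18)-(19): one-sample t-test; here the nr0 repetitions stand in
    # for the nr recomputed ones to keep the sketch short (assumption)
    q = (v_bar - mu0) / (np.sqrt(s_v2) / np.sqrt(nr0))
    important = np.where(q >= t.ppf(1 - alpha, nr0 - 1))[0]
    return important, mu0, nr
```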

3.3 1-D surrogate model

In the previous sections, the 1-D surrogate model was treated as given because it is not difficult to generate. However, efficiently creating a 1-D surrogate model could still be an issue. For efficiency, quadratic interpolation is proposed as the basic 1-D surrogate model in this paper. Quadratic interpolation may not be adequate for a highly nonlinear performance measure. However, a nonlinear performance measure can be effectively approximated by a quadratic function on a small region. If $X$ follows a normal distribution, the domain of $X$ is $(-\infty,\infty)$, whereas 99.73 % of $X$ lies in $(\mu_{X}-3\sigma_{X},\mu_{X}+3\sigma_{X})$, which is much smaller than the infinite domain. Even if $X$ does not follow a normal distribution, the region $(\mu_{X}-3\sigma_{X},\mu_{X}+3\sigma_{X})$ can cover almost all (approximately 98 %) of $X$. In view of the fact that this paper is focused on the calculation of the partial output variance, the region $(\mu_{X}-3\sigma_{X},\mu_{X}+3\sigma_{X})$ is large enough. Hence, the 1-D surrogate model needs to be accurate in the region $(\mu_{X}-3\sigma_{X},\mu_{X}+3\sigma_{X})$, so quadratic interpolation can be an appropriate method to approximate the performance measure in that region.

Quadratic interpolation requires three design of experiments (DoE) samples, and the location of the DoE samples affects the accuracy of the interpolation. Sample locations determined using the Chebyshev polynomial are known to give uniform error over the domain (Rao 2002). Because only the region $(\mu_{X}-3\sigma_{X},\mu_{X}+3\sigma_{X})$ is of interest, the DoE sample locations are determined as $x_{1}=\mu_{X}-2.5981\sigma_{X}$, $x_{2}=\mu_{X}$, and $x_{3}=\mu_{X}+2.5981\sigma_{X}$ using the Chebyshev polynomial. Since a random variable $X$ may not be evenly distributed over its domain, providing uniform error does not necessarily mean that the calculated partial output variance is accurate. However, since no unique DoE sample location is best for accurate partial output variance, the Chebyshev sample location is used in this paper because it yields reasonable results for various distribution types of the random variable $X$. If the random variable $X$ has a closed and bounded domain $[a, b]$, the domain can be used directly for the calculation of the partial output variance, and the sample locations are $x_{1}=0.93301a+0.06699b$, $x_{2}=(a+b)/2$, and $x_{3}=0.06699a+0.93301b$, using Chebyshev polynomials.

To check the performance of a selected location of DoE samples, a nonlinear performance measure Y is used as

$$ Y\left(X\right)=0.3+\sin\left(16X/15-0.7\right)+\sin^{2}\left(16X/15-0.7\right) $$
(24)

Assuming that the random variable $X$ follows $N(0.5,0.333^{2})$, three sets of DoE sample locations are chosen to compare the accuracy of the partial output variance. The first set is {0.167, 0.5, 0.833}, which is $\mu_{X}$ and $\mu_{X}\pm\sigma_{X}$; the second set, from the Chebyshev polynomial, is {−0.365, 0.5, 1.365}. The third set is wider, {−0.667, 0.5, 1.667}, which is $\mu_{X}$ and $\mu_{X}\pm 3.5\sigma_{X}$. Partial output variances are calculated using 100,000 realizations of $X$, and the true partial output variance is calculated from (24) with the same realizations. To check the accuracy of the quadratic interpolation itself, the mean square error (MSE) is calculated in the region (−0.5, 1.5), which is $(\mu_{X}-3\sigma_{X},\mu_{X}+3\sigma_{X})$, with 100 uniformly distributed points. The calculated results are shown in Table 1, and the shapes of the quadratic interpolations are shown in Fig. 2, where asterisk marks (*) represent the DoE sample points. As shown in Table 1, the DoE sample location using Chebyshev polynomials produces a more accurate partial output variance compared to the true one and a smaller MSE than the other cases; a short script reproducing this comparison follows Table 1.

Fig. 2 Quadratic interpolation of Y with different locations of DoE samples

Table 1 Quadratic interpolation with different DoE sample locations
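The comparison above is easy to reproduce. A minimal sketch (the random seed and sample sizes are arbitrary choices):

```python
# Quadratic interpolation of (24) through three DoE samples at
# mu +/- k*sigma, and the resulting partial output variance.
import numpy as np

def y_true(x):
    u = 16 * x / 15 - 0.7
    return 0.3 + np.sin(u) + np.sin(u) ** 2     # eq. (24)

mu, sigma = 0.5, 0.333
x_mc = np.random.default_rng(2).normal(mu, sigma, 100_000)

for name, k in [("mu +/- 1.0 sigma", 1.0),
                ("Chebyshev (2.5981)", 2.5981),
                ("mu +/- 3.5 sigma", 3.5)]:
    x_doe = np.array([mu - k * sigma, mu, mu + k * sigma])
    coef = np.polyfit(x_doe, y_true(x_doe), 2)  # exact quadratic fit
    var_hat = np.polyval(coef, x_mc).var(ddof=1)
    print(f"{name}: {var_hat:.4f} (true {y_true(x_mc).var(ddof=1):.4f})")
```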

This example cannot represent all performance measures. When a highly nonlinear performance measure is expected, more sophisticated surrogate methods, such as the RBF, PRS, SVR, Kriging, and DKG methods (Cressie 1991; Barton 1994; Jin et al. 2001; Simpson et al. 2001; Queipo et al. 2005; Wang and Shan 2007; Forrester et al. 2008; Forrester and Keane 2009; Zhao et al. 2011), are better. In any case, it is recommended to sample inside the region $(\mu_{X}-3\sigma_{X},\mu_{X}+3\sigma_{X})$ for the random variable $X$ if its distribution has an infinite domain.

4 Numerical examples

Analytical examples and an engineering example are used to test the performance of the proposed variable screening method. Partial output variances are calculated to select important variables using the 1-D quadratic interpolation presented in Section 3.3. To use the variable screening method, five parameters need to be decided by users: the significance level $\alpha$, the false negative rate $\beta$, the number of samples $ns$, the initial number of repetitions $nr_{0}$, and the control parameter $\gamma$ for the threshold value. Smaller $\alpha$ and $\beta$ are better choices because they result in smaller statistical errors in the variable screening method. However, when they are too small, a very large $nr$ could be required to maintain the error level specified by $\alpha$ and $\beta$ in (23). Hence, 0.025 to 0.05 is a reasonable choice for them. For $nr_{0}$ and $ns$, a small number could be chosen to reduce the computational cost. However, a small value of $nr_{0}$ or $ns$ will rapidly increase $nr$ to maintain the error level. Hence, an appropriately large number should be used; they are set to 50 or 100 in the numerical examples. The parameter $\gamma$ in (22) lets the user control the threshold value that determines the important variables. In the numerical examples, $\gamma$ is initially set to 1.0, and the variable screening procedure is performed. Then, the ratio of the sum of the partial output variances of the selected variables to that of all random variables, which is an estimate of the captured output variance, is checked; if the ratio is less than 85 %, $\gamma$ is lowered to achieve 85 %, as sketched below. As explained before, the partial output variance is an approximation, which is why a ratio of 85 % may not mean that 85 % of the total output variance is actually captured by the selected variables. However, it is a good estimate at an affordable cost because it does not require many DoE samples or full-dimensional surrogate models.
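A minimal sketch of this $\gamma$ adjustment, using the simple threshold of (22) in place of the full hypothesis test and an assumed step size:

```python
# Lower gamma until the selected variables are estimated to capture at
# least `target` of the total output variance. The step size and the
# simple thresholding (no t-test) are assumptions of this sketch.
import numpy as np

def adjust_gamma(v_bar, target=0.85, gamma=1.0, step=0.05):
    v_bar = np.asarray(v_bar)            # mean partial variances, eq. (15)
    while gamma > 0:
        mu0 = gamma * v_bar.mean()       # threshold, eq. (22)
        selected = v_bar > mu0
        if v_bar[selected].sum() / v_bar.sum() >= target:
            return gamma, np.where(selected)[0]
        gamma -= step                    # lower gamma and try again
    return 0.0, np.arange(v_bar.size)    # fall back: keep all variables
```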

4.1 Analytical examples

The Hartmann 6-D and Dixon-Price 12-D functions are well-known analytical functions. They are high-dimensional as well as nonlinear, so they are used to test the variable screening method. Constant terms are added to the original functions to make both functions active (i.e., $G(\mathbf{X})=0$) at the mean point of the input random variables. Note that adding a constant term does not change the character of the functions. The input random variables have a variety of marginal distribution types and copula types, so the analytical examples can reveal the effects of different distribution types and correlations.

The functions are tested with three different methods. The first is the developed variable screening method. As mentioned before, the parameters $\alpha=\beta=0.025$, $nr_{0}=ns=100$, and $\gamma=1.0$ are used with the 1-D quadratic interpolation surrogate model; $\gamma$ is initially set to 1.0 and lowered when necessary. The second is screening with accurate partial output variances, computed using the analytical functions directly and 1,000,000 realizations of the random variables; these partial output variances serve as the reference. Third, when a performance measure is a linear function ($G\cong\sum\alpha_{i}X_{i}$) of the input random variables $X_{i}$, the output variance is $\sum\alpha_{i}^{2}\sigma_{X_{i}}^{2}$, where $\sigma_{X_{i}}^{2}$ is the variance of $X_{i}$. Hence, the partial output variance can be linearly approximated as $\alpha_{i}^{2}\sigma_{X_{i}}^{2}$ using the design sensitivity (gradient) $\alpha_{i}$ and the input variance $\sigma_{X_{i}}^{2}$, and important variables can be selected based on the partial output variances calculated with this sensitivity-variance method; it is applied to the analytical functions for comparison.

4.1.1 Hartmann 6-D

The first analytical example is the Hartmann 6-D function, with a constant term added as explained earlier. The analytical expression is (Dixon and Szegö 1978)

$$ G\left({\mathbf{X}} \right)=-\sum\limits_{i=1}^{q} {a_{i} \exp \left({-\sum\limits_{j=1}^{m} {b_{ij} \left({X_{j} -d_{ij} } \right)^{2}} } \right)} +3.3082 $$
(25)

where $0\le X_{i}\le 1$, $m=6$, $q=4$, and

$$ \mathbf{a}=\left[ {{\begin{array}{*{20}c} {1.0} \quad & {1.2} \quad & {3.0} \quad & {3.2} \quad\\ \end{array} }} \right] $$
(26)
$$ \mathbf{b}=\left[ {{\begin{array}{*{20}c} {10.0} \hfill & {3.0} \hfill & {17.0} \hfill & {3.5} \hfill & {1.7} \hfill & {8.0} \hfill \\ {0.05} \hfill & {10.0} \hfill & {17.0} \hfill & {0.1} \hfill & {8.0} \hfill & {14.0} \hfill \\ {3.0} \hfill & {3.5} \hfill & {1.7} \hfill & {10.0} \hfill & {17.0} \hfill & {8.0} \hfill \\ {17.0} \hfill & {8.0} \hfill & {0.05} \hfill & {10.0} \hfill & {0.1} \hfill & {14.0} \hfill \\ \end{array} }} \right] $$
(27)
$$ \mathbf{d}=\left[ {{\begin{array}{*{20}c} {0.1312} \hfill & {0.1696} \hfill & {0.5569} \hfill & {0.0124} \hfill & {0.8283} \hfill & {0.5886} \hfill \\ {0.2329} \hfill & {0.4135} \hfill & {0.8307} \hfill & {0.3736} \hfill & {0.1004} \hfill & {0.9991} \hfill \\ {0.2348} \hfill & {0.1451} \hfill & {0.3522} \hfill & {0.2883} \hfill & {0.3047} \hfill & {0.6650} \hfill \\ {0.4047} \hfill & {0.8828} \hfill & {0.8732} \hfill & {0.5743} \hfill & {0.1091} \hfill & {0.0381} \hfill \\ \end{array} }} \right] $$
(28)
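For reference, the shifted Hartmann 6-D constraint (25)-(28) is straightforward to code; a sketch is given below (the input distributions of Table 2 are not needed for the function itself):

```python
# Shifted Hartmann 6-D constraint, eqs. (25)-(28).
import numpy as np

a = np.array([1.0, 1.2, 3.0, 3.2])
b = np.array([[10.0, 3.0, 17.0, 3.5, 1.7, 8.0],
              [0.05, 10.0, 17.0, 0.1, 8.0, 14.0],
              [3.0, 3.5, 1.7, 10.0, 17.0, 8.0],
              [17.0, 8.0, 0.05, 10.0, 0.1, 14.0]])
d = np.array([[0.1312, 0.1696, 0.5569, 0.0124, 0.8283, 0.5886],
              [0.2329, 0.4135, 0.8307, 0.3736, 0.1004, 0.9991],
              [0.2348, 0.1451, 0.3522, 0.2883, 0.3047, 0.6650],
              [0.4047, 0.8828, 0.8732, 0.5743, 0.1091, 0.0381]])

def hartmann6(x):
    """Shifted Hartmann 6-D, eq. (25); x has shape (..., 6)."""
    x = np.asarray(x)
    expo = np.einsum('ij,...ij->...i', b, (x[..., None, :] - d) ** 2)
    return -np.sum(a * np.exp(-expo), axis=-1) + 3.3082
```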

Information about the input random variables is listed in Table 2. The input random variables have four different marginal distribution types: normal, lognormal, gamma, and Weibull. $X_{5}$ and $X_{6}$ are correlated with the Clayton copula and a Kendall's tau of 0.5.

Table 2 Input random variables for Hartmann 6-D example

The result of the variable screening is shown in Table 3. The design sensitivity (gradient) of the Hartmann 6-D function in (25) at the input mean point is shown in the second column, and the third through fifth columns show the partial output variances using the sensitivity-variance method, the variable screening method, and the accurate method, respectively. As the variable screening method calculates the partial output variances $nr$ times, the result is the mean value of the calculated partial output variances. For each method, the important variables are marked in bold font. It can be seen that the variable screening method finds the same variables as the accurate method, whereas the sensitivity-variance method misses $X_{4}$ and $X_{5}$. The sixth and seventh columns are the ratios of the partial output variances using the sensitivity-variance method and the variable screening method, respectively, to the accurate partial output variances. It is evident that the sensitivity-variance method cannot estimate the partial output variances accurately, while the variable screening method can. Overall, the variable screening method outperforms the sensitivity-variance method. Hence, it is better to use at least a quadratic approximation for the 1-D surrogate model to calculate partial output variances.

Table 3 Partial output variances of Hartmann 6-D example

In Table 3, the bottom four rows show more information about each method. The first and second of these rows are the sums of the partial output variances of the selected variables and of all variables, respectively. The third row shows the ratio of the first row to the second row. The last row is $\mu_{0}$, the criterion used to select important variables. In the variable screening method, the important variables ($X_{4}$, $X_{5}$, and $X_{6}$) are determined to be larger than $\mu_{0}$ by the hypothesis testing. The sensitivity-variance method and the accurate method select a variable as important if its partial output variance is larger than $\mu_{0}$. In the third row, the ratio for the variable screening method is larger than 85 %, so $\gamma$ keeps its initial value of 1.0, and the equivalent $\mu_{0}$ is applied to the other methods. The sensitivity-variance method estimates that 92.3 % of the output variance is contained in $X_{6}$ alone. This is a very poor estimate, as only 36.7 % (=1.66E−04/4.52E−04) of the output variance is captured by $X_{6}$ according to the result of the accurate method. On the other hand, the variable screening method estimates that 96.9 % of the output variance is contained in $X_{4}$, $X_{5}$, and $X_{6}$, which is very accurate compared to the 97.1 % determined by the accurate method.

However, the total and captured output variances in Table 3 are approximations using the partial output variances, and the correlation term is not considered, as explained in Section 3.1. Since the analytical expression of the Hartmann 6-D example in (25) is available, the true total output variance induced by multiple input random variables can be calculated as well. The true total output variance is calculated using 1,000,000 realizations of all input random variables, and the calculated value is 4.23E−04, as shown in Table 4. Recalling the approximated results in Table 3, the variable screening method (4.54E−04) and the accurate method (4.52E−04) approximate the true total output variance well, whereas the sensitivity-variance method (2.48E−04) does not. In Table 4, the true output variance captured by the selected variables is also calculated. To calculate it, the realizations generated to calculate the true total output variance are reused, with the realizations of the screened-out variables fixed at their mean values; the variance of the Hartmann 6-D function is then calculated using the modified realizations. The true captured output variance in $X_{6}$ is 1.66E−04, which is the same as the partial output variance of $X_{6}$ found by the accurate method (see Table 3). Hence, the sensitivity-variance method captures only 39.2 % (1.66E−04/4.23E−04) of the true total output variance with its selection of $X_{6}$, which would prevent a reliability analysis from estimating the probability of failure correctly. By contrast, the output variance captured by $X_{4}$, $X_{5}$, and $X_{6}$ is 4.10E−04, which is 96.9 % (4.10E−04/4.23E−04) of the total output variance. This indicates that the reliability problem could be solved accurately using $X_{4}$, $X_{5}$, and $X_{6}$. From this example, it can be seen that the variable screening method works as intended.

Table 4 True total and captured output variances of Hartmann 6-D example

4.1.2 Dixon-Price 12-D

The second analytical example is the Dixon-Price 12-D function, again with a constant term added to the original function. The analytical expression is (Lee 2007)

$$ G\left({\mathbf{X}} \right)=\left({X_{1} -1} \right)^{2}+\sum\limits_{i=2}^{m} {i\left({2{X_{i}^{2}} -X_{i-1} } \right)^{2}} -3.5575\times 10^{-3} $$
(29)

where $-10\le X_{i}\le 10$, $i=1,2,\ldots,m$, and $m=12$. The input random variables shown in Table 5 are used for the test. They have five different marginal distribution types: normal, lognormal, Weibull, Gumbel, and gamma. $X_{1}$ and $X_{2}$ are correlated with the Frank copula and a Kendall's tau of 0.7. Also, $X_{5}$ and $X_{6}$ are correlated with the FGM copula and a Kendall's tau of 0.2.

Table 5 Input random variables for Dixon-Price 12-D example

The test results for the Dixon-Price 12-D example are shown in Table 6, with the variables selected by each method marked in bold font. In this example, the value of $\gamma$ is lowered to 0.7 so that at least 85 % of the output variance is contained in the selected variables; Table 6 shows that 86.6 % of the output variance is estimated to be captured by the variable screening method. The design sensitivities with respect to $X_{9}$–$X_{12}$ are zero, and accordingly the partial output variances of $X_{9}$–$X_{12}$ using the sensitivity-variance method are zero. Hence, the sensitivity-variance method misses $X_{9}$ and $X_{12}$ even though they have large partial output variances. Moreover, the other partial output variances from the sensitivity-variance method have poor accuracy compared to the accurate method (see the sixth column of Table 6); hence, $X_{2}$ and $X_{3}$ are selected instead of $X_{6}$ even though $X_{6}$ actually has a larger partial output variance than $X_{2}$ and $X_{3}$. By contrast, the variable screening method reasonably estimates the partial output variances and correctly identifies the important variables compared to the accurate method. Therefore, it is confirmed that at least a quadratic approximation is needed for the 1-D surrogate model to calculate partial output variances; a toy illustration of this failure mode follows Table 6.

Table 6 Partial output variances of Dixon-Price 12-D example
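The zero-gradient failure mode is easy to see on a one-variable toy problem: at the mean point the local sensitivity of $x^{2}$ vanishes, so the sensitivity-variance estimate $\alpha_{i}^{2}\sigma_{X_{i}}^{2}$ is zero, yet the true partial output variance is not. A minimal sketch (the function and distribution are illustrative assumptions):

```python
# Sensitivity-variance vs. actual partial output variance for y = x**2
# with X ~ N(0, 1): the forward-FDM gradient at the mean is ~0, so the
# gradient-based estimate is ~0, while the true variance of X**2 is 2.
import numpy as np

mu, sigma = 0.0, 1.0
x = np.random.default_rng(3).normal(mu, sigma, 100_000)

grad = ((mu + 1e-6) ** 2 - mu ** 2) / 1e-6          # forward FDM ~ 0
print("sensitivity-variance estimate:", grad ** 2 * sigma ** 2)
print("actual partial output variance:", (x ** 2).var(ddof=1))  # ~ 2
```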

Using (29), the true total and captured output variances of the Dixon-Price 12-D example are calculated as shown in Table 7. In Table 6, the variable screening method (3.14E−03) and the accurate method (3.06E−03) reasonably approximate the true total output variance (3.30E−03 in Table 7), while the sensitivity-variance method (1.95E−04) cannot. In Table 7, the true output variance captured by the variables selected with the sensitivity-variance method is only 2.15E−03, which is 65.2 % of the true total output variance. By contrast, an output variance of 2.80E−03 is contained in $X_{4}$–$X_{9}$ and $X_{12}$, which indicates that 84.8 % of the true total output variance is captured. Hence, it is verified that the variable screening method correctly finds the important variables of the Dixon-Price 12-D example. Through the analytical examples, it is shown that the partial output variance is a well-performing measure for variable screening purposes, and the proposed variable screening method successfully finds important variables as intended.

Table 7 True total and captured output variances of Dixon-Price 12-D example

4.2 Engineering example

A car noise, vibration, and harshness (NVH) and crash safety problem is considered to demonstrate the performance and efficiency of the proposed method. The problem includes full frontal impact, 40 % offset frontal impact, and NVH constraints. There are a total of 11 performance measures, as shown in Table 8: nine safety measures and two NVH measures.

Table 8 Performance measure description

In this example, it is assumed that the only source of uncertainty is the thickness of the body plates. The 44 random variables shown in Table 9 are used to represent the thicknesses. All random variables follow normal distributions and are statistically independent. The design variable vector $\mathbf{d}^{B}$ is the mean vector of the 44 random variables, and there is no random parameter in this example. Among those random variables, six ($X_{1}$–$X_{5}$ and $X_{8}$) are common to both the safety and NVH measures, two ($X_{6}$ and $X_{7}$) affect only the safety measures, and the other 36 affect only the NVH measures.

Table 9 Input random variables

This problem requires three and a half hours per analysis for the impact dynamics (crash safety) and the modal analysis (NVH). Thus, the actual analysis takes too much time to test the proposed method thoroughly. Ford Motor Company provided full-dimensional global surrogate models (covering the entire design domain) so that we could use them to demonstrate the proposed variable screening method. The full-dimensional surrogate models may not be accurate, since 44-D is too high a dimension to create accurate surrogate models, especially for RBDO. However, to test the proposed variable screening method, the responses from the 44-D global surrogate models are treated as the true responses in this example. The maximum dimension at which accurate surrogate models can be generated depends on the available computational power and the nonlinearity of a given problem. In this paper, the DKG method (Zhao et al. 2011) is used to generate accurate surrogate models using the Iowa Reliability-Based Design Optimization (I-RBDO) code (Choi et al. 2012), and 18-D is targeted as the maximum number of degrees of freedom for the DKG models. The I-RBDO code is also used to carry out RBDO in this paper.

4.2.1 Variable screening

At the baseline design $\mathbf{d}^{B}$, which is the initial design shown in Table 9, all 11 performance measures in Table 8 are active. That is, the value of every performance measure at the baseline design is the same as its baseline value, $G_{i}=Baseline_{i}$, $i=1,\ldots,11$. Therefore, the proposed variable screening method is performed for the problem at the baseline design. The parameters $ns$, $nr_{0}$, $\alpha$, $\beta$, and $\gamma$ are set to 50, 50, 0.05, 0.05, and 1.0, respectively. Four hundred eighty-four (44 design variables × 11 performance measures) 1-D surrogate models with quadratic interpolation are generated using 89 DoE samples (i.e., simulation samples); note that the 11 performance measure values are obtained from a single DoE analysis. The resulting partial output variances $\overline{v}_{i}$ are listed in Tables 10 and 11 for every performance measure. The partial output variances of the important variables for each performance measure are marked in bold font. It is noted that only the partial output variances of $X_{1}$–$X_{8}$ are listed in Table 10, since $G_{1}$–$G_{9}$ are functions of $X_{1}$–$X_{8}$ only, and the variable screening method identified the partial output variances of the other random variables as zero. In Table 11, $X_{6}$ and $X_{7}$ have zero partial output variances because $G_{10}$ and $G_{11}$ are not functions of $X_{6}$ and $X_{7}$. The last three rows of Tables 10 and 11 list the sums of the partial output variances of the selected variables, the sums of all partial output variances, and their ratios. As explained earlier, this ratio is the estimated fraction of the total output variance captured by the selected variables. It is estimated that a minimum of 90.3 % of the total output variance is captured by the selected variables. In total, 14 random variables, $X_{1}$, $X_{2}$, $X_{3}$, $X_{4}$, $X_{5}$, $X_{6}$, $X_{7}$, $X_{8}$, $X_{10}$, $X_{20}$, $X_{23}$, $X_{25}$, $X_{26}$, and $X_{N1}$, are selected as important variables. Accordingly, the 14 design variables that are the means of the selected random variables are considered the important design variables.

Table 10 Partial output variances $\overline{v}_{i}$ ($G_{1}\sim G_{9}$)
Table 11 Partial output variances $\overline{v}_{i}$ ($G_{10}$ and $G_{11}$)

The sensitivity-variance method introduced in Section 4.1 is applied to performance measure $G_{5}$, and the result is shown in Table 12. The selected random variables for $G_{5}$ (bold font in Table 12) are $X_{1}$–$X_{3}$ and $X_{5}$–$X_{8}$ out of $X_{1}$–$X_{8}$. Among $X_{1}$–$X_{8}$, variables $X_{3}$–$X_{7}$ have the largest standard deviation of 0.06. However, $X_{4}$ is not selected among them because it has a small design sensitivity compared to the others. Here, the design sensitivities are calculated at the design point using the forward finite difference method (FDM) with a 0.1 % perturbation. By contrast, $X_{8}$ is selected as an important variable even though it has the smallest standard deviation of 0.03; this is because it has a relatively large sensitivity and induces a large output variance.

Table 12 Result of sensitivity-variance method in $G_{5}$

Interestingly, the partial output variances of $G_{5}$ using the sensitivity-variance method shown in Table 12 are close to the results shown in the sixth column of Table 10. In fact, the sensitivity-variance method would select the same variables as the variable screening method across all 11 constraints. However, the sensitivity-variance method can choose undesirable variables, as shown in the analytical examples in Section 4.1. In Fig. 3, the shape of $G_{5}$ when each $X_{i}$ is random is shown. It is easily anticipated that the design sensitivity of $G_{5}$ with respect to $X_{i}$ could be very small or even zero, so the sensitivity-variance method may provide inaccurate partial output variances; this can be prevented if the variable screening method is used.

Fig. 3 Shape of $G_{5}$ when each $X_{i}$ is random

The sensitivity-variance method requires accurate design sensitivities. In practical engineering problems, the design sensitivity might be calculated using FDM. To use FDM, a user must choose the perturbation scheme (forward, backward, or central) and the perturbation size, and the result of the sensitivity-variance method can depend on these choices. In Table 13, the partial output variance of $X_{2}$ in $G_{6}$ at the DDO optimum design is shown for various schemes. It can be seen that $X_{2}$ is not selected as an important variable when the design sensitivity is calculated using forward FDM with a 1 % perturbation; the partial output variance is only 56.6 % of that found using the accurate method with 100,000 realizations of $X_{2}$. To obtain a more accurate design sensitivity with forward FDM, a small perturbation is required, as shown in Table 13. However, a small perturbation does not always provide an accurate design sensitivity, and determining an appropriate perturbation size would require extra DoE samples. Central FDM provides a more accurate design sensitivity and is insensitive to the perturbation size. However, the partial output variance computed with the central FDM sensitivity shows at most 76.7 % accuracy compared to the accurate method. It is noted that the variable screening method does not require the determination of a perturbation size. Moreover, a user can perform the proposed variable screening method using only one more DoE sample than the sensitivity-variance method with central FDM; a small demonstration of the perturbation-size issue follows Table 13.

Table 13 Partial output variance of $X_{2}$ in $G_{6}$ at DDO optimum design
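The perturbation-size behavior discussed above is generic to FDM. A small demonstration on an assumed smooth test function (the forward error shrinks linearly with the step, the central error quadratically):

```python
# Forward vs. central finite differences of sin(x) at x0 = 1.0, where the
# exact derivative is cos(1.0); errors are printed for three step sizes.
import numpy as np

f, x0, exact = np.sin, 1.0, np.cos(1.0)
for h in (1e-1, 1e-2, 1e-3):
    fwd = (f(x0 + h) - f(x0)) / h
    cen = (f(x0 + h) - f(x0 - h)) / (2 * h)
    print(f"h={h:g}  forward err={abs(fwd - exact):.2e}"
          f"  central err={abs(cen - exact):.2e}")
```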

Since 44-D global surrogate models are available for this example, GSA can be carried out to verify the effectiveness of the proposed method. Among the various GSA methods, the global sensitivity index method, which identifies the global effect of the variables of interest on the output, is used here. The main strength of the global sensitivity index method is that it can find interactions between all variables (not the statistical correlation between random variables). All random variables are assumed to follow uniform distributions over their corresponding design domains $[\mathbf{d}^{L},\mathbf{d}^{U}]$, and the global sensitivity indices are calculated using the Monte Carlo simulation (MCS) method with 1 million MCS samples (Sobol 2001). There are many global sensitivity indices in this 44-D problem; the total sensitivity index $S_{i}^{\text{tot}}$ is used for variable ranking and screening. The total sensitivity index $S_{i}^{\text{tot}}$ is the total influence of the $i$th random variable on the output; that is, it indicates the main effect of the $i$th random variable plus its interactions with the other random variables (Chen et al. 2005). The results are listed in Tables 14 and 15. To identify the important random variables, the mean value of $S_{i}^{\text{tot}}$ is calculated for each constraint, and any random variable that yields a larger $S_{i}^{\text{tot}}$ than the mean value is selected as important and marked in bold font in these tables. In Table 14, only $S_{i}^{\text{tot}}$ for $X_{1}$–$X_{8}$ are listed, as $S_{i}^{\text{tot}}$ for the other variables are zero. Also, the sum of $S_{i}^{\text{tot}}$ for the selected random variables, the sum of $S_{i}^{\text{tot}}$ for all random variables, and their ratios are listed in the last three rows, respectively. A sketch of an MCS estimator for $S_{i}^{\text{tot}}$ is given after Table 15.

Table 14 Global sensitivity indices $S_{i}^{\text{tot}}$ ($G_{1}$–$G_{9}$)
Table 15 Global sensitivity indices $S_{i}^{\text{tot}}$ ($G_{10}$ and $G_{11}$)
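The total index can be estimated from two independent sample matrices. A minimal sketch using Jansen's formulation (the exact estimator used in the original study is not specified, so this choice is an assumption):

```python
# MCS estimator of the total sensitivity index (Jansen's formulation):
# S_i_tot ~= mean((f(A) - f(AB_i))**2) / (2 * Var(Y)), where A and B are
# independent sample matrices and AB_i is A with column i taken from B.
# Uniform sampling over [d_lower, d_upper] as described in the text.
import numpy as np

def total_sobol_indices(f, d_lower, d_upper, n=100_000, rng=None):
    rng = rng or np.random.default_rng(4)
    dim = len(d_lower)
    A = rng.uniform(d_lower, d_upper, size=(n, dim))
    B = rng.uniform(d_lower, d_upper, size=(n, dim))
    fA = f(A)
    var_y = fA.var(ddof=1)
    s_tot = np.empty(dim)
    for i in range(dim):
        AB = A.copy()
        AB[:, i] = B[:, i]                       # resample column i only
        s_tot[i] = np.mean((fA - f(AB)) ** 2) / (2 * var_y)
    return s_tot

# e.g. total_sobol_indices(hartmann6, np.zeros(6), np.ones(6))
# using the hartmann6 sketch from Section 4.1.1
```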

Using the global sensitivity index method, 16 random variables are selected, as shown in Table 16; they include all 14 random variables selected by the proposed method. Moreover, if a limit of 14 selected random variables is imposed, $X_{12}$ and $X_{27}$ are dropped, as they have the smallest $S_{i}^{\text{tot}}$ among the variables selected for $G_{11}$ (see Table 15); the 14 random variables selected by the two methods are then identical. The ratio between the sensitivity indices of the selected variables and of all variables has no physical meaning; however, it is an indicator of how much variance is captured by the selected random variables. The results are quite similar to those of the proposed method, as shown in Tables 10 and 11 versus Tables 14 and 15. Hence, it is demonstrated that the proposed variable screening method is quite effective even though, unlike the global sensitivity index method, it does not require global surrogate models.

Table 16 Selected random variables

4.2.2 Reliability-based design optimization

For this example, RBDO is formulated as

$$\begin{aligned} \text{minimize} &\quad \text{Weight}\left(\mathbf{d}\right) \\ \text{subject to} &\quad P\left[G_{i}\left(\mathbf{X}\right)>\text{Baseline}_{i}\right]\le 10~\%, \quad i=1,\ldots,11 \\ &\quad \mathbf{d}^{L}\le \mathbf{d}\le \mathbf{d}^{U}, \quad \mathbf{d}\in \mathbb{R}^{NDV}, \quad \mathbf{X}\in \mathbb{R}^{NRV} \end{aligned}$$
(30)

For a comparison study, three cases are considered: (1) a set of 14 random variables selected based on experience, without using the proposed variable screening method; (2) a set of 14 random variables selected using the proposed variable screening method, as shown in Section 4.2.1; and (3) the 14 random variables of Case 2 plus four more random variables selected using the cost function sensitivity, for a total of 18 design variables, to test the effectiveness of the proposed variable screening method and the accuracy of the I-RBDO code. The selected design variables are listed in Table 17.

Table 17 Selected random variables for RBDO

Because the cost function, which is the weight in this problem, is a function of the design variables \(\mathbf{d}\), not the random variables \(\mathbf{X}\), it is deterministic. Therefore, the design sensitivity of the cost function with respect to each design variable is calculated by FDM, and the four design variables (and the related random variables \(X_{N4}\), \(X_{N9}\), \(X_{N10}\), and \(X_{N11}\)) with the largest sensitivities among the unselected design variables are chosen. Then, RBDO is carried out with the three sets of selected random variables. The optimum design results are summarized in Table 18, where bold font indicates the chosen design variables; the others are fixed at their baseline design values. Also, the probabilities of failure, cost function values, and design iteration details are listed in Table 19. All RBDOs are carried out using the I-RBDO code with 500,000 MCS samples. For the three cases, I-RBDO generates DKG surrogate models for RBDO using DoE sample responses obtained from the 44-D global surrogate models, which are treated as true responses. Since I-RBDO can carry out RBDO using surrogate models generated by other methods (i.e., the surrogate models generated by Ford in this example) without information on how those surrogate models were generated, the full-dimensional RBDO is performed as well using the 44-D surrogate models. The 44-D RBDO result is treated as the true RBDO optimum and is used to validate the RBDO results obtained for the three reduced-dimensional cases.
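
This ranking step can be sketched as follows (an illustrative sketch with hypothetical names, not the I-RBDO implementation): forward finite differences score each unselected design variable by its cost sensitivity.

```python
import numpy as np

def rank_by_cost_sensitivity(weight, d_base, unselected, step=1e-4):
    """Rank unselected design variables by |dWeight/dd_i| using forward FDM.

    weight     : deterministic cost function of the design vector d
    d_base     : baseline design point (1-D NumPy array)
    unselected : indices of design variables not yet selected
    """
    w0 = weight(d_base)
    sensitivity = {}
    for i in unselected:
        d = d_base.copy()
        d[i] += step                          # perturb one variable at a time
        sensitivity[i] = abs(weight(d) - w0) / step
    # Indices sorted from largest to smallest cost sensitivity
    return sorted(sensitivity, key=sensitivity.get, reverse=True)
```

Taking the first four indices of the returned ranking mirrors the selection of the four additional design variables in Case 3.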

Table 18 RBDO optimum design
Table 19 Cost, probabilities of failure at RBDO optimum design, and optimization details

Indeed, the optimum design values for \(d_1\)–\(d_8\) are very close to those of the full-dimensional 44-D case, as shown in Table 18, which shows that the DKG surrogate models generated in the I-RBDO code are accurate. In all three cases, the random variables \(X_1\)–\(X_8\) are selected because they have large partial output variances for the performance measures \(G_1\)–\(G_9\); that is, they contribute a large portion of the output variance. Hence, finding optimum values for them is the most effective way to reduce the probabilities of failure of \(G_1\)–\(G_9\). Similarly, \(d_{22}\) (corresponding to \(X_{26}\)) moves to its upper bound of 1.1 when it is selected because it has the largest partial output variances of \(G_{10}\) and \(G_{11}\). The design variables \(d_{27}\), \(d_{32}\), \(d_{33}\), and \(d_{34}\) (corresponding to \(X_{N4}\), \(X_{N9}\), \(X_{N10}\), and \(X_{N11}\)), which are selected by the sensitivity of the cost function, move to their lower bounds of 0.7, 0.6, 0.6, and 0.9, respectively, to reduce the cost function without significantly affecting the reliability of the optimum design. On the other hand, some design variables selected for their partial output variances of certain constraints also move to their lower bounds. For example, \(d_{24}\) (corresponding to \(X_{N1}\)) moves to its lower bound of 0.7 because it has the largest sensitivity for the cost function, even though it has the third-largest partial output variance of \(G_{11}\). That is, through the trade-offs in the optimization process, it is moved to the lower bound to minimize the cost function rather than to reduce the probability of failure.

In Table 19, the number of black box calls is listed. I-RBDO can handle multiple samples simultaneously when creating a surrogate model in order to exploit parallel computing. That is, one black box call requests computational simulations at a number of sampling points at once; thus, the number of black box calls indicates the wall-clock time required for the analyses in RBDO. Therefore, when parallel computing is used, the number of black box calls represents the actual computational cost more realistically than the number of CAEs. In this example, five samples are added to the DKG model at a time; therefore, the total number of CAEs is roughly five times the number of black box calls.
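
The batching can be pictured with the following sketch (illustrative only; the actual I-RBDO parallelization is not described here), in which one black box call evaluates a batch of DoE samples concurrently:

```python
from concurrent.futures import ProcessPoolExecutor

def black_box_call(simulate, batch):
    """One black box call: run the CAE simulation at every sampling point of
    the batch in parallel, so the call costs roughly the wall-clock time of a
    single simulation while performing len(batch) CAEs."""
    with ProcessPoolExecutor(max_workers=len(batch)) as pool:
        return list(pool.map(simulate, batch))

# With batches of five samples, total CAEs ~= 5 * (number of black box calls).
```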

To verify once again that the surrogate models generated by DKG in I-RBDO are accurate, the same three cases are run in I-RBDO using responses from the 44-D global surrogate models directly, while fixing the screened-out variables at their baseline design values. As shown in Table 20, the optimum designs found using the DKG and 44-D global surrogate models are very close to each other. Hence, it is confirmed that DKG in I-RBDO generates accurate surrogate models. Moreover, it is also verified that RBDO can be conducted based on an accurate surrogate model even for a moderately large-dimensional problem (14 and 18 dimensions).

Table 20 RBDO optimum design with I-RBDO and true model

For the three reduced-dimensional cases, the probabilities of failure are calculated using only the selected variables as random variables, since the other design variables are treated as deterministic, as explained in Section 2.2, with values fixed at the baseline design. As shown in Table 19, in all cases the target probability of failure of 10 % is closely satisfied at these optimum designs, as expected, since the RBDO considers only the selected variables as random. On the other hand, to check the true reliabilities, reliability analyses are carried out at these optimum designs treating all variables as random, using the 44-D surrogate models and MCS with 1 million samples, as shown in Table 21. Note that the probabilities of failure of the full-dimensional optimum in Tables 19 and 21 differ, although the discrepancy is negligible. Theoretically, they should be identical; they are not because different numbers of MCS samples (500,000 and 1 million) are used, which induces MCS error. At the baseline design, all constraints have approximately 50 % probability of failure, which is reasonable because all constraints are active at the baseline design; they are not exactly 50 % because the constraint functions are nonlinear. The probabilistic constraint results corresponding to \(G_1\)–\(G_9\) are active or feasible in both Tables 19 and 21. Because \(G_1\)–\(G_9\) are functions of \(X_1\)–\(X_8\), all of which are selected as important variables, the reduced-dimensional RBDO results and the full-dimensional reliability analysis results are very close to each other, considering MCS errors. The constraint \(G_{10}\) is inactive regardless of which design variable set is selected.
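
For reference, such a crude MCS estimate of a probability of failure can be sketched as follows (a minimal sketch with hypothetical names; the constraint g and the sampler for all random variables are assumptions, not the I-RBDO interface):

```python
import numpy as np

def probability_of_failure(g, baseline, sample_x, n=1_000_000, seed=0):
    """Estimate P[G(X) > Baseline] by crude Monte Carlo simulation.

    g        : vectorized constraint, mapping an (n, k) sample array to (n,)
    baseline : scalar threshold Baseline_i for the constraint
    sample_x : callable (n, rng) -> (n, k) samples of the random variables
    """
    rng = np.random.default_rng(seed)
    x = sample_x(n, rng)              # e.g., all 44 variables treated as random
    return np.mean(g(x) > baseline)   # fraction of samples that fail
```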

Table 21 Reliability analysis result using full-dimensional surrogate model

All probabilities of failure for the constraint \(G_{11}\) in Table 19 satisfy the 10 % target probability of failure, as expected, since these are the reliability analysis results of the reduced-dimensional problems. However, the full-dimensional reliability analyses at the optimum designs show quite different values, as shown in Table 21. The selection based on experience yields a 17.70 % probability of failure, which violates the target reliability significantly. The variables selected based on experience capture only 55.4 % (= 1.76E−02/3.18E−02 × 100 %) of the total output variance of \(G_{11}\); hence, this selection cannot find a safe design once the dimension is reduced. On the other hand, the probabilities of failure for the proposed variable screening method (Case 2) and for the selection that also considers the cost function (Case 3) are close to the target probability of failure. The selected variables capture 93.4 % (= 2.97E−02/3.18E−02 × 100 %) and 93.7 % (= 2.98E−02/3.18E−02 × 100 %) of the total output variance of \(G_{11}\) for Cases 2 and 3, respectively. Hence, a correct optimum can be found even with reduced dimension.

5 Conclusion

A new efficient and effective variable screening method for RBDO is proposed in this paper. In the proposed method, the output variance is used as a measure to identify important design variables. To this end, a partial output variance based on the univariate DRM is proposed to approximate the output variance efficiently and to identify the design variables that affect the output variance more significantly than others. The univariate DRM and the partial output variance require only multiple 1-D surrogate models, which is much more efficient than generating a full-dimensional surrogate model. Hence, the proposed method has great merit in efficiency as well as effectiveness. To reduce computational time while maintaining a user-specified statistical error level, hypothesis testing is used in the variable screening process, and the minimum number of samples required to calculate the output variance correctly is derived from the user-specified error level. In addition, the quadratic interpolation method is tailored for efficient calculation of the partial output variance.

Two analytical examples and a 44-D industrial example are used to verify the performance of the proposed variable screening method. The analytical examples show that at least a quadratic approximation is required for the 1-D surrogate model and that the partial output variance is a good measure that successfully identifies important variables. In the industrial example, 14 design variables out of 44 are selected by considering the output variances of the 11 constraints. For comparison, another set of 14 design variables selected based on experience is used. In addition, a set of 18 design variables is obtained by adding, to the 14 variables selected with the proposed method, four design variables that affect the objective function significantly while not affecting the output variances much. The selection based on experience achieves a 7.6 % cost reduction but violates the target probability of failure by 77 %. In contrast, the selection by the proposed method shows only a 12.3 % disagreement with the target value and a 3.6 % cost reduction, and the selection of 18 design variables shows an 11.7 % target disagreement with a 9.4 % cost reduction. Therefore, the performance of the proposed variable screening method is verified.