1 Introduction

Simulation-based design optimization under various uncertainties has been widely developed to pursue reliable optima for complex engineering systems. Two sources of uncertainty are typically quantified and managed in design optimization to prevent unpredictable failure of a system: aleatory uncertainty, which represents inherent and natural randomness that cannot be manipulated, and epistemic uncertainty, which is incurred by a lack of knowledge and can be reduced by additional data or more advanced theory (Der Kiureghian and Ditlevsen 2009). Both uncertainties have to be quantified and aggregated to guarantee the reliability of system performances. In other words, not only the natural randomness of variables and parameters but also imperfect knowledge and lack of data have been taken into account through effective uncertainty quantification and propagation in recent developments.

Various methodologies accounting for epistemic uncertainty in reliability-based design optimization (RBDO) have been prominent in determining safe designs. Conventional RBDO relies on complete statistical models of random input variables and on simulation models assumed to be accurate. Thus, research concentrating on how to compute the multi-dimensional integration (i.e., estimation of reliability) has been proposed, such as most probable point (MPP)-based methods (Tu et al. 1999; Lee et al. 2012; Meng et al. 2015; Kang et al. 2017; Jung et al. 2019a) and sampling-based methods (Lee et al. 2011; Dubourg and Sudret 2014; Kang et al. 2019; Moustapha and Sudret 2019). These methods mainly treat the aleatory uncertainty of random variables; thus, the uncertainty of a performance function propagated from the randomness of input variables has been widely investigated.

However, simulation model results may not be consistent with experimental results due to missing underlying physics and unknown model parameters, and statistical representations of input variables can also be inaccurate due to insufficient samples. As a result, epistemic uncertainty in simulation-based design optimization can generally be categorized into two types: (1) simulation model uncertainty and (2) input model uncertainty. Simulation model uncertainty includes model parameter uncertainty, model bias, and random error, and can be calibrated against experimental results using either Bayesian approaches or Gaussian processes (Noh et al. 2011; Jiang et al. 2013; Pan et al. 2016; Moon et al. 2017, 2018; Xi 2019). On the other hand, the uncertainty of input statistical models is also critical since estimating true input statistical information is difficult owing to the cost of testing enough samples to quantify the variability of random variables. Extensive research associated with the distribution of reliability has been developed to handle the input model uncertainty, which represents the uncertainty of input distribution parameters and types (Gunawan and Papalambros 2006; Youn and Wang 2008; Picheny et al. 2010; Sankararaman and Mahadevan 2011; Lee et al. 2013; McDonald et al. 2013; Yoo and Lee 2014; Nannapaneni and Mahadevan 2016; Peng et al. 2017; Hao et al. 2017; Ito et al. 2018; Moon et al. 2019). Recently, conservative RBDO (CRBDO) using Bayes' theorem was proposed (Gelman et al. 2013; Cho et al. 2016), and confidence-based design optimization (CBDO), which eliminates the double-loop MCS in CRBDO, was then developed (Jung et al. 2019b).

One notable feature of epistemic uncertainty, in comparison with aleatory uncertainty, is that it can be reduced by gathering more experimental and computational data and acquiring more knowledge from experts. Thus, new data should be acquired so as to have the maximum impact on reducing the epistemic uncertainty with the minimum resource under a given budget. In other words, resource allocation for each test is necessary to enable practical cost optimization considering epistemic and aleatory uncertainties. A few works developing test resource allocation in the context of model uncertainty have been proposed. Such allocation can provide an optimal scheme to distribute resources among different simulation models, especially in multidisciplinary design optimization (MDO) (Sankararaman et al. 2013). On the other hand, there has been research on the trade-off between the number of samples for MCS and the system objective since insufficient samples for numerical integration can lead to inaccurate reliability estimation (Bae et al. 2018, 2019). In spite of the aforementioned developments, research on the trade-off between the sample size for input variables and system cost considering the confidence of reliability has been quite limited.

In this paper, the costs of both the system objective and collecting additional data are simultaneously taken into consideration to allocate the optimal number of tests. It is noted that the optimum of CBDO is more conservative than that of RBDO, where input statistical models are assumed to be known. This means that insufficient input data cause a loss in the objective function of CBDO. However, acquiring new samples for input variables also demands engineering resources. That is, there is a trade-off between shifting the design vector, which affects the system objective, and reducing the epistemic uncertainty of the input statistical models, which incurs the cost of additional samples. Following previous studies (Bae et al. 2018), the system objective is defined as an operating cost, and the cost of collecting new samples for input variables is defined as a development cost. Acquiring new data may decrease the operating cost by reducing the epistemic uncertainty, but the development cost increases. Hence, the objective function in CBDO should also include the development cost to find a practically balanced optimum in real engineering applications. The proposed approach is called bi-objective CBDO; it determines the optimal sample size for input variables by balancing the development and operating costs. Note that the operating cost is affected only by random design variables, whereas the development cost involves both random design variables and random parameters. Consequently, the following question can be answered by the proposed bi-objective CBDO: how should the sample size for input variables be determined in RBDO to minimize the overall cost?

The remainder of the paper is organized as follows: brief reviews of CBDO, including quantification of input distribution parameters and estimation of confidence in the parameter space, are presented in Sect. 2. The proposed bi-objective CBDO, covering the uncertainty of the input distribution type, the formulation, and the sensitivity analysis of confidence with respect to the sample size for input variables, is developed in Sect. 3. In Sect. 4, the proposed bi-objective CBDO is applied to numerical examples under various configurations to verify its feasibility and effectiveness. In addition, the bi-objective CBDO applied to a 4-beam multi-scale composite frame is given in Sect. 5. Finally, conclusions are presented in Sect. 6.

2 Review of confidence-based design optimization under insufficient input data

In this section, brief reviews of the confidence of reliability and of CBDO using the reliability measure approach (RMA) under insufficient input samples are given (Jung et al. 2019b). The confidence is defined as the probability that the reliability is larger than a user-specified target reliability under epistemic uncertainty, and the reliability is defined as the probability that a limit-state function is less than zero, considering only aleatory uncertainty (e.g., the randomness of input variables). CBDO quantifies the confidence of reliability induced by uncertainties of input statistical models such as input distribution types and parameters.

2.1 Uncertainty of input distribution model

Uncertain input statistical models have been taken into consideration using a Bayesian approach with uncertain input distribution parameters and types. However, to eliminate the uncertainty of input distribution types and thereby avoid discrete random variables, this research assumes that the input distribution types are decided deterministically, so as to concentrate on the proposed bi-objective optimization. Thus, only the uncertainties of input distribution parameters affect the uncertainty of the output reliability. In a recent study on CBDO, kernel density estimation (KDE) is exploited to construct the input distribution with varying parameters when input distribution types are unknown (Jung et al. 2019b). Determination of the input distribution type when only sparse data are available is briefly discussed in Sect. 3.1.

The variance and mean of the input variables are quantified as inverse-gamma and normal distributions, respectively, based on the normality assumption (Cho et al. 2016):

$$ {\sigma}_i^2\mid {{}^{\ast}\mathbf{x}}_i\sim \mathrm{IG}\left(\frac{N{D}_i-1}{2},\frac{\left(N{D}_i-1\right){s}_i^2}{2}\right) $$
(1)

and

$$ {\mu}_i\mid {\sigma}_i^2,{{}^{\ast}\mathbf{x}}_i\sim N\left({{}^{\ast}\overline{x}}_i,\sqrt{\frac{\sigma_i^2}{N{D}_i}}\right) $$
(2)

where \( {{}^{\ast}\mathbf{x}}_i \), NDi, \( {s}_i^2 \), and \( {{}^{\ast}\overline{x}}_i \) are the given input dataset, the sample size, the sample variance, and the sample mean of the ith random variable, respectively. The normal distribution of μi in (2) is conditional on the realization of the variance from (1). As a result, the variability of the input distribution parameters is determined by the sample mean, the sample variance, and the sample size. Note that the sample mean of the random design variables is equivalent to the initial design point (i.e., the mean vector) in CBDO when evaluating the cost function, and, as the design moves, all the samples are shifted by the same amount, meaning that the sample variance is fixed during the design optimization (Cho et al. 2016). However, the sample mean of the random parameters is invariant and is directly used in the optimization process.
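As a minimal sketch, the two-level sampling of (1) and (2) can be reproduced in NumPy. The dataset and sample size below are hypothetical, and the inverse-gamma draw is obtained as the reciprocal of a gamma variate:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(10.0, 2.0, size=15)      # hypothetical sparse dataset *x_i
nd = len(data)
xbar, s2 = data.mean(), data.var(ddof=1)   # sample mean and sample variance

# Eq. (1): sigma^2 | *x_i ~ IG((ND_i - 1)/2, (ND_i - 1) s_i^2 / 2)
a, b = (nd - 1) / 2.0, (nd - 1) * s2 / 2.0
sigma2 = b / rng.gamma(a, 1.0, size=100_000)   # inverse-gamma via reciprocal gamma

# Eq. (2): mu | sigma^2, *x_i ~ N(xbar, sqrt(sigma^2 / ND_i)), conditional on each sigma^2
mu = rng.normal(xbar, np.sqrt(sigma2 / nd))
```

Each pair (σᵢ², μᵢ) drawn this way is one realization of the input distribution parameters, consistent with the conditional structure of (1) and (2).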

2.2 Confidence of reliability

The confidence of reliability is computed using the cumulative distribution function (CDF) of reliability as a function of the input distribution parameters when the input distribution types are known. The confidence of satisfying the given reliability, denoted as Re, is expressed as (Cho et al. 2016; Moon et al. 2018)

$$ \mathrm{CL}\left(\operatorname{Re}|\overset{\ast }{\mathbf{x}}\right)\equiv 1-{F}_{\mathrm{Re}}\left(\operatorname{Re}|\overset{\ast }{\mathbf{x}}\right)=1-{\int}_0^{\mathrm{Re}}{\int}_{\varOmega_{\boldsymbol{\Psi}}}f\left(\phi, \boldsymbol{\uppsi} |\overset{\ast }{\mathbf{x}}\right)d\boldsymbol{\uppsi} d\phi $$
(3)

where ϕ is a variable corresponding to the reliability, and ψ denotes a vector of input distribution parameters. The joint probability density function (PDF) in (3) is decomposed using Bayes' theorem as

$$ f\left(\phi, \boldsymbol{\uppsi} |\overset{\ast }{\mathbf{x}}\right)=f\left(\phi |\boldsymbol{\uppsi}, \overset{\ast }{\mathbf{x}}\right)f\left(\boldsymbol{\uppsi} |\overset{\ast }{\mathbf{x}}\right) $$
(4)

Therefore, the confidence of reliability can be estimated through uncertainty quantification and propagation from the uncertainty of input statistical models quantified by the given input dataset.
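To illustrate (3) and (4), the confidence can be estimated by an outer MCS over parameter realizations drawn from (1) and (2), with the inner reliability evaluated in closed form for a normal variable. The limit state G(X) = X − 13, the dataset, and all numerical settings below are hypothetical assumptions for the sketch:

```python
import math
import numpy as np

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

rng = np.random.default_rng(1)
data = rng.normal(10.0, 1.0, size=20)        # hypothetical input samples *x
nd, xbar, s2 = len(data), data.mean(), data.var(ddof=1)

re_target = 0.95                             # given reliability level Re
a, b = (nd - 1) / 2.0, (nd - 1) * s2 / 2.0

# Outer loop: realizations of (sigma^2, mu) from Eqs. (1) and (2)
sigma2 = b / rng.gamma(a, 1.0, size=20_000)
mu = rng.normal(xbar, np.sqrt(sigma2 / nd))

# Inner reliability of G(X) = X - 13 <= 0 is Phi((13 - mu)/sigma) for normal X
re = np.array([std_normal_cdf((13.0 - m) / math.sqrt(v))
               for m, v in zip(mu, sigma2)])

# CL(Re | *x) in Eq. (3): fraction of parameter realizations with reliability >= Re
confidence = float(np.mean(re >= re_target))
```

This is exactly the double-loop structure that CRBDO requires and that RMA, reviewed next, is designed to avoid.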

2.3 Reliability measure approach

Recently, the reliability measure approach (RMA) for CBDO has been proposed to alleviate the computational demands required to compute the confidence of reliability (Jung et al. 2019b). The key concept of RMA is identical to that of the performance measure approach (PMA) in RBDO: the multi-dimensional integration is approximated at the MPP. In RMA, the MPP-search is performed in P-space, where the input distribution parameters are transformed to the standard normal space through the Rosenblatt transformation (Rosenblatt 1952). The MPP of RMA is the point with the smallest reliability on the given hypersphere in P-space. The RMA formulation is given by

$$ {\displaystyle \begin{array}{l}\underset{\mathbf{p}}{\operatorname{minimize}}\kern0.62em \operatorname{Re}\left(G\left(\mathbf{X}\right)|\boldsymbol{\upmu}, {\boldsymbol{\upsigma}}^2\right)\\ {}\;\mathrm{subject}\ \mathrm{to}\kern0.5em \left\Vert \mathbf{p}\right\Vert ={\varPhi}^{-1}\left({\mathrm{CL}}_{\mathrm{Target}}\right)\\ {}\kern1.08em \mathrm{where}\kern2em \operatorname{Re}\left(G\left(\mathbf{X}\right)|\boldsymbol{\upmu}, {\boldsymbol{\upsigma}}^2\right)\cong \frac{1}{N_{MCS}}\sum \limits_{j=1}^{N_{MCS}}{\mathrm{I}}_{\varOmega_{\mathrm{R}}}\left(G\left({\mathbf{x}}^{(j)}\right)\right)\\ {}\kern3.119999em {\mathrm{I}}_{\varOmega_{\mathrm{R}}}\left(G\left(\mathbf{x}\right)\right)=\left\{\begin{array}{l}1,\kern0.96em G\left(\mathbf{x}\right)\le 0\\ {}0,\kern0.84em \mathrm{otherwise}\end{array}\right.\\ {}\kern3.239999em \varPhi \left({p}_{2i-1}\right)={F}_{\sigma_i^2}\left({\sigma}_i^2|{{}^{\ast}\mathbf{x}}_i\right)\\ {}\varPhi \left({p}_{2i}\right)={F}_{\mu_i}\left({\mu}_i|{\sigma}_i^2,{{}^{\ast}\mathbf{x}}_i\right)\kern0.84em \mathrm{for}\kern0.24em i=1,2,...,N\;\end{array}} $$
(5)

In (5), Re(G(X)| μ, σ2) is the reliability for the given input distribution parameters, and CLTarget is the target confidence to be satisfied in CBDO. The indicator function for the reliable domain, \( {\mathrm{I}}_{\varOmega_{\mathrm{R}}}\left(G\left({\mathbf{x}}^{(j)}\right)\right) \), judges whether the jth sampling point is in the reliable region or not. NMCS is the number of samples for MCS, and N is the number of input random variables, including both design variables and parameters. The random variables are assumed to be independent in this study, but the formulation can easily be extended to correlated random variables. It can be seen that the two input parameter distributions for the ith random variable, denoted as \( {F}_{\mu_i}\left({\mu}_i|{\sigma}_i^2,{{}^{\ast}\mathbf{x}}_i\right) \) and \( {F}_{\sigma_i^2}\left({\sigma}_i^2|{{}^{\ast}\mathbf{x}}_i\right) \), are transformed to the standard normal distribution (Rosenblatt 1952). Thus, the number of random variables in P-space becomes 2N since a two-parameter distribution is assumed for the input random variables. Details regarding (5) and the concept of RMA are found in the literature (Jung et al. 2019b). In addition, sensitivity analysis for reliability in P-space has been developed using the first-order score function. Therefore, the optimization in (5), which finds the input parameter realization having the minimum reliability in P-space, can be performed employing any gradient-based MPP-search algorithm.
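The last two conditions in (5) map standard normal coordinates p to realizations of (σᵢ², μᵢ) through the inverse CDFs of (1) and (2). A sketch of this Rosenblatt transformation for a single variable, using SciPy's `invgamma` and `norm` and hypothetical sample statistics, is:

```python
import numpy as np
from scipy.stats import invgamma, norm

# Hypothetical sample statistics of one input random variable
nd, xbar, s2 = 20, 10.0, 1.0
a, b = (nd - 1) / 2.0, (nd - 1) * s2 / 2.0

def p_to_parameters(p1, p2):
    """Map P-space coordinates (p_{2i-1}, p_{2i}) to (sigma_i^2, mu_i) per Eq. (5)."""
    sigma2 = invgamma.ppf(norm.cdf(p1), a, scale=b)   # inverse of the Eq. (1) CDF
    mu = norm.ppf(norm.cdf(p2), loc=xbar,
                  scale=np.sqrt(sigma2 / nd))         # inverse of the Eq. (2) CDF
    return sigma2, mu

# The origin of P-space maps to the median parameter realization
sigma2_mpp, mu_mpp = p_to_parameters(0.0, 0.0)
```

During the MPP-search, a gradient-based optimizer moves p on the hypersphere ‖p‖ = Φ⁻¹(CL_Target) and evaluates the reliability at each mapped parameter realization.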

2.4 Confidence-based design optimization

Instead of directly estimating the confidence of reliability at each design point, RMA judges whether the probabilistic constraint is satisfied or violated, following the concept of PMA. RMA thus facilitates efficient CBDO by checking only the reliability at the MPP obtained from MCS. CBDO is formulated as

$$ {\displaystyle \begin{array}{l}\kern1.08em \mathrm{given}\kern0.36em {}^{\ast}\mathbf{x}\\ {}\underset{\mathbf{d}}{\operatorname{minimize}}\kern0.36em \mathrm{cost}\left(\mathbf{d}\right)\\ {}\mathrm{subject}\ \mathrm{to}\kern0.36em \operatorname{Re}\left({G}_j\left(\mathbf{X}|{\boldsymbol{\uppsi}}_j^{\ast}\right)\right)-{\operatorname{Re}}_j^{\mathrm{Target}}\ge 0\kern0.48em \mathrm{for}\kern0.24em j=1,2,..., nc\end{array}} $$
(6)

where \( {\boldsymbol{\uppsi}}_j^{\ast } \) is the MPP of the jth constraint in P-space obtained from (5); \( {\operatorname{Re}}_j^{\mathrm{Target}} \) is the user-specified target reliability for the jth constraint; d is the mean vector of the random design variables among the random vector X; and nc is the number of constraints. The sensitivity with respect to the design point (i.e., the mean vector of the random design variables) is obtained from the gradient at the MPP. To avoid repetitive reliability computations for numerical sensitivity analysis, the gradient vector of reliability at the MPP is obtained using the first-order score function, whose derivation is given in the literature (Lee et al. 2010; Jung et al. 2019b).

3 Bi-objective confidence-based design optimization

Epistemic uncertainty caused by insufficient input data yields uncertainty in the performance reliability: different reliabilities may be obtained under different realizations of the uncertain input models. Since CBDO takes the uncertainty of input distribution parameters into account, a more conservative optimum is obtained compared with the RBDO optimum, which leads to a loss in the objective function of the system, defined as an operating cost. That is, the operating cost at the RBDO optimum without input model uncertainty is always less than that of the CBDO optimum, which compensates for the input model uncertainty. Therefore, reducing the epistemic uncertainty by increasing the sample size for input variables saves the operating cost of the system, while it increases the development cost. If so, how should the sample size for input variables in RBDO be determined to minimize the overall cost?

To answer the question above, we present a new bi-objective CBDO accounting for both the operating cost and the development cost simultaneously during the design optimization. There are two ways to satisfy the confidence constraints in CBDO: (1) shifting the design vector toward the feasible region, which increases the operating cost, and (2) adding more input samples to reduce the epistemic uncertainty, which increases the development cost. Once the development cost for testing samples of each variable and the relative weights between the two costs are quantified and aggregated, a balanced optimum can be estimated through the proposed bi-objective CBDO, where both the mean vector and the sample size for input variables are included in the design vector to be optimized.

The purpose of the proposed method is to provide an estimated optimal sample size under given conditions before actually acquiring samples. In other words, the optimal sample size for each input variable is estimated through initial sample estimates since the true parameters cannot be assessed without a large number of samples. Therefore, once the proposed bi-objective CBDO provides the optimal amount of additional data and the new samples are obtained, designers have to conduct CBDO again with the new input data to validate the results.

3.1 Uncertainty on input distribution type

Uncertainties of input distribution parameters should be decoupled from uncertainties of input distribution types to employ the RMA framework for CBDO. The discrete uncertainty of the input distribution type is difficult to handle compared with that of the input distribution parameters. Since the proposed research assumes that the input distribution types are known, this section briefly explains several ways to select appropriate input distribution types based on given information.

  1. Model identification method

    Model identification methods could be practical when multiple distribution candidates are given (Kang et al. 2016). For instance, various goodness-of-fit (GOF) tests, such as the Kolmogorov-Smirnov (K-S) test, can verify the suitability of a candidate distribution. On the other hand, a model selection method is capable of ranking multiple candidate distributions based on a specific criterion, so that the distribution type best fitting the given data can be selected. The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are typical criteria to measure the fitness (Akaike 1974).

  2. Johnson distribution

    The Johnson distribution is a four-parameter distribution that has the flexibility to represent a wide range of different distributions and is used to resolve the difficulty caused by uncertainty in distribution types (Johnson 1949). Since the Johnson distribution includes four parameters, higher moments such as skewness and kurtosis need to be incorporated into the RMA framework, which increases the dimension from 2N to 4N in P-space, where N is the number of random variables with insufficient samples.

  3. Kernel density estimation

    Kernel density estimation (KDE) is an alternative to a parametric distribution when there is no prior knowledge of the distribution type (Silverman 2018). As used in previous works on CBDO, an explicit PDF is obtained from the given sample data. It is not necessary to determine a specific distribution type, and the accuracy gradually increases as the sample size of the input variable increases. In addition, when input variables follow a multimodal distribution, KDE can be the best option to handle the input distribution type. Readers can refer to the literature (Jung et al. 2019b) for detailed explanations on exploiting KDE in CBDO.

It is difficult to assert that one method is superior since the effectiveness of each varies with the given conditions, and a thorough comparison of the three methods is beyond the purpose of this research. In this paper, the Johnson distribution is used for comparison since the model identification method and KDE have already been validated in previous studies (Kang et al. 2016; Jung et al. 2019b). For the comparison, it is assumed that the first two moments of the Johnson distribution have the uncertainties given by (1) and (2), while the skewness and kurtosis are estimated deterministically.
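For the KDE option above, a minimal Gaussian-kernel sketch in NumPy, using a hypothetical dataset and Silverman's rule-of-thumb bandwidth, is:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(0.0, 1.0, size=30)   # hypothetical sparse input samples

# Silverman's rule-of-thumb bandwidth for a Gaussian kernel
h = 1.06 * data.std(ddof=1) * len(data) ** (-1.0 / 5.0)

def kde_pdf(x):
    """Explicit PDF: average of Gaussian kernels centered at the samples."""
    z = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))

grid = np.linspace(-5.0, 5.0, 1001)
area = float(kde_pdf(grid).sum() * (grid[1] - grid[0]))  # should be close to 1
```

As more samples are added, the estimated PDF converges toward the underlying density without ever committing to a parametric family, which is the property exploited in the CBDO literature cited above.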

3.2 Formulation of bi-objective confidence-based design optimization

The proposed bi-objective CBDO is formulated as

$$ {\displaystyle \begin{array}{l}\underset{\mathbf{d},\mathbf{ND}}{\operatorname{minimize}}\kern0.36em {w}_{\mathbf{d}}\times {\mathrm{cost}}_{\mathrm{operating}}\left(\mathbf{d}\right)+{w}_{\mathbf{ND}}\times {\mathrm{cost}}_{\mathrm{development}}\left(\mathbf{ND}\right)\\ {}\mathrm{subject}\ \mathrm{to}\kern0.36em \Pr \left[\Pr \left[G\left(\mathbf{X}\right)\le 0\right]\ge {\operatorname{Re}}_{\mathrm{Target}}\right]\ge {\mathrm{CL}}_{\mathrm{Target}}\\ {}\kern0.96em \mathrm{where}\kern0.36em {X}_i\sim \Big\{\begin{array}{l}{\zeta}_i\left({\mu}_i\left({d}_i,N{D}_i,{\sigma}_i^2\right),{\sigma}_i^2\left(N{D}_i,{s}_i\right)\right)\kern0.48em \mathrm{for}\kern0.24em i=1,...,{N}_d\\ {}{\zeta}_i\left({\mu}_i\left({{}^{\ast}\overline{x}}_i,N{D}_i,{\sigma}_i^2\right),{\sigma}_i^2\left(N{D}_i,{s}_i\right)\right)\mathrm{for}\kern0.24em i={N}_d+1,...,{N}_d+{N}_p\end{array}\end{array}} $$
(7)

where wd and wND are the weights for the operating and development costs, respectively; \( \mathbf{d}={\left\{{d}_1,{d}_2,...,{d}_{N_d}\right\}}^T \) is the mean vector of the random design variables, and \( \mathbf{ND}={\left\{N{D}_1,N{D}_2,...,N{D}_{N_d+{N}_p}\right\}}^T \) is the vector of sample sizes for the input variables, including both random design variables and random parameters; X includes random design variables and random parameters; NDi, \( {s}_i^2 \), \( {{}^{\ast}\overline{x}}_i \), and di are the sample size, sample variance, sample mean, and design value of the ith random variable, respectively; Nd is the number of random design variables and Np is the number of random parameters. In (7), it is assumed that all input variables have input model uncertainty. The variabilities of the mean and variance are given in (1) and (2), and the distribution type is denoted as ζ.

There are two major differences between conventional CBDO and the proposed bi-objective CBDO:

  1. The sample size for the input variables, denoted as ND, is a design variable in the proposed bi-objective CBDO. Even though the sample size has to be an integer, it is treated as a continuous positive variable to employ a gradient-based optimization algorithm and sensitivity analysis. That is, not only the mean vector but also the sample size for input variables can change the confidence of reliability. It should be noted that the sample sizes for the random parameters are also included in ND.

  2. The development cost, which is a function of ND, is included in the objective function, as can be seen in (7). Figure 1 shows four examples of the functional relationship between the development cost and the sample size for input variables in the case of a one-dimensional problem: linear, logarithmic, high-order polynomial, and exponential. As the sample size increases, all functions seem to converge to a point; however, that is only for better visualization.

Fig. 1 Four examples of the relationship between development cost and number of data

Therefore, to perform the proposed bi-objective CBDO in (7), the relative weights wd and wND between the operating cost and the development cost need to be quantified. In addition, functional forms of the operating cost, primarily related to the design performance, and of the development cost, as a function of the sample size for input variables, need to be established. The formulations of the operating and development costs obviously depend on the application, and how to quantify each cost is beyond the purpose of the proposed research.
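To make the aggregation in (7) concrete, the sketch below combines a hypothetical quadratic operating cost with the four development-cost shapes of Fig. 1; all coefficients, weights, and function names are assumptions for illustration, not the paper's specific cost models:

```python
import numpy as np

# Hypothetical development-cost models as functions of the sample size ND
dev_costs = {
    "linear":      lambda nd: 1.0 * nd,
    "logarithmic": lambda nd: 20.0 * np.log(nd),
    "polynomial":  lambda nd: 0.01 * nd**2,
    "exponential": lambda nd: 5.0 * np.exp(0.05 * nd),
}

def operating_cost(d):
    # hypothetical operating cost depending only on the mean design vector d
    return (d[0] - 3.0) ** 2 + (d[1] - 2.0) ** 2

def total_cost(d, nd, w_d=1.0, w_nd=0.1, shape="linear"):
    """Weighted objective of Eq. (7): w_d * cost_op(d) + w_ND * cost_dev(ND)."""
    return w_d * operating_cost(d) + w_nd * dev_costs[shape](nd)

cost = total_cost(np.array([2.0, 2.0]), 25.0)  # -> 1.0 + 0.1 * 25 = 3.5
```

Any of the four shapes is monotonically increasing in ND, so the optimizer trades the saving in operating cost from reduced epistemic uncertainty against the growth of the development cost.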

3.3 Sensitivity analysis of confidence with respect to design variables

Sensitivity analysis is an essential process in gradient-based optimizers to provide an accurate and efficient search direction toward the optimum. Since the RMA framework is adopted, the sensitivity with respect to the design vector is identical to the gradient of reliability with respect to the mean vector and the sample size at the MPP (Jung et al. 2019b). The stochastic sensitivity analysis of reliability with respect to the mean vector can be performed analytically using the first-order score functions (Lee et al. 2011). Therefore, the sensitivity of a confidence constraint with respect to the ith design variable is obtained from the gradient of reliability at the MPP, written as

$$ \frac{\partial \operatorname{Re}}{\partial {d}_i}\bigg|_{\boldsymbol{\uppsi} ={\boldsymbol{\uppsi}}^{\ast }}=\frac{\partial \operatorname{Re}}{\partial {\mu}_i}\bigg|_{\boldsymbol{\uppsi} ={\boldsymbol{\uppsi}}^{\ast }}=\int {\mathrm{I}}_{\varOmega_{\mathrm{R}}}\left(G\left(\mathbf{x}\right)\right)\frac{\partial \ln {f}_{\mathbf{x}}\left(\mathbf{x};{\boldsymbol{\uppsi}}^{\ast}\right)}{\partial {\mu}_i}{f}_{\mathbf{x}}\left(\mathbf{x};{\boldsymbol{\uppsi}}^{\ast}\right)d\mathbf{x} $$
(8)

where \( \frac{\partial \ln {f}_{\mathbf{x}}\left(\mathbf{x};{\boldsymbol{\uppsi}}^{\ast}\right)}{\partial {\mu}_i} \) is the first-order score function with respect to the mean of the ith random variable, and ψ∗ is the MPP in P-space, which represents the specific realization of the input distribution parameters obtained from (5). The first-order score functions for parametric distributions have been explicitly derived in the literature (Lee et al. 2011). However, the sensitivity of reliability with respect to the sample size for each input variable is also necessary.

In addition to (8), the proposed optimization demands additional sensitivity analysis with respect to ND formulated as

$$ {\displaystyle \begin{array}{l}\frac{\partial \operatorname{Re}}{\partial N{D}_i}=\int {\mathrm{I}}_{\varOmega_{\mathrm{R}}}\left(G\left(\mathbf{x}\right)\right)\frac{d\ln {f}_{\mathbf{x}}\left(\mathbf{x};\boldsymbol{\uppsi} \right)}{dN{D}_i}{f}_{\mathbf{x}}\left(\mathbf{x};\boldsymbol{\uppsi} \right)d\mathbf{x}\\ {}\kern2.28em =\int {\mathrm{I}}_{\varOmega_{\mathrm{R}}}\left(G\left(\mathbf{x}\right)\right)\left(\frac{\partial \ln {f}_{\mathbf{x}}\left(\mathbf{x};\boldsymbol{\uppsi} \right)}{\partial {\mu}_i}\frac{d{\mu}_i}{dN{D}_i}+\frac{d\ln {f}_{\mathbf{x}}\left(\mathbf{x};\boldsymbol{\uppsi} \right)}{d{\sigma}_i^2}\frac{\partial {\sigma}_i^2}{\partial N{D}_i}\right){f}_{\mathbf{x}}\left(\mathbf{x};\boldsymbol{\uppsi} \right)d\mathbf{x}\\ {}\kern2.28em =\int {\mathrm{I}}_{\varOmega_{\mathrm{R}}}\left(G\left(\mathbf{x}\right)\right)\Big[\frac{\partial \ln {f}_{\mathbf{x}}\left(\mathbf{x};\boldsymbol{\uppsi} \right)}{\partial {\mu}_i}\left(\frac{\partial {\mu}_i}{\partial N{D}_i}+\frac{d{\mu}_i}{d{\sigma}_i^2}\frac{\partial {\sigma}_i^2}{\partial N{D}_i}\right)\\ {}\kern4.319998em +\left(\frac{\partial \ln {f}_{\mathbf{x}}\left(\mathbf{x};\boldsymbol{\uppsi} \right)}{\partial {\sigma}_i^2}+\frac{\partial \ln {f}_{\mathbf{x}}\left(\mathbf{x};\boldsymbol{\uppsi} \right)}{\partial {\mu}_i}\frac{d{\mu}_i}{d{\sigma}_i^2}\right)\frac{\partial {\sigma}_i^2}{\partial N{D}_i}\Big]{f}_{\mathbf{x}}\left(\mathbf{x};\boldsymbol{\uppsi} \right)d\mathbf{x}\end{array}} $$
(9)

where ψ includes the mean and variance of the input variables since a two-parameter distribution is assumed in this study.

To further simplify (9), \( \frac{d{x}_i}{d{p}_i} \) is obtained by taking the derivative of the CDF with respect to pi, which is either the mean or the variance, as (Cho et al. 2017)

$$ \frac{d{x}_i}{d{p}_i}=-\frac{1}{f_{X_i}\left({x}_i;{a}_i,{b}_i\right)}\frac{\partial }{\partial {p}_i}{F}_{X_i}\left({x}_i;{a}_i,{b}_i\right) $$
(10)

where ai and bi are distribution parameters expressed by mean and variance. Using (10), \( \frac{\partial {\mu}_i}{\partial N{D}_i} \) in (9) is derived as

$$ {\displaystyle \begin{array}{l}\frac{\partial {\mu}_i}{\partial N{D}_i}=\frac{\partial {\mu}_i}{\partial {\sigma}_{\mu_i}^2}\frac{\partial {\sigma}_{\mu_i}^2}{\partial N{D}_i}=-\frac{\left(-N{D}_i\left({\mu}_i-{{}^{\ast}\overline{x}}_i\right)/{\sigma}_i^2\right)}{f_{\mu}\left({\mu}_i;{{}^{\ast}\overline{x}}_i,\sqrt{\sigma_i^2/N{D}_i}\right)}\times \phi \left(\frac{\mu_i-{{}^{\ast}\overline{x}}_i}{\sqrt{\sigma_i^2/N{D}_i}}\right)\times \left(-\frac{1}{2}\sqrt{\sigma_i^2/N{D}_i^3}\right)\\ {}\kern5.519997em =-\frac{1}{2}\left(\frac{\mu_i-{{}^{\ast}\overline{x}}_i}{N{D}_i}\right)\end{array}} $$
(11)

where \( {\sigma}_{\mu_i}^2 \) is the variance of μi, whose standard deviation is \( \sqrt{\frac{\sigma_i^2}{N{D}_i}} \), distinguished from \( {\sigma}_i^2 \), and fμ is the PDF of the mean described in (2). Similarly,

$$ {\displaystyle \begin{array}{l}\frac{\partial {\sigma}_i^2}{\partial N{D}_i}=\frac{\partial {\sigma}_i^2}{\partial \alpha}\frac{\partial \alpha }{\partial N{D}_i}+\frac{\partial {\sigma}_i^2}{\partial \beta}\frac{\partial \beta }{\partial N{D}_i}\\ {}\kern1.68em =-\frac{1}{2}\frac{1}{f_{\sigma^2}\left({\sigma}_i^2;\alpha, \beta \right)}\times \Big[\frac{\left(\ln \left(\beta /{\sigma}_i^2\right)-{\psi}_0\left(\alpha \right)\right)\varGamma \left(\alpha, \beta /{\sigma}_i^2\right)+T\left(3,\alpha, \beta /{\sigma}_i^2\right)}{\varGamma \left(\alpha \right)}\;\\ {}\kern7.919994em +\left(-\frac{e^{-\beta /{\sigma}_i^2}{\left(\beta /{\sigma}_i^2\right)}^{\alpha }}{\beta \varGamma \left(\alpha \right)}\right){s}_i^2\Big]\end{array}} $$
(12)

where T(•,  • , •) is a Meijer G-function; \( \varGamma \left(s,x\right)={\int}_x^{\infty }{t}^{s-1}{e}^{-t} dt \) is the upper incomplete gamma function; and \( \alpha =\frac{N{D}_i-1}{2} \) and \( \beta =\frac{\left(N{D}_i-1\right){s}_i^2}{2} \) are the shape and scale parameters of the inverse-gamma distribution of \( {\sigma}_i^2 \), respectively. \( {f}_{\sigma^2} \) is the PDF of the variance described in (1). Since μi and \( {\sigma}_i^2 \) are correlated, as can be seen from (2), \( \frac{d{\mu}_i}{d{\sigma}_i^2} \) is obtained by taking the derivative of (2) as

$$ {\displaystyle \begin{array}{l}\frac{d{\mu}_i}{d{\sigma}_i^2}=-\frac{\left(-N{D}_i\left({\mu}_i-{{}^{\ast}\overline{x}}_i\right)/{\sigma}_i^2\right)}{f_{\mu}\left({\mu}_i;{{}^{\ast}\overline{x}}_i,\sqrt{\sigma_i^2/N{D}_i}\right)}\times \phi \left(\frac{\mu_i-{{}^{\ast}\overline{x}}_i}{\sqrt{\sigma_i^2/N{D}_i}}\right)\times \frac{1}{2{\sigma}_i\sqrt{N{D}_i}}\\ {}\kern1.44em =\frac{1}{2}\left(\frac{\mu_i-{{}^{\ast}\overline{x}}_i}{\sigma_i^2}\right)\end{array}} $$
(13)

Finally, the sensitivity in (9) can be calculated through (11) to (13). Note that \( {{}^{\ast}\overline{x}}_i \) is replaced with the ith entry of the design vector, di, when the ith random variable is a design variable rather than a random parameter.
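The closed forms in (11) and (13) can be checked numerically: for a fixed standard normal quantile z, (2) gives μᵢ = ∗x̄ᵢ + z·sqrt(σᵢ²/NDᵢ), and central finite differences should reproduce both derivatives. The numerical values below are hypothetical:

```python
import math

# Hypothetical values: sample mean, fixed quantile, variance realization, sample size
xbar, z, sigma2, nd = 5.0, 1.3, 4.0, 30.0

def mu(sigma2, nd):
    # Realization of Eq. (2) for a fixed standard normal quantile z
    return xbar + z * math.sqrt(sigma2 / nd)

m = mu(sigma2, nd)
h = 1e-6

# Eq. (11): d(mu)/d(ND) = -(mu - xbar) / (2 ND)
analytic_nd = -0.5 * (m - xbar) / nd
fd_nd = (mu(sigma2, nd + h) - mu(sigma2, nd - h)) / (2.0 * h)

# Eq. (13): d(mu)/d(sigma^2) = (mu - xbar) / (2 sigma^2)
analytic_s2 = 0.5 * (m - xbar) / sigma2
fd_s2 = (mu(sigma2 + h, nd) - mu(sigma2 - h, nd)) / (2.0 * h)
```

Both finite differences agree with the analytic expressions, confirming that (11) and (13) follow directly from the conditional distribution in (2).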

Through the sensitivity analysis in this section, the optimizer can explore the next design candidates. The sensitivity analysis with respect to the mean vector and the sample size in this section should be distinguished from the sensitivity analysis during the MPP-search in P-space: the former is performed after the MPP has been found from (5).

3.4 Overall procedures

As shown in Fig. 2, the overall procedure of the proposed bi-objective CBDO is very similar to PMA of RBDO except for the Rosenblatt transformation to P-space and the sensitivity analysis with respect to the sample size for input variables. The identification of the input distribution type in Fig. 2 means determining the distribution type among parametric distributions, the Johnson distribution, and KDE. In the bi-objective CBDO, the MPP in P-space is found at the current design point as in (5) by an MPP-search algorithm, and a gradient-based optimizer such as sequential quadratic programming (SQP) provides an appropriate search direction and step size for searching the optimal design vector as in (7).

Fig. 2 Flowchart of the proposed bi-objective CBDO

It should be noted that two main assumptions are made for the bi-objective CBDO: (1) the confidence of reliability is treated in an MPP-based approach as in conventional CBDO, so that the linearization of reliability with respect to the input distribution parameters is employed, and (2) the sample estimates used to quantify the input parameter distributions are invariant during the optimization since there is no additional knowledge to change them. In other words, the variabilities of the input distribution parameters are affected only by the mean vector d and the sample size ND since the input dataset is not actually updated during the bi-objective CBDO even though ND is changed.
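The reason adding samples buys confidence can be made concrete through the posterior of the input variance. As a minimal sketch using the shape parameter \( \alpha=(ND_i-1)/2 \) defined after (12) (and assuming \( ND_i>5 \) so the second moment exists), the coefficient of variation of the inverse-gamma posterior of \( \sigma_i^2 \) depends only on \( \alpha \) and shrinks as the sample size grows:

```python
# Coefficient of variation of the inverse-gamma posterior of sigma_i^2 with
# shape alpha = (ND - 1)/2; for an inverse-gamma, CV = 1/sqrt(alpha - 2),
# independent of the scale beta (hence of s_i^2). ND values are illustrative.
import math

def posterior_cv(nd):
    alpha = (nd - 1) / 2.0
    return 1.0 / math.sqrt(alpha - 2.0)  # requires alpha > 2, i.e., ND > 5

for nd in (10, 30, 100):
    print(nd, round(posterior_cv(nd), 3))
# the spread of the variance posterior shrinks roughly like 1/sqrt(ND)
```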

4 Numerical studies: 2D mathematical example

A 2D mathematical optimization problem widely used in previous RBDO studies is analyzed in various ways. Firstly, the proposed stochastic sensitivity analysis with respect to the sample size for input variables in Sect. 3.3 is validated. Secondly, the 2D bi-objective CBDO is tested with various weights for operating and development costs, and the discrepancies from the true optimum are calculated to capture the error due to the bias of the sample estimates. Thirdly, the Johnson distribution is used to assess the error compared with the results when the true distribution type is identified. Fourthly, various types of development cost functions are tested for comparison. Finally, repeated tests with various initial input datasets are performed.

The bi-objective CBDO for the 2D example is formulated as

$$ \begin{aligned}&\underset{\mathbf{d},\,\mathbf{ND}}{\operatorname{minimize}} && w\times\left(-\frac{\left(d_1+d_2-10\right)^2}{30}-\frac{\left(d_1-d_2+10\right)^2}{120}\right)+\mathrm{cost}_{\mathrm{development}}\left(ND_1,ND_2\right)\\ &\mathrm{subject\ to} && \Pr\left[\Pr\left[G_j\left(\mathbf{X}\right)\le 0\right]\ge 97.72\%\right]\ge 90.00\%\quad\mathrm{for}\ j=1,2,3\\ &\mathrm{where} && G_1\left(\mathbf{X}\right)=1-\frac{X_1^2X_2}{20},\\ &&& G_2\left(\mathbf{X}\right)=-1+\left(0.9063X_1+0.4226X_2-6\right)^2+\left(0.9063X_1+0.4226X_2-6\right)^3\\ &&&\qquad -0.6\left(0.9063X_1+0.4226X_2-6\right)^4-\left(-0.4226X_1+0.9063X_2\right),\\ &&& G_3\left(\mathbf{X}\right)=1-\frac{80}{X_1^2+8X_2+5},\\ &&& X_i\sim N\left(\mu_i\left(d_i,s_i^2,ND_i\right),\sigma_i^2\left(s_i^2,ND_i\right)\right),\quad 10\le ND_i\le 100\quad\mathrm{for}\ i=1,2\\ &&& w=\frac{w_{\mathbf{d}}}{w_{\mathbf{ND}}}\end{aligned} $$
(14)

where the target confidence and target reliability are set to 90.00% and 97.72%, respectively. The initial number of data is set to 10 for both random variables. w is the weight ratio of the two costs, and costdevelopment(ND1, ND2) is the development cost. The operating cost is assumed to be the same as the objective function of the original optimization problem. The lower bounds of ND are set to 10, which is the initial number of data, and the upper bounds are set to 100. The initial samples are drawn from a normal distribution with the true variance of 0.5². Note that the initial design vector is the RBDO optimum obtained from deterministic sample estimates to enhance convergence.
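To make the inner probability in (14) concrete, the sketch below estimates Pr[G1(X) ≤ 0] by crude Monte Carlo simulation at a hypothetical design point. The design values, standard deviations, and sample count are illustrative assumptions, and the outer (confidence) loop over uncertain distribution parameters is omitted:

```python
# Crude Monte Carlo estimate of the inner reliability Pr[G1(X) <= 0] in (14).
# Design point, standard deviations, and sample count are illustrative.
import random

def G1(x1, x2):
    return 1.0 - x1 * x1 * x2 / 20.0

def reliability(d, sigma, n=100_000, seed=0):
    rng = random.Random(seed)
    safe = sum(
        G1(rng.gauss(d[0], sigma[0]), rng.gauss(d[1], sigma[1])) <= 0.0
        for _ in range(n)
    )
    return safe / n

print(reliability((5.0, 5.0), (0.5, 0.5)))
```

In CBDO the distribution parameters fed to this inner loop are themselves random, which is what turns the single reliability value into a distribution with a confidence level attached.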

4.1 Validation of sensitivity analysis

Sensitivity analysis of reliability with respect to ND is validated in this section by comparing analytically calculated values with results of the finite difference method (FDM). Since the sensitivity with respect to the mean vector has already been validated in previous works (Lee et al. 2010), only the results for the sample size are shown here. The highly nonlinear second constraint in (14) is used for the comparison. When the vector of sample sizes is ND = {10, 10}T at the lower bound, the MPP, p = {1.1853, 0.4671, 0.1276, −0.0551}T, is obtained through the hybrid mean value (HMV) method (Youn et al. 2003) in P-space using the given sample mean and variance. The number of samples used to compute the reliability by MCS is \( 10^8 \). Table 1 shows the sensitivity analysis results obtained from FDM and the proposed method for two different ND cases. To alleviate sampling uncertainty due to repetitive random samplings, the random seeds for MCS are controlled in this test. It can be concluded from Table 1 that the proposed sensitivity analysis is very accurate compared with FDM.
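The seed control mentioned above is the common-random-numbers idea: reusing the same random stream on both sides of a finite difference removes sampling noise that would otherwise dominate the small numerator. A toy illustration (the estimator and numbers are illustrative, not the paper's constraint):

```python
# Common random numbers for finite differences of a Monte Carlo estimator.
# mc_mean(mu) estimates E[X] with X ~ N(mu, 1); the exact derivative w.r.t.
# mu is 1. Toy estimator, sample count, and step size are illustrative.
import random

def mc_mean(mu, n=50_000, seed=42):
    rng = random.Random(seed)            # fixed seed -> identical noise stream
    return sum(rng.gauss(mu, 1.0) for _ in range(n)) / n

h = 1e-3
fd = (mc_mean(1.0 + h) - mc_mean(1.0 - h)) / (2 * h)
print(fd)  # with shared seeds the noise cancels and fd is ~1 to high precision
```

With independent seeds on each side, the same difference would carry Monte Carlo noise of order \( \sigma/(h\sqrt{n}) \), which at this step size would swamp the derivative.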

Table 1 Results of sensitivity analysis of reliability with respect to the number of data

4.2 Results of various weights for two costs

The effect of the weight ratio in the bi-objective CBDO is shown in this section. The development cost function is set to ND1 + ND2, in which the cost is linear in each number of data. An increasing weight ratio indicates that the operating cost is weighted more heavily than the development cost. Thus, the optimizer tries to reduce the operating cost by adding more data rather than by shifting the mean vector. The test results for various weight ratios are listed in Table 2.

Table 2 Results of bi-objective CBDO with various weight ratios

Table 2 shows that the optimal sample size increases as the weight ratio increases, indicating that a large weight on the operating cost forces the optimizer to add more samples to satisfy the confidence constraints by reducing the uncertainty of the input distribution models. Similarly, the optimal sample size decreases as the weight ratio decreases, meaning that the optimization tries to find an optimum by sacrificing the operating cost since adding more samples is much more expensive. In the case of w = 500, the bi-objective CBDO recommends adding 10 more input samples for X1 and 14 more for X2 to reduce the total cost.

Table 3 shows the results of the bi-objective CBDO when the sample estimates are inaccurate, compared with the true bi-objective CBDO optimum whose sample variance equals the true variance (i.e., the population variance) of 0.25 for both random variables. The weight ratio is set to 2000. As the initial sample size increases, the sample variance becomes closer to the true variance, and it can be seen that the discrepancy between the estimated and true total costs gradually reduces accordingly. Consequently, Table 3 shows how much error the quality of the initial samples can cause in estimating the optimal sample size. For instance, if the initial dataset has a sample variance of 0.2², the estimated optimal sample size is {39, 53}T. However, the true optimum, which minimizes the total cost, is {42, 54}T. The discrepancy between the two optima indicates the error due to the bias of the initial sample estimates.

Table 3 Validation of bi-objective CBDO optima under inaccurate sample variance

4.3 Results of uncertainty on input distribution type

This section shows the feasibility of using the Johnson distribution instead of a parametric distribution selected from prior knowledge or model identification methods. In this test, the identical dataset used in the previous section is employed. The weight ratio is set to 1000 with the linear development cost, and the initial sample size for the two random variables is 10. Note that each trial has a different dataset, so different optima are obtained since the sample variance, skewness, and kurtosis are all different. Table 4 compares the results of the bi-objective CBDO with normal distributions and with the Johnson distribution as input, respectively. The optimal sample size using the population variance is ND = {28.7602, 35.6763}T, and the mean of 5 trials is shown in the last row of Table 4. It is shown that the Johnson distribution can be successfully implemented in the proposed framework even though there is an error because the true distribution is a normal distribution and the biased sample skewness and kurtosis are used. However, the deviation of each optimum seems to be smaller for the Johnson distribution because higher-order moments are considered even though they carry no uncertainty. In addition, the Johnson distribution can cover a wide variety of distribution types, which may be more appropriate in the real world.
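A Johnson fit in this setting is driven by the first four sample moments of the dataset. The sketch below computes them for an illustrative 10-sample dataset drawn from the true N(5, 0.5²) input; the moment conventions (bias-uncorrected skewness and kurtosis) are an assumption, as the paper does not state them:

```python
# First four sample moments of a small dataset of the kind that drives a
# Johnson-distribution fit. Dataset and moment conventions are illustrative.
import random

rng = random.Random(3)
data = [rng.gauss(5.0, 0.5) for _ in range(10)]

n = len(data)
mean = sum(data) / n
s2 = sum((x - mean) ** 2 for x in data) / (n - 1)           # sample variance
skew = (sum((x - mean) ** 3 for x in data) / n) / s2 ** 1.5
kurt = (sum((x - mean) ** 4 for x in data) / n) / s2 ** 2   # ~3 for a normal
print(round(mean, 3), round(s2, 3), round(skew, 3), round(kurt, 3))
```

With only 10 samples, the skewness and kurtosis estimates scatter widely from trial to trial, which is exactly why each repeated trial in this section lands on a different optimum.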

Table 4 Bi-objective CBDO results of two different input distribution types

4.4 Results of various development costs

In real engineering problems, the behavior of the development cost varies depending on the application. As shown in Fig. 1, it may behave like a simple linear function or a nonlinear function. Therefore, to verify the effectiveness of the proposed bi-objective CBDO under various conditions, two kinds of nonlinear development cost functions with opposite properties are additionally tested in this section.

First, a logarithmic function given by

$$ 10\log\left(ND_1\times ND_2\right) $$
(15)

is used for the development cost, and test results with various weight ratios are listed in Table 5. Since the logarithmic development function in (15) has a gradually decreasing slope, the optimal sample size increases rapidly as the weight on the operating cost increases.

Table 5 Results of bi-objective CBDO with various weight ratios under logarithm development cost

Secondly, the development cost is given as an exponential function written as

$$ \exp\left(\frac{ND_1}{10}\right)+\exp\left(\frac{ND_2}{10}\right) $$
(16)

The results are listed in Table 6, which indicates that a larger sample size is avoided even for a large weight ratio due to the rapid increase in the development cost. The opposite trends for the two contrasting functions, logarithmic and exponential, support the feasibility of the proposed method.
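The contrast between the linear cost, (15), and (16) is easiest to see through the marginal cost of one extra sample. A sketch (the natural logarithm in (15) and the evaluation point ND = (50, 50) are assumptions):

```python
# Marginal cost of adding one sample to ND1 under the three development-cost
# models of Sect. 4: linear, logarithmic (15), and exponential (16).
# Natural log in (15) and the evaluation point (50, 50) are assumptions.
import math

def linear(nd1, nd2):
    return nd1 + nd2

def logarithmic(nd1, nd2):
    return 10.0 * math.log(nd1 * nd2)                    # (15)

def exponential(nd1, nd2):
    return math.exp(nd1 / 10.0) + math.exp(nd2 / 10.0)   # (16)

for cost in (linear, logarithmic, exponential):
    marginal = cost(51, 50) - cost(50, 50)
    print(cost.__name__, round(marginal, 3))
# the log cost gets ever cheaper per sample, the exponential ever more expensive
```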

Table 6 Results of bi-objective CBDO varying weight ratio under exponential development cost

4.5 Repeated tests with various initial samples

The previous test results are obtained using the same initial samples for fair comparison. In this section, newly generated input datasets are repeatedly tested to demonstrate the robustness of the proposed method. Other parameters and conditions such as the weight ratio, the development cost, and the true input statistical models are invariant during the repeated tests. A different initial dataset in the bi-objective CBDO means that its sample estimates are different, thereby resulting in different optima. With more data, the sample variances gradually approach the true variance, but the sample size in the design variable is fixed at 10 in these tests. In other words, the actual sample size for calculating the sample variance is increased for validation since the purpose of the repeated tests is to capture the effect of the bias of the sample variance; it does not mean that the initial sample size in the design vector increases.

In Fig. 3, optima of the repeated tests with different sample variances are illustrated, where 100 trials are performed for each case. It can be seen in Table 7 that the variability of the optima decreases as more samples are used to estimate the sample variance, meaning that the optimal mean vectors gradually converge to the true optimum. The number of data in Fig. 3 and Table 7 is used only for calculation of the sample variance and has no relation to the sample size in the design vector. Note that the optimum often goes to the boundary of the design space since the confidence of reliability may have multiple local optima.

Fig. 3 Optima of 100 repeated bi-objective CBDOs

Table 7 Variance of optima for each number of samples

5 Engineering example: multi-scale composite frame optimization

To validate the feasibility of the proposed method, a multi-scale composite frame optimization is used in this section. Deterministic multi-scale design optimization of composite frames for minimum structural compliance with manufacturing constraints (Yan et al. 2017), maximum fundamental frequency design with continuous fiber winding angles (Duan et al. 2018), and a two-step optimization scheme for forcing convexity of fiber winding angles in composite frames (Duan et al. 2019) have been studied. Similar to these previous studies considering practical engineering applications, it is assumed in this example that each composite tube has the same number of layers (i.e., Nlay = 20) with the same initial thickness of 0.1 mm for the sake of simplicity. The fiber candidate material is carbon fiber-reinforced epoxy with orthotropic properties as listed in Table 8. The loading/boundary conditions and geometry of the composite frame are shown in Fig. 4. There are four deterministic design variables, the radius of each beam, and three random parameters under insufficient data: the longitudinal modulus and the magnitude and direction of the load. The true mean and variance of each random parameter are listed in Table 9.

Table 8 Material properties of the uni-directional carbon fiber-reinforced epoxy
Fig. 4 Configuration of 4-beam composite frame structure

Table 9 Statistical representations of random parameters

5.1 Formulation of bi-objective confidence-based design optimization for 4-beam composite frame

The bi-objective CBDO for 4-beam composite frame structure is formulated as

$$ \begin{aligned}&\mathrm{given} && {}^{\ast}\overline{p}_i,\ s_i^2\\ &\underset{\mathbf{r},\,\mathbf{ND}}{\operatorname{minimize}} && w\times V\left(\mathbf{r}\right)+\mathrm{cost}_{\mathrm{Development}}\left(\mathbf{ND}\right)\\ &\mathrm{subject\ to} && \Pr\left[\Pr\left[C\left(\mathbf{r},\mathbf{P}\right)\le C_{\mathrm{Target}}\right]\ge \mathrm{Re}_{\mathrm{Target}}\right]\ge \mathrm{CL}_{\mathrm{Target}}\\ &&& P_i\sim N\left(\mu_i\left({}^{\ast}\overline{p}_i,s_i^2,ND_i\right),\sigma_i^2\left(s_i^2,ND_i\right)\right),\quad 10\le ND_i\le 100\quad\mathrm{for}\ i=1,2,3\\ &&& 0.05\le r_j\le 1\quad\mathrm{for}\ j=1,2,3,4\end{aligned} $$
(17)

where V(r) is the total volume; C(r, P) is the compliance as a function of the radius vector r and the three random parameters denoted as P in Table 9; CTarget, ReTarget, and CLTarget are the target compliance, target reliability, and target confidence, respectively; \( {}^{\ast}\overline{p}_i \) and \( s_i^2 \) are the sample mean and variance of the ith random parameter in Table 9 obtained from the initial samples; and NDi is the sample size for the ith random parameter. The target compliance and reliability are set to 0.7 and 95%, respectively. Various target confidences and weight ratios between the two costs are utilized for validation, and the linear development cost is used. The initial radius is 0.25 mm for all beams, and the initial sample size for the three random parameters is set to 10. In this example, a Kriging model for the compliance is utilized to improve the computational efficiency of the reliability estimation. To generate the surrogate model, 300 samples in the design domain obtained by Latin hypercube sampling are used.
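The Latin hypercube design mentioned above can be sketched in a few lines. The bounds reuse the radius box 0.05 ≤ rj ≤ 1 from (17), and the stratified-jitter scheme shown is one common LHS variant, not necessarily the paper's implementation:

```python
# Minimal Latin hypercube sampling: each dimension gets exactly one point per
# stratum, giving better space-filling than plain random sampling for the
# same budget. Bounds reuse the radius box from (17); scheme is illustrative.
import random

def lhs(n_samples, bounds, seed=1):
    rng = random.Random(seed)
    samples = [[0.0] * len(bounds) for _ in range(n_samples)]
    for d, (lo, hi) in enumerate(bounds):
        strata = list(range(n_samples))
        rng.shuffle(strata)                    # random stratum order per dim
        for i, k in enumerate(strata):
            u = (k + rng.random()) / n_samples  # jitter within stratum k
            samples[i][d] = lo + u * (hi - lo)
    return samples

pts = lhs(300, [(0.05, 1.0)] * 4)              # 300 training points, 4 radii
print(len(pts), len(pts[0]))
```

The resulting 300 points would then be evaluated by the finite element model to train the compliance surrogate.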

5.2 Results of bi-objective confidence-based design optimization

Results of the bi-objective CBDO under various weights are listed in Table 10. The volume of the structure decreases as the weight ratio increases since a large weight means that reducing the volume is relatively more valuable than the development cost for increasing the sample size. Therefore, the optimal sample size gradually increases as the weight ratio increases. The estimated optimal radii of the 4 beams are exact only when the sample variance of the initial samples is exact. It is evident that the RBDO optimum has the minimal volume since it involves no epistemic uncertainty. On the other hand, increasing the target confidence means a more conservative design is achieved, so the optimal volume and sample size increase as listed in Table 11. Both costs rise as the target confidence to be satisfied increases, leading to a more conservative design. In all tests, ND3, the sample size for the loading direction, goes to the lower bound, implying that the loading direction is relatively insignificant for the compliance compared with the material properties and load magnitude.

Table 10 Optima of bi-objective CBDO with various weights between two costs
Table 11 Optima of bi-objective CBDO with various target confidences

6 Conclusion

A bi-objective CBDO accounting for both operating and development costs simultaneously is proposed for practical application of CBDO. The overall process of the bi-objective CBDO is developed based on the CBDO framework to efficiently handle the confidence of reliability that derives from epistemic uncertainty. The confidence of reliability can be increased by collecting more input samples, which increases the development cost, as well as by shifting the design vector toward the feasible domain, which increases the operating cost. Thus, the objective function includes both costs, and the sample size for input variables is handled as a design variable in the bi-objective CBDO, which facilitates designers' decision making on how to allocate engineering effort. The estimated optimal sample size is affected by the relative weights between the two costs and the specification of a given system, such as the process to acquire input samples, represented as an explicit expression for the development cost. Since the sample size changes during the optimization, the stochastic sensitivity analysis of confidence with respect to the sample size for input variables is developed to avoid repetitive reliability computations. Although the optimal sample size is obtained based on the assumption that the sample estimates are invariant during optimization, the accuracy of the estimated optimum improves as the initial sample size increases. Various numerical tests and one engineering application have successfully supported the effectiveness of the proposed method. In consequence, the proposed bi-objective CBDO answers the question: how should the sample size for input variables in RBDO be determined to minimize the overall cost?

On the other hand, there are many types of epistemic uncertainty besides the uncertainty of input statistical models, and each can be reduced through additional information such as experiments to validate the simulation model. Therefore, it is crucial to consider the cost of gathering the information that reduces epistemic uncertainty in practical applications. In future work, a multi-objective CBDO including model uncertainty to distribute the optimal number of experiments, simulations, and collected input data will be investigated.

7 Replication of results

Matlab codes for the mathematical examples in Sect. 4 are available at https://github.com/Yongsu-Jung/Bi-objective-CBDO.git. The overall concepts and algorithms can be validated through the mathematical example.