1 Introduction

The presence of uncertainty is inevitable in the life cycle analysis and design optimization of an industrial product. To illustrate its omnipresence, manufacturing variations can be considered a classical example of one of the major sources of uncertainty inherently associated with a product [1]. Deterministic approaches may not always converge to the desired optima, especially when the design solution is highly sensitive to such variations. In such critical situations, deterministic design can lead to either unsafe or overly conservative outcomes. Therefore, incorporating uncertainty in product analysis and design is necessary to yield economically viable solutions.

Robust design optimization (RDO) is one of the most popular approaches that take the effect of uncertainties into account in the design optimization formulation [2,3,4]. RDO has been observed to improve product quality significantly and yield insensitive solutions, even in real-world industrial applications [5, 6]. Over the last two decades, it has gained much attention across various domains, such as telecommunications and electronics [7,8,9], aerospace [10,11,12], automobile [13,14,15], ship design [16,17,18], structural mechanics [19,20,21], structural dynamics and vibration control [22, 23] and fatigue analysis [24,25,26].

RDO establishes a mathematical framework for optimization aimed at minimizing the propagation of input uncertainty to output responses [27, 28]. A graphical representation presented in Fig. 1 illustrates the concept of RDO. Of the two optimal solutions, \({x_2}\) is more robust than \({x_1},\) since perturbations about the former produce smaller changes in the objective function \(f\left( x \right),\) making it less sensitive. Broadly, RDO problem formulations consist of optimizing the mean performance, minimizing the performance variance, or both. For details and insight into RDO formulations, the reader is referred to [2, 29, 30]. Since RDO attempts to resolve sensitive design solutions in a random environment, it suffers from computational issues as an obvious consequence. Despite the advances in computer configuration and speed, the enormous computational cost of running complex simulation codes has prohibited their usage.

Fig. 1 Schematic diagram illustrating robust design optimization (RDO)

Therefore, in order to avoid running such high fidelity simulations, various approximation schemes have emerged [31]. These techniques, also known as surrogate modelling, have been observed to reduce the computational expense significantly by approximating the underlying model over a sample space [32, 33]. Various such techniques have been developed to date, such as least square approximation [34], moving least square [35], polynomial chaos expansion [36], anchored ANOVA decomposition [37], Kriging [38], radial basis function [39], artificial neural network [40], support vector machines [41] and multivariate adaptive regression splines [42].

Such techniques have been successfully utilized in design optimization for engineering applications. A comprehensive review of the use of approximation models in mechanical and aerospace systems, and multidisciplinary design optimization, can be found in [12, 43, 44]. However, in the context of optimization, such efficient paradigms have found their use primarily in the deterministic domain. To further illustrate the gap in research in this particular area, the application of approximation models in RDO remains quite limited in number and content [1]. A possible explanation for the lack of investigation in this area is that the performance output is likely to be vulnerable and unstable in the presence of uncertainties in an optimization framework. Additionally, solutions of constrained RDO problems can easily move into infeasible regions if they are located close to constraint boundaries [45]. Thus, the accuracy with which the method captures the original model is the governing factor in yielding stable results satisfying feasible bounds, as the optimum is highly sensitive to the convergence of each iteration.

Thus, the primary motivation of this paper lies in investigating most of the popular surrogate models and studying their performance in the framework of RDO. Considering the lack of relevant literature on surrogate assisted RDO, an extensive review and thorough comparative assessment of various surrogate models in RDO is timely. This work is expected to serve as a guide for selecting appropriate approximation models for the solution of high-fidelity, computationally expensive stochastic optimization problems. Six benchmark RDO examples have been solved by utilizing as many as eleven popular surrogate models. Finally, a practical problem has been realistically modelled and solved by utilizing the most consistently performing model. To be specific, the dimensionality of the problems in terms of the number of stochastic variables varies from 2 to 48. The robust optimal solutions obtained have been validated against those of Monte Carlo simulation (MCS).

The paper has been organized in the following sequence. The theoretical development of the surrogate models employed in this study has been illustrated in Sect. 2. Section 3 explains the framework of the surrogate assisted RDO technique. A numerical study has been carried out in Sect. 4 in order to illustrate the efficiency and accuracy of the various surrogate assisted RDO frameworks. A practical engineering RDO problem has been efficiently addressed in Sect. 5. Finally, conclusions have been drawn by discussing the potential of the surrogate models in the RDO framework on the basis of the results obtained from the study.

2 Surrogate Modelling

In this section, various surrogate models have been explained along with brief descriptions of their mathematical formulations, providing the reader with the necessary insight.

2.1 Anchored ANOVA Decomposition

Various methods approximate multivariate functions in such a way that the component functions of the approximation are ordered starting from a constant and gradually approaching multivariate terms as one proceeds along first order, second order and so on. An example of such a method is ANOVA decomposition [37, 46,47,48], which is a general set of quantitative model assessment and analysis tools for mapping the high dimensional relationships between input and output model variables. It is an efficient formulation of the system response if higher order co-operative effects are weak, allowing the physical model to be captured by the lower order terms. Practically, for most well-defined physical systems, only relatively low order co-operative effects of the input variables are expected to have a significant effect on the overall response. ANOVA decomposition utilizes this property to fit an accurate hierarchical representation of the physical system. The fundamental concepts of generalized ANOVA decomposition have been discussed henceforth.

Let, \({\mathbf{x}}=\left( {{x_1},{x_2}, \ldots ,{x_P}} \right) \in {\mathbb{R}^P}\) and consider that \(\eta ={\mathbb{L}^2}\left( {{\mathbb{R}^P},{P_X}} \right).\) The generalized ANOVA decomposition expresses \(\eta \left( {\mathbf{x}} \right)=\eta \left( {{x_1},{x_2}, \ldots ,{x_P}} \right)\) as the sum of the hierarchical correlated function expansion in terms of the variables [37, 48] as

$$\eta \left( {\mathbf{x}} \right)={\eta _0}+\sum\limits_{i=1}^P {{\eta _i}\left( {{x_i}} \right)} +\sum\limits_{1 \le i < j \le P} {{\eta _{i,j}}\left( {{x_i},{x_j}} \right)} + \ldots +{\eta _{1, \ldots ,P}}\left( {\mathbf{x}} \right)=\sum\limits_{u \subseteq \left\{ {1 \ldots P} \right\}} {{\eta _u}\left( {{x_u}} \right)}$$
(1)

The expansion (1) exists and is unique under the following hypotheses:

$$\begin{gathered} \int {{\eta _u}\left( {{x_u}} \right)d{P_{{X_i}}}} =0\quad \forall i \in u,\forall u \subseteq \left\{ {1 \ldots P} \right\} \hfill \\ \int {{\eta _u}\left( {{x_u}} \right){\eta _v}\left( {{x_v}} \right)d{P_X}=0\quad \forall u,v \subseteq } \left\{ {1 \ldots P} \right\},{ }u \ne v \hfill \\ \end{gathered}$$
(2)

where \({\eta _0}\) is the constant term representing the zeroth order component function, i.e., the mean response. The function \({\eta _i}\left( {{x_i}} \right),\) referred to as the first order component function, represents the independent effect of \({x_i}.\) Similarly, \({\eta _{i,j}}\left( {{x_i},{x_j}} \right)\) is termed the second-order component function and represents the co-operative effect of two variables acting at a time. The higher-order terms indicate higher order co-operative effects, with \({\eta _{1, \ldots ,\,P}}\left( {\mathbf{x}} \right)\) denoting the effect of all the variables acting together.

Moreover, each term \({\eta _u}\) in the model function \(Y=\eta \left( {\mathbf{X}} \right)\) [in Eq. (1)] can be solved explicitly by integration and is given by [49],

$$\begin{gathered} \eta _{0} = {\text{E}}\left( Y \right) \hfill \\ \eta _{i} = {\text{E}}\left( {Y\mid X_{i} } \right) - {\text{E}}\left( Y \right),\quad i = 1, \ldots ,P \hfill \\ \eta _{u} = {\text{E}}\left( {Y\mid X_{u} } \right) - \sum\limits_{{v \subset u}} {\eta _{v} } ,\quad \left| u \right| \ge 2 \hfill \\ \end{gathered}$$
(3)

The conditional expectation \({\text{E}}\left( {Y\mid X_{u} } \right)\) effectively reduces the P dimensional function \(\eta \left( {{x_1},{x_2}, \ldots ,{x_P}} \right)\) to a linear sum of the following smaller \(\alpha \left( { < P} \right)\) dimensional functions [50]:

$$\left[ {\prod\limits_{1 \le u \ne i,j, \ldots ,{i_\alpha } \le P} {\phi _{{{\mu _u}}}^{u}\left( {{x_u}} \right)} } \right] \times \eta \left( {{x_i},{x_j}, \ldots ,{x_{{i_\alpha }}};\left\{ {\prod\limits_{1 \le u \ne i,j, \ldots ,{i_\alpha } \le P} {x_{u}^{{{\mu _u}}}} } \right\}} \right)$$
(4)

over a \(\left( {P - \alpha } \right)\) dimensional rectangular grid \(\prod\nolimits_{{u \ne i,j, \ldots ,i_{\alpha } }} {x_{u}^{{\mu _{u} }} },\;\mu _{u} = 1,2, \ldots ,K_{u}.\)

Remark

In this study, second order anchored ANOVA decomposition has been utilized for approximating the actual functions. To generate the approximation of any function, a reference point \({\mathbf{\bar x}}=\left( {{{\bar x}_1},{{\bar x}_2}, \ldots ,{{\bar x}_P}} \right)\) first has to be defined in the variable space. In practice, the choice of the reference point \({\mathbf{\bar x}}\) is essential, especially when only the first few terms, i.e., first and second order, in Eq. (1) are considered. The reference point \({\mathbf{\bar x}}\) at the middle of the input domain appears to be the ideal choice [51].

Based on the formulation presented above, a step-by-step procedure for approximation by utilizing anchored ANOVA decomposition has been provided in Algorithm 1.

Algorithm 1 Step-by-step procedure for approximation by anchored ANOVA decomposition
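To make the construction concrete, a minimal sketch of a second order anchored ANOVA (cut-HDMR) approximation is provided below. It is illustrative only: the component functions are tabulated on one and two dimensional cuts through the reference point and interpolated with splines, and the grid size `n_grid` and all function names are assumptions rather than part of the original formulation.

```python
import numpy as np
from scipy.interpolate import interp1d, RectBivariateSpline

def build_cut_hdmr(f, x_ref, lb, ub, n_grid=7):
    """Sketch of a second-order anchored ANOVA (cut-HDMR) surrogate of f.
    x_ref is the anchor point, ideally the domain midpoint [51]."""
    x_ref, lb, ub = map(np.asarray, (x_ref, lb, ub))
    P = len(x_ref)
    f0 = f(x_ref)                                  # zeroth-order term
    grids = [np.linspace(lb[i], ub[i], n_grid) for i in range(P)]

    # first-order component functions eta_i(x_i), tabulated on 1-D cuts
    eta1 = []
    for i in range(P):
        vals = np.empty(n_grid)
        for a, xi in enumerate(grids[i]):
            xp = x_ref.copy(); xp[i] = xi
            vals[a] = f(xp) - f0
        eta1.append(interp1d(grids[i], vals, kind='cubic'))

    # second-order component functions eta_ij(x_i, x_j), on 2-D cuts
    eta2 = {}
    for i in range(P):
        for j in range(i + 1, P):
            vals = np.empty((n_grid, n_grid))
            for a, xi in enumerate(grids[i]):
                for b, xj in enumerate(grids[j]):
                    xp = x_ref.copy(); xp[i], xp[j] = xi, xj
                    vals[a, b] = f(xp) - f0 - eta1[i](xi) - eta1[j](xj)
            eta2[(i, j)] = RectBivariateSpline(grids[i], grids[j], vals)

    def surrogate(x):
        y = f0 + sum(eta1[i](x[i]) for i in range(P))
        y += sum(s(x[i], x[j])[0, 0] for (i, j), s in eta2.items())
        return float(y)

    return surrogate
```

The number of model evaluations grows only quadratically with the dimension P (one grid per variable and one per variable pair), which is what makes the truncated decomposition attractive for the higher dimensional examples considered later.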

2.2 Polynomial Chaos Expansion

The polynomial chaos expansion (PCE) is an efficient technique for obtaining the responses of stochastic systems. It was introduced by Wiener [52] and is hence known as the ‘Wiener chaos expansion’. Generalized results have been presented by Xiu and Karniadakis [53] for various continuous and discrete systems from the so-called Askey scheme, who further established \({\mathcal{L}_2}\) convergence in the corresponding Hilbert space.

Let \({\mathbf{i}}=\left( {{i_1},{i_2}, \ldots ,{i_n}} \right) \in \mathbb{N}_{0}^{n}\) be a multi-index with \(\left| {\mathbf{i}} \right|={i_1}+{i_2}+ \cdots +{i_n},\) and let \(N \ge 0\) be an integer. The Nth order PCE of g(Z) can be stated as:

$$\hat g\left( Z \right)=\sum\limits_{\left| {\mathbf{i}} \right|=0}^N {{a_{\mathbf{i}}}{\Phi _{\mathbf{i}}}\left( Z \right)}$$
(5)

where \(\{ a_{{\mathbf{i}}} \}\) are unknown coefficients to be determined. \({\Phi _{\mathbf{i}}}\left( Z \right)\) are n-dimensional orthogonal polynomials of maximum order N and satisfy the following relation

$$E\left( {{\Phi _{\mathbf{i}}}\left( Z \right){\Phi _{\mathbf{j}}}\left( Z \right)} \right)=\int\limits_\Omega {{\Phi _{\mathbf{i}}}\left( Z \right){\Phi _{\mathbf{j}}}\left( Z \right)\varpi \left( z \right)dz} ={\delta _{{\mathbf{ij}}}},\quad 0 \le \left| {\mathbf{i}} \right|,\left| {\mathbf{j}} \right| \le N$$
(6)

here \({\delta _{{\mathbf{ij}}}}\) denotes the multivariate Kronecker delta function. It is to be noted that if \(\varpi \left( z \right)\) is Gaussian, the orthogonality relation in Eq. (6) yields Hermite polynomials as the optimal polynomials. The correspondence between the type of orthogonal polynomial and the type of random variable has been presented in Table 1 [53].

Table 1 The correspondence of the type of orthogonal polynomial with distribution pattern

Since the emergence of generalised PCE [53], discrete variants of PCE have been developed. It is worth mentioning that each of the variants of PCE is based on Eq. (5); the uniqueness resides in the algorithm utilized to determine the unknown coefficients associated with the bases. The Wiener-Askey PCE, proposed by Xiu and Karniadakis [53], is based on the Galerkin projection. In this method, Galerkin projection is utilized to decompose the governing stochastic partial differential equation into a system of coupled differential equations. Furthermore, it has been demonstrated that Galerkin-based PCE yields excellent results and provides exponential convergence with increase in the order of PCE. However, the method, being intrusive in nature, requires knowledge of the governing partial differential equation of the system. Consequently, it is not applicable to real-life problems with unknown governing differential equations. In order to address this issue, special attention has been devoted to developing non-intrusive PCE. The most popular non-intrusive PCE is the one based on the least square method [54, 55], in which the least square technique is implemented to determine the unknown coefficients associated with the bases. Least square based PCE is easy to implement and applicable to systems with unknown governing differential equations. Other alternatives that have been investigated for determining the unknown coefficients are the quadrature method [56, 57] and the collocation approach [58, 59]. However, all the variants of PCE discussed above are only suitable for small scale problems, because the number of unknown coefficients associated with PCE increases factorially with the number of variables. This renders application of PCE to large scale problems infeasible.
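As an illustration, a minimal least square based PCE in the spirit of [54, 55] is sketched below for standard normal inputs, for which Hermite polynomials form the optimal basis (Table 1). The sample size, the total-degree truncation and the function names are choices made here for illustration only.

```python
import numpy as np
from math import factorial
from itertools import product
from numpy.polynomial.hermite_e import hermeval

def pce_least_squares(f, n_vars, order, n_samples=200, seed=0):
    """Sketch of a non-intrusive PCE: the coefficients a_i of Eq. (5)
    are estimated by least squares from random training samples."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_samples, n_vars))
    y = np.array([f(z) for z in Z])
    # multi-indices i with |i| <= order (total-degree truncation)
    idx = [i for i in product(range(order + 1), repeat=n_vars)
           if sum(i) <= order]

    def basis(Zq):
        # Phi_i(z) = prod_k He_{i_k}(z_k) / sqrt(i_k!)  (orthonormal)
        A = np.ones((len(Zq), len(idx)))
        for c, i in enumerate(idx):
            for k, deg in enumerate(i):
                coef = np.zeros(deg + 1); coef[deg] = 1.0
                A[:, c] *= hermeval(Zq[:, k], coef) / np.sqrt(factorial(deg))
        return A

    a, *_ = np.linalg.lstsq(basis(Z), y, rcond=None)
    surrogate = lambda z: float(basis(np.atleast_2d(z)) @ a)
    # orthonormality gives the response statistics directly
    return surrogate, a[0], float(np.sum(a[1:] ** 2))  # model, mean, variance
```

Note how the mean and variance of the response, precisely the quantities needed in the RDO formulations of Sect. 3, follow directly from the coefficients without any further sampling.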

Blatman and Sudret [60, 61] proposed two adaptive sparse PCE schemes for solving high-dimensional problems. The purpose of these methods is to determine, in an iterative manner, the components/variables that contribute significantly to the response of interest. The number of unknown coefficients associated with the bases is reduced by eliminating the components/variables having little or no effect on the output response. While the first approach [60] utilizes the change in the coefficient of determination (R2) to identify the significant components, the second approach utilizes a least angle regression scheme [61] to identify the less important components. Moreover, it has been demonstrated that the proposed adaptive sparse PCE is capable of treating systems with as many as 500 variables. However, only problems governed by elliptical partial differential equations have been investigated as part of the above works.

Due to its superior performance, PCE has found wide application in various domains. PCE has been utilized for solving the stochastic steady state diffusion problem by Xiu and Karniadakis [62] and the stochastic Navier–Stokes equation [63]. Further, PCE has been utilized for sensitivity analysis by Sudret [64]. A reduced PCE has been developed for stochastic finite element analysis by Pascual and Adhikari [65, 66]. In each of the above mentioned applications, PCE has been found to yield excellent results. However, a few issues regarding PCE are yet to be resolved. Firstly, PCE is only applicable to systems involving independent random variables. If the system under consideration involves correlated variables, ad hoc transformations, such as the Nataf transformation, need to be employed to transform the dependent variables into independent variables. Secondly, PCE involves determining orthogonal polynomials for a system; however, orthogonal polynomials are known only for a few random variables, as shown in Table 1. Hence, if the system under consideration involves a variable for which the orthogonal polynomial is not known, implementation of PCE may become tedious.

2.3 Multivariate Adaptive Regression Splines (MARS)

MARS [42] is governed by a set of bases which are selected for approximating the output response through a forward or backward iterative approach. The functional form of MARS is represented as:

$$\hat g\left( {\varvec{X}} \right)=\sum\limits_{k=1}^n {{\alpha _k}H_{k}^{f}\left( {{X_i}} \right)}$$
(7)

with

$$H_{k}^{f}({X_1},{X_2},{X_3}, \ldots ,{X_n})=1$$
(8)

where \({\alpha _k}\) and \(H_{k}^{f}({X_i})\) are the coefficients of the expansion and the basis functions, respectively. The basis functions can be represented as

$$H_{k}^{f}({X_i})=\prod\limits_{i=1}^{{i_k}} {\left[ {{z_{i,k}}\left( {{X_{j(i,k)}} - {t_{i,k}}} \right)} \right]_{{tr}}^{q}}$$
(9)

where \({i_k}\) is the order of interaction in the kth basis function and \({z_{i,k}}= \pm 1.\) \({X_{j(i,k)}}\) in Eq. (9) is the jth variable, with \(1 \le j\left( {i,k} \right) \le n,\) and \({t_{i,k}}\) represents the knot location on each of the corresponding variables. \(H_{k}^{f}({X_i})\) in Eq. (9) represents the multivariate spline basis function and is the product of univariate spline basis functions, which are either of order one (linear) or cubic, depending on the degree of continuity of the approximation. The notation “tr” denotes that the function is a truncated power function:

$$\left[ {z_{{i,k}} \left( {X_{{j(i,k)}} - t_{{i,k}} } \right)} \right]_{{tr}}^{q} = \left\{ {\begin{array}{*{20}l} {\left[ {z_{{i,k}} \left( {X_{{j(i,k)}} - t_{{i,k}} } \right)} \right]^{q} ,} & {z_{{i,k}} \left( {X_{{j(i,k)}} - t_{{i,k}} } \right) > 0} \\ {0,} & {{\text{otherwise}}} \\ \end{array} } \right.$$
(10)

Each basis function is piecewise linear, with a knot \({t_{i,k}}\) at every \({X_{j(i,k)}}.\) MARS models the function by allowing the basis functions to bend at the knots. The maximum number of knots considered, the minimum number of observations between knots, and the highest order of interaction terms are to be determined. Automated variable screening is performed within MARS by using the generalized cross-validation (GCV) model fit criterion, developed by Craven and Wahba [67]. The location and number of spline bases needed are determined by (a) over-fitting a spline function through each knot and (b) removing the knots that have the least contribution to the overall fit of the model as determined by the modified GCV criterion. The following equation shows the lack-of-fit (LOF) criterion used by MARS:

$$L_{{fc}} \left( {\hat{g}_{k} } \right) = G_{{cv}} \left( k \right) = \frac{{\left( {1/n} \right)\sum\nolimits_{{i = 1}}^{n} {\left[ {g\left( {X_{i} } \right) - \hat{g}_{k} \left( {X_{i} } \right)} \right]^{2} } }}{{\left[ {1 - \left\{ {\tilde{c}(k)/n} \right\}} \right]^{2} }}$$
(11)

where

$$\tilde c\left( k \right)=c\left( k \right)+d \cdot k$$
(12)
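The truncated power bases of Eq. (10) with q = 1 are hinge functions. The sketch below fits a one dimensional MARS-type model with fixed candidate knots and plain least squares; the forward selection and GCV-based pruning of Eqs. (11)–(12) are omitted for brevity, and all names and data are illustrative.

```python
import numpy as np

def hinge(x, t, sign):
    """Truncated linear basis [sign * (x - t)]_+ of Eq. (10) with q = 1."""
    return np.maximum(sign * (x - t), 0.0)

def fit_mars_1d(x, y, knots):
    """Minimal MARS-type fit: a pair of mirrored hinges at every
    candidate knot, coefficients by least squares (no GCV pruning)."""
    def design(xq):
        cols = [np.ones_like(xq)]                  # constant basis
        for t in knots:
            cols.append(hinge(xq, t, +1.0))        # right hinge
            cols.append(hinge(xq, t, -1.0))        # left hinge
        return np.column_stack(cols)

    coef, *_ = np.linalg.lstsq(design(x), y, rcond=None)
    return lambda xq: design(xq) @ coef

# usage: a smooth nonlinear function approximated by piecewise-linear bases
x = np.linspace(0.0, 4.0, 60)
y = np.sin(2.0 * x) + 0.1 * x ** 2
model = fit_mars_1d(x, y, knots=np.linspace(0.5, 3.5, 7))
print(np.max(np.abs(model(x) - y)))                # max training error
```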

MARS has been utilized by Sudjianto et al. [68] to emulate a computationally intensive automotive shock tower model in fatigue life durability analysis. A comparative assessment of MARS against linear, second-order and higher-order regression models has been carried out by Wang et al. [69]. MARS has been utilized by Friedman [42] to approximate the behaviour of performance variables in a simple alternating current series circuit. The primary advantage of MARS is its computational efficiency. Moreover, the MARS model is capable of handling large data sets, and almost no data preparation is required for building it. However, the accuracy of the MARS model is lower than that of other surrogate techniques. Additionally, MARS is incapable of predicting the confidence bounds of its predictions, and additional sample points are required for its validation. This, in turn, reduces the computational efficiency of the MARS model.

2.4 Radial Basis Function

Radial basis function (RBF) is another surrogate model which is quite popular among researchers. RBF is often used to perform the interpolation of scattered multivariate data [70,71,72]. The metamodel is expressed as a linear combination of basis functions of Euclidean distances, which may be written as

$$\hat g\left( {\varvec{X}} \right)=\sum\limits_{k=1}^n {{w_k}{\varphi _k}\left( {{\varvec{X}},{x_k}} \right)}$$
(13)

where n is the number of sampling points, \({w_k}\) is the weight determined by solving the interpolation conditions and \({\varphi _k}({\varvec{X}},{x_k})\) is the kth basis function evaluated at the sampling point \({x_k}.\) Various symmetric radial functions are used as basis functions. Typical radial functions for the RBF model are:

$$R_{f} ({\varvec{X}}) = \exp \left( { - \frac{{({\varvec{X}} - c)^{T} ({\varvec{X}} - c)}}{{r^{2} }}} \right)\quad \left( {{\text{Gaussian}}} \right)$$
(14)
$$R_{f} ({\varvec{X}}) = \sqrt {1 + \frac{{({\varvec{X}} - c)^{T} ({\varvec{X}} - c)}}{{r^{2} }}} \quad \left( {{\text{Multiquadric}}} \right)$$
(15)
$$R_{f} ({\varvec{X}}) = \frac{1}{{\sqrt {1 + \frac{{({\varvec{X}} - c)^{T} ({\varvec{X}} - c)}}{{r^{2} }}} }}\quad \left( {{\text{Inverse multiquadric}}} \right)$$
(16)
$$R_{f} ({\varvec{X}}) = \frac{1}{{1 + \frac{{({\varvec{X}} - c)^{T} ({\varvec{X}} - c)}}{{r^{2} }}}}\quad \left( {{\text{Cauchy}}} \right)$$
(17)

It is to be noted that, unlike PCE and the response surface method (RSM), RBF is not a regression technique. Rather, RBF may broadly be considered an interpolation technique. As a result, RBF, unlike regression techniques, yields exact results at the sample points.
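A minimal sketch of Gaussian RBF interpolation, Eqs. (13)–(14), is given below; the shape parameter r and the test function are assumptions for illustration. Because the weights solve the interpolation conditions exactly, the surrogate reproduces every training response, in line with the remark above.

```python
import numpy as np

def rbf_interpolate(X, y, r=1.0):
    """Gaussian-RBF interpolation: the weights w solve Phi w = y,
    so the surrogate is exact at the sample points."""
    X = np.atleast_2d(X)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    Phi = np.exp(-d2 / r ** 2)            # n x n Gaussian basis matrix
    w = np.linalg.solve(Phi, y)           # interpolation conditions

    def predict(Xq):
        Xq = np.atleast_2d(Xq)
        d2q = np.sum((Xq[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2q / r ** 2) @ w

    return predict

# usage: the interpolant matches the training responses exactly
Xs = np.random.default_rng(1).uniform(-1, 1, size=(20, 2))
ys = np.sin(3 * Xs[:, 0]) * np.cos(2 * Xs[:, 1])
model = rbf_interpolate(Xs, ys, r=0.8)
print(np.max(np.abs(model(Xs) - ys)))     # ~ 0 up to round-off
```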

To date, RBF has found wide application in the domains of structural reliability and uncertainty quantification. A dynamic surrogate model based on stochastic RBF has been developed by Volpi et al. [73] for uncertainty quantification. The method has been equipped with a curvature-based auto-tuning scheme, an adaptive sampling scheme, parallel infill and a multi-response criterion. It has also been illustrated that the surrogate based on stochastic RBF outperforms popular surrogates such as Kriging. A hybridized RBF has been proposed by Dai et al. [74] for structural reliability analysis. To be specific, the hybridized RBF has been formulated by replacing the learning network of RBF with a support vector algorithm. As a consequence, it has been possible to exploit the advantages of the support vector algorithm, such as good generalization and global optimization. A comparative assessment illustrated that the proposed method outperforms both RBF and the support vector algorithm. Other works on RBF include, but are not limited to, the development of a performance measure approach based reliability analysis technique using RBF [75] and the integration of RBF into the FORM algorithm [76]. However, RBF yields satisfactory results only for problems that are linear and/or weakly non-linear.

2.5 Artificial Neural Network

Artificial neural networks (ANNs) are a family of surrogate models inspired by the biological functioning of the brain and nervous system. ANNs are generally presented as a system of interconnected ‘neurons’. The neurons in an ANN, often termed nodes, consist of primitive functions. All the connections carry numeric weights that are tuned based on input data. A typical structure of an ANN has been illustrated in Fig. 2. The network is represented as a function \(\Phi\) obtained by combining the primitive functions \(f_{1} {\text{, }}f_{2} ,{\text{ }}f_{3} {\text{ and }}f_{4}.\) \({\alpha _1}{,}{\alpha _2}, \ldots ,{\alpha _5}\) are termed weights and are determined by employing a learning algorithm. Based on the above, three elements govern the formation of an ANN, namely the primitive function associated with each node, the topology of the network (single layer or multilayer) and the learning algorithm used to determine the weights associated with the connections. Based on these criteria, multiple variants of ANN have evolved over the years.

Fig. 2 Typical structure of ANN

The multilayer feed-forward neural network (MFFNN) [77] is the most popular and widely used ANN. Here, the neurons are arranged in three layers, namely the input layer, hidden layer and output layer. It is worthwhile to mention that the number of hidden layers in a feed-forward neural network may be more than one. Therefore, it is necessary to perform a convergence study to determine the optimum number of layers. Moreover, the number of neurons/nodes in each hidden layer should also be determined by using some appropriate convergence criterion. Each neuron/node consists of a transfer function that expresses its internal activation level. A transfer function may be either linear or nonlinear. An account of popular transfer functions has been provided in Table 2.

Table 2 Popular transfer functions in ANN

Another popular ANN scheme is the well-known back-propagation algorithm based ANN [78]. In this scheme, the errors in prediction are propagated backward to the inputs. Based on the errors received, the weights associated with the connections are updated. The process is repeated until the error at the output layer is less than a specified threshold.
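A minimal sketch of an MFFNN with a single tanh hidden layer, trained by back-propagation under a mean squared loss, is shown below; the layer width, learning rate and epoch count are illustrative choices that would normally come from the convergence studies mentioned above.

```python
import numpy as np

def train_mffnn(X, y, n_hidden=8, lr=0.05, epochs=5000, seed=0):
    """One-hidden-layer feed-forward network trained by back-propagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.5, (n_hidden, 1));          b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # forward pass, hidden layer
        out = H @ W2 + b2                   # linear output layer
        err = out - y                       # prediction error
        # propagate the error backward to obtain the weight gradients
        gW2 = H.T @ err / len(X);  gb2 = err.mean(0)
        dH = (err @ W2.T) * (1 - H ** 2)    # tanh derivative
        gW1 = X.T @ dH / len(X);   gb1 = dH.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    return lambda Xq: (np.tanh(Xq @ W1 + b1) @ W2 + b2).ravel()
```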

Due to its high accuracy, ANN has found wide application in uncertainty quantification and reliability analysis. An ANN based response surface method has been presented by Shu and Gong [79] for reliability analyses of c-phi slopes. The soil properties have been assumed to possess spatial randomness and hence have been modelled as random fields. It has been observed that the proposed approach yields accurate estimates of failure probability. A multi-wavelet neural network based response surface method has been proposed by Dai et al. [80] for structural reliability analysis. It has been illustrated that the proposed algorithm outperforms the well-known multilayer perceptron based response surface method. Other works that utilized neural networks in the field of uncertainty quantification and reliability analysis include [40, 81, 82].

2.6 Support Vector Regression

Support vector regression (SVR) is a variant of the support vector machine (SVM) utilized for regression analysis. SVR uses a subset of the data samples, called support vectors, to construct an approximation model that has a maximum deviation of \(\varepsilon\) from the function value corresponding to each training datum. For a linear mapping, the SVR model may be represented as

$$\hat g({\varvec{X}})=\left\langle {W \cdot {\varvec{X}}} \right\rangle +b$$
(18)

where \(\hat g({\varvec{X}})\) is the approximate value of the objective function at \({\varvec{X}},\) W represents a vector of weights, b is the bias term, and \(\left\langle \cdot \right\rangle\) denotes the inner product. Equation (18) may be further expressed as a convex optimization problem:

$$\begin{gathered} \arg \min \quad \;0.5\left\| W \right\|^{2} \hfill \\ {\text{s.t.}}\quad \quad \quad \left\{ {\begin{array}{*{20}c} {g\left( {X_{i} } \right) - \left\langle {W \cdot X_{i} } \right\rangle - b \le \varepsilon } \\ {\left\langle {W \cdot X_{i} } \right\rangle + b - g\left( {X_{i} } \right) \le \varepsilon } \\ \end{array} } \right. \hfill \\ \end{gathered}$$
(19)

where \(g\left( {{X_i}} \right)\) represents the response at the ith sample point. It should be noted that there might not be a function that satisfies the conditions in Eq. (19). Hence, introducing slack variables \({\xi _i},\xi _{i}^{*},\) Eq. (19) can be rewritten as:

$$\begin{gathered} \arg \min \quad \;0.5\left\| W \right\|^{2} + C\sum\limits_{{i = 1}}^{n} {\left( {\xi _{i} + \xi _{i}^{*} } \right)} \hfill \\ {\text{s.t.}}\quad \quad \quad \left\{ {\begin{array}{*{20}c} {g\left( {X_{i} } \right) - \left\langle {W \cdot X_{i} } \right\rangle - b \le \varepsilon + \xi _{i} } \\ {\left\langle {W \cdot X_{i} } \right\rangle + b - g\left( {X_{i} } \right) \le \varepsilon + \xi _{i}^{*} } \\ {\xi _{i} ,\xi _{i}^{*} \ge 0} \\ \end{array} } \right. \hfill \\ \end{gathered}$$
(20)

where \(n\) is the number of sample points. The regularization parameter, C, determines the trade-off between the model complexity and the degree to which deviations larger than \(\varepsilon\) are tolerated in Eq. (20). The formulation discussed corresponds to the \(\varepsilon\)-insensitive loss function proposed in [83]:

$$G(x) = \left\{ {\begin{array}{*{20}l} 0 & {\left| {g\left( \user2{X} \right) - \hat{g}\left( \user2{X} \right){\kern 1pt} } \right| \le \varepsilon } \\ {\left| {g\left( \user2{X} \right) - \hat{g}\left( \user2{X} \right)} \right| - \varepsilon } & {{\text{otherwise}}} \\ \end{array} } \right\}$$
(21)

Next, introducing Lagrange multipliers, the Lagrangian is constructed as:

$$L:=0.5{\left\| W \right\|^2}+C\sum\limits_{i=1}^n {\left( {{\xi _i}+\xi _{i}^{*}} \right)} - \sum\limits_{i=1}^n {{\alpha _i}\left( {\varepsilon +{\xi _i} - g\left( {{X_i}} \right)+\left\langle {W \cdot {X_i}} \right\rangle +b} \right)} - \sum\limits_{i=1}^n {\alpha _{i}^{*}\left( {\varepsilon +\xi _{i}^{*}+g\left( {{X_i}} \right) - \left\langle {W \cdot {X_i}} \right\rangle - b} \right)} - \sum\limits_{i=1}^n {\left( {{\eta _i}{\xi _i}+\eta _{i}^{*}\xi _{i}^{*}} \right)}$$
(22)

It is to be noted that the dual variables in Eq. (22) must satisfy the positivity constraints, i.e., \({\alpha _i},\alpha _{i}^{*},{\eta _i},\eta _{i}^{*} \ge 0.\) Furthermore, it can be shown that Eq. (22) has a saddle point with respect to the primal and dual variables at the optimal solution [84]. Hence,

$$\frac{{\partial L}}{{\partial b}}=\sum\limits_{i=1}^n {\left( {{\alpha _i} - \alpha _{i}^{*}} \right)} =0$$
(23)
$$\frac{{\partial L}}{{\partial W}}=W - \sum\limits_{i=1}^n {\left( {{\alpha _i} - \alpha _{i}^{*}} \right){X_i}} =0$$
(24)
$$\frac{{\partial L}}{{\partial \xi _{i}^{{\left( * \right)}}}}=C - \alpha _{i}^{{\left( * \right)}} - \eta _{i}^{{\left( * \right)}}=0$$
(25)

where \(\alpha _{i}^{{\left( * \right)}}\) includes both \({\alpha _i}\) and \(\alpha _{i}^{*}.\) Similarly, \(\eta _{i}^{{\left( * \right)}}\) also includes both \({\eta _i}\) and \(\eta _{i}^{*}.\) Substituting Eqs. (23)–(25) into Eq. (22) yields

$$\begin{gathered} \begin{array}{*{20}c} {{\text{Maximise}}} & {\left\{ {\begin{array}{*{20}c} { - 0.5\sum\limits_{{i,j = 1}}^{n} {\left( {\alpha _{i} - \alpha _{i}^{*} } \right)\left( {\alpha _{j} - \alpha _{j}^{*} } \right)\left\langle {X_{i} ,X_{j} } \right\rangle } } \\ { - \varepsilon \sum\limits_{{i = 1}}^{n} {\left( {\alpha _{i} + \alpha _{i}^{*} } \right)} + \sum\limits_{{i = 1}}^{n} {g\left( {X_{i} } \right)\left( {\alpha _{i} - \alpha _{i}^{*} } \right)} } \\ \end{array} } \right.} \\ \end{array} \hfill \\ \begin{array}{*{20}c} {{\text{s.t.}}} & {\quad \quad \quad \left\{ {\begin{array}{*{20}c} {\sum\limits_{{i = 1}}^{n} {\left( {\alpha _{i} - \alpha _{i}^{*} } \right) = 0} } \\ {\alpha _{i} ,\alpha _{i}^{*} \in \left[ {0,C} \right]} \\ \end{array} } \right.} \\ \end{array} \hfill \\ \end{gathered}$$
(26)

It should be noted that variables \({\eta _i}\) and \(\eta _{i}^{*}\) are not present in Eq. (26). Additionally using Eq. (24),

$$W=\sum\limits_{i=1}^n {\left( {{\alpha _i} - \alpha _{i}^{*}} \right){X_i}}$$
(27)

In order to compute b, the so-called Karush–Kuhn–Tucker (KKT) conditions have been utilized. According to the KKT conditions,

$$\begin{gathered} {\alpha _i}\left( {\varepsilon +{\xi _i} - g\left( {{X_i}} \right)+\left\langle {W \cdot {X_i}} \right\rangle +b} \right)=0 \hfill \\ \alpha _{i}^{*}\left( {\varepsilon +\xi _{i}^{*}+g\left( {{X_i}} \right) - \left\langle {W \cdot {X_i}} \right\rangle - b} \right)=0 \hfill \\ \end{gathered}$$
(28)

and

$$\begin{gathered} \left( {C - {\alpha _i}} \right){\xi _i}=0 \hfill \\ \left( {C - \alpha _{i}^{*}} \right)\xi _{i}^{*}=0 \hfill \\ \end{gathered}$$
(29)

From Eq. (29), important remarks can be made. First, \(\left( {C - \alpha _{i}^{{\left( * \right)}}} \right)=0\) only if the corresponding sample lies outside the \(\varepsilon\)-insensitive zone. Secondly, \({\alpha _i}\alpha _{i}^{*}=0,\) i.e., either \({\alpha _i}\) or \(\alpha _{i}^{*}\) must always be zero. Finally, for \(\alpha _{i}^{{\left( * \right)}} \in \left( {0,C} \right),\) \(\xi _{i}^{{\left( * \right)}}=0,\) and hence the second factor in Eq. (28) vanishes. Therefore,

$$\begin{gathered} b=g\left( {{X_i}} \right) - \left\langle {W \cdot {X_i}} \right\rangle - \varepsilon \quad {\text{for }}{\alpha _i} \in \left( {0,C} \right) \hfill \\ b=g\left( {{X_i}} \right) - \left\langle {W \cdot {X_i}} \right\rangle +\varepsilon \quad {\text{for }}\alpha _{i}^{*} \in \left( {0,C} \right) \hfill \\ \end{gathered}$$
(30)

Following a procedure similar to that described above, non-linear regression can be achieved by replacing the inner product \(\left\langle \cdot \right\rangle\) in Eq. (18) with a kernel function K [83]:

$$\hat g({\varvec{X}})=\sum\limits_{i=1}^n {\left( {{\alpha _i} - \alpha _{i}^{*}} \right)K\left( {{X_i},{\varvec{X}}} \right)} +b$$
(31)

All other operations will be applicable as discussed previously.
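In practice the dual problem of Eq. (26) is rarely coded by hand. A usage sketch with scikit-learn's SVR, an available implementation of the ε-insensitive formulation above, follows; the kernel choice, C, ε and the test data are illustrative.

```python
import numpy as np
from sklearn.svm import SVR

# epsilon-insensitive SVR with an RBF kernel, cf. Eq. (31); C and epsilon
# play the roles of the regularisation parameter and tube width above
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(80, 2))
y = X[:, 0] ** 2 + np.sin(3 * X[:, 1])

model = SVR(kernel='rbf', C=100.0, epsilon=0.05).fit(X, y)
print(model.support_.size, "support vectors out of", len(X))
print("max training error:", np.max(np.abs(model.predict(X) - y)))
```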

SVR has, of late, found wide application in uncertainty quantification and reliability analysis. Least square based SVR has been utilized [85] for reliability analysis; it has been illustrated that the structural risk associated with SVR is inherently minimized, making it suitable as a surrogate model for reliability analysis. The bootstrap technique has been integrated into the framework of SVR by Lins et al. [86] for obtaining confidence and prediction intervals using SVR. The bootstrap based SVR has been utilized for a large scale problem involving component degradation in the offshore oil industry. A novel dynamic-weighted probabilistic SVR has been proposed by Liu et al. [87]; the approach developed has been utilized for a real case study on a reactor coolant pump governed by 20 failure scenarios. Other significant works on SVR include [88, 89].

Tuning the parameters for optimum performance is an important aspect of SVR. A hybrid method (APSO-SVR) based on particle swarm optimization and analytical selection for tuning the parameters of SVR has been developed by Zhao et al. [90]. It has been demonstrated that APSO-SVR outperforms conventional SVR in terms of convergence. Other significant contributions to parameter tuning of SVR include the works by Zhao et al. [91] and Coen et al. [92].

2.7 Kriging

Kriging is a surrogate model based on Gaussian process modelling. The basic idea of Kriging is to employ interpolation, governed by prior covariances, in order to obtain responses at unknown points [93, 94]. In this method, the response is represented as:

$$\hat{\varvec{g}}\left( {\varvec{X}} \right) = {\varvec{y}}_{0} \left( {\varvec{X}} \right) + {\varvec{Z}}\left( {\varvec{X}} \right)$$
(32)

where \(\hat {\varvec{g}}\left( {\varvec{X}} \right)\) is the response function of interest, \({\varvec{X}}\) is an N dimensional vector (N design variables), \({y_0}({\varvec{X}})\) is the known approximation (usually polynomial) function and \(Z({\varvec{X}})\) represents the realization of a stochastic process with zero mean, variance \({\sigma ^2}\) and non-zero covariance. In the model, the local deviation at an unknown point \({\varvec{X}}\) is expressed using the stochastic process. The sample points are interpolated, with a Gaussian correlation function typically used to estimate the trend of the stochastic process [95, 96].

Let \({\mathbf{X}}={\left\{ {{X_1},{X_2}, \ldots ,{X_N}} \right\}^T} \in {\Re ^N}\) be the vector of basic random variables and \(g\left( {\mathbf{X}} \right)\) be the system response output. In universal Kriging, \({y_0}({\varvec{X}})\) is represented by a multivariate polynomial as:

$${y_0}\left( {\varvec{X}} \right)=\sum\limits_{i=1}^p {{a_i}{b_i}\left( {\varvec{X}} \right)}$$
(33)

where \({b_i}\left( {\varvec{X}} \right)\) represents the ith basis function and \({a_i}\) denotes the coefficient associated with it. The primary idea behind such a representation is that the regression function captures the variance in the data (the overall trend), while the Gaussian process interpolates the residuals. Suppose \(X=\left\{ {{X^1},{X^2}, \ldots ,{X^n}} \right\}\) represents a set of n samples, and assume \(g=\left\{ {{g_1},{g_2}, \ldots ,{g_n}} \right\}\) to be the responses at the training points. The regression part can then be written using an n × p model matrix F,

$$F=\left( {\begin{array}{*{20}{c}} {{b_1}\left( {{X^1}} \right)}& \cdots &{{b_p}\left( {{X^1}} \right)} \\ \vdots & \ddots & \vdots \\ {{b_1}\left( {{X^n}} \right)}& \cdots &{{b_p}\left( {{X^n}} \right)} \end{array}} \right)$$
(34)

whereas the stochastic process is defined using an n × n correlation matrix \(\Psi\)

$$\Psi =\left( {\begin{array}{*{20}{c}} {\psi \left( {{X^1},{X^1}} \right)}& \cdots &{\psi \left( {{X^1},{X^n}} \right)} \\ \vdots & \ddots & \vdots \\ {\psi \left( {{X^n},{X^1}} \right)}& \cdots &{\psi \left( {{X^n},{X^n}} \right)} \end{array}} \right)$$
(35)

where \(\psi \left( { \cdot , \cdot } \right)\) is a correlation function, parameterised by a set of hyperparameters \(\theta.\) The hyperparameters are further identified by maximum likelihood estimation (MLE). A detailed account of MLE in the context of Kriging can be found in [38]. The prediction mean and variance can be obtained as:

$$\mu \left( {\varvec{X}} \right)=M\alpha +r\left( {\varvec{X}} \right){\Psi ^{ - 1}}\left( {g - F\alpha } \right)$$
(36)

and

$${s^2}\left( {\varvec{X}} \right)={\sigma ^2}\left( {1 - r\left( {\varvec{X}} \right){\Psi ^{ - 1}}r{{\left( {\varvec{X}} \right)}^T}+\frac{{\left( {1 - {F^T}{\Psi ^{ - 1}}r{{\left( {\varvec{X}} \right)}^T}} \right)}}{{{F^T}{\Psi ^{ - 1}}F}}} \right)$$
(37)

where \(M=\left( {\begin{array}{*{20}{c}} {{b_1}\left( {{X_p}} \right)}& \ldots &{{b_p}\left( {{X_p}} \right)} \end{array}} \right)\) is the model matrix of the prediction point \({X_p},\)

$$\alpha ={\left( {{F^T}{\Psi ^{ - 1}}F} \right)^{ - 1}}{F^T}{\Psi ^{ - 1}}g$$
(38)

is a p × 1 vector consisting of the unknown coefficients determined by generalised least squares regression and

$$r\left( {\varvec{X}} \right)=\left( {\begin{array}{*{20}{c}} {\psi \left( {{X_p},{X^1}} \right)}& \cdots &{\psi \left( {{X_p},{X^n}} \right)} \end{array}} \right)$$
(39)

is a 1 × n vector denoting the correlation between the prediction point and the sample points. The process variance \({\sigma ^2}\) is given by

$${\sigma ^2}=\frac{1}{n}{\left( {g - F\alpha } \right)^T}{\Psi ^{ - 1}}\left( {g - F\alpha } \right)$$
(40)

It is worthwhile to mention that universal Kriging, as formulated above, is an interpolation technique. This can easily be verified by substituting the ith sample point into Eq. (36) and noting that \(r\left( {{X^i}} \right)\) is the ith column of \(\Psi:\)

$$\mu \left( {{X^i}} \right)=M\alpha +{g^i} - M\alpha ={g^i}$$
(41)

One issue associated with universal Kriging is the selection of the optimal polynomial order. Conventionally, the order of the polynomial is selected empirically. However, such a non-adaptive framework may render the modelling inefficient. Recent works [97,98,99,100,101,102,103,104] have addressed these issues.

An essential feature associated with Kriging is the selection of an appropriate covariance function [105,106,107]. Mostly, the covariance functions used with the Kriging surrogate are stationary and can be expressed in the following form:

$$\psi \left( {x,x'} \right)=\prod\limits_j {{\psi _j}\left( {\theta ,{x_j} - {x_j}^{\prime }} \right)}$$
(42)

The correlation function defined in Eq. (42) has two desirable properties. Firstly, the correlation function for multivariate functions can be represented as a product of one dimensional correlations. Secondly, the correlation is stationary and depends only on the distance between two points. A few standard stationary correlation functions that have been investigated are: (a) the exponential correlation function, (b) the generalised exponential correlation function, (c) the Gaussian correlation function, (d) the linear correlation function, (e) the spherical correlation function, (f) the cubic correlation function and (g) the spline correlation function. The mathematical forms of the above correlation functions are provided below:

i. Exponential correlation function:

$$\psi _{j} \left( {\theta ;d_{j} } \right) = \exp \left( { - \theta _{j} \left| {d_{j} } \right|} \right)$$
(43)

ii. Generalised exponential correlation function:

$${\psi _j}\left( {\theta ;{d_j}} \right)=\exp \left( { - {\theta _j}{{\left| {{d_j}} \right|}^{{\theta _{n+1}}}}} \right),\quad 0<{\theta _{n+1}} \le 2$$
(44)

iii. Gaussian correlation function:

$${\psi _j}\left( {\theta ;{d_j}} \right)=\exp \left( { - {\theta _j}d_{j}^{2}} \right)$$
(45)

iv. Linear correlation function:

$${\psi _j}\left( {\theta ;{d_j}} \right)=\max \left\{ {0,1 - {\theta _j}\left| {{d_j}} \right|} \right\}$$
(46)

v. Spherical correlation function:

$${\psi _j}\left( {\theta ;{d_j}} \right)=1 - 1.5{\xi _j}+0.5\xi _{j}^{3},\quad {\xi _j}=\min \left\{ {1,{\theta _j}\left| {{d_j}} \right|} \right\}$$
(47)

vi. Cubic correlation function:

$${\psi _j}\left( {\theta ;{d_j}} \right)=1 - 3\xi _{j}^{2}+2\xi _{j}^{3},\quad {\xi _j}=\min \left\{ {1,{\theta _j}\left| {{d_j}} \right|} \right\}$$
(48)

vii. Spline correlation function:

$$\psi _{j} \left( {\theta ;d_{j} } \right) = \left\{ {\begin{array}{*{20}l} {1 - 15\xi _{j}^{2} + 30\xi _{j}^{3} ,} & {0 \le \xi _{j} \le 0.2} \\ {1.25\left( {1 - \xi _{j} } \right)^{3} ,} & {0.2 < \xi _{j} \le 1} \\ {0,} & {\xi _{j} > 1} \\ \end{array} } \right.$$
(49)

where \({\xi _j}={\theta _j}\left| {{d_j}} \right|.\)

For all the correlation functions described above, \({d_j}={x_j} - {x_j}^{\prime }.\)
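A compact sketch of the prediction equations (36)–(40) with a constant trend (ordinary Kriging) and the Gaussian correlation of Eq. (45) is provided below. The hyperparameters θ are held fixed here, whereas in practice they are obtained by MLE as noted above; the jitter term and all names are assumptions, and the trend-uncertainty term of Eq. (37) is dropped for brevity.

```python
import numpy as np

def fit_kriging(X, y, theta):
    """Ordinary Kriging with Gaussian correlation and fixed theta."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2 * theta).sum(-1)
    Psi = np.exp(-d2) + 1e-10 * np.eye(n)     # Eq. (35) plus jitter
    F = np.ones((n, 1))                       # constant basis, Eq. (34)
    Pi = np.linalg.inv(Psi)
    alpha = np.linalg.solve(F.T @ Pi @ F, F.T @ Pi @ y)  # Eq. (38), GLS
    resid = y - F @ alpha
    sigma2 = resid @ Pi @ resid / n           # process variance, Eq. (40)

    def predict(x):
        r = np.exp(-(((x - X) ** 2) * theta).sum(-1))    # Eq. (39)
        mu = alpha[0] + r @ Pi @ resid                   # Eq. (36)
        s2 = sigma2 * (1.0 - r @ Pi @ r)                 # simplified Eq. (37)
        return float(mu), float(max(s2, 0.0))

    return predict
```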

2.8 Locally Weighted Polynomials

An approach for the pointwise estimation of an unknown function from known samples, based upon Taylor series expansion, has been proposed by Cleveland [108]. This idea was further extended into a statistical framework for model approximation. Later, these methods were generalized into the kernel regression approach [109, 110].

Locally weighted polynomial (LWP) regression is one such instance-based algorithm for learning continuous non-linear mappings [111]. Basically, it is a non-parametric regression approach that combines multiple models in a k-nearest-neighbour based metamodel [112]. A low-order weighted least square model is fitted locally around each point of interest. Let the input–output mapping be represented as,

$${Y_j}=M\left( {{x_j}} \right)+\sigma \left( {{x_j}} \right){\varepsilon _j}$$
(50)

where \(\varepsilon _{j} \sim N\left( {0,1} \right)\) and \(\sigma ^{2} \left( {x_{j} } \right)\) is the variance of \(Y_{j}\) at \(x_{j}.\) For cases where homoscedastic variance is assumed, \({\sigma ^2}\left( x \right)={\sigma ^2}.\) \(M\left( x \right)\) can be obtained by minimizing the following with respect to \(\alpha\)

$$\sum\limits_{j=1}^n {{{\left( {{Y_j} - \sum\limits_{i=1}^p {{\alpha _i}{{\left( {{x_j} - {x_0}} \right)}^i}} } \right)}^2}} {K_b}\left( {{x_j} - {x_0}} \right)$$
(51)

where \({K_b}\left( \cdot \right)\) assigns the weights and b controls the size of the neighbourhood around \({x_0}.\) Equation (51) can be rewritten as

$$\mathop {\arg \min }\limits_\alpha {\left( {{\mathbf{y}} - {\mathbf{x\alpha }}} \right)^T}{\mathbf{W}}\left( {{\mathbf{y}} - {\mathbf{x\alpha }}} \right)$$
(52)

where W is a diagonal matrix of weights, \({W_{jj}}={K_b}\left( {{x_j} - {x_0}} \right).\) The coefficient vector α can be obtained from the following expression

$${\mathbf{\alpha }}={\left( {{{\mathbf{x}}^T}{\mathbf{Wx}}} \right)^{ - 1}}{{\mathbf{x}}^T}{\mathbf{Wy}}$$
(53)

Thus, there are three principal parameters whose selection may affect the approximation: the bandwidth (b), the order of the polynomial (p), and the kernel or weight function (Kb) [113]. A natural way to select the bandwidth and calibrate the trade-off is to minimize the mean squared error [114]. In this context, a variable bandwidth selector for kernel regression, which can be extended to local linear regression, has been proposed in [115]. An adaptive method has been proposed by Fan and Gijbels [116] for selecting the appropriate order of polynomials based on local factors, allowing p to vary across the support. Regarding the choice of Kb, it has been established in [113] that the constant for the Epanechnikov kernel is the smallest, and the kernel is hence optimal in terms of integrated mean squared error. However, the difference between kernels is negligible, so the selection may largely depend upon the user's preference.
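A minimal sketch of Eqs. (51)–(53), using a local linear fit (p = 1) and the Epanechnikov kernel discussed above, is given below; the bandwidth and data are illustrative.

```python
import numpy as np

def lwp_predict(x0, x, y, b=0.5, p=1):
    """Locally weighted polynomial estimate of M(x0), Eqs. (51)-(53)."""
    u = (x - x0) / b
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)  # Epanechnikov K_b
    V = np.vander(x - x0, N=p + 1, increasing=True)    # local polynomial design
    W = np.diag(w)
    alpha = np.linalg.solve(V.T @ W @ V, V.T @ W @ y)  # Eq. (53)
    return alpha[0]            # local intercept estimates M(x0)

# usage
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x) + 0.05 * np.random.default_rng(0).standard_normal(50)
print(lwp_predict(np.pi / 2, x, y, b=0.8))             # close to 1
```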

Expressions for the asymptotic bias and variance of an estimate have been presented in [117]. The local linear model using the Epanechnikov kernel has been proven to optimize the linear minimax risk [118], a criterion that benchmarks the efficiency of an estimator in terms of the sample size required to obtain a certain level of accuracy. Later, these results were extended to LWP in [119]. LWP is observed to perform well near the boundary of the support of the data, unlike most non-parametric models, whose rate of convergence there is slow [110]; the slow convergence has been attributed to the fact that fewer training points are available to estimators near the boundary. It has been illustrated in [120] that no linear estimator can be superior on the boundary, in a minimax sense in terms of mean squared error, compared to LWP. Further details can be found in [109]. A few references in which the computational efficiency of LWP has been improved upon are [121,122,123]. Some recent extensions of LWP include [124,125,126].

Following this extensive review of popular surrogate models, the next section demonstrates the utilization of these models within the RDO framework.

3 Surrogate Assisted RDO Framework

This section discusses how the surrogate models are employed to address the computational expense of robust optimization. The objective and/or constraint functions in RDO involve the mean and standard deviation of stochastic responses, which is the main source of the computational burden, owing to the large number of simulations required to approximate the statistical quantities of the responses during each optimization iteration. It is worth mentioning that there are generally two ways to introduce efficiency into an RDO approach:

  • To avoid expensive original function/FE evaluation within an optimization iteration,

  • To reduce the number of optimization iterations by preserving the elite solutions.

The scope of the present study is limited to addressing the first point above, in order to present a comparative assessment of efficient surrogate assisted RDO tools. Therefore, instead of simulating the original objective and constraint functions, the functions are approximated by surrogate models, which are then utilized within the optimization routine. This efficient surrogate based RDO framework keeps the computational cost nominal, especially in the case of large scale finite-element based models. A flow diagram of the steps involved in the surrogate assisted RDO approach has been depicted in Fig. 3 for better understanding.

Fig. 3 Flowchart of the surrogate assisted RDO framework utilized in this study

During the evaluation of the objective and constraint functions in Fig. 3, it is to be noted that the simulations required to compute the response statistics are carried out on the model generated by the surrogate. This renders a significant level of computational efficiency in comparison to MCS performed on the actual FE model. Moreover, since simulations have to be carried out in each optimization iteration, computational savings are achieved in every such iteration until convergence of the optima.
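The inner statistics loop of Fig. 3 can be sketched as follows: at every design iterate the mean and standard deviation of the response are estimated by MCS on the cheap surrogate \(\hat g\) rather than on the actual model, and the weighted-sum objective mirrors Eq. (56). The optimizer settings, sample size and the stand-in surrogate (here the test function of Sect. 4.1 used directly) are illustrative, and constraint handling is omitted for brevity.

```python
import numpy as np
from scipy.optimize import minimize

def robust_objective(mu_x, surrogate, sigma_x, w1=0.5, w2=0.5, n_mcs=10000):
    """One RDO objective evaluation: response statistics by MCS on the
    surrogate; a fixed seed keeps the objective smooth across iterates."""
    rng = np.random.default_rng(0)
    Z = rng.normal(mu_x, sigma_x, size=(n_mcs, len(mu_x)))
    yv = np.array([surrogate(z) for z in Z])
    return w1 * yv.mean() + w2 * yv.std()     # weighted-sum RDO objective

# usage with any surrogate g_hat built in Sect. 2 (stand-in function here)
g_hat = lambda x: (x[0] - 4) ** 3 + (x[0] - 3) ** 4 + (x[1] - 5) ** 2 + 10
res = minimize(robust_objective, x0=[5.0, 5.0],
               args=(g_hat, np.array([0.4, 0.4])),
               bounds=[(1, 10), (1, 10)])
print(res.x, res.fun)
```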

Thus, the computational efficiency of surrogate based RDO frameworks compared to simulation based ones is evident; however, the approximation accuracy of the former is a crucial factor yet to be investigated. Therefore, the various surrogate models described in Sect. 2 have been employed to solve a few typical non-linear analytical examples in the following section.

4 Numerical Examples

In order to illustrate the efficiency and accuracy of the various surrogate models in the RDO platform, six benchmark examples have been considered in this section. In example 1, a test function has been investigated. A two bar plane truss has been studied in example 2. The conceptual design of a bulk carrier, a problem of great concern in the shipping industry, is considered in example 3. The design of a welded beam has been taken up as the fourth example. A speed reducer problem has been investigated in example 5. The side impact crashworthiness of a car has been considered in example 6. Each of the optimization problems deals with a single objective and multiple constraint functions. The examples have been arranged in order of increasing number of stochastic variables and, hence, increasing complexity. The results obtained have been compared with Monte Carlo simulation (MCS) based RDO solutions.

In this paper, the computational platform has been MATLAB® version 8.1 R2013a. The MATLAB® function fmincon has been utilized as the optimization search engine. Grid sampling has been utilized for generating training points for constructing the anchored ANOVA model, while the other surrogate models have been trained using Latin hypercube sampling [127]. For the comparison of computational effort, the number of actual function evaluations is chosen as the primary index, since the number of function evaluations indirectly indicates the CPU time usage.

The surrogates employed in order to solve the examples have been listed in Table 3. The abbreviations of the surrogate models mentioned in Table 3 have been utilized throughout the remainder of the paper.

Table 3 List of the surrogate models utilized and their abbreviations

4.1 Example 1: Test Function [21]

The first example considered is the RDO of a test function [21]. The problem is stated as:

$$\begin{gathered} {\text{Minimize }}F = \frac{{\sigma _{f} }}{{\sigma _{f}^{*} }} \hfill \\ {\text{Subjected to }}G = \mu _{g} - k\sigma _{g} \ge 0 \hfill \\ {\text{where }}f\left( {x_{1} ,x_{2} } \right) = \left( {x_{1} - 4} \right)^{3} + \left( {x_{1} - 3} \right)^{4} + \left( {x_{2} - 5} \right)^{2} + 10 \hfill \\ g\left( {x_{1} ,x_{2} } \right) = x_{1} + x_{2} - 6.45 \hfill \\ 1 \le \mu _{{x_{1} }} \le 10,1 \le \mu _{{x_{2} }} \le 10 \hfill \\ \end{gathered}$$
(54)

The objective is to minimize the standard deviation of f subject to a probabilistic constraint on g. \(\sigma _{f}^{*}\) and k have been adopted as 15 and 3, respectively. The design variables \(x_{1} {\text{ and }}x_{2}\) follow normal distributions with standard deviation 0.4.

The number of sample points utilized for the comparison of the various methods has been presented in Table 4. The corresponding robust optimal solutions obtained have been reported in Table 5. The number of iterations and function calls required for yielding the optimal solutions have been presented in Fig. 4.

Table 4 Number of sample points utilized for example 1 (Sect. 4.1)
Table 5 Robust optimal solutions for example 1 (Sect. 4.1)
Fig. 4 Comparative assessment of the surrogate models for example 1 in terms of (a) the number of iterations and (b) the number of function calls required in yielding the optimal solutions. The number of function calls refers to the number of function evaluations in the optimization loop. The results obtained by MCS are also provided

4.2 Example 2: Two Bar Planar Truss [21]

RDO of a planar two bar truss [21] has been considered as the second example. The problem consists of two design variables, which are the cross sectional area \({x_1}\) and the horizontal span of each truss \({x_2}.\) The density of the bar material \(\rho,\) the magnitude of the applied load \(Q,\) and the material's tensile strength \(S\) are the other parameters of the problem. The objective is to minimize the volume of the structure subject to constraints on the axial strength of each of the members. The deterministic optimization can be stated as:

$$\begin{gathered} {\text{Minimize }}f\left( {x_{1} ,x_{2} } \right) = \rho x_{1} \sqrt {1 + x_{2}^{2} } \hfill \\ {\text{Subjected to}} \hfill \\ g_{1} \left( {x_{1} ,x_{2} } \right) = 1 - \frac{{5Q}}{{\sqrt {65} S}}\sqrt {1 + x_{2}^{2} } \left( {\frac{8}{{x_{1} }} + \frac{1}{{x_{1} x_{2} }}} \right) \ge 0 \hfill \\ g_{2} \left( {x_{1} ,x_{2} } \right) = 1 - \frac{{5Q}}{{\sqrt {65} S}}\sqrt {1 + x_{2}^{2} } \left( {\frac{8}{{x_{1} }} + \frac{1}{{x_{1} x_{2} }}} \right) \ge 0 \hfill \\ 0.2 \le x_{1} \le 20,\quad 0.1 \le x_{2} \le 1.6 \hfill \\ \end{gathered}$$
(55)

The values of the other parameters \(\rho ,{\text{ }}Q{\text{ and }}S\) are \(10^{4} {\text{ kg/m}}^{{\text{3}}} ,{\text{ }}800{\text{ kN and }}1050{\text{ MPa}},\) respectively. The RDO formulation has been presented in Eq. (56).

$$\begin{gathered} {\text{Minimize }}F = w_{1} \frac{{\mu _{f} }}{{\mu _{f}^{*} }} + w_{2} \frac{{\sigma _{f} }}{{\sigma _{f}^{*} }} \hfill \\ {\text{Subjected to }}G_{1} = \mu _{{g_{1} }} - k\sigma _{{g_{1} }} \ge 0 \hfill \\ G_{2} = \mu _{{g_{2} }} - k\sigma _{{g_{2} }} \ge 0 \hfill \\ 0.2 \le \mu _{{x_{1} }} \le 20,{\text{ }}0.1 \le \mu _{{x_{2} }} \le 1.6 \hfill \\ \end{gathered}$$
(56)

The weighting factors \(w_{1} {\text{ and }}w_{2}\) have both been adopted as 0.5. \(\mu _{f}^{*} ,{\text{ }}\sigma _{f}^{*} {\text{ and }}k\) have been set as 10, 2 and 3, respectively. The description of the random variables has been presented in Table 6. The coefficient of variation of the two design variables is 0.02.

Table 6 Description of the random variables

The number of sample points utilized for the comparison of the various methods has been presented in Table 7. The corresponding robust optimal solutions obtained have been reported in Table 8. The number of iterations and function calls required for yielding the optimal solutions have been presented in Fig. 5.

Table 7 Number of sample points utilized for solving the example 2 (Sect. 4.2)
Table 8 Robust optimal solutions for example 2 (Sect. 4.2)
Fig. 5 Comparative assessment of the surrogate models for example 2 in terms of (a) the number of iterations and (b) the number of function calls required in yielding the optimal solutions. The number of function calls refers to the number of function evaluations in the optimization loop. The results obtained by MCS are also provided

4.3 Example 3: Bulk Carrier Design [16]

The third example considered is the RDO of a bulk carrier [16]. The cost function of the optimization problem is the unit transportation cost. The six design variables have been described in Table 9. The formulation involves several design constraints constructed based on geometry, stability and model validity.

Table 9 Description of design variables

The mathematical model of the cost function is presented briefly below.

$${\text{Annual cost = capital costs + running costs + voyage costs}}$$
(57)
$${\text{Capital costs = }}0.2{\text{ }}\left( {{\text{ship costs}}} \right)$$
(58)
$${\text{Ship cost = 1}}{\text{.3(2000}}W_{S}^{{0.85}} + 3500W_{0} + 2400P^{{0.8}} )$$
(59)
$${\text{Steel weight = }}W_{S} = 0.034L^{{1.7}} B^{{0.7}} D^{{0.4}} C^{{0.5}}$$
(60)
$${\text{Outfit weight = }}W_{0} = L^{{0.8}} B^{{0.6}} D^{{0.3}} C^{{0.1}}$$
(61)
$${\text{Machinery weight = }}W_{m} = 0.17P^{{0.9}}$$
(62)
$${\text{Displacement = }}1.025LBTC$$
(63)
$${\text{Power}} = P = {\text{displacement}}^{{2/3}} V^{3} /\left( {a + bF_{n} } \right)$$
(64)
$${\text{Froude number = }}F_{n} \, = \,{{V_{k} } \mathord{\left/ {\vphantom {{V_{k} } {\left( {gL} \right)}}} \right. \kern-\nulldelimiterspace} {\left( {gL} \right)}}^{{0.5}}$$
(65)
$$V_{k} = 0.5144{\text{ V}}$$
(66)

Equation (66) converts the ship speed \(V\) (in knots) to \({V_k}\) in \({\text{m/s}}\), and \(g = 9.8065{\text{ m/s}}^{2}\) in Eq. (65).

$$a=4977.06{C^2} - 8105.61C+4456.51$$
(67)
$$b= - 10847.2{C^2}+12817C - 6960.32$$
(68)
$${\text{Running costs = 40,000}}DWT^{{0.3}}$$
(69)
$${\text{Deadweight = }}DWT = {\text{displacement}} - {\text{light ship weight}}$$
(70)
$${\text{Light ship weight = }}W_{S} + W_{0} + W_{m}$$
(71)
$${\text{Voyage costs = }}\left( {{\text{fuel cost + port cost}}} \right)RTPA$$
(72)
$${\text{Fuel cost = }}1.05{\text{ daily consumption}} \times {\text{sea days}} \times {\text{fuel price }}$$
(73)
$${\text{Daily consumption = }}0.19P{{24} \mathord{\left/ {\vphantom {{24} {1000}}} \right. \kern-\nulldelimiterspace} {1000}} + 0.2$$
(74)
$${\text{Sea days = round trip miles/}}24V$$
(75)
$${\text{Port cost = }}6.3DWT^{{0.8}}$$
(76)
$${\text{Round trips per year = }}RTPA = {{350} \mathord{\left/ {\vphantom {{350} {\left( {{\text{sea days + port days}}} \right)}}} \right. \kern-\nulldelimiterspace} {\left( {{\text{sea days + port days}}} \right)}}$$
(77)
$${\text{Port days = }}2\left[ {\left( {{{{\text{cargo deadweight}}} \mathord{\left/ {\vphantom {{{\text{cargo deadweight}}} {{\text{handling rate}}}}} \right. \kern-\nulldelimiterspace} {{\text{handling rate}}}}} \right) + 0.5} \right]$$
(78)
$${\text{Cargo deadweight = }}DWT - {\text{fuel carried}} - {\text{miscellaneous }}DWT$$
(79)
$${\text{Fuel carried = daily consumption }}\left( {{\text{sea days + 5}}} \right)$$
(80)
$${\text{Miscellaneous }}DWT = 2DWT^{{0.5}}$$
(81)
$${\text{Annual cargo capacity = }}DWT \times {\text{round trips per year}}$$
(82)
$${\text{Unit transportation cost = annual cost/annual cargo capacity}}$$
(83)

The unit transportation cost has been adopted as the objective function for the conceptual design optimization of the bulk carrier and can be evaluated using Eq. (83). Since the design problem incorporates several environmental factors and involves detailed modelling, a large number of parameters are involved. Therefore, to avoid ambiguity, all parameters required for evaluating Eq. (83) have been defined in Eqs. (57)–(82). The constraints pertaining to the optimization problem have been defined in Eqs. (84)–(91).

$${L \mathord{\left/ {\vphantom {L B}} \right. \kern-\nulldelimiterspace} B} \ge 6$$
(84)
$${L \mathord{\left/ {\vphantom {L {D \le 15}}} \right. \kern-\nulldelimiterspace} {D \le 15}}$$
(85)
$${L \mathord{\left/ {\vphantom {L {T \le 19}}} \right. \kern-\nulldelimiterspace} {T \le 19}}$$
(86)
$$T \le 0.45DW{T^{0.31}}$$
(87)
$$T \le 0.7D+0.7$$
(88)
$$25,000 \le DWT \le 500,000$$
(89)
$${F_n} \le 0.32$$
(90)
$$GMT=KB+BMT - KG \ge 0.07B$$
(91)

where \(KB,BMT{\text{ and }}KG\) have been defined in Eqs. (92)–(94), respectively.

$${\text{Vertical center of buoyancy = }}KB = 0.53T$$
(92)
$${\text{Metacentric radius = }}BMT = \left( {0.085C - 0.002} \right){{B^{2} } \mathord{\left/ {\vphantom {{B^{2} } {TC}}} \right. \kern-\nulldelimiterspace} {TC}}$$
(93)
$${\text{Vertical center of gravity = }}KG = 1 + 0.52D$$
(94)
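
Since Eqs. (57)–(83) form a long dependency chain, a direct transcription into a single Python function is a useful sanity check; the sketch below follows the equations in evaluation order. The default values of the fuel price, handling rate and round-trip miles are placeholders assumed here for illustration; the actual parameter settings are those of the referenced problem.

```python
def unit_transportation_cost(L, B, D, T, V, C,
                             fuel_price=100.0, handling_rate=8000.0,
                             round_trip_miles=5000.0):
    """Unit transportation cost, Eq. (83), assembled from Eqs. (57)-(82)."""
    Ws = 0.034 * L**1.7 * B**0.7 * D**0.4 * C**0.5           # steel weight, Eq. (60)
    Wo = L**0.8 * B**0.6 * D**0.3 * C**0.1                   # outfit weight, Eq. (61)
    displacement = 1.025 * L * B * T * C                     # Eq. (63)
    a = 4977.06*C**2 - 8105.61*C + 4456.51                   # Eq. (67)
    b = -10847.2*C**2 + 12817.0*C - 6960.32                  # Eq. (68)
    Vk = 0.5144 * V                                          # knots -> m/s, Eq. (66)
    Fn = Vk / (9.8065 * L)**0.5                              # Froude number, Eq. (65)
    P = displacement**(2.0/3.0) * V**3 / (a + b*Fn)          # power, Eq. (64)
    Wm = 0.17 * P**0.9                                       # machinery weight, Eq. (62)
    ship_cost = 1.3*(2000.0*Ws**0.85 + 3500.0*Wo + 2400.0*P**0.8)   # Eq. (59)
    capital_costs = 0.2 * ship_cost                          # Eq. (58)
    DWT = displacement - (Ws + Wo + Wm)                      # Eqs. (70)-(71)
    running_costs = 40000.0 * DWT**0.3                       # Eq. (69)
    daily_consumption = 0.19*P*24.0/1000.0 + 0.2             # Eq. (74)
    sea_days = round_trip_miles / (24.0 * V)                 # Eq. (75)
    fuel_cost = 1.05 * daily_consumption * sea_days * fuel_price    # Eq. (73)
    port_cost = 6.3 * DWT**0.8                               # Eq. (76)
    fuel_carried = daily_consumption * (sea_days + 5.0)      # Eq. (80)
    cargo_dwt = DWT - fuel_carried - 2.0*DWT**0.5            # Eqs. (79), (81)
    port_days = 2.0 * (cargo_dwt/handling_rate + 0.5)        # Eq. (78)
    RTPA = 350.0 / (sea_days + port_days)                    # Eq. (77)
    voyage_costs = (fuel_cost + port_cost) * RTPA            # Eq. (72)
    annual_cost = capital_costs + running_costs + voyage_costs      # Eq. (57)
    return annual_cost / (DWT * RTPA)                        # Eqs. (82)-(83)

def gmt_margin(B, D, T, C):
    """Stability margin of Eqs. (91)-(94): positive when GMT >= 0.07*B."""
    KB, BMT, KG = 0.53*T, (0.085*C - 0.002)*B**2/(T*C), 1.0 + 0.52*D
    return KB + BMT - KG - 0.07*B
```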

The description of the random variables has been provided in Table 10. Two case studies have been performed for different configurations of the objective function, as presented in Table 11. The number of sample points utilized for the comparison of the various methods has been presented in Table 12. The robust optimal solutions corresponding to the case studies have been reported in Tables 13 and 14. The number of iterations and function calls required to yield the optimal solutions has been presented in Fig. 6.

Table 10 Description of uncertain variables
Table 11 Description of objective functions for example 3 defined in Sect. 4.3
Table 12 Number of sample points utilized for solving the example 3 (Sect. 4.3)
Table 13 Robust optimal solutions corresponding to case 1 of Table 11 for example 3
Table 14 Robust optimal solutions corresponding to case 2 of Table 11 for example 3
Fig. 6
figure 6

Comparative assessment of the surrogate models for example 3 in terms of (a) number of iterations, (b) number of function calls required in yielding the optimal solutions corresponding to case 1 of Table 11, and (c) number of iterations, (d) number of function calls required in yielding the optimal solutions corresponding to case 2 of Table 11. The number of function calls refers to the number of function evaluations in the optimization loop. The results obtained by MCS are also provided

4.4 Example 4: Welded Beam Design [128]

The fourth example considered is that of a welded beam design [128]. The objective is to minimize the cost of the beam subject to constraints on shear stress, bending stress, buckling load and end deflection. There are four continuous design variables, namely, weld thickness \({x_1}\), weld length \({x_2}\), beam depth \({x_3}\) and beam width \({x_4}\).

The problem description can be stated as follows:

$${\text{Minimize}}\quad f\left({\mathbf{x}} \right)=1.10471x_{1}^{2}{x_2}+0.04811{x_3}{x_4}\left( {14+{x_2}} \right)$$
(95)
$$\begin{gathered} {\text{s.t.}} \hfill \\ \quad \quad {g_1}\left( {\mathbf{x}} \right)=t - {t_{\max }} \le 0 \hfill \\ \quad \quad {g_2}\left( {\mathbf{x}} \right)=s - {s_{\max }} \le 0 \hfill \\ \quad \quad {g_3}\left( {\mathbf{x}} \right)={x_1} - {x_4} \le 0 \hfill \\ \quad \quad {g_4}\left( {\mathbf{x}} \right)=d - {d_{\max }} \le 0 \hfill \\ \quad \quad {g_5}\left( {\mathbf{x}} \right)=P - {P_c} \le 0 \hfill \\ \end{gathered}$$
(96)

where

$$M=P\left( {L+{x_2}/2} \right)$$
(97)
$$R=\sqrt {0.25\left( {x_{2}^{2}+{{\left( {{x_1}+{x_3}} \right)}^2}} \right)}$$
(98)
$$J=\sqrt 2 {x_1}{x_2}\left( {{{x_{2}^{2}} \mathord{\left/ {\vphantom {{x_{2}^{2}} {12+0.25{{\left( {{x_1}+{x_3}} \right)}^2}}}} \right. \kern-\nulldelimiterspace} {12+0.25{{\left( {{x_1}+{x_3}} \right)}^2}}}} \right)$$
(99)
$${P_c}=64746.022\left( {1 - 0.0282346{x_3}} \right){x_3}x_{4}^{3}$$
(100)
$${t_1}={P \mathord{\left/ {\vphantom {P {\left({\sqrt 2 {x_1}{x_2}} \right)}}} \right. } {\left({\sqrt 2 {x_1}{x_2}} \right)}}$$
(101)
$${t_2}={{MR} \mathord{\left/ {\vphantom {{MR} J}} \right. \kern-\nulldelimiterspace} J}$$
(102)
$$t=\sqrt {t_{1}^{2}+{{{t_1}{t_2}{x_2}} \mathord{\left/ {\vphantom {{{t_1}{t_2}{x_2}} R}} \right. \kern-\nulldelimiterspace} R}+t_{2}^{2}}$$
(103)
$$s={{6PL} \mathord{\left/ {\vphantom {{6PL} {\left( {{x_4}x_{3}^{2}} \right)}}} \right. \kern-\nulldelimiterspace} {\left( {{x_4}x_{3}^{2}} \right)}}$$
(104)
$$d={{2.1952} \mathord{\left/ {\vphantom {{2.1952} {\left( {{x_4}x_{3}^{3}} \right)}}} \right. \kern-\nulldelimiterspace} {\left( {{x_4}x_{3}^{3}} \right)}}$$
(105)
$$\begin{gathered} P = 6000,{\text{ }}L = 14,E = 30 \times 10^{6} ,{\text{ }}G = 12 \times 10^{6} , \hfill \\ t_{{\max }} = 13,600,{\text{ }}s_{{\max }} = 30,000,{\text{ }}x_{{\max }} = 10,{\text{ }}d_{{\max }} = 0.25{\text{ }} \hfill \\ 0.125 \le x_{1} \le 10,{\text{ }}0.1 \le x_{i} \le 10,{\text{for}}{\text{ }}\,i = 2,3,4. \hfill \\ \end{gathered}$$
(106)

For the RDO formulation of the problem, each of the design variables has been assumed to be normally distributed with a standard deviation of 5%. A case study has been performed considering the objective function \({\text{mean}}\left( {f\left( {\mathbf{x}} \right)} \right) + {\text{SD}}\left( {f\left( {\mathbf{x}} \right)} \right),\) where SD denotes the standard deviation.
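
A minimal sketch of the robust objective for this example follows, with the 5% standard deviation interpreted as 5% of the current mean design; that interpretation, the sample size and the seed are assumptions made for illustration. The constraints of Eq. (96) can be robustified with the same sampled statistics (e.g., requiring \(\mu_{g_i} + 3\sigma_{g_i} \le 0\)).

```python
import numpy as np

P, Lb = 6000.0, 14.0
rng = np.random.default_rng(0)

def cost(x1, x2, x3, x4):                            # Eq. (95), vectorized
    return 1.10471*x1**2*x2 + 0.04811*x3*x4*(14.0 + x2)

def shear_stress(x1, x2, x3):                        # Eqs. (97)-(99), (101)-(103)
    M = P * (Lb + x2/2.0)
    R = np.sqrt(0.25*(x2**2 + (x1 + x3)**2))
    J = np.sqrt(2.0)*x1*x2*(x2**2/12.0 + 0.25*(x1 + x3)**2)
    t1 = P / (np.sqrt(2.0)*x1*x2)
    t2 = M*R/J
    return np.sqrt(t1**2 + t1*t2*x2/R + t2**2)

def robust_objective(mu, n=10_000):
    """mean(f(x)) + SD(f(x)) with x_i ~ N(mu_i, (0.05*mu_i)^2)."""
    mu = np.asarray(mu, dtype=float)
    X = rng.normal(mu, 0.05*mu, size=(n, 4))
    fx = cost(*X.T)
    return fx.mean() + fx.std()

print(robust_objective([0.25, 6.0, 8.0, 0.25]))      # arbitrary trial design
```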

The number of sample points utilized for the comparison of the various methods has been presented in Table 15. The corresponding robust optimal solutions have been reported in Table 16. The number of iterations and function calls required to yield the optimal solutions has been presented in Fig. 7.

Table 15 Number of sample points utilized for solving the example 4 (Sect. 4.4)
Table 16 Robust optimal solutions for example 4 (Sect. 4.4)
Fig. 7
figure 7

Comparative assessment of the surrogate models for example 4 in terms of (a) number of iterations, (b) number of function calls required in yielding the optimal solutions. The number of function calls refers to the number of function evaluations in the optimization loop. The results obtained by MCS are also provided

4.5 Example 5: Speed Reducer [129]

The fifth example considered is that of a speed reducer, which is a standard optimization problem. The details of the theoretical formulation can be found elsewhere [129]. The problem consists of seven design variables and eleven constraint functions. The mathematical description of the problem is presented below.

$${\text{Minimize }}f\left( {\mathbf{x}} \right) = 0.7854x_{1} x_{2}^{2} A - 1.508x_{1} B + 7.477C + 0.7854D$$
(107)
$$\begin{gathered} {\text{where}},\;A = 3.3333x_{3}^{2} + 14.9334x_{3} - 43.0934 \hfill \\ \quad \quad \quad B = x_{6}^{2} + x_{7}^{2} \hfill \\ \quad \quad \quad C = x_{6}^{3} + x_{7}^{3} \hfill \\ \quad \quad \quad D = x_{4} x_{6}^{2} + x_{5} x_{7}^{2} \hfill \\ \end{gathered}$$
(108)
$$\begin{gathered} {\text{s.t.,}}\quad g_{1} \left( {\mathbf{x}} \right) = {{\left( {27 - x_{1} x_{2}^{2} x_{3} } \right)} / {27}} \le 0 \hfill \\ \quad \quad \;g_{2} \left( {\mathbf{x}} \right) = {{\left( {397.5 - x_{1} x_{2}^{2} x_{3}^{2} } \right)} / {397.5}} \le 0 \hfill \\ \quad \quad \;g_{3} \left( {\mathbf{x}} \right) = {{\left( {1.93 - {{x_{2} x_{6}^{4} x_{3} } / {x_{4}^{3} }}} \right)} / {1.93}} \le 0 \hfill \\ \quad \quad \;g_{4} \left( {\mathbf{x}} \right) = {{\left( {1.93 - {{x_{2} x_{7}^{4} x_{3} } / {x_{5}^{3} }}} \right)} / {1.93}} \le 0 \hfill \\ \quad \quad \;g_{5} \left( {\mathbf{x}} \right) = {{\left( {{{A_{1} } / {B_{1} }} - 1100} \right)} / {1100}} \le 0 \hfill \\ \quad \quad \;g_{6} \left( {\mathbf{x}} \right) = {{\left( {{{A_{2} } / {B_{2} }} - 850} \right)} / {850}} \le 0 \hfill \\ \quad \quad \;g_{7} \left( {\mathbf{x}} \right) = {{\left( {x_{2} x_{3} - 40} \right)} / {40}} \le 0 \hfill \\ \quad \quad \;g_{8} \left( {\mathbf{x}} \right) = {{\left( {5 - {{x_{1} } / {x_{2} }}} \right)} / 5} \le 0 \hfill \\ \quad \quad \;g_{9} \left( {\mathbf{x}} \right) = {{\left( {{{x_{1} } / {x_{2} }} - 12} \right)} / {12}} \le 0 \hfill \\ \quad \quad \;g_{{10}} \left( {\mathbf{x}} \right) = {{\left( {1.9 + 1.5x_{6} - x_{4} } \right)} / {1.9}} \le 0 \hfill \\ \quad \quad \;g_{{11}} \left( {\mathbf{x}} \right) = {{\left( {1.9 + 1.1x_{7} - x_{5} } \right)} / {1.9}} \le 0 \hfill \\ \end{gathered}$$
(109)
$$\begin{gathered} {\text{where,}}\;A_{1} = \left[ {\left( {{{745x_{4} } \mathord{\left/ {\vphantom {{745x_{4} } {\left( {x_{2} x_{3} } \right)}}} \right. \kern-\nulldelimiterspace} {\left( {x_{2} x_{3} } \right)}}} \right)^{2} + \left( {16.91 \times 10^{6} } \right)} \right]^{{0.5}} \hfill \\ \quad \quad \quad B_{1} = 0.1x_{6}^{3} \hfill \\ \quad \quad \quad A_{2} = \left[ {\left( {{{745x_{5} } \mathord{\left/ {\vphantom {{745x_{5} } {\left( {x_{2} x_{3} } \right)}}} \right. \kern-\nulldelimiterspace} {\left( {x_{2} x_{3} } \right)}}} \right)^{2} + \left( {157.5 \times 10^{6} } \right)} \right]^{{0.5}} \hfill \\ \quad \quad \quad B_{2} = 0.1x_{7}^{3} \hfill \\ \end{gathered}$$
(110)

The variable bounds have been presented in Eq. (111).

$$\begin{gathered} 2.6 \le {x_1} \le 3.6 \hfill \\ 0.7 \le {x_2} \le 0.8 \hfill \\ 17 \le {x_3} \le 28 \hfill \\ 7.3 \le {x_4},{x_5} \le 8.3 \hfill \\ 2.9 \le {x_6} \le 3.9 \hfill \\ 5 \le {x_7} \le 5.5 \hfill \\ \end{gathered}$$
(111)
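
For reference, Eqs. (107)–(110) translate directly into the following Python functions, which can then be wrapped by whichever robust objective of Table 17 is being studied; the transcription below is a straightforward sketch with no assumptions beyond 0-based indexing.

```python
import numpy as np

def f(x):                                            # Eqs. (107)-(108)
    x1, x2, x3, x4, x5, x6, x7 = x
    A = 3.3333*x3**2 + 14.9334*x3 - 43.0934
    B = x6**2 + x7**2
    C = x6**3 + x7**3
    D = x4*x6**2 + x5*x7**2
    return 0.7854*x1*x2**2*A - 1.508*x1*B + 7.477*C + 0.7854*D

def g(x):                                            # Eqs. (109)-(110), all g_i <= 0
    x1, x2, x3, x4, x5, x6, x7 = x
    A1 = ((745.0*x4/(x2*x3))**2 + 16.91e6)**0.5
    A2 = ((745.0*x5/(x2*x3))**2 + 157.5e6)**0.5
    return np.array([
        (27.0 - x1*x2**2*x3)/27.0,
        (397.5 - x1*x2**2*x3**2)/397.5,
        (1.93 - x2*x6**4*x3/x4**3)/1.93,
        (1.93 - x2*x7**4*x3/x5**3)/1.93,
        (A1/(0.1*x6**3) - 1100.0)/1100.0,
        (A2/(0.1*x7**3) - 850.0)/850.0,
        (x2*x3 - 40.0)/40.0,
        (5.0 - x1/x2)/5.0,
        (x1/x2 - 12.0)/12.0,
        (1.9 + 1.5*x6 - x4)/1.9,
        (1.9 + 1.1*x7 - x5)/1.9,
    ])
```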

For the RDO formulation of the problem, each of the design variables has been assumed to be normally distributed with a standard deviation of 5%. Case studies have been performed considering the objective functions shown in Table 17.

Table 17 Description of objective functions for example 5 (Sect. 4.5)

The number of sample points utilized for the comparison of the various methods has been presented in Table 18. The corresponding robust optimal solutions have been reported in Tables 19 and 20. The number of iterations and function calls required to yield the optimal solutions has been presented in Fig. 8.

Table 18 Number of sample points utilized for solving the example 5 (Sect. 4.5)
Table 19 Robust optimal solutions corresponding to case 1 of Table 17 for example 5 (Sect. 4.5)
Table 20 Robust optimal solutions corresponding to case 2 of Table 17 for example 5 (Sect. 4.5)
Fig. 8
figure 8

Comparative assessment of the surrogate models for example 5 in terms of (a) number of iterations, (b) number of function calls required in yielding the optimal solutions corresponding to case 1 of Table 17, and (c) number of iterations, (d) number of function calls required in yielding the optimal solutions corresponding to case 2 of Table 17. The number of function calls refers to the number of function evaluations in the optimization loop. The results obtained by MCS are also provided

4.6 Example 6: Side Impact Crashworthiness of Car [130]

The sixth example considered is the RDO of the side impact crashworthiness of a car [130]. The example has been reformulated as a robust optimization problem. The problem consists of eleven stochastic variables, nine of which are design variables. The descriptions of the stochastic and design variables have been provided in Tables 21 and 22.

Table 21 General description of the stochastic and design variables
Table 22 Statistical description of stochastic variables

The optimization problem formulation can be stated as:

$${\text{Minimize }}f\left( {\mathbf{x}} \right) = {\text{Weight}}$$
(112)
$$\begin{gathered} {\text{s.t.,}}\quad g_{1} \left( {\mathbf{x}} \right) = {\text{Abdomen}}\,\,{\text{load}} \le 1\,\,{\text{kN}} \hfill \\ \quad \quad \;g_{2} \left( {\mathbf{x}} \right) = V \times C_{u} \le 0.32{\text{ }}{{\text{m}} \mathord{\left/ {\vphantom {{\text{m}} {\text{s}}}} \right. \kern-\nulldelimiterspace} {\text{s}}} \hfill \\ \quad \quad \;g_{3} \left( {\mathbf{x}} \right) = V \times C_{m} \le 0.32{\text{ }}{{\text{m}} \mathord{\left/ {\vphantom {{\text{m}} {\text{s}}}} \right. \kern-\nulldelimiterspace} {\text{s}}} \hfill \\ \quad \quad \;g_{4} \left( {\mathbf{x}} \right) = V \times C_{l} \le 0.8{\text{ }}{{\text{m}} \mathord{\left/ {\vphantom {{\text{m}} {\text{s}}}} \right. \kern-\nulldelimiterspace} {\text{s}}} \hfill \\ \quad \quad \;g_{5} \left( {\mathbf{x}} \right) = {\text{upper}}\,\,{\text{rib}}\,\,{\text{deflection}} \le 32{\text{ mm}} \hfill \\ \quad \quad \;g_{6} \left( {\mathbf{x}} \right) = {\text{middle}}\,\,{\text{rib}}\,\,{\text{deflection}} \le 32{\text{ mm}} \hfill \\ \quad \quad \;g_{7} \left( {\mathbf{x}} \right) = {\text{lower}}\,\,{\text{rib}}\,\,{\text{deflection}} \le 32{\text{ mm}} \hfill \\ \quad \quad \;g_{8} \left( {\mathbf{x}} \right) = {\text{pubic}}\,\,{\text{force}} \le 4\,\,{\text{kN}} \hfill \\ \quad \quad \;g_{9} \left( {\mathbf{x}} \right) = {\text{vel.}}\,\,{\text{of}}\,\,{\text{V-pillar}}\,\,{\text{at}}\,\,{\text{midpoint}} \le 9.9{\text{ }}{{{\text{mm}}} \mathord{\left/ {\vphantom {{{\text{mm}}} {{\text{ms}}}}} \right. \kern-\nulldelimiterspace} {{\text{ms}}}} \hfill \\ \quad \quad \;g_{{10}} \left( {\mathbf{x}} \right) = {\text{front door vel}}{\text{. at B-pillar}} \le 15.7{\text{ }}{{{\text{mm}}} \mathord{\left/ {\vphantom {{{\text{mm}}} {{\text{ms}}}}} \right. \kern-\nulldelimiterspace} {{\text{ms}}}} \hfill \\ \end{gathered}$$
(113)

The functional forms of the objective and constraint functions have been provided in Eqs. (114)–(124).

$$f\left( {\mathbf{x}} \right)=1.98+4.9{x_1}+6.67{x_2}+6.98{x_3}+4.01{x_4}+1.78{x_5}+0.00001{x_6}+2.73{x_7}$$
(114)
$${g_1}\left( {\mathbf{x}} \right)=1.16 - 0.3717{x_2}{x_4} - 0.00931{x_2}{x_{10}} - 0.484{x_3}{x_9}+0.01343{x_6}{x_{10}}$$
(115)
$${g_2}\left( {\mathbf{x}} \right)=0.261 - 0.0159{x_1}{x_2} - 0.188{x_1}{x_8} - 0.019{x_2}{x_7}+0.0144{x_3}{x_5}+0.0008757{x_5}{x_{10}}+0.08045{x_6}{x_9}+0.00139{x_8}{x_{11}}+0.00001575{x_{10}}{x_{11}}$$
(116)
$${g_3}\left( {\mathbf{x}} \right)=0.214+0.00817{x_5} - 0.131{x_1}{x_8} - 0.0704{x_1}{x_9}+0.03099{x_2}{x_6} - 0.018{x_2}{x_7}+0.0208{x_3}{x_8}+0.121{x_3}{x_9} - 0.00364{x_5}{x_6}+0.0007715{x_5}{x_{10}} - 0.0005354{x_6}{x_{10}}+0.00121{x_8}{x_{11}}+0.00184{x_9}{x_{10}} - 0.018x_{2}^{2}$$
(117)
$${g_4}\left( {\mathbf{x}} \right)=0.74 - 0.61{x_2} - 0.163{x_3}{x_8}+0.001232{x_3}{x_{10}} - 0.166{x_7}{x_9}+0.227x_{2}^{2}$$
(118)
$${g_5}\left( {\mathbf{x}} \right)=28.98+3.818{x_3} - 4.2{x_1}{x_2}+0.0207{x_5}{x_{10}}+6.63{x_6}{x_9} - 7.77{x_7}{x_8}+0.32{x_9}{x_{10}}$$
(119)
$${g_6}\left( {\mathbf{x}} \right)=33.86+2.95{x_3}+0.1792{x_{10}} - 5.057{x_1}{x_2} - 11{x_2}{x_8} - 0.0215{x_5}{x_{10}} - 9.98{x_7}{x_8}+22{x_8}{x_9}$$
(120)
$${g_7}\left( {\mathbf{x}} \right)=46.36 - 9.9{x_2} - 12.9{x_1}{x_8}+0.1107{x_3}{x_{10}}$$
(121)
$${g_8}\left( {\mathbf{x}} \right)=4.72 - 0.5{x_4} - 0.19{x_2}{x_3} - 0.0122{x_4}{x_{10}}+0.009325{x_6}{x_{10}}+0.000191x_{{11}}^{2}$$
(122)
$${g_9}\left( {\mathbf{x}} \right)=10.58 - 0.674{x_1}{x_2} - 1.95{x_2}{x_8}+0.02054{x_3}{x_{10}} - 0.0198{x_4}{x_{10}}+0.028{x_6}{x_{10}}$$
(123)
$${g_{10}}\left( {\mathbf{x}} \right)=16.45 - 0.489{x_3}{x_7} - 0.843{x_5}{x_6}+0.0432{x_9}{x_{10}} - 0.0556{x_9}{x_{11}} - 0.000786x_{{11}}^{2}$$
(124)
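
Because the responses of Eqs. (114)–(124) are inexpensive closed-form polynomials, they are straightforward to evaluate inside any sampling loop. A partial transcription is sketched below with 0-based indexing (\(x_1 \mapsto\) x[0]); only the objective and the first constraint are shown, the remaining constraints following the same pattern.

```python
def weight(x):        # objective f(x), Eq. (114)
    return (1.98 + 4.90*x[0] + 6.67*x[1] + 6.98*x[2] + 4.01*x[3]
            + 1.78*x[4] + 0.00001*x[5] + 2.73*x[6])

def abdomen_load(x):  # g1(x), Eq. (115); feasible when the value is <= 1 kN
    return (1.16 - 0.3717*x[1]*x[3] - 0.00931*x[1]*x[9]
            - 0.484*x[2]*x[8] + 0.01343*x[5]*x[9])
```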

Case studies have been performed considering the objective functions shown in Table 23. The number of sample points utilized for the comparison of the various methods has been presented in Table 24. The corresponding robust optimal solutions have been reported in Tables 25 and 26. The number of iterations and function calls required to yield the optimal solutions has been presented in Fig. 9.

Table 23 Description of objective functions for example 6 (Sect. 4.6)
Table 24 Number of sample points utilized for solving the example 6 (Sect. 4.6)
Table 25 Robust optimal solutions corresponding to case 1 of Table 23 for example 6 (Sect. 4.6)
Table 26 Robust optimal solutions corresponding to case 2 of Table 23 for example 6 (Sect. 4.6)
Fig. 9
figure 9

Comparative assessment of the surrogate models for example 6 in terms of (a) number of iterations, (b) number of function calls required in yielding the optimal solutions corresponding to case 1 of Table 23, and (c) number of iterations, (d) number of function calls required in yielding the optimal solutions corresponding to case 2 of Table 23. The number of function calls refers to the number of function evaluations in the optimization loop. The results obtained by MCS are also provided

4.7 Results and Discussion

The results obtained for each of the six problems by utilizing the surrogate models have been discussed in this section. The discussion is presented in order to guide users on the appropriateness of a particular surrogate model for a specific problem. For each of the case studies performed, the best performing surrogate model (in terms of closeness to the MCS based solutions) has been marked in bold.

Firstly, in the case of example 1, ANOVA-D performs excellently, almost exactly matching the MCS based optimal solutions. Apart from ANN and SVM, all the other models exhibit acceptable approximation accuracy. ANOVA-D and the PCE based models have been observed to converge in fewer iterations and function calls than the other surrogate models. In example 2, ANOVA-D, FK, RBF and the PCE based approaches yield results strikingly similar to those of the MCS based RDO. These models also converge in relatively few iterations, whereas the remaining models probably experience delayed convergence due to false optima at intermediate iterations. In example 3, excellent agreement with the MCS based solutions has been obtained by ANOVA-D and UK, both in terms of accuracy and rate of convergence. Models such as RBF, LWP, ANN and MARS yield slightly inaccurate results, although in relatively fewer iterations than the above models. In example 4, ANOVA-D, PCE-OLS and PCE-LAR achieve almost exactly the MCS results, in relatively fewer iterations and function calls than the other surrogate models. Excellent agreement with MCS has also been obtained by ANOVA-D, PCE-OLS and PCE-LAR in example 5; these models outperform the other surrogates not only in response approximation but also in rate of convergence. FK also achieves satisfactory results; however, it requires a significantly higher number of iterations to converge. It is worth mentioning that PCE-Q did not achieve convergence in this example, owing to its inability to accurately approximate the response functions in the presence of multiple constraints. In example 6, exactly the same results as the MCS based solutions have been achieved by ANOVA-D, while other models, such as UK, PCE-LAR, LWP, ANN and MARS, have performed very well both in terms of approximation accuracy and rate of convergence. It is also worth mentioning that PCE-Q and PCE-OLS did not achieve convergence in this example, again owing to their inability to accurately approximate the response functions in a multi-constrained environment.

The results of the examples discussed above illustrate that the performance of a surrogate model is very sensitive in an RDO framework and may easily lead to an incorrect optimum. Across all of the examples carried out, only one surrogate model, i.e., ANOVA-D, has been consistent in accurately capturing the non-linear and multi-modal landscapes in the presence of constraints. It is also worth mentioning that PCE-LAR, being an adaptive sparse model, has achieved good results in most of the problems. The remaining models have been observed to perform well in a few problems but have been found unsuitable for other complex non-linear problems. Thus, on the basis of the above results, it is recommended to employ ANOVA-D as the surrogate model for highly complex problems whose response landscapes are difficult to capture and which additionally comprise multiple non-linear constraints. Since ANOVA-D has been observed to perform in a superior manner not only in terms of accuracy but also in convergence rate, it has been employed to solve a large-scale practical engineering problem in the next section.

5 Practical Problem: RDO of a Hydroelectric Dam Model

Electricity generation using a hydroelectric dam is primarily governed by the hourly water supplied through the turbine and the water level in the reservoir. Owing to environmental variations, a large amount of uncertainty is associated with a hydroelectric dam. Moreover, the cost of energy is also influenced by various factors. Hence, it is of utmost importance to consider the presence of uncertainties while optimizing (maximizing) the overall revenue of a hydroelectric dam.

The hydroelectric dam considered in this study, as presented in Fig. 10, is such that the water in the reservoir may leave either through the spillway or through the turbine; the water leaving through the turbine is utilized in producing electricity. The primary objective of designing the hydroelectric dam is to maximize the revenue generated by selling the electricity. A conventional optimization of this hydroelectric dam can be found in [131].

Fig. 10
figure 10

Schematic diagram of hydroelectric dam

Various uncertainties are associated with any hydroelectric dam. For instance, the flows through the spillway and turbine are generally controlled by machine-operated gates; however, it is not possible to control the flow exactly with such machinery, and this results in some uncertainty. On the other hand, the in-flow to the reservoir is uncontrolled, and hence a large source of uncertainty is associated with it. Moreover, the market price of electricity depends on various factors and is highly uncertain. It is to be noted that the flow through the spillway, the flow through the turbine, the in-flow and the market price are generally monitored on an hourly basis. In the present study, the simulation is run for 12 h, and hence the system under consideration involves 48 random variables. A detailed account of the uncertain variables involved has been provided in Table 27.

Table 27 Description of the random variables for hydroelectric dam model (Sect. 5)

The electricity produced in a hydroelectric dam depends on two primary parameters, namely the amount of water flowing through the turbine and the reservoir storage level. The reservoir storage, in turn, depends on three factors: (a) the in-flow, (b) the flow through the turbine and (c) the flow through the spillway. As the flow through the turbine increases, the water in the reservoir decreases. Therefore, it is necessary to compute the optimum flows through the turbine and spillway that maximize the electricity production. Moreover, certain constraints need to be considered while solving the optimization problem. First, both the reservoir level and the downstream flow rates should lie within specified limits. Secondly, the maximum flow through the turbine should not exceed the turbine capacity. Finally, the mean reservoir level at the end of the simulation should be the same as that at the beginning; this ensures that the reservoir is not emptied at the end of the optimization cycle. The RDO problem has been stated as:

$$\begin{gathered} \arg \min \; - \left[ {\beta \mu _{R} + \left( {1 - \beta } \right)\sigma _{R} } \right] \hfill \\ {\text{s.t.}}\quad \quad \mu _{{f_{t} \left( i \right)}} - 3\sigma _{{f_{t} \left( i \right)}} \ge 0,\forall i \hfill \\ \quad \quad \quad \mu _{{f_{t} \left( i \right)}} + 3\sigma _{{f_{t} \left( i \right)}} \le 25000,\forall i \hfill \\ \quad \quad \quad \mu _{{f_{t} \left( i \right)}} - 3\sigma _{{f_{t} \left( i \right)}} + \mu _{{f_{s} \left( i \right)}} - 3\sigma _{{f_{s} \left( i \right)}} \ge 500,\forall i \hfill \\ \quad \quad \quad \left| {\mu _{{f_{t} \left( i \right)}} + 3\sigma _{{f_{t} \left( i \right)}} + \mu _{{f_{s} \left( i \right)}} + 3\sigma _{{f_{s} \left( i \right)}} - \mu _{{f_{t} \left( {i - 1} \right)}} + 3\sigma _{{f_{t} \left( {i - 1} \right)}} - \mu _{{f_{s} \left( {i - 1} \right)}} + 3\sigma _{{f_{s} \left( {i - 1} \right)}} } \right| \le 500,\forall i \hfill \\ \quad \quad \quad \mu _{{S\left( i \right)}} - 3\sigma _{{S\left( i \right)}} \ge 50000,\forall i \hfill \\ \quad \quad \quad \mu _{{S\left( i \right)}} + 3\sigma _{{S\left( i \right)}} \le 100000,\forall i \hfill \\ \quad \quad \quad \mu _{{S\left( {{\text{end}}} \right)}} = 90000 \hfill \\ \end{gathered}$$
(125)

where \(\mu \left( \cdot \right){\text{ and }}\sigma \left( \cdot \right)\) denote the mean and standard deviation, respectively. R denotes the revenue generated and S denotes the storage of the reservoir. \(f_{t} {\text{ and }}f_{s}\) in Eq. (125) represent the flows through the turbine and spillway, respectively, and \(\beta\) is the weightage factor. The objective is to determine the \(f_{t} {\text{ and }}f_{s}\) that minimize the objective function defined in Eq. (125).
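
Because every constraint in Eq. (125) is imposed hourly, the problem is most naturally coded by assembling the 3-sigma bounds over arrays of length 12. The sketch below shows one way such an assembly might look; the function names and array-based layout are illustrative, and the hourly means and standard deviations would in practice be supplied by the ANOVA-D surrogate (or MCS) at the current design.

```python
import numpy as np

beta = 0.5

def robust_objective(mu_R, sg_R):
    """Objective of Eq. (125): the weighted revenue statistics, negated
    so that minimization maximizes the revenue."""
    return -(beta*mu_R + (1.0 - beta)*sg_R)

def robust_constraints(mu_ft, sg_ft, mu_fs, sg_fs, mu_S, sg_S):
    """Hourly 3-sigma inequality constraints of Eq. (125), returned as
    values that must all be >= 0. Inputs are length-12 arrays of means
    and standard deviations of turbine flow, spillway flow and storage.
    The terminal equality mu_S(end) = 90000 is handled separately."""
    cons = []
    cons += list(mu_ft - 3*sg_ft)                            # non-negative turbine flow
    cons += list(25_000 - (mu_ft + 3*sg_ft))                 # turbine capacity
    cons += list(mu_ft - 3*sg_ft + mu_fs - 3*sg_fs - 500)    # minimum downstream flow
    hi = mu_ft + 3*sg_ft + mu_fs + 3*sg_fs                   # worst-case high total flow
    lo = mu_ft - 3*sg_ft + mu_fs - 3*sg_fs                   # worst-case low total flow
    cons += list(500 - np.abs(hi[1:] - lo[:-1]))             # hourly ramp limit
    cons += list(mu_S - 3*sg_S - 50_000)                     # reservoir lower bound
    cons += list(100_000 - (mu_S + 3*sg_S))                  # reservoir upper bound
    return np.array(cons)
```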

The RDO results have been obtained for \(\beta =0.5,\) i.e., equal weightage has been assigned to the mean and standard deviation of the revenue generated. The number of sample points required for training the anchored ANOVA decomposition (ANOVA-D) model is 4609. The results obtained by ANOVA-D have been validated against those of MCS (\(10^{4}\) samples) for each optimization iteration. The robust optimal solutions obtained by utilizing ANOVA-D, as presented in Table 28 and Fig. 11, achieve excellent agreement with the benchmark MCS solutions, illustrating the high approximation accuracy of ANOVA-D. Additionally, ANOVA-D has utilized significantly fewer sample points than MCS, which illustrates its computational efficiency. Overall, the performance of ANOVA-D is commendable in such a high-dimensional problem as the present one.

Table 28 Robust optimal solutions for the hydroelectric dam problem (Sect. 5)
Fig. 11
figure 11

Comparison of the flow through turbine and spillway (i.e., design variables) as obtained by utilizing ANOVA-D and MCS

After carrying out an extensive numerical study utilizing several surrogate models in the RDO framework (Sect. 4), and addressing a practical engineering problem in an efficient manner (Sect. 5), the study is summarized briefly in the next section.

6 Summary and Recommendations

An extensive survey has been carried out in this study illustrating the performance of surrogate models in the RDO framework. As previously illustrated, the approximation accuracy of a surrogate model is a crucial factor in stochastic optimization, as a slight deviation in the results of any intermediate iteration may easily lead to a false or local optimum. Therefore, the motivation of the study has been to assess the performance of available surrogate models in terms of their approximation potential while solving typical non-linear RDO problems. The study may also serve as a guiding handbook for the selection of a suitable surrogate model for addressing a problem of a particular level of complexity. In this context, a few salient points are highlighted on the basis of the results achieved by the various surrogate models:

  • First and foremost, ANOVA-D has outperformed the other models investigated for solving typical non-linear RDO examples. Unlike the other models, it has proven consistent and robust in accurately approximating the response functions in all examples. It is highly recommended for use in future applications of stochastic optimization.

  • Secondly, the performance of least angle regression based PCE is noteworthy both in terms of yielding accurate solutions and rate of convergence, except in the third example. After ANOVA-D, PCE-LAR ranks second among the models utilized and is thus recommended for further use and improvement.

  • Thirdly, the quadrature based PCE did not achieve convergence in examples 5 and 6, and the ordinary least square based PCE failed to converge in example 6. Thus, they are not recommended for relatively high-dimensional non-linear problems with multiple constraints; however, they may be considered suitable for low-dimensional problems.

  • Fourthly, the performance of models such as Kriging, RBF, LWP, ANN and MARS has been observed to vary across problems and is thus inconsistent. It is highly recommended to validate the results obtained using these models against benchmark solutions, if available.

  • Lastly, SVM has yielded unacceptable results due to its inability to accurately capture the non-linearity of the functional space, and thus deviates from the true optimum in the presence of constraints.

Thus, after identifying the high reliability of ANOVA-D in complex landscapes, it has been employed to solve a practical large-scale hydroelectric dam model. As expected, ANOVA-D has yielded results similar to the MCS based RDO solutions while utilizing a limited number of training points, considering the scale of the problem. The study thus illustrates that the resilience of anchored ANOVA decomposition is noteworthy and encouraging for applications to more complex engineering systems.