1 Introduction

Although continual advances in CPUs and memory have drastically increased computer processing power, the computational cost of complex high-fidelity engineering simulations often makes it impractical to rely exclusively on simulation for design optimization (Jin et al. 2001). Ford Motor Company, for example, reported that a single crash simulation takes about 36–160 h (Wang and Shan 2007). For a two-dimensional optimization problem, assuming that on average 50 iterations are needed and that each iteration requires one crash simulation, the total computation time would reach 75 days to 11 months, which is unacceptable in practice. To reduce this computational cost, surrogate models (also referred to as "metamodels") are used to replace the expensive simulation models (Queipo et al. 2005; Viana et al. 2010). Surrogate modeling evolved from classical Design of Experiments (DOE) theory, in which the polynomial model is known as the "response surface model" and is essentially a surrogate as well. In addition to the commonly used polynomial model, Sacks et al. (1989a, b) proposed a stochastic model, i.e., Kriging (Cressie 1988), which treats the deterministic computer response as a realization of a random function with respect to the actual system response. Neural networks are also often applied to approximate the responses of complex systems (Papadrakakis et al. 1998). Other types of metamodels include radial basis functions (RBF) (Fang and Horstemeyer 2006), multivariate adaptive regression splines (MARS) (Friedman 1991), least interpolating polynomials (De Boor and Ron 1990), inductive learning (Langley and Simon 1995), support vector regression (SVR), and so on. In general, Kriging is more accurate than other models for non-linear problems because it interpolates the sample points and filters noisy data, but it is difficult to build and use because a global optimization process is needed to identify the maximum likelihood estimators. In contrast, polynomial models are relatively easy to build and transparent with respect to parameter sensitivity, but their accuracy is often unsatisfactory because the model structure (the highest order and the number of terms) is difficult to determine (Jin et al. 2001). The RBF model, particularly the multiquadric RBF, interpolates the sample points and is easy to build, and thus seems to offer a trade-off between Kriging and polynomial models. SVR has been studied intensively in machine learning but has seldom been used in computer experiments. Its fitting capability was tested and verified by Clarke et al. (2005), who showed that it achieved higher accuracy than all other metamodeling techniques considered, including Kriging, polynomials, RBF and MARS, on a series of test problems; however, as the authors pointed out, the basic reasons why SVR outperforms the others are not clear. More recent and comprehensive reviews of metamodeling can be found in Kleijnen et al. (2005), Wang and Shan (2007), Simpson et al. (2008) and Forrester and Keane (2009).

If only a single predictor is desired, there are two strategies for obtaining the final surrogate. One is selection, which can be done using cross validation (Picard and Cook 1984; Kohavi 1995); the other is combination, which can be traced back to the committees of neural networks developed by Perrone and Cooper (1993) and further refined by Bishop (1995). Zerpa et al. (2005) and Goel et al. (2007) extended this idea to ensembles of metamodels. Goel et al. (2007) found that multiple metamodels can be used to identify regions of possibly high error, where the predictions of the metamodels differ widely; this can guide the engineer to gather more sample points in these uncertain regions and thus achieve more accurate results. The authors also found that combining metamodels provides a more robust ensemble that effectively eliminates the negative impact of a poorly chosen stand-alone metamodel; in other words, the use of multiple surrogates acts like an insurance policy against poorly fitted models, which is also confirmed by Viana et al. (2009). Acar and Rais-Rohani (2009) proposed a combining technique with optimized weight coefficients, obtained by solving an optimization problem. This technique can achieve satisfactory results in some cases, but it has several deficiencies: (1) the optimization problem used to determine the weight coefficients cannot guarantee a global optimum, is easily trapped in a local optimum, and may even have no local optimum; and (2) the weight coefficients are not constrained to w_i ≥ 0 when solving the optimization problem, although w_i < 0 is difficult to interpret in actual problems. In Acar and Rais-Rohani (2009), the weights are obtained by minimizing GMSE or RMSEv using a formal optimization algorithm in MATLAB. In terms of minimizing RMSEv, the technique is essentially the same as Bishop's approach of minimizing the mean square error (MSE). Inspired by the works of Bishop (1995) and Acar and Rais-Rohani (2009), Viana et al. (2009) also obtained the weight coefficients by minimizing the MSE. Viana et al. (2009) solved for the weights via Lagrange multipliers and replaced the real error covariance matrix C with the cross-validation error matrix; the corresponding method is named OWS (optimal weighted surrogate) in the literature. However, OWS is essentially the same as the approach based on minimizing GMSE in Acar and Rais-Rohani (2009). In order to keep the weights between zero and one, Viana et al. (2009) also used only the diagonal elements of C, naming the resulting method OWS diag; as the authors note, this method has a structure and prediction accuracy similar to the heuristic computation of the weights in Goel et al. (2007). In addition to the ensemble techniques mentioned above, several other ensemble techniques have appeared in the literature, such as BestPRESS (Goel et al. 2007), OWS ideal (Viana et al. 2009), and so on. Essentially, OWS ideal (Viana et al. 2009) is the same as minimizing RMSEv in Acar and Rais-Rohani (2009); the difference is that the RMSEv approach employs a formal optimization algorithm, while OWS ideal is obtained via Lagrange multipliers.

Motivated by these existing works, an ensemble technique with recursive arithmetic averaging is proposed in this paper. The weights are obtained through a recursive process in which their values are updated in each iteration until the final ensemble reaches a desirable prediction accuracy. The technique builds an ensemble of metamodels by applying the arithmetic average recursively several times, rather than arithmetically averaging the responses of the stand-alone metamodels just once. In order to illustrate the performance of the proposed technique, four types of metamodels (polynomial function, Kriging, RBF and SVR) are used to build the ensemble, and these four stand-alone metamodels as well as the existing ensemble techniques are compared with the proposed ensemble technique. The performances of the stand-alone metamodels and all of the ensembles are evaluated by several commonly used criteria (e.g., correlation coefficient (denoted by R), maximum absolute error (MAE), average absolute error (AAE), and root mean square error (RMSE)). The experimental results show that the proposed ensemble of metamodels with recursive arithmetic average provides more accurate predictions than the stand-alone metamodels and, for most problems, even exceeds the previously presented ensemble techniques.

The remainder of this paper is organized as follows. In the next section, we present the basic weighted-sum formulation and the different techniques that can be used to select the weight factors of the stand-alone metamodels. In Section 3, the test problems are described and the numerical procedure for building an ensemble with recursive arithmetic average is presented. Results are presented and discussed in Section 4. Finally, several important conclusions are summarized in Section 5.

2 Ensemble of surrogates

For a given problem, if all the candidate metamodels developed for a high-fidelity simulation happen to have the same level of accuracy, a very straightforward form of the ensemble would be a simple average of the surrogates. In the usual case, however, some models are more accurate than others. Therefore, in order to improve the accuracy of the ensemble, the stand-alone surrogates have to be multiplied by different weight coefficients. An ensemble of surrogates used to approximate the response can be expressed as:

$$ \widehat{y}_s (x)=\sum\limits_{i=1}^N {w_i (x)\,\widehat{y}_i (x)} ,\qquad \sum\limits_{i=1}^N {w_i (x)} =1 $$
(1)

where x is the input variable, \(\widehat{y}_s (x)\) is the ensemble response, N is the number of surrogates in the ensemble, \(w_i (x)\) is the weight coefficient of the ith surrogate, and \(\widehat{y}_i (x)\) is the response estimated by the ith surrogate.

Generally, the weight coefficients are selected such that surrogates with high accuracy receive large weights and vice versa.
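
For concreteness, formula (1) can be evaluated as in the short Python sketch below; the surrogate callables and weight values are hypothetical placeholders, and the helper name ensemble_predict is ours, not part of any cited work.

```python
import numpy as np

def ensemble_predict(x, surrogates, weights):
    """Evaluate the weighted-sum ensemble of (1) at the point(s) x.

    surrogates : list of callables, each returning the prediction of one
                 stand-alone metamodel at x
    weights    : weight coefficients w_i, assumed to sum to one
    """
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.sum(), 1.0), "weights must sum to one"
    preds = np.array([s(x) for s in surrogates])   # stack of y_i(x)
    return np.tensordot(weights, preds, axes=1)    # sum_i w_i * y_i(x)

# Toy usage with two stand-in "surrogates" (purely illustrative):
s1 = lambda x: np.sin(x)            # stand-in for, e.g., a Kriging prediction
s2 = lambda x: x - x**3 / 6.0       # stand-in for, e.g., a polynomial prediction
print(ensemble_predict(np.linspace(0.0, 1.0, 5), [s1, s2], [0.7, 0.3]))
```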

The ensembles of surrogates in the literature can be divided into three categories:

  1. (1)

    Combining surrogates by minimizing cross-validation errors (GMSE; PRESS in particular), e.g., heuristic computation of the weight coefficient (Goel et al. 2007), the approach based on minimizing GMSEv in Acar and Rais-Rohani (2009), OWS, OWS diag (Viana et al. 2009), and BestPRESS (Goel et al. 2007; Viana et al. 2009);

  2. (2)

    Combining surrogates using prediction variance, e.g., the approach obtaining the weights based on variance reciprocal (Bishop 1995; Zerpa et al. 2005);

  3. (3)

    Combining surrogates by minimizing mean square error (or root of mean square error (RMSE)), e.g., OWS ideal (Viana et al. 2009), the approach based on minimizing RMSEv in Acar and Rais-Rohani (2009).

In the first category, the weights are determined using the training points; in the second and third categories, the weights are determined using validation points in a test set. The techniques that determine the weights using cross validation are time-consuming, while those using validation points require additional simulations to determine the responses. Depending on the type of surrogate and the computational cost of the simulation, one error metric (PRESS or RMSE) will be less expensive to evaluate than the other. If the cost of obtaining the data required for developing the surrogate models is high, choosing PRESS as the error metric is a reasonable strategy, because RMSE requires additional response evaluations at the test set. Conversely, if constructing the surrogate is computationally costly, RMSE (or MSE) is the better choice, because only a single surrogate has to be constructed. The technique proposed in this paper belongs to the third category. The details of all the ensembles are presented below.

2.1 Weight coefficients selection based on prediction variance

Based on the work of Bishop (1995), Zerpa et al. (2005) used an ensemble of surrogates, including a response surface (RS) model, a Kriging model and an RBF model, in the optimization of an alkali–surfactant–polymer flooding process, and chose the prediction variance as the error metric. The values of the weight coefficients are determined by the following formula:

$$ w_i =\frac{w_i^\ast }{\sum\limits_{i=1}^N {w_i^\ast } },\qquad w_i^\ast =\frac{1}{V_i } $$
(2)

where V i is the prediction variance of the ith surrogate.
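
A minimal sketch of (2), assuming the prediction variances V_i of the individual surrogates are already available (e.g., from the Kriging predictor); the function name is ours:

```python
import numpy as np

def variance_weights(variances):
    """Weights of (2): w_i proportional to 1 / V_i, normalized to sum to one."""
    w_star = 1.0 / np.asarray(variances, dtype=float)
    return w_star / w_star.sum()

# Hypothetical example: three surrogates with prediction variances 0.5, 1.0, 2.0;
# the most precise surrogate receives the largest weight.
print(variance_weights([0.5, 1.0, 2.0]))
```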

2.2 Combining surrogates by minimizing cross-validation errors

2.2.1 Heuristic computation of the weight coefficient

Goel et al. (2007) proposed a heuristic method for calculating the weight coefficients, which is known as PRESS (predicted residual sum of squares) weighted average surrogate, where the weight coefficients are computed as:

$$ \begin{array}{rll} w_i &=&\dfrac{w_i^\ast }{\sum\limits_{i=1}^N {w_i^\ast } },\qquad w_i^\ast =\big(E_i +\alpha E_{avg} \big)^\beta ,\\[8pt] E_{avg} &=&\dfrac{1}{N}\sum\limits_{i=1}^N {E_i } ,\qquad \beta <0,\;\alpha <1 \end{array} $$
(3)

where E i is the PRESS error of the ith surrogate, and α and β control the relative importance of the average and the individual PRESS values, respectively. Goel et al. (2007) suggested α = 0.05 and β = −1.
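
The heuristic of (3) translates directly into code; the sketch below assumes the PRESS errors E_i of the surrogates have already been computed and uses the suggested defaults α = 0.05 and β = −1:

```python
import numpy as np

def heuristic_weights(press_errors, alpha=0.05, beta=-1.0):
    """PRESS-weighted average surrogate weights of (3) (Goel et al. 2007)."""
    E = np.asarray(press_errors, dtype=float)
    E_avg = E.mean()
    w_star = (E + alpha * E_avg) ** beta   # beta < 0: small PRESS error -> large weight
    return w_star / w_star.sum()

# Hypothetical PRESS errors of four surrogates:
print(heuristic_weights([1.2, 0.8, 2.5, 1.0]))
```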

2.2.2 The approach based on minimizing GMSEv

Acar and Rais-Rohani (2009) proposed to determine the weight coefficients by minimizing a selected error metric, such as the PRESS error. The optimization problem is:

$$ \begin{array}{rll} \min\limits_{w_i } \varepsilon _s &=&Err\left\{\widehat{y}_s \big(w_i ,\widehat{y}_i ({{\bf x}}^k)\big),\;y\big({{\bf x}}^k\big)\right\},\quad k=1,\ldots ,n \\ \mathrm{s.t.}\;\sum\limits_{i=1}^N {w_i } &=&1 \end{array} $$
(4)

where Err{·} is the selected error metric, which measures the accuracy of the ensemble prediction \(\widehat{y}_s \) against the actual responses \(y({{\bf x}}^k)\) at the n data points used for error evaluation. The authors adopted the generalized mean square cross-validation error (GMSE; leave-one-out cross validation, or PRESS in polynomial response surface terminology) as one such error metric.
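
A hedged sketch of the weight-selection problem (4): it assumes that the cross-validation error of the ensemble at a point can be approximated by the weighted sum of the individual cross-validation errors (the same approximation used for C in Section 2.2.3), and it uses scipy.optimize.minimize with an equality constraint on the weight sum; this is an illustration, not the authors' MATLAB implementation.

```python
import numpy as np
from scipy.optimize import minimize

def gmse_weights(e_cv):
    """Weights minimizing the ensemble cross-validation error, in the spirit of (4).

    e_cv : (p, N) array of cross-validation (PRESS) errors at the p training
           points, one column per surrogate.
    """
    p, N = e_cv.shape

    def ensemble_gmse(w):
        # Approximate the ensemble cross-validation error at each point by the
        # weighted sum of the individual cross-validation errors.
        return np.mean((e_cv @ w) ** 2)

    cons = {"type": "eq", "fun": lambda w: w.sum() - 1.0}
    w0 = np.full(N, 1.0 / N)                 # start from the simple average
    return minimize(ensemble_gmse, w0, constraints=[cons]).x

# Hypothetical cross-validation errors for four surrogates at 20 points:
rng = np.random.default_rng(0)
print(gmse_weights(rng.normal(size=(20, 4))))
```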

2.2.3 OWS (Optimal weighted surrogate)

Employing an ensemble of neural networks, Bishop (1995) proposed a weighted surrogate obtained by approximating the covariance between surrogates from their residuals at training or test points; the approach is based on minimizing the MSE:

$$ \mbox{MSE}_{\rm WAS} = \frac{1}{\rm V}\int_V e_{\rm WAS}^2({\bf x}) d{\bf x} = {\bf w}^T C{\bf w} $$
(5)

where \(e_{\rm WAS}^{}({\bf x}) = y({\bf x}) - {y_{\rm WAS}}({\bf x})\) is the error associated with the prediction of the WAS ensemble model, and the integral, taken over the domain of interest, permits the calculation of the elements of C as:

$$ {c_{ij}} = \frac{1}{V}\int_V {{e_i}({{\bf x}}){e_j}({{\bf x}})} d{{\bf x}} $$
(6)

where e i (x) and e j (x) are the errors associated with the prediction given by the surrogate model i and j respectively.

C plays the same role as the covariance matrix in Bishop's formulation, but it is approximated by the vectors of cross-validation errors, \(\tilde{e}\),

$$ {c_{ij}} \simeq \frac{1}{p}{\widetilde{e}_i^T}{\widetilde{e}_j} $$
(7)

where p is the number of data points and the i and j indicate different surrogates.

Given the C matrix, the optimal weighted surrogate (OWS) is obtained by minimizing the MSE as:

$$ \mathop {\min }\limits_{{\bf w}} \;{\rm MSE}_{\rm WAS} = {{\bf w}}^T{{\bf C}}{{\bf w}} $$
(8)

s.t. \({\bf 1}^T{\bf w} = 1\).

Using Lagrange multipliers, the solution is obtained as:

$$ {\bf w} = \frac{{\bf C}^{-1}{\bf 1}}{{{\bf 1}^T}{{\bf C}^{-1}{\bf 1}}} $$
(9)

The weights in the formulation above may be less than zero or larger than one, which is difficult to interpret physically, and, as pointed out by Viana et al. (2009), allowing this freedom was found to amplify errors coming from the approximation of the matrix in (7). In Viana et al. (2009), the authors enforced positive weights by solving (9) using only the diagonal elements of C; this approach is named OWS diag .
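
The sketch below assembles C from cross-validation error vectors as in (7) and computes the OWS weights of (9); the OWS diag variant keeps only the diagonal of C before solving. It is a minimal illustration rather than the code of Viana et al. (2009).

```python
import numpy as np

def ows_weights(e_cv, diag_only=False):
    """OWS weights of (9): w = C^{-1} 1 / (1^T C^{-1} 1).

    e_cv : (p, N) array of cross-validation errors, one column per surrogate.
    diag_only : if True, use only diag(C) (the OWS diag variant), which keeps
                all weights between zero and one.
    """
    p = e_cv.shape[0]
    C = (e_cv.T @ e_cv) / p                  # approximation (7) of the error covariance
    if diag_only:
        C = np.diag(np.diag(C))
    ones = np.ones(C.shape[0])
    w = np.linalg.solve(C, ones)
    return w / (ones @ w)

rng = np.random.default_rng(1)
e = rng.normal(size=(20, 4))
print(ows_weights(e))                        # may contain negative weights
print(ows_weights(e, diag_only=True))        # all weights in (0, 1)
```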

Examining formulas (4) and (9), we find that both approaches are actually the same, since both are based on minimizing cross-validation error (in particular PRESS, or GMSE). The difference is that the approach in Acar and Rais-Rohani (2009) obtains the weights through an optimization process, while the approach in Viana et al. (2009) obtains them through an analytical expression; both approaches have exactly the same solution. Therefore, to avoid duplication, OWS is not included in the rest of this paper.

2.2.4 BestPRESS

The traditional way of using multiple surrogates is to select the best surrogate among all of the considered models. However, once the choice is made, the surrogate is fixed even if the design of experiments changes. If the choice is repeated for each new DOE, we can include it among the strategies for multiple surrogates, in which the model with the smallest PRESS error is assigned a weight of one and all others are assigned zero weight. Following the literature, we call this strategy the BestPRESS model.
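
BestPRESS reduces to assigning a unit weight to the surrogate with the smallest PRESS error; a one-function sketch:

```python
import numpy as np

def best_press_weights(press_errors):
    """Unit weight for the surrogate with the smallest PRESS error, zero for the rest."""
    w = np.zeros(len(press_errors))
    w[int(np.argmin(press_errors))] = 1.0
    return w

print(best_press_weights([1.2, 0.8, 2.5, 1.0]))   # -> [0. 1. 0. 0.]
```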

2.3 Combining surrogates by minimizing mean square error (MSE) (or root of mean square error (RMSE))

2.3.1 OWS ideal : the approach based on minimizing RMSEv in Acar and Rais-Rohani (2009)

In formula (7), if \(\tilde{e}\) is the vector of real errors at a set of validation points rather than the cross-validation errors at the training points, then C in formula (9) is not the cross-validation error covariance matrix but the real error covariance matrix. As pointed out above, OWS ideal is exactly the same as the approach based on minimizing RMSEv in Acar and Rais-Rohani (2009), in which RMSEv (where v is the number of validation points in the test set) is chosen as the error metric in formula (4). Therefore, to avoid duplication, the remainder of this paper does not include OWS ideal .

2.3.2 The strategy proposed in this paper: ensemble of surrogates with recursive arithmetic average

As mentioned above, most ensemble techniques obtain the weights either by minimizing cross-validation errors or by minimizing RMSE (or MSE). Although the techniques using cross-validation errors do not require additional validation points, each surrogate must be constructed many times, which makes them time-consuming. Conversely, the techniques based on RMSE (or MSE) need additional validation points, but each surrogate only needs to be constructed once, so they are time-saving. In addition, when the RMSE at the test points is used as the error criterion, the techniques using RMSE usually give better results, because the error metric employed in obtaining the weights is the same as that used to measure the prediction accuracy. The technique proposed in this paper also employs the prediction mean square error as the error metric.

Among all combining techniques, the simplest and most straightforward approach is to arithmetically average the single surrogates. Nevertheless, arithmetically averaging the stand-alone surrogates just once does not minimize the prediction mean square error. In order to make the prediction mean square error as low as possible, we employ a recursive process. The iteration in the recursive process is generally repeated several times, the exact number depending on the specified stopping criterion. In this strategy, the algorithm stops when the prediction MSE of the worst surrogate approaches that of the best surrogate; in other words, all the updated surrogates in the last iteration have similar prediction results (i.e., similar prediction MSEs). Furthermore, we emphasize that the surrogates in the recursive process are not the initial single surrogates but the combined surrogates obtained by arithmetic averaging. The basic framework of the algorithm is as follows (a minimal code sketch is given after the listing):

  • Input:  Initial weight coefficients

  • Step 0:  Fit the training data {x_j}, j = 1, 2, ..., T (where T is the number of training points) with the N candidate surrogates;

  • Step 1:  Calculate their prediction mean square errors at the validation points: \({e_i} = \frac{1}{V}\sum\limits_{j = 1}^V {\big(y_j - \widehat{Sur}_{ij}\big)^2} ,\;i = 1,2,\ldots,N\), where V is the number of validation points, y_j is the actual response at the jth validation point, and \(\widehat{Sur}_{ij}\) is the prediction of the ith individual surrogate at that point;

  • Step 2:  Find out the worst individual surrogate (i.e., the surrogate that has the largest prediction MSE, denoted by Sur worst , and its corresponding prediction MSE is denoted by MSE WorstSur ) and the best surrogate (i.e., the surrogate that has the smallest prediction MSE, denoted by Sur best , and its corresponding prediction MSE is denoted by MSE BestSur ).

    While (MSE WorstSur  − MSE BestSur  > tol) Do

  • Step 3:  Obtain the arithmetic average of the candidate N surrogates; that is, all the candidate single surrogates are added, and then divided by the total number of all the candidate surrogates; denote this average ensemble model using Sur ave ;

  • Step 4:  Replace the surrogate with the largest prediction MSE (i.e., Sur worst ) with the simple average surrogate (i.e., Sur ave ) built in Step 3 (the surrogate being replaced may be one of the initial candidate surrogates or the average ensemble model from a previous iteration); this yields N new surrogates, of which N − 1 are unchanged; calculate and update the weights of the initial individual surrogates;

  • Step 5:  Repeat the calculations of Step 2; if the condition in the While (·) is still met, return to Step 3; otherwise exit the loop.

    EndWhile

  • Output: Optimal weight coefficients
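
A minimal Python sketch of Steps 0–5, assuming the predictions of the N candidate surrogates at the V validation points and the corresponding actual responses are available as arrays. The weights of the original surrogates are tracked through the repeated averaging with a weight matrix, and the arithmetic average of the final set of surrogates is returned as the output weights, which is our reading of Steps 3–5; all names and the toy data are hypothetical.

```python
import numpy as np

def recursive_average_weights(preds, y_true, tol=0.01, max_iter=1000):
    """Sketch of Steps 0-5: recursive arithmetic-average ensemble.

    preds  : (N, V) array, predictions of the N candidate surrogates at the
             V validation points
    y_true : (V,) array of actual responses at the validation points
    Returns the weights of the original surrogates in the final averaged
    ensemble (non-negative and summing to one).
    """
    N = preds.shape[0]
    W = np.eye(N)               # row i: current surrogate i expressed in the originals

    def mses(W):
        err = W @ preds - y_true            # errors of the current surrogates
        return np.mean(err ** 2, axis=1)

    e = mses(W)
    for _ in range(max_iter):
        worst, best = np.argmax(e), np.argmin(e)
        if e[worst] - e[best] <= tol:       # stop criterion of the While loop
            break
        W[worst] = W.mean(axis=0)           # Steps 3-4: replace the worst surrogate
        e = mses(W)                         #            by the arithmetic average
    return W.mean(axis=0)                   # weights of the final average surrogate

# Hypothetical example: four noisy "surrogates" of a quadratic response
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 16)
y = x ** 2
preds = np.vstack([y + rng.normal(0.0, s, x.size) for s in (0.05, 0.1, 0.2, 0.4)])
print(recursive_average_weights(preds, y, tol=1e-4))
```

Because every row of the weight matrix remains a convex combination of the original surrogates, the returned weights are automatically non-negative and sum to one, which is consistent with the discussion at the end of this subsection.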

The iteration continues until the prediction MSE shows no significant improvement. In the algorithm above, tol is a tolerance value specified in advance (e.g., tol = 0.01). The convergence of the algorithm is established as follows.

For a given problem, there are N surrogates Sur 1 , Sur 2 , ..., Sur N ; the weight of Sur i is w i , and \(\sum\limits_{i = 1}^N {{w_i} = 1}\). Assume the prediction value and prediction error of the ith surrogate Sur i at the jth data point are Sur ij and e ij respectively, j = 1, 2, ..., T (where T is the number of points at which the prediction errors are evaluated); then the prediction value and prediction error of the weighted surrogate at the jth data point are \(Sur_{ave}(j) = \sum\limits_{i = 1}^N {{w_i}Sur_{ij}} \) and \(e_{ave}(j) = \sum\limits_{i = 1}^N {{w_i}e_{ij}} \), respectively. Denote the weight vector by W = [w 1 ,w 2 ,...,w N ]T, the prediction error vector of Sur i by E i  = [e i1 ,e i2 ,...,e iT ]T, the prediction error matrix by e = [E 1 ,E 2 ,...,E N ], and the sum of squared prediction errors of the weighted surrogate by J; then the following holds:

$$ J = {W^T}EW, $$
(10)

where

$$E = {e^T}e = \left[ {\begin{array}{*{20}{c}} {{E_{11}}} & {{E_{12}}} & \ldots & {{E_{1N}}} \\ {{E_{21}}} & {{E_{22}}} & \ldots & {{E_{2N}}} \\ \vdots & \vdots & \ddots & \vdots \\ {{E_{N1}}} & {{E_{N2}}} & \ldots & {{E_{NN}}} \\ \end{array}} \right],$$

and where

$$ {E_{ij}} = E_i^T{E_j} = \sum\limits_{t = 1}^T {{e_{it}}{e_{jt}}}. $$

Apparently, E ii is the sum of squared prediction errors of Sur i .

Based on the description above, we have the following lemma.

Lemma 1

Assume the prediction error vectors E 1 , E 2 , ..., E N are linearly independent, denote the sum of squared prediction errors of the simple average surrogate by J A , and let \(J_{\max } = \max\limits_{i} {E_{ii}}\) be the largest of the individual error sums; then

$$ {J_A} < {J_{\max }}. $$
(11)

Proof

The weights of the simple average surrogate are

$$ {W_A} = {[1/N,1/N,\ldots ,1/N]^T}, $$
(12)

and

$$ {J_A} = W_A^TE{W_A} = \frac{1}{{{N^2}}}\sum\limits_{i = 1}^N {\sum\limits_{j = 1}^N {\sum\limits_{t = 1}^T {{e_{it}}{e_{jt}}} } } . $$
(13)

Because E 1 , E 2 , ..., E N are linearly independent, the Cauchy–Schwarz inequality gives the strict bound

$$ \begin{array}{rll} \sum\limits_{t = 1}^T {{e_{it}}{e_{jt}}} &<& \sqrt {\sum\limits_{t = 1}^T {e_{it}^2} } \sqrt {\sum\limits_{t = 1}^T {e_{jt}^2} } = \sqrt {{E_{ii}}} \sqrt {{E_{jj}}}\\ &\le& \sqrt {{J_{\max }}} \sqrt {{J_{\max }}} = {J_{\max }}, \end{array} $$
(14)

so,

$${J_A} < \frac{1}{{{N^2}}}\sum\limits_{i = 1}^N {\sum\limits_{j = 1}^N {{J_{\max }}} } = {J_{\max }}.$$

The proof is finished.□

Theorem 1

Denote the error vector obtained by replacing the worst surrogate with the simple average surrogate (i.e., Sur ave ) in the kth iteration by

$$ {E^{(k)}} = \left(E_{11}^{(k)},E_{22}^{(k)},...E_{NN}^{(k)}\right), $$
(15)

then

$$ \mathop {\lim }\limits_{k \to \infty } {E^{(k)}} = (d,d,\ldots ,d), $$
(16)

where, d = MSE BestSur .

Proof

Denote \(E_{\max }^{(0)} = \max \{ {E_{ii}}\} \) and \(E_{\max }^{(k)} = \max \{ E_{ii}^{(k)}\} \), where i = 1,2,...,N. Because the worst surrogate is replaced by the simple average surrogate in each iteration, Lemma 1 implies \(E_{\max }^{(0)} > E_{\max }^{(1)} > ... > E_{\max }^{(k)} > ...\). On the other hand, the best initial surrogate is never replaced, so \(E_{\max }^{(k)} \ge MS{E_{BestSur}}\). Being monotone and bounded, the sequence \(\left\{ {E_{\max }^{(k)}} \right\}_{k = 0}^\infty \) has a limit, denoted by d, i.e., \(\mathop {\lim }\limits_{k \to \infty } E_{\max }^{(k)} = d\).

Apparently, d ≥ MSE BestSur . Next, we prove that d = MSE BestSur . Indeed, if d > MSE BestSur , then according to Lemma 1 the worst surrogate can again be replaced by the simple average surrogate, and the prediction MSE of the worst surrogate in the next iteration would be less than d, which contradicts \(\mathop {\lim }\limits_{k \to \infty } E_{\max }^{(k)} = d\). Therefore, d = MSE BestSur , i.e., \(\mathop {\lim }\limits_{k \to \infty } E_{max}^{(k)} = MS{E_{BestSur}}\).

Furthermore, denote \(E_{\min }^{(0)} = \min \{ {E_{ii}}\} \) and \(E_{\min }^{(k)} = \min \{ E_{ii}^{(k)}\} \); it is easy to see that \(E_{\min }^{(0)} = E_{\min }^{(1)} = ...= E_{\min }^{(k)} = ...=MSE_{BestSur}\). So, \(\mathop {\lim }\limits_{k \to \infty } {E^{(k)}} = (d,d,\ldots ,d)\), where d = MSE BestSur . The proof is finished.□

The technique proposed in this paper has several differences from the existing ensemble techniques:

  1. (1)

    Because cross-validation often tends to overestimate errors, the real gain in accuracy of ensemble techniques based on cross-validation is limited, as illustrated in Viana et al. (2009). For the third class of ensemble techniques, based on minimizing RMSE, however, if validation points are easy to acquire, more of them can be used to construct the ensemble. Generally, the more validation points are used to determine the weights, the better the prediction accuracy achieved by the ensemble; if the validation set is large, the prediction MSE of the ensemble approaches that of BestRMSE (Viana et al. 2009). The technique proposed here also needs validation points to obtain the weights, and with the recursive scheme it achieves desirable results. In a word, the proposed technique is based on minimizing RMSE and, because it adopts a recursive process, has good prediction capability, which distinguishes it from the techniques based on minimizing cross-validation errors (in particular GMSE, or PRESS).

  2. (2)

    As for OWS ideal (Bishop 1995; Viana et al. 2009), obtaining the weights via Lagrange multipliers ensures neither that the weights are at most one nor that they are non-negative, which is difficult to interpret physically in many circumstances. Similarly, the approach based on minimizing RMSEv (Acar and Rais-Rohani 2009) does not add the condition w_i ≥ 0 to formula (4). If w_i ≥ 0 is added to formula (4), an analytical expression like (9) cannot be obtained, and many iterations of a simplex method or another formal optimization algorithm are needed; when the dimension of the problem is large, this optimization process is also time-consuming. A simple and straightforward approach is therefore desirable. The arithmetic average ensemble proposed in this paper guarantees weights that are non-negative and not larger than one, which makes it convenient to interpret the importance of each candidate single surrogate.

  3. (3)

    As mentioned in (2), the optimization process is time-consuming, especially for problems with many dimensions. In contrast, the recursive process is time-saving compared with an optimization process: the number of iterations is affected by tol and is usually on the order of a dozen or a few dozen, so it executes more quickly. The experimental results presented at the end of Section 4 confirm this.

3 Experiments

3.1 Benchmark problems

In order to test the technique proposed in this paper, we choose the following analytic functions, which are commonly used as benchmark problems in the literature.

Branin–Hoo:

$$ \begin{array}{rll} y(x_1 ,x_2 )&=&\left(x_2 -\frac{5.1x_1 ^2}{4\pi ^2}+\frac{5x_1 }{\pi}-6\right)^2\\ &&+\,10\left(1-\frac{1}{8\pi }\right)\cos \big(x_1 \big)+10 \end{array} $$
(17)

where x 1 ∈ [ − 5, 10], x 2 ∈ [0, 15].

CamelBack:

$$ \begin{array}{rll} y\big(x_1 ,x_2 \big)&=&\left(4-2.1x_1^2 +\frac{x_1^4 }{3}\right)x_1^2 +x_1 x_2 \\ &&+\left(-4+4x_2^2\right)x_2^2 \end{array} $$
(18)

where x 1 ∈ [ − 3, 3], x 2 ∈ [ − 2, 2].

Goldstein–Price:

$$ \begin{array}{rll} y\big(x_1 ,x_2 \big)&=&\Bigl[1+\big(x_1 +x_2 +1\big)^2\\ &&\times \Bigl(19-14x_1 +3x_1^2 -14x_2 +6x_1 x_2 +3x_2^2 \Bigr)\Bigr]\\ &&\times \Bigl[30+\big(2x_1 -3x_2 \big)^2\\ &&\quad\times \Bigl(18-32x_1 +12x_1^2 +48x_2 -36x_1 x_2 +27x_2^2 \Bigr)\Bigr] \end{array} $$
(19)

where x 1 , x 2 ∈ [ − 2, 2].

Hartman:

$$ y({{\bf x}})=-\sum\limits_{i=1}^m {c_i} \exp \left[-\sum\limits_{j=1}^n{a_{ij} \big(x_j -p_{ij} \big)^2} \right] $$
(20)

where x i  ∈ [0, 1].

Both the three-variable (n = 3) and the six-variable (n = 6) forms of this function are considered. The values of the function parameters c i , p ij , a ij for the Hartman-3 and Hartman-6 models, given in Tables 1 and 2, are taken from Goel et al. (2007) and Acar and Rais-Rohani (2009). For the chosen examples, m = 4.

Table 1 Parameters used in Hartman function with three variables
Table 2 Parameters used in Hartman function with six variables
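
For reference, the two-dimensional benchmarks (17)–(19) can be reproduced with a few lines of Python (the Hartman functions additionally require the parameters in Tables 1 and 2); this is a convenience sketch, not code from the cited references:

```python
import numpy as np

def branin_hoo(x1, x2):            # (17), x1 in [-5, 10], x2 in [0, 15]
    return (x2 - 5.1 * x1**2 / (4 * np.pi**2) + 5 * x1 / np.pi - 6) ** 2 \
        + 10 * (1 - 1 / (8 * np.pi)) * np.cos(x1) + 10

def camelback(x1, x2):             # (18), x1 in [-3, 3], x2 in [-2, 2]
    return (4 - 2.1 * x1**2 + x1**4 / 3) * x1**2 + x1 * x2 + (-4 + 4 * x2**2) * x2**2

def goldstein_price(x1, x2):       # (19), x1, x2 in [-2, 2]
    a = 1 + (x1 + x2 + 1) ** 2 * (19 - 14*x1 + 3*x1**2 - 14*x2 + 6*x1*x2 + 3*x2**2)
    b = 30 + (2*x1 - 3*x2) ** 2 * (18 - 32*x1 + 12*x1**2 + 48*x2 - 36*x1*x2 + 27*x2**2)
    return a * b

print(branin_hoo(-np.pi, 12.275))  # one of the Branin-Hoo minima, approx. 0.398
```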

3.2 Abalone problems

In the prediction of the life-span of abalone, every sample includes the following eight indicators: sex, length, diameter, height, whole weight, shucked weight (the weight of the meat), viscera weight, and shell weight. The life-span is predicted from these indicators. We choose 200 samples for this experiment from http://archive.ics.uci.edu/ml/datasets/Abalone.

3.3 Design and analysis of computer experiments

For the five test functions presented in formulas (17)–(20) and the Abalone problem, the samples are generated by Latin hypercube sampling (LHS); this variant is sometimes called symmetric LHS to distinguish it from the Latin hypercube (LH) that keeps the mid-point principle. Such sampling has better space-filling properties than Monte Carlo sampling (also called simple random sampling). In this paper we adopt the maximin-distance principle, i.e., among n repeated samplings (n = 20 for the benchmark problems and n = 80 for the Abalone problem) we keep the design that attains \(\max \{\mathop {\min }\limits_{i\ne j} d(x_i ,x_j )\}\), where d is some distance measure.
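
A sketch of this sampling scheme, assuming a plain random Latin hypercube generator in the unit cube and the Euclidean distance as d(·,·); among n_repeats candidate designs, the one with the largest minimum pairwise distance is kept. Function names are ours.

```python
import numpy as np
from scipy.spatial.distance import pdist

def latin_hypercube(n_points, dim, rng):
    """One random Latin hypercube design in the unit cube [0, 1]^dim."""
    u = (np.arange(n_points)[:, None] + rng.random((n_points, dim))) / n_points
    for j in range(dim):                       # permute the strata in each dimension
        u[:, j] = rng.permutation(u[:, j])
    return u

def maximin_lhs(n_points, dim, n_repeats=20, seed=0):
    """Keep the design maximizing the minimum pairwise (Euclidean) distance."""
    rng = np.random.default_rng(seed)
    best, best_dist = None, -np.inf
    for _ in range(n_repeats):
        X = latin_hypercube(n_points, dim, rng)
        d_min = pdist(X).min()                 # smallest pairwise distance
        if d_min > best_dist:
            best, best_dist = X, d_min
    return best

X = maximin_lhs(12, 2)                         # e.g., 12 training points in two dimensions
print(X.shape)
```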

In order to reduce the influence of random factors, we randomly generate 1,000 training sets for the three test functions in formulas (17)–(19) and for Hartman-3. Considering the computational cost, we generate 200 training sets for Hartman-6 and 500 for Abalone. Depending on the number of input variables and the computational cost, the training set of each benchmark problem contains 12–60 design points, the same as in Acar and Rais-Rohani (2009). The ensembles that rely on minimizing RMSE (or prediction MSE) need additional validation points. Depending on the precision sought in estimating the error, the number of validation points, denoted by V, varies with the problem; V = 0.8N (where N is the number of training points) is used in the approaches based on minimizing RMSE (including the technique proposed in this paper). All of the surrogates, both stand-alone surrogates and ensembles, are therefore constructed multiple times, with the error estimates being averaged over the replications. Additional information about the training and test data sets is provided in Table 3.

Table 3 Summary of training and test data used in each benchmark problem

The accuracies of each stand-alone and ensemble model for the benchmark problems are measured using correlation coefficient (denoted by R), root mean square error (RMSE), average absolute error (AAE), and max absolute error (MAE). Their definitions are expressed as:

Root mean square error:

$$ RMSE=\sqrt {\dfrac{\sum\nolimits_{i=1}^{n_{error} } {\big( {y_i -\widehat{y}_i } \big)^2} }{n_{error} }} $$

Average absolute error:

$$ AAE=\dfrac{\sum\nolimits_{i=1}^{n_{error} } {\big| {y_i -\widehat{y}_i } \big|} }{n_{error} } $$

Max absolute error:

$$ MAE=\mathop{\max}\limits_{i=1,\ldots ,n_{error}} \big| {y_i -\widehat{y}_i } \big| $$

Correlation coefficient:

$$ \begin{array}{rll} R\big(y,\widehat{y}\,\big)&=&\dfrac{\dfrac{1}{V}\int_V {\big(y-\overline y \,\big)\big(\widehat{y}-\overline {\widehat{y}}\, \big)dv} }{\delta (y)\,\delta \big(\widehat{y}\,\big)},\qquad \dfrac{1}{V}\int_V {y\,\widehat{y}\,dv} =\dfrac{\sum\nolimits_{i=1}^{n_{error} } {y_i \widehat{y}_i } }{n_{error} },\\[12pt] \overline y &=&\dfrac{\sum\nolimits_{i=1}^{n_{error} } {y_i } }{n_{error} },\qquad \delta (y)=\sqrt {\dfrac{\sum\nolimits_{i=1}^{n_{error} } {\big(y_i -\overline y \,\big)^2} }{n_{error} }} \end{array} $$

In the four definitions above, n error is the number of samples in the test set, y i is the actual response at the ith test point, \(\overline y \) is the average of the actual responses, \(\widehat{y}_i\) is the metamodel response at the ith test point, and \(\overline {\widehat{y}} \) is the average of the metamodel responses.

Because the experiments are repeated 1,000 (200 or 500) times, the mean and the coefficient of variation (CV) of R, RMSE, AAE, and MAE are used to evaluate the prediction accuracy of each stand-alone metamodel and ensemble model. The CV is defined as:

$$ CV=\delta / \mu $$

where δ is the standard deviation of the samples and μ is their mean.
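
The error metrics above have a straightforward implementation; in the sketch below, y and y_hat are the actual and predicted responses at the n_error test points (note that MAE here is the maximum, not mean, absolute error):

```python
import numpy as np

def error_metrics(y, y_hat):
    """R, RMSE, AAE and MAE as defined above (MAE = maximum absolute error)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    aae = np.mean(np.abs(y - y_hat))
    mae = np.max(np.abs(y - y_hat))
    r = np.mean((y - y.mean()) * (y_hat - y_hat.mean())) / (y.std() * y_hat.std())
    return {"R": r, "RMSE": rmse, "AAE": aae, "MAE": mae}

def coefficient_of_variation(samples):
    """CV = standard deviation / mean, used to summarize the repeated DOEs."""
    samples = np.asarray(samples, float)
    return samples.std() / samples.mean()
```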

3.4 Ensemble techniques

Four types of surrogate are considered in this paper: PRS, KRG, SVR, and RBF. These surrogates are used as the four members of the ensembles developed with the previously described techniques. All of the model parameters are identified using cross-validation (leave-one-out (LOO) in this paper) such that they minimize the MSE: the highest order (denoted by d) in PRS, the parameter c in the multiquadric RBF, the parameter θ in the Gaussian correlation function of Kriging, and the parameters (C, ε, σ) in SVR. The LOO cross-validation results are presented in Table 4. The mathematical descriptions of the four metamodels are provided in Appendix A.
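
As an illustration of how leave-one-out cross-validation can select such a parameter, the sketch below tunes the multiquadric shape parameter c of an RBF interpolant over a small grid by minimizing the LOO mean square error; it is a schematic stand-in for the actual tuning of all four surrogates, and the multiquadric form and grid values are assumptions.

```python
import numpy as np

def rbf_fit_predict(X_tr, y_tr, X_te, c):
    """Multiquadric RBF interpolation with basis phi(r) = sqrt(r^2 + c^2)."""
    phi = lambda A, B: np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1) + c**2)
    coeff = np.linalg.solve(phi(X_tr, X_tr), y_tr)
    return phi(X_te, X_tr) @ coeff

def loo_mse(X, y, c):
    """Leave-one-out mean square error of the RBF model for a given c."""
    errs = []
    for k in range(len(y)):
        mask = np.arange(len(y)) != k
        pred = rbf_fit_predict(X[mask], y[mask], X[k:k + 1], c)
        errs.append((pred[0] - y[k]) ** 2)
    return np.mean(errs)

# Hypothetical tuning over a small grid of candidate c values:
rng = np.random.default_rng(3)
X = rng.random((20, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2
best_c = min([0.1, 0.5, 1.0, 2.0], key=lambda c: loo_mse(X, y, c))
print("selected c =", best_c)
```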

Table 4 Summary of LOO cross-validation results for the parameters in all of the surrogates

4 Results and analysis of experiments

Part of the notation used to label the ensemble techniques is inherited from Acar and Rais-Rohani (2009). The model based on the simple average is denoted by EA; the one based on the heuristic method of Goel et al. (2007) is labeled EG; the one based on the prediction variance of Zerpa et al. (2005) is denoted by EV; the one based on minimizing PRESS (GMSE) in Acar and Rais-Rohani (2009) is labeled EP; the one based on minimizing RMSEv in Acar and Rais-Rohani (2009) is labeled EM; OWS diag in Viana et al. (2009) is denoted by Od; BestPRESS is denoted by BP; and the technique proposed in this paper is denoted by ER. The results for the different benchmark problems are shown with boxplots (a description of the boxplot is provided in Appendix B), and the means and CVs of the error metrics are presented in several tables. Additionally, to facilitate comparison of the ensembles and the single surrogates, the frequencies of their ranks in terms of R, RMSE, AAE, and MAE are also presented in several additional tables.

4.1 Correlation coefficient

The correlation coefficients for the different test functions are shown in Fig. 1, from which we can see: (1) No single metamodel works best for all test functions, and the correlation coefficient of each stand-alone metamodel varies significantly with the DOE; all eight ensemble models work better than the worst stand-alone metamodel, and their correlation coefficients vary only slightly with the DOE; (2) In almost all of the test problems, although EM and ER have similar medians and perform better than the other ensemble models, EM has a longer tail, which indicates that EM is less robust than ER; (3) EP has the worst performance among the ensembles for A, B, and C; (4) BP has the second worst performance in A and B, and the worst performance in D, which reveals that BP cannot capture the real error perfectly, i.e., BP cannot identify the best single surrogate from the cross-validation errors in most of the replications; and (5) Finally, it is worth noting that, in all the test problems, EG and Od give similar results.

Fig. 1

Correlations between actual and predicted response for different surrogate models. a Branin–Hoo, b Camelback, c Goldstein–Price, d Hartman-3, e Hartman-6, f Abalone

Table 5 shows the mean and the coefficient of variation for the different test functions to assess the performance of the different metamodels. The average correlation coefficient of ER is the best for all test functions except Branin–Hoo and Hartman-6, for which EM performs best. In addition, it is interesting that, in the low-dimensional problems Branin–Hoo, Camelback, and Goldstein–Price, EG and Od give exactly the same results, and in the higher-dimensional problems Hartman-3, Hartman-6, and Abalone, their results are not identical but remain similar. Combining Table 6 with Table 5, we find that ER ranks first four times and third twice, i.e., ER gives good results in all six test problems, which indicates a robust prediction capability. On the other hand, the performances of the other ensembles and all the individual surrogates vary markedly across test problems. Even the second-best ensemble, EM, performs well in only two test problems; in the remaining problems it ranks fourth once and fifth three times. The third-best model is RBF. Though RBF is inferior to ER and EM, it is still the best of the individual surrogates and appears reasonably robust.

Table 5 Mean and CV (in parenthesis) of correlation coefficient between actual and predicted response (based on 1,000/200/500 DOEs) for different metamodels, the highest value in each category is shown in bold for ease of comparison
Table 6 Frequency of the rank of the ensemble surrogates and the individual surrogates in the ensembles for all the benchmark problems and Abalone problem (the total number of problems is six), and the error metric is correlation coefficient

4.2 RMSE

Next, we compare the different metamodels based on the RMSE of the predictions at the test points. As shown in Fig. 2: (1) RBF performs best among the stand-alone metamodels in problems B, C, and F, and its prediction accuracy is on par with the best ensemble model; in addition, PRS is either the best or the second best stand-alone metamodel for all the test problems; (2) Generally, all eight ensemble models are better than the worst stand-alone metamodel, and the RMSE of the ensemble models does not vary significantly with the DOE, which suggests that the ensemble models are more robust; (3) On the whole, the stand-alone models have worse prediction accuracy than the ensemble models, which indicates the necessity of adopting ensemble techniques; and (4) The ER technique proposed in this paper performs better than the other ensemble models in terms of RMSE.

Fig. 2
figure 2

RMSE for different surrogate models. a Branin–Hoo, b Camelback, c Goldstein–Price, d Hartman-3, e Hartman-6, f Abalone

Table 7 shows that the average RMSE of ER is the best for all test functions except Branin–Hoo and Hartman-6. Although the average RMSE of ER for Branin–Hoo is slightly larger than that of EM, ER has a lower CV, which indicates that ER is more robust than EM on Branin–Hoo.

Table 7 Mean and CV of RMSE for different metamodels, the lowest value in each category is shown in bold for ease of comparison

Table 8 complements Table 7 and shows the frequencies of the ranks of all the ensembles and of the individual surrogates in the ensembles. The result is similar to that in Table 6. Over all of the benchmark problems and the Abalone problem, ER ranks first four times, second once, and third once. Apparently, ER is the best of all the ensembles and individual models in terms of RMSE. The second-best model is EM, and the third-best model is the single surrogate RBF.

Table 8 Frequency of the rank of the ensemble surrogates and the individual surrogates in the ensembles for all the benchmark problems and Abalone problem (the total number of problems is six), and the error metric is RMSE

4.3 AAE

Figure 3 shows the AAE of the different metamodels on the different test functions. It shows the following: (1) For problem A, PRS has a higher AAE than the remaining individual metamodels; three individual metamodels have similar AAEs, which may be the reason why the eight ensemble models also have similar AAEs. (2) For test problems D, E, and F, the ensemble models have significantly lower AAEs than the worst individual surrogate. (3) For problem F, PRS gives the worst result, which suggests that PRS is not suitable for this kind of problem; because of PRS's poor performance, EA gives a similarly poor result. (4) Similar to PRS in F, RBF is not suited to D, but in A, B, C and F, RBF gives good results, which indicates that the performance of a surrogate is problem-dependent. Additionally, Table 9 shows that RBF performs best in Camelback and Abalone, EV performs best in Goldstein–Price and Hartman-6, and ER performs best only in Branin–Hoo. Table 10 shows that RBF and EV have the highest frequency of first place, and EM and ER the second highest; however, ER ranks second the most often (four times) among all the ensembles and individual surrogates. Combining the frequencies of first, second, and third place, and considering robustness, we consider ER the most robust model, followed by EV.

Fig. 3

AAE for different surrogate models. a Branin–Hoo, b Camelback, c Goldstein–Price, d Hartman-3, e Hartman-6, f Abalone

Table 9 Mean and CV of AAE for different metamodels, the lowest value in each category is shown in bold for ease of comparison
Table 10 Frequency of the rank of the ensemble surrogates and the individual surrogates in the ensembles for all the benchmark problems and Abalone problem (the total number of problems is six), and the error metric is AAE

4.4 MAE

Next, the MAEs of the different metamodels on the different test functions are compared. Figure 4 shows that, for A and C, all of the models, including the ensembles and the single surrogates, have similar MAEs, whereas in the other problems the differences in MAE are apparent; for B, EP has the worst performance among the ensembles and shows a long tail in the figure, which means it has a larger deviation; for D and F, the worst models are RBF and PRS, respectively.

Fig. 4

MAE for different surrogate models. a Branin–Hoo, b Camelback, c Goldstein–Price, d Hartman-3, e Hartman-6, f Abalone

Numerical quantification of the results is given in Table 11, where we observe that ER is not the best model in all of the problems: EM performs best in three test problems, and three single surrogates each perform best in one problem. With the help of Table 12, we also find that the best model may be EM. Combining the frequencies of first, second, and third place, however, ER is still more robust than the single surrogates such as RBF, SVR, and PRS.

Table 11 Mean and CV of MAE for different metamodels, the lowest value in each category is shown in bold for ease of comparison
Table 12 Frequency of the rank of the ensemble surrogates and the individual surrogates in the ensembles for all the benchmark problems and Abalone problem (the total number of problems is six), and the error metric is MAE

4.5 The effect of the number of the validation points

All of the results above are obtained with V = 0.8N in ER, EM, Od, and EV. In order to examine the effect of the number of validation points V on the prediction results of the ensemble surrogates, V = 0.3N and V = 0.5N are also considered in the following experiments; to limit the length of this article, we take only Camelback as an example. Unlike Tables 5–12, where the R (or RMSE, AAE, MAE) values of all the test problems are gathered in a single table, here the R, RMSE, AAE, and MAE for Camelback are gathered in a single table, so the total count is four (the number of error metrics) rather than six (the number of test problems). Tables 13, 14 and 15 present the results for V = 0.3N, V = 0.5N, and V = 0.8N respectively. From the three tables, we obtain the following findings: (1) The prediction accuracies of the ensemble models that are based on validation points (ER, EM, Od, EV) improve as the number of validation points increases; (2) Nevertheless, their rates of improvement differ; moving from V = 0.3N to V = 0.8N, ER improves markedly, with its frequency of first place increasing from zero to two, whereas the improvement in EM is less apparent; and (3) When V = 0.3N, RBF has the best performance; therefore, when validation points are not easy to obtain, choosing a single surrogate may be a reasonable strategy, although in practice we have no prior knowledge of which single surrogate is best.

Table 13 Frequency of the rank of the ensemble surrogates and the individual surrogates in the ensembles for Camelback in terms of the error metrics: R, RMSE, AAE, and MAE; the validation point V = 0.3N
Table 14 Frequency of the rank of the ensemble surrogates and the individual surrogates in the ensembles for Camelback in terms of the error metrics: R, RMSE, AAE, and MAE; the validation point V = 0.5N
Table 15 Frequency of the rank of the ensemble surrogates and the individual surrogates in the ensembles for Camelback in terms of the error metrics: R, RMSE, AAE, and MAE; the validation point V = 0.8N

Additionally, we should point out that the performance of BP (BestPRESS) is not ideal according to the results presented in Tables 5–15, which may suggest that (1) it is difficult for cross-validation to capture the real errors, so the best single surrogate cannot be picked out according to the cross-validation errors; and (2) even if cross-validation could perfectly estimate the real error, the prediction accuracy of BP would only be similar to that of the best single surrogate (after all, BestPRESS simply assigns a unit weight to the surrogate with the smallest PRESS and zero to all the others), and according to the experimental results a single surrogate may be worse than the ensemble models. Finally, we compare the efficiency of EM and ER, because both are based on minimizing RMSE (or prediction MSE). The time consumption is presented in Table 16. In this experiment, we choose a low-dimensional problem, a medium-dimensional problem, and a higher-dimensional problem as test problems. From the table, we can see that for the low-dimensional problem BH (two dimensions), the time cost of EP is nearly twice that of ER; for the medium-dimensional problem Hartman-3 (three dimensions), the time cost of EP is 14.19 times that of ER; and for the higher-dimensional problem Hartman-6 (six dimensions), the time cost of EP is 7.935 times that of ER. These results reveal that when the dimension of the problem is large (especially when dozens of variables appear in real-life problems), choosing the recursive arithmetic average ensemble technique rather than ensemble techniques based on an optimization process may be a reasonable strategy. The results support the viewpoint presented in the last paragraph of Section 2.3.2.

Table 16 Comparison of the time cost between the processes of obtaining the weights in ER and EP (1,000 times replications in BH, and 200 times replications in Hartman-6; run time in EP is denoted by EP tim , and run time in ER is denoted by ER tim ), the run times are the mean values

5 Conclusion

In this paper, we examined several existing combining techniques, proposed a recursive arithmetic average ensemble technique, and discussed the experimental results.

  1. 1.

    After examining the existing combining techniques, we find that (1) OWS ideal is essentially the same as EM; and (2) OWS is the same as EP. The difference between them is only the expression used to obtain the weights.

  2. 2.

    After examining the results for the five test functions and the Abalone problem, we can see clearly that the ensemble technique proposed in this paper has significantly better prediction accuracy than the stand-alone metamodels in most problems, and for almost all of the problems presented in this paper it even surpasses the previously reported ensemble techniques.

  3. 3.

    Because cross validation is adopted to choose the best parameters of the stand-alone metamodels, all of the models, both individual and ensemble, have significantly improved prediction accuracy.

  4. 4.

    EG and Od give similar results in terms of R, RMSE, AAE, and MAE in all of the test problems, especially in the low-dimensional problems. The cause is that EG and Od have similar structures, as pointed out in Section 1.

  5. 5.

    In this paper, we limit our conclusions to low-dimensional problems (fewer than seven dimensions); high-dimensional problems are left for future research.

Although the technique proposed in this paper achieves desirable results, the advantages of combination over selection are still difficult to clarify (Yang 2003). That is, despite our efforts, we are still operating in the "insurance policy" mode rather than offering substantial improvements. In addition, finding more efficient methods to improve the prediction accuracy of the ensemble model is also left for future work.