1 Introduction

Computer simulations are widely used in the structural optimization of mechanical systems. However, for the design of complicated structures, simulation techniques such as finite element analysis may require prohibitive computing time, which hinders the successful optimization of system performance. As a cheap alternative to computationally expensive simulations, surrogate modeling addresses this problem well and has developed rapidly over the last few decades.

Surrogate models such as Polynomial Response Surface (PRS; Box and Draper 1987), Kriging (KRG; Sacks et al. 1989), Radial Basis Functions (RBF; Hardy 1971) and Support Vector Regression (SVR; Smola and Schölkopf 2004) are widely used in engineering design practice. Reviews of various surrogates can be found in Queipo et al. (2005), Forrester and Keane (2009) and the references therein. However, there is no clear consensus on which surrogate is most suitable for an unknown problem. More recently, inspired by Bishop's work (1995) on neural networks, a promising modeling technique known as the ensemble of surrogate models was developed (Zerpa et al. 2005; Goel et al. 2007). The ensemble method combines different surrogate models in a weighted form, and the weight factor of each surrogate model is determined by the model's accuracy. It is reasonable to view the ensemble approach as an alternative to model selection in statistics, where much research exists, including model selection methods based on Akaike's Information Criterion (AIC), the Bayes Information Criterion (BIC), cross-validation, and structural risk minimization (Madigan and Raftery 1994; Kass and Raftery 1995; Buckland et al. 1997; Cherkassky et al. 1999; Hoeting et al. 1999). Existing ensemble modeling methods can be classified into global measures and local measures. For simplicity, weight factors evaluated from global measures and local measures are called global weight factors and local weight factors respectively in this article.

Global measures evaluate the weight factors over the entire design space; for each surrogate model, the weight factor is constant at every point. Goel et al. (2007) proposed a heuristic algorithm in which the weight factor is calculated from the generalized mean square cross-validation error (GMSE). Acar and Rais-Rohani (2009) treated the weight factors as design variables in an optimization problem, with GMSE and root mean square error (RMSE) selected as the objective function respectively, and solved the problem with a numerical optimization procedure. Viana et al. (2009) also obtained the optimum weight factors by minimizing RMSE, but solved the optimization problem analytically using Lagrange multipliers. Zhou et al. (2011) introduced a recursive algorithm in which the final averaged ensemble model is obtained by iterative modeling.

Compared with global measures, local measures evaluate the weight factors point by point, so the weight factors of each surrogate model differ from point to point. Sanchez et al. (2008) used the prediction variance at the k-nearest sampling points around the prediction point to evaluate the weight factors. Based on the pointwise weight factors at the sample points and the distances between the sample points and the prediction point, Acar (2010) proposed a spatial ensemble model.

Considering that global measures and local measures both have their pros and cons, an ensemble method (ES-HGL) that hybridizes a global measure and a local measure is proposed in this article. In this method, the design space is divided into two regions: the region far from the sample points (the outer region) and the region near the sample points (the inner region). Two strategies are then introduced to evaluate the weight factors in the two regions: (1) in the outer region, a new weight factor named the Hybrid Weight Factor is introduced; (2) in the inner region, the Hybrid Weight Factor and the local weight factor are combined based on the location of the prediction point.

The remainder of this article is organized as follows: some representative ensemble methods are briefly reviewed in Section 2; the development of the proposed ES-HGL model is described in Section 3; several numerical and engineering examples are tested in Section 4; and conclusions are presented in Section 5.

2 Background of ensemble methods

The common way of using surrogate modeling methods includes the following steps: construct several candidate surrogate models, select the most accurate one based on some criteria and discard the rest. However, this scenario has two major shortcomings. First, the resources spent constructing the discarded, so-called "inaccurate" models are wasted. Second, the performance of a surrogate model depends on the sample points, so a model that is accurate on one data set may be inaccurate on another. Ensemble methods were proposed to overcome these shortcomings.

An ensemble model is a weighted combination of several individual surrogate models. The basic form of an ensemble model is defined as

$$ {\displaystyle \begin{array}{c}{\widehat{f}}^{ens}=\sum \limits_{i=1}^{N_s}{w}_i{\widehat{f}}_i\\ {}\sum \limits_{i=1}^{N_s}{w}_i=1\end{array}} $$
(1)

where \( {\widehat{f}}^{ens} \) is the response value of the ensemble model, \( N_s \) is the number of surrogate models used and \( w_i \) is the weight factor of the i th surrogate model \( {\widehat{f}}_i \). Apparently, the more accurate a surrogate model is relative to the others, the larger its proportion in the ensemble model, and vice versa. In this section, some representative global measures and local measures for ensemble modeling are briefly introduced.
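As a minimal sketch, the weighted combination in (1) can be expressed in Python (NumPy assumed; the function name `ensemble_predict` is illustrative, not from the original work):

```python
import numpy as np

def ensemble_predict(predictions, weights):
    """Weighted combination of individual surrogate predictions, eq. (1).

    predictions: (N_s,) responses of the N_s surrogates at one prediction point
    weights:     (N_s,) weight factors, assumed to sum to one
    """
    predictions = np.asarray(predictions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.sum(), 1.0), "weight factors must sum to one"
    return float(np.dot(weights, predictions))
```

Given surrogate responses (1.0, 2.0, 4.0) and weights (0.5, 0.25, 0.25), the ensemble response is 0.5 + 0.5 + 1.0 = 2.0.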

2.1 Global measures

Goel et al. (2007) proposed an ensemble model which is based on GMSE. The weight factors are evaluated from a heuristic algorithm

$$ {\displaystyle \begin{array}{c}{w}_i=\frac{w_i^{\ast }}{\sum \limits_{j=1}^{N_s}{w}_j^{\ast }},{w}_i^{\ast }={\left({E}_i+\alpha \overline{E}\right)}^{\beta}\\ {}\overline{E}=\frac{1}{N_s}\sum \limits_{i=1}^{N_s}{E}_i,{E}_i=\sqrt{\frac{1}{n}\sum \limits_{k=1}^n{\left({y}_k-{\widehat{y}}_{ik}\right)}^2}\end{array}} $$
(2)

where \( y_k \) is the actual response at the k th sampling point, \( {\widehat{y}}_{ik} \) is the i th surrogate model's corresponding prediction obtained by cross-validation and n is the number of sample points. The two parameters α (α < 1) and β (β < 0) are determined by the relationship between \( E_i \) and \( \overline{E} \); Goel et al. (2007) suggested α = 0.05 and β = −1 in their study.
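The heuristic weight calculation in (2) might be sketched as follows (the helper name `goel_weights` is illustrative; `e` holds the error measures \( E_i \) of the individual surrogates):

```python
import numpy as np

def goel_weights(e, alpha=0.05, beta=-1.0):
    """Heuristic weight factors of Goel et al. (2007), eq. (2).

    e: (N_s,) error measure E_i of each surrogate (GMSE-based).
    Since beta < 0, a smaller error yields a larger weight.
    """
    e = np.asarray(e, dtype=float)
    e_bar = e.mean()                          # averaged error, eq. (2)
    w_star = (e + alpha * e_bar) ** beta      # unnormalized weights
    return w_star / w_star.sum()              # normalize so weights sum to one
```

For instance, with errors (1.0, 2.0) the first (more accurate) surrogate receives the larger weight.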

Acar and Rais-Rohani (2009) tried both GMSE and RMSE as the global error metrics. The weight factors of different surrogate models in the ensemble model are determined by solving the following optimization problem

$$ {\displaystyle \begin{array}{c} Find\kern0.5em {w}_i\\ {}\min \kern0.5em \mathrm{GMSE}\ \mathrm{or}\ \mathrm{RMSE}\\ {}s.t.\kern0.5em \sum \limits_{i=1}^{N_s}{w}_i=1\end{array}} $$
(3)

2.2 Local measures

Sanchez et al. (2008) used prediction variance as the local error metric to construct the ensemble model. The method is based on the prediction variance at the k nearest sample points, and the weight factors are evaluated from

$$ {\displaystyle \begin{array}{l}{w}_i=\frac{\frac{1}{V_i^{near}}}{\sum \limits_{j=1}^{N_s}\frac{1}{V_j^{near}}}\\ {}{V}_i^{near}=\frac{1}{k-1}\sum \limits_{j=1}^k{\left({y}_j-{\widehat{y}}_{ij}\right)}^2\end{array}} $$
(4)

where \( {V}_i^{near} \) is the prediction variance of the i th surrogate model. Here, Sanchez et al. (2008) suggested that k = 3 is a reasonable choice.
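A sketch of the k-nearest-variance weighting in (4), assuming the true responses and each surrogate's predictions at the sample points are available (the function name and argument layout are illustrative):

```python
import numpy as np

def knearest_weights(y_true, y_pred, x_samples, x_new, k=3):
    """Pointwise weights from k-nearest prediction variance, eq. (4).

    y_true:    (n,) actual responses at the sample points
    y_pred:    (N_s, n) predictions of each surrogate at the sample points
    x_samples: (n, d) sample locations; x_new: (d,) prediction point
    """
    d = np.linalg.norm(np.asarray(x_samples, dtype=float) - np.asarray(x_new, dtype=float), axis=1)
    near = np.argsort(d)[:k]                              # k nearest sample points
    resid = np.asarray(y_true, dtype=float)[near] - np.asarray(y_pred, dtype=float)[:, near]
    v = (resid ** 2).sum(axis=1) / (k - 1)                # V_i^near per surrogate
    inv = 1.0 / v
    return inv / inv.sum()                                # inverse-variance weights
```

A surrogate whose residuals near the prediction point are small receives a weight close to one.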

Acar (2010) proposed a spatial model, in which the calculation of weight factors depends on the pointwise weight factors and the distances between sample points and the prediction point. Hence, the weight factors are evaluated from

$$ {\displaystyle \begin{array}{l}{w}_i=\frac{w_i^{\ast }}{\sum \limits_{j=1}^{N_s}{w}_j^{\ast }},{w}_i^{\ast }=\sum \limits_{k=1}^n{w}_{ik}{I}_k(x)\\ {}{I}_k(x)=\frac{1}{d_k^2(x)},{d}_k(x)=\left\Vert x-{x}_k\right\Vert \end{array}} $$
(5)

where \( w_{ik} \) is the pointwise weight factor of the i th surrogate model at the k th sample point: \( w_{ik} \) equals one for the surrogate model with the lowest cross-validation error at the k th sample point and zero for all other surrogate models there. \( I_k(x) \) is a distance metric; in particular, \( {w}_i^{\ast }={w}_{ik} \) when \( d_k(x)=0 \). Three other approaches for determining \( w_{ik} \) and \( I_k(x) \) were also proposed by Acar (2010).
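The spatial weighting in (5) might be sketched as follows (the name `spatial_weights` is illustrative; the pointwise weights \( w_{ik} \) are assumed precomputed from the cross-validation errors):

```python
import numpy as np

def spatial_weights(w_point, x_samples, x_new, eps=1e-12):
    """Spatial ensemble weights of Acar (2010), eq. (5).

    w_point:   (N_s, n) pointwise weights w_ik at the n sample points
               (one for the locally best surrogate, zero otherwise)
    x_samples: (n, d) sample locations; x_new: (d,) prediction point
    """
    w_point = np.asarray(w_point, dtype=float)
    d = np.linalg.norm(np.asarray(x_samples, dtype=float) - np.asarray(x_new, dtype=float), axis=1)
    if d.min() < eps:                      # on a sample point: w_i* = w_ik there
        w_star = w_point[:, int(d.argmin())]
    else:
        I = 1.0 / d ** 2                   # inverse-square distance metric I_k(x)
        w_star = w_point @ I               # w_i* = sum_k w_ik * I_k(x)
    return w_star / w_star.sum()
```

The inverse-square metric makes the weight at a prediction point dominated by the nearest sample point's locally best surrogate.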

3 The proposed hybrid ensemble method

Constructing an ensemble model with only global measures or only local measures has both pros and cons. Global measures can guarantee modeling accuracy from a global perspective but ignore the diversity of the combined surrogate models. Local measures make the ensemble model more flexible but less robust, because inaccurate local error metrics may severely degrade the model accuracy. This article integrates a global measure with a local measure in ensemble modeling, attempting to make the ensemble model more robust and accurate; we therefore call this approach the ensemble of surrogates with a hybrid method using global and local measures (ES-HGL). In this method, the same error matrix, whose computation is the most time-consuming part of the modeling process, is used for both the global and local measures. This means the modeling accuracy can be enhanced while the modeling time remains nearly the same.

The flowchart for the ES-HGL model is shown in Fig. 1. The three key steps of the proposed method are: calculation of the weight factors using the global measure and the local measure based on the same error matrix; division of the design space; and construction of the ES-HGL model in the divided design space with different weight calculation strategies. These are discussed in detail in Sections 3.1 to 3.3.

Fig. 1
figure 1

Development and evaluation steps for ES-HGL model

3.1 Calculation of weight factors using global and local measures

Modeling error is an important criterion for evaluating the accuracy of a surrogate model. Commonly used measures include prediction variance and cross-validation error. Since no additional test points are needed for its calculation, the cross-validation error is used as the modeling error to construct the ES-HGL model in this article. The cross-validation error is the prediction error at each sample point when the surrogate model is constructed using the other (n − 1) points (also called the leave-one-out cross-validation error). The cross-validation error of the i th surrogate at the k th sample point is evaluated as

$$ {e}_{ik}={y}_k-{\widehat{y}}_{ik} $$
(6)
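The leave-one-out error in (6) for a generic surrogate can be sketched as below; `fit_predict` is a hypothetical callable standing in for any surrogate's fit-and-predict step:

```python
import numpy as np

def loo_error_matrix(x, y, fit_predict):
    """Leave-one-out cross-validation errors, eq. (6), for one surrogate.

    fit_predict(x_train, y_train, x_test) -> prediction at x_test;
    a hypothetical callable, not an API from the original work.
    Returns e with e[k] = y_k - yhat_k.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    e = np.empty(n)
    for k in range(n):
        mask = np.arange(n) != k               # leave sample k out
        e[k] = y[k] - fit_predict(x[mask], y[mask], x[k])
    return e
```

Stacking such vectors for each of the \( N_s \) surrogates gives the shared error matrix used by both measures.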

To avoid repeated computation and to make a fair comparison, the same error matrix is used for constructing all ensemble models. Two ensemble modeling methods are selected to construct the proposed ES-HGL: the heuristic algorithm proposed by Goel et al. (2007) as the global measure, and the spatial model presented by Acar (2010) as the local measure. Both methods use the cross-validation error as the error matrix and provide excellent modeling performance. The formulation of each method can be found in Section 2, and the measures can be described as

$$ \left\{\begin{array}{l}{w}^G\Leftarrow {f}^G\left({\boldsymbol{e}}^{CV}\right)\\ {}{w}^L\Leftarrow {f}^L\left({\boldsymbol{e}}^{CV}\right)\end{array}\right. $$
(7)

where e CV represents the error matrix computed from the cross-validation errors, f(·) represents the strategy of weight factor calculation, and the superscripts G and L denote that the weight factor is obtained using the global measure or the local measure respectively.

3.2 Division of the design space

It is crucial to choose a proper method to calculate the weight factors of the combined surrogate models. Researchers have classified the existing ensemble methods into two classes: global measures and local measures. However, there is no consensus on which measure is best for an unknown problem. So in this article, the weight factors of the individual surrogate models are calculated using different measures according to the location of the prediction point.

Because the weight factors at a prediction point are evaluated from the errors at the sample points, the calculation strategies need to differ between areas near to and far from the sample points. Among the existing methods, the two classes of measures have distinct characteristics: (1) local measures hold that the weight factors should reflect the diversity of the combined surrogate models at different locations in the design space, so the weight factors are evaluated point by point and are heavily influenced by the nearest sample point; (2) global measures regard the weight factors as representative of overall accuracy, so they are evaluated from the entire error matrix and are not greatly affected by the modeling error at any single sample point. Therefore, local measures are more suitable for regions near the sample points while global measures are more appropriate for regions far from the sample points. It is thus reasonable to define the weight factors as:

$$ {\displaystyle \begin{array}{c}{w}_i^{\ast }=\left\{\begin{array}{l}{w}_i^G\kern0.6em \mathrm{if}\kern0.2em x\in {\mathrm{R}}^o\\ {}{w}_i^L\kern0.8000001em \mathrm{if}\kern0.2em x\in {\mathrm{R}}^i\end{array}\right.\\ {}{w}_i=\frac{w_i^{\ast }}{\sum \limits_{j=1}^{N_s}{w}_j^{\ast }}\end{array}} $$
(8)

where Ro and Ri are the outer and inner regions. Since the actual landscape of interest is unknown, each inner region Ri is taken to be an n-sphere centered at the corresponding sample point. The design space can thus be divided by:

$$ x\in \left\{\begin{array}{l}{R}^i\kern0.8000001em \mathrm{if}\kern0.3em \left\Vert x-{x}_k^{nearest}\right\Vert \le {r}_k\\ {}{R}^o\kern0.7em \mathrm{if}\kern0.3em \left\Vert x-{x}_k^{nearest}\right\Vert >{r}_k\end{array}\right. $$
(9)

where Ro and Ri denote the regions far from and near the sample points respectively, ∥ ⋅ ∥ denotes the Euclidean distance between the prediction point and the closest sample point, and \( r_k \) is the radius of the k th sample point's inner region.

Once the design space is divided in this way, determining the region radius of each sample point becomes crucial. To this end, the errors of the ensemble models at the sample points are calculated first, and the region radii are then computed from those errors. Clearly, the modeling errors evaluated at the sample points are important criteria in determining the region radii. In this article, the weighted cross-validation error (WCVE for short), a weighted sum of the cross-validation errors of the combined surrogate models, is used to evaluate the modeling error of ES-HGL. As the weight factor calculation strategy changes, the WCVEs at the sample points also change. For simplicity, the WCVEs computed from the global measure and the local measure are denoted as \( WCVE_k^G \) and \( WCVE_k^L \) respectively in this article.

In ES-HGL, to evaluate the region radius of each sample point, two criteria should be followed:

(1) The region radius of the k th sample point scales with \( {r}_k^{\mathrm{max}} \), half of the distance between the current sample point and its closest sample point;

(2) The region radius depends on the ratio of \( WCVE_k^G \) to \( WCVE_k^L \) at the current sample point. That is, the region radius \( r_k(\cdot) \) is a function of the error ratio \( P_k \), and \( r_k(P_k) \) increases monotonically as \( P_k \) increases from 1 to positive infinity. At the same time, two boundary conditions should be satisfied:

$$ \underset{P_k\to 1}{\lim }r\left({P}_k\right)=0 $$
(10)
$$ \underset{P_k\to +\infty }{\lim }r\left({P}_k\right)={r}_k^{\mathrm{max}} $$
(11)

That is to say, when the global weighted cross-validation error \( WCVE_k^G \) at the current sample point is smaller than the local one (the error ratio \( P_k \) is less than one), the region radius should equal zero, because the global measure is deemed more accurate than the local measure in this case and we adopt the global measure only. When the global weighted cross-validation error at the current sample point is far larger than the local one (the error ratio \( P_k \) tends to positive infinity), the region radius should equal \( {r}_k^{\mathrm{max}} \), because the local measure is deemed far more accurate than the global measure in this case and we adopt the local measure only.

We consider the following two feasible forms of the region radius \( r_k(P_k) \) among the elementary functions:

$$ {r}_k^1\left({P}_k\right)={r}_k^{\mathrm{max}}\left(1-\frac{1}{P_k}\right) $$
(12)
$$ {r}_k^2\left({P}_k\right)={r}_k^{\mathrm{max}}\left(1-{e}^{1-{P}_k}\right) $$
(13)

After testing these two formulas, we adopt the first form \( {r}_k^1\left({P}_k\right) \) according to the test results (see Appendix C.1). Hence, the region radius in (9) is evaluated from

$$ {r}_k=\left\{\begin{array}{l}{r}_k^{\mathrm{max}}\cdot \left(1-\frac{1}{P_k}\right)\kern0.65em \mathrm{if}\kern0.3em {P}_k>1\\ {}0\kern6.30em \mathrm{else}\end{array}\right. $$
(14)

where

$$ {r}_k^{\mathrm{max}}=\frac{1}{2}\underset{i\in S,i\ne k}{\min}\parallel {x}_k-{x}_i\parallel $$
(15)
$$ {P}_k=\frac{WCVE_k^G}{WCVE_k^L} $$
(16)
$$ {WCVE_k}^G=\sum \limits_{i=1}^{N_s}\mid {w}_{ik}^G{e}_{ik}\mid, {WCVE_k}^{\mathrm{L}}=\sum \limits_{i=1}^{N_s}\mid {w}_{ik}^{\mathrm{L}}{e}_{ik}\mid $$
(17)

where S denotes the set of sample points, \( {r}_k^{\mathrm{max}} \) is half of the minimum distance between the k th sample point and its nearest sample point, and \( P_k \) denotes the ratio of the weighted cross-validation errors computed from the two measures at the k th sample point.
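Equations (14)–(17) together might be sketched as follows (illustrative names; `e` is the cross-validation error matrix and `w_g`, `w_l` are the global and local weight factors at the sample points):

```python
import numpy as np

def region_radii(x_samples, e, w_g, w_l):
    """Inner-region radius of each sample point, eqs. (14)-(17).

    x_samples: (n, d) sample locations
    e:         (N_s, n) cross-validation error matrix e_ik
    w_g, w_l:  (N_s, n) global / local weight factors at the sample points
    """
    x = np.asarray(x_samples, dtype=float)
    e = np.asarray(e, dtype=float)
    # r_k^max: half the distance to the nearest other sample point, eq. (15)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    r_max = 0.5 * dist.min(axis=1)
    # weighted cross-validation errors and their ratio, eqs. (16)-(17)
    wcve_g = np.abs(np.asarray(w_g, dtype=float) * e).sum(axis=0)
    wcve_l = np.abs(np.asarray(w_l, dtype=float) * e).sum(axis=0)
    p = wcve_g / wcve_l
    # radius shrinks to zero when the global measure is the more accurate, eq. (14)
    return np.where(p > 1.0, r_max * (1.0 - 1.0 / p), 0.0)
```

A sample point where the local measure halves the global error (P_k = 2) gets a radius of half its r_max; one where the global measure wins gets a radius of zero.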

3.3 Construction of ES-HGL model

Though using a global measure and a local measure to construct the ensemble model in different regions is a good strategy, the defects of each measure remain distinct: the global measure ignores the diversity of the combined surrogate models in different areas, and the local measure may be inaccurate when the error matrix does not represent the actual modeling error well. To balance the two measures, a new weight factor named the Hybrid Weight Factor (HWF) is introduced. The HWF should lie between the global weight and the local weight, i.e., w G ≤ w H ≤ w L or w L ≤ w H ≤ w G. We tried the following four forms of HWF:

$$ {w}_1^H=\sqrt{\frac{{\left({w}^G\right)}^2+{\left({w}^L\right)}^2}{2}} $$
(18)
$$ {w}_2^H=\sqrt{w^G\times {w}^L} $$
(19)
$$ {w}_3^H=\frac{w^G+{w}^L}{2} $$
(20)
$$ {w}_4^H=\frac{WCVE^L}{WCVE^G+{WCVE}^L}{w}^G+\frac{WCVE^G}{WCVE^G+{WCVE}^L}{w}^L $$
(21)

We adopt the first form \( {w}_1^H \) according to the test results (see Appendix C.2).
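As a minimal sketch, the adopted form (18) is simply the quadratic mean of the two weights, which always lies between them:

```python
import numpy as np

def hybrid_weight(w_g, w_l):
    """Hybrid Weight Factor w_1^H, eq. (18): the quadratic mean of the
    global and local weight factors."""
    w_g = np.asarray(w_g, dtype=float)
    w_l = np.asarray(w_l, dtype=float)
    return np.sqrt((w_g ** 2 + w_l ** 2) / 2.0)
```

For example, with w^G = 0.3 and w^L = 0.4, the HWF is sqrt(0.125) ≈ 0.354, between the two inputs as required.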

The Hybrid Weight Factor can possess the following advantages:

  1. (1)

    The Hybrid Weight Factor makes the model more robust and accurate, because modeling errors in some local areas are mitigated by the overall accuracy;

  2. (2)

    The Hybrid Weight Factor makes the model more flexible, because the diversity of the combined surrogate models is sufficiently considered across the whole design space;

  3. (3)

    The Hybrid Weight Factor is succinct and easy to construct, since it brings almost no extra computational burden to the computing process.

Because the weight factors are based on the errors evaluated at the sample points, they are more likely to be accurate in regions near the sample points. The local weight factor, which can make an ensemble model more flexible, is therefore used to amend the Hybrid Weight Factor in regions near the sample points. In addition, since the local weight factors may become less accurate as the distance between the prediction point and the nearest sample point increases, the effect of this modification should weaken with distance. Thus, in the new weight factor calculation strategy, the weight factor evaluated from (8) is modified based on the following two criteria:

  1. (1)

    in the outer region Ro, the weight factor is equal to the Hybrid Weight Factor;

  2. (2)

    in the inner region Ri, the weight factor consists of two parts: the local weight factor and the Hybrid Weight Factor. The proportion of the local weight factor decreases as the distance between the prediction point and the nearest sample point increases.

Therefore, (8) is replaced by

$$ {\displaystyle \begin{array}{c}{w}_i^{\ast }=\left\{\begin{array}{l}{w}_i^H\kern8.299995em \mathrm{if}\kern0.2em x\in {R}^o\\ {}{w}_i^L\rho +{w}_i^H\left(1-\rho \right)\kern1.90em \mathrm{if}\kern0.2em x\in {R}^i\end{array}\right.\\ {}{w}_i=\frac{w_i^{\ast }}{\sum \limits_{j=1}^{N_s}{w}_j^{\ast }}\end{array}} $$
(22)

Considering that the volume of an n-sphere (Rennie 2005) is

$$ {V}_n\left(\mathrm{r}\right)=\frac{\pi^{\frac{n}{2}}}{\varGamma \left(\frac{n}{2}+1\right)}{r}^n $$
(23)

in which the volume \( V_n \) is proportional to \( r^n \), where n denotes the dimension. Accordingly, the impact metric ρ of the local measure is calculated from

$$ \rho =1-{\left(\frac{\parallel x-{x}_k^{nearest}\parallel }{r_k}\right)}^n $$
(24)

The results of ES-HGL generated using (8) and (22) respectively are given in Appendix C.3. The comparison shows that it is reasonable to use (22) in place of (8).
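The final weighting strategy in (22)–(24) for a single prediction point might be sketched as follows (illustrative; `w_h` and `w_l` are assumed precomputed, and the radius `r_k` comes from (14)):

```python
import numpy as np

def es_hgl_weights(w_h, w_l, d_nearest, r_k, dim):
    """Final ES-HGL weight factors at one prediction point, eqs. (22)-(24).

    w_h, w_l:  (N_s,) hybrid and local weight factors
    d_nearest: distance to the nearest sample point
    r_k:       inner-region radius of that sample point
    dim:       dimension n of the design space
    """
    w_h = np.asarray(w_h, dtype=float)
    w_l = np.asarray(w_l, dtype=float)
    if r_k <= 0.0 or d_nearest > r_k:          # outer region: pure HWF
        w_star = w_h
    else:                                      # inner region: blend via rho
        rho = 1.0 - (d_nearest / r_k) ** dim   # eq. (24), volume-based decay
        w_star = w_l * rho + w_h * (1.0 - rho)
    return w_star / w_star.sum()               # normalize, eq. (22)
```

At a sample point (d = 0) the weights reduce to the local weights; outside the inner region they reduce to the Hybrid Weight Factors.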

To better understand how the design space is divided and how the various measures are applied, a 2-D problem is used for illustration in Fig. 2.

Fig. 2
figure 2

A 2-D problem for illustrating the division of the design space

In Fig. 2, the design space is normalized to [0,1]2 and the 12 sample points are denoted by "+". The circular regions centered at the 12 sample points are the inner regions defined in ES-HGL; their radii are evaluated from (14). The remainder of the design space is the outer region. Two randomly selected prediction points (denoted by "•") serve as a demonstration. Prediction point No. 1 is located in the circular region centered at sample point No. 8, so it is in the inner region, and the weights of the surrogate models are determined by (22), (23) and (24). More precisely, the prediction point x and the nearest sample point \( {x}_k^{nearest} \) in (24) are prediction point No. 1 and sample point No. 8 respectively, and the inner region radius \( r_k \) in (24) is the radius of the circular region centered at sample point No. 8. Prediction point No. 2 is not located in any circular region, that is, it lies in the outer region, so the weights of the surrogate models in ES-HGL are obtained from the first branch of (22).

4 Case studies

The approximation performance of the ES-HGL model is compared with three existing ensemble models: the heuristic algorithm EG proposed by Goel et al. (2007) as the global measure, the spatial model SP introduced by Acar (2010) as the local measure, and the optimization-based method OM of Acar and Rais-Rohani (2009), which obtains the optimum weight factors by minimizing GMSE through a numerical optimization procedure. Three typical surrogate models, PRS, RBF and KRG (their detailed construction and tuning processes can be seen in Appendix A), are used as the components of each ensemble model in this article and are also compared with the ensemble models. We use these surrogate models because they are commonly used by practitioners and represent different parametric and nonparametric approaches (Queipo et al. 2005).

Acar (2015) has studied the effect of several error metrics on ensembles of surrogate models. Inspired by his work, four kinds of error metrics are used to evaluate the performance of the different models: the root mean squared error (RMSE), which evaluates the deviation between the prediction and the true response over the entire design space; the average absolute error (AAE), which ensures that positive and negative errors do not cancel; the maximum absolute error (MAE), which shows the maximum error within the whole design domain; and coefficient of variation (COV) values, which measure the dispersion of RMSE, AAE and MAE.

$$ {\displaystyle \begin{array}{c} RMSE=\sqrt{\frac{1}{N}\sum \limits_{i=1}^N{\left({y}_i-{\widehat{y}}_i\right)}^2}\\ {} AAE=\frac{\sum \limits_{i=1}^N\mid {\mathrm{y}}_i-{\widehat{\mathrm{y}}}_{\mathrm{i}}\mid }{N}\\ {} MAE=\max \left|{y}_i-{\widehat{y}}_i\right|\\ {} COV=\frac{std}{mean}\end{array}} $$
(25)

In these four definitions above, N is the number of test points, std denotes the sample standard deviation and mean denotes the mean value.
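The four metrics in (25) can be sketched as:

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """RMSE, AAE and MAE over a test set, eq. (25)."""
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean(r ** 2))
    aae = np.mean(np.abs(r))
    mae = np.max(np.abs(r))
    return rmse, aae, mae

def cov(values):
    """Coefficient of variation of repeated metric values (std / mean),
    using the sample standard deviation as in eq. (25)."""
    v = np.asarray(values, dtype=float)
    return v.std(ddof=1) / v.mean()
```

The COV is computed over the RMSE, AAE or MAE values obtained from the repeated random sample/test sets.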

4.1 Numerical examples

Six well-known numerical examples varying from 2-D to 12-D are chosen from previous works (Dixon and Szegö 1978; Goel et al. 2007; Acar 2010) to test the performance of ES-HGL model: (1) Branin-Hoo function (2-D); (2) Camelback function (2-D); (3) and (4) are Hartman functions (3-D and 6-D); (5) Extended-Rosenbrock function (9-D); (6) Dixon-Price function (12-D). Description of these test functions can be seen in Appendix B.

To guarantee the accuracy of the constructed ensemble models and individual surrogate models, the number of sample points is set to twice the number of coefficients in a full quadratic PRS (Acar 2010). Latin Hypercube Sampling (LHS; McKay et al. 1979), which has good space-filling quality, is used to generate the sample and test sets via the MATLAB® routine "lhsdesign" with the "maximin" criterion and a maximum of 100 iterations. To account for the cost of constructing the surrogate models and to make a full comparison, we randomly generate an appropriate number of test sets to eliminate the effect of any particular distribution of sample and test points. A summary of the training and test point sets used in each problem is provided in Table 1.

Table 1 Summary of the training and test point sets used in each problem

For ease of comparison, the best results of ensemble models and individual surrogate models are shown in bold respectively (the lowest value for RMSE, AAE, MAE and COV).

From Table 2 we can see that no individual surrogate model is accurate for all test functions, while RBF is relatively better than KRG and PRS is the worst. In contrast, the ensemble models perform better than most of the individual surrogate models in most cases. ES-HGL performs best on most of the error metrics for all six numerical examples, and remains among the top three models when its results are not the best, because it combines the advantages of both global and local measures. The COV values of the error metrics (RMSE, AAE and MAE) and the error distributions in the boxplots (see Figs. 8, 9 and 10) indicate that ES-HGL is robust: it has small COV values and stable error distributions. On the other hand, the performances of the other three ensemble models and the three individual surrogate models vary noticeably across the test examples. This means that, without losing accuracy, ES-HGL can offer more reliable approximations for problems of varying complexity and dimension. In general, the comparison results show that implementing ensemble methods is worthwhile, and the proposed ES-HGL model provides more satisfactory robustness and accuracy under the four metrics for the six test functions in this article. Still, we suggest that ensemble methods be used as "insurance" rather than as a source of significant improvement.

Table 2 Mean and COV of RMSE, AAE and MAE for different surrogate models

4.2 NC machine beam design problem

A super-heavy NC vertical lathe weighs over one thousand tons and can process workpieces with a maximum diameter of 28 m and a maximum height of 13 m. The major parts of this NC machine are the beam, columns, slides, toolposts and other components shown in Fig. 3(a). Among these parts, the beam is the main moving part; it has a welded, irregular triangular structure. It conducts vertical feeding along the guide rail of the column and supports the vertical toolpost and vertical slide. Two horizontal guide rails on the upper and lower sides of the beam guide the vertical slide along the horizontal direction. Due to the great mass of the vertical slide and vertical toolpost, a large load is imposed on the guide rails of the beam. In addition, the beam is affected by its own large self-weight and the cutting force of the vertical toolpost. Under these complex machining conditions, the beam deforms easily along the negative Z-direction, which greatly affects the machining accuracy.

Fig. 3
figure 3

The structure of a super-heavy machine

The simplified structure of the beam is mainly controlled by twelve variables, shown in Fig. 4: the thicknesses of the plates (\( x_1 \), \( x_2 \) and \( x_3 \)), the total length of the beam (\( x_4 \)), the widths of the front and rear end faces (\( x_5 \) and \( x_6 \)), the sizes of the tail (\( x_7 \) and \( x_8 \)), the thickness of the rib plates (\( x_9 \)), the width of the beam (\( x_{10} \)) and the geometry parameters of the rectangular holes (\( x_{11} \) and \( x_{12} \)).

Fig. 4
figure 4

Diagram of the simplified beam with twelve variables

To reveal the relationship between the deformation along the negative Z-direction and the relevant design variables, finite element analysis (FEA) simulations are performed. Since one simulation of this beam requires substantial computing time, the ES-HGL model is constructed to evaluate the deflection in the negative Z-direction, the most representative indicator of the load imposed on the beam. The three existing ensemble methods and three individual surrogate models are used for comparison. 150 sample points are selected for model construction, and 50 test points are adopted to evaluate the performance of each model.

The test results are given in Table 3. ES-HGL is the best in RMSE and AAE, and second-best in MAE. The performance of SP is better than RBF and KRG, but worse than the best individual surrogate model, PRS. EG performs worse than the other three ensemble methods in this engineering case. This reveals that, when confronted with a high-dimensional black-box problem, using an individual surrogate model to approximate the unknown design space risks inaccurate results. However, the large inaccuracies of some individual surrogates do not affect the approximation ability of ES-HGL. Clearly, ES-HGL is a promising ensemble modeling method for engineering problems in which sampling is expensive and a reliable model is needed.

Table 3 Result comparison for the design of NC machine beam

4.3 Optimal design of bearings for an all-direction propeller

In this section, the optimal design of bearings for an all-direction propeller is used to demonstrate the superiority of ES-HGL from another aspect: its ability to find the optimal solution of a complex engineering problem.

Because of the tough ocean working environment, marine equipment is required to have high positioning accuracy, good mobility and high stability. The all-direction propeller, a core dynamic positioning system, is widely used in drilling platforms and large ships. The different models of the propeller studied in this article can be seen in Fig. 5. In the design of a propeller, vibration resistance is important, because large vibrations reduce the life of the propeller and deteriorate its service performance. Power flow, which combines the effects of response speed and power, gives an absolute measure of vibration transmission, so this physical quantity is adopted here to evaluate the vibration characteristics. The power flow, as a function of ten structural parameters of the shafting system, is used as the optimization objective to obtain better dynamic performance of the propeller. FEA simulations are used to relate several significant structural parameters to the power flow of the propeller. Since these simulations are time-consuming (21 min per simulation on an Intel i3-2120 3.30 GHz CPU with 4 GB RAM), we regard the simulation as a black-box problem.

Fig. 5 Different models of the propeller

The optimization problem can be summarized as:

$$ {\displaystyle \begin{array}{c}\mathrm{Find}\kern0.5em K={\left[{k}_1,{k}_2,{k}_3,{k}_4,{k}_5,{c}_1,{c}_2,{c}_3,{c}_4,{c}_5\right]}^T\\ {}\min \kern0.5em {P}^{\ast }=\sqrt{\sum \limits_{i=1}^n{P}_i^2},\kern0.75em n=5\\ {}{P}_i=\frac{\omega }{2\pi }{\int}_0^{\frac{2\pi }{\omega }}\operatorname{Re}\left({f}_i\right)\operatorname{Re}\left({v}_i\right)\,\mathrm{d}t\\ {}\mathrm{s.t.}\kern0.5em {k}_i^L\le {k}_i\le {k}_i^U,\kern0.5em {c}_i^L\le {c}_i\le {c}_i^U,\kern0.5em i=1,2,3,4,5\end{array}} $$
(26)

where \( {k}_i \) and \( {c}_i \) are the stiffness and damping coefficients of the i th bearing, \( {k}_i^L \) and \( {k}_i^U \) are the lower and upper bounds of \( {k}_i \), \( {c}_i^L \) and \( {c}_i^U \) are the lower and upper bounds of \( {c}_i \), and \( {P}_i \) is the power flow through the i th bearing. The stiffness and damping coefficients of each bearing are sought that minimize the aggregate power flow.
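The objective in (26) can be evaluated numerically as sketched below. This is a minimal illustration assuming sampled time histories of the real parts of the harmonic force and velocity at each bearing; it is not the authors' FEA post-processing.

```python
import numpy as np

def power_flow(f, v, omega, t):
    """Time-averaged power flow through one bearing over one vibration
    period T = 2*pi/omega, from sampled force f(t) and velocity v(t)
    (real parts of the harmonic responses), via the trapezoidal rule."""
    T = 2.0 * np.pi / omega
    mask = t <= T                      # integrate over a single period
    fv, tt = f[mask] * v[mask], t[mask]
    integral = np.sum(0.5 * (fv[1:] + fv[:-1]) * np.diff(tt))
    return (1.0 / T) * integral        # prefactor omega/(2*pi) = 1/T

def aggregate_power_flow(P):
    """Scalar objective P* = sqrt(sum of squared per-bearing power flows)."""
    P = np.asarray(P, dtype=float)
    return float(np.sqrt(np.sum(P ** 2)))
```

For a constant force of 2 and velocity of 3 the time average is simply 6, and the aggregation of per-bearing values follows the root-sum-square in (26).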

Three other existing ensemble models (EG, SP and OM) and three individual surrogate models (PRS, RBF and KRG) are adopted for comparison. Because of its good global search ability, the GA (Genetic Algorithm) toolbox of MATLAB is used to find the optimal solution of each model. After running one hundred FEA simulations and constructing the four ensemble models and three individual surrogate models mentioned above, the optimized parameters are obtained by running the GA procedure.
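The surrogate-based optimization loop can be sketched as follows. This is a minimal real-coded genetic algorithm in Python standing in for MATLAB's GA toolbox, and the cheap analytic objective is only a placeholder for the ensemble surrogate trained on the FEA samples; the operator choices and rates are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simple_ga(objective, lower, upper, pop_size=60, generations=120,
              crossover_rate=0.9, mutation_rate=0.1):
    """Minimal real-coded GA (tournament selection, blend crossover,
    Gaussian mutation) minimizing `objective` in the box [lower, upper]."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    pop = rng.uniform(lower, upper, size=(pop_size, dim))
    fit = np.apply_along_axis(objective, 1, pop)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            # Binary tournament selection of two parents
            idx = rng.integers(0, pop_size, size=(2, 2))
            p1 = pop[idx[0][np.argmin(fit[idx[0]])]]
            p2 = pop[idx[1][np.argmin(fit[idx[1]])]]
            # Blend (arithmetic) crossover
            if rng.random() < crossover_rate:
                a = rng.random(dim)
                c = a * p1 + (1.0 - a) * p2
            else:
                c = p1.copy()
            # Gaussian mutation, clipped back into the box
            mut = rng.random(dim) < mutation_rate
            c[mut] += rng.normal(0.0, 0.1 * (upper - lower)[mut])
            children.append(np.clip(c, lower, upper))
        pop = np.array(children)
        fit = np.apply_along_axis(objective, 1, pop)
    best = np.argmin(fit)
    return pop[best], fit[best]

# Placeholder objective over the 10 normalized design variables; the real
# objective would be the ensemble surrogate's prediction of P*.
target = np.linspace(0.2, 0.8, 10)
obj = lambda x: float(np.sum((x - target) ** 2))
x_best, f_best = simple_ga(obj, np.zeros(10), np.ones(10))
```

In practice each bound pair \( ({k}_i^L,{k}_i^U) \) and \( ({c}_i^L,{c}_i^U) \) would be passed as the box constraints, and the GA only ever queries the cheap surrogate, never the 21-min FEA model.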

FEA simulations are then run with the initial parameters (the mid-range of each variable) and with each set of optimized parameters for comparison. The results are shown in Table 4. All of the solutions found by the ensemble models and the individual surrogate models prove to be feasible. ES-HGL gives the best result, reducing the objective by 25.12% from its initial value of 29.6499. EG and SP perform nearly the same as the best individual surrogate model, PRS, with decreases of around 24%. KRG is the least accurate individual model, reducing the objective by only 13.85%. These results show that the ensemble models perform better than the individual surrogate models they combine, which again demonstrates the value of ensemble methods. Overall, the successful application of ES-HGL indicates that the proposed ensemble model is more accurate and robust than the other individual surrogate models and ensemble models compared in this article.

Table 4 Result comparison of all-direction propeller bearings design in power flow

5 Conclusions

In this article, a new method that combines the advantages of both global and local measures is proposed to construct a better ensemble model in cases where only a small number of sample points are available. In this method, the design space is divided into two parts: the region far from the sample points and the region near the sample points. Two strategies are introduced to evaluate the weight factors in these regions: (1) in the outer region, a Hybrid Weight Factor composed of the global weight factor and the local weight factor is adopted; (2) in the inner region, because of its good approximation ability around sample points, the local weight factor is used to amend the Hybrid Weight Factor according to the distance between the prediction point and the nearest sample point. Six numerical functions, a 12-D NC machine beam design problem and a design optimization problem for the bearings of an all-direction propeller are used to test the proposed ES-HGL method. Three other ensemble models and three individual surrogate models are adopted for comparison. The results show that the ES-HGL model provides more robust and accurate approximations with limited sample points, while requiring almost the same modeling time as the other ensemble models compared in this article.
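To make the two-region idea concrete, the following schematic sketch shows how a prediction could shift from hybrid weights toward local weights as the prediction point approaches a sample. The equal-average hybrid rule, the linear blending factor and the radius r are illustrative assumptions, not the authors' exact formulas.

```python
import numpy as np

def hybrid_prediction(x, samples, predictions, global_w, local_w, r):
    """Schematic hybrid-weighted ensemble prediction at point x.

    samples:     (n, d) array of training points
    predictions: (m,) predictions of the m individual surrogates at x
    global_w:    (m,) global weight factors (accuracy over the whole space)
    local_w:     (m,) local weight factors (accuracy near x)
    r:           radius separating the inner region (near a sample)
                 from the outer region
    """
    d_near = np.min(np.linalg.norm(samples - x, axis=1))
    # Outer region: Hybrid Weight Factor, here a plain average of the
    # global and local weights (an illustrative choice)
    w_hybrid = 0.5 * (global_w + local_w)
    if d_near < r:
        # Inner region: amend the hybrid weights toward the local
        # weights as x approaches the nearest sample (alpha -> 1 there)
        alpha = 1.0 - d_near / r
        w = alpha * local_w + (1.0 - alpha) * w_hybrid
    else:
        w = w_hybrid
    w = w / w.sum()                  # normalize so the weights sum to 1
    return float(np.dot(w, predictions))
```

At a sample point the prediction is governed entirely by the normalized local weights, while far from all samples it falls back to the hybrid weights, mirroring the inner/outer split described above.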