1 Introduction

Numerical simulations have been widely used in engineering design and optimization to facilitate quick exploration of design alternatives and to obtain the optimal design. However, relying exclusively on numerical simulations remains challenging for complex systems, because the computational cost of high-fidelity (HF) simulations can be tremendous despite advances in computer capacity and speed (Viana et al. 2014). For example, it can take weeks to obtain a bus structure with the desired rollover crashworthiness and light weight by applying an optimization algorithm directly to the simulations (Bai et al. 2019). In fact, owing to the lengthy running times of HF simulations, almost any optimization algorithm applied directly to the simulations will be slow (Forrester et al. 2008). An efficient way to speed up the design optimization process is to employ inexpensive surrogate models in place of the time-consuming HF simulations. Surrogate models are built, using data drawn from a small number of simulations, to provide fast approximations of the relationship between system inputs and outputs. The many popular surrogate modeling techniques can be divided into two categories according to whether the surrogate model passes through the sample points. Techniques whose surrogates pass through all of the sample points fall into the category of interpolation, such as radial basis functions (RBF) (Fang and Horstemeyer 2006; Majdisova and Skala 2017) and kriging (KRG) (Hao et al. 2018; Sacks et al. 1989). Otherwise, they are regression techniques, for example, polynomial response surfaces (PRS) (Kleijnen 2008; Myers et al. 2016), moving least squares (MLS) (Lancaster and Salkauskas 1981; Wang 2015), support vector regression (SVR) (Smola and Schölkopf 2004; Clarke et al. 2005), and artificial neural networks (ANN) (Basheer and Hajmeer 2000; Napier et al. 2020). Each of these surrogate techniques has its own advantages and disadvantages, and no single technique has been found to be the most effective for all problems (Goel et al. 2007). More details and comparisons of these techniques can be found in the literature (Jin et al. 2001; Wang and Shan 2007; Krishnamurthy 2005). Among the above-mentioned techniques, MLS is reported to be a very successful approximation scheme with the advantages of high accuracy and low computational cost (Krishnamurthy 2005; Li et al. 2012). On this basis, it has been widely used in engineering fields such as structural reliability analysis (Lee et al. 2011; Lü et al. 2017), optimization of metal forming processes (Breitkopf et al. 2005), and solar radiation estimation (Kaplan et al. 2020). Although a surrogate model saves a considerable amount of computational resources, it still requires sufficient HF simulations to build the surrogate and ensure its accuracy, which tends to become unaffordable as the problem size increases. To address this challenge, multi-fidelity surrogate (MFS) models were proposed and have drawn much attention over the last two decades, as they hold the promise of achieving the desired accuracy at a lower cost (Fernández-Godino et al. 2019).

A multi-fidelity surrogate is constructed by fusing many low-fidelity (LF) samples with a few HF samples. It is assumed that HF samples are more accurate than LF samples but come at a higher computational cost, so that many HF evaluations often cannot be afforded. LF samples, on the contrary, are cheaper than HF samples and less accurate, though they can still reflect the primary characteristics of the physical system. Moreover, it is worth noting that there is no clear boundary between HF and LF samples. Whether a sample is HF or LF depends on the problem, and can be determined from its accuracy and cost relative to the other fidelities (Fernández-Godino et al. 2019). For instance, results obtained from experiments can be considered HF samples while results from simulations serve as LF samples; on the other hand, results from simulations can also be regarded as HF as long as the LF samples have even lower accuracy and cost. There are several ways to obtain LF samples, such as employing a simpler physical model, adopting a finite element model with coarse meshes, or reducing the dimensionality of the problem.

The most popular approaches used for MFS are called correction-based approaches (Han and Görtz 2012; Fernández-Godino et al. 2016) or scaling function-based approaches (Zhou et al. 2017; Hao et al. 2020), which can be divided into three types: multiplicative, additive, and comprehensive. An MFS based on the multiplicative approach is constructed by multiplying the LF surrogate by a correction function that represents the ratio between HF and LF samples at the same locations. The multiplicative approach was first proposed by Haftka (1991) to develop the global–local approximation (GLA) method, with the aim of combining the advantages of both global and local approximations. Later, this approach was termed the variable complexity modeling technique and adopted by Hutchison et al. (1994), who suggested using a constant correction function computed from a Taylor series expansion. Alexandrov et al. (2001) constructed an MFS model using multiplicative corrections to solve aerodynamic optimization problems. Furthermore, Liu et al. (2014) adopted a kriging model, instead of a constant, as the correction function to calibrate the LF model. It should be noted, however, that the multiplicative approach may be ineffective if the values of the LF model are close to zero at some locations. This property has, to some extent, limited its application to design optimization, especially for constrained problems where the solution often lies on the constraint boundary, that is, where the value of the constraint function (or its surrogate) is zero. The additive approach was subsequently proposed to avoid the division by zero that can occur in the multiplicative approach.

An MFS based on the additive approach is constructed by combining an LF surrogate with a discrepancy function that models the difference between the LF and HF samples. Eldred et al. (2004a, b) compared the performance of additive and multiplicative corrections in multi-fidelity surrogate-based optimization and showed that additive corrections were preferable. Sun et al. (2011) employed an additive MFS, in which an MLS surrogate was built for the LF model and a PRS surrogate for the discrepancy function, to optimize a sheet metal forming process. Zhou et al. (2015) used two SVR surrogates, representing the LF model and the discrepancy function, respectively, to build an MFS model. Since the additive approach has a simple form and is more robust than the multiplicative approach, it has been widely used in engineering optimization (Berci et al. 2014; Absi et al. 2019; Batra et al. 2019). A significant improvement in accuracy was achieved by combining the additive and multiplicative approaches, which leads to one of the most popular frameworks for MFS, namely the comprehensive approach. It is this approach that constitutes one pillar of the construction of co-kriging (Forrester et al. 2007), an effective MFS method; the other pillar is the correlated Gaussian process-based approximation, which carries the information of the LF and HF samples. Han et al. (2013) introduced gradient information into co-kriging. Apart from kriging-based MFS, other MFS techniques under the comprehensive framework have been attracting interest as well. Mainini and Maggiore (2012) constructed multiple LF and HF models and selected the best ones to build an MFS model. Zhang et al. (2018) proposed an MFS model based on linear regression (LR), named LR-MFS, which treats the LF model as an additional monomial in the MFS with the scaling factor as a regression coefficient. Tao et al. (2019) introduced a deep learning-based MFS for robust aerodynamic design optimization. Durantin et al. (2017) proposed an RBF-based MFS model whose parameters are optimized by minimizing the leave-one-out (LOO) cross-validation (CV) error. Song et al. (2019) combined a scaled LF model with a discrepancy function using two RBF surrogates and obtained a closed-form solution for the coefficients. Despite these efforts, there is still room to investigate other MFS models and expand the available arsenal.

In this work, we propose a simple yet powerful MFS based on moving least squares, called the adaptive MFS-MLS. The proposed MFS-MLS model is constructed using the comprehensive approach, which includes an LF scaling factor and a discrepancy function. In MFS-MLS, the scaling factor is a function of location and is multiplied by the LF model. The discrepancy function, modeling the difference between the HF responses and the scaled LF model, is represented by an MLS model consisting of a linear combination of monomial basis functions. To compute the scaling factor and the coefficients of the basis functions, the predictions of the LF model and the basis functions are assembled into a single matrix. The scaling factor and the coefficients are correspondingly assembled into a coefficient vector, which is calculated by weighted least squares minimizing the error between the HF responses and the prediction of the MFS-MLS. In addition, a new strategy is proposed to determine the size of the influence domain automatically. The MFS-MLS model allocates different weightings to the HF samples within the influence domain, which distinguishes it from global MFS models.

The remainder of this paper is organized as follows. Section 2 presents the details of the proposed MFS-MLS model. Comparisons between the proposed model and three MFS and three single-fidelity models on a set of benchmark numerical examples are given in Sect. 3. In Sect. 4, the MFS-MLS is applied to an engineering problem to further verify its effectiveness and applicability to practical problems. Conclusions are drawn and future work is outlined in Sect. 5.

2 The adaptive MFS-MLS methodology

The proposed MFS-MLS model is constructed using the comprehensive approach. It is adaptive because it not only determines the size of the influence domain automatically according to the sample points, but also determines varying scaling factors at the prediction sites.

2.1 Adaptive MFS-MLS model

The comprehensive approach, which combines the advantages of the additive and multiplicative approaches, offers more flexibility and higher accuracy. It is therefore employed in this paper and can be expressed as follows:

$${\varvec{y}}_{H} \left( {\varvec{x}} \right) = \rho {\varvec{y}}_{L} \left( {\varvec{x}} \right) + {\varvec{d}}\left( {\varvec{x}} \right)$$
(1)

where \({\varvec{x}}\) represents the design variables in the design space, and \({{\varvec{y}}}_{H}\left({\varvec{x}}\right)\) and \({{\varvec{y}}}_{L}\left({\varvec{x}}\right)\) denote the responses of the HF and LF models, respectively. \(\rho\) is a scaling factor and plays an important role in approximating multi-fidelity data; \({\varvec{d}}({\varvec{x}})\), called the discrepancy function, represents the difference between the scaled LF responses and the HF responses. The discrepancy function can be smoothed by selecting an appropriate scaling factor. Details of how a scaling factor improves multi-fidelity prediction can be found in the literature (Park et al. 2018).

Since the proposed MFS-MLS model is based on MLS, we first give a brief introduction to MLS; more details, such as the derivation of the equations, can be found in the literature (Lee et al. 2011; Lü et al. 2017; Breitkopf et al. 2005).

Moving least squares is a successful approximation scheme with the advantages of both high accuracy and low computational cost (Krishnamurthy 2005; Li et al. 2012). In essence, MLS is an extension of polynomial regression, with two significant differences from the traditional form: (1) MLS recognizes that not all sample points are equally important in estimating the regression coefficients. Each squared residual is therefore given a weighting when constructing the loss function, and the weightings vary with the distance between the point to be predicted and each observed data point. (2) Unlike in traditional polynomial regression, the coefficients of an MLS model are no longer constant; they are functions of the input \({\varvec{x}}\). At each prediction point \({{\varvec{x}}}_{\text{new}}\), the coefficients are calculated using the sample points within the neighborhood of \({{\varvec{x}}}_{\text{new}}\). This neighborhood is referred to as the influence domain of \({{\varvec{x}}}_{\text{new}}\), and samples outside the influence domain are not considered. The method is termed moving least squares because the influence domain “moves” as the prediction point changes.

Inspired by MLS, the MFS-MLS model based on the comprehensive correction can be built by the following equation:

$$\begin{aligned} \hat{y}_{H} \left( {\varvec{x}} \right) & = a_{0} \left( {\varvec{x}} \right)y_{L} \left( {\varvec{x}} \right) + \mathop \sum \limits_{i = 1}^{m} a_{i} \left( {\varvec{x}} \right)p_{i} \left( {\varvec{x}} \right) \\ & = {\varvec{a}}\left( {\varvec{x}} \right)^{{\text{T}}} {\varvec{p}}\left( {\varvec{x}} \right) \\ \end{aligned}$$
(2)

where \({a}_{0}\left({\varvec{x}}\right)\) is the scaling function for the LF model \({y}_{L}\left({\varvec{x}}\right)\). Note that \({a}_{0}\left({\varvec{x}}\right)\) is not a constant but a function of the design variables \({\varvec{x}}\), which gives the MFS-MLS model more flexibility. \({p}_{i}\left({\varvec{x}}\right)\) is the ith monomial basis function, \({a}_{i}\left({\varvec{x}}\right)\) is its coefficient, and m is the number of terms in the basis.

In the MFS-MLS model, the conventional basis function vector of an MLS model is augmented into an integrated vector \({\varvec{p}}\left({\varvec{x}}\right)={[{y}_{L}\left({\varvec{x}}\right) {p}_{1}\left({\varvec{x}}\right)\dots {p}_{m}({\varvec{x}})]}^{\text{T}}\), and \({\varvec{a}}\left({\varvec{x}}\right)\) is an augmented coefficient vector constituted by \({a}_{0}\left({\varvec{x}}\right)\) and \({a}_{i}\left({\varvec{x}}\right), i=1,2,\dots ,m\). Note that \({a}_{0}\left({\varvec{x}}\right)\) is the counterpart of \(\rho\) in Eq. (1), and the term \(\sum_{i=1}^{m}{a}_{i}\left({\varvec{x}}\right){p}_{i}\left({\varvec{x}}\right)\) of the MFS-MLS model is equivalent to the discrepancy function \({\varvec{d}}({\varvec{x}})\) of Eq. (1).

In the absence of specific knowledge about the characteristics of the real function, linear and quadratic monomials are often employed as the basis functions; e.g., a full quadratic basis in a two-dimensional (2D) space is of the form \(\stackrel{\sim }{{\varvec{p}}}\left({\varvec{x}}\right)={[1 {x}_{1} {x}_{2} {x}_{1}^{2} {x}_{1}{x}_{2} {x}_{2}^{2}]}^{\text{T}}\). Therefore, the integrated vector \({\varvec{p}}({\varvec{x}})\) can be represented by \({\varvec{p}}\left({\varvec{x}}\right)={[{y}_{L}\left({\varvec{x}}\right) \stackrel{\sim }{{\varvec{p}}}\left({\varvec{x}}\right)]}^{\text{T}}\).
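As a concrete illustration, the following Python sketch (our own illustration, not code from the paper) builds the full quadratic basis \(\stackrel{\sim }{{\varvec{p}}}\left({\varvec{x}}\right)\) for an arbitrary dimension and prepends the LF prediction to form the integrated vector; the callable `y_lf`, standing in for the LF surrogate, is an assumption of the sketch:

```python
import numpy as np

def quadratic_basis(x):
    """Full quadratic monomial basis p~(x): [1, x1, ..., xn, xi*xj for i <= j]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    cross = [x[i] * x[j] for i in range(n) for j in range(i, n)]
    return np.concatenate(([1.0], x, cross))   # length (n + 1)(n + 2)/2

def augmented_basis(x, y_lf):
    """Integrated vector p(x) of Eq. (2): the LF prediction prepended to p~(x)."""
    return np.concatenate(([y_lf(x)], quadratic_basis(x)))
```

For a 2D input, `quadratic_basis` returns exactly the six monomials listed above, in the same order.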

To compute the coefficient vector \({\varvec{a}}\left({\varvec{x}}\right)\), a cost function \({\varvec{J}}({\varvec{a}})\), a weighted sum of squared residuals (a weighted discrete \({L}_{2}\) norm), is minimized:

$$\begin{aligned} {\varvec{J}}\left( {\varvec{a}} \right) & = \mathop \sum \limits_{j = 1}^{{n_{H} }} w\left( {{\varvec{x}} - {\varvec{x}}_{j} } \right)\left[ {\hat{y}_{H} \left( {{\varvec{x}}_{j} } \right) - y_{H} \left( {{\varvec{x}}_{j} } \right)} \right]^{2} \\ & = \mathop \sum \limits_{j = 1}^{{n_{H} }} w\left( {{\varvec{x}} - {\varvec{x}}_{j} } \right)\left[ {{\varvec{a}}\left( {\varvec{x}} \right)^{{\text{T}}} {\varvec{p}}\left( {{\varvec{x}}_{j} } \right) - y_{H} \left( {{\varvec{x}}_{j} } \right)} \right]^{2} \\ \end{aligned}$$
(3)

where \({{\varvec{x}}}_{j} (j=1,2,\dots ,{n}_{H})\) are the \({n}_{H}\) HF sample points in the neighborhood of the evaluation point \({\varvec{x}}\), and \(w({\varvec{x}}-{{\varvec{x}}}_{j})\) is the weight function of the HF samples. Commonly used weight functions include the Gaussian function, the cubic spline, the exponential function, and the quartic spline. The choice of weight function and the identification of the size of the influence domain are discussed in Sect. 2.2.

Essentially, Eq. (3) is a quadratic form, so it can be rewritten in matrix form as follows:

$${\varvec{J}}\left( {\varvec{a}} \right) = \left( {{\mathbf{P}}{\varvec{a}} - {\varvec{y}}_{H} } \right)^{{\text{T}}} {\varvec{W}}\left( {\varvec{x}} \right)\left( {{\mathbf{P}}{\varvec{a}} - {\varvec{y}}_{H} } \right)$$
(4)

where

$${\varvec{y}}_{H} = \left[ {y_{H} \left( {{\varvec{x}}_{1} } \right){ }y_{H} \left( {{\varvec{x}}_{2} } \right){ } \ldots { }y_{H} \left( {{\varvec{x}}_{{n_{H} }} } \right)} \right]^{{\text{T}}}$$
(5)
$${\varvec{P}} = \left[ {\begin{array}{*{20}c} {y_{L} \left( {{\varvec{x}}_{1} } \right)} & {p_{1} \left( {{\varvec{x}}_{1} } \right)} & \ldots & {p_{m} \left( {{\varvec{x}}_{1} } \right)} \\ {y_{L} \left( {{\varvec{x}}_{2} } \right)} & {p_{1} \left( {{\varvec{x}}_{2} } \right)} & \ldots & {p_{m} \left( {{\varvec{x}}_{2} } \right)} \\ \vdots & \vdots & \ddots & \vdots \\ {y_{L} \left( {{\varvec{x}}_{{n_{H} }} } \right)} & {p_{1} \left( {{\varvec{x}}_{{n_{H} }} } \right)} & \ldots & {p_{m} \left( {{\varvec{x}}_{{n_{H} }} } \right)} \\ \end{array} } \right]$$
(6)

and

$${\varvec{W}}\left( {\varvec{x}} \right) = \left[ {\begin{array}{*{20}c} {w\left( {{\varvec{x}} - {\varvec{x}}_{1} } \right)} & 0 & \ldots & 0 \\ 0 & {w\left( {{\varvec{x}} - {\varvec{x}}_{2} } \right)} & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & {w\left( {{\varvec{x}} - {\varvec{x}}_{{n_{H} }} } \right)} \\ \end{array} } \right]$$
(7)

Taking the derivative of Eq. (4) with respect to \({\varvec{a}}\left({\varvec{x}}\right)\) and setting it to zero, we have

$${\varvec{A}}\left( {\varvec{x}} \right){\varvec{a}}\left( {\varvec{x}} \right) = {\varvec{B}}\left( {\varvec{x}} \right){\varvec{y}}_{H}$$
(8)

where the matrices \({\varvec{A}}\left({\varvec{x}}\right)\) and \({\varvec{B}}({\varvec{x}})\) are

$${\varvec{A}}\left( {\varvec{x}} \right) = {\varvec{P}}^{{\text{T}}} {\varvec{W}}\left( {\varvec{x}} \right){\varvec{P}}$$
(9)
$${\varvec{B}}\left( {\varvec{x}} \right) = {\varvec{P}}^{{\text{T}}} {\varvec{W}}\left( {\varvec{x}} \right)$$
(10)

Hence, we obtain

$${\varvec{a}}\left( {\varvec{x}} \right) = {\varvec{A}}^{ - 1} \left( {\varvec{x}} \right){\varvec{B}}\left( {\varvec{x}} \right){\varvec{y}}_{H}$$
(11)

Substituting \({\varvec{a}}\left({\varvec{x}}\right)\) into Eq. (2), the MFS-MLS approximation \({\widehat{y}}_{H}\left({\varvec{x}}\right)\) is obtained as

$$\hat{y}_{H} \left( {\varvec{x}} \right) = {\varvec{p}}^{{\text{T}}} \left( {\varvec{x}} \right){\varvec{A}}^{ - 1} \left( {\varvec{x}} \right){\varvec{B}}\left( {\varvec{x}} \right){\varvec{y}}_{H}$$
(12)

2.2 Weight function and the size of influence domain

Both the weight function and the size of the influence domain have crucial impacts on the performance of MLS (Lü et al. 2017). Since the MFS-MLS is based on MLS, both should be chosen carefully when constructing an MFS-MLS model. In this study, we select the popular exponential function as the weight function and propose a straightforward strategy to identify the size of the influence domain automatically.

The exponential weight function can be expressed as:

$$w\left( {\overline{s}} \right) = \left\{ \begin{gathered} e^{{ - \left( {2\overline{s}} \right)^{2} }} ,\quad \overline{s} \le 1 \hfill \\ 0,\quad \overline{s} > 1 \hfill \\ \end{gathered} \right.$$
(13)

The size of the influence domain is determined by the normalized Euclidean distance \(\overline{s }\), which depends on the number of terms in the basis function. Let \(s = \left\| {{\varvec{x}} - {\varvec{x}}_{j} } \right\|\) denote the Euclidean distance between the evaluation point \({\varvec{x}}\) and the jth HF sample point \({{\varvec{x}}}_{j}\), and let \(s_{1} ,s_{2} , \ldots ,s_{k} , \ldots ,s_{{n_{H} }}\) be the Euclidean distances between \({\varvec{x}}\) and all the HF sample points, sorted in ascending order, so that \({s}_{k}\) is the kth smallest Euclidean distance. The normalized distance is then \(\overline{s }=s/{s}_{k}\). The subscript k, which represents the number of terms in the basis function, is given in Table 1. Taking a 2D problem as an example, it can be observed from Fig. 1 that the influence domain is centered at the evaluation point and that its radius is the kth smallest Euclidean distance \({s}_{k}\).

Table 1 Basis functions and the number of HF samples within the influence domain
Fig. 1 Illustration for weight function and influence domain

The basis functions of an MFS-MLS model consist of n + 1 linear monomials or (n + 1)(n + 2)/2 quadratic monomials. Accordingly, the number of unknown coefficients of an MFS-MLS model is n + 2 for a linear basis or 1 + (n + 1)(n + 2)/2 for a quadratic one, and the same numbers of HF samples are sufficient to identify the coefficients. The radius of the influence domain can thus be set to the kth smallest Euclidean distance between the evaluation point \({\varvec{x}}\) and the HF sample points, and HF samples outside the influence domain are not involved in the calculation. If, however, HF samples are so scarce that \({n}_{H}<k\), all of them are included in the influence domain. This implies that more and more samples are needed for high-dimensional problems; otherwise, the proposed strategy turns the MFS-MLS model from local to global, which incurs a decline in performance. A sketch of this strategy is given below.
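The strategy can be written in a few lines of Python (ours; the mapping from the basis order to k follows the coefficient counts above, as Table 1 itself is not reproduced here):

```python
import numpy as np

def exp_weight(s_bar):
    """Exponential weight function of Eq. (13)."""
    return np.exp(-(2.0 * s_bar) ** 2) if s_bar <= 1.0 else 0.0

def influence_weights(x, X_hf, quadratic=True):
    """Weights of all HF samples under the adaptive influence domain.

    The choice of k mirrors the coefficient counts above (our reading,
    since Table 1 is not reproduced here): n + 2 unknowns for a linear
    basis and 1 + (n + 1)(n + 2)/2 for a quadratic one.
    """
    n = X_hf.shape[1]
    k = 1 + (n + 1) * (n + 2) // 2 if quadratic else n + 2
    dists = np.linalg.norm(X_hf - x, axis=1)
    k = min(k, len(dists))            # scarce HF samples: include them all
    s_k = np.sort(dists)[k - 1]       # kth smallest distance = domain radius
    return np.array([exp_weight(s / s_k) for s in dists])
```

Note that the kth nearest sample sits exactly on the domain boundary (\(\overline{s }=1\)) and receives a small but nonzero weight, while samples beyond the radius receive zero weight.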

3 Numerical examples

To evaluate the performance of the MFS-MLS model, it is compared with three state-of-the-art benchmark MFS models (i.e., CoRBF proposed by Durantin et al. (2017), LR-MFS proposed by Zhang et al. (2018), and MFS-RBF proposed by Song et al. (2019)) and three single-fidelity surrogate models (i.e., PRS, RBF, and MLS) on a number of widely used numerical test functions and one engineering problem.

Among the three benchmark MFS models, it should be noted that, since the source code of CoRBF is not available, some parameters of our implementation may differ from those of the original. For the LR-MFS model, a first-order PRS is used to approximate the discrepancy function. For the three single-fidelity models, the SURROGATES Toolbox (Viana 2010) is employed to conduct the comparative experiments.

3.1 Design of experiments

Design of experiments (DOE) is the sampling plan in the design space and is generally the first step in building a surrogate model. Among the many available DOE techniques, Latin hypercube sampling (LHS) is chosen in this paper because of its capability of spreading near-random samples uniformly over the design space. More specifically, for all surrogate models in this paper, samples are generated with the lhsdesign function, a built-in Matlab sampling function.

In this paper, it is assumed that the number of HF samples used for building a single-fidelity surrogate model is m × n, where n is the dimension of the problem and m is a user-defined value. To compare MFS models and single-fidelity surrogate models fairly, the total computational budget for building the two kinds of models is kept equal. Therefore, to build an MFS model, the number of HF samples is set to k × n (k < m), and the remaining (m − k) × n HF budget is spent on additional LF samples according to the cost ratio θ, which means that evaluating θ LF samples costs the same as evaluating one HF sample. Taking a 2D problem as an example, if m is set to 5 and the cost ratio θ to 20, the total budget corresponds to 10 HF samples. Thus, we can use 10 HF samples, 200 LF samples, or any of the combinations shown in Table 2; the short sketch after the table makes this arithmetic explicit.

Table 2 Combination of HF and LF samples (for a 2D problem)
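As a quick check of this budget arithmetic, the following sketch (ours, purely illustrative) computes the HF/LF sample counts for a given allocation:

```python
def sample_counts(m, k, n, theta):
    """HF/LF sample counts under a total budget of m*n HF-equivalent evaluations."""
    n_hf = k * n                   # HF samples actually evaluated
    n_lf = (m - k) * n * theta     # LF samples bought with the remaining budget
    return n_hf, n_lf

# For the 2D example in the text (m = 5, theta = 20):
# sample_counts(5, 5, 2, 20) -> (10, 0)   whole budget spent on HF samples
# sample_counts(5, 0, 2, 20) -> (0, 200)  whole budget spent on LF samples
# sample_counts(5, 4, 2, 20) -> (8, 40)   presumably one of the rows of Table 2
```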

3.2 Performance criteria

To measure the performance of the surrogate models, the coefficient of determination R2, a global performance metric, is selected; its formula is given in Eq. (14):

$$R^{2} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \hat{y}_{i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \overline{y}} \right)^{2} }}$$
(14)

where n is the number of testing samples; \({y}_{i}\) and \({\widehat{y}}_{i}\) represent true responses and predictions, respectively, at the testing points; \(\overline{y }\) is the mean of the true responses. The surrogate model is more accurate if R2 is closer to 1.

The Pearson correlation coefficient (PCC), also referred to as Pearson’s r, is a measure of the correlation between two random variables. In this paper, we adopt the squared Pearson’s r, denoted as r2, to represent the correlation between HF and LF functions, which is inspired by Toal (2015), as shown in Eq. (15):

$$r^{2} = \left( {\frac{{\sum\nolimits_{i = 1}^{n} {\left( {y_{hi} - \overline{y}_{h} } \right)\left( {y_{li} - \overline{y}_{l} } \right)} }}{{\sqrt {\sum\nolimits_{i = 1}^{n} {\left( {y_{hi} - \overline{y}_{h} } \right)^{2} } } \sqrt {\sum\nolimits_{i = 1}^{n} {\left( {y_{li} - \overline{y}_{l} } \right)^{2} } } }}} \right)^{2}$$
(15)

where \({y}_{hi}\) and \({y}_{li}\) are the ith of n paired observations of the HF and LF functions at identical inputs, and \({\overline{y} }_{h}\) and \({\overline{y} }_{l}\) are their respective means. The strength of the correlation grows with the value of r2, which ranges from 0 to 1.
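For reference, both metrics of Eqs. (14) and (15) can be computed in a few lines of numpy (a sketch of ours, not tied to the paper's code):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 of Eq. (14)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def squared_pearson(y_h, y_l):
    """Squared Pearson correlation r^2 of Eq. (15) between HF and LF outputs."""
    return float(np.corrcoef(y_h, y_l)[0, 1] ** 2)
```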

3.3 Test function 1

The HF function (Eq. (16)) is a 2D test function derived from Cai et al. (2017). Rather than having a single LF response, we consider a range of different LF responses given by Eq. (17).

HF function:

$$y_{h} \left( {\varvec{x}} \right) = 4x_{1}^{2} - 2.1x_{1}^{4} + \frac{1}{3}x_{1}^{6} + x_{1} x_{2} - 4x_{2}^{2} + 4x_{2}^{4}$$
(16)

LF function:

$$y_{l} \left( {\varvec{x}} \right) = Ay_{h} \left( {0.85{\varvec{x}}} \right) + x_{1} x_{2} - 65$$
(17)

where \({x}_{1},{x}_{2}\in [-2, 2]\). The parameter A varies from 0 to 1 and effectively controls the correlation between the HF and LF responses.
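The pair of functions is straightforward to reproduce; the following Python sketch (ours) implements Eqs. (16) and (17), whose HF part is the familiar six-hump camel-back form:

```python
import numpy as np

def y_h(x):
    """HF function of Eq. (16): the six-hump camel-back form, x in [-2, 2]^2."""
    x1, x2 = x[..., 0], x[..., 1]
    return 4*x1**2 - 2.1*x1**4 + x1**6/3 + x1*x2 - 4*x2**2 + 4*x2**4

def y_l(x, A):
    """LF function of Eq. (17); A in [0, 1] tunes the HF/LF correlation."""
    x1, x2 = x[..., 0], x[..., 1]
    return A * y_h(0.85 * x) + x1 * x2 - 65.0
```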

A total computational budget of 5n (n = 2) HF samples is used to construct the single-fidelity and MFS models. Here, 80% of the total budget is used to generate the HF samples and the remaining 20% to generate the LF samples; with the cost ratio set to 10, the numbers of HF and LF samples are 8 and 20, respectively. For the three single-fidelity models, the total budget of training samples is the same as that for constructing the MFS models, so 10 HF samples are used. In addition, another 1000n testing samples from the HF function are used to validate the single-fidelity and MFS models. All the samples are generated by LHS. To eliminate the effect of the random sampling plan on the performance of the surrogate models, all results are averaged over 30 random DOEs. To prevent outliers among the 30 DOEs from distorting the results, \({R}^{2}\) was set to 0 whenever \({R}^{2}\le 0\). This convention has two merits: first, \({R}^{2}<0\) and \({R}^{2}=0\) both indicate that the surrogate model cannot capture the relationship between design variables and responses; second, it prevents large negative values from deteriorating the averaged results.

The choice of 5n HF samples as the total budget is motivated as follows. A rule of thumb for the number of training samples used to construct a single-fidelity model is 10n (Jones et al. 1998; Forrester and Keane 2009). Since the purpose of constructing MFS models is to achieve the desired accuracy at a lower cost, it is reasonable for the total budget of an MFS model to be no more than 10n HF samples. At the same time, too small a budget holds little promise of improving the performance of MFS models. Thus, 5n HF samples are chosen to build the surrogate models.

The effect of the correlation between the HF and LF functions on the performance of the MFS-MLS model is studied in Figs. 2 and 3. Figure 2 compares the performance of the MFS-MLS model with that of three single-fidelity models (i.e., MLS, PRS, and RBF). The three single-fidelity models and the MFS-MLS model are constructed from 10 HF samples and validated on the 1000n (n = 2) HF testing samples. In Fig. 2, the upper and lower subplots show the mean and standard deviation of R2 over the 30 DOEs. In the upper subplot, the left-hand y-axis represents the prediction accuracy measured by the coefficient of determination R2, the right-hand y-axis represents the correlation r2 between the HF and LF functions, and the x-axis represents the parameter A, which controls the correlation r2. As can be seen from Eqs. (16) and (17), the LF function is a variant of the HF function, and the correlation between them changes as A varies. To investigate the effect of this correlation on the performance of the MFS-MLS model, A is varied from 0 to 1 in steps of 0.1, so that 11 LF functions and one HF function form 11 HF–LF pairs with diverse correlations. The red dashed line in Fig. 2 shows the relationship between the correlation r2 and the parameter A for test function 1. It is observed that the correlation r2 increases monotonically from 0.02 to 0.92 as A increases, and the prediction accuracy of the MFS-MLS, in general, follows the same tendency. The performance of the MFS-MLS is much better than that of the three single-fidelity surrogate models when the correlation r2 is greater than 0.2.

Fig. 2 Comparison between MFS-MLS and single-fidelity surrogate models for test function 1

Fig. 3 Comparison of the MFS models for test function 1

Figure 3 compares the performance of the MFS-MLS model with those of the LR-MFS, CoRBF, and MFS-RBF models in terms of R2 for test function 1. The four MFS models are trained on the same training samples (i.e., 8 HF and 20 LF samples) and validated on the same 1000n HF testing samples. The MFS-MLS model performs much better than the other MFS models when r2, the correlation between the HF and LF functions, is less than 0.9. When r2 ≥ 0.9, the performance of MFS-MLS is slightly worse than that of CoRBF, but still better than those of MFS-RBF and LR-MFS. It is worth noting that the performance of all four MFS models follows the tendency of the correlation r2, while the MFS-MLS is the least sensitive to it. This is because, in the MFS-MLS model, only the HF samples within the influence domain are used to compute the loss, so HF samples far from the evaluation point have little or even no influence on the prediction. An MFS model is more robust if its performance is less sensitive to the correlation between the HF and LF models, which is useful for practical engineering problems, where this correlation may be inaccurately estimated or unknown owing to the scarcity of HF samples.

The effect of the cost ratio of HF to LF samples on the performance of MFS-MLS models is studied in Fig. 4. Assuming the total budget is the cost of 5n HF samples, 80% of the total budget is used to generate the HF samples and the remaining 20% to generate the LF samples for building the MFS-MLS models. The cost ratios are set to 5, 10, 20, and 40. Since the total budget and the number of HF samples are fixed, the number of LF samples is determined by the cost ratio. The resulting numbers of training samples for the MFS-MLS model are shown in Table 3.

Fig. 4 Effect of the cost ratio on the performance of (a) the MFS-MLS model and (b) the LF model

Table 3 The number of training samples for the MFS-MLS model

Figure 4 illustrates the effect of cost ratios on the performance of the MFS-MLS model under different correlations between the HF and LF functions. The correlation is adjusted via the parameter A of the LF function, chosen from 0 to 1 in steps of 0.1, giving 11 different correlation values. From Fig. 4a it can be observed that, for a fixed parameter A (and hence a constant correlation \({r}^{2}\)), the performance of the MFS-MLS model improves as the cost ratio increases. For example, if A = 0.4, the correlation \({r}^{2}\) is 0.75, and the \({R}^{2}\) of the MFS-MLS models is 0.55, 0.80, 0.91, and 0.96 for cost ratios of 5, 10, 20, and 40, respectively. When the cost ratio increases, more LF samples are used to construct the LF model, making it more accurate (see Fig. 4b), which in turn improves the MFS-MLS model. When \({r}^{2}>0.5\), the performance of the MFS-MLS model reaches a plateau as the correlation increases further, owing to the decreasing performance of the LF surrogate models; this also implies that the discrepancy function becomes easier to fit as the correlation increases. From Fig. 4a and b, we can also see that the performance of the MFS-MLS is enhanced as the LF surrogate model improves, provided the correlation is not extremely low. Overall, both the cost ratio of HF to LF samples and the performance of the LF surrogate model have an important impact on the performance of MFS-MLS.

Assuming the cost ratio θ is 10, three different combinations under the total budget of 5n HF samples are used to investigate the effect of the allocation of HF and LF samples on the performance of the MFS-MLS model. The three combinations are “4n–10n”, “3n–20n”, and “2n–30n”, meaning that 80%, 60%, and 40% of the total budget is used to generate the HF samples, with the remaining 20%, 40%, and 60%, respectively, used to generate the LF samples. Taking “4n–10n” as an example, 4n HF samples and 10n LF samples are used to build an MFS-MLS model. From Fig. 5, it can be observed that the MFS-MLS model in the “4n–10n” case performs better than in the other two cases. In addition, only the “4n–10n” case outperforms the single-fidelity PRS model, the best of the three single-fidelity models considered above, in terms of both the mean and standard deviation of \({R}^{2}\). Therefore, it is suggested that approximately 80% of the total budget should be allocated to generating the HF samples when using MFS-MLS models.

Fig. 5 Effect of the combination of HF and LF samples on the performance of MFS-MLS for test function 1

3.4 Test function 2

Test function 2 is derived directly from Toal (2015); the HF function (Eq. (18)) is the “Trid function” of ten variables, which is often used to test the performance of surrogate models on high-dimensional problems. It is worth noting that the allocations of HF and LF samples for test function 2 are identical to those of test function 1 throughout the follow-up numerical experiments.

HF function:

$$y_{h} = \mathop \sum \limits_{i = 1}^{10} \left( {x_{i} - 1} \right)^{2} - \mathop \sum \limits_{i = 2}^{10} x_{i} x_{i - 1}$$
(18)

LF function:

$$y_{l} = \mathop \sum \limits_{i = 1}^{10} \left( {x_{i} - A} \right)^{2} - \left( {A - 0.65} \right)\mathop \sum \limits_{i = 2}^{10} ix_{i} x_{i - 1}$$
(19)

where \({x}_{i}\in \left[-100, 100\right], i=1,2,\dots ,10\). The parameter A varies from 0 to 1.
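As with test function 1, the pair is easy to reproduce; the following Python sketch (ours) implements Eqs. (18) and (19):

```python
import numpy as np

def y_h(x):
    """HF function of Eq. (18): the 10-dimensional Trid function."""
    return np.sum((x - 1.0) ** 2) - np.sum(x[1:] * x[:-1])

def y_l(x, A):
    """LF function of Eq. (19); A in [0, 1] controls the HF/LF correlation."""
    i = np.arange(2, len(x) + 1)      # index i = 2, ..., 10 of the cross terms
    return np.sum((x - A) ** 2) - (A - 0.65) * np.sum(i * x[1:] * x[:-1])
```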

Figure 6 compares the performance of the MFS-MLS model with that of three single-fidelity models (i.e., MLS, PRS, and RBF), each constructed from 5\(n\) (i.e., 50) HF samples. In addition, another 1000\(n\) testing samples from the HF function are used to validate the single-fidelity and MFS-MLS models. All the samples are generated by LHS. It is observed that the performance of MFS-MLS is not very sensitive to the correlation of the HF and LF functions in this case, and it is much better than that of the three single-fidelity surrogate models in terms of both the mean and standard deviation of \({R}^{2}\) when the correlation parameter A is no more than 0.8.

Fig. 6 Comparison between MFS-MLS and single-fidelity surrogate models for test function 2

Figure 7 compares the performance of MFS-MLS with those of LR-MFS, CoRBF, and MFS-RBF for test function 2. The prediction accuracy of both CoRBF and LR-MFS is quite low, which means that these two models cannot fit test function 2 at all. For LR-MFS, the likely reason is that the discrepancy between the HF and LF functions in test function 2 is quadratic, whereas it is fitted by a first-order polynomial in the LR-MFS model. As for CoRBF, the model parameter obtained by optimizing the leave-one-out error is probably not very stable for high-dimensional problems; the CoRBF model tends not to handle high-dimensional problems well when only small samples are available (Durantin et al. 2017). The MFS-MLS model performs much better than the other MFS models in terms of both the mean and standard deviation of \({R}^{2}\). The performance of the MFS-RBF closely follows the tendency of the correlation r2 and is therefore very sensitive to it. On the contrary, the performance of the MFS-MLS is not sensitive to the correlation in this case and is also better than that of the other three MFS models. As a matter of fact, the performance of the MFS-MLS model is still related to the correlation to some extent; this is not evident in Fig. 7, but can be observed later in Figs. 8 and 9.

Fig. 7 Comparison of MFS models for test function 2

Fig. 8 Effect of the cost ratio on the performance of (a) the MFS-MLS model and (b) the LF model

Fig. 9 Effect of the combination of HF and LF samples on the performance of MFS-MLS for test function 2

The effect of the cost ratio of HF to LF samples on the performance of MFS-MLS models is studied in Fig. 8. It is assumed that the total budget is the cost of 5\(n\) (\(n\) = 10) HF samples; 80% of the total budget is used to generate the HF samples, and the remaining 20% is used to generate the LF samples for building the MFS-MLS models. The cost ratios are set to 5, 10, 20, and 40. Since the total budget and the number of HF samples are fixed, the number of LF samples is determined by the cost ratio; the resulting numbers of training samples for the MFS-MLS model are shown in Table 4. Because the cost ratio determines the number of LF samples, and the LF model is constructed from those samples, the cost ratio affects the performance of the LF model.

Table 4 The number of training samples for the MFS-MLS model

Figure 8 illustrates the effect of different cost ratios on the performance of the MFS-MLS model. From Fig. 8a, it can be observed that the performance of the MFS-MLS model improves as the cost ratio increases when \({r}^{2}\ge 0.1\). This is understandable because the LF model becomes more accurate as more LF samples are added, as shown in Fig. 8b, which provides a more accurate prediction trend for the MFS-MLS model. However, when the correlation \({r}^{2}\) remains at extremely low values, for instance \({r}^{2}\le 0.1\), the performance of the MFS-MLS model does not improve as the cost ratio increases. This is because, when the correlation is extremely small, the similarity between the landscapes of the HF and LF functions is weak, and the LF model cannot provide a useful trend for the MFS-MLS model even if it is accurate. For example (see Fig. 8a), when A = 0.4 or 0.5, i.e., \({r}^{2}\) = 0.05 or 0, the \({R}^{2}\) of the MFS-MLS models is 0.827, 0.814, 0.854, and 0.853, or 0.825, 0.804, 0.811, and 0.830, for cost ratios of 5, 10, 20, and 40; the performance does not improve with the cost ratio. Moreover, it can be seen that the performance of the MFS-MLS becomes more consistent with the tendency of the correlation as the cost ratio increases. When the cost ratio is relatively low, the low prediction accuracy of the LF model masks the effect of the correlation on the performance of the MFS-MLS. Therefore, both the cost ratio and the correlation have an impact on the performance of MFS-MLS.

The effect of different combinations of HF and LF samples on the performance of the MFS-MLS model is investigated next. From Fig. 9, it can be observed that when the correlation r2 < 0.8, the “4n–10n” case, i.e., 4n HF samples and 10n LF samples, performs best among the three cases (“4n–10n”, “3n–20n”, and “2n–30n”), and the “3n–20n” case performs better than “2n–30n”. However, when the correlation r2 ≥ 0.8, the “2n–30n” combination performs best; the order of the three combinations reverses. This phenomenon can be explained as follows. Although test function 2 is high-dimensional, the landscape of the HF function is not very bumpy, and its nonlinearity is lower than that of the LF function, as can be seen from Eqs. (18) and (19). Therefore, when the correlation is large enough (here, r2 ≥ 0.8), a few HF samples suffice to calibrate or enhance the prediction accuracy of the MFS-MLS model, as long as the LF model is accurate enough to reflect the real trend of the HF function. When r2 < 0.8, the “2n–30n” case is unpromising because, with a low correlation (in other words, weak similarity between the landscapes of the HF and LF functions), 2n HF samples are not sufficient to fit the discrepancy between the HF function and the scaled LF function, even if the LF surrogate model is accurate.

In Fig. 9, only the MLS model is shown among the three single-fidelity models, because it performs best of the three; the single-fidelity models are the same as those in Fig. 6. Overall, the “4n–10n” and “3n–20n” combinations are reasonable choices for constructing an MFS-MLS model for this test function, because both achieve better performance than the single-fidelity models for most correlations.

3.5 Other benchmark functions

In this section, 16 additional test functions are employed to further validate the performance of the MFS-MLS model. The 16 test functions, covering different dimensions and various degrees of nonlinearity, are selected from the website https://www.sfu.ca/~ssurjano/index.html and listed in “Appendix 1”. For each function, 5n HF samples generated by LHS are employed to construct the single-fidelity surrogate models. For the MFS models, the numbers of HF and LF samples are 4n and 10n, respectively, and the cost ratio of HF to LF samples is set to 10, so the total budget of constructing an MFS model is the cost of 5n HF samples. To eliminate the effect of the random sampling plan on the performance of the surrogate models, all results are averaged over 30 random DOEs. “Appendix 2” lists the comparison results of the MFS models and the three single-fidelity models in terms of R2 for the 16 test functions, with the best result for each function in bold italics. The MFS-MLS model performs best among the four MFS and three single-fidelity models except for the 3rd, 6th, and 9th functions, for which its performance is only slightly worse than the best.

Figure 10 compares the MFS-MLS model with the other three MFS models and the three single-fidelity models in terms of prediction accuracy. The red columns represent the mean of the prediction accuracy R2, obtained by averaging the values of R2 over the 30 random DOEs and then over the 16 test functions. The light blue columns represent the standard deviation (Std) of R2. All four MFS models outperform the three single-fidelity models appreciably in terms of the mean prediction accuracy, except that the CoRBF model beats the single-fidelity RBF model only by a narrow margin. The MFS-MLS has the largest mean R2, 0.923, which is better than the other MFS and single-fidelity models at the same cost of 5n HF samples. Moreover, the standard deviation of R2 of the MFS-MLS is smaller than those of the remaining models except the MFS-RBF, which shows the strong robustness of the MFS-MLS model on these 16 test functions.

Fig. 10 Comparison of the mean and Std of the R2 for the 16 test functions

4 Engineering problem

In this section, a static analysis of the boom of a bucket wheel reclaimer (BWR) is used to validate the performance of the proposed model. BWRs, as shown in Fig. 11, are used for moving large amounts of bulk materials, such as coal and ores, in ports, power plants, and stockyards. A BWR works as follows: a heavy load attached to the short end of the boom serves as the balance weight, and bulk materials are reclaimed by the rotation of the bucket wheel mounted on the long end of the boom on the opposite side. Generally, the boom of a BWR consists of I-beams, and overload can cause deformation, vibration, or even failure of the boom. Therefore, the relationship between the maximum deformation (see Fig. 12) and the cross-sectional area of the I-beam under different balance weights is investigated. As shown in Fig. 13, the flange width (W1), the beam height (W2), the web thickness (t), and the balance weight (P) are selected as design variables, with ranges of 60–70 mm, 100–120 mm, 4–5 mm, and 300–350 kN, respectively. The maximum deformation of the boom is the quantity of interest. The static analysis was conducted on a personal computer with an Intel Core i7 6700 CPU and 32 GB RAM, using the commercial software ANSYS 17.0. It is assumed that the cutting resistance is constant, and the gravity load is considered. The boom model built with Timoshenko beam elements, consisting of 374,000 elements, was used as the HF simulation model, while the boom model built with Euler–Bernoulli beam elements, consisting of 46,750 elements, was used as the LF simulation model. Running one HF simulation takes approximately 71 s and one LF simulation approximately 13 s; thus, the cost ratio of the HF model to the LF model is approximately 5. The MFS models are constructed from 12 HF samples and 20 LF samples, and the three single-fidelity models from 16 HF samples. In addition, another 20 HF samples were used as testing data to validate the performance of the MFS and single-fidelity models. The comparison of MFS-MLS with the other three MFS models and the three single-fidelity surrogates is shown in Fig. 14; the MFS-MLS model exhibits the best results among all the surrogates for this engineering problem.

Fig. 11 Bucket wheel reclaimer

Fig. 12 The deformation cloud map of the boom

Fig. 13 The cross-section of an I-beam

Fig. 14 Comparison of different MFS and single-fidelity models for the engineering problem

5 Conclusions

A multi-fidelity surrogate model based on moving least squares, called MFS-MLS, was developed in this paper. In the proposed method, MLS is used to combine the LF model and the discrepancy function to represent the HF responses. Unlike global MFS models, the coefficients of the MFS-MLS model at each prediction site are calculated using the weighted HF samples within the influence domain, and the size of the influence domain is determined adaptively by a new strategy. The MFS-MLS model was compared with three benchmark MFS models (i.e., MFS-RBF, CoRBF, and LR-MFS) and three popular single-fidelity surrogate models (RBF, PRS, and MLS) in terms of prediction accuracy on multiple numerical test functions and an engineering problem. The results show that the MFS-MLS model exhibits competitive performance in both the numerical and the practical cases. In addition, the effects of key factors (i.e., the correlation of HF and LF samples, the cost ratio of HF to LF samples, and the combination of HF and LF samples) on the performance of the MFS-MLS were investigated using two test functions. The results show that the prediction accuracy of the MFS-MLS model is less sensitive to the correlation of HF and LF samples than that of the other MFS models, because HF samples far from the evaluation point have little or even no influence on the MFS-MLS prediction. The performance of the MFS-MLS model nevertheless still improves as the correlation increases, as long as the LF model is accurate. It is also found that, under the same total computational budget, the performance of the MFS-MLS model improves with the cost ratio. Moreover, it is suggested that 60–80% of the total budget be allocated to HF samples, and this percentage can be increased if the cost ratio is large.

It is worth noting that the MFS-MLS, like other MFS models, cannot be mathematically proven to be a universal approximator. Therefore, prior information about the engineering problem, such as its dimensionality and the total computational budget, should be considered before applying the MFS-MLS model. For high-dimensional problems, the influence domain tends to become so large that all the HF samples are included in it if quadratic monomials with cross terms are employed. This implies that more and more samples are needed as the dimension grows; otherwise, the proposed strategy turns the MFS-MLS model from local to global, which incurs a decline in performance. In the future, we will focus on strategies for identifying the influence domain so as to locally correct the surrogate with even fewer HF samples for high-dimensional problems.