1 Introduction

Design optimization and uncertainty quantification often require numerous expensive simulations to find an optimum design and to propagate uncertainties. Instead, a surrogate fit to dozens of simulations is often employed as a cheap alternative. However, for computationally expensive high-fidelity simulations, even evaluating sufficient samples for building a surrogate is often unaffordable. To address this challenge, multi-fidelity surrogates (MFS) combine inexpensive low-fidelity models with a small number of high-fidelity simulations. For example, MFS can be employed to predict the response of an expensive finite element model with a few runs of the model and many runs from a less accurate model with a coarse mesh. MFS have been applied extensively in the literature (Fernández-Godino et al. 2016; Gano et al. 2006).

The regression-based MFS framework has been used extensively in design optimization, for example, by combining two- and three-dimensional finite element models (Mason et al. 1998) or coarse and fine finite element models (Balabanov et al. 1998). More recently, Bayesian MFS frameworks have become popular. A Gaussian process (GP) based Bayesian MFS framework was introduced by Kennedy and O'Hagan (2000). Qian and Wu (2008) proposed using Markov chain Monte Carlo and a sample average approximation algorithm for hyperparameter estimation in the Bayesian MFS framework. Co-Kriging followed with better computational efficiency (Forrester 2007; Le Gratiet 2013). GP models provide flexibility, and their predictions are not limited to a specific form of the trend function. Thus, the Bayesian MFS can also be useful when there is no prior information, i.e., with a non-informative prior (Rasmussen 2006).

The Bayesian MFS framework can be expressed as

$$ {\widehat{y}}_H\left(\mathbf{x}\right)=\rho {\widehat{y}}_L\left(\mathbf{x}\right)+\widehat{\delta}\left(\mathbf{x}\right) $$
(1)

where \( {\widehat{y}}_H\left(\mathbf{x}\right) \) is the high-fidelity function prediction at x, \( {\widehat{y}}_L\left(\mathbf{x}\right) \) is the low-fidelity function prediction, \( \widehat{\delta}\left(\mathbf{x}\right) \) is the discrepancy function prediction, and ρ is a low-fidelity scale factor. This scale factor has rarely been used in the past. However, the combined use of a scale factor and a discrepancy function has been common in recently developed GP-based MFS frameworks (Fernández-Godino et al. 2016; Zhou et al. 2018).

MFS frameworks can handle noisy data by augmenting (1) with a noise model, in which the random noise follows a normal distribution with zero mean and a noise standard deviation that needs to be estimated. However, this paper focuses on MFS prediction with a few high-fidelity samples without random noise, because filtering noise based on a few high-fidelity samples is often not reliable (Matsumura et al. 2015).

It was found that the Bayesian framework and co-Kriging often gave significantly more accurate predictions than other MFS frameworks that use a discrepancy function only (Park et al. 2017). An interesting observation was the influence of the scale factor. With the scalar, the Bayesian framework gave much more accurate discrepancy predictions, and hence more accurate MFS predictions; without it, the predictions were mediocre. The objective of this paper is to discover why the use of the scalar made the Bayesian MFS significantly more accurate. Understanding the reason is likely to help extend the success to other, non-Bayesian MFS frameworks that may have advantages for some applications.

This paper is organized as follows. Section 2 discusses the importance of using the scalar for making MFS prediction. One-dimensional examples show how the scalar improves the accuracy of the discrepancy function to improve MFS prediction. It is discussed that large waviness and variation of the discrepancy function tend to increase errors. Bumpiness is introduced to combine them. Section 3 explains how the Bayesian framework characterizes a discrepancy function with variation and waviness. It finds the scalar by combining them through the likelihood function based on a Gaussian process model. Section 4 uses multi-dimensional examples to illustrate the correlation between bumpiness and error. These include the Borehole3 physical function and Hartmann6 algebraic function. Concluding remarks are presented in Section 5.

2 The importance of having the scale factor to reduce the bumpiness of a discrepancy

In MFS frameworks, it is usually assumed that the low-fidelity function is well approximated due to a relatively large number of samples. With the model expressed in (1), the low-fidelity prediction is based on a sufficient number of low-fidelity samples. Once the low-fidelity model is determined, the differences between the high-fidelity samples and the low-fidelity predictions are used to fit the discrepancy function. Since the discrepancy samples are based on the high-fidelity samples, the discrepancy function prediction depends on a small number of high-fidelity samples. Therefore, if the discrepancy function is wavy or has a high amplitude of oscillation, a small number of high-fidelity samples may lead to large errors.

Fortunately, the discrepancy function depends on the scalar ρ because it is defined as the difference between the high-fidelity prediction and the scaled low-fidelity prediction. Therefore, it is possible to manage the discrepancy function by changing the scalar. Based on our study of MFS frameworks, the Bayesian MFS framework was particularly effective with ρ. We found that this is because the Bayesian MFS determines the scalar so as to make the discrepancy simple; that is, it reduces the waviness and variation of the discrepancy, which tends to improve the prediction accuracy.
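As a minimal numerical illustration of this effect (the one-dimensional functions below are hypothetical stand-ins, not the examples of Fig. 1), the following sketch shows how the spread of the discrepancy samples yH − ρyL shrinks when ρ is chosen well:

```python
# A minimal sketch: how the scale factor rho changes the discrepancy samples.
# Both "fidelity" functions below are hypothetical illustrations.
import numpy as np

def y_high(x):
    return 2.5 * np.sin(4.0 * x) + 0.2 * x        # assumed high-fidelity response

def y_low(x):
    return np.sin(4.0 * x)                        # assumed low-fidelity response

x_hf = np.linspace(0.0, 1.0, 4)                   # a few high-fidelity sample locations
for rho in (1.0, 2.5):
    delta = y_high(x_hf) - rho * y_low(x_hf)      # discrepancy samples for this rho
    print(f"rho = {rho}: discrepancy std = {np.std(delta):.3f}")
# With rho = 2.5 the spread of the discrepancy samples drops by roughly an
# order of magnitude, so a few samples suffice to fit the discrepancy.
```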

In order to show the above-mentioned characteristic, Fig. 1 shows two analytical examples where the Bayesian MFS framework was applied with and without ρ. In Fig. 1(a) and (b), the red and blue curves are the true high- and low-fidelity functions, and the crosses and the hollow circles are the high- and low-fidelity samples, respectively. Figure 1(c) and (d) show the true discrepancy function (dashed curve) without ρ (or with ρ = 1) and the corresponding predictions (red solid curve) using the Bayesian framework. The figures also show the two-sigma prediction uncertainty. Due to a large variation of the true discrepancy function, the four samples were not enough to predict the discrepancy accurately. Also, the 2σ-confidence intervals (blue areas) failed to cover the true discrepancy function.

Fig. 1 Options to improve the accuracy by including ρ for MFS prediction (δ(x): true discrepancy function; \( \widehat{\delta} \)(x): discrepancy prediction; δ: discrepancy function data)

On the other hand, when the scale factor is present, the Bayesian framework found ρ = 2.5 for the first example, whose discrepancy is shown in Fig. 1(e). Note that the discrepancy samples in Fig. 1(c) and (e) are different because they are the differences between the high-fidelity samples and the scaled low-fidelity predictions. By choosing ρ = 2.5, the variation of the original discrepancy in Fig. 1(c) is drastically reduced, as shown in Fig. 1(e), and thus four samples were enough to predict it accurately. Due to the reduction of variation, the root-mean-square error (RMSE) in the discrepancy is reduced by more than a factor of 8. Although the true discrepancy function is still wavy, the bumpiness is significantly reduced by reducing the variation in this case.

For the second example, Fig. 1(f) shows that the Bayesian method found ρ = 2, which turns the wavy discrepancy function of Fig. 1(d) into a linear function, and the prediction becomes almost perfect. In this case, the Bayesian MFS framework reduces the waviness of the discrepancy while the variation remains large.

Note that the results also illustrate the flexibility of the GP-based Bayesian MFS framework. The trend functions of the GP models in the Bayesian MFS framework were set to a constant. Figure 1(f) shows that the GP-based discrepancy prediction is nearly linear, driven by the data, even though its trend function is constant. In addition to the two analytical examples, a cantilever beam example is included in Appendix B.

The concepts of variation and waviness can be combined into the concept of bumpiness. The notion of bumpiness, also referred to as roughness, was introduced for measuring function roughness (Duchon 1977; Gutmann 2001; Cressie 2015). Salem and Tomaso (2018) use bumpiness to select a surrogate and to determine surrogate weights for an ensemble. Bumpiness is the integral of the squared second derivative of the function, as

$$ B\left(f(x)\right)=\int {\left|{f}^{{\prime\prime} }(x)\right|}^2\mathrm{d}x $$
(2)
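For a one-dimensional function, (2) can be evaluated numerically with finite differences; a rough sketch (the test functions are arbitrary illustrations):

```python
# Crude numerical version of the bumpiness measure (2) on [a, b].
import numpy as np

def bumpiness(f, a=0.0, b=1.0, n=2001):
    x = np.linspace(a, b, n)
    d2 = np.gradient(np.gradient(f(x), x), x)   # finite-difference second derivative
    return np.sum(d2 ** 2) * (x[1] - x[0])      # rectangle-rule integral of |f''|^2

print(bumpiness(lambda x: np.sin(10 * x)))      # wavy function -> large bumpiness
print(bumpiness(lambda x: x ** 2))              # mild curvature -> small bumpiness (about 4)
```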

In the following section, we describe how the Bayesian MFS framework combines the effect of variation and waviness through the likelihood function. In the Bayesian method, finding a scalar value that maximizes the likelihood function is related to reducing variation and waviness, which also leads to the reduction of bumpiness in (2). However, maximizing the likelihood function does not mean exactly minimizing the bumpiness.

3 Bayesian MFS framework: Finding the scale factor that reduces bumpiness

The two examples in the previous section used the Bayesian framework to determine ρ, as shown in Fig. 1(e) and (f). The Bayesian framework finds ρ using the method of maximum likelihood estimation (MLE) that estimates ρ at the mode of the likelihood function. While the ρ obtained by MLE is not exactly the same as the ρ minimizing the bumpiness, for the examples we analyzed they were close, as will be seen in the next section. This section discusses why the Bayesian formulation tends to reduce the bumpiness in the discrepancy function using variation reduction and waviness reduction.

The likelihood function, which takes the form of a multivariate normal distribution, can be reformulated so that ρ is found as the minimizer of

$$ \underset{\rho }{\arg \min}\kern1em {\widehat{\sigma}}_{\Delta}^2\left(\rho \right){\left|{\mathbf{R}}_{\Delta}\left({\boldsymbol{\upomega}}_{\Delta}\right)\right|}^{1/{n}_H} $$
(3)

where \( {\widehat{\sigma}}_{\Delta}\left(\rho \right) \) and |RΔ(ωΔ)| are, respectively, the process standard deviation and the determinant of the correlation matrix obtained based on the discrepancy data \( {\mathbf{y}}_H-\rho {\mathbf{y}}_L^c \). \( {\widehat{\sigma}}_{\Delta}\left(\rho \right) \) represents the variation, while |RΔ(ωΔ)| represents the waviness of the discrepancy data. |RΔ(ωΔ)| can be interpreted as a waviness measure, which is a function of the waviness vector ωΔ. The detailed derivation of (3) from the likelihood function is given in Appendix A.

\( {\widehat{\sigma}}_{\Delta} \) and ωΔ are estimated from the discrepancy data for a given ρ using the auto-covariance, which was introduced to quantify the probabilistic similarity of two values in space (Ripley 1981). The auto-covariance can be used to generate random functions with specified variation and waviness parameters; conversely, it allows the variation and waviness of the true function to be estimated from data (Rasmussen 2004).

The Bayesian framework uses the Gaussian correlation function to model the auto-covariance of the uncertainties in discrepancy function predictions at different locations. Let Δ(x) and Δ(x′) be the discrepancy predictions at two data locations x and x′. The covariance between them is expressed using their distance as

$$ \operatorname{cov}\left(\Delta \left(\mathbf{x}\right),\Delta \left({\mathbf{x}}^{\prime}\right)\right)={\sigma}_{\Delta}^2\exp \left(-{\left(\mathbf{x}-{\mathbf{x}}^{\prime}\right)}^{\mathrm{T}}\operatorname{diag}\left({\boldsymbol{\upomega}}_{\Delta}\right)\left(\mathbf{x}-{\mathbf{x}}^{\prime}\right)\right) $$
(4)

where diag(ωΔ) is a diagonal matrix with the waviness vector ωΔ, which has the same dimension as x.
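As an illustration of (4), the covariance between two locations can be sketched as follows (the locations, σΔ², and ωΔ are arbitrary placeholders):

```python
# Gaussian auto-covariance of (4) between two locations x1 and x2.
import numpy as np

def gauss_cov(x1, x2, sigma2, omega):
    d = np.asarray(x1) - np.asarray(x2)
    return sigma2 * np.exp(-d @ np.diag(omega) @ d)

# Nearby points are strongly correlated; distant points are weakly correlated.
print(gauss_cov([0.1, 0.2], [0.15, 0.25], sigma2=1.0, omega=[5.0, 5.0]))
print(gauss_cov([0.1, 0.2], [0.90, 0.80], sigma2=1.0, omega=[5.0, 5.0]))
```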

Figure 2 shows two sets of samples from (a) a wavy function with small variation and (b) a less wavy function with large variation. The process standard deviation and waviness were estimated based on the data sets using (4). It is clear that a wavy function with small variation has a small σΔ and a large ωΔ. On the other hand, a less wavy function with large variation has a large σΔ and a small ωΔ.

Fig. 2 The variation and the waviness of a discrepancy function

The effect of the process standard deviation on (3) is obvious because the objective function decreases as the process standard deviation decreases. The influence of |RΔ| on (3) is a function of the waviness parameter and the discrepancy data locations. Since the data locations remain the same while finding ρ, they have no influence.

The correlation matrix RΔ is a symmetric square matrix whose size equals the number of discrepancy data points. The diagonal elements of RΔ are one, and the off-diagonal elements are obtained from the exponential part of the auto-covariance in (4). The off-diagonal elements measure the correlations between discrepancy values at two different data locations. The correlation matrix is expressed as

$$ {\mathbf{R}}_{\Delta}={\left[\begin{array}{ccc}1& \cdots & \exp \left(-{\left({\mathbf{x}}_{\delta }-{\mathbf{x}}_{\delta}^{\prime}\right)}^T\operatorname{diag}\left({\boldsymbol{\upomega}}_{\Delta}\right)\left({\mathbf{x}}_{\delta }-{\mathbf{x}}_{\delta}^{\prime}\right)\right)\\ {}& \ddots & \vdots \\ {} symm& & 1\end{array}\right]}_{\left({n}_H\times {n}_H\right)} $$
(5)

The properties of the determinant of a correlation matrix are well known (Reddon et al. 1985; Lophaven et al. 2002; Johnson and Wichern 2007). The determinant has a minimum value of zero when ωΔ = 0, which makes all the off-diagonal elements one. On the other hand, the determinant has a maximum value of one when ωΔ → ∞, which makes all the off-diagonal elements zero. As shown by Johnson and Wichern (2007), the determinant decreases monotonically with decreasing ωΔ. Since waviness is proportional to ωΔ, reducing the determinant is equivalent to reducing the waviness.
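The following sketch builds RΔ of (5) for a small set of arbitrary sample locations and illustrates this determinant behavior as ωΔ varies:

```python
# Correlation matrix R_Delta of (5) and its determinant for increasing waviness.
import numpy as np

def corr_matrix(X, omega):
    diff = X[:, None, :] - X[None, :, :]                     # pairwise differences
    return np.exp(-np.einsum('ijk,k,ijk->ij', diff, np.asarray(omega), diff))

X = np.random.default_rng(0).random((6, 2))                  # six arbitrary 2-D locations
for w in (1e-3, 1.0, 1e3):                                   # small to large waviness
    print(f"omega = {w:g}: det(R) = {np.linalg.det(corr_matrix(X, [w, w])):.3e}")
# The determinant grows from near zero (omega -> 0) toward one (omega -> infinity).
```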

In summary, the minimization of (3) is achieved by reducing the product of the terms representing variation and waviness. When there is no way to reduce the variation and waviness simultaneously using ρ, (3) trades off between them to minimize the objective function. Equation (3) drives bumpiness reduction through the reduction of variation and/or waviness. However, minimizing (3) is not theoretically equivalent to minimizing the bumpiness of the discrepancy function defined in (2).
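To make this mechanism concrete, the sketch below evaluates the objective of (3) on a grid of ρ values for hypothetical one-dimensional data. It uses the standard ordinary-kriging estimates of the constant trend and process variance and a crude grid search for the waviness; the paper's exact estimators are derived in its Appendix A, so this is only an approximation of the procedure:

```python
# Search for rho by minimizing the concentrated objective of (3),
# sigma_hat^2(rho) * |R(omega)|^(1/n), with a crude grid search over omega.
import numpy as np

def corr(x, omega):
    d = x[:, None] - x[None, :]
    return np.exp(-omega * d ** 2) + 1e-10 * np.eye(len(x))   # Gaussian correlation + jitter

def objective(rho, x, y_h, y_lc, omegas=np.logspace(-2, 2, 50)):
    d = y_h - rho * y_lc                                       # discrepancy data for this rho
    n, best = len(x), np.inf
    for omega in omegas:                                       # profile over the waviness
        R = corr(x, omega)
        Rinv = np.linalg.inv(R)
        one = np.ones(n)
        mu = one @ Rinv @ d / (one @ Rinv @ one)               # constant-trend estimate
        s2 = (d - mu) @ Rinv @ (d - mu) / n                    # process variance estimate
        _, logdet = np.linalg.slogdet(R)
        best = min(best, s2 * np.exp(logdet / n))              # sigma^2 * |R|^(1/n)
    return best

x = np.linspace(0.0, 1.0, 6)                                   # hypothetical sample locations
y_lc = np.sin(4.0 * x)                                         # low-fidelity values at the samples
y_h = 2.5 * np.sin(4.0 * x) + 0.2 * x                          # high-fidelity values at the samples
rhos = np.linspace(0.5, 3.0, 26)
best_rho = rhos[np.argmin([objective(r, x, y_h, y_lc) for r in rhos])]
print(best_rho)                                                # should land near 2.5 for this toy data
```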

The reader is referred to Section A.2 of Appendix for the detailed formulas to measure variation and waviness using the auto-covariance model of the Bayesian framework.

4 Multi-dimensional examples

In this section, the influence of ρ on increasing MFS prediction accuracy by reducing bumpiness will be presented through multi-dimensional examples: (a) physical borehole function and (b) numerical Hartmann 6 function.

Firstly, in this section, three different MFS frameworks are compared, along with two single-fidelity Kriging surrogates. Table 1 shows the framework descriptions with the corresponding abbreviations. “H” and “L” denote Kriging surrogates using only high- and low-fidelity samples, respectively. “B” is the Bayesian framework without ρ. “BR” is the Bayesian framework with ρ that is found by minimizing the bumpiness, while ρ in BR2 is found by minimizing error. The comparison between BR and B shows the effect of including ρ on the prediction. The comparison between BR and BR2 shows the effect of different criteria for finding ρ: reducing bumpiness versus minimizing error.

Table 1 Frameworks and the corresponding labels

Secondly, the influences of ρ on the bumpiness of the discrepancy function and the accuracy of MFS prediction were measured in the form of graphs by gradually changing ρ. However, since evaluating second-order derivatives of a multi-dimensional function in (2) is a computational challenge (Duchon 1977; Cressie 2015), one-dimensional bumpiness measures are used along Nline = 1000 randomly generated lines. Each line was generated by connecting two randomly generated points and extending the line to the boundary of the sampling domain. The average of the one-dimensional bumpiness measures is used as a representative bumpiness measure for a given value of ρ, as

$$ B\left(\rho \right)=\frac{1}{N_{line}}\sum \limits_{i=1}^{N_{line}}\int {\left|{\delta}^{{\prime\prime}}\left({s}_i,\rho \right)\right|}^2{ds}_i $$
(6)

where si is the parameter along the ith line, and δ″(si, ρ) is the second-order derivative of the discrepancy function along the line for a given ρ.

The graph of bumpiness with respect to ρ explains the influence of ρ on the bumpiness, but it does not explain whether the change in bumpiness is caused by the variation and/or the waviness. To measure their individual contributions, graphs of variation and waviness are also obtained in terms of ρ. The discrepancy variation is measured using the variance of the discrepancy along all lines as

$$ V\left(\rho \right)=\frac{1}{N_{line}}\sum \limits_{i=1}^{N_{line}}{\sigma}_i^2 $$
(7)
$$ {\mu}_i=\int \delta \left({s}_i,\rho \right)/{L}_i{ds}_i\kern0.72em \mathrm{and}\kern0.72em {\sigma}_i^2=\int {\left(\delta \left({s}_i,\rho \right)-{\mu}_i\right)}^2/{L}_i{ds}_i $$
(8)

where Li is the length of the ith line.

To quantify the effect of waviness, a normalized bumpiness is used. For example, the variation of δ1(x) = 2 sin(100x) is four times the variation of δ2(x) = sin(100x), while their waviness is the same. If these two functions are normalized, they have the same variation, so that only the waviness is measured. The waviness measure is defined as

$$ W\left(\rho \right)=\frac{1}{N_{line}}\sum \limits_{i=1}^{N_{line}}\int {\left|{\overline{\delta}}^{{\prime\prime}}\left({s}_i,\rho \right)\right|}^2{ds}_i $$
(9)

where \( {\overline{\delta}}^{{\prime\prime}}\left({s}_i,\rho \right)={\delta}^{{\prime\prime}}\left({s}_i,\rho \right)/{\sigma}_i \) is the normalized second-order derivative of the discrepancy function using the standard deviation.
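The line-based measures (6)–(9) can be sketched as follows; for simplicity the segment between the two random points is used rather than extending each line to the domain boundary, and delta_fn, the bounds, and the yH and yL in the usage comment are placeholders:

```python
# Line-based bumpiness B, variation V, and normalized waviness W of (6)-(9).
import numpy as np

def line_measures(delta_fn, lb, ub, rho, n_line=100, n_pts=201, seed=0):
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    B = V = W = 0.0
    for _ in range(n_line):
        p1, p2 = rng.uniform(lb, ub, (2, len(lb)))        # two random points in the domain
        t = np.linspace(0.0, 1.0, n_pts)
        pts = p1 + np.outer(t, p2 - p1)                   # points along the segment
        s = t * np.linalg.norm(p2 - p1)                   # arc-length parameter s_i
        d = np.array([delta_fn(p, rho) for p in pts])     # discrepancy along the line
        d2 = np.gradient(np.gradient(d, s), s)            # second derivative along the line
        ds = s[1] - s[0]
        var = np.var(d)                                   # (8): variance along the line
        B += np.sum(d2 ** 2) * ds                         # (6): bumpiness contribution
        V += var                                          # (7): variation contribution
        W += np.sum(d2 ** 2 / (var + 1e-300)) * ds        # (9): variance-normalized waviness
    return B / n_line, V / n_line, W / n_line

# Hypothetical usage, with yH and yL the true high- and low-fidelity functions:
#   B, V, W = line_measures(lambda x, r: yH(x) - r * yL(x), lb, ub, rho=1.25)
```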

The accuracy of the Bayesian framework is measured in terms of RMSE as a function of ρ. Since the MFS prediction depends on the samples, i.e., the design of experiments (DOE), 100 DOEs were randomly generated using the nearest neighbor sampling method (Forrester et al. 2007; Jin et al. 2005). Since the Bayesian framework is applied to each DOE to calculate the RMSE, this process yields 100 RMSEs. In the following examples, the median and the 25th and 75th percentiles of the RMSEs were obtained as a function of ρ.

4.1 Borehole function example: Reducing discrepancy variation

The empirical Borehole function calculates the water flow rate from an aquifer through a borehole. The function was obtained based on assumptions of steady-state flow from an upper aquifer to the borehole and from the borehole to the lower aquifer, no groundwater gradient, and laminar, isothermal flow through the borehole (Morris et al. 1993).

In this example, the borehole function is considered as the high-fidelity function and an approximate function is used as a low-fidelity function. The high-fidelity function is defined as

$$ {f}_H\left({R}_w,L,{K}_w\right)=\frac{2\pi {T}_u\left({H}_u-{H}_l\right)}{\ln \left(R/{R}_w\right)\left(1+\frac{2{LT}_u}{\ln \left(R/{R}_w\right){R}_w^2{K}_w}+\frac{T_u}{T_l}\right)} $$
(10)

The flow rate fH(Rw, L, Kw) is a function of three input variables, Rw, L, and Kw, which are the borehole radius, the borehole length, and the hydraulic conductivity of the borehole, respectively. The ranges of the input variables and other environmental parameters are presented in Table 2. The parameters were fixed at their nominal values based on Morris et al. (1993).

Table 2 Input variables and environmental parameters

A low-fidelity function of the borehole function is obtained from the literature (Xiong et al. 2013) as

$$ {f}_L\left({R}_w,L,{K}_w\right)=\frac{5{T}_u\left({H}_u-{H}_l\right)}{\ln \left(R/{R}_w\right)\left(1.5+\frac{2{LT}_u}{\ln \left(R/{R}_w\right){R}_w^2{K}_w}+\frac{T_u}{T_l}\right)} $$
(11)

Note that bounds of [0.5, 1.5] were used for ρ, and constant trend functions were used for the Bayesian framework.
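For reference, (10) and (11) transcribe directly into code. The fixed environmental parameters Tu, Hu, Hl, Tl, and R take the nominal values of Table 2, which is not reproduced here, so they are left as arguments rather than hard-coded:

```python
# Borehole high- and low-fidelity functions, (10) and (11).
import numpy as np

def borehole_hf(R_w, L, K_w, T_u, H_u, H_l, T_l, R):
    """High-fidelity borehole flow rate, (10)."""
    log_ratio = np.log(R / R_w)
    return (2.0 * np.pi * T_u * (H_u - H_l)) / (
        log_ratio * (1.0 + 2.0 * L * T_u / (log_ratio * R_w ** 2 * K_w) + T_u / T_l))

def borehole_lf(R_w, L, K_w, T_u, H_u, H_l, T_l, R):
    """Low-fidelity approximation, (11)."""
    log_ratio = np.log(R / R_w)
    return (5.0 * T_u * (H_u - H_l)) / (
        log_ratio * (1.5 + 2.0 * L * T_u / (log_ratio * R_w ** 2 * K_w) + T_u / T_l))
```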

Since MFS are built with low- and high-fidelity samples, there are many different combinations of low- and high-fidelity samples possible for the same total computational budget. MFS performances for different combinations were measured, and then, the one that shows the highest accuracy was selected and analyzed further.

All the frameworks in Table 1 were applied for different sample combinations with the same total budget. Table 3 shows sample size ratios for a total budget of 10H, which means the computational budget for evaluating ten high-fidelity samples. The sample cost ratio of 30 means that the cost of evaluating 30 low-fidelity samples is equivalent to that of evaluating a single high-fidelity sample. With the total budget of 10H, we can use 10 high-fidelity samples, 300 low-fidelity samples, or any of the combinations shown in Table 3. These combinations are expressed with the numbers of high- (nH) and low- (nL) fidelity samples, such as 7/90.

Table 3 Cases of sample size combinations for a total computational budget of evaluating 10 high-fidelity samples (10H) and ratio of 30 between the cost of high-fidelity and low-fidelity simulation (Borehole3 function)

For each sample size ratio, 100 DOEs were randomly generated using the nearest neighbor sampling method (Forrester et al. 2007), and the statistics of their RMSEs are used to evaluate the performance of each MFS framework. Note that the RMSE of each DOE was calculated based on 100,000 test points in the sampling domain. For each sample size ratio and MFS framework, the median and the 25th and 75th percentiles of the RMSEs were obtained.

Figure 3 shows the median RMSEs of all five frameworks for different sample size ratios. Since the low-fidelity Kriging surrogate used 300 samples, its prediction error against the true low-fidelity function is small, but its RMSE against the true high-fidelity function is high. On the other hand, the error in the high-fidelity Kriging surrogate comes from the prediction error because 10 high-fidelity samples are too few. In general, the RMSEs of the MFS frameworks were significantly lower than those of the single-fidelity surrogates.

Fig. 3 The median (of 100 DOEs) accuracy for different sample size ratios (Note that the BR2 curve is overlapped with the BR curve) (Borehole example)

The difference between these MFS frameworks stems from the contributions of ρ and the criteria for finding ρ. BR and BR2 significantly outperformed B, which indicates that the inclusion of ρ is a key factor for the accuracy in this example. In addition, the different criteria for finding ρ (BR versus BR2) did not lead to a significant difference in the results. In the case of the sample size ratio of 7/90 (at which BR and BR2 were most accurate), the ρ estimated from 100 DOEs has a mean of 1.25 and a standard deviation of 9.3 × 10−7; that is, the effect of different DOEs is negligible. In this example, the directions of finding ρ for reducing bumpiness (BR) and maximizing agreement (BR2) are consistent, which is not always true. The reason will be discussed in a later section.

Figure 4(a), (b) and (c) show the bumpiness, variation, and waviness of the true discrepancy function with respect to ρ. In Fig. 4(d), the MFS accuracy was calculated based on 100 DOEs with the 7/90 ratio, at which the prediction accuracy of BR is closest to the minimum median RMSE of BR shown in Fig. 3. From Fig. 4(a), ρ = 1.25 minimizes the bumpiness, which is consistent with the mean of the ρ estimated by BR; that is, BR found the ρ that minimizes the bumpiness. The bumpiness behavior is related to the variation and waviness behavior. Figure 4(b) and (c) show that ρ affects the variation but not the waviness of the discrepancy function, so all the changes in the bumpiness are due to the variation. Figure 4(d) shows that the MFS accuracy, as well as its uncertainty, is best when the bumpiness is at its minimum. In summary, BR found ρ = 1.25 by minimizing the bumpiness, or equivalently, by minimizing the variation. This behavior is related to the fact that both the high- and low-fidelity functions are convex.

Fig. 4 The bumpiness, variation and waviness graphs and the RMSE from BR in terms of ρ for the borehole example

Figure 5 illustrates a prediction comparison between BR and B along the line connecting x1 = {0.2, 1120, 1500} and x2 = {0, 1680, 15,000} for a chosen DOE. Note that the characteristics along other lines are similar. Since the low-fidelity prediction was very accurate, the prediction accuracy was determined by the error in the discrepancy function. As the result shows, without ρ (or with ρ = 1), seven high-fidelity samples could not capture the curvature of the discrepancy in Fig. 5(b). Including ρ reduced the variation of the discrepancy function, and hence the bumpiness, as shown in Fig. 5(a), and it increased the MFS prediction accuracy significantly. Note that the magnitudes of the discrepancy functions in Fig. 5(a) and (b) differ by a factor of about 100. Since the variation of the discrepancy function was reduced so much, the errors in fitting the discrepancy function were also reduced by two orders of magnitude. The comparison between Fig. 5(c) and (e) shows that the high-fidelity response has a similar trend to the low-fidelity response. Therefore, magnifying the low-fidelity function reduces the variation of the discrepancy function.

Fig. 5 Comparisons between the predictions based on the BR and the B on the line between x1 = {0.2, 1120, 1500} and x2 = {0, 1680, 15,000} and RMSEs along the line (Borehole example)

It is recalled that the performances of BR and BR2 were almost identical. This is because, for this example, the reduction in variation of the discrepancy function is achieved by scaling down its magnitude. However, this is not always the case, as the Hartmann 6 example will show.

4.2 Hartmann 6 function example

The Hartmann 6 function example also shows that the Bayesian framework with ρ increases the prediction accuracy of MFS. However, in contrast to the borehole example, BR2 did not give as good a prediction as BR, which confirms that reducing bumpiness was more effective than minimizing error. As the high-fidelity function, the six-dimensional Hartmann 6 function is defined as

$$ {f}_H\left(\mathbf{x}\right)=-\frac{1}{1.94}\left(2.58+\sum \limits_{i=1}^4{\alpha}_i\exp \left(-\sum \limits_{j=1}^6{A}_{ij}{\left({x}_j-{P}_{ij}\right)}^2\right)\right) $$
(12)

where the domain of input variables is defined as {0.1, …, 0.1} ≤ x ≤ {1, …, 1}. For model parameters, α = {1 1.2 3 3.2}T, and the following two constant matrices are used:

\( \mathbf{A}=\left(\begin{array}{cccccc}10& 3& 17& 3.5& 1.7& 8\\ {}0.05& 10& 17& 0.1& 8& 14\\ {}3& 3.5& 1.7& 10& 17& 8\\ {}17& 8& 0.05& 10& 0.1& 14\end{array}\right) \) and \( \mathbf{P}={10}^{-4}\left(\begin{array}{cccccc}1312& 1696& 5569& 124& 8283& 5886\\ {}2329& 4135& 8307& 3736& 1004& 9991\\ {}2348& 1451& 3522& 2883& 3047& 6650\\ {}4047& 8828& 8732& 5743& 1091& 381\end{array}\right) \).

An approximate Hartmann 6 function was constructed for use as the low-fidelity function, defined as

$$ {f}_L\left(\mathbf{x}\right)=-\frac{1}{1.94}\left(2.58+\sum \limits_{i=1}^4{\alpha}_i^{\prime }{f}_{\mathrm{exp}}\left(-\sum \limits_{j=1}^6{A}_{ij}{\left({x}_j-{P}_{ij}\right)}^2\right)\right) $$
(13)

where \( {\boldsymbol{\upalpha}}^{\prime }={\left\{0.5\kern0.5em 0.5\kern0.5em 2.0\kern0.5em 4.0\right\}}^T \) and fexp(x) is the approximated exponential function defined as

$$ {f}_{\mathrm{exp}}(x)={\left(\exp \left(\frac{-4}{9}\right)+\exp \left(\frac{-4}{9}\right)\frac{\left(x+4\right)}{9}\right)}^9 $$
(14)

Note that the total function variation of the Hartmann 6 function is 0.33 and the RMSE of the low-fidelity function is 0.11.
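For reference, (12)–(14) transcribe directly into code using the constants given above; the evaluation point in the last line is an arbitrary point inside the domain:

```python
# Hartmann 6 high- and low-fidelity functions, (12)-(14).
import numpy as np

ALPHA_H = np.array([1.0, 1.2, 3.0, 3.2])
ALPHA_L = np.array([0.5, 0.5, 2.0, 4.0])
A = np.array([[10.0, 3.0, 17.0, 3.5, 1.7, 8.0],
              [0.05, 10.0, 17.0, 0.1, 8.0, 14.0],
              [3.0, 3.5, 1.7, 10.0, 17.0, 8.0],
              [17.0, 8.0, 0.05, 10.0, 0.1, 14.0]])
P = 1e-4 * np.array([[1312, 1696, 5569, 124, 8283, 5886],
                     [2329, 4135, 8307, 3736, 1004, 9991],
                     [2348, 1451, 3522, 2883, 3047, 6650],
                     [4047, 8828, 8732, 5743, 1091, 381]])

def f_exp(x):
    """Approximated exponential function of (14)."""
    return (np.exp(-4.0 / 9.0) + np.exp(-4.0 / 9.0) * (x + 4.0) / 9.0) ** 9

def hartmann6_hf(x):
    """High-fidelity Hartmann 6 function, (12)."""
    inner = -np.sum(A * (np.asarray(x) - P) ** 2, axis=1)
    return -(2.58 + np.sum(ALPHA_H * np.exp(inner))) / 1.94

def hartmann6_lf(x):
    """Low-fidelity approximation, (13), using the approximated exponential."""
    inner = -np.sum(A * (np.asarray(x) - P) ** 2, axis=1)
    return -(2.58 + np.sum(ALPHA_L * f_exp(inner))) / 1.94

print(hartmann6_hf(np.full(6, 0.5)), hartmann6_lf(np.full(6, 0.5)))
```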

In this example, the total computational budget is the cost of evaluating 56 high-fidelity samples (56H), and the sample cost ratio between high- and low-fidelity functions is 30. Table 4 shows the considered sample size ratios. The notation and the DOE repetitions are the same as in the previous example.

Table 4 Cases of sample size combinations for a total computational budget of evaluating 56 high-fidelity samples (56H) and sample cost ratio of 30 (Hartmann6 example)

Figure 6 shows the median RMSEs of all the frameworks for different sample size ratios. In this example, BR outperformed both B and BR2, which shows that not only the inclusion of ρ but also the way the bumpiness is reduced is important for prediction accuracy. Finding ρ by reducing bumpiness yielded a much more accurate prediction than finding it by minimizing error.

Fig. 6 The median (of 100 DOEs) RMSEs versus sample size ratio (Hartmann6 example)

Figure 7(a) and (b) show the histograms of ρ estimated by BR and BR2 for the sample size ratio of 42/420, at which BR is the most accurate. The histograms clearly show that the two frameworks estimated significantly different values of ρ: the mode of the histogram was 1.49 for BR and 1.03 for BR2. Since there is no difference between BR and BR2 in making low-fidelity predictions, the difference between the two frameworks is caused by how ρ is found.

Fig. 7 Histogram of ρ estimates (Hartmann6 example)

Figure 8 shows the bumpiness, variation, and waviness of the discrepancy with respect to ρ, as well as the prediction accuracy, for the sample size ratio of 42/420. Figure 8(a) shows the bumpiness graph of the true discrepancy function, where the minimum bumpiness occurred at ρ = 1.41. The mode of ρ from BR (1.49) is close to the ρ at the minimum bumpiness, which indicates that BR found a ρ that reduces the bumpiness, as discussed in Section 3. The contributions of the variation and the waviness are shown in Fig. 8(b) and (c): the bumpiness is strongly correlated with the variation, while the waviness shows the opposite behavior, but its contribution is overwhelmed by that of the variation. Figure 8(d) shows the RMSE of BR for varying ρ. Note that the corresponding RMSE graph of BR2 is identical to that of BR; that is, BR and BR2 gave identical predictions for the same ρ. The difference between BR and BR2 shown in Fig. 6 arose because they used different ρ estimates. The RMSE is closely correlated with the bumpiness, and the maximum accuracy occurred at ρ = 1.55. The ρ at the minimum variation (1.55 from Fig. 8(b)) is consistent with the ρ at the minimum RMSE (1.55 from Fig. 8(d)).

Fig. 8 The bumpiness, variation and waviness graphs and the RMSE from BR in terms of ρ for the Hartmann6 example

The result indicates that bumpiness reduction was effective in reducing the prediction error. However, minimizing bumpiness is not equivalent to maximizing accuracy. One explanation is that the bumpiness of the true discrepancy function is being compared with the accuracy of predictions based on samples. Since infinitely many true functions pass through a given set of samples, the bumpiness of the true function cannot be perfectly correlated with the error.

In order to visualize the effect of the different variation reductions of BR and BR2, a DOE was chosen for the sample size ratio of 42/420. Figure 9 shows predictions along the line in the sampling domain with the maximum difference between BR and BR2. The low-fidelity prediction is reasonably accurate, with an RMSE of 0.048 for both frameworks (Fig. 9(e) and (f)). Therefore, the error in the MFS mostly comes from the error in the discrepancy function. This is because 42 high-fidelity samples are not sufficient to capture the bumpy behavior of the discrepancy in six dimensions. BR determined ρ = 1.48 by minimizing the bumpiness, while BR2 found ρ = 1.04 by minimizing the error.

Fig. 9 Comparisons between the BR and the BR2 on the hyper-line between {0.35, 0.32, 0.63, 0.14, 0.88, 0.10} and {0.30, 0.39, 0.31, 0.36, 0.10, 0.80} and RMSEs along the line (Hartmann6 example)

As in the borehole example, Fig. 9(a) and (b) show that the fit along the line is poor for both BR and BR2. However, the improvement by BR is not accomplished by reducing the magnitude of the discrepancy but by reducing its variation. The RMSEs along the line are much higher than the median RMSEs over the whole sampling domain shown in Fig. 6 because the line is a tiny part of the domain. However, in terms of RMSE reduction, they are consistent: a 23% reduction for the line and a 20% reduction for the whole domain.

5 Concluding remarks

This paper discussed how the Bayesian MFS framework uses the low-fidelity scale factor to reduce the variation and waviness of the discrepancy function through the likelihood function of the Gaussian process model. Reducing variation and waviness leads to a reduction of bumpiness, which combines the two without requiring a Gaussian process model. The importance of including the low-fidelity scale factor is that it allows the bumpiness of the discrepancy function to be reduced, which tends to reduce the error, as the examples show. For the examples studied, the success of the Bayesian framework was largely based on the use of the scale factor; without the scalar, the Bayesian method gave mediocre predictions. The three-dimensional borehole and the six-dimensional Hartmann6 examples demonstrated that the accuracy of the MFS predictions was strongly correlated with the bumpiness of the discrepancy function. For the Borehole3 example, the minimum RMSE was achieved with the scalar that minimizes the bumpiness. For the Hartmann6 example, the scalar minimizing the RMSE was not identical to the scalar minimizing the bumpiness, but they were very close, and the behaviors of RMSE and bumpiness were strongly correlated. However, a perfect correlation cannot be expected between the bumpiness of the true function and the prediction error because infinitely many true functions pass through a finite number of samples.

The Bayesian framework characterizes a discrepancy function with two factors, variation and waviness, through the Gaussian process model, and the maximum likelihood method combines variation and waviness reduction. Bumpiness is another way to combine them without using a Gaussian process model. For the Borehole3 and Hartmann6 examples, variation reduction dominated the bumpiness reduction. This can be interpreted as the low-fidelity function capturing the trend of the high-fidelity function but not its high-frequency behavior. Waviness reduction, in contrast, requires the scaled low-fidelity model to capture the high-frequency behavior of the high-fidelity model without necessarily capturing the low-frequency behavior. We suspect that such a case may be rare, so reducing variation would be more common. The lessons learned from the Bayesian framework can be utilized for other MFS predictions.