1 Introduction

Bayesian optimization (BO) is a metamodel-based global optimization approach, where the search process is assisted by constructing and updating a metamodel iteratively, and the sequential sampling is guided by an acquisition function to incorporate uncertainty (Ghoreishi and Allaire 2019; Tran et al. 2019b). The construction of metamodels helps improve the search efficiency, while the sequential sampling guided by the acquisition function reduces the overall number of samples. The sequential sampling strategy is particularly helpful when high-cost simulations or physical experiments are involved. Different definitions of acquisition functions have been developed to balance exploration and exploitation, such as expected improvement (EI), probability of improvement, and lower confidence bound. BO with the EI acquisition function is also called efficient global optimization (EGO) by some researchers.

Similar to other metamodel-based global optimization methods (Queipo et al. 2005; Wang and Shan 2007), the computational challenge for BO to solve large-scale problems still exists, because the number of samples needed to cover the search space grows exponentially as the dimension of the space increases. Multi-fidelity (MF) surrogate modeling is one approach to reduce the cost by combining sample points predicted by high-fidelity (HF) and low-fidelity (LF) models to construct the surrogates, since running LF models is less costly (Huang et al. 2006; Jones 2001; Shu et al. 2019a; Xiong et al. 2008; Zhou et al. 2017). The existing MF metamodels can be categorized into three types. The first type is scaling function-based MF metamodeling, which tunes the LF model according to the HF model responses (Chang et al. 1993; Zhou et al. 2015). The second type is space-mapping MF metamodels, in which a transformation operator is applied to map the LF design space to the HF space so that the optimal sample point in the HF space can be estimated (Bakr et al. 2001; Bandler et al. 1994; Koziel et al. 2006). The third type is MF Kriging models, such as the co-Kriging model (Kennedy and O'Hagan 2000) and the hierarchical Kriging model (Han and Görtz 2012). Co-Kriging models are constructed with the information of covariance between the LF and HF samples. However, they rely on nested HF sample points, which limits their applications. In hierarchical Kriging models, the LF Kriging model is directly used as the trend of the MF metamodel, without the requirement of nested sample points (Han and Görtz 2012; Zhang et al. 2018). Hierarchical Kriging therefore allows designers to choose sample points more freely in the optimization process. Because of this flexibility in sampling, hierarchical Kriging has received more attention in engineering design optimization (Courrier et al. 2016; Palar and Shimoyama 2017; Zhang et al. 2015). MF metamodels for multi-objective optimization (Shu et al. 2019b; Zhou et al. 2016), MF metamodels incorporating gradient information (Song et al. 2017; Ulaganathan et al. 2015), and adaptive hybrid scaling methods (Gano et al. 2005) have also been developed.

Acquisition-guided sequential sampling has been applied in MF metamodel-based design optimization. For instance, Xiong et al. (2008) applied the lower confidence bound in sequential sampling to construct MF metamodels. Kim et al. (2017) used the EI acquisition function for the hierarchical Kriging model. However, these methods only use high-fidelity simulation data to update the MF metamodels. The acquisition functions were applied only to determine the locations of new samples, while the different costs associated with LF and HF sampling were not considered. To address this issue, Huang et al. (2006) developed an augmented EI acquisition function for co-Kriging, in which EI is augmented by the correlation of predictions between different fidelity models and a ratio of sampling costs so that both the sample location and the fidelity level can be determined by maximizing the acquisition. Liu et al. (2018) improved the augmented EI criterion by considering the sample cluster issue to reduce the computational cost of co-Kriging. Ghoreishi et al. (2018) proposed to identify the next best fidelity information source and the best location in the input space via a value-gradient policy. They later extended this work to more information sources with different fidelity levels and explicitly accounted for the computational cost associated with individual sources (Ghoreishi et al. 2019). Zhang et al. (2018) proposed a multi-fidelity global optimization approach based on the hierarchical Kriging model, in which the EI function is extended to a multi-fidelity EI (MFEI) with different uncertainty levels corresponding to LF and HF samples. Tran et al. (2020) proposed to combine the overall posterior variance reduction and the computational cost ratio to select the fidelity level.

In this paper, a new MF Bayesian optimization (MFBO) approach with the hierarchical Kriging model is developed. An MF acquisition function based on a new concept of expected further improvement is proposed, which enables the simultaneous selection of both the location and the fidelity level of the next sample. The different costs of HF and LF samples as well as the extra information of HF samples are considered altogether. A constrained MF acquisition function for unknown constraints is also introduced. The proposed MFBO approach is compared with the standard EGO method and the MFEI method (Zhang et al. 2018) using five numerical examples and one engineering case.

The remainder of this paper is organized as follows. In the “Background” section, the hierarchical Kriging model and the standard EGO method are reviewed. In the “The proposed MFBO approach” section, the proposed MFBO approach and the new acquisition function are described in detail. Five numerical examples and one engineering case study with comparisons of results are presented in the “Examples and results” section, followed by concluding remarks in the “Concluding remarks” section.

2 Background

2.1 Hierarchical Kriging

Hierarchical Kriging is an MF metamodeling method, in which the LF Kriging model is taken to predict the overall trend whereas the HF samples are used to correct the LF model. The metamodel can be expressed as

$$ Y\left(\boldsymbol{x}\right)={\beta}_0{\hat{y}}_l\left(\boldsymbol{x}\right)+Z\left(\boldsymbol{x}\right) $$
(1)

where \( {\hat{y}}_l\left(\boldsymbol{x}\right) \) is the predicted mean of the LF Kriging model, which is constructed based on LF sample points, β0 is a scaling factor, and Z(x) is a stationary random process with zero mean and a covariance of

$$ \mathrm{Cov}\left[Z\left(\boldsymbol{x}\right),Z\left({\boldsymbol{x}}'\right)\right]={\sigma}^2R\left(\boldsymbol{x},{\boldsymbol{x}}'\right) $$
(2)

where σ2 is the process variance. R(x, x') is the spatial correlation function which only depends on the distance between two design sites, x and x'. Given HF sample points Xh = {xh, 1, xh, 2, …, xh, n} and their responses fh(Xh) = {f(xh, 1), f(xh, 2), …, f(xh, n)}, the predicted mean and variance of the hierarchical Kriging model at an unobserved point can be calculated as

$$ \hat{y}\left(\boldsymbol{x}\right)={\beta}_0{\hat{y}}_l\left(\boldsymbol{x}\right)+{\boldsymbol{r}}^{\mathrm{T}}\left(\boldsymbol{x}\right){\boldsymbol{R}}^{-1}\left[{\boldsymbol{f}}_h\left({\boldsymbol{X}}_h\right)-{\beta}_0\mathbf{F}\right] $$
(3)

and

$$ \mathrm{MSE}\left\{\hat{y}\left(\boldsymbol{x}\right)\right\}={\sigma}^2\left\{1-{\boldsymbol{r}}^{\mathrm{T}}\left(\boldsymbol{x}\right){\boldsymbol{R}}^{-1}\boldsymbol{r}\left(\boldsymbol{x}\right)+\frac{{\left[{\boldsymbol{r}}^{\mathrm{T}}\left(\boldsymbol{x}\right){\boldsymbol{R}}^{-1}\mathbf{F}-{\hat{y}}_l\left(\boldsymbol{x}\right)\right]}^2}{{\mathbf{F}}^{\mathrm{T}}{\boldsymbol{R}}^{-1}\mathbf{F}}\right\} $$
(4)

respectively, where r(x) is the correlation vector with elements ri(x) = R(x, xi), xi ∈ Xh. R is the correlation matrix with elements R(i, j) = R(xi, xj), xi, xj ∈ Xh. F is the vector of predictions by the LF Kriging model at the locations of HF samples. The Gaussian correlation function

$$ R\left(\boldsymbol{x},{\boldsymbol{x}}'\right)=\prod \limits_{k=1}^m{R}_k\left({\theta}_k,{x}_k-{x}_k^{\prime}\right)=\prod \limits_{k=1}^m\exp \left(-{\theta}_k{\left|{x}_k-{x}_k^{\prime}\right|}^2\right) $$
(5)

is used in this paper. The hyper-parameters of hierarchical Kriging can be trained by maximizing the likelihood function:

$$ L\left({\beta}_0,{\sigma}^2,\boldsymbol{\theta}\right)=\frac{1}{\sqrt{{\left(2\pi {\sigma}^2\right)}^n\left|\mathbf{R}\right|}}\exp \left[-\frac{{\left({\boldsymbol{f}}_h\left({\boldsymbol{X}}_h\right)-\mathbf{F}{\beta}_0\right)}^{\mathrm{T}}{\mathbf{R}}^{-1}\left({\boldsymbol{f}}_h\left({\boldsymbol{X}}_h\right)-\mathbf{F}{\beta}_0\right)}{2{\sigma}^2}\right] $$
(6)

More details of hierarchical Kriging can be found in Han and Görtz (2012).
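As a minimal illustration (not the authors' implementation), the sketch below evaluates the hierarchical Kriging predictor of (3)–(5) in Python. It assumes the LF Kriging mean is available as a callable `yl_hat` and the hyper-parameters `beta0`, `sigma2`, and `theta` have already been trained, e.g., by maximizing (6); all names and interfaces are illustrative.

```python
import numpy as np

def gaussian_corr(X1, X2, theta):
    """Gaussian correlation of (5): prod_k exp(-theta_k * (x_k - x'_k)^2)."""
    diff2 = (X1[:, None, :] - X2[None, :, :]) ** 2   # pairwise squared differences
    return np.exp(-(diff2 * theta).sum(axis=2))      # (n1, n2) correlation matrix

def hk_predict(x, Xh, fh, yl_hat, beta0, sigma2, theta):
    """Predicted mean (3) and MSE (4) of hierarchical Kriging at a single point x."""
    R = gaussian_corr(Xh, Xh, theta)                  # correlation matrix of HF sites
    r = gaussian_corr(x[None, :], Xh, theta).ravel()  # correlation vector r(x)
    F = yl_hat(Xh)                                    # LF predictions at HF sites
    mean = beta0 * yl_hat(x[None, :])[0] + r @ np.linalg.solve(R, fh - beta0 * F)
    Rinv_r = np.linalg.solve(R, r)
    Rinv_F = np.linalg.solve(R, F)
    mse = sigma2 * (1.0 - r @ Rinv_r
                    + (r @ Rinv_F - yl_hat(x[None, :])[0]) ** 2 / (F @ Rinv_F))
    return mean, mse
```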

2.2 Efficient global optimization approach

EGO is a Bayesian optimization method that uses the EI acquisition function. The standard EGO method was originally proposed by Jones et al. (1998) for expensive black-box problems. For Kriging and hierarchical Kriging models, the prediction at an unsampled point x can be regarded as a random variable that follows a normal distribution \( Y\left(\boldsymbol{x}\right)\sim N\left(\hat{y}\left(\boldsymbol{x}\right),{\sigma}^2\left(\boldsymbol{x}\right)\right) \), where \( \hat{y}\left(\boldsymbol{x}\right) \) and σ2(x) are the predicted mean and variance. The improvement at x for a minimization problem is

$$ I\left(\boldsymbol{x}\right)=\max \left({f}_{\mathrm{min}}-Y\left(\boldsymbol{x}\right),0\right) $$
(7)

where fmin is the best solution in the current sample set. The expected improvement is

$$ EI\left(\boldsymbol{x}\right)=E\left[\max \left({f}_{\mathrm{min}}-Y\left(\boldsymbol{x}\right),0\right)\right] $$
(8)

By expressing the right-hand side of (8) as an integral, one can obtain the EI in the closed form as

$$ EI\left(\boldsymbol{x}\right)=\left({f}_{\mathrm{min}}-\hat{y}\left(\boldsymbol{x}\right)\right)\Phi \left(\frac{f_{\mathrm{min}}-\hat{y}\left(\boldsymbol{x}\right)}{\sigma \left(\boldsymbol{x}\right)}\right)+\sigma \left(\boldsymbol{x}\right)\phi \left(\frac{f_{\mathrm{min}}-\hat{y}\left(\boldsymbol{x}\right)}{\sigma \left(\boldsymbol{x}\right)}\right) $$
(9)

where ϕ(•) and Φ(•) are the probability density function and cumulative distribution function of the standard normal distribution, respectively. In the EGO method, the next sample point is obtained by maximizing the EI function in (9) and is then used to update the metamodel. The iteration continues until the algorithm converges. More details of EGO can be found in Jones et al. (1998).
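For illustration, the closed-form EI in (9) can be evaluated with a few lines of Python; `y_hat` and `s` denote the predicted mean and standard deviation at a point, and the function is a sketch rather than a reference implementation.

```python
from scipy.stats import norm

def expected_improvement(y_hat, s, f_min):
    """Closed-form EI of (9) for minimization; zero when the variance vanishes."""
    if s <= 0.0:
        return 0.0                      # no predictive uncertainty at sampled points
    z = (f_min - y_hat) / s
    return (f_min - y_hat) * norm.cdf(z) + s * norm.pdf(z)
```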

3 The proposed MFBO approach

The standard EGO method provides a way to select a new sample point in single-fidelity optimization. In MF optimization, however, a further decision must be made as to whether the next sample is taken at the HF or the LF level to update the MF metamodel. In the proposed MFBO, a new acquisition function is developed to support a sequential sampling strategy that selects sample points of different fidelity levels adaptively.

3.1 Acquisition function based on the expected further improvement

In general, HF sample points are more expensive to obtain but provide more precise information, whereas LF sample points are less costly but less reliable. In MF optimization, sample points at both the HF and LF levels are chosen to update the MF metamodel. Here, a new acquisition function is defined so that the choice of fidelity level incorporates both the cost and the benefit of LF and HF samples.

With the sequential sampling, the maximum EI gradually decreases as the BO algorithm converges to the optimal solution. If a sample point x* is selected to update the metamodel, the EI at that location will decrease from EI(x*) calculated from (9) to zero. Therefore, the reduction of the EI value can equivalently be used to guide the sequential sampling. In the proposed MFBO, the reduction of the EI value incorporates the different effects of LF and HF samples. If an HF sample is chosen at x*, the HF effect on reducing the EI is

$$ \Delta {EI}_h\left({\boldsymbol{x}}^{\ast}\right)= EI\left({\boldsymbol{x}}^{\ast}\right)-0= EI\left({\boldsymbol{x}}^{\ast}\right) $$
(10)

where EI(·) is the EI function of the hierarchical Kriging model based on the existing HF and LF samples. If an LF sample \( {\boldsymbol{x}}_l^{\ast } \) is chosen at this location instead, the LF effect on reducing the EI, named the further improvement given that \( {\boldsymbol{x}}_l^{\ast } \) is taken, is calculated as

$$ \Delta {EI}_l\left({\boldsymbol{x}}^{\ast}\left|{Y}_l\left({\boldsymbol{x}}_l^{\ast}\right)\right.\right)= EI\left({\boldsymbol{x}}^{\ast}\right)- EI\left({\boldsymbol{x}}^{\ast}\left|{Y}_l\left({\boldsymbol{x}}_l^{\ast}\right)\right.\right) $$
(11)

where Yl(·) is the LF metamodel with predicted mean \( {\hat{y}}_l\left(\cdotp \right) \) and variance σl2(·), and \( EI\left({\boldsymbol{x}}^{\ast}\left|{Y}_l\left({\boldsymbol{x}}_l^{\ast}\right)\right.\right) \) represents the EI at x* evaluated as if an LF sample \( {\boldsymbol{x}}_l^{\ast } \) had been taken at the same location with the predicted LF response \( {Y}_l\left({\boldsymbol{x}}_l^{\ast}\right) \). Note that \( {Y}_l\left({\boldsymbol{x}}_l^{\ast}\right) \) is a random variable which follows a Gaussian distribution. Therefore, both the conditional expected value \( EI\left({\boldsymbol{x}}^{\ast}\left|{Y}_l\left({\boldsymbol{x}}_l^{\ast}\right)\right.\right) \) and the conditional further improvement in (11) vary with \( {Y}_l\left({\boldsymbol{x}}_l^{\ast}\right) \). The overall expected value of \( EI\left({\boldsymbol{x}}^{\ast}\left|{Y}_l\left({\boldsymbol{x}}_l^{\ast}\right)\right.\right) \) is obtained as

$$ \begin{aligned} E\left[ EI\left({\boldsymbol{x}}^{\ast}\left|{Y}_l\left({\boldsymbol{x}}_l^{\ast}\right)\right.\right)\right] &= \int P\left[{Y}_l\left({\boldsymbol{x}}_l^{\ast}\right)\right] EI\left({\boldsymbol{x}}^{\ast}\left|{Y}_l\left({\boldsymbol{x}}_l^{\ast}\right)\right.\right)\mathrm{d}{Y}_l\left({\boldsymbol{x}}_l^{\ast}\right) \\ &= \int \phi \left(\frac{Y_l\left({\boldsymbol{x}}_l^{\ast}\right)-{\hat{y}}_l\left({\boldsymbol{x}}^{\ast}\right)}{\sigma_l\left({\boldsymbol{x}}^{\ast}\right)}\right) EI\left({\boldsymbol{x}}^{\ast}\left|{Y}_l\left({\boldsymbol{x}}_l^{\ast}\right)\right.\right)\mathrm{d}{Y}_l\left({\boldsymbol{x}}_l^{\ast}\right) \end{aligned} $$
(12)

Note that \( EI\left({\boldsymbol{x}}^{\ast}\left|{Y}_l\left({\boldsymbol{x}}_l^{\ast}\right)\right.\right) \) in (12) is calculated based on the HF metamodel with HF sample x. Thus the expected further improvement is

$$ \begin{aligned} E\left[\Delta {EI}_l\left({\boldsymbol{x}}^{\ast}\right)\right] &= E\left[ EI\left({\boldsymbol{x}}^{\ast}\right)- EI\left({\boldsymbol{x}}^{\ast}\left|{Y}_l\left({\boldsymbol{x}}_l^{\ast}\right)\right.\right)\right] \\ &= EI\left({\boldsymbol{x}}^{\ast}\right)-E\left[ EI\left({\boldsymbol{x}}^{\ast}\left|{Y}_l\left({\boldsymbol{x}}_l^{\ast}\right)\right.\right)\right] \end{aligned} $$
(13)

Considering the different costs of HF and LF samples and assuming that the cost ratio of an HF sample to an LF sample is T, we define a new acquisition function as

$$ \begin{aligned} a\left(\boldsymbol{x}, fidelity\right) &= \begin{cases} \dfrac{1}{T}\Delta {EI}_h\left(\boldsymbol{x}\right), & \text{if } fidelity=2 \\ E\left[\Delta {EI}_l\left(\boldsymbol{x}\right)\right], & \text{if } fidelity=1 \end{cases} \\ &= EI\left(\boldsymbol{x}\right)+\left( fidelity-1\right)\frac{1-T}{T} EI\left(\boldsymbol{x}\right)+\left( fidelity-2\right)E\left[ EI\left({\boldsymbol{x}}^{\ast}\left|{Y}_l\left({\boldsymbol{x}}_l^{\ast}\right)\right.\right)\right] \end{aligned} $$
(14)

to decide both the sample location and the fidelity level, where fidelity equals 1 for the LF level and 2 for the HF level. The location and fidelity level are obtained by maximizing the acquisition function.

The proposed approach can be further extended to problems with multiple fidelity levels. The expected value of \( EI\left({\boldsymbol{x}}^{\ast}\left|{Y}_j\left({\boldsymbol{x}}_j^{\ast}\right)\right.\right) \) at the jth fidelity level can be calculated similarly to (12), whereas \( EI\left({\boldsymbol{x}}^{\ast}\left|{Y}_j\left({\boldsymbol{x}}_j^{\ast}\right)\right.\right) \) itself is calculated based on a metamodel \( {Y}_{j+1}\left(\boldsymbol{x}\right)={\beta}_j{\hat{y}}_j\left(\boldsymbol{x}\right)+{Z}_j\left(\boldsymbol{x}\right) \) similar to (1). The acquisition function in (14) can be adjusted accordingly to include all fidelity levels with different cost ratios.

Because of the integral in (12), direct calculation of the acquisition function in (14) over the whole design space can be computationally expensive. An alternative approach can be taken here to search for the maximum of the acquisition function. From (11), it is seen that ΔEIl(x) tends to be large when EI(x) is large, so ΔEIl(x) has a similar trend to EI(x). Thus, the new LF sample point tends to be selected near the location where a large EI is obtained. Therefore, the EI function for HF sampling can be used in the search for the maximum, which approximates the true location of the maximum expected further improvement. At this sample location, the acquisition function in (14) is evaluated based on the surrogates at both fidelity levels, and the fidelity level with the larger acquisition value is selected. In the worst case, the search efficiency of this heuristic is the same as that of the standard EGO.
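A minimal sketch of this heuristic is given below. It assumes helper callables that the paper does not specify: `maximize_ei` returns the location of the maximum EI, `ei` evaluates (9) on the current hierarchical Kriging model, `yl_mean` and `yl_std` give the LF predictive distribution, and `ei_after_lf(x, y)` returns \( EI\left({\boldsymbol{x}}^{\ast}\left|{Y}_l\left({\boldsymbol{x}}_l^{\ast}\right)\right.\right) \), i.e., the EI recomputed after temporarily inserting the hypothetical LF sample (x, y). The integral in (12) is approximated by Monte Carlo sampling.

```python
import numpy as np

def select_next_sample(ei, yl_mean, yl_std, ei_after_lf, maximize_ei, T, n_mc=100):
    """Pick the next sample location and fidelity level per (12)-(14)."""
    x_star = maximize_ei()                       # step 1: location of maximum EI
    acq_hf = ei(x_star) / T                      # (1/T) * dEI_h, eq. (14), fidelity = 2
    y_draws = np.random.normal(yl_mean(x_star), yl_std(x_star), size=n_mc)
    e_cond = np.mean([ei_after_lf(x_star, y) for y in y_draws])  # MC estimate of (12)
    acq_lf = ei(x_star) - e_cond                 # expected further improvement, eq. (13)
    fidelity = 2 if acq_hf >= acq_lf else 1      # step 2: level with larger acquisition
    return x_star, fidelity
```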

3.2 Constrained acquisition function

In general, the constraints in engineering optimization can be divided into two categories: known constraints and unknown constraints. Known constraints can be evaluated easily and analytically without running a simulation. In contrast, unknown constraints are much more complex and usually related to design performance; whether they are satisfied can only be determined after running a simulation. In this work, the penalty function approach (Coello 2000; Shu et al. 2017) is used to handle known constraints, where the objective function is penalized. Researchers have proposed different approaches to handle unknown constraints, such as constrained EI (Schonlau et al. 1998) and surrogates of constraints (Gardner et al. 2014; Gelbart et al. 2014; Tran et al. 2019a). Here, unknown constraints are incorporated in the new acquisition function.

For an unknown constraint g(x) ≤ 0, we define an indicator function F(x) as

$$ F\left(\boldsymbol{x}\right)=\begin{cases} 1, & \text{if } g\left(\boldsymbol{x}\right)\le 0 \\ 0, & \text{if } g\left(\boldsymbol{x}\right)>0 \end{cases} $$
(15)

Since g(x) cannot be evaluated without running a simulation, we can also construct a hierarchical Kriging model and assume that the prediction of g(x) obeys a normal distribution \( g\left(\boldsymbol{x}\right)\sim N\left(\hat{g}\left(\boldsymbol{x}\right),{\sigma}_g^2\left(\boldsymbol{x}\right)\right) \). Then, the constrained acquisition function is defined as

$$ \begin{aligned} {a}_C\left(\boldsymbol{x}, fidelity\right) &= E\left[F\left(\boldsymbol{x}\right)a\left(\boldsymbol{x}, fidelity\right)\right] \\ &= E\left[F\left(\boldsymbol{x}\right)\right]E\left[a\left(\boldsymbol{x}, fidelity\right)\right]+\operatorname{cov}\left[F\left(\boldsymbol{x}\right),a\left(\boldsymbol{x}, fidelity\right)\right] \\ &= E\left[F\left(\boldsymbol{x}\right)\right]a\left(\boldsymbol{x}, fidelity\right)+\operatorname{cov}\left[F\left(\boldsymbol{x}\right),a\left(\boldsymbol{x}, fidelity\right)\right] \end{aligned} $$
(16)

In general, the correlation between F(x) and a(x, fidelity) can be ignored. Thus, cov[F(x), a(x, fidelity)] = 0. According to the definition of F(x), E[F(x)] can be calculated as

$$ E\left[F\left(\boldsymbol{x}\right)\right]=P\left[g\left(\boldsymbol{x}\right)\le 0\right]=\Phi \left(\frac{-\hat{g}\left(\boldsymbol{x}\right)}{\sigma_g\left(\boldsymbol{x}\right)}\right) $$
(17)

Then (16) can be expressed as

$$ {a}_C\left(\boldsymbol{x}, fidelity\right)=\Phi \left(\frac{-\hat{g}\left(\boldsymbol{x}\right)}{\sigma_g\left(\boldsymbol{x}\right)}\right)a\left(\boldsymbol{x}, fidelity\right) $$
(18)
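A sketch of (17)–(18) follows, assuming callables `g_hat` and `g_std` for the predicted mean and standard deviation of the constraint's hierarchical Kriging model and `acquisition` for the unconstrained function in (14); these names are illustrative, not part of the paper.

```python
from scipy.stats import norm

def constrained_acquisition(x, fidelity, acquisition, g_hat, g_std):
    """Constrained acquisition of (18): acquisition weighted by P[g(x) <= 0]."""
    prob_feasible = norm.cdf(-g_hat(x) / g_std(x))   # eq. (17)
    return prob_feasible * acquisition(x, fidelity)  # eq. (18)
```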

4 Examples and results

In this section, five numerical examples and one engineering case study are used to demonstrate the applicability and performance of the proposed approach. The formulations of the five numerical examples and their respective optimal solutions (Cai et al. 2016; Zhang et al. 2018; Zhou et al. 2016) are listed in Table 1. fh(x) and fl(x) represent the HF and LF models, respectively; gh(x) and gl(x) represent the HF and LF constraints, respectively. xbest is the optimal solution and fh(xbest) is the corresponding response.

Table 1 Formulations and solutions for five numerical examples

The proposed approach is compared with the standard EGO (Jones et al. 1998) and the MFEI method (Zhang et al. 2018). Note that the cost difference between HF and LF samples is not considered in the MFEI method, in contrast to our approach. The computational cost is calculated as

$$ \mathrm{cost}={n}_h+\frac{n_l}{T} $$
(19)

where nh and nl are the numbers of HF and LF samples, respectively. T is the cost ratio.
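A worked example of the cost measure in (19), with hypothetical sample counts chosen only for illustration:

```python
def total_cost(n_h, n_l, T):
    """Overall sampling cost of (19): each LF sample costs 1/T of an HF sample."""
    return n_h + n_l / T

print(total_cost(n_h=10, n_l=20, T=4))   # 10 + 20/4 = 15.0
```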

4.1 The one-dimensional example

The first numerical example in Table 1 is used for the illustration of the proposed approach as well as a detailed comparison between the different approaches. Here we assume that the cost of an HF sample point is four times that of an LF sample point (T = 4).

The initial hierarchical Kriging model is constructed based on six LF sample points Sl = {0.0, 0.2, 0.4, 0.6, 0.8, 1.0} and three HF sample points Sh = {0.0, 0.5, 1.0}. The initial samples are uniformly distributed in the design space. The initial sample points, the constructed HF and LF models, the initial hierarchical Kriging model, and the EI function are shown in Fig. 1a. The maximum value of the EI function is at x1 = 0.9093. At this location, \( \frac{1}{T}\Delta {EI}_h\left({x}_1\right)=1.8598 \) and E[ΔEIl(x1)] = 6.7459. Hence, an LF sample point is added at this location.

Fig. 1 The hierarchical Kriging model and EI function of the one-dimensional example: a the initial samples, MF model, and EI function; b the updated MF model and EI function after the first iteration; c the updated MF model and EI function after the second iteration

The updated hierarchical Kriging model and EI function are shown in Fig. 1b. Similarly, an LF sample point is added at x2 = 0.8232 in the second iteration, where \( \frac{1}{T}\Delta {EI}_h\left({x}_2\right)=1.7698 \) and E[ΔEIl(x2)] = 5.7174. The updated hierarchical Kriging model and EI function are shown in Fig. 1c. The maximum value of the EI function in this iteration is at x3 = 0.7211. At this location, \( \frac{1}{T}\Delta {EI}_h\left({x}_3\right)=0.8018 \) and E[ΔEIl(x3)] =  ‐ 1.8252. Hence, an HF sample point is added in the third iteration.

The search process of the proposed approach in the first numerical example is listed in Table 2. The proposed approach requires three LF samples and three HF samples to find the optimal solution. The termination criterion for this numerical example is set as

$$ \left|{f}_{\mathrm{min}}-{f}_h\left({x}_{best}\right)\right|\le \varepsilon $$
(20)

where fmin is the best observed objective value and ε is set to 0.01.

Table 2 The search process of the proposed approach

The convergence histories of the three approaches for the first numerical example are compared in Fig. 2. The numbers of total HF and LF sample points, including the initial samples, and the computational costs of the three approaches are listed in Table 3. Note that the initial samples also need to be included in estimating the overall costs. For the standard EGO, which is a single-fidelity method, the same numbers of initial LF and HF samples are recorded for comparison; after the initial model is constructed, the sample points added in the following iterations are counted as HF samples. It is seen from Table 3 that the proposed approach requires the least computational cost to find the optimal solution for the first example. The convergence criterion in (20) is applied in all three approaches.

Fig. 2 The convergence curves of the three approaches for the one-dimensional example

Table 3 The number of sample points and computational cost in different approaches for Case 1

4.2 Numerical examples for Cases 2 to 4

For the numerical examples of Cases 2 to 4, Latin hypercube sampling (LHS) (Park 1994; Wang 2003) is used to generate the initial HF and LF sample sets, with sizes set to three and six times the problem dimension, respectively. To account for the influence of randomness, each of these cases is solved 30 times with each of the three approaches. The resulting average numbers of LF and HF sample points are compared in Table 4. The same convergence criterion in (20) is applied for all cases. To illustrate the effect of the cost ratio on the proposed approach, two different cost ratios (T = 4 and T = 10) are tested. Based on the acquisition functions in (14) and (18), the proposed approach tends to select more LF sample points if the cost of LF sampling is lower (i.e., a higher cost ratio).
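For illustration, initial designs of this kind could be generated as follows with SciPy's LHS implementation; the paper does not specify which LHS code was used, and `dim` is a placeholder value.

```python
from scipy.stats import qmc

dim = 2                                             # problem dimension (placeholder)
X_hf = qmc.LatinHypercube(d=dim, seed=1).random(n=3 * dim)   # initial HF set
X_lf = qmc.LatinHypercube(d=dim, seed=2).random(n=6 * dim)   # initial LF set
# points lie in [0, 1]^dim; rescale with qmc.scale(X, lower, upper) as needed
```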

Table 4 The average numbers of sample points in different approaches for Cases 2, 3, and 4

For Case 2, the EGO and the proposed approach require almost the same numbers of LF and HF sample points, while the MFEI method samples many more LF points than the other two approaches. However, this does not reduce the number of HF samples required by the MFEI method. For Case 3 and Case 4, the MFEI method and the proposed approach require fewer HF sample points than the EGO by supplementing with LF sample points. One key difference between the MFEI method and the proposed approach is that the cost ratio is not considered in the MFEI method. The proposed approach tends to sample more LF points and fewer HF points as the cost ratio increases.

The computational costs of the different approaches for T = 4 and T = 10 according to (19) are listed in Table 5. For Case 2, the EGO and the proposed approach have similar computational costs, which are lower than that of the MFEI method. For Case 3 and Case 4, the MFEI method is more efficient than the EGO, and the proposed approach has the lowest cost among the three approaches.

Table 5 Computational costs of different approaches for Cases 2, 3, and 4

4.3 Case 5: a high-dimensional example

The fifth numerical example is used to test the ability of the different approaches to solve high-dimensional optimization problems. LHS is applied to generate 40 initial HF samples and 100 initial LF samples. In this example, the maximum number of iterations is set to 200 to observe the convergence process of the different optimization approaches.

The convergence of the objective values along with the computational costs for the three different approaches is compared in Fig. 3. The best observed objectives, the numbers of LF and HF sample points, and the corresponding computational costs (T = 4) for the three approaches after convergence are listed in Table 6.

Fig. 3 The convergence curves of the three approaches for the high-dimensional example

Table 6 Comparison of results of the three approaches

From Fig. 3 and Table 6, it is seen that the standard EGO and the proposed approach obtain a better optimal solution than the MFEI approach. Compared to the EGO, the proposed approach has a lower computational cost to converge to the optimal solution. After 200 iterations, the cost of the MFEI method is the lowest, since it added the most LF samples and the fewest HF samples. This overreliance on LF samples, however, caused it to miss opportunities to reach a better solution.

4.4 Engineering case study: impedance optimization of the long base

As an engineering case study, the proposed approach is applied to optimize the long base of a ship. The simulation model consists of a cylindrical shell and a long base, as shown in Fig. 4. The optimization objective is to maximize the minimum impedance of the pedestal while keeping the weight below 3.4 tons. The mechanical impedance of a vibrating system is the complex ratio of a harmonic excitation to its response. In this example, the impedance is the origin impedance, i.e., the complex ratio of a harmonic excitation to the response at the same location. To calculate the impedance, two unit harmonic forces are applied in the Y-axis direction at points A and B of the long base in Fig. 4. The frequency of the unit harmonic forces ranges from 0 to 350 Hz. The displacements at the ends of the cylindrical shell and the part of the base connected to the bulkhead are fixed to zero. The six design variables shown in Fig. 4 are listed in Table 7. Other fixed parameters related to materials and geometry are given in Table 8.

Fig. 4 The geometric model of the cylindrical shell and the long base

Table 7 The ranges of the design variables
Table 8 The fixed parameters related to materials and geometries

For the HF model, the frequency step size is 2.5 Hz; for the LF model, it is 10 Hz. The computational cost of the HF model is four times that of the LF model (T = 4). The convergence of the objective values along with the computational costs for the three approaches is plotted in Fig. 5. The best observed objectives, the numbers of LF and HF sample points, and the corresponding computational costs for the three approaches after convergence are listed in Table 9. For further comparison, the simulation resolution is further reduced to a step size of 25 Hz and applied as the LF model (T = 10). The results are also listed in Table 9.

Fig. 5 The convergence curves of the three approaches

Table 9 Comparison of results of the three approaches

From Fig. 5 and Table 9, it is seen that the MFEI method and the proposed approach obtain a better optimal solution than the standard EGO. Compared to the EGO and the MFEI method, the proposed approach has a lower computational cost to converge to the optimal solution. The proposed approach relies more on LF sample points when the LF model is cheaper, which indicates that it can adjust the sampling process adaptively according to the cost of obtaining extra information.

5 Concluding remarks

In this paper, an MFBO approach for global optimization is proposed based on the hierarchical Kriging model and a new acquisition function. In the new acquisition function, the value of LF sample points is quantified as the expected further improvement, and the cost ratio between HF and LF sampling is considered. Both the location and the fidelity level of the next sample point are determined simultaneously by maximizing the acquisition function. For constrained problems, the acquisition function can be further generalized with surrogates of the constraints. The proposed approach has been demonstrated with five numerical problems and one engineering design case. Compared to single-fidelity BO and an existing multi-fidelity BO method, the new approach incorporates the sampling cost differences in the sequential process and shows a higher level of efficiency.

The major limitation of the proposed acquisition function is the cost of direct computation. In this paper, a heuristic approach is taken to search for the maximum of the acquisition function based on the EI of the HF model. The search efficiency is usually better than, or at least no worse than, that of the standard EGO. In future work, efficient computational methods for the new acquisition function with the expected further improvement will be investigated. Numerical integration methods such as quadrature and importance sampling can be helpful.

In the proposed acquisition function, the cost ratio of HF to LF samples plays a major role. In all examples of this paper, the ratios were assumed to be known a priori. When the costs of HF and LF simulations are not known beforehand in simulation-based design optimization, an initial ratio can be estimated and then updated on the fly once the simulations are run and actual costs become available. Thus, the acquisition function can be adjusted adaptively. Nevertheless, the overall sampling cost proposed in (19) to evaluate the performance of MFBO approaches requires further study regarding its fairness in comparisons.

Scalability has been a major issue for Kriging-based metamodeling, as the number of samples required increases exponentially with the dimension of the search space. Approaches such as batch parallelization (Tran et al. 2019a, 2019b) and sparse Gaussian processes (McIntire et al. 2016; Zhang et al. 2019) have been applied in Bayesian optimization to alleviate this dimensionality challenge. Yet much work on Bayesian optimization for high-dimensional problems remains.