1 Introduction

Computer models or simulators are increasingly popular for gaining insights into physical processes and phenomena that are too expensive or infeasible to observe directly. For example, Greenberg [15] developed the finite volume community ocean model (FVCOM) for simulating the flow of water in the Bay of Fundy; Bower et al. [3] discussed the formation of galaxies using a simulator called GALFORM; and Bayarri et al. [2] used a simulator called TITAN2D for modelling the maximum volcanic eruption flow height. Realistic computer simulators of complex processes can also be computationally expensive to run, and thus statistical surrogates trained on a handful of simulator runs are often used for a deeper understanding of the underlying phenomena. Sacks et al. [38] proposed using a realization of a Gaussian process (GP) model as a surrogate for such processes.

The popular objectives of such computer experiments include global fitting, variable screening, and estimation of process features like the maximum, a pre-specified contour, or a tail quantile region. Global fitting refers to accurately estimating the underlying true response surface and thereby making accurate predictions over the design domain. Assuming that the simulator under consideration is expensive to run, the number of simulator runs is limited, and thus one must be careful in choosing the inputs. Over the last two decades, several innovative methodologies and algorithms have been developed to address these concerns; see [10, 34, 35] for details.

We focus on efficient designs for global fitting. In computer experiments literature, a popular technique is to use Latin hypercube designs [27] with some space-filling properties like maximin interpoint distance [18, 28], minimum pairwise coordinate correlation [17, 21], orthogonal array-based structure [29, 39], projection property [19], etc. Such designs aim at filling the input space as evenly as possible, but do not consider the complexity of the response surface. On the other hand, D-optimal designs [18], integrated mean-squared prediction error (IMSPE)-optimal designs [36], and maximum mean-squared prediction error (MMSPE)-optimal designs [37] use the process response information in finding a design for global fitting.

Most of these designs follow a one-shot approach, i.e. all design points are obtained at the same time. However, over the past decade, a few sequential designs with higher prediction accuracy have also been proposed for global fitting of the response surface, for instance, the D-optimal design [12, 14], the expected improvement (EI) criterion-based design [22], and the minimum potential energy-based design [20]. More specifically, the R package tgp implements the construction of a sequential D-optimal design for a stationary GP model of fixed parameterization by subsampling from a list of candidates. The algorithm is sequential in that it adds one design point at a time from the list of candidates; a point is added by maximizing the determinant of the covariance matrix constructed from the existing design points and a candidate point. In this paper, we propose two new sequential design approaches for global fitting. The proposed approaches achieve higher prediction accuracy by building on the generalized EI criteria for contour estimation, which aim to find the input settings that achieve a specific response value [33]. The rationale is that higher prediction accuracy requires a more accurate estimate of the response surface, and the estimation of a response surface can be approximated by the estimation of a large number of contours over the range of the responses. Thus, we generalize the EI criterion for single-level contour estimation to multiple-level contour estimation. This allows us to use the closed form of the generalized EI criterion when searching for sequential design points.

We propose two generalizations of the EI-based criterion for contour estimation. First, we recommend splitting the range of simulator outputs into k equi-spaced contour levels and then develop a new EI criterion for the simultaneous estimation of these pre-specified contours. Second, we propose a new adaptive approach that chooses the contour level for selecting the follow-up trial by maximizing the EI criterion for contour estimation. The performance of the proposed approaches is compared with several state-of-the-art designs for global fitting.

The remainder of the article is organized as follows. Section 2 presents a quick review of the GP model for building a surrogate of the computer model output, popular sequential design approaches for global fitting [20, 22], and the EI criterion for contour estimation [33]. Section 3 presents the new multiple-contour estimation-based EI method for constructing designs for global fitting of the response surface. In Sect. 4, we propose the new adaptive method of estimating the contour levels for choosing follow-up design points in the sequential framework. The performance comparison of the proposed methods and the existing approaches is discussed in Sect. 5. Finally, Sect. 6 summarizes the key findings and provides concluding remarks.

2 Background Review

This section reviews the necessary background and existing work relevant to the later development. More specifically, we briefly review the GP models used throughout, the existing sequential design approaches for global fitting, as well as the contour estimation approach of Ranjan et al. [33].

2.1 Gaussian Process Models

Gaussian process models are the most widely used models in computer experiments for emulating outputs from computer codes (e.g. [38]). Their popularity is due to their simplicity, flexibility, and ability to provide predictive uncertainty. Here we cover the key concepts of GP models and refer the reader to Santner et al. [35] and Rasmussen and Williams [34] for details. For training data of size n, let the ith input of a computer code be a d-dimensional vector \({\mathbf{x}}_i =(x_{i1}, \ldots , x_{id})^T\) and the corresponding output be a scalar \(y_i = y({\mathbf{x}}_i)\), for \(i=1,\ldots ,n\). Typically, without loss of generality, the design domain is assumed to be a unit hypercube, \(\chi = (0,1)^d\). A GP model assumes

$$\begin{aligned} y({\mathbf{x}}_i) = {\mathbf{f}}^T {\varvec{\beta }}+ Z({\mathbf{x}}_i), \quad i= 1,2,\ldots ,n, \end{aligned}$$
(1)

where \({\mathbf{f}}\) is a vector of regression functions, \({\varvec{\beta }}\) is the vector of regression parameters, and \(Z({\mathbf{x}})\) is a stationary stochastic process with mean zero, constant variance \(\sigma ^2\), and correlation between two outputs \(y({\mathbf{x}}_i)\) and \(y({\mathbf{x}}_j)\) denoted by \(R({\mathbf{x}}_i,{\mathbf{x}}_j)= \hbox {corr}(y({\mathbf{x}}_i),y({\mathbf{x}}_j))\). In this article, we focus on GP models with a constant mean, that is, \({\mathbf{f}}^T {\varvec{\beta }}= \mu\). Let \({\mathbf{y}}= (y_1,\ldots , y_n)^T\) be the vector of responses for the training data, and let \({\mathbf{R}}\) be the \(n\times n\) spatial correlation matrix with (i, j)th element \(R({\mathbf{x}}_i, {\mathbf{x}}_j)\). The GP model in (1) is equivalent to assuming that \({\mathbf{y}}\) follows a multivariate normal distribution with mean vector \(\mu {\mathbf{1}}_n\) and covariance matrix \(\sigma ^2{\mathbf{R}}\), where \({\mathbf{1}}_n\) is an n-dimensional column vector of all 1’s. Notationally, we denote \({\mathbf{y}}\sim GP(\mu {\mathbf{1}}_n, \sigma ^2{\mathbf{R}})\). There are many choices of valid correlation functions. One popular choice is the Gaussian correlation function,

$$\begin{aligned} R({\mathbf{x}}_i, {\mathbf{x}}_j)= & {} \prod _{k=1}^d \hbox {exp} \{ -\theta _k (x_{ik} - x_{jk})^2\}, \end{aligned}$$
(2)

where \(\theta _k\) is the correlation parameter for the kth input variable. The unknown parameters in the model include the mean \(\mu\), the variance \(\sigma ^2\), and the d correlation parameters \(\theta _1, \ldots , \theta _d\). They can be estimated via the maximum likelihood approach or a Bayesian approach such as Markov chain Monte Carlo (MCMC) [5, 10, 23, 35]. For the maximum likelihood approach, if the correlation parameters are known, the estimates of \(\mu\) and \(\sigma ^2\) in (1) are

$$\begin{aligned} \hat{\mu } = ({\mathbf{1}}_n^{\text{ T }}{\mathbf{R}}^{-1}{\mathbf{1}}_n)^{-1}{\mathbf{1}}_n^{\text{ T }}{\mathbf{R}}^{-1}{\mathbf{y}}\end{aligned}$$
(3)

and

$$\begin{aligned} \hat{\sigma }^2 = \frac{ ({\mathbf{y}}-{\mathbf{1}}_n\hat{\mu } )^T {\mathbf{R}}^{-1} ({\mathbf{y}}-{\mathbf{1}}_n\hat{\mu } )}{n}. \end{aligned}$$
(4)

The best linear unbiased predictor (BLUP) at an input \({\mathbf{x}}_0\) is given by

$$\begin{aligned} \hat{y}({\mathbf{x}}_0) = E[y({\mathbf{x}}_0)|{\mathbf{y}}] = \mu + {\mathbf{r}}^T({\mathbf{x}}_0) {\mathbf{R}}^{-1}({\mathbf{y}}- \mu {\mathbf{1}}_n), \end{aligned}$$
(5)

where \({\mathbf{r}}({\mathbf{x}}_0) = (R({\mathbf{x}}_0, {\mathbf{x}}_1),\ldots ,R({\mathbf{x}}_0, {\mathbf{x}}_n))^T\). Moreover, the predictive variance of \(y({\mathbf{x}}_0)\) is

$$\begin{aligned} s^2({\mathbf{x}}_0) = \hbox {Var}(y({\mathbf{x}}_0)|{{\mathbf{y}}}) = \sigma ^2 \left( 1-{\mathbf{r}}^T({\mathbf{x}}_0){\mathbf{R}}^{-1}{\mathbf{r}}({\mathbf{x}}_0) \right) . \end{aligned}$$
(6)

In practice, the unknown correlation parameters in (3) and (4) are replaced with their estimates. Thus, \(\mu, \,\sigma ^2, \, {\mathbf{R}},\) and \({\mathbf{r}}({\mathbf{x}}_0)\) in (5) and (6) are replaced by \(\hat{\mu }\), \(\hat{\sigma }^2\), \({\hat{\mathbf{R}}},\) and \({\hat{\mathbf{r}}}({\mathbf{x}}_0)\), respectively. A number of R packages provide GP model fitting, for example, mlegp, GPfit, DiceKriging, tgp, RobustGaSP and SAVE [6, 14, 16, 26, 30, 32]. These packages differ in computational efficiency and stability but should, in general, produce similar results. For stability, we use the R package GPfit [26] in this article.
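To make the formulas concrete, the plug-in predictor (5) and predictive variance (6), together with the estimates (3) and (4), can be sketched as follows. This is a minimal NumPy sketch, not the GPfit implementation; it assumes the correlation parameters \(\theta_k\) are known and adds a small nugget to the diagonal for numerical stability.

```python
import numpy as np

def gaussian_corr(X1, X2, theta):
    """Gaussian correlation (2): R_ij = prod_k exp(-theta_k (x_ik - x_jk)^2)."""
    d2 = (X1[:, None, :] - X2[None, :, :]) ** 2          # (n1, n2, d)
    return np.exp(-(d2 * theta).sum(axis=2))

def gp_predict(X, y, theta, X0):
    """Plug-in BLUP (5) and predictive variance (6) with mu-hat (3), sigma2-hat (4)."""
    n = len(y)
    one = np.ones(n)
    R = gaussian_corr(X, X, theta) + 1e-8 * np.eye(n)    # small nugget for stability
    Rinv = np.linalg.inv(R)                              # fine for a sketch; use solve() at scale
    mu = (one @ Rinv @ y) / (one @ Rinv @ one)           # (3)
    resid = y - mu * one
    sigma2 = resid @ Rinv @ resid / n                    # (4)
    r = gaussian_corr(X0, X, theta)                      # (n0, n)
    yhat = mu + r @ Rinv @ resid                         # (5)
    s2 = sigma2 * (1.0 - np.sum(r * (r @ Rinv), axis=1))  # (6), diag(r R^-1 r^T)
    return yhat, np.maximum(s2, 0.0)
```

Note that, as expected from (5) and (6), the predictor interpolates the training data and the predictive variance vanishes at the design points.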

2.2 Existing Sequential Design Approaches for Global Fitting

The general setup of a sequential design approach starts with an initial design and adds one point, or a batch of points, at a time. We focus on the sequential approaches that add one point at a time. The next follow-up point should be chosen based on the information gathered from the existing data and should be the most informative among the candidate points. The process of adding points is repeated until a tolerance-based stopping criterion is met or a pre-specified budget is exhausted. The step-by-step process of the sequential design approach is as follows.

Step 1. Choose an initial design of run size \(n_0\). Let \(n=n_0\).

Step 2. Build a statistical surrogate model using the available data \(\{ ({\mathbf{x}}_i, y_i), i = 1,\ldots ,n\}\).

Step 3. Choose the next design point \({\mathbf{x}}_{n+1}\) based on a criterion. Run the computer code at the new input \({\mathbf{x}}_{n+1}\) and obtain the corresponding response \(y_{n+1}\).

Step 4. Let \(n= n+1\) and repeat Steps 2 and 3 until the run size budget is reached or the stopping criterion is satisfied.
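The four steps above can be sketched generically as follows. This is an illustrative skeleton only; `run_code`, `fit`, and `criterion` are hypothetical placeholder hooks for the expensive simulator, the surrogate fitter (Step 2), and the design criterion (Step 3).

```python
import numpy as np

def sequential_design(initial_X, run_code, fit, criterion, budget):
    """One-point-at-a-time sequential design (Steps 1-4).

    initial_X : list of initial design points (Step 1)
    run_code  : the (expensive) simulator, x -> y
    fit       : builds a surrogate from (X, y)           (Step 2)
    criterion : picks the next input given the surrogate (Step 3)
    budget    : total run size n                          (Step 4)
    """
    X = list(initial_X)                       # Step 1: n = n0
    y = [run_code(x) for x in X]
    while len(X) < budget:                    # Step 4: repeat until budget reached
        model = fit(np.array(X), np.array(y)) # Step 2: surrogate on current data
        x_new = criterion(model)              # Step 3: choose x_{n+1}
        X.append(x_new)
        y.append(run_code(x_new))             # evaluate simulator at x_{n+1}
    return np.array(X), np.array(y)
```

Any of the criteria discussed below (EIGF, SMED, or the proposed contour-based EI criteria) can be plugged in as `criterion`.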

A few remarks are in order. First, the initial design typically comes with some space-filling property like maximin interpoint distance, minimum pairwise coordinate correlation, etc. If the initial run size \(n_0\) is too small, the resulting surrogate model could be wildly inaccurate and mislead the choice of follow-up designs. On the other hand, if \(n_0\) is relatively large, the approach may not take full advantage of the sequential design criterion in Step 3. Ranjan et al. [33] recommended that \(n_0\) be between 25 and 35% of the total run size budget; this recommendation is based on their sequential design approach for contour estimation. Second, the run size budget certainly depends on the computer code of interest. Loeppky et al. [25] provided a rule of thumb for selecting the sample size, namely 10 times the number of input variables. In our illustrative examples, the total run size is at least 10d. Third, in principle, any modelling method such as GP, treed GP (TGP), or Bayesian additive regression trees (BART) [4, 11] can be used as a surrogate in Step 2. We focus on GP modelling in the examples.

2.2.1 Expected Improvement Criterion by Lam and Notz [22]

Lam and Notz [22] introduced a sequential design approach based on an expected improvement for global fit (EIGF) criterion which chooses the next input point that maximizes the following expected improvement. The improvement function \(I({\mathbf{x}})\) is defined as

$$\begin{aligned} I({\mathbf{x}}) = (y({\mathbf{x}}) - y({\mathbf{x}}_{j^*}))^2 \end{aligned}$$

with \(y({\mathbf{x}}_{j^*})\) being the observed output at the sampled point, \({\mathbf{x}}_{j^*}\), that is closest in distance to the candidate point \({\mathbf{x}}\). The expected improvement is given by

$$\begin{aligned} \hbox {E}(I({\mathbf{x}})) = (\hat{y}({\mathbf{x}}) - y({\mathbf{x}}_{j^*}))^2 + \hbox {Var}(\hat{y}({\mathbf{x}})) = (\hat{y}({\mathbf{x}}) - y({\mathbf{x}}_{j^*}))^2 + s^2({\mathbf{x}}). \end{aligned}$$
(7)

Lam and Notz [22] used the Euclidean distance to determine the nearest sampled design point. The expectation in (7) is taken with respect to the predictive distribution of \(y({\mathbf{x}})\) under the GP model, i.e. \(y({\mathbf{x}}) \sim N(\hat{y}({\mathbf{x}}), s^2({\mathbf{x}}))\). The EIGF criterion in (7) balances local and global search for the next potential design input, guiding the search toward ‘informative’ regions with significant variation in the response values.
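The EIGF criterion (7) is straightforward to evaluate once the surrogate's predictive mean and variance are available. A minimal sketch, assuming \(\hat{y}({\mathbf{x}}_0)\) and \(s^2({\mathbf{x}}_0)\) are supplied by some fitted emulator:

```python
import numpy as np

def eigf(x0, X_train, y_train, yhat_x0, s2_x0):
    """EIGF criterion (7): (yhat(x0) - y(x_{j*}))^2 + s^2(x0),
    where x_{j*} is the sampled point nearest to x0 in Euclidean distance."""
    j_star = np.argmin(np.linalg.norm(X_train - x0, axis=1))
    return (yhat_x0 - y_train[j_star]) ** 2 + s2_x0
```

The next design point is the candidate maximizing this quantity; the first term rewards large local response variation, the second rewards high predictive uncertainty.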

2.2.2 Sequential Minimum Energy Designs by Joseph et al. [20]

Motivated by the fact in physics that charged particles in a box repel and try to remain as far away from each other as possible, Joseph et al. [20] viewed a space-filling design in the experimental region as the positions occupied by charged particles in a box. The charge of each particle represents the experimental response. A minimum energy design is obtained by minimizing the potential energy. Let \(q({\mathbf{x}})\) be the charge of the particle at the design input \({\mathbf{x}},\) and \(d({\mathbf{x}}_i, {\mathbf{x}}_j)\) denote the Euclidean distance between the ith and the jth inputs. Joseph et al. [20] defined the potential generalized energy (GE) of a design \({\mathbf{D}}_n = \{{\mathbf{x}}_1,\ldots , {\mathbf{x}}_n\}\) as

$$\begin{aligned} \hbox {GE}_p = \left\{ \sum _{i=1}^{n-1} \sum _{j=i+1}^n \left( \frac{q({\mathbf{x}}_i)q({\mathbf{x}}_j)}{d({\mathbf{x}}_i,{\mathbf{x}}_j)}\right) ^p \right\} ^{1/p}, \end{aligned}$$
(8)

where \(p \in [1, \infty )\). They further proposed a sequential minimum energy design approach, which works as follows. Let \(\hat{q}({\mathbf{x}}) = \{ \hat{y}({\mathbf{x}}) \}^{-1/(2d)}\), where d is the dimensionality of the input \({\mathbf{x}}\). Then the proposed one-point-at-a-time greedy algorithm finds the next follow-up design point as

$$\begin{aligned} {{\mathbf{x}}}_{n+1} = \underset{{\mathbf{x}}_0 \in \chi }{\arg \min } \sum _{i=1}^n \left( \frac{ \hat{q}({\mathbf{x}}_i)\hat{q}({\mathbf{x}}_0)}{ d({\mathbf{x}}_i, {\mathbf{x}}_0)}\right) ^p. \end{aligned}$$
(9)

The design generated by this algorithm is called sequential minimum energy design (SMED).
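The greedy step (9) can be sketched as below. This is an illustrative reimplementation, assuming strictly positive predicted responses so that the estimated charge \(\hat{q}({\mathbf{x}}) = \hat{y}({\mathbf{x}})^{-1/(2d)}\) is well defined; the candidate-set search stands in for the continuous minimization over \(\chi\).

```python
import numpy as np

def smed_next(candidates, yhat_cand, X_train, yhat_train, p=1.0):
    """Greedy SMED step (9): return the index of the candidate minimizing
    sum_i ( q(x_i) q(x0) / d(x_i, x0) )^p with charge q(x) = yhat(x)^(-1/(2d))."""
    d = X_train.shape[1]                          # input dimension
    q_train = yhat_train ** (-1.0 / (2 * d))
    q_cand = yhat_cand ** (-1.0 / (2 * d))
    energy = [np.sum((q_train * qc / np.linalg.norm(X_train - xc, axis=1)) ** p)
              for xc, qc in zip(candidates, q_cand)]
    return int(np.argmin(energy))
```

Candidates far from the existing design, or with small charge (large predicted response), yield low energy and are therefore preferred.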

2.3 Contour Estimation via EI Criterion

The contour at level ‘a’ of a simulator response surface consists of all the inputs \({\mathbf{x}}\) that yield the same response a, that is,

$$\begin{aligned} S(a) = \{ {\mathbf{x}}\in \chi : y({\mathbf{x}}) = a \}. \end{aligned}$$
(10)

Ranjan et al. [33] developed an expected improvement criterion under the sequential design methodology for estimating a contour of an expensive-to-evaluate computer simulator with scalar responses. The proposed improvement function is

$$\begin{aligned} I({\mathbf{x}}) = \epsilon ^2({\mathbf{x}}) - \min \big \{ (y({\mathbf{x}}) - a)^2, \epsilon ^2({\mathbf{x}}) \big \}, \end{aligned}$$
(11)

where \(y({\mathbf{x}})\) has a normal predictive distribution, i.e. \(y({\mathbf{x}}) \sim N(\hat{y}({\mathbf{x}}), s^2({\mathbf{x}}))\), and \(\epsilon ({\mathbf{x}}) = \alpha s({\mathbf{x}})\) for a positive constant \(\alpha\). A suggested value is \(\alpha = 1.96\), as it defines a region of interest around S(a) corresponding to a 95% confidence interval under the normality assumption of the responses. Letting \(v_1({\mathbf{x}}) = a - \epsilon ({\mathbf{x}})\) and \(v_2({\mathbf{x}}) = a + \epsilon ({\mathbf{x}})\), the closed form of the expectation of the improvement function \(I({\mathbf{x}})\) with respect to the predictive distribution of \(y({\mathbf{x}})\) is given by

$$\begin{aligned} \hbox {E}(I({\mathbf{x}}))= & {} \int _{v_1({\mathbf{x}})}^{v_2({\mathbf{x}})} [ \epsilon ^2({\mathbf{x}}) - (t - a)^2 ] \phi \left( \frac{t-\hat{y}({\mathbf{x}})}{s({\mathbf{x}})}\right) {\mathrm{d}}t \nonumber \\= & {} [\epsilon ({\mathbf{x}})^2 - (\hat{y}({\mathbf{x}}) - a)^2 - s^2({\mathbf{x}}) ] ( \Phi (u_2) - \Phi (u_1)) + s^2({\mathbf{x}}) (u_2\phi (u_2) - u_1\phi (u_1)) \nonumber \\&\quad + 2(\hat{y}({\mathbf{x}}) - a) s({\mathbf{x}}) (\phi (u_2) - \phi (u_1)), \nonumber \\= & {} T_1 + T_2 + T_3 \end{aligned}$$
(12)

where \(u_1 = [v_1({\mathbf{x}}) - \hat{y}({\mathbf{x}})]/s({\mathbf{x}})\), \(u_2 = [v_2({\mathbf{x}}) - \hat{y}({\mathbf{x}})]/s({\mathbf{x}})\), and \(\phi (\cdot )\) and \(\Phi (\cdot )\) are the probability density function and the cumulative distribution function of a standard normal random variable, respectively. Note that we define the three terms \(T_1\), \(T_2\) and \(T_3\) for ease of explanation below. See Ranjan et al. [33] and the associated Errata for the derivation of (12). The first term \(T_1\) in (12) favours inputs with large \(s({\mathbf{x}})\) in the neighbourhood of the predicted contour, while the last term \(T_3\) assigns weight to points far away from the predicted contour with large uncertainties. The second term \(T_2\) is often dominated by the other two terms in (12). Maximizing the EI criterion in (12) selects inputs with high uncertainty near the predicted contour as well as those far away, achieving both local search and global exploration.
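The closed form (12) is a short computation given the predictive mean and standard deviation. A minimal sketch (illustrative, not the authors' R code), vector-friendly over candidate points:

```python
import numpy as np
from scipy.stats import norm

def contour_ei(yhat, s, a, alpha=1.96):
    """Closed-form EI (12) for estimating the contour S(a), with eps(x) = alpha * s(x)."""
    eps = alpha * s
    u1 = (a - eps - yhat) / s                 # u1 = (v1 - yhat)/s
    u2 = (a + eps - yhat) / s                 # u2 = (v2 - yhat)/s
    t1 = (eps**2 - (yhat - a)**2 - s**2) * (norm.cdf(u2) - norm.cdf(u1))
    t2 = s**2 * (u2 * norm.pdf(u2) - u1 * norm.pdf(u1))
    t3 = 2.0 * (yhat - a) * s * (norm.pdf(u2) - norm.pdf(u1))
    return t1 + t2 + t3                       # T1 + T2 + T3
```

As expected from (12), the criterion is symmetric in \(\hat{y}({\mathbf{x}}) - a\) and is largest when the prediction sits on the contour with large \(s({\mathbf{x}})\).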

3 Global Fitting by Estimating Multiple Contours

This section proposes a new method for constructing a sequential design that achieves higher prediction accuracy for the overall global fit. The basic sequential framework remains the same as in Sect. 2.2: start with a good initial design (e.g. maximin Latin hypercube) of size \(n_0 \ll n\) and then sequentially add the remaining \(n-n_0\) points using a method that serves the objective of global fitting. Instead of the conventional approach of trying to fill the input space evenly, the proposed idea is to slice the response surface into multiple contours and then use the sequential design approach to estimate those contours simultaneously. Next, we generalize the EI criterion for contour estimation [33] to the simultaneous estimation of multiple contours.

For a given integer \(k>0\) and the set of scalar values \(a_1, \ldots , a_k \in [y_{\min }, y_{\max }]\), suppose that we are interested in estimating k contours \(S(a_1),\ldots ,S(a_k)\), where \([y_{\min }, y_{\max }]\) represents the range of the true simulator response and \(S(\cdot )\) is defined in (10). Without loss of generality, assume \(a_1< a_2< \cdots <a_k\). For choosing the follow-up trial, we propose the improvement function at input \({\mathbf{x}}\) as

$$\begin{aligned} I({\mathbf{x}}) = \epsilon ^2({\mathbf{x}}) - \min \{ (y({\mathbf{x}}) - a_1)^2, \ldots , (y({\mathbf{x}}) - a_k)^2, \epsilon ^2({\mathbf{x}}) \}, \end{aligned}$$
(13)

where \(y({\mathbf{x}}) \sim N(\hat{y}({\mathbf{x}}), s^2({\mathbf{x}}))\) and \(\epsilon ({\mathbf{x}}) = \alpha s({\mathbf{x}})\) for some positive constant \(\alpha\). This improvement function will be nonzero only if \((y({\mathbf{x}})-a_j)^2 < \epsilon ^2({\mathbf{x}})\) for some j. Therefore, the improvement function can be rewritten as:

$$\begin{aligned} I({\mathbf{x}})= & {} \max \left\{ 0, \epsilon ^2({\mathbf{x}}) - (y({\mathbf{x}})-a_j)^2,\ \ j=1,2,\ldots ,k \right\} . \end{aligned}$$

Since \(a_1< a_2< \cdots <a_k\), the improvement function can be further simplified as

$$\begin{aligned} I({\mathbf{x}}) = \left\{ \begin{array}{ll} \epsilon ^2({\mathbf{x}}) - (y({\mathbf{x}}) - a_1)^2, &{} \ \ \ a_1 - \epsilon ({\mathbf{x}}) \le y({\mathbf{x}}) \le \min \{a_1+\epsilon ({\mathbf{x}}), (a_1+a_2)/2\}; \\ \epsilon ^2({\mathbf{x}}) - (y({\mathbf{x}}) - a_j)^2, &{} \ \ \ \max \{a_j - \epsilon ({\mathbf{x}}), (a_{j-1}+a_j)/2 \} \le y({\mathbf{x}}) \le \min \{a_j + \epsilon ({\mathbf{x}}), (a_j+a_{j+1})/2\}, \\ &{} \ \ \ 2 \le j \le k-1; \\ \epsilon ^2({\mathbf{x}}) - (y({\mathbf{x}}) - a_k)^2, &{} \ \ \ \max \{a_k - \epsilon ({\mathbf{x}}), (a_{k-1}+a_k)/2 \} \le y({\mathbf{x}}) \le a_k + \epsilon ({\mathbf{x}}); \\ 0, &{} \ \ \ \hbox {otherwise}. \end{array} \right. \end{aligned}$$

The term \(\epsilon ({\mathbf{x}})\) defines an uncertainty band around each contour whose width is a function of the predictive standard deviation \(s({\mathbf{x}})\). At the design points already chosen, the radius of the band is exactly zero. In addition, the criterion tends to be large for points near one of the contours \(\{ {\mathbf{x}}:y({\mathbf{x}}) = a_1\}, \{ {\mathbf{x}}:y({\mathbf{x}}) = a_2\}, \ldots , \{ {\mathbf{x}}:y({\mathbf{x}}) = a_k\}\) where \(s({\mathbf{x}})\) is large.

Similar to other sequential design approaches, we suggest choosing follow-up design points by maximizing the corresponding expected improvement, where the expectation is taken with respect to the predictive distribution, \(y({\mathbf{x}}) \sim N(\hat{y}({\mathbf{x}}), s^2({\mathbf{x}}))\). For \(j=1,\ldots ,k\), let \(v_{j1}({\mathbf{x}})\)’s and \(v_{j2}({\mathbf{x}})\)’s be defined as follows,

$$\begin{aligned} v_{j1} ({\mathbf{x}})= \left\{ \begin{array}{ll} a_1 - \epsilon ({\mathbf{x}}), &{} \quad j = 1; \\ \max \{a_j - \epsilon ({\mathbf{x}}), (a_{j-1}+a_j)/2 \} , &{} \quad 2 \le j \le k,\\ \end{array} \right. \end{aligned}$$
(14)

and

$$\begin{aligned} v_{j2}({\mathbf{x}})= \left\{ \begin{array}{ll} \min \{a_j+\epsilon ({\mathbf{x}}), (a_{j}+a_{j+1})/2\}, &{} \quad 1 \le j \le k-1; \\ a_k + \epsilon ({\mathbf{x}}), &{} \quad j=k.\\ \end{array} \right. \end{aligned}$$
(15)

Then, the expectation of the improvement function in (13) is simply the sum of the individual contour estimation EI criterion of Ranjan et al. [33] over k cases, i.e.

$$\begin{aligned} E[I({\mathbf{x}})]= & {} \sum _{j=1}^k \int _{v_{j1}({\mathbf{x}})}^{v_{j2}({\mathbf{x}})} [ \epsilon ^2({\mathbf{x}}) - (t - a_j)^2 ] \phi \left( \frac{t-\hat{y}({\mathbf{x}})}{s({\mathbf{x}})}\right) {\mathrm{d}}t \nonumber \\= & {} \sum _{j=1}^k \Big \{ [\epsilon ({\mathbf{x}})^2 - (\hat{y}({\mathbf{x}}) - a_j)^2 - s^2({\mathbf{x}}) ] ( \Phi (u_{j2}) - \Phi (u_{j1})) \nonumber \\&\quad + s^2({\mathbf{x}}) (u_{j2}\phi (u_{j2}) - u_{j1}\phi (u_{j1})) + 2(\hat{y}({\mathbf{x}}) - a _j) s({\mathbf{x}}) (\phi (u_{j2}) - \phi (u_{j1})) \Big \}, \end{aligned}$$
(16)

where \(u_{j1} = (v_{j1}({\mathbf{x}}) - \hat{y}({\mathbf{x}}))/s({\mathbf{x}})\) and \(u_{j2} = (v_{j2}({\mathbf{x}}) - \hat{y}({\mathbf{x}}))/s({\mathbf{x}})\), and \(\phi (\cdot )\) and \(\Phi (\cdot )\) are the probability density function and the cumulative distribution function of a standard normal random variable, respectively. The formulation in (16) reduces to (12) when the number of contour levels is \(k=1\). Compared with the EI criterion in (7) by Lam and Notz [22], the proposed criterion places a different weight on the variance.
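Since (16) is a sum of single-contour terms over the truncated windows \([v_{j1}, v_{j2}]\) in (14) and (15), it can be sketched directly. This is an illustrative implementation (not the authors' R code), assuming the levels are supplied in sorted order:

```python
import numpy as np
from scipy.stats import norm

def mc_ei(yhat, s, levels, alpha=1.96):
    """Multiple-contour EI (16): single-contour EI terms of (12), each integrated
    over its truncated window [v_j1, v_j2] defined in (14)-(15)."""
    a = np.sort(np.asarray(levels, dtype=float))   # a_1 < a_2 < ... < a_k
    eps = alpha * s
    k = len(a)
    total = 0.0
    for j in range(k):
        v1 = a[j] - eps if j == 0 else max(a[j] - eps, (a[j - 1] + a[j]) / 2)  # (14)
        v2 = a[j] + eps if j == k - 1 else min(a[j] + eps, (a[j] + a[j + 1]) / 2)  # (15)
        u1, u2 = (v1 - yhat) / s, (v2 - yhat) / s
        total += ((eps**2 - (yhat - a[j])**2 - s**2) * (norm.cdf(u2) - norm.cdf(u1))
                  + s**2 * (u2 * norm.pdf(u2) - u1 * norm.pdf(u1))
                  + 2.0 * (yhat - a[j]) * s * (norm.pdf(u2) - norm.pdf(u1)))
    return total
```

When the levels are far apart relative to \(\epsilon({\mathbf{x}})\), the windows do not truncate and (16) is simply the sum of the individual criteria (12).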

Note that maximizing \(E[I({\mathbf{x}})]\) rather than \(I({\mathbf{x}})\) has two advantages. First, the true value of \(y({\mathbf{x}})\) (and hence \(I({\mathbf{x}})\)) is unknown at any unsampled design point. Second, some regions of the design space may not yet have been sufficiently explored, where the predictive variance of \(\hat{y}({\mathbf{x}})\) is relatively high. For such an unsampled design point, the predicted response may not fall within the \(\epsilon ({\mathbf{x}})\)-band of any of the k estimated contours, yet it may be close to the true contours lying in the unexplored region. As a result, the EI approach facilitates a balance between local exploitation and global exploration.

The use of the EI criterion in (16) requires choosing k, the number of contours, and the contour levels \(a_1, a_2, \ldots , a_k\). Finding their optimal values with respect to prediction accuracy appears to be a challenging task. A reasonable choice is k equi-spaced contours in the simulator output range \([y_{\min }, y_{\max }]\). The values of \(y_{\min }\) and \(y_{\max }\) are unknown in general, so we estimate them using the fitted model. That is, we approximate \(y_{\min }\) and \(y_{\max }\) by the minimum and maximum of \(\hat{y}({\mathbf{x}}_1), \ldots , \hat{y}({\mathbf{x}}_m)\), where \({\mathbf{x}}_1,\ldots , {\mathbf{x}}_m\) are a large number of inputs from the design domain, m is chosen to be 1000d in all the examples, and \(\hat{y}({\mathbf{x}}_j)\) is the fitted response at \({\mathbf{x}}_j\) under the model at the current stage. Note that the existing methods reviewed in Sect. 2.2 do not make use of the information on \(y_{\min }\) and \(y_{\max }\); using it in the proposed method may contribute to its superior performance.
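The level-selection step above can be sketched as follows. Whether the k equi-spaced levels should include the estimated endpoints is not specified here, so the sketch takes one plausible reading (interior levels, excluding \(\hat y_{\min}\) and \(\hat y_{\max}\)); this placement is an assumption, not the authors' prescription.

```python
import numpy as np

def mc_levels(yhat_grid, k):
    """Equi-spaced contour levels: approximate [y_min, y_max] by the min and max
    of fitted responses over a large candidate set, then place k interior levels
    (endpoint exclusion is an assumption of this sketch)."""
    lo, hi = float(np.min(yhat_grid)), float(np.max(yhat_grid))
    return np.linspace(lo, hi, k + 2)[1:-1]
```

In practice `yhat_grid` would hold \(\hat{y}\) evaluated at m = 1000d space-filling inputs under the current fit.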

We now present two illustrations of the proposed multiple-contour estimation EI criterion (referred to as MC criterion) for global fitting with different values of k.

Example 1

Consider the computer model [13] that relates the one-dimensional input x and the output y as,

$$\begin{aligned} y = \frac{\hbox {sin}(10\pi x)}{2x} + (x-1)^4, \ \ \ 0.5 \le x \le 2.5. \end{aligned}$$
(17)

The true relationship between the input x and the output y is displayed as the blue solid curve in Fig. 1. Five initial design points are shown by black empty circles. We then sequentially add 15 design points using the MC criterion in (16). The numerical labels represent the order of the newly added design points. Figure 1a–d illustrates the sequential design scheme with the MC criterion for 1, 5, 10, and 20 equally spaced contour levels within the range of the fitted surface. When \(k=1\), the majority of the added points are around the contour level \(a = 2.0\). For larger values of k, as k equally spaced contour levels are used, the points added by the proposed MC criterion-based sequential design approach are more space-filling. In addition, Fig. 1 reveals that, in this example, the proposed approach tends to choose inputs around the areas where the function changes direction and to place most of the points in the areas where the computer model is more complex.

Fig. 1
figure 1

Illustration of MC criterion with k contour levels. The blue curves represent the true relationship between x and y of the computer model in (17); the black empty circles are the five initial design points; the red stars are locations of follow-up design points, and the numerical values are the order of the added points

Example 2

Consider a computer model with two-dimensional input variables \({\mathbf{x}}= (x_1,x_2)\), and the output given by

$$\begin{aligned} y({\mathbf{x}}) = [1 + (4x_1 + 4x_2 + 1)](3 + 192x_1x_2), \quad 0 \le x_1 \le 1, \ 0 \le x_2 \le 1. \end{aligned}$$
(18)

The contour plot of the response surface is displayed in Fig. 2. Suppose a maximin Latin hypercube design of 10 points is generated and the corresponding responses are collected from the computer model. First, we consider searching for the next follow-up design point for estimating only one contour, at the level \(a = 300\). Figure 3 shows these 10 design points, the inputs with \(I({\mathbf{x}})>0\), and the maximizer of the EI criterion for contour estimation in (12) over a candidate set on a regular \(100\times 100\) rectangular grid.

Fig. 2
figure 2

The contour plot of the response surface of the computer simulator in (18)

Fig. 3
figure 3

Illustration of the follow-up point selection method using the EI criterion for contour estimation from the computer model (18). The black solid circles denote the training points, blue dots represent nonzero improvement value, i.e. \(\{{\mathbf{x}}: |y({\mathbf{x}})-a| \le \epsilon ({\mathbf{x}})\}\) for the contour level \(a=300\), the contour lines display \(\log (E[I({\mathbf{x}})])\) values, and the red solid circle shows the maximizer of the EI criterion

Next, we consider the simultaneous estimation of three contours at levels \(a_1 = 150\), \(a_2 = 300\) and \(a_3 = 600\) using the MC criterion in (16). Figure 4 shows the inputs from a \(50\times 50\) grid candidate set that achieve nonzero improvement (13), and the point that maximizes the MC criterion. This point, depicted by a red empty circle in Fig. 4, is in fact from the set of points that yield nonzero improvement around the contour level \(a_1=150\).

Fig. 4
figure 4

Illustration of the follow-up point selection method using the MC criterion (at levels \(a_1=150\), \(a_2=300\), \(a_3=600\)) for the computer model (18). The black solid circles denote 20 training points. The purple square circles, blue pluses, and red triangles represent improvement around the three contour levels, respectively. The contour lines display \(\log (E(I({\mathbf{x}})))\), and the red circle represents the maximizer of the MC criterion in (16)

Figure 5 illustrates the complete sequential design scheme with 20 initial design points and 30 follow-up design points for simultaneously estimating three contours at levels \(a_1=150\), \(a_2=300,\) and \(a_3=600\). The red squares are the new follow-up points, and each label gives the order in which the point is added. The last panel displays the squared distance between the estimated contour and the true contour at each stage. It can be observed from Fig. 5 that the estimated uncertainty bands around the three contours become narrower and more accurate. It can also be seen that more points are added to estimate the contour level \(a_1=150\) than the other two contours. Some points, such as the second one, lie away from the contour bands.

Fig. 5
figure 5

Illustration of the MC criterion for contour levels \(a_1=150\), \(a_2=300\) and \(a_3=600\) with \(n_0=20\) initial design points and 30 follow-up points. The accuracy in f is measured by the squared distance between the estimated contour after adding i-follow-up points and the true contour

It is clear from the two examples that the resulting designs do not have the conventional space-filling property. This is desirable as the objective is an overall good fit of the response surface and not to explore the input space. However, as illustrated in Fig. 5, a significant fraction of design points tend to line up on the pre-specified contours, which could lead to biased designs if \(a_1, \ldots , a_k\) are not chosen appropriately. Next, we propose an efficient method of selecting contour levels.

4 Sequential Estimation of Contours for Global Fitting

In this section, we propose a new approach for choosing the follow-up design points. In contrast to the previous section, where multiple contours were estimated simultaneously for global fitting, we adopt the EI criterion for estimating only one contour level at each stage. That is, at each stage, we choose a contour level and find the design point that maximizes the criterion (12) at that level. The important issue then is how to choose the contour level at each stage, and we propose the following automatic approach. Suppose at stage j the training data are \(\{ ({\mathbf{x}}_i, y_i), i=1,\ldots ,n\}\) and the corresponding emulator gives the predictive distribution \(y({\mathbf{x}}) \sim N(\hat{y}({\mathbf{x}}), s^2({\mathbf{x}}))\) for any input \({\mathbf{x}}\). Let the candidate set for the next follow-up point be \({\mathbf{x}}^*_1, \ldots , {\mathbf{x}}^*_m\) and

$$\begin{aligned} {\mathbf{x}}^*_{opt} = \underset{1 \le i \le m}{\arg \max }\ s^2({\mathbf{x}}^*_i). \end{aligned}$$

Then, we choose the contour level at stage j as \(a_j = \hat{y}({\mathbf{x}}^*_{opt})\). In other words, at each stage, we set the contour level to the fitted response at the candidate point with maximum predictive variance. This encourages exploration of the area with maximum uncertainty.
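This selection rule is straightforward to implement once the emulator's predictions over the candidate set are available. The following is a minimal sketch in Python (the paper's implementation is in R; the function name is hypothetical, and `y_hat` and `s2` stand for \(\hat{y}({\mathbf{x}}^*_i)\) and \(s^2({\mathbf{x}}^*_i)\) evaluated at the m candidate points):

```python
import numpy as np

def choose_contour_level(y_hat, s2):
    """Stage-j contour level: a_j = y_hat at the candidate point
    with maximum predictive variance (illustrative sketch)."""
    i_opt = int(np.argmax(s2))     # index of x*_opt
    return y_hat[i_opt], i_opt

# toy illustration with made-up predictions at m = 5 candidate points
y_hat = np.array([1.2, 0.7, 2.5, 1.9, 0.3])
s2    = np.array([0.10, 0.40, 0.05, 0.25, 0.15])
a_j, i_opt = choose_contour_level(y_hat, s2)
# the second candidate (index 1) has the largest variance, so a_j = 0.7
```

The EI criterion (12) is then evaluated with this single contour level \(a_j\) to select the next follow-up point.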

Example 2 (contd.) Consider finding a design for global fitting of the computer simulator in Example 2. The procedure starts with an initial design of size \(n_0=10\) obtained via maximin Latin hypercube sampling and \(n-n_0=30\) follow-up points are chosen as per the proposed sequential strategy. Figure 6 displays the follow-up design points found by the proposed method, i.e. the sequential contour estimation-based EI criterion, as well as the trace plot of the contour values.

Fig. 6

Illustration of the sequential contour estimation-based EI criterion for global fitting with \(n_0=10\) and 30 added points in Example 2: a the initial points (in black) and the added points (in red); b the contour value versus the stage j, for \(j = 1, \ldots , 30\)

Note that the resulting design is more space-filling compared to the systematic layout of points along the contour lines in Fig. 5. Still, the design is not completely space-filling, and it has some pairs of close-by points.

Before concluding this section, we make some remarks on the computational cost of the approaches. We implemented all four methods (EIGF, SMED, MC, and sequential contour (SC)) in R, using the GPfit package to fit the GP models and to obtain the predicted mean and variance, \(\hat{y}({\mathbf{x}})\) and \(s^2({\mathbf{x}})\), for every \({\mathbf{x}}\) in the test set. For all four methods, the overall sequential framework remains the same, but the design criteria for choosing the follow-up trial differ. From a computational cost standpoint, the evaluation of the EIGF criterion is dominated by finding \(y({\mathbf{x}}_{j*})\), which amounts to computing and sorting the distances between \({\mathbf{x}}_0\) (the candidate test point) and the n training points \(\{{\mathbf{x}}_1, \ldots , {\mathbf{x}}_n\}\). The computationally dominating part of the SMED criterion is the evaluation of the distances between \({\mathbf{x}}_0\) and the n training points \(\{{\mathbf{x}}_1, \ldots , {\mathbf{x}}_n\}\), followed by a sum (which is inexpensive). Thus, the SMED evaluation should be cheaper than that of the EIGF criterion. For the MC criterion with k contour levels, one has to sort \(2k-2\) numeric strings of size two each and evaluate the normal cumulative distribution function 2k times. For the SC criterion, no sorting is required, and the normal cumulative distribution function has to be computed for only two values; however, the SC method requires an additional sorting of the \(s^2({\mathbf{x}})\) values over the test set.
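To make the dominating step of the EIGF evaluation concrete, the sketch below computes \(y({\mathbf{x}}_{j*})\), the response at the training point nearest to a candidate point \({\mathbf{x}}_0\). This is an illustrative Python sketch with made-up data (the actual implementation discussed above is in R):

```python
import numpy as np

def nearest_training_response(x0, X_train, y_train):
    """Return y(x_{j*}): the response at the training point closest to x0.
    Computing these n distances dominates the per-candidate cost of EIGF."""
    d2 = np.sum((X_train - x0) ** 2, axis=1)  # squared Euclidean distances
    return y_train[int(np.argmin(d2))]

# three training points in [0, 1]^2 with made-up responses
X = np.array([[0.1, 0.2], [0.8, 0.9], [0.5, 0.5]])
y = np.array([10.0, 20.0, 15.0])
y_star = nearest_training_response(np.array([0.4, 0.5]), X, y)  # nearest is [0.5, 0.5]
```

For n training points in d dimensions, this step costs O(nd) per candidate, which is repeated for every point in the candidate set at every stage.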

5 Simulated Examples

In this section, we conduct a simulation study to demonstrate the effectiveness of the proposed sequential design approaches. Specifically, we compare the proposed approaches with the following methods:

  (a) a one-shot maximin Latin hypercube design;

  (b) the sequential D-optimal design in the R package tgp;

  (c) the sequential approach by Lam and Notz [22];

  (d) the sequential minimum energy design in Joseph et al. [20];

  (e) the proposed multiple contours estimation-based criterion in Sect. 3;

  (f) the proposed sequential contour estimation-based criterion in Sect. 4.

For approach (e), we use 10 equally spaced contour levels, that is, k is set to 10. These methods are denoted by ‘maximinLHD’, ‘tgp’, ‘EIGF’, ‘SMED’, ‘MC_10’, and ‘SC_var’.

Several criteria can be used to evaluate the performance of the design approaches under comparison. We adopt the root-mean-squared prediction error (RMSPE), given by

$$\begin{aligned} \hbox {RMSPE} = \sqrt{\frac{1}{|{\mathcal {X}}_{\mathrm{pred}}|}\sum _{{\mathbf{x}}\in {\mathcal {X}}_{\mathrm{pred}}} (\hat{y}({\mathbf{x}}) - y({\mathbf{x}}))^2}, \end{aligned}$$
(19)

where \(\hat{y}({\mathbf{x}})\) and \(y({\mathbf{x}})\) are the predicted response and the true response at the input \({\mathbf{x}}\) in the hold-out set \({\mathcal {X}}_{\mathrm{pred}}\). Another criterion we use is the maximum error, given by

$$\begin{aligned} \hbox {Maximum error} = \max _ {{\mathbf{x}}\in {\mathcal {X}}_{\mathrm{{pred}}}} | \hat{y}({\mathbf{x}}) - y({\mathbf{x}})|. \end{aligned}$$
(20)
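Both accuracy measures in (19) and (20) are straightforward to compute from predictions on a hold-out set. A minimal Python sketch (illustrative only, with made-up numbers; the experiments in this article are run in R):

```python
import numpy as np

def rmspe(y_hat, y_true):
    """Root-mean-squared prediction error over a hold-out set, as in Eq. (19)."""
    return float(np.sqrt(np.mean((y_hat - y_true) ** 2)))

def max_error(y_hat, y_true):
    """Maximum absolute prediction error over a hold-out set, as in Eq. (20)."""
    return float(np.max(np.abs(y_hat - y_true)))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_hat  = np.array([1.1, 1.9, 3.0, 4.4])
# rmspe = sqrt((0.01 + 0.01 + 0 + 0.16) / 4) ≈ 0.2121; max error ≈ 0.4
```

RMSPE summarizes average accuracy over the hold-out set, while the maximum error captures the worst-case local behavior of the fit.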

For each example below, the initial design for sequential designs is a maximin Latin hypercube design of \(n_0\) runs generated using the R package SLHD [1]. The model fitting is implemented using the default setting of the function GP_fit in the R package GPfit. The test data are a random Latin hypercube design of 1000d points where d is the number of input variables. The parameter \(\alpha\) in ‘MC_10’ and ‘SC_var’ is set to be 2.

Example 3

We consider the computer model with two input variables \(x_1\) and \(x_2\),

$$\begin{aligned} y= & {} \left( x_2 - \frac{5.1}{4\pi ^2} x_1^2 + \frac{5}{\pi }x_1-6\right) ^2 \nonumber \\&+\ 10 \left( 1- \frac{1}{8\pi }\right) \cos (x_1) + 10,\quad -5 \le x_1 \le 10,\ 0 \le x_2 \le 15. \end{aligned}$$
(21)

This model is known as the Branin function [9]. We use \(n_0=10\) initial design points, and the total run size budget is 30. Figure 7 displays the boxplots of the RMSPEs and maximum errors of the different design approaches over 100 simulations. The results show that, in this example, the one-shot approach ‘maximinLHD’ is the worst, while the approaches ‘MC_10’ and ‘SC_var’ are comparably better than the others.
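For reference, the Branin function in (21) is simple to code directly; a Python sketch (the experiments in the paper are run in R):

```python
import math

def branin(x1, x2):
    """Branin test function, Eq. (21); domain -5 <= x1 <= 10, 0 <= x2 <= 15."""
    return ((x2 - 5.1 / (4 * math.pi ** 2) * x1 ** 2 + 5 / math.pi * x1 - 6) ** 2
            + 10 * (1 - 1 / (8 * math.pi)) * math.cos(x1) + 10)

# one of the three well-known global minima of Branin
print(round(branin(math.pi, 2.275), 5))  # → 0.39789
```

The function has three global minima and varies sharply near the domain boundary, which makes a purely space-filling design less efficient than the adaptive approaches.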

Fig. 7

The boxplots of RMSPEs and maximum errors of the methods ‘maximinLHD’, ‘tgp’, ‘EIGF’, ‘SMED’, ‘MC_10’, and ‘SC_var’ for the computer model in (21) with \(n_0=10\) and 20 added points over 100 simulations

Example 4

We consider the computer model with three input variables \(x_1\), \(x_2\), and \(x_3\) [8]

$$\begin{aligned} y = 4(x_1-2+8x_2-8x^2_2)^2+(3-4x_2)^2+16\sqrt{x_3+1}(2x_3-1)^2, 0 \le x_i \le 1, \hbox { for} \ i=1,\ldots ,3. \end{aligned}$$
(22)

We use \(n_0=20\) initial design points, and the total run size budget is 60. Figure 8 displays the boxplots of RMSPEs and maximum errors of the different design approaches over 100 simulations. Again, the one-shot approach ‘maximinLHD’ is the worst. The approaches ‘tgp’, ‘MC_10’, and ‘SC_var’ are comparably better than the others in terms of RMSPE, and the proposed approaches ‘MC_10’ and ‘SC_var’ are significantly better than the others in terms of maximum error.
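The test function in (22) can likewise be coded directly; a hedged Python sketch (the function name is ours, for illustration only):

```python
import math

def f22(x1, x2, x3):
    """Three-dimensional test function, Eq. (22); domain 0 <= xi <= 1."""
    return (4 * (x1 - 2 + 8 * x2 - 8 * x2 ** 2) ** 2
            + (3 - 4 * x2) ** 2
            + 16 * math.sqrt(x3 + 1) * (2 * x3 - 1) ** 2)

# sanity check at the origin: 4*(-2)^2 + 3^2 + 16*sqrt(1)*(-1)^2
print(f22(0.0, 0.0, 0.0))  # → 41.0
```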

Fig. 8

The boxplots of RMSPEs and maximum errors of the methods ‘maximinLHD’, ‘tgp’, ‘EIGF’, ‘SMED’, ‘MC_10’, and ‘SC_var’ for the computer model in (22) with \(n_0=20\) and 40 added points over 100 simulations

Example 5

We consider the computer model with four input variables \(x_1\), \(x_2\), \(x_3\), and \(x_4\) [31]

$$\begin{aligned} y = \frac{x_1}{2}\left[ \sqrt{1+(x_2+x_3^2)\frac{x_4}{x_1^2}}-1\right] +(x_1+3x_4)\exp (1+\sin (x_3)),\quad 0\le x_i \le 1, \hbox { for } i=1,\ldots ,4. \end{aligned}$$
(23)

We use \(n_0=27\) initial design points, and the total run size budget is 80. Figure 9 displays the boxplots of RMSPEs and maximum errors of the different design approaches over 100 simulations. Here, the approach ‘SMED’ is the worst, followed by ‘maximinLHD’, in terms of both criteria. The performance of the other four approaches is similar based on RMSPE. However, based on the maximum error, the proposed approaches give more accurate predictions.
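The test function in (23) can be coded as follows; a hedged Python sketch (the function name is ours; note the expression is undefined at \(x_1=0\) because of the \(x_1^2\) in the denominator):

```python
import math

def f23(x1, x2, x3, x4):
    """Four-dimensional test function, Eq. (23); domain 0 <= xi <= 1 (x1 > 0)."""
    return (x1 / 2 * (math.sqrt(1 + (x2 + x3 ** 2) * x4 / x1 ** 2) - 1)
            + (x1 + 3 * x4) * math.exp(1 + math.sin(x3)))

print(f23(1.0, 1.0, 1.0, 1.0))  # ≈ 25.59 at the upper corner of the domain
```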

Fig. 9

The boxplots of RMSPEs and maximum errors of the methods ‘maximinLHD’, ‘tgp’, ‘EIGF’, ‘SMED’, ‘MC_10’, and ‘SC_var’ for the computer model in (23) with \(n_0=27\) and 53 added points over 100 simulations

6 Concluding Remarks

In this article, we have developed two sequential design approaches for accurately predicting the output of a complex computer code. The approaches are based on expected improvement criteria for simultaneously or sequentially estimating contours. We used a GP model as a surrogate for the computer simulator, which is an integral component of the proposed criteria for identifying the follow-up trials. Numerical examples demonstrate that the proposed approaches can significantly outperform existing approaches.

A few remarks are in order. First, one can easily extend the methodology to a GP surrogate with a small nugget term, which is typically used either to increase the numerical stability of the model or to account for small noise in the simulator output. Second, if some other surrogate is used instead of a GP model, the key ideas, such as the formulation of the improvement function and the sequential estimation of contour levels, can still be retained. Of course, the resulting expected improvement criteria would change, and one may not even end up with a closed-form expression for the final design criterion for selecting follow-up points. Third, this article focuses on one-point-at-a-time fully sequential designs. However, in some applications, a batch sequential design is more desirable and practical than a fully sequential design [24]. Extending the expected improvement criteria to batch sequential designs is not trivial and would require a thorough investigation. We hope to address this issue in the near future. Other future work includes the application of the proposed contour estimation-based sequential design approaches to global fitting for computer experiments with both qualitative and quantitative factors [7] and dynamic computer experiments [40].