Improving kriging surrogates of high-dimensional design models by Partial Least Squares dimension reduction

Bouhlel, Mohamed Amine; Bartoli, Nathalie; Otsmane, Abdelkader; Morlier, Joseph

doi:10.1007/s00158-015-1395-9

Improving kriging surrogates of high-dimensional design models by Partial Least Squares dimension reduction

REVIEW ARTICLE
Published: 14 January 2016

Volume 53, pages 935–952, (2016)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Structural and Multidisciplinary Optimization Aims and scope Submit manuscript

Improving kriging surrogates of high-dimensional design models by Partial Least Squares dimension reduction

Download PDF

Mohamed Amine Bouhlel¹,
Nathalie Bartoli²,
Abdelkader Otsmane¹ &
…
Joseph Morlier³

3160 Accesses
132 Citations
Explore all metrics

Abstract

Engineering computer codes are often computationally expensive. To lighten this load, we exploit new covariance kernels to replace computationally expensive codes with surrogate models. For input spaces with large dimensions, using the kriging model in the standard way is computationally expensive because a large covariance matrix must be inverted several times to estimate the parameters of the model. We address this issue herein by constructing a covariance kernel that depends on only a few parameters. The new kernel is constructed based on information obtained from the Partial Least Squares method. Promising results are obtained for numerical examples with up to 100 dimensions, and significant computational gain is obtained while maintaining sufficient accuracy.

An efficient kriging modeling method for high-dimensional design problems based on maximal information coefficient

Article 24 July 2019

Combination of optimization-free kriging models for high-dimensional problems

Article 27 October 2023

Ensemble of Kriging with Multiple Kernel Functions for Engineering Design Optimization

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction and main contribution

In recent decades, because simulation models have striven to more accurately represent the true physics of phenomena, computational tools in engineering have become ever more complex and computationally expensive. To address this new challenge, a large number of input design variables, such as geometric representation, are often considered. Thus, to analyze the sensitivity of input design variables or to search for the best point of a physical objective under certain physical constraints (i.e., global optimization), a large number of computing iterations are required, which is impractical when using simulations in real time. This is the main reason that surrogate modeling techniques have been growing in popularity in recent years. Surrogate models, also called metamodels, are vital in this context and are widely used as substitutes for time-consuming high-fidelity models. They are mathematical tools that approximate coded simulations of a few well-chosen experiments that serve as models for the design of experiments. The main role of surrogate models is to describe the underlying physics of the phenomena in question. Different types of surrogate models can be found in the literature, such as regression, smoothing spline (Wahba and Craven 1978; Wahba 1990), neural networks (Haykin 1998), radial basis functions (Buhmann 2003) and Gaussian-process modeling (Rasmussen and Williams 2006).

In this article, we focus on the kriging model because it estimates the prediction error. This model is also referred to as the Gaussian-process model (Rasmussen and Williams 2006) and was presented first in geostatistics (see, e.g., Cressie 1988 or Goovaerts 1997) before being extended to computer experiments and machine learning (Schonlau 1998; Sasena 2002; Jones et al. 1998; Picheny et al. 2010). The kriging model has become increasingly popular due to its flexibility in accurately imitating the dynamics of computationally expensive simulations and its ability to estimate the error of the predictor. However, it suffers from some well-known drawbacks in high dimension, which may be due to multiple causes. For starters, the size of the covariance matrix of the kriging model may increase dramatically if the model requires a large number of sample points. As a result, inverting the covariance matrix is computationally expensive. The second drawback is the optimization of the subproblem, which involves estimating the hyper-parameters for the covariance matrix. This is a complex problem that requires inverting the covariance matrix several times. Some recent works have addressed the drawbacks of high-dimensional Gaussian processes (Hensman et al. 2013; Damianou and Lawrence 2013; Durrande et al. 2012) or the large-scale sampling of data (Sakata et al. 2004). One way to reduce CPU time when constructing a kriging model is to reduce the number of hyper-parameters, but this approach assumes that the kriging model exhibits the same dynamics in all directions (Mera 2007).

Thus, because estimating the kriging parameters can be time consuming, especially with dimensions as large as 100, we present herein a new method that combines the kriging model with the Partial Least Squares (PLS) technique to obtain a fast predictor. Like the method of principle components analysis (PCA), the PLS technique reduces dimension and reveals how inputs depend on outputs. PLS is used in this work because PCA only exposes dependencies between inputs. Information given by PLS is integrated in the covariance structure of the kriging model to reduce the number of hyper-parameters. The combination of kriging and PLS is abbreviated KPLS and allows us to build a fast kriging model because it requires fewer hyper-parameters in its covariance function; all without eliminating any input variables from the original problem. In general, the number of kriging parameters is equal to the number of dimension which is reduced to at maximum 4 parameters with our approach.

The KPLS methods is used for many academic and industrial verifications, and promising results have been obtained for problems with up to 100 dimensions. The cases used in this paper do not exceed 100 input variables, which should be quite sufficient for most engineering problems. Problems with more than 100 inputs may lead to memory difficulties with the toolbox Scikit-learn (version 0.14), on which the KPLS method is based.

This paper is organized as follows: Section 2 summarizes the theoretical basis of the universal kriging model, recalling the key equations. The proposed KPLS model is then described in detail in Section 3 by using the kriging equations. Section 4 compares and analyzes the results of the KPLS model with those of the kriging model when applied to classic analytical examples and some complex engineering examples. Finally, Section 5 concludes and gives some perspectives.

2 Universal kriging model

To understand the mathematics of the proposed methods, we first review the kriging equations. The objective is to introduce the notation and to briefly describe the theory behind the kriging model. Assume that we have evaluated a cost deterministic function at n points x ⁽ⁱ⁾ (i = 1,…,n) with

$$\textbf{x}^{(i)} = \left[\textbf{x}^{(i)}_{1},\ldots,\textbf{x}^{(i)}_{d}\right] \in B \subset \mathbb{R}^{d}, $$

and we denote by X the matrix [x ^(1)t,…,x ^(n)t]^t. For simplicity, B is considered to be a hypercube expressed by the product between intervals of each direction space, i.e., B = ∏j = 1d[a _j,b _j], where a _j,b _j ∈ ℝ with a _j ≤ b _j for j = 1,…,d. Simulating these n inputs gives the outputs y = [y ⁽¹⁾,…,y ⁽ⁿ⁾]^t with y ⁽ⁱ⁾ = y(x ⁽ⁱ⁾) for i = 1,…,n. We use ŷ(x) to denote the prediction of the true function y(x) which is considered as a realization of a stochastic process Y(x) for all x∈B. For the universal kriging model (Roustant et al. 2012; Picheny et al. 2010), Y is written as

$$ Y(\textbf{x}) = \sum\limits_{j=1}^{m}\beta_{j}f_{j}(\textbf{x})+Z(\textbf{x}), $$

(1)

where, for j = 1,…,m, f _j is a known independent basis function, β _j ∈ ℝ is an unknown parameter, and Z is a random variable defined by Z(x)∼(0,k), with k being a stationary covariance function, also called a covariance kernel. The kernel function k can be written as

$$ k(\textbf{x},\textbf{x}^{\prime}) = \sigma^{2} r(\textbf{x},\textbf{x}^{\prime})=\sigma^{2} r_{\textbf{x}\textbf{x}^{\prime}}\ \forall\ \textbf{x},\textbf{x}^{\prime}\in B, $$

(2)

where σ ² is the process variance and r _{x
x} ^′ is the correlation function between x and x ^′. However, the correlation function r depends on some hyper-parameters 𝜃 and it is considered to be known. We also denote the n × 1 vector as r _{x
X} = [r _{x
x} ⁽¹⁾,…,r _{x
x} ⁽ⁿ⁾]^t and the n × n covariance matrix as R = [r _x ⁽¹⁾ X,…,r _x ⁽ⁿ⁾ X].

2.1 Derivation of prediction formula

Under the hypothesis above, the best linear unbiased predictor for y(x), given the observations y, is

$$ \hat{y}(\textbf{x}) = \textbf{f}(\textbf{x})^{t}\hat{\beta} + \textbf{r}_{\textbf{x} \textbf{X}}^{t}\textbf{R}^{-1}\left( \textbf{y}-\textbf{F}\hat{\beta}\right), $$

(3)

where f(x)=[f ₁(x),…,f _m(x)]^t is the m × 1 vector of basis functions, F = [f(x ⁽¹⁾),…,f(x ⁽ⁿ⁾)]^t is the n × m matrix, and β̂ is the vector of generalized least-square estimates of β = [β ₁,…,β _m]^t, which is given by

$$ \hat{\beta}= \left[ \begin{array}{c} \hat{\beta}_{1}\\ \vdots\\ \hat{\beta}_{m}\\ \end{array} \right] =\left( \textbf{F}^{t}\textbf{R}^{-1}\textbf{F}\right)^{-1}\textbf{F}^{t}\textbf{R}^{-1}\textbf{y}. $$

(4)

Moreover, the universal kriging model provides an estimate of the variance of the prediction, which is given by

$$ s^{2}(\textbf{x}) = \hat{\sigma}^{2}\left( 1-\textbf{r}_{\textbf{x} \textbf{X}}^{t}\textbf{R}^{-1}\textbf{r}_{\textbf{x} \textbf{X}}\right), $$

(5)

with

$$ \hat{\sigma}^{2}=\frac{1}{n}\left( \textbf{y}-\textbf{F}\hat{\beta}\right)^{t}\textbf{R}^{-1}\left( \textbf{y}-\textbf{F}\hat{\beta}\right). $$

(6)

For more details of the derivation of the prediction formula, see, for instance, Sasena (2002) or Schonlau (1998). The theory of the proposed method has been expressed in the same way as for the universal kriging model. The numerical examples in Section 4 use the ordinary kriging model, which is a special case of the universal model, but with f(x)={1} (and m = 1). For the ordinary kriging model, (3), (4), and (6) are then replaced by the equations given in Appendix A.

Note that the assumption of known covariance with known hyper-parameters 𝜃 is unrealistic in reality. For this reason, the covariance function is typically chosen from among a parametric family of kernels. Table 12 in Appendix B gives some examples of typical stationary kernels. The number of hyper-parameters required for the estimate is typically greater than (or equal to) the number of input variables. In this work, we use in the following a Gaussian exponential kernel:

$$k(\textbf{x},\textbf{x}^{\prime})=\sigma^{2}\prod\limits_{i=1}^{d}\exp\left( -\theta_{i}\left( \textbf{x}_{i}-{\textbf{x}^{\prime}}_{i}\right)^{2}\right)\ \forall\ \theta_{i}\in \mathbb{R}^{+}.$$

By applying some elementary operations to existing kernels, we can construct new kernels. In this work, we use the property that the tensor product of covariances is a covariance kernel in the product space. More details are available in Rasmussen and Williams (2006), Durrande (2011), Bishop (2007), Liem and Martins (2014).

2.2 Estimation of hyper-parameters 𝜃

The key point of the kriging approximation is how it estimates the hyper-parameters 𝜃, so its main steps are recalled here, along with some mathematical details.

One of the major challenges when building a kriging model is the complexity and difficulty of estimating the hyper-parameters 𝜃, in particular when dealing with problems with many dimensions or with a large number of sampling points. In fact, using (3) to make a kriging prediction requires inverting an n × n matrix, which typically has cost of (n ³), where n is the number of sampling points (Braham et al. 2014). The hyper-parameters are estimated using maximum likelihood (ML) or cross validation (CV), which are based on observations. Bachoc compared the ML and CV techniques (Bachoc 2013) and concluded that, in most cases studied, the CV variance is larger. The ML method is widely used to estimate the hyper-parameters 𝜃; it is also used in this paper. In practice, the following log-ML estimate is often used:

$$\begin{array}{@{}rcl@{}} \log\textrm{-ML}(\theta) &=& -\frac{1}{2}\left[n\ln(2\pi\sigma^{2})+\ln (\det \textbf{R}(\theta)) \right.\\ &&\left.+(\textbf{y} - \textbf{F}\beta)^{t}\textbf{R}(\theta)^{-1}(\textbf{y}-\textbf{F}\beta)/\sigma^{2}\right]. \end{array} $$

(7)

Inserting β̂ and σ̂² given by (4) and (6), respectively, into the expression (7), we get the following so-called concentrated likelihood function, which depends only on the hyper-parameters 𝜃:

$$\begin{array}{@{}rcl@{}} \textrm{log-ML}(\theta)&=&-\frac{1}{2}[n\ln\hat{\sigma}^{2}+\ln\det \textbf{R}(\theta)]\\ &=&-\frac{1}{2}\left[n\ln\left( \frac{1}{n}(\textbf{y}-\textbf{F}(\textbf{F}^{t}\textbf{R}^{-1}\textbf{F})^{-1}\textbf{F}^{t}\textbf{R}^{-1}\textbf{y})^{t}\right.\right.\\ &&\qquad\left.\times\textbf{R}^{-1}(\textbf{y}-\textbf{F}(\textbf{F}^{t}\textbf{R}^{-1}\textbf{F})^{-1}\textbf{F}^{t}\textbf{R}^{-1}\textbf{y})\vphantom{\frac{1}{n}}\right)\\ &&\qquad\left.+\ln\det\textbf{R}\vphantom{\left( \frac{1}{n}\right)}\right]. \end{array} $$

(8)

To facilitate reading, R(𝜃) has been replaced by R in the last line of (8).

Maximizing (8) is very computationally expensive for high dimensions and when using a large number of sample points because the (n × n) matrix R(𝜃) in (8) must be inverted. The maximization problem is often solved using genetic algorithms (see Forrester and Sobester 2008 for more details). In this work, we use the derivative-free optimization algorithm COBYLA that was developed by Powell (1994). COBYLA is a sequential trust-region algorithm that uses linear approximations for the objective and constraint functions.

Figure 1 recalls the principal stages of building a kriging model, and each step is briefly outlined below:

1.
The user must provide the initial design of experiments (X, y) and the parametric family of the covariance function k.
2.
To derive the prediction formula, the kriging algorithm assumes that all parameters of k are known.
3.
Under the hypothesis of the kriging algorithm, we estimate hyper-parameters 𝜃 from the concentrated likelihood function given by (8) and by using the COBYLA algorithm.
4.
Finally, we calculate the prediction (3) and the associated estimation error (5) after estimating all hyper-parameters of the kriging model.

3 Kriging model combined with Partial Least Squares

As explained above, maximizing the concentrated likelihood (8) can be time consuming when the number of covariance parameters is large, which typically occurs in large dimension. Solving this problem can be accelerated by combing the PLS method and the kriging model. The 𝜃 parameters from the kriging model represent the range in any spatial direction. Assuming, for instance, that certain values are less significant for the response, then the corresponding 𝜃 _i (i = 1,…,d) will be very small compared to the other 𝜃 parameters. The PLS method is a well-known tool for high-dimensional problems and consists of maximizing the variance by projecting onto smaller dimensions while monitoring the correlation between input variables and the output variable. In this way, the PLS method reveals the contribution of all variables—the idea being to use this information to scale the 𝜃 parameters.

In this section we propose a new method that can be used to build an efficient kriging model by using the information extracted from the PLS stage. The main steps for this construction are as follows:

1.
Use PLS to define weight parameters.
2.
To reduce the number of hyper-parameters, define a new covariance kernel by using the PLS weights.
3.
Optimize the parameters.

The key mathematical details of this construction are explained in the following.

3.1 Linear transformation of covariance kernels

Let x be a vector space over the hypercube B. We define a linear map given by

$$ \begin{array}{cccl} F:&B&\longrightarrow&B^{\prime},\\ &\textbf{x}& \longmapsto &\left[\alpha_{1}\textbf{x}_{1},\dotsc,\alpha_{d} \textbf{x}_{d}\right], \end{array} $$

(9)

where α ₁,…,α _d ∈ ℝ and B ^′ is a hypercube included in ℝ ^d (B ^′ can be different from B). Let k be an isotropic covariance kernel with k:B ^′×B ^′→ℝ. Since k is isotropic, the covariance kernel k(F(⋅),F(⋅)) depends on a single parameter, which must be estimated. However, if α ₁,…,α _d are well chosen, then we can use k(F(⋅),F(⋅)). In this case, the linear transformation F allows us to approach the isotropic case (Zimmerman and Homer 1991). In the present work, we choose α ₁,…,α _d based on information extracted from the PLS technique.

3.2 Partial Least Squares

The PLS method is a statistical method that finds a linear relationship between input variables and the output variable by projecting input variables onto a new space formed by newly chosen variables, called principal components (or latent variables), which are linear combinations of the input variables. This approach is particularly useful when the original data are characterized by a large number of highly collinear variables measured on a small number of samples. Below, we briefly describe how the method works. For now, suffice it to say that only the weighting coefficients are central to understanding the new KPLS approach. For more details on the PLS method, please see Helland (1988), Frank and Friedman (1993), Alberto and González (2012).

The PLS method is designed to search out the best multidimensional direction in X space that explains the characteristics of the output y. After centering and scaling the (n × d)-sample matrix X and the response vector y, the first principal component t ⁽¹⁾ is computed by seeking the best direction w ⁽¹⁾ that maximizes the squared covariance between t ⁽¹⁾ = X w ⁽¹⁾ and y:

$$ \textbf{w}^{(1)} = \left\lbrace \begin {array}{c} \arg\max\limits_{\textbf{w}} \,\textbf{w}^{t} \textbf{X}^{t} \textbf{y} \textbf{y}^{t} \textbf{X} \textbf{w} \\ \text{such that}~ \textbf{w}^{t} \textbf{w}\,=\,1. \end {array}\right. $$

(10)

The optimization problem (10) is maximized when w ⁽¹⁾ is the eigenvector of the matrix X ^t y y ^t X corresponding to the eigenvalue with the largest absolute value; the vector w ⁽¹⁾ contains the X weights of the first component. The largest eigenvalue of problem (10) can be estimated by the power iteration method introduced by Lanczos (1950).

Next, the residual matrix from X = X ⁽⁰⁾ space and from y = y ⁽⁰⁾ are calculated; these are denoted X ⁽¹⁾ and y ⁽¹⁾, respectively:

$$ \begin {array}{ccc} \textbf{X}^{(1)} &=& \textbf{X}^{(0)} - \textbf{t}^{(1)} \textbf{p}^{(1)},\\ \textbf{y}^{(1)} &=& \textbf{y}^{(0)} - c_{1} \textbf{t}^{(1)},\\ \end {array} $$

(11)

where p ⁽¹⁾ (a 1×d vector) contains the regression coefficients of the local regression of X onto the first principal component t ⁽¹⁾, and c ₁ is the regression coefficient of the local regression of y onto the first principal component t ⁽¹⁾. The system (11) is the local regression of X and y onto the first principal component.

Next, the second principal component, which is orthogonal to the first, can be sequentially computed by replacing X by X ⁽¹⁾ and y by y ⁽¹⁾ to solve the system (10). The same approach is used to iteratively compute the other principal components. To illustrate this process, a simple three-dimensional (3D) example with two principal components is given in Fig. 2. In the following, we use h to denote the number of principal components retained.

The principal components represent the new coordinate system obtained upon rotating the original system with axes, x ₁,…,x _d (Alberto and González 2012). For l = 1,…,h, t ^(l) can be written as

$$ \textbf{t}^{(l)} = \textbf{X}^{(l-1)}\textbf{w}^{(l)} = \textbf{X}\textbf{w}_{*}^{(l)}. $$

(12)

This important relationship is used for coding the method. The following matrix W _∗ = [w∗(1),…,.w∗(h)] is obtained by Manne (1987):

$$\textbf{W}_{*}=\textbf{W}\left( \textbf{P}^{t}\textbf{W}\right)^{-1},$$

where W = [w ⁽¹⁾,…,w ^(h)] and P = [p ⁽¹⁾ ^t,…,p ^(h) ^t].

The vector w ^(l) corresponds to the principal direction in X space that maximizes the covariance of X ^(l−1) ^t y ^(l−1) y ^(l−1) ^t X ^(l−1). If h = d, the matrix W _∗ = [w∗(1),…,w∗(d)] rotates the coordinate space (x ₁,…,x _d) to the new coordinate space (t ⁽¹⁾,…,t ^(d)), which follow the principal directions w ⁽¹⁾,…,w ^(d).

As mentioned in the introduction, the PLS method is chosen instead of the PCA method because the PLS method shows how the output variable depends on the input variables, whereas the PCA method focuses only on how the input variables depend on each other. In fact, the hyper-parameters 𝜃 for the kriging model depend on how each input variable affects the output variable.

3.3 Construction of new kernels for KPLS models

Let B be a hypercube included in ℝ ^d. As seen in the previous section, the vector w∗(1) is used to build the first principal component t ⁽¹⁾ = X w∗(1), where covariance between t ⁽¹⁾ and y is maximized. The scalars w∗1(1),…,w∗d(1) can then be interpreted as measuring the importance of x ₁,…,x _d, respectively, for constructing the first principal component where its correlation with the output y is maximized. However, we know that the hyper-parameters 𝜃 ₁,…,𝜃 _d (see Table 12 in Appendix B) can be interpreted as measuring how strongly the variables x ₁,…,x _d, respectively, affect the output y. Thus, we define a new kernel k _{k
p
l
s1}:B × B→ℝ given by k ₁(F ₁(⋅),F ₁(⋅)) with k ₁:B × B→ℝ being an isotropic stationary kernel and

$$\begin{array}{@{}rcl@{}} F_{1}:B&\longrightarrow&B,\\ \textbf{x}& \longmapsto &\left[\textbf{w}^{(1)}_{*1} \textbf{x}_{1},\dots,\textbf{w}^{(1)}_{*d} \textbf{x}_{d}\right]. \end{array} $$

(13)

F ₁ goes from B to B because it only works for the new coordinate system obtained by rotating the original coordinate axes, x ₁,…,x _d. Through the first component t ⁽¹⁾, the elements of the vector w∗(1) reflect how x depends on y. However, such information is generally insufficient, so the elements of the vector w∗(1) are supplemented by the information given by the other principal components t ⁽²⁾,…,t ^(h). Thus, we build a new kernel k _{k
p
l
s1:h} sequentially by using the tensor product of all kernels k _{k
p
l
s
l}, which accounts for all this information in only a single covariance kernel:

$$ k_{kpls1:h}(\textbf{x},\textbf{x}^{\prime}) = \prod\limits_{l=1}^{h} k_{l}(F_{l}\left( \textbf{x}\right),F_{l}({\textbf{x}^{\prime}})), $$

(14)

with k _l:B × B→ℝ and

$$ \begin{array}{cccl} F_{l}:&B &\longrightarrow&B\\ &\textbf{x}& \longmapsto &\left[\textbf{w}^{(l)}_{*1} \textbf{x}_{1},\dots,\textbf{w}^{(l)}_{*d} \textbf{x}_{d}\right]. \end{array} $$

(15)

If we consider the Gaussian kernel applied with this proposed approach, we get

$$\begin{array}{@{}rcl@{}} k(\textbf{x},\textbf{x}^{\prime})&=&\sigma^{2}\displaystyle{\prod\limits_{l=1}^{h}\prod\limits_{i=1}^{d}}\exp\left[-\theta_{l} \left( \textbf{w}^{(l)}_{*i}\textbf{x}_{i}-\textbf{w}^{(l)}_{*i}{\textbf{x}^{\prime}}_{i}\right)^{2}\right],\\ &&\forall\ \theta_{l}\in[0,+\infty[. \end{array} $$

Table 13 in Appendix B presents new KPLS kernels based on examples from Table 12 (also in Appendix B) that contain fewer hyper-parameters because h ≪ d. The number of principal components is fixed by the following leave-one-out cross-validation method:

(i) We build KPLS models based on h = 1, then h = 2, … principal components.
(ii) We choose the number of components that minimizes the leave-one-out cross-validation error.

The flowchart given in Fig. 3 shows how the information flows through the algorithm, from sample data, PLS algorithm, kriging hyper-parameters, to final predictor. With the same definitions and equations, almost all the steps for constructing the KPLS model are similar to the original steps for constructing the ordinary kriging model. The exception is the third step, which is highlighted in the solid-red box in Fig. 3. This step uses the PLS algorithm to define the new parametric kernel k _{k
p
l
s1:h} as follows:

a.
initialize the PLS algorithm with l = 1;
b.
if l ≠ 1, compute the residual of X ^(l−1) and y ^(l−1) by using system (11);
c.
compute X weights for iteration l;
d.
define a new kernel k _{k
p
l
s1:h} by using (14);
e.
if the number of iterations is reached, return to step 3, otherwise continue;
f.
update data considering l = l+1.

Note that, if kernels k _l are separable at this point, the new kernel given by (14) is also separable. In particular, if all kernels k _l are of the exponential type (e.g., all Gaussian exponentials), the new kernel given by (14) will be the same type as k _l. The proof is given in Appendix C.

4 Numerical examples

We now present a few analytical and engineering examples to verify the proper functioning of the proposed method. The ordinary kriging model with a Gaussian kernel provides the benchmark against which the results of the proposed combined approach are compared. The Python toolbox Scikit-learn v.014 (Pedregosa et al. 2011) is used to implement these numerical tests. This toolbox provides hyper-parameters for the ordinary kriging. The computations were done on an Intel^® Celeron^® CPU 900 2.20 GHz desktop PC. For the proposed method, we combined an ordinary kriging model with a Gaussian kernel with the PLS method with one to three principal components.

4.1 Analytical examples

We use two academic functions and vary the characteristics of these test problems to cover most of the difficulties faced in the field of substitution models. The first function is g07 (Michalewicz and Schoenauer 1996) with 10 dimensions, which is close to what is required by industry in terms of dimensions,

$$\begin{array}{@{}rcl@{}} y_{g07}(\textbf{x}) &=& {x_{1}^{2}}+{x_{2}^{2}}+x_{1}x_{2}-14x_{1}-16x_{2}\\ &&+(x_{3}-10)^{2}+4(x_{4}-5)^{2}+(x_{5}-3)^{2}\\ &&+2(x_{6}-1)^{2}+5{x_{7}^{2}}+7(x_{8}-11)^{2}\\ &&+2(x_{9}-10)^{2}+(x_{10}-7)^{2}+45,\\ &&-10\leq x_{i}\leq 10,\text{ for} i=1,\ldots,10. \end{array} $$

For this function, we use experiments based on a latin hypercube design with 100 data points to fit models.

The second function is the Griewank function (Regis and Shoemaker 2013), which is used because of its complexity, as illustrated in Fig. 4 for the two-dimensional (2D) case. The function is

$$\begin{array}{@{}rcl@{}} y_{\text{Griewank}}(\textbf{x})&=&\sum\limits_{i=1}^{d} \frac{{x_{i}^{2}}}{4000}-\prod\limits_{i=1}^{d} \cos\left( \frac{x_{i}}{\sqrt{i}}\right)+1,\\ &&-600\leq x_{i}\leq 600,\textrm{ for}~ i=1,\ldots,d. \end{array} $$

Two types of experiments are done with this function. The first is defined over the interval [−600, 600] and has varying dimensions (2, 5, 7, 10, 20, 60). This experiment serves to verify the effectiveness of the proposed approach in both low and high dimensions. It is based on the latin hypercube design and uses n data points to fit models, as mentioned in Table 1.

Table 1 Number of data points used for latin hypercube design for the Griewank test function

Improving kriging surrogates of high-dimensional design models by Partial Least Squares dimension reduction

Abstract

Similar content being viewed by others

An efficient kriging modeling method for high-dimensional design problems based on maximal information coefficient

Combination of optimization-free kriging models for high-dimensional problems

Ensemble of Kriging with Multiple Kernel Functions for Engineering Design Optimization

Explore related subjects

1 Introduction and main contribution

2 Universal kriging model

2.1 Derivation of prediction formula

2.2 Estimation of hyper-parameters 𝜃

3 Kriging model combined with Partial Least Squares

3.1 Linear transformation of covariance kernels

3.2 Partial Least Squares

3.3 Construction of new kernels for KPLS models

4 Numerical examples

4.1 Analytical examples

4.1.1 Comparison with g07 function

4.1.2 Comparison with complex Griewank function over interval [−600, 600]

4.1.3 Comparison with complex Griewank function over interval [−5, 5]

4.2 Industrial examples

4.3 Dimensional limits

5 Conclusion and future work

Abbreviations

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Appendices

Appendix A: Equations for ordinary kriging model

Appendix B: Examples of kernels

Appendix C: Proof of equivalence kernel

Appendix D: Results of Griewank function in 20D and 60D over interval [−5, 5]

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation