
3.1 Introduction

This paper deals with the economical representation of dedicated sets of data that are increasingly available, stemming from various experiments or given by formal expressions. The amount of pertinent information that can be derived from a given massive data set is far smaller than the size of the data itself; therefore, in parallel with the increasing capacity for data acquisition and storage on computer architectures, a sustained effort has been made over the last century to post-process the data and to represent, analyze and extract the relevant information economically. The main idea starts from the fact that the data are dedicated to some phenomenon and thus contain a certain amount of coherence, which can be of two kinds: deterministic or statistical. Among the notions proposed to quantify this coherence are regularity, sparsity, small n-width, etc., which can be either assumed, verified or proven.

The data themselves can be known in different ways: either (i) completely explicitly, for instance (i-1) from an analytic representation or at least access to the values at every point, (i-2) only on a large set of points, or (i-3) through various global measures such as moments; or (ii) implicitly, through a model such as a partial differential equation (PDE). The range of applications is huge; examples can be found in statistics, image and information processing, learning, experiments in mechanics, meteorology, earth sciences, medicine, biology, etc., and the challenge is to computationally process such a large amount of high-dimensional data so as to obtain low-dimensional descriptions that capture much of the phenomena of interest.

We consider the following problem formulation: Let us assume that we are given a (presumably large) set ℱ of functions ϕ ∈ ℱ defined over Ωx ⊂ ℝdx (with dx ≥ 1). Our aim is to find some functions h1, h2,…, hQ : Ωx → ℝ such that every ϕ ∈ ℱ can be well approximated as follows

$$\varphi (x) \approx \sum\limits_{q = 1}^Q {{{\hat \varphi }_q}{h_q}(x),} $$

where Q ≪ dim(span{ℱ}). As said above, the ability of ℱ to possess this property is an assumption. It is precisely stated under the notion of small Kolmogorov n-width, defined as follows:

Let ℱ be a subset of some Banach space \(X\) and \({\mathbb{V}_Q}\) be a generic Q-dimensional subspace of \(X\). The angle between ℱ and \({\mathbb{V}_Q}\) is

$$E(F;{\mathbb{V}_Q}): = \mathop {\sup }\limits_{\varphi \in F} \;\mathop {\inf }\limits_{{v_Q} \in {\mathbb{V}_Q}} {\left\| {\varphi - {v_Q}} \right\|_X}.$$

The Kolmogorov n-width of ℱ in \(X\) is given by

$${d_Q}(F,X): = \inf \{ E(F;{\mathbb{V}_Q})\;|\;{\mathbb{V}_Q}\,a\,Q{\text{-}}dimensional\,subspace\,of\;X\} .$$

The n-width of ℱ thus measures the extent to which the set ℱ can be approximated by a Q-dimensional subspace of \(X\).

This assumption of a small Kolmogorov n-width can be taken for granted, but there are also properties of the elements of ℱ that can lead to such smallness, such as regularity of the functions ϕ ∈ ℱ. As an example, we can quote, in the periodic setting, the well-known Fourier series. Truncated Fourier series are good approximations of the full expansion if the decay rate of the Fourier coefficients is fast enough, i.e. if the functions ϕ have enough continuous derivatives. In this case, the basis is actually multipurpose since it is not dedicated to the particular set ℱ: Fourier series are adapted to any set of regular enough functions, and the more regular the functions are, the better the approximation is. Another property leading ℱ to have a small Kolmogorov n-width is transform sparsity, i.e., we assume that the functions ϕ ∈ ℱ have a sparse expression when written in some orthonormal basis set {ψi}, e.g. an orthonormal wavelet basis, a Fourier basis, or a local Fourier basis, depending on the application: this means that the coefficients \({\hat \varphi _i} = \left\langle {\varphi ,{\psi _i}} \right\rangle \) satisfy, for some p, 0 < p < 2, and some R:

$${\left\| \varphi\right\|_{{\ell ^p}}} = {\left( {\sum\limits_i {{{\left| {{{\hat \varphi }_i}} \right|}^p}} } \right)^{1/p}} \leqslant R.$$

A key implication of this assumption is that, if we denote by ϕ N the sum of the N largest contributions, then

$$\exists C(R,p),\;\forall \varphi\in \mathcal{F},\quad {\left\| {\varphi- {\varphi _N}} \right\|_{{\ell ^2}}} \leqslant C(R,p){(N + 1)^{1/2 - 1/p}},$$

i.e. there exists a contracted representation of such a ϕ. Note that the representation is adaptive and tuned to each ϕ (this is what is called a nonlinear approximation). However, under these assumptions, and if ℱ is finite dimensional (with a dimension that is much larger than N), the theory of compressed sensing (see [29]) allows, at the price of a slight logarithmic degradation of the convergence rate, a non-adaptive recovery of ℓp functions, with p ≤ 1, that is almost optimal. We refer to [29] and the references therein for more details on this question. In any case, these are situations where the set of basis functions {h i } does not constitute a multipurpose approximation set, quite the contrary: it is tuned to that choice of ℱ and will not have any good property for another one.
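
To fix ideas, the following small Python sketch illustrates this best N-term (nonlinear) truncation on a synthetic coefficient sequence; the decay exponent, the array length and the helper name best_n_term are illustration choices, not part of the theory above.

```python
import numpy as np

def best_n_term(coeffs, N):
    """Keep only the N largest (in modulus) coefficients of an orthonormal
    expansion; all other entries are set to zero (adaptive truncation)."""
    idx = np.argsort(np.abs(coeffs))[::-1][:N]   # indices of the N largest entries
    truncated = np.zeros_like(coeffs)
    truncated[idx] = coeffs[idx]
    return truncated

# Toy example: coefficients with algebraic decay (an l^p-type sequence, p < 2).
rng = np.random.default_rng(0)
coeffs = rng.standard_normal(1000) / (1.0 + np.arange(1000)) ** 1.5
for N in (10, 50, 200):
    err = np.linalg.norm(coeffs - best_n_term(coeffs, N))   # l^2 error (Parseval)
    print(f"N = {N:4d}   l2 error of best N-term truncation: {err:.2e}")
```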

The difficulty is of course to find the basis set {h i }. Note additionally that, from the definition of the small Kolmogorov n-width, except in a Hilbertian framework, the optimal elements need not even be in span{ℱ}.

Let us proceed and propose a way to better identify the various elements in ℱ: we consider that they are parametrized with y ∈ Ωy ⊂ ℝdy (with d y ≥ 1), so that ℱ consists of the parametrized functions f : Ωx × Ωy → ℝ. In what follows, we denote the function f as a function of x for some fixed parameter value y as f y := f(·,y). However, the role of x and y could be interchanged and both x and y will be considered equally as variables of the same level or as variable and parameter in all what follows.

In this paper, we present a survey of algorithms that search for an affine decomposition of the form

$$f\left( {x,y} \right) \approx \sum\limits_{q = 1}^Q {{g_q}\left( y \right){h_q}\left( x \right)} .$$
(3.1)

We focus on the case where the decomposition is chosen in an optimal way (in terms of sparse representation) and additionally we focus on methods with minimal computational complexity. It is assumed that we have a priori some or all the knowledge on functions f in ℱ, i.e. they are not implicitly defined by a PDE. In that “implicit” case there exists a family of reduced modeling approaches such as the reduced basis method; see e.g. [62].

Note that the domains Ωx and Ωy can be of finite cardinality M and N, in which case the functions can be written as matrices. The above algorithms can then often be stated as a low-rank approximation: given a matrix M ∈ ℝM × N, find a decomposition of the matrix M:

$$M \approx U{V^T}$$

where U is of size M × Q and V of size N × Q.
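
As a simple illustration of such a factorization, the following hedged Python sketch builds U and V from a truncated SVD, which (as recalled next) is the optimal choice in the ℓ2 sense; the sample kernel 1/(1 + x + y) and all identifiers are arbitrary choices made for the example.

```python
import numpy as np

def truncated_svd_factors(M, Q):
    """Return factors U (M x Q) and V (N x Q) with M ~ U @ V.T,
    obtained from the Q leading singular triplets of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    U_Q = U[:, :Q] * s[:Q]          # absorb the singular values into U
    V_Q = Vt[:Q, :].T
    return U_Q, V_Q

# Toy example: samples of a smooth bivariate function on a grid.
x = np.linspace(0.0, 1.0, 200)
y = np.linspace(0.0, 1.0, 150)
M = 1.0 / (1.0 + np.add.outer(x, y))            # f(x, y) = 1 / (1 + x + y)
U_Q, V_Q = truncated_svd_factors(M, Q=5)
print("rank-5 Frobenius error:", np.linalg.norm(M - U_Q @ V_Q.T))
```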

In this completely discrete setting, the Singular Value Decomposition (SVD), or the related Proper Orthogonal Decomposition (POD), yields an optimal solution (in terms of approximability with respect to the \({\ell ^2}\)-norm), but is rather expensive to compute. After presenting the POD in a general setting in Sect. 3.2, we present two alternatives, the Adaptive Cross Approximation (ACA) in Sect. 3.3 and the Empirical Interpolation Method (EIM) in Sect. 3.4, which originate from completely different backgrounds. We give a comparative overview of features and existing results for these approaches, which are computationally much cheaper and yield in practice similar approximation results. The relation between ACA and the EIM is studied in Sect. 3.5. Section 3.6 is devoted to a projection method based on incomplete data, known as Gappy POD or Missing Point Estimation, which in some cases can be interpreted as an interpolation scheme.

3.2 Proper Orthogonal Decomposition

Let us start by assuming that we have unlimited knowledge of the data set and unlimited computer resources, returning at the end of this section to more realistic considerations. The first approach is known under the generic concept of Proper Orthogonal Decomposition (POD), a mathematical technique that stands at the intersection of various horizons, has been developed independently and concomitantly in various disciplines, and is thus known under various names, including:

  • Proper Orthogonal Decomposition (POD): a term used in turbulence;

  • Singular Value Decomposition (SVD): a term used in algebra;

  • Principal Component Analysis (PCA): a term used in statistics for discrete random processes;

  • the discrete Karhunen-Loève transform (KLT): a term used in statistics for continuous random processes;

  • the Hotelling transform: a term used in image processing;

  • Principal Orthogonal Direction (POD): a term used in geophysics;

  • Empirical Orthogonal Functions (EOFs): a term used in meteorology and geophysics.

All these somewhat equivalent approaches aim at obtaining low-dimensional approximate descriptions of high-dimensional processes, therefore eliminating information which has little impact on the overall understanding.

3.2.1 Historical Overview

As stated above, the POD is present under various forms in many contributions.

The original SVD was established for real square matrices in the 1870s by Beltrami and Jordan, for complex square matrices in 1902 by Autonne, and for general rectangular matrices in 1936 by Eckart and Young; see also the generalization to unitarily invariant norms by Mirsky [58]. The SVD can be viewed as the extension of the eigenvalue decomposition to non-symmetric and non-square matrices.

The PCA is a statistical technique. The earliest descriptions of the technique were given by Pearson [63] and Hotelling [44]. The purpose of the PCA is to identify the dependence structure behind a multivariate stochastic observation in order to obtain a compact description of it.

Lumley [51] traced the idea of the POD back to independent investigations by Kosambi [47], Loève [50], Karhunen [46], Pougachev [64] and Obukhov [59].

These methods aim at providing a set of orthonormal basis functions that allow one to express approximately and optimally any function in the data set. The equivalence between all these approaches has also been investigated by many authors, among them [48, 56, 71].

3.2.2 Algorithm

Let us now present the POD algorithm in a semi-discrete framework, that is, we consider a finite family of functions \({\{ {f_y}\} _{y \in \Omega _y^{train}}}\) where f y : Ωx → ℝ for each y ∈ \(\Omega _y^{train}\), and where \(\Omega _y^{train} \subset {\Omega _y}\) is finite with cardinality N. In this context, the goal is to define an approximation P Q [f y ] to f y of the form

$${P_Q}\left[ {{f_y}} \right]\left( x \right) = \sum\limits_{q = 1}^Q {{g_q}\left( y \right){h_q}\left( x \right)}$$
(3.2)

with Q ≪ N. The POD actually incorporates a scalar product for functions depending on x ∈ Ωx, and the above projection is then an orthogonal projection onto the Q-dimensional vector space span{h q , q = 1,…, Q}.

The question is now how to select the functions h q properly. With a scalar product at hand, orthonormality is useful, since we would like these modes to be selected such that they carry as much as possible of the information contained in \({\{ {f_y}\} _{y \in \Omega _y^{train}}}\): the first function h 1 should be selected such that it provides the best one-term approximation; then, similarly, h q should be selected so that, together with h 1, h 2,…, h q-1, it gives the best q-term approximation. “Best q-term” is understood here in the sense that the mean square error over all y ∈ \(\Omega _y^{train}\) is smallest. Such specially ordered orthonormal functions are called the proper orthogonal modes of the function f(x, y). With these functions, the expression (3.2) is called the POD of f, and the algorithm is given in Table 3.1.

Scheme 3.1. Proper orthogonal decomposition (POD)

  a.

    Let \(\Omega _y^{train} = \{ {{\hat y}_1}, \ldots ,{{\hat y}_N}\} \) be a discrete representation of Ωy consisting of N points.

  b.

    Construct the correlation matrix

    $${C_{mn}} = \frac{1}{N}{({f_{{{\hat y}_m}}},{f_{{{\hat y}_n}}})_{{\Omega _x}}},\quad 1 \leqslant m,n \leqslant N,$$

    where \({( \cdot , \cdot )_{{\Omega _x}}}\) denotes a scalar product of functions depending on Ωx.

  c.

    Then, solve for the Q largest eigenvalue-eigenvector pairs (λ q , v q ) such that

    $$C{v_q} = {\lambda _q}{v_q},\quad 1 \leqslant q \leqslant Q.$$
    (3.3)
  d.

    The orthogonal POD basis functions {h 1,…,hQ} such that \({\mathbb{V}_Q} = span\{ {h_1}, \ldots ,{h_Q}\} \) are then given by the linear combinations

    $${h_q}(x) = \sum\limits_{n = 1}^N {{{({v_q})}_n}f(x,{{\hat y}_n}),\quad 1 \leqslant q \leqslant Q,\quad x \in {\Omega _x},} $$

    and where (v q ) n denotes the n-th coefficient of the eigenvector v q .

Approximation. The approximation P Q [f y ] to f y : Ωx → ℝ, for any y ∈ Ωy, is then given by

$${P_Q}[{f_y}](x) = \sum\limits_{q = 1}^Q {{g_q}(y){h_q}(x),\quad x \in {\Omega _x},} $$

with \({g_q}(y) = \frac{{{{({f_y},{h_q})}_{{\Omega _X}}}}}{{{{({h_q},{h_q})}_{{\Omega _X}}}}}.\)
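
A minimal Python sketch of Scheme 3.1 in a fully discrete setting (the scalar product on Ωx being replaced by a simple quadrature, in the spirit of Remark 3.1 below) could look as follows; the function names, the test kernel and the grid sizes are illustrative assumptions.

```python
import numpy as np

def pod_basis(snapshots, Q, weight):
    """Scheme 3.1 in a discrete setting.

    snapshots : array (M, N); column n holds f(., y_hat_n) sampled on M points of Omega_x.
    weight    : quadrature weight |Omega_x| / M approximating the scalar product on Omega_x.
    Returns the POD basis H (M x Q) and all eigenvalues of the correlation matrix.
    """
    N = snapshots.shape[1]
    C = weight * (snapshots.T @ snapshots) / N      # C_mn = (1/N)(f_{y_m}, f_{y_n})
    lam, V = np.linalg.eigh(C)                      # eigh returns ascending eigenvalues
    lam, V = lam[::-1], V[:, ::-1]                  # reorder: largest eigenvalues first
    H = snapshots @ V[:, :Q]                        # h_q = sum_n (v_q)_n f(., y_hat_n)
    return H, lam

def pod_projection(H, f):
    """P_Q[f] = sum_q g_q h_q with g_q = (f, h_q)/(h_q, h_q);
    the quadrature weight cancels in the ratio."""
    g = (H.T @ f) / np.sum(H * H, axis=0)
    return H @ g

# Toy example: f(x, y) = 1 / (1 + x + y) on [0, 1] x [0, 1].
x = np.linspace(0.0, 1.0, 300)
y_train = np.linspace(0.0, 1.0, 100)
S = 1.0 / (1.0 + np.add.outer(x, y_train))
Q, w = 5, 1.0 / len(x)
H, lam = pod_basis(S, Q, w)
mse = np.mean([w * np.sum((S[:, n] - pod_projection(H, S[:, n])) ** 2)
               for n in range(S.shape[1])])
print("mean-square POD error:", np.sqrt(mse))
print("sqrt of discarded eigenvalues:", np.sqrt(np.clip(lam[Q:], 0.0, None).sum()))
```

The two printed quantities should agree, which is exactly the content of Proposition 3.1 below.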

Proposition 3.1 The approximation error

$$d_2^{POD}(Q) = \sqrt {\frac{1}{N}\sum\limits_{y \in \Omega _y^{train}} {\left\| {{f_y} - {P_Q}[{f_y}]} \right\|_{{\Omega _x}}^2} } $$

minimizes the mean square error over all projection operators \({P_Q}\) onto a space of dimension Q. It is given by

$$d_2^{POD}\left( Q \right) = \sqrt {\sum\limits_{q = Q + 1}^N {{\lambda _q}} } ,$$
(3.4)

where \(\{ {\lambda _{Q + 1}}, \ldots ,{\lambda _N}\} \) denotes the set of the N – Q smallest eigenvalues of the eigenvalue problem (3.3).

Remark 3.1 (Relation to SVD) If the scalar product \({( \cdot , \cdot )_{{\Omega _x}}}\) is approximated in the sense of ℓ2 on a discrete set of points \(\Omega _x^{train} = \{ {{\hat x}_1}, \ldots ,{{\hat x}_M}\} \subset {\Omega _x}\), i.e.

$${(v,w)_{\Omega _x^{train}}} = \frac{{\left| {{\Omega _x}} \right|}}{M}\sum\limits_{i = 1}^M {v({{\hat x}_i})w({{\hat x}_i}),} $$

then we see that \(C = {A^T}A\) where A is the matrix defined by \({A_{i,j}} = \sqrt {\frac{{\left| {{\Omega _x}} \right|}}{{NM}}} {f_{{{\hat y}_j}}}({{\hat x}_i})\). Thus, the square roots of the eigenvalues in (3.3) are singular values of A.

Remark 3.2 (Infinite dimensional version) In the case where the POD is processed by leaving the parameter y continuous in Ωy, the correlation matrix becomes an operator C : L2(Ωy) → L2(Ωy) with kernel \(C({y_1},{y_2}) = {({f_{{y_1}}},{f_{{y_2}}})_{{\Omega _x}}}\) that acts on functions of y ∈ Ωy as follows

$$(C\phi )(y) = {(C(y, \cdot ),\phi )_{{\Omega _y}}},\quad \phi\in {L^2}({\Omega _y}).$$

Assuming that f ∈ L2(Ωx × Ωy), by the results obtained in [67] (which generalize Mercer’s theorem to more general domains) there exists a sequence of positive real eigenvalues (that can be ranked in decreasing order) and associated orthonormal eigenvectors, which can be used to construct best L2-approximations (3.1).

The infinite dimensional version is important to understand the generality of the approach, e.g. how the various POD algorithms are linked together. In essence, this boils down to the spectral theory of self-adjoint operators, either finite dimensional (in the matrix case) or infinite dimensional (for integral operators defined with symmetric kernels). Such operators have positive real eigenvalues, and the corresponding eigenvectors can be ranked in decreasing order of the eigenvalues. The approximation is based on considering only the eigenmodes that correspond to the largest eigenvalues; they are those that carry the maximum information.

In practice though, both in the x and the y variables, sample sets \(\Omega _x^{train}\) and \(\Omega _y^{train}\) are devised. Depending on the size of N, the solution of the eigenvalue problem (3.3) can be prohibitively expensive. Most of the time though, there is not much of a hint on how these training points should be chosen, and they generally form quite large sets with N ≫ Q.

We finally recall that the original goal is to approximate any function f(x, y) for all x ∈ Ωx and y ∈ Ωy. In this regard, the error bound (3.4) only provides an upper error estimate for functions f y with y ∈ \(\Omega _y^{train}\); no certified error bound for functions f y with y ∈ Ωy ∖ \(\Omega _y^{train}\) can be provided.

3.3 Adaptive Cross Approximation

In order to cope with the implementation difficulties and computational cost of the POD algorithm, let us present here the Adaptive Cross Approximation (ACA). The approximation leading to (3.1) is

$$f\left( {x,y} \right) \approx {\mathfrak{J}_Q}\left[ {{f_y}} \right]\left( x \right): = {\left[ {\begin{array}{*{20}{c}} {f\left( {x,{y_1}} \right)} \\ \vdots\\ {f\left( {x,{y_Q}} \right)} \end{array}} \right]^T}M_Q^{ - 1}\left[ {\begin{array}{*{20}{c}} {f\left( {{x_1},y} \right)} \\ \vdots\\ {f\left( {{x_Q},y} \right)} \end{array}} \right]$$
(3.5)

with points x q , y q , q = 1,…, Q, chosen such that the matrix

$${M_Q}: = \left[ \begin{gathered} f({x_1},{y_1})\, \cdots \;f({x_1},{y_Q}) \hfill \\ \vdots \quad \quad \quad \quad \quad \quad \quad\vdots\hfill \\ f({x_Q},{y_1})\, \cdots \;f({x_Q},{y_Q}) \hfill \\ \end{gathered}\right] \in {\mathbb{R}^{Q \times Q}}$$

is invertible. Notice that while P Q used in the construction of the POD is an orthogonal projector, \({\Im _Q}:{C^0}({\Omega _x}) \to {\mathbb{V}_Q}\) is an interpolation operator from the space of continuous functions C 0x) onto the system \({\Im _Q}:{C^0}({\Omega _x}) \to {\mathbb{V}_Q}\), i.e.

$${\Im _Q}[{f_y}]({x_q}) = f({x_q},y)\quad for\,all\,y\,and\,q = 1, \ldots ,Q.$$

Due to the symmetry of x and y in (3.5), we also have \({\Im _Q}[{f_{{y_q}}}](x) = f(x,{y_q})\) for all x and q = 1,…,Q.
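
For concreteness, here is a small Python sketch that evaluates the interpolant (3.5) for given nodes x q , y q , assuming only pointwise access to f; the chosen nodes and the test function are arbitrary illustration choices.

```python
import numpy as np

def cross_interpolant(f, x_nodes, y_nodes):
    """Return a callable (x, y) -> value of the interpolant (3.5).

    f        : callable f(x, y) returning a scalar.
    x_nodes, y_nodes : the Q interpolation nodes; f evaluated on the node grid
                       must give an invertible matrix M_Q.
    """
    x_nodes = np.asarray(x_nodes, dtype=float)
    y_nodes = np.asarray(y_nodes, dtype=float)
    M_Q = np.array([[f(xq, yq) for yq in y_nodes] for xq in x_nodes])

    def interpolant(x, y):
        row = np.array([f(x, yq) for yq in y_nodes])      # [f(x, y_1), ..., f(x, y_Q)]
        col = np.array([f(xq, y) for xq in x_nodes])      # [f(x_1, y), ..., f(x_Q, y)]
        return row @ np.linalg.solve(M_Q, col)
    return interpolant

# Toy example: the interpolant reproduces f exactly on the "cross" x = x_q or y = y_q.
f = lambda x, y: np.exp(-x * y)
I = cross_interpolant(f, x_nodes=[0.1, 0.5, 0.9], y_nodes=[0.2, 0.6, 0.8])
print(abs(I(0.5, 0.33) - f(0.5, 0.33)))   # small interpolation error
print(abs(I(0.37, 0.6) - f(0.37, 0.6)))   # ~0, since y = y_2 lies on the cross
```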

3.3.1 Historical Overview

Approximations of type (3.5) were first considered by Micchelli and Pinkus in [57]. There, it was proved for so-called totally positive functions f, i.e. continuous functions f : [0,1] × [0,1] → ℝ with non-negative determinants

$$\left| {\left[ \begin{gathered} f({\xi _1},{\upsilon _1})\, \cdots \,f({\xi _1},{\upsilon _q}) \hfill \\ \vdots \quad \quad \quad \quad \quad \quad\vdots\hfill \\ f({\xi _q},{\upsilon _1})\, \cdots \,f({\xi _q},{\upsilon _q}) \hfill \\ \end{gathered}\right]} \right|$$

for all 0 ≤ ξ1 < … < ξ q ≤ 1, 0 ≤ υ1 < … < υ q ≤ 1 and q = 1,…, Q, that such approximations are optimal with respect to the L 1-norm, i.e.

where \({\Im _Q}\) is defined at implicitly known nodes x 1,…,x Q and y 1,…,y Q ; see [57] for an additional technical assumption.

Instead of L 1-estimates, it is usually required to obtain L ∞-estimates. The obvious estimate

$${\left\| {{f_y} - {\Im _Q}[{f_y}]} \right\|_{{L^\infty }({\Omega _x})}} \leqslant (1 + {\sigma _1}[f])\mathop {\inf }\limits_{\upsilon\in {\mathbb{V}_Q}} {\left\| {{f_y} - \upsilon } \right\|_{{L^\infty }({\Omega _x})}}$$

contains the expression

$${\sigma _1}[f]: = \mathop {\sup }\limits_{x \in {\Omega _x}} {\left\| {M_Q^{ - T}\left[ \begin{gathered} f(x,{y_1}) \hfill \\ \vdots\hfill \\ f(x,{y_Q}) \hfill \\ \end{gathered}\right]} \right\|_{{\ell ^1}}}.$$

Since there is usually no estimate on the previous infimum (note that \({\mathbb{V}_Q}\) also depends on \(\mathcal{F} = {\{ {f_y}\} _{y \in {\Omega _y}}}\)), one tries to relate \({f_y} - {\Im _Q}[{f_y}]\) to the interpolation error in another system \({\mathbb{W}_Q} = span\{ {w_1}, \ldots ,{w_Q}\} \) of functions (e.g. polynomials, spherical harmonics, etc.); cf. [6, 12]. Assume that the determinant of the Vandermonde matrix \({W_Q}: = {[{w_i}({x_j})]_{i,j = 1, \ldots ,Q}}\) does not vanish and let L : Ωx → ℝQ be the vector consisting of the Lagrange functions \({L_i} \in {\mathbb{W}_Q}\), i.e. \({L_i}({x_j}) = {\delta _{ij}},\;i,j = 1, \ldots ,Q\). Then, the interpolation operator \({\Im '_Q}\) defined over C0(Ωx) with values in \({\mathbb{W}_Q}\) can be represented as

$${\Im '_Q}[\varphi ](x) = {\left[ \begin{gathered} \varphi ({x_1}) \hfill \\ \vdots\hfill \\ \varphi ({x_Q}) \hfill \\ \end{gathered}\right]^T}L(x),\quad \varphi\in {C^0}({\Omega _x}),$$

and we obtain

$${f_y}(x) - {\Im _Q}[{f_y}](x) = {f_y}(x) - {\left[ \begin{gathered} f({x_1},y) \hfill \\ \vdots\hfill \\ f({x_Q},y) \hfill \\ \end{gathered}\right]^T}L(x) - {\left( {\left[ \begin{gathered} f(x,{y_1}) \hfill \\ \vdots\hfill \\ f(x,{y_Q}) \hfill \\ \end{gathered}\right] - M_Q^TL(x)} \right)^T}M_Q^{ - 1}\left[ \begin{gathered} f({x_1},y) \hfill \\ \vdots\hfill \\ f({x_Q},y) \hfill \\ \end{gathered}\right] = {f_y}(x) - {\Im '_Q}[{f_y}](x) - {\left[ \begin{gathered} {f_{{y_1}}}(x) - {{\Im '}_Q}[{f_{{y_1}}}](x) \hfill \\ \vdots\hfill \\ {f_{{y_Q}}}(x) - {{\Im '}_Q}[{f_{{y_Q}}}](x) \hfill \\ \end{gathered}\right]^T}M_Q^{ - 1}\left[ \begin{gathered} f({x_1},y) \hfill \\ \vdots\hfill \\ f({x_Q},y) \hfill \\ \end{gathered}\right].$$

Hence, for any y ∈ Ωy

$${\left\| {{f_y} - {\mathfrak{J}_Q}\left[ {{f_y}} \right]} \right\|_{{L^\infty }\left( {{\Omega _x}} \right)}} \leqslant \left( {1 + {\sigma _2}\left[ f \right]} \right)\mathop {\max }\limits_{z \in \left\{ {y,{y_1}, \ldots {y_Q}} \right\}} {\left\| {{f_z} - {{\mathfrak{J}'}_Q}\left[ {{f_z}} \right]} \right\|_{L\infty \left( {{\Omega _x}} \right)}},$$
(3.6)

where

$${\sigma _2}[f]: = \mathop {\sup }\limits_{y \in {\Omega _y}} {\left\| {M_Q^{ - 1}\left[ \begin{gathered} f({x_1},y) \hfill \\ \vdots\hfill \\ f({x_Q},y) \hfill \\ \end{gathered}\right]} \right\|_{{\ell ^1}}}.$$

3.3.2 Construction of Interpolation Nodes

The assumption that the determinant of the Vandermonde matrix W Q does not vanish can be guaranteed by the choice of x 1,…,x Q . To this end, let Q linearly independent functions w 1,…,w Q be given as above. As in [8], we construct linearly independent functions ℓ1,…,ℓ Q satisfying ℓ q (x p ) = 0, p < q, and \(span\{ {\ell _1}, \ldots ,{\ell _q}\} = span\{ {w_1}, \ldots ,{w_q}\} ,\;q \leqslant Q,\) in the following way. Let ℓ1 = w 1 and x 1 ∈ Ωx be a maximum of |ℓ1|. Assume that ℓ1,…,ℓ Q-1 have already been constructed. For the construction of ℓ Q , define ℓ Q,0 := w Q and

$${\ell _{Q,q}}: = {\ell _{Q,q - 1}} - {\ell _{Q,q - 1}}({x_q})\frac{{{\ell _q}}}{{{\ell _q}({x_q})}},\quad q = 1, \ldots ,Q - 1.$$

Then ℓ Q,Q-1(x q ) = 0, q < Q, and span{ℓ Q,0,…,ℓ Q,Q-1} = span{ℓ1,…,ℓ Q-1, w Q }. Hence, we set ℓ Q := ℓ Q,Q-1 and choose

$${x_Q}: = \mathop {\arg \;\sup }\limits_{x \in {\Omega _x}} \left| {{\ell _Q}\left( x \right)} \right|.$$
(3.7)

The previous construction guarantees unisolvency at the nodes x q , q = 1,…,Q.

Lemma 3.1 It holds that det W Q ≠ 0.

Proof Since span{ℓ1,…,ℓ Q } = span{w 1,…,w Q } it follows that there is a non-singular matrix T ∈ ℝQ×Q such that

$$\left[ \begin{gathered} {\ell _1} \hfill \\ \vdots\hfill \\ {\ell _Q} \hfill \\ \end{gathered}\right] = T\left[ \begin{gathered} {w_1} \hfill \\ \vdots\hfill \\ {w_Q} \hfill \\ \end{gathered}\right].$$

Hence, R Q = TW Q where R Q := [ℓ i (x j )] Q i,j=1 is upper triangular. The assertion follows from

$$\det \,{R_Q} = {\ell _1}({x_1}) \cdot\ldots\cdot {\ell _Q}({x_Q}) \ne 0.$$

As an example, we choose \({\mathbb{W}_Q} = {\Pi _{Q - 1}}\), the space of polynomials of degree at most Q – 1. Then, it follows from (3.6) that ACA converges if, e.g., f is analytic with respect to x, and the speed of convergence is determined by the decay of f’s derivatives or by the elliptical radius of the ellipse in which f has a holomorphic extension. Furthermore, it can be seen that

$${\ell _Q}(x) = \prod\limits_{q = 1}^{Q - 1} {(x - {x_q}).} $$

Hence, the choice (3.7) of x Q is a generalization of a construction that is due to Leja [49]. Leja recursively defines a sequence of nodes {x 1,…,x Q } for polynomial interpolation in a compact set K ⊂ ℂ as follows. Let x 1 ∈ K be arbitrary. Once x 1,…,x Q-1 have been found, choose x Q ∈ K so that

$$\prod\limits_{q = 1}^{Q - 1} {\left| {{x_Q} - {x_q}} \right|}= \mathop {\max }\limits_{x \in K} \prod\limits_{q = 1}^{Q - 1} {\left| {x - {x_q}} \right|.} $$

In [68] it is proved that Lebesgue constants associated with Leja points are subexponential for fairly general compact sets in ℂ; see also [65]. Hence, analyticity is required in general for the convergence of the interpolation process.
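
A hedged Python sketch of the Leja recursion on a discretized candidate set (here a uniform grid in [-1,1]) reads as follows; the grid resolution and the starting point are illustrative choices.

```python
import numpy as np

def leja_points(candidates, Q, x1=None):
    """Greedy Leja construction on a discrete candidate set:
    x_Q maximizes prod_{q < Q} |x - x_q| over the candidates."""
    candidates = np.asarray(candidates, dtype=float)
    points = [candidates[0] if x1 is None else x1]
    for _ in range(Q - 1):
        # product of distances to the already selected points
        dist = np.prod(np.abs(candidates[:, None] - np.array(points)[None, :]), axis=1)
        points.append(candidates[np.argmax(dist)])
    return np.array(points)

print(leja_points(np.linspace(-1.0, 1.0, 2001), Q=6, x1=1.0))
# starts with 1, then -1, then a point close to 0, and so on
```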

The expression σ2[f] on the right-hand side of (3.6) can be controlled by the choice of the points y 1,…,y Q ∈ Ω y . Due to Laplace’s theorem

$${\left( {M_Q^{ - 1}\left[ \begin{gathered} f({x_1},y) \hfill \\ \vdots\hfill \\ f({x_Q},y) \hfill \\ \end{gathered}\right]} \right)_q} = \frac{{\det {M_q}(y)}}{{\det {M_Q}}},\quad q = 1, \ldots ,Q,$$

where M q (y) arises from replacing the q-th column of M Q by the vector [f(x 1,y),…,f(x Q ,y)]T, we obtain that σ2[f] ≤ Q if y 1,…,y Q are chosen such that

$$\left| {\det {M_Q}} \right| \geqslant \left| {\det {M_q}\left( y \right)} \right|,\quad q = 1, \ldots ,Q,y \in {\Omega _y}.$$
(3.8)

In connection with the so-called maximum volume condition (3.8), we also refer to the error estimates in [66], which are based on the technique of exact annihilators (see [2, 3]) in order to provide results similar to (3.6).

3.3.3 Incremental Construction

The maximum volume condition (3.8) is difficult to satisfy by an a-priori choice of y 1,…,y Q . Therefore, the following incremental construction of approximations (3.5), which is called Adaptive Cross Approximation (ACA) [6], has turned out to be practically more relevant. Let r 0(x,y) := f(x,y) and define the sequence of remainders as

$${r_q}\left( {x,y} \right): = {r_{q - 1}}\left( {x,y} \right) - \frac{{{r_{q - 1}}\left( {x,{y_q}} \right){r_{q - 1}}\left( {{x_q},y} \right)}}{{{r_{q - 1}}\left( {{x_q},{y_q}} \right)}},\quad q = 1, \ldots ,Q,$$
(3.9)

where x q and y q are chosen such that r q-1(x q ,y q ) ≠ 0. Then, the algorithm is summarized in Table 3.2.

Since r q-1(x q ,y q ) coincides with the q-th diagonal entry of the upper triangular factor of the LU decomposition of M Q , we obtain that det M Q ≠ 0. In [12], it is shown that

$$f\left( {x,y} \right) = {\mathfrak{J}_Q}\left[ {{f_y}} \right]\left( x \right) + {r_Q}\left( {x,y} \right)$$
(3.10)

and

$${\Im _Q}[{f_y}](x) = \sum\limits_{q = 1}^Q {{r_{q - 1}}(x,{y_q})\frac{{{r_{q - 1}}({x_q},y)}}{{{r_{q - 1}}({x_q},{y_q})}}.} $$

This method is used in [21] (see also [23]) under the name Geddes-Newton series expansion for the numerical integration of bivariate functions, where, instead of the maximum volume condition (3.8), (x q , y q ) is found by maximizing |r q-1|. This choice of (x q , y q ) is usually referred to as global pivoting. Another pivoting strategy is the so-called partial pivoting, i.e., y q is chosen in the q-th step such that

$$\left| {{r_{q - 1}}({x_q},{y_q})} \right| \geqslant \left| {{r_{q - 1}}({x_q},y)} \right|\quad for\,all\,y \in {\Omega _y}$$

for x q ∈ Ωx chosen by (3.7). For the latter condition (and in particular for the stronger global pivoting) the conservative bound \({\sigma _2}[f] \leqslant {2^{Q - 1}}\) can be guaranteed; see [6]. The actual growth of σ2[f] with respect to Q is, however, typically significantly weaker.

Scheme 3.2. Bivariate Adaptive Cross Approximation (ACA2)

Set q := 1.

While err > tol

  a.

    Define the remainder \({r_{q - 1}} = f - \sum {_{i = 1}^{q - 1}{c_i}} \) and choose (x q ,y q ) ∈ Ω x × Ω y such that

    $${r_{q - 1}}({x_q},{y_q}) \ne 0.$$


  b.

    Define the next tensor product by

    $${c_q}(x,y) = \frac{{{r_{q - 1}}(x,{y_q}){r_{q - 1}}({x_q},y)}}{{{r_{q - 1}}({x_q},{y_q})}}.$$
  c.

    Define the error level by

    $$err = {\left\| {{r_{q - 1}}} \right\|_{{L^\infty }({\Omega _x} \times {\Omega _y})}}$$

    and set q := q + 1.
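
A minimal Python sketch of Scheme 3.2 on discrete candidate sets, using the global pivoting mentioned above (the maximum of |r q-1| over the sampled grid) rather than a true supremum over Ωx × Ωy, might look as follows; the test function, the grid sizes and the tolerance are illustrative assumptions.

```python
import numpy as np

def aca_bivariate(f, xs, ys, tol=1e-10, max_terms=50):
    """Scheme 3.2 on discrete candidate sets xs, ys: build tensor-product terms
    c_q(x, y) = r_{q-1}(x, y_q) r_{q-1}(x_q, y) / r_{q-1}(x_q, y_q)
    until the remainder on the grid drops below tol."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    R = f(xs[:, None], ys[None, :])          # remainder r_0 sampled on the grid
    cols, rows = [], []                      # r_{q-1}(., y_q) and r_{q-1}(x_q, .)/pivot
    for _ in range(max_terms):
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)   # global pivoting
        pivot = R[i, j]                      # |pivot| = max_grid |r_{q-1}| = err
        if abs(pivot) < tol:
            break
        u, v = R[:, j].copy(), R[i, :].copy() / pivot
        cols.append(u); rows.append(v)
        R = R - np.outer(u, v)               # r_q = r_{q-1} - c_q, cf. (3.9)
    return np.array(cols).T, np.array(rows), R   # approximation = cols @ rows

f = lambda x, y: 1.0 / (1.0 + x + y)
U, V, R = aca_bivariate(f, np.linspace(0, 1, 200), np.linspace(0, 1, 150))
print("terms:", U.shape[1], "  max remainder on grid:", np.abs(R).max())
```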

3.3.4 Application to Matrices

Approximations of the form (3.5) are particularly useful when they are applied to large-scale matrices A ∈ ℝM × N. In this case, (3.5) becomes

$$A \approx \tilde A: = {A_{:,\sigma }}A_{\tau ,\sigma }^{ - 1}{A_{\tau ,:}},$$
(3.11)

where τ := {i 1,…,i Q } and σ := {j 1,…,j Q } are sets of row and column indices, respectively, such that Aτ,σ ∈ ℝQ×Q is invertible. Here and in the following, we use the notation Aτ,: for the rows τ and A:,σ for the columns σ of A. Notice that the approximation à has rank at most Q and is constructed from few of the original matrix entries. Such kinds of approximations were investigated by Eisenstat and Gu [37] and Tyrtyshnikov et al. [35] in the context of the maximum volume condition. Again, the approximation can be constructed incrementally by the sequence of remainders R(0) := A and

$${R^{(q)}}: = {R^{(q - 1)}} - \frac{{R_{:,{j_q}}^{(q - 1)}R_{{i_q},:}^{(q - 1)}}}{{R_{{i_q},{j_q}}^{(q - 1)}}},\quad q = 1, \ldots ,Q,$$

where the index pair (i q ,j q ) is chosen such that \(R_{{i_q},{j_q}}^{(q - 1)} \ne 0\). The previous condition guarantees that Aτ,σ is invertible, and we obtain

$$\tilde A = \sum\limits_{q = 1}^Q {\frac{{R_{:,{j_q}}^{(q - 1)}R_{{i_q},:}^{(q - 1)}}}{{R_{{i_q},{j_q}}^{(q - 1)}}}} .$$

If A arises from evaluating a smooth function at given points, then R(q) can be estimated using (3.6).

In order to avoid the computation of each entry of the remainders R(q), it is important to notice that only the entries in the i q -th row and the j q -th column of R(q-1) are required for the construction of Ã. Therefore, the following algorithm computes the column vectors \({u_q}: = R_{:,{j_q}}^{(q - 1)}\) and row vectors \({v_q}: = R_{{i_q},:}^{\left( {q - 1} \right)}\), resulting in

$$\tilde A = \sum\limits_{q = 1}^Q {\frac{{{u_q}v_q^T}}{{{{\left( {{v_q}} \right)}_{{j_q}}}}}.}$$
(3.12)

The iteration stops after Q steps if the error satisfies

$${\left\| {A - \tilde A} \right\|_{{\ell ^2}}} = {\left\| {{R^{\left( Q \right)}}} \right\|_{{\ell ^2}}} < \varepsilon$$
(3.13)

with given accuracy ε > 0. The previous condition cannot be evaluated with linear complexity. Since the next rank-1 term \(({v_{Q + 1}})_{{j_{Q + 1}}}^{ - 1}{u_{Q + 1}}v_{Q + 1}^T\) approximates R(Q), we replace (3.13) with the error indicator

$$\frac{{{{\left\| {{u_{Q + 1}}v_{Q + 1}^T} \right\|}_{{\ell ^2}}}}}{{\left| {{{({v_{Q + 1}})}_{{j_{Q + 1}}}}} \right|}} = \frac{{{{\left\| {{u_{Q + 1}}} \right\|}_{{\ell ^2}}}{{\left\| {{v_{Q + 1}}} \right\|}_{{\ell ^2}}}}}{{\left| {{{({v_{Q + 1}})}_{{j_{Q + 1}}}}} \right|}} < \varepsilon .$$

The algorithm is presented in Table 3.3.

Remark 3.3 Notice that almost no condition has been imposed on the row index i q . The following three methods are commonly used to choose i q . In addition to choosing i q randomly, i q can be found as

$${i_q}: = \mathop {\arg \,\max }\limits_{i = 1, \ldots ,M} \left| {{{({u_{q - 1}})}_i}} \right|,$$

which leads to a cyclic pivoting strategy. If A stems from the evaluation of a function at given nodes, then the construction of Sect. 3.3.2 should be used in order to guarantee the well-posedness of the interpolation operator \({\Im '_Q}\) and exploit the error estimate (3.6).

In some cases (see [15]), it is required to put more effort in the choice of i q to guarantee a well-suited approximation space \(span\{ {A_{{i_1},:}}, \ldots ,{A_{{i_Q},:}}\} \); cf. [7].

Instead of the M · N entries of A, we only have to compute Q(M + N) entries of A for the approximation by Ã. The construction of (3.12) requires \(O({Q^2}(M + N))\) arithmetic operations, and à can be stored with Q(M + N) units of storage. Possible redundancies among the vectors u q , v q , q = 1,…,Q, can be removed via orthogonalization.

Scheme 3.3. Adaptive Cross Matrix Approximation

Set q := 1.

While err > tol

  a.

    Choose i q such that

    $${v_q}: = A_{{i_q},:}^T - \sum\limits_{\ell= 1}^{q - 1} {\frac{{{{({u_\ell })}_{{i_q}}}}}{{{{({v_\ell })}_{{j_\ell }}}}}{v_\ell }} $$

    is nonzero and j q such that \(\left| {{{({v_q})}_{{j_q}}}} \right| = {\max _{j = 1, \ldots ,N}}\left| {{{({v_q})}_j}} \right|.\)

  b.

    Compute the vector

    $${u_q}: = {A_{:,{j_q}}} - \sum\limits_{\ell= 1}^{q - 1} {\frac{{{{({v_\ell })}_{{j_q}}}}}{{{{({v_\ell })}_{{j_\ell }}}}}{u_\ell }.} $$
  c.

    Compute the error indicator

    $$err = {\left| {{{({v_q})}_{{j_q}}}} \right|^{ - 1}}{\left\| {{u_q}} \right\|_{{\ell ^2}}}{\left\| {{v_q}} \right\|_{{\ell ^2}}}$$

    and set q := q + 1.
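
The following Python sketch implements Scheme 3.3, accessing A only through callables that return single rows and columns; the next row index is chosen as the largest entry of |u q| among the rows not used yet (a common practical variant of the rule in Remark 3.3), and all names as well as the test matrix are illustrative assumptions.

```python
import numpy as np

def aca_matrix(get_row, get_col, shape, tol=1e-8, max_rank=100):
    """Scheme 3.3: build A ~ sum_q u_q v_q^T / (v_q)_{j_q} from a few rows and
    columns of A.  get_row(i) returns A[i, :] and get_col(j) returns A[:, j],
    so only about Q(M + N) entries of A are ever evaluated."""
    M, N = shape
    us, vs, pivots = [], [], []
    i_q, used_rows = 0, {0}                          # start with the first row
    for _ in range(min(max_rank, M, N)):
        # v_q := A_{i_q,:} minus the contribution of the previous rank-1 terms
        v = get_row(i_q) - sum((uu[i_q] / pp) * vv for uu, vv, pp in zip(us, vs, pivots))
        j_q = int(np.argmax(np.abs(v)))              # column pivot (partial pivoting)
        if abs(v[j_q]) == 0.0:
            break                                    # numerically zero row: stop (simplification)
        u = get_col(j_q) - sum((vv[j_q] / pp) * uu for uu, vv, pp in zip(us, vs, pivots))
        us.append(u); vs.append(v); pivots.append(v[j_q])
        # error indicator ||u_q||_2 ||v_q||_2 / |(v_q)_{j_q}|
        if np.linalg.norm(u) * np.linalg.norm(v) / abs(v[j_q]) < tol:
            break
        # next row: largest entry of |u_q| among rows not used yet
        candidates = np.abs(u)
        candidates[list(used_rows)] = -1.0
        i_q = int(np.argmax(candidates))
        used_rows.add(i_q)
    U = np.array(us).T / np.array(pivots)            # absorb the pivots into U
    return U, np.array(vs)                           # A ~ U @ V

# Toy example: the matrix is only accessed row- and column-wise.
M, N = 300, 200
A = 1.0 / (1.0 + np.add.outer(np.arange(M), np.arange(N)))
U, V = aca_matrix(lambda i: A[i, :].copy(), lambda j: A[:, j].copy(), (M, N))
print("rank:", U.shape[1], "   error:", np.linalg.norm(A - U @ V))
```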

The origin of this matrix version of ACA is the construction of so-called hierarchical matrices [7,39,40] for the efficient treatment of integral formulations of elliptic boundary value problems. Hierarchical matrices allow one to treat discretizations of such non-local operators with logarithmic-linear complexity. To this end, subblocks A t,s from a suitable partition of large-scale matrices A are approximated by low-rank matrices.

A form that is slightly different from (3.11) and which looks more complicated at first glance is

$${A_{t,s}} \approx {\hat A_{t,s}}: = {A_{:,{\sigma _t}}}A_{{\tau _{t,}}{\sigma _t}}^{ - 1}{A_{{\tau _{t,}}{\sigma _s}}}A_{{\tau _{s,}}{\sigma _s}}^{ - 1}{A_{{\tau _s},:}}$$

with suitable index sets τ t , σ t , τ s , and σ s depending on the respective index t or s only. Notice that in contrast to Ã, Â does not interpolate A on the “cross” but rather at single points specified by the indices τ t , σ s , i.e. \({\hat A_{{\tau _t},{\sigma _s}}} = {A_{{\tau _t},{\sigma _s}}}\). The advantage of this approach is the fact that the large parts \({A_{:,{\sigma _t}}}A_{{\tau _t},{\sigma _t}}^{ - 1}\) and \(A_{{\tau _s},{\sigma _s}}^{ - 1}{A_{{\tau _s},:}}\) depend only on one of the two indices t or s, while only the small matrix \({A_{{\tau _t},{\sigma _s}}}\) depends on both. This allows one to further reduce the complexity of hierarchical matrix approximations by constructing so-called nested bases approximations [13], which are mandatory to efficiently treat high-frequency Helmholtz problems; see [11].

3.3.5 Relation with Gaussian Elimination

Without loss of generality, we may assume for the moment that i q = j q = q, q = 1,…,Q. Otherwise, interchange the rows and columns of the original matrix R (0). Then

$${R^{(q)}} = \left( {I - \frac{{{R^{(q - 1)}}{e_q}e_q^T}}{{e_q^T{R^{(q - 1)}}{e_q}}}} \right){R^{(q - 1)}} = {L^{(q)}}{R^{(q - 1)}},$$

where L(q) ∈ ℝM×M is the matrix

$${L^{(q)}} = I - \frac{{{R^{(q - 1)}}{e_q}e_q^T}}{{e_q^T{R^{(q - 1)}}{e_q}}},$$

which differs from a Gaussian matrix only in the position (q,q); cf. [6]. This relation was exploited in [41] for the convergence analysis of ACA in the case of positive definite matrices A.

Furthermore, it is an interesting observation that ACA reduces the rank of the remainder in each step, i.e. rank R(q) = rank R(q-1) − 1. This was first discovered by Wedderburn in [69, p. 69]; see also [6, 26]. Hence, ACA may be regarded as a rank revealing LU factorization [22,45]. As is well known, the elements may grow in the LU decomposition algorithm; cf. [34]. Thus, the exponential bound \({2^{Q - 1}}\) on σ2[f] is not a result of overestimation.

3.3.6 Generalizations of ACA

The Adaptive Cross Approximation can easily be generalized to a linear functional setting. Instead of evaluating the remainders at the chosen points x q , y q , q = 1,…,Q, one considers the recursive construction

$${r_q}(x,y): = {r_{q - 1}}(x,y) - \frac{{\left\langle {{r_{q - 1}}(x, \cdot ),{\psi _q}} \right\rangle \left\langle {{\varphi _q},{r_{q - 1}}( \cdot ,y)} \right\rangle }}{{\left\langle {{\varphi _q},{r_{q - 1}},{\psi _q}} \right\rangle }},\quad q = 1, \ldots ,Q.$$

Here, ϕ q and ψ q denote given linear functionals acting on x and y, respectively. It is easy to show (see [10]) that

$$\left\langle {{\varphi _i},{r_q}\left( { \cdot ,y} \right)} \right\rangle= 0 = \left\langle {{r_q}\left( {x, \cdot } \right),{\psi _i}} \right\rangle \quad for\,all\,i \leqslant q,x \in {\Omega _x}\;and\;y \in {\Omega _y}.$$
(3.14)

Hence, r q vanishes for an increasing number of functionals and

$${\Im ''_Q}[{f_y}](x): = \sum\limits_{q = 1}^Q {\left\langle {{r_{q - 1}}(x, \cdot ),{\psi _q}} \right\rangle } \frac{{\left\langle {{\varphi _q},{r_{q - 1}}( \cdot ,y)} \right\rangle }}{{\left\langle {{\varphi _q},{r_{q - 1}},{\psi _q}} \right\rangle }}$$

gradually interpolates f y (in the sense of functionals). The Adaptive Cross Approximation (3.9) is obtained from choosing the Dirac functionals \({\varphi _q}: = {\delta _{{x_q}}}\) and \({\psi _q}: = {\delta _{{y_q}}}\).

The benefits of the separation of variables resulting from (3.5) are even more important for multivariate functions f. We present two ways to generalize (3.9) to functions depending on d variables. An obvious idea is to group the set of variables into two parts, each containing d/2 variables; see [10] for a method that uses the covariance of f to construct this separation. Each of the two parts can be treated as a single new variable. Then, the application of (3.9) results in a sequence of lower-dimensional functions which inherit the smoothness of f. Hence, (3.9) can be applied again until only univariate functions are left. Due to the nestedness of the construction, the constructed approximation cannot be regarded as an interpolation. Error estimates for this approximation were derived in [8] for d = 3, 4. The application to tensors of order d > 2 was presented in [4, 60, 61].

A more sophisticated way to generalize ACA to multivariate functions is presented in [9]. For the case d = 3, the sequence of remainders is constructed as

$${r_q}(x,y,z): = {r_{q - 1}}(x,y,z) - \frac{{{r_{q - 1}}(x,y,{z_q}){r_{q - 1}}(x,{y_q},z){r_{q - 1}}({x_q},y,z){r_{q - 1}}({x_q},{y_q},{z_q})}}{{{r_{q - 1}}(x,{y_q},{z_q}){r_{q - 1}}({x_q},y,{z_q}){r_{q - 1}}({x_q},{y_q},z)}}$$

instead of (3.9). Notice that this kind of approximation requires that x q , y q , z q can be found such that the denominator r q-1(x,y q ,z q ) r q-1(x q ,y,z q ) r q-1(x q ,y q ,z) ≠ 0. On the other hand, the advantage of this generalization is that it is equi-directional in contrast to the aforementioned idea, i.e., none of the variables is preferred to the others. Hence, similar to (3.14) we obtain for all x,y,z

$${r_q}(x,y,{z_i}) = {r_q}(x,{y_i},z) = {r_q}({x_i},y,z) = 0,\quad i \leqslant q.$$

3.4 Empirical Interpolation Method

3.4.1 Historical Overview

The Empirical Interpolation Method (EIM) [5] originates from reduced order modeling and its application to the resolution of parameter dependent partial differential equations. We are thus in the context where the set of solutions u(·,y) to the PDE generates a manifold, parametrized by y (the parameter is generally called μ in these applications), that possesses a small Kolmogorov n-width. In the construction stage of the reduced basis method, the reduced basis is built by a greedy approach where each new basis function, that is, a solution to the PDE associated to an optimally chosen parameter, is incorporated recursively. The selection criterion for the parameter is based on maximal (a posteriori) error estimates over the parameter space. This construction stage can be expensive: indeed, it requires an initial accurate classical discretization method of finite element, spectral or finite volume type, and every solution associated to an optimally selected parameter needs to be approximated during this stage by the classical method. Once this preliminary stage is performed off-line, all the approximations of solutions corresponding to a new parameter are computed as a linear combination of the (few) basis functions constructed during the first phase. This second, on-line, stage is very cheap. This is due to two facts. The first one is that the greedy approach is proven to be quite optimal [14, 16, 28]: for exponential or polynomial decay of the Kolmogorov n-width, the greedy method provides a basis set that has the same feature.

The second fact is related to the approximation process. A Galerkin approximation in this reduced space indeed provides very good approximations, and if Q modes are used, a linear PDE can be simulated by inverting Q × Q matrices only, i.e. much smaller complexity than the classical approaches.

In order that the same remains true for nonlinear PDEs, a strategy similar to the pseudo-spectral approach used for high-order Fourier or polynomial approximations has been sought. This involves the use of an interpolation operator. In order to be coherent, an approximation u Q (·,y) = ∑ Q i=1 α i (y)u(·,y i ) being given (where the y i are the parameters that define the reduced basis snapshots), we want to approximate G(u Q (·,y)) (G being a nonlinear functional) as a linear combination

$$G({u_Q}( \cdot ,y)) \approx \sum\limits_{i = 1}^Q {{\beta _i}} (y){\kern 1pt} G(u( \cdot ,{y_i})).$$

The derivation of the set {β i } i from {α i } i needs to be very fast; it is done by interpolation through the Empirical Interpolation Method defined in the following section. This has been extensively used for different types of equations in [36] and has led to the definition of general interpolation techniques and the rapid derivation of the associated points.

The approach having a broader scope than only its use in reduced basis approximation, a dedicated analysis of the approximation properties for sets with small Kolmogorov n-width has been presented in [54]. This approach for nonlinear problems has actually also been used for problems where the dependency on the parameter is not affine (the so-called “non-affine” problems) and has boosted the domain of application of reduced order approximations.

3.4.2 Motivation

As said above and in the introduction, we are in a situation where the set \(\mathcal{F} = {\{ f( \cdot ,y)\} _{y \in {\Omega _y}}}\) denotes a family of parametrized functions with small Kolmogorov n-width. We therefore do not identify Ω x with Ω y . In addition, for a given parameter y, f(·,y) is supposed to be accessible at all values in Ω x .

The EIM is designed to find approximations to members of ℱ through an interpolation operator I q that interpolates the function f y = f(·,y) at some particular points in Ω x . That is, given an interpolatory system defined by a set of basis functions {h 1,…,h q } (linear combinations of particular “snapshots” \({f_{{y_1}}}, \ldots ,{f_{{y_q}}}\)) and interpolation points {x 1,…,x q }, the interpolant I q [f y ] of f y with y ∈ Ω y , written as

$${I_q}\left[ {{f_y}} \right]\left( x \right) = \sum\limits_{j = 1}^q {{g_j}\left( y \right){h_j}\left( x \right)} ,\quad x \in {\Omega _x},$$
(3.15)

is defined by

$${I_q}\left[ {{f_y}} \right]\left( {{x_i}} \right) = {f_y}\left( {{x_i}} \right),\quad i = 1, \ldots ,q.$$
(3.16)

Thus, (3.16) is equivalent to the following linear system

$$\sum\limits_{j = 1}^q {{g_j}\left( y \right){h_j}\left( {{x_i}} \right)}= {f_y}\left( {{x_i}} \right),\quad i = 1, \ldots ,q.$$
(3.17)

One of the problems is to ensure that the system above is uniquely solvable, i.e. that the matrix (h j (x i )) i,j is invertible, which will be considered in the design of the interpolation scheme.

Scheme 3.4. Empirical Interpolation Method

Set q = 1. Do while err > tol:

  a.

    Pick the sample point

    $${y_q} = \mathop {\arg \,\sup }\limits_{y \in {\Omega _y}} {\left\| {{f_y} - {I_{q - 1}}\left[ {{f_y}} \right]} \right\|_{{L^p}\left( {{\Omega _x}} \right)}},$$
    (3.18)

    and the corresponding interpolation point

    $${x_q} = \mathop {\arg \,\sup }\limits_{x \in {\Omega _x}} \left| {{f_{{y_q}}}\left( x \right) - {I_{q - 1}}\left[ {{f_{{y_q}}}} \right]\left( x \right)} \right|.$$
    (3.19)
  b.

    Define the next basis function as

    $${h_q} = \frac{{{f_{{y_q}}} - {I_{q - 1}}\left[ {{f_{{y_q}}}} \right]}}{{{f_{{y_q}}}\left( {{x_q}} \right) - {I_{q - 1}}\left[ {{f_{{y_q}}}} \right]\left( {{x_q}} \right)}}.$$
    (3.20)
  c.

    Define the error level by

    $$err = {\left\| {er{r_p}} \right\|_{{L^\infty }({\Omega _y})}}\quad with\quad er{r_p}(y) = {\left\| {{f_y} - {I_{q - 1}}[{f_y}]} \right\|_{{L^p}({\Omega _x})}},$$

    and set q := q + 1.

3.4.3 Algorithm

The construction of the basis functions and interpolation points is based on a greedy algorithm. Note that the EIM is defined with respect to a given norm on Ω x ; we consider here the L p(Ωx)-norms for 1 ≤ p ≤ ∞. The algorithm is given in Table 3.4.

Remark 3.4 Note that whenever dim(span{ℱ}) = q⋆ the algorithm finishes for q = q⋆.

As long as q ≤ q⋆, note that the basis functions {h 1,…,h q } and the snapshots \(\{ {f_{{y_1}}}, \ldots ,{f_{{y_q}}}\} \) span the same space, i.e.,

$${\mathbb{V}_q} = span\{ {h_1}, \ldots ,{h_q}\}= span\{ {f_{{y_1}}}, \ldots ,{f_{{y_q}}}\} .$$

The former are preferred to the latter due to the following properties

$${h_i}\left( {{x_i}} \right) = 1,\quad \forall i = 1, \ldots ,q\quad and\quad {h_j}\left( {{x_i}} \right) = 0,\quad 1 \leqslant i < j \leqslant q.$$
(3.21)

Remark 3.5 It is easy to show that the interpolation operator I q is the identity if restricted to the space \({\mathbb{V}_q}\), i.e.,

$${I_q}[{f_{{y_i}}}](x) = {f_{{y_i}}}(x),\quad i = 1, \ldots ,q,\quad x \in {\Omega _x}.$$

Remark 3.6 The construction of the interpolating functions and the associated interpolation points follows a greedy approach: we add the function in ℱ that is the worst approximated by the current interpolation operator, and the interpolation point is where the error is largest. The construction is thus recursive which, in turn, means that it is of low computational cost.

Remark 3.7 As explained in [5], the algorithm can be reduced to the selection of the interpolation points only, in the case where the family of interpolating functions \(\{ {f_{{y_1}}}, \ldots ,{f_{{y_q}}}, \ldots \} \) is preexisting. This can be the case for instance if a POD strategy has been used previously or when one considers a set that has a canonical basis and ordering (like the set of polynomials).

Note that solving the interpolation system (3.17) can be written as a linear system B gy = fy with q unknowns and equations where

$${B_{i,j}} = {h_j}({x_i}),\quad {({f_y})_i} = {f_y}({x_i}),\quad i,j = 1, \ldots ,q,$$

such that the interpolant is defined by

$${I_q}[{f_y}](x) = \sum\limits_{j = 1}^q {{{({g_y})}_j}{h_j}(x),\quad x \in {\Omega _x}.} $$

This construction of the basis functions and interpolation points satisfies the following theoretical properties (see [5]):

  • the basis functions {h 1,…,h q } consist of linearly independent functions;

  • the interpolation matrix B, with entries B i,j = h j (x i ), is lower triangular with unit diagonal by (3.21), and hence invertible; the remaining entries belong to [-1,1];

  • the empirical interpolation procedure is well-posed in L p(Ωx), as long as q ≤ q⋆.

If the L ∞(Ωx)-norm (p = ∞) is considered, the error analysis of the interpolation procedure classically involves the Lebesgue constant \({\Lambda _q} = {\sup _{x \in {\Omega _x}}}\sum {_{i = 1}^q} \left| {{L_i}(x)} \right|\), where \({L_i} \in {\mathbb{V}_q}\) are the Lagrange functions satisfying L i (x j ) = δ ij . The following bound holds [5]:

$${\left\| {{f_y} - {I_q}[{f_y}]} \right\|_{{L^\infty }({\Omega _x})}} \leqslant (1 + {\Lambda _q})\mathop {\inf }\limits_{{v_q} \in {\mathbb{V}_q}} {\left\| {{f_y} - {v_q}} \right\|_{{L^\infty }({\Omega _x})}}.$$

An (in practice very pessimistic) upper bound (cf. [54]) of the Lebesgue constant is given by

$${\Lambda _q} \leqslant {2^q} - 1,$$

which in turn results in the following estimate. Assume that \(F \subset X \subset {L^\infty }({\Omega _x})\) and that there exists a sequence of finite dimensional spaces

$${\mathbb{Z}_1} \subset {\mathbb{Z}_2} \subset \ldots ,\quad \dim ({\mathbb{Z}_q}) = q,\quad and\quad {\mathbb{Z}_q} \subset F,$$

such that there exist c > 0 and α > log(4) with

$$\mathop {\inf }\limits_{{v_q} \in {\mathbb{Z}_q}} {\left\| {{f_y} - {v_q}} \right\|_X} \leqslant c{e^{ - \alpha q}},\quad y \in {\Omega _y},$$

then

$${\left\| {{f_y} - {I_q}[{f_y}]} \right\|_{{L^\infty }({\Omega _x})}} \leqslant c{e^{ - (\alpha - \log (4))q}}.$$

Remark 3.8 The worst-case situation, where the Lebesgue constant indeed grows like the bound above, is rather artificial; in all implementations we have done so far involving functions belonging to some reasonable set with small Kolmogorov n-width, the growth of the Lebesgue constant is much more reasonable, and most of the time a linear growth is observed. Note that the points generated by the EIM using polynomial basis functions (in increasing order of degree) on [-1,1] are exactly the Leja points, as indicated in the frame of the EIM by A. Chkifa and the discussion in Sect. 3.3.2 in the case of ACA. On the other hand, if one considers the Leja points on the unit circle and then projects them onto the interval [-1,1], a linear growth is shown in [25].

3.4.4 Practical Implementation

In the practical implementation of the EIM one encounters the following problem: finding the supremum, respectively the arg sup, in (3.18) and (3.19) is not feasible unless some kind of approximation is effected. The least difficult way, but not the only one, is to consider representative point-sets \(\Omega _x^{train} = \left\{ {{{\hat x}_1},{{\hat x}_2}, \ldots ,{{\hat x}_M}} \right\}\) of Ωx and \(\Omega _y^{train} = \left\{ {{{\hat y}_1},{{\hat y}_2}, \ldots ,{{\hat y}_N}} \right\}\) of Ωy. Then, the EIM is written as in Table 3.5.

This possible implementation of the EIM is sometimes referred to as the Discrete Empirical Interpolation Method (DEIM) [24].

Remark 3.9 Different strategies have been reported in [38,55] to successively enrich the training set Ω trainy . The main idea is to start with a small number of training points and enrich the set during the iterations of the algorithm and obtain a very fine discretization only towards the end of the algorithm. One can also think of enriching the training set Ω trainx simultaneously.

Remark 3.10 Using representative pointsets Ω trainx and Ω trainy is only one way to discretize the problem. Alternatively, one can think of using optimization methods to find the maximum over Ωx and Ωy. Such a strategy has been reported in [18, 19] in the context of the reduced basis method, which, as well as the EIM, is based on a greedy algorithm.

Scheme 3.5. Empirical Interpolation Method (possible implementation of EIM)

Set q = 1. Do while err > tol:

  a.

    Pick the sample point

    $${y_q} = \mathop {\arg \,\max }\limits_{y \in \Omega _y^{train}} {\left\| {{f_y} - {I_{q - 1}}\left[ {{f_y}} \right]} \right\|_{{L^p}\left( {{\Omega _x}} \right)}},$$
    (3.22)

    and the corresponding interpolation point

    $${x_q} = \mathop {\arg \,\max }\limits_{x \in \Omega _x^{train}} \left| {{f_{{y_q}}}(x) - {I_{q - 1}}[{f_{{y_q}}}](x)} \right|.$$
  b.

    Define the next basis function as

    $${h_q} = \frac{{{f_{{y_q}}} - {I_{q - 1}}[{f_{{y_q}}}]}}{{{f_{{y_q}}}({x_q}) - {I_{q - 1}}[{f_{{y_q}}}]({x_q})}}.$$
  c.

    Define the error level by

    $$err = {\left\| {er{r_p}} \right\|_{{L^\infty }({\Omega _y})}}\quad with\quad er{r_p}(y) = {\left\| {{f_y} - {I_{q - 1}}[{f_y}]} \right\|_{{L^p}({\Omega _x})}}$$

    and set q := q + 1.

3.4.5 Practical Implementation Using the Matrix Representation of the Function

One can define an implementation of the EIM in a completely discrete setting using the representative matrix of f defined by M i,j = f(x i , y j ) for 1 ≤ i ≤ M and 1 ≤ j ≤ N. For the sake of short notation we recall the notation M:,j used for the j-th column of M.

Assume that we are given a set of basis vectors {h1,…,h q } and interpolation indices i 1,…,i q ; the discrete interpolation operator I q : ℝM → ℝM acting on column vectors is given in the span of the basis vectors {h j } q j=1 , i.e. by I q [r] = ∑ j=1 q g j (r)h j for some scalars g j (r), such that

$${({I_q}[r])_{{i_k}}} = \sum\limits_{j = 1}^q {{g_j}(r){{({h_j})}_{{i_k}}}}  = {r_{{i_k}}},\quad r \in {\mathbb{R}^M},\quad k = 1, \ldots ,q.$$

Using this notation, we then present the matrix version of the EIM in Table 3.6.

This procedure allows one to define an approximation of any coefficient of the matrix M. In some cases, however, one would like to obtain an approximation of f(x, y) for any (x,y) ∈ Ωx × Ωy. After running the implementation, one can still construct the continuous interpolant I Q [f](x,y) for any (x,y) ∈ Ωx × Ωy. Indeed, the interpolation points x 1,…,x Q are provided by \({x_q} = {\hat x_{{i_q}}}\). The construction of the (continuous) basis functions h q is based on mimicking part b of the discrete algorithm in a continuous context. Therefore, during the discrete version one saves the following data

$$\begin{gathered} {s_{q,j}} = {g_j}({M_{:,{j_q}}}),\quad \quad \quad \quad \quad \quad \quad from\quad {I_{q - 1}}[{M_{:,{j_q}}}] = \sum\limits_{j = 1}^{q - 1} {{g_j}\left( {{M_{:,{j_q}}}} \right){h_j}} , \hfill \\ {s_{q,q}} = {M_{{i_q},{j_q}}} - {({I_{q - 1}}[{M_{:,{j_q}}}])_{{i_q}}}. \hfill \\ \end{gathered} $$

Then, the continuous basis functions can be recovered by the following recursive formula

$${h_q} = \frac{{{f_{{y_q}}} - \Sigma _{j = 1}^{q - 1}{s_{q,j}}{h_j}}}{{{s_{q,q}}}}$$

using the notation \({y_q} = {\hat y_{{j_q}}}\).

Scheme 3.6. Empirical Interpolation Method (implementation based on representative matrix M of f)

Set q = 1. Do while err > tol

  a.

    Pick the sample index

    $${j_q} = \mathop {\arg \,\max }\limits_{j = 1, \ldots ,N} {\left\| {{M_{:,j}} - {I_{q - 1}}[{M_{:,j}}]} \right\|_{{\ell ^p}}},$$

    and the corresponding interpolation index

    $${i_q} = \mathop {\arg \,\max }\limits_{i = 1, \ldots ,M} \left| {{M_{i,{j_q}}} - {{({I_{q - 1}}[{M_{:,{j_q}}}])}_i}} \right|.$$
  b.

    Define the next approximation column by

    $${h_q} = \frac{{{M_{:,{j_q}}} - {I_{q - 1}}[{M_{:,{j_q}}}]}}{{{M_{{i_q},{j_q}}} - {{({I_{q - 1}}[{M_{:,{j_q}}}])}_{{i_q}}}}}.$$
  c.

    Define the error level by

    $$err = \mathop {\max }\limits_{j = 1, \ldots ,N} {\left\| {{M_{:,j}} - {I_{q - 1}}[{M_{:,j}}]} \right\|_{{\ell ^p}}}$$

    and set q : = q + 1.
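
A compact Python sketch of Scheme 3.6, using the ℓ2-norm (p = 2) for the column errors, is given below; the representative matrix, the tolerance and all identifiers are illustrative assumptions of the example.

```python
import numpy as np

def eim_matrix(Mmat, tol=1e-10, max_terms=50):
    """Scheme 3.6: greedy EIM on the representative matrix Mmat[i, j] = f(x_i, y_j).
    Returns the basis columns H, the interpolation row indices and the
    selected column indices."""
    Mrows, Ncols = Mmat.shape
    H = np.zeros((Mrows, 0))
    rows, cols = [], []
    while H.shape[1] < max_terms:
        if rows:
            # apply I_{q-1} to every column: solve B g = (column restricted to selected rows)
            B = H[rows, :]                              # B_{k,l} = h_l(x_{i_k}), lower triangular
            G = np.linalg.solve(B, Mmat[rows, :])
            Residual = Mmat - H @ G
        else:
            Residual = Mmat.copy()
        errs = np.linalg.norm(Residual, axis=0)         # l2 error per column (p = 2)
        j_q = int(np.argmax(errs))                      # sample index
        if errs[j_q] < tol:
            break
        i_q = int(np.argmax(np.abs(Residual[:, j_q])))  # interpolation index
        h_q = Residual[:, j_q] / Residual[i_q, j_q]     # normalized so (h_q)_{i_q} = 1
        H = np.column_stack([H, h_q])
        rows.append(i_q); cols.append(j_q)
    return H, rows, cols

# Toy example.
x = np.linspace(0.0, 1.0, 200); y = np.linspace(0.0, 1.0, 100)
Mmat = np.exp(-np.outer(x, y))
H, rows, cols = eim_matrix(Mmat)
B = H[rows, :]
approx = H @ np.linalg.solve(B, Mmat[rows, :])          # I_Q applied to every column
print("Q =", H.shape[1], "  max entrywise error:", np.abs(Mmat - approx).max())
```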

3.4.6 Generalizations of the EIM

In the following, we present some generalizations of the core concept behind the EIM.

3.4.6.1 Generalized Empirical Interpolation Method (gEIM)

We have seen that the EIM interpolation operator I q [f y ] interpolates the function f y at some empirically constructed points x 1,…,x q . The EIM can be generalized in the following sense, as proposed in [52]. Let Σ be a dictionary of linear continuous forms (say, for the L 2(Ωx)-norm) acting on the functions f y , y ∈ Ωy. Then, the gEIM consists in providing a set of basis functions h 1,…,h q , such that \({\mathbb{V}_q} = span\{ {h_1}, \ldots ,{h_q}\} = span\{ {f_{{y_1}}}, \ldots ,{f_{{y_q}}}\} \) for some empirically chosen {y 1,…,y q } ⊂ Ωy, and a set of linear forms, or moments, {σ1,…,σ q } ⊂ Σ. The generalized interpolant then takes the form

$${J_q}[{f_y}] = \sum\limits_{j = 1}^q {{g_j}(y){h_j}(x),\quad x \in {\Omega _x},\quad y \in {\Omega _y},} $$

and is defined in the following way

$${\sigma _i}({J_q}[{f_y}]) = {\sigma _i}({f_y}),\quad i = 1, \ldots ,q,$$

which will define the coefficients g j (y) for each y ∈ Ωy. We note that if the linear forms are Dirac functionals δ x with x ∈ Ωx, then the gEIM reduces to the plain EIM. The algorithm is given in Table 3.7. This constructive algorithm satisfies the following theoretical properties (see [52]):

  • the set {h 1,…,h q } consists of linearly independent functions;

  • the generalized interpolation matrix (B) ij = σ i (h j ) is lower triangular with unity diagonal (hence invertible), with the other entries in [−1, 1];

  • the generalized empirical interpolation procedure is well-posed in L 2x).

Scheme 3.7. Generalized Empirical Interpolation Method (gEIM)

Set q = 1. Do while err > tol:

  a.

    Pick the sample point

    $${y_q} = \mathop {\arg \,\sup }\limits_{y \in {\Omega _y}} {\left\| {{f_y} - {J_{q - 1}}[{f_y}]} \right\|_{{L^p}({\Omega _x})}},$$

    and the corresponding interpolation moment

    $${\sigma _q} = \mathop {\arg \,\sup }\limits_{\sigma \in \Sigma } \left| {\sigma ({f_{{y_q}}} - {J_{q - 1}}[{f_{{y_q}}}])} \right|.$$
  b.

    Define the next basis function as

    $${h_q} = \frac{{{f_{{y_q}}} - {J_{q - 1}}[{f_{{y_q}}}]}}{{{\sigma _q}({f_{{y_q}}} - {J_{q - 1}}[{f_{{y_q}}}])}}.$$
  c.

    Define the error level by

    $$err = {\left\| {er{r_p}} \right\|_{{L^\infty }({\Omega _y})}}\,with\,er{r_p}(y) = {\left\| {{f_y} - {J_{q - 1}}[{f_y}]} \right\|_{{L^p}({\Omega _x})}}$$

    and set q := q + 1.
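
A hedged Python sketch of Scheme 3.7 in a fully discrete setting is given below; here the dictionary Σ is, purely for illustration, a set of local averages over a sampling grid, the errors are measured in the discrete ℓ2-norm of the samples, and all identifiers are assumptions of the example.

```python
import numpy as np

def geim(snapshots, Sigma, tol=1e-8, max_terms=30):
    """Scheme 3.7 in a discrete setting.

    snapshots : array (M, N); column n samples f(., y_n) on M points of Omega_x.
    Sigma     : array (K, M); row k applies the linear form sigma_k to such a sample.
    Returns the basis H, the indices of the selected forms and snapshots."""
    H = np.zeros((snapshots.shape[0], 0))
    forms, params = [], []
    while H.shape[1] < max_terms:
        if forms:
            B = Sigma[forms, :] @ H                      # B_{kl} = sigma_k(h_l)
            G = np.linalg.solve(B, Sigma[forms, :] @ snapshots)
            Residual = snapshots - H @ G                 # f_y - J_{q-1}[f_y]
        else:
            Residual = snapshots.copy()
        errs = np.linalg.norm(Residual, axis=0)
        n_q = int(np.argmax(errs))                       # worst approximated snapshot
        if errs[n_q] < tol:
            break
        moments = Sigma @ Residual[:, n_q]
        k_q = int(np.argmax(np.abs(moments)))            # most informative linear form
        H = np.column_stack([H, Residual[:, n_q] / moments[k_q]])
        forms.append(k_q); params.append(n_q)
    return H, forms, params

# Toy example: Sigma = local averages over sliding windows (a simple dictionary).
x = np.linspace(0.0, 1.0, 400); y = np.linspace(0.5, 3.0, 60)
S = np.sin(np.outer(x, y))
Sigma = np.zeros((40, 400))
for k in range(40):
    Sigma[k, 10 * k:10 * (k + 1)] = 1.0 / 10.0           # mean over 10 consecutive points
H, forms, params = geim(S, Sigma, tol=1e-6)
print("q =", H.shape[1])
```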

In order to quantify the error of the interpolation procedure, like in the standard interpolation procedure, we introduce the Lebesgue constant in the L 2-norm:

$${\Lambda _q} = \mathop {\sup }\limits_{y \in {\Omega _y}} \frac{{{{\left\| {{J_q}[{f_y}]} \right\|}_{{L^2}({\Omega _x})}}}}{{{{\left\| {{f_y}} \right\|}_{{L^2}({\Omega _x})}}}},$$

i.e. the L 2-operator norm of J q . Thus, the interpolation error satisfies:

$${\left\| {{f_y} - {J_q}[{f_y}]} \right\|_{{L^2}(\Omega )}} \leqslant (1 + {\Lambda _q})\quad \mathop {\inf }\limits_{{v_q} \in {\mathbb{V}_q}} {\left\| {{f_y} - {v_q}} \right\|_{{L^2}(\Omega )}}.$$

Again, a (very pessimistic) upper bound for Λ q is:

$${\Lambda _q} \leqslant {2^{q - 1}}\mathop {\max }\limits_{i = 1, \ldots ,q} {\left\| {{h_i}} \right\|_{{L^2}(\Omega )}},$$

Indeed, the Lebesgue constant is, in many situations, uniformly bounded in this generalized setting. The following result shows that the greedy construction is close to optimal [53].

  1.

    Assume that the Kolmogorov n-width of ℱ in L 2(Ωx) is upper bounded by \({C_0}{n^{ - \alpha }}\) for some α > 0 and any n ≥ 1, then the interpolation error of the gEIM greedy selection process satisfies for any f ∈ ℱ the inequality \({\left\| {f( \cdot ,y) - {J_Q}[f( \cdot ,y)]} \right\|_{{L^2}({\Omega _x})}} \leqslant {C_0}{(1 + {\Lambda _Q})^3}{Q^{ - \alpha }}\).

  2.

    Assume that the Kolmogorov n-width of ℱ in L 2(Ωx) is upper bounded by \({C_0}{e^{ - {c_1}{n^\alpha }}}\) for any n ≥ 1, then the interpolation error of the gEIM greedy selection process satisfies for any f ∈ ℱ the inequality \({\left\| {f( \cdot ,y) - {J_Q}[f( \cdot ,y)]} \right\|_{{L^2}({\Omega _x})}} \leqslant {C_0}{(1 + {\Lambda _Q})^3}{e^{ - {c_2}{Q^\alpha }}}\) for a positive constant c 2 slightly smaller than c 1.

3.4.6.2 hp-EIM

If the Kolmogorov n-width decays only slowly with respect to n and the resulting number of basis functions and associated integration points is larger than desired, a remedy consists of partitioning the space Ωy into different elements Ω 1y ,…,Ω Py on which separate interpolation operators \({I_{{q_p}}}:{\{ {f_y}\} _{y \in \Omega _y^p}} \to {\mathbb{V}_{{q_p}}}\), p = 1,…,P, are constructed. That is, for each element Ω Py a standard EIM as described above is performed. The construction of the partition leaves some freedom, and different approaches have been presented in [30, 32].
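
Under the assumption that the partition of Ωy (and hence a grouping of the snapshot columns) is already given, the hp-EIM amounts to running the matrix EIM of Scheme 3.6 once per element; the helper below is a minimal sketch of that loop, reusing the illustrative eim_matrix function given after Scheme 3.6.

```python
def hp_eim(snapshots_per_element, tol=1e-8):
    """One independent EIM per element of the partition of Omega_y (hp-EIM sketch).

    snapshots_per_element : list of (N, m_p) matrices, one snapshot matrix per
                            parameter subdomain Omega_y^p, p = 1,...,P.
    """
    return [eim_matrix(M_p, tol=tol) for M_p in snapshots_per_element]
```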

A somewhat different approach is presented in [55], although in the framework of a projection method, where the idea of a strict partition of the space Ωy is abandoned. Instead, given a set of sample points y 1,…,y K for which the basis functions f(·,y 1),…,f(·,y K ) are known (or have been computed), a local approximation space for any y ∈ Ωy is constructed by considering the N basis functions whose parameter values are closest to y. In addition, the distance function measuring the distance between two points in Ωy can be built empirically in order to represent local anisotropies in the parameter space Ωy. Further, the distance function can also be used to define the training set Ω trainy, which can be sampled uniformly with respect to the problem-dependent distance function.
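
A minimal sketch of the local selection step of [55], under the assumption that the snapshots f(·,y k ) are stored column-wise and that the (possibly anisotropic) distance is supplied as a callable; all names are illustrative.

```python
import numpy as np

def local_basis(y, y_samples, snapshots, n_local, distance=None):
    """Return the n_local snapshots whose parameters are closest to y.

    y_samples : (K, d) array of sample parameters y_1,...,y_K.
    snapshots : (N, K) matrix whose k-th column is f(., y_k).
    distance  : optional callable (y, y_k) -> float, e.g. an empirically built
                anisotropic distance; defaults to the Euclidean distance.
    """
    if distance is None:
        d = np.linalg.norm(y_samples - y, axis=1)
    else:
        d = np.array([distance(y, yk) for yk in y_samples])
    nearest = np.argsort(d)[:n_local]
    return snapshots[:, nearest], nearest
```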

Several approaches have been presented in cases where Ωy is high-dimensional (dim(Ωy) ≈ 10). In such cases, finding the maximizer in (3.22) becomes a challenge. Since the discrete set Ω trainy should be representative of Ωy, we require that Ω trainy consists of a very large number of training points. Finding the maximum over this huge set is therefore prohibitively expensive as a result of the curse of dimensionality.

In [42], the authors propose a computational approach that randomly samples the space Ωy with a feasible number of training points, the sample being renewed at each iteration. The total number of training points tested over all iterations is therefore still very large, while finding the maximum at each individual iteration remains a feasible task.
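
This idea can be sketched as follows: at every greedy iteration a fresh, moderately sized random training set is drawn and the maximization is carried out only over that set. The skeleton below is purely illustrative; error_estimator, sample_parameter and update_model stand for problem-specific ingredients and are not defined under these names in [42].

```python
import numpy as np

def greedy_with_random_training(error_estimator, sample_parameter, update_model,
                                n_random, max_iter, tol, seed=0):
    """Greedy loop maximizing over a fresh random training set at each iteration."""
    rng = np.random.default_rng(seed)
    selected = []
    for _ in range(max_iter):
        candidates = [sample_parameter(rng) for _ in range(n_random)]
        errors = [error_estimator(y) for y in candidates]
        j = int(np.argmax(errors))
        if errors[j] < tol:
            break
        selected.append(candidates[j])
        update_model(candidates[j])   # enrich the approximation with the selected parameter
    return selected
```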

In [43], the authors use, in the framework of the reduced basis method, an ANOVA expansion based on sparse grid quadrature in order to identify the sensitivity of each dimension in Ωy. Once the unimportant dimensions in Ωy are identified, their values are fixed to some reference value and the variation of y in Ωy is restricted to the important dimensions. Finally, a greedy-based algorithm is used to construct a low-order approximation.

3.5 Comparison of ACA versus EIM

In the previous sections, we have given independent presentations of the basics of the ACA and the EIM type methods. As was explained, their backgrounds and applications are different. In addition, we have also presented the results of the convergence analysis of these approximations, which reveals another fundamental difference between the two approaches. The convergence of the ACA is established by comparison with any other interpolation system, such as polynomial approximation, and the existence of derivatives of the family of functions f y , y ∈ Ωy is then the reason for convergence. The convergence of the EIM, on the other hand, is measured against the n-width of the family, expressed by the Kolmogorov small dimension.

Nevertheless, despite their different origins, it is clear that some link exists between these two constructive approximation methods. We now show the relation between the ACA and the EIM in a particular case.

Theorem 3.1 The Bivariate Adaptive Cross Approximation with global pivoting is equivalent to the Empirical Interpolation Method using the L ∞(Ωx)-norm.

Proof We proceed by induction. Our claim A q at the q-th step is:

(A q )1: the interpolation points {x 1,…,x q } and {y 1,…,y q } of the EIM and ACA are identical;

(A q )2: g q (y) = r q-1(x q , y), y ∈ Ωy;

(A q )3: I q [f y ](x) = ℑ q [f y ](x), (x,y) ∈ Ωx × Ωy.

Induction base (q = 1): First, we note that r 0 = f and thus

$$({x_1},{y_1}) = \mathop {\arg \,\sup }\limits_{(x,y) \in {\Omega _x} \times {\Omega _y}} \left| {{r_0}(x,y)} \right| = \quad \mathop {\arg \,\sup }\limits_{(x,y) \in {\Omega _x} \times {\Omega _y}} \left| {f(x,y)} \right|.$$

Then, from (3.20) we conclude that \({h_1}(x) = \frac{{f(x,{y_1})}}{{f({x_1},{y_1})}}\) and by (3.17) we obtain that \({g_1}(y) = \frac{{f({x_1},y)}}{{{h_1}({x_1})}} = f({x_1},y) = {r_0}({x_1},y)\) since h 1(x 1) = 1. Further, using additionally (3.15), we get

$${I_1}[{f_y}](x) = {g_1}(y){h_1}(x) = {r_0}({x_1},y)\frac{{f(x,{y_1})}}{{f({x_1},{y_1})}} = \frac{{{r_0}({x_1},y){r_0}(x,{y_1})}}{{{r_0}({x_1},{y_1})}} = {\Im _1}[{f_y}](x),$$

for all (x, y) ∈ Ωx × Ωy and A1 holds in consequence.

Induction step (q > 1): Let us assume A q-1 to be true and we first note that

$${r_{q - 1}}\left( {x,y} \right) = f\left( {x,y} \right) - {I_{q - 1}}\left[ {{f_y}} \right]\left( x \right)$$
(3.23)

by (3.10) and (A q-1)3. Therefore, the selection criteria for the points (x q , y q ) are identical for the EIM with p = ∞ and the ACA with global pivoting. In consequence, the chosen sample points (x q , y q ) are identical. Further, combining (3.20) and (3.23) yields

$${h_q}\left( x \right) = \frac{{{f_{{y_q}}}\left( x \right) - {I_{q - 1}}\left[ {{f_{{y_q}}}} \right]\left( x \right)}}{{{f_{{y_q}}}\left( {{x_q}} \right) - {I_{q - 1}}\left[ {{f_{{y_q}}}} \right]\left( {{x_q}} \right)}} = \frac{{{r_{q - 1}}\left( {x,{y_q}} \right)}}{{{r_{q - 1}}\left( {{x_q},{y_q}} \right)}}.$$
(3.24)

By (3.17) for i = q, using that h q (x q ) = 1 and (3.23), we obtain (A q )2:

$${g_q}\left( y \right) = f\left( {{x_q},y} \right) - \sum\limits_{j = 1}^{q - 1} {{g_j}\left( y \right){h_j}\left( {{x_q}} \right)} = f\left( {{x_q},y} \right) - {I_{q - 1}}\left[ {{f_y}} \right]\left( {{x_q}} \right) = {r_{q - 1}}\left( {{x_q},y} \right).$$
(3.25)

Finally, combining (3.24) and (3.25) in addition to (A q-1)3, we conclude that

$${I_q}[{f_y}](x) = {I_{q - 1}}[{f_y}](x) + {g_q}(y){h_q}(x) = {\Im _{q - 1}}[{f_y}](x) + {r_{q - 1}}({x_q},y)\frac{{{r_{q - 1}}(x,{y_q})}}{{{r_{q - 1}}({x_q},{y_q})}} = {\Im _q}[{f_y}](x)$$

and the proof is complete.
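
The equivalence can also be checked numerically on a small example; the sketch below implements the ACA with global pivoting directly on a matrix and compares its pivots with those returned by the illustrative eim_matrix sketch given after Scheme 3.6 (the kernel 1/(1 + x + y) is an arbitrary test case, not taken from the text).

```python
import numpy as np

def aca_global(M, rank):
    """Bivariate ACA with global pivoting on the representative matrix M."""
    R = np.array(M, dtype=float)
    pivots = []
    for _ in range(rank):
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        pivots.append((int(i), int(j)))
        R -= np.outer(R[:, j], R[i, :]) / R[i, j]   # cross (rank-one) update
    return pivots

x = np.linspace(0.0, 1.0, 200)
y = np.linspace(0.0, 1.0, 150)
M = 1.0 / (1.0 + x[:, None] + y[None, :])
rows, cols, _ = eim_matrix(M, tol=0.0, max_rank=5, p=np.inf)
print(aca_global(M, 5))                            # same pivot pairs (up to ties) ...
print(list(zip(rows.tolist(), cols.tolist())))     # ... as the EIM with the sup-norm
```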

3.6 Gappy POD

In the following, we present a complement to the POD method called Gappy POD [17, 31, 70] or Missing Point Estimation [1]; we refer to it as the Gappy POD. It is a projection-based method (thus not an interpolation-based method, although in some particular cases it can be interpreted as an interpolation scheme). However, the projection is computed from a low-rank approximation that is based on partial or incomplete ("gappy") data of the functions under consideration. We first present the method as introduced in [17, 70] and then generalize it.

3.6.1 The Gappy POD Algorithm

We start from the conceptual idea that a set of basis functions {h 1,…,h Q } is given; it can, but does not need to, be obtained through a POD procedure. We first introduce the idea of Gappy POD in the context of Remark 3.1, where functions are represented by a vector containing their pointwise values on a given grid \(\Omega _x^{train} = \{ {\hat x_1}, \ldots ,{\hat x_M}\} \). We recall that the projection P Q [f y ] of f y with y ∈ Ωy onto the space spanned by {h 1,…,h Q } is defined by

$${({P_Q}[{f_y}],{h_q})_{\Omega _x^{train}}} = {({f_y},{h_q})_{\Omega _x^{train}}},\quad q = 1, \ldots ,Q.$$

Next, assume that we only have access to some incomplete data of f y . That is, we are given, say, L (< M) distinct points {x 1,…,x L } among Ω trainx at which f y (x i ) is available. Then, we define the gappy scalar product by

$${(v,w)_{L,\Omega _x^{train}}} = \frac{{|\Omega |}}{L}\sum\limits_{i = 1}^L {v({x_i})w({x_i}),} $$

which only takes into account available data of f y . We can compute the gappy projection defined by

$${({P_{Q,L}}[{f_y}],{h_q})_{L,\Omega _x^{train}}} = {({f_y},{h_q})_{L,\Omega _x^{train}}},\quad q = 1, \ldots ,Q.$$

Observe that the basis functions {h 1,…,h Q } are no longer orthonormal for the gappy scalar product and that the stability of the method mainly depends on the properties of the mass matrix M h,L defined by

$${({M_{h,L}})_{i,j}} = {({h_j},{h_i})_{L,\Omega _x^{train}}}.$$

To summarize, in the above presentation we assumed that the data of f y at some given points was available and then defined a "best approximation" with respect to the available but incomplete data. For instance, the data can be acquired by physical measurements, and the Gappy POD then allows one to reconstruct the solution on the whole domain Ω trainx, assuming that it can be accurately represented by the basis functions {h 1,…,h Q }.
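
In matrix form, the gappy projection amounts to solving the small normal equations obtained by restricting the basis to the available points (the factor |Ω|/L cancels). The following is a minimal sketch; gappy_projection and its argument names are illustrative.

```python
import numpy as np

def gappy_projection(H, f_at_sensors, idx):
    """Gappy POD projection of f_y onto span{h_1,...,h_Q}.

    H            : (M, Q) matrix, columns are the basis functions on the training grid.
    f_at_sensors : (L,) available values f_y(x_i) at the sensor locations.
    idx          : (L,) indices of the sensor locations within the training grid.
    """
    H_L = H[idx, :]                                      # basis restricted to the gappy data
    coeffs, *_ = np.linalg.lstsq(H_L, f_at_sensors, rcond=None)
    return H @ coeffs, coeffs                            # reconstruction on the full grid
```

The least-squares solve is equivalent to the normal equations with the gappy mass matrix M h,L = H Lᵀ H L (up to the factor |Ω|/L), whose conditioning governs the stability mentioned above.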

We now change the viewpoint and ask the following question: if we can place L sensors at locations {x i } L i=1 ⊂ Ωx at which we have access to the data f y (x i ) (through measurements), where should the points {x i } L i=1 be placed?

One might consider different criteria to choose the sensors. In [70], the placement of L sensors is stated as the minimization problem

$$\min \,\kappa ({M_{h,L}})\quad {\text{where }}{M_{h,L}}{\text{ is based on the }}L{\text{ points }}\{ {x_1}, \ldots ,{x_L}\} $$

and κ(M h,L ) denotes the condition number of M h,L . We report in Table 3.8 a slight modification of the algorithm presented in [1, 70] to construct a sequence of sensor placements {x 1,…,x L } (with L ≥ Q) based on an incremental greedy algorithm.

Scheme 3.8. Sensor placement algorithm with Gappy POD and minimal condition number

For 1 ≤ l ≤ L:

$${x_l} = \mathop {\arg \,\min }\limits_{x \in {\Omega _x}} \kappa ({M_{h,l}}(x))$$

where

$${({M_{h,l}}(x))_{i,j}} = \frac{{|{\Omega _x}|}}{l}\left[ {\sum\limits_{k = 1}^{l - 1} {{h_i}({x_k}){h_j}({x_k}) + {h_i}(x){h_j}(x)} } \right],\quad 1 \leqslant i,j \leqslant \min (Q,l).$$
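
A direct transcription of Scheme 3.8 into the NumPy setting used above reads as follows; place_sensors_cond and the optional initial argument (reused further below) are illustrative, and the |Ωx|/l scaling is dropped since it does not affect the condition number.

```python
import numpy as np

def place_sensors_cond(H, candidates, L, initial=None):
    """Sketch of Scheme 3.8: greedy sensor placement minimizing cond(M_{h,l})."""
    Q = H.shape[1]
    chosen = list(initial) if initial is not None else []
    while len(chosen) < L:
        q_act = min(Q, len(chosen) + 1)        # only the first min(Q, l) basis functions enter
        best, best_kappa = None, np.inf
        for c in candidates:
            if c in chosen:
                continue
            H_l = H[chosen + [c], :q_act]
            kappa = np.linalg.cond(H_l.T @ H_l)
            if kappa < best_kappa:
                best, best_kappa = c, kappa
        chosen.append(best)
    return chosen
```

Note that for the very first sensor the 1 × 1 matrix has condition number one for any location where h 1 does not vanish, so the selection is essentially arbitrary at that stage.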

Scheme 3.9. Sensor placement algorithm with Gappy POD and minimal error

For 1 ≤ l ≤ L:

$${x_l} = \mathop {\arg \,\max }\limits_{x \in {\Omega _x}} \left| {{P_{Q,l - 1}}[{f_y}](x) - {f_y}(x)} \right|$$

where P Q,l-1[f y] is the gappy projection of f y onto the span of {h 1,…,h min(Q,l-1)} based on the pointwise information at {x 1,…,x l-1}.

This natural algorithm of Scheme 3.8, based on the condition number, actually seems to have some difficulties at the beginning, for small values of l (for a single sensor the condition number equals one for any admissible location). It is thus recommended to start with the algorithm presented in Table 3.9.

This criterion is actually the one used in the Gappy POD method presented in [20] within the framework of the GNAT approach, which provides a stabilized implementation of the gappy method for a challenging CFD problem. Further, we have the following link between the gappy projection and the EIM, as noticed in [33].

Lemma 3.2 Let {h 1,…,h Q } and {x 1,…,x Q } be given basis functions and interpolation nodes. If the interpolation is well-defined (i.e. the interpolation matrix is invertible), then the interpolatory system based on the basis functions {h 1,…,h Q } and the interpolation nodes {x 1,…,x Q } is equivalent to the gappy projection system based on the basis functions {h 1,…,h Q } with available data at the points {x 1,…,x Q }; that is, for any y ∈ Ωy the unique interpolant I Q [f y ] ∈ span{h 1,…,h Q } such that

$${I_Q}\left[ {{f_y}} \right]\left( {{x_q}} \right) = {f_y}\left( {{x_q}} \right),\quad q = 1, \ldots ,Q,$$
(3.26)

is equivalent to the unique gappy projection P Q,L [f y ] defined by

$${\left( {{P_{Q,L}}\left[ {{f_y}} \right],{h_q}} \right)_{Q,\Omega _x^{train}}} = {\left( {{f_y},{h_q}} \right)_{Q,\Omega _x^{train}}},\quad q = 1, \ldots ,Q.$$
(3.27)

Proof Multiply (3.26) by \(\frac{{|{\Omega _x}|}}{Q}{h_i}({x_q})\) and take the sum over all q = 1,…,Q to obtain

$$\frac{{|{\Omega _x}|}}{Q}\sum\limits_{q = 1}^Q {{I_Q}[{f_y}]({x_q}){h_i}({x_q}) = } \frac{{|{\Omega _x}|}}{Q}\sum\limits_{q = 1}^Q {{f_y}({x_q}){h_i}({x_q}),\quad i = 1, \ldots ,Q,} $$

which is equivalent to \({({I_Q}[{f_y}],{h_i})_{L,\Omega _x^{train}}} = {({f_y},{h_i})_{L,\Omega _x^{train}}}\) for all i = 1,…,Q. On the other hand, if P Q,L [f y ] is the solution of (3.27), then there holds that

$$\sum\limits_{q = 1}^Q {{P_{Q,L}}\left[ {{f_y}} \right]\left( {{x_q}} \right){h_i}\left( {{x_q}} \right)} = \sum\limits_{q = 1}^Q {{f_y}\left( {{x_q}} \right){h_i}\left( {{x_q}} \right)} ,\quad i = 1, \ldots ,Q.$$
(3.28)

Since the interpolating system is well-posed, the interpolation matrix B i,j = h j (x i ) is invertible and thus, for each j = 1,…,Q, there exists a vector u j such that Bu j = e j , where e j is the j-th canonical basis vector. Then, multiply (3.28) by (u j ) i and sum over all i:

$$\sum\limits_{i,q = 1}^Q {{P_{Q,L}}[{f_y}]({x_q}){{({u_j})}_i}{B_{qi}} = \sum\limits_{i,q = 1}^Q {{f_y}({x_q}){{({u_j})}_i}{B_{q,i}},\quad j = 1, \ldots ,Q,} } $$

to get

$${P_{Q,L}}[{f_y}]({x_j}) = {f_y}({x_j}),\quad j = 1, \ldots ,Q.$$

Thus, the gappy projection satisfies the interpolation scheme.
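
The lemma can be illustrated with the gappy_projection sketch from above: when the number of sensors equals the number of basis functions and the restricted matrix is invertible, the gappy projection reproduces the data at the sensor locations. The matrices below are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((50, 4))           # Q = 4 basis functions on a grid of 50 points
f = rng.standard_normal(50)                # data vector representing some f_y
idx = np.array([3, 11, 27, 42])            # L = Q = 4 sensor locations
P, _ = gappy_projection(H, f[idx], idx)
assert np.allclose(P[idx], f[idx])         # the gappy projection interpolates at the sensors
```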

One feature of the sensor placement algorithm based on the Gappy POD framework is that the basis functions {h 1,…,h q } are given and the sensors are chosen accordingly. As a consequence of the interpretation of the gappy projection as an interpolation scheme when the number of basis functions and sensors coincide, one might combine the Gappy POD approach with the EIM in the following way in order to construct basis functions and redundant sensor locations simultaneously (a minimal sketch combining the two steps is given after the list):

  1.

    use the EIM to construct simultaneously Q basis functions {h q } Q q=1 and interpolation points {x q } Q q=1 until a sufficiently small error is achieved;

  2.

    use the gappy projection framework as outlined above to add interpolation points (sensors) to enhance the stability of the scheme.
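
Under the naming conventions of the earlier sketches (eim_matrix, place_sensors_cond and gappy_projection are all illustrative, as is the synthetic kernel used here), the two steps can be combined as follows.

```python
import numpy as np

# Synthetic data, purely for illustration.
x = np.linspace(0.0, 1.0, 100)
y = np.linspace(0.0, 1.0, 80)
M = np.exp(-(x[:, None] - y[None, :])**2 / 0.1)     # snapshots f_y on the grid

rows, cols, H = eim_matrix(M, tol=1e-6)             # step 1: EIM basis and points
sensors = place_sensors_cond(H, candidates=range(M.shape[0]),
                             L=len(rows) + 5,       # step 2: add, e.g., 5 extra sensors
                             initial=list(rows))
f_measured = M[:, 10]                               # pretend the 11th snapshot was measured
reconstruction, _ = gappy_projection(H, f_measured[sensors], np.array(sensors))
```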

3.6.2 Generalization of Gappy POD

In the previous algorithm the functions were represented by their nodal values at some points \({\hat x_1}, \ldots ,{\hat x_M}\). That is, we can introduce for each point \({\hat x_i}\) a functional \({\hat \sigma _i} = {\delta _{{{\hat x}_i}}}\) (\({\delta _x}\) denoting the Dirac functional associated with the point x) such that the interpolant of any continuous function f onto the space \({\mathbb{V}_M}\) of piecewise linear and globally continuous functions can be written as

$$\sum\limits_{m = 1}^M {{{\hat \sigma }_m}(f){{\hat \varphi }_m},} $$

where \(\{ {\hat \varphi _m}\} _{m = 1}^M\) denotes the Lagrange basis of \({\mathbb{V}_M}\) with respect to the points \({\hat x_1}, \ldots ,{\hat x_M}.\)

We present a generalization where we allow a more general discrete space \({\mathbb{V}_M}\). Therefore, let \({\mathbb{V}_M}\) be an M-dimensional discrete space spanned by a set of basis functions \(\{ {\hat \varphi _i}\} _{i = 1}^M\), such as, for example, finite element hat-functions, a Fourier basis or polynomial basis functions. In the context of the theory of finite elements, cf. [27], we are given M functionals \(\{ {\hat \sigma _m}\} _{m = 1}^M\), associated with the basis set \(\{ {\hat \varphi _i}\} _{i = 1}^M\), which determine the degrees of freedom of a function. That is, for f regular enough such that all degrees of freedom \({\hat \sigma _m}(f)\) are well-defined, the following interpolation scheme

$$f \to \sum\limits_{m = 1}^M {{{\hat \sigma }_m}(f){{\hat \varphi }_m}} $$

defines a function in \({\mathbb{V}_M}\) that interpolates the degrees of freedom.

We start by noting that the scalar product between two functions f, g in \({\mathbb{V}_M}\) is given by

$${(f,g)_{{\Omega _x}}} = \sum\limits_{n,m = 1}^M {{{\hat \sigma }_n}(f){{\hat \sigma }_m}(g){{({{\hat \varphi }_n},{{\hat \varphi }_m})}_{{\Omega _x}}}.} $$

In this framework, the meaning of "gappy" data is generalized. We speak of gappy data if only partial data of the degrees of freedom, i.e. of the \({\hat \sigma _m}(f)\), is available. Thus, in this generalized context, the degrees of freedom are not necessarily nodal values, i.e. the functionals need not be Dirac functionals, and they depend on the choice of the basis functions.

Assume that we are given Q basis functions h 1,…,h Q that describe a subspace of \({\mathbb{V}_M}\) and L ≥ Q degrees of freedom \({\sigma _l} = {\hat \sigma _{{i_l}}}\), for l = 1,…,L (chosen among all M degrees of freedom \({\hat \sigma _1}, \ldots ,{\hat \sigma _M}\)). Denoting by \({\varphi _l} = {\hat \varphi _{{i_l}}}\) the corresponding L basis functions, we then define the gappy scalar product

$${(f,g)_{L,{\Omega _x}}} = \frac{M}{L}\sum\limits_{l,k = 1}^L {{\sigma _l}(f){\sigma _k}(g){{({\varphi _l},{\varphi _k})}_{{\Omega _x}}}.} $$

Given any f y , y ∈ Ωy, the gappy projection P Q,L [f y ] ∈ span{h 1,…,h Q } is defined by

$${({P_{Q,L}}[{f_y}],{h_q})_{L,{\Omega _x}}} = {({f_y},{h_q})_{L,{\Omega _x}}},\quad q = 1, \ldots ,Q.$$

Then, the sensor placement algorithm introduced in the previous section can easily be generalized to this setting.
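
In coefficient form, and under the standard finite element convention that the degrees of freedom \({\hat \sigma _m}(f)\) coincide with the coefficients of f in the basis \(\{ {\hat \varphi _i}\} \) (dual basis), the generalized gappy projection again reduces to a small linear system. The sketch below is illustrative; the names and this dual-basis assumption are not part of the original presentation.

```python
import numpy as np

def generalized_gappy_projection(H_coeff, f_dofs, sel, mass):
    """Generalized gappy projection with arbitrary (non-nodal) degrees of freedom.

    H_coeff : (M, Q) coefficients of h_1,...,h_Q in the basis {phi_hat_i}.
    f_dofs  : (L,) available degrees of freedom sigma_l(f_y), l = 1,...,L.
    sel     : (L,) indices i_l of the selected degrees of freedom.
    mass    : (M, M) mass matrix (phi_hat_j, phi_hat_i)_{Omega_x}.
    """
    H_sel = H_coeff[sel, :]              # sigma_l(h_q), the selected dofs of the basis
    G = mass[np.ix_(sel, sel)]           # (phi_l, phi_k)_{Omega_x}
    A = H_sel.T @ G @ H_sel              # gappy Gram matrix of h_1,...,h_Q
    b = H_sel.T @ (G @ f_dofs)           # gappy right-hand side
    coeffs = np.linalg.solve(A, b)
    return H_coeff @ coeffs, coeffs      # coefficients of P_{Q,L}[f_y] in {phi_hat_i}
```

If mass is (|Ωx|/M) times the identity and the basis is nodal, this reduces to the gappy_projection sketch of the previous subsection, in line with Remark 3.11 below.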

Remark 3.11 If the mass matrix \({\hat M_{i,j}} = {({\hat \varphi _j},{\hat \varphi _i})_{{\Omega _x}}}\) associated with the basis set \(\{ {\hat \varphi _i}\} _{i = 1}^M\) satisfies the orthogonality property \({\hat M_{i,j}} = {({\hat \varphi _j},{\hat \varphi _i})_{{\Omega _x}}} = \frac{{|{\Omega _x}|}}{M}{\delta _{ij}}\), either by construction of the basis functions or by mass lumping (in the case of finite elements), and if the basis functions \(\{ {\hat \varphi _i}\} _{i = 1}^M\) are nodal basis functions associated with the set of points \(\Omega _x^{train} = \left\{ {{{\hat x}_1}, \ldots ,{{\hat x}_M}} \right\}\), then the original Gappy POD method is recovered.

Remark 3.12 If the selected basis functions \(\{ {\varphi _l}\} _{l = 1}^L\) are orthogonal with equal norms, i.e. the restricted mass matrix \({({\varphi _k},{\varphi _l})_{{\Omega _x}}}\) is proportional to the identity, then the gappy projection P Q,L [f y ] is the solution of the following quadratic minimization problem

$$\mathop {\min }\limits_{f \in {\mathbb{V}_Q}} \sum\limits_{l = 1}^L {|{\sigma _l}({f_y}) - {\sigma _l}(f){|^2}.} $$

Since L ≥ Q in a general setting, this means that the gappy projection fits the selected degrees of freedom optimally in a least-squares sense. In the general case, P Q,L [f y ] is the solution of the following minimization problem

$$\mathop {\min }\limits_{f \in {\mathbb{V}_Q}} \sum\limits_{l,k = 1}^L {({\sigma _l}({f_y}) - {\sigma _l}(f)){{({\varphi _l},{\varphi _k})}_{{\Omega _x}}}({\sigma _k}({f_y}) - {\sigma _k}(f)).} $$

Acknowledgements This work was supported by the research grant ApProCEM-FP7-PEOPLE-PIEF-GA-2010-276487.