1 Introduction

The mathematical modeling of complex physical phenomena often calls for functions that depend on a large number of variables. One typical instance occurs when a quantity of interest u is given as the solution to an equation, written in general form as

$$\displaystyle \begin{aligned} \mathcal{P}(u,y)=0, {} \end{aligned} $$
(1)

where \(y=(y_j)_{j=1,\dots ,d}\in \mathbb {R}^d\) is a vector that concatenates various physical parameters which have an influence on u.

Supposing that we are able to solve the above problem, either exactly or approximately by numerical methods, for any y in some domain of interest \(U\subset \mathbb {R}^d\), we thus have access to the parameter-to-solution map

$$\displaystyle \begin{aligned} y \mapsto u(y). {} \end{aligned} $$
(2)

The quantity u(y) may be of various forms, namely:

  1.

    a real number, that is, \(u(y)\in \mathbb {R}\);

  2.

    a function in some Banach space, for example when (1) is a partial differential equation (PDE);

  3.

    a vector of possibly large dimension, in particular when (1) is a PDE whose solution is numerically approximated using some numerical method with fixed discretization parameters.

In all three cases, the above maps act from U to some finite- or infinite-dimensional Banach space which we shall generically denote by V .

As a guiding example which will be further discussed in this paper, consider the elliptic diffusion equation

$$\displaystyle \begin{aligned} -\mathrm{div}(a\nabla u)=f, {} \end{aligned} $$
(3)

set on a given bounded Lipschitz domain \(D\subset \mathbb {R}^k\) (say with k = 1, 2 or 3), for some fixed right-hand side \(f \in L^2(D)\), homogeneous Dirichlet boundary conditions u|∂D = 0, and where a has the general form

$$\displaystyle \begin{aligned} a=a(y)=\overline a+\sum_{j\,{\geq}\,1} y_j\psi_j. {} \end{aligned} $$
(4)

Here, \(\overline a\) and ψj are given functions in \(L^\infty (D)\), and the yj range in finite intervals that, up to renormalization, can all be assumed to be [−1, 1]. In this example y = (yj)j ≥ 1 is countably infinite-dimensional, that is, d = ∞. The standard weak formulation of (3) in \(H^1_0(D)\),

$$\displaystyle \begin{aligned} \int_D a\nabla u\cdot\nabla v=\int_D fv, \quad v\in H^1_0(D), \end{aligned}$$

is well-posed for all such a under the so-called uniform ellipticity assumption

$$\displaystyle \begin{aligned} \sum_{j\,{\geq}\,1}|\psi_j| \leq \overline a-r, \quad \mbox{a.e. on } D, {} \end{aligned} $$
(5)

for some r > 0. In this case, the map y ↦ u(y) acts from \(U=[-1,1]^{\mathbb {N}}\) to \(H^1_0(D)\). However, if we consider the discretization of (3) in some finite element space \(V_h \subset H_0^1(D)\), where h refers to the corresponding mesh size, using for instance the Galerkin method, then the resulting map

$$\displaystyle \begin{aligned} y\mapsto u_h(y), \end{aligned}$$

acts from \(U=[-1,1]^{\mathbb {N}}\) to Vh. Likewise, if we consider a quantity of interest such as the flux \(q(u)=\int_\Sigma a\nabla u\cdot \sigma\) over a given interface Σ ⊂ D, with σ being the outward pointing normal vector, then the resulting map

$$\displaystyle \begin{aligned}y\mapsto q(y)=q(u(y)), \end{aligned}$$

acts from \(U=[-1,1]^{\mathbb {N}}\) to \(\mathbb {R}\). In all three cases, the above maps act from U to the finite- or infinite-dimensional Banach space V , which is either \(H^1_0\), Vh or \(\mathbb {R}\).
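To make this guiding example concrete, the following minimal sketch (in Python) queries the parameter-to-solution map y ↦ uh(y) for a one-dimensional instance of (3) discretized by finite differences. The choices of \(\overline a\), ψj and f below are hypothetical and only serve to make the sketch runnable.

```python
# A minimal 1D illustration of one query of the map (2): solve -(a u')' = f on
# D = (0,1) with u(0) = u(1) = 0 by finite differences, for an affine
# coefficient a(y) = abar + sum_j y_j psi_j with psi_j(x) = j^{-2} sin(j pi x).
import numpy as np

def solve_parametric(y, h=1e-2, abar=2.0):
    x = np.arange(h, 1.0, h)                            # interior grid points
    xm = np.concatenate([x - h / 2, [x[-1] + h / 2]])   # midpoints where a is sampled
    a = abar + sum(yj * j ** (-2) * np.sin(j * np.pi * xm)
                   for j, yj in enumerate(y, start=1))
    f = np.ones_like(x)                                 # hypothetical right-hand side
    # tridiagonal discretization of -(a u')' with Dirichlet boundary conditions
    A = (np.diag(a[:-1] + a[1:])
         - np.diag(a[1:-1], 1) - np.diag(a[1:-1], -1)) / h ** 2
    return x, np.linalg.solve(A, f)                     # u_h(y) at the interior nodes

x, u = solve_parametric(y=[0.5, -0.3, 0.8])             # one evaluation of y -> u_h(y)
```

Note that the chosen ψj satisfy (5), since ∑j ≥ 1 j−2 = π2∕6 < 2 = \(\overline a\).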

In the previous instances, the functional dependence between the input parameters y and the output u(y) is described in clear mathematical terms by Eq. (1). In other practical instances, the output u(y) can be the outcome of a complex physical experiment or numerical simulation with input parameter y, and the dependence on y might not be given in such clear mathematical terms.

In all the abovementioned cases, we assume that we are able to query the map (2) at any given parameter value y ∈ U, possibly up to some uncertainty. Such uncertainty may arise due to:

  1.

    measurement errors, when y ↦ u(y) is obtained by a physical experiment, or

  2.

    computational errors, when y ↦ u(y) is obtained by a numerical computation.

The second type of errors may result from the spatial discretization when solving a PDE with a given method, and from round-off errors when solving the associated discrete systems.

One common way of modeling such errors is by assuming that we observe u at the selected points y up to an additive noise η which may depend on y, that is, we evaluate

$$\displaystyle \begin{aligned} y\mapsto u(y)+\eta(y), {} \end{aligned} $$
(6)

where η satisfies a uniform bound

$$\displaystyle \begin{aligned} \|\eta\|{}_{L^\infty(U,V)}:=\sup_{y\in U} \|\eta(y)\|{}_V\leq \varepsilon, {} \end{aligned} $$
(7)

for some ε > 0 representing the noise level.

Queries of the exact u(y) or of the noisy u(y) + η(y) are often expensive, since they require numerically solving a PDE, setting up a physical experiment, or running a time-consuming simulation algorithm. A natural objective is therefore to approximate the map (2) from some fixed number m of such queries at points {y1, …, ym}⊂ U. Such approximations \(y\mapsto \widetilde u(y)\) are sometimes called surrogate or reduced models.

Let us note that approximation of the map (2) is sometimes a preliminary task for solving other, possibly more complicated, problems such as:

  1.

    Optimization and Control, i.e. find a y which minimizes a certain criterion depending on u(y). In many situations, the criterion takes the form of a convex functional of u(y), and the minimization is subject to feasibility constraints. See e.g. the monographs [3, 30] and references therein for an overview of classical formulations and numerical methods for optimization problems.

  2.

    Inverse Problems, i.e. find an estimate of y from some data depending on the output u(y). Typically, we face an ill-posed problem, where the parameter-to-solution map does not admit a global and stable inverse. Nonetheless, developing efficient numerical methods for approximating the parameter-to-solution map, i.e. solving the so-called direct problem, is a first step towards the construction of numerical methods for solving the more complex inverse problem, see e.g. [36].

  3.

    Uncertainty Quantification, i.e. describe the stochastic properties of the solution u(y) in the case where the parameter y is modeled by a random variable distributed according to a given probability density. We may for instance be interested in computing the expectation or variance of the V -valued random variable u(y). Note that this task amounts to computing multivariate integrals over the domain U with respect to the given probability measure. This area also embraces, among others, optimization and inverse problems whenever affected by uncertainty in the data. We refer to e.g. [24] for the application of polynomial approximation to uncertainty quantification, and to [35] for the Bayesian approach to inverse problems.

There exist many approaches for approximating an unknown function of one or several variables from its evaluations at given points. One of the most classical approaches consists in picking the approximant from a suitable n-dimensional space of elementary functions, where

$$\displaystyle \begin{aligned} n\leq m. {} \end{aligned} $$
(8)

Here, by “suitable” we mean that the space should have the ability to approximate the target function to some prescribed accuracy, taking advantage, for instance, of its smoothness properties. By “elementary” we mean that such functions should have a simple explicit form which can be efficiently exploited in numerical computations. The simplest such functions are obviously polynomials in the variables yj. As a classical example, we may decide to use, for some given \(k\in \mathbb {N}_0\), the total degree polynomial space of order k, namely

$$\displaystyle \begin{aligned} \mathbb{P}_k:=\mathrm{span}\left\{y\mapsto y^\nu\; : \; |\nu|:=\|\nu\|{}_1=\sum_{j=1}^d \nu_j\leq k\right\}, \end{aligned}$$

with the standard notation

$$\displaystyle \begin{aligned} \quad y^\nu:=\prod_{j=1}^dy_j^{\nu_j}, \quad \nu=(\nu_j)_{j=1,\dots,d}. \end{aligned}$$

Note that since u(y) is V -valued, this means that we actually use the V -valued polynomial space

$$\displaystyle \begin{aligned} \mathbb{V}_k:=V\otimes \mathbb{P}_k=\left\{y\mapsto \sum_{|\nu|\leq k} w_\nu y^\nu\; : \; w_\nu\in V\right\}. \end{aligned}$$

Another classical example is the polynomial space of degree k in each variable, namely

$$\displaystyle \begin{aligned} \mathbb{Q}_k:=\mathrm{span}\left\{y\mapsto y^\nu\; : \; \|\nu\|{}_\infty=\max_{j=1,\dots,d} \nu_j\leq k\right\}. \end{aligned}$$

A critical issue encountered by choosing such spaces is the fact that, for a fixed value of k, the dimension of \(\mathbb {P}_k\) grows with d like \(d^k\), and that of \(\mathbb {Q}_k\) grows like \(k^d\), that is, exponentially in d. Since capturing the fine structure of the map (2) typically requires a large polynomial degree k in some coordinates, we expect in view of (8) that the number of needed evaluations m becomes prohibitive as the number of variables becomes large. This state of affairs is a manifestation of the so-called curse of dimensionality. From an approximation theoretic or information-based complexity point of view, the curse of dimensionality is expressed by the fact that functions in standard smoothness classes such as \(C^s(U)\) cannot be approximated in \(L^\infty (U)\) with a better rate than \(n^{-s/d}\) by any method using n degrees of freedom or n evaluations, see e.g. [17, 31].
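The following snippet, a simple illustration rather than anything from the cited references, compares \(\dim (\mathbb {P}_k)=\binom {d+k}{k}\) with \(\dim (\mathbb {Q}_k)=(k+1)^d\) for a fixed degree k and increasing d.

```python
# Growth of dim(P_k) (algebraic in d) versus dim(Q_k) (exponential in d).
from math import comb

k = 4
for d in (2, 10, 50, 100):
    print(d, comb(d + k, k), (k + 1) ** d)
# for d = 100: dim(P_4) = 4598126, while dim(Q_4) = 5**100 ~ 7.9e69
```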

Therefore, in high dimension, one is forced to give up on classical polynomial spaces of the above form, and instead consider spaces of the more general form

$$\displaystyle \begin{aligned} \mathbb{P}_\varLambda:=\mathrm{span}\{y\mapsto y^\nu\; : \; \nu \in \varLambda\}, {} \end{aligned} $$
(9)

where Λ is a subset of \(\mathbb {N}^d_0\) with a given cardinality n := #(Λ). In the case of infinitely many variables d = ∞, we replace \(\mathbb {N}_0^d\) by the set

$$\displaystyle \begin{aligned} \mathcal{F}:=\ell^0(\mathbb{N},\mathbb{N}_0):=\{\nu=(\nu_j)_{j\,{\geq}\,1} \; :\: \#(\mathrm{supp}(\nu))<\infty\}, \end{aligned}$$

of finitely supported sequences of nonnegative integers. For V -valued functions, we thus use the space

$$\displaystyle \begin{aligned} \mathbb{V}_\varLambda=V\otimes \mathbb{P}_\varLambda:=\left\{y\mapsto \sum_{\nu\in\varLambda} w_\nu y^\nu\; : \; w_\nu\in V\right\}. \end{aligned}$$

Note that \(\mathbb {V}_\varLambda =\mathbb {P}_\varLambda \) in the particular case where \(V=\mathbb {R}\).

The main objective when approximating the map (2) is to maintain a reasonable trade-off between accuracy, measured in a given error norm, and complexity, measured by the number of evaluations m or the number of degrees of freedom n, exploiting the different importance of each variable. Intuitively, large polynomial degrees should only be allocated to the most important variables. In this sense, if d is the dimension and k is the largest polynomial degree in any variable appearing in Λ, we view Λ as a very sparse subset of {0, …, k}d.

As generally defined by (9), the space \(\mathbb {P}_\varLambda \) may fail to satisfy certain natural properties of the usual polynomial spaces, such as closure under differentiation in any variable, or invariance under a change of basis when replacing the monomials yν by other tensorized basis functions of the form

$$\displaystyle \begin{aligned} \phi_\nu(y)=\prod_{j\,{\geq}\,1} \phi_{\nu_j}(y_j), \end{aligned}$$

where the univariate functions {ϕ0, …, ϕk} form a basis of \(\mathbb {P}_k\) for any k ≥ 0, for example the Legendre or Chebyshev polynomials. In order to fulfill these requirements, we ask that the set Λ has the following natural property.

Definition 1

A set \(\varLambda \subset \mathbb {N}_0^d\) or \(\varLambda \subset \mathcal {F}\) is downward closed if and only if

$$\displaystyle \begin{aligned} \nu \in \varLambda \ \mathrm{and}\ \widetilde \nu \leq \nu \implies \widetilde \nu \in \varLambda, \end{aligned}$$

where \( \widetilde \nu \leq \nu \) means that \(\widetilde \nu _j\leq \nu _j\) for all j.
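In computations, this property is straightforward to test. The following sketch assumes multi-indices are encoded as tuples of a common finite length, and uses the observation that Λ is downward closed if and only if ν − ej ∈ Λ for every ν ∈ Λ and every j with νj > 0.

```python
# Test of Definition 1: closedness under decreasing any nonzero entry by one.
def is_downward_closed(Lambda):
    S = set(Lambda)
    return all(
        nu[:j] + (nu[j] - 1,) + nu[j + 1:] in S
        for nu in S for j in range(len(nu)) if nu[j] > 0
    )

assert is_downward_closed({(0, 0), (1, 0), (0, 1), (1, 1)})
assert not is_downward_closed({(0, 0), (2, 0)})   # (1, 0) is missing
```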

Downward closed sets are also called lower sets. We sometimes use the terminology of downward closed polynomial spaces for the corresponding \(\mathbb {P}_\varLambda \). To our knowledge, such spaces were first considered in [23] in the bivariate case d = 2 and referred to as polynômes pleins. Their study in general dimension d was pursued in [25] and [16]. The objective of the present paper is to give a survey of recent advances on the use of downward closed polynomial spaces for high-dimensional approximation.

The outline is the following. We review in Sect. 2 several polynomial approximation results obtained in [1, 2] in which the use of well-chosen index sets allows one to break the curse of dimensionality for relevant classes of functions defined on U = [−1, 1]d, such as those occurring when solving the elliptic PDE (3) with parametric diffusion coefficient (4). Indeed, we obtain an algebraic convergence rate \(n^{-s}\), where s is independent of d in the sense that such a rate may even hold when d = ∞. Here, we consider the error between the map (2) and its approximant in either of the norms \(L^\infty (U,V)=L^\infty (U,V,\mu )\) or \(L^2(U,V)=L^2(U,V,\mu )\), where μ is the uniform probability measure,

$$\displaystyle \begin{aligned} d\mu:=\bigotimes_{j\,{\geq}\,1} \frac {dy_j} 2. \end{aligned}$$

We also consider the case of lognormal diffusion coefficients of the form

$$\displaystyle \begin{aligned} a=\exp(b), \quad b=b(y)=\sum_{j\,{\geq}\,1} y_j\psi_j, {} \end{aligned} $$
(10)

where the yj are i.i.d. standard Gaussian random variables. In this case, we have \(U=\mathbb {R}^d\) and the error is measured in \(L^2(U,V,\gamma )\), where

$$\displaystyle \begin{aligned} d\gamma:=\bigotimes_{j\,{\geq}\,1} g(y_j)dy_j, \quad g(t):=\frac{1}{\sqrt{2\pi}} e^{-t^2/2}, {} \end{aligned} $$
(11)

is the tensorized Gaussian probability measure. The above approximation results are established by using n-term truncations of polynomial expansions, such as Taylor, Legendre or Hermite, which do not necessarily result in downward closed index sets. In the present paper we provide a general approach to establish similar convergence rates with downward closed polynomial spaces.

The coefficients in the polynomial expansions cannot be computed exactly from a finite number of point evaluations of (2). One first numerical procedure that builds a polynomial approximation from point evaluations is interpolation. In this case the number m of samples is exactly equal to the dimension n of the polynomial space. We discuss in Sect. 3 a general strategy to choose evaluation points and compute the interpolant in arbitrarily high dimension. One of its useful features is that the evaluations and interpolants are updated in a sequential manner as the polynomial space is enriched, exploiting in a crucial way the downward closed structure. We study the stability of this process and its ability to achieve the same convergence rates in \(L^\infty \) as those established in Sect. 2.

A second numerical procedure for building a polynomial approximation is the least-squares method, which applies to the overdetermined case m > n. To keep the presentation concise, we confine our discussion to results obtained in the analysis of this method for the case of evaluations at random points. In Sect. 4 we discuss standard least squares, both in the noisy and noiseless cases, and in particular explain under which circumstances the method is stable and compares favorably with the best approximation error in \(L^2\). Afterwards we discuss the more general method of weighted least squares, which allows one to optimize the relation between the dimension n of the polynomial space and the number m of evaluation points that warrants stability and optimal accuracy.

The success of interpolation and least squares is critically tied to the choice of proper downward closed sets (Λn)n≥1 with #(Λn) = n. Ideally we would like to choose the set \(\varLambda _n^*\) that minimizes the best approximation error

$$\displaystyle \begin{aligned} e(u,\varLambda):=\min_{v\in \mathbb{V}_\varLambda} \|u-v\|, \end{aligned} $$
(12)

in some norm ∥⋅∥ of interest, among all possible downward closed sets Λ of cardinality n. In addition to being generally nonunique, such a set \(\varLambda _n^*\) is often not accessible. In practice we need to rely on some a priori analysis to select “suboptimal yet good” sets. An alternative strategy is to select the sequence (Λn)n≥1 in an adaptive manner, that is, make use of the computation of the approximation for Λn−1 in order to choose Λn.

We discuss in Sect. 5 several adaptive and nonadaptive strategies which make critical use of the downward closed structure of such sets. While our paper is presented in the framework of polynomial approximation, the concept of downward closed set may serve to define multivariate approximation procedures in other nonpolynomial frameworks. At the end of the paper we give some remarks on this possible extension, including, as a particular example, approximation by sparse piecewise polynomial spaces using hierarchical bases, such as sparse grid spaces.

Let us finally mention that another class of frequently used methods in high-dimensional approximation is based on Reproducing Kernel Hilbert Spaces (RKHS) or equivalently Gaussian process regression, also known as kriging. In such methods, for a given Mercer kernel K(⋅, ⋅), the approximant is typically obtained by minimizing the associated RKHS norm among all functions agreeing with the data at the evaluation points, or equivalently by computing the expectation of a Gaussian process with covariance function K conditioned on the observed data. Albeit natural competitors, these methods do not fall into the category discussed in the present paper, in the sense that the space where the approximation is picked varies with the evaluation points. It is not clear under which circumstances they may also break the curse of dimensionality.

2 Sparse Approximation in Downward Closed Polynomial Spaces

2.1 Truncated Polynomial Expansions

As outlined in the previous section, we are interested in deriving polynomial approximations of the map (2) acting from U = [−1, 1]d with \(d\in \mathbb {N}\) or d = ∞ to the Banach space V . Our first vehicle to derive such approximations, together with precise error bounds for relevant classes of maps, consists in truncating certain polynomial expansions of (2), written in general form as

$$\displaystyle \begin{aligned} \sum_{\nu\in\mathcal{F}} u_\nu \phi_\nu, {} \end{aligned} $$
(13)

where for each \(\nu =(\nu _j)_{j\,{\geq }\,1}\in \mathcal {F}\) the function \(\phi _\nu :U\to \mathbb {R}\) has the tensor product form

$$\displaystyle \begin{aligned} \phi_\nu(y)=\prod_{j\,{\geq}\,1} \phi_{\nu_j}(y_j), \end{aligned}$$

and uν ∈ V . Here we assume that (ϕk)k≥0 is a sequence of univariate polynomials such that ϕ0 ≡ 1 and the degree of ϕk is equal to k. This implies that {ϕ0, …, ϕk} is a basis of \(\mathbb {P}_k\) and that the above product only involves a finite number of factors, even in the case where d = ∞. Thus, we obtain polynomial approximations of (2) by fixing some sets \(\varLambda _n\subset \mathcal {F}\) with #(Λn) = n and defining

$$\displaystyle \begin{aligned} u_{\varLambda_n}:=\sum_{\nu\in\varLambda_n} u_\nu \phi_\nu. {} \end{aligned} $$
(14)

Before discussing specific examples, let us make some general remarks on the truncation of countable expansions with V -valued coefficients, not necessarily of tensor product or polynomial type.

Definition 2

The series (13) is said to converge conditionally with limit u in a given norm ∥⋅∥ if there exists an exhaustion (Λn)n≥1 of \(\mathcal {F}\) (which means that for any \(\nu \in \mathcal {F}\) there exists n0 such that ν ∈ Λn for all n ≥ n0), with the convergence property

$$\displaystyle \begin{aligned} \lim_{n\to \infty} \left\|u-u_{\varLambda_n}\right\| =0. {} \end{aligned} $$
(15)

The series (13) is said to converge unconditionally towards u in the same norm, if and only if (15) holds for every exhaustion (Λn)n≥1 of \(\mathcal {F}\).

As already mentioned in the introduction, we confine our attention to the error norms \(L^\infty (U,V)\) or \(L^2(U,V)\) with respect to the uniform probability measure μ. We are interested in establishing unconditional convergence, as well as estimates of the error between u and its truncated expansion, for both norms.

In the case of the \(L^2\) norm, unconditional convergence can be established when \((\phi _\nu )_{\nu \in \mathcal {F}}\) is an orthonormal basis of \(L^2(U)\). In this case we know from standard Hilbert space theory that if (2) belongs to \(L^2(U,V)\) then the inner products

$$\displaystyle \begin{aligned} u_\nu:=\int_{U} u(y)\phi_\nu(y) \, d\mu,\quad \nu\in\mathcal{F}, \end{aligned}$$

are elements of V , and the series (13) converges unconditionally towards u in \(L^2(U,V)\). In addition, the error is given by

$$\displaystyle \begin{aligned} \left\|u-u_{\varLambda_n}\right\|{}_{L^2(U,V)}= \left( \sum_{\nu\notin\varLambda_n} \|u_\nu\|{}_V^2\right)^{1/2}, {} \end{aligned} $$
(16)

for any exhaustion (Λn)n≥1. Let us observe that, since μ is a probability measure, the \(L^\infty (U,V)\) norm controls the \(L^2(U,V)\) norm, and thus the above holds whenever the map u is uniformly bounded over U.

For the \(L^\infty \) norm, consider an expansion (13) where the functions \(\phi _\nu :U\to \mathbb {R}\) are normalized such that \(\|\phi _\nu \|{ }_{L^\infty (U)}=1\) for all \(\nu \in \mathcal {F}\). Then, if \((\|u_\nu \|{ }_V)_{\nu \in \mathcal {F}}\in \ell ^1(\mathcal {F})\), it is easily checked that, whenever the expansion (13) converges conditionally to a function u in \(L^\infty (U,V)\), it also converges unconditionally to u in \(L^\infty (U,V)\). In addition, for any exhaustion (Λn)n≥1, we have the error estimate

$$\displaystyle \begin{aligned} \left\|u-u_{\varLambda_n}\right\|{}_{L^\infty(U,V)}\leq \sum_{\nu\notin\varLambda_n} \|u_\nu\|{}_V. {} \end{aligned} $$
(17)

The above estimate is simply obtained by the triangle inequality, and therefore it is generally not as sharp as (16). One particular situation is when \((\phi _\nu )_{\nu \in \mathcal {F}}\) is an orthogonal basis of \(L^2(U)\) normalized in \(L^\infty \). Then, if u ∈ \(L^2(U,V)\) and if the coefficients

$$\displaystyle \begin{aligned} u_\nu:=\frac {1}{\|\phi_\nu\|{}_{L^2(U)}^2}\int_{U} u(y)\phi_\nu(y) \, d\mu,\quad \nu\in\mathcal{F}, \end{aligned}$$

satisfy \((\|u_\nu \|{ }_V)_{\nu \in \mathcal {F}}\in \ell ^1(\mathcal {F})\), we find on the one hand that (13) converges unconditionally to a limit in \(L^\infty (U,V)\) and in turn in \(L^2(U,V)\). On the other hand, we know that it converges towards u in \(L^2(U,V)\). Therefore, its limit in \(L^\infty (U,V)\) is also u.

A crucial issue is the choice of the sets Λn that we decide to use when defining the n-term truncation (14). Ideally, we would like to use the set Λn which minimizes the truncation error in some given norm ∥⋅∥ among all sets of cardinality n.

In the case of the \(L^2\) error, if \((\phi _\nu )_{\nu \in \mathcal {F}}\) is an orthonormal basis of \(L^2(U)\), the estimate (16) shows that the optimal Λn is the set of indices corresponding to the n largest ∥uν∥V. This set is not necessarily unique, in which case any realization of Λn is optimal.

In the case of the \(L^\infty \) error, there is generally no simple description of the optimal Λn. However, when the ϕν are normalized in \(L^\infty (U)\), the right-hand side in the estimate (17) provides an upper bound for the truncation error. This bound is minimized by again taking for Λn the set of indices corresponding to the n largest ∥uν∥V, with the error now bounded by the \(\ell ^1\) tail of the sequence \((\|u_\nu \|{ }_V)_{\nu \in \mathcal {F}}\), in contrast to the \(\ell ^2\) tail which appears in (16).

The properties of a given sequence \((c_\nu )_{\nu \in \mathcal {F}}\) which ensure a certain rate of decay \(n^{-s}\) of its \(\ell ^q\) tail after one retains its n largest entries are well understood. Here, we use the following result, see [12], originally invoked by Stechkin in the particular case q = 2. This result says that the rate of decay is governed by the \(\ell ^p\) summability of the sequence for values of p smaller than q.

Lemma 1

Let 0 < p < q < ∞ and let \((c_\nu )_{\nu \in \mathcal {F}}\in \ell ^p(\mathcal {F})\) be a sequence of nonnegative numbers. Then, if Λn is a set of indices which corresponds to the n largest cν , one has

$$\displaystyle \begin{aligned} \left(\sum_{\nu\notin\varLambda_n} c_\nu^q\right)^{1/q} \leq C (n+1)^{-s}, \quad C:=\|(c_\nu)_{\nu\in\mathcal{F}}\|{}_{\ell^p}, \quad s:=\frac{1}{p} -\frac{1}{q}. \end{aligned}$$

In view of (16) or (17), application of the above result shows that \(\ell ^p\) summability of the sequence \((\|u_\nu \|{ }_V)_{\nu \in \mathcal {F}}\) implies a convergence rate \(n^{-s}\) when retaining the terms corresponding to the n largest ∥uν∥V in (13). From (16), when \((\phi _\nu )_{\nu \in \mathcal {F}}\) is an orthonormal basis, we obtain \(s=\frac {1}{p}-\frac {1}{2}\) if p < 2. From (17), when the ϕν are normalized in \(L^\infty (U)\), we obtain \(s=\frac {1}{p}-1\) if p < 1.
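As a numerical sanity check of Lemma 1, the following snippet verifies the tail bound for an arbitrary synthetic nonnegative sequence, with p = 1 and q = 2, so that s = 1∕2.

```python
# Check of Lemma 1: the ell^q tail beyond the n largest entries of a
# nonnegative ell^p sequence is bounded by C (n+1)^{-s}, s = 1/p - 1/q.
import numpy as np

p, q = 1.0, 2.0
c = np.sort(np.random.rand(10000) ** 3)[::-1]   # synthetic decreasing sequence
C = (c ** p).sum() ** (1 / p)                   # its ell^p norm
s = 1 / p - 1 / q
for n in (10, 100, 1000):
    tail = ((c[n:] ** q).sum()) ** (1 / q)      # ell^q tail beyond n largest
    assert tail <= C * (n + 1) ** (-s)
```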

In the present setting of polynomial approximation, we mainly consider four types of series corresponding to four different choices of the univariate functions ϕk:

  • Taylor (or power) series of the form

    $$\displaystyle \begin{aligned} \sum_{\nu\in\mathcal{F}} t_\nu y^\nu, \quad t_{\nu}:=\frac{1}{\nu !}\partial^\nu u(y=0), \quad \nu!:=\prod_{j\,{\geq}\,1} \nu_j!, {} \end{aligned} $$
    (18)

    with the convention that 0! = 1.

  • Legendre series of the form

    $$\displaystyle \begin{aligned} \sum_{\nu\in\mathcal{F}} w_\nu L_\nu(y), \quad L_\nu(y)=\prod_{j\,{\geq}\,1}L_{\nu_j}(y_j),\quad w_\nu:=\int_U u(y)L_\nu(y) \, d\mu, {} \end{aligned} $$
    (19)

where (Lk)k≥0 is the sequence of Legendre polynomials on [−1, 1] normalized with respect to the uniform measure, \(\int _{-1}^1 |L_k(t)|{ }^2 \frac {dt}2=1\), so that \((L_\nu )_{\nu \in \mathcal {F}}\) is an orthonormal basis of \(L^2(U,\mu )\).

  • Renormalized Legendre series of the form

    $$\displaystyle \begin{aligned} \sum_{\nu\in\mathcal{F}} \widetilde w_\nu \widetilde L_\nu(y), \quad \widetilde L_\nu(y)=\prod_{j\,{\geq}\,1}\widetilde L_{\nu_j}(y_j), \quad \widetilde w_\nu:=\left(\prod_{j\,{\geq}\,1} (1+2\nu_j)\right)^{1/2} w_\nu, {} \end{aligned} $$
    (20)

    where \((\widetilde L_k)_{k\geq 0}\) is the sequence of Legendre polynomials on [−1, 1] with the standard normalization \(\|\widetilde L_k\|{ }_{L^\infty ([-1,1])}=\widetilde L_k(1)=1\), so that \(\widetilde L_k=(1+2k)^{-1/2} L_k\).

  • Hermite series of the form

    $$\displaystyle \begin{aligned} \sum_{\nu\in\mathcal{F}} h_\nu H_\nu(y), \quad H_\nu(y)=\prod_{j\,{\geq}\,1}H_{\nu_j}(y_j),\quad h_\nu:=\int_U u(y)H_\nu(y) \, d\gamma, {} \end{aligned} $$
    (21)

with (Hk)k≥0 being the sequence of Hermite polynomials normalized according to \(\int _{\mathbb {R}} | H_k(t)|{ }^2 g(t)dt=1\), with g given by (11). In this case \(U=\mathbb {R}^d\) and \((H_\nu )_{\nu \in \mathcal {F}}\) is an orthonormal basis of \(L^2(U,\gamma )\).

We may therefore estimate the \(L^2\) error resulting from the truncation of the Legendre series (19) by application of (16), or the \(L^\infty \) error resulting from the truncation of the Taylor series (18) or renormalized Legendre series (20) by application of (17). According to Lemma 1, we derive convergence rates that depend on the value of p such that the coefficient sequences \((\|t_\nu \|{ }_V)_{\nu \in \mathcal {F}}\), \((\|w_\nu \|{ }_V)_{\nu \in \mathcal {F}}\), \((\|\widetilde w_\nu \|{ }_V)_{\nu \in \mathcal {F}}\) or \((\|h_\nu \|{ }_V)_{\nu \in \mathcal {F}}\) belong to \(\ell ^p(\mathcal {F})\).
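For concreteness, the following sketch evaluates the tensorized Legendre and Hermite basis functions of (19) and (21) at a given point y, with the normalizations stated above; the helper names are ours, and only the finitely many indices in the support of ν contribute to the product.

```python
# Evaluation of L_nu(y) and H_nu(y) for a finitely supported multi-index nu.
import math
import numpy as np
from numpy.polynomial import legendre, hermite_e

def L(k, t):   # Legendre, orthonormal for the uniform measure dt/2 on [-1,1]
    return math.sqrt(2 * k + 1) * legendre.legval(t, [0.0] * k + [1.0])

def H(k, t):   # probabilists' Hermite He_k/sqrt(k!), orthonormal for g(t)dt
    return hermite_e.hermeval(t, [0.0] * k + [1.0]) / math.sqrt(math.factorial(k))

def phi_nu(nu, y, phi):   # tensor product over the support of nu
    return np.prod([phi(k, t) for k, t in zip(nu, y) if k > 0] or [1.0])

print(phi_nu((2, 0, 1), (0.3, -0.7, 0.5), L))   # Legendre case
print(phi_nu((2, 0, 1), (0.3, -0.7, 0.5), H))   # Hermite case
```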

In a series of recent papers, such summability results have been obtained for various types of parametric PDEs. We refer in particular to [1, 13] for the elliptic PDE (3) with affine parameter dependence (4), to [2, 20] for the lognormal dependence (10), and to [9] for more general PDEs and parameter dependence. One specific feature is that these conditions can be fulfilled in the infinite-dimensional framework. We thus obtain convergence rates that are immune to the curse of dimensionality, in the sense that they hold with d = ∞. Here, we mainly discuss the results established in [1, 2], which have the specificity of taking into account the support properties of the functions ψj.

One problem with this approach is that the sets Λn associated to the n largest values in these sequences are generally not downward closed. In the next sections, we revisit these results in order to establish similar convergence rates for approximation in downward closed polynomial spaces.

2.2 Summability Results

The summability results in [1, 2] are based on certain weighted \(\ell ^2\) estimates which can be established for the previously defined coefficient sequences under various relevant conditions for the elliptic PDE (3). We first report below these weighted estimates. The first one, from [1], concerns the affine parametrization (4). Here, we have \(V=H^1_0(D)\) and V′ denotes its dual \(H^{-1}(D)\).

Theorem 1

Assume that ρ = (ρj)j ≥ 1 is a sequence of positive numbers such that

$$\displaystyle \begin{aligned} \sum_{j\,{\geq}\,1} \rho_j|\psi_j(x)|\leq \overline a(x)-\widetilde r, \quad x\in D, {} \end{aligned} $$
(22)

for some fixed number \(\widetilde r >0\) . Then, one has

$$\displaystyle \begin{aligned} \sum_{\nu\in \mathcal{F}}\left( \rho^\nu \|t_\nu\|{}_V\right)^2<\infty, \quad \rho^\nu=\prod_{j\,{\geq}\,1} \rho_j^{\nu_j}, {} \end{aligned} $$
(23)

as well as

$$\displaystyle \begin{aligned} \sum_{\nu\in \mathcal{F}}\left(\beta(\nu)^{-1} \rho^\nu \|w_\nu\|{}_V\right)^2=\sum_{\nu\in \mathcal{F}}\left(\beta(\nu)^{-2} \rho^\nu \|\widetilde w_\nu\|{}_V\right)^2<\infty, {} \end{aligned} $$
(24)

with

$$\displaystyle \begin{aligned} \beta(\nu):=\prod_{j\,{\geq}\,1} (1+2\nu_j)^{1/2}. \end{aligned}$$

The constants bounding these sums depend on \(\widetilde r\) , \(\|\,f\|{ }_{V'}\) , \(\overline a_{\min }\) and \(\|\overline a\|{ }_{L^\infty }\).

A few words are in order concerning the proof of these estimates. The first estimate (23) is established by first proving that the uniform ellipticity assumption (5) implies the \(\ell ^2\) summability of the Taylor sequence \((\|t_\nu \|{ }_V)_{\nu \in \mathcal {F}}\). Since the assumption (22) means that (5) holds with the ψj replaced by ρj ψj, this gives the \(\ell ^2\) summability of the Taylor sequence for the renormalized map

$$\displaystyle \begin{aligned}y\mapsto u(\rho y), \quad \rho y=(\rho_jy_j)_{j\,{\geq}\,1}, \end{aligned}$$

which is equivalent to (23). The second estimate is established by first showing that \(\sum _{j\,{\geq }\,1} \rho _j|\psi _j|\leq \overline a-\widetilde r\) implies finiteness of the weighted Sobolev-type norm

$$\displaystyle \begin{aligned} \sum_{\nu\in\mathcal{F}}\frac{ \rho^{2\nu} }{\nu!}\int_U \left\| \partial^\nu u(y) \right\|{}_V^2 \prod_{j\,{\geq}\,1} (1 - |\,y_j|)^{2\nu_j} \, d\mu<\infty. \end{aligned}$$

Then, one uses the Rodrigues formula \(L_k(t) = \left (\frac {d}{dt}\right )^k\left ( \frac {\sqrt {2 k + 1}}{k! \,2^{k}} ( t^2 - 1)^{k} \right )\) in each variable yj to bound the weighted \(\ell ^2\) sum in (24) by this norm.

Remark 1

As shown in [1], the above result remains valid for more general classes of orthogonal polynomials of Jacobi type, such as the Chebyshev polynomials which are associated with the univariate measure \(\frac {dt}{2\pi \sqrt {1-t^2}}\).

The second weighted \(\ell ^2\) estimate, from [2], concerns the lognormal parametrization (10).

Theorem 2

Let r ≥ 0 be an integer. Assume that there exists a positive sequence ρ = (ρj)j ≥ 1 such that \(\sum _{j\,{\geq }\,1} \exp (-\rho _j^2)<\infty \) and such that

$$\displaystyle \begin{aligned} \sum_{j\,{\geq}\,1} \rho_j|\psi_j(x)|\leq K< C_r:=\frac{\ln 2}{\sqrt r}, \quad x\in D. {} \end{aligned} $$
(25)

Then, one has

$$\displaystyle \begin{aligned} \sum_{\nu\in \mathcal{F}} \xi_\nu \|h_\nu\|{}_V^2<\infty, \end{aligned} $$
(26)

where

$$\displaystyle \begin{aligned} \xi_\nu:=\sum_{\|\widetilde \nu\|{}_{\ell^\infty}\leq r}{\nu\choose \widetilde \nu} \rho^{2\widetilde \nu}=\prod_{j\,{\geq}\,1}\left(\sum_{l=0}^r {\nu_j\choose l}\rho_j^{2l}\right), \quad {\nu\choose \widetilde \nu}:=\prod_{j\,{\geq}\,1}{\nu_j\choose \widetilde \nu_j}, \end{aligned}$$

with the convention that \({k \choose l}=0\) when l > k. The constant bounding this sum depends on \(\|\,f\|{ }_{V'}\) , \(\sum _{j\,{\geq }\,1} \exp (-\rho _j^2)\) and on the difference Cr − K.

Similar to the weighted \(\ell ^2\) estimate (24) for the Legendre coefficients, the proof of (26) follows by first establishing finiteness of the weighted Sobolev-type norm

$$\displaystyle \begin{aligned} \sum_{\|\nu\|{}_{\ell^\infty} \leq r}\frac {\rho^{2\nu}}{\nu !} \int_U \|\partial^\nu u(y)\|{}_V^2 \, d\gamma<\infty, \end{aligned}$$

under the assumption (25) in the above theorem. Then one uses the Rodrigues formula \(H_k(t) = \frac {(-1)^k}{\sqrt {k!} } \frac {g^{(k)}(t)}{g(t)}\), with g given by (11), in each variable yj to bound the weighted \(\ell ^2\) sum in (26) by this norm.

In summary, the various estimates expressed in the above theorems all take the form

$$\displaystyle \begin{aligned} \sum_{\nu\in\mathcal{F}} (\omega_\nu c_\nu)^2 <\infty, \end{aligned}$$

where

$$\displaystyle \begin{aligned} c_\nu\in \{ \|t_\nu\|{}_V, \|w_\nu\|{}_V,\|\widetilde w_\nu\|{}_V,\|h_\nu\|{}_V\}, \end{aligned}$$

and, correspondingly,

$$\displaystyle \begin{aligned} \omega_\nu\in \{\rho^\nu,\rho^\nu \beta(\nu)^{-1},\rho^\nu \beta(\nu)^{-2}, \xi_\nu^{1/2}\}. \end{aligned}$$

Then, one natural strategy for establishing \(\ell ^p\) summability of the sequence \((c_\nu )_{\nu \in \mathcal {F}}\) is to invoke Hölder’s inequality, which gives, for all 0 < p < 2,

$$\displaystyle \begin{aligned} \left(\sum_{\nu\in\mathcal{F}} |c_\nu|{}^p\right)^{1/p}\leq \left(\sum_{\nu\in\mathcal{F}} (\omega_\nu c_\nu)^2\right)^{1/2} \left(\sum_{\nu\in\mathcal{F}} |\kappa_\nu|{}^q\right)^{1/q} <\infty, \quad \frac{1}{q}:=\frac{1}{p}-\frac{1}{2}, \end{aligned}$$

where the sequence \((\kappa _\nu )_{\nu \in \mathcal {F}}\) is defined by

$$\displaystyle \begin{aligned} \kappa_\nu:=\omega_\nu^{-1}.{} \end{aligned} $$
(27)

Therefore \(\ell ^p\) summability of \((c_\nu )_{\nu \in \mathcal {F}}\) follows from \(\ell ^q\) summability of \((\kappa _\nu )_{\nu \in \mathcal {F}}\) with 0 < q < ∞ such that \(\frac {1}{q}=\frac {1}{p}-\frac {1}{2}\). This \(\ell ^q\) summability can be related to that of the univariate sequence

$$\displaystyle \begin{aligned} b=(b_j)_{j\,{\geq}\,1},\quad b_j:=\rho_j^{-1}. \end{aligned}$$

Indeed, from the factorization

$$\displaystyle \begin{aligned} \sum_{\nu\in\mathcal{F}} b^{q\nu}=\prod_{j\,{\geq}\,1} \sum_{n\geq 0} b_j^{nq}, \end{aligned}$$

one readily obtains the following elementary result, see [12] for more details.

Lemma 2

For any 0 < q < ∞, one has

$$\displaystyle \begin{aligned} b\in \ell^q(\mathbb{N})\quad \mathrm{and}\quad \|b\|{}_{\ell^\infty}<1 \iff (b^\nu)_{\nu\in\mathcal{F}} \in \ell^q(\mathcal{F}). \end{aligned}$$
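For a finite number of variables, the factorization behind this lemma is easily checked numerically; the following snippet, with arbitrary b and q, compares a truncated sum over multi-indices with the product of the univariate geometric series.

```python
# Check of sum_nu b^{q nu} = prod_j sum_n b_j^{nq} = prod_j 1/(1 - b_j^q).
from itertools import product
from math import prod

b, q = [0.5, 0.3, 0.2], 1.0
N = 60   # truncation of each univariate index, ample for these b_j
lhs = sum(prod(bj ** (q * nj) for bj, nj in zip(b, nu))
          for nu in product(range(N), repeat=len(b)))
rhs = prod(1.0 / (1.0 - bj ** q) for bj in b)
print(lhs, rhs)   # the two values agree up to the truncation error
```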

In the case ων = ρν, i.e. κν = bν, this shows that the \(\ell ^p\) summability of the Taylor coefficients \((\|t_\nu \|{ }_V)_{\nu \in \mathcal {F}}\) follows if the assumption (22) holds with \(b=(\rho _j^{-1})_{j\,{\geq }\,1}\in \ell ^q(\mathbb {N})\) and ρj > 1 for all j. By a similar factorization, it is also easily checked that for any algebraic factor of the form \(\alpha (\nu ):=\prod _{j\,{\geq }\,1}(1+c_1\nu _j)^{c_2}\) with c1, c2 ≥ 0, one has

$$\displaystyle \begin{aligned} b\in \ell^q(\mathbb{N})\quad \mathrm{and}\quad \|b\|{}_{\ell^\infty}<1 \iff (\alpha(\nu)b^\nu)_{\nu\in\mathcal{F}} \in \ell^q(\mathcal{F}). \end{aligned}$$

This allows us to reach a similar conclusion in the cases ων = β(ν)−1 ρν or ων = β(ν)−2 ρν, which correspond to the Legendre coefficients \((\|w_\nu \|{ }_V)_{\nu \in \mathcal {F}}\) and \((\|\widetilde w_\nu \|{ }_V)_{\nu \in \mathcal {F}}\), in view of (24).

Likewise, in the case where \(\omega _\nu =\xi _\nu ^{1/2}\), using the factorization

$$\displaystyle \begin{aligned} \sum_{\nu\in \mathcal{F}} \kappa_\nu^q = \prod_{j\,{\geq}\,1} \sum_{n \geq 0} \left( \sum_{l=0}^r {n \choose l}\rho_j^{2l} \right)^{-q/2}, \end{aligned}$$

it is shown in [2] that the sum on the left converges if \(b\in \ell ^q(\mathbb {N})\), provided that r is chosen large enough such that \(q>\frac {2}{r}\). This shows that the \(\ell ^p\) summability of the Hermite coefficients \((\|h_\nu \|{ }_V)_{\nu \in \mathcal {F}}\) follows if the assumption (25) holds with \(b=(\rho _j^{-1})_{j\,{\geq }\,1}\in \ell ^q(\mathbb {N})\). Note that, since the sequence b can be renormalized, we may replace (25) by the condition

$$\displaystyle \begin{aligned} \sup_{x\in D}\sum_{j\,{\geq}\,1} \rho_j|\psi_j(x)|<\infty, {} \end{aligned} $$
(28)

without a specific bound.

2.3 Approximation by Downward Closed Polynomials

The above results, combined with Lemma 1, allow us to build polynomial approximations \(u_{\varLambda _n}\) with provable convergence rates \(n^{-s}\) in \(L^\infty \) or \(L^2\) by n-term truncation of the various polynomial expansions. However, we would like to obtain such convergence rates with sets Λn that are in addition downward closed.

Notice that if a sequence \((\kappa _\nu )_{\nu \in \mathcal {F}}\) of nonnegative numbers is monotone nonincreasing, that is

$$\displaystyle \begin{aligned} \nu\leq \widetilde \nu \implies \kappa_{\,\widetilde \nu\,} \leq \kappa_{\nu}, \end{aligned}$$

then the set Λn corresponding to the n largest values of κν (up to a specific selection in case of equal values) is downward closed. More generally, there exists a sequence (Λn)n≥1 of downward closed realizations of such sets which is nested, i.e. Λ1 ⊂ Λ2 ⊂⋯, with \(\varLambda _1=\{0_{\mathcal {F}}\}\), where \(0_{\mathcal {F}}:=(0,0,\ldots )\) is the null multi-index.
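Such nested realizations can be constructed greedily: at each step, one adds, among the indices whose parents ν − ej all belong to the current set, one with the largest value of κν. The following sketch, for a finite number d of variables and hypothetical data, illustrates this construction; for a monotone nonincreasing κ, the resulting sets realize the n largest values.

```python
# Greedy construction of nested downward closed sets Lambda_1 c Lambda_2 c ...
def nested_lower_sets(kappa, d, n):
    Lam = [(0,) * d]                         # Lambda_1 = {null index}
    while len(Lam) < n:
        S = set(Lam)
        margin = {nu[:j] + (nu[j] + 1,) + nu[j + 1:]
                  for nu in Lam for j in range(d)} - S
        admissible = [mu for mu in margin
                      if all(mu[:j] + (mu[j] - 1,) + mu[j + 1:] in S
                             for j in range(d) if mu[j] > 0)]
        Lam.append(max(admissible, key=kappa))   # largest kappa on the margin
    return Lam

b = (0.5, 0.25)                              # kappa_nu = b^nu, monotone
print(nested_lower_sets(lambda nu: b[0] ** nu[0] * b[1] ** nu[1], d=2, n=6))
```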

Since the general sequences \((\kappa _\nu )_{\nu \in \mathcal {F}}\) that are defined through (27) may not always be monotone nonincreasing, we introduce the following notion: for any sequence \((\kappa _\nu )_{\nu \in \mathcal {F}}\) tending to 0, in the sense that #{ν : |κν| > δ} < ∞ for all δ > 0, we introduce its monotone majorant \((\widehat \kappa _{\nu })_{\nu \in \mathcal {F}}\) defined by

$$\displaystyle \begin{aligned} \widehat \kappa_\nu:=\max_{\widetilde \nu\geq \nu} |\kappa_{\,\widetilde \nu\,}|, \end{aligned}$$

that is, the smallest monotone nonincreasing sequence that dominates \((\kappa _\nu )_{\nu \in \mathcal {F}}\). In order to study best n-term approximations using downward closed sets, we adapt the \(\ell ^q\) spaces as follows.

Definition 3

For 0 < q < ∞, we say that \((\kappa _\nu )_{\nu \in \mathcal {F}}\in \ell ^\infty (\mathcal {F})\) belongs to \(\ell ^q_m(\mathcal {F})\) if and only if its monotone majorant \((\widehat \kappa _\nu )_{\nu \in \mathcal {F}}\) belongs to \(\ell ^q(\mathcal {F})\).
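On a finite index box, the monotone majorant can be computed by a backward sweep, since \(\widehat \kappa _\nu =\max \{|\kappa _\nu |,\max _j \widehat \kappa _{\nu +e_j}\}\). The following sketch assumes that κ is negligible outside the box {0, …, K}d; the example sequence is hypothetical.

```python
# Monotone majorant of Definition 3 on the box {0,...,K}^d, by a reverse sweep
# in which every successor nu + e_j has already been processed.
from itertools import product

def monotone_majorant(kappa, d, K):
    hat = {}
    for nu in sorted(product(range(K + 1), repeat=d), key=sum, reverse=True):
        succ = [hat[nu[:j] + (nu[j] + 1,) + nu[j + 1:]]
                for j in range(d) if nu[j] < K]
        hat[nu] = max([abs(kappa(nu))] + succ)
    return hat

# a non-monotone example in d = 2: the entry at nu_2 = 1 is inflated
kappa = lambda nu: 0.5 ** nu[0] * (5.0 if nu[1] == 1 else 1.0) * 0.3 ** nu[1]
hat = monotone_majorant(kappa, d=2, K=5)
print(kappa((0, 0)), hat[(0, 0)])   # 1.0 versus 1.5 = kappa_(0,1)
```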

We are now in position to state a general theorem that gives a condition for approximation using downward closed sets in terms of weighted \(\ell ^2\) summability.

Theorem 3

Let \((c_\nu )_{\nu \in \mathcal {F}}\) and \((\omega _\nu )_{\nu \in \mathcal {F}}\) be positive sequences such that

$$\displaystyle \begin{aligned} \sum_{\nu\in\mathcal{F}}(\omega_\nu c_\nu)^2<\infty, \end{aligned}$$

and such that \((\kappa _\nu )_{\nu \in \mathcal {F}}\in \ell ^q_m(\mathcal {F})\) for some 0 < q < ∞ with \(\kappa _\nu =\omega _\nu ^{-1}\) . Then, for any 0 < r ≤ 2 such that \(\frac {1}{q}>\frac {1}{r}-\frac {1}{2}\) , there exists a nested sequence (Λn)n≥1 of downward closed sets such that #(Λn) = n and

$$\displaystyle \begin{aligned} \left(\sum_{\nu \notin \varLambda_n} c_\nu^r\right)^{1/r} \leq Cn^{-s}, \quad s:=\frac{1}{q}+\frac{1}{2}-\frac{1}{r}>0. {} \end{aligned} $$
(29)

Proof

With \((\widehat \kappa _\nu )_{\nu \in \mathcal {F}}\) being the monotone majorant of \((\kappa _\nu )_{\nu \in \mathcal {F}}\), we observe that

$$\displaystyle \begin{aligned} A^2:=\sum_{\nu\in\mathcal{F}}(\widehat \kappa_\nu^{-1} c_\nu)^2\leq \sum_{\nu\in\mathcal{F}}(\kappa_\nu^{-1} c_\nu)^2=\sum_{\nu\in\mathcal{F}}(\omega_\nu c_\nu)^2 <\infty. \end{aligned}$$

We pick a nested sequence (Λn)n≥1 of downward closed sets, such that Λn consists of the indices corresponding to the n largest \(\widehat \kappa _\nu \). Denoting by \((\widehat \kappa _n)_{n\geq 1}\) the decreasing rearrangement of \((\widehat \kappa _\nu )_{\nu \in \mathcal {F}}\), we observe that

$$\displaystyle \begin{aligned} n \widehat \kappa_n^q \leq \sum_{j= 1}^n \widehat \kappa_j^q \leq B^q, \quad B:=\|(\widehat \kappa_\nu)_{\nu\in\mathcal{F}}\|{}_{\ell^q}<\infty. \end{aligned}$$

With p such that \(\frac {1}{p}=\frac {1}{r}-\frac {1}{2}\), we find that

$$\displaystyle \begin{aligned} \left(\sum_{\nu \notin \varLambda_n} c_\nu^r\right)^{1/r} & \leq \left( \sum_{\nu \notin \varLambda_n} (\widehat \kappa_\nu^{-1} c_\nu)^2\right)^{1/2}\left(\sum_{\nu \notin \varLambda_n} \widehat \kappa_\nu^p\right)^{1/p}\\ &\leq A \left(\widehat \kappa_{n+1}^{p-q}\sum_{\nu \notin \varLambda_n} \widehat \kappa_\nu^q\right)^{1/p}\\ &\leq A B (n+1)^{1/p-1/q}, \end{aligned} $$

where we have used Hölder’s inequality and the properties of \((\widehat \kappa _n)_{n\geq 1}\). This gives (29) with C := AB. □

We would now like to apply the above result with \(c_\nu \in \{ \|t_\nu \|{ }_V, \|w_\nu \|{ }_V,\|\widetilde w_\nu \|{ }_V, \|h_\nu \|{ }_V\}\), and the corresponding weight sequences \(\omega _\nu \in \{\rho ^\nu ,\rho ^\nu \beta (\nu )^{-1},\rho ^\nu \beta (\nu )^{-2}, \xi _\nu ^{1/2}\}\), or equivalently \(\kappa _\nu \in \{b^\nu ,b^\nu \beta (\nu ),b^\nu \beta (\nu )^2, \xi _\nu ^{-1/2}\}\). In the case of the Taylor series, where κν = bν, we readily see that if bj < 1 for all j ≥ 1, then the sequence \((\kappa _\nu )_{\nu \in \mathcal {F}}\) is monotone nonincreasing, and therefore Lemma 2 shows that \(b\in \ell ^q(\mathbb {N})\) implies \((\kappa _\nu )_{\nu \in \mathcal {F}}\in \ell ^q_m(\mathcal {F})\). By application of Theorem 3 with the value r = 1, this leads to the following result.

Theorem 4

If (22) holds with \((\rho _j^{-1})_{j\,{\geq }\,1}\in \ell ^q(\mathbb {N})\) for some 0 < q < 2 and ρj > 1 for all j, then

$$\displaystyle \begin{aligned} \|u-u_{\varLambda_n}\|{}_{L^\infty(U,V)}\leq Cn^{-s}, \quad s:=\frac{1}{q}-\frac{1}{2}, \end{aligned}$$

where \(u_{\varLambda _n}\) is the truncated Taylor series and Λn is any downward closed set corresponding to the n largest κν.

In the case of the Legendre series, the weight κν = bν β(ν) is not monotone nonincreasing due to the presence of the algebraic factor β(ν). However, the following result holds.

Lemma 3

For any 0 < q < ∞ and for any algebraic factor of the form \(\alpha (\nu ):=\prod _{j\,{\geq }\,1}(1+c_1\nu _j)^{c_2}\) with c1, c2 ≥ 0, one has

$$\displaystyle \begin{aligned} b\in \ell^q(\mathbb{N})\quad \mathrm{and}\quad \|b\|{}_{\ell^\infty}<1 \iff (\alpha(\nu) b^\nu)_{\nu\in\mathcal{F}} \in \ell^q_m(\mathcal{F}). \end{aligned}$$

Proof

The implication from right to left is a consequence of Lemma 2, and so we concentrate on the implication from left to right. For this it suffices to find a majorant \(\widetilde \kappa _\nu \) of κν := α(ν)bν which is monotone nonincreasing and such that \((\widetilde \kappa _\nu )_{\nu \in \mathcal {F}} \in \ell ^q(\mathcal {F})\). We notice that for any τ > 1, there exists C = C(τ, c1, c2) ≥ 1 such that

$$\displaystyle \begin{aligned} (1+c_1n)^{c_2}\leq C\tau^n,\quad n\geq 0. \end{aligned}$$

For some J ≥ 1 and τ to be fixed further, we may thus write

$$\displaystyle \begin{aligned} \kappa_\nu\leq \widetilde \kappa_\nu:= C^J \prod_{j=1}^J(\tau b_j)^{\nu_j} \prod_{j>J} (1+c_1\nu_j)^{c_2}b_j^{\nu_j}. \end{aligned}$$

Since \(\|b\|{ }_{\ell ^\infty }<1\) we can take τ > 1 such that \(\theta :=\tau \|b\|{ }_{\ell ^\infty }<1\). By factorization, we find that

$$\displaystyle \begin{aligned} \sum_{\nu\in \mathcal{F}} \widetilde \kappa_\nu^q=C^{Jq} \left(\prod_{j=1}^J \left(\sum_{n\geq 0} \theta^{qn}\right)\right) \left( \prod_{j>J} \left(\sum_{n\geq 0} (1+c_1n)^{qc_2} b_j^{nq}\right)\right). \end{aligned}$$

The first product is equal to \((1-\theta ^q)^{-J}\). Each factor in the second product is a converging series which is bounded by \(1+cb_j^q\) for some c > 0 that depends on c1, c2 and \(\|b\|{ }_{\ell ^\infty }\). It follows that this second product converges. Therefore \((\widetilde \kappa _\nu )_{\nu \in \mathcal {F}}\) belongs to \(\ell ^q(\mathcal {F})\).

Finally, we show that \(\widetilde \kappa _\nu \) is monotone nonincreasing provided that J is chosen large enough. It suffices to show that \(\widetilde \kappa _{\nu +e_j}\leq \widetilde \kappa _\nu \) for all \(\nu \in \mathcal {F}\) and for all j ≥ 1 where

$$\displaystyle \begin{aligned} e_j:=(0,\dots,0,1,0,\dots), \end{aligned}$$

is the Kronecker sequence of index j. When j ≤ J this is obvious since \(\widetilde \kappa _{\nu +e_j}=\tau b_j\widetilde \kappa _\nu \leq \theta \widetilde \kappa _\nu \leq \widetilde \kappa _\nu \). When j > J, we have

$$\displaystyle \begin{aligned} \widetilde \kappa_{\nu+e_j}\widetilde \kappa_{\nu}^{-1}=b_j\left(\frac {1+c_1(\nu_j+1)}{1+c_1\nu_j} \right)^{c_2}. \end{aligned}$$

Noticing that the sequence \(a_n:=\left (\frac {1+c_1(n+1)}{1+c_1n} \right )^{c_2}\) converges toward 1 and is therefore bounded, and that bj tends to 0 as j → ∞, we find that for J sufficiently large, the right-hand side in the above equation is bounded by 1 for all ν and j > J. □

From Lemma 3, by applying Theorem 3 with r = 1 or r = 2, we obtain the following result.

Theorem 5

If (22) holds with \((\rho _j^{-1})_{j\,{\geq }\,1}\in \ell ^q(\mathbb {N})\) for some 0 < q < ∞ and ρj > 1 for all j, then

$$\displaystyle \begin{aligned} \|u-u_{\varLambda_n}\|{}_{L^2(U,V)}\leq Cn^{-s}, \quad s:=\frac{1}{q}, \end{aligned}$$

where \(u_{\varLambda _n}\) is the truncated Legendre series and Λn is any downward closed set corresponding to the n largest \(\widehat \kappa _\nu \) where κν := bν β(ν). If q < 2, we also have

$$\displaystyle \begin{aligned} \|u-u_{\varLambda_n}\|{}_{L^\infty(U,V)}\leq Cn^{-s}, \quad s:=\frac{1}{q}-\frac{1}{2}, \end{aligned}$$

with Λn any downward closed set corresponding to the n largest \(\widehat \kappa _\nu \) where κν := bν β(ν)2 , with \(b:=(\rho _j^{-1})_{j\,{\geq }\,1}\).

Finally, in the case of the Hermite coefficients, which corresponds to the weight

$$\displaystyle \begin{aligned} \kappa_\nu:= \prod_{j\,{\geq}\,1}\left( \sum_{l=0}^r {\nu_j \choose l}b_j^{-2l} \right)^{-1/2}, {} \end{aligned} $$
(30)

we can establish a similar summability result.

Lemma 4

For any 0 < q < ∞ and any integer r ≥ 1 such that \(q> \frac {2}{r}\) , we have

$$\displaystyle \begin{aligned} b\in \ell^q(\mathbb{N})\implies (\kappa_\nu)_{\nu\in\mathcal{F}} \in \ell^q (\mathcal{F}), \end{aligned}$$

where κν is given by (30). In addition, for any integer r ≥ 0, the sequence \((\kappa _\nu )_{\nu \in \mathcal {F}}\) is monotone nonincreasing.

Proof

For any \(\nu \in \mathcal {F}\) and any k ≥ 1 we have

$$\displaystyle \begin{aligned} \kappa_{\nu+e_k}\,\kappa_\nu^{-1}=\left(\frac{\sum_{l=0}^r {\nu_k\choose l}b_k^{-2l}}{\sum_{l=0}^r {\nu_k+1\choose l}b_k^{-2l}}\right)^{1/2}\leq 1, \end{aligned}$$

since \({\nu _k+1 \choose l}\geq {\nu _k\choose l}\) for all l, and therefore the sequence \((\kappa _\nu )_{\nu \in \mathcal {F}}\) is monotone nonincreasing.

Now we check that \((\kappa _\nu )_{\nu \in \mathcal {F}} \in \ell ^q(\mathcal {F})\), using the factorization

$$\displaystyle \begin{aligned} \sum_{\nu\in\mathcal{F}} \kappa_\nu^q=\prod_{j\,{\geq}\,1} \sum_{n \geq 0}\left( \sum_{l=0}^r {n \choose l}b_j^{-2l} \right)^{-q/2} \leq \prod_{j\,{\geq}\,1}\sum_{n\geq 0}{n \choose r\wedge n}^{-q/2}b_j^{q(r\wedge n)}, \end{aligned} $$
(31)

where the inequality follows from the fact that the value \(l=n\wedge r:=\min \{n,r\}\) is contained in the sum.

The j-th factor Fj in the rightmost product in (31) may be written as

$$\displaystyle \begin{aligned}F_j= 1+b_j^q+\cdots+b_j^{(r-1)q}+C_{r,q}b_j^{rq}, \end{aligned}$$

where

$$\displaystyle \begin{aligned} C_{r,q} := \sum_{n \geq r} {n\choose r}^{-q/2} = (r!)^{q/2} \sum_{n\geq 0} \bigl[(n+1)\cdots (n+r)\bigr]^{-q/2}<\infty, \end{aligned} $$
(32)

since we have assumed that q > 2∕r. This shows that each Fj is finite. If \(b\in \ell ^q({\mathbb {N}})\), there exists an integer J ≥ 0 such that bj < 1 for all j > J. For such j, we can bound Fj by \(1 + (C_{r,q} + r - 1) b_j^{q}\), which shows that the product converges. □

From this lemma, and by application of Theorem 3 with the value r = 2 in (29) (not to be confused with the integer r in (30)), we obtain the following result for the Hermite series.

Theorem 6

If (28) holds with \((\rho _j^{-1})_{j\,{\geq }\,1}\in \ell ^q(\mathbb {N})\) for some 0 < q < ∞, then

$$\displaystyle \begin{aligned} \|u-u_{\varLambda_n}\|{}_{L^2(U,V)}\leq Cn^{-s}, \quad s:=\frac{1}{q}, \end{aligned}$$

where \(u_{\varLambda _n}\) is the truncated Hermite series and Λn is a downward closed set corresponding to the n largest κν given by (30).

In summary, we have established convergence rates for approximation by downward closed polynomial spaces of the solution map (2) associated to the elliptic PDE (3) with affine or lognormal parametrization. The conditions are stated in terms of the control on the \(L^\infty \) norm of ∑j ≥ 1 ρj|ψj|, where the ρj have a certain growth measured by the \(\ell ^q\) summability of the sequence \(b=(b_j)_{j\,{\geq }\,1}=(\rho _j^{-1})_{j\,{\geq }\,1}\). This is a way to quantify the decay of the size of the ψj, also taking their support properties into account, and in turn to quantify the anisotropic dependence of u(y) on the various coordinates yj. Other similar results have been obtained with different PDE models, see in particular [12]. In the above results, the polynomial approximants are constructed by truncation of infinite series. The remainder of the paper addresses the construction of downward closed polynomial approximants from evaluations of the solution map at m points {y1, …, ym}⊂ U, and discusses the accuracy of these approximants.

3 Interpolation

3.1 Sparse Interpolation by Downward Closed Polynomials

Interpolation is one of the most standard processes for constructing polynomial approximations based on pointwise evaluations. Given a downward closed set \(\varLambda \subset \mathcal {F}\) of finite cardinality, and a set of points

$$\displaystyle \begin{aligned} \varGamma\subset U, \quad \#(\varGamma)=\#(\varLambda), \end{aligned}$$

we would like to build an interpolation operator IΛ, that is, \(I_\varLambda u\in \mathbb {V}_\varLambda \) is uniquely characterized by

$$\displaystyle \begin{aligned} I_\varLambda u(y)=u(y), \quad y\in\varGamma, \end{aligned}$$

for any V -valued function u defined on U.

In the univariate case, it is well known that such an operator exists if and only if Γ is a set of pairwise distinct points, and that additional conditions are needed in the multivariate case. Moreover, since the set Λ may come from a nested sequence (Λn)n≥1 as discussed in Sect. 2, we are interested in having similar nestedness properties for the corresponding sequence (Γn)n≥1, where

$$\displaystyle \begin{aligned} \#(\varGamma_n)=\#(\varLambda_n)=n. \end{aligned}$$

Such a nestedness property allows us to recycle the n evaluations of u which have been used in the computation of \(I_{\varLambda _n} u\), and use only one additional evaluation for the next computation of \(I_{\varLambda _{n+1}} u\).

It turns out that such hierarchical interpolants can be constructed in a natural manner by making use of the downward closed structure of the index sets Λn. This construction is detailed in [7], but its main principles can be traced back to [23]. In order to describe it, we assume that the parameter domain is of either form

$$\displaystyle \begin{aligned} U=[-1,1]^d \quad \mathrm{or} \quad [-1,1]^{\mathbb{N}}, \end{aligned}$$

with the convention that d = ∞ in the second case. However, it is easily checked that the construction can be generalized in a straightforward manner to any domain with Cartesian product form

$$\displaystyle \begin{aligned} U=\underset{k\geq 1}{\times} J_k, \end{aligned}$$

where the Jk are finite or infinite intervals.

We start from a sequence of pairwise distinct points

$$\displaystyle \begin{aligned} T=(t_k)_{k\geq 0}\subset [-1,1]. \end{aligned}$$

We denote by Ik the univariate interpolation operator on the space \( \mathbb {V}_k:=V\otimes \mathbb {P}_k \) associated with the k-section {t0, …, tk} of this sequence, that is,

$$\displaystyle \begin{aligned} I_ku(t_i)=u(t_i),\quad i=0,\dots,k, \end{aligned}$$

for any V -valued function u defined on [−1, 1]. We express Ik in the Newton form

$$\displaystyle \begin{aligned} I_ku=I_0u+\sum_{l=1}^k\varDelta_l u,\quad \varDelta_l:=I_l-I_{l-1}, {} \end{aligned} $$
(33)

and set I−1 = 0 so that we can also write

$$\displaystyle \begin{aligned} I_ku=\sum_{l=0}^k \varDelta_l u. \end{aligned}$$

Obviously the difference operator Δk annihilates the elements of \(\mathbb {V}_{k-1}\). In addition, since Δk u(tj) = 0 for j = 0, …, k − 1, we have

$$\displaystyle \begin{aligned} \varDelta_k u(t)=\alpha_k B_k(t), \end{aligned}$$

where

$$\displaystyle \begin{aligned} B_{k}(t):=\prod_{l=0}^{k-1}\frac {t-t_l}{t_k-t_l}. \end{aligned}$$

The coefficient αk ∈ V can be computed inductively, since it is given by

$$\displaystyle \begin{aligned} \alpha_k=\alpha_k(u):=u(t_k)-I_{k-1}u(t_k), \end{aligned}$$

that is, the interpolation error at tk when using Ik−1. Setting

$$\displaystyle \begin{aligned} B_0(t):=1, \end{aligned}$$

we observe that the system {B0, …, Bk} is a basis for \(\mathbb {P}_k\). It is sometimes called a hierarchical basis.
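The following minimal sketch implements the Newton form (33): each coefficient αk is the error of Ik−1 at tk, and each Bk vanishes at t0, …, tk−1. The point sequence T is an arbitrary choice made for illustration.

```python
# Univariate hierarchical interpolation in Newton form.
import numpy as np

T = [0.0, 1.0, -1.0, 0.5]           # hypothetical pairwise distinct points

def B(k, t):                        # hierarchical basis function B_k
    return np.prod([(t - T[l]) / (T[k] - T[l]) for l in range(k)])

def newton_coeffs(u, k):            # alpha_l = u(t_l) - I_{l-1} u(t_l)
    alpha = []
    for l in range(k + 1):
        alpha.append(u(T[l]) - sum(a * B(i, T[l]) for i, a in enumerate(alpha)))
    return alpha

alpha = newton_coeffs(np.exp, k=3)
I3 = lambda t: sum(a * B(l, t) for l, a in enumerate(alpha))
print(I3(1.0), np.exp(1.0))         # the interpolant matches u at t_1 = 1.0
```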

In the multivariate setting, we tensorize the grid T by defining

$$\displaystyle \begin{aligned}y_\nu:=(t_{\nu_j})_{j\,{\geq}\,1}\in U, \quad \nu\in\mathcal{F}. \end{aligned}$$

We first introduce the tensorized operator

$$\displaystyle \begin{aligned} I_\nu:=\bigotimes_{j\,{\geq}\,1} I_{\nu_j}, \end{aligned}$$

recalling that the application of a tensorized operator ⊗j ≥ 1 Aj to a multivariate function amounts to successively applying each univariate operator Aj to the jth variable, while freezing all the other variables. It is readily seen that Iν is the interpolation operator on the tensor product polynomial space

$$\displaystyle \begin{aligned} \mathbb{V}_\nu=V\otimes \mathbb{P}_\nu, \quad \mathbb{P}_\nu:=\bigotimes_{j\,{\geq}\,1} \mathbb{P}_{\nu_j}, \end{aligned}$$

associated to the grid of points

$$\displaystyle \begin{aligned} \varGamma_{\nu}=\underset{j\,{\geq}\,1}{\times} \{t_{0},\dots,t_{\nu_j}\}. \end{aligned}$$

This polynomial space corresponds to the particular downward closed index set of rectangular shape

$$\displaystyle \begin{aligned} \varLambda=R_\nu:=\{ \widetilde \nu \; : \; \widetilde \nu \leq \nu \}. \end{aligned}$$

Defining in a similar manner the tensorized difference operators

$$\displaystyle \begin{aligned} \varDelta_\nu:=\bigotimes_{j\,{\geq}\,1}\varDelta_{\nu_j}, \end{aligned}$$

we observe that

$$\displaystyle \begin{aligned} I_\nu=\bigotimes_{j\,{\geq}\,1} I_{\nu_j}=\bigotimes_{j\,{\geq}\,1} (\sum_{l=0}^{\nu_j} \varDelta_l)=\sum_{\widetilde\nu\in R_\nu} \varDelta_{\widetilde\nu}. \end{aligned} $$

The following result from [7] shows that the above formula can be generalized to any downward closed set in order to define an interpolation operator. We recall its proof for the sake of completeness.

Theorem 7

Let \(\varLambda \subset \mathcal {F}\) be a finite downward closed set, and define the grid

$$\displaystyle \begin{aligned} \varGamma_\varLambda:=\{y_\nu \; : \; \nu\in \varLambda\}. \end{aligned} $$

Then, the interpolation operator onto \(\mathbb {V}_\varLambda \) for this grid is defined by

$$\displaystyle \begin{aligned} I_\varLambda:=\sum_{\nu\in \varLambda} \varDelta_\nu. {} \end{aligned} $$
(34)

Proof

From the downward closed set property, \(\mathbb {V}_\nu \subset \mathbb {V}_\varLambda \) for all ν ∈ Λ. Hence the image of IΛ is contained in \(\mathbb {V}_\varLambda \). With IΛ defined by (34), we may write

$$\displaystyle \begin{aligned} I_\varLambda u=I_{\nu}u+\sum_{\widetilde \nu\in\varLambda, \widetilde \nu \nleqslant \nu} \varDelta_{\,\widetilde \nu\,} u, \end{aligned}$$

for any ν ∈ Λ. Since yν ∈ Γν, we know that

$$\displaystyle \begin{aligned} I_\nu u(y_\nu)=u(y_\nu). \end{aligned}$$

On the other hand, if \(\widetilde \nu \nleqslant \nu \), this means that there exists a j ≥ 1 such that \(\widetilde \nu _j>\nu _j\). For this j, the application of \(\varDelta _{\widetilde \nu _j}\) in the jth variable gives \(\varDelta _{\,\widetilde \nu \,} u(y)=0\) for all y ∈ U whose jth coordinate equals \(t_{\nu _j}\), so that

$$\displaystyle \begin{aligned} \varDelta_{\,\widetilde \nu\,} u(y_\nu)=0. \end{aligned}$$

The interpolation property IΛ u(yν) = u(yν) thus holds, for all ν ∈ Λ. □

The decomposition (34) should be viewed as a generalization of the Newton form (33). In a similar way, its terms can be computed inductively: if \(\varLambda =\widetilde \varLambda \cup \{\nu \}\) where \(\widetilde \varLambda \) is a downward closed set, we have

$$\displaystyle \begin{aligned} \varDelta_\nu u=\alpha_\nu B_\nu, \end{aligned}$$

where

$$\displaystyle \begin{aligned} B_\nu(y):=\prod_{j\,{\geq}\,1} B_{\nu_j}(y_j), \end{aligned}$$

and

$$\displaystyle \begin{aligned} \alpha_\nu=\alpha_\nu(u):=u(y_\nu)-I_{\,\widetilde \varLambda\,} u(y_\nu). \end{aligned}$$

Therefore, if (Λn)n≥1 is any nested sequence of downward closed index sets, we can compute \(I_{\varLambda _n}\) by n iterations of

$$\displaystyle \begin{aligned} I_{\varLambda_{i}}u=I_{\varLambda_{i-1}}u +\alpha_{\nu^i} B_{\nu^i}, \end{aligned}$$

where νi ∈ Λi is such that Λi = Λi−1 ∪{νi}.

Note that (Bν)ν ∈ Λ is a basis of \(\mathbb {P}_\varLambda \) and that any \(f \in \mathbb {V}_\varLambda \) has the unique decomposition

$$\displaystyle \begin{aligned}f=\sum_{\nu\in \varLambda} \alpha_\nu B_\nu, \end{aligned}$$

where the coefficients αν = αν(f) ∈ V are defined by the above procedure. Also note that αν(f) does not depend on the choice of Λ but only on ν and f.
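
As an illustration of this inductive computation, the following Python sketch assembles the coefficients αν over a downward closed set by processing the indices in an order compatible with the partial order ≤. It is only an illustrative sketch, not code from [7]: the point sequence t, the set Λ and the function u are arbitrary choices.

```python
import numpy as np
from itertools import product

t = np.array([0.0, 1.0, -1.0, 0.5, -0.5])       # illustrative point sequence

def B1(k, x):
    # univariate hierarchical basis: B_k(x) = prod_{l<k} (x - t_l)/(t_k - t_l)
    val = 1.0
    for l in range(k):
        val *= (x - t[l]) / (t[k] - t[l])
    return val

def B(nu, y):
    # tensorized basis B_nu(y) = prod_j B_{nu_j}(y_j)
    return np.prod([B1(k, x) for k, x in zip(nu, y)])

def hierarchical_coeffs(u, Lambda):
    alpha = {}
    for nu in sorted(Lambda, key=sum):           # order compatible with <=
        y_nu = np.array([t[k] for k in nu])
        # alpha_nu = u(y_nu) - I_{Lambda'} u(y_nu), Lambda' = indices done so far
        alpha[nu] = u(y_nu) - sum(a * B(mu, y_nu) for mu, a in alpha.items())
    return alpha

# Example: the triangular set Lambda = {nu : nu_1 + nu_2 <= 2} in d = 2.
Lambda = [nu for nu in product(range(3), repeat=2) if sum(nu) <= 2]
u = lambda y: np.exp(y[0] + 0.5 * y[1])
alpha = hierarchical_coeffs(u, Lambda)
y = np.array([0.3, -0.7])
print(sum(a * B(nu, y) for nu, a in alpha.items()), u(y))
```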

3.2 Stability and Error Estimates

The pointwise evaluations of the function u could be affected by errors, as modeled by (6) and (7). The stability of the interpolation operator with respect to such perturbations is quantified by the Lebesgue constant, which is defined by

$$\displaystyle \begin{aligned} \mathbb{L}_\varLambda:=\sup \frac{\|I_\varLambda\, f\|{}_{L^\infty(U,V)}} {\|\,f\|{}_{L^\infty(U,V)}}, \end{aligned}$$

where the supremum is taken over the set of all V -valued functions f defined everywhere and uniformly bounded over U. It is easily seen that this supremum is in fact independent of the space V , so that we may also write

$$\displaystyle \begin{aligned} \mathbb{L}_\varLambda:=\sup \frac{\|I_\varLambda\, f\|{}_{L^\infty(U)}} {\|\,f\|{}_{L^\infty(U)}}, \end{aligned}$$

where the supremum is now taken over real-valued functions. Obviously, we have

$$\displaystyle \begin{aligned} \|u-I_\varLambda(u+\eta)\|{}_{L^\infty(U,V)} \leq \|u-I_\varLambda u\|{}_{L^\infty(U,V)}+ \mathbb{L}_\varLambda \varepsilon, \end{aligned}$$

where ε is the noise level from (7).

The Lebesgue constant also allows us to estimate the interpolation error \(\|u-I_\varLambda u\|{ }_{L^\infty (U,V)}\) for the noiseless solution map in terms of the best approximation error in the L∞ norm: for any u ∈ L∞(U, V ) and any \(\widetilde u\in \mathbb {V}_\varLambda \) we have

$$\displaystyle \begin{aligned} \|u-I_\varLambda u\|{}_{L^\infty(U,V)} \leq \|u-\widetilde u\|{}_{L^\infty(U,V)}+\|I_\varLambda \widetilde u-I_\varLambda u\|{}_{L^\infty(U,V)}, \end{aligned}$$

which by infimizing over \(\widetilde u\in \mathbb {V}_\varLambda \) yields

$$\displaystyle \begin{aligned} \|u-I_\varLambda u\|{}_{L^\infty(U,V)} \leq (1+\mathbb{L}_\varLambda)\inf_{\widetilde u\in \mathbb{V}_\varLambda}\|u-\widetilde u\|{}_{L^\infty(U,V)}. \end{aligned}$$

We have seen in Sect. 2 that for relevant classes of solution maps y ↦ u(y), there exist sequences of downward closed sets (Λn)n≥1 with #(Λn) = n, such that

$$\displaystyle \begin{aligned} \inf_{\widetilde u\in \mathbb{V}_{\varLambda_n}}\|u- \widetilde u\|{}_{L^\infty(U,V)} \leq Cn^{-s}, \quad n\geq 1, \end{aligned}$$

for some s > 0. For such sets, we thus have

$$\displaystyle \begin{aligned} \|u-I_{\varLambda_n} u\|{}_{L^\infty(U,V)} \leq C(1+\mathbb{L}_{\varLambda_n})n^{-s}. {} \end{aligned} $$
(35)

This motivates the study of the growth of \(\mathbb {L}_{\varLambda _n}\) as n → ∞.

For this purpose, we introduce the univariate Lebesgue constants

$$\displaystyle \begin{aligned} \mathbb{L}_k:=\sup\frac {\|I_k f\|{}_{L^\infty([-1,1])}} {\|\,f\|{}_{L^\infty([-1,1])}}. \end{aligned}$$

Note that \(\mathbb {L}_0=1\). We also define an analogous quantity for the difference operator

$$\displaystyle \begin{aligned} \mathbb{D}_k:=\sup\frac {\|\varDelta_k f\|{}_{L^\infty([-1,1])}} {\|\,f\|{}_{L^\infty([-1,1])}}. \end{aligned}$$

In the particular case of the rectangular downward closed sets Λ = Rν, since \(I_\varLambda =I_\nu =\otimes _{j\,{\geq }\,1}I_{\nu _j}\), we have

$$\displaystyle \begin{aligned} \mathbb{L}_{R_\nu}=\prod_{j\,{\geq}\,1} \mathbb{L}_{\nu_j}. \end{aligned}$$

Therefore, if the sequence T = (tk)k≥0 is such that

$$\displaystyle \begin{aligned} \mathbb{L}_k \leq (1+k)^\theta ,\quad k\geq 0, {} \end{aligned} $$
(36)

for some θ ≥ 1, we find that

$$\displaystyle \begin{aligned} \mathbb{L}_{R_\nu}\leq \prod_{j\,{\geq}\,1} (1+\nu_j)^\theta= (\#(R_\nu))^\theta, \end{aligned}$$

for all \(\nu \in \mathcal {F}\).

For arbitrary downward closed sets Λ, the expression of IΛ shows that

$$\displaystyle \begin{aligned} \mathbb{L}_\varLambda\leq \sum_{\nu\in \varLambda} \prod_{j\,{\geq}\,1} \mathbb{D}_{\nu_j}. \end{aligned}$$

Therefore, if the sequence T = (tk)k≥0 is such that

$$\displaystyle \begin{aligned} \mathbb{D}_k \leq (1+k)^\theta ,\quad k\geq 0, {} \end{aligned} $$
(37)

we find that

$$\displaystyle \begin{aligned} \mathbb{L}_\varLambda \leq \sum_{\nu\in \varLambda} \prod_{j\,{\geq}\,1} (1+\nu_j)^\theta =\sum_{\nu\in \varLambda} (\#(R_\nu))^\theta \leq \sum_{\nu\in \varLambda} (\#(\varLambda))^\theta = (\#(\varLambda))^{\theta+1}. \end{aligned}$$

The following result from [7] shows that this general estimate is also valid under the assumption (36) on the growth of \(\mathbb {L}_k\).

Theorem 8

If the sequence T = (tk)k≥0 is such that (36) or (37) holds for some θ ≥ 1, then

$$\displaystyle \begin{aligned} \mathbb{L}_{\varLambda} \leq (\#(\varLambda))^{\theta+1}, \end{aligned}$$

for all downward closed sets Λ.

One noticeable feature of the above result is that the bound on \(\mathbb {L}_\varLambda \) only depends on #(Λ), independently of the number of variables, which can be infinite, as well as of the shape of Λ.

We are therefore interested in univariate sequences T = (tk)k≥0 such that \(\mathbb {L}_k\) and \(\mathbb {D}_k\) have moderate growth with k. For Chebyshev or Gauss-Lobatto points, given by

$$\displaystyle \begin{aligned} \mathcal{C}_k\,{:=}\,\left\{\cos\left(\frac {2 l+1}{2k+2} \pi\right)\; : \; l\,{=}\,0,\dots, k\right\}\;\mathrm{and}\; \mathcal{G}_k\,{:=}\,\left\{\cos\left(\frac {l}{k} \pi\right)\; : \; l\,{=}\,0,\dots, k\right\}, \end{aligned}$$

it is well known that the Lebesgue constant has logarithmic growth \(\mathbb {L}_k\sim \ln (k)\), thus slower than algebraic. However, these point sets are not the sections {t0, …, tk} of a single sequence T (the whole set changes with k), and therefore they are not convenient for our purposes. Two examples of univariate sequences of interest are the following.

  • The Leja points: from an arbitrary t0 ∈ [−1, 1] (usually taken to be 1 or 0), this sequence is recursively defined by

    $$\displaystyle \begin{aligned} t_k:=\mathop{\text{argmax}} \left\{ \prod_{l=0}^{k-1}|t-t_l| \; : \; t\in [-1,1]\right\}. \end{aligned}$$

    Note that this choice yields hierarchical basis functions Bk that are uniformly bounded by 1. Numerical computations of \(\mathbb {L}_k\) for the first 200 values of k indicate that the linear bound

    $$\displaystyle \begin{aligned} \mathbb{L}_k \leq 1+k, {} \end{aligned} $$
    (38)

    holds. Proving that this bound, or any other algebraic growth bound, holds for all values of k ≥ 0 is currently an open problem. A numerical sketch of such a computation is given after this list.

  • The ℜ-Leja points: they are the real part of the Leja points defined on the complex unit disc {|z|≤ 1}, taking for example e0 = 1 and recursively setting

    $$\displaystyle \begin{aligned} e_k:=\mathop{\text{argmax}} \left\{ \prod_{l=0}^{k-1}|e-e_l| \; : \; |e|\leq 1\right \}. \end{aligned}$$

    These points have the property of accumulating in a regular manner on the unit circle according to the so-called Van der Corput enumeration [4]. It is proven in [5] that the linear bound (38) holds for the Lebesgue constant of the complex interpolation operator on the unit disc associated to these points. The sequence of real parts

    $$\displaystyle \begin{aligned} t_k:=\Re(e_k), \end{aligned}$$

    is defined after eliminating the possible repetitions corresponding to \(e_k=\overline e_l\) for two different values k ≠ l. These points coincide with the Gauss-Lobatto points for values of k of the form \(2^n+1\), n ≥ 0. A quadratic bound

    $$\displaystyle \begin{aligned} \mathbb{D}_k \leq (1+k)^2, \end{aligned}$$

    is established in [6].
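
The following Python sketch illustrates both the greedy computation of the Leja points, by maximizing \(\prod _{l<k}|t-t_l|\) over a fine discretization of [−1, 1], and a crude numerical estimate of the associated Lebesgue constant; the grid resolution and the number of points are arbitrary illustrative choices.

```python
import numpy as np

grid = np.linspace(-1.0, 1.0, 20001)

def leja_points(K, t0=1.0):
    pts = [t0]
    for _ in range(K):
        dist = np.ones_like(grid)
        for tl in pts:
            dist *= np.abs(grid - tl)
        pts.append(grid[np.argmax(dist)])       # next Leja point
    return np.array(pts)

def lebesgue_constant(pts):
    # L = max_t sum_j |ell_j(t)|, with ell_j the Lagrange basis at pts
    lam = np.zeros_like(grid)
    for j in range(len(pts)):
        ell = np.ones_like(grid)
        for l in range(len(pts)):
            if l != j:
                ell *= (grid - pts[l]) / (pts[j] - pts[l])
        lam += np.abs(ell)
    return lam.max()

pts = leja_points(20)
for k in (5, 10, 20):
    print(k, lebesgue_constant(pts[:k + 1]))    # compare with the bound 1 + k
```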

If we use such sequences, application of Theorem 8 gives bounds of the form

$$\displaystyle \begin{aligned} \mathbb{L}_\varLambda\leq (\#(\varLambda))^{1+\theta}, \end{aligned}$$

for example with θ = 2 when using the ℜ-Leja points, or θ = 1 when using the Leja points, provided that the conjectured bound (38) holds. Combining with (35), we obtain the convergence estimate

$$\displaystyle \begin{aligned} \|u-I_{\varLambda_n} u\|{}_{L^\infty(U,V)} \leq C n^{-(s-1-\theta)}, \end{aligned}$$

which reveals a serious deterioration of the convergence rate when using interpolation instead of truncated expansions.

However, for the parametric PDE models discussed in Sect. 2, it is possible to show that this deterioration actually does not occur, based on the following lemma which relates the interpolation error to the summability of coefficient sequences in general expansions of u.

Lemma 5

Assume that u admits an expansion of the type (13), with \(\|\phi _\nu \|{ }_{L^\infty (U)}\leq 1\), which converges unconditionally towards u in L∞(U, V ). Assume in addition that y ↦ u(y) is continuous from U, equipped with the product topology, to V . If the univariate sequence T = (tk)k≥0 is such that (36) or (37) holds for some θ ≥ 1, then, for any downward closed set Λ,

$$\displaystyle \begin{aligned} \| u-I_\varLambda u \|{}_{L^\infty(U,V)} \leq 2\sum_{\nu\notin \varLambda} \pi (\nu) \|u_\nu\|{}_V, \quad \pi(\nu) := \prod_{j\,{\geq}\,1} (1+\nu_j)^{\theta+1}. {} \end{aligned} $$
(39)

Proof

The unconditional convergence of (13) and the continuity of u with respect to the product topology allow us to say that the equality in (13) holds everywhere in U. We may thus write

$$\displaystyle \begin{aligned} I_\varLambda u = I_\varLambda\left(\sum_{\nu\in\mathcal{F}} u_\nu \phi_\nu\right) = \sum_{\nu\in\mathcal{F}} u_\nu I_\varLambda \phi_\nu = \sum_{\nu\in\varLambda} u_\nu \phi_\nu + \sum_{\nu\notin\varLambda} u_\nu I_{\varLambda}\phi_\nu, \end{aligned}$$

where we have used that IΛ ϕν = ϕν for every ν ∈ Λ since \(\phi _\nu \in \mathbb {P}_\varLambda \). For the second sum on the right-hand side, we observe that for each ν ∉ Λ,

$$\displaystyle \begin{aligned} I_{\varLambda}\phi_\nu=\sum_{\widetilde \nu\in \varLambda}\varDelta_{\,\widetilde \nu\,} \phi_\nu =\sum_{\widetilde \nu\in \varLambda\cap R_\nu}\varDelta_{\,\widetilde \nu\,} \phi_\nu=I_{\varLambda\cap R_\nu} \phi_\nu, \end{aligned}$$

since \(\varDelta _{\,\widetilde \nu \,}\) annihilates \(\mathbb {P}_\nu \) whenever \(\widetilde \nu \nleq \nu \). Therefore

$$\displaystyle \begin{aligned} u-I_\varLambda u = \sum_{\nu\not\in\varLambda} u_\nu (I- I_{\varLambda\cap R_\nu })\phi_\nu, \end{aligned}$$

where I stands for the identity operator. This implies

$$\displaystyle \begin{aligned} \|u-I_\varLambda u \|{}_{L^\infty(U,V)} \leq \sum_{\nu\not\in\varLambda} (1+\mathbb{L}_{\varLambda\cap R_\nu})\|u_\nu\|{}_V \leq 2\sum_{\nu\not\in\varLambda} \mathbb{L}_{\varLambda\cap R_\nu}\|u_\nu\|{}_V \; . \end{aligned}$$

Since (36) or (37) holds, we obtain from Theorem 8 that

$$\displaystyle \begin{aligned} \mathbb{L}_{\varLambda\cap R_\nu} \leq (\#(\varLambda\cap R_\nu))^{\theta+1} \leq (\#(R_\nu))^{\theta+1} = \pi(\nu), \end{aligned}$$

which yields (39). □

We can apply the above lemma with the Taylor series (18) or the renormalized Legendre series (20). This leads us to analyze the ℓ1 tail of the sequence \((c_\nu )_{\nu \in \mathcal {F}}\), where cν is either π(ν)∥tν∥V or \(\pi (\nu )\|\widetilde w_\nu \|{ }_V\). If (22) holds, we know from Theorem 1 that this sequence satisfies the bound

$$\displaystyle \begin{aligned} \sum_{\nu\in\mathcal{F}}(\omega_\nu c_\nu)^2 <\infty, \end{aligned}$$

where ων is either π(ν)−1 ρν or π(ν)−1 β(ν)−2 ρν. Since π(ν) has algebraic growth similar to β(ν), application of Lemma 3 and of Theorem 3 with the value r = 1 leads to the following result.

Theorem 9

If (22) holds with \((\rho _j^{-1})_{j\,{\geq }\,1}\in \ell ^q(\mathbb {N})\) for some 0 < q < 2 and ρj > 1 for all j, then

$$\displaystyle \begin{aligned} \|u-I_{\varLambda_n}u\|{}_{L^\infty(U,V)}\leq Cn^{-s}, \quad s:=\frac{1}{q}-\frac{1}{2}, \end{aligned}$$

where Λn is any downward closed set corresponding to the n largest \(\widehat \kappa _\nu \), with κν either π(ν)bν or π(ν)β(ν)2 bν, and \(b:=(\rho _j^{-1})_{j\,{\geq }\,1}\).

4 Discrete Least Squares Approximations

4.1 Discrete Least Squares on V -Valued Linear Spaces

Least-squares fitting is an alternative approach to interpolation for building a polynomial approximation of u from \(\mathbb {V}_\varLambda \). In this approach we are given m observations u1, …, um of u at points \(y^1,\ldots ,y^m \in U \subseteq \mathbb {R}^d\) where m ≥ n = #(Λ).

We first discuss the least-squares method in the more general setting of V -valued linear spaces,

$$\displaystyle \begin{aligned} \mathbb{V}_n:=V\otimes \mathbb{Y}_n, \end{aligned}$$

where \(\mathbb {Y}_n\) is a space of real-valued functions defined everywhere on U, with \(\dim (\mathbb {Y}_n)=n\). In the next section, we discuss more specifically the case where \(\mathbb {Y}_n=\mathbb {P}_\varLambda \). Here we study the approximation error in the L2(U, V, dμ) norm for some given probability measure dμ, when the evaluation points yi are independent and drawn according to this probability measure. For notational simplicity we use the shorthand

$$\displaystyle \begin{aligned} \|\cdot\|:=\|\cdot \|{}_{L^2(U,V,d\mu)}. \end{aligned}$$

The least-squares method selects the approximant of u in the space \(\mathbb {V}_n\) as

$$\displaystyle \begin{aligned} u_L:=\mathop{\text{argmin}}_{\widetilde u \in \mathbb{V}_n} \frac{1}{m} \sum_{i=1}^m \|\, \widetilde u(y^i)-u^i \|{}_V^2. \end{aligned}$$

In the noiseless case where ui := u(yi) for all i = 1, …, m, this can also be written as

$$\displaystyle \begin{aligned} u_L=\mathop{\text{argmin}}_{\widetilde u \in \mathbb{V}_n }\|u-\widetilde u\|{}_m, {} \end{aligned} $$
(40)

where the discrete seminorm is defined by

$$\displaystyle \begin{aligned} \|\,f \|{}_m:= \left( \frac{1}{m} \sum_{i=1}^m \|\,f(y^i) \|{}_V^2 \right)^{1/2}. \end{aligned}$$

Note that \(\|\,f\|{ }_m^2\) is an unbiased estimator of \(\|\,f\|{ }^2\) since we have

$$\displaystyle \begin{aligned} \mathbb{E}(\|\,f\|{}_m^2)=\|\,f\|{}^2. \end{aligned}$$

Let {ϕ1, …, ϕn} denote an arbitrary L2(U, dμ) orthonormal basis of the space \(\mathbb {Y}_n\). If we expand the solution to (40) as \(\sum _{j=1}^n c_j \phi _j\), with cj ∈ V , the V -valued vector c = (c1, …, cn)t is the solution to the normal equations

$$\displaystyle \begin{aligned} \mathbf{G} \mathbf{c}=\mathbf{d}, {} \end{aligned} $$
(41)

where the matrix G has entries

$$\displaystyle \begin{aligned} \mathbf{G}_{j,k}=\frac{1}{m}\sum_{i=1}^m \phi_j(y^i)\phi_k(y^i), \end{aligned}$$

and where the V -valued data vector d = (d1, …, dn)t is given by

$$\displaystyle \begin{aligned} d_j:=\frac{1}{m} \sum_{i=1}^m u^i \phi_j(y^i). \end{aligned}$$

This linear system always has at least one solution, which is unique when G is nonsingular. When G is singular, we may define uL as the unique minimal \(\ell ^2(\mathbb {R}^n,V)\) norm solution to (41).
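
For illustration, the following Python sketch assembles and solves the normal equations (41) for a scalar-valued u, using the Legendre basis orthonormalized with respect to the uniform measure on [−1, 1]; the target function, the dimensions n, m and the random seed are arbitrary illustrative choices.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)
n, m = 8, 200
u = lambda y: 1.0 / (1.0 + 25 * y**2)           # scalar-valued example

def phi(j, y):
    # L2(U, dmu)-orthonormal Legendre polynomial, dmu = dy/2 on [-1,1]
    c = np.zeros(j + 1); c[j] = 1.0
    return np.sqrt(2 * j + 1) * legendre.legval(y, c)

y = rng.uniform(-1.0, 1.0, m)                   # i.i.d. samples from dmu
Phi = np.column_stack([phi(j, y) for j in range(n)])
G = Phi.T @ Phi / m                             # G_{j,k} = (1/m) sum_i phi_j(y^i) phi_k(y^i)
d = Phi.T @ u(y) / m                            # data vector
c = np.linalg.solve(G, d)                       # coefficients of u_L
print(np.linalg.cond(G))                        # concentrates near 1 for large m
```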

In the subsequent analysis, we sometimes work under the assumption of a known uniform bound

$$\displaystyle \begin{aligned} \| u \|{}_{L^\infty(U,V)}\leq \tau. {} \end{aligned} $$
(42)

We introduce the truncation operator

$$\displaystyle \begin{aligned} z \mapsto T_{\tau} (z):= \left\{ \begin{array}{ll} z, & \ \mathrm{ if} \ \| z\|{}_V \leq \tau, \\ \tau\,\frac{z}{\|z \|{}_V}, & \ \mathrm{ if} \ \| z\|{}_V > \tau, \end{array} \right. \end{aligned}$$

and notice that it is a contraction: \(\| T_{\tau }(z)-T_{\tau }(\widetilde z)\|{ }_V \leq \|z -\widetilde z\|{ }_V \) for any \(z,\widetilde z \in V\). The truncated least-squares approximation is defined by

$$\displaystyle \begin{aligned} u_T:=T_{\tau}\circ u_L. \end{aligned}$$

Note that, in view of (42), we have ∥u(y) − uT(y)∥V ≤∥u(y) − uL(y)∥V for any y ∈ U and therefore

$$\displaystyle \begin{aligned} \|u - u_T\|\leq \|u -u_L\|. \end{aligned}$$

Note that the random matrix G concentrates toward its expectation, which is the identity matrix I, as m → ∞. In other words, the probability that G is ill-conditioned becomes very small as m increases. The truncation operator aims at avoiding instabilities which may occur when G is ill-conditioned. As an alternative proposed in [15], we may define for some given A > 1 the conditioned least-squares approximation by

$$\displaystyle \begin{aligned} u_C:=u_L, \ \mathrm{if} \ \mathrm{cond}(\mathbf{G}) \leq A, \quad u_C:=0, \ \mathrm{otherwise}, \end{aligned}$$

where \(\mathrm {cond}(\mathbf {G}):={\lambda _{\max }(\mathbf {G})}/{\lambda _{\min }(\mathbf {G})}\) is the usual condition number.

The property that ∥G − I∥2 ≤ δ for some 0 < δ < 1 amounts to the norm equivalence

$$\displaystyle \begin{aligned} (1-\delta)\|\,f \|{}^2\leq \|\,f\|{}_m^2\leq (1+\delta)\|\,f\|{}^2, \quad f\in \mathbb{V}_n. \end{aligned}$$

It is well known that if m is only marginally larger than n, least-squares methods may become unstable and inaccurate for most sampling distributions. For example, if U = [−1, 1] and \(\mathbb {Y}_n= \mathbb {P}_{n-1}\) is the space of algebraic polynomials of degree n − 1, then with m = n the estimator coincides with the Lagrange polynomial interpolation, which can be highly unstable and inaccurate, in particular for equispaced points. Therefore, m should be sufficiently large compared to n for the probability that G is ill-conditioned to be small. This trade-off between m and n has been analyzed in [11], using the function

$$\displaystyle \begin{aligned}y \mapsto k_n(y):=\sum_{j=1}^n |\phi_j(y)|{}^2, \end{aligned}$$

which is the diagonal of the integral kernel of the L2(U, dμ) projector on \(\mathbb {Y}_n\). This function depends on dμ, but not on the chosen orthonormal basis. It is strictly positive in U under minimal assumptions on the orthonormal basis, for example if one element of the basis is the constant function over all U. Obviously, the function kn satisfies

$$\displaystyle \begin{aligned} \int_{U} k_n \, d\mu=n. \end{aligned}$$

We define

$$\displaystyle \begin{aligned} K_n:=\|k_n \|{}_{L^\infty(U)} \geq n. \end{aligned}$$

The following results for the least-squares method with noiseless evaluations were obtained in [8, 11, 15, 28] for real-valued functions; their proofs extend in a straightforward manner to the present setting of V -valued functions. They are based on a probabilistic bound for the event ∥G − I∥2 > δ, using the particular value \(\delta =\frac {1}{2}\), or equivalently the value \(A=\frac {1+\delta }{1-\delta }=3\) as a bound on the condition number of G.

Theorem 10

For any r > 0, if m and n satisfy

$$\displaystyle \begin{aligned} K_n \leq \kappa\frac{m}{\ln m},\;\; \mathrm{with}\;\; \kappa:=\kappa(r)= \frac{3\ln(3/2)-1}{2+2r}, {} \end{aligned} $$
(43)

then the following hold.

  1. (i)

    The matrix G satisfies the tail bound

    $$\displaystyle \begin{aligned} \mathrm{Pr} \, \left\{ \|\mathbf{G}-\mathbf{I}\|{}_2 > \frac{1}{2} \right\} \leq 2m^{-r}. \end{aligned}$$
  2. (ii)

    If u satisfies (42), then the truncated least-squares estimator satisfies, in the noiseless case,

    $$\displaystyle \begin{aligned} \mathbb{E}(\|u-u_T\|{}^2)\leq (1+\zeta(m))\inf_{\widetilde u\in \mathbb{V}_n}\|u-\widetilde u\|{}^2+8\tau^2m^{-r}, \end{aligned}$$

    where \(\zeta (m):= \frac {4\kappa } {\ln (m)}\to 0\) as m → ∞, and κ is as in (43).

  3. (iii)

    The conditioned least-squares estimator satisfies, in the noiseless case,

    $$\displaystyle \begin{aligned} \mathbb{E}(\|u-u_C\|{}^2)\leq (1+\zeta(m))\inf_{\widetilde u\in \mathbb{V}_n}\|u-\widetilde u\|{}^2+2 \| u \|{}^2 m^{-r}, \end{aligned}$$

    where ζ(m) is as in (ii).

  4. (iv)

    If u satisfies (42), then the estimator uE ∈ {uL, uT, uC} satisfies, in the noiseless case,

    $$\displaystyle \begin{aligned} \|u- u_E\| \leq (1+\sqrt 2) \inf_{\widetilde u\in \mathbb{V}_n}\|u-\widetilde u\|{}_{L^\infty(U,V)}, {} \end{aligned} $$
    (44)

    with probability larger than 1 − 2m−r.

In the case of noisy evaluations modeled by (6)–(7), the observations are given by

$$\displaystyle \begin{aligned} u^i=u(y^i)+\eta(y^i). {} \end{aligned} $$
(45)

The following result from [8] shows that (44) holds up to this additional perturbation.

Theorem 11

For any r > 0, if m and n satisfy condition (43) and u satisfies (42), then the estimator uE ∈{uL, uT, uC} in the noisy case (45) satisfies

$$\displaystyle \begin{aligned} \|u- u_E\|\leq (1+\sqrt 2) \inf_{\widetilde u \in \mathbb{V}_n}\|u-\widetilde u\|{}_{L^\infty(U,V)} + \sqrt 2 \varepsilon, \end{aligned}$$

with probability larger than 1 − 2m−r, where ε is the noise level in (7).

Similar results, with more general assumptions on the type of noise, are proven in [11, 15, 29].

4.2 Downward Closed Polynomial Spaces and Weighted Least Squares

Condition (43) shows that Kn gives an indication of the number m of observations required to ensure stability and accuracy of the least-squares approximation. In order to understand how demanding this condition is with respect to m, it is important to have sharp upper bounds for Kn. Such bounds have been proven when the measure dμ on U = [−1, 1]d has the form

$$\displaystyle \begin{aligned} d\mu= C \bigotimes_{j=1}^d (1-y_j)^{\theta_1}(1+y_j)^{\theta_2} dy_j, \end{aligned} $$
(46)

where θ1, θ2 > −1 are real shape parameters and C is a normalization constant such that ∫U dμ = 1. Sometimes (46) is called the Jacobi measure, because the Jacobi polynomials are orthonormal in L2(U, dμ). Remarkable instances of the measure (46) are the uniform measure, when θ1 = θ2 = 0, and the Chebyshev measure, when \(\theta _1=\theta _2=-\frac {1}{2}\).

When \(\mathbb {Y}_n=\mathbb {P}_\varLambda \) is a multivariate polynomial space and Λ is a downward closed multi-index set with #(Λ) = n, it is proven in [8, 27] that Kn satisfies an upper bound which only depends on n and on the choice of the measure (46) through the values of θ1 and θ2.

Lemma 6

Let dμ be the measure defined in (46). Then it holds

$$\displaystyle \begin{aligned} K_n \leq \left\{ \begin{array}{ll} n^{\frac{ \ln 3 }{\ln 2}}, & \quad \mathrm{ if} \ \theta_1=\theta_2=-\frac{1}{2}, \\ n^{2\max\{\theta_1,\theta_2\}+2}, & \quad \mathrm{ if} \ \theta_1,\theta_2\in\mathbb{N}_0. \end{array} \right. \end{aligned} $$
(47)

A remarkable property of both algebraic upper bounds in (47) is that the exponent of n is independent of the dimension d, and of the shape of the downward closed set Λ. Both upper bounds are sharp in the sense that equality holds for multi-index sets of rectangular type Λ = Rν corresponding to tensor product polynomial spaces.

As an immediate consequence of Theorem 10 and Lemma 6, we have the next corollary.

Corollary 1

For any r > 0, with multivariate polynomial spaces \(\mathbb {P}_\varLambda \) and Λ downward closed, if m and n satisfy

$$\displaystyle \begin{aligned} \dfrac{m}{\ln m} \geq \kappa \left\{ \begin{array}{ll} n^{\frac{\ln 3}{\ln 2}}, & \quad \mathrm{ if} \ \theta_1=\theta_2=-\frac{1}{2}, \\ n^{2\max\{\theta_1,\theta_2 \} +2}, & \quad \mathrm{ if} \ \theta_1,\theta_2\in\mathbb{N}_0, \end{array} \right. \end{aligned} $$
(48)

with κ = κ(r) as in (43), then the same conclusions of Theorem 10 hold true.

Other types of results on the accuracy of least squares have been recently established in [14], under conditions of the same type as (48).

In some situations, for example when n is very large, the conditions (48) might require a prohibitive number of observations m. It is therefore legitimate to ask whether there exist alternative approaches with less demanding conditions than (48) between m and n. At best, we would like m to be of an order only slightly larger than n, for example by a logarithmic factor. In addition, the above analysis does not apply to situations where the basis functions ϕk are unbounded, such as when using Hermite polynomials in the expansion (21). It is thus desirable to develop approaches that also cover this case.

These questions have affirmative answers through weighted least-squares methods, as proposed in [15, 18, 21]. In the following, we survey some results from [15]. For the space \(\mathbb {V}_n=V\otimes \mathbb {Y}_n\), the weighted least-squares approximation is defined as

$$\displaystyle \begin{aligned} u_W:=\mathop{\text{argmin}}_{\widetilde u \in \mathbb{V}_n} \frac{1}{m} \sum_{i=1}^m w^i \| \widetilde u(y^i)-u^i \|{}_V^2, \end{aligned}$$

for some given choice of weights wi ≥ 0. This estimator is again computed by solving the linear system of normal equations, now with the matrix G having entries

$$\displaystyle \begin{aligned} \mathbf{G}_{j,k}=\frac{1}{m}\sum_{i=1}^m w(y^i) \phi_j(y^i)\phi_k(y^i). \end{aligned}$$

Of particular interest to us are weights of the form

$$\displaystyle \begin{aligned} w^i=w(y^i), \end{aligned}$$

where w is some nonnegative function defined on U such that

$$\displaystyle \begin{aligned} \int_{U} w^{-1} \, d\mu=1. {} \end{aligned} $$
(49)

We then denote by dσ the probability measure

$$\displaystyle \begin{aligned} d\sigma:= w^{-1} d\mu, {} \end{aligned} $$
(50)

and we draw the independent points y1, …, ym from dσ. The case w ≡ 1 and dσ = dμ corresponds to the previously discussed standard (unweighted) least-squares estimator uL. As previously done for uL, we associate to uW a truncated estimator uT and a conditioned estimator uC, by replacing uL with uW in the corresponding definitions.

Let us introduce the function

$$\displaystyle \begin{aligned}y \mapsto k_{n,w}(y):=\sum_{j=1}^n w(y) |\phi_j(y)|{}^2, \end{aligned}$$

where once again {ϕ1, …, ϕn} is an arbitrary L2(U, dμ) orthonormal basis of the space \(\mathbb {Y}_n\). Likewise, we define

$$\displaystyle \begin{aligned} K_{n,w}:=\|k_{n,w} \|{}_{L^\infty(U)}. \end{aligned}$$

The following result, established in [15] for real-valued functions, extends Theorem 10 to this setting. Its proof in the V -valued setting is similar.

Theorem 12

For any r > 0, if m and n satisfy

$$\displaystyle \begin{aligned} \dfrac{m}{\ln m} \geq \kappa \, K_{n,w},\;\; \mathrm{with}\;\; \kappa:=\kappa(r)= \frac{3\ln(3/2)-1}{2+2r}, \end{aligned}$$

then the same conclusions of Theorem 10 hold true with u L replaced by u W.

If we now choose

$$\displaystyle \begin{aligned} w(y)=\dfrac{ n }{ \sum_{j=1}^n |\phi_j(y)|{}^2 }, {} \end{aligned} $$
(51)

that satisfies condition (49) by construction, then the measure dσ defined in (50) takes the form

$$\displaystyle \begin{aligned} d\sigma= \dfrac{ \sum_{j=1}^n |\phi_j(y)|{}^2 }{ n } d\mu. {} \end{aligned} $$
(52)

The choice (51) also gives

$$\displaystyle \begin{aligned} K_{n,w}=\|k_{n,w} \|{}_{L^\infty(U)}=n, \end{aligned}$$

and leads to the next result, as a consequence of the previous theorem.

Theorem 13

For any r > 0, if m and n satisfy

$$\displaystyle \begin{aligned} \dfrac{m}{\ln m} \geq \kappa \ n, \;\; \mathrm{with}\;\; \kappa:=\kappa(r)= \frac{3\ln(3/2)-1}{2+2r}, {} \end{aligned} $$
(53)

then the same conclusions of Theorem 10 hold true with u L replaced by u W , with w given by (51) and the weights taken as w i = w(y i).
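
For illustration, the following Python sketch implements the optimal weighted least squares of Theorem 13 in a simple univariate setting: the points are drawn from dσ in (52) by rejection sampling, the weights are given by (51), and the weighted normal equations are solved. The Legendre basis, the target function and all dimensions are illustrative choices, and the rejection step is only one possible sampler for dσ.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(1)
n, m = 8, 200
u = lambda y: np.abs(y) ** 1.5

def phi(j, y):
    # L2(U, dmu)-orthonormal Legendre polynomial, dmu = dy/2 on [-1,1]
    c = np.zeros(j + 1); c[j] = 1.0
    return np.sqrt(2 * j + 1) * legendre.legval(y, c)

def k_n(y):
    return sum(phi(j, y) ** 2 for j in range(n))

M = k_n(np.linspace(-1, 1, 4001)).max() / n      # envelope for rejection
samples = []
while len(samples) < m:                          # rejection sampling from dsigma
    y = rng.uniform(-1, 1)                       # proposal from dmu
    if rng.uniform() * M <= k_n(y) / n:          # accept with prob (k_n/n)/M
        samples.append(y)
y = np.array(samples)

w = n / k_n(y)                                   # optimal weights (51)
Phi = np.column_stack([phi(j, y) for j in range(n)])
G = (Phi * w[:, None]).T @ Phi / m               # weighted Gram matrix
d = (Phi * w[:, None]).T @ u(y) / m
c = np.linalg.solve(G, d)                        # coefficients of u_W
print(np.linalg.cond(G))
```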

Remark 2

The above Theorem 13 ensures stability and accuracy of the weighted least-squares approximation, under the minimal condition that m is linearly proportional to n, up to a logarithmic factor. The fact that we may obtain near optimal approximation in L2 with this sample size is remarkable and quite specific to the randomized sampling setting, as also observed in similar types of results obtained in the context of information-based complexity [22, 32, 33, 34, 37]. For example, in the paper [37], the authors obtain the optimal L2 approximation rate for specific classes of functions that are described by reproducing kernel Hilbert spaces. The recent results from [22] are perhaps closer to our above results, since the proposed method uses the same optimal sampling measure associated to the weight (51) as in [15]. The estimates obtained in [22] compare the error of the randomized algorithm with the approximation numbers of the embedding of the RKHS in L2, assuming a certain polynomial decay for these numbers. In Theorem 13, we do not assume any particular form of decay of the best approximation error.

Theorem 13 thus gives weighted least squares a clear advantage over standard least squares, since condition (43) is more demanding than (53) in terms of the number of observations m.

However, this advantage comes with some drawbacks that we now briefly recall; see [15] for an extensive description. In general (50) and (52) are not product measures, even if dμ is one. Therefore, the first drawback of using weighted least squares concerns the efficient generation of independent samples from multivariate probability measures, whose computational cost could be prohibitive, above all when the dimension d is large. In some specific settings, for example downward closed polynomial spaces \(\mathbb {Y}_n=\mathbb {P}_\varLambda \) with #(Λ) = n and a product measure dμ, this drawback can be overcome. We refer to [15], where efficient sampling algorithms have been proposed and analyzed. For any m and any downward closed set Λ, these algorithms generate m independent samples with proven bounds on the required computational cost; these bounds depend linearly on the dimension d and on m. For the general measure (50) the efficient generation of the samples is a nontrivial task, and remains a drawback of such an approach.

The second drawback concerns the use of weighted least squares in a hierarchical context, where we are given a nested sequence Λ1 ⊂… ⊂ Λn of downward closed sets, instead of a single such set Λ. Since the measure (52) depends on n, the sets (Λn)n≥1 are associated to different measures (dσn)n≥1. Hence, recycling samples from the previous iterations of the adaptive algorithm is not as straightforward as in the case of standard least squares.

As a final remark, let us stress that the above results of Theorems 12 and 13 hold for general approximation spaces \(\mathbb {Y}_n\) other than polynomials.

5 Adaptive Algorithms and Extensions

5.1 Selection of Downward Closed Polynomial Spaces

The interpolation and least-squares methods discussed in Sects. 3 and 4 allow us to construct polynomial approximations in \(\mathbb {V}_\varLambda =V \otimes \mathbb {P}_\varLambda \) of the map (2) from its pointwise evaluations, for some given downward closed set Λ. For these methods, we have given several convergence results in terms of error estimates either in L∞(U, V ) or L2(U, V, dμ). In some cases, these estimates compare favorably with the error of best approximation \(\min _{\widetilde u \in \mathbb {V}_\varLambda } \|u-\widetilde u\|\) measured in such norms.

A central issue which still needs to be addressed is the choice of the downward closed set Λ, so that this error of best approximation is well behaved, for a given map u. Ideally, for each given n, we would like to use the set

$$\displaystyle \begin{aligned} \varLambda_n=\mathop{\text{argmin}}_{\varLambda\in \mathcal{D}_n}\min_{\widetilde u\in \mathbb{V}_{\varLambda}} \|u-\widetilde u\|, \end{aligned}$$

where \(\mathcal {D}_n\) is the family of all downward closed sets Λ of cardinality n. However such sets Λn are not explicitly given to us, and in addition the resulting sequence (Λn)n≥1 is generally not nested.

Concrete selection strategies aim to produce “suboptimal yet good” nested sequences (Λn)n≥1 different from the above. Here, an important distinction should be made between nonadaptive and adaptive selection strategies.

In nonadaptive strategies, the selection of Λn is made in an a-priori manner, based on some available information on the given problem. The results from Sect. 2.3 show that, for relevant instances of solution maps associated to elliptic parametric PDEs, there exist nested sequences (Λn)n≥1 of downward closed sets such that #(Λn) = n and \(\min _{\widetilde u\in \mathbb {V}_{\varLambda _n}} \|u-\widetilde u\|\) decreases with a given convergence rate n−s as n → ∞. In addition, these results provide constructive strategies for building the sets Λn, since these sets are defined as the indices associated to the n largest \(\widehat \kappa _\nu :=\max _{\widetilde \nu \geq \nu } \kappa _{\,\widetilde \nu \,}\) as in Theorem 5, or directly to the n largest κν as in Theorems 4 and 6, and since the κν are explicitly given numbers.

In the case where we build the polynomial approximation by interpolation, Theorem 9 shows that a good choice of Λn is produced by taking κν to be either π(ν)bν or π(ν)β(ν)2 bν where \(b=(\rho _j^{-1})_{j\,{\geq }\,1}\) is such that (22) holds. The choice of such a sequence ρ depends both on the size and support properties of the functions ψj. For example, when the functions ψj have nonoverlapping support, one natural choice is to take

$$\displaystyle \begin{aligned} \rho_j=\min_{x\in \mathrm{supp}(\psi_j)} \frac {\bar a(x)-\tilde r}{|\psi_j(x)|}. \end{aligned} $$
(54)

We refer to [1] for the choices of sequences ρ in more general situations, for example in the case where (ψj)j ≥ 1 is a wavelet basis.

In the case where we build the polynomial approximation by least-squares methods, the various results from Sect. 4 show that under suitable assumptions, the error is nearly as good as that of best approximation in L2(U, V, ) with respect to the relevant probability measure. In the affine case, Theorem 5 shows that a good choice of Λn is produced by taking κν to be bν β(ν) where \(b=(\rho _j^{-1})_{j\,{\geq }\,1}\) is such that (22) holds. In the lognormal case Theorem 6 shows that a good choice of Λn is produced by taking κν to be given by (30) where \(b=(\rho _j^{-1})_{j\,{\geq }\,1}\) is such that (28) holds.

Let us briefly discuss the complexity of identifying the downward closed set Λn associated to the n largest \(\widehat \kappa _\nu \). For this purpose, we introduce for any downward closed set Λ its set of neighbors defined by

$$\displaystyle \begin{aligned} N(\varLambda):=\{\nu\in \mathcal{F}\setminus \varLambda \mbox{ such that } \varLambda\cup\{\nu\} \mbox{ is downward closed} \}. \end{aligned}$$

We may in principle define Λn = {ν1, …, νn} by the following induction.

  • Take \(\nu ^1=0_{\mathcal {F}}\) as the null multi-index.

  • Given Λk = {ν1, …, νk}, choose a νk+1 maximizing \(\widehat \kappa _\nu \) over ν ∈ N(Λk).

In the finite-dimensional case d < ∞, we observe that N(Λk) is contained in the union of N(Λk−1) with the set consisting of the indices

$$\displaystyle \begin{aligned} \nu^k+e_j, \quad j=1,\dots,d, \end{aligned}$$

where ej is the Kronecker sequence with 1 at position j. As a consequence, since the values of the \(\widehat \kappa _\nu \) have already been computed for ν ∈ N(Λk−1), the step k of the induction requires at most d evaluations of \(\widehat \kappa _\nu \), and therefore the overall computation of Λn requires at most nd evaluations.
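
The following Python sketch carries out this greedy construction in a small finite-dimensional example; the dimension d, the target cardinality n and the surrogate values κν = bν (chosen monotone nonincreasing, so that \(\widehat \kappa _\nu =\kappa _\nu \)) are arbitrary illustrative choices, not the precise quantities of Theorems 4–6.

```python
import numpy as np

d, n = 3, 10
b = np.array([0.5, 0.3, 0.1])                    # illustrative values, b_j < 1
kappa = lambda nu: np.prod(b ** np.array(nu))    # monotone, so kappa_hat = kappa

def neighbors(Lambda):
    N = set()
    for nu in Lambda:
        for j in range(d):
            mu = nu[:j] + (nu[j] + 1,) + nu[j+1:]
            # mu is a neighbor iff all indices mu - e_i (when defined) lie in Lambda
            parents = [mu[:i] + (mu[i] - 1,) + mu[i+1:] for i in range(d) if mu[i] > 0]
            if mu not in Lambda and all(p in Lambda for p in parents):
                N.add(mu)
    return N

Lambda = {(0,) * d}                              # start from the null multi-index
while len(Lambda) < n:
    Lambda.add(max(neighbors(Lambda), key=kappa))
print(sorted(Lambda, key=kappa, reverse=True))
```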

In the infinite-dimensional case d = ∞, the above procedure cannot be practically implemented, since the set of neighbors has infinite cardinality. This difficulty can be circumvented by introducing a priority order among the variables, as done in the next definitions.

Definition 4

A monotone nonincreasing positive sequence \((c_\nu )_{\nu \in \mathcal {F}}\) is said to be anchored if and only if

$$\displaystyle \begin{aligned} l\leq j \implies c_{e_j} \leq c_{e_l}. \end{aligned}$$

A finite downward closed set Λ is said to be anchored if and only if

$$\displaystyle \begin{aligned} e_j\in \varLambda \quad \mathrm{and}\quad l\leq j \quad \implies \quad e_l\in \varLambda, \end{aligned}$$

where el and ej are the Kronecker sequences with 1 at position l and j, respectively.

Obviously, if \((c_\nu )_{\nu \in \mathcal {F}}\) is anchored, one of the sets Λn corresponding to its n largest values is anchored. It is also readily seen that all sequences \((\widehat \kappa _\nu )_{\nu \in \mathcal {F}}\) that are used in Theorems 4, 5, 6 or 9 for the construction of Λn are anchored, provided that the sequence \(b=(\rho _j^{-1})_{j\,{\geq }\,1}\) is monotone nonincreasing. This is always the case up to a rearrangement of the variables. For any anchored set Λ, we introduce the set of its anchored neighbors defined by

$$\displaystyle \begin{aligned} \widetilde N(\varLambda):=\{\nu\in N(\varLambda)\; : \; \nu_j=0\;\; \mathrm{if}\;\; j >j(\varLambda)+1\}, {} \end{aligned} $$
(55)

where

$$\displaystyle \begin{aligned}j(\varLambda):=\max\{j\; : \; \nu_j>0\; \mbox{for some}\; \nu\in \varLambda\}. \end{aligned}$$

We may thus modify in the following way the above induction procedure.

  • Take \(\nu ^1=0_{\mathcal {F}}\) as the null multi-index.

  • Given Λk = {ν1, …, νk}, choose a νk+1 maximizing \(\widehat \kappa _\nu \) over \(\nu \in \widetilde N(\varLambda _k)\).

This procedure is now feasible in infinite dimension. At each step k the number of active variables is limited by j(Λk) ≤ k − 1, and the total number of evaluations of \(\widehat \kappa _\nu \) needed to construct Λn does not exceed 1 + 2 + ⋯ + (n − 1) ≤ n2∕2.
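
As an illustration, the following Python sketch computes the set of anchored neighbors (55) for a finite anchored set, with multi-indices stored as tuples of a fixed length D acting as a finite buffer of potentially active variables; the representation and the example set are illustrative choices.

```python
def j_max(Lambda):
    # j(Lambda): largest active variable (1-based), 0 for Lambda = {null index}
    return max((j + 1 for nu in Lambda for j, v in enumerate(nu) if v > 0), default=0)

def anchored_neighbors(Lambda, D):
    width = min(j_max(Lambda) + 1, D)            # only variables <= j(Lambda)+1 activate
    N = set()
    for nu in Lambda:
        for j in range(width):
            mu = nu[:j] + (nu[j] + 1,) + nu[j+1:]
            parents = [mu[:i] + (mu[i] - 1,) + mu[i+1:] for i in range(D) if mu[i] > 0]
            if mu not in Lambda and all(p in Lambda for p in parents):
                N.add(mu)
    return N

Lambda = {(0, 0, 0), (1, 0, 0)}
print(anchored_neighbors(Lambda, 3))             # {(2, 0, 0), (0, 1, 0)}
```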

In adaptive strategies the sets Λn are not a-priori selected, but instead they are built in a recursive way, based on earlier computations. For instance, one uses the previous set Λn−1 and the computed polynomial approximation \(u_{\varLambda _{n-1}}\) to construct Λn. If we impose that the sets Λn are nested, this means that we should select an index νnΛn−1 such that

$$\displaystyle \begin{aligned} \varLambda_n:=\varLambda_{n-1}\cup\{\nu^n\}. \end{aligned}$$

The choice of the new index νn is further limited to N(Λn−1) if we impose that the constructed sets Λn are downward closed, or to \(\widetilde N(\varLambda _{n-1})\) if we impose that these sets are anchored.

Adaptive methods are known to sometimes perform significantly better than their nonadaptive counterparts. In the present context, this is due to the fact that the a-priori choices of Λn based on the sequences κν may fail to be optimal. In particular, the guaranteed rate n−s based on such choices could be pessimistic, and better rates could be obtained by other choices. However, convergence analysis of adaptive methods is usually more delicate. We next give examples of possible adaptive strategies in the interpolation and least-squares frameworks.

5.2 Adaptive Selection for Interpolation

We first consider polynomial approximations obtained by interpolation as discussed in Sect. 3. The hierarchical form

$$\displaystyle \begin{aligned} I_\varLambda u= \sum_{\nu\in \varLambda}\alpha_\nu B_\nu, {} \end{aligned} $$
(56)

may formally be viewed as a truncation of the expansion of u in the hierarchical basis

$$\displaystyle \begin{aligned} \sum_{\nu\in \mathcal{F}}\alpha_\nu B_\nu, \end{aligned}$$

which however may not always be convergent, in contrast to the series discussed in Sect. 2. Nevertheless, we could in principle take the same view, and use for Λn the set of indices corresponding to the n largest terms of (56) measured in some given metric Lp(U, V, dμ). This amounts to choosing the indices of the n largest wν∥αν∥V, where the weight wν is given by

$$\displaystyle \begin{aligned} w_\nu:=\|B_\nu\|{}_{L^p(U,d\mu)}. \end{aligned}$$

This weight is easily computable when dμ is a tensor product measure, such as the uniform measure. In the case where p = ∞ and we use the Leja sequence, we know that \(\|B_\nu \|{ }_{L^\infty (U)}=1\) and therefore this amounts to choosing the indices of the largest ∥αν∥V.

This selection strategy is not practically feasible since we cannot afford this exhaustive search over \(\mathcal {F}\). However, it naturally suggests the following adaptive greedy algorithm, which has been proposed in [7].

  • Initialize \(\varLambda _1:=\{0_{\mathcal {F}}\}\) with the null multi-index.

  • Assuming that Λn−1 has been selected and that \((\alpha _\nu )_{\nu \in \varLambda _{n-1}}\) have been computed, compute αν for ν ∈ N(Λn−1).

  • Set

    $$\displaystyle \begin{aligned} \nu^n:=\mathop{\text{argmax}} \{ w_\nu\|\alpha_\nu\|{}_V\; : \; \nu\in N(\varLambda_{n-1})\}. {} \end{aligned} $$
    (57)
  • Define Λn := Λn−1 ∪{νn}.

In the case where p = ∞ and we use the Leja sequence, this strategy amounts to picking the index νn that maximizes the interpolation error \(\|u(y_{\nu }) - I_{\varLambda _{n-1}}u(y_{\nu })\|{ }_V\) among all ν in N(Λn−1). By the same considerations as previously discussed for the a-priori selection of Λn, we find that in the finite-dimensional case, the above greedy algorithm requires at most dn evaluations after n steps. When working with infinitely many variables (yj)j ≥ 1, we replace the infinite set N(Λn) in the algorithm by the finite set of anchored neighbors \(\widetilde N(\varLambda _n)\) defined by (55). Running n steps of the resulting greedy algorithm requires at most n2∕2 evaluations.
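
The following Python sketch implements this greedy loop in d = 2 variables with the selection rule (57) and weights wν = 1 (which is exact for Leja points, since then the Bν are bounded by 1); the target function and the number of iterations are illustrative choices.

```python
import numpy as np

d = 2
grid, t = np.linspace(-1, 1, 2001), [1.0]
for _ in range(12):                              # greedy Leja points on a fine grid
    t.append(grid[np.argmax(np.prod(np.abs(grid[:, None] - np.array(t)), axis=1))])

def B(nu, y):
    # B_nu(y) = prod_j prod_{l<nu_j} (y_j - t_l)/(t_{nu_j} - t_l)
    val = 1.0
    for k, x in zip(nu, y):
        for l in range(k):
            val *= (x - t[l]) / (t[k] - t[l])
    return val

def neighbors(Lambda):
    N = set()
    for nu in Lambda:
        for j in range(d):
            mu = nu[:j] + (nu[j] + 1,) + nu[j+1:]
            parents = [mu[:i] + (mu[i] - 1,) + mu[i+1:] for i in range(d) if mu[i] > 0]
            if mu not in Lambda and all(p in Lambda for p in parents):
                N.add(mu)
    return N

u = lambda y: 1.0 / (1.0 + y[0] ** 2 + 0.5 * y[1])
alpha = {}                                       # current set Lambda_n with coefficients

def residual(nu):                                # alpha_nu = u(y_nu) - I_Lambda u(y_nu)
    y_nu = np.array([t[k] for k in nu])
    return u(y_nu) - sum(a * B(mu, y_nu) for mu, a in alpha.items())

alpha[(0, 0)] = residual((0, 0))
for _ in range(9):
    nu_new = max(neighbors(alpha.keys()), key=lambda nu: abs(residual(nu)))
    alpha[nu_new] = residual(nu_new)             # selection rule (57)
print(sorted(alpha))
```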

Remark 3

A very similar algorithm has been proposed in [19] in the different context of adaptive quadratures, that is, for approximating the integral of u over the domain U rather than u itself. In that case, the natural choice is to pick the new index νn that maximizes \(|\int _U \varDelta _\nu u \, d\mu |\) over N(Λn) or \(\widetilde N(\varLambda _n)\).

The main defect of the above greedy algorithm is that it may fail to converge, even if there exist sequences (Λn)n≥1 such that \(I_{\varLambda _n} u\) converges toward u. Indeed, if Δν u = 0 for a certain ν, then no index \(\widetilde \nu \geq \nu \) will ever be selected by the algorithm. As an example, if u is of the form

$$\displaystyle \begin{aligned} u(y)=u_1(y_1)u_2(y_2), \end{aligned}$$

where u1 and u2 are nonpolynomial smooth functions such that u2(t0) = u2(t1), then the algorithm could select sets Λn with indices ν = (k, 0) for k = 0, …, n − 1, since the interpolation error at the point (tk, t1) vanishes.

One way to avoid this problem is to adopt a more conservative selection rule which ensures that all of \(\mathcal {F}\) is explored, by alternately using the rule (57) and picking the multi-index \(\nu \in \widetilde N(\varLambda _n)\) which has appeared at the earliest stage in the neighbors of the previous sets Λk. This is summarized by the following algorithm.

  • Initialize \(\varLambda _1:=\{0_{\mathcal {F}}\}\) with the null multi-index.

  • Assuming that Λn−1 has been selected and that \((\alpha _\nu )_{\nu \in \varLambda _{n-1}}\) have been computed, compute αν for \(\nu \in \widetilde N(\varLambda _{n-1})\).

  • If n is even, set

    $$\displaystyle \begin{aligned} \nu^n:=\mathop{\text{argmax}} \{ w_\nu \|\alpha_\nu\|{}_V\; : \; \nu\in \widetilde N(\varLambda_{n-1})\}. {} \end{aligned} $$
    (58)
  • If n is odd, set

    $$\displaystyle \begin{aligned} \nu^{n}:=\mathrm{argmin} \{k(\nu)\; : \; \nu\in \widetilde N(\varLambda_{n-1})\}, \quad k(\nu):=\min\{k\; : \; \nu\in \widetilde N(\varLambda_k)\}. \end{aligned}$$
  • Define Λn := Λn−1 ∪{νn}.

Even with such modifications, the convergence of the interpolation error produced by this algorithm is not generally guaranteed. Understanding which additional assumptions on u ensure convergence at some given rate, for a given univariate sequence T such as Leja points, is an open problem.

Remark 4

Another variant of the above algorithms consists in choosing, at iteration k, more than one new index at a time within N(Λk−1) or \( \widetilde N(\varLambda _{k-1})\). In this case, we have nk := #(Λk) ≥ k. For example we may choose the smallest subset of indices that retains a fixed portion of the quantity \(\sum _{\nu \in \varLambda _{k-1}} w_\nu \|\alpha _\nu \|{ }_V\). This type of modification turns out to be particularly relevant in the least-squares setting discussed in the next section.

5.3 Adaptive Selection for Least Squares

In this section we describe adaptive selections in polynomial spaces, for the least-squares methods that have been discussed in Sect. 4. We focus on adaptive selection algorithms based on the standard (unweighted) least-squares method.

As a preliminary observation, it turns out that the most efficient available algorithms for adaptive selection of multi-indices might require the selection of more than one index at a time. Therefore, we adopt the notation that nk := #(Λk) ≥ k, where the index k denotes the iteration in the adaptive algorithm.

As discussed in Sect. 4, stability and accuracy of the least-squares approximation are ensured under suitable conditions between the number of samples and the dimension of the approximation space, see e.g. condition (48). Hence, in the development of reliable iterative algorithms, such conditions need to be satisfied at each iteration. When dμ is the measure (46) with shape parameters θ1, θ2, condition (48) takes the form

$$\displaystyle \begin{aligned} \dfrac{m_k}{\ln m_k} \geq \kappa \ n_k^{s}, \end{aligned} $$
(59)

where mk denotes the number of samples at iteration k, and

$$\displaystyle \begin{aligned} s= \left\{ \begin{array}{ll} \ln 3 /\ln 2, & \quad \mathrm{ if} \ \theta_1=\theta_2=-\frac{1}{2}, \\ {2\max\{\theta_1,\theta_2 \} +2}, & \quad \mathrm{ if} \ \theta_1,\theta_2\in\mathbb{N}_0. \end{array} \right. \end{aligned}$$

Since nk increases with k, the minimal number of samples mk that satisfies (59) has to increase as well at each iteration. At this point, many different strategies can be envisaged for progressively increasing mk such that (59) remains satisfied at each iteration k. For example, one can double the number of samples by choosing mk = 2mk−1 whenever (59) is violated, and keep mk = mk−1 otherwise. The sole prescription for applying Corollary 1 is that the samples are independent and drawn from dμ. Since all the samples at all iterations are drawn from the same measure dμ, at the kth iteration, where mk samples are needed, it is possible to reuse the mk−1 samples from the previous iterations, thus generating only mk − mk−1 new samples.
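
A minimal Python sketch of such a sample-update rule could read as follows; the function draw standing for a sampler of dμ, as well as the values of κ, s and the seed, are illustrative assumptions.

```python
import numpy as np

def update_samples(y, n_k, kappa, s, draw):
    # Keep the current sample and double its size until (59) holds,
    # drawing only the new points; draw(j) returns j i.i.d. samples from dmu.
    m = max(len(y), 2)
    while m / np.log(m) < kappa * n_k ** s:      # condition (59)
        m *= 2                                   # m_k = 2 m_{k-1}
    return np.concatenate([y, draw(m - len(y))]) if m > len(y) else y

rng = np.random.default_rng(0)
y = rng.uniform(-1, 1, 10)
y = update_samples(y, n_k=20, kappa=0.15, s=2.0, draw=lambda j: rng.uniform(-1, 1, j))
print(len(y))                                    # smallest doubling that satisfies (59)
```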

We may now present a first adaptive algorithm based on standard least squares.

  • Initialize \(\varLambda _1:=\{0_{\mathcal {F}}\}\) with the null multi-index.

  • Assuming that Λk−1 has been selected, compute the least-squares approximation

    $$\displaystyle \begin{aligned} u_L=\sum_{\nu \in \varLambda_{k-1} \cup N(\varLambda_{k-1}) } c_\nu \phi_\nu \end{aligned}$$

    of u in \(\mathbb {V}_{\varLambda _{k-1}\cup N(\varLambda _{k-1})}\), using a number of samples mk that satisfies condition (59) with nk = #(Λk−1 ∪ N(Λk−1)).

  • Set

    $$\displaystyle \begin{aligned} \nu^k:=\mathop{\text{argmax}}_{ \nu\in N(\varLambda_{k-1})} | c_\nu |{}^2. {} \end{aligned} $$
    (60)
  • Define Λk := Λk−1 ∪{νk}.

Similarly to the previously discussed interpolation algorithms, in the case of infinitely many variables (yj)j ≥ 1 the set N(Λk) is infinite and should be replaced by the finite set of anchored neighbors \(\widetilde N(\varLambda _k)\) defined by (55). As for interpolation, we may define a more conservative version of this algorithm in order to ensure that all of \(\mathcal {F}\) is explored. For example, when k is even, we define νk according to (60), and when k is odd we pick for νk the multi-index \(\nu \in \widetilde N(\varLambda _{k-1})\) which has appeared at the earliest stage in the neighbors of the previous sets. The resulting algorithm is very similar to the one presented for interpolation, with obvious modifications due to the use of least squares.

As announced at the beginning of this section, it can be advantageous to select more than one index at a time from \(\widetilde {N}(\varLambda _{k-1})\) at each iteration k of the adaptive algorithm. For describing the multiple selection of indices from \(\widetilde {N}(\varLambda _{k-1})\), we introduce the so-called bulk chasing procedure. Given a finite set \(R \subseteq \widetilde {N}(\varLambda _{k-1})\), a nonnegative function \(\mathcal {E}:R\to \mathbb {R}\) and a parameter α ∈ (0, 1], we define the procedure \(\mathrm{bulk}(R,\mathcal {E},\alpha )\) that computes a set F ⊆ R of minimal positive cardinality such that

$$\displaystyle \begin{aligned} \sum_{\nu \in F} \mathcal{E}(\nu) \geq \alpha \sum_{\nu \in R} \mathcal{E}(\nu). \end{aligned}$$
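
A minimal Python sketch of this bulk procedure sorts the scores in decreasing order and keeps the smallest prefix reaching the fraction α of the total score (which indeed has minimal cardinality, since the k largest scores maximize the sum over all subsets of size k); the example scores are illustrative.

```python
def bulk(R, E, alpha):
    total = sum(E(nu) for nu in R)
    F, acc = set(), 0.0
    for nu in sorted(R, key=E, reverse=True):    # largest scores first
        F.add(nu); acc += E(nu)
        if acc >= alpha * total:
            break
    return F

# Example with E = E_L, i.e. E(nu) = |c_nu|^2 for computed coefficients c_nu.
c = {(1, 0): 0.9, (0, 1): 0.4, (2, 0): 0.1}
print(bulk(c.keys(), lambda nu: c[nu] ** 2, 0.8))   # {(1, 0)}
```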

A possible choice for the function \(\mathcal {E}\) is

$$\displaystyle \begin{aligned} \mathcal{E}(\nu)=\mathcal{E}_L(\nu):=|c_\nu|{}^2, \quad \nu \in R, \end{aligned}$$

where cν is given from an available least-squares estimator

$$\displaystyle \begin{aligned} u_L=\sum_{\nu \in \varLambda} c_\nu \phi_\nu, \end{aligned}$$

that has been already computed on any downward closed set \(R \subset \varLambda \subseteq \varLambda _{k-1}\cup \widetilde {N}(\varLambda _{k-1})\). Another choice for \(\mathcal {E}\) is

$$\displaystyle \begin{aligned} \mathcal{E}(\nu)=\mathcal{E}_M( \nu):= |\langle \phi_{ \nu}, u - \widetilde{u}_L \rangle_{m_{k-1}}|, \quad \nu \in R, \end{aligned}$$

where \(\widetilde {u}_L\) is the truncation to Λk−1 of a least-squares estimator uL =∑ν ∈ Λ cν ϕν that has already been computed on any downward closed set \(\varLambda _{k-1} \subset \varLambda \subseteq \varLambda _{k-1} \cup \widetilde {N}(\varLambda _{k-1})\), using a number of samples mk−1 that satisfies condition (59) with nk = #(Λ). The discrete inner product \(\langle \cdot ,\cdot \rangle _{m_{k-1}}\), defined in analogy with the seminorm \(\|\cdot \|{ }_{m_{k-1}}\), uses the same mk−1 evaluations of u that have been used to compute the least-squares approximation uL on Λ.

Both \(\mathcal {E}_L(\nu )\) and \(\mathcal {E}_M(\nu )\) should be viewed as estimators of the magnitude of the coefficient 〈u, ϕν〉. The estimator \(\mathcal {E}_M(\nu )\) is of Monte Carlo type and computationally cheap. Combined use of the two estimators leads to the next algorithm for greedy selection with bulk chasing, which has been proposed in [26].

  • Initialize \(\varLambda _1:=\{0_{\mathcal {F}}\}\) with the null multi-index, and choose α1, α2 ∈ (0, 1].

  • Assuming that Λk−1 has been selected, set

    $$\displaystyle \begin{aligned} F_1=\mathrm{bulk}(\widetilde{N}(\varLambda_{k-1}), \mathcal{E}_M ,\alpha_1 ), {} \end{aligned} $$
    (61)

    where \(\mathcal {E}_M\) uses the least-squares approximation uL =∑ν ∈ Λ cν ϕν of u in \(\mathbb {V}_{\varLambda }\) that has been calculated at iteration k − 1 on a downward closed set \( \varLambda _{k-1} \subset \varLambda \subseteq \varLambda _{k-1} \cup \widetilde {N}(\varLambda _{k-1})\) using a number of samples mk−1 that satisfies (59) with nk = #(Λ).

  • Compute the least-squares approximation

    $$\displaystyle \begin{aligned} u_L=\sum_{\nu \in \varLambda_{k-1} \cup F_1} c_\nu \phi_\nu {} \end{aligned} $$
    (62)

    of u on \(\mathbb {V}_{\varLambda _{k-1} \cup F_1}\) using a number of samples mk that satisfies (59) with nk = #(Λk−1 ∪ F1).

  • Set

    $$\displaystyle \begin{aligned} F_2=\mathrm{bulk}(F_1, \mathcal{E}_L ,\alpha_2 ), {} \end{aligned} $$
    (63)

    where \(\mathcal {E}_L\) uses the least-squares approximation uL computed on Λk−1 ∪ F1.

  • Define Λk = Λk−1 ∪ F2.

The set \(\widetilde {N}(\varLambda _{k-1})\) can be large, and might contain many indices that are associated to small coefficients. Discarding these indices is important in order to avoid unnecessary computational burden in the calculation of the least-squares approximation. The purpose of the bulk procedure (61) is to perform a preliminary selection of a set \(F_1 \subseteq \widetilde {N}(\varLambda _{k-1})\) of indices, using the cheap estimator \(\mathcal {E}_M\). At iteration k, \(\mathcal {E}_M\) in (61) uses the estimator computed in (62) at iteration k − 1 and truncated to Λk−1. Afterwards, at iteration k, the least-squares approximation in (62) is calculated on Λk−1 ∪ F1, using a number of samples mk which satisfies condition (59), with nk = #(Λk−1 ∪ F1). The second bulk procedure (63) selects a set F2 of indices from F1, using the more accurate estimator \(\mathcal {E}_L\). The convergence rate of the adaptive algorithm depends on the values given to the parameters α1 and α2.

Finally we mention some open issues related to the development of adaptive algorithms using the weighted least-squares methods discussed in Sect. 4, instead of standard least squares. In principle the same algorithms described above can be used with the weighted least-squares estimator uW replacing the standard least-squares estimator uL, provided that, at each iteration k, the number of samples mk satisfies

$$\displaystyle \begin{aligned} \dfrac{m_k}{\ln m_k} \geq \kappa \ n_k, \end{aligned}$$

and that the samples are drawn from the optimal measure, see Theorem 13. This ensures that at each iteration k of the adaptive algorithm, the weighted least-squares approximation remains stable and accurate. However, stability and accuracy are no longer guaranteed if the above conditions are not met, for example when the samples from previous iterations are recycled.

5.4 Approximation in Downward Closed Spaces: Beyond Polynomials

The concept of downward closed approximation spaces can be generalized beyond the polynomial setting. We start from a countable index set S equipped with a partial order ≤, and assume that there exists a root index 0S ∈ S such that 0S ≤ σ for all σ ∈ S. We assume that (Bσ)σ ∈ S is a basis of functions defined on [−1, 1] such that \(B_{0_S} \equiv 1\). We then define by tensorization a basis of functions on U = [−1, 1]d when d < ∞, or \(U=[-1,1]^{\mathbb {N}}\) in the case of infinitely many variables, according to

$$\displaystyle \begin{aligned} B_\nu(y)=\prod_{j\,{\geq}\,1} B_{\nu_j} (y_j), \quad \nu:=(\nu_j)_{j\,{\geq}\,1}\in \mathcal{F}, \end{aligned}$$

where \(\mathcal {F}:=S^d\) in the case d < ∞, or \(\mathcal {F}=\ell ^0({\mathbb {N}},S)\), i.e. the set of finitely supported sequences, in the case d = ∞.

The set \(\mathcal {F}\) is equipped with a partial order induced by its univariate counterpart: \(\nu \leq \widetilde \nu \) if and only if \(\nu _j\leq \widetilde \nu _j\) for all j ≥ 1. We may then define downward closed sets \(\varLambda \subset \mathcal {F}\) in the same way as in Definition 1 which corresponds to the particular case \(S=\mathbb {N}\). We then define the associated downward closed approximation space by

$$\displaystyle \begin{aligned} \mathbb{V}_\varLambda:=V\otimes \mathbb{B}_\varLambda, \quad \mathbb{B}_\varLambda:=\mathrm{span}\{B_\nu\, : \, \nu\in \varLambda\}, \end{aligned}$$

that is, the space of functions of the form ∑ν ∈ Λ uν Bν with uν ∈ V .

Given a sequence T = (tσ)σ ∈ S of pairwise distinct points, we say that the basis (Bσ)σ ∈ S is hierarchical when it satisfies

$$\displaystyle \begin{aligned} B_\sigma(t_\sigma)=1 \ \mathrm{and} \ B_\sigma(t_{\widetilde\sigma})=0 \ \mathrm{if} \ \widetilde \sigma \leq \sigma \ \mathrm{and} \ \widetilde \sigma \neq \sigma. \end{aligned}$$

We also define the tensorized grid

$$\displaystyle \begin{aligned}y_\nu:=(t_{\nu_j})_{j\,{\geq}\,1}\in U. \end{aligned}$$

Then, if \(\varLambda \subset \mathcal {F}\) is a downward closed set, we may define an interpolation operator IΛ onto \(\mathbb {V}_\varLambda \) associated to the grid

$$\displaystyle \begin{aligned} \varGamma_\varLambda:=\{y_\nu \; : \; \nu\in \varLambda\}. \end{aligned}$$

In a similar manner as in the polynomial case, this operator is defined inductively by

$$\displaystyle \begin{aligned} I_\varLambda u:=I_{\,\widetilde \varLambda\,} u+\alpha_{\nu} B_\nu,\quad \alpha_\nu:=\alpha_\nu(u)=u(y_\nu)-I_{\,\widetilde \varLambda\,} u(y_\nu), \end{aligned}$$

where \(\nu \notin \widetilde \varLambda \) and \(\widetilde \varLambda \) is any downward closed set such that \(\varLambda =\widetilde \varLambda \cup \{\nu \}\). We initialize this computation with \(\varLambda _1=\{0_{\mathcal {F}}\}\), where \(0_{\mathcal {F}}\) is the null multi-index, by defining \(I_{\varLambda _1}u\) as the constant function with value \(u(y_{0_{\mathcal {F}}})\).

Examples of relevant hierarchical systems include the classical piecewise linear hierarchical basis functions. In this case the set S is defined by

$$\displaystyle \begin{aligned} S = \{\lambda_{-1},\lambda_1,(0,0)\} \cup \left\{(j,k)\; : \; -2^{j-1}\leq k\leq 2^{j-1}-1,\; j=1,2,\dots \right\} \end{aligned}$$

equipped with the partial order λ−1 ≤ λ1 ≤ (0, 0) and

$$\displaystyle \begin{aligned} (j,k)\leq (j+1,2k), \quad \quad (j,k)\leq (j+1,2k+1), \quad \quad (j,k) \in S. \end{aligned}$$

The set S is thus a binary tree where λ−1 is the root node, (0, 0) is a child of λ1 which is itself a child of λ−1, every node (j, k) has two children (j + 1, 2k) and (j + 1, 2k + 1), and the relation \(\widetilde \lambda \leq \lambda \) means that \(\widetilde \lambda \) is an ancestor of λ. The index j corresponds to the level of refinement, i.e. the depth of the node in the binary tree. We associate with S the sequence

$$\displaystyle \begin{aligned} T := \{t_{\lambda_{-1}},t_{\lambda_1},t_{(0,0)}\} \cup \left\{t_{(j,k)} := \frac {2k+1}{2^j}: (j,k)\in S, j\geq1 \right\} , \end{aligned}$$

where \(t_{\lambda _{-1}}=-1\), \(t_{\lambda _1}=1\) and t(0,0) = 0. The hierarchical basis of piecewise linear functions defined over [−1, 1] is then given by

$$\displaystyle \begin{aligned} B_{\lambda_{-1}} \equiv 1,\quad B_{\lambda_{1}} (t) = \frac {1+t}{2},\quad B_{(j,k)} (t) = H(2^j(t-t_{(j,k)})),\quad (j,k) \in S, \end{aligned}$$

where

$$\displaystyle \begin{aligned} {H}(t):=\max\{0,1-|t|\}, \end{aligned}$$

is the usual hat function. In dimension d = 1, the hierarchical interpolation amounts to the following steps: start by approximating f with the constant function equal to f(−1); then with the affine function that coincides with f at −1 and 1; then with the piecewise affine function that coincides with f at −1, 0 and 1; afterwards, refine the approximation in further steps by interpolating f at the midpoint of an interval between two adjacent interpolation points.
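
The following Python sketch carries out this one-dimensional hierarchical interpolation up to a level J, computing each new coefficient as the residual of the current interpolant at the new dyadic node; the target function f and the level J are illustrative choices.

```python
import numpy as np

hat = lambda t: np.maximum(0.0, 1.0 - np.abs(t))              # hat function H

def hierarchical_interp(f, J):
    terms = [(lambda x: np.ones_like(x), f(-1.0))]            # B_{lambda_{-1}} = 1
    terms.append((lambda x: (1.0 + x) / 2.0, f(1.0) - f(-1.0)))  # B_{lambda_1}
    I = lambda x: sum(a * b(x) for b, a in terms)             # current interpolant
    terms.append((hat, f(0.0) - I(np.array(0.0))))            # node (0, 0), t = 0
    for j in range(1, J + 1):
        new = []
        for k in range(-2 ** (j - 1), 2 ** (j - 1)):
            t_jk = (2 * k + 1) / 2 ** j                       # t_{(j,k)} = (2k+1)/2^j
            basis = lambda x, tj=t_jk, lev=j: hat(2.0 ** lev * (x - tj))
            new.append((basis, f(t_jk) - I(np.array(t_jk))))  # alpha_{(j,k)}
        terms += new                                          # commit the whole level j
    return I

f = lambda x: np.sin(np.pi * x)
I4 = hierarchical_interp(f, 4)
x = np.linspace(-1.0, 1.0, 1001)
print(np.max(np.abs(I4(x) - f(x))))                           # error decays like 2^{-2J}
```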

Other relevant examples include piecewise polynomial hierarchical basis functions and more general interpolatory wavelets; see [10] for a survey.