1 Introduction

Many problems in science and engineering are defined on the tensor product of L domains Ω 1×Ω 2×⋯×Ω L , where \(\varOmega_{i}\subset\mathbb{R}^{n_{i}}\). Already the simple situation L=2 covers a large number of problems. This includes, for instance, radiosity models and radiative transfer [25], where Ω 1 denotes the spatial three-dimensional domain of a geometric object under consideration and Ω 2 is the sphere \(\mathbb{S}^{2}\). Then there are the parabolic problems, where Ω 1 is the time interval and Ω 2 is the spatial domain [5, 13, 15, 22], or various phase space problems, e.g., the Boltzmann equation, kinetic equations, or the Langevin equation [2], where both Ω 1 and Ω 2 are three-dimensional cubes or the full three-dimensional real space.

Even more applications can be found in the case of general L>2. For example, non-Newtonian flow can be modeled by a coupled system which consists of the Navier–Stokes equation for the flow in a three-dimensional geometry described by Ω 1 and of the Fokker–Planck equation in the 3(L−1)-dimensional configuration space on Ω 2×Ω 3×⋯×Ω L , where each domain Ω i (i≥2) is a sphere. Here, L denotes the number of atoms in a chain-like molecule which causes the non-Newtonian behavior of the flow; for details, see [3, 20, 21]. Also, the L-th moment of the solution of linear elliptic boundary value problems with stochastic source terms is known to satisfy a deterministic partial differential equation with the L-fold tensor product of the elliptic operator on the L-fold product of the physical domain [23, 24]. This approach extends to stochastic diffusion problems and to more general partial differential equations with stochastic coefficient functions or with stochastic domains [16–18]. Another example is multiscale homogenization. After unfolding [1, 6, 7], it gives rise to the product of the macroscopic physical domain with the L−1 periodic microscopic domains which correspond to the L−1 different microscales [19]. Finally, we find the product of L domains in quantum mechanics, e.g., for the electronic Schrödinger equation, where each electron has its associated three-dimensional domain, see, e.g., [10, 26, 27].

A naive, conventional discretization would use tensor products of all basis functions from suitable finite dimensional ansatz spaces \(V_{J}^{(i)}\), i=1,2,…,L, which are defined on each domain separately. This leads to the full tensor product space \(V_{J}^{(1)}\otimes V_{J}^{(2)} \otimes\cdots\otimes V_{J}^{(L)}\). In general, however, the full tensor product space contains far too many degrees of freedom, so that realistic simulations are beyond current computing capacities. This reflects the well-known curse of dimension: the number of degrees of freedom necessary to obtain a prescribed accuracy grows exponentially with the dimension.

To overcome the curse of dimension, we will focus in this paper on the construction of sparse tensor product spaces, also known as sparse grids [4, 28]. The starting point is the multilevel decomposition of the ansatz spaces

$$V_J^{(i)} = W_0^{(i)}\oplus W_1^{(i)}\oplus\cdots \oplus W_J^{(i)}, \quad i=1,2,\ldots,L, $$

which can be constructed via hierarchical bases, interpolets, or wavelet-like bases. From this, the regular sparse tensor product space is defined according to

$$ \widehat{V}_J^{\mathrm{reg}} = \bigoplus _{j_1+j_2+\cdots+j_L\le J} W_{j_1}^{(1)}\otimes W_{j_2}^{(2)}\otimes\cdots\otimes W_{j_L}^{(L)}, $$
(1.1)

see, e.g., [4, 12, 14, 28]. Its approximation power is nearly as good as that of the corresponding full tensor product space, provided that the functions to be approximated possess additional smoothness in terms of bounded mixed derivatives. Note here that the regular sparse tensor product space (1.1) is optimal if the approximation error is measured in the L 2-norm and the properties of the ansatz spaces \(V_{J}^{(i)}\) and \(W^{(i)}_{j_{i}}\), respectively, do not depend on i. This means that all the domains Ω i have the same dimension and all the spaces \(V_{J}^{(i)}\) and \(W^{(i)}_{j_{i}}\), respectively, are equipped with the same type of ansatz functions.

To also cover the more general cases, we will introduce in the following the special sparse tensor product space

$$ \widehat{\mathbf{V}}_J^{\boldsymbol{\alpha}} := \bigoplus _{\boldsymbol{\alpha}^T\mathbf{j}\le J} W_{j_1}^{(1)}\otimes W_{j_2}^{(2)}\otimes\cdots\otimes W_{j_L}^{(L)} $$

for an arbitrary vector α=(α 1,α 2,…,α L )>0 and \(\mathbf{j}=(j_{1},j_{2},\ldots,j_{L})\in\mathbb{N}_{0}^{L}\), and we will discuss various choices of α for general situations with different local dimensions and regularities of the L domains.
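As a computational illustration (not part of the original exposition; all parameter values below are chosen arbitrarily), the following Python sketch enumerates the index set \(\{\mathbf{j}\in\mathbb{N}_{0}^{L}:\boldsymbol{\alpha}^{T}\mathbf{j}\le J\}\) underlying \(\widehat{\mathbf{V}}_{J}^{\boldsymbol{\alpha}}\) and compares the resulting number of degrees of freedom with that of the full tensor product space. Here, \(\dim W_{j_i}^{(i)} = 2^{j_i n_i}\) serves as an exact stand-in for the asymptotic relation \(\dim W_{j_i}^{(i)}\sim 2^{j_i n_i}\); the choice α=(1,1,…,1) reproduces the regular sparse tensor product space (1.1).

```python
# Sketch: count degrees of freedom of the sparse tensor product space versus the
# full tensor product space (illustrative parameters; dim W_{j_i}^{(i)} = 2^{j_i n_i}
# is used as a stand-in for the asymptotic behaviour).
from itertools import product

def sparse_indices(alpha, J):
    """All multi-indices j >= 0 with alpha^T j <= J."""
    ranges = [range(int(J / a) + 1) for a in alpha]
    return [j for j in product(*ranges)
            if sum(a * ji for a, ji in zip(alpha, j)) <= J]

def dof(indices, n):
    """Degrees of freedom of the direct sum over the given multi-indices."""
    return sum(2 ** sum(ji * ni for ji, ni in zip(j, n)) for j in indices)

n, J = [1, 1, 1], 10
alpha = [1, 1, 1]                                   # regular sparse space (1.1)
sparse = dof(sparse_indices(alpha, J), n)
full = dof(list(product(range(J + 1), repeat=len(n))), n)
print(sparse, full)                                 # ~ 2^J J^2 versus ~ 2^{3J}
```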

For the case L=2, we already studied systematically in [11] which construction of sparse tensor product spaces is most efficient if the spatial dimensions of the underlying domains or the polynomial exactnesses (and thus the approximation powers) of the ansatz spaces differ. In this paper, we extend these results to L-fold sparse tensor product spaces. It will turn out that, in the case of smooth functions, there is a whole range of sparse tensor product spaces which possess the same optimal convergence rate. In the case of functions with limited regular or mixed Sobolev smoothness, however, the sparse tensor product space which equilibrates the number of degrees of freedom (see Sect. 3) is superior to all the other sparse tensor product spaces under consideration. We recently learned that the situation L≥2 has also been considered in [9]. There, different approximation powers in the coordinate directions were taken into account as well, but the underlying domains were only of equal dimension n 1=n 2=⋯=n L =1.

The remainder of this paper is organized as follows. In Sect. 2, we specify the requirements of the multiscale hierarchies on each subdomain. Then, in Sect. 3, we construct general sparse tensor product spaces. In Sect. 4, we study their properties. Section 5 is dedicated to the comparison of the cost complexities for the approximation of functions of mixed Sobolev smoothness, where also different degrees of smoothness are possible in the different coordinate directions. Finally, in Sect. 6, we give some concluding remarks.

Throughout this paper, the notion “essentially” in the context of complexity estimates means “up to logarithmic terms.” Moreover, to avoid the repeated use of generic but unspecified constants, we signify by \(C\lesssim D\) that C is bounded by a multiple of D independently of parameters on which C and D may depend. Obviously, \(C\gtrsim D\) is defined as \(D\lesssim C\), and \(C\sim D\) as \(C\lesssim D\) and \(C\gtrsim D\).

2 Approximation on a Subdomain

Let \(\varOmega\subset\mathbb{R}^{n}\) be a sufficiently smooth, bounded domain. We consider a nested sequence of finite dimensional spaces

$$ V_0 \subset V_1 \subset\cdots\subset V_j\subset\cdots\subset L^2( \varOmega), $$
(2.1)

each of which consists of piecewise polynomial ansatz functions \(V_{j} = \operatorname{span}\{\varphi_{j,k}:k\in\varDelta_{j}\}\) (Δ j denotes a suitable index set) such that \(\dim V_{j}\sim 2^{jn}\) and

$$ L^2(\varOmega) = \overline{\bigcup _{j\in\mathbb{N}_0}V_j}. $$
(2.2)

We will use the spaces V j for the approximation of functions. To this end, we assume that the approximation property

$$ \inf_{v_j\in V_j}\|u-v_j \|_{H^q(\varOmega)} \lesssim h_j^{s-q}\|u\|_{H^s(\varOmega)}, \quad u\in H^s(\varOmega), $$
(2.3)

holds for q<γ and q≤s≤r uniformly in j. Here we set \(h_{j} := 2^{-j}\), i.e., h j corresponds to the width of the mesh associated with the subspace V j on Ω. The parameter γ>0 refers to the regularity of the functions which are contained in V j , i.e.,

$$\gamma := \sup\bigl\{s\in\mathbb{R}: V_j\subset H^s( \varOmega)\bigr\}. $$

The integer r>0 refers to the polynomial exactness, that is, the maximal order of polynomials which are locally contained in the space V j .

We now introduce a wavelet basis associated with the multiscale analysis (2.1) and (2.2) as follows. The wavelets Ψ j :={ψ j,k :k∈∇ j }, where \(\nabla_{j} := \varDelta_{j}\setminus\varDelta_{j-1}\), are the bases of the complementary spaces W j of V j−1 in V j , i.e.,

$$V_j = V_{j-1}\oplus W_j, \qquad V_{j-1}\cap W_j = \{0\}, \qquad W_j = \operatorname {span}\{ \varPsi_j\}. $$

Recursively, we obtain

$$V_J = \bigoplus_{j=0}^J W_j,\qquad W_0 := V_0, $$

and thus, with

$$\varPsi_J := \bigcup_{j=0}^J \varPsi_j, \qquad \varPsi_0 := \{ \varphi_{0,k}:k\in\varDelta_0\}, $$

we get a wavelet basis of V J . A final requirement is that the infinite collection Ψ:=⋃ j≥0 Ψ j forms a Riesz basis of L 2(Ω). Then there also exists a dual (or biorthogonal) wavelet basis \(\widetilde{\varPsi}= \bigcup_{j\ge 0}\widetilde{\varPsi}_{j}\), \(\widetilde{\varPsi}_{j} = \{\widetilde{\psi}_{j,k}: k\in\nabla_{j}\}\), which defines a dual multiscale analysis of regularity \(\widetilde{\gamma}>0\), see, e.g., [8] for further details. In particular, each function \(f\in L^{2}(\varOmega)\) admits the unique representation

$$ f = \sum_{j=0}^\infty \sum_{k\in\nabla_j} (f,\widetilde{\psi}_{j,k})_{L^2(\varOmega)} \psi_{j,k}. $$
(2.4)

With the definition of the projections

$$Q_j:L^2(\varOmega)\to W_j,\qquad Q_jf = \sum_{k\in\nabla_j} (f,\widetilde{\psi}_{j,k})_{L^2(\varOmega)}\psi_{j,k}, $$

the atomic decomposition (2.4) gives rise to the multilevel decomposition

$$f = \sum_{j=0}^\infty Q_jf. $$

Since, for all \(-\gamma<q<\widetilde{\gamma}\), a properly scaled version of the wavelet basis Ψ is also a Riesz basis of H q(Ω), we have in particular

$$ \|f\|_{H^q(\varOmega)}^2\sim\sum _{j=0}^\infty\|Q_j f\|_{H^q(\varOmega)}^2, \quad-\gamma<q<\widetilde{\gamma}, $$
(2.5)

see [8]. Finally, for any \(f\in H^{s}(\varOmega)\), the approximation property (2.3) induces the estimate

$$ \|Q_jf\|_{H^q(\varOmega)} \lesssim 2^{-j(s-q)}\|f\|_{H^s(\varOmega)},\quad q\le s\le r. $$
(2.6)
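To make the multilevel decomposition (2.4) and the estimate (2.6) concrete, the following minimal sketch uses the Haar system on [0,1) (an assumption made purely for illustration; the framework above admits general wavelet-like bases) and realizes the projections P j onto V j by cell averaging of point samples on the finest level J. The contributions Q j f = P j f − P j−1 f telescope to the finest-level approximation, and their norms decay roughly by a factor of two per level once f is resolved, in accordance with (2.6) for the Haar case r=1 (take s=1 and q=0).

```python
# Minimal sketch of the multilevel decomposition f = sum_j Q_j f for the Haar system
# on [0,1): V_j are the piecewise constants on 2^j cells, P_j is realized by cell
# averaging of point samples on the finest level J. All parameters are illustrative.
import numpy as np

def P(samples, level):
    """Approximate L2 projection onto V_level: average down to 2**level cells."""
    return samples.reshape(2**level, -1).mean(axis=1)

def prolongate(coarse, J):
    """Represent a level-j piecewise constant function on the level-J grid."""
    return np.repeat(coarse, 2**J // coarse.size)

J = 6
x = (np.arange(2**J) + 0.5) / 2**J           # midpoints of the finest cells
f = np.sin(2 * np.pi * x) + 0.3 * x**2       # sample function, identified with P_J f

parts = [prolongate(P(f, 0), J)]             # Q_0 f := P_0 f
for j in range(1, J + 1):                    # Q_j f = P_j f - P_{j-1} f
    parts.append(prolongate(P(f, j), J) - prolongate(P(f, j - 1), J))

print(np.allclose(sum(parts), f))            # True: the contributions telescope to P_J f
# discrete L2 norms of Q_j f: roughly halving per level once f is resolved,
# cf. (2.6) with s = 1, q = 0 for the Haar case r = 1
print([round(float(np.linalg.norm(p) / 2**(J / 2)), 4) for p in parts])
```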

3 Sparse Tensor Product Spaces

Consider now L domains \(\varOmega_{i}\subset\mathbb{R}^{n_{i}}\) with n i ∈ℕ for all i=1,2,…,L. We aim for the approximation of functions in anisotropic Sobolev spaces

$$\mathbf{H}^{\mathbf{s}}(\boldsymbol{\varOmega}) := H^{s_1}( \varOmega_1)\otimes H^{s_2}(\varOmega_2)\otimes\cdots \otimes H^{s_L}(\varOmega_L), $$

where Ω:=Ω 1×Ω 2×⋯×Ω L . To this end, we assume that, for each subdomain Ω i , i=1,2,…,L, we individually have the multiscale analysis

$$V_0^{(i)}\subset V_1^{(i)}\subset V_2^{(i)}\subset\cdots \subset L^2( \varOmega_i),\qquad V_0^{(i)} = \operatorname {span}\bigl\{ \varPhi_0^{(i)}\bigr\},\quad i=1,2,\ldots,L, $$

with associated complementary spaces

$$V_j^{(i)} = V_{j-1}^{(i)}\oplus W_j^{(i)}, \qquad V_{j-1}^{(i)}\cap W_j^{(i)} = \{0\}, \qquad W_j^{(i)} = \operatorname {span}\bigl\{\varPsi_j^{(i)}\bigr\}. $$

Furthermore, for all i=1,2,…,L, let us denote the polynomial exactnesses of the spaces \(V_{j}^{(i)}\) by r i , and the regularity of the primal and dual wavelet bases by γ i and \(\widetilde{\gamma}_{i}\), respectively.

In this paper, we study the approximation of functions in the special sparse tensor product space

$$ \widehat{\mathbf{V}}_J^{\boldsymbol{\alpha}} := \bigoplus_{\boldsymbol{\alpha}^T\mathbf{j}\le J} W_{j_1}^{(1)} \otimes W_{j_2}^{(2)}\otimes\cdots\otimes W_{j_L}^{(L)} $$
(3.1)

for an arbitrary vector α=(α 1,α 2,…,α L )>0 and \(\mathbf{j}=(j_{1},j_{2},\ldots,j_{L})\in\mathbb{N}_{0}^{L}\). Here and subsequently, inequalities for vectors are to be understood componentwise, i.e., α>0 means α i >0 for all i=1,2,…,L. Moreover, we set r=(r 1,r 2,…,r L ), γ=(γ 1,γ 2,…,γ L ), and \(\widetilde{\boldsymbol{\gamma}}= (\widetilde{\gamma}_{1}, \widetilde{\gamma}_{2},\ldots,\widetilde{\gamma}_{L})\).

Now we can formulate the following problem: How must α>0 be chosen such that, for given parameters \(-\widetilde{\boldsymbol{\gamma}}<\mathbf{q}< \boldsymbol{\gamma}\) and q<s≤r, a function \(f\in\mathbf{H}^{\mathbf{s}}(\boldsymbol{\varOmega})\) can be best approximated in \(\widehat{\mathbf{V}}_{J}^{\boldsymbol{\alpha}}\) in the H q(Ω)-norm?

Natural choices of the parameter α>0 are:

(i)

    To equilibrate the accuracy in the involved univariate spaces \(V_{J/\alpha_{i}}^{(i)}\), i=1,2,…,L, we obtain the condition

    $$2^{-J(r_1-q_1)/\alpha_1} = 2^{-J(r_2-q_2)/\alpha_2} = \cdots = 2^{-J(r_L-q_L)/\alpha_L}. $$

    This means that we have to choose α i =r i −q i for all i=1,2,…,L.

(ii)

    To equilibrate the number of degrees of freedom in the involved univariate spaces \(V_{J/\alpha_{i}}^{(i)}\), i=1,2,…,L, we obtain the condition

    $$2^{Jn_1/\alpha_1} = 2^{Jn_2/\alpha_2} = \cdots = 2^{Jn_L/\alpha_L}. $$

    This condition is satisfied if α i =n i for all i=1,2,…,L.

(iii)

    Following the idea of an equilibrated cost-benefit rate (see [4]), we get the condition

    $$2^{j_1(n_1+r_1-q_1)}\cdot 2^{j_2(n_2+r_2-q_2)}\cdots 2^{j_L(n_L+r_L-q_L)} = 2^{J\cdot const.} \quad\text{for all $\boldsymbol{\alpha}^{T}\mathbf{j} = J$}. $$

    For const.=1, we find α i =n i +r i −q i for all i=1,2,…,L.

In the next section, we will compute for these and other choices the cost and the error of the approximation in the related sparse tensor product spaces.
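The three choices can also be illustrated numerically. The following sketch (with purely hypothetical parameters n i , r i , q i ) forms the corresponding weight vectors α and verifies the equilibration conditions of (i) and (ii): in each case, the printed set contains a single value.

```python
# Sketch: the three canonical choices of alpha for hypothetical parameters.
n = [3, 2, 1]          # local dimensions n_i          (illustrative)
r = [2, 4, 2]          # polynomial exactnesses r_i    (illustrative)
q = [0, 1, 0]          # target Sobolev indices q_i    (illustrative)
J = 12.0

alpha_acc = [ri - qi for ri, qi in zip(r, q)]                  # (i)   accuracy
alpha_dof = list(n)                                            # (ii)  degrees of freedom
alpha_cb  = [ni + ri - qi for ni, ri, qi in zip(n, r, q)]      # (iii) cost-benefit

# (i): the univariate accuracies 2^{-J(r_i-q_i)/alpha_i} coincide
print({2 ** (-J * (ri - qi) / ai) for ri, qi, ai in zip(r, q, alpha_acc)})
# (ii): the univariate numbers of unknowns 2^{J n_i/alpha_i} coincide
print({2 ** (J * ni / ai) for ni, ai in zip(n, alpha_dof)})
print(alpha_acc, alpha_dof, alpha_cb)
```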

4 Properties of the Sparse Tensor Product Spaces

To compute the convergence rate of functions in the sparse tensor product spaces \(\widehat{\mathbf{V}}_{J}^{\boldsymbol{\alpha}}\) with arbitrary α>0, we first count the degrees of freedom which are contained in these spaces.

Theorem 4.1

For any α>0, the dimension of the sparse tensor product space

$$\widehat{\mathbf{V}}_J^{\boldsymbol{\alpha}} = \bigoplus _{\boldsymbol{\alpha}^T\mathbf{j}\le J} W_{j_1}^{(1)}\otimes W_{j_2}^{(2)}\otimes\cdots\otimes W_{j_L}^{(L)} $$

is proportional to \(2^{J\max\{n_{1}/\alpha_{1},n_{2}/\alpha_{2},\ldots, n_{L}/\alpha_{L}\}} J^{R-1}\), where R counts how often the maximum is attained.

Proof

We shall use induction to prove this theorem. Consider first the case L=1. Due to \(\dim W_{j_{1}}^{(1)}\sim 2^{j_{1}n_{1}}\), it holds that

$$\dim\widehat{\mathbf{V}}_J^{\boldsymbol{\alpha}}\sim\sum _{j_1=0}^{J/\alpha_1} 2^{j_1n_1} \lesssim 2^{J n_1/\alpha_1}. $$

Since R=1, the desired result follows immediately.

We now show the induction step L→L+1. To this end, we denote the (L+1)-fold sparse tensor product space by

$$ \widehat{\mathbf{V}}_J^{(\boldsymbol{\alpha},\alpha_{L+1})} = \bigoplus_{\boldsymbol{\alpha}^T\mathbf{j}+\alpha_{L+1} j_{L+1}\le J} W_{j_1}^{(1)} \otimes\cdots\otimes W_{j_L}^{(L)}\otimes W_{j_{L+1}}^{(L+1)}, $$
(4.1)

where, without loss of generality, we will assume

$$ \frac{n_1}{\alpha_1} = \cdots = \frac{n_R}{\alpha_R} > \frac{n_{R+1}}{\alpha_{R+1}} \ge\frac{n_{R+2}}{\alpha_{R+2}}\ge\cdots\ge\frac{n_L}{\alpha_L} \quad \text{and}\quad\frac{n_L}{\alpha_L}\ge\frac{n_{L+1}}{\alpha_{L+1}}. $$
(4.2)

Obviously, there holds the identity

$$ \widehat{\mathbf{V}}_J^{(\boldsymbol{\alpha},\alpha_{L+1})} = \bigoplus_{j_{L+1}=0}^{J/\alpha_{L+1}} \widehat{\mathbf{V}}_{J-\alpha_{L+1}j_{L+1}}^{\boldsymbol{\alpha}}\otimes W_{j_{L+1}}^{(L+1)}. $$
(4.3)

According to our induction hypothesis, the sparse tensor product space \(\widehat{\mathbf{V}}_{J-\alpha_{L+1}j_{L+1}}^{\boldsymbol{\alpha}}\) consists of \(\mathcal{O} (2^{(J-\alpha_{L+1}j_{L+1})n_{1}/\alpha_{1}} (J-\alpha_{L+1}j_{L+1})^{R-1} )\) degrees of freedom. Hence, since \(\dim W^{(L+1)}_{j_{L+1}} \sim 2^{j_{L+1}n_{L+1}}\), we get the estimate

$$\dim\widehat{\mathbf{V}}_J^{(\boldsymbol{\alpha},\alpha_{L+1})} \lesssim\sum_{j_{L+1}=0}^{J/\alpha_{L+1}} 2^{(J-\alpha_{L+1}j_{L+1})n_1/\alpha_1} (J-\alpha_{L+1}j_{L+1})^{R-1}\, 2^{j_{L+1}n_{L+1}} \lesssim 2^{Jn_1/\alpha_1} J^{R-1}\sum_{j_{L+1}=0}^{J/\alpha_{L+1}} 2^{j_{L+1}\alpha_{L+1} (\frac{n_{L+1}}{\alpha_{L+1}}-\frac{n_1}{\alpha_1} )}. $$

Due to the ordering (4.2), we have to distinguish two cases, namely n 1/α 1>n L+1/α L+1 and n 1/α 1=n L+1/α L+1. In the case n 1/α 1>n L+1/α L+1, which always happens for R<L and might happen if R=L, the exponent in the last sum is always negative, which immediately implies

$$ \dim\widehat{\mathbf{V}}_J^{(\boldsymbol{\alpha},\alpha_{L+1})} \lesssim 2^{J n_1/\alpha_1} J^{R-1}. $$
(4.4)

In the case n 1/α 1=n L+1/α L+1, which can only happen if R=L, the exponent in the sum is zero, and it follows that

$$ \dim\widehat{\mathbf{V}}_J^{(\boldsymbol{\alpha},\alpha_{L+1})} \lesssim 2^{Jn_1/\alpha_1} J^{L-1}\sum_{j_{L+1}=0}^{J/\alpha_{L+1}} 1 \lesssim 2^{Jn_1/\alpha_1}J^L. $$
(4.5)

The combination of (4.4) and (4.5) yields the desired result in case of L+1. This completes the proof. □

Remark 4.2

The statement of Theorem 4.1 is sharp, which can be seen as follows. The sparse tensor product space \(\widehat{\mathbf{V}}_{J}^{\boldsymbol{\alpha}}\) contains the space \(W_{0}^{(1)}\otimes\cdots\otimes W_{0}^{(k-1)}\otimes W_{J/\alpha_{k}}^{(k)}\otimes W_{0}^{(k+1)}\otimes\cdots\otimes W_{0}^{(L)}\) for all 1≤kL whose number of degrees of freedom is proportional to \(2^{Jn_{k}/\alpha_{k}}\) (i.e., \(2^{Jn_{k}/\alpha_{k}}\) bounds its number of degrees of freedom from below and above). Thus, \(2^{J\max\{n_{1}/ \alpha_{1},n_{2}/\alpha_{2},\ldots,n_{L}/\alpha_{L}\}}\) is obviously a lower bound for the number of degrees of freedom of the sparse tensor product space \(\widehat{\mathbf{V}}_{J}^{\boldsymbol{\alpha}}\). If the maximum is attained R times, for example, if n 1/α 1=n 2/α 2=⋯=n R /α R , then there are asymptotically J R−1 further combinations of (j 1,j 2,…,j R ) such that \(\sum_{k=1}^{R} \alpha_{k} j_{k} = J\). The numbers of degrees of freedom of the associated spaces \(W_{j_{1}}^{(1)}\otimes\cdots\otimes W_{j_{R}}^{(R)}\otimes W_{0}^{(R+1)} \otimes\cdots\otimes W_{0}^{(L)}\) are also proportional to \(2^{J\max\{n_{1} /\alpha_{1},n_{2}/\alpha_{2},\ldots,n_{L}/\alpha_{L}\}}\). This yields the logarithmic factor J R−1 in the estimate as well as its sharpness.
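The dimension bound of Theorem 4.1 is easy to verify by brute force. The sketch below (illustrative parameters only; \(\dim W_{j_i}^{(i)} = 2^{j_i n_i}\) is again used as an exact stand-in) enumerates all admissible multi-indices and compares the count with the predicted growth \(2^{J\max_i\{n_i/\alpha_i\}}J^{R-1}\); the printed ratio stays bounded from above and away from zero as J increases.

```python
# Sketch: brute-force count of dim of the sparse space versus the bound of Theorem 4.1.
from itertools import product

def sparse_dim(alpha, n, J):
    """Sum of prod_i 2^(j_i n_i) over all j >= 0 with alpha^T j <= J."""
    ranges = [range(int(J // a) + 1) for a in alpha]
    return sum(2 ** sum(ji * ni for ji, ni in zip(j, n))
               for j in product(*ranges)
               if sum(a * ji for a, ji in zip(alpha, j)) <= J)

for alpha, n in ([[1, 1, 2], [1, 1, 2]],    # n_i/alpha_i = 1 for all i, hence R = 3
                 [[1, 1, 1], [2, 1, 1]]):   # maximum attained only once, hence R = 1
    rho = [ni / ai for ni, ai in zip(n, alpha)]
    R = sum(1 for x in rho if abs(x - max(rho)) < 1e-12)
    for J in (10, 14, 18):
        predicted = 2 ** (J * max(rho)) * J ** (R - 1)
        print(alpha, n, J, round(sparse_dim(alpha, n, J) / predicted, 3))
```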

Next we consider the approximation power in the sparse tensor product spaces \(\widehat{\mathbf{V}}_{J}^{\boldsymbol{\alpha}}\). Obviously, the highest possible rate of convergence is attained in the space H r(Ω), where r=(r 1,r 2,…,r L ) is the vector of polynomial exactnesses of the underlying full tensor product spaces. Therefore, in the following theorem, we restrict ourselves without loss of generality to s≤r.

Theorem 4.3

Let \(-\widetilde{\boldsymbol{\gamma}}<\mathbf{q}<\boldsymbol{\gamma}\) and q<s≤r and \(f\in\mathbf{H}^{\mathbf{s}}(\boldsymbol{\varOmega})\). Then, the sparse grid projector

$$ \widehat{\mathbf{Q}}_J^{\boldsymbol{\alpha}}:\mathbf{H}^\mathbf{q}(\boldsymbol{\varOmega}) \to\widehat{\mathbf{V}}_J^{\boldsymbol{\alpha}}, \qquad \widehat{\mathbf{Q}}_J^{\boldsymbol{\alpha}}f = \sum _{\boldsymbol{\alpha}^T\mathbf{j}\le J} \bigl(Q_{j_1}^{(1)}\otimes Q_{j_2}^{(2)}\otimes\cdots\otimes Q_{j_L}^{(L)} \bigr)f $$
(4.6)

satisfies

$$ \big\|\bigl(I-\widehat{\mathbf{Q}}_J^{\boldsymbol{\alpha}}\bigr)f\big\|_{\mathbf{H}^\mathbf{q}(\boldsymbol{\varOmega})}\lesssim 2^{-J\min\{\frac{s_1-q_1}{\alpha_1},\frac{s_2-q_2}{\alpha_2},\ldots,\frac{s_L-q_L}{\alpha_L}\}} J^{(P-1)/2}\|f\|_{\mathbf{H}^{\mathbf{s}}(\boldsymbol{\varOmega})}. $$
(4.7)

Here, P counts how often the minimum is attained in the exponent.

Proof

We show the assertion again by induction over L. For the case L=1, the tensor product domain Ω is a single domain Ω 1 and the projector \(\widehat{\mathbf{Q}}_{J}^{\boldsymbol{\alpha}}\) onto the sparse grid is simply \(\sum_{j_{1}=0}^{J/\alpha_{1}} Q_{j_{1}}^{(1)}\). Hence, the assertion follows immediately from (2.5) and (2.6), since

$$\big\|\bigl(I-\widehat{\mathbf{Q}}_J^{\boldsymbol{\alpha}}\bigr)f\big\|_{H^{q_1}(\varOmega_1)}^2 \sim\sum_{j_1>J/\alpha_1}\big\|Q_{j_1}^{(1)}f\big\|_{H^{q_1}(\varOmega_1)}^2 \lesssim\sum_{j_1>J/\alpha_1} 2^{-2j_1(s_1-q_1)}\|f\|_{H^{s_1}(\varOmega_1)}^2 \lesssim 2^{-2J(s_1-q_1)/\alpha_1}\|f\|_{H^{s_1}(\varOmega_1)}^2 $$

for all \(f\in H^{s_{1}}(\varOmega_{1})\).

We now assume that the assertion is shown for the product domain Ω=Ω 1×⋯×Ω L and the sparse grid projector \(\widehat{\mathbf{Q}}_{J}^{\boldsymbol{\alpha}}\) given by (4.6). This means the error estimate (4.7) is valid for any function from H s(Ω). To prove the induction step L→L+1, we shall assume without loss of generality that

$$\frac{s_1-q_1}{\alpha_1} = \cdots = \frac{s_P-q_P}{\alpha_P} < \frac{s_{P+1}-q_{P+1}}{\alpha_{P+1}} \le \frac{s_{P+2}-q_{P+2}}{\alpha_{P+2}}\le\cdots\le\frac{s_L-q_L}{\alpha_L} $$

and

$$\frac{s_L-q_L}{\alpha_L}\le\frac{s_{L+1}-q_{L+1}}{\alpha_{L+1}}. $$

The induction step then reads as follows. For any given function

$$f\in\mathbf{H}^{(\mathbf{s},s_{L+1})}(\boldsymbol{\varOmega}\times\varOmega_{L+1}) := \mathbf{H}^{\mathbf{s}}(\boldsymbol{\varOmega})\otimes H^{s_{L+1}}(\varOmega_{L+1}), $$

we shall show that the sparse grid projection

$$\widehat{\mathbf{Q}}_J^{(\boldsymbol{\alpha},\alpha_{L+1})} = \sum _{\boldsymbol{\alpha}^T\mathbf{j}+\alpha_{L+1}j_{L+1}\le J} Q_{j_1}^{(1)}\otimes\cdots\otimes Q_{j_{L+1}}^{(L+1)}: \mathbf{H}^{(\mathbf{q},q_{L+1})}(\boldsymbol{\varOmega}\times\varOmega_{L+1}) \to\widehat{\mathbf{V}}_J^{(\boldsymbol{\alpha},\alpha_{L+1})} $$

satisfies the error estimate

$$\big\|\bigl(I-\widehat{\mathbf{Q}}_J^{(\boldsymbol{\alpha},\alpha_{L+1})}\bigr)f\big\|_{\mathbf{H}^{(\mathbf{q},q_{L+1})}(\boldsymbol{\varOmega}\times\varOmega_{L+1})} \lesssim 2^{-J\min\{\frac{s_1-q_1}{\alpha_1},\ldots,\frac{s_{L+1}-q_{L+1}}{\alpha_{L+1}}\}} J^{(P^\star-1)/2}\|f\|_{\mathbf{H}^{(\mathbf{s},s_{L+1})}(\boldsymbol{\varOmega}\times\varOmega_{L+1})}. $$

Here, the sparse tensor product space \(\widehat{\mathbf{V}}_{J}^{(\boldsymbol{\alpha},\alpha_{L+1})}\) is defined in (4.1) and \(P^\star:=P+1\) if (s 1−q 1)/α 1=(s L+1−q L+1)/α L+1 and \(P^\star:=P\) otherwise.

Analogously to (4.3), we find the identity

$$\widehat{\mathbf{Q}}_J^{(\boldsymbol{\alpha},\alpha_{L+1})} = \sum_{j_{L+1}=0}^{J/\alpha_{L+1}} \widehat{\mathbf{Q}}_{J-\alpha_{L+1}j_{L+1}}^{\boldsymbol{\alpha}}\otimes Q_{j_{L+1}}^{(L+1)}. $$

In particular, it holds that

Thus, in view of (2.5) and (2.6), since \(f\in\mathbf{H}^{({\mathbf{s}},s_{L+1})} (\boldsymbol{\varOmega}\times\varOmega_{L+1})\), we find

The second norm on the right-hand side can be estimated as

The induction hypothesis (4.7) implies

By employing in addition (2.6), the first norm on the right-hand side of the above expression can be estimated according to

Altogether, we thus get

If (s 1−q 1)/α 1<(s L+1−q L+1)/α L+1, the exponent in the first sum is always negative, and we obtain

If (s 1−q 1)/α 1=(s L+1−q L+1)/α L+1, which may only happen if P=L, the exponent in the first sum is always 0, which, due to J/α L+1+1 terms in the sum, leads to

This completes the proof. □

Remark 4.4

(i) The constant in (4.7) depends on the particular choice of α (and of course also on L, q, and r).

(ii) For any 1≤kL, the sparse tensor product space \(\widehat{\mathbf{V}}_{J}^{\boldsymbol{\alpha}}\) contains the full tensor product space \(V_{0}^{(1)}\otimes\cdots\otimes V_{0}^{(k-1)} \otimes V_{J/\alpha_{k}}^{(k)}\otimes V_{0}^{(k+1)}\otimes\cdots \otimes V_{0}^{(L)}\). But it does not contain the space \(V_{0}^{(1)} \otimes\cdots\otimes V_{0}^{(k-1)}\otimes V_{J/\alpha_{k}+1}^{(k)} \otimes V_{0}^{(k+1)}\otimes\cdots\otimes V_{0}^{(L)}\). Therefore, the convergence rate cannot be higher than \(2^{-J(s_{k}-q_{k})/ \alpha_{k}}\). Repeating this argument for all 1≤kL and taking the minimum, we obtain the lower bound \(2^{-J\min\{\frac{s_{1}-q_{1}}{\alpha_{1}},\frac{s_{2}-q_{2}}{\alpha_{2}},\ldots,\frac{s_{L}-q_{L}}{\alpha_{L}}\}}\). Thus, we conclude that the error estimate (4.7) is essentially sharp.

(iii) Note that for \({\mathbf{s}}\not=\mathbf{r}\), the logarithmic factor J (P−1)/2 in (4.7) can be reduced or even removed by using more sophisticated estimates, see [14].

By combining Theorems 4.1 and 4.3, we can express the convergence rate in terms of the number of degrees of freedom \(N := \dim \widehat{\mathbf{V}}_{J}^{\boldsymbol{\alpha}}\). This gives us the cost complexity of the approximation of functions in the sparse tensor product spaces \(\widehat{\mathbf{V}}_{J}^{\boldsymbol{\alpha}}\).

Corollary 4.5

Let \(-\widetilde{\boldsymbol{\gamma}}<\mathbf{q}<\boldsymbol{\gamma}\), q<s≤r, and \(f\in\mathbf{H}^{\mathbf{s}}(\boldsymbol{\varOmega})\). Furthermore, denote by \(N := \dim\widehat{\mathbf{V}}_{J}^{\boldsymbol{\alpha}}\) the number of degrees of freedom in the sparse tensor product space \(\widehat{\mathbf{V}}_{J}^{\boldsymbol{\alpha}}\), and set

$$\beta := \frac{\min\{(s_1-q_1)/\alpha_1,(s_2-q_2)/\alpha_2,\ldots,(s_L-q_L)/\alpha_L\}}{ \max\{n_1/\alpha_1,n_2/\alpha_2,\ldots,n_L/\alpha_L\}}. $$

Assume that the minimum in the numerator is attained P times and the maximum in the denominator is attained R times. Then, the approximation (4.6) in \(\widehat{\mathbf{V}}_{J}^{\boldsymbol{\alpha}}\) possesses the following convergence rate in terms of the degrees of freedom N:

$$ \|f-\widehat{f}_J\|_{\mathbf{H}^\mathbf{q}(\boldsymbol{\varOmega})} \lesssim N^{-\beta}(\log N)^{(P-1)/2+\beta(R-1)} \|f\|_{\mathbf{H}^{\mathbf{s}}(\boldsymbol{\varOmega})}. $$
(4.8)

Proof

First assume R=1. We then have \(N\sim 2^{J\max\{n_{1}/\alpha_{1}, n_{2}/\alpha_{2},\ldots,n_{L}/\alpha_{L}\}}\) due to Theorem 4.1. Hence, it holds that

$$N^{-\beta} = N^{-\frac{\min\{(s_1-q_1)/\alpha_1,(s_2-q_2)/\alpha_2,\ldots,(s_L-q_L)/\alpha_L\}}{\max\{n_1/\alpha_1,n_2/\alpha_2,\ldots,n_L/\alpha_L\}}} \sim 2^{-J\min\{\frac{s_1-q_1}{\alpha_1},\frac{s_2-q_2}{\alpha_2},\ldots,\frac{s_L-q_L}{\alpha_L}\}}, $$

which, together with (4.7), yields the desired error estimate (4.8) for R=1.

If R>1, then the sparse tensor product space \(\widehat{\mathbf{V}}_{J}^{\boldsymbol{\alpha}}\) contains

$$N\sim 2^{J\max\{n_1/\alpha_1,n_2/\alpha_2,\ldots,n_L/\alpha_L\}}J^{R-1} $$

degrees of freedom. Thus, we now have the relation

$$\biggl(\frac{N}{J^{R-1}} \biggr)^{-\beta} \sim 2^{-J\min\{\frac{s_1-q_1}{\alpha_1},\frac{s_2-q_2}{\alpha_2},\ldots,\frac{s_L-q_L}{\alpha_L}\}}. $$

Consequently, by taking the logarithm on both sides, we get log(N/J R−1)∼J, and since log(N/J R−1)≤logN, we obtain from (4.7) the estimate

$$\|f-\widehat{f}_J\|_{\mathbf{H}^\mathbf{q}(\boldsymbol{\varOmega})}\lesssim \biggl( \frac{N}{(\log N)^{R-1}} \biggr)^{-\beta}(\log N)^{(P-1)/2} \|f \|_{\mathbf{H}^{\mathbf{s}}(\boldsymbol{\varOmega})}, $$

i.e., the desired error estimate (4.8) for R>1. This completes the proof. □
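For convenience, the quantities entering Corollary 4.5 can be evaluated mechanically. The helper below is a sketch with hypothetical parameters (the function name rate is ours): it returns β together with the multiplicities P and R, from which the logarithmic exponent (P−1)/2+β(R−1) in (4.8) follows.

```python
# Sketch: evaluate beta, P, and R from Corollary 4.5 for given alpha, n, s, q.
def rate(alpha, n, s, q, tol=1e-12):
    num = [(si - qi) / ai for si, qi, ai in zip(s, q, alpha)]   # (s_i-q_i)/alpha_i
    den = [ni / ai for ni, ai in zip(n, alpha)]                 # n_i/alpha_i
    beta = min(num) / max(den)
    P = sum(1 for x in num if abs(x - min(num)) < tol)          # multiplicity of the min
    R = sum(1 for x in den if abs(x - max(den)) < tol)          # multiplicity of the max
    return beta, P, R

n, s, q = [3, 2, 1], [2, 4, 2], [0, 1, 0]                       # hypothetical parameters
for alpha in ([si - qi for si, qi in zip(s, q)],                # choice (i)
              list(n),                                          # choice (ii)
              [1, 1, 1]):                                       # unweighted regular space
    beta, P, R = rate(alpha, n, s, q)
    print(alpha, beta, (P - 1) / 2 + beta * (R - 1))            # rate and log exponent
```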

5 Discussion of the Results

We now discuss the best choice of the parameter α>0. To this end, we intend to approximate a function fH s(Ω) of maximal smoothness, i.e., we shall apply Corollary 4.5 in the case s=r. Our first question addresses the highest possible convergence rate β which can be achieved in the sparse tensor product spaces \(\widehat{\mathbf{V}}_{J}^{\boldsymbol{\alpha}}\).

Lemma 5.1

It holds that

$$\beta = \frac{\min\{(r_1-q_1)/\alpha_1,(r_2-q_2)/\alpha_2,\ldots,(r_L-q_L)/\alpha_L\}}{\max\{n_1/\alpha_1,n_2/\alpha_2,\ldots,n_L/\alpha_L\}} \le\min \biggl\{\frac{r_1-q_1}{n_1},\frac{r_2-q_2}{n_2},\ldots,\frac{r_L-q_L}{n_L} \biggr\} =: \beta^\star $$

for all α>0.

Proof

Choose α>0 arbitrarily but fixed and let k,ℓ∈{1,2,…,L} be such that

$$\frac{r_k-q_k}{\alpha_k} = \min \biggl\{\frac{r_1-q_1}{\alpha_1},\frac{r_2-q_2}{\alpha_2},\ldots, \frac{r_L-q_L}{\alpha_L} \biggr\} $$

and

$$\frac{n_\ell}{\alpha_\ell} = \max \biggl\{\frac{n_1}{\alpha_1},\frac{n_2}{\alpha_2},\ldots, \frac{n_L}{\alpha_L} \biggr\}. $$

It thus holds that (r k −q k )/α k ≤(r i −q i )/α i and n ℓ /α ℓ ≥n i /α i for i=1,2,…,L. In particular, we find

$$\beta = \frac{r_k-q_k}{\alpha_k}\cdot\frac{\alpha_\ell}{n_\ell } \le\frac{r_i-q_i}{\alpha_i}\cdot \frac{\alpha_i}{n_i} = \frac{r_i-q_i}{n_i}\quad\text{for all $i=1,2,\ldots,L$}, $$

which immediately implies the assertion

$$\beta\le\min \biggl\{\frac{r_1-q_1}{n_1},\frac{r_2-q_2}{n_2},\ldots, \frac{r_L-q_L}{n_L} \biggr\}. $$

 □

This lemma thus gives an upper bound \(\beta^\star\) for the convergence rate β in Corollary 4.5. We now show that this rate is essentially achieved for the canonical choices (i)–(iii) of α of Sect. 3.

(i) The equilibration of the accuracy in the involved univariate spaces led to the choice α i =r i −q i for all i=1,2,…,L. According to Corollary 4.5 and since we consider s=r, we have here

$$\beta = \frac{\min\{1,1,\ldots,1\}}{\max\{n_1/(r_1-q_1),n_2/(r_2-q_2),\ldots,n_L/(r_L-q_L)\}} = \min \biggl\{\frac{r_1-q_1}{n_1},\frac{r_2-q_2}{n_2},\ldots,\frac{r_L-q_L}{n_L} \biggr\} = \beta^\star. $$

We thus obtain the rate

$$\|f-\widehat{f}_J\|_{\mathbf{H}^\mathbf{q}(\boldsymbol{\varOmega})} \lesssim N^{-\beta^\star}(\log N)^{(L-1)/2+\beta^\star(R-1)} \|f\|_{\mathbf{H}^\mathbf{r}(\boldsymbol{\varOmega})}. $$

(ii) The equilibration of the degrees of freedom in the involved univariate spaces corresponds to the choice α i =n i for all i=1,2,…,L. This immediately leads to

$$\beta = \frac{\min\{(r_1-q_1)/n_1,(r_2-q_2)/n_2,\ldots,(r_L-q_L)/n_L\}}{ \max\{1,1,\ldots,1\}} = \beta^\star. $$

Hence, the rate of convergence is

$$\|f-\widehat{f}_J\|_{\mathbf{H}^\mathbf{q}(\boldsymbol{\varOmega})} \lesssim N^{-\beta^\star}(\log N)^{(P-1)/2+\beta^\star(L-1)} \|f\|_{\mathbf{H}^\mathbf{r}(\boldsymbol{\varOmega})}. $$

(iii) The equilibration of the cost-benefit rate is given by α i =n i +r i −q i for all i=1,2,…,L. Let k∈{1,2,…,L} be such that

$$\frac{r_k-q_k}{n_k+r_k-q_k} = \min \biggl\{\frac{r_1-q_1}{n_1+r_1-q_1}, \ldots,\frac{r_L-q_L}{n_L+r_L-q_L} \biggr\}. $$

Then, it follows that

$$ \frac{r_k-q_k}{n_k+r_k-q_k}\le\frac{r_i-q_i}{n_i+r_i-q_i}\quad \text{for all $i=1,2,\ldots,L$}, $$
(5.1)

which is equivalent to

$$\frac{r_k-q_k}{n_k}\le \frac{r_i-q_i}{n_i}\quad\text{for all $i=1,2,\ldots,L$}. $$

Hence, (5.1) also implies

$$\frac{n_k}{n_k+r_k-q_k}\ge\frac{n_i}{n_i+r_i-q_i}\quad\text{for all $i=1,2,\ldots,L$}. $$

It therefore again holds that

$$\beta = \frac{\min\{(r_1-q_1)/(n_1+r_1-q_1),\ldots,(r_L-q_L)/(n_L+r_L-q_L)\}}{ \max\{n_1/(n_1+r_1-q_1),\ldots,n_L/(n_L+r_L-q_L)\}} = \beta^\star, $$

and Corollary 4.5 yields

$$\big\|f-\widehat{f}_J\big\|_{\mathbf{H}^\mathbf{q}(\boldsymbol{\varOmega})} \lesssim N^{-\beta^\star}(\log N)^{(P-1)/2+\beta^\star(R-1)} \|f\|_{\mathbf{H}^\mathbf{r}(\boldsymbol{\varOmega})}. $$

We can generalize these particular examples as follows. Set

$$ \alpha_i=\lambda n_i + (1-\lambda) (r_i-q_i)\quad\text{for all $i=1,2, \ldots,L$}, $$
(5.2)

where λ∈[0,1] is an arbitrarily chosen parameter. For λ=1, we find α i =n i ; for λ=0, we find α i =r i −q i ; and for λ=1/2, we find α i =(n i +r i −q i )/2, which yields the same sparse tensor product spaces as the choice α i =n i +r i −q i . Thus, the above examples are covered by (5.2).

For the choice (5.2), it holds for a specific k that

$$\frac{r_k-q_k}{\lambda n_k + (1-\lambda)(r_k-q_k)} = \min_{i\in\{1,2,\ldots,L\}} \biggl\{\frac{r_i-q_i}{\lambda n_i + (1-\lambda)(r_i-q_i)} \biggr\} $$

if and only if

$$\frac{n_k}{\lambda n_k + (1-\lambda)(r_k-q_k)} = \max_{i\in\{1,2,\ldots,L\}} \biggl\{\frac{n_i}{\lambda n_i + (1-\lambda) (r_i-q_i)} \biggr\}. $$

Consequently, for all λ∈[0,1], we find

$$\beta = \frac{\min_{i\in\{1,2,\ldots,L\}}\{(r_i-q_i)/(\lambda n_i+(1-\lambda)(r_i-q_i))\}}{ \max_{i\in\{1,2,\ldots,L\}}\{n_i/(\lambda n_i+(1-\lambda)(r_i-q_i))\}} = \beta^\star. $$

Nevertheless, the logarithmic factors in the convergence rate in (4.8) might differ in the extremal cases λ=0 and λ=1.
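A quick numerical check (with the same kind of hypothetical parameters as before) confirms that every member of the family (5.2) indeed yields β=β⋆:

```python
# Sketch: every alpha of the form (5.2) attains the optimal rate beta*.
n, r, q = [3, 2, 1], [2, 4, 2], [0, 1, 0]          # hypothetical parameters

def beta(alpha):
    num = min((ri - qi) / ai for ri, qi, ai in zip(r, q, alpha))
    den = max(ni / ai for ni, ai in zip(n, alpha))
    return num / den

beta_star = min((ri - qi) / ni for ri, qi, ni in zip(r, q, n))
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    alpha = [lam * ni + (1 - lam) * (ri - qi) for ni, ri, qi in zip(n, r, q)]
    print(lam, [round(a, 3) for a in alpha], abs(beta(alpha) - beta_star) < 1e-12)
```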

In the case of L=2, this construction covers all possible sparse tensor product spaces \(\widehat{\mathbf{V}}^{\boldsymbol{\alpha}}_{J}\) which essentially (i.e., except for logarithmic terms) offer the highest possible convergence rate \(\beta^\star\), see [11]. Note however that in the case L>2, there might be other choices which also result in optimal sparse tensor product spaces.

Lemma 5.2

Let k∈{1,2,…,L} be such that

$$\frac{r_k-q_k}{n_k} = \min \biggl\{\frac{r_1-q_1}{n_1}, \frac{r_2-q_2}{n_2},\ldots, \frac{r_L-q_L}{n_L} \biggr\} = \beta^\star. $$

Then, for all α>0 such that

$$ \frac{r_k-q_k}{r_i-q_i}\le\frac{\alpha_k}{\alpha_i}\le \frac{n_k}{n_i} \quad\mbox{\textit{for all}}\ i=1,2,\ldots,L, $$
(5.3)

it holds that

$$\beta = \frac{\min\{(r_1-q_1)/\alpha_1,(r_2-q_2)/\alpha_2,\ldots,(r_L-q_L)/\alpha_L\}}{ \max\{n_1/\alpha_1,n_2/\alpha_2,\ldots,n_L/\alpha_L\}} = \beta^\star. $$

If (5.3) is not satisfied, then it holds that \(\beta<\beta^\star\).

Proof

The condition (5.3) immediately implies the inequalities

$$\frac{r_k-q_k}{\alpha_k}\le\frac{r_i-q_i}{\alpha_i}\quad\text{and}\quad \frac{n_i}{\alpha_i} \le\frac{n_k}{\alpha_k}\quad\text{for all $i=1,2,\ldots,L$}. $$

Hence,

$$\beta = \frac{\min\{(r_1-q_1)/\alpha_1,(r_2-q_2)/\alpha_2,\ldots,(r_L-q_L)/\alpha_L\}}{ \max\{n_1/\alpha_1,n_2/\alpha_2,\ldots,n_L/\alpha_L\}} = \frac{r_k-q_k}{n_k} = \beta^\star $$

indeed holds. If (5.3) is not satisfied for some specific j∈{1,2,…,L}, then it follows that

$$\frac{r_j-q_j}{\alpha_j}<\frac{r_k-q_k}{\alpha_k} \quad\text{or}\quad\frac{n_k}{\alpha_k}< \frac{n_j}{\alpha_j}. $$

Therefore, we obtain

$$\beta = \frac{\min\{(r_1-q_1)/\alpha_1,(r_2-q_2)/\alpha_2,\ldots,(r_L-q_L)/\alpha_L\}}{ \max\{n_1/\alpha_1,n_2/\alpha_2,\ldots,n_L/\alpha_L\}} \not=\beta^\star, $$

which, in view of Lemma 5.1, shows \(\beta<\beta^\star\). □
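Criterion (5.3) is straightforward to test in practice. The helper below is only a sketch (the function name is ours, the parameters are hypothetical, and a unique minimizing index k is assumed); it reports whether a given weight vector α attains the optimal rate β⋆.

```python
# Sketch: test criterion (5.3) of Lemma 5.2 for a given weight vector alpha.
def satisfies_53(alpha, n, r, q, tol=1e-12):
    ratios = [(ri - qi) / ni for ri, qi, ni in zip(r, q, n)]
    k = min(range(len(n)), key=lambda i: ratios[i])   # index attaining beta* (assumed unique)
    return all(
        (r[k] - q[k]) / (r[i] - q[i]) <= alpha[k] / alpha[i] + tol
        and alpha[k] / alpha[i] <= n[k] / n[i] + tol
        for i in range(len(n))
    )

n, r, q = [3, 2, 1], [2, 4, 2], [0, 1, 0]             # hypothetical parameters
print(satisfies_53([3, 2, 1], n, r, q))               # alpha_i = n_i: True
print(satisfies_53([1, 3, 1], n, r, q))               # violates (5.3): False, beta < beta*
```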

6 Conclusion

The analysis in Sect. 5 reveals that all sparse tensor product spaces \(\widehat{\mathbf{V}}_{J}^{\boldsymbol{\alpha}}\) with α satisfying the criterion (5.3) produce essentially the optimal convergence rate \(\beta^{\star} =\min_{i\in\{1,2,\ldots,L\}}\{(r_{i}-q_{i})/n_{i}\}\) when a function \(f\in\mathbf{H}^{\mathbf{r}}(\boldsymbol{\varOmega})\) of maximal smoothness is to be approximated. However, if the function to be approximated is only in H s(Ω) with q<s<r, then the highest possible convergence rate is

$$\beta =\min_{i\in\{1,2,\ldots,L\}}\bigl\{(s_i-q_i)/n_i\bigr\}. $$

To achieve this rate, the criterion (5.3) for the choice of α>0 has to be modified according to

$$\frac{s_k-q_k}{s_i-q_i}\le\frac{\alpha_k}{\alpha_i}\le\frac{n_k}{n_i} \quad\text{for all $i=1,2,\ldots,L$}. $$

As a consequence, to obtain for all q<s≤r the highest possible rate in \(\widehat{\mathbf{V}}_{J}^{\boldsymbol{\alpha}}\), one has to choose α i =n i for all i=1,2,…,L, i.e., one has to equilibrate the degrees of freedom in the involved univariate spaces \(V_{J/\alpha_{i}}^{(i)}\). Indeed, for this choice, the modified criterion reduces to (s k −q k )/(s i −q i )≤n k /n i , which holds by the very definition of k, independently of s.