Abstract
We provide a general discussion of Smolyak’s algorithm for the acceleration of scientific computations. The algorithm first appeared in Smolyak’s work on multidimensional integration and interpolation. Since then, it has been generalized in multiple directions and has been associated with the keywords: sparse grids, hyperbolic cross approximation, combination technique, and multilevel methods. Variants of Smolyak’s algorithm have been employed in the computation of high-dimensional integrals in finance, chemistry, and physics, in the numerical solution of partial and stochastic differential equations, and in uncertainty quantification. Motivated by this broad and ever-increasing range of applications, we describe a general framework that summarizes fundamental results and assumptions in a concise application-independent manner.
1 Introduction
We study Smolyak’s algorithm for the convergence acceleration of general numerical approximation methods
which map discretization parameters \(\boldsymbol {{k}}=({k}_1,\dots ,{k}_n)\in \mathbb {N}^n\) to outputs \(\mathscr {A}(\boldsymbol {{k}})\) in a Banach space Y.
For instance, a straightforward way to approximate the integral of a function \(f\colon [0,1]^n\to \mathbb {R}\) is to employ tensor-type quadrature formulas, which evaluate f at the nodes of a regular grid within [0, 1]n. This gives rise to an approximation method where k j determines the grid resolution in direction of the jth coordinate axis, j ∈{1, …, n}. Smolyak himself derived and studied his algorithm in this setting, where it leads to evaluations in the nodes of sparse grids [24, 26]. Another example, which emphasizes the distinctness of sparse grids and the general version of Smolyak’s algorithm considered in this work, is integration of a univariate function \(f\colon \mathbb {R}\to \mathbb {R}\) that is not compactly supported but exhibits sufficient decay at infinity. In this case, k 1 could as before determine the resolution of regularly spaced quadrature nodes and k 2 could be used to determine a truncated quadrature domain. Smolyak’s algorithm then leads to quadrature nodes whose density is high near the origin and decreases at infinity, as intuition would dictate.
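As a concrete sketch of such an approximation method (the midpoint-rule choice and all function names here are our own, not taken from the text), the following snippet realizes a tensor-type quadrature \(\mathscr {A}(\boldsymbol {{k}})\) in which \({k}_j\) sets the grid resolution in the direction of the jth coordinate axis:

```python
import itertools
import math

def tensor_quadrature(f, k):
    """Tensor-type quadrature of f over [0,1]^n.

    The discretization parameter k[j] controls the resolution in the
    direction of the j-th coordinate axis: we use 2**k[j] midpoint
    nodes along that axis (a typical, but here hypothetical, choice).
    """
    axes = [[(i + 0.5) / 2 ** kj for i in range(2 ** kj)] for kj in k]
    weight = 1.0 / math.prod(2 ** kj for kj in k)
    return weight * sum(f(x) for x in itertools.product(*axes))

# Integrate f(x, y) = x * y over [0,1]^2; the exact value is 1/4, and the
# midpoint rule reproduces it exactly for this separable linear integrand.
approx = tensor_quadrature(lambda x: x[0] * x[1], (4, 4))
```

Evaluating at k = (4, 4) uses a regular grid of 256 nodes; refining all entries of k at once is exactly the "straightforward" approach whose cost Smolyak's algorithm later reduces.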
To motivate Smolyak’s algorithm, assume that the approximation method \(\mathscr {A}\) converges to a limit \(\mathscr {A}_{\infty }\in Y\) at the rate
$$\displaystyle \begin{aligned} \|\mathscr{A}(\boldsymbol{{k}})-\mathscr{A}_{\infty}\|{}_{Y}\leq K_1\sum_{j=1}^{n}{k}_j^{-{\beta}_j} \end{aligned} $$ (1)
and requires the work
$$\displaystyle \begin{aligned} \text{Work}(\mathscr{A}(\boldsymbol{{k}}))=K_2\prod_{j=1}^{n}{k}_j^{\gamma_j} \end{aligned} $$ (2)
for some \(K_1>0\), \(K_2>0\) and \({\beta}_j>0\), \(\gamma_j>0\), j ∈{1, …, n}. An approximation of \(\mathscr {A}_{\infty }\) with accuracy 𝜖 > 0 can then be obtained with the choice
$$\displaystyle \begin{aligned} {k}_j:=\lceil (nK_1/\epsilon)^{1/{\beta}_j}\rceil,\quad j\in\{1,\dots,n\}, \end{aligned} $$ (3)
which requires the work
$$\displaystyle \begin{aligned} \text{Work}=C(K_1,K_2,n,\boldsymbol{{\beta}},\boldsymbol{\gamma})\,\epsilon^{-(\gamma_1/{\beta}_1+\dots+\gamma_n/{\beta}_n)}. \end{aligned} $$ (4)
Here and in the remainder of this work we denote by C(… ) generic constants that depend only on the quantities in parentheses but may change their value from line to line and from equation to equation.
The appearance of the sum γ 1∕β 1 + ⋯ + γ n ∕β n in the exponent above is commonly referred to as the curse of dimensionality. Among other things, we will show (see Example 1) that if the bound in Eq. (1) holds in a slightly stronger sense, then Smolyak’s algorithm can replace this dreaded sum by \(\max _{j=1}^{n}\gamma _j/{\beta }_j\), which means that it yields convergence rates that are, up to possible logarithmic factors, independent of the number of discretization parameters. In the general form presented here, Smolyak’s algorithm forms linear combinations of the values \(\mathscr {A}(\boldsymbol {{k}})\), \(\boldsymbol {{k}}\in \mathbb {N}^{n}\), based on
1. an infinite decomposition of \(\mathscr {A}_{\infty }\), and
2. a knapsack approach to truncate this decomposition.
Since the decomposition is independent of the particular choice of \(\mathscr {A}\) and the truncation relies on easily verifiable assumptions on the decay and work of the decomposition terms, Smolyak’s algorithm is a powerful black box for the non-intrusive acceleration of scientific computations. In the roughly 50 years since its first description, applications in various fields of scientific computation have been described; see, for example, the extensive survey article [2]. The goal of this work is to summarize previous results in a common framework and thereby encourage further research and exploration of novel applications. While some of the material presented here may be folklore knowledge in the sparse grids community, we are not aware of any published sources that present this material in a generally applicable fashion.
The remainder of this work is structured as follows. In Sect. 2, we introduce the infinite decomposition of \(\mathscr {A}_{\infty }\) that is at the core of Smolyak’s algorithm. In Sect. 3, we introduce spaces of approximation methods \(\mathscr {A}\colon \mathbb {N}^n\to Y\) that allow for efficient solutions of the resulting truncation problem. In Sect. 4, we derive explicit convergence rates for Smolyak’s algorithm in common examples of such spaces. Finally, in Sect. 5, we discuss how various previous results can be deduced within the framework presented here.
2 Decomposition
Smolyak’s algorithm is based on a decomposition of \(\mathscr {A}_{\infty }\) that is perhaps most simply presented in the continuous setting. Here, Fubini’s theorem and the fundamental theorem of calculus show that any function \(f\colon \mathbb {R}_{\geq }^{n}:=[0,\infty )^{n}\to Y\) with f ≡ 0 on \(\partial \mathbb {R}_{\geq }^{n}\) satisfies
$$\displaystyle \begin{aligned} f(\boldsymbol{x})=\int_{\prod_{j=1}^{n}[0,x_j]}\partial_1\dots\partial_n f(\boldsymbol{s})\,d\boldsymbol{s}\quad \forall \boldsymbol{x}\in\mathbb{R}_{\geq}^{n}, \end{aligned} $$ (5)
questions of integrability and differentiability aside. Moreover, if f converges to a limit f ∞ ∈ Y as \(\min _{j=1}^{n}x_j\to \infty \), then
$$\displaystyle \begin{aligned} f_{\infty}=\int_{\mathbb{R}_{\geq}^{n}}\partial_{\text{mix}} f(\boldsymbol{s})\,d\boldsymbol{s}, \end{aligned} $$ (6)
where we introduced the shorthand ∂ mix for the mixed derivative ∂ 1…∂ n . The crucial observation is now that an approximation of f ∞ can be achieved not only by rectangular truncation of the integral in Eq. (6), which according to Eq. (5) is equivalent to a simple evaluation of f at a single point, but also by truncation to more complicated domains. These domains should ideally correspond to large values of ∂ mix f in order to minimize the truncation error, but also have to take into consideration the associated computational work.
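The continuous decomposition can be checked numerically in a toy case. Assuming f(x, y) = (1 − e⁻ˣ)(1 − e⁻ʸ) (our own choice, not from the text), which vanishes on the boundary, tends to f ∞ = 1, and has mixed derivative e⁻⁽ˣ⁺ʸ⁾, rectangular truncation of the integral reproduces a point evaluation of f, as stated in Eq. (5):

```python
import math

# Toy instance of the decomposition (our own choice of f, not from the text):
# f(x, y) = (1 - e^-x)(1 - e^-y) vanishes on the boundary of [0, inf)^2,
# converges to f_inf = 1, and has mixed derivative d^2 f / dx dy = e^-(x+y).
f = lambda x, y: (1 - math.exp(-x)) * (1 - math.exp(-y))
mixed_derivative = lambda x, y: math.exp(-(x + y))

def rectangular_integral(T, m=400):
    """Midpoint quadrature of the mixed derivative over the rectangle [0, T]^2."""
    h = T / m
    nodes = [(i + 0.5) * h for i in range(m)]
    return sum(mixed_derivative(x, y) for x in nodes for y in nodes) * h * h
```

Truncation to [0, T]² recovers f(T, T) = (1 − e⁻ᵀ)², and letting T grow recovers the limit f ∞ = 1.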
To transfer the decomposition in Eq. (6) to the discrete setting, we denote by \(Y^{\mathbb {N}^n}:=\{\mathscr {A}\colon \mathbb {N}^n\to Y\}\) the space of all functions from \(\mathbb {N}^n\) into the Banach space Y . Next, we define the discrete unidirectional difference and sum operators
$$\displaystyle \begin{aligned} (\varDelta_{j}\mathscr{A})(\boldsymbol{{k}}):=\begin{cases}\mathscr{A}(\boldsymbol{{k}})-\mathscr{A}(\boldsymbol{{k}}-{\mathbf{e}}_j),\quad \text{if }{k}_j>0\\ \mathscr{A}(\boldsymbol{{k}}),\quad \text{if }{k}_j=0\end{cases} \qquad\text{and}\qquad ({\Sigma}_{j}\mathscr{A})(\boldsymbol{{k}}):=\sum_{s=0}^{{k}_j}\mathscr{A}({k}_1,\dots,s,\dots,{k}_{n}). \end{aligned}$$
Finally, we introduce their compositions, the mixed difference operator
$$\displaystyle \begin{aligned} \varDelta_{\text{mix}}:=\varDelta_{1}\circ\dots\circ\varDelta_{n} \end{aligned}$$
and the rectangular sum operator
$$\displaystyle \begin{aligned} {\Sigma}_{R}:={\Sigma}_{1}\circ\dots\circ{\Sigma}_{n}, \end{aligned}$$
which replace the mixed derivative operator \(f\mapsto \partial _{\text{mix}} f\) and the rectangular integral operator \(f\mapsto \big (\boldsymbol {x}\mapsto \int _{\prod _{j=1}^{n}[0,x_j]} f(\boldsymbol {s})\,d\boldsymbol {s}\big )\) of the continuous setting, respectively.
The discrete analogue of Eq. (5) is now a matter of simple algebra.
Proposition 1
(i) We have \({\Sigma }_{R}=\varDelta _{\mathit{\text{mix}}}^{-1}\) , that is
$$\displaystyle \begin{aligned} \mathscr{A}(\boldsymbol{{k}})=\sum_{s_1=0}^{{k}_1}\dots\sum_{s_n=0}^{{k}_{n}} \varDelta_{\mathit{\text{mix}}}\mathscr{A}(s_1,\dots,s_{n})\quad \forall \boldsymbol{{k}}\in\mathbb{N}^{n}. \end{aligned}$$
(ii) We have \(\varDelta _{\mathit{\text{mix}}}=\sum _{\mathbf {e}\in \{0,1\}^{n}} (-1)^{|\mathbf {e}|{ }_1}S_{\mathbf {e}}\) , where \(S_{\mathbf {e}}\) is the shift operator defined by
$$\displaystyle \begin{aligned} (S_{\mathbf{e}}\mathscr{A})(\boldsymbol{{k}}):=\begin{cases}\mathscr{A}(\boldsymbol{{k}}-\mathbf{e}),\quad \mathit{\text{if }}\boldsymbol{{k}}-\mathbf{e}\in\mathbb{N}^{n}\\ 0\quad \mathit{\text{else}}\end{cases}. \end{aligned}$$
Proof
Part (i) follows directly from the commutativity of the operators \(\{{\Sigma }_{j}\}_{j=1}^{n}\). Part (ii) follows from plugging the representation \(\varDelta _{j}= \operatorname {\mathrm {Id}}-S_{{\mathbf {e}}_j}\), where e j is the jth standard basis vector in \(\mathbb {N}^{n}\), into the definition Δ mix = Δ 1 ∘⋯ ∘ Δ n , and subsequent expansion.
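Both identities of Proposition 1 are easy to check numerically. The sketch below (function names and the hypothetical discrete method A are our own) implements the mixed difference via the shift-operator expansion of part (ii) and verifies the telescoping identity of part (i):

```python
import itertools

def delta_mix(A, k):
    """Mixed difference via the expansion in part (ii):
    sum over e in {0,1}^n of (-1)^|e|_1 * A(k - e), where terms are
    dropped whenever k - e leaves N^n (the shift operator yields 0)."""
    n = len(k)
    total = 0.0
    for e in itertools.product((0, 1), repeat=n):
        shifted = tuple(kj - ej for kj, ej in zip(k, e))
        if min(shifted) >= 0:
            total += (-1) ** sum(e) * A(shifted)
    return total

def rectangular_sum(A, k):
    """Part (i): summing mixed differences over the rectangle [0, k]
    recovers the point value A(k)."""
    return sum(delta_mix(A, s)
               for s in itertools.product(*[range(kj + 1) for kj in k]))

# A hypothetical discrete method with limit 4 as min(k) -> infinity:
A = lambda k: (2 - 2.0 ** -k[0]) * (2 - 2.0 ** -k[1])
```

Here `rectangular_sum(A, k)` agrees with `A(k)` for every k, which is exactly the statement \({\Sigma }_{R}=\varDelta _{\text{mix}}^{-1}\).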
Part (i) of the previous proposition shows that, ignoring questions of convergence, discrete functions \(\mathscr {A}\colon \mathbb {N}^{n}\to Y\) with limit \(\mathscr {A}_{\infty }\) satisfy
$$\displaystyle \begin{aligned} \mathscr{A}_{\infty}=\sum_{\boldsymbol{{k}}\in\mathbb{N}^{n}}\varDelta_{\text{mix}}\mathscr{A}(\boldsymbol{{k}}) \end{aligned} $$ (7)
in analogy to Eq. (6). In the next section, we define spaces of discrete functions for which this sum converges absolutely and can be efficiently truncated. We conclude this section with the observation that a necessary condition for the sum in Eq. (7) to converge absolutely is that the unidirectional limits \(\mathscr {A}({k}_1,\dots ,\infty ,\dots ,{k}_{n}):=\lim _{{k}_j\to \infty }\mathscr {A}({k}_1,\dots ,{k}_j,\dots ,{k}_n)\) exist. Indeed, by part (i) of the previous proposition, these limits correspond to summation of \(\varDelta _{\text{mix}}\mathscr {A}\) over hyperrectangles that grow in the direction of the jth coordinate axis and are fixed in all other directions. For instance, in the context of time-dependent partial differential equations this implies stability requirements for the underlying numerical solver, prohibiting explicit time-stepping schemes that diverge when the space discretization is refined while the time discretization is fixed.
3 Truncation
For any index set \(\mathscr {I}\subset \mathbb {N}^{n}\), we may define Smolyak’s algorithm as the approximation of \(\mathscr {A}_{\infty }\) that is obtained by truncation of the infinite decomposition in Eq. (7) to \(\mathscr {I}\),
$$\displaystyle \begin{aligned} \mathscr{S}_{\mathscr{I}}(\mathscr{A}):=\sum_{\boldsymbol{{k}}\in\mathscr{I}}\varDelta_{\text{mix}}\mathscr{A}(\boldsymbol{{k}}). \end{aligned} $$ (8)
By definition of \(\varDelta _{\text{mix}}\mathscr {A}\), the approximation \(\mathscr {S}_{\mathscr {I}}(\mathscr {A})\) is a linear combination of the values \(\mathscr {A}(\boldsymbol {{k}})\), \(\boldsymbol {{k}}\in \mathbb {N}^{n}\) (see Sect. 3.2 for explicit coefficients). This is the reason for the name combination technique that was given to approximations of this form in the context of the numerical approximation of partial differential equations [11]. When one talks about the Smolyak algorithm, or the combination technique, a particular truncation is usually implied. The general idea here is to include those indices for which the ratio between contribution (measured in the norm of Y ) and required work of the corresponding decomposition term is large. To formalize this idea, we require decay of the norms of the decomposition terms and bounds on the work required for their evaluation. To express the former, we define for strictly decreasing functions \(e_j\colon \mathbb {N}\to \mathbb {R}_{>}:=(0,\infty )\), j ∈{1, …, n} the spaces
$$\displaystyle \begin{aligned} \mathscr{E}_{(e_j)_{j=1}^{n}}(Y):=\Big\{\mathscr{A}\in Y^{\mathbb{N}^{n}}\,:\,\|\varDelta_{\text{mix}}\mathscr{A}(\boldsymbol{{k}})\|{}_{Y}\leq C\prod_{j=1}^{n}e_j({k}_j)\ \forall \boldsymbol{{k}}\in\mathbb{N}^{n}\ \text{for some }C>0\Big\}. \end{aligned}$$
Proposition 2
(i) If
$$\displaystyle \begin{aligned} \sum_{\boldsymbol{{k}}\in\mathbb{N}^{n}}\prod_{j=1}^{n}e_j({k}_j)<\infty, \end{aligned} $$ (9)
then any \(\mathscr {A}\in \mathscr {E}_{(e_j)_{j=1}^{n}}(Y)\) has a limit \(\mathscr {A}_{\infty }:=\lim _{\min _{j=1}^{n}{k}_j\to \infty }\mathscr {A}(\boldsymbol {{k}})\) . Furthermore, the decomposition in Eq.(7) holds and converges absolutely.
(ii) The spaces \(\mathscr {E}_{(e_j)_{j=1}^{n}}(Y)\) are linear subspaces of \(Y^{\mathbb {N}^{n}}\).
(iii) (Error expansions ) Assume that the ratios \(e_j({k})/e_j({k}+1)\) are uniformly bounded above for \(k\in \mathbb {N}\) and j ∈{1, …, n}. For \(\boldsymbol {{k}}\in \mathbb {N}^{n}\) and J ⊂{1, …, n} let \(\boldsymbol {{k}}_J:=({k}_j)_{j\in J}\in \mathbb {N}^{|J|}\) . If the approximation error can be written as
$$\displaystyle \begin{aligned} \mathscr{A}(\boldsymbol{{k}})-\mathscr{A}_{\infty}=\sum_{\varnothing\neq J\subset\{1,\dots,n\}}\mathscr{A}_J(\boldsymbol{{k}}_{J})\quad \forall \boldsymbol{{k}}\in\mathbb{N}^{n} \end{aligned}$$
with functions \(\mathscr {A}_{J}\colon \mathbb {N}^{|J|}\to Y\) , J ⊂{1, …, n} that satisfy
$$\displaystyle \begin{aligned} \|\mathscr{A}_J(\boldsymbol{{k}}_J)\|{}_{Y}\leq C\prod_{j\in J}e_j({k}_j) \end{aligned}$$
for some C > 0, then
$$\displaystyle \begin{aligned} \mathscr{A}\in\mathscr{E}_{(e_j)_{j=1}^{n}}(Y). \end{aligned}$$
(iv) (Multilinearity [ 20 ]) Assume \((Y_i)_{i=1}^m\) and Y are Banach spaces and \(\mathscr {M}\colon \prod _{i=1}^mY_i\to Y\) is a continuous multilinear map. If
$$\displaystyle \begin{aligned} \mathscr{A}_i\in\mathscr{E}_{(e_{j})_{j=n_1+\dots+n_{i-1}+1}^{n_1+\dots+n_i}}(Y_i)\quad \forall\,i\in\{1,\dots,m\}, \end{aligned}$$
then
$$\displaystyle \begin{aligned} \mathscr{M}(\mathscr{A}_1,\dots,\mathscr{A}_m)\in\mathscr{E}_{(e_{j})_{j=1}^{n}}(Y), \end{aligned}$$
where n := n 1 + ⋯ + n m and
$$\displaystyle \begin{aligned} \mathscr{M}(\mathscr{A}_1,\dots,\mathscr{A}_m)(\boldsymbol{{k}}):=\mathscr{M}(\mathscr{A}_1(\boldsymbol{{k}}_1),\dots,\mathscr{A}_m(\boldsymbol{{k}}_m))\quad \forall \boldsymbol{{k}}=(\boldsymbol{{k}}_1,\dots,\boldsymbol{{k}}_m)\in\mathbb{N}^{n}. \end{aligned}$$
Proof
Since Y is a Banach space, the assumption in part (i) shows that for any \(\mathscr {A}\in \mathscr {E}_{(e_j)_{j=1}^{n}}(Y)\) the infinite sum in Eq. (7) converges absolutely to some limit \(\bar {\mathscr {A}}\). Since rectangular truncations of this sum yield point values \(\mathscr {A}(\boldsymbol {{k}})\), \(\boldsymbol {{k}}\in \mathbb {N}^{n}\) by part (i) of Proposition 1, the limit \(\mathscr {A}_{\infty }:=\lim _{\min _{j=1}^{n}{k}_j\to \infty }\mathscr {A}(\boldsymbol {{k}})\) exists and equals \(\bar {\mathscr {A}}\). Part (ii) follows from the triangle inequality.
For part (iii), observe that by part (ii) it suffices to show \(\mathscr {A}_{J}\in \mathscr {E}_{(e_j)_{j=1}^{n}}(Y)\) for all J ⊂{1, …, n}, where we consider \(\mathscr {A}_{J}\) as functions on \(\mathbb {N}^{n}\) depending only on the parameters indexed by J. Since \(\varDelta _{\text{mix}}=\varDelta _{\text{mix}}^{J}\circ \varDelta _{\text{mix}}^{J^C}\), where \(\varDelta _{\text{mix}}^{J}\) denotes the mixed difference operator acting on the parameters in J, we then obtain
Hence, it suffices to consider J = {1, …, n}. In this case, the assumption \(\|\mathscr {A}_{J}(\boldsymbol {{k}}_{J})\|{ }_{Y}\leq C\prod _{j\in J}e_{j}({k}_j)\) is equivalent to \(\varDelta _{\text{mix}}^{-1}\mathscr {A}_{J}\in \mathscr {E}_{(e_j)_{j=1}^{n}}(Y)\). Thus, it remains to show that Δ mix preserves \(\mathscr {E}_{(e_j)_{j=1}^{n}}(Y)\). This holds by part (ii) of this proposition together with part (ii) of Proposition 1 and the fact that shift operators preserve \(\mathscr {E}_{(e_j)_{j=1}^{n}}(Y)\), which itself follows from the assumption that the functions e j (⋅)∕e j (⋅ + 1) are uniformly bounded.
Finally, for part (iv) observe that by multilinearity of \(\mathscr {M}\) we have
$$\displaystyle \begin{aligned} \varDelta_{\text{mix}}\mathscr{M}(\mathscr{A}_1,\dots,\mathscr{A}_m)(\boldsymbol{{k}})=\mathscr{M}\big(\varDelta_{\text{mix}}\mathscr{A}_1(\boldsymbol{{k}}_1),\dots,\varDelta_{\text{mix}}\mathscr{A}_m(\boldsymbol{{k}}_m)\big), \end{aligned}$$
where the mixed difference operator on the left hand side acts on n = n 1 + ⋯ + n m coordinates, whereas those on the right hand side only act on the n i coordinates of \(\mathscr {A}_i\). By continuity of \(\mathscr {M}\) we have
$$\displaystyle \begin{aligned} \|\mathscr{M}(y_1,\dots,y_m)\|{}_{Y}\leq C\prod_{i=1}^{m}\|y_i\|{}_{Y_i}\quad \forall (y_1,\dots,y_m)\in\prod_{i=1}^{m}Y_i \end{aligned}$$
for some C > 0, from which the claim follows.
Parts (iii) and (iv) of the previous proposition provide sufficient conditions to verify \(\mathscr {A}\in \mathscr {E}_{(e_j)_{j=1}^{n}}(Y)\) without analyzing mixed differences directly.
Example 1
(i) After an exponential reparametrization, the assumptions in Eqs. (1) and (2) become
$$\displaystyle \begin{aligned} \|\mathscr{A}(\boldsymbol{{k}})-\mathscr{A}_\infty\|{}_{Y}\leq K_1 \sum_{j=1}^{n}\exp(-{\beta}_j{k}_j) \quad \forall \boldsymbol{{k}}\in\mathbb{N}^{n} \end{aligned}$$
and
$$\displaystyle \begin{aligned} \text{Work}(\mathscr{A}(\boldsymbol{{k}}))= K_2\prod_{j=1}^{n}\exp(\gamma_j{k}_j) \quad \forall \boldsymbol{{k}}\in\mathbb{N}^{n}, \end{aligned}$$
respectively. If we slightly strengthen the first and assume that
$$\displaystyle \begin{aligned} \mathscr{A}(\boldsymbol{{k}})-\mathscr{A}_{\infty}=\sum_{j=1}^{n}\mathscr{A}_j({k}_j)\quad \forall \boldsymbol{{k}}\in\mathbb{N}^n \end{aligned}$$
with functions \(\mathscr {A}_j\) that satisfy
$$\displaystyle \begin{aligned} \|\mathscr{A}_j({k}_j)\|{}_{Y}\leq C\exp(-{\beta}_j{k}_j)\quad \forall {k}_j\in\mathbb{N} \end{aligned}$$
for some C > 0 and \({\beta}_j>0\), j ∈{1, …, n}, then
$$\displaystyle \begin{aligned} \mathscr{A}\in\mathscr{E}_{(e_j)_{j=1}^{n}}(Y) \quad \text{with } e_{j}({k}_j):=\exp(-{\beta}_j{k}_j), \end{aligned}$$
by part (iii) of Proposition 2. Theorem 1 below then shows that Smolyak’s algorithm applied to \(\mathscr {A}\) requires only the work \(\epsilon ^{-\max _{j=1}^{n}\{\gamma _j/{\beta }_j\}}\), up to possible logarithmic factors, to achieve the accuracy 𝜖 > 0.
(ii) Assume we want to approximate the integral of a function \(f\colon [0,1]\to \mathbb {R}\) but are only able to evaluate approximations \(f_{{k}_2}\), \({k}_2\in \mathbb {N}\) of f with increasing cost as k 2 →∞. Given a sequence \(S_{{k}_1}\), \({k}_1\in \mathbb {N}\) of linear quadrature formulas, the straightforward approach would be to fix sufficiently large values of k 1 and k 2 and then approximate the integral of \(f_{{k}_2}\) with the quadrature formula \(S_{{k}_1}\). Formally, this can be written as
$$\displaystyle \begin{aligned} \mathscr{A}({k}_1,{k}_2):=S_{{k}_1}f_{{k}_2}. \end{aligned}$$
To show decay of the mixed differences \(\varDelta _{\text{mix}}\mathscr {A}\), observe that the application of quadrature formulas to functions is linear in both arguments, which means that we may write
$$\displaystyle \begin{aligned} \mathscr{A}({k}_1,{k}_2)=\mathscr{M}(S_{{k}_1},f_{{k}_2})=\mathscr{M}(\mathscr{A}_1({k}_1),\mathscr{A}_2({k}_2)), \end{aligned}$$
where \(\mathscr {A}_1({k}_1):=S_{{k}_1}\), \(\mathscr {A}_2({k}_2):=f_{{k}_2}\), and \(\mathscr {M}\) is the application of linear functionals to functions on [0, 1]. Assume, for example, that the functions \(f_{{k}_2}\) converge to f in some Banach space B of functions on [0, 1] as k 2 →∞, and that the quadrature formulas \(S_{{k}_1}\) converge to the integral operator \(\int \) in the continuous dual space B ∗ as k 1 →∞. The decay of the mixed differences \(\varDelta _{\text{mix}}\mathscr {A}({k}_1,{k}_2)\) then follows from part (iv) of Proposition 2, since \(\mathscr {M}\) is a continuous bilinear map from B ∗× B to \(\mathbb {R}\). We will see in Sect. 5.3 below that the application of Smolyak’s algorithm in this example yields so-called multilevel quadrature formulas. This connection between Smolyak’s algorithm and multilevel formulas was observed in [14].
(iii) Assume that we are given approximation methods \(\mathscr {A}_j\colon \mathbb {N}\to Y_j\), j ∈{1, …, n} that converge at the rates \(\|\mathscr {A}_{j}({k}_j)-\mathscr {A}_{\infty ,j}\|{ }_{Y_j}\leq e_j({k}_j)\) to limits \(\mathscr {A}_{\infty ,j}\in Y_j\), where \(e_j\colon \mathbb {N}\to \mathbb {R}_{>}\) are strictly decreasing functions. Define the tensor product algorithm
$$\displaystyle \begin{aligned} \begin{aligned} \mathscr{A}\colon\mathbb{N}^{n}\to Y:=Y_1\otimes\dots\otimes Y_{n},\\ \mathscr{A}(\boldsymbol{{k}}):=\mathscr{A}_{1}({k}_1)\otimes\dots\otimes\mathscr{A}_{n}({k}_n). \end{aligned} \end{aligned}$$
If the algebraic tensor product Y is equipped with a norm that satisfies \(\|y_1\otimes \dots \otimes y_n\|{ }_{Y}\leq \|y_1\|{ }_{Y_1}\dots \|y_{n}\|{ }_{Y_{n}}\), then \(\mathscr {A}\in \mathscr {E}_{(e_j)_{j=1}^{n}}(Y)\). Indeed, \(\mathscr {A}_j\in \mathscr {E}_{e_j}(Y_j)\) by part (iii) of Proposition 2, thus \(\mathscr {A}\in \mathscr {E}_{(e_j)_{j=1}^{n}}(Y)\) by part (iv) of the same proposition.
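For part (ii) of this example, the decay of the mixed differences can be observed numerically. In the sketch below, a univariate midpoint rule and Taylor partial sums of the exponential are our own stand-ins for \(S_{{k}_1}\) and \(f_{{k}_2}\); the mixed difference comes out roughly an order of magnitude smaller than either unidirectional difference:

```python
import itertools
import math

def quad(k1, g):
    """S_k1: univariate midpoint rule with 2**k1 nodes (our stand-in
    for a convergent sequence of linear quadrature formulas)."""
    N = 2 ** k1
    return sum(g((i + 0.5) / N) for i in range(N)) / N

def f_approx(k2):
    """f_k2: partial sums of the exponential series, converging to exp
    in C([0,1]) with increasing cost as k2 grows."""
    return lambda x: sum(x ** m / math.factorial(m) for m in range(k2 + 1))

def A(k):
    # A(k1, k2) := S_k1 f_k2, bilinear in the pair (S_k1, f_k2)
    return quad(k[0], f_approx(k[1]))

def delta_mix(k):
    # inclusion-exclusion form of the mixed difference (Proposition 1 (ii))
    total = 0.0
    for e in itertools.product((0, 1), repeat=2):
        shifted = (k[0] - e[0], k[1] - e[1])
        if min(shifted) >= 0:
            total += (-1) ** sum(e) * A(shifted)
    return total
```

The mixed difference is essentially a product of the two unidirectional error decrements, which is exactly the product-type decay that the spaces \(\mathscr {E}_{(e_j)_{j=1}^{n}}\) encode.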
Similar to the product type decay assumption on the norms \(\|\varDelta _{\text{mix}}\mathscr {A}(\boldsymbol {{k}})\|{ }_{Y}\), which we expressed in the spaces \(\mathscr {E}_{(e_j)_{j=1}^{n}}\), we assume in the remainder that
$$\displaystyle \begin{aligned} \text{Work}(\varDelta_{\text{mix}}\mathscr{A}(\boldsymbol{{k}}))\leq K_2\prod_{j=1}^{n}w_j({k}_j)\quad \forall \boldsymbol{{k}}\in\mathbb{N}^{n} \end{aligned} $$ (10)
for some K 2 > 0 and increasing functions \(w_j\colon \mathbb {N}\to \mathbb {R}_{>}\). By part (ii) of Proposition 1, such a bound follows from the same bound on the evaluations \(\mathscr {A}(\boldsymbol {{k}})\) themselves.
3.1 Knapsack Problem
The goal of this subsection is to describe quasi-optimal truncations of the decomposition in Eq. (7) for functions \(\mathscr {A}\in \mathscr {E}_{(e_j)_{j=1}^{n}}(Y)\) that satisfy Eq. (10). Given a work budget W > 0, a quasi-optimal index set solves the knapsack problem
$$\displaystyle \begin{aligned} \text{maximize}\quad |\mathscr{I}|{}_{e}:=\sum_{\boldsymbol{{k}}\in\mathscr{I}}\prod_{j=1}^{n}e_j({k}_j)\quad\text{subject to}\quad |\mathscr{I}|{}_{w}:=\sum_{\boldsymbol{{k}}\in\mathscr{I}}\prod_{j=1}^{n}w_j({k}_j)\leq W. \end{aligned} $$ (11)
The term that is maximized here is motivated by
$$\displaystyle \begin{aligned} \|\mathscr{S}_{\mathscr{I}}(\mathscr{A})-\mathscr{A}_{\infty}\|{}_{Y}\leq\sum_{\boldsymbol{{k}}\in\mathbb{N}^{n}\setminus\mathscr{I}}\|\varDelta_{\text{mix}}\mathscr{A}(\boldsymbol{{k}})\|{}_{Y}\leq C\Big(\sum_{\boldsymbol{{k}}\in\mathbb{N}^{n}}\prod_{j=1}^{n}e_j({k}_j)-|\mathscr{I}|{}_{e}\Big). \end{aligned}$$
Proposition 3 below shows that for any W > 0 the knapsack problem has an optimal value. However, finding corresponding optimal sets is NP-hard [19, Section 1.3]. As a practical alternative one can use Dantzig’s approximation algorithm [19, Section 2.2.1], which selects indices for which the ratio between contribution and work is above some threshold δ(W) > 0,
$$\displaystyle \begin{aligned} \mathscr{I}_{W}:=\Big\{\boldsymbol{{k}}\in\mathbb{N}^{n}\,:\,\prod_{j=1}^{n}e_j({k}_j)\Big/\prod_{j=1}^{n}w_j({k}_j)\geq\delta(W)\Big\}, \end{aligned} $$ (12)
where δ(W) is chosen minimally such that \(|\mathscr {I}_{W}|{ }_{w}\leq W\).
Proposition 3
(i) The knapsack problem in Eq.(11) has a (not necessarily unique) solution, in the sense that a maximal value of \(|\mathscr {I}|{ }_{e}\) is attained. We denote this maximal value by E ∗(W).
(ii) Any set \(\mathscr {I}^{*}\) for which \(|\mathscr {I}^{*}|{ }_{e}=E^*(W)\) is finite and downward closed: If \(\boldsymbol {{k}}\in \mathscr {I}^{*}\) and \(\tilde {\boldsymbol {{k}}}\in \mathbb {N}^{n}\) satisfies \(\tilde {\boldsymbol {{k}}}\leq \boldsymbol {{k}}\) componentwise, then \(\tilde {\boldsymbol {{k}}}\in \mathscr {I}^{*}\) . The same holds for the set \(\mathscr {I}_{W}\) from Eq.(12).
(iii) The set \(\mathscr {I}_{W}\) from Eq.(12) satisfies
$$\displaystyle \begin{aligned} |\mathscr{I}_{W}|{}_{e}\geq \frac{|\mathscr{I}_{W}|{}_{w}}{W} E^*(W). \end{aligned}$$
This means that if \(\mathscr {I}_{W}\) uses all of the available work budget, \(|\mathscr {I}_{W}|{ }_{w}=W\) , then it is a solution to the knapsack problem. In particular, Dantzig’s solutions are optimal for the work \(|\mathscr {I}_{W}|{ }_{w}\) they require, but not necessarily for the work W they were designed for.
Proof
There is an upper bound N on the cardinality of admissible sets in Eq. (11) since the functions w j are increasing and strictly positive. Furthermore, replacing an element k of an admissible set by \(\tilde {\boldsymbol {{k}}}\) with \(\tilde {\boldsymbol {{k}}}\leq \boldsymbol {{k}}\) decreases |⋅| w and increases |⋅| e . This proves parts (i) and (ii), as there are only finitely many downward closed sets of cardinality less than N (for example, all such sets are subsets of {0, …, N − 1}n). Part (iii) follows directly from the inequality \(|\mathscr {I}_{W}|{ }_e/|\mathscr {I}_{W}|{ }_{w}\geq |\mathscr {I}^*|{ }_{e}/|\mathscr {I}^*|{ }_{w}\), where \(\mathscr {I}^{*}\) is a set that attains the maximal value E ∗(W).
Even in cases where no bounding functions e j and w j are available, parts (ii) and (iii) of the previous proposition serve as motivation for adaptive algorithms that progressively build a downward closed set \(\mathscr {I}\) by adding at each step a multi-index that maximizes a gain-to-work estimate [6, 15].
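A minimal sketch of the threshold selection from Eq. (12) follows. All names are our own, and the greedy stopping rule is a simplification: a faithful implementation would lower the threshold δ(W) continuously rather than stop at the first index that overflows the budget.

```python
import itertools

def threshold_set(e, w, W, box=30):
    """Greedy sketch of Dantzig-style selection: sort candidate indices by
    their contribution-to-work ratio prod_j e[j](k_j) / prod_j w[j](k_j)
    and add them until the work budget W would be exceeded.
    e[j] (decreasing) and w[j] (increasing) are per-direction functions;
    'box' bounds the finite candidate range per direction."""
    n = len(e)
    cands = []
    for k in itertools.product(range(box), repeat=n):
        contrib, work = 1.0, 1.0
        for j in range(n):
            contrib *= e[j](k[j])
            work *= w[j](k[j])
        cands.append((contrib / work, work, k))
    cands.sort(reverse=True)          # best ratio first
    chosen, total_work = set(), 0.0
    for ratio, work, k in cands:
        if total_work + work > W:
            break
        chosen.add(k)
        total_work += work
    return chosen

# Exponential decay/growth in each of n = 2 directions, work budget W = 10:
I_W = threshold_set([lambda k: 2.0 ** -k] * 2, [lambda k: 2.0 ** k] * 2, 10.0)
```

Because the ratio is componentwise decreasing, the selected set is downward closed, in line with part (ii) of Proposition 3.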
3.2 Combination Rule
Part (ii) of Proposition 1 provides a way to express the approximations \(\mathscr {S}_{\mathscr {I}}(\mathscr {A})\) succinctly as linear combinations of different values of \(\mathscr {A}\). This yields the combination rule, which in its general form says that
$$\displaystyle \begin{aligned} \mathscr{S}_{\mathscr{I}}(\mathscr{A})=\sum_{\boldsymbol{{k}}\in\mathscr{I}}c_{\boldsymbol{{k}}}\mathscr{A}(\boldsymbol{{k}}) \end{aligned}$$
with
$$\displaystyle \begin{aligned} c_{\boldsymbol{{k}}}:=\sum_{\substack{\mathbf{e}\in\{0,1\}^{n}\\ \boldsymbol{{k}}+\mathbf{e}\in\mathscr{I}}}(-1)^{|\mathbf{e}|{}_{1}} \end{aligned} $$ (13)
for any downward closed set \(\mathscr {I}\). It is noteworthy that \(c_{\boldsymbol {{k}}}=0\) for all k with \(\boldsymbol {{k}}+(1,\dots ,1)\in \mathscr {I}\), because for such k the sum in Eq. (13) is simply the expansion of \((1-1)^{n}\).
When \(\mathscr {I}\) is a standard simplex, \(\mathscr {I}=\{\boldsymbol {{k}}\in \mathbb {N}^{n}:|\boldsymbol {{k}}|{ }_{1}\leq L\}\), the following explicit formula holds [25]:
$$\displaystyle \begin{aligned} \mathscr{S}_{\mathscr{I}}(\mathscr{A})=\sum_{L-n+1\leq|\boldsymbol{{k}}|{}_{1}\leq L}(-1)^{L-|\boldsymbol{{k}}|{}_{1}}\binom{n-1}{L-|\boldsymbol{{k}}|{}_{1}}\mathscr{A}(\boldsymbol{{k}}). \end{aligned} $$ (14)
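The coefficients from Eq. (13) can be computed directly for any downward closed set; for simplex sets the result matches the explicit binomial formula of [25]. A sketch (function names are our own):

```python
import itertools
import math

def combination_coefficients(I):
    """c_k = sum of (-1)^|e|_1 over e in {0,1}^n with k + e in I,
    for a downward closed index set I given as an iterable of tuples."""
    I = set(I)
    n = len(next(iter(I)))
    coeffs = {}
    for k in I:
        c = sum((-1) ** sum(e)
                for e in itertools.product((0, 1), repeat=n)
                if tuple(kj + ej for kj, ej in zip(k, e)) in I)
        if c != 0:
            coeffs[k] = c
    return coeffs

# Standard simplex {k in N^2 : |k|_1 <= L} with L = 3: the nonzero
# coefficients are (-1)^(L-|k|_1) * binom(n-1, L-|k|_1) for |k|_1 >= L-n+1.
L, n = 3, 2
simplex = [k for k in itertools.product(range(L + 1), repeat=n) if sum(k) <= L]
coeffs = combination_coefficients(simplex)
```

Note that the coefficients sum to 1, as they must for the combination rule to reproduce constant methods exactly.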
4 Convergence Analysis
4.1 Finite-Dimensional Case
We consider an approximation method \(\mathscr {A}\in \mathscr {E}_{(e_j)_{j=1}^{n}}(Y)\) with
$$\displaystyle \begin{aligned} e_j({k}_j)=K_{j,1}({k}_j+1)^{s_j}\exp(-{\beta}_j{k}_j) \end{aligned}$$
and assume that
$$\displaystyle \begin{aligned} w_j({k}_j)=K_{j,2}({k}_j+1)^{t_j}\exp(\gamma_j{k}_j) \end{aligned}$$
with \(K_{j,1}>0\), \(K_{j,2}>0\), \({\beta}_j>0\), \(\gamma_j>0\), \(s_j\geq 0\), \(t_j\geq 0\). The required calculations with \(\boldsymbol {s}\equiv \boldsymbol {t}\equiv 0\) were previously done in various specific contexts, see for example [13]. According to Proposition 3, quasi-optimal index sets are given by
$$\displaystyle \begin{aligned} \mathscr{I}_{\delta}:=\Big\{\boldsymbol{{k}}\in\mathbb{N}^{n}\,:\,\frac{K_1}{K_2}(\boldsymbol{{k}}+\mathbf{1})^{\boldsymbol{s}-\boldsymbol{t}}\exp(-(\boldsymbol{{\beta}}+\boldsymbol{\gamma})\cdot\boldsymbol{{k}})\geq\delta\Big\} \end{aligned}$$
for δ > 0, where \(K_1:=\prod _{j=1}^{n}K_{j,1}\), \(K_2:=\prod _{j=1}^{n}K_{j,2}\), and β := (β 1, …, β n ), γ := (γ 1, …, γ n ). For the analysis in this section, we use the slightly simplified sets
$$\displaystyle \begin{aligned} \mathscr{I}_{L}:=\{\boldsymbol{{k}}\in\mathbb{N}^{n}\,:\,(\boldsymbol{{\beta}}+\boldsymbol{\gamma})\cdot\boldsymbol{{k}}\leq L\} \end{aligned}$$
with L →∞, where, by abuse of notation, we distinguish the two families of sets by the subscript letter.
The work required by \(\mathscr {S}_{L}(\mathscr {A}):=\mathscr {S}_{\mathscr {I}_{L}}(\mathscr {A})\) satisfies
$$\displaystyle \begin{aligned} \text{Work}(\mathscr{S}_{L}(\mathscr{A}))\leq K_2\sum_{\boldsymbol{{k}}\in\mathscr{I}_{L}}(\boldsymbol{{k}}+\mathbf{1})^{\boldsymbol{t}}\exp(\boldsymbol{\gamma}\cdot\boldsymbol{{k}}) \end{aligned} $$ (15)
with \((\boldsymbol {{k}}+\mathbf {1})^{\boldsymbol {t}}:=\prod _{j=1}^{n}({k}_j+1)^{t_j}\). Similarly, the approximation error satisfies
$$\displaystyle \begin{aligned} \|\mathscr{S}_{L}(\mathscr{A})-\mathscr{A}_{\infty}\|{}_{Y}\leq K_1\sum_{\boldsymbol{{k}}\in\mathbb{N}^{n}\setminus\mathscr{I}_{L}}(\boldsymbol{{k}}+\mathbf{1})^{\boldsymbol{s}}\exp(-\boldsymbol{{\beta}}\cdot\boldsymbol{{k}}). \end{aligned} $$ (16)
The exponential sums appearing in the work and residual bounds above are estimated in the appendix of this work, with the results
$$\displaystyle \begin{aligned} \sum_{\boldsymbol{{k}}\in\mathscr{I}_{L}}(\boldsymbol{{k}}+\mathbf{1})^{\boldsymbol{t}}\exp(\boldsymbol{\gamma}\cdot\boldsymbol{{k}})\leq C(n,\boldsymbol{{\beta}},\boldsymbol{\gamma},\boldsymbol{t})\,L^{(n^*-1)+t^*}\exp\Big(\frac{\rho}{1+\rho}L\Big) \end{aligned} $$ (17)
and
$$\displaystyle \begin{aligned} \sum_{\boldsymbol{{k}}\in\mathbb{N}^{n}\setminus\mathscr{I}_{L}}(\boldsymbol{{k}}+\mathbf{1})^{\boldsymbol{s}}\exp(-\boldsymbol{{\beta}}\cdot\boldsymbol{{k}})\leq C(n,\boldsymbol{{\beta}},\boldsymbol{\gamma},\boldsymbol{s})\,L^{(n^*-1)+s^*}\exp\Big(-\frac{1}{1+\rho}L\Big), \end{aligned} $$ (18)
where \(\rho :=\max _{j=1}^{n}\gamma _j/{\beta }_j\), J := {j ∈{1, …, n} : γ j ∕β j = ρ}, n ∗ := |J|, s ∗ :=∑j ∈ J s j , t ∗ :=∑j ∈ J t j . We may now formulate the main result of this section by rewriting the bound in Eq. (17) in terms of the right-hand side of Eq. (18).
Theorem 1
Under the previously stated assumptions on \(\mathscr {A}\) and for small enough 𝜖 > 0, we may choose L > 0 such that
$$\displaystyle \begin{aligned} \|\mathscr{S}_{L}(\mathscr{A})-\mathscr{A}_{\infty}\|{}_{Y}\leq \epsilon \end{aligned}$$
and
$$\displaystyle \begin{aligned} \text{Work}(\mathscr{S}_{L}(\mathscr{A}))\leq C(n,K_1,K_2,\boldsymbol{{\beta}},\boldsymbol{\gamma},\boldsymbol{s},\boldsymbol{t})\,\epsilon^{-\rho}\,|\log \epsilon|{}^{(n^*-1)(1+\rho)+\rho s^*+t^*}. \end{aligned}$$
□
This means that we have eliminated the sum in the exponent of the bound in Eq. (4), as announced in Sect. 1. The additional logarithmic factors in Theorem 1 vanish if the worst ratio of work and convergence exponents, ρ, is attained only for a single index \(j_{\max }\in \{1,\dots ,n\}\) and if \(t_{j_{\max }}=s_{j_{\max }}=0\).
Remark 1
If γ ≡ 0 and β ≡ 0, that is when both work and residual depend algebraically on all parameters, then an exponential reparametrization, \(\exp (\tilde {\boldsymbol {{k}}}):=\boldsymbol {{k}}\), takes us back to the situation considered above. The preimage of \(\mathscr {I}_{L}=\{\tilde {\boldsymbol {{k}}} : ({\boldsymbol {s}}+{\boldsymbol {t}})\cdot \tilde {\boldsymbol {{k}}}\leq L\}\) under this reparametrization is \(\{\boldsymbol {{k}} : \prod _{j=1}^{n}{k}_j^{s_j+t_j}\leq \exp (L)\}\), whence the name hyperbolic cross approximation [5].
Remark 2
When the terms \(\varDelta _{\text{mix}}\mathscr {A}(\boldsymbol {{k}})\), \(\boldsymbol {{k}}\in \mathbb {N}^{n}\) are orthogonal to each other, we may substitute the Pythagorean theorem for the triangle inequality in Eq. (16). As a result, the exponent of the logarithmic factor in Theorem 1 reduces to (n ∗− 1)(1 + ρ∕2) + ρs ∗ + t ∗.
4.2 Infinite-Dimensional Case
The theory of the previous sections can be extended to the case n = ∞. In this case the decomposition in Eq. (7) becomes
$$\displaystyle \begin{aligned} \mathscr{A}_{\infty}=\sum_{\boldsymbol{{k}}\in\mathbb{N}^{\infty}_{c}}\varDelta_{\text{mix}}\mathscr{A}(\boldsymbol{{k}}), \end{aligned} $$ (19)
where \(\mathbb {N}^{\infty }_{c}\) are the sequences with finite support, and \(\varDelta _{\text{mix}}\mathscr {A}(\boldsymbol {{k}})\) is defined as \(\varDelta _{1}\circ \dots \circ \varDelta _{n_{\max }}\mathscr {A}(\boldsymbol {{k}})\), where \(n_{\max }\) is a bound on the support of k. In particular, every term in Eq. (19) is a linear combination of values of \(\mathscr {A}\) with only finitely many nonzero discretization parameters.
We consider the case \(\mathscr {A}\in \mathscr {E}_{(e_j)_{j=1}^{\infty }}(Y)\) for
$$\displaystyle \begin{aligned} e_j({k}_j):=K_{j,1}({k}_j+1)^{s}\exp(-{\beta}_j{k}_j) \end{aligned}$$
and s ≥ 0, \(K_1:=\prod _{j=1}^{\infty }K_{j,1}<\infty \), and we assume constant computational work for the evaluation of the mixed differences \(\varDelta _{\text{mix}}\mathscr {A}(\boldsymbol {{k}})\), i.e. w j ≡ C in Eq. (10) for all j ≥ 1. Similarly to the finite-dimensional case, we consider sets
$$\displaystyle \begin{aligned} \mathscr{I}_{L}:=\{\boldsymbol{{k}}\in\mathbb{N}^{\infty}_{c}\,:\,\boldsymbol{{\beta}}\cdot\boldsymbol{{k}}\leq L\} \end{aligned}$$
and the associated Smolyak algorithm
$$\displaystyle \begin{aligned} \mathscr{S}_{L}(\mathscr{A}):=\sum_{\boldsymbol{{k}}\in\mathscr{I}_{L}}\varDelta_{\text{mix}}\mathscr{A}(\boldsymbol{{k}}). \end{aligned}$$
The following theorem is composed of results from [10] on interpolation and integration of analytic functions; the calculations there transfer directly to the general setting.
Theorem 2
Let L > 0 and define \(N:=|\mathscr {I}_{L}|=\mathit{\text{Work}}(\mathscr {S}_{L}(\mathscr {A}))\).
(i) Assume s = 0.
- [ 10 , Theorem 3.2] If there exists β 0 > 1 such that \(M:=M({\beta }_0,({\beta }_j)_{j=1}^{\infty }):=\sum _{j=1}^{\infty }\frac {1}{\exp ({\beta }_j/{\beta }_0)-1}<\infty \) , then
$$\displaystyle \begin{aligned} \|\mathscr{S}_{L}(\mathscr{A})-\mathscr{A}_{\infty}\|{}_{Y}\leq \frac{K_1 }{{\beta}_0}\exp({\beta}_0M)N^{-({\beta}_0-1)}, \end{aligned}$$
which implies
$$\displaystyle \begin{aligned} \mathit{\text{Work}}(\mathscr{S}_{L}(\mathscr{A}))\leq C(K_1,\beta_0,M)\epsilon^{-1/({\beta}_0-1)} \end{aligned}$$
for \(\epsilon :=\frac {K_1}{{\beta }_0}\exp ({\beta }_0M)N^{-({\beta }_0-1)}\).
- [ 10 , Theorem 3.4] If β j ≥ β 0 j for β 0 > 0, j ≥ 1, then
$$\displaystyle \begin{aligned} \|\mathscr{S}_{L}(\mathscr{A})-\mathscr{A}_{\infty}\|{}_{Y}\leq \frac{2}{{\beta}_0\sqrt{\log N}}N^{1+\frac{1}{4}{\beta}_0-\frac{3}{8}{\beta}_0(\log N)^{1/2}}. \end{aligned}$$
(ii) Assume s > 0.
- [ 10 , Corollary 4.2 (i)] If there exist β 0 > 1 and δ > 0 such that \(M({\beta }_0,((1-\delta ){\beta }_j)_{j=1}^{\infty })<\infty \) , then
$$\displaystyle \begin{aligned} \|\mathscr{S}_{L}(\mathscr{A})-\mathscr{A}_{\infty}\|{}_{Y}\leq C(K_1,\delta,{\beta}_0,M,({\beta}_j)_{j\in\mathbb{N}},s)N^{-({\beta}_0-1)}, \end{aligned}$$
which implies
$$\displaystyle \begin{aligned} \mathit{\text{Work}}(\mathscr{S}_{L}(\mathscr{A}))\leq C(K_1,\delta,{\beta}_0,M,({\beta}_j)_{j\in\mathbb{N}},s)\epsilon^{-1/({\beta}_0-1)} \end{aligned}$$
for \(\epsilon :=C(K_1,\delta ,{\beta }_0,M,({\beta }_j)_{j\in \mathbb {N}},s)N^{-({\beta }_0-1)}\).
- [ 10 , Corollary 4.2 (ii)] If β j ≥ β 0 j for β 0 > 0, then for every \(\hat {{\beta }}_0<{\beta }_0\) we have
$$\displaystyle \begin{aligned} \|\mathscr{S}_{L}(\mathscr{A})-\mathscr{A}_{\infty}\|{}_{Y}\leq \frac{C(\hat{{\beta}}_0,M,b)}{\sqrt{\log N}}N^{1+\frac{\hat{{\beta}}_0}{4}-\frac{3}{8}\hat{{\beta}}_0(\log N)^{1/2}}. \end{aligned}$$
□
Remark 3
For alternative approaches to infinite-dimensional problems, which allow even for exponential type work bounds, \(w_j({k}_j)=K_{j,2}\exp (\gamma _j{k}_j)\), consider for example [4, 22, 23].
5 Applications
5.1 High-Dimensional Interpolation and Integration
Smolyak introduced the algorithm that now bears his name in [24] to obtain efficient high-dimensional integration and interpolation formulas from univariate building blocks. For example, assume we are given univariate interpolation formulas \(S_{k}\), \({k}\in \mathbb {N}\) for functions in a Sobolev space \(H^{{\beta}}([0,1])\) that are based on evaluations in \(2^{k}\) points in [0, 1] and converge at the rate
$$\displaystyle \begin{aligned} \|S_{k}f-f\|{}_{H^{\alpha}([0,1])}\leq C2^{-{k}({\beta}-\alpha)}\|f\|{}_{H^{{\beta}}([0,1])} \end{aligned}$$
for some 0 ≤ α < β. A straightforward high-dimensional interpolation formula is then the corresponding tensor product formula
$$\displaystyle \begin{aligned} S_{k_1}\otimes\dots\otimes S_{k_{n}}\colon \bigotimes_{j=1}^{n}H^{{\beta}}([0,1])\to \bigotimes_{j=1}^{n}H^{\alpha}([0,1]) \end{aligned}$$
for \((k_1,\dots ,k_{n})\in \mathbb {N}^{n}\), where we consider both tensor product spaces to be completed with respect to the corresponding Hilbert space tensor norm [12]. This can be interpreted as a numerical approximation method with values in a space of linear operators,
$$\displaystyle \begin{aligned} \mathscr{A}\colon\mathbb{N}^{n}\to Y:=\mathscr{L}\Big(\bigotimes_{j=1}^{n}H^{{\beta}}([0,1]),\bigotimes_{j=1}^{n}H^{\alpha}([0,1])\Big),\qquad \mathscr{A}(\boldsymbol{{k}}):=S_{k_1}\otimes\dots\otimes S_{k_{n}}, \end{aligned}$$
whose discretization parameters k = (k 1, …, k n ) determine the resolution of interpolation nodes in each direction j ∈{1, …, n}.
If we associate as work with \(\mathscr {A}(\boldsymbol {{k}})\) the number of required point evaluations,
$$\displaystyle \begin{aligned} \text{Work}(\mathscr{A}(\boldsymbol{{k}})):=\prod_{j=1}^{n}2^{{k}_j}, \end{aligned}$$
then we are in the situation described in Sect. 4.1. Indeed, we have \(\mathscr {A}\in \mathscr {E}_{(e_j)_{j=1}^{n}}(Y)\) with \(e_{j}({k}_j):=2^{-{k}_j(\beta -\alpha )}\) by part (iii) of Example 1, since the operator norm of a tensor product operator between Hilbert space tensor products factorizes into the product of the operator norms of the constituent operators (see [12, Proposition 4.127] and [3, Section 26.7]).
In particular, the straightforward tensor product formulas \(\mathscr {A}({k},\dots ,{k})\) require the work
$$\displaystyle \begin{aligned} \text{Work}\gtrsim \epsilon^{-n/({\beta}-\alpha)} \end{aligned}$$
to approximate the identity operator with accuracy 𝜖 > 0 in the operator norm, whereas Smolyak’s algorithm \(\mathscr {S}_{L}(\mathscr {A})\) with an appropriate choice of L = L(𝜖) achieves the same accuracy with
$$\displaystyle \begin{aligned} \text{Work}(\mathscr{S}_{L}(\mathscr{A}))\lesssim \epsilon^{-1/({\beta}-\alpha)}\,|\log\epsilon|{}^{(n-1)(1+1/({\beta}-\alpha))} \end{aligned}$$
according to Theorem 1. Here and in the following, we denote by \(\lesssim \) estimates that hold up to factors that are independent of 𝜖. As a linear combination of tensor product operators, Smolyak’s algorithm \(\mathscr {S}_{L}(\mathscr {A})\) is a linear interpolation formula based on evaluations in the union of certain tensor grids. These unions are commonly known as sparse grids [2, 7, 26].
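As a quadrature analogue of this construction (our own toy example: univariate midpoint rules in place of the interpolation formulas, and the explicit simplex combination rule in n = 2 dimensions), the sparse combination converges with far fewer function evaluations than the full tensor grid:

```python
import itertools
import math

def tensor_quad(k, f):
    """A(k): tensor product midpoint quadrature over [0,1]^2 with
    2**k[j] nodes in direction j (our stand-in for the tensor formulas)."""
    N1, N2 = 2 ** k[0], 2 ** k[1]
    return sum(f((i + 0.5) / N1, (j + 0.5) / N2)
               for i in range(N1) for j in range(N2)) / (N1 * N2)

def smolyak_quad(L, f):
    """Combination rule on the simplex {k : |k|_1 <= L} in n = 2 dimensions,
    with coefficients (-1)^(L-|k|_1) * binom(1, L-|k|_1)."""
    total = 0.0
    for k in itertools.product(range(L + 1), repeat=2):
        d = L - sum(k)
        if 0 <= d <= 1:
            total += (-1) ** d * math.comb(1, d) * tensor_quad(k, f)
    return total

f = lambda x, y: math.exp(x + y)      # exact integral over [0,1]^2: (e-1)^2
exact = (math.e - 1) ** 2
err_smolyak = abs(smolyak_quad(6, f) - exact)
```

Here `smolyak_quad(6, f)` spends about 640 function evaluations across the union of tensor grids, while the full grid `tensor_quad((6, 6), f)` spends 4096; for smooth integrands both errors are of comparable size, illustrating the work reduction above.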
Remark 4
Interpolation of functions in general Banach spaces, with convergence measured in different general Banach spaces can be treated in the same manner. However, more care has to be taken with the tensor products. Once the algebraic tensor products of the function spaces are equipped with reasonable cross norms [12] and completed, it has to be verified that the operator norm of linear operators between the tensor product spaces factorizes. Unlike for Hilbert spaces, this is not always true for general Banach spaces. However, it is true whenever the codomain is equipped with the injective tensor norm, or when the domain is equipped with the projective tensor norm [12, Sections 4.2.9 and 4.2.12]. For example, the L ∞-norm (and the similar C k-norms) is an injective tensor norm on the product of L ∞-spaces, while the L 1-norm is a projective norm on the tensor product of L 1-spaces.
5.2 Monte Carlo Path Simulation
Consider a stochastic differential equation (SDE)
\(dS(t)=a(t,S(t))\,dt+b(t,S(t))\,dW(t),\quad 0\leq t\leq T,\qquad (20)\)
with a Wiener process W(t) and sufficiently regular coefficients \(a,b\colon [0,T]\times \mathbb {R}^d\to \mathbb {R}\). A common goal in the numerical approximation of such SDEs is to compute expectations of the form
\(E[Q(S(T))],\)
where \(Q\colon \mathbb {R}^d\to \mathbb {R}\) is a Lipschitz-continuous quantity of interest of the final state S(T). To approach this problem numerically, we first define random variables S N (t), 0 ≤ t ≤ T as the forward Euler approximations of Eq. (20) with N ≥ 1 time steps. Next, we approximate the expectations E[Q(S N (T))] by Monte Carlo sampling using M ≥ 1 independent samples \(S^1_{N}(T),\dots ,S^{M}_{N}(T)\) that are computed using independent realizations of the Wiener process. Together, this gives rise to the numerical approximation
\(\mathscr {A}(M,N):=\frac {1}{M}\sum _{i=1}^{M}Q\big (S^{i}_{N}(T)\big ).\)
For fixed values of M and N this is a random variable that satisfies
\(E\big [\big (\mathscr {A}(M,N)-E[Q(S(T))]\big )^{2}\big ]^{1/2}\lesssim M^{-1/2}+N^{-1},\)
where the inequality holds by the weak rate of convergence of the Euler method [17, Section 14.1] and by its L 2-boundedness as N →∞. This shows that the random variables \(\mathscr {A}(M,N)\) converge to the limit \(\mathscr {A}_{\infty }=E[Q(S(T))]\), which itself is just a deterministic real number, in the sense of probabilistic mean square convergence as M, N →∞. To achieve a mean square error of order 𝜖 2 > 0, this straightforward approximation requires the simulation of M ≈ 𝜖 −2 sample paths of Eq. (20), each with N ≈ 𝜖 −1 time steps, which incurs the total work
\(M\cdot N\approx \epsilon ^{-3}.\)
Smolyak’s algorithm allows us to achieve the same accuracy with, up to a logarithmic factor, the reduced work 𝜖 −2 of usual Monte Carlo integration. To apply the results of Sect. 4.1, we consider the reparametrized algorithm \(\mathscr {A}(k,l)\) with
\(M_{k}:=2^{2k/3}\quad \text{and}\quad N_{l}:=2^{2l/3},\)
for which the convergence and work parameters of Sect. 4.1 attain the values β j = 1∕3, γ j = 2∕3, and s j = t j = 0, j ∈{1, 2}. (Here and in the following we implicitly round up non-integer values, which increases the required work only by a constant factor.) Indeed, we may write
\(\mathscr {A}(k,l)=\mathscr {M}\big (\mathscr {A}_1(k),\mathscr {A}_2(l)\big ),\)
where \(\mathscr {A}_1(k)\), \(k\in \mathbb {N}\) is the operator that maps random variables to an empirical average over M k independent samples, \(\mathscr {A}_2(l)\), \(l\in \mathbb {N}\) is the random variable \(Q(S_{N_l}(T))\), and \(\mathscr {M}\) denotes the application of linear operators to random variables. Since \(\mathscr {A}_1(k)\) converges in the operator norm to the expectation operator on the space of square integrable random variables at the usual Monte Carlo convergence rate \(M_k^{-1/2}\) as k →∞, and \(\mathscr {A}_2(l)\) converges to Q(S(T)) at the strong convergence rate \(N_l^{-1/2}\) of the Euler method in the L 2-norm [17, Section 10.2] as l →∞, and since \(\mathscr {M}\) is linear in both arguments, the claimed values of the convergence parameters β j , j ∈{1, 2} hold by part (iv) of Proposition 2.
Theorem 1 now shows that choosing L = L(𝜖) such that
\(E\big [\big (\mathscr {S}_{L}(\mathscr {A})-E[Q(S(T))]\big )^{2}\big ]\lesssim \epsilon ^{2}\)
incurs the work
\(\lesssim \epsilon ^{-2}|\log \epsilon |.\qquad (21)\)
To link this result to the keyword multilevel approximation, we observe that, thanks to our particular choice of parametrization, Smolyak’s algorithm from Sect. 4.1 takes the simple form
\(\mathscr {S}_{L}(\mathscr {A})=\sum _{k+l\leq L}\varDelta _{\text{mix}}\mathscr {A}(k,l).\)
Since Δ mix = Δ 1 ∘ Δ 2 and \(\varDelta _{1}={\Sigma }_{1}^{-1}\) we may further write
\(\mathscr {S}_{L}(\mathscr {A})=\sum _{l=0}^{L}\mathscr {M}\big (\mathscr {A}_{1}(L-l),\varDelta _{2}\mathscr {A}_{2}(l)\big ),\qquad (22)\)
which reveals that Smolyak’s algorithm employs a large number of samples from the coarse approximation \(S_{N_0}(T)\), and subsequently improves on the resulting estimate of E[Q(S(T))] by adding approximations of the expectations \(E\left [Q(S_{N_{l}}(T))-Q(S_{N_{l-1}}(T))\right ]\), l ∈{1, …, L}, that are computed using fewer samples.
Equation (22) is a multilevel formula of the form analyzed in [8, 16]. Alternatively, this formula could also be deduced directly from the combination rule for triangles in Sect. 5.4. Compared to the analysis in [8], our presentation has two shortcomings: First, our analysis only exploits the strong rate of the discretization method used to approximate Eq. (20). In the situation considered above, this does not affect the results, but for more slowly converging schemes a faster weak convergence rate may be exploited to obtain improved convergence rates. Second, the bound in Eq. (21) is larger than that in [8] by the factor \(|\log \epsilon |\). This factor can be removed by using independent samples for different values of l in Eq. (22), since we may then apply Remark 2.
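The multilevel structure of Eq. (22) can be sketched in a few lines of Python. The following illustration uses geometric Brownian motion with Q the identity; the coupling of coarse and fine Euler paths, the coarsest step count n0 = 2, and the sample decay M_l ≈ M0·4⁻ˡ are our own illustrative choices, not prescriptions from the text:

```python
import math
import random

def euler_pair(mu, sigma, s0, T, n_fine, rng):
    """Coupled Euler paths of dS = mu*S dt + sigma*S dW: a fine path
    with n_fine steps and a coarse path with n_fine // 2 steps, driven
    by the same Brownian increments (two fine increments per coarse)."""
    dt = T / n_fine
    fine = coarse = s0
    carry = 0.0
    for i in range(n_fine):
        dw = rng.gauss(0.0, math.sqrt(dt))
        fine += mu * fine * dt + sigma * fine * dw
        carry += dw
        if i % 2 == 1:  # advance the coarse path every second fine step
            coarse += mu * coarse * 2 * dt + sigma * coarse * carry
            carry = 0.0
    return fine, coarse

def mlmc(mu, sigma, s0, T, L, M0, rng):
    """Multilevel estimator of E[S(T)] in the spirit of Eq. (22): many
    samples of the coarsest discretization, fewer samples of the level
    corrections E[Q(S_fine) - Q(S_coarse)]."""
    n0 = 2
    est = sum(euler_pair(mu, sigma, s0, T, n0, rng)[0]
              for _ in range(M0)) / M0  # plain Monte Carlo on level 0
    for l in range(1, L + 1):
        M_l = max(M0 // 4 ** l, 32)
        diffs = [f - c for f, c in
                 (euler_pair(mu, sigma, s0, T, n0 * 2 ** l, rng)
                  for _ in range(M_l))]
        est += sum(diffs) / M_l  # correction term of level l
    return est
```

For GBM the exact value E[S(T)] = S₀·exp(μT) is available, so the estimator can be checked directly against it.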
5.3 Multilevel Quadrature
As in Example 1 of Sect. 3, assume that we want to approximate the integral \(\int _{[0,1]}f(x)\;dx\in \mathbb {R}\) using evaluations of approximations \(f_{l}\colon [0,1]\to \mathbb {R}\), \(l\in \mathbb {N}\). This is similar to the setting of the previous subsection, but with random sampling replaced by deterministic quadrature.
As before, denote by S k , \({k}\in \mathbb {N}\) a sequence of quadrature formulas based on evaluations in 2k nodes. If we assume that point evaluations of f l require the work \(\exp (\gamma l)\) for some γ > 0, that
\(\Big |\int _{[0,1]}g(x)\,dx-S_{{k}}g\Big |\lesssim 2^{-\kappa {k}}\|g\|{ }_{B}\quad \text{for all } g\in B\)
for some κ > 0 and a Banach space B of functions on [0, 1] and that
\(\|f-f_{l}\|{ }_{B}\lesssim \exp (-\beta l)\)
for some β > 0, then \(\mathscr {A}({k},l):=S_{{k}}f_{l}\) satisfies
\(\Big |\int _{[0,1]}f(x)\,dx-\mathscr {A}({k},l)\Big |\lesssim 2^{-\kappa {k}}+\exp (-\beta l).\)
Hence, an accuracy of order 𝜖 > 0 can be achieved by setting
\(2^{{k}}\approx \epsilon ^{-1/\kappa }\quad \text{and}\quad \exp (l)\approx \epsilon ^{-1/\beta },\)
which requires the work
\(2^{{k}}\exp (\gamma l)\approx \epsilon ^{-1/\kappa -\gamma /\beta }.\)
We have already shown the decay of the mixed differences,
\(|\varDelta _{\text{mix}}\mathscr {A}({k},l)|\lesssim 2^{-\kappa {k}}\exp (-\beta l),\)
in Example 1. Thus, Theorem 1 immediately shows that we can choose L = L(𝜖) such that Smolyak’s algorithm satisfies
\(\Big |\int _{[0,1]}f(x)\,dx-\mathscr {S}_{L}(\mathscr {A})\Big |\lesssim \epsilon \)
with the work
\(\lesssim \epsilon ^{-\max \{1/\kappa ,\gamma /\beta \}}|\log \epsilon |{ }^{r}\)
for some r = r(β, γ, κ) ≥ 0.
As in Sect. 5.2, we may rewrite Smolyak’s algorithm \(\mathscr {S}_{L}(\mathscr {A})\) in a multilevel form, which reveals that Smolyak’s algorithm employs a large number of evaluations of f 0, and subsequently improves on the resulting integral approximation by adding estimates of the integrals \(\int _{[0,1]}f_{l}(x)-f_{l-1}(x)\,dx\), l > 0, that are computed using fewer quadrature nodes.
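This multilevel quadrature can be sketched concretely. In the following Python illustration the approximations f_l of f(x) = exp(x) are taken to be truncated Taylor polynomials, and the node counts 2^(L−l) are our own illustrative balancing; only the structure (many nodes on f₀, few on the corrections) reflects the text:

```python
import math

def midpoint(g, n):
    """Midpoint rule with n nodes on [0, 1]."""
    return sum(g((i + 0.5) / n) for i in range(n)) / n

def f_level(l):
    """Hypothetical approximations f_l of f(x) = exp(x): Taylor
    polynomials of degree l + 2, so ||f - f_l|| decays rapidly while
    the evaluation cost grows with l."""
    def fl(x):
        return sum(x ** i / math.factorial(i) for i in range(l + 3))
    return fl

def multilevel_quadrature(L):
    """Many quadrature nodes on the cheap f_0, fewer nodes on the
    increasingly small corrections f_l - f_{l-1}."""
    total = midpoint(f_level(0), 2 ** L)
    for l in range(1, L + 1):
        fl, fprev = f_level(l), f_level(l - 1)
        total += midpoint(lambda x: fl(x) - fprev(x), 2 ** (L - l))
    return total
```

The telescoping sum reproduces the integral of the finest approximation f_L up to the individual quadrature errors, so the result is close to ∫₀¹ exp(x) dx = e − 1.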
5.4 Partial Differential Equations
The original Smolyak algorithm inspired two approaches to the numerical solution of partial differential equations (PDEs). The intrusive approach is to solve discretizations of the PDE that are built on sparse grids. The non-intrusive approach, which we describe here, instead applies the general Smolyak algorithm to product type discretizations whose resolution in the jth direction is described by the parameter k j [9, 26].
We discuss here how the non-intrusive approach can be analyzed using error expansions of finite difference approximations. For example, the work [11], which introduced the name combination technique, exploited the fact that for the Poisson equation with sufficiently smooth data on [0, 1]2, finite difference approximations \(u_{{k}_1,{k}_2}\in L^{\infty }([0,1]^2)\) with meshwidths \(h_{j}=2^{-{k}_{j}}\) in the directions j ∈{1, 2} satisfy
\(u_{{k}_1,{k}_2}=u+w_1(h_1)+w_2(h_2)+w_{1,2}(h_1,h_2),\qquad (23)\)
where u is the exact solution and w 1(h 1), w 2(h 2), w 1,2(h 1, h 2) ∈ L ∞([0, 1]2) are error terms that converge to zero in L ∞ at the rates \(\mathscr {O}(h_1^2)\), \(\mathscr {O}(h_2^2)\), and \(\mathscr {O}(h_1^2h_2^2)\), respectively. Since the work required for the computation of \(\mathscr {A}({k}_1,{k}_2):=u_{{k}_1,{k}_2}\) usually satisfies
\(\mathrm {Work}(\mathscr {A}({k}_1,{k}_2))\lesssim 2^{\gamma ({k}_1+{k}_2)}\)
for some γ ≥ 1 depending on the employed solver, an error bound of size 𝜖 > 0 could be achieved with the straightforward choice k 1 := k 2 := −(log2 𝜖)∕2, which would require the work
\(\lesssim \epsilon ^{-\gamma }.\)
Since Eq. (23) in combination with part (iii) of Proposition 2 shows that \(\mathscr {A}\in \mathscr {E}_{(e_j)_{j=1}^{2}}\) with \(e_j({k}):=2^{-2{k}_j}\), we may deduce from Theorem 1 that Smolyak’s algorithm applied to \(\mathscr {A}\) requires only the work
\(\lesssim \epsilon ^{-\gamma /2}|\log \epsilon |{ }^{r},\quad r\geq 0,\)
to achieve the same accuracy. The advantage of Smolyak’s algorithm becomes even more significant in higher dimensions. All that is required to generalize the analysis presented here to high-dimensional problems, as well as to different PDEs and different discretization methods, are error expansions such as Eq. (23).
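The effect of the combination technique can be checked against a scalar surrogate that obeys an error expansion of the form of Eq. (23); in the following sketch the coefficients 0.3, 0.2, and 0.1 are arbitrary illustrative choices, and the surrogate stands in for an actual PDE solve:

```python
def surrogate_solution(k1, k2):
    """Scalar stand-in for u_{k1,k2}, obeying an error expansion of the
    form of Eq. (23) with O(h1^2), O(h2^2), O(h1^2 h2^2) terms."""
    u_exact = 1.0
    h1, h2 = 2.0 ** -k1, 2.0 ** -k2
    return u_exact + 0.3 * h1 ** 2 + 0.2 * h2 ** 2 + 0.1 * (h1 * h2) ** 2

def combination_technique(L):
    """Classical 2D combination: add the solutions on the diagonal
    k1 + k2 = L, subtract those on the subdiagonal k1 + k2 = L - 1."""
    diag = sum(surrogate_solution(k, L - k) for k in range(L + 1))
    sub = sum(surrogate_solution(k, L - 1 - k) for k in range(L))
    return diag - sub
```

The one-directional error terms telescope away up to their values at the finest meshwidth 2⁻ᴸ, so the combined error decays essentially like 4⁻ᴸ while only anisotropic grids with k₁ + k₂ ≤ L are ever solved.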
5.5 Uncertainty Quantification
A common goal in uncertainty quantification [1, 13, 18] is the approximation of response surfaces
\(f\colon \varGamma \to \mathbb {R},\qquad f(\boldsymbol {y}):=Q(u_{\boldsymbol {y}}).\)
Here, \(\boldsymbol {y}\in \varGamma \subset \mathbb {R}^m\) represents parameters in a PDE and Q(u y ) is a real-valued quantity of interest of the corresponding solution u y . For example, a thoroughly studied problem is the parametric linear elliptic second order equation with coefficients \(a\colon U\times \varGamma \to \mathbb {R}\),
\(-\nabla \cdot \big (a(x,\boldsymbol {y})\nabla u_{\boldsymbol {y}}(x)\big )=g(x)\ \ \text{for } x\in U,\qquad u_{\boldsymbol {y}}|{ }_{\partial U}=0,\)
whose solution for any fixed y ∈ Γ is a function \(u_{\boldsymbol {y}}\colon U\to \mathbb {R}\).
Approximations of response surfaces may be used for optimization, for worst-case analysis, or to compute statistical quantities such as mean and variance in the case where Γ is equipped with a probability distribution. The non-intrusive approach to compute such approximations, which is known as stochastic collocation in the case where Γ is equipped with a probability distribution, is to compute the values of f for finitely many values of y and then interpolate. For example, if we assume for simplicity that \(\varGamma =\prod _{j=1}^{m}[0,1]\), then we may use, as in Sect. 5.1, a sequence of interpolation operators S k : H β([0, 1]) → H α([0, 1]) based on evaluations in \((y_{{k},i})_{i=1}^{2^{k}}\subset [0,1]\). However, unlike in Sect. 5.1, we cannot compute values of f exactly but have to rely on a numerical PDE solver. If we assume that this solver has discretization parameters \(\boldsymbol {l}=(l_{1},\dots ,l_{d})\in \mathbb {N}^{d}\) and returns approximations u y,l such that the functions
\(\boldsymbol {y}\mapsto Q(u_{\boldsymbol {y},\boldsymbol {l}})\)
are elements of \(H^{\beta }_{\text{mix}}([0,1]^m)\), then we may define the numerical approximation method
\(\mathscr {A}(\boldsymbol {{k}},\boldsymbol {l}):=S_{\boldsymbol {{k}}}\big (\boldsymbol {y}\mapsto Q(u_{\boldsymbol {y},\boldsymbol {l}})\big ),\qquad S_{\boldsymbol {{k}}}:=S_{{k}_1}\otimes \dots \otimes S_{{k}_m},\)
with n := m + d discretization parameters.
At this point the reader should already be convinced that straightforward approximation is a bad idea. We therefore omit this part of the analysis, and directly move on to the application of Smolyak’s algorithm. To do so, we need to identify functions \(e_{j}\colon \mathbb {N}\to \mathbb {R}_{>}\) such that \(\mathscr {A}\in \mathscr {E}_{(e_j)_{j=1}^{n}}(Y)\). For this purpose, we write \(\mathscr {A}\) as
\(\mathscr {A}(\boldsymbol {{k}},\boldsymbol {l})=\mathscr {M}\big (\mathscr {A}_{1}(\boldsymbol {{k}}),\mathscr {A}_{2}(\boldsymbol {l})\big ),\)
where
\(\mathscr {A}_{1}(\boldsymbol {{k}}):=S_{\boldsymbol {{k}}}\in Y_{1}\quad \text{and}\quad \mathscr {A}_{2}(\boldsymbol {l}):=\big (\boldsymbol {y}\mapsto Q(u_{\boldsymbol {y},\boldsymbol {l}})\big )\in Y_{2},\)
and
\(\mathscr {M}\colon Y_{1}\times Y_{2}\to Y,\qquad \mathscr {M}(S,g):=Sg,\)
is the application of linear operators in Y 1 to functions in Y 2. Since \(\mathscr {M}\) is continuous and multilinear, we may apply part (iv) of Proposition 2 to reduce our task to the study of \(\mathscr {A}_1\) and \(\mathscr {A}_2\). The first part can be done exactly as in Sect. 5.1. The second part can be done similarly to Sect. 5.4. However, we now have to verify not only that the approximations u y,l converge to the exact solutions u y for each fixed value of y as \(\min _{j=1}^{d}l_{j}\to \infty \), but that this convergence holds in some uniform sense over the parameter space.
More specifically, let us denote by \(\varDelta _{\text{mix}}^{(\boldsymbol {l})}\) the mixed difference operator with respect to the parameters l and let us assume that
\(\big \|\varDelta _{\text{mix}}^{(\boldsymbol {l})}\big (\boldsymbol {y}\mapsto Q(u_{\boldsymbol {y},\boldsymbol {l}})\big )\big \|{ }_{H^{\beta }_{\text{mix}}([0,1]^m)}\lesssim \exp (-\boldsymbol {\kappa }\cdot \boldsymbol {l})\quad \text{for some } \boldsymbol {\kappa }\in \mathbb {R}_{>}^{d}.\)
For example, such bounds are proven in [13, 14]. If the interpolation operators satisfy as before
\(\|I-S_{{k}}\|{ }_{H^{\beta }([0,1])\to H^{\alpha }([0,1])}\lesssim 2^{-{k}(\beta -\alpha )},\)
then the results of Sect. 5.1 together with part (iv) of Proposition 2 show that \(\mathscr {A}\in \mathscr {E}_{(e_j)_{j=1}^{n}}(Y)\) with \(e_{j}({k}_{j}):=2^{-{k}_{j}(\beta -\alpha )}\) for j ∈{1, …, m} and \(e_{m+j}(l_{j}):=\exp (-\kappa _{j}l_{j})\) for j ∈{1, …, d}. If we further assume that the work required by the PDE solver with discretization parameters l is bounded by \(\exp (\boldsymbol {\gamma }\cdot \boldsymbol {l})\) for some \(\boldsymbol {\gamma }\in \mathbb {R}_{>}^{d}\), then we may assign as total work to the algorithm \(\mathscr {A}(\boldsymbol {{k}},\boldsymbol {l})\) the value
\(2^{|\boldsymbol {{k}}|{ }_{1}}\exp (\boldsymbol {\gamma }\cdot \boldsymbol {l}),\)
which is the number of required samples, \(2^{|\boldsymbol {{k}}|{ }_{1}}\), times the bound on the work per sample, \(\exp (\boldsymbol {\gamma }\cdot \boldsymbol {l})\). Thus, by Theorem 1, Smolyak’s algorithm achieves the accuracy
\(\|\mathscr {S}_{L}(\mathscr {A})-\mathscr {A}_{\infty }\|{ }_{Y}\lesssim \epsilon \)
with the work
\(\lesssim \epsilon ^{-\rho }|\log \epsilon |{ }^{r},\)
where \(\rho :=\max \{1/(\beta -\alpha ),\max \{\gamma _j/\kappa _j\}_{j=1}^{d}\}\) and r ≥ 0 as in Sect. 4.1.
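As a toy illustration of this multilevel collocation, consider a single parameter and a single discretization parameter (m = d = 1). In the sketch below, the "solver", its geometric error decay, and the balancing of interpolation nodes against solver levels are all our own illustrative assumptions; only the structure (fine interpolation grids for coarse solves, coarse grids for solver corrections) reflects the discussion above:

```python
import math

def solver(y, l):
    """Hypothetical parametric 'PDE solver': exact response sin(pi*y)
    plus a discretization error that decays like 4**-l (assumption)."""
    return math.sin(math.pi * y) + 0.5 * 4.0 ** -l * math.cos(math.pi * y)

def interpolate(values, y):
    """Piecewise-linear interpolation of equispaced samples on [0, 1]."""
    n = len(values) - 1
    t = min(y * n, n - 1e-12)
    i = int(t)
    return (1 - (t - i)) * values[i] + (t - i) * values[i + 1]

def multilevel_collocation(L, y):
    """Evaluate the multilevel surrogate at parameter y: many
    interpolation nodes for the cheap coarse solves, few nodes for the
    small solver corrections."""
    est = 0.0
    for l in range(L + 1):
        n = 2 ** (L - l)  # node count shrinks as the solver level grows
        nodes = [i / n for i in range(n + 1)]
        vals = [solver(t, l) - (solver(t, l - 1) if l > 0 else 0.0)
                for t in nodes]
        est += interpolate(vals, y)
    return est
```

The corrections telescope to the finest solver level, so the surrogate approaches the exact response sin(πy) up to the interpolation and residual discretization errors.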
6 Conclusion
We showed how various existing efficient numerical methods for integration, Monte Carlo simulations, interpolation, the solution of partial differential equations, and uncertainty quantification can be derived from two common underlying principles: decomposition and efficient truncation. The analysis of these methods was divided into proving decay of mixed differences by means of Proposition 2 and then applying general bounds on exponential sums in form of Theorem 1.
Besides simplifying and streamlining the analysis of existing methods, we hope that the framework provided in this work encourages novel applications. Finally, we believe that the general version of Smolyak’s algorithm presented here may be helpful in designing flexible and reusable software implementations that can be applied to future problems without modification.
References
I. Babuška, R. Tempone, G.E. Zouraris, Galerkin finite element approximations of stochastic elliptic partial differential equations. SIAM J. Numer. Anal. 42(2), 800–825 (2004)
H.-J. Bungartz, M. Griebel, Sparse grids. Acta Numer. 13, 147–269 (2004)
A. Defant, K. Floret, Tensor Norms and Operator Ideals (Elsevier, Burlington, 1992)
D. Dũng, M. Griebel, Hyperbolic cross approximation in infinite dimensions. J. Complexity 33, 55–88 (2016)
D. Dũng, V.N. Temlyakov, T. Ullrich, Hyperbolic cross approximation (2015). arXiv:1601.03978
J. Garcke, A dimension adaptive sparse grid combination technique for machine learning. ANZIAM J. 48(C), C725–C740 (2007)
J. Garcke, Sparse grids in a nutshell, in Sparse Grids and Applications (Springer, Berlin, 2012), pp. 57–80
M.B. Giles, Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607–617 (2008)
M. Griebel, H. Harbrecht, On the convergence of the combination technique, in Sparse Grids and Applications (Springer, Cham, 2014), pp. 55–74
M. Griebel, J. Oettershagen, On tensor product approximation of analytic functions. J. Approx. Theory 207, 348–379 (2016)
M. Griebel, M. Schneider, C. Zenger, A combination technique for the solution of sparse grid problems, in Iterative Methods in Linear Algebra, ed. by P. de Groen, R. Beauwens. IMACS, (Elsevier, Amsterdam, 1992), pp. 263–281
W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus (Springer, Berlin, 2012)
A.-L. Haji-Ali, F. Nobile, R. Tempone, Multi-index Monte Carlo: when sparsity meets sampling. Numer. Math. 132(4), 767–806 (2016)
H. Harbrecht, M. Peters, M. Siebenmorgen, Multilevel accelerated quadrature for PDEs with log-normally distributed diffusion coefficient. SIAM/ASA J. Uncertain. Quantif. 4(1), 520–551 (2016)
M. Hegland, Adaptive sparse grids. ANZIAM J. 44(C), C335–C353 (2002)
S. Heinrich, Monte Carlo complexity of global solution of integral equations. J. Complexity 14(2), 151–175 (1998)
P.E. Kloeden, E. Platen, Numerical Solution of Stochastic Differential Equations (Springer, Berlin, 1992)
O.P. Le Maître, O.M. Knio, Spectral Methods for Uncertainty Quantification (Springer, Berlin, 2010)
S. Martello, P. Toth, Knapsack Problems: Algorithms and Computer Implementations (Wiley, New York, 1990)
F. Nobile, R. Tempone, S. Wolfers, Sparse approximation of multilinear problems with applications to kernel-based methods in UQ. Numer. Math. 139(1), 247–280 (2018)
F.W.J. Olver, A.B. Olde Daalhuis, D.W. Lozier, B.I. Schneider, R.F. Boisvert, C.W. Clark, B.R. Miller, B.V. Saunders (eds.), NIST Digital Library of Mathematical Functions. http://dlmf.nist.gov/. Release 1.0.13 of 2016-09-16
A. Papageorgiou, H. Woźniakowski, Tractability through increasing smoothness. J. Complexity 26(5), 409–421 (2010)
I.H. Sloan, H. Woźniakowski, Tractability of multivariate integration for weighted Korobov classes. J. Complexity 17(4), 697–721 (2001)
S.A. Smolyak, Quadrature and interpolation formulas for tensor products of certain classes of functions. Soviet Math. Dokl. 4, 240–243 (1963)
G.W. Wasilkowski, H. Woźniakowski, Explicit cost bounds of algorithms for multivariate tensor product problems. J. Complexity 11(1), 1–56 (1995)
C. Zenger, Sparse grids, in Parallel Algorithms for Partial Differential Equations. Proceedings of the Sixth GAMM-Seminar, ed. by W. Hackbusch (Vieweg, Braunschweig, 1991)
Appendix
Lemma 1
Let γ j > 0, β j > 0, and t j > 0 for j ∈{1, …, n}. Then
\(\sum _{\boldsymbol {{k}}\in \mathbb {N}^{n}:(\boldsymbol {\beta }+\boldsymbol {\gamma })\cdot \boldsymbol {{k}}\leq L}\exp (\boldsymbol {\gamma }\cdot \boldsymbol {{k}})(\boldsymbol {{k}}+\mathbf {1})^{\boldsymbol {t}}\leq C(\boldsymbol {\gamma },\boldsymbol {t},n)\exp (\mu L)(L+1)^{(n^{*}-1)+t^{*}},\)
where \(\rho :=\max _{j=1}^{n}\gamma _j/{\beta }_j\) , \(\mu :=\frac {\rho }{1+\rho }\) , J := {j ∈{1, …, n} : γ j ∕β j = ρ}, n ∗ := |J|, t ∗ :=∑j ∈ J t j , and \((\boldsymbol {{k}}+\mathbf {1})^{\boldsymbol {t}}:=\prod _{j=1}^{n}({k}_j+1)^{t_j}\).
Proof
First, we assume without loss of generality that the dimensions are ordered according to whether they belong to J or J c := {1, …, n}∖ J. To avoid cluttered notation we then separate dimensions by plus or minus signs in the subscripts; for example, we write \(\boldsymbol {t}=(\boldsymbol {t}_J,\boldsymbol {t}_{J^{c}})=:(\boldsymbol {t}_+,\boldsymbol {t}_-)\).
Next, we may replace the sum by an integral over {(β + γ) ⋅x ≤ L}. Indeed, by monotonicity we may do so if we replace L by L + |β + γ|1, but looking at the final result we observe that a shift of L only affects the constant C(γ, t, n).
Finally, using a change of variables y j := (β j + γ j )x j and the shorthand μ := γ∕(β + γ) (with componentwise division) we obtain
where the last equality holds by definition of \(\mu =\max \{\boldsymbol {\mu }_{+}\}\) and \(\mu _{-}:=\max \{\boldsymbol {\mu }_{-}\}\). We use the letter C here and in the following to denote quantities that depend only on γ, t and n but may change value from line to line. Using \((\boldsymbol {y}_{ +}+\mathbf {1})^{\boldsymbol {t}_{+}}\leq (|\boldsymbol {y}_{+}|{ }_1+1)^{|\boldsymbol {t}_{+}|{ }_1}\) and \((\boldsymbol {y}_{-}+\mathbf {1})^{\boldsymbol {t}_{-}}\leq (|\boldsymbol {y}_{-}|{ }_1+1)^{|\boldsymbol {t}_{-}|{ }_1}\) and the linear change of variables y↦(|y|1, y 2, …, y n ) in both integrals, we obtain
where we used supremum bounds for both integrals for the third inequality, the change of variables w := L − u for the penultimate equality, and the fact that μ > μ − for the last inequality.
Lemma 2
Let γ j > 0, β j > 0, and s j > 0 for j ∈{1, …, n}. Then
\(\sum _{\boldsymbol {{k}}\in \mathbb {N}^{n}:(\boldsymbol {\beta }+\boldsymbol {\gamma })\cdot \boldsymbol {{k}}>L}\exp (-\boldsymbol {\beta }\cdot \boldsymbol {{k}})(\boldsymbol {{k}}+\mathbf {1})^{\boldsymbol {s}}\leq C(\boldsymbol {\beta },\boldsymbol {s},n)\exp (-\nu L)(L+1)^{(n^{*}-1)+s^{*}},\)
where \(\rho :=\max _{j=1}^{n}\gamma _j/{\beta }_j\) , \(\nu :=\frac {1}{1+\rho }\) , J := {j ∈{1, …, n} : γ j ∕β j = ρ}, n ∗ := |J|, s ∗ :=∑j ∈ J s j , and \((\boldsymbol {{k}}+\mathbf {1})^{\boldsymbol {s}}:=\prod _{j=1}^{n}({k}_j+1)^{s_j}\).
Proof
First, we assume without loss of generality that the dimensions are ordered according to whether they belong to J or J c. To avoid cluttered notation we then separate dimensions by plus or minus signs in the subscripts; for example, we write \(\boldsymbol {s}=(\boldsymbol {s}_J,\boldsymbol {s}_{J^{c}})=:(\boldsymbol {s}_+,\boldsymbol {s}_-)\).
Next, we may replace the sum by an integral over {(β + γ) ⋅x > L}. Indeed, by monotonicity we may do so if we replace L by L −|β + γ|1, but looking at the final result we observe that a shift of L only affects the constant C(β, s, n).
Finally, using a change of variables y j := (β j + γ j )x j and the shorthand ν := β∕(β + γ) (with componentwise division) we obtain
where the last equality holds by definition of \(\nu =\max \{\boldsymbol {\nu }_{+}\}\) and \(\nu _{-}:=\max \{\boldsymbol {\nu }_{-}\}\). We use the letter C here and in the following to denote quantities that depend only on β, s and n but may change value from line to line. Using \((\boldsymbol {y}_{ +}+\mathbf {1})^{\boldsymbol {s}_{+}}\leq (|\boldsymbol {y}_{+}|{ }_1+1)^{|\boldsymbol {s}_{+}|{ }_1}\) and \((\boldsymbol {y}_{-}+\mathbf {1})^{\boldsymbol {s}_{-}}\leq (|\boldsymbol {y}_{-}|{ }_1+1)^{|\boldsymbol {s}_{-}|{ }_1}\) and the linear change of variables y↦(|y|1, y 2, …, y n ) in both integrals, we obtain
To bound (⋆⋆), we estimate the inner integral using the inequality \(\int _{a}^{\infty }\exp (-b v)(v+1)^{c}\,dv\leq C\exp (-b a)(a+1)^c\) [21, (8.11.2)], which is valid for all positive a, b, c:
where we used a supremum bound and the change of variables w := L − u for the second inequality, and the fact that ν − > ν for the last inequality. Finally, to bound (⋆ ⋆ ⋆), we observe that the inner integral is independent of L, and bound the outer integral in the same way we previously bounded the inner integral. This shows
© 2018 Springer International Publishing AG, part of Springer Nature
Tempone, R., Wolfers, S. (2018). Smolyak’s Algorithm: A Powerful Black Box for the Acceleration of Scientific Computations. In: Garcke, J., Pflüger, D., Webster, C., Zhang, G. (eds) Sparse Grids and Applications - Miami 2016. Lecture Notes in Computational Science and Engineering, vol 123. Springer, Cham. https://doi.org/10.1007/978-3-319-75426-0_9