
2.1 Introduction

The more complex a data set or mathematical model is, the more adapted a corresponding mathematical representation needs to be in order to capture its information content at acceptable cost in terms of storage and computational complexity. In principle, this is true for mathematical objects described explicitly by large sets of possibly noisy or corrupted data, but also for those given only implicitly as the solution of an operator equation. The latter scenario is perhaps even more challenging because direct observations are not possible. By “adapted representation” we mean a representation of the unknown function that exploits possibly global features of this function so as to require, for a prescribed target accuracy, only relatively few parameters to determine a corresponding approximation. Such global features could take a variety of forms, such as (i) a high degree of regularity except at isolated singularities located on lower dimensional manifolds, or (ii) a particular sparsity, possibly with respect to a dictionary which may even depend on the problem at hand. In fact, the corresponding scenarios are not strictly disjoint. In either case reconstruction or approximation methods are necessarily nonlinear. For instance, as for (i), 1D best N-term wavelet approximations offer a powerful method based on selecting only possibly few coefficients in an exact representation with respect to a given universal background dictionary, e.g. a wavelet basis. When dealing with more than one spatial variable the situation quickly becomes more complicated, and for spatial dimensions much larger than three, classical numerical tools designed for the low dimensional regime become practically useless. This is commonly referred to as the curse of dimensionality. Unfortunately, there seems to be no universal strategy for dealing with the curse of dimensionality, i.e., one that works in all possible cases.
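The nonlinear character of best N-term approximation can be made concrete with a small numerical sketch (our own illustration, not taken from the cited works), using the Haar basis as the simplest wavelet: expand a piecewise smooth 1D signal with one jump in an orthonormal Haar system and retain only the N largest coefficients; the isolated singularity then affects only few coefficients per level, and the error decays rapidly as N grows.

```python
import numpy as np

def haar_transform(v):
    """Orthonormal discrete Haar transform of a vector of length 2^J."""
    v = v.astype(float).copy()
    out = []
    while len(v) > 1:
        s = (v[0::2] + v[1::2]) / np.sqrt(2.0)   # scaling coefficients
        d = (v[0::2] - v[1::2]) / np.sqrt(2.0)   # wavelet (detail) coefficients
        out.append(d)
        v = s
    out.append(v)                                # coarsest scaling coefficient
    return np.concatenate(out[::-1])

def haar_inverse(c):
    """Inverse of haar_transform."""
    c = c.astype(float).copy()
    v = c[:1]
    pos = 1
    while pos < len(c):
        d = c[pos:pos + len(v)]
        s = v
        v = np.empty(2 * len(s))
        v[0::2] = (s + d) / np.sqrt(2.0)
        v[1::2] = (s - d) / np.sqrt(2.0)
        pos += len(d)
    return v

# piecewise smooth signal with one jump: a 1D "cartoon"
x = np.linspace(0.0, 1.0, 1024, endpoint=False)
f = np.sin(2 * np.pi * x) + (x > 0.37)

c = haar_transform(f)

def best_n_term_error(c, n):
    """Discrete L2 error of keeping the n largest-magnitude coefficients."""
    idx = np.argsort(np.abs(c))[::-1][:n]
    cn = np.zeros_like(c)
    cn[idx] = c[idx]
    return np.linalg.norm(haar_inverse(cn) - f) / np.sqrt(len(f))

# the nonlinear (coefficient-selection) error decays quickly despite the jump
errs = [best_n_term_error(c, n) for n in (8, 32, 128)]
```

The selection of the largest coefficients, rather than a fixed linear truncation, is exactly what makes the method adaptive to the location of the singularity.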

One global structural feature which is encountered in many multivariate scenarios is anisotropy: images, as functions of two variables, exhibit edges and discontinuities along curves. Higher dimensional biological images have sharp interfaces separating more homogeneous regions. Likewise highly anisotropic phenomena such as shear- or boundary layers are encountered in solutions to transport dominated initial-boundary value problems.

One major focus of the DFG-SPP project “Adaptive Anisotropic Discretization Concepts” has been to efficiently recover and economically encode anisotropic structures represented by explicitly given data or determined as solutions of operator equations which are prone to give rise to such structures. Regarding this latter case, which we will focus on in this article, parametric transport problems (as well as close relatives) have served as guiding model problems for the following reasons: (i) their solutions could exhibit shear or boundary layers and hence discontinuities across lower dimensional manifolds, calling for suitable anisotropic discretizations; (ii) it is much less clear than in the elliptic case how to contrive suitable variational formulations which, in particular, accommodate such anisotropic discretizations; (iii) parametric versions give rise to high-dimensional problems.

Concerning (i), directional representation systems like curvelets and shearlets outperform classical isotropic wavelet bases when approximating so-called “cartoon images”, see [23] and [9, 31–35]. For recent applications to imaging data, in particular inpainting as well as combinations with geometric separation concepts, the reader is referred to [24, 29]. In the present context of solving operator equations we outline in Sect. 2.2 trial spaces which accommodate directional adaptivity. They are motivated by recent constructions of compactly supported piecewise polynomial shearlet systems (see e.g. [30]) because these are close to classical multiresolution structures and similar in nature to classical discretization systems. Since cartoons exhibit structural similarities with solutions of transport problems, we state best N-term error bounds for cartoon functions that will later serve as benchmarks for an adaptive solver. For related anisotropic simplicial discretizations and their analysis see e.g. [10, 12, 15].

As for (ii), our approach differs from previous works on anisotropic discretizations derived from “curvature information” on the current approximation, and hence not based on a rigorous error control (see e.g. [22] and the references therein), in that we first derive in Sect. 2.3 well conditioned variational formulations for general unsymmetric or indefinite and singularly perturbed problems; see [14, 16] for details on convection-diffusion and transport problems. The underlying basic principles are of independent interest by themselves and seem to have appeared first in [3]. They are also closely related to ongoing developments running under the flag of Discontinuous Petrov-Galerkin (DPG) methods, see e.g. [19, 20]. The approach rests on two crucial cornerstones. On the one hand, one can essentially choose the norm for the (infinite dimensional) trial space X by which one would like to measure accuracy, while adapting the norm for the (infinite dimensional) test space Y so as to ensure that (ideally) the operator induced by this variational formulation is even an isometry from X to Y ′ (the normed dual of Y ). Numerical feasibility of (nearly optimal) Petrov-Galerkin discretizations based on such formulations, even beyond a DPG framework, hinges on an appropriate saddle point formulation, which turns out to be crucial in connection with model reduction [18]; in particular, this allows one to accommodate, for instance, L 2-frames. On the other hand, the resulting tight error-residual relation is the basis of computable a posteriori error estimators [14, 16] and, ultimately, of rigorously founded adaptive anisotropic refinement strategies.

These variational formulations apply in much more generality but in order to address issue (iii) we exemplify them for the simple linear transport equation (stationary or instationary) whose parametric version leads to high-dimensional problems and forms a core constituent of kinetic models such as radiative transport. There the transport direction – the parameter – varies over a unit sphere so that solutions are functions of the spatial variables (and, possibly, of time) and of the transport direction.

We briefly highlight two ways of treating such parametric problems under slightly different objectives. Both strategies aim at approximating the solution \(u(x,{\boldsymbol s})\), \(x \in \varOmega \subset \mathbb{R}^{d}\), \({\boldsymbol s} \in S^{d-1}\), in the form

$$\displaystyle{ u(x,{\boldsymbol s}) \approx \sum _{j=1}^{n}c_{ j}({\boldsymbol s})u_{j}(x). }$$
(2.1)

In Sect. 2.4 the u j are constructed offline in a greedy manner from snapshots of the solution manifold, thus forming a solution dependent dictionary. According to the paradigm of the Reduced Basis Method (RBM), the parameter dependent coefficients \(c_{j}({\boldsymbol s})\) are not given explicitly but can be efficiently computed in an online fashion, e.g. in the context of design or (online) optimization. The smoother the dependence of the solution on the parameters, the better this approach works, since the Kolmogorov n-widths then decay rapidly with increasing n. Making essential use of the well conditioned variational formulations from Sect. 2.3, it can be shown that the resulting RBM has stability constants as close to one as one wishes, yielding for the first time an RBM for transport and convection-diffusion problems with this property that exhibits the same rates as the Kolmogorov widths [18].

In Sect. 2.5 of this report, and in [26], we present algorithms which explicitly construct separable approximations of the form (2.1) for the parametric transport problem of radiative transfer. We also mention that separable approximations such as (2.1) arise in a host of other applications, for example in parametric representations of PDEs with random field input data, obtained with the aid of sparse tensor product interpolation methods; we refer to [11, 13] and to the references therein. Adaptive near-minimal rank tensor solvers for problems in high dimensional phase space are established and analyzed in [2].
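For explicitly sampled data, the best separable approximation of the form (2.1) in the mean-square sense is delivered by a truncated singular value decomposition. The following toy computation (our own illustration, with a smooth surrogate for u(x, s) rather than a radiative transfer solution) extracts the factors u_j(x) and c_j(s) and verifies the classical identity between the truncation error and the discarded singular values.

```python
import numpy as np

# sample a parametric function u(x, s) on tensor grids (illustrative surrogate)
x = np.linspace(0.0, 1.0, 128)
s = np.linspace(0.0, 2.0 * np.pi, 64)
U = np.sin(np.outer(x, np.ones_like(s)) * (2.0 + np.cos(s)))  # U[i, j] = u(x_i, s_j)

# truncated SVD = best rank-n separable approximation sum_j c_j(s) u_j(x)
W, sig, Vt = np.linalg.svd(U, full_matrices=False)
n = 6
u_basis = W[:, :n]                    # samples of the spatial factors u_j(x)
c_coeff = sig[:n, None] * Vt[:n]      # samples of the coefficients c_j(s)
U_n = u_basis @ c_coeff

rel_err = np.linalg.norm(U - U_n) / np.linalg.norm(U)
```

Smooth dependence on the parameter s makes the singular values decay fast, so a small n already yields a small relative error.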

2.2 Anisotropic Approximations

Let D = (0, 1)2 and, for Ω ⊂ D, let curv(∂ Ω) denote the curvature of \(\partial \varOmega \cap D\). The class of cartoon-like functions on D,

$$\displaystyle\begin{array}{rcl} \mathcal{C}(\zeta,L,M,D)&:=& \{f_{1}\chi _{\varOmega } + f_{2}\chi _{D\setminus \varOmega }:\,\,\varOmega \subset D,\,\vert \partial \varOmega \cap D\vert \leq L,\partial \varOmega \cap D \in C^{2}, \\ & & \mathrm{curv}(\partial \varOmega ) \leq \zeta,\|f_{i}^{(l)}\|_{ L_{\infty }(D)} \leq M,\,l \leq 2,\,\,i = 1,2\},{}\end{array}$$
(2.2)

(where the parameters ζ, L are not mutually independent) has become a well accepted benchmark for sparse approximation in imaging [23]. Compactly supported shearlet systems for \(L^{2}(\mathbb{R}^{2})\) have been introduced in [30, 33] to provide (near-) optimal sparse approximations for such classes. We observe that such cartoons exhibit features similar to those of solutions to transport problems.

Unfortunately, even compactly supported shearlets do not comply well with the quadrature and boundary adaptation tasks faced in variational methods for PDEs. We are therefore interested in generating locally refinable anisotropic partitions for which corresponding piecewise polynomial approximations realize the favorable near-optimal approximation rates for cartoon functions achieved by shearlet systems. However, as shown in [36, Chapter 9.3], simple triangular bisection, connecting the midpoint of an edge to the opposite vertex, is not sufficient to warrant such rates; see [10, 15] for related work. In fact, a key feature is to realize a “parabolic scaling law” similar to the shearlet setting. By this we mean a sufficiently rapid directional resolution by anisotropic cells whose width scales like the square of the diameter. To achieve this we consider partitions composed of triangles and quadrilaterals, pointed out to us by Cohen and Mirebeau (2013, private communication). We sketch the main ideas and refer to [17] for details.

Starting from some initial partition consisting of triangles and quadrilaterals, refined partitions are obtained by splitting a given cell Q of a current partition according to one of the following rules:

  (i) Connect a vertex with the midpoint of an edge not containing the vertex.

  (ii) Connect two vertices.

  (iii) Connect the midpoints of two edges which, when Q is a quadrilateral, do not share any vertex.

The types of bisections are indicated in Fig. 2.1: (1), (2) are examples of (i), (3) illustrates (ii), and (4), (5) are examples of (iii). One easily checks that these refinement rules produce only triangles and quadrilaterals. Moreover, a quadrilateral can be bisected in eight possible ways whereas a triangle can be split in six possible ways. Assigning to each split type a number in I Q  = {1, …, 8} when Q is a quadrilateral and a number in I Q  = {9, …, 14} when Q is a triangle, we denote by

$$\displaystyle{ R_{\iota _{Q}}(Q) =\{ Q_{1},Q_{2}\}\quad \mbox{ for some }\iota _{Q} \in I_{Q}, }$$
(2.3)

the refinement operator which replaces the cell Q by its two children Q 1, Q 2 generated, according to the choice ι Q , by the above split rules (i)–(iii).
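To make the bookkeeping concrete, here is a minimal sketch (our own, with cells stored as plain vertex tuples; it is not the implementation of [17]) of one instance each of rules (i) and (iii). The point to observe is that the children are again triangles and quadrilaterals, as stated above.

```python
# cells as tuples of vertices, listed counter-clockwise

def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def split_triangle_vertex_edge(tri):
    """Rule (i) on a triangle: connect vertex a with the midpoint of the
    opposite edge bc, producing two triangles."""
    a, b, c = tri
    m = midpoint(b, c)
    return [(a, b, m), (a, m, c)]

def split_quad_midpoints(quad):
    """Rule (iii) on a quadrilateral: connect the midpoints of two edges
    sharing no vertex (here ab and cd), producing two quadrilaterals."""
    a, b, c, d = quad
    m1, m2 = midpoint(a, b), midpoint(c, d)
    return [(a, m1, m2, d), (m1, b, c, m2)]

T = ((0, 0), (1, 0), (0, 1))
Q = ((0, 0), (1, 0), (1, 1), (0, 1))
children_t = split_triangle_vertex_edge(T)   # two triangles
children_q = split_quad_midpoints(Q)         # two quadrilaterals
```

Iterating such splits, with the split direction chosen per cell, is what produces the anisotropic cells obeying the parabolic scaling discussed above.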

Fig. 2.1 Illustration of the partition rules

For any partition \(\mathcal{G}\) of D, let \(\mathbb{P}_{1}(\mathcal{G}) =\{ v \in L_{2}(D): v\vert _{Q} \in \mathbb{P}_{1},\,Q \in \mathcal{G}\}\) be the space of piecewise affine functions on \(\mathcal{G}\), and denote by \(\mathfrak{G}\) the set of all finite partitions that can be created by successive applications of \(R_{\iota _{Q}}\). We then define

$$\displaystyle{\varSigma _{N}:=\bigcup \,\{\mathbb{P} _{1}(\mathcal{G}): \mathcal{G}\in \mathfrak{G},\,\#(\mathcal{G}) \leq N\}.}$$

The next result from [17] shows that approximations by elements of Σ N realize (and even slightly improve on) the known rates obtained for shearlet systems for the class of cartoon-like functions [33].

Theorem 2.1 ([17]).

Let \(f \in \mathcal{C}(\zeta,L,M,D)\) with D = (0,1) 2 and assume that the discontinuity curve \(\varGamma = \partial \varOmega \cap D\) is the graph of a C 2 -function. Then one has

$$\displaystyle{\inf _{\varphi \in \varSigma _{N}}\|f -\varphi \|_{L_{2}(D)} \leq C(\zeta,L)M\,N^{-1}\log N,}$$

where C(ζ,L) is a constant depending only on ζ and L.

The proof of Theorem 2.1 is based on constructing a specific sequence \(\mathcal{C}_{j}\) of admissible partitions from \(\mathfrak{G}\) where the refinement decisions represented by \(R_{\iota _{Q}}\) use full knowledge of the approximated function f. A similar sequence of partitions is employed in Sect. 2.3.4.2 where, however, the choice ι Q  ∈ I Q results from an a posteriori criterion described below. We close this section with a few remarks on the structure of the \(\mathcal{C}_{j}\). Given \(\mathcal{C}_{j-1}\), we first generate

$$\displaystyle{ \tilde{\mathcal{C}}_{j} =\{ Q^{'} \in \tilde{ R}(Q): Q \in \mathcal{C}_{j-1}\}, }$$
(2.4)

where \(\tilde{R}\) is either \(R_{\iota _{Q}}\) or the identity. To avoid unnecessary refinements we define then \(\mathcal{C}_{j}\) by replacing any pair of triangles \(Q,Q^{'} \in \tilde{\mathcal{C}}_{j}\) whose union forms a parallelogram P by P itself. This reduces the number of triangles in favor of parallelograms.

2.3 Well-Conditioned Stable Variational Formulations

In this section we highlight some new conceptual developments from [14, 16, 18] which are, in particular, relevant for the high dimensional parametric problems addressed below.

2.3.1 The General Principles

Anisotropic structures are already exhibited by solutions of elliptic boundary value problems on polyhedral domains in 3D. However, the related singularities are known a priori and can be dealt with by preset anisotropic mesh refinements. Anisotropic structures of solutions to transport dominated problems are less predictable, so that the quest for adaptive anisotropic discretization principles gains more weight. Recall that every known rigorously founded adaptation strategy hinges in one way or another on being able to relate the current error of an approximate solution to the corresponding residual in a suitable norm. While classical variational formulations of elliptic problems grant exactly such an error-residual relation, this is unclear for transport dominated problems. The first fundamental issue is therefore to find, also for such problems, suitable variational formulations yielding a well conditioned error-residual relation.

2.3.1.1 Abstract Petrov-Galerkin Formulation

Suppose that for a pair of Hilbert spaces X, Y (with scalar products (⋅ , ⋅ ) X , (⋅ , ⋅ ) Y and norms \(\|\cdot \|_{X},\|\cdot \|_{Y }\)), and a given bilinear form \(b(\cdot,\cdot ): X \times Y \rightarrow \mathbb{R}\), the problem

$$\displaystyle{ b(u,v) = f(v),\quad v \in Y, }$$
(2.5)

has for any f ∈ Y ′ (the normed dual of Y ) a unique solution u ∈ X. It is well-known that this is equivalent to the existence of constants 0 < c b  ≤ C b  < ∞ such that

$$\displaystyle{ \sup _{w\in X}\sup _{v\in Y }\frac{b(w,v)} {\|w\|_{X}\|v\|_{Y }} \leq C_{b},\quad \inf _{w\in X}\sup _{v\in Y }\frac{b(w,v)} {\|w\|_{X}\|v\|_{Y }} \geq c_{b}, }$$
(2.6)

and that, for each v ∈ Y, there exists a w ∈ X such that \(b(w,v)\neq 0\). This means that the operator B: X → Y ′, defined by (Bu)(v): = b(u, v), u ∈ X, v ∈ Y, is an isomorphism with condition number \(\kappa _{X,Y }(B):=\| B\|_{\mathcal{L}(X,Y ')}\|B^{-1}\|_{\mathcal{L}(Y ',X)} \leq C_{b}/c_{b}\). For instance, when (2.5) represents a convection dominated convection-diffusion problem with the classical choice X = Y = H 0 1(Ω), the quotient C b ∕c b becomes very large. Since

$$\displaystyle{ \|B\|_{\mathcal{L}(X,Y ')}^{-1}\|Bv - f\|_{ Y '} \leq \| u - v\|_{X} \leq \| B^{-1}\|_{ \mathcal{L}(Y ',X)}\|Bv - f\|_{Y '}, }$$
(2.7)

the error \(\|u - v\|_{X}\) can then not be tightly estimated by the residual \(\|Bv - f\|_{Y '}\).

2.3.1.2 Renormation

On an abstract level the following principle has surfaced in a number of different contexts, such as least squares methods (see e.g. [5, 8]) and the more recently emerged so-called Discontinuous Petrov-Galerkin (DPG) methods, see e.g. [3, 16, 19, 20] and the references therein. The idea is to fix a norm, \(\|\cdot \|_{Y }\), say, and to modify the norm for X so that the corresponding operator even becomes an isometry. More precisely, define

$$\displaystyle{ \|u\|_{\hat{X}}:=\sup _{v\in Y }\frac{b(u,v)} {\|v\|_{Y }} =\| \mathit{Bu}\|_{Y '} =\| R_{Y }^{-1}\mathit{Bu}\|_{ Y }, }$$
(2.8)

where R Y : Y → Y ′ is the Riesz map defined by (v, z) Y  = (R Y v)(z). The following fact is readily verified, see e.g. [16, 40].

Remark 2.1.

One has \(\kappa _{\hat{X},Y }(B) = 1\), i.e., (2.6) holds with c b  = C b  = 1 when \(\|\cdot \|_{X}\) is replaced by \(\|\cdot \|_{\hat{X}}\).

Alternatively, fixing X and redefining \(\|\cdot \|_{Y }\) by \(\|v\|_{\hat{Y }}:=\| B^{{\ast}}v\|_{X'}\), one has \(\kappa _{X,\hat{Y }}(B) = 1\), see [16]. Both possibilities lead to the error residual relations

$$\displaystyle{ \|u - w\|_{X} =\| f -\mathit{Bw}\|_{\hat{Y }'},\quad \|u - w\|_{\hat{X}} =\| f -\mathit{Bw}\|_{Y '},\quad u,w \in X. }$$
(2.9)

2.3.2 Transport Equations

Several variants of these principles are applied and analyzed in detail in [14] for convection-diffusion equations. We concentrate in what follows on the limit case of vanishing viscosity, namely pure transport equations. For simplicity we consider the domain D = (0, 1)d , d = 1, 2, 3, with \(\varGamma:= \partial D\), denoting as usual by \({\boldsymbol n} ={\boldsymbol n}(x)\) the unit outward normal at x ∈ Γ (defined, of course, only away from the corners and edges of Γ). Moreover, we consider velocity fields \({\boldsymbol b}(x)\), x ∈ D, which for simplicity will always be assumed to be differentiable, i.e., \({\boldsymbol b}(x) \in C^{1}(\overline{D})^{d}\). Likewise \(c(x) \in C^{0}(\overline{D})\) will serve as the reaction term in the first order transport equation

$$\displaystyle{ {\boldsymbol b} \cdot \nabla u + cu =\; \mbox{ $f_{\circ }$ in $D$}\,,\quad u =\; \mbox{ $g$ on $\varGamma _{-}$}\,, }$$
(2.10)

where \(\varGamma _{\pm }:=\{ x \in \partial D:\; \pm {\boldsymbol b}(x) \cdot {\boldsymbol n}(x) > 0\}\) denotes the outflow (+) and inflow (−) boundary, respectively. Furthermore, to simplify the exposition we shall always assume that \(2c -\nabla \cdot {\boldsymbol b} \geq c_{0} > 0\) holds in D.

A priori there does not seem to be any “natural” variational formulation. Nevertheless, the above principle can be invoked as follows. Following e.g. [16], one can show that the associated bilinear form with derivatives on the test functions

$$\displaystyle{ b(w,v):=\int _{D}\,w(-{\boldsymbol b} \cdot \nabla v + v(c -\nabla \cdot {\boldsymbol b}))\;\mathit{dx}, }$$
(2.11)

is trivially bounded on \(L_{2}(D) \times W_{0}(-{\boldsymbol b},D)\), where

$$\displaystyle{ W_{0}(\mp {\boldsymbol b},D):=\mathrm{ clos}_{\|\cdot \|_{W({\boldsymbol b},D)}}\{v \in C^{1}(D) \cap C(\overline{D}),\,v\mid _{\varGamma _{ \pm }} \equiv 0\}, }$$
(2.12)

and

$$\displaystyle{ \|v\|_{W({\boldsymbol b},D)}:= \left (\|v\|_{L_{2}(D)}^{2} +\int _{ D}\vert {\boldsymbol b} \cdot \nabla v\vert ^{2}\,\mathit{dx}\right )^{1/2}. }$$
(2.13)

Moreover, the trace \(\gamma _{-}(v)\) exists for \(v \in W_{0}(-{\boldsymbol b},D)\) in \(L_{2}(\varGamma _{-},\vert {\boldsymbol b} \cdot {\boldsymbol n}\vert )\), endowed with the norm \(\|g\|_{L_{2}(\varGamma _{\pm },\vert {\boldsymbol b}\cdot {\boldsymbol n}\vert )}^{2} =\int _{\varGamma _{\pm }}\vert g\vert ^{2}\vert {\boldsymbol b} \cdot {\boldsymbol n}\vert ds\), so that

$$\displaystyle{ f(v):= (f_{\circ },v) +\int _{\varGamma _{-}}g\gamma _{-}(v)\vert {\boldsymbol b} \cdot {\boldsymbol n}\vert ds }$$
(2.14)

belongs to \((W_{0}({\boldsymbol b},D))'\) and the variational problem

$$\displaystyle{ b(u,v) = f(v),\quad v \in W_{0}(-{\boldsymbol b},D) }$$
(2.15)

possesses a unique solution in L 2(D) which, when regular enough, coincides with the classical solution of (2.10), see [16, Theorem 2.2].

Moreover, since X = L 2(D) = X′, the quantity \(\|v\|_{Y }:=\| B^{{\ast}}v\|_{L_{2}(D)}\) is an equivalent norm on \(W_{0}(-{\boldsymbol b},D)\), see [16], and Remark 2.1 applies, i.e.,

$$\displaystyle{ \|B\|_{\mathcal{L}(L_{2}(D),(W_{0}({\boldsymbol b},D))')} =\| B^{{\ast}}\|_{ \mathcal{L}(W_{0}({\boldsymbol b},D),L_{2}(D))} = 1, }$$
(2.16)

see [16, Proposition 4.1]. One could also reverse the roles of test and trial space (with the inflow boundary conditions being then essential ones) but the present formulation imposes least regularity on the solution which will be essential in the next section. Note that whenever a PDE is written as a first order system, X can always be arranged as an L 2-space.

Our particular interest concerns the parametric case, i.e., the constant convection field \({\boldsymbol s}\) in

$$\displaystyle{ \begin{array}{rcl} {\boldsymbol s} \cdot \nabla u(x,{\boldsymbol s}) +\kappa (x)u(x,{\boldsymbol s})& =&f_{\circ }(x),\quad x \in D \subset \mathbb{R}^{d},\,\,d = 2,3, \\ u(x,{\boldsymbol s})& =&g(x,{\boldsymbol s}),\,x \in \varGamma _{-}({\boldsymbol s}), \end{array} }$$
(2.17)

may vary over a set of directions \(\mathcal{S}\) so that now the solution u depends also on the transport direction \({\boldsymbol s}\). In (2.17) and in what follows we assume that \(\mathrm{ess}\inf _{x\in D}\kappa (x) \geq 0\). Thus, for instance, when \(\mathcal{S} = S^{2}\), the unit 2-sphere, u is considered as a function of five variables, namely d = 3 spatial variables and parameters from a two-dimensional set \(\mathcal{S}\). This is the simplest example of a kinetic equation forming a core constituent in radiative transfer models. The in- and outflow boundaries now depend on \({\boldsymbol s}\):

$$\displaystyle{ \varGamma _{\pm }({\boldsymbol s}):=\{ x \in \partial D: \mp {\boldsymbol s} \cdot \mathbf{n}(x) < 0\},\qquad {\boldsymbol s} \in \mathcal{S}\;. }$$
(2.18)

Along similar lines one can determine u as a function of x and \({\boldsymbol s}\) in \(X = L_{2}(D \times \mathcal{S})\) as the solution of a variational problem with test space \(Y:=\mathrm{ clos}_{\|\cdot \|_{W(D\times \mathcal{S})}}\{v \in C(\mathcal{S},C^{1}(D)): v\vert _{\varGamma _{\pm }} \equiv 0\}\) with \(\|v\|_{W(D\times \mathcal{S})}^{2}:=\| v\|_{L_{2}(D\times \mathcal{S})}^{2} +\int _{\mathcal{S}\times D}\vert {\boldsymbol s} \cdot \nabla v\vert ^{2}\mathit{dx}d{\boldsymbol s}\). Again this formulation requires minimum regularity. Since later we shall discuss yet another formulation, imposing stronger regularity conditions, we refer to [16] for details.

2.3.3 \(\delta\)-Proximality and Mixed Formulations

It is initially not clear how to exploit (2.9) numerically since the perfect inf-sup stability on the infinite dimensional level is not automatically inherited by a given pair \(X_{h} \subset X,Y _{h} \subset Y\) of equal dimension. However, given \(X_{h} \subset \hat{ X}\), one can identify the “ideal” test space Y (X h ) = R Y −1 B(X h ) which may be termed ideal because

$$\displaystyle{ \sup _{w\in X_{h}}\sup _{v\in Y (X_{h})}\frac{b(w,v)} {\|w\|_{X}\|v\|_{Y }} =\inf _{w\in X_{h}}\sup _{v\in Y (X_{h})}\frac{b(w,v)} {\|w\|_{X}\|v\|_{Y }} = 1, }$$
(2.19)

see [16]. In particular, this means that the solution u h  ∈ X h of the corresponding Petrov-Galerkin scheme

$$\displaystyle{ b(u_{h},v) = f(v),\quad v \in Y (X_{h}), }$$
(2.20)

realizes the best \(\hat{X}\)-approximation to the solution u of (2.5), i.e.,

$$\displaystyle{ \|u - u_{h}\|_{\hat{X}} =\inf _{w\in X_{h}}\|u - w\|_{\hat{X}}. }$$
(2.21)

Of course, unless Y is an L 2 space, the ideal test space Y (X h ) is, in general, not computable exactly. To retain stability it is natural to look for a numerically computable test space Y h that is sufficiently close to Y (X h ).

One can pursue several different strategies to obtain numerically feasible test spaces Y h . When (2.5) is a discontinuous Galerkin formulation one can choose Y as a product space over the given partition, again with norms induced by the graph norm for the adjoint B ∗, so that the approximate inversion of the Riesz map R Y can be localized [19, 20]. An alternative, suggested in [14, 16], is based on noting that by (2.8) the ideal Petrov-Galerkin solution u h from (2.20) is a minimum residual solution in Y ′, i.e., \(u_{h} =\mathrm{ argmin}_{w\in X_{h}}\|f -\mathit{Bw}\|_{Y '}\), whose normal equations read \((f -\mathit{Bu}_{h},\mathit{Bw})_{Y '} = 0\), w ∈ X h . Since the inner product (⋅ , ⋅ ) Y ′ is numerically hard to access, one can write \((f -\mathit{Bu}_{h},\mathit{Bw})_{Y '} =\langle R_{Y }^{-1}(f -\mathit{Bu}_{h}),\mathit{Bw}\rangle\), where the dual pairing \(\langle \cdot,\cdot \rangle\) is now induced by the standard L 2-inner product. Introducing as an auxiliary variable the “lifted residual”

$$\displaystyle{ y = R_{Y }^{-1}(f -\mathit{Bu}_{ h}), }$$
(2.22)

or equivalently \((R_{Y }y)(v) =\langle R_{Y }y,v\rangle = (y,v)_{Y } =\langle f -\mathit{Bu}_{h},v\rangle\), v ∈ Y, one can show that (2.20) is equivalent to the saddle point problem

$$\displaystyle{ \begin{array}{lccl} \langle R_{Y }y,v\rangle + b(u_{h},v)& =&\langle f,v\rangle,&v \in Y, \\ b(w,y) & =& 0, &w \in X_{h},\end{array} }$$
(2.23)

which involves only standard L 2-inner products, see [16, 18].
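In a fully discrete setting, (2.23) is just a symmetric saddle point system whose upper-left block is the Gram matrix of the test space. The following sketch (our own, with random matrices standing in for the Gram matrix of V, the form b, and the load; not an actual transport discretization) assembles and solves it, and confirms the second row: the lifted residual is b-orthogonal to the trial space.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 30, 10                      # dim V > dim X_h

A = rng.standard_normal((m, m))
R = A @ A.T + m * np.eye(m)        # SPD Gram matrix of V (stands in for R_Y)
Bm = rng.standard_normal((m, n))   # Bm[i, j] = b(w_j, v_i)
fv = rng.standard_normal(m)        # fv[i] = <f, v_i>

# assemble and solve the discrete analogue of the saddle point problem (2.23)
K = np.block([[R, Bm], [Bm.T, np.zeros((n, n))]])
sol = np.linalg.solve(K, np.concatenate([fv, np.zeros(n)]))
y, u_h = sol[:m], sol[m:]          # lifted residual and trial coefficients
```

Only the Gram matrix `R` and standard (Euclidean/L2-type) pairings enter, exactly the point made in the text.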

Remark 2.2.

When working with \(X,\hat{Y }\) instead of \(\hat{X},Y\), one has R Y  = B R X −1 B ∗ and hence, when X = L 2(D) as in (2.11), one has R Y  = BB ∗.

Since the test space Y is still infinite dimensional, a numerical realization would require finding a (possibly small) subspace \(V \subset Y\) such that the analogous saddle point problem with Y replaced by V is still inf-sup stable. The relevant condition on V can be described by the notion of \(\delta\)-proximality introduced in [16], see also [14]. We recall the formulation from [18]: \(V \subset Y\) is \(\delta\) -proximal for \(X_{h} \subset \hat{ X}\) if, for some \(\delta \in (0,1)\), with P Y, V denoting the Y -orthogonal projection from Y to V,

$$\displaystyle{ \|(I - P_{Y,V })R_{Y }^{-1}\mathit{Bw}\|_{ Y } \leq \delta \| R_{Y }^{-1}\mathit{Bw}\|_{ Y },\quad w \in X_{h}\;. }$$
(2.24)

For a discussion of how to monitor or realize \(\delta\)-proximality we refer to [14, 16], see also Sect. 2.3.5.

Theorem 2.2 ([14, 16, 18]).

Assume that for given \(X_{h} \times V \subset X \times Y\) the test space V is \(\delta\) -proximal for X h , i.e.  (2.24) is satisfied. Then, the solution \((u_{X_{h},V },y_{X_{h},V }) \in X_{h} \times V\) of the saddle point problem

$$\displaystyle{ \begin{array}{lccl} \langle R_{Y }y_{X_{h},V },v\rangle + b(u_{X_{h},V },v)& =&\langle f,v\rangle,&v \in V, \\ b(w,y_{X_{h},V }) & =& 0, &w \in X_{h}, \end{array} }$$
(2.25)

satisfies

$$\displaystyle{ \|u - u_{X_{h},V }\|_{\hat{X}} \leq \frac{1} {1-\delta }\inf _{w\in X_{h}}\|u - w\|_{\hat{X}} }$$
(2.26)

and

$$\displaystyle{ \|u - u_{X_{h},V }\|_{\hat{X}} +\| y - y_{X_{h},V }\|_{Y } \leq \frac{2} {1-\delta }\inf _{w\in X_{h}}\|u - w\|_{\hat{X}}. }$$
(2.27)

Moreover, one has

$$\displaystyle{ \inf _{w\in X_{h}}\sup _{v\in V }\frac{b(w,v)} {\|v\|_{Y }\|w\|_{\hat{X}}} \geq \sqrt{1 -\delta ^{2}}. }$$
(2.28)

Finally,  (2.25) is equivalent to the Petrov-Galerkin scheme

$$\displaystyle{ b(u_{X_{h},V },v) = f(v),\quad v \in Y _{h}:= P_{Y,V }(R_{Y }^{-1}B(X_{ h})) = P_{Y,V }(Y (X_{h})). }$$
(2.29)

The central message is that the Petrov-Galerkin scheme (2.29) can be realized without computing a basis for the test space Y h , which for each basis function could require solving a problem of the size dim V, by solving instead the saddle point problem (2.25). Moreover, the stability of both problems is governed by the \(\delta\)-proximality of V. As a by-product, in view of (2.22), the solution component \(y_{X_{h},V }\) approximates the exact lifted residual \(R_{Y }^{-1}(f -\mathit{Bu}_{X_{h},V })\) and, as pointed out below, can be used for an a posteriori error control.

The problem (2.25), in turn, can be solved with the aid of an Uzawa iteration whose efficiency relies again on \(\delta\)-proximality: for k = 0, 1, …, solve

$$\displaystyle\begin{array}{rcl} \langle R_{Y }y^{k},v\rangle & =& \langle f -\mathit{Bu}^{k},v\rangle,\quad v \in V, \\ (u^{k+1},w)_{\hat{ X}}& =& (u^{k},w)_{\hat{ X}} +\langle B^{{\ast}}y^{k},w\rangle,\quad w \in X_{ h}.{}\end{array}$$
(2.30)

Thus, each iteration requires solving a symmetric positive definite Galerkin problem in V for the approximate lifted residual.
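In the abstract discrete setting used above, the Uzawa sweep (2.30) can be sketched as follows (our own illustration with random stand-in matrices; here the X̂-Gram matrix of X h is taken as BᵀR⁻¹B, the discrete counterpart of (2.8)). Note that when V is the entire discrete test space, as in this sketch, one has δ = 0 and the iteration converges in a single sweep; a genuinely smaller δ-proximal subspace V yields the contraction rate δ of (2.31).

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 30, 10
A = rng.standard_normal((m, m))
R = A @ A.T + m * np.eye(m)              # SPD Gram matrix of the test space V
Bm = rng.standard_normal((m, n))         # Bm[i, j] = b(w_j, v_i)
fv = rng.standard_normal(m)

# Xhat-Gram matrix of X_h: (w, w')_Xhat = (R^{-1} B w, R^{-1} B w')_Y
G = Bm.T @ np.linalg.solve(R, Bm)

u = np.zeros(n)
for k in range(3):
    # first row of (2.30): SPD Galerkin problem in V for the lifted residual
    y = np.linalg.solve(R, fv - Bm @ u)
    # second row: Xhat-Riesz lift of B* y^k updates the trial iterate
    u = u + np.linalg.solve(G, Bm.T @ y)

y = np.linalg.solve(R, fv - Bm @ u)      # final lifted residual
```

The converged iterate satisfies the second row of the saddle point system, i.e., the lifted residual is b-orthogonal to X h.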

Theorem 2.3 ([16, Theorem 4.3]).

Assume that  (2.24) is satisfied. Then the iterates generated by the scheme  (2.30) converge to \(u_{X_{h},V }\) and

$$\displaystyle{ \|u_{X_{h},V } - u^{k+1}\|_{ \hat{X}} \leq \delta \| u_{X_{h},V } - u^{k}\|_{ \hat{X}},\quad k = 0,1,2,\ldots. }$$
(2.31)

2.3.4 Adaptive Petrov-Galerkin Solvers on Anisotropic Approximation Spaces

The benefit of the above saddle point formulation is not only that it saves us the explicit calculation of the test basis functions but also that it provides an error estimator based on the lifted residual \(y_{h} = y_{h}(u_{X_{h},V },f)\) defined by the first row of (2.25).

2.3.4.1 Abstract \(\delta\)-Proximal Iteration

In fact, it is shown in [16] that when V h  ⊂ Y is even \(\delta\)-proximal for X h + B −1 F h , with some finite dimensional subspace \(F_{h} \subset Y '\), one has

$$\displaystyle{ (1-\delta )\|f_{h} -\mathit{Bw}\|_{Y '} \leq \| y_{h}(w,f_{h})\|_{Y } \leq \| f_{h} -\mathit{Bw}\|_{Y '},\quad w \in X_{h}, }$$
(2.32)

where f h  ∈ F h is an approximation of f ∈ Y ′. The space F h controls which components of f are accounted for in the error estimator; the term f − f h is a data oscillation error as encountered in adaptive finite element methods. It follows that the current error of the Petrov-Galerkin approximation \(u_{X_{h},V }\) is controlled from below and above by the quantity \(\|y_{h}\|_{Y }\). This can be used to formulate the adaptive Algorithm 1 below, whose precise formulation can be found in [16, § 4.2]. It is shown in [16, Proposition 4.7] that each refinement step of Algorithm 1 reduces the error by a fixed fraction. Hence the algorithm terminates after finitely many steps and outputs an approximate solution \(\bar{u}\) satisfying \(\|u -\bar{ u}\|_{\hat{X}} \leq \varepsilon\).

Algorithm 1 Adaptive algorithm

2.3.4.2 Application to Transport Equations

We adhere to the setting described in Sect. 2.3.2, i.e., \(X =\hat{ X} = L_{2}(D)\), \(\hat{Y } = Y = W_{0}(-{\boldsymbol b},D)\), and R Y  = BB ∗.

The trial spaces, which we now denote by X j to emphasize the nested construction below, are spanned by discontinuous piecewise linear functions on meshes composed of cells from the collections \(\mathcal{C}_{j}\), i.e.,

$$\displaystyle{ X_{j} =\mathbb{P} _{1}(\mathcal{C}_{j}),\quad j \geq 0, }$$
(2.34)

where the collections \(\mathcal{C}_{j}\) are derived from collections \(\tilde{\mathcal{C}}_{j}\) of the type (2.4) as described in Sect. 2.2.

Given X j of the form (2.34), the test spaces V j are defined by

$$\displaystyle{ V _{j}:=\mathbb{P} _{2}(\mathcal{G}_{j}) \cap C(D)\quad \mbox{ with}\quad \mathcal{G}_{j}:=\{ R^{\mathit{iso}}(Q): Q \in \mathcal{C}_{ j}\},\; }$$
(2.35)

where \(R^{\mathit{iso}}(Q) =\{ Q \cap P_{i}: i = 1,\ldots,4\}\) is defined as follows. Let P be a parallelogram containing Q and sharing at least three vertices with Q. (There exist at most two such parallelograms and we choose one of them.) Then the parallelograms P i result from a dyadic refinement of P. As pointed out later, the test spaces V j constructed in this way appear to be sufficiently large to ensure \(\delta\)-proximality for X j with \(\delta\) significantly smaller than one, uniformly with respect to j.

Since the test spaces \(V_{j}\) are determined by the trial spaces \(X_{j}\), the crucial step is to generate \(X_{j+1}\) by enlarging \(X_{j}\) based on an a posteriori criterion that “senses” directional information. This, in turn, is tantamount to a possibly anisotropic adaptive refinement of \(\mathcal{C}_{j}\) leading to the updated spaces for the next iteration sweep of the form (2.30). The idea is to use a greedy strategy based on the largest “fluctuation coefficients”. To describe this, we denote for each \(\iota_{Q} \in I_{Q}\) by \(\varPsi _{R_{\iota _{ Q}}(Q)}\) an orthonormal Alpert-type wavelet basis for the difference space \(\mathbb{P}_{1}(R_{\iota _{Q}}(Q)) \ominus \mathbb{P}_{1}(Q)\), see [1]. We then set

$$\displaystyle{ \varPsi _{j} =\{\psi _{\gamma } \in \varPsi _{R(Q)}: Q \in \mathcal{C}_{j-1}\}, }$$
(2.36)

where \(\varPsi _{R(Q)} =\bigcup _{\iota _{Q}\in I_{Q}}\varPsi _{R_{\iota _{ Q}}(Q)}\). Initializing \(\mathcal{C}_{0}\) as a uniform partition (on a low level), we define for some fixed θ ∈ (0, 1)

$$\displaystyle{T_{j} =\theta \cdot \max _{\psi _{\gamma }\in \varPsi _{j}}\vert \langle B^{{\ast}}r_{ j}^{K},\psi _{\gamma }\rangle \vert }$$

for j > 0, where \(\varPsi_{j}\) is the two-level basis defined in (2.36) and \(r_{j}^{K} = y_{j}^{K}\) is the lifted residual from the first row of the Uzawa iteration. Then, for each \(Q \in \mathcal{C}_{j-1}\), we define its refinement \(\tilde{R}(Q)\) (see the remarks following (2.4)) by

$$\displaystyle{\tilde{R}(Q):= \left \{\begin{array}{rl} \{Q\},&\quad \mbox{ if}\quad \max _{\psi _{\gamma }\in \varPsi _{R(Q)}}\vert \langle B^{{\ast}}r_{j}^{K},\psi _{\gamma }\rangle \vert \leq T_{j}, \\ R_{\hat{\iota }_{Q}}(Q),&\quad \mbox{ otherwise}, \end{array} \right.\,\,}$$

where \(\hat{\iota }_{Q}\) is chosen to maximize \(\max _{\psi _{\gamma }\in \varPsi _{R_{ \iota _{Q}}(Q)}}\vert \langle B^{{\ast}}r_{j}^{K},\psi _{\gamma }\rangle \vert \) among all \(\iota_{Q} \in I_{Q}\). One can then check whether this enrichment yields a sufficiently accurate \(L_{2}\)-approximation of \(B^{\ast}r_{j}^{K}\) (step 3 of Algorithm 1). If so, we adopt \(\mathcal{C}_{j}\); otherwise, the procedure is repeated with a smaller threshold θ.
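The bulk-threshold marking described above can be sketched as follows; the data layout (a map from each cell Q to the per-refinement-option maxima of the fluctuation coefficients \(\vert \langle B^{\ast}r_{j}^{K},\psi_{\gamma}\rangle \vert\)) is an illustrative assumption, not the layout used in [17].

```python
# Greedy marking based on "fluctuation coefficients".
# coeffs: {Q: {option: max |<B* r, psi>| over that option's wavelets}}.
# A cell is refined with its best option only if its largest coefficient
# exceeds the threshold T = theta * (global maximum), cf. the rule above.

def mark_cells(coeffs, theta):
    """Return {Q: chosen refinement option, or None to keep Q unrefined}."""
    T = theta * max(max(opts.values()) for opts in coeffs.values())
    marks = {}
    for Q, opts in coeffs.items():
        best_opt, best_val = max(opts.items(), key=lambda kv: kv[1])
        marks[Q] = best_opt if best_val > T else None
    return marks
```

For instance, with theta = 0.5 a cell whose best coefficient is below half the global maximum is kept, while the others are refined by their maximizing option.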

2.3.5 Numerical Results

We provide some numerical experiments to illustrate the performance of the previously introduced anisotropic adaptive scheme for first order linear transport equations and refer to [17] for further tests. We monitor \(\delta\)-proximality by computing

$$\displaystyle{ \frac{\inf _{\phi \in V _{j}}\|u_{j} - u_{j}^{K} - B^{{\ast}}\phi \|_{L_{2}([0,1]^{2})}} {\|u_{j} - u_{j}^{K}\|_{L_{2}([0,1]^{2})}}, }$$
(2.37)

where \(u_{j} =\mathrm{ argmin}_{v_{j}\in X_{j}}\|u - v_{j}\|_{L_{2}(D)}\). This is only a lower bound of the \(\delta\)-proximality constant \(\delta\), for one particular choice of w in (2.24), which coincides with the choice of w in the proof in [16]. In the following experiment, the number K of Uzawa iterations is for simplicity set to K = 10. One could as well employ an early termination of the inner iteration based on a posteriori control of the lifted residuals \(r_{j}^{k}\).

We consider the transport equation (2.10) with zero boundary condition g = 0, convection field \({\boldsymbol b} = (x_{2},1)^{T}\), and right hand side \(f =\chi _{\{x_{1}>x_{2}^{2}/2\}} + 1/2 \cdot \chi _{\{x_{1}\leq x_{2}^{2}/2\}}\) so that the solution exhibits a discontinuity along the curvilinear shear layer given by \(x_{1} = \frac{1} {2}x_{2}^{2}\).

In this numerical example we actually explore ways of reducing the relatively large number of possible splits corresponding to the operators \(R_{\iota _{Q}}\), \(\iota_{Q} \in I_{Q}\), while still realizing the parabolic scaling law. In fact, we confined the cells to intersections of parallelograms P with the domain D, much in the spirit of shearlet systems, employing anisotropic refinements as illustrated in Fig. 2.2 as well as the isotropic refinement \(R^{\mathit{iso}}\). Permitting occasional overlaps of parallelograms, one can even avoid any interior triangles, apparently without degrading the accuracy of the adaptive approximation. The general refinement scheme described in Sect. 2.2 covers the presently proposed one as a special case, except, of course, for the possible overlap of cells.

Fig. 2.2
figure 2

Possible directional adjustments are illustrated for a parallelogram P (dashed line). (a) Rule (iii) of Sect. 2.2 yields two parallelograms with the same “direction”. (b), (c) Applying rule (i) twice, changes the anisotropic direction slightly. The three refined parallelograms depicted in (b), (c) illustrate the results of a possible merging of adjacent triangles

Figure 2.3a, b shows the adaptive grids associated with the trial space \(X_{5}\) and the test space \(V_{5}\). The refinement in the neighborhood of the discontinuity curve reflects a highly anisotropic structure. Figure 2.3c illustrates the approximation given by 306 basis elements. We emphasize that the approximation is very accurate in the vicinity of the discontinuity curve: oscillations across the jump are almost completely absent and in fact much less pronounced than observed for isotropic discretizations. Figure 2.3d indicates the optimal rate realized by our scheme, see Theorem 2.1.

Fig. 2.3
figure 3

(a) Adaptive grid for the trial space \(X_{5}\). (b) Adaptive grid for the test space \(V_{5}\). (c) Approximate solution (306 basis elements). (d) \(L_{2}(D)\) errors (vertical axis) for N degrees of freedom (horizontal axis) achieved by the adaptive scheme (blue) in comparison with the optimal rate \(N^{-1}\) (red), predicted by Theorem 2.1. This is to be compared with the rate \(N^{-1/2}\) realized by adaptive isotropic refinements [16]

The estimated values of the proximality parameter \(\delta\), displayed in Table 2.1, indicate the numerical stability of the scheme.

Table 2.1 Numerical estimates (2.37) for the proximality constant \(\delta\) and for the \(L_{2}\) approximation error

In the remainder of the paper we discuss parametric equations whose solutions are functions of spatial variables and additional parameters. Particular attention will be paid to radiative transfer problems, where the dimension of the physical domain is 2 or 3.

2.4 Reduced Basis Methods

2.4.1 Basic Concepts and Rate Optimality

Model reduction is often necessary when solutions to parametric families of PDEs are frequently queried for different parameter values, e.g. in an online design or optimization process. The linear transport equation (2.17) is a simple example of such a parameter dependent PDE. Since (a) propagation of singularities is present and (b) the parameters determine the propagation direction \({\boldsymbol s}\), it already poses serious difficulties for standard model reduction techniques.

We emphasize that, rather than considering a single variational formulation for functions of spatial variables and parameters, as will be done later in Sect. 2.5, we take up the parametric nature of the problem by considering a parametric family of variational formulations. That is, for each fixed \({\boldsymbol s}\) the problem is an ordinary linear transport problem for which we can employ the corresponding variational formulation from Sect. 2.3.2, where now the respective spaces may depend on the parameters. In this section we summarize some of the results from [18] which are based in an essential way on the concepts discussed in the previous section.

In general, consider a family

$$\displaystyle{ b_{\mu }(u,v) = f(v),\quad u \in X_{\mu },\,v \in Y _{\mu },\,\,\mu \in \mathcal{P},\,\,b_{\mu }(u,v) =\sum _{ k=1}^{M}\varTheta _{ k}(\mu )b_{k}(u,v) }$$
(2.38)

of well-posed problems, where \(\mathcal{P}\subset \mathbb{R}^{P}\) is a compact set of parameters μ, and the parameter dependence is assumed to be affine with smooth functions \(\varTheta_{k}\). The solutions u(⋅ ; μ) = u(μ) then become functions of the spatial variables and of the parameters \(\mu \in \mathcal{P}\).

As before we can view (2.38) as a parametric family of operator equations \(B_{\mu }u = f\), where \(B_{\mu }: X_{\mu } \rightarrow Y_{\mu }'\) is again given by \((B_{\mu }u)(v) = b_{\mu }(u,v)\). Each particular solution u(μ) is a point on the solution manifold

$$\displaystyle{ \mathcal{M}:=\{ B_{\mu }^{-1}f:\mu \in \mathcal{P}\}. }$$
(2.39)

Rather than viewing u(μ) as a point in a very high-dimensional (in fact infinite dimensional) space, and calling a standard solver for each evaluation in a frequent query problem, the Reduced Basis Method (RBM) tries to exploit the fact that each u(μ) belongs to the much lower-dimensional manifold \(\mathcal{M}\). Assuming that all the spaces \(X_{\mu}\) are equivalent to a reference Hilbert space X with norm \(\|\cdot \|_{X}\), the key objective of the RBM is to construct a linear space \(X_{n} \subset X\) of possibly small dimension such that for a given target accuracy \(\varepsilon > 0\)

$$\displaystyle{ \sup _{\mu \in \mathcal{P}}\inf _{w\in X_{n}}\|u(\mu ) - w\|_{X}:=\mathrm{ maxdist}_{X}(\mathcal{M},X_{n}) \leq \varepsilon. }$$
(2.40)

Once \(X_{n}\) has been found, bounded linear functionals of the exact solution u(μ) can be approximated within accuracy \(\varepsilon\) by the functional applied to an approximation from \(X_{n}\) which, when n is small, can hopefully be determined at very low cost. The computational work in an RBM is therefore divided into an offline and an online stage. Finding \(X_{n}\) is the core offline task, which is allowed to be computationally (very) expensive. More generally, solving problems in the “large” space X is part of the offline stage. Of course, solving a problem in X is already idealized. In practice X is replaced by a possibly very large trial space, typically a finite element space, which is referred to as the truth space and should be chosen large enough to guarantee the desired target accuracy, ideally certified by a posteriori bounds.

The computation of a (near-)best approximation \(u_{n}(\mu ) \in X_{n}\) is then to be online feasible. More precisely, one seeks to obtain a representation

$$\displaystyle{ u_{n}(\mu ) =\sum _{ j=1}^{n}c_{ j}(\mu )\phi _{j}, }$$
(2.41)

where the \(\phi_{j}\) form a basis for \(X_{n}\) and where for each query \(\mu \in \mathcal{P}\) the expansion coefficients \(c_{j}(\mu )\) can be computed by solving only problems of size n, see e.g. [39] for principles of practical realizations. Of course, such a concept pays off when the dimension \(n = n(\varepsilon )\), needed to realize (2.40), grows very slowly as \(\varepsilon\) decreases. This means that the elements of \(\mathcal{M}\) have sparse representations with respect to certain problem dependent dictionaries.
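The offline/online split behind (2.41) can be illustrated for an affine family (2.38) in a discrete setting; the matrices `A_terms`, the right hand side `f`, and the basis `Phi` below are hypothetical stand-ins for the truth quantities, not quantities from [39].

```python
import numpy as np

# Offline: project each affine term and the right-hand side onto the
# reduced basis Phi (columns = basis of X_n); this is the expensive stage.
# Online: assemble sum_k Theta_k(mu) B_k and solve an n x n system only.

def offline(A_terms, f, Phi):
    """Precompute reduced matrices B_k = Phi^T A_k Phi and reduced rhs."""
    B_terms = [Phi.T @ A @ Phi for A in A_terms]   # each n x n
    g = Phi.T @ f
    return B_terms, g

def online(B_terms, g, thetas):
    """Per query: assemble the affine sum and solve for c(mu) in (2.41)."""
    B_mu = sum(t * B for t, B in zip(thetas, B_terms))
    return np.linalg.solve(B_mu, g)
```

The online cost is independent of the truth dimension: only the small reduced matrices enter the per-query solve.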

The by now most prominent strategy for constructing “good” spaces \(X_{n}\) can be sketched as follows. Evaluating the quantity \(\mathrm{maxdist}_{X}(\mathcal{M},X_{n})\) for a given \(X_{n}\) is infeasible because it would require determining the solution u(μ) for each \(\mu \in \mathcal{P}\) (or for each μ in a large training set \(\mathcal{P}_{h} \subset \mathcal{P}\), which for simplicity we also denote by \(\mathcal{P}\)), which even for the offline stage is way too expensive. Therefore, one chooses a surrogate \(R_{n}(\mu, X_{n})\) such that

$$\displaystyle{ \inf _{w\in X_{n}}\|u(\mu ) - w\|_{X} \leq R_{n}(\mu,X_{n}),\quad \mu \in \mathcal{P}, }$$
(2.42)

where the evaluation of \(R_{n}(\mu, X_{n})\) is fast, so that an optimization of \(R_{n}(\mu, X_{n})\) can be performed in the offline stage. This leads to the greedy algorithm in Algorithm 2.

Algorithm 2 Greedy algorithm
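Schematically, the surrogate-driven snapshot selection of Algorithm 2 reads as follows; `truth_solve` and `surrogate` are placeholders for the expensive truth solver and the cheap error bound (2.42), and the training set `params` stands for \(\mathcal{P}_{h}\).

```python
# Weak greedy basis construction: at each step, maximize the (cheap)
# surrogate over the training set, then add one (expensive) truth
# snapshot for the maximizing parameter.

def greedy_basis(params, truth_solve, surrogate, n_max, tol):
    basis = []
    while len(basis) < n_max:
        mu_star = max(params, key=lambda mu: surrogate(mu, basis))
        if surrogate(mu_star, basis) <= tol:
            break                      # certified accuracy reached
        basis.append(truth_solve(mu_star))  # one truth solve per step
    return basis
```

Note that only one truth solve per greedy step is required; all other work involves the surrogate alone.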

A natural question is how the spaces \(X_{n}\) constructed in such a greedy fashion compare with “best spaces” in the sense of the Kolmogorov n-widths

$$\displaystyle{ d_{n}(\mathcal{M})_{X}:=\inf _{\mathrm{dim}\,W_{n}=n}\sup _{w\in \mathcal{M}}\inf _{z\in W_{n}}\|w - z\|_{X}\;. }$$
(2.44)

The n-widths are expected to decay faster the more regularly u(μ) depends on μ. In this case an RBM has a chance to perform well.

Clearly, one always has \(d_{n}(\mathcal{M})_{X} \leq \mathrm{ maxdist}_{X}(\mathcal{M},X_{n})\). Unfortunately, the best constant \(C_{n}\) for which \(\mathrm{maxdist}_{X}(\mathcal{M},X_{n}) \leq C_{n}d_{n}(\mathcal{M})_{X}\) holds is \(C_{n} = 2^{n}\), see [4, 6]. Nevertheless, when comparing rates rather than individual values, one arrives at more positive results [4, 21]. The following consequence of these results asserts optimal performance of the greedy algorithm provided that the surrogate sandwiches the error of best approximation.

Theorem 2.4 ([18, Theorem 1.3]).

Assume that there exists a constant \(0 < c_{R} \leq 1\) such that one has for all n

$$\displaystyle{ c_{R}R_{n}(\mu,X_{n}) \leq \inf _{w\in X_{n}}\|u(\mu ) - w\|_{X} \leq R_{n}(\mu,X_{n}),\quad \mu \in \mathcal{P}. }$$
(2.45)

Then the spaces \(X_{n}\) produced by Algorithm 2 satisfy

$$\displaystyle{ d_{n}(\mathcal{M})_{X} \leq Cn^{-\alpha }\quad \Longrightarrow\quad \mathrm{maxdist}_{ X}(\mathcal{M},X_{n}) \leq \bar{C}n^{-\alpha }, }$$
(2.46)

where \(\bar{C}\) depends only on C,α, and \(\kappa (R_{n}):= 1/c_{R}\) , the condition of the surrogate.

We call the RBM rate-optimal whenever (2.46) holds for any α > 0. Hence, finding rate-optimal RBMs amounts to finding feasible well-conditioned surrogates.

2.4.2 A Double Greedy Method

Feasible surrogates that do not require the explicit computation of truth solutions for each \(\mu \in \mathcal{P}\) need to be based in one way or another on residuals. When (2.38) is a family of uniformly X-elliptic problems, so that the \(B_{\mu}\) are uniformly bounded isomorphisms from X onto X′, residuals indeed lead to feasible surrogates whose condition depends on the ratio of the continuity and coercivity constants. This follows from the mapping property of \(B_{\mu}\), the stability of the Galerkin method, and the best approximation property of the Galerkin projection, see [18].

When the problems (2.38) are indefinite, or unsymmetric and singularly perturbed, these mechanisms no longer work, which explains why conventional RBMs do not perform well for transport dominated problems: they are far from rate-optimal.

As shown in [18], a remedy is offered by the above renormation principle providing well-conditioned variational formulations for (2.38). In principle, these allow one to relate errors (in a norm of choice) to residuals in a suitably adapted dual norm which are therefore candidates for surrogates. The problem is that, given a trial space X n , in particular a space generated in the context of an RBM, it is not clear how to obtain a sufficiently good test space such that the corresponding Petrov-Galerkin projection is comparable to the best approximation. The new scheme developed in [18] is of the following form:

  1. (I)

    Initialization: take \(X_{1}:=\mathrm{span}\,\{u(\mu_{1})\}\), \(\mu_{1}\) randomly chosen, \(Y_{1}:=\{0\}\);

  2. (II)

    Given a pair of spaces \(X_{n},\tilde{V }_{n}\), the routine Update-inf-sup- \(\delta\) enriches \(\tilde{V }_{n}\) to a larger space V n which is \(\delta\)-proximal for X n ;

  3. (III)

    Extend \(X_{n}\) to \(X_{n+1}\) by a greedy step according to Algorithm 2, set \(\tilde{V }_{n+1} = V _{n}\), and go to (II) as long as a given target tolerance for an a posteriori threshold is not met.

The routine Update-inf-sup- \(\delta\) works roughly as follows (see also [25] in the case of the Stokes system). First, we search for a parameter \(\bar{\mu }\in \mathcal{P}\) and a function \(\bar{w} \in X_{n}\) for which the inf-sup condition is worst, i.e.

$$\displaystyle{ \sup _{v\in \tilde{V }_{n}}\frac{b_{\bar{\mu }}(\bar{w},v)} {\|v\|_{Y _{\bar{\mu }}}\|\bar{w}\|_{\hat{X}_{\bar{\mu }}}} =\inf _{\mu \in \mathcal{P}}\left (\inf _{w\in X_{n}}\sup _{v\in \tilde{V }_{n}}\frac{b_{\mu }(w,v)} {\|v\|_{Y _{\mu }}\|w\|_{\hat{X}_{\mu }}} \right ). }$$
(2.47)

If this worst case inf-sup constant does not yet exceed a desired uniform lower bound, \(\tilde{V }_{n}\) does not yet contain an effective supremizer, i.e., a function realizing the supremum in (2.47), for \(\bar{\mu },\bar{w}\). However, since the truth space satisfies a uniform inf-sup condition, due to the same variational formulation, there exists a good supremizer in the truth space, which is given by the Galerkin problem

$$\displaystyle{\bar{v} = R_{Y _{\bar{\mu }}}^{-1}B_{\bar{\mu }}\bar{w} =\mathrm{ argmax}_{ v\in Y _{\bar{\mu }}}\frac{b_{\bar{\mu }}(\bar{w},v)} {\|v\|_{Y _{\bar{\mu }}}\|\bar{w}\|_{\hat{X}_{\bar{\mu }}}},}$$

providing the enrichment \(\tilde{V }_{n} \rightarrow \mathrm{ span}\{\tilde{V }_{n},R_{Y _{\bar{\mu }}}^{-1}B_{\bar{\mu }}\bar{w}\}\).

The interior greedy stabilization loop (II) ensures that the input pair \(X_{n}, V_{n}\) in step (III) is inf-sup stable with an inf-sup constant as close to one as one wishes, depending on the choice of \(\delta < 1\). By 2.2, each solution \(u_{n}(\mu )\) of the discretized system for \((X_{h},V) = (X_{n},V_{n})\) satisfies the near-best approximation properties (2.26) and (2.27). Hence \(\|f - B_{\mu }u_{n}(\mu )\|_{Y _{\mu }'}\) is a well-conditioned surrogate (with condition close to one). Therefore, the assumptions of Theorem 2.4 hold, so that the outer greedy step (III) yields a rate-optimal update. In summary, under the precise assumptions detailed in [18], the above double greedy scheme is rate-optimal.
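In a discrete (truth-level) setting, the supremizer enrichment underlying Update-inf-sup-\(\delta\) amounts to a Riesz lift \(\bar{v} = R_{Y_{\bar{\mu }}}^{-1}B_{\bar{\mu }}\bar{w}\); the matrices below are illustrative stand-ins for the Gram matrix of \(Y_{\bar{\mu }}\) and the discretized operator \(B_{\bar{\mu }}\), not quantities from [18].

```python
import numpy as np

# Supremizer enrichment sketch: given the worst pair (mu_bar, w_bar),
# lift B w through the Riesz map R_Y and append the result to the
# reduced test basis. R_Y must be symmetric positive definite.

def supremizer(R_Y, B_mu, w):
    """Return v = R_Y^{-1} B_mu w, the maximizer of b_mu(w, v)/||v||_Y."""
    return np.linalg.solve(R_Y, B_mu @ w)

def enrich(V_basis, R_Y, B_mu, w):
    """Append the supremizer of w as a new column of V_basis."""
    v = supremizer(R_Y, B_mu, w)
    return np.column_stack([V_basis, v]) if V_basis.size else v[:, None]
```

Each pass of the interior loop adds one such column until the reduced inf-sup constant exceeds the prescribed bound.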

Before turning to numerical examples, a few comments on the interior greedy loop Update-inf-sup- \(\delta\) are in order.

  1. (a)

    Finding \(\bar{\mu }\) in (2.47) requires, for each μ-query, performing a singular value decomposition in the low dimensional reduced spaces, so that this is offline feasible, see [18, Remark 4.2].

  2. (b)

    When the test spaces \(Y_{\mu}\) all agree with a reference Hilbert space Y as sets, with equivalent norms, it is easy to see that the interior stabilization loop terminates after at most M steps, where M is the number of parametric components in (2.38), see [18, Remark 4.9] and [25, 38]. If, on the other hand, the spaces \(Y_{\mu}\) differ even as sets, as in the case of transport equations when the transport direction is the parameter, this is not clear beforehand. By showing that the inf-sup condition is equivalent to a \(\delta\)-proximality condition, one can nevertheless show under mild assumptions that the greedy interior loop still terminates after a number of steps which is independent of the truth dimension, [18, Remark 4.11].

  3. (c)

    In this latter case the efficient evaluation of \(\|f - B_{\mu }u(\mu )\|_{Y _{\mu }'}\) requires additional efforts, referred to as iterative tightening, see [18, Section 5.1].

  4. (d)

    The renormation strategy saves an expensive computation of stability constants as in conventional RBMs since, by construction, through the choice of \(\delta\), the stability constants can be driven as close to one as one wishes.

The scheme has been applied in [18] to convection-diffusion and pure transport problems where the convection directions are parameter dependent. Hence the variational formulations are of the form (2.38). We briefly report some results for the transport problem since this is an extreme case in the following sense. The test spaces \(Y_{\mu}\) do not agree as sets if one wants the \(X_{\mu}\) to be equivalent for different parameters; hence one faces the obstructions mentioned in (b), (c) above. Moreover, for discontinuous right hand side and discontinuous boundary conditions the dependence of the solutions on the parameters has low regularity, so that the n-widths do not decay as rapidly as in the convection-diffusion case. Nevertheless, rate-optimality still yields relatively fast convergence for the reduced spaces \(X_{n}\), as shown below.

The first example concerns (2.17) (with \(\mu ={\boldsymbol s}\) ranging over a quarter circle, \(D = (0,1)^{2}\)) for \(f_{\circ }\equiv 1\), \(g \equiv 0\). In the second example, we take \(f_{\circ }(x_{1},x_{2}) = 0.5\) for \(x_{1} < x_{2}\), \(f_{\circ }(x_{1},x_{2}) = 1\) for \(x_{1} \geq x_{2}\) (Tables 2.2, 2.3 and Fig. 2.4).

Fig. 2.4
figure 4

Surrogates of the reduced basis approximation for Examples 1 and 2

Table 2.2 Numerical results for Example 1, maximal truth error in \(L_{2}\): 0.000109832
Table 2.3 Numerical results for Example 2 after a single cycle of iterative tightening, maximal truth error in \(L_{2}\): 0.0154814

2.5 Sparse Tensor Approximation for Radiative Transfer

We now extend the parametric transport problem (2.17) to the radiative transport problem (RTP) (see, e.g., [37]) which consists in finding the radiative intensity \(u: D \times \mathcal{S}\rightarrow \mathbb{R}\), defined on the Cartesian product of a bounded physical domain \(D \subset \mathbb{R}^{d}\), where d = 2, 3, and the unit \(d_{\mathbb{S}}\)-sphere as the parameter domain: \(\mathcal{P} = \mathcal{S}\) with \(d_{\mathbb{S}} = 1,2\). Given an absorption coefficient \(\kappa \geq 0\), a scattering coefficient \(\sigma \geq 0\), and a scattering kernel or scattering phase function Φ > 0, which is normalized to \(\int _{\mathcal{S}}\varPhi ({\boldsymbol s},{\boldsymbol s}')\mathrm{d}{\boldsymbol s}' = 1\) for each direction \({\boldsymbol s}\), one defines the transport operator \(\mathrm{T}u:= ({\boldsymbol s} \cdot \nabla _{x}+\kappa )u\), and the scattering operator \(\mathrm{Q}u:=\sigma \mathrm{ Q_{1}}\!u =\sigma (u -\int _{\mathcal{S}}\varPhi ({\boldsymbol s},{\boldsymbol s}')u({\boldsymbol x},{\boldsymbol s}')\mathrm{d}{\boldsymbol s}')\). The radiative intensity is then given by

$$\displaystyle{ (\mathrm{T} +\mathrm{ Q})u = f,\;\;u\vert _{\partial \varOmega _{-}} = g, }$$
(2.48)

where \(f:=\kappa I_{b}\), \(\partial \varOmega _{-}:=\{ ({\boldsymbol x},{\boldsymbol s}) \in \partial D \times \mathcal{S}:{\boldsymbol s} \cdot {\boldsymbol n}({\boldsymbol x}) < 0\}\), and g denote the source term, the inflow-boundary, and the inflow-boundary values, respectively. As before, \(\varGamma _{-}({\boldsymbol s}):=\{{\boldsymbol x} \in \partial D:{\boldsymbol s} \cdot {\boldsymbol n}({\boldsymbol x}) < 0\}\) (see (2.18)) stands for the physical inflow-boundary.

The partial differential equation (2.48) is known as the stationary monochromatic radiative transfer equation (RTE) with scattering, and can be viewed as a (nonlocal) extension of the parametric transport problem (2.17); the major difference to (2.17) is the scattering operator Q. Sources with support contained in D are modeled by the blackbody intensity \(I_{b} \geq 0\); radiation from sources outside of the domain or from its enclosure is prescribed by the boundary data g ≥ 0.

Deterministic numerical methods for the RTP which are commonly used in engineering comprise the discrete ordinates (\(S_{N}\)-) method and the spherical harmonics (\(P_{N}\)-) method.

In the discrete ordinates method (DOM), the angular domain is collocated by a finite number of fixed propagation directions in the angular parameter space; in this respect, the DOM resembles the greedy collocation in the parameter domain. For each of these directions, Eq. (2.48) results in a spatial PDE which is solved (possibly in parallel) by standard finite difference, finite element, or finite volume methods.

In the spherical harmonics method (SHM), a spectral expansion with spatially variable coefficients is inserted as an ansatz into the variational formulation of Eq. (2.48). By orthogonality relations, a coupled system of PDEs for the spatial coefficients is obtained (whose type can change from hyperbolic to elliptic in the so-called diffuse radiation approximation), which is again solved by finite differences or finite elements.

The common deterministic \(S_{N}\)- and \(P_{N}\)-approximations exhibit the so-called “curse of dimensionality”: the error with respect to the total numbers of degrees of freedom (DoF) \(M_{D}\) and \(M_{\mathcal{S}}\) on the physical domain D and the parameter domain \(\mathcal{S}\) scales with the dimensions d and \(d_{\mathbb{S}}\) as \(O(M_{D}^{-s/d} + M_{\mathcal{S}}^{-t/d_{\mathbb{S}}})\) with positive constants s and t.

The so-called sparse grid approximation method alleviates this curse of dimensionality for elliptic PDEs on Cartesian product domains, see [7] and the references therein. Widmer et al. [41] developed a sparse tensor method to overcome the curse of dimensionality for radiative transfer with an (isotropic) wavelet discretization of the angular domain. Under certain regularity assumptions on the absorption coefficient \(\kappa\) and the blackbody intensity \(I_{b}\), their method achieves the typical benefits of sparse tensorization: log-linear complexity in the number of degrees of freedom of a component domain with an essentially (up to a logarithmic factor) undeteriorated rate of convergence. However, scattering had not been addressed in that work.

In order to include scattering and to show that the concepts of sparse tensorization can also be applied to common solution methods, sparse tensor versions of the spherical harmonics approximation were developed extending the “direct sparse” approach by [41]. The presently developed version also accounts for scattering [27]. For this sparse spherical harmonics method, we proved that the benefits of sparse tensorization can indeed be harnessed.

As a second method, a sparse tensor product version of the DOM based on the sparse grid combination technique was realized and analyzed in [26, 28]. In this method, solutions of discretizations at varying discretization levels, for a number of collocated transport problems, with scattering discretized by a combined Galerkin plus quadrature approximation in the transport collocation directions, are combined to form a sparse tensor solution which, as proved in [26, 28], breaks the curse of dimensionality as described above. These benefits hold as long as the exact solution of the RTE is sufficiently regular. An overview follows.

2.5.1 Sparse Discrete Ordinates Method (Sparse DOM)

We adopt a formulation where the inflow boundary conditions are enforced in a weak sense. To this end, we define the boundary form (see, e.g., [26])

$$\displaystyle{ \partial b(u,v):= (v,{\boldsymbol s} \cdot {\boldsymbol n}u)_{L^{2}(\partial \varOmega _{-})} =\int _{\mathcal{S}}\int _{\varGamma _{-}({\boldsymbol s})}{\boldsymbol s} \cdot {\boldsymbol n}uv\mathrm{d}{\boldsymbol x}\mathrm{d}{\boldsymbol s}\;. }$$
(2.49)

Writing for \(v: D \times \mathcal{S}\rightarrow \mathbb{R}\) briefly \(\|v\|:=\| v\|_{L_{2}(D\times \mathcal{S})}\), the norms

$$\displaystyle{\|v\|_{-}^{2}:= -\partial b(v,v),\quad \|v\|_{ 1}^{2}:=\| v\|^{2} +\|{\boldsymbol s} \cdot \nabla _{ x}v\|^{2} +\|\mathrm{ Q_{ 1}}\!v\|^{2} +\| v\|_{ -}^{2}}$$

define the Hilbert space \(\mathcal{V}_{1}:=\{ v \in L_{2}(D \times \mathcal{S}):\| v\|_{1} < \infty \}\). The SUPG-stabilized Galerkin variational formulation reads: find \(u \in \mathcal{V}_{1}\) such that

$$\displaystyle{ (\mathrm{R}v,(\mathrm{T} +\mathrm{ Q})u)_{L^{2}(D\times \mathcal{S})} - 2\partial b(u,v) = (\mathrm{R}v,f)_{L_{2}(D\times \mathcal{S})} - 2\partial b(g,v)\quad \forall v \in \mathcal{V}_{1} }$$
(2.50)

with SUPG stabilization \(\mathrm{R}v:= v +\eta {\boldsymbol s} \cdot \nabla _{x}v\), where \(\eta \approx 2^{-L}\).

For the discretization of (2.50), we replace \(\mathcal{V}_{1}\) by \(V ^{L,N} = V _{D}^{L} \otimes V _{\mathcal{S}}^{N}\). In the physical domain, standard \(P_{1}\)-FEM with a one-scale basis on a uniform mesh of width \(h_{\max } \lesssim 2^{-L}\) is used; in the angular domain, piecewise constants on a quasiuniform mesh of width \(h_{\max } \lesssim N^{-1}\). Fully discrete problems are obtained with a one-point quadrature in the angular domain. The resulting Galerkin formulation (2.50) can be shown to result in the same linear system of equations as the standard collocation discretization [26, Sec. 5.2]. The solution is constructed with the sparse grid combination technique (see [7]):

$$\displaystyle{\hat{u}_{L,N} =\sum _{ \ell_{D}=0}^{L}\left (u_{\ell_{ D},\ell_{\mathcal{S}}^{\max }(\ell_{D})} - u_{\ell_{D},\ell_{\mathcal{S}}^{\max }(\ell_{D}+1)}\right ),}$$

where \(u_{\ell_{D},\ell_{\mathcal{S}}} \in V ^{\ell_{D},\ell_{\mathcal{S}}}\) denotes the solution to a full tensor subproblem of physical resolution level \(\ell_{D}\) and angular resolution level \(\ell_{\mathcal{S}}\). The maximum angular index \(\ell_{\mathcal{S}}^{\max }(\ell_{D}) = 2^{\lfloor \log _{2}(N+1)\rfloor (L-\ell_{D})/L}\) ensures that the angular resolution decreases when the physical resolution increases and vice versa.
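A minimal sketch of the combination formula, assuming the pairing \(\ell_{\mathcal{S}}^{\max}\) takes the form stated above and dropping the difference term at the finest physical level (an assumption of this sketch); `solve_full` is a placeholder for the full tensor subproblem solver.

```python
import math

# Sparse grid combination technique: combine full tensor solutions
# u_{lD, lS} so that angular resolution decreases as physical
# resolution increases.

def l_s_max(l_D, L, N):
    """Angular level paired with physical level l_D (assumed form)."""
    return 2 ** (math.floor(math.log2(N + 1)) * (L - l_D) / L)

def combine(solve_full, L, N):
    u_hat = 0.0
    for l_D in range(L + 1):
        u_hat += solve_full(l_D, l_s_max(l_D, L, N))
        if l_D < L:  # difference term; omitted at l_D = L in this sketch
            u_hat -= solve_full(l_D, l_s_max(l_D + 1, L, N))
    return u_hat
```

Each `solve_full` call is a standard full tensor solve; the sparsity arises purely from the choice of level pairs, so existing solvers can be reused unchanged.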

While the full tensor solution \(u_{L,N}\) requires \(O(2^{dL}N^{d_{\mathbb{S}}})\) degrees of freedom, the sparse solution involves asymptotically at most \(O((L +\log N)(2^{dL} + N^{d_{\mathbb{S}}}))\) degrees of freedom [26, Lemma 5.6]. At the same time, we have

$$\displaystyle{\|u - u_{L,N}\|_{1} \leq C2^{-L}\|u\|_{ H^{2,0}(D\times \mathcal{S})} + N^{-1}\|u\|_{ H^{1,1}(D\times \mathcal{S})},}$$

while for solutions in \(H^{2,1}(D \times \mathcal{S}) \subset (H^{2,0}(D \times \mathcal{S}) \cap H^{1,1}(D \times \mathcal{S}))\)

$$\displaystyle{\|u -\hat{ u}_{L,N}\|_{1} \leq CL\max \{2^{-L},N^{-1}\}\|u\|_{ H^{2,1}(D\times \mathcal{S})}\;,}$$

where C is an absolute constant.
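To illustrate the complexity gap, one may simply compare the two asymptotic DoF counts (constants dropped; the defaults d = 3, \(d_{\mathbb{S}} = 2\) are assumptions of this illustration).

```python
import math

# Asymptotic DoF counts (constants omitted) for the full tensor solution,
# O(2^{dL} N^{d_S}), versus the sparse combination solution,
# O((L + log N)(2^{dL} + N^{d_S})), cf. [26, Lemma 5.6].

def dofs_full(L, N, d=3, d_S=2):
    return 2 ** (d * L) * N ** d_S

def dofs_sparse(L, N, d=3, d_S=2):
    return (L + math.log2(N)) * (2 ** (d * L) + N ** d_S)
```

For instance, for L = 5 and N = 32 the full tensor count is about 33.5 million, while the sparse count is about 0.34 million, roughly a factor 100 fewer unknowns at an essentially comparable accuracy.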

2.5.2 Numerical Experiment

To evaluate the solution numerically we monitor the incident radiation \(G({\boldsymbol x}) =\int _{\mathcal{S}}u({\boldsymbol x},{\boldsymbol s})\,\mathrm{d}{\boldsymbol s}\) and its relative error \(\mathit{err}(G_{L,N})_{X} =\| G - G_{L,N}\|_{X}/\|G\|_{X}\) for \(X = L_{2}(D)\) and \(X = H^{1}(D)\).

The setting for the experiment is \(D = (0,1)^{d}\), \(\mathcal{S} = \mathcal{S}^{d_{\mathbb{S}}}\). We solve the RTP (2.48) with isotropic scattering \(\varPhi ({\boldsymbol s},{\boldsymbol s}') = 1/\vert \mathcal{S}\vert \) and with zero inflow boundary conditions g = 0. A blackbody radiation \(I_{b}({\boldsymbol x},{\boldsymbol s})\) corresponding to the exact solution

$$\displaystyle{u({\boldsymbol x},{\boldsymbol s}) = \frac{3} {16\pi }(1 + ({\boldsymbol s} \cdot {\boldsymbol s}')^{2})\prod _{ i=1}^{3}(-4x_{ i}(x_{i} - 1)),}$$

with fixed \({\boldsymbol s}' = (1/\sqrt{3},1/\sqrt{3},1/\sqrt{3})^{\top }\) is inserted in the right hand side functional in (2.50). The absorption coefficient is set to \(\kappa = 1\), the scattering coefficient to σ = 0. 5.

This (3+2)-dimensional problem was solved with a parallel C++ solver designed for the sparse tensor solution of large-scale radiative transfer problems. Figure 2.5 shows the superior efficiency of the sparse approach in terms of degrees of freedom versus achieved error. The convergence rates indicate that the curse of dimensionality is mitigated by the sparse DOM. Further gains are expected once the present, nonadaptive sparse DOM is replaced by the greedy versions outlined in Sect. 2.3.1.1.

Fig. 2.5
figure 5

Convergence in incident radiation with full and sparse DOM. Resolution for the reference solution was \(L_{\mathrm{ref}} = 4\). Reference slopes provided as visual aids only