
1 Introduction

Many engineering applications revolve around the task of identifying a configuration that in some sense best fits certain objective criteria under certain constraints. Such design or optimization problems typically involve (sometimes many) parameters that need to be chosen so as to satisfy given optimality criteria. An optimization over such a parameter domain usually requires frequent evaluations of the states under consideration, which typically means repeatedly solving a parameter-dependent family of operator equations

$$\displaystyle{ B_{y}u = f,\quad y \in \mathcal{Y}. }$$
(11.1)

In what follows the parameter set \(\mathcal{Y}\) is always assumed to be a compact subset of \(\mathbb{R}^{p}\) for some fixed \(p \in \mathbb{N}\) and \(B_y\) should be thought of as a (linear) partial differential operator whose coefficients depend on the parameters \(y \in \mathcal{Y}\). Moreover, \(B_y\) is viewed as an operator taking some Hilbert space U one-to-one and onto the normed dual V ′ of some (appropriate) Hilbert space V, where U and V are identified through a variational formulation of (11.1) as detailed later, see for instance (11.30). Recall also that the normed dual V ′ is endowed with the norm

$$\displaystyle{ \|w\|_{V '}:=\sup _{v\in V,v\neq 0}\frac{\langle w,v\rangle } {\|v\|_{V }}, }$$
(11.2)

where \(\langle \cdot,\cdot \rangle\) denotes the dual pairing between V and V ′.

Given a parametric model (11.1), the above-mentioned design or optimization problems now concern the states u(y) ∈ U which, as functions of the parameters \(y \in \mathcal{Y}\), form what we refer to as the solution manifold

$$\displaystyle{ \mathcal{M}:=\{ u(y):= B_{y}^{-1}f: y \in \mathcal{Y}\}. }$$
(11.3)

Examples of (11.1) arise, for instance, in geometry optimization when a transformation of a variable finitely parametrized domain to a reference domain introduces parameter-dependent coefficients of the underlying partial differential equation (PDE) over such domains, see, e.g., [14]. Parameters could describe conductivity, viscosity, or convection directions, see, e.g., [10, 23, 25]. As an extreme case, parametrizing the random diffusion coefficients in a stochastic PDE, e.g., by Karhunen-Loève or polynomial chaos expansions, leads to a deterministic parametric PDE involving, in principle, even infinitely many parameters, \(p = \infty\), see, e.g., [7] and the literature cited there. We will, however, not treat this particular problem class here any further since, as will be explained later, it poses conceptual obstructions different from those in the focus of this chapter, namely the absence of ellipticity, which makes conventional strategies fail. In particular, we shall explain why for other relevant problem classes, e.g., those dominated by transport processes, \(\mathcal{M}\) is not “as visible” as for elliptic problems and how to restore “full visibility.”

1.1 General Context - Reduced Basis Method

A conventional way of searching for a specific state in \(\mathcal{M}\) or of optimizing over \(\mathcal{M}\) is to compute approximate solutions of (11.1), possibly for a large number of parameters y. Such approximations would then reside in a sufficiently large trial space \(U_{\mathcal{N}}\subset U\) of dimension \(\mathcal{N}\), typically a finite element space. Ideally one would try to ensure that \(U_{\mathcal{N}}\) is large enough to warrant sufficient accuracy of whatever conclusions are to be drawn from such a discretization. A common terminology in reduced order modeling refers to \(U_{\mathcal{N}}\) as the truth space providing accurate computable information. Of course, each such parameter query in \(U_{\mathcal{N}}\) is a computationally expensive task so that many such queries, especially in an online context, would be practically infeasible. On the other hand, solving for each \(y \in \mathcal{Y}\) a problem in \(U_{\mathcal{N}}\) would just treat each solution u(y) as some “point” in the infinite-dimensional space U, viz. in the very large finite-dimensional space \(U_{\mathcal{N}}\). This disregards the fact that all these points actually belong to a possibly much thinner and more coherent set, namely the low-dimensional manifold \(\mathcal{M}\) which, for compact \(\mathcal{Y}\) and well-posed problems (11.1), is compact. Moreover, if the solutions u(y), as functions of \(y \in \mathcal{Y}\), depend smoothly on y, there is hope that one can approximate all elements of \(\mathcal{M}\) uniformly over \(\mathcal{Y}\) with respect to the Hilbert space norm \(\|\cdot \|_{U}\) by a relatively small but judiciously chosen linear space \(U_n\). Here “small” means that \(n = \dim U_{n}\) is significantly smaller than \(\mathcal{N} = \dim U_{\mathcal{N}}\), often by orders of magnitude. As detailed later, the classical notion of Kolmogorov n-widths quantifies how well a compact set in a Banach space can be approximated in the corresponding Banach norm by a linear space and therefore can be used as a benchmark for the effectiveness of a model reduction strategy.

Specifically, the core objective of the Reduced Basis Method (RBM) is to find for a given target accuracy \(\varepsilon\) a possibly small number \(n = n(\varepsilon )\) of basis functions \(\phi_j\), \(j = 0,\ldots,n\), whose linear combinations approximate each \(u \in \mathcal{M}\) within the target accuracy \(\varepsilon\). This means that ideally for each \(y \in \mathcal{Y}\) one can find coefficients \(c_j(y)\) such that the expansion

$$\displaystyle{ u_{n}(x,y):=\sum _{ j=0}^{n(\varepsilon )}c_{ j}(y)\phi _{j}(x) }$$
(11.4)

satisfies

$$\displaystyle{ \|u(y) - u_{n}(y)\|_{U} \leq \varepsilon,\quad y \in \mathcal{Y}. }$$
(11.5)

Thus, projecting (11.1) onto the small space \(U_{n}:= \mathrm{span}\,\{\phi _{0},\ldots,\phi _{n}\}\) reduces each parameter query to solving a small \(n \times n\) system of equations where typically \(n \ll \mathcal{N}\).

1.2 Goal Orientation

Recall that the actual goal of reduced modeling is often not to recover the full fields \(u(y) \in \mathcal{M}\) but only some quantity of interest I(y), typically given as a functional \(I(y):= \ell(u(y))\) of u(y) where \(\ell \in U'\). Asking just for the value of such a functional is possibly a weaker request than approximating all of u(y) in the norm \(\|\cdot \|_{U}\). In other words, one may have \(\vert \ell(u(y)) -\ell (u_{n}(y))\vert \leq \varepsilon\) without insisting on the validity of (11.5) for a tolerance of roughly the same size. Of course, one would like to exploit this in favor of online efficiency. Duality methods as used in the context of goal-oriented finite element methods [3] are indeed known to offer ways of economizing the approximate evaluation of functionals. Such concepts apply in the RBM context as well, see, e.g., [16, 21]. However, as we shall point out later, guaranteeing that \(\vert \ell(u(y)) -\ell (u_{n}(y))\vert \leq \varepsilon\) holds for \(y \in \mathcal{Y}\) ultimately reduces to tasks of the type (11.5) as well. So, in summary, understanding how to ensure (11.5) for a possibly small \(n(\varepsilon )\) remains the core issue and therefore guides the subsequent discussions.

Postponing for a moment the issue of how to actually compute the \(\phi_j\), it is clear that they should intrinsically depend on \(\mathcal{M}\), rendering the whole process highly nonlinear. To put the above approach into perspective, viewing u(x, y) as a function of the spatial variables x and of the parameters y, (11.4) is just separation of variables where the factors \(c_j(y)\), \(\phi_j(x)\) are a priori unknown. It is perhaps worth stressing though that, in contrast to other attempts to find good tensor approximations, in the RBM context explicit representations are only computed for the spatial factors \(\phi_j\) while for each y the weights \(c_j(y)\) have to be computed by solving a small system in the reduced space \(U_n\). Thus the computation of \(\{\phi _{0},\ldots,\phi _{n(\varepsilon )}\}\) could be interpreted as dictionary learning and, loosely speaking, \(n = n(\varepsilon )\) being relatively small for a given target accuracy means that all elements in \(\mathcal{M}\) are approximately sparse with respect to the dictionary \(\{\phi_0,\ldots,\phi_n,\ldots \}\).

The methodology just outlined has been pioneered by Y. Maday, A.T. Patera, and collaborators, see, e.g., [6, 21, 23, 25]. As indicated before, the RBM is one variant of a model order reduction paradigm that is specially tailored to parameter-dependent problems. Among its distinguishing constituents one can name the following. First, there is usually a careful division of the overall computational work into an offline phase, which could be computationally intense but should remain manageable, and an online phase, which should be executable with highest efficiency, taking advantage of bases and matrices precomputed during the offline phase. It is important to note that while the offline phase is accepted to be computationally expensive it should remain offline feasible in the sense that a possibly extensive search over the parameter domain \(\mathcal{Y}\) in the offline phase requires for each query solving only problems in the small reduced space. Under which circumstances this is possible and how to realize such division concepts has been worked out in the literature, see, e.g., [23, 25]. Here we are content with stressing that an important role is played by the way the operator \(B_y\) depends on the parameter y, namely in an affine way as stated in (11.18) later below. Second, and this is perhaps the most distinguishing constituent, along with each solution in the reduced model one strives to provide a certificate of accuracy, i.e., computed bounds for incurred error tolerances [23, 25].

1.3 Central Objectives

When trying to quantify the performance of such methods aside from the above-mentioned structural and data organization aspects, among others, the following questions come to mind:

  (i) For which type of problems do such methods work very well in the sense that \(n(\varepsilon )\) in (11.5) grows only slowly when \(\varepsilon\) decreases? This concerns quantifying the sparsity of solutions.

  (ii) How can one compute reduced bases \(\{\phi _{0},\ldots,\phi _{n(\varepsilon )}\}\) for which \(n(\varepsilon )\) is nearly minimal in a sense to be made precise below?

Of course, the better the sparsity quantified by (i), the better the payoff of an RBM can be. However, as one may expect, an answer to (i) depends strongly on the problem under consideration. This is also illustrated by the example presented in §11.5.4. Question (ii), instead, can be addressed independently of (i) in the sense that, no matter how many basis functions have to be computed in order to meet a given target accuracy, can one come up with methods that guarantee generating a nearly minimal number of such basis functions? This has to do with how to sample the solution manifold and is the central theme of this chapter.

The most prominent way of generating the reduced bases is a certain greedy sampling of the manifold \(\mathcal{M}\). Contriving greedy sampling strategies that give rise to reduced bases of nearly minimal length, in a sense to be made precise below, also for noncoercive or unsymmetric singularly perturbed problems is the central objective in this chapter. We remark though that a greedy parameter search in its standard form is perhaps not suitable for very high-dimensional parameter spaces without taking additional structural features of the problem into account. The subsequent discussions therefore do not specifically target the large amount of recent work on stochastic elliptic PDEs since, while greedy concepts are in principle well understood for elliptic problems, they are per se not necessarily adequate for infinitely many parameters without exploiting specific problem-dependent structural information.

First, we recall in §11.2 a greedy space growth paradigm commonly used in all established RBMs. To measure its performance in the sense of (ii) we follow [6] and compare the corresponding distances \(\mathrm{dist}_{U}(\mathcal{M},U_{n})\) to the smallest possible distances achievable by linear spaces of dimension n, called Kolmogorov n-widths. The fact that for elliptic problems the convergence rates for the greedy errors are essentially those of the n-widths, and hence rate-optimal, is shown in §11.3 to be ultimately reduced to analyzing the so-called weak greedy algorithms in Hilbert spaces, see also [4, 13]. However, for indefinite or strongly unsymmetric and singularly perturbed problems this method usually operates far from optimality. We explain why this is the case and describe in §11.4 a remedy proposed in [10]. A pivotal role is played by certain well-conditioned variational formulations of (11.1) which are then shown to lead again to an optimal outer greedy sampling strategy also for non-elliptic problems. An essential additional ingredient consists of certain stabilizing inner greedy loops, see §11.5. The obtained rate-optimal scheme is illustrated by a numerical example addressing convection-dominated convection-diffusion problems in §11.5.4. We conclude in §11.6 with applying these concepts to the efficient evaluation of quantities of interest.

2 The Greedy Paradigm

By far the most prominent strategy for constructing reduced bases for a given parameter-dependent problem (11.1) is the following greedy procedure, see, e.g., [23]. The basic idea is, having already constructed a reduced space \(U_n\) of dimension n, to find an element \(u_{n+1} = u(y_{n+1})\) in \(\mathcal{M}\) that is farthest away from the current space \(U_n\), i.e., that maximizes the best approximation error from \(U_n\), and then to grow \(U_n\) by setting \(U_{n+1}:= U_{n} +\mathop{ \mathrm{span}}\,\{u_{n+1}\}\). Hence, denoting by \(P_{U,U_{n}}\) the U-orthogonal projection onto \(U_n\),

$$\displaystyle{ y_{n+1}:=\mathop{ \mathrm{argmax}}\nolimits _{y\in \mathcal{Y}}\|u(y) - P_{U,U_{n}}u(y)\|_{U},\quad u_{n+1}:= u(y_{n+1}). }$$
(11.6)

Unfortunately, determining such an exact maximizer is computationally far too expensive even in an offline phase because one would have to compute for a sufficiently dense sampling of \(\mathcal{Y}\) the exact solution u(y) of (11.1) in U (in practice in \(U_{\mathcal{N}}\)). Instead one tries to construct more efficiently computable surrogates \(R(y,U_{n})\) satisfying

$$\displaystyle{ \|u(y) - P_{U,U_{n}}u(y)\|_{U} \leq R(y,U_{n}),\quad y \in \mathcal{Y}. }$$
(11.7)

Recall that “efficiently computable” in the sense of offline feasibility means that for each \(y \in \mathcal{Y}\), the surrogate \(R(y,U_{n})\) can be evaluated by solving only a problem of size n in the reduced space \(U_n\). Deferring an explanation of the nature of such surrogates, Algorithm 1 described below is a typical offline-feasible surrogate-based greedy algorithm (SGA). Clearly, the maximizer in (11.8) below is not necessarily unique. In case several maximizers exist it does not matter which one is selected.

Algorithm 1 Surrogate-based greedy algorithm
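The steps of Algorithm 1 are only referenced schematically in this chapter; as a minimal sketch (not the verbatim algorithm of [23]), the loop can be expressed as follows in Python, with hypothetical callables `R(y, U)` for a surrogate satisfying (11.7) and `solve_truth(y)` for a truth-space snapshot, a finite training set `Y_train` standing in for \(\mathcal{Y}\), and the Euclidean inner product standing in for \((\cdot,\cdot)_U\); the maximization corresponds to (11.8).

```python
import numpy as np

def sga(R, solve_truth, Y_train, tol, n_max):
    """Surrogate-based greedy algorithm SGA (sketch)."""
    U = []                                    # orthonormal reduced basis of U_n
    for n in range(n_max):
        # greedy step (11.8): maximize the surrogate over the training set
        errs = np.array([R(y, U) for y in Y_train])
        j = int(np.argmax(errs))
        if errs[j] <= tol:                    # certified target accuracy reached
            break
        u = solve_truth(Y_train[j])           # snapshot u(y_{n+1}) in U_N
        for b in U:                           # Gram-Schmidt against U_n
            u = u - np.vdot(b, u) * b
        U.append(u / np.linalg.norm(u))
    return U
```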

Strictly speaking, the scheme SGA is still idealized since:

  (a) computations cannot be carried out in U;

  (b) one cannot parse through all of a continuum \(\mathcal{Y}\) to maximize \(R(y,U_{n})\).

Concerning (a), as mentioned earlier, computations in U are to be understood as synonymous to computations in a sufficiently large truth space \(U_{\mathcal{N}}\) satisfying all targeted accuracy tolerances for the underlying application. Solving problems in \(U_{\mathcal{N}}\) is strictly confined to the offline phase and the number of such solves should remain of the order of \(n = \dim U_{n}\). We will not distinguish in what follows between U and \(U_{\mathcal{N}}\) unless such a distinction matters.

As for (b), the maximization is usually performed with the aid of an exhaustive search over a discrete subset of \(\mathcal{Y}\). Again, we will not distinguish between a possibly continuous parameter set and a suitable training subset. In fact, continuous optimization methods that would avoid a complete search have so far not proven to work well since each greedy step increases the number of local maxima of the objective functional. Now, how fine such a discretization for an exhaustive search should be depends on how smoothly the u(y) depend on y. But even when such a dependence is very smooth, a coarse discretization of a high-dimensional parameter set \(\mathcal{Y}\) would render an exhaustive search infeasible so that, depending on the problem at hand, one has to resort to alternative strategies such as, for instance, random sampling. However, since it seems that (b) can only be answered for a specific problem class we will not address this issue in this chapter any further.

Instead, we focus on general principles which guarantee the following. Loosely speaking, the reduced spaces based on sampling \(\mathcal{M}\) should perform optimally in the sense that the resulting spaces \(U_n\) have the (near) “smallest dimension” needed to satisfy a given target tolerance while the involved offline and online cost remains feasible in the sense indicated above. To explain first what is meant by “optimal” let us denote the greedy error produced by SGA as

$$\displaystyle{ \sigma _{n}(\mathcal{M})_{U}:=\max _{v\in \mathcal{M}}\inf _{\bar{u}\in U_{n}}\|v -\bar{ u}\|_{U} =\max _{y\in \mathcal{Y}}\|u(y) - P_{U,U_{n}}u(y)\|_{U}. }$$
(11.9)

Note that if we replace in (11.9) the space \(U_n\) by any linear subspace \(W_{n} \subset U\) and infimize the resulting distortion over all subspaces of U of dimension at most n, we obtain the classical Kolmogorov n-widths \(d_{n}(\mathcal{M})_{U}\) quantifying the “thickness” of a compact set, see (11.21). One trivially has

$$\displaystyle{ d_{n}(\mathcal{M})_{U} \leq \sigma _{n}(\mathcal{M})_{U},\quad n \in \mathbb{N}. }$$
(11.10)

Of course, it would be best if one could reverse the above inequality. We will discuss in the next section to what extent this is possible.

To prepare for such a discussion we need more information about how the surrogate \(R(y,U_{n})\) relates to the actual error \(\|u(y) - P_{U,U_{n}}u(y)\|_{U}\), because the surrogate drives the greedy search and one expects that the quality of the snapshots found in SGA depends on how “tight” the upper bound in (11.7) is.

To identify next the essential conditions on a “good” surrogate it is instructive to consider the case of elliptic problems. To this end, suppose that

$$\displaystyle{\langle B_{y}u,v\rangle = b_{y}(u,v) =\langle f,v\rangle,\quad u,v \in U,}$$

is a uniformly U-coercive bounded bilinear form and f ∈ U′, i.e., there exist constants \(0 <c_{1} \leq C_{1} <\infty\) such that

$$\displaystyle{ c_{1}\|v\|_{U}^{2} \leq b_{ y}(v,v),\quad \vert b_{y}(u,v)\vert \leq C_{1}\|u\|_{U}\|v\|_{U},\,\,u,v \in U,\,y \in \mathcal{Y}, }$$
(11.11)

holds uniformly in \(y \in \mathcal{Y}\). The operator equation (11.1) is then equivalent to: given f ∈ U′ and a \(y \in \mathcal{Y}\), find u(y) ∈ U such that

$$\displaystyle{ b_{y}(u(y),v) =\langle f,v\rangle,\quad v \in U. }$$
(11.12)

Ellipticity has two important well-known consequences. First, since (11.11) implies \(\|B_{y}\|_{U\rightarrow U'} \leq C_{1}\), \(\|B_{y}^{-1}\|_{U'\rightarrow U} \leq c_{1}^{-1}\), the operator \(B_y: U \rightarrow U'\) has a finite condition number

$$\displaystyle{ \kappa _{U,U'}(B_{y}):=\| B_{y}\|_{U\rightarrow U'}\|B_{y}^{-1}\|_{ U'\rightarrow U} \leq C_{1}/c_{1} }$$
(11.13)

which, in particular, means that residuals in U′ are uniformly comparable to errors in U

$$\displaystyle{ C_{1}^{-1}\|f - B_{ y}\bar{u}\|_{U'} \leq \| u(y) -\bar{ u}\|_{U} \leq c_{1}^{-1}\|f - B_{ y}\bar{u}\|_{U'},\quad \bar{u} \in U,\,y \in \mathcal{Y}. }$$
(11.14)

Second, by Céa’s Lemma, the Galerkin projection \(\Pi _{y,U_{n}}\) onto \(U_n\) is up to a constant as good as the best approximation, i.e., under assumption (11.11)

$$\displaystyle{ \|u(y) - \Pi _{y,U_{n}}u(y)\|_{U} \leq \frac{C_{1}} {c_{1}} \inf _{v\in U_{n}}\|u(y) - v\|_{U}. }$$
(11.15)

(When \(b_y(\cdot,\cdot)\) is in addition symmetric, \(C_1/c_1\) can be replaced by \((C_{1}/c_{1})^{1/2}\).) Hence, by (11.14) and (11.15),

$$\displaystyle{ R(y,U_{n}):= c_{1}^{-1}\sup _{ v\in U}\frac{\langle f,v\rangle - b_{y}(\Pi _{y,U_{n}}u(y),v)} {\|v\|_{U}} }$$
(11.16)

satisfies more than just (11.7), namely it provides also a uniform lower bound

$$\displaystyle{ \frac{c_{1}} {C_{1}}R(y,U_{n}) \leq \| u(y) - P_{U,U_{n}}u(y)\|_{U},\quad y \in \mathcal{Y}. }$$
(11.17)

Finally, suppose that \(b_y(\cdot,\cdot)\) depends affinely on the parameters in the sense that

$$\displaystyle{ b_{y}(u,v) =\sum _{ k=1}^{M}\theta _{ k}(y)b_{k}(u,v), }$$
(11.18)

where the \(\theta _{k}\) are smooth functions of \(y \in \mathcal{Y}\) and the bilinear forms \(b_k(\cdot,\cdot)\) are independent of y. Then, based on suitable precomputations (in \(U_{\mathcal{N}}\)) in the offline phase, the computation of \(\Pi _{y,U_{n}}u(y)\) reduces for each \(y \in \mathcal{Y}\) to the solution of a rapidly assembled \((n \times n)\) system, and \(R(y,U_{n})\) can indeed be computed very efficiently, see [16, 23, 25].
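To indicate why, under (11.18), each query indeed reduces to a rapidly assembled small system, the following sketch (hypothetical names: `Phi` a column matrix of truth coefficients of the reduced basis, `B_ks` the truth matrices of the component forms \(b_k(\cdot,\cdot)\), `thetas` the values \(\theta_k(y)\)) separates the parameter-independent offline projections from the online \(n \times n\) solve.

```python
import numpy as np

def offline(B_ks, f, Phi):
    # project each parameter-independent component onto the reduced basis
    A_ks = [Phi.T @ B_k @ Phi for B_k in B_ks]    # n x n component matrices
    f_n = Phi.T @ f                               # reduced load vector
    return A_ks, f_n

def online(thetas, A_ks, f_n):
    # assemble sum_k theta_k(y) A_k for the current query y and solve
    A = sum(t * A_k for t, A_k in zip(thetas, A_ks))
    return np.linalg.solve(A, f_n)                # coefficients c_j(y) in (11.4)
```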

An essential consequence of (11.7) and (11.17) can be formulated as follows.

Proposition 2.1.

Given \(U_{n} \subset U\), the function \(u_{n+1}\) generated by (11.8) for \(R(y,U_{n})\) defined by (11.16) has the property that

$$\displaystyle{ \|u_{n+1} - P_{U,U_{n}}u_{n+1}\|_{U} \geq \frac{c_{1}} {C_{1}}\max _{v\in \mathcal{M}}\min _{\bar{u}\in U_{n}}\|v -\bar{ u}\|_{U}. }$$
(11.19)

Hence, maximizing the residual-based surrogate \(R(y,U_{n})\) (over a suitable discretization of \(\mathcal{Y}\)) is a computationally feasible way of determining, up to a fixed factor \(\gamma:= c_{1}/C_{1} \leq 1\), the maximal distance between \(\mathcal{M}\) and \(U_n\) and performs in this sense almost as well as the “ideal” but computationally infeasible surrogate \(R^{{\ast}}(y,U_{n}):=\| u(y) - P_{U,U_{n}}u(y)\|_{U}\).

Proof of Proposition 2.1.

Suppose that \(\bar{y} =\mathop{ \mathrm{argmax}}\nolimits _{y\in \mathcal{Y}}R(y,U_{n}),y^{{\ast}}:=\ \mathop{\mathrm{argmax}}\nolimits _{y\in \mathcal{Y}}\|u(y) - P_{U,U_{n}}u(y)\|_{U}\) so that \(u_{n+1} = u(\bar{y})\). Then, keeping (11.17) and (11.15) in mind, we have

$$\displaystyle\begin{array}{rcl} \|u_{n+1} - P_{U,U_{n}}u_{n+1}\|_{U}& =& \|u(\bar{y}) - P_{U,U_{n}}u(\bar{y})\|_{U} \geq \frac{c_{1}} {C_{1}}R(\bar{y},U_{n}) \geq \frac{c_{1}} {C_{1}}R(y^{{\ast}},U_{ n}) {}\\ & \geq & \frac{c_{1}} {C_{1}}\|u(y^{{\ast}}) - P_{ U,U_{n}}u(y^{{\ast}})\|_{ U} = \frac{c_{1}} {C_{1}}\max _{y\in \mathcal{Y}}\|u(y) - P_{U,U_{n}}u(y)\|_{U},{}\\ \end{array}$$

where we have used (11.7) in the second to last step. This confirms the claim. □

Property (11.19) turns out to play a key role in the analysis of the performance of the scheme SGA.

3 Greedy Space Growth

Proposition 2.1 allows us to view the algorithm SGA as a special instance of the following scenario. Given a compact subset \(\mathcal{K}\) of a Hilbert space H with inner product (⋅ , ⋅ ) inducing the norm \(\|\cdot \|^{2} = (\cdot,\cdot )\), consider the weak greedy Algorithm 2 (WGA) below.

Algorithm 2 Weak greedy algorithm
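In a finite-dimensional trial run the scheme WGA is easy to emulate; the following sketch (an illustration, not the algorithm as stated in [4, 13]) takes a finite sample of \(\mathcal{K}\) as rows of a matrix, uses the Euclidean inner product for \((\cdot,\cdot)\), and implements the weak selection rule (11.20) with weakness parameter γ.

```python
import numpy as np

def weak_greedy(K, gamma, n_max, tol=1e-12):
    """Weak greedy algorithm (sketch): K holds samples of the compact set
    as rows, gamma in (0,1] is the weakness parameter of (11.20)."""
    N = K.shape[1]
    U = np.zeros((N, 0))                      # orthonormal basis of U_n
    for n in range(n_max):
        proj = K @ U @ U.T                    # row-wise projections P_{U_n} v
        dist = np.linalg.norm(K - proj, axis=1)
        if dist.max() <= tol:
            break
        # (11.20): any element within the factor gamma of the worst one
        j = int(np.argmax(dist >= gamma * dist.max()))
        u = K[j] - proj[j]
        U = np.column_stack([U, u / np.linalg.norm(u)])
    return U
```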

Note that again the choice of \(u_{n+1}\) is not necessarily unique and what follows holds for any choice satisfying (11.20).

Greedy strategies have been used in numerous contexts and variants. The current version is not to be confused though with the weak orthogonal greedy algorithm introduced in [26] for approximating a function by a linear combination of n terms from a given dictionary. In contrast, the scheme WGA described in Algorithm 2 aims at constructing a (problem dependent) dictionary with the aid of a PDE model. While greedy function approximation is naturally compared with the best n-term approximation from the underlying dictionary (see [2, 26] for related results), a natural question here is to compare the corresponding greedy errors

$$\displaystyle{\sigma _{n}(\mathcal{K})_{H}:=\max _{v\in \mathcal{K}}\min _{v_{n}\in U_{n}}\|v - v_{n}\| =: \mathop{\mathrm{max\,dist}}_{H}(\mathcal{K},U_{n})}$$

incurred when approximating a compact set \(\mathcal{K}\) with the smallest possible deviation of \(\mathcal{K}\) from any n-dimensional linear space, given by the Kolmogorov n-widths

$$\displaystyle{ d_{n}(\mathcal{K})_{H}:=\inf _{\mathrm{dim}V =n}\sup _{v\in \mathcal{K}}\inf _{v_{n}\in V }\|v - v_{n}\| =\inf _{\mathrm{dim}V =n}\mathop{\mathrm{max\,dist}}_{H}(\mathcal{K},V ), }$$
(11.21)

mentioned earlier in the preceding section. One trivially has \(d_{n}(\mathcal{K})_{H} \leq \sigma _{n}(\mathcal{K})_{H}\) for all \(n \in \mathbb{N}\) and the question arises whether there actually exists a constant C such that

$$\displaystyle{ \sigma _{n}(\mathcal{K})_{H} \leq Cd_{n}(\mathcal{K})_{H},\quad n \in \mathbb{N}. }$$
(11.22)

One may doubt that such a relation holds, for several reasons. First, orthogonal greedy function approximation performs in a way comparable to best n-term approximation only under rather strong assumptions on the underlying dictionary. Intuitively, one expects that errors made early on in the iteration are generally hard to correct later, although this intuition turns out to be misleading in the case of the present set approximation. Second, the spaces \(U_n\) generated by the greedy growth are restricted by being generated only from snapshots in \(\mathcal{K}\) while the best spaces can be chosen freely, see the related discussion in [4].

The comparison (11.22) was addressed first in [6] for the ideal case γ = 1. In this case a bound of the form \(\sigma _{n}(\mathcal{K})_{H} \leq Cn2^{n}d_{n}(\mathcal{K})_{H}\) could be established for some absolute constant C. This is useful only for cases where the n-widths decay faster than \(n^{-1}2^{-n}\) which indeed turns out to be possible for elliptic problems (11.12) with a sufficiently smooth affine parameter dependence (11.18). In fact, in such a case the u(y) can be even shown to be analytic as a function of y, see [7] and the literature cited there. It was then shown in [4] that the slightly better bound

$$\displaystyle{ \sigma _{n}(\mathcal{K})_{H} \leq \frac{2^{n+1}} {\sqrt{3}} d_{n}(\mathcal{K})_{H},\quad n \in \mathbb{N}, }$$
(11.23)

holds. More importantly, these bounds cannot be improved in general. Moreover, the possible exponential loss in accuracy is not due to the fact that the greedy spaces are generated by snapshots from \(\mathcal{K}\). In fact, denoting by \(\bar{d}_{n}(\mathcal{K})_{H}\) the restricted “inner” widths, obtained by allowing only subspaces spanned by snapshots of \(\mathcal{K}\) in the competition, one can prove that \(\bar{d}_{n}(\mathcal{K})_{H} \leq nd_{n}(\mathcal{K})_{H}\), \(n \in \mathbb{N}\), which is also sharp in general [4].

While these findings may be interpreted as limiting the use of reduced bases generated in a greedy fashion to problems where the n-widths decay exponentially fast the situation turns out to be far less dim if one does not insist on a direct comparison of the type (11.22) with n being the same on both sides of the inequality. In [4, 13] the question is addressed whether a certain convergence rate of the n-widths \(d_{n}(\mathcal{K})_{H}\) implies some convergence rate of the greedy errors \(\sigma _{n}(\mathcal{K})_{H}\). The following result from [4] gave a first affirmative answer.

Theorem 3.1.

Let 0 < γ ≤ 1 be the parameter in (11.20) and assume that \(d_{0}(\mathcal{K})_{H} \leq M\) for some M > 0. Then

$$\displaystyle{d_{n}(\mathcal{K})_{H} \leq Mn^{-\alpha },\;\quad n \in \mathbb{N},}$$

for some α > 0, implies

$$\displaystyle{ \sigma _{n}(\mathcal{K})_{H} \leq CMn^{-\alpha },\;\;n> 0, }$$
(11.24)

where \(C:= q^{\frac{1} {2} }(4q)^{\alpha }\) and \(q:= \lceil 2^{\alpha +1}\gamma ^{-1}\rceil ^{2}\) .

This means that the weak greedy scheme may still be highly profitable even when the n-widths do not decay exponentially. Moreover, as expected, the closer the weakness parameter γ is to one, the better, which will later guide the sampling strategies for constructing reduced bases.

Results of the above type are not confined to polynomial rates. A sub-exponential decay of the \(d_{n}(\mathcal{K})_{H}\) with a rate \(e^{-cn^{\alpha }}\), \(\alpha \leq 1\), is shown in [4] to imply a rate

$$\displaystyle{ \sigma _{n}(\mathcal{K})_{H} \leq C(\alpha,\gamma )e^{-\tilde{c}n^{\tilde{\alpha }}},\quad \tilde{\alpha } =\alpha /(1+\alpha ),\quad n \in \mathbb{N}. }$$
(11.25)

The principle behind the estimates (11.24), (11.25) is to exploit a “flatness” effect or what one may call “conditional delayed comparison.” More precisely, given any \(\theta \in (0,1)\) and defining \(q:= \lceil 2(\gamma \theta )^{-1}\rceil ^{2}\), one can show that ([4, Lemma 2.2])

$$\displaystyle{\sigma _{n+qm}(\mathcal{K})_{H} \geq \theta \sigma _{n}(\mathcal{K})_{H}\quad \Rightarrow \quad \sigma _{n}(\mathcal{K})_{H} \leq q^{1/2}d_{ m}(\mathcal{K})_{H},\quad n \in \mathbb{N}.}$$

Thus a comparison between greedy errors and n-widths is possible when the greedy errors do not decay too quickly. This is behind the diminished exponent \(\tilde{\alpha }\) in (11.25).

These results have been improved in [13] in several ways, employing different techniques that yield sharper comparisons. Abbreviating \(\sigma _{n}:=\sigma _{n}(\mathcal{K})_{H},d_{n}:= d_{n}(\mathcal{K})_{H}\), a central result in the present general Hilbert space context states that for any N ≥ 0, K ≥ 1, 1 ≤ m < K one has

$$\displaystyle{ \prod _{i=1}^{K}\sigma _{ N+i}^{2} \leq \gamma ^{-2K}\Big( \frac{K} {m}\Big)^{m}\Big( \frac{K} {K - m}\Big)^{K-m}\sigma _{ N+1}^{2m}d_{ m}^{2(K-m)}. }$$
(11.26)

As a first important consequence, one derives from these inequalities a nearly direct comparison between \(\sigma _{n}\) and d n without any constraint on the decay of \(\sigma _{n}\) or d n . In fact, taking \(N = 0,K = n\), and any 1 ≤ m < n in (11.26), using the monotonicity of the \(\sigma _{n}\), one shows that \(\sigma _{n}^{2n} \leq \gamma ^{-2n}\Big( \frac{n} {m}\Big)^{m}\Big( \frac{n} {n-m}\Big)^{n-m}d_{ m}^{2(n-m)}\) from which one deduces

$$\displaystyle{ \sigma _{n} \leq \sqrt{2}\gamma ^{-1}\min _{ 1\leq m<n}d_{m}^{\frac{n-m} {n} },\quad n \in \mathbb{N}. }$$
(11.27)

This, in particular, gives the direct unconditional comparison

$$\displaystyle{\sigma _{2n}(\mathcal{K})_{H} \leq \gamma ^{-1}\sqrt{2d_{ n}(\mathcal{K})_{H}},\quad n \in \mathbb{N}.}$$

The estimate (11.27) is then used in [13] to improve on (11.25) establishing the bounds

$$\displaystyle{ d_{n}(\mathcal{K})_{H} \leq C_{0}e^{-c_{0}n^{\alpha }}\quad \Rightarrow \quad \sigma _{ n}(\mathcal{K})_{H} \leq \sqrt{2C_{0}}\gamma ^{-1}e^{-c_{1}n^{\alpha }},\quad \,n \in \mathbb{N}, }$$
(11.28)

i.e., the exponent α is preserved by the rate for the greedy errors. Moreover, one can recover (11.24) from (11.26) (with different constants).

Although not needed in the present context, the second group of results in [13] should be mentioned, which concerns the extension of the weak greedy algorithm WGA to Banach spaces X in place of the Hilbert space H. Remarkably, a direct comparison between \(\sigma _{n}(\mathcal{K})_{X}\) and \(d_{n}(\mathcal{K})_{X}\) similar to (11.26) is also established in [13]. The counterpart to (11.27) reads \(\sigma _{2n} \leq 2\gamma ^{-1}\sqrt{nd_{n}}\), i.e., one loses a factor \(\sqrt{n}\) which is shown, however, to be necessary in general.

All the above results show that the smaller the weakness parameter γ, the stronger the degradation of the convergence rate of the greedy errors in comparison with the n-widths.

4 What are the Right Projections?

As shown by (11.24) and (11.28), the weak greedy algorithm WGA realizes optimal rates for essentially all ranges of interest. A natural question is under which circumstances a surrogate-based greedy algorithm SGA is in this sense also rate-optimal, namely ensures the validity of (11.24) and (11.28). Obviously, this is precisely the case when new snapshots generated through maximizing the surrogate have the weak greedy property (11.20). Note that Proposition 2.1 says that the residual-based surrogate (11.16) does ensure the weak greedy property in the case of coercive problems, so that SGA is indeed rate-optimal for coercive problems. Note also that the weakness parameter \(\gamma = c_{1}/C_{1}\) is in this case the larger the smaller the condition number of the operator \(B_y\) is, see (11.13). Obviously, the key is that the surrogate not only yields an upper bound for the best approximation error but also, up to a constant, a lower bound (11.17), and the more tightly the best approximation error is sandwiched by the surrogate the better the performance of SGA. Therefore, even if the problem is coercive but \(\gamma = c_{1}/C_{1}\) is very small, as is the case for convection-dominated convection-diffusion problems, in view of the dependence of the bounds in (11.24) and (11.28) on γ −1, one expects that the performance of a greedy search based on (11.16) degrades significantly.

In summary, as long as algorithm SGA employs a tight surrogate in the sense that

$$\displaystyle{ c_{S}R(y,U_{n}) \leq \inf _{v\in U_{n}}\|u(y) - v\|_{U} \leq R(y,U_{n}),\quad y \in \mathcal{Y}, }$$
(11.29)

holds for some constant \(c_S > 0\), independent of \(y \in \mathcal{Y}\), algorithm SGA is rate-optimal in the sense of (11.24), (11.28), i.e., it essentially realizes the n-width rates over all ranges of interest, see [10]. We refer to \(c_{S}^{-1}:=\kappa _{n}(R)\) as the condition of the surrogate \(R(\cdot,U_{n})\). In the RBM community the constant \(c_S^{-1}\) is essentially the stability factor which is usually computed along with an approximate reduced solution. Clearly, the bounds in §11.3 also show that the quantitative performance of SGA is expected to be the better the smaller the condition of the surrogate, i.e., the larger \(c_S\).

As shown so far, coercive problems with a small condition number \(\kappa _{U,U'}(B_{y})\) represent an ideal setting for RBMs, and the standard Galerkin projection combined with the symmetric surrogate (11.16), based on measuring the residual in the dual norm \(\|\cdot \|_{U'}\) of the “error norm” \(\|\cdot \|_{U}\), identifies rate-optimal snapshots for a greedy space growth. Of course, this marks only a small segment of relevant problems. Formally, one can still apply these projections and surrogates to any variational problem (11.12) for which a residual can be computed. However, in general, for indefinite or unsymmetric singularly perturbed problems, the tightness relation (11.29) may no longer hold for surrogates of the form (11.16) or, if it does hold, the condition \(\kappa_n(R)\) becomes prohibitively large. In the latter case, the upper bound for the best approximation error is too loose to direct the search for proper snapshots. A simple example is the convection-diffusion problem: for \(f \in (H_{0}^{1}(\Omega ))'\) find \(u \in H_{0}^{1}(\Omega )\), \(\Omega \subset \mathbb{R}^{d}\), such that

$$\displaystyle{ \varepsilon (\nabla u,\nabla v) + (\vec{b} \cdot \nabla u,v) + (cu,v) =: b_{y}(u,v) =\langle f,v\rangle,\quad v \in H_{0}^{1}(\Omega ), }$$
(11.30)

where, for instance, \(y = (\varepsilon,\vec{b}) \in \mathcal{Y}:= [\varepsilon _{0},1] \times S^{d-1}\), with \(S^{d-1}\) the (d − 1)-sphere.

Remark 4.1.

It is well known that when \(c -\frac{1} {2}\mathrm{div}\vec{b} \geq 0\) problem (11.30) has for any \(f \in H^{-1}(\Omega ):= (H_{0}^{1}(\Omega ))'\) a unique solution. Thus for \(U:= H_{0}^{1}(\Omega )\) (11.11) is still valid but with \(\kappa _{U,U'}(B_{y}) \sim \varepsilon ^{-1}\) which becomes arbitrarily large for a correspondingly small diffusion lower bound \(\varepsilon _{0}\).

The standard scheme SGA indeed no longer performs nearly as well as in the well-conditioned case. The situation is even less clear when \(\varepsilon = 0\) (with modified boundary conditions) where no “natural” variational formulation suggests itself (we refer to [10] for a detailed discussion of these examples). Moreover, for indefinite problems the Galerkin projection does generally not perform like the best approximation, which also adversely affects the tightness of the standard symmetric residual-based surrogate (11.16).

Hence, to retain rate-optimality of SGA also for the above-mentioned extended range of problems one has to find a better surrogate than the one based on the symmetric residual bound in (11.16). We indicate in the next section that such better surrogates can indeed be obtained at affordable computational cost for a wide range of problems through combining Petrov-Galerkin projections with appropriate unsymmetric residual bounds. The approach can be viewed as preconditioning the continuous problem already on the infinite-dimensional level.

4.1 Modifying the Variational Formulation

We consider now a wider class of (not necessarily coercive) variational problems

$$\displaystyle{ b(u,v) =\langle f,v\rangle,\quad v \in V, }$$
(11.31)

where we assume at this point only that for each f ∈ V ′ there exists a unique solution u ∈ U, i.e., the operator B: U → V ′, induced by b(⋅ , ⋅ ), is bijective. This is well known to be equivalent to the validity of

$$\displaystyle{ \left \{\begin{array}{l} \inf _{w\in U}\sup _{v\in V }\frac{b(w,v)} {\|w\|_{U}\|v\|_{V }} \geq \beta,\quad \sup _{w\in U}\sup _{v\in V }\frac{b(w,v)} {\|w\|_{U}\|v\|_{V }} \leq C_{b}, \\ \text{for every}\ v \in V \,\exists \,w \in U\ \text{such that}\ b(w,v)\neq 0,\end{array} \right. }$$
(11.32)

for some constants \(\beta\), \(C_b\). However, one then faces two principal obstructions regarding an RBM based on the scheme SGA:

  (a) First, as in the case of (11.30) for small diffusion, \(\kappa _{U,V'}(B) \leq C_{b}/\beta\) could be very large so that the corresponding error-residual relation

    $$\displaystyle{ \|u - v\|_{U} \leq \beta ^{-1}\|f - Bv\|_{ V '},\quad v \in U, }$$
    (11.33)

    renders a corresponding residual-based surrogate ill conditioned.

  (b) When b(⋅ , ⋅ ) is not coercive, the Galerkin projection does, in general, not perform as well as the best approximation.

The following approach has been used in [10] to address both (a) and (b). The underlying basic principle is not new, see [1], and variants of it have been used for different purposes in different contexts such as least squares finite element methods [18] and, more recently, in connection with discontinuous Petrov-Galerkin methods [9, 11, 12]. In the context of RBMs the concept of natural norms goes, in a sense, only half way by sticking in the end to Galerkin projections [25]. This marks an essential distinction from the approach in [10] discussed later below.

The idea is to change the topology of one of the spaces so as to (ideally) make the corresponding induced operator an isometry, see also [9]. Following [10] and fixing, for instance, \(\|\cdot \|_{V }\), one can define

$$\displaystyle{ \|w\|_{\hat{U}}:=\sup _{v\in V }\frac{b(w,v)} {\|v\|_{V }} =\| Bw\|_{V '},\quad w \in U, }$$
(11.34)

which means that one has for Bu = f

$$\displaystyle{ \|u - w\|_{\hat{U}} =\| f - Bw\|_{V '},\quad w \in U, }$$
(11.35)

a perfect error-residual relation. It also means that replacing \(\|\cdot \|_{U}\) in (11.32) by \(\|\cdot \|_{\hat{U}}\) yields the inf-sup constant \(\hat{\beta }= 1\). Alternatively, fixing \(\|\cdot \|_{U}\), one may set

$$\displaystyle{ \|v\|_{\hat{V }}:=\sup _{w\in U}\frac{b(w,v)} {\|w\|_{U}} =\| B^{{\ast}}v\|_{ U'},\quad v \in V, }$$
(11.36)

to again arrive at an isometry \(B: U \rightarrow \hat{ V }'\), meaning

$$\displaystyle{ \|u - w\|_{U} =\| f - Bw\|_{\hat{V }'},\quad w \in U. }$$
(11.37)

Whether the norm for U or for V is prescribed depends on the problem at hand and we refer to [8–10] for examples of both types.

Next note that for any subspace \(W \subset U\) one has

$$\displaystyle{ u_{W} =\mathop{ \mathop{\mathrm{argmin}}\nolimits }\limits _{w\in W}\|u - w\|_{\hat{U}} =\mathop{ \mathop{\mathrm{argmin}}\nolimits }\limits _{w\in W}\|f - Bw\|_{V '}, }$$
(11.38)

and analogously for the pair \((U,\hat{V })\), i.e., the best approximation in the \(\hat{U}\) norm is a minimum residual solution in the V ′ norm.
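In a discrete truth setting the renormed quantities are directly computable; a small sketch, assuming (hypothetical names) a truth matrix `B` of b(⋅ , ⋅ ), the Gramian `M_V` of the V-inner product (so that the Riesz map \(R_V\) is represented by the inverse of `M_V`), and coefficient vectors `w`, `u`, `f`:

```python
import numpy as np

def dual_norm_V(g, M_V):
    # ||g||_{V'}^2 = g^T M_V^{-1} g  (via the Riesz representation of g)
    return float(np.sqrt(g @ np.linalg.solve(M_V, g)))

def u_hat_norm(w, B, M_V):
    # (11.34):  ||w||_{U_hat} = ||B w||_{V'}
    return dual_norm_V(B @ w, M_V)

def error_and_residual(u, w, B, M_V, f):
    # (11.35): for f = B u both returned numbers coincide
    return u_hat_norm(u - w, B, M_V), dual_norm_V(f - B @ w, M_V)
```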

To use residuals in V ′ as surrogates requires fixing a suitable discrete projection for a given trial space. In general, in particular when \(V \neq U\), the Galerkin projection is no longer appropriate since inf-sup stability of the infinite-dimensional problem is no longer inherited by an arbitrary pair of finite-dimensional trial and test spaces. To see which type of projection would be ideal, denote by \(R_U: U' \rightarrow U\) the Riesz map defined for any linear functional \(\ell \in U'\) by

$$\displaystyle{\langle \ell,w\rangle = (R_{U}\ell,w)_{U},\quad w \in U.}$$

Then, by (11.34), for any \(w \in W \subset U\), taking \(v:= R_{V }Bw \in V\) one has

$$\displaystyle{b(w,v) =\langle Bw,v\rangle =\langle Bw,R_{V }Bw\rangle = (Bw,Bw)_{V '} = (w,w)_{\hat{U}}.}$$

Thus, in particular,

$$\displaystyle{b(u - u_{h},R_{V }Bw) = (u - u_{h},w)_{\hat{U}},}$$

i.e., given \(W \subset U\), using \(V_{W}:= R_{V }B(W)\) as a test space in the Petrov-Galerkin scheme

$$\displaystyle{ b(u_{h},v) =\langle f,v\rangle,\quad v \in V _{W}:= R_{V }B(W), }$$
(11.39)

is equivalent to computing the \(\hat{U}\)-orthogonal projection of the exact solution u of (11.31) and hence the best \(\hat{U}\) approximation to u. One readily sees that this also means

$$\displaystyle{ \inf _{w\in W}\sup _{v\in V _{W}}\frac{b(w,v)} {\|w\|_{\hat{U}}\|v\|_{V }} = 1, }$$
(11.40)

i.e., we have a Petrov-Galerkin scheme for the pair of spaces \(W, V_W\) with perfect stability and the Petrov-Galerkin projection is the best \(\hat{U}\)-projection. Unfortunately, this is not of much help yet, because computing the ideal test space \(V _{W} = R_{V }B(W) = B^{-{\ast}}R_{\hat{U}}^{-1}(W)\) is not numerically feasible. Nevertheless, it provides a useful orientation for finding good and practically realizable pairs of trial and test spaces, as explained next.

4.2 A Saddle Point Formulation

We briefly recall now from [9, 10] an approach to deriving from the preceding observations a practically feasible numerical scheme which, in particular, fits into the context of RBMs. Taking (11.38) as point of departure, we notice that the minimization of \(\|f - Bw\|_{V '}\) over W is a least squares problem whose normal equations read: find \(u_W \in W\) such that (with \(R_{V '} = R_{V }^{-1}\))

$$\displaystyle{ 0 = (f - Bu_{W},Bw)_{V '} =\langle R_{V }(f - Bu_{W}),Bw\rangle,\quad w \in W. }$$
(11.41)

Introducing the auxiliary variable \(r:= R_{V }(f - Bu_{W})\) which is equivalent to

$$\displaystyle{ \langle R_{V '}r,v\rangle = (r,v)_{V } =\langle f - Bu_{W},v\rangle,\quad v \in V _{W} = R_{V }B(W), }$$
(11.42)

the two relations (11.41) and (11.42) can be rewritten in form of the saddle point problem

$$\displaystyle{ \begin{array}{lcll} (r,v)_{V } + b(u_{W},v)& =&\langle f,v\rangle,&v \in V _{W}. \\ b(w,r) & =&0, &w \in W. \end{array} }$$
(11.43)

The corresponding inf-sup constant is still one (since the supremum of b(w, v) over \(V_W\) equals for each w ∈ W the supremum over all of V ) and (⋅ , ⋅ ) V is a scalar product so that (11.43) has a unique solution \(u_W\), see e.g. [5]. Taking for any w ∈ W the test function \(v = R_{V }Bw \in V_W\) in the first line of (11.43), one obtains

$$\displaystyle{(r,v)_{V } = (r,R_{V }Bw)_{V } =\langle r,Bw\rangle = b(w,r) = 0,}$$

by the second line in (11.43). We thus see that \(\langle f,R_{V }Bw\rangle = b(u_{W},R_{V }Bw)\) holds for all w ∈ W, which means that \(u_W\) solves the ideal Petrov-Galerkin problem (11.39). Thus (11.43) is equivalent to the ideal Petrov-Galerkin scheme (11.39).

Of course, (11.43) is still not realizable since the space \(V_W\) is still not computable at affordable cost. One more step to arrive at a realizable scheme is based on the following: given the finite-dimensional space W, replacing \(V_W\) in (11.43) by some (accessible) space \(Z \subset V\) amounts to a Petrov-Galerkin formulation with test space \(P_{V,Z}V_W\), where \(P_{V,Z}\) denotes the V -orthogonal projection onto Z. Thus, when Z is large enough, the (computable) projection \(P_{V,Z}V_W\) is close enough to \(V_W\) so that one obtains a stable finite-dimensional saddle point problem, which is the same as saying that its inf-sup constant is safely bounded away from zero. Since Z = V would yield perfect stability the choice of \(Z \subset V\) can be viewed as a stabilization. To quantify this we follow [10] and say that for some δ ∈ (0, 1), \(Z \subset V\) is δ-proximal for \(W \subset U\) if Z is sufficiently close to the ideal test space \(V_W = R_{V }B(W)\) in the sense that

$$\displaystyle{ \|(I - P_{V,Z})R_{V }Bw\|_{V } \leq \delta \| R_{V }Bw\|_{V },\quad w \in W. }$$
(11.44)

The related main findings from [10] can be summarized as follows.

Theorem 4.2.

  (i) The pair \((u_{W,Z},r_{W,Z}) \in W \times Z \subset U \times V\) solves the saddle point problem

    $$\displaystyle{ \begin{array}{lcll} (r_{W,Z},v)_{V } + b(u_{W,Z},v)& =&\langle f,v\rangle,&v \in Z, \\ b(w,r_{W,Z}) & =&0, &w \in W,\end{array} }$$
    (11.45)

    if and only if \(u_{W,Z}\) solves the Petrov-Galerkin problem

    $$\displaystyle{ b(u_{W,Z},v) =\langle f,v\rangle,\quad v \in P_{V,Z}(R_{V }B(W)). }$$
    (11.46)

  (ii) If Z is δ-proximal for W, (11.46) is solvable and one has

    $$\displaystyle{ \begin{array}{rcl} \|u - u_{W,Z}\|_{\hat{U}}& \leq & \frac{1} {1-\delta }\inf _{w\in W}\|u - w\|_{\hat{U}}, \\ \|u - u_{W,Z}\|_{\hat{U}} +\| r_{W,Z}\|_{V }& \leq & \frac{2} {1-\delta }\inf _{w\in W}\|u - w\|_{\hat{U}}. \end{array} }$$
    (11.47)

  (iii) Z is δ-proximal for W if and only if

    $$\displaystyle{ \inf _{w\in W}\sup _{v\in Z}\frac{b(w,v)} {\|w\|_{\hat{U}}\|v\|_{V }} \geq \sqrt{1 -\delta ^{2}}. }$$
    (11.48)

Note that (11.45) involves ordinary bilinear forms and finite-dimensional spaces W, Z, and (iii) says that the V -projection of the ideal test space \(R_V B(W)\) onto Z is a good test space if and only if Z is δ-proximal for W. Loosely speaking, Z is large enough to “see” a substantial part of the ideal test space \(R_V B(W)\) under projection. The perhaps most important messages to be taken home regarding the RBM context read as follows.

Remark 4.3.

  (i) The Petrov-Galerkin scheme (11.46) is realized through the saddle point problem (11.45) without explicitly computing the test space \(P_{V,Z}(R_{V }B(W))\) (a small computational sketch follows this remark).

  (ii) Moreover, given W, by compactness and (11.44), one can in principle enlarge Z so as to make δ as small as possible, a fact that will be exploited later.

  (iii) The solution component \(u_{W,Z}\) is a near best approximation to the exact solution u in the \(\hat{U}\) norm.

  (iv) \(r_{W,Z}\) can be viewed as a lifted residual which tends to zero in V when W grows and can be used for a posteriori error estimation, see [9]. In the Reduced Basis context this can be exploited for certifying the accuracy of the truth solutions and for constructing computationally feasible surrogates for the construction of the reduced bases.
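Since (11.45) involves only the computable pair W, Z, it can be assembled and solved directly; a minimal sketch, assuming (hypothetical names) the Gramian `M_Z` of the V-inner product on Z, the cross matrix `B_WZ` with entries \(b(w_j, v_i)\), and the moment vector `f_Z` with entries \(\langle f, v_i\rangle\):

```python
import numpy as np

def solve_saddle(M_Z, B_WZ, f_Z):
    """Solve the stabilized saddle point problem (11.45) (sketch).
    Returns the coefficients of u_{W,Z} and of the lifted residual r_{W,Z}."""
    m, n = B_WZ.shape
    A = np.block([[M_Z, B_WZ],
                  [B_WZ.T, np.zeros((n, n))]])
    rhs = np.concatenate([f_Z, np.zeros(n)])
    sol = np.linalg.solve(A, rhs)
    return sol[m:], sol[:m]          # (u_{W,Z}, r_{W,Z})
```

In line with (iv) above, the V-norm \(\|r_{W,Z}\|_{V } = \sqrt{r^{T}M_{Z}r}\) of the returned lifted residual is then available at essentially no extra cost.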

5 The Reduced Basis Construction

We point out next how to use the preceding results for sampling the solution manifold \(\mathcal{M}\) of a given parametric family of variational problems: given \(y \in \mathcal{Y}\), \(f \in V_y'\), find \(u(y) \in U_y\) such that

$$\displaystyle{ b_{y}(u(y),v) =\langle f,v\rangle,\quad v \in V _{y}, }$$
(11.49)

in a way that the corresponding subspaces are rate-optimal. We will always assume that the dependence of the bilinear form \(b_y(\cdot,\cdot)\) on \(y \in \mathcal{Y}\) is affine in the sense of (11.18).

As indicated by the notation, the spaces \(U_y\), \(V_y\), for which the variational problems are well posed in the sense that the induced operator \(B_y: U_y \rightarrow V_y'\) is bijective, could depend on y through y-dependent norms. However, to be able to speak of a “solution manifold” \(\mathcal{M}\) as a compact subset of some “reference Hilbert space,” the norms \(\|\cdot \|_{U_{y}}\) should be uniformly equivalent to some reference norm \(\|\cdot \|_{U}\), which has to be taken into account when formulating (11.49). In fact, under this condition, as shown in [10], for well-posed variational formulations of pure transport problems the dependence of the test spaces \(V_y\) on \(y \in \mathcal{Y}\) is essential, in that

$$\displaystyle{ V:=\bigcap _{y\in \mathcal{Y}}V _{y} }$$
(11.50)

is a strict subset of each individual \(V_y\). This complicates the construction of a tight surrogate. We refer to [10] for ways of dealing with this obstruction and confine the subsequent discussion for simplicity to cases where the test norms \(\|\cdot \|_{V _{y}}\) are also uniformly equivalent to a single reference norm \(\|\cdot \|_{V }\), see the example later below.

Under the above assumptions, the findings of the preceding section will be used next to contrive a well-conditioned tight surrogate even for non-coercive or severely ill-conditioned variational problems; this surrogate is then in general unsymmetric, i.e., \(V_y \neq U_y\). These surrogates will then be used in SGA. To obtain such a residual-based well-conditioned surrogate in the sense of (11.29), we first renorm the pairs of spaces \(U_y\) or \(V_y\) according to (11.34) or (11.36). In anticipation of the example below, for definiteness we concentrate on (11.34) and refer to [10] for a discussion of (11.36). As indicated above, we assume further that the norms \(\|\cdot \|_{\hat{U}_{y}},\|\cdot \|_{V _{y}}\) are equivalent to reference norms \(\|\cdot \|_{\hat{U}},\|\cdot \|_{V }\), respectively.

5.1 The Strategy

Suppose that we have already constructed a pair of spaces \(U_{n} \subset U_{y},V _{n} \subset V _{y}\), \(y \in \mathcal{Y}\), such that for a given δ < 1

$$\displaystyle{ \inf _{w\in U_{n}}\sup _{v\in V _{n}} \frac{b_{y}(w,v)} {\|w\|_{\hat{U}_{y}}\|v\|_{V _{y}}} \geq \sqrt{1 -\delta ^{2}},\quad y \in \mathcal{Y}, }$$
(11.51)

i.e., \(V _{n} \subset V\) is δ-proximal for \(U_{n} \subset U\). Thus, by Theorem 4.2, the parametric saddle point problem

$$\displaystyle{ \begin{array}{lcll} (r_{n}(y),v)_{V _{y}} + b_{y}(u_{n}(y),v)& =&\langle f,v\rangle,&v \in V _{n}, \\ b_{y}(w,r_{n}(y)) & =&0, &w \in U_{n},\end{array} }$$
(11.52)

has for each \(y \in \mathcal{Y}\) a unique solution \((u_{n}(y), r_{n}(y)) \in U_{n} \times V_{n}\). By the choice of norms we know that

$$\displaystyle{ \|u(y) - u_{n}(y)\|_{\hat{U}_{y}} =\| f - B_{y}u_{n}(y)\|_{V _{y}'},\quad y \in \mathcal{Y}, }$$
(11.53)

i.e.,

$$\displaystyle{ R(y,U_{n} \times V _{n}):=\| f - B_{y}u_{n}(y)\|_{V _{y}'},\quad y \in \mathcal{Y}, }$$
(11.54)

suggests itself as a surrogate. There are some subtle issues about how to evaluate \(R(y,U_{n} \times V_{n})\) in the dual \(V _{\mathcal{N}}'\) of a sufficiently large truth space \(V _{\mathcal{N}}\subset V _{y}\), \(y \in \mathcal{Y}\), so as to faithfully reflect errors in \(\hat{U}_{y}\), not only in the truth space \(U_{\mathcal{N}}\subset U_{y}\) but also in \(\hat{U}\), and how these quantities are actually related to the auxiliary variable \(\|r_{n}(y)\|_{V _{y}}\) which is computed anyway. As indicated before, these issues are aggravated when the norms \(\|\cdot \|_{V _{y}}\) are not all equivalent to a single reference norm. We refer to a corresponding detailed discussion in [10, §5.1] and continue working here for simplicity with the idealized version (11.54), assuming its offline feasibility.
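In this idealized setting the surrogate is available directly from the saddle point solve: with the names of the sketch following Remark 4.3, the lifted residual returned there yields

```python
# u_n, r_n = solve_saddle(M_Z, B_WZ, f_Z); the surrogate (11.54) is then the
# V_y-norm of the lifted residual (exact for Z equal to the truth test space,
# tight up to the delta-proximality constant otherwise):
R_y = float(np.sqrt(r_n @ M_Z @ r_n))
```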

Thus we can evaluate the errors \(\|u(y) - u_{n}(y)\|_{\hat{U}_{y}}\) and can determine a maximizing parameter \(y_{n+1}\) for which

$$\displaystyle{ \|u(y_{n+1}) - u_{n}(y_{n+1})\|_{\hat{U}_{y_{n+1}}} =\max _{y\in \mathcal{Y}}\|f - B_{y}u_{n}(y)\|_{V _{y}'}. }$$
(11.55)

Now relation (11.47) in Theorem 4.2 tells us that for each \(y \in \mathcal{Y}\)

$$\displaystyle{ \|u(y) - u_{n}(y)\|_{\hat{U}_{y}} \leq (1-\delta )^{-1}\inf _{ w\in U_{n}}\|u(y) - w\|_{\hat{U}_{y}}, }$$
(11.56)

i.e., \(u_n(y)\) is a near best approximation to u(y) from \(U_n\) which is, in fact, the closer to the best approximation the smaller δ. By (11.53) and (11.56), the surrogate (11.54) is indeed well conditioned with a condition number close to one for small δ.

A natural strategy is now to enlarge \(U_n\) to \(U_{n+1}:= U_{n} +\mathop{ \mathrm{span}}\,\{u(y_{n+1})\}\). In fact, this complies with the weak greedy step (11.20) in §11.3 with weakness parameter \(\gamma = (1-\delta )\) as close to one as one wishes, when δ is chosen accordingly small, provided that the pair of spaces \(U_n\), \(V_n\) satisfies (11.51). A repetition would therefore, in principle, be a realization of Algorithm 1, SGA, establishing rate-optimality of this RBM. Obviously, the critical condition for such a procedure to work is to ensure at each stage the validity of the weak greedy condition (11.20), which in the present situation means that the companion space \(V_n\) is at each stage δ-proximal for \(U_n\). So far we have not explained how to grow \(V_n\) along with \(U_n\) so as to ensure δ-proximality. This is explained in the subsequent section.

Remark 5.1.

One should note that, due to the possible parameter dependence of the norms \(\|\cdot \|_{\hat{U}_{y}},\|\cdot \|_{V _{y}}\) on y, obtaining tight surrogates with the aid of an explicit Petrov-Galerkin formulation would be infeasible in an RBM context because one would have to recompute the corresponding (parameter-dependent) test basis for each parameter query, which is not online feasible. It is therefore actually crucial to employ the saddle point formulation in the context of RBMs since this allows us to determine a space \(V_n\) of somewhat larger dimension than \(U_n\) which stabilizes the saddle point problem for all y simultaneously.

5.2 A Greedy Stabilization

A natural option is to enlarge \(V_n\) by the second component \(r_n(y_{n+1})\) of (11.52). Note though that the lifted residuals \(r_n\) tend to zero as \(n \rightarrow \infty\). Hence, the solution manifold of the (y-dependent version of the) saddle point formulation (11.43) has the form

$$\displaystyle{\mathcal{M}\times \{ 0\},}$$

where \(\mathcal{M}\) is the solution manifold of (11.49) (since r(y) = 0 for \(y \in \mathcal{Y}\)). Thus the spaces \(V_n\) are not needed to approximate the solution manifold. Instead the sole purpose of the space \(V_n\) is to guarantee stability. At any rate, the grown pair \(U_{n+1},V _{n} +\mathop{ \mathrm{span}}\,\{r_{n}(y_{n+1})\} =: V _{n+1}^{0}\) may now fail to satisfy (11.51).

Therefore, in general one has to further enrich \(V_{n+1}^0\) by additional stabilizing elements, again in a greedy fashion, until (11.51) holds for the desired δ. For problems that initially arise as natural saddle point problems, such as the Stokes system, enrichments by so-called supremizers (to be defined in a moment) have been proposed already in [14, 15, 22]. In these cases it is possible to enrich \(V_{n+1}^0\) by a fixed a priori known number of such supremizers to guarantee inf-sup stability. As shown in [10], this is generally possible when using fixed (parameter-independent) reference norms \(\|\cdot \|_{\hat{U}}\), \(\|\cdot \|_{V }\) for U and V. For the above more general scope of problems a greedy strategy was proposed and analyzed in [10], a special case of which is also considered in [15] without analysis. The strategy in [10] adds only as many stabilizing elements as are actually needed to ensure stability and works for a much wider range of problems, including singularly perturbed ones. In cases where not all parameter-dependent norms \(\|\cdot \|_{V _{y}}\) are equivalent such a strategy is actually necessary and its convergence analysis is then more involved, see [10].

To explain the procedure, suppose that after growing \(U_n\) to \(U_{n+1}\) we have already generated an enrichment \(V_{n+1}^k\) of \(V_{n+1}^0\) (which could be, for instance, either \(V _{n+1}^{0}:= V _{n} +\mathop{ \mathrm{span}}\,\{r_{n}(y_{n+1})\}\) or \(V _{n+1}^{0}:= V _{n}\)) but the pair \(U_{n+1},V _{n+1}^{k}\) still fails to satisfy (11.51) for the given δ < 1. To describe the next enrichment from \(V_{n+1}^k\) to \(V _{n+1}^{k+1}\) we first search for a parameter \(\bar{y} \in \mathcal{Y}\) and a function \(\bar{w} \in U_{n+1}\) for which the inf-sup condition (11.51) is worst, i.e.,

$$\displaystyle{ \sup _{v\in V _{n+1}^{k}} \frac{b_{\bar{y}}(\bar{w},v)} {\|v\|_{V _{\bar{y}}}\|\bar{w}\|_{\hat{U}_{\bar{y}}}} =\inf _{y\in \mathcal{Y}}\left (\inf _{w\in U_{n+1}}\sup _{v\in V _{n+1}^{k}} \frac{b_{y}(w,v)} {\|v\|_{V _{y}}\|w\|_{\hat{U}_{y}}}\right ). }$$
(11.57)

If this worst-case inf-sup constant does not yet exceed \(\sqrt{1 -\delta ^{2}}\), the current space \(V_{n+1}^k\) does not yet contain an effective supremizer for \(\bar{y},\bar{w}\). However, since the truth space satisfies the uniform inf-sup condition (11.51), there must exist a good supremizer in the truth space, which can be seen to be given by

$$\displaystyle{ \bar{v} =\mathop{ \mathrm{argmax}}\nolimits _{v\in V _{\bar{y}}} \frac{b_{\bar{y}}(\bar{w},v)} {\|v\|_{V _{\bar{y}}}\|\bar{w}\|_{\hat{U}_{\bar{y}}}}, }$$
(11.58)

providing the next enrichment

$$\displaystyle{ V _{n+1}^{k+1}:= \mathrm{span}\{V _{ n+1}^{k},\bar{v}\}. }$$
(11.59)

We defer some comments on the numerical realization of finding \(\bar{y},\bar{v}\) in (11.57), (11.58) to the next section.

This strategy can now be applied recursively until one reaches a satisfactory uniform inf-sup condition for the reduced spaces. Again, the termination of this stabilization loop is easily ensured when (11.18) holds and the norms \(\|\cdot \|_{\hat{U}_{y}}\), \(\|\cdot \|_{V _{y}}\) are uniformly equivalent to reference norms \(\|\cdot \|_{\hat{U}}\), \(\|\cdot \|_{V }\), respectively, but is more involved in the general case [10].
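
For concreteness, the inner stabilization loop just described can be summarized in a short Python sketch. This is only an illustration of (11.57)-(11.59), not the implementation from [10]: the callbacks worst_inf_sup (evaluating (11.57) and returning its minimizing pair \((\bar{y},\bar{w})\)) and truth_supremizer (the truth Galerkin solve behind (11.58)) are hypothetical placeholders whose numerical realization is discussed in the next section.

```python
import numpy as np

def stabilize_test_space(U_basis, V_basis, delta, worst_inf_sup, truth_supremizer):
    """Greedy enrichment of the reduced test space, cf. (11.57)-(11.59).

    A minimal sketch under stated assumptions: `worst_inf_sup` returns the
    worst-case inf-sup constant together with its minimizing pair (y_bar, w_bar),
    and `truth_supremizer` performs the truth Galerkin solve (11.58).
    """
    target = np.sqrt(1.0 - delta**2)
    while True:
        beta, y_bar, w_bar = worst_inf_sup(U_basis, V_basis)
        if beta >= target:                 # (11.51) holds: V_basis is delta-proximal
            return V_basis
        v_bar = truth_supremizer(y_bar, w_bar)        # enrichment element (11.58)
        V_basis = np.column_stack([V_basis, v_bar])   # space growth (11.59)
```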

5.3 The Double Greedy Scheme and Main Result

Thus, in summary, to ensure that the greedy scheme SGA with the particular surrogate (11.54), based on the corresponding outer greedy step for extending \(U_{n}\) to \(U_{n+1}\), has the weak greedy property (11.20), one can employ an inner stabilizing greedy loop producing a space \(V _{n+1} = V _{n+1}^{k^{{\ast}} }\) which is δ-proximal for \(U_{n+1}\). Here \(k^{\ast} = k^{\ast}(\delta)\) is the number of enrichment steps needed to guarantee the validity of (11.51) for the given δ. A sketchy version of the corresponding “enriched” SGA, developed in [10], is given below in Algorithm 3.

Algorithm 3 Double greedy algorithm

As indicated above, both Algorithm 1, SGA, and Algorithm 3, SGA-dou, are surrogate-based greedy algorithms. The essential difference is that, for non-coercive problems or problems with an originally large variational condition number, SGA-dou employs an additional inner greedy loop that provides a tight well-conditioned (unsymmetric) surrogate which guarantees the desired weak greedy property (with weakness constant γ as close to one as one wishes) needed for rate-optimality.
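
Since the listing of Algorithm 3 is not reproduced here, the following Python sketch merely outlines its structure as just described; the callbacks surrogate (evaluating (11.54)), truth_solve, and stabilize (the inner loop sketched above) are hypothetical placeholders, and the concrete basis representation is left to them.

```python
def sga_dou(train_params, n_max, delta, surrogate, truth_solve, stabilize):
    """Structural sketch of SGA-dou: an outer weak-greedy growth of the trial
    space driven by the surrogate (11.54), each step followed by the inner
    stabilizing greedy loop. All three callbacks are illustrative assumptions."""
    U_basis, V_basis = [], []          # snapshot basis and stabilizing test basis
    for n in range(n_max):
        # outer greedy step: maximize the tight surrogate over a training set
        y_next = max(train_params, key=lambda y: surrogate(y, U_basis, V_basis))
        U_basis = U_basis + [truth_solve(y_next)]     # snapshot u(y_{n+1})
        V_basis = stabilize(U_basis, V_basis, delta)  # inner loop: delta-proximal V
    return U_basis, V_basis
```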

Of course, the viability of Algorithm SGA-dou hinges mainly on two questions:

  (a) how to find the worst inf-sup constant in (11.57) and how to compute the supremizer in (11.58)?

  (b) does the inner greedy loop terminate (early enough)?

As for (a), it is well known that, fixing bases for \(U_{n}\), \(V_{n}^{k}\), finding the worst inf-sup constant amounts to determining for each \(y \in \mathcal{Y}\) the cross-Gramian with respect to b y (⋅ , ⋅ ) and computing its smallest singular value. Since these matrices are of size n × (n + k) and hence (presumably) of “small” size, a search over \(\mathcal{Y}\) requires solving only problems in the reduced spaces and is therefore offline feasible under assumption (11.18). The determination of the corresponding supremizer \(\bar{v}\) in (11.58), in turn, is based on the well-known observation that

$$\displaystyle{\mathop{\mathrm{argmax}}\nolimits _{v\in V _{\bar{y}}}\frac{b_{\bar{y}}(\bar{w},v)} {\|v\|_{V _{\bar{y}}}} = R_{V _{\bar{y}}}B_{\bar{y}}\bar{w},}$$

which is equivalent to solving the Galerkin problem

$$\displaystyle{(\bar{v},z)_{V _{\bar{y}}} = b_{\bar{y}}(\bar{w},z),\quad z \in V _{\bar{y}}.}$$

Thus each enrichment step requires one offline Galerkin solve in the truth space.
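
In matrix terms this amounts to a few lines of linear algebra. The following sketch is an illustration under stated assumptions, not code from [10]: B is the cross-Gramian \(B_{ij} = b_{y}(u_{i},v_{j})\) for fixed reduced bases, G_U and G_V are the Gram matrices of the parametric norms on those bases, and G_V_truth, B_truth are the corresponding truth-space matrices; all names are hypothetical.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular, svd
from scipy.sparse.linalg import spsolve

def reduced_inf_sup(B, G_U, G_V):
    """Inf-sup constant of the cross-Gramian B[i, j] = b_y(u_i, v_j) in the
    norms induced by the Gram matrices G_U (trial) and G_V (test), together
    with the coefficients of the worst trial function."""
    L_U = cholesky(G_U, lower=True)               # G_U = L_U L_U^T
    L_V = cholesky(G_V, lower=True)
    S = solve_triangular(L_U, B, lower=True)      # pass to orthonormal bases:
    S = solve_triangular(L_V, S.T, lower=True).T  # S = L_U^{-1} B L_V^{-T}
    W, sigma, _ = svd(S)
    w_bar = solve_triangular(L_U.T, W[:, -1], lower=False)
    return sigma[-1], w_bar                       # smallest singular value, worst w

def supremizer(G_V_truth, B_truth, w_bar):
    """Truth Galerkin solve (v_bar, z)_{V_y} = b_y(w_bar, z): one sparse solve,
    with B_truth[i, j] = b_y(u_i, z_j) coupling reduced trial and truth test bases."""
    return spsolve(G_V_truth, B_truth.T @ w_bar)
```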

A quantitative answer to question (b) is more involved. We content ourselves here with a few related remarks and refer to [10] for a detailed discussion of this issue. As mentioned before, when all the norms \(\|\cdot \|_{\hat{U}_{y}},\|\cdot \|_{V _{y}}\), \(y \in \mathcal{Y}\), are equivalent to reference norms \(\|\cdot \|_{\hat{U}},\|\cdot \|_{V }\), respectively, the inner loop terminates after at most as many steps as there are terms in (11.18). When the norms \(\|\cdot \|_{V _{y}}\) are no longer uniformly equivalent to a single reference norm, termination is less clear. Of course, since all computations are done in a truth space which is finite-dimensional, compactness guarantees termination after finitely many steps. However, the issue is that the number of steps should not depend on the truth space dimension. The reasoning used in [10] to show that (under mild assumptions) termination happens after a finite number of steps, independent of the truth space dimension, is based on the following fact. Defining \(U_{n}^{1}(y):=\{ w \in U_{n}:\| w\|_{\hat{U}_{y}} = 1\}\), solving the problem

$$\displaystyle{ (\bar{y},\bar{w}):=\mathop{ \mathop{\mathrm{argmax}}\nolimits }\limits _{y\in \mathcal{Y};w\in U_{n}^{1}(y)}\inf _{\phi \in V _{n}^{k}}\|R_{V _{y}}B_{y}w -\phi \|_{V _{y}}, }$$
(11.60)

can, when all the \(\|\cdot \|_{\hat{U}_{y}}\) norms are equivalent to a single reference norm, be shown to be equivalent to a greedy step of the type (11.57) and can hence again be reduced to similar small eigenvalue problems in the reduced space. Note, however, that (11.60) is similar to the greedy space growth used in the outer greedy loop, for which some understanding of convergence is available. Therefore, successive enrichments based on (11.60) are studied in [10] regarding their convergence. The connection with the inner stabilizing loop based on (11.57) is that

$$\displaystyle{\max \limits _{y\in \mathcal{Y};w\in U_{n}^{1}(y)}\inf _{\phi \in V _{n}^{k}}\|R_{V _{y}}B_{y}w -\phi \|_{V _{y}} \leq \delta }$$

just means

$$\displaystyle{\inf _{\phi \in V _{n}^{k}}\|R_{V _{y}}B_{y}w -\phi \|_{V _{y}} \leq \delta \| R_{V _{y}}B_{y}w\|_{V _{y}} =\delta \| w\|_{\hat{U}_{y}},\quad w \in U_{n},\,y \in \mathcal{Y},}$$

which is a statement on δ-proximality known to be equivalent to inf-sup stability, see Theorem 4.2 and (11.44).

A central result from [10] can be formulated as follows, see [10, Theorem 5.5].

Theorem 5.2.

If (11.18) holds and the norms \(\|\cdot \|_{\hat{U}_{y}},\|\cdot \|_{V _{y}}\) are all equivalent to a single reference norm \(\|\cdot \|_{\hat{U}},\|\cdot \|_{V }\) , respectively, and the surrogates (11.54) are used, then the scheme SGA-dou is rate-optimal, i.e., the greedy errors \(\sigma _{n}(\mathcal{M})_{\hat{U}}\) decay at the same rate as the n-widths \(d_{n}(\mathcal{M})_{\hat{U}}\), \(n \rightarrow \infty\).

Recall that the quantitative behavior of the greedy error rates is directly related to that of the n-widths via \(\gamma = c_{S}\), see Theorem 3.1. This suggests that a fast decay of \(d_{n}(\mathcal{M})_{\hat{U}}\) is reflected by the corresponding greedy errors already for moderate values of n, which is in the very interest of reduced order modeling. This will be confirmed by the examples below. In this context an important feature of SGA-dou is that through the choice of the δ-proximality parameter the weakness parameter γ can be driven toward one, of course at the expense of somewhat larger spaces \(V_{n+1}\). Hence, stability constants close to one are built into the method. This is to be contrasted with the conventional use of SGA based on surrogates that are not ensured to be well conditioned and for which the computation of the certifying stability constants tends to be computationally expensive, see, e.g., [21].

5.4 A Numerical Example

The preceding theoretical results are illustrated next by a numerical example that brings out some of the main features of the scheme. While the double greedy scheme applies to noncoercive or indefinite problems (see, e.g., [10] for pure transport), we focus here on a classical singularly perturbed problem because it also addresses some principal issues for RBMs regarding problems with small scales. Specifically, we consider the convection-diffusion problem (11.30) on \(\Omega = (0,1)^{2}\) for a simple parameter-dependent convection field

$$\displaystyle{\vec{b}(y):= \left (\begin{array}{*{10}c} \cos y\\ \sin y \end{array} \right ),\quad y \in [0,2\pi ),\quad c = 1,}$$

keeping for simplicity the diffusion level \(\varepsilon\) fixed but allowing it to be arbitrarily small. All considerations apply as well to variable and parameter-dependent diffusion with any arbitrarily small but strictly positive lower bound. The “transition” to a pure transport problem is discussed in detail in [10, 28]. Parameter-dependent convection directions actually mark the more difficult case and are, for instance, of interest with regard to kinetic models.

Let us first briefly recall the main challenges posed by (11.30) for very small diffusion \(\varepsilon\). The problem obviously becomes dominantly unsymmetric and singularly perturbed. Recall that for each positive \(\varepsilon\) the problem possesses for each \(y \in \mathcal{Y}\) a unique solution u(y) in \(U = H_{0}^{1}(\Omega )\) with zero trace on the boundary \(\partial \Omega\). However, as indicated earlier, the condition number \(\kappa _{U,U}(B_{y})\) of the underlying convection-diffusion operator \(B_{y}\), viewed as an operator from \(U = H_{0}^{1}(\Omega )\) onto \(U' = H^{-1}(\Omega )\), behaves like \(\varepsilon ^{-1}\), that is, the problem becomes increasingly ill conditioned. This has well-known consequences for the performance of numerical solvers but above all for the stability of corresponding discretizations.

We emphasize that conventional mesh-dependent stabilizations like SUPG (cf. [17]) do not offer a definitive remedy because the corresponding condition number, although improved, remains very large for very small \(\varepsilon\). In [19] SUPG stabilization for the offline truth calculations as well as for the low-dimensional online Galerkin projections is discussed for moderate Péclet numbers of the order of up to \(10^{3}\). In particular, comparisons are presented when only the offline phase uses stabilization while the unstabilized bilinear form is used in the online phase, see also the references in [19] for further related work.

As indicated earlier, we also remark in passing that the singularly perturbed nature of the problem poses an additional difficulty concerning the choice of the truth space \(U_{\mathcal{N}}\). In fact, when \(\varepsilon\) becomes very small one may not be able to afford resolving correspondingly thin layers in the truth space which increases the difficulty of capturing essential features of the solution by the reduced model.

This problem is addressed in [10] by resorting to a weak formulation that does not use \(H_{0}^{1}(\Omega )\) (or a renormed version of it) as a trial space but builds on the results from [8]. A central idea is to enforce the boundary conditions on the outflow boundary \(\Gamma _{+}(y)\) only weakly. Here \(\Gamma _{+}(y)\) is that portion of \(\partial \Omega\) for which the inner product of the outward normal and the convection direction is positive. Thus solutions are initially sought in the larger space \(H_{0,\Gamma ^{-}(y)}^{1}(\Omega ) =: U_{-}(y)\) enforcing homogeneous boundary conditions only on the inflow boundary \(\Gamma _{-}(y)\). Since the outflow boundary and hence also the inflow boundary depend on the parameter y, this requires subdividing the parameter set into smaller sectors, here four, for which the outflow boundary \(\Gamma _{+} = \Gamma _{+}(y)\) remains unchanged. We refer in what follows for simplicity to one such sector denoted again by \(\mathcal{Y}\).

The following prescription of the test space falls into the category (11.34) where the norm for U is adapted. Specifically, choosing

$$\displaystyle\begin{array}{rcl} s_{y}(u,v)&:=& \frac{1} {2}\big(\langle B_{y}u,v\rangle +\langle B_{y}v,u\rangle \big), {}\\ \|v\|_{V _{y}}^{2}&:=& s_{ y}(v,v) =\varepsilon \vert v\vert _{H^{1}(\Omega )}^{2} +\Big\|\Big (c -\frac{1} {2}\mathrm{div}\,\vec{b}(y)\Big)^{1/2}v\Big\|_{ L_{2}(\Omega )}^{2}, {}\\ \end{array}$$

in combination with a boundary penalization on \(\Gamma _{+}\), we follow [8, 28] and define

$$\displaystyle{\|u\|_{\bar{U}_{y}}^{2}:=\|\bar{ B}_{ y}u\|_{\bar{V }_{y}'}^{2} =\|\bar{ B}_{ y}u\|_{V _{y}'}^{2} + \lambda \|u\|_{ H_{b}(y)}^{2},}$$

where \(H_{b}(y) = H_{00}^{1/2}(\Gamma _{+}(y))\), \(\bar{V }_{y}:= V _{y} \times H_{b}(y)'\) and \(\bar{B}_{y}\) denotes the operator induced by this weak formulation over \(\bar{U}_{y}:= H_{0,\Gamma _{-}(y)}^{1}(\Omega ) \times H_{b}(y)\). The corresponding variational formulation is of minimum residual type (cf. (11.38)) and reads

$$\displaystyle{ u(y) =\mathop{ \mathrm{argmin}}\limits _{w\in U_{-}(y)}\big\{\|\bar{B}_{y}w - f\|_{V _{y}'}^{2} +\lambda \| w\|_{ H_{b}(y)}^{2}\big\}. }$$
(11.61)

One can show that its (infinite-dimensional) solution, whenever it is sufficiently regular, also solves the strong form of the convection-diffusion problem (11.30). Figure 11.1 illustrates the effect of this formulation where we set \(n = \mathrm{dim}\,U_{n},n_{V }:= \mathrm{dim}\,V _{n}\).

Fig. 11.1 Left: \(\varepsilon = 2^{-5},n = 6,n_{V } = 13\); middle: \(\varepsilon = 2^{-7},n = 7,n_{V } = 20\); right: \(\varepsilon = 2^{-26},n = 20,n_{V } = 57\)

The shaded planes shown in Figure 11.1 indicate the convection direction for which the snapshot is taken. For moderately large diffusion the boundary layer at \(\Gamma _{+}\) is resolved by the truth space discretization and the boundary conditions at the outflow boundary are satisfied exactly. For smaller diffusion, in the middle example, the truth space discretization can no longer resolve the boundary layer, and for very small diffusion (right) the solution is close to the one for pure transport. The rationale of (11.61) is that all norms commonly used for convection-diffusion equations resemble the one chosen here, for instance in the form of a mesh-dependent “broken norm,” which means that most of the incurred error of an approximation is concentrated in the layer region, see, e.g., [24, 27]. Hence, when the layers are not resolved by the discretization, enforcing the boundary conditions does not improve accuracy and, on the contrary, may degrade accuracy away from the layer by causing oscillations. The present formulation instead avoids any nonphysical oscillations and enhances accuracy in those parts of the domain where this is possible for the afforded discretization, see [8, 10, 28] for a detailed discussion. The following table quantifies the results for the case of small diffusion \(\varepsilon = 2^{-26}\) and a truth discretization whose a posteriori error bound is 0.002.

Columns 3 and 8 show the δ governing the condition of the saddle point problems (and hence of the corresponding Petrov-Galerkin problems) upon which the greedy space growth is based, see (11.51) and Table 11.1. Hence, the surrogates are very tight, giving rise to weakness parameters very close to one. As indicated in Remark 4.3, one can also use an a posteriori bound for the truth solution based on the corresponding lifted residual. Columns 5 and 10 therefore show the relative accuracy of the current reduced model and the truth model. This corresponds to the stability constants computed by conventional RBMs. Even for elliptic problems these latter ones are significantly larger than the ones for the present singularly perturbed problem, which are guaranteed to be close to one by the method itself. Based on the a posteriori bounds for the truth solution (which are also obtained with the aid of tailored well-conditioned variational formulations, see [8]), the greedy space growth is stopped when the surrogates reach the order of the truth accuracy.

Table 11.1 Convection-diffusion equation, \(\varepsilon = 2^{-26}\), maximal a posteriori error 0.00208994

As illustrated in Figure 11.2, in the present example this is essentially already the case for ≤ 20 trial reduced basis functions and almost three times as many test functions. To exhibit this “saturation effect” we have continued the space growth formally up to n = 24, showing no further significant improvement, which is in agreement with the resolution provided by the truth space. These relations agree with the theoretical predictions in [10]. Figure 11.2 also illustrates the rapid gain of accuracy by the first few reduced basis functions, which supports the fact that the solution manifold is “well seen” by the Petrov-Galerkin surrogates. More extensive numerical tests in [10] show that the achieved stability is independent of the diffusion, but the larger the diffusion, the smaller the dimensions \(n = \mathrm{dim}\,U_{n},n_{V } = \mathrm{dim}V _{n}\) of the reduced spaces. This reflects the expected fact that the larger the diffusion, the smoother the dependence of u(y) on the parameter y. In fact, when \(\varepsilon \rightarrow 0\) one approaches the regime of pure transport where the parameter dependence is merely Hölder continuous, requiring a larger number of reduced basis functions for a given target accuracy, see [10].

Fig. 11.2 Convection-diffusion equation, \(\varepsilon = 2^{-26}\), maximal a posteriori error 0.00208994

6 Is it Necessary to Resolve All of \(\mathcal{M}\) ?

The central focus of the preceding discussion has been to control the maximal deviation

$$\displaystyle{ \sigma _{n}(\mathcal{M})_{U} =\max _{y\in \mathcal{Y}}\|u(y) - P_{U,U_{n}}u(y)\|_{U} }$$
(11.62)

and to push this deviation below a given tolerance for n as small as possible. However, in many applications one is not interested in the whole solution field but only in a quantity of interest I(y), typically of the form \(I(y) =\ell (u(y))\) where \(\ell \in U'\) is a bounded linear functional. Looking then for some desired optimal state \(I^{\ast} =\ell (u(y^{\ast}))\) one is interested in a guarantee of the form

$$\displaystyle{ \vert \ell(u_{n}(y)) -\ell (u(y))\vert \leq \mathrm{tol},\quad y \in \mathcal{Y}, }$$
(11.63)

where the states \(u_{n}(y)\) belong to a possibly small reduced space \(U_{n}\) so that the optimization over \(y \in \mathcal{Y}\) can then be carried out in the small space \(U_{n} \subset U\). Asking only for the value of a linear functional of the solution seems much less demanding than asking for the whole solution, and one wonders whether this can be exploited in favor of even better online efficiency.

Trying to reduce computational complexity by exploiting the fact that retrieving only a linear functional of an unknown state, a scalar quantity, may require less information than recovering the whole state is the central theme of goal-oriented adaptation in finite element methods, see [3]. Often the desired accuracy is indeed observed to be reached by significantly coarser discretizations than needed to approximate the whole solution within a corresponding accuracy. The underlying effect, sometimes referred to as “squared accuracy,” is well understood and exploited in the RBM context as well, see [16, 21]. We briefly sketch the main ideas for the current larger scope of problems and point out that, nevertheless, a guarantee of the form (11.63) ultimately requires controlling the maximal deviation of a reduced space in the sense of (11.62). Hence, an optimal sampling of a solution manifold remains crucial.

First, a trivial estimate gives for \(\ell \in U'\)

$$\displaystyle{ \vert \ell(u_{n}(y)) -\ell (u(y))\vert \leq \|\ell\|_{U'}\|u_{n}(y) - u(y)\|_{U} }$$
(11.64)

so that a control of \(\sigma _{n}(\mathcal{M})_{U}\) would indeed yield a guarantee. However, the n needed to drive \(\|\ell\|_{U'}\sigma _{n}(\mathcal{M})_{U}\) below tol is usually larger than necessary.

To explain the principle of improving on (11.64) we consider again a variational problem of the form (11.31) (suppressing any parameter dependence for a moment) for a pair of spaces U, V, where we now assume that \(\kappa _{U,V}(B) \leq C_{b}/c_{b}\) is already small, possibly after renorming an initial less favorable formulation through (11.34) or (11.36). Let u ∈ U again denote the exact solution of (11.31). Given \(\ell \in U'\) we wish to approximate \(\ell(u)\), using an approximate solution \(\bar{u} \in W \subset U\) defined by

$$\displaystyle{ b(\bar{u},v) =\langle f,v\rangle,\quad v \in \tilde{ V }_{W} \subset V, }$$
(11.65)

where \(\tilde{V }_{W}\) is a suitable test space generated by the methods discussed in §11.4.1. In addition we will use the solution z ∈ V of the dual problem:

$$\displaystyle{ b(w,z) = -\ell(w),\quad w \in U, }$$
(11.66)

together with an approximation \(\bar{z} \in Z \subset V\) defined by

$$\displaystyle{ b(w,\bar{z}) = -\ell(w),\quad w \in \tilde{ W}_{Z} \subset U, }$$
(11.67)

again with a suitable test space \(\tilde{W}_{Z}\). Recall that we need not determine the test spaces \(\tilde{V }_{W},\tilde{W}_{Z}\) explicitly but rather realize the corresponding Petrov-Galerkin projections through the equivalent saddle-point formulations with suitable δ-proximal auxiliary spaces generated by a greedy stabilization.

Then, defining the primal residual functional

$$\displaystyle{ r_{\bar{u}}(v):= r(\bar{u},v):= b(u -\bar{ u},v) =\langle f,v\rangle - b(\bar{u},v) }$$
(11.68)

and adapting the ideas in [16, 21] for the symmetric case V = U to the present slightly more general setting, we claim that

$$\displaystyle{ \hat{\ell}(\bar{u}):=\ell (\bar{u}) - r(\bar{u},\bar{z}) }$$
(11.69)

is an approximation to the true value (u) satisfying

$$\displaystyle{ \vert \hat{\ell}(\bar{u}) -\ell (u)\vert \leq C\inf _{w\in W}\|u - w\|_{U}\inf _{v\in Z}\|z - v\|_{V }, }$$
(11.70)

where C depends only on the inf-sup constant of the finite-dimensional problems. In fact, since by (11.66),

$$\displaystyle{\ell(u) -\ell (\bar{u}) = b(\bar{u} - u,z) = -r(\bar{u},z),}$$

one has \(\ell(u) =\ell (\bar{u}) - r(\bar{u},z)\) and hence

$$\displaystyle\begin{array}{rcl} \vert \hat{\ell}(\bar{u}) -\ell (u)\vert & =& \vert \ell(\bar{u}) - r(\bar{u},\bar{z}) -\ell (\bar{u}) + r(\bar{u},z)\vert = \vert r(\bar{u},z -\bar{ z})\vert = \vert b(u -\bar{ u},z -\bar{ z})\vert {}\\ &\leq & C_{b}\|u -\bar{ u}\|_{U}\|z -\bar{ z}\|_{V }, {}\\ \end{array}$$

which confirms the claim since \(\bar{u}\), \(\bar{z}\) are near-best approximations due to the asserted inf-sup stability of the finite-dimensional problems.
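
In coordinates with respect to fixed trial and test bases, the corrected output (11.69) costs a single residual evaluation and two inner products. The following minimal sketch assumes (hypothetical) arrays ell_vec with entries \(\ell(u_{i})\), f_vec with entries \(\langle f,v_{j}\rangle\), and B_mat with entries \(b(u_{i},v_{j})\):

```python
import numpy as np

def corrected_output(ell_vec, f_vec, B_mat, u_bar, z_bar):
    """Primal-dual corrected output, cf. (11.68)-(11.69): returns
    ell(u_bar) - r(u_bar, z_bar), where r(u_bar, .) = <f, .> - b(u_bar, .)."""
    residual = f_vec - B_mat.T @ u_bar          # coefficients of r(u_bar, .)
    return ell_vec @ u_bar - residual @ z_bar   # hat-ell(u_bar) in (11.69)
```

By (11.70), the error of this corrected value is governed by the product of the primal and dual best-approximation errors rather than by either one alone.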

Clearly, (11.70) says that in order to approximate \(\ell(u)\) the primal approximation in U need not resolve u at all as long as the dual solution z is approximated well enough. Moreover, when \(\ell\) is a local functional, e.g., a local average approximating a point evaluation, z is close to the corresponding Green’s function with (near) singularity in the support of \(\ell\). In the elliptic case z would be very smooth away from the support of \(\ell\) and hence well approximable by a relatively small number of degrees of freedom concentrated around the support of \(\ell\). Thus it may very well be more profitable to spend less effort on approximating u than on approximating z.

Returning to parameter-dependent problems (11.49), the methods in §11.5 can now be used as follows to construct possibly small reduced spaces for a frequent online evaluation of the quantities \(I(y) =\ell (u(y))\). We assume that we already have properly renormed families of norms \(\|\cdot \|_{U_{y}},\|\cdot \|_{V _{y}}\), \(y \in \mathcal{Y}\), with uniform inf-sup constants close to one. We also assume now that both families of norms are equivalent (by compactness of \(\mathcal{Y}\) uniformly equivalent) to reference norms \(\|\cdot \|_{U},\|\cdot \|_{V }\), respectively. Hence, we can consider two solution manifolds

$$\displaystyle{\mathcal{M}_{\mathrm{pr}}:=\{ u(y) = B_{y}^{-1}f,\,y \in \mathcal{Y}\}\subset U,\quad \mathcal{M}_{\mathrm{ dual}}:=\{ z(y):= B_{y}^{-{\ast}}\ell,\,y \in \mathcal{Y}\}\subset V,}$$

and use Algorithm 3, SGA-dou, to generate (essentially in parallel) two sequences of pairs of reduced spaces

$$\displaystyle{(U_{n},V _{n}),\,(Z_{n},W_{n}),\quad n \in \mathbb{N}.}$$

Here \(V _{n} \subset V,W_{n} \subset U\) are suitable stabilizing spaces such that for m < n and for the corresponding reduced solutions \(u_{m}(y) \in U_{m},z_{n-m}(y) \in Z_{n-m}\) the quantity

$$\displaystyle{ I_{n,m}(y):=\ell (u_{m}(y)) - r(u_{m}(y),z_{n-m}(y)) }$$
(11.71)

satisfies

$$\displaystyle{ \vert I(y) - I_{n,m}(y)\vert \leq C\sigma _{m}(\mathcal{M}_{\mathrm{pr}})_{U}\sigma _{n-m}(\mathcal{M}_{\mathrm{dual}})_{V }, }$$
(11.72)

with a constant C independent of n, m. The choice of m < n determines how the computational effort is distributed between the two sequences of reduced bases and their stabilizing companion spaces. By Theorem 5.2, whichever of the n-width rates \(d_{n}(\mathcal{M}_{\mathrm{pr}})_{U}\) or \(d_{n}(\mathcal{M}_{\mathrm{dual}})_{V }\) decays faster, one can choose m < n so as to achieve the smallest error bound for a total budget \(\mathrm{dim}\,U_{m} + \mathrm{dim}\,Z_{n-m} = n\). Of course, the rates are not known, but one can use the tight surrogates to bound and estimate the respective errors very accurately. For instance, when \(d_{n}(\mathcal{M}_{\mathrm{pr}})_{U} \leq Cn^{-\alpha }\), \(d_{n}(\mathcal{M}_{\mathrm{dual}})_{V } \leq Cn^{-\beta }\), the choice \(m =\Big \lfloor \Big( \frac{\alpha }{\alpha +\beta }\Big)n\Big\rfloor\) yields an optimal distribution with the bound

$$\displaystyle{ \vert I(y) - I_{n,m}(y)\vert \leq C\Big(\frac{\alpha +\beta } {\beta } \Big)^{\beta }\Big(\frac{\alpha +\beta } {\alpha } \Big)^{\alpha }n^{-(\alpha +\beta )}. }$$
(11.73)

In particular, when β > α the dimensions of the reduced bases for the dual problem should be somewhat larger, but essentially using the same dimensions for the primal and dual reduced spaces yields the rate \(n^{-(\alpha +\beta )}\), confirming the “squaring” when α = β. In contrast, as soon as either one of the n-width rates decays exponentially, it is best to grow only the reduced spaces on the side with the faster decay while keeping a fixed space on the other side.
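
For illustration, assuming the algebraic rates α, β were known, the optimal split and the bound (11.73) could be evaluated as follows; in practice the rates are unknown and the tight surrogates serve as accurate error estimates instead.

```python
import math

def split_budget(alpha, beta, n):
    """Optimal split of a total budget n between primal and dual reduced spaces,
    assuming d_m(M_pr) ~ m**(-alpha) and d_m(M_dual) ~ m**(-beta); cf. (11.73)."""
    m = math.floor(alpha / (alpha + beta) * n)
    bound = (((alpha + beta) / beta) ** beta
             * ((alpha + beta) / alpha) ** alpha
             * n ** -(alpha + beta))
    return m, bound

# e.g., alpha = beta = 2 and n = 40 gives m = 20 and a bound decaying like
# n**(-4), reflecting the "squared accuracy" effect.
```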

7 Summary

We have reviewed recent developments concerning reduced basis methods with the following main focus. Using the Kolmogorov n-width as a benchmark for the performance of reduced basis methods, in terms of minimizing the dimensions of the reduced models for a given target accuracy, we have shown that this essentially requires constructing tight well-conditioned surrogates for the underlying variational problem. We have explained how renorming in combination with inner stabilization loops can be used to derive such residual-based surrogates even for problem classes not covered by conventional schemes. This includes, in a fully robust way, indefinite as well as ill-conditioned (singularly perturbed) coercive problems. Greedy strategies based on such surrogates are then shown to constitute an optimal sampling strategy, i.e., the resulting snapshots span reduced spaces whose distances from the solution manifold decay essentially at the same rate as the Kolmogorov n-widths. This means, in particular, that stability constants need not be determined by additional, typically expensive computations but can be driven by the stabilizing inner greedy loop as close to one as one wishes. Finally, we have explained why the focus on uniform approximation of the entire solution manifold remains equally relevant for applications where only functionals of the parameter-dependent solutions have to be approximated.