1 Introduction

During recent decades, reduced order modeling has become a well-established tool in large scale numerical computing. Reduced basis (RB) methods have turned out to be useful for reducing the computational complexity of parametrized (stochastic) or time dependent differential equations. Identification of parameters (so-called inverse problems) usually requires solutions for a large number of parameter values [39]. An appropriate reduced model can significantly shorten the solution time. Numerical methods for time dependent problems can employ sufficiently rich but small bases obtained from the solution history when the next time steps are to be solved. Within the RB methods, reduced order models are obtained during a preparatory offline phase, which allows us to efficiently compute relatively accurate solutions of the original-sized problems during the online phase. A large variety of methods providing reduced bases has been developed. The latest overview of the main issues concerning RB methods can be found in [31]. For example, in proper orthogonal decomposition methods [24] a singular value decomposition is used to choose the best subset from the whole set of possible discretized solutions; see, for example, [41] for its adaptive form. A novel method for a low-rank approximation to the solution of the stochastic Galerkin method is presented in [34]. In our paper, we deal with the so-called greedy reduced basis (GRB) algorithm, where the reduced basis is built up consecutively, element by element [6, 20]. The choice of the function to be added to the current reduced basis depends on its contribution to the approximation quality of the new basis. Typically, an element of the discretized solution manifold that achieves the largest distance from the span of the current reduced basis is chosen. In practice, a sufficiently dense but finite set of snapshots is scanned instead of the whole manifold. Still, in every step of the GRB method it is necessary to check every element of this set, which is computationally demanding. In our paper we focus on this part of the GRB algorithm. In particular, we introduce a new strategy for estimating the distance of a high-dimensional (high-fidelity) solution of the discretized parametric problem from the span of a current reduced basis. These error estimates are called a posteriori error estimates, see e.g. [18, 22], as they are only obtained after the inexact discretized solution is known. We should emphasize that here we consider estimates of the distance between an approximate solution and the exact solution of the discretized problem, contrary to many cases in the literature where the name a posteriori error estimate serves for estimating the distance between an approximate and the exact solution of the underlying infinite-dimensional problem, cf. [1, 33].

The a posteriori error estimates are in fact estimates of algebraic errors of inexact discretized solutions. Obtaining guaranteed bounds on these errors is in general a nontrivial problem; see, for example, [33]. However, for special types of data, namely if the coefficient function of the problem is in the affine form, several approaches have been developed which provide guaranteed and feasible bounds on some norms of the error. The classical and most widely used estimates are based on estimating the dual norm of the residual scaled by (a bound to) the coercivity constant; see, for example, [11, 18, 19, 38]. The complexity of these estimates does not depend on the size of the high-fidelity problem; see, e.g. [18, 19]. However, the formula appears to be unstable in the case of large data or ill-conditioned problems. Rewriting the formula leads to better stability but increases memory consumption [11, 14, 15, 19]. Using some interpolation schemes [13, 14, 15] yields efficient but only approximate (not guaranteed) estimates. In the recent paper [23], a hierarchical estimator has been introduced based on constructing a space larger than the reduced space and on a saturation assumption; thus the obtained error bounds are not guaranteed either. All estimators mentioned so far employ (a lower bound to) the coercivity constant. Obtaining it can be difficult in many applications; the successive constraint method [26, 27] can then be useful.

In this paper, the classical greedy RB algorithm is considered, but the a posteriori error estimate is new. We introduce a hierarchical estimate, the motivation for which comes from multilevel preconditioning methods, see, e.g. [3, 21]. There are several main differences between the new and the classical estimators. While the classical upper bound estimator is defined as the \(H^1\) dual norm of the residual scaled by the square root of (a lower bound to) the coercivity constant, the new estimator directly approximates the energy norm of the error, taking into account the actual values of the parameters. This can be viewed as more physical. Moreover, the conjugate gradient method minimizes the energy norm of the error (on Krylov subspaces) as well. Of course, the dual norm of the residual is equivalent to the energy norm of the error, but for a small coercivity constant and a large continuity constant these two norms can provide rather different values. In some problems, accumulation of the error in some small part of the solution domain can destroy the validity of the solution, cf. [32]. The new estimator acts locally, thus the distribution of the error over the domain is well approximated. To gain all these benefits, we pay some extra effort during the offline phase; namely, the complexity of the new estimator depends on the size of the high-fidelity problem. However, adaptively enriching the training set for the RB algorithm, cf. [25], reduces the number of problems that have to be solved and makes the multilevel estimator competitive.

The goal of this paper is twofold. First, we suggest a new kind of a posteriori error estimate for the GRB algorithm and derive two-sided guaranteed bounds for the energy norm of the true error. Second, we present the GRB algorithm with both the standard and the new a posteriori error estimates not only in the form of general analytical formulas but also using linear algebra formulations, to provide a clear connection between analytical and linear algebra objects. Our ideas are demonstrated on simple benchmark examples of parameter dependent elliptic differential equations. The outline of the paper is as follows. In the next section we describe the parameter dependent elliptic differential equation in the form we deal with. In Sect. 3, the GRB method is described. In Sect. 4, the classical a posteriori error estimate (mean based, called MB) is recalled and the new one (multilevel, called ML) is introduced. In Sect. 5, the corresponding GRB algorithms are described and their computational and storage costs are discussed. The paper is concluded with numerical experiments and a short discussion.

2 Problem setting

In parameter dependent differential equations, the parameter space is usually a probability space equipped with a probability measure. Thus, let us start by introducing the classical probability framework for uncertainty quantification. The triple \({\mathcal{T}}_\Omega = (\Omega , {{\mathcal{F}}}, P)\) denotes a probability space where \(\Omega\) is a sample space representing all possible outcomes, \({{\mathcal{F}}}\) is a \(\sigma\)-algebra in \(\Omega\) and P is a probability measure on \({{\mathcal{F}}}\). We assume that the random input to the equation can be represented by a random vector \({{\mathcal{X}}} = ({{\mathcal{X}}}_1, {{\mathcal{X}}}_2, \dots {{\mathcal{X}}}_K)\), where \({{\mathcal{X}}}_k: \Omega \rightarrow {{\mathbb{R}}}, \, k=1, \dots K\), are real-valued random variables defined in \({{\mathcal{T}}}_\Omega\).

Let us denote by \(\Gamma \subseteq {{\mathbb{R}}}^K\) the image of \({{\mathcal{X}}}\); then \(\Gamma \subseteq \Pi _{k=1}^{K} \Gamma _k\), where each \(\Gamma _k = \{{{\mathcal{X}}}_k(\omega ), \omega \in \Omega \}\) is a measure space equipped with the measure \(\rho _k(z)\,{\mathrm{d}}z\) induced by \({{\mathcal{X}}}_k\). Then, the system with a random input can be viewed as a parametric system which depends on the vector of parameters \(\varvec{\xi } = ((\varvec{\xi })_1, \dots (\varvec{\xi })_K), \ \varvec{\xi } \in \Gamma\). The values of \(\varvec{\xi }\) represent realizations of the random vector \({{\mathcal{X}}}\).

We solve the problem to find u dependent on \({\varvec{x}}\in D\) and \(\varvec{\xi }\in \Gamma\), \(u(\cdot ,\varvec{\xi })\in {W_0^{1,2}(D)}\), such that

$$\begin{aligned}&\int _D a({{\varvec{x}}},{\varvec{\xi }})\nabla u({\varvec{x}},\varvec{\xi })\nabla v({\varvec{x}})\, {\mathrm{d}}{\varvec{x}}\nonumber \\&\quad =\int _D f({\varvec{x}})v({\varvec{x}})\, {\mathrm{d}}{\varvec{x}} \quad {\text {for all }} \ v\in {W_0^{1,2}(D)},\; \varvec{\xi }\in \Gamma , \end{aligned}$$
(1)

where \(D\subset {{\mathbb{R}}}^2\) is a bounded polygonal domain, \(u({\varvec{x}},\varvec{\xi })=0\) on \(\partial D\times \Gamma\), \(f\in L^2(D)\) and \(a(\cdot ,\varvec{\xi })\in L^\infty (D)\) for \(\varvec{\xi }\in \Gamma\). The gradient \(\nabla\) is considered with respect to the physical variable \({{\varvec{x}}}\). In particular, we may search for some characteristics of the solution, like the mean \({\mathrm{E}}\,u({\varvec{x}},\cdot )\) and the variance \({\mathrm{Var}}\, u({\varvec{x}},\cdot )\) of \(u({\varvec{x}},\varvec{\xi })\) for every \({\varvec{x}}\in D\), or another quantity of interest. The coefficient \(a({\varvec{x}},\varvec{\xi })\) in (1) is considered in the affine form

$$\begin{aligned} a({\varvec{x}},\varvec{\xi })=a_0({\varvec{x}})+\sum _{k=1}^K ({\varvec{\xi }})_k a_k({\varvec{x}}), \end{aligned}$$
(2)

where \(a_k({\varvec{x}})\in L^\infty (D)\), \(\varvec{\xi }=((\varvec{\xi })_1,\dots ,(\varvec{\xi })_K)\), \(({\varvec{\xi }})_k\in \Gamma _k\). The function \(a_0({\varvec{x}})\) may represent the mean value of \(a({\varvec{x}},{\varvec{\xi }})\). We assume that there exist constants \(0<\alpha _1\le \alpha _2<\infty\), such that

$$\begin{aligned} \alpha _1\le a({\varvec{x}},\varvec{\xi })\le \alpha _2 \quad \text {for all}\; {\varvec{x}}\in D,\ \varvec{\xi }\in \Gamma . \end{aligned}$$
(3)

For the discretization of problem (1) with respect to the physical variable \({\varvec{x}}\in D\), we employ the finite element (FE) method, see e.g. [9], with piece-wise bilinear basis functions \(\psi _n({\varvec{x}})\), \(n=1,\dots ,N\), using a grid of N inner nodes on D. Let us denote the N-dimensional span of these functions by \(V_N\). This type of discretization is chosen for simplicity; the following ideas are also valid for other types of discretization. The discretized problem reads: find \({u(\cdot ,\varvec{\xi })}\in V_N\) such that

$$\begin{aligned}&\int _D a({\varvec{x}},\varvec{\xi })\nabla u({\varvec{x}},\varvec{\xi })\nabla v({\varvec{x}})\, {\mathrm{d}}{\varvec{x}}\nonumber \\&\quad =\int _D f({\varvec{x}})v({\varvec{x}})\, {\mathrm{d}}{\varvec{x}} \quad {\text {for all }} \ v\in V_N,\; \varvec{\xi }\in \Gamma . \end{aligned}$$
(4)

Due to (2), the discretized problem (4) can be expressed in the matrix-vector form

$$\begin{aligned} {{{\varvec{A}}}}(\varvec{\xi }){{{\varvec{u}}}}(\varvec{\xi }):=\left( {{{\varvec{A}}}}_0+ \sum _{k=1}^K ({\varvec{\xi }})_k{{{\varvec{A}}}}_k\right) {{{\varvec{u}}}}(\varvec{\xi })={{{\varvec{b}}}}, \end{aligned}$$
(5)

where \(({{{\varvec{A}}}}_k)_{ns}=\int _D a_k({\varvec{x}})\nabla \psi _n({\varvec{x}})\nabla \psi _s({\varvec{x}})\,{\mathrm{d}}{\varvec{x}}\), \({{{\varvec{b}}}}_n=\int _Df({\varvec{x}})\psi _n({\varvec{x}})\,{\mathrm{d}}{\varvec{x}}\), \(k=0,1,\dots ,K\), \(n,s=1,\dots ,N\). The solution \({{{\varvec{u}}}}(\varvec{\xi })\in {{\mathbb{R}}}^N\) of (5) is the coefficient vector of the solution \(u({\varvec{x}},\varvec{\xi })\in V_N\) of (4), which means that they are connected via \(u({\varvec{x}},\varvec{\xi }) = \sum _{n=1}^N {{{\varvec{u}}}}_n(\varvec{\xi }) \psi _n({\varvec{x}})\).
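For illustration, the following minimal Python sketch assembles the affine system (5) for one parameter sample and solves it. The one-dimensional piecewise-linear stiffness matrices used here are only stand-ins for the bilinear FE matrices \({\varvec{A}}_k\) of (5); all sizes, coefficient modes and names are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def stiffness_1d(n, coeff):
    """1D piecewise-linear FE stiffness matrix on a uniform grid with n inner
    nodes; coeff holds one coefficient value per element (n+1 elements, h = 1/(n+1))."""
    h = 1.0 / (n + 1)
    main = (coeff[:-1] + coeff[1:]) / h          # diagonal entries
    off = -coeff[1:-1] / h                       # off-diagonal entries
    return sp.diags([off, main, off], [-1, 0, 1], format="csc")

N, K = 200, 4
rng = np.random.default_rng(0)
a0 = np.ones(N + 1)                              # mean coefficient a_0 = 1
a_modes = [(0.2 / K) * rng.uniform(-1, 1, N + 1) for _ in range(K)]  # affine modes a_k

A0 = stiffness_1d(N, a0)
A_k = [stiffness_1d(N, ak) for ak in a_modes]
b = np.full(N, 1.0 / (N + 1))                    # load vector for f = 1

def A_of_xi(xi):
    """Affine expansion A(xi) = A_0 + sum_k xi_k A_k, cf. (5)."""
    A = A0.copy()
    for k in range(K):
        A = A + xi[k] * A_k[k]
    return A

xi = rng.uniform(-1, 1, K)                       # one parameter sample from Gamma
u = spla.spsolve(A_of_xi(xi), b)                 # high-fidelity solution u(xi)
```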

Let us emphasize that instead of the usual solution space \(W_0^{1,2}(D)\) equipped with the inner product

$$\begin{aligned} \int _D \nabla u({\varvec{x}})\nabla v({\varvec{x}})\, {\mathrm{d}}{\varvec{x}}, \end{aligned}$$

and the related norm, we consider the same set of functions with the inner product and the norm defined as

$$\begin{aligned} (u,v)_V=\int _D a_0({\varvec{x}})\nabla u({\varvec{x}})\nabla v({\varvec{x}})\, {\mathrm{d}}{\varvec{x}} ,\qquad \Vert u\Vert _V^2=(u,u)_V, \end{aligned}$$
(6)

respectively, and denote this vector space by V. We assume that \(\varvec{\xi }=(0,\dots ,0)\in \Gamma\); thus, from (2) and (3) it follows that (6) is well defined.

In this paper, multi-valued objects, such as the vector \(\varvec{\xi }\) or the matrix \({\varvec{A}}\), are denoted by bold letters. The position of an object in a sequence is marked by a subscript, e.g. \(\varvec{\xi }_k, {\varvec{A}}_k, \psi _k({\varvec{x}})\). To refer to a certain element of a multi-valued object, we use round brackets and subscripts; for example, \((\varvec{\xi }_j)_k\) is the k-th element of the vector \(\varvec{\xi }_j\). We hope that all ambiguous terms, if they occur, can be understood from the context. Thus, for example, \({\varvec{A}}(\varvec{\xi })\) is a matrix depending on \(\varvec{\xi }\), but \((\varvec{\xi })_k{\varvec{A}}\) is the matrix \({\varvec{A}}\) multiplied by the scalar \((\varvec{\xi })_k\). Superscripts carry a description of the object, e.g. \({\varvec{u}}^{\mathrm{RB}}(\varvec{\xi })\) is the solution of a reduced problem parametrized by \(\varvec{\xi }\). To emphasize the dependence on \(\varvec{\xi }\), notation such as \({\varvec{u}}^{\mathrm{RB}}(\varvec{\xi })\) is used, but sometimes “\((\varvec{\xi })\)” is omitted for the sake of simplicity.

3 Reduced basis method

In order to provide a high-fidelity approximation of the solution of (1) for \(\varvec{\xi }\in \Gamma\), the discretized problem (4)—or (5) in the matrix form—is generally required to be of a very large dimension. The solution manifold \({{\mathcal{M}}}_N = \{ {u}({\varvec{x}},\varvec{\xi }) \in V_N: \varvec{\xi } \in \Gamma \}\) of problem (4), however, can often be approximated within a given error bound \(\tau >0\) by some reduced space \(V_M \subset V_N\) with the dimension M much smaller than N. Any basis \(\mathcal{U}_M=\{u_1({\varvec{x}}), \dots u_M({\varvec{x}})\}\) of \(V_M\) is called the reduced basis. When building such a basis during the offline phase, we aim to find the smallest M and a set of M functions \({\mathcal{U}}_M\) such that for any \(\varvec{\xi }\in \Gamma\) the distance of the solution \(u({\varvec{x}},\varvec{\xi })\) of (4) from the span of \({\mathcal{U}}_M\) is less than \(\tau\). Let us emphasize that at least two meaningful metrics can be used: the first is induced by the norm of V, i.e. \(\Vert v\Vert _V^2=\int _D a_0({\varvec{x}})\vert \nabla v({\varvec{x}})\vert ^2\, {\mathrm{d}}{\varvec{x}}\), and the second is the energy norm given by the individual problems (4): \(\Vert v\Vert _{\varvec{\xi }}^2= \int _Da({\varvec{x}},\varvec{\xi })\vert \nabla v({\varvec{x}})\vert ^2\, {\mathrm{d}}{\varvec{x}}\). See Sect. 3.1 for details.

In the matrix-vector form, the reduced basis can be represented by the matrix \({{{\varvec{U}}}}_M \in {{\mathbb{R}}}^{N\times M}\) with columns \(\{ {{\varvec{u}}}_1, \dots {{\varvec{u}}}_M \}\) formed by the coefficient vectors with respect to the FE basis functions \(\psi _1({\varvec{x}}),\dots , \psi _N({\varvec{x}})\) of the basis \({\mathcal{U}}_M=\{u_1({\varvec{x}}), \dots u_M({\varvec{x}})\}\). The name reduced basis will be used for \({\mathcal{U}}_M\) and for the coefficient vectors \({\varvec{U}}_M\) as well. In the online phase, for each \(\varvec{\xi } \in \Gamma\), the high-fidelity solution \({\varvec{u}}(\varvec{\xi })\) of (5) can now be approximated by \({\varvec{U}}_M{\varvec{u}}^{\mathrm{RB}}_M(\varvec{\xi })\) where \({\varvec{u}}^{\mathrm{RB}}_M(\varvec{\xi })\) is the RB solution of the RB problem

$$\begin{aligned} {{{\varvec{A}}}}^{\mathrm{RB}}(\varvec{\xi }){{{\varvec{u}}}}^{\mathrm{RB}}(\varvec{\xi }):=\left( {{{\varvec{A}}}}_0^{\mathrm{RB}}+ \sum _{k=1}^K ({\varvec{\xi }})_k{{{\varvec{A}}}}_k^{\mathrm{RB}}\right) {{{\varvec{u}}}}^{\mathrm{RB}}(\varvec{\xi })={{{\varvec{b}}}}^{\mathrm{RB}}, \end{aligned}$$
(7)

where \({{{\varvec{A}}}}^{\mathrm{RB}}_k\in {\mathbb{R}}^{M\times M}\), \({\varvec{u}}^{\mathrm{RB}}(\varvec{\xi }),{{{\varvec{b}}}}^{\mathrm{RB}}\in {\mathbb{R}}^{M}\), \({{{\varvec{A}}}}^{\mathrm{RB}}_k={{{\varvec{U}}}}_M^T{{{\varvec{A}}}}_k{{{\varvec{U}}}}_M,\ k=0,1,\dots ,K\), and \({{{\varvec{b}}}}^{\mathrm{RB}}={{{\varvec{U}}}}_M^T{{{\varvec{b}}}}\). System (7) represents problem (5) restricted to the range of \({{{\varvec{U}}}}_M\). Note that the range of \({{{\varvec{U}}}}_N\) is equal to \({{\mathbb{R}}}^N\), the solution space of (5).
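The reduced operators in (7) are plain Galerkin projections, so their assembly can be sketched in a few lines. The following Python fragment assumes the matrices \({\varvec{A}}_k\) and the vector \({\varvec{b}}\) from a setup such as the previous sketch, and a basis matrix with orthonormal columns; all names are illustrative, not the paper's implementation.

```python
import numpy as np

def project_reduced(U_M, A_k_list, b):
    """Galerkin projection of the affine operator onto range(U_M), cf. (7):
    A_k^RB = U_M^T A_k U_M and b^RB = U_M^T b are parameter independent."""
    A_rb = [U_M.T @ (A_k @ U_M) for A_k in A_k_list]   # (K+1) dense M x M blocks
    b_rb = U_M.T @ b
    return A_rb, b_rb

def solve_reduced(A_rb, b_rb, xi):
    """Assemble and solve the M x M reduced system (7) for one parameter value."""
    A = A_rb[0] + sum(x * Ak for x, Ak in zip(xi, A_rb[1:]))
    return np.linalg.solve(A, b_rb)

# usage sketch (U_M: hypothetical N x M basis matrix; A0, A_k, b, xi as before)
# A_all = [A0] + A_k
# A_rb, b_rb = project_reduced(U_M, A_all, b)
# u_rb = solve_reduced(A_rb, b_rb, xi)            # RB coefficients u^RB(xi)
# u_approx = U_M @ u_rb                           # prolonged approximation of u(xi)
```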

Evaluating the exact errors \({\varvec{u}}(\varvec{\xi })-{\varvec{U}}_M{\varvec{u}}^{\mathrm{RB}}_M(\varvec{\xi })\) within the RB algorithm would unacceptably slow down the computation, and so only some estimates of the errors (called a posteriori error estimates) are employed. In the next three subsections we provide a definition of the error of the RB solution and its measurement, a detailed description of the greedy algorithm and we briefly comment on the choice of the set of snapshots \({\varvec{u}}(\varvec{\xi })\) associated to the choice of the training set of parameters \(\varvec{\xi }\in \Gamma\).

3.1 Error of RB solution

Let us use the following notation. For all \(u,v\in V\) let

$$\begin{aligned} &\left( u,v\right):= {} \int _D u({\varvec{x}}) v({\varvec{x}}) \, {\mathrm{d}}{\varvec{x}}\\ &\left( u,v\right) _V:= {} \int _D a_0({\varvec{x}})\nabla u({\varvec{x}}) \nabla v({\varvec{x}}) \, {\mathrm{d}}{\varvec{x}}\\ &\Vert v\Vert _{V}^2:= {} \int _D a_0({\varvec{x}})\nabla v({\varvec{x}}) \nabla v({\varvec{x}}) \, {\mathrm{d}}{\varvec{x}}\\ &\left( u,v\right) _{\varvec{\xi }}:= {} \int _D a({\varvec{x}},\varvec{\xi })\nabla u({\varvec{x}}) \nabla v({\varvec{x}}) \, {\mathrm{d}}{\varvec{x}}\\ &\Vert v\Vert _{\varvec{\xi }}^2:= {} \int _D a({\varvec{x}},\varvec{\xi })\nabla v({\varvec{x}}) \nabla v({\varvec{x}}) \, {\mathrm{d}}{\varvec{x}}. \end{aligned}$$

Note that in our setting

$$\begin{aligned} \left( u,v\right) _V=\left( u,v\right) _{\varvec{0}}= \int _D a_0({\varvec{x}})\nabla u({\varvec{x}}) \nabla v({\varvec{x}}) \, {\mathrm{d}}{\varvec{x}},\quad u,v\in V. \end{aligned}$$

Using the exact integration we have for the coefficient vectors \({\varvec{u}},{\varvec{v}}\in {\mathbb{R}}^N\) of functions \(u,v\in V_N\)

$$\begin{aligned} \left( u,v\right) ={\varvec{v}}^T{\varvec{M}}{\varvec{u}},\quad \left( u,v\right) _V={\varvec{v}}^T{\varvec{A}}_0{\varvec{u}},\quad \left( u,v\right) _{\varvec{\xi }}={\varvec{v}}^T{\varvec{A}}(\varvec{\xi }){\varvec{u}} \end{aligned}$$

and corresponding dual norms can be expressed as

$$\begin{aligned} \left( u,v\right) _{V'}={\varvec{v}}^T{\varvec{A}}_0^{-1}{\varvec{u}},\quad \left( u,v\right) _{\varvec{\xi }'}={\varvec{v}}^T({\varvec{A}}(\varvec{\xi }))^{-1}{\varvec{u}}, \end{aligned}$$

where \({\varvec{M}}\) is the mass matrix, \(({\varvec{M}})_{jk}=\int _D \psi _j({\varvec{x}})\psi _k({\varvec{x}})\,{\mathrm{d}}{\varvec{x}}\), \(j,k=1,\dots ,N\), and \({\varvec{A}}_0\) and \({\varvec{A}}(\varvec{\xi })\) are introduced in (5). Note that \({\varvec{b}}={\varvec{Mf}}\), where \({\varvec{f}}\) is the coefficient vector of a projection of the function \(f\in L^2(D)\) to \(V_N\): \(({\varvec{f}})_{j}=\int _D \psi _j({\varvec{x}})f({\varvec{x}})\,{\mathrm{d}}{\varvec{x}}\), \(j= 1,\dots ,N\).

Let \(u_N({\varvec{x}},\varvec{\xi })\in V_N\) be the exact solution of (4) and let \(u_M({\varvec{x}},\varvec{\xi })\) be the RB solution in the reduced space \(V_M \subset V_N\) for a given \(\varvec{\xi }\in \Gamma\). Let \(e_M({\varvec{x}},\varvec{\xi })=u_N({\varvec{x}},\varvec{\xi })-u_M({\varvec{x}},\varvec{\xi })\in V_N\) denote the error function of this RB solution. We measure the error in the energy norm generated by the underlying particular problem (4) and denote

$$\begin{aligned} E_M(\varvec{\xi })=\Vert {e}_M({\varvec{x}},\varvec{\xi })\Vert _{\varvec{\xi }}^2= \Vert {u}_N({\varvec{x}},\varvec{\xi })-{u}_M({\varvec{x}},\varvec{\xi })\Vert _{\varvec{\xi }}^2 \ . \end{aligned}$$
(8)

In the matrix-vector form, let \({\varvec{u}}_N(\varvec{\xi })\in {\mathbb{R}}^N\) be the solution of (5) and let \({\varvec{u}}_M(\varvec{\xi })={{{\varvec{U}}}}_M{{{\varvec{u}}}}^{\mathrm{RB}}(\varvec{\xi })\) be the prolonged RB solution \({{{\varvec{u}}}}^{\mathrm{RB}}(\varvec{\xi })\in {\mathbb{R}}^M\) of (7). The coefficient vector of the error function \(e_M({\varvec{x}},\varvec{\xi })\) is \({\varvec{e}}_M(\varvec{\xi })={\varvec{u}}_N(\varvec{\xi })- {\varvec{u}}_M(\varvec{\xi })\). Denoting \({\varvec{r}}_M(\varvec{\xi })\in {\mathbb{R}}^N\) the residual vector of (5),

$$\begin{aligned} {\varvec{r}}_M(\varvec{\xi })={\varvec{b}}-{\varvec{A}}(\varvec{\xi }){\varvec{u}}_M(\varvec{\xi }), \end{aligned}$$
(9)

we get

$$\begin{aligned} {\varvec{A}}(\varvec{\xi })\, {\varvec{e}}_M(\varvec{\xi }) ={\varvec{A}}(\varvec{\xi })\,({\varvec{u}}_N(\varvec{\xi })- {\varvec{u}}_M(\varvec{\xi })) = {\varvec{b}}-{\varvec{A}}(\varvec{\xi }){\varvec{u}}_M(\varvec{\xi })={\varvec{r}}_M(\varvec{\xi }), \end{aligned}$$

so the squared energy norm (8) of the error can be expressed as

$$\begin{aligned} E_M(\varvec{\xi })= &\, {} {\varvec{e}}_M(\varvec{\xi })^T {\varvec{A}}(\varvec{\xi })\, {\varvec{e}}_M(\varvec{\xi })\nonumber \\= &\, {} ({\varvec{A}}(\varvec{\xi })\,{\varvec{e}}_M(\varvec{\xi }))^T ({\varvec{A}}(\varvec{\xi }))^{-1} {\varvec{A}}(\varvec{\xi })\, {\varvec{e}}_M(\varvec{\xi }) \nonumber \\= &\, {} {{{\varvec{r}}}}_M({{\varvec{\xi }}})^T \, ({{{\varvec{A}}}}({{\varvec{\xi }}}))^{-1} {{{\varvec{r}}}}_M({{\varvec{\xi }}}) \, = \Vert {{{\varvec{r}}}}_M({{\varvec{\xi }}}) \Vert ^2_{{{\varvec{\xi }}}'} \ . \end{aligned}$$
(10)
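The identity (10) between the squared energy norm of the error and the squared dual norm of the residual is easy to verify numerically. The following small Python check uses a random SPD matrix as a stand-in for \({\varvec{A}}(\varvec{\xi })\) and a random orthonormal basis as \({\varvec{U}}_M\); it is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 50, 5
B = rng.standard_normal((N, N))
A = B @ B.T + N * np.eye(N)                       # SPD stand-in for A(xi)
b = rng.standard_normal(N)

u_N = np.linalg.solve(A, b)                       # exact discretized solution
U_M, _ = np.linalg.qr(rng.standard_normal((N, M)))
u_M = U_M @ np.linalg.solve(U_M.T @ A @ U_M, U_M.T @ b)   # Galerkin RB approximation

e = u_N - u_M
r = b - A @ u_M
E_energy = e @ A @ e                              # squared energy norm of the error, (8)
E_dual = r @ np.linalg.solve(A, r)                # squared dual norm of the residual, (10)
assert np.isclose(E_energy, E_dual)
```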

3.2 Greedy algorithm

In the offline phase of building up the reduced basis, we employ the greedy algorithm [17, 18], which is based on two presumptions. First, that a good choice of the RB space is a space generated by some carefully chosen elements of the manifold \({\mathcal{M}}_N\), represented by M independent solutions \(\{ {{\varvec{u}}}_1, \dots {{\varvec{u}}}_M \}\) of (5) corresponding to a set of samples \(T_M = \{ {\varvec{\xi }}_1, \dots {\varvec{\xi }}_M \} \subset \Gamma\). Second, that we are able to characterize the whole manifold \({{\mathcal {M}}}_N\) sufficiently by a finite training set \(T_J\subset \Gamma\), \(J \gg M\), which has to be chosen in advance.

The solutions \({\varvec{u}}(\varvec{\xi }_m)\) of (5) for all \(\varvec{\xi }_m \in T_M\), \(m=1, \dots , M\), form the columns of the matrix \({{{\varvec{U}}}}_M\in {{\mathbb{R}}}^{N\times M}\). The set \(T_M \subset T_J\) is constructed by picking \(\varvec{\xi }\) from \(T_J\) one-by-one according to some criterion, and the matrix \({{{\varvec{U}}}}_M\) is built column-by-column, correspondingly. The first element \(\varvec{\xi }_1\in T_J\) is usually chosen close to the “center” of \(\Gamma\). Thus, \({{{\varvec{U}}}}_1={{{\varvec{u}}}}(\varvec{\xi }_1)\) and \(T_1=\{\varvec{\xi }_1\}\). Having \(T_m = \{ \varvec{\xi }_1, \dots {\varvec{\xi }_m}\}\) already selected and the corresponding matrix \({{{\varvec{U}}}}_m\) constructed, the next item \(\varvec{\xi }_{m+1}\) is chosen as the \(\varvec{\xi } \in T_J\) for which the corresponding solution \({{{\varvec{u}}}}^{\mathrm{RB}}(\varvec{\xi })\) of (7) maximizes the squared energy norm \(E_m({{\varvec{\xi }}})\) of the error given by (10) with \(M=m\). The process stops whenever \(E_m({{\varvec{\xi }}})\le \tau\) for all \(\varvec{\xi }\in T_J\). In practice, the columns of \({{{\varvec{U}}}}_m\) are orthonormalized (in our numerical tests, orthonormalization with respect to the scalar product of \({\mathbb{R}}^N\) was used).

In other words, for \(m=1,2,\dots ,M\), the reduced problems (7) are solved for all currently remaining \(\varvec{\xi }\in T_J\setminus T_m\), the obtained solutions \({{{\varvec{u}}}}^{\mathrm{RB}}(\varvec{\xi })\) are prolonged to the range of \({{{\varvec{U}}}}_m\), \({{{\varvec{u}}}}^{\mathrm{RB}}(\varvec{\xi })\rightarrow {{{\varvec{U}}}}_m{{{\varvec{u}}}}^{\mathrm{RB}}(\varvec{\xi })\), and the energy norms of the errors of the approximate solutions \({{{\varvec{U}}}}_m {{{\varvec{u}}}}^{\mathrm{RB}}(\varvec{\xi })\) of (5) are compared. This means that \(E_m(\varvec{\xi })\) defined by (10) with M substituted by m has to be computed many times during the offline phase. Consequently, instead of computing the exact values of \(E_m({{\varvec{\xi }}})\), some estimates of the errors are used, in which the inverse \({{{\varvec{A}}}}({{\varvec{\xi }}})^{-1}\) is approximated. We focus on this in Sect. 4.

3.3 Choice of the training set

The training set \(T_J\subset \Gamma\) of J elements should sufficiently represent the whole parameter space \(\Gamma\) and should allow efficient computation of the desired characteristics. To this end, various techniques are used to generate \(T_J\), such as the Monte Carlo method, regular tensor grids over \(\Gamma\), or sparse grids, see e.g. [30]. The coordinates of the points in \(T_J\) can be obtained, for example, as abscissas of some Gauss or Clenshaw–Curtis quadrature formulas. In this paper, the Monte Carlo method and the uniform probability distribution on \(\Gamma\) are considered. In general, the training set may be chosen according to the density of the underlying probability distribution. A large training set \(T_J\) induces many evaluations of (estimates of) \(E_m(\varvec{\xi })\). Starting with a relatively small training set and adaptively enriching it is proposed in [25]. This modification increases the efficiency of the error estimation methods that we suggest, thus we employ this idea in our algorithms.
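For the uniform distribution on \(\Gamma =\langle -1,1\rangle ^K\) used later in Sect. 6, a Monte Carlo training set split into enrichment levels can be generated, for instance, as in the following Python sketch; the level sizes are arbitrary illustrative values.

```python
import numpy as np

def training_levels(K, sizes, seed=0):
    """Draw a Monte Carlo training set on Gamma = [-1, 1]^K, uniformly
    distributed, and split it into enrichment levels T^(lev 1..L)."""
    rng = np.random.default_rng(seed)
    return [rng.uniform(-1.0, 1.0, size=(J_l, K)) for J_l in sizes]

# e.g. three levels of 100, 400 and 1500 samples for K = 8 parameters
levels = training_levels(K=8, sizes=[100, 400, 1500])
```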

4 A posteriori error estimates

The key part of RB methods is estimating the quality of a current reduced basis or estimating which snapshot should be next included into the reduced basis. During the offline phase, formula (10) has to be evaluated or estimated for every \(m=1,2,\dots ,M\) and for all \(\varvec{\xi }\) remaining in the actual training set. Let us emphasize that the name a posteriori error estimate here and also, for example, in [17, 18, 22, 40], means the estimate of a difference (measured in some norm) between the solutions \(u_M({\varvec{x}},\varvec{\xi })\in V_M\subset V_N\) and \(u_N({\varvec{x}},\varvec{\xi })\in V_N\). These solutions are the \((\cdot ,\cdot )_{\varvec{\xi }}\)-orthogonal projections of the exact solution \(u({\varvec{x}},\varvec{\xi })\in V\) of problem (1) onto \(V_M\) and \(V_N\), respectively. However, a widely used meaning of the name a posteriori error estimate is connected to the estimate of the distance of some approximate solution \(u_M({\varvec{x}},\varvec{\xi })\in V_M\) from the exact solution \(u({\varvec{x}},\varvec{\xi })\in V\) of (1); see, for example, [1, 10]. Such an estimate cannot be obtained using only the discretized form of the problem. Some sophisticated construction is needed unless some other properties of the solutions are employed, such as in [10]. Before recalling the standard error estimate and introducing the new one in the next two subsections, we present the following lemma.

Lemma 4.1

Let the matrices \({\varvec{A}},{\varvec{M}}\in {\mathbb{R}}^{N\times N}\) be real, symmetric and positive definite. Suppose that there exist \(0<c_1\le c_2<\infty\) such that

$$\begin{aligned} c_1{\varvec{v}}^T{\varvec{Mv}}\le {\varvec{v}}^T{\varvec{Av}}\le c_2{\varvec{v}}^T{\varvec{Mv}}\quad \text {for all}\; {\varvec{v}}\in {\mathbb{R}}^N. \end{aligned}$$
(11)

Then

$$\begin{aligned} \frac{1}{c_2}\,{\varvec{v}}^T{\varvec{M}}^{-1}{\varvec{v}}\le {\varvec{v}}^T{\varvec{A}}^{-1}{\varvec{v}}\le \frac{1}{c_1}\,{\varvec{v}}^T{\varvec{M}}^{-1}{\varvec{v}}\quad \text {for all}\; {\varvec{v}}\in {\mathbb{R}}^N. \end{aligned}$$
(12)

Moreover, the possible largest \(c_1\) and the smallest \(c_2\) in (11) are equal to the minimal and maximal eigenvalue of \({\varvec{M}}^{-1}{\varvec{A}}\), respectively.

Proof

Inserting \({\varvec{v}} = {\varvec{M}}^{-\frac{1}{2}} {\varvec{u}}\), \({\varvec{u}}\in {\mathbb{R}}^N\), into (11), we have

$$\begin{aligned} c_1{\varvec{u}}^T{\varvec{u}}\le {\varvec{u}}^T{\varvec{M}}^{-\frac{1}{2}}{\varvec{A}}{\varvec{M}}^{-\frac{1}{2}}{\varvec{u}}\le c_2{\varvec{u}}^T{\varvec{u}}\quad \text {for all}\; {\varvec{u}}\in {\mathbb{R}}^N, \end{aligned}$$
(13)

so the eigenvalues of \({\varvec{M}}^{-\frac{1}{2}}{\varvec{A}}{\varvec{M}}^{-\frac{1}{2}}\) lie in \(\langle c_1,c_2\rangle\). Then the spectra of the matrices \({\varvec{M}}^{-1}{\varvec{A}}\) and of \({\varvec{A}}{\varvec{M}}^{-1}\), both similar to \({\varvec{M}}^{-\frac{1}{2}}{\varvec{A}}{\varvec{M}}^{-\frac{1}{2}}\), lie in \(\langle c_1,c_2\rangle\) as well. Then the spectra of the matrices \({\varvec{M}}{\varvec{A}}^{-1}\), \({\varvec{A}}^{-1}{\varvec{M}}\), and \({\varvec{M}}^{\frac{1}{2}}{\varvec{A}}^{-1}{\varvec{M}}^{\frac{1}{2}}\) lie in \(\langle \frac{1}{c_2},\frac{1}{c_1}\rangle\), which yields (12). The last statement follows from the equivalence of (11) and (13) and from the similarity of \({\varvec{M}}^{-1}{\varvec{A}}\) and \({\varvec{M}}^{-\frac{1}{2}}{\varvec{A}}{\varvec{M}}^{-\frac{1}{2}}\). \(\square\)
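Lemma 4.1 can also be checked numerically. The following Python snippet computes the sharpest constants \(c_1, c_2\) as the extreme eigenvalues of the pencil \({\varvec{A}}{\varvec{v}}=\lambda {\varvec{M}}{\varvec{v}}\) and verifies inequality (12) for a random vector; the test matrices are arbitrary SPD stand-ins.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
N = 40
B1, B2 = rng.standard_normal((N, N)), rng.standard_normal((N, N))
A = B1 @ B1.T + np.eye(N)                      # SPD test matrices
M = B2 @ B2.T + np.eye(N)

# sharpest constants in (11): extreme eigenvalues of the pencil A v = lambda M v
lams = eigh(A, M, eigvals_only=True)
c1, c2 = lams[0], lams[-1]

v = rng.standard_normal(N)
Minv_v, Ainv_v = np.linalg.solve(M, v), np.linalg.solve(A, v)
assert v @ Minv_v / c2 <= v @ Ainv_v <= v @ Minv_v / c1      # inequality (12)
```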

4.1 Mean based a posteriori error estimate

A widely used estimate of the energy norm of error (10) is based on the dual norm of the residual

$$\begin{aligned} E^{\mathrm{MB}}_{M}({{\varvec{\xi }}}) = {{{\varvec{r}}}}_M^T({{\varvec{\xi }}})\, ({{{\varvec{A}}}_0})^{-1}\, {{{\varvec{r}}}}_M({{\varvec{\xi }}}) \, = \Vert {{{\varvec{r}}}}_M({{\varvec{\xi }}}) \Vert ^2_{V'} \ , \end{aligned}$$
(14)

see, e.g. [18]; it is called the mean-based (MB) estimate in this paper due to its relationship with the matrix \({\varvec{A}}_0\) given by \(a_0({\varvec{x}})\), which is often defined as the mean value of \(a({\varvec{x}},\varvec{\xi })\). Using (14) we can get guaranteed upper and lower bounds of (10). From the coercivity and continuity conditions (3), for every \(\varvec{\xi }\in \Gamma\) there exist \(0<\alpha ^{\mathrm{cont}}_1(\varvec{\xi })\le \alpha ^{\mathrm{cont}}_2(\varvec{\xi })<\infty\) such that

$$\begin{aligned} \alpha _1^{\mathrm{cont}}(\varvec{\xi })\,\Vert v\Vert _{V}^2\le \left( v,v\right) _{\varvec{\xi }}\le \alpha ^{\mathrm{cont}}_2(\varvec{\xi })\,\Vert v\Vert _{V}^2 \quad \text {for all}\; v\in V , \;\varvec{\xi }\in \Gamma , \end{aligned}$$
(15)

where

$$\begin{aligned} \text {ess inf}_{{\varvec{x}}\in D}\frac{a({\varvec{x}},\varvec{\xi })}{a_0({\varvec{x}})}\le & {} \alpha _1^{\mathrm{cont}}(\varvec{\xi }),\\ \alpha _2^{\mathrm{cont}}(\varvec{\xi })\le & {} \text {ess sup}_{{\varvec{x}}\in D}\frac{a({\varvec{x}},\varvec{\xi })}{a_0({\varvec{x}})}. \end{aligned}$$

Possibly, there may also exist uniform bounds \({\bar{\alpha }}_1^{\mathrm{cont}}\) and \({\bar{\alpha }}_2^{\mathrm{cont}}\) such that \(0<{\bar{\alpha }}_1^{\mathrm{cont}}\le \alpha ^{\mathrm{cont}}_1(\varvec{\xi })\le \alpha ^{\mathrm{cont}}_2(\varvec{\xi }) \le {\bar{\alpha }}_2^{\mathrm{cont}}<\infty\) for all \(\varvec{\xi }\in \Gamma\). If we consider the discretized problem only, \(v\in V_N\subset V\), the inequalities (15) remain valid with possibly different constants,

$$\begin{aligned}&\alpha ^{\mathrm{disc}}_1(\varvec{\xi })\,{\varvec{v}}^T{\varvec{A}}_0{\varvec{v}}\le {\varvec{v}}^T{\varvec{A}}(\varvec{\xi }){\varvec{v}}\nonumber \\&\quad \le \alpha ^{\mathrm{disc}}_2(\varvec{\xi })\,{\varvec{v}}^T{\varvec{A}}_0{\varvec{v}}, \quad \text {for all}\; {\varvec{v}}\in {\mathbb{R}}^N, \varvec{\xi }\in \Gamma , \end{aligned}$$
(16)

where \(\alpha _1^{\mathrm{cont}}(\varvec{\xi })\le \alpha _1^{\mathrm{disc}}(\varvec{\xi })\), \(\alpha _2^{\mathrm{disc}}(\varvec{\xi })\le \alpha _2^{\mathrm{cont}}(\varvec{\xi })\). Obviously, cf. Lemma 4.1,

$$\begin{aligned} \alpha _1^{\mathrm{disc}}(\varvec{\xi })=\lambda _{\mathrm{min}}({\varvec{A}}_0^{-1}{\varvec{A}}(\varvec{\xi })),\quad \alpha _2^{\mathrm{disc}}(\varvec{\xi })=\lambda _{\mathrm{max}}({\varvec{A}}_0^{-1}{\varvec{A}}(\varvec{\xi }))\quad \text { for all} \;\varvec{\xi }\in \Gamma . \end{aligned}$$

In our numerical experiments, there will exist uniform bounds \({{\bar{\alpha }}}_1^{\mathrm{disc}}\) and \({{\bar{\alpha }}}_2^{\mathrm{disc}}\) such that \({{\bar{\alpha }}}_1^{\mathrm{disc}}\le \alpha _1^{\mathrm{disc}}(\varvec{\xi })\), \(\alpha _2^{\mathrm{disc}}(\varvec{\xi })\le {{\bar{\alpha }}}_2^{\mathrm{disc}}\) for all \(\varvec{\xi }\in \Gamma\), where

$$\begin{aligned} {{\bar{\alpha }}}_1^{\mathrm{disc}}=\min _{\varvec{\xi }\in \Gamma } \lambda _{\mathrm{min}}({\varvec{A}}_0^{-1}{\varvec{A}}(\varvec{\xi })),\quad {{\bar{\alpha }}}_2^{\mathrm{disc}}=\max _{\varvec{\xi }\in \Gamma } \lambda _{\mathrm{max}}({\varvec{A}}_0^{-1}{\varvec{A}}(\varvec{\xi })). \end{aligned}$$

Finally, we get

$$\begin{aligned} {{\bar{\alpha }}}_1^{\mathrm{cont}}\le & {} {{\bar{\alpha }}}_1^{\mathrm{disc}}\le \alpha _1^{\mathrm{disc}}(\varvec{\xi }),\nonumber \\ \alpha _2^{\mathrm{disc}}(\varvec{\xi })\le & {} {{\bar{\alpha }}}_2^{\mathrm{disc}}\le {{\bar{\alpha }}}_2^{\mathrm{cont}}\quad \text {for all}\; \varvec{\xi }\in \Gamma . \end{aligned}$$
(17)

Combining (16) and Lemma 4.1 we get two-sided guaranteed bounds to the squared energy norm (10) of the error

$$\begin{aligned} \frac{1}{\alpha _2^{\mathrm{disc}}(\varvec{\xi })}\; {\varvec{r}}_M(\varvec{\xi })^T{\varvec{A}}_0^{-1} {\varvec{r}}_M(\varvec{\xi })\le {\varvec{e}}_M(\varvec{\xi })^T {\varvec{A}}(\varvec{\xi }){\varvec{e}}_M(\varvec{\xi }) \le \frac{1}{\alpha _1^{\mathrm{disc}}(\varvec{\xi })}\; {\varvec{r}}_M(\varvec{\xi })^T{\varvec{A}}_0^{-1} {\varvec{r}}_M(\varvec{\xi }). \end{aligned}$$

This can also be expressed using norms and dual norms as

$$\begin{aligned} \frac{\Vert {{{\varvec{r}}}}_M({{\varvec{\xi }}}) \Vert ^2_{V'}}{\alpha _2^{\mathrm{cont}}(\varvec{\xi })}\le & {} \frac{\Vert {{{\varvec{r}}}}_M({{\varvec{\xi }}}) \Vert ^2_{V'} }{\alpha _2^{\mathrm{disc}}(\varvec{\xi })}\le \Vert {\varvec{e}}_M(\varvec{\xi }) \Vert ^2_{\varvec{\xi }}\nonumber \\\le & {} \frac{\Vert {{{\varvec{r}}}}_M({{\varvec{\xi }}}) \Vert ^2_{V'}}{\alpha _1^{\mathrm{disc}}(\varvec{\xi })} \le \frac{\Vert {{{\varvec{r}}}}_M({{\varvec{\xi }}}) \Vert ^2_{V'}}{\alpha _1^{\mathrm{cont}}(\varvec{\xi })} , \end{aligned}$$
(18)

cf. [8, (4-I.14)] or [18, (3.29)] for the upper bound.

In the GRB algorithm, instead of computing the constants \(\alpha _1^{\mathrm{disc}}(\varvec{\xi })\) and \(\alpha _2^{\mathrm{disc}}(\varvec{\xi })\) for every \(\varvec{\xi }\in T_J\) separately, uniform lower and upper bounds of (17) can be used. However, if the variation of \(a({\varvec{x}},\varvec{\xi })\) with respect to \(\varvec{\xi }\) grows, these uniform bounds may become useless.
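As a small illustration of (14) and (18), the following Python sketch evaluates the MB estimate and the resulting two-sided bounds on a dense test problem, where \(\alpha _1^{\mathrm{disc}}(\varvec{\xi })\) and \(\alpha _2^{\mathrm{disc}}(\varvec{\xi })\) are computed as the extreme eigenvalues of \({\varvec{A}}_0^{-1}{\varvec{A}}(\varvec{\xi })\); this is affordable only for small test matrices and serves purely for illustration, not as the paper's implementation.

```python
import numpy as np
from scipy.linalg import eigh

def mb_bounds(A0, A_xi, r, e):
    """Mean-based estimate (14) and the two-sided bounds (18) for the squared
    energy norm of the error; alpha_1/2^disc are the extreme eigenvalues of
    the pencil A(xi) v = lambda A0 v (dense, small test problems only)."""
    E_mb = r @ np.linalg.solve(A0, r)                 # squared V'-norm of the residual
    lams = eigh(A_xi, A0, eigvals_only=True)
    a1, a2 = lams[0], lams[-1]
    E_true = e @ A_xi @ e                             # squared energy norm of the error
    return E_mb / a2, E_true, E_mb / a1               # lower bound <= truth <= upper bound

# usage sketch: with A0, A_xi dense SPD, e = u_N - u_M and r = b - A_xi @ u_M,
# low, true, up = mb_bounds(A0, A_xi, r, e); then low <= true <= up holds.
```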

4.2 Multilevel a posteriori error estimate

The estimate of \(E_M(\varvec{\xi })\) suggested in this section is based on a hierarchy designed in the FE solution space. A full hierarchy is not necessary; only two or three levels can be sufficient. The theory of multilevel methods can be found, e.g., in [3, 21]. An example of a specific FE setting (bilinear FE basis functions with rectangular supports) and the way of obtaining certain constants (\(\gamma\), \(\beta _1\), and \(\beta _2\)) and using them in a two-level error estimate can be found in the Appendix. This scheme can be easily adapted to any other hierarchical FE setting. A multilevel method is then derived from the two-level scheme by recursive splitting of the coarse spaces, which is described, e.g., in [28, 37]. Therefore, let us only consider a general hierarchical two-level splitting of the solution space \(V_N\) into two subspaces, the coarse and the fine one; see the definition in “Appendix”. Let us denote the coefficient vectors of the function \(u\in V_N\) with respect to the original FE space by \({\varvec{u}}\in {\mathbb{R}}^N\) and with respect to the hierarchical two-level basis by

$$\begin{aligned} {\varvec{u}}^{\mathrm{ML}}=\left( \begin{array}{c} {\varvec{u}}^{\mathrm{ML,C}}\\ {\varvec{u}}^{\mathrm{ML,F}}\end{array}\right) \in {\mathbb{R}}^N,\quad {\varvec{Pu}}^{\mathrm{ML}}={\varvec{u}}, \end{aligned}$$
(19)

where we use the transformation matrix \({\varvec{P}}\in {\mathbb{R}}^{N\times N}\) which is specified in “Appendix”. Any system of linear equations \({\varvec{Au}}={\varvec{b}}\) with respect to the original FE basis can be transformed into the system

$$\begin{aligned} {\varvec{A}}^{\mathrm{ML}}{\varvec{u}}^{\mathrm{ML}}= & {} \left( \begin{array}{cc}{\varvec{A}}^{\mathrm{ML,C}}&{}{\varvec{A}}^{\mathrm{ML,CF}}\\ {{\varvec{A}}^{\mathrm{ML,CF}}}^T&{}{\varvec{A}}^{\mathrm{ML,F}}\end{array}\right) \left( \begin{array}{c}{\varvec{u}}^{\mathrm{ML,C}}\\ {{\varvec{u}}^{\mathrm{ML,F}}}\end{array}\right) \nonumber \\= & {} \left( \begin{array}{c}{\varvec{b}}^{\mathrm{ML,C}}\\ {{\varvec{b}}^{\mathrm{ML,F}}}\end{array}\right) = {\varvec{b}}^{\mathrm{ML}} \end{aligned}$$
(20)

where we have

$$\begin{aligned} {\varvec{A}}^{\mathrm{ML}}{\varvec{u}}^{\mathrm{ML}}= {\varvec{P}}^T{\varvec{AP}}{\varvec{u}}^{\mathrm{ML}}= {\varvec{P}}^T{\varvec{b}}={\varvec{b}}^{\mathrm{ML}}. \end{aligned}$$

The squared energy norm (10) of the error can be expressed in this hierarchical splitting as

$$\begin{aligned} E_M(\varvec{\xi })=\, & {} {\varvec{e}}^{\mathrm{ML}}_M(\varvec{\xi })^T{\varvec{A}}^{\mathrm{ML}}(\varvec{\xi }){\varvec{e}}^{\mathrm{ML}}_M(\varvec{\xi })\nonumber \\=\, & {} {\varvec{r}}^{\mathrm{ML}}_M(\varvec{\xi })^T{\varvec{A}}^{\mathrm{ML}}(\varvec{\xi })^{-1}{\varvec{r}}^{\mathrm{ML}}_M(\varvec{\xi }), \end{aligned}$$
(21)

where \({\varvec{Pe}}^{\mathrm{ML}}_M(\varvec{\xi })={\varvec{e}}_M(\varvec{\xi })\) and \({\varvec{Pr}}^{\mathrm{ML}}_M(\varvec{\xi })={\varvec{r}}_M(\varvec{\xi })\), see Lemma 7.1 in “Appendix”. Due to Lemma 4.1 applied to (39) in “Appendix”, \({\varvec{A}}^{\mathrm{ML}}(\varvec{\xi })^{-1}\) in (21) can be approximated using

$$\begin{aligned}&\frac{1}{1+\gamma }\,{\varvec{v}}^T \left( \begin{array}{cc}{{\varvec{A}}^{\mathrm{ML,C}}(\varvec{\xi })}^{-1}&{}0\\ 0&{}{{\varvec{A}}^{\mathrm{ML,F}}(\varvec{\xi })}^{-1}\end{array}\right) {\varvec{v}}\nonumber \\&\quad \le {\varvec{v}}^T{\varvec{A}}^{\mathrm{ML}}(\varvec{\xi })^{-1}{\varvec{v}} \nonumber \\&\quad \le \frac{1}{1-\gamma }\,{\varvec{v}}^T \left( \begin{array}{cc}{{\varvec{A}}^{\mathrm{ML,C}}(\varvec{\xi })}^{-1}&{}0\\ 0&{}{{\varvec{A}}^{\mathrm{ML,F}}(\varvec{\xi })}^{-1}\end{array}\right) {\varvec{v}} \end{aligned}$$
(22)

for all \({\varvec{v}}\in {\mathbb{R}}^N\), where \(\gamma \in \langle 0,1)\) can be easily obtained for all hierarchical FE spaces and does not depend on the discretization parameter or on \(\varvec{\xi }\in \Gamma\), provided all functions \(a_k({\varvec{x}})\) are piecewise constant or mildly varying on D; see the details in “Appendix”. Thus we have to solve only two smaller systems of equations. Moreover, the coarse subspace can be further decomposed into coarse and fine subspaces and thus the system with the matrix \({\varvec{A}}^{\mathrm{ML,C}}(\varvec{\xi })\) can be further approximated by a two-by-two block diagonal matrix.

The matrix \({\varvec{A}}^{\mathrm{ML,F}}(\varvec{\xi })\) can be replaced by its diagonal, to which it is spectrally equivalent. An example of such an equivalence is given in “Appendix”. More precisely, let \({\varvec{D}}^{\mathrm{ML,F}}(\varvec{\xi })\) be the diagonal of \({\varvec{A}}^{\mathrm{ML,F}}(\varvec{\xi })\); then there exist constants \(0<\beta _1\le 1\le \beta _2<\infty\) such that

$$\begin{aligned} \beta _1\;{\varvec{v}}^T{\varvec{D}}^{\mathrm{ML,F}}(\varvec{\xi }){\varvec{v}}\le {\varvec{v}}^T{\varvec{A}}^{\mathrm{ML,F}}(\varvec{\xi }){\varvec{v}}\le \beta _2\;{\varvec{v}}^T{\varvec{D}}^{\mathrm{ML,F}} (\varvec{\xi }){\varvec{v}} \end{aligned}$$

for all \({\varvec{v}}\in {\mathbb{R}}^N\). Thus using Lemma 7.1 in Appendix, we finally get the multilevel (ML) guaranteed error bounds

$$\begin{aligned} E^{\mathrm{ML,1}}_M(\varvec{\xi })\le {\varvec{e}}^{\mathrm{ML}}_M(\varvec{\xi })^T{\varvec{A}}^{\mathrm{ML}}(\varvec{\xi }){\varvec{e}}^{\mathrm{ML}}_M(\varvec{\xi })\le E^{\mathrm{ML,2}}_M(\varvec{\xi }), \end{aligned}$$
(23)

where

$$\begin{aligned} E^{\mathrm{ML,1}}_M(\varvec{\xi })= & {} \frac{1}{(1+\gamma )}\; {\varvec{r}}_M^{\mathrm{ML,C}}(\varvec{\xi })^T {{\varvec{A}}^{\mathrm{ML,C}}(\varvec{\xi })}^{-1}{\varvec{r}}_M^{\mathrm{ML,C}}(\varvec{\xi })\\&+\frac{1}{(1+\gamma )\beta _2}\; {\varvec{r}}_M^{\mathrm{ML,F}}(\varvec{\xi })^T {\varvec{D}}^{\mathrm{ML,F}}(\varvec{\xi })^{-1} {\varvec{r}}_M^{\mathrm{ML,F}}(\varvec{\xi }) \end{aligned}$$

and

$$\begin{aligned} E^{\mathrm{ML,2}}_M(\varvec{\xi })= & {} \frac{1}{(1-\gamma )}\; {\varvec{r}}_M^{\mathrm{ML,C}}(\varvec{\xi })^T {{\varvec{A}}^{\mathrm{ML,C}}(\varvec{\xi })}^{-1}{\varvec{r}}_M^{\mathrm{ML,C}}(\varvec{\xi })\\&+\frac{1}{(1-\gamma )\beta _1}\; {\varvec{r}}_M^{\mathrm{ML,F}}(\varvec{\xi })^T {\varvec{D}}^{\mathrm{ML,F}}(\varvec{\xi })^{-1} {\varvec{r}}_M^{\mathrm{ML,F}}(\varvec{\xi }), \end{aligned}$$

and the multilevel central error estimate

$$\begin{aligned} E^{\mathrm{ML}}_M(\varvec{\xi })= & {} {\varvec{r}}_M^{\mathrm{ML,C}}(\varvec{\xi })^T {{\varvec{A}}^{\mathrm{ML,C}}(\varvec{\xi })}^{-1}{\varvec{r}}_M^{\mathrm{ML,C}}(\varvec{\xi })\nonumber \\&+ {\varvec{r}}_M^{\mathrm{ML,F}}(\varvec{\xi })^T {\varvec{D}}^{\mathrm{ML,F}}(\varvec{\xi })^{-1} {\varvec{r}}_M^{\mathrm{ML,F}}(\varvec{\xi }), \end{aligned}$$
(24)

where

$$\begin{aligned} E^{\mathrm{ML,1}}_M(\varvec{\xi })\le E^{\mathrm{ML}}_M(\varvec{\xi })\le E^{\mathrm{ML,2}}_M(\varvec{\xi })\quad \text {for all}\; \varvec{\xi }\in \Gamma . \end{aligned}$$

The constants \(\gamma\), \(\beta _1\) and \(\beta _2\) can be quantified or estimated under many practical and theoretical (not only) FE settings. In our numerical experiments in Sect. 6, we will use bilinear FE basis functions with rectangular supports, and thus either \(\gamma ^2\le \frac{3}{8}\approx 0.6124^2\) if \(a({\varvec{x}},\varvec{\xi })\) is constant on the coarse elements, or \(\gamma ^2\le 0.4109\approx 0.6410^2\) if \(a({\varvec{x}},\varvec{\xi })\) is mildly varying inside the coarse elements (up to 20%), and \(\beta _1=\frac{1}{4}\) and \(\beta _2=\frac{7}{4}\); see the details in Appendix. For triangular or tetrahedral elements and piecewise linear basis functions [4, 7, 16], \(\gamma ^2 \in \langle \frac{3}{8},\frac{3}{4}\rangle\) depending on the shape of elements and iso/anisotropy of the operator. A hierarchy can also be built from different polynomial degrees [16, 36]. Note that a uniform guaranteed upper bound to \(\gamma\) and the bounds to \(\beta _1\) and \(\beta _2\) can be obtained by simple algebra after restricting the problem to a reference element [5, 16, 36]. The presented two-level method of substituting \({\varvec{A}}^{-1}\) by the inverse of a block \(2\times 2\) matrix is a cornerstone of a multilevel algorithm. The idea of recursive splitting of the coarse spaces and the way how to substitute \({\varvec{A}}^{-1}\) are given in [3, 21, 28, 37]. As a result, the bounds analogous to \(\frac{1}{1+\gamma }\) and \(\frac{1}{1-\gamma }\) in (22) are 1 and \(\lambda\), respectively, where

$$\begin{aligned} \lambda =\frac{1}{2\sqrt{1-\gamma ^2}-1} \end{aligned}$$

provided \(\gamma ^2<\frac{3}{4}\). For the case \(\gamma ^2\ge \frac{3}{4}\), the multilevel method can still be used but the bounds are no longer uniform with respect to the number of levels. Note that the computation is faster but the bounds to the true error become less tight with a growing number of levels.
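A compact Python sketch of the two-level estimates (23)–(24) is given below. It assumes dense NumPy arrays, a transformation matrix P of the chosen hierarchy and constants gamma, beta1, beta2 supplied by the user (for example the values quoted above for bilinear elements), so it illustrates the formulas rather than the paper's implementation.

```python
import numpy as np

def ml_estimates(P, A, r, n_coarse, gamma, beta1, beta2):
    """Two-level estimates (23)-(24): transform the residual to the hierarchical
    basis, solve the coarse block exactly and use only the diagonal of the fine
    block. P, gamma, beta1, beta2 come from the chosen FE hierarchy (Appendix)."""
    A_ml = P.T @ A @ P                                  # hierarchical matrix, cf. (20)
    r_ml = P.T @ r
    rC, rF = r_ml[:n_coarse], r_ml[n_coarse:]
    AC = A_ml[:n_coarse, :n_coarse]                     # coarse block A^{ML,C}
    dF = np.diag(A_ml[n_coarse:, n_coarse:])            # diagonal of the fine block
    tC = rC @ np.linalg.solve(AC, rC)
    tF = rF @ (rF / dF)
    E_ml = tC + tF                                      # central estimate (24)
    E_lo = tC / (1 + gamma) + tF / ((1 + gamma) * beta2)   # E^{ML,1} in (23)
    E_up = tC / (1 - gamma) + tF / ((1 - gamma) * beta1)   # E^{ML,2} in (23)
    return E_lo, E_ml, E_up

# usage sketch: for bilinear FE on rectangles (Sect. 6) one may take, cf. Appendix,
# gamma = np.sqrt(3 / 8), beta1 = 0.25, beta2 = 1.75.
```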

5 Algorithms

The purpose of this part is to present the transcription of the methods suggested in the previous sections into a pseudo-code which may help to understand the details of the algorithms. First, we introduce the general GRB algorithm. Then we present the algorithms for two different methods of the a posteriori error estimation: mean-based (MB) estimates and multilevel (ML) estimates, which are described in Sects. 4.1 and 4.2, respectively. These two methods are described in the form of modifications of the GRB algorithm.

5.1 GRB algorithm

The following algorithm represents a typical greedy algorithm for the construction of a reduced basis, described in Sect. 3. The algorithm is general in the sense that the error estimator is not specified yet.

GRB algorithm:

Input: The training set \(T_J\) and the constant \(\tau >0\) which serves as a threshold for the acceptable (squared) norm of the error. The training set \(T_J\) is considered to be divided into \(L \ge 1\) nonoverlapping sets (levels): \(T_J:=T^{\mathrm{lev}\; 1} \cup T^{\mathrm{lev}\; 2} \dots \cup T^{\mathrm{lev}\; L}\).

Output: The \(N\times M\) matrix \({{{\varvec{U}}}}_M\), the columns of which form the reduced basis (\({{{\varvec{U}}}}_M\) is constructed column-by-column).

  1.

    Set \(T := T^{\mathrm{lev}\; 1}\), the first level of \(T_J\); set \(l:=1\).

    Take \(\varvec{\xi }_1\in T\) which is close to the center of \(\Gamma\).

    Solve (5) with \(\varvec{\xi }=\varvec{\xi }_1\); denote the solution \({{{\varvec{u}}}}_1\).

    Set \({{{\varvec{U}}}}_1:= {{{\varvec{u}}}}_1\) and \(T_1:=\{\varvec{\xi }_1\}\).

    Set \(T:=T\setminus \{\varvec{\xi }_1\}\).

    Set \(m:=1\).

  2.

    For \(j=1,2,\dots ,\vert T\vert\):

    Find the solution of (7) with \(\varvec{\xi }=\varvec{\xi }_j\in T\); denote the solution by \({{{\varvec{u}}}}_{m,j}\).

    Compute \(E_{m,j}\) – an estimate (of the square) of some suitable norm of \({{{\varvec{e}}}}_{m,j}\),

    the error vector, which is defined as \({{{\varvec{e}}}}_{m,j} = {\varvec{A}}(\varvec{\xi }_j)^{-1}{\varvec{b}}-{{{\varvec{U}}}}_m{{{\varvec{u}}}}_{m,j}\),

    where \({{{\varvec{U}}}}_m{{{\varvec{u}}}}_{m,j}\) is a prolongation of \({{{\varvec{u}}}}_{m,j}\).

  3.

    If \(\max _{j=1,\dots ,\vert T\vert }E_{m,j}<\tau\) and \(l=L\) then \(M:=m\) and stop.

    If \(\max _{j=1,\dots ,\vert T\vert }E_{m,j}<\tau\) and \(l<L\) then go to step 4.

    Set \(\varvec{\xi }_{m+1}:=\varvec{\xi }_{j_{\mathrm{max}}}\), where \(j_\mathrm{max}:=\mathrm{argmax}_{j=1,\dots ,\vert T\vert }E_{m,j}\).

    Solve (5) with \(\varvec{\xi }=\varvec{\xi }_{m+1}\), and denote the solution by \({{{\varvec{u}}}}_{m+1}\).

    Set \({{{\varvec{U}}}}_{m+1}:=[{{{\varvec{U}}}}_m,{{{\varvec{u}}}}_{m+1}]\) and \(T_{m+1}:=T_m \cup \{\varvec{\xi }_{m+1}\}\).

    Orthonormalize \({{{\varvec{U}}}}_{m+1}\).

  4.

    Set \(T_{\mathrm{remove}}:=\{\varvec{\xi }_j\in T;\; E_{m,j}<\tau \}\).

    Set \(T:=T\setminus T_{\mathrm{remove}}\).

    Enlarge T by adding the next level of values of \(\varvec{\xi }\), if applicable (for example, if \(\vert T\vert\) became less than some chosen threshold and \(l<L\)), i.e. set \(l:=l+1\), \(T := T \cup T^{\mathrm{lev}\; l}\).

    Set \(m:=m+1\) and go to step 2.
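A condensed Python sketch of this greedy loop (for a single-level training set, i.e. \(L=1\)) is given below. The callables solve_full and estimate_error, which encapsulate the high-fidelity solve of (5) and the evaluation of an error estimate such as (14) or (24), are hypothetical placeholders; the sketch is not the paper's implementation.

```python
import numpy as np

def greedy_rb(T, solve_full, estimate_error, tol, max_m=50):
    """Generic greedy loop of Sect. 5.1: T is a J x K array of parameter samples,
    solve_full(xi) returns the high-fidelity solution of (5), and
    estimate_error(U, xi) returns an estimate of the squared error norm E_m(xi)
    of the prolonged RB solution with respect to the current basis U."""
    xi0 = T[np.argmin(np.linalg.norm(T, axis=1))]      # sample closest to the center of Gamma
    U, _ = np.linalg.qr(solve_full(xi0)[:, None])
    active = np.ones(len(T), dtype=bool)
    for m in range(1, max_m + 1):
        est = np.array([estimate_error(U, xi) if a else -np.inf
                        for xi, a in zip(T, active)])
        if est.max() < tol:
            return U                                   # reduced basis U_M, M = m
        active &= (est >= tol)                         # drop well-approximated samples
        j = int(np.argmax(est))                        # worst-approximated sample
        active[j] = False
        U = np.column_stack([U, solve_full(T[j])])
        U, _ = np.linalg.qr(U)                         # re-orthonormalize the basis
    return U
```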

We compare two special cases of the GRB algorithm, namely the GRB-MB algorithm and the GRB-ML algorithm below, which in step 2 of the GRB algorithm use the mean-based squared error estimate \(E_{m}^{\mathrm{{MB}}}\) and the multilevel estimate \(E_{m}^{\mathrm{{ML}}}\), respectively, as described in the previous sections.

5.2 GRB-MB algorithm

The estimate \(E^{\mathrm{MB}}_{m,j} := E^{\mathrm{MB}}_{m}({{\varvec{\xi }}}_j)\) of (14) of the squared energy norm (10) is used. In order to compute \(E^{\mathrm{MB}}_{m,j}\) efficiently, auxiliary vectors \({{{\varvec{w}}}}_{k,m}\in {{\mathbb{R}}}^N\) are successively computed, where \({{{\varvec{w}}}}_{k,m}=({{{\varvec{A}}}}_0)^{-1}{{{\varvec{A}}}}_k{{{\varvec{u}}}}_m\), \(k=0,\dots ,K\), \(m=1,\dots ,M\). These vectors are used for the computation of \(E^{\mathrm{MB}}_{m,j}\) in the following manner. The residual of (5) used in (14) can be expressed as

$$\begin{aligned} {{{\varvec{r}}}}=\, & {} {{{\varvec{b}}}}-{{{\varvec{A}}}}({{\varvec{\xi }}_j}){{{\varvec{U}}}}_m {{{\varvec{u}}}}_{m,j} = {{{\varvec{b}}}}-\left( \sum _{k=0}^K (\varvec{\xi }_j)_k{{{\varvec{A}}}}_k\right) {{{\varvec{U}}}}_m{{{\varvec{u}}}}_{m,j} \\=\, & {} {{{\varvec{b}}}}-\sum _{k=0}^K (\varvec{\xi }_j)_k\left( {{{\varvec{A}}}}_k {{{\varvec{U}}}}_m\right) {{{\varvec{u}}}}_{m,j}, \end{aligned}$$

and its inverse image with respect to \({{{\varvec{A}}}}_0\) as

$$\begin{aligned} ({{{\varvec{A}}}}_0)^{-1} {{{\varvec{r}}}} =({\varvec{A}}_0)^{-1}{\varvec{b}}-\sum _{k=0}^K (\varvec{\xi }_j)_k\sum _{i=1}^m ({\varvec{u}}_{m,j})_i{{{\varvec{w}}}}_{k,i}. \end{aligned}$$
(25)

From relationship (25), the formula for the error estimate (14) can be obtained as

$$\begin{aligned} E^{\mathrm{MB}}_{m,j}= & {} \ {{{\varvec{r}}}}^T ({{{\varvec{A}}}}_0)^{-1} {{{\varvec{r}}}} = {{{\varvec{r}}}}^T ({{{\varvec{A}}}}_0)^{-1} {{{\varvec{A}}}}_0 \, ({{{\varvec{A}}}}_0)^{-1} {{{\varvec{r}}}} \nonumber \\=\, & {} \left( ({{{\varvec{A}}}}_0)^{-1} {{{\varvec{r}}}}\right) ^T {{{\varvec{A}}}}_0 \left( ({{{\varvec{A}}}}_0)^{-1} {{{\varvec{r}}}}\right) \nonumber \\=\, & {} \left( {{{\varvec{b}}}}^T ({{{\varvec{A}}}}_0)^{-1} - \sum _{k=0}^K (\varvec{\xi }_j)_k\sum _{i=1}^m ({\varvec{u}}_{m,j})_i{{{\varvec{w}}}}_{k,i}^T \right) \nonumber \\&{\varvec{A}}_0 \left( ({{{\varvec{A}}}}_0)^{-1}{\varvec{b}}-\sum _{k=0}^K (\varvec{\xi }_j)_k\sum _{i=1}^m ({\varvec{u}}_{m,j})_i{{{\varvec{w}}}}_{k,i}\right) \nonumber \\=\, & {} {{{\varvec{b}}}}^T({{{\varvec{A}}}}_0)^{-1}{\varvec{b}} - 2 \sum _{k=0}^K \sum _{i=1}^m (\varvec{\xi }_j)_k({\varvec{u}}_{m,j})_i {{{\varvec{b}}}}^T {{{\varvec{w}}}}_{k,i}\nonumber \\&+\sum _{k=0}^K \sum _{r =0}^K \sum _{i=1}^m \sum _{s =1}^m(\varvec{\xi }_j)_{r} (\varvec{\xi }_j)_k ({\varvec{u}}_{m,j})_{s} ({\varvec{u}}_{m,j})_i {{{\varvec{w}}}}_{r,s}^T {{{\varvec{A}}}}_0 {{{\varvec{w}}}}_{k,i}\,, \quad \ \end{aligned}$$
(26)

cf. [17]. The inner products \({{{\varvec{b}}}}^T {{{\varvec{w}}}}_{k,i}\) and \({{{\varvec{w}}}}_{r,s}^T {{{\varvec{A}}}}_0 {{{\varvec{w}}}}_{k,i}\), \(r, k=0,\dots ,K\), \(s, i =1,\dots ,m\), are independent of \({\varvec{\xi }}\); they are successively computed, stored and reused.
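The offline precomputation of these inner products and the \(N\)-independent online evaluation of (26) can be sketched in Python as follows. The code assumes SciPy sparse matrices A_all = [\({\varvec{A}}_0,\dots ,{\varvec{A}}_K\)] and a fixed basis matrix U_M; all names are illustrative, not the paper's implementation.

```python
import numpy as np
import scipy.sparse.linalg as spla

def mb_offline(A_all, U_M, b):
    """Precompute the xi-independent pieces of (26): w_{k,i} = A0^{-1} A_k u_i,
    b^T A0^{-1} b, b^T w_{k,i} and w_{r,s}^T A0 w_{k,i}. A_all = [A0, A1, ..., AK]."""
    A0 = A_all[0]
    solve0 = spla.factorized(A0.tocsc())          # factorize A0 once, reuse for all solves
    W = np.stack([np.column_stack([solve0(Ak @ u) for u in U_M.T]) for Ak in A_all])
    A0W = np.stack([A0 @ W[k] for k in range(len(A_all))])
    bb = b @ solve0(b)                            # b^T A0^{-1} b
    bw = np.einsum('n,knm->km', b, W)             # b^T w_{k,i}
    wAw = np.einsum('rns,knm->rskm', W, A0W)      # w_{r,s}^T A0 w_{k,i}
    return bb, bw, wAw

def mb_online(bb, bw, wAw, xi, u_rb):
    """Evaluate E^MB(xi) by formula (26); the cost is independent of N."""
    c = np.concatenate(([1.0], xi))[:, None] * u_rb[None, :]   # c_{k,i} = xi_k (u^RB)_i, xi_0 = 1
    return bb - 2.0 * np.sum(c * bw) + np.einsum('rs,rskm,km->', c, wAw, c)
```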

GRB-MB algorithm (refinement of the GRB algorithm)

Input, Output: As in the GRB algorithm.

  1.

    As in the GRB algorithm, plus:

    Set \({{\varvec{w}}}_{0,1}={{{\varvec{u}}}}_1\) and \((\varvec{\xi }_j)_0=1, \ j=1, \dots , J\) (in order to simplify the notation later).

    Set \({{\varvec{w}}}_{k,1}=({{{\varvec{A}}}}_0)^{-1}{{{\varvec{A}}}}_k{\varvec{u}}_1\), \(k=1,\dots ,K\), and compute and store the inner products

    \({{{\varvec{b}}}}^T {{{\varvec{w}}}}_{k,1}\) and \({{{\varvec{w}}}}_{k,1}^T {{{\varvec{A}}}}_0 {{{\varvec{w}}}}_{r,1}\), \(r, k=0,\dots ,K\).

  2.

    As in the GRB algorithm. The estimate \(E_{m,j} \equiv E^{\mathrm{MB}}_{m,j}\) is computed using (26).

  3.

    As in the GRB algorithm, plus whenever a new column \({{{\varvec{u}}}}_{m+1}\) is added to \({{{\varvec{U}}}}_m\) in the GRB algorithm, compute vectors \({{{\varvec{w}}}}_{k,m+1}=({{{\varvec{A}}}}_0)^{-1}{{{\varvec{A}}}}_k{\varvec{u}}_{m+1}\), \(k=0,\dots ,K\), and use them for the computation of the inner products \({{{\varvec{b}}}}^T {{{\varvec{w}}}}_{k,m+1}\) and \({{{\varvec{w}}}}_{k,m+1}^T {{{\varvec{A}}}}_0 {{{\varvec{w}}}}_{r,s} = {{{\varvec{w}}}}_{k,m+1}^T {{{\varvec{A}}}}_r{\varvec{u}}_{s}\), \(r, k=0,\dots ,K\), \(s=1,\dots ,m+1\). Store these inner products.

  4.

    As in the GRB algorithm.

5.3 GRB-ML algorithm

The estimate \(E^{\mathrm{ML}}_{m,j} := E^{\mathrm{ML}}_{m}({{\varvec{\xi }}}_j)\) of (24) of the squared energy norm (10) is used.

The residuals \({\varvec{r}}_M^{\mathrm{ML,C}}(\varvec{\xi })\) and \({\varvec{r}}_M^{\mathrm{ML,F}}(\varvec{\xi })\) in (24) are the coarse and fine blocks, respectively, of the residual \({\varvec{r}}_M(\varvec{\xi })\) from (10) expressed in the hierarchical basis; they are obtained using \({{\varvec{P}}}\) from (19) as \({\varvec{r}}_M^{\mathrm{ML}}(\varvec{\xi }) = {\varvec{P}}^T {\varvec{r}}_M(\varvec{\xi })\).

The matrices \({{\varvec{A}}^{\mathrm{ML,C}}(\varvec{\xi })}\) and \({\varvec{D}}^{\mathrm{ML,F}}(\varvec{\xi })\) in (24) come from the hierarchical splitting (20) applied to equation (5), which leads to the transformed matrix

$$\begin{aligned} {\varvec{A}}^{\mathrm{ML}}(\varvec{\xi })= {\varvec{P}}^T{\varvec{A}}(\varvec{\xi })\,{\varvec{P}} = \left( \begin{array}{cc}{\varvec{A}}^{\mathrm{ML,C}}(\varvec{\xi }) &{} {\varvec{A}}^{\mathrm{ML,CF}}(\varvec{\xi })\\ {{\varvec{A}}^{\mathrm{ML,CF}}}^T(\varvec{\xi }) &{} {\varvec{A}}^{\mathrm{ML,F}}(\varvec{\xi })\end{array}\right) , \end{aligned}$$

with \({\varvec{D}}^{\mathrm{ML,F}}(\varvec{\xi })\) being the diagonal of \({\varvec{A}}^{\mathrm{ML,F}}(\varvec{\xi })\). From the affine form (2) of the coefficient we have

$$\begin{aligned} {{\varvec{A}}^{\mathrm{ML,C}}(\varvec{\xi })}= & {} {{{\varvec{A}}}}_0^{\mathrm{ML,C}}+\sum _{k=1}^K (\varvec{\xi })_k{{{\varvec{A}}}}_k^{\mathrm{ML,C}}, \nonumber \\ {{\varvec{D}}^{\mathrm{ML,F}}(\varvec{\xi })}= & {} {{{\varvec{D}}}}_0^{\mathrm{{ML,F}}}+\sum _{k=1}^K (\varvec{\xi })_k{{{\varvec{D}}}}_k^{\mathrm{{ML,F}}}. \end{aligned}$$
(27)

Alternatively, all vectors and matrices can be considered with respect to the hierarchical basis during the whole algorithm. Then, no transformation with \({{\varvec{P}}}\) is needed.

GRB-ML algorithm (refinement of the GRB algorithm)

Input, Output: As in the GRB algorithm.

  1.

    As in the GRB algorithm, plus:

    Choose the hierarchy and prepare the transformation \({{{\varvec{P}}}}\) from (19).

    Assemble and store the matrices \({{{\varvec{A}}}}^{\mathrm{{ML,C}}}_k\), \(k=0, \dots K\), from (27).

  2.

    As in the GRB algorithm. The estimate \(E_{m,j} \equiv E^{\mathrm{ML}}_{m,j}\) of (24) is computed in three steps:

    2.1

      Compute the residual of (5) as \({{{\varvec{r}}}} = {{{\varvec{b}}}}-\sum _{k=0}^K (\varvec{\xi }_j)_k \left( {{{\varvec{A}}}}_k {{{\varvec{U}}}}_m\right) {{{\varvec{u}}}}_{m,j}\).

      Transform and split it to coarse and fine residuals as

      $$\begin{aligned} {{{\varvec{r}}}}^{\mathrm{ML}} = {\varvec{P}}^T {{{\varvec{r}}}} = \left( \begin{array}{c} {{{\varvec{r}}}}^{\mathrm{ML,C}}\\ {{{\varvec{r}}}}^{\mathrm{ML,F}} \end{array}\right) . \end{aligned}$$
    2.2

      Assemble the matrices \({{{\varvec{A}}}}^{\mathrm{ML,C}}({{\varvec{\xi }}}_j)\) and \({\varvec{D}}^{\mathrm{ML,F}}({{\varvec{\xi }}}_j)\) using (27).

      Solve the coarse problem \({{{\varvec{A}}}}^{\mathrm{ML,C}}({{\varvec{\xi }}}_j) \,{{{\varvec{e}}}}^{\mathrm{C}} = {{{\varvec{r}}}}^{\mathrm{ML,C}}\).

      Compute \({{{\varvec{e}}}}^{\mathrm{F}}=({\varvec{D}}^{\mathrm{ML,F}}({{\varvec{\xi }}}_j))^{-1} {{{\varvec{r}}}}^{\mathrm{ML,F}}\).

    2.3

      Compute the estimate as \(E^{\mathrm{ML}}_{m,j}={{{\varvec{r}}}}^T {{{\varvec{P}}}}\left( \begin{array}{c} {{{\varvec{e}}}}^{\mathrm{C}}\\ {{{\varvec{e}}}}^{\mathrm{F}} \end{array}\right) = ({{{\varvec{r}}}}^{\mathrm{ML}})^T\left( \begin{array}{c} {{{\varvec{e}}}}^{\mathrm{C}}\\ {{{\varvec{e}}}}^{\mathrm{F}} \end{array}\right)\).

  3., 4.

    As in the GRB algorithm.
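For completeness, a Python sketch of step 2 of the GRB-ML algorithm is given below. It assumes that the coarse blocks \({\varvec{A}}_k^{\mathrm{ML,C}}\) and the fine-block diagonals \({\varvec{D}}_k^{\mathrm{ML,F}}\) of (27) have been precomputed as SciPy sparse matrices and NumPy vectors, respectively; all names are ours and serve only as an illustration of steps 2.1–2.3.

```python
import numpy as np
import scipy.sparse.linalg as spla

def grb_ml_estimate(A_all, b, U_m, u_rb, xi, P, A_C_all, d_F_all):
    """Steps 2.1-2.3 of the GRB-ML algorithm: A_C_all holds the coarse blocks
    A_k^{ML,C} and d_F_all the fine-block diagonals D_k^{ML,F} of (27),
    precomputed once in the hierarchical basis given by P."""
    u_prol = U_m @ u_rb                               # prolonged RB solution U_m u^RB
    xi1 = np.concatenate(([1.0], xi))                 # convention xi_0 = 1
    # step 2.1: residual of (5) and its transform to the hierarchical basis
    r = b - sum(x * (Ak @ u_prol) for x, Ak in zip(xi1, A_all))
    r_ml = P.T @ r
    nC = A_C_all[0].shape[0]
    rC, rF = r_ml[:nC], r_ml[nC:]
    # step 2.2: assemble the coarse matrix and fine diagonal via (27), then solve
    A_C = sum(x * AC for x, AC in zip(xi1, A_C_all))
    d_F = sum(x * dF for x, dF in zip(xi1, d_F_all))
    eC = spla.spsolve(A_C.tocsc(), rC)
    eF = rF / d_F
    # step 2.3: the central multilevel estimate (24)
    return rC @ eC + rF @ eF
```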

5.4 Computational and storage cost

Both the GRB-MB and GRB-ML algorithms have the same basic structure as the GRB algorithm, so we discuss just the differences between the algorithms: the additional memory and computational cost. In evaluating the computational cost, we concentrate on the solution of linear systems only, which represents the main part of the cost.

GRB-MB algorithm

Additional memory needed: about \(\frac{3}{2}KM + \frac{1}{2}(KM)^2\) floating point numbers for \({{{\varvec{b}}}}^T {{{\varvec{w}}}}_{k,i}\) and \({{{\varvec{w}}}}_{r,s}^T {{{\varvec{A}}}}_0 {{{\varvec{w}}}}_{k,i}\), \(r, k=0,\dots ,K\), \(s, i =1,\dots ,M\).

For each vector \({{{\varvec{u}}}}_m\), \(m=1,\dots ,M\), of the reduced basis, the computation of vectors \({{{\varvec{w}}}}_{k,m}\) represents the solution of K systems of the size N with the (sparse) matrix \({{{\varvec{A}}}}_0\).

GRB-ML algorithm

Additional memory needed: about \((K+1)\cdot N_{\mathrm{C}}\) floating point numbers for \({{{\varvec{A}}}}^{\mathrm{{ML,C}}}_k\), \(k=0,1,\dots ,K\), where \(N_{\mathrm{C}}\) is the size of the coarse basis (matrices \({{{\varvec{A}}}}^{\mathrm{{ML,C}}}_k\) are sparse).

For each vector \({{{\varvec{u}}}}_m\), \(m=1,\dots ,M\), of the reduced basis: for each \({{\varvec{\xi }}}_j\) in the (actual) training set, the computation of the coarse part of the error \({{{\varvec{e}}}}^{\mathrm{C}}\) represents the solution of the system of the size \(N_C\) with the (sparse) matrix \({{{\varvec{A}}}}^{\mathrm{{ML,C}}}({{\varvec{\xi }}}_j)\).

6 Numerical experiments

In our experiments, we consider \(D=(0,1)\times (0,1)\) with a uniform discretization grid and piece-wise bilinear FE functions. We consider two benchmark problems denoted by (P1) and (P2) motivated by [18]. In problem (P1), the coefficient function (2) is defined as

$$\begin{aligned} a_0({\varvec{x}})= & {} 1,\quad a_k({\varvec{x}})=\frac{1}{2(m_{k,1}^2+m_{k,2}^2)} \sin (m_{k,1}\pi ({\varvec{x}})_1)\nonumber \\&\sin (m_{k,2}\pi ({\varvec{x}})_2),\quad k=1,\dots ,K, \end{aligned}$$
(28)

where \((m_{k,1}, m_{k,2})\in {\mathbb {N}}^2\) is the k-th pair in the sequence of pairs of positive integers ordered according to their Euclidean norm. Thus, the first seven pairs are (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (1, 3), (2, 3). The data of problem (P2) mimic random porosity of the medium. The domain D is uniformly divided into \(m\times m\) rectangles, \(m=2,3,4,\dots\). For a given m, \(K=m^2\), and the coefficient function (2) is defined as

$$\begin{aligned} a_0({\varvec{x}})=1,\quad a_k({\varvec{x}})=c\,\chi _{k},\quad k=1,\dots ,K, \end{aligned}$$
(29)

where \(\chi _{k}\) is the characteristic function associated with the k-th rectangle. We choose \(c=0.5\) in Example 6.4 and \(c=0.95\) in Examples 6.1 and 6.6. In all examples, we use \(\Gamma =\langle -1,1\rangle ^K\) and a training set generated randomly using a pseudo-random number generator. In all experiments, we present the results for one particular choice of the random training set, but almost identical results are obtained for other choices. The probability density \(\rho (\varvec{\xi })\) is constant in our experiments and equals \(\frac{1}{\vert \Gamma \vert }\).

By an error we mean the deviation of some approximate discretized solution from the exact discretized solution. In our experiments, we compare the true energy norm of the error and its two estimates: the mean based (MB) estimate and the multilevel (ML) estimate. In addition, we compare the memory consumption and, instead of the solution time, which strongly depends on the implementation, the numbers of steps of the conjugate gradient (CG) method performed during the GRB-MB or GRB-ML computation. The CG method is used for solving any problem of size N as well as the coarse problems; all of them have sparse matrices. We count the numbers of CG steps in three different stages: in the MB estimate, where the auxiliary vectors \({\varvec{w}}_{k,i}\), \(k=0,\dots ,K\), \(i=1,\dots ,M\), are computed (cf. Sect. 5.2); in the ML estimates, where the coarse problems with matrices \({\varvec{A}}^{\mathrm{ML,C}}(\varvec{\xi })\) are solved (cf. Sect. 5.3); and in the common part of the GRB algorithm, where the new vectors \({\varvec{u}}^m\) of the reduced basis are computed (cf. Sect. 5.1). Since the coarse systems solved for the ML estimates are smaller than the original problems (5), we use a scaling factor to get a realistic comparison of the total work of the MB and ML methods. The scaling factor corresponds to the coarsening ratio, which is \(\frac{N_C}{N}\approx \frac{1}{4}\) in our examples.

All experiments were performed in MATLAB [29] on a personal computer. The threshold for the squared energy norm of the error in all experiments is \(\tau =10^{-6}\). The CG iteration is stopped when the relative residual decreases to \(10^{-8}\). The quality of the resulting reduced basis after finishing the GRB algorithm is measured by the averaged squared norm of the exact error

$$\begin{aligned} E^{\mathrm{av}}_M=\frac{1}{\vert T_J\vert }\sum _{\varvec{\xi }_j\in T_J}E_M(\varvec{\xi }_j) \approx \int _\Gamma E_M(\varvec{\xi })\rho (\varvec{\xi })\,{\mathrm{d}}\varvec{\xi }, \end{aligned}$$
(30)

which we compute for the GRB-MB and GRB-ML methods separately. For these respective methods we also evaluate and compare the central estimates of \(E^{\mathrm{av}}_M\),

$$\begin{aligned} E^{\mathrm{av,MB}}_M= & {} \frac{1}{\vert T_J\vert } \sum _{\varvec{\xi }_j\in T_J}E^{\mathrm{MB}}_M(\varvec{\xi }_j)\nonumber \\= & {} \frac{1}{\vert T_J\vert } \sum _{\varvec{\xi }_j\in T_J}{\varvec{r}}_M(\varvec{\xi }_j)^T{\varvec{A}}_0^{-1} {\varvec{r}}_M(\varvec{\xi }_j) \end{aligned}$$
(31)
$$\begin{aligned} E^{\mathrm{av,ML}}_M= & {} \frac{1}{\vert T_J\vert } \sum _{\varvec{\xi }_j\in T_J}E^{\mathrm{ML}}_M(\varvec{\xi }_j)\nonumber \\= & {} \frac{1}{\vert T_J\vert } \sum _{\varvec{\xi }_j\in T_J}\left( {\varvec{r}}_M^{\mathrm{ML,C}}(\varvec{\xi }_j)^T {{\varvec{A}}^{\mathrm{ML,C}}(\varvec{\xi }_j)}^{-1}{\varvec{r}}_M^{\mathrm{ML,C}}(\varvec{\xi }_j)\right. \nonumber \\&\left. + {\varvec{r}}_M^{\mathrm{ML,F}}(\varvec{\xi }_j)^T {{\varvec{D}}^{\mathrm{ML,F}}(\varvec{\xi }_j)}^{-1} {\varvec{r}}_M^{\mathrm{ML,F}}(\varvec{\xi }_j)\right) , \end{aligned}$$
(32)

using (18) and (24), respectively. The maximal squared energy norm of the exact error over \(T_J\) is denoted by \(E^{\mathrm{max}}_M\), and the maximal MB and ML estimates are denoted by \(E^{\mathrm{max,MB}}_M\) and \(E^{\mathrm{max,ML}}_M\), respectively. In addition to this, we compare the guaranteed averaged lower and upper bounds to the squared exact errors. For the MB estimates, the bounds are obtained from (18) where either the uniform constants \({{\bar{\alpha }}}_1^{\mathrm{disc}}\le {{\bar{\alpha }}}_2^{\mathrm{disc}}\) are used yielding

$$\begin{aligned} L_M^{\mathrm{av,MB}}= & {} \frac{1}{{{\bar{\alpha }}}_2^{\mathrm{disc}}\vert T_J\vert } \sum _{\varvec{\xi }_j\in T_J} {\varvec{r}}_M(\varvec{\xi }_j)^T{\varvec{A}}_0^{-1} {\varvec{r}}_M(\varvec{\xi }_j)\\ U_M^{\mathrm{av,MB}}= & {} \frac{1}{{{\bar{\alpha }}}_1^{\mathrm{disc}}\vert T_J\vert } \sum _{\varvec{\xi }_j\in T_J} {\varvec{r}}_M(\varvec{\xi }_j)^T{\varvec{A}}_0^{-1} {\varvec{r}}_M(\varvec{\xi }_j), \end{aligned}$$

or the non-uniform constants \(\alpha _1^{\mathrm{disc}}(\varvec{\xi })\le \alpha _2^{\mathrm{disc}}(\varvec{\xi })\) are used yielding more accurate bounds

$$\begin{aligned} l_M^{\mathrm{av,MB}}= & {} \frac{1}{\vert T_J\vert } \sum _{\varvec{\xi }_j\in T_J}\frac{1}{\alpha _2^{\mathrm{disc}}(\varvec{\xi }_j)}\; {\varvec{r}}_M(\varvec{\xi }_j)^T{\varvec{A}}_0^{-1} {\varvec{r}}_M(\varvec{\xi }_j)\\ u_M^{\mathrm{av,MB}}= & {} \frac{1}{\vert T_J\vert } \sum _{\varvec{\xi }_j\in T_J}\frac{1}{\alpha _1^{\mathrm{disc}}(\varvec{\xi }_j)}\; {\varvec{r}}_M(\varvec{\xi }_j)^T{\varvec{A}}_0^{-1} {\varvec{r}}_M(\varvec{\xi }_j), \end{aligned}$$

however, evaluating \(\alpha _1^{\mathrm{disc}}(\varvec{\xi })\) and \(\alpha _2^{\mathrm{disc}}(\varvec{\xi })\) for every training sample may noticeably slow down the computation, unless they are precomputed for all \(\varvec{\xi }\in T_J\). For the ML estimates, the bounds

$$\begin{aligned} L_M^{\mathrm{av,ML}}= & {} \frac{1}{\vert T_J\vert } \sum _{\varvec{\xi }_j\in T_J}E_M^{\mathrm{ML,1}}(\varvec{\xi }_j), \\ U_M^{\mathrm{av,ML}}= & {} \frac{1}{\vert T_J\vert } \sum _{\varvec{\xi }_j\in T_J} E_M^{\mathrm{ML,2}}(\varvec{\xi }_j) \end{aligned}$$

are obtained from (23) using \(\beta _1=\frac{1}{4}\), \(\beta _2=\frac{7}{4}\), and \(\gamma =0.641\) in problem (P1) and \(\gamma =\sqrt{\frac{3}{8}}\) in problem (P2).
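For illustration, a short sketch of how the averaged MB quantities \(E^{\mathrm{av,MB}}_M\), \(L^{\mathrm{av,MB}}_M\) and \(U^{\mathrm{av,MB}}_M\) could be accumulated over the training set is given below; the helper `residual` and the factorization-based evaluation of \({\varvec{A}}_0^{-1}{\varvec{r}}\) are assumptions made only for this sketch (in the experiments the dual norms are obtained via the auxiliary quantities of Sect. 5.2).

```python
import numpy as np
import scipy.sparse.linalg as spla

def averaged_mb_bounds(samples, residual, A0, alpha1_bar, alpha2_bar):
    """Accumulate E^{av,MB}_M, L^{av,MB}_M and U^{av,MB}_M over the training set T_J.

    samples    -- iterable of parameter samples xi_j in T_J
    residual   -- hypothetical helper: xi -> r_M(xi), residual of (5) for the current basis
    A0         -- sparse mean matrix A_0
    alpha1_bar, alpha2_bar -- uniform constants bar(alpha)_1^disc <= bar(alpha)_2^disc
    """
    solve_A0 = spla.factorized(A0.tocsc())        # reuse one factorization of A_0
    dual_norms = [r @ solve_A0(r)                 # r^T A_0^{-1} r for each sample
                  for r in map(residual, samples)]
    e_av = np.mean(dual_norms)                    # central estimate (31)
    return e_av, e_av / alpha2_bar, e_av / alpha1_bar   # E^{av,MB}, L^{av,MB}, U^{av,MB}
```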

Example 6.1

In this example, we illustrate the accuracy of the MB and ML a posteriori error estimates in the GRB method. We consider \(N=(2^5-1)^2\), problem (P1) with \(K=6\), and problem (P2) with \(K=4\) and \(c=0.95\) in (29). We generate 200 random samples and use three levels containing 30, 60 and 110 random samples. Figures 1 (problem (P1)) and 2 (problem (P2)) show the decay of the averaged squared energy norms of the errors and of their estimates. On the horizontal axis, there are the sizes M of the individual reduced bases constructed by the GRB algorithm using the MB or ML a posteriori error estimates. On the vertical axis, there are the exact squared energy norms of the errors averaged over \(\varvec{\xi }\in T_J\), \(E_M^{\mathrm{av}}\) (30) (solid lines with crosses), together with their estimates and guaranteed lower and upper bounds. In both plots, the solid lines without markers denote the central error estimates \(E_M^{\mathrm{av,MB}}\) and \(E_M^{\mathrm{av,ML}}\) defined by (31) and (32), respectively. In the left plot, the solid lines with dots denote the averaged guaranteed lower and upper bounds \(L^{\mathrm{av,MB}}_M\) and \(U^{\mathrm{av,MB}}_M\), and the dashed lines denote the guaranteed lower and upper bounds \(l^{\mathrm{av,MB}}_M\) and \(u^{\mathrm{av,MB}}_M\). In the right plot, the solid lines with dots denote the guaranteed lower and upper bounds \(L^{\mathrm{av,ML}}_M\) and \(U^{\mathrm{av,ML}}_M\). The bottom pairs of plots in Figs. 1 and 2 show the ratios of the computed quantities to the exact error \(E_M^{\mathrm{av}}\). We can see that in these examples, the MB and ML estimates yield the same final dimensions of the reduced bases and almost the same decay rates of the errors. In both problems, \(l^{\mathrm{av,MB}}_M\) and \(u^{\mathrm{av,MB}}_M\) provide the tightest guaranteed two-sided bounds; however, the constants \(\alpha _1^{\mathrm{disc}}(\varvec{\xi })\) and \(\alpha _2^{\mathrm{disc}}(\varvec{\xi })\) are costly to obtain. In problem (P1), the guaranteed MB bounds obtained for the uniform constants \({{\bar{\alpha }}}_1^{\mathrm{disc}}=0.4875\) and \({{\bar{\alpha }}}_2^{\mathrm{disc}}=1.5125\) are more accurate than the guaranteed ML bounds. In problem (P2), the guaranteed ML bounds are more accurate than the guaranteed MB bounds obtained for the uniform constants \({{\bar{\alpha }}}_1^{\mathrm{disc}}=0.05\) and \({{\bar{\alpha }}}_2^{\mathrm{disc}}=1.95\). The central ML estimates \(E_M^{\mathrm{av,ML}}\) are slightly more accurate than the central MB estimates \(E_M^{\mathrm{av,MB}}\) in problem (P2).

Fig. 1 Averaged exact errors (solid lines with crosses), their estimates (plain solid lines) and guaranteed lower and upper bounds (solid lines with dots for MB and ML; dashed lines for MB) of Example 6.1, problem (P1): MB (left) and ML (right)

Fig. 2 Averaged exact errors (solid lines with crosses), their estimates (plain solid lines) and guaranteed lower and upper bounds (solid lines with dots for MB and ML; dashed lines for MB) of Example 6.1, problem (P2): MB (left) and ML (right)

Example 6.2

We consider problem (P1), \(N=127^2=(2^7-1)^2\), and \(K=1,3,6,12\). We generate 200 random samples divided into three levels containing 30, 60 and 110 random samples. The numerical results for the GRB-MB and GRB-ML methods are presented in Table 1. We present the dimensions M of the resulting reduced bases, the averaged exact errors \(E_M^{\mathrm{av}}\), and the estimates \(E_M^{\mathrm{av,MB}}\) and \(E_M^{\mathrm{av,ML}}\). For the GRB-MB method we present two lower and two upper averaged guaranteed error bounds \(L^{\mathrm{av,MB}}_M\), \(l^{\mathrm{av,MB}}_M\), \(U^{\mathrm{av,MB}}_M\), and \(u^{\mathrm{av,MB}}_M\). For the GRB-ML method, we present \(L^{\mathrm{av,ML}}_M\) and \(U^{\mathrm{av,ML}}_M\). The last two columns contain the maximal achieved error over \(\varvec{\xi }\in T_J\), denoted by \(E^{\mathrm{max}}_M\), and the maximal MB or ML estimates, denoted by \(E^{\mathrm{max,MB}}_M\) and \(E^{\mathrm{max,ML}}_M\), respectively. The MB bounds \(l^{\mathrm{av,MB}}_M\) and \(u^{\mathrm{av,MB}}_M\) are very accurate. The uniform averaged guaranteed bounds for ML and MB seem comparable. In Table 2, we present how many 8-byte numbers need to be allocated (mem.) and how many (rescaled) CG steps are performed (CG) for the MB and ML methods and in the common part of the GRB algorithm (comm.). We can see that the memory allocated for the auxiliary arrays in the MB method is smaller than that allocated for the coarse matrices \({\varvec{A}}^{\mathrm{ML,C}}_k\), \(k=0,1,\dots ,K\), of the ML method when K is relatively small; for larger K, the ML method may need less memory than the MB method.

Table 1 Errors and their estimates in Example 6.2 for problem (P1) for \(N=(2^7-1)^2=16129\) and 200 random samples with 3 levels
Table 2 Memory and CG steps in Example 6.2 for problem (P1) for \(N=(2^7-1)^2=16129\) and 200 random samples with 3 levels

Example 6.3

We consider problem (P1), \(K=6\), \(N=(2^{\mathrm{exp}}-1)^2\), \(\mathrm{exp}=5,6,7,8\). We generate 200 random samples and use three levels containing 30, 60 and 110 random samples. In Tables 3 and 4, we present results analogous to those of Example 6.2 in Tables 1 and 2. The reduced bases obtained with the MB and ML error estimates are of almost the same size and quality.

Table 3 Errors and their estimates in Example 6.3 for problem (P1) for \(K=6\) and 200 random samples with 3 levels for \(N=(2^{\mathrm{exp}}-1)^2\)
Table 4 Memory and CG steps in Example 6.3 for problem (P1) for \(K=6\) and 200 random samples with 3 levels for \(N=(2^{\mathrm{exp}}-1)^2\)

Example 6.4

In this example, we compare enriching the training set during the offline phase with considering the whole training set from the beginning of the GRB method. We solve problem (P2) with \(N=14161\), \(K=9\), and \(c=0.5\) in (29). The training set \(T_J\) contains \(J=400\) samples. The samples are either all considered and searched from the beginning of the GRB algorithm, or there are two or five subsets (levels) of samples which are gradually included: the first subset is considered from the beginning and the next ones are added whenever the rest of the current training set contains fewer than seven elements. If we use two levels, they contain 100 and 300 samples, respectively; if we use five levels, they contain 30, 60, 60, 100 and 150 samples, respectively. In Tables 5 and 6, we can see that the gradually enriched training sets significantly reduce the number of CG steps of the GRB-ML algorithm. However, the dimension of the resulting reduced basis grows slightly. The other outputs and parameters of the GRB-MB and GRB-ML methods remain comparable.

Table 5 Errors and their estimates in Example 6.4 for problem (P2), \(K=9\), \(N=14161\), GRB method with Monte Carlo set of 400 samples with 1, 2, or 5 levels of adaptivity
Table 6 Memory and CG steps in Example 6.4 for problem (P2), \(K=9\), \(N=14161\), GRB method with Monte Carlo set of 400 samples with 1, 2, or 5 levels of adaptivity

Example 6.5

An example of the error function and its MB and ML estimates can be found in Fig. 3 for problem (P1) with \(K=3\), \(N=29^2\), and with the factor 1/2 omitted from \(a_1,a_2,a_3\). For a reduced basis with \(M=15\) elements obtained by the GRB-MB method and for the random sample \(\varvec{\xi }=(-0.9,0.5,0.7)\), we obtain the residual \({\varvec{r}}= {\varvec{b}}-{\varvec{A}}(\varvec{\xi }){\varvec{U}}_{15}{\varvec{u}}^{\mathrm{RB}}_{15}(\varvec{\xi })\). We compare the exact error vector \({\varvec{e}}={\varvec{A}}(\varvec{\xi })^{-1}{\varvec{r}}\), the MB approximate error vector \({\varvec{e}}^{\mathrm{MB}}={\varvec{A}}_0^{-1}{\varvec{r}}\) and the ML approximate error vector

$$\begin{aligned} {\varvec{P}}{\varvec{e}}^{\mathrm{ML}}={\varvec{P}} \left( \begin{array}{cc}{\varvec{A}}^{\mathrm{ML,C}}(\varvec{\xi })&{}0\\ 0&{}{\varvec{D}}^{\mathrm{ML,F}}(\varvec{\xi })\end{array}\right) ^{-1}{\varvec{P}}^T{\varvec{r}} \end{aligned}$$

reshaped into two-dimensional arrays: \({\varvec{e}}\) (left), the difference \({\varvec{e}}-{\varvec{e}}^{\mathrm{MB}}\) (middle), and \({\varvec{e}}-{\varvec{Pe}}^{\mathrm{ML}}\) (right). In general, as the norm of \(\varvec{\xi }\) grows, the difference between the two estimates becomes more pronounced and the ML estimate is more accurate.

Fig. 3 Exact error (left) of a solution of Example 6.5 for \(M=15\) and \(\varvec{\xi }=(-0.9,0.5,0.7)\); the difference between the exact error function and the MB estimate (middle) and between the exact error function and the ML estimate (right)

Example 6.6

In this example, we study the localization of the estimates of the error functions obtained by the MB and ML methods. We consider problem (P2) with \(N=59^2\), \(K=9\) and \(c=0.95\) in (29). We examine \(M=10,20,30\), and 40, and consider the set \(T^{\mathrm{test}}\) of 300 random test samples, \(T^{\mathrm{test}}\subset \Gamma\), \(T^{\mathrm{test}}\ne T_J\). For all \(\varvec{\xi }\in T^{\mathrm{test}}\) we compute the solutions \({\varvec{u}}_M^{\mathrm{RB}}(\varvec{\xi })\) of (7), and evaluate the exact errors of (5), \({\varvec{e}}_M(\varvec{\xi })={\varvec{A}}(\varvec{\xi })^{-1}{\varvec{b}}-{\varvec{U}}_M{\varvec{u}}_M^{\mathrm{RB}}(\varvec{\xi })\), as well as the approximate errors obtained either by the MB or by the ML estimates, \({\varvec{e}}^{\mathrm{MB}}_M(\varvec{\xi })={\varvec{A}}_0^{-1}({\varvec{b}}- {\varvec{A}}(\varvec{\xi }){\varvec{U}}_M{\varvec{u}}_M^{\mathrm{RB}}(\varvec{\xi }))\) and

$$\begin{aligned} {\varvec{P}}{\varvec{e}}^{\mathrm{ML}}_M(\varvec{\xi })={\varvec{P}}\left( \begin{array}{cc}{\varvec{A}}^{\mathrm{ML,C}}(\varvec{\xi })&{}0\\ 0&{}{\varvec{D}}^{\mathrm{ML,F}}(\varvec{\xi })\end{array}\right) ^{-1}{\varvec{P}}^T({\varvec{b}}-{\varvec{A}}(\varvec{\xi }){\varvec{U}}_M{\varvec{u}}_M^{\mathrm{RB}}(\varvec{\xi })), \end{aligned}$$
(33)

respectively. The averages over \(\varvec{\xi }\in T^{\mathrm{test}}\) and the standard deviations of the differences of the nodal values \(({\varvec{e}}_M(\varvec{\xi }))_{p}-({\varvec{e}}^{\mathrm{MB}}_M(\varvec{\xi }))_p\) are displayed in Table 7. The same quantities are computed for \(({\varvec{e}}_M(\varvec{\xi }))_{p}-({\varvec{P}}{\varvec{e}}^{\mathrm{ML}}_M(\varvec{\xi }))_p\). We chose \(p=1741=(59^2+1)/2\), which is the number of the node in the center of D, \(\varvec{\nu }_{p}=(0.5,0.5)\). We can see that for reduced bases of smaller dimensions M, the approximate error functions obtained by the ML method are better localized (the standard deviation is smaller) than the estimates obtained from the MB method. For richer reduced bases (larger M), the methods provide comparable results.

Table 7 Differences between the estimated and true errors in Example 6.6: the means (aver) and the standard deviations (SD) of the differences in the central node of D
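A small sketch of how the entries of Table 7 could be produced is given below; the array layout (one error vector per row, for all test samples) is an assumption made only for this illustration.

```python
import numpy as np

def nodal_difference_stats(errors_exact, errors_estimated, p):
    """Mean (aver) and standard deviation (SD) over the test set of the nodal
    differences (e_M(xi))_p - (estimate)_p at a chosen node p; both inputs are
    assumed to be arrays of shape (len(T_test), N) with one error vector per row."""
    d = errors_exact[:, p] - errors_estimated[:, p]
    return d.mean(), d.std()

# Central node of the 59 x 59 grid, nu_p = (0.5, 0.5); 0-based index of p = 1741.
p_center = (59**2 + 1) // 2 - 1
```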

Several observations can be made from our experiments:

  1. Both the MB and ML error estimates provide guaranteed bounds to the exact energy norm of the error of approximate solutions. When using the uniform bounds \({{\bar{\alpha }}}_1^{\mathrm{disc}}\) and \({{\bar{\alpha }}}_2^{\mathrm{disc}}\) in the MB method (cf. Sect. 4.1), the MB and ML methods yield comparable bounds. When using the individually computed \(\alpha _1^{\mathrm{disc}}(\varvec{\xi })\) and \(\alpha _2^{\mathrm{disc}}(\varvec{\xi })\) for each \(\varvec{\xi }\in \Gamma\), the MB bounds are more accurate, but the computation may be time consuming. Moreover, when the function \(a_0({\varvec{x}})\) does not correspond to the mean of \(a({\varvec{x}},\varvec{\xi })\) in (2), or if the variation of \(a({\varvec{x}},\varvec{\xi })\) is large, the MB guaranteed lower and upper error bounds may become useless. The ML estimate provides an error function which locally corresponds better to the exact error function on D, see Examples 6.2 and 6.5.

  2. We encountered the numerical instability of evaluating formula (26) reported in [11, 14, 15, 23], even after reordering the terms to avoid summing numbers of very different magnitudes. Therefore, for large K and M we stored and worked with the matrices \({\varvec{A}}_0^{-1}{\varvec{A}}_k{\varvec{U}}_m\), \(k=1,\dots ,K\), \(m=1,\dots ,M\), instead of the scheme described in Sect. 5.2 (a sketch follows this list); cf. [17, Section 3.5.2].

  3. In all numerical experiments we observed the exponential decay of the error with respect to the size of the reduced basis, cf. [10, 18].
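For observation 2, a minimal sketch of the alternative evaluation is given below; the names and the data layout are assumptions, and the dense blocks \({\varvec{A}}_0^{-1}{\varvec{A}}_k{\varvec{U}}_M\) together with the vector \({\varvec{A}}_0^{-1}{\varvec{b}}\) are assumed to be precomputed, so that the MB error vector and estimate follow from the affine expansion without expanding formula (26).

```python
import numpy as np

def mb_estimate_stable(xi, u_rb, U_M, b, A_list, W_list, w0):
    """Evaluate E^{MB} = r^T A_0^{-1} r via stored dense blocks W_k = A_0^{-1} A_k U_M.

    xi     -- parameter sample; the k=0 coefficient is taken as 1
    u_rb   -- reduced coordinates of the RB solution
    U_M    -- N x M matrix of reduced-basis vectors
    b      -- right-hand side
    A_list -- [A_0, ..., A_K], sparse affine terms
    W_list -- [A_0^{-1} A_0 U_M (= U_M), A_0^{-1} A_1 U_M, ..., A_0^{-1} A_K U_M]
    w0     -- precomputed A_0^{-1} b
    """
    coeff = np.concatenate(([1.0], np.asarray(xi)))
    r = b - sum(c * (A @ (U_M @ u_rb)) for c, A in zip(coeff, A_list))
    e_mb = w0 - sum(c * (W @ u_rb) for c, W in zip(coeff, W_list))   # = A_0^{-1} r
    return r @ e_mb
```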

7 Discussion

In this paper, we deal with the GRB algorithm and suggest a new a posteriori error estimate (called ML), which is based on a multilevel splitting of the FE solution space and can serve as an alternative to the estimate (called here MB) widely used in the GRB methods. While the MB estimates are based on the dual norm of the residual scaled by the coercivity constant, the ML estimate is obtained from a slightly modified energy norm of the error. We discuss and compare these two methods and present numerical experiments. For both estimates, guaranteed lower and upper bounds are available. In addition, the MB estimate allows for tight guaranteed bounds which, however, depend on constants that may be difficult to evaluate. On the other hand, the ML method yields error functions which are better localized than those obtained from the MB method, especially if the variance of the data is large.

The number of operations and the achieved accuracy depend on the problem. For a large number of unknowns and a small number of random variables, the MB method seems more efficient. If the number of random variables grows, the methods become comparable in terms of memory and CG steps. There are some modifications which can be applied in the ML method. For example, if the size of the coarse problem is still large, further coarsening of the coarse space can be applied. Gradually enriching the training set, instead of considering all test samples from the start of the GRB algorithm, also improves the relative efficiency of the ML method.

The efficiency of the MB method depends on the affine form of the coefficient function. For the ML method, the coefficient function is not required to be in the affine form; thus, in the case of a log-normal distribution of the data, the ML estimates can be obtained with almost the same effort as for affine data, provided the coefficient function \(a({\varvec{x}},\varvec{\xi })\) is piece-wise constant or mildly varying on subdomains of D. The ML estimates are directly applicable to any kind of elliptic problem since the associated weak forms generate (semi)norms. Errors of solutions of advection-diffusion-reaction problems can be measured using norms associated with the symmetric parts of the related non-symmetric bilinear forms [2, 12]; the ML error estimates can then be used as well. In nonlinear elasticity problems solved, for example, by Newton's method, the ML estimates can be used in every step; moreover, since the system matrices vary only mildly between the Newton steps, bounds to \(\gamma\) can be obtained easily. We believe that in some special problems, e.g. with varying but piecewise constant material data, an appropriate implementation can further improve the efficiency. Our future research will focus on implementing matrix-free computation and matrix-free forms of the obtained formulas, at least for problems with regular grids.