1 Introduction

Recently, stochastic convection–diffusion problems have attracted considerable interest [2, 6, 7, 8, 17, 19, 26]. To obtain efficient numerical discretizations, adaptivity is mandatory. Yet, for these problems, adaptivity in general and a posteriori error estimates in particular are still in their infancy. As a first step towards closing this gap, we consider in this article deterministic non-stationary convection–diffusion equations with a non-linearity of the form \(\nu \varphi (u) g\) modelling the noise (cf. Eq. (2.1) below). They fit neither into the framework of [21, §3] and [25, §6.2], due to the non-linearity, nor into the framework of [1, 12, 23, 24] and [25, §6.6], since the non-linearity lacks differentiability or strong monotonicity. Therefore, in what follows, we carefully adapt the arguments of [21, §3] and [25, §6.2] to capture the interplay of the various parameters controlling the relative size of diffusion, convection, reaction and non-linearity.

The article is organized as follows. In Sect. 2 we present the differential equation and its variational formulation. Section 3 gives the discretization which is a stabilized \(\theta \)-scheme with a possibly explicit treatment of the non-linearity. In Sect. 4 we then derive the a posteriori error estimates (cf. Theorem 4.14).

2 Variational problem

As a deterministic paradigm for stochastic convection–diffusion problems, we consider the following non-stationary non-linear convection–diffusion equations:

$$\begin{aligned} \begin{aligned} \partial _t u - \varepsilon \Delta u + \mathbf {a}\cdot \nabla u + b u&= \nu \varphi (u) g&\text {in } \Omega \times (0,T], \\ u&= 0&\text {on } \Gamma \times (0,T], \\ u(\cdot ,0)&= u_0&\text {in } \Omega . \end{aligned} \end{aligned}$$
(2.1)

Here, \(\Omega \subset \mathbb {R}^d\), \(d \in \{2, 3\}\), is a bounded polyhedral domain with Lipschitz boundary \(\Gamma \). The final time T is arbitrary, but kept fixed in what follows. We assume that the data satisfy the following conditions (compare [21, §3] and [25, §6.2]):

  (A1) \(\varepsilon > 0\), \(\nu \ge 0\),

  (A2) \(g \in L^\infty (\Omega \times (0,T])\), \(\mathbf {a}\in C(0,T;W^{1,\infty }(\Omega )^d)\), \(b \in L^\infty (\Omega \times (0,T])\), \(u_0 \in L^2(\Omega )\),

  (A3) there are two constants \(\beta \ge 0\) and \({{\mathrm{c}}}_b\ge 0\), which do not depend on \(\varepsilon \), such that \(-\frac{1}{2}{{\mathrm{div}}}\mathbf {a}+ b \ge \beta \) in \(\Omega \times (0,T]\) and \(\left||b \right||_{L^{\infty }(\Omega \times (0,T])} \le {{\mathrm{c}}}_b\beta \),

  (A4) the function \(\varphi \), modelling the noise, is Lipschitz continuous, i.e. \(\left|\varphi (s_1) - \varphi (s_2) \right|_{} \le L \left|s_1 - s_2 \right|_{}\) for all \(s_1, s_2 \in \mathbb {R}\).

Examples of functions satisfying assumption (A4) with \(L = 1\) are \(\varphi (s) = 1 + \left|s \right|_{}\) and \(\varphi (s) = \sqrt{1 + s^2}\).
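As a quick numerical sanity check of (A4), the following sketch samples difference quotients of the two example functions on a uniform grid and confirms that they do not exceed \(L = 1\). The helper name `lipschitz_estimate` and the sampling range are illustrative assumptions; a sampled quotient only gives a lower bound for the true Lipschitz constant.

```python
import numpy as np

def lipschitz_estimate(phi, lo=-50.0, hi=50.0, samples=200_001):
    """Largest difference quotient of phi between neighbouring grid points.

    This is only a sampled lower bound for the Lipschitz constant of phi.
    """
    s = np.linspace(lo, hi, samples)
    return float(np.max(np.abs(np.diff(phi(s))) / np.diff(s)))

# The two examples from the text, both with Lipschitz constant L = 1.
phi_abs = lambda s: 1.0 + np.abs(s)           # phi(s) = 1 + |s|
phi_sqrt = lambda s: np.sqrt(1.0 + s * s)     # phi(s) = sqrt(1 + s^2)

est_abs = lipschitz_estimate(phi_abs)
est_sqrt = lipschitz_estimate(phi_sqrt)
```

Both estimates stay below 1 up to rounding, in agreement with the analytic bounds \(\left|\varphi ^\prime (s) \right|_{} \le 1\) wherever the derivative exists.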

We will be particularly interested in the convection-dominated regime \(\varepsilon \ll 1\). At the expense of more technical arguments and additional data oscillations, assumption (A2) can be replaced by slightly weaker conditions concerning the temporal regularity. Assumption (A3) allows us to simultaneously handle the case of a non-vanishing reaction term and that of an absent reaction. If \(b \ne 0\) we may assume without loss of generality that \({{\mathrm{c}}}_b\ge 1\); if \(b = 0\) we set \(\beta = 0\) and \({{\mathrm{c}}}_b= 1\).

We denote by \(L^p(\Omega )\) and \(W^{k,p}(\Omega )\), \(1 \le p \le \infty \), \(k \ge 1\), the standard Lebesgue and Sobolev spaces equipped with their standard norms \(\left||\cdot \right||_{L^{p}(\Omega )}\) and \(\left||\cdot \right||_{W^{k,p}(\Omega )}\) respectively, by \(H^1_0(\Omega )\) the space of all functions in \(W^{1,2}(\Omega )\) with vanishing trace and by \(H^{-1}(\Omega )\) the dual space of \(H^1_0(\Omega )\). The norms of \(H^1_0(\Omega )\) and \(H^{-1}(\Omega )\) depend on the parameters \(\varepsilon \) and \(\beta \) and are specified in (4.1) and (4.2) below. Further, we define a bilinear form \(B : H^1_0(\Omega ) \times H^1_0(\Omega ) \rightarrow \mathbb {R}\) and a non-linear map \(N : H^1_0(\Omega ) \rightarrow H^{-1}(\Omega )\) by setting for all \(u, v \in H^1_0(\Omega )\)

$$\begin{aligned} B(u,v)&= \int _\Omega \left( \varepsilon \nabla u \cdot \nabla v + \mathbf {a}\cdot \nabla u v + b u v \right) , \nonumber \\ \left\langle N(u)\,,\,v \right\rangle&= \int _\Omega \nu \varphi (u) g v. \end{aligned}$$
(2.2)

Recall that B and N depend on time t through the functions \(\mathbf {a}\), b and g.

The variational formulation of problem (2.1) then is to find a function u in \(L^2(0,T;H^1_0(\Omega ))\) with its weak temporal derivative \(\partial _t u\) in \(L^2(0,T;H^{-1}(\Omega ))\) such that \(u(\cdot ,0) = u_0\) almost everywhere and

$$\begin{aligned} \left\langle \partial _t u\,,\,v \right\rangle + B(u,v) = \left\langle N(u)\,,\,v \right\rangle \end{aligned}$$
(2.3)

for all \(v \in H^1_0(\Omega )\) and almost all \(t \in (0,T)\).

In what follows we assume that problem (2.3) admits at least one solution.

3 Discrete problem

For the space–time discretization of problem (2.1), we consider partitions \(\mathcal {I}= \left\{ [t_{n-1},t_n] : 1 \le n \le N_\mathcal {I}\right\} \) of the time-interval [0, T] into sub-intervals satisfying \(0 = t_0< \cdots < t_{N_\mathcal {I}} = T\). For every n with \(1 \le n \le N_\mathcal {I}\), we denote by \(I_n = [t_{n-1},t_n]\) the n-th sub-interval and by \(\tau _n = t_n - t_{n-1}\) its length. With every intermediate time \(t_n\), \(0 \le n \le N_\mathcal {I}\), we associate a partition \(\mathcal {T}_n\) of \(\Omega \) and a corresponding finite element space \(V(\mathcal {T}_n)\). The partitions \(\mathcal {I}\) and \(\mathcal {T}_n\) and the spaces \(V(\mathcal {T}_n)\) must satisfy the following assumptions (compare [21, §3] and [25, §6.2]):

  • The closure of \(\Omega \) is the union of all elements in \(\mathcal {T}_n\).

  • Every element has at least one vertex in \(\Omega \).

  • Every element in \(\mathcal {T}_n\) is either a simplex or a parallelepiped, i.e. it is the image of the d-dimensional reference simplex \(\widehat{K}_d = \left\{ x\in \mathbb {R}^d : x_1 \ge 0, \, \ldots , \, x_d \ge 0, \right. \) \(\left. x_1 + \cdots + x_d \le 1 \right\} \) or of the d-dimensional reference cube \(\widehat{K}_d = [0, 1]^d\) under an affine mapping (affine-equivalence).

  • Any two elements in \(\mathcal {T}_n\) are either disjoint or share a complete lower dimensional face of their boundaries (admissibility).

  • Denoting by \(h_{K}\) the diameter of any element K and by \(\rho _K\) the diameter of the largest ball inscribed into K, the shape parameter

    $$\begin{aligned} {{\mathrm{C}}}_{\mathcal {T}}= \max _{1 \le n \le N_\mathcal {I}} \max _{K \in \mathcal {T}_n} \frac{h_{K}}{\rho _K} \end{aligned}$$

    is of moderate size independently of \(\varepsilon \), \(\beta \) and \(\nu \) (shape-regularity).

  • For every n with \(1 \le n \le N_\mathcal {I}\) there is an affine-equivalent, admissible and shape-regular partition \(\widetilde{\mathcal {T}}_n\) such that it is a refinement of both \(\mathcal {T}_n\) and \(\mathcal {T}_{n-1}\) and such that

    $$\begin{aligned} {{\mathrm{C}}}_{\widetilde{\mathcal {T}},\mathcal {T}} = \max _{1 \le n \le N_\mathcal {I}} \max _{K \in \widetilde{\mathcal {T}}_n} \max _{K^\prime \in \mathcal {T}_n; K \subset K^\prime } \frac{h_{K^\prime }}{h_{K}} \end{aligned}$$

    is of moderate size independently of \(\varepsilon \), \(\beta \) and \(\nu \) (transition condition).

  • Each \(V(\mathcal {T}_n)\) consists of continuous functions which are piecewise polynomials, the degrees being at least one and being bounded uniformly with respect to all partitions \(\mathcal {T}_n\) and \(\mathcal {I}\) (degree condition).

The transition condition is due to the simultaneous presence of finite element functions defined on different grids. Usually the partition \(\mathcal {T}_n\) is obtained from \(\mathcal {T}_{n-1}\) by a combination of refinement and coarsening. In this case the transition condition only restricts the coarsening: it should be neither too abrupt nor too strong.

The lower bound on the polynomial degrees is needed for the construction of suitable quasi-interpolation operators. The upper bound ensures that the constants in inverse estimates are uniformly bounded.

Notice that we do not impose any shape-condition of the form \(\max _n \tau _n \le c \min _n \tau _n\).
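To make the shape-regularity condition concrete, the following sketch evaluates the ratio \(h_{K}/\rho _K\) for triangles (\(d = 2\)) and takes the maximum over a list of elements. The function names `shape_ratio` and `shape_parameter` are hypothetical; the only facts used are that the diameter of a triangle is its longest edge and that its inradius equals the area divided by the semi-perimeter.

```python
import math

def shape_ratio(p0, p1, p2):
    """Return h_K / rho_K for the triangle K with vertices p0, p1, p2."""
    a = math.dist(p1, p2)
    b = math.dist(p0, p2)
    c = math.dist(p0, p1)
    h = max(a, b, c)                       # diameter of K = longest edge
    s = 0.5 * (a + b + c)                  # semi-perimeter
    area = 0.5 * abs((p1[0] - p0[0]) * (p2[1] - p0[1])
                     - (p2[0] - p0[0]) * (p1[1] - p0[1]))
    rho = 2.0 * area / s                   # diameter of the inscribed circle
    return h / rho

def shape_parameter(triangles):
    """Maximum of h_K / rho_K over a list of triangles (one partition)."""
    return max(shape_ratio(*tri) for tri in triangles)

equilateral = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0))
flat = ((0.0, 0.0), (1.0, 0.0), (0.5, 0.01))
c_T = shape_parameter([equilateral, flat])
```

The equilateral triangle attains the ratio \(\sqrt{3}\), whereas the nearly degenerate triangle yields a ratio of order 100; shape-regularity excludes families of partitions in which such ratios grow without bound.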

For any parameter \(\Theta \in [0,1]\) we set for abbreviation

$$\begin{aligned} g^{n\Theta }&= \Theta g(\cdot ,t_n) + (1 - \Theta ) g(\cdot ,t_{n-1}), \nonumber \\ \mathbf {a}^{n\Theta }&= \Theta \mathbf {a}(\cdot ,t_n) + (1 - \Theta ) \mathbf {a}(\cdot ,t_{n-1}), \nonumber \\ b^{n\Theta }&= \Theta b(\cdot ,t_n) + (1 - \Theta ) b(\cdot ,t_{n-1}) \end{aligned}$$
(3.1)

and

$$\begin{aligned} B^{n\Theta }(u, v)&= \int _\Omega \left\{ \varepsilon \nabla u \cdot \nabla v + \mathbf {a}^{n\Theta } \cdot \nabla u v + b^{n\Theta } u v \right\} , \\ \left\langle N^{n\Theta }(u)\,,\,v \right\rangle&= \int _\Omega \nu \varphi (u) g^{n\Theta } v. \end{aligned}$$

For the finite element discretization of problem (2.1) we consider a stabilized \(\theta \)-scheme with a possibly explicit treatment of the non-linearity. More precisely we choose two parameters \(\theta , \vartheta \in [0,1]\) and look for a sequence \(u^n_{\mathcal {T}_n} \in V(\mathcal {T}_n)\), \(0 \le n \le N_\mathcal {I}\), such that \(u^0_{\mathcal {T}_0}\) is the \(L^2\)-projection of \(u_0\) onto \(V(\mathcal {T}_0)\) and such that, for \(n = 1, \ldots , N_\mathcal {I}\) and \(U^{n\Theta } = \Theta u^n_{\mathcal {T}_n} + (1 - \Theta ) u^{n-1}_{\mathcal {T}_{n-1}}\), \(\Theta \in \{\theta , \vartheta \}\),

$$\begin{aligned} \int _\Omega \frac{1}{\tau _n} (u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}}) v_{\mathcal {T}_n} + B^{n\theta }(U^{n\theta },v_{\mathcal {T}_n}) + S^n(U^{n\theta },v_{\mathcal {T}_n}) = \left\langle N^{n\vartheta }(U^{n\vartheta })\,,\,v_{\mathcal {T}_n} \right\rangle \qquad \end{aligned}$$
(3.2)

holds for all \(v_{\mathcal {T}_n} \in V(\mathcal {T}_n)\).

Note that by choosing \(\vartheta \ne \theta \) we may handle the non-linear and linear terms in (2.1) differently. In particular we may choose \(\vartheta = 0\) and \(\theta \in \{ \frac{1}{2}, 1 \}\) thus using an explicit discretization for the non-linear term and an implicit one for the linear terms.
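The following minimal sketch illustrates one step of (3.2) in a strongly simplified setting. All of the following are assumptions made for illustration and are not taken from the article: \(\Omega = (0,1)\), a fixed uniform mesh with piecewise linear elements, constant coefficients \(\mathbf {a}\) and b, time-independent g, no stabilization (\(S^n = 0\)), the explicit choice \(\vartheta = 0\) for the non-linearity, and the trapezoidal rule for the non-linear right-hand side. The names `tridiag` and `theta_step` are hypothetical.

```python
import numpy as np

def tridiag(m, lower, diag, upper):
    """Dense m x m tridiagonal Toeplitz matrix."""
    return (np.diag(np.full(m - 1, lower), -1)
            + np.diag(np.full(m, diag))
            + np.diag(np.full(m - 1, upper), 1))

def theta_step(u_old, tau, eps, a, b, nu, g, phi, theta=1.0):
    """One time step with P1 elements on (0,1), S^n = 0 and vartheta = 0.

    u_old holds the values at the m interior nodes; homogeneous Dirichlet
    values at x = 0 and x = 1 are built into the matrices.
    """
    m = u_old.size
    h = 1.0 / (m + 1)
    M = tridiag(m, h / 6.0, 2.0 * h / 3.0, h / 6.0)    # mass matrix
    K = tridiag(m, -1.0 / h, 2.0 / h, -1.0 / h)        # stiffness matrix
    C = tridiag(m, -0.5, 0.0, 0.5)                     # int phi_j' phi_i
    A = eps * K + a * C + b * M                        # matrix of the form B
    # Explicit non-linearity, trapezoidal rule: int f phi_i ~ h * f(x_i).
    f = h * nu * phi(u_old) * g
    rhs = (M / tau - (1.0 - theta) * A) @ u_old + f
    return np.linalg.solve(M / tau + theta * A, rhs)

m = 49
x = np.linspace(0.0, 1.0, m + 2)[1:-1]                 # interior nodes
u0 = np.sin(np.pi * x)
phi = lambda s: np.sqrt(1.0 + s * s)
u_heat = theta_step(u0, tau=0.01, eps=1.0, a=0.0, b=0.0, nu=0.0, g=1.0, phi=phi)
u_full = theta_step(u0, tau=0.01, eps=1e-2, a=1.0, b=1.0, nu=0.5, g=1.0, phi=phi)
```

Because the non-linear term is evaluated at the old time level, each step requires only one linear solve, which is the practical motivation for choosing \(\vartheta = 0\) together with an implicit \(\theta \).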

The term \(S^n\) specifies the particular stabilization. It is supposed to be linear in its second argument and affine in its first argument. Note that \(S^n\) may contain contributions of the data g. Of course, the choice \(S^n = 0\) is also possible and corresponds to a standard finite element method without stabilization. Some popular choices of \(S^n\) are as follows (cf. [21] for more details and references):

  • Streamline diffusion method: Here, the stabilizing term has the form

    $$\begin{aligned} \quad \quad \quad S^n(u,v) = \sum _{K \in \mathcal {T}_n} \vartheta _K \int _K&\left\{ -\varepsilon \Delta u + \mathbf {a}^{n\theta } \cdot \nabla u + b^{n\theta } u - \nu \varphi (u) g^{n\theta } \right\} \mathbf {a}^{n\theta } \cdot \nabla v \end{aligned}$$

    with \(\vartheta _K \left||\mathbf {a} \right||_{L^{\infty }(K)} \le c_S h_{K}\) for all \(K \in \mathcal {T}_n\) (cf. e.g. [16, 20]).

  • Local projection scheme: Denoting by \(\mathcal {M}_n\) a macro-partition such that every element in \(\mathcal {M}_n\) is the union of elements in \(\mathcal {T}_n\) and by \(I - \kappa _{\mathcal {M}_n}\) the \(L^2\)-projection onto an appropriate discontinuous projection space \(D(\mathcal {M}_n)\) living on the partition \(\mathcal {M}_n\) and by \(\bar{\mathbf {a}}_{\mathcal {M}_n}\) a piecewise constant approximation of \(\mathbf {a}^{n\theta }\) on \(\mathcal {M}_n\), we either have

    $$\begin{aligned} S^n(u,v) = \sum _{M \in \mathcal {M}_n} \vartheta _M \int _M \kappa _{\mathcal {M}_n} \left( \bar{\mathbf {a}}_{\mathcal {M}_n}\cdot \nabla u \right) \kappa _{\mathcal {M}_n} \left( \bar{\mathbf {a}}_{\mathcal {M}_n}\cdot \nabla v \right) \end{aligned}$$

    with \(\vartheta _M \left||\mathbf {a} \right||_{L^{\infty }(M)}\le c_S h_{M}\) for all \(M \in \mathcal {M}_n\) or

    $$\begin{aligned} S^n(u,v) = \sum _{M \in \mathcal {M}_n} \vartheta _M \int _M \kappa _{\mathcal {M}_n} \left( \nabla u \right) \kappa _{\mathcal {M}_n} \left( \nabla v \right) \end{aligned}$$

    with \(\vartheta _M \le c_S \left||\mathbf {a} \right||_{L^{\infty }(M)} h_{M}\) for all \(M \in \mathcal {M}_n\) (cf. e.g. [15, 18, 22]).

  • Subgrid scale approach: Decomposing the solution space \(V(\mathcal {T}_n)\) into a space of resolvable scales \(X(\mathcal {T}_n)\) and a space of unresolvable scales \(Y(\mathcal {T}_n)\) such that \(V(\mathcal {T}_n) = X(\mathcal {T}_n) \oplus Y(\mathcal {T}_n)\) and denoting by \(\Pi _n: V(\mathcal {T}_n) \rightarrow Y(\mathcal {T}_n)\) a projection operator with \(X(\mathcal {T}_n) = \ker (\Pi _n)\), we either have

    $$\begin{aligned} S^n(u,v) = \sum _{K \in \mathcal {T}_n} \vartheta _K \int _K \left( \bar{\mathbf {a}}_{\mathcal {T}_n}\cdot \nabla \Pi _n(u) \right) \left( \bar{\mathbf {a}}_{\mathcal {T}_n}\cdot \nabla \Pi _n(v) \right) \end{aligned}$$

    with \(\vartheta _K \left||\mathbf {a} \right||_{L^{\infty }(K)}\le c_S h_{K}\) for all \(K \in \mathcal {T}_n\) or

    $$\begin{aligned} S^n(u,v) = \sum _{K \in \mathcal {T}_n} \vartheta _K \int _K \nabla \Pi _n \left( u \right) \nabla \Pi _n \left( v \right) \end{aligned}$$

    with \(\vartheta _K \le c_S \left||\mathbf {a} \right||_{L^{\infty }(K)} h_{K}\) for all \(K \in \mathcal {T}_n\) (cf. e.g. [10, 13, 14, 20]).

  • Continuous interior penalty method: Denoting by \(\mathcal {E}_{n,\Omega }\) the collection of all element faces of \(\mathcal {T}_n\) inside \(\Omega \) and by \({\mathbb J}_{E}(\cdot )\) the jump across such a face, we have

    $$\begin{aligned} S^n(u,v) = \sum _{E \in \mathcal {E}_{n,\Omega }} \vartheta _E \int _E {\mathbb J}_{E}(\mathbf {a}^{n\theta } \cdot \nabla u) {\mathbb J}_{E}(\mathbf {a}^{n\theta } \cdot \nabla v) \end{aligned}$$

    with \(\vartheta _E \le c_S h_{E}^2\) for all \(E \in \mathcal {E}_{n,\Omega }\) (cf. e.g. [3, 4, 5, 9, 11]).

In what follows we assume that problem (3.2) admits at least one solution.

4 A posteriori error estimates

In what follows we consider a solution u of the variational problem (2.3) and a solution \(\left( u^n_{\mathcal {T}_n} \right) _{0 \le n \le N_\mathcal {I}}\) of the discrete problem (3.2). With the latter we associate the function \(u_\mathcal {I}\) which is continuous and piecewise affine with respect to time and which equals \(u^n_{\mathcal {T}_n}\) at time \(t_n\), \(0 \le n \le N_\mathcal {I}\). We want to derive explicitly computable a posteriori error estimates which yield upper and lower bounds for the error \(u - u_\mathcal {I}\). In doing so we pay particular attention to the dependence of the bounds on the parameters \(\varepsilon \), \(\beta \) and \(\nu \). To this end we proceed as in [21] and [25, §6.2]:

  • We introduce the residual associated with the error and prove that a suitable norm of the error is bounded from below and above by a suitable dual norm of the residual.

  • We additively split the residual into three contributions called data residual, temporal residual and spatial residual.

  • We separately bound the dual norms of the data, temporal and spatial residuals.

In following this path, we must pay particular attention to the non-linearity. Its Lipschitz-continuity will be crucial.

4.1 Norms

We equip \(H^1_0(\Omega )\) with the energy norm

$$\begin{aligned} \left||{} \left|v \right||{} \right|_{} = \left\{ \varepsilon \left||\nabla v \right||^2 + \beta \left||v \right||^2 \right\} ^\frac{1}{2}\end{aligned}$$
(4.1)

and \(H^{-1}(\Omega )\) with the corresponding dual norm

$$\begin{aligned} \left||{} \left|\ell \right||{} \right|_{*} = \sup _{v \in H^1_0(\Omega ) \setminus \{ 0 \}} \frac{\left\langle \ell \,,\,v \right\rangle }{\left||{} \left|v \right||{} \right|_{}}, \end{aligned}$$
(4.2)

where \(\left||\cdot \right||_{\omega }\) is the standard \(L^2\)-norm on any measurable subset \(\omega \) of \(\Omega \) and \(\left||\cdot \right|| = \left||\cdot \right||_\Omega \).

For abbreviation we set for \(0 \le t_- < t_+ \le T\)

$$\begin{aligned} X(t_-,t_+) = L^2(t_-,t_+;H^1_0(\Omega )) \cap L^\infty (t_-,t_+;L^2(\Omega )) \cap H^1(t_-,t_+;H^{-1}(\Omega )), \end{aligned}$$

equip it with the norm

$$\begin{aligned} \left||u \right||_{X(t_-,t_+)}&= \left\{ \sup _{t_-< t < t_+} \left||u(\cdot ,t) \right||^2 + \int _{t_-}^{t_+} \left||{} \left|u(\cdot ,t) \right||{} \right|_{}^2 \right. \\&\quad \quad \left. + \int _{t_-}^{t_+} \left||{} \left|\partial _t u(\cdot ,t) + \mathbf {a}\cdot \nabla u(\cdot ,t) \right||{} \right|_{*}^2 \right\} ^\frac{1}{2}\end{aligned}$$

and set

$$\begin{aligned} X = X(0,T), \quad \left||\cdot \right||_X = \left||\cdot \right||_{X(0,T)}. \end{aligned}$$

Recall that for \(0 \le t_- < t_+ \le T\) and \(\ell : (t_-,t_+) \rightarrow H^{-1}(\Omega )\)

$$\begin{aligned} \left||\ell \right||_{L^2(t_-,t_+;H^{-1}(\Omega ))} = \left\{ \int _{t_-}^{t_+} \left||{} \left|\ell (t) \right||{} \right|_{*}^2 \right\} ^\frac{1}{2}. \end{aligned}$$

Denote by

$$\begin{aligned} {{\mathrm{c}}}_F= \sup _{v \in H^1_0(\Omega ) \setminus \{ 0 \}} \frac{\left||v \right||}{\left||\nabla v \right||} \end{aligned}$$
(4.3)

the best constant in Friedrichs’ inequality. Note that \({{\mathrm{c}}}_F\lesssim {{\mathrm{diam}}}(\Omega )\). Setting

$$\begin{aligned} \lambda = \min \left\{ {{\mathrm{c}}}_F\varepsilon ^{-\frac{1}{2}}, \beta ^{-\frac{1}{2}} \right\} , \end{aligned}$$

Eqs. (4.1) and (4.3) imply for every \(v \in H^1_0(\Omega )\)

$$\begin{aligned} \left||v \right|| \le \lambda \left||{} \left|v \right||{} \right|_{}. \end{aligned}$$
(4.4)
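For completeness, the two elementary estimates behind (4.4) can be spelled out as follows. The second line requires \(\beta > 0\); for \(\beta = 0\) only the first line applies, which amounts to reading \(\beta ^{-\frac{1}{2}}\) as \(+\infty \) in the definition of \(\lambda \) (this convention is an assumption made here for the sake of the sketch).

$$\begin{aligned} \left||v \right||&\le {{\mathrm{c}}}_F\left||\nabla v \right|| = {{\mathrm{c}}}_F\varepsilon ^{-\frac{1}{2}} \left( \varepsilon \left||\nabla v \right||^2 \right) ^\frac{1}{2}\le {{\mathrm{c}}}_F\varepsilon ^{-\frac{1}{2}} \left||{} \left|v \right||{} \right|_{}, \\ \left||v \right||&= \beta ^{-\frac{1}{2}} \left( \beta \left||v \right||^2 \right) ^\frac{1}{2}\le \beta ^{-\frac{1}{2}} \left||{} \left|v \right||{} \right|_{}. \end{aligned}$$

Taking the smaller of the two constants gives the factor \(\lambda \).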

For abbreviation we finally set

$$\begin{aligned} \gamma (t) = \left||g(\cdot ,t) \right||_{L^{\infty }(\Omega )}, \quad \gamma = \left||g \right||_{L^{\infty }(\Omega \times (0,T))}. \end{aligned}$$

4.2 Lipschitz-continuity of the non-linearity

The non-linearity N is not differentiable, but Lipschitz-continuous.

Lemma 4.1

(Lipschitz-continuity of N). For every \(t \in (0,T)\) and \(u_1, u_2, v \in H^1_0(\Omega )\) we have

$$\begin{aligned} \left\langle N(u_1) - N(u_2)\,,\,v \right\rangle \le \nu L \gamma (t) \left||u_1 - u_2 \right|| \left||v \right|| \end{aligned}$$

and

$$\begin{aligned} \left||{} \left|N(u_1) - N(u_2) \right||{} \right|_{*}&\le \nu L \lambda \gamma (t) \left||u_1 - u_2 \right|| \\&\le \nu L \lambda ^2 \gamma (t) \left||{} \left|u_1 - u_2 \right||{} \right|_{}. \end{aligned}$$

Proof

For every \(v \in H^1_0(\Omega )\) and \(t \in (0,T)\) we have thanks to assumption (A4)

$$\begin{aligned} \left\langle N(u_1) - N(u_2)\,,\,v \right\rangle \le \nu L \int _\Omega \left|g(\cdot ,t) \right|_{} \left|u_1 - u_2 \right|_{} \left|v \right|_{}. \end{aligned}$$

Together with Hölder’s inequality this proves the first inequality. The second and third ones follow from the first one and (4.4). \(\square \)
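The first inequality of Lemma 4.1 can also be observed numerically. The sketch below is an illustration under assumptions that are not part of the article: functions on \(\Omega = (0,1)\) are replaced by grid functions, integrals by the Riemann sum \(h \sum _i\), and the data are random. The name `check_lemma_41` is hypothetical. In this discrete setting the bound follows from the Lipschitz condition (A4) and the Cauchy–Schwarz inequality for sums.

```python
import numpy as np

def check_lemma_41(u1, u2, v, g, nu, L, phi, h):
    """Return both sides of the first inequality of Lemma 4.1 on a grid.

    Integrals are approximated by h * sum(...); gamma is the maximum of |g|.
    """
    lhs = h * np.sum(nu * (phi(u1) - phi(u2)) * g * v)
    gamma = np.max(np.abs(g))
    l2_norm = lambda w: np.sqrt(h * np.sum(w * w))
    rhs = nu * L * gamma * l2_norm(u1 - u2) * l2_norm(v)
    return lhs, rhs

rng = np.random.default_rng(0)
n = 1000
h = 1.0 / n
u1, u2, v, g = (rng.standard_normal(n) for _ in range(4))
lhs, rhs = check_lemma_41(u1, u2, v, g, nu=0.7, L=1.0,
                          phi=lambda s: np.sqrt(1.0 + s * s), h=h)
```

For random data the left-hand side is typically much smaller than the right-hand side, since the bound ignores all cancellation in the integrand.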

Remark 4.2

Using the continuous embedding of \(H^1_0(\Omega )\) into \(L^p(\Omega )\) with \(p < \infty \) if \(d = 2\) and \(p = 6\) if \(d = 3\), the terms \(\nu L \lambda \gamma (t)\) and \(\nu L \lambda ^2\gamma (t)\) in Lemma 4.1 can be replaced by \(\min \Bigl \{ \nu L \beta ^{-\frac{1}{2}} \gamma (t), \nu L {{\mathrm{c}}}_p\varepsilon ^{-\frac{1}{2}} \left||g(\cdot ,t) \right||_{L^{q}(\Omega )} \Bigr \}\) and \(\min \Bigl \{ \nu L \beta ^{-1} \gamma (t), \nu L {{\mathrm{c}}}_p^2 \varepsilon ^{-1} \left||g(\cdot ,t) \right||_{L^{r}(\Omega )} \Bigr \}\), respectively, where \(q = \frac{2p}{p - 2}\), \(r = \frac{p}{p-2}\) and \({{\mathrm{c}}}_p= \sup _{v \in H^1_0(\Omega ) \setminus \{ 0 \}} \frac{\left||v \right||_{L^{p}(\Omega )}}{\left||\nabla v \right||}\).

4.3 Equivalence of residual and error

With the discrete solution \(u_\mathcal {I}\) we associate the residual \(R(u_\mathcal {I}) \in L^2(0,T;H^{-1}(\Omega ))\) by setting for all \(v \in H^1_0(\Omega )\)

$$\begin{aligned} \left\langle R(u_\mathcal {I})\,,\,v \right\rangle = \left\langle N(u_\mathcal {I})\,,\,v \right\rangle - \left\langle \partial _t u_\mathcal {I}\,,\,v \right\rangle - B(u_\mathcal {I},v). \end{aligned}$$

Notice that B and N are given by (2.2) and that \(\partial _t u_\mathcal {I}= \frac{1}{\tau _n} \left( u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}} \right) \) on \([t_{n-1},t_n]\). With this notation, we have the following equivalence of error and residual.

Lemma 4.3

(Equivalence of error and residual). For all \(1 \le n \le N_\mathcal {I}\) the \(L^2(t_{n-1}, t_n;H^{-1}(\Omega ))\)-norm of the residual is bounded from above by the \(X(t_{n-1},t_n)\)-norm of the error

$$\begin{aligned}&\left||R(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))} \\&\quad \le \left||u - u_\mathcal {I} \right||_{X(t_{n-1},t_n)} \sqrt{2} {{\mathrm{c}}}_b\left\{ 1 + \nu L \lambda \min \left\{ \lambda , \sqrt{\tau _n} \right\} \max _{t_{n-1} \le t \le t_n}\gamma (t) \right\} . \end{aligned}$$

Conversely, the X(0, T)-norm of the error is bounded from above by the \(L^2(0,T; H^{-1}(\Omega ))\)-norm of the residual

$$\begin{aligned} \left||u - u_\mathcal {I} \right||_X&\le \left\{ \left||u_0 - u^0_{\mathcal {T}_0} \right||^2 + \left||R(u_\mathcal {I}) \right||_{L^2(0,T;H^{-1}(\Omega ))}^2 \right\} ^\frac{1}{2}\cdot \\&\quad \;\biggl \{ 3 + \Bigl [1 + 3 \max \bigl \{ {{\mathrm{c}}}_b^2, \nu ^2 L^2 \lambda ^2 \gamma ^2 \min \{ T, \lambda ^2 \} \bigr \} \Bigr ] e^{2 \nu L \gamma T} \biggr \}^\frac{1}{2}. \end{aligned}$$

If in addition \(\kappa = 2 \nu L \min \{ T, \lambda ^2 \} \gamma < 1\), the upper bound for the norm of the error can be improved to

$$\begin{aligned} \left||u - u_\mathcal {I} \right||_X&\le \left\{ \left||u_0 - u^0_{\mathcal {T}_0} \right||^2 + \left||R(u_\mathcal {I}) \right||_{L^2(0,T;H^{-1}(\Omega ))}^2 \right\} ^\frac{1}{2}\cdot \\&\quad \;\left\{ 3 + \left[ 1 + 3 \max \left\{ {{\mathrm{c}}}_b^2, \frac{1}{2}\nu L \lambda ^2 \gamma \right\} \right] \frac{1}{1 - \kappa }\right\} ^\frac{1}{2}. \end{aligned}$$

Proof

The variational formulation (2.3) and the definition of the residual yield

$$\begin{aligned} \left\langle \partial _t (u - u_\mathcal {I})\,,\,v \right\rangle + B(u - u_\mathcal {I},v) = \left\langle N(u) - N(u_\mathcal {I})\,,\,v \right\rangle + \left\langle R(u_\mathcal {I})\,,\,v \right\rangle \end{aligned}$$
(4.5)

for all \(v \in H^1_0(\Omega )\) and almost all \(t \in (0,T)\). Therefore, [25, Proposition 6.14] and the assumption \({{\mathrm{c}}}_b\ge 1\) imply for all \(1 \le n \le N_\mathcal {I}\)

$$\begin{aligned}&\left||R(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))} \\&\quad \le \sqrt{2} {{\mathrm{c}}}_b\left\{ \left||u - u_\mathcal {I} \right||_{X(t_{n-1},t_n)} + \left||N(u) - N(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))} \right\} . \end{aligned}$$

Together with Lemma 4.1 this proves the upper bound for the dual norm of the residual.

To prove the upper bounds for the error, we go back to the proof of [25, Proposition 6.14] and first observe that

$$\begin{aligned}&\left\langle \partial _t (u - u_\mathcal {I}) + \mathbf {a}\cdot \nabla (u - u_\mathcal {I})\,,\,v \right\rangle \\&\quad = \int _\Omega \bigl [ \varepsilon \nabla (u_\mathcal {I}- u) \cdot \nabla v + b (u_\mathcal {I}- u) v \bigr ] + \left\langle N(u) - N(u_\mathcal {I})\,,\,v \right\rangle + \left\langle R(u_\mathcal {I})\,,\,v \right\rangle . \end{aligned}$$

Together with Lemma 4.1 this implies

$$\begin{aligned} \left||{} \left|\partial _t (u - u_\mathcal {I}) + \mathbf {a}\cdot \nabla (u - u_\mathcal {I}) \right||{} \right|_{*} \le \left||{} \left|R(u_\mathcal {I}) \right||{} \right|_{*} + {{\mathrm{c}}}_b\left||{} \left|u - u_\mathcal {I} \right||{} \right|_{} + \nu L \lambda \gamma (t) \left||u - u_\mathcal {I} \right|| \end{aligned}$$

and

$$\begin{aligned}&\int _0^T \left||{} \left|\partial _t (u - u_\mathcal {I}) + \mathbf {a}\cdot \nabla (u - u_\mathcal {I}) \right||{} \right|_{*}^2 \\&\quad \le 3 \left\{ \int _0^T \left||{} \left|R(u_\mathcal {I}) \right||{} \right|_{*}^2 + {{\mathrm{c}}}_b^2 \int _0^T \left||{} \left|u - u_\mathcal {I} \right||{} \right|_{}^2 \right. \\&\quad \quad \quad \quad \left. + \nu ^2 L^2 \lambda ^2 \gamma ^2 \min \left\{ T \sup _{0< t < T} \left||u - u_\mathcal {I} \right||^2, \lambda ^2 \int _0^T \left||{} \left|u - u_\mathcal {I} \right||{} \right|_{}^2 \right\} \right\} . \end{aligned}$$

In order to bound \(\sup _{0< t < T} \left||u - u_\mathcal {I} \right||^2\) and \(\int _0^T \left||{} \left|u - u_\mathcal {I} \right||{} \right|_{}^2\), we now use a standard parabolic energy argument and insert \(u - u_\mathcal {I}\) as test-function v in (4.5). Thanks to the coercivity of the bilinear form B and Lemma 4.1 this yields

$$\begin{aligned}&\frac{1}{2}\frac{\mathrm{d}}{\mathrm{d}t} \left||u - u_\mathcal {I} \right||^2 + \left||{} \left|u - u_\mathcal {I} \right||{} \right|_{}^2 \\&\quad \le \frac{1}{2}\frac{\mathrm{d}}{\mathrm{d}t} \left||u - u_\mathcal {I} \right||^2 + B(u - u_\mathcal {I}, u - u_\mathcal {I}) \\&\quad = \left\langle N(u) - N(u_\mathcal {I})\,,\,u - u_\mathcal {I} \right\rangle + \left\langle R(u_\mathcal {I})\,,\,u - u_\mathcal {I} \right\rangle \\&\quad \le \nu L \gamma (t) \left||u - u_\mathcal {I} \right||^2 + \left||{} \left|R(u_\mathcal {I}) \right||{} \right|_{*} \left||{} \left|u - u_\mathcal {I} \right||{} \right|_{} \\&\quad \le \nu L \gamma (t) \left||u - u_\mathcal {I} \right||^2 + \frac{1}{2}\left||{} \left|R(u_\mathcal {I}) \right||{} \right|_{*}^2 + \frac{1}{2}\left||{} \left|u - u_\mathcal {I} \right||{} \right|_{}^2 \end{aligned}$$

and thus

$$\begin{aligned}&\left||(u - u_\mathcal {I})(\cdot ,t) \right||^2 + \int _0^t \left||{} \left|u - u_\mathcal {I} \right||{} \right|_{}^2 \\&\quad \le 2 \nu L \gamma \int _0^t \left||u - u_\mathcal {I} \right||^2 + \int _0^t \left||{} \left|R(u_\mathcal {I}) \right||{} \right|_{*}^2 + \left||u_0 - u^0_{\mathcal {T}_0} \right||^2. \end{aligned}$$

If \(\kappa < 1\) we may absorb the first term on the right-hand side of this estimate by the left-hand side and obtain

$$\begin{aligned}&\sup _{0< t < T} \left||u - u_\mathcal {I} \right||^2 + \int _0^T \left||{} \left|u - u_\mathcal {I} \right||{} \right|_{}^2 \\&\quad \le \frac{1}{1 - \kappa } \left\{ \left||u_0 - u^0_{\mathcal {T}_0} \right||^2 + \left||R(u_\mathcal {I}) \right||_{L^2(0,T;H^{-1}(\Omega ))}^2 \right\} . \end{aligned}$$

Otherwise, Gronwall’s Lemma yields

$$\begin{aligned}&\sup _{0< t < T} \left||u - u_\mathcal {I} \right||^2 + \int _0^T \left||{} \left|u - u_\mathcal {I} \right||{} \right|_{}^2 \\&\quad \le e^{2 \nu L \gamma T} \left\{ \left||u_0 - u^0_{\mathcal {T}_0} \right||^2 + \left||R(u_\mathcal {I}) \right||_{L^2(0,T;H^{-1}(\Omega ))}^2 \right\} . \end{aligned}$$

Combining these estimates with the bound for \(\int _0^T \left||{} \left|\partial _t (u - u_\mathcal {I}) + \mathbf {a}\cdot \nabla (u - u_\mathcal {I}) \right||{} \right|_{*}^2\) establishes the upper bound for the error. \(\square \)

4.4 Decomposition of the residual

We additively split the residual

$$\begin{aligned} R(u_\mathcal {I}) = R_\tau (u_\mathcal {I}) + R_h(u_\mathcal {I}) + R_D(u_\mathcal {I}) \end{aligned}$$

into a temporal residual, a spatial residual and a data residual which, for all \(v \in H^1_0(\Omega )\), are defined by

$$\begin{aligned} \left\langle R_\tau (u_\mathcal {I})\,,\,v \right\rangle&= \left\langle N^{n\vartheta }(u_\mathcal {I})\,,\,v \right\rangle - \left\langle N^{n\vartheta }(U^{n\vartheta })\,,\,v \right\rangle + B^{n\theta }(U^{n\theta } - u_\mathcal {I},v), \nonumber \\ \left\langle R_h(u_\mathcal {I})\,,\,v \right\rangle&= \left\langle N^{n\vartheta }(U^{n\vartheta })\,,\,v \right\rangle - \left\langle \partial _t u_\mathcal {I}\,,\,v \right\rangle - B^{n\theta }(U^{n\theta },v), \nonumber \\ \left\langle R_D(u_\mathcal {I})\,,\,v \right\rangle&= \left\langle N(u_\mathcal {I})\,,\,v \right\rangle - \left\langle N^{n\vartheta }(u_\mathcal {I})\,,\,v \right\rangle - B(u_\mathcal {I},v) + B^{n\theta }(u_\mathcal {I},v). \end{aligned}$$
(4.6)

In addition, we additively split the temporal residual

$$\begin{aligned} R_\tau (u_\mathcal {I}) = R_{\tau ,\text {lin}}(u_\mathcal {I}) + R_{\tau ,\text {nonlin}}(u_\mathcal {I}) \end{aligned}$$

into a linear and a non-linear part which, for all \(v \in H^1_0(\Omega )\), are defined by

$$\begin{aligned} \left\langle R_{\tau ,\text {lin}}(u_\mathcal {I})\,,\,v \right\rangle&= B^{n\theta }(U^{n\theta } - u_\mathcal {I},v) \\ \left\langle R_{\tau ,\text {nonlin}}(u_\mathcal {I})\,,\,v \right\rangle&= \left\langle N^{n\vartheta }(u_\mathcal {I})\,,\,v \right\rangle - \left\langle N^{n\vartheta }(U^{n\vartheta })\,,\,v \right\rangle . \end{aligned}$$

In the following subsections we will estimate the three residuals separately. The following lemma shows that this is permissible. Lemma 4.12 below shows, in addition, that the temporal residual is governed by its linear part if \(\nu L \lambda ^2 \gamma \) is sufficiently small.

Lemma 4.4

(Decomposition of the residual). For every \(n \in \left\{ 1, \ldots , N_\mathcal {I}\right\} \) we have

$$\begin{aligned} \left||R_\tau (u_\mathcal {I}) + R_h(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))}&\le \left||R_{\tau ,\text {lin}}(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))} \\&\quad + \left||R_{\tau ,\text {nonlin}}(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))} \\&\quad + \left||R_h(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))} \end{aligned}$$

and

$$\begin{aligned}&\frac{2}{25} \left\{ \left||R_{\tau ,\text {lin}}(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))}^2 + \left||R_h(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))}^2 \right\} ^\frac{1}{2}\\&\quad \le \left||R_\tau (u_\mathcal {I}) + R_h(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))} + \left||R_{\tau ,\text {nonlin}}(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))}. \end{aligned}$$

Proof

Since \(\sqrt{\frac{5}{14}} \left( 1 - \frac{\sqrt{3}}{2} \right) > \frac{2}{25}\) and \(R_{\tau ,\text {lin}}\) is affine in \(U^{n\theta } - u_\mathcal {I}\) and thus proportional to \(\frac{t - t_{n-1}}{\tau _n} - \theta \), the estimates follow from the triangle inequality and [25, Lemma 6.16]. \(\square \)
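The numerical inequality used at the start of the proof holds only by a small margin, so a direct floating-point evaluation is a useful cross-check. The variable names below are illustrative.

```python
import math

# Left-hand side of the inequality sqrt(5/14) * (1 - sqrt(3)/2) > 2/25.
lhs = math.sqrt(5.0 / 14.0) * (1.0 - math.sqrt(3.0) / 2.0)
margin = lhs - 2.0 / 25.0   # positive, but well below 1e-3
```

The constant \(\frac{2}{25}\) is thus a slightly rounded-down, more readable version of the constant that the argument delivers.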

4.5 Bounding the data residual

Hölder’s inequality and (4.4) yield the following upper bound for the data residual.

Lemma 4.5

(Upper bound for the data residual). For every \(n \in \left\{ 1, \ldots , N_\mathcal {I}\right\} \) we have

$$\begin{aligned}&\left||R_D(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))} \\&\quad \le \nu L \lambda \left\{ \left||g - g^{n\vartheta } \right||_{L^2(t_{n-1},t_n;L^2(\Omega ))} \right. \\&\quad \quad \quad \quad \quad \left. + \left||g - g^{n\vartheta } \right||_{L^\infty (t_{n-1},t_n;L^\infty (\Omega ))} \left( \int _{t_{n-1}}^{t_n} \left||{} \left|u_\mathcal {I} \right||{} \right|_{}^2 \right) ^\frac{1}{2}\right\} \\&\quad \quad + \varepsilon ^{-\frac{1}{2}} \lambda \left||\mathbf {a}- \mathbf {a}^{n\theta } \right||_{L^\infty (t_{n-1},t_n;L^\infty (\Omega ))} \left( \int _{t_{n-1}}^{t_n} \left||{} \left|u_\mathcal {I} \right||{} \right|_{}^2 \right) ^\frac{1}{2}\\&\quad \quad + \lambda ^2 \left||b - b^{n\theta } \right||_{L^\infty (t_{n-1},t_n;L^\infty (\Omega ))} \left( \int _{t_{n-1}}^{t_n} \left||{} \left|u_\mathcal {I} \right||{} \right|_{}^2 \right) ^\frac{1}{2}. \end{aligned}$$

Remark 4.6

Since \(u_\mathcal {I}= \frac{t - t_{n-1}}{\tau _n} u^n_{\mathcal {T}_n} + \frac{t_n - t}{\tau _n} u^{n-1}_{\mathcal {T}_{n-1}}\) for \(t_{n-1} \le t \le t_n\), the convexity of \(\left||{} \left|\cdot \right||{} \right|_{}^2\) and Simpson’s rule yield

$$\begin{aligned} \int _{t_{n-1}}^{t_n} \left||{} \left|u_\mathcal {I} \right||{} \right|_{}^2 \le \frac{\tau _n}{2} \left( \left||{} \left|u^n_{\mathcal {T}_n} \right||{} \right|_{}^2 + \left||{} \left|u^{n-1}_{\mathcal {T}_{n-1}} \right||{} \right|_{}^2 \right) . \end{aligned}$$
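The mechanism behind Remark 4.6 can be illustrated in finite dimensions. The sketch below assumes, purely for illustration, that the energy norm is replaced by the Euclidean norm on coefficient vectors; the function names are hypothetical. Since the integrand is a quadratic polynomial in \(t\), Simpson's rule integrates it exactly, and convexity bounds the midpoint value by the mean of the endpoint values.

```python
def sq_norm(v):
    """Squared Euclidean norm, used here as a stand-in for the squared energy norm."""
    return sum(x * x for x in v)

def simpson_interpolant_energy(u_prev, u_new, tau):
    """Integral over one time step of |u_I(t)|^2 for the affine interpolant u_I.

    The integrand is quadratic in t, so one Simpson panel is exact.
    """
    mid = [0.5 * (a + b) for a, b in zip(u_prev, u_new)]
    return tau / 6.0 * (sq_norm(u_prev) + 4.0 * sq_norm(mid) + sq_norm(u_new))

def endpoint_bound(u_prev, u_new, tau):
    """Right-hand side of the estimate in Remark 4.6."""
    return tau / 2.0 * (sq_norm(u_prev) + sq_norm(u_new))

u_prev, u_new, tau = [1.0, -2.0, 0.5], [0.0, 3.0, 1.0], 0.1
print(simpson_interpolant_energy(u_prev, u_new, tau), endpoint_bound(u_prev, u_new, tau))
```

Equality holds exactly when both endpoint values coincide, so the bound does not lose much for small time steps.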

4.6 Bounding the temporal residual

We first bound the linear part of the temporal residual.

For every time-interval \([t_{n-1},t_n]\) we have

$$\begin{aligned} R_{\tau ,\text {lin}} (u_\mathcal {I}) = \left( \theta - \frac{t - t_{n-1}}{\tau _n} \right) r^n \end{aligned}$$

where \(r^n \in H^{-1}(\Omega )\) is defined by

$$\begin{aligned} \left\langle r^n\,,\,v \right\rangle = B^{n\theta }(u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}},v) \end{aligned}$$

for \(v \in H^1_0(\Omega )\). The assumption \({{\mathrm{c}}}_b\ge 1\) and [25, Lemma 6.17] therefore yield the following upper and lower bounds for the linear part of the temporal residual.

Lemma 4.7

(Bounds for the linear part of the temporal residual). For every \(n \in \left\{ 1, \ldots , N_\mathcal {I}\right\} \), the linear part of the temporal residual can be bounded from above and from below by

$$\begin{aligned}&\frac{\sqrt{\tau _n}}{\sqrt{12} (2 + {{\mathrm{c}}}_b)} \left\{ \left||{} \left|u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}} \right||{} \right|_{} + \left||{} \left|\mathbf {a}^{n\theta } \cdot \nabla (u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}}) \right||{} \right|_{*} \right\} \\&\quad \le \left||R_{\tau ,\text {lin}}(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))} \\&\quad \quad \le \frac{\sqrt{\tau _n}}{\sqrt{3}} {{\mathrm{c}}}_b \left\{ \left||{} \left|u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}} \right||{} \right|_{} + \left||{} \left|\mathbf {a}^{n\theta } \cdot \nabla (u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}}) \right||{} \right|_{*} \right\} . \end{aligned}$$

The term \(\left||{} \left|\mathbf {a}^{n\theta } \cdot \nabla (u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}}) \right||{} \right|_{*}\) is not suited for a posteriori error estimates since it involves the dual norm \(\left||{} \left|\cdot \right||{} \right|_{*}\), which cannot be evaluated directly. The next two lemmas bound this term for the case of dominant diffusion, i.e. \(\varepsilon \gtrsim 1\), and of dominant convection, i.e. \(\varepsilon \ll 1\), respectively. The first one follows from Hölder’s inequality and (4.3), the second one from [25, Lemma 6.18].

Lemma 4.8

(Bounding the convective derivative for dominant diffusion). For every \(n \in \left\{ 1, \ldots , N_\mathcal {I}\right\} \) we have

$$\begin{aligned} \left||{} \left|\mathbf {a}^{n\theta } \cdot \nabla (u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}}) \right||{} \right|_{*} \le \varepsilon ^{-\frac{1}{2}} \lambda \left||\mathbf {a}^{n\theta } \right||_{L^{\infty }(\Omega )} \left||{} \left|u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}} \right||{} \right|_{}. \end{aligned}$$

Lemma 4.9

(Bounding the convective derivative for dominant convection). For every \(n \in \left\{ 1, \ldots , N_\mathcal {I}\right\} \) denote by \(S^{1,0}_0(\widetilde{\mathcal {T}}_n)\) the space of continuous, piecewise affine functions vanishing on \(\Gamma \) corresponding to the partition \(\widetilde{\mathcal {T}}_n\) and by \(\widetilde{u}^{n}_{\mathcal {T}_n} \in S^{1,0}_0(\widetilde{\mathcal {T}}_n)\) the unique solution of the discrete reaction-diffusion problem

$$\begin{aligned} \varepsilon \int _\Omega \nabla \widetilde{u}^{n}_{\mathcal {T}_n} \cdot \nabla v_{\mathcal {T}_n} + \beta \int _\Omega \widetilde{u}^{n}_{\mathcal {T}_n} v_{\mathcal {T}_n} = \int _\Omega \mathbf {a}^{n\theta } \cdot \nabla (u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}}) v_{\mathcal {T}_n} \end{aligned}$$

for all \(v_{\mathcal {T}_n} \in S^{1,0}_0(\widetilde{\mathcal {T}}_n)\). Define the error indicator \(\widetilde{\eta }^{n}_{\mathcal {T}_n}\) by

$$\begin{aligned} \widetilde{\eta }^{n}_{\mathcal {T}_n}&= \left\{ \sum _{K \in \widetilde{\mathcal {T}}_n} \hslash _{K}^2 \left||\mathbf {a}^{n\theta } \cdot \nabla (u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}}) + \varepsilon \Delta \widetilde{u}^{n}_{\mathcal {T}_n} - \beta \widetilde{u}^{n}_{\mathcal {T}_n} \right||_{K}^2 \right. \\&\quad \quad \left. + \sum _{E \in \widetilde{\mathcal {E}}_{n,\Omega }} \varepsilon ^{- \frac{1}{2}} \hslash _{E} \left||{\mathbb J}_{E}(\mathbf {n}_E \cdot \nabla \widetilde{u}^{n}_{\mathcal {T}_n}) \right||_{E}^2 \right\} ^\frac{1}{2}\end{aligned}$$

and the data error \(\widetilde{\theta }^{n}_{\mathcal {T}_n}\) by

$$\begin{aligned} \widetilde{\theta }^{n}_{\mathcal {T}_n} = \left\{ \sum _{K \in \widetilde{\mathcal {T}}_n} \hslash _{K}^2 \left||(\mathbf {a}^{n\theta } - \mathbf {a}^{n\theta }_{\widetilde{\mathcal {T}}_n}) \cdot \nabla (u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}}) \right||_{K}^2 \right\} ^\frac{1}{2}\end{aligned}$$

where \(\hslash _{\omega } = \min \left\{ \varepsilon ^{- \frac{1}{2}} {{\mathrm{diam}}}(\omega ), \beta ^{- \frac{1}{2}} \right\} \) and \(\mathbf {a}^{n\theta }_{\widetilde{\mathcal {T}}_n}\) is an approximation of \(\mathbf {a}^{n\theta }\) on \(\widetilde{\mathcal {T}}_n\). Then there are two constants \(c_\dagger \) and \(c^\dagger \) which only depend on the shape-parameters \({{\mathrm{C}}}_{\mathcal {T}}\) and \({{\mathrm{C}}}_{\widetilde{\mathcal {T}},\mathcal {T}}\) such that the following estimates are valid

$$\begin{aligned} c_\dagger \left\{ \left||{} \left|\widetilde{u}^{n}_{\mathcal {T}_n} \right||{} \right|_{} + \widetilde{\eta }^{n}_{\mathcal {T}_n} - \widetilde{\theta }^{n}_{\mathcal {T}_n}\right\}&\le \left||{} \left|\mathbf {a}^{n\theta } \cdot \nabla (u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}}) \right||{} \right|_{*} \\&\le c^\dagger \left\{ \left||{} \left|\widetilde{u}^{n}_{\mathcal {T}_n} \right||{} \right|_{} + \widetilde{\eta }^{n}_{\mathcal {T}_n} + \widetilde{\theta }^{n}_{\mathcal {T}_n} \right\} . \end{aligned}$$
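The weight \(\hslash _{\omega }\) used in Lemma 4.9 switches between a diffusion-scaled mesh size and a reaction-scaled length. A minimal Python sketch of this weight follows; the function name `hbar` and the treatment of \(\beta = 0\) (where we assume the second entry of the minimum is dropped) are our own conventions and are not taken from the text.

```python
import math

def hbar(diam, eps, beta):
    """Weight min{eps^(-1/2) * diam, beta^(-1/2)} for an element or face of diameter diam.

    Assumption: for beta == 0 the term beta^(-1/2) is interpreted as +infinity,
    so only the diffusion-scaled diameter remains.
    """
    if eps <= 0.0 or beta < 0.0:
        raise ValueError("expected eps > 0 and beta >= 0")
    scaled_diam = diam / math.sqrt(eps)
    if beta == 0.0:
        return scaled_diam
    return min(scaled_diam, 1.0 / math.sqrt(beta))

# Convection-dominated regime: the reaction scale is active.
print(hbar(0.1, 1e-4, 1.0))
# Diffusion-dominated regime: the scaled diameter is active.
print(hbar(0.001, 1.0, 1.0))
```

For fixed \(\beta > 0\) and a fixed mesh, the weight saturates at \(\beta ^{-\frac{1}{2}}\) as \(\varepsilon \rightarrow 0\), which is what keeps the indicators bounded in the convection-dominated regime.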

Next we bound the non-linear part of the temporal residual.

Lemma 4.10

(Upper bounds for the non-linear temporal residual). For every \(n \in \left\{ 1, \ldots , N_\mathcal {I}\right\} \), the non-linear part of the temporal residual can be bounded from above by

$$\begin{aligned} \left||R_{\tau ,\text {nonlin}}(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))}&\le \sqrt{\frac{\tau _n}{3}} \nu L \lambda \gamma \left||u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}} \right|| \\&\le \sqrt{\frac{\tau _n}{3}} \nu L \lambda ^2 \gamma \left||{} \left|u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}} \right||{} \right|_{}. \end{aligned}$$

Proof

The assertion follows from (4.4), Lemma 4.1,

$$\begin{aligned} \int _{t_{n-1}}^{t_n} \left||u_\mathcal {I}- U^{n\vartheta } \right||^2 \le \left||u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}} \right||^2 \int _{t_{n-1}}^{t_n} \left( \vartheta - \frac{t - t_{n-1}}{\tau _n} \right) ^2 \end{aligned}$$

and

$$\begin{aligned} \int _{t_{n-1}}^{t_n} \left( \vartheta - \frac{t - t_{n-1}}{\tau _n} \right) ^2 = \frac{\tau _n}{6} \left[ 2 - 6 \vartheta (1 - \vartheta ) \right] \le \frac{\tau _n}{3}. \end{aligned}$$

\(\square \)
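The closed form of the weight integral used in this proof can be cross-checked numerically. The Python sketch below compares it with a composite midpoint rule; the function names and the number of quadrature cells are illustrative choices of ours.

```python
def weight_integral(vartheta, tau):
    """Closed form of the integral of (vartheta - (t - t_{n-1}) / tau)^2 over one time step."""
    return tau / 6.0 * (2.0 - 6.0 * vartheta * (1.0 - vartheta))

def weight_integral_midpoint(vartheta, tau, cells=1000):
    """Composite midpoint approximation of the same integral, with t_{n-1} shifted to 0."""
    h = tau / cells
    total = 0.0
    for i in range(cells):
        t = (i + 0.5) * h              # midpoint of the i-th cell
        total += (vartheta - t / tau) ** 2
    return total * h

for vartheta in (0.0, 0.5, 1.0):
    print(vartheta, weight_integral(vartheta, 0.2), weight_integral_midpoint(vartheta, 0.2))
```

The maximum \(\frac{\tau _n}{3}\) is attained for \(\vartheta \in \{0, 1\}\), while \(\vartheta = \frac{1}{2}\) gives the minimum \(\frac{\tau _n}{12}\).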

Lemma 4.10 and the estimate

$$\begin{aligned} \left||u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}} \right|| \le 2 \sup _{t_{n-1} \le t \le t_n} \left||(u - u_\mathcal {I})(\cdot ,t) \right|| + \sqrt{\tau _n} \left||\partial _t u \right||_{\Omega \times (t_{n-1},t_n)} \end{aligned}$$

yield the following upper bound for the non-linear part of the temporal residual for all parameters \(\varepsilon \), \(\beta \), \(\nu \) and \(\gamma \).

Lemma 4.11

(Non-linear temporal residual and error). For all parameters \(\varepsilon \), \(\beta \), \(\nu \) and \(\gamma \) the non-linear part of the temporal residual is bounded from above by the error and the \(L^2\)-norm of \(\partial _t u\), i.e. for every \(n \in \left\{ 1, \ldots , N_\mathcal {I}\right\} \) we have

$$\begin{aligned} \left||R_{\tau ,\text {nonlin}}(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))}&\le \frac{2 \sqrt{\tau _n}}{\sqrt{3}} \nu L \lambda \gamma \sup _{t_{n-1} \le t \le t_n} \left||(u - u_\mathcal {I})(\cdot ,t) \right|| \\&\quad + \frac{\tau _n}{\sqrt{3}} \nu L \lambda \gamma \left||\partial _t u \right||_{\Omega \times (t_{n-1},t_n)}. \end{aligned}$$

If, on the other hand, \(\nu L \lambda ^2 \gamma \) is sufficiently small, Lemmas 4.4, 4.7 and 4.10 imply that the temporal residual is governed by its linear part.

Lemma 4.12

(Non-linear and linear temporal residual). If \(\widetilde{\kappa } = 25 \left( 2 + {{\mathrm{c}}}_b\right) \nu L \lambda ^2 \gamma < 1\), the temporal residual is governed by its linear part, i.e. for every \(n \in \left\{ 1, \ldots , N_\mathcal {I}\right\} \) we have

$$\begin{aligned}&\frac{2}{25} (1 - \widetilde{\kappa }) \left\{ \left||R_{\tau ,\text {lin}}(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))}^2 + \left||R_h(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))}^2 \right\} ^\frac{1}{2}\\&\quad \le \left||R_\tau (u_\mathcal {I}) + R_h(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))} \\&\quad \quad \le \left( 1 + \frac{2}{25} \widetilde{\kappa }\right) \left\{ \left||R_{\tau ,\text {lin}}(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))} + \left||R_h(u_\mathcal {I}) \right||_{L^2(t_{n-1},t_n;H^{-1}(\Omega ))} \right\} . \end{aligned}$$
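To see how restrictive the smallness condition of Lemma 4.12 is for given data, one can evaluate \(\widetilde{\kappa }\) and the two resulting factors. The following Python sketch does this; the function name and the choice to return `None` when the condition fails are our own conventions.

```python
def lemma_4_12_factors(nu, L, lam, gamma, c_b):
    """Return kappa_tilde and, if kappa_tilde < 1, the lower and upper factors of Lemma 4.12.

    kappa_tilde = 25 * (2 + c_b) * nu * L * lam^2 * gamma
    lower factor = (2/25) * (1 - kappa_tilde)
    upper factor = 1 + (2/25) * kappa_tilde
    """
    kappa_tilde = 25.0 * (2.0 + c_b) * nu * L * lam ** 2 * gamma
    if kappa_tilde >= 1.0:
        # The lemma does not apply; the non-linear part may dominate.
        return kappa_tilde, None
    lower = 2.0 / 25.0 * (1.0 - kappa_tilde)
    upper = 1.0 + 2.0 / 25.0 * kappa_tilde
    return kappa_tilde, (lower, upper)

print(lemma_4_12_factors(nu=0.01, L=1.0, lam=1.0, gamma=1.0, c_b=1.0))
print(lemma_4_12_factors(nu=1.0, L=1.0, lam=1.0, gamma=1.0, c_b=1.0))
```

With \({{\mathrm{c}}}_b= 1\) and \(L = \lambda = \gamma = 1\), the condition requires \(\nu < \frac{1}{75}\), so the lemma covers only weak non-linearities.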

4.7 Bounding the spatial residual

Comparing (4.6) and [21, Equation (3.5)] reveals that [21, Lemma 3.5] yields an upper bound for the spatial residual \(R_h(u_\mathcal {I})\) if we replace the right-hand side \(f^{n\theta }\) there by \(\nu \varphi (U^{n\theta }) g^{n\theta }\). This in particular requires suitable finite element approximations of \(g^{n\theta }\) and \(\varphi (U^{n\theta })\). For the latter there are two natural choices: \(\varphi (\overline{U^{n\theta }}_{\mathcal {T}_n})\) and \(\overline{\varphi (U^{n\theta })}_{\mathcal {T}_n}\) where \(\overline{\psi }_{\mathcal {T}_n}\) denotes a piecewise constant approximation on \(\mathcal {T}_n\) of a given function \(\psi \). In view of the Lipschitz continuity of \(\varphi \), the first option is preferable. These observations yield the following bounds for the spatial residual.

Lemma 4.13

(Bounds for the spatial residual). For every \(n \in \{ 1, \ldots , N_\mathcal {I}\}\) define a spatial error indicator by

and spatial data errors by

$$\begin{aligned} \theta ^n_{\mathcal {T}_n}&\!=\! \left\{ \sum _{K \in \mathcal {T}_n} \hslash _{K}^2 \Bigl ||\nu \varphi (U^{n\theta }) \left( g^{n\theta }_{\mathcal {T}_n} - g^{n\theta } \right) + \nu (\varphi (U^{n\theta }) - \varphi (\overline{U^{n\theta }}_{\mathcal {T}_n})) g^{n\theta }_{\mathcal {T}_n} \right. \\&\quad \quad \quad \quad \left. + (\mathbf {a}^{n\theta }_{\mathcal {T}_n} - \mathbf {a}^{n\theta }) \cdot \nabla U^{n\theta } + (b^{n\theta }_{\mathcal {T}_n} - b^{n\theta }) U^{n\theta } \Bigr ||_K^2 \right\} ^\frac{1}{2}, \\ \Theta ^n_{\mathop {cip},\mathcal {T}_n}&\!=\! \left\{ \sum _{K \in \mathcal {T}_n} \hslash _{K}^2 \left||\left( \mathbf {a}^{n\theta } - \mathbf {a}^{n\theta }_{\mathcal {T}_n} \right) \cdot \nabla U^{n\theta } \right||_{K}^2 + \hslash _{K}^2 h_{K}^2 \left||\nabla \mathbf {a}^{n\theta } \right||_{L^{\infty }(K)} \left||\nabla U^{n\theta } \right||_{K}^2 \right\} ^\frac{1}{2}\!. \end{aligned}$$

Here, \(U^{n\theta } = \theta u^n_{\mathcal {T}_n} + (1 - \theta ) u^{n-1}_{\mathcal {T}_{n-1}}\) is as in (3.2), \(\overline{U^{n\theta }}_{\mathcal {T}_n}\) is a piecewise constant approximation of \(U^{n\theta }\) on \(\mathcal {T}_n\), \(g^{n\theta }\), \(\mathbf {a}^{n\theta }\) and \(b^{n\theta }\) are as in (3.1) and \(g^{n\theta }_{\mathcal {T}_n}\), \(\mathbf {a}^{n\theta }_{\mathcal {T}_n}\) and \(b^{n\theta }_{\mathcal {T}_n}\) are approximations of \(g^{n\theta }\), \(\mathbf {a}^{n\theta }\) and \(b^{n\theta }\) on \(\mathcal {T}_n\). Then, on every interval \((t_{n-1},t_n]\), the dual norm of the spatial residual can be bounded from above by

$$\begin{aligned} \left||{} \left|R_h(u_\mathcal {I}) \right||{} \right|_{*} \le c^\flat \left\{ \left( \eta ^n_{\mathcal {T}_n} \right) ^2 + \left( \theta ^n_{\mathcal {T}_n} \right) ^2 + \sigma _{\mathop {cip}} \left( \Theta ^n_{\mathop {cip},\mathcal {T}_n} \right) ^2 \right\} ^\frac{1}{2}\end{aligned}$$

and from below by

$$\begin{aligned} \eta ^n_{\mathcal {T}_n} \le c_\flat \left[ \left||{} \left|R_h(u_\mathcal {I}) \right||{} \right|_{*} + \theta ^n_{\mathcal {T}_n} \right] . \end{aligned}$$

Here, the parameter \(\sigma _{\mathop {cip}}\) equals 1 for the continuous interior penalty method and vanishes for the other stabilizations. The above error estimates are robust in the sense that the constants \(c^\flat \) and \(c_\flat \) are independent of the parameters \(\varepsilon \), \(\beta \) and \(\nu \).

4.8 A posteriori error estimates

Combining the previous lemmas yields the following a posteriori error estimates.

Theorem 4.14

(A posteriori error estimates). The error between the solution u of problem (2.3) and the solution \(u_\mathcal {I}\) of problem (3.2) is bounded from above by

$$\begin{aligned}&\left\{ \sup _{0< t < T} \left||u - u_\mathcal {I} \right||^2 \!+\! \int _0^T \left||{} \left|u - u_\mathcal {I} \right||{} \right|_{}^2 \!+\! \int _0^T \left||{} \left|\partial _t (u - u_\mathcal {I}) \!+\! \mathbf {a}\cdot \nabla (u - u_\mathcal {I}) \right||{} \right|_{*}^2 \right\} ^\frac{1}{2}\\&\quad \le c^*\left\{ \left||u_0 - u^0_{\mathcal {T}_0} \right||^2 \right. \\&\qquad + \sum _{n = 1}^{N_\mathcal {I}} \tau _n \left[ \left( \eta ^n_{\mathcal {T}_n}\right) ^2 + \left||{} \left|u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}} \right||{} \right|_{}^2 + \left( \widetilde{\eta }^{n}_{\mathcal {T}_n}\right) ^2 + \left||{} \left|\widetilde{u}^{n}_{\mathcal {T}_n} \right||{} \right|_{}^2 + \left( \widetilde{\theta }^{n}_{\mathcal {T}_n}\right) ^2 \right] \\&\qquad + \sum _{n = 1}^{N_\mathcal {I}} \tau _n \left[ \left( \theta ^n_{\mathcal {T}_n}\right) ^2 + \sigma _{\mathop {cip}} \left( \Theta ^n_{\mathop {cip},\mathcal {T}_n} \right) ^2 \right] \\&\qquad \;\; + \left||g - g^{n\vartheta } \right||_{L^\infty (0,T;L^\infty (\Omega ))}^2 \left( 1 + \int _0^T \left||{} \left|u_\mathcal {I} \right||{} \right|_{}^2 \right) \\&\qquad \;\;\left. + \left( \left||\mathbf {a}- \mathbf {a}^{n\theta } \right||_{L^\infty (0,T;L^\infty (\Omega ))}^2 + \left||b - b^{n\theta } \right||_{L^\infty (0,T;L^\infty (\Omega ))}^2 \right) \int _0^T \left||{} \left|u_\mathcal {I} \right||{} \right|_{}^2 \right\} ^\frac{1}{2}\end{aligned}$$

and on each interval \((t_{n-1},t_n]\), \(1 \le n \le N_\mathcal {I}\), from below by

$$\begin{aligned}&\tau _n^\frac{1}{2}\left\{ \left( \eta ^n_{\mathcal {T}_n}\right) ^2 + \left||{} \left|u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}} \right||{} \right|_{}^2 + \left( \widetilde{\eta }^{n}_{\mathcal {T}_n}\right) ^2 + \left||{} \left|\widetilde{u}^{n}_{\mathcal {T}_n} \right||{} \right|_{}^2 \right\} ^\frac{1}{2}\\&\quad \le c_*\left\{ \sup _{t_{n-1} \le t \le t_n} \left||u - u_\mathcal {I} \right||^2 + \int _{t_{n-1}}^{t_n} \left||{} \left|u - u_\mathcal {I} \right||{} \right|_{}^2 \right. \\&\qquad + \int _{t_{n-1}}^{t_n} \left||{} \left|\partial _t (u - u_\mathcal {I}) + \mathbf {a}\cdot \nabla (u - u_\mathcal {I}) \right||{} \right|_{*}^2 \\&\qquad + \tau _n \left( \theta ^n_{\mathcal {T}_n}\right) ^2 + \tau _n \left( \widetilde{\theta }^n_{\mathcal {T}_n}\right) ^2 \\&\qquad + \left||g - g^{n\vartheta } \right||_{L^\infty (t_{n-1},t_n;L^\infty (\Omega ))}^2 \left( 1 + \int _{t_{n-1}}^{t_n} \left||{} \left|u_\mathcal {I} \right||{} \right|_{}^2 \right) \\&\qquad + \left( \left||\mathbf {a}- \mathbf {a}^{n\theta } \right||_{L^\infty (t_{n-1},t_n;L^\infty (\Omega ))}^2 \right. \\&\qquad \left. \left. + \left||b - b^{n\theta } \right||_{L^\infty (t_{n-1},t_n;L^\infty (\Omega ))}^2 \right) \int _{t_{n-1}}^{t_n} \left||{} \left|u_\mathcal {I} \right||{} \right|_{}^2 \right\} ^\frac{1}{2}\\&\qquad + c_{**} \left\{ \tau _n^\frac{1}{2}\sup _{t_{n-1} \le t \le t_n} \left||u - u_\mathcal {I} \right|| + \tau _n \left||\partial _t u \right||_{\Omega \times (t_{n-1},t_n)} \right\} . \end{aligned}$$

Here, the functions \(\widetilde{u}^{n}_{\mathcal {T}_n}\) and the indicators \(\widetilde{\eta }^{n}_{\mathcal {T}_n}\) and \(\widetilde{\theta }^{n}_{\mathcal {T}_n}\) are defined in Lemma 4.9 and the quantities \(\eta ^n_{\mathcal {T}_n}\), \(\theta ^n_{\mathcal {T}_n}\) and \(\Theta ^n_{\mathop {cip},\mathcal {T}_n}\) are as in Lemma 4.13. The functions \(\widetilde{u}^{n}_{\mathcal {T}_n}\) and the indicators \(\widetilde{\eta }^{n}_{\mathcal {T}_n}\) and \(\widetilde{\theta }^{n}_{\mathcal {T}_n}\) may be dropped if \(\varepsilon \gtrsim 1\). The parameter \(\sigma _{\mathop {cip}}\) equals 1 for the continuous interior penalty scheme and vanishes for the other stabilizations. For arbitrary parameters \(\varepsilon \), \(\beta \), \(\nu \) and \(\gamma \), the constant \(c^*\) is proportional to \(\nu L \lambda ^2 \gamma \) and \(e^{\nu L \gamma T}\) with factors depending on the shape parameters \({{\mathrm{C}}}_{\mathcal {T}}\) and \({{\mathrm{C}}}_{\widetilde{\mathcal {T}},\mathcal {T}}\). The constant \(c_*\) is proportional to \(\nu L \lambda ^2 \gamma \) with factors depending on the shape parameters \({{\mathrm{C}}}_{\mathcal {T}}\) and \({{\mathrm{C}}}_{\widetilde{\mathcal {T}},\mathcal {T}}\) and on the polynomial degrees of the finite element functions. The constant \(c_{**}\) is proportional to \(\nu L \lambda \gamma \). If \(\kappa = 2 \nu L \min \{ T, \lambda ^2 \} \gamma < 1\), the constant \(c^*\) only depends on \(\kappa \) and the shape parameters \({{\mathrm{C}}}_{\mathcal {T}}\) and \({{\mathrm{C}}}_{\widetilde{\mathcal {T}},\mathcal {T}}\). If in addition \(\widetilde{\kappa } = 25 \left( 2 + {{\mathrm{c}}}_b\right) \nu L \lambda ^2 \gamma < 1\), the constant \(c_*\) only depends on \(\widetilde{\kappa }\), the shape parameters \({{\mathrm{C}}}_{\mathcal {T}}\) and \({{\mathrm{C}}}_{\widetilde{\mathcal {T}},\mathcal {T}}\) and the polynomial degrees of the finite element functions, and the \(c_{**}\)-term can be dropped.

Proof

For the proof of the first estimate, we observe that the second part of Lemma 4.3 yields an upper bound for this estimate’s left-hand side in terms of \(\left||u_0 - u^0_{\mathcal {T}_0} \right||\) and the dual norm of the residual. The upper bound for the data residual, Lemma 4.5, gives rise to the terms involving \(g - g^{n\vartheta }\), \(\mathbf {a}- \mathbf {a}^{n\theta }\) and \(b - b^{n\theta }\). Lemmas 4.4 and 4.7–4.12 yield bounds for the temporal residual involving the \(u^n_{\mathcal {T}_n} - u^{n-1}_{\mathcal {T}_{n-1}}\), \(\widetilde{\eta }^{n}_{\mathcal {T}_n}\), \(\widetilde{u}^{n}_{\mathcal {T}_n}\) and \(\widetilde{\theta }^{n}_{\mathcal {T}_n}\) terms. Lemma 4.13 finally bounds the spatial residual and gives rise to the remaining terms on the right-hand side of the theorem’s first estimate.

To prove the theorem’s second estimate, first observe that Lemma 4.13 provides an upper bound for \(\eta ^n_{\mathcal {T}_n}\) in terms of the spatial residual and \(\theta ^n_{\mathcal {T}_n}\). Next, Lemmas 4.7, 4.8 and 4.9 give upper bounds for the remaining terms on the left-hand side of the theorem’s second estimate in terms of the linear temporal residual and \(\widetilde{\theta }^n_{\mathcal {T}_n}\). Lemmas 4.4, 4.11 and 4.12 allow us to bound the sum of the norms of the spatial and linear temporal residual by the norm of the sum of the spatial and full temporal residual plus the \(c_{**}\)-term on the right-hand side. Finally, Lemma 4.5 for the data residual and the first part of Lemma 4.3 give rise to the remaining terms on the right-hand side of the theorem’s second estimate. \(\square \)
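In an adaptive code, the computable part of the upper bound in Theorem 4.14 is typically accumulated time step by time step. The following Python sketch shows one possible way to do this, assuming that the individual indicators have already been evaluated by a finite element code. The class `StepIndicators` and the function `upper_estimator` are hypothetical names. The constant \(c^*\) and the temporal data terms involving \(g - g^{n\vartheta }\), \(\mathbf {a}- \mathbf {a}^{n\theta }\) and \(b - b^{n\theta }\) are deliberately left out and would have to be added separately.

```python
import math
from dataclasses import dataclass

@dataclass
class StepIndicators:
    """Per-time-step quantities entering the upper bound (all values, not squares)."""
    tau: float                 # time-step size tau_n
    eta: float                 # spatial indicator eta^n
    jump: float                # energy norm of u^n - u^{n-1}
    eta_tilde: float = 0.0     # indicator of the auxiliary problem (Lemma 4.9)
    u_tilde: float = 0.0       # energy norm of the auxiliary solution
    theta_tilde: float = 0.0   # data error of the auxiliary problem
    theta: float = 0.0         # spatial data error theta^n
    theta_cip: float = 0.0     # additional data error for the CIP stabilization

def upper_estimator(initial_error, steps, sigma_cip=0.0):
    """Square root of the initial error plus the tau_n-weighted sums of squared indicators."""
    total = initial_error ** 2
    for s in steps:
        total += s.tau * (s.eta ** 2 + s.jump ** 2 + s.eta_tilde ** 2
                          + s.u_tilde ** 2 + s.theta_tilde ** 2)
        total += s.tau * (s.theta ** 2 + sigma_cip * s.theta_cip ** 2)
    return math.sqrt(total)

steps = [StepIndicators(tau=0.5, eta=2.0, jump=0.0),
         StepIndicators(tau=0.5, eta=0.0, jump=2.0)]
print(upper_estimator(1.0, steps))
```

For \(\varepsilon \gtrsim 1\) the fields `eta_tilde`, `u_tilde` and `theta_tilde` can be left at zero, in line with the remark in the theorem that these terms may be dropped.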