1 Introduction

Let us first recall the a priori error estimate which holds for the approximation, by a conforming numerical method, of the homogeneous Dirichlet problem in a bounded nonempty open set \(\Omega \) of \({\mathbb {R}}^d\), \(d\ge 1\). Let \(\overline{u}\in H^1_0(\Omega )\) and \(u_{h}\in U_{h}\subset H^1_0(\Omega )\) (where \(U_{h}\) is a finite dimensional vector space), be the respective solutions of

$$\begin{aligned} \forall v\in H^1_0(\Omega ),\ \langle \nabla \overline{u},\nabla v\rangle _{L^2} = \langle f,v\rangle _{L^2} \end{aligned}$$
(1.1)

and

$$\begin{aligned} \forall v\in U_{h},\ \langle \nabla u_{h},\nabla v\rangle _{L^2} = \langle f,v\rangle _{L^2}, \end{aligned}$$

where \(f\in L^2(\Omega )\) is given, and where \(\langle \cdot ,\cdot \rangle _{L^2}\) denotes the scalar product in \(L^2(\Omega )^d\) or in \(L^2(\Omega )\). It is well-known that Céa’s Lemma [5] yields the following error estimate:

$$\begin{aligned} \inf _{v\in U_{h}} \mathop {\delta }\nolimits (\overline{u},v) \le \mathop {\delta }\nolimits (\overline{u},u_{h}) \le (1+ \textrm{diam}(\Omega )) \inf _{v\in U_{h}} \mathop {\delta }\nolimits (\overline{u},v), \end{aligned}$$

where, for any \(v\in U_{h}\), \(\mathop {\delta }\nolimits (\overline{u},v)\) measures the distance between the element \(\overline{u}\in H^1_0(\Omega )\) and the element \(v\in U_{h}\), as defined by

$$\begin{aligned} \mathop {\delta }\nolimits (\overline{u},v)^2 = \Vert \nabla \overline{u} - \nabla v\Vert _{L^2}^2+\Vert \overline{u} - v\Vert _{L^2}^2. \end{aligned}$$

The above error estimate is optimal, since it shows that the approximation error \(\mathop {\delta }\nolimits (\overline{u},u_{h})\) has the same order as the interpolation error \(\inf _{v\in U_{h}} \mathop {\delta }\nolimits (\overline{u},v)\). Such a generic error estimate is then used to determine the order of the method when the solution enjoys additional regularity, in which case the interpolation error is controlled by higher-order derivatives of the solution.
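As an illustration, the following sketch (assuming a 1D model problem on \(\Omega =(0,1)\) with \(\overline{u}(x)=\sin (\pi x)\) and conforming P1 finite elements, neither of which is prescribed by the text) computes \(\delta (\overline{u},u_h)\) and the error of the nodal interpolant, which bounds the interpolation error from above, and checks the right inequality of the estimate with \(\textrm{diam}(\Omega )=1\):

```python
import numpy as np

# 2-point Gauss quadrature nodes/weights on the reference cell [0, 1]
GP = np.array([0.5 - 0.5 / np.sqrt(3.0), 0.5 + 0.5 / np.sqrt(3.0)])
GW = np.array([0.5, 0.5])

def delta(vals, x, uex, duex):
    """delta(u_ex, v_h)^2 = |u_ex' - v_h'|_{L2}^2 + |u_ex - v_h|_{L2}^2 for the
    P1 function with nodal values `vals` on the mesh `x` (hypothetical helper)."""
    err2 = 0.0
    for j in range(len(x) - 1):
        h = x[j + 1] - x[j]
        xq = x[j] + h * GP                            # quadrature points
        vh = vals[j] + (vals[j + 1] - vals[j]) * GP   # P1 values at xq
        dvh = (vals[j + 1] - vals[j]) / h             # constant P1 gradient
        err2 += h * np.sum(GW * ((duex(xq) - dvh) ** 2 + (uex(xq) - vh) ** 2))
    return float(np.sqrt(err2))

def solve_p1(n, f):
    """Galerkin P1 solution of -u'' = f on (0,1), u(0) = u(1) = 0 (uniform mesh)."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    A = (2 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)) / h
    xi = x[1:-1]
    b = h / 3 * (f(xi - h / 2) + f(xi) + f(xi + h / 2))  # Simpson quadrature
    u = np.zeros(n + 1)
    u[1:-1] = np.linalg.solve(A, b)
    return x, u

uex = lambda s: np.sin(np.pi * s)
duex = lambda s: np.pi * np.cos(np.pi * s)
f = lambda s: np.pi ** 2 * np.sin(np.pi * s)

for n in (16, 32):
    x, u = solve_p1(n, f)
    d_approx = delta(u, x, uex, duex)         # approximation error
    d_interp = delta(uex(x), x, uex, duex)    # >= the interpolation error
    assert d_approx <= 2 * d_interp           # Cea's estimate, diam(Omega) = 1
```

Halving the mesh size roughly halves both errors, consistent with the first-order interpolation properties of P1 elements for this \(H^2\)-regular solution.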

Turning to approximations of the function \(\overline{u}\) which are nonconforming (i.e. no longer belonging to the space in which the problem is well-posed), we consider the framework of the Gradient Discretisation Method (GDM) [11]. This framework provides a setting in which convergence results for numerical schemes applied to (linear and nonlinear) elliptic and parabolic problems can be proved without relying on the specificities of each method; instead, it identifies the key properties that enable the proof of convergence. As a consequence, any error estimate or convergence result established in the GDM framework readily applies to all methods covered by the framework, and these are numerous: conforming, nonconforming and mixed finite elements, discontinuous Galerkin methods, but also finite volume methods, mimetic finite differences, arbitrary-order fully discrete polytopal schemes, and even meshless methods. We refer the reader to [11, Part III] and [8, 9, 12, 15, 16] for a non-exhaustive list of the methods that are gradient discretisation methods.

The GDM framework is based on a triplet of a space \(X_{h}\) and two operators \(\textsc {P}_{h}:X_{h}\rightarrow L^2(\Omega )\) and \({{\mathbf {{\small {\uppercase {G}}}}}}_{h}:X_{h}\rightarrow L^2(\Omega )^d\). The space is a finite-dimensional real vector space that encodes the degrees of freedom of the approximate solution. This space can take many shapes: for conforming methods, for example, it can be a subspace of \(H^1_0(\Omega )\); for methods having cell and/or face averages as unknowns, on the other hand, it could simply be \({\mathbb {R}}^K\) with K the total number of cells and faces. The operator \(\textsc {P}_{h}\) reconstructs, from an element in \(X_{h}\), a function in \(L^2(\Omega )\), while \({{\mathbf {{\small {\uppercase {G}}}}}}_{h}\) reconstructs a (discrete) gradient. These two reconstructions are used to define the scheme associated with \((X_{h},\textsc {P}_{h},{{\mathbf {{\small {\uppercase {G}}}}}}_{h})\), obtained by substituting in the weak formulation of the PDE the trial and test functions (resp. their gradient) by the reconstructed functions (resp. gradients). Each particular choice of triplet \((X_{h},\textsc {P}_{h},{{\mathbf {{\small {\uppercase {G}}}}}}_{h})\) corresponds to a specific scheme. We refer the reader to [11, Chapter 1] for a more thorough introduction to the GDM, and how the generic underlying ideas can be built from the ground up.

The resulting scheme for (1.1) reads: find \(u_{h}\in X_{h}\) such that

$$\begin{aligned} \forall v\in X_{h},\ \langle {{\mathbf {{\small {\uppercase {G}}}}}}_{h}u_{h},{{\mathbf {{\small {\uppercase {G}}}}}}_{h}v\rangle _{L^2} = \langle f,\textsc {P}_{h}v\rangle _{L^2}. \end{aligned}$$
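To make the triplet concrete, here is a minimal sketch of a gradient discretisation and of the scheme above, for an assumed 1D example that is not taken from the paper: a two-point finite-volume method on \(\Omega =(0,1)\), which is nonconforming since \(\textsc {P}_{h}\) reconstructs piecewise-constant functions.

```python
import numpy as np

n = 64
h = 1.0 / n
xc = (np.arange(n) + 0.5) * h                  # cell centres of Omega = (0, 1)
# X_h = R^n: one unknown per cell.  P_h v is the piecewise-constant function
# equal to v[i] on cell i, hence <f, P_h v>_{L2} ~ h * f(xc) @ v.
# G_h v is piecewise constant on the n + 1 "diamonds" between cell centres
# (half-cells at the boundary, where the Dirichlet value 0 is used):
lens = np.full(n + 1, h)
lens[0] = lens[-1] = h / 2
D = np.eye(n + 1, n) - np.eye(n + 1, n, k=-1)  # jumps v_i - v_{i-1}
# <G_h u, G_h v>_{L2} = sum_j lens[j] * (Du/lens)[j] * (Dv/lens)[j]:
A = D.T @ np.diag(1.0 / lens) @ D              # matrix of the scheme
f = np.pi ** 2 * np.sin(np.pi * xc)            # source for u(x) = sin(pi x)
u = np.linalg.solve(A, h * f)                  # <G_h u, G_h v> = <f, P_h v>
err = np.max(np.abs(u - np.sin(np.pi * xc)))
assert err < 1e-2
```

The resulting matrix is the classical two-point flux finite-volume stencil, which illustrates how a specific choice of triplet \((X_{h},\textsc {P}_{h},{{\mathbf {{\small {\uppercase {G}}}}}}_{h})\) yields a specific scheme.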

Then the following error estimate [11, Theorem 2.28] is a reformulation of G. Strang’s second lemma [24]:

$$\begin{aligned} \frac{1}{2} \left[ \zeta _{h}(\nabla \overline{u}) + \inf _{v\in X_{h}} \mathop {\delta }\nolimits (\overline{u},v)\right] \le \mathop {\delta }\nolimits (\overline{u},u_{h}) \le (1+ p_{h}) \left[ \zeta _{h}(\nabla \overline{u}) + \inf _{v\in X_{h}} \mathop {\delta }\nolimits (\overline{u},v)\right] , \end{aligned}$$

where \(\mathop {\delta }\nolimits (\overline{u},v)\), which measures the distance between the element \(\overline{u}\in H^1_0(\Omega )\) and the element \(v\in X_{h}\), is such that

$$\begin{aligned} \mathop {\delta }\nolimits (\overline{u},v)^2 = \Vert \nabla \overline{u} - {{\mathbf {{\small {\uppercase {G}}}}}}_{h}v\Vert _{L^2}^2+\Vert \overline{u} - \textsc {P}_{h}v\Vert _{L^2}^2, \end{aligned}$$

and \(\zeta _{h}(\nabla \overline{u})\), which measures the conformity error of the method (it vanishes in the case of conforming methods), is defined by

$$\begin{aligned} \zeta _{h}(\nabla \overline{u}) = \max _{v\in X_{h}{\setminus } \{0\} } \frac{\langle \nabla \overline{u},{{\mathbf {{\small {\uppercase {G}}}}}}_{h}v\rangle _{L^2} + \langle \textrm{div}(\nabla \overline{u}),\textsc {P}_{h}v\rangle _{L^2}}{ \Vert {{\mathbf {{\small {\uppercase {G}}}}}}_{h}v\Vert _{L^2} }. \end{aligned}$$
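In finite dimensions this quantity is computable exactly: the numerator is an integration-by-parts defect, assembled here with the sign \(\langle \nabla \overline{u},{{\mathbf {{\small {\uppercase {G}}}}}}_{h}v\rangle + \langle \textrm{div}(\nabla \overline{u}),\textsc {P}_{h}v\rangle \) so that it vanishes for conforming reconstructions, and its norm relative to \(\Vert {{\mathbf {{\small {\uppercase {G}}}}}}_{h}v\Vert \) is \(\sqrt{\ell ^T A^{-1}\ell }\), where \(\ell \) assembles the defect and A is the Gram matrix of \({{\mathbf {{\small {\uppercase {G}}}}}}_{h}\). The sketch below (an assumed 1D two-point finite-volume discretisation with \(\overline{u}(x)=\sin (\pi x)\), not taken from the paper) shows that the defect shrinks under mesh refinement:

```python
import numpy as np

def conformity_error(n):
    """zeta_h(grad u) for u(x) = sin(pi x) and an assumed 1D two-point
    finite-volume gradient discretisation of (0, 1) with n cells: P_h is
    piecewise constant per cell, G_h is piecewise constant on the "diamonds"
    between cell centres (Dirichlet value 0 at x = 0, 1)."""
    h = 1.0 / n
    pts = np.concatenate(([0.0], (np.arange(n) + 0.5) * h, [1.0]))  # diamond ends
    lens = np.diff(pts)                                   # diamond lengths
    D = np.eye(n + 1, n) - np.eye(n + 1, n, k=-1)         # jumps; G_h v = Dv/lens
    A = D.T @ np.diag(1.0 / lens) @ D                     # Gram matrix of G_h
    # <grad u, G_h v>: exact integral of pi cos(pi x) on each diamond
    c = np.sin(np.pi * pts[1:]) - np.sin(np.pi * pts[:-1])
    # <div(grad u), P_h v>: exact integral of -pi^2 sin(pi x) on each cell
    q = np.pi * np.cos(np.pi * np.arange(n + 1) * h)      # pi cos at cell faces
    ell = D.T @ (c / lens) + (q[1:] - q[:-1])             # defect, per cell unknown
    # zeta_h = max_v (ell . v) / |G_h v| = sqrt(ell^T A^{-1} ell)
    return float(np.sqrt(ell @ np.linalg.solve(A, ell)))

z8, z32 = conformity_error(8), conformity_error(32)
assert z32 < z8 < 0.1      # small, and shrinking under refinement
```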

The value \(p_{h}\) is associated with the discrete Poincaré inequality

$$\begin{aligned} \Vert \textsc {P}_{h}v\Vert _{L^2} \le p_{h}\Vert {{\mathbf {{\small {\uppercase {G}}}}}}_{h}v\Vert _{L^2},\hbox { for all }v\in X_{h}. \end{aligned}$$
(1.2)
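For a given discretisation, the best constant \(p_{h}\) is computable: \(p_{h}^2\) is the largest eigenvalue of the pencil formed by the Gram matrices of \(\textsc {P}_{h}\) and \({{\mathbf {{\small {\uppercase {G}}}}}}_{h}\). A sketch for an assumed 1D two-point finite-volume discretisation (not taken from the paper), where \(p_{h}\) stays close to the continuous Poincaré constant \(1/\pi \) of the interval (0, 1):

```python
import numpy as np

def poincare_const(n):
    """Smallest p_h with |P_h v|_{L2} <= p_h |G_h v|_{L2} for an assumed 1D
    two-point finite-volume discretisation of (0, 1) with n cells: P_h is
    piecewise constant per cell, G_h the two-point difference gradient on the
    diamonds between cell centres (Dirichlet value 0 at x = 0, 1)."""
    h = 1.0 / n
    lens = np.full(n + 1, h)
    lens[0] = lens[-1] = h / 2                       # boundary half-diamonds
    D = np.eye(n + 1, n) - np.eye(n + 1, n, k=-1)    # jumps v_i - v_{i-1}
    A = D.T @ np.diag(1.0 / lens) @ D                # Gram matrix of G_h
    M = h * np.eye(n)                                # Gram matrix of P_h
    # p_h^2 is the largest eigenvalue of the pencil (M, A)
    return float(np.sqrt(np.linalg.eigvals(np.linalg.solve(A, M)).real.max()))

for n in (16, 64):
    assert abs(poincare_const(n) - 1 / np.pi) < 0.02   # close to 1/pi, stable in n
```

The stability of \(p_{h}\) under refinement is exactly the boundedness assumption used below.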

In the case where \(p_{h}\) is bounded independently of the accuracy of the approximation (for example, for mesh-based methods, \(p_{h}\) only depends on a regularity factor of the meshes), this error estimate is again optimal: the approximation error has the same order as the sum of the interpolation and conformity errors.

Hence, in the conforming case, the order of the method is only determined by the interpolation properties of \(U_{h}\), and in the nonconforming one, by the interpolation and conformity properties of \((X_{h},\textsc {P}_{h},{{\mathbf {{\small {\uppercase {G}}}}}}_{h})\).

In the case of parabolic problems, a large part of the literature only provides error estimates assuming supplementary regularity of the solution. For example, in [10], an error estimate is established for the GDM approximation of the heat equation under the condition that the exact solution of the problem belongs to the space \(W^{1,\infty }(0,T;W^{2,\infty }(\Omega ))\). Error estimate results for linear parabolic problems in the spirit of Céa’s Lemma have only recently been published. These results are based on variational formulations of the parabolic problem and on an inf-sup inequality satisfied by the involved bilinear form (see [7, XVIII.3 Théorème 2] for first results, and [23, III Proposition 2.3, p. 112] for a more complete formulation); they concern either semi-discrete numerical schemes (continuous in time, discrete in space), see for example [6, 25], or fully discrete time-space problems [3, 20, 21, 26]. In [2], similar optimal results are obtained for the full time-space approximation of linear parabolic partial differential equations, using Euler schemes or a discontinuous Galerkin scheme in time, together with conforming approximations. Let us describe more precisely the result obtained in [2], in the case of the implicit Euler scheme for the heat equation. Let \(T>0\), \(\xi _0\in L^2(\Omega )\) and \(f\in L^2(0,T;L^2(\Omega ))\) be given and let \(\overline{u}\in W {:}{=} H^1(0,T;H^{-1}(\Omega ))\cap L^2(0,T;H^1_0(\Omega ))\) (equivalently \(W = \{ u \in L^2(0,T;H^1_0(\Omega )) : \partial _t u \in L^2(0,T;H^{-1}(\Omega ))\}\)) be the solution of: \(\overline{u}(0) = \xi _0\) and, for a.e. \(t\in (0,T)\),

$$\begin{aligned} \forall v\in H^1_0(\Omega ),\ \langle \partial _t \overline{u}(t),v\rangle _{H^{-1},H^1_0} + \langle \nabla \overline{u}(t),\nabla v\rangle _{L^2} = \langle f(t),v\rangle _{L^2}. \end{aligned}$$

The existence and uniqueness of \(\overline{u}\) is due to Lions [19, Théorème 1.1, p. 46], see also [17, Théorème 4.29]. Let \(N\in {\mathbb {N}}\backslash \{0\}\) and \(U_{h}\subset H^1_0(\Omega )\) be given (as above, \(U_{h}\) is assumed to be a finite dimensional vector space). Let \(u_{h}{:}{=} (u^{(m)})_{m=0,\ldots ,N}\in W_{h}{:}{=} U_{h}^{N+1}\) be the solution of: \(u^{(0)} = {\mathcal {P}}^{L^2}_{U_{h}}(\xi _0)\) (orthogonal projection on \(U_{h}\) in \(L^2(\Omega )\)) and, for \(m=1,\ldots ,N\),

$$\begin{aligned} \forall v\in U_{h},\ \langle \frac{u^{(m)}- u^{(m-1)}}{k},v\rangle _{L^2} + \langle \nabla u^{(m)},\nabla v\rangle _{L^2} = \langle f^{(m)},v\rangle _{L^2}, \end{aligned}$$

with \(k=T/N\) and \(f^{(m)} = \frac{1}{k} \int _{(m-1)k}^{mk} f(t)\textrm{d}t\). Then it is shown in [2] that

$$\begin{aligned} \inf _{v\in W_{h}} \delta ^{(T)}(\overline{u},v)\le \delta ^{(T)}(\overline{u},u_{h}) \le C \inf _{v\in W_{h}} \delta ^{(T)}(\overline{u},v), \end{aligned}$$

where \( \delta ^{(T)}(\overline{u},v)\) is a suitable distance between the elements of W and those of \(W_{h}\), and C only depends on T and \(\Omega \). Note that a single bilinear form underlies both formulations, and the associated inf-sup inequalities cover both the continuous and the discrete case, even though the discrete approximation is not conforming in W.
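The implicit Euler/conforming Galerkin scheme above can be sketched as follows (assuming, for illustration only, a 1D heat equation on (0, 1) with \(f=0\), \(\xi _0(x)=\sin (\pi x)\) and P1 finite elements), the discrete solution being compared with the exact solution \(\overline{u}(t,x)=e^{-\pi ^2 t}\sin (\pi x)\):

```python
import numpy as np

def heat_euler(n, N, T=0.1):
    """Implicit Euler / P1 finite elements for the 1D heat equation
    u_t - u_xx = 0 on (0, 1), u(0, .) = sin(pi x), homogeneous Dirichlet BC
    (an assumed test case with known exact solution)."""
    h, dt = 1.0 / n, T / N
    x = np.linspace(0.0, 1.0, n + 1)[1:-1]           # interior nodes
    A = (2 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)) / h
    M = h / 6 * (4 * np.eye(n - 1) + np.eye(n - 1, k=1) + np.eye(n - 1, k=-1))
    # u^(0) = L2-orthogonal projection of xi_0 on U_h: M u0 = ((xi_0, phi_j))_j,
    # the right-hand side being evaluated by Simpson quadrature per node.
    rhs = h / 3 * (np.sin(np.pi * (x - h / 2)) + np.sin(np.pi * x)
                   + np.sin(np.pi * (x + h / 2)))
    u = np.linalg.solve(M, rhs)
    for _ in range(N):                               # (M + dt A) u^m = M u^(m-1)
        u = np.linalg.solve(M + dt * A, M @ u)
    return x, u

x, u = heat_euler(32, 50)
err = np.max(np.abs(u - np.exp(-np.pi ** 2 * 0.1) * np.sin(np.pi * x)))
assert err < 0.02     # O(k) + O(h^2) accuracy on this smooth solution
```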

The present work establishes an optimal error estimate result for the full time-space approximation of linear parabolic partial differential equations, using the implicit Euler scheme together with the GDM for the approximation of the continuous operators, without assuming a stronger regularity than the natural hypothesis \(\overline{u}\in W\). Our analysis also includes conforming methods with mass lumping: the latter technique is widely used, for stability reasons, in the real-life implementation of conforming finite element methods for parabolic problems. Indeed, the implementation of the mass lumping, often viewed as a numerical integration approximation, is in fact a change of the approximation space which yields a nonconformity error (see, e.g., the presentation in [11, Sect. 8.4]), and the resulting implicit Euler method is thus a doubly nonconforming method, both in space and in time.

Let us describe such a doubly nonconforming scheme in the case of the discretisation of the heat equation. Given \((X_{h},\textsc {P}_{h},{{\mathbf {{\small {\uppercase {G}}}}}}_{h})\) for the nonconforming approximation of an elliptic problem by the GDM, the time-space approximation is defined through \(u_{h}{:}{=} (u^{(m)})_{m=0,\ldots ,N}\in W_{h}{:}{=} X_{h}^{N+1}\), solution of: \(\textsc {P}_{h}u^{(0)} = {\mathcal {P}}^{L^2}_{\textsc {P}_{h}(X_{h})}(\xi _0)\) (orthogonal projection on \(\textsc {P}_{h}(X_{h})\) in \(L^2(\Omega )\)) and, for \(m=1,\ldots ,N\),

$$\begin{aligned} \forall v\in X_{h},\ \langle \textsc {P}_{h}\frac{u^{(m)}- u^{(m-1)}}{k},\textsc {P}_{h}v\rangle _{L^2} + \langle {{\mathbf {{\small {\uppercase {G}}}}}}_{h}u^{(m)},{{\mathbf {{\small {\uppercase {G}}}}}}_{h}v\rangle _{L^2} = \langle f^{(m)},\textsc {P}_{h}v\rangle _{L^2}, \end{aligned}$$

defining k and \(f^{(m)}\) as above. Then our main result (expressed in Theorem 4.1) states that

$$\begin{aligned} \frac{1}{2} \left[ \zeta ^{(T)}_{h}({\varvec{v}}) + \inf _{v\in W_{h}} \delta ^{(T)}(\overline{u},v)\right] \le \delta ^{(T)}(\overline{u},u_{h}) \le C_{h}\left[ \zeta ^{(T)}_{h}({\varvec{v}}) + \inf _{v\in W_{h}} \delta ^{(T)}(\overline{u},v)\right] , \end{aligned}$$

where \({\varvec{v}}\in L^2(0,T;H_{\div }(\Omega ))\) is computed from \(\overline{u}\) by (4.5), \(\zeta ^{(T)}_{h}({\varvec{v}})\) defined by (4.4) again measures the conformity error of the method (and again vanishes in the case of conforming methods), and \(\delta ^{(T)}(\overline{u},v)\) measures the distance between the element \(\overline{u}\in W\) and the element \(v\in W_{h}\) [see (4.3)]. The real number \(C_{h}\) depends continuously on \(p_{h}\) [see (1.2)] which remains bounded for any reasonable nonconforming method [11]. This error estimate is established in the case of nonconforming methods for a general parabolic problem with general time conditions which include periodic boundary conditions.

This paper is organized as follows. In Sect. 2, we establish the continuous setting for parabolic problems with generic Cauchy data (initial or periodic, for example): this setting is presented in an abstract way (based on generic Hilbert spaces and unbounded operators between them, instead, say, of \(H^1_0\) spaces and the specific case of the Laplacian or some anisotropic diffusion operator) to demonstrate the generality of our analysis, which covers for example diffusion and advection–diffusion PDEs, as well as higher-order models. In Sect. 3, we recall the general setting of the gradient discretisation method (GDM) and define the GDM for the approximation of space-time parabolic problems. Section 4 is concerned with Theorem 4.1, which is our main result and which states the error estimate between the space-time GDM approximation and the exact solution under the natural regularity assumptions given by the existence and uniqueness theorem of Sect. 2. The proof of this theorem relies on a series of technical lemmas establishing an inf-sup property on a bilinear form involved in the continuous and the discrete formulation. In Sect. 5, interpolation results are proved on a dense subspace of the solution space, hence leading to convergence results. Finally, Sect. 6 provides a numerical confirmation of the error estimate result, on problems with low-regularity solutions. In the examples that are considered here, the nonconformity error (which in one case includes the effect of mass lumping) is smaller than the interpolation error.

2 The parabolic problem

Let \(L\) and \({\varvec{L}}\) be separable Hilbert spaces; let \({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}\subset L\) be a dense subspace of \(L\) and let \({{\mathbf {{\small {\uppercase {G}}}}}}:{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}\rightarrow {\varvec{L}}\) be a linear operator whose graph \({\mathcal {G}} = \{ (u,{{\mathbf {{\small {\uppercase {G}}}}}}u), u\in {H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}\}\) is closed in \(L\times {\varvec{L}}\).

As a consequence, \({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}\) endowed with the graph norm \(\left| u\right| _{{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}},{\mathcal {G}}}^2 = \left| u\right| _{L}^2 + \left| {{\mathbf {{\small {\uppercase {G}}}}}}u\right| _{{\varvec{L}}}^2\) is a Hilbert space continuously embedded in \(L\). We assume that the graph norm is equivalent to \(\left| {{\mathbf {{\small {\uppercase {G}}}}}}u\right| _{{\varvec{L}}}\), which means that there exists a Poincaré constant \(C_P\) such that

$$\begin{aligned} \left| u\right| _{L}\le C_P \left| {{\mathbf {{\small {\uppercase {G}}}}}}u\right| _{{\varvec{L}}}\hbox { for all }u\in {H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}. \end{aligned}$$
(2.1)

As a consequence, we use from hereon the norm \(\left| {\cdot }\right| _{{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}}{:}{=} \left| {{{\mathbf {{\small {\uppercase {G}}}}}}\cdot }\right| _{{\varvec{L}}}\) on \({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}\). Since \(L\times {\varvec{L}}\) is separable, \({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}\) is also separable for the norm \(\left| \cdot \right| _{{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}}\) (see [4, Chap. III]).

Remark 2.1

(Example of spaces and operators) In the case of the heat equation, considering homogeneous Dirichlet boundary conditions, we let \(L= L^2(\Omega )\), \({\varvec{L}}= L^2(\Omega )^d\) and \({{\mathbf {{\small {\uppercase {G}}}}}}v = \nabla v\) for all v. If we consider an initial value problem with homogeneous Neumann boundary conditions, using the change of variable \(w(t) = \exp (-t)u(t)\) it is possible to consider \({\varvec{L}}= L^2(\Omega )^d\times L^2(\Omega )\) and \({{\mathbf {{\small {\uppercase {G}}}}}}v = (\nabla v,v)\) for all v. This change of variable is no longer possible in the case of periodic time boundary conditions. Notice that, for some right-hand sides with zero mean value, the solution may be periodic; in this case, it is possible to choose \({\varvec{L}}= L^2(\Omega )^d\times {\mathbb {R}}\) and \({{\mathbf {{\small {\uppercase {G}}}}}}v = (\nabla v,\int _\Omega v(\varvec{x})\textrm{d}\varvec{x})\).

In the following, the notation \(\langle \cdot , \cdot \rangle _{Z}\) denotes the inner product in a given Hilbert space Z, and \(\langle \cdot , \cdot \rangle _{Z',Z}\) denotes the duality pairing between the dual \(Z'\) of a given Banach space Z and Z itself. Define \({\varvec{H}}_\textsc {D}\) by:

$$\begin{aligned} {\varvec{H}}_\textsc {D}= \big \{ {\varvec{v}}\in {\varvec{L}}:\, \exists w\in L, \forall u\in {H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}, \langle {\varvec{v}},{{\mathbf {{\small {\uppercase {G}}}}}}u\rangle _{{\varvec{L}}} + \langle w,u\rangle _{L} = 0\big \}. \end{aligned}$$
(2.2)

The density of \({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}\) in \(L\) implies (and is actually equivalent to) the following property.

$$\begin{aligned} \text{ For } \text{ all } w\in L\text{, } \big (\forall u\in {H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}, \langle w,u\rangle _{L} = 0\big )\Rightarrow w = 0. \end{aligned}$$
(2.3)

Therefore, for any \({\varvec{v}}\in {\varvec{H}}_\textsc {D}\), the element \(w\in L\) whose existence is assumed in (2.2) is unique; this defines a linear operator \(\textsc {D}:{\varvec{H}}_\textsc {D}\rightarrow L\), so that

$$\begin{aligned} \forall u\in {H_{{{\mathbf {{\small {\uppercase {G}}}}}}}},\ \forall {\varvec{v}}\in {\varvec{H}}_\textsc {D},\ \langle {\varvec{v}},{{\mathbf {{\small {\uppercase {G}}}}}}u\rangle _{{\varvec{L}}} + \langle \textsc {D}{\varvec{v}},u\rangle _{L} = 0. \end{aligned}$$
(2.4)
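In finite dimensions, (2.4) simply states that \(\textsc {D}\) is minus the adjoint of \({{\mathbf {{\small {\uppercase {G}}}}}}\) with respect to the two inner products, as the following toy sketch verifies (the matrices standing for \({{\mathbf {{\small {\uppercase {G}}}}}}\) and for the Gram matrices of \(L\) and \({\varvec{L}}\) are assumed random data, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 4, 6                                    # toy dimensions of L and bold-L
G = rng.standard_normal((q, p))                # "gradient" G : L -> bold-L
R1, R2 = rng.standard_normal((p, p)), rng.standard_normal((q, q))
ML = R1 @ R1.T + p * np.eye(p)                 # SPD Gram matrix of L
MvL = R2 @ R2.T + q * np.eye(q)                # SPD Gram matrix of bold-L
# (2.4) forces D = -ML^{-1} G^T MvL, i.e. minus the adjoint of G:
D = -np.linalg.solve(ML, G.T @ MvL)
u, v = rng.standard_normal(p), rng.standard_normal(q)
defect = v @ MvL @ (G @ u) + (D @ v) @ ML @ u  # <v, Gu>_{bold-L} + <Dv, u>_L
assert abs(defect) < 1e-9
```

In infinite dimensions the same relation holds, but only on the domain \({\varvec{H}}_\textsc {D}\), which is in general a strict subspace of \({\varvec{L}}\).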

Remark 2.2

(Divergence and operator \(\textsc {D}\)) In the case where \({{\mathbf {{\small {\uppercase {G}}}}}}\) is the standard gradient operator, \(\textsc {D}\) is the standard divergence operator. These notations match the choice made in [13] for stationary problems.

It easily follows from this that the graph of \(\textsc {D}\) is closed in \({\varvec{L}}\times L\), and therefore that, endowed with the graph norm \(\left| {\varvec{v}}\right| _{{\varvec{H}}_\textsc {D}} = \left| {\varvec{v}}\right| _{{{\varvec{L}}}} + \left| \textsc {D}{\varvec{v}}\right| _{L}\), \({\varvec{H}}_\textsc {D}\) is a Hilbert space continuously embedded and dense in \({\varvec{L}}\) (see [18, Theorem 5.29, p. 168]).

The continuous framework for linear parabolic problems with general time boundary conditions starts with the usual identification of the space \(L\) with a subspace of \({{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}}'\) by letting

$$\begin{aligned} \langle y,u\rangle _{{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}',{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}} = \langle y,u\rangle _{L},\hbox { for all }y\in L, \ u\in {H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}. \end{aligned}$$

This identification yields the Gelfand triple

$$\begin{aligned} {{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}}{\mathop {\hookrightarrow }\limits ^{d}} L\hookrightarrow {{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}}', \end{aligned}$$

where the superscript d recalls that the first embedding is dense. Let \(T>0\), and recall that we may identify the dual space \({L^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})}'\) with \(L^2(0,T;{{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}}')\) and the space \({L^2(0,T;L)}'\) with \(L^2(0,T;{L})\); hence we have a further Gelfand triple

$$\begin{aligned} {L^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})}{\mathop {\hookrightarrow }\limits ^{d}} {L^2(0,T;L)} \hookrightarrow {L^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})'}. \end{aligned}$$

The classical space W associated with the Gelfand triple is defined by

$$\begin{aligned} \begin{aligned} W = \Big \{u\in {L^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})};\ \exists C\ge 0&\text{ such } \text{ that } \langle u, v'\rangle _{{L^2(L)}}\le C\Vert v\Vert _{{L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})}}\\&\text{ for } \text{ all } v\in C^1_c((0,T);{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})\Big \}. \end{aligned} \end{aligned}$$

The “time derivative” of \(u\in W\) may then be defined as the element of the space \( L^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})'\) identified with the space \(L^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}')\) such that

$$\begin{aligned} \langle u',v\rangle _{L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})',{L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})}} {:}{=} -\langle u, v'\rangle _{{L^2(L)}} \text{ for } \text{ all } v\in C^1_c((0,T);{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}). \end{aligned}$$

Note that here, as in the rest of this paper, for a given space Z we use in the duality products and norms the notation \(L^2(Z)\) (resp. \(H^p(Z)\) for \(p=1,2\)) as an abbreviation for \(L^2(0,T;Z)\) (resp. \(H^p(0,T;Z)\) for \(p=1,2\)). In other words, we can write W as follows, introducing also a Hilbert structure:

$$\begin{aligned} W = \big \{u\in {L^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})}:\ u'\in L^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}')\big \},\quad \Vert u\Vert _W^2 = \Vert u\Vert _{{L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})}}^2 + \Vert u'\Vert _{L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}')}^2. \end{aligned}$$

The space W can be identified with a subspace of \(C([0,T];L)\) and there exists \(C_T>0\) such that

$$\begin{aligned} \sup _{t\in [0,T]}\Vert v(t)\Vert _{L}\le C_T \Vert v\Vert _{W},\hbox { for all }v\in W. \end{aligned}$$
(2.5)

Recall the following integration by parts formula [23, III Corollary 1.1, p. 106].

Lemma 2.3

One has, for all \(v,w\in W\),

$$\begin{aligned} \langle v',w\rangle _{L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})',L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})} + \langle w',v\rangle _{L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})',L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})} = \langle v(T),w(T)\rangle _{L}-\langle v(0),w(0)\rangle _{L}. \end{aligned}$$

Let \(\Lambda \in L^\infty (0,T;\mathcal {L}({\varvec{L}},{\varvec{L}}))\) and let \({\mathfrak {S}}\in \mathcal L({\varvec{L}},{\varvec{L}})\) be a symmetric positive definite operator such that there exist \(M\ge 1\) and \(\alpha >0\) with

$$\begin{aligned} \Vert {\mathfrak {S}}^{-1}\Lambda (t)\Vert \le M \quad \hbox { for a.e. }t\in (0,T), \end{aligned}$$
(2.6a)
$$\begin{aligned} \langle {\mathfrak {S}}^{-1}\Lambda (t) \xi ,\xi \rangle _{{\varvec{L}}}\ge \alpha \Vert \xi \Vert _{{\varvec{L}}}^2 \quad \text{ for } \text{ a.e. } t\in (0,T) \text{ and } \text{ all } \xi \in {\varvec{L}}. \end{aligned}$$
(2.6b)

We also define \(\rho >0\) by

(2.7)

The role of \(\Lambda \) is described in the remark below, while the role of \({\mathfrak {S}}\) (introduced to enable a more precise control of M, \(\alpha \) and \(\rho \) in certain cases, and thus improve the constants appearing in the error estimate) is discussed in Remark 4.3.

Remark 2.4

(Example of \(\Lambda \)) The operator \(\Lambda \) represents the model under consideration as well as the associated physical data. In the context of Remark 2.1 with Dirichlet boundary conditions (so \({\varvec{L}}=L^2(\Omega )^d\)), for example, taking a uniformly coercive matrix \({\mathbb {M}}:\Omega \rightarrow \mathcal S_d(\mathbb R)\) (where \(\mathcal S_d(\mathbb R)\) is the set of symmetric matrices), a vector field \({\textbf{b}}:[0,T]\times \Omega \rightarrow \mathbb R^d\), and setting \(\Lambda (t)\xi ={\mathbb {M}}\xi +{\textbf{b}}(t)\cdot \xi \) for all \(\xi \in L^2(\Omega )^d\), the model (2.8) below represents a diffusion–advection parabolic problem with diffusion matrix \({\mathbb {M}}\) and advective velocity \({\textbf{b}}\).

Let \(\Phi ~:~L\rightarrow L\) be a linear contraction (which means that \(\Vert \Phi v\Vert _{L}\le \Vert v\Vert _{L}\) for all \(v\in L\)). Our aim is to obtain an error estimate for an approximate solution of the following problem. Given \(g\in L^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}')\) and \(\xi _0\in L\),

$$\begin{aligned} \hbox {find }u\in W\hbox { s.t.\ } u' - \textsc {D}(\Lambda {{\mathbf {{\small {\uppercase {G}}}}}}u)=g\hbox { and }u(0)-\Phi u(T)=\xi _0. \end{aligned}$$
(2.8)

Using the identification between \({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}\) and \({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}'\) by the Riesz representation theorem, we decompose \(g \in L^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}')\) as \(g = f +\textsc {D}{\varvec{F}}\) with \(f\in L^2(0,T;L)\), \({\varvec{F}}\in L^2(0,T;{\varvec{L}})\). This decomposition is not unique; indeed \(f=0\) is always possible, but in several problems of interest, the source term g belongs to \(L^2(0,T;L)\). Therefore, the problem to be considered reads

$$\begin{aligned} \hbox {find }u\in W\hbox { s.t.\ } u' - \textsc {D}(\Lambda {{\mathbf {{\small {\uppercase {G}}}}}}u +{\varvec{F}})=f\hbox { and }u(0)-\Phi u(T)=\xi _0. \end{aligned}$$
(2.9)

We introduce the Riesz isomorphism \(R:{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}'\rightarrow {H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}\) (which also defines the Riesz isomorphism \(R:L^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}')\rightarrow L^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})\)) such that

$$\begin{aligned} \forall (\xi ,v)\in {H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}'\times {H_{{{\mathbf {{\small {\uppercase {G}}}}}}}},\ \langle {\mathfrak {S}}{{\mathbf {{\small {\uppercase {G}}}}}}R\xi , {{\mathbf {{\small {\uppercase {G}}}}}}v\rangle _{{\varvec{L}}}= \langle \xi , v\rangle _{{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}',{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}}. \end{aligned}$$
(2.10)

The problem (2.9) is then equivalent to

$$\begin{aligned} \hbox {find }u\in W\hbox { s.t.\ } -\textsc {D}({\mathfrak {S}}{{\mathbf {{\small {\uppercase {G}}}}}}R u' + \Lambda {{\mathbf {{\small {\uppercase {G}}}}}}u +{\varvec{F}})=f\hbox { and }u(0)-\Phi u(T)=\xi _0, \end{aligned}$$
(2.11)

which implies in particular that \({\mathfrak {S}}{{\mathbf {{\small {\uppercase {G}}}}}}Ru' + \Lambda {{\mathbf {{\small {\uppercase {G}}}}}}u + {\varvec{F}}\in L^2(0,T;{\varvec{H}}_\textsc {D})\).

Theorem 2.5

[1] For all \(f\in L^2(0,T;L)\), \({\varvec{F}}\in L^2(0,T;{\varvec{L}})\) and \(\xi _0\in L\), Problem (2.9) has a unique solution.

3 The space-time discretisation

3.1 Space approximation using the gradient discretisation method

Definition 3.1

(Gradient discretisation) A gradient discretisation is defined by \({\mathcal {D}}_{h}= (X_{{h}},\textsc {P}_{h},{{\mathbf {{\small {\uppercase {G}}}}}}_{h})\), where:

  1. The set of discrete unknowns \(X_{{h}}\) is a finite dimensional real vector space.

  2. The “function” reconstruction \(\textsc {P}_{h}~:~X_{{h}}\rightarrow L\) is a linear mapping that reconstructs, from an element of \(X_{{h}}\), an element in \(L\).

  3. The “gradient” reconstruction \({{\mathbf {{\small {\uppercase {G}}}}}}_{h}~:~X_{{h}}\rightarrow {\varvec{L}}\) is a linear mapping that reconstructs, from an element of \(X_{{h}}\), an element of \({\varvec{L}}\).

  4. The mapping \({{\mathbf {{\small {\uppercase {G}}}}}}_{h}\) is such that \(v \mapsto \left| {{\mathbf {{\small {\uppercase {G}}}}}}_{h}v\right| _{{\varvec{L}}}\) defines a norm on \(X_{h}\).

We then define the following weighted norm on \(X_{h}\)

(3.1)

and \(p_{h}\) as the norm of \(\textsc {P}_{h}\):

$$\begin{aligned} p_{h}= \max _{v\in X_{{h}}{\setminus }\{0\}}\frac{\left| \textsc {P}_{h}v\right| _{L}}{\Vert v \Vert _{{h}}}. \end{aligned}$$
(3.2)

3.2 Description of the Euler scheme

We now refer to the framework of Sect. 2. In particular, \({\Phi }:L\rightarrow L\) is linear and \(\Vert \Phi \Vert \le 1\). Moreover, \(f\in L^2(0,T;L)\), \({\varvec{F}}\in L^2(0,T;{\varvec{L}})\) and \(\xi _0\in L\) are given.

Let \(N\in {\mathbb {N}}{\setminus }\{0\}\) and define the time step (taken to be uniform for simplicity of presentation) \(k = \frac{T}{N}\). For all \(m=1,\ldots ,N\), \(\Lambda ^{(m)}\in \mathcal {L}({\varvec{L}},{\varvec{L}})\) denotes the coercive linear operator given by

$$\begin{aligned} \Lambda ^{(m)} = \frac{1}{k}\int _{(m-1)k}^{m k} \Lambda (t) \textrm{d}t \end{aligned}$$

and \(f^{(m)}\in L\), \({\varvec{F}}^{(m)}\in {\varvec{L}}\) are defined by

$$\begin{aligned} f^{(m)} = \frac{1}{k}\int _{(m-1)k}^{m k} f(t) \textrm{d}t\quad \hbox { and }\quad {\varvec{F}}^{(m)} = \frac{1}{k}\int _{(m-1)k}^{m k} {\varvec{F}}(t) \textrm{d}t. \end{aligned}$$

The implicit Euler scheme consists in seeking \(N+1\) elements of \(X_{h}\), denoted by \((w^{(m)})_{m=0,\ldots ,N}\), such that

$$\begin{aligned} \langle \textsc {P}_{h}w^{(0)} - \Phi \textsc {P}_{h}w^{(N)}, \textsc {P}_{h}u\rangle _{L} = \langle \xi _0, \textsc {P}_{h}u\rangle _{L} \text{ for } \text{ all } u\in X_{h}\end{aligned}$$
(3.3a)

and

$$\begin{aligned} \begin{aligned} \langle \textsc {P}_{h}\frac{w^{(m)}-w^{(m-1)}}{k}, \textsc {P}_{h}u\rangle _{ L} {}&+ \langle \Lambda ^{(m)} {{\mathbf {{\small {\uppercase {G}}}}}}_{h}w^{(m)},{{\mathbf {{\small {\uppercase {G}}}}}}_{h}u\rangle _{{\varvec{L}}} \\ ={}&\langle f^{(m)},\textsc {P}_{h}u\rangle _{L} -\langle {\varvec{F}}^{(m)},{{\mathbf {{\small {\uppercase {G}}}}}}_{h}u \rangle _{{\varvec{L}}}\\&\qquad \text{ for } \text{ all } m=1,\ldots ,N \text{ and } u\in X_{h}. \end{aligned} \end{aligned}$$
(3.3b)
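A minimal sketch of scheme (3.3a)–(3.3b), assuming (for illustration only) a 1D heat equation with mass-lumped P1 elements, \(\Phi = \textrm{Id}\) with \(\xi _0=0\) (i.e. time-periodic conditions) and a time-periodic source; the coupled linear system in \((w^{(0)},\ldots ,w^{(N)})\) is assembled and solved as a whole:

```python
import numpy as np

n, N, T = 16, 8, 1.0
h, dt, d = 1.0 / n, T / N, n - 1
x = np.linspace(0.0, 1.0, n + 1)[1:-1]               # interior nodes
A = (2 * np.eye(d) - np.eye(d, k=1) - np.eye(d, k=-1)) / h
M = h * np.eye(d)                 # mass-lumped P1 mass matrix (diagonal)
# Time averages f^(m) of the source f(t, x) = cos(2 pi t / T) sin(pi x):
tm = np.arange(1, N + 1) * dt
favg = (np.sin(2 * np.pi * tm / T)
        - np.sin(2 * np.pi * (tm - dt) / T)) * T / (2 * np.pi * dt)
# Assemble (3.3a)-(3.3b) with Phi = Id and xi_0 = 0 (time-periodic case):
S = np.zeros(((N + 1) * d, (N + 1) * d))
rhs = np.zeros((N + 1) * d)
S[:d, :d], S[:d, N * d:] = M, -M                     # (3.3a): M (w^0 - w^N) = 0
for m in range(1, N + 1):
    r = slice(m * d, (m + 1) * d)
    S[r, r] = M / dt + A                             # (3.3b), row block m
    S[r, (m - 1) * d:m * d] -= M / dt
    rhs[r] = M @ (favg[m - 1] * np.sin(np.pi * x))
w = np.linalg.solve(S, rhs).reshape(N + 1, d)
assert np.allclose(w[0], w[N])                       # discrete time-periodicity
assert np.linalg.norm(w) > 1e-3                      # nontrivial periodic response
```

With \(\Phi \equiv 0\) and a nonzero \(\xi _0\), the first block row decouples and the same assembly reduces to the usual sequential implicit Euler marching.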

Remark 3.2

The discrete value \(w^{(0)}\) is only involved in (3.3a)–(3.3b) through \(\textsc {P}_{h}w^{(0)}\). As a consequence, we only prove in the following the uniqueness of \(\textsc {P}_{h}w^{(0)}\). If \(\textsc {P}_{h}:X_{h}\rightarrow L\) is one-to-one, this shows the uniqueness of \(w^{(0)}\); if this operator is not one-to-one, then \(w^{(0)}\) is actually not unique.

Note that, if \(\Phi \equiv 0\), the scheme is the usual implicit scheme, and the existence and uniqueness of a solution \((\textsc {P}_{h}w^{(0)},(w^{(m)})_{m=1,\ldots ,N})\) to (3.3b) is standard. In the general case, a linear system involving \(\textsc {P}_{h}w^{(0)}\) must be solved, and its invertibility is proved by Theorem 4.1.

We now define the space \(W_{h}\) of all functions \(w~:~[0,T]\rightarrow X_{h}\) that are piecewise constant in time in the following way: there exist \(N+1\) elements of \(X_{h}\), denoted by \((w^{(m)})_{m=0,\ldots ,N}\), such that

$$\begin{aligned} \begin{aligned} w(0)={}&w^{(0)}, \text{ and } \\ w(t) ={}&w^{(m)}\hbox { for all }t\in ((m-1)k, mk],\hbox { for all }m=1,\ldots ,N. \end{aligned} \end{aligned}$$
(3.4)

We observe that the space \(W_{h}\) is isomorphic to \(X_{h}^{N+1}\), through the mapping \(w\mapsto (w(mk))_{m=0,\ldots ,N}\). We define the discrete derivative of \(w\in W_{h}\) as follows:

$$\begin{aligned} \begin{aligned} \partial w(t) ={}&\frac{w^{(m)}-w^{(m-1)}}{k},\\&\qquad \text { for a.e. } t\in ((m-1)k, mk),\hbox { for all }m=1,\ldots ,N. \end{aligned} \end{aligned}$$
(3.5)
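The discrete derivative obeys an exact summation-by-parts identity mirroring Lemma 2.3, provided the second factor is taken at the previous time level; a quick numerical check, with assumed random discrete functions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, T = 5, 3, 1.0
dt = T / N
v = rng.standard_normal((N + 1, n))   # (v^(0), ..., v^(N)), an element of W_h
w = rng.standard_normal((N + 1, n))
dv = np.diff(v, axis=0) / dt          # discrete derivative (3.5)
dw = np.diff(w, axis=0) / dt
# Abel summation: the analogue of Lemma 2.3 holds exactly, with the second
# factor of the second term taken at the *previous* time level:
lhs = dt * (np.sum(dv * w[1:]) + np.sum(dw * v[:-1]))
rhs = v[-1] @ w[-1] - v[0] @ w[0]
assert abs(lhs - rhs) < 1e-10
```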

Define the space \(V_{h}\) of all functions \(v\in L^2(0,T; X_{h})\) for which there exist N elements of \(X_{h}\), denoted by \((v^{(m)})_{m=1,\ldots ,N}\), such that

$$\begin{aligned} v(t) = v^{(m)}\hbox { for all }t\in ((m-1)k, mk),\hbox { for all }m=1,\ldots ,N. \end{aligned}$$
(3.6)

Remark 3.3

(Difference between \(W_{h}\) and \(V_{h}\)) \(W_{h}\) and \(V_{h}\) are both spaces of piecewise constant functions in time. However, functions in \(W_{h}\) are defined pointwise and everywhere, including at all time steps (and are left-continuous on [0, T]), whereas functions in \(V_{h}\) are only defined almost everywhere on (0, T). This distinction between the two spaces plays the same role as the distinction between \({\mathcal {X}}\) and \({\mathcal {Y}}\) in [21, Sect. 5].

Scheme (3.3a)–(3.3b) can then be written in the form:

$$\begin{aligned} \hbox {Find }w_{h}\in W_{h}\hbox { such that }\forall (v,z)\in V_{h}\times X_{h}, \quad b(w_{h}, (v,z)) = L((v,z)), \end{aligned}$$
(3.7)

with

$$\begin{aligned} \begin{aligned} b(w_{h}, (v,z)) ={}&\langle \textsc {P}_{h}\partial w_{h}, \textsc {P}_{h}v\rangle _{ L^2(L) } + \langle \Lambda {{\mathbf {{\small {\uppercase {G}}}}}}_{h}w_{h},{{\mathbf {{\small {\uppercase {G}}}}}}_{h}v\rangle _{L^2({\varvec{L}})}\\&+\langle \textsc {P}_{h}w_{h}(0) - \Phi \textsc {P}_{h}w_{h}(T), \textsc {P}_{h}z\rangle _{L} \end{aligned} \end{aligned}$$
(3.8)

and

$$\begin{aligned} L((v,z)) = \langle f,\textsc {P}_{h}v \rangle _{L^2(L)} -\langle {\varvec{F}},{{\mathbf {{\small {\uppercase {G}}}}}}_{h}v \rangle _{L^2({\varvec{L}})}+ \langle \xi _0, \textsc {P}_{h}z\rangle _{L}. \end{aligned}$$

Remark 3.4

(Role of the test functions) In (3.7), the function \(v\in V_{h}\) tests the evolution Eq. (3.3b) while \(z\in X_{h}\) tests (through \(\textsc {P}_{h}z\)) the initial/final condition (3.3a).

Theorem 3.5

Under the setting of this section, there exists one and only one solution \((\textsc {P}_{h}w^{(0)},(w^{(m)})_{m=1,\ldots ,N})\) to (3.3a)–(3.3b), or equivalently to (3.7). For this solution, we denote by \(w_{h}\) the element of \(W_{h}\) corresponding to \((w^{(m)})_{m=0,\ldots ,N}\) for a given choice of \(w^{(0)}\).

Proof

Since \((\textsc {P}_{h}w^{(0)},(w^{(m)})_{m=1,\ldots ,N})\) is the solution to a square linear system, the error estimate of Theorem 4.1 shows that a zero right-hand side forces a zero solution. Hence the system is invertible. \(\square \)

4 Error estimate

Define the discrete Riesz operator \(R_{h}:X_{h}\rightarrow X_{h}\) by: for all \(u\in X_{h}\), \(R_{h}u\) satisfies

$$\begin{aligned} \langle {\mathfrak {S}}{{\mathbf {{\small {\uppercase {G}}}}}}_{h}R_{h}u, {{\mathbf {{\small {\uppercase {G}}}}}}_{h}v\rangle _{{\varvec{L}}} = \langle \textsc {P}_{h}u, \textsc {P}_{h}v\rangle _{L}\quad \text{ for } \text{ all } v\in X_{h}. \end{aligned}$$
(4.1)
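In a finite basis, (4.1) is again a linear solve: with matrices \(P\), \(G\) standing in for \(\textsc {P}_{h}\), \({{\mathbf {{\small {\uppercase {G}}}}}}_{h}\) and a symmetric positive definite weight \(S\) for \({\mathfrak {S}}\), the discrete Riesz operator reads \(R_{h}=(G^{\mathsf T}SG)^{-1}P^{\mathsf T}P\). A sketch under these (illustrative) assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
P = rng.standard_normal((4, n))   # stand-in for P_h (assumed)
G = rng.standard_normal((6, n))   # stand-in for G_h (assumed)
S = np.eye(6)                     # weight S (identity here; assumed SPD)

def riesz(u):
    """Discrete Riesz operator (4.1): <S G R u, G v> = <P u, P v> for all v."""
    return np.linalg.solve(G.T @ S @ G, P.T @ (P @ u))

u = rng.standard_normal(n)
Ru = riesz(u)
# check the defining identity against every basis test function v
assert np.allclose(G.T @ S @ (G @ Ru), P.T @ (P @ u))
```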

We note that with this definition, the scheme (3.7) can be recast as: for all \((v,z)\in V_{h}\times X_{h}\),

$$\begin{aligned} \begin{aligned} \langle {\mathfrak {S}}{{\mathbf {{\small {\uppercase {G}}}}}}_{h}R_{h}\partial w_{h}+\Lambda {{\mathbf {{\small {\uppercase {G}}}}}}_{h}w_{h}+{\varvec{F}}, {}&{{\mathbf {{\small {\uppercase {G}}}}}}_{h}v\rangle _{ L^2({\varvec{L}}) } +\langle \textsc {P}_{h}w_{h}(0) - \Phi \textsc {P}_{h}w_{h}(T), \textsc {P}_{h}z\rangle _{L}\\ ={}&\langle f,\textsc {P}_{h}v \rangle _{L^2(L)} + \langle \xi _0, \textsc {P}_{h}z\rangle _{L}. \end{aligned} \end{aligned}$$
(4.2)

Set, for all \(u\in W\) and \(v\in W_{h}\),

(4.3)

We also define \(\zeta ^{(T)}_{{h}}:L^2(0,T;{\varvec{H}}_\textsc {D}) \rightarrow [0,+\infty )\) by: for all \({\varvec{v}}\in L^2(0,T;{\varvec{H}}_\textsc {D})\),

(4.4)

Theorem 4.1

Let u be the solution to (2.9), let

$$\begin{aligned} {\varvec{v}}{:}{=} {\mathfrak {S}}{{\mathbf {{\small {\uppercase {G}}}}}}Ru' + \Lambda {{\mathbf {{\small {\uppercase {G}}}}}}u +{\varvec{F}}\in L^2(0,T;{\varvec{H}}_\textsc {D}) \end{aligned}$$
(4.5)

and let \(w_{h}\) be a solution to (3.3). Then there exists \(C_{h}\ge 0\), depending only on \(p_{h}\) [defined by (3.2)] in a nondecreasing and continuous way, and on \((\alpha , M, T)\) [see (2.6)], such that:

$$\begin{aligned}{} & {} \frac{1}{2}\Big [ \zeta ^{(T)}_{h}({\varvec{v}})+ \inf _{v\in W_{h}} \delta ^{(T)}_{h}(u,v)\Big ] \nonumber \\{} & {} \quad \le \delta ^{(T)}_{h}(u,w_{h}) \le C_{h}\max (1,\rho )\Big [ \zeta ^{(T)}_{h}({\varvec{v}}) + \inf _{v\in W_{h}} \delta ^{(T)}_{h}(u,v)\Big ]. \end{aligned}$$
(4.6)

Remark 4.2

(Optimal error estimate) If \(C_{h}\) is bounded independently of h, which is the case for several discretisation methods for which \(p_{h}\) can be shown to be bounded thanks to a regularity assumption on the mesh [11, Part III], the second inequality in (4.6) gives an error estimate for the scheme, while the first inequality shows its optimality. This is the result announced in the title and introduction of this work.

Remark 4.3

(Role of \({\mathfrak {S}}\)) The role of \({\mathfrak {S}}\) is to provide a more precise control of the constants \(M,\alpha ,\rho \) which impact \(C_h\) in (4.6). The weight \({\mathfrak {S}}\) should be chosen to make \(M,\alpha ,\rho \) as small as possible—and, ideally, to compensate for a possible strong anisotropy of \(\Lambda \) (that would create large ratios \(M/\alpha \) if \({\mathfrak {S}}\) is absent). In the case where \(\Lambda \) is a time-independent symmetric coercive operator, a natural choice is \({\mathfrak {S}}= \Lambda \); then, we can take \(\alpha =M=\rho = 1\) in (2.6), and \(C_h\) and \(\rho \) are independent of \(\Lambda \) [but the norm of the error estimate depends on it, see Definition (3.1)].

We also note that, by Hypothesis (2.6b) and since \({\mathfrak {S}}\) is symmetric positive definite, we have

where \(C_\star \) and \(C^\star \) depend on \({\mathfrak {S}}\), \(\alpha \), M. Hence, the estimate (4.6) also translates into an estimate on the term (4.3) without the factors \({\mathfrak {S}}\) and . The latter estimate, however, has multiplicative constants that may depend more severely on the anisotropy of \(\Lambda \), as explained above.

Remark 4.4

(Error estimate without regularity assumption) One of the strengths of the error estimate (4.6), besides the fact that it is in some cases robust with respect to the model’s anisotropy as explained in Remark 4.3 above, is that it is established without assuming any regularity property of the PDE solution. If that solution has some smoothness in time and space, and if we consider mesh-based methods where h is the mesh size, the terms on each side of this estimate can be evaluated in terms of powers of h. Even for a minimal-regularity solution, however, this estimate provides the convergence of any GDM scheme.

Remark 4.5

In the case where the source term g belongs to \(L^2(0,T;L)\) (instead of the weaker assumption \(g\in L^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}')\) made here), an error estimate for solutions with minimal regularity was established in [22] for a discontinuous Galerkin discretisation in time and a conforming discretisation in space. Our result, by contrast, covers non-conforming and non-Galerkin approximations and highlights the role of the measure (4.4) of the defect of conformity when considering general spatial discretisations.

The proof of Theorem 4.1 is given after stating and proving a series of technical lemmas involving operators on Hilbert spaces.

Lemma 4.6

For \(w\in W_{h}\), the following inequalities hold

(4.7)
(4.8)

and, recalling that \(p_{h}\) is defined by (3.2),

(4.9)

Proof

Let \(w\in W_{h}\). Using the relation \((a-b)a=\frac{1}{2} a^2+\frac{1}{2} (a-b)^2-\frac{1}{2} b^2\), the definition (4.1) of \(R_{h}\) yields, for \(0\le m \le m'\le N\),

$$\begin{aligned} \int _{mk}^{m'k} \langle {}&{\mathfrak {S}}{{\mathbf {{\small {\uppercase {G}}}}}}_{h}R_{h}\partial w(t), {{\mathbf {{\small {\uppercase {G}}}}}}_{h}w(t)\rangle _{{\varvec{L}}}\textrm{d}t = \int _{mk}^{m'k} \langle \textsc {P}_{h}\partial w(t), \textsc {P}_{h}w(t)\rangle _{L}\textrm{d}t \nonumber \\ ={}&\sum _{p=m}^{m'-1} k \langle \textsc {P}_{h}\frac{w^{(p+1)} - w^{(p)}}{k},\textsc {P}_{h}w^{(p+1)}\rangle _L\nonumber \\ ={}&\frac{1}{2} \Vert \textsc {P}_{h}w^{(m')}\Vert _L^2 +\frac{1}{2}\sum _{p=m}^{m'-1} \Vert \textsc {P}_{h}\left( w^{(p+1)} - w^{(p)}\right) \Vert _L^2 -\frac{1}{2} \Vert \textsc {P}_{h}w^{(m)}\Vert _L^2. \end{aligned}$$
(4.10)

Using the Cauchy–Schwarz inequality on the left-hand side provides

(4.11)

where the second line follows from the Young inequality. Setting \(m=0\) allows us to take any \(m'=0,\ldots ,N\). Taking the square root of the above inequality and using then concludes the proof of (4.7).

The inequality (4.8) is obtained by letting \(m=0\) and \(m'=N\) in (4.10). To prove (4.9), we come back to (4.11) and set \(m'=N\) to get, after multiplication by 2k, for all \(m=0,\ldots ,N\),

Summing over \(m=1,\ldots ,N\) yields

which proves (4.9). \(\square \)
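The identity (4.10) at the heart of this proof is a discrete integration by parts obtained by telescoping \((a-b)a=\frac{1}{2} a^2+\frac{1}{2} (a-b)^2-\frac{1}{2} b^2\). It can be checked numerically, taking \(\textsc {P}_{h}\) as the identity on \({\mathbb {R}}^n\) for brevity (an illustrative simplification):

```python
import numpy as np

rng = np.random.default_rng(2)
N, n, k = 6, 3, 0.25
w = rng.standard_normal((N + 1, n))   # (w^(p))_{p=0..N}, with P_h = identity

m, mp = 1, 5                          # 0 <= m <= m' <= N
# left-hand side of (4.10): sum_{p=m}^{m'-1} k <(w^(p+1)-w^(p))/k, w^(p+1)>
lhs = sum(k * np.dot((w[p + 1] - w[p]) / k, w[p + 1]) for p in range(m, mp))
# right-hand side: telescoped squares
rhs = (0.5 * np.dot(w[mp], w[mp])
       + 0.5 * sum(np.dot(w[p + 1] - w[p], w[p + 1] - w[p])
                   for p in range(m, mp))
       - 0.5 * np.dot(w[m], w[m]))
assert np.isclose(lhs, rhs)
```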

Lemma 4.7

Let V be a Hilbert space and let \(A~:~{V}\rightarrow {V}\) be an M-continuous and \(\alpha \)-coercive operator (with \(M\ge 1\) and \(\alpha >0\)), which means that

$$\begin{aligned} \Vert A v\Vert _{{V}} \le M \Vert v\Vert _{V}\quad \text{ and }\quad \langle A v, v\rangle _{{V}} \ge \alpha \Vert v\Vert _{V}^2\quad \forall v\in {V}. \end{aligned}$$
(4.12)

Then, for all \(v,w\in V\),

$$\begin{aligned} \Vert w + A v\Vert _{{V}}^2\ge 2\alpha \langle w,v\rangle _{{V}}+ \frac{1}{3} \left( \frac{\alpha }{M}\right) ^3 \left( \Vert w \Vert _{{V}}^2 +\Vert v\Vert _{V}^2\right) . \end{aligned}$$
(4.13)

Proof

Consider the symmetric \(A_{\textrm{s}}{:}{=}\frac{A+A^*}{2}\) and anti-symmetric \(A_{\textrm{a}}{:}{=}\frac{A-A^*}{2}\) parts of A. We have, for all \(v\in {V}\), \(\langle A_{\textrm{s}} v,v\rangle =\langle A v,v\rangle \ge \alpha \Vert v\Vert _{V}^2\). It follows that the self-adjoint operator \(A_{\textrm{s}}\) is positive and invertible, and has a positive invertible square root which satisfies

(4.14)

and , so that

$$\begin{aligned} \Vert A_\textrm{s}^{-1/2} v\Vert \ge \frac{1}{\sqrt{M}} \Vert v\Vert \quad \forall v\in {V}. \end{aligned}$$
(4.15)

Applying (4.14) to instead of v gives

where the second line follows from developing the square of the norm and using . By anti-symmetry of \(A_\textrm{a}\) we have \(\langle v, A_{\textrm{a}} v\rangle _{V}=0\), which leads to

(4.16)

Now we use the Young inequality combined with the Cauchy–Schwarz inequality to estimate, for all \(\gamma >0\),

$$\begin{aligned} 2 \left| \langle A_\textrm{s}^{-1/2}w, A_\textrm{s}^{-1/2}A_{\textrm{a}} v\rangle _{V} \right| \le {}&2 \Vert A_\textrm{s}^{-1/2}w\Vert _{V} \Vert A_\textrm{s}^{-1/2}A_{\textrm{a}}v\Vert _{V}\\ \le {}&\gamma \Vert A_\textrm{s}^{-1/2}w\Vert _{V}^2 +\frac{1}{\gamma } \Vert A_\textrm{s}^{-1/2}A_{\textrm{a}}v\Vert _{V}^2. \end{aligned}$$

Taking \(\gamma <1\) and plugging this estimate into (4.16) yields

Applying (4.14) with \(A_\textrm{s}^{-1/2}A_\textrm{a}v\) instead of v and using \(\Vert A_\textrm{a}v\Vert _V\le M\Vert v\Vert _V\) gives

$$\begin{aligned} \Vert A_\textrm{s}^{-1/2}A_{\textrm{a}}v\Vert _{V} \le \frac{M}{\sqrt{\alpha }} \Vert v\Vert _{V}, \end{aligned}$$

which leads, since \(1-\frac{1}{\gamma }<0\), to

Let \(\gamma =\frac{1}{1+s}\), where \(s>0\) will be fixed later. Then \(1-\gamma =\frac{s}{1+s}\) and \(1-\frac{1}{\gamma } = -s\) and, using (4.14) and (4.15), it follows that

$$\begin{aligned} \Vert w+A v\Vert ^2 \ge 2\alpha \langle w, v\rangle _{V}+ \frac{s}{1+s}\frac{\alpha }{M}\Vert w\Vert _{V}^2 -s M^2 \Vert v\Vert _{V}^2+\alpha ^2\Vert v\Vert _{V}^2. \end{aligned}$$

Choose \(s =\frac{\alpha ^2}{2 M^2}\) to obtain

$$\begin{aligned} \Vert w+A v\Vert ^2 \ge 2\alpha \langle w, v\rangle _{V}+ \beta \left( \Vert w\Vert _{V}^2 +\Vert v\Vert _{V}^2\right) , \end{aligned}$$

where, using \(\alpha \le M\) and \(1\le M\),

$$\begin{aligned} \beta =\min \left\{ \frac{\alpha ^2}{2}, \frac{\alpha ^2}{\alpha ^2 +2M^2}\frac{\alpha }{M} \right\} = \frac{\alpha ^2}{\alpha ^2 +2M^2}\frac{\alpha }{M} \ge \frac{1}{3} \Big (\frac{\alpha }{M}\Big )^3. \end{aligned}$$

\(\square \)
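The inequality (4.13) can be sanity-checked numerically: build a random operator from a symmetric positive definite part and an antisymmetric part, extract its constants \(\alpha \) and M, and test (4.13) on random vectors. This is a check under the lemma's hypotheses, not a proof; all matrices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
# A = S + K, with S symmetric positive definite and K antisymmetric
B = rng.standard_normal((n, n))
S = B @ B.T + n * np.eye(n)
K0 = rng.standard_normal((n, n))
K = (K0 - K0.T) / 2
A = S + K

alpha = min(np.linalg.eigvalsh(S))   # coercivity: <Av,v> = <Sv,v> >= alpha |v|^2
M = np.linalg.norm(A, 2)             # continuity: |Av| <= M |v|; here M >= alpha >= 1

for _ in range(100):
    v = rng.standard_normal(n)
    w = rng.standard_normal(n)
    lhs = np.dot(w + A @ v, w + A @ v)
    rhs = (2 * alpha * np.dot(w, v)
           + (alpha / M) ** 3 / 3 * (np.dot(w, w) + np.dot(v, v)))
    assert lhs >= rhs - 1e-9          # inequality (4.13)
```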

Lemma 4.8

Let \(L\) be a Hilbert space, and let \(\Phi ~:~L\rightarrow L\) be a contraction (which means that \(\Vert \Phi \Vert \le 1\)). Let \(a>0\) and \(b\in [0,a]\) be given reals such that \(\gamma {:}{=} a - b\Vert \Phi \Vert ^2>0\). Then the following estimate holds:

$$\begin{aligned} a\Vert w\Vert _{L}^2 - b \Vert v\Vert _{L}^2 + \frac{9 a^2}{\gamma }\Vert v - \Phi w\Vert _{L}^2 \ge \frac{\gamma }{3} \left( \Vert w\Vert _{L}^2+\Vert v\Vert _{L}^2\right) \text{ for } \text{ all } v,w\in L. \end{aligned}$$
(4.17)

Proof

Let \(v,w\in L\) be given. By the Young inequality, we have for any \(\mu >0\),

$$\begin{aligned} -2 \langle v, \Phi w\rangle _{L} \ge -\mu \Vert \Phi w\Vert _{L}^2- \frac{1}{\mu }\Vert v\Vert _{L}^2. \end{aligned}$$

Choosing \(\mu >1\), this implies that

$$\begin{aligned} \Vert v-{\Phi } w\Vert _{L}^2 \ge {}&\Vert v\Vert _{L}^2\left( 1 - \frac{1}{\mu }\right) - \Vert w\Vert _{L}^2\Vert \Phi \Vert ^2 (\mu - 1). \end{aligned}$$
(4.18)

Let \(\beta {:}{=} b\Vert \Phi \Vert ^2 \in [0,a)\) and \( \mu {:}{=} \frac{\beta + 2 a}{2 \beta + a}\in (1,+\infty )\). Let \(\theta >0\) and \(\alpha >0\) be such that

$$\begin{aligned} \theta \left( 1 - \frac{1}{\mu }\right) = b + \alpha \quad \text{ and } \quad \theta (\mu - 1)\Vert \Phi \Vert ^2 = a - \alpha . \end{aligned}$$

Using \(\Vert \Phi \Vert ^2\le 1\), this system is satisfied for

$$\begin{aligned} \alpha = \frac{(a-\beta )(a+\beta )}{2\beta + a + \Vert \Phi \Vert ^2(2a+\beta )}\ge \frac{\gamma }{3}. \end{aligned}$$
(4.19)

Using the preceding equation and \(b\le a\), we get

$$\begin{aligned} \theta = \frac{2a+\beta }{a-\beta }(b +\alpha ) \le \frac{9 a^2}{\gamma }. \end{aligned}$$

Invoking (4.18) then gives

$$\begin{aligned} a\Vert w\Vert _{L}^2 - b \Vert v\Vert _{L}^2 + {}&\frac{9 a^2}{\gamma }\Vert v - \Phi w\Vert _{L}^2 \ge a\Vert w\Vert _{L}^2 - b \Vert v\Vert _{L}^2 + \theta \Vert v - \Phi w\Vert _{L}^2 \\ \ge {}&(a - \theta (\mu - 1)\Vert \Phi \Vert ^2)\Vert w\Vert _{L}^2+\left( \theta \left( 1 - \frac{1}{\mu }\right) -b\right) \Vert v\Vert _{L}^2 \\ ={}&\alpha \left( \Vert w\Vert _{L}^2+\Vert v\Vert _{L}^2\right) . \end{aligned}$$

Recalling (4.19) concludes the proof of (4.17). \(\square \)
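As for Lemma 4.7, the estimate (4.17) can be sanity-checked numerically under the lemma's hypotheses; the contraction \(\Phi \) and the reals a, b below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
Phi = 0.9 * Q                        # a contraction: all singular values are 0.9
norm_Phi = np.linalg.norm(Phi, 2)

a, b = 2.0, 1.5                      # b in [0, a]
gamma = a - b * norm_Phi ** 2        # gamma = a - b ||Phi||^2 > 0
assert gamma > 0

for _ in range(100):
    v = rng.standard_normal(n)
    w = rng.standard_normal(n)
    lhs = (a * np.dot(w, w) - b * np.dot(v, v)
           + 9 * a ** 2 / gamma * np.dot(v - Phi @ w, v - Phi @ w))
    rhs = gamma / 3 * (np.dot(w, w) + np.dot(v, v))
    assert lhs >= rhs - 1e-9          # inequality (4.17)
```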

We now give a sufficient condition for establishing an inf-sup condition on the bilinear form b defined by (3.8). Such an inf-sup condition is enough to obtain an error estimate for conforming schemes (see e.g. [14]). In the case of the (possibly non-conforming) scheme studied in this paper, it provides an essential step in the proof of the error estimate [see (4.29)].

Lemma 4.9

Let V and \(L\) be Hilbert spaces. Let \({\widehat{Z}}\) and Y be the Hilbert spaces defined by \({\widehat{Z}} = {V}\times {V}\times L\times L\) and \(Y = {V}\times L\). Let \(A:{V}\rightarrow {V}\) be an M-continuous and \(\alpha \)-coercive linear operator in the sense of (4.12). Let \(\Phi :L\rightarrow L\) be a linear operator such that \(\Vert \Phi \Vert \le 1\). We define \({\widehat{b}}~:~{\widehat{Z}}\times Y\rightarrow {\mathbb {R}}\) by

$$\begin{aligned} {\widehat{b}}((z_1,z_2,z_3,z_4),(y_1,y_2)) = \langle z_1 + A z_2,y_1\rangle _{{V}} + \langle z_3 - \Phi z_4,y_2\rangle _{L}, \end{aligned}$$
(4.20)

for all \((z_1,z_2,z_3,z_4)\in {\widehat{Z}}\) and for all \((y_1,y_2)\in Y\).

Let \({\widehat{X}}\subset {\widehat{Z}}\) be a subspace of \({\widehat{Z}}\). We define the Hilbert spaces \({\widehat{X}}_1\subset V\), \({\widehat{X}}_2\subset V\), \({\widehat{X}}_3\subset L\) and \({\widehat{X}}_4\subset L\) by: for \(i=1,\ldots ,4\),

$$\begin{aligned} {\widehat{X}}_i = \overline{\left\{ x_i: \ x\in {\widehat{X}}\right\} },\hbox { where }x_i\hbox { is the }i\hbox {-th component of }x\in {\widehat{Z}}. \end{aligned}$$

Assume that

$$\begin{aligned} {\widehat{X}}_1\subset {\widehat{X}}_2, \end{aligned}$$
(4.21)

and that there exist \(\zeta >0\) and \(\delta >0\) such that, for all \(x\in {\widehat{X}}\),

$$\begin{aligned} \langle x_1,x_2\rangle _{{V}} +\frac{\alpha ^2}{12\, M^3}\left( \Vert x_2\Vert _{V}^2 + \Vert x_1\Vert _{{V}}^2\right) \ge \mu \Vert x_4\Vert _{L}^2 -\nu \Vert x_3\Vert _{L}^2, \end{aligned}$$
(4.22)

for some \(\mu \in (0,\zeta ]\) and \(\nu \in [0,\mu ]\) with \(\mu - \nu \Vert \Phi \Vert ^2 \ge \delta \). Then, there exists \({\widehat{\beta }}>0\), only depending on \(\alpha \), M, \(\zeta \) and \(\delta \) (and not on \(\mu \), \(\nu \) and \(\Vert \Phi \Vert \)) such that

$$\begin{aligned} \sup _{y\in {\widehat{X}}_2\times {\widehat{X}}_3, \Vert y\Vert _Y = 1} {\widehat{b}}(x,y) \ge {\widehat{\beta }} \Vert x\Vert _{{\widehat{Z}}}\quad \forall x\in {\widehat{X}}. \end{aligned}$$
(4.23)

Proof

Let \(x\in {\widehat{X}}\). Let \( P_{3}:L\rightarrow {\widehat{X}}_3\subset L\) be the orthogonal projection onto \({\widehat{X}}_3\). Then, setting

$$\begin{aligned} {\mathcal {N}}(x) = \sup _{y\in {\widehat{X}}_2\times {\widehat{X}}_3, \Vert y\Vert _Y = 1} {\widehat{b}}(x,y), \end{aligned}$$

we have, using (4.21),

$$\begin{aligned} {\mathcal {N}}(x)^2 = \Vert x_1 + A x_2\Vert _{{V}}^2 + \Vert P_{3}(x_3 - \Phi x_4)\Vert _{L}^2. \end{aligned}$$
(4.24)

We then obtain, for \(\theta >0\) to be chosen later

$$\begin{aligned} {\mathcal {N}}(x)^2 \ge \frac{1}{\max (1,\theta )} \Vert x_1 + A x_2\Vert _{{V}}^2 + \frac{\theta }{\max (1,\theta )} \Vert P_{3}(x_3 - \Phi x_4)\Vert _{L}^2. \end{aligned}$$
(4.25)

We apply Lemma 4.7 to obtain

$$\begin{aligned} \Vert x_1 + A x_2\Vert _{{V}}^2 \ge {}&2\alpha \langle x_1,x_2\rangle _{{V}} + \frac{1}{3} \left( \frac{\alpha }{M}\right) ^3(\Vert x_2\Vert _{V}^2 + \Vert x_1 \Vert _{{V}}^2 )\\ \ge {}&2\alpha (\mu \Vert x_4\Vert _{L}^2 - \nu \Vert x_3\Vert _{L}^2) + \frac{1}{6} \left( \frac{\alpha }{M}\right) ^3 \left( \Vert x_2\Vert _{V}^2 + \Vert x_1 \Vert _{{V}}^2 \right) , \end{aligned}$$

where the second line follows from the assumption (4.22), after writing \(\frac{1}{3} \left( \frac{\alpha }{M}\right) ^3=2\alpha \frac{\alpha ^2}{12\,M^3}+\frac{1}{6} \left( \frac{\alpha }{M}\right) ^3\).

Together with (4.25), this yields

$$\begin{aligned} \max (1,\theta ){\mathcal {N}}(x)^2&\ge 2\alpha \left( \mu \Vert x_4\Vert _{L}^2 - \nu \Vert x_3\Vert _{L}^2\right) + \frac{\alpha ^3}{6\, M^3} \left( \Vert x_2\Vert _{V}^2 + \Vert x_1 \Vert _{{V}}^2 \right) \nonumber \\&\quad + \theta \Vert P_{3}\left( x_3 - \Phi x_4\right) \Vert _{L}^2. \end{aligned}$$
(4.26)

Noting that \( P_{3}(x_3 - \Phi x_4) = x_3 - P_{3}\Phi x_4\) and that \(\Vert P_{3}\circ \Phi \Vert \le \Vert \Phi \Vert \le 1\), we use Lemma 4.8 with \(P_3\circ \Phi \) instead of \(\Phi \), \(a= 2\alpha \mu \) and \(b= 2\alpha \nu \), which gives \(\gamma \ge 2\alpha (\mu - \nu \Vert \Phi \Vert ^2)\ge 2\alpha \delta \). If we set \(\theta = \frac{9 a^2}{\gamma } \le \frac{18\alpha \mu ^2}{\mu - \nu \Vert \Phi \Vert ^2}\le \frac{18\alpha \zeta ^2}{\delta }\), we get

$$\begin{aligned} 2\alpha \left( \mu \Vert x_4\Vert _{L}^2 - \nu \Vert x_3\Vert _{L}^2\right) + \theta \Vert P_{3}(x_3 - \Phi x_4)\Vert _{L}^2 \ge \frac{\gamma }{3} \left( \Vert x_3\Vert _{L}^2+\Vert x_4\Vert _{L}^2\right) . \end{aligned}$$

Combined with (4.26), this gives

$$\begin{aligned} \max \left( 1,\frac{18\alpha \zeta ^2}{\delta }\right) {\mathcal {N}}(x)^2 \ge \frac{\alpha ^3}{6\, M^3} \big (\Vert x_1\Vert _{{V}}^2 + \Vert x_2\Vert _{{V}}^2\big ) + \frac{2\alpha \delta }{3}\big (\Vert x_3\Vert _{L}^2+\Vert x_4\Vert _{L}^2\big ), \end{aligned}$$

which leads to (4.23). \(\square \)

Let us now prove the error estimate.

Proof

(Proof of Theorem 4.1) Let \(v\in V_{h}\) and \(z\in X_{h}\) be given. The definition (4.4) of \(\zeta ^{(T)}_{h}({\varvec{v}})\) gives

This yields, using the definition of \({\varvec{v}}\), the relation (2.11) (which gives \(\textsc {D}{\varvec{v}}=-f\)), and (4.2),

Using \(u(0) - \Phi u(T) = \xi _0\), we get

(4.27)

We then take an arbitrary element \({\widetilde{v}}\in W_{h}\) and notice that, by definition (4.3) of \(\delta ^{(T)}_{h}(u,{\widetilde{v}})\) and since \(\Phi \) is a contraction,

Adding this inequality to (4.27) yields

Using the bilinear form \({\widehat{b}}\) defined by (4.20), with \(V=L^2(0,T;{\varvec{L}})\) endowed with the inner product \(\langle {\mathfrak {S}}{\cdot },{\cdot }\rangle _{L^2({\varvec{L}})}\) and \(A={\mathfrak {S}}^{-1}\Lambda \) [which satisfies (4.12) by (2.6a)–(2.6b)], the preceding inequality implies

$$\begin{aligned} {\widehat{b}}((z_1,z_2,z_3,z_4),(y_1,y_2)) \le {\widehat{c}}_1 \Vert y_1\Vert _{V} + {\widehat{c}}_2 \Vert y_2\Vert _{L}, \end{aligned}$$
(4.28)

with

$$\begin{aligned} z_1 = {{\mathbf {{\small {\uppercase {G}}}}}}_{h}R_{h}\partial ({\widetilde{v}} - w_{h}),&\qquad z_2 = {{\mathbf {{\small {\uppercase {G}}}}}}_{h}({\widetilde{v}} - w_{h}) , \nonumber \\ z_3 = \textsc {P}_{h}({\widetilde{v}}(0) - w_{h}(0)),&\qquad z_4 = \textsc {P}_{h}({\widetilde{v}}(T) -w_{h}(T)),\nonumber \\ y_1 = {{\mathbf {{\small {\uppercase {G}}}}}}_{h}v,&\qquad y_2 = \textsc {P}_{h}z,\nonumber \\ {\widehat{c}}_1 = \delta ^{(T)}_{h}(u,{\widetilde{v}})+\zeta ^{(T)}_{h}({\varvec{v}}),&\qquad {\widehat{c}}_2 = 2\delta ^{(T)}_{h}(u,{\widetilde{v}}). \end{aligned}$$

We therefore aim to apply Lemma 4.9 with

$$\begin{aligned} {\widehat{X}}={{\mathbf {{\small {\uppercase {G}}}}}}_{h}(V_{h})\times {{\mathbf {{\small {\uppercase {G}}}}}}_{h}(V_{h})\times \textsc {P}_{h}(X_{h})\times \textsc {P}_{h}(X_{h}). \end{aligned}$$

Condition (4.21) is satisfied since \({\widehat{X}}_1 = {\widehat{X}}_2\). Let \({\widehat{C}}\) be an upper bound of the norm of \(\textsc {P}_{h}\) defined by (3.2). Adding (4.8) to \((1+\frac{{\widehat{C}}^2}{T})^{-1}\frac{\alpha ^2}{12\,M^3}\times \) (4.9) shows that the hypothesis (4.22) is satisfied with

$$\begin{aligned} \mu =\frac{1}{2}+\left( 1+\frac{{\widehat{C}}^2}{T}\right) ^{-1}\frac{\alpha ^2}{12 M^3}\quad \text{ and } \quad \nu =\frac{1}{2}. \end{aligned}$$

We note that \(\mu -\nu \Vert \Phi \Vert ^2\ge \mu -\nu =\left( 1+\frac{{\widehat{C}}^2}{T}\right) ^{-1}\frac{\alpha ^2}{12\,M^3}{=}{:}\delta \). Taking the maximum of (4.28) over all \((y_1,y_2)\in {{\mathbf {{\small {\uppercase {G}}}}}}_{h}(V_{h})\times \textsc {P}_{h}(X_{h})\) with norm in \(V\times L\) equal to 1, Lemma 4.9 therefore yields \({\widehat{\beta }}>0\) depending only on \(\alpha \), M, \({\widehat{C}}\) and T such that

(4.29)

where we use for positive a and b in the last inequality. By (2.7) we have

Plugging this into (4.29) and using (4.7) in Lemma 4.6 together with , we infer

Using the triangle inequality in the definition (4.3) of \(\delta ^{(T)}_{h}\), we infer

$$\begin{aligned} {\widehat{\beta }} \delta ^{(T)}_{h}(u, w_{h}) \le \sqrt{3}\max (1,\rho )\left( 3\delta ^{(T)}_{h}(u,{\widetilde{v}})+\zeta ^{(T)}_{h}({\varvec{v}})\right) + {\widehat{\beta }}\delta ^{(T)}_{h}(u, {\widetilde{v}}). \end{aligned}$$

Since \({\widetilde{v}}\) is arbitrary in \(W_{h}\), this concludes the proof of the second inequality in (4.6).

Let us now turn to the first inequality in (4.6). We first note that

$$\begin{aligned} \inf _{v\in W_{h}} \delta ^{(T)}_{h}(u,v) \le \delta ^{(T)}_{h}(u,w_{h}). \end{aligned}$$

To bound \(\zeta ^{(T)}_{h}({\varvec{v}})\) we recall that \({\varvec{v}}={\mathfrak {S}}{{\mathbf {{\small {\uppercase {G}}}}}}Ru' + \Lambda {{\mathbf {{\small {\uppercase {G}}}}}}u +{\varvec{F}}\) satisfies \(-\textsc {D}{\varvec{v}}=f\) [see (2.11)], and use the scheme (4.2) (with \(z=0\)) to write, for any \(v\in V_{h}\backslash \{0\}\),

Dividing by and taking the supremum over \(v\in V_{h}\backslash \{0\}\) shows that \(\zeta ^{(T)}_{h}({\varvec{v}})\le \delta ^{(T)}_{h}(u,w_{h})\), which concludes the proof. \(\square \)

5 Interpolation results

In this section, we consider a sequence \((\mathcal D_{h_n})_{n\in \mathbb N}\) of gradient discretisations which is

  1. Consistent, in the sense that

    $$\begin{aligned} \forall \varphi \in {H_{{{\mathbf {{\small {\uppercase {G}}}}}}}},\ \lim _{n\rightarrow \infty } \sigma _{{h}_n}(\varphi )=0, \end{aligned}$$
    (5.1)

    where

    (5.2)
  2. Limit-conforming, in the sense that

    $$\begin{aligned} \forall {\varvec{\varphi }}\in {\varvec{H}}_\textsc {D},\ \lim _{n\rightarrow \infty } \zeta _{{h}_n}({\varvec{\varphi }}) = 0, \end{aligned}$$
    (5.3)

    where

    $$\begin{aligned} \forall {\varvec{\varphi }}\in {\varvec{H}}_\textsc {D},\; \zeta _{{h}_n}({\varvec{\varphi }}) = \sup _{v\in X_{{h}_n}{\setminus }\{0\}}\frac{\displaystyle \left| \langle {\varvec{\varphi }},{{\mathbf {{\small {\uppercase {G}}}}}}_{{h}_n} v\rangle _{{\varvec{L}}} + \langle \textsc {D}{\varvec{\varphi }},\textsc {P}_{{h}_n} v\rangle _{L} \right| }{\Vert v \Vert _{{h}_n}}. \end{aligned}$$
    (5.4)

Applying, for example, [13, Lemma 3.10] or [11, Lemma 2.6], we obtain the existence of \({\widehat{C}}>0\) such that, for all \(n\in \mathbb N\),

$$\begin{aligned} \max _{v\in X_{{h}_n}{\setminus }\{0\}}\frac{\Vert \textsc {P}_{{h}_n} v\Vert _{L}}{\Vert {{\mathbf {{\small {\uppercase {G}}}}}}_{{h}_n}v\Vert _{\varvec{L}}}\le {\widehat{C}}. \end{aligned}$$
(5.5)
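For a concrete gradient discretisation, the constant \({\widehat{C}}\) in (5.5) is the reciprocal of the smallest singular value of the discrete gradient whenever \(\textsc {P}_{{h}_n}\) is the identity on nodal values. A sketch for one-dimensional finite differences on (0, 1) with homogeneous Dirichlet conditions — an illustrative choice, not taken from the paper:

```python
import numpy as np

n = 20                      # interior nodes of (0,1); mesh size h = 1/(n+1)
h = 1.0 / (n + 1)
# discrete gradient: forward differences with v_0 = v_{n+1} = 0 built in;
# the reconstruction operator is the identity on nodal values
G = np.zeros((n + 1, n))
for i in range(n):
    G[i, i] += 1.0 / h
    G[i + 1, i] -= 1.0 / h
C_hat = 1.0 / min(np.linalg.svd(G, compute_uv=False))

# C_hat approaches the continuous Poincare constant 1/pi of H^1_0(0,1) from above
assert C_hat >= 1 / np.pi
assert C_hat <= 1 / np.pi + 0.01
```

Here \(G^{\mathsf T}G\) is the standard three-point discrete Laplacian, whose smallest eigenvalue \((4/h^2)\sin ^2(\pi h/2)\) converges to \(\pi ^2\), so \({\widehat{C}}\) is bounded uniformly in n, as required for Remark 4.2.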

In the following, we denote by \(C_i\), for \(i\in {\mathbb {N}}\), various constants which only depend on \({\widehat{C}}\), T, \(C_T\) [see (2.5)], \(\Lambda \) and \({\mathfrak {S}}\).

Let \((N_n)_{n\in \mathbb N}\) be a sequence of positive integers diverging to infinity, and let \(k_n = T/N_n\). This section is devoted to the proof of the following theorem, which enables us to apply Theorem 4.1 to prove the convergence of the scheme under the hypotheses of this section.

Theorem 5.1

Under the hypotheses of this section, the following holds.

For any \({\varvec{\varphi }}\in L^2(0,T;{\varvec{H}}_\textsc {D})\),

$$\begin{aligned} \lim _{n\rightarrow \infty } \zeta ^{(T)}_{{h}_n}({\varvec{\varphi }}) = 0. \end{aligned}$$
(5.6)

Moreover, recalling the definition (4.3) of \(\delta ^{(T)}_{h}\), we have, for all \(w\in W\),

$$\begin{aligned} \lim _{n\rightarrow \infty } \inf _{v\in W_{{h}_n}} \delta ^{(T)}_{{h}_n}(w,v) = 0. \end{aligned}$$
(5.7)

As a consequence, letting u be the solution to (2.9), and \(w_{{h}_n}\) be the solution to (3.3) for \({h}= {h}_n\), then

$$\begin{aligned} \lim _{n\rightarrow \infty } \delta ^{(T)}_{{h}_n}(u,w_{{h}_n}) = 0. \end{aligned}$$
(5.8)

Proof

For a.e. \(t\in (0,T)\) and all \(v\in V_{h_n}\) we have

$$\begin{aligned} \left| \langle {\varvec{\varphi }}(t),{{\mathbf {{\small {\uppercase {G}}}}}}_{h_n} v(t)\rangle _{{\varvec{L}}}+\langle \textsc {D}{\varvec{\varphi }}(t),\textsc {P}_{h_n} v(t)\rangle _L\right| \le \zeta _{h_n}({\varvec{\varphi }}(t))\Vert v(t)\Vert _{h_n}. \end{aligned}$$

Recalling that , integrating over \(t\in (0,T)\) and using the Cauchy–Schwarz inequality yields

and thus

By limit-conformity we know that, for a.e. \(t\in (0,T)\), \(\zeta _{h_n}({\varvec{\varphi }}(t))\rightarrow 0\) as \(n\rightarrow \infty \). Since we also have , we can apply the dominated convergence theorem to obtain (5.6).

Let us now turn to the proof of (5.7). Let \(\overline{w}\in H^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})\). We prove in Lemma 5.7 that

$$\begin{aligned} \lim _{n\rightarrow \infty } \inf _{v\in W_{{h}_n}} \delta ^{(T)}_{{h}_n}(\overline{w},v) = 0. \end{aligned}$$

The conclusion follows by density of \(H^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})\) in W, and the property

$$\begin{aligned} \delta ^{(T)}_{{h}_n}(w,v)\le \delta ^{(T)}_{{h}_n}(\overline{w},v)+ \left( \Vert w-\overline{w}\Vert _L^2 + \Vert {{\mathbf {{\small {\uppercase {G}}}}}}w-{{\mathbf {{\small {\uppercase {G}}}}}}\overline{w}\Vert _{{\varvec{L}}}^2\right) ^{\frac{1}{2}}, \end{aligned}$$

valid for any \(w,\overline{w}\in W\).

Finally, (5.8) is a consequence of (5.6), (5.7) and Theorem 4.1. \(\square \)

The next lemmas are steps for the proof of the final lemma of this section, Lemma 5.7.

In the following, for legibility reasons, we sometimes drop the index n in \({h}_n\). Recalling the definition (5.2) of \(\sigma _{h}\), we set \(\widehat{\sigma }^{(T)}_{{h}}: L^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})\rightarrow [0,+\infty )\) as

$$\begin{aligned} \widehat{\sigma }^{(T)}_{{h}}(v) {:}{=} \Vert \sigma _{h}(v(\cdot ))\Vert _{L^2((0,T))} \text{ for } \text{ all } v\in L^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}). \end{aligned}$$

Note that the quantity \(\widehat{\sigma }^{(T)}_{h}(v)=\Vert \inf _{w\in X_{h}} \delta _{{h}}(v(\cdot ),w)\Vert _{L^2((0,T))}\) is not equivalent to \(\inf _{w\in W_{h}} \delta ^{(T)}_{{h}}(v,w)\). In particular, it does not include a term equivalent to \(\sup _{t\in [0,T]}\Vert v(t) - w(t)\Vert _L\).

We have the following lemma.

Lemma 5.2

For any \(\varphi \in L^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})\),

$$\begin{aligned} \lim _{n\rightarrow \infty } \widehat{\sigma }^{(T)}_{{h}_n}(\varphi ) = 0. \end{aligned}$$
(5.9)

Proof

By consistency (5.1), for a.e. \(t\in (0,T)\) we have \(\sigma _{{h}_n}(\varphi (t))\rightarrow 0\) as \(n\rightarrow \infty \). Since \(0\in X_{{h}_n}\) and \(\Vert \varphi (t)\Vert _L\le C_P \Vert {{\mathbf {{\small {\uppercase {G}}}}}}\varphi (t)\Vert _{{\varvec{L}}}\), we also have \(\sigma _{{h}_n}(\varphi (t)) \le (1+C_P)\Vert \varphi (t)\Vert _{{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}}\). The dominated convergence theorem then concludes the proof of (5.9). \(\square \)

The interpolator \({\mathcal {I}}_{h}: {H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}\rightarrow X_{h}\) is the linear map defined by

$$\begin{aligned} \forall u\in {H_{{{\mathbf {{\small {\uppercase {G}}}}}}}},\ {\mathcal {I}}_{h}u = \text{ argmin }\{\Vert \textsc {P}_{h}v - u\Vert _{L}^2+\Vert {{\mathbf {{\small {\uppercase {G}}}}}}_{h}v - {{\mathbf {{\small {\uppercase {G}}}}}}u\Vert _{{\varvec{L}}}^2:\,v\in X_{h}\}. \end{aligned}$$

Since \(\mathcal I_{h}u\) is the solution of an unconstrained quadratic minimisation problem, we have

$$\begin{aligned}{} & {} \forall u\in {H_{{{\mathbf {{\small {\uppercase {G}}}}}}}},\ \forall v\in X_{h},\\{} & {} \quad \langle \textsc {P}_{h}{\mathcal {I}}_{h}u,\textsc {P}_{h}v\rangle _L+\langle {{\mathbf {{\small {\uppercase {G}}}}}}_{h}{\mathcal {I}}_{h}u,{{\mathbf {{\small {\uppercase {G}}}}}}_{h}v\rangle _{\varvec{L}}= \langle u,\textsc {P}_{h}v\rangle _L+\langle {{\mathbf {{\small {\uppercase {G}}}}}}u,{{\mathbf {{\small {\uppercase {G}}}}}}_{h}v\rangle _{\varvec{L}}. \end{aligned}$$

Selecting \(v=\mathcal I_{h}u\) and using (2.1) and (5.5), we deduce the bound

$$\begin{aligned} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}_{h}\mathcal I_{h}u\Vert _{\varvec{L}}\le (C_P{\widehat{C}}+1)\Vert {{\mathbf {{\small {\uppercase {G}}}}}}u\Vert _{\varvec{L}}. \end{aligned}$$
(5.10)
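Since \({\mathcal {I}}_{h}\) is defined by an unconstrained least-squares problem, it can be computed by solving the normal equations above. A finite-dimensional sketch, where u is represented by its L-side data and its gradient data and the matrices are illustrative stand-ins for \(\textsc {P}_{h}\) and \({{\mathbf {{\small {\uppercase {G}}}}}}_{h}\):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
P = rng.standard_normal((4, n))    # stand-in for P_h (assumed)
G = rng.standard_normal((6, n))    # stand-in for G_h (assumed)

def interp(u, Gu):
    """I_h u: minimiser of |P v - u|^2 + |G v - Gu|^2, via normal equations."""
    return np.linalg.solve(P.T @ P + G.T @ G, P.T @ u + G.T @ Gu)

u = rng.standard_normal(4)         # L-side data of u (illustrative)
Gu = rng.standard_normal(6)        # gradient data G u (illustrative)
v = interp(u, Gu)
# normal equations: <P I_h u, P z> + <G I_h u, G z> = <u, P z> + <G u, G z>
assert np.allclose(P.T @ (P @ v) + G.T @ (G @ v), P.T @ u + G.T @ Gu)
```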

We also define an interpolator for space-time functions: if \(w\in C([0,T];{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})\), the element \(\mathcal I_{{h},k} w\in W_{h}\) is defined by the relations (3.4) using the family \((w^{(m)})_{m=0,\ldots ,N}=({\mathcal {I}}_{h}w(mk))_{m=0,\ldots ,N}\). We then have the following lemma.

Lemma 5.3

For all \(w\in H^1(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})\) we have

$$\begin{aligned} \widehat{\sigma }^{(T)}_{{h}}(w) \le \Vert \delta _{h}(w(\cdot ),\mathcal I_{{h},k} w (\cdot ))\Vert _{L^2((0,T))} \le \widehat{\sigma }^{(T)}_{{h}}(w) + C_{1} k \Vert w'\Vert _{L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})}. \end{aligned}$$

Proof

Recalling the definition (5.2) of \(\delta _{h}\) and using triangle inequalities, we have

(5.11)

For all \(m=0,\ldots ,N-1\) and for a.e. \(t\in (mk,(m+1)k)\), it holds

(5.12)

This yields, owing to the Cauchy–Schwarz inequality,

$$\begin{aligned} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}_{h}\mathcal I_{{h},k}w (t)-{{\mathbf {{\small {\uppercase {G}}}}}}_{h}{\mathcal {I}}_{h}w(t)\Vert _{{\varvec{L}}}^2 \le k \int _{mk}^{(m+1)k} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}_{h}{\mathcal {I}}_{h}w'(s)\Vert _{{\varvec{L}}}^2\textrm{d}s, \end{aligned}$$

and therefore

$$\begin{aligned} \int _{mk}^{(m+1)k}\Vert {{\mathbf {{\small {\uppercase {G}}}}}}_{h}\mathcal I_{{h},k}w (t)-{{\mathbf {{\small {\uppercase {G}}}}}}_{h}{\mathcal {I}}_{h}w(t)\Vert _{{\varvec{L}}}^2\textrm{d}t \le k^2 \int _{mk}^{(m+1)k} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}_{h}{\mathcal {I}}_{h}w'(s)\Vert _{{\varvec{L}}}^2\textrm{d}s. \end{aligned}$$

Invoking the projection inequality (5.10) we can write \(\left| {{\mathbf {{\small {\uppercase {G}}}}}}_{h}{\mathcal {I}}_{h}w'(s)\right| _{{\varvec{L}}}\le (C_P{\widehat{C}}+1)\left| {{\mathbf {{\small {\uppercase {G}}}}}}w'(s)\right| _{{\varvec{L}}}\). Plugging this into the relation (5.11) concludes the proof of the lemma. \(\square \)

Lemma 5.4

For all \(u\in {H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}\), recalling the definitions (2.10) and (4.1) of the continuous and discrete Riesz operators, we have

$$\begin{aligned} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}_{h}R_{h}{\mathcal {I}}_{h}u - {{\mathbf {{\small {\uppercase {G}}}}}}R u\Vert _{\varvec{L}}\le C_{2} (\zeta _{h}({{\mathbf {{\small {\uppercase {G}}}}}}Ru) + \sigma _{h}(R u) + \sigma _{h}(u) ). \end{aligned}$$

Proof

Let \(v_1\in X_{h}\) be such that

$$\begin{aligned} \forall z\in X_{h},\ \langle {\mathfrak {S}}{{\mathbf {{\small {\uppercase {G}}}}}}_{h}v_1,{{\mathbf {{\small {\uppercase {G}}}}}}_{h}z\rangle _{\varvec{L}}= \langle u,\textsc {P}_{h}z\rangle _L. \end{aligned}$$

By definition (2.10) of Ru, we note that \(v_1\) is the solution of the gradient scheme for the linear problem satisfied by Ru; hence, we have the following error estimate [13, Theorem 5.2]:

$$\begin{aligned} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}_{h}v_1 - {{\mathbf {{\small {\uppercase {G}}}}}}Ru\Vert _{\varvec{L}}\le C_{3}(\zeta _{h}({{\mathbf {{\small {\uppercase {G}}}}}}Ru) + \sigma _{h}(Ru)). \end{aligned}$$
(5.13)

Recall that \(R_{h}\mathcal I_{h}u\) satisfies, by definition of \(R_{h}\),

$$\begin{aligned} \forall z\in X_{h},\ \langle {\mathfrak {S}}{{\mathbf {{\small {\uppercase {G}}}}}}_{h}(R_{h}\mathcal I_{h}u),{{\mathbf {{\small {\uppercase {G}}}}}}_{h}z\rangle _{\varvec{L}}= \langle \textsc {P}_{h}{\mathcal {I}}_{h}u,\textsc {P}_{h}z\rangle _L. \end{aligned}$$

Subtracting the equations satisfied by \(v_1\) and \(R_{h}\mathcal I_{h}u\), taking \(z=v_1-R_{h}\mathcal I_{h}u\) and using the Cauchy–Schwarz inequality together with (5.5), we obtain

$$\begin{aligned} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}_{h}(v_1 - R_{h}\mathcal I_{h}u )\Vert _{\varvec{L}}\le C_{4} \Vert u - \textsc {P}_{h}{\mathcal {I}}_{h}u\Vert _L\le C_{4} \sigma _{h}(u). \end{aligned}$$

Combined with (5.13), this concludes the proof. \(\square \)

Lemma 5.5

For all \(w\in H^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})\), it holds

$$\begin{aligned}{} & {} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}R w' - {{\mathbf {{\small {\uppercase {G}}}}}}_{h}R_{h}\partial \mathcal I_{{h},k}w \Vert _{L^2({\varvec{L}})} \\{} & {} \quad \le k \Vert {{\mathbf {{\small {\uppercase {G}}}}}}R w''\Vert _{L^2({\varvec{L}})} + C_{2} \left( \zeta ^{(T)}_{h}({{\mathbf {{\small {\uppercase {G}}}}}}R w') + \widehat{\sigma }^{(T)}_{{h}}(R w') + \widehat{\sigma }^{(T)}_{{h}}(w') \right) . \end{aligned}$$

Proof

Let \(\overline{w'}_{h}\) be the function defined on (0, T) by: for all \(m=1,\ldots ,N\) and \(t\in ((m-1)k,mk)\),

$$\begin{aligned} \overline{w'}_{h}(t) = \frac{1}{k}\int _{(m-1)k}^{mk} w'(s)\textrm{d}s = \frac{w(mk) - w((m-1)k)}{k}. \end{aligned}$$

We have

$$\begin{aligned}{} & {} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}R w' - {{\mathbf {{\small {\uppercase {G}}}}}}_{h}R_{h}\partial \mathcal I_{{h},k}w \Vert _{L^2({\varvec{L}})} \nonumber \\{} & {} \quad \le \Vert {{\mathbf {{\small {\uppercase {G}}}}}}R w' - {{\mathbf {{\small {\uppercase {G}}}}}}R\overline{w'}_{h}\Vert _{L^2({\varvec{L}})} + \Vert {{\mathbf {{\small {\uppercase {G}}}}}}R\overline{w'}_{h}- {{\mathbf {{\small {\uppercase {G}}}}}}_{h}R_{h}\partial \mathcal I_{{h},k}w \Vert _{L^2({\varvec{L}})}. \end{aligned}$$
(5.14)

We have

$$\begin{aligned} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}R w' - {{\mathbf {{\small {\uppercase {G}}}}}}R\overline{w'}_{h}\Vert _{L^2({\varvec{L}})}^2 = \sum _{m=0}^{N-1} \int _{mk}^{(m+1)k} \left| {{\mathbf {{\small {\uppercase {G}}}}}}R w'(t) - \frac{1}{k}\int _{mk}^{(m+1)k} {{\mathbf {{\small {\uppercase {G}}}}}}R w'(s)\textrm{d}s\right| _{{\varvec{L}}}^2\textrm{d}t. \end{aligned}$$

We have, using the Jensen inequality,

$$\begin{aligned}{} & {} \int _{mk}^{(m+1)k} \left| \frac{1}{k}\int _{mk}^{(m+1)k} ({{\mathbf {{\small {\uppercase {G}}}}}}R w'(t) - {{\mathbf {{\small {\uppercase {G}}}}}}R w'(s))\textrm{d}s\right| _{{\varvec{L}}}^2\textrm{d}t \\{} & {} \quad \le \int _{mk}^{(m+1)k}\int _{mk}^{(m+1)k} \frac{1}{k}\left| {{\mathbf {{\small {\uppercase {G}}}}}}R w'(t) - {{\mathbf {{\small {\uppercase {G}}}}}}R w'(s)\right| _{{\varvec{L}}}^2\textrm{d}s\textrm{d}t, \end{aligned}$$

and

$$\begin{aligned} \left| {{\mathbf {{\small {\uppercase {G}}}}}}R w'(t) - {{\mathbf {{\small {\uppercase {G}}}}}}R w'(s)\right| _{{\varvec{L}}}^2 \le k\int _{mk}^{(m+1)k}\left| {{\mathbf {{\small {\uppercase {G}}}}}}R w''(\tau )\right| _{{\varvec{L}}}^2\textrm{d}\tau . \end{aligned}$$

This yields

$$\begin{aligned} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}R w' - {{\mathbf {{\small {\uppercase {G}}}}}}R\overline{w'}_{h}\Vert _{L^2({\varvec{L}})} \le k \Vert {{\mathbf {{\small {\uppercase {G}}}}}}R w''\Vert _{L^2({\varvec{L}})}. \end{aligned}$$
(5.15)

On the other hand, for all \(m=1,\ldots ,N\) and a.e. \(t\in ((m-1)k,mk)\), writing

$$\begin{aligned} \partial \mathcal I_{{h},k}w(t)=\frac{1}{k} ({\mathcal {I}}_{h}w(mk) - {\mathcal {I}}_{h}w((m-1)k)) = \frac{1}{k} \int _{(m-1)k}^{mk}{\mathcal {I}}_{h}w'(s)\textrm{d}s, \end{aligned}$$

we have

$$\begin{aligned} {{\mathbf {{\small {\uppercase {G}}}}}}R\overline{w'}_{h}(t) - {{\mathbf {{\small {\uppercase {G}}}}}}_{h}R_{h}\partial \mathcal I_{{h},k}w (t) = \frac{1}{k}\int _{(m-1)k}^{mk} \Big ({{\mathbf {{\small {\uppercase {G}}}}}}R w'(s) - {{\mathbf {{\small {\uppercase {G}}}}}}_{h}R_{h}{\mathcal {I}}_{h}w'(s)\Big )\textrm{d}s. \end{aligned}$$

This yields

$$\begin{aligned} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}R\overline{w'}_{h}- {{\mathbf {{\small {\uppercase {G}}}}}}_{h}R_{h}\partial \mathcal I_{{h},k}w \Vert _{L^2({\varvec{L}})} \le \Vert {{\mathbf {{\small {\uppercase {G}}}}}}R w' - {{\mathbf {{\small {\uppercase {G}}}}}}_{h}R_{h}{\mathcal {I}}_{h}w'\Vert _{L^2({\varvec{L}})}. \end{aligned}$$

Applying Lemma 5.4 to \(u=w'(t)\), squaring and integrating over \(t\in (0,T)\), we infer

$$\begin{aligned} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}R\overline{w'}_{h}- {{\mathbf {{\small {\uppercase {G}}}}}}_{h}R_{h}\partial \mathcal I_{{h},k}w \Vert _{L^2({\varvec{L}})} \le C_{2} (\zeta ^{(T)}_{h}({{\mathbf {{\small {\uppercase {G}}}}}}R w') + \widehat{\sigma }^{(T)}_{{h}}(R w') + \widehat{\sigma }^{(T)}_{{h}}(w') ). \end{aligned}$$

The proof is concluded by combining this estimate, (5.14) and (5.15). \(\square \)

Lemma 5.6

For all \(w\in H^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})\), we have

$$\begin{aligned}{} & {} \sup _{t\in [0,T]}\Vert \textsc {P}_{h}\mathcal I_{{h},k}w (t) - w(t)\Vert _L\\{} & {} \quad \le C_{5} \Big (k \left( \Vert w'\Vert _{L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})}+ \Vert w''\Vert _{L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})}\right) + \widehat{\sigma }^{(T)}_{{h}}(w) + \widehat{\sigma }^{(T)}_{{h}}(w')\Big ). \end{aligned}$$

Proof

Let us first establish a preliminary inequality. For \(s,t\in [0,T]\),

$$\begin{aligned} w'(t) = w'(s) + \int _s^t w''(\tau )\textrm{d}\tau , \end{aligned}$$

which leads to

$$\begin{aligned} \Vert w'(t)\Vert _{{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}} \le \Vert w'(s)\Vert _{{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}} + \int _0^T \Vert w''(\tau )\Vert _{{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}}\textrm{d}\tau . \end{aligned}$$

Integrating with respect to s and using the Cauchy–Schwarz inequality, we obtain

$$\begin{aligned} \sup _{t\in [0,T]} \Vert w'(t)\Vert _{{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}}\le \frac{1}{\sqrt{T}} \Vert w'\Vert _{L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})}+ \sqrt{T}\Vert w''\Vert _{L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})}. \end{aligned}$$
(5.16)

For all \(t\in [0,T]\), we have

$$\begin{aligned} \Vert \textsc {P}_{h}\mathcal I_{{h},k}w (t) - w(t)\Vert _L\le & {} \Vert \textsc {P}_{h}\mathcal I_{{h},k}w (t) -\textsc {P}_{h}{\mathcal {I}}_{h}w(t)\Vert _L+ \Vert \textsc {P}_{h}{\mathcal {I}}_{h}w(t) - w(t)\Vert _L\nonumber \\\le & {} {\widehat{C}} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}_{h}(\mathcal I_{{h},k}w (t) - {\mathcal {I}}_{h}w(t))\Vert _{{\varvec{L}}} + \Vert \textsc {P}_{h}{\mathcal {I}}_{h}w(t) - w(t)\Vert _L. \nonumber \\ \end{aligned}$$
(5.17)

The first term in the right-hand side can be bounded using (5.12), (5.10) (with \(u=w'(t)\)) and (5.16) to write

$$\begin{aligned}{} & {} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}_{h}(\mathcal I_{{h},k}w (t) - {\mathcal {I}}_{h}w(t))\Vert _{{\varvec{L}}} \nonumber \\{} & {} \quad \le (C_P{\widehat{C}}+1)k\left( \frac{1}{\sqrt{T}} \Vert w'\Vert _{L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})}+ \sqrt{T}\Vert w''\Vert _{L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})}\right) . \end{aligned}$$
(5.18)

To estimate the second term in the right-hand side of (5.17), we write, for any \(s\in (0,T)\),

$$\begin{aligned} \textsc {P}_{h}{\mathcal {I}}_{h}w(t) - w(t) = \textsc {P}_{h}{\mathcal {I}}_{h}w(s) - w(s) + \int _s^t (\textsc {P}_{h}{\mathcal {I}}_{h}w'(\tau ) - w'(\tau ))\textrm{d}\tau . \end{aligned}$$

Integrating with respect to \(s\in (0,T)\) and using the Cauchy–Schwarz inequality, we obtain

$$\begin{aligned} \Vert \textsc {P}_{h}{\mathcal {I}}_{h}w(t) - w(t)\Vert _{L} \le C_{6}\left( \widehat{\sigma }^{(T)}_{{h}}(w)+\widehat{\sigma }^{(T)}_{{h}}(w')\right) . \end{aligned}$$

Plugging this estimate together with (5.18) in (5.17) concludes the proof. \(\square \)

Lemma 5.7

For all \(w\in H^2(0,T;{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})\), it holds

$$\begin{aligned}{} & {} \delta ^{(T)}_{{h}}(w,\mathcal I_{{h},k}w) \le C_{7}\Big ( k \left( \Vert w'\Vert _{L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})}+ \Vert w''\Vert _{L^2({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}})}\right) \nonumber \\{} & {} \quad + \zeta ^{(T)}_{h}({{\mathbf {{\small {\uppercase {G}}}}}}R w') + \widehat{\sigma }^{(T)}_{{h}}(R w') + \widehat{\sigma }^{(T)}_{{h}}(w') + \widehat{\sigma }^{(T)}_{{h}}(w) \Big ). \end{aligned}$$
(5.19)

As a consequence,

$$\begin{aligned} \lim _{n\rightarrow \infty } \inf _{v\in W_{{h}_n}} \delta ^{(T)}_{{h}_n}(w,v) = 0. \end{aligned}$$
(5.20)

Proof

Recalling the definition (4.3) of \(\delta ^{(T)}_{{h}}\), the estimate (5.19) is a consequence of Lemmas 5.5, 5.3 and 5.6, once we notice that, for all \(u\in {H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}\),

$$\begin{aligned} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}R u \Vert _{{\varvec{L}}}\le C_{8} \Vert u\Vert _L\le C_{8} C_{P} \Vert u\Vert _{H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}, \end{aligned}$$

the first inequality being obtained by selecting \(\xi =v=u\) in (2.10), while the second follows from (2.1).

The relation (5.20) follows from Lemmas 5.1 and 5.2. \(\square \)

6 Numerical illustration

6.1 Irregular initial data

One of the key features of the error estimate in Theorem 4.1 is that it does not require any regularity on the solution beyond the one provided by the model itself. Let us apply our error estimate to a case where the continuous solution of the problem is such that \(u'\notin L^2(0,T;L)\). Let \(\Omega = (0,1)\), \(L= {\varvec{L}}= L^2(\Omega )\), \({{\mathbf {{\small {\uppercase {G}}}}}}u = \partial _x u\), \({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}= H^1_0(\Omega )\), \(\Phi = 0\), \(\Lambda = \textrm{Id}\), \(f=0\), \({\varvec{F}}=0\), \(\xi _0(x) = 1\), \(T=1/10\). Then the solution of Problem (2.9) is given, for \(t\in (0,T]\) and \(x\in [0,1]\), by

$$\begin{aligned} u(t)(x) = \sum _{p\in {\mathbb {N}}} \frac{4}{(2p+1)\pi } \exp \left( -((2p+1)\pi )^2 t\right) \sin ((2p+1)\pi x). \end{aligned}$$
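For reference, the series can be evaluated by truncation; for \(t>0\) the exponential factors make a few hundred terms ample. A minimal sketch (the function name and the truncation parameter are illustrative, not part of the paper):

```python
from math import exp, pi, sin

def u_exact(t, x, terms=200):
    """Truncated Fourier series for the exact solution u(t)(x) of the
    heat equation on (0,1) with initial datum xi_0 = 1 (Section 6.1)."""
    s = 0.0
    for p in range(terms):
        q = (2 * p + 1) * pi
        s += 4.0 / q * exp(-q * q * t) * sin(q * x)
    return s
```

At \(t=0\) the series converges only slowly (to 1 on (0,1)), which is the irregularity exploited in this test case; for \(t>0\) the first few modes dominate.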

We define \((X_{h},\textsc {P}_{h},{{\mathbf {{\small {\uppercase {G}}}}}}_{h})\) from the Control Volume Finite Element gradient discretisation [11, 8.4, p. 274], letting \(k = 0.9 h^2\). It consists, for a given \(M\in {\mathbb {N}}\backslash \{0\}\), in defining \(h = 1/(M+1)\), \(X_{h}= {\mathbb {R}}^M\), and, for any \(w{:}{=}(w_i)_{i=1,\ldots ,M}\), letting \(w_0 = w_{M+1} = 0\),

$$\begin{aligned} \textsc {P}_{h}w(x)= & {} w_i\hbox { if }x\in \left( \left( i-\frac{1}{2}\right) h,\left( i+\frac{1}{2}\right) h\right) \cap (0,1),\ i=0,\ldots ,M+1,\\ {{\mathbf {{\small {\uppercase {G}}}}}}_{h}w(x)= & {} \frac{w_{i+1} - w_i}{h}\hbox { if }x\in (ih,(i+1)h),\ i=0,\ldots ,M. \end{aligned}$$
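The two reconstruction operators above can be sketched as follows (an illustrative helper written for this discussion; it evaluates \(\textsc {P}_{h}w\) and \({{\mathbf {{\small {\uppercase {G}}}}}}_{h}w\) at a point not lying on the grid):

```python
def reconstructions(w, x):
    """Piecewise constant function reconstruction P_h w and piecewise
    constant gradient reconstruction G_h w at a point x in (0,1), for
    w in R^M with the convention w_0 = w_{M+1} = 0 (CVFE gradient
    discretisation of Section 6.1).  Assumes x is not a cell endpoint."""
    M = len(w)
    h = 1.0 / (M + 1)
    wext = [0.0] + list(w) + [0.0]   # enforce w_0 = w_{M+1} = 0
    i = int(x / h + 0.5)             # node index with |x - i h| < h/2
    Ph = wext[i]
    j = int(x / h)                   # cell (j h, (j+1) h) containing x
    Gh = (wext[j + 1] - wext[j]) / h
    return Ph, Gh
```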

Let \(\varphi _i:[0,1]\rightarrow [0,1]\), for \(i=1,\ldots ,M\), be the \(P^1\) finite element basis functions: \(\varphi _i(jh) = \delta _{ij}\) for all \(j=0,\ldots ,M+1\), with \(\varphi _i\) continuous and piecewise affine. Then, setting

$$\begin{aligned} v {:}{=} \sum _{i=1}^M w_i \varphi _i \in H^1_0(\Omega ), \end{aligned}$$

we get

$$\begin{aligned} {{\mathbf {{\small {\uppercase {G}}}}}}_{h}w = {{\mathbf {{\small {\uppercase {G}}}}}}v. \end{aligned}$$
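The time stepping itself is not restated here; assuming it is the standard implicit (backward Euler) gradient scheme with the lumped mass implied by \(\textsc {P}_{h}\), a possible sketch is the following (the function name is ours, and we take \(k=T/N\) with \(N=\lceil T/(0.9h^2)\rceil \) so that the final time is reached exactly, whereas the text takes \(k=0.9h^2\)):

```python
import math

def solve_heat_lumped(M=15, T=0.1):
    """Sketch of the Section 6.1 scheme under the stated assumptions:
    backward Euler in time, lumped mass h per interior node, stiffness
    (-w[i-1] + 2 w[i] - w[i+1]) / h, initial datum xi_0 = 1."""
    h = 1.0 / (M + 1)
    N = math.ceil(T / (0.9 * h * h))
    k = T / N
    w = [1.0] * M                  # interpolated initial datum xi_0 = 1
    a = h / k + 2.0 / h            # diagonal of the tridiagonal matrix
    b = -1.0 / h                   # sub- and super-diagonal
    for _ in range(N):
        # Thomas algorithm for a w_i + b (w_{i-1} + w_{i+1}) = (h/k) w_i^old
        c = [0.0] * M
        d = [0.0] * M
        c[0] = b / a
        d[0] = (h / k) * w[0] / a
        for i in range(1, M):
            den = a - b * c[i - 1]
            c[i] = b / den
            d[i] = ((h / k) * w[i] - b * d[i - 1]) / den
        w[M - 1] = d[M - 1]
        for i in range(M - 2, -1, -1):
            w[i] = d[i] - c[i] * w[i + 1]
    return w
```

Since the scheme is monotone (the system matrix is an M-matrix), the computed values remain in (0, 1), and the discrete profile inherits the symmetry of the data.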

The advantage of this discretisation method is that it satisfies monotonicity properties, owing to the fact that the mass matrix is lumped, which accounts for the definition of \(\textsc {P}_{h}\). We show in Fig. 1 the exact solution at different times, and the approximate solution obtained by the scheme at the final time. In this case, the continuous solution satisfies neither \(u'\in L^2(0,T;L)\) nor \(u\in L^2(0,T;H^2(\Omega ))\). Indeed, for any \(T>0\), we have

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \Vert u'\Vert _{L^2(\varepsilon ,T;L^2(\Omega ))} = \lim _{\varepsilon \rightarrow 0} \Vert \Delta u\Vert _{L^2(\varepsilon ,T;L^2(\Omega ))} = +\infty . \end{aligned}$$

This can be shown by noticing that, multiplying \(u'=\Delta u\) by \(u'\) and integrating in space over \(\Omega \) and in time over \((\varepsilon ,T)\),

$$\begin{aligned} \Vert u'\Vert _{L^2(\varepsilon ,T;L^2(\Omega ))}^2 = \frac{1}{2} \big (\Vert {{\mathbf {{\small {\uppercase {G}}}}}}u(\varepsilon )\Vert _{{\varvec{L}}}^2 - \Vert {{\mathbf {{\small {\uppercase {G}}}}}}u(T)\Vert _{{\varvec{L}}}^2\big ). \end{aligned}$$

If \( \Vert u'\Vert _{L^2(\varepsilon ,T;L^2(\Omega ))}\) remained bounded as \(\varepsilon \rightarrow 0\), then so would \(\Vert {{\mathbf {{\small {\uppercase {G}}}}}}u(\varepsilon )\Vert _{{\varvec{L}}}\). Since \(u(\varepsilon )\rightarrow \xi _0\) in \(L\), this would imply that \(\xi _0\in {H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}\), which does not hold.

Fig. 1

Exact solution at different times and approximate solution at the final time, irregular initial data

Computing the error terms involved in (4.3) of Theorem 4.1, we remark that, since the right-hand-side vanishes,

$$\begin{aligned} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}Ru' - {{\mathbf {{\small {\uppercase {G}}}}}}_{h}R_{h}\partial w\Vert _{L^2({\varvec{L}})} = \Vert {{\mathbf {{\small {\uppercase {G}}}}}}u - {{\mathbf {{\small {\uppercase {G}}}}}}_{h}w\Vert _{L^2({\varvec{L}})}. \end{aligned}$$

It therefore suffices to evaluate

$$\begin{aligned} E_1 = \Vert {{\mathbf {{\small {\uppercase {G}}}}}}u - {{\mathbf {{\small {\uppercase {G}}}}}}_{h}w\Vert _{L^2({\varvec{L}})},\ E_2 = \max _{t\in [0,T]}\Vert u(t) - \textsc {P}_{h}w(t)\Vert _{L}. \end{aligned}$$

In order to compute an accurate value of \(E_1\), we remark that

$$\begin{aligned} \int _{(m-1)k}^{mk} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}u(t) - {{\mathbf {{\small {\uppercase {G}}}}}}_{h}w^{(m)}\Vert _{{\varvec{L}}}^2 \textrm{d}t = T_1^{(m)} - 2 T_2^{(m)} + T_3^{(m)}, \end{aligned}$$

with, denoting by \(v^{(m)}\in H^1_0(\Omega )\) the piecewise affine function associated with \(w^{(m)}\) as above (so that \({{\mathbf {{\small {\uppercase {G}}}}}}_{h}w^{(m)} = {{\mathbf {{\small {\uppercase {G}}}}}}v^{(m)}\)) and using \(u'=\Delta u\),

$$\begin{aligned} T_1^{(m)}= & {} \int _{(m-1)k}^{mk} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}u(t)\Vert _{{\varvec{L}}}^2 \textrm{d}t = \frac{1}{2} \left( \Vert u((m-1)k)\Vert _{L}^2 - \Vert u(mk)\Vert _{L}^2\right) , \\ T_2^{(m)}= & {} \int _{(m-1)k}^{mk} \langle {{\mathbf {{\small {\uppercase {G}}}}}}u(t),{{\mathbf {{\small {\uppercase {G}}}}}}v^{(m)}\rangle _{{\varvec{L}}} \textrm{d}t = \langle u((m-1)k)- u(mk), v^{(m)}\rangle _{L}, \end{aligned}$$

and

$$\begin{aligned} T_3^{(m)} = \int _{(m-1)k}^{mk} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}v^{(m)}\Vert _{{\varvec{L}}}^2 \textrm{d}t = k \Vert {{\mathbf {{\small {\uppercase {G}}}}}}_{h}w^{(m)}\Vert _{{\varvec{L}}}^2. \end{aligned}$$

Hence

$$\begin{aligned} \int _{0}^{T} \Vert {{\mathbf {{\small {\uppercase {G}}}}}}u(t) - {{\mathbf {{\small {\uppercase {G}}}}}}_{h}w(t)\Vert _{{\varvec{L}}}^2 \textrm{d}t = \frac{1}{2} \left( \Vert \xi _0\Vert _{L}^2 - \Vert u(T)\Vert _{L}^2\right) + \sum _{m=1}^N\left( T_3^{(m)} - 2T_2^{(m)}\right) . \end{aligned}$$

It then suffices to compute the terms \(T_2^{(m)} = \langle u((m-1)k)- u(mk), v^{(m)}\rangle _{L}\) using quadrature formulas.

We observe in Fig. 2 that \(E_1\) and \(E_2\) behave as \(C\sqrt{h}\). The behaviour of \(E_2\) is compatible with the interpolation error of the solution. Indeed, defining \(w^\textrm{ini}\in X_{h}\) such that \(w_i^{\textrm{ini}} = 1\) for any \(i=1,\ldots ,M\) and \(w_0^{\textrm{ini}} = w_{M+1}^{\textrm{ini}} = 0\), we have \(\Vert \xi _0 - \textsc {P}_{h}w^{\textrm{ini}}\Vert _{L} = \sqrt{2 (1-0)^2 \frac{h}{2}} = \sqrt{h}\). Note that the function \({\varvec{v}}\) given by (4.5) vanishes in this case, which implies that \(\zeta ^{(T)}_{h}({\varvec{v}}) = 0\).
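This value can be checked directly: \(\textsc {P}_{h}w^{\textrm{ini}}\) differs from \(\xi _0\) only on the two boundary half-cells \((0,h/2)\) and \((1-h/2,1)\), where it vanishes. A one-line verification (the helper name is ours):

```python
from math import sqrt

def init_interp_error(M):
    """L^2(0,1) distance between xi_0 = 1 and P_h w_ini, where w_ini has
    value 1 at interior nodes and 0 at boundary nodes: P_h w_ini equals 1
    except on the half-cells (0, h/2) and (1 - h/2, 1), where it is 0."""
    h = 1.0 / (M + 1)
    err_squared = 2 * (1.0 - 0.0) ** 2 * (h / 2)  # exact integral of the square
    return sqrt(err_squared)
```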

Fig. 2

Errors \(E_1\) and \(E_2\) for different values of h, irregular initial data

6.2 Irregular right-hand-side

We consider again \(\Omega = (0,1)\), \(L= {\varvec{L}}= L^2(\Omega )\), \({{\mathbf {{\small {\uppercase {G}}}}}}u = \partial _x u\), \({H_{{{\mathbf {{\small {\uppercase {G}}}}}}}}= H^1_0(\Omega )\), \(\Phi = 0\), \(\Lambda = \textrm{Id}\), \(T=1/10\), and \(\xi _0,f,{\varvec{F}}\) such that the solution of Problem (2.9) is given, for \(t\in (0,T]\) and \(x\in [0,1]\), by

$$\begin{aligned} u(t)(x) = t \min (x,1-x). \end{aligned}$$

Hence we let \(\xi _0 = 0\), \(f(t)(x) = \min (x,1-x)\) and \({\varvec{F}}(t)(x) = -\partial _x u(t)(x)\).
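The consistency of this choice of data can be checked pointwise: the flux \(\Lambda \nabla u + {\varvec{F}}\) vanishes, so the equation reduces to \(u'=f\), which holds since \(u\) is linear in time. A small numerical sketch (helper names are ours):

```python
def u(t, x):
    """Exact solution of Section 6.2."""
    return t * min(x, 1.0 - x)

def f(t, x):
    """Source term: with F = -d_x u the flux vanishes and the PDE
    reduces to u' = f, and indeed d/dt u(t)(x) = min(x, 1-x)."""
    return min(x, 1.0 - x)

def total_flux(t, x, eps=1e-6):
    """Grad u + F at a point away from the kink x = 1/2, with d_x u
    approximated by a central difference; F = -d_x u cancels it exactly."""
    dxu = (u(t, x + eps) - u(t, x - eps)) / (2 * eps)
    F = -dxu
    return dxu + F
```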

This problem is approximated on [0, T] using the same discretisation method as in the previous section with \(k = 0.9 h^2\), and specifying odd values for M. Figure 3 shows the exact solution at different times, and the approximate solution obtained by the scheme at the final time.

Fig. 3

Exact solution at different times and approximate solution at the final time, irregular right-hand-side

Fig. 4

Errors \(E_1\) and \(E_2\) for different values of h, irregular right-hand-side

We see in Fig. 4 that \(E_1\) behaves as \(h^2\) and \(E_2\) as h. These orders are consistent with the expected interpolation error of the function \(u(t)(x) = t\min (x,1-x)\). Indeed, if we define the interpolate \((w^m)_{m=0,\ldots ,N}\in X_{h}^{N+1}\) such that \(w_i^m = mk\min (ih,1-ih)\) for any \(i=0,\ldots ,M+1\), the \(L^2(\Omega )\) interpolation error on the gradient behaves as \((m+1)k - t\) for any \(mk<t<(m+1)k\). This yields an error \(E_1\) behaving as \(k\sim h^2\) in the \(L^2({\varvec{L}})\) norm. The error \(E_2\) behaves as the \(L^2(\Omega )\) norm of the difference between an affine function and a piecewise constant function with step h, which gives a behaviour in h.