1 Introduction

The purpose of this paper is to discuss some aspects of the numerical solution of the semilinear time-fractional initial boundary value problem

$$\begin{aligned} ^{C}\!\partial _t^{\alpha }u +\mathcal {L}u=f(x,t,u)\; \text{ in } \Omega \times (0,T_0],\quad u(x,0)=u_0(x)\; \text{ in } \Omega , \end{aligned}$$
(1.1)

subject to a homogeneous Dirichlet boundary condition, where \(\Omega \subset \mathbb {R}^d\) (\(d\ge 2\)) is a bounded convex polyhedral domain with boundary \(\partial \Omega \) and \(T_0>0\) is a fixed time. Here \(u_0\) is the given initial data and f is a smooth function of its arguments satisfying

$$\begin{aligned} \sup _{x\in \Omega ,t\in (0,T_0)}\big (|\partial _t f(x,t,u)|+|\partial _u f(x,t,u)|\big )\le L \quad \forall u\in \mathbb {R}. \end{aligned}$$
(1.2)

The operator \(\mathcal {L}\) is defined by \(\mathcal {L}u = -\text{ div } [A(x)\nabla u]+\kappa (x)u\), where \(A(x)=[{a_{ij}(x)}]\) is a \(d\times d\) symmetric and uniformly positive definite in \(\bar{\Omega }\) matrix, and \(\kappa \in L^\infty (\Omega )\) is nonnegative. The coefficients \(a_{ij}\) and \(\kappa \) are assumed to be sufficiently smooth on \(\bar{\Omega }\). The operator \(^{C}\!\partial _t^{\alpha }\) is the Caputo fractional derivative in time of order \(\alpha \in (0,1)\) defined by

$$\begin{aligned} ^{C}\!\partial _t^{\alpha }\varphi (t)=\frac{1}{\Gamma (1-\alpha )}\int _0^t(t-s)^{-\alpha }\partial _s\varphi (s)\,ds,\quad 0<\alpha <1, \end{aligned}$$
(1.3)

where \(\partial _s\varphi =\partial \varphi /\partial s\) and \(\Gamma (\cdot )\) denotes the usual Gamma function. As \(\alpha \rightarrow 1^-\), \(^{C}\!\partial _t^{\alpha }\) converges to \(\partial _t\), and thus, problem (1.1) reduces to the standard semilinear parabolic problem [20].
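As a quick worked instance of definition (1.3), added here for illustration, take \(\varphi (t)=t^{\beta }\) with \(\beta >0\). Using \(\partial _s\varphi (s)=\beta s^{\beta -1}\) and the Beta integral \(\int _0^t(t-s)^{-\alpha }s^{\beta -1}\,ds=t^{\beta -\alpha }\,\Gamma (1-\alpha )\Gamma (\beta )/\Gamma (\beta +1-\alpha )\), one finds

$$\begin{aligned} ^{C}\!\partial _t^{\alpha }t^{\beta }=\frac{\Gamma (\beta +1)}{\Gamma (\beta +1-\alpha )}\,t^{\beta -\alpha },\quad \beta >0, \end{aligned}$$

which formally reduces to \(\beta t^{\beta -1}\) as \(\alpha \rightarrow 1^-\), in agreement with the limit noted above.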

Let \((\cdot ,\cdot )\) denote the inner product in \(L^2(\Omega )\) with induced norm \(\Vert \cdot \Vert \). Since \(\Omega \) is convex, the solution of the elliptic problem \(\mathcal {L}u=f\) in \(\Omega \), with \(u=0\) on \(\partial \Omega \) and \(f\in L^2(\Omega )\), belongs to \(H^2(\Omega )\). With \(\mathcal {D}(\mathcal {L})=H^2(\Omega )\cap H^1_0(\Omega )\), recall that the operator \(\mathcal {L}:\mathcal {D}(\mathcal {L})\rightarrow L^2(\Omega )\) is selfadjoint, positive definite and has a compact inverse. Let \(\{\lambda _j,\varphi _j\}_{j=1}^\infty \) denote the eigenvalues and eigenfunctions of \(\mathcal {L}\), with \(\{\varphi _j\}_{j=1}^\infty \) an orthonormal basis in \(L^2(\Omega )\). By the spectral method, the fractional powers of \(\mathcal {L}\) are defined by

$$\begin{aligned} \mathcal {L}^\nu v = \sum _{j=1}^\infty \lambda _j^\nu (v,\varphi _j)\varphi _j, \quad \nu >0, \end{aligned}$$

with domains \(\mathcal {D}(\mathcal {L}^\nu )=\{v\in L^2(\Omega ): \Vert \mathcal {L}^\nu v\Vert <\infty \}\). Note that \(\{\mathcal {D}(\mathcal {L}^\nu )\}\) is a Hilbert scale of interpolation spaces and \(\mathcal {D}(\mathcal {L})\subset \mathcal {D}(\mathcal {L}^\nu )\subset \mathcal {D}(\mathcal {L}^\beta ) \subset \mathcal {D}(\mathcal {L}^0)= L^2(\Omega )\) with continuous and compact embeddings for \(0<\beta<\nu <1\).
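The spectral definition of \(\mathcal {L}^\nu \) can be mimicked on a matrix. The following is a minimal sketch, assuming a finite-difference discretization of the one-dimensional Dirichlet Laplacian on (0, 1) as a stand-in for \(\mathcal {L}\); this stand-in and the helper name `fractional_power_apply` are illustrative choices and do not appear in the analysis. It applies \(\mathcal {L}^{1/2}\) twice and compares the result with \(\mathcal {L}v\).

```python
import numpy as np

def fractional_power_apply(L, v, nu):
    """Apply L**nu to v through the eigen-decomposition of a symmetric
    positive definite matrix L (discrete analogue of the spectral sum)."""
    lam, phi = np.linalg.eigh(L)      # eigenvalues, orthonormal eigenvectors (columns)
    coeffs = phi.T @ v                # discrete counterpart of (v, phi_j)
    return phi @ (lam**nu * coeffs)   # sum_j lam_j**nu (v, phi_j) phi_j

# Hypothetical stand-in for the elliptic operator: 1D Dirichlet Laplacian
n = 50
h = 1.0 / (n + 1)
L = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
v = np.sin(np.pi * h * np.arange(1, n + 1))

half = fractional_power_apply(L, v, 0.5)
twice_half = fractional_power_apply(L, half, 0.5)   # should reproduce L @ v
```

Composing two half powers recovers the full operator, which is the semigroup property \(\mathcal {L}^{\nu }\mathcal {L}^{\beta }=\mathcal {L}^{\nu +\beta }\) implied by the spectral sum.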

The regularity of the solution of (1.1) plays a key role in our error analysis. For initial data \(u_0\in \mathcal {D}(\mathcal {L}^\nu )\), \(\nu \in (0,1]\), problem (1.1) has a unique solution u satisfying [1, Theorem 3.1]:

$$\begin{aligned}&u\in C^{\alpha \nu }([0,T_0];L^2(\Omega )) \cap C([0,T_0];\mathcal {D}(\mathcal {L}^\nu )) \cap C((0,T_0];\mathcal {D}(\mathcal {L})),\end{aligned}$$
(1.4)
$$\begin{aligned}&^{C}\!\partial _t^{\alpha }u\in C((0,T_0];L^2(\Omega )),\end{aligned}$$
(1.5)
$$\begin{aligned}&\partial _t u(t)\in L^2(\Omega ) \quad \text{ and }\quad \Vert \partial _t u(t)\Vert \le c t^{\alpha \nu -1}, \;\; \quad t\in (0,T_0]. \end{aligned}$$
(1.6)

The results show that the solution of the semilinear problem (1.1) enjoys (to some extent) smoothing properties analogous to those of the homogeneous linear problem. For \(u_0\in L^2(\Omega )\), it is shown that [1, Theorem 3.2]

$$\begin{aligned} u\in C([0,T_0];L^2(\Omega )) \cap L^\gamma (0,T_0;\mathcal {D}(\mathcal {L})), \quad \gamma <1/\alpha . \end{aligned}$$
(1.7)

Note that the first time derivative of u is not smooth enough in space even in the case of smooth initial data. This causes a major difficulty in deriving optimal error estimates based on standard techniques, such as the energy method.

The numerical approximation of fractional differential equations has received considerable attention over the last two decades. For linear time-fractional equations, a vast literature is now available. See the short list [11, 12, 17, 19, 27, 28] on problems with nonsmooth data and [13] for a concise overview and recent developments. In contrast, numerical studies on nonlinear time-fractional evolution problems are rather limited. In [22], a linearized \(L^1\)-Galerkin FEM was proposed to solve a nonlinear time-fractional Schrödinger equation. In [21], \(L^1\)-type schemes have been analyzed for approximating the solution of (1.1). The error estimates in [22] and [21] are derived under high regularity assumptions on the exact solution, so the limited smoothing property of the model (1.1) was not taken into consideration. In [14], the numerical solution of (1.1) was investigated assuming that the nonlinearity f is uniformly Lipschitz in u and the initial data \(u_0\in \mathcal {D}(\mathcal {L})\). Error estimates are established for linearized time-stepping schemes based on the \(L^1\)-method and a convolution quadrature generated by the backward Euler difference formula. In the recent paper [1], we derived error estimates for the same problem with initial data \(u_0\in \mathcal {D}(\mathcal {L}^\nu )\), \(\nu \in (0,1]\). The new estimates extend known results obtained for the standard semilinear parabolic problem [15]. For other types of time-fractional problems, one may refer to [5] for fractional diffusion-wave equations and to [29] for an integro-differential equation.

In this paper, we approximate the solution of the semilinear problem (1.1) by general Galerkin type approximation methods in space and a convolution quadrature in time. Our aim is to develop a unified error analysis with optimal error estimates with respect to the data regularity. We shall follow a semigroup type approach and make use of the inverse of the associated elliptic operator [3]. The current study extends the recent work [17] dealing with the homogeneous linear problem, which relied on the energy technique. Our analysis includes conforming, nonconforming and mixed FEMs, and the results are applicable to nonlinear multi-term diffusion problems. It is worth noting that most of our results hold in the limiting case \(\alpha =1\), i.e., our study also generalizes the work [3]. Particularly interesting are the estimates derived for the mixed FEM, which are new and have not been established earlier.

The paper is organized as follows. In Sect. 2, a general setting of the problem is introduced and preliminary error estimates are derived, which require regularity properties analogous to those of the homogeneous linear problem. In Sect. 3, an alternative error estimation is proposed without a priori regularity assumptions on the exact solution. Time-stepping schemes based on a backward Euler convolution quadrature method are analyzed in Sect. 4. Applications are presented in Sect. 5. The mixed form of problem (1.1) is considered in Sect. 6 and related convergence rates are obtained. Finally, numerical results are provided to validate the theoretical findings.

Throughout the paper, we denote by c a constant which may vary at different occurrences, but is always independent of the mesh size h and the time step size \(\tau \). We shall also use the abbreviations f(u) and f(t) for f(x, t, u) and f(x, t), respectively.

2 General setting and preliminary estimates

Set \(T=\mathcal {L}^{-1}\). Then, \(T:L^2(\Omega )\rightarrow \mathcal {D}(\mathcal {L})\) is compact, selfadjoint and positive definite. In terms of T, we may write (1.1) as

$$\begin{aligned} T^{C}\!\partial _t^{\alpha }u +u = Tf(u),\quad t>0,\quad u(0)=u_0. \end{aligned}$$
(2.1)

For the purpose of approximating the solution of this problem, let \(V_h\subset L^2(\Omega )\) be a family of finite-dimensional spaces that depends on h, \(0<h<1\). We assume that we are given a corresponding family of linear operators \(T_h:L^2(\Omega )\rightarrow V_h\) which approximate T. Then consider the semidiscrete problem: find \(u_h(t)\in V_h\) for \(t\ge 0\) such that

$$\begin{aligned} T_h^{C}\!\partial _t^{\alpha }u_h +u_h = T_hf(u_h),\quad t>0 ,\quad u_h(0)=u_{0h}\in V_h, \end{aligned}$$
(2.2)

where \(u_{0h}\) is a suitably chosen approximation of \(u_0\). In our analysis, we shall make the following assumptions:

  (i) \(T_h\) is selfadjoint, positive semidefinite on \(L^2(\Omega )\) and positive definite on \(V_h\).

  (ii) \(T_hP_h=T_h\), where \(P_h:L^2(\Omega )\rightarrow V_h\) is the orthogonal \(L^2\)-projection onto \(V_h\).

  (iii) For some constants \(\gamma >0\) and \(c>0\), there holds

    $$\begin{aligned} \Vert T_hf-Tf\Vert \le ch^{\gamma }\Vert f\Vert \quad \forall f\in L^2(\Omega ). \end{aligned}$$
    (2.3)

Since \(T_h^{-1}\) exists on \(V_h\), (2.2) may be solved uniquely for \(t> 0\). The following diagram displays the different links between the operators under consideration:

figure a

In the diagram, the operator \(R_h:D(\mathcal {L})\rightarrow V_h\) is defined by \(R_h=T_h\mathcal {L}\). It is the analogue of the Ritz elliptic projection in the context of Galerkin finite element (FE) methods. Note that \(R_hT=T_h\), and in view of (2.3), \(R_h\) satisfies

$$\begin{aligned} \Vert R_h v-v\Vert = \Vert T_h\mathcal {L}v-T\mathcal {L}v\Vert \le ch^\gamma \Vert \mathcal {L}v\Vert \quad \forall v\in D(\mathcal {L}). \end{aligned}$$
(2.4)

Further, by the definition of \(P_h\), we see that \(\Vert P_h v-v\Vert \le \Vert R_h v-v\Vert \, \forall v \in \mathcal {D}(\mathcal {L})\).

Examples of families \(\{T_h\}\) with the above properties are provided by the standard Galerkin FE and spectral methods in the case \(V_h\subset H_0^1(\Omega )\), and by other nonconforming Galerkin methods in the case \(V_h\not \subset H_0^1(\Omega )\). The mixed FE method applied to (1.1) is a typical example which has the above properties and will be considered in this study.

By our assumptions on \(T_h\), the operator \((z^{-\alpha } I+T_h)^{-1}:L^2(\Omega )\rightarrow L^2(\Omega )\) satisfies

$$\begin{aligned} \Vert (z^{-\alpha } I+T_h)^{-1}\Vert \le M |z|^{\alpha } \quad \forall z\in \Sigma _\theta , \end{aligned}$$
(2.5)

where \(\Sigma _{\theta }\) is the sector \( \Sigma _{\theta }=\{z\in \mathbb {C}, \,z\ne 0,\, |\arg z|< \theta \} \) with \(\theta \in (\pi /2,\pi )\) being fixed and M depends on \(\theta \). In (2.5), and in the sequel, we keep the same notation \(\Vert \cdot \Vert \) to denote the operator norm from \(L^2(\Omega )\rightarrow L^2(\Omega )\). Using that

$$\begin{aligned} (z^{-\alpha } I+T_h)^{-1}T_h=I- z^{-\alpha } (z^{-\alpha } I+T_h)^{-1}, \end{aligned}$$
(2.6)

we obtain

$$\begin{aligned} \Vert (z^{-\alpha } I+T_h)^{-1}T_h\Vert \le 1+M \quad \forall z\in \Sigma _\theta . \end{aligned}$$
(2.7)

Note that (2.5) and (2.7) hold for T. By means of the Laplace transform, the solution of problem (2.2) is represented by

$$\begin{aligned} u_h(t)=E_h(t)u_{0h}+\int _0^t {\bar{E}}_h(t-s)f(u_h(s))\,ds,\quad t>0. \end{aligned}$$
(2.8)

The operators \(E_h(t):L^2(\Omega )\rightarrow L^2(\Omega )\) and \(\bar{E}_h(t):L^2(\Omega )\rightarrow L^2(\Omega )\) are defined by

$$\begin{aligned} E_h(t) = \frac{1}{2\pi i}\int _{\Gamma _{\theta ,\delta }}e^{zt} z^{-1}K_h(z) \,dz \quad \text{ and } \quad {\bar{E}}_h(t) = \frac{1}{2\pi i}\int _{\Gamma _{\theta ,\delta }}e^{zt}z^{-\alpha } K_h(z)\,dz, \end{aligned}$$
(2.9)

respectively, where \(K_h(z):=(z^{-\alpha } I+T_h)^{-1}T_h\). The contour \(\Gamma _{\theta ,\delta }=\{\rho e^{\pm i\theta }:\rho \ge \delta \}\cup \{\delta e^{i\psi }: |\psi |\le \theta \},\) with \(\theta \in (\pi /2,\pi )\) and \(\delta > 0\), is oriented with an increasing imaginary part. Similarly, the solution u of problem (2.1) is given by

$$\begin{aligned} u(t)=E(t)u_{0}+\int _0^t {\bar{E}}(t-s)f(u(s))\,ds,\quad t>0, \end{aligned}$$
(2.10)

where the operators E and \(\bar{E}\) are respectively defined in terms of \(K(z):=(z^{-\alpha } I+T)^{-1}T\) as in (2.9). Standard arguments show that (see for instance [26])

$$\begin{aligned} \Vert { E}(t)v\Vert +\Vert {E}_h(t)v\Vert + t^{1-\alpha }\left( \Vert {\bar{E}}(t)v\Vert +\Vert {\bar{E}}_h(t)v\Vert \right) \le c \Vert v\Vert \quad \forall v\in L^2(\Omega ). \end{aligned}$$
(2.11)
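For orientation, the following scalar computation (an illustration added here, not part of the analysis) shows what (2.9) and (2.11) amount to on a single eigenmode. On the eigenfunction \(\varphi _j\), T acts as multiplication by \(\lambda _j^{-1}\), so that \(K(z)\varphi _j=z^{\alpha }(z^{\alpha }+\lambda _j)^{-1}\varphi _j\). Inverting the Laplace transforms \(z^{-1}K(z)\) and \(z^{-\alpha }K(z)\) with the standard pair \(\int _0^\infty e^{-zt}t^{b-1}E_{\alpha ,b}(-\lambda t^{\alpha })\,dt=z^{\alpha -b}(z^{\alpha }+\lambda )^{-1}\) gives

$$\begin{aligned} E(t)\varphi _j=E_{\alpha ,1}(-\lambda _jt^{\alpha })\,\varphi _j \quad \text{ and } \quad {\bar{E}}(t)\varphi _j=t^{\alpha -1}E_{\alpha ,\alpha }(-\lambda _jt^{\alpha })\,\varphi _j, \end{aligned}$$

where \(E_{\alpha ,b}(x)=\sum _{k=0}^\infty x^k/\Gamma (\alpha k+b)\) denotes the two-parameter Mittag-Leffler function (not to be confused with the solution operator E). Since \(E_{\alpha ,1}(-x)\) and \(E_{\alpha ,\alpha }(-x)\) are bounded for \(x\ge 0\), this is consistent with the weight \(t^{1-\alpha }\) in (2.11).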

Now let \(e(t)=u_h(t)-u(t)\) denote the error at time t. Define the intermediate solution \(v_h(t)\in V_h\), \(t\ge 0\), by

$$\begin{aligned} T_h^{C}\!\partial _t^{\alpha }v_h +v_h = T_h f(u),\quad t>0 ,\quad v_h(0)=u_{0h}. \end{aligned}$$
(2.12)

Then, by splitting the error \(e = (u_h-v_h)+(v_h-u)=:\eta +\xi \), and subtracting (2.1) from (2.12), we find that \(\xi \) satisfies

$$\begin{aligned} T_h^{C}\!\partial _t^{\alpha }\xi (t)+\xi (t)= (T_h-T)(f(u)-^{C}\!\partial _t^{\alpha }u)(t),\quad t>0. \end{aligned}$$
(2.13)

With

$$\begin{aligned} \rho (t) := (T_h-T)(f(u)-^{C}\!\partial _t^{\alpha }u)(t), \end{aligned}$$
(2.14)

we thus obtain

$$\begin{aligned} T_h^{C}\!\partial _t^{\alpha }\xi (t) +\xi (t) = \rho (t),\quad t>0, \quad \xi (0)\in L^2(\Omega ). \end{aligned}$$
(2.15)

Before proving the main result of this section, we recall the following lemma which generalizes the classical Gronwall’s inequality, see [6].

Lemma 2.1

Assume that y is a nonnegative function in \(L^1(0,T_0)\) which satisfies

$$\begin{aligned} y(t) \le g(t)+\beta \int _0^t(t-s)^{-\alpha }y(s)\,ds\quad \text{ for } t\in (0,T_0], \end{aligned}$$

where \(g(t)\ge 0\), \(\beta \ge 0\), and \(0<\alpha <1\). Then there exists a constant \(C_{T_0}\) such that

$$\begin{aligned} y(t) \le g(t)+ C_{T_0}\int _0^t(t-s)^{-\alpha }g(s)\,ds\quad \text{ for } t\in (0,T_0]. \end{aligned}$$

Note that, by using (2.10), (2.11) and the inequality

$$\begin{aligned} \Vert f(u(t))\Vert \le \Vert f(u(t))-f(0)\Vert +\Vert f(0)\Vert \le L\Vert u(t)\Vert +\Vert f(0)\Vert , \quad t\ge 0, \end{aligned}$$
(2.16)

Lemma 2.1 implies that \(\Vert u(t)\Vert \le c(\Vert u_0\Vert +\Vert f(0)\Vert )\) for \(t\ge 0\) with \(c=c(\alpha ,L,T_0)\). Now we are ready to prove an error estimate for problem (2.1). Here \(\rho \) is given by (2.14) and \(\tilde{\rho }(t):=\int _0^t\rho (s)\,ds\).

Lemma 2.2

Let u and \(u_h\) be the solutions of (2.1) and (2.2), respectively. Assume that \(T_h(u_0-u_{0h})=0\). Then, for \(t>0\),

$$\begin{aligned} \Vert e(t)\Vert \le c \left( G(t)+\int _0^t(t-s)^{\alpha -1}G(s)\,ds\right) , \end{aligned}$$
(2.17)

where

$$\begin{aligned} G(t)= t^{-1}\sup _{s\le t} ( \Vert \tilde{\rho }(s)\Vert +s\Vert \rho (s)\Vert + s^2\Vert \rho _t(s)\Vert ), \end{aligned}$$
(2.18)

and c is independent of h.

Proof

First we derive a bound for the difference between u and the intermediate solution \(v_h\). Since \(T_h\xi (0)=0\), an application of Lemma 3.5 in [17] to (2.1) and (2.12) yields \(\Vert \xi (t)\Vert \le c G(t)\), where G is given by (2.18). In view of the splitting \(e =\eta +\xi \), it suffices to estimate \(\Vert \eta \Vert \). Note that \(\eta \) satisfies

$$\begin{aligned} T_h^{C}\!\partial _t^{\alpha }\eta (t) +\eta (t) = T_h(f(u_h)-f(u)),\quad t>0, \quad \eta (0)=0. \end{aligned}$$

Hence, by Duhamel’s principle,

$$\begin{aligned} \eta (t)=\int _0^t\bar{E}_h(t-s)(f(u_h(s))-f(u(s)))\,ds, \quad t>0. \end{aligned}$$

Using the property of \(\bar{E}_h\) in (2.11) and condition (1.2), we see that

$$\begin{aligned} \Vert \eta (t)\Vert \le c L\int _0^t (t-s)^{\alpha -1}\Vert u_h(s)-u(s)\Vert \,ds, \quad t>0, \end{aligned}$$

and thus

$$\begin{aligned} \Vert e(t)\Vert \le \Vert \xi (t)\Vert +c \int _0^t(t-s)^{\alpha -1}\Vert e(s)\Vert \,ds, \quad t>0. \end{aligned}$$

An application of Lemma 2.1 yields (2.17), which completes the proof. \(\square \)

Clearly the error estimate in Lemma 2.2 is meaningful provided that \(G\in L^1(0,T_0)\). Recalling that \(\rho =(T_h-T)\mathcal {L}u\), we have by (2.3), \(\Vert \partial _t \rho (t)\Vert \le ch^\gamma \Vert \mathcal {L}\partial _t u(t)\Vert \). Hence, to achieve an \(O(h^\gamma )\) order of convergence, we need to assume that \(\partial _t u(t)\in \mathcal {D}(\mathcal {L})\) for \(t\in (0,T_0]\). It turns out that, without additional conditions on the initial data and the nonlinearity, this property, which holds in the linear case, does not generalize to the semilinear problem. This remark equally applies to the semilinear parabolic problem, see the discussion in [32, p. 228].

3 Error estimates without regularity assumptions

We shall present below an alternative derivation of the error bound without a priori regularity assumptions on the exact solution. To do so, we first introduce the operator

$$\begin{aligned} S_h(z) =(z^{-\alpha }I+ T_h)^{-1}T_h-(z^{-\alpha }I+ T)^{-1}T. \end{aligned}$$

Then \(S_h\) satisfies the following property.

Lemma 3.1

There holds

$$\begin{aligned} \Vert S_h(z)v\Vert \le ch^{\gamma }|z|^{\alpha (1-\nu )}\Vert \mathcal {L}^\nu v\Vert \quad \forall z\in \Sigma _\theta ,\quad \nu \in [0,1]. \end{aligned}$$
(3.1)

Proof

Using the identity (2.6), we verify that

$$\begin{aligned} S_h(z)= & {} z^{-\alpha }(z^{-\alpha } I+T)^{-1}\left[ (z^{-\alpha } I+T_h)-(z^{-\alpha } I+T)\right] (z^{-\alpha } I+T_h)^{-1}\\= & {} z^{-\alpha }(z^{-\alpha } I+T)^{-1}(T_h-T)(z^{-\alpha } I+T_h)^{-1}. \end{aligned}$$

Then, by (2.5) and (2.3),

$$\begin{aligned} \Vert S_h(z)v\Vert \le c \Vert (T_h-T)(z^{-\alpha } I+T_h)^{-1}v\Vert \le ch^{\gamma } |z|^{\alpha }\Vert v\Vert . \end{aligned}$$

This shows (3.1) for \(\nu =0\). For \(\nu =1\), i.e., \(v\in D(\mathcal {L})\), we have \(T\mathcal {L}v=v\). Then, by (2.3) and (2.7), we get

$$\begin{aligned} \Vert S_h(z)v\Vert \le c \Vert (T_h-T)(z^{-\alpha } I+T)^{-1}T\mathcal {L}v\Vert \le ch^{\gamma } \Vert \mathcal {L}v\Vert . \end{aligned}$$

The desired estimate (3.1) follows now by interpolation. \(\square \)

We further introduce the following operators: \(F_h(t)=E_h(t)-E(t)\) and \(\bar{F}_h(t)=\bar{E}_h(t)-\bar{E}(t)\). Then, by Lemma 3.1,

$$\begin{aligned} \Vert \bar{F}_h(t)v\Vert \le ch^\gamma \int _{\Gamma _{\theta ,1/t}}e^{Re(z)t}\,|dz|\;\Vert v\Vert \le ct^{-1}h^\gamma \Vert v\Vert . \end{aligned}$$
(3.2)

Similarly, based on (3.1), the following estimate

$$\begin{aligned} \Vert F_h(t)v\Vert \le ct^{-\alpha (1-\nu )} h^\gamma \Vert \mathcal {L}^\nu v\Vert , \quad \nu =0,1, \end{aligned}$$
(3.3)

holds for \(F_h(t)\). Now we are ready to prove a nonsmooth data error estimate. Here and throughout the paper, \({\ell _h(\nu )}=|\ln h|\) if \(\nu =0\) and 1 otherwise.

Theorem 3.1

Let \(u_0\in \mathcal {D}(\mathcal {L}^\nu )\), \(\nu \in [0,1]\). Let u and \(u_h\) be the solutions defined by (2.10) and (2.8), respectively, with \(u_{0h}=P_hu_0\). Then there is a constant \(c=c(r,L,T_0)\), where \(r\ge \Vert \mathcal {L}^\nu u_0\Vert +\Vert f(0)\Vert \), such that

$$\begin{aligned} \Vert u_h(t)-u(t)\Vert \le ch^\gamma {\ell _h(\nu )}t^{-\alpha (1-\nu )},\quad t\in (0,T_0]. \end{aligned}$$
(3.4)

Proof

Recall that \(e=u_h-u\). From (2.8) and (2.10), we get after rearrangements

$$\begin{aligned} e(t)= F_h(t)u_0+\int _0^t \bar{E}_h(t-s)[f(u_h(s))-f(u(s))]\,ds+\int _0^t \bar{F}_h(t-s) f(u(s))\,ds. \end{aligned}$$
(3.5)

The last term in (3.5) can be written as \(I+II\) where

$$\begin{aligned} I = \int _0^t \bar{F}_h(t-s) (f(u(s))-f(u(t)))\,ds \quad \text{ and } \quad II=\left( \int _0^t \bar{F}_h(t-s) \,ds\right) f(u(t)). \end{aligned}$$

For \(\nu \in (0,1]\), we use (3.2), (1.2) and the property \(u\in C^{\alpha \nu }([0,T_0],L^2(\Omega ))\) to get

$$\begin{aligned} \Vert I\Vert \le ch^\gamma \int _0^t(t-s)^{-1}(t-s)^{\alpha \nu }\,ds \le ch^\gamma t^{\alpha \nu }. \end{aligned}$$

To estimate II, we introduce the operator \(\widetilde{E}(t)=\frac{1}{2\pi i}\int _{\Gamma _{\theta ,\delta }}e^{zt} z^{-1-\alpha } S_h(z)\,dz.\) Then \(\widetilde{E}'(t)=\bar{F}_h(t)\) and \(\Vert \widetilde{E}(t)\Vert \le ch^\gamma \) for all \(t\ge 0\) since \(\Vert S_h(z)\Vert \le ch^\gamma |z|^{\alpha }\). Hence, \(\Vert II\Vert \le \Vert f(u(t))\Vert \,\Vert \widetilde{E}(t)-\widetilde{E}(0)\Vert \le c h^\gamma \) as \(\Vert f(u)\Vert \) is bounded, see (2.16). Now, using the properties of \(\bar{E}_h\) and \(F_h\) in (2.11) and (3.3), respectively, we obtain

$$\begin{aligned} \Vert e(t)\Vert \le ct^{-\alpha (1-\nu )} h^\gamma \Vert \mathcal {L}^\nu u_0\Vert +c\int _0^t(t-s)^{\alpha -1}\Vert e(s)\Vert \,ds+ch^\gamma + ch^\gamma t^{\alpha \nu }. \end{aligned}$$
(3.6)

The desired estimate follows now by applying Lemma 2.1. To establish the estimate for \(\nu =0\), we follow the same arguments presented in the proof of Theorem 4.4 in [1]. \(\square \)

As an immediate application of Theorem 3.1, consider the standard conforming Galerkin FEM, where \(V_h\subset H_0^1(\Omega )\) consists of piecewise linear functions on a shape-regular triangulation with a mesh parameter h. Let \(T_h:L^2(\Omega )\rightarrow V_h\) be the solution operator of the discrete problem:

$$\begin{aligned} a(T_hf, v)=(f,v)\quad \forall v \in V_h, \end{aligned}$$
(3.7)

where \(a(\cdot ,\cdot )\) is the bilinear form associated with the elliptic operator \(\mathcal {L}\). Then, \(T_h\) is selfadjoint, positive semidefinite on \(L^2(\Omega )\) and positive definite on \(V_h\), see [3], and satisfies (2.3) with \(\gamma =2\). Thus, by Theorem 3.1,

$$\begin{aligned} \Vert u_h(t)-u(t)\Vert \le c h^2 {\ell _h(\nu )}t^{-\alpha (1-\nu )}, \quad t >0, \end{aligned}$$
(3.8)

for \(\nu \in [0,1]\). This improves the following estimate

$$\begin{aligned} \Vert u_h(t)-u(t)\Vert \le ch^2\left( t^{-\alpha (1-\nu )}+\max (0,\ln (t^{\alpha (1-\nu )}/h^2))\right) ,\quad t>0, \end{aligned}$$
(3.9)

established in [1]. We notice that the logarithmic factor is also present in the parabolic case when \(\nu =0\), see [15, Theorem 1.1].
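The approximation property (2.3) with \(\gamma =2\) for the Galerkin solution operator in (3.7) can be observed numerically. The sketch below is only an illustration under simplifying assumptions that are not part of the paper: a one-dimensional model problem \(-u''=f\) on (0, 1) with homogeneous Dirichlet conditions, a uniform mesh, and the manufactured pair \(u=\sin (\pi x)\), \(f=\pi ^2\sin (\pi x)\). The function name `p1_l2_error` is hypothetical.

```python
import numpy as np

# 3-point Gauss-Legendre rule on [-1, 1], used for the load vector and the L2 error
GAUSS_X, GAUSS_W = np.polynomial.legendre.leggauss(3)

def p1_l2_error(n, f, u_exact):
    """L2 error of the piecewise linear Galerkin approximation T_h f
    for -u'' = f on (0, 1); n interior nodes, mesh size h = 1/(n+1)."""
    h = 1.0 / (n + 1)
    nodes = np.linspace(0.0, 1.0, n + 2)
    # Stiffness matrix a(phi_j, phi_i) for the interior hat functions
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
    load = np.zeros(n + 2)
    for e in range(n + 1):                       # assemble (f, phi_i) element by element
        xl, xr = nodes[e], nodes[e + 1]
        xq = 0.5 * (xl + xr) + 0.5 * h * GAUSS_X
        wq = 0.5 * h * GAUSS_W
        load[e] += np.sum(wq * f(xq) * (xr - xq) / h)
        load[e + 1] += np.sum(wq * f(xq) * (xq - xl) / h)
    U = np.zeros(n + 2)                          # boundary values stay zero
    U[1:-1] = np.linalg.solve(A, load[1:-1])
    err2 = 0.0
    for e in range(n + 1):                       # accumulate the squared L2 error
        xl, xr = nodes[e], nodes[e + 1]
        xq = 0.5 * (xl + xr) + 0.5 * h * GAUSS_X
        wq = 0.5 * h * GAUSS_W
        uh = U[e] * (xr - xq) / h + U[e + 1] * (xq - xl) / h
        err2 += np.sum(wq * (uh - u_exact(xq)) ** 2)
    return np.sqrt(err2)

f = lambda x: np.pi**2 * np.sin(np.pi * x)
u = lambda x: np.sin(np.pi * x)
e_coarse = p1_l2_error(15, f, u)                 # h = 1/16
e_fine = p1_l2_error(31, f, u)                   # h = 1/32
observed_order = np.log2(e_coarse / e_fine)      # expected to be close to 2
```

Halving h should reduce the error by a factor close to four, which is the behaviour that assumption (iii) encodes with \(\gamma =2\).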

As a second example, we show that the present semidiscrete error analysis extends to the following multi-term time-fractional diffusion problem:

$$\begin{aligned} P(\partial _t) u + \mathcal {L}u=f(u)\; \text{ in } \Omega \times (0,T_0],\quad u(0)=u_0\; \text{ in } \Omega , \quad u=0 \; \text{ on } \partial \Omega \times (0,T_0], \end{aligned}$$
(3.10)

where the multi-term differential operator \(P(\partial _t)\) is defined by \( P(\partial _t) = \partial _t^{\alpha }+\sum _{i=1}^m b_i\partial _t^{\alpha _i} \) with \(0<\alpha _m\le \cdots \le \alpha _1\le \alpha <1\) being the orders of the fractional Caputo derivatives, and \(b_i>0\), \(i=1,\ldots ,m\). This model was derived to improve the modeling accuracy of the single-term model (1.1) for describing anomalous diffusion. An inspection of the proof of Theorem 3.1 reveals that its main arguments are based on the bounds derived for the operators \(F_h\), \(\bar{E}_h\) and \(\bar{F}_h\). Following [12], one can verify that these operators satisfy the same bounds as in the single-term case. This readily implies that the estimate (3.4) remains valid for the multi-term diffusion problem (3.10).

Remark 3.1

In the parabolic case, a singularity in time appears which has the same form as in (3.2). Hence, the estimate (3.4) remains valid when \(\alpha =1\).

Remark 3.2

For smooth initial data \(u_0\in \mathcal {D}(\mathcal {L})\), the estimate (3.4) still holds for the choice \(u_{0h}=R_hu_0\). Indeed, we have

$$\begin{aligned} E_h(t)R_h u_0 - E(t)u_0 = E_h(t)(R_h u_0 - u_0) + (E_h(t)u_0- E(t)u_0). \end{aligned}$$

By the stability of the operator \(E_h(t)\),

$$\begin{aligned} \Vert E_h(t)(R_h u_0 - u_0)\Vert \le c\Vert R_h u_0 - u_0\Vert \le ch^\gamma \Vert \mathcal {L}u_0\Vert . \end{aligned}$$

Then we reach our conclusion by following the arguments in the proof of Theorem 3.1.

4 Fully discrete schemes

This section is devoted to the analysis of a fully discrete scheme for problem (2.2) based on a convolution quadrature (CQ) generated by the backward Euler method, using the framework developed in [5, 26]. Divide the time interval \([0,T_0]\) into N equal subintervals with a time step size \(\tau =T_0/N\), and let \(t_j=j\tau \). The convolution quadrature [24] refers to an approximation of any function of the form \(k*\varphi \) as

$$\begin{aligned} (k*\varphi ) (t_n):=\int _0^{t_n}k(t_n-s)\varphi (s)\,ds\approx \sum _{j=0}^n \beta _{n-j}(\tau ) \varphi (t_j), \end{aligned}$$

where the weights \(\beta _j=\beta _j(\tau )\) are computed from the Laplace transform K(z) of k rather than the kernel k(t). With \(\partial _t\) being time differentiation, define \(K(\partial _t)\) as the operator of (distributional) convolution with the kernel k: \(K(\partial _t)\varphi =k*\varphi \) for a function \(\varphi (t)\) with suitable smoothness. Then a convolution quadrature will approximate \(K(\partial _t)\varphi \) by a discrete convolution \(K( \partial _\tau )\varphi \) at \(t=t_n\) as \( K( \partial _\tau )\varphi (t_n) = \sum _{j=0}^n \beta _{n-j}(\tau ) \varphi (t_j), \) where the quadrature weights \(\{\beta _j(\tau )\}_{j=0}^{\infty }\) are determined by the generating power series \( \sum _{j=0}^\infty \beta _j(\tau ) \xi ^j=K(\delta (\xi )/\tau ) \) with \(\delta (\xi )\) being a rational function, chosen as the quotient of the generating polynomials of a stable and consistent linear multistep method. For the backward Euler method, \(\delta (\xi )=1-\xi \).
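To make the generating-series definition of the weights concrete, here is a minimal sketch for the special kernel with Laplace transform \(K(z)=z^{-\alpha }\) (the fractional integral) and \(\delta (\xi )=1-\xi \). In that case \(\sum _j\beta _j\xi ^j=\tau ^{\alpha }(1-\xi )^{-\alpha }\), and the binomial series yields the recursion used below. The function name `be_weights` and the parameter values are illustrative assumptions.

```python
import math

def be_weights(alpha, tau, n):
    """Backward Euler CQ weights beta_j(tau), j = 0..n, for K(z) = z**(-alpha).

    The coefficients q_j of (1 - xi)**(-alpha) satisfy
    q_0 = 1 and q_j = q_{j-1} * (j - 1 + alpha) / j.
    """
    q = [1.0]
    for j in range(1, n + 1):
        q.append(q[-1] * (j - 1 + alpha) / j)
    return [tau**alpha * qj for qj in q]

alpha, tau, n = 0.6, 0.1, 80
beta = be_weights(alpha, tau, n)

# Check the truncated power series against the closed form K(delta(xi)/tau)
xi = 0.5
series = sum(b * xi**j for j, b in enumerate(beta))
closed_form = ((1.0 - xi) / tau) ** (-alpha)
```

For a general transform K(z) the weights are usually computed differently (for instance by a discrete Fourier transform of the generating function); the recursion above relies on the explicit binomial structure of this particular kernel.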

An important property of the convolution quadrature is that it preserves some relations of the continuous convolution. For instance, the associativity of convolution carries over to the convolution quadrature [5]:

$$\begin{aligned} K_2( \partial _\tau ) K_1( \partial _\tau )=K_2 K_1( \partial _\tau )\quad \text { and }\quad K_2( \partial _\tau )(k_1*\varphi )=(K_2( \partial _\tau )k_1)*\varphi . \end{aligned}$$
(4.1)

In the following lemma, we state an interesting result on the error of the convolution quadrature [25, Theorem 5.2].

Lemma 4.1

Let G(z) be analytic in the sector \(\Sigma _\theta \) and such that

$$\begin{aligned} \Vert G(z)\Vert \le c|z|^{-\mu }\quad \forall z\in \Sigma _\theta , \end{aligned}$$

for some real \(\mu \) and c. Then, for \(\varphi (t)=ct^{\sigma -1}\), the convolution quadrature based on the backward Euler method satisfies

$$\begin{aligned} \Vert G(\partial _t)\varphi (t) - G( \partial _\tau )\varphi (t)\Vert \le \left\{ \begin{array}{ll} C t^{\mu +\sigma -2} \tau , &{} \sigma \ge 1\\ C t^{\mu -1} \tau ^\sigma , &{} 0< \sigma \le 1. \end{array} \right. \end{aligned}$$
(4.2)
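A small experiment can illustrate the first-order rate in Lemma 4.1. The sketch below takes the scalar symbol \(G(z)=z^{-\alpha }\) (so \(\mu =\alpha \)) and \(\varphi \equiv 1\) (so \(\sigma =1\)), for which \(G(\partial _t)\varphi (t)=t^{\alpha }/\Gamma (1+\alpha )\). These choices, the function name and the step counts are assumptions made for illustration only.

```python
import math

def cq_fractional_integral_of_one(alpha, t_n, N):
    """Backward Euler CQ approximation of the fractional integral of
    the constant function 1 at t_n, using N steps of size tau = t_n / N."""
    tau = t_n / N
    q, total = 1.0, 1.0                 # q_0 = 1 contributes the j = n term
    for j in range(1, N + 1):
        q *= (j - 1 + alpha) / j        # coefficients of (1 - xi)**(-alpha)
        total += q
    return tau**alpha * total           # sum_{j=0}^{N} beta_{N-j} * 1

alpha, t_n = 0.5, 1.0
exact = t_n**alpha / math.gamma(1.0 + alpha)
errors = [abs(cq_fractional_integral_of_one(alpha, t_n, N) - exact)
          for N in (100, 200, 400)]
# Observed convergence orders between consecutive refinements
rates = [math.log2(errors[i] / errors[i + 1]) for i in range(2)]
```

With these choices, both branches of (4.2) predict an error proportional to \(\tau \) at fixed \(t_n\), so the observed orders should be close to one.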

Upon using the relation between the Riemann-Liouville derivative denoted by \(\partial _t^{\alpha }\) and the Caputo derivative \(^{C}\!\partial _t^{\alpha }\), the semidiscrete scheme (2.2) can be rewritten as

$$\begin{aligned} T_h\partial _t^{\alpha }(u_h -u_{0h}) +u_h = T_hf(u_h),\quad t>0,\quad u_h(0)=u_{0h}. \end{aligned}$$
(4.3)

Thus, the proposed backward Euler CQ scheme is to seek \(U_h^n\in V_h\), \(n\ge 0\), such that

$$\begin{aligned} T_h\partial _\tau ^{\alpha }(U_h^n -U_h^0) +U_h^n = T_hf(U_h^n), \quad n\ge 1,\quad U_h^0=u_{0h}. \end{aligned}$$
(4.4)

4.1 The linear case

We begin by investigating the time discretization of the inhomogeneous linear problem

$$\begin{aligned} T ^{C}\!\partial _t^{\alpha }u +u = Tf(t),\quad t>0,\quad u(0)=u_0, \end{aligned}$$
(4.5)

with a semidiscrete solution \(u_h(t)\in V_h\) satisfying

$$\begin{aligned} T_h ^{C}\!\partial _t^{\alpha }u_h +u_h = T_hf(t),\quad t>0,\quad u_h(0)=P_h u_{0}. \end{aligned}$$
(4.6)

The fully discrete solution \(U_h^n\in V_h\) is defined by

$$\begin{aligned} T_h\partial _\tau ^{\alpha }(U_h^n -U_h^0) +U_h^n = T_hf(t_n), \quad n\ge 1,\quad U_h^0=P_h u_{0}. \end{aligned}$$
(4.7)

Then we establish the following result.

Theorem 4.1

Let \(u_0\in \mathcal {D}(\mathcal {L}^\nu )\), \(\nu \in [0,1]\). Let u and \(U_h^n\) be the solutions of (4.5) and (4.7), respectively, with \(f=0\). Then, for \(t_n>0\),

$$\begin{aligned} \Vert U^n_h-{u}(t_n)\Vert \le c (\tau t_n^{\alpha \nu -1}+ h^\gamma t_n^{-\alpha (1-\nu )})\Vert \mathcal {L}^\nu u_0\Vert . \end{aligned}$$
(4.8)

Proof

We first notice that \(\Vert u_h(t_n)-{u}(t_n)\Vert \le c t_n^{-\alpha (1-\nu )}h^\gamma \Vert \mathcal {L}^\nu u_0\Vert \), which follows from Theorem 3.1 in the case \(f=0\). In order to estimate \(\Vert U^n_h-{u}_h(t_n)\Vert \), apply \(\partial _t^{-\alpha }\) and \(\partial _\tau ^{-\alpha }\) to (4.6) and (4.7), respectively, use the associativity of convolution and the property \(T_hP_h=T_h\), to deduce that

$$\begin{aligned} U^n_h-{u}_h(t_n)= - \left( G(\partial _\tau )- G(\partial _t)\right) u_{0}, \end{aligned}$$
(4.9)

where \(G(z)=(z^{-\alpha } I+T_h)^{-1}T_h\). We recall that, by (2.7), \(\Vert G(z)\Vert \le c\; \forall z\in \Sigma _\theta .\) Then, Lemma 4.1 (with \(\mu =0\), \(\sigma =1\)) and the \(L^2\)-stability of \(P_h\) yield

$$\begin{aligned} \Vert U^n_h-{u}_h(t_n)\Vert \le c \tau t_n^{-1}\Vert u_0\Vert . \end{aligned}$$
(4.10)

For \(u_0\in \mathcal {D}(\mathcal {L})\), consider first the choice \(u_h(0)=R_h u_0\). Recalling that \(R_h=T_h \mathcal {L}\), we use the identity \(G(z)=I-z^{-\alpha }(z^{-\alpha } I+T_h)^{-1}\) to get

$$\begin{aligned} U^n_h-{u}_h(t_n)= -\left( \bar{G}(\partial _\tau )- \bar{G}(\partial _t)\right) \mathcal {L}u_0, \end{aligned}$$

where \(\bar{G}(z)=z^{-\alpha }(z^{-\alpha } I+T_h)^{-1}T_h\). Since \(\Vert \bar{G}(z)\Vert \le c |z|^{-\alpha }\) \(\forall z\in \Sigma _\theta \), an application of Lemma 4.1 (with \(\mu =\alpha \), \(\sigma =1\)) yields

$$\begin{aligned} \Vert U^n_h-{u}_h(t_n)\Vert \le c \tau t_n^{\alpha -1}\Vert \mathcal {L}u_0\Vert . \end{aligned}$$
(4.11)

For the choice \(u_h(0)=P_hu_0\), we split the new error

$$\begin{aligned} G(\partial _\tau )u_0- G(\partial _t)u_0= G(\partial _\tau )(u_0-R_h u_0) + G(\partial _\tau )R_hu_0-G(\partial _t)u_0. \end{aligned}$$

By the stability of the discrete scheme, the estimate (4.11) and Remark 3.2, we deduce that \( \Vert G(\partial _\tau )u_0- G(\partial _t)u_0\Vert \le c(h^\gamma +\tau t_n^{\alpha -1})\Vert \mathcal {L}u_0\Vert . \) This shows (4.8) when \(\nu =1\). Finally, for \(\nu \in (0,1)\), the estimate (4.8) follows by interpolation. \(\square \)

Now we prove an error estimate when \(f\ne 0\) but with \(u_0=0\).

Theorem 4.2

Let u and \(U_h^n\) be the solutions of (4.5) and (4.7), respectively, with \(u_0=0\). If f satisfies \(|f(t)-f(s)|\le c|t-s|^\theta \) \(\forall t,s\in \mathbb {R}\) with some \(\theta \in (0,1)\) and \(\int _0^t (t-s)^{\alpha -1}\Vert f'(s)\Vert ds<\infty \) \(\forall t\in (0,T_0]\), then, for \(t_n>0\),

$$\begin{aligned} \Vert U^n_h-u(t_n)\Vert \le c h^\gamma +c \left( \tau t_n^{\alpha -1}\Vert f(0)\Vert +\tau \int _0^{t_n} (t_n-s)^{\alpha -1}\Vert f'(s)\Vert ds\right) . \end{aligned}$$
(4.12)

Proof

Using the assumptions on f in the proof of Theorem 3.1 we conclude that \(\Vert u_h(t)-u(t)\Vert \le c h^\gamma \). From (4.6) and (4.7), the semidiscrete and fully discrete solutions are now given by \(u_h=\bar{G}(\partial _t)f\) and \(U_h^n=\bar{G}(\partial _\tau )f\), respectively, where \(\bar{G}(z)\) is defined in the previous proof. Using the expansion \(f(t)=f(0)+(1*f')(t)\) and the second relation in (4.1), we find that

$$\begin{aligned} U^n_h-{u}_h(t_n)= ( \bar{G}(\partial _\tau )- \bar{G}(\partial _t))f(0)+(( \bar{G}(\partial _\tau )- \bar{G}(\partial _t))1)*f'(t_n)=:I+II. \end{aligned}$$

By Lemma 4.1 (with \(\mu =\alpha \) and \(\sigma =1\)), we have

$$\begin{aligned} \Vert I\Vert \le c\tau t_n^{\alpha -1}\Vert f(0)\Vert . \end{aligned}$$

For the second term, Lemma 4.1 yields

$$\begin{aligned} \Vert II\Vert \le \int _0^{t_n} \Vert (( \bar{G}(\partial _\tau )- \bar{G}(\partial _t))1)(t_n-s) f'(s)\Vert \,ds \le c\tau \int _0^{t_n} (t_n-s)^{\alpha -1} \Vert f'(s)\Vert ds, \end{aligned}$$

which completes the proof of (4.12). \(\square \)

4.2 The semilinear case

We consider now the time approximation of the nonlinear problem (2.1) and prove related error estimates. The time-stepping scheme is now defined as follows: find \(U_h^n\in V_h\), \(n\ge 0\), such that

$$\begin{aligned} T_h\partial _\tau ^{\alpha }(U_h^n -U_h^0) +U_h^n = T_hf(U_h^n), \quad n\ge 1, \quad U_h^0=P_h u_{0}. \end{aligned}$$
(4.13)

This scheme is written in an expanded form as

$$\begin{aligned} T_h(U_h^n-U_h^0)+\tau ^{\alpha }\sum _{j=0}^n q_{n-j}^{(\alpha )} U_h^j= \tau ^{\alpha }T_h \sum _{j=0}^n q_{n-j}^{(\alpha )} f(U_h^j), \end{aligned}$$

where \( q_{j}^{(\alpha )} = (-1)^{j} \left( \begin{array}{c} -\alpha \\ j \end{array}\right) ,\) see [25]. Since \(q_0^{(\alpha )}=1\), the implicit equation for \(U_h^n\) is of the form

$$\begin{aligned} (\tau ^\alpha I+ T_h)U_h^n=T_h\zeta +\eta +\tau ^\alpha T_h f(U_h^n),\quad n\ge 1, \end{aligned}$$
(4.14)

where \(\zeta ,\eta \in V_h\). The solvability of (4.14) is discussed below.
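As an illustration, the weights \(q_j^{(\alpha )}\) can be generated without evaluating binomial coefficients directly. The following Python sketch uses the two-term recurrence \(q_0^{(\alpha )}=1\), \(q_j^{(\alpha )}=q_{j-1}^{(\alpha )}(j-1+\alpha )/j\), which follows from the definition above, and compares it with the equivalent closed form \(\Gamma (j+\alpha )/(\Gamma (\alpha )\,j!)\). The function names are ours and are only meant as an example.

```python
import math


def cq_weights(alpha, n):
    """Return q_0, ..., q_n, the Taylor coefficients of (1 - xi)**(-alpha)."""
    q = [1.0]  # q_0 = 1
    for j in range(1, n + 1):
        # q_j = (-1)^j * binom(-alpha, j) = q_{j-1} * (j - 1 + alpha) / j
        q.append(q[-1] * (j - 1 + alpha) / j)
    return q


def cq_weight_closed_form(alpha, j):
    """Closed form q_j = Gamma(j + alpha) / (Gamma(alpha) * j!)."""
    return math.gamma(j + alpha) / (math.gamma(alpha) * math.factorial(j))


q = cq_weights(0.5, 10)
```

The recurrence costs O(n) operations for the first n weights and avoids the overflow that the Gamma function would cause for large j.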

Proposition 4.1

Under the restriction \(\tau ^\alpha M L<1\), the nonlinear system (4.14) has a unique solution \(U_h^n\in V_h\) for every \(n\ge 1\).

Proof

The solvability of (4.14) is equivalent to the existence of a fixed point for the mapping \(S_h:V_h \rightarrow V_h\) defined by

$$\begin{aligned} S_h(v)=(\tau ^\alpha I+ T_h)^{-1}(T_h\zeta +\eta +\tau ^\alpha T_h f(v)). \end{aligned}$$
(4.15)

Recalling that \(\Vert (\tau ^\alpha I+ T_h)^{-1}T_h\Vert \le M\), we have \(\forall v,w\in V_h\),

$$\begin{aligned} \Vert S_h(w)-S_h(v)\Vert\le & {} \tau ^\alpha \Vert (\tau ^\alpha I+ T_h)^{-1}T_h(f(v)-f(w))\Vert \\\le & {} \tau ^\alpha M \Vert f(v)-f(w)\Vert \\\le & {} \tau ^\alpha M L \Vert v-w\Vert . \end{aligned}$$

Hence, \(S_h\) is a contraction if \(\tau ^\alpha M L<1\), and therefore, (4.15) has a unique fixed point in \(V_h\) which is also the unique solution of (4.14). \(\square \)
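The contraction argument also suggests a simple way to solve (4.14) in practice. The sketch below is a minimal Picard iteration on a hypothetical three-dimensional model: a diagonal matrix stands in for \(T_h\), a fixed vector stands in for \(T_h\zeta +\eta \), and \(f(u)=\sqrt{1+u^2}\) is applied componentwise, so that \(L=1\). None of these data come from an actual FE discretization.

```python
import numpy as np


def solve_step_picard(T, rhs, f, tau, alpha, v0, tol=1e-12, max_iter=200):
    """Solve (tau^alpha I + T) U = rhs + tau^alpha T f(U) by fixed-point iteration."""
    ta = tau ** alpha
    A = ta * np.eye(T.shape[0]) + T
    v = v0.copy()
    for k in range(max_iter):
        # One application of the map S_h from (4.15)
        v_new = np.linalg.solve(A, rhs + ta * (T @ f(v)))
        if np.linalg.norm(v_new - v) <= tol:
            return v_new, k + 1
        v = v_new
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical model data (assumptions for illustration only)
T = np.diag([1.0, 0.25, 0.1])            # symmetric positive definite stand-in for T_h
f = lambda v: np.sqrt(1.0 + v ** 2)      # Lipschitz constant L = 1
rhs = np.array([0.3, -0.1, 0.2])         # stand-in for T_h zeta + eta
U, iters = solve_step_picard(T, rhs, f, tau=0.01, alpha=0.5, v0=np.zeros(3))
```

In this example \(\tau ^\alpha =0.1\), so the contraction factor is below 0.1 and only a few iterations are needed. A Newton iteration could be used instead when \(\tau ^\alpha L\) is close to one.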

To investigate the stability of the scheme (4.13), we rewrite it in the form

$$\begin{aligned} (\partial _\tau ^{-\alpha }+T_h) U_h^n = T_h U_h^0 + \partial _\tau ^{-\alpha }T_h f(U_h^n). \end{aligned}$$
(4.16)

Noting that \(U_h^n\) depends linearly and boundedly on \(U_h^0\), \( f(U_h^j)\), \(0\le j\le n\), we deduce the existence of linear and bounded operators \(P_n\) and \(R_n:V_h\rightarrow V_h\), \(n\ge 0\), such that \(U_h^n\) is represented by

$$\begin{aligned} U_h^n = P_n U_h^0 + \tau \sum _{j=0}^n R_{n-j} f(U_h^j), \end{aligned}$$
(4.17)

see [5, Section 4]. In view of (4.16), the operators \(\tau R_n\), \(n\ge 0\), are the convolution quadrature weights corresponding to the Laplace transform \(K(z)=z^{-\alpha }(z^{-\alpha }I+ T_h)^{-1}T_h\), i.e., they are the coefficients in the series expansion

$$\begin{aligned} \tau \sum _{j=0}^\infty R_j\xi ^j=K((1-\xi )/\tau ). \end{aligned}$$

Since \(\Vert K(z)\Vert \le c|z|^{-\alpha }\), an application of Lemma 3.1 in [5] with \(\mu =\alpha \), shows that there is a constant \(B>0\), independent of \(\tau \), such that

$$\begin{aligned} \Vert R_n\Vert \le B t_{n+1}^{\alpha -1},\quad n=0,1,2,\ldots . \end{aligned}$$
(4.18)
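To make the bound (4.18) concrete, consider the scalar model in which \(T_h\) is replaced by a number \(1/\lambda \), \(\lambda >0\), so that \(K(z)=1/(\lambda +z^\alpha )\). The Python sketch below computes the corresponding weights \(R_n\) by inverting the power series of \(\lambda +((1-\xi )/\tau )^\alpha \) and forms the ratios \(R_n/t_{n+1}^{\alpha -1}\). The scalar reduction, the parameter values and the function names are assumptions made for illustration only.

```python
import math


def frac_weights(alpha, n):
    """Taylor coefficients w_0, ..., w_n of (1 - xi)**alpha."""
    w = [1.0]
    for j in range(1, n + 1):
        w.append(w[-1] * (j - 1 - alpha) / j)
    return w


def scalar_R(lam, alpha, tau, n):
    """R_0..R_n with tau * sum_j R_j xi^j = 1 / (lam + ((1 - xi)/tau)**alpha)."""
    d = [wj / tau ** alpha for wj in frac_weights(alpha, n)]
    d[0] += lam
    # Reciprocal of a power series: sum_k d_k r_{m-k} = delta_{m,0}
    r = [1.0 / d[0]]
    for m in range(1, n + 1):
        s = sum(d[k] * r[m - k] for k in range(1, m + 1))
        r.append(-s / d[0])
    return [rm / tau for rm in r]


alpha, tau, lam = 0.5, 0.01, 2.0
R = scalar_R(lam, alpha, tau, 50)
# Ratios R_n / t_{n+1}^{alpha-1}; (4.18) states that they are bounded by B
ratios = [R[n] / ((n + 1) * tau) ** (alpha - 1) for n in range(len(R))]
```

For \(\lambda =0\) the weights reduce to \(R_n=\tau ^{\alpha -1}q_n^{(\alpha )}\), which gives a convenient check of the series inversion.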

Based on this bound, we show that the scheme (4.13) is stable in \(L^2(\Omega )\).

Proposition 4.2

Under the restriction \(\tau ^\alpha B L<1\), there exists a constant C independent of h and \(\tau \) such that

$$\begin{aligned} \Vert U_h^n\Vert \le C(\Vert U_h^0\Vert +\Vert f(0)\Vert ), \quad n\ge 1. \end{aligned}$$
(4.19)

Proof

Taking \(L^2(\Omega )\)-norms in (4.17) and using (4.18), we deduce that

$$\begin{aligned} \Vert U_h^n\Vert \le c\Vert U_h^0\Vert + \tau B\sum _{j=0}^n t_{n-j+1}^{\alpha -1}\Vert f(U_h^j)\Vert . \end{aligned}$$

Since \(\Vert f(U_h^j)\Vert \le L \Vert U_h^j\Vert +\Vert f(0)\Vert \),

$$\begin{aligned} \Vert U_h^n\Vert \le c\Vert U_h^0\Vert + \frac{B}{\alpha }T_0^\alpha \Vert f(0)\Vert + \tau B L\sum _{j=0}^n t_{n-j+1}^{\alpha -1}\Vert U_h^j\Vert . \end{aligned}$$

With the assumption that \(\tau ^{\alpha } B L<1\), a generalized discrete Gronwall’s lemma (see, e.g., Theorem 6.1 in [9]) readily implies (4.19). \(\square \)

Now we are ready to prove the main result of this section.

Theorem 4.3

Let \(u_0\in \mathcal {D}(\mathcal {L}^\nu )\), \(\nu \in (0,1]\). Let u be the solution of (2.1). Then there exists \(\tau _0>0\) such that, for \(0<\tau <\tau _0\), the numerical solution \(U_h^n\) given by (4.13) is uniquely defined and satisfies

$$\begin{aligned} \Vert U_h^n-u(t_n)\Vert \le c(\tau t_n^{\alpha \nu -1}+h^\gamma t_n^{\alpha (\nu -1)}),\quad t_n>0. \end{aligned}$$
(4.20)

Proof

Select \(\tau _0\) such that \(\tau _0^\alpha ML<1\). Then, for \(0<\tau <\tau _0\), the discrete solution \(U_h^n\in V_h\) is well defined. Let \(v_h^n\), \(n\ge 0\), be the intermediate discrete solution defined by

$$\begin{aligned} T_h \partial _\tau ^{\alpha }(v_h^n-v_h^0)+v_h^n=T_hf(u(t_n)),\quad n\ge 1, \quad v_h^0=U_h^0. \end{aligned}$$
(4.21)

Then (4.21) can be viewed as a full discretization of (2.1) with a given right-hand side function f(u). Hence, by applying Theorems 4.1 and 4.2, and using the bound \(\Vert \partial _t u(s)\Vert \le cs^{\alpha \nu -1}\), we deduce that

$$\begin{aligned} \Vert u(t_n)-v_h^n\Vert \le {}&c\tau t_n^{\alpha \nu -1}+c h^\gamma t_n^{\alpha (\nu -1)}+ch^\gamma +c\tau t_n^{\alpha -1}\Vert f(u(0))\Vert \nonumber \\&+c\tau \int _0^{t_n}(t_n-s)^{\alpha -1}\Vert \partial _t f(x,s,u(s))+\partial _u f(x,s,u(s))\partial _t u(s)\Vert \,ds\nonumber \\ \le {}&c(\tau t_n^{\alpha \nu -1}+h^\gamma t_n^{\alpha (\nu -1)}+h^\gamma +\tau t_n^{\alpha -1}+\tau t_n^{\alpha +\alpha \nu -1})\nonumber \\ \le {}&c(\tau t_n^{\alpha \nu -1}+h^\gamma t_n^{\alpha (\nu -1)}). \end{aligned}$$
(4.22)

On the other hand, expressing \(v_h^n\) in terms of the data, through the discrete operators \(P_n\) and \(R_n\), as in (4.17), we obtain

$$\begin{aligned} v_h^n = P_n U_h^0 + \tau \sum _{j=0}^n R_{n-j} f(u(t_j)). \end{aligned}$$
(4.23)

Thus, in view of (4.17) and (4.23), we have for \(0<t_n\le T_0\),

$$\begin{aligned} U_h^n-u(t_n)= & {} U_h^n-v_h^n+v_h^n-u(t_n) \\= & {} v_h^n-u(t_n) +\tau \sum _{j=0}^n R_{n-j} (f(U_h^j)-f(u(t_j))). \end{aligned}$$

By (1.2) and the estimate in (4.18) for \(R_n\), we obtain

$$\begin{aligned} \Vert U_h^n-u(t_n)\Vert \le \Vert v_h^n-u(t_n)\Vert + \tau LB \sum _{j=1}^n t_{n-j+1}^{\alpha -1} \Vert U_h^j-u(t_j)\Vert + \tau LB t_{n+1}^{\alpha -1} \Vert P_hu_0-u_0\Vert . \end{aligned}$$

Using the estimate (4.22) and making the additional assumption \(\tau _0^\alpha LB<1\), we now apply a generalized Gronwall’s inequality, see Lemma 5.1 in [1], to finally derive (4.20). \(\square \)

5 Galerkin type approximations

In this section, we apply our analysis to approximate the solution of (1.1) by general Galerkin type methods and derive optimal \(L^2(\Omega )\)-error estimates for cases with smooth and nonsmooth initial data. For a general setting, we assume that we are given a bilinear form \(a_h:V_h\times V_h\rightarrow \mathbb {R}\) which has the following property:

Property A. \(a_h(\cdot ,\cdot )\) is symmetric positive definite, and the discrete problem

$$\begin{aligned} a_h(T_hf, \chi )=(f,\chi )\quad \forall \chi \in V_h \end{aligned}$$
(5.1)

defines a linear operator \(T_h:L^2(\Omega )\rightarrow V_h\) satisfying the estimate (2.3).

The solution of the continuous problem (1.1) will be approximated through the semidiscrete problem: find \(u_h(t)\in V_h\) such that

$$\begin{aligned} (^{C}\!\partial _t^{\alpha }u_h,\chi )+ a_h(u_h,\chi )= (f(u_h),\chi ) \quad \forall \chi \in V_h, \quad t>0, \end{aligned}$$
(5.2)

with \(u_h(0)=P_hu_0\). We recall that \(P_h:L^2(\Omega )\rightarrow V_h\) is the \(L^2\)-projection onto \(V_h\). Next we define the fully discrete scheme based on the backward Euler CQ method as follows: with \(U_h^0=P_hu_0\), find \(U_h^n\in V_h\), \(n\ge 1\), such that

$$\begin{aligned} (\partial _\tau ^{\alpha }U_h^n,\chi )+ a_h(U_h^n,\chi )= (\partial _\tau ^{\alpha }U_h^0,\chi )+ (f(U_h^n),\chi )\quad \forall \chi \in V_h. \end{aligned}$$
(5.3)

Then we have the following result.

Theorem 5.1

Let u be the solution of problem (1.1) with \(u_0\in \mathcal {D}(\mathcal {L}^\nu )\), \(\nu \in (0,1]\). Let \(U^n_h\) be the solution of problem (5.3) with \(U_h^0=P_hu_0\). Assume that \(a_h(\cdot ,\cdot )\) satisfies Property A. Then there holds:

$$\begin{aligned} \Vert U^n_h-{u}(t_n)\Vert \le c (\tau t_n^{\alpha \nu -1} + h^\gamma t_n^{\alpha (\nu -1)}),\quad t_n>0, \end{aligned}$$
(5.4)

where c is independent of h and \(\tau \).

Proof

Since \(a_h(\cdot ,\cdot )\) is symmetric, the operator \(T_h\) is selfadjoint and positive semidefinite on \(L^2(\Omega )\): for all \(f,g \in L^2(\Omega )\)

$$\begin{aligned} (f,T_hg)=a_h(T_hf,T_hg)=(T_hf,g) \quad \text{ and } \quad (f,T_hf)=a_h(T_hf,T_hf)\ge 0. \end{aligned}$$

If \(f\in V_h\) and \(T_hf=0\), then (5.1) implies that \(f=0\), that is, \(T_h\) is positive definite on \(V_h\). Further, from (5.1), we have \(T_h=T_hP_h\). Hence, \(T_h\) satisfies all the conditions stated in Sect. 2. In view of (5.1), the fully discrete scheme (5.3) is equivalently written as

$$\begin{aligned} T_h \partial _\tau ^{\alpha }(U_h^n-U_h^0)+U_h^n= T_hf(U_h^n), \quad n\ge 1,\quad U_h^0=P_hu_0. \end{aligned}$$
(5.5)

Thus, the desired estimate is a direct consequence of Theorem 4.3. \(\square \)
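For illustration, the scheme (5.5) can be tried on a scalar model in which \(T_h\) is replaced by \(1/\lambda \), so that the equation becomes \(\partial _\tau ^{\alpha }(U^n-U^0)+\lambda U^n=f(U^n)\). The Python sketch below is a minimal implementation under this assumption, with the nonlinear equation at each step solved by Picard iteration. It uses the weights of \((1-\xi )^{\alpha }\) for \(\partial _\tau ^{\alpha }\). For \(f=0\) and \(\alpha =1/2\) the exact solution is \(u(t)=e^{\lambda ^2t}\,\mathrm{erfc}(\lambda \sqrt{t})\,u_0\), which allows a simple accuracy check. The function names and parameter values are ours.

```python
import math


def frac_weights(alpha, n):
    """Taylor coefficients w_0, ..., w_n of (1 - xi)**alpha."""
    w = [1.0]
    for j in range(1, n + 1):
        w.append(w[-1] * (j - 1 - alpha) / j)
    return w


def be_cq_scalar(lam, f, u0, alpha, tau, n_steps, tol=1e-13, max_iter=100):
    """Backward Euler CQ for D_t^alpha u + lam*u = f(u), u(0) = u0 (scalar model)."""
    w = frac_weights(alpha, n_steps)
    ta = tau ** (-alpha)
    U = [u0]
    for n in range(1, n_steps + 1):
        # History part of tau^{-alpha} * sum_j w_{n-j} (U^j - U^0); the j = 0 term vanishes
        hist = ta * sum(w[n - j] * (U[j] - u0) for j in range(1, n))
        v = U[-1]
        for _ in range(max_iter):
            # Picard step for (tau^{-alpha} + lam) U^n = tau^{-alpha} U^0 - hist + f(U^n)
            v_new = (ta * u0 - hist + f(v)) / (ta + lam)
            if abs(v_new - v) <= tol:
                break
            v = v_new
        U.append(v_new)
    return U


# Linear test case (f = 0, alpha = 1/2) with known exact solution at t = 1
exact = math.exp(1.0) * math.erfc(1.0)
err_coarse = abs(be_cq_scalar(1.0, lambda v: 0.0, 1.0, 0.5, 1 / 50, 50)[-1] - exact)
err_fine = abs(be_cq_scalar(1.0, lambda v: 0.0, 1.0, 0.5, 1 / 100, 100)[-1] - exact)

# Semilinear example with f(u) = sqrt(1 + u^2)
U_semilinear = be_cq_scalar(1.0, lambda v: math.sqrt(1.0 + v * v), 1.0, 0.5, 1 / 100, 10)
```

The direct evaluation of the history sum costs \(O(n)\) per step. In an FE setting the scalar division is replaced by a linear solve with the matrix representing \(\tau ^{-\alpha }I+T_h^{-1}\).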

We notice that Property A is quite standard and holds for a large class of Galerkin type approximation methods, including FE and spectral methods. For the spectral methods, one may choose the notation \(V_N\) instead of \(V_h\) and replace the estimate (2.3) by \(\Vert T_hf-Tf\Vert \le cN^{-\gamma }\Vert f\Vert \; \forall f\in L^2(\Omega ).\) Further, since we are not imposing restrictions on the space \(V_h\), our analysis applies to nonconforming space approximations, such as the early method by Nitsche [30] and the method by Crouzeix and Raviart [4]. Recent examples of nonconforming methods include discontinuous Galerkin (DG) FE methods. An interesting case is the symmetric DG interior penalty method, see [2]. Here the discrete bilinear form \(a_h\) for the Laplacian operator is given by

$$\begin{aligned} a_h(u,v):= & {} \sum _{K\in \mathcal T_h}\int _K c^2\nabla u\cdot \nabla v\,dx- \sum _{F\in {\mathcal F}_h}\int _F [[u]]\cdot \{\{c^2\nabla v\}\}\,ds\nonumber \\&-\sum _{F\in {\mathcal F}_h}\int _F [[v]]\cdot \{\{c^2\nabla u\}\}\,ds +\sum _{F\in {\mathcal F}_h}\rho h_F^{-1}\int _F c^2[[v]]\cdot [[u]]\,ds, \end{aligned}$$
(5.6)

where \(\mathcal {T}_h=\{K\}\) is a regular partition of \(\bar{\Omega }\) and \(\mathcal {F}_h=\{F\}\) is the set of all interior and boundary edges or faces of \(\mathcal {T}_h\). The last three terms in (5.6) correspond to jump and flux terms at element boundaries, with \(h_F\) denoting the diameter of the edge or the face F, see [2] for more details. The parameter \(\rho >0\) is the interior penalty stabilization parameter that has to be chosen sufficiently large, independent of the mesh size. The bilinear form \(a_h\) is clearly symmetric and satisfies Property A with \(\gamma =2\). Our analysis extends to other symmetric spatial DG methods as long as Property A is satisfied.

6 Mixed FE methods

We consider now the mixed form of the problem (1.1) and establish a priori error estimates for smooth and nonsmooth initial data. For the sake of simplicity, we choose \(\mathcal {L}=-\Delta \) and \(\Omega \subset \mathbb {R}^2\). One notable advantage of the mixed FEM is that it approximates the solution u and its gradient simultaneously, resulting in a high convergence rate for the gradient.

By introducing the new variable \({\varvec{\sigma }}=\nabla u\), the problem can be formulated as

$$\begin{aligned} ^{C}\!\partial _t^{\alpha }u- \nabla \cdot {\varvec{\sigma }}=f(u), \qquad {\varvec{\sigma }}=\nabla u, \qquad u=0 \; \text{ on } \partial \Omega , \end{aligned}$$

with \(u(0)=u_0\). Let \(H(div;\Omega )= \{{\varvec{\sigma }}\in (L^2(\Omega ))^2:\nabla \cdot {\varvec{\sigma }}\in L^2(\Omega ) \}\) be a Hilbert space equipped with norm \(\Vert {\varvec{\sigma }}\Vert _{\mathbf{W}} =(\Vert {\varvec{\sigma }}\Vert ^2+\Vert \nabla \cdot {\varvec{\sigma }}\Vert ^2)^{\frac{1}{2}}\). Then, with \(V=L^2(\Omega )\) and \(\mathbf{W}= H(div;\Omega )\), the weak mixed formulation of (1.1) is defined as follows: find \((u,{\varvec{\sigma }}):(0,T_0]\rightarrow V\times \mathbf{W}\) such that

$$\begin{aligned} (^{C}\!\partial _t^{\alpha }u, \chi )- (\nabla \cdot {\varvec{\sigma }}, \chi )= & {} (f(u),\chi ) \;\;\;\forall \chi \in V,\end{aligned}$$
(6.1)
$$\begin{aligned} ({\varvec{\sigma }}, \mathbf{w}) + (u,\nabla \cdot \mathbf{w})= & {} 0\ \;\;\;\forall \mathbf{w}\in \mathbf{W}, \end{aligned}$$
(6.2)

with \(u(0)=u_0\). Note that the boundary condition \(u=0\) on \(\partial \Omega \) is implicitly contained in (6.2). By Green’s formula, we formally obtain \({\varvec{\sigma }}= \nabla u\) in \(\Omega \) and \(u=0\) on \(\partial \Omega \).

For the mixed form of problem (1.1), a few numerical studies are available, dealing only with the linear case. In [8], the authors investigated a hybridizable DG method for the space discretization. In [33], a non-standard mixed FE method was proposed and analysed. Another related analysis of a mixed method applied to the time-fractional Navier-Stokes equations was presented in [23]. The convergence analyses in all these studies require high regularity assumptions on the exact solution, which are not reasonable in general. In the recent work [17], we investigated a mixed FE method for (1.1) with \(f=0\) and derived optimal error estimates for semidiscrete schemes with smooth and nonsmooth initial data. The estimates extend the results obtained for the standard linear parabolic problem [16]. In the present analysis, we shall avoid energy arguments as employed in [17] due to the weak regularity of the solution.

For the FE approximation of problem (6.1)–(6.2), let \({\mathcal T}_h\) be a shape regular and quasi-uniform partition of the polygonal convex domain \(\bar{\Omega }\) into triangles K of diameter \(h_K\). Further, let \(V_h\) and \(\mathbf{W}_h\) be the Raviart–Thomas FE spaces [31] of index \(\ell \ge 0\) given respectively by

$$\begin{aligned} V_h=\{ w\in L^2(\Omega ):\;w|_{K}\in P_{\ell }(K) \;\forall K\in {\mathcal T}_h\} \end{aligned}$$

and

$$\begin{aligned} \mathbf{W}_h=\{ \mathbf{v} \in H(div;\Omega ):\;\mathbf{v}|_{K}\in RT_{\ell }(K) \;\forall K\in {\mathcal T}_h\}, \end{aligned}$$

where \(RT_{\ell }(K)=(P_{\ell }(K))^2+{x}P_{\ell }(K)\), \({\ell }\ge 0\). Other examples of mixed FE spaces may also be considered. We shall restrict our analysis to the low order cases \(\ell =0,1,\) as high order Raviart–Thomas elements are not attractive due to the limited smoothing property of the time-fractional model, see [17].

For \((u,{\varvec{\sigma }})\in V\times \mathbf{W}\), we define the intermediate mixed projection as the pair \((\tilde{u}_h,\tilde{{\varvec{\sigma }}}_h)\in V_h\times \mathbf{W}_h\) satisfying

$$\begin{aligned} (\nabla \cdot ({\varvec{\sigma }}-\tilde{{\varvec{\sigma }}}_h), \chi _h)= & {} 0 \;\;\;\forall \chi _h \in V_h,\end{aligned}$$
(6.3)
$$\begin{aligned} ( ({\varvec{\sigma }}-\tilde{{\varvec{\sigma }}}_h), \mathbf{w}_h) + (u-\tilde{u}_h,\nabla \cdot \mathbf{w}_h)= & {} 0 \;\;\;\forall \mathbf{w}_h \in \mathbf{W}_h. \end{aligned}$$
(6.4)

Then, with \((u,{\varvec{\sigma }})=(u,\nabla u)\), the following estimates hold [16, Theorem 1.1]:

$$\begin{aligned} \Vert u-\tilde{u}_h\Vert \le C h^{1+\ell }\Vert \mathcal {L}u\Vert ,\quad \Vert {\varvec{\sigma }}-\tilde{{\varvec{\sigma }}}_h\Vert \le Ch\Vert \mathcal {L}u\Vert ,\quad \ell =0,1. \end{aligned}$$
(6.5)

For a given function \(f\in L^2(\Omega )\), let \((u_h,{\varvec{\sigma }}_h)\in V_h\times \mathbf{W}_h\) be the unique solution of the mixed elliptic problem

$$\begin{aligned} - (\nabla \cdot {\varvec{\sigma }}_h, \chi _h)= & {} (f,\chi _h) \;\;\;\forall \chi _h \in V_h,\end{aligned}$$
(6.6)
$$\begin{aligned} ({\varvec{\sigma }}_h, \mathbf{w}_h) + (u_h,\nabla \cdot \mathbf{w}_h)= & {} 0 \;\;\; \forall \mathbf{w}_h \in \mathbf{W}_h. \end{aligned}$$
(6.7)

Then, we define a pair of operators \((T_h,S_h):L^2(\Omega )\rightarrow V_h\times \mathbf{W}_h\) as \(T_hf=u_h\) and \(S_hf={\varvec{\sigma }}_h\). With \(T:L^2(\Omega ) \rightarrow \mathcal {D}(\mathcal {L})\) being the inverse of the operator \(\mathcal {L}\), the following result holds [16, Lemma 1.5].

Lemma 6.1

The operator \(T_h:L^2(\Omega )\rightarrow V_h\) defined by \(T_hf=u_h\) is selfadjoint, positive semidefinite on \(L^2(\Omega )\) and positive definite on \(V_h\). Further, we have

$$\begin{aligned} \Vert T_hf-Tf\Vert \le ch^{1+\ell }\Vert f\Vert ,\quad \ell =0,1. \end{aligned}$$

6.1 Semidiscrete scheme

The semidiscrete mixed FE scheme is to seek a pair \((u_h,{\varvec{\sigma }}_h):(0,T_0]\rightarrow V_h\times \mathbf{W}_h\) such that

$$\begin{aligned} (^{C}\!\partial _t^{\alpha }u_h, \chi _h)- (\nabla \cdot {\varvec{\sigma }}_h, \chi _h)= & {} (f(u_h), \chi _h) \;\;\;\forall \chi _h \in V_h,\end{aligned}$$
(6.8)
$$\begin{aligned} ({\varvec{\sigma }}_h, \mathbf{w}_h) + (u_h,\nabla \cdot \mathbf{w}_h)= & {} 0 \;\;\;\forall \mathbf{w}_h \in \mathbf{W}_h, \end{aligned}$$
(6.9)

with \(u_h(0)=P_hu_0\). Since \(V_h\) and \(\mathbf{W}_h\) are finite-dimensional, we can eliminate \({\varvec{\sigma }}_h\) in the discrete level using (6.9) by writing it in terms of \(u_h\). Therefore, substituting in (6.8), we obtain a system of time-fractional ODEs. Existence and uniqueness can be shown using standard results from fractional ODE theory [20].
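The elimination of \({\varvec{\sigma }}_h\) can be made explicit at the algebraic level. Writing \(M\) for the mass matrix of \(\mathbf{W}_h\) and \(B\) for the matrix of the divergence form \((\nabla \cdot \mathbf{w}_h,\chi _h)\), equation (6.9) reads \(Ms+B^Tu=0\), so that \(s=-M^{-1}B^Tu\) and the flux term in (6.8) becomes \(BM^{-1}B^Tu\). The Python sketch below illustrates this with small hypothetical matrices, chosen only so that \(M\) is symmetric positive definite and \(B\) has full row rank; they are not assembled from a mesh. It also checks the result against a direct solve of the saddle-point system (6.6)–(6.7).

```python
import numpy as np

# Hypothetical small blocks standing in for the mixed FE matrices
M = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])       # (psi_j, psi_i): symmetric positive definite
B = np.eye(3, 5) + 0.5 * np.eye(3, 5, k=1)   # (div psi_i, chi_k): full row rank


def eliminate_flux(M, B):
    """Return S = B M^{-1} B^T, obtained from sigma = -M^{-1} B^T u."""
    return B @ np.linalg.solve(M, B.T)


def solve_mixed_elliptic(M, B, F):
    """Solve the saddle-point system  M s + B^T u = 0,  -B s = F."""
    m, n = B.shape
    K = np.block([[M, B.T], [-B, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([np.zeros(n), F]))
    return sol[n:], sol[:n]  # (u, s)


S = eliminate_flux(M, B)
F = np.array([1.0, 2.0, 3.0])
u, s = solve_mixed_elliptic(M, B, F)
```

With the mass matrix \(C\) of \(V_h\), the semidiscrete problem then takes the form of a fractional ODE system \(C\,{}^{C}\!\partial _t^{\alpha }u+Su=F(u)\), in which \(S\) is symmetric positive definite.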

For the error analysis, define \(e_u=u_h-u\) and \(e_{\varvec{\sigma }}={\varvec{\sigma }}_h-{\varvec{\sigma }}\). Then, from (6.1)–(6.2) and (6.8)–(6.9), \(e_u\) and \(e_{\varvec{\sigma }}\) satisfy

$$\begin{aligned} (^{C}\!\partial _t^{\alpha }e_u, \chi _h)- (\nabla \cdot e_{\varvec{\sigma }}, \chi _h)= & {} (f(u_h)-f(u),\chi _h) \;\;\;\forall \chi _h \in V_h,\end{aligned}$$
(6.10)
$$\begin{aligned} ( e_{\varvec{\sigma }}, \mathbf{w}_h) + (e_u,\nabla \cdot \mathbf{w}_h)= & {} 0 \;\;\;\forall \mathbf{w}_h \in \mathbf{W}_h. \end{aligned}$$
(6.11)

Using the projections \((\tilde{u}_h,\tilde{{\varvec{\sigma }}}_h)\), we split the errors \(e_u=(u_h- \tilde{u}_h)-(u-\tilde{u}_h)=:\theta -\rho ,\) and \(e_{\varvec{\sigma }}=({\varvec{\sigma }}_h- \tilde{{\varvec{\sigma }}}_h)-({\varvec{\sigma }}- \tilde{{\varvec{\sigma }}}_h)=:{\varvec{\xi }}-{\varvec{\zeta }}.\) From (6.10)–(6.11) and the definition (6.3)–(6.4) of the projection, we then obtain

$$\begin{aligned} (^{C}\!\partial _t^{\alpha }e_u, \chi _h)- (\nabla \cdot {\varvec{\xi }}, \chi _h)= & {} (f(u_h)-f(u),\chi _h) \;\;\;\forall \chi _h \in V_h,\end{aligned}$$
(6.12)
$$\begin{aligned} ({\varvec{\xi }}, \mathbf{w}_h) + (\theta ,\nabla \cdot \mathbf{w}_h )= & {} 0 \;\;\;\forall \mathbf{w}_h \in \mathbf{W}_h. \end{aligned}$$
(6.13)

Next we state our results for the semidiscrete problem.

Theorem 6.1

Let \((u,{\varvec{\sigma }})\) and \((u_h,{\varvec{\sigma }}_h)\) be the solutions of (6.1)–(6.2) and (6.8)–(6.9), respectively, with \(u_h(0)=P_hu_0\). Then, for \(u_0 \in \mathcal {D}(\mathcal {L}^\nu )\), \(\nu \in [0,1]\), the following error estimates hold for \(t>0\):

$$\begin{aligned} \Vert u_h(t)-u(t)\Vert \le c h^{1+\ell } {\ell _h(\nu )}t^{-\alpha (1-\nu )},\quad \ell =0,1, \end{aligned}$$
(6.14)

and

$$\begin{aligned} \Vert {\varvec{\sigma }}_h(t)-{\varvec{\sigma }}(t)\Vert \le c h {\ell _h(\nu )}t^{-\alpha (1-\nu )},\quad \ell =1. \end{aligned}$$
(6.15)

Proof

From the definition of the operator \(T_h\) above, the semidiscrete problem may also be written as

$$\begin{aligned} T_h^{C}\!\partial _t^{\alpha }u_h+u_h=T_hf(u_h), \quad t>0,\quad u_h(0)=P_hu_0. \end{aligned}$$
(6.16)

Note that \(T_h\) satisfies the properties in Lemma 6.1 and \(T_hP_h=T_h\). Recalling the definition of the continuous operator T, see (2.1), the estimate (6.14) follows immediately from Theorem 3.1. Now, taking \(\mathbf{w}_h={\varvec{\xi }}\) in (6.13) and using the standard inverse inequality \(\Vert \nabla \cdot {\varvec{\xi }}\Vert \le ch^{-1}\Vert {\varvec{\xi }}\Vert \) \(\forall {\varvec{\xi }}\in \mathbf{W}_h\), we have

$$\begin{aligned} \Vert {\varvec{\xi }}\Vert ^2\le \Vert \theta \Vert \,\Vert \nabla \cdot {\varvec{\xi }}\Vert \le ch^{-1}\Vert \theta \Vert \,\Vert {\varvec{\xi }}\Vert . \end{aligned}$$

Further, \(\Vert \theta \Vert \le \Vert e_u\Vert +\Vert \rho \Vert \le c h^2 {\ell _h(\nu )}t^{-\alpha (1-\nu )}\) by (6.5) and (6.14), so that \( \Vert {\varvec{\xi }}(t)\Vert \le c h {\ell _h(\nu )}t^{-\alpha (1-\nu )}. \) Together with \(\Vert {\varvec{\zeta }}(t)\Vert \le c h \Vert \mathcal {L}u(t)\Vert \le c h t^{-\alpha (1-\nu )}\Vert u_0\Vert ,\) this shows (6.15). \(\square \)

Remark 6.1

Avoiding the inverse inequality in the estimation of the flux variable \({\varvec{\sigma }}\) seems to be challenging as estimates for the first and second time derivatives of u are not available, even in the \(H^{1}(\Omega )\)-norm.

Remark 6.2

The following estimate holds in the stronger \(L^\infty (\Omega )\)-norm:

$$\begin{aligned} \Vert u_h(t)-u(t)\Vert _{L^\infty (\Omega )}\le c h |\ln h|t^{-\alpha (1-\nu )},\quad t>0, \quad \ell =1. \end{aligned}$$
(6.17)

Indeed, from (6.13), we may get \(\Vert \theta (t)\Vert _{L^\infty (\Omega )}\le C|\ln h|\,\Vert {\varvec{\xi }}(t)\Vert \), see [16, Lemma 1.2]. Noting also that \(\Vert {\varvec{\zeta }}(t)\Vert \le c h|\ln h| \Vert \mathcal {L}u(t)\Vert \), see [16, Theorem 1.1], we obtain (6.17).

Remark 6.3

For the linear problem with \(f=0\) and \(u_0 \in \mathcal {D}(\mathcal {L}^\nu )\), \(\nu \in [1/2,1]\), the solution \(u(t)\in H^3(\Omega )\) for \(t>0\). As a consequence, when \(\ell =1\), approximations of order \(O(h^2)\) are achieved for both variables u and \({\varvec{\sigma }}\), see [17, Theorem 5.4].

6.2 Fully discrete scheme

The fully discrete mixed FE scheme based on the backward Euler CQ method is to find a pair \((U_h^n,\Sigma _h^n)\in V_h\times \mathbf{W}_h\) such that for \(n\ge 1\),

$$\begin{aligned} (\partial _\tau ^{\alpha }U_h^n, \chi _h)- (\nabla \cdot \Sigma _h^n, \chi _h)= & {} (\partial _\tau ^{\alpha }U_h^0,\chi _h)+ (f(U_h^n),\chi _h)\;\;\;\forall \chi _h \in V_h,\end{aligned}$$
(6.18)
$$\begin{aligned} (\Sigma _h^n, \mathbf{w}_h) + (U_h^n,\nabla \cdot \mathbf{w}_h)= & {} 0 \;\;\;\forall \mathbf{w}_h \in \mathbf{W}_h, \end{aligned}$$
(6.19)

with \(U_h^0=P_hu_0\). We now prove the following result.

Theorem 6.2

Let \((u,{\varvec{\sigma }})\) be the solution of (6.1)–(6.2). Let \((U_h^n,\Sigma _h^n)\) be the solution of (6.18)–(6.19) with \(U_h^0=P_hu_0\). Then, for \(u_0 \in \mathcal {D}(\mathcal {L}^\nu )\), \(\nu \in (0,1]\), the following error estimate holds for \(t_n>0\):

$$\begin{aligned} \Vert U_h^n-u(t_n)\Vert \le c (\tau t_n^{\alpha \nu -1} + h^{1+\ell }t_n^{-\alpha (1-\nu )}),\quad \ell =0,1, \end{aligned}$$
(6.20)

where c is independent of h and \(\tau \).

Proof

In view of Lemma 6.1, the fully discrete problem is rewritten as

$$\begin{aligned} T_h\partial _\tau ^{\alpha }U_h^n+U_h^n= T_h\partial _\tau ^{\alpha }U_h^0+T_hf(U_h^n), \quad n\ge 1,\quad U_h^0=P_hu_0. \end{aligned}$$
(6.21)

Recalling again the definition of the continuous operator T, the estimate (6.20) follows immediately from Theorem 4.3. \(\square \)

We conclude this section by showing error estimates for the linear problem with \(f=0\). The results are intended to complete the semidiscrete mixed FE error analysis presented in [17].

Theorem 6.3

Let \((u,{\varvec{\sigma }})\) be the solution of (6.1)–(6.2) with \(f=0\). Let \((U_h^n,\Sigma _h^n)\) be the solution of (6.18)–(6.19) with \(f=0\) and \(U_h^0=P_hu_0\). Then, for \(u_0 \in \mathcal {D}(\mathcal {L}^\nu )\), \(\nu \in [0,1]\), the following error estimate holds for \(t_n>0\):

$$\begin{aligned} \Vert U_h^n-u(t_n)\Vert \le c (\tau t_n^{\alpha \nu -1} + h^{1+\ell } t_n^{-\alpha (1-\nu )})\Vert \mathcal {L}^\nu u_0\Vert ,\quad \ell =0,1. \end{aligned}$$
(6.22)

Furthermore, in the case \(\nu =0\),

$$\begin{aligned} \Vert \Sigma _h^n-{\varvec{\sigma }}(t_n)\Vert \le c (\tau t_n^{-\alpha /2-1}+ht_n^{-\alpha }) \Vert u_0\Vert ,\quad \ell =1, \end{aligned}$$
(6.23)

where c is independent of h and \(\tau \).

Proof

The first estimate is given in Theorem 4.1. To derive (6.23), we use (6.8)–(6.9) and (6.18)–(6.19) so that, with \(u_h^n=u_h(t_n)\) and \({\varvec{\sigma }}_h^n={\varvec{\sigma }}_h(t_n)\), we have

$$\begin{aligned} (\partial _\tau ^{\alpha }(U_h^n-u_h^n), \chi _h)- (\nabla \cdot (\Sigma _h^n-{\varvec{\sigma }}_h^n), \chi _h)= & {} -((\partial _\tau ^{\alpha }-\partial _t^{\alpha }) (u_h^n-u_0),\chi _h) \;\;\;\forall \chi _h \in V_h,\\ (\Sigma _h^n-{\varvec{\sigma }}_h^n, \mathbf{w}_h) + (U_h^n-u_h^n,\nabla \cdot \mathbf{w}_h)= & {} 0 \;\;\;\forall \mathbf{w}_h \in \mathbf{W}_h. \end{aligned}$$

Choose \(\chi _h=U_h^n-u_h^n\) and \(\mathbf{w}_h=\Sigma _h^n-{\varvec{\sigma }}_h^n\) so that

$$\begin{aligned} \Vert \Sigma _h^n-{\varvec{\sigma }}_h^n\Vert ^2+ (\partial _\tau ^{\alpha }(U^n_h-u_h^n),U^n_h-u_h^n)= -((\partial _\tau ^{\alpha }-\partial _t^{\alpha })(u_h^n-u_0),U^n_h-u_h^n). \end{aligned}$$
(6.24)

From (6.16) and (6.21), we see that \(U_h^n-u_h^n=G(\partial _\tau )u_0-G(\partial _t)u_0\), where \(G(z)=(z^{-\alpha }I+T_h)^{-1}T_h\). Hence,

$$\begin{aligned} \partial _\tau ^{\alpha }(U^n_h-u_h^n)= & {} (\partial _\tau ^{\alpha }U^n_h - \partial _t^{\alpha }u^n_h) - (\partial _\tau ^{\alpha }-\partial _t^{\alpha }) u^n_h \\= & {} (\tilde{G}(\partial _\tau )- \tilde{G}(\partial _t))u_0 - (\partial _\tau ^{\alpha }-\partial _t^{\alpha }) u^n_h, \end{aligned}$$

where \(\widetilde{G}(z)=z^\alpha G(z)\). Inserting this result in (6.24), we get

$$\begin{aligned} \Vert \Sigma _h^n-{\varvec{\sigma }}_h^n\Vert ^2+ ((\widetilde{G}(\partial _\tau )- \widetilde{G}(\partial _t))u_0,U^n_h-u_h^n)= ((\partial _\tau ^{\alpha }-\partial _t^{\alpha })u_0,U^n_h-u_h^n), \end{aligned}$$
(6.25)

and therefore

$$\begin{aligned} \Vert \Sigma _h^n-{\varvec{\sigma }}_h^n\Vert ^2 \le \left( \Vert (\widetilde{G}(\partial _\tau )- \widetilde{G}(\partial _t))u_0 \Vert +\Vert (\partial _\tau ^{\alpha }-\partial _t^{\alpha })u_0\Vert \right) \Vert U^n_h-u_h^n \Vert . \end{aligned}$$

Applying Lemma 4.1 (with \(\mu =-\alpha \), \(\sigma =1\)), we obtain \(\Vert (\partial _\tau ^{\alpha }-\partial _t^{\alpha })u_0\Vert \le c\tau t_n^{-\alpha -1}\Vert u_0\Vert \). Similarly, \(\Vert \widetilde{G}(z)\Vert \le c|z|^{\alpha }\) \(\forall z\in \Sigma _\theta \) implies that \(\Vert (\widetilde{G}(\partial _\tau )- \widetilde{G}(\partial _t))u_0\Vert \le c\tau t_n^{-\alpha -1}\Vert u_0\Vert \). Since \(\Vert G(z)\Vert \le c\), Lemma 4.1 (with \(\mu =0\), \(\sigma =1\)) also gives \(\Vert U_h^n-u_h^n\Vert \le c\tau t_n^{-1}\Vert u_0\Vert \), and hence \(\Vert \Sigma _h^n-{\varvec{\sigma }}_h^n\Vert \le c\tau t_n^{-\alpha /2-1}\Vert u_0\Vert \). The estimate (6.23) now follows since \(\Vert {\varvec{\sigma }}_h(t_n)-{\varvec{\sigma }}(t_n)\Vert \le ch t_n^{-\alpha }\Vert u_0\Vert \), see [17]. This completes the proof. \(\square \)

Remark 6.4

For problem (1.1) with \(\mathcal {L}u = -\text{ div } [A(x)\nabla u]+\kappa (x)u\), one may consider an expanded mixed FE method by setting \(\mathbf{q}=\nabla u\) and \({\varvec{\sigma }}= A\mathbf{q}\), see [7]. Thus, the scalar unknown u, its gradient and its flux \({\varvec{\sigma }}\) are treated explicitly. The new mixed Galerkin method is to find \((u,\mathbf{q},{\varvec{\sigma }}):(0,T_0]\rightarrow V\times \mathbf{W}\times \mathbf{W}\) such that

$$\begin{aligned} (^{C}\!\partial _t^{\alpha }u, \chi )- (\nabla \cdot {\varvec{\sigma }}, \chi )+(\kappa u, \chi )= & {} (f(u),\chi ) \;\;\;\forall \chi \in V,\\ (\mathbf{q}, \mathbf{w}) + (u,\nabla \cdot \mathbf{w})= & {} 0\ \;\;\;\forall \mathbf{w}\in \mathbf{W}, \\ ({\varvec{\sigma }}, \mathbf{w}) - (A \mathbf{q},\mathbf{w})= & {} 0\ \;\;\;\forall \mathbf{w}\in \mathbf{W}, \end{aligned}$$

with \(u(0)=u_0\). We define the semidiscrete approximation as the triple \((u_h,\mathbf{q}_h,{\varvec{\sigma }}_h):(0,T_0]\rightarrow V_h\times \mathbf{W}_h\times \mathbf{W}_h\) satisfying

$$\begin{aligned} (^{C}\!\partial _t^{\alpha }u_h, \chi _h)- (\nabla \cdot {\varvec{\sigma }}_h, \chi _h)+(\kappa u_h, \chi _h)= & {} (f(u_h),\chi _h) \;\;\;\forall \chi _h \in V_h,\\ (\mathbf{q}_h, \mathbf{w}_h) + (u_h,\nabla \cdot \mathbf{w}_h)= & {} 0\ \;\;\;\forall \mathbf{w}_h \in \mathbf{W}_h, \\ ({\varvec{\sigma }}_h, \mathbf{w}_h) - (A \mathbf{q}_h,\mathbf{w}_h)= & {} 0\ \;\;\;\forall \mathbf{w}_h \in \mathbf{W}_h, \end{aligned}$$

with \(u_h(0)=P_hu_0\). Then, in order to extend the previous analysis to the expanded mixed FE method, one may prove a result similar to Lemma 6.1. This can be achieved by using the approximation properties derived in [7, Theorem 4.8].

7 Numerical Experiments

In this part, we present numerical examples to verify the theoretical results. We consider problem (1.1) and its mixed form (6.1)–(6.2) in the unit square \(\Omega =(0,1)^2\) with \(\mathcal {L}=-\Delta \). We choose \(f(u)=\sqrt{1+u^2}\) and perform numerical tests with the following smooth and nonsmooth initial data:

  (a) \(u_0(x,y)=xy(1-x)(1-y)\in \dot{H}^2(\Omega )\).

  (b) \(u_0(x,y)=\chi _D(x,y)\in \dot{H}^{1/2-\epsilon }(\Omega )\), \(\epsilon >0\),

where \(\chi _D(x,y)\) denotes the characteristic function of the domain \(D:=\{(x,y)\in \Omega :x^2+y^2\le 1\}\).
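For reference, the two initial data and the nonlinearity can be written as plain Python functions. This is only a transcription of the formulas above into code; the function names are ours, and the experiments reported here were not run with this code.

```python
import math


def u0_smooth(x, y):
    """Case (a): u0 = x*y*(1-x)*(1-y), which vanishes on the boundary of the unit square."""
    return x * y * (1.0 - x) * (1.0 - y)


def u0_nonsmooth(x, y):
    """Case (b): characteristic function of D = {(x, y) in Omega : x^2 + y^2 <= 1}."""
    inside_omega = 0.0 < x < 1.0 and 0.0 < y < 1.0
    return 1.0 if inside_omega and x * x + y * y <= 1.0 else 0.0


def f(u):
    """Nonlinearity f(u) = sqrt(1 + u^2); its derivative is bounded by 1."""
    return math.sqrt(1.0 + u * u)
```

Note that the data in case (b) is discontinuous across the arc \(x^2+y^2=1\) and does not vanish on the whole boundary, which is why it has only limited smoothness.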

Since the error analysis of the standard conforming Galerkin FEM has been thoroughly investigated, see for instance [11], we shall mainly focus on spatial errors from nonconforming and mixed FEMs. The backward Euler CQ method is used for the time discretization.

In the computation, we divide the domain \(\Omega \) into regular right triangles with M equal subintervals of length \(h=1/M\) on each side of the domain. We choose \(\alpha = 0.5\) and the final time \(T=0.1\). Since the exact solution is difficult to obtain, we compute a reference solution on a refined mesh in each case. All the numerical results are obtained using FreeFEM++ [10].

To check the spatial discretization errors, we display in Table 1 the \(L^2(\Omega )\)-norm of the errors in the discrete solutions in cases (a) and (b), computed by the Crouzeix–Raviart nonconforming FEM (P1nc) based on the numerical scheme (5.3) and its analogue used with the standard conforming FEM. From the table, we observe a convergence rate of order \(O(h^2)\) for smooth and nonsmooth initial data, which confirms the theoretical convergence rates. Similar results have been obtained with different values of \(\alpha \). We notice that the nonconforming method yields slightly better results.
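Convergence rates of this kind are usually read off from errors on successively refined meshes through the formula \(\log (e_{h_i}/e_{h_{i+1}})/\log (h_i/h_{i+1})\). The Python sketch below shows this computation. The error values in it are synthetic, generated as \(0.3\,h^2\) purely to exercise the formula, and are not the values reported in the tables.

```python
import math


def observed_orders(hs, errors):
    """Observed convergence orders between consecutive mesh sizes."""
    return [
        math.log(errors[i] / errors[i + 1]) / math.log(hs[i] / hs[i + 1])
        for i in range(len(errors) - 1)
    ]


# Synthetic errors e = 0.3 * h^2 (hypothetical, not data from the experiments)
hs = [1 / 8, 1 / 16, 1 / 32, 1 / 64]
errs = [0.3 * h ** 2 for h in hs]
rates = observed_orders(hs, errs)
```

When a reference solution on a finer mesh replaces the exact solution, the observed orders are reliable only as long as the reference mesh is much finer than the meshes being compared.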

Table 1 \(L^2\)-errors for cases (a) and (b); P1 and P1nc FEMs

For the mixed FEM, we perform the computation using the two lowest-order Raviart–Thomas FE spaces (RT0, P0) and (RT1, P1dc), where P0 and P1dc denote the sets of piecewise constant and discontinuous piecewise linear functions, respectively. Here, we adopt the notation used in FreeFEM++. In our tests, we include the case \(\alpha =1\) (i.e., the parabolic case) in order to investigate the effect of the solution regularity.

Table 2 \(L^2\)-errors and convergence rates for case (a); mixed FEMs
Table 3 \(L^2\)-errors and convergence rates for case (b); mixed FEMs

The numerical results for problem (a) are given in Table 2. They show a convergence rate of order O(h) in the case of (RT0, P0) and of order \(O(h^2)\) in the case of (RT1, P1dc) for both values \(\alpha =0.5\) and \(\alpha =1\), which agrees well with the theoretical results.

In Table 3, we present the numerical results for problem (b). The results reveal that the convergence rates are maintained in the cases of (RT0, P0) with \(\alpha =0.5\) and (RT1, P1dc) with \(\alpha =1\), which agrees well with our convergence analysis. By contrast, the convergence rate reduces to \(O(h^{1.7})\) in the case of (RT1, P1dc) with \(\alpha =0.5\). This confirms our prediction that the optimal \(O(h^{2})\) convergence rate is no longer attainable when the initial data is not smooth. Since the observed rate is still better than O(h), this may be seen as an unexpected result. However, the initial data has some smoothness, namely \(u_0\in \dot{H}^{1/2-\epsilon }(\Omega )\) for \(\epsilon >0\), so the numerical results do not contradict our theoretical findings. Indeed, the smoothness of this particular initial data could have a positive effect on the convergence rate. This fact has also been observed in the study of a homogeneous linear time-fractional problem with time-dependent coefficients [18].