1 Introduction

Fractional differential equations have been successfully used as a powerful tool in modelling the phenomena related to nonlocality and spatial heterogeneity. As a class of basic fractional differential equations, the fractional sub-diffusion equations (FSDEs for short) that describe a special type of anomalous diffusion phenomenon are becoming increasingly significant in many fields of science and engineering (see, e.g., [1, 2]). These equations were derived by using continuous time random walks with a fractional derivative term in time to represent the degree of memory in the diffusing material [3, 4].

The analytical solutions of most generalized FSDEs are rather difficult to obtain. A number of numerical methods have been developed for the computation of their solutions; for instance, the explicit and implicit finite difference methods in [5, 6], the compact finite difference methods in [7, 8], the finite element methods in [9,10,11], the spectral methods in [12, 13], the alternating direction implicit methods in [14, 15], the implicit meshless method in [16], etc.

Most of the aforementioned numerical methods are intended for equations with constant diffusion coefficients. However, in an inhomogeneous medium, the diffusion coefficient may depend on the space variable, and this dependence leads to numerous physical applications described by equations with variable diffusion coefficients (see, e.g., [17,18,19,20,21,22]). When solving these equations, the techniques used for equations with constant coefficients cannot be applied directly, and extra effort is usually required, especially to obtain the expected high-order accuracy. In this paper, we seek a high-order compact finite difference method for solving a class of Caputo-type variable coefficient FSDEs in conservative form. The class of equations under consideration, with its boundary and initial conditions, is given by

$$\begin{aligned} \left\{ \begin{array}{l} {_{~0}^{C}}\mathcal{D}_{t}^{\alpha }u(x,t)=\mathcal{L}u(x,t)+f(x,t),\qquad (x,t)\in (0,L)\times (0,T],\\ u(0,t)=\phi _{0}(t),\quad u(L,t)=\phi _{L}(t),\qquad t\in (0,T],\\ u(x,0)=0,\qquad x\in [0,L], \end{array}\right. \end{aligned}$$
(1.1)

where the term \({_{~0}^{C}}\mathcal{D}_{t}^{\alpha }u(x,t)\) represents the Caputo fractional derivative of order \(\alpha \) in t, which is defined by

$$\begin{aligned} {_{~0}^{C}}\mathcal{D}_{t}^{\alpha }u(x,t)={_{0}\mathcal{D}}_{t}^{-(1-\alpha )}[\partial _{t}u(x,t)]=\frac{1}{\Gamma (1-\alpha )} \int _{0}^{t} \partial _{s}u(x,s) (t-s)^{-\alpha }\mathrm{d}s, \quad 0<\alpha <1,\nonumber \\ \end{aligned}$$
(1.2)

and the term \(\mathcal{L}u(x,t)\) is the diffusion term with the diffusion coefficient k(x), which is given by

$$\begin{aligned} \mathcal{L}u(x,t)=\partial _{x}\left( k(x) \partial _{x}u\right) (x,t). \end{aligned}$$
(1.3)

Without loss of generality, we have assumed the homogeneous initial condition \(u(x,0)=0\) in the above problem. If \(u(x,0)=\psi (x)\) for some sufficiently smooth function \(\psi (x)\) then the problem can be reduced to the same form as the above problem for \(v(x,t)=u(x,t)-\psi (x)\).
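
To make this reduction explicit, here is a short sketch of the computation; the symbols \(\tilde{f}\), \(\tilde{\phi }_{0}\) and \(\tilde{\phi }_{L}\) are introduced only for this illustration. Since \(\psi \) does not depend on t, its Caputo derivative in t vanishes, and so

$$\begin{aligned}&{_{~0}^{C}}\mathcal{D}_{t}^{\alpha }v(x,t)={_{~0}^{C}}\mathcal{D}_{t}^{\alpha }u(x,t)=\mathcal{L}u(x,t)+f(x,t)=\mathcal{L}v(x,t)+\tilde{f}(x,t),\qquad \tilde{f}(x,t)=f(x,t)+\partial _{x}\left( k(x)\psi ^{\prime }(x)\right) ,\nonumber \\&v(0,t)=\tilde{\phi }_{0}(t)=\phi _{0}(t)-\psi (0),\qquad v(L,t)=\tilde{\phi }_{L}(t)=\phi _{L}(t)-\psi (L),\qquad v(x,0)=0. \end{aligned}$$

Hence v satisfies a problem of the form (1.1) with the data \(\tilde{f}\), \(\tilde{\phi }_{0}\) and \(\tilde{\phi }_{L}\), provided \(\psi \) is smooth enough for \(\tilde{f}\) to be well defined.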

Throughout the paper, we assume that the given functions f(x, t), \(\phi _{0}(t)\), \(\phi _{L}(t)\) and k(x) in (1.1) and (1.3) are smooth enough and there exist positive constants \(c_{0}\) and \(c_{1}\) such that

$$\begin{aligned} c_{0}\le k(x)\le c_{1}, \qquad x\in [0,L]. \end{aligned}$$
(1.4)

In addition, we assume that the solution to the problem (1.1) has the necessary regularity.

It is important to develop high-order numerical methods for solving problem (1.1). A compact finite difference method was proposed in [19], where the unconditional stability and the global convergence of the method were proved by introducing a new norm with respect to the variable coefficient k(x). The recent work [22] extends the method in [19] to the Neumann problem of (1.1). The stability and convergence of the proposed method were studied in that paper by applying the energy method to the matrix form of the method. Another method was given in [20], where a compact exponential finite difference method was considered for a time fractional convection-diffusion reaction equation with variable coefficients, which contains the present problem (1.1) as a special case. However, the related convergence analysis was carried out only for the case of constant coefficients. Similarly, some combined compact finite difference methods were proposed in [21] in a general setting, but the theoretical results obtained there are only for the equation of integer order with constant coefficients. In the above works, the L1 approximation formula was used for the discretization of the Caputo fractional derivative \({_{~0}^{C}}\mathcal{D}_{t}^{\alpha }u\). Consequently, the temporal accuracy of the resulting method is only of order \(2-\alpha \), which is less than two. This motivated us to look for a more accurate approximation to the Caputo fractional derivative \({_{~0}^{C}}\mathcal{D}_{t}^{\alpha }u\) and then develop a high-order numerical method for the problem (1.1), with a rigorous theoretical analysis for the general case of variable coefficient k(x).

Usually, there are two types of approximation approaches for the Caputo fractional derivative \({_{~0}^{C}}\mathcal{D}_{t}^{\alpha }y\) of the function y(t) defined on a finite interval. The idea of the first type is to replace the function y(t) inside the integral by its piecewise interpolating polynomial. In general, the discretization obtained in this way has the convergence order \(r+1-\alpha \), where r is the degree of the interpolating polynomial. Relevant works include [13, 23,24,25,26,27,28,29], where \((2-\alpha )\)th-order (i.e., L1 approximation), \((3-\alpha )\)th-order, \((4-\alpha )\)th-order and \((r+1-\alpha )\)th-order \((r\ge 4)\) methods were investigated, respectively. When such high-order methods are applied to fractional differential equations, a key issue is the stability analysis of the corresponding scheme for all \(\alpha \) in (0, 1). A full stability analysis was established in [23] for a second-order scheme and in [13] for a \((3-\alpha )\)th-order scheme. The second type of approximation approach is to use the so-called weighted and shifted Grünwald difference operators. Although this approach is often used to handle the Riemann–Liouville fractional derivative, some numerical approximations for the Caputo fractional derivative can also be constructed with the help of the equivalence of these two derivatives under some regularity assumptions. We refer to [30,31,32] for such methods. Unlike the first type of approach, a method obtained by this technique has a convergence order independent of the derivative order \(\alpha \). However, on account of the weighted and shifted terms, it usually requires an additional technique for the discretization on the first few time levels in order to obtain the expected high-order accuracy (see, e.g., [30, 31]).

Based on the Lubich operator, this paper derives a class of approximation formulae for the Caputo fractional derivative \({_{~0}^{C}}\mathcal{D}_{t}^{\alpha }y\) defined on a finite interval. The Lubich operator was first introduced in [33] to obtain high-order approximations of the Riemann–Liouville fractional integral. Its applications to fractional differential equations are mostly for Riemann–Liouville-type equations (see, e.g., [34,35,36,37,38]). In a series of works [39,40,41,42,43,44], some second-order schemes were proposed for the Caputo-type fractional sub-diffusion or diffusion-wave equation with constant coefficient. The main idea in these works is to transform the original equation into an equivalent integro-differential equation and then apply the second-order Lubich operator to the Riemann–Liouville fractional integral of the equivalent equation. In this paper, we directly apply the Lubich operator to construct a set of high-order approximation formulae for the Caputo fractional derivative \({_{~0}^{C}}\mathcal{D}_{t}^{\alpha }y\) defined on the finite interval [0, T]. The convergence of the formulae is of order r, where \(r\ge 2\) is a positive integer depending on the choice of the generating functions. On the basis of these formulae, one can easily get high-order and unconditionally stable numerical methods to solve the Caputo-type problem (1.1). Another feature of the formulae is that they take the same form, so the computations can be carried out by the same recurrence relation regardless of the generating function chosen. We remark that these high-order approximation formulae can also be obtained from the high-order formulae for the Riemann–Liouville fractional derivative, but some additional regularity assumptions are then necessary.

In order to give a high-order discretization of the variable coefficient differential operator \(\mathcal{L}u\), we here consider the fourth-order compact difference discretization proposed in our previous works [45, 46], instead of that used in [19, 22, 47]. This discretization was designed by means of the integro-interpolation method, thereby differing essentially from that in [19, 22, 47]. One important difference is that it depends only on the diffusion coefficient k(x) while the one in [19, 22, 47] depends not only on the coefficient k(x) itself but also on its first-order and even second-order derivatives. The dependence of k(x) on x causes considerable difficulty in the analysis of the proposed method; at least the analysis for the case of constant coefficient does not work. Motivated by the recent study in [22], we overcome this difficulty by carefully decomposing the coefficient matrix and then applying the discrete energy method to a suitable matrix form of the method.

The outline of the paper is as follows. In Sect. 2, we construct a set of high-order approximation formulae for the Caputo fractional derivative \({_{~0}^{C}}\mathcal{D}_{t}^{\alpha }y\) defined on [0, T] by making use of the Lubich operator. On the basis of the obtained Lubich approximation formulae, a set of high-order compact finite difference methods for solving problem (1.1) is proposed in Sect. 3. Unconditional stability and convergence of the proposed methods are studied in Sect. 4 for the general case of variable coefficient k(x). Section 5 is devoted to further discussions. In Sect. 6, we apply the proposed compact difference methods to two model problems and present some numerical results to illustrate the theoretical results. The final section contains a brief conclusion.

2 Approximation Formulae for the Caputo Fractional Derivative

In this section, we develop a general form of high-order Lubich difference approximation formulae for the Caputo fractional derivative \({_{~0}^{C}\mathcal{D}_{t}^{\alpha }}y(t)\) of the function y(t) defined on [0, T].

2.1 Lubich Difference Operator

For any function y(t) defined on [0, T], any real number \(\alpha \in (0,1)\) and any positive integer r, the Lubich difference operator with the step \(\tau \) is defined by

$$\begin{aligned} \mathcal{L}_{r,\tau }^{\alpha } y(t)= \tau ^{-\alpha } \sum _{k=0}^{\left[ \frac{t}{\tau }\right] }\varpi _{r,k}^{(\alpha )} y(t-k\tau ), \qquad t\in [0,T], \end{aligned}$$
(2.1)

where \(\varpi _{r,k}^{(\alpha )}\) are the coefficients of the Taylor series expansion of the generating function

$$\begin{aligned} W_{r,\alpha }(z)=\left( \sum \limits _{i=1}^{r} \frac{1}{i} (1-z)^{i} \right) ^{\alpha }, \end{aligned}$$
(2.2)

that is,

$$\begin{aligned} \left( \sum \limits _{i=1}^{r} \frac{1}{i} (1-z)^{i} \right) ^{\alpha }=\sum \limits _{k=0}^{\infty } \varpi _{r,k}^{(\alpha )} z^{k},\qquad |z|\le 1. \end{aligned}$$
(2.3)

An easily implementable way of computing the coefficients \(\varpi _{r,k}^{(\alpha )}\) is by the following recursive relation given in [48, 49]:

$$\begin{aligned} \varpi _{r,0}^{(\alpha )}=(a_{r,0})^{\alpha }, \qquad \varpi _{r,k}^{(\alpha )}=\frac{1}{ka_{r,0}}\sum _{j=1}^{\min \{k,r\}} a_{r,j}(j\alpha -k+j)\varpi _{r,k-j}^{(\alpha )}~(k\ge 1), \end{aligned}$$
(2.4)

where

$$\begin{aligned} a_{r,0}=\sum _{i=1}^{r} \frac{1}{i}, \qquad a_{r,j}=\frac{(-1)^{j}}{j!}\sum _{i=j}^{r} \frac{(i-1)!}{(i-j)!}, \qquad j=1,2,\dots ,r. \end{aligned}$$
(2.5)

The exact expressions of the coefficients \(\varpi _{r,k}^{(\alpha )}\) for \(r=2,3,4,5,6\) can be found in [48, 49]. It was also shown in [48] that the coefficients \(\varpi _{r,k}^{(\alpha )}\rightarrow 0\) as \(k\rightarrow \infty \) for \(r=2,3,4,5,6\), while the coefficients \(\varpi _{r,k}^{(\alpha )}\) for \(r=7,8,9,10\) are oscillatory for sufficiently large k and so may be unsuitable for numerical computations.
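
The recurrence (2.4)–(2.5) is straightforward to code. The following Python sketch is one possible transcription using only the standard library; the function name `lubich_coefficients` and its argument names are our own choices for illustration and do not come from [48, 49].

```python
import math

def lubich_coefficients(r, alpha, n_max):
    """Return [varpi_{r,0}, ..., varpi_{r,n_max}] via the recurrence (2.4)-(2.5)."""
    # a[0], ..., a[r] are the coefficients a_{r,j} of (2.5).
    a = [sum(1.0 / i for i in range(1, r + 1))]
    for j in range(1, r + 1):
        s = sum(math.factorial(i - 1) / math.factorial(i - j)
                for i in range(j, r + 1))
        a.append((-1) ** j * s / math.factorial(j))
    # Recurrence (2.4): each coefficient uses at most r previous ones.
    w = [a[0] ** alpha]
    for k in range(1, n_max + 1):
        acc = sum(a[j] * (j * alpha - k + j) * w[k - j]
                  for j in range(1, min(k, r) + 1))
        w.append(acc / (k * a[0]))
    return w

alpha = 0.5
w1 = lubich_coefficients(1, alpha, 3)  # r = 1: generating function (1 - z)^alpha
w2 = lubich_coefficients(2, alpha, 3)  # r = 2: (3/2 - 2z + z^2/2)^alpha
print(w1)
print(w2)
```

For r = 1 the generating function is \((1-z)^{\alpha }\), so the output can be compared with the binomial coefficients \((-1)^{k}{\alpha \atopwithdelims ()k}\); for r = 2 the first two coefficients are \((3/2)^{\alpha }\) and \(-\frac{4\alpha }{3}(3/2)^{\alpha }\).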

The following lemma gives a property of the generating function \(W_{r,\alpha }(z)\), which is useful for our next discussions.

Lemma 2.1

Let \(V_{r}(z)=\left( W_{r,\alpha }(\mathrm{e}^{-z})\right) ^{\frac{1}{\alpha }}\). Then

$$\begin{aligned} V_{r}(0)=0, \quad V_{r}^{(1)}(0)=1,\quad ~~ V_{r}^{(k)}(0)=0~(2\le k\le r), \quad V_{r}^{(r+1)}(0)=-r!. \end{aligned}$$
(2.6)

Proof

It is clear that \(V_{r}(0)=0\). Since \(V_{r}(z)=\sum _{i=1}^{r} \frac{1}{i} (1-\mathrm{e}^{-z})^{i}\), we have

$$\begin{aligned}&V_{r}^{(1)}(z)=\mathrm{e}^{-z}G_{r}(z),\nonumber \\&V_{r}^{(k)}(z)=\mathrm{e}^{-z}\sum _{l=0}^{k-1} {k-1 \atopwithdelims ()l} (-1)^{k-1-l} G_{r}^{(l)}(z), \qquad k=2,3,\ldots , \end{aligned}$$
(2.7)

where \(G_{r}(z)=\sum _{i=0}^{r-1} (1-\mathrm{e}^{-z})^{i}\). In view of \(G_{r}(z)=\mathrm{e}^{z}\left( 1-(1-\mathrm{e}^{-z})^{r}\right) \), we obtain

$$\begin{aligned}&G_{r}^{(1)}(z)=G_{r}(z)-r(1-\mathrm{e}^{-z})^{r-1},\nonumber \\&G_{r}^{(l)}(z)=G_{r}^{(l-1)}(z)-\sum _{j=1}^{l-1}a_{l,j}(1-\mathrm{e}^{-z})^{r-1-j}\mathrm{e}^{-jz}, \qquad l=2,3,\dots ,r, \end{aligned}$$
(2.8)

where \(a_{l,j}\) are some constants independent of z and, in particular, \(a_{l,l-1}=\frac{r!}{(r-l)!}\). This implies \(G_{r}^{(l)}(0)=1\) \((0\le l\le r-1)\) and \(G_{r}^{(r)}(0)=1-r!\). Therefore by (2.7),

$$\begin{aligned}&V_{r}^{(1)}(0)=1, \qquad V_{r}^{(k)}(0)=\sum _{l=0}^{k-1} {k-1 \atopwithdelims ()l} (-1)^{k-1-l}=0,\qquad k=2,3,\dots ,r,\nonumber \\&V_{r}^{(r+1)}(0)=\sum _{l=0}^{r-1} {r \atopwithdelims ()l} (-1)^{r-l}+1-r!=-r!. \end{aligned}$$

This completes the proof. \(\square \)
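
As a quick sanity check of Lemma 2.1 (a worked instance added for illustration), take \(r=2\). Then \(V_{2}(z)=(1-\mathrm{e}^{-z})+\frac{1}{2}(1-\mathrm{e}^{-z})^{2}=\frac{3}{2}-2\mathrm{e}^{-z}+\frac{1}{2}\mathrm{e}^{-2z}\), and direct differentiation gives

$$\begin{aligned}&V_{2}(0)=\frac{3}{2}-2+\frac{1}{2}=0,\qquad V_{2}^{(1)}(z)=2\mathrm{e}^{-z}-\mathrm{e}^{-2z},\qquad V_{2}^{(1)}(0)=1,\nonumber \\&V_{2}^{(2)}(z)=-2\mathrm{e}^{-z}+2\mathrm{e}^{-2z},\qquad V_{2}^{(2)}(0)=0,\nonumber \\&V_{2}^{(3)}(z)=2\mathrm{e}^{-z}-4\mathrm{e}^{-2z},\qquad V_{2}^{(3)}(0)=-2=-2!, \end{aligned}$$

in agreement with (2.6).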

2.2 Approximation Formulae

Assume \(y(t)\in C^{r}[0,T]\) with \(y^{(r+1)}(t)\in L^{1}[0,T]\) for a nonnegative integer r. We define its extension \(y_\mathrm{ex}(t)\) to the entire real line \(\mathbb {R}\) as follows:

$$\begin{aligned} y_\mathrm{ex}(t)=\left\{ \begin{array}{l@{\quad }l} 0,&{} t\in (-\infty , 0),\\ y(t), \qquad &{} t\in [0,T],\\ \bar{y}(t), \quad &{} t\in (T,2T),\\ 0,&{} t\in [2T,\infty ), \end{array}\right. \end{aligned}$$
(2.9)

where \(\bar{y}(t)\) is a Hermite interpolation polynomial satisfying \(\bar{y}^{(k)}(T)=y^{(k)}(T)\) and \(\bar{y}^{(k)}(2T)=0\) for \(k=0,1, \dots ,r\).

For any nonnegative integer m and any real number \(\alpha \in (0,1)\), we define

$$\begin{aligned} \mathscr {C}^{m+\alpha }(\mathbb {R})=\left\{ f: f(t)\in L^{1}(\mathbb {R}), \int _{-\infty }^{\infty } |\omega |^{m+\alpha } \left| \hat{f} (\omega ) \right| \mathrm{d}\omega <\infty \right\} , \end{aligned}$$
(2.10)

where the function \(\hat{f}(\omega )=\mathcal{F}[f](\omega )\) is the Fourier transformation of f(t), i.e., \(\hat{f} (\omega ) = \mathcal{F}[f](\omega )= \int _{-\infty }^{\infty } \mathrm{e}^{\mathrm{i}\omega t} f(t)\mathrm{d}t\) for all \(\omega \in \mathbb {R}\) (see [50]).

Definition 2.1

Let y(t) be a function defined on [0, T]. We say that \(y(t)\in \mathscr {C}^{m+\alpha }[0,T]\) for a nonnegative integer m and a real number \(\alpha \in (0,1)\) if its extension \(y_\mathrm{ex}(t)\in \mathscr {C}^{m+\alpha }(\mathbb {R})\).

Recall that for any real number \(\alpha \in (0,1)\), the \(\alpha \)th-order Caputo fractional derivative of the function f(t) defined on the entire real line \(\mathbb {R}\) is defined by

$$\begin{aligned} {_{-\infty }^{~~C}\mathcal{D}_{t}^{\alpha }}f(t)={_{-\infty }\mathcal{D}}_{t}^{-(1-\alpha )}[f^{(1)}(t)]=\frac{1}{\Gamma (1-\alpha )}\int _{-\infty }^{t} f^{(1)}(s) (t-s)^{-\alpha }\mathrm{d}s. \end{aligned}$$

We have the following lemma.

Lemma 2.2

Assume \(y(t)\in C^{r}[0,T]\cap \mathscr {C}^{r_{0}+\alpha }[0,T]\) with \(y^{(r+1)}(t)\in L^{1}[0,T]\) for a real number \(\alpha \in (0,1)\) and two nonnegative integers r and \(r_{0}\) satisfying \(r\le r_{0}\). Then

(1) \(y^{(k)}(0)=0\) for \(k=0,1,\dots ,r\),

(2) \( \mathcal{F} \left[ y_\mathrm{ex}^{(k)} \right] (\omega )=(-\mathrm{i}\omega )^{k}\hat{y}_\mathrm{ex}(\omega )\) for \(k=0,1,\dots ,r+1\),

(3) \(\mathcal{F} \left[ {_{-\infty }^{~~C}\mathcal{D}_{t}^{\alpha }}y_\mathrm{ex}^{(k)}\right] (\omega ) =(-\mathrm{i} \omega )^{k+\alpha } \hat{y}_\mathrm{ex}(\omega )\) for \(k=0,1,\dots ,r\), where \(y_\mathrm{ex}(t)\) is the extension of y(t) to the entire real line \(\mathbb {R}\).

Proof

(a) Since \(y(t)\in C^{r}[0,T]\) with \(y^{(r+1)}(t)\in L^{1}[0,T]\), we have that for each \(k=0,1,\dots ,r+1\), \(y_\mathrm{ex}^{(k)}(t)\) is absolutely integrable on \(\mathbb {R}\) and thus its Fourier transformation \(\mathcal{F}\left[ y_\mathrm{ex}^{(k)} \right] (\omega )\) is defined.

Assume, by contradiction, that there exists some \(r^{\prime }\) satisfying \(0\le r^{\prime }\le r\) such that \(y^{(k)}(0)=0\) for \(k=0,1,\dots ,r^{\prime }-1\) but \(y^{(r^{\prime })}(0)\not = 0\). By integrating by parts,

$$\begin{aligned}&\mathcal{F} \left[ y_\mathrm{ex}^{(r^{\prime }+1)} \right] (\omega )= \int _{0}^{\infty } \mathrm{e}^{\mathrm{i}\omega t} y_\mathrm{ex}^{(r^{\prime }+1)}(t)\mathrm{d}t=-y^{(r^{\prime })}(0)-\mathrm{i}\omega \mathcal{F} \left[ y_\mathrm{ex}^{(r^{\prime })} \right] (\omega )\nonumber \\&\quad =-y^{(r^{\prime })}(0)+(-\mathrm{i}\omega )^{2} \mathcal{F} \left[ y_\mathrm{ex}^{(r^{\prime }-1)}\right] (\omega )=\cdots =-y^{(r^{\prime })}(0)+(-\mathrm{i}\omega )^{r^{\prime }+1} \hat{y}_\mathrm{ex}(\omega ).\nonumber \\ \end{aligned}$$
(2.11)

By the Riemann–Lebesgue lemma (see [51]), \(\lim _{|\omega |\rightarrow \infty }\mathcal{F} \left[ y_\mathrm{ex}^{(r^{\prime }+1)} \right] (\omega )=0\). We therefore obtain from (2.11) that \(\lim _{|\omega |\rightarrow \infty }(-\mathrm{i}\omega )^{r^{\prime }+1} \hat{y}_\mathrm{ex}(\omega )= y^{(r^{\prime })}(0)\not = 0\). This implies

$$\begin{aligned} \lim _{|\omega |\rightarrow \infty } |\omega |^{r^{\prime }+1-r_{0}-\alpha }\left| \omega \right| ^{r_{0}+\alpha } \left| \hat{y}_\mathrm{ex}(\omega )\right| = \lim _{|\omega |\rightarrow \infty } |\omega |^{r^{\prime }+1}\left| \hat{y}_\mathrm{ex}(\omega )\right| = \left| y^{(r^{\prime })}(0)\right| \not = 0. \end{aligned}$$

Since \(r^{\prime }+1-r_{0}-\alpha <1\), we get that the function \(|\omega |^{r_{0}+\alpha } |\hat{y}_\mathrm{ex}(\omega ) | \) is not integrable on \(\mathbb {R}\). This contradicts \(y_\mathrm{ex}(t)\in \mathscr {C}^{r_{0}+\alpha }(\mathbb {R})\). This proves \(y^{(k)}(0)=0\) for \(k=0,1,\dots ,r\).

(b) The result (2) follows from the result (1) and (2.11).

(c) Let \(k=0,1,\dots ,r\). Since \({_{-\infty }^{~~C}\mathcal{D}_{t}^{\alpha }}y_\mathrm{ex}^{(k)}(t)={_{-\infty }\mathcal{D}}_{t}^{-(1-\alpha )}\left[ y_\mathrm{ex}^{(k+1)}(t)\right] \) and \(y_\mathrm{ex}^{(k+1)}(t)\in L^{1}(\mathbb {R})\), we have from Theorem 7.1 of [50] that

$$\begin{aligned} \mathcal{F} \left[ {_{-\infty }^{~~C}\mathcal{D}_{t}^{\alpha }}y_\mathrm{ex}^{(k)}\right] (\omega ) =(-\mathrm{i} \omega )^{-(1-\alpha )} \mathcal{F}\left[ y_\mathrm{ex}^{(k+1)}\right] (\omega ). \end{aligned}$$
(2.12)

By the result (2), \( \mathcal{F}\left[ y_\mathrm{ex}^{(k+1)}\right] (\omega ) =(-\mathrm{i} \omega )^{k+1} \hat{y}_\mathrm{ex}(\omega )\) which together with (2.12) yields the result (3). \(\square \)

Theorem 2.1

Assume \(y(t)\in {C}[0,T]\cap \mathscr {C}^{r+\alpha }[0,T]\) with \(y^{(1)}(t)\in L^{1}[0,T]\) for a real number \(\alpha \in (0,1)\) and a positive integer r. Then

$$\begin{aligned} {_{~0}^{C}\mathcal{D}_{t}^{\alpha }}y(t)=\mathcal{L}_{r,\tau }^{\alpha } y(t)+\mathcal{O}(\tau ^{r}) \end{aligned}$$
(2.13)

holds uniformly for all \(t\in [0,T]\) as \(\tau \rightarrow 0\).

Proof

Let \(y_\mathrm{ex}(t)\) be the extension of y(t) to the entire real line \(\mathbb {R}\) and define

$$\begin{aligned} \Delta _{r,\tau }^{\alpha } y_\mathrm{ex}(t)= \tau ^{-\alpha } \sum _{k=0}^{\infty }\varpi _{r,k}^{(\alpha )} y_\mathrm{ex}(t-k\tau ), \qquad t\in \mathbb {R}. \end{aligned}$$
(2.14)

We have that for all \(t\in [0,T]\),

$$\begin{aligned} {_{~0}^{C}\mathcal{D}_{t}^{\alpha }}y(t)= & {} {_{-\infty }^{~~C}\mathcal{D}_{t}^{\alpha }}y_\mathrm{ex}(t),\qquad \mathcal{L}_{r,\tau }^{\alpha } y(t)=\Delta _{r,\tau }^{\alpha } y_\mathrm{ex}(t). \end{aligned}$$
(2.15)

It suffices to prove

$$\begin{aligned} {_{-\infty }^{~~C}\mathcal{D}_{t}^{\alpha }}y_\mathrm{ex}(t)=\Delta _{r,\tau }^{\alpha } y_\mathrm{ex}(t)+\mathcal{O}(\tau ^{r}) \end{aligned}$$
(2.16)

holds uniformly for all \(t\in \mathbb {R}\) as \(\tau \rightarrow 0\).

Taking the Fourier transformation on both sides of (2.14) yields

$$\begin{aligned} \mathcal{F} \left[ \Delta _{r,\tau }^{\alpha } y_\mathrm{ex} \right] (\omega )=\tau ^{-\alpha } \sum _{k=0}^{\infty }\varpi _{r,k}^{(\alpha )} \mathrm{e}^\mathrm{i\omega k \tau }\hat{y}_\mathrm{ex}(\omega )=\widetilde{W}_{r,\alpha }(-\mathrm{i}\omega \tau ) (-\mathrm{i}\omega )^{\alpha } \hat{y}_\mathrm{ex}(\omega ), \end{aligned}$$
(2.17)

where

$$\begin{aligned} \widetilde{W}_{r,\alpha }(z)= \left\{ \begin{array}{l@{\quad }l} \displaystyle z^{-\alpha } W_{r,\alpha }(\mathrm{e}^{-z}), \qquad &{}z\not =0,\\ 1,&{}z=0. \end{array} \right. \end{aligned}$$
(2.18)

By Lemma 2.2, \(\mathcal{F} \left[ {_{-\infty }^{~~C}\mathcal{D}_{t}^{\alpha }}y_\mathrm{ex}\right] (\omega ) =(-\mathrm{i} \omega )^{\alpha } \hat{y}_\mathrm{ex}(\omega )\). So we have from (2.17) that

$$\begin{aligned} \mathcal{F} \left[ \Delta _{r,\tau }^{\alpha } y_\mathrm{ex} \right] (\omega )=\mathcal{F} \left[ {_{-\infty }^{~~C}\mathcal{D}_{t}^{\alpha }}y_\mathrm{ex}\right] (\omega )+\left( \widetilde{W}_{r,\alpha } (-\mathrm{i}\omega \tau )-1 \right) (-\mathrm{i}\omega )^{\alpha } \hat{y}_\mathrm{ex}(\omega ). \end{aligned}$$
(2.19)

Let \(V_{r}(z)=\left( {W}_{r,\alpha } (\mathrm{e}^{-z}) \right) ^{\frac{1}{\alpha }}\). We have from Lemma 2.1 that the power series expansion of the function \(V_{r}(z)\) has the form \(V_{r}(z)=z+\sum _{l=r+1}^{\infty } a_{l}z^{l}\) which converges absolutely for all \(|z|\le R\) with some \(R>0\). Let \(C_{1}=\sum _{l=r+1}^{\infty } |a_{l}|R^{l}<\infty \) and \(C_{2}=\min \left\{ R, \left( \frac{R^{r+1}}{2C_{1}} \right) ^{\frac{1}{r}} \right\} \). Then, for all \(|z|\le C_{2}\),

$$\begin{aligned} \left| \sum _{l=r+1}^{\infty } a_{l}z^{l-1} \right| =\left| z^{r}\sum _{l=r+1}^{\infty } a_{l}z^{l-r-1} \right| \le \left| z \right| ^{r}R^{-r-1}C_{1}\le \frac{1}{2}, \end{aligned}$$

and so

$$\begin{aligned} \left| \widetilde{W}_{r,\alpha } (z)-1\right| = \left| \left( 1+\sum _{l=r+1}^{\infty } a_{l}z^{l-1}\right) ^{\alpha }-1\right| =\left| \sum _{n=1}^{\infty } {\alpha \atopwithdelims ()n} \left( \sum _{l=r+1}^{\infty } a_{l}z^{l-1}\right) ^{n} \right| \le C_{3}|z |^{r},\nonumber \\ \end{aligned}$$
(2.20)

where \(C_{3}=R^{-r-1} C_{1} \sum _{n=1}^{\infty } \left( \frac{1}{2} \right) ^{n-1}<\infty \). This proves that for all \(|\omega \tau |\le C_{2}\),

$$\begin{aligned} \left| \widetilde{W}_{r,\alpha } (-\mathrm{i}\omega \tau )-1\right| \le C_{3} |\omega \tau |^{r}. \end{aligned}$$
(2.21)

When \(|\omega \tau |> C_{2}\), we have

$$\begin{aligned} \left| \widetilde{W}_{r,\alpha } (-\mathrm{i}\omega \tau )-1\right|= & {} \left| (-\mathrm{i}\omega \tau )^{-\alpha } \left( \sum _{i=1}^{r} \frac{1}{i}\left( 1-\mathrm{e}^{\mathrm{i}\omega \tau } \right) ^{i} \right) ^{\alpha }-1\right| \nonumber \\< & {} \frac{1}{C_{2}^{r}} \left( 1+ \max \left\{ 1,\frac{1}{C_{2}}\right\} \sum _{i=1}^{r} \frac{2^{i}}{i} \right) | \omega \tau |^{r}. \end{aligned}$$
(2.22)

Therefore, there exists a positive constant \(C_{4}\) independent of \(\omega \tau \) such that

$$\begin{aligned} \left| \widetilde{W}_{r,\alpha } (-\mathrm{i}\omega \tau )-1\right| \le C_{4} |\omega \tau |^{r} \end{aligned}$$
(2.23)

uniformly for \(\omega \tau \in \mathbb {R}\).

Performing the inverse Fourier transformation on both sides of (2.19) leads to

$$\begin{aligned} \Delta _{r,\tau }^{\alpha } y_\mathrm{ex} (t)={_{-\infty }^{~~C}\mathcal{D}_{t}^{\alpha }}y_\mathrm{ex}(t)+\phi (t,\tau ), \end{aligned}$$
(2.24)

where

$$\begin{aligned} \left| \phi (t,\tau )\right|= & {} \frac{1}{2\pi } \left| \int _{-\infty }^{\infty } \mathrm{e}^{-\mathrm{i}\omega t}\left( \widetilde{W}_{r,\alpha } (-\mathrm{i}\omega \tau ) -1\right) (-\mathrm{i}\omega )^{\alpha } \hat{y}_\mathrm{ex}(\omega ) \mathrm{d} \omega \right| \nonumber \\\le & {} \frac{C_{4}}{2\pi } \left( \int _{-\infty }^{\infty } | \omega |^{r+\alpha } |\hat{y}_\mathrm{ex}(\omega )| \mathrm{d} \omega \right) \tau ^{r}. \end{aligned}$$
(2.25)

The condition requiring that \(y(t)\in \mathscr {C}^{r+\alpha }[0,T]\) implies \(\int _{-\infty }^{\infty } | \omega |^{r+\alpha } |\hat{y}_\mathrm{ex}(\omega )| \mathrm{d} \omega <\infty \), and thus, (2.16) holds uniformly for all \(t\in \mathbb {R}\) as \(\tau \rightarrow 0\). \(\square \)

Remark 2.1

We see from Lemma 2.2 that a necessary condition for the condition of Theorem 2.1 to be satisfied is given by \(y(0)=0\). For this reason, we have assumed the homogeneous initial condition \(u(x,0)=0\) in the problem (1.1). If the problem (1.1) is given with the nonhomogeneous initial condition \(u(x,0)=\psi (x)\), as mentioned in Sect. 1, the substitution \(v(x,t)=u(x,t)-\psi (x)\) will transform the problem (1.1) to the problem which has the same form and the homogeneous initial condition (also see the further discussions in Sect. 5). Hence, Theorem 2.1 is directly applicable to the above nonhomogeneous initial condition without any complication. For a detailed study on approximation methods for the Caputo fractional derivative defined on a finite interval with the nonhomogeneous initial condition, we refer to the recent work in [52].

2.3 Two Test Examples

In this subsection, we use two examples to demonstrate the numerical accuracy of the approximation formula (2.13). We only consider the formula for \(r=3,4,5,6\). Let \(\tau =1/N\) be the step, where N is a positive integer. Denote \(t_{n}=n\tau \) \((0\le n\le N)\). We compute the error \(\mathrm{E}_{r}(\tau )\) and the convergence order \( \mathrm{O}_{r}(\tau )\) by

$$\begin{aligned} \mathrm{E}_{r}(\tau )=\displaystyle \max _{0\le n\le N} \left| {_{~0}^{C}\mathcal{D}_{t}^{\alpha }}y(t_{n})-\mathcal{L}_{r,\tau }^{\alpha } y(t_{n})\right| , \qquad \mathrm{O}_{r}(\tau )=\log _{2}\left( \frac{\mathrm{E}_{r}(2\tau )}{\mathrm{E}_{r}(\tau )}\right) . \end{aligned}$$
(2.26)

Example 2.1

Let \(y(t)=t^{r+\alpha }\), where r is a positive integer. The \(\alpha \)th-order Caputo fractional derivative of y(t) is given by

$$\begin{aligned} {_{~0}^{C}\mathcal{D}_{t}^{\alpha }}y(t)=\frac{\Gamma (r+\alpha +1)}{\Gamma (r+1)} t^{r}, \qquad 0<\alpha <1. \end{aligned}$$

We use the approximation formula (2.13) to compute \({_{~0}^{C}\mathcal{D}_{t}^{\alpha }}y(t_{n})\) \((0\le n\le N)\) numerically. Let \(r=3,4,5,6\). The error \(\mathrm{E}_{r}(\tau )\) and the convergence order \(\mathrm{O}_{r}(\tau )\) for \(\alpha =1/4,1/2,3/4\) and different step \(\tau \) are listed in Table 1. It is seen that the approximation formula (2.13) has the convergence order described in Theorem 2.1.
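
A computation of this kind can be sketched in a few lines of Python. The code below is an illustrative reimplementation under our own naming (`lubich_coefficients`, `max_error`, `observed_order`), not the program used to produce Table 1; it evaluates the quantities in (2.26) for \(y(t)=t^{r+\alpha }\) on [0, 1], using the recurrence (2.4)–(2.5) for the coefficients.

```python
import math

def lubich_coefficients(r, alpha, n_max):
    """Coefficients varpi_{r,0..n_max} from the recurrence (2.4)-(2.5)."""
    a = [sum(1.0 / i for i in range(1, r + 1))]
    for j in range(1, r + 1):
        s = sum(math.factorial(i - 1) / math.factorial(i - j)
                for i in range(j, r + 1))
        a.append((-1) ** j * s / math.factorial(j))
    w = [a[0] ** alpha]
    for k in range(1, n_max + 1):
        acc = sum(a[j] * (j * alpha - k + j) * w[k - j]
                  for j in range(1, min(k, r) + 1))
        w.append(acc / (k * a[0]))
    return w

def max_error(r, alpha, N):
    """E_r(tau) of (2.26) for y(t) = t^(r + alpha), with tau = 1/N."""
    tau = 1.0 / N
    w = lubich_coefficients(r, alpha, N)
    scale = math.gamma(r + alpha + 1) / math.gamma(r + 1)
    worst = 0.0
    for n in range(N + 1):
        # Lubich operator (2.1) at t_n: tau^(-alpha) * sum_k w_k y(t_n - k tau).
        approx = tau ** (-alpha) * sum(
            w[k] * ((n - k) * tau) ** (r + alpha) for k in range(n + 1))
        exact = scale * (n * tau) ** r
        worst = max(worst, abs(exact - approx))
    return worst

def observed_order(r, alpha, N):
    """O_r(tau) of (2.26), comparing the steps 2/N and 1/N (N even)."""
    return math.log2(max_error(r, alpha, N // 2) / max_error(r, alpha, N))

for r in (3, 4):
    print(r, max_error(r, 0.5, 64), observed_order(r, 0.5, 64))
```

The double loop costs \(O(N^{2})\) operations, which is acceptable for the small values of N used in such a check.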

Example 2.2

In this example, we consider the function \(y(t)=\mathrm{e}^{\frac{t}{3}}-\sum _{k=0}^{r} \frac{1}{k!}\left( \frac{t}{3}\right) ^{k}+t^{r+\alpha }\), where r is a positive integer. Its \(\alpha \)th-order Caputo fractional derivative is given by

$$\begin{aligned} {_{~0}^{C}\mathcal{D}_{t}^{\alpha }}y(t)=\frac{t^{1-\alpha }}{3}\sum _{k=r}^{\infty } \frac{1}{\Gamma (k+2-\alpha )}\left( \frac{t}{3}\right) ^{k}+\frac{\Gamma (r+\alpha +1)}{\Gamma (r+1)} t^{r}, \qquad 0<\alpha <1. \end{aligned}$$

In our calculation, the series in the above equation is approximated by its finite sum for \(k=r\) to 50. The error \(\mathrm{E}_{r}(\tau )\) and the convergence order \(\mathrm{O}_{r}(\tau )\) of the approximation formula (2.13) for \(\alpha =1/4,1/2,3/4\) and different step \(\tau \) are listed in Table 2. We see that the numerical results coincide with the theoretical analysis results given in Theorem 2.1.

Table 1 The errors and the convergence orders of the approximation formula (2.13) for Example 2.1
Table 2 The errors and the convergence orders of the approximation formula (2.13) for Example 2.2

3 The Compact Finite Difference Scheme

Let \(h=L/M\) be the spatial step, where M is a positive integer. We partition [0, L] into a mesh by the mesh points \(x_{i}=ih\) \((0\le i\le M)\). Let

$$\begin{aligned}&J_{i}= \frac{1}{h} \left( \displaystyle \int _{x_{i-1}}^{x_{i}} \frac{1}{k(s)}\,\mathrm{d}s \right) ^{-1}, \qquad g_{1,i}(x)=J_{i} \displaystyle \int _{x_{i-1}}^{x} \frac{1}{k(s)}\,\mathrm{d}s,\nonumber \\&g_{2,i}(x)=J_{i} \displaystyle \int _{x}^{x_{i}} \frac{1}{k(s)}\,\mathrm{d}s,\qquad \phi _{1,i}(x)=-\frac{x-x_{i}}{2h}+\frac{(x-x_{i})^{2}}{2h^{2}},\nonumber \\&\phi _{2,i}(x)=1-\frac{(x-x_{i})^{2}}{h^{2}},\qquad \phi _{3,i}(x)=\frac{x-x_{i}}{2h}+\frac{(x-x_{i})^{2}}{2h^{2}}, \end{aligned}$$
(3.1)

and

$$\begin{aligned} E_{i}^{(l)}=\int _{x_{i-1}}^{x_{i}}\phi _{2+l,i}(x)g_{1,i}(x) \mathrm{d}x+\int _{x_{i}}^{x_{i+1}}\phi _{2+l,i}(x)g_{2,i+1}(x)\mathrm{d}x,\qquad l=-\,1,0,1. \end{aligned}$$

For any grid function \(w=\{w_{i}~|~0\le i\le M\}\), we define operators

$$\begin{aligned}&\mathcal{Q}w_{i}=J_{i} w_{i-1}-\left( J_{i}+J_{i+1} \right) w_{i}+J_{i+1}w_{i+1},\nonumber \\&\mathcal{H}w_{i}=E_{i}^{(-1)}w_{i-1}+E_{i}^{(0)}w_{i}+E_{i}^{(1)}w_{i+1}, \qquad 1\le i\le M-1. \end{aligned}$$
(3.2)

Then we have the following lemmas from [45, 46].

Lemma 3.1

Assume that \(\mathcal{L}w(x)\in C^{4}[0,L]\), where \(\mathcal{L}w\) is defined by (1.3). Then it holds that

$$\begin{aligned} \mathcal{H}(\mathcal{L}w)_{i}=\mathcal{Q}w_{i}+\mathcal{O}(h^{4}), \qquad 1\le i\le M-1. \end{aligned}$$
(3.3)

Lemma 3.2

It holds that

$$\begin{aligned} E_{i}^{(l)}=\frac{1}{12}+\mathcal{O}(h)~(l=-1,1), \qquad E_{i}^{(0)}=\frac{5}{6}+\mathcal{O}(h), \qquad 1\le i\le M-1. \end{aligned}$$
(3.4)

Lemma 3.3

There exists a positive constant \(h_{1}^{*}\), independent of h, such that for all \(h\le h_{1}^{*}\),

$$\begin{aligned} E_{i}^{(l)}\ge 0 ~(l=-1,0,1), \qquad E_{i}^{(-1)}+E_{i}^{(1)}\le E_{i}^{(0)}, \qquad 1\le i\le M-1. \end{aligned}$$
(3.5)
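
To see what the weights in (3.1)–(3.2) look like in practice, the following Python sketch evaluates \(J_{i}\), \(J_{i+1}\) and \(E_{i}^{(l)}\) at one interior node by composite Simpson quadrature. The quadrature rule, the function names and the sample coefficient \(k(x)=1+x\) are our own assumptions for illustration; the works [45, 46] may evaluate these integrals differently. For constant k the integrands are polynomials of degree at most three, so Simpson's rule is exact up to rounding, and a direct calculation from (3.1) gives \(J_{i}=k/h^{2}\), \(E_{i}^{(\pm 1)}=1/12\) and \(E_{i}^{(0)}=5/6\), i.e. the classical fourth-order compact stencil.

```python
def simpson(func, a, b, n=64):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    step = (b - a) / n
    total = func(a) + func(b)
    for m in range(1, n):
        total += (4 if m % 2 else 2) * func(a + m * step)
    return total * step / 3.0

def stencil_weights(k, x_i, h):
    """Return (J_i, J_{i+1}, {l: E_i^(l)}) from (3.1) at the node x_i."""
    inv_k = lambda s: 1.0 / k(s)
    J_left = 1.0 / (h * simpson(inv_k, x_i - h, x_i))    # J_i
    J_right = 1.0 / (h * simpson(inv_k, x_i, x_i + h))   # J_{i+1}
    g1 = lambda x: J_left * simpson(inv_k, x_i - h, x)   # g_{1,i}(x)
    g2 = lambda x: J_right * simpson(inv_k, x, x_i + h)  # g_{2,i+1}(x)
    phi = {
        -1: lambda x: -(x - x_i) / (2 * h) + (x - x_i) ** 2 / (2 * h * h),
        0: lambda x: 1.0 - (x - x_i) ** 2 / (h * h),
        1: lambda x: (x - x_i) / (2 * h) + (x - x_i) ** 2 / (2 * h * h),
    }
    E = {}
    for l, p in phi.items():
        E[l] = (simpson(lambda x: p(x) * g1(x), x_i - h, x_i)
                + simpson(lambda x: p(x) * g2(x), x_i, x_i + h))
    return J_left, J_right, E

# Constant coefficient: recovers the weights 1/12, 5/6, 1/12.
print(stencil_weights(lambda x: 2.0, 0.5, 0.1))
# Variable coefficient (hypothetical example): weights are perturbed by O(h).
print(stencil_weights(lambda x: 1.0 + x, 0.5, 0.1))
```

Note that only values of k(x) enter the computation; no derivative of k is needed, which is the feature of this discretization emphasized in Sect. 1.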

For a positive integer N, we let \(\tau =T/N\) be the time step. Denote \(t_{n}=n\tau \) \((0\le n\le N)\). Let u(x, t) be the solution of the problem (1.1). Define the grid functions

$$\begin{aligned} \begin{array}{lll} U_{i}^{n}=u(x_{i},t_{n}),\qquad &{} V_{i}^{n}=\mathcal{L}u(x_{i}, t_{n}),\qquad &{} f_{i}^{n}=f(x_{i},t_{n}),\\ \phi _{0}^{n}=\phi _{0}(t_{n}), \qquad &{} \phi _{L}^{n}=\phi _{L}(t_{n}). \end{array} \end{aligned}$$

An application of the approximation formulae (2.13) and (3.3) yields

$$\begin{aligned}&{_{~0}^{C}\mathcal{D}_{t}^{\alpha }}U_{i}^{n}=\tau ^{-\alpha } \sum _{k=0}^{n}\varpi _{r,k}^{(\alpha )} U_{i}^{n-k}+(R_{t})_{i}^{n}, \qquad 0\le i\le M, ~~1\le n\le N,\end{aligned}$$
(3.6)
$$\begin{aligned}&\mathcal{H}V_{i}^{n}=\mathcal{Q}U_{i}^{n}+(R_{x})_{i}^{n}, \qquad 1\le i\le M-1, ~~1\le n\le N, \end{aligned}$$
(3.7)

where \((R_{t})_{i}^{n}\) and \((R_{x})_{i}^{n}\) are the corresponding local truncation errors. Substituting (3.6) into the governing equation of (1.1), we obtain

$$\begin{aligned} \tau ^{-\alpha } \sum _{k=0}^{n}\varpi _{r,k}^{(\alpha )} U_{i}^{n-k}=V_{i}^{n}+f_{i}^{n}-(R_{t})_{i}^{n},\qquad 0\le i\le M, ~~1\le n\le N. \end{aligned}$$
(3.8)

Applying \(\mathcal{H}\) to both sides of (3.8) and using (3.7) leads to

$$\begin{aligned} \tau ^{-\alpha } \sum _{k=0}^{n}\varpi _{r,k}^{(\alpha )} \mathcal{H}U_{i}^{n-k}=\mathcal{Q}U_{i}^{n}+\mathcal{H}f_{i}^{n}+(R_{xt})_{i}^{n},\qquad 1\le i\le M-1, ~~1\le n\le N, \nonumber \\ \end{aligned}$$
(3.9)

where

$$\begin{aligned} (R_{xt})_{i}^{n}=-\mathcal{H}(R_{t})_{i}^{n}+(R_{x})_{i}^{n}. \end{aligned}$$
(3.10)

Omitting the small term \((R_{xt})_{i}^{n}\) in (3.9), we obtain the following compact finite difference scheme:

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} \tau ^{-\alpha } \displaystyle \sum _{k=0}^{n}\varpi _{r,k}^{(\alpha )} \mathcal{H}u_{i}^{n-k}=\mathcal{Q}u_{i}^{n}+\mathcal{H}f_{i}^{n},&{} 1\le i\le M-1, \quad 1\le n\le N,\\ u_{0}^{n}=\phi _{0}^{n},\quad u_{M}^{n}=\phi _{L}^{n}, &{} 1\le n\le N,\\ u_{i}^{0}=0, &{} 0\le i\le M, \end{array}\right. \end{aligned}$$
(3.11)

where \(u_{i}^{n}\) denotes the finite difference approximation to \(U_{i}^{n}\).
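The following Python sketch indicates how one time level of (3.11) could be advanced; it is an illustration under stated assumptions, not the authors' implementation. The weights \(\varpi _{r,k}^{(\alpha )}\) of (2.4) and the coefficients \(E_{i}^{(l)}\), \(J_{i}\) are simply passed in as lists. In the demo, the Grünwald–Letnikov weights (first order only) stand in for \(\varpi _{r,k}^{(\alpha )}\), and the constant-coefficient values \(E_{i}^{(\pm 1)}=\frac{1}{12}\), \(E_{i}^{(0)}=\frac{5}{6}\), \(J_{i}=h^{-2}\) are assumed. The names `thomas`, `apply_H`, `step`, `gl_weights` and `demo` are hypothetical.

```python
import math

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] do not affect the result."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        piv = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / piv
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / piv
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def apply_H(Em, E0, Ep, w, i):
    # Operator H of (3.2) at interior node i.
    return Em[i] * w[i - 1] + E0[i] * w[i] + Ep[i] * w[i + 1]

def step(history, weights, Em, E0, Ep, J, f_n, tau, alpha, left, right):
    """history = [u^0, ..., u^{n-1}] (each of length M+1); returns u^n from (3.11)."""
    n, M = len(history), len(f_n) - 1
    s = weights[0] / tau ** alpha
    lower, diag, upper, rhs = [], [], [], []
    for i in range(1, M):
        a = s * Em[i] - J[i]                  # coefficient of u_{i-1}^n
        b = s * E0[i] + J[i] + J[i + 1]       # coefficient of u_i^n
        c = s * Ep[i] - J[i + 1]              # coefficient of u_{i+1}^n
        r = apply_H(Em, E0, Ep, f_n, i)
        for k in range(1, n + 1):             # memory term from earlier levels
            r -= weights[k] / tau ** alpha * apply_H(Em, E0, Ep, history[n - k], i)
        if i == 1:
            r -= a * left                     # move known boundary value to the right
        if i == M - 1:
            r -= c * right
        lower.append(a); diag.append(b); upper.append(c); rhs.append(r)
    return [left] + thomas(lower, diag, upper, rhs) + [right]

def gl_weights(alpha, n):
    # Grünwald–Letnikov weights: an ASSUMED first-order stand-in for (2.4).
    w = [1.0]
    for k in range(1, n + 1):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / k))
    return w

def demo(M=10, N=50, alpha=0.5):
    """Max error at t = 1 for u = t^2 sin(pi x) on [0, 1] with k(x) = 1 (assumed data)."""
    h, tau = 1.0 / M, 1.0 / N
    Em = Ep = [1.0 / 12.0] * (M + 1)
    E0 = [5.0 / 6.0] * (M + 1)
    J = [1.0 / h ** 2] * (M + 1)
    w = gl_weights(alpha, N)
    exact = lambda x, t: t ** 2 * math.sin(math.pi * x)
    src = lambda x, t: (2.0 * t ** (2 - alpha) / math.gamma(3 - alpha)
                        + math.pi ** 2 * t ** 2) * math.sin(math.pi * x)
    hist = [[0.0] * (M + 1)]
    for n in range(1, N + 1):
        f_n = [src(i * h, n * tau) for i in range(M + 1)]
        hist.append(step(hist, w, Em, E0, Ep, J, f_n, tau, alpha, 0.0, 0.0))
    return max(abs(hist[N][i] - exact(i * h, 1.0)) for i in range(M + 1))
```

The tridiagonal entries `a`, `b`, `c` coincide with \(p^{*}\), \(q^{*}\), \(r^{*}\) in the proof of Theorem 3.2. With the stand-in weights only first-order temporal accuracy can be expected; the rth-order behaviour requires the weights of (2.4).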

Theorem 3.1

Let \(u(x,t)\) be the solution of problem (1.1). Assume that \(\mathcal{L}u(x,\cdot )\in C^{4}[0,L]\) and \(u(\cdot ,t)\in C[0,T]\cap \mathscr {C}^{r+\alpha }[0,T]\) with \(\partial _{t}u(\cdot ,t)\in L^{1}[0,T]\) for a positive integer r. Then the truncation error \((R_{xt})_{i}^{n}\) of the compact difference scheme (3.11) satisfies

$$\begin{aligned} \left| (R_{xt})_{i}^{n}\right| \le C^{*} \left( \tau ^{r}+h^{4}\right) , \qquad 1\le i\le M-1,~~1\le n\le N, \end{aligned}$$
(3.12)

where \(C^{*}\) is a positive constant independent of \(\tau \), h and n.

Proof

We have from Theorem 2.1 and Lemma 3.1 that the truncation errors \((R_{t})_{i}^{n}\) and \((R_{x})_{i}^{n}\) in (3.6) and (3.7) have the form

$$\begin{aligned} (R_{t})_{i}^{n}=\mathcal{O}(\tau ^{r}),\qquad (R_{x})_{i}^{n}=\mathcal{O}(h^{4}),\qquad 1\le i\le M-1,~~1\le n\le N. \end{aligned}$$
(3.13)

Since by Lemma 3.2, \(\mathcal{H}w_{i}=\frac{1}{12}(w_{i-1}+10w_{i}+w_{i+1})+(w_{i-1}+w_{i}+w_{i+1})\mathcal{O}(h)\) for any grid function \(w=\{w_{i}~|~0\le i\le M\}\), substituting the estimates (3.13) into (3.10) yields the desired estimate (3.12) immediately. \(\square \)

Theorem 3.2

The compact difference scheme (3.11) is uniquely solvable for all sufficiently small \(h\le h_{1}^{*}\), where \(h_{1}^{*}\) is the constant defined in Lemma 3.3.

Proof

It is sufficient to prove that the coefficient matrix \(Q^{*}\) of the system from the compact difference scheme (3.11) is nonsingular if \(h\le h_{1}^{*}\). In fact, \(Q^{*}=\mathrm{tridiag}(p_{i-1}^{*}, q_{i}^{*}, r_{i+1}^{*})\), where \(p_{0}^{*}=r_{M}^{*}=0\) and

$$\begin{aligned}&p_{i}^{*}=\tau ^{-\alpha } \varpi _{r,0}^{(\alpha )} E_{i+1}^{(-1)}-J_{i+1}, \qquad 1\le i\le M-2, \\&q_{i}^{*}=\tau ^{-\alpha } \varpi _{r,0}^{(\alpha )} E_{i}^{(0)}+J_{i}+J_{i+1}, \qquad 1\le i\le M-1,\\&r_{i}^{*}=\tau ^{-\alpha } \varpi _{r,0}^{(\alpha )} E_{i-1}^{(1)}-J_{i}, \qquad 2\le i\le M-1. \end{aligned}$$

It is clear that \(q_{i}^{*}>0\) for each \(1\le i\le M-1\).

Case 1 Assume that \(p_{i}^{*}\not =0\) for all \(1\le i\le M-2\). In this case, the matrix \(Q^{*}\) is irreducible. By Lemma 3.3, we have that for \(2\le i\le M-2\),

$$\begin{aligned} |p_{i-1}^{*}|+|r_{i+1}^{*}|\le & {} \tau ^{-\alpha } \varpi _{r,0}^{(\alpha )} \left( E_{i}^{(-1)}+E_{i}^{(1)}\right) +J_{i}+J_{i+1}\\\le & {} \tau ^{-\alpha } \varpi _{r,0}^{(\alpha )} E_{i}^{(0)}+J_{i}+J_{i+1}=|q_{i}^{*}|. \end{aligned}$$

Similarly,

$$\begin{aligned} |r_{2}^{*}|\le \tau ^{-\alpha } \varpi _{r,0}^{(\alpha )} E_{1}^{(1)}+J_{2}<|q_{1}^{*}|,\qquad |p_{M-2}^{*}|\le \tau ^{-\alpha } \varpi _{r,0}^{(\alpha )} E_{M-1}^{(-1)}+J_{M-1}<|q_{M-1}^{*}|. \end{aligned}$$

This proves that \(Q^{*}\) is irreducibly diagonally dominant and thus nonsingular (see [53]).

Case 2 Assume that \(p_{i_{0}}^{*}= 0\) for some \(1\le i_{0}\le M-2\). In this case, we complete the proof by partitioning \(Q^{*}\) and then considering its submatrices. \(\square \)

4 Stability and Convergence

In this section, we carry out the stability and convergence analysis of the compact difference scheme (3.11) using a technique of discrete energy analysis. Since the coefficient matrix of the scheme (3.11) is not symmetric due to the dependence of the coefficient k(x) on x, a direct analysis using the discrete energy method is much more difficult. Motivated by the recent study in [22], we here use an indirect approach by decomposing the coefficient matrix and then applying the discrete energy method to a suitable matrix form of the scheme (3.11).

4.1 A Suitable Matrix Form of the Scheme (3.11)

We now write the compact difference scheme (3.11) in a suitable matrix form for our analysis. Let \(\mathcal{S}_{h}=\{w~|~ w=(w_{0}, w_{1}, \dots , w_{M}), w_{0}=w_{M}=0\}\) be the space of the grid functions defined on the spatial mesh and vanishing on two boundary points. For any \(w\in \mathcal{S}_{h}\), we let \(\mathbf{w}=(w_{1}, w_{2}, \dots ,w_{M-1})^{T}\in \mathbb {R}^{M-1}\). Define

$$\begin{aligned} S=\left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} 1 &{} &{} &{}\\ -\,1 &{} 1 &{} &{}\\ &{}-\,1 &{}1 &{}\\ &{} &{}\ddots &{}\ddots \\ &{}&{}&{}-\,1 &{}1\\ &{}&{}&{} &{}-\,1\\ \end{array} \right] _{M\times (M-1)}, \quad \Lambda =\mathrm{diag}\left( J_{1},J_{2}, \dots , J_{M} \right) . \end{aligned}$$

Then we have that for any \(w\in \mathcal{S}_{h}\),

$$\begin{aligned} \left( \mathcal{Q}w_{1},\mathcal{Q}w_{2},\dots ,\mathcal{Q}w_{M-1} \right) ^{T} =-S^{T} \Lambda S\mathbf{w}. \end{aligned}$$
(4.1)
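The identity (4.1) can be checked numerically. The sketch below evaluates \(-S^{T}\Lambda S\mathbf{w}\) row by row and compares it with the stencil of \(\mathcal{Q}\) in (3.2). The positive values used for \(J_{i}\) are arbitrary test data, not the coefficients of the paper, and the function names are hypothetical.

```python
import random

def Q_stencil(J, wf, i):
    # (3.2): Q w_i = J_i w_{i-1} - (J_i + J_{i+1}) w_i + J_{i+1} w_{i+1}
    return J[i] * wf[i - 1] - (J[i] + J[i + 1]) * wf[i] + J[i + 1] * wf[i + 1]

def minus_St_Lambda_S(J, w):
    """Return -S^T Lambda S w for w = (w_1, ..., w_{M-1}); J[1..M] is used."""
    M = len(w) + 1
    wf = [0.0] + list(w) + [0.0]                    # pad with w_0 = w_M = 0
    # (Lambda S w)_j = J_j (w_j - w_{j-1}),  j = 1..M  (stored at index j-1)
    d = [J[j] * (wf[j] - wf[j - 1]) for j in range(1, M + 1)]
    # (S^T d)_i = d_i - d_{i+1},  i = 1..M-1
    return [-(d[i - 1] - d[i]) for i in range(1, M)]

def max_mismatch(M=8, seed=0):
    rng = random.Random(seed)
    J = [0.0] + [rng.uniform(1.0, 10.0) for _ in range(M)]   # arbitrary J_1..J_M > 0
    w = [rng.uniform(-1.0, 1.0) for _ in range(M - 1)]
    wf = [0.0] + w + [0.0]
    lhs = [Q_stencil(J, wf, i) for i in range(1, M)]
    rhs = minus_St_Lambda_S(J, w)
    return max(abs(a - b) for a, b in zip(lhs, rhs))
```

The mismatch is at the level of rounding error, which reflects that \(S\) is a discrete difference operator and \(-S^{T}\Lambda S\) is its conservative second-difference counterpart.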

Let

$$\begin{aligned}&S_{-1}=\left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} -\,1 &{} 1 &{} &{} &{}\\ &{} -\,1 &{}1 &{} &{}\\ &{} &{}-\,1 &{}1 &{}\\ &{} &{} &{}\ddots &{}\ddots \\ &{}&{}&{} &{}-\,1 &{}1\\ &{}&{}&{}&{} &{}-\,1\\ \end{array} \right] _{(M-1)^{2}}, \\&\quad S_{1}=\left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} 1 &{} &{} &{}&{}\\ -\,1 &{} 1 &{}&{} &{}\\ &{}-\,1 &{}1 &{}&{}\\ &{} &{}\ddots &{}\ddots &{}\\ &{}&{}&{}-\,1 &{}1&{}\\ &{}&{}&{} &{}-\,1&{}1\\ \end{array} \right] _{(M-1)^{2}},\\&\Lambda _{l}=\mathrm{diag}\left( E_{1}^{(-l)}-\frac{1}{12},E_{2}^{(-l)}-\frac{1}{12}, \dots ,E_{M-1}^{(-l)}-\frac{1}{12} \right) , \qquad l=-1,1,\\&Q=\mathrm{tridiag}\left( \frac{1}{12}, P_{i}-\frac{1}{6}, \frac{1}{12}\right) ,\qquad H= Q+ \Lambda _{-1} S_{-1}-\Lambda _{1} S_{1}, \end{aligned}$$

where \(P_{i}=E_{i}^{(-1)}+E_{i}^{(0)}+E_{i}^{(1)}\). Since for \(i=1,2,\dots ,M-1\),

$$\begin{aligned} \mathcal{H}w_{i}= & {} E_{i}^{(-1)}w_{i-1}+E_{i}^{(0)}w_{i}+E_{i}^{(1)}w_{i+1},\nonumber \\= & {} \frac{1}{12}w_{i-1}+\left( P_{i}-\frac{1}{6}\right) w_{i}+\frac{1}{12} w_{i+1}\nonumber \\&\quad +\left( E_{i}^{(-1)}-\frac{1}{12}\right) w_{i-1}+\left( \frac{1}{6}-E_{i}^{(-1)}-E_{i}^{(1)}\right) w_{i} +\left( E_{i}^{(1)}-\frac{1}{12}\right) w_{i+1},\nonumber \end{aligned}$$

we have that for any \(w\in \mathcal{S}_{h}\),

$$\begin{aligned} \left( \mathcal{H}w_{1}, \mathcal{H}w_{2},\dots , \mathcal{H}w_{M-1} \right) ^{T}=H\mathbf{w}. \end{aligned}$$
(4.2)
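As a sanity check of the splitting \(H=Q+\Lambda _{-1}S_{-1}-\Lambda _{1}S_{1}\), the following sketch applies the three pieces row by row and compares the result with the stencil of \(\mathcal{H}\). The values of \(E_{i}^{(l)}\) are random test data (the actual coefficients are defined earlier in the paper), and all function names are hypothetical.

```python
import random

def H_stencil(Em, E0, Ep, wf, i):
    # Definition (3.2) of the operator H at node i.
    return Em[i] * wf[i - 1] + E0[i] * wf[i] + Ep[i] * wf[i + 1]

def H_decomposed(Em, E0, Ep, w):
    """Action of Q + Lambda_{-1} S_{-1} - Lambda_1 S_1 on w = (w_1, ..., w_{M-1})."""
    M = len(w) + 1
    wf = [0.0] + list(w) + [0.0]            # pad with w_0 = w_M = 0
    out = []
    for i in range(1, M):
        P = Em[i] + E0[i] + Ep[i]
        q_row = (wf[i - 1] + wf[i + 1]) / 12.0 + (P - 1.0 / 6.0) * wf[i]
        s_minus = wf[i + 1] - wf[i]         # (S_{-1} w)_i
        s_plus = wf[i] - wf[i - 1]          # (S_1 w)_i
        # Lambda_{-1} carries E^{(1)} - 1/12 and Lambda_1 carries E^{(-1)} - 1/12.
        out.append(q_row + (Ep[i] - 1.0 / 12.0) * s_minus
                         - (Em[i] - 1.0 / 12.0) * s_plus)
    return out

def decomposition_mismatch(M=8, seed=0):
    rng = random.Random(seed)
    Em = [rng.uniform(0.0, 0.2) for _ in range(M + 1)]   # arbitrary test values
    E0 = [rng.uniform(0.7, 0.9) for _ in range(M + 1)]
    Ep = [rng.uniform(0.0, 0.2) for _ in range(M + 1)]
    w = [rng.uniform(-1.0, 1.0) for _ in range(M - 1)]
    wf = [0.0] + w + [0.0]
    direct = [H_stencil(Em, E0, Ep, wf, i) for i in range(1, M)]
    split = H_decomposed(Em, E0, Ep, w)
    return max(abs(a - b) for a, b in zip(direct, split))
```

Note the index swap in \(\Lambda _{l}\): its entries are \(E_{i}^{(-l)}-\frac{1}{12}\), so \(\Lambda _{-1}\) multiplies the forward difference \(S_{-1}\mathbf{w}\) with \(E_{i}^{(1)}\).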

An application of (4.1) and (4.2) shows that the compact difference scheme (3.11) can be expressed in the matrix form as

$$\begin{aligned} \tau ^{-\alpha } \sum _{k=0}^{n}\varpi _{r,k}^{(\alpha )} H\mathbf{u}^{n-k}=-S^{T} \Lambda S\mathbf{u}^{n}+\mathbf{g}^{n},\qquad 1\le n\le N, \end{aligned}$$
(4.3)

where

$$\begin{aligned} \mathbf{u}^{n}=\left( u_{1}^{n}, u_{2}^{n}, \dots ,u_{M-1}^{n}\right) ^{T},\qquad \mathbf{g}^{n}=\left( \mathcal{H}f_{1}^{n},\mathcal{H}f_{2}^{n},\dots ,\mathcal{H}f_{M-1}^{n}\right) ^{T}+\mathbf{r}^{n}, \end{aligned}$$
(4.4)

and \(\mathbf{r}^{n}\) absorbs the boundary values of the solution vector. Note that \(\mathbf{r}^{n}=0\) when \(u_{0}^{n}=u_{M}^{n}=0\) for all \(0\le n\le N\).

Lemma 4.1

There exists a positive constant \(h_{2}^{*}\), independent of h, such that for all \(h\le h_{2}^{*}\), the matrix Q is symmetric and positive definite. Moreover, \(\lambda _{\min }(Q)\ge \frac{1}{3}\) and \(\lambda _{\max }(Q)\le \frac{4}{3}\), where \(\lambda _{\min }(Q)\) and \(\lambda _{\max }(Q)\) denote the smallest and largest eigenvalues of Q.

Proof

We write

$$\begin{aligned} Q=\mathrm{tridiag}\left( \frac{1}{12}, \frac{5}{6}, \frac{1}{12}\right) +\mathrm{diag}\left( P_{1}-1,P_{2}-1, \dots , P_{M-1}-1\right) . \end{aligned}$$

We have from Lemma 3.2 that \(P_{i}=1+\mathcal{O}(h)\) which implies \(P_{i}-1=\mathcal{O}(h)\) for all \(1\le i\le M-1\). Since the eigenvalues of the matrix \(\mathrm{tridiag}\left( \frac{1}{12}, \frac{5}{6}, \frac{1}{12}\right) \) are given by \(\lambda _{i}=\frac{5}{6}+\frac{1}{6} \cos \frac{i\pi }{M}\) for all \(1\le i\le M-1\), there exists a positive constant \(h_{2}^{*}\), independent of h, such that for all \(h\le h_{2}^{*}\),

$$\begin{aligned} \lambda _{\min }(Q)\ge \frac{5}{6}-\frac{1}{6} -\frac{1}{3}=\frac{1}{3}, \qquad \lambda _{\max }(Q)\le \frac{5}{6}+\frac{1}{6} +\frac{1}{3}=\frac{4}{3}. \end{aligned}$$

The proof is completed. \(\square \)
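The eigenvalue formula used in the proof of Lemma 4.1 can be verified directly: the vectors with components \(\sin (ij\pi /M)\) are eigenvectors of \(\mathrm{tridiag}(\frac{1}{12},\frac{5}{6},\frac{1}{12})\). A small sketch (function names are hypothetical):

```python
import math

def tridiag_apply(w):
    """Apply tridiag(1/12, 5/6, 1/12) to w = (w_1, ..., w_{M-1})."""
    wf = [0.0] + list(w) + [0.0]
    return [wf[j - 1] / 12.0 + 5.0 * wf[j] / 6.0 + wf[j + 1] / 12.0
            for j in range(1, len(w) + 1)]

def eigen_residual(M, i):
    """Return (lambda_i, max |A v - lambda_i v|) for the i-th sine eigenvector."""
    lam = 5.0 / 6.0 + math.cos(i * math.pi / M) / 6.0
    v = [math.sin(i * j * math.pi / M) for j in range(1, M)]
    Av = tridiag_apply(v)
    return lam, max(abs(a - lam * b) for a, b in zip(Av, v))
```

All \(\lambda _{i}\) lie strictly between \(\frac{2}{3}\) and 1, so the bounds \(\frac{1}{3}\) and \(\frac{4}{3}\) in the lemma leave a margin of \(\frac{1}{3}\) for the \(\mathcal{O}(h)\) diagonal perturbation.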

Let \(B=Q^{-1}\) when \(h\le h_{2}^{*}\). Then the matrix form (4.3) is equivalent to

$$\begin{aligned} \tau ^{-\alpha } \sum _{k=0}^{n}\varpi _{r,k}^{(\alpha )} BH\mathbf{u}^{n-k}=-BS^{T} \Lambda S\mathbf{u}^{n}+B\mathbf{g}^{n},\qquad 1\le n\le N. \end{aligned}$$
(4.5)

The above matrix form and the following lemmas will be used in our stability and convergence analysis of the compact difference scheme (3.11).

Lemma 4.2

There exists a positive constant \(c_{2}\), independent of h, such that

$$\begin{aligned} \lambda _{\max }(\Lambda _{l}^{2})\le c_{2}h^{2}, \qquad l=-1,1. \end{aligned}$$
(4.6)

Proof

Lemma 3.2 implies \(\Lambda _{l}^{2}=\mathrm{diag}\left( (\mathcal{O}(h))^{2},(\mathcal{O}(h))^{2},\dots ,(\mathcal{O}(h))^{2}\right) \) for \(l=-1,1\). This proves the result (4.6). \(\square \)

Lemma 4.3

It holds that

(1) \(\frac{c_{0}}{h^{2}}\le \lambda _{\min }(\Lambda )\le \lambda _{\max }(\Lambda )\le \frac{c_{1}}{h^{2}}\), where \(c_{0}\) and \(c_{1}\) are the constants in (1.4),

(2) \(\lambda _{\max }(SS^{T})\le 4\),

(3) \((S_{l}{} \mathbf{w})^{T}S_{l}{} \mathbf{w}\le (S\mathbf{w})^{T}S\mathbf{w}\) for any \(\mathbf{w}\in \mathbb {R}^{M-1}\) and \(l=-1,1\).

Proof

(a) The result (1) follows from (1.4).

(b) Since \(SS^{T}=\mathrm{tridiag}(-\,1,2,-\,1)+\mathrm{diag}(-\,1,0,\dots ,0,-\,1)\), the result (2) follows from the theorem of Gerschgorin (see [53]).

(c) For any \(\mathbf{w}=(w_{1}, w_{2}, \dots ,w_{M-1})^{T}\in \mathbb {R}^{M-1}\),

$$\begin{aligned} (S_{-1}{} \mathbf{w})^{T}S_{-1}{} \mathbf{w}= & {} \sum _{i=2}^{M-1} (w_{i}-w_{i-1})^{2}+(w_{M-1})^{2},\nonumber \\ (S_{1}{} \mathbf{w})^{T}S_{1}\mathbf{w}= & {} \sum _{i=2}^{M-1} (w_{i}-w_{i-1})^{2}+(w_{1})^{2},\nonumber \\ (S\mathbf{w})^{T}S\mathbf{w}= & {} \sum _{i=2}^{M-1} (w_{i}-w_{i-1})^{2}+(w_{1})^{2}+(w_{M-1})^{2}. \end{aligned}$$
(4.7)

This proves the result (3). \(\square \)
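The three sums in (4.7) are easy to evaluate for a concrete vector. The sketch below (the function name is hypothetical) returns them in the order \(S_{-1}\), \(S_{1}\), \(S\), so that the differences \(w_{1}^{2}\) and \(w_{M-1}^{2}\) can be read off.

```python
def quad_forms(w):
    """Return ((S_{-1}w)^T S_{-1}w, (S_1 w)^T S_1 w, (Sw)^T Sw) for w = (w_1..w_{M-1})."""
    M = len(w) + 1
    wf = [0.0] + list(w) + [0.0]                                    # w_0 = w_M = 0
    full = sum((wf[j] - wf[j - 1]) ** 2 for j in range(1, M + 1))   # S has M rows
    fwd = sum((wf[i + 1] - wf[i]) ** 2 for i in range(1, M))        # rows of S_{-1}
    bwd = sum((wf[i] - wf[i - 1]) ** 2 for i in range(1, M))        # rows of S_1
    return fwd, bwd, full
```

For \(\mathbf{w}=(1,-2,0.5)^{T}\) this gives \((15.5,\,16.25,\,16.5)\): the full form exceeds the first by \(w_{1}^{2}=1\) and the second by \(w_{M-1}^{2}=0.25\).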

Lemma 4.4

There exists a positive constant \(h_{3}^{*}\), independent of h, such that for all \(h\le h_{3}^{*}\),

$$\begin{aligned} (H\mathbf{w})^{T}H\mathbf{w}\le \frac{27}{8} \mathbf{w}^{T}{} \mathbf{w}, \qquad \mathbf{w}\in \mathbb {R}^{M-1}. \end{aligned}$$

Proof

We have from Lemma 3.2 that there exists a positive constant \(h_{3}^{*}\), independent of h, such that for all \(h\le h_{3}^{*}\),

$$\begin{aligned} (E_{i}^{(-1)})^{2}\le \frac{1}{16}, \qquad (E_{i}^{(0)})^{2}\le 1, \qquad (E_{i}^{(1)})^{2}\le \frac{1}{16}, \qquad 1\le i\le M-1. \end{aligned}$$

Let \(w_{i}\) \((1\le i\le M-1)\) denote the ith-component of \(\mathbf{w}\) and let \(w_{0}=w_{M}=0\). Then we have

$$\begin{aligned} (H\mathbf{w})^{T}H\mathbf{w}= & {} \sum _{i=1}^{M-1} \left( E_{i}^{(-1)}w_{i-1}+E_{i}^{(0)}w_{i}+E_{i}^{(1)}w_{i+1}\right) ^{2}\\\le & {} 3\sum _{i=1}^{M-1} \left( (E_{i}^{(-1)})^{2}(w_{i-1})^{2}+(E_{i}^{(0)})^{2}(w_{i})^{2}+(E_{i}^{(1)})^{2}(w_{i+1})^{2} \right) \\\le & {} \frac{27}{8} \sum _{i=1}^{M-1} (w_{i})^{2}=\frac{27}{8} \mathbf{w}^{T}{} \mathbf{w}. \end{aligned}$$

This completes the proof. \(\square \)
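For readers checking the constant, the two inequalities in the proof of Lemma 4.4 can be spelled out. The first uses \((a+b+c)^{2}\le 3(a^{2}+b^{2}+c^{2})\). The second uses the bounds on \((E_{i}^{(l)})^{2}\) together with an index shift, which is harmless because \(w_{0}=w_{M}=0\):

$$\begin{aligned} \sum _{i=1}^{M-1}(w_{i\pm 1})^{2}\le \sum _{i=1}^{M-1}(w_{i})^{2}, \qquad 3\left( \frac{1}{16}+1+\frac{1}{16}\right) =\frac{27}{8}. \end{aligned}$$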

4.2 Analysis of Stability and Convergence

Now we turn to the analysis of stability and convergence for the scheme (3.11), based on its matrix form (4.5). We first introduce the following lemma.

Lemma 4.5

Let \(\varpi _{r,k}^{(\alpha )}\) be defined by (2.4), where \(0<\alpha <1\) and \(2\le r\le 6\). If \(r=4\), we assume \(0<\alpha \le \alpha ^{*}\), where

$$\begin{aligned} \alpha ^{*}=\frac{\pi }{\pi -\arccos \left( \frac{1}{5}\right) +2\arctan \left( \frac{191\sqrt{6}}{317} \right) }\approx 0.8439. \end{aligned}$$

Then for any nonnegative integer n and \(w^{m}\in \mathcal{S}_{h}\) \((0\le m\le n)\), it holds that

$$\begin{aligned} \sum _{m=0}^{n} \sum _{k=0}^{m}\varpi _{r,k}^{(\alpha )} \left( \mathbf{w}^{m}\right) ^{T} \mathbf{w}^{m-k}\ge 0, \qquad 2\le r\le 6. \end{aligned}$$
(4.8)

Proof

Denote by \(w_{i}^{m}\) the ith-component of \(\mathbf{w}^{m}\). It is sufficient to prove that for each \(1\le i\le M-1\),

$$\begin{aligned} \sum _{m=0}^{n} \left( \sum _{k=0}^{m}\varpi _{r,k}^{(\alpha )} w_{i}^{m-k} \right) w_{i}^{m}\ge 0, \qquad 2\le r\le 6. \end{aligned}$$
(4.9)

Let

$$\begin{aligned} \varpi _{r}^{(\alpha )}=\frac{1}{2} \left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} 2\varpi _{r,0}^{(\alpha )}&{} \varpi _{r,1}^{(\alpha )}&{}\varpi _{r,2}^{(\alpha )}&{} \cdots \cdots &{} \varpi _{r,n-1}^{(\alpha )}&{}\varpi _{r,n}^{(\alpha )}\\ \varpi _{r,1}^{(\alpha )}&{} 2\varpi _{r,0}^{(\alpha )} &{}\varpi _{r,1}^{(\alpha )}&{} \cdots \cdots ~&{} ~\varpi _{r,n-2}^{(\alpha )}~&{}~\varpi _{r,n-1}^{(\alpha )}\\ &{}&{}&{}\cdots \cdots &{}&{}\\ \varpi _{r,n-1}^{(\alpha )}~&{}~ \varpi _{r,n-2}^{(\alpha )}~ &{}~\varpi _{r,n-3}^{(\alpha )}~&{}~ \cdots \cdots &{} 2\varpi _{r,0}^{(\alpha )}&{}\varpi _{r,1}^{(\alpha )}\\ \varpi _{r,n}^{(\alpha )}&{} \varpi _{r,n-1}^{(\alpha )} &{}\varpi _{r,n-2}^{(\alpha )}&{} \cdots \cdots &{} \varpi _{r,1}^{(\alpha )}&{}2\varpi _{r,0}^{(\alpha )} \end{array} \right] . \end{aligned}$$

One can easily check that the validity of (4.9) is equivalent to the positive semi-definiteness of the above symmetric Toeplitz matrix \(\varpi _{r}^{(\alpha )}\). By the Grenander–Szegő theorem (see [54]), the matrix \(\varpi _{r}^{(\alpha )}\) is positive semi-definite if its generating function \(f_{r}(\alpha ,\theta )\) is nonnegative for all \(\theta \in [-\pi ,\pi ]\), where

$$\begin{aligned} f_{r}(\alpha ,\theta )=\varpi _{r,0}^{(\alpha )}+\frac{1}{2} \sum _{k=1}^{\infty }\varpi _{r,k}^{(\alpha )}\left( \mathrm{e}^{\mathbf{i}k\theta }+\mathrm{e}^{-\mathbf{i}k\theta }\right) =\sum _{k=0}^{\infty }\varpi _{r,k}^{(\alpha )}\cos (k\theta ). \end{aligned}$$

The nonnegativity of \(f_{r}(\alpha ,\theta )\) follows from Theorems 2.1 and 2.2 in [38]. The proof is completed. \(\square \)
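The scalar inequality (4.9) can be probed numerically. Since the definition (2.4) of \(\varpi _{r,k}^{(\alpha )}\) is not restated in this section, the sketch below assumes, purely for illustration, weights generated by \((1-z)^{\alpha }\) (Grünwald type) and by the BDF2-type function \((\frac{3}{2}-2z+\frac{1}{2}z^{2})^{\alpha }\). Whether the latter coincides with the paper's case \(r=2\) is an assumption, not something confirmed here. The function names are hypothetical.

```python
import random

def power_series_weights(a, alpha, n):
    """Coefficients c_0..c_n of (a[0] + a[1] z + ...)^alpha via the usual recurrence."""
    c = [a[0] ** alpha]
    for k in range(1, n + 1):
        s = sum((j * (alpha + 1.0) - k) * a[j] * c[k - j]
                for j in range(1, min(k, len(a) - 1) + 1))
        c.append(s / (k * a[0]))
    return c

def quadratic_form(weights, w):
    """sum_{m=0}^{n} sum_{k=0}^{m} weights[k] * w[m] * w[m-k], cf. (4.9)."""
    return sum(weights[k] * w[m] * w[m - k]
               for m in range(len(w)) for k in range(m + 1))

def min_form(a, alpha, n=30, trials=20, seed=0):
    """Smallest value of the quadratic form over a few random sequences."""
    rng = random.Random(seed)
    weights = power_series_weights(a, alpha, n)
    return min(quadratic_form(weights, [rng.uniform(-1.0, 1.0) for _ in range(n + 1)])
               for _ in range(trials))
```

For instance, `min_form([1.5, -2.0, 0.5], 0.7)` samples the form for the assumed BDF2-type weights. A nonnegative minimum is consistent with Lemma 4.5 but is of course no substitute for the proof.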

For any \(w\in \mathcal{S}_{h}\), we define its \(L^{2}\) norm \(\Vert w \Vert \), \(L^{\infty }\) norm \(\Vert w \Vert _{\infty }\) and \(H^{1}\) norm \(\Vert w \Vert _{1}\) by

$$\begin{aligned} \Vert w\Vert =\left( h\sum \limits _{i=1}^{M-1} (w_{i})^{2}\right) ^{\frac{1}{2}}, \qquad \Vert w\Vert _{\infty }=\max _{0\le i\le M} |w_{i}|, \qquad \Vert w\Vert _{1}=\left( \Vert w\Vert ^{2} +|w|_{1}^{2} \right) ^{\frac{1}{2}}, \end{aligned}$$

where \(|w|_{1}^{2}=\frac{1}{h}\sum _{i=1}^{M} (w_{i}-w_{i-1})^{2}\). It is known from [55] (pages 111 and 112) that for any \(w\in \mathcal{S}_{h}\),

$$\begin{aligned} \Vert w\Vert ^{2}\le \frac{L^{2}}{8} |w|_{1}^{2}, \qquad \Vert w\Vert _{\infty }^{2}\le \frac{L}{4} |w|_{1}^{2}. \end{aligned}$$
(4.10)
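The discrete norms and the inequalities (4.10) can be evaluated for any grid function that vanishes at both end points. A minimal sketch (the function name is hypothetical):

```python
def norms(w_full, L):
    """Return (||w||^2, ||w||_inf^2, |w|_1^2) for w_full = (w_0, ..., w_M), w_0 = w_M = 0."""
    M = len(w_full) - 1
    h = L / M
    l2_sq = h * sum(v * v for v in w_full[1:M])
    sup_sq = max(abs(v) for v in w_full) ** 2
    semi_sq = sum((w_full[i] - w_full[i - 1]) ** 2 for i in range(1, M + 1)) / h
    return l2_sq, sup_sq, semi_sq
```

With \(M=2\), \(L=1\) and \(w=(0,1,0)\) the function returns \((0.5,\,1,\,4)\), so both inequalities in (4.10) hold with equality in this example.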

Also we have that for all \(w\in \mathcal{S}_{h}\), \(\Vert w\Vert ^{2}=h\mathbf{w}^{T}{} \mathbf{w}\) and \(h|w|_{1}^{2}=(S\mathbf{w})^{T}S\mathbf{w}\) (see (4.7)), where \(\mathbf{w}=(w_{1}, w_{2}, \dots ,w_{M-1})^{T}\). The following theorem gives an a priori estimate for the compact difference scheme (3.11).

Theorem 4.1

Assume that the condition in Lemma 4.5 is satisfied and let \(h_{4}^{*}\) be a positive constant such that \(( \frac{c_{2}}{c_{0}}(1+h^{3})+36c_{1} ) h<\frac{3}{4}\) for all \(h\le h_{4}^{*}\), where \(c_{0}\), \(c_{1}\) and \(c_{2}\) are the constants in (1.4) and (4.6). Also let \(u^{n}=(u_{0}^{n}, u_{1}^{n}, \dots , u_{M}^{n})\) be the solution of the compact difference scheme (3.11) with the initial value \(u_{i}^{0}\) and \(u_{0}^{n}=u_{M}^{n}=0\) for all \(0\le n\le N\). Then when \(h\le \min \{h_{1}^{*}, h_{2}^{*}, h_{3}^{*}, h_{4}^{*}\}\), where \(h_{i}^{*}\) \((i=1,2,3)\) are the constants defined in Lemmas 3.3, 4.1 and 4.4, respectively, it holds that for \(2\le r\le 6\),

$$\begin{aligned} \tau \sum _{m=1}^{n}|u^{m}|_{1}^{2} \le \frac{81\varpi _{r,0}^{(\alpha )}}{8c_{0}c_{3}}\tau ^{1-\alpha }\Vert u^{0} \Vert ^{2}+ \frac{L^{2}+72c_{0}}{8c_{0}^{2}c_{3}}\tau \sum _{m=1}^{n} \Vert \mathcal{H}f^{m}\Vert ^{2},\qquad 1\le n\le N,\nonumber \\ \end{aligned}$$
(4.11)

where \(c_{3}= \frac{3}{4}- ( \frac{c_{2}}{c_{0}}(1+h^{3})+36c_{1} ) h>0 \).

Proof

Multiplying (4.5) by \(h(\mathbf{u}^{n})^{T}H^{T}\), we get

$$\begin{aligned} \tau ^{-\alpha } \sum _{k=0}^{n}\varpi _{r,k}^{(\alpha )} h (\mathbf{u}^{n})^{T}H^{T}BH\mathbf{u}^{n-k}= & {} -h (\mathbf{u}^{n})^{T}H^{T}BS^{T} \Lambda S\mathbf{u}^{n}+h (\mathbf{u}^{n})^{T}H^{T}B\mathbf{g}^{n}.\nonumber \\ \end{aligned}$$
(4.12)

Since \(H= Q+ \Lambda _{-1} S_{-1}-\Lambda _{1} S_{1}\) and \(B=Q^{-1}\), we have

$$\begin{aligned} -\,h (\mathbf{u}^{n})^{T}H^{T}BS^{T} \Lambda S\mathbf{u}^{n}= & {} -h \left( S\mathbf{u}^{n} \right) ^{T} \Lambda S\mathbf{u}^{n}+ h( \Lambda _{1} S_{1} \mathbf{u}^{n})^{T}BS^{T}\Lambda S \mathbf{u}^{n}\nonumber \\&-h( \Lambda _{-1} S_{-1} \mathbf{u}^{n})^{T}BS^{T}\Lambda S \mathbf{u}^{n}, \end{aligned}$$
(4.13)

and

$$\begin{aligned} h (\mathbf{u}^{n})^{T}H^{T}B\mathbf{g}^{n}=h(\mathbf{u}^{n})^{T}\mathbf{g}^{n}+h( \Lambda _{-1} S_{-1} \mathbf{u}^{n})^{T}B\mathbf{g}^{n}-h( \Lambda _{1} S_{1} \mathbf{u}^{n})^{T}B\mathbf{g}^{n}. \end{aligned}$$
(4.14)

We first estimate \(-h (\mathbf{u}^{n})^{T}H^{T}BS^{T} \Lambda S\mathbf{u}^{n}\). By the Cauchy–Schwarz inequality,

$$\begin{aligned}&h( \Lambda _{1} S_{1} \mathbf{u}^{n})^{T}BS^{T}\Lambda S \mathbf{u}^{n}= \left( h^{-1}\Lambda _{1} S_{1}{} \mathbf{u}^{n} \right) ^{T}\left( h^{2}BS^{T}\Lambda S \mathbf{u}^{n} \right) \nonumber \\&\quad \le \frac{h^{-2}}{2} \left( \Lambda _{1} S_{1}{} \mathbf{u}^{n} \right) ^{T}\left( \Lambda _{1} S_{1}{} \mathbf{u}^{n} \right) + \frac{h^{4}}{2}\left( BS^{T}\Lambda S \mathbf{u}^{n} \right) ^{T} \left( BS^{T}\Lambda S \mathbf{u}^{n} \right) \nonumber \\&\quad = \frac{h^{-2}}{2} (S_{1}{} \mathbf{u}^{n})^{T} \Lambda _{1}^{2} S_{1}{} \mathbf{u}^{n}+\frac{h^{2}}{2}\left( \Lambda ^{\frac{1}{2}} S\mathbf{u}^{n} \right) ^{T} h^{2}\Lambda ^{\frac{1}{2}} S B^{2}S^{T} \Lambda ^{\frac{1}{2}} \left( \Lambda ^{\frac{1}{2}} S\mathbf{u}^{n} \right) . \end{aligned}$$
(4.15)

We have from Lemmas 4.2 and 4.3 that

$$\begin{aligned} h^{-2}(S_{1}{} \mathbf{u}^{n})^{T} \Lambda _{1}^{2} S_{1}{} \mathbf{u}^{n}\le c_{2} (S_{1}{} \mathbf{u}^{n})^{T} S_{1}{} \mathbf{u}^{n}\le c_{2} (S\mathbf{u}^{n})^{T} S\mathbf{u}^{n}\le \frac{c_{2}}{c_{0}}h^{2} (S\mathbf{u}^{n})^{T}\Lambda S\mathbf{u}^{n}.\nonumber \\ \end{aligned}$$
(4.16)

Using Lemmas 4.1 and 4.3, we obtain

$$\begin{aligned}&\lambda _\mathrm{max} \left( h^{2}\Lambda ^{\frac{1}{2}} S B^{2}S^{T} \Lambda ^{\frac{1}{2}} \right) \le \lambda _\mathrm{max} \left( S S^{T} \right) \lambda _\mathrm{max} \left( h^{2}\Lambda \right) \lambda _\mathrm{max} \left( B^{2}\right) \le 36c_{1}. \end{aligned}$$
(4.17)

This implies

$$\begin{aligned} \left( \Lambda ^{\frac{1}{2}} S\mathbf{u}^{n} \right) ^{T} h^{2}\Lambda ^{\frac{1}{2}} S B^{2}S^{T} \Lambda ^{\frac{1}{2}} \left( \Lambda ^{\frac{1}{2}} S\mathbf{u}^{n} \right) \le 36c_{1} (S\mathbf{u}^{n})^{T}\Lambda S\mathbf{u}^{n}. \end{aligned}$$
(4.18)
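The constant \(36c_{1}\) in (4.17) can be traced as follows; this is a short bookkeeping sketch based only on Lemmas 4.1 and 4.3. Since \(B=Q^{-1}\) is symmetric and \(\lambda _{\min }(Q)\ge \frac{1}{3}\), we have \(\lambda _{\max }(B^{2})=\lambda _{\min }(Q)^{-2}\le 9\), and therefore

$$\begin{aligned} \lambda _\mathrm{max} \left( S S^{T} \right) \lambda _\mathrm{max} \left( h^{2}\Lambda \right) \lambda _\mathrm{max} \left( B^{2}\right) \le 4\cdot c_{1}\cdot 9=36c_{1}. \end{aligned}$$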

Substituting (4.16) and (4.18) into (4.15), we get

$$\begin{aligned} h( \Lambda _{1} S_{1} \mathbf{u}^{n})^{T}BS^{T}\Lambda S \mathbf{u}^{n}\le \left( \frac{c_{2}}{2c_{0}}+18c_{1}\right) h^{2} (S\mathbf{u}^{n})^{T}\Lambda S\mathbf{u}^{n}. \end{aligned}$$
(4.19)

A similar argument gives

$$\begin{aligned} -\,h( \Lambda _{-1} S_{-1} \mathbf{u}^{n})^{T}BS^{T}\Lambda S \mathbf{u}^{n}\le \left( \frac{c_{2}}{2c_{0}}+18c_{1}\right) h^{2} (S\mathbf{u}^{n})^{T}\Lambda S\mathbf{u}^{n}. \end{aligned}$$
(4.20)

Hence we have from (4.19), (4.20) and (4.13) that

$$\begin{aligned} -\,h (\mathbf{u}^{n})^{T}H^{T}BS^{T} \Lambda S\mathbf{u}^{n}\le -\left( 1- \left( \frac{c_{2}}{c_{0}}+36c_{1}\right) h\right) h \left( S\mathbf{u}^{n} \right) ^{T} \Lambda S\mathbf{u}^{n}. \end{aligned}$$
(4.21)

We next estimate \(h (\mathbf{u}^{n})^{T}H^{T}B\mathbf{g}^{n}\). It follows from the Cauchy–Schwarz inequality, (4.10) and Lemma 4.3 that

$$\begin{aligned} h (\mathbf{u}^{n})^{T} \mathbf{g}^{n}\le & {} \frac{2c_{0}h}{L^{2}} (\mathbf{u}^{n})^{T}{} \mathbf{u}^{n}+\frac{L^{2}h}{8c_{0}}(\mathbf{g}^{n})^{T}{} \mathbf{g}^{n} \le \frac{c_{0}}{4h} \left( S\mathbf{u}^{n} \right) ^{T} \left( S\mathbf{u}^{n} \right) +\frac{L^{2}h}{8c_{0}}(\mathbf{g}^{n})^{T}{} \mathbf{g}^{n}\nonumber \\\le & {} \frac{h}{4} \left( S\mathbf{u}^{n} \right) ^{T} \Lambda \left( S\mathbf{u}^{n} \right) +\frac{L^{2}h}{8c_{0}} (\mathbf{g}^{n})^{T}\mathbf{g}^{n}. \end{aligned}$$
(4.22)
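The first inequality in (4.22) is Young's inequality \(ab\le \varepsilon a^{2}+\frac{1}{4\varepsilon }b^{2}\), applied componentwise with the particular choice \(\varepsilon =2c_{0}/L^{2}\). The second step combines (4.10) with the relations \(\Vert u^{n}\Vert ^{2}=h(\mathbf{u}^{n})^{T}\mathbf{u}^{n}\) and \(h|u^{n}|_{1}^{2}=(S\mathbf{u}^{n})^{T}S\mathbf{u}^{n}\), and the last step uses \(\lambda _{\min }(\Lambda )\ge c_{0}/h^{2}\). Spelled out:

$$\begin{aligned} \frac{1}{4\varepsilon }=\frac{L^{2}}{8c_{0}}, \qquad \frac{2c_{0}h}{L^{2}} (\mathbf{u}^{n})^{T}\mathbf{u}^{n}=\frac{2c_{0}}{L^{2}}\Vert u^{n}\Vert ^{2}\le \frac{c_{0}}{4}|u^{n}|_{1}^{2}=\frac{c_{0}}{4h} \left( S\mathbf{u}^{n} \right) ^{T} S\mathbf{u}^{n}\le \frac{h}{4} \left( S\mathbf{u}^{n} \right) ^{T} \Lambda S\mathbf{u}^{n}. \end{aligned}$$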

Also by the Cauchy–Schwarz inequality and Lemmas 4.1–4.3,

$$\begin{aligned} -\,h( \Lambda _{1} S_{1} \mathbf{u}^{n})^{T}B\mathbf{g}^{n}\le & {} \frac{h}{2} ( S_{1} \mathbf{u}^{n})^{T} \Lambda _{1}^{2}S_{1}{} \mathbf{u}^{n}+\frac{h}{2} (\mathbf{g}^{n})^{T} B^{2} \mathbf{g}^{n}\nonumber \\\le & {} \frac{c_{2}h^{5}}{2c_{0}} \left( S\mathbf{u}^{n}\right) ^{T} \Lambda \left( S \mathbf{u}^{n}\right) +\frac{9h}{2} (\mathbf{g}^{n})^{T} \mathbf{g}^{n}. \end{aligned}$$
(4.23)

Similarly,

$$\begin{aligned} h( \Lambda _{-1} S_{-1} \mathbf{u}^{n})^{T}B\mathbf{g}^{n}\le \frac{c_{2}h^{5}}{2c_{0}} \left( S\mathbf{u}^{n}\right) ^{T} \Lambda \left( S \mathbf{u}^{n}\right) +\frac{9h}{2} (\mathbf{g}^{n})^{T} \mathbf{g}^{n}. \end{aligned}$$
(4.24)

Therefore, it follows from (4.14) and (4.22)–(4.24) that

$$\begin{aligned} h (\mathbf{u}^{n})^{T}H^{T}B\mathbf{g}^{n}\le \left( \frac{1}{4}+ \frac{c_{2}h^{4}}{c_{0}}\right) h \left( S\mathbf{u}^{n} \right) ^{T} \Lambda \left( S\mathbf{u}^{n} \right) +\frac{(L^{2}+72c_{0})h}{8c_{0}} (\mathbf{g}^{n})^{T}\mathbf{g}^{n}. \end{aligned}$$
(4.25)

Substituting (4.21) and (4.25) into (4.12), we obtain

$$\begin{aligned}&\tau ^{-\alpha } \sum _{k=0}^{n}\varpi _{r,k}^{(\alpha )} h (\mathbf{u}^{n})^{T}H^{T}BH\mathbf{u}^{n-k}\le -c_{3}h \left( S\mathbf{u}^{n} \right) ^{T} \Lambda S\mathbf{u}^{n} +\frac{(L^{2}+72c_{0})h}{8c_{0}} (\mathbf{g}^{n})^{T}{} \mathbf{g}^{n},~~~~~~~\nonumber \\ \end{aligned}$$
(4.26)

where \(c_{3}= \frac{3}{4}- ( \frac{c_{2}}{c_{0}}(1+h^{3})+36c_{1} ) h >0 \) for all \(h\le h_{4}^{*}\). Moreover, by Lemma 4.3,

$$\begin{aligned} \tau ^{-\alpha } \sum _{k=0}^{n}\varpi _{r,k}^{(\alpha )} h (\mathbf{u}^{n})^{T}H^{T}BH\mathbf{u}^{n-k} \le -\frac{c_{0}c_{3}}{h} \left( S\mathbf{u}^{n} \right) ^{T}S\mathbf{u}^{n} +\frac{(L^{2}+72c_{0})h}{8c_{0}} (\mathbf{g}^{n})^{T}{} \mathbf{g}^{n}. \nonumber \\ \end{aligned}$$
(4.27)

Replacing n by m and then summing up for m from 1 to n on both sides of (4.27), we have

$$\begin{aligned}&\tau ^{-\alpha } \sum _{m=1}^{n}\sum _{k=0}^{m}\varpi _{r,k}^{(\alpha )} h (\mathbf{u}^{m})^{T}H^{T}BH\mathbf{u}^{m-k}\nonumber \\&\quad \le -\frac{c_{0}c_{3}}{h} \sum _{m=1}^{n} \left( S\mathbf{u}^{m} \right) ^{T}S\mathbf{u}^{m} + \frac{(L^{2}+72c_{0})h}{8c_{0}}\sum _{m=1}^{n} (\mathbf{g}^{m})^{T}\mathbf{g}^{m} \end{aligned}$$

or equivalently,

$$\begin{aligned}&\tau ^{-\alpha } \sum _{m=0}^{n}\sum _{k=0}^{m}\varpi _{r,k}^{(\alpha )} h (\mathbf{u}^{m})^{T}H^{T}BH\mathbf{u}^{m-k}\le -\frac{c_{0}c_{3}}{h} \sum _{m=1}^{n} \left( S\mathbf{u}^{m} \right) ^{T}S\mathbf{u}^{m}\nonumber \\&\quad +\tau ^{-\alpha } \varpi _{r,0}^{(\alpha )} h (\mathbf{u}^{0})^{T}H^{T}BH\mathbf{u}^{0}+ \frac{(L^{2}+72c_{0})h}{8c_{0}}\sum _{m=1}^{n} (\mathbf{g}^{m})^{T}\mathbf{g}^{m}. \end{aligned}$$
(4.28)

Since the matrix B is symmetric and positive definite, there exists a symmetric and positive definite matrix \(B_{1}\) such that \(B=B_{1}^{T}B_{1}\). Let \(\mathbf{w}^{n}= B_{1}H\mathbf{u}^{n}\). Then by Lemma 4.5,

$$\begin{aligned} \tau ^{-\alpha } \sum _{m=0}^{n}\sum _{k=0}^{m}\varpi _{r,k}^{(\alpha )} h (\mathbf{u}^{m})^{T}H^{T}BH\mathbf{u}^{m-k}=\tau ^{-\alpha } \sum _{m=0}^{n}\sum _{k=0}^{m}\varpi _{r,k}^{(\alpha )} h (\mathbf{w}^{m})^{T}{} \mathbf{w}^{m-k}\ge 0. \nonumber \\ \end{aligned}$$
(4.29)

This together with (4.28) implies

$$\begin{aligned} \frac{1}{h}\sum _{m=1}^{n} \left( S\mathbf{u}^{m} \right) ^{T}S\mathbf{u}^{m} \le \tau ^{-\alpha } \frac{\varpi _{r,0}^{(\alpha )}}{c_{0}c_{3}} h (\mathbf{u}^{0})^{T}H^{T}BH\mathbf{u}^{0}+\frac{(L^{2}+72c_{0})h}{8c_{0}^{2}c_{3}}\sum _{m=1}^{n} (\mathbf{g}^{m})^{T}{} \mathbf{g}^{m}.\nonumber \\ \end{aligned}$$
(4.30)

By Lemmas 4.1 and 4.4, \(h(\mathbf{u}^{0})^{T}H^{T}BH\mathbf{u}^{0}\le 3h(\mathbf{u}^{0})^{T}H^{T}H\mathbf{u}^{0}\le \frac{81}{8}h(\mathbf{u}^{0})^{T}{} \mathbf{u}^{0}\). It is clear that

$$\begin{aligned} h(\mathbf{u}^{0})^{T}{} \mathbf{u}^{0}=\Vert u^{0} \Vert ^{2},\qquad \frac{1}{h}\left( S\mathbf{u}^{m} \right) ^{T}S\mathbf{u}^{m}=|u^{m}|_{1}^{2},\qquad h(\mathbf{g}^{m})^{T}{} \mathbf{g}^{m}= \Vert \mathcal{H}f^{m}\Vert ^{2}. \end{aligned}$$

Then the estimate (4.11) follows immediately from (4.30). \(\square \)

An argument similar to the proof of Lemma 4.4 shows that when \(h\le h_{3}^{*}\),

$$\begin{aligned} \Vert \mathcal{H}f^{n}\Vert ^{2}\le \frac{27}{8} \Vert f^{n} \Vert ^{2}+\frac{3}{16} h \left( (f_{0}^{n})^{2} +(f_{M}^{n})^{2} \right) . \end{aligned}$$

This observation and Theorem 4.1 imply that the compact difference scheme (3.11) is unconditionally stable with respect to the initial value \(u^{0}\) and the source term f; more precisely, it is stable without any restriction on the time step \(\tau \) in terms of the spatial step h.

We now consider the convergence of the compact difference scheme (3.11). Let \(e_{i}^{n}=U_{i}^{n}-u_{i}^{n}\). From (3.9) and (3.11), we get the following error equation:

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} \tau ^{-\alpha } \displaystyle \sum _{k=0}^{n}\varpi _{r,k}^{(\alpha )} \mathcal{H}e_{i}^{n-k}=\mathcal{Q}e_{i}^{n}+(R_{xt})_{i}^{n},&{} 1\le i\le M-1, \quad 1\le n\le N,\\ e_{0}^{n}=e_{M}^{n}=0, &{} 1\le n\le N,\\ e_{i}^{0}=0, &{} 0\le i\le M. \end{array}\right. \end{aligned}$$
(4.31)

Based on this error equation, we have the following convergence result.

Theorem 4.2

Let \(U_{i}^{n}\) denote the value of the solution \(u(x,t)\) of the problem (1.1) at the mesh point \((x_{i},t_{n})\) and let \(u^{n}=(u_{0}^{n}, u_{1}^{n}, \dots , u_{M}^{n})\) be the solution of the compact difference scheme (3.11). Assume that the conditions in Theorems 3.1 and 4.1 are satisfied. Then we have that for \(2\le r\le 6\),

$$\begin{aligned}&\left( \tau \sum _{m=1}^{n}\left\| U^{m}-u^{m}\right\| _{1}^{2} \right) ^{\frac{1}{2}}\le C_{1}\left( L^{2}+72c_{0}\right) ^{\frac{1}{2}}\left( \tau ^{r}+h^{4}\right) ,\qquad 1\le n\le N,\end{aligned}$$
(4.32)
$$\begin{aligned}&\left( \tau \sum _{m=1}^{n} \left\| U^{m}-u^{m}\right\| ^{2}\right) ^{\frac{1}{2}}\le C_{2}\left( L^{2}+72c_{0}\right) ^{\frac{1}{2}}\left( \tau ^{r}+h^{4}\right) ,\qquad 1\le n\le N, \end{aligned}$$
(4.33)
$$\begin{aligned}&\left( \tau \sum _{m=1}^{n} \left\| U^{m}-u^{m}\right\| _{\infty }^{2}\right) ^{\frac{1}{2}}\le C_{3}\left( L^{2}+72c_{0}\right) ^{\frac{1}{2}}\left( \tau ^{r}+h^{4}\right) ,\qquad 1\le n\le N, \end{aligned}$$
(4.34)

where

$$\begin{aligned} C_{1}=\frac{C^{*}}{8c_{0}}\left( \frac{(L^{2}+8)LT}{c_{3}}\right) ^{\frac{1}{2}}, \qquad C_{2}=\frac{C^{*}}{8c_{0}}\left( \frac{L^{3}T}{c_{3}}\right) ^{\frac{1}{2}}, \qquad C_{3}=\frac{C^{*}L}{4c_{0}}\left( \frac{T}{2c_{3}}\right) ^{\frac{1}{2}}. \end{aligned}$$

Proof

It follows from (4.31) and Theorem 4.1 that

$$\begin{aligned} \tau \sum _{m=1}^{n} \left| e^{m}\right| _{1}^{2}\le \frac{L^{2}+72c_{0}}{8c_{0}^{2}c_{3}}\tau \sum _{m=1}^{n} \Vert (R_{xt})^{m}\Vert ^{2}. \end{aligned}$$

Applying Theorem 3.1, we get the estimate

$$\begin{aligned} \tau \sum _{m=1}^{n} \left| e^{m}\right| _{1}^{2}\le \frac{LT{C^{*}}^{2}}{8c_{0}^{2}c_{3}}\left( L^{2}+72c_{0}\right) \left( \tau ^{r}+h^{4}\right) ^{2}. \end{aligned}$$
(4.35)

Finally, the estimates in (4.32)–(4.34) follow from (4.35) and (4.10). \(\square \)
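For completeness, the constants of Theorem 4.2 can be recovered from (4.35) and (4.10) as sketched below. Here K denotes the right-hand side of (4.35), a shorthand introduced only for this remark, and \(e^{m}=U^{m}-u^{m}\):

$$\begin{aligned} \tau \sum _{m=1}^{n} \Vert e^{m}\Vert ^{2}\le \frac{L^{2}}{8}K, \qquad \tau \sum _{m=1}^{n} \Vert e^{m}\Vert _{\infty }^{2}\le \frac{L}{4}K, \qquad \tau \sum _{m=1}^{n} \Vert e^{m}\Vert _{1}^{2}\le \frac{L^{2}+8}{8}K, \qquad K=\frac{LT{C^{*}}^{2}}{8c_{0}^{2}c_{3}}\left( L^{2}+72c_{0}\right) \left( \tau ^{r}+h^{4}\right) ^{2}. \end{aligned}$$

Taking square roots yields \(C_{2}\), \(C_{3}\) and \(C_{1}\), respectively. For example, \(\left( \frac{L}{4}\cdot \frac{LT}{8c_{0}^{2}c_{3}}\right) ^{\frac{1}{2}}C^{*}=\frac{C^{*}L}{4c_{0}}\left( \frac{T}{2c_{3}}\right) ^{\frac{1}{2}}=C_{3}\).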

Theorem 4.2 shows that the compact difference scheme (3.11) converges with the convergence order \(\mathcal{O}(\tau ^{r}+h^{4})\), regardless of the order \(\alpha \) of the fractional derivative.

5 Further Discussions

5.1 On Zero-Derivatives Condition

Here we give a discussion about the condition used in Theorems 3.1, 4.1 and 4.2, i.e., that \(u(\cdot ,t)\in C[0,T]\cap \mathscr {C}^{r+\alpha }[0,T]\) with \(\partial _{t}u(\cdot ,t)\in L^{1}[0,T]\). To do this, we first introduce the following proposition, the proof of which will be given in “Appendix”.

Proposition 5.1

Assume \(y(t)\in C^{r+1}[0,T]\) with \(y^{(r+2)}(t)\in L^{1}[0,T]\) for a nonnegative integer r. Then for \(\alpha \in (0,1)\),

$$\begin{aligned} y(t)\in \mathscr {C}^{r+\alpha }[0,T]\Longleftrightarrow y^{(k)}(0)=0~\mathrm{for}~k=0,1,\dots ,r. \end{aligned}$$
(5.1)

In Theorems 3.1, 4.1 and 4.2, we have assumed that \(u(\cdot ,t)\in C[0,T]\cap \mathscr {C}^{r+\alpha }[0,T]\) with \(\partial _{t}u(\cdot ,t)\in L^{1}[0,T]\). As in the known treatments of the Grünwald or Lubich difference approximations and their modifications ([30,31,32, 34, 36, 37]), this assumption ensures that the approximation (2.13) (with y(t) replaced by \(u(\cdot ,t)\)) holds uniformly for all \(t\in [0,T]\) as \(\tau \rightarrow 0\), and thus ensures the rth-order temporal accuracy of the compact difference scheme (3.11). With the help of Proposition 5.1, one sees that \(u(\cdot ,t)\in \mathscr {C}^{r+\alpha }[0,T]\) is equivalent to \(\partial _{t}^{k} u(\cdot ,0)=0~(k=0,1,\dots ,r)\) if \(u(\cdot ,t)\) is smooth enough. Generally, the condition requiring that the analytical solution of (1.1) and its first several derivatives with respect to t vanish at \(t=0\) is essential for the high-order accuracy of the scheme (3.11). We refer to it as the “zero-derivatives condition”. In order to preserve the high-order accuracy of the scheme (3.11) for the problem (1.1) without the above zero-derivatives condition, we provide two basic techniques as follows:

(1) Using a suitable transformation. The main idea of this technique is to consider the problem (1.1) for

$$\begin{aligned} v(x,t)=u(x,t)-\sum _{p=0}^{r}\frac{\partial _{t}^{p}u(x,0)}{p!}t^{p} \end{aligned}$$
(5.2)

instead, where the coefficients \(\partial _{t}^{p}u(x,0)\) \((p=1,2,\dots ,r)\) are specified later. This technique transforms the problem (1.1) into an equivalent problem which has the same form and satisfies the zero-derivatives condition (see, e.g., [56]). Hence, the compact difference scheme (3.11) preserves its high-order accuracy. Moreover, the stability and convergence analysis under the zero-derivatives condition remains valid. In the next section, we shall use a numerical example to show the effectiveness of this technique. The coefficients \(\partial _{t}^{p}u(x,0)\) \((p=1,2,\dots ,r)\) in (5.2) can be computed from the known function \(f(x,t)\) as given in the following propositions, the proofs of which are left to the “Appendix”.

Proposition 5.2

Let \(u(x,t)\) be the solution of the problem (1.1). Assume that both \(u(x,t)\) and \(\mathcal{L}u(x,t)\) are in \(C^{0,1}([0,L]\times [0,T])\). Then

$$\begin{aligned} \partial _{t}u(x,0)={_{~0}^{C}}\mathcal{D}_{t}^{1-\alpha }f(x,0),\qquad x\in [0,L]. \end{aligned}$$
(5.3)

Proposition 5.3

Let \(u(x,t)\) be the solution of the problem (1.1). Assume that both \(u(x,t)\) and \(\mathcal{L}u(x,t)\) are in \(C^{0,r}([0,L]\times [0,T])\) for a positive integer \(r\ge 2\). Define

$$\begin{aligned}&F_{p}(x,t)=\sum _{l=1}^{p} \frac{t^{l-\alpha -p}}{\Gamma (l-\alpha -p+1)} \partial _{t}^{l} u(x,0)-\partial _{t}^{p}f(x,t)~(t\not =0),\quad p=1,2,\dots ,r-1, \\&G_{p}(x,t)=\sum _{l=1}^{p-1} \frac{t^{l+\alpha -p}}{\Gamma (l+\alpha -p+1)} \partial _{t}^{l} (\mathcal{L}u)(x,0)+\partial _{t}^{p-1}({_{~0}^{C}}\mathcal{D}_{t}^{1-\alpha }f)(x,t)~(t\not =0),\quad p=2,3,\dots ,r. \end{aligned}$$

Then for \(x\in [0,L]\), the limits \(\lim _{t\rightarrow 0} F_{p}(x,t)\) and \(\lim _{t\rightarrow 0} G_{p}(x,t)\) exist and

$$\begin{aligned}&\partial _{t}^{p}(\mathcal{L}u)(x,0)= \lim _{t\rightarrow 0} F_{p}(x,t), \qquad p=1,2,\dots ,r-1, \end{aligned}$$
(5.4)
$$\begin{aligned}&\partial _{t}^{p}u(x,0)= \lim _{t\rightarrow 0} G_{p}(x,t), \qquad p=2,3,\dots ,r. \end{aligned}$$
(5.5)

(2) Adding suitable correction terms. For the Caputo fractional derivative \({_{~0}^{C}}\mathcal{D}_{t}^{\alpha }u(x,t)\), we can rewrite it as

$$\begin{aligned} {_{~0}^{C}}\mathcal{D}_{t}^{\alpha }u(x,t)={_{~0}^{C}}\mathcal{D}_{t}^{\alpha }v(x,t)+\sum _{p=1}^{r}\frac{ \partial _{t}^{p}u(x,0)}{\Gamma (p+1-\alpha )}t^{p-\alpha }, \end{aligned}$$

where \(v(x,t)\) is defined by (5.2). By construction, \(v(x,t)\) satisfies the zero-derivatives condition, and so by Theorem 2.1,

$$\begin{aligned} {_{~0}^{C}\mathcal{D}_{t}^{\alpha }}U_{i}^{n}= & {} \tau ^{-\alpha } \sum _{k=0}^{n}\varpi _{r,k}^{(\alpha )} \left( U_{i}^{n-k}-\sum _{p=0}^{r} \frac{\partial _{t}^{p}u(x_{i},0)}{p!} t_{n-k}^{p} \right) \nonumber \\&+\sum _{p=1}^{r}\frac{ \partial _{t}^{p}u(x_{i},0)}{\Gamma (p+1-\alpha )}t_{n}^{p-\alpha }+\mathcal{O}(\tau ^{r}).\nonumber \\ \end{aligned}$$
(5.6)

To preserve the rth-order accuracy, we approximate \(\partial _{t}^{p}u(x_{i},0)\) by the following rth-order difference formula (see [57], page 83):

$$\begin{aligned} \partial _{t}^{p}u(x_{i},0)=\frac{1}{\tau ^{p}} \sum _{q=0}^{p+r-1}b_{p,q}^{(r)}u(x_{i},t_{q})+\mathcal{O}(\tau ^{r}), \end{aligned}$$
(5.7)

where the coefficients \(b_{p,q}^{(r)}\) for \(p=1,2,3,4\) and \(r=3,4\) are given in Table 3. Thus, the resulting discretization of (1.1) can be written as

$$\begin{aligned} \left\{ \begin{array}{l} \tau ^{-\alpha } \displaystyle \sum _{k=0}^{n}\varpi _{r,k}^{(\alpha )} \left( \mathcal{H}u_{i}^{n-k}-\mathcal{H}u_{i}^{0}-\sum _{p=1}^{r} \frac{t_{n-k}^{p}}{p!} \left( \frac{1}{\tau ^{p}} \sum _{q=0}^{p+r-1}b_{p,q}^{(r)}\mathcal{H}u_{i}^{q}\right) \right) \\ \qquad +\displaystyle \sum _{p=1}^{r}\frac{t_{n}^{p-\alpha }}{\Gamma (p+1-\alpha )}\left( \frac{1}{\tau ^{p}} \sum _{q=0}^{p+r-1}b_{p,q}^{(r)}\mathcal{H}u_{i}^{q}\right) =\mathcal{Q}u_{i}^{n}+\mathcal{H}f_{i}^{n},\\ \qquad \qquad 1\le i\le M-1, ~~1\le n\le N,\\ u_{0}^{n}=\phi _{0}^{n},\quad u_{M}^{n}=\phi _{L}^{n}, \qquad 1\le n\le N,\\ u_{i}^{0}=0, \qquad 0\le i\le M. \end{array}\right. \end{aligned}$$
(5.8)

This is a modification to the scheme (3.11) by adding some correction terms related to the starting values \(u_{i}^{n}\) for \(n=0,1,\dots ,2r-1\) (and so it is required that \(\tau \le T/(2r-1)\)). Note that the starting values \(u_{i}^{n}\) for \(n=1,2,\dots ,2r-1\) are coupled and have to be solved simultaneously. The numerical results in the next section will show that the above scheme truly has high-order accuracy for the problem (1.1) without the zero-derivatives condition.
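The coefficients in (5.7) can be generated rather than tabulated. The following is a minimal sketch, assuming that (5.7) is the standard one-sided difference formula on the nodes \(t_{0},\dots ,t_{p+r-1}\): requiring \(\mathcal{O}(\tau ^{r})\) accuracy with \(p+r\) nodes gives the moment conditions \(\sum _{q}b_{p,q}^{(r)}q^{m}=p!\,\delta _{m,p}\) for \(m=0,1,\dots ,p+r-1\), which determine the weights uniquely. The function name `one_sided_weights` is ours, and the computed values should be compared with Table 3 before use.

```python
from fractions import Fraction
from math import factorial


def one_sided_weights(p, r):
    """Weights b[q], q = 0..p+r-1, such that
    d^p u/dt^p (0) = tau**(-p) * sum_q b[q] * u(q*tau) + O(tau**r)."""
    n = p + r  # number of nodes t_0, ..., t_{p+r-1}
    # Augmented system of moment conditions:
    #   sum_q b[q] * q**m = p! if m == p else 0,   m = 0..n-1.
    rows = [[Fraction(q) ** m for q in range(n)]
            + [Fraction(factorial(p) if m == p else 0)]
            for m in range(n)]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        piv = next(i for i in range(col, n) if rows[i][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        pivot = rows[col][col]
        rows[col] = [v / pivot for v in rows[col]]
        for i in range(n):
            if i != col and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[col])]
    return [rows[q][n] for q in range(n)]


# First derivative, third order: the familiar forward-difference weights.
print(one_sided_weights(1, 3))
```

Exact rational arithmetic is used because the Vandermonde-type system becomes ill-conditioned in floating point as \(p+r\) grows.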

Table 3 The coefficients \(b_{p,q}^{(r)}\) for the approximation (5.7)
Table 4 The errors and the temporal convergence orders of the computed solution \(u_{i}^{n}\) by the scheme (3.11) for Example 6.1 with \(\alpha =1/4\) \((h=1/3500)\)
Table 5 The errors and the temporal convergence orders of the computed solution \(u_{i}^{n}\) by the scheme (3.11) for Example 6.1 with \(\alpha =1/2\) \((h=1/3500)\)
Table 6 The errors and the temporal convergence orders of the computed solution \(u_{i}^{n}\) by the scheme (3.11) for Example 6.1 with \(\alpha =3/4\) \((h=1/3500)\)
Table 7 The errors and the spatial convergence orders of the computed solution \(u_{i}^{n}\) by the scheme (3.11) for Example 6.1 with \(\alpha =1/4\)
Table 8 The errors and the spatial convergence orders of the computed solution \(u_{i}^{n}\) by the scheme (3.11) for Example 6.1 with \(\alpha =1/2\)
Table 9 The errors and the spatial convergence orders of the computed solution \(u_{i}^{n}\) by the scheme (3.11) for Example 6.1 with \(\alpha =3/4\)

5.2 Further Spatial Approximations

In general, the integrals involved in the compact difference scheme (3.11) cannot be evaluated exactly. One way of overcoming this difficulty is to replace them by suitable numerical integrations. Let \(\varphi (x)=1/k(x)\) and assume \(\varphi (x)\in C^{6}[0,L]\). By the closed Newton-Cotes formula of degree 4 (see [58]),

$$\begin{aligned} \int _{x_{i-1}}^{x_{i}} \frac{1}{k(s)}\,\mathrm{d}s= & {} \frac{h}{90} \left( 7\varphi (x_{i-1})+32\varphi (x_{i-\frac{3}{4}})\right. \nonumber \\&\left. +12\varphi (x_{i-\frac{1}{2}})+32\varphi (x_{i-\frac{1}{4}})+7\varphi (x_{i})\right) +\mathcal{O}(h^{7}),\nonumber \\ \end{aligned}$$
(5.9)

where \(x_{i}=ih\) even if i is not an integer. This implies

$$\begin{aligned} J_{i}=\widetilde{J}_{i}+\mathcal{O}(h^{4}), \qquad 1\le i\le M, \end{aligned}$$
(5.10)

where

$$\begin{aligned} \widetilde{J}_{i}=\frac{1}{h^{2}}\frac{90}{7\varphi (x_{i-1})+32\varphi (x_{i-\frac{3}{4}})+12\varphi (x_{i-\frac{1}{2}})+32\varphi (x_{i-\frac{1}{4}})+7\varphi (x_{i})}. \end{aligned}$$
(5.11)
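The quadrature step (5.9)–(5.11) can be sketched in a few lines. The code below assumes, as (5.9)–(5.11) suggest, that \(J_{i}=\bigl (h\int _{x_{i-1}}^{x_{i}}\mathrm{d}s/k(s)\bigr )^{-1}\); the function names `boole` and `j_tilde` are ours. It uses \(k(x)=1+x^{2}\) from Example 6.1 only because the integral is then known in closed form (an arctangent), which allows a direct comparison.

```python
import math


def boole(phi, a, b):
    """Closed Newton-Cotes rule of degree 4 on [a, b], cf. (5.9)."""
    h = b - a
    nodes = [a + j * h / 4 for j in range(5)]
    weights = [7, 32, 12, 32, 7]
    return h / 90 * sum(w * phi(x) for w, x in zip(weights, nodes))


def j_tilde(k, i, h):
    """Approximation (5.11), assuming J_i = 1 / (h * int_{x_{i-1}}^{x_i} ds / k(s))."""
    return 1.0 / (h * boole(lambda s: 1.0 / k(s), (i - 1) * h, i * h))


def k(x):
    return 1.0 + x * x


h, i = 0.1, 3
exact_integral = math.atan(i * h) - math.atan((i - 1) * h)
j_exact = 1.0 / (h * exact_integral)
print(abs(boole(lambda s: 1.0 / k(s), (i - 1) * h, i * h) - exact_integral))
print(abs(j_tilde(k, i, h) - j_exact))
```

Both printed differences should be small, in line with the \(\mathcal{O}(h^{7})\) and \(\mathcal{O}(h^{4})\) bounds in (5.9) and (5.10).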

Exchanging the order of integration, we have

$$\begin{aligned} E_{i}^{(-1)}= & {} \int _{x_{i-1}}^{x_{i}}\phi _{1,i}(x)g_{1,i}(x) \mathrm{d}x+\int _{x_{i}}^{x_{i+1}}\phi _{1,i}(x)g_{2,i+1}(x)\mathrm{d}x\nonumber \\= & {} \frac{J_{i}}{h^{2}} \int _{x_{i-1}}^{x_{i}} \widetilde{\varphi }_{1,i} (s) \mathrm{d}s -\frac{J_{i+1}}{h^{2}} \int _{x_{i}}^{x_{i+1}} \widetilde{\varphi }_{1,i}(s) \mathrm{d}s, \end{aligned}$$
(5.12)

where \(\widetilde{\varphi }_{1,i}(s)=\varphi (s) \left( \frac{h}{4}(s-x_{i})^{2}-\frac{1}{6} (s-x_{i})^{3} \right) \). Then by the closed Newton-Cotes formula of degree 4,

$$\begin{aligned} E_{i}^{(-1)}= & {} \frac{J_{i}h^{2}}{90} \left( \frac{35}{12}\varphi (x_{i-1})+\frac{27}{4}\varphi (x_{i-\frac{3}{4}})+\varphi (x_{i-\frac{1}{2}})+\frac{7}{12}\varphi (x_{i-\frac{1}{4}}) \right) \nonumber \\&-\frac{J_{i+1}h^{2}}{90} \left( \frac{5}{12}\varphi (x_{i+\frac{1}{4}})+\frac{1}{2}\varphi (x_{i+\frac{1}{2}})+\frac{9}{4}\varphi (x_{i+\frac{3}{4}})+\frac{7}{12}\varphi (x_{i+1}) \right) \nonumber \\&+\frac{h^{5}}{1935360} \left( J_{i+1} \widetilde{\varphi }_{1,i}^{(6)}(\eta _{i})-J_{i} \widetilde{\varphi }_{1,i}^{(6)}(\xi _{i})\right) , \end{aligned}$$
(5.13)

where \(\xi _{i}\in (x_{i-1}, x_{i})\) and \(\eta _{i}\in (x_{i}, x_{i+1})\). A simple calculation shows

$$\begin{aligned}&h^{5}\left( J_{i+1} \widetilde{\varphi }_{1,i}^{(6)}(\eta _{i})-J_{i} \widetilde{\varphi }_{1,i}^{(6)}(\xi _{i})\right) =h^{5} \left( J_{i+1} \left( \widetilde{\varphi }_{1,i}^{(6)}(\eta _{i})-\widetilde{\varphi }_{1,i}^{(6)}(\xi _{i}) \right) +(J_{i+1}-J_{i}) \widetilde{\varphi }_{1,i}^{(6)}(\xi _{i}) \right) \nonumber \\&\quad = J_{i+1}h^{5} \left( \mathcal{O}(h)+20 \left( \varphi ^{(3)}(\xi _{i})-\varphi ^{(3)}(\eta _{i}) \right) \right) +\mathcal{O}(h^{4})=\mathcal{O}(h^{4}). \end{aligned}$$
(5.14)

Substituting (5.10) and (5.14) into (5.13) leads to

$$\begin{aligned} E_{i}^{(-1)}=\widetilde{E}_{i}^{(-1)}+\mathcal{O}(h^{4}), \qquad 1\le i\le M-1, \end{aligned}$$
(5.15)

where

$$\begin{aligned} \widetilde{E}_{i}^{(-1)}= & {} \frac{\widetilde{J}_{i}h^{2}}{1080} \left( 35\varphi (x_{i-1})+81\varphi (x_{i-\frac{3}{4}})+12\varphi (x_{i-\frac{1}{2}})+7\varphi (x_{i-\frac{1}{4}}) \right) \nonumber \\&-\frac{\widetilde{J}_{i+1}h^{2}}{1080} \left( 5\varphi (x_{i+\frac{1}{4}})+6\varphi (x_{i+\frac{1}{2}})+27\varphi (x_{i+\frac{3}{4}})+7\varphi (x_{i+1}) \right) . \end{aligned}$$
(5.16)

Similarly, we get

$$\begin{aligned} E_{i}^{(0)}=\widetilde{E}_{i}^{(0)}+\mathcal{O}(h^{4}), \qquad E_{i}^{(1)}=\widetilde{E}_{i}^{(1)}+\mathcal{O}(h^{4}), \qquad 1\le i\le M-1, \end{aligned}$$
(5.17)

where

$$\begin{aligned} \widetilde{E}_{i}^{(0)}= & {} \frac{\widetilde{J}_{i}h^{2}}{540} \left( 28\varphi (x_{i-1})+117\varphi (x_{i-\frac{3}{4}})+33\varphi (x_{i-\frac{1}{2}})+47\varphi (x_{i-\frac{1}{4}}) \right) \nonumber \\&+\frac{\widetilde{J}_{i+1}h^{2}}{540} \left( 47\varphi (x_{i+\frac{1}{4}})+33\varphi (x_{i+\frac{1}{2}})+117\varphi (x_{i+\frac{3}{4}})+28\varphi (x_{i+1}) \right) ,\end{aligned}$$
(5.18)
$$\begin{aligned} \widetilde{E}_{i}^{(1)}= & {} \frac{\widetilde{J}_{i+1}h^{2}}{1080} \left( 7\varphi (x_{i+\frac{1}{4}})+12\varphi (x_{i+\frac{1}{2}})+81\varphi (x_{i+\frac{3}{4}})+35\varphi (x_{i+1}) \right) \nonumber \\&-\frac{\widetilde{J}_{i}h^{2}}{1080} \left( 7\varphi (x_{i-1})+27\varphi (x_{i-\frac{3}{4}})+6\varphi (x_{i-\frac{1}{2}})+5\varphi (x_{i-\frac{1}{4}}) \right) . \end{aligned}$$
(5.19)

For any grid function \(w=\{w_{i}~|~0\le i\le M\}\), we define operators

$$\begin{aligned}&\widetilde{\mathcal{Q}}w_{i}=\widetilde{J}_{i} w_{i-1}-\left( \widetilde{J}_{i}+\widetilde{J}_{i+1} \right) w_{i}+\widetilde{J}_{i+1}w_{i+1},\nonumber \\&\widetilde{\mathcal{H}}w_{i}=\widetilde{E}_{i}^{(-1)}w_{i-1}+\widetilde{E}_{i}^{(0)}w_{i}+\widetilde{E}_{i}^{(1)}w_{i+1}, \qquad 1\le i\le M-1. \end{aligned}$$
(5.20)

Then the corresponding compact finite difference method is to find \(u_{i}^{n}\) such that

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} \tau ^{-\alpha } \displaystyle \sum _{k=0}^{n}\varpi _{r,k}^{(\alpha )} \widetilde{\mathcal{H}}u_{i}^{n-k}=\widetilde{\mathcal{Q}}u_{i}^{n}+\widetilde{\mathcal{H}}f_{i}^{n},&{} 1\le i\le M-1, 1\le n\le N,\\ u_{0}^{n}=\phi _{0}^{n},\quad u_{M}^{n}=\phi _{L}^{n}, &{} 1\le n\le N,\\ u_{i}^{0}=0, &{} 0\le i\le M. \end{array}\right. \end{aligned}$$
(5.21)
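The operators in (5.20) are three-point stencils acting on interior nodes, so applying them is straightforward once the coefficients are available. The following minimal sketch takes the coefficient arrays as given inputs; the function names and the list-based storage (index 0 unused, so that list indices match the subscripts in (5.20)) are our assumptions.

```python
def apply_q(J, w):
    """Q~ w_i for 1 <= i <= M-1, cf. (5.20); J[i] stores J~_i for 1 <= i <= M."""
    M = len(w) - 1
    return [J[i] * w[i - 1] - (J[i] + J[i + 1]) * w[i] + J[i + 1] * w[i + 1]
            for i in range(1, M)]


def apply_h(E_minus, E_zero, E_plus, w):
    """H~ w_i for 1 <= i <= M-1, cf. (5.20); E_*[i] store E~_i^(-1), E~_i^(0), E~_i^(1)."""
    M = len(w) - 1
    return [E_minus[i] * w[i - 1] + E_zero[i] * w[i] + E_plus[i] * w[i + 1]
            for i in range(1, M)]


# Illustrative data only (M = 4); these are not coefficients of an actual problem.
J = [None, 1.0, 2.0, 3.0, 4.0]
w = [0.0, 1.0, 2.0, 3.0, 4.0]
print(apply_q(J, w))           # [1.0, 1.0, 1.0]
print(apply_q(J, [5.0] * 5))   # [0.0, 0.0, 0.0]: constants are annihilated
```

The second call reflects the conservative structure of \(\widetilde{\mathcal{Q}}\): its row sums vanish, so it maps constant grid functions to zero.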

Based on the formulae (5.10), (5.11) and (5.15)–(5.19), we can establish results similar to those in Lemmas 3.1–3.3 and in Theorems 3.1 and 3.2 for the above scheme (5.21). Consequently, for the solution \(u_{i}^{n}\) of the above scheme (5.21), we have the numerical stability given in Theorem 4.1 and the error estimates (4.32)–(4.34) under the condition of Theorem 4.2. It should be pointed out that since the scheme (5.21) is a further approximation to the scheme (3.11), it is recommended to use it only when the integrals involved in the scheme (3.11) cannot be evaluated exactly.

6 Applications and Numerical Results

In this section, we apply the proposed compact finite difference methods to two model problems in the form (1.1). The exact analytical solution \(u(x,t)\) of each problem is explicitly known and is mainly used for comparison with the computed solution \(u_{i}^{n}\) of the compact difference scheme (3.11) or its modified schemes (5.8) and (5.21).

To demonstrate the accuracy of the computed solution \(u_{i}^{n}\), we compute its errors in the weighted \(H^{1}\), \(L^{\infty }\) and \(L^{2}\) norms:

$$\begin{aligned} \mathrm{E}_{r,\nu }(\tau ,h)=\displaystyle \left( \tau \sum _{n=1}^{N}\left\| U^{n}-u^{n}\right\| ^{2}_{\nu }\right) ^{\frac{1}{2}}(\nu =1,\infty ),\qquad \mathrm{E}_{r,2}(\tau ,h)=\displaystyle \left( \tau \sum _{n=1}^{N}\left\| U^{n}-u^{n}\right\| ^{2}\right) ^{\frac{1}{2}}, \end{aligned}$$

where \(U_{i}^{n}=u(x_{i},t_{n})\). The temporal and spatial convergence orders are computed, respectively, by the formulae

$$\begin{aligned} \displaystyle \mathrm{O}_{r,\nu }^\mathrm{t}(\tau ,h)=\log _{2}\left( \frac{\mathrm{E}_{r,\nu }(2\tau ,h)}{\mathrm{E}_{r,\nu }(\tau ,h)}\right) ,\qquad \displaystyle \mathrm{O}_{r,\nu }^\mathrm{s}(\tau ,h)=\log _{2}\left( \frac{\mathrm{E}_{r,\nu }(\tau ,2h)}{\mathrm{E}_{r,\nu }(\tau ,h)}\right) , \qquad \nu =1,2,\infty . \end{aligned}$$
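The order formulae above amount to taking base-2 logarithms of ratios of successive errors when the step is halved. A minimal sketch follows; the function name is ours and the error values are synthetic, generated from \(E(\tau )=0.5\,\tau ^{3}\) purely for illustration (they are not results from the tables).

```python
import math


def observed_orders(errors):
    """Orders log2(E(2s)/E(s)) for errors listed from the coarsest to the
    finest step, where each step is half of the previous one."""
    return [math.log2(coarse / fine) for coarse, fine in zip(errors, errors[1:])]


# Synthetic errors for tau = 1/10, 1/20, 1/40 (illustrative data only).
errors = [0.5 * (1.0 / n) ** 3 for n in (10, 20, 40)]
print(observed_orders(errors))  # each entry is close to 3
```

The same function applies to spatial orders by passing the errors for \(h, h/2, h/4,\dots \) at a fixed, sufficiently small \(\tau \).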

All computations are carried out by using a MATLAB routine on a computer with Xeon X5650 CPU and 96GB memory.

Table 10 The errors and the temporal convergence orders of the computed solution \(u_{i}^{n}\) by the scheme (5.21) with \(r=3\) for Example 6.2 \((h=1/4000)\)

Example 6.1

Consider the problem (1.1) in the domain \([0,1]\times [0,1]\) with \(k(x)=1+x^{2}\). The source term \(f(x,t)\) and the boundary functions \(\phi _{0}(t)\) and \(\phi _{L}(t)\) are chosen such that the problem has the solution \(u(x,t)=t^{r+\alpha }(2\mathrm{e}^{3}+\sin x)\) for any positive integer r.

Let \(r=3,4,5,6\). We use the compact difference scheme (3.11) with \(r=3,4,5,6\) to solve the above problem. The errors \(\mathrm{E}_{r,\nu }(\tau ,h)\) and the temporal convergence orders \(\mathrm{O}_{r,\nu }^\mathrm{t}(\tau ,h)\) \((\nu =1,2,\infty )\) of the computed solution \(u_{i}^{n}\) for \(h=1/3500\) and different time steps \(\tau \) are listed in Table 4 (\(\alpha =1/4\)), Table 5 (\(\alpha =1/2\)) and Table 6 (\(\alpha =3/4\)). As expected from our theoretical analysis, the computed solution \(u_{i}^{n}\) has the rth-order temporal accuracy.

Table 11 The errors and the temporal convergence orders of the computed solution \(u_{i}^{n}\) by the scheme (5.21) for the transformed problem of Example 6.2 with \(\alpha =1/2\) \((h=1/4000)\)
Table 12 The errors and the spatial convergence orders of the computed solution \(u_{i}^{n}\) by the scheme (5.21) for the transformed problem of Example 6.2 with \(\alpha =1/2\)

We next compute the spatial convergence order of the compact difference scheme (3.11). The errors \(\mathrm{E}_{r,\nu }(\tau ,h)\) and the spatial convergence orders \(\mathrm{O}_{r,\nu }^\mathrm{s}(\tau ,h)\) \((\nu =1,2,\infty )\) of the computed solution \(u_{i}^{n}\) for different spatial steps h are presented in Table 7 (\(\alpha =1/4\)), Table 8 (\(\alpha =1/2\)) and Table 9 (\(\alpha =3/4\)), where the time step is \(\tau =1/15000\) for \(r=3\), \(\tau =1/3000\) for \(r=4\) and \(\tau =1/800\) for \(r=5,6\). The data in these tables demonstrate that the computed solution \(u_{i}^{n}\) has fourth-order spatial accuracy, in agreement with the analysis.

Table 13 The errors and the temporal convergence orders of the computed solution \(u_{i}^{n}\) by the scheme (5.8) with \(r=4\) for Example 6.3 \((h=1/4000)\)
Table 14 The errors and the spatial convergence orders of the computed solution \(u_{i}^{n}\) by the scheme (5.8) with \(r=4\) for Example 6.3 \((\tau =1/4000)\)

Example 6.2

This example is mainly used to demonstrate the effectiveness of the technique using the substitution (5.2) and the high-order accuracy of the modified scheme (5.21). Consider the problem (1.1) in the domain \([0,1]\times [0,1]\) with \(k(x)=\mathrm{e}^{x^{2}}\). For this k(x), the integrals involved in the compact difference scheme (3.11) cannot be evaluated exactly; so we use the modified scheme (5.21) instead. Taking the exact analytical solution as

$$\begin{aligned} u(x,t)=\left( 4t^{r+\alpha }+\sum _{p=1}^{r} t^{p} \right) (1+2x-x^{2}-x^{3}),\qquad r\ge 1, \end{aligned}$$

it is easy to analytically get the source function

$$\begin{aligned} f(x,t)= & {} \left( \frac{4\Gamma (r+\alpha +1)}{\Gamma (r+1)} t^{r}+\sum _{p=1}^{r} \frac{\Gamma (p+1)}{\Gamma (p+1-\alpha )} t^{p-\alpha } \right) (1+2x-x^{2}-x^{3})\nonumber \\&+\left( 4t^{r+\alpha }+\sum _{p=1}^{r} t^{p} \right) \mathrm{e}^{x^{2}} (6x^{3}+4x^{2}+2x+2). \end{aligned}$$
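The source function above can be checked numerically. The sketch below evaluates \(f(x,t)\) as printed and compares its spatial part with a central-difference value of \(-\partial _{x}(k(x)\partial _{x}u)\); the form \(\mathcal{L}u=\partial _{x}(k\,\partial _{x}u)\) is our assumption, chosen because it is consistent with the conservative form and with the printed \(f\). All function names are ours.

```python
import math


def g(x):
    """Spatial factor of the exact solution in Example 6.2."""
    return 1 + 2 * x - x**2 - x**3


def amplitude(t, r, alpha):
    """Temporal factor 4 t^(r+alpha) + sum_{p=1}^{r} t^p."""
    return 4 * t ** (r + alpha) + sum(t**p for p in range(1, r + 1))


def caputo_part(t, r, alpha):
    """Caputo derivative of the temporal factor (requires t > 0)."""
    value = 4 * math.gamma(r + alpha + 1) / math.gamma(r + 1) * t**r
    value += sum(math.gamma(p + 1) / math.gamma(p + 1 - alpha) * t ** (p - alpha)
                 for p in range(1, r + 1))
    return value


def source(x, t, r, alpha):
    """f(x,t) of Example 6.2 as printed above."""
    spatial = math.exp(x * x) * (6 * x**3 + 4 * x**2 + 2 * x + 2)
    return caputo_part(t, r, alpha) * g(x) + amplitude(t, r, alpha) * spatial


def minus_Lu_fd(x, t, r, alpha, d=1e-3):
    """Central-difference value of -(k u_x)_x, k(x) = exp(x^2) (assumed form of L)."""
    u = lambda s: amplitude(t, r, alpha) * g(s)
    flux = lambda s: math.exp(s * s) * (u(s + d / 2) - u(s - d / 2)) / d
    return -(flux(x + d / 2) - flux(x - d / 2)) / d


x, t, r, alpha = 0.5, 0.5, 3, 0.5
residual = source(x, t, r, alpha) - caputo_part(t, r, alpha) * g(x) - minus_Lu_fd(x, t, r, alpha)
print(abs(residual))  # small, limited by the O(d^2) difference error
```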

Let \(r=3\). Table 10 lists the errors \(\mathrm{E}_{3,\nu }(\tau ,h)\) and the temporal convergence orders \(\mathrm{O}_{3,\nu }^\mathrm{t}(\tau ,h)\) \((\nu =1,2,\infty )\) of the computed solution \(u_{i}^{n}\) by the compact difference scheme (5.21) with \(r=3\) for \(\alpha =1/4,1/2,3/4\) and different time steps \(\tau \). It is seen that the third-order temporal accuracy of the computed solution \(u_{i}^{n}\) is not achieved, since the exact solution does not satisfy the zero-derivatives condition.

To preserve the desired high-order accuracy, we transform the present problem by using the substitution (5.2), where according to Propositions 5.2 and 5.3, the coefficients \(\partial _{t}^{p}u(x,0)\) are given by

$$\begin{aligned} u(x,0)=0, \qquad \partial _{t}^{p}u(x,0)=p! (1+2x-x^{2}-x^{3}),\qquad p=1,2,\dots ,r. \end{aligned}$$
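For this example the substitution (5.2) can be written out explicitly. Since \(u(x,0)=0\) and \(\partial _{t}^{p}u(x,0)/p!=1+2x-x^{2}-x^{3}\), a direct computation (using only that \(\mathcal{L}\) is linear and acts on x alone, and writing \(\widetilde{f}\), our notation, for the source term of the transformed problem) gives

$$\begin{aligned} v(x,t)&=u(x,t)-\sum _{p=1}^{r}t^{p}(1+2x-x^{2}-x^{3})=4t^{r+\alpha }(1+2x-x^{2}-x^{3}),\\ \widetilde{f}(x,t)&=\frac{4\Gamma (r+\alpha +1)}{\Gamma (r+1)}t^{r}(1+2x-x^{2}-x^{3})+4t^{r+\alpha }\mathrm{e}^{x^{2}}(6x^{3}+4x^{2}+2x+2). \end{aligned}$$

In particular, \(\partial _{t}^{k}v(x,0)=0\) for \(k=0,1,\dots ,r\), so v satisfies the zero-derivatives condition, and the boundary data are shifted by the same polynomial in t.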

Let \(r=3,4,5,6\). We use the compact difference scheme (5.21) with \(r=3,4,5,6\) to solve the above transformed problem. As in the first example, the rth-order temporal convergence and the fourth-order spatial convergence of the computed solution \(u_{i}^{n}\) were observed in the numerical computations for each \(r=3,4,5,6\) and different \(\alpha \). For brevity, we only present the results for \(\alpha =1/2\) in Tables 11 and 12. In Table 12, we take the time step \(\tau =1/20000\) for \(r=3\), \(\tau =1/4000\) for \(r=4\) and \(\tau =1/1000\) for \(r=5, 6\). Clearly, these results are in accord with our theoretical analysis. This demonstrates the effectiveness of the technique using the substitution (5.2) for preserving high-order accuracy. It also shows that the modified scheme (5.21) indeed maintains the desired high-order accuracy of the compact difference scheme (3.11).

Example 6.3

We present numerical results to show that the modified scheme (5.8) has high-order accuracy for the problem (1.1) without the zero-derivatives condition. To this end, we still consider the problem in Example 6.2 as our test problem, but we solve it directly by the scheme (5.8) (with the operators \(\mathcal{Q}\) and \(\mathcal{H}\) being replaced by \(\widetilde{\mathcal{Q}}\) and \(\widetilde{\mathcal{H}}\), respectively). Let \(r=4\). We list in Tables 13 and 14 the errors \(\mathrm{E}_{4,\nu }(\tau ,h)\), the temporal convergence orders \(\mathrm{O}_{4,\nu }^\mathrm{t}(\tau ,h)\) and the spatial convergence orders \(\mathrm{O}_{4,\nu }^\mathrm{s}(\tau ,h)\) \((\nu =1,2,\infty )\) of the computed solution \(u_{i}^{n}\) for \(\alpha =1/4,1/2,3/4\). It is seen that the modified scheme (5.8) has the fourth-order accuracy both in time and space as expected.

7 Conclusion

In this paper, we have proposed a set of high-order compact finite difference methods. They can be used to solve a class of Caputo-type fractional sub-diffusion equations in conservative form, where the diffusion coefficient may be spatially variable. A class of Lubich approximation formulae has been derived for the Caputo fractional derivative defined on a finite interval. The high-order compact difference discretization of the spatial variable coefficient differential operator is different from that used in the known treatments of fractional differential equations. The proposed compact difference methods are unconditionally stable and have the global convergence order \(\mathcal{O}(\tau ^{r}+h^{4})\), where \(r\ge 2\) is a positive integer and \(\tau \) and h are the temporal and spatial steps. We have also developed a discrete energy technique for the analysis of the present variable coefficient problem. Using this technique, a theoretical analysis of the stability and convergence of the methods has been rigorously carried out for the case of \(2\le r\le 6\), and the optimal error estimates in the weighted \(H^{1}\), \(L^{2}\) and \(L^{\infty }\) norms have been obtained for the general case of a variable coefficient. We have further proposed two modified schemes that enlarge the applicability of the methods while preserving high-order accuracy. Numerical results agree with the theoretical analysis and illustrate the various convergence orders of the methods.