1 Introduction

The numerical approximation of optimal control problems has long been an important topic in engineering design, and extensive research has been devoted to developing fast numerical algorithms for such problems. Some recent progress in this area can be found in [5, 8, 12–17, 21–23, 25, 29] and the references cited therein.

In this paper we consider a pointwise-constrained time-dependent optimal control problem posed on the space interval (0, 1) over the time period [0, T]: Find a control \(u \in K := \{ v \in L^2\big (0,T;L^2(0,1)\big ) : v \ge u_0\}\), where \(u_0(x,t)\) is a prescribed obstacle function, that minimizes the objective functional

$$\begin{aligned} J(u) := \displaystyle \frac{1}{2}\displaystyle \int _0^T\displaystyle \int _0^1 \big (y - \tilde{y} \big )^2 dx dt + \displaystyle \frac{1}{2} \displaystyle \int _0^T\displaystyle \int _0^1 u^2 dx dt, \end{aligned}$$
(1)

which is governed by the initial-boundary value problem of a diffusion equation for the state variable y(x, t)

$$\begin{aligned}&L y = f + u, ~(x,t) \in (0,1) \times (0,T], \nonumber \\&y(0,t) = y(1,t) = 0, ~ t \in [0,T], ~~y(x,0) = y_0(x), ~x \in [0,1]. \end{aligned}$$
(2)

Here \(\tilde{y}(x,t)\) represents a prescribed or observed value of the state variable y, f(x, t) is a prescribed source and sink term, u is the control variable, and L represents a diffusion operator.

This type of problem occurs widely in engineering design and applications. For instance, in the temperature control of a heat conduction process, y represents the temperature in a medium and u the heating or cooling source. The goal of the control problem is to optimize the input of the heating or cooling source u such that the temperature distribution y is as close as possible to the prescribed temperature distribution \(\tilde{y}\). Due to limitations of the surrounding environment or equipment, the input control u may be subject to constraints, which leads to a constrained control problem such as the one with a pointwise obstacle constraint described by the admissible set K. Similar models also arise in the optimal control of mass transport by diffusion in porous media, and in photocopiers and laser printers in the modern office, which rely on the transport of electrons through amorphous semiconductors in an electric field. The optimality conditions resulting from these models consist of non-smooth nonlinear systems of partial differential equations.

The canonical optimal control problems of this type, in which the differential operator L is a classical second-order Fickian diffusion operator, have been studied extensively in the literature, and many efficient and reliable methods have been developed [5, 12–15, 17, 21, 25]. There exist at least three general classes of computational approaches to solving the nonlinear systems of partial differential equations derived from the optimality conditions. The first is the primal-dual active set method, which employs the primal-dual active set strategy based on the augmented Lagrangian method [12, 13]. This method has been widely used for the numerical solution of many kinds of optimal control problems in the literature. However, in some cases, e.g. when the Lagrange multipliers are irregular, the active sets are difficult to compute numerically. The second class applies the semismooth Newton algorithm to the resulting optimality conditions, which involves computations of second derivatives; see, e.g., [14, 15]. There exists such an extensive body of research on these two methods that it is impossible to give even a brief review here. The third class consists of gradient projection methods, which use a line search to minimize the objective functional, guided by the gradient whose computation requires repeatedly solving the state and costate equations. This class of methods has been found robust in practice [17], but its effectiveness relies on very efficient solvers for the state and costate equations.

In the last few decades a growing body of evidence has suggested that the classical second-order Fickian diffusion or heat conduction equation does not necessarily provide a proper description of diffusion or heat conduction processes in heterogeneous media. A microscopic explanation is as follows. In a normal diffusion process modeled by a classical second-order Fickian diffusion or heat conduction equation, a particle’s motion in the underlying transport or conduction process has virtually no spatial correlation and long walks in the same direction are rare, so the variance of a particle’s excursion distance is finite. The central limit theorem then ensures that the probability of finding a particle somewhere in space satisfies a normal distribution, which gives a probabilistic description of normal diffusion. However, in a diffusion or heat conduction process in a heterogeneous medium, a particle’s motion is better described by steps that are not independent and that can take vastly different times to perform. A particle’s long walk may have a long correlation length, so the process can deviate substantially from Brownian motion or even have an infinite variance. Thus, the assumptions of (i) the existence of a mean free path and (ii) the existence of a mean time taken to perform a step, which were made by Einstein in his derivation of the diffusion equation for Brownian motion [9], are violated. In fact, it was shown [3, 20] that the heavy-tailed behavior of transport processes in heterogeneous media can be described accurately by Lévy distributions, which can be viewed as a probabilistic description of fractional diffusion equations. This is the fundamental reason why fractional diffusion or heat conduction equations provide a better description of transport processes in heterogeneous media than classical Fickian diffusion or heat conduction equations do.

Motivated by these facts, in this paper we consider the optimal control problem of (1)–(2) except that L is a fractional diffusion operator given by

$$\begin{aligned} Ly := \displaystyle \frac{\partial {y}}{\partial {t}} - d(x,t) \bigg ( r \; {}_0D_x^{2-\beta } + (1-r)\; {}_xD_1^{2-\beta } \bigg ) y, ~(x,t) \in (0,1) \times (0,T] \end{aligned}$$
(3)

where d(x, t) is the diffusivity coefficient with positive lower and upper bounds, \(0 \le r \le 1\) indicates the relative weight of forward versus backward transition probability, and \(2-\beta \) with \(0< \beta <1\) is the order of the anomalous diffusion in the governing equation. The left and right fractional integrals are defined following Podlubny [24] by

$$\begin{aligned}&^{}_0D_x^{-\beta }w(x) \displaystyle := \frac{1}{\varGamma (\beta )} \int _0^x \frac{w(s)}{(x-s)^{1-\beta }}ds, \\&^{}_xD^{-\beta }_1w(x) \displaystyle := \frac{1}{\varGamma (\beta )} \int _x^1 \frac{w(s)}{(s-x)^{1-\beta }}ds. \end{aligned}$$

For any positive integer m, the left and right Riemann–Liouville fractional derivatives of order \(m-\beta \) are defined by

$$\begin{aligned}&{}_0D_x^{m-\beta }w(x) \displaystyle := D^m {}^{}_0D_x^{-\beta } w(x) = \frac{1}{\varGamma (\beta )} \frac{d^m}{dx^m} \int _0^x \frac{w(s)}{(x-s)^{1-\beta }} ds,\\&{}_xD^{m-\beta }_1w(x) \displaystyle := (- D)^m {}^{}_xD_1^{-\beta } w(x) = \frac{(-1)^m}{\varGamma (\beta )} \frac{d^m}{dx^m} \int _x^1 \frac{w(s)}{(s-x)^{1-\beta }} ds. \end{aligned}$$
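As a concrete illustration of these definitions, take \(w(x) = x\) and \(m = 2\). The substitution \(s = xt\) in the definition of the left fractional integral gives \({}_0D_x^{-\beta } x = x^{\beta +1}/\varGamma (\beta +2)\), and hence

$$\begin{aligned} {}_0D_x^{2-\beta } x = \frac{d^2}{dx^2} \, \frac{x^{\beta +1}}{\varGamma (\beta +2)} = \frac{x^{\beta -1}}{\varGamma (\beta )}, \end{aligned}$$

which agrees with the classical formula \({}_0D_x^{\mu } x^p = \frac{\varGamma (p+1)}{\varGamma (p+1-\mu )} x^{p-\mu }\) and shows that the Riemann–Liouville derivative of a smooth function may be singular at the left endpoint of the interval.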

Because of the improved modeling capability of fractional differential equations, there has been a growing amount of research on both the mathematical analysis and the numerical approximation of fractional control problems [1, 7, 10]. However, because of the nonlocal nature of fractional differential operators, the corresponding numerical methods for the fractional state and costate equations generate full stiffness matrices. Consequently, the resulting computational complexity is of \(O(N^3)\) and the memory requirement is of \(O(N^2)\) for a problem of size N, in sharp contrast to the O(N) computational work and memory requirement of numerical methods for second-order Fickian diffusion equations. Recall that the fractional state and costate equations have to be solved repeatedly in an iterative procedure, so this computational bottleneck becomes a severe burden. To the best of our knowledge, no work in the literature has addressed this significantly increased computational cost.

The goal of this paper is to develop a fast gradient projection method for the fractional optimal control problem (1)–(3), which significantly reduces the computational cost of solving the fractional state and costate equations from \(O(M N^3)\) to \(O(M N \log N)\) per iterative solve of the two equations, with M being the number of time steps, and the memory requirement from \(O(N^2)\) to O(N). Numerical experiments substantiate these reductions. The rest of the paper is organized as follows. In Sect. 2 we derive the optimality condition for the fractional optimal control problem. In Sect. 3 we present a fractional gradient projection method. In Sect. 4 we develop a fast and faithful solution method. In Sect. 5 we carry out numerical experiments to validate the fast gradient projection method.

2 The Optimality Condition of the Fractional Control Problem

We frequently use the following properties in this section. The left and right Riemann–Liouville fractional integral operators are adjoints in the \(L^2\) sense [24, 26]:

$$\begin{aligned} \displaystyle \int _0^1 \left( {}_0D_x^{-\beta } w\right) v dx = \displaystyle \int _0^1 w \left( {}_xD_1^{-\beta } v\right) dx, \quad \forall v, w \in L^2(0,1). \end{aligned}$$
(4)

For any v, w with \(v(0)=0\) and \(w(1)=0\), the fractional integral operators and the classical first-order differential operator D commute [33]:

$$\begin{aligned} {}^{}_0D^{-\beta }_x D v = D {}_0D^{-\beta }_x v , \qquad {}^{}_xD^{-\beta }_1 Dw = D {}^{}_xD^{-\beta }_1 w. \end{aligned}$$
(5)

To derive the adjoint costate equation for the costate function we rewrite the state equation (2) as

$$\begin{aligned}&d^{-1}\displaystyle \frac{\partial {y}}{\partial {t}} - \big ( r\; {}_0D_x^{2-\beta } + (1-r)\; {}_xD_1^{2-\beta } \big ) y = d^{-1}(f+u),\nonumber \\&\quad y(0,t) = y(1,t) = 0, \quad t \in [0,T], \nonumber \\&\quad y(x,0) = y_0(x), \quad x \in [0,1]. \end{aligned}$$
(6)

We then multiply the equation by any test function p with appropriately specified boundary and terminal conditions [as in (7)] and utilize the adjoint property (4) to obtain the following terminal-boundary value problem of the adjoint costate equation for the costate function p(x, t)

$$\begin{aligned}&-\displaystyle \frac{\partial {(d^{-1}p)}}{\partial {t}} - \big ( (1-r)\; {}_0D_x^{2-\beta } + r\; {}_xD_1^{2-\beta } \big ) p = y-\tilde{y}, \nonumber \\&\quad p(0,t) = p(1,t) = 0, \quad t \in [0,T], \nonumber \\&\quad p(x,T) = 0, \quad x \in [0,1]. \end{aligned}$$
(7)

To derive the optimality condition for (1), we calculate the Gâteaux derivative of the objective functional J(u). Let \(u \in K\) be the minimizer of the objective functional J(u). Let \(v \in K\) be any admissible function and \(\delta u = v-u\). For any arbitrarily small positive number \(\varepsilon \), we note that

$$\begin{aligned} \delta y := \lim \limits _{\varepsilon \rightarrow 0^+} \displaystyle \frac{y(u+\varepsilon \delta u)-y(u)}{\varepsilon } = y'(u)\delta u \end{aligned}$$
(8)

exists, provided that y is differentiable with respect to u. Consequently,

$$\begin{aligned} y(u+\varepsilon \delta u)= & {} y(u) + \varepsilon y'(u) \delta u + o(|\varepsilon \delta u|) \\= & {} y(u) + \varepsilon \delta y + o(|\varepsilon \delta u|) \end{aligned}$$

as \(\varepsilon \) tends to zero. Therefore, we expand \(J(u+\varepsilon \delta u)\) as follows

$$\begin{aligned} J(u+\varepsilon \delta u)= & {} \displaystyle \frac{1}{2}\displaystyle \int _0^T\displaystyle \int _0^1 (y(u+\varepsilon \delta u) -\tilde{y})^2 dx dt\\&+\, \displaystyle \frac{1}{2} \displaystyle \int _0^T\displaystyle \int _0^1 (u+\varepsilon \delta u)^2 dx dt \\= & {} \displaystyle \frac{1}{2}\displaystyle \int _0^T\displaystyle \int _0^1 (y(u) -\tilde{y})^2 + 2 \varepsilon (y(u) -\tilde{y}) \delta y dx dt \\&+\, \displaystyle \frac{1}{2} \displaystyle \int _0^T\displaystyle \int _0^1 u^2 + 2 \varepsilon \; u \; \delta u dx dt + o(\varepsilon ) . \end{aligned}$$

We accordingly find that

$$\begin{aligned} J'(u)\delta u= & {} \displaystyle \lim \limits _{\varepsilon \rightarrow 0^+} \displaystyle \frac{J(u+\varepsilon \delta u) -J(u)}{\varepsilon } \nonumber \\= & {} \displaystyle \int _0^T\displaystyle \int _0^1 (y-\tilde{y}) \delta y ~dxdt + \displaystyle \int _0^T\displaystyle \int _0^1 u \delta u ~dxdt\nonumber \\= & {} \displaystyle \int _0^T \displaystyle \int _0^1 \Big [-\displaystyle \frac{\partial {(d^{-1}p)}}{\partial {t}} - \Big ( (1-r)\; {}_0D_x^{2-\beta } + r\; {}_xD_1^{2-\beta } \Big ) p\Big ] \delta y \;dxdt \nonumber \\&+\, \displaystyle \int _0^T\displaystyle \int _0^1 u \delta u ~dxdt. \end{aligned}$$
(9)

In the last step we have replaced \(y-\tilde{y}\) in the integrand of the first integral by the left-hand side of the costate equation (7).

We see from (8) that \(\delta y\) satisfies the homogeneous initial and boundary conditions. We further utilize the fact that \(p(x,T) = 0\) and integrate the first integrand in the first integral on the right-hand side with respect to time by parts to get

$$\begin{aligned} \displaystyle \int _0^T \displaystyle \int _0^1 \displaystyle \frac{\partial {(d^{-1}p)}}{\partial {t}} \delta y dxdt = -\displaystyle \int _0^T \displaystyle \int _0^1 p d^{-1} \displaystyle \frac{\partial {\delta y}}{\partial {t}} dxdt. \end{aligned}$$
(10)

We intend to utilize the facts that \(\delta y\) and p both vanish at the boundary to integrate the second integrand in the first integral on the right-hand side with respect to x by parts in a similar fashion. However because the fractional integral operators and the classical differential operators do not commute in general, we have to take extra care by utilizing (4) and (5) repeatedly to obtain

$$\begin{aligned}&\displaystyle \int _0^T \displaystyle \int _0^1 \left[ \Big ( (1-r)\; {}_0D_x^{2-\beta } + r\; {}_xD_1^{2-\beta } \Big ) p\right] \delta y dxdt\nonumber \\&\quad = \displaystyle \int _0^T \displaystyle \int _0^1 \left[ \Big ( (1-r)\; D^2~{}_0D_x^{-\beta } + r\; D^2 {}_xD_1^{-\beta } \Big ) p\right] \delta y dxdt \nonumber \\&\quad = \displaystyle \int _0^T \displaystyle \int _0^1 \left[ \Big ( (1-r)\; D ~{}_0D_x^{-\beta } D + r\; D {}_xD_1^{-\beta } D \Big ) p\right] \delta y dxdt \nonumber \\&\quad = -\displaystyle \int _0^T \displaystyle \int _0^1 \left[ \Big ( (1-r)\; ~{}_0D_x^{-\beta } D + r\; {}_xD_1^{-\beta } D \Big ) p\right] (D \delta y) dxdt \nonumber \\&\quad = -\displaystyle \int _0^T \displaystyle \int _0^1 \left[ \Big ( r\; {}_0D_x^{-\beta } + (1-r)\; {}_xD_1^{-\beta } \Big ) D \delta y\right] Dp dxdt \nonumber \\&\quad = -\displaystyle \int _0^T \displaystyle \int _0^1 \left[ \Big ( rD\; {}_0D_x^{-\beta } + (1-r)D\; {}_xD_1^{-\beta } \Big ) \delta y\right] Dp dxdt \nonumber \\&\quad = \displaystyle \int _0^T \displaystyle \int _0^1 \left[ \Big ( r D^2\; {}_0D_x^{-\beta } + (1-r) D^2 \; {}_xD_1^{-\beta } \Big ) \delta y\right] p dxdt \nonumber \\&\quad = \displaystyle \int _0^T \displaystyle \int _0^1 \left[ \Big ( r\; {}_0D_x^{2-\beta } + (1-r)\; {}_xD_1^{2-\beta } \Big ) \delta y\right] p dxdt. \end{aligned}$$
(11)

We incorporate (10) and (11) into (9) to obtain

$$\begin{aligned} J'(u)\delta u= & {} \displaystyle \int _0^T \displaystyle \int _0^1 \Big [ d^{-1} \displaystyle \frac{\partial {\delta y}}{\partial {t}} - \Big ( r\; {}_0D_x^{2-\beta } + (1-r)\; {}_xD_1^{2-\beta } \Big ) \delta y\Big ] p dxdt \\&+ \displaystyle \int _0^T\displaystyle \int _0^1 u \delta u ~dxdt. \end{aligned}$$

We observe from (6) and (8) that

$$\begin{aligned} \displaystyle d^{-1}\displaystyle \frac{\partial {\delta y}}{\partial {t}} -\bigg ( r\; {}_0D_x^{2-\beta } + (1-r)\; {}_xD_1^{2-\beta } \bigg ) \delta y =d^{-1} \delta u. \end{aligned}$$

Since u is the minimizer of the objective functional \(J(\cdot )\) and \(\delta u = v-u\) with \(v \in K\), the first-order condition \(J'(u)\delta u \ge 0\) leads to the following variational inequality

$$\begin{aligned} J'(u)\delta u= & {} \displaystyle \int _0^T\displaystyle \int _0^1 (d^{-1} p +u) \delta u dxdt \nonumber \\= & {} \displaystyle \int _0^T\displaystyle \int _0^1 (d^{-1}p+u) (v-u) dxdt \ge 0 \end{aligned}$$
(12)

for all \(v \in K\). This variational inequality has the following closed-form solution [17]

$$\begin{aligned} u = \max \left\{ u_0,-d^{-1}p\right\} . \end{aligned}$$
(13)

As a matter of fact, if \(-d^{-1}p > u_0\) on the entire interval (0, 1), then (13) implies \(u=-d^{-1}p\), and so \(d^{-1}p+u=0\) on the entire interval (0, 1); the integrand in the variational inequality (12) vanishes identically for all \(v \in K\). If \(-d^{-1} p \le u_0\) on the entire interval (0, 1), then (13) implies \(u=u_0\), and thus \(d^{-1}p + u = d^{-1}p + u_0 \ge 0\). Since \(v-u = v-u_0 \ge 0\) for all \(v\in K\), the variational inequality (12) holds.

In the general case, the maximum in (13) may be attained by either argument on different subdomains of \([0,1] \times [0,T]\). Suppose that \(u = -d^{-1}p\) on \(\varOmega _1\) and \(u = u_0\) on \(\varOmega _2\) with \(\varOmega _1 \cup \varOmega _2 = [0,1] \times [0,T]\). It follows that the integral \(\displaystyle \int _{\varOmega _1} (d^{-1}p+u) (v-u) dxdt = 0\). Hence, the integral in (12) reduces to an integral over \(\varOmega _2\), on which \(u=u_0\). The same argument as in the preceding paragraph shows that this integral remains nonnegative for all \(v \in K\). Therefore, (12) still holds.

3 A Fractional Gradient Projection Method

The optimality condition of the constrained optimal control problem (1) and (2) reduces the problem to the solution of a coupled system consisting of the state equation (6), the costate equation (7), and the optimal control condition (13). A commonly used method for the efficient solution of this system is the following sequentially decoupled gradient projection method [17] (a code sketch of this loop follows the listing):

(i) Pick an initial guess \(u^0\) of the control.

For \(n=0,1,\ldots \), do

(ii) With the known control \(u^n\), solve the state equation forward in time for the state variable \(y^n\)

$$\begin{aligned} \displaystyle d^{-1}\displaystyle \frac{\partial {y^n}}{\partial {t}} - \bigg ( r\; {}_0D_x^{2-\beta } + (1-r)\; {}_xD_1^{2-\beta } \bigg ) y^n = d^{-1} \bigg ( f + u^n \bigg ). \end{aligned}$$

(iii) With the known state variable \(y^n\), solve the costate equation backward in time for the costate variable \(p^n\)

$$\begin{aligned} \displaystyle -\displaystyle \frac{\partial {(d^{-1}p^n)}}{\partial {t}} - \bigg ( (1-r)\; {}_0D_x^{2-\beta } + r\; {}_xD_1^{2-\beta } \bigg ) p^n = y^n-\tilde{y}. \end{aligned}$$

(iv) Update the optimal control

$$\begin{aligned} \displaystyle u^{n+1} = \max \big \{u_0,-d^{-1} p^n\big \}. \end{aligned}$$

(v) If \(\Vert u^{n+1}-u^n\Vert _{\infty }\) is within the tolerance, stop.

end do
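For orientation, the outer loop can be sketched in a few lines of Python. This is a sketch only: the callbacks `solve_state_forward` and `solve_costate_backward` are hypothetical placeholders for any discretization of (6) and (7), such as the finite difference schemes below, and the arrays live on the full space-time grid.

```python
import numpy as np

def gradient_projection(u, f, y_tilde, d_inv, u0,
                        solve_state_forward, solve_costate_backward,
                        tol=1e-8, max_iter=100):
    """Sketch of the outer gradient projection loop, steps (i)-(v)."""
    for n in range(max_iter):
        y = solve_state_forward(f + u, d_inv)            # step (ii)
        p = solve_costate_backward(y - y_tilde, d_inv)   # step (iii)
        u_new = np.maximum(u0, -d_inv * p)               # step (iv), i.e. (13)
        if np.max(np.abs(u_new - u)) < tol:              # step (v)
            return u_new, y, p
        u = u_new
    return u, y, p
```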

It is clear that in the gradient projection method, the major computational cost comes from the repeated solution of the state equation (6) and the costate equation (7). In the rest of this section, we present two finite difference methods for these equations and discuss related computational issues.

Let M and N be positive integers. Define the time step size \(\tau := T/M\) and the spatial mesh size \(h := 1/(N+1)\), respectively. We define the temporal partition \(t^k := k \tau \) for \(k=0,1,\ldots ,M\) and the spatial partition \(x_i := i h\) for \(i =0, 1,\ldots ,N+1\). Let \(y_i^k, p_i^k, u_i^k\) be the finite difference approximations to \(y(x_i,t^k)\), \(p(x_i,t^k)\), \(u(x_i,t^k)\), respectively, and denote \((d^{-1})_i^k := d^{-1}(x_i,t^k)\), \(f_i^k := f(x_i,t^k)\), \(\tilde{y}_i^k := \tilde{y}(x_i,t^k)\). Then the Meerschaert finite difference scheme for the state equation can be formulated as follows [19]: For \(k=1,2,\ldots ,M\), solve the following equation forward in time to find \(\{ y^k_i\}_{i=1}^N\) such that

$$\begin{aligned}&\big ( d^{-1} \big )_i^k\displaystyle \frac{y_i^k-y_i^{k-1}}{\tau } - \left[ r\displaystyle \sum _{j=0}^{i+1}g_j^{(2-\beta )} y_{i-j+1}^k + (1-r)\displaystyle \sum _{j=0}^{N-i+1}g_{j}^{(2-\beta )} y_{i+j-1}^k \right] h^{\beta -2} \nonumber \\&\quad = \big (d^{-1}\big )_i^k \big (f_i^k + u_i^k \big ). \end{aligned}$$
(14)

Here the fractional binomial coefficients \(g_{j}^{(\alpha )}\) can be evaluated recursively

$$\begin{aligned} \displaystyle g_{0}^{(\alpha )}=1, \qquad g_{j}^{(\alpha )}=\left( 1-\frac{\alpha +1}{j}\right) g_{j-1}^{(\alpha )} \quad {\mathrm {for}} \quad j \ge 1. \end{aligned}$$
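A direct Python transcription of this recursion reads as follows (a sketch; the weights are exact up to floating-point rounding):

```python
import numpy as np

def grunwald_weights(alpha, n):
    """Fractional binomial coefficients g_j^(alpha), j = 0..n."""
    g = np.empty(n + 1)
    g[0] = 1.0
    for j in range(1, n + 1):
        g[j] = (1.0 - (alpha + 1.0) / j) * g[j - 1]
    return g
```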

We can similarly formulate the Meerschaert finite difference scheme for the costate equation: For \(k=M, M-1,\ldots ,1\), solve the following equation backward in time to find \(\{ p^k_i\}_{i=1}^N\) such that

$$\begin{aligned}&\displaystyle \frac{(d^{-1})_i^{k-1} p_i^{k-1}-(d^{-1})_i^k p_i^k}{\tau } \displaystyle - \left[ (1-r)\displaystyle \sum _{j=0}^{i+1}g_j^{(2-\beta )} p_{i-j+1}^k +r\displaystyle \sum _{j=0}^{N-i+1}g_{j}^{(2-\beta )} p_{i+j-1}^k \right] h^{\beta -2} \nonumber \\&\quad \quad = y_i^{k-1} - \tilde{y}_i^{k-1}. \end{aligned}$$
(15)

Because a shifted Grünwald approximation was used, the Meerschaert scheme has only first-order accuracy in both space and time [19].

Second-order finite difference methods were developed to improve the computational efficiency. A Crank–Nicolson temporal discretization combined with a Richardson extrapolation of the spatial discretization in (14) results in a scheme with second-order accuracy in both space and time [18]. In this paper we use an alternative second-order scheme, the CN-WSGD scheme, developed in [28]: For \(k=1,2,\ldots ,M\), solve the following equation forward in time to find \(\{ y^k_i\}_{i=1}^N\) such that

$$\begin{aligned}&\big ( d^{-1} \big )_i^{k-\frac{1}{2}}\displaystyle \frac{y_i^k-y_i^{k-1}}{\tau } - \left[ r\displaystyle \sum _{j=0}^{i+1}w_j^{(2-\beta )} y_{i-j+1}^{k-\frac{1}{2}} + (1-r)\displaystyle \sum _{j=0}^{N-i+1}w_{j}^{(2-\beta )} y_{i+j-1}^{k-\frac{1}{2}} \right] h^{\beta -2} \nonumber \\&\quad = \big (d^{-1}\big )_i^{k-\frac{1}{2}} \big (f_i^{k-\frac{1}{2}} + u_i^{k-\frac{1}{2}} \big ). \end{aligned}$$
(16)

Here the superscript \(k-\frac{1}{2}\) denotes the mean value of the function at time levels k and \(k-1\), and \(w_{j}^{(\alpha )}\) can be evaluated by

$$\begin{aligned} \displaystyle w_{0}^{(\alpha )} =\frac{\alpha }{2}g_{0}^{(\alpha )}, \quad w_{j}^{(\alpha )} =\frac{\alpha }{2}g_{j}^{(\alpha )} +\frac{2-\alpha }{2}g_{j-1}^{(\alpha )} \quad {\mathrm {for}} \quad j\ge 1. \end{aligned}$$
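Building on the `grunwald_weights` helper above, these weights can be sketched as:

```python
def wsgd_weights(alpha, n):
    """WSGD weights w_j^(alpha) built from the Grunwald weights g_j^(alpha)."""
    g = grunwald_weights(alpha, n)
    w = np.empty(n + 1)
    w[0] = 0.5 * alpha * g[0]
    w[1:] = 0.5 * alpha * g[1:] + 0.5 * (2.0 - alpha) * g[:-1]
    return w
```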

We can similarly formulate the CN-WSGD scheme for the costate equation as follows: For \(k=M, M-1,\ldots ,1\), solve the following equation backward in time to find \(\{ p^k_i\}_{i=1}^N\) such that

$$\begin{aligned}&\displaystyle \frac{(d^{-1})_i^{k-1} p_i^{k-1}-(d^{-1})_i^k p_i^k}{\tau } \displaystyle - \Bigg [(1-r)\displaystyle \sum _{j=0}^{i+1}w_j^{(2-\beta )} p_{i-j+1}^{k-\frac{1}{2}} +r\displaystyle \sum _{j=0}^{N-i+1}w_{j}^{(2-\beta )} p_{i+j-1}^{k-\frac{1}{2}} \Bigg ] h^{\beta -2} \nonumber \\&\qquad = y_i^{k-\frac{1}{2}} - \tilde{y}_i^{k-\frac{1}{2}}. \end{aligned}$$
(17)

The nonlocal nature of fractional differential operators results in a significant increase in the already expensive computational and storage cost of the gradient projection method for the constrained optimal control problem. Although the optimal control problem constrained by space-fractional diffusion equations provides a more appropriate description than its analogue constrained by second-order diffusion equations, the significantly increased computational and storage cost poses a severe computational challenge for the new model. More precisely, within each outer loop of the gradient projection method, the computational cost of solving the space-fractional state and costate equations is of order \(O(M N^3)\), a sharp increase over the O(M N) cost of solving second-order state and costate equations. The situation is aggravated by the fact that the state and costate equations have to be solved repeatedly in the gradient projection iteration. Therefore, the development of a fast and faithful numerical method is of fundamental importance.

4 A Fast and Faithful Solution Method

The goal of this section is to develop a fast and faithful gradient projection method for the optimal control problem (1) and (2). As the major computational complexity of the gradient projection method comes from the numerical solution of the state and costate equations, we focus on the development of fast and faithful solution methods for these equations.

4.1 Fast Implementation of the Finite Difference Methods for the State and Costate Equations

Let \(\mathbf {y}^k := (y_1^k,y_2^k,\ldots ,y_N^k)^T\), and \(\mathbf {y}^{k-1}\), \(\mathbf {f}^k\), \(\mathbf {u}^k\) be introduced in a similar fashion. Then the Meerschaert scheme for the state equation (14) can be expressed in a matrix form [19]

$$\begin{aligned} \big (\mathbf {D}^k+\gamma \mathbf {A}\big )\mathbf {y}^k =\mathbf {D}^k\mathbf {y}^{k-1}+\tau \mathbf {D}^k\big (\mathbf {f}^k +\mathbf {u}^k\big ), \end{aligned}$$
(18)

where \(\gamma ={\tau }/{h^{2-\beta }}\), \(\mathbf {D}^k={\mathrm {diag}}((d^{-1})_i^k)_{i=1}^N\), and \(\mathbf {A}\) is a full stiffness matrix. Traditionally, the linear system (18) has been solved numerically by Gaussian elimination [19], which requires computational work of \(O(N^3)\) per time step. It was reported in [32] that the stiffness matrix \(\mathbf {A}\) can be decomposed as

$$\begin{aligned} \mathbf {A}= r \mathbf {G}+ (1-r) \mathbf {G}^T \end{aligned}$$
(19)

where \(\mathbf {G}\) is a Toeplitz matrix of the form

$$\begin{aligned} \mathbf {G}= -\left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} g_1 &{} g_0 &{} &{} &{} \\ g_2 &{} g_1 &{} \ddots &{} &{} \\ \vdots &{} \ddots &{} \ddots &{} \ddots &{} \\ \vdots &{} \ddots &{} \ddots &{} g_1 &{} g_0 \\ g_N &{} \cdots &{} \cdots &{} g_2 &{} g_1 \end{array}\right) \end{aligned}$$
(20)

with \(g_i := g_i^{(2-\beta )}\).
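To make this structure concrete, the first column and first row of \(\mathbf {A}= r \mathbf {G}+ (1-r) \mathbf {G}^T\) can be assembled from the weights. The following Python helper is our own illustrative sketch; it relies on the `grunwald_weights` routine above.

```python
def stiffness_first_col_row(beta, r, N):
    """First column and first row of the Toeplitz matrix A = r*G + (1-r)*G^T."""
    g = grunwald_weights(2.0 - beta, N)
    col = -r * g[1:]              # A[i,0] = -r*g_{i+1} for i >= 2, and ...
    row = -(1.0 - r) * g[1:]      # A[0,j] = -(1-r)*g_{j+1} for j >= 2, and ...
    col[0] = row[0] = -g[1]       # ... diagonal entry -(r + (1-r))*g_1
    col[1] -= (1.0 - r) * g[0]    # ... first subdiagonal -(r*g_2 + (1-r)*g_0)
    row[1] -= r * g[0]            # ... first superdiagonal -(r*g_0 + (1-r)*g_2)
    return col, row
```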

Fast finite difference methods were developed by exploiting this decomposition [31, 32], which reduce the computational cost from \(O(N^3)\) per time step to \(O(N \log N)\) per inner iteration. It is known that an \(N \times N\) Toeplitz matrix \(\mathbf {T}=(t_{i-j})_{0\le i,j\le N-1}\) can be embedded into a \(2N \times 2N\) circulant matrix \(\mathbf {C}_{2N}\) [6, 11]

$$\begin{aligned} \begin{array}{l} \mathbf {C}_{2N} = \left[ \begin{array}{c@{\quad }c} \mathbf {T}&{} \hat{\mathbf {T}} \\ \hat{\mathbf {T}} &{} \mathbf {T}\end{array}\right] , \quad \hat{\mathbf {T}} = \left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} 0 &{} t_{N-1} &{} \ldots &{} t_{2} &{} t_{1}\\ t_{1-N} &{} 0 &{} t_{N-1} &{} \ldots &{} t_{2} \\ \vdots &{} t_{1-N} &{} 0 &{} \ddots &{} \vdots \\ t_{-2} &{} \vdots &{} \ddots &{} \ddots &{} t_{N-1} \\ t_{-1} &{} t_{-2} &{} \ldots &{} t_{1-N} &{} 0 \end{array}\right] . \end{array}\end{aligned}$$
(21)

The circulant matrix \(\mathbf {C}_{2N}\) can be decomposed as

$$\begin{aligned} \mathbf {C}_{2N} = \mathbf {F}_{2N}^{-1} {\mathrm {diag}}(\mathbf {F}_{2N} \mathbf {c}) \mathbf {F}_{2N}, \end{aligned}$$
(22)

where \(\mathbf {c}\) is the first column vector of \(\mathbf {C}_{2N}\) and \(\mathbf {F}_{2N}\) is the \(2N \times 2N\) discrete Fourier transform matrix. It is well known that the matrix-vector multiplication \(\mathbf {F}_{2N}\mathbf {v}_{2N}\) for \(\mathbf {v}_{2N} \in R^{2N}\) can be carried out in \(O(2N \log (2N)) = O(N \log N)\) operations via the fast Fourier transform (FFT). Therefore, \(\mathbf {C}_{2N} \mathbf {v}_{2N}\) can be evaluated in \(O(N \log N)\) operations by (22).
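A minimal Python sketch of this FFT-based Toeplitz product follows; `first_col` and `first_row` hold \((t_0,\ldots ,t_{N-1})\) and \((t_0,t_{-1},\ldots ,t_{1-N})\), e.g. those of \(\mathbf {A}\) returned by the illustrative `stiffness_first_col_row` helper above.

```python
import numpy as np

def toeplitz_matvec(first_col, first_row, v):
    """Compute T @ v in O(N log N) via the circulant embedding (21)-(22)."""
    n = len(v)
    # First column of the 2N x 2N circulant matrix C_{2N} in (21)
    c = np.concatenate((first_col, [0.0], first_row[:0:-1]))
    # Apply C_{2N} = F^{-1} diag(F c) F to v padded with N zeros
    prod = np.fft.ifft(np.fft.fft(c) * np.fft.fft(np.concatenate((v, np.zeros(n)))))
    return prod[:n].real        # first N entries give T @ v
```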

The same idea can be applied to other schemes in the previous section. The Meerschaert scheme (15) for the costate equation can be expressed in the matrix form

$$\begin{aligned} \Bigg (\mathbf {D}^{k-1}+\gamma \mathbf {A}^T\Bigg )\mathbf {p}^{k-1} =\mathbf {D}^k\mathbf {p}^{k}+\tau \Bigg (\mathbf {y}^{k-1} -\tilde{\mathbf {y}}^{k-1}\Bigg ). \end{aligned}$$
(23)

It was shown in [30, 32] that the matrix \(\mathbf {G}\) is diagonally dominant and positive-definite for \(0< \beta <1\), and so is \(\mathbf {A}\). Clearly the matrix \(\mathbf {A}\) is Toeplitz, and the coefficient matrices \(\mathbf {D}^k+\gamma \mathbf {A}\) and \(\mathbf {D}^{k-1}+\gamma \mathbf {A}^T\) are of diagonal-plus-Toeplitz type. Thus, the memory requirement to store them is O(N), instead of the \(O(N^2)\) required to store them as full matrices. Furthermore, if \(r = 1/2\), \(\mathbf {A}\) is symmetric and positive-definite, so the conjugate gradient method can be used to solve the linear systems (18) and (23) iteratively. For \(r \ne 1/2\), \(\mathbf {A}\) is nonsymmetric, and conjugate gradient squared type methods can be employed to solve the systems.

Similarly, the CN-WSGD schemes (16) and (17) can be expressed in the matrix forms

$$\begin{aligned} \Bigg (\mathbf {D}^{k-\frac{1}{2}} + \frac{1}{2}\gamma \mathbf {B}\Bigg )\mathbf {y}^k = \Bigg (\mathbf {D}^{k-\frac{1}{2}} - \frac{1}{2}\gamma \mathbf {B}\Bigg ) \mathbf {y}^{k-1} +\tau \mathbf {D}^{k-\frac{1}{2}} \bigg (\mathbf {f}^{k-\frac{1}{2}} +\mathbf {u}^{k-\frac{1}{2}} \bigg ) \end{aligned}$$
(24)

and

$$\begin{aligned} \Bigg (\mathbf {D}^{k-1}+\frac{1}{2}\gamma \mathbf {B}^T \Bigg )\mathbf {p}^{k-1} = \Bigg ( \mathbf {D}^k - \frac{1}{2}\gamma \mathbf {B}^T \Bigg )\mathbf {p}^{k} +\tau \bigg (\mathbf {y}^{k-\frac{1}{2}} -\tilde{\mathbf {y}}^{k-\frac{1}{2}} \bigg ). \end{aligned}$$
(25)

Here \(\mathbf {B}= r \bar{\mathbf {G}} + (1-r)\bar{\mathbf {G}}^T\), and \(\bar{\mathbf {G}}\) is given by the expression in (20) with \(g_i\) replaced by \(w_i^{(2-\beta )}\). \(\mathbf {D}^{k-\frac{1}{2}}\), \(\mathbf {f}^{k-\frac{1}{2}}\), \(\mathbf {u}^{k-\frac{1}{2}}\), \(\mathbf {y}^{k-\frac{1}{2}}\), and \(\tilde{\mathbf {y}}^{k-\frac{1}{2}}\) denote the corresponding averages at time steps \(t^{k-1}\) and \(t^k\). The matrix \(\bar{\mathbf {G}}\) was shown to be positive definite [28], and so is \(\mathbf {B}\). Hence, the fast method described earlier in this subsection still applies.

4.2 A Fast Preconditioned Conjugate Gradient Algorithm

Although the fast Krylov subspace iterative solvers described in the previous subsection significantly reduce the computational cost to \(O(N \log N)\) per iteration, a large number of iterations may be needed if the condition number of the coefficient matrix is large. In addition, the round-off errors accumulated over a large number of iterations may deteriorate the convergence behavior of the iterative solvers [30]. This motivates us to use preconditioned Krylov subspace solvers.

For \(r = 1/2\), the coefficient matrices of the systems (18), (23), (24), and (25) are of diagonal-plus-symmetric-positive-definite-Toeplitz type. We modify the Strang preconditioner as follows [27]: For the Toeplitz matrix \(\mathbf {T}\) given in the previous subsection, the entries \(s_l\) of the Strang circulant preconditioner \(\mathbf {S}(\mathbf {T})=(s_{i-j})_{0\le i,j\le N-1}\) are given by

$$\begin{aligned} s_l=\left\{ \begin{array}{ll} t_l, &{} \quad 0\le l\le N/2, \\ t_{l-N}, &{}\quad N/2 < l < N, \\ t_{l+N}, &{}\quad -N < l < 0. \end{array}\right. \end{aligned}$$
(26)

Then, the diagonal-plus-symmetric-positive-definite Toeplitz coefficient matrices of the systems (18), (23), (24), and (25) are preconditioned by the following circulant preconditioner

$$\begin{aligned} \mathbf {M}^k = D^k\mathbf {I}+\gamma \mathbf {S}(\mathbf {T}), \end{aligned}$$
(27)

where \(D^k=\frac{1}{N}\sum \nolimits _{i=1}^N (d^{-1})_i^k\) is the mean value of the diagonal elements of \(\mathbf {D}^k\), \(\mathbf {I}\) is the identity matrix of order N, and \(\mathbf {T}= \mathbf {A}\) for the systems (18) and (23) or \(\mathbf {T}= \mathbf {B}\) for (24) and (25). We use the system (18) as an example to formulate the fast preconditioned conjugate gradient method as follows [2]:

[Figure a: the fast preconditioned conjugate gradient (PCG) algorithm for (18)]
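Since the algorithm appears in the figure above, we also give a minimal Python sketch of the same structure: a standard PCG iteration in which the preconditioner solve \(\mathbf {M}^k \mathbf {q}^{(i)} = \mathbf {r}^{(i)}\) is carried out by FFT. Here `matvec` stands for the \(O(N \log N)\) Toeplitz product of Sect. 4.1, `strang_column` assembles the first column of \(\mathbf {S}(\mathbf {T})\) from (26) assuming N is even, and all names are our own illustrative choices rather than those of the figure.

```python
import numpy as np

def strang_column(t_col, t_row):
    """First column s of the Strang circulant preconditioner S(T) in (26).

    t_col = (t_0, ..., t_{N-1}), t_row = (t_0, t_{-1}, ..., t_{1-N}); N even.
    """
    n = len(t_col)
    s = np.empty(n)
    s[: n // 2 + 1] = t_col[: n // 2 + 1]       # s_l = t_l      for 0 <= l <= N/2
    s[n // 2 + 1:] = t_row[n // 2 - 1:0:-1]     # s_l = t_{l-N}  for N/2 < l < N
    return s

def pcg(matvec, b, m_col, x0, tol=1e-10, max_iter=1000):
    """Sketch of the fast PCG iteration for (D^k + gamma*A) y = b, r = 1/2.

    matvec applies the coefficient matrix in O(N log N) (Sect. 4.1);
    m_col is the first column of the circulant preconditioner M^k in (27),
    e.g. m_col = gamma * strang_column(...) with m_col[0] += mean of 1/d.
    """
    m_eig = np.fft.fft(m_col)                   # eigenvalues of M^k, cf. (22)
    solve_M = lambda r: np.fft.ifft(np.fft.fft(r) / m_eig).real
    x = x0.copy()
    r = b - matvec(x)
    q = solve_M(r)                              # M^k q = r solved by FFT
    d = q.copy()
    rho = r @ q
    for _ in range(max_iter):
        Ad = matvec(d)
        alpha = rho / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        q = solve_M(r)
        rho, rho_old = r @ q, rho
        d = q + (rho / rho_old) * d
    return x
```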

In the above algorithm, all steps need O(N) operations except for the matrix-vector multiplications \(\mathbf {A}\mathbf {y}^{(0)}\) and \(\mathbf {A}\mathbf {p}^{(i-1)}\) and the solution of the system \(\mathbf {M}^k \mathbf {q}^{(i)} = \mathbf {r}^{(i)}\). The matrix-vector multiplications can be carried out in \(O(N \log N)\) operations as shown in Sect. 4.1. The preconditioner \(\mathbf {M}^k\) is an \(N\times N\) circulant matrix and so can be decomposed similarly to (22)

$$\begin{aligned} \mathbf {M}^k = \mathbf {F}_{N}^{-1} {\mathrm {diag}}(\mathbf {F}_{N} \mathbf {s}) \mathbf {F}_{N}, \end{aligned}$$

where \(\mathbf {s}\) is the first column vector of \(\mathbf {M}^k\) and \(\mathbf {F}_{N}\) is the \(N \times N\) discrete Fourier transform matrix. It can be inverted in \(O(N \log N)\) operations.

4.3 A Fast Preconditioned Conjugate Gradient Squared Algorithm

For \(r \ne 1/2\), the stiffness matrices \(\mathbf {A}\) and \(\mathbf {B}\) in the systems (18), (23), (24), and (25) are positive-definite but nonsymmetric Toeplitz matrices. We again use the Meerschaert scheme (18) as an example to demonstrate the idea. We still use a circulant preconditioner of the form \(\mathbf {M}^k = D^k\mathbf {I}+\gamma \mathbf {S}(\mathbf {A})\); note, however, that \(\mathbf {S}(\mathbf {A})\) differs from that of the symmetric case. Alternatively, we can replace \(\mathbf {S}(\mathbf {A})\) by a T. Chan preconditioner [4]. A fast preconditioned conjugate gradient squared algorithm for (18) can then be formulated as follows [2]:

[Figure b: the fast preconditioned conjugate gradient squared (PCGS) algorithm for (18)]
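As in the PCG case, the algorithm is given in the figure above; a compact sketch of such a preconditioned CGS solve, using SciPy's `cgs` with the FFT-based circulant preconditioner wrapped as a `LinearOperator` (the routine and variable names are our own), might read:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cgs

def solve_pcgs(matvec, b, m_col):
    """Sketch: preconditioned CGS for the nonsymmetric systems (r != 1/2).

    matvec applies the coefficient matrix, e.g. D^k + gamma*A, in O(N log N);
    m_col is the first column of the circulant preconditioner M^k.
    """
    n = len(b)
    m_eig = np.fft.fft(m_col)
    A = LinearOperator((n, n), matvec=matvec)
    # Preconditioner solve M^{-1} r carried out by FFT in O(N log N)
    M = LinearOperator((n, n),
                       matvec=lambda r: np.fft.ifft(np.fft.fft(r) / m_eig).real)
    y, info = cgs(A, b, M=M)
    if info != 0:
        raise RuntimeError(f"CGS failed to converge (info = {info})")
    return y
```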

5 Numerical Experiments

In this section we carry out numerical experiments to investigate the performance of the fast gradient projection method. These methods were implemented using Compaq Visual Fortran 6.6 on a ThinkPad T410 laptop.

5.1 An Optimal Control Problem with \(r = 0.5\)

We consider the optimal control problem (1) and (2) with the following data

$$\begin{aligned} \beta= & {} 0.2, \quad r = 0.5, \quad T = 1, \quad u_0 = 1,\nonumber \\ \tilde{y}= & {} \displaystyle \frac{100 x^3(1-x)^2 \sin (T-t)}{(1+xt)^2} +\displaystyle \frac{100 x^2(1-x)^2 \cos (T-t)}{1+xt} \nonumber \\&-\displaystyle \frac{\sin (T-t)}{\varGamma (1.2)}\Big [(x^{0.2}+(1-x)^{0.2}) -5(x^{1.2}+(1-x)^{1.2}) +\displaystyle \frac{50}{11}(x^{2.2}+(1-x)^{2.2})\Big ], \nonumber \\ f= & {} -\max \Big \{ \displaystyle \frac{100 x^2(1-x)^2 \sin (T-t)}{1+xt},1\Big \}, \nonumber \\ d= & {} \displaystyle \frac{1+xt}{100}. \end{aligned}$$
(28)

The analytical solution of this problem is given by

$$\begin{aligned} y= & {} 0, \nonumber \\ p= & {} -x^2(1-x)^2 \sin (T-t), \nonumber \\ u= & {} \max \Big \{\displaystyle \frac{100 x^2(1-x)^2 \sin (T-t)}{1+xt},1\Big \}. \end{aligned}$$
(29)

Since \(r = 0.5\), the coefficient matrices for the state and costate equations are symmetric. We solve the Meerschaert schemes (18) and (23) as well as the CN-WSGD schemes (24) and (25) by Gaussian elimination (Gauss), the fast conjugate gradient (CG) method, and the fast preconditioned conjugate gradient (PCG) method. We present the \(l^2\) errors of the approximations to y, p, and u, and the convergence rates of the PCG scheme, in Tables 1 and 2, respectively. Since Gaussian elimination consumes too much CPU time, we skip the Gaussian elimination solve for the finest mesh \(N = 2^{10}\). We present the CPU times consumed by the two schemes in Table 3.

Table 1 Performance of the Meerschaert scheme solved by Gauss, the CG, and the PCG solvers for the example in Sect. 5.1
Table 2 Performance of the CN-WSGD scheme solved by Gauss, the CG, and the PCG solvers for the example in Sect. 5.1
Table 3 The consumed CPU times of the two schemes solved by Gauss, the CG, and the PCG solvers for the example in Sect. 5.1
Table 4 Performance of the Meerschaert scheme solved by Gauss, the CGS, and the PCGS solvers for the example in Sect. 5.2
Table 5 Performance of the CN-WSGD scheme solved by Gauss, the CGS, and the PCGS solvers for the example in Sect. 5.2

We observe from Tables 1 and 2 that Gauss, the CG, and the PCG solvers generate almost identical numerical solutions. Thus, we only report the convergence rates of the PCG solutions as an indicator. The fast gradient projection method, as an iterative method for the coupled nonlinear system, retains the first-order convergence rate of the Meerschaert scheme and the second-order convergence rate of the CN-WSGD scheme, respectively. These results, together with those in Table 3, show the utility of the fast preconditioned conjugate gradient method.

5.2 An Optimal Control Problem with \(r = 0.8\)

We consider the same example as in Sect. 5.1 except that \(r = 0.8\) and

$$\begin{aligned} \tilde{y}= & {} \displaystyle \frac{100 x^3(1-x)^2 \sin (T-t)}{(1+xt)^2} +\displaystyle \frac{100 x^2(1-x)^2 \cos (T-t)}{1+xt} \nonumber \\&-\displaystyle \frac{2\sin (T-t)}{5\varGamma (1.2)}\Big [\big (x^{0.2}+4(1-x)^{0.2}\big ) \nonumber \\&\qquad -5\big (x^{1.2}+4(1-x)^{1.2}\big ) +\displaystyle \frac{50}{11}\big (x^{2.2}+4(1-x)^{2.2}\big )\Big ]. \end{aligned}$$
(30)

The true solution of this problem is the same as in Sect. 5.1.

As \(r = 0.8\), the stiffness matrices of the numerical schemes are nonsymmetric, so we solve the Meerschaert schemes (18) and (23) and the CN-WSGD schemes (24) and (25) by Gauss elimination, the fast conjugate gradient squared (CGS) method, and the fast preconditioned conjugate gradient squared (PCGS) method. We present the corresponding results in Tables 4, 5 and 6, respectively.

Table 6 The consumed CPU times of the two schemes solved by Gauss, the CGS, and the PCGS solvers for the example in Sect. 5.2
Table 7 Performance of the Meerschaert scheme solved by Gauss, the CGS, and the PCGS solvers for the example in Sect. 5.3
Table 8 Performance of the CN-WSGD scheme solved by Gauss, the CGS, and the PCGS solvers for the example in Sect. 5.3
Table 9 The consumed CPU times of the two schemes solved by Gauss, the CGS, and the PCGS solvers for the example in Sect. 5.3

We observe roughly the same convergence behavior as in Sect. 5.1. One important exception is that for the Meerschaert scheme on the finest mesh \(M = N = 2^{10}\) the CGS solver diverged; as a matter of fact, it did not converge within 4 hours of CPU time. In contrast, the PCGS solver converged in 6 minutes and 15 seconds of CPU time. In other words, the PCGS solver not only reduced the CPU time of the CGS solver, but also improved its convergence behavior. Furthermore, the CGS solver did converge for the CN-WSGD scheme at \(M = N = 2^{10}\). This shows an additional computational benefit of the second-order CN-WSGD scheme over the first-order Meerschaert scheme.

5.3 An Optimal Control Problem with \(r = 1\)

We consider the optimal control problem (1) and (2) with the following data

$$\begin{aligned} T= & {} 1, \quad u_0 = 1, \quad \beta =0.2, \quad r=1, \nonumber \\ \tilde{y}= & {} x^2(1-x)\sin (T-t) +\displaystyle \frac{100 x^2(1-x)^2 \sin (T-t)}{(1+xt)^2} \nonumber \\&+\displaystyle \frac{100 x(1-x)^2 \cos (T-t)}{1+xt} -\displaystyle \frac{\sin (T-t)}{\varGamma (1.2)}\Big [2(1-x)^{0.2} -5(1-x)^{1.2}\Big ], \nonumber \\ f= & {} -x^2(1-x)\cos (T-t) -\displaystyle \frac{(1+xt)\sin (T-t)}{100\varGamma (1.2)}\Big [2x^{0.2} -5x^{1.2}\Big ] \nonumber \\&-\max \Big \{ \displaystyle \frac{100 x(1-x)^2 \sin (T-t)}{1+xt},1\Big \}, \nonumber \\ d= & {} \displaystyle \frac{1+xt}{100}. \end{aligned}$$
(31)

The true solution to the problem is

$$\begin{aligned} y= & {} x^2(1-x)\sin (T-t) \nonumber \\ p= & {} -x(1-x)^2\sin (T-t) \nonumber \\ u= & {} \max \Big \{ \displaystyle \frac{100 x(1-x)^2 \sin (T-t)}{1+xt},1 \Big \}. \end{aligned}$$
(32)

This example exhibits even stronger nonsymmetry, as the governing fractional diffusion equations are one-sided: the state equation is a left-sided fractional diffusion equation, while the costate equation is a right-sided one. We solve the Meerschaert schemes (18) and (23) and the CN-WSGD schemes (24) and (25) by Gaussian elimination, the CGS method, and the PCGS method. We present the corresponding results in Tables 7, 8 and 9, respectively.

We observe similar convergence behavior to that in Sect. 5.2, except that the CGS solver diverged even for the CN-WSGD scheme at \(M=N=2^{10}\). In contrast, the PCGS method still converged for both the Meerschaert scheme and the CN-WSGD scheme. This again shows the utility of the PCGS method.