1 Introduction

In recent years, a lot of attention has been paid to the study of nonlocal problems, of which fractional differential equations are an instance. This is motivated by the fact that fractional derivatives are better suited to capturing long-range interactions, as well as memory effects. For instance, they have been used to describe anomalous transport phenomena [9, 10], option pricing [6], porous media flow [5], and viscoelastic materials [8], to name a few. It is only natural then, from the purely mathematical as well as the practical points of view, to try to optimize systems that are governed by these equations. In previous work [4], we dealt with a constrained optimization problem where the state is governed by a differential equation that presents nonlocal features in time as well as in space. Throughout the analysis presented in [4], the nonlocalities in time and space were intertwined, and this required us to develop several tools to analyze the nonlocal operator in space that are, in principle, not relevant to the nonlocality in time. It is thus our feeling that the extensive technicalities that ensued in the analysis of [4] obscured many of the unique features that the optimization of fractional differential equations possesses; for instance, the lack of time regularity regardless of the smoothness of the data. For this reason, our main objective in this note is to present a detailed study of the case where the state is governed by a time-fractional ordinary differential equation.

Let us be precise in our considerations. Given m, n ≥ 1, a final time T > 0, a desired state \(u_d \in L^2(0,T;\mathbb {R}^m)\), and a regularization parameter μ > 0, we define the cost functional as

$$\displaystyle \begin{aligned} J(u,z) = \frac 12 \int_0^T \left( |\mathcal{C} u - u_d|{}_m^2 + \mu |z|{}_n^2 \right) \, {\mathrm{d}} t, \end{aligned} $$
(1)

where we denote the Euclidean norm in \(\mathbb {R}^s\) by \(|\cdot|_s\) and \(\mathcal {C} \in \mathbb {M}^{m \times n}\); \(\mathbb {M}^{m \times n}\) denotes the set of all m–by–n matrices. The variable u is called the state, while the variable z is the control. The control and state are related by the so-called state equation, which we now describe. Given an initial condition \(\psi \in \mathbb {R}^n\), a forcing function \(f : (0,T] \to \mathbb {R}^n\), and a symmetric positive definite matrix \(\mathcal {A} \in \mathbb {M}^{n \times n}\), the state equation reads

$$\displaystyle \begin{aligned} \, {\mathrm{d}}_t^\gamma u + \mathcal{A} u = f + z, \ t \in (0,T], \qquad u(0) = \psi. \end{aligned} $$
(2)

Here, γ ∈ (0, 1) and \(\, {\mathrm {d}}_t^\gamma \) denotes the so-called left-sided Caputo fractional derivative of order γ, which is defined by [19, 28]

$$\displaystyle \begin{aligned} \, {\mathrm{d}}_t^\gamma v(t) = \frac 1{\varGamma(1-\gamma)} \int_0^t \frac 1{(t-\zeta)^\gamma} \dot v(\zeta) \, {\mathrm{d}} \zeta, \end{aligned} $$
(3)

where by \(\dot v\) we denote the usual derivative and Γ is the Gamma function. We must immediately remark that, in addition to (3), there are other, nonequivalent, definitions of fractional derivatives: the Riemann–Liouville, Grünwald–Letnikov, and Marchaud derivatives. In this work, we shall focus on the Caputo derivative since it allows for a standard initial condition in (2), a highly desirable feature in applications; see, for instance, the discussion in [14, Section E.4]. For further motivation and applications, we refer the reader to [11, 14].
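As a sanity check, definition (3) lends itself to direct numerical evaluation. The sketch below (the helper name and discretization parameters are ours) integrates the singular kernel exactly on each subinterval while sampling \(\dot v\) at midpoints, and compares the result with the known Caputo derivative of v(t) = t², namely \(2t^{2-\gamma}/\varGamma(3-\gamma)\).

```python
import math

def caputo_left(vdot, t, gamma, n=2000):
    """Approximate the left Caputo derivative (3) at time t: the kernel
    (t - z)**(-gamma) is integrated exactly on each subinterval, while
    vdot is sampled at the subinterval midpoint."""
    h = t / n
    total = 0.0
    for j in range(n):
        zl, zr = j * h, (j + 1) * h
        # exact value of int_{zl}^{zr} (t - z)**(-gamma) dz
        w = ((t - zl) ** (1 - gamma) - (t - zr) ** (1 - gamma)) / (1 - gamma)
        total += vdot(0.5 * (zl + zr)) * w
    return total / math.gamma(1 - gamma)

# v(t) = t**2, whose Caputo derivative is 2 t**(2 - gamma) / Gamma(3 - gamma)
gamma = 0.5
approx = caputo_left(lambda z: 2.0 * z, 1.0, gamma)
exact = 2.0 / math.gamma(3.0 - gamma)
print(abs(approx - exact))  # small discretization error
```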

The problem we shall be concerned with is to find (ŭ, z̆) such that

$$\displaystyle \begin{aligned} J({\breve{u}}, {\breve{z}}) = \min J(u,z) \end{aligned} $$
(4)

subject to the state equation (2) and the control constraints

$$\displaystyle \begin{aligned} a \preceq z \preceq b. \end{aligned} $$
(5)

Here \(a,b \in \mathbb {R}^n\), which we assume satisfy a ≼ b. The relation v ≼ w means that \(v_i \leq w_i\) for all i = 1, …, n.

To our knowledge, the first work devoted to the study of (4) is [2], where a formal Lagrangian formulation is discussed and optimality conditions are formally derived. The author of this work also presents a numerical scheme based on shifted Legendre polynomials. However, there is no analysis of the optimality conditions or of the numerical scheme. Other discretization schemes using finite elements [3], rational approximations [30], spectral methods [24, 32, 33], or other techniques have been considered. Most of these works do not provide a rigorous justification or analysis of their schemes, and the ones that do obtain error estimates under rather strong regularity assumptions on the state variable; namely, they require that \(\ddot u \in L^\infty (0,T;\mathbb {R}^n)\), which is rather problematic; see Theorem 2 below. In contrast, in this work, we carefully describe the regularity properties of the state equation and, on their basis, provide convergence (without rates) of the numerical scheme we propose.

Throughout our discussion, we will follow the standard notation and terminology. Nonstandard notation will be introduced in the course of our exposition. The rest of this work is organized as follows: Basic facts about fractional derivatives and integrals are presented in Section 1.1. We study the state equation in Section 2 where we construct the solution to problem (2), study its regularity, and present a somewhat new point of view for a classical scheme—the so-called L1 scheme. More importantly, we use the right regularity to obtain rates of convergence; an issue that has been largely ignored in the literature. With these ingredients at hand we proceed, in Section 3, to analyze the optimization problem (4); we show existence and uniqueness of an optimal state–control pair and propose a scheme to approximate it. We employ a piecewise linear (in time) approximation of the state and a piecewise constant approximation of the control. While not completely necessary for the analysis, we identify the discrete adjoint problem and use it to derive discrete optimality conditions. Finally, we show the strong convergence of the discrete optimal control to the continuous one. Owing to the reduced regularity of the solution to the state equation, this convergence, however, cannot have rates.

1.1 Fractional Derivatives and Integrals

We begin by recalling some fundamental facts about fractional derivatives and integrals. The left-sided Caputo fractional derivative is defined in (3). The right-sided Caputo fractional derivative of order γ is given by [19, 28]

$$\displaystyle \begin{aligned} \, {\mathrm{d}}_{T-t}^\gamma v(t) = -\frac 1{\varGamma(1-\gamma)} \int_t^T \frac 1{(\zeta - t)^\gamma} \dot v(\zeta) \, {\mathrm{d}} \zeta.\end{aligned} $$
(6)

For v ∈ L1(0, T), the left Riemann–Liouville fractional integral of order σ ∈ (0, 1) is defined by

$$\displaystyle \begin{aligned} I_t^\sigma[v](t) = \frac 1{\varGamma(\sigma)} \int_0^t \frac 1{(t-\zeta)^{1-\sigma}} v(\zeta) \, {\mathrm{d}} \zeta; \end{aligned} $$
(7)

see [28, Section 2]. Young’s inequality for convolutions immediately yields that, for p > 1, \(I_t^\sigma \) is a continuous operator from Lp(0, T) into itself. More importantly, a result by Flett [12] shows that

$$\displaystyle \begin{aligned} v \in L\log L(0,T) \implies I^\sigma_t[v] \in L^{\frac 1{1-\sigma}}(0,T). \end{aligned} $$
(8)

We refer the reader to [20] for the definition of the Orlicz space \(L\log L(0,T)\). This observation will be very important in subsequent developments. Notice finally that if \(v \in W^{1}_1(0,T)\), then we have that \(\, {\mathrm {d}}_t^\gamma v(t) = I_t^{1-\gamma } [\dot v] (t)\).

The generalized Mittag-Leffler function with parameters α > 0 and \(\beta \in \mathbb {R}\) is defined by

$$\displaystyle \begin{aligned} E_{\alpha,\beta}(z) = \sum_{k=0}^\infty \frac{z^k}{\varGamma(\alpha k + \beta)}, \quad z \in \mathbb{C}. \end{aligned} $$
(9)

We refer the reader to [14] for an account of the principal properties of the Mittag-Leffler function.
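For moderate arguments, (9) can be evaluated by simple truncation of the series; the minimal sketch below (naming ours) does exactly that, and is checked against the classical special cases \(E_{1,1}(z) = e^z\) and \(E_{2,1}(z) = \cosh \sqrt z\) for z ≥ 0. Dedicated algorithms are needed for large arguments.

```python
import math

def mittag_leffler(z, alpha, beta=1.0, kmax=200):
    """Truncated series (9); adequate for moderate |z| only."""
    s = 0.0
    for k in range(kmax):
        g = alpha * k + beta
        if g > 170.0:   # math.gamma overflows beyond ~171
            break
        s += z ** k / math.gamma(g)
    return s

# sanity checks: E_{1,1}(z) = exp(z), E_{2,1}(z) = cosh(sqrt(z)) for z >= 0
print(mittag_leffler(1.0, 1.0))   # ~ e
print(mittag_leffler(4.0, 2.0))   # ~ cosh(2)
```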

2 The State Equation

In this section, we construct the solution to (2), thus showing its existence and uniqueness. This shall be of utmost importance not only when showing the existence and uniqueness of solutions to our optimization problem, but also when we deal with the discretization, as we will study the smoothness of u. To shorten notation, in this section we set

$$\displaystyle \begin{aligned} g=f+z, \end{aligned}$$

where f is the forcing term and z is the control in (2).

2.1 Solution Representation and Regularity

Let us now construct the solution to (2) and review its main properties. We will adapt the arguments of [26] to our setting. Since the matrix \(\mathcal {A}\) is symmetric and positive definite, it is orthogonally diagonalizable; meaning that there are \(\{\lambda _\ell ,\xi _\ell \}_{\ell =1}^n \subset \mathbb {R}_+ \times \mathbb {R}^n\) such that

$$\displaystyle \begin{aligned} \mathcal{A} \xi_\ell = \lambda_\ell \xi_\ell, \qquad \xi_{\ell_1} \cdot \xi_{\ell_2} = \delta_{\ell_1,\ell_2}. \end{aligned}$$

This, in particular, implies that the vectors \(\{\xi _\ell \}_{\ell =1}^n\) form an orthonormal basis of \(\mathbb {R}^n\). Moreover, for any vector \(v \in \mathbb {R}^n\), we can define \(|v|{ }_{\mathcal {A}}^2 = v\cdot \mathcal {A} v\), which turns out to be a norm that satisfies

$$\displaystyle \begin{aligned} \lambda_1 |v|{}^2_n \leq |v|{}_{\mathcal{A}}^2 \leq \lambda_n |v|{}^2_n, \quad \forall v \in \mathbb{R}^n. \end{aligned} $$
(10)

We set

$$\displaystyle \begin{aligned} \| v \|{}_{L^2_{\mathcal{A}}(0,T;\mathbb{R}^n)}^2 = \int_0^T |v|{}_{\mathcal{A}}^2 \, {\mathrm{d}} t. \end{aligned} $$
(11)

With these properties of the matrix \(\mathcal {A}\) at hand, we propose the following solution ansatz:

$$\displaystyle \begin{aligned} u(t) = \sum_{\ell=1}^n u_{\ell}(t) \xi_{\ell}, \quad u_{\ell}(t) = u(t) \cdot \xi_{\ell}, \end{aligned} $$
(12)

where the coefficients \(u_\ell(t)\) satisfy

$$\displaystyle \begin{aligned} \, {\mathrm{d}}_t^\gamma u_\ell(t) + \lambda_\ell u_{\ell}(t) = g_{\ell}(t), \ t \in (0,T], \qquad u_{\ell}(0) = \psi_{\ell}, \end{aligned} $$
(13)

for \(\ell \in \{1, \ldots, n\}\). Here, \(g_\ell(t) = g(t) \cdot \xi_\ell\) and \(\psi_\ell = \psi \cdot \xi_\ell\). The importance of this orthogonal decomposition lies in the fact that we have reduced problem (2) to a decoupled system of scalar equations. The theory of fractional ordinary differential equations [28] gives, for \(\ell \in \{1, \ldots, n\}\), a unique function \(u_\ell\) satisfying problem (13). In addition, standard considerations, which formally entail taking the Laplace transform of (13), yield that

$$\displaystyle \begin{aligned} u_{\ell}(t) = E_{\gamma,1}(-\lambda_{\ell} t^\gamma) \psi_{\ell} + \int_0^t (t-\zeta)^{\gamma-1} E_{\gamma,\gamma}(-\lambda_{\ell}(t-\zeta)^\gamma) g_{\ell}(\zeta) \, {\mathrm{d}} \zeta. \end{aligned} $$
(14)

We refer the reader to [25,26,27] for details. This representation shall prove rather useful in describing the existence, uniqueness, and regularity of u. To concisely state the corresponding result, let us define

$$\displaystyle \begin{aligned} \mathbb{U} = \{ w \in L^2(0,T;\mathbb{R}^n): \, {\mathrm{d}}_t^\gamma w \in L^2(0,T;\mathbb{R}^n) \}. \end{aligned} $$
(15)

With this notation, a specialization of the results of [26] to the substantially simpler case when \(\mathcal {A}\) is a positive definite matrix (and thus the spaces are finite dimensional) yields the following result.

Theorem 1 (Existence and Uniqueness)

Assume that \(g \in L^2(0,T;\mathbb {R}^n)\) . Problem (2) has a unique solution\(u \in \mathbb {U}\), given by (12) and (14). Moreover, the following a priori estimate holds

$$\displaystyle \begin{aligned} I_t^{1-\gamma}\left[|u|{}_n^2\right](T) + \| u \|{}_{L^2_{\mathcal{A}}(0,T;\mathbb{R}^n)}^2 \lesssim \varLambda_\gamma^2(\psi,g), \end{aligned} $$
(16)

where, for \(v \in \mathbb {R}^n\) and \(h \in L^2(0,T;\mathbb {R}^n)\) we have

$$\displaystyle \begin{aligned} \varLambda_\gamma^2(v,h) = I_t^{1-\gamma}\left[|v|{}_n^2\right](T) + \| h \|{}_{L^2(0,T;\mathbb{R}^n)}^2, \end{aligned} $$
(17)

where we implicitly identified v with the constant function \([0,T] \ni t \mapsto v \in \mathbb {R}^n\) . In this estimate, the hidden constant is independent of ψ, g, and u.
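The construction above is easy to mirror numerically. The sketch below (names are ours) diagonalizes a small symmetric positive definite matrix \(\mathcal A\) and assembles u(t) from (12) and (14) in the homogeneous case g ≡ 0, where each mode reduces to \(u_\ell(t) = E_{\gamma,1}(-\lambda_\ell t^\gamma)\psi_\ell\); at t = 0 it recovers ψ, since \(E_{\gamma,1}(0) = 1\).

```python
import math
import numpy as np

def mittag_leffler(z, alpha, beta=1.0, kmax=200):
    # truncated series (9); adequate for moderate |z|
    s = 0.0
    for k in range(kmax):
        g = alpha * k + beta
        if g > 170.0:   # math.gamma overflows beyond ~171
            break
        s += z ** k / math.gamma(g)
    return s

def state_homogeneous(A, psi, t, gamma):
    """Evaluate u(t) via (12) and (14) with g = 0:
    u(t) = sum_l E_{gamma,1}(-lambda_l t^gamma) (psi . xi_l) xi_l."""
    lam, Q = np.linalg.eigh(A)   # A = Q diag(lam) Q^T, columns of Q orthonormal
    coeff = Q.T @ psi            # psi_l = psi . xi_l
    modes = np.array([mittag_leffler(-l * t ** gamma, gamma) for l in lam])
    return Q @ (modes * coeff)

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
psi = np.array([1.0, -1.0])
print(state_homogeneous(A, psi, 0.0, 0.5))  # equals psi, since E_{gamma,1}(0) = 1
```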

Having obtained conditions that guarantee the existence and uniqueness for (2), we now study its regularity. This is important since, as is well known, smoothness and rate of approximation go hand in hand. This is exactly the content of direct and converse theorems in approximation theory [1, 17]. Consequently, any rigorous study of an approximation scheme must be concerned with the regularity of the solution. This, we believe, is an issue that for this problem has been largely ignored in the literature since, essentially, the solution to (2) is not smooth. Let us now follow [25, 26] and elaborate on this matter. The essence of the issue is already present in the case n = 1, where (14) is the solution. Let us, to further simplify the discussion, set \(\mathcal {A} = 1\), g ≡ 0, and ψ = 1. In this case, the solution verifies the following asymptotic estimate:

$$\displaystyle \begin{aligned} u(t) = E_{\gamma,1}(-t^\gamma) = 1 - \frac 1{\varGamma(1+\gamma)} t^\gamma + {\mathcal{O}}(t^{2\gamma}), \qquad t \downarrow 0. \end{aligned}$$

If this is the case we then expect that, as t ↓ 0, \(\dot u(t) \approx t^{\gamma -1}\) and \(\ddot u(t) \approx t^{\gamma -2}\). Notice that, since γ ∈ (0, 1), the function ω1(t) = tγ−1 belongs to \(L\log L(0,T)\) but \(\omega_1 \notin L^{1+\epsilon}(0,T)\) for any \(\epsilon \geq \gamma(1-\gamma)^{-1}\). Similarly, the function ω2(t) = tγ−2 is not Lebesgue integrable, but

$$\displaystyle \begin{aligned} \int_0^T t^\sigma |\omega_2(t)|{}^2 \, {\mathrm{d}} t = \int_0^T t^{\sigma+2(\gamma-2)} \, {\mathrm{d}} t < \infty \iff \sigma > 3-2\gamma, \end{aligned}$$

which implies that ω2 belongs to the weighted Lebesgue space L2(tσ;0, T), where σ > 3 − 2γ > 1. The considerations given above tell us that we should expect the following:

$$\displaystyle \begin{aligned} \dot u \in L\log L(0,T;\mathbb{R}^n) \qquad \ddot u \in L^2(t^\sigma; 0,T;\mathbb{R}^n), \ \sigma > 3-2\gamma. \end{aligned} $$
(18)

The justification of this heuristic is the content of the next result. For a proof, we refer the reader to [26, Theorem 8].
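The heuristic is also easy to probe numerically. Using a truncated series for \(E_{\gamma,1}\) (a rough evaluation, adequate for small arguments; names are ours), the remainder \(u(t) - (1 - t^\gamma/\varGamma(1+\gamma))\) behaves like \(t^{2\gamma}\) as t ↓ 0, with the ratio approaching \(1/\varGamma(1+2\gamma)\):

```python
import math

def ml_series(z, alpha, beta=1.0, kmax=200):
    # truncated series (9); adequate for small |z|
    s = 0.0
    for k in range(kmax):
        g = alpha * k + beta
        if g > 170.0:   # math.gamma overflows beyond ~171
            break
        s += z ** k / math.gamma(g)
    return s

gamma = 0.5
for t in (1e-2, 1e-3, 1e-4):
    u = ml_series(-t ** gamma, gamma)
    two_terms = 1.0 - t ** gamma / math.gamma(1.0 + gamma)
    # remainder ~ t**(2*gamma) / Gamma(1 + 2*gamma), which is ~ t here
    print(t, (u - two_terms) / t ** (2.0 * gamma))
```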

Theorem 2 (Regularity)

Assume that \(g \in H^2(0,T;\mathbb {R}^n)\) . Then u, the solution to (2), satisfies (18) and, for t ∈ (0, T], we have the following asymptotic estimate:

$$\displaystyle \begin{aligned} \left(\int_0^T \zeta^\sigma |\ddot u(\zeta)|{}_n^2 \, {\mathrm{d}} \zeta \right)^{1/2} + t^{1-\gamma} \left| \dot u(t) - \frac 1t (u(t) - \psi) \right|{}_n \lesssim |\psi|{}_n + \| g \|{}_{H^2(0,T;\mathbb{R}^n)}, \end{aligned}$$

where σ > 3 − 2γ. The hidden constant is independent of t but blows up as γ ↓ 0+.

Remark 1 (Extensions)

Under the correct framework, the conclusion of Theorem 2 can be extended to the case where \(\mathcal {A}\) is an operator acting on a Hilbert space \(\mathcal H\) and Equation (2) is understood in a Gelfand triple \(\mathcal V \hookrightarrow \mathcal H \hookrightarrow \mathcal V'\); see [26] for details.

2.2 Discretization of the State Equation

Now that we have studied the state equation and the regularity properties of its solution u, we proceed to discretize it. To do so, we denote by \(\mathcal {K} \in \mathbb N\) the number of time steps. We define the (uniform) time step \(\tau = T/\mathcal {K} >0\) and set \(t_k = k\tau\) for \(k=0,\ldots ,\mathcal {K}\). We denote the time partition by \(\mathcal {T} = \{t_k\}_{k=0}^{\mathcal {K}}\). We define the space of continuous functions that are piecewise linear over the partition \(\mathcal {T}\) as follows:

$$\displaystyle \begin{aligned} \mathbb{U}(\mathcal{T}) = \left\{ W \in C([0,T];\mathbb{R}^n): W|{}_{(t_k,t_{k+1}]} \in \mathbb{P}_1(\mathbb{R}^n), k = 0, \ldots, \mathcal{K}-1 \right\}. \end{aligned} $$
(19)

We also define the space of piecewise constant functions

$$\displaystyle \begin{aligned} \mathbb{Z}(\mathcal{T}) = \left\{ W \in BV(0,T;\mathbb{R}^n): W|{}_{(t_k,t_{k+1}]} \in \mathbb{P}_0(\mathbb{R}^n), k = 0, \ldots, \mathcal{K}-1 \right\}, \end{aligned} $$
(20)

and the \(L^2(0,T;\mathbb {R}^n)\)-orthogonal projection onto \(\mathbb {Z}(\mathcal {T})\), that is, the operator \(\varPi _{\mathcal {T}} : L^2(0,T;\mathbb {R}^n) \to \mathbb {Z}(\mathcal {T})\) defined by

$$\displaystyle \begin{aligned} \int_0^T \left( r - \varPi_{\mathcal{T}} r \right) \cdot Z^{\boldsymbol{\tau}} \, {\mathrm{d}} t = 0 \quad \forall Z^{\boldsymbol{\tau}} \in \mathbb{Z}(\mathcal{T}). \end{aligned}$$

We remark that \(\varPi _{\mathcal {T}}\) satisfies

$$\displaystyle \begin{aligned} \| r - \varPi_{\mathcal{T}} r \|{}_{L^2(0,T;\mathbb{R}^n)} \lesssim \tau \| \dot r \|{}_{L^2(0,T;\mathbb{R}^n)}, \end{aligned} $$
(21)

where the hidden constant is independent of r and τ.
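Since \(\varPi_{\mathcal {T}}\) acts intervalwise (on each cell the projection is simply the average of r there), both the projection and estimate (21) are straightforward to verify numerically. A minimal sketch (names and the test function are ours) that observes the first-order rate by halving τ:

```python
import math

def project_piecewise_constant(r, K, T=1.0, q=64):
    """L2-orthogonal projection onto Z(T): on each interval the projection
    is the average of r, computed here by midpoint quadrature."""
    tau = T / K
    vals = []
    for k in range(K):
        h = tau / q
        avg = sum(r(k * tau + (i + 0.5) * h) for i in range(q)) * h / tau
        vals.append(avg)
    return vals

def l2_error(r, vals, K, T=1.0, q=64):
    # discrete L2(0,T) norm of r - Pi_T r via midpoint quadrature
    tau = T / K
    err2 = 0.0
    for k in range(K):
        h = tau / q
        for i in range(q):
            t = k * tau + (i + 0.5) * h
            err2 += (r(t) - vals[k]) ** 2 * h
    return math.sqrt(err2)

r = math.sin
e20 = l2_error(r, project_piecewise_constant(r, 20), 20)
e40 = l2_error(r, project_piecewise_constant(r, 40), 40)
print(e40 / e20)  # ~ 0.5: first-order convergence, as in (21)
```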

For a function \(\phi \in BV(0,T;\mathbb {R}^n)\) we set \(\phi^k = \lim_{\epsilon \downarrow 0} \phi(t_k - \epsilon)\) and \(\phi ^{\boldsymbol {\tau }} = \{ \phi ^k \}_{k=0}^{\mathcal {K}}\), which can be uniquely identified with either an element of \(\mathbb {U}(\mathcal {T})\) or of \(\mathbb {Z}(\mathcal {T})\) by the procedures we describe now. To \(\phi^{\boldsymbol{\tau}}\) we associate \(\bar \phi ^{\boldsymbol {\tau }} \in \mathbb {Z}(\mathcal {T})\) defined by

$$\displaystyle \begin{aligned} \bar\phi^{\boldsymbol{\tau}}(0) = \phi^0, \quad \bar\phi^{\boldsymbol{\tau}}|{}_{(t_k,t_{k+1}]}(t) = \phi^{k+1}, \ k = 0, \ldots,\mathcal{K}-1. \end{aligned} $$
(22)

We also associate \(\hat \phi ^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) via

$$\displaystyle \begin{aligned} \hat\phi^{\boldsymbol{\tau}}(0) = \phi^0, \quad \hat\phi^{\boldsymbol{\tau}} |{}_{(t_k,t_{k+1}]}(t) = \frac{t_{k+1} - t}\tau \phi^k + \frac{t-t_k}\tau \phi^{k+1}, \ k = 0, \ldots, \mathcal{K}-1.\end{aligned} $$
(23)

Notice that

$$\displaystyle \begin{aligned} \| \hat\phi^{\boldsymbol{\tau}} \|{}_{L^\infty(0,T;\mathbb{R}^n)} = \| \bar\phi^{\boldsymbol{\tau}} \|{}_{L^\infty(0,T;\mathbb{R}^n)} = \| \phi^{\boldsymbol{\tau}} \|{}_{\ell^\infty(\mathbb{R}^n)} \end{aligned}$$

and that

$$\displaystyle \begin{aligned} \| \bar\phi^{\boldsymbol{\tau}} \|{}_{L^2(0,T;\mathbb{R}^n)}^2 = \tau \sum_{k=1}^{\mathcal{K}} |\phi^k|{}_n^2. \end{aligned}$$

Finally, for a sequence \(\phi^{\boldsymbol{\tau}}\) we also define, for \(k=0,\ldots , \mathcal {K}-1\),

$$\displaystyle \begin{aligned} {\mathfrak{d}} \phi^{k+1} = \tau \dot{\hat\phi}^{\boldsymbol{\tau}}|{}_{(t_k,t_{k+1}]} = \phi^{k+1} - \phi^k , \end{aligned} $$
(24)

which can be understood as a mapping \({\mathfrak {d}} : \mathbb {U}(\mathcal {T}) \to \mathbb {Z}(\mathcal {T})\).

Having introduced this notation, we propose to discretize (2) by a collocation method over \(\mathbb {U}(\mathcal {T})\). In other words, we seek \(\hat U^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) such that

$$\displaystyle \begin{aligned} \hat U^{\boldsymbol{\tau}}(0) = \psi, \end{aligned} $$
(25)

and, for every \(k = 0, \ldots, \mathcal {K}-1\), it satisfies

$$\displaystyle \begin{aligned} \, {\mathrm{d}}_t^\gamma \hat{U}^{\boldsymbol{\tau}}(t_{k+1}) + \mathcal{A} \hat{U}^{\boldsymbol{\tau}}(t_{k+1}) = \varPi_{\mathcal{T}} g(t_{k+1}). \end{aligned} $$
(26)

Remark 2 (Derivation of the Scheme)

In the literature, (26) is commonly referred to as the L1 scheme [16, 21, 22, 29], even though it is usually presented in a different form. Let us show that (26) is equivalent to the schemes found in these references. To see the relation, it suffices to compute, for a function \(\hat W^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\), the value of \(\, {\mathrm {d}}_t^{\gamma } \hat W^{\boldsymbol {\tau }}(t_{k+1})\). By definitions (3), (23), and (24), we obtain that

$$\displaystyle \begin{aligned} \begin{aligned} \, {\mathrm{d}}_t^\gamma \hat W^{\boldsymbol{\tau}}(t_{k+1}) &= \frac 1{\varGamma(1-\gamma)} \int_0^{t_{k+1}} \frac 1{(t_{k+1} - \zeta)^\gamma} \dot{\hat W}^{\boldsymbol{\tau}}(\zeta) \, {\mathrm{d}} \zeta \\ &= \frac{\tau^{-1}}{\varGamma(1-\gamma)} \sum_{j=0}^k {\mathfrak{d}} W^{j+1} \int_{t_j}^{t_{j+1}} \frac 1{(t_{k+1} - \zeta)^\gamma} \, {\mathrm{d}} \zeta = \sum_{j=0}^k a_j^k {\mathfrak{d}} W^{j+1}, \end{aligned} \end{aligned} $$
(27)

where the coefficients \(a_j^k\) satisfy

$$\displaystyle \begin{aligned} \begin{aligned} a_j^k &= \frac{\tau^{-1}}{\varGamma(1-\gamma)} \int_{t_j}^{t_{j+1}} \frac 1{(t_{k+1} - \zeta)^\gamma} \, {\mathrm{d}} \zeta \\ &= \frac{\tau^{-1}}{\varGamma(2-\gamma)} \left[ (t_{k+1} - t_j)^{1-\gamma} - (t_{k+1} - t_{j+1})^{1-\gamma} \right] \\ &= \frac{\tau^{-\gamma}}{\varGamma(2-\gamma)} \left[ (k+1-j)^{1-\gamma} - (k-j)^{1-\gamma} \right]. \end{aligned} \end{aligned} $$
(28)

Here, in the last step, we used that the time step is uniform and of size τ. The fact that the time step is uniform also implies that

$$\displaystyle \begin{aligned} a_{k-j}^k = \frac{\tau^{-\gamma}}{\varGamma(2-\gamma)} \left[ (j+1)^{1-\gamma} - j^{1-\gamma} \right] = a^{k+j}_k, \end{aligned}$$

so that, after the change of indices m = k − j, we obtain

$$\displaystyle \begin{aligned} \begin{aligned} \, {\mathrm{d}}_t^\gamma \hat W^{\boldsymbol{\tau}}(t_{k+1}) & = \frac{\tau^{-\gamma}}{\varGamma(2-\gamma)} \sum_{m=0}^k b_m {\mathfrak{d}} W^{k+1-m} \\ &= \frac{\tau^{-\gamma}}{\varGamma(2-\gamma)} \left( b_0 W^{k+1} + \sum_{m=1}^{k} (b_m - b_{m-1}) W^{k+1-m} -b_k W^0\right), \end{aligned} \end{aligned} $$
(29)

with \(b_m = (m+1)^{1-\gamma} - m^{1-\gamma}\). The expression above is what is commonly referred to as the L1 scheme.
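Formula (29) is straightforward to implement. As a consistency check (helper names are ours), note that w(t) = t coincides with its piecewise linear interpolant, so (29) must reproduce the exact Caputo derivative \(t^{1-\gamma}/\varGamma(2-\gamma)\) at every node, up to rounding:

```python
import math

def l1_weights_b(K, gamma):
    # b_m = (m+1)**(1-gamma) - m**(1-gamma)
    return [(m + 1) ** (1 - gamma) - m ** (1 - gamma) for m in range(K)]

def l1_caputo(W, tau, gamma):
    """Evaluate d_t^gamma of the piecewise linear interpolant of the nodal
    values W at every node t_{k+1}, via formula (29)."""
    K = len(W) - 1
    b = l1_weights_b(K, gamma)
    c = tau ** (-gamma) / math.gamma(2 - gamma)
    out = []
    for k in range(K):
        s = b[0] * W[k + 1] - b[k] * W[0]
        s += sum((b[m] - b[m - 1]) * W[k + 1 - m] for m in range(1, k + 1))
        out.append(c * s)
    return out

# w(t) = t is reproduced exactly by its interpolant, so (29) returns the
# exact Caputo derivative t**(1-gamma)/Gamma(2-gamma) up to rounding
gamma, K, T = 0.5, 16, 1.0
tau = T / K
approx = l1_caputo([k * tau for k in range(K + 1)], tau, gamma)
exact = [((k + 1) * tau) ** (1 - gamma) / math.gamma(2 - gamma) for k in range(K)]
print(max(abs(a - e) for a, e in zip(approx, exact)))  # ~ machine precision
```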

2.2.1 Stability

Let us discuss the stability of scheme (26) as originally detailed in [26, Section 3.2.2]. We begin by exploring the properties of the coefficients \(a_j^k\).

Lemma 1 (Properties of \(a_j^k\))

Assume that the time step is given by τ > 0. For every\(k=0,\ldots ,\mathcal {K}-1\)and j = 0, …, k, the coefficients\(a_j^k\), defined in (28), satisfy

$$\displaystyle \begin{aligned} 0 < a_j^k, \qquad a_j^k < a_{j+1}^k, \qquad a_j^{k+1} < a_j^k. \end{aligned}$$

Moreover \(a_k^k = \tau ^{-\gamma } /\varGamma (2-\gamma )\).

Proof

The positivity of the coefficients follows from the fact that, for j = 0, …, k and ζ ∈ (tj, tj+1), we have that tk+1 − ζ > 0. We now show that the coefficients are increasing in the lower index. In fact, an application of the mean value theorem yields

$$\displaystyle \begin{aligned} a_j^k = \frac{\tau^{-1}}{\varGamma(1-\gamma)} \int_{t_j}^{t_{j+1}} \frac{\, {\mathrm{d}} \zeta}{(t_{k+1}-\zeta)^\gamma} = \frac 1{\varGamma(1-\gamma)} \frac 1{(t_{k+1} - \zeta_j)^\gamma} \end{aligned}$$

for some \(\zeta_j \in (t_j, t_{j+1})\). Since the function \(\zeta \mapsto (t_{k+1}-\zeta)^{-\gamma}\) is increasing for \(\zeta < t_{k+1}\), we conclude that \(a_j^k < a_{j+1}^k\). To show that the coefficients are decreasing in the upper index, we note that

$$\displaystyle \begin{aligned} t_{k+1} > t_k \implies \frac 1{(t_{k+1} - \zeta)^\gamma} < \frac 1{(t_k-\zeta)^\gamma}, \end{aligned}$$

so that \(a_j^{k+1} < a_j^k\). Finally, we note that

$$\displaystyle \begin{aligned} a_k^k = \frac{\tau^{-1}}{\varGamma(1-\gamma)} \int_{t_k}^{t_{k+1}} \frac{\, {\mathrm{d}} \zeta}{(t_{k+1}-\zeta)^\gamma} = \frac{ \tau^{-\gamma} }{\varGamma(2-\gamma)}. \end{aligned}$$

This concludes the proof. □
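The properties in Lemma 1 can also be double-checked numerically from (28); the short sketch below (naming ours) verifies positivity, monotonicity in both indices, and the value of \(a_k^k\) on a sample mesh.

```python
import math

def a_weight(j, k, tau, gamma):
    # formula (28) on a uniform mesh
    c = tau ** (-gamma) / math.gamma(2 - gamma)
    return c * ((k + 1 - j) ** (1 - gamma) - (k - j) ** (1 - gamma))

tau, gamma, K = 0.1, 0.3, 50
for k in range(K - 1):
    for j in range(k + 1):
        assert a_weight(j, k, tau, gamma) > 0                       # positivity
        assert a_weight(j, k + 1, tau, gamma) < a_weight(j, k, tau, gamma)
        if j < k:                                                   # increasing in j
            assert a_weight(j, k, tau, gamma) < a_weight(j + 1, k, tau, gamma)
    assert abs(a_weight(k, k, tau, gamma)
               - tau ** (-gamma) / math.gamma(2 - gamma)) < 1e-12
print("all weight properties hold")
```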

With the results of Lemma 1 at hand, we can now show stability of the scheme.

Theorem 3 (Stability)

For every \(\mathcal {K} \in \mathbb N\) , the scheme (26) is unconditionally stable and satisfies

$$\displaystyle \begin{aligned} I_t^{1-\gamma}\left[|\bar{U}^{\boldsymbol{\tau}}|{}_n^2\right](T) + \| \bar{U}^{\boldsymbol{\tau}} \|{}_{L^2_{\mathcal{A}}(0,T;\mathbb{R}^n)}^2 \lesssim \varLambda_\gamma^2(\psi,g), \end{aligned}$$

where the hidden constant is independent of ψ, g, \(\bar {U}^{\boldsymbol {\tau }}\), and \(\mathcal {K}\); \(\varLambda _\gamma \) is defined in (17).

Proof

Multiply (26) by \(2U^{k+1}\) to obtain

$$\displaystyle \begin{aligned} 2 \, {\mathrm{d}}_t^\gamma \hat{U}^{\boldsymbol{\tau}} (t_{k+1}) \cdot U^{k+1} + 2|U^{k+1}|{}_{\mathcal{A}}^2 \leq 2|\varPi_{\mathcal{T}} g^{k+1}|{}_n |U^{k+1}|{}_n, \end{aligned} $$
(30)

where on the right-hand side we applied the Cauchy–Schwarz inequality; \(|\cdot |{ }_{\mathcal {A}}\) is defined in Section 2.1. We thus use (10), together with Young’s inequality, to obtain

$$\displaystyle \begin{aligned} 2 \, {\mathrm{d}}_t^\gamma \hat{U}^{\boldsymbol{\tau}} (t_{k+1}) \cdot U^{k+1} + |U^{k+1}|{}_{\mathcal{A}}^2 \leq \lambda_1^{-1}|\varPi_{\mathcal{T}} g^{k+1}|{}^2_n . \end{aligned}$$

We now invoke (27) and deduce that

$$\displaystyle \begin{aligned} \, {\mathrm{d}}_t^\gamma \hat{U}^{\boldsymbol{\tau}} (t_{k+1}) \cdot U^{k+1} &= a_k^k |U^{k+1}|{}_n^2 + \sum_{j=0}^{k-1} a_j^k U^{j+1} \cdot U^{k+1} - \sum_{j=1}^k a_j^k U^j \cdot U^{k+1}\\ & \quad - a_0^k U^0\cdot U^{k+1} \\ &= a_k^k |U^{k+1}|{}_n^2 + \sum_{j=1}^k (a_{j-1}^k - a_j^k) U^j \cdot U^{k+1} - a_0^k U^0 \cdot U^{k+1}. \end{aligned} $$

With this at hand (30) reduces to

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle 2a_k^k |U^{k+1}|{}_n^2 + |U^{k+1}|{}_{\mathcal{A}}^2 \\ &\displaystyle &\displaystyle \quad \leq \lambda_1^{-1} |\varPi_{\mathcal{T}} g^{k+1}|{}^2_n + 2 \sum_{j=1}^k (a_j^k - a_{j-1}^k) U^j \cdot U^{k+1} +2 a_0^k U^0 \cdot U^{k+1}. \end{array} \end{aligned} $$

Since, as stated in Lemma 1, we have that \(a_j^k-a_{j-1}^k>0\) we estimate

$$\displaystyle \begin{aligned} 2 \sum_{j=1}^k (a_j^k - a_{j-1}^k) U^j \cdot U^{k+1} &\leq \sum_{j=1}^k (a_j^k - a_{j-1}^k) ( |U^j|{}_n^2 + |U^{k+1}|{}_n^2 ) \\ &=\sum_{j=1}^k (a_j^k - a_{j-1}^k) |U^j|{}_n^2 +(a_k^k -a_0^k) |U^{k+1}|{}_n^2,\end{aligned} $$

which can be used to obtain that

$$\displaystyle \begin{aligned} a_k^k |U^{k+1}|{}_n^2 + \sum_{j=1}^k a_{j-1}^k |U^j|{}_n^2+ |U^{k+1}|{}_{\mathcal{A}}^2 \leq \lambda_1^{-1}|\varPi_{\mathcal{T}} g^{k+1}|{}^2_n + a_0^k |\psi|{}_n^2 + \sum_{j=1}^k a_j^k |U^j|{}_n^2.\end{aligned} $$
(31)

Notice now that, since the \(a_j^k\) are defined as in (28) and \(b_m = (m+1)^{1-\gamma} - m^{1-\gamma}\), for every j = 0, …, k we have

$$\displaystyle \begin{aligned} a_j^k = \frac{\tau^{-\gamma}}{\varGamma(2-\gamma)} b_{k-j}.\end{aligned} $$

Thus, the change of indices m = k + 1 − j on the left-hand side and l = k − j on the right-hand side of (31), respectively, yields

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{\tau^{-\gamma}}{\varGamma(2-\gamma)} \sum_{m=0}^k b_m |U^{k+1-m}|{}_n^2 +|U^{k+1}|{}_{\mathcal{A}}^2 &\displaystyle \leq&\displaystyle \lambda_1^{-1}|\varPi_{\mathcal{T}} g^{k+1}|{}^2_n + a_0^k |\psi|{}_n^2 \\ &\displaystyle &\displaystyle + \frac{\tau^{-\gamma}}{\varGamma(2-\gamma)} \sum_{l=0}^{k-1} b_l |U^{k-l}|{}_n^2, \end{array} \end{aligned} $$

where the sum on the right-hand side vanishes for k = 0. Multiply by τ and add over k to obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \frac{\tau^{1-\gamma}}{\varGamma(2-\gamma)} \sum_{k=0}^{\mathcal{K}-1} b_k |U^{\mathcal{K}-k}|{}_n^2 + \| \bar{U}^{\boldsymbol{\tau}} \|{}_{L^2_{\mathcal{A}}(0,T;\mathbb{R}^n)}^2&\displaystyle \leq&\displaystyle \lambda_1^{-1}\| \varPi_{\mathcal{T}} g \|{}_{L^2(0,T;\mathbb{R}^n)}^2\\ &\displaystyle &\displaystyle + \tau |\psi|{}_n^2 \sum_{k=0}^{\mathcal{K}-1} a_0^k, \end{array} \end{aligned} $$
(32)

where \( \| \bar {U}^{\boldsymbol {\tau }} \|{ }_{L^2_{\mathcal {A}}(0,T;\mathbb {R}^n)}\) is defined by (11). Notice now that, since the time step is uniform,

$$\displaystyle \begin{aligned} \tau \sum_{k=0}^{\mathcal{K}-1} a_0^k= \frac{\tau^{1-\gamma}}{\varGamma(2-\gamma)} \sum_{k=0}^{\mathcal{K}-1} b_k = \frac{T^{1-\gamma}}{\varGamma(2-\gamma)} = I_t^{1-\gamma}[1](T). \end{aligned} $$
(33)

We now analyze the first term on the left-hand side of (32): Changing indices via \(l+1=\mathcal {K}-k\) gives

$$\displaystyle \begin{aligned} \begin{aligned} \frac{\tau^{1-\gamma}}{\varGamma(2-\gamma)} \sum_{k=0}^{\mathcal{K}-1} b_k |U^{\mathcal{K}-k}|{}_n^2 &= \frac{\tau^{1-\gamma}}{\varGamma(2-\gamma)} \sum_{l=0}^{\mathcal{K}-1} b_{\mathcal{K} - l-1} |U^{l+1}|{}_n^2 \\ &= \sum_{l=0}^{\mathcal{K}-1} \tau a_l^{\mathcal{K}-1} |U^{l+1}|{}_n^2 \\ &= \frac 1{\varGamma(1-\gamma)} \sum_{l=0}^{\mathcal{K}-1} \int_{t_l}^{t_{l+1}} \frac 1{(t_{\mathcal{K}} - \zeta)^\gamma} |\bar{U}^{\boldsymbol{\tau}}(\zeta)|{}_n^2 \, {\mathrm{d}} \zeta \\ &= I_t^{1-\gamma}\left[ |\bar{U}^{\boldsymbol{\tau}} |{}_n^2 \right](T). \end{aligned} \end{aligned} $$
(34)

Inserting (33) and (34) in (32), and using that \(\varPi _{\mathcal {T}}\) is a projection (so that \(\| \varPi_{\mathcal{T}} g \|_{L^2(0,T;\mathbb{R}^n)} \leq \| g \|_{L^2(0,T;\mathbb{R}^n)}\)), yields the result. □

2.2.2 Consistency and Error Estimates

Let us now discuss the consistency of scheme (26). This will allow us to obtain error estimates. Clearly, it suffices to control the difference \(\, {\mathrm {d}}_t^\gamma (u - \hat {u}^{\boldsymbol {\tau }})\). The following formal estimate has been shown in many references; see, for instance, [21, 22]. The proof, essentially, is a Taylor expansion argument.

Proposition 1 (Consistency for Smooth Functions)

Let \(w \in C^2([0,T];\mathbb {R}^n)\). Then

$$\displaystyle \begin{aligned} \| \, {\mathrm{d}}_t^\gamma(w - \hat{w}^{\boldsymbol{\tau}}) \|{}_{L^\infty(0,T;\mathbb{R}^n)} \lesssim \tau^{2-\gamma}, \end{aligned}$$

where the hidden constant depends on \(\|w\|{ }_{C^2([0,T];\mathbb {R}^n)}\) but is independent of τ.

We must immediately point out that this estimate cannot be used in the analysis of (2). The reason behind this lies in Theorem 2 which shows that, in general, the solution to the state equation is not twice continuously differentiable. For this reason, in [26] a new consistency estimate, which takes into account the correct regularity of the solution, has been developed. This is the content of the next result.

Theorem 4 (Consistency)

Let γ ∈ (0, 1) and u solve (2). In the setting of Theorem 2 we have that, for any θ < ,

$$\displaystyle \begin{aligned} \|\, {\mathrm{d}}_t^\gamma(u - \hat{u}^{\boldsymbol{\tau}}) \|{}_{L^2(0,T;\mathbb{R}^n)} \lesssim \tau^\theta \left( |\psi|{}_n + \| g\|{}_{H^2(0,T;\mathbb{R}^n)} \right), \end{aligned}$$

where the hidden constant is independent of τ but blows up as θ ↑. Here θ is independent of γ.

For a proof of this result, we refer the reader to [26, Section 3.2.1]. We just comment that it consists of a combination of the fine regularity results of Theorem 2, weighted estimates, and the mapping properties of the fractional integral operator \(I_t^{1-\gamma }\) detailed in Section 1.1. Let us, however, show how from this we obtain an error estimate.

Corollary 1 (Error Estimates)

Let u solve (2) and \(U^{\boldsymbol{\tau}}\) solve (26). In the setting of Theorem 2 we have that, for any θ < ,

$$\displaystyle \begin{aligned} I_t^{1-\gamma}\left[ |\bar u^{\boldsymbol{\tau}} - \bar{U}^{\boldsymbol{\tau}}|{}_n^2 \right](T) + \| \bar u^{\boldsymbol{\tau}} - \bar{U}^{\boldsymbol{\tau}}\|{}_{L_{\mathcal{A}}^2(0,T;\mathbb{R}^n)}^2 \lesssim \tau^{2\theta} \left( |\psi|{}_n + \| g\|{}_{H^2(0,T;\mathbb{R}^n)} \right)^2, \end{aligned}$$

where the hidden constant is independent of τ and the data but blows up as θ ↑.

Proof

Define \(e^{\boldsymbol{\tau}} = u^{\boldsymbol{\tau}} - U^{\boldsymbol{\tau}}\). Subtracting (25)–(26) from (2) at \(t = t_{k+1}\) yields \(\hat {e}^{\boldsymbol {\tau }}(0) = 0\) and, for \(k=0,\ldots , \mathcal {K}-1\),

$$\displaystyle \begin{aligned} \, {\mathrm{d}}_t^\gamma \hat{e}^{\boldsymbol{\tau}}(t_{k+1}) + \mathcal{A} \hat{e}^{\boldsymbol{\tau}}(t_{k+1}) = \, {\mathrm{d}}_t^\gamma( \hat{u}^{\boldsymbol{\tau}} - u )(t_{k+1}) + (g - \varPi_{\mathcal{T}} g)(t_{k+1}). \end{aligned}$$

Since \(\bar {e}^{\boldsymbol {\tau }}(0) = 0\), the stability estimate of Theorem 3 then yields

$$\displaystyle \begin{aligned} I_t^{1-\gamma}\left[ |\bar{e}^{\boldsymbol{\tau}}|{}_n^2 \right](T) + \| \bar{e}^{\boldsymbol{\tau}} \|{}_{L_{\mathcal{A}}^2(0,T;\mathbb{R}^n)}^2 {\lesssim} \|\, {\mathrm{d}}_t^\gamma(u - \hat{u}^{\boldsymbol{\tau}}) \|{}_{L^2(0,T;\mathbb{R}^n)}^2 + \| g - \varPi_{\mathcal{T}} g \|{}_{L^2(0,T;\mathbb{R}^n)}^2. \end{aligned}$$

The consistency estimate of Theorem 4 gives a control of the first term. Finally, owing to the regularity of g, we have that \(\| g - \varPi _{\mathcal {T}} g \|{ }_{L^2(0,T;\mathbb {R}^n)} \lesssim \tau \); see (21). This implies the result. □

2.3 Numerical Illustration

It is natural to wonder whether the reduced rate of convergence given in Corollary 1 is merely an artifact of the method of proof. Here we show, by means of some computational examples, that while the rate \(\tau^\theta\) might not be sharp, it is not possible to obtain the rate of convergence suggested by Proposition 1.

Let us set n = 1, T = 1, λ1 = 1∕2, ψ = 1, and g = 0. From (14) we then obtain that the solution to the state equation (2) is given by

$$\displaystyle \begin{aligned} u(t) = E_{\gamma,1}\left(-\frac 12 t^\gamma \right). \end{aligned}$$

We implemented the scheme (25)–(26) in an in-house code and used it to approximate this function. We measured the L2(0, T) norm of the error; the Mittag-Leffler function was implemented following [15]. Integration was carried out using a composite Gaussian rule with three nodes; increasing the number of nodes produced no significant difference in the results.

The rates of convergence for various values of γ ∈ (0, 1) are presented in Figure 1. As we can see, Corollary 1 is not sharp, but it is consistent with the experimental orders. More importantly, the rates suggested by Proposition 1 are not obtained. In fact, the experimental rate of convergence seems to be \({\mathcal {O}}(\tau ^{\kappa } )\) with \(\kappa = \min \{1,\gamma +\tfrac 12\} < 2 - \gamma\). However, the proof of such an estimate eludes us at the moment.
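For concreteness, the computation just described can be reproduced in a few lines. The sketch below is not the code behind Figure 1: it assumes the standard L1 weights \(b_m = (m+1)^{1-\gamma} - m^{1-\gamma}\) as in (29), evaluates the Mittag-Leffler function by a truncated power series rather than the algorithm of [15] (adequate for the moderate arguments arising here), and uses a plain rectangle rule for the L2(0, T) error.

```python
import math

def ml(z, gamma, terms=60):
    # Mittag-Leffler E_{gamma,1}(z) via its power series; fine for |z| <= 1/2.
    return sum(z**j / math.gamma(gamma * j + 1.0) for j in range(terms))

def l1_solve(gamma, lam, psi, K, T=1.0):
    # L1 scheme for d_t^gamma u + lam*u = 0, u(0) = psi, on a uniform mesh.
    tau = T / K
    c = tau**(-gamma) / math.gamma(2.0 - gamma)
    b = [(m + 1)**(1.0 - gamma) - m**(1.0 - gamma) for m in range(K)]
    U = [psi]
    for k in range(K):
        # c * sum_{m=0}^{k} b_m (U^{k+1-m} - U^{k-m}) + lam * U^{k+1} = 0
        hist = sum(b[m] * (U[k + 1 - m] - U[k - m]) for m in range(1, k + 1))
        U.append(c * (b[0] * U[k] - hist) / (c * b[0] + lam))
    return U

def l2_error(gamma, lam, psi, K, T=1.0):
    tau = T / K
    U = l1_solve(gamma, lam, psi, K, T)
    return math.sqrt(tau * sum(
        (U[k] - psi * ml(-lam * (k * tau)**gamma, gamma))**2
        for k in range(K + 1)))

e1 = l2_error(0.5, 0.5, 1.0, 128)
e2 = l2_error(0.5, 0.5, 1.0, 256)
rate = math.log(e1 / e2) / math.log(2.0)  # experimental order of convergence
```

For γ = 0.5 the computed rate should come out close to 1 = min{1, γ + 1∕2}, in line with the discussion above.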

Fig. 1

Experimental rates of convergence for the solution of (2) using (25)–(26). We have set n = 1, T = 1, λ1 = 1∕2, ψ = 1, and g = 0. The figures show the computed rates of convergence with respect to the time step for γ = 0.3 (left), γ = 0.5 (middle), and γ = 0.8 (right). We observe that the rate of convergence \(\tau^{2-\gamma}\) is never attained.

3 The Optimization Problem

Having studied the state equation, we can proceed with the study of the constrained optimization problem (4)–(5). We will show existence and uniqueness of a solution, along with a numerical technique to approximate it. We will also discuss the convergence properties of the proposed approximation scheme.

3.1 Existence and Uniqueness

To precisely state the constrained optimization problem, we begin by defining the set of admissible controls

$$\displaystyle \begin{aligned} Z_{\mathrm{ad}} = \left\{ \zeta \in L^2(0,T;\mathbb{R}^n): a \preceq \zeta(t) \preceq b,\ a.e.~ t \in (0,T) \right\}, \end{aligned} $$
(35)

which is, under the assumption that a ≼ b, a nonempty, closed, convex, and bounded subset of \(L^2(0,T;\mathbb {R}^n)\).

Now, as the conclusion of Theorem 1 asserts, for any \(z \in L^2(0,T;\mathbb {R}^n)\) there is a unique \(u = u(z) \in \mathbb {U}\) that solves (2). This uniquely defines an affine continuous mapping \({\mathfrak {S}} : L^2(0,T;\mathbb {R}^n) \to \mathbb {U} \subset L^2(0,T;\mathbb {R}^n)\) by the rule \(u = {\mathfrak {S}} z\), where u solves (2). With these tools at hand, we can show the existence and uniqueness of a state–control pair, that is, a pair \(({\breve {u}}, {\breve {z}}) \in \mathbb {U} \times Z_{\mathrm{ad}}\) that satisfies \({\breve {u}} = {\mathfrak {S}} {\breve {z}}\) and solves (4)–(5). The proof of the following result is standard, and we include it only for the sake of completeness.

Theorem 5 (Existence and Uniqueness)

The optimization problem: Find (u, z) that minimizes (4) subject to (2) and (5), has a unique solution \(({\breve {u}}, {\breve {z}} ) \in \mathbb {U} \times Z_{\mathrm{ad}}\).

Proof

The control to state operator \({\mathfrak {S}}\) allows us to introduce the so-called reduced cost functional:

$$\displaystyle \begin{aligned} {\mathcal{J}}(z) := J({\mathfrak{S}} z,z) = \frac 12 \int_0^T \left( | \mathcal{C} {\mathfrak{S}} z - u_d |{}_m^2 + \mu |z |{}_n^2 \right) \, {\mathrm{d}} t, \end{aligned}$$

and to equivalently state the problem as: minimize \({\mathcal {J}}\) over Zad. Since μ > 0 and \({\mathfrak {S}}\) is affine, the reduced cost \({\mathcal {J}}\) is strictly convex. Owing to the continuity of \({\mathfrak {S}}\), we have that \({\mathcal {J}}\) is continuous as well. Existence and uniqueness then follow from the direct method of the calculus of variations [7, 23]. □

3.2 Discretization

We now proceed to discretize the optimization problem (4)–(5). We will do so by a piecewise constant approximation of the control and a piecewise linear continuous approximation of the state. We will follow the notation of Section 2.2 and, additionally, define

$$\displaystyle \begin{aligned} \mathbb{Z}_{\mathrm{ad}}(\mathcal{T}) = \mathbb{Z}(\mathcal{T}) \cap Z_{\mathrm{ad}}. \end{aligned}$$

Once again, \(\mathbb {Z}_{\mathrm{ad}}(\mathcal {T})\) is a nonempty, convex, and closed subset of \(\mathbb {Z}(\mathcal {T})\). Notice also that, since a and b are time-independent, \(\varPi _{\mathcal {T}} Z_{\mathrm{ad}} \subset \mathbb {Z}_{\mathrm{ad}}(\mathcal {T})\).

We also define the discrete cost functional \(J_{\mathcal {T}} : \mathbb {U}(\mathcal {T}) \times \mathbb {Z}(\mathcal {T}) \to \mathbb {R}\) by

$$\displaystyle \begin{aligned} J_{\mathcal{T}}(\hat U^{\boldsymbol{\tau}},Z^{\boldsymbol{\tau}}) = \frac 12 \int_0^T \left( | \mathcal{C} \bar{U}^{\boldsymbol{\tau}} - \bar{u}^{\boldsymbol{\tau}}_d |{}_m^2 + \mu |Z^{\boldsymbol{\tau}} |{}_n^2 \right) \, {\mathrm{d}} t, \end{aligned}$$

where \(\mathbb {U}(\mathcal {T})\) and \(\mathbb {Z}(\mathcal {T})\) are defined in (19) and (20), respectively. We immediately comment that, by an abuse of notation, we define \(\bar u_d^{\boldsymbol {\tau }}\) as the piecewise constant function with values \(u_d^k = \frac 1\tau \int _{t_k}^{t_{k+1}} u_d \, {\mathrm {d}} t \in \mathbb{R}^m\). In other words, we are modifying the cost by replacing the desired state ud by its piecewise constant approximation \(\bar {u}^{\boldsymbol {\tau }}_d\). Additionally, we have replaced \(\hat U^{\boldsymbol {\tau }}\) by its piecewise constant counterpart \(\bar {U}^{\boldsymbol {\tau }} \in \mathbb {Z}(\mathcal {T})\). For these reasons,

$$\displaystyle \begin{aligned} J_{\mathcal{T}}(\hat U^{\boldsymbol{\tau}},Z^{\boldsymbol{\tau}}) \neq J( \hat U^{\boldsymbol{\tau}},Z^{\boldsymbol{\tau}}). \end{aligned}$$
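To illustrate, the cell averages that define the piecewise constant projection (and hence \(\bar u_d^{\boldsymbol{\tau}}\)) can be computed as follows; the quadrature routine and the sample function are ours, not part of the text.

```python
def cell_averages(f, K, T=1.0, q=16):
    # L^2(0,T)-orthogonal projection of f onto piecewise constants: on each
    # cell (t_k, t_{k+1}] the value is the average of f, approximated here
    # with a q-point composite midpoint rule.
    tau = T / K
    avgs = []
    for k in range(K):
        s = sum(f(k * tau + (j + 0.5) * tau / q) for j in range(q))
        avgs.append(s / q)
    return avgs

# Example: for f(t) = t the cell averages are exactly the cell midpoints.
avg = cell_averages(lambda t: t, K=4)
```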

We propose the following discretization of the state equation (2): Given \(Z^{\boldsymbol {\tau }} \in \mathbb {Z}(\mathcal {T})\), find \(\hat U^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) such that \(\hat U^{\boldsymbol {\tau }}(0) = \psi \) and, for all \(k = 0, \ldots , \mathcal {K}-1 \), we have

$$\displaystyle \begin{aligned} \, {\mathrm{d}}_t^\gamma \hat U^{\boldsymbol{\tau}} (t_{k+1}) + \mathcal{A} \hat U^{\boldsymbol{\tau}}(t_{k+1}) = \varPi_{\mathcal{T}} f (t_{k+1}) + Z^{\boldsymbol{\tau}}(t_{k+1}), \end{aligned} $$
(36)

where \(\, {\mathrm {d}}_t^\gamma \) is defined in (3) and \( \varPi _{\mathcal {T}}\) corresponds to the \(L^2(0,T;\mathbb {R}^n)\)-orthogonal projection onto \(\mathbb {Z}(\mathcal {T})\). We remark that (36) is nothing but the discretization (25)–(26) of the state equation, where the variable z is already piecewise constant in time. Since \(f + Z^{\boldsymbol {\tau }} \in L^2(0,T;\mathbb {R}^n)\), we can invoke Theorem 3 to conclude that problem (36) is stable for all τ > 0.

We thus define the discrete optimization problem as follows: Find \((\breve {\hat U}^{\boldsymbol {\tau }},{\breve {Z}}^{\boldsymbol {\tau }}) \in \mathbb {U}(\mathcal {T})\times \mathbb {Z}_{\mathrm{ad}}(\mathcal {T})\) such that

$$\displaystyle \begin{aligned} J_{\mathcal{T}}(\breve{\hat U}^{\boldsymbol{\tau}},{\breve{Z}}^{\boldsymbol{\tau}}) = \min J_{\mathcal{T}}(\hat U^{\boldsymbol{\tau}},Z^{\boldsymbol{\tau}}) \end{aligned} $$
(37)

subject to (36). Let us briefly comment on the existence and uniqueness of a minimizer, which closely follows Theorem 5. Indeed, for every \(z \in L^2(0,T;\mathbb {R}^n)\) there exists a unique \(\hat U^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) that solves (36) with data \(\varPi _{\mathcal {T}} z\). This uniquely defines a map \({\mathfrak {S}}_{\mathcal {T}} : L^2(0,T;\mathbb {R}^n) \to \mathbb {U}(\mathcal {T})\), which we call the discrete control to state map. We can then define the reduced cost as

$$\displaystyle \begin{aligned} \mathbb{Z}(\mathcal{T}) \ni Z^{\boldsymbol{\tau}} \mapsto {\mathcal{J}}_{\mathcal{T}}(Z^{\boldsymbol{\tau}}) = J_{\mathcal{T}}(\widehat{{\mathfrak{S}}_{\mathcal{T}} Z^{\boldsymbol{\tau}}}, Z^{\boldsymbol{\tau}}) \end{aligned}$$

and proceed as in Theorem 5, by using the strict convexity of \({\mathcal {J}}_{\mathcal {T}}\) and the continuity of the affine map \({\mathfrak {S}}_{\mathcal {T}}\), which follows from Theorem 3.

3.3 Discrete Optimality Conditions

Let us derive discrete optimality conditions. This is useful not only in the practical solution of the discrete optimization problem (36)–(37), but it will also help us analyze its convergence properties. As stated before, problem (36)–(37) is equivalent to the following constrained optimization problem: Find \(\breve Z^{\boldsymbol {\tau }} \in \mathbb {Z}_{\mathrm{ad}}(\mathcal {T})\) such that

$$\displaystyle \begin{aligned} {\mathcal{J}}_{\mathcal{T}} (\breve Z^{\boldsymbol{\tau}}) = \min \left\{ {\mathcal{J}}_{\mathcal{T}}(Z^{\boldsymbol{\tau}}) : Z^{\boldsymbol{\tau}} \in \mathbb{Z}_{\mathrm{ad}}(\mathcal{T}) \right\}, \end{aligned}$$

that is, a minimization problem over a closed, bounded, and convex set. It is standard then (since \({\mathcal {J}}_{\mathcal {T}}\) is convex, coercive, and differentiable) that a necessary and sufficient condition for optimality is

$$\displaystyle \begin{aligned} D {\mathcal{J}}_{\mathcal{T}}(\breve Z^{\boldsymbol{\tau}}) \left[ Z^{\boldsymbol{\tau}} - \breve Z^{\boldsymbol{\tau}} \right] \geq 0 \quad \forall Z^{\boldsymbol{\tau}} \in \mathbb{Z}_{\mathrm{ad}}(\mathcal{T}), \end{aligned} $$
(38)

where \(D {\mathcal {J}}_{\mathcal {T}}(Z)[\cdot ]\) is the Gâteaux derivative of \({\mathcal {J}}_{\mathcal {T}}\) at the point Z. Let us now rewrite and simplify the optimality condition (38) by introducing the so-called adjoint state that, as stated in [31, Section 1.4.3], is a simple trick that is of utmost importance in optimal control theory.

For a given \(\hat U^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) the adjoint is the function \(\hat P^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) such that \(\hat P^{\boldsymbol {\tau }}(T) = 0\) and, for all \(k = \mathcal {K}-1, \ldots , 0\)

$$\displaystyle \begin{aligned} \, {\mathrm{d}}_{T-t}^\gamma \hat P^{\boldsymbol{\tau}}(t_k) + \mathcal{A} \hat P^{{\boldsymbol{\tau}}}(t_k) = \mathcal{C}^\intercal\left( \mathcal{C} \bar U^{\boldsymbol{\tau}}(t_k) - \bar u_d^{\boldsymbol{\tau}}(t_k) \right), \end{aligned} $$
(39)

where \(\, {\mathrm {d}}_{T-t}^\gamma \) denotes the right-sided Caputo fractional derivative of order γ defined in (6). The optimality conditions are as follows.

Theorem 6 (Optimality Conditions)

The pair \((\breve {\hat U}^{\boldsymbol {\tau }},\breve Z^{\boldsymbol {\tau }}) \in \mathbb {U}(\mathcal {T}) \times \mathbb {Z}_{\mathrm{ad}}(\mathcal {T})\) solves (37) if and only if\(\breve {\hat U}^{\boldsymbol {\tau }} = {\mathfrak {S}}_{\mathcal {T}} \breve Z^{\boldsymbol {\tau }}\)and

$$\displaystyle \begin{aligned} \int_0^T \left( \bar{\breve P}^{\boldsymbol{\tau}} + \mu \breve Z^{\boldsymbol{\tau}} \right) \cdot \left( Z^{\boldsymbol{\tau}} - \breve Z^{\boldsymbol{\tau}} \right) \, {\mathrm{d}} t \geq 0 \quad \forall Z^{\boldsymbol{\tau}} \in \mathbb{Z}_{\mathrm{ad}}(\mathcal{T}), \end{aligned} $$
(40)

where \(\breve P^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) solves (39) with data\(\breve {\hat U}^{\boldsymbol {\tau }}\).

Proof

We will obtain the result by showing that (40) is nothing but a restatement of (38). In fact, a simple calculation reveals that, for any \(\varTheta ^{\boldsymbol {\tau }}, \varPsi ^{\boldsymbol {\tau }} \in \mathbb {Z}(\mathcal {T})\), we have

$$\displaystyle \begin{aligned} D {\mathcal{J}}_{\mathcal{T}}(\varTheta^{\boldsymbol{\tau}})[\varPsi^{\boldsymbol{\tau}}] = \int_0^T \left[ \left( \mathcal{C} \overline{{\mathfrak{S}}_{\mathcal{T}} \varTheta^{\boldsymbol{\tau}}} - \bar u_d^{\boldsymbol{\tau}} \right) \cdot \mathcal{C} \overline{{\mathfrak{S}}_{\mathcal{T}} \varPsi^{\boldsymbol{\tau}}} + \mu \varTheta^{\boldsymbol{\tau}} \cdot \varPsi^{\boldsymbol{\tau}} \right] \, {\mathrm{d}} t. \end{aligned}$$

Consequently, (38) can be equivalently rewritten as, for every \(Z^{\boldsymbol {\tau }} \in \mathbb {Z}_{\mathrm{ad}}(\mathcal {T})\),

$$\displaystyle \begin{aligned} \int_0^T \left[ \mathcal{C}^\intercal \left( \mathcal{C} \overline{{\mathfrak{S}}_{\mathcal{T}} \breve Z^{\boldsymbol{\tau}}} - \bar u_d^{\boldsymbol{\tau}} \right) \cdot \overline{{\mathfrak{S}}_{\mathcal{T}} (Z^{\boldsymbol{\tau}} - \breve Z^{\boldsymbol{\tau}} )} + \mu \breve Z^{\boldsymbol{\tau}} \cdot (Z^{\boldsymbol{\tau}} - \breve Z^{\boldsymbol{\tau}} ) \right] \, {\mathrm{d}} t \geq 0.\end{aligned} $$
(41)

Let us focus our attention now on the first term inside the integral. Denote \(U^{\boldsymbol {\tau }} ={\mathfrak {S}}_{\mathcal {T}} Z^{\boldsymbol {\tau }}\) and \(\breve U^{\boldsymbol {\tau }} ={\mathfrak {S}}_{\mathcal {T}} \breve Z^{\boldsymbol {\tau }}\). Define \(\varPhi ^{\boldsymbol {\tau }} := U^{\boldsymbol {\tau }} - \breve U^{\boldsymbol {\tau }}\) and notice that \(\hat \varPhi ^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) satisfies: \(\hat \varPhi ^{\boldsymbol {\tau }}(0) = 0\) and, for every \(k = 0, \ldots , \mathcal {K}-1\),

$$\displaystyle \begin{aligned} \, {\mathrm{d}}_t^\gamma \hat \varPhi^{\boldsymbol{\tau}} (t_{k+1}) + \mathcal{A} \hat \varPhi^{\boldsymbol{\tau}} (t_{k+1}) = Z^{\boldsymbol{\tau}}(t_{k+1}) - \breve Z^{\boldsymbol{\tau}}(t_{k+1}), \end{aligned}$$

or, in view of (22), equivalently,

$$\displaystyle \begin{aligned} \overline{\, {\mathrm{d}}_t^\gamma \hat \varPhi^{\boldsymbol{\tau}}} + \mathcal{A} \bar \varPhi^{\boldsymbol{\tau}} = Z^{\boldsymbol{\tau}} - \breve Z^{\boldsymbol{\tau}}. \end{aligned}$$

Multiply this equation by \(\bar {\breve P}^{\boldsymbol {\tau }}\) and integrate to obtain

$$\displaystyle \begin{aligned} \int_0^T \left[ \overline{\, {\mathrm{d}}_t^\gamma \hat \varPhi^{\boldsymbol{\tau}}} \cdot \bar{\breve P}^{\boldsymbol{\tau}} + \mathcal{A} \bar \varPhi^{\boldsymbol{\tau}} \cdot \bar{\breve P}^{\boldsymbol{\tau}} \right] \, {\mathrm{d}} t = \int_0^T \left( Z^{\boldsymbol{\tau}} - \breve Z^{\boldsymbol{\tau}} \right) \cdot \bar{\breve P}^{\boldsymbol{\tau}} \, {\mathrm{d}} t. \end{aligned}$$

Now, multiply (39) by \(\bar \varPhi ^{\boldsymbol {\tau }}\) and integrate to obtain

$$\displaystyle \begin{aligned} \int_0^T \left[ \overline{\, {\mathrm{d}}_{T-t}^\gamma \hat{\breve P}^{\boldsymbol{\tau}}} \cdot \bar\varPhi^{\boldsymbol{\tau}} + \mathcal{A} \bar{\breve P}^{\boldsymbol{\tau}} \cdot \bar \varPhi^{\boldsymbol{\tau}} \right] \, {\mathrm{d}} t = \int_0^T \mathcal{C}^\intercal\left( \mathcal{C} \bar{\breve U}^{\boldsymbol{\tau}} - \bar u_d^{\boldsymbol{\tau}} \right) \cdot \bar \varPhi^{\boldsymbol{\tau}} \, {\mathrm{d}} t. \end{aligned}$$

Subtract these last two identities. Upon recalling the definition of \(\varPhi^{\boldsymbol{\tau}}\), we thus obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \int_0^T\left[ \overline{\, {\mathrm{d}}_t^\gamma \hat \varPhi^{\boldsymbol{\tau}}} \cdot \bar{\breve P}^{\boldsymbol{\tau}} - \overline{\, {\mathrm{d}}_{T-t}^\gamma \hat{\breve P}^{\boldsymbol{\tau}}} \cdot \bar\varPhi^{\boldsymbol{\tau}} \right] \, {\mathrm{d}} t \\ &\displaystyle &\displaystyle \quad = \int_0^T \left[ \left( Z^{\boldsymbol{\tau}} - \breve Z^{\boldsymbol{\tau}} \right) \cdot \bar{\breve P}^{\boldsymbol{\tau}} - \mathcal{C}^\intercal\left( \mathcal{C} \bar{\breve U}^{\boldsymbol{\tau}} - \bar u_d^{\boldsymbol{\tau}} \right) \cdot \overline{{\mathfrak{S}}_{\mathcal{T}} (Z^{\boldsymbol{\tau}} - \breve Z^{\boldsymbol{\tau}})} \right] \, {\mathrm{d}} t, \end{array} \end{aligned} $$

where we have used that the matrix \(\mathcal {A}\) is symmetric. Notice that the last term in this expression is nothing but the first term on the left-hand side of (41). In other words, if we can show that

$$\displaystyle \begin{aligned} \int_0^T \overline{\, {\mathrm{d}}_t^\gamma \hat \varPhi^{\boldsymbol{\tau}}} \cdot \bar{\breve P}^{\boldsymbol{\tau}} \, {\mathrm{d}} t = \int_0^T \overline{\, {\mathrm{d}}_{T-t}^\gamma \hat{\breve P}^{\boldsymbol{\tau}}} \cdot \bar\varPhi^{\boldsymbol{\tau}} \, {\mathrm{d}} t \end{aligned} $$
(42)

we obtain the result.

To show this we realize that, since we are dealing with piecewise constants, we can equivalently rewrite the left-hand side of this identity as

$$\displaystyle \begin{aligned} \int_0^T \overline{\, {\mathrm{d}}_t^\gamma \hat \varPhi^{\boldsymbol{\tau}}} \cdot \bar{\breve P}^{\boldsymbol{\tau}} \, {\mathrm{d}} t &= \tau \sum_{k=0}^{\mathcal{K}-1} {\breve P}^{k+1} \cdot \, {\mathrm{d}}_t^\gamma \hat \varPhi^{\boldsymbol{\tau}} (t_{k+1}) \\ &= \frac{\tau^{1-\gamma}}{\varGamma(2-\gamma)} \sum_{k=0}^{\mathcal{K}-1} {\breve P}^{k+1} \cdot \sum_{m=0}^k b_m {\mathfrak{d}} \varPhi^{k+1-m}, \end{aligned} $$

where we used (29).

In a similar manner to the computations of Remark 2, we can obtain that

$$\displaystyle \begin{aligned} \, {\mathrm{d}}_{T-t}^\gamma \hat{\breve P}^{\boldsymbol{\tau}}(t_k) = -\sum_{j=k}^{\mathcal{K}-1} a^j_k {\mathfrak{d}} {\breve P}^{j+1} = - \frac{\tau^{-\gamma}}{\varGamma(2-\gamma)} \sum_{j=k}^{\mathcal{K}-1} b_{j-k} {\mathfrak{d}} {\breve P}^{j+1}, \end{aligned}$$

consequently

$$\displaystyle \begin{aligned} \int_0^T \overline{\, {\mathrm{d}}_{T-t}^\gamma \hat{\breve P}^{\boldsymbol{\tau}}} \cdot \bar\varPhi^{\boldsymbol{\tau}} \, {\mathrm{d}} t = \frac{\tau^{1-\gamma}}{\varGamma(2-\gamma)} \sum_{k=1}^{\mathcal{K}} \varPhi^k \cdot \sum_{j=k}^{\mathcal{K}-1} b_{j-k} {\mathfrak{d}} {\breve P}^{j+1}. \end{aligned}$$

We can invoke now the results of [4, Appendix A] to conclude that the identity (42) holds. The theorem is thus proven. □

Remark 3 (Discrete Fractional Integration by Parts)

Notice that, during the course of the proof of Theorem 6 we showed that, whenever \(\hat V^{\boldsymbol {\tau }}, \hat W^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) satisfy \(\hat V^{\boldsymbol {\tau }}(0) = 0\) and \(\hat W^{\boldsymbol {\tau }}(T) = 0\), then they satisfy the following discrete fractional integration by parts

$$\displaystyle \begin{aligned} \int_0^T \overline{\, {\mathrm{d}}_t^\gamma \hat V^{\boldsymbol{\tau}}} \cdot \bar{W}^{\boldsymbol{\tau}} \, {\mathrm{d}} t = \int_0^T \overline{\, {\mathrm{d}}_{T-t}^\gamma \hat{W}^{\boldsymbol{\tau}}} \cdot \bar V^{\boldsymbol{\tau}} \, {\mathrm{d}} t. \end{aligned}$$

This identity shall prove useful in the sequel.
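This identity can also be checked in floating point. The sketch below assumes the L1 weights \(b_m = (m+1)^{1-\gamma} - m^{1-\gamma}\) of (29) and pairs the right-sided derivative with \(V^k\), as in the sums preceding (42); for random grid functions with \(\hat V^{\boldsymbol{\tau}}(0) = 0\) and \(\hat W^{\boldsymbol{\tau}}(T) = 0\), the mismatch between the two integrals should be at round-off level.

```python
import math
import random

def ibp_mismatch(gamma, K, seed=0):
    # Check the discrete fractional integration-by-parts formula of Remark 3
    # for random grid functions V, W with V^0 = 0 and W^K = 0.
    rng = random.Random(seed)
    tau = 1.0 / K
    c = tau**(-gamma) / math.gamma(2.0 - gamma)
    b = [(m + 1)**(1.0 - gamma) - m**(1.0 - gamma) for m in range(K)]
    V = [0.0] + [rng.uniform(-1, 1) for _ in range(K)]   # V^0 = 0
    W = [rng.uniform(-1, 1) for _ in range(K)] + [0.0]   # W^K = 0
    # Left side: the left-sided L1 derivative at t_{k+1} paired with W^{k+1}.
    left = tau * sum(
        W[k + 1] * c * sum(b[m] * (V[k + 1 - m] - V[k - m])
                           for m in range(k + 1))
        for k in range(K))
    # Right side: the right-sided derivative
    # -c * sum_{j>=k} b_{j-k} (W^{j+1} - W^j) paired with V^k.
    right = tau * sum(
        V[k] * (-c) * sum(b[j - k] * (W[j + 1] - W[j]) for j in range(k, K))
        for k in range(1, K))
    return abs(left - right)

mis = ibp_mismatch(0.4, 32)  # should vanish up to round-off
```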

Remark 4 (Projection)

The variational inequality (40) can be solved rather easily. Indeed, since all the involved functions belong to \(\mathbb {Z}(\mathcal {T})\), it suffices to consider one time interval, say (tk−1, tk], where we must have

$$\displaystyle \begin{aligned} \left( \breve P^k + \mu \breve Z^k \right) \cdot \left( Z^k - \breve Z^k \right) \geq 0. \end{aligned}$$

From this it immediately follows that

$$\displaystyle \begin{aligned} \breve Z^k = \Pr_{[a,b]} \left( \frac{-1}\mu \breve P^k \right), \end{aligned}$$

where, for \(w \in \mathbb {R}^n\), we define Pr[a,b]w as the projection onto the cube \([a,b] = \left \{ x \in \mathbb {R}^n : a \preceq x \preceq b \right \}\), which can be easily accomplished by the formula

$$\displaystyle \begin{aligned} \Pr_{[a,b]} w_i = \max \left\{ a_i, \min \left\{ b_i , w_i \right\} \right\}, \quad i = 1, \dots, n. \end{aligned}$$

This is the main advantage of considering piecewise constant approximations of the control and a modified cost. Other variants might yield a better approximation, but at the cost of a more involved solution scheme.
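Remark 4 suggests a simple solution algorithm: sweep forward for the state, backward for the adjoint, and clamp. A minimal projected-gradient sketch for the scalar case is given below; the data (λ, μ, the target, the bounds, the step size) are illustrative and not from the text, and the adjoint is obtained from the transposed L1 system, which is consistent with the discrete integration by parts of Remark 3.

```python
import math

def l1_system(gamma, lam, K, T=1.0):
    # Lower-triangular matrix M with (M U)_k = d_t^gamma U(t_{k+1}) + lam*U^{k+1},
    # acting on U = (U^1, ..., U^K) with U^0 = 0 (L1 weights, uniform mesh).
    tau = T / K
    c = tau**(-gamma) / math.gamma(2.0 - gamma)
    b = [(m + 1)**(1.0 - gamma) - m**(1.0 - gamma) for m in range(K)]
    M = [[0.0] * K for _ in range(K)]
    for k in range(K):
        for m in range(k + 1):
            M[k][k - m] += c * b[m]           # coefficient of U^{k+1-m}
            if k - m - 1 >= 0:
                M[k][k - m - 1] -= c * b[m]   # coefficient of U^{k-m} (U^0 = 0)
        M[k][k] += lam
    return M, tau

def forward_solve(M, rhs):
    # M is lower triangular: forward substitution (the state sweep).
    n = len(rhs); x = [0.0] * n
    for i in range(n):
        x[i] = (rhs[i] - sum(M[i][j] * x[j] for j in range(i))) / M[i][i]
    return x

def backward_solve(M, rhs):
    # Solve M^T x = rhs by back substitution (the adjoint sweep).
    n = len(rhs); x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (rhs[i] - sum(M[j][i] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def solve_control(gamma=0.5, lam=1.0, mu=0.1, K=32, lo=-1.0, hi=1.0,
                  iters=300, step=0.2):
    # Projected gradient on the reduced cost: state sweep, adjoint sweep via
    # the transposed system (cf. Remark 3), then the clamp of Remark 4.
    M, tau = l1_system(gamma, lam, K)
    u_d = [math.sin(math.pi * (k + 1) * tau) for k in range(K)]  # illustrative
    z = [0.0] * K
    costs = []
    for _ in range(iters):
        U = forward_solve(M, z)
        resid = [U[k] - u_d[k] for k in range(K)]
        p = backward_solve(M, resid)                 # discrete adjoint state
        costs.append(0.5 * tau * sum(r * r for r in resid)
                     + 0.5 * mu * tau * sum(w * w for w in z))
        z = [min(hi, max(lo, z[k] - step * (p[k] + mu * z[k])))
             for k in range(K)]
    return z, costs

z, costs = solve_control()
```

The clamp in the update is exactly the componentwise projection formula above; the cost should decrease monotonically for a sufficiently small step.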

3.4 Convergence

Let us now discuss the convergence of our approximation scheme. The main issue is that, even for smooth f, the right-hand side of (36) belongs only to \(L^2(0,T;\mathbb{R}^n)\), so we cannot invoke the results of Corollary 1 to establish a rate of convergence. Notice, additionally, that we modified the cost, one of the reasons being that this led us to the simplifications detailed in Remark 4. As a consequence we only show convergence without rates.

We begin by noticing that, for any \(z \in L^2(0,T;\mathbb {R}^n)\) we have that \({\mathfrak {S}}_{\mathcal {T}} z = \hat V^{\boldsymbol{\tau}} + \hat U_0^{\boldsymbol{\tau}}\), where \(\hat V^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) satisfies

$$\displaystyle \begin{aligned} \hat V^{\boldsymbol{\tau}}(0) = \psi, \qquad \, {\mathrm{d}}_t^\gamma \hat V^{\boldsymbol{\tau}}(t_{k+1}) + \mathcal{A} \hat V^{\boldsymbol{\tau}}(t_{k+1}) = \varPi_{\mathcal{T}} f(t_{k+1}), \ k = 0, \ldots, \mathcal{K} -1, \end{aligned}$$

and the linear, continuous operator \(z \mapsto \hat U_0^{\boldsymbol{\tau}}\) is the solution operator for the scheme: Find \(\hat U_0^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) such that \(\hat U_0^{\boldsymbol {\tau }} (0) = 0\) and, for \(k=0, \ldots , \mathcal {K}-1\),

$$\displaystyle \begin{aligned} \, {\mathrm{d}}_t^\gamma \hat U^{\boldsymbol{\tau}}_0(t_{k+1}) + \mathcal{A} \hat U^{\boldsymbol{\tau}}_0(t_{k+1}) = \varPi_{\mathcal{T}} z (t_{k+1}). \end{aligned} $$
(43)

Let us describe the properties of \(\hat V^{\boldsymbol {\tau }}\).

Proposition 2 (Properties of \(\hat V^{\boldsymbol {\tau }}\))

Assume that \(f \in L^2(0,T;\mathbb {R}^n)\). Then the family \(\{\hat V^{\boldsymbol {\tau }}\}_{\mathcal {T}}\) converges, as \(\mathcal {K} \to \infty \), in \(L^2(0,T;\mathbb {R}^n)\) to \(v \in \mathbb {U}\), which solves

$$\displaystyle \begin{aligned} \, {\mathrm{d}}_t^\gamma v + \mathcal{A} v = f, \quad t \in (0,T], \qquad v(0) = \psi. \end{aligned}$$

Proof

The claimed result is obtained by a simple density argument, combined with the stability of the continuous and discrete state equations. Let 𝜖 > 0. Since \(f \in L^2(0,T;\mathbb {R}^n)\), there is an \(f_\epsilon \in H^2(0,T;\mathbb {R}^n)\) such that

$$\displaystyle \begin{aligned} \| f - f_\epsilon \|{}_{L^2(0,T;\mathbb{R}^n)} < \frac\epsilon{4C_1}, \end{aligned}$$

where by C1 we denote the constant in inequality (16). Denote by v𝜖 the solution to

$$\displaystyle \begin{aligned} \, {\mathrm{d}}_t^\gamma v_\epsilon+ \mathcal{A} v_\epsilon = f_{\epsilon}, \quad t \in (0,T], \qquad v_\epsilon(0) = \psi. \end{aligned}$$

The smoothness of f𝜖 allows us to invoke Theorem 2 to assert that the regularity estimates (18), with u replaced by v𝜖, hold. In addition, invoking Theorem 1, we get that

$$\displaystyle \begin{aligned} \| v - v_\epsilon \|{}_{L^2_{\mathcal{A}}(0,T;\mathbb{R}^n)} \leq C_1 \varLambda_\gamma(0,f-f_\epsilon ) = C_1 \| f - f_\epsilon\|{}_{L^2(0,T;\mathbb{R}^n)} < \frac\epsilon4. \end{aligned}$$

Let us now approximate v𝜖 via the scheme (26), over a mesh \(\mathcal {T}\) where \(\mathcal {K}\) remains to be chosen. In doing so we obtain a function \(\hat V_\epsilon ^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\). Moreover, since v𝜖 verifies the assumptions of Theorem 2, we invoke Corollary 1 to conclude that

$$\displaystyle \begin{aligned} \| \bar v_\epsilon - \bar V_\epsilon^{\boldsymbol{\tau}} \|{}_{L^2_{\mathcal{A}}(0,T;\mathbb{R}^n)} \leq C_2 \tau^\theta, \end{aligned}$$

where C2 denotes a positive constant that depends on \(\|f_\epsilon \|{ }_{H^2(0,T;\mathbb {R}^n)}\). However, since 𝜖 is fixed, we can choose \(\mathcal {K}\) so that

$$\displaystyle \begin{aligned} C_2 \tau^\theta < \frac\epsilon4 \implies \| \bar v_\epsilon - \bar V_\epsilon^{\boldsymbol{\tau}} \|{}_{L^2(0,T;\mathbb{R}^n)} < \frac\epsilon4.\end{aligned} $$

The last ingredient is to observe that the difference \(\hat V_\epsilon ^{\boldsymbol {\tau }} - \hat V^{\boldsymbol {\tau }}\) solves (25)–(26) with zero initial condition and right-hand side \(\varPi _{\mathcal {T}}( f - f_\epsilon )\). We then invoke the stability of the scheme, stated in Theorem 3, to obtain

$$\displaystyle \begin{aligned} \| \bar V_\epsilon^{\boldsymbol{\tau}} - \bar V^{\boldsymbol{\tau}} \|{}_{L^2_{\mathcal{A}}(0,T;\mathbb{R}^n)} \leq C_1 \varLambda_\gamma(0,\varPi_{\mathcal{T}}( f- f_\epsilon) ) \leq C_1 \| f - f_\epsilon\|{}_{L^2(0,T;\mathbb{R}^n)} < \frac\epsilon4,\end{aligned} $$

where we used that \(\varPi _{\mathcal {T}}\) is a projection.

Combine these observations to conclude that

$$\displaystyle \begin{aligned} \| v - \bar V^{\boldsymbol{\tau}} \|{}_{L^2_{\mathcal{A}}(0,T,\mathbb{R}^n)} &\leq \| v - v_\epsilon \|{}_{L^2_{\mathcal{A}}(0,T;\mathbb{R}^n)} + \| v_\epsilon - \bar v_\epsilon \|{}_{L^2_{\mathcal{A}}(0,T;\mathbb{R}^n)} \\&+ \| \bar v_\epsilon - \bar V_\epsilon^{\boldsymbol{\tau}} \|{}_{L^2_{\mathcal{A}}(0,T;\mathbb{R}^n)} +\| \bar V_\epsilon^{\boldsymbol{\tau}} - \bar V^{\boldsymbol{\tau}} \|{}_{L^2_{\mathcal{A}}(0,T;\mathbb{R}^n)}\\ & < \frac{3 \epsilon}4 + \| v_\epsilon - \bar v_\epsilon \|{}_{L^2_{\mathcal{A}}(0,T;\mathbb{R}^n)}.\end{aligned} $$

Conclude by noticing that \(\bar v_\epsilon \to v_\epsilon\) in \(L^2_{\mathcal{A}}(0,T;\mathbb{R}^n)\) as \(\mathcal{K} \to \infty\), so that, after possibly taking an even larger \(\mathcal {K}\), we can assert

$$\displaystyle \begin{aligned} \| v_\epsilon - \bar v_\epsilon \|{}_{L^2_{\mathcal{A}}(0,T;\mathbb{R}^n)} < \frac\epsilon4.\end{aligned} $$

This concludes the proof. □

The main consequence of this statement arises when we use the decomposition of \({\mathfrak {S}}_{\mathcal {T}}\) in the reduced cost. Namely, we get

$$\displaystyle \begin{aligned} {\mathcal{J}}_{\mathcal{T}}(Z^{\boldsymbol{\tau}}) = \frac 12 \int_0^T \left( | \mathcal{C} \bar{U}_0^{\boldsymbol{\tau}} - W^{\boldsymbol{\tau}} |{}_m^2 + \mu | Z^{\boldsymbol{\tau}} |{}_n^2 \right) \, {\mathrm{d}} t, \end{aligned}$$

where \(\hat U_0^{\boldsymbol{\tau}}\) solves (43) with data \(Z^{\boldsymbol{\tau}}\) and \(W^{\boldsymbol {\tau }} = \bar u_d^{\boldsymbol {\tau }} - \mathcal {C} \bar V^{\boldsymbol {\tau }}\), that is, the discrete desired state changes and, moreover, \(W^{\boldsymbol {\tau }} \to u_d - \mathcal {C} v\) in \(L^2(0,T;\mathbb {R}^m)\) as \(\mathcal {K} \to \infty \). All these considerations allow us to reduce the problem to the case when ψ = 0 and f ≡ 0 so that the discrete control to state map is a linear operator.

In this setting we can assert the strong convergence of the solution operator \(z \mapsto \hat U_0^{\boldsymbol{\tau}}\) of (43) and of its adjoint, which will be a fundamental tool in proving convergence. Here and in what follows, we denote by \({\mathfrak {B}}(L^2(0,T;\mathbb {R}^n))\) the space of bounded linear operators on \(L^2(0,T;\mathbb {R}^n)\) endowed with the operator norm.

Lemma 2 (Strong Convergence)

The family of solution operators of (43), and that of their adjoints, is uniformly bounded in \({\mathfrak {B}}(L^2(0,T;\mathbb {R}^n))\) and strongly convergent.

Proof

We begin by realizing that the uniform boundedness, in \({\mathfrak {B}}(L^2(0,T;\mathbb {R}^n))\), of the solution operators of (43) is a restatement of Theorem 3; see [13, 18]. Moreover, the error estimates of Corollary 1 are valid for a collection of right-hand sides that is dense in \(L^2(0,T;\mathbb {R}^n)\). This means, by an argument similar to the one provided in Proposition 2, that for every \(z \in L^2(0,T;\mathbb {R}^n)\) the family \(\{\hat U_0^{\boldsymbol{\tau}}\}_{\mathcal{T}}\) converges; see [13, Proposition 5.17].

Let us now prove the same statements for the family of adjoints. To do so we must first identify it. Let \(z,\eta \in L^2(0,T;\mathbb {R}^n)\) and let \(\hat U_0^{\boldsymbol {\tau }}\) solve (43). In addition, let \(\hat P^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) be the solution to (39) but with the right-hand side replaced by \(\varPi _{\mathcal {T}} \eta \). Multiply the aforementioned equations by \(\bar P^{\boldsymbol {\tau }}\) and \(\bar U_0^{\boldsymbol {\tau }}\), respectively, integrate and subtract to obtain

$$\displaystyle \begin{aligned} \int_0^T \left[ \varPi_{\mathcal{T}} z \cdot \bar P^{\boldsymbol{\tau}} - \varPi_{\mathcal{T}} \eta \cdot \bar U_0^{\boldsymbol{\tau}} \right] \, {\mathrm{d}} t = \int_0^T \left[ \overline{\, {\mathrm{d}}_t^\gamma \hat U_0^{\boldsymbol{\tau}}} \cdot \bar P^{\boldsymbol{\tau}} - \overline{\, {\mathrm{d}}_{T-t}^\gamma \hat P^{\boldsymbol{\tau}}} \cdot \bar U_0^{\boldsymbol{\tau}} \right] \, {\mathrm{d}} t, \end{aligned}$$

where we used that the matrix \(\mathcal {A}\) is symmetric. We now invoke Remark 3 to conclude that the right-hand side of the previous expression vanishes, which implies that

$$\displaystyle \begin{aligned} \int_0^T z \cdot \bar P^{\boldsymbol{\tau}} \, {\mathrm{d}} t = \int_0^T \varPi_{\mathcal{T}} z \cdot \bar P^{\boldsymbol{\tau}} \, {\mathrm{d}} t = \int_0^T \varPi_{\mathcal{T}} \eta \cdot \bar U_0^{\boldsymbol{\tau}} \, {\mathrm{d}} t = \int_0^T \eta \cdot \bar U_0^{\boldsymbol{\tau}} \, {\mathrm{d}} t, \end{aligned}$$

where the first and last equalities hold by the definition of \(\varPi _{\mathcal {T}}\). Since the last identity holds for every \(z \in L^2(0,T;\mathbb {R}^n)\), we thus have that the adjoint of the solution operator of (43) is the map \(\eta \mapsto \bar P^{\boldsymbol{\tau}}\).

It now remains to realize that \(\hat P^{\boldsymbol{\tau}}\) is a discretization of the problem

$$\displaystyle \begin{aligned} \, {\mathrm{d}}_{T-t}^\gamma p + \mathcal{A} p = \eta, \ t \in [0,T), \quad p(T) = 0. \end{aligned}$$

Repeating the arguments that led to Theorem 3 and Corollary 1, we get that \(\hat P^{\boldsymbol{\tau}}\) is a stable and consistent approximation, so we can, again, conclude the uniform boundedness and strong convergence of the family of adjoints. □

We are now ready to establish convergence of our scheme.

Theorem 7 (Convergence)

The family \(\{ \breve Z^{\boldsymbol {\tau }} \}_{\mathcal {T}}\) of optimal controls is uniformly bounded and contains a subsequence that converges strongly to \(\breve z\) , the solution to (4).

Proof

Boundedness is a consequence of optimality. Indeed, if z0 ∈ Zad then

$$\displaystyle \begin{aligned} \frac \mu2 \| \breve Z^{\boldsymbol{\tau}} \|{}_{L^2(0,T;\mathbb{R}^n)}^2 \leq {\mathcal{J}}_{\mathcal{T}}( \breve Z^{\boldsymbol{\tau}} ) \leq {\mathcal{J}}_{\mathcal{T}}( \varPi_{\mathcal{T}} z_0 ) \lesssim \| z_0 \|{}_{L^2(0,T;\mathbb{R}^n)}^2 + \| u_d \|{}_{L^2(0,T;\mathbb{R}^m)}^2, \end{aligned}$$

where we used the continuity of \({\mathfrak {S}}_{\mathcal {T}}\) and \(\varPi _{\mathcal {T}}\). This implies the existence of a (not relabeled) weakly convergent subsequence.

To show convergence of this sequence to \(\breve z\), we invoke the theory of Γ-convergence [7], so that we must verify three assumptions:

  1. 1.

    Lower bound: We must show that, whenever \(Z^{\boldsymbol {\tau }} \rightharpoonup z\), we have \({\mathcal {J}}(z) \leq \liminf {\mathcal {J}}_{\mathcal {T}}(Z^{\boldsymbol {\tau }})\). To do so, let \(\eta \in L^2(0,T;\mathbb {R}^n)\) and notice that

    $$\displaystyle \begin{aligned} \int_0^T \left[ \overline{{\mathfrak{S}}_{\mathcal{T}} Z^{\boldsymbol{\tau}}} - {\mathfrak{S}} z \right] \cdot \eta \, {\mathrm{d}} t &= \int_0^T \left[ \overline{{\mathfrak{S}}_{\mathcal{T}} z} - {\mathfrak{S}} z \right] \cdot \eta \, {\mathrm{d}} t + \int_0^T \overline{{\mathfrak{S}}_{\mathcal{T}} (Z^{\boldsymbol{\tau}} - z)} \cdot \eta \, {\mathrm{d}} t \\ &= A + B. \end{aligned} $$

    The pointwise convergence of the discrete solution operators shows that A → 0, while the pointwise convergence of the adjoints, together with \(Z^{\boldsymbol{\tau}} \rightharpoonup z\), shows that B → 0. In conclusion, \({\mathfrak {S}}_{\mathcal {T}} Z^{\boldsymbol {\tau }} \rightharpoonup {\mathfrak {S}} z\). Now, owing to the weak lower semicontinuity of norms, and the fact that \(\bar u^{\boldsymbol {\tau }}_d \to u_d\) in \(L^2(0,T;\mathbb {R}^m)\), we conclude

    $$\displaystyle \begin{aligned} {\mathcal{J}}(z) \leq \liminf {\mathcal{J}}_{\mathcal{T}}(Z^{\boldsymbol{\tau}}). \end{aligned}$$
  2. 2.

    Existence of a recovery sequence: We must show that, for every z ∈ Zad there is \(Z^{\boldsymbol {\tau }} \in \mathbb {Z}_{\mathrm{ad}}(\mathcal {T})\) such that \(Z^{\boldsymbol {\tau }} \rightharpoonup z\) and \({\mathcal {J}}(z) \geq \limsup {\mathcal {J}}_{\mathcal {T}}(Z^{\boldsymbol {\tau }})\). To do so, it suffices to set \(Z^{\boldsymbol {\tau }} = \varPi _{\mathcal {T}} z\). Indeed, we even have strong convergence so that we can say \({\mathfrak {S}}_{\mathcal {T}} \varPi _{\mathcal {T}} z \to {\mathfrak {S}} z\). Continuity of norms and the convergence of \(\bar u_d^{\boldsymbol {\tau }}\) allow us to conclude the inequality for the costs.

  3. 3.

    Equicoerciveness: We must show that, for every \(r \in \mathbb {R}\), there is a weakly closed and weakly compact \(K_{r} \subset L^2(0,T;\mathbb {R}^n)\) such that, for all \(\mathcal {T}\), the r-sublevel set of \({\mathcal {J}}_{\mathcal {T}}\) is contained in Kr. To do so it suffices to notice that

    $$\displaystyle \begin{aligned} {\mathcal{J}}_{\mathcal{T}}(Z^{\boldsymbol{\tau}}) \geq \frac\mu2 \| Z^{\boldsymbol{\tau}} \|{}_{L^2(0,T;\mathbb{R}^n)}^2. \end{aligned}$$

    Thus, invoking [7, Proposition 7.7], we can immediately conclude.

With these three ingredients, we can now show convergence. Indeed, the lower bound inequality and recovery sequence property allow us to say that

$$\displaystyle \begin{aligned} {\mathcal{J}}_{\mathcal{T}} \overset{\varGamma}{\to} {\mathcal{J}} \end{aligned}$$

so that minimizers of \({\mathcal {J}}_{\mathcal {T}}\) converge to minimizers of \({\mathcal {J}}\). Equicoerciveness and the uniqueness of \(\breve z\) are the conditions of the so-called fundamental lemma of Γ-convergence [7, Corollary 7.24] which allow us to conclude that \(\breve Z^{\boldsymbol {\tau }} \rightharpoonup \breve z\).

We finalize the proof by showing strong convergence. To do so we first note that, by Dal Maso [7, equation (7.32)], we have \({\mathcal {J}}_{\mathcal {T}}(\breve Z^{\boldsymbol {\tau }}) \to {\mathcal {J}}(\breve z)\). Therefore

$$\displaystyle \begin{aligned} \frac 12 \int_0^T \left[ \left| \mathcal{C} \overline{{\mathfrak{S}}_{\mathcal{T}} \breve Z^{\boldsymbol{\tau}}} - \mathcal{C} {\mathfrak{S}} \breve z \right|{}^2_m + \mu \left| \breve Z^{\boldsymbol{\tau}} - \breve z \right|{}_n^2 \right] \, {\mathrm{d}} t &= {\mathcal{J}}_{\mathcal{T}}( \breve Z^{\boldsymbol{\tau}}) + {\mathcal{J}}(\breve z) \\&- \int_0^T \mathcal{C} \overline{{\mathfrak{S}}_{\mathcal{T}} \breve Z^{\boldsymbol{\tau}}} \cdot \left( \mathcal{C} {\mathfrak{S}} \breve z - \bar u_d^{\boldsymbol{\tau}} \right) \, {\mathrm{d}} t \\ &+ \int_0^T u_d \cdot \left( \mathcal{C} {\mathfrak{S}} \breve z - \bar{u}^{\boldsymbol{\tau}}_d \right) \, {\mathrm{d}} t - \frac 12 \int_0^T \left| u_d - \bar u_d^{\boldsymbol{\tau}} \right|{}_m^2 \, {\mathrm{d}} t \\ &- \mu \int_0^T \breve Z^{\boldsymbol{\tau}} \cdot \breve z \, {\mathrm{d}} t \\ &\to {\mathcal{J}}(\breve z) + {\mathcal{J}}(\breve z) - 2 {\mathcal{J}}(\breve z) = 0, \end{aligned} $$

where we, again, used the convergence of the adjoint.

This concludes the proof of convergence. □

We conclude by showing weak convergence of the state.

Corollary 2 (State Convergence)

In the setting of Theorem 7 we have that \(\breve {U}^{\boldsymbol {\tau }} \rightharpoonup \breve {u}\) in \(L^2(0,T;\mathbb {R}^n)\).

Proof

This follows from the strong convergence of \(\breve Z^{\boldsymbol{\tau}}\) and of the discrete adjoints. Indeed, let \(v \in L^2(0,T;\mathbb {R}^n)\) and notice that

$$\displaystyle \begin{aligned} \int_0^T \bar{\breve U}^{\boldsymbol{\tau}} \cdot v \, {\mathrm{d}} t = \int_0^T \bar V^{\boldsymbol{\tau}} \cdot v \, {\mathrm{d}} t + \int_0^T \bar U_0^{\boldsymbol{\tau}} \cdot v \, {\mathrm{d}} t = \int_0^T \bar V^{\boldsymbol{\tau}} \cdot v \, {\mathrm{d}} t + \int_0^T \breve Z^{\boldsymbol{\tau}} \cdot \bar P^{\boldsymbol{\tau}} \, {\mathrm{d}} t, \end{aligned}$$

where \(\hat U_0^{\boldsymbol{\tau}}\) solves (43) with data \(\breve Z^{\boldsymbol{\tau}}\) and \(\hat P^{\boldsymbol{\tau}}\) solves (39) with right-hand side \(\varPi_{\mathcal{T}} v\). Since \(\breve Z^{\boldsymbol{\tau}} \to \breve z\) and the adjoints converge strongly, the second term converges; we obtain the result by invoking Proposition 2 for the first. □