Abstract
We consider a linear quadratic optimization problem where the state is governed by a fractional ordinary differential equation. We also consider control constraints. We show existence and uniqueness of an optimal state–control pair and propose a method to approximate it. Due to the low regularity of the solution to the state equation, rates of convergence cannot be proved unless problematic assumptions are made. Instead, we appeal to the theory of Γ-convergence to show the convergence of our scheme.
1 Introduction
In recent years, a lot of attention has been paid to the study of nonlocal problems, of which fractional differential equations are an instance. This is motivated by the fact that fractional derivatives are better suited to capturing long-range interactions, as well as memory effects. For instance, they have been used to describe anomalous transport phenomena [9, 10], option pricing [6], porous media flow [5], and viscoelastic materials [8], to name a few. It is only natural then, from the purely mathematical as well as the practical points of view, to try to optimize systems that are governed by these equations. In previous work [4], we dealt with a constrained optimization problem where the state is governed by a differential equation that presents nonlocal features in time as well as in space. Throughout the analysis presented in [4], the nonlocalities in time and space were intertwined, and this required us to develop several tools to analyze the nonlocal operator in space that are, in principle, not relevant to the nonlocality in time. It is thus our feeling that the extensive technicalities that ensued in the analysis of [4] obscured many of the unique features that optimization of fractional differential equations contains; for instance, the lack of time regularity regardless of the smoothness of the data. For this reason, our main objective in this note is to present a detailed study for the case where the state is governed by a time-fractional ordinary differential equation.
Let us be precise in our considerations. Given m, n ≥ 1, a final time T > 0, a desired state \(u_d \in L^2(0,T;\mathbb {R}^m)\), and a regularization parameter μ > 0, we define the cost functional as
where we denote the Euclidean norm in \(\mathbb {R}^s\) by \(|\cdot|_s\) and \(\mathcal {C} \in \mathbb {M}^{m \times n}\); \(\mathbb {M}^{m \times n}\) denotes the set of all m-by-n matrices. The variable u is called the state, while the variable z is the control. The control and state are related by the so-called state equation, which we now describe. Given an initial condition \(\psi \in \mathbb {R}^n\), a forcing function \(f : (0,T] \to \mathbb {R}^n\), and a symmetric positive definite matrix \(\mathcal {A} \in \mathbb {M}^{n \times n}\), the state equation reads
Here, γ ∈ (0, 1) and \(\, {\mathrm {d}}_t^\gamma \) denotes the so-called left-sided Caputo fractional derivative of order γ, which is defined by [19, 28]
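For a function v with an integrable derivative, this derivative takes the standard integral form

```latex
\mathrm{d}_t^{\gamma} v(t) = \frac{1}{\Gamma(1-\gamma)} \int_0^t \frac{\dot v(\zeta)}{(t-\zeta)^{\gamma}} \,\mathrm{d}\zeta ,
```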
where by \(\dot v\) we denote the usual derivative and Γ is the Gamma function. We must immediately remark that, in addition to (3), there are other, non-equivalent, definitions of fractional derivatives: the Riemann–Liouville, Grünwald–Letnikov, and Marchaud derivatives. In this work, we shall focus on the Caputo derivative since it allows for a standard initial condition in (2), a highly desirable feature in applications; see, for instance, the discussion in [14, Section E.4]. For further motivation and applications, we refer the reader to [11, 14].
The problem we shall be concerned with is to find (ŭ, z̆) such that
subject to the state equation (2) and the control constraints
Here \(a,b \in \mathbb {R}^n\), which we assume satisfy a ≼ b. The relation v ≼ w means that, for all i = 1, …, n, we have \(v_i \leq w_i\).
To our knowledge, the first work devoted to the study of (4) is [2], where a formal Lagrangian formulation is discussed and optimality conditions are formally derived. The author of this work also presents a numerical scheme based on shifted Legendre polynomials. However, there is no analysis of the optimality conditions or of the numerical scheme. Discretization schemes using finite elements [3], rational approximations [30], spectral methods [24, 32, 33], and other techniques have also been considered. Most of these works do not provide a rigorous justification or analysis of their schemes, and the ones that do obtain error estimates under rather strong regularity assumptions on the state variable; namely, they require that \(\ddot u \in L^\infty (0,T;\mathbb {R}^n)\), which is rather problematic; see Theorem 2 below. In contrast, in this work, we carefully describe the regularity properties of the state equation and, on this basis, provide convergence (without rates) of the numerical scheme we propose.
Throughout our discussion, we will follow the standard notation and terminology. Nonstandard notation will be introduced in the course of our exposition. The rest of this work is organized as follows: Basic facts about fractional derivatives and integrals are presented in Section 1.1. We study the state equation in Section 2 where we construct the solution to problem (2), study its regularity, and present a somewhat new point of view for a classical scheme—the so-called L1 scheme. More importantly, we use the right regularity to obtain rates of convergence; an issue that has been largely ignored in the literature. With these ingredients at hand we proceed, in Section 3, to analyze the optimization problem (4); we show existence and uniqueness of an optimal state–control pair and propose a scheme to approximate it. We employ a piecewise linear (in time) approximation of the state and a piecewise constant approximation of the control. While not completely necessary for the analysis, we identify the discrete adjoint problem and use it to derive discrete optimality conditions. Finally, we show the strong convergence of the discrete optimal control to the continuous one. Owing to the reduced regularity of the solution to the state equation, this convergence, however, cannot have rates.
1.1 Fractional Derivatives and Integrals
We begin by recalling some fundamental facts about fractional derivatives and integrals. The left-sided Caputo fractional derivative is defined in (3). The right-sided Caputo fractional derivative of order γ is given by [19, 28]
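In its standard integral form, this reads

```latex
\mathrm{d}_{T-t}^{\gamma} v(t) = -\frac{1}{\Gamma(1-\gamma)} \int_t^T \frac{\dot v(\zeta)}{(\zeta - t)^{\gamma}} \,\mathrm{d}\zeta .
```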
For v ∈ L1(0, T), the left Riemann–Liouville fractional integral of order σ ∈ (0, 1) is defined by
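In integral form,

```latex
I_t^{\sigma} v(t) = \frac{1}{\Gamma(\sigma)} \int_0^t \frac{v(\zeta)}{(t-\zeta)^{1-\sigma}} \,\mathrm{d}\zeta ;
```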
see [28, Section 2]. Young’s inequality for convolutions immediately yields that, for p > 1, \(I_t^\sigma \) is a continuous operator from Lp(0, T) into itself. More importantly, a result by Flett [12] shows that
We refer the reader to [20] for the definition of the Orlicz space \(L\log L(0,T)\). This observation will be very important in subsequent developments. Notice finally that if \(v \in W^{1}_1(0,T)\), then we have that \(\, {\mathrm {d}}_t^\gamma v(t) = I_t^{1-\gamma } [\dot v] (t)\).
The generalized Mittag-Leffler function with parameters α > 0 and \(\beta \in \mathbb {R}\) is defined by
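That is, \(E_{\alpha,\beta}(z) = \sum_{k \geq 0} z^k / \Gamma(\alpha k + \beta)\). As a quick illustration only, a naive truncated series can be sketched in Python as follows; the function name `mittag_leffler` and the truncation strategy are ours, and robust evaluators should be preferred in practice:

```python
import math

def mittag_leffler(z, alpha, beta=1.0, tol=1e-14):
    """Naive truncated power series for E_{alpha,beta}(z).
    Adequate for moderate |z| only; illustrative, not a production evaluator."""
    total, k = 0.0, 0
    while True:
        g = alpha * k + beta
        if g > 170.0:  # math.gamma overflows double precision past ~171
            break
        term = z ** k / math.gamma(g)
        total += term
        if k > 0 and abs(term) < tol:
            break
        k += 1
    return total
```

As a sanity check, \(E_{1,1}(z) = e^z\) and \(E_{2,1}(z) = \cosh\sqrt{z}\).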
We refer the reader to [14] for an account of the principal properties of the Mittag-Leffler function.
2 The State Equation
In this section, we construct the solution to (2), thus showing its existence and uniqueness. This shall be of utmost importance not only when showing the existence and uniqueness of solutions to our optimization problem, but also when we deal with the discretization, as we will study the smoothness of u. To shorten notation, in this section we set
where f is the forcing term and z is the control in (2).
2.1 Solution Representation and Regularity
Let us now construct the solution to (2) and review its main properties. We will adapt the arguments of [26] to our setting. Since the matrix \(\mathcal {A}\) is symmetric and positive definite, it is orthogonally diagonalizable; meaning that there are \(\{\lambda _\ell ,\xi _\ell \}_{\ell =1}^n \subset \mathbb {R}_+ \times \mathbb {R}^n\) such that
This, in particular, implies that the vectors \(\{\xi _\ell \}_{\ell =1}^n\) form an orthonormal basis of \(\mathbb {R}^n\). Moreover, for any vector \(v \in \mathbb {R}^n\), we can define \(|v|{ }_{\mathcal {A}}^2 = v\cdot \mathcal {A} v\), which turns out to be a norm that satisfies
We set
With these properties of the matrix \(\mathcal {A}\) at hand, we propose the following solution ansatz:
where the coefficients uℓ(t) satisfy
for ℓ ∈{1, ⋯ , n}. Here, gℓ(t) = g(t) ⋅ ξℓ and ψℓ = ψ ⋅ ξℓ. The importance of this orthogonal decomposition lies in the fact that we have reduced problem (2) to a decoupled system of equations. The theory of fractional ordinary differential equations [28] gives, for ℓ ∈{1, ⋯ , n}, a unique function uℓ satisfying problem (13). In addition, standard considerations, which formally entail taking the Laplace transform of (13), yield that
We refer the reader to [25,26,27] for details. This representation shall prove rather useful to describe the existence, uniqueness, and regularity of u. To concisely state it, let us define
With this notation, a specialization of the results of [26] to the substantially simpler case when \(\mathcal {A}\) is a positive definite matrix (and thus the spaces are finite dimensional) yields the following result.
Theorem 1 (Existence and Uniqueness)
Assume that \(g \in L^2(0,T;\mathbb {R}^n)\). Problem (2) has a unique solution \(u \in \mathbb {U}\), given by (12) and (14). Moreover, the following a priori estimate holds
where, for \(v \in \mathbb {R}^n\) and \(h \in L^2(0,T;\mathbb {R}^n)\) we have
where we implicitly identified v with the constant function \([0,T] \ni t \mapsto v \in \mathbb {R}^n\) . In this estimate, the hidden constant is independent of ψ, g, and u.
Having obtained conditions that guarantee the existence and uniqueness for (2), we now study its regularity. This is important since, as is well known, smoothness and rate of approximation go hand in hand. This is exactly the content of direct and converse theorems in approximation theory [1, 17]. Consequently, any rigorous study of an approximation scheme must be concerned with the regularity of the solution. This, we believe, is an issue that for this problem has been largely ignored in the literature since, essentially, the solution to (2) is not smooth. Let us now follow [25, 26] and elaborate on this matter. The essence of the issue is already present in the case n = 1, where (14) is the solution. Let us, to further simplify the discussion, set \(\mathcal {A} = 1\), g ≡ 0, and ψ = 1. In this case, the solution verifies the following asymptotic estimate:
If this is the case, we then expect that, as t ↓ 0, \(\dot u(t) \approx t^{\gamma -1}\) and \(\ddot u(t) \approx t^{\gamma -2}\). Notice that, since γ ∈ (0, 1), the function \(\omega_1(t) = t^{\gamma-1}\) belongs to \(L\log L(0,T)\) but \(\omega_1 \notin L^{1+\epsilon}(0,T)\) for any \(\epsilon > \gamma(1-\gamma)^{-1}\). Similarly, the function \(\omega_2(t) = t^{\gamma-2}\) is not Lebesgue integrable, but
which implies that \(\omega_2\) belongs to the weighted Lebesgue space \(L^2(t^\sigma;0,T)\), where σ > 3 − 2γ > 1. The considerations given above tell us that we should expect the following:
The justification of this heuristic is the content of the next result. For a proof, we refer the reader to [26, Theorem 8].
Theorem 2 (Regularity)
Assume that \(g \in H^2(0,T;\mathbb {R}^n)\) . Then u, the solution to (2), satisfies (18) and, for t ∈ (0, T], we have the following asymptotic estimate:
where σ > 3 − 2γ. The hidden constant is independent of t but blows up as γ ↓ 0+.
Remark 1 (Extensions)
Under the correct framework, the conclusion of Theorem 2 can be extended to the case where \(\mathcal {A}\) is an operator acting on a Hilbert space \(\mathcal H\) and Equation (2) is understood in a Gelfand triple \(\mathcal V \hookrightarrow \mathcal H \hookrightarrow \mathcal V'\); see [26] for details.
2.2 Discretization of the State Equation
Now that we have studied the state equation and the regularity properties of its solution u, we proceed to discretize it. To do so, we denote by \(\mathcal {K} \in \mathbb N\) the number of time steps. We define the (uniform) time step \(\tau = T/\mathcal {K} >0\) and set tk = kτ for \(k=0,\ldots ,\mathcal {K}\). We denote the time partition by \(\mathcal {T} = \{t_k\}_{k=0}^{\mathcal {K}}\). We define the space of continuous and piecewise linear, over the partition \(\mathcal {T}\), functions as follows:
We also define the space of piecewise constant functions
and the \(L^2(0,T;\mathbb {R}^n)\)-orthogonal projection onto \(\mathbb {Z}(\mathcal {T})\), that is, the operator \(\varPi _{\mathcal {T}} : L^2(0,T;\mathbb {R}^n) \to \mathbb {Z}(\mathcal {T})\) defined by
We remark that \(\varPi _{\mathcal {T}}\) satisfies
where the hidden constant is independent of r and τ.
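In computations, \(\varPi_{\mathcal T}\) is evaluated cell by cell: on \((t_k, t_{k+1}]\) the projection of r is simply its mean value over that cell. A minimal sketch for a scalar r follows; the helper name `cell_averages` and the composite midpoint quadrature are our own illustrative choices:

```python
def cell_averages(r, K, T, q=64):
    """Piecewise constant L2-projection of a scalar function r on (0, T):
    return, for each of the K uniform cells, the mean value of r there,
    approximated by a composite midpoint rule with q subintervals."""
    tau = T / K
    out = []
    for k in range(K):
        h = tau / q
        out.append(sum(r(k * tau + (j + 0.5) * h) for j in range(q)) * h / tau)
    return out
```

For affine r the midpoint rule is exact, so the projection of r(t) = t over four cells of (0, 1) returns the cell midpoints 0.125, 0.375, 0.625, 0.875.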
For a function \(\phi \in BV(0,T;\mathbb {R}^n)\) we set \(\phi^k = \lim_{\epsilon \uparrow 0} \phi(t_k - \epsilon)\) and \(\phi ^{\boldsymbol {\tau }} = \{ \phi ^k \}_{k=0}^{\mathcal {K}}\), which can be uniquely identified with either an element of \(\mathbb {U}(\mathcal {T})\) or \(\mathbb {Z}(\mathcal {T})\) by the procedures we describe now. To \(\phi^{\boldsymbol {\tau }}\) we associate \(\bar \phi ^{\boldsymbol {\tau }} \in \mathbb {Z}(\mathcal {T})\) defined by
We also associate \(\hat \phi ^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) via
Notice that
and that
Finally, for a sequence ϕτ we also define, for \(k=0,\ldots , \mathcal {K}-1\),
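Consistent with the mapping property stated next (piecewise linear functions have piecewise constant derivatives), the natural definition, which we assume here, is the difference quotient

```latex
{\mathfrak d}\phi^{k} = \frac{\phi^{k+1} - \phi^{k}}{\tau} ,
```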
which can be understood as a mapping \({\mathfrak {d}} : \mathbb {U}(\mathcal {T}) \to \mathbb {Z}(\mathcal {T})\).
Having introduced this notation, we propose to discretize (2) by a collocation method over \(\mathbb {U}(\mathcal {T})\). In other words, we seek \(\hat U^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) such that
and, for every \(k = 0, \ldots \mathcal {K}-1\), it satisfies
Remark 2 (Derivation of the Scheme)
In the literature, (26) is commonly referred to as the L1 scheme [16, 21, 22, 29], even though it is usually not presented in this form. Nevertheless, let us show that it is equivalent to the methods presented in the literature. To see the relation, it suffices to compute, for a function \(\hat W^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\), the value of \(\, {\mathrm {d}}_t^{\gamma } \hat W^{\boldsymbol {\tau }}(t_{k+1})\). By definitions (3), (23), and (24), we obtain that
where the coefficients \(a_j^k\) satisfy
Here, in the last step, we used that the time step is uniform and of size τ. The fact that the time step is uniform also implies that
so that, after the change of indices m = k − j, we obtain
with \(b_m = (m + 1)^{1-\gamma} - m^{1-\gamma}\). The expression above is what is commonly referred to as the L1 scheme.
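To make the formulas of this remark concrete, the following sketch marches the scalar version of (2) (n = 1, \(\mathcal A = \lambda > 0\)) with the L1 weights \(b_m\) just derived. The function name `l1_solve` and the test parameters are ours, for illustration only:

```python
import math

def l1_solve(gamma, lam, psi, g, K, T):
    """March the scalar problem d_t^gamma u + lam*u = g(t), u(0) = psi,
    with the L1 scheme on a uniform mesh t_k = k*tau, tau = T/K.
    Illustrative sketch of the scheme discussed above."""
    tau = T / K
    c0 = tau ** (-gamma) / math.gamma(2.0 - gamma)   # leading coefficient a_k^k
    b = [(m + 1) ** (1.0 - gamma) - m ** (1.0 - gamma) for m in range(K)]
    U = [psi]
    for k in range(K):
        # history term: sum_{m=1}^{k} b_m (U^{k+1-m} - U^{k-m})
        hist = sum(b[m] * (U[k + 1 - m] - U[k - m]) for m in range(1, k + 1))
        # solve c0*(U^{k+1} - U^k) + c0*hist + lam*U^{k+1} = g(t_{k+1})
        U.append((g((k + 1) * tau) + c0 * (U[k] - hist)) / (c0 + lam))
    return U
```

For γ = 1/2, λ = 1, ψ = 1, and g ≡ 0 the exact solution is \(u(t) = E_{1/2}(-t^{1/2}) = e^{t}\operatorname{erfc}(\sqrt t)\), against which the reduced convergence rates of Section 2.3 can be observed.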
2.2.1 Stability
Let us discuss the stability of scheme (26) as originally detailed in [26, Section 3.2.2]. We begin by exploring the properties of the coefficients \(a_j^k\).
Lemma 1 (Properties of \(a_j^k\))
Assume that the time step is given by τ > 0. For every \(k=0,\ldots ,\mathcal {K}-1\) and j = 0, …, k, the coefficients \(a_j^k\), defined in (28), satisfy
Moreover \(a_k^k = \tau ^{-\gamma } /\varGamma (2-\gamma )\).
Proof
The positivity of the coefficients follows from the fact that, for j = 0, …, k and \(\zeta \in (t_j, t_{j+1})\), we have that \(t_{k+1} - \zeta > 0\). We now show that the coefficients are increasing in the lower index. In fact, an application of the mean value theorem yields
for some \(\zeta_j \in (t_j, t_{j+1})\). Since the function \(\zeta \mapsto (t_{k+1} - \zeta)^{-\gamma}\) is increasing for \(\zeta < t_{k+1}\), we conclude that \(a_j^k < a_{j+1}^k\). To show that the coefficients are decreasing in the upper index, we note that
so that \(a_j^{k+1} < a_j^k\). Finally, we note that
This concludes the proof. □
With the results of Lemma 1 at hand, we can now show stability of the scheme.
Theorem 3 (Stability)
For every \(\mathcal {K} \in \mathbb N\) , the scheme (26) is unconditionally stable and satisfies
where the hidden constant is independent of ψ, g, \(\bar {U}^{\boldsymbol {\tau }}\), and \(\mathcal {K}\); and \(\Lambda_\gamma\) is defined in (17).
Proof
Multiply (26) by \(2U^{k+1}\) to obtain
where on the right-hand side we applied the Cauchy–Schwarz inequality; \(|\cdot |{ }_{\mathcal {A}}\) is defined in Section 2.1. We thus use (10), together with Young's inequality, to say that
We now invoke (27) and deduce that
With this at hand (30) reduces to
Since, as stated in Lemma 1, we have that \(a_j^k-a_{j-1}^k>0\) we estimate
which can be used to obtain that
Notice now that, since the \(a_j^k\) are defined as in (28) and \(b_m = (m + 1)^{1-\gamma} - m^{1-\gamma}\), for every j = 0, …, k we have
Thus, the change of indices m = k + 1 − j on the left-hand side and l = k − j on the right-hand side of (31), respectively, yields
where the sum on the right-hand side vanishes for k = 0. Multiply by τ and add over k to obtain
where \( \| \bar {U}^{\boldsymbol {\tau }} \|{ }_{L^2_{\mathcal {A}}(0,T;\mathbb {R}^n)}\) is defined by (11). Notice now that, since the time step is uniform,
We now analyze the first term on the left-hand side of (32): Changing indices via \(l+1=\mathcal {K}-k\) gives
Inserting (33) and (34) in (32), and using that \(\varPi _{\mathcal {T}}\) is a projection, yields the result. □
2.2.2 Consistency and Error Estimates
Let us now discuss the consistency of scheme (26). This will allow us to obtain error estimates. Clearly, it suffices to control the difference \(\, {\mathrm {d}}_t^\gamma (u - \hat {u}^{\boldsymbol {\tau }})\). The following formal estimate has been shown in many references; see, for instance, [21, 22]. The proof, essentially, is a Taylor expansion argument.
Proposition 1 (Consistency for Smooth Functions)
Let \(w \in C^2([0,T];\mathbb {R}^n)\) , then
where the hidden constant depends on \(\|w\|{ }_{C^2([0,T];\mathbb {R}^n)}\) but is independent of τ.
We must immediately point out that this estimate cannot be used in the analysis of (2). The reason behind this lies in Theorem 2 which shows that, in general, the solution to the state equation is not twice continuously differentiable. For this reason, in [26] a new consistency estimate, which takes into account the correct regularity of the solution, has been developed. This is the content of the next result.
Theorem 4 (Consistency)
Let γ ∈ (0, 1) and u solve (2). In the setting of Theorem 2 we have that, for any θ < ,
where the hidden constant is independent of τ but blows up as θ ↑. Here θ is independent of γ.
For a proof of this result, we refer the reader to [26, Section 3.2.1]. We just comment that it consists of a combination of the fine regularity results of Theorem 2, weighted estimates, and the mapping properties of the fractional integral operator \(I_t^{1-\gamma }\) detailed in Section 1.1. Let us, however, show how from this we obtain an error estimate.
Corollary 1 (Error Estimates)
Let u solve (2) and \(U^{\boldsymbol {\tau }}\) solve (26). In the setting of Theorem 2 we have that, for any θ < ,
where the hidden constant is independent of τ and the data but blows up as θ ↑.
Proof
Define \(e^{\boldsymbol {\tau }} = u^{\boldsymbol {\tau }} - U^{\boldsymbol {\tau }}\). Subtracting (2) and (25)–(26) at \(t = t_{k+1}\) yields \(\hat {e}^{\boldsymbol {\tau }}(0) = 0\) and, for \(k=0,\ldots , \mathcal {K}-1\),
Since \(\bar {e}^{\boldsymbol {\tau }}(0) = 0\), the stability estimate of Theorem 3 then yields
The consistency estimate of Theorem 4 gives a control of the first term. Finally, owing to the regularity of g, we have that \(\| g - \varPi _{\mathcal {T}} g \|{ }_{L^2(0,T;\mathbb {R}^n)} \lesssim \tau \); see (21). This implies the result. □
2.3 Numerical Illustration
It is natural to wonder whether the reduced rate of convergence given in Corollary 1 is nothing but a consequence of the methods of proof. Here we show, by means of some computational examples, that while the rate \(\tau^{\theta}\) might not be sharp, it is not possible to obtain the rate of convergence suggested by Proposition 1.
Let us set n = 1, T = 1, λ1 = , ψ = 1 and g = 0. From (14) we then obtain that the solution to the state equation (2) is given by
We implemented the scheme (25)–(26) in an in-house code and used it to approximate this function. We measured the \(L^2(0,T)\)-norm of the error, where we implemented the Mittag-Leffler function following [15]. Integration was carried out using a composite Gaussian rule with three nodes; increasing the number of nodes produced no significant difference in the results.
The rates of convergence for various values of γ ∈ (0, 1) are presented in Figure 1. As we can see, Corollary 1 is not sharp, but consistent with the experimental orders. More importantly, the rates suggested by Proposition 1 are not obtained. In fact, the experimental rate of convergence seems to be \({\mathcal {O}}(\tau ^{\kappa } ) < {\mathcal {O}}(\tau ^{2-\gamma })\) with \(\kappa = \min \{1,\gamma +\tfrac 12\}\). However, the proof of such an estimate eludes us at the moment.
3 The Optimization Problem
Having studied the state equation, we can proceed with the study of the constrained optimization problem (4)–(5). We will show existence and uniqueness of a solution, along with a numerical technique to approximate it. We will also discuss the convergence properties of the proposed approximation scheme.
3.1 Existence and Uniqueness
To precisely state the constrained optimization problem, we begin by defining the set of admissible controls
which is, under the assumption that a ≼ b, a nonempty, closed, convex, and bounded subset of \(L^2(0,T;\mathbb {R}^n)\).
Now, as the conclusion of Theorem 1 asserts, for any \(z \in L^2(0,T;\mathbb {R}^n)\) there is a unique \(u = u(z) \in \mathbb {U}\) that solves (2). This uniquely defines an affine continuous mapping \({\mathfrak {S}} : L^2(0,T;\mathbb {R}^n) \to \mathbb {U} \subset L^2(0,T;\mathbb {R}^n)\) by the rule \(u = {\mathfrak {S}} z\), where u solves (2). With these tools at hand, we can show the existence and uniqueness of a state–control pair, that is, a pair \(({\breve {u}}, {\breve {z}}) \in \mathbb {U} \times Z_{\mathrm{ad}}\) such that \({\breve {u}} = {\mathfrak {S}} {\breve {z}}\) and satisfies (4)–(5). The proof of the following result is standard and we include it just for the sake of completeness.
Theorem 5 (Existence and Uniqueness)
The optimization problem: Find (u, z) that satisfies (4) subject to (2) and (5) has a unique solution \(({\breve {u}}, {\breve {z}} ) \in \mathbb {U} \times Z_{\mathrm{ad}}\).
Proof
The control to state operator \({\mathfrak {S}}\) allows us to introduce the so-called reduced cost functional:
and to equivalently state the problem as: minimize \({\mathcal {J}}\) over \(Z_{\mathrm{ad}}\). Since μ > 0 and \({\mathfrak {S}}\) is affine, the reduced cost \({\mathcal {J}}\) is strictly convex. Owing to the continuity of \({\mathfrak {S}}\), we have that \({\mathcal {J}}\) is continuous as well. Existence and uniqueness then follow from the direct method of the calculus of variations [7, 23]. □
3.2 Discretization
We now proceed to discretize the optimization problem (4)–(5). We will do so by a piecewise constant approximation of the control and a piecewise linear continuous approximation of the state. We will follow the notation of Section 2.2 and, additionally, define
Once again, \(\mathbb {Z}_{\mathrm{ad}}(\mathcal {T})\) is a nonempty, convex, and closed subset of \(\mathbb {Z}(\mathcal {T})\). Notice also that, since a and b are time-independent, \(\varPi _{\mathcal {T}} Z_{\mathrm{ad}} \subset \mathbb {Z}_{\mathrm{ad}}(\mathcal {T})\).
We also define the discrete cost functional \(J_{\mathcal {T}} : \mathbb {U}(\mathcal {T}) \times \mathbb {Z}(\mathcal {T}) \to \mathbb {R}\) by
where \(\mathbb {U}(\mathcal {T})\) and \(\mathbb {Z}(\mathcal {T})\) are defined in (19) and (20), respectively. We immediately comment that, by an abuse of notation, we defined \(\bar u_d^{\boldsymbol {\tau }} \subset \mathbb {R}^m\) as the sequence of values \(u_d^k = \tau^{-1} \int _{t_k}^{t_{k+1}} u_d \, {\mathrm {d}} t\), that is, the cell averages of the desired state. In other words, we are modifying the cost by replacing the desired state \(u_d\) by its piecewise constant approximation \(\bar {u}^{\boldsymbol {\tau }}_d\). Additionally, we have replaced \(\hat U^{\boldsymbol {\tau }}\) by its piecewise constant counterpart \(\bar {U}^{\boldsymbol {\tau }} \in \mathbb {Z}(\mathcal {T})\). For these reasons,
We propose the following discretization of the state equation (2): Given \(Z^{\boldsymbol {\tau }} \in \mathbb {Z}(\mathcal {T})\), find \(\hat U^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) such that \(\hat U^{\boldsymbol {\tau }}(0) = \psi \) and, for all \(k = 0, \ldots , \mathcal {K}-1 \), we have
where \(\, {\mathrm {d}}_t^\gamma \) is defined in (3) and \( \varPi _{\mathcal {T}}\) corresponds to the \(L^2(0,T;\mathbb {R}^n)\)-orthogonal projection onto \(\mathbb {Z}(\mathcal {T})\). We remark that (36) is nothing but discretization (25)–(26) of the state equation, where the variable z is already piecewise constant in time. Since \(f + Z^{\boldsymbol {\tau }} \in L^2(0,T;\mathbb {R}^n)\), we can invoke Theorem 3 to conclude that problem (36) is stable for all τ > 0.
We thus define the discrete optimization problem as follows: Find \((\breve {\hat U}^{\boldsymbol {\tau }},{\breve {Z}}^{\boldsymbol {\tau }}) \in \mathbb {U}(\mathcal {T})\times \mathbb {Z}_{\mathrm{ad}}(\mathcal {T})\) such that
subject to (36). Let us briefly comment on the existence and uniqueness of a minimizer, which closely follows Theorem 5. Indeed, for every \(z \in L^2(0,T;\mathbb {R}^n)\) there exists a unique \(\hat U^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) that solves (36) with data \(\varPi _{\mathcal {T}} z\). This uniquely defines a map \({\mathfrak {S}}_{\mathcal {T}} : L^2(0,T;\mathbb {R}^n) \to \mathbb {U}(\mathcal {T})\), which we call the discrete control to state map. We can then define the reduced cost as
and proceed as in Theorem 5, by using the strict convexity of \({\mathcal {J}}_{\mathcal {T}}\) and the continuity of the affine map \({\mathfrak {S}}_{\mathcal {T}}\), which follows from Theorem 3.
3.3 Discrete Optimality Conditions
Let us derive discrete optimality conditions. This is useful not only in the practical solution of the discrete optimization problem (36)–(37), but it will help us in analyzing its convergence properties. As stated before, problem (36)–(37) is equivalent to the following constrained optimization problem: Find \(\breve Z^{\boldsymbol {\tau }} \in \mathbb {Z}_{\mathrm{ad}}(\mathcal {T})\) such that
that is, a minimization problem over a closed, bounded, and convex set. It is standard then (since \({\mathcal {J}}_{\mathcal {T}}\) is convex, coercive, and differentiable) that a necessary and sufficient condition for optimality is
where \(D {\mathcal {J}}_{\mathcal {T}}(Z)[\cdot ]\) is the Gâteaux derivative of \({\mathcal {J}}_{\mathcal {T}}\) at the point Z. Let us now rewrite and simplify the optimality condition (38) by introducing the so-called adjoint state that, as stated in [31, Section 1.4.3], is a simple trick that is of utmost importance in optimal control theory.
For a given \(\hat U^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) the adjoint is the function \(\hat P^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) such that \(\hat P^{\boldsymbol {\tau }}(T) = 0\) and, for all \(k = \mathcal {K}-1, \ldots , 0\)
where \(\, {\mathrm {d}}_{T-t}^\gamma \) denotes the right-sided Caputo fractional derivative of order γ defined in (6). The optimality conditions are as follows.
Theorem 6 (Optimality Conditions)
The pair \((\breve {\hat U}^{\boldsymbol {\tau }},\breve Z^{\boldsymbol {\tau }}) \in \mathbb {U}(\mathcal {T}) \times \mathbb {Z}_{\mathrm{ad}}(\mathcal {T})\) solves (37) if and only if \(\breve {\hat U}^{\boldsymbol {\tau }} = {\mathfrak {S}}_{\mathcal {T}} \breve Z^{\boldsymbol {\tau }}\) and
where \(\breve P^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) solves (39) with data \(\breve {\hat U}^{\boldsymbol {\tau }}\).
Proof
We will obtain the result by showing that (40) is nothing but a restatement of (38). In fact, a simple calculation reveals that, for any \(\varTheta ^{\boldsymbol {\tau }}, \varPsi ^{\boldsymbol {\tau }} \in \mathbb {Z}(\mathcal {T})\), we have
Consequently, (38) can be equivalently rewritten as, for every \(Z^{\boldsymbol {\tau }} \in \mathbb {Z}_{\mathrm{ad}}(\mathcal {T})\),
Let us focus our attention now on the first term inside the integral. Denote \(U^{\boldsymbol {\tau }} ={\mathfrak {S}}_{\mathcal {T}} Z^{\boldsymbol {\tau }}\) and \(\breve U^{\boldsymbol {\tau }} ={\mathfrak {S}}_{\mathcal {T}} \breve Z^{\boldsymbol {\tau }}\). Define \(\varPhi ^{\boldsymbol {\tau }} := U^{\boldsymbol {\tau }} - \breve U^{\boldsymbol {\tau }}\) and notice that \(\hat \varPhi ^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) satisfies: \(\hat \varPhi ^{\boldsymbol {\tau }}(0) = 0\) and, for every \(k = 0, \ldots , \mathcal {K}-1\),
or, in view of (22), equivalently,
Multiply this equation by \(\bar {\breve P}^{\boldsymbol {\tau }}\) and integrate to obtain
Now, multiply (39) by \(\bar \varPhi ^{\boldsymbol {\tau }}\) and integrate to obtain
Subtract these last two identities. Upon remembering the definition of Φτ, we thus obtain
where we have used that the matrix \(\mathcal {A}\) is symmetric. Notice that the last term in this expression is nothing but the first term on the left-hand side of (41). In other words, if we can show that
we obtain the result.
To show this we realize that, since we are dealing with piecewise constants, we can equivalently rewrite the left-hand side of this identity as
where we used (29).
In a similar manner to the computations of Remark 2, we can obtain that
consequently
We can invoke now the results of [4, Appendix A] to conclude that the identity (42) holds. The theorem is thus proven. □
Remark 3 (Discrete Fractional Integration by Parts)
Notice that, during the course of the proof of Theorem 6 we showed that, whenever \(\hat V^{\boldsymbol {\tau }}, \hat W^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) satisfy \(\hat V^{\boldsymbol {\tau }}(0) = 0\) and \(\hat W^{\boldsymbol {\tau }}(T) = 0\), then they satisfy the following discrete fractional integration by parts
This identity shall prove useful in the sequel.
Remark 4 (Projection)
The variational inequality (40) can be solved rather easily. Indeed, since all the involved functions belong to \(\mathbb {Z}(\mathcal {T})\), it suffices to consider one time interval, say \((t_{k-1}, t_k]\), where we must have
From this it immediately follows that
where, for \(w \in \mathbb {R}^n\), we define Pr[a,b]w as the projection onto the cube \([a,b] = \left \{ x \in \mathbb {R}^n : a \preceq x \preceq b \right \}\), which can be easily accomplished by the formula
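Componentwise, this projection is a clamp: \((\mathrm{Pr}_{[a,b]} w)_i = \min\{b_i, \max\{a_i, w_i\}\}\). A short Python sketch (the name `project_box` is ours):

```python
def project_box(w, a, b):
    """Project w componentwise onto the box [a, b] (assuming a <= b
    componentwise): clamp each coordinate w_i into [a_i, b_i]."""
    return [min(bi, max(ai, wi)) for wi, ai, bi in zip(w, a, b)]
```

With piecewise constant controls this clamp is applied on each time interval separately, which is what makes this discretization so convenient.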
This is the main advantage of considering piecewise constant approximations of the control and a modified cost. Other variants might yield a better approximation, but at the cost of a more involved solution scheme.
3.4 Convergence
Let us now discuss the convergence of our approximation scheme. The main issue here is that since, even for a smooth f, the right-hand side of (36) belongs only to \(L^2(0,T;\mathbb {R}^n)\) we cannot invoke the results of Corollary 1 to establish a rate of convergence. Notice, additionally, that we modified the cost, one of the reasons being that this led us to the simplifications detailed in Remark 4. As a consequence we only show convergence without rates.
We begin by noticing that, for any \(z \in L^2(0,T;\mathbb {R}^n)\) we have that , where \(\hat V^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) satisfies
and the linear, continuous operator is the solution operator for the scheme: Find \(\hat U_0^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) such that \(\hat U_0^{\boldsymbol {\tau }} (0) = 0\) and, for \(k=0, \ldots , \mathcal {K}-1\),
Let us describe the properties of \(\hat V^{\boldsymbol {\tau }}\).
Proposition 2 (Properties of \(\hat V^{\boldsymbol {\tau }}\))
Assume that \(f \in L^2(0,T;\mathbb {R}^n)\) , then the family \(\{\hat V^{\boldsymbol {\tau }}\}_{\mathcal {T}}\) converges, as \(\mathcal {K} \to \infty \) , in \(L^2(0,T;\mathbb {R}^n)\) to \(v \in \mathbb {U}\) , which solves
Proof
The claimed result is obtained by a simple density argument, combined with the stability of the continuous and discrete state equations. Let 𝜖 > 0. Since \(f \in L^2(0,T;\mathbb {R}^n)\), there is an \(f_\epsilon \in H^2(0,T;\mathbb {R}^n)\) such that
where by C1 we denote the constant in inequality (16). Denote by v𝜖 the solution to
The smoothness of f𝜖 allows us to invoke Theorem 2 to assert that the regularity estimates (18), with u replaced by v𝜖, hold. In addition, invoking Theorem 1, we get that
Let us now approximate v𝜖 via the scheme (26), over a mesh \(\mathcal {T}\) where \(\mathcal {K}\) remains to be chosen. In doing so we obtain a function \(\hat V_\epsilon ^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\). Moreover, since v𝜖 verifies the assumptions of Theorem 2, we invoke Corollary 1 to conclude that
where C2 denotes a positive constant that depends on \(\|f_\epsilon \|{ }_{H^2(0,T;\mathbb {R}^n)}\). However, since 𝜖 is fixed, we can choose \(\mathcal {K}\) so that
The last ingredient is to observe that the difference \(\hat V_\epsilon ^{\boldsymbol {\tau }} - \hat V^{\boldsymbol {\tau }}\) solves (25)–(26) with zero initial condition and right-hand side \(\varPi _{\mathcal {T}}( f - f_\epsilon )\). We then invoke the stability of the scheme, stated in Theorem 3, to obtain
where we used that \(\varPi _{\mathcal {T}}\) is a projection.
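The two properties of \(\varPi _{\mathcal {T}}\) used here, idempotence and \(L^2\)-nonexpansiveness, can be checked on a toy discretization. In the sketch below (our own names; cellwise averaging on a uniform grid stands in for the \(L^2\)-orthogonal projection onto \(\mathbb {Z}(\mathcal {T})\)):

```python
import numpy as np

def pi_T(f, K):
    """L2 projection onto piecewise constants: average f over each of
    K equal cells (f given by samples on a fine uniform grid)."""
    means = f.reshape(K, -1).mean(axis=1)
    return np.repeat(means, f.size // K)

rng = np.random.default_rng(1)
f = rng.normal(size=1000)                 # samples of f on a fine grid
Pf = pi_T(f, K=10)

# Idempotent: projecting twice changes nothing.
assert np.allclose(pi_T(Pf, K=10), Pf)
# Nonexpansive in L2: ||Pi_T f|| <= ||f|| (discrete l2 surrogate).
assert np.linalg.norm(Pf) <= np.linalg.norm(f) + 1e-12
```

Nonexpansiveness is exactly the estimate \(\|\varPi _{\mathcal {T}}(f - f_\epsilon )\| \leq \|f - f_\epsilon \|\) invoked above, and it follows from Cauchy–Schwarz applied cell by cell.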
Combine these observations to conclude that
Finally, since v𝜖 → v, after possibly taking an even larger \(\mathcal {K}\) we can assert
This concludes the proof. □
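The density argument in this proof mirrors a practical fact: piecewise constant approximation still converges for rough data, only without a guaranteed rate. A small numerical illustration (our own, with a deliberately singular f):

```python
import numpy as np

# Fine midpoint grid on (0, 1); f is rough at t = 0 but lies in L^2(0, 1).
N = 2 ** 15
t = (np.arange(1, N + 1) - 0.5) / N
f = t ** (-0.25)

def proj_error(K):
    """L2 error of the piecewise constant (cellwise average) projection."""
    Pf = np.repeat(f.reshape(K, -1).mean(axis=1), N // K)
    return np.sqrt(np.mean((f - Pf) ** 2))

errs = [proj_error(K) for K in (16, 64, 256, 1024)]
# The error decreases as K grows, but the observed rate is degraded
# by the singularity at t = 0 -- convergence without rates.
assert all(e1 > e2 for e1, e2 in zip(errs, errs[1:]))
```

This is the same phenomenon as in the state equation: smoothness of the data does not translate into time regularity of the solution, so rate estimates require assumptions that generally fail.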
The main consequence of this statement arises when we use the decomposition of \({\mathfrak {S}}_{\mathcal {T}}\) in the reduced cost. Namely, we get
for \(W^{\boldsymbol {\tau }} = u_d^{\boldsymbol {\tau }} - \mathcal {C} V^{\boldsymbol {\tau }}\); in other words, only the discrete desired state changes and, moreover, \(W^{\boldsymbol {\tau }} \to u_d - \mathcal {C} v\) in \(L^2(0,T;\mathbb {R}^m)\) as \(\mathcal {K} \to \infty \). All these considerations allow us to reduce the problem to the case where ψ = 0 and f ≡ 0, so that the discrete control-to-state map is a linear operator.
In this setting we can assert the strong convergence of the discrete solution operators and of their adjoints, which will be a fundamental tool in proving convergence. Here and in what follows, we denote by \({\mathfrak {B}}(L^2(0,T;\mathbb {R}^n))\) the space of bounded linear operators on \(L^2(0,T;\mathbb {R}^n)\) endowed with the operator norm.
Lemma 2 (Strong Convergence)
The families of discrete solution operators and of their adjoints are uniformly bounded in \({\mathfrak {B}}(L^2(0,T;\mathbb {R}^n))\) and strongly convergent.
Proof
We begin by realizing that the uniform boundedness, in \({\mathfrak {B}}(L^2(0,T;\mathbb {R}^n))\), of the discrete solution operators is a restatement of Theorem 3; see [13, 18]. Moreover, the error estimates of Corollary 1 are valid for a collection of right-hand sides that is dense in \(L^2(0,T;\mathbb {R}^n)\). This means, by an argument similar to the one provided in Proposition 2, that the discrete solutions converge for every \(z \in L^2(0,T;\mathbb {R}^n)\); see [13, Proposition 5.17].
Let us now prove the same statements for the family of adjoints. To do so we must first identify it. Let \(z,\eta \in L^2(0,T;\mathbb {R}^n)\) and \(\hat U_0^{\boldsymbol {\tau }}\) solve (43). In addition, let \(\hat P^{\boldsymbol {\tau }} \in \mathbb {U}(\mathcal {T})\) be the solution to (39) but with the right-hand side replaced by \(\varPi _{\mathcal {T}} \eta \). Multiply the aforementioned equations by \(\bar P^{\boldsymbol {\tau }}\) and \(\bar U_0^{\boldsymbol {\tau }}\), integrate and subtract to obtain
where we used that the matrix \(\mathcal {A}\) is symmetric. We now invoke Remark 3 to conclude that the right-hand side of the previous expression vanishes, which implies that
where the first and last equalities hold by the definition of \(\varPi _{\mathcal {T}}\). Since this identity holds for every \(z \in L^2(0,T;\mathbb {R}^n)\), it characterizes the adjoint of the discrete solution operator.
It now remains to realize that \(\hat P^{\boldsymbol {\tau }}\) is a discretization of the problem
Repeating the arguments that led to Theorem 3 and Corollary 1, we see that \(\hat P^{\boldsymbol {\tau }}\) arises from a stable and consistent approximation, so we can, again, conclude the uniform boundedness and strong convergence of the family of adjoints. □
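The adjoint identification mirrors an elementary linear-algebra fact: the discrete solve is a lower-triangular linear map in the nodal values, and its adjoint is the transposed system, i.e., the same recursion run backward in time. A toy sketch (ours; a scalar implicit Euler scheme stands in for the fractional scheme of the chapter):

```python
import numpy as np

K, tau, a = 50, 0.02, -1.0
r = 1.0 / (1.0 - tau * a)        # implicit Euler amplification factor

# Implicit Euler for u' = a u + z, u(0) = 0:
#   u_k = r * (u_{k-1} + tau * z_k),  so  u_k = sum_{j<=k} tau * r^(k-j+1) * z_j.
# The map z |-> u is the lower-triangular matrix L below.
L = np.array([[tau * r ** (k - j + 1) if j <= k else 0.0
               for j in range(K)] for k in range(K)])

rng = np.random.default_rng(2)
z, eta = rng.normal(size=K), rng.normal(size=K)

# Sanity check: L reproduces the time-stepping recursion.
u, prev = np.zeros(K), 0.0
for k in range(K):
    u[k] = prev = r * (prev + tau * z[k])
assert np.allclose(L @ z, u)

# <S z, eta> = <z, S* eta>: the adjoint is the transpose, i.e., the
# same triangular system solved backward in time.
assert np.isclose((L @ z) @ eta, z @ (L.T @ eta))
```

The integrate-and-subtract computation in the proof is precisely the continuous-in-time version of this transposition.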
We are now ready to establish convergence of our scheme.
Theorem 7 (Convergence)
The family \(\{ \breve Z^{\boldsymbol {\tau }} \}_{\mathcal {T}}\) of optimal controls is uniformly bounded and contains a subsequence that converges strongly to \(\breve z\) , the solution to (4).
Proof
Boundedness is a consequence of optimality. Indeed, if z0 ∈ Zad then
where we used the continuity of \({\mathfrak {S}}_{\mathcal {T}}\) and \(\varPi _{\mathcal {T}}\). This implies the existence of a (not relabeled) weakly convergent subsequence.
To show convergence of this sequence to \(\breve z\), we invoke the theory of Γ-convergence [7], so that we must verify three assumptions:
1. Lower bound: We must show that, whenever \(Z^{\boldsymbol {\tau }} \rightharpoonup z\), we have \({\mathcal {J}}(z) \leq \liminf {\mathcal {J}}_{\mathcal {T}}(Z^{\boldsymbol {\tau }})\). To do so, let \(\eta \in L^2(0,T;\mathbb {R}^n)\) and notice that
$$\displaystyle \begin{aligned} \int_0^T \left[ \overline{{\mathfrak{S}}_{\mathcal{T}} Z^{\boldsymbol{\tau}}} - {\mathfrak{S}} z \right] \cdot \eta \, {\mathrm{d}} t &= \int_0^T \left[ \overline{{\mathfrak{S}}_{\mathcal{T}} z} - {\mathfrak{S}} z \right] \cdot \eta \, {\mathrm{d}} t + \int_0^T \overline{{\mathfrak{S}}_{\mathcal{T}} (Z^{\boldsymbol{\tau}} - z)} \cdot \eta \, {\mathrm{d}} t \\ &= A + B. \end{aligned} $$
The pointwise convergence of the discrete solution operators shows that A → 0, while the weak convergence \(Z^{\boldsymbol {\tau }} \rightharpoonup z\), combined with the pointwise convergence of the adjoints, shows that B → 0. In conclusion, \({\mathfrak {S}}_{\mathcal {T}} Z^{\boldsymbol {\tau }} \rightharpoonup {\mathfrak {S}} z\). Now, owing to the weak lower semicontinuity of norms, and the fact that \(\bar u^{\boldsymbol {\tau }}_d \to u_d\) in \(L^2(0,T;\mathbb {R}^m)\), we conclude
$$\displaystyle \begin{aligned} {\mathcal{J}}(z) \leq \liminf {\mathcal{J}}_{\mathcal{T}}(Z^{\boldsymbol{\tau}}). \end{aligned}$$
2. Existence of a recovery sequence: We must show that, for every z ∈ Zad, there is \(Z^{\boldsymbol {\tau }} \in \mathbb {Z}_{\mathrm{ad}}(\mathcal {T})\) such that \(Z^{\boldsymbol {\tau }} \rightharpoonup z\) and \({\mathcal {J}}(z) \geq \limsup {\mathcal {J}}_{\mathcal {T}}(Z^{\boldsymbol {\tau }})\). To do so, it suffices to set \(Z^{\boldsymbol {\tau }} = \varPi _{\mathcal {T}} z\). Indeed, in this case we even have the strong convergence \({\mathfrak {S}}_{\mathcal {T}} \varPi _{\mathcal {T}} z \to {\mathfrak {S}} z\). Continuity of norms and the convergence of \(\bar u_d^{\boldsymbol {\tau }}\) allow us to conclude the inequality for the costs.
3. Equicoerciveness: We must show that, for every \(r \in \mathbb {R}\), there is a weakly closed and weakly compact \(K_{r} \subset L^2(0,T;\mathbb {R}^n)\) such that, for all \(\mathcal {T}\), the r-sublevel set of \({\mathcal {J}}_{\mathcal {T}}\) is contained in Kr. To do so it suffices to notice that
$$\displaystyle \begin{aligned} {\mathcal{J}}_{\mathcal{T}}(Z^{\boldsymbol{\tau}}) \geq \frac\mu2 \| Z^{\boldsymbol{\tau}} \|{}_{L^2(0,T;\mathbb{R}^n)}^2. \end{aligned}$$
Thus, invoking [7, Proposition 7.7], we can immediately conclude.
With these three ingredients, we can now show convergence. Indeed, the lower bound inequality and recovery sequence property allow us to say that
so that minimizers of \({\mathcal {J}}_{\mathcal {T}}\) converge to minimizers of \({\mathcal {J}}\). Equicoerciveness and the uniqueness of \(\breve z\) are the conditions of the so-called fundamental lemma of Γ-convergence [7, Corollary 7.24] which allow us to conclude that \(\breve Z^{\boldsymbol {\tau }} \rightharpoonup \breve z\).
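The mechanism of the fundamental lemma can be seen on a scalar toy problem (ours, purely illustrative): quadratic costs with a perturbed desired state Γ-converge to the limit cost, and their minimizers and minimal values converge.

```python
mu = 1.0

def J(z, z_d):
    """Limit cost: tracking term plus Tikhonov regularization."""
    return 0.5 * (z - z_d) ** 2 + 0.5 * mu * z ** 2

def J_K(z, z_d, K):
    """'Discrete' cost with a perturbed desired state z_d + 1/K,
    mimicking u_d^tau -> u_d as K -> infinity."""
    return J(z, z_d + 1.0 / K)

z_d = 2.0
z_star = z_d / (1.0 + mu)            # unique minimizer of J

# Minimizers of J_K are (z_d + 1/K)/(1 + mu), and they converge to
# z_star; equicoercivity holds since J_K(z) >= (mu/2) z^2.
mins = [(z_d + 1.0 / K) / (1.0 + mu) for K in (10, 100, 1000)]
assert abs(mins[-1] - z_star) < 1e-3
assert abs(J_K(mins[-1], z_d, 1000) - J(z_star, z_d)) < 1e-2
```

In the chapter's setting the perturbation of the cost comes from the discrete desired state and the discrete control-to-state map, but the conclusion has exactly this structure: minimizers and minimal values both converge.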
We finalize the proof by showing strong convergence. To do so we first note that, by Dal Maso [7, equation (7.32)], we have \({\mathcal {J}}_{\mathcal {T}}(\breve Z^{\boldsymbol {\tau }}) \to {\mathcal {J}}(\breve z)\). Therefore
where we, again, used the convergence of the adjoint.
This concludes the proof of convergence. □
We conclude by showing weak convergence of the state.
Corollary 2 (State Convergence)
In the setting of Theorem 7 we have that \(\breve {U}^{\boldsymbol {\tau }} \rightharpoonup \breve {u}\) in \(L^2(0,T;\mathbb {R}^n)\).
Proof
This follows from the strong convergence of \(\breve Z^{\boldsymbol {\tau }}\) and of the discrete adjoints. Indeed, let \(v \in L^2(0,T;\mathbb {R}^n)\) and notice that
Since \(\breve U^{\boldsymbol {\tau }} = {\mathfrak {S}}_{\mathcal {T}} \breve Z^{\boldsymbol {\tau }}\), we obtain the result by invoking Proposition 2. □
References
N.I. Achieser. Theory of approximation. Dover Publications, Inc., New York, 1992. Translated from the Russian and with a preface by Charles J. Hyman, Reprint of the 1956 English translation.
O.P. Agrawal. A general formulation and solution scheme for fractional optimal control problems. Nonlinear Dynam., 38(1–4):323–337, 2004.
O.P. Agrawal. A general finite element formulation for fractional variational problems. J. Math. Anal. Appl., 337(1):1–12, 2008.
H. Antil, E. Otárola, and A.J. Salgado. A space-time fractional optimal control problem: analysis and discretization. SIAM J. Control Optim., 54(3):1295–1328, 2016.
M. Caputo. Diffusion of fluids in porous media with memory. Geothermics, 28(1):113–130, 1999.
A. Cartea and D. del Castillo-Negrete. Fractional diffusion models of option prices in markets with jumps. Physica A, 374:749–763, 2007.
G. Dal Maso. An introduction to Γ-convergence. Progress in Nonlinear Differential Equations and their Applications, 8. Birkhäuser Boston, Inc., Boston, MA, 1993.
L. Debnath. Fractional integral and fractional differential equations in fluid mechanics. Fract. Calc. Appl. Anal., 6(2):119–155, 2003.
D. del Castillo-Negrete. Fractional diffusion models of nonlocal transport. Phys. Plasmas, 13(8):082308, 16, 2006.
D. del Castillo-Negrete, B. A. Carreras, and V. E. Lynch. Nondiffusive transport in plasma turbulence: A fractional diffusion approach. Phys. Rev. Lett., 94:065003, Feb 2005.
K. Diethelm. The analysis of fractional differential equations, volume 2004 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 2010. An application-oriented exposition using differential operators of Caputo type.
T.M. Flett. A note on some inequalities. Proc. Glasgow Math. Assoc., 4:7–15 (1958), 1958.
G.B. Folland. Real analysis. John Wiley & Sons, Inc., New York, 1984.
R. Gorenflo, A.A. Kilbas, F. Mainardi, and S.V. Rogosin. Mittag-Leffler functions, related topics and applications. Springer Monographs in Mathematics. Springer, Heidelberg, 2014.
R. Gorenflo, J. Loutchko, and Y. Luchko. Computation of the Mittag-Leffler function Eα,β(z) and its derivative. Fract. Calc. Appl. Anal., 5(4):491–518, 2002. Dedicated to the 60th anniversary of Prof. Francesco Mainardi.
B. Jin, R. Lazarov, and Z. Zhou. An analysis of the L1 scheme for the subdiffusion equation with nonsmooth data. IMA J. Numer. Anal., 36(1):197–221, 2016.
J.-P. Kahane. Teoría constructiva de funciones. Universidad de Buenos Aires, Buenos Aires, 1961.
L. V. Kantorovich and G. P. Akilov. Funktsionalnyi analiz. “Nauka”, Moscow, third edition, 1984.
A. A. Kilbas, H. M. Srivastava, and J. J. Trujillo. Theory and applications of fractional differential equations, volume 204 of North-Holland Mathematics Studies. Elsevier Science B.V., Amsterdam, 2006.
M.A. Krasnosel’skiı̆ and Ja.B. Rutickiı̆. Convex functions and Orlicz spaces. Translated from the first Russian edition by Leo F. Boron. P. Noordhoff Ltd., Groningen, 1961.
Y. Lin, X. Li, and C. Xu. Finite difference/spectral approximations for the fractional cable equation. Math. Comp., 80(275):1369–1396, 2011.
Y. Lin and C. Xu. Finite difference/spectral approximations for the time-fractional diffusion equation. J. Comput. Phys., 225(2):1533–1552, 2007.
J.-L. Lions. Optimal control of systems governed by partial differential equations. Translated from the French by S. K. Mitter. Die Grundlehren der mathematischen Wissenschaften, Band 170. Springer-Verlag, New York-Berlin, 1971.
A. Lotfi and S. A. Yousefi. A numerical technique for solving a class of fractional variational problems. J. Comput. Appl. Math., 237(1):633–643, 2013.
W. McLean. Regularity of solutions to a time-fractional diffusion equation. ANZIAM J., 52(2):123–138, 2010.
R.H. Nochetto, E. Otárola, and A.J. Salgado. A PDE approach to space-time fractional parabolic problems. SIAM J. Numer. Anal., 54(2):848–873, 2016.
K. Sakamoto and M. Yamamoto. Initial value/boundary value problems for fractional diffusion-wave equations and applications to some inverse problems. J. Math. Anal. Appl., 382(1): 426–447, 2011.
S. G. Samko, A. A. Kilbas, and O. I. Marichev. Fractional integrals and derivatives. Gordon and Breach Science Publishers, Yverdon, 1993. Theory and applications, Edited and with a foreword by S. M. Nikol’skiı̆, Translated from the 1987 Russian original, Revised by the authors.
Z. Sun and X. Wu. A fully discrete difference scheme for a diffusion-wave system. Appl. Numer. Math., 56(2):193–209, 2006.
C. Tricaud and Y. Chen. An approximate method for numerically solving fractional order optimal control problems of general form. Comput. Math. Appl., 59(5):1644–1655, 2010.
F. Tröltzsch. Optimal control of partial differential equations, volume 112 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2010. Theory, methods and applications, Translated from the 2005 German original by Jürgen Sprekels.
X. Ye and C. Xu. Spectral optimization methods for the time fractional diffusion inverse problem. Numer. Math. Theory Methods Appl., 6(3):499–516, 2013.
X. Ye and C. Xu. A spectral method for optimal control problems governed by the time fractional diffusion equation with control constraints. In Spectral and high order methods for partial differential equations—ICOSAHOM 2012, volume 95 of Lect. Notes Comput. Sci. Eng., pages 403–414. Springer, Cham, 2014.
Acknowledgements
E. Otárola was supported in part by CONICYT through FONDECYT project 3160201. A.J. Salgado was supported in part by NSF grant DMS-1418784.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
Cite this chapter
Otárola, E., Salgado, A.J. (2018). Optimization of a Fractional Differential Equation. In: Antil, H., Kouri, D.P., Lacasse, MD., Ridzal, D. (eds) Frontiers in PDE-Constrained Optimization. The IMA Volumes in Mathematics and its Applications, vol 163. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-8636-1_8
Print ISBN: 978-1-4939-8635-4
Online ISBN: 978-1-4939-8636-1