1 Introduction

This volume is aimed at early-career graduate students and anyone else interested in entering the field of PDE-constrained optimization (PDECO), i.e., optimization problems with PDE constraints. The purpose of this particular chapter is to provide a brief mathematical introduction to the topic, with an emphasis on optimality conditions and finite-element discretization. Most of the material presented here is well known and can be found in several textbooks; in particular, we mention [14, 53, 56, 62, 81].

PDE-constrained optimization problems arise in many applications, for instance in flow control [42] and shape optimization [43, 80]. To name a few: controlling pollutants in a river [5]; drug delivery using magnetic fields [8, 9]; optimal placement of droplets on electrowetting on dielectric (EWOD) devices [7]; shape optimization of microfluidic biochips [3, 4]. In the first two examples, the control is distributed (confined to either the entire domain or a subdomain). In the third example, the control acts on the boundary, and in the final example, the control is the shape of the domain. These problems range from the macroscale to the microscale and pose varying levels of difficulty due to the underlying physics.

A PDE-constrained optimization problem has four or five major components: i) a control variable z; ii) a state variable u; iii) a state equation (i.e., a PDE or system of PDEs) which associates a state u with every control z; iv) a cost functional J to be minimized which depends on z and u; and, possibly, v) constraints imposed on z (control constraints) and/or on u (state constraints). In the most abstract form, this amounts to

$$\displaystyle \begin{aligned} \text{minimize}\quad J(u,z) \end{aligned} $$
(1)

subject to u ∈ Uad and z ∈ Zad satisfying

$$\displaystyle \begin{aligned} e(u,z)=0. \end{aligned} $$
(2)

Thus we want to minimize the cost J, which is a function of the state u and the control z. The state equation (2) represents a PDE or system of PDEs (usually in weak form), with the state u lying in an admissible set Uad and the control z in an admissible set Zad.

To fully understand PDECO problems, the following topics must be considered: (i) functional analysis of the problem; (ii) solver development; (iii) discretization and software development. We shall briefly elaborate on each next; a more detailed discussion will be given in the subsequent chapters. By analysis, we mean

  • Existence, uniqueness, and regularity of a solution to (2).

  • Existence and uniqueness of a solution to the PDECO problem (1)–(2).

  • First-order necessary optimality conditions using adjoint equations.

  • If possible, the second-order necessary/sufficient conditions.

We can carry out the solver development, typically gradient based, either at the continuous (infinite-dimensional) level [49, 50, 82] or at the discrete (finite-dimensional) level [57, 66]. The choice is between two options: (a) the “first discretize, then optimize” approach, or (b) the “first optimize, then discretize” approach. As the names suggest, in case (a) one first discretizes the optimization problem and then writes the optimality conditions at the finite-dimensional level. On the other hand, (b) requires first writing the optimality conditions at the continuous level and then discretizing them. These two approaches are in general different. For instance, the discretization of the adjoint variables is tied to that of the state variables if one proceeds with (a); approach (b) provides more flexibility in this particular respect. There is no universal rule for selecting one of the two approaches [53, Section 3.2.4]. However, a choice should be made so that the discrete system preserves the structure of the continuous problem. It is worth noting that the optimization methods can be derived directly at the continuous level. In any event, one should avoid using off-the-shelf finite-dimensional algorithms to solve PDECO problems. This may lead to mesh dependence, i.e., the number of optimization iterations drastically increases with mesh refinement. In order to overcome such issues, we must adapt such algorithms (if we are using them) to incorporate functional-analytic concepts, for instance, suitable inner products; a small illustration is sketched after the list below. Discretization and software development amount to

  • Either finite-difference, finite-volume, or finite-element discretizations of the PDE and the control [19, 46, 70]. Other discretizations, such as wavelets, are possible as well; see [30, 31, 78] and the references therein.

  • Analysis of the discrete PDE and optimization solvers, and verification of mesh independence [51, 82].

  • Make the solver efficient, for instance, by using

    • Adaptive finite-element methods (AFEM) [59, 67].

    • Preconditioning techniques [15, 76, 84], time-parallel approaches [16, 32, 33, 83].

    • Model reduction: Proper orthogonal decomposition [6, 41, 54, 55], reduced-basis method [17, 45, 69].

  • Software development, for instance, the Rapid Optimization Library (ROL) [60] and Dolfin-Adjoint [36].
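
To make the role of the inner product concrete, the following minimal MATLAB sketch (our own illustration, not part of the chapter's code) compares two representations of the derivative of the discretized functional J(z) = (z, s)L2(Ω) on a uniform mesh of (0, 1): the coefficient vector Ms obtained by differentiating the discrete sum, and its Riesz representative with respect to the mass-matrix inner product. The former shrinks as the mesh is refined, whereas the latter is mesh independent, which is exactly the adaptation alluded to above.

```matlab
% Mesh (in)dependence of gradient representatives: a 1D P1-FEM sketch.
for n = [50 200]                          % two mesh resolutions
  h = 1/(n+1);  x = (1:n)'*h;             % interior nodes of a uniform mesh
  e = ones(n,1);
  M = h/6*spdiags([e 4*e e], -1:1, n, n); % P1 mass matrix (discrete L2 product)
  s = sin(pi*x);                          % J(z) = (z,s)_{L2} ~ z'*M*s
  g_euclid = M*s;                         % derivative w.r.t. Euclidean product
  g_L2 = M\g_euclid;                      % Riesz representative w.r.t. M-product
  fprintf('n = %3d: |g_euclid| = %8.2e, max|g_L2 - s| = %8.2e\n', ...
      n, norm(g_euclid), norm(g_L2 - s, inf));
end
```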

Our goal for this chapter is to collect the necessary ingredients to understand basic PDECO problems. The remaining chapters of this volume provide comprehensive treatments of variational inequalities, algorithm development, optimization under uncertainty, inverse problems, shape optimization, and several applications. We introduce notation, relevant function spaces, and notions of derivatives. We provide an abstract treatment to solve (1)–(2). Moreover, we discuss the linear quadratic PDECO problem in detail (quadratic cost functional with linear elliptic PDEs and control constraints), provide a flavor of the semilinear quadratic PDECO problem, and discuss the necessary components for the numerical analysis of the linear and semilinear quadratic PDECO problems. We also provide a MATLAB implementation of the semilinear quadratic PDECO problem in which the space discretization is carried out using the finite-element method (FEM). We solve the resulting optimization problem using the Newton-CG or L-BFGS methods in the absence of control constraints (Zad = Z), and the semismooth Newton method in the presence of control constraints. The code can be downloaded from the link

https://bitbucket.org/harbirantil/pde_constrained_opt

The first part of this chapter is organized as follows: in Section 2, we first consider a general optimization problem. We discuss existence of solutions to this problem in Section 2.1 and provide notions of derivatives in function spaces in Section 2.2. We conclude this section with the first-order necessary optimality conditions (cf. Section 2.3). We apply the approach discussed in Section 2 to the general PDECO problem (1)–(2) in Section 3 and discuss the full and the reduced forms. Derivation of optimality conditions is performed using both full and reduced forms.

In the second part of this chapter, we focus on the linear and semilinear quadratic PDECO problems. We introduce basic Sobolev spaces in Section 4. In Section 5, we mention the results relating to the well-posedness (existence and uniqueness whenever possible) of linear elliptic PDEs and their regularity. Next in Section 6, we formulate the linear quadratic PDECO problem and study its well-posedness in the reduced form. Section 7 discusses the semilinear quadratic PDECO problem. We conclude by introducing a finite-element discretization for the linear and semilinear quadratic PDECO problems in Section 8.

2 Abstract Optimization Problem

The purpose of this section is to consider an abstract optimization problem and study its well-posedness (cf. Section 2.1). We derive the first-order necessary optimality conditions in Section 2.3. This requires the notions of derivatives in function spaces (cf. Section 2.2). The results presented in this section will be applied to the PDECO problems in the subsequent sections.

Let Z be a real reflexive Banach space and let Zad ⊂ Z be a closed convex subset. We consider the minimization problem

$$\displaystyle \begin{aligned} \min_{z \in Z_{ad}} f(z).\end{aligned} $$
(3)

2.1 Existence

We first show existence of a solution to the minimization problem (3) using the direct method of calculus of variations. This result is also known as the Weierstrass theorem.

Theorem 1

Suppose \(f:Z \rightarrow \mathbb {R}\) is weakly lower semicontinuous with Z a reflexive Banach space and Zad ⊂ Z closed and convex. Assume that the lower γ-level set {z ∈ Zad : f(z) ≤ γ} of f is nonempty and bounded for some \(\gamma \in \mathbb {R}\). Then problem (3) has an optimal solution, i.e., there exists \(\bar {z} \in Z_{ad}\) such that \(f(\bar {z}) \le f(z)\) for all z ∈ Zad. If f is strictly convex, then the solution is unique.

Proof

The proof is based on the direct method of calculus of variations. Following the proof of [14, Theorem 3.2.1], we can construct a minimizing sequence \(\{z_n\}_{n\in \mathbb {N}}\) contained in the lower γ-level set such that \(f(z_n) \rightarrow \inf _{z \in Z_{ad}} f(z)\) as n → ∞. Since the lower γ-level set is convex and closed, it is weakly sequentially closed [81, Theorem 2.11]. In addition, since Z is reflexive and the lower γ-level set is bounded, it is weakly sequentially compact [81, Theorem 2.11]. As a result, there exists a subsequence (not relabeled) such that

$$\displaystyle \begin{aligned} z_n \rightharpoonup \bar{z} \quad \mbox{in } Z \end{aligned}$$

with \(\bar {z}\) in the lower γ-level set. It then remains to show that \(\bar {z}\) is the optimal solution. Due to the weak lower semicontinuity of f, we conclude that

$$\displaystyle \begin{aligned} f(\bar{z}) \le \liminf f(z_n) = \inf_{z\in Z_{ad}} f(z) . \end{aligned}$$

Finally, in order to show uniqueness, let z1 and z2 be two distinct optimal solutions. Using the definition of strict convexity, we have

$$\displaystyle \begin{aligned} f\left(\frac{z_1+z_2}{2}\right) < \frac{1}{2} f(z_1) + \frac{1}{2} f(z_2) = \inf f(z) \end{aligned}$$

which is a contradiction. This completes the proof.

After showing the existence of minimizers, a natural question to ask is what the first-order optimality conditions are. However, we need to understand the notions of derivatives in function spaces before we can proceed further.

2.2 Differentiation in Banach Spaces

We introduce the notions of derivatives in function spaces [18, 79]. As an example, we shall apply the ideas to a quadratic cost functional. We will also derive the first-order optimality conditions for the problem (3) based on the derivatives introduced in this section.

Let \(\mathcal {L}(A,B)\) denote the space of bounded linear operators from a Banach space A to a Banach space B. Let (Z, ∥⋅∥Z) and (V, ∥⋅∥V) be real Banach spaces, let \(\mathcal {Z} \subset Z\) be open, and let \(F : \mathcal {Z} \rightarrow V\). Moreover, let \(z \in \mathcal {Z}\).

Definition 1 (Directional Derivative)

F is said to be directionally differentiable at z if the limit \( \lim _{t\downarrow 0}\frac {1}{t} (F(z+th) - F(z))\) exists in V for all h ∈ Z. If this limit exists, we denote

$$\displaystyle \begin{aligned}F'(z,h) := \lim_{t\downarrow 0}\frac{1}{t} (F(z+th) - F(z)) \end{aligned}$$

and say that F′(z, h) is the directional derivative of F at z in the direction h.

Notice that for a given z, the map h ↦ F′(z, h) is not necessarily linear, but it is positively homogeneous; we refer to [79, 81] for examples, one of which follows.
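
For instance, a standard example (our own addition here) is \(F : \mathbb {R}^2 \rightarrow \mathbb {R}\) given by \(F(x,y) = x^2 y/(x^4+y^2)\) for (x, y) ≠ (0, 0) and F(0, 0) = 0. A direct computation shows that \(F'((0,0),h) = h_1^2/h_2\) for h2 ≠ 0 and F′((0, 0), h) = 0 for h2 = 0, which is positively homogeneous in h but not linear; in particular, F is directionally differentiable at the origin but not Gâteaux differentiable there.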

Definition 2 (Gâteaux Derivative)

F is said to be Gâteaux differentiable at z if it is directionally differentiable and F′(z, h) = F′(z)h for \(F'(z) \in \mathcal {L}(Z,V)\). We refer to F′(z) as the Gâteaux derivative at z.

We next introduce a stronger notion.

Definition 3 (Fréchet Derivative)

F is said to be Fréchet differentiable at z if and only if F is Gâteaux differentiable at z and the following holds:

$$\displaystyle \begin{aligned} F(z+h) = F(z) + F'(z) h + r(z,h) \quad \mbox{with} \quad \frac{\| r(z,h) \|{}_V}{\| h \|{}_Z} \rightarrow 0 \quad \mbox{as} \quad \| h \|{}_Z \rightarrow 0 . \end{aligned}$$

We refer to F′(z) as the Fréchet derivative at z.

Remark 1 (Few Facts)

  (i) If the Fréchet derivative exists, then so does the Gâteaux derivative and they coincide. However, the converse is not true in general.

  (ii) We say that F is continuously Gâteaux differentiable if F′(⋅) exists and is continuous. In that case, F is Fréchet differentiable [18, pp. 35–36].

  (iii) Let E = G(F(z)), where F is Gâteaux differentiable at z and G is Fréchet differentiable at F(z); then E is Gâteaux differentiable.

Notice that when \(V = \mathbb {R}\) then \(\mathcal {L}(Z,V) = Z^*\). In addition if F is Gâteaux differentiable at z then we have

$$\displaystyle \begin{aligned} F'(z)h = \langle F'(z), h \rangle_{Z^*,Z}, \end{aligned}$$

where Z* is the dual space of Z and \(\langle \cdot ,\cdot \rangle _{Z^*,Z}\) denotes the duality pairing.

Before we conclude this subsection, we apply the above introduced definitions to two prototypical quadratic functionals. The derivatives in both these examples are Fréchet derivatives.

Example 1

Let (H, (⋅, ⋅)H) be a real Hilbert space and let \(F : H \rightarrow \mathbb {R}\) be defined as \(F(z) := \| z \|{ }^2_H = (z,z)_H\). Then for all z, h ∈ H we have

$$\displaystyle \begin{aligned} F(z+h) - F(z) = 2(z,h)_H + \| h \|{}^2_H. \end{aligned}$$

Thus,

$$\displaystyle \begin{aligned} F'(z)h = (2z , h)_H . \end{aligned}$$

Using the Riesz representation theorem (identifying H with its dual H*), we can write

$$\displaystyle \begin{aligned} (\nabla F(z), h)_H = \langle F'(z), h\rangle_{H^*,H} , \end{aligned}$$

where ∇F(z) ∈ H is the representative of F′(z) ∈ H*. We refer to ∇F(z) ∈ H as the gradient of F at z. In the above case, we have ∇F(z) = 2z.

Remark 2 (Gradient)

As can be seen from the above example, the expression that we obtain by identifying F′(z) ∈ H* with an element of H is called the gradient of F. We will use the notation ∇F(z) to denote the gradient. We further notice that the definition of the gradient depends on the underlying inner product.

Example 2

Let (Z, (⋅, ⋅)Z), (H, (⋅, ⋅)H) be real Hilbert spaces and ud ∈ H be fixed. Let \(S \in \mathcal {L}(Z,H)\). Consider \(E: Z \rightarrow \mathbb {R}\),

$$\displaystyle \begin{aligned} E(z) = \| S z - u_d \|{}^2_H . \end{aligned}$$

Then E(z) = G(F(z)), where \(G(v) = \| v \|{ }_H^2\) and F(z) = Sz − ud.

Next using the chain rule, we obtain that

$$\displaystyle \begin{aligned} \langle E'(z), h\rangle_{Z^*,Z} &= \langle G'(F(z)), F'(z) h\rangle_{H^*,H} = (2 v, F'(z) h)_H \\ &= 2(S z - u_d, S h)_H = 2\langle S^*(S z - u_d), h\rangle_{Z^*,Z} , \end{aligned} $$

where \(S^* \in \mathcal {L}(H^*,Z^*)\) is the adjoint of S. Here we have assumed that S*S and S*ud are well defined. Thus E′(z) = 2S*(Sz − ud) ∈ Z*. Since Z is a Hilbert space, similar to the previous example, we can again apply the Riesz representation theorem to get the representative ∇E(z) ∈ Z of E′(z).
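
As a sanity check of the formula in Example 2, the following MATLAB sketch (our own, with a random matrix S standing in for a discretized solution operator) compares the directional derivative ⟨E′(z), h⟩ = (2S*(Sz − ud), h) with a central finite difference.

```matlab
% Finite-difference check of E'(z) = 2 S^*(S z - u_d) for a matrix S.
rng(1); m = 8; k = 5;
S  = randn(m,k);  ud = randn(m,1);          % stand-in operator and data
z  = randn(k,1);  hdir = randn(k,1);        % base point and direction
E  = @(z) norm(S*z - ud)^2;                 % E(z) = ||S z - u_d||^2
g  = 2*S'*(S*z - ud);                       % gradient from Example 2
t  = 1e-6;
fd = (E(z + t*hdir) - E(z - t*hdir))/(2*t); % central difference
fprintf('analytic: %.10f   finite difference: %.10f\n', g'*hdir, fd);
```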

2.3 First-Order Necessary Optimality Conditions

We conclude this section with the following result on the first-order necessary optimality conditions.

Theorem 2

Let Z be a real Banach space (not necessarily reflexive). Let \(f : \mathcal {Z} \rightarrow \mathbb {R}\) be Gâteaux differentiable in \(\mathcal {Z}\), where \(Z_{ad} \subset \mathcal {Z} \subset Z\) with \(\mathcal {Z}\) open. If \(\bar {z} \in Z_{ad}\) is a solution to (3), then the first-order necessary optimality conditions are

$$\displaystyle \begin{aligned} \langle f'(\bar{z}), z-\bar{z}\rangle_{Z^*,Z} \ge 0 \quad \forall z \in Z_{ad} . \end{aligned} $$
(4)

In addition, if f is convex and \(\bar {z} \in Z_{ad}\) solves (4) then\(\bar {z}\)is a solution to (3), i.e., (4) is necessary and sufficient.

Proof

The proof of the first part is a direct consequence of the definition of Gâteaux derivative. Let z ∈ Zad be arbitrary. By the convexity of Zad we have that \(\bar {z}+t(z-\bar {z}) \in Z_{ad}\) for all t ∈ [0, 1]. From the optimality of \(\bar {z}\) it follows that

$$\displaystyle \begin{aligned} f(\bar{z}+t(z-\bar{z})) - f(\bar{z}) \ge 0 \quad \forall t \in [0,1]. \end{aligned}$$

Dividing both sides by t and taking the limit as t approaches 0+, we arrive at (4).

Next we use the convexity of f, i.e., for all t ∈ (0, 1] \(f(\bar {z}+t(z-\bar {z})) \le (1-t) f(\bar {z}) + t f(z)\). By rearranging terms and taking the limit as t approaches 0+, we arrive at

$$\displaystyle \begin{aligned} f(z) - f(\bar{z}) \ge \langle f'(\bar{z}), z-\bar{z}\rangle_{Z^*,Z} \quad \forall z \in Z_{ad} .\end{aligned} $$

We then obtain the desired sufficient condition by using (4).
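
As a simple illustration (our own example), take Z = ℝ, Zad = [1, 2], and f(z) = z². The unique minimizer is \(\bar {z} = 1\), and (4) reads \(f'(\bar z)(z-\bar z) = 2(z-1) \ge 0\) for all z ∈ [1, 2], which clearly holds. Note that \(f'(\bar {z}) = 2 \neq 0\): since the constraint is active at \(\bar z\), the variational inequality does not reduce to the unconstrained condition \(f'(\bar z) = 0\).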

Remark 3

We make the following observations:

  i. Theorem 2 only requires f to be directionally differentiable, and it holds after replacing \(\langle f'(\bar {z}), z-\bar {z}\rangle _{Z^*,Z}\) by \(f'(\bar {z},z-\bar {z})\).

  ii. Notice that the existence of \(\bar {z} \in Z_{ad}\) solving (3) in Theorem 2 is guaranteed under the assumptions of Theorem 1.

  iii. In general, for a nonconvex f, we cannot expect to achieve a global minimum but only a local minimum. We call \(\bar {z} \in Z_{ad}\) a local minimum of (3) if there exists an ε > 0 such that

    $$\displaystyle \begin{aligned} f(\bar{z}) \le f(z) \quad \forall z \in Z_{ad} \cap B_\varepsilon(\bar z)\end{aligned} $$

    where \(B_\varepsilon (\bar z) \subset Z\) is the ball of radius ε centered at \(\bar {z}\).

  iv. Equation (4) is known as a variational inequality.

3 Application to PDE-Constrained Optimization Problems

Let Z be a real reflexive Banach space and U, Y  be real Banach spaces. We begin by recalling the abstract problem (1)–(2)

$$\displaystyle \begin{aligned} \min_{(u,z) \in U\times Z} J(u,z) \quad \mbox{subject to} \quad e(u,z) = 0 , \quad z \in Z_{ad} , \ u \in U_{ad} , \end{aligned} $$
(5)

where \(J: U\times Z \rightarrow \mathbb {R}\) and e : U × Zad → Y, with Zad ⊂ Z closed and convex. We refer to (5) as the full-space form. Before we proceed further, we remark that the cost functional J often has two components:

$$\displaystyle \begin{aligned} J(u,z) = J_1(u) + J_2(z),\end{aligned} $$

where \(J_1 : U \rightarrow \mathbb {R}\) is the objective (to be attained) and \(J_2 : Z \rightarrow \mathbb {R}\) is the control penalty.

Another way to write (5) is by letting X := U × Z, Xad := Uad × Zad. Then we seek x ∈ Xad such that

$$\displaystyle \begin{aligned} \min_{x \in X_{ad}} J(x) \quad \mbox{subject to} \quad e(x) = 0. \end{aligned} $$
(6)

Notice that (6) does not assume splitting of the control and state variables and is a generalization of (5).

By eliminating the state variable, we obtain a third form of (5), which we call the reduced form. Specifically, this requires the existence of a solution operator S : Z → U, which assigns to each control a unique state:

$$\displaystyle \begin{aligned} z \mapsto S(z) = u(z) \quad \mbox{ where } u(z) \mbox{ satisfies} \quad e(u(z),z) = 0 . \end{aligned}$$

Thus we define the reduced cost functional as \(\mathcal {J} : Z \rightarrow \mathbb {R}\)

$$\displaystyle \begin{aligned} \mathcal{J}(z) := J(S(z),z) . \end{aligned}$$

Instead of (5) we then solve

$$\displaystyle \begin{aligned} \min_{z \in Z_{ad}} \mathcal{J}(z) \quad \mbox{subject to} \quad S(z) \in U_{ad}. \end{aligned} $$
(7)

We remark that the formulations (5) and (7) are not equivalent in general. For instance, there are applications where the solution operator S does not exist, i.e., the problem (7) may not have a solution while (5) is still solvable.

We next state an existence result for (7). For simplicity of presentation, from here on we assume that Uad = U, i.e., there are no state constraints. However, the discussion can be easily adapted to the more general state-constrained situation.

Corollary 1

Let the following assumptions hold

  (i) Zad ⊆ Z is closed and convex.

  (ii) For each z ∈ Zad, there exists a unique state S(z) that solves e(S(z), z) = 0.

  (iii) S is weakly continuous, i.e., if \(z_n \rightharpoonup z\) in Zad, then \(S(z_n) \rightharpoonup S(z)\) in U.

  (iv) The lower γ-level set \(\{z \in Z_{ad} : \mathcal {J}(z) \le \gamma \}\) of \(\mathcal {J}\) is nonempty and bounded for some \(\gamma \in \mathbb {R}\).

  (v) J1 is continuous and convex, and J2 is weakly lower semicontinuous.

Then there exists a solution to (7).

Proof

We notice that since J1 is continuous and convex, it is weakly lower semicontinuous [81, Theorem 2.12]. The weak continuity of S then implies that z ↦ J1(S(z)) is weakly lower semicontinuous; combined with the weak lower semicontinuity of J2, this shows that \(\mathcal {J}\) is weakly lower semicontinuous. The proof then follows from Theorem 1.

We remark that in many applications the lower γ-level set condition is replaced by either the boundedness of Zad or the coercivity of J2. More details will be provided in the subsequent sections.

At times it is more suitable to work directly with the full-space form (5), as the reduced form (7) may not even exist. This requires the Lagrangian functional, which we discuss in Section 3.2. Another advantage of the Lagrangian formulation is the ease with which it allows us to derive the first- and second-order derivatives; this will also be discussed in Section 3.2. We first consider the derivation of the first-order optimality conditions for the reduced form in Section 3.1.

3.1 Reduced Form: First-Order Necessary Optimality Conditions

Corollary 2

Let all the assumptions of Corollary 1 hold, except that Z need not be reflexive. Let \(\mathcal {Z}\) be an open set in Z with \(Z_{ad} \subset \mathcal {Z}\) such that \(z \mapsto S(z) : \mathcal {Z} \rightarrow U\) is Gâteaux differentiable with derivative

$$\displaystyle \begin{aligned} S'(z) \in \mathcal{L}(Z,U),\end{aligned} $$

and \((u,z) \mapsto J(u,z) : U \times Z \rightarrow \mathbb {R}\) is Fréchet differentiable with

$$\displaystyle \begin{aligned} J'(u,z) \in \mathcal{L}(U\times Z, \mathbb{R}).\end{aligned} $$

If \(\bar {z}\) is a minimizer of (5) over Zad, then the first-order necessary optimality conditions are given by

$$\displaystyle \begin{aligned} \langle S'(\bar{z})^* J_u(S(\bar{z}),\bar{z}) + J_z(S(\bar{z}),\bar{z}),z-\bar{z}\rangle_{Z^*,Z} \ge 0 \quad \forall z \in Z_{ad}, \end{aligned} $$
(8)

where J u and J z are the partial derivatives of J. If \(\mathcal {J}\) is convex then the condition (8) is sufficient.

Proof

The proof is a consequence of Theorem 2. Let \(\bar {z}\) be a solution of (5); then from Theorem 2 we have that \(\langle \mathcal {J}^{\prime }(\bar {z}), z-\bar {z}\rangle _{Z^*,Z} \ge 0\) for all z ∈ Zad. Using the chain rule and setting \(h:=z-\bar {z}\), we obtain that

$$\displaystyle \begin{aligned} \langle \mathcal{J}^{\prime}(\bar{z}), h\rangle_{Z^*,Z} &= J'(S(\bar{z}),\bar{z})h = \langle J_u(S(\bar{z}),\bar{z}), S'(\bar{z}) h \rangle_{U^*,U} + \langle J_z(S(\bar{z}),\bar{z}) , h \rangle_{Z^*,Z} \end{aligned} $$

where we have used the chain rule, which holds under the stated differentiability assumptions on S and J. Since \(S'(\bar{z})\) is well defined, we conclude that

$$\displaystyle \begin{aligned} \langle \mathcal{J}^{\prime}(\bar{z}), h\rangle_{Z^*,Z} &= \langle S'(\bar{z})^* J_u(S(\bar{z}),\bar{z}), h \rangle_{Z^*,Z} + \langle J_z(S(\bar{z}),\bar{z}) , h \rangle_{Z^*,Z} . \end{aligned} $$

This completes the proof.

To further understand the structure of \(S'(\bar z)\), we assume that the PDE constraint function e is sufficiently smooth and the conditions of the implicit function theorem hold. Upon differentiating the state equation, we obtain that

$$\displaystyle \begin{aligned} e_u(S(\bar z),\bar z) S'(\bar z) h = - e_z (S(\bar z),\bar z) h . \end{aligned}$$

Subsequently, we arrive at

$$\displaystyle \begin{aligned} S'(\bar z) h = - e_u(S(\bar z),\bar z)^{-1} \left( e_z (S(\bar z),\bar z) h \right) .\end{aligned} $$
(9)

Substituting this in (8), we obtain that

$$\displaystyle \begin{aligned} -\left\langle e_z (S(\bar{z}),\bar{z})^* \left( (e_u(S(\bar{z}),\bar{z})^{-1})^* J_u(S(\bar{z}),\bar{z}) \right), z -\bar{z} \right\rangle_{Z^*,Z}\\ + \langle J_z(S(\bar{z}),\bar{z}),z-\bar{z}\rangle_{Z^*,Z} \ge 0 .\end{aligned} $$

Introducing the adjoint variable \(\bar p\) and solving the adjoint equation,

$$\displaystyle \begin{aligned} e_u (S(\bar{z}),\bar{z})^{*} \bar p = J_u(S(\bar{z}),\bar{z}),\end{aligned} $$
(10)

we arrive at the following reformulation of (8)

$$\displaystyle \begin{aligned} -\left\langle e_z (S(\bar{z}),\bar{z})^* \bar p, z-\bar{z} \right\rangle_{Z^*,Z} + \langle J_z(S(\bar{z}),\bar{z}),z-\bar{z}\rangle_{Z^*,Z} \ge 0 . \end{aligned} $$
(11)

Notice that

$$\displaystyle \begin{aligned} \mathcal{J}^{\prime}(z) = -e_z (S(z),z)^* p + J_z(S(z),z) \in Z^* ,\end{aligned} $$
(12)

is the derivative of \(\mathcal {J}\) at z. We summarize the computation of \(\mathcal {J}^{\prime }(z)\) in Algorithm 1.

Algorithm 1 Derivative computation using adjoints

  1: Given z, solve e(u, z) = 0 for the state u.

  2: Solve the adjoint equation \(e_u(u(z),z)^* p = J_u(u(z),z)\) for p.

  3: Compute \(\mathcal {J}^{\prime }(z) = J_z(u(z),z) - e_z(u(z),z)^* p(z)\).

Algorithm 1 requires two PDE solves (a possibly nonlinear PDE in Step 1 and a linear PDE in Step 2).
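
For a discretized model problem, Algorithm 1 takes the following form in MATLAB (a sketch under our own simplifying assumptions: e(u, z) = Au − z with A a finite-difference Laplacian, J(u, z) = ½|u − ud|² + ½λ|z|², and Euclidean inner products in place of mass matrices):

```matlab
% Algorithm 1 for the model e(u,z) = A*u - z = 0 (so e_u = A, e_z = -I).
n = 50;  h = 1/(n+1);  lambda = 1e-2;
e = ones(n,1);
A = spdiags([-e 2*e -e], -1:1, n, n)/h^2; % 1D Dirichlet Laplacian
ud = ones(n,1);  z = zeros(n,1);          % target and current control
u = A\z;                                  % Step 1: state solve e(u,z) = 0
p = A'\(u - ud);                          % Step 2: adjoint solve e_u^* p = J_u
gradJ = lambda*z + p;                     % Step 3: J'(z) = J_z - e_z^* p
```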

In order to get the gradient, we can again apply the Riesz representation theorem (assuming Z is a Hilbert space, or more generally admits such a representation) to get a representative \(\nabla \mathcal {J}(z)\) satisfying

$$\displaystyle \begin{aligned} (\nabla \mathcal{J}(z) , v)_{Z} = \langle \mathcal{J}^{\prime}(z), v \rangle_{Z^*,Z} \quad \forall v \in Z. \end{aligned}$$

Having the expression of the gradient in hand, we can develop a gradient-based optimization solver. We will provide another derivation of the first-order optimality conditions in Section 3.2 using the Lagrangian approach.

In order to design Newton-based methods, it is desirable to have second-order derivative information for the reduced functional. This can be obtained by using either the reduced-functional approach or the Lagrangian approach. We provide a brief discussion in the next section as well. More details will be given in subsequent chapters; we also refer to [44, 53].

3.2 Lagrangian Formulation

3.2.1 First-Order Optimality Conditions

The full-space form requires us to introduce the Lagrangian functional \(L: U \times Z_{ad} \times Y^* \rightarrow \mathbb {R}\),

$$\displaystyle \begin{aligned} L(u,z,p) = J(u,z) - \langle e(u,z), p \rangle_{Y,Y^*} . \end{aligned} $$
(13)

where Y* is the dual space of Y (recall that e : U × Zad → Y). Notice that if we set u = S(z) in (13), then e(S(z), z) = 0 and we obtain that

$$\displaystyle \begin{aligned} \mathcal{J}(z) = J(S(z),z) = L(S(z),z,p) \quad \mbox{for any } p \in Y^*. \end{aligned} $$
(14)

Now if \((\bar u,\bar z,\bar p)\) denotes a stationary point, then the partial derivatives of L(u, z, p) with respect to u and p at \((\bar u,\bar z,\bar p)\) vanish, while the derivative with respect to z satisfies a variational inequality. As a result, we obtain

$$\displaystyle \begin{aligned} L_p(\bar u,\bar z,\bar p) = 0, \end{aligned}$$

which reduces to the state equation

$$\displaystyle \begin{aligned}e(\bar u,\bar z) = 0.\end{aligned} $$

Also

$$\displaystyle \begin{aligned} L_u(\bar u,\bar z,\bar p) &= 0 {},\end{aligned} $$
(15)

which is just the adjoint equation (10). Indeed

$$\displaystyle \begin{aligned} \langle L_u(\bar u,\bar z,\bar p) , \xi\rangle_{U^*,U} &= \langle J_u(\bar u,\bar z) , \xi\rangle_{U^*,U} - \langle \bar p, e_u(\bar u,\bar z) \xi \rangle_{Y^*,Y} \\ &= \langle J_u(\bar u,\bar z) - e_u(\bar u,\bar z)^* \bar p , \xi\rangle_{U^*,U}. \end{aligned} $$

In other words

$$\displaystyle \begin{aligned} L_u(\bar u,\bar z,\bar p) = J_u(\bar u,\bar z) - e_u(\bar u,\bar z)^* \bar p . \end{aligned}$$

Finally,

$$\displaystyle \begin{aligned} \langle L_z(\bar u,\bar z,\bar p) , z - \bar z \rangle_{Z^*,Z} \ge 0 \quad \forall z \in Z_{ad},\end{aligned} $$

which is equivalent to the variational inequality (11) for the control. Indeed, the derivative of the reduced functional \(\mathcal {J}\) at z, cf. (12), is

$$\displaystyle \begin{aligned} \mathcal{J}^{\prime}(z) = L_z(u,z,p), \end{aligned}$$

where u and p solve the state and the adjoint equations, respectively.

A few comments are in order. First of all, the above approach provides an elegant way to derive the first-order necessary optimality conditions and is highly recommended. As we will discuss below, it also allows us to easily derive the second-order derivatives of the reduced functional. Secondly, even though the Lagrangian L introduced above is rigorous, we have not yet addressed the question of existence of the Lagrange multiplier p, which makes the approach “formal.” The existence of the Lagrange multiplier p can be shown by using the Karush–Kuhn–Tucker (KKT) theory in function spaces [81, Chapter 6]. This theory requires Robinson's regularity condition [71] or the Zowe–Kurcyusz constraint qualification [85]; see [53, Chapter 1]. In certain cases, in particular in the absence of control constraints and with linear PDE constraints, one can also use the inf-sup theory for saddle point problems [39] to show existence of the Lagrange multiplier.

3.2.2 Second-Order Derivatives

Next we focus on deriving the expression of the second-order derivative of the reduced functional \(\mathcal {J}\). The second-order derivative information can significantly improve the convergence rates of optimization algorithms. For instance, in the absence of control constraints, the first-order necessary optimality condition is \(\mathcal {J}^{\prime }(\bar {z}) = 0\) in Z*. In order to solve for z, one can use Newton's method, which is locally quadratically convergent. Each iteration of Newton's method requires solving

$$\displaystyle \begin{aligned} \mathcal{J}^{\prime\prime}(z) v = -\mathcal{J}^{\prime}(z) \quad \mbox{in } Z^* \end{aligned} $$
(16)

for a direction v ∈ Z. In general (for large problems), it is often too expensive to store and factorize the Hessian. Instead, it is often more practical to solve (16) using iterative methods that only require Hessian-times-vector products. We will discuss the computation of the Hessian-times-vector product next. We remark that in the case of bound constraints on the control, one can use a locally superlinearly convergent semismooth Newton method [48, 56, 82].
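
For the model problem used after Algorithm 1, a matrix-free realization of the Newton step (16) can be written in a few lines of MATLAB using pcg, which only queries Hessian-times-vector products (again a sketch under our own assumptions; since the model is linear quadratic, a single Newton step solves the unconstrained problem):

```matlab
% Matrix-free Newton step (16) via CG for e(u,z) = A*u - z, Z_ad = Z.
n = 50;  h = 1/(n+1);  lambda = 1e-2;  e = ones(n,1);
A  = spdiags([-e 2*e -e], -1:1, n, n)/h^2;
ud = ones(n,1);  z = zeros(n,1);
u  = A\z;  p = A'\(u - ud);                  % state and adjoint solves
Hv = @(v) A'\(A\v) + lambda*v;               % v -> J''(z)v, never assembled
[dz, flag] = pcg(Hv, -(p + lambda*z), 1e-10, 500); % CG with H*v products only
z  = z + dz;                                 % Newton update
```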

We will proceed by using the Lagrangian approach. We operate under the assumption that J and e are twice continuously differentiable.

From (14), we recall that

$$\displaystyle \begin{aligned} \mathcal{J}(z) = J(u(z),z) = L(u(z),z,p) \end{aligned}$$

where u(z) = S(z) solves the state equation and p ∈ Y* is arbitrary. Differentiating this expression in a direction h1, we obtain that

$$\displaystyle \begin{aligned} \langle \mathcal{J}^{\prime}(z) , h_1 \rangle_{Z^*,Z} = \langle L_u(u(z),z,p) , u'(z) h_1\rangle_{U^*,U} +\langle L_z(u(z),z,p) , h_1\rangle_{Z^*,Z}. \end{aligned}$$

Again differentiating this expression in a direction h2 and choosing a particular p that solves the adjoint equation (15), we arrive at

$$\displaystyle \begin{aligned} \mathcal{J}^{\prime\prime}(z) = T(S(z),z)^* H(S(z),z,p) T(S(z),z), \end{aligned} $$

where

$$\displaystyle \begin{aligned} T(u,z) = \left(\begin{array}{c} - e_u(u,z)^{-1} e_z (u,z) \\ I_Z \end{array} \right) \end{aligned}$$

with IZ denoting the identity map on Z and H(u, z, p) is given by

$$\displaystyle \begin{aligned} H(u,z,p) = \left(\begin{array}{cc} L_{uu}(u,z,p) & L_{uz}(u,z,p) \\ L_{zu}(u,z,p) & L_{zz}(u,z,p) \end{array} \right). \end{aligned}$$

Then one can compute the Hessian-times-vector product by using Algorithm 2.

Algorithm 2 Hessian-Times-Vector Computation

  1: Given z, solve e(u, z) = 0 for u (if not done already).

  2: Solve the adjoint equation \(e_u(u,z)^* p = J_u(u,z)\) for p (if not done already).

  3: Solve \(e_u(u,z) w = -e_z(u,z) v\) for w.

  4: Solve \(e_u(u,z)^* q = L_{uu}(u,z,p) w + L_{uz}(u,z,p) v\) for q.

  5: Compute \(\mathcal{J}^{\prime\prime}(z) v = L_{zz}(u,z,p) v + L_{zu}(u,z,p) w - e_z(u,z)^* q\).

Notice that Algorithm 2 requires two additional linear PDE solves in Steps 3 and 4. We refer to [53, Chapter 1] and [44] for further details.

So far our approach has been general. For the remainder of the chapter, we will focus on two particular examples where the cost functional is quadratic and the PDE constraints are linear and semilinear elliptic PDEs, respectively. In order to develop the notion of solutions to these PDEs, we first introduce Sobolev spaces.

4 Sobolev Spaces

In this section, we introduce the function spaces to be used throughout the remainder of the chapter. Let \(\varOmega \subset \mathbb {R}^n\) be an open, bounded domain with Lipschitz boundary ∂Ω. For 1 ≤ p < ∞, we denote by Lp(Ω) the Banach space

$$\displaystyle \begin{aligned} L^p(\varOmega) := \left\{ v : \varOmega \rightarrow \mathbb{R} \;:\; v \mbox{ is measurable and } \int_\varOmega |v(x)|{}^p \;dx < \infty \right\} \end{aligned}$$

with the norm \(\|v\|{ }_{L^p(\varOmega )} := \left (\int _\varOmega |v(x)|{ }^p \;dx \right )^{\frac {1}{p}}\). The elements of these spaces are equivalence classes of functions that are equal up to a set of measure zero [37]. In particular, when p = 2, we obtain L2(Ω), which is a Hilbert space with inner product \((u,v)_{L^2(\varOmega )} = \int _\varOmega u(x) v(x) \; dx\). When p = ∞, we obtain L∞(Ω), a Banach space with norm \(\|v\|{ }_{L^\infty (\varOmega )} := \mbox{ess sup}_{\varOmega } |v|\).

Moving forward, we use multi-index notation for partial derivatives. For a multi-index \(\gamma = (\gamma _1,\dots ,\gamma _n) \in \mathbb {N}^n_0 := (\mathbb {N}\cup \{0\})^n\), we let its order be \(|\gamma | := \sum _{i=1}^n \gamma _i\). The associated |γ|-th order partial derivative of a function u at x is

$$\displaystyle \begin{aligned}D^\gamma u(x) := \frac{\partial^{|\gamma|}u}{\partial x_1^{\gamma_1}\dots \partial x_n^{\gamma_n}}(x) . \end{aligned}$$

Then we denote by Wk,p(Ω) the Sobolev space of all functions u ∈ Lp(Ω) whose weak partial derivatives Dγu with |γ| ≤ k belong to Lp(Ω), equipped with the norm

$$\displaystyle \begin{aligned} \|u\|{}_{W^{k,p}(\varOmega)} := \left\{\begin{array}{ll} \left( \sum_{|\gamma| \le k} \int_\varOmega |D^\gamma u|{}^p \;dx \right)^{1/p} & 1 \le p < \infty \\ \sum_{|\gamma|\le k} \mbox{ess sup}_{\varOmega} |D^\gamma u| & p = \infty . \end{array} \right. \end{aligned}$$

If p = 2, we write

$$\displaystyle \begin{aligned} H^k(\varOmega) = W^{k,2}(\varOmega), \quad k = 0, 1, \dots \end{aligned}$$

which is a Hilbert space with inner product

$$\displaystyle \begin{aligned} (u,v)_{H^k(\varOmega)} = \sum_{|\gamma| \le k} (D^\gamma u, D^\gamma v)_{L^2(\varOmega)} . \end{aligned}$$

Notice that H0(Ω) = L2(Ω).

We denote by \(W_0^{k,p}(\varOmega )\) the closure of \(C^\infty _0(\varOmega )\) with respect to the Wk,p(Ω)-norm. Thus \(u \in W^{k,p}_0(\varOmega )\) if and only if there exist functions \(u_m \in C_0^\infty (\varOmega )\) such that um → u in Wk,p(Ω). The space \(H^1_0(\varOmega )\) consists of all functions u ∈ H1(Ω) such that

$$\displaystyle \begin{aligned} u = 0 \quad \mbox{on } \partial\varOmega , \end{aligned}$$

in the trace sense. Using the Poincaré inequality \(\|u\|{ }_{L^2(\varOmega )} \le C \|\nabla u\|{ }_{L^2(\varOmega )}\), where C = C(Ω), we have

$$\displaystyle \begin{aligned} \| u \|{}_{H^1(\varOmega)} \le C \| \nabla u \|{}_{L^2(\varOmega)} \quad \forall u \in H^1_0(\varOmega). \end{aligned}$$

Finally, we denote the dual of \(H^1_0(\varOmega )\) by H−1(Ω). It is easy to see that L2(Ω) is continuously embedded in H−1(Ω). For more details, we refer to [1].

5 Second-Order Linear Elliptic PDEs

In this section, we study second-order linear elliptic PDEs. In general, we cannot expect classical solutions to these PDEs; therefore, we first introduce the notion of weak solutions in Section 5.1. We also study the notion of 'strong' solutions for this problem in Section 5.2. This higher regularity (strong solutions) will help us establish the approximability of the continuous solution by the finite-element method (cf. Section 8). Strong solutions also play a role in other situations, for instance, in studying the regularity of multipliers in state-constrained problems [20] and PDECO problems with variational inequalities [49].

5.1 Existence and Uniqueness

We begin this section by making certain uniform ellipticity assumptions.

Assumption 3 (Coefficient Matrix)

Let A be an n × n matrix with entries aij for 1 ≤ i, j ≤ n. We assume that the aij are measurable, belong to L∞(Ω), and are symmetric, that is, aij(x) = aji(x) for all 1 ≤ i, j ≤ n and for a.e. x ∈ Ω. We further assume that A is positive definite and satisfies the uniform ellipticity condition

$$\displaystyle \begin{aligned} \exists \; \beta \ge \alpha > 0 \quad \mathit{\mbox{such that}} \quad \alpha |\xi|{}^2 \le A(x) \xi \cdot \xi \le \beta |\xi|{}^2 \quad \forall \xi \in \mathbb{R}^n, \mathit{\mbox{ a.e. }} x \mathit{\mbox{ in }} \varOmega . \end{aligned} $$
(17)

Given f, we consider the following linear elliptic PDE

$$\displaystyle \begin{aligned} \begin{aligned} -\mbox{div } (A \nabla u) &= f \quad \mbox{in } \varOmega \\ u &= 0 \quad \mbox{on } \partial\varOmega . \end{aligned} \end{aligned} $$
(18)

We understand (18) in a weak sense, i.e., given f ∈ H−1(Ω), we seek a solution \(u \in H^1_0(\varOmega )\) that satisfies

$$\displaystyle \begin{aligned} \begin{aligned} \int_\varOmega A \nabla u \cdot \nabla v \;dx &= \langle f, v\rangle_{-1,1} , \quad \forall v \in H^1_0(\varOmega) , \end{aligned} \end{aligned} $$
(19)

where 〈⋅, ⋅〉−1,1 denotes the duality pairing between H−1(Ω) and \(H^1_0(\varOmega )\).

Theorem 4

For every f ∈ H−1(Ω), there exists a unique weak solution \(u \in H^1_0(\varOmega )\) that fulfills

$$\displaystyle \begin{aligned} \| u\|{}_{H^1(\varOmega)} \le C \|f\|{}_{H^{-1}(\varOmega)}, \end{aligned} $$
(20)

where the constant C only depends on Ω and α.

Proof 

The existence and uniqueness are due to the Lax–Milgram lemma. Moreover, the bound (20) follows immediately by using the fact that A is uniformly positive definite together with the Poincaré inequality [38].

Remark 4

In general, for Sobolev spaces W1,p(Ω) with p ≠ 2, we need inf-sup conditions to prove Theorem 4. The Banach–Nečas theorem then guarantees existence and uniqueness of u [34, Theorem 2.6]; the inf-sup condition is necessary and sufficient. See also [67].

5.2 Regularity

It is imperative to understand the regularity of the solution u to (19). For instance, such an understanding allows us to develop numerical schemes with optimal rates of convergence (see Section 8). It can also assist us in proving convergence rates for optimization algorithms [47].

Theorem 5

Let \(u \in H^1_0(\varOmega )\) be a weak solution of (19) and let the coefficient matrix A satisfy Assumption 3.

  • If f ∈ L2(Ω) and Ω is a convex polytope or a C1,1 domain in \(\mathbb {R}^n\), then \(u \in H^2(\varOmega ) \cap H^1_0(\varOmega )\) and there exists a constant C = C(α, β, Ω) such that

    $$\displaystyle \begin{aligned} \| u \|{}_{H^2(\varOmega)} \le C \| f \|{}_{L^2(\varOmega)} . \end{aligned}$$
  • If f ∈ Lp(Ω) for 1 < p < ∞ and Ω is C1,1, then \(u \in W^{2,p}(\varOmega ) \cap W^{1,p}_0(\varOmega )\) and there exists a constant C = C(α, β, Ω, p) such that

    $$\displaystyle \begin{aligned} \| u \|{}_{W^{2,p}(\varOmega)} \le C \| f \|{}_{L^p(\varOmega)} . \end{aligned}$$

    If p > n, then \(u \in C^{1,\alpha }(\bar \varOmega )\) with α = 1 − n∕p.

Proof 

If Ω is a convex polygonal/polyhedral domain, then the H2-regularity result can be found in [40, 3.2.1.2]. When ∂Ω is C1,1 and f ∈ Lp(Ω) for 1 < p < ∞, the result is due to [38, Theorem 9.15]. In the case p > n, the C1,α regularity follows from the W2,p regularity and the Sobolev embedding theorem.

6 Linear Quadratic PDE-Constrained Optimization Problem

Having some basic understanding of elliptic PDEs in hand, we next apply the results of Section 3 to a linear quadratic PDECO problem (cf. Section 6.1). In Section 6.2, we formulate it as a reduced PDECO problem posed only in terms of the control variable z. This allows us to use the direct method of calculus of variations from Theorem 1 to show existence of a solution.

6.1 Problem Formulation

Let ud ∈ L2(Ω) and za, zb ∈ L2(Ω) with za < zb a.e. in Ω be given. Moreover, let λ ≥ 0 denote the penalty parameter. Then we are interested in minimizing

$$\displaystyle \begin{aligned} J(u,z) = \frac{1}{2}\| u - u_d \|{}^2_{L^2(\varOmega)} + \frac{\lambda}{2} \|z\|{}^2_{L^2(\varOmega)} \end{aligned} $$
(21)

subject to (in the weak form)

$$\displaystyle \begin{aligned} \begin{aligned} -\mbox{div } (A \nabla u) &= z \quad \mbox{in } \varOmega, \\ u &= 0 \quad \mbox{on } \partial\varOmega, \end{aligned} \end{aligned} $$
(22)

and the pointwise control constraints

$$\displaystyle \begin{aligned} z \in Z_{ad} := \{ v \in L^2(\varOmega) : z_a(x) \le v(x) \le z_b(x) \ , \ a.e. \ x \in \varOmega \}. \end{aligned} $$
(23)

Notice that in the formulation above we also allow za = −∞ and zb = ∞. We call this the unconstrained case, i.e., Zad = L2(Ω).

For the above problem, we have (cf. Section 3)

$$\displaystyle \begin{aligned} U = H^1_0(\varOmega), \quad Y = H^{-1}(\varOmega), \quad Z = L^2(\varOmega). \end{aligned}$$

In order to understand the problem (21)–(23) in the spirit of (7), we first condense the variables and write J only in terms of z. We again call this the reduced form and the resulting cost functional the reduced functional. We discuss this next.

6.2 Reduced PDECO Problem

For every z ∈ Y , there exists a unique solution u = u(z) ∈ U to (22). As a result, we can define the solution operator to (22) as

$$\displaystyle \begin{aligned} \mathcal{S} : Y \rightarrow U , \quad z \mapsto u(z) = \mathcal{S} z, \end{aligned} $$

which is linear and continuous. In view of the continuous embeddings \( H^1_0(\varOmega ) \hookrightarrow L^2(\varOmega ) \hookrightarrow H^{-1}(\varOmega )\), we may also consider \(\mathcal {S}\) as a map from L2(Ω) to L2(Ω). In other words, instead of \(\mathcal {S}\) we consider the operator \(S := \mathcal {E}_u \mathcal {S}\mathcal {E}_z\), where \(\mathcal {E}_z : L^2(\varOmega ) \rightarrow H^{-1}(\varOmega )\) and \(\mathcal {E}_u : H^1_0(\varOmega ) \rightarrow L^2(\varOmega )\) denote the embedding operators that map z ∈ L2(Ω) to the corresponding element of H−1(Ω) and \(u \in H^1_0(\varOmega )\) to the corresponding element of L2(Ω), respectively. Notice that S : L2(Ω) → L2(Ω). One of the main advantages of using S is the fact that the adjoint operator S* also acts on L2(Ω) (cf. Section 6.3). Using the solution map S, we arrive at the so-called reduced cost functional \(\mathcal {J} : L^2(\varOmega ) \rightarrow \mathbb {R}\), which is defined as

$$\displaystyle \begin{aligned} \mathcal{J}(z) := J(S z,z) ,\end{aligned} $$

and the minimization problem (21)–(23) is equivalent to the reduced problem:

$$\displaystyle \begin{aligned} \min_{z \in Z_{ad}} \mathcal{J}(z).\end{aligned} $$
(24)

We notice that it is more convenient to analyze (24) in comparison to (21)–(23). In fact, we have the following well-posedness result.

Corollary 3

Let either λ > 0 or Zadbe bounded. Then there exists a solution to the minimization problem (24). If in addition λ > 0 or S is injective then the solution is unique.

Proof 

The proof is a consequence of Theorem 1. We first show that \(\mathcal {J}\) is weakly lower semicontinuous. Notice that \(\mathcal {J}(z) = \mathcal {J}_1(Sz) + \mathcal {J}_2(z)\), where \(\mathcal {J}_1(Sz) := \frac {1}{2}\|Sz-u_d\|{ }_{L^2(\varOmega )}^2\) and \(\mathcal {J}_2(z) := \frac {\lambda }{2}\|z\|{ }_{L^2(\varOmega )}^2\). Clearly \(\mathcal {J}_2\) is weakly lower semicontinuous (convexity and continuity imply weak lower semicontinuity [18, Theorem 2.23]). On the other hand, due to the compact embedding of \(H^1_0(\varOmega )\) in L2(Ω), we have that S : L2(Ω) → L2(Ω) is completely continuous, i.e., if \(z_n \rightharpoonup z\) in L2(Ω), then Szn → Sz in L2(Ω). Thus, owing to the continuity and convexity of \(\mathcal {J}_1\), we conclude that \(z \mapsto \mathcal {J}_1(Sz)\) is weakly lower semicontinuous. Hence \(\mathcal {J}\) is weakly lower semicontinuous.

It remains to verify the lower γ-level set condition. Here it can be replaced either by the coercivity of \(\mathcal {J}\) (if λ > 0) or by the boundedness of the closed convex set Zad.

Finally uniqueness is due to strict convexity of \(\mathcal {J}\).

For the remainder of this section, we will consider λ > 0.

6.3 First-Order Optimality Conditions

We are now ready to derive the first-order optimality conditions by following Section 3.1, together with the expression of the gradient of the reduced objective functional. In Section 6.4, we follow Section 3.2 and consider an alternative strategy to derive the optimality conditions using the Lagrangian formulation.

We recall that the reduced functional is

$$\displaystyle \begin{aligned} \mathcal{J}(z) = \frac{1}{2} \| Sz - u_d \|{}^2_{L^2(\varOmega)} + \frac{\lambda}{2} \| z \|{}^2_{L^2(\varOmega)}.\end{aligned} $$

Using Examples 1 and 2, the gradient of \(\mathcal {J}\) is given by

$$\displaystyle \begin{aligned} \nabla \mathcal{J}(z) = S^*(S z - u_d) + \lambda z\in L^2(\varOmega). \end{aligned}$$

Here we have S* : L2(Ω) → L2(Ω). The first-order necessary and sufficient (due to the convexity of \(\mathcal {J}\)) optimality condition (4) then becomes

$$\displaystyle \begin{aligned} (\nabla\mathcal{J} (\bar z), z-\bar{z})_{L^2(\varOmega)} \ge 0 \quad \forall z \in Z_{ad} . \end{aligned} $$
(25)

In order to efficiently evaluate \( \nabla \mathcal {J}(z)\), we introduce the so-called adjoint variable \( p \in H^1_0(\varOmega )\) solving

$$\displaystyle \begin{aligned} \begin{aligned} -\mbox{div } (A \nabla {p}) &= {u} - u_d \quad \mbox{in } \varOmega \\ p &= 0 \quad \mbox{on } \partial\varOmega . \end{aligned} \end{aligned} $$
(26)

We will next show that the adjoint operator S* : L2(Ω) → L2(Ω) can be defined by S*ζ := p, where p solves (26) with the right-hand side u − ud replaced by ζ. Here ζ ∈ L2(Ω) is arbitrary. Let z ∈ L2(Ω) be an arbitrary right-hand side of the state equation with resulting state \(u = Sz \in H^1_0(\varOmega )\). By testing the equation for u with p and vice versa, we obtain that \((z,p)_{L^2(\varOmega )} = (\zeta ,u)_{L^2(\varOmega )}\). Since u = Sz, we deduce that S*ζ = p. As a result, S*(u − ud) = p, where p solves (26).

Thus the gradient computation reduces to evaluation of the following expression:

$$\displaystyle \begin{aligned} \nabla\mathcal{J}(z) = p + \lambda z \in L^2(\varOmega). \end{aligned}$$

Finally, we gather the first-order necessary and sufficient optimality system:

$$\displaystyle \begin{aligned} \bar{u} \in H^1_0(\varOmega): & \quad \int_\varOmega A \nabla \bar{u} \cdot \nabla v\ dx = \int_\varOmega \bar{z} v \ dx\quad \forall v \in H^1_0(\varOmega) {} \end{aligned} $$
(27a)
$$\displaystyle \begin{aligned} \bar{p} \in H^1_0(\varOmega): & \quad \int_\varOmega A \nabla \bar{p} \cdot \nabla v \ dx = \int_\varOmega (\bar u - u_d) v \ dx\quad \forall v \in H^1_0(\varOmega) {} \end{aligned} $$
(27b)
$$\displaystyle \begin{aligned} \bar{z} \in Z_{ad}: & \quad (\bar{p} + \lambda \bar{z},z-\bar{z})_{L^2(\varOmega)} \ge 0, \quad \forall z \in Z_{ad} {}. \end{aligned} $$
(27c)

Notice that (27) is a coupled system: \(\bar {u}\) in (27a) depends on the unknown optimal control \(\bar {z}\), which fulfills the inequality (27c); the latter depends on the adjoint variable \(\bar {p}\), which solves the adjoint equation (27b); this in turn depends on \(\bar {u}\). We further remark that the variational inequality (27c) for the control is equivalent to the following projection formula (see [81]):

$$\displaystyle \begin{aligned} \begin{aligned} \bar{z}(x) = \mathcal{P}_{Z_{ad}} \left\{ -\frac{1}{\lambda} \bar{p}(x) \right\} \quad \mbox{a.e. } x \in \varOmega . \end{aligned} \end{aligned} $$
(28)

Here \(\mathcal {P}_{Z_{ad}}(v)\) denotes the projection of v onto Zad. In our case, we can write it explicitly as \(\mathcal {P}_{Z_{ad}}(v)(x) := \min \left \{z_b(x), \max \left \{ z_a(x) , v(x)\right \} \right \}\) for a.e. x ∈ Ω.
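
At the discrete level, the projection formula (28) suggests the fixed-point iteration z ↦ PZad(−p(z)∕λ), which the following MATLAB sketch implements for the model problem used earlier (our own illustration; the iteration is a contraction when ∥S*S∥∕λ < 1, which holds for the parameters below):

```matlab
% Fixed-point iteration on the projection formula (28), discrete model.
n = 50;  h = 1/(n+1);  lambda = 1e-1;  e = ones(n,1);
A  = spdiags([-e 2*e -e], -1:1, n, n)/h^2;
ud = ones(n,1);  za = -0.05*ones(n,1);  zb = 0.05*ones(n,1);
P  = @(v) min(zb, max(za, v));            % projection onto Z_ad
z  = zeros(n,1);
for k = 1:500
  u = A\z;  p = A'\(u - ud);              % state and adjoint solves
  znew = P(-p/lambda);                    % projection formula (28)
  if norm(znew - z, inf) < 1e-12, break; end
  z = znew;
end
```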

6.4 Lagrange Method

An alternative approach to derive the first-order optimality system is to use the Lagrangian, as described in Section 3.2. We again emphasize that even though this approach is used formally in the following, it can be made rigorous. It provides a systematic way of deriving the optimality system, especially for complicated problems, and this step is strongly recommended.

Introduce \(L : H^1_0(\varOmega ) \times Z_{ad} \times H^1_0(\varOmega ) \rightarrow \mathbb {R}\), defined as

$$\displaystyle \begin{aligned} L(u,z,p) := J(u,z) - \int_\varOmega \left( A \nabla u \cdot \nabla p - z p \right) \ dx .\end{aligned} $$

If \((\bar u, \bar z, \bar p)\) is a stationary point then

$$\displaystyle \begin{aligned} \begin{aligned} \langle L_p(\bar u, \bar z, \bar p) , h\rangle_{H^{-1}(\varOmega), H^1_0(\varOmega)} &= 0 \quad \forall h \in H^1_0(\varOmega) , \\ \langle L_u(\bar u, \bar z, \bar p) , h\rangle_{H^{-1}(\varOmega), H^1_0(\varOmega)} &= 0 \quad \forall h \in H^1_0(\varOmega) , \\ \left( L_z(\bar u, \bar z, \bar p) , (z - \bar z) \right)_{L^2(\varOmega)} &\ge 0 \quad \forall z \in Z_{ad}. \end{aligned}\end{aligned} $$
(29)

It is not hard to see that (29) leads to the same optimality system as in (27).

7 Semilinear Quadratic PDE-Constrained Optimization Problem

The focus of the previous section was on the linear quadratic PDECO problem. However, things are more delicate when we replace the linear PDE constraint by a semilinear one. In this section, we provide a brief discussion of a PDECO problem governed by a semilinear PDE.

Let \(\varOmega \subset \mathbb {R}^n\), with n ≥ 2, be a Lipschitz domain and let Assumption 3 hold. Moreover, let ud ∈ L2(Ω), za, zb ∈ L∞(Ω) with za(x) < zb(x) for a.e. x ∈ Ω, and λ ≥ 0 be given. We then consider the following semilinear optimal control problem:

$$\displaystyle \begin{aligned} \min J(u,z) := \frac{1}{2} \|u-u_d\|{}^2_{L^2(\varOmega)} + \frac{\lambda}{2}\|z\|{}_{L^2(\varOmega)}^2\end{aligned} $$
(30)

subject to \(u \in L^\infty (\varOmega ) \cap H^1_0(\varOmega )\) solving the weak form of

$$\displaystyle \begin{aligned} \begin{aligned} -\mbox{div } (A \nabla u) + f(\cdot,u) &= z \quad \mbox{in } \varOmega \\ u &= 0 \quad \mbox{on } \partial\varOmega \end{aligned}\end{aligned} $$
(31)

and

$$\displaystyle \begin{aligned} z \in Z_{ad} := \{ v \in L^\infty(\varOmega) : z_a(x) \le v(x) \le z_b(x) \ , \ a.e. \ x \in \varOmega \}. \end{aligned} $$
(32)

Notice the difference between the semilinear state equation (31) and the linear state equation (22): the key difficulty in the above problem is due to the nonlinearity introduced by f. The control bounds fulfill za, zb ∈ L∞(Ω). This differs from the linear case, where we assumed the bounds to be in L2(Ω). This choice enforces the control z ∈ L∞(Ω), which in turn provides additional regularity for the state u, as we discuss next.

In order to establish existence of a solution to the state equation (31), we make certain assumptions on the nonlinearity f.

Assumption 6

For a function \(f:\varOmega \times \mathbb {R}\to \mathbb {R}\) we consider the following assumption:

$$\displaystyle \begin{aligned} \begin{cases} f(x,\cdot) \mathit{\text{ is strictly increasing}}&\mathit{\text{ for a.e. }} x\in\varOmega,\\ f(x,0)=0 &\mathit{\text{ for a.e. }} x\in \varOmega,\\ f(x,\cdot) \mathit{\text{ is continuous }}\;&\mathit{\text{ for a.e. }} x\in \varOmega,\\ f(\cdot,t) \mathit{\text{ is measurable }}&\mathit{\mbox{ for all }} t\in\mathbb{R},\\ \lim_{t\to\infty}f(x,t)=\infty &\mathit{\text{ for a.e. }} x\in \varOmega. \end{cases}\end{aligned} $$

Remark 5

The condition f(x, 0) = 0 in Assumption 6 is not a restriction. If this condition on f cannot be verified, then it is enough to rewrite Equation (31) in Ω as

$$\displaystyle \begin{aligned} -\mbox{div } (A \nabla u) + f(\cdot,u) - f(\cdot,0) = z - f(\cdot,0) \quad \mbox{in } \varOmega .\end{aligned} $$

A typical example of f that fulfills Assumption 6 is given next (cf. [10, 11]).

Example 3

Let q ∈ [1, ∞) and let b : Ω → (0, ∞) be a function in L∞(Ω), that is, b(x) > 0 for a.e. x ∈ Ω. Define the function \(f:\varOmega \times \mathbb R\to \mathbb R\) by f(x, t) = b(x)|t|q−1t.

Theorem 7 (Existence and Uniqueness for Semilinear PDE) 

Let Assumptions 3 and 6 hold. Then for every z ∈ Lp(Ω) with \(p > \frac {n}{2}\), there exists a unique weak solution \(u \in L^\infty (\varOmega ) \cap H^1_0(\varOmega )\) to the state equation (31), and there exists a constant C = C(α, β, Ω) > 0 such that

$$\displaystyle \begin{aligned} \|u\|{}_{H^1(\varOmega)} + \|u\|{}_{L^\infty(\varOmega)}\le C\|z\|{}_{L^p(\varOmega)}.\end{aligned} $$
(33)

Proof 

The existence of a solution follows from the Browder–Minty theorem (cf. [11, Proposition 3.2] after setting s = 1). On the other hand, the L∞(Ω) regularity follows from a technique of Stampacchia [11, Theorem 3.5]; see also [2, 21].
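
To give a flavor of Step 1 of Algorithm 1 in the semilinear setting, the following MATLAB sketch solves a finite-difference discretization of (31) with f(x, u) = u³ (Example 3 with b ≡ 1 and q = 3) by Newton's method (our own illustration; the Jacobian eu(u, z) is A + diag(3u²)):

```matlab
% Newton iteration for a discretized semilinear equation A*u + u.^3 = z.
n = 50;  h = 1/(n+1);  e = ones(n,1);
A = spdiags([-e 2*e -e], -1:1, n, n)/h^2; % 1D Dirichlet Laplacian
z = 10*ones(n,1);  u = zeros(n,1);        % right-hand side, initial guess
for k = 1:20
  F  = A*u + u.^3 - z;                    % residual of the state equation
  dF = A + spdiags(3*u.^2, 0, n, n);      % Jacobian e_u(u,z)
  du = -(dF\F);  u = u + du;
  if norm(du, inf) < 1e-12, break; end
end
```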

For the above minimization problem, we have (cf. Section 3)

$$\displaystyle \begin{aligned} U = H^1_0(\varOmega) \cap L^\infty(\varOmega), \quad Y = H^{-1}(\varOmega), \quad Z = L^p(\varOmega) \quad \mbox{ with } p > n/2 .\end{aligned} $$

Notice that

$$\displaystyle \begin{aligned} Z_{ad} \subset L^\infty(\varOmega) \subset Z . \end{aligned}$$

As a result, owing to Theorem 7, the control-to-state map is well defined:

$$\displaystyle \begin{aligned} S : L^\infty(\varOmega) \rightarrow L^\infty(\varOmega) \cap H^1_0(\varOmega) . \end{aligned}$$

We notice that S is also well defined as a map from Z to \(L^\infty (\varOmega ) \cap H^1_0(\varOmega )\). Using S, we can write the reduced problem as

$$\displaystyle \begin{aligned} \min_{z\in Z_{ad}} \mathcal{J}(z) := J(S(z),z). \end{aligned} $$
(34)

In order to show existence of a solution to (34), f is typically assumed to be locally Lipschitz continuous in the second argument so as to fulfill the assumptions on S in Corollary 1. This assumption was recently weakened in [11] and replaced by the following growth condition on f: there exists a constant c ∈ (0, 1] such that

$$\displaystyle \begin{aligned} c|f(x,\xi-\eta)|\le |f(x,\xi)-f(x,\eta)| \end{aligned} $$
(35)

for a.e. x ∈ Ω and for all \(\xi ,\eta \in \mathbb {R}\). Such a growth condition is fulfilled by Example 3 (cf. [11]). Under this condition, we have the following existence result for (34).

Corollary 4

Let the assumptions of Theorem 7 hold. In addition, let f fulfill the growth condition (35) and assume that

$$\displaystyle \begin{aligned} f(\cdot,w(\cdot))\in L^2(\varOmega)\;\mathit{\mbox{ for every }} w\in L^\infty(\varOmega). \end{aligned} $$
(36)

Then there exists a solution to (34).

Proof 

Similar to Corollary 3, the proof is again a consequence of Theorem 1. We interpret Zad as a subset of Z, which is a reflexive Banach space. However, care must be taken to show the weak lower semicontinuity of \(\mathcal {J}\): one has to carefully study the convergence of the state sequence \(\{S(z_n)\}_{n\in \mathbb {N}}\). See for instance [11, Theorem 4.2].

Notice that the condition (36) is also fulfilled by Example 3.

Remark 6

We mention that all the results given in Corollary 4 remain true if one replaces the growth conditions (35) and (36) with the following local Lipschitz continuity condition: for all M > 0, there exists a constant LM > 0 such that f satisfies

$$\displaystyle \begin{aligned} |f(x,\xi)-f(x,\eta)|\leq L_{M}|\xi-\eta| \end{aligned} $$
(37)

for a.e. x ∈ Ω and \(\xi ,\eta \in \mathbb {R}\) with |η|, |ξ|≤ M.

For the remainder of this section, we assume that λ > 0. Before we proceed to derive the optimality conditions, we need some additional assumptions on f. Notice that the second-order derivatives are needed only if one is interested in studying the second-order sufficient conditions.

Assumption 8

We assume the following:

  (i) The function f(x, ⋅) is k-times continuously differentiable, with k = 1, 2, for a.e. x ∈ Ω.

  (ii) For all M > 0, there exists a constant LM > 0 such that f satisfies (37) and

    $$\displaystyle \begin{aligned} \left| D^k_u f (x,\xi) - D^k_u f (x,\eta)\right| \le L_{M}|\xi-\eta| , \quad k = 1,2 ,\end{aligned} $$

    for a.e. x ∈ Ω and \(\xi ,\eta \in \mathbb {R}\) with |ξ|, |η| ≤ M. Here Du denotes the partial derivative with respect to the second component.

  (iii) Du f(⋅, 0) ∈ L∞(Ω).

  (iv) Duu f(⋅, u(⋅)) ∈ L∞(Ω) whenever u(⋅) ∈ L∞(Ω).

Assumption 8 helps us prove that \(S : L^\infty (\varOmega ) \rightarrow L^\infty (\varOmega ) \cap H^1_0(\varOmega )\) is twice continuously Fréchet differentiable (using the implicit function theorem) and, in turn, that \(\mathcal {J}\) is twice continuously Fréchet differentiable.

By invoking Corollary 2, the first-order necessary optimality conditions are given as follows: for every solution \(\bar {z}\) of the problem (34), there exists a unique optimal state \(\bar {u}=S(\bar z)\) and an optimal adjoint state \(\bar {p}\) such that

$$\displaystyle \begin{aligned} \begin{aligned} \bar{u} \in L^\infty(\varOmega)\cap H^1_0(\varOmega): & \ \int_\varOmega A \nabla \bar{u} \cdot \nabla v\ dx {+} \int_\varOmega f(x,\bar{u}) v \ dx\\ & \qquad \qquad \qquad \qquad {=} \int_\varOmega \bar{z} v \ dx\quad \forall v \in H^1_0(\varOmega) \\ \bar{p} \in H^1_0(\varOmega): & \ \int_\varOmega A \nabla \bar{p} \cdot \nabla v \ dx {+} \int_\varOmega D_u f(x,\bar{u}) \bar{p} \ dx \\ & \qquad \qquad \qquad \qquad {=} \int_\varOmega (\bar u {-} u_d) v \ dx\quad \forall v \in H^1_0(\varOmega) \\ \bar{z} \in Z_{ad}: & \ (\bar{p} {+} \lambda \bar{z},z{-}\bar{z})_{L^2(\varOmega)} \ge 0, \quad \forall z \in Z_{ad}. \end{aligned}\end{aligned} $$
(38)

Alternatively, one can use the Lagrangian approach of Section 6.4 to derive (38).

Notice that the variational inequality in (38) can again be written using the projection formula as

$$\displaystyle \begin{aligned} \bar{z}(x) = \mathcal{P}_{Z_{ad}} \left\{ -\frac{1}{\lambda} \bar{p}(x) \right\} \quad \mbox{a.e. } x \in \varOmega .\end{aligned} $$
(39)

We further remark that, since \(\mathcal {J}\) is in general non-convex due to the semilinear state equation, we cannot expect a unique global solution to the PDECO problem but only local ones. Local uniqueness can be shown by studying second-order sufficient conditions. Nevertheless, care must be taken to prove such a result: the penalty term on the control in the cost functional is in L2(Ω), whereas the constraints in Zad are in \(L^\infty(\varOmega)\). This leads to the so-called \(L^2(\varOmega)-L^\infty(\varOmega)\) norm discrepancy, which should be taken into account before considering second-order sufficient conditions. We refer to [81, Theorem 4.29] for details. We further remark that the second-order sufficient conditions are also a useful tool for deriving discretization error estimates [13]; a further discussion is provided in Theorem 12.
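
To make the optimality system (38) and the projection formula (39) concrete, the following minimal sketch solves a one-dimensional model instance by a projected gradient iteration: each sweep solves the semilinear state equation by Newton's method, then the linear adjoint equation, and finally projects the updated control. Everything concrete here is an illustrative assumption, not prescribed by the text: Ω = (0, 1) discretized by finite differences, A = I, f(x, u) = u³, u_d(x) = sin(πx), and the chosen values of λ, z_a, z_b, and the step size.

import numpy as np

# Illustrative 1D model of (38)-(39): A = -d^2/dx^2 via finite differences,
# f(x,u) = u^3, u_d = sin(pi x); lam, za, zb, step are sample values.
n, lam, za, zb, step = 99, 1e-2, -5.0, 5.0, 1.0
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
L = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2        # Dirichlet Laplacian
ud = np.sin(np.pi * x)

def solve_state(z):
    # Newton's method for the state equation  L u + u^3 = z.
    u = np.zeros(n)
    for _ in range(50):
        res = L @ u + u**3 - z
        if np.linalg.norm(res) < 1e-12:
            break
        u -= np.linalg.solve(L + np.diag(3.0 * u**2), res)
    return u

def solve_adjoint(u):
    # Linear adjoint equation  L p + D_u f(x,u) p = u - u_d  (L symmetric).
    return np.linalg.solve(L + np.diag(3.0 * u**2), u - ud)

z = np.zeros(n)
for k in range(200):
    u = solve_state(z)
    p = solve_adjoint(u)
    z_new = np.clip(z - step * (lam * z + p), za, zb)  # projection as in (39)
    if np.linalg.norm(z_new - z) < 1e-10:
        break
    z = z_new
print("projected-gradient iterations:", k + 1)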

8 Discrete Optimal Control Problem

We will illustrate the main ideas of the finite-element approximation of PDE-constrained optimization problems in the case of a linear elliptic problem with control constraints (21)–(23). First, we briefly review the basics of the finite-element discretization for the state equation alone. For what follows, it is sufficient to take f ∈ L2(Ω). We consider the weak form of Equation (19),

$$\displaystyle \begin{aligned} (A\nabla u, \nabla v)_{L^2(\varOmega)}=(f,v)_{L^2(\varOmega)}\quad \forall v\in H^1_0(\varOmega). \end{aligned}$$

We partition the domain Ω into elements. For simplicity, we only discuss the case when the elements are simplices. For h ∈ (0, h0], h0 > 0, let \(\mathcal {T}_h\) denote a quasi-uniform triangulation of Ω with mesh size h, i.e., \(\mathcal {T}_h = \{\tau \}\) is a partition of Ω into triangles or tetrahedra τ of diameter hτ such that, for h = maxτ hτ,

$$\displaystyle \begin{aligned}\operatorname{diam}(\tau)\le h \le C |\tau|{}^{\frac{1}{n}}, \quad \forall \tau\in \mathcal{T}_h, \end{aligned}$$

where the constant C > 0 is independent of h. For simplicity, we assume ∪ τ = Ω. Let Vh be the set of all functions in \(H^1_0(\varOmega )\) that are continuous on Ω and linear on each τ. The space Vh is usually called the space of conforming piecewise linear Lagrange elements.

Now we define the finite-element Galerkin approximation uh ∈ Vh of (8) as the unique solution of

$$\displaystyle \begin{aligned} (A\nabla u_h, \nabla v_h)_{L^2(\varOmega)}=(f,v_h)_{L^2(\varOmega)}\quad \forall v_h\in V_h.\end{aligned} $$
(40)
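
For concreteness, here is a minimal sketch of assembling and solving (40) with conforming P1 elements, under purely illustrative assumptions: Ω = (0, 1), A = I, and a right-hand side chosen so that the exact solution is sin(πx); the mesh size and the midpoint quadrature for the load vector are likewise sample choices, not prescribed by the text.

import numpy as np

# Illustrative P1 Galerkin solve of (40) on Omega = (0,1) with A = I.
m = 64                                       # number of cells, so h = 1/m
h = 1.0 / m
nodes = np.linspace(0.0, 1.0, m + 1)
f = lambda x: np.pi**2 * np.sin(np.pi * x)   # manufactured right-hand side

K = np.zeros((m - 1, m - 1))                 # stiffness matrix, interior dofs
b = np.zeros(m - 1)                          # load vector
for tau in range(m):                         # loop over elements [x_tau, x_{tau+1}]
    ke = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h   # local stiffness
    xm = 0.5 * (nodes[tau] + nodes[tau + 1])
    fe = f(xm) * h / 2.0 * np.ones(2)        # midpoint rule for (f, v_h)
    for a, i in enumerate((tau - 1, tau)):   # local -> global interior dof
        if 0 <= i < m - 1:
            b[i] += fe[a]
            for c, j in enumerate((tau - 1, tau)):
                if 0 <= j < m - 1:
                    K[i, j] += ke[a, c]

u_h = np.linalg.solve(K, b)                  # Galerkin solution (nodal values)
print("max nodal error:", np.max(np.abs(u_h - np.sin(np.pi * nodes[1:-1]))))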

Expanding uh in terms of basis functions, it is easy to see that (40) is equivalent to a system of linear equations, and since \(V_h\subset H^1_0(\varOmega )\), the resulting matrix is nonsingular. Notice that by construction

$$\displaystyle \begin{aligned} (A\nabla (u-u_h), \nabla v_h)_{L^2(\varOmega)}=0\quad \forall v_h\in V_h.\end{aligned} $$
(41)

Thus, the Galerkin solution uh is the orthogonal projection of u onto Vh with respect to the inner product \((A\nabla \cdot ,\nabla \cdot )_{L^2(\varOmega )}\). Almost immediately we obtain the following key result.

Lemma 1 (Céa Lemma)

Let u and u h satisfy (41). Then the following estimate holds

$$\displaystyle \begin{aligned}\|u-u_h\|{}_{H^1(\varOmega)}\le C\min_{\chi\in V_h}\|u-\chi\|{}_{H^1(\varOmega)}.\end{aligned} $$

The constant C depends only on the ellipticity and boundedness constants of the matrix A and on the domain Ω.

The above result says that the Galerkin solution is, up to a constant, the best approximation to u from Vh in the H1(Ω)-norm. We can use Céa's lemma to derive a priori error estimates. Let Ih : H1(Ω) → Vh be a projection with the approximation properties

$$\displaystyle \begin{aligned} \|u-I_hu\|{}_{H^s(\varOmega)}\le Ch^{2-s}\|u\|{}_{H^2(\varOmega)}, \quad s=0,1,\end{aligned} $$
(42)

then from Céa's lemma it immediately follows that

$$\displaystyle \begin{aligned}\|u-u_h\|{}_{H^1(\varOmega)}\le Ch\|u\|{}_{H^2(\varOmega)}.\end{aligned} $$

Notice that the constant C > 0 above is independent of h. The error estimates in the L2(Ω)-norm are not immediate, since the Galerkin solution does not have the property of being the best approximation in the L2(Ω)-norm; nevertheless, one can still establish optimal error estimates in L2(Ω) via a duality argument, also known as Nitsche's trick. This result requires H2(Ω)-regularity.

Lemma 2

Let Ω be convex or \(C^{1,1}\), and let u and uh satisfy (41). Then there exists a constant C independent of h such that

$$\displaystyle \begin{aligned}\|u-u_h\|{}_{L^2(\varOmega)}\le Ch\min_{\chi\in V_h}\|u-\chi\|{}_{H^1(\varOmega)}.\end{aligned} $$

Proof

Let e = u − uh and consider the dual problem

$$\displaystyle \begin{aligned}(A\nabla w, \nabla v)_{L^2(\varOmega)}=(e,v)_{L^2(\varOmega)}\quad \forall v\in H^1_0(\varOmega) .\end{aligned} $$

Setting v = e, we obtain

$$\displaystyle \begin{aligned}\|e\|{}_{L^2(\varOmega)}^2 = (A\nabla w, \nabla e)_{L^2(\varOmega)} = (A\nabla (w-w_h), \nabla e)_{L^2(\varOmega)},\end{aligned} $$

where the last equality is due to (41), and where wh ∈ Vh can be taken to be the interpolant Ihw of w. Using the Cauchy–Schwarz inequality, the approximation property (42), and the fact that, under the given regularity of the domain, \(\|w\|{ }_{H^2(\varOmega )} \le C \|e\|{ }_{L^2(\varOmega )}\), we obtain \(\|e\|{}_{L^2(\varOmega)}^2 \le Ch \|e\|{}_{L^2(\varOmega)} \|e\|{}_{H^1(\varOmega)}\). Combining this with Céa's lemma yields the required result.

Combining Céa's lemma and Lemma 2, we immediately establish the optimal a priori error estimate

$$\displaystyle \begin{aligned} \|u-u_h\|{}_{L^2(\varOmega)}\le Ch^2\|u\|{}_{H^2(\varOmega)}. \end{aligned}$$

Notice that the above estimate does require the convexity (or \(C^{1,1}\) regularity) of Ω.

Corollary 5

H2 regularity also allows us to express the error in terms of the data. Thus, from Lemmas 1 and 2 it follows that

$$\displaystyle \begin{aligned}\|u-u_h\|{}_{L^2(\varOmega)}+h\|u-u_h\|{}_{H^1(\varOmega)}\le Ch^2\|f\|{}_{L^2(\varOmega)}. \end{aligned}$$
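
The quadratic rate in the last estimate is easy to observe numerically. Below is a minimal sketch, assuming the model problem −u″ = f on (0, 1) with exact solution sin(πx), P1 elements on uniform meshes, and a lumped load vector (all illustrative choices); the printed discrete L²-errors should decrease by a factor of about four per refinement.

import numpy as np

# Illustrative check of the O(h^2) L^2-estimate for -u'' = f on (0,1),
# exact solution u(x) = sin(pi x), P1 elements on a uniform mesh.
for m in (16, 32, 64, 128):
    h = 1.0 / m
    x = np.linspace(h, 1.0 - h, m - 1)               # interior nodes
    K = (np.diag(2.0 * np.ones(m - 1)) - np.diag(np.ones(m - 2), 1)
         - np.diag(np.ones(m - 2), -1)) / h          # P1 stiffness
    b = h * np.pi**2 * np.sin(np.pi * x)             # lumped load vector
    e = np.linalg.solve(K, b) - np.sin(np.pi * x)    # nodal error
    print(m, np.sqrt(h * e @ e))                     # discrete L^2 norm ~ h^2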

8.1 Discrete Linear Quadratic PDE-Constrained Optimization Problem

For the remainder of the chapter, we will assume that λ > 0. To discretize the problem, we replace the state space \(H^1_0(\varOmega )\) with \(V_h\subset H^1_0(\varOmega )\) and the space of admissible controls Zad with Zad,h ⊂ Zad. In the case of unconstrained controls, we can choose Zad,h = Vh. In principle, the mesh for the discretization of the state variable and the mesh for the discretization of the control can be different. However, having two different meshes adds technical difficulties to the implementation. For this reason, it is more convenient to work with the same mesh, which is what we assume from now on. Thus, the discretized problem (21)–(23) becomes

$$\displaystyle \begin{aligned} \min_{u_h\in V_h, \; z_h\in Z_{ad,h}}J_h(u_h,z_h) = \frac{1}{2}\| u_h - u_d \|{}^2_{L^2(\varOmega)} + \frac{\lambda}{2} \|z_h\|{}^2_{L^2(\varOmega)}\end{aligned} $$
(43)

subject to

$$\displaystyle \begin{aligned} \begin{aligned} (A \nabla u_h,\nabla v_h) &= (z_h,v_h) \quad \forall v_h\in V_h . \end{aligned}\end{aligned} $$
(44)

Similar to the infinite dimensional case, we define a discrete solution operator Sh : Zad,h → Vh to (44) and the reduced discrete problem becomes

$$\displaystyle \begin{aligned} \min_{z_h\in Z_{ad,h}}\mathcal{J}_h(z_h) :=\min_{z_h\in Z_{ad,h}} J_h(S_h z_h,z_h).\end{aligned} $$
(45)

Similar to the continuous problem, one can show that problem (45) has a unique solution \(\bar {z}_h\in Z_{ad,h}\) with corresponding discrete optimal state \(\bar {u}_h = S_h(\bar {z}_h)\), and, similar to Theorem 2, the first-order necessary and sufficient optimality condition is

$$\displaystyle \begin{aligned} \mathcal{J}^{\prime}_h(\bar{z}_h)(z_h-\bar{z}_h)\geq 0, \quad \forall z_h\in Z_{ad,h}. \end{aligned} $$
(46)

8.2 Optimization Problem Without Control Constraints

In this situation, Zad = L2(Ω) and Zad,h = Vh, and as a result (25) and (46) reduce to the equalities

$$\displaystyle \begin{aligned} \bar{z}=-\frac{1}{\lambda}\bar{p},\quad \bar{z}_h=-\frac{1}{\lambda}\bar{p}_h, \end{aligned} $$
(47)

respectively, and as a result the continuous and discrete PDECO problems are equivalent to the systems of equations

$$\displaystyle \begin{aligned} \begin{aligned} \bar{u} &= S(-\frac{1}{\lambda}\bar{p})\\ \bar{p}&=S^*(\bar{u}-u_d) \end{aligned} \end{aligned}$$

and

$$\displaystyle \begin{aligned}\begin{aligned} \bar{u}_h &= S_h(-\frac{1}{\lambda}\bar{p}_h)\\ \bar{p}_h&=S_h^*(\bar{u}_h-u_d). \end{aligned} \end{aligned}$$
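
A minimal sketch of how this discrete system can be solved all at once: eliminating \(\bar{z}_h = -\frac{1}{\lambda}\bar{p}_h\) couples the two Galerkin equations into a single linear system in \((\bar{u}_h, \bar{p}_h)\). The setting below (1D P1 elements on (0, 1), A = I, and the chosen values of λ and u_d) is an illustrative assumption, not part of the text.

import numpy as np

# Illustrative coupled solve of the discrete optimality system:
#   K u = M z,  K p = M (u - u_d),  z = -(1/lam) p,
# assembled with P1 stiffness K and mass M on a uniform mesh of (0,1).
m, lam = 64, 1e-3
h = 1.0 / m
x = np.linspace(h, 1.0 - h, m - 1)
I = np.eye(m - 1)
K = (2.0 * I - np.eye(m - 1, k=1) - np.eye(m - 1, k=-1)) / h        # stiffness
M = h / 6.0 * (4.0 * I + np.eye(m - 1, k=1) + np.eye(m - 1, k=-1))  # mass
ud = np.where((x > 0.25) & (x < 0.75), 1.0, 0.0)                    # sample target

# Eliminating z gives the block system
#   [ K      (1/lam) M ] [u]   [   0    ]
#   [ -M     K         ] [p] = [ -M u_d ].
A_blk = np.block([[K, M / lam], [-M, K]])
rhs = np.concatenate([np.zeros(m - 1), -M @ ud])
sol = np.linalg.solve(A_blk, rhs)
u_h, p_h = sol[:m - 1], sol[m - 1:]
z_h = -p_h / lam                            # recover the optimal control, cf. (47)
print("tracking error:", np.sqrt(h) * np.linalg.norm(u_h - ud))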

As a result, \(\bar {z},\bar {u},\bar {p}\in H^2(\varOmega )\), and we can expect second-order convergence for the optimal control in the L2(Ω)-norm. Indeed, one can establish the following result.

Theorem 9

Let \(\bar {z}\) and \(\bar {z}_h\) be optimal solutions to the continuous and discrete PDECO problems (21) and (43), respectively, without control constraints. Assume in addition that Ω is convex or \(C^{1,1}\). Then there exists a constant C independent of h such that

$$\displaystyle \begin{aligned} \| \bar{z} - \bar{z}_h\|{}_{L^2(\varOmega)} \le Ch^2\left(\|\bar{z}\|{}_{L^2(\varOmega)}+\|u_d\|{}_{L^2(\varOmega)}\right). \end{aligned}$$

Proof

We begin by recalling the unconstrained optimality conditions \(\lambda \bar {z} + \bar {p} = 0 = \lambda \bar {z}_h + \bar {p}_h\), which yields \( (\lambda \bar {z} + \bar {p}, \bar {z}_h-\bar {z}) = 0 = (\lambda \bar {z}_h + \bar {p}_h, \bar {z}-\bar {z}_h) . \) Adding these last two equalities, we arrive at

$$\displaystyle \begin{aligned} \lambda \|\bar{z}-\bar{z}_h\|{}^2_{L^2(\varOmega)} = (\bar{p}-\bar{p}_h, \bar{z}_h - \bar{z})_{L^2(\varOmega)} = (S^*(S\bar{z}-u_d) - S_h^*(S_h\bar{z}_h - u_d) ,\bar{z}_h - \bar{z})_{L^2(\varOmega)}, \end{aligned} $$
(48)

where in the last equality we have used the representation of \(\bar {p}\) and \(\bar {p}_h\). Upon rewriting (48), we obtain

$$\displaystyle \begin{aligned} \lambda \|\bar{z}-\bar{z}_h\|{}^2_{L^2(\varOmega)} = (S^*S\bar{z} - S_h^*S_h\bar{z}_h ,\bar{z}_h - \bar{z})_{L^2(\varOmega)} +((S^*- S_h^*)u_d ,\bar{z}_h - \bar{z})_{L^2(\varOmega)} = I + II. \end{aligned} $$
(49)

It follows from Corollary 5 that

$$\displaystyle \begin{aligned} |II| \le C h^2 \|u_d\|{}_{L^2(\varOmega)} \|\bar{z}-\bar{z}_h\|{}_{L^2(\varOmega)}. \end{aligned} $$
(50)

It then remains to estimate I in (49). We add and subtract \((S_h^* S_h \bar {z},\bar {z}_h - \bar {z})_{L^2(\varOmega )}\) to I and arrive at

$$\displaystyle \begin{aligned} I &= ( (S^*S - S_h^*S_h)\bar{z} ,\bar{z}_h - \bar{z})_{L^2(\varOmega)} + ( S_h(\bar{z} - \bar{z}_h) ,S_h(\bar{z}_h - \bar{z}))_{L^2(\varOmega)} \\ &\le ( (S^*S - S_h^*S_h)\bar{z} ,\bar{z}_h - \bar{z})_{L^2(\varOmega)}, \end{aligned} $$
(51)

where we have used the fact that \(( S_h(\bar {z} {-} \bar {z}_h) ,S_h(\bar {z}_h {-} \bar {z}))_{L^2(\varOmega )} = {-} \|S_h(\bar {z}{-}\bar {z}_h) \|{ }^2 \le 0\). Again adding and subtracting \((S_h^*S\bar {z},\bar {z}_h - \bar {z})_{L^2(\varOmega )}\) in (51), we arrive at

$$\displaystyle \begin{aligned} |I| &\le |( (S^*-S_h^*)S\bar{z} ,\bar{z}_h - \bar{z})_{L^2(\varOmega)} + ( S_h^*(S-S_h)\bar{z} ,\bar{z}_h - \bar{z})_{L^2(\varOmega)}| \\ &\le C h^2 \|\bar{z}\|{}_{L^2(\varOmega)} \|\bar{z}-\bar{z}_h\|{}_{L^2(\varOmega)}, \end{aligned} $$
(52)

where we have first used the triangle inequality and then estimated the first term using Corollary 5 and the continuity of S : L2(Ω) → L2(Ω): \(|( (S^*-S_h^*)S\bar {z} ,\bar {z}_h - \bar {z})_{L^2(\varOmega )}| \le C h^2 \|S\bar {z}\|{ }_{L^2(\varOmega )} \|\bar {z}-\bar {z}_h\|{ }_{L^2(\varOmega )} \le C h^2 \|\bar {z}\|{ }_{L^2(\varOmega )} \|\bar {z}-\bar {z}_h\|{ }_{L^2(\varOmega )}\). The estimate of the remaining term follows again from Corollary 5 and the continuity of \(S_h^* : L^2(\varOmega ) \rightarrow L^2(\varOmega )\). Finally, substituting the estimates of I and II from (52) and (50) into (49), we arrive at the asserted result.

8.3 Optimization Problem with Control Constraints

For the rest of this section, we assume constant box constraints, i.e., \(z_a,z_b\in \mathbb {R}\) with za < zb. We remind the reader that in this situation the optimal control is given by the projection formula:

$$\displaystyle \begin{aligned} \bar{z}(x) = \mathcal{P}_{Z_{ad}} \left\{ -\frac{1}{\lambda} \bar{p}(x) \right\}. \end{aligned} $$
(53)

If the constraints are active, then in general \(\bar {z}\notin H^2(\varOmega )\). However, we can still conclude that \(\bar {z}\in H^1(\varOmega )\), and even \(\bar {z}\in W^1_\infty (\varOmega )\), by using [58, Theorem A.1]. In light of this, second-order convergence cannot in general be expected. There are several approaches to treat the problem.

8.3.1 Cell-Wise Constant Control Discretization

One idea is to consider a cell-wise constant discretization of the control variable, i.e., we choose \(Z_{ad,h}=Z_{ad}\cap Z_h^0\), where \(Z_h^0\) is the space of piecewise constant functions on the partition \(\mathcal {T}_h\). This idea goes back to Falk [35]. Since we consider a piecewise constant discretization, only first-order convergence for the control can be expected. Indeed, for such a discretization one can establish the following convergence result.

Theorem 10

Let \(\bar {z}\) and \(\bar {z}_h\) be optimal solutions to the continuous and discrete PDECO problems (21) and (43), respectively, with control constraints (23). Let \(Z_{ad,h}=Z_{ad}\cap Z_h^0\). Assume in addition that Ω is convex or \(C^{1,1}\). Then there exists a constant C independent of h such that

$$\displaystyle \begin{aligned}\| \bar{z} - \bar{z}_h\|{}_{L^2(\varOmega)} \le Ch\left(\|\bar{z}\|{}_{H^1(\varOmega)}+\|u_d\|{}_{L^2(\varOmega)}\right). \end{aligned}$$

Proof

First, we define the projection \(\pi _h:Z_{ad}\to Z_{ad}\cap Z_h^0\) by

$$\displaystyle \begin{aligned} \pi_h v\mid_\tau =\frac{1}{|\tau|}\int_\tau v\ dx,\quad \forall\tau\in \mathcal{T}_h. \end{aligned} $$
(54)

Thus the projection πh is the orthogonal projection onto \(Z_h^0\) with respect to the L2-inner product, i.e.,

$$\displaystyle \begin{aligned} (v - \pi_h{v},w)_{L^2(\varOmega)} =0, \quad w\in Z_h^0 \end{aligned} $$
(55)

and has the following approximation property

$$\displaystyle \begin{aligned} \| v - \pi_h{v}\|{}_{L^2(\varOmega)} \le Ch\| \nabla v\|{}_{L^2(\varOmega)}, \quad v\in H^1(\varOmega). \end{aligned} $$
(56)

Then replacing z by \(\bar {z}_h\) in (25) and zh by \(\pi _h \bar {z}\) in (46), we arrive at

$$\displaystyle \begin{aligned} (\lambda\bar{z}+\bar{p}, \bar{z}_h - \bar{z})_{L^2(\varOmega)} \ge 0, \quad (\lambda\bar{z}_h+\bar{p}_h, \pi_h \bar{z} - \bar{z}_h)_{L^2(\varOmega)} \ge 0 . \end{aligned}$$

Adding these inequalities, we obtain that

$$\displaystyle \begin{aligned} \lambda \|\bar{z}-\bar{z}_h\|{}^2_{L^2(\varOmega)} &\le (\bar{p}-\bar{p}_h, \bar{z}_h-\bar{z})_{L^2(\varOmega)} + (\lambda \bar{z}_h + \bar{p}_h, \pi_h\bar{z}-\bar{z})_{L^2(\varOmega)} = I+II. \end{aligned} $$
(57)

The estimate of I is exactly the same as in Theorem 9, i.e.,

$$\displaystyle \begin{aligned} |I| &\le C h^2 (\|\bar{z}\|{}_{L^2(\varOmega)} +\|u_d\|{}_{L^2(\varOmega)}) \|\bar{z}-\bar{z}_h\|{}_{L^2(\varOmega)} \\ &\le C_\lambda h^4 (\|\bar{z}\|{}_{L^2(\varOmega)} +\|u_d\|{}_{L^2(\varOmega)})^2 + \frac{\lambda}{4} \|\bar{z}-\bar{z}_h\|{}_{L^2(\varOmega)}^2 , \end{aligned} $$
(58)

where in addition we have used Young's inequality. Next, we estimate II. Using the characterization of \(\bar {p}_h\), adding and subtracting \((S_h^*(S_h\bar {z}-u_d),\pi _h\bar {z}-\bar {z})_{L^2(\varOmega )}\), and using the continuity of \(S_h^*\), we obtain

$$\displaystyle \begin{aligned} II &= (S_h^*(S_h\bar{z}_h-u_d),\pi_h\bar{z}-\bar{z})_{L^2(\varOmega)} \\ &= (S_h^*S_h(\bar{z}_h{-}\bar{z}),\pi_h\bar{z}{-}\bar{z})_{L^2(\varOmega)} {+}(S_h^*(S_h\bar{z}{-}u_d),\pi_h\bar{z}{-}\bar{z})_{L^2(\varOmega)} =: II_1 {+} II_2 . \end{aligned} $$
(59)

To estimate II1, we use (56) and Young’s inequality to arrive at

$$\displaystyle \begin{aligned} |II_1| \le C_\lambda h^2 \|\bar{z}\|{}_{H^1(\varOmega)}^2 + \frac{\lambda}{4} \|\bar{z}-\bar{z}_h\|{}_{L^2(\varOmega)}^2 . \end{aligned} $$
(60)

It then remains to estimate II2 in (59). By adding and subtracting \((S_h^*(S\bar {z}-u_d),\pi _h\bar {z}-\bar {z})_{L^2(\varOmega )}\) in II2, we obtain that

$$\displaystyle \begin{aligned} |II_2| &= |(S_h^*(S_h-S)\bar{z},\pi_h\bar{z}-\bar{z})_{L^2(\varOmega)} + (S_h^*(S\bar{z}-u_d),\pi_h\bar{z}-\bar{z})_{L^2(\varOmega)}| \\ &\le C h^3 \|\bar{z}\|{}_{H^1(\varOmega)} + |( (S_h^*-S^*)(S\bar{z}-u_d), \pi_h\bar{z}-\bar{z})_{L^2(\varOmega)}| \\ &\quad + |( S^*(S\bar{z}-u_d), \pi_h\bar{z}-\bar{z})_{L^2(\varOmega)}| =: II_{2,1} + II_{2,2} + II_{2,3}, \end{aligned} $$
(61)

where we have used Corollary 5 and (56) to estimate the first term. Moreover, we have added and subtracted \((S^*(S\bar {z}-u_d),\pi _h\bar {z}-\bar {z})_{L^2(\varOmega )}\) in the second term. It then remains to estimate II2,2 and II2,3. Again using Corollary 5 and (56), we obtain that \(II_{2,2} \le C h^3 (\|\bar {z}\|{ }_{L^2(\varOmega )}+\|u_d\|{ }_{L^2(\varOmega )})\|\bar {z}\|{ }_{H^1(\varOmega )}\). Finally, to estimate II2,3 we first recall that \(S^*(S\bar {z}-u_d) = \bar {p}\). Since πh is the L2-orthogonal projection, we obtain

$$\displaystyle \begin{aligned} II_{2,3} = ( \bar{p}-\pi_h\bar{p}, \pi_h\bar{z}-\bar{z})_{L^2(\varOmega)} \le C h^3 (\|\bar{z}\|{}_{L^2(\varOmega)}+\|u_d\|{}_{L^2(\varOmega)}) \|\bar{z}\|{}_{H^1(\varOmega)}. \end{aligned} $$
(62)

Collecting all the estimates, we arrive at the asserted result.
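
A quick numerical illustration of the projection (54) and its approximation property (56), which drive the first-order rate above: the sketch below computes (approximate) cell averages of a Lipschitz control on successively finer uniform meshes of (0, 1). The sample control, with kinks where the box constraints become active, and the fine sampling grid are illustrative choices; the printed L²-errors decay like O(h).

import numpy as np

# Illustrative check of (56): cell averages (the projection pi_h of (54))
# of a Lipschitz control with kinks, approximated on a fine sampling grid.
z = lambda x: np.clip(np.sin(2.0 * np.pi * x), -0.5, 0.5)
for m in (16, 32, 64, 128):
    h = 1.0 / m
    xs = np.linspace(0.0, 1.0, 10 * m + 1)       # fine sampling grid
    zf = z(xs)[:-1]                              # left endpoints of subcells
    pi_z = zf.reshape(m, 10).mean(axis=1)        # approximate cell averages
    err = np.sqrt(((zf - np.repeat(pi_z, 10))**2).mean())
    print(m, err)                                # decays like O(h), cf. (56)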

Comparing this result with the unconstrained case, we have only first-order convergence. This is mainly due to the choice of the discrete control space, which does not take full advantage of the regularity of the optimal control, namely \(\bar {z}\in W^1_\infty (\varOmega )\). Moreover, away from the active constraints, \(\bar {z}\) is still in H2(Ω). Taking this into consideration, there are some alternatives that increase the order of convergence.

8.3.2 Cell-Wise Linear Control Discretization

To improve the convergence rate of the above result, we consider Zad,h = Zad ∩ Vh, i.e., the control space consists of piecewise linear functions satisfying the constraints (22). The approximation properties in this setting were investigated in a number of papers, for example [72, 74]. We will not provide all the details but only highlight the main ideas. To take advantage of the regularity of \(\bar {z}\) discussed above, we consider the following sets:

$$\displaystyle \begin{aligned} \mathcal{T}^1_h = \{\tau\in \mathcal{T}_h \mid \bar{z}\mid_\tau = z_a\ \text{or}\ \bar{z}\mid_\tau = z_b \}, \end{aligned}$$

which is the set of active cells,

$$\displaystyle \begin{aligned} \mathcal{T}^2_h = \{\tau\in \mathcal{T}_h \mid z_a<\bar{z}\mid_\tau < z_b \}, \end{aligned}$$

the set of inactive cells, and the rest

$$\displaystyle \begin{aligned}\mathcal{T}^3_h=\mathcal{T}_h\backslash(\mathcal{T}^1_h\cup\mathcal{T}^2_h). \end{aligned}$$

Then under the assumption that

$$\displaystyle \begin{aligned} meas(\mathcal{T}^3_h)\le Ch, \end{aligned} $$
(63)

which is valid, for example, if the boundary of the active set consists of a finite number of rectifiable curves, one can establish the following result.

Theorem 11

Let \(\bar {z}\) and \(\bar {z}_h\) be optimal solutions to the continuous and discrete PDECO problems (21) and (43), respectively, with control constraints. Assume in addition that Ω is convex or \(C^{1,1}\), that ud ∈ Lp(Ω) for some p > n, and that assumption (63) holds. Then there exists a constant C independent of h such that

$$\displaystyle \begin{aligned}\| \bar{z} - \bar{z}_h\|{}_{L^2(\varOmega)} \le Ch^{\frac{3}{2}}\left(\|\bar{p}\|{}_{H^2(\varOmega)}+\|\nabla \bar{z}\|{}_{L^\infty(\varOmega)}\right). \end{aligned}$$

Proof

The proof of this result can be found in [73].

Remark 7 (Variational Discretization)

The idea of the variational discretization approach introduced by Hinze [52] is to not discretize the control variable at all, i.e., to choose Zad,h = Zad. This approach does give second-order convergence for the control but requires a nonstandard implementation, especially for n > 1.

8.4 Semilinear Equations

Similar to the linear case, we discretize the problem with finite elements. Thus, we replace the state space \(H^1_0(\varOmega )\) with \(V_h\subset H^1_0(\varOmega )\) and the space of admissible controls Zad with Zad,h ⊂ Zad. We additionally assume that \(z_a,z_b\in \mathbb {R}\cup \{\pm \infty \}\) with za < zb. In the case of an unconstrained control, we take za = −∞ and zb = +∞, i.e., Zad,h = Vh. The discretized problem (30)–(32) becomes

$$\displaystyle \begin{aligned} \min_{u_h\in V_h, \; z_h\in Z_{ad,h}}J_h(u_h,z_h) = \frac{1}{2}\| u_h - u_d \|{}^2_{L^2(\varOmega)} + \frac{\lambda}{2} \|z_h\|{}^2_{L^2(\varOmega)}\end{aligned} $$
(64)

subject to

$$\displaystyle \begin{aligned} \begin{aligned} (A \nabla u_h,\nabla v_h)+(f(\cdot,u_h),v_h) &= (z_h,v_h) \quad \forall v_h\in V_h . \end{aligned}\end{aligned} $$
(65)
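
A minimal sketch of a Newton solve for the discrete semilinear state equation (65), under illustrative assumptions not prescribed by the text: 1D P1 elements on (0, 1), A = I, f(x, u) = u³ handled with a lumped mass matrix, and a fixed sample control.

import numpy as np

# Illustrative Newton iteration for (65):  K u + M_l f(u) = M_l z,
# with P1 stiffness K and lumped mass M_l = h I on a uniform mesh of (0,1).
m = 64
h = 1.0 / m
x = np.linspace(h, 1.0 - h, m - 1)
I = np.eye(m - 1)
K = (2.0 * I - np.eye(m - 1, k=1) - np.eye(m - 1, k=-1)) / h   # stiffness
Ml = h * I                                                     # lumped mass
z = 10.0 * np.sin(np.pi * x)                                   # sample control

u = np.zeros(m - 1)
for it in range(30):
    F = K @ u + Ml @ u**3 - Ml @ z          # residual of (65)
    if np.linalg.norm(F) < 1e-12:
        break
    J = K + Ml @ np.diag(3.0 * u**2)        # Jacobian: K + M_l D_u f(u)
    u -= np.linalg.solve(J, F)
print("Newton iterations:", it)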

All of the above strategies for choosing Zad,h can be applied to the semilinear problem as well, and similar error estimates can be obtained. Of course, the arguments are more technical, and we refer to [13, 22, 29] for details.

Theorem 12

Assume that Ω is a convex domain with \(C^{1,1}\) boundary and let Assumptions 6 and 8 be satisfied. Let \(\bar {z}\) be a strict local solution to (30)–(32) that fulfills the second-order optimality condition: There exist δ > 0 and τ > 0 such that

$$\displaystyle \begin{aligned} \mathcal{J}^{\prime\prime}(\bar{z}) (z,z) \ge \delta \|z\|{}_{L^2(\varOmega)}^2 \end{aligned}$$

holds for all \(z \in L^\infty(\varOmega)\) satisfying

$$\displaystyle \begin{aligned} z(x) \left\{\begin{array}{ll} \ge 0 & \mathit{\mbox{ if }} \bar{z}(x) = z_{a} , \\ \le 0 & \mathit{\mbox{ if }} \bar{z}(x) = z_{b} , \\ = 0 & \mathit{\mbox{ if }} |\lambda\bar{z}+\bar{p}| \ge \tau > 0. \end{array} \right.\end{aligned} $$
(66)

Then:

  • [13, Thm. 5.1] (Cell-wise constant control) Let \(Z_{ad,h}=Z_{ad}\cap Z_h^0\) and \(\{\bar {z}_h\}\) be a sequence of locally optimal piecewise constant solutions to (64)–(65) that converges strongly in L2(Ω) to \(\bar {z}\). Then there exists a constant C independent of h such that

    $$\displaystyle \begin{aligned}\| \bar{z} - \bar{z}_h\|{}_{L^2(\varOmega)} \le Ch.\end{aligned} $$
  • [29, Thm. 4.5] (Cell-wise linear control) Let Zad,h = Zad ∩ Vh and \(\{\bar {z}_h\}\) be a sequence of locally optimal piecewise linear solutions to (64)–(65) that converges strongly in L2(Ω) to \(\bar {z}\). If in addition (63) holds, then there exists a constant C independent of h such that

    $$\displaystyle \begin{aligned}\| \bar{z} - \bar{z}_h\|{}_{L^2(\varOmega)} \le Ch^{\frac{3}{2}}.\end{aligned} $$

9 Conclusion and the Current State of the Art

In this introductory chapter, we reviewed the main ideas behind the study of PDECO. We briefly discussed the numerical approximation of such problems by the finite-element method and presented some convergence results in the case of control constraints. However, the subject is vast, with many active research directions. In this final section, we mention some topics that we skipped and some active areas of research that we did not touch upon.

  • In the above discussion, we only touched on problems with control constraints. However, problems with state constraints are equally important. In contrast to the control constraints, which basically amount to a projection onto the feasible set, the state constraints require much more care, since the Lagrange multipliers are not functions but only measures. We refer to [81, Chapter 6] for a nice introduction to the subject. For such problems, the error analysis is also more subtle, and one often has to resort to more technical pointwise error estimates to derive optimal-order convergence. We refer to [28, 64, 65] for some recent developments on the subject.

  • All of the above error estimates were a priori error estimates. However, a large fraction of the finite-element literature is devoted to a posteriori error estimates, i.e., estimates in which the error between the true solution and the discrete approximate solution is expressed in terms of computable quantities. We refer to [75, 77] for more recent developments on the subject.

  • One can consider more complicated state equations or even systems, which can be linear or nonlinear, time dependent, variational inequalities, and so on. Currently, the theory is well developed for problems constrained by linear and semilinear elliptic equations, but research is very much active for nonlinear and time-dependent problems and for variational inequalities [61, 68].

  • We only considered a quadratic cost functional in our error analysis. However, other choices may be desirable. For example, it was observed numerically that seeking the control in the space of regular Borel measures promotes sparsity of the optimal solution, meaning that the support of the solution is small. This phenomenon was analyzed for elliptic and parabolic problems in a number of papers [23,24,25]; however, some open questions still remain.

  • In all our examples, we considered distributed controls, i.e., the control z was acting in the interior of the domain. However, problems where the control acts on the boundary are important in many applications. The control can enter through Dirichlet, Neumann, or Robin boundary conditions. Because of the variational structure of the problems, Neumann and Robin boundary conditions enter the variational form of the state equation naturally, and as a result Neumann boundary controls can be analyzed naturally; see [81, Chapter 2]. We refer to [26, 27] for the Robin case. Dirichlet boundary conditions do not have this property, and one has to use more sophisticated machinery to overcome the technical difficulties and to derive optimal error estimates for the optimal solution [12]. Alternatively, one can use penalized Robin boundary conditions to study Dirichlet boundary control problems [63].