1 Introduction

This volume is aimed at early-career graduate students and anyone else interested in entering the field of PDE-constrained optimization (PDECO), i.e., optimization problems with PDE constraints. The purpose of this particular chapter is to provide a brief mathematical introduction to the topic, with an emphasis on optimality conditions and finite-element discretization. Most of the material presented here is well known and can be found in several textbooks; in particular, we mention [14, 53, 56, 62, 81].

PDE-constrained optimization problems arise in many applications, for instance in flow control [42] and shape optimization [43, 80]. To name a few: controlling pollutants in a river [5]; drug delivery using magnetic fields [8, 9]; optimal placement of droplets on electrowetting on dielectric (EWOD) devices [7]; shape optimization of microfluidic biochips [3, 4]. In the first two examples, the control is distributed (confined to either the entire domain or a subdomain). In the third example, the control acts on the boundary, and in the final example, the control is the shape of the domain. These problems range from the macroscale to the microscale and pose varying levels of difficulty due to the underlying physics.

A PDE-constrained optimization problem has four or five major components: i) a control variable z; ii) a state variable u; iii) a state equation (i.e., a PDE or system of PDEs) which associates a state u with every control z; iv) a cost functional J to be minimized which depends on z and u; and, possibly, v) constraints imposed on z (control constraints) and/or on u (state constraints). In the most abstract form, this amounts to

$$\displaystyle \begin{aligned} \text{minimize}\quad J(u,z) \end{aligned} $$
(1)

subject to u ∈ Uad and z ∈ Zad satisfying

$$\displaystyle \begin{aligned} e(u,z)=0. \end{aligned} $$
(2)

Thus we want to minimize the cost J, which is a function of the state u and the control z. The state equation (2) represents a PDE or system of PDEs (usually in weak form), with the state u lying in an admissible set Uad and the control z in an admissible set Zad.

To fully understand PDECO problems, the following topics must be considered: (i) functional analysis of the problem; (ii) solver development; (iii) discretization and software development. We shall briefly elaborate on each next; a more detailed discussion will be given in the subsequent chapters. By analysis, we mean

  • Existence, uniqueness, and regularity of a solution to (2).

  • Existence and uniqueness of a solution to the PDECO problem (1)–(2).

  • First-order necessary optimality conditions using adjoint equations.

  • If possible, the second-order necessary/sufficient conditions.

We can carry out the solver development, typically gradient based, either at the continuous (infinite-dimensional) level [49, 50, 82] or at the discrete (finite-dimensional) level [57, 66]. The choice is between two options: (a) the “first discretize, then optimize” approach, or (b) the “first optimize, then discretize” approach. As the names suggest, in case (a) one first discretizes the optimization problem and then writes the optimality conditions at the finite-dimensional level. On the other hand, (b) requires first writing the optimality conditions at the continuous level and then discretizing them. These two approaches are in general different. For instance, the discretization of the adjoint variables is tied to that of the state variables if one proceeds with (a); approach (b) provides more flexibility in this particular respect. There is no universal rule for selecting one of the two approaches [53, Section 3.2.4]. However, a choice should be made so that the discrete system preserves the structure of the continuous problem. It is worth noting that the optimization methods can be derived directly at the continuous level. In any event, one should avoid using off-the-shelf finite-dimensional algorithms to solve PDECO problems. This may lead to mesh dependence, i.e., the number of optimization iterations drastically increases with mesh refinement. In order to overcome such issues, we must adapt such algorithms (if we are using them) to incorporate functional-analytic concepts, for instance, suitable inner products; a small illustration is sketched after the list below. Discretization and software development amount to

  • Either finite-difference, finite-volume, or finite-element discretizations of the PDE and the control [19, 46, 70]. Other discretizations, such as wavelets, are possible as well; see [30, 31, 78] and the references therein.

  • Analysis of the discrete PDE and optimization solvers, and verification of mesh independence [51, 82].

  • Make the solver efficient, for instance, by using

    • Adaptive finite-element methods (AFEM) [59, 67].

    • Preconditioning techniques [15, 76, 84], time-parallel approaches [16, 32, 33, 83].

    • Model reduction: Proper orthogonal decomposition [6, 41, 54, 55], reduced-basis method [17, 45, 69].

  • Software development, for instance, the Rapid Optimization Library (ROL) [60] and Dolfin-Adjoint [36].
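
To make the role of the inner product concrete, the following minimal MATLAB sketch (our own illustration, not part of the chapter's code) compares two representations of the derivative of the discretized functional J(z) = (z, s)L2(Ω) on a uniform mesh of (0, 1): the coefficient vector Ms obtained by differentiating the discrete sum, and its Riesz representative with respect to the mass-matrix inner product. The former shrinks as the mesh is refined, whereas the latter is mesh independent, which is exactly the adaptation alluded to above.

```matlab
% Mesh (in)dependence of gradient representatives: a 1D P1-FEM sketch.
for n = [50 200]                          % two mesh resolutions
  h = 1/(n+1);  x = (1:n)'*h;             % interior nodes of a uniform mesh
  e = ones(n,1);
  M = h/6*spdiags([e 4*e e], -1:1, n, n); % P1 mass matrix (discrete L2 product)
  s = sin(pi*x);                          % J(z) = (z,s)_{L2} ~ z'*M*s
  g_euclid = M*s;                         % derivative w.r.t. Euclidean product
  g_L2 = M\g_euclid;                      % Riesz representative w.r.t. M-product
  fprintf('n = %3d: |g_euclid| = %8.2e, max|g_L2 - s| = %8.2e\n', ...
      n, norm(g_euclid), norm(g_L2 - s, inf));
end
```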

Our goal for this chapter is to collect the necessary ingredients to understand basic PDECO problems. The remaining chapters of this volume provide comprehensive treatments of variational inequalities, algorithm development, optimization under uncertainty, inverse problems, shape optimization, and several applications. We introduce notation, relevant function spaces, and notions of derivatives. We provide an abstract treatment to solve (1)–(2). Moreover, we discuss the linear quadratic PDECO problem in detail (quadratic cost functional with linear elliptic PDEs and control constraints), provide a flavor of the semilinear quadratic PDECO problem, and discuss the necessary components for the numerical analysis of the linear and semilinear quadratic PDECO problems. We also provide a MATLAB implementation of the semilinear quadratic PDECO problem in which the space discretization is carried out using the finite-element method (FEM). We solve the resulting optimization problem using the Newton-CG or L-BFGS methods in the absence of control constraints (Zad = Z), and the semismooth Newton method in the presence of control constraints. The code can be downloaded from the link

https://bitbucket.org/harbirantil/pde_constrained_opt

The first part of this chapter is organized as follows: in Section 2, we first consider a general optimization problem. We discuss existence of solutions to this problem in Section 2.1 and provide notions of derivatives in function spaces in Section 2.2. We conclude this section with the first-order necessary optimality conditions (cf. Section 2.3). We apply the approach discussed in Section 2 to the general PDECO problem (1)–(2) in Section 3 and discuss the full and the reduced forms. Derivation of optimality conditions is performed using both full and reduced forms.

In the second part of this chapter, we focus on the linear and semilinear quadratic PDECO problems. We introduce basic Sobolev spaces in Section 4. In Section 5, we mention the results relating to the well-posedness (existence and uniqueness whenever possible) of linear elliptic PDEs and their regularity. Next in Section 6, we formulate the linear quadratic PDECO problem and study its well-posedness in the reduced form. Section 7 discusses the semilinear quadratic PDECO problem. We conclude by introducing a finite-element discretization for the linear and semilinear quadratic PDECO problems in Section 8.

2 Abstract Optimization Problem

The purpose of this section is to consider an abstract optimization problem and study its well-posedness (cf. Section 2.1). We derive the first-order necessary optimality conditions in Section 2.3. This requires the notions of derivatives in function spaces (cf. Section 2.2). The results presented in this section will be applied to the PDECO problems in the subsequent sections.

Let Z be a real reflexive Banach space and let Zad ⊂ Z be a closed convex subset. We consider the minimization problem

$$\displaystyle \begin{aligned} \min_{z \in Z_{ad}} f(z).\end{aligned} $$
(3)

2.1 Existence

We first show existence of a solution to the minimization problem (3) using the direct method of calculus of variations. This result is also known as the Weierstrass theorem.

Theorem 1

Suppose \(f:Z \rightarrow \mathbb {R}\) is weakly lower semicontinuous with Z a reflexive Banach space and Zad ⊂ Z closed and convex. Assume that the lower γ-level set {z ∈ Zad : f(z) ≤ γ} of f is nonempty and bounded for some \(\gamma \in \mathbb {R}\). Then problem (3) has an optimal solution, i.e., there exists \(\bar {z} \in Z_{ad}\) such that \(f(\bar {z}) \le f(z)\) for all z ∈ Zad. If f is strictly convex, then the solution is unique.

Proof

The proof is based on the direct method of calculus of variations. Following the proof of [14, Theorem 3.2.1], we can construct a minimizing sequence \(\{z_n\}_{n\in \mathbb {N}}\) contained in the lower γ-level set such that \(f(z_n) \rightarrow \inf _{z \in Z_{ad}} f(z)\) as n → ∞. Since the lower γ-level set is convex and closed, it is weakly sequentially closed [81, Theorem 2.11]. In addition, since Z is reflexive and the lower γ-level set is bounded, it is weakly sequentially compact [81, Theorem 2.11]. As a result, there exists a subsequence (not relabeled) such that

$$\displaystyle \begin{aligned} z_n \rightharpoonup \bar{z} \quad \mbox{in } Z \end{aligned}$$

with \(\bar {z}\) in the lower γ-level set. It then remains to show that \(\bar {z}\) is the optimal solution. Due to the weak lower semicontinuity of f, we conclude that

$$\displaystyle \begin{aligned} f(\bar{z}) \le \liminf f(z_n) = \inf_{z\in Z_{ad}} f(z) . \end{aligned}$$

Finally, in order to show uniqueness, let z1 and z2 be two distinct optimal solutions. Using the definition of strict convexity, we have

$$\displaystyle \begin{aligned} f\left(\frac{z_1+z_2}{2}\right) < \frac{1}{2} f(z_1) + \frac{1}{2} f(z_2) = \inf f(z) \end{aligned}$$

which is a contradiction. This completes the proof.

After showing the existence of minimizers, a natural question to ask is what the first-order optimality conditions are. However, we need to understand the notions of derivatives in function spaces before we can proceed further.

2.2 Differentiation in Banach Spaces

We introduce the notions of derivatives in function spaces [18, 79]. As an example, we shall apply the ideas to a quadratic cost functional. We will also derive the first-order optimality conditions for the problem (3) based on the derivatives introduced in this section.

Let \(\mathcal {L}(A,B)\) denote the space of bounded linear operators from a Banach space A to a Banach space B. Let (Z, ∥⋅∥Z) and (V, ∥⋅∥V) be real Banach spaces, let \(\mathcal {Z} \subset Z\) be open, and let \(F : \mathcal {Z} \rightarrow V\). Moreover, let \(z \in \mathcal {Z}\).

Definition 1 (Directional Derivative)

F is said to be directionally differentiable at z if the limit \( \lim _{t\downarrow 0}\frac {1}{t} (F(z+th) - F(z))\) exists in V for all h ∈ Z. If this limit exists, we denote

$$\displaystyle \begin{aligned}F'(z,h) := \lim_{t\downarrow 0}\frac{1}{t} (F(z+th) - F(z)) \end{aligned}$$

and say that F′(z, h) is the directional derivative of F at z in the direction h.

Notice that for a given z, the map h ↦ F′(z, h) is not necessarily linear, but it is positively homogeneous; we refer to [79, 81] for examples, one of which follows.
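
For instance, a standard example (our own addition here) is \(F : \mathbb {R}^2 \rightarrow \mathbb {R}\) given by \(F(x,y) = x^2 y/(x^4+y^2)\) for (x, y) ≠ (0, 0) and F(0, 0) = 0. A direct computation shows that \(F'((0,0),h) = h_1^2/h_2\) for h2 ≠ 0 and F′((0, 0), h) = 0 for h2 = 0, which is positively homogeneous in h but not linear; in particular, F is directionally differentiable at the origin but not Gâteaux differentiable there.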

Definition 2 (Gâteaux Derivative)

F is said to be Gâteaux differentiable at z if it is directionally differentiable and F′(z, h) = F′(z)h for \(F'(z) \in \mathcal {L}(Z,V)\). We refer to F′(z) as the Gâteaux derivative at z.

We next introduce a stronger notion.

Definition 3 (Fréchet Derivative)

F is said to be Fréchet differentiable at z if and only if F is Gâteaux differentiable at z and the following holds:

$$\displaystyle \begin{aligned} F(z+h) = F(z) + F'(z) h + r(z,h) \quad \mbox{with} \quad \frac{\| r(z,h) \|{}_V}{\| h \|{}_Z} \rightarrow 0 \quad \mbox{as} \quad \| h \|{}_Z \rightarrow 0 . \end{aligned}$$

We refer to F′(z) as the Fréchet derivative at z.

Remark 1 (Few Facts)

  (i) If the Fréchet derivative exists, then so does the Gâteaux derivative and they coincide. However, the converse is not true in general.

  (ii) We say that F is continuously Gâteaux differentiable if F′(⋅) exists and is continuous. In that case, F is Fréchet differentiable [18, pp. 35–36].

  (iii) Let E = G(F(z)), where F is Gâteaux differentiable at z and G is Fréchet differentiable at F(z); then E is Gâteaux differentiable.

Notice that when \(V = \mathbb {R}\) then \(\mathcal {L}(Z,V) = Z^*\). In addition if F is Gâteaux differentiable at z then we have

$$\displaystyle \begin{aligned} F'(z)h = \langle F'(z), h \rangle_{Z^*,Z}, \end{aligned}$$

where Z* is the dual space of Z and \(\langle \cdot ,\cdot \rangle _{Z^*,Z}\) denotes the duality pairing.

Before we conclude this subsection, we apply the above introduced definitions to two prototypical quadratic functionals. The derivatives in both these examples are Fréchet derivatives.

Example 1

Let (H, (⋅, ⋅)H) be a real Hilbert space and let \(F : H \rightarrow \mathbb {R}\) be defined as \(F(z) := \| z \|{ }^2_H = (z,z)_H\). Then for all z, h ∈ H we have

$$\displaystyle \begin{aligned} F(z+h) - F(z) = 2(z,h)_H + \| h \|{}^2_H. \end{aligned}$$

Thus,

$$\displaystyle \begin{aligned} F'(z)h = (2z , h)_H . \end{aligned}$$

Using the Riesz representation theorem (identifying H with its dual H*), we can write

$$\displaystyle \begin{aligned} (\nabla F(z), h)_H = \langle F'(z), h\rangle_{H^*,H} , \end{aligned}$$

where ∇F(z) ∈ H is the representative of F′(z) ∈ H*. We refer to ∇F(z) ∈ H as the gradient of F at z. In the above case, we have ∇F(z) = 2z.

Remark 2 (Gradient)

As can be seen from the above example, the expression that we obtain by identifying F′(z) ∈ H* with an element of H is called the gradient of F. We will use the notation ∇F(z) to denote the gradient. We further notice that the definition of the gradient depends on the underlying inner product.

Example 2

Let (Z, (⋅, ⋅)Z), (H, (⋅, ⋅)H) be real Hilbert spaces and ud ∈ H be fixed. Let \(S \in \mathcal {L}(Z,H)\). Consider \(E: Z \rightarrow \mathbb {R}\),

$$\displaystyle \begin{aligned} E(z) = \| S z - u_d \|{}^2_H . \end{aligned}$$

Then E(z) = G(F(z)), where \(G(v) = \| v \|{ }_H^2\) and F(z) = Sz − ud.

Next using the chain rule, we obtain that

$$\displaystyle \begin{aligned} \langle E'(z), h\rangle_{Z^*,Z} &= \langle G'(F(z)), F'(z) h\rangle_{H^*,H} = (2 v, F'(z) h)_H \\ &= 2(S z - u_d, S h)_H = 2\langle S^*(S z - u_d), h\rangle_{Z^*,Z} , \end{aligned} $$

where \(S^* \in \mathcal {L}(H^*,Z^*)\) is the adjoint of S. Here we have assumed that S*S and S*ud are well defined. Thus E′(z) = 2S*(Sz − ud) ∈ Z*. Since Z is a Hilbert space, similar to the previous example, we can again apply the Riesz representation theorem to get the representative ∇E(z) ∈ Z of E′(z).
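
As a sanity check of the formula in Example 2, the following MATLAB sketch (our own, with a random matrix S standing in for a discretized solution operator) compares the directional derivative ⟨E′(z), h⟩ = (2S*(Sz − ud), h) with a central finite difference.

```matlab
% Finite-difference check of E'(z) = 2 S^*(S z - u_d) for a matrix S.
rng(1); m = 8; k = 5;
S  = randn(m,k);  ud = randn(m,1);          % stand-in operator and data
z  = randn(k,1);  hdir = randn(k,1);        % base point and direction
E  = @(z) norm(S*z - ud)^2;                 % E(z) = ||S z - u_d||^2
g  = 2*S'*(S*z - ud);                       % gradient from Example 2
t  = 1e-6;
fd = (E(z + t*hdir) - E(z - t*hdir))/(2*t); % central difference
fprintf('analytic: %.10f   finite difference: %.10f\n', g'*hdir, fd);
```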

2.3 First-Order Necessary Optimality Conditions

We conclude this section with the following result on the first-order necessary optimality conditions.

Theorem 2

Let Z be a real Banach space (not necessarily reflexive). Let \(f : \mathcal {Z} \rightarrow \mathbb {R}\) be Gâteaux differentiable in \(\mathcal {Z}\), where \(Z_{ad} \subset \mathcal {Z} \subset Z\) with \(\mathcal {Z}\) open. If \(\bar {z} \in Z_{ad}\) is a solution to (3), then the first-order necessary optimality conditions are

$$\displaystyle \begin{aligned} \langle f'(\bar{z}), z-\bar{z}\rangle_{Z^*,Z} \ge 0 \quad \forall z \in Z_{ad} . \end{aligned} $$
(4)

In addition, if f is convex and \(\bar {z} \in Z_{ad}\) solves (4) then\(\bar {z}\)is a solution to (3), i.e., (4) is necessary and sufficient.

Proof

The proof of the first part is a direct consequence of the definition of Gâteaux derivative. Let z ∈ Zad be arbitrary. By the convexity of Zad we have that \(\bar {z}+t(z-\bar {z}) \in Z_{ad}\) for all t ∈ [0, 1]. From the optimality of \(\bar {z}\) it follows that

$$\displaystyle \begin{aligned} f(\bar{z}+t(z-\bar{z})) - f(\bar{z}) \ge 0 \quad \forall t \in [0,1]. \end{aligned}$$

Dividing both sides by t and taking the limit as t approaches 0+, we arrive at (4).

Next we use the convexity of f, i.e., for all t ∈ (0, 1] \(f(\bar {z}+t(z-\bar {z})) \le (1-t) f(\bar {z}) + t f(z)\). By rearranging terms and taking the limit as t approaches 0+, we arrive at

$$\displaystyle \begin{aligned} f(z) - f(\bar{z}) \ge \langle f'(\bar{z}), z-\bar{z}\rangle_{Z^*,Z} \quad \forall z \in Z_{ad} .\end{aligned} $$

We then obtain the desired sufficient condition by using (4).
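
As a simple illustration (our own example), take Z = ℝ, Zad = [1, 2], and f(z) = z². The unique minimizer is \(\bar {z} = 1\), and (4) reads \(f'(\bar z)(z-\bar z) = 2(z-1) \ge 0\) for all z ∈ [1, 2], which clearly holds. Note that \(f'(\bar {z}) = 2 \neq 0\): since the constraint is active at \(\bar z\), the variational inequality does not reduce to the unconstrained condition \(f'(\bar z) = 0\).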

Remark 3

We make the following observations:

  i. Theorem 2 only requires f to be directionally differentiable, and it holds after replacing \(\langle f'(\bar {z}), z-\bar {z}\rangle _{Z^*,Z}\) by \(f'(\bar {z},z-\bar {z})\).

  ii. Notice that the existence of \(\bar {z} \in Z_{ad}\) solving (3) in Theorem 2 is guaranteed under the assumptions of Theorem 1.

  iii. In general, for a nonconvex f, we cannot expect to achieve a global minimum but only a local minimum. We call \(\bar {z} \in Z_{ad}\) a local minimum of (3) if there exists an ε > 0 such that

    $$\displaystyle \begin{aligned} f(\bar{z}) \le f(z) \quad \forall z \in Z_{ad} \cap B_\varepsilon(\bar z)\end{aligned} $$

    where \(B_\varepsilon (\bar z) \subset Z\) is the ball of radius ε centered at \(\bar {z}\).

  iv. Equation (4) is known as a variational inequality.

3 Application to PDE-Constrained Optimization Problems

Let Z be a real reflexive Banach space and U, Y  be real Banach spaces. We begin by recalling the abstract problem (1)–(2)

$$\displaystyle \begin{aligned} \min_{(u,z) \in U\times Z} J(u,z) \quad \mbox{subject to} \quad e(u,z) = 0 , \quad z \in Z_{ad} , \ u \in U_{ad} , \end{aligned} $$
(5)

where \(J: U\times Z \rightarrow \mathbb {R}\) and e : U × Zad → Y, with Zad ⊂ Z closed and convex. We refer to (5) as the full-space form. Before we proceed further, we remark that the cost functional J often has two components:

$$\displaystyle \begin{aligned} J(u,z) = J_1(u) + J_2(z),\end{aligned} $$

where \(J_1 : U \rightarrow \mathbb {R}\) is the objective (to be attained) and \(J_2 : Z \rightarrow \mathbb {R}\) is the control penalty.

Another way to write (5) is by letting X := U × Z, Xad := Uad × Zad. Then we seek x ∈ Xad such that

$$\displaystyle \begin{aligned} \min_{x \in X_{ad}} J(x) \quad \mbox{subject to} \quad e(x) = 0. \end{aligned} $$
(6)

Notice that (6) does not assume splitting of the control and state variables and is a generalization of (5).

By eliminating the state variable, we obtain a third form of (5), which we call the reduced form. Specifically, this requires the existence of a solution operator S : Z → U, which assigns to each control a unique state:

$$\displaystyle \begin{aligned} z \mapsto S(z) = u(z) \quad \mbox{ where } u(z) \mbox{ satisfies} \quad e(u(z),z) = 0 . \end{aligned}$$

Thus we define the reduced cost functional as \(\mathcal {J} : Z \rightarrow \mathbb {R}\)

$$\displaystyle \begin{aligned} \mathcal{J}(z) := J(S(z),z) . \end{aligned}$$

Instead of (5) we then solve

$$\displaystyle \begin{aligned} \min_{z \in Z_{ad}} \mathcal{J}(z) \quad \mbox{subject to} \quad S(z) \in U_{ad}. \end{aligned} $$
(7)

We remark that the formulations (5) and (7) are not equivalent in general. For instance, there are applications where the solution operator S does not exist, i.e., the problem (7) may not have a solution while (5) is still solvable.

We next state an existence result for (7). For simplicity of presentation, from here on we assume that Uad = U, i.e., there are no state constraints. However, the discussion can be easily adapted to the more general state-constrained situation.

Corollary 1

Let the following assumptions hold

  (i) Zad ⊆ Z is closed and convex.

  (ii) For each z ∈ Zad, there exists a unique state S(z) that solves e(S(z), z) = 0.

  (iii) S is weakly continuous, i.e., if \(z_n \rightharpoonup z\) in Zad, then \(S(z_n) \rightharpoonup S(z)\) in U.

  (iv) The lower γ-level set \(\{z \in Z_{ad} : \mathcal {J}(z) \le \gamma \}\) of \(\mathcal {J}\) is nonempty and bounded for some \(\gamma \in \mathbb {R}\).

  (v) J1 is continuous and convex, and J2 is weakly lower semicontinuous.

Then there exists a solution to (7).

Proof

We notice that since J1 is continuous and convex, it is weakly lower semicontinuous [81, Theorem 2.12]. The weak continuity of S then implies that z ↦ J1(S(z)) is weakly lower semicontinuous; combined with the weak lower semicontinuity of J2, this shows that \(\mathcal {J}\) is weakly lower semicontinuous. The proof then follows from Theorem 1.

We remark that in many applications the lower γ-level set condition is replaced by either the boundedness of Zad or the coercivity of J2. More details will be provided in the subsequent sections.

At times it is more suitable to work directly with the full-space form (5), as the reduced form (7) may not even exist. This requires the Lagrangian functional, which we discuss in Section 3.2. Another advantage of the Lagrangian formulation is the ease with which it allows us to derive the first- and second-order derivatives; this will also be discussed in Section 3.2. We first consider the derivation of the first-order optimality conditions for the reduced form in Section 3.1.

3.1 Reduced Form: First-Order Necessary Optimality Conditions

Corollary 2

Let all the assumptions of Corollary 1 hold, except that Z need not be reflexive. Let \(\mathcal {Z}\) be an open set in Z with \(Z_{ad} \subset \mathcal {Z}\) such that \(z \mapsto S(z) : \mathcal {Z} \rightarrow U\) is Gâteaux differentiable with derivative

$$\displaystyle \begin{aligned} S'(z) \in \mathcal{L}(Z,U),\end{aligned} $$

and \((u,z) \mapsto J(u,z) : U \times Z \rightarrow \mathbb {R}\) is Fréchet differentiable with

$$\displaystyle \begin{aligned} J'(u,z) \in \mathcal{L}(U\times Z, \mathbb{R}).\end{aligned} $$

If \(\bar {z}\) is a minimizer of (5) over Zad, then the first-order necessary optimality conditions are given by

$$\displaystyle \begin{aligned} \langle S'(\bar{z})^* J_u(S(\bar{z}),\bar{z}) + J_z(S(\bar{z}),\bar{z}),z-\bar{z}\rangle_{Z^*,Z} \ge 0 \quad \forall z \in Z_{ad}, \end{aligned} $$
(8)

where J u and J z are the partial derivatives of J. If \(\mathcal {J}\) is convex then the condition (8) is sufficient.

Proof

The proof is a consequence of Theorem 2. Let \(\bar {z}\) be a solution of (5); then from Theorem 2 we have that \(\langle \mathcal {J}^{\prime }(\bar {z}), z-\bar {z}\rangle _{Z^*,Z} \ge 0\) for all z ∈ Zad. Using the chain rule and setting \(h:=z-\bar {z}\), we obtain that

$$\displaystyle \begin{aligned} \langle \mathcal{J}^{\prime}(\bar{z}), h\rangle_{Z^*,Z} &= J'(S(\bar{z}),\bar{z})h = \langle J_u(S(\bar{z}),\bar{z}), S'(\bar{z}) h \rangle_{U^*,U} + \langle J_z(S(\bar{z}),\bar{z}) , h \rangle_{Z^*,Z} \end{aligned} $$

where we have used the chain rule, which holds under the stated differentiability assumptions on S and J. Since \(S'(\bar{z})\) is well defined, we conclude that

$$\displaystyle \begin{aligned} \langle \mathcal{J}^{\prime}(\bar{z}), h\rangle_{Z^*,Z} &= \langle S'(\bar{z})^* J_u(S(\bar{z}),\bar{z}), h \rangle_{Z^*,Z} + \langle J_z(S(\bar{z}),\bar{z}) , h \rangle_{Z^*,Z} . \end{aligned} $$

This completes the proof.

To further understand the structure of \(S'(\bar z)\), we assume that the PDE constraint function e is sufficiently smooth and the conditions of the implicit function theorem hold. Upon differentiating the state equation, we obtain that

$$\displaystyle \begin{aligned} e_u(S(\bar z),\bar z) S'(\bar z) h = - e_z (S(\bar z),\bar z) h . \end{aligned}$$

Subsequently, we arrive at

$$\displaystyle \begin{aligned} S'(\bar z) h = - e_u(S(\bar z),\bar z)^{-1} \left( e_z (S(\bar z),\bar z) h \right) .\end{aligned} $$
(9)

Substituting this in (8), we obtain that

$$\displaystyle \begin{aligned} -\left\langle e_z (S(\bar{z}),\bar{z})^* \left( (e_u(S(\bar{z}),\bar{z})^{-1})^* J_u(S(\bar{z}),\bar{z}) \right), z -\bar{z} \right\rangle_{Z^*,Z}\\ + \langle J_z(S(\bar{z}),\bar{z}),z-\bar{z}\rangle_{Z^*,Z} \ge 0 .\end{aligned} $$

Introducing the adjoint variable \(\bar p\) and solving the adjoint equation,

$$\displaystyle \begin{aligned} e_u (S(\bar{z}),\bar{z})^{*} \bar p = J_u(S(\bar{z}),\bar{z}),\end{aligned} $$
(10)

we arrive at the following reformulation of (8)

$$\displaystyle \begin{aligned} -\left\langle e_z (S(\bar{z}),\bar{z})^* \bar p, z-\bar{z} \right\rangle_{Z^*,Z} + \langle J_z(S(\bar{z}),\bar{z}),z-\bar{z}\rangle_{Z^*,Z} \ge 0 . \end{aligned} $$
(11)

Notice that

$$\displaystyle \begin{aligned} \mathcal{J}^{\prime}(z) = -e_z (S(z),z)^* p + J_z(S(z),z) \in Z^* ,\end{aligned} $$
(12)

is the derivative of \(\mathcal {J}\) at z. We summarize the computation of \(\mathcal {J}^{\prime }(z)\) in Algorithm 1.

Algorithm 1 Derivative computation using adjoints

  1: Given z, solve e(u, z) = 0 for the state u.

  2: Solve the adjoint equation \(e_u(u(z),z)^* p = J_u(u(z),z)\) for p.

  3: Compute \(\mathcal {J}^{\prime }(z) = J_z(u(z),z) - e_z(u(z),z)^* p(z)\).

Algorithm 1 requires two PDE solves (a possibly nonlinear PDE in Step 1 and a linear PDE in Step 2).
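
For a discretized model problem, Algorithm 1 takes the following form in MATLAB (a sketch under our own simplifying assumptions: e(u, z) = Au − z with A a finite-difference Laplacian, J(u, z) = ½|u − ud|² + ½λ|z|², and Euclidean inner products in place of mass matrices):

```matlab
% Algorithm 1 for the model e(u,z) = A*u - z = 0 (so e_u = A, e_z = -I).
n = 50;  h = 1/(n+1);  lambda = 1e-2;
e = ones(n,1);
A = spdiags([-e 2*e -e], -1:1, n, n)/h^2; % 1D Dirichlet Laplacian
ud = ones(n,1);  z = zeros(n,1);          % target and current control
u = A\z;                                  % Step 1: state solve e(u,z) = 0
p = A'\(u - ud);                          % Step 2: adjoint solve e_u^* p = J_u
gradJ = lambda*z + p;                     % Step 3: J'(z) = J_z - e_z^* p
```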

In order to get the gradient, we can again apply the Riesz representation theorem (assuming Z is a Hilbert space, or more generally admits such a representation) to get a representative \(\nabla \mathcal {J}(z)\) satisfying

$$\displaystyle \begin{aligned} (\nabla \mathcal{J}(z) , v)_{Z} = \langle \mathcal{J}^{\prime}(z), v \rangle_{Z^*,Z} \quad \forall v \in Z. \end{aligned}$$

Having the expression of the gradient in hand, we can develop a gradient-based optimization solver. We will provide another derivation of the first-order optimality conditions in Section 3.2 using the Lagrangian approach.

In order to design Newton-based methods, it is desirable to have second-order derivative information for the reduced functional. This can be obtained by using either the reduced-functional approach or the Lagrangian approach. We provide a brief discussion in the next section as well. More details will be given in subsequent chapters; we also refer to [44, 53].

3.2 Lagrangian Formulation

3.2.1 First-Order Optimality Conditions

The full-space form requires us to introduce the Lagrangian functional \(L: U \times Z_{ad} \times Y^* \rightarrow \mathbb {R}\),

$$\displaystyle \begin{aligned} L(u,z,p) = J(u,z) - \langle e(u,z), p \rangle_{Y,Y^*} . \end{aligned} $$
(13)

where Y* is the dual space of Y (recall that e : U × Zad → Y). Notice that if we set u = S(z) in (13), then e(S(z), z) = 0 and we obtain that

$$\displaystyle \begin{aligned} \mathcal{J}(z) = J(S(z),z) = L(S(z),z,p) \quad \mbox{for any } p \in Y^*. \end{aligned} $$
(14)

Now if \((\bar u,\bar z,\bar p)\) denotes a stationary point, then the partial derivatives of L(u, z, p) with respect to u and p at \((\bar u,\bar z,\bar p)\) vanish, while the derivative with respect to z satisfies a variational inequality. As a result, we obtain

$$\displaystyle \begin{aligned} L_p(\bar u,\bar z,\bar p) = 0, \end{aligned}$$

which reduces to the state equation

$$\displaystyle \begin{aligned}e(\bar u,\bar z) = 0.\end{aligned} $$

Also

$$\displaystyle \begin{aligned} L_u(\bar u,\bar z,\bar p) &= 0 {},\end{aligned} $$
(15)

which is just the adjoint equation (10). Indeed

$$\displaystyle \begin{aligned} \langle L_u(\bar u,\bar z,\bar p) , \xi\rangle_{U^*,U} &= \langle J_u(\bar u,\bar z) , \xi\rangle_{U^*,U} - \langle \bar p, e_u(\bar u,\bar z) \xi \rangle_{Y^*,Y} \\ &= \langle J_u(\bar u,\bar z) - e_u(\bar u,\bar z)^* \bar p , \xi\rangle_{U^*,U}. \end{aligned} $$

In other words

$$\displaystyle \begin{aligned} L_u(\bar u,\bar z,\bar p) = J_u(\bar u,\bar z) - e_u(\bar u,\bar z)^* \bar p . \end{aligned}$$

Finally,

$$\displaystyle \begin{aligned} \langle L_z(\bar u,\bar z,\bar p) , z - \bar z \rangle_{Z^*,Z} \ge 0 \quad \forall z \in Z_{ad},\end{aligned} $$

which is equivalent to the variational inequality (11) for the control. Indeed, the derivative of the reduced functional \(\mathcal {J}\) at z, cf. (12), is

$$\displaystyle \begin{aligned} \mathcal{J}^{\prime}(z) = L_z(u,z,p), \end{aligned}$$

where u and p solve the state and the adjoint equations, respectively.

A few comments are in order. First of all, the above approach provides an elegant way to derive the first-order necessary optimality conditions and is highly recommended. As we will discuss below, it also allows us to easily derive the second-order derivatives of the reduced functional. Secondly, even though the Lagrangian L introduced above is rigorous, we have not yet addressed the question of existence of the Lagrange multiplier p, which makes the approach “formal.” The existence of the Lagrange multiplier p can be shown by using the Karush–Kuhn–Tucker (KKT) theory in function spaces [81, Chapter 6]. This theory requires Robinson's regularity condition [71] or the Zowe–Kurcyusz constraint qualification [85]; see [53, Chapter 1]. In certain cases, in particular in the absence of control constraints and with linear PDE constraints, one can also use the inf-sup theory for saddle point problems [39] to show existence of the Lagrange multiplier.

3.2.2 Second-Order Derivatives

Next we focus on deriving the expression of the second-order derivative of the reduced functional \(\mathcal {J}\). The second-order derivative information can significantly improve the convergence rates of optimization algorithms. For instance, in the absence of control constraints, the first-order necessary optimality condition is \(\mathcal {J}^{\prime }(\bar {z}) = 0\) in Z*. In order to solve for z, one can use Newton's method, which is locally quadratically convergent. Each iteration of Newton's method requires solving

$$\displaystyle \begin{aligned} \mathcal{J}^{\prime\prime}(z) v = -\mathcal{J}^{\prime}(z) \quad \mbox{in } Z^* \end{aligned} $$
(16)

for a direction v ∈ Z. In general (for large problems), it is often too expensive to store and factorize the Hessian. Instead, it is often more practical to solve (16) using iterative methods that only require Hessian-times-vector products. We will discuss the computation of the Hessian-times-vector product next. We remark that in the case of bound constraints on the control, one can use a locally superlinearly convergent semismooth Newton method [48, 56, 82].
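
For the model problem used after Algorithm 1, a matrix-free realization of the Newton step (16) can be written in a few lines of MATLAB using pcg, which only queries Hessian-times-vector products (again a sketch under our own assumptions; since the model is linear quadratic, a single Newton step solves the unconstrained problem):

```matlab
% Matrix-free Newton step (16) via CG for e(u,z) = A*u - z, Z_ad = Z.
n = 50;  h = 1/(n+1);  lambda = 1e-2;  e = ones(n,1);
A  = spdiags([-e 2*e -e], -1:1, n, n)/h^2;
ud = ones(n,1);  z = zeros(n,1);
u  = A\z;  p = A'\(u - ud);                  % state and adjoint solves
Hv = @(v) A'\(A\v) + lambda*v;               % v -> J''(z)v, never assembled
[dz, flag] = pcg(Hv, -(p + lambda*z), 1e-10, 500); % CG with H*v products only
z  = z + dz;                                 % Newton update
```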

We will proceed by using the Lagrangian approach. We operate under the assumption that J and e are twice continuously differentiable.

From (14), we recall that

$$\displaystyle \begin{aligned} \mathcal{J}(z) = J(u(z),z) = L(u(z),z,p) \end{aligned}$$

where u(z) = S(z) solves the state equation and p ∈ Y* is arbitrary. Differentiating this expression in a direction h1, we obtain that

$$\displaystyle \begin{aligned} \langle \mathcal{J}^{\prime}(z) , h_1 \rangle_{Z^*,Z} = \langle L_u(u(z),z,p) , u'(z) h_1\rangle_{U^*,U} +\langle L_z(u(z),z,p) , h_1\rangle_{Z^*,Z}. \end{aligned}$$

Again differentiating this expression in a direction h2 and choosing a particular p that solves the adjoint equation (15), we arrive at

$$\displaystyle \begin{aligned} \mathcal{J}^{\prime\prime}(z) = T(S(z),z)^* H(S(z),z,p) T(S(z),z), \end{aligned} $$

where

$$\displaystyle \begin{aligned} T(u,z) = \left(\begin{array}{c} - e_u(u,z)^{-1} e_z (u,z) \\ I_Z \end{array} \right) \end{aligned}$$

with IZ denoting the identity map on Z and H(u, z, p) is given by

$$\displaystyle \begin{aligned} H(u,z,p) = \left(\begin{array}{cc} L_{uu}(u,z,p) & L_{uz}(u,z,p) \\ L_{zu}(u,z,p) & L_{zz}(u,z,p) \end{array} \right). \end{aligned}$$

Then one can compute the Hessian-times-vector product by using Algorithm 2.

Algorithm 2 Hessian-Times-Vector Computation

  1: Given z, solve e(u, z) = 0 for u (if not done already).

  2: Solve the adjoint equation \(e_u(u,z)^* p = J_u(u,z)\) for p (if not done already).

  3: Solve \(e_u(u,z) w = -e_z(u,z) v\) for w.

  4: Solve \(e_u(u,z)^* q = L_{uu}(u,z,p) w + L_{uz}(u,z,p) v\) for q.

  5: Compute \(\mathcal{J}^{\prime\prime}(z) v = L_{zz}(u,z,p) v + L_{zu}(u,z,p) w - e_z(u,z)^* q\).

Notice that Algorithm 2 requires two additional linear PDE solves in Steps 3 and 4. We refer to [53, Chapter 1] and [44] for further details.

So far our approach has been general. For the remainder of the chapter, we will focus on two particular examples where the cost functional is quadratic and the PDE constraints are linear and semilinear elliptic PDEs, respectively. In order to develop the notion of solutions to these PDEs, we first introduce Sobolev spaces.

4 Sobolev Spaces

In this section, we introduce the function spaces to be used throughout the remainder of the chapter. Let \(\varOmega \subset \mathbb {R}^n\) be an open, bounded domain with Lipschitz boundary ∂Ω. For 1 ≤ p < ∞, we denote by Lp(Ω) the Banach space

$$\displaystyle \begin{aligned} L^p(\varOmega) := \left\{ v : \varOmega \rightarrow \mathbb{R} \;:\; v \mbox{ is measurable and } \int_\varOmega |v(x)|{}^p \;dx < \infty \right\} \end{aligned}$$

with the norm \(\|v\|{ }_{L^p(\varOmega )} := \left (\int _\varOmega |v(x)|{ }^p \;dx \right )^{\frac {1}{p}}\). The elements of these spaces are equivalence classes of functions that are equal up to a set of measure zero [37]. In particular, when p = 2, we obtain L2(Ω), which is a Hilbert space with inner product \((u,v)_{L^2(\varOmega )} = \int _\varOmega u(x) v(x) \; dx\). When p = ∞, we obtain L∞(Ω), a Banach space with norm \(\|v\|{ }_{L^\infty (\varOmega )} := \mbox{ess sup}_{\varOmega } |v|\).

Moving forward, we use multi-index notation for partial derivatives. For a multi-index \(\gamma = (\gamma _1,\dots ,\gamma _n) \in \mathbb {N}^n_0 := (\mathbb {N}\cup \{0\})^n\), we let its order be \(|\gamma | := \sum _{i=1}^n \gamma _i\). The associated |γ|-th order partial derivative of a function u at x is

$$\displaystyle \begin{aligned}D^\gamma u(x) := \frac{\partial^{|\gamma|}u}{\partial x_1^{\gamma_1}\dots \partial x_n^{\gamma_n}}(x) . \end{aligned}$$

Then we denote by Wk,p(Ω) the Sobolev space of all functions u ∈ Lp(Ω) whose weak partial derivatives Dγu with |γ| ≤ k belong to Lp(Ω), equipped with the norm

$$\displaystyle \begin{aligned} \|u\|{}_{W^{k,p}(\varOmega)} := \left\{\begin{array}{ll} \left( \sum_{|\gamma| \le k} \int_\varOmega |D^\gamma u|{}^p \;dx \right)^{1/p} & 1 \le p < \infty \\ \sum_{|\gamma|\le k} \mbox{ess sup}_{\varOmega} |D^\gamma u| & p = \infty . \end{array} \right. \end{aligned}$$

If p = 2, we write

$$\displaystyle \begin{aligned} H^k(\varOmega) = W^{k,2}(\varOmega), \quad k = 0, 1, \dots \end{aligned}$$

which is a Hilbert space with inner product

$$\displaystyle \begin{aligned} (u,v)_{H^k(\varOmega)} = \sum_{|\gamma| \le k} (D^\gamma u, D^\gamma v)_{L^2(\varOmega)} . \end{aligned}$$

Notice that H0(Ω) = L2(Ω).

We denote by \(W_0^{k,p}(\varOmega )\) the closure of \(C^\infty _0(\varOmega )\) with respect to the Wk,p(Ω)-norm. Thus \(u \in W^{k,p}_0(\varOmega )\) if and only if there exist functions \(u_m \in C_0^\infty (\varOmega )\) such that um → u in Wk,p(Ω). The space \(H^1_0(\varOmega )\) consists of all functions u ∈ H1(Ω) such that

$$\displaystyle \begin{aligned} u = 0 \quad \mbox{on } \partial\varOmega , \end{aligned}$$

in the trace sense. Using the Poincaré inequality \(\|u\|{ }_{L^2(\varOmega )} \le C \|\nabla u\|{ }_{L^2(\varOmega )}\), where C = C(Ω), we have

$$\displaystyle \begin{aligned} \| u \|{}_{H^1(\varOmega)} \le C \| \nabla u \|{}_{L^2(\varOmega)} \quad \forall u \in H^1_0(\varOmega). \end{aligned}$$

Finally, we denote the dual of \(H^1_0(\varOmega )\) by H−1(Ω). It is easy to see that L2(Ω) is continuously embedded in H−1(Ω). For more details, we refer to [1].

5 Second-Order Linear Elliptic PDEs

In this section, we study second-order linear elliptic PDEs. In general, we cannot expect classical solutions to these PDEs; therefore, we first introduce the notion of weak solutions in Section 5.1. We also study the notion of 'strong' solutions for this problem in Section 5.2. This higher regularity (strong solutions) will help us establish the approximability of the continuous solution by the finite-element method (cf. Section 8). Strong solutions also play a role in other situations, for instance, in studying the regularity of multipliers in state-constrained problems [20] and PDECO problems with variational inequalities [49].

5.1 Existence and Uniqueness

We begin this section by making certain uniform ellipticity assumptions.

Assumption 3 (Coefficient Matrix)

Let A be an n × n matrix with entries aij for 1 ≤ i, j ≤ n. We assume that the aij are measurable, belong to L∞(Ω), and are symmetric, that is, aij(x) = aji(x) for all 1 ≤ i, j ≤ n and for a.e. x ∈ Ω. We further assume that A is positive definite and satisfies the uniform ellipticity condition

$$\displaystyle \begin{aligned} \exists \; \beta \ge \alpha > 0 \quad \mathit{\mbox{such that}} \quad \alpha |\xi|{}^2 \le A(x) \xi \cdot \xi \le \beta |\xi|{}^2 \quad \forall \xi \in \mathbb{R}^n, \mathit{\mbox{ a.e. }} x \mathit{\mbox{ in }} \varOmega . \end{aligned} $$
(17)

Given f, we consider the following linear elliptic PDE

$$\displaystyle \begin{aligned} \begin{aligned} -\mbox{div } (A \nabla u) &= f \quad \mbox{in } \varOmega \\ u &= 0 \quad \mbox{on } \partial\varOmega . \end{aligned} \end{aligned} $$
(18)

We understand (18) in a weak sense, i.e., given f ∈ H−1(Ω), we seek a solution \(u \in H^1_0(\varOmega )\) that satisfies

$$\displaystyle \begin{aligned} \begin{aligned} \int_\varOmega A \nabla u \cdot \nabla v \;dx &= \langle f, v\rangle_{-1,1} , \quad \forall v \in H^1_0(\varOmega) , \end{aligned} \end{aligned} $$
(19)

where 〈⋅, ⋅〉−1,1 denotes the duality pairing between H−1(Ω) and \(H^1_0(\varOmega )\).

Theorem 4

For every f ∈ H−1(Ω), there exists a unique weak solution \(u \in H^1_0(\varOmega )\) that fulfills

$$\displaystyle \begin{aligned} \| u\|{}_{H^1(\varOmega)} \le C \|f\|{}_{H^{-1}(\varOmega)}, \end{aligned} $$
(20)

where the constant C only depends on Ω and α.

Proof 

The existence and uniqueness are due to the Lax–Milgram lemma. Moreover, the bound (20) follows immediately by using the fact that A is uniformly positive definite together with the Poincaré inequality [38].

Remark 4

In general, for Sobolev spaces W1,p(Ω) with p ≠ 2, we need inf-sup conditions to prove Theorem 4. The Banach–Nečas theorem then guarantees existence and uniqueness of u [34, Theorem 2.6]; the inf-sup condition is necessary and sufficient. See also [67].

5.2 Regularity

It is imperative to understand the regularity of the solution u to (19). For instance, such an understanding allows us to develop numerical schemes with optimal rates of convergence (see Section 8). It can also assist us in proving convergence rates for optimization algorithms [47].

Theorem 5

Let \(u \in H^1_0(\varOmega )\) be a weak solution of (19) and let the coefficient matrix A satisfy Assumption 3.

  • If f ∈ L2(Ω) and Ω is a convex polytope or a C1,1 domain in \(\mathbb {R}^n\), then \(u \in H^2(\varOmega ) \cap H^1_0(\varOmega )\) and there exists a constant C = C(α, β, Ω) such that

    $$\displaystyle \begin{aligned} \| u \|{}_{H^2(\varOmega)} \le C \| f \|{}_{L^2(\varOmega)} . \end{aligned}$$
  • If f ∈ Lp(Ω) for 1 < p < ∞ and Ω is C1,1, then \(u \in W^{2,p}(\varOmega ) \cap W^{1,p}_0(\varOmega )\) and there exists a constant C = C(α, β, Ω, p) such that

    $$\displaystyle \begin{aligned} \| u \|{}_{W^{2,p}(\varOmega)} \le C \| f \|{}_{L^p(\varOmega)} . \end{aligned}$$

    If p > n, then \(u \in C^{1,\alpha }(\bar \varOmega )\) with α = 1 − n∕p.

Proof 

If Ω is a convex polygonal/polyhedral domain, then the H2-regularity result can be found in [40, 3.2.1.2]. When ∂Ω is C1,1 and f ∈ Lp(Ω) for 1 < p < ∞, the result is due to [38, Theorem 9.15]. In the case p > n, the C1,α regularity follows from the W2,p regularity and the Sobolev embedding theorem.

6 Linear Quadratic PDE-Constrained Optimization Problem

Having some basic understanding of elliptic PDEs in hand, we next apply the results of Section 3 to a linear quadratic PDECO problem (cf. Section 6.1). In Section 6.2, we formulate it as a reduced PDECO problem posed only in terms of the control variable z. This allows us to use the direct method of calculus of variations from Theorem 1 to show existence of a solution.

6.1 Problem Formulation

Let ud ∈ L2(Ω) and za, zb ∈ L2(Ω) with za < zb a.e. in Ω be given. Moreover, let λ ≥ 0 denote the penalty parameter. Then we are interested in minimizing

$$\displaystyle \begin{aligned} J(u,z) = \frac{1}{2}\| u - u_d \|{}^2_{L^2(\varOmega)} + \frac{\lambda}{2} \|z\|{}^2_{L^2(\varOmega)} \end{aligned} $$
(21)

subject to (in the weak form)

$$\displaystyle \begin{aligned} \begin{aligned} -\mbox{div } (A \nabla u) &= z \quad \mbox{in } \varOmega, \\ u &= 0 \quad \mbox{on } \partial\varOmega, \end{aligned} \end{aligned} $$
(22)

and the pointwise control constraints

$$\displaystyle \begin{aligned} z \in Z_{ad} := \{ v \in L^2(\varOmega) : z_a(x) \le v(x) \le z_b(x) \ , \ a.e. \ x \in \varOmega \}. \end{aligned} $$
(23)

Notice that in the formulation above we also allow za = −∞ and zb = ∞. We call this the unconstrained case, i.e., Zad = L2(Ω).

For the above problem, we have (cf. Section 3)

$$\displaystyle \begin{aligned} U = H^1_0(\varOmega), \quad Y = H^{-1}(\varOmega), \quad Z = L^2(\varOmega). \end{aligned}$$

In order to understand the problem (21)–(23) in the spirit of (7), we first condense the variables and write J only in terms of z. We again call this the reduced form and the resulting cost functional the reduced functional. We discuss this next.

6.2 Reduced PDECO Problem

For every z ∈ Y , there exists a unique solution u = u(z) ∈ U to (22). As a result, we can define the solution operator to (22) as

$$\displaystyle \begin{aligned} \mathcal{S} : Y \rightarrow U , \quad z \mapsto u(z) = \mathcal{S} z, \end{aligned} $$

which is linear and continuous. In view of the continuous embeddings \( H^1_0(\varOmega ) \hookrightarrow L^2(\varOmega ) \hookrightarrow H^{-1}(\varOmega )\), we may also consider \(\mathcal {S}\) as a map from L2(Ω) to L2(Ω). In other words, instead of \(\mathcal {S}\) we consider the operator \(S := \mathcal {E}_u \mathcal {S}\mathcal {E}_z\), where \(\mathcal {E}_z : L^2(\varOmega ) \rightarrow H^{-1}(\varOmega )\) and \(\mathcal {E}_u : H^1_0(\varOmega ) \rightarrow L^2(\varOmega )\) denote the embedding operators that map z ∈ L2(Ω) to the corresponding element of H−1(Ω) and \(u \in H^1_0(\varOmega )\) to the corresponding element of L2(Ω), respectively. Notice that S : L2(Ω) → L2(Ω). One of the main advantages of using S is the fact that the adjoint operator S* also acts on L2(Ω) (cf. Section 6.3). Using the solution map S, we arrive at the so-called reduced cost functional \(\mathcal {J} : L^2(\varOmega ) \rightarrow \mathbb {R}\), which is defined as

$$\displaystyle \begin{aligned} \mathcal{J}(z) := J(S z,z) ,\end{aligned} $$

and the minimization problem (21)–(23) is equivalent to the reduced problem:

$$\displaystyle \begin{aligned} \min_{z \in Z_{ad}} \mathcal{J}(z).\end{aligned} $$
(24)

We notice that it is more convenient to analyze (24) in comparison to (21)–(23). In fact, we have the following well-posedness result.

Corollary 3

Let either λ > 0 or Zadbe bounded. Then there exists a solution to the minimization problem (24). If in addition λ > 0 or S is injective then the solution is unique.

Proof 

The proof is a consequence of Theorem 1. We first show that \(\mathcal {J}\) is weakly lower semicontinuous. Notice that \(\mathcal {J}(z) = \mathcal {J}_1(Sz) + \mathcal {J}_2(z)\), where \(\mathcal {J}_1(Sz) := \frac {1}{2}\|Sz-u_d\|{ }_{L^2(\varOmega )}^2\) and \(\mathcal {J}_2(z) := \frac {\lambda }{2}\|z\|{ }_{L^2(\varOmega )}^2\). Clearly \(\mathcal {J}_2\) is weakly lower semicontinuous (convexity and continuity imply weak lower semicontinuity [18, Theorem 2.23]). On the other hand, due to the compact embedding of \(H^1_0(\varOmega )\) in L2(Ω), we have that S : L2(Ω) → L2(Ω) is completely continuous, i.e., if \(z_n \rightharpoonup z\) in L2(Ω), then Szn → Sz in L2(Ω). Thus, owing to the continuity and convexity of \(\mathcal {J}_1\), we conclude that \(z \mapsto \mathcal {J}_1(Sz)\) is weakly lower semicontinuous. Hence \(\mathcal {J}\) is weakly lower semicontinuous.

It remains to verify the lower γ-level set condition. Here it can be replaced either by the coercivity of \(\mathcal {J}\) (if λ > 0) or by the boundedness of the closed convex set Zad.

Finally uniqueness is due to strict convexity of \(\mathcal {J}\).

For the remainder of this section, we will consider λ > 0.

6.3 First-Order Optimality Conditions

We are now ready to derive the first-order optimality conditions by following Section 3.1, together with the expression of the gradient of the reduced objective functional. In Section 6.4, we follow Section 3.2 and consider an alternative strategy to derive the optimality conditions using the Lagrangian formulation.

We recall that the reduced functional is

$$\displaystyle \begin{aligned} \mathcal{J}(z) = \frac{1}{2} \| Sz - u_d \|{}^2_{L^2(\varOmega)} + \frac{\lambda}{2} \| z \|{}^2_{L^2(\varOmega)}.\end{aligned} $$

Using Examples 1 and 2, the gradient of \(\mathcal {J}\) is given by

$$\displaystyle \begin{aligned} \nabla \mathcal{J}(z) = S^*(S z - u_d) + \lambda z\in L^2(\varOmega). \end{aligned}$$

Here we have S* : L2(Ω) → L2(Ω). The first-order necessary and sufficient (due to the convexity of \(\mathcal {J}\)) optimality condition (4) then becomes

$$\displaystyle \begin{aligned} (\nabla\mathcal{J} (\bar z), z-\bar{z})_{L^2(\varOmega)} \ge 0 \quad \forall z \in Z_{ad} . \end{aligned} $$
(25)

In order to efficiently evaluate \( \nabla \mathcal {J}(z)\), we introduce the so-called adjoint variable \( p \in H^1_0(\varOmega )\) solving

$$\displaystyle \begin{aligned} \begin{aligned} -\mbox{div } (A \nabla {p}) &= {u} - u_d \quad \mbox{in } \varOmega \\ p &= 0 \quad \mbox{on } \partial\varOmega . \end{aligned} \end{aligned} $$
(26)

We will next show that the adjoint operator S* : L2(Ω) → L2(Ω) can be defined by S*ζ := p, where p solves (26) with the right-hand side u − ud replaced by ζ. Here ζ ∈ L2(Ω) is arbitrary. Let z ∈ L2(Ω) be an arbitrary right-hand side of the state equation with resulting state \(u = Sz \in H^1_0(\varOmega )\). By testing the equation for u with p and vice versa, we obtain that \((z,p)_{L^2(\varOmega )} = (\zeta ,u)_{L^2(\varOmega )}\). Since u = Sz, we deduce that S*ζ = p. As a result, S*(u − ud) = p, where p solves (26).

Thus the gradient computation reduces to evaluation of the following expression:

$$\displaystyle \begin{aligned} \nabla\mathcal{J}(z) = p + \lambda z \in L^2(\varOmega). \end{aligned}$$

Finally, we gather the first-order necessary and sufficient optimality system:

$$\displaystyle \begin{aligned} \bar{u} \in H^1_0(\varOmega): & \quad \int_\varOmega A \nabla \bar{u} \cdot \nabla v\ dx = \int_\varOmega \bar{z} v \ dx\quad \forall v \in H^1_0(\varOmega) {} \end{aligned} $$
(27a)
$$\displaystyle \begin{aligned} \bar{p} \in H^1_0(\varOmega): & \quad \int_\varOmega A \nabla \bar{p} \cdot \nabla v \ dx = \int_\varOmega (\bar u - u_d) v \ dx\quad \forall v \in H^1_0(\varOmega) {} \end{aligned} $$
(27b)
$$\displaystyle \begin{aligned} \bar{z} \in Z_{ad}: & \quad (\bar{p} + \lambda \bar{z},z-\bar{z})_{L^2(\varOmega)} \ge 0, \quad \forall z \in Z_{ad} {}. \end{aligned} $$
(27c)

Notice that (27) is a coupled system: \(\bar {u}\) in (27a) depends on the unknown optimal control \(\bar {z}\), which fulfills the inequality (27c); the latter depends on the adjoint variable \(\bar {p}\), which solves the adjoint equation (27b); this in turn depends on \(\bar {u}\). We further remark that the variational inequality (27c) for the control is equivalent to the following projection formula (see [81]):

$$\displaystyle \begin{aligned} \begin{aligned} \bar{z}(x) = \mathcal{P}_{Z_{ad}} \left\{ -\frac{1}{\lambda} \bar{p}(x) \right\} \quad \mbox{a.e. } x \in \varOmega . \end{aligned} \end{aligned} $$
(28)

Here \(\mathcal {P}_{Z_{ad}}(v)\) denotes the projection of v onto Zad. In our case, we can write it explicitly as \(\mathcal {P}_{Z_{ad}}(v)(x) := \min \left \{z_b(x), \max \left \{ z_a(x) , v(x)\right \} \right \}\) for a.e. x ∈ Ω.
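
At the discrete level, the projection formula (28) suggests the fixed-point iteration z ↦ PZad(−p(z)∕λ), which the following MATLAB sketch implements for the model problem used earlier (our own illustration; the iteration is a contraction when ∥S*S∥∕λ < 1, which holds for the parameters below):

```matlab
% Fixed-point iteration on the projection formula (28), discrete model.
n = 50;  h = 1/(n+1);  lambda = 1e-1;  e = ones(n,1);
A  = spdiags([-e 2*e -e], -1:1, n, n)/h^2;
ud = ones(n,1);  za = -0.05*ones(n,1);  zb = 0.05*ones(n,1);
P  = @(v) min(zb, max(za, v));            % projection onto Z_ad
z  = zeros(n,1);
for k = 1:500
  u = A\z;  p = A'\(u - ud);              % state and adjoint solves
  znew = P(-p/lambda);                    % projection formula (28)
  if norm(znew - z, inf) < 1e-12, break; end
  z = znew;
end
```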

6.4 Lagrange Method

An alternative approach to derive the first-order optimality system is to use the Lagrangian, as described in Section 3.2. We again emphasize that even though this approach is used formally in the following, it can be made rigorous. It provides a systematic way of deriving the optimality system, especially for complicated problems, and this step is strongly recommended.

Introduce \(L : H^1_0(\varOmega ) \times Z_{ad} \times H^1_0(\varOmega ) \rightarrow \mathbb {R}\), defined as

$$\displaystyle \begin{aligned} L(u,z,p) := J(u,z) - \int_\varOmega \left( A \nabla u \cdot \nabla p - z p \right) \ dx .\end{aligned} $$

If \((\bar u, \bar z, \bar p)\) is a stationary point then

$$\displaystyle \begin{aligned} \begin{aligned} \langle L_p(\bar u, \bar z, \bar p) , h\rangle_{H^{-1}(\varOmega), H^1_0(\varOmega)} &= 0 \quad \forall h \in H^1_0(\varOmega) , \\ \langle L_u(\bar u, \bar z, \bar p) , h\rangle_{H^{-1}(\varOmega), H^1_0(\varOmega)} &= 0 \quad \forall h \in H^1_0(\varOmega) , \\ \left( L_z(\bar u, \bar z, \bar p) , (z - \bar z) \right)_{L^2(\varOmega)} &\ge 0 \quad \forall z \in Z_{ad}. \end{aligned}\end{aligned} $$
(29)

It is not hard to see that (29) leads to the same optimality system as in (27).

7 Semilinear Quadratic PDE-Constrained Optimization Problem

The focus of the previous section was on the linear quadratic PDECO problem. However, things are more delicate when we replace the linear PDE constraint by a semilinear one. In this section, we provide a brief discussion of a PDECO problem governed by a semilinear PDE.

Let \(\varOmega \subset \mathbb {R}^n\), with n ≥ 2, be a Lipschitz domain and let Assumption 3 hold. Moreover, let ud ∈ L2(Ω), za, zb ∈ L∞(Ω) with za(x) < zb(x) for a.e. x ∈ Ω, and λ ≥ 0 be given. We then consider the following semilinear optimal control problem:

$$\displaystyle \begin{aligned} \min J(u,z) := \frac{1}{2} \|u-u_d\|{}^2_{L^2(\varOmega)} + \frac{\lambda}{2}\|z\|{}_{L^2(\varOmega)}^2\end{aligned} $$
(30)

subject to \(u \in L^\infty (\varOmega ) \cap H^1_0(\varOmega )\) solving the weak form of

$$\displaystyle \begin{aligned} \begin{aligned} -\mbox{div } (A \nabla u) + f(\cdot,u) &= z \quad \mbox{in } \varOmega \\ u &= 0 \quad \mbox{on } \partial\varOmega \end{aligned}\end{aligned} $$
(31)

and

$$\displaystyle \begin{aligned} z \in Z_{ad} := \{ v \in L^\infty(\varOmega) : z_a(x) \le v(x) \le z_b(x) \ , \ a.e. \ x \in \varOmega \}. \end{aligned} $$
(32)

Notice the difference between the semilinear state equation (31) and the linear state equation (22): the key difficulty in the above problem is due to the nonlinearity introduced by f. The control bounds fulfill za, zb ∈ L∞(Ω). This differs from the linear case, where we assumed the bounds to be in L2(Ω). This choice enforces the control z ∈ L∞(Ω), which in turn provides additional regularity for the state u, as we discuss next.

In order to establish existence of a solution to the state equation (31), we make certain assumptions on the nonlinearity f.

Assumption 6

For a function \(f:\varOmega \times \mathbb {R}\to \mathbb {R}\) we consider the following assumption:

$$\displaystyle \begin{aligned} \begin{cases} f(x,\cdot) \mathit{\text{ is strictly increasing}}&\mathit{\text{ for a.e. }} x\in\varOmega,\\ f(x,0)=0 &\mathit{\text{ for a.e. }} x\in \varOmega,\\ f(x,\cdot) \mathit{\text{ is continuous }}\;&\mathit{\text{ for a.e. }} x\in \varOmega,\\ f(\cdot,t) \mathit{\text{ is measurable }}&\mathit{\mbox{ for all }} t\in\mathbb{R},\\ \lim_{t\to\infty}f(x,t)=\infty &\mathit{\text{ for a.e. }} x\in \varOmega. \end{cases}\end{aligned} $$

Remark 5

The condition f(x, 0) = 0 in Assumption 6 is not a restriction. If this condition on f cannot be verified, then it is enough to rewrite Equation (31) in Ω as

$$\displaystyle \begin{aligned} -\mbox{div } (A \nabla u) + f(\cdot,u) - f(\cdot,0) = z - f(\cdot,0) \quad \mbox{in } \varOmega .\end{aligned} $$

A typical example of f that fulfills Assumption 6 is given next (cf. [10, 11]).

Example 3

Let q ∈ [1, ∞) and let b : Ω → (0, ∞) be a function in L∞(Ω), that is, b(x) > 0 for a.e. x ∈ Ω. Define the function \(f:\varOmega \times \mathbb R\to \mathbb R\) by f(x, t) = b(x)|t|q−1t.

Theorem 7 (Existence and Uniqueness for Semilinear PDE) 

Let Assumptions 3 and 6 hold. Then for every z ∈ Lp(Ω) with \(p > \frac {n}{2}\), there exists a unique weak solution \(u \in L^\infty (\varOmega ) \cap H^1_0(\varOmega )\) to the state equation (31), and there exists a constant C = C(α, β, Ω) > 0 such that

$$\displaystyle \begin{aligned} \|u\|{}_{H^1(\varOmega)} + \|u\|{}_{L^\infty(\varOmega)}\le C\|z\|{}_{L^p(\varOmega)}.\end{aligned} $$
(33)

Proof 

The existence of a solution follows from the Browder–Minty theorem (cf. [11, Proposition 3.2] after setting s = 1). On the other hand, the L∞(Ω) regularity follows from a technique of Stampacchia [11, Theorem 3.5]; see also [2, 21].
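
To give a flavor of Step 1 of Algorithm 1 in the semilinear setting, the following MATLAB sketch solves a finite-difference discretization of (31) with f(x, u) = u³ (Example 3 with b ≡ 1 and q = 3) by Newton's method (our own illustration; the Jacobian eu(u, z) is A + diag(3u²)):

```matlab
% Newton iteration for a discretized semilinear equation A*u + u.^3 = z.
n = 50;  h = 1/(n+1);  e = ones(n,1);
A = spdiags([-e 2*e -e], -1:1, n, n)/h^2; % 1D Dirichlet Laplacian
z = 10*ones(n,1);  u = zeros(n,1);        % right-hand side, initial guess
for k = 1:20
  F  = A*u + u.^3 - z;                    % residual of the state equation
  dF = A + spdiags(3*u.^2, 0, n, n);      % Jacobian e_u(u,z)
  du = -(dF\F);  u = u + du;
  if norm(du, inf) < 1e-12, break; end
end
```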

For the above minimization problem, we have (cf. Section 3)

$$\displaystyle \begin{aligned} U = H^1_0(\varOmega) \cap L^\infty(\varOmega), \quad Y = H^{-1}(\varOmega), \quad Z = L^p(\varOmega) \quad \mbox{ with } p > n/2 .\end{aligned} $$

Notice that

$$\displaystyle \begin{aligned} Z_{ad} \subset L^\infty(\varOmega) \subset Z . \end{aligned}$$

As a result, owing to Theorem 7, the control-to-state map is well defined:

$$\displaystyle \begin{aligned} S : L^\infty(\varOmega) \rightarrow L^\infty(\varOmega) \cap H^1_0(\varOmega) . \end{aligned}$$

We notice that S is also well defined as a map from Z to \(L^\infty (\varOmega ) \cap H^1_0(\varOmega )\). Using S, we can write the reduced problem as

$$\displaystyle \begin{aligned} \min_{z\in Z_{ad}} \mathcal{J}(z) := J(S(z),z). \end{aligned} $$
(34)

In order to show existence of a solution to (34), f is typically assumed to be locally Lipschitz continuous in the second argument so as to fulfill the assumptions on S in Corollary 1. This assumption was recently weakened in [11] and replaced by the following growth condition on f: there exists a constant c ∈ (0, 1] such that

$$\displaystyle \begin{aligned} c|f(x,\xi-\eta)|\le |f(x,\xi)-f(x,\eta)| \end{aligned} $$
(35)

for a.e. x ∈ Ω and for all \(\xi ,\eta \in \mathbb {R}\). Such a growth condition is fulfilled by Example 3 (cf. [11]). Under this condition, we have the following existence result for (34).

Corollary 4

Let the assumptions of Theorem 7 hold. In addition, let f fulfill the growth condition (35) and assume that

$$\displaystyle \begin{aligned} f(\cdot,w(\cdot))\in L^2(\varOmega)\;\mathit{\mbox{ for every }} w\in L^\infty(\varOmega). \end{aligned} $$
(36)

Then there exists a solution to (34).

Proof 

Similar to Corollary 3, the proof is again a consequence of Theorem 1. We interpret Zad as a subset of Z, which is a reflexive Banach space. However, care must be taken to show the weak lower semicontinuity of \(\mathcal {J}\): one has to carefully study the convergence of the state sequence \(\{S(z_n)\}_{n\in \mathbb {N}}\). See for instance [11, Theorem 4.2].

Notice that the condition (36) is also fulfilled by Example 3.

Remark 6

We mention that all the results given in Corollary 4 remain true if one replaces the growth conditions (35) and (36) with the following local Lipschitz continuity condition: for all M > 0, there exists a constant LM > 0 such that f satisfies

$$\displaystyle \begin{aligned} |f(x,\xi)-f(x,\eta)|\leq L_{M}|\xi-\eta| \end{aligned} $$
(37)

for a.e. x ∈ Ω and \(\xi ,\eta \in \mathbb {R}\) with |η|, |ξ|≤ M.

For the remainder of this section, we assume that λ > 0. Before we proceed to derive the optimality conditions, we need some additional assumptions on f. Notice that the second-order derivatives are needed only if one is interested in studying the second-order sufficient conditions.

Assumption 8

We assume the following:

  (i) The function f(x, ⋅) is k-times continuously differentiable, with k = 1, 2, for a.e. x ∈ Ω.

  (ii) For all M > 0, there exists a constant LM > 0 such that f satisfies (37) and

    $$\displaystyle \begin{aligned} \left| D^k_u f (x,\xi) - D^k_u f (x,\eta)\right| \le L_{M}|\xi-\eta| , \quad k = 1,2 ,\end{aligned} $$

    for a.e. x ∈ Ω and \(\xi ,\eta \in \mathbb {R}\) with |ξ|, |η| ≤ M. Here Du denotes the partial derivative with respect to the second component.

  (iii) Du f(⋅, 0) ∈ L∞(Ω).

  (iv) Duu f(⋅, u(⋅)) ∈ L∞(Ω) whenever u(⋅) ∈ L∞(Ω).

Assumption 8 helps us prove that \(S : L^\infty (\varOmega ) \rightarrow L^\infty (\varOmega ) \cap H^1_0(\varOmega )\) is twice continuously Fréchet differentiable (using the implicit function theorem) and, in turn, that \(\mathcal {J}\) is twice continuously Fréchet differentiable.

By invoking Corollary 2, the first-order necessary optimality conditions are given as follows: for every solution \(\bar {z}\) of the problem (34), there exists a unique optimal state \(\bar {u}=S(\bar z)\) and an optimal adjoint state \(\bar {p}\) such that

$$\displaystyle \begin{aligned} \begin{aligned} \bar{u} \in L^\infty(\varOmega)\cap H^1_0(\varOmega): & \ \int_\varOmega A \nabla \bar{u} \cdot \nabla v\ dx {+} \int_\varOmega f(x,\bar{u}) v \ dx\\ & \qquad \qquad \qquad \qquad {=} \int_\varOmega \bar{z} v \ dx\quad \forall v \in H^1_0(\varOmega) \\ \bar{p} \in H^1_0(\varOmega): & \ \int_\varOmega A \nabla \bar{p} \cdot \nabla v \ dx {+} \int_\varOmega D_u f(x,\bar{u}) \bar{p} \ dx \\ & \qquad \qquad \qquad \qquad {=} \int_\varOmega (\bar u {-} u_d) v \ dx\quad \forall v \in H^1_0(\varOmega) \\ \bar{z} \in Z_{ad}: & \ (\bar{p} {+} \lambda \bar{z},z{-}\bar{z})_{L^2(\varOmega)} \ge 0, \quad \forall z \in Z_{ad}. \end{aligned}\end{aligned} $$
(38)

Alternatively, one can use the Lagrangian approach of Section 6.4 to derive (38).

Notice that the variational inequality in (38) can again be written using the projection formula as

$$\displaystyle \begin{aligned} \bar{z}(x) = \mathcal{P}_{Z_{ad}} \left\{ -\frac{1}{\lambda} \bar{p}(x) \right\} \quad \mbox{a.e. } x \in \varOmega .\end{aligned} $$
(39)

We further remark that, since \(\mathcal {J}\) is in general non-convex due to the semilinear state equation, we cannot expect a unique global solution to the PDECO problem but only local ones. Local uniqueness can be shown by studying second-order sufficient conditions. Nevertheless, care must be taken to prove such a result: the penalty term on the control in the cost functional is in L2(Ω), whereas the constraints in Zad are in \(L^\infty(\varOmega)\). This leads to the so-called \(L^2(\varOmega)-L^\infty(\varOmega)\) norm discrepancy, which should be taken into account before considering second-order sufficient conditions. We refer to [81, Theorem 4.29] for details. We further remark that the second-order sufficient conditions are also a useful tool for deriving discretization error estimates [13]; a further discussion is provided in Theorem 12.
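
To make the optimality system (38) and the projection formula (39) concrete, the following minimal sketch solves a one-dimensional model instance by a projected gradient iteration: each sweep solves the semilinear state equation by Newton's method, then the linear adjoint equation, and finally projects the updated control. Everything concrete here is an illustrative assumption, not prescribed by the text: Ω = (0, 1) discretized by finite differences, A = I, f(x, u) = u³, u_d(x) = sin(πx), and the chosen values of λ, z_a, z_b, and the step size.

import numpy as np

# Illustrative 1D model of (38)-(39): A = -d^2/dx^2 via finite differences,
# f(x,u) = u^3, u_d = sin(pi x); lam, za, zb, step are sample values.
n, lam, za, zb, step = 99, 1e-2, -5.0, 5.0, 1.0
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
L = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2        # Dirichlet Laplacian
ud = np.sin(np.pi * x)

def solve_state(z):
    # Newton's method for the state equation  L u + u^3 = z.
    u = np.zeros(n)
    for _ in range(50):
        res = L @ u + u**3 - z
        if np.linalg.norm(res) < 1e-12:
            break
        u -= np.linalg.solve(L + np.diag(3.0 * u**2), res)
    return u

def solve_adjoint(u):
    # Linear adjoint equation  L p + D_u f(x,u) p = u - u_d  (L symmetric).
    return np.linalg.solve(L + np.diag(3.0 * u**2), u - ud)

z = np.zeros(n)
for k in range(200):
    u = solve_state(z)
    p = solve_adjoint(u)
    z_new = np.clip(z - step * (lam * z + p), za, zb)  # projection as in (39)
    if np.linalg.norm(z_new - z) < 1e-10:
        break
    z = z_new
print("projected-gradient iterations:", k + 1)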

8 Discrete Optimal Control Problem

We will illustrate the main ideas of the finite-element approximation of PDE-constrained optimization problems in the case of a linear elliptic problem with control constraints (21)–(23). First, we briefly review the basics of the finite-element discretization for the state equation alone. For what follows, it is sufficient to take f ∈ L2(Ω). We consider the weak form of Equation (19),

$$\displaystyle \begin{aligned} (A\nabla u, \nabla v)_{L^2(\varOmega)}=(f,v)_{L^2(\varOmega)}\quad \forall v\in H^1_0(\varOmega). \end{aligned}$$

We partition the domain Ω into elements. For simplicity, we only discuss the case when the elements are simplices. For h ∈ (0, h0], h0 > 0, let \(\mathcal {T}_h\) denote a quasi-uniform triangulation of Ω with mesh size h, i.e., \(\mathcal {T}_h = \{\tau \}\) is a partition of Ω into triangles or tetrahedra τ of diameter hτ such that, for h = maxτ hτ,

$$\displaystyle \begin{aligned}\operatorname{diam}(\tau)\le h \le C |\tau|{}^{\frac{1}{n}}, \quad \forall \tau\in \mathcal{T}_h, \end{aligned}$$

where the constant C > 0 is independent of h. For simplicity, we assume ∪ τ = Ω. Let Vh be the set of all functions in \(H^1_0(\varOmega )\) that are continuous on Ω and linear on each τ. The space Vh is usually called the space of conforming piecewise linear Lagrange elements.

Now we define the finite-element Galerkin approximation uh ∈ Vh of (8) as the unique solution of

$$\displaystyle \begin{aligned} (A\nabla u_h, \nabla v_h)_{L^2(\varOmega)}=(f,v_h)_{L^2(\varOmega)}\quad \forall v_h\in V_h.\end{aligned} $$
(40)
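
For concreteness, here is a minimal sketch of assembling and solving (40) with conforming P1 elements, under purely illustrative assumptions: Ω = (0, 1), A = I, and a right-hand side chosen so that the exact solution is sin(πx); the mesh size and the midpoint quadrature for the load vector are likewise sample choices, not prescribed by the text.

import numpy as np

# Illustrative P1 Galerkin solve of (40) on Omega = (0,1) with A = I.
m = 64                                       # number of cells, so h = 1/m
h = 1.0 / m
nodes = np.linspace(0.0, 1.0, m + 1)
f = lambda x: np.pi**2 * np.sin(np.pi * x)   # manufactured right-hand side

K = np.zeros((m - 1, m - 1))                 # stiffness matrix, interior dofs
b = np.zeros(m - 1)                          # load vector
for tau in range(m):                         # loop over elements [x_tau, x_{tau+1}]
    ke = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h   # local stiffness
    xm = 0.5 * (nodes[tau] + nodes[tau + 1])
    fe = f(xm) * h / 2.0 * np.ones(2)        # midpoint rule for (f, v_h)
    for a, i in enumerate((tau - 1, tau)):   # local -> global interior dof
        if 0 <= i < m - 1:
            b[i] += fe[a]
            for c, j in enumerate((tau - 1, tau)):
                if 0 <= j < m - 1:
                    K[i, j] += ke[a, c]

u_h = np.linalg.solve(K, b)                  # Galerkin solution (nodal values)
print("max nodal error:", np.max(np.abs(u_h - np.sin(np.pi * nodes[1:-1]))))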

Expanding uh in terms of basis functions, it is easy to see that (40) is equivalent to a system of linear equations, and since \(V_h\subset H^1_0(\varOmega )\), the resulting matrix is nonsingular. Notice that by construction

$$\displaystyle \begin{aligned} (A\nabla (u-u_h), \nabla v_h)_{L^2(\varOmega)}=0\quad \forall v_h\in V_h.\end{aligned} $$
(41)

Thus, the Galerkin solution uh is the orthogonal projection of u onto Vh with respect to the inner product \((A\nabla \cdot ,\nabla \cdot )_{L^2(\varOmega )}\). Almost immediately we obtain the following key result.

Lemma 1 (Céa Lemma)

Let u and u h satisfy (41). Then the following estimate holds

$$\displaystyle \begin{aligned}\|u-u_h\|{}_{H^1(\varOmega)}\le C\min_{\chi\in V_h}\|u-\chi\|{}_{H^1(\varOmega)}.\end{aligned} $$

The constant C depends only on the ellipticity and boundedness constants of the matrix A and on the domain Ω.

The above result says that the Galerkin solution is, up to a constant, the best approximation to u from Vh in the H1(Ω)-norm. We can use Céa's lemma to derive a priori error estimates. Let Ih : H1(Ω) → Vh be a projection with the approximation properties

$$\displaystyle \begin{aligned} \|u-I_hu\|{}_{H^s(\varOmega)}\le Ch^{2-s}\|u\|{}_{H^2(\varOmega)}, \quad s=0,1,\end{aligned} $$
(42)

then from Céa's lemma it immediately follows that

$$\displaystyle \begin{aligned}\|u-u_h\|{}_{H^1(\varOmega)}\le Ch\|u\|{}_{H^2(\varOmega)}.\end{aligned} $$

Notice that the constant C > 0 above is independent of h. The error estimates in the L2(Ω)-norm are not immediate, since the Galerkin solution does not have the property of being the best approximation in the L2(Ω)-norm; nevertheless, one can still establish optimal error estimates in L2(Ω) via a duality argument, also known as Nitsche's trick. This result requires H2(Ω)-regularity.

Lemma 2

Let Ω be convex or \(C^{1,1}\), and let u and uh satisfy (41). Then there exists a constant C independent of h such that

$$\displaystyle \begin{aligned}\|u-u_h\|{}_{L^2(\varOmega)}\le Ch\min_{\chi\in V_h}\|u-\chi\|{}_{H^1(\varOmega)}.\end{aligned} $$

Proof

Let e = u − uh and consider the dual problem

$$\displaystyle \begin{aligned}(A\nabla w, \nabla v)_{L^2(\varOmega)}=(e,v)_{L^2(\varOmega)}\quad \forall v\in H^1_0(\varOmega) .\end{aligned} $$

Setting v = e, we obtain

$$\displaystyle \begin{aligned}\|e\|{}_{L^2(\varOmega)}^2 = (A\nabla w, \nabla e)_{L^2(\varOmega)} = (A\nabla (w-w_h), \nabla e)_{L^2(\varOmega)},\end{aligned} $$

where the last equality is due to (41), and where wh ∈ Vh can be taken to be the interpolant Ihw of w. Using the Cauchy–Schwarz inequality, the approximation property (42), and the fact that, under the given regularity of the domain, \(\|w\|{ }_{H^2(\varOmega )} \le C \|e\|{ }_{L^2(\varOmega )}\), we obtain \(\|e\|{}_{L^2(\varOmega)}^2 \le Ch \|e\|{}_{L^2(\varOmega)} \|e\|{}_{H^1(\varOmega)}\). Combining this with Céa's lemma yields the required result.

Combining Céa's lemma and Lemma 2, we immediately establish the optimal a priori error estimate

$$\displaystyle \begin{aligned} \|u-u_h\|{}_{L^2(\varOmega)}\le Ch^2\|u\|{}_{H^2(\varOmega)}. \end{aligned}$$

Notice that the above estimate does require the convexity (or \(C^{1,1}\) regularity) of Ω.

Corollary 5

H2 regularity also allows us to express the error in terms of the data. Thus, from Lemmas 1 and 2 it follows that

$$\displaystyle \begin{aligned}\|u-u_h\|{}_{L^2(\varOmega)}+h\|u-u_h\|{}_{H^1(\varOmega)}\le Ch^2\|f\|{}_{L^2(\varOmega)}. \end{aligned}$$
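
The quadratic rate in the last estimate is easy to observe numerically. Below is a minimal sketch, assuming the model problem −u″ = f on (0, 1) with exact solution sin(πx), P1 elements on uniform meshes, and a lumped load vector (all illustrative choices); the printed discrete L²-errors should decrease by a factor of about four per refinement.

import numpy as np

# Illustrative check of the O(h^2) L^2-estimate for -u'' = f on (0,1),
# exact solution u(x) = sin(pi x), P1 elements on a uniform mesh.
for m in (16, 32, 64, 128):
    h = 1.0 / m
    x = np.linspace(h, 1.0 - h, m - 1)               # interior nodes
    K = (np.diag(2.0 * np.ones(m - 1)) - np.diag(np.ones(m - 2), 1)
         - np.diag(np.ones(m - 2), -1)) / h          # P1 stiffness
    b = h * np.pi**2 * np.sin(np.pi * x)             # lumped load vector
    e = np.linalg.solve(K, b) - np.sin(np.pi * x)    # nodal error
    print(m, np.sqrt(h * e @ e))                     # discrete L^2 norm ~ h^2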

8.1 Discrete Linear Quadratic PDE-Constrained Optimization Problem

For the remainder of the chapter, we will assume that λ > 0. To discretize the problem, we replace the state space \(H^1_0(\varOmega )\) with \(V_h\subset H^1_0(\varOmega )\) and the space of admissible controls Zad with Zad,h ⊂ Zad. In the case of unconstrained controls, we can choose Zad,h = Vh. In principle, the mesh for the discretization of the state variable and the mesh for the discretization of the control can be different. However, having two different meshes adds technical difficulties to the implementation. For this reason, it is more convenient to work with the same mesh, which is what we assume from now on. Thus, the discretized problem (21)–(23) becomes

$$\displaystyle \begin{aligned} \min_{u_h\in V_h, \; z_h\in Z_{ad,h}}J_h(u_h,z_h) = \frac{1}{2}\| u_h - u_d \|{}^2_{L^2(\varOmega)} + \frac{\lambda}{2} \|z_h\|{}^2_{L^2(\varOmega)}\end{aligned} $$
(43)

subject to

$$\displaystyle \begin{aligned} \begin{aligned} (A \nabla u_h,\nabla v_h) &= (z_h,v_h) \quad \forall v_h\in V_h . \end{aligned}\end{aligned} $$
(44)

Similar to the infinite dimensional case, we define a discrete solution operator Sh : Zad,h → Vh to (44) and the reduced discrete problem becomes

$$\displaystyle \begin{aligned} \min_{z_h\in Z_{ad,h}}\mathcal{J}_h(z_h) :=\min_{z_h\in Z_{ad,h}} J_h(S_h z_h,z_h).\end{aligned} $$
(45)

Similar to the continuous problem, one can show that problem (45) has a unique solution \(\bar {z}_h\in Z_{ad,h}\) with corresponding discrete optimal state \(\bar {u}_h = S_h(\bar {z}_h)\), and, similar to Theorem 2, the first-order necessary and sufficient optimality condition is

$$\displaystyle \begin{aligned} \mathcal{J}^{\prime}_h(\bar{z}_h)(z_h-\bar{z}_h)\geq 0, \quad \forall z_h\in Z_{ad,h}. \end{aligned} $$
(46)

8.2 Optimization Problem Without Control Constraints

In this situation, Zad = L2(Ω) and Zad,h = Vh, and as a result (25) and (46) reduce to the equalities

$$\displaystyle \begin{aligned} \bar{z}=-\frac{1}{\lambda}\bar{p},\quad \bar{z}_h=-\frac{1}{\lambda}\bar{p}_h, \end{aligned} $$
(47)

respectively, and as a result the continuous and discrete PDECO problems are equivalent to the systems of equations

$$\displaystyle \begin{aligned} \begin{aligned} \bar{u} &= S(-\frac{1}{\lambda}\bar{p})\\ \bar{p}&=S^*(\bar{u}-u_d) \end{aligned} \end{aligned}$$

and

$$\displaystyle \begin{aligned}\begin{aligned} \bar{u}_h &= S_h(-\frac{1}{\lambda}\bar{p}_h)\\ \bar{p}_h&=S_h^*(\bar{u}_h-u_d). \end{aligned} \end{aligned}$$
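
A minimal sketch of how this discrete system can be solved all at once: eliminating \(\bar{z}_h = -\frac{1}{\lambda}\bar{p}_h\) couples the two Galerkin equations into a single linear system in \((\bar{u}_h, \bar{p}_h)\). The setting below (1D P1 elements on (0, 1), A = I, and the chosen values of λ and u_d) is an illustrative assumption, not part of the text.

import numpy as np

# Illustrative coupled solve of the discrete optimality system:
#   K u = M z,  K p = M (u - u_d),  z = -(1/lam) p,
# assembled with P1 stiffness K and mass M on a uniform mesh of (0,1).
m, lam = 64, 1e-3
h = 1.0 / m
x = np.linspace(h, 1.0 - h, m - 1)
I = np.eye(m - 1)
K = (2.0 * I - np.eye(m - 1, k=1) - np.eye(m - 1, k=-1)) / h        # stiffness
M = h / 6.0 * (4.0 * I + np.eye(m - 1, k=1) + np.eye(m - 1, k=-1))  # mass
ud = np.where((x > 0.25) & (x < 0.75), 1.0, 0.0)                    # sample target

# Eliminating z gives the block system
#   [ K      (1/lam) M ] [u]   [   0    ]
#   [ -M     K         ] [p] = [ -M u_d ].
A_blk = np.block([[K, M / lam], [-M, K]])
rhs = np.concatenate([np.zeros(m - 1), -M @ ud])
sol = np.linalg.solve(A_blk, rhs)
u_h, p_h = sol[:m - 1], sol[m - 1:]
z_h = -p_h / lam                            # recover the optimal control, cf. (47)
print("tracking error:", np.sqrt(h) * np.linalg.norm(u_h - ud))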

As a result, \(\bar {z},\bar {u},\bar {p}\in H^2(\varOmega )\), and we can expect second-order convergence for the optimal control in the L2(Ω)-norm. Indeed, one can establish the following result.

Theorem 9

Let \(\bar {z}\) and \(\bar {z}_h\) be optimal solutions to the continuous and discrete PDECO problems (21) and (43), respectively, without control constraints. Assume in addition that Ω is convex or \(C^{1,1}\). Then there exists a constant C independent of h such that

$$\displaystyle \begin{aligned} \| \bar{z} - \bar{z}_h\|{}_{L^2(\varOmega)} \le Ch^2\left(\|\bar{z}\|{}_{L^2(\varOmega)}+\|u_d\|{}_{L^2(\varOmega)}\right). \end{aligned}$$

Proof

We begin by recalling the unconstrained optimality conditions \(\lambda \bar {z} + \bar {p} = 0 = \lambda \bar {z}_h + \bar {p}_h\), which yields \( (\lambda \bar {z} + \bar {p}, \bar {z}_h-\bar {z}) = 0 = (\lambda \bar {z}_h + \bar {p}_h, \bar {z}-\bar {z}_h) . \) Adding these last two equalities, we arrive at

$$\displaystyle \begin{aligned} \lambda \|\bar{z}-\bar{z}_h\|{}^2_{L^2(\varOmega)} = (\bar{p}-\bar{p}_h, \bar{z}_h - \bar{z})_{L^2(\varOmega)} = (S^*(S\bar{z}-u_d) - S_h^*(S_h\bar{z}_h - u_d) ,\bar{z}_h - \bar{z})_{L^2(\varOmega)}, \end{aligned} $$
(48)

where in the last equality we have used the representation of \(\bar {p}\) and \(\bar {p}_h\). Upon rewriting (48), we obtain

$$\displaystyle \begin{aligned} \lambda \|\bar{z}-\bar{z}_h\|{}^2_{L^2(\varOmega)} = (S^*S\bar{z} - S_h^*S_h\bar{z}_h ,\bar{z}_h - \bar{z})_{L^2(\varOmega)} +((S^*- S_h^*)u_d ,\bar{z}_h - \bar{z})_{L^2(\varOmega)} = I + II. \end{aligned} $$
(49)

It follows from Corollary 5 that

$$\displaystyle \begin{aligned} |II| \le C h^2 \|u_d\|{}_{L^2(\varOmega)} \|\bar{z}-\bar{z}_h\|{}_{L^2(\varOmega)}. \end{aligned} $$
(50)

It then remains to estimate I in (49). We add and subtract \((S_h^* S_h \bar {z},\bar {z}_h - \bar {z})_{L^2(\varOmega )}\) to I and arrive at

$$\displaystyle \begin{aligned} I &= ( (S^*S - S_h^*S_h)\bar{z} ,\bar{z}_h - \bar{z})_{L^2(\varOmega)} + ( S_h(\bar{z} - \bar{z}_h) ,S_h(\bar{z}_h - \bar{z}))_{L^2(\varOmega)} \\ &\le ( (S^*S - S_h^*S_h)\bar{z} ,\bar{z}_h - \bar{z})_{L^2(\varOmega)}, \end{aligned} $$
(51)

where we have used the fact that \(( S_h(\bar {z} {-} \bar {z}_h) ,S_h(\bar {z}_h {-} \bar {z}))_{L^2(\varOmega )} = {-} \|S_h(\bar {z}{-}\bar {z}_h) \|{ }^2 \le 0\). Again adding and subtracting \((S_h^*S\bar {z},\bar {z}_h - \bar {z})_{L^2(\varOmega )}\) in (51), we arrive at

$$\displaystyle \begin{aligned} |I| &\le |( (S^*-S_h^*)S\bar{z} ,\bar{z}_h - \bar{z})_{L^2(\varOmega)} + ( S_h^*(S-S_h)\bar{z} ,\bar{z}_h - \bar{z})_{L^2(\varOmega)}| \\ &\le C h^2 \|\bar{z}\|{}_{L^2(\varOmega)} \|\bar{z}-\bar{z}_h\|{}_{L^2(\varOmega)}, \end{aligned} $$
(52)

where we have first used the triangle inequality and then estimated the first term using Corollary 5 and the continuity of S : L2(Ω) → L2(Ω): \(|( (S^*-S_h^*)S\bar {z} ,\bar {z}_h - \bar {z})_{L^2(\varOmega )}| \le C h^2 \|S\bar {z}\|{ }_{L^2(\varOmega )} \|\bar {z}-\bar {z}_h\|{ }_{L^2(\varOmega )} \le C h^2 \|\bar {z}\|{ }_{L^2(\varOmega )} \|\bar {z}-\bar {z}_h\|{ }_{L^2(\varOmega )}\). The estimate of the remaining term follows again from Corollary 5 and the continuity of \(S_h^* : L^2(\varOmega ) \rightarrow L^2(\varOmega )\). Finally, substituting the estimates of I and II from (52) and (50) into (49), we arrive at the asserted result.

8.3 Optimization Problem with Control Constraints

For the rest of this section, we assume constant box constraints, i.e., \(z_a,z_b\in \mathbb {R}\) with za < zb. We remind the reader that in this situation the optimal control is given by the projection formula:

$$\displaystyle \begin{aligned} \bar{z}(x) = \mathcal{P}_{Z_{ad}} \left\{ -\frac{1}{\lambda} \bar{p}(x) \right\}. \end{aligned} $$
(53)

If the constraints are active, then in general \(\bar {z}\notin H^2(\varOmega )\). However, we can still conclude that \(\bar {z}\in H^1(\varOmega )\), and even \(\bar {z}\in W^1_\infty (\varOmega )\), by using [58, Theorem A.1]. In light of this, second-order convergence cannot in general be expected. There are several approaches to treat the problem.

8.3.1 Cell-Wise Constant Control Discretization

One idea is to consider a cell-wise constant discretization of the control variable, i.e., we choose \(Z_{ad,h}=Z_{ad}\cap Z_h^0\), where \(Z_h^0\) is the space of piecewise constant functions on the partition \(\mathcal {T}_h\). This idea goes back to Falk [35]. Since we consider a piecewise constant discretization, only first-order convergence for the control can be expected. Indeed, for such a discretization one can establish the following convergence result.

Theorem 10

Let \(\bar {z}\) and \(\bar {z}_h\) be optimal solutions to the continuous and discrete PDECO problems (21) and (43), respectively, with control constraints (23). Let \(Z_{ad,h}=Z_{ad}\cap Z_h^0\). Assume in addition that Ω is convex or \(C^{1,1}\). Then there exists a constant C independent of h such that

$$\displaystyle \begin{aligned}\| \bar{z} - \bar{z}_h\|{}_{L^2(\varOmega)} \le Ch\left(\|\bar{z}\|{}_{H^1(\varOmega)}+\|u_d\|{}_{L^2(\varOmega)}\right). \end{aligned}$$

Proof

First, we define the projection \(\pi _h:Z_{ad}\to Z_{ad}\cap Z_h^0\) by

$$\displaystyle \begin{aligned} \pi_h v\mid_\tau =\frac{1}{|\tau|}\int_\tau v\ dx,\quad \forall\tau\in \mathcal{T}_h. \end{aligned} $$
(54)

Thus the projection πh is the orthogonal projection onto \(Z_h^0\) with respect to the L2-inner product, i.e.,

$$\displaystyle \begin{aligned} (v - \pi_h{v},w)_{L^2(\varOmega)} =0, \quad w\in Z_h^0 \end{aligned} $$
(55)

and has the following approximation property

$$\displaystyle \begin{aligned} \| v - \pi_h{v}\|{}_{L^2(\varOmega)} \le Ch\| \nabla v\|{}_{L^2(\varOmega)}, \quad v\in H^1(\varOmega). \end{aligned} $$
(56)

Then replacing z by \(\bar {z}_h\) in (25) and zh by \(\pi _h \bar {z}\) in (46), we arrive at

$$\displaystyle \begin{aligned} (\lambda\bar{z}+\bar{p}, \bar{z}_h - \bar{z})_{L^2(\varOmega)} \ge 0, \quad (\lambda\bar{z}_h+\bar{p}_h, \pi_h \bar{z} - \bar{z}_h)_{L^2(\varOmega)} \ge 0 . \end{aligned}$$

Adding these inequalities, we obtain that

$$\displaystyle \begin{aligned} \lambda \|\bar{z}-\bar{z}_h\|{}^2_{L^2(\varOmega)} &\le (\bar{p}-\bar{p}_h, \bar{z}_h-\bar{z})_{L^2(\varOmega)} + (\lambda \bar{z}_h + \bar{p}_h, \pi_h\bar{z}-\bar{z})_{L^2(\varOmega)} = I+II. \end{aligned} $$
(57)

The estimate of I is exactly the same as in Theorem 9, i.e.,

$$\displaystyle \begin{aligned} |I| &\le C h^2 (\|\bar{z}\|{}_{L^2(\varOmega)} +\|u_d\|{}_{L^2(\varOmega)}) \|\bar{z}-\bar{z}_h\|{}_{L^2(\varOmega)} \\ &\le C_\lambda h^4 (\|\bar{z}\|{}_{L^2(\varOmega)} +\|u_d\|{}_{L^2(\varOmega)})^2 + \frac{\lambda}{4} \|\bar{z}-\bar{z}_h\|{}_{L^2(\varOmega)}^2 , \end{aligned} $$
(58)

where in addition we have used Young's inequality. Next, we estimate II. Using the characterization of \(\bar {p}_h\), adding and subtracting \((S_h^*(S_h\bar {z}-u_d),\pi _h\bar {z}-\bar {z})_{L^2(\varOmega )}\), and using the continuity of \(S_h^*\), we obtain

$$\displaystyle \begin{aligned} II &= (S_h^*(S_h\bar{z}_h-u_d),\pi_h\bar{z}-\bar{z})_{L^2(\varOmega)} \\ &= (S_h^*S_h(\bar{z}_h{-}\bar{z}),\pi_h\bar{z}{-}\bar{z})_{L^2(\varOmega)} {+}(S_h^*(S_h\bar{z}{-}u_d),\pi_h\bar{z}{-}\bar{z})_{L^2(\varOmega)} =: II_1 {+} II_2 . \end{aligned} $$
(59)

To estimate II1, we use (56) and Young’s inequality to arrive at

$$\displaystyle \begin{aligned} |II_1| \le C_\lambda h^2 \|\bar{z}\|{}_{H^1(\varOmega)}^2 + \frac{\lambda}{4} \|\bar{z}-\bar{z}_h\|{}_{L^2(\varOmega)}^2 . \end{aligned} $$
(60)

It then remains to estimate II2 in (59). By adding and subtracting \((S_h^*(S\bar {z}-u_d),\pi _h\bar {z}-\bar {z})_{L^2(\varOmega )}\) in II2, we obtain that

$$\displaystyle \begin{aligned} |II_2| &= |(S_h^*(S_h-S)\bar{z},\pi_h\bar{z}-\bar{z})_{L^2(\varOmega)} + (S_h^*(S\bar{z}-u_d),\pi_h\bar{z}-\bar{z})_{L^2(\varOmega)}| \\ &\le C h^3 \|\bar{z}\|{}_{H^1(\varOmega)} + |( (S_h^*-S^*)(S\bar{z}-u_d), \pi_h\bar{z}-\bar{z})_{L^2(\varOmega)}| \\ &\quad + |( S^*(S\bar{z}-u_d), \pi_h\bar{z}-\bar{z})_{L^2(\varOmega)}| =: II_{2,1} + II_{2,2} + II_{2,3}, \end{aligned} $$
(61)

where we have used Corollary 5 and (56) to estimate the first term. Moreover, we have added and subtracted \((S^*(S\bar {z}-u_d),\pi _h\bar {z}-\bar {z})_{L^2(\varOmega )}\) in the second term. It then remains to estimate II2,2 and II2,3. Again using Corollary 5 and (56), we obtain that \(II_{2,2} \le C h^3 (\|\bar {z}\|{ }_{L^2(\varOmega )}+\|u_d\|{ }_{L^2(\varOmega )})\|\bar {z}\|{ }_{H^1(\varOmega )}\). Finally, to estimate II2,3 we first recall that \(S^*(S\bar {z}-u_d) = \bar {p}\). Since πh is the L2-orthogonal projection, we obtain

$$\displaystyle \begin{aligned} II_{2,3} = ( \bar{p}-\pi_h\bar{p}, \pi_h\bar{z}-\bar{z})_{L^2(\varOmega)} \le C h^3 (\|\bar{z}\|{}_{L^2(\varOmega)}+\|u_d\|{}_{L^2(\varOmega)}) \|\bar{z}\|{}_{H^1(\varOmega)}. \end{aligned} $$
(62)

Collecting all the estimates, we arrive at the asserted result.
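
A quick numerical illustration of the projection (54) and its approximation property (56), which drive the first-order rate above: the sketch below computes (approximate) cell averages of a Lipschitz control on successively finer uniform meshes of (0, 1). The sample control, with kinks where the box constraints become active, and the fine sampling grid are illustrative choices; the printed L²-errors decay like O(h).

import numpy as np

# Illustrative check of (56): cell averages (the projection pi_h of (54))
# of a Lipschitz control with kinks, approximated on a fine sampling grid.
z = lambda x: np.clip(np.sin(2.0 * np.pi * x), -0.5, 0.5)
for m in (16, 32, 64, 128):
    h = 1.0 / m
    xs = np.linspace(0.0, 1.0, 10 * m + 1)       # fine sampling grid
    zf = z(xs)[:-1]                              # left endpoints of subcells
    pi_z = zf.reshape(m, 10).mean(axis=1)        # approximate cell averages
    err = np.sqrt(((zf - np.repeat(pi_z, 10))**2).mean())
    print(m, err)                                # decays like O(h), cf. (56)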

Comparing this result with the unconstrained case, we have only first-order convergence. This is mainly due to the choice of the discrete control space, which does not take full advantage of the regularity of the optimal control, namely \(\bar {z}\in W^1_\infty (\varOmega )\). Moreover, away from the active constraints, \(\bar {z}\) is still in H2(Ω). Taking this into consideration, there are some alternatives that increase the order of convergence.

8.3.2 Cell-Wise Linear Control Discretization

To improve the convergence rate of the above result, we consider Zad,h = Zad ∩ Vh, i.e., the control space consists of piecewise linear functions satisfying the constraints (22). The approximation properties in this setting were investigated in a number of papers, for example [72, 74]. We will not provide all the details but only highlight the main ideas. To take advantage of the regularity of \(\bar {z}\) discussed above, we consider the following sets:

$$\displaystyle \begin{aligned} \mathcal{T}^1_h = \{\tau\in \mathcal{T}_h \mid \bar{z}\mid_\tau = z_a\ \text{or}\ \bar{z}\mid_\tau = z_b \}, \end{aligned}$$

which is the set of active cells,

$$\displaystyle \begin{aligned} \mathcal{T}^2_h = \{\tau\in \mathcal{T}_h \mid z_a<\bar{z}\mid_\tau < z_b \}, \end{aligned}$$

the set of inactive cells, and the rest

$$\displaystyle \begin{aligned}\mathcal{T}^3_h=\mathcal{T}_h\backslash(\mathcal{T}^1_h\cup\mathcal{T}^2_h). \end{aligned}$$

Then under the assumption that

$$\displaystyle \begin{aligned} meas(\mathcal{T}^3_h)\le Ch, \end{aligned} $$
(63)

which is valid, for example, if the boundary of the active set consists of a finite number of rectifiable curves, one can establish the following result.

Theorem 11

Let \(\bar {z}\) and \(\bar {z}_h\) be optimal solutions to the continuous and discrete PDECO problems (21) and (43), respectively, with control constraints. Assume in addition that Ω is convex or \(C^{1,1}\), that ud ∈ Lp(Ω) for some p > n, and that assumption (63) holds. Then there exists a constant C independent of h such that

$$\displaystyle \begin{aligned}\| \bar{z} - \bar{z}_h\|{}_{L^2(\varOmega)} \le Ch^{\frac{3}{2}}\left(\|\bar{p}\|{}_{H^2(\varOmega)}+\|\nabla \bar{z}\|{}_{L^\infty(\varOmega)}\right). \end{aligned}$$

Proof

The proof of this result can be found in [73].

Remark 7 (Variational Discretization)

The idea of the variational discretization approach introduced by Hinze [52] is to not discretize the control variable at all, i.e., to choose Zad,h = Zad. This approach does give second-order convergence for the control but requires a nonstandard implementation, especially for n > 1.

8.4 Semilinear Equations

Similar to the linear case, we discretize the problem with finite elements. Thus, we replace the state space \(H^1_0(\varOmega )\) with \(V_h\subset H^1_0(\varOmega )\) and the space of admissible controls Zad with Zad,h ⊂ Zad. We additionally assume that \(z_a,z_b\in \mathbb {R}\cup \{\pm \infty \}\) with za < zb. In the case of an unconstrained control, we take za = −∞ and zb = +∞, i.e., Zad,h = Vh. The discretized problem (30)–(32) becomes

$$\displaystyle \begin{aligned} \min_{u_h\in V_h, \; z_h\in Z_{ad,h}}J_h(u_h,z_h) = \frac{1}{2}\| u_h - u_d \|{}^2_{L^2(\varOmega)} + \frac{\lambda}{2} \|z_h\|{}^2_{L^2(\varOmega)}\end{aligned} $$
(64)

subject to

$$\displaystyle \begin{aligned} \begin{aligned} (A \nabla u_h,\nabla v_h)+(f(\cdot,u_h),v_h) &= (z_h,v_h) \quad \forall v_h\in V_h . \end{aligned}\end{aligned} $$
(65)
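
A minimal sketch of a Newton solve for the discrete semilinear state equation (65), under illustrative assumptions not prescribed by the text: 1D P1 elements on (0, 1), A = I, f(x, u) = u³ handled with a lumped mass matrix, and a fixed sample control.

import numpy as np

# Illustrative Newton iteration for (65):  K u + M_l f(u) = M_l z,
# with P1 stiffness K and lumped mass M_l = h I on a uniform mesh of (0,1).
m = 64
h = 1.0 / m
x = np.linspace(h, 1.0 - h, m - 1)
I = np.eye(m - 1)
K = (2.0 * I - np.eye(m - 1, k=1) - np.eye(m - 1, k=-1)) / h   # stiffness
Ml = h * I                                                     # lumped mass
z = 10.0 * np.sin(np.pi * x)                                   # sample control

u = np.zeros(m - 1)
for it in range(30):
    F = K @ u + Ml @ u**3 - Ml @ z          # residual of (65)
    if np.linalg.norm(F) < 1e-12:
        break
    J = K + Ml @ np.diag(3.0 * u**2)        # Jacobian: K + M_l D_u f(u)
    u -= np.linalg.solve(J, F)
print("Newton iterations:", it)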

All of the above strategies for choosing Zad,h can be applied to the semilinear problem as well, and similar error estimates can be obtained. Of course, the arguments are more technical, and we refer to [13, 22, 29] for details.

Theorem 12

Assume that Ω is a convex domain with \(C^{1,1}\) boundary and let Assumptions 6 and 8 be satisfied. Let \(\bar {z}\) be a strict local solution to (30)–(32) that fulfills the second-order optimality condition: There exist δ > 0 and τ > 0 such that

$$\displaystyle \begin{aligned} \mathcal{J}^{\prime\prime}(\bar{z}) (z,z) \ge \delta \|z\|{}_{L^2(\varOmega)}^2 \end{aligned}$$

holds for all \(z \in L^\infty(\varOmega)\) satisfying

$$\displaystyle \begin{aligned} z(x) \left\{\begin{array}{ll} \ge 0 & \mathit{\mbox{ if }} \bar{z}(x) = z_{a} , \\ \le 0 & \mathit{\mbox{ if }} \bar{z}(x) = z_{b} , \\ = 0 & \mathit{\mbox{ if }} |\lambda\bar{z}+\bar{p}| \ge \tau > 0. \end{array} \right.\end{aligned} $$
(66)

Then:

  • [13, Thm. 5.1] (Cell-wise constant control) Let \(Z_{ad,h}=Z_{ad}\cap Z_h^0\) and \(\{\bar {z}_h\}\) be a sequence of locally optimal piecewise constant solutions to (64)–(65) that converges strongly in L2(Ω) to \(\bar {z}\). Then there exists a constant C independent of h such that

    $$\displaystyle \begin{aligned}\| \bar{z} - \bar{z}_h\|{}_{L^2(\varOmega)} \le Ch.\end{aligned} $$
  • [29, Thm. 4.5] (Cell-wise linear control) Let Zad,h = Zad ∩ Vh and \(\{\bar {z}_h\}\) be a sequence of locally optimal piecewise linear solutions to (64)–(65) that converges strongly in L2(Ω) to \(\bar {z}\). If in addition (63) holds, then there exists a constant C independent of h such that

    $$\displaystyle \begin{aligned}\| \bar{z} - \bar{z}_h\|{}_{L^2(\varOmega)} \le Ch^{\frac{3}{2}}.\end{aligned} $$

9 Conclusion and the Current State of the Art

In this introductory chapter, we reviewed the main ideas behind the study of PDECO. We briefly discussed the numerical approximation of such problems by the finite-element method and presented some convergence results in the case of control constraints. However, the subject is vast, with many active research directions. In this final section, we mention some topics that we skipped and some active areas of research that we did not touch upon.

  • In the above discussion, we only touched on problems with control constraints. However, problems with state constraints are equally important. In contrast to the control constraints, which basically amount to a projection onto the feasible set, the state constraints require much more care, since the Lagrange multipliers are not functions but only measures. We refer to [81, Chapter 6] for a nice introduction to the subject. For such problems, the error analysis is also more subtle, and one often has to resort to more technical pointwise error estimates to derive optimal-order convergence. We refer to [28, 64, 65] for some recent developments on the subject.

  • All of the above error estimates were a priori error estimates. However, a large fraction of the finite-element literature is devoted to a posteriori error estimates, i.e., estimates in which the error between the true solution and the discrete approximate solution is expressed in terms of computable quantities. We refer to [75, 77] for more recent developments on the subject.

  • One can consider more complicated state equations or even systems, which can be linear or nonlinear, time dependent, variational inequalities, and so on. Currently, the theory is well developed for problems constrained by linear and semilinear elliptic equations, but research is very much active for nonlinear and time-dependent problems and for variational inequalities [61, 68].

  • We only considered a quadratic cost functional in our error analysis. However, other choices may be desirable. For example, it was observed numerically that seeking the control in the space of regular Borel measures promotes sparsity of the optimal solution, meaning that the support of the solution is small. This phenomenon was analyzed for elliptic and parabolic problems in a number of papers [23,24,25]; however, some open questions still remain.

  • In all our examples, we considered distributed controls, i.e., the control z was acting in the interior of the domain. However, problems where the control acts on the boundary are important in many applications. The control can enter through Dirichlet, Neumann, or Robin boundary conditions. Because of the variational structure of the problems, Neumann and Robin boundary conditions enter the variational form of the state equation naturally, and as a result Neumann boundary controls can be analyzed naturally; see [81, Chapter 2]. We refer to [26, 27] for the Robin case. Dirichlet boundary conditions do not have this property, and one has to use more sophisticated machinery to overcome the technical difficulties and to derive optimal error estimates for the optimal solution [12]. Alternatively, one can use penalized Robin boundary conditions to study Dirichlet boundary control problems [63].