1 Introduction

The optimal control of variational inequalities is a natural extension of PDE-constrained optimization in which the forward problem or underlying PDE is replaced by a variational inequality or convex variational problem. There are a vast number of applications in which variational inequalities or convex variational problems are used. Perhaps the most well-known disciplines are continuum mechanics [32, 52] and mathematical image processing [67, 70, 72]. However, one also finds variational inequalities in such diverse areas as petroleum engineering [85], digital microfluidics [84], and mathematical finance [51].

Just as in PDE-constrained optimization, we are interested in optimizing or controlling the solution to variational inequalities through one or several parameters. For example, in a packaging problem, we might be interested in determining the optimal force distribution to apply to an elastic membrane in order to obtain a desired shape without passing through a fixed obstacle. In the petroleum engineering application mentioned above, one obtains a variational inequality when using the variational formulation for the steady laminar flow of drill mud modelled as a Bingham fluid. Here, one might wish to determine the optimal pressure needed to effect a desired flow regime.

Due to a lack of Fréchet (or even Gâteaux) differentiability of the control-to-state mapping, a reduced space optimization approach will require either concepts from variational analysis or approximation theory not only to prove existence of a solution but also to derive meaningful first-order optimality conditions and develop efficient numerical methods.

The theoretical and algorithmic challenges remain intact when considering a full-space approach, since the feasible set (after reformulating the variational inequality as a complementarity problem) is both non-convex and fails to satisfy the usual constraint qualifications. For example, in PDE-constrained optimization, one typically uses the Slater or Robinson-Zowe-Kurcyusz constraint qualifications, cf. [48, 77], to ensure the existence and boundedness of the set of Lagrange multipliers. In order to handle these degenerate constraints, one may take a penalty or relaxation approach, though recent work suggests that there are viable theoretical approaches to this “full-space” setting that do not rely on smoothing, see [81].

We now give a brief historical development. We caution the reader that this is only a sampling of the sizeable amount of work published over the past five decades. The purpose is merely to provide an understanding of the current motivations in theory and numerical methods, especially the ideas presented in this paper. Variational inequalities were first introduced by Signorini and solved by Fichera [29]. For this article, the early works by Brezis, Lions, and Stampacchia [18, 60] are perhaps the most relevant as they provide us with a sufficient existence and regularity theory. As a starting point for further study, we also suggest the well-known monographs by Kinderlehrer and Stampacchia and Rodrigues [53, 71]. Perhaps the earliest work on the optimal control of variational inequalities is due to Yvon, in which the adaptive penalty technique for variational inequalities (see Section 4.1) was used in order to approximate the control problem by a more tractable, parameter-dependent problem [86, 87]. This was later used by Lions [58, 59] and fully developed by Barbu [7]. The monograph by Barbu [7] contains many important results and techniques that are still used today, see also related contributions in [9,10,11,12, 50, 68, 73]. In 1976, Mignot offered an alternative to the smoothing approach by developing concepts of generalized differentiability [63]. There, he was able to prove and obtain an explicit formula for the Hadamard directional differentiability of the control-to-state mapping of the variational inequality and, with this result, derive first-order optimality conditions. This result later appeared in [64] and is rederived in [45] using techniques from variational analysis. More recent work in the control of variational inequalities has focused on the development of efficient, function-space-based numerical methods and ever more complex settings as in [2, 4, 21,22,23, 34,35,36, 39, 40, 45, 49, 56, 65, 66, 80, 83].

Parallel to the infinite-dimensional developments, a great deal of progress has been made on theory and numerical methods for finite-dimensional mathematical programs with equilibrium or complementarity constraints, as evidenced by the well-known manuscripts by Luo, Pang, and Ralph [61] and Outrata, Kočvara, and Zowe [69]. In addition to the references therein, we also mention the approaches in [3, 74], which have since been extended to infinite dimensions and are in part featured in this paper.

In reference to the existing finite-dimensional literature, recent works have begun referring to the problem class considered here as elliptic mathematical programs with equilibrium constraints (or elliptic MPECs). We will henceforth do the same for the sake of brevity.

The article is structured as follows. After introducing some necessary notation, data assumptions, and the canonical example in Section 2, we discuss sufficient conditions for the existence of a solution in Section 3. Afterwards, we give several kinds of multiplier-based first-order stationarity conditions similar to classical Karush-Kuhn-Tucker (KKT) conditions. These may or may not be first-order optimality conditions depending on the regularity and data of the given problem. To a newcomer to elliptic MPECs, this array of possible stationarity conditions may seem puzzling. However, it is necessary to understand the gap between what is theoretically the “best” type of KKT point and what type of stationary point a given numerical method can theoretically guarantee (at least asymptotically).

Following the theoretical results in Section 3, we pass to the main focus of our discussion: numerical optimization methods for elliptic MPECs. First, in Section 4, we consider what we refer to as “regularization-based” methods. In Section 4.1, we present a typical adaptive smoothing method that makes use of approximation theory for variational inequalities and which is directly linked to the derivation of first-order stationarity conditions for the elliptic MPEC. Similarly, in Section 4.2, we use a simple penalization of the complementarity condition (in weak form) to obtain a simpler PDE-constrained optimization problem with control and state constraints. The numerical solution of the subproblems in the adaptive penalty method is thoroughly discussed for a canonical example problem and the numerical behavior is illustrated. We note that the smoothed problems become increasingly difficult to solve as the smoothing parameter tends to zero, which in part motivates the desire for “non-smooth” methods.

The structure of the non-smooth numerical methods section is devised to illustrate the parallels to some popular methods for smooth PDE-constrained optimization problems, e.g., projected-gradient methods (Section 5.1), direct solvers for the KKT system (Section 5.2), and globalization of the direct solvers via a line search (Section 5.3). There are obvious (and expected) limitations to these approaches, all of which arise from the non-smooth or degenerate nature of elliptic MPECs. Nevertheless, they indicate the possibility of inventing new efficient numerical methods beyond the adaptive smoothing/penalty paradigm, which often outperform the smooth methods in practice.

In Section 5.1, we present a new subgradient method for the reduced space problem. The choice of “subgradient” is motivated by the limiting variational calculus found, e.g., in the book by Mordukhovich [66]. We discuss the asymptotic behavior of this method and illustrate the potential for future study. Following this in Section 5.2, an active-set-based solver with feasibility restoration suggested by Hintermüller in [36] as a direct solver for the C-stationarity system is considered. Finally, in Section 5.3, we consider the so-called bundle-free implicit programming approach [46], which can be understood as a globalization of the active-set solver using a line search. We extend the latter to include control constraints. The performance of the non-smooth methods is then compared and contrasted with that of the adaptive penalty method. All numerical examples have been solved in the Julia programming language [14] version 0.5.0 on a 2016 MacBook Pro Intel(R) Core(TM) i5 @ 3.1 GHz with 16 GB 2133 MHz LPDDR3 RAM.

2 Notation, Assumptions, and Preliminary Results

In this section, we fix our notation and analytical framework. Elliptic variational inequalities and elliptic MPECs are introduced.

2.1 Norms, Inner Products, and Convergence

All spaces are based on the real number field. The Euclidean norm and scalar product on \(\mathbb {R}^m\) are denoted by |x| and x ⋅ y for \(x,y \in \mathbb {R}^m\), respectively. For \(r \in \mathbb {R}\), we denote \(\max (0,r)\) by (r)+. All other norms are denoted by ∥⋅∥X for some space X. The topological dual of X is denoted by X∗ and the dual pairing by \(\langle \cdot , \cdot \rangle _{X,X^*}\). The inner product on a Hilbert space H is denoted by (⋅, ⋅)H. Given a sequence {xk}⊂ X, we denote strong convergence to \(\overline {x} \in X\) by \(x_k \stackrel {X}{\to } \overline {x}\); for weak convergence, we use \(x_k \stackrel {X}{\rightharpoonup } \overline {x}\) and for weak-star convergence, \(x_k \stackrel {X}{\rightharpoonup ^*} \overline {x}\). In all these cases, we drop the sub- or superscripts if it is clear in context.

2.2 Extended Real-Valued Functionals and Convex Analysis

For a topological vector space U, the extended real-valued functional \(G:U \to \overline {\mathbb {R}}\) is proper if G(u) > −∞ for all u ∈ U and G(w) < +∞ for some w ∈ U. G is said to be closed or lower-semicontinuous if its epigraph

$$\displaystyle \begin{aligned} \mathrm{epi}\, G:= \left\{ (u,\alpha) \in U \times \mathbb{R} \left|\; G(u) \le \alpha \right.\right\} \end{aligned}$$

is a closed subset in the product topology on \(U \times \mathbb {R}\). Moreover, G is convex provided

$$\displaystyle \begin{aligned} G(\lambda u + (1-\lambda) w) \le \lambda G(u) + (1-\lambda)G(w),\quad \forall u,w \in U, \forall \lambda \in [0,1]. \end{aligned}$$

For a Banach space V and \(F:V \to \overline {\mathbb {R}}\), the (convex) subdifferential of F at some point x ∈ V such that −∞ < F(x) < +∞ is defined by the potentially empty set:

$$\displaystyle \begin{aligned} \partial F(x) := \left\{x^* \in V^* \left|\; F(y) \ge F(x) + \langle x^*,y-x\rangle,\forall y \in V \right.\right\}. \end{aligned}$$

If C ⊂ V , then the indicator functional iC of C is defined by

$$\displaystyle \begin{aligned} i_{C}(x) = 0 \text{ if } x \in C,\; \text{ and } i_{C}(x) = +\infty \text{ otherwise.} \end{aligned}$$

If C is non-empty, closed, and convex, then \(\mathcal {N}_C(x) := \partial i_{C}(x)\) is called the convex normal cone to C at x ∈ C. If K ⊂ V  is a non-empty, closed, convex cone, then

$$\displaystyle \begin{aligned} \xi \in \mathcal{N}_K(x)\ \Leftrightarrow\ x \in K, \xi \in K^{-} := \left\{\mu \in V^* \left| \langle \mu,y \rangle \le 0,\forall y \in K \right.\right\} \text{ and }\langle \xi, x \rangle = 0. \end{aligned}$$

Recall that K− is referred to as the polar cone to K. For more on convex analysis, see, e.g., [25].
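As a simple finite-dimensional illustration of these objects, take \(K = \mathbb{R}^m_+ \subset \mathbb{R}^m\). Then

$$\displaystyle \begin{aligned} K^{-} = \mathbb{R}^m_{-} \quad \text{ and } \quad \mathcal{N}_K(x) = \left\{ \xi \in \mathbb{R}^m \left|\; \xi \le 0,\; \xi \cdot x = 0 \right.\right\},\quad x \in K, \end{aligned}$$

i.e., ξi ≤ 0 with ξi = 0 whenever xi > 0. This componentwise complementarity structure is exactly what reappears, in function space, in the elliptic MPECs below.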

2.3 Function Spaces and 2-Capacity

Unless otherwise noted, \(\varOmega \subset \mathbb {R}^n\), \(n \in \left \{1,2,3\right \}\), is a non-empty, bounded, and open subset with Lipschitz boundary Γ = ∂Ω. The Lebesgue measure of \(E \subset \mathbb {R}^n\) is denoted by meas(E). For p ∈ ℝ with 1 ≤ p < +∞, we denote the usual Lebesgue space of p-integrable functions by Lp(Ω) and the Lebesgue space of essentially bounded functions by L∞(Ω). The norms are given by

$$\displaystyle \begin{aligned} \| u \|{}_{L^p} = \left(\int_{\varOmega} |u(\omega) |{}^p d\omega\right)^{1/p} \text{ and } \| u \|{}_{L^{\infty}} = \text{ess sup}_{\omega \in \varOmega}| u(\omega) |, \end{aligned}$$

respectively. For \(k \in \mathbb {N}\), we denote the Sobolev space of Lp-functions u with \(| D^{\alpha }u |= |(\partial ^{\alpha _1}u,\dots ,\partial ^{\alpha _n}u)| \in L^p(\varOmega )\) by Wk, p(Ω), where α = (α1, …, αn) is a multi-index with |α1| + ⋯ + |αn|≤ k and \(\partial ^{\alpha _i}u\) is the αi-th weak partial derivative of u with respect to xi, i ∈{1, 2, 3}. The Wk, p-norms are then defined by

$$\displaystyle \begin{aligned} \| u \|{}_{W^{k,p}} = \left(\sum_{|\alpha|\le k}\int_{\varOmega} |D^{\alpha} u(\omega) |{}^p d\omega\right)^{1/p} \text{ and } \| u \|{}_{W^{k,\infty}} = \sum_{|\alpha|\le k}\text{ess}\ \text{sup}_{\omega \in \varOmega}| D^{\alpha} u(\omega) |.\end{aligned} $$

For our discussion, the most important case is when p = 2. Here, both L2(Ω) and Hk(Ω) := Wk, 2(Ω) are Hilbert spaces with inner product defined using the norms given above. We denote the space of all H1-functions with zero trace by \(H^1_0(\varOmega )\) and its dual by H−1(Ω). It follows from the Poincaré inequality that

$$\displaystyle \begin{aligned} \| u \|{}_{H^1_0} = \|\nabla u\|{}_{L^2} = (\nabla u,\nabla u)_{L^2}^{1/2} \end{aligned}$$

is an equivalent norm on \(H^1_0(\varOmega )\). See, e.g., [1] for more. Finally, recall that the 2-capacity of an arbitrary subset E ⊂ Ω is given by

$$\displaystyle \begin{aligned} \mathrm{Cap}_2(E,\varOmega) := \inf\left\{\| v \|{}^2_{H^1_0}: v \in H^1_0(\varOmega),\; v \ge 1 \text{ a.e. on an open neighborhood }G \supset E \right\}, \end{aligned}$$

cf. [6, Prop. 5.8.3]. Note that \(H^1_0\)-functions possess a quasi-continuous representative, i.e., one that is continuous outside open sets of arbitrarily small capacity, cf. [26]. Moreover, there exist E ⊂ Ω with Cap2(E, Ω) > 0 and meas(E) = 0, e.g., the boundary of a smooth open set. Hence, the quasi-continuous representative of an \(H^1_0\)-function in 2D is meaningfully determined on a smooth curve in the plane (a set of positive capacity but zero measure), but not at single points (which have zero capacity).

2.4 Elliptic Variational Inequalities

Let H be a Hilbert space and H∗ its topological dual. For this subsection, (⋅, ⋅) denotes the inner product and ∥⋅∥ the norm on H. The pairing between H and H∗ is denoted by 〈⋅, ⋅〉. Let \(a: H \times H \to \mathbb {R}\) be a bilinear form on H and A : H → H∗ the associated bounded linear operator, i.e., a(u, v) = 〈Au, v〉, u, v ∈ H. We recall that a is said to be coercive/elliptic if there exists some constant c > 0 such that a(v, v) ≥ c∥v∥2 for all v ∈ H. Clearly, the function ∥v∥a := (a(v, v))1∕2 defines an equivalent norm on H. Let K ⊂ H be non-empty, closed, and convex and w ∈ H∗. Then, the variational problem

$$\displaystyle \begin{aligned} \text{Find } u \in K: a(u,v-u) \ge \langle w, v - u \rangle,\quad \text{ for all } v \in K \end{aligned} $$
(1)

is called an elliptic variational inequality. Note that problems of this type are often referred to as variational inequalities of the first kind. They can be equivalently written as a generalized equation:

$$\displaystyle \begin{aligned} \text{Find } u \in K: Au + \mathcal{N}_K(u) = Au + \partial i_K(u) \ni w. \end{aligned} $$
(2)

If we replace iK by a subdifferentiable proper closed convex functional φ, then we obtain a variational inequality of the second kind. The following is due to Lions and Stampacchia [60], see also [53]:

Theorem 1

Let \(a:H \times H \to \mathbb {R}\) be a coercive bilinear form, K ⊂ H closed and convex, and w ∈ H∗. Then, (1) possesses a unique solution S(w). Moreover, the solution mapping w ↦ S(w) is Lipschitz continuous: For all w1, w2 ∈ H∗, it holds that

$$\displaystyle \begin{aligned} \|S(w_1) - S(w_2)\|{}_{H} \le (1/c)\|w_1 - w_2\|{}_{H^*}, \end{aligned}$$

where c is the constant of coercivity of a.
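The Lipschitz estimate follows from a short standard argument, which we recall only for convenience: testing (1) for w1 with v = S(w2), testing it for w2 with v = S(w1), adding the two inequalities, and invoking coercivity gives

$$\displaystyle \begin{aligned} c\|S(w_1) - S(w_2)\|^2 \le a(S(w_1) - S(w_2),\, S(w_1) - S(w_2)) \le \langle w_1 - w_2, S(w_1) - S(w_2)\rangle \le \|w_1 - w_2\|_{H^*}\, \|S(w_1) - S(w_2)\|. \end{aligned}$$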

Although the solution mapping S is Lipschitz continuous, it is not necessarily differentiable. In some special cases, S is directionally differentiable. For more on variational inequalities, see, e.g., [53]. The conditions on A and K will be considered standing assumptions on the variational inequality for the rest of this article. We conclude this subsection with an example.

Example 1

In the context of (1), let \(H:= H^1_0(\varOmega )\) and H = H−1(Ω). Moreover, define Ψ ∈ H1(Ω) with Ψ|Γ ≤ 0 and \(K \subset H^1_0(\varOmega )\) such that

$$\displaystyle \begin{aligned} K := \left\{ u \in H^1_0(\varOmega) \left|\; u(x) \ge \varPsi(x),\; \text{ for almost every (a.e.) } x \in \varOmega \right.\right\}. \end{aligned}$$

Clearly, since \(\max (0,\varPsi ) \in H^1_0(\varOmega )\), K≠∅. Moreover, one readily shows that K is closed and convex in \(H^1_0(\varOmega )\). The setting of (1) is general enough to allow for nonsymmetric bilinear forms a(u, v), e.g.,

$$\displaystyle \begin{aligned} a(u,v) := \int_{\varOmega} \sum_{i,j} a_{ij}\partial_i u \partial_j v - \sum_{i} b_i (\partial_i u) v + c u v \mathrm{d}x,\quad u,v \in H^1_0(\varOmega), \end{aligned}$$

with appropriate assumptions on aij, bi, c. However, for our purposes it suffices to consider

$$\displaystyle \begin{aligned} a(u,v) := \int_{\varOmega} \nabla u \cdot \nabla v \mathrm{d}x,\quad u,v \in H^1_0(\varOmega) \end{aligned}$$

Letting f ∈ H−1(Ω) and combining the above, we obtain a classical obstacle problem

$$\displaystyle \begin{aligned} \text{Find } u \in K: \int_{\varOmega} \nabla u\cdot \nabla [v-u] \mathrm{d}x \ge \langle f, v - u \rangle,\quad \text{ for all } v \in K.\end{aligned} $$

If Ψ|Γ ≡ 0, then (without altering the boundary conditions), we obtain an equivalent problem:

$$\displaystyle \begin{aligned} \text{Find } u \in K_0: \int_{\varOmega} \nabla u\cdot \nabla [v-u] \mathrm{d}x \ge \langle f + \varDelta \varPsi, v - u \rangle,\quad \text{ for all } v \in K_0, \end{aligned}$$

where \(K_0 = \left \{ u \in H^1_0(\varOmega ) \left |\; u(x) \ge 0,\; \text{ a.e. } x \in \varOmega \right .\right \};\) see, e.g., [53, 71].
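To make Example 1 concrete, the following minimal Julia sketch (illustrative only; the data f ≡ −10, Ψ ≡ −0.1, and the mesh are hypothetical choices, not taken from the text) discretizes the one-dimensional obstacle problem by finite differences and solves it with a projected Gauss–Seidel iteration, i.e., each unknown is relaxed and then projected back onto the constraint u ≥ Ψ.

```julia
# 1D obstacle problem: find u >= psi with -u'' >= f, u(0) = u(1) = 0, and
# complementarity between u - psi and -u'' - f. Discretized by central
# differences and solved with projected Gauss-Seidel (illustrative sketch).
function obstacle_pgs(n::Int; maxit::Int = 100_000, tol::Float64 = 1e-10)
    h   = 1.0 / (n + 1)
    f   = fill(-10.0, n)      # hypothetical constant load pushing the membrane down
    psi = fill(-0.1, n)       # hypothetical constant obstacle (psi <= 0 on the boundary)
    u   = zeros(n)            # zero initial guess, feasible since 0 >= psi
    d   = 2.0 / h^2           # diagonal of the -u'' stencil
    o   = -1.0 / h^2          # off-diagonal of the -u'' stencil
    for k in 1:maxit
        delta = 0.0
        for i in 1:n
            s = f[i]
            i > 1 && (s -= o * u[i-1])
            i < n && (s -= o * u[i+1])
            unew  = max(psi[i], s / d)          # Gauss-Seidel step, then projection onto K
            delta = max(delta, abs(unew - u[i]))
            u[i]  = unew
        end
        delta < tol && return u, k
    end
    return u, maxit
end

u, sweeps = obstacle_pgs(100)
println("projected Gauss-Seidel: $sweeps sweeps, min(u) = $(minimum(u))")
```

Projected Gauss–Seidel is simple and monotone but converges slowly on fine meshes; it serves here only to illustrate the structure of (1).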

Of course, this set K represents perhaps the easiest kind of set that one can consider in a variational inequality. It is just one example of a so-called “polyhedric set,” a notion introduced in the mid-1970s by Haraux [33] and Mignot [63] in the context of sensitivity analysis of variational inequalities. See the recent detailed study [82] for a state-of-the-art on polyhedricity. For more general closed, convex sets, e.g.,

$$\displaystyle \begin{aligned} \left\{ u \in H^1_0(\varOmega) \left|\; | (\nabla u)(x) | \le \varPsi(x),\; \text{a.e. } x \in \varOmega \right.\right\} \end{aligned} $$
(3)

it is possible to prove existence, uniqueness, and continuity results under weak assumptions on Ψ. Since these sets are neither polyhedric nor cones, the differential sensitivity analysis of the solution map and subsequent derivation of optimality conditions or efficient numerical methods is extremely challenging. Elliptic MPECs with this type of constraint were considered in [45]. However, we caution the reader that the proofs for the derivation of the tangent cones are in fact erroneous. Thus, the differential sensitivity results for the solution mapping only hold under the assumption that the tangent cones do in fact have the form purported in the text.

2.5 Elliptic Mathematical Programs with Equilibrium Constraints

We present an abstract framework for a class of elliptic MPECs. Let V , H, and Z be separable Hilbert spaces such that the state space V  is a dense subset of H and V ⊂ H ⊂ V∗ both algebraically and topologically. Moreover, assume that f ∈ V∗, Zad ⊂ Z is a non-empty, closed, and convex set (the set of admissible controls/decision variables), B : Z → V∗ is a bounded linear operator, and \(F:V \to \overline {\mathbb {R}}\) and \(G: Z \to \overline {\mathbb {R}}\). We define an elliptic MPEC as follows:

$$\displaystyle \begin{aligned} \min\ & J(z,u) := F(u) + G(z) \text{ over } (z,u) \in Z \times V, \end{aligned} $$
(4a)
$$\displaystyle \begin{aligned} \text{s.t. } &z \in Z_{\mathrm{ad}},\; u \text{ solves }~(1) \text{ with } w := Bz + f. \end{aligned} $$
(4b)

In light of Theorem 1, we can rewrite (4) in reduced form, analogously to standard PDE-constrained optimization problems:

$$\displaystyle \begin{aligned} \min\ \mathcal{J}(z) := F(S(Bz)) + G(z) \text{ over } z\in Z_{\mathrm{ad}}. \end{aligned} $$
(5)

Here, f ∈ V∗ is fixed. Therefore, we only write S(Bz) for the solution mapping (instead of S(Bz + f)). Obtaining a meaningful full-space formulation of the optimization problem (4) is not always possible. However, if K in (1) is a cone, then we can also formulate a full-space version of the elliptic MPEC by introducing a slack variable ξ ∈ V∗:

$$\displaystyle \begin{aligned} \min\ &J(z,u) \text{ over } (z,u,\xi) \in Z \times V \times V^*, \end{aligned} $$
(6a)
$$\displaystyle \begin{aligned} \text{s.t. } &z \in Z_{\mathrm{ad}},{} \end{aligned} $$
(6b)
$$\displaystyle \begin{aligned} &Au - \xi = Bz + f,{} \end{aligned} $$
(6c)
$$\displaystyle \begin{aligned} &u \in K,\; \xi \in K^{+},\; \langle \xi, u \rangle = 0.{} \end{aligned} $$
(6d)

Here, K+ := −K− is the dual cone of K. This follows from (2). Due to the presence of the complementarity condition (6d), problems of the type (6) are often referred to in the finite-dimensional literature (and more recently in the infinite-dimensional literature) as (elliptic) mathematical programs with complementarity constraints (MPCCs); see e.g., [61, 69, 74] (finite dimensions) and the recent work [81] (infinite dimensions).

Remark 1

We consider here the simplest case in which the control or decision variable z appears only on the “right-hand side” of the variational inequality. However, in many interesting applications, e.g., topology optimization [8], the control enters nonlinearly through the differential operator A, e.g., when A(z)u = −div(z∇u). In the context of elliptic MPECs, this case is scarcely covered in the literature, see [36].

We finalize this section with a canonical example of an elliptic MPEC.

Example 2

In the notation of (4), we set \(V = H^1_0(\varOmega )\), H = L2(Ω), V∗ = H−1(Ω), and Z = L2(Ω). For some ud ∈ L2(Ω), we let F be a standard tracking-type functional and G an L2-Tikhonov regularization:

$$\displaystyle \begin{aligned} J(z,u) := F(u) {+} G(z) = \frac{1}{2}\|u {-} u_d\|{}^2_{L^2} {+} \frac{\alpha}{2}\|z\|{}^2_{L^2},\quad (z,u) \in L^2(\varOmega) \times H^1_0(\varOmega),\, \alpha > 0.\end{aligned} $$

For the forward problem, we choose the simplest form of the obstacle problem with Ψ ≡ 0. Then, the following is an elliptic MPEC:

$$\displaystyle \begin{aligned} \min\ &\frac{1}{2}\|u - u_d\|{}^2_{L^2} + \frac{\alpha}{2}\|z\|{}^2_{L^2},\text{ over } (z,u) \in L^2(\varOmega) \times H^1_0(\varOmega), \end{aligned} $$
(7a)
$$\displaystyle \begin{aligned} \text{s.t. } &z \in Z_{\mathrm{ad}},\; u \in K_0 \text{ solves }: \int_{\varOmega} \nabla u\cdot \nabla [v-u] \mathrm{d}x \ge \langle Bz + f, v - u \rangle,\quad \text{ for all } v \in K_0,\end{aligned} $$
(7b)

Some possibilities for Zad are local bilateral constraints: − 1 ≤ z(x) ≤ 1 for a.e. x ∈ Ω, or global constraints such as \( \|z\|{ }_{L^2} \le 1 \).
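Both choices of Zad admit cheap projections, which are used repeatedly below (cf. (18) and (23)). A minimal sketch in Julia, assuming the control has already been discretized as a vector; the quadrature weight h for the discrete L2-norm is a hypothetical placeholder.

```julia
# Projection onto the bilateral set {-1 <= z <= 1 a.e.}: componentwise clamping.
proj_box(z; a = -1.0, b = 1.0) = clamp.(z, a, b)

# Projection onto the ball {||z||_{L^2} <= 1}: radial scaling.
# h is a hypothetical quadrature weight (e.g., the mesh-cell volume).
function proj_ball(z; h = 1.0)
    nrm = sqrt(h * sum(abs2, z))
    return nrm <= 1.0 ? z : z ./ nrm
end
```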

3 Existence and Stationarity Conditions

We start this section in the abstract framework of Section 2.5. In order to provide insight into the deeper meaning of the various stationarity conditions, we ultimately restrict ourselves to the canonical example (Example 2).

3.1 Existence of a Solution

Under suitable conditions on F, G, Zad, and B, we can use Weierstrass's existence theorem, see, e.g., [6, Thm. 3.2.5], to prove that the reduced-form MPEC has a solution. For the following result, we appeal to the monograph by Barbu [7, Chap. 3].

Theorem 2

In the context of problem (4), we assume the following:

  1. \(F:H \to \mathbb {R}\) is locally Lipschitz and nonnegative.

  2. \(G:Z \to \overline {\mathbb {R}}\) is convex, lower-semicontinuous, and there exist constants κ1 > 0 and \(\kappa _ 2 \in \mathbb {R}\) such that G(z) ≥ κ1∥z∥Z + κ2 for all z ∈ Z.

  3. B is completely continuous.

Then, (4), or equivalently (5), admits a solution.

Proof

By replacing G(z) with \(G(z) + i_{Z_{ad}}(z)\), the assertion follows from [7, Prop. 3.1]. In particular, we note that by Barbu [7, Lem. 3.1] the mapping (S ∘ B) : Z → V  is completely continuous and, furthermore, the composite objective functional \((F \circ S \circ B):Z \to \mathbb {R}_+\) is weakly lower-semicontinuous.

For any z ∈ Zad, (1) has a unique solution u = S(Bz), and ξ := Au − Bz − f is then the unique element of V∗ with −ξ ∈ \(\mathcal {N}_K(u)\). Conversely, if (z, u, ξ) satisfies (6b)–(6d), then z ∈ Zad and u = S(Bz). In other words, there is a one-to-one correspondence between the feasible sets of (5) and (6). Therefore, if (4), or equivalently (5), admits a solution, then so does (6).

Corollary 1

Under the assumptions of Theorem 2 , (6) admits a solution.

Remark 2

Setting Theorem 2 aside, we could also prove this assertion by appealing directly to [6, Thm. 3.2.5]. Indeed, under the hypotheses of Theorem 2, (6) can be viewed as an unconstrained problem in which the associated objective functional is radially unbounded, proper, and weakly lower-semicontinuous.

3.2 Primal Stationarity Conditions

In this subsection, we restrict ourselves to the setting of Example 2. As mentioned earlier, there are some situations in which the control-to-state mapping S is directionally differentiable. In particular, Mignot [63] demonstrated that the control-to-state mapping in Example 2 is directionally differentiable in the sense of Hadamard. The next result draws the parallel to PDE-constrained optimization problems.

Theorem 3

Let \((\overline {z},\overline {u})\) be a solution to (7), then the following first-order necessary optimality condition holds:

$$\displaystyle \begin{aligned} \alpha (\overline{z}, w - \overline{z}) + (\overline{u}-u_d, S'(\overline{u}; B(w - \overline{z}))) \ge 0,\quad \forall w \in Z_{\mathrm{ad}}. \end{aligned} $$
(8)

In the finite-dimensional literature, e.g., in [61, 74], primal first-order conditions of the type (8) are often referred to as B-stationarity conditions. There, the “B” comes from the fact that either the so-called Bouligand tangent cone or Bouligand differentiability is used. This motivates the next definition.

Definition 1

If \((\overline {z},\overline {u})\) is a feasible point of (7) that satisfies (8), then \((\overline {z},\overline {u})\) is B-stationary.

For the setting in (5), it is possible to derive B-stationarity conditions provided that F, G are smooth. Clearly, the main challenge is determining directional differentiability of S. Suppose for the sake of argument that \(S'(\overline {u}; du)\) is linear in du ∈ V∗. Then, we can “dualize” the B-stationarity conditions (8):

$$\displaystyle \begin{aligned} \alpha (\overline{z}, w - \overline{z}) + (B^*p, w - \overline{z}) \ge 0,\quad \forall w \in Z_{\mathrm{ad}},\quad p = S'(\overline{u})^*(\overline{u}-u_d). \end{aligned}$$

Note that the condition “\(p = S'(\overline {u})^*(\overline{u}-u_d)\)” essentially means (in this ideal case) that p solves a type of adjoint equation. Using this “adjoint state” p, we could now develop solution algorithms. However, \(S'(\overline {u};du)\) is in general nonlinear (albeit Lipschitz) in du and thus, the standard adjoint state p does not exist. As a result, there are several different types of dual/multiplier-based first-order stationarity conditions reminiscent of classical KKT conditions in nonlinear programming.
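For orientation, we recall the form of this directional derivative for the canonical obstacle problem as established in [63, 64] (stated here without the technical details involving capacity): at u = S(w), the derivative d = S′(w; h) in a direction h is the unique solution of a variational inequality posed on the critical cone,

$$\displaystyle \begin{aligned} \text{Find } d \in \mathcal{K}(u):\; a(d, v - d) \ge \langle h, v - d \rangle,\quad \text{ for all } v \in \mathcal{K}(u),\qquad \mathcal{K}(u) := \left\{ v \in \mathcal{T}_K(u) \left|\; \langle Au - w, v \rangle = 0 \right.\right\}, \end{aligned}$$

where \(\mathcal{T}_K(u)\) denotes the tangent cone to K at u (understood in the sense of capacity). In particular, h ↦ S′(w; h) is positively homogeneous and Lipschitz, but it is linear only if \(\mathcal{K}(u)\) is a linear subspace, which (up to capacity) corresponds to the absence of a biactive set.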

3.3 Dual Stationarity Conditions

In this subsection, we again restrict ourselves to the setting in Example 2. Furthermore, we assume that Γ is regular enough to guarantee that \(S(Bz) \in H^2(\varOmega ) \cap H^1_0(\varOmega )\) for any z ∈ Zad. This is possible if, e.g., Bz + f ∈ L2(Ω) and Ω is a convex polyhedron, cf. [18, 71]. This regularity assumption ensures that ξ ∈ L2(Ω) and, by the Sobolev embedding theorem, S(Bz) is at least Hölder continuous on Ω. We now introduce two multiplier-based first-order stationarity conditions taken from [39, 40].

Definition 2

Let z ∈ Zad and u = S(Bz) with associated slack variable ξ ∈ L2(Ω). The set \(\varOmega ^0 := \left \{x \in \varOmega \left |\; u(x) = 0 \right .\right \}\) is called the active set or coincidence set and \(I := \left \{x \in \varOmega \left |\; u(x) > 0 \right .\right \}\) the inactive set. The set \(\varOmega ^{00} := \varOmega ^0 \cap \left \{x \in \varOmega \left |\; \xi (x) = 0\right .\right \}\) is called the biactive or weakly active set and \(\varOmega ^{0+} := \varOmega ^0 \cap \left \{x \in \varOmega \left |\; \xi (x) > 0\right .\right \}\) the strongly active set.

Note that the biactive and strongly active sets are only defined up to a set of Lebesgue measure zero. In contrast, the active and inactive sets may be defined more finely, namely up to sets of capacity zero, even in less regular settings.

Definition 3

The point \((z,u,\xi ) \in L^2(\varOmega ) \times H^1_0(\varOmega ) \times L^2(\varOmega )\) is called a C-stationary point for (7) provided that there exist \(p \in H^1_0(\varOmega )\) and λ ∈ H−1(Ω) such that the following system of equations is fulfilled:

$$\displaystyle \begin{aligned} \alpha (z, w - z) + (B^*p, w - z) &\ge 0, \quad \forall w \in Z_{\mathrm{ad}} \end{aligned} $$
(9a)
$$\displaystyle \begin{aligned} A^*p - \lambda &= u_d - u \end{aligned} $$
(9b)
$$\displaystyle \begin{aligned} Au - \xi &= Bz + f \end{aligned} $$
(9c)
$$\displaystyle \begin{aligned} \xi &\ge 0,\text{ a.e. } x \in \varOmega {} \end{aligned} $$
(9d)
$$\displaystyle \begin{aligned} u &\ge 0,\text{ a.e. } x \in \varOmega {} \end{aligned} $$
(9e)
$$\displaystyle \begin{aligned} (\xi,u) &= 0 {} \end{aligned} $$
(9f)
$$\displaystyle \begin{aligned} \langle \lambda, p\rangle &\le 0 {} \end{aligned} $$
(9g)
$$\displaystyle \begin{aligned} p &= 0,\text{ a.e. } x \in \varOmega^{0+} {} \end{aligned} $$
(9h)
$$\displaystyle \begin{aligned} \langle \lambda, \phi \rangle &= 0, \forall \phi \in H^1_0(\varOmega): \phi = 0,\text{ a.e. } \varOmega^0 {} \end{aligned} $$
(9i)

If, in addition to (9), p and λ satisfy:

$$\displaystyle \begin{aligned} p &\le 0,\text{ a.e. } x \in \varOmega^{00} {} \end{aligned} $$
(10a)
$$\displaystyle \begin{aligned} \langle \lambda, \phi \rangle &\ge 0, \forall \phi \in H^1_0(\varOmega): \phi \ge 0,\text{ a.e. } \varOmega^{00}, \phi = 0,\text{ a.e. } \varOmega \setminus (I \cup \varOmega^{00}), {} \end{aligned} $$
(10b)

then \((z,u,\xi ) \in L^2(\varOmega ) \times H^1_0(\varOmega ) \times L^2(\varOmega )\) is called an S-stationary point.

Several comments on this system are necessary. To start, we note that if K0 is replaced by the entire space \(H^1_0(\varOmega )\), then ξ, λ and the conditions (9d)–(10b) vanish and we obtain the usual first-order system for a linear-quadratic PDE-constrained optimization problem with convex control constraints. Moreover, if there is no biactive set, then both (in fact all) dual stationarity concepts coincide. Thus, biactivity is essentially the source of all difficulties in the study of MPECs. We will see later that it also relates to the Gâteaux differentiability of the control-to-state map.

Perhaps the most critical point here is the usage of almost everywhere versus quasi everywhere conditions for the multipliers. As defined above, these S-stationarity conditions are not equivalent to the necessary first-order optimality conditions in the pioneering works [63, 64] unless certain regularity assumptions on the active/biactive/inactive sets are made. In [63, 64] (see also [43, 80]), the notion of capacity (in contrast to Lebesgue measure) is used. This is because capacity is needed for the correct representation of the tangent/contingent cone and its polar. When replacing q.e. by a.e. conditions, some conditions become stronger and others weaker.

In order to see this, suppose that meas(Ω0) = 0, but Cap2(Ω0, Ω) > 0. Then, (9i) would imply λ = 0 ∈ H−1(Ω). However, when using capacity, since the test functions ϕ admit quasi-continuous representatives, requiring ϕ = 0 “quasi-everywhere” (q.e.) on Ω0 would shrink the set of test functions so that λ is not necessarily zero. Similar problems arise by requiring a.e. instead of q.e. conditions on the adjoint variables p. Indeed, if Ω0+ has measure zero, then (9h) is trivially satisfied by any p; this would not be the case if we use q.e. since \(p \in H^1_0(\varOmega )\) admits a quasi-continuous representative.

Therefore, the stationarity conditions of Mignot and Puel, see (29) below, should be considered the true strong/S-stationarity conditions for our canonical MPEC as opposed to (9)–(10). For more complex constraints in which the image space of the constraint mapping is an Lp-space, as in, e.g.,[34, 35, 45], the problems with defining active sets up to sets of capacity zero seem to be absent.

Nevertheless, (9)–(10) arose naturally through the limiting process of an adaptive penalty scheme. As such, they are highly relevant for the study of numerical methods for elliptic MPECs, especially for methods utilizing smoothing plus continuation. In contrast, it is unclear how to properly include notions of capacity into efficient numerical methods (without simplifying assumptions as in [46]).

Finally, in many complex real-world applications, it might be impossible to obtain even C-stationarity conditions of this type, due to a lack of compactness and regularity properties. For example, in the optimal control of electrowetting on dielectrics [4], Allen-Cahn [27, 28], Cahn-Hilliard [20], or Cahn-Hilliard-Navier-Stokes system with obstacle potentials [47], one can usually only derive an approximation of (9g). Here, this would be equivalent to replacing (9g) by

$$\displaystyle \begin{aligned} \limsup_{k\to \infty} \langle \lambda_k, p_k\rangle \le 0 \end{aligned}$$

for sequences \(\left \{\lambda _k\right \}\) and \(\left \{p_k\right \}\) with \(\lambda _k \rightharpoonup \lambda \) and \(p_k \rightharpoonup p\).

In conclusion, the various notions of dual stationarity and the theory needed to derive them are still active areas of research. The main point here is that there is a stratification of concepts that one should also be aware of when considering the design and convergence of numerical methods. That is, it is not enough to prove that a scheme converges but also to what kind of stationary point. Of course, many of the issues involving capacity, weak topologies, products of weakly convergent sequences, and weak lim-inf or lim-sup-type sign conditions will not necessarily appear in numerical experiments.

4 Regularization-Based Methods

In this and the coming sections, we present a number of numerical optimization methods for elliptic MPECs. These are split into two classes: Methods that employ adaptive smoothing, relaxation, or penalization of the forward problem or complementarity constraints (regularization-based methods) and those that do not (non-smooth methods).

We also note that the proper discretization of elliptic MPECs using (adaptive) finite element schemes that take into account the additional difficulties due to the inherent degeneracy/non-smoothness has only been considered in a handful of papers. For example, we mention the most recent works [16, 17, 24, 37, 62]. For our numerical study, we make use of a simple finite difference scheme to discretize the operators along with a nested grid approach to simulate mesh refinements. This allows us to easily compare the methods.

The regularization-based methods all follow a similar scheme. First, the variational inequality is approximated by a parameter-dependent semilinear elliptic PDE or the complementarity constraint is relaxed or penalized. This yields in both cases a more tractable family of approximating optimization problems. Next, the smooth PDE-constrained problems are solved, yielding a parameter-dependent KKT point. Finally, continuation is performed on the regularization parameter (passing to 0 or +∞).

The non-smooth methods are rather different and there is still plenty of room for new ideas. We only mention here that with these methods the emphasis is placed on directly solving the original, non-smooth problem without changing the forward problem. We postpone further details until later.

4.1 An Adaptive Penalty Method

We begin in the abstract framework of Section 2.5 and present a general approximation result as found in [7, Thm. 2.2]. For a comprehensive study on the numerical analysis and approximation of variational inequalities, see, e.g., [31]. In the following, ϕ𝜖 refers to some penalty functional for the constraint K.

Definition 4

For any constant 𝜖 > 0, let \(\phi ^{\epsilon }: V \to \mathbb {R}\) be convex and Fréchet differentiable on V and satisfy:

  1. There exists a constant C, independent of 𝜖 and u, with ϕ𝜖(u) ≥ −C(∥u∥V + 1) for all 𝜖 > 0 and u ∈ V.

  2. ϕ𝜖(u) → iK(u) as 𝜖 ↓ 0 for all u ∈ V.

  3. For all u ∈ V and all {u𝜖} such that \(u_{\epsilon } \rightharpoonup u\) as 𝜖 ↓ 0, liminf𝜖↓0ϕ𝜖(u𝜖) ≥ iK(u).

Remark 3

See also [7, Thm 2.4] for functionals on H, which is more relevant for (13).

In the abstract setting, we approximate (1) by

$$\displaystyle \begin{aligned} Au + \nabla \phi^{\epsilon}(u) = Bz + f \end{aligned} $$
(11)

We denote the approximate solution mapping by S𝜖. For the next result, we restrict ourselves to the case when A is symmetric (as in Example 2), see [7, Thm. 2.2].

Theorem 4

Let w𝜖 → w in V∗ as 𝜖 ↓ 0. Then the sequence {u𝜖}⊂ V  with u𝜖 := S𝜖(w𝜖) converges weakly to u = S(w) as 𝜖 ↓ 0. If, in addition,

$$\displaystyle \begin{aligned} \langle \nabla \phi^{\epsilon}(y) - \nabla \phi^{\delta}(v),y-v\rangle \ge -C(1 + (\| \nabla \phi^{\epsilon}(y) \|{}^2_{V^*} + \| \nabla \phi^{\delta}(v) \|{}^2_{V^*}))(\epsilon + \delta), \\ \forall \epsilon, \delta > 0, \forall y,v \in V, \end{aligned} $$
(12)

then u 𝜖 → u (strongly in V ) as 𝜖 ↓ 0.

In the setting of Example 2, one possibility for ϕ𝜖 is:

$$\displaystyle \begin{aligned} \phi^{\epsilon}(u) := (2\epsilon)^{-1}\|(-u)_+\|{}^2_{L^2}. \end{aligned} $$
(13)
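The following elementary computation, with K0 viewed as a subset of L2(Ω), underlies the Moreau–Yosida interpretation of (13) mentioned next:

$$\displaystyle \begin{aligned} \min_{v \in K_0}\, \frac{1}{2\epsilon}\|u - v\|^2_{L^2} = \frac{1}{2\epsilon}\|u - (u)_+\|^2_{L^2} = \frac{1}{2\epsilon}\|(-u)_+\|^2_{L^2} = \phi^{\epsilon}(u), \end{aligned}$$

since the minimizer is the pointwise projection v = (u)+.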

Redefining K0 for L2-functions, ϕ𝜖 in (13) is the Moreau-Yosida regularization of the indicator functional \(i_{K_0}\) with respect to the L2-topology, cf. [5, Sec. 2.7]. However, in order to solve the smoothed MPECs numerically, we will require more smoothness of ϕ𝜖 below. We now approximate (5) by:

$$\displaystyle \begin{aligned} \min\ &\mathcal{J}_{\epsilon}(z) := F(S_{\epsilon}(Bz)) + G(z) \text{ over } z\in Z_{\mathrm{ad}},{} \end{aligned} $$
(14)

Thus, the existence theory, optimality conditions, and numerical methods for (14) reduce to results from PDE-constrained optimization.

Turning to Example 2, we employ a smoothed plus function in (11) characterized by:

$$\displaystyle \begin{aligned} \nabla \phi^{\epsilon}(u) = \left\{\begin{array}{ll} \epsilon^{-1}u-0.5,& \text{ if } u \ge \epsilon,\\ \frac{u^3}{\epsilon^3} - \frac{u^4}{2\epsilon^4},& \text{ if } u \in (0,\epsilon),\\ 0,& \text{ else,} \end{array}\right. \quad \nabla^2 \phi^{\epsilon}(u) = \left\{\begin{array}{ll} \epsilon^{-1},& \text{ if } u \ge \epsilon,\\ \frac{3u^2}{\epsilon^3} - \frac{2u^3}{\epsilon^4},& \text{ if } u \in (0,\epsilon),\\ 0.& \text{ else.} \end{array}\right. \end{aligned} $$
(15)
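For later reference, (15) transcribes directly into code. The following pointwise sketch (the names dphi and d2phi are ours, not from the text) is what enters the residual and Newton matrix below; the slack in (16d) is then ξ𝜖 = ∇ϕ𝜖(−u𝜖), obtained by broadcasting.

```julia
# Pointwise evaluation of the smoothed penalty derivatives in (15).
function dphi(u::Float64, ϵ::Float64)
    u >= ϵ   && return u / ϵ - 0.5
    u >  0.0 && return u^3 / ϵ^3 - u^4 / (2 * ϵ^4)
    return 0.0
end

function d2phi(u::Float64, ϵ::Float64)
    u >= ϵ   && return 1.0 / ϵ
    u >  0.0 && return 3 * u^2 / ϵ^3 - 2 * u^3 / ϵ^4
    return 0.0
end

# e.g., for a nodal vector u and fixed ϵ:  ξ = dphi.(-u, ϵ)
```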

We can then prove that the approximate solution mapping S𝜖 is Fréchet differentiable. Then, for a solution (z𝜖, u𝜖), there exists an adjoint state p𝜖 ∈ V  such that

$$\displaystyle \begin{aligned} \alpha (z_{\epsilon}, w - z_{\epsilon}) + (B^*p_{\epsilon}, w - z_{\epsilon}) &\ge 0, \quad \forall w \in Z_{\mathrm{ad}}{} \end{aligned} $$
(16a)
$$\displaystyle \begin{aligned} A^*p_{\epsilon} - \lambda_{\epsilon} &= u_d - u_{\epsilon},{} \end{aligned} $$
(16b)
$$\displaystyle \begin{aligned} Au_{\epsilon} - \xi_{\epsilon} &= Bz_{\epsilon} + f,{} \end{aligned} $$
(16c)
$$\displaystyle \begin{aligned} \xi_{\epsilon} &= \nabla \phi^{\epsilon}(-u_{\epsilon}),\text{ a.e. } x \in \varOmega {} \end{aligned} $$
(16d)
$$\displaystyle \begin{aligned} \lambda_{\epsilon} &= -\nabla^2 \phi^{\epsilon}(-u_{\epsilon})p_{\epsilon},\text{ a.e. } x \in \varOmega, {} \end{aligned} $$
(16e)

holds; cf. the techniques in [57, 77].

We note here that the case of gradient constraints mentioned in (3) can also be treated using such a penalty method. However, using a standard quadratic penalty/Moreau-Yosida-type approach yields a penalized state equation of the form:

$$\displaystyle \begin{aligned} -\mathrm{div}((1 + \xi_{\epsilon}) \nabla u_{\epsilon}) &= Bz_{\epsilon} + f, \end{aligned} $$
(17a)
$$\displaystyle \begin{aligned} \xi_{\epsilon} &= \frac{1}{\epsilon}\left(1 - \frac{\varPsi}{|\nabla u_{\epsilon}|} \right)_+, \end{aligned} $$
(17b)

which is considerably more challenging due to the bilinear dependence on u𝜖 in the PDE and accompanying non-smooth first-order PDE.

From (16), it is possible to derive C-stationarity conditions and, under further assumptions, even S-stationarity conditions, cf. [39, 40], by passing to the limit in 𝜖. More specifically, we show that a subsequence of stationary points for (16) converges to a point that satisfies C-stationarity for (7). Therefore, if we have a numerical method that solves (16), then by performing continuation on 𝜖 0 we have a means of numerically approximating C-stationary points (or better) for (7). This furnishes a convergence proof in function space for the “outer loop” of the method, depicted in Algorithm 4.1. For the interested reader, we provide a short discussion of the limiting arguments in Detail 5.

Algorithm 4.1 Adaptive penalty method: outer loop

Detail 5 (Sketch of the Limiting Technique)

We first note that (16a) is equivalent to

$$\displaystyle \begin{aligned} z_{\epsilon} = \mathrm{Proj}_{Z_{\mathrm{ad}}}(-\frac{1}{\alpha} B^*p_{\epsilon}).\end{aligned} $$
(18)

Suppose Zad is bounded; then {z𝜖}𝜖>0 is bounded in Z. Let 𝜖k ↓ 0. Then, there exists a subsequence {zl} with \(z_{l} = z_{\epsilon _{k_l}} \stackrel {Z}{\rightharpoonup } z^{\star }\) ∈ Zad. Hence, \(S_{\epsilon _{k_l}}(Bz_{l}) =: u_{l} \stackrel {V}{\to } u^{\star } = S(Bz^{\star })\) by Theorem 4 due to the complete continuity of B. Therefore, \(\xi _{l} \to \xi ^{\star } := Au^{\star } - Bz^{\star } - f \in -\mathcal {N}_{K_0}(u^{\star })\), i.e., u⋆ and ξ⋆ satisfy the state equation and complementarity conditions (9c)–(9f). For the adjoint state p𝜖 and multiplier λ𝜖, we test the adjoint equation (16b) with p𝜖:

$$\displaystyle \begin{aligned}\displaystyle c_1\|p_{\epsilon}\|{}^2_{V} \le \langle A^*p_{\epsilon} - \lambda_{\epsilon},p_{\epsilon}\rangle = (u_d - u_{\epsilon},p_{\epsilon}) \le\\\displaystyle \|u_d - u_{\epsilon}\|{}_{H}\| p_{\epsilon}\|{}_{H} \le c_2 \|u_d - u_{\epsilon}\|{}_{H}\| p_{\epsilon}\|{}_{V} \end{aligned} $$

for scalars c1, c2 > 0, independent of 𝜖. Since \(u_{l} \stackrel {V}{\to } u^{\star }\), {pl} is bounded in V . Hence, there exists \(\{p_{l_m}\}\) with \(p_{l_{m}}\stackrel {V}{\rightharpoonup }p^{\star }\) and, by substitution, \(\lambda _{l_m} \stackrel {V^*}{\rightharpoonup } \lambda ^{\star } = u^{\star } -u_d +A^*p^{\star }\).

Without knowledge of the best possible system, one might stop at this point; however, the theory indicates that a much more refined system is possible. We quickly demonstrate (9g). The remaining conditions (9h), (9i) require lengthy arguments that go beyond the scope of this work. Returning to (16b), it is clear that

$$\displaystyle \begin{aligned} \langle A^*p_{\epsilon},p_{\epsilon}\rangle - (u_d - u_{\epsilon},p_{\epsilon}) = \langle\lambda_{\epsilon},p_{\epsilon}\rangle = -\int_{\varOmega} \nabla^2 \phi^{\epsilon}(-u_{\epsilon})| p_{\epsilon} |{}^2 \mathrm{d}x \le 0. \end{aligned}$$

Since A is symmetric and coercive, the quadratic form v ↦ a(v, v) is convex and continuous, hence weakly lower-semicontinuous. Moreover, due to the Rellich-Kondrachov theorem, both \(u_{\epsilon } \stackrel {H}{\to } u^{\star }\), \(p_{\epsilon } \stackrel {H}{\to }p^{\star }\). Thus,

$$\displaystyle \begin{aligned} \langle \lambda^{\star},p^{\star}\rangle = \langle A^*p^{\star},p^{\star}\rangle - (u_d - u^{\star},p^{\star}) \le \liminf_{m\to +\infty} \langle A^*p_{l_m},p_{l_m}\rangle - (u_d - u_{l_m},p_{l_m}) \le 0.\end{aligned} $$
(19)

In [7], a more general setting allows a similar argument. However in general, e.g., if A is not symmetric, we only have:

$$\displaystyle \begin{aligned} \limsup_{\epsilon \downarrow 0 } \langle\lambda_{\epsilon},p_{\epsilon}\rangle \le 0 \end{aligned}$$

as mentioned at the end of Section 3.3. □

A possible stopping rule in Algorithm 4.1 could be based on the residual of the C-stationarity system (9) or on reaching a minimum value of 𝜖k+1, e.g., machine precision. Obviously, the most computationally demanding part here is step 3. If \(\mathrm {Proj}_{Z_{\mathrm {ad}}}\) is relatively simple to calculate, e.g., for local bilateral constraints, then (16) reduces to

$$\displaystyle \begin{aligned} A^*p_{\epsilon} +\nabla^2 \phi^{\epsilon}(-u_{\epsilon})p_{\epsilon} &= u_d - u_{\epsilon}, \end{aligned} $$
(20a)
$$\displaystyle \begin{aligned} Au_{\epsilon} - \nabla \phi^{\epsilon}(-u_{\epsilon}) &= B\mathrm{Proj}_{Z_{\mathrm{ad}}}(-\frac{1}{\alpha} B^*p_{\epsilon}) + f. \end{aligned} $$
(20b)

One could then solve (20) using, e.g., a semismooth Newton method. Alternatively, an interior point approach for the projection might be employed. We briefly recall the semismooth Newton method in infinite-dimensional spaces as discussed in [19, 38, 79]. Let X, Y  be Banach spaces, D ⊂ X an open subset of X, and \(\mathcal {F}:D \to Y\).

Definition 5

The mapping \(\mathcal {F}:D \subset X \to Y\) is said to be Newton-differentiable on the open subset U ⊂ D, if there exists a family of mappings \(\mathcal {G}:U \to \mathcal {L}(X,Y)\) such that

$$\displaystyle \begin{aligned} \|\mathcal{F}(x+h) - \mathcal{F}(x) - \mathcal{G}(x+h)h\|{}_{Y} = o(\|h\|{}_{X}), \forall x \in U. \end{aligned}$$

\(\mathcal {G}\) is called the Newton derivative for \(\mathcal {F}\) on U. In [38], it is shown that

$$\displaystyle \begin{aligned} \mathcal{G}_{\delta}(y)(x) = \left\{\begin{array}{ccl} 1 & \text{if} & y(x) > 0\\ 0 & \text{if} & y(x) < 0\\ \delta & \text{if} & y(x) = 0 \end{array}\right. \end{aligned} $$
(21)

for every y ∈ X and \(\delta \in \mathbb {R}\) is a Newton derivative of \(\max (0,\cdot )\), under the condition that \(\max (0,\cdot ):L^{p}(\varOmega ) \to L^{q}(\varOmega )\) with 1 ≤ q < p ≤ ∞.

Therefore, if a mapping \(\mathcal {F}\) is Newton-differentiable, then using the concepts above leads to a generalized Newton step for the equation \(\mathcal {F}(x) = 0\), see, e.g., [19, 38].

Theorem 6

Suppose that \(\mathcal {F}(x^*) = 0\) and that \(\mathcal {F}\) is Newton-differentiable on an open neighborhood U of x∗ with Newton derivative \(\mathcal {G}\). If \(\mathcal {G}(x)\) is nonsingular for all x ∈ U and the set \(\left \{||\mathcal {G}(x)^{-1}||{ }_{\mathcal {L}(Y,X)} : x \in U\right \}\) is bounded, then the semismooth Newton iteration

$$\displaystyle \begin{aligned} x_{l+1} = x_l - \mathcal{G}(x_l)^{-1}\mathcal{F}(x_l),\quad l=0,1,2,\ldots \end{aligned} $$
(22)

converges superlinearly to x∗, provided that ||x0 − x∗||X is sufficiently small.
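Before specializing to (20), the mechanics of (21)–(22) can be illustrated by a small, self-contained finite-dimensional example (purely illustrative and not the scheme derived below): we solve F(x) = Ax + max(0, x) − b = 0 by the semismooth Newton iteration, choosing δ = 0 in (21).

```julia
using LinearAlgebra   # for norm

# Semismooth Newton iteration (22) for F(x) = A*x + max.(x, 0) - b in R^n.
# The Newton derivative of max(0,·) is 1 where x > 0 and 0 elsewhere (δ = 0 in (21)).
function ssn(A::Matrix{Float64}, b::Vector{Float64}; maxit = 50, tol = 1e-12)
    x = zeros(length(b))
    for l in 1:maxit
        F = A * x + max.(x, 0.0) - b
        norm(F) < tol && return x, l
        J = copy(A)
        for i in eachindex(x)
            x[i] > 0.0 && (J[i, i] += 1.0)   # add the Newton derivative of max(0,·)
        end
        x -= J \ F
    end
    return x, maxit
end

A = [2.0 -1.0; -1.0 2.0]                     # hypothetical SPD data
b = [1.0, -3.0]
x, its = ssn(A, b)
println("semismooth Newton: $its steps, residual = $(norm(A*x + max.(x, 0.0) - b))")
```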

Under the assumption that \(Z_{\mathrm {ad}} := \left \{v \in L^2(\varOmega ) \left |\; a \le v \le b, \text{ a.e. } x \in \varOmega \right .\right \}\) with a, b ∈ L2(Ω) and a < b, we can derive a semismooth Newton step for the solution of (20) in function space: Fix some \((u,p) \in H^1_0(\varOmega ) \times H^1_0(\varOmega )\) and define the following subsets of Ω:

$$\displaystyle \begin{aligned} \varOmega^a &:= \left\{ x \in \varOmega \left| a(x) + \alpha^{-1}(B^*p)(x) > 0\right.\right\},\\ \varOmega^b &:= \left\{ x \in \varOmega \left| -\alpha^{-1}(B^*p)(x)- b(x) > 0\right.\right\}. \end{aligned} $$

Moreover, let Ωina := Ω ∖ (Ωa ∪ Ωb) (up to a set of Lebesgue measure zero) and define the residual \(\mathcal {F}_{\epsilon }(u,p)\) of (20) by:

$$\displaystyle \begin{aligned} \mathcal{F}^1_{\epsilon}(u,p) &:= Au - \nabla \phi^{\epsilon}(-u) - B\mathrm{Proj}_{Z_{\mathrm{ad}}}(-\frac{1}{\alpha} B^*p) - f,\\ \mathcal{F}^2_{\epsilon}(u,p) &:= A^*p +\nabla^2 \phi^{\epsilon}(-u)p - u_d + u.\end{aligned} $$

Since

$$\displaystyle \begin{aligned} z = \mathrm{Proj}_{Z_{\mathrm{ad}}}(-\alpha^{-1}B^*p) = -\alpha^{-1} B^*p - (-\alpha^{-1}B^*p - b)_+ + (a+\alpha^{-1}B^*p)_+,\end{aligned} $$
(23)

we can use (21) to obtain a Newton derivative \(\mathcal {G}\) for \(\mathcal {F}\):

$$\displaystyle \begin{aligned} \mathcal{G}_{\epsilon}(u,p) = \left[ \begin{array}{cc} A + \nabla^2 \phi^{\epsilon}(-u) & \alpha^{-1}\chi_{\varOmega^{ina}}BB^*\\ I- \nabla^3 \phi^{\epsilon}(-u)p & A^* + \nabla^2 \phi^{\epsilon}(-u)\\ \end{array} \right], \end{aligned}$$

where \(\chi _{\varOmega ^{ina}}\) is the characteristic function for the set Ωina. If (δu, δp) denotes the difference between the new iterate and the current iterate in the semismooth Newton step, then at each iteration, we solve

$$\displaystyle \begin{aligned} \left[ \begin{array}{cc} A + \nabla^2 \phi^{\epsilon}(-u) & \alpha^{-1}\chi_{\varOmega^{ina}}BB^*\\ I- \nabla^3 \phi^{\epsilon}(-u)p & A^* + \nabla^2 \phi^{\epsilon}(-u)\\ \end{array} \right]\left[\begin{array}{c}\delta u\\ \delta p\end{array}\right] = -\mathcal{F}_{\epsilon}(u,p). \end{aligned} $$
(24)

If we can show that \(\mathcal {G}_{\epsilon }(u,p)\) is invertible independently of (u, p) (for fixed 𝜖 > 0) so that the set \(\left \{||\mathcal {G}_{\epsilon }(u,p)^{-1}||: (u,p) \in H^1_0(\varOmega )\times H^1_0(\varOmega ) \right \}\) is bounded, then we are guaranteed to have local superlinear convergence. This leads to Algorithm 4.2.

Algorithm 4.2 Adaptive penalty method: inner loop

We conclude this subsection with a numerical experiment. This is used in part to compare to the non-smooth numerical methods in later sections.

Example 3

Let Ω = (0, 1)², α = 1, b ≡ 0.035, a ≡ 0, and A = −Δ (associated with \(H^1_0(\varOmega )\)). Defining

$$\displaystyle \begin{aligned} y^{\dagger}({\mathbf{x}}_1,{\mathbf{x}}_2) = \left\{\begin{array}{ll} 160({\mathbf{x}}^3_1 - {\mathbf{x}}^2_1 + 0.25{\mathbf{x}}_1)({\mathbf{x}}_2^3 - {\mathbf{x}}^2_2 + 0.25{\mathbf{x}}_2) & \text{in } (0,0.5)\times (0,0.5),\\ 0 & \text{else,} \end{array}\right. \end{aligned}$$
$$\displaystyle \begin{aligned} \xi^{\dagger}({\mathbf{x}}_1,{\mathbf{x}}_2) = \max(0,-2|{\mathbf{x}}_1 - 0.8| - 2|{\mathbf{x}}_1{\mathbf{x}}_2 - 0.3| + 0.5), \end{aligned}$$

we set

$$\displaystyle \begin{aligned} f = -\varDelta y^{\dagger} - y^{\dagger} - \xi^{\dagger},\text{ and } u_d = y^{\dagger} + \xi^{\dagger} - \alpha\varDelta y^{\dagger}. \end{aligned}$$

The example is chosen due to the nontrivial biactive set and overlap of active sets for the control and state constraints. The only change to [40, Exp. 5.1] is the addition of control constraints, which were set to ±∞ there. Concerning the discretization and solution, we use a standard five-point stencil to discretize the negative Laplace operator with finite differences. The problem is solved on a uniform mesh with 512² grid points.

We start the algorithm at (z0, u0, ξ0) = (0, 0, 0). The stopping criterion is based on the L2-norm of the residual with a stopping tolerance of 10⁻⁹. For this example, the 𝜖-update in Algorithm 4.1 proved to be extremely sensitive, meaning that a reasonably aggressive update strategy, e.g., 𝜖k+1 = 𝜖k∕2, failed once \(\epsilon _{k} = \mathcal {O}(10^{-4})\). To be fair, one would normally not cold-start this algorithm on such a fine grid. Opting instead for either an adaptive FEM strategy, a multigrid scheme (as in [40]), or a nested grid strategy (as in [39, 46]) would certainly improve the performance and allow for a more aggressive update strategy for the smoothing parameter. Nevertheless, we see that as the penalized problem approaches the original non-smooth non-convex problem, the nonlinear system becomes increasingly difficult to solve (eventually even failing). See Table 1 for the convergence history and Figure 1 for plots of the solution.
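For completeness, the following Julia sketch shows one way to assemble the five-point-stencil discretization of A = −Δ with homogeneous Dirichlet conditions via Kronecker products; it is illustrative only and not the implementation used for the reported results, and N (the number of interior grid points per direction) is left generic.

```julia
using SparseArrays, LinearAlgebra

# Five-point-stencil discretization of -Δ on the unit square with homogeneous
# Dirichlet boundary conditions; N interior points per direction, lexicographic
# ordering of the unknowns. Illustrative sketch only.
function laplacian2d(N::Int)
    h  = 1.0 / (N + 1)
    T  = spdiagm(-1 => fill(-1.0, N - 1),
                  0 => fill( 2.0, N),
                  1 => fill(-1.0, N - 1))
    Id = sparse(1.0I, N, N)
    return (kron(Id, T) + kron(T, Id)) / h^2
end

A = laplacian2d(64)   # small illustrative size; the experiment in the text uses a much finer grid
```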

Fig. 1

Solution plots for adaptive penalty method Algorithms 4.1 and 4.2, clockwise from upper left: characteristic function \( \chi _{\varOmega ^{0}_{\star }}\), optimal control z, state u, multiplier λ, adjoint p, and multiplier ξ. Notice the nontrivial biactive set for the variational inequality and the upper and lower active sets for the control constraints

Table 1 Convergence history of the adaptive penalty method (Algorithm 4.1) with stopping tolerance tol = 10⁻⁹, starting value 𝜖1 = 1, and update 𝜖k = 2^{−(k−1)} with k = 1, …, 12; l denotes the number of iterations of the inner loop (Algorithm 4.2)

4.2 An ℓ1 Penalty Method

In this subsection, we present a technique originating in the finite-dimensional MPEC literature [3]. The extension to infinite dimensions can be found in [43]. The idea is elegant in its simplicity and allows us to approximate the elliptic MPEC by a sequence of PDE-constrained optimization problems with control and state constraints. Moreover, instead of a semilinear elliptic PDE, as in the previous method, we have a linear elliptic PDE. We begin in the abstract framework of Section 2.5 under the assumption that K is a cone and then pass to the problem in Example 2.

Using an ℓ1-penalty for the condition 〈u, ξ〉 = 0, we approximate (6) by

$$\displaystyle \begin{aligned} \min\ &J(z,u) + \frac{1}{\epsilon}|\langle u,\xi \rangle | \text{ over } (z,u,\xi) \in Z \times V \times V^*, \end{aligned} $$
(25a)
$$\displaystyle \begin{aligned} \text{s.t. } &z \in Z_{\mathrm{ad}},{} \end{aligned} $$
(25b)
$$\displaystyle \begin{aligned} &Au - \xi = Bz + f,{} \end{aligned} $$
(25c)
$$\displaystyle \begin{aligned} &u \in K,\; \xi \in K^{+}.{} \end{aligned} $$
(25d)

By definition of K+, 〈u, ξ〉≥ 0. Hence, for (u, ξ) ∈ K × K+, \(\frac {1}{\epsilon }|\langle u,\xi \rangle | =\frac {1}{\epsilon }\langle u,\xi \rangle \), which yields the smooth objective \(J_{\epsilon }(z,u,\xi ):=J(z,u) + \frac {1}{\epsilon }\langle u,\xi \rangle \).

The analysis for this problem requires several technical results. Nevertheless, under appropriate regularity and boundedness assumptions, one can still show existence of a solution, consistency of the approximation, derive first-order conditions and (after passing to the limit in 𝜖) obtain a (weak) form of C-stationarity. We briefly sketch the ideas here and refer the reader to [43, Section 2] for the detailed technical analysis in the context of Example 2.

Detail 7 (Sketch of Existence and Consistency Arguments)

In addition to the assumptions in Theorem 2, let F be weakly lower-semicontinuous, Zad bounded, and A symmetric.

To show that (25) has a solution, we prove the boundedness of an infimizing sequence {(zk, uk, ξk)} (despite the unboundedness of K+ and the lack of coercivity of J𝜖(z, u, ξ)). Let (z0, u0, ξ0) be feasible for (6). Then, for all sufficiently large k ∈ ℕ,

$$\displaystyle \begin{aligned} J(z_0,u_0) = J_{\epsilon}(z_0,u_0,\xi_0) \ge\\ J_{\epsilon}(z_k,u_k,\xi_k) = F(u_k) + G(z_k) + \frac{1}{\epsilon}\langle u_k, \xi_k\rangle \ge G(z_k) + \frac{1}{\epsilon}\langle u_k, \xi_k\rangle. \end{aligned} $$
(26)

Hence, \(J(z_0,u_0) - G(z_k) \ge \frac {1}{\epsilon }\langle u_k, \xi _k\rangle \ge 0\). Since Zad is bounded, there exists a weakly convergent subsequence \(\{z_{k_l}\}\) with \(z_{k_l} \rightharpoonup z^{\star } \in Z_{\mathrm {ad}}\). Then, by the weak lower-semicontinuity of G, we have

$$\displaystyle \begin{aligned} 0 \le \liminf_{l} \langle u_{k_l}, \xi_{k_l}\rangle \le \limsup_{l} \langle u_{k_l}, \xi_{k_l}\rangle \le \epsilon\left(J(z_0,u_0) - G(z^{\star})\right). \end{aligned}$$

Thus, testing (25c) with \(u_{k_l}\), we first obtain the boundedness of \(\{u_{k_l}\}\) in V and then of \(\{\xi _{k_l}\}\) in V∗. Since A is symmetric, we use an argument as in (19) to prove that (along a further subsequence \(\{k_{l_m}\}\)) \(\liminf _{m} \langle u_{k_{l_m}}, \xi _{k_{l_m}}\rangle \ge \langle u^{\star },\xi ^{\star }\rangle \), where u⋆ and ξ⋆ are the weak limit points. Since K and K+ are weakly closed, u⋆ ∈ K and ξ⋆ ∈ K+. This suffices to prove that (25) has a solution. A proof of consistency of the approximation, i.e., that global optimizers (z𝜖, u𝜖, ξ𝜖) converge as 𝜖 ↓ 0 (at least along a subsequence) to a global optimizer (z⋆, u⋆, ξ⋆) of (6), can be derived analogously using the boundedness of {z𝜖}⊂ Zad and, subsequently, the inequality 𝜖(J(z0, u0) − G(z𝜖)) ≥〈u𝜖, ξ𝜖〉≥ 0. □

As in Section 4.1, we restrict ourselves to the setting of Example 2 for the derivation of stationarity conditions. Here, the derivation of first-order optimality conditions requires the verification of a constraint qualification, in this case that of Robinson-Zowe-Kurcyusz, see [88]. According to Propositions 2.5 and 2.6 in [43], we have the following 𝜖-dependent first-order optimality conditions: For any solution (z𝜖, u𝜖, ξ𝜖) to (25) (under the data assumptions of Example 2), there exists a multiplier-tuple (p𝜖, 𝜗𝜖, τ𝜖) such that

$$\displaystyle \begin{aligned} \alpha (z_{\epsilon}, w - z_{\epsilon}) + (B^*p_{\epsilon}, w - z_{\epsilon}) &\ge 0, \quad \forall w \in Z_{\mathrm{ad}}{} \end{aligned} $$
(27a)
$$\displaystyle \begin{aligned} A^*p_{\epsilon} + \frac{1}{\epsilon}\xi_{\epsilon} - \vartheta_{\epsilon} &= u_d - u_{\epsilon},{} \end{aligned} $$
(27b)
$$\displaystyle \begin{aligned} Au_{\epsilon} - \xi_{\epsilon} &= Bz_{\epsilon} + f,{} \end{aligned} $$
(27c)
$$\displaystyle \begin{aligned} u_{\epsilon} \in K_{0}, \; \xi_{\epsilon} &\in K^+_{0}, {} \end{aligned} $$
(27d)
$$\displaystyle \begin{aligned} \frac{1}{\epsilon}u_{\epsilon} - p_{\epsilon} - \tau_{\epsilon} &= 0, {} \end{aligned} $$
(27e)
$$\displaystyle \begin{aligned} \vartheta_{\epsilon} \in K^+_{0}, \; \langle \vartheta_{\epsilon}, u_{\epsilon} \rangle = 0, \; \tau_{\epsilon} \in K_{0}, \; \langle \xi_{\epsilon}, \tau_{\epsilon} \rangle &= 0. {} \end{aligned} $$
(27f)

In comparison, this system is much larger than (16) and contains additional information. Nevertheless, under certain boundedness assumptions, passing to the limit in 𝜖 yields, in fact, a weaker C-stationarity system, see [43, Thm. 2.9]:

$$\displaystyle \begin{aligned} \alpha (z^{\star}, w - z^{\star}) + (B^*p^{\star}, w - z^{\star}) &\ge 0, \quad \forall w \in Z_{\mathrm{ad}}{} \end{aligned} $$
(28a)
$$\displaystyle \begin{aligned} A^*p^{\star} - \lambda^{\star} &= u_d - u^{\star}, {} \end{aligned} $$
(28b)
$$\displaystyle \begin{aligned} Au^{\star} - \xi^{\star} &= Bz^{\star} + f,{} \end{aligned} $$
(28c)
$$\displaystyle \begin{aligned} u^{\star} \in K_{0}, \; \xi^{\star} \in K^+_{0}, \; \langle u^{\star},\xi^{\star}\rangle &=0, {} \end{aligned} $$
(28d)
$$\displaystyle \begin{aligned} \langle \lambda^{\star}, u^{\star}\rangle = 0, \; \langle p^{\star} , \xi^{\star}\rangle = 0, \; \langle \lambda^{\star},p^{\star} \rangle &\le 0. {} \end{aligned} $$
(28e)

To see that (28e) relates to (9h), (9i), suppose for the sake of argument that ξ⋆, p⋆, λ⋆ are merely vectors of length n and that the conditions in (28e) are understood componentwise (as Hadamard products). Then, by complementarity, ξ⋆ ≥ 0 and \(\xi ^{\star }_i = 0\) whenever i is an inactive or biactive index, so 〈p⋆, ξ⋆〉 = 0 in turn implies that the strongly active components of p⋆ are zero (as in (9h)). The same applies to the inactive components of λ⋆ due to 〈λ⋆, u⋆〉 = 0 (as in (9i)).
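The following toy computation makes the componentwise reading above concrete; the vectors and index labels are purely illustrative and not taken from any of the cited examples.

```python
import numpy as np

# Toy finite-dimensional check of the argument above: reading the products in
# (28e) componentwise, p must vanish on the strongly active indices and
# lambda on the inactive ones.  The vectors are illustrative choices.

u   = np.array([2.0, 0.0, 0.0])   # index 0 inactive, 1 biactive, 2 strongly active
xi  = np.array([0.0, 0.0, 3.0])   # xi >= 0 and xi_i = 0 on inactive/biactive indices
p   = np.array([1.5, -0.5, 0.0])
lam = np.array([0.0, 1.0, -2.0])

# Componentwise complementarity and sign conditions from (28d)-(28e):
assert np.all(u * xi == 0) and np.all(p * xi == 0) and np.all(lam * u == 0)
assert np.all(lam * p <= 0)

strongly_active = (u == 0) & (xi > 0)
inactive = u > 0
print(np.all(p[strongly_active] == 0), np.all(lam[inactive] == 0))  # True True
```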

Though there is a significant gap, the derivation of (28) is related to the convergence of a function-space-based numerical method. Indeed, using known results for linear elliptic PDE-constrained optimization problems with control and state constraints, we have viable and efficient algorithms that can guarantee convergence to a KKT point satisfying (27). By performing continuation on 𝜖, we are then assured of convergence (along a subsequence) to a weak C-stationary point. Furthermore, Theorem 2.12 in [43] provides a more compelling argument.

Theorem 8 (Thm 2.12 [43])

Suppose (z𝜖, u𝜖, ξ𝜖, p𝜖, 𝜗𝜖, τ𝜖) satisfies (27) and that (z𝜖, u𝜖, ξ𝜖) is feasible for (7). Then, (z𝜖, u𝜖, ξ𝜖) is strongly stationary in the sense of Mignot and Puel, i.e., conditions (9h)–(10b) are replaced by

$$\displaystyle \begin{aligned} p &= 0,\mathit{\text{ q.e. }} x \in \varOmega^{0+} {} \end{aligned} $$
(29a)
$$\displaystyle \begin{aligned} \langle \lambda, \phi \rangle &= 0, \forall \phi \in H^1_0(\varOmega): \phi = 0,\mathit{\text{ q.e. }} \varOmega^{0} {} \end{aligned} $$
(29b)
$$\displaystyle \begin{aligned} p &\le 0,\mathit{\text{ q.e. }} x \in \varOmega^{00} {} \end{aligned} $$
(29c)
$$\displaystyle \begin{aligned} \langle \lambda, \phi \rangle &\ge 0, \forall \phi \in H^1_0(\varOmega): \phi \ge 0,\mathit{\text{ q.e. }} \varOmega^{00}, \phi = 0,\mathit{\text{ q.e. }} \varOmega \setminus (I \cup \varOmega^{00}) {} \end{aligned} $$
(29d)

Since we are using an ℓ1 penalty, which often amounts to an exact penalty function, there is a good chance that a stationary point (z𝜖, u𝜖, ξ𝜖) is feasible for (7) for sufficiently small 𝜖.

Algorithm 4.3 ℓ1 penalty method: outer loop

Again we might choose as a stopping criterion the residual of C-stationarity, taking λ := 𝜖−1ξ − 𝜗 and substituting τ = 𝜖−1u − p. Since the theory only guarantees 𝜗 ∈ H−1(Ω), one will need to treat the discrete quantities carefully, cf. [43] for more details. In addition, the solution of the subproblems (25) does not reduce to the solution of a system similar to (20). Therefore, the solution of the subproblems can be more difficult than in the adaptive penalty framework. Nevertheless, the theoretical result in Theorem 8 indicates the potential of this algorithm to generate a better stationary point.

5 Non-Smooth Numerical Methods

In this section, we present several methods that do not require a smoothing or penalization of the original MPEC. We present a new approximate projected subgradient method alongside a direct solver for the C-stationarity system presented in [36] and a recent method from [46] that may serve as a globalization of the direct solver. In all of these methods, we need to solve the variational inequality (1). This can be done using the semismooth Newton methods as in [38, 41] or special monotone multigrid methods as in [55].

5.1 An Approximate Projected Subgradient Method

The subgradient method is perhaps the simplest non-smooth optimization algorithm. Suppose X is a real separable Hilbert space. Given a proper, convex, lower-semicontinuous, and subdifferentiable functional \(f: X \to \mathbb {R}\), x0 ∈ X, g0 ∈ ∂f(x0), and a sequence {νk} with νk > 0, compute a sequence of iterates {xk} according to the rule

$$\displaystyle \begin{aligned} x^{k+1} := x^k - \nu_k g^k, \quad g^{k+1} \in \partial f(x^{k+1}).\end{aligned} $$

Here, and unless otherwise noted below, we would need to apply the Riesz map to gk before using it in this iteration.

Similarly, given a non-empty, closed, and convex subset C ⊂ X the projected-gradient method replaces the previous rule by

$$\displaystyle \begin{aligned} x^{k+1} := \mathrm{Proj}_{C}\, (x^k - \nu_k g^k), \quad g^{k+1} \in \partial f(x^{k+1}).\end{aligned} $$
(30)

The subgradient method was invented by N.Z. Shor in the 1970s, see [76], and although it does not guarantee descent of the objective functional and can be quite slow to converge, it still finds a wide array of applications due to its simplicity and ability to be combined with distributed algorithm techniques, as in, e.g., [78].
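For readers unfamiliar with the method, the following minimal numpy sketch implements iteration (30) for a generic convex model problem (here, minimizing ||Mx − b||_1 over a box); the data, step-size rule, and iteration budget are illustrative choices of ours and not tied to the MPEC setting below.

```python
import numpy as np

# Minimal sketch of the projected subgradient iteration (30) for a toy convex
# problem: minimize f(x) = ||M x - b||_1 over the box C = [0, 1]^n.
# The data M, b, the step sizes, and the iteration budget are illustrative.

rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
b = rng.standard_normal(n)

def subgradient(x):
    # A subgradient of ||M x - b||_1 is M^T sign(M x - b).
    return M.T @ np.sign(M @ x - b)

def proj_C(x):
    # Projection onto the box [0, 1]^n is componentwise clipping.
    return np.clip(x, 0.0, 1.0)

x = proj_C(rng.standard_normal(n))
f_best = np.inf
for k in range(1, 2001):
    g = subgradient(x)                 # g^k in the subdifferential at x^k
    nu = 1.0 / np.sqrt(k)              # diminishing step size nu_k
    x = proj_C(x - nu * g)             # rule (30)
    f_best = min(f_best, np.linalg.norm(M @ x - b, 1))

print("best objective value found:", f_best)
```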

Consider the reduced elliptic MPEC in (5):

$$\displaystyle \begin{aligned} \min\ &\mathcal{J}(z) := F(S(Bz)) + G(z) \text{ over } z\in Z_{\mathrm{ad}}.\end{aligned} $$

Since S is nonlinear, the reduced objective functional is typically non-convex. Therefore, we cannot directly apply the projected subgradient method. However, there exist a number of generalized subdifferentials for non-convex functions. In our case, we will initially make use of the limiting subdifferential (also known as the Mordukhovich subdifferential) for the reduced objective. We will restrict ourselves to the framework of Example 2. First, we recall several definitions from variational analysis, see [66].

Definition 6 (Normal Cones to Arbitrary Sets)

Let X be a Hilbert space and C ⊂ X. Then, the multifunction \(\widehat {\mathcal {N}}_{C}:X \rightrightarrows X^*\) defined by

$$\displaystyle \begin{aligned} \widehat{\mathcal{N}}_{C}(x):= \left\{ x^* \in X^*\left|\; \langle x^*,x' {-} x\rangle_{X} \le o(||x'{-}x||{}_{X}),\; \forall x'\stackrel{X}{\to} x, x' \in C \right.\right\},\quad x\in C,\end{aligned} $$
(31)

and \(\widehat {\mathcal {N}}_C(x):=\emptyset \) for x ∉ C is called the regular (Fréchet) normal cone to C. The multifunction \(\mathcal {N}_{C}:X\rightrightarrows X^*\) defined by

$$\displaystyle \begin{aligned} \mathcal{N}_{C}(x):= \left\{x^*\in X^*\left|\;\exists x_k \stackrel{X}{\to}x,\,\exists x^*_k\stackrel{X^*}{\rightharpoonup} x^*:\,x^*_k\in\widehat{\mathcal{N}}_{C}(x_k),\,\,\forall k\in\mathbb{N}\right.\right\} \end{aligned} $$
(32)

is called the limiting (Mordukhovich) normal cone to C.

Although \(\widehat {\mathcal {N}}_{C}\) is convex, it fails to admit a satisfactory calculus needed for most non-smooth, non-convex problems. In contrast, the limiting normal cone enjoys a robust calculus. Note that for closed convex sets C, both cones agree, and in general \(\widehat {\mathcal {N}}_{C}(x) \subsetneqq \mathcal {N}_{C}(x)\). We will use the limiting normal cone to define a generalized subgradient, needed in part for our proposed numerical method.

Definition 7 (Limiting Subdifferential)

Let X be a Hilbert space, \(\phi :X \to \overline {\mathbb {R}}\), and x ∈ X such that |ϕ(x)| < +∞. The set

$$\displaystyle \begin{aligned} \partial \phi(x) := \left\{ x^* \in X^* \left|\; (x^*,-1)\in \mathcal{N}_{\mathrm{epi}\ \phi}(x,\phi(x)) \right.\right\} \end{aligned} $$
(33)

is called the limiting (Mordukhovich) subdifferential. If |ϕ(x)| = +∞, we set ∂ϕ(x) = ∅.

Therefore, if we know the limiting subdifferential \(\partial \mathcal {J}(z)\) in (5), then we could design a projected subgradient iteration along the lines of (30):

$$\displaystyle \begin{aligned} z^{k+1} := \mathrm{Proj}_{Z_{\mathrm{ad}}}\, (z^k - \nu_k g^k), \quad g^{k+1} \in \partial \mathcal{J}(z^{k+1}). \end{aligned}$$

If \(\mathcal {J}\) were smooth, then \(\partial \mathcal {J}\) would reduce to the gradient of the reduced objective functional, which we usually calculate in PDE-constrained optimization by solving an adjoint equation. Here, however, \(\mathcal {J}\) is non-smooth and non-convex. In order to obtain a generalized adjoint state for the reduced objective functional we require the so-called “coderivatives.”

Definition 8 (Coderivatives)

Let X be a Hilbert space, \(\varPhi :X \rightrightarrows X^*\), and y ∈ Φ(x), i.e., \((x,y)\in \mathop {\mathrm {Graph}}\varPhi \). The regular (Fréchet) coderivative of Φ at (x, y) is the multifunction \(\widehat {D}^*\varPhi (x,y)\colon X^{**}\rightrightarrows X^*\) defined by

$$\displaystyle \begin{aligned} h^*\in\widehat{D}^*\varPhi (x,y)(d^*)\Longleftrightarrow(h^*,-d^*)\in\widehat{\mathcal{N}}_{\mathop{\mathrm{Graph}} \varPhi}(x,y). \end{aligned} $$
(34)

The limiting (Mordukhovich) coderivative DΦ(x, y) of Φ at \((x,y)\in \mathop {\mathrm {Graph}}\varPhi \) is similarly defined by

$$\displaystyle \begin{aligned} h^*\in D^*\varPhi(x,y)(d^*)\Longleftrightarrow(h^*,-d^*)\in \mathcal{N}_{\mathop{\mathrm{Graph}}\varPhi}(x,y). \end{aligned} $$
(35)

For example, if Φ = S𝜖 from Section 4.1, then the coderivatives coincide and we have:

$$\displaystyle \begin{aligned} \widehat D^*S_{\epsilon}(w,u)(w^*)=D^*S_{\epsilon}(w,u)(w^*) = S_{\epsilon}^{\prime}(w)^*w^*, \quad w^*\in X^*, \end{aligned}$$

i.e., \(\widehat D^*S_{\epsilon }(w,u)(w^*)\) yields the usual adjoint state p obtained by solving the associated linear elliptic PDE with − w on the right-hand side.

For the tracking-type objective in Example 2, it was argued in [44, Prop. 1] that

$$\displaystyle \begin{aligned} \partial \mathcal{J}(z) \subset \alpha z+ B^*D^*S(B z,u)(u- u_d). \end{aligned} $$
(36)

Here, it follows from [66, Thm 4.44] that p ∈ DS(Bz, u)(u − ud) is a solution to the generalized adjoint equation:

$$\displaystyle \begin{aligned} A^* p + D^* \mathcal{N}_{K_0}(u,\xi)(p) \ni u_d - u, \end{aligned} $$
(37)

where \(\xi = Bz + f - Au \in \mathcal {N}_{K_0}(u)\). Therefore, assuming that (37) were solvable, we could fashion our projected subgradient method as in Algorithm 5.4.

Algorithm 5.4 Limiting projected subgradient algorithm

Since we have no efficient means of handling \(D^* \mathcal {N}_{K_0}(u^{k+1},\xi ^{k+1})(p)\), Algorithm 5.4 is impractical, especially when we consider that subgradient methods potentially require many iterations even for favorable convex problems. We therefore propose an alternative in Algorithm 5.5. The formal derivation for the approximate generalized adjoint state is based on simple geometric observations for a related finite-dimensional setting.

Consider that the (convex) normal cone \(\mathcal {N}_{K_0}\) is generated by the (convex) subdifferential of the indicator functional \(i_{K_0}\). Since the functionals ϕ𝜖 in (13) converge in a variational sense to \(i_{K_0}\), they provide a viable candidate for approximating elements of \(\mathcal {N}_{K_0}\). Moreover, for any u ∈ V, ∇ϕ𝜖(u) = −𝜖−1(−u)+. Letting 𝜖 ↓ 0, it appears (at least in finite dimensions) that \( \mathop {\mathrm {Graph}} \nabla \phi ^{\epsilon } \to \mathop {\mathrm {Graph}} -\mathcal {N}_{K_0}\), see Figure 2. This behavior transfers to \(\widehat {\mathcal {N}}_{ \mathop {\mathrm {Graph}} -\mathcal {N}_{K_0}}(u,\xi )\) and \(\widehat {\mathcal {N}}_{ \mathop {\mathrm {Graph}} \nabla \phi ^{\epsilon }}(u,\xi )\), cf. Figure 2 with \(\varTheta := \mathop {\mathrm {Graph}} -\mathcal {N}_{\mathbb {R}_+}\) and \(\varLambda := \mathop {\mathrm {Graph}} \nabla \phi ^{0.1}\).

Fig. 2

From left to right: \( \mathop {\mathrm {Graph}} \nabla \phi ^{\epsilon }\) (𝜖 = 1, 0.5, 0.1) versus \( \mathop {\mathrm {Graph}} \mathcal {N}_{ \mathbb {R}_+}\); \(\widehat {\mathcal {N}}_{ \mathop {\mathrm {Graph}} -\mathcal {N}_{ \mathbb {R}_+}}(u,\xi )\) for (u, ξ) = (1, 0), (0, 1), (0, 0); \(\widehat {\mathcal {N}}_{ \mathop {\mathrm {Graph}}\nabla \phi ^{0.1}}(u,\xi )\) for (u, ξ) = (1, 0), (0, 1), (0, 0)

Using the information in Figure 2, we can first calculate the limiting normal cones \(\mathcal {N}_{\varTheta }(u,\xi )\), from which we obtain: \(D^*\mathcal {N}_{\varTheta }(1,0)(p) = \{0\}, \text{ for all } p,\) and

$$\displaystyle \begin{aligned} D^*\mathcal{N}_{\varTheta}(0,1)(p) = \mathbb{R}, \text{ if } p = 0,\end{aligned} $$

otherwise \(D^*\mathcal {N}_{\varTheta }(0,1)(p) = \emptyset \). The most interesting case is:

$$\displaystyle \begin{aligned} D^*\mathcal{N}_{\varTheta}(0,0)(p) = \left\{\begin{array}{ll} \{0\},& \text{ for all } p < 0,\\ \mathbb{R},& \text{ if } p = 0\\ \mathbb{R}_{-},& \text{else.} \end{array}\right.\end{aligned} $$

Similarly, we have for all p

$$\displaystyle \begin{aligned} D^*\mathcal{N}_{\varLambda}(1,0)(p) = \{0\},\quad D^*\mathcal{N}_{\varLambda}(0,1)(p) = -\epsilon^{-1} p;\end{aligned} $$

and

$$\displaystyle \begin{aligned} D^*\mathcal{N}_{\varLambda}(0,0)(p) = \left\{ \begin{array}{ll} \{0\}\cup \{ -\epsilon^{-1} p\},& \text{ for all } p < 0,\\ \{0\}, & \text{ if } p = 0,\\ \left[-\epsilon^{-1}p,0\right] & \text{else.} \end{array} \right. \end{aligned}$$

Though certainly more tractable numerically, \(D^*\mathcal {N}_{\varLambda }\) is still a set-valued mapping with non-convex images. We therefore suggest the following single-valued mapping:

$$\displaystyle \begin{aligned} \widetilde{D}^*\mathcal{N}_{\varLambda}(0,0)(p) := \left\{q : q = -\epsilon^{-1}\chi_{\{u = 0\}}p \right\} \end{aligned}$$

where χ{u=0} is the characteristic function of the active set. This mapping coincides with the limiting coderivative \(D^*\mathcal {N}_{\varTheta }\) on the inactive and strongly active sets, whenever \(D^*\mathcal {N}_{\varTheta }\) is non-empty. On the biactive set, it is either contained in \(D^*\mathcal {N}_{\varTheta }\) or approaches it for 𝜖 ↓ 0, cf. Figure 3.

Fig. 3

Blue: \(D^*\mathcal {N}_{\varTheta }(0,0)(p)\), Red: \(D^*\mathcal {N}_{\varLambda }(0,0)(p)\), Black: \(\widetilde {D}^*\mathcal {N}_{\varLambda }(0,0)(p)\); 𝜖 = 0.5
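The comparison underlying Figures 2 and 3 can also be tabulated directly. The small sketch below evaluates the set-valued coderivative \(D^*\mathcal {N}_{\varTheta }\) at the three reference points together with the single-valued surrogate −𝜖−1χ{u=0}p; the function names and the printed set descriptions are our own.

```python
import numpy as np

# Tabulate the comparison behind Figures 2 and 3: the set-valued coderivative
# D*N_Theta(u,xi)(p) at the three reference points versus the single-valued
# surrogate q = -eps^{-1} * chi_{u=0} * p.  Names and set descriptions are ours.

def coderivative_Theta(u, xi, p):
    """D*N_Theta(u,xi)(p) as a human-readable set description."""
    if u > 0 and xi == 0:                 # inactive point (1,0)
        return "{0}"
    if u == 0 and xi > 0:                 # strongly active point (0,1)
        return "R" if p == 0 else "empty set"
    if u == 0 and xi == 0:                # biactive point (0,0)
        if p < 0:
            return "{0}"
        return "R" if p == 0 else "R_- (nonpositive reals)"
    return "empty set"

def surrogate(u, xi, p, eps):
    """Single-valued surrogate -eps^{-1} chi_{u=0} p used in Algorithm 5.5."""
    return -p / eps if u == 0 else 0.0

eps = 0.5
for (u, xi) in [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]:
    for p in (-1.0, 0.0, 1.0):
        print((u, xi), "p =", p,
              "| D*N_Theta:", coderivative_Theta(u, xi, p),
              "| surrogate:", surrogate(u, xi, p, eps))
```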

By extrapolating these ideas from this simple one-dimensional geometric study to the infinite-dimensional setting, we arrive at our proposed algorithm.

Algorithm 5.5 Approximate projected subgradient algorithm

Remark 4

In fact, − χ{u≤0}p is nothing more than the adjoint of the Newton derivative for the non-smooth Nemytskii operator \((- \cdot )_+ : H^1_0(\varOmega ) \to L^2(\varOmega )\).
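To make the adjoint step of Algorithm 5.5 concrete, the following one-dimensional finite-difference sketch solves (A* + 𝜖−1χ{u≤0})p = ud − u and recovers the multiplier λ from Theorem 9. The grid, the toy data u and ud, and the value of 𝜖 are our own illustrative choices; they are not the data of Example 3.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# 1D finite-difference sketch of the adjoint step behind Algorithm 5.5 and
# Remark 4: solve (A^* + eps^{-1} chi_{u<=0}) p = u_d - u and recover
# lambda = -eps^{-1} chi_{u<=0} p as in Theorem 9.  Grid, data, and eps are
# illustrative choices (not the data of Example 3).

n = 199                              # interior grid points on (0, 1)
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)

# A = negative Laplacian with homogeneous Dirichlet conditions (symmetric).
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc") / h**2

u = np.maximum(0.0, np.sin(2 * np.pi * x))   # toy state; active set where u = 0
u_d = np.sin(np.pi * x)                      # toy desired state
eps = 1e-6

chi = (u <= 0.0).astype(float)               # characteristic fct. of {u <= 0}
M = (A.T + sp.diags(chi / eps)).tocsc()      # A^* + eps^{-1} chi_{u<=0}
p = spla.spsolve(M, u_d - u)                 # approximate generalized adjoint
lam = -chi * p / eps                         # multiplier lambda from Theorem 9

print("||chi * p||_inf =", np.abs(chi * p).max())   # small: p ~ 0 on active set
```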

Using the analytical techniques described throughout the text, we have the next result.

Theorem 9

Suppose that Zad is bounded. Then, any sequence {(zk, uk, ξk, pk, λk)} ⊂ Z × V × V* × V × V* generated by Algorithm 5.5 is bounded. Here,

$$\displaystyle \begin{aligned} \lambda_k := -\frac{1}{\epsilon_k} \chi_{\{ u^{k} \le 0 \}}p^k. \end{aligned}$$

Moreover, any weak accumulation point (z⋆, u⋆, ξ⋆, p⋆, λ⋆) of this sequence will be feasible for (7), and we have

$$\displaystyle \begin{aligned} \lim_{k\to \infty}\int_{\{u^k = 0\}}|p^k|{}^2 \mathrm{d}x = 0,\quad \langle \lambda^\star,p^\star\rangle \le 0, \quad \langle \lambda^\star,u^\star \rangle = 0. \end{aligned} $$
(38)

Proof

Since Zad is bounded, {zk} is bounded in Z. It immediately follows that {uk} is bounded, since

$$\displaystyle \begin{aligned} c\|u^k\|{}^2_V \le \langle Au^k,u^k \rangle - \langle \xi^k, u^k\rangle = \langle Bz^k + f, u^k\rangle \Rightarrow c\|u^k\|{}_V \le \| Bz^k + f \|{}_{V^*} \end{aligned}$$

and ξk = Bzk + f − Auk. Similarly, we obtain the boundedness of {pk} in V and {λk} in V*:

$$\displaystyle \begin{aligned} c\|p^k\|{}^2_V \le \langle A^*p^k,p^k \rangle + \langle \lambda^k, p^k\rangle = \langle u_d - u^k,p^k\rangle \Rightarrow c\|p^k\|{}_V \le \| u_d - u^k\|{}_{V^*} \end{aligned}$$

and λk = ud − uk − Apk. Furthermore, since

$$\displaystyle \begin{aligned} 0 \le \epsilon_k\langle A^*p^k,p^k \rangle + \int_{\{u^k = 0\}}|p^k|{}^2 \mathrm{d}x = \epsilon_k\langle u_d - u^k,p^k\rangle \end{aligned}$$

the limit condition in (38) holds. Moreover, we can again show that 〈λ⋆, p⋆〉≤ 0 using the same argument as in (19), and by definition 〈λk, uk〉 = 0. Since B is compact, uk → u⋆ in V (along a subsequence). This yields 〈λ⋆, u⋆〉 = 0.

The purpose of Theorem 9 is to show that Algorithm 5.5 produces a sequence with a weak accumulation point that satisfies a kind of limiting C-stationarity system. However, the lag in indices prevents us from closing the argument by proving that {zk} fulfils (9) (regardless of whether we choose fixed, bounded, or diminishing step sizes). Nevertheless, we will see later in the bundle-free approach that a variant of our approximate adjoint equation can under certain circumstances yield descent directions for the reduced objective.

We now demonstrate the performance of the algorithm on an example with a nontrivial biactive set: Example 3. In order to assure feasibility at every step, we solve the variational inequality with the primal-dual active set (PDAS)/semismooth Newton method from [38]. Note that although the solver for the variational inequality is mesh dependent, the majority of the linear solves are done within the first four iterations, see Table 2. For a graph of the behavior of the residuals as well as plots of the solution, see Figure 4.

Fig. 4

Clockwise from upper left: residual of strong stationarity for Algorithm 5.5, optimal control z, state u, multiplier λ, adjoint p, and multiplier ξ

Table 2 Outer loop k vs. inner loop iterations “iter” for PDAS with tol = 10−10 used in Algorithm 5.5

We again use a uniform grid with 512 × 512 grid points and start the algorithm at (z0, u0, ξ0) = (0, 0, 0). We choose the a priori step sizes νk := k−1∕2 and update 𝜖k according to 𝜖k := 10−4∕2k. The inner PDAS solver stops once the residual of the non-smooth system of equations reaches a tolerance of 10−10. Though the theory does not provide a stopping criterion, we check the residual of strong stationarity, which reaches \(\mathcal {O}(10^{-9})\) after 30 iterations. The residual is calculated using a discrete approximation of the following quantity:

$$\displaystyle \begin{aligned} \mathrm{res^k} := \| z^{k} - \mathrm{Proj}_{Z_{\mathrm{ad}}}\, (z^k - g^k)\|{}_{L^2} + \|Au^{k} - \xi^{k} - z^{k} - f\|{}_{L^2} + \|\max(0,-u^{k})\|{}_{L^2}+ \\ \|\max(0,-\xi^k)\|{}_{L^2} + |(\xi^k,u^k)_{L^2}| + \|Ap^{k} - \lambda^{k} + u^{k} -u_{d}\|{}_{L^2} + |\max(0, \langle \lambda^k,p^k\rangle)|+ \\ \|\chi_{\varOmega^{0+}_k}p^k\|{}_{L^2} + \|\chi_{\mathcal{I}_k}\lambda^k\|{}_{L^2} + \|\chi_{\varOmega^{00}_k}\max(0,p^k)\|{}_{L^2} + \|\chi_{\varOmega^{00}_k}\max(0,-\lambda^k)\|{}_{L^2}\end{aligned} $$
(39)
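A discrete realization of (39) might look as follows. We assume a uniform 2D grid with mesh size h (so that the L2 norm of a nodal vector is approximated by h times its Euclidean norm), a sparse matrix A, a pointwise projection proj_Zad, and tolerance-based identification of the active, biactive, and inactive sets; g denotes whatever approximate subgradient Algorithm 5.5 uses in its projection step. All of these discretization choices are ours.

```python
import numpy as np

# Sketch of the discrete residual (39).  Assumptions (ours): uniform 2D grid
# with mesh size h, so the L^2 norm of a nodal vector v is approximated by
# h * ||v||_2; A is a sparse matrix; proj_Zad realizes Proj_{Z_ad} pointwise;
# g is the approximate subgradient used in the projection step of
# Algorithm 5.5; the active/biactive sets are detected with a small tolerance.

def residual_39(z, u, xi, p, lam, g, A, f, u_d, proj_Zad, h, tol=1e-8):
    def l2(v):
        return h * np.linalg.norm(v)

    def ip(a, b):
        return h**2 * np.dot(a, b)     # discrete L^2 inner product

    act = u <= tol                     # active set {u = 0} (up to tol)
    strong = act & (xi > tol)          # strongly active set Omega^{0+}
    biact = act & (np.abs(xi) <= tol)  # biactive set Omega^{00}
    inact = ~act                       # inactive set I

    return (l2(z - proj_Zad(z - g))
            + l2(A @ u - xi - z - f)
            + l2(np.maximum(0.0, -u))
            + l2(np.maximum(0.0, -xi))
            + abs(ip(xi, u))
            + l2(A @ p - lam + u - u_d)
            + max(0.0, ip(lam, p))
            + l2(strong * p)
            + l2(inact * lam)
            + l2(biact * np.maximum(0.0, p))
            + l2(biact * np.maximum(0.0, -lam)))
```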

Despite lacking a convergence theory, the algorithm performs very well on this large-scale nontrivial problem. Indeed, counting all the free variables (z, u, ξ, etc.), there are over 10^6 degrees of freedom. Moreover, the presence of biactivity means that the example considered is genuinely non-smooth and non-convex. This is particularly encouraging, as the algorithm clearly outperforms the adaptive penalty method in terms of ease of implementation, size and structure of the systems of linear equations, and order of accuracy (almost reaching even strong stationarity). In particular, we note the “sharpness” of the solution in stark contrast to that obtained with the smooth method.

5.2 A Direct Solver for C-Stationarity Conditions

In this section, we adapt a method from [36] to the canonical MPEC (7). We work in the setting of Example 2 and assume that

$$\displaystyle \begin{aligned} Z_{\mathrm{ad}} := \left\{v \in L^2(\varOmega) \left|\; a \le v \le b, \text{ a.e. } x \in \varOmega \right.\right\} \end{aligned}$$

with a, b ∈ L2(Ω), a < b. We first state the algorithm. Then, we motivate the steps of the algorithm and discuss its convergence properties.

The formal derivation of Algorithm 5.6, which we describe in detail below, follows several basic steps: (1) fix a control and estimate the active and inactive sets; (2) ignoring biactivity, use (28e) to approximate (28b)–(28e) as a system of equations; (3) solve the reduced system (including (28a)) to obtain an update; (4) test the residual of C-stationarity and, if necessary, return to (1).

Algorithm 5.6 An active-set equality-constrained Newton solver with feasibility restoration

More specifically, suppose we are in the infinite-dimensional setting. Assuming that ξ⋆ is sufficiently regular, Equations (28c), (28d) can be understood as the following system of smooth and non-smooth equations:

$$\displaystyle \begin{aligned} Au^{\star} - \xi^{\star} &= Bz^{\star} + f, {} \end{aligned} $$
(40a)
$$\displaystyle \begin{aligned} \xi^{\star} &= (\xi^{\star} - cu^{\star})_+, {} \end{aligned} $$
(40b)

where c > 0 is some scaling constant. Since ξ⋆ ∈ L2(Ω), the Newton derivative for the plus function described at the end of Section 4.1 is not valid in this setting (since both the domain and range here must be taken as L2(Ω)). On a discrete level, this is not an issue. For some fixed z, solving (40) gives a pair (u⋆, ξ⋆) along with active and inactive sets:

$$\displaystyle \begin{aligned} \varOmega^0(u^{\star}) := \left\{x \in \varOmega \left|\; u^{\star}(x) = 0 \right.\right\}, \quad \varOmega^+(u^{\star}) := \left\{x \in \varOmega \left|\; u^{\star}(x) > 0 \right.\right\}.\end{aligned} $$
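On the discrete level, (40) can be solved with the primal-dual active set (semismooth Newton) strategy of [38]. The sketch below does this for a one-dimensional finite-difference discretization with a toy right-hand side standing in for Bz + f; the data and the constant c are illustrative choices of ours.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Primal-dual active set (semismooth Newton) sketch for (40) with a fixed
# control:  A u - xi = b,  xi = max(0, xi - c u).  1D finite differences on
# (0,1); the right-hand side b (standing in for B z + f) and the constant c
# are illustrative choices; see [38] for the method itself.

n = 199
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc") / h**2

b = 50.0 * np.sin(3 * np.pi * x)      # stands in for B z + f
c = 1.0

u, xi = np.zeros(n), np.zeros(n)
active_old = None
for k in range(50):
    active = (xi - c * u) > 0.0       # predicted contact set {u = 0}
    if active_old is not None and np.array_equal(active, active_old):
        break                         # active set settled: PDAS has converged
    idx = np.flatnonzero(~active)     # inactive indices
    u = np.zeros(n)
    A_II = A[idx][:, idx]             # solve A_II u_I = b_I, u = 0 elsewhere
    u[idx] = spla.spsolve(A_II.tocsc(), b[idx])
    xi = A @ u - b                    # multiplier update; xi = 0 on inactive set
    active_old = active

print("PDAS iterations:", k, " min(u) =", u.min(), " min(xi) =", xi.min())
```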

As in our discussion of (28), if we treat the variables u, λ as finite-dimensional vectors and the complementarity condition as the pointwise product of u and λ, then the complementarity condition would indicate that λ = 0 on the inactive set. Analogously, we take p = 0 on the (entire) active set, thus ignoring biactivity. Consequently the remaining sign condition in (28) holds. Finally, using the projection formula (23) along with a semismooth Newton step, we can handle the variational inequality (28a). We recall the sets

$$\displaystyle \begin{aligned} \varOmega^a &:= \left\{ x \in \varOmega \left| a(x) + \alpha^{-1}(B^*p^{\star})(x) > 0\right.\right\},\\ \varOmega^b &:= \left\{ x \in \varOmega \left| -\alpha^{-1}(B^*p^{\star})(x)- b(x) > 0\right.\right\}.\end{aligned} $$

and Ωina := Ω ∖ (Ωa ∪ Ωb).

Now, supposing we want a new approximation (u, z, p) via u = u⋆ + δu, z = z⋆ + δz, and p = p⋆ + δp, we consider the reduced system obtained by eliminating the dual variables:

$$\displaystyle \begin{aligned} \delta z +\alpha^{-1} \chi_{\varOmega^{ina}}B^*\delta p &= \mathrm{Proj}_{Z_{\mathrm{ad}}}(-\alpha^{-1}B^*p^{\star}) - z^{\star} &\text{ on } \varOmega, {} \end{aligned} $$
(41a)
$$\displaystyle \begin{aligned} A^* \delta p + \delta u &= u_d - u^{\star} -A^*p^{\star}, &\text{ on } \varOmega^+(u^{\star}), {} \end{aligned} $$
(41b)
$$\displaystyle \begin{aligned} \delta p &= 0, &\text{ on } \varOmega^0(u^{\star}), {} \end{aligned} $$
(41c)
$$\displaystyle \begin{aligned} A \delta u - B\delta z &= Bz^{\star} + f - Au^{\star}, &\text{ on } \varOmega^+(u^{\star}), {} \end{aligned} $$
(41d)
$$\displaystyle \begin{aligned} \delta u &= 0, &\text{ on } \varOmega^0(u^{\star}), {}\end{aligned} $$
(41e)

If we use (41a) to eliminate δz from (41d), then we get the smaller system in (δu, δp):

$$\displaystyle \begin{aligned} A^* \delta p + \delta u &= u_d - u^{\star} -A^*p^{\star}, &\text{ on } \varOmega^+(u^{\star}), {} \end{aligned} $$
(42a)
$$\displaystyle \begin{aligned} \delta p &= 0, &\text{ on } \varOmega^0(u^{\star}), {} \end{aligned} $$
(42b)
$$\displaystyle \begin{aligned} A \delta u + \alpha^{-1} B \chi_{\varOmega^{ina}}B^*\delta p &= B \mathrm{Proj}_{Z_{\mathrm{ad}}}(-\alpha^{-1}B^*p^{\star}) + f -Au^{\star}, &\text{ on } \varOmega^+(u^{\star}), {} \end{aligned} $$
(42c)
$$\displaystyle \begin{aligned} \delta u &= 0, &\text{ on } \varOmega^0(u^{\star}), {} \end{aligned} $$
(42d)

which is remarkably similar to the semismooth Newton step in the adaptive penalty method. Unlike the smooth methods, however, a proof of convergence remains elusive. The main culprit here is clearly the sequence \(\{\varOmega ^+_k\}\), which need not converge with respect to any notion of set convergence, e.g., Painlevé-Kuratowski or in the sense of characteristic functions.

Note also that this formally derived system is well-defined in function space provided that we have enough regularity. For example, provided that the sets Ω+(u⋆), Ω0(u⋆) are sufficiently regular, we could reduce the search to \((\delta u, \delta p) \in \mathcal {H} \times \mathcal {H}\) with \(\mathcal {H} := H^1_0(\varOmega ^+(u^{\star }))\), solve the associated weak forms of (42a), (42c), and extend the solutions by zero on Ω0(u⋆).
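A finite-dimensional reading of one step of Algorithm 5.6 is sketched below: estimate the control-active sets from the projection formula, restrict (42) to the discrete set Ω+(u⋆), solve the resulting block system, and recover δz from (41a). We take B as the identity embedding, and the tolerances and variable names are our own; this is not the implementation of [36].

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Finite-dimensional sketch of one step of Algorithm 5.6: estimate the
# control-active sets, restrict (42) to the discrete set Omega^+(u*), solve
# the block system, and recover delta_z from (41a).  We take B as the
# identity embedding; tolerances and names are ours (not the code of [36]).

def reduced_newton_step(A, u, p, z, f, u_d, a, b, alpha, tol=1e-8):
    n = u.size
    zbar = np.clip(-p / alpha, a, b)               # Proj_{Z_ad}(-alpha^{-1} B* p)
    chi_ina = ((zbar > a + tol) & (zbar < b - tol)).astype(float)  # Omega^ina

    idx = np.flatnonzero(u > tol)                  # Omega^+(u*): unknowns live here
    A_II = A[idx][:, idx]
    D_II = sp.diags(chi_ina[idx] / alpha)
    Id = sp.identity(idx.size, format="csr")

    # Rows: (42a)  A^T dp + du = r1   and   (42c)  A du + alpha^{-1} chi dp = r2.
    K = sp.bmat([[A_II.T, Id], [D_II, A_II]], format="csc")
    r1 = (u_d - u - A.T @ p)[idx]
    r2 = (zbar + f - A @ u)[idx]
    sol = spla.spsolve(K, np.concatenate([r1, r2]))

    dp, du = np.zeros(n), np.zeros(n)
    dp[idx], du[idx] = sol[:idx.size], sol[idx.size:]
    dz = zbar - z - chi_ina * dp / alpha           # from (41a)
    return du, dp, dz
```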

An obvious stopping criterion for Algorithm 5.6 would be the residual of (28) up to some user-defined tolerance (as suggested in [36]). Moreover, we see that the computational effort is roughly that of the adaptive penalty method. In fact, there is much less nonlinearity here due to a lack of the penalty terms ϕε. We also mention that the MPECs in [36] are much more challenging than (7), as the controls there arise inside the differential operator. Nevertheless, the algorithm seems to perform very well, even on examples with nontrivial biactive sets.

We demonstrate the performance of Algorithm 5.6 on Example 3. As expected, Algorithm 5.6 behaves like a second-order method, see Table 3. We once again used the PDAS/semismooth Newton method in [38] to restore feasibility at every step. Moreover, since the multiplier λ is eliminated from the algorithm, we artificially reintroduce it for the calculation of the residuals.

Table 3 Residuals of strong stationarity for Algorithm 5.6 with stopping tolerance tol = 10−7

The solutions look identical to those plotted in Figure 4. We therefore only provide images of the biactive sets for u and the upper and lower active sets for z. We note that this algorithm also performs quite well, reaching a residual of strong stationarity on the order of \(\mathcal {O}(10^{-8})\) within k = 5 iterations, though it never reaches \(\mathcal {O}(10^{-9})\), in contrast to the approximate projected subgradient algorithm. In addition, the effort to solve each step is higher, as seen in Table 4.

Fig. 5

Plot of solutions using Algorithm 5.6. Left to right: characteristic functions \(\chi _{\varOmega ^{00}_{\star }}\) (set of all indices with \(u^{\star }_i \le \)1e−8 and \(\xi ^{\star }_i \le 0\)), \(\chi _{\varOmega ^{b}_{\star }}\) (set of all indices with \(z^{\star }_i \ge b_i-\)1e−6), and \(\chi _{\varOmega ^{a}_{\star }}\) (set of all indices with \(z^{\star }_i \le a_i+\)1e−6)

Table 4 Outer loop k vs. inner loop “iter” iterations for PDAS with tol = 10−10 used in Algorithm 5.6

Our next non-smooth method seeks to overcome the theoretical deficiencies of Algorithms 5.5 and 5.6. In some sense, it takes a step towards bridging the gap between this active-set-based solver and the approximate projected subgradient method in the previous subsection.

5.3 The Bundle-Free Implicit Programming Method

We now present the bundle-free implicit programming approach from [46], which we extend to include control constraints. In contrast to the active-set method in the previous subsection, this method is based on B-stationarity conditions. We must therefore assume that S is directionally differentiable.

We first state the basic assumptions and the algorithm. Afterwards, we discuss the motivations for the steps and examine the convergence properties. Throughout this subsection, let \(Z_{\mathrm {ad}} := \left \{z \in L^2(\varOmega ) \left |\; a \le z \le b \right .\right \}\) with a, b ∈ L2(Ω) and a < b and assume that the variational inequality is defined as in (7). We otherwise work in the abstract framework of (6), where \( \mathcal {J}(z) := F(S(B z)) + G(z) \) and F and G are continuously Fréchet differentiable. By assumption, ξ ∈ H.

For some constant cq > 0, let q(⋅) := cq∥⋅∥2∕2 and for z ∈ Zad define the local quadratic models:

$$\displaystyle \begin{aligned} \mathcal{M}(\cdot) &:= q(\cdot) + F'(z;S'(Bz; B\cdot)) + G'(z; \cdot),\\ \mathcal{M}_{\epsilon}(\cdot) &:= q(\cdot) + F'(z;d_{\epsilon}(B\cdot)) + G'(z;\cdot) ,\,\,\epsilon > 0, \end{aligned} $$

where d𝜖(⋅) is a smooth approximation of S′(Bz;⋅) such that d𝜖(0) = 0. Here, we suggest letting d𝜖(w) = d, the solution to

$$\displaystyle \begin{aligned} A d + \epsilon^{-1} \chi_{\varOmega^{0+}} d - \chi_{\varOmega^{0}}\nabla^2 \phi^{\epsilon}(-d) = w, \end{aligned}$$

where w ∈ V, 𝜖 > 0, and u = S(Bz). This is related to the true formula for d = S′(Bz;w) given by

$$\displaystyle \begin{aligned} \text{Find } d \in \mathcal{K}(u,\xi): \int_{\varOmega} \nabla d\cdot \nabla [v-d] \mathrm{d}x \ge \langle w, v - d \rangle,\quad \text{ for all } v \in \mathcal{K}(u,\xi), \end{aligned} $$
(43)

where \( \mathcal {K}(u,\xi ) := \left \{ d \in H^1_0(\varOmega ) \left |\; d(x) \ge 0,\; \text{ q.e. } x \in \varOmega ^{0},\; d(x) = 0 \text{ a.e. } x \in \varOmega ^{0+}\right .\right \};\) see [63] (or [15, Chap. 6] for an English summary). As this formula makes use of quasi-continuity and potentially non-negligible sets of Lebesgue measure zero, it is unclear how to make direct use of the nonnegativity condition in a numerical method. To circumvent this issue, a certain regularity condition is assumed throughout (as in [46]):

  1. \(\varOmega ^{0} = \overline {\text{int}(\varOmega ^{0})}\),

  2. If m(Ω00) = 0, then cap(Ω00) = 0 or S′(Bu;Bh) = d = 0 q.e. x ∈ Ω0.

  3. If \(m(\varOmega ^{00}) > 0\), then \(\exists \nu > 0, \exists \gamma ' > 0: \forall \gamma \ge \gamma ',\; d_{\gamma ,\varepsilon } \ge 0\) a.e. on \(\mathcal {A}_{\nu }\setminus \varOmega ^{0}\), where \(\mathcal {A}_{\nu } := \left \{x \in \varOmega \left | \mathrm{dist}(x,\varOmega ^{0}) < \nu \right .\right \}\).

In light of this assumption, we speak of “negligible” biactive sets whenever m(Ω00) = 0. Note that S is in fact Gâteaux differentiable, whenever this condition holds. This once again highlights a fundamental difference between the optimal control of PDEs and variational inequalities, even for the simplest variational inequality of interest.

We also suggest the smooth approximation of the directional derivative in order to guarantee that \(p_{\epsilon } = d^{\prime }_{\epsilon }(0)^*w\) solves

$$\displaystyle \begin{aligned} A^* p + \epsilon^{-1} \chi_{\varOmega^{0+}} p = w \end{aligned}$$

and that {p𝜖}𝜖>0 is uniformly bounded in V . Notice the similarity to the approximating limiting subgradient suggested earlier.

Algorithm 5.7 A bundle-free implicit programming approach

In Algorithm 5.7, \( \mathcal {T}_{Z_{\mathrm {ad}}}(z)\) is the tangent cone to Zad at z ∈ Zad, which is defined by

$$\displaystyle \begin{aligned} \mathcal{T}_{Z_{\mathrm{ad}}}(z) := -\left[ \mathcal{N}_{Z_{\mathrm{ad}}}(z)\right]^{+} = \overline{\mathbb{R}_{+}(Z_{\mathrm{ad}} - z)}. \end{aligned}$$
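For the box constraints a ≤ z ≤ b, the tangent cone acts pointwise (directions must be nonnegative where z = a and nonpositive where z = b, the characterization used later in this subsection), so its L2 projection is a simple clipping operation. The sketch below implements this; the tolerance used to detect the bound-active sets is our own choice.

```python
import numpy as np

# Pointwise projection onto T_{Z_ad}(z) for the box constraints a <= z <= b:
# directions must be >= 0 where z = a and <= 0 where z = b (the pointwise
# characterization used later in this subsection).  The tolerance for
# detecting the bound-active sets is our own choice.

def proj_tangent_cone(v, z, a, b, tol=1e-10):
    w = v.copy()
    on_lower = z <= a + tol            # Omega^a: only nonnegative directions
    on_upper = z >= b - tol            # Omega^b: only nonpositive directions
    w[on_lower] = np.maximum(w[on_lower], 0.0)
    w[on_upper] = np.minimum(w[on_upper], 0.0)
    return w

# Example: delta_z = Proj_{T_{Z_ad}(z)}[-c_q^{-1} grad_M] as in Algorithm 5.7.
z = np.array([0.0, 0.3, 0.8]); a = np.zeros(3); b = 0.8 * np.ones(3)
grad_M = np.array([0.5, -0.2, -0.1]); c_q = 1.0
print(proj_tangent_cone(-grad_M / c_q, z, a, b))   # -> [0.  0.2 0. ]
```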

We use the generalized Armijo line search (cf. [13]): Set \(\tau ^k := \beta ^{m_k} s\), where mk is the first nonnegative integer such that

$$\displaystyle \begin{aligned} \mathcal{J}(z^k) - \mathcal{J}(z^k(\beta^{m} s)) \ge \frac{\sigma}{\beta^{m}s}\|z^k - z^k(\beta^{m} s) \|{}^2_{L^2}, \end{aligned} $$
(46)

where β, σ ∈ (0, 1), s > 0 and, given a step δz, we set

$$\displaystyle \begin{aligned} z(\tau) := \mathrm{Proj}_{Z_{\mathrm{ad}}}(z +\tau \delta z),\; \tau > 0. \end{aligned}$$
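A backtracking realization of the rule (46) might look as follows; the Euclidean norm stands in for the (discretized) L2 norm, and the default parameter values are illustrative. Returning a zero step when the loop exhausts its budget is our way of signalling that a robustification step is needed.

```python
import numpy as np

# Backtracking realization of the generalized Armijo rule (46).  J is the
# reduced objective, proj realizes Proj_{Z_ad}, and the Euclidean norm stands
# in for the (discretized) L^2 norm; parameter defaults are illustrative.

def armijo_step(J, proj, z, delta_z, s=1.0, beta=0.5, sigma=0.1, max_iter=40):
    Jz = J(z)
    tau = s                                          # tau = beta^m * s, m = 0, 1, ...
    for _ in range(max_iter):
        z_new = proj(z + tau * delta_z)              # z^k(tau) = Proj_{Z_ad}(z^k + tau dz)
        # Accept if J(z^k) - J(z^k(tau)) >= (sigma / tau) * ||z^k - z^k(tau)||^2.
        if Jz - J(z_new) >= (sigma / tau) * np.linalg.norm(z - z_new) ** 2:
            return z_new, tau
        tau *= beta
    return z, 0.0                                    # no acceptable step: robustify
```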

Since much of Algorithm 5.7 is derived from B-stationarity conditions, we restate them here:

$$\displaystyle \begin{aligned} \mathcal{J}'(z;w -z) = F'(S(Bz))S'(Bz;B(w-z)) + G'(z)(w-z)\ge 0,\quad \forall w \in Z_{\mathrm{ad}}. \end{aligned} $$
(47)

From (47), it is clear that we can equivalently reformulate B-stationarity as:

$$\displaystyle \begin{aligned} \mathcal{J}'(z;\delta z) \ge 0,\,\,\forall \delta z \in \mathcal{T}_{Z_{\mathrm{ad}}}(z). \end{aligned}$$

Moreover, since \(\mathcal {J}'(z;\delta z)\) is positively homogeneous in δz, it follows that δz = 0 is a minimizer of \(\mathcal {J}'(z;\delta z)\) over \(\mathcal {T}_{Z_{\mathrm {ad}}}(z)\), whenever z is B-stationary. Finally, given that \(\mathcal {T}_{Z_{\mathrm {ad}}}(z)\) is a non-empty, closed, and convex cone, we can add the coercive quadratic form q to the definition of B-stationarity without altering the characterization, cf. the general analysis in [46, Section 2]. Therefore, if z is B-stationary, then 0 ∈ Z solves the auxiliary problem

$$\displaystyle \begin{aligned} \min\left\{ \mathcal{M}(\delta z) := q(\delta z) + \mathcal{J}'(z;\delta z) \text{ over } \delta z \in \mathcal{T}_{Z_{\mathrm{ad}}}(z)\right\}. \end{aligned} $$
(48)

Now, if the biactive set is negligible, then S is in fact Gâteaux differentiable. In this case, (48) has a unique solution given by

$$\displaystyle \begin{aligned} (\delta z^* + c_{q} ^{-1}(B^*S'(Bz)^*\nabla F'(S(Bz)) + \nabla G(z)), w - \delta z^*) \ge 0,\,\,\forall w \in \mathcal{T}_{Z_{\mathrm{ad}}}(z), \end{aligned}$$

which, noting that \( \nabla \mathcal {M}(0) = B^*S'(Bz)^*\nabla F'(S(Bz)) + \nabla G(z)\), is equivalent to the projection formula \(\delta z^* = \mathrm {Proj}_{\mathcal {T}_{Z_{\mathrm {ad}}}(z)}\left [-c_q^{-1} \nabla \mathcal {M}(0)\right ]\) used in Algorithm 5.7. Furthermore, in this smooth setting, we can prove the following result.

Proposition 1

Let z ∈ Zad, u = S(Bz), ξ = Au − Bz − f, and suppose that \(\mathcal {J}\) is Gâteaux differentiable at z. If z is not B-stationary, then (46) stops in a finite number of steps.

Proof

Suppose δz is given by the projection formula in Algorithm 5.7, i.e., δz is the unique global minimizer of (48). Since z is not B-stationary and \(0 \in \mathcal {T}_{Z_{\mathrm {ad}}}(z)\), we have

$$\displaystyle \begin{aligned} q(\delta z) + \mathcal{J}'(z;\delta z) < 0 \Longrightarrow \mathcal{J}'(z;\delta z) \le -\frac{c_q}{2}\|\delta z\|{}^2_{L^2} \end{aligned} $$
(49)

Moreover, for any τ > 0, we have \(z(\tau ) := \mathrm {Proj}_{Z_{\mathrm {ad}}}(z+\tau \delta z)\) and, since \(\mathrm {Proj}_{Z_{\mathrm {ad}}}\) is non-expansive and \(\mathrm {Proj}_{Z_{\mathrm {ad}}}(z) = z\):

$$\displaystyle \begin{aligned} \|z-z(\tau)\|{}_{L^2} = \|\mathrm{Proj}_{Z_{\mathrm{ad}}}(z+\tau \delta z) - \mathrm{Proj}_{Z_{\mathrm{ad}}}(z) \|{}_{L^2} \le \tau \|\delta z\|{}_{L^2}. \end{aligned} $$
(50)

Furthermore, since z ∈ Zad, \(\delta z \in \mathcal {T}_{Z_{\mathrm {ad}}}(z)\), and Zad is defined by simple pointwise bound constraints in L2(Ω), we appeal to the proof of Lemma 6.34 [15], which shows that

$$\displaystyle \begin{aligned} \tau^{-1}(z(\tau)-z) \to \delta z \text{ as } \tau \downarrow 0. \end{aligned} $$
(51)

Finally, suppose that (46) fails for all τ > 0. Then,

$$\displaystyle \begin{aligned} \tau^{-1}(\mathcal{J}(z(\tau)) - \mathcal{J}(z))> -\sigma\tau^{-2}\|z-z(\tau)\|{}^2_{L^2},\quad \forall \tau > 0. \end{aligned}$$

On the right side of the inequality, we can estimate from below using (50):

$$\displaystyle \begin{aligned} -\sigma\tau^{-2}\|z-z(\tau)\|{}^2_{L^2} \ge -\sigma \|\delta z\|{}^2_{L^2} \end{aligned}$$

Now, letting z(τ) = z + ττ−1(z(τ) − z) = z + τdτ, where dτ → δz by (51), we have by (49):

$$\displaystyle \begin{aligned} \tau^{-1}(\mathcal{J}(z(\tau)) - \mathcal{J}(z)) = \tau^{-1}(\mathcal{J}(z + \tau d_{\tau}) - \mathcal{J}(z)) \to \mathcal{J}'(z; \delta z) \le -\frac{c_q}{2}\|\delta z\|{}^2_{L^2}. \end{aligned}$$

But, then

$$\displaystyle \begin{aligned} -\frac{c_q}{2}\|\delta z\|{}^2_{L^2} \ge \mathcal{J}'(z; \delta z) \ge -\sigma \|\delta z\|{}^2_{L^2}, \end{aligned}$$

a contradiction, since σ ∈ (0, cq∕2).

Proposition 1 provides a justification for steps 4:–6: in Algorithm 5.7. Note that the calculation of the gradient \(\nabla \mathcal {M}^k(0)\) requires the solution of an adjoint equation. For example, in the setting of Example 2, \(\nabla \mathcal {M}^k(0) = \alpha z^k - B^*p^k\), where pk solves (here in strong form):

$$\displaystyle \begin{aligned} \begin{array}{rclr} A^* p &=& u_d - u^{k} &\text{ on } \varOmega^+_k,\\ p &=& 0, &\text{ on } \varOmega^0_k, \end{array} \end{aligned} $$
(52)

Turning now to the case when the biactive set is non-negligible, we can easily adjust the proof of Proposition 1 to the non-smooth setting.

Corollary 2

Let z ∈ Zad, u = S(Bz), ξ = Au − Bz − f, and suppose that δz is a minimizer of (48). If z is not B-stationary, then (46) stops in a finite number of steps.

Nevertheless, whenever the biactive set is non-negligible, the directional derivative is nonlinear in δz. In particular, (48) is non-convex, whereas convexity was a key assumption in the arguments above. We therefore need an alternative procedure to calculate the step δz.

In Algorithm 5.7, we suggested the step \( \delta z = \mathrm {Proj}_{\mathcal {T}_{Z_{\mathrm {ad}}}(z^k)}\left [-c_q^{-1} \nabla \mathcal {M}^k_{\epsilon ^k}(0)\right ], \) which is related to the smoothed auxiliary problem using the 𝜖-dependent model \(\mathcal {M}_{\epsilon }\):

$$\displaystyle \begin{aligned} \min\left\{\mathcal{M}_{\epsilon}(\delta z) := q(\delta z) + F'(z;d_{\epsilon}(B\delta z)) + G'(z)\delta z \text{ over } \delta z \in \mathcal{T}_{Z_{\mathrm{ad}}}(z)\right\}. \end{aligned} $$
(53)

Note that if the approximation is chosen so that d𝜖 → S′(Bz;⋅) and 0 solves (53) for all sufficiently small 𝜖 > 0 (or at least along some null sequence), then z must be B-stationary, as 0 solves (48) as well. In fact, in the current setting, A is symmetric, so we can use the approximation results in [7] (discussed above), see also [46], to argue that if δz𝜖 → δz weakly in L2(Ω), then

$$\displaystyle \begin{aligned} d_{\epsilon}(B \delta z_{\epsilon}) \to S'(Bz; B\delta z), \text{ as } \epsilon \downarrow 0, \end{aligned}$$

provided that B is completely continuous, e.g., when B is the embedding of L2(Ω) into H−1(Ω).

On the other hand, if there exists some \(\delta z \in \mathcal {T}_{Z_{\mathrm {ad}}}(z)\) such that \( ( \nabla \mathcal {M}_{\epsilon }(0), \delta z ) < 0, \) then by continuity, there is some η𝜖 > 0 such that \(\mathcal {M}_{\epsilon }(\eta _{\epsilon } \delta z) < 0\). Since \(\mathcal {T}_{Z_{\mathrm {ad}}}(z)\) is a cone, \(\eta _{\epsilon } \delta z \in \mathcal {T}_{Z_{\mathrm {ad}}}(z)\) and 0 does not solve (53). If this persists as 𝜖 ↓ 0, then z cannot be B-stationary.

In light of this, consider our δz update. Let \(w := d^{\prime }_{\epsilon }(0)^*F'(z) + G'(z) = \nabla \mathcal {M}_{\epsilon }(0)\) and note that

$$\displaystyle \begin{aligned} \mathcal{T}_{Z_{\mathrm{ad}}}(z) = \left\{\delta z \in L^2(\varOmega) \left|\; \delta z \ge 0, \text{ a.e. on } \varOmega^{a},\; \delta z \le 0, \text{ a.e. on } \varOmega^{b}\right.\right\}, \end{aligned}$$

where \(\varOmega ^{a} := \left \{x \in \varOmega \left |\; z(x) = a(x) \right .\right \}\), \(\varOmega ^{b} := \left \{x \in \varOmega \left |\; z(x) = b(x) \right .\right \}\) and Ωina := Ω ∖ (Ωa ∪ Ωb). Then, using the basic properties of \(\mathrm {Proj}_{\mathcal {T}_{Z_{\mathrm {ad}}}(z)}\), we have

$$\displaystyle \begin{aligned}\displaystyle (\nabla \mathcal{M}_{\epsilon}(0),\delta z) =\\\displaystyle -c_{q}\left[\int_{\varOmega^{ina}} |\delta z|{}^2 + \int_{\varOmega^{a} \cap \{ -c_q^{-1} w \ge 0\}} |\delta z|{}^2 + \int_{\varOmega^{b}\cap \{-c_q^{-1} w \le 0\}} |\delta z|{}^2\right] \le 0. \end{aligned} $$

Assuming that the latter term is nonzero and expanding \(\mathcal {M}_{\epsilon }\) at zero in direction δz, we obtain:

$$\displaystyle \begin{aligned} \mathcal{M}_{\epsilon}(0 + \eta \delta z) = \eta\left( \frac{\eta c_{q}}{2}\|\delta z\|{}^2_{L^2} + (\nabla \mathcal{M}_{\epsilon}(0),\delta z) + o(1) \right). \end{aligned} $$
(54)

We may then choose a sufficiently small η𝜖 > 0 such that

$$\displaystyle \begin{aligned} \frac{\eta_{\epsilon} c_{q}}{2}\|\delta z\|{}^2_{L^2} + (\nabla \mathcal{M}_{\epsilon}(0),\delta z) + o(1) < 0. \end{aligned}$$

Hence, \(\mathcal {M}_{\epsilon }(\eta _{\epsilon } \delta z) \le 0\) and by definition

$$\displaystyle \begin{aligned} F'(z;d_{\epsilon}(B(\eta_{\epsilon} \delta z))) + G'(z)(\eta_{\epsilon} \delta z) \le - \eta^2_{\epsilon} q(\delta z_{\epsilon}). \end{aligned}$$

Therefore, if a uniform lower bound η > 0 with η𝜖 ≥ η > 0 exists, then we can prove that δz𝜖 is a descent direction for some sufficiently small 𝜖 > 0.

Proposition 2

Let z ∈ Zad, u = S(Bz), and ξ = Au − Bz − f, and suppose that \(\mathcal {J}\) is only directionally differentiable at z. Furthermore, let \( \delta z_{\epsilon } = \mathrm {Proj}_{\mathcal {T}_{Z_{\mathrm {ad}}}(z)}\left [-c_q^{-1} \nabla \mathcal {M}_{\epsilon }(0)\right ] \) and assume that \( \limsup _{\epsilon \downarrow 0} ( \nabla \mathcal {M}_{\epsilon }(0),\delta z_{\epsilon }) < 0. \) If there exists an η > 0 such that \(\mathcal {M}_{\epsilon }(\eta \delta z_{\epsilon }) \le 0\) for all sufficiently small 𝜖 > 0, then there exists some \(\widehat {\epsilon } > 0\) such that \(\delta z_{\widehat {\epsilon }}\) is a descent direction for \(\mathcal {J}\) at z.

Proof

By assumption, F′(z;d𝜖(B(ηδz𝜖))) + G′(z)(ηδz𝜖) ≤ −η2q(δz𝜖). Then,

$$\displaystyle \begin{aligned}\displaystyle \mathcal{J}'(z;\eta \delta z_{\epsilon}) = (\mathcal{J}'(z; \eta \delta z_{\epsilon}) - F'(z;d_{\epsilon}(B(\eta \delta z_{\epsilon}))) - G'(z)(\eta \delta z_{\epsilon})) +\\ {} \displaystyle F'(z;d_{\epsilon}(B(\eta \delta z_{\epsilon}))) + G'(z)(\eta \delta z_{\epsilon} ) \le \\ {} (\mathcal{J}'(z; \eta \delta z_{\epsilon}) - F'(z;d_{\epsilon}(B(\eta \delta z_{\epsilon}))) - G'(z)(\eta \delta z_{\epsilon} )) - \eta^2 q(\delta z_{\epsilon}) =\\ {} F'(z;S'(Bz;B(\eta \delta z_{\epsilon}))) - F'(z;d_{\epsilon}(B (\eta \delta z_{\epsilon}))) - \eta^2 q(\delta z_{\epsilon}) =\\ {} \langle \nabla F(z), S'(Bz;B(\eta \delta z_{\epsilon})) - d_{\epsilon}(B (\eta \delta z_{\epsilon}))\rangle - \eta^2 q(\delta z_{\epsilon}). \end{aligned} $$

Now, since \(\left \{p_{\epsilon }\right \}\) is bounded, we can show that \(\left \{\delta z_{\epsilon }\right \}\) is bounded. Hence, there is a subsequence (denoted still by 𝜖) such that \(\delta z_{\epsilon } \rightharpoonup \delta z^*\). Then, for sufficiently small \(\widehat {\epsilon } > 0\)

$$\displaystyle \begin{aligned} \langle \nabla F(z), S'(Bz;B(\eta \delta z_{\widehat{\epsilon}})) - d_{\widehat{\epsilon}}(B (\eta \delta z_{\widehat{\epsilon}}))\rangle \le \eta^2 q(\delta z_{\widehat{\epsilon}})/2, \end{aligned} $$
(55)

Hence, \( \mathcal {J}'(z; \delta z_{\widehat {\epsilon }}) \le -c_{q}\eta \|\delta z_{\widehat {\epsilon }}\|{ }^2/4, \) as was to be shown.

Since a direct verification of the hypotheses is potentially too expensive from a computational standpoint, we suggest a heuristic. Fix a lower bound \( \underline {\eta } > 0\). If \((\nabla \mathcal {M}_{\epsilon }(0),\delta z) < -q(\delta z)\) and (55) (or an approximation as in [46, Remark 3.13]) holds with η = 1 (or \(\eta = \underline {\eta }\)), we use the current δz in the line search. Thus, the “descent condition” in 9: holds. If \(-q(\delta z) \le (\nabla \mathcal {M}_{\epsilon }(0),\delta z) < 0\), then we choose η > 0 such that \(\eta q(\delta z)+ (\nabla \mathcal {M}_{\epsilon }(0),\delta z) < 0\), set δz := ηδz, update \( \underline {\eta } := \min (\eta , \underline {\eta })\), and check (55) or an approximation. Here, the descent condition also holds, but the model \(\mathcal {M}_{\epsilon }\) might be failing. If (55) fails in the latter, then we go to step 10:. Finally, if \((\nabla \mathcal {M}_{\epsilon }(0),\delta z) = 0\), then we go to step 10: . In practice, one might also attempt to circumvent the verification of (55) by using a sufficiently small 𝜖 > 0 and decreasing at every step.
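The decision logic of this heuristic can be summarized in a few lines. In the sketch below, check_55 stands for a user-supplied test of (55) (or an approximation of it); how that test is parameterized, and how the corner cases are resolved, follows our own reading of the description above.

```python
import numpy as np

# Compact encoding of the step-acceptance heuristic described above.  Here
# q(dz) = (c_q/2)||dz||^2, and check_55(dz, eta) stands for a user-supplied
# test of (55) (or an approximation); its exact parameterization and the
# treatment of corner cases follow our own reading of the text.

def choose_step(grad_M0, dz, c_q, eta_lb, check_55, ip=np.dot):
    slope = ip(grad_M0, dz)                     # (grad M_eps(0), delta z)
    q_dz = 0.5 * c_q * ip(dz, dz)
    if slope < -q_dz and check_55(dz, 1.0):
        return dz, eta_lb, "line search"        # model step accepted as is
    if -q_dz <= slope < 0.0:
        eta = 0.5 * (-slope) / q_dz             # guarantees eta*q(dz) + slope < 0
        eta_lb = min(eta, eta_lb)
        if check_55(eta * dz, eta):
            return eta * dz, eta_lb, "line search"   # scaled step still usable
        return None, eta_lb, "step 10"          # (55) fails: fall back
    return None, eta_lb, "step 10"              # no descent predicted by the model
```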

This heuristic has its limitations, e.g., it cannot prohibit \( \underline {\eta } \downarrow 0\). Therefore, if it appears that this is the case, we break the while loop and go to a “robustification step” in 15:. This just means that if the quadratic model \(\mathcal {M}_{\epsilon }\) appears to be ineffective, then we should calculate a new control z by using an alternative method that is guaranteed to converge and “restart” the algorithm. For instance, we could use the adaptive penalty method in Section 4.1 (for a fixed γ, increasing each time the robustification step is used).

Finally, we recall that in the classical projected-gradient approaches, the Lipschitz continuity of the gradient of the objective is essential for convergence proofs, cf. [13]. There, one can show that the line search will always stop provided that the step size is below a certain threshold which only depends on σ and the Lipschitz modulus L. This is, of course, not possible for the control of variational inequalities as the reduced objective is non-smooth (even when the biactive set is negligible we only have Gâteaux differentiability). Therefore, we must also monitor the behavior of the accepted step sizes at each iteration. If, as with \( \underline {\eta }\), the step sizes τk appear to be rapidly decreasing with each iteration, then we also make use of a robustification step. For a full convergence proof in the case of the canonical MPEC (excluding control constraints), we refer the interested reader to [46].

We conclude by demonstrating the performance of the bundle-free method on two examples, Example 3 and the following example (adapted from [39, Ex. 6.1] by adding control constraints).

Example 4

Here, we let a ≡ 0, b ≡ 0.8 and set α = 1. In addition, we define f and ud as follows:

$$\displaystyle \begin{aligned} z_1({\mathbf{x}}_1) &= -4096{\mathbf{x}}_1^6 + 6144{\mathbf{x}}_1^5 -3072{\mathbf{x}}_1^4 + 512{\mathbf{x}}_1^3,\\ {} z_2({\mathbf{x}}_2) &= -244.140625{\mathbf{x}}_2^6+585.9375{\mathbf{x}}_2^5 -468.75{\mathbf{x}}_2^4+125{\mathbf{x}}_2^3,\\ {} y^*({\mathbf{x}}_1,{\mathbf{x}}_2) &= \left\{\begin{array}{ll} z_1({\mathbf{x}}_1)z_2({\mathbf{x}}_2) & ({\mathbf{x}}_1,{\mathbf{x}}_2) \in ]0,0.5[\times ]0,0.8[,\\ {} 0 & \text{otherwise,} \end{array}\right.\\ u^* &{=} y^*,\quad \xi^*({\mathbf{x}}_1,{\mathbf{x}}_2) {=} 2\max(0,-|{\mathbf{x}}_1 {-} 0.8|{-}|{\mathbf{x}}_1{\mathbf{x}}_2 {-} 0.2|{-}0.3+0.35),\\ f &= -\varDelta y^* - u^* - \xi^*,\quad u_d = y^*+\xi^* -\alpha\varDelta u^*. \end{aligned} $$
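For completeness, the data of Example 4 can be assembled on a uniform grid as follows; the discretization (interior nodes, 5-point Laplacian with homogeneous Dirichlet padding) is our own choice and only meant as a sketch.

```python
import numpy as np

# Assembling the data of Example 4 on a uniform grid.  The discretization
# (interior nodes, 5-point Laplacian with homogeneous Dirichlet padding) is
# our own choice; only the formulas themselves come from the example.

N = 512
h = 1.0 / (N + 1)
x1, x2 = np.meshgrid(np.linspace(h, 1 - h, N), np.linspace(h, 1 - h, N),
                     indexing="ij")

z1 = -4096 * x1**6 + 6144 * x1**5 - 3072 * x1**4 + 512 * x1**3
z2 = -244.140625 * x2**6 + 585.9375 * x2**5 - 468.75 * x2**4 + 125 * x2**3
y_star = np.where((x1 < 0.5) & (x2 < 0.8), z1 * z2, 0.0)
u_star = y_star
xi_star = 2 * np.maximum(0.0, -np.abs(x1 - 0.8) - np.abs(x1 * x2 - 0.2) - 0.3 + 0.35)

def laplacian(v):
    # 5-point stencil; values outside the grid are taken to be zero.
    w = np.pad(v, 1)
    return (w[:-2, 1:-1] + w[2:, 1:-1] + w[1:-1, :-2] + w[1:-1, 2:] - 4 * v) / h**2

alpha = 1.0
f = -laplacian(y_star) - u_star - xi_star
u_d = y_star + xi_star - alpha * laplacian(u_star)
```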

In order to compare to the other methods, we again use a uniform grid with 512 × 512 grid points and start the algorithm at (z0, u0, ξ0) = (0, 0, 0) for Example 3. In contrast, we use a random starting point when solving Example 4. We start the algorithm with 𝜖0 = 10−10 and subsequently set 𝜖k+1 = 𝜖k∕2.

For Example 3 we obtain the same solution as in all the previous algorithms. Likewise, the algorithm performs very well on Example 4, see Table 5 and Figure 6. We note, however, that Example 4 (when starting with a random initial guess) is more difficult to solve. In fact, once the difference in function values becomes negligible, i.e., on the order of \(\mathcal {O}(10^{-13})\), the residual of S-stationarity stagnates at \(\mathcal {O}(10^{-7})\). Finally, one important aspect of the theory for the bundle-free method was the relation \((\nabla \mathcal {M}_{\epsilon }(0),\delta z) < -q(\delta z)\). To see how the choice of 𝜖k influences this, see Table 6. There, we observe that far from the solution, a larger value of 𝜖k seems to yield a good approximation. However, the choice becomes critical once we close in on the solution.

Fig. 6

Clockwise from upper left: characteristic functions \(\chi _{\varOmega ^{b}_{\star }}\) (red) and \(\chi _{\varOmega ^{a}_{\star }} \) (blue) (set of all indices with \(z^{\star }_i \ge b_i-\)1e−6 and \(z^{\star }_i \le a_i+\)1e−6), optimal control z, state u, multiplier λ, adjoint p, and multiplier ξ for Example 4

Table 5 Residuals, step sizes, and number of PDAS iterations (“iter”) for Algorithm 5.7 when used to solve Examples 3 and 4
Table 6 Behavior of Taylor expansion of \(\mathcal {M}_{\epsilon }(0+\delta z_k)\) at each iteration in Algorithm 5.7 when applied to Example 3

6 Conclusion

Despite being a topic of interest for several decades, the optimal control of variational inequalities continues to be an active field of research. The rapidly growing interest in the past decade appears to be a result of the many theoretical, algorithmic, and computational advances in PDE-constrained optimization to date. We have seen here that both reduced space and full-space approaches are possible; however, the difficulties due to either a non-smooth control-to-state mapping or degenerate complementarity constraints persist. The various techniques for deriving optimality conditions in the presence of non-smoothness or degeneracy have led both in finite and infinite dimensions to a hierarchy of first-order optimality conditions. Some of these conditions are directly related to function-space-based numerical methods (C-stationarity) whereas other conditions (e.g., those of Mignot and Puel) are derived using concepts of generalized differentiation and a fine analysis of the regularity properties of the underlying functions and multipliers. These facts should therefore always be taken into account when developing numerical optimization algorithms.

In our numerical studies, we considered two main types of solution methods: smooth and non-smooth. Using approximation techniques for variational inequalities as in [30, 31], the smooth methods are almost always available and allow us to immediately take advantage of existing solvers for (smooth) PDE-constrained optimization problems. For the smoothed/regularized problems, we are only limited by our knowledge of the corresponding parameter-dependent PDE-constrained optimization problem. In our study, we make use of a smooth continuation approach and solve the first-order conditions directly using a semismooth Newton method. For small penalty parameters 𝜖, the solution of the linear system (24) needed to calculate the updates is relatively well-behaved. However, the lower off-diagonal block becomes increasingly problematic as we attempt to approach the limiting problem for 𝜖 ↓ 0. Thus, we eventually pay a major price for smoothing the original problem.

As mentioned in the introduction, we present here several possible non-smooth methods that mirror related approaches in smooth PDE-constrained optimization. The first method is inspired by the classical projected subgradient methods in [54, 76]. Just as in these classical methods, the relative cost of each iteration is roughly as cheap as a standard projected-gradient approach sans line search: solve the VI, then a linear elliptic PDE, and compute a pointwise projection. As with all subgradient methods, the strong convergence statements are essentially limited to convex problems. Nevertheless, the method presented here behaves quite well in practice and thus warrants a deeper study in future research.

The active-set method has been taken from [36] and adapted to the setting of the canonical example. At every iteration, the cost of solving the nonlinear system is slightly more than in the smooth continuation approach due to the feasibility restoration step, which requires the solution of a mixed complementarity problem. However, the conditioning issues are now absent as the potentially problematic perturbation in the lower off-diagonal blocks has been removed. Just as in [36], this method performs exceptionally well, even on a problem with persistent biactivity throughout the iterations. However, as mentioned in the text, proving convergence of this method is rather difficult. One possibility would be to make regularity and monotonicity arguments on the data and active sets throughout the iterations (as was done in [42] for a related problem).

Finally, we considered the bundle-free implicit programming approach, which can be thought of as a globalization of the active-set strategy since we typically choose a step size of τ = 1 at the beginning of every line search. Though the theory does contain several strong assumptions to guarantee unconditional global convergence to a stationary point, there appear to be no other genuinely non-smooth function-space-based algorithms for non-smooth non-convex problems currently in the literature. As noted in [46], if one can prove a kind of semismooth property of the reduced objective, then the convergence theory is greatly simplified. In comparison to the other non-smooth methods, it is clearly more costly due to the usage of the line search. The caveats for this method are the need to monitor the step sizes (essentially restarting if they get too small) and a meaningful heuristic for the convergence criteria. Nevertheless, the theory stands in contrast to non-convex bundle methods as outlined in, e.g., [75]. There, in the ideal non-convex setting, the algorithm terminates at a point at which 0 lies up to some tolerance ε > 0 in a convex hull of certain subgradients corresponding to points yi that are not “far away” from the current iterate xk. The connection to the various MPEC stationarity concepts is therefore unclear.