1 Introduction

In a previous work [10], the first and third authors applied and further developed certain techniques from convex and nonsmooth analysis to derive first-order optimality conditions for a class of bilevel optimization problems known as mathematical programs with equilibrium constraints, or simply MPECs, in function spaces. Such models are known to arise in many application areas such as mathematical elasticity, finance, and economics. Nevertheless, the techniques were only applicable to a certain class of MPECs in which the so-called upper-level variables or controls are not subject to any constraints. In fact, the literature on the derivation of explicit (i.e., multiplier-based) necessary optimality conditions for MPECs in function spaces with upper-level constraints is rather scarce, though some results are available in [9, 16, 17]. We thus aim to present several techniques for the derivation of multiplier-based first-order optimality conditions.

In the literature on optimization problems governed by partial differential equations, regularization/penalization techniques employed for the derivation of necessary optimality conditions are relatively widespread. Conversely, techniques from set-valued and variational analysis provide powerful tools for the direct derivation of multiplier-based optimality conditions. It is currently unclear how these techniques compare, both from an analytical perspective, e.g., the selectivity of the derived conditions and the generality of their applicability, and in terms of numerics, e.g., the development of mesh-independent solvers.

In this paper we are mainly concerned with the following class of MPECs:

$$\begin{aligned} \begin{array}{l} \min J(u,y) := \dfrac{1}{2}||y - y_d||^2_{L^2(\varOmega )} + \dfrac{\alpha }{2}||u||^2_{L^2(\varOmega )} \mathrm{\,\,over\,\,} (u,y) \in L^2(\varOmega )\times H^1_0(\varOmega )\\ \text{ s.t. }\quad u \in U_{ad} := \left\{ w \in L^2(\varOmega ) \left| \, a \le w \le b, \mathrm{\,\, almost\, everywhere\, (a.e.)\, in\,}\varOmega \!\right. \right\} \!,\\ Ay + N_M(y) \ni Bu + f. \end{array}\quad \end{aligned}$$
(1)

Here \(\alpha >0 ,\,\varOmega \subset \mathbb R ^{n}\) with \(1\le n \le 3\) is open and bounded, \(f \in L^2(\varOmega )\), and there exists \(\beta \in \mathbb R \) such that \(b - a \ge \beta > 0\,\,a.e.\,\varOmega \), where \(a,b\in L^{\infty }(\varOmega )\). For \(1\le p \le \infty ,\,L^p(\varOmega )\) represents the standard Lebesgue space. Letting \(C^{\infty }_0(\varOmega )\) be the space of all infinitely differentiable functions with compact support contained in \(\varOmega \) and \(C^{m}(\varOmega )\) the space of all \(m\)-times continuously differentiable functions, we define the Sobolev spaces \(H^1_0(\varOmega )\) and \(H^m(\varOmega ),\,1\le m < \infty \), as the completion of \(C^{\infty }_0(\varOmega )\) and \(C^{m}(\varOmega )\) under the norm \(||\cdot ||_{H^m}\) defined by

$$\begin{aligned} || u ||_{H^m} = \left( \sum _{0 \le |\gamma | \le m}||D^{\gamma }u||^2_2\right) ^{1/2},\quad u \in H^{m}(\varOmega ), \end{aligned}$$

where \(\gamma \) represents the standard multi-index, and \(D^{\gamma } := D^{\gamma _1}_1\cdots D^{\gamma _n}_n\) with \(D^{\gamma _i}_i\) the \(\gamma _i\)th-weak (distributional) partial derivative with respect to the \(i\)th component. Due to the Poincaré inequality, we can endow \(H^1_0(\varOmega )\) with the equivalent norm \(||u||_{H^1_0(\varOmega )} = ||\nabla u||_{L^2(\varOmega )}\), where \(\nabla u\) is the weak gradient of \(u\). We use the symbol \(H^{-1}(\varOmega )\) to represent the dual of \(H^1_0(\varOmega )\). See [1] for more on these and related spaces.

The bounded linear operator \(A \in \mathcal L (H^1_0(\varOmega ),H^{-1}(\varOmega ))\) is assumed to be coercive, i.e., we assume that there exists a constant \(\xi >0\) such that

$$\begin{aligned} \langle Ay,y\rangle _{H^{-1},H^1_0} \ge \xi \Vert y\Vert ^2_{H^1_0} \quad \text{ for } \text{ all } \;y \in H^1_0(\varOmega ) \end{aligned}$$

whereas, unless otherwise stated, \(B\in \mathcal L (L^2(\varOmega ),H^{-1}(\varOmega ))\). Finally, we define the closed and convex subset \(M \subset H^1_0(\varOmega )\) by

$$\begin{aligned} M:= \left\{ y \in H^1_0(\varOmega )\left| \;y \ge 0\,\,a.e.\,\varOmega \!\right. \right\} \!. \end{aligned}$$
(2)

The operator \(N_M(y)\) for \(y\in M\) signifies the classical normal cone of convex analysis defined by

$$\begin{aligned} N_{M}(y) := \left\{ y^* \in H^{-1}(\varOmega ) \left| \langle y^*,y' - y\rangle _{H^{-1},H^1_0} \le 0,\;\forall y' \in M\!\right. \right\} \!. \end{aligned}$$

Accordingly, we could rewrite the generalized equation in (1) as the variational inequality

$$\begin{aligned} \langle Ay - Bu-f,y' - y\rangle _{H^{-1},H^1_0} \ge 0, \quad \forall y '\in M. \end{aligned}$$

In addition, we note that this variational inequality/lower-level problem is directly related to the classical obstacle problem as can be found in [11, 18]. For example, if \(A\) is a second-order differential operator and \(f = A\psi \) with \(\psi \in H^2(\varOmega ), \psi |_{\partial \varOmega } \le 0\), where \(\psi \) represents the obstacle, then the variational inequality is the first-order necessary and sufficient condition for the obstacle problem:

$$\begin{aligned} \min \left\{ \frac{1}{2}\langle Ay,y\rangle _{H^{-1},H^1_0} - (Bu,y)_{L^2} \text { over } y \in H^1_0(\varOmega ) : y \ge \psi ,\,\,a.e.\,\varOmega \right\} \end{aligned}$$
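For illustration only (this sketch is not part of the analysis), the discretized lower-level obstacle problem can be solved by a projected Gauss–Seidel iteration. The sketch assumes the one-dimensional model setting \(\varOmega = (0,1)\) with homogeneous Dirichlet conditions and \(A\) the standard finite-difference Laplacian; the mesh size `h`, load `f`, and obstacle `psi` are illustrative inputs.

```python
import numpy as np

def solve_obstacle_1d(f, psi, h, tol=1e-10, max_iter=5000):
    """Projected Gauss-Seidel for the discretized obstacle problem
    min 1/2 <Ay, y> - <f, y>  s.t.  y >= psi,  y(0) = y(1) = 0,
    where A is the standard finite-difference Laplacian on (0, 1)."""
    n = len(f)
    y = np.maximum(psi, 0.0)                  # feasible starting point
    diag, off = 2.0 / h**2, -1.0 / h**2       # tridiagonal stencil of A
    for _ in range(max_iter):
        y_old = y.copy()
        for i in range(n):
            left = y[i - 1] if i > 0 else 0.0
            right = y[i + 1] if i < n - 1 else 0.0
            # Gauss-Seidel update followed by projection onto {y >= psi}
            y[i] = max((f[i] - off * (left + right)) / diag, psi[i])
        if np.max(np.abs(y - y_old)) < tol:
            break
    return y
```

Where the constraint is inactive the iteration reduces to plain Gauss–Seidel for \(Ay = f\); on the contact set the projection enforces \(y = \psi\).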

The remaining notational assumptions are fairly standard; however, for completeness we provide them here for quick reference. We use \(\langle \cdot ,\cdot \rangle _{X^*,X}\) to represent the duality pairing between a topological vector space \(X\) and its dual \(X^*\) and \((\cdot ,\cdot )_{X}\) for the inner product on \(X\) when \(X\) is a Hilbert space. The arrows \(\stackrel{X}{\rightarrow }\) and \(\stackrel{X}{\rightharpoonup }\) are used to represent, respectively, strong and weak convergence of sequences in \(X\). All the subscripts are omitted when it is clear in context. Furthermore, we recall that the contingent cone to a closed set \(C \subset X\) of a Banach space \(X\) at a point \(x\in C\) is defined by

$$\begin{aligned} T_{C}(x) := \left\{ h \in X \left| \;\exists t_k \rightarrow 0^+,\exists h_k \stackrel{X}{\rightarrow } h: x + t_kh_k \in C,\,\forall k \right. \right\} \end{aligned}$$

and that in the event \(C\) is convex and the space \(X\) is reflexive, the aforementioned normal cone of convex analysis can be defined as the polar (negative dual) cone to \(T_{C}(x)\), i.e.,

$$\begin{aligned} N_{C}(x):= \left[ T_{C}(x)\right] ^{-}_{X} := \left\{ x^* \in X^* \left| \;\langle x^*,h\rangle _{X^*,X} \le 0,\;\forall h \in T_{C}(x)\right. \right\} \!. \end{aligned}$$

Throughout the paper we denote by \(S:H^{-1}(\varOmega )\rightarrow H^1_0(\varOmega )\) the mapping defined by

$$\begin{aligned} S(w) := \left\{ y \in H^1_0(\varOmega ) \left| \;Ay + N_M(y) \ni w + f\!\right. \right\} \!. \end{aligned}$$
(3)

\(S\) is referred to as the solution mapping associated with the variational inequality/generalized equation in our original MPEC (1). This mapping can be easily shown to be single-valued and Lipschitz continuous by utilizing the coercivity of \(A\) and the variational form of the generalized equation in (3); see, e.g., [11] or [4] as well as [10]. Moreover, it is well-known that \(S\) is in fact (Hadamard) directionally differentiable at every \(w \in H^{-1}(\varOmega )\), i.e., the limits

$$\begin{aligned} S'(w;h):=\lim _{\begin{array}{c} t\rightarrow 0^+\\ h' \stackrel{{H^{-1}}}{\longrightarrow } h \end{array}} \frac{S(w + th') - S(w)}{t} \end{aligned}$$

exist for all \(h \in H^{-1}(\varOmega )\).
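To make the Hadamard limit above concrete, consider the toy scalar analogue (not treated in the paper) in which \(A\) is the identity and \(M = [0,\infty)\), so that the solution map reduces to \(S(w) = \max(w,0)\). At \(w = 0\) this map is not Gâteaux differentiable, yet its Hadamard directional derivative exists and equals \(S'(0;h) = \max(h,0)\), in line with the critical-cone characterization (5) specialized to this setting. A minimal numerical check:

```python
def S(w):
    # Solution map of the scalar model  y + N_[0,inf)(y) ∋ w,  i.e. y = max(w, 0)
    return max(w, 0.0)

def hadamard_quotient(w, h, t, perturb):
    # Difference quotient with a perturbed direction h' = h + perturb -> h,
    # mimicking the Hadamard limit t -> 0+, h' -> h
    return (S(w + t * (h + perturb)) - S(w)) / t

# At w = 0 the quotient equals max(h + t, 0) for perturb = t, and hence
# approaches S'(0; h) = max(h, 0) as t -> 0+.
for h in (-1.0, 0.5, 2.0):
    quotients = [hadamard_quotient(0.0, h, t, perturb=t) for t in (1e-2, 1e-4, 1e-6)]
```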

By definition, if \(y = S(w)\), then there exists a \(v\in N_M(y)\) such that \(v = w + f - Ay\). Then by defining the classical critical cone from optimization theory:

$$\begin{aligned} \mathcal K (y,v):= T_M(y)\cap \left\{ v \right\} ^{\bot }\!, \end{aligned}$$
(4)

where

$$\begin{aligned} \left\{ v \right\} ^{\bot } := \left\{ r \in H^1_0(\varOmega ) \left| \langle v,r\rangle _{H^{-1},H^1_0} = 0\right. \right\} \!, \end{aligned}$$

we can directly characterize the graph of \(S'\):

$$\begin{aligned} \mathrm{gph\,}S'(w;\cdot ) = \left\{ (h,d) \in H^{-1}(\varOmega ) \times H^1_0(\varOmega ) \left| Ad + N_{\mathcal{K}(y,v)}(d) \ni h\!\right. \right\} \!. \end{aligned}$$
(5)

This differentiability result is essentially due to Mignot [12], but it was rederived for a broader class of problems in [10]. Furthermore, since the operator \(B\) is linear and bounded from \(L^2(\varOmega )\) into \(H^{-1}(\varOmega )\), we know that

$$\begin{aligned} d = (S \circ B)'(u; h) = S'(Bu;Bh)\Longleftrightarrow Ad + N_{\mathcal{K}(y,v)}(d) \ni Bh. \end{aligned}$$

The reader is referred to [5, Chapter 2.2] for more details on these concepts.

Given the properties of the solution mapping \(S\) to the variational inequality described above, we can reformulate the MPEC (1) as the following nonsmooth optimization problem:

$$\begin{aligned}&\min V(u) := \dfrac{1}{2}|| S(Bu) - y_d||^2_{L^2} + \dfrac{\alpha }{2}||u||^2_{L^2} \text { over } u \in L^2(\varOmega )\nonumber \\&s.t.\quad u \in U_{ad}. \end{aligned}$$
(6)

Following, e.g., [13] or [4], it can be shown that (6) has a solution. Since \(S\) is merely directionally differentiable, (6) cannot be directly analyzed and solved with the same techniques used for the optimal control of partial differential equations (PDE). In particular, the usual methods of deriving optimality conditions for PDE-constrained problems are not available (cf. [20] for some standard techniques).

In what follows, we define various stationarity concepts for MPECs in the current context; these are studied in the subsequent sections.

In contrast to the dual stationarity concepts, the following primal stationarity condition is easily adapted to function spaces provided \(S\) is regular enough. Moreover, it provides good candidates for locally optimal solutions, but it is not always convertible into a multiplier-based system.

Definition 1

(B-stationarity) A feasible point \(\bar{u}\) for the reduced MPEC (6) is referred to as B-stationary provided

$$\begin{aligned} V'(\bar{u};h) \ge 0, \quad \forall h \in T_{U_{ad}}(\bar{u}) \end{aligned}$$

As we do not necessarily restrict ourselves to the more regular settings in which \(S(Bu) \in H^2(\varOmega ) \cap H^1_0(\varOmega )\), we cannot directly use the form of C- and S-stationarity as defined in [8, 9]. We will see in Sect. 4 that in such situations more information is available and, therefore, a more refined stationarity system can be derived.

Definition 2

(C- and S-stationarity) A point \((\bar{u},\bar{y}) \in L^2(\varOmega ) \times H^1_0(\varOmega )\) feasible to the MPEC (1) is called a C-stationary point of the MPEC if there exist multipliers \(\bar{s} \in L^2(\varOmega ),\,\bar{v} \in H^{-1}(\varOmega ),\,\bar{p} \in H^1_0(\varOmega )\), and \(\bar{r} \in H^{-1}(\varOmega )\) for which

$$\begin{aligned} 0&= \alpha \bar{u} + B^*\bar{p} + \bar{s},\end{aligned}$$
(7)
$$\begin{aligned} 0&= \bar{y}-y_d - A^*\bar{p} + \bar{r}, \end{aligned}$$
(8)
$$\begin{aligned} 0&= A\bar{y} - B\bar{u} - f + \bar{v}, \end{aligned}$$
(9)

where the multipliers satisfy the following conditions:

$$\begin{aligned} 0&\ge \bar{s},\,\,a.e.\,\mathcal A _a(\bar{u}),\quad \bar{s} = 0,\,\,a.e.\,\mathcal J (\bar{u}),\quad \bar{s} \ge 0\,\,a.e.\,\mathcal A _{b}(\bar{u}),\end{aligned}$$
(10)
$$\begin{aligned} 0&\ge \langle \bar{v}, \varphi \rangle _{H^{-1},H^1_0},\,\,\forall \varphi \in H^1_0(\varOmega ): \varphi \ge 0\,\,a.e.\,\varOmega ,\end{aligned}$$
(11)
$$\begin{aligned} 0&= \langle \bar{v}, \bar{y}\rangle _{H^{-1},H^1_0},\end{aligned}$$
(12)
$$\begin{aligned} 0&= \langle \bar{v},\bar{p}\rangle _{H^{-1},H^1_0},\end{aligned}$$
(13)
$$\begin{aligned} 0&= \langle \bar{r},\bar{y}\rangle _{H^{-1},H^1_0},\end{aligned}$$
(14)
$$\begin{aligned} 0&\ge \langle \bar{r},\bar{p}\rangle _{H^{-1},H^1_0}. \end{aligned}$$
(15)

Here, we use the notation

$$\begin{aligned} \mathcal A (\bar{y}):= \left\{ x \in \varOmega \left| \; \bar{y}(x) = 0\right. \right\} \text { and }\;\mathcal I (\bar{y}) := \varOmega {\setminus }\mathcal A (\bar{y}) \end{aligned}$$

to represent the active and inactive sets for the lower-level problem, respectively, and

$$\begin{aligned} \mathcal A _a(\bar{u})&:= \left\{ x \in \varOmega \left| \bar{u}(x) = a(x)\right. \right\} \!,\\ \mathcal A _b(\bar{u})&:= \left\{ x \in \varOmega \left| \bar{u}(x) = b(x)\right. \right\} \!,\\ \mathcal J (\bar{u})&:= \varOmega {\setminus }(\mathcal A _a(\bar{u})\cup \mathcal A _b(\bar{u})) \end{aligned}$$

for the lower active, upper active, and inactive sets for the control constraints, respectively.

If in addition to the above conditions we have \(\bar{v}\in L^2(\varOmega )\) and

$$\begin{aligned} 0&\le \bar{p}\;a.e.\; \mathcal B ,\end{aligned}$$
(16)
$$\begin{aligned} 0&\le \langle \bar{r},\varphi \rangle _{H^{-1},H^1_0},\,\,\forall \varphi \in H^1_0(\varOmega ): \varphi \ge 0,\,\,a.e.\,\mathcal B \text { and } \varphi = 0,\,\,a.e.\,\mathcal A (\bar{y}){\setminus }\mathcal B , \qquad \quad \end{aligned}$$
(17)

then \((\bar{u},\bar{y})\) is said to be an S(trong)-stationary point, where the notation

$$\begin{aligned} \mathcal B := \left\{ x \in \mathcal A (\bar{y})\left| \;\bar{v}(x) = 0\right. \right\} \!, \end{aligned}$$

is used to denote the so-called bi-active set.

We note that (17) could also be defined when \(\bar{v} \in H^{-1}(\varOmega )\). In this case, one has

$$\begin{aligned} 0 \le \langle \bar{r},\varphi \rangle ,\,\,\forall \varphi \in H^1_0(\varOmega ): \langle \bar{v},\varphi \rangle = 0 \text { and } \varphi \ge 0,\text { q.e. }\mathcal A (\bar{y}), \end{aligned}$$

where “q.e.” stands for quasi-everywhere, see e.g. [13].

The terms C-stationarity and S-stationarity are originally attributed to Scheel and Scholtes [19], where the “C” reflects the fact that notions from Clarke’s nonsmooth calculus were used in the derivation process. Since only the product of the multipliers \(\bar{r}\) and \(\bar{p}\) has a sign, C-stationarity conditions are not KKT conditions in the classical sense. In infinite dimensions, S-stationarity conditions were first derived in [13] for the class of problems with \(U_{ad} = L^2(\varOmega )\). To date, these results have not been extended to the case where \(U_{ad}\) is a proper convex subset of \(L^2(\varOmega )\).

In a function space context, Outrata, Jarušek and Stará in [16] and [17] successfully applied elements of the limiting variational calculus, see Sect. 3, to problems similar to ours. Unfortunately, in the presence of control constraints these results are only applicable when \(\varOmega \subset \mathbb{R }\) and the controls \(u\) belong to \(H^{-1}(\varOmega )\). These conditions are similar to a finite-dimensional concept known as M-stationarity, where the “M” refers to the limiting variational calculus largely developed by the second author.

The rest of the paper is structured as follows. In Sect. 2 we derive primal first-order optimality conditions similar to the B-stationarity conditions mentioned above. In Sect. 3 we define certain notions from the limiting variational calculus and then apply these concepts to our class of MPECs. We then verify the necessary qualification conditions and derive new limiting stationarity conditions. Section 4 is devoted to a hybrid derivation method utilizing the results from [10] for MPECs without upper-level constraints by penalizing the control constraints with a smooth penalty function. In Sect. 5 we recall a penalization-regularization method extended to elliptic MPECs by the first author and I. Kopacka in [9] and establish its important consequences.

2 B-Stationarity

In this brief section, we establish B-stationarity for a locally optimal solution of the MPEC (1). It can be observed from the proof that it is certainly possible to work with more general objective functionals \(J\) than the tracking-type functional.

Theorem 1

(B-stationarity of an optimal solution) Let \((\bar{u},\bar{y})\) be a locally optimal solution to the original MPEC (1). Then the following optimality condition holds

$$\begin{aligned} \alpha ( \bar{u},h)_{L^2} + (\bar{y} - y_d,d)_{L^2} \ge 0,\forall (h,d) \in \left[ T_{U_{ad}}(\bar{u})\times H^1_0(\varOmega )\right] \cap \mathrm{gph\,}S'(B\bar{u};B\cdot ).\nonumber \\ \end{aligned}$$
(18)

Equivalently, if \((\bar{u},\bar{y})\) is a locally optimal solution to the MPEC (1), then the origin in \(L^2(\varOmega )\times H^1_0(\varOmega )\) is a solution to the following MPEC

$$\begin{aligned} \begin{array}{l} \min \,\, \alpha ( \bar{u},h)_{L^2} + (\bar{y} - y_d,d)_{L^2}\quad \text { over }\quad (h,d) \in L^2(\varOmega )\times H^1_0(\varOmega )\\ s.t.\quad h \in T_{U_{ad}}(\bar{u}), Ad + N_{\mathcal{K}(\bar{y},\bar{v})}(d) \ni Bh. \end{array} \end{aligned}$$
(19)

Proof

Throughout the proof, we refer to the reduced MPEC (6), from which we recall that the mapping \(V:L^2(\varOmega ) \rightarrow \mathbb{R }\) is directionally differentiable and Lipschitz continuous. Next we modify the nonsmooth problem (6) one step further to

$$\begin{aligned} \min V(u) + I_{U_{ad}}(u) \text { over } u \in L^2(\varOmega ), \end{aligned}$$
(20)

where \(I_{U_{ad}}\) is the indicator function of \(U_{ad}\). Given an arbitrary locally optimal solution \(\bar{u}\) to problem (20), observe that the corresponding pair \((\bar{u},\bar{y})\) is a locally optimal solution to the original MPEC (1), and vice versa. Moreover, it can be argued (see e.g. [3] Chapter 6.1.3) that the following condition must hold

$$\begin{aligned} \liminf _{\begin{array}{c} t \rightarrow 0^+\\ h^{\prime }\rightarrow _{L^2} h \end{array}} \frac{ V(\bar{u} + th^{\prime }) - V(\bar{u}) + I_{U_{ad}}(\bar{u} + th^{\prime }) - I_{U_{ad}}(\bar{u})}{t} \ge 0,\quad \forall h \in L^2(\varOmega ).\qquad \end{aligned}$$
(21)

Continuing, we first note that if \(h \in L^2(\varOmega )\) but \(h \notin T_{U_{ad}}(\bar{u})\), then there exist no sequences \(t_k \rightarrow 0^+\) and \(h_k \rightarrow _{L^2} h\) such that \(\bar{u} + t_k h_k \in U_{ad}\) for all \(k\). Thus, for such \(h\) the limit inferior in (21) is equal to \(+\infty \). Suppose now that \(h\in T_{U_{ad}}(\bar{u})\). Then by definition there exist sequences \(t_k\rightarrow 0^+\) and \(h_k \rightarrow _{L^2} h\) such that \(\bar{u}+t_k h_k \in U_{ad}\). For such sequences, the difference quotients in (21) reduce to

$$\begin{aligned} \frac{ V(\bar{u} + t_kh_k) - V(\bar{u})}{t_k}. \end{aligned}$$

Then by using the directional differentiability and the fact that \(V\) is Lipschitz continuous (and therefore \(V'(\bar{u};\cdot )\) as well), we further reduce the difference quotients to

$$\begin{aligned} \frac{ V(\bar{u} + t_kh_k) - V(\bar{u})}{t_k} = \frac{ V(\bar{u}) + t_kV'(\bar{u};h_k) + o(t_k) - V(\bar{u})}{t_k} = V'(\bar{u};h_k) + \frac{o(t_k)}{t_k}, \end{aligned}$$

which implies in turn that

$$\begin{aligned} V'(\bar{u};h) \ge 0,\quad \forall h \in T_{U_{ad}}(\bar{u}). \end{aligned}$$

The final step of the proof requires us to compute the derivative \(V'(\bar{u};h)\). By definition, we need to calculate the following limit:

$$\begin{aligned} \lim _{t \rightarrow 0^+} \frac{\frac{1}{2}||S(B(\bar{u} + th)) - y_d||^2_{L^2} + \frac{\alpha }{2}||\bar{u} + th||^2_{L^2} - \frac{1}{2}||S(B\bar{u}) - y_d||^2_{L^2} -\frac{\alpha }{2}||\bar{u}||_{L^2}^2}{t}. \end{aligned}$$

We observe first that

$$\begin{aligned} \frac{\frac{\alpha }{2}||\bar{u} + th||^2_{L^2} - \frac{\alpha }{2}||\bar{u}||^2_{L^2}}{t} = \alpha (\bar{u},h)_{L^2} + \frac{\alpha t}{2}||h||^2_{L^2}. \end{aligned}$$

Similarly, we reduce the remaining terms (using the directional differentiability of \(S\)) to

$$\begin{aligned}&\frac{\frac{1}{2}||S(B\bar{u} + tBh) - y_d||^2_{L^2} - \frac{1}{2}||S(B\bar{u}) - y_d||^2_{L^2} }{t}\\&\quad =\left( S(B\bar{u}) - y_d,S'(B\bar{u};B h) + \frac{o(t)}{t}\right) _{L^2} + \frac{t}{2}\left\| S'(B\bar{u};B h) + \frac{o(t)}{t}\right\| ^2_{L^2}. \end{aligned}$$

Then by adding the reduced terms and passing to the limit, we obtain the equality

$$\begin{aligned}&\lim _{t \rightarrow 0^+} \frac{\frac{1}{2}||S(B(\bar{u}+t h)) - y_d||^2_{L^2} + \frac{\alpha }{2}||\bar{u} + t h||^2_{L^2} - \frac{1}{2}||S(B\bar{u}) - y_d||^2_{L^2} -\frac{\alpha }{2}||\bar{u}||_{L^2}^2}{t}\\&\quad = \alpha (\bar{u},h)_{L^2} + (S(B\bar{u}) - y_d,S'(B\bar{u};B h))_{L^2}, \end{aligned}$$

which completes the proof of the theorem via substitution. \(\square \)
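The identity obtained at the end of the proof, \(V'(\bar{u};h) = \alpha(\bar{u},h)_{L^2} + (S(B\bar{u}) - y_d, S'(B\bar{u};Bh))_{L^2}\), can be checked against one-sided difference quotients in the scalar toy model with \(A = B = \mathrm{id}\) and \(S(w) = \max(w,0)\). The sketch below is a purely numerical illustration (the values of `alpha` and `y_d` are arbitrary) and not part of the argument.

```python
alpha, y_d = 0.5, 1.0
S = lambda w: max(w, 0.0)  # scalar solution map of the toy model

def V(u):
    # Reduced objective of the toy model, cf. (6) with B = id
    return 0.5 * (S(u) - y_d) ** 2 + 0.5 * alpha * u ** 2

def Sprime(u, h):
    # Hadamard directional derivative of S(w) = max(w, 0)
    if u > 0:
        return h
    if u < 0:
        return 0.0
    return max(h, 0.0)

def Vprime(u, h):
    # Formula derived in the proof: V'(u;h) = alpha*(u,h) + (S(u) - y_d, S'(u;h))
    return alpha * u * h + (S(u) - y_d) * Sprime(u, h)

# One-sided difference quotients agree with Vprime, including at the kink u = 0
for u in (-0.7, 0.0, 1.3):
    for h in (-1.0, 1.0):
        fd = (V(u + 1e-7 * h) - V(u)) / 1e-7
        # fd ≈ Vprime(u, h)
```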

Theorem 1 shows that if \(\bar{u}\in U_{ad}\) is such that \(T_{U_{ad}}(\bar{u}) = L^2(\varOmega )\), then the S-stationarity conditions can be rederived without major difficulties provided the operator \(B\) is surjective or the identity on \(L^2(\varOmega )\).

By directly adapting the proof of [5, Lemma 6.34], we obtain the following description:

$$\begin{aligned} T_{U_{ad}}(\bar{u}) = \left\{ h \in L^2(\varOmega ) \left| \begin{array}{l} h \ge 0,\,\,\text {a.e. on }\mathcal A _{a}(\bar{u})\\ h \le 0,\,\,\text {a.e. on }\mathcal A _{b}(\bar{u}) \end{array}\right. \right\} . \end{aligned}$$

Therefore, it can be argued that if the Lebesgue measure of the active set \(\mathcal A _{a}(\bar{u})\cup \mathcal A _b(\bar{u})\) equals zero, then \(T_{U_{ad}}(\bar{u}) = L^2(\varOmega )\). Thus, even though \(U_{ad}\) has an empty interior in \(L^2(\varOmega )\), there exist admissible points such that the tangent cone is equal to the entire space.
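In a discretized setting, the above description of \(T_{U_{ad}}(\bar{u})\) reduces to pointwise sign checks on the active sets. The following sketch (a hypothetical helper operating on nodal values, not taken from the paper) tests membership of a direction \(h\) in the tangent cone:

```python
import numpy as np

def tangent_cone_membership(u, h, a, b, tol=1e-12):
    """Pointwise test of h in T_{U_ad}(u) for box constraints a <= u <= b,
    following the a.e. description: h >= 0 on {u = a}, h <= 0 on {u = b}."""
    A_a = np.abs(u - a) <= tol      # lower active set
    A_b = np.abs(u - b) <= tol      # upper active set
    return bool(np.all(h[A_a] >= -tol) and np.all(h[A_b] <= tol))
```

In particular, if neither active set is hit (up to the tolerance), every direction passes the test, mirroring the observation that \(T_{U_{ad}}(\bar{u}) = L^2(\varOmega )\) when the active set has measure zero.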

Under a fairly restrictive assumption, it is easy to derive the following corollary from Theorem  1, which yields a dual form of B-stationarity.

Corollary 1

(Dual form of B-stationarity) Let \((\bar{u},\bar{y})\) be a locally optimal solution to the MPEC (1), where \(\bar{u} \in U_{ad}\) is such that \(S'(B\bar{u};\cdot ) =: \Sigma _{\bar{u}}(\cdot )\) is a bounded linear operator from \(H^{-1}(\varOmega )\) into \(H^{1}_0(\varOmega )\). Then the following optimality condition holds

$$\begin{aligned} \alpha ( \bar{u},h)_{L^2} + (B^*\Sigma _{\bar{u}}^*(\bar{y} - y_d),h)_{L^2} \ge 0,\forall h \in T_{U_{ad}}(\bar{u}), \end{aligned}$$

which in dual form is equivalent to the inclusion

$$\begin{aligned} 0 \in \alpha \bar{u} + B^*\Sigma _{\bar{u}}^*(\bar{y} - y_d) + N_{U_{ad}}(\bar{u}) \end{aligned}$$

or equivalently the variational inequality

$$\begin{aligned} (\alpha \bar{u} + B^*\Sigma _{\bar{u}}^*(\bar{y} - y_d),u'- \bar{u})_{L^2} \ge 0,\,\,\forall u' \in U_{ad}. \end{aligned}$$

In order to obtain workable KKT-type optimality conditions in the case where \(S'(B\bar{u};B\cdot )\) is not a bounded linear operator, we would need to calculate the following polar cone:

$$\begin{aligned} \left[ \left( T_{U_{ad}}(\bar{u})\times H^1_0(\varOmega )\right) \cap \mathrm{gph\,}S'(B\bar{u};B\cdot )\right] ^{-}_{L^2\times H^1_0}. \end{aligned}$$

Unfortunately, it appears to be a difficult, if not impossible, task. Thus the need for a different set of more constructive tools for the derivation of dual conditions (in both finite and infinite dimensions) is evident.

3 Dual optimality conditions via limiting variational calculus

We first recall several definitions and concepts from variational analysis and generalized differentiation. Our main source is the two-volume monograph [14, 15]. Throughout the following section, unless otherwise noted, all spaces will be assumed to be Hilbert spaces. Nevertheless, we stress that these objects along with the accompanying results can be defined/proved in much more general settings.

Definition 3

(The Regular/Fréchet Normal Cone) Let \(C\subset X\). Then the multifunction \((\)set-valued mapping\()\, \widehat{N}_{C}:X \rightrightarrows X^*\) defined by

$$\begin{aligned} \widehat{N}_{C}(x):= \left\{ x^* \in X^*\left| \;\limsup _{\begin{array}{c} x'\stackrel{X}{\rightarrow } x\\ x' \in C \end{array}} \frac{ (x^*,x' - x)_{X}}{||x'-x||_{X}} \le 0\right. \right\} ,\quad x\in C, \end{aligned}$$
(22)

and \(\widehat{N}_C(x):=\emptyset \) for \(x\notin C\) is called the regular/Fréchet normal cone to \(C\).

Unfortunately, the convex cone \(\widehat{N}_{C}\) does not admit a satisfactory calculus. This restricts the scope of applications of (22), in particular, for deriving multiplier-based optimality conditions. The situation changes significantly when we apply an appropriate limiting procedure to the mapping \(\widehat{N}_C(\cdot )\).

Definition 4

(The Limiting/Mordukhovich Normal Cone) Let \(C\subset X\). The multifunction \(N_{C}:X\rightrightarrows X^*\) defined by

$$\begin{aligned} N_{C}(x):= \left\{ x^*\in X^*\left| \;\exists x_k \stackrel{X}{\rightarrow }x,\,\exists x^*_k\stackrel{X^*}{\rightharpoonup } x^*:\,x^*_k\in \widehat{N}_{C}(x_k),\,\,\forall k\in \mathbb{N }\right. \right\} \end{aligned}$$
(23)

is called the limiting/Mordukhovich normal cone to \(C\).

If the set \(C\) is convex, both cones (22) and (23) agree with the normal cone of convex analysis, otherwise \(\widehat{N}_{C}(x) \subsetneqq N_{C}(x)\) in general.

Next we define the notions of coderivatives for set-valued (in particular, single-valued) mappings generated by the corresponding normal cones (22) and (23).

Definition 5

(Coderivatives) Let \(\varPhi :X \rightrightarrows Y\) be a set-valued mapping between (paired) reflexive Banach spaces \(X\) and \(Y\), and let \((x,y)\in \mathrm{gph\,}\varPhi \). The regular/Fréchet coderivative of \(\varPhi \) at \((x,y)\) is the multifunction \(\widehat{D}^*\varPhi (x,y):Y^*\rightrightarrows X^*\) defined by

$$\begin{aligned} h^*\in \widehat{D}^*\varPhi (x,y)(d^*)\Longleftrightarrow (h^*,-d^*)\in \widehat{N}_{\mathrm{gph\,}\varPhi }(x,y). \end{aligned}$$
(24)

The limiting/Mordukhovich coderivative \(D^*\varPhi (x,y)\) of \(\varPhi \) at \((x,y)\in \mathrm{gph\,}\varPhi \) is similarly defined by

$$\begin{aligned} h^*\in D^*\varPhi (x,y)(d^*)\Longleftrightarrow (h^*,-d^*)\in N_{\mathrm{gph\,}\varPhi }(x,y). \end{aligned}$$
(25)

We observe from (22)–(25) that the limiting coderivative (25) admits the following representation, cf. [14, Corollary 2.36]:

$$\begin{aligned}&h^*\in D^*\varPhi (x,y)(d^*)\nonumber \\&\quad \Longleftrightarrow \exists x_k \stackrel{X}{\rightarrow }x,\,\,\, \exists y_k\stackrel{Y}{\rightarrow }y,\,\,\, \exists d^*_k\stackrel{Y^*}{\rightharpoonup }d^*,\,\,\, \exists h^*_k\stackrel{X^*}{\rightharpoonup }h^*: h^*_k\in \widehat{D}^*\varPhi (x_k,y_k)(d^*_k).\nonumber \\ \end{aligned}$$
(26)

If in (26) we replace “\(d^*_k\stackrel{Y^*}{\rightharpoonup }d^*\)” by the condition “\(d^*_k\stackrel{Y^*}{\rightarrow }d^*\)”, then the corresponding construction \(D^*_M\varPhi (x,y)\) is known as the mixed coderivative of \(\varPhi \) at \((x,y)\in \mathrm{gph\,}\varPhi \).

If \(\varPhi :X\rightarrow Y\) is strictly differentiable at \(x\), e.g. \(C^1\) around this point, with derivative \(\varPhi '(x)\), then all three coderivatives reduce to the adjoint derivative operator

$$\begin{aligned} \widehat{D}^*\varPhi (x)(y^*)=D^*\varPhi (x)(y^*)=D^*_M\varPhi (x)(y^*)=\big \{\varPhi '(x)^*y^*\big \},\quad y^*\in Y^*, \end{aligned}$$

where \(y=\varPhi (x)\) is omitted due to single-valuedness. In general these coderivative mappings are positively homogeneous in \(y^*\) with full calculi for \(D^*\varPhi \) and \(D^*_M\varPhi \) and a rather restrictive one for \(\widehat{D}^*\varPhi \). For mappings between infinite-dimensional spaces the aforementioned calculus rules require appropriate “normal compactness” conditions. The weakest ones among such conditions are given in the next definition.
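For strictly differentiable maps, this collapse to the adjoint derivative can be observed numerically: the Jacobian transpose applied to \(y^*\) coincides with difference quotients of the scalarization \(\langle y^*, \varPhi(\cdot)\rangle\). The map in the sketch below is a hypothetical smooth example, used only for illustration.

```python
import numpy as np

def Phi(x):
    # A smooth (hence strictly differentiable) map R^2 -> R^2
    return np.array([x[0] ** 2 + x[1], x[0] * x[1]])

def jacobian(x):
    # Exact Jacobian Phi'(x)
    return np.array([[2 * x[0], 1.0],
                     [x[1],     x[0]]])

def coderivative(x, y_star):
    # For strictly differentiable Phi, regular, limiting, and mixed
    # coderivatives all reduce to the singleton {Phi'(x)^T y*}
    return jacobian(x).T @ y_star
```

Componentwise, `coderivative(x, y_star)[i]` is the partial derivative of \(x \mapsto \langle y^*, \varPhi(x)\rangle\) in the \(i\)th coordinate, which is what the finite-difference comparison below verifies.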

Definition 6

(Sequential normal compactness) Let \(\varPhi :X \rightrightarrows Y\) be a set-valued mapping between (paired) spaces \(X\) and \(Y\), and let \((x,y)\in \mathrm{gph\,}\varPhi \). We say that \(\varPhi :X\rightrightarrows Y\) is sequentially normally compact (SNC) at \((x,y) \in \mathrm{gph\,}\varPhi \) if for any collection of sequences \(\{x_k\} \subset X,\,\{y_k\} \subset Y,\,\{x^*_k\} \subset X^*\), and \(\{y^*_k \} \subset Y^*\) satisfying

$$\begin{aligned} x_k \stackrel{X}{\rightarrow } \bar{x},\quad y_k \stackrel{Y}{\rightarrow } \bar{y},\quad x^*_k \stackrel{X^*}{\rightharpoonup } 0,\quad y^*_k \stackrel{Y^*}{\rightharpoonup } 0, \text { with } y^*_k \in \widehat{D}^*\varPhi (x_k,y_k)(x^*_k), \end{aligned}$$

it follows that \(||x^*_k||_{X^*}\rightarrow 0\) and \(||y^*_k||_{Y^*}\rightarrow 0\). If the requirement that \(y^*_k \stackrel{Y^*}{\rightharpoonup } 0\) above is replaced by \(||y^*_k||_{Y^*} \rightarrow 0\), then \(\varPhi \) is said to be partially sequentially normally compact (PSNC) at \((x,y)\).

Besides finite-dimensional settings, the SNC and PSNC properties automatically hold when the involved mappings satisfy certain Lipschitz-like properties. Moreover, they are preserved under various compositions, see [14].

Definition 7

(The Aubin property) Let \(\varPhi :X \rightrightarrows Y\) be a set-valued mapping between (paired) spaces \(X\) and \(Y\), and let \((x,y)\in \mathrm{gph\,}\varPhi \). We say that \(\varPhi \) has the Aubin property or is Lipschitz-like/Pseudo-Lipschitz at \((x,y)\) if there are neighborhoods \(\mathcal U \) of \(x\) and \(\mathcal V \) of \(y\) together with a constant \(L >0\) such that

$$\begin{aligned} \Vert y-y'\Vert _{Y}\le L\Vert x - x'\Vert _{X},\,\,\forall (x,y),(x',y')\in \left[ \mathcal{U }\times \mathcal{V }\right] \cap \mathrm{gph\,}\varPhi . \end{aligned}$$
(27)

It immediately follows from (27) that for single-valued mappings \(\varPhi :X\rightarrow Y\), the Aubin property reduces to the classical local Lipschitz continuity. Moreover, the coderivative criterion from [14, Theorem 4.10] asserts that a closed-graph mapping \(\varPhi :X\rightrightarrows Y\) has the Aubin property around \((x,y)\in \mathrm{gph\,}\varPhi \) if and only if it is PSNC at this point and the injectivity condition “\(D^*_M\varPhi (x,y)(0)=\{0\}\)” holds.

For convenience, we restate the MPEC (1) in compact form:

$$\begin{aligned}&\min \dfrac{1}{2}||y - y_d||^2_{L^2(\varOmega )} + \dfrac{\alpha }{2}||u||^2_{L^2(\varOmega )} \text { over } (u,y) \in L^2(\varOmega )\times H^1_0(\varOmega )\nonumber \\&\quad s.t.\quad u \in U_{ad},\quad y = S(Bu). \end{aligned}$$
(28)

Our first result provides a necessary optimality condition for the MPEC (1) in terms of the limiting coderivative of \(S\) and the convex normal cone to the control set \(U_{ad}\).

Proposition 1

(Limiting optimality conditions for the MPEC) Let \((\bar{u},\bar{y})\) be a locally optimal solution to the MPEC (1). Then we have

$$\begin{aligned} 0 \in \alpha \bar{u} + B^*D^* S(B\bar{u},\bar{y})(\bar{y} - y_d) + N_{U_{ad}}(\bar{u}). \end{aligned}$$
(29)

Proof

As argued in the introduction, \(S\circ B\) is Lipschitz continuous from \(L^2(\varOmega ) \rightarrow H^1_0(\varOmega )\). Thus, it follows from [14, Theorem 4.10] that

1. \(S \circ B \) is partially sequentially normally compact (PSNC) at \((\bar{u},\bar{y})\),

2. \(D^*_M (S\circ B)(\bar{u},\bar{y})(0)=\left\{ 0\right\} \).

The two properties above, together with the smoothness of the cost functional, guarantee that the PSNC and qualification assumptions of the necessary optimality conditions for abstract MPECs established in [15, Theorems 5.33 and 5.34] are satisfied. Applying these results to (28), it follows that

$$\begin{aligned} 0 \in \alpha \bar{u} + D^*(S\circ B)(\bar{u},\bar{y})(\bar{y} - y_d) + N_{U_{ad}}(\bar{u}). \end{aligned}$$

Finally, it follows from the calculus result of [14, Corollary 3.16] that

$$\begin{aligned} D^*(S\circ B)(\bar{u},\bar{y})(\bar{y} - y_d) \subset B^* D^*S(B\bar{u},\bar{y})(\bar{y} - y_d), \end{aligned}$$

and therefore, the asserted optimality condition (29) holds. \(\square \)

Remark 1

(Regularity of the optimal control) We mention that if \(U_{ad} = L^2(\varOmega )\), then \(N_{U_{ad}}(\bar{u}) = \{0\}\). Moreover, if \(B\) acts as the identity on \(L^2(\varOmega )\), then \(B^*y^* \in H^1_0(\varOmega )\) for all \(y^* \in D^* S(B\bar{u},\bar{y})(\bar{y} - y_d)\). Thus it follows from Proposition 1 that the optimal solution \(\bar{u}\) enjoys an increased regularity in this case. Observe that the above arguments can be easily extended to more general situations.

The remaining part of this section is dedicated to the explicit characterization of the coderivative in the necessary optimality condition of Proposition 1. Developing this derivation technique, we arrive at multiplier-based optimality conditions for the original MPEC.

We start by first observing the following description of the coderivative in (29) in light of (26):

$$\begin{aligned}&D^*S(B\bar{u},\bar{y})(\bar{y} - y_d) = \left\{ \bar{p}^* \in H^1_0(\varOmega )\left| \exists y^*_k \stackrel{{H^{-1}}}{\longrightarrow } B\bar{u}, \exists y_k \stackrel{{H^1_0}}{\rightarrow } \bar{y},\right. \right. \\&\quad \left. \left. \exists q^*_k \stackrel{{H^{-1}}}{\mathrm{- \!\!\! \rightharpoonup }} \bar{y} - y_d, \exists p^*_k \stackrel{{H^1_0}}{\rightharpoonup } \bar{p}^*: p^*_k \in \widehat{D}^* S(y^*_k,y_k)(q^*_k),\forall k\right. \right\} . \end{aligned}$$

By simply referring to the definition of the regular coderivative (24), we know that the previous equation can be understood as

$$\begin{aligned} D^*S(B\bar{u},\bar{y})(\bar{y} - y_d) = \left\{ \bar{p}^* \in H^1_0(\varOmega )\left| \exists y^*_k \stackrel{{H^{-1}}}{\longrightarrow } B\bar{u}, \exists y_k \stackrel{{H^1_0}}{\rightarrow } \bar{y},\right. \right. \\\left. \left. \exists q^*_k \stackrel{{H^{-1}}}{\mathrm{- \!\!\! \rightharpoonup }} \bar{y} - y_d, \exists p^*_k \stackrel{{H^1_0}}{\rightharpoonup } \bar{p}^*:(p^*_k,-q^*_k) \in \widehat{N}_{\mathrm{gph\,}S} (y^*_k,y_k),\forall k\right. \right\} \!. \end{aligned}$$

Using [14, Theorem 1.10], we approximate the limiting coderivative of \(S\) by replacing \(\widehat{N}_{\mathrm{gph\,}S} (y^*_k,y_k)\) with the larger polar contingent cone \(\left[ T_{\mathrm{gph\,}S}(y^*_k,y_k)\right] ^{-}\). Note that the contingent cone to the graph of \(S\) coincides with the graph of the so-called contingent derivative of \(S\); see [3]. In the current setting with \(S\) being single-valued, Lipschitz continuous and Hadamard directionally differentiable, the contingent derivative coincides with the Hadamard directional derivative. It was shown in the proof of [10, Theorem 4.6] that

$$\begin{aligned} (p_k^*,-q^*_k) \in \left[ T_{\mathrm{gph\,}S}(y^*_k,y_k)\right] ^{-}\Longleftrightarrow p^*_k \in \mathcal K (y_k,v_k),\,\, A^*p^*_k - q^*_k \in \left[ \mathcal K (y_k,v_k)\right] ^{-}\!, \end{aligned}$$

where \(v_k\in N_M(y_k)\) such that \(v_k = y^*_k + f - Ay_k\), and where \(\mathcal K (y_k,v_k)\) is the critical cone (4). This leads to the following characterization of the coderivative.

Proposition 2

(A characterization via the critical cone) Let \(v_k := y^*_k + f - Ay_k \in N_M(y_k)\). Then elements of the limiting coderivative of the solution map (3) are described by

$$\begin{aligned}&\bar{p}\in D^*S(B\bar{u},\bar{y})(\bar{y} - y_d)\\&\quad \Longrightarrow \exists y^*_k \stackrel{{H^{-1}}}{\longrightarrow } B\bar{u},\quad \exists y_k \stackrel{{H^1_0}}{\rightarrow } \bar{y},\quad \exists q_k \stackrel{{H^{-1}}}{\mathrm{- \!\!\! \rightharpoonup }} \bar{y} - y_d,\quad \exists p_k \stackrel{{H^1_0}}{\rightharpoonup } \bar{p}:\\&\quad p_k \in \mathcal K (y_k,v_k) \text { and }A^*p_k - q_k \in \left[ \mathcal K (y_k,v_k)\right] ^{-}. \end{aligned}$$

Before establishing our next result, we point out a simple fact concerning the convergence of normal cone mappings to closed convex sets. Suppose that \(X\) is a Banach space and \(C \subset X\) is a closed convex subset. Let \(x_k \in C\) be such that \(x_k \stackrel{X}{\rightarrow } x\) and let \(z_k \in N_C(x_k)\) be such that \(z_k \stackrel{X^*}{\rightharpoonup } z\). Then by the definition of the normal cone we have \(\langle z_k,x'-x_k\rangle \le 0\) for all \(x' \in C\). For an arbitrary element \(x' \in C\), it follows that \(\langle z_k,x' - x_k\rangle \rightarrow \langle z,x' - x\rangle \) and thus \(z \in N_C(x)\). We make use of this property in the proof of the following proposition.
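The closedness property just described admits a simple finite-dimensional illustration: in a Hilbert space, \(z \in N_C(x)\) if and only if \(P_C(x+z) = x\), where \(P_C\) denotes the metric projection. The following Python sketch (a hypothetical finite-dimensional analogue with \(C\) a box in \(\mathbb{R }^2\); it is not part of the function-space analysis) checks the limit property along explicit sequences:

```python
import numpy as np

def proj_box(x, lo, hi):
    # Euclidean projection onto the box [lo_i, hi_i]
    return np.clip(x, lo, hi)

def in_normal_cone(z, x, lo, hi, tol=1e-12):
    # For a closed convex set C in a Hilbert space,
    # z lies in N_C(x) iff P_C(x + z) = x; here C is a box.
    return np.allclose(proj_box(x + z, lo, hi), x, atol=tol)

lo, hi = np.zeros(2), np.ones(2)

# x_k -> x = (0,0), z_k -> z = (0,-1), with z_k in N_C(x_k) for every k:
xs = [np.array([1.0 / k, 0.0]) for k in range(1, 200)]
zs = [np.array([0.0, -1.0 - 1.0 / k]) for k in range(1, 200)]
assert all(in_normal_cone(z, x, lo, hi) for x, z in zip(xs, zs))

# ...and the limit pair satisfies z in N_C(x), as the closedness fact predicts.
x_bar, z_bar = np.zeros(2), np.array([0.0, -1.0])
assert in_normal_cone(z_bar, x_bar, lo, hi)
```

Since weak and strong convergence coincide in finite dimensions, the sketch only illustrates the mechanism, not the infinite-dimensional subtlety.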

Proposition 3

(Limits of the polar critical cones) Let \(r_k \in \left[ \mathcal K (y_k,v_k)\right] ^{-}\) with

$$\begin{aligned} r_k \stackrel{{H^{-1}}}{\mathrm{- \!\!\! \rightharpoonup }} \bar{r},\quad y_k \stackrel{{H^1_0}}{\rightarrow } \bar{y},\quad v_k \stackrel{{H^{-1}}}{\longrightarrow } \bar{v} \end{aligned}$$

and \(v_k\in N_M(y_k)\). Then one of the following alternatives holds:

$$\begin{aligned} \bar{r}&\in N_M(\bar{y}),\end{aligned}$$
(30)
$$\begin{aligned} \bar{r}&\in N_M(\bar{y}) - \bar{\varepsilon }\bar{v},\,\,\bar{\varepsilon } > 0,\end{aligned}$$
(31)
$$\begin{aligned} 0&= \langle \bar{r}, \bar{y}\rangle _{H^{-1},H^1_0}. \end{aligned}$$
(32)

Proof

Given that the polar (negative dual) cone of the intersection of two closed convex cones \(C,D\) equals the closure of the sum \([C]^{-}+[D]^{-}\), we have

$$\begin{aligned} \left[ T_M(y_{k}) \cap \left\{ v_{k}\right\} ^{\bot }\right] ^{-} = \mathrm{cl\,}\left\{ N_M(y_{k}) + \mathbb{R } v_{k}\right\} _{H^{-1}} = \mathrm{cl\,}\left\{ N_M(y_{k}) - \mathbb{R }_+ v_{k}\right\} _{H^{-1}}. \end{aligned}$$

It follows then that for each \(k \in \mathbb{N }\) there exists a sequence \(r^k_l \in N_M(y_k) - \mathbb{R }_{+}v_k\) such that \(r^k_{l} \stackrel{{H^{-1}}}{\rightarrow } r_k\) as \(l \rightarrow +\infty \). Then for each \(k\) we can find an \(L_{k} \in \mathbb{N }\) such that \(||r^k_l - r_k||_{H^{-1}} \le 1/2^k\) for all \(l \ge L_{k}\). Now let \(\varphi \in H^1_0(\varOmega )\) with \(||\varphi ||_{H^1_{0}} = 1\). From the previous argument, we deduce the following:

$$\begin{aligned} \frac{1}{2^k} \ge |\langle r^k_{L_k} - r_k,\varphi \rangle _{H^{-1},H^1_0}| =|\langle r^k_{L_k} - \bar{r},\varphi \rangle _{H^{-1},H^1_0} + \langle \bar{r} - r_k,\varphi \rangle _{H^{-1},H^1_0}|. \end{aligned}$$

Passing to the limit as \(k \rightarrow +\infty \), it follows that \(\hat{r}_k := r^k_{L_k} \stackrel{{H^{-1}}}{\rightharpoonup } \bar{r}\). Note that for each \(k\) there exist \(w_k \in N_M(y_k)\) and \(\varepsilon _k \in \mathbb{R }_+\) such that \(\hat{r}_k = w_k - \varepsilon _k v_k\).

If there exists a subsequence \(\varepsilon _{k_l}\) such that \(\varepsilon _{k_l} = 0\) for all sufficiently large \(l\), then \(\bar{r} \in N_M(\bar{y})\). Suppose instead that \(\varepsilon _k > 0\) for all \(k\) large enough and that \(\varepsilon _k\) is bounded. Passing to a subsequence if necessary, we may assume that \(\varepsilon _k \rightarrow \bar{\varepsilon } \ge 0\). If \(\bar{\varepsilon } = 0\), then \(\varepsilon _k v_k \stackrel{{H^{-1}}}{\rightarrow } 0\) and, as in the first case, \(\bar{r} \in N_M(\bar{y})\); so let \(\bar{\varepsilon } > 0\). Then \(w_k = \hat{r}_k + \varepsilon _k v_k\) is bounded in \(H^{-1}(\varOmega )\), a reflexive Banach space. Hence, there exist a subsequence \(w_{k_l}\) and \(\bar{w} \in H^{-1}(\varOmega )\) such that \(w_{k_l} \stackrel{{H^{-1}}}{\rightharpoonup } \bar{w}\). It follows from \(w_{k_l} \in N_M(y_{k_l})\) that \(\bar{w} \in N_M(\bar{y})\). Therefore, \(\bar{r} = \bar{w} - \bar{\varepsilon }\bar{v} \in N_M(\bar{y}) - \bar{\varepsilon }\bar{v}\).

Finally, assume that \(\varepsilon _k\) is unbounded. Since \(M\) is a cone, \(\langle w,y_k\rangle _{H^{-1},H^1_0} = 0\) for every \(w \in N_M(y_k)\); applying this to both \(w_k\) and \(v_k\) yields \(\langle \hat{r}_k,y_k\rangle _{H^{-1},H^1_0} = 0\) for all \(k\). Passing to the limit yields \(\langle \bar{r},\bar{y} \rangle _{H^{-1},H^1_0} = 0\), as was to be shown. \(\square \)

Remark 2

Note that in (30) and (31), \(\langle \bar{r} ,\bar{y}\rangle _{H^{-1},H^1_0} = 0\) also holds.

Remark 3

It follows from [5, Theorem 6.57] that if \(\lambda \in N_M(y)\), then

$$\begin{aligned} \langle \lambda ,y\rangle _{H^{-1},H^1_0}= 0,\,\, \mathrm{and }\,\,0 \ge \langle \lambda , \varphi \rangle _{H^{-1},H^1_0},\,\,\forall \varphi \in H^1_0(\varOmega ): \varphi \ge 0\,\,a.e.\,\varOmega . \end{aligned}$$
(33)

Thus, (32) is always implied by (30) and (31).
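To see this implication explicitly in case (31), write \(\bar{r} = \bar{w} - \bar{\varepsilon }\bar{v}\) with \(\bar{w} \in N_M(\bar{y})\); since \(v_k \in N_M(y_k)\), \(v_k \stackrel{{H^{-1}}}{\longrightarrow } \bar{v}\), and \(y_k \stackrel{{H^1_0}}{\rightarrow } \bar{y}\), the closedness property recorded before Proposition 3 gives \(\bar{v} \in N_M(\bar{y})\) as well, and (33) yields

$$\begin{aligned} \langle \bar{r},\bar{y}\rangle _{H^{-1},H^1_0} = \langle \bar{w},\bar{y}\rangle _{H^{-1},H^1_0} - \bar{\varepsilon }\langle \bar{v},\bar{y}\rangle _{H^{-1},H^1_0} = 0. \end{aligned}$$

Case (30) follows by taking \(\bar{\varepsilon } = 0\).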

We are now ready to establish the main result of this section.

Theorem 2

(Limiting stationarity conditions) Let \((\bar{u},\bar{y})\) be a locally optimal solution to MPEC (1). Then there exist \(\bar{p} \in H^1_0(\varOmega ),\,\bar{r} \in H^{-1}(\varOmega ),\,\bar{v} \in H^{-1}(\varOmega )\), and \(\bar{s} \in L^2(\varOmega )\) along with sequences \(\{p_k\}\subset H^1_0(\varOmega )\) and \(\{r_k\}\subset H^{-1}(\varOmega )\) such that \(p_k \stackrel{{H^1_0}}{\rightharpoonup }\bar{p}\) and \(r_k \stackrel{{H^{-1}}}{\mathrm{- \!\!\! \rightharpoonup }}\bar{r}\), for which it holds that

$$\begin{aligned} 0&= \alpha \bar{u} + B^*\bar{p} + \bar{s},\end{aligned}$$
(34)
$$\begin{aligned} 0&= \bar{y}-y_d - A^*\bar{p} + \bar{r}, \end{aligned}$$
(35)
$$\begin{aligned} 0&= A\bar{y} - B\bar{u}-f + \bar{v}, \end{aligned}$$
(36)
$$\begin{aligned} 0&\ge \bar{s},\,\,a.e.\,\mathcal A _a(\bar{u}),\quad \bar{s} = 0,\,\,a.e.\,\mathcal J (\bar{u}),\quad \bar{s} \ge 0\,\,a.e.\,\mathcal A _{b}(\bar{u}),\end{aligned}$$
(37)
$$\begin{aligned} 0&\ge \langle \bar{v}, \varphi \rangle _{H^{-1},H^1_0},\,\,\forall \varphi \in H^1_0(\varOmega ): \varphi \ge 0\,\,a.e.\,\varOmega ,\end{aligned}$$
(38)
$$\begin{aligned} 0&= \langle \bar{v}, \bar{y}\rangle _{H^{-1},H^1_0},\end{aligned}$$
(39)
$$\begin{aligned} 0&= \langle \bar{v},\bar{p}\rangle _{H^{-1},H^1_0},\end{aligned}$$
(40)
$$\begin{aligned} 0&\ge \limsup _{k\rightarrow \infty }\langle r_k,p_k\rangle _{H^{-1},H^1_0}. \end{aligned}$$
(41)

In addition, one of the following alternatives holds:

$$\begin{aligned} \langle \bar{r},\bar{y}\rangle _{H^{-1},H^1_0}&= 0,\,\, \mathrm{and }\,\,0 \ge \langle \bar{r}, \varphi \rangle _{H^{-1},H^1_0},\,\,\forall \varphi \in H^1_0(\varOmega ): \varphi \ge 0\,\,a.e.\,\varOmega ,\end{aligned}$$
(42)
$$\begin{aligned} \langle \bar{r},\bar{y}\rangle _{H^{-1},H^1_0}&= 0,\,\, \mathrm{and }\,\,\bar{r} \in N_M(\bar{y}) - \varepsilon \bar{v},\,\,\varepsilon > 0,\end{aligned}$$
(43)
$$\begin{aligned} \langle \bar{r}, \bar{y}\rangle _{H^{-1},H^1_0}&= 0. \end{aligned}$$
(44)

Proof

Equation (34) follows directly from Proposition 1, whereas Eq. (36) is due to the feasibility of an optimal solution. Equations (37), (38) and (39) also follow from the characterization of the normal cones \(N_{U_{ad}}(\bar{u})\) and \(N_M(\bar{y})\), cf. [5, Lemma 6.34, Lemma 6.57]. According to Proposition 2, \(\bar{p}\) is the weak limit in \(H^{1}_0(\varOmega )\) of some sequence \(p_k\) with \(p_k \in \mathcal K (y_k,v_k)\), whereas \(r_k\) is given by

$$\begin{aligned} r_k := A^*p_k - q_k \in \left[ \mathcal K (y_k,v_k)\right] ^{-}\!, \end{aligned}$$

with \(q_k \stackrel{{H^{-1}}}{\mathrm{- \!\!\! \rightharpoonup }} \bar{y} - y_d\). Passing to the limit in this relation yields (35). By definition of the critical cone, \(\langle v_k,p_k\rangle _{H^{-1},H^1_0} = 0\) for all \(k\). Hence, (40) holds, since \(v_k \stackrel{{H^{-1}}}{\longrightarrow } \bar{v}\) and \(p_k \stackrel{{H^1_0}}{\rightharpoonup } \bar{p}\). Moreover, polarity gives \(\langle r_k,p_k\rangle _{H^{-1},H^1_0} \le 0\), from which (41) follows. The alternatives (42)–(44) arise as a consequence of Proposition 3 in light of (33). \(\square \)
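The limit passage behind (35) is worth recording: since \(A^*\) is bounded and linear, \(p_k \stackrel{{H^1_0}}{\rightharpoonup } \bar{p}\) implies \(A^*p_k \stackrel{{H^{-1}}}{\rightharpoonup } A^*\bar{p}\), and hence

$$\begin{aligned} r_k = A^*p_k - q_k \stackrel{{H^{-1}}}{\mathrm{- \!\!\! \rightharpoonup }} A^*\bar{p} - (\bar{y} - y_d) = \bar{r}, \end{aligned}$$

which is precisely \(0 = \bar{y}-y_d - A^*\bar{p} + \bar{r}\).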

Remark 4

(Discussions on the limiting stationarity conditions)

  1. (i)

    The reader has most likely noted that we did not use the inclusion \(p^*_k \in T_M(y_k)\) in the proof of Theorem 2 to further characterize \(\bar{p}\). Using a diagonalization argument, it can be shown that \(\bar{p}\) is an element of the weak Painlevé–Kuratowski outer limit of the sequence of sets \(\{t_k^{-1}(M - y_k)\}\), taken over all sequences \(y_k \rightarrow \bar{y}\) in \(M\) and \(t_k \downarrow 0\); this outer limit amounts to the so-called (weak) paratingent cone; see, e.g., [3]. The paratingent cone is often too large to provide any meaningful local linearization of the set \(M\). For example, let \(M := [0,1] \subset \mathbb{R }\). Here, \(T_M(0) = \mathbb{R }_+\), which is locally a reasonable approximation of the set. Conversely, for a sequence \(\varepsilon _k \rightarrow 0^+\), \(T_M(\varepsilon _k) = \mathbb{R }\) for all \(k\), so that the paratingent cone at \(0\) becomes the entire space. Since this occurs even for the simplest of convex sets in finite dimensions, we do not attempt to characterize this cone in \(H^1_0(\varOmega )\) without more knowledge of the involved structures.

  2. (ii)

    Since the sequences \(p_k\) and \(r_k\) in Theorem 2 need only converge weakly in their respective spaces, nothing more can be said about the sign of the product \(\langle \bar{r},\bar{p}\rangle _{H^{-1},H^1_0}\). The very existence of these sequences was provided by the coderivative, so it appears that no extra information can be obtained. In the next two sections we are not provided with the existence of these sequences, rather, we must derive them. The advantage then becomes evident as we can show that \(q_k\stackrel{L^2}{\rightharpoonup }\bar{y} - y_d\), not merely weakly in \(H^{-1}(\varOmega )\). However, there we must make additional data assumptions, so that the technique used here remains more widely applicable.
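The tangent-cone computations in item (i) can be probed numerically: \(h \in T_M(x)\) exactly when \(\mathrm{dist}(x+th,M)/t \rightarrow 0\) as \(t \downarrow 0\). The sketch below (illustrative only; the step size and tolerance are ad hoc choices) reproduces \(T_M(0) = \mathbb{R }_+\) and \(T_M(\varepsilon ) = \mathbb{R }\) for \(\varepsilon \in (0,1)\):

```python
# Probe the contingent (tangent) cone of M = [0,1] via the distance quotient
# dist(x + t*h, M)/t for a small step t; the quotient tends to 0 exactly
# for tangent directions. Step size and tolerance are ad hoc choices.

def dist_to_M(z):
    # distance from z to the interval [0, 1]
    return max(-z, z - 1.0, 0.0)

def is_tangent(x, h, t=1e-8, tol=1e-6):
    return dist_to_M(x + t * h) / t < tol

# At the boundary point 0: only nonnegative directions are tangent.
assert is_tangent(0.0, 1.0) and not is_tangent(0.0, -1.0)

# At interior points eps > 0 (however small): every direction is tangent,
# so the paratingent cone at 0 picks up the whole real line.
for eps in (1e-1, 1e-3, 1e-5):
    assert is_tangent(eps, 1.0) and is_tangent(eps, -1.0)
```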

4 Stationarity conditions via penalization of the control constraints

We begin this section by simplifying the model class through the removal of the constraint \(u \ge a \,\,a.e.\,\varOmega \). It should be clear that the same arguments remain valid for bilateral control constraints as well. Our new model problem becomes

$$\begin{aligned} \begin{array}{ll} \min \frac{1}{2}||y - y_d||^2_{L^2(\varOmega )} + \frac{\alpha }{2}||u||^2_{L^2(\varOmega )} \text { over } (u,y) \in L^2(\varOmega ) \times H^1_0(\varOmega ),\\ s.t.\quad u \le b \,\,a.e.\,\varOmega ,\\ Ay + N_M(y) \ni Bu + f. \end{array} \end{aligned}$$
(45)

Thus from now on we denote by \(U_{ad}\) the set

$$\begin{aligned} U_{ad} := \left\{ u \in L^2(\varOmega ) \left| \, u \le b\,\,a.e.\,\varOmega \right. \right\} . \end{aligned}$$

Moreover, we assume that the linear operator \(B\) is the identity on \(L^2(\varOmega )\) and henceforth suppress it in the results below. The results can be extended to more general \(B\), provided \(B\) is surjective. All the other data assumptions for (1) remain the same, unless otherwise stated.

Continuing with the reduced model class (45), we now penalize the constraint on \(u\) with an \(L^2\)-penalty function derived from the Moreau–Yosida regularization of the indicator function of \(U_{ad}\). By defining

$$\begin{aligned} J_{\gamma }(u,y) := \frac{1}{2}||y - y_d||^2_{L^2(\varOmega )} + \frac{\alpha }{2}||u||^2_{L^2(\varOmega )} + \frac{\gamma }{2}||(u-b)_+||^2_{L^2}, \end{aligned}$$

this gives rise to the following class of MPECs:

$$\begin{aligned}&\min J_{\gamma }(u,y) \text { over } (u,y) \in L^2(\varOmega ) \times H^1_0(\varOmega ),\nonumber \\&s.t.\quad y = S(u) \end{aligned}$$
(46)

with \(\gamma > 0\), where \((\cdot )_+ := \max (0,\cdot )\) pointwise almost everywhere.

First we justify the required well-posedness of the penalization procedure.

Proposition 4

(Well-posedness of the penalization) Let \(\gamma _n \rightarrow \infty \) as \(n\rightarrow \infty \). Then for each \(n\in \mathbb{N }\) the MPEC problem (46) with \(\gamma := \gamma _n\) has an optimal solution \((\bar{u}_{\gamma _n},\bar{y}_{\gamma _n}):=(\bar{u}_n,\bar{y}_n)\). Moreover, if \((\bar{u},\bar{y})\in L^2(\varOmega )\times H^1_0(\varOmega )\) is optimal to (45), then there exists a subsequence of \(\{(\bar{u}_n,\bar{y}_n)\}\), indexed still by \(n\), such that \((\bar{u}_n,\bar{y}_n) \rightarrow (\bar{u},\bar{y})\) in the strong–strong topology on \(L^2(\varOmega ) \times H^1_0(\varOmega )\).

Proof

The following arguments are standard; we present them merely for completeness. Since the penalty functional \(||(\cdot -b)_+||^2_{L^2}:L^2(\varOmega ) \rightarrow \mathbb{R }\) is weakly lower semicontinuous and bounded from below, we can apply a classical argument (see, e.g., [13]) to show that MPEC (46) has an optimal solution \((\bar{u}_{\gamma _n},\bar{y}_{\gamma _n}):=(\bar{u}_n,\bar{y}_n)\) for each \(\gamma _n > 0\). It follows from the definition that

$$\begin{aligned}&\frac{1}{2}||\bar{y}_n - y_d||^2_{L^2} + \frac{\alpha }{2}||\bar{u}_n||^2_{L^2} + \frac{\gamma _n}{2}||(\bar{u}_n - b)_+||^2_{L^2}\\&\le \frac{1}{2}||y - y_d||^2_{L^2} + \frac{\alpha }{2}||u||^2_{L^2} + \frac{\gamma _n}{2}||(u - b)_+||^2_{L^2}, \\&\quad \forall (u,y) \in L^2(\varOmega )\times H^1_0(\varOmega ) : Ay + N_M(y) \ni u + f. \end{aligned}$$

Then letting \((\bar{u},\bar{y})\) be a globally optimal solution to (45), we obtain the inequality

$$\begin{aligned} \frac{1}{2}||\bar{y}_n \!-\! y_d||^2_{L^2} \!+\! \frac{\alpha }{2}||\bar{u}_n||^2_{L^2} \!+\! \frac{\gamma _n}{2}||(\bar{u}_n - b)_+||^2_{L^2} \le \frac{1}{2}||\bar{y} \!-\! y_d||^2_{L^2} \!+\! \frac{\alpha }{2}||\bar{u}||^2_{L^2},\qquad \end{aligned}$$
(47)

from which the following conclusions are deduced:

  1. (i)

    \(\left\{ \bar{u}_n\right\} \) is bounded in \(L^2(\varOmega )\);

  2. (ii)

    \(\frac{1}{2}||(\bar{u}_n - b)_+||^2_{L^2} \rightarrow 0\) as \(n \rightarrow \infty \).

Hence there exists a control \(u^* \in L^2(\varOmega )\) and a subsequence \(\{\bar{u}_{n_l}\}\) such that \(\bar{u}_{n_l} \stackrel{L^2}{\rightharpoonup } u^*\). Using the Lipschitz continuity of \(y=S(u)\) as a function of \(u\) from \(H^{-1}(\varOmega )\) into \(H^1_0(\varOmega )\), we have for some fixed \(C > 0\)

$$\begin{aligned} ||\bar{y}_{n_l} - y^*||_{H^1_0} \le C||\bar{u}_{n_l} - u^*||_{H^{-1}}, \end{aligned}$$

where \(\bar{y}_{n_l},\,y^*\) are solutions to the variational inequality associated with \(\bar{u}_{n_l},u^* \in L^2(\varOmega )\), respectively. Since \(L^2(\varOmega ) \hookrightarrow H^{-1}(\varOmega )\) is compact, there exists a subsequence \(\{\bar{u}_{n_{l_k}}\}\) with \(\bar{u}_{n_{l_k}} \stackrel{{H^{-1}}}{\longrightarrow } u^*\). Thus, \(\bar{y}_{n_{l_k}} \stackrel{{H^{1}_{0}}}{\rightarrow } y^*\). Furthermore, since

$$\begin{aligned} \langle A\bar{y}_{n_{l_k}} - \bar{u}_{n_{l_k}} - f,y^{\prime } - \bar{y}_{n_{l_k}}\rangle _{H^{-1},H^1_0} \ge 0,\quad \forall y^{\prime } \in M, \end{aligned}$$

passing to the limit as \(k \rightarrow \infty \) yields

$$\begin{aligned} \langle Ay^* - u^*- f,y' - y^*\rangle _{H^{-1},H^1_0} \ge 0, \quad \forall y' \in M, \end{aligned}$$

and thus \(y^* = S(u^*)\). It is easy to check that \((u^*,y^*)\) is in fact a feasible point of the original MPEC (45). Indeed, since the functional \(F(\cdot ):=||(\cdot - b)_+||^2_{L^2}: L^2(\varOmega ) \rightarrow \mathbb{R }\) is weakly lower semicontinuous, it follows that

$$\begin{aligned} 0 = \lim _{k \rightarrow \infty } F(\bar{u}_{n_{l_k}}) = \liminf _{k\rightarrow \infty } F(\bar{u}_{n_{l_k}}) \ge F(u^*) \Rightarrow ||(u^* - b)_+||^2_{L^2} = 0, \end{aligned}$$

and hence, \(u^* \le b\,\,a.e.\,\varOmega \). Taking now the limit inferior in (47) ensures that \((u^*,y^*)=(\bar{u},\bar{y})\).

Finally, it follows from (47) that

$$\begin{aligned} ||\bar{u}_{n_{l_k}}||^2_{L^2} - ||\bar{u}||^2_{L^2} \le \frac{1}{\alpha }\Big (||\bar{y}_{n_{l_k}} - y_d||^2_{L^2} - ||\bar{y} - y_d||^2_{L^2}\Big ). \end{aligned}$$

Using then \(\bar{y}_{n_{l_k}} \stackrel{{H^{1_0}}}{\rightarrow } \bar{y}\) and the weak lower-semicontinuity of the \(L^2\)-norm yields

$$\begin{aligned} 0&= ||\bar{u}||^2_{L^2} - ||\bar{u}||^2_{L^2} \le \liminf _{k \rightarrow \infty } ||\bar{u}_{n_{l_k}}||^2_{L^2} - ||\bar{u}||^2_{L^2} \\&\le \liminf _{k \rightarrow \infty } \frac{1}{\alpha }\Big (||\bar{y}_{n_{l_k}} - y_d||^2_{L^2} - ||\bar{y} - y_d||^2_{L^2}\Big ) = 0 \end{aligned}$$

as well as

$$\begin{aligned} \limsup _{k\rightarrow \infty } ||\bar{u}_{n_{l_k}}||^2_{L^2} - ||\bar{u}||^2_{L^2} \le \limsup _{k \rightarrow \infty } \frac{1}{\alpha }\Big (||\bar{y}_{n_{l_k}} - y_d||^2_{L^2} - ||\bar{y} - y_d||^2_{L^2}\Big ) = 0. \end{aligned}$$

Thus \(\bar{u}_{n_{l_k}} \stackrel{L^2}{\rightharpoonup } \bar{u}\) and \(||\bar{u}_{n_{l_k}}||_{L^2} \rightarrow ||\bar{u}||_{L^2}\). Since \(L^2(\varOmega )\) is a Hilbert space, these two properties together imply \(\bar{u}_{n_{l_k}} \stackrel{L^2}{\rightarrow } \bar{u}\). \(\square \)
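A one-dimensional analogue makes the penalization scheme of this section tangible. We replace the variational inequality by the trivial solution map \(y = S(u) = u\) and choose hypothetical data \(y_d = 2\), \(\alpha = 1/2\), \(b = 1\), so that the constraint \(u \le b\) is active at the solution; the penalized minimizers then have a closed form, converge to the constrained solution, and the penalty term \(\gamma (\bar{u}_\gamma - b)_+\) converges to the associated multiplier (foreshadowing Proposition 6):

```python
import numpy as np

# 1-D analogue of the penalization (46) with trivial solution map S = id:
#   min_u 0.5*(u - y_d)**2 + 0.5*alpha*u**2 + 0.5*gamma*max(u - b, 0)**2.
# Hypothetical data chosen so that the constraint u <= b is active:
y_d, alpha, b = 2.0, 0.5, 1.0

def solve_penalized(gamma):
    # Stationarity for u > b: (1 + alpha)*u - y_d + gamma*(u - b) = 0.
    return (y_d + gamma * b) / (1.0 + alpha + gamma)

u_bar = b                          # solution of the constrained problem
s_bar = y_d - (1.0 + alpha) * b    # its multiplier; s_bar = 0.5 >= 0

gammas = 10.0 ** np.arange(1, 8)
u_gam = np.array([solve_penalized(g) for g in gammas])
mult = gammas * np.maximum(u_gam - b, 0.0)   # gamma*(u_gamma - b)_+

assert np.all(np.diff(np.abs(u_gam - u_bar)) < 0)  # monotone approach to u_bar
assert abs(u_gam[-1] - u_bar) < 1e-6               # u_gamma -> u_bar
assert abs(mult[-1] - s_bar) < 1e-5                # penalty term -> multiplier
```

The closed form follows from the first-order condition of the strictly convex penalized objective; in the function-space setting no such formula is available, which is why the compactness arguments above are needed.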

By applying the same arguments as in the proof of Proposition 1, we check that any locally optimal solution \((\bar{u}_\gamma ,\bar{y}_\gamma )\) to (46) satisfies the necessary optimality condition

$$\begin{aligned} 0 \in \nabla _{u} J_{\gamma }(\bar{u}_\gamma ,\bar{y}_\gamma ) + B^* D^* S(\bar{u}_\gamma ,\bar{y}_\gamma )\left( \nabla _{y} J_{\gamma }(\bar{u}_\gamma ,\bar{y}_\gamma )\right) . \end{aligned}$$
(48)

Since \(B = B^*\) is the identity on \(L^2(\varOmega )\) and any \(\bar{p} \in D^* S(\bar{u}_\gamma ,\bar{y}_\gamma )(\nabla _{y} J_{\gamma }(\bar{u}_\gamma ,\bar{y}_\gamma ))\) is an element of \(H^1_0(\varOmega )\), we can argue that \(\nabla _{u} J_{\gamma }(\bar{u}_\gamma ,\bar{y}_\gamma ) \in H^1_0(\varOmega )\). This leads us to the following proposition.

Proposition 5

(Increased regularity at a solution) If \((\bar{u}_\gamma ,\bar{y}_\gamma )\) is a locally optimal solution of (46), then

$$\begin{aligned} \alpha \bar{u}_\gamma + \gamma (\bar{u}_\gamma - b)_+ \in H^1_0(\varOmega ).\end{aligned}$$

Proof

Since \(\nabla _{u} J_{\gamma }(\bar{u}_{\gamma },\bar{y}_{\gamma }) = \alpha \bar{u}_{\gamma } + \, \gamma (\bar{u}_{\gamma } - b)_+\), the result follows from the argument directly preceding the statement of this proposition. \(\square \)

Based on the results in [10], we now derive primal and dual optimality conditions for MPECs of type (46).

Theorem 3

(S-stationarity for penalized MPECs) Let \((\bar{u}_{\gamma },\bar{y}_{\gamma })\) be a locally optimal solution to MPEC (46). Then we have

$$\begin{aligned} (\alpha \bar{u}_{\gamma } + \gamma (\bar{u}_{\gamma }-b)_+,h)_{L^2} + (\bar{y}_{\gamma } - y_d,d)_{L^2} \ge 0, \forall (h,d) \in \mathrm{gph\,}S'(\bar{u}_{\gamma },\cdot ). \end{aligned}$$
(49)

Moreover, there exist \(\bar{p}_{\gamma } \in H^1_0(\varOmega ),\,\bar{r}_{\gamma } \in H^{-1}(\varOmega )\), and \(\bar{v}_{\gamma } \in H^{-1}(\varOmega )\) such that

$$\begin{aligned} 0&= \alpha \bar{u}_{\gamma } + \gamma (\bar{u}_{\gamma }-b)_+ + \bar{p}_{\gamma },\end{aligned}$$
(50)
$$\begin{aligned} 0&= \bar{y}_{\gamma } - y_d - A^*\bar{p}_{\gamma } + \bar{r}_{\gamma },\end{aligned}$$
(51)
$$\begin{aligned} 0&= A\bar{y}_{\gamma } - \bar{u}_{\gamma } - f + \bar{v}_{\gamma } \end{aligned}$$
(52)

with the primal-dual triple \((\bar{p}_{\gamma },\bar{r}_{\gamma },\bar{v}_{\gamma })\) satisfying the inclusions

$$\begin{aligned} \bar{p}_{\gamma }\in \mathcal K (\bar{y}_{\gamma },\bar{v}_{\gamma }),\quad \bar{r}_{\gamma } \in \left[ \mathcal K (\bar{y}_{\gamma },\bar{v}_{\gamma })\right] ^{-},\quad \bar{v}_{\gamma } \in N_M(\bar{y}_{\gamma }). \end{aligned}$$
(53)

Proof

As the penalty functional is Fréchet differentiable from \(L^2(\varOmega )\) into \(\mathbb{R }\) for each \(\gamma > 0\), the primal optimality condition (49) can be derived by using the same argument that was applied in order to prove Theorem 1.

By the data assumptions, \(h \in L^2(\varOmega )\). Using the characterization (5) of \(\mathrm{gph\,}S'(\bar{u}_{\gamma },\cdot )\) together with the increased regularity provided by Proposition 5, we may therefore rewrite (49) as

$$\begin{aligned}&\langle \alpha \bar{u}_{\gamma } + \gamma (\bar{u}_{\gamma }-b)_+,Ad + w\rangle _{H^1_0,H^{-1}}\\&\quad \,+ \,\,\langle \bar{y}_{\gamma } - y_d,d\rangle _{H^{-1},H^1_0}\ge 0,\quad \forall (d,w) \in \mathrm{gph\,}N_\mathcal{K (\bar{y}_{\gamma },\bar{v}_{\gamma })}. \end{aligned}$$

This is equivalent to the existence of \(\bar{p}_{\gamma } \in H^1_0(\varOmega )\) such that

$$\begin{aligned} \langle -A^*\bar{p}_{\gamma } + \bar{y}_{\gamma } - y_d,d\rangle _{H^{-1},H^{1}_0} + \langle \bar{p}_{\gamma },w\rangle _{H^{1}_0,H^{-1}} \ge 0,\quad \forall (d,w) \in \mathrm{gph\,}N_\mathcal{K (\bar{y}_{\gamma },\bar{v}_{\gamma })}, \end{aligned}$$

where

$$\begin{aligned} 0 = \alpha \bar{u}_{\gamma } + \gamma (\bar{u}_{\gamma }-b)_+ + \bar{p}_{\gamma }. \end{aligned}$$

Then since \([\mathrm{gph\,}N_\mathcal{K (\bar{y}_{\gamma },\bar{v}_{\gamma })}]^{-} = \left[ \mathcal K (\bar{y}_{\gamma },\bar{v}_{\gamma })\right] ^{-}\times \mathcal K (\bar{y}_{\gamma },\bar{v}_{\gamma })\) in the \(H^{-1}(\varOmega ) \times H^1_0(\varOmega )\)-topology (see, e.g., the proof of Theorem 4.6 in [10]), we obtain the relation

$$\begin{aligned} 0 = \bar{y}_{\gamma } - y_d - A^*\bar{p}_{\gamma } + \bar{r}_{\gamma } \end{aligned}$$

where

$$\begin{aligned} \bar{p}_{\gamma } \in \mathcal K (\bar{y}_{\gamma },\bar{v}_{\gamma }) \text { and } \bar{r}_{\gamma } \in \left[ \mathcal K (\bar{y}_{\gamma },\bar{v}_{\gamma })\right] ^{-}. \end{aligned}$$

From this we obtain the assertion; (52) follows from feasibility. \(\square \)

Given the well-known characterizations of the cones involved in the dual conditions of Theorem 3, see, e.g., Lemma 6.57 and Section 6.4.4 in [5], one can derive S-stationarity conditions as in [13] for the penalized MPEC (46). This result is not surprising, as the results in [10] provide S-stationarity conditions for much more general settings than considered here, provided the objective functional is Fréchet differentiable and there are no upper-level constraints. We have nevertheless decided to provide the derivation above in order to partially demonstrate the technique.

Next, we derive some auxiliary results needed for the main result of this section. Recall the following two notions of variational convergence:

Definition 8

(Mosco epi-convergence and graph convergence) Let \(X\) be a reflexive Banach space and, for \(n \ge 1\), let \(\phi _n, \phi : X \rightarrow \overline{\mathbb{R }}\) be proper convex lower semicontinuous functions. One says that \(\phi _n\) epi-converges in the sense of Mosco to \(\phi \), denoted by \(\phi _n \xrightarrow {\mathrm{M-epi}}\phi \), provided the following two conditions hold for all \( x \in X\):

  1. \(\forall x_n \stackrel{X}{\rightharpoonup } x,\,\phi (x) \le \liminf _{n\rightarrow \infty } \phi _n(x_n)\),

  2. \(\exists x_n \stackrel{X}{\rightarrow } x\) such that \(\phi (x) \ge \limsup _{n \rightarrow \infty }\phi _n(x_n)\).

For \(n \ge 1\), let \(A_n\) and \(A\) be maximal monotone operators from \(X\) into \(X^*\). The sequence \(A_n\) is said to graph converge to \(A\), denoted by \(A_n \xrightarrow {\mathrm{G}}A\), if the following property holds:

  • For every \((x,y) \in \mathrm{gph\,}A\), there exists a sequence \((x_n,y_n) \in \mathrm{gph\,}A_n\) such that \(x_n \stackrel{X}{\rightarrow } x\) and \(y_n \stackrel{X^*}{\rightarrow } y\).

We refer the reader to the monograph by Attouch [2] for more on these and related topics. After defining graph convergence, Attouch points out in Proposition 3.59 in [2] that for a sequence of maximal monotone operators \(A_n \xrightarrow {\mathrm{G}}A\), the following holds:

  • For every sequence \((x_n,y_n) \in \mathrm{gph\,}A_n\) such that \(x_n \stackrel{X}{\rightarrow } x\) and \(y_n \stackrel{X^*}{\rightharpoonup } y,\,(x,y) \in \mathrm{gph\,}A\) (and vice versa, by exchanging strong and weak).

This result shows that the convergence properties of sequences of normal cone mappings to convex sets discussed in Sect. 3 extend to the much broader class of maximal monotone operators. We now apply these notions and results on variational convergence to our problem.

Lemma 1

(Moreau–Yosida approximations of unilateral pointwise constraints) Let \(\gamma _n \rightarrow \infty \), and let \(b\in L^2(\varOmega )\). Define the Moreau–Yosida regularization \(F_n:L^2(\varOmega )\rightarrow \overline{\mathbb{R }}\) by

$$\begin{aligned} F_n(u):=\frac{\gamma _n}{2}||(u-b)_+||^2_{L^2},\,\,\forall u \in L^2(\varOmega ). \end{aligned}$$

Then \(F_n \xrightarrow {\mathrm{M-epi}}I_{U_{ad}}\), where \(I_{U_{ad}}\) stands for the indicator function of the set \(U_{ad}\) given by

$$\begin{aligned} U_{ad} := \left\{ u \in L^2(\varOmega ) \left| \;u \le b\,\,a.e.\,\varOmega \right. \right\} \!. \end{aligned}$$

Proof

We begin by assuming that \(u \notin U_{ad}\). For any \(u_n\rightharpoonup u\) in \(L^2(\varOmega )\), we can use the weak lower semicontinuity of \(||(\cdot - b)_+||^2_{L^2}\) in order to deduce the existence of some \(\varepsilon > 0\) such that

$$\begin{aligned} \liminf _{n\rightarrow \infty } ||(u_n - b)_+||^2_{L^2} \ge \varepsilon > 0. \end{aligned}$$

It follows that \(\liminf _{n \rightarrow \infty } F_n(u_n) = +\infty \). Conversely, suppose that \(u \in U_{ad}\). Since \(F_n \ge 0\) for every \(n\), we obtain \(I_{U_{ad}}(u) = 0 \le \liminf _{n \rightarrow \infty } F_n(u_n)\) for every weakly convergent sequence. Therefore, it holds for all \( u \in L^2(\varOmega )\) that

$$\begin{aligned} \forall u_n\stackrel{L^2}{\rightharpoonup } u,\,\, I_{U_{ad}}(u) \le \liminf _{n \rightarrow \infty } F_n(u_n). \end{aligned}$$

The remaining argument requires us to demonstrate the existence of a strongly convergent sequence such that the limit superior condition in Definition 8 holds for all \(u \in L^2(\varOmega )\). Of course, if \(u \notin U_{ad}\), then \(I_{U_{ad}}(u) = +\infty \). Thus for any sequence \(u_n\) strongly converging to \(u\) in \(L^2(\varOmega )\), it follows that

$$\begin{aligned} +\infty = I_{U_{ad}}(u) \ge \limsup _{n\rightarrow \infty }F_n(u_n). \end{aligned}$$

Finally, if \(u \in U_{ad}\), then by taking the trivial sequence \(u_n = u\), we see that \(F_n(u_n) = 0\) for all \(n\). Hence,

$$\begin{aligned} 0 = I_{U_{ad}}(u) \ge \limsup _{n\rightarrow \infty }F_n(u_n), \end{aligned}$$

as was to be shown. \(\square \)
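The dichotomy used in the proof can be observed pointwise in one dimension (a much weaker check than Mosco epi-convergence, but it conveys the mechanism): the penalties vanish on \(U_{ad}\) and blow up off it, mirroring the indicator function. The data below are ad hoc illustration choices:

```python
# Pointwise blow-up of the 1-D Moreau-Yosida penalties
# F_n(u) = 0.5*gamma_n*(u - b)_+^2 as gamma_n -> infinity:
# F_n(u) = 0 for u <= b, while F_n(u) -> +infinity for u > b,
# mirroring the indicator function of U_ad = {u <= b}.
b = 1.0
gammas = [10.0 ** k for k in range(9)]

def F(u, gamma):
    return 0.5 * gamma * max(u - b, 0.0) ** 2

assert all(F(0.5, g) == 0.0 for g in gammas)   # feasible point: values vanish
assert all(F(b, g) == 0.0 for g in gammas)     # boundary point is feasible
vals = [F(1.5, g) for g in gammas]
assert all(v2 > v1 for v1, v2 in zip(vals, vals[1:]))  # monotone divergence
```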

We now combine Lemma 1 with [2, Theorem 3.66] to obtain:

Proposition 6

(Convergence of approximations) Let \(\gamma _n\rightarrow \infty ,\,b \in L^2(\varOmega )\), and \(u_n \stackrel{L^2}{\rightarrow } u\). If \(w_n \stackrel{L^2}{\rightarrow } w\) for \(w_n: =\gamma _n(u_n - b)_+\), then we have \(w\in N_{U_{ad}}(u)\).

Proof

The aforementioned theorem of Attouch states that Mosco epi-convergence of a sequence of proper, convex, and lower semicontinuous functions is equivalent to the graph convergence of their subdifferentials (plus a normalization condition). Using \(F_n\) from Lemma 1 and \(F := I_{U_{ad}}\), we see that

$$\begin{aligned} \partial F_n(u_n)=\gamma _n(u_n - b)_+ \;\text { and }\;\partial F(u)=\partial I_{U_{ad}}(u) = N_{U_{ad}}(u),\quad u\in L^2(\varOmega ). \end{aligned}$$

Then by [2, Proposition 3.59] (see above), the assertion holds. \(\square \)

We are now ready to derive the main result of this section.

Theorem 4

(Improved limiting stationarity conditions for the constrained MPEC) Let \(\gamma _n\rightarrow \infty \), and let \((\bar{u},\bar{y}) \in L^2(\varOmega ) \times H^1_0(\varOmega )\) be an optimal solution to (45). Then there exist sequences

$$\begin{aligned} \bar{u}_n \stackrel{L^2}{\rightarrow } \bar{u},\quad \bar{y}_n \stackrel{{H^{1}_{0}}}{\rightarrow } \bar{y},\quad \bar{p}_n \stackrel{{H^{1}_{0}}}{\rightharpoonup } \bar{p},\quad \bar{r}_n \stackrel{{H^{-1}}}{\mathrm{- \!\!\! \rightharpoonup }} \bar{r}, \end{aligned}$$
(54)

where \((\bar{u}_n,\bar{y}_n)\in L^2(\varOmega )\times H^1_0(\varOmega )\) solves the penalized MPEC (46) for each \(n\in \mathbb{N }\), with \(\gamma := \gamma _n\), and, given \(\bar{v}_n = \bar{u}_n + f - A\bar{y}_n \in N_M(\bar{y}_n)\),

$$\begin{aligned} (\bar{u}_n,\bar{y}_n,\bar{v}_n,\bar{p}_n,\bar{r}_n) \in L^2(\varOmega )\times H^1_0(\varOmega ) \times H^{-1}(\varOmega ) \times H^1_0(\varOmega ) \times H^{-1}(\varOmega ) \end{aligned}$$

satisfies the strong stationarity system (50)–(53). Moreover, there exist \(\bar{v} \in H^{-1}(\varOmega )\) and \(\bar{s} \in L^2(\varOmega )\) such that \((\bar{u},\bar{y},\bar{v},\bar{p},\bar{r},\bar{s})\) satisfies the limiting stationarity conditions (34)–(44) with (41) replaced by

$$\begin{aligned} 0 \ge \langle \bar{r},\bar{p}\rangle _{H^{-1},H^1_0}. \end{aligned}$$

Proof

According to Proposition 4, there exists a sequence of optimal solutions \((\bar{u}_n,\bar{y}_n)\) of (46) with \(\gamma := \gamma _n\) such that along a subsequence, indexed still by \(n,\,\bar{u}_n\) and \(\bar{y}_n\) converge as in (54) to a solution \((\bar{u},\bar{y})\) of (45). Since each pair is an optimal solution, we have from Theorem 3 the existence of \((\bar{p}_n,\bar{r}_n,\bar{v}_n)\) such that the conditions (50)–(53) hold for each tuple \((\bar{u}_n,\bar{y}_n,\bar{v}_n,\bar{p}_n,\bar{r}_n)\).

Using now the properties of \(\bar{p}_n\) and \(\bar{r}_n\), we have from (51), after multiplying by \(\bar{p}_n\), the following equation

$$\begin{aligned} \langle A^*\bar{p}_n,\bar{p}_n\rangle _{H^{-1},H^1_0} = (\bar{y}_n - y_d,\bar{p}_n)_{L^2} + \langle \bar{r}_n,\bar{p}_n\rangle _{H^{-1},H^1_0}. \end{aligned}$$

Using the coercivity of \(A\) and the fact that \(\langle \bar{r}_n,\bar{p}_n\rangle _{H^{-1},H^1_0} \le 0\), we know there exists a \(\xi > 0\) such that

$$\begin{aligned} \xi || \bar{p}_n ||^2_{H^1_0} \le (\bar{y}_n - y_d,\bar{p}_n)_{L^2} \le ||\bar{y}_n - y_d||_{L^2}||\bar{p}_n||_{L^2}. \end{aligned}$$

Then by dividing through by \(||\bar{p}_n||_{L^2}\) and using the fact that \(H^1_0(\varOmega )\hookrightarrow L^2(\varOmega )\) is continuous, we derive the existence of some \(\kappa > 0\) such that

$$\begin{aligned} ||\bar{p}_n||_{H^1_0} \le \kappa ||\bar{y}_n - y_d||_{L^2}. \end{aligned}$$

It follows that \(\{\bar{p}_n\}\) is bounded in \(H^1_0(\varOmega )\). Therefore, there exist \(\bar{p} \in H^1_0(\varOmega )\) and a subsequence \(\{\bar{p}_{n_l}\}\) such that \( \bar{p}_{n_l} \stackrel{{H^1_0}}{\rightharpoonup } \bar{p}\). Moreover, we can use this sequence along with (51) to conclude the existence of a sequence \(\{\bar{r}_{n_l}\}\) in \(H^{-1}(\varOmega )\) which converges weakly in \(H^{-1}(\varOmega )\) to some \(\bar{r} \in H^{-1}(\varOmega )\). Thus, the tuples \((\bar{u}_{n_l},\bar{y}_{n_l},\bar{v}_{n_l},\bar{p}_{n_l},\bar{r}_{n_l})\) satisfy the same requirements as those arising from the definition of the limiting coderivative in Proposition 2.

Next, since for all \(n_l\)

$$\begin{aligned} - \bar{p}_{n_l} - \alpha \bar{u}_{n_l} = \gamma _{n_l}(\bar{u}_{n_l} - b)_+, \end{aligned}$$

where \(\bar{p}_{n_l} \stackrel{{H^1_0}}{\rightharpoonup } \bar{p}\) and hence, by the compactness of the embedding \(H^1_0(\varOmega )\hookrightarrow L^2(\varOmega )\), \(\bar{p}_{n_l} \rightarrow \bar{p}\) strongly in \(L^2(\varOmega )\), and \(\bar{u}_{n_l} \stackrel{L^2}{\rightarrow } \bar{u}\), we can apply Proposition 6 in order to deduce the limiting condition

$$\begin{aligned} 0 \in \bar{p} + \alpha \bar{u} + N_{U_{ad}}(\bar{u}). \end{aligned}$$

Hence, \((\bar{u},\bar{y},\bar{v},\bar{p},\bar{r})\) fulfills the relations (34)–(44) via the same results which were used to prove Theorem 2.

Finally, since \(\langle \bar{r}_n,\bar{p}_n\rangle _{H^{-1},H^1_0} \le 0\) for all \(n \ge 1\), with \(\bar{r}_n := A^*\bar{p}_n + y_d - \bar{y}_n\), we may pass to the limit inferior and use the weak lower semicontinuity of \(v \mapsto \langle A^*v,v\rangle _{H^{-1},H^1_0}\), together with \(\bar{y}_n \rightarrow \bar{y}\) and \(\bar{p}_n \rightarrow \bar{p}\) strongly in \(L^2(\varOmega )\), to obtain

$$\begin{aligned} 0&\ge \liminf _{n\rightarrow \infty }\langle A^*\bar{p}_n + y_d - \bar{y}_n,\bar{p}_n\rangle _{H^{-1},H^1_0} \\&= \liminf _{n\rightarrow \infty }\left[ \langle A^*\bar{p}_n,\bar{p}_n\rangle _{H^{-1},H^1_0} + (y_d - \bar{y}_n,\bar{p}_n)_{L^2}\right] \\&\ge \langle A^*\bar{p},\bar{p}\rangle _{H^{-1},H^1_0} + (y_d - \bar{y},\bar{p})_{L^2} = \langle \bar{r},\bar{p}\rangle _{H^{-1},H^1_0}. \end{aligned}$$

This completes the proof. \(\square \)

In the context of Sect. 3, the previous inequality can also be written as

$$\begin{aligned} 0 \ge \langle A^*\bar{p}_n - \bar{q}_n,\bar{p}_n\rangle _{H^{-1},H^1_0} = \langle A^*\bar{p}_n,\bar{p}_n\rangle _{H^{-1},H^1_0} - \langle \bar{q}_n,\bar{p}_n\rangle _{H^{-1},H^1_0}. \end{aligned}$$

However, \(\bar{q}_n \in H^{-1}(\varOmega )\) converges only weakly in \(H^{-1}(\varOmega )\) to \(\bar{y} - y_d\). Hence, a similar argument is not possible. In the setting of Sect. 3, weak convergence of \(\bar{q}_n\) in \(L^2(\varOmega )\) can only be obtained, by the definition of \(D^*S\), if we assume that \(S:H^{-1}(\varOmega ) \rightarrow L^2(\varOmega )\), i.e., by enlarging the range space. How exactly this affects the results on the directional differentiability and polarization of the contingent cone \(T_{\mathrm{gph\,}S}\), which are essential for the characterization of \(D^*S\), remains an open question.

5 Stationarity conditions for constrained MPECs via regularization-penalization techniques

In this section we explore yet another approximation approach to the study of our class of constrained elliptic MPECs. Such a penalization-approximation technique has recently been applied to MPECs by Hintermüller and Kopacka in [9], although it has been widely employed before in various frameworks of single-level optimal control and related problems governed by partial differential equations; see, e.g., the books by Barbu [4, Chapter 3.2] and Mordukhovich [15, Chapter 7.4] with the bibliographies therein. Note also that the concept of penalizing the nonsmoothness/multivaluedness via a sequence of parameter-dependent differentiable functions goes back to the earlier developments presented in [11] and [7]. Our notation and terminology are based on [9].

For simplicity we consider in this section the class of MPECs (45) described at the beginning of Sect. 4, imposing two additional assumptions. First, we assume that \(A\) is the second-order differential operator associated with the bilinear form \(a:H^1_0(\varOmega )\times H^1_0(\varOmega ) \rightarrow \mathbb{R }\) defined by

$$\begin{aligned} a(v,w)&= \sum _{i,j = 1}^l\int \limits _{\varOmega }a_{ij}\frac{\partial v}{\partial x_j}\frac{\partial w}{\partial x_i} dx + \sum _{i=1}^l\int \limits _{\varOmega }b_i\frac{\partial v}{\partial x_i} w dx\nonumber \\&+\int \limits _{\varOmega } c vw dx,\quad \forall v,w \in H^1_0(\varOmega ), \end{aligned}$$
(55)

where \(b_i, c \in L^{\infty }(\varOmega )\) and \(a_{ij} \in C^{0,1}(\bar{\varOmega })\), i.e., Lipschitz continuous on the closure of \(\varOmega \), with \(c \ge 0\) and \(a(\cdot ,\cdot )\) both bounded and coercive. Second, we assume that either \(\varOmega \) is a convex polyhedron or \(\partial \varOmega \) is a \(C^{1,1}\)-boundary. This implies that every solution \(y\) of the variational inequality is an element of \(H^2(\varOmega ) \cap H^1_0(\varOmega )\).

Suppose now that \(\pi : H^1_0(\varOmega )\rightarrow H^{-1}(\varOmega )\) is Lipschitz continuous and monotone with the condition \(\ker (\pi )=M\). Then the variational inequality can be approximated by a semi-linear second-order partial differential equation written here in the form

$$\begin{aligned} a(y,\varphi )+\frac{1}{\beta }\langle \pi (y) ,\varphi \rangle _{H^{-1},H^1_0}=(u,\varphi )_{L^2}+(f,\varphi )_{L^2}, \quad \forall \varphi \in H^1_0(\varOmega ), \end{aligned}$$

where \(\beta ^{-1} > 0\) is a penalty parameter. The assumptions imposed on the penalty operator \(\pi \) ensure that the above partial differential equation (PDE) has a unique solution \(y_{\beta }(u)\). Moreover, it can be shown that \(y_{\beta }(u) \rightarrow y(u)\) in \(H^1_0(\varOmega )\) as \(\beta \rightarrow 0^+\), where \(y(u)\) solves the original variational inequality; see e.g., [7, 9]. Note that in [9] the mapping \(\pi \) was defined by using the maximum operator

$$\begin{aligned} \pi (v) := -\max (0, - v),\quad \forall v \in H^1_0(\varOmega ). \end{aligned}$$

Since the pointwise maximum \(\max (0,\cdot )\) is nondifferentiable, certain regularized (i.e., smoothed) operators depending on a parameter \(\varepsilon > 0\) were considered in [9]. These smoothed operators, which we denote by \(\max _{\varepsilon }(0,\cdot )\), act almost identically to \(\max (0,\cdot )\), with the only difference that the “kink” at zero is smoothed out on a neighborhood whose size depends on \(\varepsilon \). One such example is given explicitly by

$$\begin{aligned} \mathrm{max}_{\varepsilon }(0,r) :=\left\{ \begin{array}{ll} r-\displaystyle \frac{\varepsilon }{2} &{} \text { if }\;r \ge \varepsilon ,\\ \displaystyle \frac{r^2}{2\varepsilon } &{} \text { if }\;r\in (0,\varepsilon ),\\ 0 &{} \text { if }\;r \le 0. \end{array}\right. \end{aligned}$$
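This piecewise formula is straightforward to implement and check directly; the following minimal sketch (the names `max_eps` and `dmax_eps` are ours) also records the derivative, which enters the adjoint equation (57):

```python
def max_eps(r, eps):
    """Smoothed max(0, r): the kink at 0 is mollified on (0, eps),
    and the function agrees with r - eps/2 for r >= eps."""
    if r >= eps:
        return r - eps / 2.0
    if r > 0.0:
        return r * r / (2.0 * eps)
    return 0.0

def dmax_eps(r, eps):
    """Derivative of max_eps with respect to r: continuous, valued in [0, 1]."""
    if r >= eps:
        return 1.0
    if r > 0.0:
        return r / eps
    return 0.0
```

The two branches match in value (\(\varepsilon /2\)) and slope (1) at \(r = \varepsilon \), and in value and slope (0) at \(r = 0\), which confirms the claimed \(C^1\) regularity of the smoothing.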

Under relatively weak assumptions it is shown in [9, Theorem 2.3] that solutions \(y_{\beta }\) to the regularized penalized problems

$$\begin{aligned} A y_{\beta } - \frac{1}{\beta } \mathrm{max}_{\varepsilon }(0,-y_{\beta }) = u_{\beta }+f, \end{aligned}$$

with \(u_{\beta }, u \in L^2(\varOmega )\) and \(u_{\beta } \rightarrow u\) in \(H^{-1}(\varOmega )\), converge strongly in \(H^1_0(\varOmega )\) as \(\beta \rightarrow 0^+\) to the solution \(y(u)\) of the original variational inequality. By using the penalized regularized variational inequality, i.e., the semi-linear partial differential equation, we define the following smoothed penalized problem that approximates MPEC (45) under consideration:

$$\begin{aligned}&\min \dfrac{1}{2}||y - y_d||^2_{L^2(\varOmega )}+\dfrac{\alpha }{2}||u||^2_{L^2(\varOmega )} \text { over } (u,y) \in L^2(\varOmega ) \times H^1_0(\varOmega )\nonumber \\&s.t. \quad a \le u \le b\,\,a.e.\,\varOmega ,\nonumber \\&A y -\frac{1}{\beta } \mathrm{max}_{\varepsilon }(0,-y)=u+f. \end{aligned}$$
(56)
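To make the approximation concrete, the regularized state equation in (56) can be solved by Newton's method; below is a minimal one-dimensional sketch, not taken from [9], assuming \(A = -\mathrm{d}^2/\mathrm{d}x^2\) on \((0,1)\) with homogeneous Dirichlet conditions, \(M = \{y \ge 0\}\), and the data \(u + f\) merged into a single right-hand side (all function names are ours):

```python
import numpy as np

def solve_state(rhs, beta, eps, n=50, tol=1e-8, max_iter=100):
    """Newton's method for the regularized penalized state equation
        A y - (1/beta) * max_eps(0, -y) = rhs,
    with A = -d^2/dx^2 on (0, 1), homogeneous Dirichlet conditions,
    discretized by central finite differences on n interior nodes."""
    h = 1.0 / (n + 1)
    # Tridiagonal stiffness matrix of -d^2/dx^2 (dense, for simplicity).
    A = (np.diag(2.0 * np.ones(n))
         - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2

    def m(r):   # smoothed max(0, r), applied componentwise
        return np.where(r >= eps, r - eps / 2.0,
                        np.where(r > 0.0, r**2 / (2.0 * eps), 0.0))

    def dm(r):  # its derivative, valued in [0, 1]
        return np.where(r >= eps, 1.0, np.where(r > 0.0, r / eps, 0.0))

    y = np.zeros(n)
    for _ in range(max_iter):
        F = A @ y - m(-y) / beta - rhs
        if np.linalg.norm(F, np.inf) < tol:
            break
        # The Jacobian A + (1/beta) diag(max_eps'(0, -y)) is positive
        # definite, so each Newton system is uniquely solvable.
        J = A + np.diag(dm(-y)) / beta
        y -= np.linalg.solve(J, F)
    return y

# A constant downward forcing activates the constraint y >= 0:
y = solve_state(rhs=-10.0 * np.ones(50), beta=1e-4, eps=1e-5)
```

For this data the unconstrained solution would be \(y = -5x(1-x)\), whereas the penalized solution dips below zero only by roughly \(10\beta \), illustrating the convergence \(y_{\beta } \rightarrow y(u)\) as \(\beta \rightarrow 0^+\).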

Since (56) is no longer an MPEC, more classical methods for the derivation of optimality conditions can be applied. The process is roughly as follows: the regularization of the nonsmoothness can be used to show that the solution mapping \(S\) of the PDE is Fréchet differentiable for each \(\varepsilon > 0\). After rewriting the problem in terms of the control \(u\), one can then characterize the solutions via a variational inequality which, after introducing the proper slack variables, leads to the following result.

Theorem 5

(Necessary optimality conditions for the penalized regularized problems) Let \(\beta ,\varepsilon > 0\) and \((y,u)\in H^1_0(\varOmega )\times L^2(\varOmega )\) be an optimal solution to (56). Then there exists an adjoint state \(p\in H^1_0(\varOmega )\) such that

$$\begin{aligned}&y + A^*p + \frac{1}{\beta }\mathrm{max}'_{\varepsilon }(0,-y)p = y_d,\end{aligned}$$
(57)
$$\begin{aligned}&Ay - \frac{1}{\beta }\mathrm{max}_{\varepsilon }(0,-y) = u+f,\end{aligned}$$
(58)
$$\begin{aligned}&u \in U_{ad}, \,\, (\alpha u - p,v - u)_{L^2} \ge 0,\,\,\forall v \in U_{ad}. \end{aligned}$$
(59)
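Condition (59) is equivalent to the pointwise projection formula \(u = P_{U_{ad}}(p/\alpha )\), which is how it is typically realized numerically. A minimal sketch checking this equivalence on sampled data (the setup and names are ours):

```python
import numpy as np

def project_box(v, a, b):
    """Pointwise projection onto U_ad = {w : a <= w <= b a.e.}."""
    return np.clip(v, a, b)

# The variational inequality (59),
#   u in U_ad,  (alpha*u - p, v - u)_{L^2} >= 0  for all v in U_ad,
# characterizes u as the projection u = P_{U_ad}(p / alpha).
rng = np.random.default_rng(0)
alpha, a, b = 0.5, -1.0, 1.0
p = rng.normal(size=1000)          # sampled adjoint values (illustrative)
u = project_box(p / alpha, a, b)

# Pointwise check of the VI: it suffices to test the extreme admissible
# values v = a and v = b, since the integrand is affine in v.
g = alpha * u - p
assert np.all(g * (a - u) >= -1e-12)
assert np.all(g * (b - u) >= -1e-12)
```

In practice this projection form is what one implements when solving (57)–(59), e.g., inside a fixed-point or semismooth Newton loop over the coupled system.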

By considering sequences of stationary points, rather than local or global minimizers in the primal variables, satisfying (57)–(59) along a sequence of positive numbers \(\beta \rightarrow 0^+\) and a bounded sequence \(\left\{ \varepsilon (\beta )\right\} \) with \(\varepsilon (\beta )/\beta \rightarrow 0\) as \(\beta \rightarrow 0^+\), it is shown in [9, Theorem 3.4] that there exist a sextuple

$$\begin{aligned}&(\tilde{u},\tilde{y},\tilde{v},\tilde{p},\tilde{r},\tilde{s})\in L^2(\varOmega )\times \left[ H^2(\varOmega ) \cap H^1_0(\varOmega )\right] \\&\quad \times L^2(\varOmega )\times H^1_0(\varOmega )\times H^{-1}(\varOmega )\times L^2(\varOmega ) \end{aligned}$$

and a subsequence of the stationary points, again indexed by \(\beta \), such that

$$\begin{aligned} u_{\beta } \stackrel{L^2}{\rightarrow } \tilde{u},\quad y_{\beta } \stackrel{{H^1_0}}{\rightarrow } \tilde{y},\quad \frac{1}{\beta }\mathrm{max}_{\varepsilon (\beta )}(0,-y_{\beta }) \stackrel{{H^{-1}}}{\longrightarrow } \tilde{v} \end{aligned}$$

and

$$\begin{aligned} p_{\beta } \stackrel{{H^1_0}}{\rightharpoonup } \tilde{p},\quad \quad \frac{1}{\beta }\mathrm{max}'_{\varepsilon (\beta )}(0,-y_{\beta })p_{\beta } \stackrel{{H^{-1}}}{\rightharpoonup } \tilde{r}, \end{aligned}$$

where \((\tilde{u},\tilde{y},\tilde{v},\tilde{p},\tilde{r},\tilde{s})\) is a C-stationary point for the original MPEC in the sense that (7)–(10) hold and in place of (12)–(15) one has

$$\begin{aligned} 0&\ge \tilde{v} ,\,\,a.e.\,\mathcal A (\tilde{y}),\end{aligned}$$
(60)
$$\begin{aligned} 0&= \tilde{v},\,\,a.e.\,\mathcal I (\tilde{y}),\end{aligned}$$
(61)
$$\begin{aligned} 0&= \tilde{p},\,\,a.e.\,\mathcal A (\tilde{y})\cap \left\{ x \in \varOmega \left| \tilde{v}(x) > 0\right. \right\} ,\end{aligned}$$
(62)
$$\begin{aligned} 0&\ge \langle \tilde{r},\tilde{p}\rangle _{H^{-1},H^1_0}\!, \end{aligned}$$
(63)

and for every \(\varepsilon > 0\) there exists a subset \(E_{\varepsilon } \subset \mathcal I (\tilde{y})\) with \(meas(\mathcal I (\tilde{y}){\setminus } E_{\varepsilon }) \le \varepsilon \) such that

$$\begin{aligned} 0 = \langle \tilde{r},\varphi \rangle _{H^{-1},H^1_0},\quad \forall \varphi \in H^1_0(\varOmega ) \text{ with } \varphi = 0 \text{ a.e. in } \varOmega {\setminus } E_{\varepsilon }. \end{aligned}$$

Such a point \((\tilde{u},\tilde{y})\) is said to be \(\mathcal E \)-almost-C-stationary and was introduced in [8] and [9]. This concept is unique to function space settings and at the moment only available for MPECs in this more regular setting. Nevertheless, it is strictly stronger than the limiting stationarity conditions of Sects. 3 and 4 if one were to apply the techniques used there on this class of MPECs. However, if \(\bar{r}\) in Theorem 4 happens to enjoy a pointwise interpretation, e.g. when \(\bar{p} \in H^2(\varOmega ) \cap H^1_0(\varOmega )\), then the \(\mathcal E \)-almost-C-stationarity conditions would be strictly weaker than those given by the limiting stationarity conditions.

In addition, the multiplier \(\tilde{r}\) can be shown to lie in \((L^{\infty }(\varOmega ))^*\), which is more regular in the sense that \(\tilde{r}\) is then a finitely additive, finite signed measure; see, e.g., [6, Theorem IV.8.16]. For simplicity we confine ourselves to the case where the coefficient functions \(b_i\) in the bilinear form (55) are equal to zero.

Proposition 7

(Increased regularity of the multiplier \(\tilde{r}\)) The limiting multiplier \(\tilde{r}\) in Theorem 5 is an element of \(H^{-1}(\varOmega )\cap (L^{\infty }(\varOmega ))^*\).

Proof

We apply a technique similar to the one used to prove [4, Theorem 3.3]. We begin by letting sign\((\cdot )\) represent the pointwise sign function and supposing that \(\sigma (\cdot )\) is a monotone \(C^1\)-smoothing of sign\((\cdot )\) with the properties

$$\begin{aligned} \sigma (x) < 0,\text { if } x < 0,\,\, \sigma (0) = 0,\,\,\sigma (x) > 0,\text { if } x > 0. \end{aligned}$$

For an arbitrarily fixed number \(\beta > 0\), multiply equation (57) above by \(\sigma (p_{\beta })\) and obtain the equality

$$\begin{aligned} \langle A^*p_{\beta },\sigma (p_{\beta })\rangle _{H^{-1},H^1_0}+\left( \frac{1}{\beta }\mathrm{max}'_{\varepsilon (\beta )}(0,-y_{\beta }) p_{\beta },\sigma (p_{\beta })\right) _{L^2} =\big (y_d - y_{\beta },\sigma (p_{\beta })\big )_{L^2}. \end{aligned}$$

To see that the first term of the latter equation is always nonnegative, we refer back to the definition of the bilinear form \(a(\cdot ,\cdot )\) in (55). This gives

$$\begin{aligned} \langle A^*p_{\beta },\sigma (p_{\beta })\rangle _{H^{-1},H^1_0} = \int \limits _{\varOmega }\sigma '(p_{\beta })\sum _{i,j=1}^l a_{ij}\frac{\partial p_{\beta }}{\partial x_j}\frac{\partial p_{\beta }}{\partial x_i}\,dx + \int \limits _{\varOmega }c\,p_{\beta }\sigma (p_{\beta })\,dx. \end{aligned}$$
(64)

The assumptions imposed above ensure that \(cp_{\beta }\sigma (p_{\beta })\ge 0\) almost everywhere on \(\varOmega \). Furthermore, observe that the first term on the right-hand side of equation (64) is nonnegative, since the derivative of \(\sigma \) is either zero or positive and since the operator \(A\) is coercive. It follows from the convergence results of Theorem 5 that there exists a constant \(\kappa > 0\) such that

$$\begin{aligned} 0\le \Big (\frac{1}{\beta }\mathrm{max}'_{\varepsilon (\beta )}(0,-y_{\beta })p_{\beta },\sigma (p_{\beta })\Big )_{L^2}\le \big (y_d - y_{\beta },\sigma (p_{\beta })\big )_{L^2}\le \kappa . \end{aligned}$$

Given the nonnegativity of the integrand \(\frac{1}{\beta }\mathrm{max}'_{\varepsilon (\beta )}(0,-y_{\beta })p_{\beta }\sigma (p_{\beta })\), it follows that

$$\begin{aligned} \int \limits _{\varOmega }|\frac{1}{\beta }\mathrm{max}'_{\varepsilon (\beta )}(0,-y_{\beta })p_{\beta }\sigma (p_{\beta })|dx \le \kappa , \quad \forall \beta > 0. \end{aligned}$$

Letting \(\sigma \) tend pointwise to sign\((\cdot )\), we conclude that \(\frac{1}{\beta }\mathrm{max}'_{\varepsilon (\beta )}(0,-y_{\beta })p_{\beta }\) is bounded in \(L^1(\varOmega )\). Hence, there exist a subsequence, still denoted by \(\beta \), and an element \(r^* \in (L^{\infty }(\varOmega ))^{*}\) such that \(\frac{1}{\beta }\mathrm{max}'_{\varepsilon (\beta )}(0,-y_{\beta })p_{\beta } \stackrel{*}{\rightharpoonup }r^*\) in \((L^{\infty }(\varOmega ))^{*}\). It follows that \(r^* = \tilde{r} \in (L^{\infty }(\varOmega ))^{*}\). \(\square \)

6 Conclusions and comparisons

In terms of the usefulness of the results for the development of numerical methods, the penalization/regularization technique has a clear advantage. Indeed, in this setting the practitioner is required to solve a sequence of KKT systems arising from smooth nonlinear programs (NLPs). Moreover, limits of subsequences of solutions to the NLPs are guaranteed, under weak assumptions, to satisfy a type of stationarity condition weaker than C-stationarity, yet stronger than so-called weak stationarity [19].

The development of a numerical method from the derivation technique described in Sect. 4 is somewhat more difficult. In contrast to the previously discussed method, in which one speaks of the convergence of stationary points, this method requires knowledge of optimal solutions for each of the penalized MPECs. However, provided with this information, one is guaranteed that each member of the sequence is strongly stationary and that the limit of subsequences of these solutions will satisfy the limiting stationarity system. The development of numerical methods realizing strong stationary points is one possible future direction.

Finally, though the limiting techniques provide a significant amount of insight into the limits of solutions satisfying strong stationarity conditions, the limiting calculus appears to impose certain restrictions on the ability to construct numerical methods in function spaces. This relates mainly to the fact that the existence of the sequences in Proposition 2 is guaranteed by the definition of the limiting coderivative, while the sequences in Theorem 4, along with their characteristics, had to be derived. Moreover, the relationship in (38) is clearly difficult to handle. If such a method were available, then the minimal requirements placed on the operator \(B\) would allow the practitioner to consider examples in which the control perturbation of the variational inequality is not distributed on the entire domain \(\varOmega \). Nevertheless, the technique used in Proposition 1 required very little in terms of the structure of the control. In this sense, the limiting calculus has an analytical advantage.