1 Introduction

The aim of this paper is to present a self-contained theory of a general Lagrange multipliers rule (LMR) for an abstract optimization problem in a Banach space with both equality and inequality constraints, which covers a wide range of theoretical and applied problems, including in particular optimal control problems with pure state and mixed state-control constraints.

A vast literature is devoted to the theory of the LMR as the most important necessary optimality condition, concerning a variety of classes of problems: finite and infinite dimensional, convex and nonconvex, smooth and nonsmooth, etc. (see e.g. Hurwicz (1958); Dubovitskii and Milyutin (1965); Pshenichnyi (1982); Gamkrelidze and Kharatishvili (1967); Varaiya (1967); Nagahisa and Sakawa (1969); Kurcyusz (1976); Maurer and Zowe (1979); Norris (1973); Pourciau (1980); Rockafellar (1993); Tamminen (1994); Jahn (1994) and the literature therein). We do not aim to survey all the corresponding results, nor to cover all the known problems. Instead, we propose a formulation (Theorem 2.1) which, while not pretending to be new, is, in our opinion, reasonably general and, at the same time, simple and convenient for practical use in many situations, such as optimal control problems with state and mixed control-state constraints Milyutin et al. (2004); Dmitruk and Osmolovskii (2014), with age-structured systems Osmolovskii and Veliov (2017), etc. Note only that the closest result to ours is probably that of Norris (1973), obtained under slightly different assumptions and with a different proof.

Any proof of the LMR is essentially based on certain results from convex analysis and functional analysis, which therefore will be given first. The material presented here constitutes a self-contained piece of theory, involving only standard notions and facts, and avoiding the more difficult concepts of nonsmooth analysis, which are highly specific and thus not accessible to the nonspecialist. This theory is therefore fully accessible to students of mathematical specialties and can be used for teaching purposes.

The paper is organized as follows. In Sect. 2 we formulate the main result: the Lagrange multipliers rule for an abstract optimization problem (Theorem 2.1). Section 3 is devoted to separation theorems and related results used in the extremum theory. Some properties of sublinear functionals are presented in Sect. 4. In Sect. 5 we consider questions related to the classical Lyusternik theorem on the tangent manifold. Note again that all concepts and facts in Sects. 3–5 are to some extent well known, so we sometimes do not give references to their original sources. On the other hand, for the reader's convenience, we give proofs for most of these facts. The reader who is familiar with this preliminary material can skip it and go directly to the proof of the main result, which is given in Sect. 6.

2 Main Result

Let X, Y, and Z_i, i = 1, …, ν, be Banach spaces, \(\mathcal {D}\subset X\) an open set, and K_i ⊂ Z_i, i = 1, …, ν, closed convex cones with nonempty interiors. Let \(F_0:\mathcal {D}\to \mathbb {R},\) \(g:\mathcal {D}\to Y,\) and \(f_i : \mathcal {D}\to Z_i\), i = 1, …, ν, be given mappings. Consider the following optimization problem:

$$\displaystyle \begin{aligned} F_0(x)\to \min, \qquad f_i(x) \in K_i\,, \quad i=1,\ldots, \nu, \qquad g(x)=0. \end{aligned} $$
(1)

Let \(K_i^0 := \{z_i^* \in Z_i^*:\; \langle z_i^*,z_i \rangle \leqslant 0 \mbox{ for every } z_i \in K_i \}\) be the polar cone to K_i, i = 1, …, ν. Here \(\langle z_i^*, z_i \rangle \) is the duality pairing between Z_i and its dual space \(Z^*_i.\) We study the local minimality of an admissible point \(x_0 \in \mathcal {D}.\)

It is worth noting that inequality constraints \(f_i(x)\leqslant 0,\) where \(f_i:\mathcal {D}\to \mathbb {R}\) are given functionals, may also be presented in the form f_i(x) ∈ K_i if we put \(K_i=\mathbb {R}_- := (-\infty ,0].\) Then \(K_i^0 =\mathbb {R}_+ := [0,\infty )\). On the other hand, all the inequality constraints f_i(x) ∈ K_i can be written as one constraint f(x) ∈ K if we define a mapping f : X → Z = Z_1 × … × Z_ν by f(x) = (f_1(x), …, f_ν(x)), and a cone K = K_1 × … × K_ν in Z. However, we keep the form (1), since it visibly corresponds to the way constraints appear in concrete problem statements.

We impose the following

Assumptions

(1) the objective function F_0 and the mappings f_i are Fréchet differentiable at x_0, and the operator g is strictly differentiable at x_0 (smoothness of the data functions);

(2) the image of the derivative g′(x_0) is closed in Y (weak regularity of the equality constraint).

(The definition of strictly differentiable operator will be recalled in Sect. 5.3.)

The following theorem gives necessary conditions for a point \( x_0 \in \mathcal {D}\) to be a local minimizer for problem (1).

Theorem 2.1

Let x_0 provide a local minimum in problem (1). Then there exist Lagrange multipliers \(\alpha _0\geqslant 0,\;\, z_i^* \in K^0_i,\;\; i=1,\,\ldots ,\,\nu ,\) and y* ∈ Y*, satisfying the nontriviality condition

$$\displaystyle \begin{aligned} \alpha_0+\sum_{i=1}^\nu\|z_i^*\|+\|y^*\|>0,\end{aligned} $$
(2)

the complementary slackness conditions

$$\displaystyle \begin{aligned} \langle z_i^*,\, f_i(x_0) \rangle =0,\qquad i=1,\ldots,\nu,\end{aligned} $$
(3)

and such that the Lagrange function

$$\displaystyle \begin{aligned} \mathcal{L}(x) = \; \alpha_0 F_0(x) + \sum_{i=1}^\nu \langle z_i^*, \,f_i(x)\rangle+ \langle y^*, \,g(x)\rangle\end{aligned} $$

is stationary at x 0 : \( \mathcal {L}'(x_0)=0,\;\) i.e.,

$$\displaystyle \begin{aligned} \alpha_0 F_0^{\prime}(x_0)\, +\, \sum_{i=1}^\nu z_i^* f_i^{\prime}(x_0)\, +\, y^* g^{\prime}(x_0) \;=\; 0.\end{aligned} $$
(4)

We prove this theorem in Sect. 6, but first we need a number of auxiliary notions and assertions. We start with some facts of linear functional analysis, most of which are, of course, well known, while others are specific for the extremum theory.
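Before passing to the auxiliary material, let us illustrate Theorem 2.1 on a toy instance of problem (1): minimize F_0(x) = x_1 + x_2 over the unit disk in \(\mathbb {R}^2,\) i.e., ν = 1, \(Z_1=\mathbb {R},\) \(K_1=\mathbb {R}_-,\) and no equality constraint. The following minimal sketch (in Python, with NumPy; the data and all names are our own illustration, not part of the theory) checks conditions (2)–(4) at the minimizer numerically.

```python
import numpy as np

# Toy instance of problem (1): minimize F0(x) = x1 + x2
# subject to f1(x) = x1^2 + x2^2 - 1 in K1 = R_-  (no equality constraint g).
F0_grad = lambda x: np.array([1.0, 1.0])
f1 = lambda x: x[0]**2 + x[1]**2 - 1.0
f1_grad = lambda x: 2.0 * x

x_min = np.array([-1.0, -1.0]) / np.sqrt(2.0)   # the known minimizer
alpha0, z1 = 1.0, np.sqrt(2.0) / 2.0            # candidate multipliers, z1 in K1^0 = R_+

# nontriviality (2): alpha0 + |z1| > 0
assert alpha0 + abs(z1) > 0
# complementary slackness (3): z1 * f1(x_min) = 0 (constraint is active)
assert abs(z1 * f1(x_min)) < 1e-12
# stationarity (4): alpha0 * F0'(x_min) + z1 * f1'(x_min) = 0
assert np.allclose(alpha0 * F0_grad(x_min) + z1 * f1_grad(x_min), 0.0)
```

Here the constraint is active, the multiplier \(z_1^*=\sqrt 2/2\) lies in the polar cone \(K_1^0=\mathbb {R}_+,\) and the minimum is normal (one can take α_0 = 1).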

3 Some Facts of Linear Functional Analysis

Each of these facts holds in an appropriate class of vector spaces: linear topological spaces, locally convex spaces, normed spaces, Banach spaces, and sometimes even vector spaces without any topology. The first-time reader can harmlessly assume that everything happens in a Banach space.

3.1 Separation of Convex Sets

Let X be a topological vector space, x_1, x_2 ∈ X. By [x_1, x_2] we denote the interval in X with the endpoints x_1, x_2, i.e., [x_1, x_2] = co {x_1, x_2}. By X* we denote the dual space of X, consisting of all continuous linear functionals \(x^*:X\to \mathbb {R}.\)

Definition

Let A and B be two sets in X. A nonzero functional x* ∈ X* separates these sets if

$$\displaystyle \begin{aligned} \sup_{x\in A}\,\langle x^*,x\rangle \;\leqslant\; \inf_{x\in B}\,\langle x^*,x\rangle. \end{aligned} $$
(5)

Obviously, this is equivalent to the existence of a number c such that \(\langle x^*,x\rangle \leqslant c\) on A, and \(\langle x^*,x\rangle \geqslant c\) on B. It is said that the hyperplane ⟨x*, x⟩ = c separates A and B.

The following theorem is a key abstract assertion, on which a comprehensive theory of necessary extremum conditions is based.

Theorem 3.1 (Hahn–Banach)

Let A be an open convex set, B a convex set. Suppose that A ∩ B = Ø. Then there exists a nonzero continuous linear functional separating A and B.

This is the Hahn–Banach theorem in the “geometric form”, or the separation theorem. Its proof can be found in any textbook on functional analysis (e.g. in Kolmogorov and Fomin (1968); Dunford and Schwartz (1968)).

Corollary 3.1

Let X be a locally convex space, A a convex closed set, and B a convex compact set. Suppose that A ∩ B = Ø. Then there exists a continuous linear functional x* ∈ X* strictly separating A and B, i.e.,

$$\displaystyle \begin{aligned} \sup_{x\in A}\,\langle x^*,x\rangle \;<\; \inf_{x\in B}\,\langle x^*,x\rangle. \end{aligned} $$
(6)

(Any such x* is necessarily nonzero.)

The proof follows from the fact that the compact set B can be enclosed in an open convex set \(\widetilde B\) that still does not intersect A. Then any x* separating A and \(\widetilde B\) satisfies (6).

Along with Theorem 3.1, the following “dual” fact holds as well (Dunford and Schwartz 1968).

Theorem 3.2 (Hahn–Banach)

Let A be a convex set in X*, open in the weak-* topology, and B a convex set in X*. Suppose that A ∩ B = Ø. Then A and B can be separated by a nonzero element x ∈ X.

Now, let a set M ⊂ X be given.

Definition 3.1

An element x* ∈ X* is a support functional to the set M if

$$\displaystyle \begin{aligned}\inf x^*(M)\;:=\;\inf_{x\in M}\langle x^*,x\rangle> -\infty.\end{aligned} $$

This obviously means that \(\langle x^*,x\rangle \geqslant a\) on M for some real a. The set of all support functionals to M is denoted by M*. Clearly, M* is a closed convex cone. (The set −M* is called the barrier cone of M.) Note that if K is a cone, then x* ∈ K* iff \(\langle x^*,x\rangle \geqslant 0\) for all x ∈ K, and in this case K* is said to be the dual (or conjugate) cone of K. An easy property is that, if K_1 and K_2 are two nonempty cones, then \((K_1+K_2)^* = K_1^*\cap K_2^*\) (while the question about (K_1 ∩ K_2)* is not that simple). Also note that, if one of the two sets in the separation theorem is a cone, then the constant c can be taken to be zero, and then x* or −x* is an element of the dual cone.
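For example, let \(X=\mathbb {R}^n\) and let K be the nonnegative orthant. Then the dual and polar cones are

$$\displaystyle \begin{aligned} K=\mathbb{R}^n_+ \quad \Longrightarrow\quad K^* = \mathbb{R}^n_+\,, \qquad K^0 = -K^* = \mathbb{R}^n_-\,, \end{aligned} $$

since \(\langle x^*,x \rangle \geqslant 0\) for all \(x \in \mathbb{R}^n_+\) exactly when all coordinates of x* are nonnegative.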

Let L ⊂ X be a subspace. Then L* consists of all functionals vanishing on L. The set of such functionals is denoted by L^⊥ and called the annihilator of the subspace L. So, the dual cone of a subspace coincides with its annihilator.

Another basic example is given by the following

Lemma 3.1

Let \(l: X \to \mathbb {R}\) be a nonzero linear functional on a vector space X. Define an open half-space K = {x ∈ X : ⟨l, x⟩ > 0}. Let m ∈ K*, i.e., ⟨m, x⟩ ⩾ 0 for all x ∈ K. Then there is α ⩾ 0 such that m = α l. Thus, the dual cone of a half-space is the ray spanned by the functional defining this half-space.

Proof

Consider the mapping \(A: X \to \mathbb {R}^2,\; Ax = (\langle l, x \rangle ,\, \langle m, x \rangle ).\) Obviously, its image AX is a linear subspace not containing the point (1, −1) (by the assumption on m), hence a proper subspace of \(\mathbb {R}^2,\) and so there is a nonzero pair \((\alpha ,\beta )\in \mathbb {R}^2\) such that α⟨l, x⟩ + β⟨m, x⟩ = 0 for all x ∈ X, i.e., αl + βm = 0. If β = 0, then α ≠ 0 and l = 0, contradicting the assumption that l is nonzero. Therefore, β ≠ 0, and we can set β = −1. Thus, m = αl. Obviously, α ⩾ 0, since ⟨l, x⟩ > 0 implies ⟨m, x⟩ ⩾ 0. \(\Box \)

Finally, one can easily show that two sets A and B in X can be separated if and only if there exist nonzero functionals x* and y* such that

$$\displaystyle \begin{aligned}x^*\in A^*,\quad y^*\in B^*, \quad x^*+y^*=0,\quad \inf x^*(A)+\inf y^*(B)\geqslant 0\, .\end{aligned} $$

This equivalent formulation of separation of two sets makes it possible to generalize the separation theorem for the case of a finite number of sets. Let us pass to this generalization.

3.2 The Dubovitskii–Milyutin Theorem on the Nonintersection of Cones

Let, as before, X be a linear topological space.

Theorem 3.3 (Dubovitskii–Milyutin (1965))

Let Ω 1, …, Ω s ,  Ω be nonempty convex cones in X, among which the cones Ω 1, …, Ω s are open. Then Ω 1 ∩… ∩ Ω s  ∩ Ω = Ø iff there exist linear functionals

$$\displaystyle \begin{aligned} x_1^*\in \Omega_1^*\,,\quad \ldots,\quad x_s^*\in \Omega_s^*\,,\quad x^*\in \Omega^*, \end{aligned} $$
(7)

not all equal zero, such that

$$\displaystyle \begin{aligned} x_1^*+\, \ldots\, + x_s^*\, +\, x^*=\, 0\,. \end{aligned} $$
(8)

Dubovitskii and Milyutin called relation (8) the Euler (Euler–Lagrange) equation for the given system of cones. Although the name may seem strange, the fact is that all first-order necessary conditions for a local minimum in various problems on a conditional extremum, including the Euler–Lagrange equation in the calculus of variations and even the maximum principle in optimal control, can be obtained by using this (at first glance, primitive) equality. The corresponding procedure is called the Dubovitskii–Milyutin scheme (or approach) and will be presented below.

To prove Theorem 3.3, we need the following simple fact. Let X_1, …, X_s be Banach spaces, and W = X_1 × … × X_s their product. Then w* ∈ W* iff there exist \(x^*_i\in X^*_i\), i = 1, …, s, such that, for any w = (x_1, …, x_s) ∈ W, we have \( \langle w^*, w \rangle =\langle x^*_1,x_1\rangle + \ldots + \langle x^*_s,x_s\rangle .\)

Proof of Theorem 3.3.

(⇒) Suppose that Ω1 ∩… ∩ Ω s  ∩ Ω = Ø. In the product W of s copies X ×… × X = X s of the space X consider two cones: K = Ω1 ×… × Ω s and D = {w = (x 1, …, x s )∣x 1 = … = x s  = x ∈ Ω}. Thus, D is the “diagonal” in the product Ω ×… × Ω of s copies of Ω. We claim that K ∩ D = Ø.

Suppose not, and let w ∈ K ∩ D. Since w ∈ K, then w = (x 1, …, x s ), where x 1 ∈ Ω1, …, x s  ∈ Ω s . Since w ∈ D, we have x 1 = … = x s  = x ∈ Ω. Thus, x ∈ Ω1 ∩… ∩ Ω s  ∩ Ω, which contradicts the nonintersection of the cones.

Since the cones K and D are convex, and K is open, the Hahn–Banach separation theorem says that the condition K ∩ D = Ø implies the existence of a nonzero w* ∈ W* such that

$$\displaystyle \begin{aligned}\langle w^*,K \rangle \geqslant 0 \quad \mbox{and} \quad \langle w^*,D \rangle \leqslant 0\,. \end{aligned}$$

But \(w^*=(x_1^*,\ldots ,x_s^*),\) where all \(x_i^*\in X^*,\) and the condition \(\langle w^*,K \rangle \geqslant 0\) means that \(\langle x^*_1,x_1 \rangle + \ldots + \langle x^*_s,x_s \rangle \geqslant 0\) for any x_1 ∈ Ω_1, …, x_s ∈ Ω_s. From the positive homogeneity of all Ω_i it easily follows that all \(x_i^*\in \Omega _i^*,\;\, i=1,\ldots ,s.\) Moreover, not all \(x_1^*,\ldots ,x_s^*\) equal zero, since w* ≠ 0.

Further, the condition \(\langle w^*,D \rangle \leqslant 0\) means that \(\langle x_1^*+\ldots +x_s^*, x\rangle \leqslant 0\) for all x ∈ Ω. Set \(x^*= -(x^*_1+\ldots +x_s^*).\) Then x* ∈ Ω* and \(x^*_1+\ldots +x_s^*+x^*=0.\)

(⇐) Suppose there is a nonzero collection of functionals \(x_1^*,\ldots ,x_s^*,\,x^*\) satisfying conditions (7) and (8), but the cones Ω_1, …, Ω_s, Ω intersect. Let \(\hat x\) be their common element. Among the functionals \(x_1^*, \ldots , x_s^*,\) there is at least one nonzero: \(x_{i_0}^* \ne 0\) (otherwise, by (8), the last functional x* = 0 too, so the whole collection is trivial, a contradiction). Obviously, \(\langle x_{i_0}^*, \hat x\rangle > 0,\) since \(\hat x \in \mbox{int}\,\Omega _{i_0}\) and a nonzero functional from \(\Omega _{i_0}^*\) is strictly positive on the interior of \(\Omega _{i_0}.\) For the rest of the \(x_i^*\) we have \(\langle x^*_i,\hat x\rangle \geqslant 0,\) and \(\langle x^*,\hat x\rangle \geqslant 0\) as well. Consequently, \(\bigl \langle \sum _{i=1}^s x_i^* + x^*,\,\hat x \bigr \rangle >0,\) which contradicts the equality (8). \(\Box \)
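A minimal illustration of Theorem 3.3 in \(\mathbb {R}^2\): take s = 1, the open cone Ω_1 = {(x, y) : y > 0}, and the closed cone Ω = {(x, y) : y ⩽ 0}. Then Ω_1 ∩ Ω = Ø, and the Euler equation (8) indeed holds with the nontrivial pair \(x_1^* = (0,1)\in \Omega _1^*\) and \(x^* = (0,-1)\in \Omega ^*:\)

$$\displaystyle \begin{aligned} x_1^* + x^* = (0,1) + (0,-1) = 0\,. \end{aligned} $$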

We will also give an analog (generalization) of this theorem for a finite number of convex sets, not necessarily cones.

Theorem 3.4 (Dubovitskii–Milyutin (1965))

Let M 1, …, M s , M be nonempty convex sets in a linear topological space X, among which the sets M 1, …, M s are open. Then the condition

$$\displaystyle \begin{aligned}M_1\cap\ldots\cap M_s\cap M = \O \end{aligned}$$

is equivalent to the existence of a nontrivial collection of functionals \((x_1^*,\,\ldots , x_s^*,\,x^*)\) from X* such that

$$\displaystyle \begin{aligned} x_1^*\, +\, \ldots\, +\, x_s^*\, +\, x^*\; =\, 0\,, \end{aligned} $$
(9)
$$\displaystyle \begin{aligned} \inf\, \langle x_1^*,M_1\rangle\, +\, \ldots\, +\, \inf\, \langle x_s^*,M_s\rangle\, +\, \inf\, \langle x^*,M\rangle \geqslant 0\,. \end{aligned} $$
(10)

The proof can be reduced to the case of cones, where one can apply Theorem 3.3. However, we give another, perhaps even simpler, proof by a direct use of the separation theorem.

Proof

(⇒) In the space W = X^s, consider the set A of elements of the form w = (x_1 − x, …, x_s − x), where x_1 ∈ M_1, …, x_s ∈ M_s, x ∈ M. Obviously, A is open and convex. The nonintersection of M_1, …, M_s and M yields that A does not contain the zero element (0, …, 0) ∈ X^s. Consequently, there exists a functional \(w^*=(x_1^*,\ldots ,x_s^*)\) that separates A from zero: \(\langle w^*,w \rangle \geqslant 0\) for all w ∈ A, that is,

$$\displaystyle \begin{aligned}\langle x_1^*,x_1\rangle+\ldots+\langle x_s^*,x_s\rangle\; - \langle x_1^* + \ldots + x_s^*, x\rangle \geqslant 0 \end{aligned}$$

for all x_1 ∈ M_1, …, x_s ∈ M_s, x ∈ M. Set \(x^*= -(x_1^* +\ldots +x_s^*).\) Obviously, the collection \((x_1^*,\;\ldots ,\;x_s^*,\;x^*)\) is nonzero and satisfies conditions (9), (10).

(⇐) Suppose there is a nonzero collection of functionals \(x_1^*,\ldots ,x_s^*,\,x^*\) satisfying conditions (9), (10), but the sets M_1, …, M_s, M have a common point \(\hat x.\) Denote for brevity \(m_i = \inf \, \langle x_i^*,M_i\rangle \) and \(m = \inf \, \langle x^*,M\rangle .\) Obviously, at least one \(x_{i_0}^* \ne 0,\) whence \(\langle x_{i_0}^*, \hat x\rangle > m_{i_0}\) because \(\hat x \in \mbox{int}\,M_{i_0}\) (a nonzero linear functional does not attain its infimum over an open set), while all other \(\langle x^*_i,\hat x\rangle \geqslant m_i\) and \(\langle x^*,\hat x\rangle \geqslant m.\) Then, in view of (10),

$$\displaystyle \begin{aligned}\Bigl\langle \sum_{i=1}^s x_i^* + x^*,\,\hat x \Bigr\rangle \;>\; \sum_{i=1}^s m_i\, + m \;\geqslant\; 0, \end{aligned}$$

which contradicts the equality (9). \(\Box \)

Note that Theorem 3.3 directly follows from the last one, because if M is a cone, then \(\inf \,\langle x^*,M \rangle =c > -\infty \) is equivalent to saying that c = 0 and x* ∈ M*.

Also note that both Theorems 3.3 and 3.4 are in fact equivalent to Theorem 3.1, but are more prepared for application in the theory of optimization.

3.3 The Dual Cone to the Intersection of Cones

An important corollary of Theorem 3.3 is the following formula (Dubovitskii and Milyutin 1965).

Lemma 3.2

Let K_1 and K_2 be convex cones, one of which intersects the interior of the other: \(\;( \mathop {\mathrm {int}} K_1)\cap K_2\ne \O .\;\) Then

$$\displaystyle \begin{aligned} (K_1\cap K_2)^*\; =\, K_1^* + K_2^*\,. \end{aligned} $$
(11)

Proof

The inclusion ⊃ is trivial, so we only have to prove ⊂. Take any p ∈ (K_1 ∩ K_2)*, i.e., \((p,\,K_1\cap K_2) \geqslant 0.\) We need to represent it as p = p_1 + p_2, where \(p_1\in K_1^*\) and \(p_2\in K_2^*\,.\) Assume that p ≠ 0 (otherwise there is the trivial representation 0 = 0 + 0).

Introduce one more cone, an open half-space K 0 = {x | (p, x) < 0}. It is nonempty, since p≠0. Obviously, \(K_0\cap \, \mathop {\mathrm {int}} K_1\cap K_2 =\O \) (otherwise \(\exists \, x\in \, \mathop {\mathrm {int}} K_1\cap K_2\) such that x ∈ K 0, i.e., (p, x) < 0, which contradicts the choice of p.)

Since the cones K 0 and \( \mathop {\mathrm {int}} K_1\) are open, by the Dubovitskii–Milyutin Theorem 3.3 there exist \(p_i\in K_i^*,\;\, i=0,1,2,\) not all zero, such that p 0 + p 1 + p 2 = 0 (we took into account that \(( \mathop {\mathrm {int}} K_1)^* = K_1^*).\) But p 0 = −αp with some \(\alpha \geqslant 0,\) i.e., we have αp = p 1 + p 2. If α = 0, we get p 1 + p 2 = 0, and both these functionals are not zero. Then, again by Theorem 3.3, we get \( \mathop {\mathrm {int}} K_1\cap K_2= \O ,\) which contradicts the assumption. Therefore, α > 0, and hence \(p= \frac 1 \alpha (p_1+p_2)\in K_1^* +K_2^*\,.\;\) \(\Box \)
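For example, in \(\mathbb {R}^2\) take the half-planes K_1 = {x : x_2 ⩾ 0} and K_2 = {x : x_1 ⩾ 0}. Then \(( \mathop {\mathrm {int}} K_1)\cap K_2\ne \O \) (e.g., the point (1, 1) belongs to both), \(K_1\cap K_2 = \mathbb {R}^2_+,\) and formula (11) reads

$$\displaystyle \begin{aligned} (K_1\cap K_2)^* = \mathbb{R}^2_+ = \mathbb{R}_+(0,1) + \mathbb{R}_+(1,0) = K_1^* + K_2^*\,. \end{aligned} $$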

3.4 The Support Cone to a Convex Set at a Point

Let X be a linear topological space, M ⊂ X a convex set.

Definition

An element l ∈ X* is called a support functional, or an inner normal, to the set M at a point \(x_0 \in \overline M\) if \(\langle l,M \rangle \geqslant \langle l,x_0 \rangle .\) The set of all such functionals will be denoted by M*(x_0). Obviously, it is a closed convex cone. In the particular case when M itself is a cone and x_0 = 0, we obtain M*(0) = M*, i.e., the support cone to a cone M at the origin is exactly the above dual (conjugate) cone M*.

Together with the cone M*(x_0), introduce the set Ω(x_0) = {h ∈ X : ∃ α > 0 such that \(x_0 +\alpha h \in \mathop {\mathrm {int}} M\}.\) Obviously, it is an open convex cone (possibly empty), which is called the cone of interior directions to the set M at the point x_0.

Lemma 3.3

If M has a nonempty interior, then Ω*(x_0) = M*(x_0).

Proof

With no loss of generality we take x 0 = 0. Then \(\displaystyle \Omega (0) = \bigcup _{\alpha >0} \alpha ( \mathop {\mathrm {int}} M),\) so \(\;\langle l,\Omega (0) \rangle \geqslant 0 \iff \langle l, \mathop {\mathrm {int}} M \rangle \geqslant 0 \iff \langle l, M \rangle \geqslant 0,\;\) q.e.d. \(\Box \)

Corollary 3.2

If K is a convex cone, and \(x_0 \in \overline K,\) then

K*(x_0) = {l ∈ K* : ⟨l, x_0⟩ = 0}.

Proof

We have to show that the inequality \(\langle l, K \rangle \geqslant \langle l, x_0 \rangle \) holds ⇔ \(\langle l, K \rangle \geqslant 0\) and 〈l, x 0〉 = 0. Let us prove the implication ⇒ .

Since x 0 + K ⊂ K, we obtain \(\langle l,\, x_0 + K \rangle \geqslant \langle l, x_0 \rangle ,\) hence \(\langle l, K \rangle \geqslant 0.\) In particular, \(\langle l, x_0 \rangle \geqslant 0.\) But since \(0 \in \overline K\) and l ∈ K (x 0), we have \(0 = \langle l, 0 \rangle \geqslant \langle l, x_0 \rangle ,\) whence \(\langle l, x_0 \rangle \leqslant 0,\) and so 〈l, x 0〉 = 0. The reverse implication is trivial. \(\Box \)

The Tangential Cone

Note by the way that there is yet another cone related to a convex set M at a point x 0, called the tangential cone

$$\displaystyle \begin{aligned}T (x_0)= \; \overline{\, \bigcup_{\alpha\geqslant 0} \alpha (M- x_0)}\; =\; \overline{\,\mathbb{R}_+ (M- x_0)}.\end{aligned} $$

Obviously, it is the minimal closed cone containing the set M − x_0. An easy fact is that \(T(x_0) = \overline {\Omega (x_0)}\) provided that M has nonempty interior. (Indeed, the inclusion \(T(x_0) \supset \overline {\Omega (x_0)}\) is trivial. On the other hand, since \( \mathop {\mathrm {int}} M \ne \O ,\) we have \(M \subset \overline {\, \mathop {\mathrm {int}} M},\) whence \(M- x_0 \subset \overline {\Omega (x_0)},\) and then \(\alpha (M- x_0) \subset \overline {\Omega (x_0)}\) for any \(\alpha \geqslant 0,\) hence \(T(x_0) \subset \overline {\Omega (x_0)}.\)) However, we will not use this cone in what follows.

3.5 Lemma on the Nontriviality of Annihilator

Let L ⊂ X be a linear manifold. The set L^⊥ = {x* ∈ X* ∣ ⟨x*, x⟩ = 0 ∀ x ∈ L} is called the annihilator of L. As was noted above, any x* ∈ L^⊥ is a support functional to L, and the converse is also true. Thus, L* = L^⊥. Obviously, L^⊥ ⊂ X* is a subspace, i.e., a closed linear manifold.

A simple corollary of the Hahn–Banach separation Theorem 3.1 is the following

Lemma 3.4

Let X be a locally convex space and L ⊂ X a subspace which does not coincide with X. Then L^⊥ contains a nonzero element.

(Recall that a topological vector space X is locally convex if any nonempty open set in X contains a nonempty open convex subset.)

Proof

Take any x ∈ X ∖ L. Since L is closed and the space X is locally convex, there exists an open convex set U ∋ x which does not intersect L. Then U and L can be separated by a nonzero functional x*; therefore, x* is bounded on L either from above or from below, which in turn implies that ⟨x*, L⟩ = 0 (a linear functional bounded from one side on a subspace must vanish on it), q.e.d. \(\Box \)

Note that here the assumption of the closedness of L is essential.

3.6 The Banach Open Mapping Theorem

The following theorem is one of the basic principles of functional analysis.

Theorem 3.5 (S. Banach)

Let X and Y be Banach spaces and A : X → Y a continuous linear surjective operator. Then the image of the unit ball contains a ball of some radius r > 0:

$$\displaystyle \begin{aligned} A B_1(0) \supset B_r(0). \end{aligned}$$

This assertion can be equivalently formulated as follows: for any y ∈ Y there is an x ∈ X such that Ax = y and \(||x|| \leqslant C\,||y||\,,\) where the constant C does not depend on y. (One can take C = 1∕r.)

The latter formulation is more convenient for application in optimization theory than the standard formulation saying that the image of any open set is open.
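In finite dimensions, a solution with the estimate \(||x|| \leqslant C\,||y||\) can be produced by the Moore–Penrose pseudoinverse, with C = 1∕σ_min(A). The following sketch (in Python, with NumPy; the matrix is our own illustrative data, not part of the theory) checks this:

```python
import numpy as np

# A maps R^3 onto R^2 (full row rank), so it is surjective.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
y = np.array([2.0, 3.0])

x = np.linalg.pinv(A) @ y                            # minimal-norm solution of Ax = y
C = 1.0 / np.linalg.svd(A, compute_uv=False)[-1]     # C = 1 / sigma_min(A)

assert np.allclose(A @ x, y)                                 # Ax = y
assert np.linalg.norm(x) <= C * np.linalg.norm(y) + 1e-12    # ||x|| <= C ||y||
```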

3.7 Lemma on the Closed Image of a Combined Operator

An important role in optimization theory is played by the following fact (Alekseev et al. 1987; Levitin et al. 1974).

Lemma 3.5

Let X, Y, Z be Banach spaces, A : X → Y and B : X → Z continuous linear operators. Let AX = Y and let the set \(B( \mathop {\mathrm {Ker}} A)\) be closed in Z. Then the “combined” operator T : X → Y × Z, x ↦ (Ax, Bx), has a closed image.

Proof

Let a sequence x_n ∈ X be such that Ax_n = y_n, Bx_n = z_n, and (y_n, z_n) → (y_0, z_0). We must show that there exists x ∈ X such that Ax = y_0 and Bx = z_0. Since AX = Y, we can assume that y_0 = 0. (Otherwise, take \(\hat x\) such that \(A\hat x =y_0\) and pass to the sequence \(x_n - \hat x,\) for which \(A(x_n - \hat x) = y_n - y_0 \to 0.\))

Thus, Ax_n → 0. By the Banach open mapping theorem, there exists a sequence \(x^{\prime }_n\to 0\) such that \(A x^{\prime }_n= A x_n\,.\) Define \(\bar x_n = x_n - x^{\prime }_n\,.\) Obviously, \(\bar x_n \in \mathop {\mathrm {Ker}} A\) and \(B\bar x_n = B x_n -B x^{\prime }_n \to z_0\) (it tends to the same point z_0, since \(Bx^{\prime }_n \to 0\)), so z_0 is a limit point of the sequence \(B\bar x_n \in B( \mathop {\mathrm {Ker}} A).\) Since \(B( \mathop {\mathrm {Ker}} A)\) is closed by assumption, \(z_0 \in B( \mathop {\mathrm {Ker}} A),\) which means that ∃ x_0 such that Ax_0 = 0 and Bx_0 = z_0. \(\Box \)

Clearly, this lemma remains valid if the image Y 1 := AX is just a closed subspace in Y, not necessarily the whole space Y. Then Y 1 itself is a Banach space, and the operator A : X → Y 1 maps onto, hence the image of T = (A, B) is closed in Y 1 × Z, and therefore, in Y × Z.

The following particular case of Lemma 3.5 is most usable in optimal control theory for systems of ODEs. (There, the operator A is the linearized control system, and B is the derivative of endpoint constraints.)

Corollary 3.3

Let A : X → Y be a linear surjective operator, and B : X → Z a linear operator with values in a finite-dimensional space (dim Z < ∞). Then the combined operator Tx = (Ax, Bx) has a closed image in Y × Z.

The proof follows from the fact that the subspace \(B( \mathop {\mathrm {Ker}}\, A)\) is always finite dimensional, and therefore closed.

3.8 Annihilator of the Kernel of a Surjective Operator

Let X and Y be Banach spaces and A : X → Y a linear operator. Recall that its adjoint operator A* : Y* → X* maps any y* ∈ Y* to the functional x* ∈ X* defined by the relation ⟨x*, x⟩ = ⟨y*, Ax⟩. Thus, x* = A*y*. Sometimes we will also use another, “physical” notation: x* = y*A, which is more convenient in formulas.

The following fact is well-known and widely used.

Lemma 3.6

Let the operator A : X → Y be surjective. Then

$$\displaystyle \begin{aligned}(Ker\,A)^*\; =\; A^*\,Y^*. \end{aligned}$$

In other words, any linear functional x* ∈ X* vanishing on \( \mathop {\mathrm {Ker}}\,A\) has the form ⟨x*, x⟩ = ⟨y*, Ax⟩ with some y* ∈ Y*. And vice versa, any functional of this form vanishes on \( \mathop {\mathrm {Ker}} A.\) (The last statement does not require the surjectivity of A.)

Proof

a)

Show that \(A^*Y^*\subset ( \mathop {\mathrm {Ker}} A)^*.\) Indeed, let x* = A*y*, i.e., ⟨x*, x⟩ = ⟨y*, Ax⟩ ∀ x ∈ X. Then, obviously, for any \(x\in \mathop {\mathrm {Ker}} A\) we have ⟨x*, x⟩ = 0. (Here the condition AX = Y is not required.)

b)

Show that \(( \mathop {\mathrm {Ker}} A)^*\subset A^*Y^*.\) Let a linear functional x* vanish on \( \mathop {\mathrm {Ker}} A.\) Consider the operator \(\; T: X\to Y\times \mathbb {R},\; x \mapsto (Ax,\langle x^*,x\rangle ).\)

    By Corollary 3.3, its image TX is closed in \(Y\times \mathbb {R}.\) This image does not coincide with the whole space Y × R, because it does not contain the point (0, 1), since Ax = 0 implies 〈x , x〉 = 0. Then, by Lemma 3.4 the annihilator of TX contains a nonzero element \((y^*, c) \in Y^*\times \mathbb {R},\) i.e.,

    $$\displaystyle \begin{aligned}\langle y^*,Ax\rangle\, +\, c\langle x^*,x\rangle \;= 0\qquad \mbox{for all }\; x\in X. \end{aligned}$$

    We claim that c≠0. Indeed, if c = 0, then 〈y , Ax〉 = 0 on the whole space X, which means that 〈y , y〉 = 0 on the whole Y (since AX = Y ), whence y  = 0, so (y , c) = (0, 0), a contradiction.

    Thus, c≠0. Then \(\displaystyle \langle x^*,x\rangle = -\langle \frac {1}{c}\,y^*,Ax\rangle \,\) for all x ∈ X , i.e., \(\langle x^*,x\rangle =\langle y_1^*,Ax\rangle ,\;\) where \(y_1^*=-\frac {1}{c}\,y^*\in Y^*,\) as required. \(\Box \)

Remark

Note an important particular case of this lemma. If \(Y=\mathbb {R}^m,\) then Ax = (⟨a_1, x⟩, …, ⟨a_m, x⟩), where all a_i ∈ X*, and y* = (β_1, …, β_m) ∈ \(\mathbb {R}^m.\) Lemma 3.6 says that, if a linear functional x* is subordinate to the functionals a_1, …, a_m in the sense that ⟨a_1, x⟩ = 0, …, ⟨a_m, x⟩ = 0 imply ⟨x*, x⟩ = 0, then x* is a linear combination of these a_i, i.e., x* = β_1 a_1 + … + β_m a_m for some \(\beta \in \mathbb {R}^m.\) Lemma 3.6 is nothing more than the straightforward generalization of this well-known fact of finite-dimensional linear algebra to the case of infinite-dimensional spaces, and its proof is in fact the same.
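The following sketch (in Python, with NumPy; the matrix and the functional are our own illustrative data) checks both directions of Lemma 3.6 in this finite-dimensional case:

```python
import numpy as np

# A : R^4 -> R^2 is surjective (full row rank).
A = np.array([[1.0, 0.0, 2.0, 0.0],
              [0.0, 1.0, 0.0, 3.0]])
y_star = np.array([1.5, -2.0])
x_star = A.T @ y_star                 # a functional of the form x* = A* y*

# Ker A is spanned by the right singular vectors with zero singular value:
_, _, Vt = np.linalg.svd(A)
ker_basis = Vt[2:].T                  # 4x2 basis of Ker A
assert np.allclose(x_star @ ker_basis, 0.0)    # x* vanishes on Ker A

# Conversely, such an x* is recovered as A* y* by least squares:
y_rec, *_ = np.linalg.lstsq(A.T, x_star, rcond=None)
assert np.allclose(A.T @ y_rec, x_star)
```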

3.9 The Farkas–Minkowski Theorem

Lemma 3.6 has an analogue in the case when \( \mathop {\mathrm {Ker}} A\) is replaced by A −1 K, where K is a cone in the image space Y (see e.g. Hurwicz (1958)).

Theorem 3.6

Let X and Y be linear topological spaces, A : X → Y a continuous linear operator, K ⊂ Y a convex cone with a nonempty interior, Ω = A^{−1}K = {x ∈ X : Ax ∈ K}. Suppose that there exists x_0 such that \(A x_0 \in \mathop {\mathrm {int}} K.\)

Then x* ∈ Ω* if and only if there exists y* ∈ K* such that x* = A*y*. That is, Ω* = A*K*. (In “physical” notation, x* = y*A and Ω* = K*A, respectively.)

Proof

If y* ∈ K* and x* = y*A, then obviously x* ∈ Ω*. So, we have to prove only the reverse implication.

Take any x* ∈ Ω*. In the space X × Y, consider the subspace Γ = {(x, y) : Ax = y} (the graph of the operator A) and the convex cone C = X × K. Since A is continuous, the subspace Γ is closed. Define a linear functional \(p: X\times Y \to \mathbb {R}\) by setting p(x, y) = ⟨x*, x⟩. Obviously, \(p \geqslant 0\) on Γ ∩ C = {(x, y) : Ax = y ∈ K}, i.e., p ∈ (Γ ∩ C)*. By the assumption, the pair \((x_0,\,A x_0) \in \Gamma \cap ( \mathop {\mathrm {int}} C),\) whence, by Lemma 3.2, p = h + q, where h ∈ Γ* and q ∈ C*. Since Γ is the kernel of the linear operator X × Y → Y, (x, y) ↦ Ax − y, which is obviously surjective, the relation h ∈ Γ* means that h(x, y) = ⟨y*, Ax − y⟩ for all x, y with some y* ∈ Y*.

Now, let q(x, y) = ⟨λ, x⟩ + ⟨μ, y⟩ with some λ ∈ X*, μ ∈ Y*. The condition q ∈ C* = {0} × K* means that λ = 0 and μ ∈ K*, so q(x, y) = ⟨μ, y⟩, and then p = h + q means that ⟨x*, x⟩ = ⟨y*, Ax − y⟩ + ⟨μ, y⟩ for all x ∈ X, y ∈ Y. This implies that y* = μ ∈ K* and ⟨x*, x⟩ = ⟨y*, Ax⟩, q.e.d. \(\Box \)

4 Sublinear Functionals

Let X be a normed space. A functional \(\varphi : X \to \mathbb {R}\) is called sublinear if it is positively homogeneous and convex:

(a)

    φ(λx) = λφ(x) ∀x ∈ X, λ > 0,

(b)

    \(\varphi (x+y)\leqslant \varphi (x)+\varphi (y)\quad \forall x,y\in X.\)

The first condition is the positive homogeneity. Condition (b) is called subadditivity; for positively homogeneous functionals it is equivalent to the convexity.

Similar to linear functionals, a sublinear functional φ is called bounded if there exists a constant C such that

(c)

    \( \quad |\varphi (x)|\leqslant C\|x \|\quad \; \forall \, x\in X.\)

By virtue of homogeneity of φ, this condition is equivalent to the boundedness of φ on the unit ball in X, and in view of convexity it is also equivalent to the boundedness from above on the unit ball.

In what follows, we consider only bounded sublinear functionals.

4.1 Support Functionals of Sublinear Functionals

Definition

A linear functional l ∈ X* is called a support functional to a sublinear functional φ if \(l(x)\leqslant \varphi (x)\) for all x ∈ X. The set of all support functionals to φ is denoted by ∂φ and called the subdifferential of φ, while the elements l ∈ ∂φ are called the subgradients of φ.

If C is such that \(\varphi (x)\leqslant C\|x\|\) for all x ∈ X, then ∂φ ⊂ B C (0). Indeed, if l ∈ ∂φ, then \(\langle l,x\rangle \leqslant \varphi (x)\leqslant C\|x\|\;\; \forall \, x.\) Hence \(|\langle l,x\rangle | \leqslant C\|x\|\;\;\forall \, x,\) and therefore \(\|l\| \leqslant C.\)

So, any bounded sublinear functional has a bounded set ∂φ. It is easy to verify that ∂φ is convex and closed. Moreover, the separation theorem implies that it is weakly* closed. By the Banach–Alaoglu theorem (Kolmogorov and Fomin 1968; Dunford and Schwartz 1968), any bounded weakly* closed set in the dual space is compact in the weak* topology of this space. Thus, ∂φ is convex and weakly* compact in X .

We now show that ∂φ is not empty. Moreover, even a stronger fact holds true.

Theorem 4.1 (G. Minkowski)

Let \(\varphi :X \to \mathbb {R}\) be a bounded sublinear functional on a normed space X. Then the set ∂φ of its support functionals is nonempty, and the following representation holds:

$$\displaystyle \begin{aligned} \varphi(x)=\; \max_{l\in \partial\varphi}\, \langle l,x\rangle \; \quad \forall\, x\in X. \end{aligned} $$
(12)

Note that the maximum in the right-hand side of (12) is always attained because of the weak* compactness of ∂φ.

Proof

If l ∈ ∂φ, then \(\langle l,x\rangle \leqslant \varphi (x)\;\; \forall \, x,\) from which we obtain the inequality

$$\displaystyle \begin{aligned}\sup_{l\in \partial\varphi}\,\langle l,x\rangle \leqslant \varphi(x) \qquad \forall\, x. \end{aligned}$$

Now, take any x 0 ∈ X and show that there exists l ∈ ∂φ such that 〈l, x 0〉 = φ(x 0). Then equality (12) will be established.

In the product \(X\times \mathbb {R},\) consider the set K = {(x, t)  : φ(x) < t}. Obviously, it is a nonempty convex cone. Moreover, it is open since φ is continuous. Set t 0 = φ(x 0). Then the point (x 0, t 0) does not belong to K. Therefore, by the Hahn–Banach Theorem 3.1, there is a nonzero functional \( (l,\alpha )\in X^*\times \mathbb {R}\) separating it from K:

$$\displaystyle \begin{aligned} \langle l,x\rangle + \alpha t \;\leqslant\; \langle l,x_0\rangle +\alpha t_0\qquad \forall\, (x,t)\in K\,. \end{aligned} $$
(13)

Let us analyze this condition. Since, for a fixed x, it holds for all t > φ(x), then clearly \(\alpha \leqslant 0.\) If α = 0, then \(\langle l,x-x_0\rangle \leqslant 0\) for all x ∈ X, whence l = 0, which contradicts the nontriviality of the pair (l, α). Therefore, α < 0, and we can put α = −1. Then (13) becomes:

$$\displaystyle \begin{aligned}\langle l,x\rangle- \langle l,x_0\rangle \;\leqslant\; t-t_0 \qquad \forall\, t>\varphi(x), \end{aligned}$$

whence this is true also for t = φ(x), and so, we have

$$\displaystyle \begin{aligned} \langle l,x\rangle - \langle l,x_0\rangle \;\leqslant\; \varphi(x)-\varphi(x_0) \qquad \forall\, x \in X. \end{aligned} $$
(14)

Now, take any \(\bar x\in X,\) and set \(x=N \bar x,\) where N is a large number. Then

$$\displaystyle \begin{aligned}\langle l,\bar x \rangle - \frac 1N \langle l,x_0\rangle \; \leqslant\; \varphi(\bar x) - \frac 1N \varphi(x_0), \end{aligned}$$

whence, letting N → ∞, we obtain \(\langle l,\bar x\rangle \leqslant \varphi (\bar x).\) Since \(\bar x\) is arbitrary, this means that l ∈ ∂φ, and therefore ∂φ is nonempty. Next, setting x = 0 in (14), we get \( \langle l,x_0\rangle \geqslant \varphi (x_0).\) Since l ∈ ∂φ, the opposite inequality is also true. Therefore, we obtain the equality ⟨l, x_0⟩ = φ(x_0), which proves the theorem. \(\Box \)
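The simplest example: for the norm φ(x) = ‖x‖ on a normed space X, the subdifferential is the closed unit ball of the dual space, and (12) turns into the classical duality formula

$$\displaystyle \begin{aligned} \partial\varphi = \{\,l\in X^*:\; \|l\|\leqslant 1\,\}, \qquad \|x\| = \max_{\|l\|\leqslant 1}\,\langle l,x\rangle\,. \end{aligned} $$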

An easy observation is that, taking an arbitrary nonempty bounded set Q ⊂ X*, we obtain a bounded sublinear functional

$$\displaystyle \begin{aligned} \varphi(x)=\; \sup_{x^*\in Q}\,\langle x^*,x\rangle \; \quad \forall\, x\in X, \end{aligned} $$
(15)

where Q can be harmlessly replaced by the weak-* closure of its convex hull. Due to Theorem 4.1, this formula describes all possible sublinear bounded functionals on X, while the next lemma ensures the uniqueness of representation (15) if Q is also convex and closed.

Lemma 4.1

Let Q be a bounded, convex, and closed set in X . Then the functional (15) is sublinear, bounded, and has ∂φ = Q.

Proof

The sublinearity and boundedness are obvious. It remains to prove the last equality. The inclusion Q ⊂ ∂φ is obvious. To prove the reverse inclusion, take any x* ∉ Q and show that x* ∉ ∂φ. Indeed, since Q is weakly-* closed and x* ∉ Q, the separation Theorem 3.2 says that there is x_0 ∈ X such that \(\langle x^*, x_0 \rangle > \sup \, \langle Q, x_0 \rangle = \varphi (x_0),\) hence x* ∉ ∂φ. \(\Box \)

(If Q is unbounded, the functional (15) is still sublinear, but not bounded. Such functionals are not of much interest for our purposes.)

4.2 Subdifferential of the Maximum of Sublinear Functionals

Theorem 4.2 (Dubovitskii–Milyutin (1965))

Let \(\varphi _i: X \to \mathbb {R},\) i = 1, …, m, be bounded sublinear functionals on a normed space X, and

$$\displaystyle \begin{aligned}\Phi(x): =\; \max_{1\leqslant i\leqslant m}\, \varphi_i(x). \end{aligned}$$

Then

$$\displaystyle \begin{aligned} \partial \Phi =\; co\, \Bigl( \bigcup_{1\leqslant i\leqslant m}\, \partial\varphi_i \Bigr)\,. \end{aligned} $$
(16)

Proof

Define the sets A i  = ∂φ i . Since all of them are weakly-* compact, their union \(Q= \bigcup _{1\leqslant i\leqslant m}\,A_i\) is weakly-* compact too, and then its convex hull as well.

For any x ∈ X, we have

$$\displaystyle \begin{aligned}\Phi(x) = \max_{1\leqslant i\leqslant m}\,\varphi_i(x)\; =\; \max_{1\leqslant i\leqslant m}\,\max\, \langle A_i,x \rangle\; = \end{aligned}$$
$$\displaystyle \begin{aligned}=\; \max\, \langle\, \bigcup_{1\leqslant i\leqslant m} A_i, x \rangle \; =\; \max\, \langle\, co\, \bigcup_{1\leqslant i\leqslant m} A_i, x \rangle \; =\; \max\, \langle Q,\,x \rangle. \end{aligned}$$

By Lemma 4.1, applied to the convex, bounded, and weakly-* closed set co Q, this yields ∂Φ = co Q. \(\Box \)
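For instance, on \(X=\mathbb {R}\) take φ_1(x) = x and φ_2(x) = −x, so that Φ(x) = |x|. Here ∂φ_1 = {1}, ∂φ_2 = {−1}, and formula (16) gives the familiar subdifferential of the absolute value:

$$\displaystyle \begin{aligned} \partial\, \Phi = co\,\{1,\,-1\} = [-1,\,1]\,. \end{aligned} $$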

4.3 Subdifferential of a Composite Sublinear Functional

Recall the Hahn–Banach theorem in the “algebraic form”, called the extension theorem (it is also proved in standard courses of functional analysis, see e.g. Kolmogorov and Fomin (1968); Dunford and Schwartz (1968)).

Theorem 4.3 (Hahn–Banach)

Let X be an arbitrary vector space, Γ ⊂ X a subspace, \(\varphi : X \to \mathbb {R}\) a sublinear functional, and \(\varphi _{\Gamma } : \Gamma \to \mathbb {R}\) its restriction to Γ. Then for any linear \(l: \Gamma \to \mathbb {R}\) satisfying l ∈ ∂φ_Γ there exists a linear \(\widetilde l: X\to \mathbb {R}\) satisfying \(\widetilde l \in \partial \varphi \) such that \(\widetilde l(x) = l(x)\) on Γ, i.e., \(\widetilde l\) is an extension of l from the subspace Γ to the entire space X.

The next theorem, proposed in Dubovitskii and Milyutin (1965), is an analog of Theorem 3.6.

Theorem 4.4

Let A : X → Y be a linear operator, \(\varphi :Y\to \mathbb {R}\) a sublinear functional. Then the functional f(x) = φ(Ax) is sublinear, and its subdifferential

$$\displaystyle \begin{aligned} \partial f\; =\; A^*\,\partial\varphi, \end{aligned} $$
(17)

i.e., any p ∈ ∂f has the form ⟨p, x⟩ = ⟨μ, Ax⟩ with some μ ∈ ∂φ. (Or, in “physical” notation, ∂f = (∂φ)A and p = μA, respectively.) The reverse inclusion is trivial.

Proof

Let p ∈ ∂f, i.e., \(\langle p,x\rangle \leqslant \varphi (Ax)\) for all x ∈ X. In the product X × Y define a sublinear functional \(\widehat \varphi (x,y)=\varphi (y),\) and on the subspace Γ = {(x, y) | y = Ax} define a linear functional l(x, y) = ⟨p, x⟩. Then \(l(x,y)\leqslant \widehat \varphi (x,y)\) on Γ (since \(\langle p,x\rangle \leqslant \varphi (Ax)\) for all x ∈ X). By Theorem 4.3, the functional l can be extended to a functional \(\widehat l\) defined on the whole space X × Y and preserving the property \(\widehat l\leqslant \widehat \varphi .\) But every linear functional \(\widehat l\) on X × Y has the form \(\widehat l (x,y)= \langle \lambda ,x \rangle + \langle \mu ,y\rangle \) with some λ ∈ X* and μ ∈ Y*. So, we have

$$\displaystyle \begin{aligned}\begin{array}{ll} & \langle\lambda,x\rangle+ \langle\mu,y\rangle \;\leqslant\; \widehat\varphi (x,y)\; =\; \varphi(y) \qquad \forall\, (x,y) \in X\times Y, \\ & \langle\lambda,x\rangle+ \langle\mu, y\rangle\; =\; l(x,y)\; =\; \langle p,x\rangle \qquad\quad \forall\, (x,y)\in\Gamma. \end{array} \end{aligned}$$

The first relation implies that λ = 0 and μ ∈ ∂φ, and the second one that 〈p, x〉 = 〈μ, Ax〉, since y = Ax on Γ. Thus, p = μA, q.e.d. \(\Box \)

4.4 Subdifferential of the Derivative of a Sublinear Functional

Let \(\varphi : X\to \mathbb {R}\) be a bounded sublinear functional. Then its directional derivative

$$\displaystyle \begin{aligned}\psi(\bar x) =\; \varphi'(x_0, \bar x) :=\; \lim_{\varepsilon\to 0+} \frac{\varphi(x_0+\varepsilon\bar x) -\varphi(x_0)}\varepsilon \end{aligned}$$

at any point x 0 ∈ X is also a bounded sublinear functional. (This simple fact holds for any convex function φ Lipschitz continuous around x 0). Note that

$$\displaystyle \begin{aligned} \varphi'(x_0, \bar x) \;\leqslant\; \frac{\varphi(x_0+\varepsilon\bar x) -\varphi(x_0)}\varepsilon \qquad \mbox{for all}\quad \varepsilon>0, \end{aligned} $$
(18)

because, by the convexity of φ, the right-hand side monotonically decreases as ε decreases to 0 + .

The next lemma gives a relation between the subdifferentials of functionals φ and ψ (see e.g. Dubovitskii and Milyutin (1965); Pshenichnyi (1982)).

Lemma 4.2

∂ψ = {x* ∈ ∂φ : ⟨x*, x_0⟩ = φ(x_0)}.

In the particular case when x_0 = 0, we have ∂ψ = ∂φ.

Proof

(⊃) Let \(\langle x^*,x\rangle \leqslant \varphi (x)\) for all x, and 〈x , x 0〉 = φ(x 0). Then \(\forall \, \bar x\in X\) and ε > 0 we have \(\langle x^*,\, x_0+\varepsilon \bar x\rangle \leqslant \varphi (x_0+ \varepsilon \bar x),\) and since \(\varepsilon \bar x = (x_0+ \varepsilon \bar x) - x_0\,,\) we have \(\langle x^*,\varepsilon \bar x\rangle \leqslant \varphi (x_0+ \varepsilon \bar x) -\varphi (x_0),\) and therefore,

$$\displaystyle \begin{aligned}\langle x^*,\bar x \rangle \;\leqslant\; \lim_{\varepsilon\to 0+} \frac{\varphi(x_0+\varepsilon\bar x) -\varphi(x_0)}\varepsilon \;=\; \varphi'(x_0,\bar x), \qquad \mbox{i.e.}\quad x^*\in\partial\psi. \end{aligned}$$

(⊂). Let \(\langle x^*,\bar x \rangle \leqslant \varphi '(x_0,\bar x)\) for all \(\bar x\in X.\) Setting ε = 1 in (18), we have \(\langle x^*,\bar x \rangle \leqslant \varphi (x_0+\bar x)-\varphi (x_0) \leqslant \varphi (\bar x)\) (the last inequality holds by the subadditivity). Taking \(\bar x = -x_0\,,\) we get \(-\langle x^*,x_0 \rangle \leqslant -\varphi (x_0),\) while the reverse inequality \(\langle x^*,x_0 \rangle \leqslant \varphi (x_0)\) always holds since x ∈ ∂φ. Thus, 〈x , x 0〉 = φ(x 0), q.e.d. \(\Box \)
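For example, for φ(x) = |x| on \(X=\mathbb {R}\) and x_0 = 1, the directional derivative is \(\psi (\bar x) = \bar x,\) and Lemma 4.2 indeed gives

$$\displaystyle \begin{aligned} \partial\psi = \{\,x^*\in [-1,1]:\; x^*\cdot 1 = \varphi(1) = 1\,\} = \{1\}\,. \end{aligned} $$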

4.5 Cones Defined by Sublinear Functionals and Their Dual Cones

Let X be a normed space and \(\varphi : X\to \mathbb {R}\) a bounded sublinear functional.

Consider the convex closed cone \(K=\{x\in X:\; \varphi (x) \leqslant 0\}\) and the convex open cone Ω = {x ∈ X : φ(x) < 0}.

Theorem 4.5 (Dubovitskii and Milyutin (1965))

Suppose the cone Ω is nonempty. Then 1) \( \mathop {\mathrm {int}} K = \Omega , \;\; K = \overline \Omega ,\;\) and 2) \(K^* = -\mathbb {R}_+ \partial \varphi ,\) i.e., for every linear functional μ ∈ K* there exist \(\alpha \geqslant 0\) and l ∈ ∂φ such that μ = −αl. (The converse is obviously true.)

Proof

The first assertion is an easy consequence of the convexity and continuity of φ. To prove the second one, take any μ ∈ K*. If μ = 0, we can take α = 0 and any l ∈ ∂φ, thus obtaining the required relation μ = −αl.

Further, consider the case μ≠0. By the assumption, we have:

$$\displaystyle \begin{aligned} \varphi(x)<0\; \Longrightarrow\; \mu(x)\geqslant 0. \end{aligned} $$
(19)

In the product \(X\times \mathbb {R},\) define the cones:

$$\displaystyle \begin{aligned}C_1=\{(x,t):\; \varphi (x)<t\}\quad \mbox{and}\quad C_2=\{(x,t):\; \mu (x)<0,\;\, t=0\}.\end{aligned} $$

Clearly, they are nonempty and convex, the cone C 1 being open. Furthermore, \(C_1\bigcap C_2 =\O .\) Indeed, if they have a common element (x, 0), then φ(x) < 0 and μ(x) < 0, which contradicts (19).

By the separation theorem there exists a functional \((l,\beta ) \in X^*\times \mathbb {R}\) which is strictly negative on C_1 and nonnegative on C_2, that is,

$$\displaystyle \begin{aligned} \varphi(x)<t \;\; \Longrightarrow\;\; l(x)+\beta t<0\,, \end{aligned} $$
(20)
$$\displaystyle \begin{aligned} \mu(x)<0 \;\; \Longrightarrow\;\; l(x)\geqslant 0\,. \end{aligned} $$
(21)

Taking in (20) x = 0 and t = 1, we obtain β < 0, so we can set β = −1. Then (20) becomes:

$$\displaystyle \begin{aligned}\varphi(x)<t \; \Longrightarrow \; l(x)<t \qquad \forall\, x, t,\end{aligned} $$

whence \(l(x)\leqslant \varphi (x)\) for all x, i.e., l ∈ ∂φ. By Lemma 3.1, condition (21) implies l = −αμ, where \(\alpha \geqslant 0.\) If α = 0, then l = 0, hence \(\varphi (x)\geqslant 0\) for all x, which contradicts the assumption that Ω is nonempty. Therefore, α > 0, and then \(\mu = -\frac 1\alpha \, l,\;\) q.e.d. \(\Box \)

As was already noted, any sublinear functional φ obviously generates a closed convex cone \(K=\{x\in X:\; \varphi (x) \leqslant 0\}.\) The next theorem shows that, under some “regularity” assumption, the reverse relation also holds.

Theorem 4.6

Let K  X be a closed convex cone with a nonempty interior. Then there exists a bounded sublinear functional φ that generates this cone, i.e., \(K=\{x\in X:\; \varphi (x) \leqslant 0\}.\)

Proof

Consider the polar cone K 0 and define a set \(Q = \{ x^* \in K^0:\, ||x^*|| \leqslant 1\}.\) Define a bounded sublinear functional \(\varphi (x) = \sup \,\langle Q,x\rangle .\) By the Alaoglu theorem, Q is compact in the weak-* topology, and obviously convex, hence by Lemma 4.1 ∂φ = Q.

Now, define the set \( C = \{ x\,|\;\, \varphi (x) \leqslant 0\}\; =\; \{ x\,|\;\, \langle Q,x \rangle \leqslant 0\}.\) Obviously, it is a closed convex cone. We claim that C = K. Indeed, if x ∈ K, then by the definition of K^0 we have \(\langle Q,x \rangle \leqslant 0,\) that is, \(\varphi (x) \leqslant 0,\) so x ∈ C. On the other hand, if x ∈ C, i.e., \(\langle Q,x \rangle \leqslant 0,\) then also \(\langle K^0,x \rangle \leqslant 0\) (since \(K^0 = \mathbb {R}_+ Q\)), which implies that x ∈ K (otherwise, if x ∉ K, then by the separation theorem there exists x* ∈ K^0 such that ⟨x*, x⟩ > 0, a contradiction). \(\Box \)

In a standard case, when there is a finite number of smooth scalar inequalities \(f_i(x) \leqslant 0,\) i = 1, …, m, they can be replaced by one vector-valued inequality

$$\displaystyle \begin{aligned}f(x) := (f_1(x),\, \ldots,\, f_m(x)) \in K = \mathbb{R}_-^m\,, \end{aligned}$$

or by one nonsmooth inequality \(\displaystyle \; \Phi (x):= \max _{1\leqslant i\leqslant m} f_i(x) \leqslant 0.\)

Here, the cone \(\mathbb {R}^m_-\) is given by the sublinear inequality \(\displaystyle \varphi (x):= \max _{1\leqslant i\leqslant m}\, x_i \leqslant 0.\)

All the above are facts of linear and convex functional analysis. However, to handle the nonlinear equality constraint g(x) = 0 in optimization problem (1), we also need some facts of nonlinear functional analysis.

5 Covering and Metric Regularity

Our main goal in this section is to obtain a “correction theorem” for the operator g in a neighborhood of a reference point x_0, which is now often called the generalized Lyusternik theorem. It is essentially related to (and, in fact, based on) the following concept of a covering mapping and a theorem on the stability of the covering property under small perturbations.

5.1 Milyutin’s Covering Theorem

Recall the following important notion proposed by Milyutin (see Levitin et al. (1974, 1978); Dmitruk et al. (1980)).

Let X be a complete metric space, Y a vector space with a translation-invariant metric (e.g. a normed space), G a set in X, and T : X → Y a mapping. We denote the metrics in X and Y by the same letter d, and the ball B r (x) is sometimes denoted by B(x, r).

Definition 5.1

The mapping T covers on G with a rate a > 0 if

$$\displaystyle \begin{aligned} \forall\, B_r(x)\subset G\qquad T(B_r(x))\supset B_{ar}(T(x)). \end{aligned} $$
(22)

(Obviously, this property makes sense only if the set G has a nonempty interior; moreover, one can assume that \(G \subset \overline {\, \mathop {\mathrm {int}} G}\).)

Note that this property remains valid if the mapping T is replaced by T + y for any fixed y ∈ Y. Note also that a covering mapping need not be continuous on G. (A simple example is given by the mapping \(T: \mathbb {R}^2 \to \mathbb {R}, (x,y) \mapsto x + f(y),\) where f is an arbitrary function.)

The simplest and, at the same time, fundamental example of a covering mapping is a continuous linear surjective operator A : X → Y between Banach spaces. By Theorem 3.5, it covers with some rate r > 0 on the whole space X.

Now, let another mapping S : X → Y be also given.

Definition 5.2

The mapping S contracts on G with a rate b⩾0 if

$$\displaystyle \begin{aligned} \forall\, B_r(x)\subset G\qquad S(B_r(x))\subset B_{br}(S(x)). \end{aligned} $$
(23)

Obviously, any such mapping is continuous on G. On the other hand, any mapping that is Lipschitz continuous on G with constant b contracts on G with rate b.

The following theorem presents the stability of covering property under small contracting additive perturbations. Its formulation was proposed by A.A. Milyutin in Levitin et al. (1974, Sec.2) and the proof was published in Levitin et al. (1978, Sec.2), Dmitruk et al. (1980).

Theorem 5.1 (A.A. Milyutin)

Let T cover on G with a rate a > 0 and have a closed graph (e.g., be continuous on G), and let S contract on G with a rate b < a. Then their sum F = T + S covers on G with the rate a − b > 0.

The proof is so important and at the same time so transparent that it is worth giving here in full. Take any ball B(x_0, ρ) ⊂ G. We must show that

$$\displaystyle \begin{aligned}F(B(x_0,\rho))\;\supset\; B(F(x_0), (a-b)\rho). \end{aligned}$$

Without loss of generality, assume that a = 1 and hence b < 1. Denote for brevity y 0 = F(x 0), r = (1 − b)ρ. Take any \(\hat y\in B(y_0,r).\) We have to show that \(\exists \, \hat x\in B(x_0, \rho )\) such that \(F(\hat x)=\hat y.\) The point \(\hat x\) will be obtained as the limit of a sequence {x n }, which will be generated by a special iteration process.

At the beginning, we have the following relation:

$$\displaystyle \begin{aligned} T(x_0) +S(x_0) = y_0\;, \end{aligned} $$
(24)

and we need to obtain \(T(\hat x) +S(\hat x) = \hat y. \) Since the mapping T is 1-covering, so is T + S(x_0), and since \(\hat y \in B_r(y_0)\) and B(x_0, r) ⊂ G, there exists x_1 ∈ B(x_0, r) such that

$$\displaystyle \begin{aligned} T(x_1) + S(x_0) = \hat y\;. \end{aligned} $$
(25)

Now, replace here S(x 0) by S(x 1). Since the mapping S is b −contracting on the ball B(x 0, r), we have \(d(S(x_1), S(x_0)) \leqslant br,\) and so

$$\displaystyle \begin{aligned} T(x_1) +S(x_1) = y_1\,, \end{aligned} $$
(26)

where \(d(\hat y,y_1)\leqslant br.\) So, we moved from Eq. (24) for a “base” point x 0 to Eq. (26) for a new “base” point x 1, where

$$\displaystyle \begin{aligned}d(x_0,x_1)\leqslant r, \qquad y_1 \in B(y_0,br), \end{aligned}$$

and we still need to obtain \(T(\hat x) +S(\hat x) = \hat y.\) Since

$$\displaystyle \begin{aligned}r +br <\; r\,(1+b+b^2+ \ldots )\; =\; r\,\frac 1{1-b} \;= \rho, \end{aligned}$$

the ball B(x_1, br) is contained in the initial ball B(x_0, ρ); hence the 1-covering of T and the b-contraction of S hold on B(x_1, br). Then, by analogy with the preceding step, there exists x_2 ∈ B(x_1, br) such that

$$\displaystyle \begin{aligned}T(x_2) + S(x_2) = y_2 \qquad \mbox{with}\qquad d(\hat y,y_2)\leqslant b^2 r,\end{aligned} $$

and so on. Continuing this process infinitely, we obtain a sequence of points x n , y n such that

$$\displaystyle \begin{aligned} F(x_n) =\; T(x_n) +S(x_n) =\; y_n\;,\end{aligned} $$
(27)
$$\displaystyle \begin{aligned} d(x_{n-1},x_n)\leqslant b^{n-1} r, \qquad d(\hat y,y_n)\leqslant b^n r. \end{aligned} $$
(28)

Moreover, we have

$$\displaystyle \begin{aligned}d(x_0,x_n) +b^n r\; \leqslant\; d(x_0,x_1) + d(x_1,x_2) + \ldots + d(x_{n-1},x_n) + b^n r \;\leqslant \end{aligned}$$
$$\displaystyle \begin{aligned} \leqslant\; r +br + \ldots + b^{n-1} r +b^n r\; <\; r\,\frac 1{1-b}\; = \rho, \end{aligned} $$
(29)

whence the ball B(x n , b n r) is contained in the initial ball B(x 0, ρ), which makes the next step possible.

Consider the obtained sequence {x n } . The first inequality in (28) implies that it is a Cauchy sequence, and since X is complete, this sequence has a limit \(\hat x.\) By (29) we get \(d(x_0,\hat x)\leqslant \rho ,\) i.e., \(\hat x\in B(x_0,\rho ).\) The second inequality in (28) implies that \(y_n \to \hat y,\) and then, from (27) and continuity of F on the initial ball (or from the closedness of its graph) we get \(F(\hat x)=\hat y,\) which is exactly what was required. \(\Box \)

Remark

The iteration process in this proof is very similar to that of the Newton method: the role of the derivative F′ involved in the Newton method is played in our case by the mapping T, and the role of the small nonlinear residual is played by the mapping S. The covering property of the mapping T allows us to “solve” Eq. (25) with respect to x_1, while the perturbation term S applied to x_1 pushes us away from the desired goal \(\hat y.\) Repeating this procedure iteratively, we obtain a sequence which gives in the limit a desired solution: \(T(\hat x) +S(\hat x) = \hat y.\) This abstract Newton-like method can be called the Lyusternik iteration process (Lyusternik 1934). Note, however, that this process does not completely coincide with the Newton method, because the mapping T is, in general, not one-to-one, and therefore Eq. (25) is not solved uniquely, in contrast to the Newton method. So, the Lyusternik process is more general than the latter. (For example, the Lyusternik iteration process is actually used in the standard proof of the Banach open mapping Theorem 3.5, whereas the Newton method cannot be used there.)
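The following minimal sketch (in Python, with NumPy; the linear map T(x) = Ax and the perturbation S are our own illustrative data, not part of the original text) implements this iteration in finite dimensions: at each step the linear part is “solved” by a minimal-norm correction, exactly as in the proof.

```python
import numpy as np

# T(x) = A x with A surjective: covers with rate a = sigma_min(A) = 1 here.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
A_pinv = np.linalg.pinv(A)
S = lambda x: 0.05 * np.sin(x[:2])   # Lipschitz with constant b = 0.05 < a

y_hat = np.array([0.3, -0.2])        # target value for F = T + S
x = np.zeros(3)                      # base point x_0

for _ in range(50):
    # solve T(x_next) = y_hat - S(x_n) by a minimal-norm correction of x_n
    x = x + A_pinv @ (y_hat - S(x) - A @ x)

print(np.linalg.norm(A @ x + S(x) - y_hat))   # residual ~ 1e-16: F(x) = y_hat
```

Each iteration reproduces Eqs. (25)–(26): after the update, \(T(x_{n+1}) = \hat y - S(x_n),\) so the residual equals \(S(x_{n+1})-S(x_n)\) and decreases geometrically with ratio b∕a.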

5.2 Local Covering and Metric Regularity

In what follows, we will actually need a local version of the covering property, when a mapping g covers in a neighborhood of a given point x_0, i.e., when inclusion (22) holds for any ball contained in this neighborhood. Closely related to this property is the following central (and most simple) notion of nonsmooth analysis.

Set y 0 = g(x 0). The mapping g is said to be metric regular with constant C in neighborhoods of x 0 and y 0 if there exist neighborhoods \(\mathcal {U}(x_0)\) and \(\mathcal {V}(y_0)\) of x 0 and y 0, respectively, such that for every \( x \in \mathcal {U}(x_0)\) and every \( y \in \mathcal {V}(y_0)\) the following estimate holds:

$$\displaystyle \begin{aligned} dist\,(x, g^{-1}(y))\;\leqslant\; C\,d(g(x), y).\end{aligned} $$
(30)

(Here, dist denotes the distance from a point to a set.)

Between these two notions the following simple relation holds (see e.g. Dmitruk et al. (1980)).

Theorem 5.2

Suppose that a mapping g : X → Y covers with a rate a > 0 in a neighborhood of x_0. Then g is metric regular with constant C = 1∕a in neighborhoods of x_0 and y_0.

If g is continuous at x 0, the reverse is also true: if a mapping g is metric regular with a constant C in neighborhoods of x 0 and y 0, then it covers in some neighborhood of x 0 with any rate a < 1∕C.

Proof

(⇒) Again, we set a = 1, so g is 1-covering on B_ε(x_0) for some ε > 0. Set δ = ε∕3, y_0 = g(x_0), and show that g is metric 1-regular on B_δ(x_0) × B_δ(y_0).

Take any x ∈ B(x 0, δ), y′∈ B(y 0, δ) and denote g(x) = y, d(y′, y) = r. Thus, y′∈ B r (y), and we have to show that \(d(x, g^{-1}(y')) \leqslant r.\) We will actually show a bit more, that

$$\displaystyle \begin{aligned} \exists\, x'\in B_r(x)\quad \; \mbox{such that}\quad \; g(x')=y'\,.\end{aligned} $$
(31)

The following two cases are possible:

(a)

    \(r\geqslant 2\delta \;\; (y\) and y′ are “far” from each other),

(b)

    r < 2δ (y and y′ are “close”).

In case (a), since g is 1-covering, the image of B(x 0, δ) contains the ball B(y 0, δ), hence ∃ x′∈ B(x 0, δ) such that g(x′) = y′. Then

$$\displaystyle \begin{aligned}d(x',x)\leqslant\; d(x',x_0) +d(x_0,x) \;\leqslant\; 2\delta \;\leqslant\; r, \end{aligned}$$

and so, (31) is proved.

In case (b), we have δ + r < δ + 2δ = ε, therefore, by the triangle inequality, B r (x) ⊂ B ε (x 0), and since g covers with 1 on B ε (x 0), we have g(B r (x))  ⊃ B r (y). Taking into account that y′∈ B r (y), we get (31). \(\Box \)

(⇐) Now, we assume that C = 1. So, let g be continuous at x_0 and, for some δ > 0, ε > 0, metric regular with constant 1 on B_δ(x_0) × B_ε(y_0). Reduce, if necessary, δ so that

$$\displaystyle \begin{aligned} 0 < \delta < \varepsilon/3\qquad \mbox{and}\qquad g(B_\delta(x_0))\;\subset\; B_{\varepsilon/3}(y_0) \end{aligned} $$
(32)

(the last is possible since g is continuous), and show that g covers on B δ (x 0) with any rate a < 1.

Consider any ball B(x, r) ⊂ B_δ(x_0). Obviously, its radius \(r \leqslant 2\delta \) (because any of its points x′ satisfies \(d(x',x)\leqslant d(x',x_0) +d(x_0,x) \leqslant 2\delta \)).

Denote g(x) = y and take any r′ < r. It suffices to show that

$$\displaystyle \begin{aligned} g(B_r(x))\;\supset\; B_{r'}(y), \end{aligned} $$
(33)

which would readily imply the a-covering of g for any a < 1.

Take any point \(y'\in B_{r'}(y).\) In view of (32),

$$\displaystyle \begin{aligned}d(y_0,y')\;\leqslant\; d(y_0,y) + d(y,y')\;\leqslant\; \frac\varepsilon 3\, + r'\, <\, \frac\varepsilon 3\, + r \;\leqslant\; \frac\varepsilon 3\, + 2\delta\; <\varepsilon, \end{aligned}$$

hence y′∈ B ε (y 0). Since x ∈ B δ (x 0), the metric 1-regularity yields the estimate:

$$\displaystyle \begin{aligned}d(x,\, g^{-1}(y')) \;\leqslant \; d(y,y')\;\leqslant r'<r. \end{aligned}$$

This implies that ∃ x′∈ B r (x) such that g(x′) = y′, which proves (33) and completes the proof of the theorem. \(\Box \)

This theorem says that local metric regularity and local covering are, in fact, one and the same property. Our experience shows that it is more convenient to establish this property in the “geometrical” form of covering, and to apply it in the “analytical” form of metric regularity.

5.3 Covering and Metric Regularity for Strictly Differentiable Operators

Let now X and Y be Banach spaces, G ⊂ X an open set, x 0 ∈ G a given point. Recall the following

Definition

An operator g : G → Y is said to be strictly differentiable at x 0 if there exists a linear operator T : X → Y such that, for any ε > 0 there exists a neighborhood \(\mathcal {O}(x_0)\) such that, for all \(x_1,x_2\in \mathcal {O}(x_0)\) the following inequality holds:

$$\displaystyle \begin{aligned} \|g(x_2)-g(x_1)-T(x_2-x_1)\| \;\leqslant\; \varepsilon\|x_2-x_1\|.\end{aligned} $$
(34)

The latter means that the operator g(x) − Tx is Lipschitz continuous in \(\mathcal {O}(x_0)\) with constant ε. Also, it means that

$$\displaystyle \begin{aligned} g(x_2)-g(x_1)-T(x_2-x_1)\; =\; \zeta(x_1,x_2)\|x_2-x_1\|,\end{aligned} $$
(35)

where ζ(x 1, x 2) → 0 as ∥x 1 − x 0∥ + ∥x 2 − x 0∥→ 0.

The operator T is called the strict derivative of g at x 0. Fixing x 1 = x 0, we obtain that g is Fréchet differentiable at x 0 and its Fréchet derivative g′(x 0) = T. It can easily be shown (by using the mean value theorem) that any operator continuously Fréchet differentiable at x 0 is strictly differentiable at this point, but the converse is not true even in the one-dimensional case; see the example below.
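Here is one such example (standard, with our own choice of data). Let

$$\displaystyle \begin{aligned} f(x) \,=\, \int_0^x t\; \mathrm{sgn}\bigl(\sin(\pi/t)\bigr)\,dt, \qquad f(0)=0. \end{aligned}$$

The integrand u(t) satisfies \(|u(t)|\leqslant |t|,\) hence \(|f(x_2)-f(x_1)|\leqslant \varepsilon |x_2-x_1|\) for all \(x_1,x_2\in (-\varepsilon ,\varepsilon ),\) so f is strictly differentiable at 0 with f′(0) = 0. At the same time, u has a jump of size 2∕n at each point 1∕n, so f is not differentiable at points arbitrarily close to 0, and therefore is not continuously differentiable at 0.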

An important corollary of Theorem 5.1 for strictly differentiable operators is the following

Theorem 5.3

Let X and Y be Banach spaces, G ⊂ X an open set, x 0 ∈ G a given point, and g : G → Y an operator strictly differentiable at x 0. Suppose that g′(x 0)X = Y (the so-called Lyusternik condition). Then the operator g covers with some rate a > 0 in a neighborhood of x 0.

Proof

Since g′(x 0)X = Y, the Banach open mapping theorem guarantees that the linear operator g′(x 0) covers with some rate a > 0 on the whole space X.

Further, let us represent g in the form

$$\displaystyle \begin{aligned} g(x) =\, g(x_0+(x-x_0))\, =\; g(x_0)+ g^{\prime}(x_0)(x-x_0)\,+\, S(x), \end{aligned} $$
(36)

where S(x) = g(x) − g(x 0) − g′(x 0)(x − x 0).

The operator g(x 0) + g′(x 0)(x − x 0) covers on X with the same rate a, while, according to (34), the operator S can be made Lipschitz continuous with any constant ε > 0 in a properly chosen neighborhood \(\mathcal {O}(x_0).\) Taking any ε < a, we obtain by Theorem 5.1 that g covers on \(\mathcal {O}(x_0)\) with the rate a − ε > 0. \(\Box \)

Theorems 5.2 and 5.3 directly imply the following theorem on a correction to the level set of a nonlinear operator, which was in fact first proposed by L.A. Lyusternik in his seminal paper (Lyusternik 1934).

Theorem 5.4

Let X and Y be Banach spaces, G ⊂ X an open set, x 0 ∈ G a fixed point, and g : G → Y an operator strictly differentiable at x 0. Suppose that y 0 = g(x 0) = 0 and g′(x 0)X = Y. Then g is metric regular in some neighborhoods of x 0 and y 0, and hence there exist a neighborhood \(\mathcal {O}(x_0)\) and a constant C such that, for every \( x \in \mathcal {O}(x_0)\), one can find an h = h(x) ∈ X such that

$$\displaystyle \begin{aligned} g(x+h) = 0 \quad \mathit{\mbox{and}}\quad \; \|h\| \leqslant C\| g(x) \|. \end{aligned} $$
(37)

This theorem is a highly convenient and efficient tool for handling a wide range of nonlinear equations and, because of this, it plays a key role in the study of optimization problems with equality constraints.
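To see how Theorem 5.4 works numerically, here is a minimal sketch of ours (the mapping and constants are hypothetical): for one scalar equation in \(\mathbb {R}^2\), a Newton-type iteration with minimum-norm steps produces the correction h of (37).

```python
import numpy as np

# Correction to the level set {g = 0} for g(x) = x_1^2 + x_2^2 - 1:
# starting from x near the unit circle, minimum-norm Newton steps yield
# h with g(x+h) ~ 0 and ||h|| <= C ||g(x)||, as in (37).

def g(x):
    return np.array([x @ x - 1.0])

def dg(x):
    return 2.0 * x.reshape(1, -1)               # the 1x2 Jacobian

def correction(x, tol=1e-14):
    h = np.zeros_like(x)
    for _ in range(50):
        r = g(x + h)
        if np.linalg.norm(r) < tol:
            break
        h -= np.linalg.pinv(dg(x + h)) @ r       # minimum-norm Newton step
    return h

x = np.array([1.05, 0.10])                       # a point near the circle
h = correction(x)
print(np.linalg.norm(g(x + h)))                  # ~ 0
print(np.linalg.norm(h), "<=", np.linalg.norm(g(x)))   # here C = 1 suffices
```

The Lyusternik condition holds since g′(x) = 2x ≠ 0 near the circle, and the observed ratio ∥h∥∕∥g(x)∥ stays close to 1∕2, the metric regularity constant suggested by the derivative.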

6 Proof of Theorem 2.1

Our proof follows the Dubovitskii–Milyutin scheme (Dubovitskii and Milyutin 1965). It consists of two steps. At the first step, we pass from the local minimality to the incompatibility of (sub)linear approximations of all the constraints and the cost of the problem, and at the second step, this incompatibility of approximations is rewritten, by the Dubovitskii–Milyutin separation theorem, as the corresponding Euler–Lagrange equation.

First of all, following (Levitin et al. 1974, 1978), it is convenient to introduce the following

Definition

We say that x 0 is a point of s-necessity (or strongest necessity) in Problem (1) if there is no sequence x n  → x 0 such that

$$\displaystyle \begin{aligned} F_0(x_n)< F_0(x_0), \quad f_i(x_n)\in \mathop{\mathrm{int}} K_i,\;\; i=1,\ldots,\nu, \qquad g(x_n)=0.\end{aligned} $$
(38)

Obviously, the local minimum at the point x 0 implies the s-necessity at this point.

As in all problems with inequality constraints, it is convenient to distinguish between active and inactive indices. If i is such that \(f_i(x_0) \in \mathop {\mathrm {int}} K_i\,,\) this index is called inactive. Otherwise, if f i (x 0) ∈ ∂K i , the index i is called active. The index i = 0 is always active by definition. The set of all active indices is denoted by I(x 0).

6.1 Lemma on Incompatibility of a System of Approximations

For any i = 1, …, ν, define

$$\displaystyle \begin{aligned} C_i=\; \{\bar z\in Z_i:\;\, \exists\, \alpha >0 \;\; \mbox{ such that }\; f_i(x_0) + \alpha \bar z\, \in\,\mathop{\mathrm{int}} K_i\} \end{aligned} $$

— the cone of interior directions to the cone K i at the point z i  = f i (x 0). Clearly, \(C_i = \mathop {\mathrm {int}} K_i - \mathbb {R}_+ f_i(x_0)\,;\) it is nonempty, convex and open. Along with this, define an open convex cone

$$\displaystyle \begin{aligned} \Omega_i=\; \{\bar x\in X:\;\, \bar z= f_i^{\prime}(x_0)\,\bar x \in C_i\} \end{aligned}$$

— the preimage of the cone C i under the linear mapping \(f_i^{\prime }(x_0): X\to Z_i\,.\) Note that, if i is inactive, i.e. \(f_i(x_0) \in \mathop {\mathrm {int}} K_i\,,\) then C i  = Z i and Ω i  = X.

Now, consider the following system of approximations for Problem (1) at the point x 0:

$$\displaystyle \begin{aligned} F_0^{\prime}(x_0)\,\bar x <0,\quad \; \bar x\in\Omega_i\,,\quad i=1,\ldots, \nu, \qquad g^{\prime}(x_0)\,\bar x =0. \end{aligned} $$
(39)

Lemma 6.1

If x 0 is a point of s-necessity in Problem (1) and g′(x 0)X = Y (the Lyusternik regularity condition), then there is no \(\bar x\in X\) satisfying system (39).

Proof

Suppose, on the contrary, that there exists \(\bar x\) satisfying system (39). Consider the “sequence” \(x_0+ \varepsilon \bar x\) with ε → 0 + . For this sequence, we have

$$\displaystyle \begin{aligned}g(x_0+\varepsilon\bar x)\, =\; g(x_0) + g^{\prime}(x_0)\varepsilon\bar x + o(\varepsilon)\; =\, o(\varepsilon), \end{aligned}$$

hence, by Theorem 5.4, there is a correction r ε  ∈ X such that \(g(x_0+\varepsilon \bar x+ r_\varepsilon )=0\) and ∥r ε ∥ = o(ε).

Further, take any i ∈{1, …, ν}. If i ∉ I(x 0), then obviously \(f_i(x_0+\varepsilon \bar x+ r_\varepsilon ) \in \mathop {\mathrm {int}} K_i\) for all small enough ε > 0. If i ∈ I(x 0), we have f i (x 0) ∈ K i and \(f_i(x_0)+\alpha f_i^{\prime }(x_0)\bar x\in \mathop {\mathrm {int}} K_i\) for some α > 0, whence the whole half-interval \((f_i(x_0),\, f_i(x_0)+\alpha f_i^{\prime }(x_0)\bar x]\) lies in \( \mathop {\mathrm {int}} K_i.\) Then

$$\displaystyle \begin{aligned}f_i(x_0+\varepsilon\bar x +r_\varepsilon) =\; f_i(x_0) + \varepsilon f_i^{\prime}(x_0)\bar x\, + o(\varepsilon)\, \in \mathop{\mathrm{int}} K_i \end{aligned}$$

for all small enough ε > 0. Similarly, \(F_0(x_0+\varepsilon \bar x + r_\varepsilon )< F_0(x_0)\) for small enough ε > 0, since \(F_0^{\prime }(x_0)\bar x <0\) and ∥r ε ∥ = o(ε). Thus, the sequence \(x_0+\varepsilon \bar x +r_\varepsilon \) satisfies system (38), which contradicts the s-necessity at x 0 . \(\Box \)
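The construction used in this proof is easy to trace on a toy problem (our illustrative data): minimize F 0(x) = x 1 subject to \(f_1(x)=1-x_1^2-x_2^2 \in K_1=[0,+\infty )\) and g(x) = x 2 = 0 in \(X=\mathbb {R}^2.\) At the admissible point x 0 = (0, 0) the index 1 is inactive, \(\bar x = (-1,0)\) solves system (39), and, since g is linear, the correction r ε  = 0; the sequence \(x_0+\varepsilon \bar x\) then satisfies (38), so x 0 is not a point of s-necessity (and hence not a local minimum).

```python
import numpy as np

# Lemma 6.1 on toy data: min x1  s.t.  1 - x1^2 - x2^2 >= 0,  x2 = 0.
# At x0 = (0,0), xbar = (-1,0) solves the approximations (39); the
# sequence x0 + eps*xbar (r_eps = 0 since g is linear) satisfies (38).

x0, xbar = np.array([0.0, 0.0]), np.array([-1.0, 0.0])
for eps in (0.1, 0.01, 0.001):
    x = x0 + eps * xbar
    print(f"eps={eps}: F0={x[0]:.4f} < 0, f1={1 - x @ x:.6f} > 0, g={x[1]}")
```

We will return to this toy problem at its optimal point after the proof in Sect. 6.2.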

6.2 Passage to the Euler–Lagrange Equation

Let x 0 be a point of s-necessity in problem (1). Without loss of generality assume that all indices \(i\geqslant 1\) are active. Consider the regular case, when g′(x 0)X = Y. Then by Lemma 6.1 the system of approximations (39) is incompatible. We aim to apply the Dubovitskii–Milyutin “separation” Theorem 3.3.

Assume first that \(F_0^{\prime }(x_0)\ne 0\) and \(\mathrm{Im}\,f^{\prime }_i(x_0) \cap C_i \ne \emptyset \) for all i = 1, …, ν (the main, nondegenerate case). Then all the cones in system (39) are nonempty. By Theorem 3.3, there exist a support functional \(x_0^*\) to the half-space \(\Omega _0= \{\bar x: F_0^{\prime }(x_0)\bar x<0\},\) support functionals \(x_i^*\) to the cones Ω i , i = 1, …, ν, and a support functional \(x^*\) to the subspace \(\Omega = \{\bar x: g'(x_0)\bar x=0\},\) not all of which are equal to zero, such that \(x_0^*+ x^*_1 +\,\ldots \, + x^*_\nu + x^*=0.\) By Lemma 3.1, \(x_0^*= -\alpha _0 F_0^{\prime }(x_0)\) with some α 0 ⩾ 0; by the Farkas–Minkowski Theorem 3.6, each \(x_i^*= z^*_i f_i^{\prime }(x_0)\) with some \(z_i^*\in K_i^*\) such that \(\langle z_i^*, f_i(x_0)\rangle =0;\) and by Lemma 3.6, \(x^* = y^* g^{\prime}(x_0)\) with some \(y^*\in Y^*.\) Thus, we get

$$\displaystyle \begin{aligned}-\alpha_0 F_0^{\prime}(x_0)\; +\; \sum_{i=1}^\nu z^*_i f_i^{\prime}(x_0)\; +\; y^*g^{\prime}(x_0)\; =\,0. \end{aligned}$$

Obviously, \(\alpha _0+ \sum _{i=1}^\nu \|z^*_i\|+\|y^*\|>0\) (otherwise all the support functionals would equal zero). Finally, note that \(-z_i^*\in K_i^0\) for all i = 1, …, ν, and changing \(y^*\) to \(-y^*,\) we obtain the Euler–Lagrange equation in the required form (4).

Now, consider the degenerate cases. If \(F_0^{\prime }(x_0)= 0,\) we take α 0 = 1 and all other functionals equal to zero, thus obtaining (4). If there exists i such that \(\mathrm{Im}\,f^{\prime }_i(x_0) \cap C_i = \emptyset ,\) we can separate the subspace \(\mathrm{Im}\,f^{\prime }_i(x_0)\) and the open cone C i by a nonzero \(z^*_i\,.\) By Lemma 3.3, \(\langle z^*_i\,, K_i \rangle \geqslant 0\) and \(\langle z^*_i\,, f_i(x_0) \rangle =0.\) Setting α 0 = 0, \(z^*_j =0\) for all j ≠ i, and \(y^* = 0,\) we obtain (4). Finally, if g′(x 0)X ≠ Y, then, by the assumption, the subspace g′(x 0)X is closed in Y, and hence, by Lemma 3.4, there is a nonzero functional \(y^*\in Y^*\) vanishing on g′(x 0)X, which means that y ∗ g′(x 0) = 0. Taking all other functionals equal to zero, we again obtain Eq. (4). \(\Box \)
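At the optimal point x 0 = (−1, 0) of the toy problem introduced after Lemma 6.1, the obtained relations are verified immediately (a sketch of ours, with candidate multipliers chosen by hand):

```python
import numpy as np

# Euler-Lagrange relations at the minimizer x0 = (-1, 0) of the toy
# problem: min x1  s.t.  f1(x) = 1 - x1^2 - x2^2 in K1 = [0, inf), x2 = 0.

x0 = np.array([-1.0, 0.0])
alpha0, z1, ystar = 2.0, 1.0, 0.0        # candidate multipliers, z1 >= 0

F0p = np.array([1.0, 0.0])               # F0'(x0)
f1p = -2.0 * x0                          # f1'(x0) = (2, 0)
gp  = np.array([0.0, 1.0])               # g'(x0)

# the equation -alpha0 F0' + z1 f1' + ystar g' = 0 of the nondegenerate case
print(-alpha0 * F0p + z1 * f1p + ystar * gp)   # [0. 0.]
print(z1 * (1.0 - x0 @ x0))                    # complementary slackness: 0
```

The multiplier z 1 lies in the dual cone of K 1 = [0, +∞) and vanishes on f 1(x 0) = 0, exactly as required.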

Remarks

  1.

    As one can see, the incompatibility of system (39) follows not only from the local minimality of x 0 in problem (1), but even from the s-necessity in this problem. Under s-necessity, the cost functional plays the same role as any active inequality constraint (the inactive constraints can be removed from the study altogether). Therefore, the inequality F 0(x) − F 0(x 0) < 0 can be replaced by a more general condition \(f_0(x) \in \mathop {\mathrm {int}} K_0,\) where K 0 is a convex cone with a nonempty interior in a Banach space Z 0. Thus, instead of studying the local minimum at x 0 in problem (1), one can study the presence of s-necessity at x 0 in a general system of inequalities and equalities:

    $$\displaystyle \begin{aligned} f_i(x) \in K_i\,, \quad \; i=0,1,\ldots, \nu, \qquad g(x) =0, \end{aligned} $$
    (40)

    in which all f i enter symmetrically. In this case, the following analog of Theorem 2.1 holds:

    Theorem 6.1 Let x 0 be a point of s-necessity in system (40). Then there exist Lagrange multipliers \(z_i^* \in K^0_i,\;\, i=0,1,\,\ldots ,\,\nu ,\) and \(y^* \in Y^*,\) not all of which equal zero, satisfying the complementary slackness conditions \(\langle z_i^*,\, f_i(x_0) \rangle =0,\) i = 0, 1, …, ν, such that the Lagrange function

    $$\displaystyle \begin{aligned} \mathcal{L}(x) = \; \sum_{i=0}^\nu \langle z_i^*, \,f_i(x)\rangle\, +\, \langle y^*, \,g(x)\rangle \end{aligned}$$

    is stationary at x 0 : \(\displaystyle \mathcal {L}^{\prime }(x_0)=\; \sum _{i=0}^\nu z_i^* f_i^{\prime }(x_0)\, +\, y^* g^{\prime }(x_0)\;=\; 0. \)

  2.

    The functional F 0 in problem (1) need not be smooth; one can take it in the form F 0(x) = φ 0(f 0(x)), where f 0 : X → Z 0, similarly to the other f i , is a smooth mapping, and φ 0 is a sublinear functional on Z 0 such that φ 0(f 0(x 0)) = 0 and the cone \(K_0 = \{ z\,:\; \varphi _0(z) \leqslant 0\}\) has a nonempty interior. Then the inequality F 0(x) < 0 is equivalent to \(f_0(x) \in \mathop {\mathrm {int}} K_0\,,\) and we again come to the s-necessity in system (40), symmetric with respect to all i = 0, 1, …, ν.

  3.

    On the other hand, by virtue of Theorem 4.6, any “vector-valued” inequality f i (x) ∈ K i can be represented as a scalar inequality \(\varphi _i(f_i(x)) \leqslant 0,\) where \(\varphi _i : Z_i \to \mathbb {R}\) is a sublinear functional such that \(K_i = \{ z\,:\; \varphi _i(z) \leqslant 0\}.\) In this case, we come to a system of a finite number of nonsmooth scalar inequalities and one (generally, infinite-dimensional) equality:

    $$\displaystyle \begin{aligned} \varphi_i(f_i(x)) \leqslant 0, \quad \; i=0,1,\ldots, \nu, \qquad g(x) =0. \end{aligned} $$
    (41)

    The conditions of s-necessity for this system are still given by Theorem 6.1 with \(z^*_i = \alpha _i \hat z^*_i,\) where \(\alpha _i \geqslant 0\) and \(\hat z^*_i \in \partial \varphi _i(f_i(x_0)).\) In turn, system (41) can be reduced to a system with only one nonsmooth scalar inequality:

    $$\displaystyle \begin{aligned} \Phi(x) :=\; \max_{0\leqslant i\leqslant\nu}\varphi_i(f_i(x)) \leqslant 0, \qquad g(x) =0. \end{aligned} $$
    (42)

    All these representations are equivalent and give the same conditions of s-necessity.

  4.

    Finally, note that Theorem 6.1 (and the equivalent Theorem 2.1) can be proved in another way, if one starts from the nonsmooth representation (42) instead of the smooth “infinite-dimensional” system (40). In this approach, instead of the Dubovitskii–Milyutin Theorem 3.3 on the nonintersection of cones, one should use the Dubovitskii–Milyutin Theorem 4.2 on the subdifferential of a maximum of sublinear functionals. Let us show this, keeping the relations between the functionals φ i and the cones K i as in Remark 3; a numerical sketch of this route is given after the remarks.

Let a functional \(\Phi : X\to \mathbb {R}\) be Lipschitz continuous around x 0, and suppose its directional derivative \(\Psi (\bar x) = \Phi ^{\prime }(x_0,\bar x)\) is convex in \(\bar x,\) hence a bounded sublinear functional. Consider the system

$$\displaystyle \begin{aligned}\Phi(x) \leqslant 0, \qquad g(x)=0. \end{aligned}$$

Suppose that g′(x 0)X = Y and the s-necessity for this system holds at x 0. Then it easily follows that there is no \(\bar x\) such that

$$\displaystyle \begin{aligned}\Phi^{\prime}(x_0, \bar x) <0, \qquad g^{\prime}(x_0)\bar x =0. \end{aligned}$$

This means that \(\Psi (\bar x) \geqslant 0\) for all \(\bar x \in L = Ker\,g'(x_0),\) i.e. the functional \(\left .\Psi \right |{ }_L : L\to \mathbb {R}\) is nonnegative on the subspace L. Therefore, \(\left .\partial \Psi \right |{ }_L\) contains the functional l ∗ = 0. Then, by the Hahn–Banach Theorem 4.3, there exists its extension \(\widetilde l^* \in X^*\) such that \(\widetilde l^* =l^*=0\) on L, and \(\widetilde l^* \in \partial \Psi ,\) i.e., \(\Psi (\bar x) -\langle \widetilde l^*,\bar x \rangle \geqslant 0\) on the whole space X. Since \(\widetilde l^* = -y^* g'(x_0)\) for some \(y^*\in Y^*,\) we obtain the inclusion

$$\displaystyle \begin{aligned} \partial\Psi\,+\, y^* g'(x_0)\, \ni 0. \end{aligned} $$
(43)

Now, let Φ have the form as in (42), where all indices i = 0, 1, …, ν are active, i.e. all φ i (f i (x 0)) = 0. It easily follows that

$$\displaystyle \begin{aligned}\Phi^{\prime}(x_0, \bar x) =\; \max_{0\leqslant i\leqslant\nu}\, \varphi_i^{\prime}(f_i(x_0),\, f^{\prime}_i(x_0)\bar x) \qquad \forall\, \bar x\in X. \end{aligned}$$

Define sublinear functionals \(\psi _i(\bar z) = \varphi _i^{\prime }(f_i(x_0), \bar z)\) on Z i , linear operators \(A_i = f^{\prime }_i(x_0): X \to Z_i\,,\) and sublinear functionals \(P_i(\bar x) = \psi _i(A_i \bar x).\) Then

$$\displaystyle \begin{aligned}\Phi^{\prime}(x_0, \bar x) =\; \max_{0\leqslant i\leqslant\nu}\, P_i(\bar x). \end{aligned}$$

By Theorem 4.4, ∂P i  = ∂ψ i  ⋅ A i , that is \(x^*_i \in \partial P_i\) iff \(x^*_i = z^*_i f^{\prime }_i(x_0)\) for some \(z^*_i \in \partial \psi _i\,.\) The latter means that \(z^*_i \in \partial \varphi _i\) and \(\langle z^*_i, f_i(x_0) \rangle = \varphi _i(f_i(x_0))=0.\)

Since \(\Psi (\bar x) = \max _i P_i(\bar x),\) by the Dubovitskii–Milyutin Theorem 4.2 x ∈  Ψ iff \(x^* = \sum _i\,\alpha _i x^*_i\,,\) where all \(x^*_i \in \partial P_i\,,\) all \(\alpha _i \geqslant 0\) and \(\sum \alpha _i =1.\) Therefore, \(x^* = \sum \alpha _i z^*_i f^{\prime }_i(x_0),\) where \(z^*_i \in \partial \varphi _i\) (hence \(z^*_i \in K_i^0)\) and \(\langle z^*_i, f_i(x_0) \rangle =0.\)

In view of (43), we have

$$\displaystyle \begin{aligned}\sum_i \alpha_i z^*_i f^{\prime}_i(x_0)\,+\, y^* g^{\prime}(x_0) =0. \end{aligned}$$

Note that \(\hat z^*_i = \alpha _i z^*_i\) still satisfy the conditions \(\hat z^*_i \in K_i^0\) and \(\langle \hat z^*_i, f_i(x_0) \rangle =0,\) so we come to the Euler–Lagrange equation

$$\displaystyle \begin{aligned}\sum_i \hat z^*_i f^{\prime}_i(x_0)\,+\, y^* g^{\prime}(x_0) =0, \end{aligned}$$

which proves Theorem 6.1.
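Finally, the nonsmooth route just described can be traced on the same toy problem as before (our illustration). Taking φ 0(z) = z and φ 1(z) = −z (so that K 0 = (−∞, 0] and K 1 = [0, +∞)), with f 0(x) = x 1 + 1, at x 0 = (−1, 0) we get \(P_0' = f_0'(x_0) = (1,0)\) and \(P_1' = -f_1'(x_0) = (-2,0);\) by Theorem 4.2 and inclusion (43), it remains to find convex weights α 0, α 1 and y ∗ with \(\alpha _0 P_0' + \alpha _1 P_1' + y^* g'(x_0) = 0:\)

```python
import numpy as np

# Subdifferential-of-maximum route on the toy problem: find convex
# weights (a0, a1) and y* with  a0*P0' + a1*P1' + y*·g'(x0) = 0.

P0, P1 = np.array([1.0, 0.0]), np.array([-2.0, 0.0])
gp = np.array([0.0, 1.0])

A = np.array([[P0[0], P1[0], gp[0]],
              [P0[1], P1[1], gp[1]],
              [1.0,   1.0,   0.0 ]])      # last row encodes a0 + a1 = 1
b = np.array([0.0, 0.0, 1.0])
a0, a1, ystar = np.linalg.solve(A, b)
print(a0, a1, ystar)                      # 2/3, 1/3, 0: nonnegative weights
```

The resulting multipliers \(\hat z_0^* = \alpha _0\cdot 1 \in K_0^0\) and \(\hat z_1^* = \alpha _1\cdot (-1) \in K_1^0\) satisfy the complementary slackness conditions and reproduce, up to the sign convention of Sect. 6.2 and a common positive factor, the multipliers α 0 = 2, z 1 = 1 found there by the smooth route.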