1 Introduction

The semi-infinite programming problem (SIP for short) has attracted much attention due to its various applications in engineering design, optimal control, economic equilibria, etc., and it has become an active field of research in applied mathematics; see, e.g., Fiacco and Kortanek [1], Goberna and López [2], Polak [3], Polak and Wuu [4], Polak [5] and the references therein. The main difficulty in solving (SIP) comes from the fact that it has an infinite number of constraints, which makes it hard to design algorithms for its solution. In recent years, however, many effective methods have been proposed for solving semi-infinite programming problems, such as discretization methods (see Hettich [6], Still [7]), reduction methods (see Hettich and Kortanek [8], Reemtsen and Görner [9]) and exchange methods (see Goberna and López [2], Hettich and Kortanek [8], Reemtsen and Görner [9], Zhang et al. [10]).

For studying duality properties of (SIP), we will use an augmented Lagrangian approach, which is crucial for dealing with nonconvex and constrained optimization problems. The augmented Lagrangian function was first proposed independently by Hestenes [11] and Powell [12] to solve an optimization problem with equality constraints. This was extended to solve an inequality constrained optimization problem by Buys [13]. The augmented Lagrangian method has been used to solve an optimization problem with both equality and inequality constraints in Rockafellar [14]. Under a growth condition, one can relate the value of the dual problem to the behavior of the perturbation function. In Rockafellar [14], the definition of a quadratic growth condition was introduced and a necessary and sufficient condition for a zero duality gap between a constrained optimization problem and its augmented Lagrangian dual problem was obtained. In Rockafellar and Wets [15, 11K*], an augmented Lagrangian with a convex augmenting function was introduced and the corresponding zero duality property was established. The definitions of several growth conditions were delineated by Penot [16]. By assuming that the augmenting function satisfies the level coercivity condition and the perturbation function satisfies some growth condition, several zero duality gap results between the primal problem and its augmented Lagrangian dual problem were obtained in Penot [16]. A level-bounded augmenting function was given in Huang and Yang [17], where the convexity of augmenting functions in Rockafellar and Wets [15] was replaced by a level-boundedness condition. A valley at 0 augmenting function was given in Zhou and Yang [18], where the level-boundedness condition of the augmenting functions in Huang and Yang [17] was replaced by the valley at 0 property. The latter property allows one to introduce a duality scheme with zero duality gap. 
More general frameworks for establishing zero duality gap and the existence of exact penalty parameters were introduced in [19,20,21]. Optimization problems in which the constraints are given in terms of a set inclusion were considered by Shapiro and Sun [22]. Using a quadratic growth condition (a property originally introduced by Rockafellar [14]), and a quadratic augmented Lagrangian, Shapiro and Sun established strong duality and second-order conditions ensuring existence of Lagrange multipliers (see [22, Theorems 2.4 and 3.4]). The concept of augmented Lagrange multiplier is related with that of a global saddle point of augmented Lagrangian functions. This connection was exploited by Sun et al. [23], who considered four classes of augmented Lagrangian functions. Under second-order sufficient conditions and certain additional conditions, Sun et al. [23] established the existence of local and global saddle points of these four kinds of augmented Lagrangian functions. Rückmann and Shapiro [24] studied the (SIP) via an augmented Lagrangian approach with a nonnegative convex augmenting function and a reduction approach method. They obtained some necessary and sufficient conditions for the existence of an augmented Lagrange multiplier. Furthermore, they discussed two particular cases: the proximal Lagrangian and the sharp Lagrangian.

Sufficient conditions for the existence of an augmented multiplier of an augmented Lagrangian with a valley at 0 augmenting function of a cone constrained optimization problem were obtained in Zhou et al. [25] by using local saddle point conditions and bounded conditions of a perturbed constraint set. In the present paper, we will use the augmented Lagrangian to study \(\text {(SIP)}\). Firstly, we use the sharp Lagrangian to derive first-order necessary conditions and first-order sufficient conditions for the existence of an augmented Lagrange multiplier for \(\text {(SIP)}\) (see Theorems 3.1 and 3.2). Secondly, we consider an augmented Lagrangian with an augmenting function having a valley at 0 property. This allows us to obtain the existence of an augmented Lagrange multiplier for \(\text {(SIP)}\) under two possible assumptions: (i) that \(\text {(SIP)}\) has a unique global optimal solution, and (ii) that the objective and the constraint functions satisfy a generalized representation condition originally introduced in [26].

Since the augmenting function used in this paper may be nonconvex and non-quadratic, standard techniques such as those used in Rückmann and Shapiro [24] and in Shapiro and Sun [22] to show the existence of augmented Lagrange multipliers cannot be applied here. For the same reasons, the saddle point results we derive cannot make use of the techniques used in Sun et al. [23]. As a result, both the results and the proof techniques of the present paper differ from those in the last three references.

The structure of the paper is outlined as follows: in Sect. 2, we introduce some notations and preliminary results needed in the sequel of this article. In Sect. 3, we obtain some first-order optimality conditions for the existence of an augmented Lagrange multiplier of the sharp Lagrangian problem. In Sect. 4, we obtain some second-order conditions for the existence of an augmented Lagrange multiplier of a valley at 0 augmented Lagrangian for \((\text {SIP})\). The last section, Sect. 5, contains some concluding remarks and an open question.

2 Preliminaries

We consider a semi-infinite programming problem (SIP, in short) in the following form

$$\begin{aligned} \text {(SIP)}\qquad \min _{x\in {\mathbb {R}}^n}\ f(x) \quad \text {s.t.}\quad g(x,\omega )\le 0, \ \forall \, \omega \in \varOmega , \end{aligned}$$

where \(\varOmega \) is a (possibly infinite) nonempty set, \(f: {\mathbb {R}}^n\rightarrow {\mathbb {R}}\) and \( g:{\mathbb {R}}^n\times \varOmega \rightarrow {\mathbb {R}}\) are real-valued functions. (SIP) is said to be convex if f and \(g(\cdot ,\omega )\), \(\omega \in \varOmega \), are convex.

Let \({\mathbb {R}}_+:=[0,+\infty [\), \({\mathbb {R}}_{++}:=\,]0,+\infty [\), \({\mathbb {R}}_{\pm \infty }:=[-\infty ,+\infty ]\), and let the indicator function of a set \(Z \subset {\mathbf {Y}}\) be defined by \(\delta _Z(y) = 0\) if \(y \in Z\), and \(\delta _Z(y) =+\infty \) otherwise.

Let \(\varOmega \) be a nonempty set and \({\mathbb {R}}^{\varOmega }\) be the set of all functions from \(\varOmega \) to \({\mathbb {R}}\). Denote by \({\mathbb {R}}^{(\varOmega )}\) the subset of \({\mathbb {R}}^{\varOmega }\) consisting of those functions that vanish outside a finite subset of \(\varOmega \). More precisely, \({\mathbb {R}}^{(\varOmega )}\) is defined as follows.

$$\begin{aligned} {\mathbb {R}}^{(\varOmega )}:=\{\lambda :\varOmega \rightarrow {\mathbb {R}}: \ \lambda (\omega )=0, \ \text {for all} \ \omega \in \varOmega \ \text {except for a finite number} \}, \end{aligned}$$

and \({\mathbb {R}}_+^{(\varOmega )} := \{ \lambda \in {\mathbb {R}}^{(\varOmega )} : \lambda (\omega ) \ge 0, \forall \omega \in \varOmega \}\). For \(\lambda \in {\mathbb {R}}^{(\varOmega )}\), let

$$\begin{aligned} \text {supp}(\lambda ):=\{\omega \in \varOmega :\lambda (\omega )\ne 0\}. \end{aligned}$$

In other words, \({\mathbb {R}}^{(\varOmega )}\) is the subset of \({\mathbb {R}}^{\varOmega }\) consisting of all functions with finite support. The set \({\mathbb {R}}^{(\varOmega )}\) will play the role of the dual space, and hence will contain the Lagrange multipliers.

For \(\lambda \in {\mathbb {R}}^{(\varOmega )}\) and \(y\in {\mathbb {R}}^{\varOmega }\), we can define a scalar product in a natural way:

$$\begin{aligned} \langle \lambda , y\rangle :=\sum _{\omega \in \text {supp}(\lambda )}\lambda (\omega )y(\omega ). \end{aligned}$$

For a finite set \(\varGamma =\{\omega _1,\ldots ,\omega _m\}\subset \varOmega \) and \(y\in {\mathbb {R}}^{\varOmega }\), denote by \(y^\varGamma (\cdot )\) the restriction of function \(y(\cdot )\) to \(\varGamma \), i.e.,

$$\begin{aligned} y^\varGamma (\omega ) :={\left\{ \begin{array}{ll} y(\omega ), \ \ \ \ &{}\text {if}\ \omega \in \varGamma ,\\ 0, &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$
(1)

By abuse of notation, we may use the same symbol \(y^\varGamma \) to denote the finite-dimensional vector whose coordinates are \(y^\varGamma =(y(\omega _1),\ldots ,y(\omega _m))^T\).
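The three objects just introduced (finite support, the pairing \(\langle \lambda ,y\rangle \), and the restriction \(y^\varGamma \)) can be mimicked with finite data structures. The sketch below is ours, not part of the paper: it stores a finitely supported \(\lambda \in {\mathbb {R}}^{(\varOmega )}\) as a Python dictionary, with absent indices implicitly zero, even when \(\varOmega \) itself is infinite.

```python
# Illustrative sketch (not from the paper): a finitely supported function
# lambda in R^(Omega) stored as a dict {omega: value}; absent keys are zero.

def supp(lam):
    # supp(lambda) = {omega : lambda(omega) != 0}
    return {w for w, v in lam.items() if v != 0.0}

def pairing(lam, y):
    # <lambda, y> = sum over supp(lambda) of lambda(omega) * y(omega);
    # y is an arbitrary function Omega -> R, given as a callable.
    return sum(v * y(w) for w, v in lam.items() if v != 0.0)

def restrict(y, Gamma):
    # y^Gamma as in (1): agrees with y on the finite set Gamma, vanishes
    # elsewhere; identified with the vector (y(omega_1), ..., y(omega_m)).
    return {w: y(w) for w in Gamma}

# Example with Omega = [0, 1] (an infinite index set):
lam = {0.5: 2.0, 1.0: -1.0}     # finite support {0.5, 1.0}
y = lambda w: w ** 2            # an element of R^Omega
print(supp(lam))                # {0.5, 1.0}
print(pairing(lam, y))          # 2.0*0.25 + (-1.0)*1.0 = -0.5
```

The pairing only ever inspects finitely many values of \(y\), which is precisely why \(\langle \lambda ,y\rangle \) is well defined for every \(y\in {\mathbb {R}}^{\varOmega }\).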

Let \(\sigma :{\mathbb {R}}^{(\varOmega )}\rightarrow {\mathbb {R}}_+\) be a function with \(\sigma (0)=0\). We refer to the function \(\sigma \) as an augmenting function.

For a given \(x\in {\mathbb {R}}^n\), we define the set

$$\begin{aligned} D(x):=\{y\in {\mathbb {R}}^\varOmega \,:\, g(x,\omega )+y(\omega )\le 0\,\forall \,\omega \in \varOmega \}, \end{aligned}$$

which is the set of “admissible” perturbations at the point x.

We will now use all these elements to define the augmented Lagrangian for (SIP).

Definition 2.1

Fix \(\lambda \in {\mathbb {R}}^{(\varOmega )}\), let \(\varGamma :=\text {supp}(\lambda )\), and let \(r>0\) be a penalty parameter. The augmented Lagrangian for the problem \({(\text {SIP})}\) is the function \(L^{\varOmega }:{\mathbb {R}}^n\times {{\mathbb {R}}^{(\varOmega )}}\times {\mathbb {R}}_{++}\rightarrow {\mathbb {R}}_{\pm \infty }\) defined by

$$\begin{aligned} L^\varOmega (x,\lambda ,r)&:={\inf }_{y\in {\mathbb {R}}^{\varOmega }}\{{f}(x)-\langle \lambda ,y\rangle +r\sigma (y^\varGamma ): g(x,\omega )+y(\omega )\le 0, \omega \in \varOmega \}\nonumber \\&={\inf }_{y\in {\mathbb {R}}^{\varOmega }}\{ {f}(x)-\langle \lambda ,y\rangle +r\sigma (y^\varGamma )+\delta _{D(x)}(y)\}. \end{aligned}$$
(2)

Denote by \(\text {val }{(\text {SIP})}\) the optimal value of the problem \((\text {SIP})\), and denote by \(X_0\) the set of feasible solutions of \({(\text {SIP})}\), i.e.,

$$\begin{aligned} X_0:=\{x\in {\mathbb {R}}^n: g(x, \omega )\le 0, \omega \in \varOmega \}. \end{aligned}$$

Let \(x^* \in X_0\). Denote the active constraint index set at \(x^*\) by

$$\begin{aligned} I(x^*):=\{\omega \in \varOmega : g(x^*, \omega )=0 \}. \end{aligned}$$

We assume throughout the paper that the optimal value of problem \(\text {(SIP)}\) is finite.

Fix \(y\in {\mathbb {R}}^{\varOmega }\) and denote by val \((P_{y})\) the optimal value of the following parameterized problem of \( \text {(SIP)}\):

$$\begin{aligned} (P_{y})\qquad \min _{x\in {\mathbb {R}}^n}\ f(x) \quad \text {s.t.}\quad g(x,\omega )+y(\omega )\le 0, \ \forall \, \omega \in \varOmega . \end{aligned}$$

With this notation, for \(y=0\) we naturally obtain val \( (P_{0})=\text {val }\text {(SIP)}\).

Let \(\lambda \in {\mathbb {R}}^{(\varOmega )}\), \(\varGamma =\text {supp}(\lambda )\) and \(r>0\). Define \(v_r^\varOmega :{\mathbb {R}}^{\varOmega }\rightarrow {\mathbb {R}}_{\pm \infty }\) as

$$\begin{aligned} v_r^\varOmega (y):=\text {val }(P_y)+r\sigma (y^\varGamma ), \quad y \in {\mathbb {R}}^{\varOmega }. \end{aligned}$$
(3)

The assumption that \(\sigma (0)=0\) implies that \(v_r^\varOmega (0)=\text {val }\text {(SIP)}\) for every \(r>0\). By (2)–(3), we always have that

$$\begin{aligned} v_r^\varOmega (0)&\ge {\inf }_{y\in {\mathbb {R}}^{\varOmega }}\{v_r^\varOmega (y)-\langle \lambda , y\rangle \}={\inf }_{y\in {\mathbb {R}}^{\varOmega }}\{\text {val }(P_y)-\langle \lambda , y\rangle +r\sigma (y^\varGamma )\} \nonumber \\&={\inf }_{y\in {\mathbb {R}}^{\varOmega }}{\inf }_{x\in {\mathbb {R}}^n}\{f(x)-\langle \lambda , y\rangle +r\sigma (y^\varGamma ): g(x, \omega )+y(\omega ) \le 0, \ \omega \in \varOmega \} \nonumber \\&={\inf }_{x\in {\mathbb {R}}^n}{\inf }_{y\in {\mathbb {R}}^{\varOmega }}\{{f}(x)-\langle \lambda ,y\rangle +r\sigma (y^\varGamma ): g(x,\omega )+y(\omega )\le 0, \omega \in \varOmega \}\nonumber \\&={\inf }_{x\in {\mathbb {R}}^n}L^\varOmega (x, \lambda , r). \end{aligned}$$
(4)

When there exists \(r>0\) such that the above expression holds as an equality, we say that \(\lambda \) is an augmented Lagrange multiplier with penalty parameter r. We make this precise in the next definition.
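The chain (4) is a weak-duality statement, and it can be checked numerically on a toy one-constraint instance; all problem data below are illustrative choices of ours, not taken from the paper.

```python
# Toy instance (ours): n = 1, a single index omega, f(x) = (x+1)^2 and
# g(x) = -x, so X_0 = [0, +inf) and val(SIP) = 1, attained at x* = 0.
# Augmenting function: sigma(y) = y^2 (nonnegative, sigma(0) = 0).
f = lambda x: (x + 1.0) ** 2
g = lambda x: -x

def L(x, lam, r):
    # augmented Lagrangian (2): brute-force infimum over perturbations y
    # satisfying g(x) + y <= 0, on a fine grid of [-4, 4]
    ys = (i * 0.005 - 4.0 for i in range(1601))
    return min(f(x) - lam * y + r * y * y for y in ys if g(x) + y <= 0.0)

xs = [i * 0.005 - 3.0 for i in range(1201)]       # grid on [-3, 3]
lam, r = 1.3, 0.7                                  # arbitrary multiplier/penalty
dual_val = min(L(x, lam, r) for x in xs)           # inf_x L^Omega(x, lam, r)
val_sip = min(f(x) for x in xs if g(x) <= 0.0)     # val(SIP) = v_r(0) = 1
print(dual_val <= val_sip + 1e-9)                  # the inequality in (4): True
```

For this arbitrary pair \((\lambda ,r)\) the inequality is strict (the grid gives roughly \(0.93 < 1\)); equality for some \(r\) is exactly what Definition 2.2 below asks of an augmented Lagrange multiplier.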

Definition 2.2

\(\lambda \in {\mathbb {R}}^{(\varOmega )}\) is said to be an augmented Lagrange multiplier for (SIP) if val (SIP) is finite and there exists \(r\ge 0\) such that

$$\begin{aligned} v_r^\varOmega (y)\ge v_r^\varOmega (0)+\langle \lambda , y\rangle , \quad \forall y\in {\mathbb {R}}^{\varOmega }. \end{aligned}$$
(5)

If (5) holds for r, then we say that the augmented Lagrange multiplier \(\lambda \) has a penalty parameter r.

Remark 2.1

  1. (i)

    If (5) holds for a given \(r>0\), then \(\lambda \) is an augmented Lagrange multiplier for (SIP) with penalty parameter \(r'\) for every \(r'>r\). This is a direct consequence of (3) and the fact that \(\sigma \) is nonnegative.

  2. (ii)

    It follows from (5) that \(\lambda \) is an augmented Lagrange multiplier for \(\text {(SIP)}\) if and only if \(\lambda \in \partial v_r^\varOmega (0)\) for some \(r\ge 0\), where the subdifferential is understood through the subgradient inequality for convex functions: \( \lambda \in \partial v_r^\varOmega (y_1)\) if and only if \( v_r^\varOmega (y)-v_r^\varOmega (y_1)\ge \langle \lambda , y-y_1\rangle \) for all \( y\in {\mathbb {R}}^{\varOmega }\). Note that \(\partial v_r^\varOmega (0)\) is well-defined only if \(v_r^\varOmega (0)=\text {val }\text {(SIP)}\) is finite.

2.1 Discretizations and Lagrangian Functions for (SIP)

A finite set \(\varGamma :=\{\omega _1,\dots , \omega _m\}\subset \varOmega \) naturally induces a discretized version of (SIP), in which only the constraints corresponding to the elements of \(\varGamma \) are taken into account. This discretized problem is denoted \(({P}^\varGamma )\), and defined as:

$$\begin{aligned} ({P}^\varGamma )\qquad \min _{x\in {\mathbb {R}}^n}\ f(x) \quad \text {s.t.}\quad g(x,\omega _j)\le 0, \ j=1,\ldots ,m. \end{aligned}$$

Denote by \(S^\varGamma \) the set of optimal solutions of \((P^\varGamma )\). As in \((P_{y})\), we can perturb the discretized problem \((P^\varGamma )\), this time with \({\bar{y}}=(\bar{y}_1, \dots , \bar{y}_m)^T\in {\mathbb {R}}^m\). The parameterized problem becomes:

$$\begin{aligned} (P_{{\bar{y}}}^\varGamma )\qquad \min _{x\in {\mathbb {R}}^n}\ f(x) \quad \text {s.t.}\quad g(x,\omega _j)+\bar{y}_j\le 0, \ j=1,\ldots ,m. \end{aligned}$$

We denote by \({v}^\varGamma ({\bar{y}})\) the optimal value of \((P_{{\bar{y}}}^\varGamma )\). Recall that \( y^\varGamma (\cdot )\) is the restriction of the function \(y(\cdot )\) to \(\varGamma \) [cf. (1)]. Define \( \sigma ^\varGamma ({\bar{y}}):=\sigma ( y^\varGamma ), \) where \(y\in {\mathbb {R}}^{\varOmega }\) is any function with \(y^\varGamma ={\bar{y}}\). This allows us to define, as in (3):

$$\begin{aligned} {v}_r^\varGamma ({\bar{y}}):={v}^\varGamma ({\bar{y}})+r\sigma ^\varGamma ({\bar{y}}). \end{aligned}$$

We define next the Lagrangians associated with the discretized problem \((P^\varGamma )\).

Definition 2.3

Given a finite set \(\varGamma :=\{\omega _1,\dots , \omega _m\}\subset \varOmega \), the augmented Lagrangian and the classical Lagrangian for problem \((P^\varGamma )\) are defined respectively by

$$\begin{aligned} L^\varGamma (x,{\bar{\lambda }},r):&=\inf _{{\bar{y}}\in {\mathbb {R}}^m}\{{f}(x)-\langle {\bar{\lambda }},\bar{y}\rangle +r\sigma ^\varGamma ({\bar{y}}):g(x,\omega _j)+\bar{y}_j\le 0,\ \omega _j \in \varGamma \}, \nonumber \\ L_0^\varGamma (x,{\bar{\lambda }}):&={f}(x)+\sum _{j=1}^{m} {\bar{\lambda }}_j\, g(x,\omega _j), \end{aligned}$$
(6)

where \(x\in {\mathbb {R}}^n,{\bar{\lambda }}\in {\mathbb {R}}^m\) and \(r\in {\mathbb {R}}_+\).

Remark 2.2

Consider Problem \(({P}_{{\bar{y}}}^\varGamma )\), and let \(y\in {\mathbb {R}}^{(\varOmega )}\) be such that \(y(\omega _j)=:\bar{y}_j,\) for all \(\omega _j\in \varGamma \). In this situation, the discretized problem \(({P}^\varGamma _{{\bar{y}}}\)) is a relaxation of problem \(({P}_y)\), in the sense that the constraint set in \(({P}_{{\bar{y}}}^\varGamma )\) is larger than the one in \(({P}_y)\). Namely,

$$\begin{aligned} \text{ If } y^\varGamma ={\bar{y}} \in {\mathbb {R}}^m,\quad \text{ then }\quad \text {val }(P_y)\ge {v}^\varGamma ({\bar{y}}). \end{aligned}$$

This fact and (3) yield

$$\begin{aligned} {v}_r^\varOmega (y)= \text {val }(P_y)+r\sigma (y^\varGamma )\ge {v}_r^\varGamma ({\bar{y}}), \end{aligned}$$
(7)

where \({\bar{y}}\) is the restriction of \(y\in {\mathbb {R}}^{(\varOmega )}\) to the set \(\varGamma \).

We define next an augmented Lagrangian multiplier for the discretized problems.

Definition 2.4

Fix \(\varGamma :=\{\omega _1,\ldots ,\omega _m\}\subset \varOmega \). A vector \(\lambda \in {\mathbb {R}}^m\) is said to be an augmented Lagrange multiplier for \(({P}^\varGamma )\) if val \(({P}^\varGamma )\) is finite and there exists \(r\ge 0\) such that

$$\begin{aligned} v_r^\varGamma (y)\ge v_r^\varGamma (0)+\langle \lambda , y\rangle , \quad \forall y\in {\mathbb {R}}^m. \end{aligned}$$
(8)

If (8) holds for r, then we say that the augmented Lagrange multiplier \(\lambda \) has a penalty parameter r.

The following is an example where the augmented Lagrangian for \(({P}^\varGamma )\) with a separable augmenting function is computed explicitly.

Example 2.1

If \(\sigma ^\varGamma \) is separable, i.e., \(\sigma ^\varGamma ({\bar{y}}) = \sum _{j =1}^{m} \sigma _j({\bar{y}}_j)\), then the expression of the augmented Lagrangian can be simplified to

$$\begin{aligned} L^\varGamma (x,{\bar{\lambda }},r)&=\inf _{{\bar{y}}\in {\mathbb {R}}^{m}}\left\{ {f}(x)-\langle {\bar{\lambda }},{\bar{y}}\rangle +r\sum _{j =1}^{m} \sigma _j({\bar{y}}_j) : g(x,\omega _j)+{\bar{y}}_j\le 0,\ j = 1, \ldots ,m \right\} \\&= f(x) + r \sum _{j =1}^{m} \inf _{{\bar{y}}_j \in {\mathbb {R}}}\left\{ - \frac{{\bar{\lambda }}_j}{r} {\bar{y}}_j +\sigma _j({\bar{y}}_j): {\bar{y}}_j \le - g(x,\omega _j) \right\} . \end{aligned}$$

Assume furthermore that \(\sigma _j({\bar{y}}_j)=\max \{{\bar{y}}_j^2, \sqrt{|{\bar{y}}_j|}\}\) for all \(j=1,\ldots ,m,\) and let \(\psi _j:{\mathbb {R}}\times {\mathbb {R}}\rightarrow {\mathbb {R}}\) be defined as \(\psi _j(a,b) = \inf \{ -bz + \max \{z^2, \sqrt{|z|}\} : z \le -a \}\) (the same function for every j); then

$$\begin{aligned} \psi _j(a,b) = \left\{ \begin{array}{ll} 0, &{} \text{ if } 0 \le -a, -1\le b\le 1 \text{ or } 0 \le -a \le \frac{1}{b^2}, 1 \le b;\\ -|b|+1, &{} \text{ if } -1 \le -a, b \le -1 \text{ or } 1 \le -a, 1 \le b \\ &{} \qquad \text{ or } -1 \le -a \le -\left( 1+\frac{1}{b}\right) ^2, -1 \le b \le -\frac{1}{2};\\ ab + \max \{a^2, \sqrt{|a|}\}, &{} \text{ if } -a \le -1, b \le -1 \text{ or } -a \le 0, -\frac{1}{2} \le b \le 1\\ &{} \qquad \text{ or } -a \in (-\infty , -1] \cup \left[ -\left( 1+\frac{1}{b}\right) ^2, 0\right] ,\\ &{} \qquad -1 \le b \le -\frac{1}{2}\\ &{} \qquad \text{ or } -a \in (-\infty , 0]\cup \left[ \frac{1}{b^2},1\right] , 1 \le b. \end{array} \right. \end{aligned}$$

Thus, we obtain

$$\begin{aligned} L^\varGamma (x,\lambda ,r) = f(x) + r\sum _{j =1}^{m} \psi _j\left( g(x,\omega _j),\frac{\lambda _j}{r}\right) . \end{aligned}$$
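The case analysis defining \(\psi _j\) can be spot-checked numerically. The brute-force minimizer below is an illustrative sketch of ours, not part of the paper; it approximates the infimum on a bounded grid, which is harmless here because the quadratic part of the augmenting term dominates \(-bz\) for large \(|z|\) at the moderate values of \(b\) used.

```python
def sigma_1(z):
    # the augmenting term max(z^2, sqrt(|z|)) of Example 2.1
    return max(z * z, abs(z) ** 0.5)

def psi_grid(a, b, lo=-6.0, n=120001):
    # approximate psi_j(a, b) = inf { -b*z + sigma_1(z) : z <= -a }
    # on a fine grid of [lo, -a]
    step = (-a - lo) / (n - 1)
    return min(-b * z + sigma_1(z) for z in (lo + i * step for i in range(n)))

# spot checks against three branches of the closed form:
print(psi_grid(0.0, 0.5))    # 0 <= -a, -1 <= b <= 1:    psi = 0
print(psi_grid(0.0, -2.0))   # -1 <= -a, b <= -1:        psi = -|b| + 1 = -1
print(psi_grid(1.0, 0.0))    # -a <= 0, -1/2 <= b <= 1:  psi = ab + max(a^2, sqrt(a)) = 1
```

In each of the three sampled regimes the grid minimum agrees with the displayed closed form to grid accuracy.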

The following lemma shows that the augmented Lagrangian for (SIP) always coincides with the augmented Lagrangian of the discretized problem induced by the support of the multiplier. It also relates the discretized augmented Lagrangian with the classical one.

In the sequel, we will use the following notation: given a finite set \(\varGamma :=\{\omega _1,\ldots ,\omega _m\}\subset \varOmega \), and \((x,u)\in {\mathbb {R}}^n\times {\mathbb {R}}_+^m\), call \(\gamma (x,u)\in {\mathbb {R}}^m\) the vector defined by

$$\begin{aligned}{}[\gamma (x,u)]_j=-g(x,\omega _j)-u_j, \text{ for } j=1,\ldots ,m. \end{aligned}$$
(9)

Lemma 2.1

Let \(f: {\mathbb {R}}^n\rightarrow {\mathbb {R}}\), \( g: {\mathbb {R}}^n\times \varOmega \rightarrow {\mathbb {R}}\) be real-valued functions, let \(\lambda \in {\mathbb {R}}^{(\varOmega )}\), let \(\sigma :{\mathbb {R}}^{(\varOmega )}\rightarrow {{\mathbb {R}}_+}\) be a function with \(\sigma (0)=0\), and let \(\varGamma =\text {supp}(\lambda )=\{\omega _1,\dots , \omega _m\}\subset \varOmega \) and \(r>0\). Then

  1. (i)
    $$\begin{aligned} L^\varOmega (x,\lambda ,r) = L^\varGamma (x,{\bar{\lambda }},r),\quad x\in {\mathbb {R}}^n, \end{aligned}$$
    (10)

    where \({\bar{\lambda }}= (\lambda (\omega _1),\ldots , \lambda (\omega _m))^T\).

  2. (ii)

    For every \(x\in {\mathbb {R}}^n\), we have

    $$\begin{aligned} \begin{aligned} L^\varGamma (x,{\bar{\lambda }},r)&= f(x)+\sum _{j=1}^{m }\lambda (\omega _j)g(x,\omega _j)+\inf _{u\ge 0}\left\{ \sum _{j=1}^{m} \lambda (\omega _j) u_j+r\sigma ^\varGamma (\gamma (x,u))\right\} \\&= L_0^\varGamma (x,{\bar{\lambda }})+\inf _{u\ge 0}\left\{ \sum _{j=1}^{m} \lambda (\omega _j) u_j+r\sigma ^\varGamma (\gamma (x,u))\right\} ,\\ \end{aligned} \end{aligned}$$
    (11)

    where \(\gamma (x,u)\) is as in (9).

Proof

(i) Let \( x\in {\mathbb {R}}^n\). For any \(y\in {\mathbb {R}}^{\varOmega }\) such that \(g(x,\omega )+y(\omega )\le 0,\, \omega \in \varGamma \), set \(\bar{y}_j=y(\omega _j), j=1,\ldots ,m\), so \(\bar{y}=(\bar{y}_1,\ldots ,\bar{y}_m)^T\), and we have \(g(x,\omega _j)+\bar{y}_j\le 0\) for all \(\omega _j \in \varGamma \). The definitions then yield

$$\begin{aligned} f(x)-\langle \lambda , y \rangle + r\sigma (y^\varGamma )&= f(x)-\langle {{\bar{\lambda }}},{\bar{y}} \rangle + r\sigma ^\varGamma (\bar{y})\\&\ge {L}^\varGamma (x,{{\bar{\lambda }}},r), \end{aligned}$$

where we used (6) in the last inequality. Taking now infimum over \(y\in D(x)\) and using (2), we obtain

$$\begin{aligned} L^\varOmega (x,\lambda ,r) \ge {L}^\varGamma (x,{{\bar{\lambda }}},r). \end{aligned}$$

This proves the inequality \(\ge \) in (10). To prove the opposite inequality, define the set

$$\begin{aligned} D^\varGamma (x):=\{y\in D(x)\,:\, \text {supp}(y)\subset \varGamma \}. \end{aligned}$$

The set \(D^\varGamma (x)\) can be identified with the set

$$\begin{aligned} D_m^{\varGamma }(x):=\{{\bar{y}} \in {\mathbb {R}}^m\,:\, g(x,\omega _j)+\bar{y}_j\le 0,\,\forall \, \omega _j \in \varGamma \}. \end{aligned}$$

Given any function \(y\in D^\varGamma (x)\), call \({\bar{y}}\in {\mathbb {R}}^m\) the vector of images of y. Since \(\text {supp}(y)\subset \varGamma \) and \(y^\varGamma ={\bar{y}}\) we have that \(\sigma (y^\varGamma )=\sigma ^\varGamma ({\bar{y}})\) and \(\langle \lambda ,y \rangle =\langle \bar{\lambda },y^\varGamma \rangle = \langle \bar{\lambda },{\bar{y}}\rangle \), where we used the definition of \(\bar{\lambda }\) in the latter equality. Altogether, we can write

$$\begin{aligned} L^\varOmega (x,\lambda ,r)= & {} f(x) +{\inf }_{y\in D(x)} \, -\langle \lambda , y \rangle + r\sigma (y^\varGamma )\\\le & {} f(x) +{\inf }_{y\in D^\varGamma (x)}\, -\langle \lambda , y \rangle + r\sigma (y^\varGamma )\\= & {} f(x) +{\inf }_{{\bar{y}}\in D_m^{\varGamma }(x)}\, -\langle {\bar{\lambda }}, {\bar{y}} \rangle + r\sigma ^\varGamma ({\bar{y}})=L^\varGamma (x,{\bar{\lambda }},r), \end{aligned}$$

which proves the opposite inequality.

(ii) Fix \(x\in {\mathbb {R}}^n\) and take any \(y\in {\mathbb {R}}^{\varOmega }\) such that \(g(x,\omega _j)+ y(\omega _j)\le 0\) for all \(j=1, \dots , m\). Define \(u_j:=-(g(x,\omega _j)+ y(\omega _j))\) for all \(j=1, \dots , m\). Using (6), we can write

$$\begin{aligned} \begin{aligned} L^\varGamma (x,{\bar{\lambda }},r)&=\inf _{{\bar{y}}\in {\mathbb {R}}^m}\{{f}(x)-\langle {\bar{\lambda }},\bar{y}\rangle +r\sigma ^\varGamma ({\bar{y}}):g(x,\omega _j)+\bar{y}_j\le 0,\ \omega _j \in \varGamma \}\\&=\inf _{u\ge 0}\left\{ {f}(x)+ \sum _{j=1}^{m} \lambda (\omega _j)(g(x,\omega _j)+ u_j)\right. \\&\quad \left. +\,r\sigma ^\varGamma (-g(x,\omega _1)-u_1, \ldots , -g(x,\omega _m)-u_m)\right\} \\&=f(x)+\sum _{j=1}^{m }\lambda (\omega _j)g(x,\omega _j)+\inf _{u\ge 0}\left\{ \sum _{j=1}^{m} \lambda (\omega _j) u_j+r\sigma ^\varGamma (\gamma (x,u))\right\} . \end{aligned} \end{aligned}$$

That is, the augmented Lagrangian \(L^{\varGamma }(x,{\bar{\lambda }},r)\) of \((P^\varGamma )\) can be expressed as (11). \(\square \)

Lemma 2.2

Let \(f: {\mathbb {R}}^n\rightarrow {\mathbb {R}}\) and \( g:{\mathbb {R}}^n\times \varOmega \rightarrow {\mathbb {R}}\) be real-valued functions, let \(\sigma :{\mathbb {R}}^{(\varOmega )}\rightarrow {{\mathbb {R}}_+}\) be a function with \(\sigma (0)=0\), and let \(x^*\in X_0\), \(\lambda ^*\in {\mathbb {R}}_+^{(\varOmega )}\), \(\varGamma =\text {supp}(\lambda ^*)= \{\omega _1, \dots , \omega _m\}\). Assume that the KKT conditions for (SIP) are satisfied at \(x^*\) and \(\lambda ^*\), i.e.,

$$\begin{aligned} \nabla _x L_0^\varGamma (x^*,\lambda ^*) =0, \end{aligned}$$
(12)

and

$$\begin{aligned} \lambda ^*(\omega ) \ge 0, \lambda ^*(\omega ) g(x^*,\omega )=0, \forall \omega \in \varOmega . \end{aligned}$$
(13)

Then

$$\begin{aligned} L^\varOmega (x^*, \lambda ^*, r)=L^\varGamma (x^*, {\bar{\lambda }}^*, r) =f(x^*), \quad \forall r>0, \end{aligned}$$
(14)

where \({\bar{\lambda }}^*= (\lambda ^*(\omega _1),\ldots , \lambda ^*(\omega _m))^T\).

Proof

Since \(x^*\in X_0\), we have \(g(x^*,\omega )\le 0, \omega \in \varOmega \). It follows from (2) that

$$\begin{aligned} L^\varOmega (x^*,\lambda ^*,r)\le f(x^*). \end{aligned}$$
(15)

Letting \(\tilde{u}_j=- g(x^*,\omega _j)\ge 0\) for all \(j=1,\ldots , m\), we obtain

$$\begin{aligned} 0\le {\inf }_{u\ge 0}\sigma ^\varGamma (\gamma (x^*,u))\le \sigma ^\varGamma (\gamma (x^*,\tilde{u}))=\sigma (0)=0. \end{aligned}$$
(16)

By using (13), we have

$$\begin{aligned} \inf _{u\ge 0}\sum _{j=1}^{m} \lambda ^*(\omega _j) u_j=0. \end{aligned}$$
(17)

By the complementarity condition (13), the term \({\sum }_{j=1}^{m}\lambda ^*(\omega _j)g(x^*,\omega _j)\) vanishes. Combining (16) and (17) with (11), we have

$$\begin{aligned}&L^\varGamma (x^*,{\bar{\lambda }}^*,r)\\&\quad = f(x^*)+{\sum }_{j=1}^{m}\lambda ^*(\omega _j)g(x^*,\omega _j) +{\inf }_{u\ge 0}\left\{ {\sum }_{j=1}^{m} \lambda ^*(\omega _j) u_j +r\sigma ^\varGamma (\gamma (x^*,u))\right\} \\&\quad \ge f(x^*)+ {\inf }_{u\ge 0}\left\{ {\sum }_{j=1}^{m} \lambda ^*(\omega _j) u_j\right\} + r\, {\inf }_{u\ge 0}\left\{ \sigma ^\varGamma (\gamma (x^*,u))\right\} = f(x^*). \end{aligned}$$

This, together with (10) and (15), implies that (14) holds. \(\square \)
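Lemma 2.2 can be illustrated on a one-constraint toy problem; the data below are ours, not from the paper. Take \(f(x)=x^2\) and a single constraint \(g(x,\omega _1)=1-x\), so the KKT conditions (12)–(13) hold at \(x^*=1\) with \(\lambda ^*(\omega _1)=2\). Brute-forcing the infimum in (6) recovers (14) for several penalty values and two augmenting functions.

```python
f = lambda x: x * x
g = lambda x: 1.0 - x            # one index: Gamma = {omega_1}, X_0 = [1, inf)

# KKT at x* = 1: grad_x L_0 = 2x - lam = 0 gives lam* = 2, and g(x*) = 0,
# so (12)-(13) hold with lam* = 2 >= 0.
x_star, lam_star = 1.0, 2.0

def L_gamma(x, lam, r, sigma):
    # brute-force evaluation of the augmented Lagrangian (6) with m = 1,
    # gridding the perturbation y over [-4, 4]
    ys = (i * 1e-4 - 4.0 for i in range(80001))
    return min(f(x) - lam * y + r * sigma(y) for y in ys if g(x) + y <= 0.0)

for sigma in (abs, lambda y: y * y):      # two augmenting terms, sigma(0) = 0
    for r in (0.5, 1.0, 10.0):
        val = L_gamma(x_star, lam_star, r, sigma)
        print(abs(val - f(x_star)) < 1e-9)   # (14): L^Gamma(x*, lam*, r) = f(x*)
```

In both cases the infimum is attained at \(y=0\), exactly as in the proof: complementarity kills the linear term and \(\sigma (0)=0\) kills the augmenting term.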

2.2 Duality Properties of (SIP)

In this subsection, we establish some properties and strong duality results for (SIP), including first-order optimality conditions and a global saddle-point property.

Lemma 2.3

Let \(f: {\mathbb {R}}^n\rightarrow {\mathbb {R}}\) and \( g:{\mathbb {R}}^n\times \varOmega \rightarrow {\mathbb {R}}\) be real-valued functions, let \(\sigma :{\mathbb {R}}^{(\varOmega )}\rightarrow {{\mathbb {R}}_+}\) be a function with \(\sigma (0)=0\), and let \(x^*\in X_0\), \(\lambda ^*\in {\mathbb {R}}_+^{(\varOmega )}\), \(\varGamma =\text {supp}(\lambda ^*)= \{\omega _1, \dots , \omega _m\}\). Assume that the KKT conditions (12) and (13) for (SIP) are satisfied at \(x^*\) and \(\lambda ^*\). If there exists \(r^*>0\) such that \(\lambda ^*\) is an augmented Lagrange multiplier of the problem (SIP) with the penalty parameter \(r^*\) and \(f(x^*)={\text {val }}\) (SIP), then

$$\begin{aligned} L^\varOmega (x, \lambda ^*, r^*)\ge L^\varOmega (x^*, \lambda ^*, r^*)\ge L^\varOmega (x^*, \lambda , r^*), \quad \forall x\in {\mathbb {R}}^n,\ \lambda \in {\mathbb {R}}^{(\varOmega )}. \end{aligned}$$
(18)

Conversely, if

$$\begin{aligned} L^\varOmega (x, \lambda ^*, r^*)\ge L^\varOmega (x^*, \lambda ^*, r^*), \quad \forall x\in {\mathbb {R}}^n, \end{aligned}$$
(19)

then \(\lambda ^*\) is an augmented Lagrange multiplier of (SIP) with the penalty parameter \(r^*\).

Proof

As \(\lambda ^*\) is an augmented Lagrange multiplier of \(\text {(SIP)}\) with the penalty parameter \(r^*\), we have

$$\begin{aligned} v_{r^*}^{\varOmega }(y)\ge v_{r^*}^{\varOmega }(0)+\langle \lambda ^*, y\rangle ,\quad \forall y\in {\mathbb {R}}^{\varOmega }. \end{aligned}$$

Therefore,

$$\begin{aligned} {\inf }_{y\in {\mathbb {R}}^{\varOmega }}\{v^\varOmega _{r^*}(y)-\langle \lambda ^*, y\rangle \} \ge v^\varOmega _{r^*}(0)=\text {val }\text {(SIP)}. \end{aligned}$$
(20)

It follows from (14), (4) and (20) that

$$\begin{aligned} f(x^*)&= L^\varOmega (x^*, \lambda ^*, r^*)\ge {\inf }_{x\in {\mathbb {R}}^n}L^\varOmega (x, \lambda ^*, r^*)\\&= {\inf }_{y\in {\mathbb {R}}^{\varOmega }}\{v^\varOmega _{r^*}(y)-\langle \lambda ^*, y\rangle \} \\&\ge \text {val }\text {(SIP)}=f(x^*). \end{aligned}$$

The above expression implies that

$$\begin{aligned} L^\varOmega (x, \lambda ^*, r^*)\ge L^\varOmega (x^*, \lambda ^*, r^*), \quad \forall x\in {\mathbb {R}}^n. \end{aligned}$$
(21)

Let \(\lambda \in {\mathbb {R}}^{(\varOmega )}.\) Since \(x^*\in X_0\) and \(0\in {\mathbb {R}}^{\varOmega }\), it follows from (2) that

$$\begin{aligned} L^\varOmega (x^*, \lambda , r^*)\le f(x^*)= L^\varOmega (x^*, \lambda ^*, r^*). \end{aligned}$$
(22)

By (21) and (22), we obtain both inequalities in (18).

Assume now that (19) holds for \(r^*>0\). Then we obtain

$$\begin{aligned} \inf _{x\in {\mathbb {R}}^n}L^\varOmega (x, \lambda ^*, r^*)\ge L^\varOmega (x^*, \lambda ^*, r^*)=f(x^*) = \text {val }\text {(SIP)}. \end{aligned}$$
(23)

It follows from (4) and (23) that

$$\begin{aligned} {\inf }_{y\in {\mathbb {R}}^{\varOmega }}\{v^\varOmega _{r^*}(y)-\langle \lambda ^*, y\rangle \} ={\inf }_{x\in {\mathbb {R}}^n}L^\varOmega (x, \lambda ^*, r^*)\ge \text {val }\text {(SIP)}. \end{aligned}$$

This implies that

$$\begin{aligned} v_{r^*}^{\varOmega }(y)\ge v_{r^*}^{\varOmega }(0)+\langle \lambda ^*, y\rangle ,\quad \forall y\in {\mathbb {R}}^{\varOmega }. \end{aligned}$$

That is, \(\lambda ^*\) is an augmented Lagrange multiplier of \(\text {(SIP)}\) with penalty parameter \(r^*\). \(\square \)

Remark 2.3

By Lemma 2.3, if \(\lambda ^*\in {\mathbb {R}}^{(\varOmega )}\) is an augmented Lagrange multiplier of (SIP) with the penalty parameter \(r^*\), and \(x^*\) is an optimal solution of (SIP) and satisfies (12) and (13), then \((x^*,\lambda ^*)\) is a saddle point of \(L^\varOmega (\cdot , \cdot , r^*)\). Conversely, if for some \(r^*>0\), \((x^*,\lambda ^*)\) is a saddle point of \(L^\varOmega (\cdot , \cdot , r^*)\), then \(\lambda ^*\) is an augmented Lagrange multiplier of (SIP) with the penalty parameter \(r^*\).

3 First-order Conditions for Augmented Lagrange Multipliers

In this section, we focus on the augmented Lagrangian in which the augmenting function is given by the \(\ell _1\)-norm of the function \(y^\varGamma \), the restriction of \(y\in {\mathbb {R}}^{\varOmega }\) to a fixed finite subset \(\varGamma \subset \varOmega \). Namely, given \(\varGamma :=\{\omega _1,\ldots ,\omega _m\}\), we consider \(\sigma ^\varGamma :{\mathbb {R}}^{(\varOmega )}\rightarrow {\mathbb {R}}_+\), defined as

$$\begin{aligned} \sigma ^\varGamma (y):=\sigma (y^\varGamma )=\Vert y^\varGamma \Vert =\sum _{\omega \in \varGamma }|y(\omega )|=\sum _{j=1}^m |y(\omega _j)|. \end{aligned}$$
(24)

The resulting augmented Lagrangian, inspired by [15, Example 11.58], is called the sharp Lagrangian. It is a classical fact from convex programming that the sharp Lagrangian admits an exact penalty parameter (see, e.g., [27, Theorem 9.3.1]). We will show that this can also be achieved in our more general context. First, we obtain necessary optimality conditions for the existence of an augmented Lagrange multiplier of the sharp Lagrangian problem. Second, we establish sufficient conditions for its existence, assuming inf-compactness of the functions involved and an extended Mangasarian–Fromovitz constraint qualification. We start by establishing first-order necessary conditions for the existence of an augmented Lagrange multiplier of the semi-infinite programming problem \(\text {(SIP)}\).
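Before the formal results, the exact-penalty behavior of the sharp Lagrangian can be previewed numerically on a genuinely semi-infinite toy instance of ours (not from the paper): \(f(x)=x^2\), \(g(x,\omega )=\omega -x\) with \(\varOmega =[0,1]\), so that feasibility means \(x\ge 1\), \(\text {val }\text {(SIP)}=1\) at \(x^*=1\), only \(\omega =1\) is active, and the KKT multiplier there is \(\lambda ^*(1)=2\). Taking \(\lambda =0\) supported on \(\varGamma =\{1\}\), the inner infimum in (2) can be carried out by hand, and the dual value \(\inf _x L\) becomes exact precisely once \(r\) reaches the size of \(\lambda ^*\).

```python
def f(x): return x * x                  # toy objective (ours)
# g(x, w) = w - x on Omega = [0, 1]: feasibility means x >= 1, val(SIP) = 1.

def L_sharp(x, r):
    # sharp Lagrangian with lambda = 0, Gamma = {1.0}: the infimum in (2)
    # over y with y(w) <= x - w reduces to inf { r*|y(1)| : y(1) <= x - 1 },
    # i.e. L(x, 0, r) = f(x) + r * max(1 - x, 0).
    return f(x) + r * max(1.0 - x, 0.0)

xs = [i * 0.001 - 2.0 for i in range(4001)]   # grid on [-2, 2]
for r in (1.0, 2.0, 4.0):
    d = min(L_sharp(x, r) for x in xs)
    print(r, round(d, 4))   # r=1.0 -> 0.75 (duality gap); r >= 2.0 -> 1.0 (exact)
```

The threshold \(r\ge 2=\lambda ^*(1)\) is the classical exact-penalty bound for the \(\ell _1\) penalty, which is what the results of this section extend to augmented Lagrange multipliers of \(\text {(SIP)}\).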

Given a fixed finite subset \(\varGamma \subset \varOmega \), we will use the following notation for the set of active constraints of (SIP) and the set of active constraints of \((P^\varGamma )\) at a given \(x \in X_0\), respectively.

$$\begin{aligned} I(x):=\{\omega \in \varOmega \,:\, g(x,\omega )= 0\},\quad \varGamma (x):=\{ \omega \in \varGamma \,:\,g(x,\omega )= 0 \}=I(x)\cap \varGamma . \end{aligned}$$

Theorem 3.1

Let \(f(\cdot )\) and \( g(\cdot ,\omega ), \omega \in \varOmega \), be real-valued continuously differentiable functions. Fix \(\lambda ^*\in {\mathbb {R}}^{(\varOmega )}\), and call \(\varGamma = \text {supp}(\lambda ^*)=\{\omega _1, \dots , \omega _m\}\). Consider \(\sigma ^\varGamma \) as in (24). Assume that

  1. (a)

    \(\lambda ^*\) is an augmented Lagrange multiplier of (SIP) with penalty parameter \(r^*\),

  2. (b)

    \(x^*\) is a local minimum for (SIP), and

  3. (c)

\(\text {val }\text {(SIP)}=f(x^*)\).

In this situation, we have that

$$\begin{aligned} \nabla f(x^*)^\top d \ge 0, \quad \forall d \in \varPsi (x^*), \end{aligned}$$

where

$$\begin{aligned} \varPsi (x^*):= \{d\in {\mathbb {R}}^n: \nabla _x g(x^*,\omega )^\top d\le 0, \omega \in I(x^*)\}, \end{aligned}$$

is the cone of linearized feasible directions at \(x^*\).

Proof

Assumption (a) means that

$$\begin{aligned} v_{r^*}^{\varOmega }(y)- \langle \lambda ^*, y\rangle \ge v_{r^*}^{\varOmega }(0), \quad \forall y\in {\mathbb {R}}^{\varOmega }. \end{aligned}$$

Noting that \(v_{r^*}^{\varOmega }(y)=\text {val }(P_y)+{r^*}\sigma (y^\varGamma )=\text {val }(P_y)+{r^*} \sum _{\omega \in \varGamma }|y(\omega )|\) and \(v_{r^*}^{\varOmega }(0)=\text {val }\text { (SIP)},\) the last inequality is rewritten as

$$\begin{aligned} \text {val }(P_y)-\langle \lambda ^*, y\rangle +r^*\sum _{\omega \in \varGamma }|y(\omega )|\ge \text {val }\text {(SIP)}, \quad \forall y\in {\mathbb {R}}^{\varOmega }. \end{aligned}$$

Taking the infimum over \(y\) in the above inequality and using (c), we obtain

$$\begin{aligned} \inf _{y\in {\mathbb {R}}^{\varOmega }}\left\{ \text {val }(P_y)-\langle \lambda ^*, y\rangle +r^*\sum _{\omega \in \varGamma }|y(\omega )|\right\} \ge \text {val }\text {(SIP)}= f(x^*). \end{aligned}$$

Using now the definition of \(\text {val }(P_y)\) and the fact (b) that \(x^*\) is a local minimum for \((\text {SIP})\), we obtain

$$\begin{aligned} f(x^*)\le & {} {\inf }_{y\in {\mathbb {R}}^{\varOmega }}\left\{ \text {val }(P_y)-\langle \lambda ^*,y\rangle +r^*{\sum }_{\omega \in \varGamma }|y(\omega )|\right\} \\= & {} {\inf }_{y\in {\mathbb {R}}^{\varOmega }} {\inf }_{x\in {\mathbb {R}}^n}\Big \{f(x) -\langle \lambda ^*,y\rangle +r^*{\sum }_{\omega \in \varGamma }|y(\omega )|\,:\, g(x,\omega )\\&+\,y(\omega )\le 0, \omega \in \varOmega \Big \}\\= & {} {\inf }_{x\in {\mathbb {R}}^n}{\inf }_{y\in {\mathbb {R}}^{\varOmega }}\left\{ f(x) +{\sum }_{j=1}^m\lambda ^*(\omega _j) (-y(\omega _j))\right. \\&\left. +\,r^*{\sum }_{j=1}^m|y(\omega _j)|\,:\, g(x,\omega )+y(\omega )\le 0, \omega \in \varOmega \right\} \\= & {} {\inf }_{x\in {\mathbb {R}}^n} L^\varOmega (x,\lambda ^*,r^*), \end{aligned}$$

where we used the definition of \(L^\varOmega \) (see (2)) in the last equality, and the fact that \(\varGamma =\text {supp}\,\lambda ^*\) in the second-to-last equality. This expression readily gives

$$\begin{aligned} f(x^*)&\le {\inf }_{y\in {\mathbb {R}}^{\varOmega }}\left\{ f(x) +{\sum }_{j=1}^m\lambda ^*(\omega _j) (-y(\omega _j))\right. \nonumber \\&\quad \left. +\,r^*{\sum }_{j=1}^m|y(\omega _j)|\,:\, g(x,\omega )+y(\omega )\le 0, \omega \in \varOmega \right\} = L^\varOmega (x,\lambda ^*,{r^*}), \end{aligned}$$
(25)

for all \(x\in {\mathbb {R}}^n\). Fix \(x\in {\mathbb {R}}^n\). For each \(\omega _j\in \varGamma \), there are two possibilities: either \(g(x,\omega _j)\le 0\), or \(g(x,\omega _j)> 0\). If \(g(x,\omega _j)\le 0\), take \(\tilde{y}(\omega _j)=0\); otherwise, take \(\tilde{y}(\omega _j)=-g(x,\omega _j)\). Namely, take \(\tilde{y}(\omega ):=-\max \{0,g(x,\omega )\}\) for \(\omega \in \varGamma \) and \(\tilde{y}(\omega )=0\) for \(\omega \not \in \varGamma \). Using this choice of \(\tilde{y}\in {\mathbb {R}}^{\varOmega }\) in (25), we can write

$$\begin{aligned} f(x^*) \le L^\varOmega (x,\lambda ^*,r^*)&={\inf }_{y\in {\mathbb {R}}^{\varOmega }}\left\{ f(x) +{\sum }_{j=1}^m\lambda ^*(\omega _j) (-y(\omega _j))\right. \nonumber \\&\quad \left. +\,r^*{\sum }_{j=1}^m|y(\omega _j)|\,:\, g(x,\omega )+y(\omega )\le 0, \omega \in \varOmega \right\} \nonumber \\&\le f(x)+{\sum }_{j=1}^m \lambda ^*(\omega _j) g^+(x,\omega _j)+r^*{\sum }_{j=1}^{m} g^+(x,\omega _j), \end{aligned}$$
(26)

where \((a)^+=\max \{a,0\}\) for any \(a\in {\mathbb {R}}\). Inequality (26) implies that

$$\begin{aligned} f(x^*)\le f(x)+\sum _{j=1}^m (\lambda ^*(\omega _j) + r^*) g^+(x,\omega _j),\quad \forall x \in {\mathbb {R}}^n, \end{aligned}$$

i.e., \(x^*\) is an unconstrained minimum of \(H(x):=f(x)+\sum _{j=1}^m (\lambda ^*(\omega _j) + r^*) g^+(x,\omega _j)\). Thus it must satisfy the necessary first-order optimality conditions

$$\begin{aligned} H^\circ (x^*, d)=\nabla f(x^*)^\top d +\sum _{\omega _j\in \varGamma (x^*)} (\lambda ^*(\omega _j)+r^*) \max \{\nabla _x g(x^*,\omega _j)^\top d, 0\}\ge 0, \quad \forall \,d\in {\mathbb {R}}^n, \end{aligned}$$

where \(H^\circ (x^*, d)\) is the Clarke generalized directional derivative of H at \(x^*\) in the direction d, see [29], and the sum is restricted to \(\omega _j\in \varGamma (x^*)=I(x^*)\cap \varGamma \) because \(x^*\) is feasible and \(\nabla _x g^+(x^*,\omega _j) ^\top d=0\) for all \(\omega _j\not \in I(x^*)\) and all \(d\in {\mathbb {R}}^n\). Fix now \(d\in \varPsi (x^*)\). By definition of \(\varPsi (x^*)\), we must have \(\max \{\nabla _x g(x^*,\omega _j)^\top d, 0\}=0\) for every \(\omega _j\in I(x^*)\), and the above inequality yields

$$\begin{aligned} \nabla f(x^*)^\top d \ge 0, \quad \forall d \in \varPsi (x^*). \end{aligned}$$

This completes the proof. \(\square \)
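To illustrate the exact-penalty mechanism behind inequality (26) outside the formal development, consider a hypothetical one-dimensional instance: \(f(x)=x^2\), \(g(x,\omega )=\omega -x\) for \(\omega \in [0,1]\), so the feasible set is \([1,\infty )\) and \(\text {val }\text {(SIP)}=1\) at \(x^*=1\). With \(\varGamma =\{1\}\) and aggregate coefficient \(c:=\lambda ^*(1)+r^*\), the function \(H(x)=f(x)+c\,g^+(x,1)\) attains its unconstrained minimum at \(x^*=1\) whenever \(c\ge 2\), while a smaller c produces an infeasible minimizer. A brute-force sketch:

```python
def H(x, c):
    """Exact penalty function H(x) = f(x) + c * g^+(x, 1) for the toy
    instance f(x) = x**2, g(x, omega) = omega - x, Gamma = {1}."""
    return x**2 + c * max(0.0, 1.0 - x)

def argmin_on_grid(c, lo=0.0, hi=2.0, n=4001):
    """Brute-force minimizer of H(., c) on a grid; enough to see
    where the unconstrained minimum sits."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(xs, key=lambda x: H(x, c))

print(argmin_on_grid(3.0))  # 1.0: the penalty is exact for c >= 2
print(argmin_on_grid(1.0))  # 0.5: infeasible minimizer when c < 2
```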

The last result in this section establishes sufficient first-order conditions for the existence of an augmented Lagrange multiplier. Given \(\varGamma =\{\omega _1,\ldots , \omega _m\}\subset \varOmega \), define

$$\begin{aligned} X_0^\varGamma :=\{x\in {\mathbb {R}}^n:g(x,\omega )\le 0, \omega \in \varGamma \}, \end{aligned}$$
(27)

the subset of points which are feasible for problem \((P^\varGamma )\). We will make the following assumptions on the objective \(f(\cdot ) \) and the subset of constraint functions \(g(\cdot ,\omega )\) for \(\omega \in \varGamma \).

Assumption 3.1

Let f and \(g(\cdot ,\omega )\) (\(\omega \in \varGamma \)) be inf-compact; that is, there exist \(\alpha \in {\mathbb {R}}\) and \(\omega \in \varGamma \) such that the sets \(\{ x \in {\mathbb {R}}^n : f(x) \le \alpha \}\) and \(\{ x \in {\mathbb {R}}^n : g(x,\omega ) \le \alpha \}\) are nonempty and compact.

Remark 3.1

If Assumption 3.1 holds, then the inf-compactness of f implies that the solution set \(S^\varGamma \) of the discretized problem \((P^{\varGamma })\) is nonempty and compact.

The extended Mangasarian–Fromovitz constraint qualification (EMFCQ for short) is said to hold for \((P^\varGamma )\) at a feasible solution \(x_0\), see Bonnans and Shapiro [28, p. 510], if there exists \(z\in {\mathbb {R}}^n\) such that

$$\begin{aligned} \nabla _x g(x_0,\omega _j)^\top z <0, \quad \forall \omega _j \in \varGamma \text { such that } g(x_0,\omega _j)=0. \end{aligned}$$

Assumption 3.2

The (EMFCQ) holds for the reduced problem \(({P}^\varGamma )\) at every \(x\in S^\varGamma \).

Let \(x\in X_0\) and define the set of (classical) Lagrange multipliers associated with x:

$$\begin{aligned} \varLambda ^\varGamma (x) := \left\{ \lambda \in {\mathbb {R}}^m : \nabla _x L_0^\varGamma (x,\lambda ) =0, \lambda \ge 0, \lambda _j g(x,\omega _j)=0, \ j=1, \dots , m\right\} . \end{aligned}$$

It is known that if Assumption 3.2 holds, then \(\varLambda ^\varGamma (x)\) is bounded (cf. Gauvin [30], see also Bonnans and Shapiro [28, p. 510]). A general result on the boundedness of the set of Lagrange multipliers for an optimization problem with a closed set inclusion constraint is given in Burke [31].

Theorem 3.2

We assume that (SIP) is convex. Let \(\lambda ^*\in {\mathbb {R}}^{(\varOmega )}\), and \(\varGamma =\text {supp}( \lambda ^*)= \{\omega _1, \dots , \omega _m\}\). Denote \({\bar{\lambda }}_j:=\lambda ^*(\omega _j)\), \(j=1,\ldots ,m\). Suppose that \(\text {val }\text {(SIP)}=\text {val }(P^\varGamma )\). Let \(f(\cdot )\) and \( g(\cdot ,\omega ), \omega \in \varOmega \), be real-valued continuously differentiable functions, and let \(\sigma ^\varGamma \) be as in (24). Suppose that Assumptions 3.1 and 3.2 are satisfied. Then, there exists \(r^*> 0\), such that

  1. (i)

    \({\bar{\lambda }}= ({\bar{\lambda }}_1, \dots , {\bar{\lambda }}_m)^T \) is an augmented Lagrange multiplier of \(( P^\varGamma )\), with penalty parameter \(r^*\), and

  2. (ii)

    \(\lambda ^*\in {\mathbb {R}}^{(\varOmega )}\) is an augmented Lagrange multiplier of (SIP), with the same penalty parameter \(r^*\).

Proof

(i) First, we claim that there exists \(\alpha _0\in {\mathbb {R}}\) such that \(f(x)\ge \alpha _0, \forall x\in {\mathbb {R}}^n\). Indeed, if this is not the case, there exists a sequence \(\{x_k\}\) such that \(f(x_k)\rightarrow -\infty \) as \(k\rightarrow \infty \). By Assumption 3.1, there exists \(\alpha \in {\mathbb {R}}\), such that the set

$$\begin{aligned} {\mathbb {E}}=\{ x \in {\mathbb {R}}^n : f(x) \le \alpha \} \end{aligned}$$

is nonempty and compact. So, \(x_k\in {\mathbb {E}}\) for large enough k. Thus, there exists a subsequence \(\{x_{k_j}\} \) of \(\{x_k\}\), such that \(x_{k_j}\rightarrow x_0\) as \(j\rightarrow \infty \). Since f is continuous, \(x_0\in {\mathbb {E}}\) and \(f(x_{k_j})\rightarrow f(x_0)\in {\mathbb {R}}\) as \(j\rightarrow \infty \), which contradicts \(f(x_{k_j})\rightarrow -\infty \).

Recall that \({v}^\varGamma ({\bar{y}})\) is the optimal value of \((P_{{\bar{y}}}^\varGamma )\). Thus

$$\begin{aligned} v^\varGamma ({\bar{y}}) \ge \alpha _0, \end{aligned}$$
(28)

since \(f(x)\ge \alpha _0, \forall x\in {\mathbb {R}}^n\).

To complete our proof, we will use arguments similar to those in Rückmann and Shapiro [24, Theorem 3]. It follows from Assumption 3.1 that the set of optimal solutions of \((P^\varGamma )\) is compact. By Assumption 3.2, the sets \(\varLambda ^\varGamma (x)\) are uniformly bounded for all x in a neighborhood of any point \({\bar{x}}\in S^\varGamma \) (cf. Proposition 4.43 of Bonnans and Shapiro [28]). Thus, the set \(\bigcup _{x\in S^\varGamma }\varLambda ^\varGamma (x)\) is bounded because \(S^\varGamma \) is compact, and hence \(\sup _{\lambda \in \bigcup _{x\in S^\varGamma }\varLambda ^\varGamma (x)}\Vert \lambda -{\bar{\lambda }}\Vert \) is finite. Since, for \((P^\varGamma )\), Robinson’s constraint qualification is equivalent to the EMFCQ (see Theorem 4.27 of Bonnans and Shapiro [28]), we have

$$\begin{aligned} (v^\varGamma )'_{-}(0,d)\ge \inf _{x\in S^\varGamma }\inf _{\lambda \in \varLambda ^\varGamma (x)} \nabla _y \{f(x)+\langle \lambda , g(x)+y\rangle \}^\top d = \inf _{x\in S^\varGamma }\inf _{\lambda \in \varLambda ^\varGamma (x)} \langle \lambda , d\rangle , \end{aligned}$$
(29)

where \((v^\varGamma )'_{-}(0,d)\) is the lower Hadamard directional derivative of \(v^\varGamma (\cdot )\) at 0 in a unit direction \(d\in {\mathbb {R}}^m\), more precisely,

$$\begin{aligned} (v^\varGamma )'_{-}(0,d):=\displaystyle \sup _{r>0}\left[ \,\,\,\inf _{t\in (0,r),\,\Vert d-d'\Vert <r}\left[ \dfrac{v^\varGamma (td')-v^\varGamma (0)}{t}\right] \right] . \end{aligned}$$

Using the above definition and (29), we can write, for a fixed \(\varepsilon >0\)

$$\begin{aligned} \displaystyle \sup _{r>0}\left[ \,\,\,\inf _{t\in (0,r),\,\Vert d-d'\Vert <r}\left[ \dfrac{v^\varGamma (td')-v^\varGamma (0)}{t}\right] \right] > \inf _{x\in S^\varGamma }\inf _{\lambda \in \varLambda ^\varGamma (x)}\langle \lambda , d\rangle -\varepsilon \,\Vert d\Vert . \end{aligned}$$

Hence, by definition of the supremum in the left-hand side, there exists \(\tau >0\) such that, denoting by \(B_{{\mathbb {R}}^m}\) the unit ball in \({\mathbb {R}}^m\), for every \(d'\) with \(\Vert d-d'\Vert <\tau \) and every \(t\in (0,\tau ]\) we can write

$$\begin{aligned} \dfrac{v^\varGamma (td')-v^\varGamma (0)}{t}> \left[ {\inf }_{x\in S^\varGamma }{\inf }_{\lambda \in \varLambda ^\varGamma (x)}\langle \lambda , d\rangle \right] - \varepsilon \Vert d\Vert . \end{aligned}$$
(30)

Multiplying the expression above by t and restricting the quotient in the left-hand side to \(d'=d\), we can write (30) in terms of \({\bar{y}}:=td\in \tau B_{{\mathbb {R}}^m}\), to obtain

$$\begin{aligned} v^\varGamma ({\bar{y}})-v^\varGamma (0)> \left[ {\inf }_{x\in S^\varGamma }{\inf }_{\lambda \in \varLambda ^\varGamma (x)}\langle \lambda , {\bar{y}}\rangle \right] - \varepsilon \Vert {\bar{y}}\Vert . \end{aligned}$$
(31)

Let us find a suitable lower bound for the first term in the right-hand side of (31). We can write

$$\begin{aligned} \langle \lambda ,{\bar{y}}\rangle = \langle \lambda -{\bar{\lambda }},{\bar{y}}\rangle +\langle {\bar{\lambda }},{\bar{y}}\rangle \ge \langle {\bar{\lambda }},{\bar{y}}\rangle -\Vert \lambda -{\bar{\lambda }}\Vert \Vert {\bar{y}}\Vert . \end{aligned}$$

Taking now the infimum over \(x\in S^\varGamma \) and \(\lambda \in \varLambda ^\varGamma (x)\) in this expression, we obtain

$$\begin{aligned} \inf _{x\in S^\varGamma }\inf _{\lambda \in \varLambda ^\varGamma (x)}\langle \lambda , {\bar{y}}\rangle&\ge \langle {\bar{\lambda }},{\bar{y}}\rangle + \Vert {\bar{y}}\Vert \,{\inf }_{x\in S^\varGamma }{\inf }_{\lambda \in \varLambda ^\varGamma (x)}\left[ -\Vert \lambda -{\bar{\lambda }}\Vert \right] \nonumber \\&= \langle {\bar{\lambda }},{\bar{y}}\rangle - \Vert {\bar{y}}\Vert \,{\sup }_{x\in S^\varGamma }{\sup }_{\lambda \in \varLambda ^\varGamma (x)}\left[ \Vert \lambda -{\bar{\lambda }}\Vert \right] . \end{aligned}$$
(32)

The boundedness of the set \(\bigcup _{x\in S^\varGamma }\varLambda ^\varGamma (x)\) implies the existence of \(r_1>0\) such that

$$\begin{aligned} r_1>\sup _{x\in S^\varGamma }\sup _{\lambda \in \varLambda ^\varGamma (x)}\Vert \lambda -{\bar{\lambda }}\Vert . \end{aligned}$$

Using this fact, and (32)-(31), we obtain

$$\begin{aligned} v^\varGamma ({\bar{y}})&> {v}^\varGamma (0)+{\inf }_{x\in S^\varGamma }{\inf }_{\lambda \in \varLambda ^\varGamma (x)}\langle \lambda , {\bar{y}}\rangle - \varepsilon \Vert {\bar{y}}\Vert \nonumber \\&\ge {v}^\varGamma (0)+ \langle {\bar{\lambda }},{\bar{y}}\rangle - \Vert {\bar{y}}\Vert \,{\sup }_{x\in S^\varGamma }{\sup }_{\lambda \in \varLambda ^\varGamma (x)}\left[ \Vert \lambda -{\bar{\lambda }}\Vert \right] - \varepsilon \Vert {\bar{y}}\Vert \nonumber \\&> {v}^\varGamma (0)+\langle {\bar{\lambda }},{\bar{y}}\rangle - \Vert {\bar{y}}\Vert \,r_1 - \varepsilon \Vert {\bar{y}}\Vert = {v}^\varGamma (0)+\langle {\bar{\lambda }},{\bar{y}}\rangle - \Vert {\bar{y}}\Vert (r_1+\varepsilon ). \end{aligned}$$
(33)

Noting that every nonzero \({\bar{y}}\in \tau B_{{\mathbb {R}}^m}\) can be written as \({\bar{y}}=td\) for \(t\in (0,\tau ]\) and d a unit vector, and that for \({\bar{y}}=0\) the non-strict version of the inequality holds trivially, we conclude that the inequality in (33) holds, with \(\ge \) instead of >, for every \({\bar{y}}\in \tau B_{{\mathbb {R}}^m}\). Altogether, we can write for all \({\bar{y}}\in \tau B_{{\mathbb {R}}^m}\)

$$\begin{aligned} {v}_{r}^\varGamma ({\bar{y}})&={v}^\varGamma ({\bar{y}})+r\sigma ^\varGamma ({\bar{y}})= {v}^\varGamma ({\bar{y}})+r\Vert {\bar{y}}\Vert \nonumber \\&\ge {v}^\varGamma (0)+ \Vert {\bar{y}}\Vert (r -\varepsilon -r_1) + \langle {\bar{\lambda }}, {\bar{y}}\rangle , \end{aligned}$$
(34)

where we used the definition of \({v}_{r}^\varGamma \) in the first equality, the definition of \(\sigma ^\varGamma \) in the second equality, and (33) in the inequality. This implies that, for \(r>\varepsilon +r_1\), we obtain the inequality in the definition of an augmented Lagrange multiplier of \((P^\varGamma )\), for every \({\bar{y}}\in \tau B_{{\mathbb {R}}^m}\). Now we need to establish this inequality for \({\bar{y}}\not \in \tau B_{{\mathbb {R}}^m}\). Recalling that \(v^\varGamma (0)=\text {val}(P^\varGamma )\), and using also (28), we can write

$$\begin{aligned} v_{r}^\varGamma ({\bar{y}})-v^\varGamma (0)-\langle {\bar{\lambda }},{\bar{y}}\rangle&={v}^\varGamma ({\bar{y}})+{r}\sigma ^\varGamma ({\bar{y}})-v^\varGamma (0)-\langle {\bar{\lambda }},{\bar{y}}\rangle \nonumber \\&= {v}^\varGamma ({\bar{y}})+{r}\Vert {\bar{y}}\Vert -v^\varGamma (0)-\langle {\bar{\lambda }},{\bar{y}}\rangle \nonumber \\&\ge \alpha _0 -\text {val }(P^\varGamma )+({r}-\Vert {\bar{\lambda }}\Vert )\Vert {\bar{y}}\Vert \nonumber \\&\ge \alpha _0 -\text {val }(P^\varGamma )+({r}-\Vert {\bar{\lambda }}\Vert ) \tau , \end{aligned}$$
(35)

where the first inequality comes from (28) together with \(\langle {\bar{\lambda }},{\bar{y}}\rangle \le \Vert {\bar{\lambda }}\Vert \,\Vert {\bar{y}}\Vert \), and the last inequality follows from choosing \(r>\Vert {\bar{\lambda }}\Vert \) and the fact that \({\bar{y}}\not \in \tau B_{{\mathbb {R}}^m}\). Given these two inequalities, we can choose

$$\begin{aligned} r^*> \max \left\{ r_1+\varepsilon , \Vert {\bar{\lambda }}\Vert +\frac{\text {val }(P^\varGamma )-\alpha _0}{\tau }\right\} >0. \end{aligned}$$
(36)

With this choice, (34), (35) and (36) yield

$$\begin{aligned} v_{r^*}^\varGamma (y)\ge v^\varGamma (0)+\langle {\bar{\lambda }}, y\rangle , \quad \forall y\in {\mathbb {R}}^m. \end{aligned}$$
(37)

By (8), \({\bar{\lambda }}\) is an augmented Lagrange multiplier of \((P^\varGamma )\), with penalty parameter \(r^*\).

(ii) Since \(\text {val }\text {(SIP)}=\text {val }(P^\varGamma )\), we have that \(v_{r^*}^{\varOmega }(0)=v_{r^*}^\varGamma (0)=v^\varGamma (0)=\text {val }(\text {SIP})\). Fix an arbitrary \( y\in {\mathbb {R}}^{(\varOmega )}\) and take its restriction \(y^\varGamma \) to \(\varGamma \), namely set

$$\begin{aligned} (y(\omega _1),\dots ,y(\omega _m))^T=:y^\varGamma . \end{aligned}$$

By Remark 2.2 and (7) we have \(v_{r^*}^{\varOmega }(y)\ge v_{r^*}^{\varGamma }(y^\varGamma )\). Since \(\varGamma =\text {supp}(\lambda ^*)\), we also have \(\langle {\bar{\lambda }}, y^\varGamma \rangle =\langle \lambda ^*, y\rangle \). So, it follows from (37) applied to \(y^\varGamma \) that

$$\begin{aligned} v_{r^*}^{\varOmega }(y)\ge v_{r^*}^{\varGamma }(y^\varGamma )\ge v_{r^*}^{\varOmega }(0)+\langle \lambda ^*, y\rangle , \quad \forall y\in {\mathbb {R}}^{(\varOmega )}. \end{aligned}$$

Therefore, \(\lambda ^*\in {\mathbb {R}}^{(\varOmega )}\) is an augmented Lagrange multiplier of \((\text {SIP})\), with penalty parameter \(r^*\). \(\square \)

Remark 3.2

Note that the existence of an augmented Lagrange multiplier is a global property. The proof of Theorem 3.2 shows that Assumption 3.1 is essential to obtain the existence of an augmented Lagrange multiplier of \((\text {SIP})\). Indeed, by Assumption 3.1 and the continuity property of f, we obtain that (28) and (35) hold.

4 Second-order Conditions for Augmented Lagrange Multipliers

In this section, we investigate second-order conditions for the existence of an augmented Lagrange multiplier for the semi-infinite programming problem \((\text {SIP})\). Recall that for \(y\in {\mathbb {R}}^{\varOmega }\), the norm of the restriction \(y^\varGamma \) to a finite set \(\varGamma =\{\omega _1,\ldots ,\omega _m\}\) is defined as \(\Vert y^\varGamma \Vert =\sum _{\omega \in \varGamma }|y(\omega )|\), i.e., the \(\ell _1\)-norm in \({\mathbb {R}}^m\) of the vector \(y^\varGamma =(y(\omega _1),\ldots ,y(\omega _m))^T\). We will also use \(\Vert \cdot \Vert _{\infty }\) for denoting the infinity norm in \({\mathbb {R}}^m\). Namely, given \(z\in {\mathbb {R}}^m\), we write \(\Vert z\Vert _{\infty }:=\max _{j=1,\ldots ,m}|z_j|\). Next we state a crucial assumption we make on \(\sigma \).

Definition 4.1

Let \(\sigma :{\mathbb {R}}^{(\varOmega )}\rightarrow {{\mathbb {R}}_+}\) be a real-valued function and \(\varGamma \) be a finite subset of \(\varOmega \).

  1. (a)

    We say that \(\sigma \) has a valley at 0 with respect to \(\varGamma \) in \({\mathbb {R}}^{\varOmega }\) if \(\sigma \) is continuous at 0 with \(\sigma (0)=0\), and for every \(\delta >0\) we have \(c_\delta =\inf _{y\in {\mathbb {R}}^{\varOmega }, \Vert y^\varGamma \Vert \ge \delta } \sigma (y^\varGamma ) >0\).

  2. (b)

Fix \(p\in (0,1]\). We say that \(\sigma \) satisfies a p-growth condition at 0 with respect to \(\varGamma \) if there exists \(\rho _0>0\) such that

    $$\begin{aligned} \lim _{y\in {\mathbb {R}}^{\varOmega },y\rightarrow 0}\frac{\sigma (y^\varGamma )}{\Vert y^\varGamma \Vert ^p}=\rho _0>0. \end{aligned}$$
    (38)

Remark 4.1

Under condition (a) of Definition 4.1, there exists \(\alpha >0\) such that the level set

$$\begin{aligned} V_{\alpha }:=\{z\in {\mathbb {R}}^m\,:\, \sigma ^\varGamma (z)\le \alpha \}, \end{aligned}$$

is bounded. Indeed, if this is not true, then for all \(\alpha >0\) the set \(V_{\alpha }\) is unbounded. Take \(0<\alpha <c_1\), where \(c_1>0\) is as in Definition 4.1(a) for \(\delta =1\). Since \(V_{\alpha }\) is unbounded there exists \(z\in V_{\alpha }\) such that \(\Vert z\Vert >1\). Altogether, we have

$$\begin{aligned} \alpha \ge \sigma ^\varGamma (z)\ge \inf _{y\in {\mathbb {R}}^{\varOmega }, \Vert y^\varGamma \Vert \ge 1} \sigma (y^\varGamma )=c_1, \end{aligned}$$

contradicting the choice of \(\alpha \). Hence, the claim is true and there exists \(\alpha _0>0\) such that \(V_{\alpha _0}\) is bounded. For future use, denote by \(M_{\alpha _0}\) the positive constant such that

$$\begin{aligned} V_{\alpha _0}\subset B[0,M_{\alpha _0}]\subset {\mathbb {R}}^m, \end{aligned}$$

where \(B[0,M_{\alpha _0}]\) is the ball centered at 0 with radius \(M_{\alpha _0}\).

The properties described in Definition 4.1 are used for establishing a lower bound on the augmented Lagrangian. The next technical lemma shows why this assumption is instrumental in establishing the lower bound.

Lemma 4.1

Fix a pair \((\hat{x},\hat{\lambda })\in {\mathbb {R}}^n\times {\mathbb {R}}_+^{(\varOmega )}\) and denote by \(\varGamma =\text {supp}(\hat{\lambda })= \{\omega _1, \dots , \omega _m\}\). Assume that

  1. (i)

    For every \(\omega \in \varGamma \), the function \( g(\cdot ,\omega ):{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) is continuous at \(\hat{x}\), and

  2. (ii)

\(\sigma :{\mathbb {R}}^{(\varOmega )}\rightarrow {{\mathbb {R}}_+}\) has a valley at 0 with respect to \(\varGamma \), and it satisfies the p-growth condition at 0 with respect to \(\varGamma \).

In this situation, there exists an open neighborhood \(N(\hat{x})\) of \(\hat{x}\) and \(r'>0\) such that for every \(x\in N(\hat{x})\) and every \(r>r'\) we have

$$\begin{aligned} \inf _{u\ge 0}\left\{ \sum _{j=1}^m \hat{\lambda }(\omega _j) \left( g(x,\omega _j)+u_j\right) +r\sigma (-g(x,\omega _j)-u_j)\right\} \ge 0. \end{aligned}$$

Proof

The statement of the lemma is clearly true when \(\hat{\lambda }=0\), so it is enough to prove the lemma for \(\hat{\lambda }\not =0\). As in (9), define \(\gamma (x,u)\in {\mathbb {R}}^m\) as \([\gamma (x,u)]_j:=-g(x,\omega _j)-u_j\) for all \(j=1,\ldots ,m.\) With this notation, we have that \([\gamma (x,u)]_j\le -g(x,\omega _j)\) for all \(u\ge 0\). The infimum in the statement of the lemma thus rewrites as

$$\begin{aligned} \inf _{u\ge 0}\left\{ \sum _{j=1}^m \hat{\lambda }(\omega _j) ([-\gamma (x,u)]_j) +r\sigma (\gamma (x,u))\right\} . \end{aligned}$$
(39)

Fix \(r_0>0\). The continuity of g implies that there exists a constant \(C_1>0\) such that

$$\begin{aligned} \max _{x\in B[\hat{x},r_0]} \left\{ \sum _{j=1}^m |g(x,\omega _j)|\right\} \le C_1. \end{aligned}$$

This allows us to find a lower bound for the first term in the argument of (39). Indeed,

$$\begin{aligned} {\sum }_{j=1}^m \hat{\lambda }(\omega _j) ([-\gamma (x,u)]_j)\ge & {} {\sum }_{j=1}^m \hat{\lambda }(\omega _j) g(x,\omega _j)\nonumber \\\ge & {} -\Vert \hat{\lambda }\Vert _{\infty }\, {\sum }_{j=1}^m |g(x,\omega _j)|\nonumber \\\ge & {} -C_1\Vert \hat{\lambda }\Vert _{\infty }\,, \end{aligned}$$
(40)

where we used the fact that \(u_j\ge 0\) in the first inequality. The p-growth condition assumed in part (ii) implies that

$$\begin{aligned} \sup _{r>0} \,\inf _{0<\Vert z\Vert <r} \frac{\sigma (z)}{\Vert z\Vert ^p}=\rho _0>\rho _0/2>0. \end{aligned}$$

Therefore, there exists \(r_1>0\) such that

$$\begin{aligned} \frac{\sigma (z)}{\Vert z\Vert ^p}>\rho _0/2, \end{aligned}$$

for every \(z\in B(0,r_1)\subset {\mathbb {R}}^m\). Take \(r_2<\min \{1,r_1\}\) This yields

$$\begin{aligned} \sigma (z)\ge (\rho _0/2) \Vert z\Vert ^p\ge (\rho _0/2) \Vert z\Vert , \end{aligned}$$
(41)

for every \(z\in B(0,r_2)\), where we used the facts that \(\Vert z\Vert <r_2\le 1\) and \(p\in (0,1]\) in the last inequality.

Call \(A(r,x,u):=\sum _{j=1}^m \hat{\lambda }(\omega _j) ([-\gamma (x,u)]_j) +r\sigma (\gamma (x,u))\), the expression between the curly brackets in (39). We will show that

$$\begin{aligned} \inf _{u\ge 0,\,x\in B(\hat{x},r_0)}A(r,x,u)\ge 0, \end{aligned}$$

for large enough \(r>0\). We consider first the case in which \(u\ge 0\) is such that \(\Vert \gamma (x,u)\Vert <r_2\). In this case, using (41) for \(z:=\gamma (x,u)\), we obtain

$$\begin{aligned} A(r,x,u)= & {} {\sum }_{j=1}^m \hat{\lambda }(\omega _j) ([-\gamma (x,u)]_j) +r\sigma (\gamma (x,u))\\\ge & {} -\Vert \hat{\lambda }\Vert _{\infty } \Vert \gamma (x,u)\Vert +r\sigma (\gamma (x,u))\\\ge & {} -\Vert \hat{\lambda }\Vert _{\infty } \Vert \gamma (x,u)\Vert +r\, (\rho _0/2) \Vert \gamma (x,u)\Vert \\= & {} \Vert \gamma (x,u)\Vert \left( r\, (\rho _0/2)-\Vert \hat{\lambda }\Vert _{\infty }\right) , \end{aligned}$$

which is nonnegative for \(r> \frac{2 \Vert \hat{\lambda }\Vert _{\infty }}{\rho _0} =: r_3\). Hence, for all \(x\in B(\hat{x},r_0)\) we can write

$$\begin{aligned} \inf _{r>r_3}\{ A(r,x,u)\,:\, u\ge 0 \text{ and } \Vert \gamma (x,u)\Vert <r_2\}\ge 0. \end{aligned}$$
(42)

To complete the proof, we consider now those \(u\ge 0\) such that \(\Vert \gamma (x,u)\Vert \ge r_2\). Using the valley at 0 property and (40), we have

$$\begin{aligned} A(r,x,u)= & {} {\sum }_{j=1}^m \hat{\lambda }(\omega _j) ([-\gamma (x,u)]_j) +r\sigma (\gamma (x,u))\\\ge & {} -C_1\Vert \hat{\lambda }\Vert _{\infty }\, +r\sigma (\gamma (x,u))\\\ge & {} -C_1\Vert \hat{\lambda }\Vert _{\infty }\, +r\,c_{r_2}, \end{aligned}$$

which is positive when \(r>\frac{C_1\Vert \hat{\lambda }\Vert _{\infty }}{c_{r_2}}=:r_4\). Hence, for all \(x\in B(\hat{x},r_0)\) we have

$$\begin{aligned} \inf _{r>r_4}\{ A(r,x,u)\,:\, u\ge 0 \text{ and } \Vert \gamma (x,u)\Vert \ge r_2\}\ge 0. \end{aligned}$$
(43)

From (43) and (42) we deduce that, for \(r>\max \{r_3,r_4\}=:r'\), the statement of the lemma is true for \(N(\hat{x}):= B(\hat{x},r_0)\). \(\square \)
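As a numerical sanity check of Lemma 4.1 (a toy sketch, not part of the proof), take \(\sigma \) to be the \(\ell _1\)-norm, so that \(\rho _0=1\) and \(c_\delta =\delta \). The expression in (39) then separates across indices as \(\hat{\lambda }_j t_j+r|t_j|\) with \(t_j=g(x,\omega _j)+u_j\ge g(x,\omega _j)\), and its infimum over \(u\ge 0\) is nonnegative once \(r\ge \max _j \hat{\lambda }_j\), but can be negative for smaller r:

```python
def A(r, lam, g, u):
    """Expression (39) with sigma the l1-norm: the sum over j of
    lam_j*(g_j + u_j) + r*|g_j + u_j|."""
    return sum(l * (gj + uj) + r * abs(gj + uj)
               for l, gj, uj in zip(lam, g, u))

def inf_A(r, lam, g):
    """Infimum of A(r, ., u) over u >= 0.  It separates per coordinate:
    minimize lam_j*t + r*|t| over t >= g_j (with t = g_j + u_j); the
    piecewise-linear minimand is minimized at t = g_j or t = 0."""
    total = 0.0
    for l, gj in zip(lam, g):
        cands = [t for t in (gj, 0.0) if t >= gj]
        total += min(l * t + r * abs(t) for t in cands)
    return total

lam, g = [2.0, 0.5], [-1.0, -0.3]  # hat-lambda_j >= 0 and g(x, omega_j)
print(inf_A(3.0, lam, g))  # 0.0: nonnegative once r >= max(lam)
print(inf_A(1.0, lam, g))  # -1.0: can fail for r below the threshold
```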

The next result will be used to show that the Lagrangian \(L^\varGamma \) is lsc when \(\sigma \) has a valley at zero with respect to \(\varGamma \).

Lemma 4.2

Fix a pair \((\hat{x},\hat{\lambda })\in {\mathbb {R}}^n\times {\mathbb {R}}_+^{(\varOmega )}\), and denote by \(\varGamma =\text {supp}(\hat{\lambda })= \{\omega _1, \dots , \omega _m\}\). Assume that

  1. (i)

    \( g(\cdot ,\omega ), \omega \in \varGamma \), is real-valued and continuous,

  2. (ii)

    \(\sigma :{\mathbb {R}}^{(\varOmega )}\rightarrow {{\mathbb {R}}_+}\) is lsc and it has a valley at 0 with respect to \(\varGamma \).

For a fixed \(r>0\), define \(q(\cdot ,r):{\mathbb {R}}^n\rightarrow {\mathbb {R}}_{\pm \infty }\) as

$$\begin{aligned} q(x,r):=\inf _{u\ge 0}\left\{ \sum _{j=1}^m \hat{\lambda }(\omega _j) ([-\gamma (x,u)]_j)+r\sigma (\gamma (x,u))\right\} . \end{aligned}$$

Then, for \(r>0\) large enough, the function \(q(\cdot ,r)\) is lsc in \({\mathbb {R}}^n\).

Proof

Fix \(r>0\). Set \(Q(r,x,u):=\sum _{j=1}^m \hat{\lambda }(\omega _j) ([-\gamma (x,u)]_j) +r\sigma (\gamma (x,u)) +\delta _{{\mathbb {R}}_+^m}(u)\). Then, we can write

$$\begin{aligned} q(x,r)=\inf _{u\in {\mathbb {R}}^m} Q(r,x,u). \end{aligned}$$

Theorem 1.17 in [15] states that \(q(\cdot ,r)\) is lsc if (a) \(Q(r,\cdot ,\cdot )\) is lsc, and (b) \(Q(r,\cdot ,\cdot )\) is level-bounded in u locally uniformly in x. Condition (b) can be re-stated as follows. For every \(\alpha \in {\mathbb {R}}\) there exists a ball \(B(\hat{x},r_0)\) such that the set

$$\begin{aligned} W_{\alpha }(r):=\{(x,u)\,:\, x\in B(\hat{x},r_0),\, Q(r,x,u)\le \alpha \}\quad \text {is bounded in }{\mathbb {R}}^n\times {\mathbb {R}}^m. \end{aligned}$$
(44)

Since the \({\mathbb {R}}^n\)-coordinate of \(W_{\alpha }(r)\) is bounded by definition, it is enough to show that the \({\mathbb {R}}^m\)-coordinate of \(W_{\alpha }(r)\) is bounded. We start by noting that \(Q(r,\cdot ,\cdot )\) is lsc, being the sum of lower semicontinuous functions. Indeed, the continuity assumption (i) on g, together with the lower semicontinuity of \(\sigma \) assumed in (ii), implies that \(\sigma \circ \gamma (\cdot ,\cdot )\) is lsc. The first term in the expression of \(Q(r,\cdot ,\cdot )\) is continuous in (x, u), by the continuity of g, and hence lsc. The third term in the expression of \(Q(r,\cdot ,\cdot )\) is \(\delta _{{\mathbb {R}}_+^m}\), which is lsc because the set \({\mathbb {R}}_+^m\) is closed. This proves that \(Q(r,\cdot ,\cdot )\) is lsc for every \(r>0\).

Let us show now that, for \(r>0\) large enough, \(Q(r,\cdot ,\cdot )\) is level-bounded in u locally uniformly in x. Namely, we will prove that (44) holds for \(r_0=1\) and for \(r>0\) large enough. The continuity of g implies that there exists \(C_1>0\) such that

$$\begin{aligned} \max _{x\in B[\hat{x},1]}\,\, \left\{ \sum _{j=1}^m |g(x,\omega _j)|\right\} \le C_1. \end{aligned}$$

Note that, for every fixed \(\omega _{j_0}\) we have

$$\begin{aligned} C_1\ge \sum _{j=1}^m |g(x,\omega _j)|\ge |g(x,\omega _{j_0})|\ge -g(x,\omega _{j_0}). \end{aligned}$$
(45)

Assume that \((x,u)\in W_{\alpha }(r)\) (and hence \(x\in B[\hat{x},1]\)), then we must have \(u\ge 0\) and we can write a lower bound for the first term in the expression of Q as follows:

$$\begin{aligned} {\sum }_{j=1}^m \hat{\lambda }(\omega _j) ([-\gamma (x,u)]_j)= & {} {\sum }_{j=1}^m \hat{\lambda }(\omega _j) (g(x,\omega _j)+u_j)\\\ge & {} {\sum }_{j=1}^m \hat{\lambda }(\omega _j) g(x,\omega _j)\\\ge & {} -C_1\Vert \hat{\lambda }\Vert , \end{aligned}$$

where we used the fact that \(u_j\ge 0\) in the first inequality and (45) in the last one. Altogether, we can write, for every \((x,u)\in W_{\alpha }(r)=\{(x,u)\,:\, x\in B(\hat{x},1),\, Q(r,x,u)\le \alpha \}\),

$$\begin{aligned} \alpha \ge Q(r,x,u)= & {} {\sum }_{j=1}^m \hat{\lambda }(\omega _j) ([-\gamma (x,u)]_j) +r\sigma ^\varGamma (\gamma (x,u))\\\ge & {} -C_1\Vert \hat{\lambda }\Vert +r\sigma ^\varGamma (\gamma (x,u)). \end{aligned}$$

Therefore, we deduce that

$$\begin{aligned} W_{\alpha }(r)\subset \{(x,u)\,:\, \sigma ^\varGamma (\gamma (x,u))\le \frac{\alpha +C_1\Vert \hat{\lambda }\Vert }{r}\}. \end{aligned}$$

By Remark 4.1 there exists \(\alpha _0\) such that the set

$$\begin{aligned} V_{\alpha _0}:=\{z\in {\mathbb {R}}^m\,:\, \sigma ^\varGamma (z)\le \alpha _0\}, \end{aligned}$$

is bounded. Take \(r>0\) such that \(\frac{\alpha +C_1\Vert \hat{\lambda }\Vert }{r}\le \alpha _0\). For this choice of r, we have that, if \((x,u)\in W_{\alpha }(r)\), then \(\gamma (x,u)\in V_{\alpha _0}\).

Let \(M_{\alpha _0}\) be as in Remark 4.1. We can thus write, for every u such that \((x,u)\in W_{\alpha }(r)\),

$$\begin{aligned} \Vert u\Vert= & {} {\sum }_{j=1}^m |u_j|= {\sum }_{j=1}^m |-g(x,\omega _j)+g(x,\omega _j)+u_j|\\\le & {} {\sum }_{j=1}^m |-g(x,\omega _j)|+ |g(x,\omega _j)+u_j|\le C_1 +\Vert \gamma (x,u)\Vert \le C_1+ M_{\alpha _0}, \end{aligned}$$

showing that, for this choice of r, we have

$$\begin{aligned} W_{\alpha }(r)\subset B[\hat{x},1]\times B[0, C_1+ M_{\alpha _0}]. \end{aligned}$$

Hence, (44) holds for \(r>0\) large enough. By Theorem 1.17 in [15], the function \(q(\cdot ,r)\) is lsc for \(r>0\) large enough. \(\square \)

This lemma readily gives the lower semicontinuity of \(L^\varGamma (\cdot ,\lambda ,r)\) for large enough r.

Proposition 4.1

Fix a pair \((\hat{x},\hat{\lambda })\in {\mathbb {R}}^n\times {\mathbb {R}}_+^{(\varOmega )}\), and denote by \(\varGamma =\text {supp}(\hat{\lambda })= \{\omega _1, \dots , \omega _m\}\). Assume that

  1. (i)

    \(f(\cdot )\) is real-valued and lsc and \( g(\cdot ,\omega ), \omega \in \varGamma \), is real-valued and continuous,

  2. (ii)

    \(\sigma :{\mathbb {R}}^{(\varOmega )}\rightarrow {{\mathbb {R}}_+}\) is lsc and it has a valley at 0 with respect to \(\varGamma \).

In this situation, \(L^\varGamma (\cdot ,\lambda ,r)\) is lsc for \(r>0\) large enough.

Proof

By definition, we know that

$$\begin{aligned} L^\varGamma (x,\hat{\lambda },r)= & {} f(x)+{\inf }_{u\ge 0}\left\{ {\sum }_{j=1}^{m} \hat{\lambda }(\omega _j)[-\gamma (x,u)]_j+r\sigma (\gamma (x,u))\right\} \\= & {} f(x)+q(x,r), \end{aligned}$$

where we are using the notation of Lemma 4.2. By Assumption (i), the first term is lsc. Since we are under the conditions of Lemma 4.2, the second term is lsc for r large enough. Hence, the whole expression is lsc for r large enough. \(\square \)
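As a numerical sanity check of the inner infimum appearing above (purely illustrative, not part of the formal development), the following sketch takes a single support point (\(m=1\)), the sharp augmenting function \(\sigma (y)=|y|\), a hypothetical multiplier value \(\lambda \), and substitutes \(w=g(x,\omega _1)+u\), \(u\ge 0\). For \(r>\lambda \), a brute-force grid minimization of the inner infimum agrees with the closed form \((\lambda +r)\max \{g(x,\omega _1),0\}\), exhibiting the exact-penalty behavior of the sharp Lagrangian.

```python
import numpy as np

def q_grid(g, lam, r, w_max=10.0, n=200001):
    # brute-force inner infimum over w = g + u with u >= 0, i.e. w in [g, w_max];
    # the inner term is lam*w + r*|w| after the substitution w = g + u
    w = np.linspace(g, w_max, n)
    return float(np.min(lam * w + r * np.abs(w)))

def q_closed(g, lam, r):
    # closed form of the infimum, valid for r > lam >= 0
    return (lam + r) * max(g, 0.0)

lam, r = 2.0, 5.0          # hypothetical multiplier and penalty parameter, r > lam
for g in (-0.5, 0.0, 1.0):  # feasible, boundary, and infeasible constraint values
    assert abs(q_grid(g, lam, r) - q_closed(g, lam, r)) < 1e-3
```

The coordinate-wise decoupling used here is special to \(m=1\) (or to the 1-norm); for a general norm the infimum over u does not separate.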

Remark 4.2

With the notation of Definition 4.1, the sharp augmenting function \(\sigma ^\varGamma (y):=\Vert y^\varGamma \Vert \) used in Sect. 3 has a valley at 0 with \(c_{\delta }=\delta \), and it satisfies a p-growth condition at 0 with respect to \(\varGamma \) for \(p=1\). More generally, the augmenting function \(\sigma _p^\varGamma (y):=\Vert y^\varGamma \Vert ^p\) for \(p\in (0,1]\) has a valley at 0 with \(c_{\delta }=\delta ^p\), and it satisfies a p-growth condition at 0.
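The valley at 0 and p-growth properties of \(\sigma _p^\varGamma \) in Remark 4.2 are easy to check numerically. The sketch below is illustrative only: it uses \({\mathbb {R}}^3\) as a stand-in for the finitely supported space and assumes the Euclidean norm. It verifies that \(\sigma _p\) stays above \(c_\delta =\delta ^p\) whenever \(\Vert y\Vert \ge \delta \), and that the ratio \(\sigma _p(y)/\Vert y\Vert ^p\) is identically 1, so the growth constant is \(\rho _0=1\).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma_p(y, p):
    # augmenting function sigma_p(y) = ||y||^p
    return np.linalg.norm(y) ** p

# valley at 0: on ||y|| >= delta, sigma_p(y) = ||y||^p >= delta**p = c_delta
p, delta = 0.5, 0.1
dirs = rng.normal(size=(10000, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)      # unit directions
radii = delta + rng.uniform(0.0, 5.0, size=10000)        # radii >= delta
vals = np.array([sigma_p(r * d, p) for r, d in zip(radii, dirs)])
assert vals.min() >= delta ** p - 1e-12

# p-growth at 0: sigma_p(y) / ||y||^p = 1 for every y != 0, even very close to 0
y = rng.normal(size=3) * 1e-6
assert abs(sigma_p(y, p) / np.linalg.norm(y) ** p - 1.0) < 1e-9
```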

The following lemma uses an augmenting function as in Definition 4.1 (a), (b) and a second-order sufficient condition, see [32].

Lemma 4.3

Let \(f(\cdot )\) and \( g(\cdot ,\omega ), \omega \in \varOmega \), be real-valued twice continuously differentiable functions, let \(x^*\in X_0\), \(\lambda ^*\in {\mathbb {R}}^{(\varOmega )}\) and \(\varGamma =\text {supp}(\lambda ^*)= \{\omega _1, \dots , \omega _m\}\), and let \(\sigma :{\mathbb {R}}^{(\varOmega )}\rightarrow {{\mathbb {R}}_+}\) be lsc, have a valley at 0 with respect to \(\varGamma \), and satisfy a p-growth condition at 0 with respect to \(\varGamma \).

Assume that the KKT conditions (12) and (13) for (SIP) are satisfied, and that the second-order sufficient condition given by

$$\begin{aligned} z^T\nabla _{xx}^2 L_0^\varGamma (x^*,\lambda ^*)z> 0,\ \forall z\in V(x^*), \end{aligned}$$
(46)

holds, where \({V}(x^*)\) is defined by

$$\begin{aligned} V(x^*):= \left\{ 0 \not = d\in {\mathbb {R}}^n\,\Bigg |\, \begin{array}{l} \nabla f(x^*)^T d \le 0,\\ \nabla _x g(x^*,\omega )^T d \le 0, \quad \forall ~\omega \in I(x^*)\cap \varGamma \end{array} \right\} . \end{aligned}$$
(47)

Then there exist \(r^*>0\) and a neighborhood \(N(x^*)\) of \(x^*\) such that

$$\begin{aligned} L^\varOmega (x, \lambda ^*, r^*)\ge L^\varOmega (x^*, \lambda ^*, r^*), \quad \forall x\in N(x^*). \end{aligned}$$
(48)

Proof

By contradiction, suppose that there exist two sequences \(\{r_k\}\) and \(\{x_k\}\) with \(r_k>0\), \(x_k\not =x^*\), such that \(r_k\rightarrow \infty \), \(x_k\rightarrow x^*\), as \(k \rightarrow \infty \) and

$$\begin{aligned} L^\varOmega (x_k,\lambda ^*,r_k)< L^\varOmega (x^*,\lambda ^*,r_k). \end{aligned}$$
(49)

We have

$$\begin{aligned} L_0^\varGamma (x_k,\lambda ^*)&=f(x_k)+\sum _{j=1}^m \lambda ^*(\omega _j)g(x_k, \omega _j)\le L^\varGamma (x_k,\lambda ^*,r_k)\nonumber \\&= L^\varOmega (x_k,\lambda ^*,r_k)< f(x^*), \end{aligned}$$
(50)

where the first inequality follows from (11) and the nonnegativity of \(\lambda ^*(\omega )\) (\(\forall \omega \in \varOmega )\) (since (13) holds), the second equality is due to (10) and the last inequality comes from (49) and (14). Define \(v_k:=x_k-x^*\) and \(s_k:=\frac{v_k}{\Vert v_k\Vert }\). By compactness of the unit sphere, there exists a subsequence of \(\{s_{k}\}\) converging to a vector s with \(\Vert s\Vert =1\). Without loss of generality, suppose that \(s=\lim _{k\rightarrow \infty }s_k\). We will show first that \(s\in V(x^*)\). We can write

$$\begin{aligned} 0 >\frac{L^\varOmega (x_k,\lambda ^*,r_k)- L^\varOmega (x^*,\lambda ^*,r_k)}{\Vert v_k\Vert } \ge \frac{f(x_k)-f(x^*)}{\Vert v_k\Vert }+\frac{\inf _{u\ge 0} A(r_k,x_k,u)}{\Vert v_k\Vert }, \end{aligned}$$
(51)

where \(A(r,x,u)\) is as in Lemma 4.1. Since we are under the conditions of Lemma 4.1, there exist \(r',r_0>0\) such that the last term is nonnegative for \(r_k>r'\) and \(x_k\in B(x^*,r_0)\). Since \(r_k\) tends to infinity and \(x_k\) tends to \(x^*\), we can find \(k_0\) such that \(r_k>r'\) and \(x_k\in B(x^*,r_0)\) for \(k\ge k_0\). We assume from now on that \(k\ge k_0\). Using this fact and the Taylor expansion of f, we can write

$$\begin{aligned} 0&> \frac{f(x_k)-f(x^*)}{\Vert v_k\Vert }+\frac{\inf _{u\ge 0}A(r_k,x_k,u)}{\Vert v_k\Vert }\ge \frac{f(x_k)-f(x^*)}{\Vert v_k\Vert }\\&= \nabla f(x^*)^Ts_k +\frac{o(\Vert v_k\Vert )}{\Vert v_k\Vert }, \end{aligned}$$

where \(\lim _k \frac{o(\Vert v_k\Vert )}{\Vert v_k\Vert }=0\). Taking limit yields

$$\begin{aligned} \nabla f(x^*)^Ts\le 0, \end{aligned}$$
(52)

which is one of the inequalities defining \(V(x^*)\). To show that the remaining inequalities in \(V(x^*)\) hold for s, write again

$$\begin{aligned} 0&> \frac{f(x_k)-f(x^*)}{\Vert v_k\Vert }+\frac{\inf _{u\ge 0}A(r_k,x_k,u)}{\Vert v_k\Vert }\\&=\nabla _x L_0^\varGamma (x^*,\lambda ^*)^T s_k+\frac{o(\Vert v_k\Vert )}{\Vert v_k\Vert }+ r_k\frac{\inf _{u\ge 0}\sigma (\gamma (x_k,u))}{\Vert v_k\Vert }\\&=\frac{o(\Vert v_k\Vert )}{\Vert v_k\Vert }+r_k\frac{\inf _{u\ge 0}\sigma (\gamma (x_k,u))}{\Vert v_k\Vert }, \end{aligned}$$

where the first inequality follows from (49), the first equality follows from the definitions of \(A(r_k,x_k,u)\) and \(L_0^\varGamma \), and the second equality comes from (12) in Lemma 2.2. Thus, we have

$$\begin{aligned} \lim _{k\rightarrow \infty } \frac{\inf _{u\ge 0}\sigma (\gamma (x_k,u))}{\Vert v_k\Vert }= 0, \end{aligned}$$

since \(r_k\rightarrow \infty \) as \( k\rightarrow \infty \). Therefore, for each k there exists \(u^k\in {\mathbb {R}}_+^m\) such that

$$\begin{aligned} \lim _{k\rightarrow \infty } \frac{\sigma (\gamma (x_k,u^k))}{\Vert v_k\Vert }= 0, \end{aligned}$$
(53)

and then

$$\begin{aligned} \lim _{k\rightarrow \infty }\sigma (\gamma (x_k,u^k))= 0. \end{aligned}$$

Since \(\sigma \) has a valley at 0, this can only happen if

$$\begin{aligned} \lim _{k\rightarrow \infty }\Vert \gamma (x_k,u^k)\Vert = 0. \end{aligned}$$

The p-growth condition now implies that

$$\begin{aligned} \lim _{k\rightarrow \infty }\frac{\sigma (\gamma (x_k,u^k))}{\Vert \gamma (x_k,u^k)\Vert ^p}= \rho _0. \end{aligned}$$
(54)

Note that \(0<p\le 1\) and \(|g(x_k, \omega _j)+u_j^k|\ge g^+(x_k, \omega _j)\) for all \(\omega _j\in \varGamma (x^*)\). Proceeding as in the proof of Lemma 4.1, and taking k large enough so that \(\Vert \gamma (x_k,u^k)\Vert <1\), we obtain

$$\begin{aligned} \sum _{\omega _j\in \varGamma (x^*)} g^+(x_k, \omega _j) \le \Vert \gamma (x_k,u^k)\Vert \le \Vert \gamma (x_k,u^k)\Vert ^p. \end{aligned}$$
(55)

From (54) we can take also k large enough so that \(\frac{\sigma (\gamma (x_k,u^k))}{\Vert \gamma (x_k,u^k)\Vert ^p}>\rho _0/2\). Combine this with (55) to write

$$\begin{aligned} 0\le \frac{\sum _{\omega _j\in \varGamma (x^*)} g^+(x_k, \omega _j)}{\Vert v_k\Vert }\le \frac{\Vert \gamma (x_k,u^k)\Vert ^p}{\Vert v_k\Vert }\le \frac{2}{\rho _0} \frac{\sigma (\gamma (x_k,u^k))}{\Vert v_k\Vert } . \end{aligned}$$

Using the above expression and (53), we deduce that

$$\begin{aligned} \lim _{k\rightarrow \infty }\frac{\sum _{\omega _j\in \varGamma (x^*)} g^+(x_k, \omega _j)}{\Vert v_k\Vert }=0. \end{aligned}$$
(56)

Since the functions \(g(\cdot ,\omega _j),\,\omega _j\in \varGamma (x^*),\) are continuously differentiable at \(x^*\), and \(g(x^*,\omega _j)=0\) for all \(\omega _j\in \varGamma (x^*)\), we use the Taylor expansion of \(g(\cdot ,\omega _j)\) to write

$$\begin{aligned} \frac{g^+(x_k,\omega _j)}{\Vert v_k\Vert }&=\max \left\{ 0,\frac{g(x_k,\omega _j)}{\Vert v_k\Vert }\right\} \nonumber \\&=\max \left\{ 0,\nabla _x g(x^*,\omega _j)^T s_k+\dfrac{o(\Vert v_k\Vert )}{\Vert v_k\Vert }\right\} \end{aligned}$$
(57)

for all \(\omega _j\in \varGamma (x^*)\). Since each term in the sum of (56) is nonnegative, we deduce that

$$\begin{aligned} \lim _{k\rightarrow \infty }\frac{g^+(x_k, \omega _j)}{\Vert v_k\Vert }=0. \end{aligned}$$

for all \(\omega _j\in \varGamma (x^*)\). Use this fact in (57) to obtain

$$\begin{aligned} \displaystyle 0=\lim _{k\rightarrow +\infty } \max \left\{ 0,\nabla _x g(x^*,\omega _j)^\top s_k+\dfrac{o(\Vert v_k\Vert )}{\Vert v_k\Vert }\right\} =\max \left\{ 0,\nabla _x g(x^*,\omega _j)^\top s\right\} , \end{aligned}$$

for all \(\omega _j\in \varGamma (x^*)\). This readily gives

$$\begin{aligned} \nabla _x g(x^*,\omega _j)^\top s \le 0, \end{aligned}$$

for all \(\omega _j\in \varGamma (x^*)\). Combining this fact and (52), we conclude that \(s\in V(x^*)\). The Taylor expansion of \(L_0^\varGamma \) and the first-order conditions yield

$$\begin{aligned} \begin{aligned}&f(x_k)-f(x^*)+\sum _{j=1}^{m}\lambda ^*(\omega _j) g(x_k,\omega _j)\\&=f(x_k)-f(x^*)+\sum _{j=1}^{m}\big [\lambda ^*(\omega _j) g(x_k,\omega _j)-\lambda ^*(\omega _j) g(x^*,\omega _j)\big ]\\&=\frac{1}{2}s_k^T \big [ \nabla ^2f(x^*)+\sum _{j=1}^{m}\lambda ^*(\omega _j) \nabla _{xx}^2 g(x^*,\omega _j)\big ] s_k \Vert v_k\Vert ^2+o(\Vert v_k\Vert ^2)\\&= \Vert v_k\Vert ^2 \left( \frac{1}{2}s_k^T \big [ \nabla ^2f(x^*)+\sum _{j=1}^{m}\lambda ^*(\omega _j) \nabla _{xx}^2 g(x^*,\omega _j)\big ] s_k + \frac{o(\Vert v_k\Vert ^2)}{\Vert v_k\Vert ^2} \right) \\&= \Vert v_k\Vert ^2 \left( \frac{1}{2}s_k^T \nabla ^2_{xx} L_0^\varGamma (x^*,\lambda ^*) s_k + \frac{o(\Vert v_k\Vert ^2)}{\Vert v_k\Vert ^2} \right) . \end{aligned} \end{aligned}$$

Assumption (46) and the fact that \(s\in V(x^*)\) imply that, for k large enough, the expression between parentheses is positive. Hence, we have that

$$\begin{aligned} f(x_k)-f(x^*)+\sum _{j=1}^{m}\lambda ^*(\omega _j) g(x_k,\omega _j)>0, \end{aligned}$$

contradicting (50) for large enough k. This implies that (49) cannot hold, and therefore (48) must hold. \(\square \)
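To see the cone (47) and condition (46) at work on a concrete instance, consider the following toy problem (entirely hypothetical, chosen only for illustration): \(f(x)=(x_1-1)^2+x_2^2\) with a single support point \(\omega _1\) and \(g(x,\omega _1)=x_1\le 0\). Then \(x^*=(0,0)\) with \(\lambda ^*(\omega _1)=2\) satisfies the KKT conditions, the cone \(V(x^*)\) collapses to directions with \(d_1=0\), and the second-order condition (46) holds on it:

```python
import numpy as np

# toy instance (illustrative only): f(x) = (x1-1)^2 + x2^2,
# one support point w1 with g(x, w1) = x1 <= 0.
# The KKT point is x* = (0, 0) with lambda*(w1) = 2.
grad_f = np.array([-2.0, 0.0])   # nabla f(x*)
grad_g = np.array([1.0, 0.0])    # nabla_x g(x*, w1)
lam = 2.0                        # lambda*(w1)
hess_L0 = np.diag([2.0, 2.0])    # nabla_xx^2 L_0^Gamma(x*, lambda*)

# KKT stationarity: nabla f(x*) + lambda*(w1) nabla_x g(x*, w1) = 0
assert np.allclose(grad_f + lam * grad_g, 0.0)

# cone (47): nabla f(x*)^T d <= 0 forces d1 >= 0, while
# nabla_x g(x*, w1)^T d <= 0 forces d1 <= 0, so V(x*) = {(0, d2) : d2 != 0};
# check condition (46) on its unit representatives
for d in (np.array([0.0, 1.0]), np.array([0.0, -1.0])):
    assert d @ grad_f <= 0 and d @ grad_g <= 0   # d lies in V(x*)
    assert d @ hess_L0 @ d > 0                   # second-order condition (46)
```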

The following result establishes existence of an augmented Lagrange multiplier for (SIP), under the conditions of Lemma 4.3 and some boundedness assumptions on f and \(g(\cdot ,\omega )\).

Theorem 4.1

Let \(x^*\) be the unique global optimal solution of \((P^\varGamma )\), and suppose that all conditions in Lemma 4.3 are satisfied. Moreover, if

  1. (i)

    f and \( g(x,\omega )\ (\omega \in \varGamma )\) are bounded below,

  2. (ii)

    there exists \(\mu _0>0\) such that the set below is bounded

    $$\begin{aligned} \varTheta :=\{x\in {\mathbb {R}}^n:g(x,\omega )\le \mu _0, \omega \in \varGamma \}, \end{aligned}$$
    (58)

    then \(\lambda ^*\) is an augmented Lagrange multiplier of (SIP).

Proof

By Condition (i), there exists \(\xi _0>-\infty \) such that

$$\begin{aligned} \inf _{x\in {\mathbb {R}}^n,\,\omega \in \varGamma } \{f(x), g(x,\omega )\}\ge \xi _0. \end{aligned}$$
(59)

By Lemma 2.2 (ii), it is enough to prove that there exists \(r_0^*>0\) such that (19) holds for every \(x\in {\mathbb {R}}^n\). We establish this fact in two steps, according to whether x belongs or not to the set \(\varTheta \) defined in (58). We use in this proof the notation of Lemma 4.1.

Step I In this step, we prove that there exists \(r_1^*>0\) such that (19) holds on \({\mathbb {R}}^n\backslash \varTheta \). If \(x\in {\mathbb {R}}^n \backslash \varTheta \), then \(g(x,\omega _{j_0}) >\mu _0\) for some \(\omega _{j_0}\in \varGamma \). Thus, for every \(u\ge 0\) we have

$$\begin{aligned} \Vert \gamma (x,u)\Vert \ge g(x,\omega _{j_0})+u_{j_0}> \mu _0. \end{aligned}$$

Since \(\sigma \) has a valley at 0, there exists \( c_{\mu _0}>0\) such that

$$\begin{aligned} \sigma (\gamma (x,u))\ge c_{\mu _0}. \end{aligned}$$
(60)

By (11), (59) and (60), we have

$$\begin{aligned} \begin{aligned} L^\varGamma (x,\lambda ^*,r)&= f(x)+\sum _{j=1}^m \lambda ^*(\omega _j) g(x,\omega _j)+\inf _{u\ge 0}\left\{ \sum _{j=1}^{m} \lambda ^*(\omega _j) u_j+r\sigma (\gamma (x,u))\right\} \\&\ge \xi _0\left( 1+\sum _{j=1}^m \lambda ^*(\omega _j)\right) +r c_{\mu _0}, \quad \forall x\in {\mathbb {R}}^n \backslash \varTheta . \end{aligned} \end{aligned}$$

Then there exists \(r_1^*>0\) such that

$$\begin{aligned} L^\varGamma (x,\lambda ^*,r_1^*)\ge f(x^*), \quad \forall x\in {\mathbb {R}}^n \backslash \varTheta . \end{aligned}$$

This inequality, combined with (10) in Lemma 2.1 and (14) in Lemma 2.2, yields that

$$\begin{aligned} L^\varOmega (x, \lambda ^*, r_1^*)\ge L^\varOmega (x^*, \lambda ^*, r_1^*), \quad \forall x\in {\mathbb {R}}^n\backslash \varTheta . \end{aligned}$$
(61)

That is, (19) holds on \( {\mathbb {R}}^n \backslash \varTheta \).

Step II We prove in this step that there exists \(r_2^*>0\) such that (19) holds for \(x\in \varTheta \). By contradiction, suppose that there exist sequences \(\{r_k\}\) and \(\{x_k\}\) with \(r_k>0\), \(x_k\in \varTheta \), such that \(r_k\rightarrow +\infty \) and

$$\begin{aligned} L^\varGamma (x_k,\lambda ^*,r_k)=L^\varOmega (x_k,\lambda ^*,r_k)< L^\varOmega (x^*,\lambda ^*,r_k)=f(x^*), \end{aligned}$$
(62)

where we used (14) in the equalities. Set \(\varGamma = \{\omega _1, \dots , \omega _m\}\). It follows from (11) and (62) that

$$\begin{aligned} \begin{aligned} f(x^*)&> f(x_k)+\sum _{j=1}^m \lambda ^*(\omega _j)g(x_k,\omega _j)+r_k\inf _{u\ge 0}\sigma (\gamma (x_k,u)). \end{aligned} \end{aligned}$$
(63)

Since \(r_k\rightarrow +\infty \) the above inequality implies that

$$\begin{aligned} \lim _{k\rightarrow \infty } \inf _{u\ge 0}\sigma (\gamma (x_k,u))=0. \end{aligned}$$

Therefore, we can take a sequence \(\{u^k\}\subset {\mathbb {R}}_+^m\) such that

$$\begin{aligned} \lim _{k\rightarrow \infty }\sigma (\gamma (x_k,u^k))=0. \end{aligned}$$

By condition (ii), \(\varTheta \) is bounded, and hence the sequence \(\{x_k\}\) has at least one cluster point \(\bar{x}\). Without loss of generality, we assume that \(x_k\rightarrow {\bar{x}}.\) We claim that \({\bar{x}}\in X_0^\varGamma \), where \( X_0^\varGamma \) is defined by (27). Indeed, the fact that \(\sigma \) has a valley at 0 and the last expression imply that, for every \(\omega _j\in \varGamma \), we have

$$\begin{aligned} 0=\lim _{k\rightarrow \infty }\Vert \gamma (x_k,u^k)\Vert \ge \lim _{k\rightarrow \infty } |g(x_k,\omega _j)+u^k_j|=0. \end{aligned}$$

Since g is continuous, the sequence \(\{g(x_k,\omega _j)\}\) is bounded for every j, and hence we conclude that the sequence \(\{u^k\}\subset {\mathbb {R}}_+^m\) is bounded too. Therefore, they both have convergent subsequences. We may and do denote these still by \(\{g(x_k,\omega _j)\}\) and \(\{u^k\}\). Hence, we can write, for every \(\omega _j\in \varGamma \),

$$\begin{aligned} g({\bar{x}},\omega _j)=\lim _{k\rightarrow \infty } g(x_k,\omega _j)=\lim _{k\rightarrow \infty } -u^k_j\le 0, \end{aligned}$$

because \(\{u^k\}\subset {\mathbb {R}}_+^m\). Therefore, \({\bar{x}}\in X_0^\varGamma \). Since we are under the conditions of Lemma 4.3, there exist \(r^*>0\) and a neighborhood \(N(x^*)\) of \(x^*\) such that (48) holds, i.e.,

$$\begin{aligned} L^\varOmega (x, \lambda ^*, r^*)> L^\varOmega (x^*, \lambda ^*, r^*)=f(x^*), \quad \forall x\in N(x^*). \end{aligned}$$
(64)

It follows from (62) and (64) that \(x_k\notin N(x^*)\). Indeed, for large enough k we must have \(r_k>r^*\) and hence (62) gives

$$\begin{aligned} L^\varGamma (x_k,\lambda ^*,r^*)\le L^\varGamma (x_k,\lambda ^*,r_k)< L^\varOmega (x^*,\lambda ^*,r_k)=f(x^*), \end{aligned}$$

where we used the fact that \(r_k>r^*\) in the first inequality. This expression and (64) yield \(x_k\not \in N(x^*)\). Since \(x_k\rightarrow {\bar{x}}\), we deduce that \({\bar{x}}\not = x^*\). By assumption, \(x^*\) is a unique global optimal solution of \((P^\varGamma )\) and \(\bar{x}\) is feasible for \((P^\varGamma )\), hence we must have \(f({\bar{x}})> f(x^*)\). Let \(\beta =f({\bar{x}})- f(x^*)\). By using (10) and (11), we have

$$\begin{aligned} \begin{aligned} L^\varOmega ({\bar{x}},\lambda ^*,r)&= f({\bar{x}})+\inf _{u\ge 0}\left\{ \sum _{j=1}^{m} \lambda ^*(\omega _j) [-\gamma ({\bar{x}},u)]_j+r\sigma (\gamma (\bar{x},u))\right\} \\&\ge f(x^*)+\frac{\beta }{2}+\inf _{u\ge 0}\left\{ \sum _{j=1}^{m} \lambda ^*(\omega _j) [-\gamma ({\bar{x}},u)]_j+r\sigma (\gamma (\bar{x},u))\right\} . \end{aligned} \end{aligned}$$
(65)

Since we are under the conditions of Lemma 4.1, there exists \(r'>0\) such that the last term in (65) is nonnegative for all \(r\ge r'\). Using this in (65) gives

$$\begin{aligned} L^\varOmega ({\bar{x}},\lambda ^*,r)\ge f(x^*)+\frac{\beta }{2}, \quad \forall r\ge r'. \end{aligned}$$
(66)

Fix \(k_0\) such that \(r_k\ge r'\) for all \(k\ge k_0\), where \(r'>0\) is as in (66). Using Proposition 4.1, we may and do assume that \(r'\) is large enough such that \(L^\varGamma (\cdot , \lambda ^*, r')\) is lsc. Then by (62) we obtain

$$\begin{aligned} f(x^*)\ge & {} {\liminf }_k L^\varGamma (x_k,\lambda ^*,r_k)\ge {\liminf }_k L^\varGamma (x_k,\lambda ^*,r')\\\ge & {} L^\varGamma ({\bar{x}},\lambda ^*,r')=L^\varOmega ({\bar{x}},\lambda ^*,r')\ge f(x^*)+\frac{\beta }{2}, \end{aligned}$$

where we used (62) in the first inequality, the fact that \(L^\varGamma (x_k,\lambda ^*,r_k)\ge L^\varGamma (x_k,\lambda ^*,r')\) if \(r_k\ge r'\) in the second inequality, lower semicontinuity of \(L^\varGamma (\cdot , \lambda ^*, r')\) in the third inequality, (10) in the equality, and (66) in the last inequality. This expression entails a contradiction and hence there exists \(r_2^*>0\) such that

$$\begin{aligned} L^\varOmega (x, \lambda ^*, r_2^*)\ge L^\varOmega (x^*, \lambda ^*, r_2^*), \quad \forall x\in \varTheta . \end{aligned}$$
(67)

Equivalently, (19) holds on \(\varTheta \). Let \(r_0^*=\max \{r_1^*, r_2^*\}\). It follows from (61) and (67) that (19) holds on \({\mathbb {R}}^n\). The proof is complete. \(\square \)

Remark 4.3

Shapiro and Sun [22] established second-order conditions ensuring the existence of an augmented Lagrange multiplier when the augmenting function is a quadratic function and condition (R) (called the quadratic growth condition in Rockafellar [14]) is satisfied. Under some second-order sufficient conditions, and assuming that the global optimal solution of the primal problem is unique, Sun et al. [23] obtained existence results of global saddle points for four classes of augmented Lagrangian functions (by Remark 2.3, this implies existence of augmented Lagrange multipliers). In the latter paper, the augmenting function \(\sigma \) is twice differentiable and convex in one of these classes, while in the other three classes it is assumed to be twice differentiable. In our analysis (see Theorem 4.1), the augmenting function is not necessarily convex or differentiable. Hence, the result and the proof techniques in Theorem 4.1 are different from those in Shapiro and Sun [22] and Sun et al. [23]. Moreover, it is known that some nonconvex and nondifferentiable penalty functions, which are special cases of augmented Lagrangians with nonconvex and nondifferentiable augmenting functions, can provide a smaller exact penalty parameter than the classical \(l_1\) penalty function, see [33, Theorem 4.9]. Indeed, it has been shown that in some practical applications nonconvex and nondifferentiable penalty functions need only smaller penalty parameters than the classical \(l_1\) penalty function to achieve the required accuracy, see [34].

Next we apply a global second-order sufficient condition [26] to obtain the existence of an augmented Lagrange multiplier.

Definition 4.2

Let X be a subset of \({\mathbb {R}}^n\), \(x^*\in X\), \( W(x^*)\) be a subset of \({\mathbb {R}}^n\) and \(f:X\rightarrow {\mathbb {R}}\) be twice continuously differentiable at \(x^*\). We say that a generalized representation condition holds for f at \(x^*\) on X with respect to \(\eta (x,x^*) \in W(x^*)\) if, for every \(x \in X\),

$$\begin{aligned} f(x)=f(x^*)+\nabla f(x^*)^\top (x-x^*)+\frac{1}{2}\eta (x,x^*)^\top \nabla ^2f(x^*)\eta (x,x^*). \end{aligned}$$

Remark 4.4

There are some functions that satisfy the generalized representation condition in Definition 4.2 (cf. Yang [26]). For example, consider the linear fractional function

$$\begin{aligned} f(x) = \frac{b^\top x+c}{a^\top x+s}, \quad x \in X \end{aligned}$$

where \(a, b \in {\mathbb {R}}^n\), \(c, s\in {\mathbb {R}}\) and X is a convex set such that \(a^\top x+s>0, \forall x \in X.\) It is easy to show that

$$\begin{aligned} f(x)=f(x^*)+\nabla f(x^*) ^\top (x-x^*)+\frac{1}{2}\frac{a^\top x^*+s}{a^\top x+s}(x-x^*)^\top \nabla ^2f(x^*) (x-x^*). \end{aligned}$$

Then f(x) satisfies the generalized representation condition on X with

$$\begin{aligned} W(x^*)=\left\{ \eta (x,x^*)\in {\mathbb {R}}^n:\eta (x,x^*)=\sqrt{\frac{a^\top x^*+s}{a^\top x+s}}(x-x^*)\right\} . \end{aligned}$$
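The identity in Remark 4.4 can be verified numerically. In the sketch below, the data a, b, c, s and the points are arbitrary illustrative choices that keep \(a^\top x+s>0\) on the sampled region; the gradient and Hessian of the linear fractional function are written in closed form and the generalized representation identity is checked at random points.

```python
import numpy as np

rng = np.random.default_rng(2)

# illustrative data for f(x) = (b^T x + c)/(a^T x + s);
# chosen so that a^T x + s > 0 for all sampled x
a, b = np.array([0.3, -0.2, 0.1]), np.array([1.0, 2.0, -1.0])
c, s = 0.5, 4.0

def f(x):
    return (b @ x + c) / (a @ x + s)

def grad_f(x):
    # gradient: b/D - N a/D^2 with N = b^T x + c, D = a^T x + s
    D, N = a @ x + s, b @ x + c
    return b / D - N * a / D**2

def hess_f(x):
    # Hessian: -(b a^T + a b^T)/D^2 + 2 N a a^T / D^3
    D, N = a @ x + s, b @ x + c
    return -(np.outer(b, a) + np.outer(a, b)) / D**2 + 2 * N * np.outer(a, a) / D**3

x_star = np.array([0.1, 0.2, -0.3])
for _ in range(100):
    x = rng.uniform(-1.0, 1.0, size=3)
    h = x - x_star
    ratio = (a @ x_star + s) / (a @ x + s)   # the factor eta(x, x*) carries
    rhs = f(x_star) + grad_f(x_star) @ h + 0.5 * ratio * (h @ hess_f(x_star) @ h)
    assert abs(f(x) - rhs) < 1e-12           # generalized representation identity
```

Equivalently, with \(\eta (x,x^*)=\sqrt{\tfrac{a^\top x^*+s}{a^\top x+s}}\,(x-x^*)\) the quadratic term reads \(\frac{1}{2}\eta ^\top \nabla ^2 f(x^*)\eta \), which is the form used in Definition 4.2.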

Theorem 4.2

Let \(f(\cdot )\) and \( g(\cdot ,\omega ), \omega \in \varOmega \), be real-valued twice continuously differentiable functions, let \(x^*\in X_0\), \(\lambda ^*\in {\mathbb {R}}^{(\varOmega )}\), \(\varGamma =\text {supp}(\lambda ^*)= \{\omega _1, \dots , \omega _m\}\), and \(V(x^*)\) be defined by (47), let \(\sigma :{\mathbb {R}}^{(\varOmega )}\rightarrow {{\mathbb {R}}_+}\) have a valley at 0 with respect to \(\varGamma \), and let (38) hold. Assume that the KKT Conditions (12) and (13) for problem (SIP) and Conditions (i), (ii) in Theorem 4.1 are satisfied. Moreover, assume that there exists a set \(W(x^*) \subset {\mathbb {R}}^n {\setminus } \{0\}\) such that \(V(x^*) \subset W(x^*)\), the generalized representation condition holds for f(x) and \(g(x,\omega _j), j=1,\ldots ,m,\) at \(x^*\) on \( X_0^\varGamma \) with respect to the same \(\eta (x,x^*)\in W(x^*)\), where \(\eta (\cdot ,x^*): {\mathbb {R}}^n \rightarrow W(x^*)\) is continuous, and

$$\begin{aligned} y^\top \nabla _{xx}^2 L_0^\varGamma (x^*,\lambda ^*)y >0, \quad \forall y\in W(x^*). \end{aligned}$$
(68)

Then \(\lambda ^*\) is an augmented Lagrange multiplier of (SIP).

Proof

As in the proof of Theorem 4.1, we will prove that there exists \(r_0^*>0\) such that (19) holds on \( {\mathbb {R}}^n= \varTheta \cup ( {\mathbb {R}}^n \backslash \varTheta )\), where \(\varTheta \) is defined by (58). The case \(x\not \in \varTheta \) follows the same line of argument as Step I of the proof of Theorem 4.1, which yields \(r_1^*>0\) such that (19) holds on \( {\mathbb {R}}^n \backslash \varTheta \); hence we omit the details. To complete the proof, we show that there exists \(r_2^*>0\) such that (19) holds on \(\varTheta \). By contradiction, suppose that there exist sequences \(\{r_k\}\) and \(\{x_k\}\) with \(r_k>0\), \(x_k\in \varTheta \), such that \(r_k\rightarrow +\infty \) and (62) holds. Arguing as in Step II of the proof of Theorem 4.1, we establish (63), extract a subsequence of \(\{x_k\}\) converging to some \({\bar{x}}\), and prove that \({\bar{x}}\in X_0^\varGamma \), where \( X_0^\varGamma \) is defined by (27). Because \(V(x^*) \subset W(x^*)\), it follows from Lemma 4.3 that there are \(r^*>0\) and a neighborhood \(N(x^*)\) of \(x^*\) such that (48) holds, i.e.,

$$\begin{aligned} L^\varOmega (x, \lambda ^*, r^*)> L^\varOmega (x^*, \lambda ^*, r^*), \quad \forall x\in N(x^*). \end{aligned}$$

Thus, it follows from (62) that \({\bar{x}}\not = x^*\), since \(x_k\rightarrow {\bar{x}}\), as \(k\rightarrow \infty \).

By the assumption, there is \(\eta ({\bar{x}},x^*) \in W(x^*)\) such that the generalized representation conditions for f(x) and \(g(x,\omega _j), j=1,\ldots ,m,\) hold. By (68), we have

$$\begin{aligned} \eta ({\bar{x}}, x^*)^\top \nabla _{xx}^2 L_0^\varGamma (x^*, \lambda ^*)\eta ({\bar{x}}, x^*) > 0. \end{aligned}$$

Thus, for k sufficiently large, and by continuity of \(\eta (\cdot , x^*)\),

$$\begin{aligned} \eta (x_k, x^*)^\top \nabla _{xx}^2 L_0^\varGamma (x^*, \lambda ^*)\eta (x_k, x^*) > 0. \end{aligned}$$
(69)

Thus, by the KKT conditions and (69), we have

$$\begin{aligned}&f(x_k)+\sum _{j=1}^m\lambda ^*(\omega _j)g(x_k,\omega _j) \\&\quad = f(x^*) + \nabla f(x^*)^\top (x_k-x^*)+ \frac{1}{2} \eta (x_k,x^*)^\top \nabla ^2 f(x^*) \eta (x_k,x^*)\\&\qquad + \sum _{j=1}^m \lambda ^*(\omega _j) g(x^*,\omega _j) \\&\qquad + \sum _{j=1}^m\lambda ^*(\omega _j) \nabla _x g(x^*,\omega _j)^\top (x_k-x^*)\\&\qquad + \frac{1}{2} \sum _{j=1}^m\lambda ^*(\omega _j) \eta (x_k,x^*)^\top \nabla _{xx}^2 g(x^*,\omega _j) \eta (x_k,x^*) \\&\quad = f(x^*) + \frac{1}{2} \eta (x_k,x^*)^\top \nabla _{xx}^2 L_0^\varGamma (x^*,\lambda ^*) \eta (x_k,x^*) \\&\quad > f(x^*), \end{aligned}$$

which contradicts (63). Therefore, there exists \(r_2^*>r^*>0\) such that (19) holds on \(\varTheta \).

Let \(r_0^*=\max \{r_1^*,r_2^*\}\). Then (19) holds on \( {\mathbb {R}}^n\). By Lemma 2.2, \(\lambda ^*\) is an augmented Lagrange multiplier of \((\text {SIP})\) with penalty parameter \(r_0^*\). \(\square \)

Remark 4.5

If Assumption 3.1 holds, then conditions (i) and (ii) of Theorem 4.1 are satisfied (see the proof of Theorem 3.2).

5 Conclusions

In the case of a sharp Lagrangian, we obtained first-order necessary or sufficient conditions for the existence of an augmented Lagrange multiplier for (SIP). Using a valley at 0 augmenting function, we obtained second-order sufficient conditions for the existence of an augmented Lagrange multiplier for (SIP). We employed a discretization technique in our work and provided some characterizations of augmented Lagrange multipliers for (SIP) in terms of saddle points. However, we could not establish the equivalence of (SIP) and its discretized optimization problem when an augmented Lagrange multiplier exists. We leave this question as an open problem for future research.