1 Introduction

The method of exact penalty functions [13] is a powerful tool for solving various constrained optimization problems. However, as is well known, exact penalty functions are usually nonsmooth even when the original problem is smooth. This obstacle makes it impossible to apply the well-developed and extensively studied methods of smooth unconstrained optimization to the minimization of an exact penalty function without a preliminary transformation of the problem.

Huyer and Neumaier [4] proposed a new approach to exact penalization that allows one to overcome the nonsmoothness of exact penalty functions. Namely, let the original constrained optimization problem have the form

$$\begin{aligned} \min f(x) \quad \text {subject to} \quad F(x) = 0, \quad x \in [\underline{x}, \overline{x}], \end{aligned}$$
(1)

where \(f :\mathbb {R}^n \rightarrow \mathbb {R}\) and \(F :\mathbb {R}^n \rightarrow \mathbb {R}^m\) are smooth functions, \(\underline{x}, \overline{x} \in \mathbb {R}^n\) are given vectors, and

$$\begin{aligned}{}[\underline{x}, \overline{x}] = \left\{ x = (x_1, \ldots , x_n) \in \mathbb {R}^n \mid \underline{x}_i \le x_i \le \overline{x}_i \quad \forall i \in \{ 1, \ldots , n \} \right\} . \end{aligned}$$

The new approach relies on the introduction of an additional variable \(\varepsilon \ge 0\) in the following way. Choose \(w \in \mathbb {R}^m\), and note that the problem (1) is equivalent to the optimization problem

$$\begin{aligned} \min _{x, \varepsilon } f(x) \quad \text {subject to} \quad F(x) = \varepsilon w, \quad \varepsilon = 0, \quad x \in [\underline{x}, \overline{x}]. \end{aligned}$$
(2)

Then one defines the new “smooth” penalty function for the augmented problem (2) as follows:

$$\begin{aligned} F_{\lambda }(x, \varepsilon ) = {\left\{ \begin{array}{ll} f(x), &{} \text { if }\quad \varepsilon = \Delta (x, \varepsilon ) = 0, \\ f(x) + \frac{1}{2 \varepsilon } \frac{\Delta (x, \varepsilon )}{1 - q \Delta (x, \varepsilon )} + \lambda \beta (\varepsilon ), &{} \text { if }\quad \varepsilon > 0, \, \Delta (x, \varepsilon ) < q^{-1}, \\ + \infty , &{}\; \text {otherwise}, \end{array}\right. } \end{aligned}$$
(3)

where \(\lambda \ge 0\) is the penalty parameter, \(\Delta (x, \varepsilon ) = \Vert F(x) - \varepsilon w \Vert ^2\) is the constraint violation measure, \(\beta :[0, \overline{\varepsilon }] \rightarrow [0, +\infty )\) with \(\beta (0) = 0\) is the penalty term, \(q > 0\) and \(\overline{\varepsilon } > 0\) are some prespecified thresholds. Finally, one replaces the augmented problem (2) with the penalized problem

$$\begin{aligned} \min _{x, \varepsilon } F_{\lambda }(x, \varepsilon ) \quad \text {subject to} \quad (x, \varepsilon ) \in [\underline{x}, \overline{x}] \times [0, \overline{\varepsilon }]. \end{aligned}$$
(4)

Observe that the penalty function \(F_{\lambda }(x, \varepsilon )\) is smooth for any \(\varepsilon \in (0, \overline{\varepsilon })\) and \(x\) such that \(0 < \Delta (x, \varepsilon ) < q^{-1}\), provided the function \(\beta \) is smooth on \((0, \overline{\varepsilon })\). Furthermore, it was proved in [4] that under a standard assumption (namely, a constraint qualification) the penalty function \(F_{\lambda }(x, \varepsilon )\) is locally exact. In other words, \((x^*, \varepsilon ^*)\) is a local minimum of the problem (4) if and only if \(\varepsilon ^* = 0\) and \(x^*\) is a local minimum of the original problem (1). Consequently, one can apply methods of smooth unconstrained minimization to the penalized problem (4) in order to find a solution of the initial constrained optimization problem (1).
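To make the construction concrete, the following sketch evaluates the penalty function (3) numerically. It is a minimal illustration under our own ad hoc choices (the Euclidean norm in \(\Delta \) and the simplest admissible penalty term \(\beta (\varepsilon ) = \varepsilon \)); neither choice is prescribed by [4].

```python
import numpy as np

def penalty(f, F, x, eps, w, lam, q=1.0, beta=lambda e: e):
    """Evaluate the penalty function (3) at the point (x, eps).

    f    : objective function R^n -> R
    F    : constraint mapping R^n -> R^m
    w    : fixed vector in R^m
    lam  : penalty parameter lambda >= 0
    q    : prespecified threshold q > 0
    beta : penalty term with beta(0) = 0 (here beta(eps) = eps)
    """
    delta = float(np.sum((F(x) - eps * w) ** 2))  # Delta(x, eps) = ||F(x) - eps w||^2
    if eps == 0.0 and delta == 0.0:               # feasible point of the augmented problem (2)
        return f(x)
    if eps > 0.0 and delta < 1.0 / q:             # smooth branch of (3)
        return f(x) + delta / (2.0 * eps * (1.0 - q * delta)) + lam * beta(eps)
    return np.inf                                 # remaining branch of (3)
```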

Later on, the approach of smooth exact penalty functions was generalized [5, 6] and successfully applied to various constrained optimization problems [7, 8], including some optimal control problems [9–11]. However, it should be noted that the existing proofs of the exactness of the smooth penalty function (3) and its various generalizations are quite complicated and overburdened with technical details that obscure the essence of the technique of smooth exact penalty functions. Also, the question of when the problem (1) is actually equivalent to the problem (4) in terms of optimal solutions [in this case the penalty function \(F_{\lambda }(x, \varepsilon )\) is called exact] has not been discussed in the literature.

The aim of this article is to present a new perspective on the method of smooth exact penalty functions. This perspective allows one to apply previously unused tools (namely, parametric optimization) to the study and construction of smooth exact penalty functions. It also allowed us to significantly simplify the proofs of exactness of these functions. Another aim of this article is to provide necessary and sufficient conditions for the smooth penalty function to be (globally) exact.

The paper is organised as follows. In Sect. 2 we describe a new approach to smooth exact penalty functions. Some general results that draw a connection between the exactness of the new penalty functions and certain properties of perturbed optimization problems are presented in Sect. 3. In Sect. 4 we give a new and simple proof of the local exactness of a new penalty function that significantly generalizes the existing results on local exactness of smooth penalty functions. We also provide new conditions for the smooth penalty function to be globally exact.

2 How to construct a smooth exact penalty function?

Two main approaches are usually used in the study of exact penalty functions: the direct approach, which is based on the use of error bound estimates and metric regularity, and the indirect one, which relies on the consideration of a perturbed optimization problem. In the indirect approach [12–15], a perturbation of the initial problem is introduced as a tool for the study of an already defined penalty function. However, one can reverse this logic and introduce a perturbation in order to construct a penalty function.

Namely, define the perturbed objective function for the problem (1) by

$$\begin{aligned} g(x, \mu ) = {\left\{ \begin{array}{ll} f(x), &{} \text { if }\quad \mu = \Delta (x, \mu ) = 0, \\ f(x) + \frac{1}{\mu } \frac{\Delta (x, \mu )}{1 - q \Delta (x, \mu )}, &{} \text { if }\quad \mu > 0, \, \Delta (x, \mu ) < q^{-1}, \\ + \infty , &{}\; \text {otherwise}, \end{array}\right. } \end{aligned}$$

where \(\mu \ge 0\) is a perturbation parameter. Consider the perturbed optimization problem

$$\begin{aligned} \min _x g(x, \mu ) \quad \text {subject to} \quad x \in [\underline{x}, \overline{x}]. \end{aligned}$$
(5)

It is clear that the problem above with \(\mu = 0\) is equivalent to the initial problem (1). Moreover, a penalization of the constraint \(F(x) = 0\) is achieved via the introduced perturbation. As a second step, note that the perturbed problem is equivalent to the problem

$$\begin{aligned} \min _{x, \mu } g(x, \mu ) \quad \text {subject to} \quad x \in [\underline{x}, \overline{x}], \quad \mu = 0. \end{aligned}$$

Finally, introduce a penalty function for the problem above that penalizes only the constraint on the perturbation parameter, i.e. \(\mu = 0\). This penalty function has the form

$$\begin{aligned} F_{\lambda }(x, \mu ) = g(x, \mu ) + \lambda \beta (\mu ), \end{aligned}$$
(6)

and it is smooth for any \(\mu \in (0, \overline{\varepsilon })\) and \(x \in \mathbb {R}^n\) such that \(0 < \Delta (x, \mu ) < q^{-1}\). Thus, since the nonlinear constraints are taken into account via the perturbation (not via penalization), and the penalty term is constructed only for a simple one-dimensional constraint on the perturbation parameter, one avoids the nonsmoothness that usually arises from the restrictive requirements on a penalty term.

In the following section, we develop the approach discussed above in the general case, and show that the exactness of the penalty function \(F_{\lambda }(x, \mu )\) of the form (6) is directly connected with certain properties of the perturbed problem (5).
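As a proof of concept, the sketch below minimizes the penalty function (6) for a toy instance of the problem (1) with a standard bound-constrained quasi-Newton method. The instance, the parameters \(q\), \(\lambda \), \(\overline{\varepsilon } = 1\), the choice \(\beta (\mu ) = \mu \) and the starting point are ours and purely illustrative, and the \(+\infty \) branches of \(g\) are replaced by a large finite constant so that an off-the-shelf solver can be applied.

```python
import numpy as np
from scipy.optimize import minimize

# Toy instance of (1): min x1 + x2  s.t.  x1^2 + x2^2 - 2 = 0,  x in [-2, 2]^2.
# The optimal solution is x* = (-1, -1) with f(x*) = -2.
f = lambda x: x[0] + x[1]
F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 2.0])
w = np.array([1.0])
q, lam, big = 1.0, 10.0, 1e10        # ad hoc parameters; 'big' stands in for +infinity

def F_lam(z):
    """Penalty function (6) in the joint variable z = (x, mu), with beta(mu) = mu."""
    x, mu = z[:2], z[2]
    delta = float(np.sum((F(x) - mu * w) ** 2))   # Delta(x, mu)
    if mu <= 0.0:
        return f(x) if delta == 0.0 else big
    if delta >= 1.0 / q:
        return big
    return f(x) + delta / (mu * (1.0 - q * delta)) + lam * mu

z0 = np.array([0.5, -1.0, 0.1])                   # starting point with mu > 0
res = minimize(F_lam, z0, method="L-BFGS-B",
               bounds=[(-2.0, 2.0), (-2.0, 2.0), (0.0, 1.0)])
print(res.x)  # for lam above the exactness threshold, mu should be driven to (almost) 0
```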

3 Exact penalty function for a perturbed optimization problem

Let \(X\) be a topological space, \(f :X \rightarrow \mathbb {R} \cup \{ + \infty \}\) be a given function, and \(M\), \(A \subset X\) be nonempty sets such that \(M \cap A \ne \emptyset \). Hereafter, we study the following optimization problem:

$$\begin{aligned} \min f(x) \quad \text {subject to} \quad x \in M, \quad x \in A.\qquad \qquad \qquad (\mathcal {P}) \end{aligned}$$

Denote by \(\Omega = M \cap A\) the set of feasible solutions of the problem (\(\mathcal {P}\)). Denote also \(\mathbb {R}_+ = [0, +\infty )\) and \({{\mathrm{dom}}}f = \{ x \in X \mid f(x) < +\infty \}\). We suppose that the function \(f\) is bounded from below on \(\Omega \).

Introduce a metric space of perturbation parameters \((P, d)\), and a perturbed objective function \(g :X \times P \rightarrow \mathbb {R} \cup \{ + \infty \}\) such that there exists \(\mu _0 \in P\) for which the following conditions are satisfied:

  1. \(g(x, \mu _0) = f(x)\) for any \(x \in \Omega \) (consistency condition);

  2. \({{\mathrm{arg\,min}}}_{x \in A} g(x, \mu _0) = {{\mathrm{arg\,min}}}_{x \in \Omega } f(x)\) (exact penalization condition).
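For instance, the perturbed function \(g\) from Sect. 2 satisfies both conditions with \(\mu _0 = 0\). Indeed, \(\Delta (x, 0) = \Vert F(x) \Vert ^2\), so that

$$\begin{aligned} g(x, 0) = {\left\{ \begin{array}{ll} f(x), &{} \text { if }\quad F(x) = 0, \\ + \infty , &{}\; \text {otherwise}, \end{array}\right. } \end{aligned}$$

and therefore \(g(x, 0) = f(x)\) for any \(x \in \Omega \), while the minimization of \(g(\cdot , 0)\) over \(A = [\underline{x}, \overline{x}]\) coincides with the minimization of \(f\) over \(\Omega \).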

With the use of the exact penalization condition one gets that the problem (\(\mathcal {P}\)) is equivalent (in terms of optimal solutions) to the following optimization problem:

$$\begin{aligned} \min _{x, \mu } g(x, \mu ) \quad \text {subject to} \quad x \in A, \quad \mu = \mu _0. \end{aligned}$$
(7)

Furthermore, the consistency condition guarantees that if \((x_0, \mu _0)\) with \(x_0 \in \Omega \) is a local minimum of the problem above, then \(x_0\) is a local minimum of the problem (\(\mathcal {P}\)).

We apply the exact penalization technique to treat the problem (7). Namely, choose a function \(\beta :\mathbb {R}_+ \rightarrow \mathbb {R}_+ \cup \{ +\infty \}\) such that \(\beta (t) = 0\) iff \(t = 0\). For any \(\lambda \ge 0\) define the penalty function

$$\begin{aligned} F_{\lambda }(x, \mu ) = g(x, \mu ) + \lambda \beta \left( d(\mu , \mu _0)\right) \end{aligned}$$

and consider the following penalized problem

$$\begin{aligned} \min _{x, \mu } F_{\lambda }(x, \mu ) \quad \text {subject to} \quad x \in A. \end{aligned}$$
(8)

Observe that for any fixed \((x, \mu )\) the function \(\lambda \rightarrow F_{\lambda }(x, \mu )\) is non-decreasing.

Our aim is to study a relation between (local or global) minima of the initial problem (\(\mathcal {P}\)) and minima of the problem (8) in the context of the theory of exact penalty functions. Recall the definition of an exact penalty function.

Definition 1

Let \(x^* \in {{\mathrm{dom}}}f\) be a local minimum of the problem (\(\mathcal {P}\)). The penalty function \(F_{\lambda }\) is called exact at the point \(x^*\) [or, to be more precise, at the point \((x^*, \mu _0)\)] if there exists \(\lambda \ge 0\) such that \((x^*, \mu _0)\) is a local minimum of the problem (8). The greatest lower bound of all such \(\lambda \) is denoted by \(\lambda (x^*)\).

Definition 2

The penalty function \(F_{\lambda }\) is said to be exact if there exists \(\lambda \ge 0\) such that the function \(F_{\lambda }\) attains the global minimum on the set \(A \times P\), and if \((x^*, \mu ^*) \in A \times P\) is an optimal solution of the problem (8), then \(\mu ^* = \mu _0\). The greatest lower bound of all such \(\lambda \ge 0\) is denoted by \(\lambda ^*(g, \beta )\).

Remark 1

  (i) Note that if \((x^*, \mu ^*) \in A \times P\) is an optimal solution of the problem (8) with \(\mu ^* = \mu _0\), then \(x^*\) is an optimal solution of the problem (\(\mathcal {P}\)) by virtue of the exact penalization condition on the function \(g(x, \mu )\) and the fact that \(F_{\lambda }(x, \mu _0) = g(x, \mu _0)\). Thus, the penalty function \(F_{\lambda }\) is exact if and only if there exists \(\lambda \ge 0\) such that the problem (8) is equivalent to the problem (\(\mathcal {P}\)) in terms of optimal solutions.

  (ii) It is easy to verify that if the penalty function \(F_{\lambda }\) is exact, then for any \(\lambda > \lambda ^*(g, \beta )\) the function \(F_{\lambda }\) attains the global minimum on the set \(A \times P\), and if \((x^*, \mu ^*) \in A \times P\) is an optimal solution of the problem (8), then \(\mu ^* = \mu _0\).

It transpires that exactness of the penalty function \(F_{\lambda }\) is closely related to some properties of the perturbed optimization problem

$$\begin{aligned} \min _{x} g(x, \mu ) \quad \text {subject to} \quad x \in A. \qquad \qquad (\mathcal {P}_{\mu }) \end{aligned}$$

Denote by \(h(\mu ) = \inf _{x \in A} g(x, \mu )\) the optimal value function of this problem.

Recall that the problem (\(\mathcal {P}_{\overline{\mu }}\)), \(\overline{\mu } \in P\), is said to be \(\beta \)-calm at a point \(x^* \in A\) if there exist \(\lambda \ge 0\), \(r > 0\) and a neighbourhood \(U\) of \(x^*\) such that

$$\begin{aligned} g(x, \mu ) - g(x^*, \overline{\mu }) \ge - \lambda \beta (d(\mu , \overline{\mu })) \quad \forall x \in U \cap A \quad \forall \mu \in B(\overline{\mu }, r), \end{aligned}$$

where \(B(\overline{\mu }, r) = \{ \mu \in P \mid d(\mu , \overline{\mu }) \le r \}\).

Remark 2

If \(\beta (t) \equiv t\), then the concept of \(\beta \)-calmness coincides with the well-known concept of calmness of a perturbed optimization problem [12–15].

The following proposition describes a connection between the exactness of the penalty function \(F_{\lambda }\) and the calmness of the perturbed problem (\(\mathcal {P}_{\mu }\)) (cf. an analogous result for classical penalty functions in [14]).

Proposition 1

Let \(x^* \in {{\mathrm{dom}}}f\) be a local minimum of the problem (\(\mathcal {P}\)). Then the penalty function \(F_{\lambda }\) is exact at the point \(x^*\) if and only if the problem (\(\mathcal {P}_{\mu _0}\)) is \(\beta \)-calm at \(x^*\).

Proof

The validity of the proposition follows from the fact that the inequality

$$\begin{aligned} g(x, \mu ) - g\left( x^*, \mu _0\right) \ge - \lambda \beta \left( d(\mu , \mu _0)\right) \quad \forall x \in U \cap A \quad \forall \mu \in B(\mu _0, r), \end{aligned}$$

holds true for some \(\lambda \ge 0\), \(r > 0\) and a neighbourhood \(U\) of \(x^*\) iff for any \(x \in U \cap A\) and \(\mu \in B(\mu _0, r)\) one has

$$\begin{aligned} F_{\lambda }(x, \mu ) = g(x, \mu ) + \lambda \beta \left( d\left( \mu , \mu _0\right) \right) \ge g\left( x^*, \mu _0\right) = F_{\lambda }\left( x^*, \mu _0\right) , \end{aligned}$$

i.e. iff \((x^*, \mu _0)\) is a local minimum of \(F_{\lambda }\) on the set \(A \times P\). \(\square \)

We need an auxiliary definition. The optimal value function \(h\) is called \(\beta \)-calm from below at a point \(\overline{\mu } \in P\) if

$$\begin{aligned} \liminf _{\mu \rightarrow \overline{\mu }} \frac{h(\mu ) - h(\overline{\mu })}{\beta \left( d(\mu , \overline{\mu })\right) } > - \infty \end{aligned}$$

(cf. the definition of calmness from below in [16]).
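As a simple illustration (ours, not from [16]): let \(P = \mathbb {R}_+\) with \(d(\mu _1, \mu _2) = |\mu _1 - \mu _2|\), \(\overline{\mu } = 0\) and \(h(\mu ) = - \sqrt{\mu }\). Then

$$\begin{aligned} \liminf _{\mu \rightarrow 0} \frac{h(\mu ) - h(0)}{\mu } = - \infty , \qquad \liminf _{\mu \rightarrow 0} \frac{h(\mu ) - h(0)}{\sqrt{\mu }} = - 1 > - \infty , \end{aligned}$$

i.e. \(h\) is not calm from below at \(0\) in the classical sense (\(\beta (t) \equiv t\)), but it is \(\beta \)-calm from below at \(0\) for \(\beta (t) = \sqrt{t}\).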

Theorem 1

Let \(\beta \) be strictly increasing. For the penalty function \(F_{\lambda }\) to be exact it is necessary and sufficient that the following assumptions hold true:

  1. there exists an optimal solution of the problem (\(\mathcal {P}\));

  2. the optimal value function \(h(\mu )\) is \(\beta \)-calm from below at \(\mu _0\);

  3. there exists \(\lambda _0 \ge 0\) such that \(F_{\lambda _0}\) is bounded from below on the set \(A \times P\).

Proof

Necessity. Since the penalty function \(F_{\lambda }\) is exact, it attains the global minimum on the set \(A \times P\) for some \(\lambda \ge 0\). Therefore, in particular, \(F_{\lambda }\) is bounded from below on \(A \times P\), i.e. the third assumption is valid.

Fix a sufficiently large \(\lambda \ge 0\) and a global minimizer \((x^*, \mu ^*)\) of \(F_{\lambda }\) on the set \(A \times P\). Due to the exactness of \(F_{\lambda }\) one has \(\mu ^* = \mu _0\). Therefore, as was mentioned above, \(x^*\) is an optimal solution of the problem (\(\mathcal {P}\)). Thus, the first assumption is valid as well.

Observe that

$$\begin{aligned} g(x^*, \mu _0) \ge h(\mu _0) := \inf _{x \in A} g(x, \mu _0) \ge \inf _{(x, \mu ) \in A \times P} F_{\lambda }(x, \mu ) = g(x^*, \mu _0). \end{aligned}$$

Hence \(h(\mu _0) = g(x^*, \mu _0)\). With the use of exactness of \(F_{\lambda }\) one gets that for all \(x \in A\) and \(\mu \in P\) the following holds

$$\begin{aligned} g(x, \mu ) + \lambda \beta \left( d\left( \mu , \mu _0\right) \right) = F_{\lambda }(x, \mu ) \ge F_{\lambda }\left( x^*, \mu _0\right) = g\left( x^*, \mu _0\right) = h\left( \mu _0\right) \end{aligned}$$

or, equivalently,

$$\begin{aligned} g(x, \mu ) - h(\mu _0) \ge - \lambda \beta \left( d(\mu , \mu _0)\right) \quad \forall x \in A \quad \forall \mu \in P. \end{aligned}$$

Since the inequality above holds true for all \(x \in A\), one obtains that

$$\begin{aligned} h(\mu ) - h(\mu _0) \ge - \lambda \beta (d(\mu , \mu _0)) \quad \forall \mu \in P, \end{aligned}$$

which implies the \(\beta \)-calmness from below of \(h\) at \(\mu _0\).

Sufficiency. Let \(x^*\) be an optimal solution of the problem (\(\mathcal {P}\)). Then, taking into account the exact penalization condition on the function \(g(x, \mu )\), one gets that \(h(\mu _0) = g(x^*, \mu _0)\), i.e. \(x^*\) is a global minimizer of the function \(x \rightarrow g(x, \mu _0)\) on \(A\).

Since the optimal value function \(h\) is \(\beta \)-calm from below at \(\mu _0\), there exist \(\lambda _1 \ge 0\) and \(\delta > 0\) such that

$$\begin{aligned} h(\mu ) - h(\mu _0) \ge - \lambda _1 \beta (d(\mu , \mu _0)) \quad \forall \mu \in B(\mu _0, \delta ). \end{aligned}$$

Hence for any \(x \in A\) one has

$$\begin{aligned} g(x, \mu ) - g(x^*, \mu _0) \ge h(\mu ) - h(\mu _0) \ge - \lambda _1 \beta (d(\mu , \mu _0)) \quad \forall \mu \in B(\mu _0, \delta ) \end{aligned}$$

or, equivalently, for all \((x, \mu ) \in A \times B(\mu _0, \delta )\) the following holds

$$\begin{aligned} F_{\lambda _1}(x, \mu ) = g(x, \mu ) + \lambda _1 \beta (d(\mu , \mu _0)) \ge g(x^*, \mu _0) = F_{\lambda _1}(x^*, \mu _0). \end{aligned}$$

On the other hand, if \(x \in A\) and \(\mu \notin B(\mu _0, \delta )\), then for any \(\lambda \ge \lambda _2\), where

$$\begin{aligned} \lambda _2 = \lambda _0 + \frac{g\left( x^*, \mu _0\right) - c}{\beta (\delta )}, \quad c = \inf _{(x, \mu ) \in A \times P} F_{\lambda _0}(x, \mu ) > -\infty , \end{aligned}$$

one has

$$\begin{aligned} F_{\lambda }(x, \mu )= & {} F_{\lambda _0}(x, \mu ) + (\lambda - \lambda _0) \beta (d(\mu , \mu _0)) \\\ge & {} c + (\lambda - \lambda _0) \beta (\delta ) \ge g(x^*, \mu _0) = F_{\lambda }(x^*, \mu _0). \end{aligned}$$

Thus, for any \(\lambda \ge \overline{\lambda } := \max \{ \lambda _1, \lambda _2 \}\) one has

$$\begin{aligned} F_{\lambda }(x, \mu ) \ge F_{\lambda }(x^*, \mu _0) \quad \forall (x, \mu ) \in A \times P \end{aligned}$$

or, in other words, the penalty function \(F_{\lambda }\) attains the global minimum on \(A \times P\) at the point \((x^*, \mu _0)\). Now let \((\overline{x}, \overline{\mu }) \in A \times P\) be an arbitrary global minimizer of \(F_{\lambda }\) on \(A \times P\). Let us show that \(\overline{\mu } = \mu _0\) provided \(\lambda > \overline{\lambda }\); this allows one to conclude that the penalty function \(F_{\lambda }\) is exact.

Indeed, for any \(\lambda > \overline{\lambda }\), \(x \in A\) and \(\mu \ne \mu _0\) one has

$$\begin{aligned} F_{\lambda }(x^*, \mu _0) = F_{\overline{\lambda }}(x^*, \mu _0) \le F_{\overline{\lambda }}(x, \mu ) < F_{\lambda }(x, \mu ), \end{aligned}$$

since \(\beta (d(\mu , \mu _0)) > 0\). Hence \(\overline{\mu } = \mu _0\) by virtue of the fact that \((\overline{x}, \overline{\mu })\) is a global minimizer of \(F_{\lambda }\) on \(A \times P\). \(\square \)

Let us also point out a connection between calmness of the optimal value function \(h\) and calmness of the problem (\(\mathcal {P}_{\mu _0}\)) at optimal solutions of the problem (\(\mathcal {P}\)) in the case when the set \(A\) is compact.

Theorem 2

Let the set \(A\) be compact, and the function \(g(x, \mu )\) be lower semicontinuous (l.s.c.) on \(A \times B(\mu _0, r)\) for some \(r > 0\). Then for the optimal value function \(h\) to be \(\beta \)-calm from below at \(\mu _0\) it is necessary and sufficient that the problem (\(\mathcal {P}_{\mu _0}\)) is \(\beta \)-calm at every optimal solution of the problem (\(\mathcal {P}\)).

Proof

Necessity. Fix an optimal solution \(x^*\) of the problem (\(\mathcal {P}\)). From the definition of \(\beta \)-calmness from below it follows that there exist \(\lambda \ge 0\) and \(r > 0\) such that

$$\begin{aligned} h(\mu ) - h(\mu _0) \ge - \lambda \beta (d(\mu , \mu _0)) \quad \forall \mu \in B(\mu _0, r). \end{aligned}$$

Since \(x^*\) is an optimal solution of the problem (\(\mathcal {P}\)), one has \(h(\mu _0) = g(x^*, \mu _0)\) due to the exact penalization condition on \(g(x, \mu )\). Therefore

$$\begin{aligned} g(x, \mu ) - g(x^*, \mu _0) \ge h(\mu ) - h(\mu _0) \ge - \lambda \beta (d(\mu , \mu _0)) \quad \forall (x, \mu ) \in A \times B(\mu _0, r). \end{aligned}$$

Thus, the problem (\(\mathcal {P}_{\mu _0}\)) is \(\beta \)-calm at \(x^*\).

Sufficiency. Taking into account the facts that the set \(A\) is compact and the function \(g(x, \mu )\) is l.s.c., one gets that the function \(g(\cdot , \mu _0)\) attains the global minimum on the set \(A\), and the set \(A^*\) of all global minimizers of \(g(\cdot , \mu _0)\) on \(A\) is compact. Furthermore, from the exact penalization condition it follows that \(A^*\) is also the set of all optimal solutions of the problem (\(\mathcal {P}\)). Hence, by assumption, the problem (\(\mathcal {P}_{\mu _0}\)) is \(\beta \)-calm at every \(x^* \in A^*\). Therefore for any \(x^* \in A^*\) there exist \(\lambda (x^*) \ge 0\), \(r(x^*) > 0\) and a neighbourhood \(U(x^*)\) of \(x^*\) such that

$$\begin{aligned} g(x, \mu ) - g\left( x^*, \mu _0\right) \ge - \lambda \left( x^*\right) \beta (d(\mu , \mu _0)) \quad \forall (x, \mu ) \in U\left( x^*\right) \times B\left( \mu _0, r\left( x^*\right) \right) . \end{aligned}$$

Applying the compactness of \(A^*\) one obtains that there exist \(x_1^*, x_2^*, \ldots , x_n^* \in A^*\) such that \(A^* \subset \bigcup _{k = 1}^n U(x^*_k)\). Denote

$$\begin{aligned} U = \bigcup _{k = 1}^n U\left( x_k^*\right) , \quad \overline{\lambda } = \max _{k \in 1:n} \lambda \left( x_k^*\right) , \quad \overline{r} = \min _{k \in 1:n} r\left( x_k^*\right) . \end{aligned}$$

Then for any \(x \in U\) one has

$$\begin{aligned} g(x, \mu ) - h(\mu _0) \ge - \overline{\lambda } \beta (d(\mu , \mu _0)) \quad \forall \mu \in B(\mu _0, \overline{r}), \end{aligned}$$
(9)

due to the fact that \(g(x^*, \mu _0) = h(\mu _0) = \inf _{x \in A} g(x, \mu _0)\) for any \(x^* \in A^*\).

Set \(K = A {\setminus } U\). Since \(A^* \subset U\), for any \(x \in K\) one has \(g(x, \mu _0) > h(\mu _0)\) (recall that \(A^*\) is the set of all global minimizers of \(g(\cdot , \mu _0)\) on \(A\)). Consequently, applying the lower semicontinuity of the function \(g(x, \mu )\), one gets that for any \(x \in K\) there exist \(\delta (x) > 0\) and a neighbourhood \(V(x)\) of \(x\) such that

$$\begin{aligned} g(y, \mu ) > h(\mu _0) \quad \forall (y, \mu ) \in V(x) \times B\left( \mu _0, \delta (x)\right) . \end{aligned}$$

Observe that since \(U\) is open and \(A\) is compact, the set \(K = A {\setminus } U\) is also compact. Hence, with the use of the inequality above, it is easy to show that there exists \(\delta > 0\) such that

$$\begin{aligned} g(x, \mu ) > h(\mu _0) \quad \forall (x, \mu ) \in K \times B(\mu _0, \delta ). \end{aligned}$$

Therefore taking into account (9) one gets that

$$\begin{aligned} g(x, \mu ) - h(\mu _0) \ge - \overline{\lambda } \beta (d(\mu , \mu _0)) \quad \forall (x, \mu ) \in A \times B\left( \mu _0, \min \{ \delta , \overline{r} \}\right) \!, \end{aligned}$$

which yields

$$\begin{aligned} h(\mu ) - h(\mu _0) \ge - \overline{\lambda } \beta (d(\mu , \mu _0)) \quad \forall \mu \in B(\mu _0, \min \{ \delta , \overline{r} \}). \end{aligned}$$

Thus, the optimal value function \(h\) is \(\beta \)-calm from below at \(\mu _0\). \(\square \)

Combining Theorems 1 and 2 with Proposition 1, one obtains the following result.

Corollary 1

Let the set \(A\) be compact, the function \(g(x, \mu )\) be l.s.c. on \(A \times B(\mu _0, r)\) for some \(r > 0\), and the function \(\beta \) be strictly increasing. Then the penalty function \(F_{\lambda }\) is exact if and only if it is exact at every optimal solution of the problem (\(\mathcal {P}\)), and there exists \(\lambda _0 \ge 0\) such that \(F_{\lambda _0}(x, \mu )\) is bounded from below on \(A \times P\).

4 Smooth exact penalty functions

Let us apply the theory developed in the previous section to the study of smooth exact penalty functions. Let \(X\) and \(Y\) be metric spaces, \(A \subset X\) be a nonempty set, and \(\Phi :X \rightrightarrows Y\) be a given set-valued mapping with closed images. For any subset \(C \subset X\) and \(x_0 \in X\) denote by

$$\begin{aligned} d(x_0, C) = \inf _{x \in C} d(x_0, x) \end{aligned}$$

the distance between \(C\) and \(x_0\). For any \(y \in Y\) denote, as usual, \(\Phi ^{-1}(y) = \{ x \in X \mid y \in \Phi (x) \}\).

Fix an element \(y_0 \in Y\), and consider the following optimization problem:

$$\begin{aligned} \min f(x) \quad \text {subject to} \quad y_0 \in \Phi (x), \quad x \in A. \end{aligned}$$
(10)

Note that the set \(\Omega \) of feasible solutions of this problem has the form \(\Omega = \Phi ^{-1}(y_0) \cap A\).

Following the general technique proposed above and the method of smooth exact penalty functions [6], define \(P = \mathbb {R}_+\), fix a non-decreasing function \(\phi :\mathbb {R}_+ \cup \{ +\infty \} \rightarrow \mathbb {R}_+ \cup \{ +\infty \}\) such that \(\phi (t) = 0\) iff \(t = 0\), and introduce the perturbed objective function

$$\begin{aligned} g(x, \mu ) = {\left\{ \begin{array}{ll} f(x), &{} \text { if }\quad x \in \Omega , \mu = 0, \\ + \infty , &{} \text { if }\quad x \notin \Omega , \mu = 0, \\ f(x) + \frac{1}{\mu } \phi \left( d\left( y_0, \Phi (x)\right) ^2\right) , &{} \text { if }\quad \mu > 0. \end{array}\right. } \end{aligned}$$

Clearly, the function \(g(x, \mu )\) satisfies the consistency condition and the exact penalization condition with \(\mu _0 = 0\).

Introduce the penalty function

$$\begin{aligned} F_{\lambda }(x, \mu ) = g(x, \mu ) + \lambda \beta (\mu ), \end{aligned}$$

where \(\beta :\mathbb {R}_+ \rightarrow \mathbb {R}_+ \cup \{ + \infty \}\) is a non-decreasing function such that \(\beta (\mu ) = 0\) iff \(\mu = 0\). Let us obtain sufficient conditions for \(F_{\lambda }(x, \mu )\) to be exact. In order to formulate these conditions, recall that a set-valued mapping \(\Phi \) is said to be metrically subregular with respect to the set \(A\) with constant \(a > 0\) at a point \((\overline{x}, \overline{y}) \in X \times Y\), where \(\overline{y} \in \Phi (\overline{x})\) and \(\overline{x} \in A\), if there exists a neighbourhood \(U\) of \(\overline{x}\) such that

$$\begin{aligned} d\left( \Phi (x), \overline{y}\right) \ge a d\left( x, \Phi ^{-1}(\overline{y}) \cap A\right) \quad \forall x \in U \cap A. \end{aligned}$$

Thus, \(\Phi \) is metrically subregular with respect to the set \(A\) iff the restriction of \(\Phi \) to \(A\) is metrically subregular in the usual sense. See [17, 18] and references therein for extensive study of metric subregularity.
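The following simple example (ours) indicates why some regularity of this kind cannot be dispensed with. Let \(X = Y = A = \mathbb {R}\), \(\Phi (x) = \{ x^2 \}\) and \(y_0 = 0\), so that \(\Phi ^{-1}(0) = \{ 0 \}\) and

$$\begin{aligned} \frac{d\left( \Phi (x), 0\right) }{d\left( x, \Phi ^{-1}(0)\right) } = \frac{x^2}{|x|} = |x| \rightarrow 0 \quad \text {as} \quad x \rightarrow 0, \end{aligned}$$

i.e. \(\Phi \) is not metrically subregular at \((0, 0)\). For \(f(x) = -x\), \(\phi (t) \equiv t\) and \(\beta (\mu ) \equiv \mu \) the corresponding penalty function, \(F_{\lambda }(x, \mu ) = - x + x^4 / \mu + \lambda \mu \) for \(\mu > 0\), is then not exact at the unique feasible point \(x^* = 0\) for any \(\lambda \ge 0\), since \(F_{\lambda }(t, 2 t^3) = - t / 2 + 2 \lambda t^3 < 0 = F_{\lambda }(0, 0)\) for all sufficiently small \(t > 0\), while \((t, 2 t^3) \rightarrow (0, 0)\).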

Theorem 3

Let \(x^* \in {{\mathrm{dom}}}f\) be a local minimum of the problem (10), the function \(f\) be Lipschitz continuous near \(x^*\), and the set-valued mapping \(\Phi \) be metrically subregular with respect to the set \(A\) with constant \(a > 0\) at \((x^*, y_0)\). Suppose also that the following assumptions are satisfied:

  1. there exist \(\phi _0 > 0\) and \(t_0 > 0\) such that \(\phi (t) \ge \phi _0 t\) for any \(t \in [0, t_0]\);

  2. there exist \(\beta _0 > 0\) and \(\overline{\mu } > 0\) such that \(\beta (\mu ) \ge \beta _0 \mu \) for any \(\mu \in [0, \overline{\mu }]\).

Then the penalty function \(F_{\lambda }(x, \mu )\) is exact at the point \(x^*\). Moreover, one has

$$\begin{aligned} \lambda (x^*) \le \frac{L^2}{4 \phi _0 \beta _0 a^2}, \end{aligned}$$
(11)

where \(L \ge 0\) is a Lipschitz constant of \(f\) near \(x^*\).

Proof

Since \(x^*\) is a local minimum of the problem (10), there exists \(\rho > 0\) such that \(f(x) \ge f(x^*)\) for any \(x \in B\left( x^*, \rho \right) \cap \Omega \). Suppose, for a moment, that there exists \(\delta \in (0, \rho )\) such that

$$\begin{aligned} f(x) - f\left( x^*\right) \ge - L d(x, \Omega ) \quad \forall x \in B\left( x^*, \delta \right) {\setminus } \Omega , \end{aligned}$$
(12)

where \(L \ge 0\) is a Lipschitz constant of \(f\) near \(x^*\). Then, applying the metric subregularity of \(\Phi \) with respect to \(A\) and the fact that the function \(\phi \) is non-decreasing, one gets that for any \(\mu > 0\) and \(x \in B(x^*, r) \cap A\), where \(r = \min \{ \delta , \sqrt{t_0}/a \}\) (decreasing \(r\), if necessary, so that \(B(x^*, r)\) lies within the neighbourhood \(U\) from the definition of metric subregularity), the following inequalities hold true

$$\begin{aligned} g(x, \mu ) - g\left( x^*, 0\right)= & {} f(x) - f\left( x^*\right) + \frac{1}{\mu } \phi \left( d\left( y_0, \Phi (x)\right) ^2\right) \\\ge & {} - L d(x, \Omega ) + \frac{\phi _0 a^2}{\mu } d\left( x, \Phi ^{-1}(y_0) \cap A\right) ^2 \\= & {} - L d(x, \Omega ) + \frac{\phi _0 a^2}{\mu } d\left( x, \Omega \right) ^2\!. \end{aligned}$$

Note that the quadratic function \(\omega (t) = - L t + \phi _0 a^2 t^2 / \mu \) attains its global minimum at the point \(t^* = \mu L / 2 \phi _0 a^2\), and

$$\begin{aligned} \omega \left( \frac{\mu L}{2 \phi _0 a^2} \right) = - \frac{L^2}{4 \phi _0 a^2} \mu . \end{aligned}$$

Hence for any \(\mu > 0\) and \(x \in B(x^*, r) \cap A\) one has

$$\begin{aligned} g(x, \mu ) - g(x^*, 0) \ge - \frac{L^2}{4 \phi _0 a^2} \mu . \end{aligned}$$
(13)

On the other hand, if \(x \in B(x^*, r) \cap A\) and \(\mu = 0\), then either \(x \notin \Omega \) and \(g(x, \mu ) = + \infty \ge g(x^*, 0)\), or \(x \in \Omega \) and \(g(x, \mu ) = f(x) \ge f(x^*) = g(x^*, 0)\) (recall that \(r \le \delta < \rho \)). Therefore the inequality (13) is satisfied for any \(x \in B(x^*, r) \cap A\) and \(\mu \ge 0\), which yields that

$$\begin{aligned} F_{\lambda }(x, \mu ) = g(x, \mu ) + \lambda \beta (\mu ) \ge g(x, \mu ) + \lambda \beta _0 \mu \ge g\left( x^*, 0\right) = F_{\lambda }\left( x^*, 0\right) \end{aligned}$$

for all \((x, \mu ) \in \left( B(x^*, r) \cap A \right) \times [0, \overline{\mu }]\) and for any \(\lambda \ge L^2 / 4 \phi _0 \beta _0 a^2\). Thus, the penalty function \(F_{\lambda }(x, \mu )\) is exact at \(x^*\), and (11) holds true.

It remains to show that the inequality (12) is valid for some \(\delta > 0\). Indeed, fix \(x \in B(x^*, \rho / 2) {\setminus } \Omega \). By the definition of the distance between a point and a set there exists a sequence \(\{ x_n \} \subset \Omega \) such that \(d(x, x_n) \rightarrow d(x, \Omega )\) as \(n \rightarrow \infty \). Moreover, without loss of generality one can suppose that \(d(x, x_n) \le \rho / 2\) for any \(n \in \mathbb {N}\), since \(d(x, x^*) \le \rho / 2\) and \(x^* \in \Omega \). Consequently, one has

$$\begin{aligned} d\left( x_n, x^*\right) \le d\left( x_n, x\right) + d\left( x, x^*\right) \le \frac{\rho }{2} + \frac{\rho }{2} = \rho , \end{aligned}$$

which implies \(f(x_n) \ge f(x^*)\). Therefore, applying the Lipschitz continuity of \(f\) near \(x^*\), one obtains that for any \(n \in \mathbb {N}\) the following inequalities hold true

$$\begin{aligned} f(x) - f\left( x^*\right) = f(x) - f\left( x_n\right) + f\left( x_n\right) - f\left( x^*\right) \ge f(x) - f\left( x_n\right) \ge - L d\left( x, x_n\right) \!. \end{aligned}$$

Passing to the limit as \(n \rightarrow \infty \) one gets the desired result. \(\square \)
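To illustrate the estimate (11), consider the toy instance from the sketch in Sect. 2 (our computation): \(f(x) = x_1 + x_2\), \(\Phi (x) = \{ x_1^2 + x_2^2 - 2 \}\), \(y_0 = 0\), \(A = [-2, 2]^2\), \(\phi (t) \equiv t\) and \(\beta (\mu ) \equiv \mu \), so that \(\phi _0 = \beta _0 = 1\). Here \(x^* = (-1, -1)\), one can take \(L = \sqrt{2}\) in the Euclidean norm, and since \(d\left( x, \Phi ^{-1}(0)\right) = \big | \Vert x \Vert - \sqrt{2} \big |\) and \(\big | \Vert x \Vert ^2 - 2 \big | = \left( \Vert x \Vert + \sqrt{2}\right) \big | \Vert x \Vert - \sqrt{2} \big | \ge \sqrt{2} \, \big | \Vert x \Vert - \sqrt{2} \big |\), the mapping \(\Phi \) is metrically subregular with respect to \(A\) with constant \(a = \sqrt{2}\). Consequently,

$$\begin{aligned} \lambda (x^*) \le \frac{L^2}{4 \phi _0 \beta _0 a^2} = \frac{2}{4 \cdot 1 \cdot 1 \cdot 2} = \frac{1}{4}. \end{aligned}$$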

Applying Corollary 1 and the theorem above one can easily obtain sufficient conditions for the penalty function \(F_{\lambda }\) to be exact.

Theorem 4

Let the set \(A\) be compact. Suppose that the following assumptions are satisfied:

  1. \(f\) is l.s.c. on \(A\), and Lipschitz continuous near every optimal solution of the problem (10);

  2. \(\Phi \) is metrically subregular with respect to the set \(A\) at \((x^*, y_0)\) for any optimal solution \(x^*\) of the problem (10);

  3. the mapping \(x \rightarrow d(y_0, \Phi (x))\) is continuous on \(A\);

  4. \(\phi \) is l.s.c., and there exist \(\phi _0 > 0\) and \(t_0 > 0\) such that \(\phi (t) \ge \phi _0 t\) for any \(t \in [0, t_0]\);

  5. \(\beta \) is strictly increasing, and there exist \(\beta _0 > 0\) and \(\overline{\mu } > 0\) such that \(\beta (\mu ) \ge \beta _0 \mu \) for any \(\mu \in [0, \overline{\mu }]\).

Then the penalty function \(F_{\lambda }\) is exact.

Proof

Applying Theorem 3 and taking into account the assumptions of the theorem, one obtains that the penalty function \(F_{\lambda }\) is exact at every optimal solution of the problem (10). Since \(f\) is l.s.c. on \(A\) and the set \(A\) is compact, \(f\) is bounded from below on this set. Hence the function \(g(x, \mu )\) is bounded from below on \(A \times \mathbb {R}_+\), which implies that the penalty function \(F_{\lambda }\) is also bounded from below on \(A \times \mathbb {R}_+\) for any \(\lambda \ge 0\).

Let us show that the function \(g(x, \mu )\) is l.s.c. on \(A \times \mathbb {R}_+\). Then with the use of Corollary 1 one obtains the desired result.

For any \(\varepsilon > 0\) introduce the function

$$\begin{aligned} g_{\varepsilon }(x, \mu ) = f(x) + \frac{1}{\mu + \varepsilon } \phi \left( d\left( y_0, \Phi (x)\right) ^2\right) \quad (x, \mu ) \in A \times \mathbb {R}_+. \end{aligned}$$

Taking into account the fact that the function \(x \rightarrow d(y_0, \Phi (x))\) is continuous on \(A\), and \(\phi \) is l.s.c., one gets that the function \(x \rightarrow \phi (d(y_0, \Phi (x))^2)\) is l.s.c. on \(A\) as well. Hence and from the lower semicontinuity of \(f\) it follows that the function \(g_{\varepsilon }(x, \mu )\) is l.s.c. on \(A \times \mathbb {R}_+\). Note that

$$\begin{aligned} g(x, \mu ) = \sup _{\varepsilon > 0} g_{\varepsilon }(x, \mu ) \quad \forall (x, \mu ) \in A \times \mathbb {R}_+. \end{aligned}$$

Therefore the function \(g(x, \mu )\) is l.s.c. on \(A \times \mathbb {R}_+\) as the supremum of a family of l.s.c. functions. \(\square \)

Theorem 3 can be extended to the case of more general functions \(g(x, \mu )\) and \(F_{\lambda }(x, \mu )\). In particular, let

$$\begin{aligned} g(x, \mu ) = {\left\{ \begin{array}{ll} f(x), &{} \text { if }\quad x \in \Omega , \mu = 0, \\ + \infty , &{} \text { if }\quad x \notin \Omega , \mu = 0, \\ f(x) + \frac{1}{\mu ^{\alpha }} \phi \left( d\left( y_0, \Phi (x)\right) ^2\right) , &{} \text { if }\quad \mu > 0, \end{array}\right. } \end{aligned}$$

and

$$\begin{aligned} F_{\lambda }(x, \mu ) = g(x, \mu ) + \lambda \beta (\mu ), \end{aligned}$$

where \(\alpha > 0\) (cf. [8, 9, 11]). The following result holds true.

Theorem 5

Let \(x^* \in {{\mathrm{dom}}}f\) be a local minimum of the problem (10), the function \(f\) be Lipschitz continuous near \(x^*\), and the set-valued mapping \(\Phi \) be metrically subregular with respect to the set \(A\) at \((x^*, y_0)\). Suppose that the following assumptions are satisfied:

  1. \(\phi (t) \ge \phi _0 t^{\gamma }\) for any \(t \in [0, t_0]\), and for some \(\phi _0 > 0\), \(\gamma > 0\) and \(t_0 > 0\);

  2. \(\beta (\mu ) \ge \beta _0 \mu ^{\sigma }\) for any \(\mu \in [0, \overline{\mu }]\), and for some \(\beta _0 > 0\), \(\sigma > 0\) and \(\overline{\mu } > 0\).

Suppose also that

$$\begin{aligned} \gamma > \frac{1}{2}, \quad \sigma \le \frac{\alpha }{2 \gamma - 1}. \end{aligned}$$

Then the penalty function \(F_{\lambda }(x, \mu )\) is exact at the point \(x^*\).

Proof

Arguing in the same way as in the proof of Theorem 3 one can easily verify that there exists \(\Theta > 0\) such that

$$\begin{aligned} g(x, \mu ) - g\left( x^*, 0\right) \ge - \Theta \mu ^{\frac{\alpha }{2 \gamma - 1}} \quad \forall x \in B\left( x^*, r\right) \cap A \quad \forall \mu \ge 0, \end{aligned}$$

where \(r > 0\) is sufficiently small. Then, applying the assumption on the function \(\beta \), one can show that \(F_{\lambda }\) is exact at \(x^*\), provided \(\sigma \le \alpha / (2 \gamma - 1)\). \(\square \)
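For completeness, let us record the computation behind the constant \(\Theta \) (our computation). As in the proof of Theorem 3, for \(\mu > 0\) and \(x \in B(x^*, r) \cap A\) one has \(g(x, \mu ) - g(x^*, 0) \ge - L t + \phi _0 a^{2 \gamma } t^{2 \gamma } / \mu ^{\alpha }\), where \(t = d(x, \Omega )\). For \(\gamma > 1/2\) the right-hand side is convex in \(t\) and attains its global minimum on \(\mathbb {R}_+\) at

$$\begin{aligned} t^* = \left( \frac{L \mu ^{\alpha }}{2 \gamma \phi _0 a^{2 \gamma }} \right) ^{\frac{1}{2 \gamma - 1}}, \quad \text {which gives} \quad \Theta = \frac{2 \gamma - 1}{2 \gamma } \, L \left( \frac{L}{2 \gamma \phi _0 a^{2 \gamma }} \right) ^{\frac{1}{2 \gamma - 1}}. \end{aligned}$$

Finally, \(\lambda \beta (\mu ) \ge \lambda \beta _0 \mu ^{\sigma }\) dominates \(\Theta \mu ^{\alpha / (2 \gamma - 1)}\) for all sufficiently small \(\mu > 0\) precisely when \(\sigma \le \alpha / (2 \gamma - 1)\) and \(\lambda \) is sufficiently large, which yields the local exactness of \(F_{\lambda }\) at \(x^*\).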