1 Introduction

The penalty function algorithm is an important method for solving constrained optimization problems: by augmenting the objective with a penalty term, it transforms the original problem into a single unconstrained (or simply constrained) problem or into a sequence of unconstrained problems. For classical penalty function algorithms, the penalty parameter must be driven to infinity in a limiting sense to recover an optimal solution of the original problem, which incurs numerical instability in implementation. To avoid this difficulty, we require the penalty function to satisfy the following property: it recovers an exact solution of the original problem for reasonable finite values of the penalty parameter. A penalty function possessing this property is said to be exact. However, it should be noted that some exact penalty functions have the disadvantage that they either require Jacobian information [7, 9, 14, 15, 20] or are no longer smooth (the \(l_1\) or \(l_\infty \) penalty functions, etc. [1, 3, 4, 6, 12, 16, 18, 19, 21, 22]).

In this paper, we are mainly concerned with the following nonlinear programming problem

$$\begin{aligned} (P)&\min ~f(x)\\&\mathrm{s.t.}~ F(x)=0,\quad x\in [u,v], \end{aligned}$$

where \([u,v]=\{x\in \mathbb{R }^n | u\le x\le v \},\) \(u\in (\{-\infty \}\cup \mathbb{R })^n,\) \(v\in (\{+\infty \}\cup \mathbb{R })^n\) and \(\mathrm{int}[u,v]\ne \emptyset \). The functions \(f:D\rightarrow \mathbb{R }\) and \(F:D\rightarrow \mathbb{R }^m\) are continuously differentiable on an open set \(D\) satisfying \([u,v]\subset D.\) We always assume that the feasible region is nonempty and that \(f\) is bounded below on \(D\); the latter is no restriction, since \(f\) can otherwise be replaced by \(e^f\).

Let \(\omega \in \mathbb{R }^m\) be fixed. The problem \((P)\) can be written equivalently as

$$\begin{aligned} \min&f(x)\\ \mathrm{s.t.}&F(x)=\varepsilon \omega ,\\&x\in [u,v],~~\varepsilon =0. \end{aligned}$$

For this problem, Huyer and Neumaier [13] introduced the following exact penalty function

$$\begin{aligned}&f_\sigma (x,\varepsilon )=\left\{ \begin{array}{l@{\quad }l} f(x),&{}\hbox {if}\; \varepsilon =\Delta (x,\varepsilon )=0; \\ f(x)+\frac{1}{2\varepsilon }\frac{\Delta (x,\varepsilon )}{1-q\Delta (x,\varepsilon )}+\sigma \beta (\varepsilon ), &{}\hbox {if}\;\varepsilon >0 , \Delta (x,\varepsilon )<q^{-1};\\ +\infty , &{}\hbox {otherwise},\\ \end{array} \right. \end{aligned}$$
(1.1)

where \(q>0\) is a given constant, \(\sigma >0\) is a penalty parameter, and \(\beta : [0,\overline{\varepsilon }]\rightarrow [0,\infty )\), with \(\overline{\varepsilon }>0\), is continuous on \([0,\overline{\varepsilon }]\), continuously differentiable on \((0,\overline{\varepsilon }]\), and satisfies \(\beta (0)=0\). The term \(\Delta (x,\varepsilon )\) measures the violation of the constraints, i.e.,

$$\begin{aligned} \Delta (x,\varepsilon )=\Vert F(x)-\varepsilon \omega \Vert ^2=\sum \limits _{j=1}^m \big (F_j(x)-\varepsilon \omega _j\big )^2. \end{aligned}$$

The penalty problem corresponding to (1.1) is

$$\begin{aligned} (P_\sigma )&\min ~f_\sigma (x,\varepsilon )\\&\mathrm{s.t.}~(x,\varepsilon )\in [u,v]\times [0,\overline{\varepsilon }]. \end{aligned}$$

Note that \(f_\sigma \) is continuously differentiable on \(D_q=\{(x,\varepsilon )\in D\times (0,\overline{\varepsilon })| \Delta (x,\varepsilon )<q^{-1}\}\). The exact penalty property of \(f_\sigma \) is discussed thoroughly in [13]: for the smooth case, it is established under the assumptions that \(D_F\) (see (3.1)) is bounded and that each \(x\in D_F\) satisfies the Mangasarian-Fromovitz constraint qualification [13, Theorem 2.1]; for the nonsmooth case of \((P)\), sufficient conditions are established via regularity techniques and other rather strong conditions [13, Theorem 5.3].

The above penalty function \(f_{\sigma }\) is simple and exact. Here, we call a penalty function “simple” if it involves only the objective and constraint functions of the original problem \((P)\) and does not involve their gradients or the Jacobian. What chiefly distinguishes \(f_\sigma \) from the classical simple exact penalty functions is its good smoothness property, which the latter do not enjoy [2, 10]. In this paper, we restrict our focus primarily to the case where \((P)\) is smooth. The main results are as follows.

  1. (i)

    The term \(\frac{\Delta (x,\varepsilon )}{1-q\Delta (x,\varepsilon )}\) in (1.1) plays the role of a barrier term. However, computing an interior point is not easy in practical applications. We therefore extend this term to a class of convex functions. This allows us to present a unified framework for some barrier-type and exterior-type penalty functions, in which the latter are smooth on \([u,v]\times (0,{\overline{\varepsilon }})\) and have a larger smooth region than the former. The extended penalty function class (2.1) in Sect. 2 provides several alternatives in the design of penalty function algorithms.

  2. (ii)

    Based on (2.1) in Sect. 2, the corresponding penalty problems for \((P)\) are proposed. We first introduce the extended Mangasarian-Fromovitz constraint qualification, which generalizes the classical Mangasarian-Fromovitz constraint qualification from a single point to an infinite sequence. A sufficient condition for the exact penalty property of \(\widetilde{f}_\sigma \) is established, provided that \(\nabla f\) is bounded on \(D_F\) and the extended Mangasarian-Fromovitz constraint qualification holds (see Theorem 3.1). This generalizes [13, Theorem 2.1]. Furthermore, necessary conditions for the exact penalty property are obtained (see Theorem 3.2).

  3. (iii)

    By means of a series of technical lemmas, we show that these necessary conditions are also necessary and sufficient conditions for the inverse proposition of exact penalization (see Theorem 4.1). This result precisely characterizes the equivalence between \(\widetilde{f}_\sigma \) and the classical simple exact penalty functions in the sense of the exactness property. It demonstrates that \(\widetilde{f}_\sigma \) possesses the same exactness property as the classical simple penalty functions, together with a smoothness property that the classical simple exact penalty functions lack.

  4. (iv)

    It should be mentioned that, even if the original programming problem has optimal solutions, a penalty function may fail to locate one. In this case, it is impossible to obtain optimal solutions via penalty function algorithms that require an optimal solution of the penalty subproblem at each iteration. This may occur even for smooth penalty functions that are bounded below. For example, although the penalty function proposed in [8] is smooth and bounded from below, there is an example showing that it may fail to locate optimal solutions even though the original problem has one. This renders the penalty function algorithm proposed in [8] and its revised version inapplicable. In addition, some papers discuss the convergence of iterative sequences generated by penalty function algorithms to FJ points or KKT points; see [5, 17, 20]. In order to avoid the situations mentioned above, we present a class of revised penalty function algorithms. The main feature of this class of algorithms is that, if an optimal solution of the penalty subproblem does not exist, we resort to finding a \(\delta \)-optimal solution of the penalty subproblem. Note that a \(\delta \)-optimal solution always exists, because the penalty functions are bounded from below. This ensures that the revised penalty function algorithms are always well defined. In addition, utilizing the exact penalty property and the structural features of the class of penalty functions, we show that, under certain conditions, the proposed algorithm has the finite termination property, i.e., it terminates at an optimal solution of the original problem after finitely many iterations; otherwise, by a perturbation theorem, the global convergence property is obtained, that is, every accumulation point of the sequence generated by the algorithm is an optimal solution of the original problem.

The paper is organized as follows. Section 2 extends the penalty function (1.1) proposed by Huyer and Neumaier to a class of penalty functions (2.1), establishes the corresponding penalty problems, and introduces the extended Mangasarian-Fromovitz constraint qualification. Necessary and sufficient conditions for the exact penalty property are developed in Sect. 3. In Sect. 4, necessary and sufficient conditions for the inverse propositions of exact penalization are established; in addition, the equivalence between the new class of penalty functions and the classical simple exact penalty functions in the sense of the exactness property is established. Section 5 is devoted to the revised penalty function algorithm, where the finite termination property and the global convergence property of the proposed algorithm are discussed. Numerical results are reported in Sect. 6.

2 A class of exact penalty functions

In this section, we extend the term \(\frac{\Delta (x,\varepsilon )}{1-q\Delta (x,\varepsilon )}\) in (1.1) to a class of convex functions. This allows us to present a unified framework for some barrier-type and exterior-type penalty functions. Given \(a\in (0,+\infty ]\), let a function \(\phi :[0,a)\rightarrow [0,+\infty )\) satisfy

\((i_1)\) \(\phi \) is convex and continuously differentiable on \([0,a)\) with \(\phi (0)=0\).

\((i_2)\) \(\phi '(t)>0\) for all \(t\in [0,a)\).

Many functions satisfy the conditions \((i_1),(i_2)\), for example

$$\begin{aligned} \begin{array}{ll} \phi _1(t)=\frac{t}{(1-qt)^\alpha } &{} (a=q^{-1}, \alpha \ge 1);\\ \phi _2(t)=\tan (t) &{} (a=\frac{\pi }{2}); \\ \phi _3(t)=-\log (1-t^\alpha ) &{} (a=1, \alpha \ge 1);\\ \phi _4(t)=t &{} (a=+\infty );\\ \phi _5(t)=e^t-1 &{} (a=+\infty );\\ \phi _6(t)=\frac{1}{2}(\sqrt{t^2+4}+t)-1 &{} (a=+\infty ). \end{array} \end{aligned}$$

Utilizing \(\phi \), a penalty function is given by

$$\begin{aligned}&\widetilde{f}_\sigma (x,\varepsilon )=\left\{ \begin{array}{l@{\quad }l} f(x),&{} \hbox {if}~ \varepsilon =\Delta (x,\varepsilon )=0, \\ f(x)+\frac{1}{2\varepsilon }\phi (\Delta (x,\varepsilon ))+\sigma \beta (\varepsilon ), &{}\hbox {if }\varepsilon >0 , \Delta (x,\varepsilon )<a, \\ +\infty , &{}\hbox {otherwise}. \end{array} \right. \end{aligned}$$
(2.1)

It is easy to see that \(\widetilde{f}_\sigma \) is continuously differentiable on \(D_a=\{(x,\varepsilon )\in D\times (0,\overline{\varepsilon })|\Delta (x,\varepsilon )<a\}\). The barrier-type penalty functions correspond to the case of \(a<+\infty \), and the exterior-type ones correspond to the case of \(a=+\infty \).
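
To make the construction concrete, the following Matlab fragment (Matlab being the environment used for the experiments in Sect. 6) sketches how \(\widetilde{f}_\sigma \) in (2.1) can be evaluated; the name fsigma and the calling convention are our own illustrative choices, not taken from [13].

```matlab
% Minimal sketch of how the penalty function (2.1) can be evaluated.
% f, F, phi, beta are function handles, a is the right endpoint of the
% domain of phi (a = Inf for exterior-type phi), and sigma > 0, omega
% are as in Sect. 1.  All names are our own illustrative choices.
function val = fsigma(f, F, phi, a, beta, sigma, omega, x, epsilon)
  Delta = norm(F(x) - epsilon*omega)^2;   % constraint violation Delta(x,eps)
  if epsilon == 0 && Delta == 0
      val = f(x);                         % feasible branch of (2.1)
  elseif epsilon > 0 && Delta < a
      val = f(x) + phi(Delta)/(2*epsilon) + sigma*beta(epsilon);
  else
      val = Inf;                          % outside the domain of phi
  end
end
```

For instance, the barrier-type \(\phi _1\) with \(\alpha =1\) corresponds to the handle phi = @(t) t./(1-q*t) with a = 1/q, while the exterior-type \(\phi _6\) corresponds to phi = @(t) (sqrt(t.^2+4)+t)/2 - 1 with a = Inf.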

The corresponding penalty problem is defined as

$$\begin{aligned} (\widetilde{P}_\sigma )&\min ~\widetilde{f}_\sigma (x,\varepsilon )\\&\mathrm{s.t.}~(x,\varepsilon )\in [u,v]\times [0,\overline{\varepsilon }]. \end{aligned}$$

Let us first recall the Mangasarian-Fromovitz constraint qualification. Consider

$$\begin{aligned} \left\{ \begin{array}{l} F(z)= 0, \\ g(z)\le 0,\\ \end{array} \right. \end{aligned}$$
(2.2)

where \(g:D\rightarrow \mathbb{R }^l\) is continuously differentiable and \(F\) is defined as before. Denote by \(\nabla F\in \mathbb{R }^{n\times m}\) the transpose of the Jacobian of \(F\).

Definition 2.1

For constraint system (2.2), we say that Mangasarian-Fromovitz constraint qualification (hereafter, MFCQ for short) holds at \(z^*\in \mathbb{R }^n\), if

  1. (1)

    \(g(z^*)\le 0\) and \(\mathrm{rank}(\nabla F(z^*))=m\);

  2. (2)

    there exists \(p\in \mathbb{R }^n\) such that \(\nabla F(z^*)^T p=0\) and \(\nabla g_j(z^*)^Tp<0\) for all \(j\in J(z^*)\),

where \(J(z^*)=\{j\in J|g_j(z^*)=0\}\) and \(J=\{1,2,\ldots ,l\}\).

For an infinite subset \(K\subset \{1,2,\ldots \}\) and a sequence \(\{z^k\}_{k\in K}\subset \mathbb{R }^n\), denote

$$\begin{aligned} J^+(K)=\{j\in J|\limsup \limits _{k\in K, k\rightarrow \infty } g_j(z^k)\ge 0\} \quad \mathrm{and} \quad J^-(K)=\{j\in J|\limsup \limits _{k\in K, k\rightarrow \infty } g_j(z^k)<0\}. \end{aligned}$$

Definition 2.2

For constraint system (2.2), we say that an extended Mangasarian-Fromovitz constraint qualification (hereafter, EMFCQ for short) holds for \(\{z^k\}_{k\in K}\), if there exist a matrix \(\nabla F^*\) and an infinite subset \(K_0\subset K\) such that

  1. (1)

    \(\lim \limits _{k\in K_0,k\rightarrow \infty }\nabla F(z^k)=\nabla F^*\) and \(\mathrm{rank}(\nabla F^*)=m\);

  2. (2)

    there exists \(p\in \mathbb{R }^n\) such that \((\nabla F^*)^T p=0\) and \(\limsup \limits _{k\in K_0,k\rightarrow \infty }\nabla g_j(z^k)^T p<0\) for all \(j\in J^+(K_0)\).

Obviously, if MFCQ holds at \(z^*\), then EMFCQ holds for the constant sequence \(\{z^k\}_{k\in K}\) satisfying \(z^k=z^*\) for all \(k\in K\). Furthermore, from Definitions 2.1 and 2.2, we readily get the following result.

Proposition 2.1

Let \(\{z^k\}_{k\in K}\subset \mathbb{R }^n\). If the standard MFCQ for (2.2) holds at an accumulation point \(z^*\) of the sequence \(\{z^k\}_{k\in K}\), then EMFCQ holds for \(\{z^k\}_{k\in K}\).

The following example shows that EMFCQ is suitable for an unbounded and infeasible sequence.

Example 2.1

Consider the following constraint system

$$\begin{aligned} \left\{ \begin{array}{lll} F(z)&{}=&{}z_1+z_2+z_3^2=0,\\ g_1(z)&{}=&{}z_1-z_2\le 0,\\ g_2(z)&{}=&{}z_1-z_2-z_3^2\le 0. \end{array} \right. \end{aligned}$$

It is easy to see that EMFCQ holds for the unbounded sequence \(\{z^k\}= \{(k+\frac{1}{k},k,1+\frac{1}{k})^T\}\): indeed, \(g_1(z^k)=\frac{1}{k}\rightarrow 0\) and \(g_2(z^k)\rightarrow -1<0\), so \(J^+(K)=\{1\}\); moreover, \(\nabla F(z^k)\rightarrow \nabla F^*=(1,1,2)^T\) with \(\mathrm{rank}(\nabla F^*)=1=m\), and \(p=(-1,1,0)^T\) satisfies \((\nabla F^*)^Tp=0\) and \(\nabla g_1(z^k)^Tp=-2<0\). A numerical check is sketched below.
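
The same verification can be carried out numerically; the following Matlab fragment (our own illustration, with the test direction \(p=(-1,1,0)^T\)) evaluates the quantities appearing in Definition 2.2 along the sequence:

```matlab
% Illustrative numerical check of EMFCQ for Example 2.1 (our own script,
% not taken from the paper).
p = [-1; 1; 0];                     % candidate direction with (gradF*)'*p = 0
for k = [1 10 100 1000]
    z     = [k + 1/k; k; 1 + 1/k]; % the unbounded, infeasible sequence z^k
    gradF = [1; 1; 2*z(3)];        % transpose of the Jacobian of F at z^k
    g1    = z(1) - z(2);           % = 1/k -> 0, so j = 1 lies in J^+(K_0)
    g2    = z(1) - z(2) - z(3)^2;  % -> -1 < 0, so j = 2 lies in J^-(K_0)
    fprintf('k=%4d  g1=%+8.4f  g2=%+8.4f  gradF''*p=%+.4f  dg1''*p=%+.4f\n', ...
            k, g1, g2, gradF'*p, [1; -1; 0]'*p);
end
```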

Now let us discuss EMFCQ in more detail. Clearly, Definition 2.1, Definition 2.2, and Example 2.1 show that EMFCQ generalizes MFCQ from a single point \(z^*\) with \(g(z^*)\le 0\) to an infinite sequence \(\{z^k\}_{k\in K}\), which may be unbounded and infeasible, i.e., it need not satisfy \(g(z^k)\le 0\). Therefore, EMFCQ can be used to weaken some assumptions on the exactness property in the existing literature; in particular, it can be used to remove the level-boundedness assumption. For instance, as stated in [13, Theorem 2.1], the exactness property of the penalty function \(f_{\sigma }(z,\varepsilon )\) is established provided that the level set \(D_F\) is bounded and every point \(z\in D_F\) satisfies MFCQ. These conditions ensure that the sequence \(\{z^k\}_{k\in K}\) is bounded and every accumulation point satisfies MFCQ, where \((z^k,\varepsilon _k)\) is a local optimal solution of the penalty problem \((\widetilde{P}_{\sigma _k})\). According to Proposition 2.1, EMFCQ then holds for \(\{z^k\}_{k\in K}\). Therefore, we can show that \(\widetilde{f}_\sigma (z,\varepsilon )\) is also exact (see Theorem 3.1 below) by requiring only the validity of EMFCQ for \(\{z^k\}_{k\in K}\) and the boundedness of \(\nabla f\) over \(D_F\). Consequently, [13, Theorem 2.1] is a special case of our result (see Corollary 3.1 and Remark 3.1 below).

Similarly, the level-boundedness assumption in the convergence analysis of penalty function methods can be weakened via EMFCQ. For example, in [8], a class of penalty function methods (see [8, Algorithm 1]) was proposed for nonlinear programming problems with inequality constraints. Notice that, in general, the iterates \(z^k\) generated by that algorithm are infeasible. Under the assumptions that the level set \(\Omega _\varepsilon \) is bounded and that all global optimal solutions of the original problem satisfy MFCQ (see [8, Assumptions \((H1)\) and \((H2)\)]), an infinite sequence \(\{z^k\}\) generated by [8, Algorithm 1] is proven to be feasible for \(k\) sufficiently large. Similarly to the discussion above, it follows from Assumptions \((H1)\), \((H2)\), and [8, Lemma 3] that \(\{z^k\}\) is bounded and every accumulation point satisfies MFCQ. Thus, we recover [8, Theorem 1] by replacing these assumptions with the requirements that EMFCQ hold for \(\{z^k\}_{k\in K}\) and that \(\nabla f\) be bounded on the level set. This indicates that [8, Theorem 1] is also a special case of our results.

3 Exact penalty property

This section deals mainly with the exactness property of the penalty functions (2.1). In particular, [13, Theorem 2.1] is generalized by using EMFCQ instead of the standard MFCQ. Define the level set

$$\begin{aligned} D_F=\big \{x\in [u,v]\big |\Vert F(x)\Vert \le \sqrt{a}+\overline{\varepsilon }\Vert \omega \Vert \big \}, \end{aligned}$$
(3.1)

where \(a\in (0,+\infty ]\). If \(a=+\infty \), then \(D_F\) reduces to \([u,v]\). Before proceeding, we need the following assumptions.

Assumption (A)

\((A_1)\) \(\nabla f\) is bounded on \(D_F\);

\((A_2)\) For any infinite subsequence \(\{\sigma _k\}_{k\in K}\rightarrow +\infty \), EMFCQ holds for \(\{x^k\}_{k\in K}\), where \((x^k,\varepsilon _k)\) is a local optimal solution of \((\widetilde{P}_{\sigma _k})\) with finite value.

Lemma 3.1

Suppose that there exists \(\beta _1>0\) such that \(\beta '(\varepsilon )\ge \beta _1\) for all \(\varepsilon \in (0,\overline{\varepsilon }]\). If \((x,\varepsilon )\) is a KKT point of \((\widetilde{P}_\sigma )\) with \(\varepsilon >0\), then

$$\begin{aligned} 2\beta _1\phi '(0)\sigma \varepsilon ^2\frac{1}{\phi '(\Delta )^2}\le \Vert F(x)\Vert ^2, \end{aligned}$$

where \(\Delta \) denotes \(\Delta (x,\varepsilon )\) for simplification.

Proof

If \((x,\varepsilon )\) is a KKT point of \((\widetilde{P}_\sigma )\) with \(\varepsilon >0\), then by the construction of \(\widetilde{f}_\sigma \), there exist \(\lambda ,~\eta \in \mathbb{R }^n_+,~\lambda _{n+1}\ge 0,\) and \(\eta _{n+1}\ge 0\) such that

$$\begin{aligned}&\nabla f(x)+\frac{1}{\varepsilon }\phi '(\Delta ) \nabla F(x) \big (F(x)-\varepsilon \omega \big )=\lambda -\eta , \nonumber \\&\inf (\lambda _i,x_i-u_i)=\inf (\eta _i,v_i-x_i)=0,\qquad \qquad i=1,2,\ldots ,n, \nonumber \\&-\frac{1}{2\varepsilon ^2}\phi (\Delta )-\frac{1}{\varepsilon }\phi '(\Delta ) \big (F(x)-\varepsilon \omega \big )^T\omega +\sigma \beta '(\varepsilon )=\lambda _{n+1}-\eta _{n+1}, \end{aligned}$$
(3.2)
$$\begin{aligned}&\lambda _{n+1}=\inf (\eta _{n+1},\overline{\varepsilon }-\varepsilon )=0. \end{aligned}$$
(3.3)

It follows from (3.2) and (3.3) that

$$\begin{aligned} -\frac{1}{2\varepsilon ^2}\phi (\Delta )-\frac{1}{\varepsilon } \phi '(\Delta )\big (F(x)-\varepsilon \omega \big )^T \omega +\sigma \beta '(\varepsilon )\le 0, \end{aligned}$$

which, together with the fact that \(\phi '(\Delta )>0\), gives

$$\begin{aligned} -\frac{1}{\phi '(\Delta )}\phi (\Delta )-2\varepsilon \big (F(x)-\varepsilon \omega \big )^T \omega +2\varepsilon ^2\sigma \beta '(\varepsilon )\frac{1}{\phi '(\Delta )}\le 0. \end{aligned}$$
(3.4)

Rearranging (3.4) yields

$$\begin{aligned} -\frac{1}{\phi '(\Delta )}\phi (\Delta )+\Delta +\varepsilon ^2\Vert \omega \Vert ^2+2\varepsilon ^2 \sigma \beta '(\varepsilon )\frac{1}{\phi '(\Delta )}\le \Vert F(x)\Vert ^2. \end{aligned}$$
(3.5)

The convexity of \(\phi \), the condition \((i_2)\) and the fact \(\phi (0)=0\) ensure that

$$\begin{aligned} -\Delta \le \big (\phi (0)-\phi (\Delta )\big )\frac{1}{\phi '(\Delta )} =-\phi (\Delta )\frac{1}{\phi '(\Delta )}. \end{aligned}$$
(3.6)

Combining (3.5) and (3.6) yields

$$\begin{aligned} 2\varepsilon ^2\sigma \beta '(\varepsilon )\frac{1}{\phi '(\Delta )} \le \Vert F(x)\Vert ^2. \end{aligned}$$

Taking into account that \(\beta '(\varepsilon )\ge \beta _1\) and that \(\phi '\) is nondecreasing with \(\phi '(\Delta )\ge \phi '(0)>0\) (since \(\phi \) is convex), the above inequality implies that

$$\begin{aligned} 2\beta _1\phi '(0)\sigma \varepsilon ^2\frac{1}{\phi '(\Delta )^2}\le \Vert F(x)\Vert ^2. \end{aligned}$$

\(\square \)
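
To see what Lemma 3.1 gives in the simplest case, consider the exterior-type choice \(\phi =\phi _4\), i.e., \(\phi (t)=t\) with \(\phi '\equiv 1\); the bound then specializes (this is our own illustration, not part of [13]) to

$$\begin{aligned} 2\beta _1\sigma \varepsilon ^2\le \Vert F(x)\Vert ^2,\quad \text {i.e.,}\quad \varepsilon \le \frac{\Vert F(x)\Vert }{\sqrt{2\beta _1\sigma }}, \end{aligned}$$

so at any KKT point of \((\widetilde{P}_\sigma )\) with \(\varepsilon >0\), the auxiliary variable \(\varepsilon \) is forced to zero at the rate \(O(\sigma ^{-1/2})\) as \(\sigma \rightarrow +\infty \), provided \(\Vert F(x)\Vert \) remains bounded.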

Lemma 3.2

Suppose that there exists \(\beta _1>0\) such that \(\beta '(\varepsilon )\ge \beta _1\) for all \(\varepsilon \in (0,\overline{\varepsilon }]\). If \((x^*,\varepsilon ^*)\) is a local optimal solution of \((\widetilde{P}_\sigma )\) with finite optimal value, then \(x^*\) is a local optimal solution of \((P)\) if and only if \(\varepsilon ^*=0\).

Proof

Since the constraint functions of \((\widetilde{P}_\sigma )\) are all linear, \((x^*,\varepsilon ^*)\) is also a KKT point of \((\widetilde{P}_\sigma )\). If \(x^*\) is a local optimal solution of \((P)\), then \(F(x^*)=0\), and Lemma 3.1 forces \(\varepsilon ^*=0\). Conversely, if \(\varepsilon ^*=0\), then the finiteness of \(\widetilde{f}_\sigma (x^*,0)\) and the construction of \(\widetilde{f}_\sigma \) give \(F(x^*)=0\), and hence \(x^*\) is a local optimal solution of \((P)\). \(\square \)

Lemma 3.3

Suppose that there exists \(\beta _1>0\) such that \(\beta '(\varepsilon )\ge \beta _1\) for all \(\varepsilon \in (0,\overline{\varepsilon }]\). If \((x^k,\varepsilon _k)\) is a KKT point of \((\widetilde{P}_{\sigma _k})\) with \(\varepsilon _k>0\) and \(\sigma _k\rightarrow +\infty \) as \(k\rightarrow \infty \), then

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\frac{1}{\varepsilon _k}\phi '(\Delta _k) \sqrt{\Delta _k}=+\infty , \end{aligned}$$

where \(\Delta _k:=\Delta (x^k,\varepsilon _k)\).

Proof

Lemma 3.1 implies that

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\frac{1}{\varepsilon _k}\phi '(\Delta _k) \Vert F(x^k)\Vert =+\infty . \end{aligned}$$

Note that

$$\begin{aligned} \frac{1}{\varepsilon _k}\phi '(\Delta _k)\Vert F(x^k) \Vert \le \phi '(\Delta _k)\left( \frac{1}{\varepsilon _k}\sqrt{\Delta _k}+\Vert \omega \Vert \right) . \end{aligned}$$

Therefore,

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\phi '(\Delta _k)\left( \frac{1}{\varepsilon _k} \sqrt{\Delta _k}+\Vert \omega \Vert \right) =+\infty , \end{aligned}$$

which, together with the monotonicity of \(\phi '\), yields

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\frac{1}{\varepsilon _k} \phi '(\Delta _k)\sqrt{\Delta _k}=+\infty . \end{aligned}$$

\(\square \)

Theorem 3.1

Suppose that Assumptions \((A_1)\) and \((A_2)\) hold, and that there exists \(\beta _1>0\) such that \(\beta '(\varepsilon )\ge \beta _1\) for all \(\varepsilon \in (0,\overline{\varepsilon }]\). If \(\sigma >0\) is sufficiently large and \((x^*,\varepsilon ^*)\) is a local optimal solution of \((\widetilde{P}_\sigma )\) with finite optimal value, then \(\varepsilon ^*=0\); furthermore, \(x^*\) is a local optimal solution of \((P)\).

Proof

It suffices to show that \(\varepsilon ^*=0\). Suppose on the contrary that there exist \(\sigma _k\rightarrow +\infty \) as \(k\rightarrow \infty \) and a sequence of local optimal solutions \((x^k,\varepsilon _k)\) of \((\widetilde{P}_{\sigma _k})\) with \(\widetilde{f}_{\sigma _k}(x^k,\varepsilon _k)\) finite and \(\varepsilon _k>0\). Since all constraint functions are linear, \((x^k,\varepsilon _k)\) is a KKT point of \((\widetilde{P}_{\sigma _k})\), i.e., there exist \(\lambda ^k,\eta ^k\in \mathbb{R }^n_+\) such that

$$\begin{aligned}&\nabla f(x^k)+\frac{1}{\varepsilon _k}\phi '(\Delta _k)\nabla F(x^k)(F(x^k)-\varepsilon _k\omega )=\lambda ^k-\eta ^k,\end{aligned}$$
(3.7)
$$\begin{aligned}&\inf (\lambda _i^k,x_i^k-u_i)=\inf (\eta _i^k,v_i-x_i^k)=0, \quad i=1,2,\ldots ,n. \end{aligned}$$
(3.8)

Since the index set \(\{1,\ldots ,n\}\) is finite, there exists an infinite subset \(K\subset \{1,2,\ldots \}\) such that for all \(k\in K\),

$$\begin{aligned} x_i^k&= u_i,\qquad \qquad \quad i\in I_1, \end{aligned}$$
(3.9)
$$\begin{aligned} x_i^k&= v_i,\qquad \qquad \quad i\in I_2, \end{aligned}$$
(3.10)
$$\begin{aligned} u_i&< x_i^k<v_i,\qquad i\in I_3, \end{aligned}$$
(3.11)

where

$$\begin{aligned} I_1\cup I_2\cup I_3=\{1,2,\ldots ,n\}~~\mathrm{and}~~I_i\cap I_j =\emptyset \quad \text {if} \quad i\ne j. \end{aligned}$$

Invoking Assumption \((A_2)\), there exist an infinite subset \(K_0\subset K\) and \(p\in \mathbb{R }^n\) such that

$$\begin{aligned}&\displaystyle \lim \limits _{k\in {K_0},k\rightarrow \infty }\nabla F(x^k)=\nabla F^*,~~\mathrm{rank} (\nabla F^*)=m, ~~(\nabla F^*)^T p=0,&\end{aligned}$$
(3.12)
$$\begin{aligned}&\displaystyle p_i\left\{ \begin{array}{ll} >0, \quad i\in I_1,\\ <0, \quad i\in I_2.\\ \end{array} \right.&\end{aligned}$$
(3.13)

Putting (3.7)–(3.11) together yields

$$\begin{aligned} \frac{\partial f(x^k)}{\partial x_i}+\frac{1}{\varepsilon _k} \phi '(\Delta _k)\Big (\nabla {F(x^k)}(F(x^k)-\varepsilon _k\omega )\Big )_i \left\{ \begin{array}{ll} \ge 0,\quad i\in {I_1}, \\ \le 0, \quad i\in {I_2},\\ =0, \quad i\in {I_3}, \ \ \end{array} \right. \end{aligned}$$
(3.14)

where \(\partial f(x)/\partial x_i\) denotes the partial derivative of \(f\) with respect to \(x_i\). Let

$$\begin{aligned} h_k=\frac{1}{\varepsilon _k}\phi '(\Delta _k)(F(x^k)-\varepsilon _k\omega ) ~~\mathrm{and}~~ q_k=\frac{h_k}{\Vert h_k\Vert }. \end{aligned}$$

Lemma 3.3 implies

$$\begin{aligned} \lim \limits _{k\in {K_0},k\rightarrow \infty }\Vert h_k\Vert =+\infty . \end{aligned}$$
(3.15)

Since \(\Vert q_k\Vert =1\) is bounded, we can assume, without loss of generality, that

$$\begin{aligned} \lim \limits _{k\in {K_0},k\rightarrow \infty }q_k=\tilde{q}\ne 0. \end{aligned}$$
(3.16)

It then follows from (3.14) that

$$\begin{aligned} \frac{\partial f(x^k)}{\partial x_i}\frac{1}{\Vert h_k\Vert }+\Big (\nabla F(x^k) q_k\Big )_i \left\{ \begin{array}{ll} \ge 0, \quad i\in {I_1}, \\ \le 0, \quad i\in {I_2},\\ =0, \quad i\in {I_3}. \ \ \end{array} \right. \end{aligned}$$

From Assumption \((A_1)\), (3.12), (3.15), and (3.16), taking limits in the above formula yields

$$\begin{aligned} \Big ((\nabla F^*) \tilde{q}\Big )_i \left\{ \begin{array}{ll} \ge &{} 0, \quad i\in {I_1}, \\ \le &{} 0,\quad i\in {I_2},\\ =&{} 0, \quad i\in {I_3}. \end{array} \right. \end{aligned}$$
(3.17)

Combining this with (3.12) implies

$$\begin{aligned} 0=p^T(\nabla F^*) \tilde{q} =\sum \limits _{i\in {I_1}}p_i\Big ( (\nabla F^*) \tilde{q} \Big )_i+\sum \limits _{i\in {I_2}} p_i\Big ( (\nabla F^*) \tilde{q}\Big )_i, \end{aligned}$$

which, together with (3.13) and (3.17), yields \((\nabla F^*)\tilde{q}=0\), and hence \(\tilde{q}=0\), since \(\nabla F^*\) has full column rank by (3.12). This contradicts \(\tilde{q}\ne 0\) in (3.16). Therefore \(\varepsilon ^*=0\) for \(\sigma \) sufficiently large, and the desired result follows from Lemma 3.2. \(\square \)

As a corollary of Theorem 3.1, we have

Corollary 3.1

Suppose that

(1):

the set \(D_F\) is bounded,

(2):

the standard MFCQ holds at each point of \(D_F\),

(3):

there exists \(\beta _1>0\) such that \(\beta '(\varepsilon )\ge \beta _1\) for all \(\varepsilon \in (0,\overline{\varepsilon }]\).

Then, whenever \(\sigma >0\) is large enough, every local optimal solution \((x^*,\varepsilon ^*)\) of \((\widetilde{P}_\sigma )\) with finite value satisfies \(\varepsilon ^*=0\). Furthermore, \(x^*\) is a local optimal solution of \((P)\).

Proof

The validity of Assumption \((A_1)\) follows from condition \((1)\) and the continuity of \(\nabla f\), and that of Assumption \((A_2)\) from conditions \((1)\), \((2)\), and Proposition 2.1. \(\square \)

Remark 3.1

In particular, Corollary 3.1 recovers [13, Theorem 2.1] by taking \(\phi (t)=\frac{t}{1-qt}\).

Inspired by [13], a necessary condition for the exact penalty property is given below. To this end, consider a class of functions \(\Phi :[0,a )\rightarrow [0,+\infty )\), where \(a>0\), satisfying

\((j_1)\) :

\(\Phi \) is continuous and increasing on \([0,a)\) with \(\Phi (0)=0\),

\((j_2)\) :

there exists \(a'\in (0,a)\) such that \(\Phi (t)\ge t\) for all \(t\in [0,a']\).

Such a class of functions includes several important functions as special cases, for example

$$\begin{aligned} \begin{array}{lll} &{} \Phi _1(t)=t^\alpha , &{}\quad t\in [0,+\infty ),~\alpha \in (0,1],\\ &{} \Phi _2(t)=-\log (1-t^\alpha ), &{}\quad t\in [0,1),~~~~\ \,\,\alpha \in (0,1], \\ &{} \Phi _3(t)=e^t-1, &{}\quad t\in [0,+\infty ). \end{array} \end{aligned}$$

Assumption (B)

Let \(x^*\) be a feasible point of \((P)\). There exist \(\gamma >0\) and a neighborhood \(N(x^*)\) such that

$$\begin{aligned} f(x)-f(x^*)+\gamma \Phi (\Vert F(x)\Vert )\ge 0,\quad \forall x\in N(x^*)\cap [u,v], \end{aligned}$$
(3.18)

where \(\Phi \) satisfies the conditions \((j_1)\) and \((j_2)\).

The function \(\Phi \) plays a key role in (3.18), as illustrated by the following example.

Example 3.1

$$\begin{aligned} \min&f(x)=x\\ ~\mathrm{s.t.}&F(x)=x^2=0, \\&x\in (-\infty ,+\infty ). \end{aligned}$$

Clearly, the optimal solution is \(x^*=0\). If \(\Phi (t)=t\), i.e., \(\Phi (\Vert F(x)\Vert )=x^2\), then (3.18) fails for every \(\gamma >0\), since \(x+\gamma x^2<0\) for all \(x\in (-\frac{1}{\gamma },0)\); if instead \(\Phi (t)=\sqrt{t}\), i.e., \(\Phi (\Vert F(x)\Vert )=|x|,\) then (3.18) holds with \(\gamma =1\), since \(x+|x|\ge 0\).

Theorem 3.2

Suppose that \(\beta (\varepsilon )=\Phi (\varepsilon )\) for \(\varepsilon >0\) sufficiently small. If \((x^*,0)\) is a local optimal solution of \((\widetilde{P}_{\sigma _0})\) with finite value for some \(\sigma _0>0\), then Assumption (B) holds at \(x^*\) for \(\Phi \) with \(\gamma \ge \sigma _0+\phi '(0)(1+\Vert \omega \Vert )^2\).

Proof

Let \(\gamma \ge \sigma _0+\phi '(0)(1+\Vert \omega \Vert )^2\). Suppose on the contrary that there exists a sequence \(x^k\in [u,v]\) converging to \(x^*\) such that

$$\begin{aligned} f(x^k)-f(x^*)\le f(x^k)-f(x^*)+\gamma \Phi (\Vert F(x^k)\Vert )<0. \end{aligned}$$
(3.19)

Since \((x^*,0)\) is a local optimal solution of \((\widetilde{P}_{\sigma _0})\) and the corresponding optimal value is finite, it follows from the construction of \(\widetilde{f}_\sigma \) that \(x^*\) is feasible (cf. (2.1)) and thus \(x^*\) is a local optimal solution of \((P)\). Therefore, \(x^k\) is infeasible by (3.19). This, together with the continuity of \(F\), implies that

$$\begin{aligned} \Vert F(x^k)\Vert >0 \quad \mathrm{and} \quad \lim \limits _{k\rightarrow \infty }\Vert F(x^k)\Vert =\Vert F(x^*)\Vert =0. \end{aligned}$$

Let \(\varepsilon _k=\Vert F(x^k)\Vert \), i.e., \(\varepsilon _k>0\) and \(\varepsilon _k\rightarrow 0\) as \(k\rightarrow \infty .\) Notice that

$$\begin{aligned} \Delta _k=\Vert F(x^k)-\varepsilon _k\omega \Vert ^2\le \varepsilon ^2_k(1+\Vert \omega \Vert )^2. \end{aligned}$$
(3.20)

Thus,

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\Delta _k \le \lim \limits _{k\rightarrow \infty }\varepsilon ^2_k(1+\Vert \omega \Vert )^2= 0. \end{aligned}$$

As a result,

$$\begin{aligned} \phi '(\Delta _k)\le 2\phi '(0), \end{aligned}$$
(3.21)

whenever \(k\) is sufficiently large, since \(\phi '\) is continuous and \(\phi '(0)>0\). Since \((x^*,0)\) is a local optimal solution of \((\widetilde{P}_{\sigma _0})\), for \(k\) large enough we have

$$\begin{aligned} 0&\le f(x^k)-f(x^*)+\frac{1}{2\varepsilon _k}\phi (\Delta _k) +\sigma _0\beta (\varepsilon _k)\\ &= f(x^k)-f(x^*)+\frac{1}{2\varepsilon _k}\phi (\Delta _k) +\sigma _0\Phi (\varepsilon _k) &&\text {since } \beta (\varepsilon )=\Phi (\varepsilon )\\ &< \frac{1}{2\varepsilon _k}\phi (\Delta _k)+\sigma _0 \Phi (\varepsilon _k)-\gamma \Phi (\Vert F(x^k)\Vert ) &&\text {by } (3.19)\\ &\le \frac{1}{2\varepsilon _k}\phi '(\Delta _k)\Delta _k +\sigma _0\Phi (\varepsilon _k)-\gamma \Phi (\varepsilon _k) &&\text {by convexity of } \phi ,\ \phi (0)=0,\ \varepsilon _k=\Vert F(x^k)\Vert \\ &\le \varepsilon _k\phi '(0)(1+\Vert \omega \Vert )^2+(\sigma _0-\gamma ) \Phi (\varepsilon _k) &&\text {by } (3.20) \text { and } (3.21)\\ &\le \Phi (\varepsilon _k)\Big [\phi '(0)(1+\Vert \omega \Vert )^2 +\sigma _0-\gamma \Big ] &&\text {by condition } (j_2)\\ &\le 0, &&\text {since } \gamma \ge \sigma _0+\phi '(0)(1+\Vert \omega \Vert )^2, \end{aligned}$$

which leads to a contradiction. \(\square \)

4 Inverse propositions for exact penalization

In this section, we show that, if \(x^*\) is a local optimal solution of \((P)\), then Assumption (B) introduced in Sect. 3 is a necessary and sufficient condition for \((x^*,0)\) to be a local optimal solution of \((\widetilde{P}_\sigma )\). Based on Theorem 3.2, we further establish the equivalence between this new class of exact penalty functions and the classical simple exact penalty functions in the sense of the exact penalty property. Our conclusions clarify that the class of penalty functions (2.1) possesses not only the same exactness property as the classical simple penalty functions, but also a smoothness property that the classical simple exact penalty functions do not share.

Theorem 4.1

Let \(x^*\) be a local optimal solution of \((P)\). The following two statements hold:

(1):

If Assumption (B) holds at \(x^*\) for \(\Phi \), and \(\beta (\varepsilon )\ge \Phi (\sqrt{\varepsilon })\) for \(\varepsilon >0\) sufficiently small, then \((x^*,0)\) is a local optimal solution of \((\widetilde{P}_\sigma )\) for all \(\sigma \ge \gamma \);

(2):

Let \(\beta (\varepsilon )=\Phi (\varepsilon )\) for \(\varepsilon >0\) sufficiently small. If there exists \(\sigma _0>0\) such that \((x^*,0)\) is a local optimal solution of \((\widetilde{P}_\sigma )\) for all \(\sigma \ge \sigma _0\), then Assumption (B) holds at \(x^*\) for \(\Phi \) with \(\gamma \ge \sigma _0+\phi '(0)(1+\Vert \omega \Vert )^2\).

Proof

We only need to show the validity of part (1), since part (2) follows from Theorem 3.2. Suppose on the contrary that for some \(\sigma \ge \gamma \) there exist \((x^k,\varepsilon _k)\in {[u,v]\times (0,\overline{\varepsilon }]}\) with \(x^k\rightarrow x^*\) and \(\varepsilon _k\rightarrow 0\) as \(k\rightarrow \infty \) such that \(f(x^*)=\widetilde{f}_\sigma (x^*,0)>\widetilde{f}_\sigma (x^k, \varepsilon _k)\) (we may take \(\varepsilon _k>0\), since \(\varepsilon _k=0\) together with a finite value would force \(F(x^k)=0\) and \(\widetilde{f}_\sigma (x^k,0)=f(x^k)\ge f(x^*)\) for large \(k\)), i.e.,

$$\begin{aligned} 0&> f(x^k)-f(x^*)+\frac{1}{2\varepsilon _k}\phi (\Delta _k) +\sigma \beta (\varepsilon _k) \nonumber \\&\ge f(x^k)-f(x^*)+\frac{1}{2\varepsilon _k}\phi '(0)\Delta _k +\sigma \beta (\varepsilon _k), \end{aligned}$$
(4.1)

where we have used the gradient inequality of convex functions, i.e., \(\phi (\Delta _k)\ge \phi (0)+\phi '(0)\Delta _k=\phi '(0)\Delta _k\). Taking limits on both sides of (4.1) yields \( \lim \limits _{k\rightarrow \infty }\Delta _k/{\varepsilon _k}= 0,\) which means that

$$\begin{aligned} \sqrt{\Delta _k}\le \frac{1}{2}\sqrt{\varepsilon _k} \quad \mathrm{and} \quad \sqrt{\varepsilon _k}\le \frac{1}{2\Vert \omega \Vert +1} \end{aligned}$$

whenever \(k\) is sufficiently large. So,

$$\begin{aligned} \Vert F(x^k)\Vert \le \sqrt{\Delta _k}+\varepsilon _k\Vert \omega \Vert \le \left( \frac{1}{2}+\sqrt{\varepsilon _k}\Vert \omega \Vert \right) \sqrt{\varepsilon _k} \le \sqrt{\varepsilon _k}. \end{aligned}$$
(4.2)

Since \(\varepsilon _k\rightarrow 0\), then by hypothesis, for \(k\) large enough,

$$\begin{aligned} \beta (\varepsilon _k)\ge \Phi (\sqrt{\varepsilon _k}). \end{aligned}$$
(4.3)

Therefore, we conclude that for \(k\) sufficiently large,

$$\begin{aligned} 0&>f(x^k)-f(x^*)+\frac{1}{2\varepsilon _k}\phi '(0)\Delta _k +\sigma \beta (\varepsilon _k)\\ &\ge f(x^k)-f(x^*)+\sigma \beta (\varepsilon _k) &&\text {since } \phi '(0)\Delta _k\ge 0\\ &\ge \sigma \beta (\varepsilon _k)-\gamma \Phi (\Vert F(x^k)\Vert ) &&\text {by Assumption (B)}\\ &\ge \sigma \Phi (\sqrt{\varepsilon _k})-\gamma \Phi (\Vert F(x^k)\Vert ) &&\text {by } (4.3)\\ &\ge \Phi (\sqrt{\varepsilon _k})(\sigma -\gamma ) &&\text {by the monotonicity of } \Phi \text { and } (4.2)\\ &\ge 0, &&\text {since } \sigma \ge \gamma , \end{aligned}$$

which leads to a contradiction. \(\square \)

The concept of regular zero is introduced in [13, Definition 4.1]. To be consistent with their notation, let us define \(H:\mathbb{R }^n\rightarrow \mathbb{R }^p\) with \(p=m+|I^*|\le n\) and

$$\begin{aligned} H(x)=\left( \begin{array}{ll} F(x)\\ x_{I^*}-x^*_{I^*}\end{array}\right) , \end{aligned}$$

where \(|I^*|\) denotes the number of elements of \(I^*=I^*_1\cup I^*_2\), and \(I^*_1\) and \(I^*_2\) are, respectively, the active index sets of \(u\le x^*\) and \(x^*\le v\), i.e.,

$$\begin{aligned} I^*_1=\{i|u_i=x^*_i \} ~~\mathrm{and}~~ I^*_2=\{i|v_i=x^*_i \}. \end{aligned}$$

According to Theorem 4.1, we obtain

Corollary 4.1

Let \(x^*\) be a local optimal solution of \((P)\) and \(\beta (\varepsilon )\ge \sqrt{\varepsilon }\) for all \(\varepsilon \) sufficiently small. If \(x^*\) is a regular zero of \(H\), then there exists \(\gamma >0\) such that \((x^*,0)\) is a local optimal solution of \((\widetilde{P}_\sigma )\) for all \(\sigma \ge \gamma \).

Proof

Since \(x^*\) is a regular zero of \(H\), it then follows from [13, Lemmas 5.1 and 5.2] that Assumption (B) holds at \(x^*\) for \(\Phi (t)=t\). Combining this and Theorem 4.1 yields the desired result. \(\square \)

Remark 4.1

For the smooth case, Corollary 4.1 reduces to [13, Theorem 5.3] by taking \(\phi (t)=\frac{t}{1-qt}\).

Corollary 4.2

Let \(x^*\) be a strict local optimal solution of \((P)\) and \(\beta (\varepsilon )\ge \sqrt{\varepsilon }\) for \(\varepsilon >0\) sufficiently small. If MFCQ holds at \(x^*\), then there exists \(\gamma >0\) such that \((x^*,0)\) is a local optimal solution of \((\widetilde{P}_\sigma )\) for all \(\sigma \ge \gamma \).

Proof

Since MFCQ holds at \(x^*\), it then follows from [10, Theorem 4.4] that Assumption (B) holds at \(x^*\) for \(\Phi (t)=t\). Therefore, the desired result follows from Theorem 4.1. \(\square \)

In fact, the two sufficient conditions given in Corollaries 4.1 and 4.2 are independent of each other, as illustrated by the following examples.

Example 4.1

$$\begin{aligned} \begin{array}{lc} F(x)=x_1-x_2=0,\\ x=(x_1,x_2)\in [0,+\infty )\times [0,+\infty ). \end{array} \end{aligned}$$

It is easy to see that MFCQ holds at \(x^*=(0,0)\), while \(x^*\) is not a regular zero of \(H\).

Now let us show that the second-order sufficient conditions guarantee the validity of Assumption (B). Recall that, for (P), the second-order sufficient conditions are said to hold at \(x^*\) if

  1. (a)

    \(x^*\) is a KKT point, i.e., there exist \(\lambda ^*,\eta ^*\in {\mathbb{R }^n_+},\) and \(\mu ^*\in \mathbb{R }^m\) such that

    $$\begin{aligned}&\nabla f(x^*)-\lambda ^*+\eta ^*+\nabla F(x^*)\mu ^*= 0, \end{aligned}$$
    (4.4)
    $$\begin{aligned}&\inf (\lambda ^*_i,x^*_i-u_i)= \inf (\eta ^*_i,v_i-x^*_i)=0,\quad i=1,2,\ldots ,n. \end{aligned}$$
    (4.5)
  2. (b)

    the matrix \(\nabla _{xx}^2L(x^*,\mu ^*)\) is positive definite on the cone \(\{d\ne 0\,|\,\nabla F(x^*)^Td=0,\ d_i=0 \text { whenever } \lambda _i^*>0 \text { or } \eta _i^*>0\}\), where \(L(x,\mu )=f(x)+\mu ^T F(x).\)

The following example shows that the second-order sufficient conditions are independent of MFCQ and regular zero of \(H\).

Example 4.2

$$\begin{aligned} \min&f(x)=x_1^2+x_2^2\\ ~\mathrm{s.t.}&F(x)=x_1-x_2=0,\\&x=(x_1,x_2)\in [0,+\infty )\times (-\infty ,0]. \end{aligned}$$

It is easy to check that at the point \(x^*=(0,0)\), the second-order sufficient conditions hold, while MFCQ and regular zero of \(H\) do not hold true.

Corollary 4.3

If \(\beta (\varepsilon )\ge \sqrt{\varepsilon }\) for all \(\varepsilon >0\) sufficiently small and the second-order sufficient conditions hold at \(x^*\), then there exists \(\gamma >0\) such that \((x^*,0)\) is a strict local optimal solution of \((\widetilde{P}_\sigma )\) for all \(\sigma \ge \gamma \).

Proof

Since the second-order sufficient conditions hold at \(x^*\), it then readily follows from [10, Theorem 4.6] that Assumption (B) is valid for \(\Phi (t)=t\) with the inequality being strict. The desired result follows from Theorem 4.1. \(\square \)

We now propose a new sufficient condition for the validity of Assumption (B), obtained in a different way.

Proposition 4.1

Let \(x^*\) be a local optimal solution of \((P)\). If \(x^*\) is a KKT point and

$$\begin{aligned} \lim \limits _{\substack{x\rightarrow x^*,\,x\in [u,v]\\ F(x)\ne 0}}\frac{1}{\Phi (\Vert F(x)\Vert )} \int \limits ^1_0\Big (\nabla _xL(x^*+s(x-x^*),\mu ^*) -\nabla _xL(x^*,\mu ^*)\Big )^T(x-x^*)ds=0, \end{aligned}$$
(4.6)

where \(\lambda ^*,\eta ^*\in {\mathbb{R }^n_+}\) and \(\mu ^*\in \mathbb{R }^m\) are the corresponding Lagrange multipliers (cf. (4.4) and (4.5)), then Assumption (B) holds at \(x^*\) for \(\Phi \) with \(\gamma =1+\Vert \mu ^*\Vert \).

Proof

It is sufficient to show the existence of a neighborhood of \(x^*\), say \(N(x^*)\), such that

$$\begin{aligned} f(x)-f(x^*)+\gamma \Phi (\Vert F(x)\Vert )\ge 0,\quad x\in N(x^*)\cap [u,v]. \end{aligned}$$
(4.7)

Since \(x^*\) is a local optimal solution of \((P)\), there exists a neighborhood of \(x^*\), say \(\widetilde{N}(x^*)\), such that (4.7) holds for all \(x\in \widetilde{N}(x^*)\cap [u,v]\) satisfying \(F(x)=0\). Now consider \(x\in \widetilde{N}(x^*)\cap [u,v]\) with \(F(x)\ne 0\). Putting (4.4), (4.5), and (4.6) together yields

$$\begin{aligned} f(x)-f(x^*)&= \nabla f(x^*)^T(x-x^*)+\int \limits ^1_0\Big (\nabla f(x^*+s(x-x^*))-\nabla f(x^*)\Big )^T(x-x^*)ds \nonumber \\&= \sum \limits _{i\in I^*_1}\lambda ^*_i(x_i-x^*_i)-\sum \limits _{i\in I^*_2}\eta ^*_i(x_i-x^*_i)-\mu ^{*T}\nabla F(x^*)^T (x-x^*) \nonumber \\&+\int \limits ^1_0 \Big (\nabla f(x^*+s(x-x^*))-\nabla f(x^*)\Big )^T(x-x^*)ds \nonumber \\&\ge -\mu ^{*T}\nabla F(x^*)^T (x-x^*)+\int \limits ^1_0\Big (\nabla f(x^*+s(x-x^*))-\nabla f(x^*)\Big )^T(x-x^*)ds \nonumber \\&= -\mu ^{*T}F(x)+\int \limits ^1_0\mu ^{*T}\Big (\nabla F(x^*+s(x-x^*))-\nabla F(x^*)\Big )^T(x-x^*)ds \nonumber \\&+\int \limits ^1_0\Big (\nabla f(x^*+s(x-x^*))-\nabla f(x^*)\Big )^T(x-x^*)ds \nonumber \\&= -\mu ^{*T}F(x)+\int \limits ^1_0\Big (\nabla _x L(x^*+s(x-x^*),\mu ^*)-\nabla _x L(x^*,\mu ^*)\Big )^T(x-x^*)ds \nonumber \\&= -\mu ^{*T}F(x)+o(\Phi (\Vert F(x)\Vert )). \end{aligned}$$
(4.8)

Note that there exists a neighborhood \(N(x^*)\subset \widetilde{N}(x^*)\) of \(x^*\) such that

$$\begin{aligned} \frac{1}{\Phi (\Vert F(x)\Vert )}\Big |o\Big (\Phi (\Vert F(x)\Vert )\Big ) \Big |\le \frac{1}{2}, \end{aligned}$$
(4.9)

whenever \(x\in N(x^*)\cap [u,v]\) with \(F(x)\ne 0\). Therefore,

$$\begin{aligned} f(x)-f(x^*)+\gamma \Phi (\Vert F(x)\Vert )&\ge \gamma \Phi (\Vert F(x)\Vert )-\mu ^{*T}F(x)+o\big (\Phi (\Vert F(x)\Vert )\big ) &&\text {by } (4.8) \\&\ge \gamma \Phi (\Vert F(x)\Vert )-\Vert \mu ^*\Vert \Vert F(x)\Vert +o\big (\Phi (\Vert F(x)\Vert )\big )\\&\ge \Phi (\Vert F(x)\Vert )(\gamma -\Vert \mu ^*\Vert )+o\big (\Phi (\Vert F(x)\Vert )\big ) &&\text {by } \Phi (t)\ge t\\&= \Phi (\Vert F(x)\Vert )+o\big (\Phi (\Vert F(x)\Vert )\big ) &&\text {by } \gamma =1+\Vert \mu ^*\Vert \\&\ge \frac{1}{2}\Phi (\Vert F(x)\Vert ) &&\text {by } (4.9)\\&> 0. \end{aligned}$$

This yields the inequality as desired. \(\square \)

It should be emphasized that condition (4.6) is also independent of the other conditions given in the previous discussion, as illustrated by the following example.

Example 4.3

$$\begin{aligned} \min&f(x)=-|x|^\frac{3}{2}\\ \mathrm{s.t.}&F(x)=x^2=0,\\&x\in (-\infty ,+\infty ). \end{aligned}$$

By a simple calculation (at \(x^*=0\) one may take \(\mu ^*=0\), so the integral in (4.6) equals \(f(x)-f(0)=-|x|^{\frac{3}{2}}\), while \(\Phi (\Vert F(x)\Vert )=|x|\) and the quotient \(-|x|^{\frac{1}{2}}\rightarrow 0\)), condition (4.6) holds at \(x^*=0\) with \(\Phi (t)=\sqrt{t}\), while MFCQ, the second-order sufficient conditions, and the regular-zero property of \(H\) (here \(H=F\), since no bound constraints are active) do not hold at \(x^*\). In addition, condition (4.6) also holds for Example 4.2 with \(\Phi (t)=t\).

Applying Theorem 4.1 and Proposition 4.1 yields the following corollary.

Corollary 4.4

Let \(x^*\) be a local optimal solution of \((P)\) and \(\beta (\varepsilon )\ge \Phi (\sqrt{\varepsilon })\) for all \(\varepsilon >0\) sufficiently small. If \(x^*\) is a KKT point and condition (4.6) holds, then \((x^*,0)\) is a local optimal solution of \((\widetilde{P}_\sigma )\) for all \(\sigma \ge 1+\Vert \mu ^*\Vert \).

We now turn our attention to the relationship between this new class of exact penalty functions and the classical simple penalty functions in the sense of the exactness property. Define

$$\begin{aligned} f_\gamma (x)=f(x)+\gamma \Phi (\Vert F(x)\Vert ), \end{aligned}$$

where \(\Phi \) satisfies \((j_1)\) and \((j_2)\). Clearly, \(f_\gamma \) is a classical simple and exact penalty function. The associated penalty problem is

$$\begin{aligned} (P_\gamma )&\min \, f_\gamma (x)\\&\mathrm{s.t.}\, x\in [u,v]. \end{aligned}$$

Note that (3.18) in Assumption (B) is equivalent to saying that \(x^*\) is a local optimal solution of \((P_\gamma )\). Let \(x^*\) be a local optimal solution of \((P)\). According to Theorem 4.1, the relationship between \((\widetilde{P}_\sigma )\) and \((P_\gamma )\) is summarized as follows:

  1. (1)

    Suppose that \(\beta (\varepsilon )\ge \Phi (\sqrt{\varepsilon })\) for \(\varepsilon >0\) sufficiently small. If there exists \(\gamma >0\) such that \(x^*\) is a local optimal solution of the penalty problem \((P_{\gamma })\), then \((x^*,0)\) is a local optimal solution of the penalty problem \((\tilde{P}_{\sigma })\) when \(\sigma >0\) is sufficiently large.

  2. (2)

    Suppose that \(\beta (\varepsilon )= \Phi (\varepsilon )\) for \(\varepsilon >0\) sufficiently small. If there exists \(\sigma >0\) such that \((x^*,0)\) is a local optimal solution of the penalty problem \((\tilde{P}_{\sigma })\), then \(x^*\) is a local optimal solution of the penalty problem \((P_{\gamma })\) when \(\gamma >0\) is sufficiently large.

5 Penalty function methods

In this section, we present a revised penalty function algorithm based on \(\widetilde{f}_{\sigma }\). The algorithm is always well defined, since \(\widetilde{f}_{\sigma }\) is bounded from below. It should be noted that the variable \(\varepsilon \) in \(\widetilde{f}_{\sigma }\) plays a key role in the framework of the algorithm. It follows from Assumption (A) and Theorem 3.1 that the proposed algorithm terminates at an optimal solution of \((P)\) after finitely many iterations (see Theorem 5.1). We further analyze the case where Assumption (A) is absent; for this case, we present a perturbation theorem for the algorithm (see Theorem 5.2), which allows us to obtain the global convergence property of the algorithm (see Corollary 5.1). Furthermore, a necessary and sufficient condition for a zero duality gap between the original problem \((P)\) and its dual problem \((L)\), defined via the penalty function \(\widetilde{f}_{\sigma }\), is also derived (see Corollary 5.2).

Assumption (C)

The function \(\beta \) satisfies \(\inf \limits _{\varepsilon \ge \varepsilon _0}\beta (\varepsilon )>0\) for all \(\varepsilon _0>0\).

Indeed, the functions \(\beta \) considered in the previous discussion (e.g., \(\beta (\varepsilon )=\sqrt{\varepsilon }\)) satisfy this assumption.

Algorithm 5.1

Let \(\alpha \in (0,1)\) be a sufficiently small number, let \(\delta _0>0\) and \(\sigma _0\ge 1\), and set \(k:=0\).

Step 1. If \(argmin(\widetilde{P}_{\sigma _k})\ne \emptyset \), solve

$$\begin{aligned} (x^k,\varepsilon _k)\in argmin (\widetilde{P}_{\sigma _k}) \end{aligned}$$

and go to Step 2. Otherwise, solve

$$\begin{aligned} (x^k,\varepsilon _k)\in \delta _k - argmin (\widetilde{P}_{\sigma _k}), \end{aligned}$$

i.e.,

$$\begin{aligned} \widetilde{f}_{\sigma _k}(x^k,\varepsilon _k)\le \inf \{\widetilde{f}_{\sigma _k}(x,\varepsilon )| (x,\varepsilon )\in [u,v]\times [0,\overline{\varepsilon }]\}+\delta _k, \end{aligned}$$

and go to Step 3.

Step 2. If \(\varepsilon _k=0\), stop; otherwise, go to Step 4.

Step 3. If \(\varepsilon _k=0\) and \(\delta _k\le \alpha \), stop; otherwise, go to Step 4.

Step 4. Let

$$\begin{aligned} \delta _{k+1}=\frac{1}{2}\delta _k ~~\mathrm{and}~~ \sigma _{k+1}=\left\{ \begin{array}{ll} \sigma _k,\quad \hbox {if~}\varepsilon _k=0,\\ \rho \sigma _k,\ \hbox {otherwise}; \end{array} \right. \end{aligned}$$

where \(\rho > 1\) is a constant.

Step 5. Let \(k:=k+1\) and go back to Step 1.
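
Because Step 1 leaves the subproblem solver open, the following Matlab sketch of Algorithm 5.1 is only one possible realization: it uses the derivative-free routine fminsearch as a stand-in for the trust-region solver employed in Sect. 6, treats every subproblem solve as a \(\delta _k\)-approximate minimization, and assumes user-supplied data ftilde (an evaluator for (2.1) in the stacked variable, e.g., a wrapper around the fsigma sketch in Sect. 2), x0, eps0, u, v, epsbar, and maxit; all tolerances are our own illustrative choices.

```matlab
% Hedged sketch of Algorithm 5.1 (illustrative; not the exact code used
% in Sect. 6).  Every subproblem is solved approximately, i.e., we always
% take the delta_k-argmin branch of Step 1.
alpha = 1e-6;  delta = 1;  sigma = 1;  rho = 2;   % alpha small, sigma_0 >= 1
tol = 1e-8;                              % numerical stand-in for eps_k = 0
xe  = [x0(:); eps0];                     % stacked variable (x, eps)
for k = 0:maxit
    % Project (x, eps) onto the box [u,v] x [0, epsbar] of the subproblem.
    proj = @(w) [min(max(w(1:end-1), u), v); min(max(w(end), 0), epsbar)];
    xe   = proj(fminsearch(@(w) ftilde(proj(w), sigma), xe));  % Step 1
    epsk = xe(end);
    if epsk <= tol && delta <= alpha     % Steps 2-3: stopping test
        break
    end
    delta = delta/2;                     % Step 4: halve delta_k ...
    if epsk > tol, sigma = rho*sigma; end  % ... and increase sigma_k
end
```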

This algorithm is always well defined, since the existence of a \(\delta \)-optimal solution is ensured by the lower boundedness of \(\widetilde{f}_\sigma \) over \(D\times [0,\overline{\varepsilon }]\). In addition, owing to the smoothness of \(\widetilde{f}_{\sigma _k}(\cdot ,\varepsilon )\) on \(D_F\), many descent algorithms can be used to find \((x^k,\varepsilon _k)\). Given \(\eta \ge 0\), define a perturbation function of \((P)\) as follows

$$\begin{aligned} \theta (\eta )=\inf \{f(x)|x\in [u,v],\Vert F(x)\Vert \le \eta \}. \end{aligned}$$

Clearly, \(\theta \) is a nonincreasing function with \(\theta (0)\) equal to the optimal value of \((P)\). Since \(f\) is bounded below on \([u,v]\), the limit of \(\theta (\eta )\) as \(\eta \rightarrow 0^+\) exists and

$$\begin{aligned} \lim \limits _{\eta \rightarrow 0^+}\theta (\eta )\le \theta (0), \end{aligned}$$
(5.1)

i.e., \(\theta \) is upper semicontinuous at zero.

Theorem 5.1

Under the assumptions of Theorem 3.1, let \(\{(x^k,\varepsilon _k)\}\) be a sequence generated by Algorithm 5.1 with \((x^k,\varepsilon _k)\in argmin (\widetilde{P}_{\sigma _k})\) for every \(k\). Then there exists \(k_0\) such that \(\varepsilon _k=0\) whenever \(\sigma _k\ge \sigma _{k_0}\); in this case, \(x^k\) is an optimal solution of \((P)\).

Proof

From Theorem 3.1, there exists \(k_0\) such that \(\varepsilon _k=0\) whenever \(\sigma _k\ge \sigma _{k_0}\). It then follows from the finiteness of \(\widetilde{f}_{\sigma _k}(x^k,0)\) and the definition of \(\widetilde{f}_{\sigma }\) that \(F(x^k)=0\), i.e., \(\widetilde{f}_{\sigma _k}(x^k,0)=f(x^k)\). Since \((x^k,0)\in argmin (\widetilde{P}_{\sigma _k})\), this yields \(x^k\in argmin (P)\). \(\square \)

Lemma 5.1

Let \(\beta \) satisfy Assumption \((C)\), and \(\{(x^k,\varepsilon _k)\}\) be an infinite sequence generated by Algorithm 5.1. Then

(1):

\(\exists k_0>0\) such that \(\varepsilon _k>0\) for all \(k\ge k_0\);

(2):

\(\lim \limits _{k\rightarrow \infty }\varepsilon _k=0\);

(3):

\(\lim \limits _{k\rightarrow \infty }F(x^k)=0\).

Proof

Note that the condition \(\delta _k\le \alpha \) holds for all \(k\) large enough, since \(\delta _{k+1}=\delta _k/2\) by Step 4. According to Steps 2 and 3, this means that the termination criterion reduces to \(\varepsilon _k=0\); in other words, the algorithm generates an infinite sequence only if \(\varepsilon _k>0\) for all \(k\) sufficiently large, which proves (1). By the update rule for \(\sigma _k\) in Step 4, we then have

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\sigma _k=\infty . \end{aligned}$$
(5.2)

From Step 1, we have

$$\begin{aligned} f(x^k)+\frac{1}{2\varepsilon _k}\phi (\Delta _k)+\sigma _k\beta (\varepsilon _k)&\le \inf \{\widetilde{f}_{\sigma _k}(x,\varepsilon ) |(x,\varepsilon )\in {[u,v]\times [0,\overline{\varepsilon }]}\}+\delta _k\nonumber \\&\le f(\overline{x})+\delta _k, \end{aligned}$$
(5.3)

where \(\overline{x}\) is a feasible point of \((P)\). Note that \(f\) is bounded from below on \(D\) and \(\phi \) is nonnegative. It immediately follows from (5.2) and (5.3) that \(\lim \limits _{k\rightarrow \infty }\beta (\varepsilon _k)=0,\) which, together with Assumption (C), further implies that

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\varepsilon _k =0. \end{aligned}$$
(5.4)

Similarly, according to the nonnegativity of \(\beta \), we get from (5.3) and (5.4) that \(\lim \limits _{k\rightarrow \infty }\phi (\Delta _k)= 0\), and hence

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\Delta _k=0, \end{aligned}$$
(5.5)

where we have used the gradient inequality for the convex function \(\phi \), i.e., \(\phi (\Delta _k)\ge \phi '(0)\Delta _k\ge 0\). Since \(\Vert F(x^k)\Vert \le \sqrt{\Delta _k}+\varepsilon _k\Vert \omega \Vert \), putting (5.4) and (5.5) together yields \(\lim \limits _{k\rightarrow \infty }\Vert F(x^k)\Vert =0\), as claimed. \(\square \)

The global convergence property of Algorithm 5.1 is given below.

Theorem 5.2

Let \(\beta \) satisfy Assumption \((C)\). The following statements hold:

(1):

If Algorithm 5.1 stops after finitely many iterations, then \(x^k\) is either an optimal solution or an \(\alpha \)-optimal solution of \((P)\).

(2):

If an infinite sequence is generated, then

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }f(x^k)= \lim \limits _{\eta \rightarrow 0^+}\theta (\eta ), \end{aligned}$$

and

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\inf \{\widetilde{f}_{\sigma _k} (x,\varepsilon )~|~(x,\varepsilon )\in {[u,v]\times [0,\overline{\varepsilon }]}\} =\lim \limits _{\eta \rightarrow 0^+}\theta (\eta ). \end{aligned}$$

Proof

(1). Suppose the algorithm stops at the \(k\)-th iteration; then \(\varepsilon _k=0\). Since \(\widetilde{f}_{\sigma _k}(x^k,0)\) is finite, \(F(x^k)=0,\) and hence \(\widetilde{f}_{\sigma _k}(x^k,0)=f(x^k)\) by the definition of \(\widetilde{f}_{\sigma }\). Therefore, according to Step 1, we have \((x^k,0)\in argmin (\widetilde{P}_{\sigma _k})\) or \((x^k,0)\in \delta _k- argmin (\widetilde{P}_{\sigma _k})\). This, together with \(\widetilde{f}_{\sigma _k}(x^k,0)=f(x^k)\) and \(\delta _k\le \alpha \), yields \(x^k\in argmin(P)\) or \(x^k\in \alpha - argmin(P)\).

(2). If the algorithm does not stop after finitely many iterations, then \(\lim \limits _{k\rightarrow \infty }\sigma _k=+\infty \). By Lemma 5.1, there exists \(k_0\) such that \(\varepsilon _k>0\) for all \(k\ge k_0\). Because \(\widetilde{f}_{\sigma _k}(x^k,\varepsilon _k)\) is finite,

$$\begin{aligned} \widetilde{f}_{\sigma _k}(x^k,\varepsilon _k) =f(x^k)+\frac{1}{2\varepsilon _k}\phi (\Delta _k)+\sigma _k\beta (\varepsilon _k). \end{aligned}$$

The continuity of \(\beta \) and the fact \(\beta (0)=0\) guarantee the existence of \(\overline{\varepsilon }_k>0\) such that \(\lim \limits _{k\rightarrow \infty }\overline{\varepsilon }_k=0\) and

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\sigma _k\beta (\overline{\varepsilon }_k)=0. \end{aligned}$$
(5.6)

Indeed, this is ensured by choosing \(\overline{\varepsilon }_k\) such that \(\beta (\overline{\varepsilon }_k)\le 1/\sigma _k^2.\) Choose another sequence \(\{\xi _k\}\) with \(\xi _k>0\) and \(\lim \limits _{k\rightarrow \infty }\xi _k= 0\). According to the definition of the infimum in \(\theta \), there exists \(y^k\in [u,v]\) such that \(\Vert F(y^k)\Vert \le \overline{\varepsilon }_k\) and

$$\begin{aligned} f(y^k)\le \theta (\overline{\varepsilon }_k)+\xi _k. \end{aligned}$$
(5.7)

Let \(\overline{\Delta }_k=\Vert F(y^k)-\overline{\varepsilon }_k\omega \Vert ^2.\) Since \(\Vert F(y^k)\Vert \le \overline{\varepsilon }_k\), we have

$$\begin{aligned} \overline{\Delta }_k\le \overline{\varepsilon }^2_k(1+\Vert \omega \Vert )^2, \end{aligned}$$
(5.8)

which, together with the fact that \(\lim \limits _{k\rightarrow \infty }\overline{\varepsilon }_k=0\), gives \(\overline{\Delta }_k<a\) for all \(k\) sufficiently large. Thus, \(\widetilde{f}_{\sigma _k}(y^k,\overline{\varepsilon }_k)\) is finite by (2.1). Lemma 5.1 implies that, for any \(\eta >0\), we have \(\Vert F(x^k)\Vert \le \eta \) for \(k\) large enough. Hence, \(\theta (\eta )\le f(x^k)\) by the definition of \(\theta \). Therefore,

$$\begin{aligned} \theta (\eta )&\le f(x^k)\\&\le f(x^k)+\frac{1}{2\varepsilon _k}\phi (\Delta _k) +\sigma _k\beta (\varepsilon _k)\\&= \widetilde{f}_{\sigma _k}(x^k,\varepsilon _k)\\&\le \inf \{\widetilde{f}_{\sigma _k}(x,\varepsilon ) |(x,\varepsilon )\in {[u,v]\times [0,\overline{\varepsilon }]}\}+\delta _k\\&\le \widetilde{f}_{\sigma _k}(y^k,\overline{\varepsilon }_k)+\delta _k\\&= f(y^k)+\frac{1}{2\overline{\varepsilon }_k}\phi (\overline{\Delta }_k) +\sigma _k\beta (\overline{\varepsilon }_k)+\delta _k\\&\le \theta (\overline{\varepsilon }_k)+\xi _k +\frac{1}{2\overline{\varepsilon }_k}\phi '(\overline{\Delta }_k) \overline{\Delta }_k+\sigma _k\beta (\overline{\varepsilon }_k)+\delta _k\\&\le \theta (\overline{\varepsilon }_k)+\xi _k+\frac{1}{2}\phi ' (\overline{\Delta }_k)\overline{\varepsilon }_k(1+\Vert \omega \Vert )^2 +\sigma _k\beta (\overline{\varepsilon }_k)+\delta _k, \end{aligned}$$

where the fifth inequality comes from (5.7) and the convexity of \(\phi \) as before, and the last inequality is due to (5.8). Using (5.6) and the arbitrariness of \(\eta >0\), we obtain the desired result by taking limits on both sides of the above inequality. \(\square \)

Corollary 5.1

Let \(\beta \) satisfy Assumption \((C)\), and let \(\{(x^k,\varepsilon _k)\}\) be an infinite sequence generated by Algorithm 5.1. The following statements hold.

(1):

\(\lim \limits _{k\rightarrow \infty }f(x^k)=\theta (0)\) if and only if \(\theta \) is lower semicontinuous at zero.

(2):

If \(x^*\) is an accumulation point of \(\{x^k\}\), then \(x^*\) is a global optimal solution of \((P)\).

Proof

(1). This follows directly from Part 2) of Theorem 5.2.

(2). If \(x^*\) is an accumulation point, then \(F(x^*)=0\) by Lemma 5.1, i.e., \(x^*\) is feasible. According to Part 2) in Theorem 5.2, we get from (5.1) that

$$\begin{aligned} \theta (0)\ge \lim \limits _{\eta \rightarrow 0^+}\theta (\eta )=\lim \limits _{k\rightarrow \infty }f(x^k)=f(x^*)\ge \theta (0), \end{aligned}$$

which implies \(f(x^*)=\theta (0)\), i.e., \(x^*\) is a global optimal solution of \((P)\). \(\square \)

Using the penalty function \(\widetilde{f}_\sigma \), define the associated dual optimization problem

$$\begin{aligned} (L)&\max l(\sigma ) ~\\&\mathrm{s.t.}~~\sigma \ge 0, \end{aligned}$$

where \(l(\sigma )=\inf \{\widetilde{f}_\sigma (x,\varepsilon )~|~(x,\varepsilon )\in {[u,v]\times [0,\overline{\varepsilon }]}\}\). It is easy to see that the weak duality theorem holds (indeed, for every feasible point \(x\) of \((P)\), \(l(\sigma )\le \widetilde{f}_\sigma (x,0)=f(x)\)), i.e.,

$$\begin{aligned} \sup \limits _{\sigma \ge 0} l(\sigma )\le \theta (0). \end{aligned}$$

Note that \(l\) is a nondecreasing function. Therefore, invoking Part 2) of Theorem 5.2, we obtain the following necessary and sufficient condition for the validity of the zero duality gap property, i.e.,

$$\begin{aligned} \sup \limits _{\sigma \ge 0}l(\sigma )=\theta (0). \end{aligned}$$

Corollary 5.2

Let \(\beta \) satisfy Assumption \((C)\). Then the zero duality gap property between \((P)\) and \((L)\) holds if and only if \(\theta \) is lower semicontinuous at zero.

6 Numerical results

To give some insight into the behavior of Algorithm 5.1, numerical tests are performed on four nonlinear programming problems with equality constraints taken from [11]. The algorithm is implemented in Matlab 7.8.0 and run on an Intel Core 2 CPU at 2.39 GHz with 1.99 GB of memory. We use \(\Vert \nabla _{(x,\varepsilon )}\widetilde{f}_{\sigma }(x,\varepsilon )\Vert \le 10^{-6} \) as the stopping criterion. Tables 1, 2, 3 and 4 show the computational results for the corresponding problems with the following items:

$$\begin{aligned} \begin{array}{l@{\quad }l} \phi _i(t)\ (i=1,2,\ldots ,6)&{} \text {as defined in Sect.~2},\\ \sigma _k&{} \text {the penalty parameter},\\ x^k,\varepsilon _k &{}\text {the final iterate},\\ \Delta (x^k,\varepsilon _k)&{} \text {the violation measure of the constraints},\\ \widetilde{f}_{\sigma _k}(x^k,\varepsilon _k)&{} \text {the value of the penalty function }\widetilde{f}_{\sigma }(x,\varepsilon )\text { at the final iterate } (x^k,\varepsilon _k)\\ &{} \text {with penalty parameter } \sigma _k. \end{array} \end{aligned}$$
Table 1 Numerical results of Example 6.1
Table 2 Numerical results of Example 6.2
Table 3 Numerical results of Example 6.3
Table 4 Numerical results of Example 6.4

We performed numerical tests using the different choices \(\phi _i\), \(i=1,4,6\), defined in Sect. 2, where \(\phi _1(t)\) gives a barrier-type penalty function and \(\phi _4(t), \phi _6(t)\) give exterior-type penalty functions. Here, \(\beta (\varepsilon )\) in (2.1) is set to \(\sqrt{\varepsilon }\), and \(\rho =2\) or \(\rho =4\). Tables 1, 2, 3 and 4 illustrate the practical behavior of the algorithm. The penalty subproblems can be (approximately) solved by unconstrained smooth minimization techniques, since the proposed penalty function has good smoothness properties. In our numerical experiments, trust-region methods are employed for solving the penalty subproblems.
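
For the stopping criterion \(\Vert \nabla _{(x,\varepsilon )}\widetilde{f}_{\sigma }(x,\varepsilon )\Vert \le 10^{-6}\), one needs the gradient of \(\widetilde{f}_\sigma \) on the smooth region \(D_a\), which can be read off from the left-hand sides of the stationarity conditions (3.2). The following Matlab sketch assembles it; the name grad_fsigma and the arguments gradf (for \(\nabla f\)), gradF (for the transposed Jacobian \(\nabla F\)), dphi (for \(\phi '\)), and dbeta (for \(\beta '\)) are our own assumptions.

```matlab
% Gradient of (2.1) with respect to (x, eps) on the smooth region D_a,
% read off from the stationarity conditions (3.2).  Illustrative sketch.
function g = grad_fsigma(gradf, F, gradF, phi, dphi, dbeta, ...
                         sigma, omega, x, epsilon)
  r  = F(x) - epsilon*omega;               % residual F(x) - eps*omega
  D  = norm(r)^2;                          % Delta(x, eps)
  gx = gradf(x) + (dphi(D)/epsilon) * (gradF(x)*r);       % d/dx part
  ge = -phi(D)/(2*epsilon^2) - (dphi(D)/epsilon)*(r'*omega) ...
       + sigma*dbeta(epsilon);                            % d/deps part
  g  = [gx; ge];
end
```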

Example 6.1

([11])

$$\begin{aligned} \min&f(x)=-x_1\\ \mathrm{s.t.}&F_1(x)=x_2-x_1^3-x_3^2=0,\\&F_2(x)=x_1^2-x_2-x_4^2=0. \end{aligned}$$

The point \(\bar{x}=(1,1,0,0)\) is the unique (global) minimizer, with optimal objective function value \(-1.0000\).
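
For concreteness, the data of Example 6.1 can be passed to the sketches above as follows; the values of \(\omega \), \(q\), \(\overline{\varepsilon }\), and the starting point are our own illustrative choices, not the settings behind Table 1.

```matlab
% Problem data of Example 6.1 in the notation of (P).
f = @(x) -x(1);
F = @(x) [x(2) - x(1)^3 - x(3)^2;  x(1)^2 - x(2) - x(4)^2];
u = -inf(4,1);  v = inf(4,1);        % no finite bounds in this example
omega = zeros(2,1);  epsbar = 1;  q = 1;
phi  = @(t) t./(1 - q*t);            % barrier-type phi_1 (alpha = 1, a = 1/q)
beta = @(e) sqrt(e);                 % the choice used in Sect. 6
x0 = zeros(4,1);  eps0 = 0.5;        % illustrative starting point
% Expected behaviour (cf. Table 1): eps_k -> 0 and x^k -> (1,1,0,0).
```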

Example 6.2

([11])

$$\begin{aligned} \min f(x)&= (x_1-x_2)^2+(x_2-x_3)^2+(x_3-x_4)^4+(x_4-x_5)^4 ~\\ \mathrm{s.t.}\, F_1(x)&= x_1+x_2^2+x_3^3-3=0,\\ F_2(x)&= x_2-x_3^2+x_4-1=0. \end{aligned}$$

The point \(\bar{x}=(1,1,1,1,1)\) is the (global) minimizer with the optimal objective function value 0.

Example 6.3

([11])

$$\begin{aligned} \min f(x)&= (4x_1-x_2)^2+(x_2+x_3-2)^2+(x_4-1)^2+(x_5-1)^2~\\ \mathrm{s.t.}~F_1(x)&= x_1+3x_2=0,\\ F_2(x)&= x_3+x_4-2x_5=0,\\ F_3(x)&= x_2-x_5=0. \end{aligned}$$

The point \(\bar{x}=(-\frac{33}{349},\frac{11}{349},\frac{180}{349}, -\frac{158}{349},\frac{11}{349})\) is a minimizer with the optimal objective function value \(\frac{1859}{349}\).

Example 6.4

([11])

$$\begin{aligned} \min f(x)&= \sum \limits _{j=1}^{10}x_j\left( c_j+\ln \frac{x_j}{x_1+\cdots +x_{10}}\right) \\ \mathrm{s.t.}~F_1(x)&= x_1+2x_2+2x_3+x_6+x_{10}-2=0,\\ F_2(x)&= x_4+2x_5+x_6+x_7-1=0,\\ F_3(x)&= x_3+x_7+x_8+2x_9+x_{10}=0,\\ x_i&\ge 0, i=1,2,\ldots , 10. \end{aligned}$$

where \(c_1=-6.089,c_2=-17.164,c_3=-34.054, c_4=-5.914,c_5=-24.721, c_6=-14.986,c_7=-24.100, c_8=-10.708, c_9=-26.662,c_{10}=-22.179\). The point \(\bar{x}=(0.1083,0.9438, 0,0.0004,0.4978,0.0040,0,0,0,0)\) is a local minimizer (though not the unique one) with objective function value \(-30.581215\).

As shown, for all choices of \(\phi _i\), as the penalty parameter gradually increases, the values of the auxiliary variable \(\varepsilon _k\) and of the constraint violation measure \(\Delta (x^k,\varepsilon _k)\) tend to zero, as desired. Additionally, it is not difficult to observe that the minimizers are obtained without requiring large penalty parameters for the choices of \(\phi _i\) considered here. The numerical performance confirms the developed theory. For example, as illustrated in Tables 1, 2, 3 and 4, the iterates \((x^k,\varepsilon _k)\) are already quite close to the point \((\bar{x},0)\), where \(\bar{x}\) is a minimizer of the original problem. As stated in Theorem 3.1, for sufficiently large penalty parameters an optimal solution of the penalty problem must take the form \((x^k,\varepsilon _k)\) with \(\varepsilon _k=0\), which in turn means that \(x^k\) is a local optimal solution of the original problem \((P)\).

In summary, our numerical experiments on four nonlinear programming problems with equality constraints confirm the efficiency of the proposed algorithm. As shown in Tables 1, 2, 3 and 4, the numerical outputs for the different choices \(\phi _i\), \(i=1,4,6\), show no significant differences, which demonstrates that the class of convex functions providing a unified framework for barrier-type and exterior-type penalty functions is effective for solving nonlinear programming problems. Nevertheless, there is a small difference in the implementation between exterior-type and barrier-type penalty functions: for barrier-type penalty functions, the additional constraint \(\Delta (x,\varepsilon )<a\) must be taken into account, for instance, \(a=q^{-1}\) for \(\phi _1(t)\), \(a=\frac{\pi }{2}\) for \(\phi _2(t)\), and \(a=1\) for \(\phi _3(t)\).