1 Introduction

It is well known that augmented Lagrangian methods are efficient tools for solving constrained optimization problems in many areas of application. The augmented Lagrangian for optimization problems with equality constraints was proposed by Hestenes [12] and Powell [23] and extended to inequality constraints by Rockafellar [24]. The Powell–Hestenes–Rockafellar (PHR) augmented Lagrangian function has been widely used in practice; see, for example, [1, 8], related to the packages Algencan and Lancelot, respectively.

In each iteration of an augmented Lagrangian method, a subproblem with simple constraints is approximately solved. One drawback of the PHR approach is that the objective functions of the subproblems are not twice continuously differentiable, which can make it difficult to apply Newton-like methods. This motivates the study of alternative augmented Lagrangian functions. Many nonlinear Lagrangian/penalty functions have been proposed and analyzed; see, for example, [2, 3, 6, 11, 14, 16, 17, 21, 22, 25, 29].

It is desirable that optimization algorithms have effective stopping criteria. Ideally, an algorithm should stop at an iterate for which it is possible to ensure feasibility and optimality up to arbitrarily small given precisions. On the other hand, if the problem is infeasible, the algorithm should stop with a certificate of infeasibility. In the context of global optimization and PHR augmented Lagrangian methods, this issue was considered in [5]. In the local optimization field, strategies handling possible infeasibility were considered, for example, in [7, 19, 30] in the context of SQP, augmented Lagrangian and Interior Point methods, respectively. In some practical applications, it may be useful to accept some level of infeasibility (for example when the constraints involve modeling errors) and distinguish infeasible points according to their objective function values. These issues were considered in [4].

In this paper, we consider a large family of augmented Lagrangian methods introduced by Kort and Bertsekas [2, 14]. We propose an algorithm based on this class and, accepting inexact global solutions of the subproblems, study its asymptotic convergence properties. We prove that the limit points of the sequence generated by the algorithm are global minimizers of an infeasibility measure, which is specified for some penalty functions. Based on these results, a second algorithm is proposed, which stops after a finite number of iterations at an approximate feasible/optimal solution or detects that the problem is infeasible. In particular, we extend some results of [5] to our more general setting. We also present some numerical experiments illustrating the applicability of the algorithm.

This paper is organized as follows. In Sect. 2, we present the optimization problem and the class of Lagrangian/penalty functions to be analyzed. In Sect. 3, we propose an algorithm and examine its asymptotic convergence properties. We also propose a second algorithm that takes into account the possible infeasibility of the problem. In Sect. 4, we present numerical experiments analyzing our algorithm for some Lagrangian/penalty functions. The last section is devoted to conclusions and final remarks.

Notation

The \(\alpha \)-norm and maximum norm of \(x \in {{\mathbb {R}}}^m\) are denoted, respectively, by \( \Vert x \Vert _\alpha :=\left( \sum _{i=1}^{m}|x_i|^{\alpha }\right) ^{1/\alpha }\) and \(\Vert x \Vert _\infty :=\max (|x_{1}|,\ldots ,|x_m|)\). When \(\alpha =2\), we write \(\Vert \cdot \Vert \). We denote by \(\mathbb {R}_{+}\) and \(\mathbb {R}_{++}\) the sets of non-negative and strictly positive real numbers, respectively. For \(v \in \mathbb {R}^n\), we set \(v_{+} := (\max \{0, v_1\}, \ldots , \max \{0, v_n\})\). For an infinite index set \(K = (k_1, k_2, \ldots ) \subseteq \mathbb {N}\) (with \(k_j < k_{j+1}\) for all \(j\)), we write \(K \displaystyle \mathop {\subset }_{\infty }\mathbb {N}\).

2 General Lagrangian function

In this paper, we consider the inequality constrained optimization problem

$$\begin{aligned} \begin{array}{l@{\quad }l} \text{ Minimize } &{} f(x) \\ \text{ subject } \text{ to } &{} g(x) \le 0 \\ &{} x \in \Omega , \end{array} \end{aligned}$$
(1)

where \( f: \mathbb {R}^n \rightarrow \mathbb {R}\) and \(g: \mathbb {R}^n \rightarrow \mathbb {R}^m\) are continuous and \(\Omega \subset \mathbb {R}^n \) is a nonempty compact set. In general, \(\Omega \) is defined by simple constraints such as affine constraints or box constraints.

In order to solve problem (1) or to detect that its feasible set is empty, we propose an algorithm based on the general class of Lagrangians introduced by Kort and Bertsekas [2, 14]

$$\begin{aligned} L(x, {\lambda },\rho )=f(x)+\frac{1}{\rho }\sum _{i=1}^{m}W(\rho g_i(x),\lambda _i), \end{aligned}$$
(2)

where \(x\in \Omega ,\) \(\lambda \in \mathbb {R}^{m}_{+}, \rho >0\), and the penalty function \(W:{{\mathbb {R}}}^2\rightarrow {{\mathbb {R}}}\) satisfies the following conditions:

  1. (H1)

    \(W\) is continuously differentiable on \({{\mathbb {R}}}\times ]0,\infty [\) and possesses, for all \(s \in {{\mathbb {R}}}\), the right derivative

    $$\begin{aligned} \lim _{t \rightarrow 0^+}\frac{W(s,t)-W(s,0)}{t}; \end{aligned}$$
  2. (H2)

    \(W(s,.)\) is concave on \({{\mathbb {R}}}_{+}\) for each fixed \(s\in {{\mathbb {R}}}\);

  3. (H3)

for each fixed \(t \in {{\mathbb {R}}}_{+}, W(.,t)\) is convex on \({{\mathbb {R}}}\) and satisfies the following strict convexity condition: If \(s_0>0\) or \(W'_s(s_0,t)>0\), then \(W(s,t)- W(s_0,t)>(s-s_0)W'_s(s_0,t)\) for \(s\ne s_0\), where \(W'_s(s,t)\) denotes the partial derivative of \(W(s,t)\) with respect to \(s\);

  4. (H4)

    \(W(0,t)=0, W'_s(0,t)=t\), for all \(t\in {\mathbb {R}}_{+}\);

  5. (H5)

    \( \lim _{s\rightarrow -\infty }W'_s(s,t)=0\), for all \(t\in {\mathbb {R}}_{+}\);

  6. (H6)

    \( \inf _{s\in {{\mathbb {R}}}}W(s,t)> -\infty \), for all \(t\in {{\mathbb {R}}}_{+}\);

  7. (H7)

    \( \lim _{s\rightarrow \infty }\frac{W(s,t)}{s}=\infty \), for all \(t\in {{\mathbb {R}}}_{+}\).

Remark 1

Since, by (H3), \(W(.,t)\) is convex, it follows that \(W'_s(.,t)\) is monotone (non-decreasing). Hence, (H5) implies

$$\begin{aligned} W'_s(s,t)\ge 0 \qquad \forall (s,t)\in {{\mathbb {R}}}\times {{\mathbb {R}}}_{+}, \end{aligned}$$

which shows that \(W(\cdot ,t)\) is non-decreasing for every \(t\in {{\mathbb {R}}}_{+}\). It is also possible to show that (H7) is equivalent to \(\lim _{s\rightarrow \infty }{W'_s(s,t)}=\infty \), for all \(t\in {{\mathbb {R}}}_{+}\).
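For concreteness, evaluating the augmented Lagrangian (2) for a given penalty \(W\) is straightforward. The following is a minimal Python sketch (ours, not from the original implementation), where the callables f, g and W are assumed to be supplied by the user.

```python
def aug_lagrangian(f, g, W, x, lam, rho):
    """Evaluate (2): L(x, lam, rho) = f(x) + (1/rho) * sum_i W(rho*g_i(x), lam_i).

    W may be any penalty satisfying (H1)-(H7)."""
    return f(x) + sum(W(rho * gi, li) for gi, li in zip(g(x), lam)) / rho
```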

The family of penalty functions satisfying (H1)–(H7) is very large. Next, we present two subclasses of functions satisfying these conditions. The first one generalizes the classical quadratic augmented Lagrangian, and the second one contains some exponential type functions. In particular, every function in the second family is twice continuously differentiable. For more details on these functions and other examples, see for instance [2].

Example 1

Let \(W:{{\mathbb {R}}}^2\rightarrow {{\mathbb {R}}}\) such that

$$\begin{aligned} W(s,t)=\left\{ \begin{array}{l@{\quad }l} ts+\phi (s) &{} \text{ if }\; t+\phi '(s)\ge 0,\\ \displaystyle \min _{\tau \in {{\mathbb {R}}}}[t\tau + \phi (\tau )] &{} \text{ otherwise }, \\ \end{array}\right. \end{aligned}$$
(3)

where \(\phi :{{\mathbb {R}}}\rightarrow {{\mathbb {R}}}\) satisfies the following conditions:

  1. (E1)

    \(\phi \) is continuously differentiable and strictly convex on \({{\mathbb {R}}}\);

  2. (E2)

    \(\phi (0)=0,\) \(\phi '(0)=0,\) \(\lim _{s \rightarrow -\infty }\phi '(s)=-\infty \);

  3. (E3)

    \(\lim _{s \rightarrow \infty } \frac{\phi (s)}{|s|}=\infty .\)

It is not difficult to prove that the penalty function \(W(s,t)\) defined in (3) satisfies (H1)–(H7), see [2]. In the following, we consider some examples of \(\phi \) satisfying conditions (E1)–(E3):

  1. (i)

    \(\phi (s)=\alpha ^{-1}|s|^{\alpha }, \alpha >1\);

  2. (ii)

    \(\phi (s)=\alpha ^{-1}|s|^{\alpha }+(1/2)s^2, \alpha >1\);

  3. (iii)

    \(\phi (s)=\cosh (s)-1\).

Taking \(\phi (s)=(1/2)s^2\) in (3), we obtain

$$\begin{aligned} W(s,t)=\left\{ \begin{array}{l@{\quad }l} ts+(1/2)s^2 &{} \text{ if }\; s\ge -t,\\ -(1/2)t^2 &{} \text{ if }\; s< -t .\\ \end{array}\right. \end{aligned}$$
(4)

Hence, after simple calculations, the corresponding augmented Lagrangian function (2) can be rewritten as

$$\begin{aligned} L(x, {\lambda },\rho )=f(x)+\frac{1}{2\rho }\sum _{i=1}^{m}\{[\max \{0,\lambda _i+\rho g_i(x)\}]^2-(\lambda _i)^2\}, \end{aligned}$$
(5)

which is the PHR augmented Lagrangian function [24]. Now, if \(\phi (s)=\alpha ^{-1}|s|^{\alpha }\), where \(\alpha >1\), or \(\phi (s)=\cosh (s)-1\), we obtain the following penalty functions, respectively,

$$\begin{aligned} W(s,t)=\left\{ \begin{array}{l@{\quad }l} ts+\alpha ^{-1}|s|^{\alpha } &{} \text{ if }\quad t+\alpha ^{-1}\frac{d}{ds}|s|^{\alpha }\ge 0,\\ \min _{\tau \in {{\mathbb {R}}}}[t\tau + \alpha ^{-1}|\tau |^{\alpha }] &{} \text{ otherwise }, \\ \end{array}\right. \end{aligned}$$
(6)

and

$$\begin{aligned} W(s,t)=\left\{ \begin{array}{l@{\quad }l} ts+\cosh (s)-1 &{}\quad \text{ if }\quad t+\sinh (s)\ge 0,\\ t\sinh ^{-1}(-t)+\cosh (\sinh ^{-1}(-t))-1 &{} \quad \text{ otherwise } .\\ \end{array}\right. \end{aligned}$$
(7)

(Recall that \(\sinh ^{-1}(x)=\ln (x+\sqrt{x^2+1})\)).
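As an illustration, these penalties admit simple closed-form implementations. The sketch below (ours, with an assumed signature W(s, t)) evaluates (4) and (6); for (6), the minimum over \(\tau \) is attained at \(\tau =-t^{1/(\alpha -1)}\), which gives the value \((1-\alpha )\alpha ^{-1}t^{\alpha /(\alpha -1)}\).

```python
import numpy as np

def W_phr(s, t):
    """PHR penalty (4): quadratic branch when s >= -t, constant otherwise."""
    return t * s + 0.5 * s**2 if s >= -t else -0.5 * t**2

def W_power(s, t, alpha=4.0/3.0):
    """Penalty (6) with phi(s) = |s|^alpha / alpha.

    The 'otherwise' branch uses the analytic minimizer tau = -t^(1/(alpha-1))."""
    if t + np.sign(s) * abs(s)**(alpha - 1.0) >= 0.0:
        return t * s + abs(s)**alpha / alpha
    return (1.0 - alpha) / alpha * t**(alpha / (alpha - 1.0))
```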

We mention that, similarly to the quadratic augmented Lagrangian, the penalty functions of Example 1 may not be twice continuously differentiable. Next, we present some examples of penalty functions with the advantage that the corresponding augmented Lagrangian functions are twice continuously differentiable, see [2] for details.

Example 2

Let \(W:{{\mathbb {R}}}^2\rightarrow {{\mathbb {R}}}\) such that

$$\begin{aligned} W(s,t)=t\psi (s)+\xi (s), \end{aligned}$$
(8)

where \( \psi ,\xi :{{\mathbb {R}}}\rightarrow {{\mathbb {R}}}\) satisfy the following conditions:

  1. (A1)

    \(\psi \) and \(\xi \) are twice continuously differentiable functions and convex;

  2. (A2)

    \(\psi (0)=0,\) \(\psi '(0)=1\) and \(\psi ''(s)>0\) for all \(s\in \mathbb {R}\);

  3. (A3)

    \(\lim _{s \rightarrow -\infty }\psi (s)>-\infty , \lim _{s \rightarrow -\infty }\psi '(s)=0\) and \(\lim _{s \rightarrow \infty }\psi '(s)=\infty \);

  4. (A4)

    \(\xi (s)=0\) for all \(s\le 0\) and \(\xi (s)>0\) for all \(s>0\);

  5. (A5)

    \(\xi ''(0)=0\) and \(\lim _{s \rightarrow \infty }\xi '(s)=\infty \).

An interesting example in this family of penalty functions is obtained by taking \(\psi (s)=e^s-1\) and \(\xi (s)=\frac{1}{3}[\max \{0,s\}]^3\). In this case,

$$\begin{aligned} W(s,t)=t(e^s-1)+\frac{1}{3}[\max \{0,s\}]^3 \end{aligned}$$
(9)

and the associated augmented Lagrangian is

$$\begin{aligned} L(x, {\lambda },\rho )=f(x)+\frac{1}{\rho }\sum _{i=1}^{m} \big ( \lambda _i(e^{\rho g_i(x)}-1)+ \frac{1}{3}[\max \{0,\rho g_i(x)\}]^3\big ). \end{aligned}$$

Another example of a twice continuously differentiable penalty function is obtained by taking \(\psi \) and \(\xi \) as

$$\begin{aligned} \psi (s)=\left\{ \begin{array}{l@{\quad }l} s+s^2&{} \text{ if }\; s\ge 0,\\ s/(1-s) &{} \text{ if }\; s< 0 ,\\ \end{array}\right. \quad \text{ and }\quad \xi (s)=\left\{ \begin{array}{l@{\quad }l} s^3&{} \text{ if }\; s\ge 0,\\ 0 &{} \text{ if }\; s< 0 .\\ \end{array}\right. \end{aligned}$$

Thus, the corresponding penalty function (8) becomes

$$\begin{aligned} W(s,t)=\left\{ \begin{array}{l@{\quad }l} ts+ts^2+ s^3&{} \text{ if }\; s\ge 0,\\ ts/(1-s) &{} \text{ if }\; s< 0 .\\ \end{array}\right. \end{aligned}$$
(10)
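Both penalties of Example 2 are also easy to evaluate. A minimal sketch, in the same assumed interface as before:

```python
import numpy as np

def W_exp(s, t):
    """Exponential-type penalty (9): t*(e^s - 1) + (1/3)*max(0, s)^3."""
    return t * (np.exp(s) - 1.0) + max(0.0, s)**3 / 3.0

def W_smooth(s, t):
    """Twice continuously differentiable penalty (10)."""
    return t * s + t * s**2 + s**3 if s >= 0.0 else t * s / (1.0 - s)
```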

In the following, we present a lemma which is essential for the analysis of our algorithm.

Lemma 1

Let \(s\in {{\mathbb {R}}}\), \(t\ge 0\), and let \(\{t_k\}\subset {{\mathbb {R}}}_{+}\) be a bounded sequence. Then

  1. (i)

    \(W_t'(s,t)\ge s\);

  2. (ii)

    \(sW_s'(s,t)\ge W(s,t)\ge tW_t'(s,t)\ge st;\)

  3. (iii)

    \(W_s'(s,t)=t\) implies \(s\le 0, st=0\);

  4. (iv)

\(\lim _{k\rightarrow \infty } \frac{1}{r_k}\inf _{s\in \mathbb {R}} W(s,t_k)=0\) for any sequence \(r_k \rightarrow \infty \).

Proof

For the proofs of items (i)–(iii), see [2, Proposition 5.1]. Let us prove item (iv). It follows by assumptions (H4) and (H6) that

$$\begin{aligned} 0 \ge \inf _{s\in \mathbb {R}} W(s,t)> -\infty . \end{aligned}$$

From (H2), the function \(q(t):= \inf _{s\in \mathbb {R}} W(s,t)\) is concave on \([0,+\infty [\). Let \(b\) be an upper bound for \(\{t_k\}\), \(0<t_k\le b \), \(v_k \in \partial q(t_k)\) and \(v \in \partial q(b)\). Therefore, using the supergradient inequality \(q(0)\le q(t_k)-v_kt_k \) and the monotonicity of the superdifferential of a concave function \((t_k\le b\Rightarrow v\le v_k )\), we obtain

$$\begin{aligned} q(t_k)\ge q(0)-b|v|,\qquad 0<t_k\le b . \end{aligned}$$

Since the last estimate also holds for \(t_k=0\) and \(q(t)\le 0\) for all \(t\ge 0\), it follows that \(q(t_k)\) is bounded for \(0\le t_k\le b \). Therefore, as \(r_k \rightarrow \infty \), we have

$$\begin{aligned} \lim _{k\rightarrow \infty } \frac{1}{r_k}\inf _{s\in \mathbb {R}} W(s,t_k)=0. \end{aligned}$$

\(\square \)

3 Algorithms

In this section, we study global convergence properties of an augmented Lagrangian method associated with the Lagrangian function defined in (2). We state the algorithm and show that the sequence it generates satisfies an asymptotic infeasibility bound. Based on this result, we derive infeasibility measures satisfied by the limit points of the sequence generated by the algorithm when specific penalty functions are considered. We also analyze some properties of the algorithm, which are essential for the feasibility/infeasibility test presented in the second algorithm. In order to state the algorithm, let \(\{\varepsilon _k\}\subset {{\mathbb {R}}}_{+}\) be a bounded sequence, which controls the quality of the approximate solutions of the augmented Lagrangian subproblems. Also, let us define \(\sigma (x,\lambda ,\rho ):=(\sigma _1(x,\lambda ,\rho ),\ldots ,\sigma _m(x,\lambda ,\rho ) )^T\) with

$$\begin{aligned} \sigma _i(x,\lambda ,\rho ):=W'_s(\rho g_i(x),\lambda _i)-\lambda _i,\;\; \text{ for }\; i=1,\ldots ,m. \end{aligned}$$
(11)

It follows by item (iii) of Lemma 1 that \(\left\| \sigma (x,\lambda ,\rho )\right\| _{\infty }\) may be used to measure how well an approximate solution \(x\) and a multiplier \(\lambda \) satisfy feasibility and complementarity. We shall use this measure to decide when to update the penalty parameter.

Algorithm 1

Let \(\gamma >1, 0 < \tau < 1, \rho _1>0, \lambda _{\max }>0\), and \({\lambda }^1 \in {\mathbb {R}}^m\) such that \({{\lambda }}^1_i \in [0, \lambda _{\max }],\) for all \(i=1,\dots ,m\). Assume that \(\{\varepsilon _k\}\) is a bounded positive sequence and initialize \(k \leftarrow 1\).

Step 1.:

(Subproblem)

Find \(x^k \in \Omega \) such that

$$\begin{aligned} L(x^k, {\lambda }^k,\rho _k) \le L(x, {\lambda }^k,\rho _k) + \varepsilon _k \end{aligned}$$
(12)

for all \(x \in \Omega \).

Step 2.:

(Update penalty parameter)

If \(k=1\) or

$$\begin{aligned} \Vert \sigma (x^k,{\lambda }^k,\rho _k)\Vert _{\infty } \le \tau \; \Vert \sigma (x^{k-1},{\lambda }^{k-1},\rho _{k-1})\Vert _{\infty }, \end{aligned}$$
(13)

set \(\rho _{k+1} = \rho _k\). Otherwise, set \(\rho _{k+1} = \gamma \rho _k\).

Step 3.:

(Update multipliers)

Compute \( {\lambda }^{k+1}_i \in [0, \lambda _{\max }],\) for all \( i=1,\dots ,m\).

Set \(k \leftarrow k+1\) and go to Step 1.

We mention that, as in many augmented Lagrangian methods, solving the subproblems is usually a difficult task. The underlying assumption is that solving a sequence of optimization problems with simple constraints is much easier than solving the constrained optimization problem (1) directly. In order to alleviate this task, we accept approximate solutions of the subproblems. In the first iterations, the tolerance \(\varepsilon _k\) may be loose, and it can be tightened as the algorithm progresses. Moreover, it follows from Remark 1 that, if problem (1) is convex, then the subproblems are also convex, which implies that every stationary point is a global solution.

In order to prove the theoretical results, we require the boundedness of the Lagrange multipliers; see Step 3 of Algorithm 1. In practice, the rule used to update the Lagrange multipliers is decisive for the performance of the algorithm. For the family of augmented Lagrangian functions studied in this paper, we consider the safeguarded update strategy given by

$$\begin{aligned} \lambda ^{k+1}_i=\min \{W'_s(\rho _kg_i(x^k),\lambda _i^k), \lambda _{\max }\}, \text{ for } \text{ all } k, \text{ and } i=1,\ldots ,m. \end{aligned}$$
(14)

A similar strategy is also considered in [1, 3, 5, 16].
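To fix ideas, the outer loop of Algorithm 1 with the safeguarded update (14) can be sketched as follows. This is only an illustration: solve_subproblem is an assumed oracle implementing Step 1, and Ws is an assumed callable evaluating the partial derivative \(W'_s\).

```python
import numpy as np

def algorithm1(g, solve_subproblem, Ws, m, rho1=10.0, gamma=10.0,
               tau=0.5, lam_max=1e10, n_iter=50):
    """Sketch of Algorithm 1 with the safeguarded multiplier update (14)."""
    lam, rho = np.zeros(m), rho1
    norm_prev = np.inf
    for k in range(1, n_iter + 1):
        eps_k = max(10.0**-(k + 2), 1e-8)       # tolerance, cf. Sect. 4.1
        x = solve_subproblem(lam, rho, eps_k)   # Step 1: eps_k-global solution
        # sigma_i = W'_s(rho*g_i(x), lam_i) - lam_i, see (11)
        Wsd = np.array([Ws(rho * gi, li) for gi, li in zip(g(x), lam)])
        norm_sigma = np.max(np.abs(Wsd - lam))
        lam = np.minimum(Wsd, lam_max)          # Step 3: safeguard (14)
        if k > 1 and norm_sigma > tau * norm_prev:
            rho *= gamma                        # Step 2: increase penalty
        norm_prev = norm_sigma
    return x, lam, rho
```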

In the following, we present some properties of Algorithm 1. These properties motivate our last algorithm, in which we introduce an infeasibility test. Let us consider the function \(M:\mathbb {R}^m \times \mathbb {R}_{+}^{m} \times \mathbb {R}_{++} \rightarrow \mathbb {R}\) defined by

$$\begin{aligned} M(y,\lambda ,\rho )=\frac{1}{\rho }\sum _{i=1}^{m}W(\rho y_i,\lambda _i). \end{aligned}$$
(15)

Proposition 2

Assume that \(\{x^k\}\) is a sequence generated by Algorithm 1. Then, for all \(z \in \Omega \),

$$\begin{aligned} \limsup _{k \rightarrow \infty } \frac{M(g(x^k),\lambda ^k,\rho _k)}{ \beta (\rho _k)}\le \limsup _{k \rightarrow \infty } \frac{M(g(z)_{+},\lambda ^k,\rho _k)}{\beta (\rho _k) }, \end{aligned}$$
(16)

for any continuous function \(\beta :\;]0,\infty [\;\rightarrow \mathbb {R}_{++}\) such that \(\lim _{s \rightarrow \infty }\beta (s)=\infty \). In particular, if problem (1) is feasible, then every limit point of \(\{x^ k\}\) is feasible.

Proof

Let \(x^* \in \Omega \) be an arbitrary limit point of \(\{x^k\}\) (such a point exists because \(\Omega \) is compact). Let \(K \displaystyle \mathop {\subset }_{\infty }\mathbb {N}\) such that \(\lim _{k \in K} x^k = x^*\). As \(\{\rho _k\}\) is non-decreasing, we may consider two cases: \(\{\rho _k\}\) bounded and \(\rho _k\rightarrow \infty \).

Case 1 (\(\{\rho _k\}\) bounded): By Step 2 of Algorithm 1, we have \(\lim _{k \rightarrow \infty }\Vert \sigma (x^k,\lambda ^k,\rho _k)\Vert _{\infty }=0,\) or equivalently,

$$\begin{aligned} \lim _{k \rightarrow \infty }[W'_s(\rho _k g_i(x^k),{\lambda }^k_i)-{\lambda }^k_i]=0, \quad i=1,\ldots ,m. \end{aligned}$$
(17)

Since \(\{\lambda ^k\}\) and \(\{\rho _k\}\) are bounded, we may assume, without loss of generality, \(\lambda ^k\rightarrow \bar{\lambda }\ge 0\), and \(\rho _k\rightarrow \bar{\rho }>0\). So, (17) implies

$$\begin{aligned} W'_s(\bar{\rho }g_i(x^*),\bar{\lambda }_i)=\bar{\lambda }_i \quad i=1,\ldots ,m. \end{aligned}$$

This result, combined with item (iii) of Lemma 1, gives \(g({x^*})\le 0\), which trivially implies the second part of the proposition when \(\{\rho _k\}\) is bounded. Now, since \( g_i({x^*})\le 0\) for all \(i=1,\ldots , m\), \(W(\cdot ,t)\) is non-decreasing for every \(t\ge 0\) and \(W(0,\bar{\lambda }_i)=0\), we obtain

$$\begin{aligned} \limsup _{k \rightarrow \infty } \frac{1}{\rho _k\beta (\rho _k)}\sum _{i=1}^{m}W(\rho _k g_i(x^k),{\lambda }^k_i)\le 0. \end{aligned}$$
(18)

Similarly, observing that \(0\le \max \{0,g_i(z)\}=g_i(z)_{+}\), we have

$$\begin{aligned} \limsup _{k \rightarrow \infty } \frac{1}{\rho _k\beta (\rho _k)}\sum _{i=1}^mW(\rho _k g_i(z)_{+},{\lambda }^k_i)= & {} \frac{1}{\bar{\rho }\beta (\bar{\rho })}\sum _{i=1}^{m} W(\bar{\rho } g_i(z)_{+},\bar{\lambda _i})\nonumber \\\ge & {} \frac{1}{\bar{\rho }\beta (\bar{\rho })}\sum _{i=1}^{m} W(0,\bar{\lambda _i}) =0. \end{aligned}$$
(19)

Therefore, (16) is implied by (18) and (19).

Case 2 (\(\rho _k\rightarrow \infty \)): For any \(z \in \Omega \), we have, by Step 1 of Algorithm 1

$$\begin{aligned} L(x^k, {\lambda }^k,\rho _k ) \le L(z, {\lambda }^k, \rho _k) + \varepsilon _k, \quad \forall \; k. \end{aligned}$$

Since \(g_i(z)\le g_i(z)_{+}\) for all \(i=1, \ldots ,m\) and \(W(\cdot ,t)\) is non-decreasing for every \(t\ge 0\), taking into account (2), the last inequality becomes

$$\begin{aligned} f(x^k)+\frac{1}{\rho _k}\sum _{i=1}^{m}W(\rho _k g_i(x^k),{\lambda }^k_i) \le f(z) +\frac{1}{\rho _k}\sum _{i=1}^{m}W(\rho _k g_i(z)_{+},{\lambda }^k_i)+\varepsilon _k, \quad \forall \; k. \end{aligned}$$

Hence, using the functions \(\beta \) and M [see (15)], it follows from the last inequality that

$$\begin{aligned} \frac{M(g(x^k),\lambda ^k,\rho _k)}{ \beta (\rho _k)}\le \frac{M(g(z)_+,\lambda ^k,\rho _k)}{\beta (\rho _k)} +\frac{\varepsilon _k + f(z) - f(x^k)}{ \beta (\rho _k)}. \end{aligned}$$

As \(\{\varepsilon _k\}\) is bounded, \(f\) is continuous, \(\lim _{\rho _k \rightarrow \infty }\beta (\rho _k)=\infty \), and \(\Omega \) is compact, then (16) follows from the last inequality. Thus, the first part of the proposition is concluded.

Observing that when \(\{\rho _k\}\) is bounded, the last assertion of the proposition has already been proved, we may assume that \(\rho _k \rightarrow \infty \). Let \(z \in \Omega \) be a global minimizer of (1). Hence, using that \(\max \{0,g_i(z)\}=0, W(0,{\lambda }^k_i)=0\) and (16), we obtain

$$\begin{aligned} \limsup _{k \rightarrow \infty } \frac{M(g(x^k),\lambda ^k,\rho _k)}{ \beta (\rho _k)}\le 0, \end{aligned}$$
(20)

for any continuous function \(\beta :\;]0,\infty [\;\rightarrow \mathbb {R}_{++}\) such that \( \lim _{s \rightarrow \infty }\beta (s)=\infty \). Suppose, by contradiction, that \(x^*\) is not feasible for problem (1). Thus, there exists some \(i_0\) such that \(g_{i_0}(x^{*})> 0\). Hence, by the continuity of \(g_{i_0}\), there exist \(\delta >0\) and \(k_0>0\) such that \(g_{i_0}(x^{k})\ge \delta \) for all \(k\ge k_0, k \in K\). Consequently, using that \(W(\cdot ,t)\) is non-decreasing for every \(t\ge 0\), we obtain

$$\begin{aligned} M(g(x^k),\lambda ^k,\rho _k)= & {} \displaystyle \frac{1}{\rho _k}\sum _{i\ne i_0}W(\rho _k g_i(x^k),{\lambda }^k_i)+\frac{W(\rho _k g_{i_0}(x^k),{\lambda }^k_{i_0})}{\rho _k} \nonumber \\\ge & {} \displaystyle \frac{1}{\rho _k}\sum _{i\ne i_0}W(\rho _k g_i(x^k),{\lambda }^k_i)+\frac{W(\rho _k \delta ,{\lambda }^k_{i_0})}{\rho _k} \nonumber \\\ge & {} \displaystyle \frac{1}{\rho _k}\sum _{i\ne i_0}\inf _{s\in \mathbb {R}} W(s,{\lambda }^k_i)+\frac{W(\rho _k \delta ,0)}{\rho _k}, \end{aligned}$$
(21)

where we have used properties of the infimum and item (i) of Lemma 1 in the last inequality. It follows by item (iv) of Lemma 1 and the properties of \(\beta \) that

$$\begin{aligned} \lim _{k\rightarrow \infty } \frac{1}{\rho _k\beta {(\rho _k)}}\sum _{i\ne i_0}\inf _{s\in \mathbb {R}} W(s,{\lambda }^k_i)=0. \end{aligned}$$
(22)

Now, taking \(\beta (s)={W(\delta s,0)}/{s}\), (H7) implies \(\lim _{s\rightarrow \infty }\beta (s)=\infty \). Hence, it follows by (21) and (22) that

$$\begin{aligned} \limsup _{k \rightarrow \infty } \frac{M(g(x^k),\lambda ^k,\rho _k)}{ \beta (\rho _k)}\ge 1, \end{aligned}$$

which is a contradiction with (20), concluding the proof. \(\square \)

Proposition 2 presents an asymptotic infeasibility measure associated with the general augmented Lagrangian function (2). In the following, we specify this infeasibility measure for some specific penalty functions.

Corollary 3

Let \(\{x^k\}\) be an infinite sequence generated by Algorithm 1 with \(W\) defined in (6). Then, for any limit point \(x^*\) of \(\{x^k\}\) and all \(z \in \Omega \), we have

$$\begin{aligned} \Vert g(x^*)_+\Vert _{\alpha }\le \Vert g(z)_+\Vert _{\alpha }. \end{aligned}$$
(23)

Proof

Let \(K \displaystyle \mathop {\subset }_{\infty }\mathbb {N}\) be such that \(\lim _{k \in K} x^k = x^*\). Using the definition in (6) and \(\rho _k>0\) for all \(k \in \mathbb {N}\), we obtain

$$\begin{aligned}&W(\rho _k g_i(x^k),{\lambda }^k_i)\nonumber \\&\quad =\left\{ \begin{array}{l@{\quad }l@{\quad }l} \rho _k{\lambda }^k_i g_i(x^k) +\alpha ^{-1}\rho _k^{\alpha } |g_i(x^k)|^{\alpha } &{} \text{ if }\quad g_i(x^k) \ge 0, \\ \rho _k{\lambda }^k_i g_i(x^k) +\alpha ^{-1}\rho _k^{\alpha } |g_i(x^k)|^{\alpha } &{} \text{ if }\quad g_i(x^k) < 0, \; {\lambda }^k_i\ge \rho _k^{\alpha -1}|g_i(x^k)|^{\alpha -1}, \\ (1-\alpha )\alpha ^{-1}({\lambda }^k_i)^{\alpha /(\alpha -1)} &{} \text{ if }\quad g_i(x^k) < 0, \; {\lambda }^k_i< \rho _k^{\alpha -1}|g_i(x^k)|^{\alpha -1}. \\ \end{array} \right. \end{aligned}$$

First, consider the case in which \(\rho _k \rightarrow \infty \). It follows from this expression of \(W\) that

$$\begin{aligned} \limsup _{k \in K} \sum _{i=1}^{m}\frac{W(\rho _k g_i(x^k),{\lambda }^k_i)}{\rho _k^{\alpha }}=\sum _{i=1}^{m}\alpha ^{-1} {|\max \{0,g_i(x^*)\}|^{\alpha }}. \end{aligned}$$

It is easy to see that a similar argument shows

$$\begin{aligned} \limsup _{k \in K} \sum _{i=1}^{m}\frac{W(\rho _k\max \{0, g_i(z)\},{\lambda }^k_i)}{\rho _k^{\alpha }}=\sum _{i=1}^{m}\alpha ^{-1} {|\max \{0,g_i(z)\}|^{\alpha }}. \end{aligned}$$

Therefore, by Proposition 2 with \(\beta (s)=s^{\alpha -1}\), we conclude that (23) holds in the case \(\rho _k \rightarrow \infty \).

In the case \(\{\rho _k\}\) bounded, similarly to the first part of the proof of Proposition 2, we obtain that each limit point \(x^*\) of \(\{x^k\}\) is feasible for problem (1), which trivially implies (23). \(\square \)

Remark 2

For the PHR augmented Lagrangian function (5), Corollary 3 implies the infeasibility measure \(\Vert g(x^*)_+\Vert \le \Vert g(z)_+\Vert \), for every \(z\in \Omega \) and \(x^*\) limit point of \(\{x^k\}\) (see [5, Theorem 2.1] for a similar result). If we consider the penalty function (10), similarly to the proof of Corollary 3 with \( \beta (s)=s^{2}\), we obtain \(\Vert g(x^*)_+\Vert _3\le \Vert g(z)_+\Vert _3\), for every \(z\in \Omega \) and \(x^*\) limit point of \(\{x^k\}\).

Next, we present an infeasibility measure for the limit points of the sequence \(\{x^k\}\) generated by Algorithm  1 when the exponential type function is considered.

Corollary 4

Assume that \(\{x^k\}\) is a sequence generated by Algorithm 1 with \(W\) defined in (9). Let \(x^*\in \Omega \) be a limit point of \(\{x^k\}\) such that \(\displaystyle \lim _{k \in K} x^k = x^*\) with \(\displaystyle K\displaystyle \mathop {\subset }_{\infty }\mathbb {N}\). Suppose that if \(g_i(x^*)>0\) then \(\displaystyle \limsup _{j\in K}\lambda _i^j>0\). Then, for all \(z \in \Omega \),

$$\begin{aligned} \Vert g(x^*)_+\Vert _{\infty }\le \Vert g(z)_+\Vert _{\infty }. \end{aligned}$$
(24)

Proof

Let us assume that problem (1) is infeasible; otherwise, it follows from the last statement of Proposition 2 that (24) trivially holds. Suppose by contradiction that there exists \(z\in \Omega \) such that \(\Vert g(x^*)_+\Vert _{\infty }> \Vert g(z)_+\Vert _{\infty }\). Since problem (1) is infeasible, \(\Vert g(z)_+\Vert _{\infty }>0\) and \(\rho _k \rightarrow \infty \) (see Step 2 of Algorithm 1). Therefore, there exist an index \(i_0\in \{1,\ldots ,m\}\) and \(\delta >0\) such that

$$\begin{aligned} g_{i_0}(x^*)=g_{i_0}(x^*)_+=\Vert g(x^*)_+\Vert _\infty >\Vert g(z)_+\Vert _{\infty }+\delta . \end{aligned}$$

As \(g\) is continuous, there exists \(k_0>0\) such that \(g_{i_0}(x^k)>\Vert g(z)_+\Vert _{\infty }+\delta \), for any \(k\ge k_0, k\in K\). In particular, it follows from our assumptions that \( \limsup _{k\in K}\lambda _{i_0}^k>0\). Observing that \(0\le \lambda ^k_i\le \lambda _{\max } \) and considering the function \(M\) of (15) and \(W\) defined in (9), it follows that

$$\begin{aligned} M(g(x^k),\lambda ^k,\rho _k)=\frac{1}{\rho _k}\sum _{i=1}^m\Big [\lambda ^k_i (e^{\rho _k g_i(x^k)}-1)+\frac{1}{3} [\max \{0,\rho _k g_i(x^k)\}]^3\Big ]\ge \frac{\lambda _{i_0}^k e^{\rho _k g_{i_0}(x^k)}}{\rho _k} -\frac{m\lambda _{\max }}{\rho _k}. \end{aligned}$$

Therefore, since \(z\) is fixed, defining \(\displaystyle \beta (s)=e^{s\Vert g(z)_+\Vert _{\infty }}\) we have

$$\begin{aligned} \limsup _{k\in K}\frac{M(g(x^k),\lambda ^k,\rho _k)}{\beta (\rho _k)}\ge & {} \limsup _{k\in K} \frac{1}{\rho _k}\bigg [\lambda _{i_0}^ke^{\rho _k(g_{i_0}(x^k)-\Vert g(z)_+\Vert _{\infty })} -\frac{m\lambda _{\max }}{e^{\rho _k\Vert g(z)_+\Vert _{\infty }}}\bigg ]\\\ge & {} \limsup _{k\in K} \lambda _{i_0}^k\frac{e^{\rho _k\delta }}{\rho _k}. \end{aligned}$$

As \( \limsup _{k\in K}\lambda _{i_0}^k>0\), we obtain \( \limsup _{k\in K}( \lambda _{i_0}^k{e^{\rho _k\delta }})/{\rho _k}=\infty .\) Combining this with the previous inequalities yields

$$\begin{aligned} \limsup _{k\in K}\frac{M(g(x^k),\lambda ^k,\rho _k)}{\beta (\rho _k)}=\infty . \end{aligned}$$
(25)

On the other hand,

$$\begin{aligned} \frac{M(g(z)_+,\lambda ^k,\rho _k)}{\beta (\rho _k)}= & {} \displaystyle \frac{1}{e^{\rho _k\Vert g(z)_+\Vert _{\infty }}}\sum _{i=1}^m\bigg [\frac{\lambda ^k_i}{\rho _k} (e^{\rho _kg_i(z)_+}-1)+\frac{1}{3\rho _k}(\max \{0,\rho _kg_i(z)_+\})^3\bigg ]\nonumber \\\le & {} \displaystyle \sum _{i=1}^m\bigg [\frac{\lambda ^k_i}{\rho _k} \big (e^{\rho _k(g_i(z)_+-\Vert g(z)_+\Vert _{\infty })}\big ) +\frac{\rho _k^2(\max \{0,g_i(z)_+\})^3}{3e^{\rho _k\Vert g(z)_+\Vert _{\infty }}}\bigg ]\nonumber \\\le & {} \displaystyle \sum _{i=1}^m \frac{\lambda ^k_i}{\rho _k}+\sum _{i=1}^m \frac{\rho _k^2}{3e^{\rho _k\Vert g(z)_+\Vert _{\infty }}}(\max \{0,g_i(z)_+\})^3\nonumber \\\le & {} \displaystyle \frac{m\lambda _{\max }}{\rho _k}+\sum _{i=1}^m \frac{\rho _k^2}{3e^{\rho _k\Vert g(z)_+\Vert _{\infty }}}(\max \{0,g_i(z)_+\})^3, \end{aligned}$$
(26)

where the first and third inequalities are due to the fact that the multipliers are non-negative and bounded by \(\lambda _{\max }\), respectively. In the second one, we used \(g_i(z)_+\le \Vert g(z)_+\Vert _{\infty }\), for \(i=1,\ldots ,m.\) Since \(\{\rho _k\}\) tends to infinity, we obtain from (26)

$$\begin{aligned} \displaystyle \limsup _{k\rightarrow \infty } \frac{M(g(z)_+,\lambda ^k,\rho _k)}{\beta (\rho _k)}\le 0, \end{aligned}$$

which, combined with (25), contradicts Proposition 2. This completes the proof of the corollary. \(\square \)

Remark 3

We mention that the update rule for the Lagrange multipliers is general; see Step 3 of Algorithm 1. Therefore, the additional asymptotic condition of Corollary 4 may be enforced as the algorithm progresses. If the update rule (14) is considered, then the asymptotic condition is satisfied when \(\Vert x^{k+1}-x^k\Vert \rightarrow 0\). We also mention that Corollary 4 holds for the penalty function defined in (7).

Next, we present an estimate which is used in Algorithm 2 to define a criterion to detect an approximate solution of problem (1). We shall consider for any \(k\),

$$\begin{aligned} \gamma _k:= -M(g(x^k),\lambda ^k,\rho _k)=-\frac{1}{\rho _k}\sum _{i=1}^{m}W(\rho _k g_i(x^k),{\lambda }^k_i). \end{aligned}$$
(27)

Lemma 5

Assume that \(\{x^k\}\) is an infinite sequence generated by Algorithm 1. Suppose that problem (1) is feasible. Then, for any \(k\)

$$\begin{aligned} f(x^k)\le f(z)+\gamma _k+\varepsilon _k, \end{aligned}$$

for every feasible point \(z\).

Proof

By Step 1 of Algorithm 1 and (2), we have

$$\begin{aligned} f(x^k)+\frac{1}{\rho _k}\sum _{i=1}^{m}W(\rho _k g_i(x^k),{\lambda }^k_i) \le f(z) +\frac{1}{\rho _k}\sum _{i=1}^{m}W(\rho _k g_i(z),{\lambda }^k_i)+\varepsilon _k,\quad \forall \; k, \;\text{ and }\;z\in \Omega .\nonumber \\ \end{aligned}$$
(28)

Let \(z\) be a feasible point. Therefore, since \(g(z) \le 0 \), \(W(0,t)=0\) for \(t\ge 0\) and \(W(\cdot ,t)\) is non-decreasing for every \(t\ge 0\), we have

$$\begin{aligned} W(\rho _k g_i(z),{\lambda }^k_i)\le W(0,{\lambda }^k_i)=0, \quad \forall \;k \;\text{ and }\; i=1,\ldots ,m . \end{aligned}$$

Combining this result with (27) and (28), we obtain

$$\begin{aligned} f(x^k)\le f(z)+\gamma _k+\varepsilon _k, \end{aligned}$$

which proves the lemma. \(\square \)

In the following theorem, we present a criterion to detect infeasibility after a finite number of iterations of Algorithm  1. We assume that it is possible to obtain a bounded sequence \(\{c_k\}\) satisfying

$$\begin{aligned} |f(x^k)-f(z) | \le c_k \;\; \text{ for } \text{ all } \;\; z \in \Omega , \, \forall k . \end{aligned}$$
(29)

In the numerical experiments section, we describe a way to obtain \(c_k\). Clearly, since \(f\) is continuous and \(\Omega \) is bounded, the sequence \(\{c_k\}\) may be assumed to be bounded.

Theorem 6

Assume that \(\{x^k\}\) is an infinite sequence generated by Algorithm 1. Then, problem (1) is infeasible if, and only if, there exists \(k \in \mathbb {N}\) such that

$$\begin{aligned} \gamma _k+\varepsilon _k<-c_k, \end{aligned}$$
(30)

where \(\gamma _k\) is defined in (27).

Proof

Suppose the feasible region of (1) is non-empty. Then, it follows by (29) and Lemma 5 that, for every feasible point \(z\),

$$\begin{aligned} \gamma _k+\varepsilon _k \ge f(x^k)-f(z) \ge -c_k,\qquad \qquad \forall \; k. \end{aligned}$$

This means that the infeasibility test (30) is never fulfilled.

Conversely, suppose that problem (1) is infeasible. In this case, it follows from Step 2 of Algorithm 1 that \(\rho _k \rightarrow \infty \). Also, for any subsequence \(\{x^k\}_{k \in K}\), there exist \(k_0>0\), \(i_0\in \{1,\ldots ,m\}\) and \(\delta >0\) such that \(g_{i_0}(x^{k})> \delta \) for all \(k\in K, k\ge k_0\). Consequently, using that \(W(\cdot ,t)\) is non-decreasing for every \(t\ge 0\), we obtain for all \(k\in K, k\ge k_0\)

$$\begin{aligned} \gamma _k= & {} \displaystyle -\frac{1}{\rho _k}\sum _{i=1}^{m}W(\rho _k g_i(x^k),{\lambda }^k_i)\nonumber \\= & {} \displaystyle -\frac{1}{\rho _k}\sum _{i\ne i_0}W(\rho _k g_i(x^k),{\lambda }^k_i)-\frac{W(\rho _k g_{i_0}(x^k),{\lambda }^k_{i_0})}{\rho _k}\nonumber \\\le & {} \displaystyle -\frac{1}{\rho _k}\sum _{i\ne i_0}W(\rho _k g_i(x^k),{\lambda }^k_i)-\frac{W(\rho _k \delta ,{\lambda }^k_{i_0})}{\rho _k}\nonumber \\\le & {} \displaystyle -\frac{1}{\rho _k}\sum _{i\ne i_0}\inf _{s\in \mathbb {R}} W(s,{\lambda }^k_i)-\frac{W(\rho _k \delta ,0)}{\rho _k}, \end{aligned}$$
(31)

where we have used properties of the infimum and item (i) of Lemma 1 to get the last inequality. It follows by item (iv) of Lemma 1 that

$$\begin{aligned} \lim _{k\rightarrow \infty } \frac{1}{\rho _k}\sum _{i\ne i_0}\inf _{s\in \mathbb {R}} W(s,{\lambda }^k_i)=0. \end{aligned}$$

On the other hand, (H7) implies

$$\begin{aligned} \lim _{k\rightarrow \infty } \frac{W(\rho _k \delta ,0)}{\rho _k}=\infty . \end{aligned}$$

Therefore, it follows from (31) that

$$\begin{aligned} \lim _{k\rightarrow \infty }\gamma _k=-\infty . \end{aligned}$$

Hence, the boundedness of \(\{\varepsilon _k\}\) and \(\{c_k\}\) implies the fulfillment of test (30), for \(k\) large enough. \(\square \)

In the following theorem, we show that an optimality condition holds after a finite number of iterations of Algorithm  1, under the assumption that problem (1) is feasible and \(\{ \varepsilon _k\} \) tends to zero.

Theorem 7

Assume that \(\{x^k\}\) is an infinite sequence generated by Algorithm 1. Let \(K \displaystyle \mathop {\subset }_{\infty }\mathbb {N}\) and \(x^* \in \Omega \) be such that \(\lim _{k \in K} x^k = x^*\). Suppose that problem (1) is feasible and \(\lim _{k \rightarrow \infty } \varepsilon _k = 0\). Then,

$$\begin{aligned} \lim _{k \in K} \gamma _k=0, \end{aligned}$$
(32)

where \(\gamma _k\) is defined in (27). As a consequence, given \(\varepsilon _{\text{ opt }}>0\), for large enough \(k\),

$$\begin{aligned} \gamma _k + \varepsilon _k \le \varepsilon _{\text{ opt }}, \end{aligned}$$

which in turn implies

$$\begin{aligned} f(x^k)\le f(z)+\varepsilon _{\text{ opt }}, \qquad \forall \;z \;\text{ feasible }. \end{aligned}$$

Proof

It follows from Proposition 2 that \(x^*\) is feasible for problem (1). Since the feasible region of (1) is non-empty and compact, there exists a global minimizer \(z \in \Omega \). By Lemma 5,

$$\begin{aligned} f(x^k)\le f(z)+\gamma _k +\varepsilon _k, \qquad \forall \; k. \end{aligned}$$
(33)

First, we shall prove (32) in the case that \(\rho _k \rightarrow \infty \). For this, note that item (iv) of Lemma 1 implies

$$\begin{aligned} \lim _{k\rightarrow \infty } \frac{1}{\rho _k}\sum _{i=1}^{m}\inf _{s\in \mathbb {R}} W(s,{\lambda }^k_i)=0. \end{aligned}$$

Using the definition of \(\gamma _k\) in (27), properties of the infimum and the last equality, we obtain

$$\begin{aligned} \limsup _{k \in K}\gamma _k=\limsup _{k \in K}-\frac{1}{\rho _k}\sum _{i=1 }^{m}W(\rho _k g_i(x^k),{\lambda }^k_i)\le \limsup _{k \in K}-\frac{1}{\rho _k}\sum _{i=1}^{m}\inf _{s\in \mathbb {R}} W(s,{\lambda }^k_i)=0. \end{aligned}$$
(34)

On the other hand, taking the limit inferior for \(k \in K\) in (33) and using \({ \varepsilon _k} \rightarrow 0\) and \(f(z)\le f(x^*)\), we obtain

$$\begin{aligned} \liminf _{k \in K} \gamma _k \ge 0, \end{aligned}$$

which, combined with (34), proves (32) in the case that \(\rho _k \rightarrow \infty \).

Now, if \(\{\rho _k\}\) is bounded, by Step 2 of Algorithm 1, we have \(\lim _{k \rightarrow \infty }\Vert \sigma (x^k,\lambda ^k,\rho _k)\Vert _{\infty }=0,\) or equivalently,

$$\begin{aligned} \lim _{k \rightarrow \infty }[W'_s(\rho _k g_i(x^k),{\lambda }^k_i)-{\lambda }^k_i]=0, \quad i=1,\ldots ,m. \end{aligned}$$
(35)

As \(\{\lambda ^k\}\) and \(\{\rho _k\}\) are bounded, we can assume that \(\lambda ^k\rightarrow \bar{\lambda }\ge 0\) and \(\rho _k\rightarrow \bar{\rho }>0\). So, (35) implies

$$\begin{aligned} W'_s(\bar{\rho } g_i(x^*),\bar{\lambda }_i)=\bar{\lambda }_i \quad i=1,\ldots ,m, \end{aligned}$$
(36)

which, combined with item (iii) of Lemma 1, gives

$$\begin{aligned} g({x^*})\le 0,\quad \bar{\lambda }_ig_i({x^*})=0, \quad i=1,\ldots ,m. \end{aligned}$$
(37)

Now, using item (ii) of Lemma 1, we have for all \(k\) and \(i\)

$$\begin{aligned} \lambda ^k_i g_i(x^k)\le \frac{1}{\rho _k}W(\rho _kg_i(x^k),\lambda ^k_i) \le g_i(x^k)W_s'(\rho _kg_i(x^k),\lambda ^k_i). \end{aligned}$$

Taking the limit for \(k \in K\) in the last inequality, together with (36) and (37), it follows that

$$\begin{aligned} \lim _{k \in K}\frac{1}{\rho _k}[W(\rho _k g_i(x^k),{\lambda }^k_i)]=\bar{\lambda }_i g_i(x^*)=0,\quad i=1,\ldots ,m, \end{aligned}$$

which, combined with (27), yields (32) in the case \(\{\rho _k\}\) bounded. Thus, the proof of (32) is concluded.

The second statement of the theorem follows from the first result and the assumption \( \varepsilon _k \rightarrow 0\). In order to establish the last part of the theorem, combine the second one and Lemma 5. \(\square \)

Due to the previous theorems, we are able to define a variation of Algorithm  1, for which we can guarantee finite termination with certificates of infeasibility or optimality up to given precisions. To state Algorithm 2, we assume that \(\varepsilon _{\text{ feas }} > 0\) and \(\varepsilon _{\text{ opt }} > 0\) are user-given tolerances for feasibility and optimality respectively.

Algorithm 2

Let \(\gamma >1, 0 < \tau < 1, \rho _1>0, \lambda _{\max }>0\), and \({\lambda }^1 \in {{\mathbb {R}}}^m\) such that \({{\lambda }}^1_i \in [0, \lambda _{\max }],\) for all \(i=1,\dots ,m\). Assume that \(\{\varepsilon _k\}\) is a bounded positive sequence and initialize \(k \leftarrow 1\).

Step 1:

(Subproblem)

Obtain \(x^k \in \Omega \) satisfying

$$\begin{aligned} L(x^k, {\lambda }^k,\rho _k) \le L(z, {\lambda }^k,\rho _k) + \varepsilon _k,\qquad \forall z \in \Omega . \end{aligned}$$
Step 2:

(Infeasibility test)

Let

$$\begin{aligned} \gamma _k := -\frac{1}{\rho _k}\sum _{i=1}^{m}[W(\rho _k g_i(x^k),\lambda ^k_i)]. \end{aligned}$$

Compute \(c_k > 0\) such that \(| f(x^k) - f(z) | \le c_k \) for all \(z \in \Omega \). If

$$\begin{aligned} \gamma _k + \varepsilon _k < - c_k, \end{aligned}$$

stop declaring Infeasible problem.

Step 3:

(Feasibility and optimality test)

If

$$\begin{aligned} \Vert g(x^k)_+\Vert _{\infty } \le \varepsilon _{\text{ feas }} \;\; \text{ and } \;\; \gamma _k + \varepsilon _k \le \varepsilon _{\text{ opt }}, \end{aligned}$$

stop declaring Solution found.

Step 4:

(Updating the penalty parameter)

If \(k=1\) or

$$\begin{aligned} \Vert \sigma (x^k,\lambda ^k,\rho _k)\Vert _{\infty } \le \tau \; \Vert \sigma (x^{k-1},\lambda ^{k-1},\rho _{k-1})\Vert _{\infty }, \end{aligned}$$

then set \(\rho _{k+1} = \rho _k\). Otherwise, set \(\rho _{k+1} = \gamma \rho _k\).

Step 5:

(Updating the multipliers)

Compute \( \lambda ^{k+1}_i=\min \{W'_s(\rho _kg_i(x^k),\lambda _i^k), \lambda _{\max }\},\) for all \( i=1,\dots ,m\).

Set \(k \leftarrow k+1\) and go to Step 1.
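The stopping tests of Steps 2 and 3 reduce to a few lines. Below is a hedged sketch (our own helper, not part of Algencan), where f_min and f_max are assumed bounds on \(f\) over \(\Omega \), such as those obtained by interval arithmetic in Sect. 4.1:

```python
import numpy as np

def stopping_tests(x, lam, rho, eps_k, f, g, W, f_min, f_max,
                   eps_feas=1e-4, eps_opt=1e-4):
    """Steps 2-3 of Algorithm 2: infeasibility certificate (30) and the
    approximate feasibility/optimality test."""
    gx = np.asarray(g(x))
    gamma_k = -sum(W(rho * gi, li) for gi, li in zip(gx, lam)) / rho
    c_k = max(f(x) - f_min, f_max - f(x))
    if gamma_k + eps_k < -c_k:
        return "infeasible problem"
    if np.max(gx.clip(min=0.0)) <= eps_feas and gamma_k + eps_k <= eps_opt:
        return "solution found"
    return "continue"
```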

The next result is an immediate consequence of Theorems 6 and 7.

Theorem 8

Let \(\{x^k\}\) be generated by Algorithm 2 with \(\lim _{k \rightarrow \infty } \varepsilon _k = 0\). Then, after a finite number of iterations, the algorithm either finds an approximate solution \(x^k\) satisfying

$$\begin{aligned} \Vert g(x^k)_+\Vert _{\infty } \le \varepsilon _{\text{ feas }}, \qquad f(x^k) \le f(z) + \varepsilon _{\text{ opt }} \end{aligned}$$

for all \(z \in \Omega \) such that \(g(z) \le 0\), or stops certifying that the problem is infeasible.

4 Numerical experiments

This section summarizes the results of the numerical experiments we carried out in order to verify the effectiveness of Algorithm 2, for solving problem (1) or identifying if it is infeasible. We implemented Algorithm 2 considering five different augmented Lagrangian functions (2) with the penalty functions (4), (6) with \(\alpha =4/3\), (7), (9) and (10), which will be denoted by augmented Lagrangian of type (ALtype) \(1, 2, 3, 4\) and 5, respectively.

Our algorithm was implemented as a modification of the software Algencan, a Fortran 77 code freely available at http://www.ime.usp.br/~egbirgin/tango/ and based on the theory presented in [1]. Similarly to [3], our main modification is the implementation of different augmented Lagrangian functions. The main differences between our code and the one implemented in [3] are a multistart strategy to solve the subproblems and the introduction of the stopping criteria described in Steps 2 and 3 of Algorithm 2. The experiments were run on a 3.4 GHz Intel(R) i7 with 4 processors, 8 GB of RAM and the Linux operating system.

4.1 Implementation details

The performance of algorithms is, in general, sensitive to the choice of the initial parameters. For comparison purposes, we decided to keep the same configuration of the initial algorithmic parameters for all augmented Lagrangian versions. We set \(\varepsilon _{\text{ feas }} = \varepsilon _{\text{ opt }} = 10^{-4}, \lambda ^1=0, \lambda _{\max }=10^{10}, \rho _1=10, \gamma = 10\) and \(\tau =0.5\). We set \(\varepsilon _1=10^{-4}\) and \(\varepsilon _k=\max \{10^{-(k+2)},10^{-8}\}\) for \(k\ge 2\).

In all considered problems, the set \(\Omega \) is a box \(\{ x \in \mathbb {R}^n \;| \; l \le x \le u \}\). We use the local optimization package Gencan to solve the box-constrained subproblems. Gencan is an active-set method that uses a truncated Newton approach with line searches within the faces and spectral projected gradient iterations for leaving the faces; see [1] and references therein. In all implemented versions, we replaced each Hessian-vector product of the augmented Lagrangian by an incremental quotient, using the approximation

$$\begin{aligned} \nabla ^2L(x,\lambda ,\rho )d\approx \frac{1}{t}\left( \nabla L(x+td,\lambda ,\rho )-\nabla L(x,\lambda ,\rho )\right) \end{aligned}$$

with a small value of \(t\).
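In code, this incremental quotient is a one-line helper; grad_L is an assumed callable returning \(\nabla L(\cdot ,\lambda ,\rho )\) as an array.

```python
def hess_vec(grad_L, x, d, t=1e-8):
    """Finite-difference approximation of the Hessian-vector product
    of the augmented Lagrangian, as used inside truncated Newton."""
    return (grad_L(x + t * d) - grad_L(x)) / t
```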

Step 1 of Algorithm 2 requires the global solution of the subproblems. However, we use Gencan, which is a local optimization algorithm, to solve them. In each outer iteration, we ran Gencan 30 times from different randomly generated initial points (feasible with respect to the box \(\Omega \)) and took as solution the point with the smallest value of the objective function of the subproblem. By adopting this strategy, we assume that we are able to find a global minimizer of each subproblem. Let us call this strategy multistart.
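A sketch of this multistart heuristic, under the assumption that local_solve wraps a local solver (Gencan in our experiments) and returns a pair (point, subproblem value):

```python
import numpy as np

def multistart(local_solve, lower, upper, n_starts=30, seed=0):
    """Run the local solver from random points in the box [lower, upper]
    and keep the best subproblem value found."""
    rng = np.random.default_rng(seed)
    best_x, best_val = None, np.inf
    for _ in range(n_starts):
        x0 = lower + rng.random(lower.size) * (upper - lower)
        x, val = local_solve(x0)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val
```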

For simplicity, let us denote the objective function of the \(k\)-th subproblem \(L (x,\lambda ^k,\rho _k)\) by \(L (x)\). Given a tolerance \(\bar{\varepsilon }_k>0\), Gencan stops at point \(y\) if

$$\begin{aligned} \left\| y-P_{\Omega }(y-\nabla L (y)) \right\| \le \bar{\varepsilon }_k, \end{aligned}$$
(38)

where \(P_{\Omega }(x)\) is the Euclidean projection of \(x\) onto the box \(\Omega \). In particular, inequality (38) holds for \(y\) equal to \(x^k\). The tolerance \(\bar{\varepsilon }_k\) used in the stopping criterion (38) of the \(k\)-th subproblem is related to the gap \(\varepsilon _k\) at Step 1 of Algorithm 2 by

$$\begin{aligned} \bar{\varepsilon }_k = \frac{\varepsilon _k}{\max \{1,d(x^k)+\left\| \nabla L(x^k)\right\| \}}, \end{aligned}$$
(39)

where

$$\begin{aligned} d(x) = \left\| \max \left\{ P_{\Omega }(x-\nabla L (x))-l,u-P_{\Omega }(x-\nabla L (x))\right\} \right\| . \end{aligned}$$

Let us justify the relation (39). Denote by \(x_*^k\) a global minimizer of the \(k\)-th subproblem. Since the box \(\Omega \) is compact and convex, using properties of the Euclidean projection, we obtain

$$\begin{aligned} \left<x^k-\nabla L(x^k)-P_{\Omega }(x^k-\nabla L (x^k)),x_*^k-P_{\Omega }(x^k-\nabla L (x^k))\right> \le 0. \end{aligned}$$

Then,

$$\begin{aligned} \left<\nabla L(x^k),P_{\Omega }(x^k-\nabla L (x^k))-x_*^k\right> + \left<x^k-P_{\Omega }(x^k-\nabla L (x^k)), x_*^k-P_{\Omega }(x^k-\nabla L (x^k))\right> \le 0, \end{aligned}$$

and, by the Cauchy–Schwarz inequality,

$$\begin{aligned} \left<\nabla L(x^k),P_{\Omega }(x^k-\nabla L (x^k))-x_*^k\right> \le \left\| x^k-P_{\Omega }(x^k-\nabla L (x^k)) \right\| \left\| x_*^k-P_{\Omega }(x^k-\nabla L (x^k)) \right\| . \end{aligned}$$

Therefore, by the stopping criterion (38) and the definition of \(d(x)\), we have

$$\begin{aligned} \left<\nabla L(x^k),P_{\Omega }(x^k-\nabla L (x^k))-x_*^k\right> \le \bar{\varepsilon }_k d(x^k). \end{aligned}$$
(40)

If \(L(x)\) is a convex function, it follows by the gradient inequality that

$$\begin{aligned} L(x_*^k) \ge L(x^k)+ \left<\nabla L(x^k),x_*^k-x^k\right>, \end{aligned}$$

or, equivalently,

$$\begin{aligned} L(x_*^k) \ge L(x^k)+ \left<\nabla L(x^k),P_{\Omega }(x^k-\nabla L (x^k))-x^k\right>+ \left<\nabla L(x^k),x_*^k- P_{\Omega }(x^k-\nabla L (x^k))\right>. \end{aligned}$$

Hence, using the Cauchy–Schwarz inequality, stopping criterion (38) and (40), we obtain

$$\begin{aligned} L(x_*^k) \ge L(x^k)- \bar{\varepsilon }_k\left( d(x^k)+\Vert \nabla L(x^k)\Vert \right) , \end{aligned}$$

which, combined with (39), gives the desired inequality in Step 1 of Algorithm 2. Now, consider the case where \(L(x)\) is not a convex function. If we assume that \(x^k\) is close enough to \(x_*^k\) then the linear approximation \(L(x^k)+ \left<\nabla L(x^k),x_*^k-x^k\right>\) of \(L(x_*^k)\) is “acceptable”. Therefore, we can proceed as in the convex case to conclude that the inequality in Step 1 of Algorithm 2 is “approximately” satisfied.
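In code, relation (39) amounts to the following helper (a sketch under the same notation; grad_L_xk stands for \(\nabla L(x^k)\)):

```python
import numpy as np

def subproblem_tolerance(xk, grad_L_xk, l, u, eps_k):
    """Relation (39): converts the global gap eps_k of Step 1 into the
    projected-gradient tolerance used in the stopping criterion (38)."""
    p = np.clip(xk - grad_L_xk, l, u)             # P_Omega(x - grad L(x))
    d = np.linalg.norm(np.maximum(p - l, u - p))  # d(x) as defined above
    return eps_k / max(1.0, d + np.linalg.norm(grad_L_xk))
```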

The constant \(c_k\) at Step 2 of Algorithm 2 is computed as suggested in [5]. By interval arithmetic, an interval \([f^{\min },f^{\max }]\) is computed such that \(f^{\min } \le f(x) \le f^{\max }\) for all \(x \in \Omega \). Then, \(c_k\) is calculated by

$$\begin{aligned} c_k = \max \{ f(x^k) - f^{\min }, f^{\max } - f(x^k)\}. \end{aligned}$$

For interval analysis calculations we use the Intlib library [13].
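As a toy illustration of this step (the paper uses Intlib; the natural interval extension below is ours and handles only a separable sum of squares):

```python
import numpy as np

def bounds_sum_of_squares(l, u):
    """Interval bounds [f_min, f_max] for f(x) = sum_i x_i^2 on the box [l, u]."""
    lo = np.where((l <= 0.0) & (u >= 0.0), 0.0, np.minimum(l**2, u**2))
    hi = np.maximum(l**2, u**2)
    return lo.sum(), hi.sum()

def compute_ck(f_xk, f_min, f_max):
    """c_k = max{f(x^k) - f_min, f_max - f(x^k)}, entering test (30)."""
    return max(f_xk - f_min, f_max - f_xk)
```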

4.2 Some feasible examples

The main purpose of this section is to illustrate the performance of Algorithm 2 with the five types of augmented Lagrangian functions on some feasible problems. In the following experiments, we considered the set of problems with inequality constraints analyzed in [5]. These problems come from the global optimization literature. Each problem is feasible and the optimal value of the objective function is known. Therefore, it is possible to check whether a particular solution given by Algorithm 2 is indeed a global minimizer.

Table 1 Performance of Algorithm 2 with the five augmented Lagrangian functions in the problems 1–7

In this section, we are also interested in checking whether the multistart strategy to solve the subproblems and the relation (39) are appropriate. The reliability of these approaches will be corroborated if the problems are successfully solved. We understand that a problem has been successfully solved if Algorithm 2 stops at Step 3 and the final iterate \(x^*\) satisfies \(f(x^*)\le f(z)+\varepsilon _{opt}\) for all feasible \(z\). The seven problems considered in this section are those analyzed in [5].


Table 1 shows the performance of Algorithm 2 considering the five types of augmented Lagrangian functions. In the table, the first three columns identify the problem and the number of variables and constraints. “ALtype” identifies the considered augmented Lagrangian function, “Iter” is the number of augmented Lagrangian iterations, \(f(x^*)\) is the value of the objective function at the final iterate \(x^*\), \(\Vert g(x^*)_+\Vert _{\infty }\) is the infeasibility measure at \(x^*\), \(\gamma _k+\varepsilon _k\) is the value computed at Steps 2 and 3 in the last iteration, and “Global” indicates whether the final iterate is a global solution, in the sense that \(f(x^*)\le f(z)+\varepsilon _{opt}\) for all feasible \(z\), or not.

In all instances, Algorithm 2 stopped at Step 3 in a finite number of iterations reporting solution found. In fact, note that the values of \(\gamma _k+\varepsilon _k\) in Table 1 are equal to or less than \(\varepsilon _{opt}=10^{-4}\). The last column reports that all problems were successfully solved by Algorithm 2 for each augmented Lagrangian type. Analyzing the column “Iter” of Table 1, we note that, for these problems, the algorithm has a similar performance for all considered augmented Lagrangian functions. Moreover, these results allow us to conclude that the multistart strategy to solve the subproblems and the relation (39) are appropriate.

4.3 Kissing problems

In this section, our aim is to illustrate the ability of our algorithm to detect whether a problem is infeasible. For this, we analyze our infeasibility test on some infeasible problems. For completeness, we also consider some feasible problems.

Given a set of \(nSph\) \(N\)-dimensional unit spheres, we consider the problem of finding the coordinates of the sphere centers in such a way that each sphere touches another given one, no two spheres overlap, and the sum of the squared distances between the sphere centers is maximized.

Let us denote by \(C_i\in \mathbb {R}^{N}\) the coordinate vector of the \(N\)-dimensional sphere center for all \(i=1,\dots ,nSph\). This problem can be modeled as follows:

$$\begin{aligned} \begin{array}{lll} \text{ Maximize } &{} \displaystyle \sum _{i<j} \left\| C_i-C_j\right\| ^2 &{} \\ \text{ subject } \text{ to } &{} \left\| C_i\right\| ^2=1, &{} i=1,\dots ,nSph,\\ &{} \left\| C_i-C_j\right\| ^2\ge 1, &{} \forall i<j,\\ &{} -1\le C_i\le 1, &{} i=1,\dots ,nSph. \end{array} \end{aligned}$$
(41)

The first set of constraints requires that each considered sphere touches another one centered at the origin, while the second set requires that the spheres be placed without overlapping. Simple devices can be used to transform problem (41) into the form (1). For example, we may rewrite each equality constraint as two inequality constraints. After making the necessary modifications, problem (41) has \(n= N \times nSph\) variables, \(2\times nSph\) constraints arising from the first set and another \(C^{nSph}_{2}\) arising from the second one, totaling \(m = 2\times nSph + C^{nSph}_{2}\) constraints in addition to the box constraints.
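A sketch of this reformulation, assembling \(g(x)\le 0\) from the center coordinates (our own helper; C is an nSph-by-N array):

```python
import numpy as np

def kissing_constraints(C):
    """Constraints of (41) in the form g(x) <= 0: each equality
    ||C_i||^2 = 1 is split into two inequalities, plus no-overlap pairs."""
    nSph = C.shape[0]
    norms = np.sum(C**2, axis=1)
    g = list(norms - 1.0) + list(1.0 - norms)         # 2*nSph constraints
    for i in range(nSph):
        for j in range(i + 1, nSph):                  # C(nSph, 2) pairs
            g.append(1.0 - np.sum((C[i] - C[j])**2))  # ||C_i - C_j||^2 >= 1
    return np.array(g)
```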

The kissing number is defined as the maximum number of non-overlapping \(N\)-dimensional unit spheres that can be arranged so that each one touches another given \(N\)-dimensional unit sphere. When \(N=2\) the kissing number equals 6, while for \(N=3\) and \(N=4\) it equals 12 and 24, respectively [20, 27]. Therefore, for \(N=2\), problem (41) is feasible if \(1\le nSph\le 6\) and infeasible if \(nSph>6\). Analogously, feasible instances of problem (41) are obtained when \(N=3\) and \(1\le nSph\le 12\) or when \(N=4\) and \(1\le nSph\le 24\). Infeasible instances may be generated with \(N=3\) and \(nSph> 12\) or with \(N=4\) and \(nSph> 24\).

For illustrative purposes, we considered six instances of problem (41) with \(N=2\) and \(nSph=3, 6, 7, 10, 12\) and 15. The first two instances are feasible while the remaining ones are infeasible. The problems were solved with different types of augmented Lagrangians, specified in Fig. 1. For all instances, Algorithm 2 behaved according to the theory: for the feasible instances it stopped at Step 3 reporting solution found, while for the infeasible ones it stopped at Step 2 declaring infeasibility. Figure 1 illustrates the “solutions”.

Fig. 1 Graphical representation of six instances of the kissing problem with \(N=2\) and \(nSph=3, 6, 7, 10, 12\) and 15

For the infeasible instances illustrated in Fig. 1, Algorithm 2 stopped at outer iterations \(4, 3, 3\) and 3, respectively. Although Algorithm 2 stopped within a few iterations, the orderly arrangement of the circles corroborates the result of Proposition 2. In some sense, Proposition 2 establishes that the limit points of the sequence generated by Algorithm 2 are global minimizers of an infeasibility measure. See Corollary 3 for ALtype 1 and 2, Remark 2 for ALtype 5, Corollary 4 for ALtype 4 and Remark 3 for ALtype 3.

In order to verify the applicability of our approach, we ran Algorithm 2 on feasible and infeasible instances of problem (41) considering the five types of augmented Lagrangians. Table 2 shows the performance of Algorithm 2 with the five augmented Lagrangian functions on three feasible instances of problem (41). These instances correspond to dimensions \(N=2, 3\) and 4 with \(nSph\) equal to the respective kissing number. Therefore, it is reasonable to believe that these are the most difficult feasible problems for each considered dimension. In the table, the first four columns identify the dimension \(N\), the number of \(N\)-dimensional spheres \(nSph\), and the numbers of variables and constraints. “ALtype” identifies the considered augmented Lagrangian function, “Iter” is the number of augmented Lagrangian iterations, \(f(x^*)\) is the value of the objective function at the final iterate \(x^*\), \(\Vert g(x^*)_+\Vert _{\infty }\) is the infeasibility measure at \(x^*\), and \(\gamma _k+\varepsilon _k\) and \(-c_k\) are the values computed at Steps 2 and 3 in the last iteration. In all three feasible problems and for all augmented Lagrangian types, Algorithm 2 stopped at Step 3 reporting solution found. In fact, note that the values of \(\gamma _k+\varepsilon _k\) in Table 2 are less than \(\varepsilon _{opt}=10^{-4}\).

Table 2 Performance of Algorithm 2 with the five augmented Lagrangian functions for three feasible instances of the kissing problem

Table 3 shows the performance of Algorithm 2 with the five augmented Lagrangian functions on six infeasible instances of problem (41). Three of the six considered instances are arguably the most difficult infeasible problems for each considered dimension: they correspond to dimensions \(N=2, 3\) and 4 with \(nSph\) equal to the respective kissing number plus one. Table 3 contains the same information as Table 2. In all infeasible instances, Algorithm 2 stopped at Step 2 reporting infeasibility. In fact, comparing the last two columns of Table 3, \(\gamma _k+\varepsilon _k < -c_k\) in all cases. The results in Tables 2 and 3 show that the performance of Algorithm 2 is in agreement with the theory.

Table 3 Performance of Algorithm 2 with the five augmented Lagrangian functions for six infeasible instances of the kissing problem

4.4 Comments on the numerical experiments

In order to compare the performance of the five types of Lagrangian functions, we may consider the number of augmented Lagrangian iterations (columns “Iter”) for the problems presented in Sects. 4.2 and 4.3. For the ten feasible problems of our numerical experiments, the numbers of augmented Lagrangian iterations are 32, 46, 37, 37 and 36 for ALtype 1, 2, 3, 4 and 5, respectively. For the six infeasible problems, the numbers of augmented Lagrangian iterations are 38, 62, 25, 24 and 23 for ALtype 1, 2, 3, 4 and 5, respectively.

Based on our numerical experiments, we may conclude that Algorithm 2 with ALtype 2 presents the worst performance for both feasible and infeasible instances. It is interesting to point out that the augmented Lagrangians of ALtype 3, 4 and 5 proved to be as efficient as the PHR augmented Lagrangian (ALtype 1) on feasible instances. Moreover, we observe that Algorithm 2 with these Lagrangians detected infeasibility in fewer iterations than PHR, which may translate into computational savings. These results motivate further research on general augmented Lagrangian methods for nonlinear programming under possible infeasibility.

5 Conclusions

In this article, we considered a nonlinear programming problem whose constraint set may be infeasible. We studied the augmented Lagrangian method based on the penalty functions of Kort and Bertsekas. A general algorithm was stated and its convergence properties were analyzed. For some specific penalty functions, we showed that every limit point of the sequence generated by the algorithm is a global minimizer of an infeasibility measure. Some results regarding optimality were also presented. Based on these results, a second algorithm was proposed, which stops after a finite number of iterations at an approximate solution or detects that the problem is infeasible. We also presented numerical experiments illustrating the applicability of the algorithm. It is interesting to point out that some of the other augmented Lagrangians showed computational performance similar to the classical quadratic augmented Lagrangian (PHR). Moreover, for infeasible problems, our algorithm based on these other Lagrangians detected infeasibility in fewer iterations than PHR. These numerical results motivate further study and analysis of general augmented Lagrangian methods.