1 Introduction

When solving an optimization problem, it is usually not known in advance whether the problem is feasible. If care is not taken, the numerical solution of an infeasible nonlinear problem may lead to a long sequence of iterations before the algorithm stops on a detected failure. Even if the problem is feasible, the sequence of iterates may sometimes converge to an infeasible stationary point. In that case, it would be convenient to quickly detect the infeasibility of the computed solution, in order to choose a new starting point and hope to converge to a feasible solution. Rapid infeasibility detection is particularly important when solving large sequences of nonlinear optimization problems that differ from each other by the values of some parameters or by slight modifications of the optimization model, as, for example, in branch-and-bound methods.

In this paper, we concentrate on the rapid detection of infeasibility in the framework of the solution of a nonlinear optimization problem by a primal-dual augmented Lagrangian method. The augmented Lagrangian method was proposed independently by Hestenes [1] and Powell [2]. It is known to be very robust with respect to degeneracy due to linear dependence of the gradients of the constraints. This method is the basis of efficient software packages such as LANCELOT-A [3], ALGENCAN [4,5,6] and SPDOPT [7]. It is worth noting that the algorithm proposed in [7] departs from the standard augmented Lagrangian algorithm by its primal-dual nature. The advantage is a superlinear or quadratic rate of convergence of the iterates to an optimal solution under usual assumptions. In this paper, we propose a modification of this algorithm which also guarantees a rapid rate of convergence when the sequence of iterates converges to an infeasible stationary point.

Infeasibility detection has recently been the focus of particular attention for improving the behavior of augmented Lagrangian algorithms. Martínez and Prudente [8] proposed an adaptive stopping criterion for the solution of the subproblems and showed that their new algorithm performs better than the original version of ALGENCAN. Birgin et al. [9, 10] also improved an augmented Lagrangian algorithm within the framework of global optimization and showed better performance than the initial implementation in [11]. Gonçalves et al. [12] extended the results of [9] to an entire class of penalty functions. Within the framework of sequential quadratic programming (SQP) methods, Byrd et al. [13] proposed an algorithm to quickly detect infeasibility and showed that it has fast local convergence properties. To our knowledge, this is the first work that analyzes the local convergence around an infeasible stationary point. More recently, Burke et al. [14] improved infeasibility detection for an SQP algorithm and proved global and rapid local convergence properties of their method.

The augmented Lagrangian method of Armand and Omheni [7] may detect infeasibility when the sequence of dual variables becomes unbounded and the penalty parameter is forced to zero. The main drawback is that the infeasibility can take a long time to detect. We therefore introduce a new parameter, called the feasibility parameter, whose role is to control the progress of the iterates to the feasible set. This parameter scales the objective function relative to the constraints until a nearly feasible point is detected. The level of feasibility detection is arbitrary and can be chosen, for example, of the same order as the overall stopping tolerance. Once this event occurs, the algorithm switches to its normal behavior and continues the minimization process until convergence. From a formal point of view, the algorithm can be interpreted as the numerical solution of the Fritz John optimality conditions, but with a perturbation of the constraints due to the augmented Lagrangian parameters (Lagrange multiplier and quadratic penalty term). The feasibility parameter is updated dynamically. In particular, its value depends on the norm of the residual of a primal-dual system related to the minimization of the infeasibility measure. This leads to a superlinear or quadratic convergence of the sequence of iterates to an infeasible stationary point. To our knowledge, this is the first local convergence result for an augmented Lagrangian method in the infeasible case. The paper concentrates on the equality-constrained case, to complete the capability of the algorithm of [7] in detecting infeasibility. A possible extension to the general case with equalities and inequalities is discussed in Sect. 7.

The paper is organized as follows. Section 2 summarizes the notation and terminology used throughout. The algorithm is described in Sect. 3. The global convergence and the asymptotic analysis are studied in Sects. 4 and 5. Numerical experiments are reported in Sect. 6 to demonstrate the efficiency of the new method. Section 7 concludes the paper.

2 Notation

For two real vectors x and y of the same length, \(x^\top y\) is their Euclidean scalar product and \(\Vert x\Vert =(x^\top x)^{1/2}\) is the associated norm. For a real matrix M, the induced matrix norm is \(\Vert M\Vert =\max \{\Vert Md\Vert : \Vert d\Vert \le 1\}.\) The inertia of a real symmetric matrix M, denoted by \(\hbox {In}(M):=(\iota _+,\iota _-,\iota _0)\), is the triple of the numbers of its positive, negative and null eigenvalues. For a function f and an iterate \(x_k\), to simplify the notation we denote \(f_k=f(x_k)\). Likewise, \(f^*\) stands for \(f(x^*)\), \(g_k^+\) stands for \(g(x_k^+)\), and so on. The positive part of a real number r is defined by \(r_+=\max \{r,0\}.\) Let \(\{a_k\}\) and \(\{b_k\}\) be nonnegative scalar sequences. We write \(a_k={ O}(b_k)\), or equivalently \(b_k=\varOmega (a_k)\), if there exists a constant \(c>0\) such that \(a_k\le c\,b_k\) for all \(k\in \mathbb {N}\). The notation \(a_k=\varTheta (b_k)\) means that \(a_k={ O}(b_k)\) and \(a_k=\varOmega (b_k)\). We also write \(a_k={ o}(b_k)\) if \(a_k=\epsilon _k b_k\) for all \(k\in \mathbb {N}\), with \(\lim \epsilon _k =0\).

3 Algorithm

We consider the equality constrained optimization problem

$$\begin{aligned} \begin{array}{ll} \hbox {minimize}_{x\in \mathbb {R}^n} &{} \rho f(x)\\ \hbox {subject to} &{} c(x)=0, \end{array} \qquad \qquad (\hbox {P}_\rho ) \end{aligned}$$

where \(f{:}\mathbb {R}^n\rightarrow \mathbb {R}\) and \(c{:}\mathbb {R}^n\rightarrow \mathbb {R}^m\) are smooth, and where \(\rho \ge 0\). For the value \(\rho =1\), the problem \((\hbox {P}_1)\) is referred to as the original problem, the one to be initially solved. For the value \(\rho =0\), any feasible point is optimal for (P\(_0\)). The parameter \(\rho \) is then called the feasibility parameter.

The main contribution of this paper is to propose an updating strategy of the feasibility parameter, in order to guarantee the global convergence of the minimization algorithm to a feasible or an infeasible stationary point of the original problem and also fast local convergence in both cases.

Let us recall some definitions about stationary points. A point \(x\in \mathbb {R}^n\) is called a Fritz John (FJ) point of problem \((\hbox {P}_1)\) if there exists \((z,y)\in \mathbb {R}\times \mathbb {R}^m\) with \((z,y)\ne 0\) such that

$$\begin{aligned} c(x)=0 \quad \hbox {and}\quad zg(x)+A(x)y=0, \end{aligned}$$

where \(g(x)=\nabla f(x)\) denotes the gradient of f at x and \(A(x)=\nabla c(x)\) is the transpose of the Jacobian matrix of the constraints at x. An FJ point x is a Karush–Kuhn–Tucker (KKT) point whenever \(z\ne 0\). In that case, y/z is the vector of Lagrange multipliers related to problem \((\hbox {P}_1)\). An FJ point x for which \(z=0\) is called a singular stationary point. In other words, a singular stationary point is a feasible point at which the linear independence constraint qualification (LICQ) does not hold. A point \(x\in \mathbb {R}^n\) is called an infeasible stationary point of problem \((\hbox {P}_1)\) if

$$\begin{aligned} c(x)\ne 0 \quad \hbox {and}\quad A(x)c(x)=0. \end{aligned}$$

In other words, an infeasible stationary point is not feasible for the problem \((\hbox {P}_1)\) and is a stationary point for the feasibility problem

$$\begin{aligned} \begin{array}{ll}\hbox {minimize}_{x\in \mathbb {R}^n}&\textstyle \frac{1}{2}\Vert c(x)\Vert ^2.\end{array} \end{aligned}$$
(1)

The augmented Lagrangian associated with (\(\hbox {P}_\rho \)) is defined as

$$\begin{aligned} {\mathcal {L}}_{\rho ,\sigma }(x,\lambda ):= \rho f(x)+\lambda ^\top c(x)+\textstyle \frac{1}{2\sigma }\Vert c(x)\Vert ^2, \end{aligned}$$
(2)

where \(\lambda \in \mathbb {R}^m\) is an estimate of the vector of Lagrange multipliers associated with the equality constraints and \(\sigma >0\) is a quadratic penalty parameter. Recall that when \(x^*\) is a KKT point for (\(\hbox {P}_\rho \)), with an associated vector of Lagrange multipliers \(\lambda ^*\), if the sufficient second-order optimality conditions hold at \(x^*\), then \(x^*\) is a strict local minimum of \({\mathcal {L}}_{\rho ,\sigma }(\cdot ,\lambda ^*)\) provided that \(\sigma \) is small enough, see e.g., [15, Proposition 1.26]. This result serves as a basis of augmented Lagrangian methods, in which the augmented Lagrangian is minimized while the parameters \(\lambda \) and \(\sigma \) are updated in an appropriate manner, see e.g., [16, Chapter 17].
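This local-minimum property can be illustrated numerically. The sketch below is a toy example of our own (quadratic objective, one affine constraint, with KKT point \(x^*=(0.5,0.5)\) and multiplier \(\lambda ^*=-1\)); it is an assumption for illustration, not an example from the paper.

```python
import numpy as np

# Toy problem (our assumption, for illustration only):
#   minimize f(x) = x1^2 + x2^2   subject to   c(x) = x1 + x2 - 1 = 0,
# whose KKT point is x* = (0.5, 0.5) with Lagrange multiplier lambda* = -1.
def f(x):
    return x[0]**2 + x[1]**2

def c(x):
    return np.array([x[0] + x[1] - 1.0])

def augmented_lagrangian(x, lam, rho, sigma):
    """L_{rho,sigma}(x, lam) = rho f(x) + lam^T c(x) + ||c(x)||^2 / (2 sigma), cf. (2)."""
    cx = c(x)
    return rho * f(x) + lam @ cx + (cx @ cx) / (2.0 * sigma)

x_star, lam_star = np.array([0.5, 0.5]), np.array([-1.0])
val_star = augmented_lagrangian(x_star, lam_star, rho=1.0, sigma=0.1)

# x* is a strict local (here even global) minimizer of L(., lambda*):
for d in (np.array([0.1, 0.0]), np.array([-0.05, 0.07]), np.array([0.0, -0.2])):
    assert augmented_lagrangian(x_star + d, lam_star, 1.0, 0.1) > val_star
```

For this particular problem, the Hessian of \({\mathcal {L}}_{1,\sigma }(\cdot ,\lambda ^*)\) is \(2I+\frac{1}{\sigma }ee^\top \), so \(x^*\) is a strict minimizer for every \(\sigma >0\); in general, \(\sigma \) must be small enough.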

The first-order optimality conditions for minimizing \({\mathcal {L}}_{\rho ,\sigma }(\cdot ,\lambda )\) are

$$\begin{aligned} \rho g(x)+ A(x)\left( \lambda + \textstyle \frac{1}{\sigma }c(x)\right) =0. \end{aligned}$$

By introducing the dual variable \(y\in \mathbb {R}^m\) and the notation \(w:=(x,y)\), these optimality conditions can be reformulated as

$$\begin{aligned} \varPhi (w,\lambda ,\rho ,\sigma ) := \begin{pmatrix} \rho g(x)+ A(x)y\\ c(x)+\sigma (\lambda -y) \end{pmatrix} =0. \end{aligned}$$

This formulation of the optimality conditions for minimizing (2) serves as a basis of our algorithm. Note that by setting \(\lambda =y\), we retrieve the optimality conditions of problem (\(\hbox {P}_\rho \)).
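As a concrete check, the residual \(\varPhi \) can be coded directly; the toy data below (quadratic f, affine c) are our own assumption. Setting \(\lambda =y\) makes the second block equal to \(c(x)\), recovering the optimality conditions of (\(\hbox {P}_\rho \)).

```python
import numpy as np

# Toy data (our assumption): f(x) = x1^2 + x2^2, c(x) = x1 + x2 - 1,
# hence g(x) = 2x and A(x), the transpose of the Jacobian of c, is the column (1, 1).
def g(x):
    return 2.0 * x

def c(x):
    return np.array([x[0] + x[1] - 1.0])

def A(x):
    return np.array([[1.0], [1.0]])   # n x m

def Phi(x, y, lam, rho, sigma):
    """Residual Phi(w, lambda, rho, sigma) of the perturbed optimality system."""
    return np.concatenate([rho * g(x) + A(x) @ y,
                           c(x) + sigma * (lam - y)])

# With lam = y, the second block is c(x): at the KKT point of (P_1) the residual vanishes.
x_star, y_star = np.array([0.5, 0.5]), np.array([-1.0])
r = Phi(x_star, y_star, lam=y_star, rho=1.0, sigma=0.1)
assert np.linalg.norm(r) < 1e-12
```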

Let us define the regularized Jacobian matrix of the function \(\varPhi \) with respect to w by

$$\begin{aligned} J_{\rho ,\sigma ,\theta }(w) = \begin{pmatrix} H_{\rho ,\theta }(w) &{} A(x) \\ A(x)^\top &{} -\sigma I \end{pmatrix}, \end{aligned}$$

where \(\theta \ge 0\) is a regularization parameter and where

$$\begin{aligned} H_{\rho ,\theta }(w) = \rho \nabla ^2f(x) + \sum _{i=1}^{m} y_i \nabla ^2 c_i(x) +\theta I \end{aligned}$$

is the regularized Hessian of the Lagrangian associated with (\(\hbox {P}_\rho \)). During the iterations, the regularization parameter is chosen to control the inertia of the regularized Jacobian matrix of \(\varPhi \). It is well known that \(\hbox {In}(J_{\rho ,\sigma ,\theta }(w))=(n,m,0)\) if and only if the matrix

$$\begin{aligned} K_{\rho ,\sigma ,\theta }(w):=H_{\rho ,\theta }(w) + \textstyle \frac{1}{\sigma }A(x)A(x)^\top \end{aligned}$$

is positive definite (see e.g., [17, Lemma A.16]). A link with the augmented Lagrangian is given by the following formula:

$$\begin{aligned} \textstyle K_{\rho ,\sigma ,\theta }(w) = \nabla _{xx}^2{\mathcal {L}}_{\rho ,\sigma }(x,y-\frac{1}{\sigma }c(x)) + \theta I. \end{aligned}$$
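The inertia equivalence can be verified numerically; the dimensions and the positive definite block H below are arbitrary choices of ours.

```python
import numpy as np

# Random data with an assumed positive definite block H (standing for H_{rho,theta}(w)).
rng = np.random.default_rng(0)
n, m, sigma = 4, 2, 0.5
A = rng.standard_normal((n, m))
H = np.diag([3.0, 2.0, 1.5, 1.0])

J = np.block([[H, A], [A.T, -sigma * np.eye(m)]])   # regularized Jacobian J_{rho,sigma,theta}
K = H + (1.0 / sigma) * A @ A.T                     # condensed matrix K_{rho,sigma,theta}

eigJ = np.linalg.eigvalsh(J)
pos, neg = int((eigJ > 1e-10).sum()), int((eigJ < -1e-10).sum())
inertia = (pos, neg, eigJ.size - pos - neg)

assert inertia == (n, m, 0)                 # In(J) = (n, m, 0) ...
assert np.all(np.linalg.eigvalsh(K) > 0)    # ... exactly when K is positive definite
```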

The algorithm is a Newton-type method for the solution of the optimality system \(\varPhi =0\), and it follows the one proposed in [7]. The globalization scheme uses two kinds of iterations. At a main iteration, called an outer iteration, all the parameters \(\lambda \), \(\rho \) and \(\sigma \) are updated and a full Newton step for the solution of \(\varPhi =0\) is performed. If the norm of the residual \(\Vert \varPhi \Vert \) at the trial iterate is deemed sufficiently small, then the trial point is accepted as the new iterate and a new outer iteration is started; otherwise, the parameters are fixed at their current values and a sequence of inner iterations is applied in order to sufficiently reduce \(\Vert \varPhi \Vert \). The inner iteration algorithm is a backtracking line search applied to a primal-dual merit function whose first-order optimality conditions correspond to \(\varPhi =0\).
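A minimal sketch of one outer iteration follows (our own illustration, on an assumed toy problem with quadratic objective and affine constraint, so that \(\varPhi \) is affine in w and a single full Newton step solves \(\varPhi =0\) exactly for fixed \(\lambda \), \(\rho \), \(\sigma \)):

```python
import numpy as np

# Assumed toy problem: f(x) = x1^2 + x2^2, c(x) = x1 + x2 - 1 (Hessians of c vanish).
rho, sigma, theta = 1.0, 0.1, 0.0
lam = np.array([0.0])

def Phi(w):
    x, y = w[:2], w[2:]
    A = np.array([[1.0], [1.0]])              # A(x): transpose of the Jacobian of c
    cx = np.array([x[0] + x[1] - 1.0])
    return np.concatenate([rho * 2.0 * x + A @ y, cx + sigma * (lam - y)])

def J(w):
    # H_{rho,theta} = rho*Hess f + sum_i y_i Hess c_i + theta*I = (2*rho + theta)*I here
    H = (2.0 * rho + theta) * np.eye(2)
    A = np.array([[1.0], [1.0]])
    return np.block([[H, A], [A.T, -sigma * np.eye(1)]])

w = np.array([0.0, 0.0, 0.0])                 # starting primal-dual point w = (x, y)
w = w + np.linalg.solve(J(w), -Phi(w))        # one full Newton step
assert np.linalg.norm(Phi(w)) < 1e-10         # Phi is affine here, so one step suffices
```

On a genuinely nonlinear problem, the residual at the trial point would instead be tested against a tolerance, with inner iterations performed when the test fails.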

[Algorithm 1: primal-dual augmented Lagrangian method with rapid infeasibility detection]

Algorithm 1 is quite similar to [7, Algorithm 1], except for the first four steps which are related to the updating of the parameters.

Initially, a primal-dual starting point is defined and the values of the parameters are chosen. A flag is used to indicate if the algorithm is in the feasibility detection phase (\(\mathtt{F=true}\)) or not (\(\mathtt{F=false}\)).

At the first step, the algorithm tests whether a nearly feasible point, with respect to a feasibility tolerance \(\epsilon >0\), has been detected. If so, the algorithm switches to the normal operating mode of [7, Algorithm 1]. This means in particular that the feasibility parameter \(\rho _k\) remains constant for all further iterations. This switching mechanism is necessary to avoid the undesirable situation where the feasibility measure goes to zero very slowly, while condition (3) is alternately satisfied and violated an infinite number of times, driving the feasibility parameter to zero. Moreover, in this situation, it would be impossible to distinguish between the satisfaction of the KKT conditions and the regularity of the constraints.

At the second step, the algorithm tests whether a sufficient reduction in the feasibility measure has been obtained. If so, the feasibility parameter is kept constant, the Lagrange multiplier estimate is set to the current value of the dual variable and a new value of the quadratic penalty parameter is chosen. For \(k\ge 1\), the index \(i_k\) is the index of the last iteration prior to k at which inequality (3) holds. Note that, at Step 4, the quadratic penalty parameter is chosen in such a way that it could remain constant along the iterations. But in that case, the convergence to a KKT point is only linear, and the numerical experiments in [7] have shown that, in practice, it is better to force the convergence of \(\sigma _k\) to zero.

If the algorithm detects that the constraints have not decreased sufficiently, because condition (3) is not satisfied, then there are two situations. If \(\mathtt{F=true}\), then the algorithm is still in the feasibility detection phase. In that case, the feasibility parameter is sufficiently decreased, the quadratic penalty parameter is kept constant and the Lagrange multiplier estimate is rescaled. This scaling is important to force the convergence of \(\{\lambda _k\}\) to zero when this step is executed at every iteration from some point on (see Lemma 3.1-(ii)), ensuring that the sequence of iterates approaches stationarity of the feasibility problem (see Theorem 4.1-(ii)). The second situation is when \(\mathtt{F=false}\). In that case, the algorithm has left the feasibility detection phase. Then, the feasibility parameter is kept constant, but the quadratic penalty parameter is decreased to penalize the constraints, and the Lagrange multiplier estimate is kept constant as in a classical augmented Lagrangian algorithm.

The following lemma summarizes the behavior of the algorithm regarding the feasibility detection strategy and the update rules of the parameters.

Lemma 3.1

Assume that Algorithm 1 generates an infinite sequence \(\{w_k\}\). Let \(\mathcal {K}\subset \mathbb {N}\) be the set of iteration indices for which the condition (3) is satisfied.

(i)

    If \(\mathcal {K}\) is infinite, then the subsequence \(\{c_k\}_{k\in {\mathcal {K}}}\) converges to zero and \(\{\rho _k\}\) is eventually constant.

(ii)

    If \(\mathcal {K}\) is finite, then \(\liminf \Vert c_k\Vert >0\) and both sequences \(\{\sigma _k \rho _k\}\) and \(\{\sigma _k \lambda _k\}\) converge to zero.

Proof

For \(k\in \mathbb {N}\), set \(\beta _k=\Vert c_{i_k}\Vert \). We then have for all \(k\in \mathcal {K}\),

$$\begin{aligned} \beta _{k+1} \le a \max \{\beta _{j}: (k-\ell )_+\le j\le k\}+\zeta _k \end{aligned}$$

and for all \(k\notin \mathcal {K}\), \(\beta _{k+1}=\beta _k\). It has been shown in [7, Lemma 3.1] that such a sequence converges to zero. This proves the first conclusion of assertion (i). Since \(\{c_k\}_{k\in \mathcal {K}}\) converges to zero, then there exists \(k_0\in \mathcal {K}\) such that \(\Vert c_{k_0}\Vert \le \epsilon \) and thus \(\mathtt{F=false}\) for all further iterations. The update rules of the feasibility parameter at Step 4 and Step 5 imply that \(\rho _{k}=\rho _{k_0}\) for all \(k\ge k_0\), which proves the second conclusion of assertion (i).

To prove conclusion (ii), suppose that \({\mathcal {K}}\) is finite and let \(k_0=\max {\mathcal {K}}\). For all \(k\ge k_0+1\), \(i_{k}=k_0\) and Step 3 is executed. It follows that for all \(k\ge k_0+\ell \), we have \(\Vert c_k\Vert > a\Vert c_{k_0}\Vert \), and therefore \(\liminf \Vert c_k\Vert >0\). We consider two cases. If at some iteration k, \(\Vert c_k\Vert \le \epsilon \), then \(\mathtt{F=false}\) for all further iterations. The update of the parameters at Step 3 implies that both sequences \(\{\rho _k\}\) and \(\{\lambda _k\}\) are eventually constant and \(\{\sigma _k\}\) tends to zero. It follows that \(\{\sigma _k\rho _k\}\) and \(\{\sigma _k\lambda _k\}\) tend to zero. The second case is when \(\Vert c_k\Vert >\epsilon \) for all \(k\in \mathbb {N}\), which implies that \(\mathtt{F=true}\) at each iteration. In that case, for all \(k\ge k_0+1\), \(\rho _{k+1}\le \tau \rho _k\), \(\sigma _{k+1}=\sigma _{k}\) and \(\lambda _{k+1}=\frac{\rho _{k+1}}{\rho _{k_0}} y_{k_0}\). We deduce that \(\{\rho _k\}\) goes to zero, \(\{\sigma _k\}\) is eventually constant, and \(\{\lambda _k\}\) goes to zero, which implies that both sequences \(\{\sigma _k\rho _k\}\) and \(\{\sigma _k\lambda _k\}\) tend to zero. \(\square \)

At Step 5 of Algorithm 1, the parameter \(\theta _k\) is selected to control the inertia of the matrix \(J_k\). This is important to prevent the outer iterates from converging to a stationary point that is not a local minimum, see [18].

At Step 6, a tolerance \(\varepsilon _k>0\) is chosen to check whether a sufficient reduction in the norm of the optimality conditions at the candidate iterate \(w_{k}^{+}\) has been obtained. An example of the choice of \(\varepsilon _k\) is detailed in Sect. 6. If the residual norm is not smaller than this tolerance, then a sequence of inner iterations is performed to compute the new iterate.

The inner iteration algorithm consists of a minimization procedure of the primal-dual merit function defined by

$$\begin{aligned} \varphi _{\lambda , \rho ,\sigma ,\nu }(w)= {\mathcal {L}}_{\rho ,\sigma }(x,\lambda )+\textstyle \frac{\nu }{2\sigma }\Vert c(x) +\sigma (\lambda -y)\Vert ^2, \end{aligned}$$

where \(\nu >0\) is a scaling parameter. It is easy to see that w is a stationary point of this function if and only if \(\varPhi (w,\lambda ,\rho ,\sigma )=0\). The minimization procedure is a backtracking line search algorithm quite similar to [7, Algorithm 2], and we refer the interested reader to that paper for a full description. The only difference is that, in our description of Algorithm 1, the quadratic penalty parameter \(\sigma _{k+1}\) is kept constant during the inner iterations, while in [7] it can be increased. This choice has no impact from a theoretical point of view and simplifies the presentation of the algorithm. In our numerical experiments, the value of the quadratic penalty parameter is also kept constant during the inner iterations.
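The claimed equivalence between stationary points of \(\varphi \) and zeros of \(\varPhi \) can be checked by finite differences; the toy problem and the solution point below are assumptions of ours, with the zero of \(\varPhi \) computed by hand.

```python
import numpy as np

rho, sigma, nu = 1.0, 0.1, 1.0
lam = np.array([0.0])

def merit(w):
    """phi_{lam,rho,sigma,nu}(w) for f(x) = x1^2 + x2^2, c(x) = x1 + x2 - 1 (assumed)."""
    x, y = w[:2], w[2:]
    cx = np.array([x[0] + x[1] - 1.0])
    L = rho * (x[0]**2 + x[1]**2) + lam @ cx + (cx @ cx) / (2.0 * sigma)
    r = cx + sigma * (lam - y)
    return L + nu / (2.0 * sigma) * (r @ r)

# Zero of Phi(., lam, rho, sigma) for these parameters, computed by hand:
w_star = np.array([1 / 2.2, 1 / 2.2, -1 / 1.1])

# The central-difference gradient of the merit function vanishes there.
h = 1e-6
grad = np.array([(merit(w_star + h * e) - merit(w_star - h * e)) / (2 * h)
                 for e in np.eye(3)])
assert np.linalg.norm(grad) < 1e-6
```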

4 Global Convergence Analysis

The global convergence of the inner iteration algorithm has been studied in [7, Theorem 2.3]. It has been shown that if the function f is bounded from below and if the gradient of the constraints and the regularized Hessian of the Lagrangian stay bounded during the inner iterations, then the iterate \(w_{k+1}\) can be computed in a finite number of inner iterations. In view of this result, it will be assumed that the inner iteration algorithm succeeds in a finite number of iterations in finding \(w_{k+1}\) each time it is called at Step 6 of Algorithm 1.

Theorem 4.1

Assume that Algorithm 1 generates an infinite sequence \(\{w_k\}\). Assume also that the sequence \(\{(g_k, A_k)\}\) is bounded. In any case, the iterates approach feasible or infeasible stationarity of problem \((\hbox {P}_1)\). More precisely, let \(\mathcal {K}\subset \mathbb {N}\) be the set of iteration indices for which the condition (3) is satisfied. Then, at least one of the following situations occurs.

(i)

    If \(\mathcal {K}\) is infinite, then the subsequence \(\{c_k\}_{\mathcal {K}}\) tends to zero. In addition, if \(\{y_k\}_{\mathcal {K}}\) is bounded, then the sequence \(\{(g_k,A_k)\}\) has a limit point \((\bar{g}, \bar{A})\) such that \(\bar{g}+\bar{A} \bar{y}=0\) for some \(\bar{y}\in \mathbb {R}^m\). If \(\{y_k\}_{\mathcal {K}}\) is unbounded, then \(\{A_k\}\) has a limit point \(\bar{A}\) which is rank deficient.

(ii)

    If \(\mathcal {K}\) is finite, then \(\{\Vert c_k\Vert \}\) is bounded away from zero and \(\{A_k c_k\}\) tends to zero.

Proof

First note that the convergence to zero of the sequence \(\{\rho _k g_k+A_k y_k\}\) is a direct consequence of Step 6 of Algorithm 1.

Let us prove outcome (i). Lemma 3.1-(i) implies that \(\lim _{\mathcal {K}} c_k=0\) and \(\{\rho _k\}\) is eventually constant. If \(\{y_k\}_{\mathcal {K}}\) is bounded, then the assumptions imply that the whole sequence \(\{(g_k,A_k,y_k/\rho _k)\}_{\mathcal {K}}\) is bounded and so has a limit point \((\bar{g}, \bar{A},\bar{y})\) such that \(\bar{g}+\bar{A} \bar{y}=0\), which proves the first part of outcome (i). Suppose now that \(\{y_k\}_{\mathcal {K}}\) is unbounded. There exists \({\mathcal {K}}'\subset {\mathcal {K}}\) such that \(y_k\ne 0\) for all \(k\in {\mathcal {K}}'\) and \(\lim _{\mathcal {K}'} \Vert y_k\Vert =\infty \). For \(k\in {\mathcal {K}'}\), we have

$$\begin{aligned} \Vert A_k u_k\Vert \le \textstyle \frac{1}{\Vert y_k\Vert }\Vert \rho _k g_k+A_ky_k\Vert + \frac{\rho _k}{\Vert y_k\Vert } \Vert g_k\Vert , \end{aligned}$$

where \(u_k=y_k/\Vert y_k\Vert \). Since \(\{(A_k,u_k)\}_{\mathcal {K}'}\) is bounded, it has a limit point \((\bar{A}, \bar{u})\), with \(\bar{u}\ne 0\). By taking the limit in the previous inequality, knowing that the two terms of the right-hand side tend to zero, we deduce that \(\bar{A}\bar{u}=0\), which proves the second part of outcome (i).

For outcome (ii), suppose that \({\mathcal {K}}\) is finite. Lemma 3.1-(ii) implies that \(\{\Vert c_k\Vert \}\) is bounded away from zero and that both sequences \(\{\sigma _k\rho _k\}\) and \(\{\sigma _k\lambda _k\}\) tend to zero. For all \(k\in \mathbb {N}\), we have

$$\begin{aligned} A_k c_k = A_k(c_k+\sigma _k(\lambda _{k}-y_k)) -\sigma _kA_k\lambda _{k} +\sigma _k(\rho _{k}g_k+ A_k y_k) -\sigma _k\rho _{k}g_k. \end{aligned}$$

By taking the norm on both sides, for all k we have

$$\begin{aligned}&\Vert A_k c_k\Vert \\&\quad \le \Vert A_k\Vert \Vert c_k+\sigma _k(\lambda _k-y_k)\Vert + \sigma _k \Vert A_k\Vert \Vert \lambda _k\Vert +\sigma _k \Vert \rho _k g_k + A_k y_k\Vert + \sigma _k \rho _k \Vert g_k\Vert \\&\quad \le \max \{\Vert A_k\Vert ,\sigma _k,\Vert g_k\Vert \} (2\Vert \varPhi (w_k,\lambda _k,\rho _k,\sigma _k) \Vert + \sigma _k\Vert \lambda _{k}\Vert + \sigma _k\rho _{k}). \end{aligned}$$

Since the first term of the right-hand side of this inequality is bounded above and all the terms in the parentheses tend to zero, we have \(\lim A_k c_k=0\), which concludes the proof. \(\square \)

In summary, the next result shows the behavior of the algorithm when the sequence of primal iterates remains bounded, a usual and mild assumption in a global convergence analysis.

Theorem 4.2

Assume that Algorithm 1 generates an infinite sequence \(\{w_k\}\) such that the sequence \(\{x_k\}\) lies in a compact set.

(i)

    Any feasible limit point of the sequence \(\{x_k\}\) is a Fritz John point of problem \((\hbox {P}_1)\).

(ii)

    If the sequence \(\{x_k\}\) has no feasible limit point, then any limit point is an infeasible stationary point of problem \((\hbox {P}_1)\).

Proof

The compactness assumption implies that the sequences \(\{g_k\}\) and \(\{A_k\}\) are bounded and so Theorem 4.1 applies.

Let \(\bar{x}\) be a limit point of \(\{x_k\}\) such that \(\bar{c}=0\). From Lemma 3.1-(ii), we have that the condition (3) is satisfied an infinite number of times. It follows from Lemma 3.1-(i) that there exists \(k_0\in \mathbb {N}\) such that for all \(k\ge k_0\), \(\rho _k=\rho _{k_0}\). Let \({\mathcal {J}}\subset \mathbb {N}\) such that the subsequence \(\{x_k\}_{\mathcal {J}}\) tends to \(\bar{x}\). Step 6 of Algorithm 1 implies that the sequence \(\{\rho _{k_0} g_k+A_k y_k\}\) tends to zero. Dividing by \(\Vert (\rho _{k_0},y_k)\Vert \), which is bounded away from zero because \(\rho _{k_0}>0\), we obtain

$$\begin{aligned} \lim _{{\mathop {k\in {\mathcal {J}}}\limits ^{k\rightarrow \infty }}}\frac{\rho _{k_0} g_k+A_k y_k}{\Vert (\rho _{k_0},y_k)\Vert } = 0. \end{aligned}$$

By compactness, the sequence \(\{(\rho _{k_0},y_k)/\Vert (\rho _{k_0},y_k)\Vert \}_{\mathcal {J}}\) has a limit point \((\bar{\rho },\bar{y})\), such that \(\Vert (\bar{\rho },\bar{y})\Vert =1\) and \(\bar{\rho }\bar{g} + \bar{A} \bar{y}=0\), which proves assertion (i).

Suppose now that \(\{x_k\}\) has no feasible limit point. From Lemma 3.1-(i), we have that the condition (3) is only satisfied a finite number of times. Theorem 4.1-(ii) implies that \(\bar{A} \bar{c}=0\) for any limit point \(\bar{x}\) of \(\{x_k\}\), which proves assertion (ii). \(\square \)

5 Asymptotic Analysis

We have to distinguish two cases for the asymptotic analysis. The first one is when the sequence \(\{w_k\}\) converges to a primal-dual solution of the problem. In this case, because \(\{c_k\}\) converges to zero, the feasibility parameter becomes constant after a finite number of iterations and the algorithm reduces to Algorithm 1 of [7] applied to the solution of problem (\(\hbox {P}_\rho \)) with a fixed value of \(\rho \). It has been shown that, under the classical assumptions of linear independence constraint qualification and second-order sufficient conditions at the optimal limit point, a suitable choice of the parameters yields a superlinear or quadratic rate of convergence of \(\{w_k\}\), see [7, Theorems 4.4 and 4.5]. The second case is when the sequence \(\{x_k\}\) converges to an infeasible stationary point. This is the case developed in detail in this section.

The first assumption is that the sequence of iterates converges to an infeasible stationary point.

Assumption 1

Algorithm 1 generates an infinite sequence \(\{w_k\}\) which converges to \(w^*=(x^*,y^*)\in \mathbb {R}^{n+m}\), where \(x^*\) is an infeasible stationary point of problem \((\hbox {P}_1)\).

This assumption is standard for the analysis of the rate of convergence of a numerical optimization algorithm. Note that it is equivalent to assuming that \(\{x_k\}\) converges to an infeasible stationary point \(x^*\) and that the algorithm always stays in the feasibility detection phase, i.e., \(\mathtt{F=true}\) at all iterations. Indeed, by Lemma 3.1-(i), the convergence of \(\{x_k\}\) to an infeasible stationary point \(x^*\) implies that the condition (3) is satisfied only a finite number of times. In that case, by Lemma 3.1-(ii), the sequence \(\{\sigma _k \lambda _k\}\) tends to zero. Step 3 of Algorithm 1 implies that if \(\mathtt{F=true}\) at every iteration, then \(\{\sigma _k\}\) is eventually constant; on the contrary, if \(\mathtt{F=false}\) at some iteration, then \(\{\sigma _k\}\rightarrow 0\). Since \(\{c_k+\sigma _k(\lambda _k-y_k)\}\) tends to zero, the sequence \(\{\sigma _k y_k\}\) tends to \(c^*\ne 0\), and thus \(\{y_k\}\) has a finite limit if and only if \(\mathtt{F}\) keeps the value \(\mathtt{true}\) at all iterations. This indicates that the choice of the feasibility tolerance \(\epsilon \) is an important issue relative to the behavior of the algorithm. In practice, \(\epsilon \) is chosen equal to, or smaller than, the stopping tolerance of the overall algorithm.

Lemma 5.1

Under Assumption 1, the inequality (3) is satisfied a finite number of times, the sequence \(\{\rho _k\}\) converges to zero, \(\{\sigma _k\}\) is eventually constant, and \(\Vert \lambda _k\Vert ={ O}(\rho _k)\).

Proof

Assumption 1 implies that \(\{c_k\}\) converges to a nonzero value. Therefore, by virtue of Lemma 3.1-(i), the inequality (3) is satisfied only a finite number of times. It follows that Step 3 of Algorithm 1 is always executed for k sufficiently large and that \(\mathtt{F=true}\) for all iterations. Indeed, for all \(k\in \mathbb {N}\) we have

$$\begin{aligned} \Vert c_k\Vert \le \Vert c_k+\sigma _k(\lambda _k-y_k)\Vert +\sigma _k\Vert \lambda _k\Vert +\sigma _k\Vert y_k\Vert . \end{aligned}$$

Step 6 and Lemma 3.1-(ii) imply that the first two terms of the right-hand side of the inequality tend to zero. Because \(\{y_k\}\) is supposed to be convergent, we deduce that the sequence \(\{\sigma _k\}\) does not converge to zero, which implies that \(\mathtt{F=true}\) for all iterations. Therefore, there exists \(k_0\in \mathbb {N}\) such that for all \(k \ge k_0\), \(\rho _{k}\le \tau ^{k-k_0} \rho _{k_0}\), \(\sigma _{k}=\sigma _{k_0}\) and \(\lambda _{k}/\rho _k= \lambda _{k_0}/\rho _{k_0}\), the conclusion follows. \(\square \)

Let \(\sigma >0\) be the limit value of \(\{\sigma _k\}\). For \(w=(x,y)\in \mathbb {R}^{n+m}\), let us define

$$\begin{aligned} F(w)= \begin{pmatrix} A(x) y\\ c(x)-\sigma y \end{pmatrix}. \end{aligned}$$
(6)

We have \(\lim \varPhi (w_k,\lambda _k,\rho _k,\sigma _k)= \varPhi (w^*,0,0,\sigma )=F(w^*)\), and therefore \(y^*=\frac{1}{\sigma }c^*\).
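For intuition, the system \(F(w)=0\) can be solved by Newton's method on a one-dimensional infeasible example of our own: \(c(x)=x^2+1\) has no zero, its only infeasible stationary point is \(x^*=0\), and F vanishes at \((x^*,y^*)=(0,c^*/\sigma )\).

```python
import numpy as np

sigma = 0.5   # assumed limit value of the quadratic penalty parameter

def F(w):
    x, y = w
    # c(x) = x^2 + 1 >= 1: infeasible; A(x) = c'(x) = 2x, so A(x)c(x) = 0 at x* = 0
    return np.array([2 * x * y, x**2 + 1.0 - sigma * y])

def Fprime(w):
    x, y = w
    return np.array([[2 * y, 2 * x],
                     [2 * x, -sigma]])

w = np.array([0.3, 1.5])                         # start near (x*, y*) = (0, 1/sigma)
for _ in range(20):
    w = w + np.linalg.solve(Fprime(w), -F(w))    # Newton iteration on F(w) = 0

assert abs(w[0]) < 1e-8 and abs(w[1] - 1.0 / sigma) < 1e-8
```

Here \(F'(w^*)=\hbox {diag}(2/\sigma ,-\sigma )\) is nonsingular, so the local quadratic convergence of Newton's method applies.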

Assumption 2

The functions f and c are twice continuously differentiable and their second derivatives are Lipschitz continuous over an open neighborhood of \(x^*.\)

The Hessian matrix of the function \(\frac{1}{2}\Vert c\Vert ^2\) is defined as

$$\begin{aligned} C:= \textstyle \sum _ic_i\nabla ^2c_i+AA^\top . \end{aligned}$$
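As a sanity check, this formula can be compared with a finite-difference Hessian of \(\frac{1}{2}\Vert c\Vert ^2\); the two-constraint map below is an assumption of ours.

```python
import numpy as np

# Assumed constraint map with n = m = 2: c(x) = (x1^2 + x2 - 1, x1*x2).
def c(x):
    return np.array([x[0]**2 + x[1] - 1.0, x[0] * x[1]])

def A(x):
    # transpose of the Jacobian of c
    return np.array([[2 * x[0], x[1]],
                     [1.0,      x[0]]])

def C(x):
    """C = sum_i c_i(x) Hess c_i(x) + A(x) A(x)^T."""
    H1 = np.array([[2.0, 0.0], [0.0, 0.0]])   # Hessian of c_1
    H2 = np.array([[0.0, 1.0], [1.0, 0.0]])   # Hessian of c_2
    cx = c(x)
    return cx[0] * H1 + cx[1] * H2 + A(x) @ A(x).T

def half_sq(x):
    cx = c(x)
    return 0.5 * (cx @ cx)

# Central-difference Hessian of 0.5*||c||^2 at an arbitrary point
x0, h = np.array([0.7, -0.3]), 1e-4
num = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        ei, ej = h * np.eye(2)[i], h * np.eye(2)[j]
        num[i, j] = (half_sq(x0 + ei + ej) - half_sq(x0 + ei - ej)
                     - half_sq(x0 - ei + ej) + half_sq(x0 - ei - ej)) / (4 * h * h)

assert np.max(np.abs(num - C(x0))) < 1e-5
```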

Assumption 3

The sufficient second-order optimality conditions hold at \(x^*\) for the feasibility problem (1), i.e., the matrix \(C^*\) is positive definite.

The following lemma is a direct consequence of these assumptions.

Lemma 5.2

Under Assumptions 2 and 3, there exist a neighborhood W of \(w^*\), \(M>0\), \(L>0\) and \(0< a_1\le a_2\) such that for all \(w, w'\in W\) we have

(i)

    \(||F'(w)^{-1}||\le M,\)

(ii)

    \(||F(w')-F(w)-F'(w)(w'-w)\Vert \le \frac{L}{2}\Vert w-w'\Vert ^2\),

(iii)

    \(a_1\Vert w-w'\Vert \le \Vert F(w)-F(w')\Vert \le a_2\Vert w-w'\Vert \).

Proof

To prove (i), it suffices to show that \(F'(w^*)\) is nonsingular. By using the fact that \(y^*=\frac{1}{\sigma }c^*\), we have

$$\begin{aligned} F'(w^*)=\begin{pmatrix} \textstyle \frac{1}{\sigma }\sum _i c_i^*\nabla ^2c_i^*&{}A^*\\ A^{*\top }&{}-\sigma I \end{pmatrix}. \end{aligned}$$

It is well known that \(F'(w^*)\) is nonsingular if and only if the matrix \(\frac{1}{\sigma }C^*\), the Schur complement of \(-\sigma I\) of the matrix \(F'(w^*)\), is positive definite (see e.g., [19, Lemma 4.1]). Thus, item (i) follows from Assumption 3. Assumption 2 implies that \(F'\) is Lipschitz continuous on W with the Lipschitz constant L. Property (ii) then follows from the Lipschitz continuity of \(F'\) and from [20, Lemma 4.1.12]. The assertion (iii) follows from [20, Lemma 4.1.16]. \(\square \)

The next lemma shows that the matrix \(J_k\) used at Step 5 of Algorithm 1 is a good approximation of the Jacobian matrix of F at \(w_k\) when the feasibility parameter goes to zero.

Lemma 5.3

Under Assumptions 1–3, there exists \(\beta >0\) such that for all \(k\in \mathbb {N}\) large enough,

$$\begin{aligned} \Vert J_k-F'_k\Vert \le \beta \rho _{k+1} \quad \hbox {and}\quad \Vert J_k^{-1}\Vert \le 2M, \end{aligned}$$

where M is defined by Lemma 5.2.

Proof

From the definition of \(J_k\), for all \(k\in \mathbb {N}\) we have

$$\begin{aligned} \Vert J_k-F'_k\Vert = \Vert \rho _{k+1} \nabla ^2 f_k + \theta _k I\Vert . \end{aligned}$$

Since \(\{x_k\}\) converges to \(x^*\) and f is assumed to be twice continuously differentiable in a neighborhood of \(x^*\), the first inequality will be proved if we show that \(\theta _k=0\) for k large enough. This is the case if \(\hbox {In}(J_k)=(n,m,0)\) or, equivalently, if \(K_{\rho _{k+1},\sigma ,0}(w_k)\) is positive definite for k large enough. For all \(k\in \mathbb {N}\), we have

$$\begin{aligned} K_{\rho _{k+1},\sigma ,0}(w_k)= & {} H_{\rho _{k+1},0}(w_k)+\textstyle \frac{1}{\sigma }A_k A_k^\top \\= & {} \textstyle \frac{1}{\sigma }C^* + \frac{1}{\sigma }(C_k-C^*) + H_{\rho _{k+1},0}(x_k,y_k-\frac{1}{\sigma }c_k). \end{aligned}$$

By assumption, \(C^*\) is positive definite and the two other matrices tend to zero when k tends to infinity. It follows that \(K_{\rho _{k+1},\sigma ,0}(w_k)\) is positive definite for k large enough, which proves the first inequality.

Using Lemma 5.2-(i), the inequality just proved and the fact that \(\{\rho _k\}\) tends to zero, for k large enough we have

$$\begin{aligned} \Vert F'^{-1}_k(J_k-F'_k)\Vert\le & {} \Vert F'^{-1}_k\Vert \Vert J_k-F'_k\Vert \\\le & {} M\beta \rho _{k+1} \\\le & {} \textstyle \frac{1}{2}. \end{aligned}$$

It then suffices to apply [20, Theorem 3.1.4] to prove the second inequality. \(\square \)
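The perturbation bound behind the second inequality can be checked numerically on generic matrices (the data below is made up): whenever \(\Vert F'^{-1}(J-F')\Vert \le \frac{1}{2}\), the matrix \(J\) is invertible with \(\Vert J^{-1}\Vert \le 2\Vert F'^{-1}\Vert = 2M\).

```python
import numpy as np

# Stand-ins for F'_k (nonsingular) and a small perturbation E with
# J_k = F'_k + E.  The matrices are illustrative only.
Fp = np.diag([2.0, 3.0, 5.0])
E = 0.1 * np.ones((3, 3))
J = Fp + E

Fp_inv = np.linalg.inv(Fp)
r = np.linalg.norm(Fp_inv @ E, 2)   # contraction factor
M = np.linalg.norm(Fp_inv, 2)

assert r <= 0.5                     # the perturbation is small enough
assert np.linalg.norm(np.linalg.inv(J), 2) <= 2.0 * M
```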

The last lemma gives an estimate of the distance of the Newton iterate \(w_k^+\) to the solution \(w^*\).

Lemma 5.4

Assume that Assumptions 1–3 hold. The sequence of iterates generated by Algorithm 1 satisfies

$$\begin{aligned} \Vert w_k^+-w^*\Vert ={ O}(\Vert w_k-w^*\Vert ^2) +{ O}(\rho _{k+1}). \end{aligned}$$

Proof

Let \(k\in \mathbb {N}\). From the definition of the trial iterate \(w_k^+\) at Step 5 of Algorithm 1, we have

$$\begin{aligned} w_k^+-w^*= & {} w_k-w^*-J_k^{-1}\varPhi (w_k,\lambda _{k+1},\rho _{k+1},\sigma )\\= & {} J_k^{-1}\big ((J_k-F_k')(w_k-w^*) +F'_k(w_k-w^*)-F_k\\&+ \,F_k-\varPhi (w_k,\lambda _{k+1},\rho _{k+1},\sigma )\big ). \end{aligned}$$

By using \(F^*=0\), by taking the norm on both sides, then by applying Lemma 5.3, Lemma 5.2-(ii), finally by using the convergence of \(\{w_k\}\) to \(w^*\), the boundedness of \(\{g_k\}\) and \(\Vert \lambda _k\Vert ={ O}(\rho _k)\) from Lemma 5.1, we obtain

$$\begin{aligned}&\Vert w_k^+-w^*\Vert \\&\quad \le \Vert J_k^{-1}\Vert \big ( \Vert J_k-F_k'\Vert \Vert w_k-w^*\Vert + \Vert F^* - F_k - F'_k (w^*-w_k)\Vert \\&\qquad + \Vert F_k-\varPhi (w_k,\lambda _{k+1},\rho _{k+1},\sigma )\Vert \big )\\&\quad \le 2M\big (\beta \rho _{k+1}\Vert w_k-w^*\Vert +\textstyle \frac{L}{2}\Vert w_k-w^*\Vert ^2 + \rho _{k+1} \Vert g_k\Vert +\sigma \Vert \lambda _{k+1}\Vert \big )\\&\quad = { O}(\rho _{k+1}) + { O}(\Vert w_k-w^*\Vert ^2), \end{aligned}$$

which concludes the proof. \(\square \)

We now state the main result of this section. This theorem shows the rapid rate of convergence of Algorithm 1 in the infeasible case. In addition, under a suitable choice of the parameters, no inner iterations are needed for k large enough. In this case, the cost of one iteration of the algorithm reduces to the solution of the linear system at Step 5, which can be done with \({ O}((n+m)^3)\) arithmetic operations.

Theorem 5.1

Assume that Assumptions 1–3 hold. Let \(t\in ]0,2]\). If the feasibility parameter of Algorithm 1 is chosen so that \(\rho _{k+1}={ O}(\Vert F_k\Vert ^t)\), then

$$\begin{aligned} \Vert w_{k+1}-w^*\Vert ={ O}(\Vert w_k-w^*\Vert ^t). \end{aligned}$$
(7)

In addition, if \(\rho _{k+1}=\varTheta (\Vert F_k\Vert ^t)\) and if \(\varepsilon _k = \varOmega (\rho _k^{t'})\) for \(0<t'<t\), then for k large enough there are no inner iterations, i.e., \(w_{k+1}=w_k^+\).

Proof

The assumption on the value of \(\rho _{k+1}\) and the Lipschitz property of F from Lemma 5.2-(iii) imply that

$$\begin{aligned} \rho _{k+1}={ O}(\Vert w_k-w^*\Vert ^t). \end{aligned}$$
(8)

Using this estimate in Lemma 5.4, we deduce that

$$\begin{aligned} \Vert w_k^+-w^*\Vert ={ O}(\Vert w_k-w^*\Vert ^t). \end{aligned}$$
(9)

At Step 6 of Algorithm 1, we have either \(w_{k+1}=w_k^+\) or \(w_{k+1}\) is computed by means of the inner iterations. In the first case, it is clear that (7) follows from (9). Suppose now that the second case holds, i.e., the inequality (4) is not satisfied at iteration k. We then have

$$\begin{aligned} \Vert \varPhi (w_{k+1},\lambda _{k+1},\rho _{k+1},\sigma )\Vert \le \varepsilon _k < \Vert \varPhi (w_{k}^{+},\lambda _{k+1},\rho _{k+1},\sigma )\Vert . \end{aligned}$$
(10)

From (9), the sequence \(\{w_k^+\}\) tends to \(w^*\), and therefore \(\{g_k^+\}\) is bounded. Using the second inequality of Lemma 5.2-(iii) and Lemma 5.1, then (8) and (9), we deduce that

$$\begin{aligned} \Vert \varPhi (w_{k}^{+},\lambda _{k+1},\rho _{k+1},\sigma )\Vert\le & {} \Vert F_k^+-F^*\Vert +\rho _{k+1}\Vert g_k^+\Vert +\sigma \Vert \lambda _{k+1}\Vert \nonumber \\= & {} { O}(\Vert w_k^+-w^*\Vert )+{ O}(\rho _{k+1}) \nonumber \\= & {} { O}(\Vert w_k-w^*\Vert ^t). \end{aligned}$$
(11)

Combining (10) and (11), we obtain

$$\begin{aligned} \Vert \varPhi (w_{k+1},\lambda _{k+1},\rho _{k+1},\sigma )\Vert = { O}(\Vert w_k-w^*\Vert ^t). \end{aligned}$$

Finally, from the first inequality of Lemma 5.2-(iii), the last estimate, the boundedness of \(\{g_k\}\), Lemma 5.1 and the estimate (8), we have

$$\begin{aligned} a_1\Vert w_{k+1}-w^*\Vert\le & {} \Vert F_{k+1}-F^*\Vert \\= & {} \Vert F_{k+1}\Vert \\\le & {} \Vert \varPhi (w_{k+1},\lambda _{k+1},\rho _{k+1},\sigma )\Vert +\rho _{k+1}\Vert g_{k+1}\Vert +\sigma \Vert \lambda _{k+1}\Vert \\= & {} { O}(\Vert w_k-w^*\Vert ^t)+{ O}(\rho _{k+1})\\= & {} { O}(\Vert w_k-w^*\Vert ^t), \end{aligned}$$

which proves (7).

Let us now prove the second assertion of the theorem. On the one hand, Lemma 5.2-(iii) and (7) imply that \(\Vert F_{k+1}\Vert = { O}(\Vert F_{k}\Vert ^t)\). By assumption, we have \(\rho _{k+1}=\varTheta (\Vert F_k\Vert ^t)\), and thus \(\rho _{k+1} = { O}(\rho _k^t)\). Since \(t'<t\), we then have \(\rho _{k+1} = { o}(\rho _k^{t'})\). On the other hand, the estimate (11) implies that

$$\begin{aligned} \Vert \varPhi (w_k^+,\lambda _{k+1},\rho _{k+1},\sigma )\Vert = { O}(\Vert F_k\Vert ^t)={ O}(\rho _{k+1}). \end{aligned}$$

By assumption, \(\varepsilon _k = \varOmega (\rho _k^{t'})\); therefore, for k large enough, the inequality (4) is satisfied. \(\square \)

6 Numerical Experiments

Our algorithm, called SPDOPT-ID (strongly primal-dual optimization with infeasibility detection), has been implemented in C. The performance of SPDOPT-ID is compared with that of SPDOPT-AL [7] on a set of 130 standard problems from the CUTEr collection [21]. The selected problems are those with equality constraints and a solution time of less than 300 s. To create a second set of 130 infeasible problems, the constraint \(c_1^2+1=0\), where \(c_1\) is the first component of c, has been added to each problem. Note that this additional constraint introduces a twofold difficulty: not only are the constraints infeasible, but their gradients are also linearly dependent.

We also compare the condition used to update the parameters at Step 2 of Algorithm 1 with the one used in [7, Algorithm 1]. The algorithm called SPDOPT-IDOld is Algorithm 1, but with inequality (3) replaced by

$$\begin{aligned} \Vert c_k\Vert \le a\max \{\Vert c_{i_j}\Vert + \zeta _{i_j}: (k-\ell )_+\le j\le k\}. \end{aligned}$$
(12)

We will show that this modification matters when solving an infeasible problem and that the use of (3) in place of (12) leads to better numerical performance.

The feasibility parameter is initially set to \(\rho _0=1\). When \(\mathtt{F=true}\), the feasibility parameter in Step 3 is updated by the formula

$$\begin{aligned} \rho _{k+1}=\min \{0.2\rho _k, 0.2\Vert F_k\Vert ^2,1/(k+1)\}. \end{aligned}$$
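In code, this update rule, together with the lower bound of \(10^{-16}\) mentioned below, can be sketched as follows (the function name is ours):

```python
def update_rho(rho_k, norm_Fk, k):
    """Feasibility-parameter update when F = true (from the text):
    rho_{k+1} = min{0.2 * rho_k, 0.2 * ||F_k||^2, 1/(k+1)},
    floored at the lower bound 1e-16."""
    return max(min(0.2 * rho_k, 0.2 * norm_Fk**2, 1.0 / (k + 1)), 1e-16)

# At k = 0 with rho_0 = 1 and ||F_0|| = 0.5, the middle term is active.
rho_1 = update_rho(1.0, 0.5, 0)
```

The quadratic term \(0.2\Vert F_k\Vert ^2\) is what realizes the choice \(\rho _{k+1}={ O}(\Vert F_k\Vert ^t)\) with \(t=2\).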

The assumption on \(\rho _{k+1}\) in the statement of Theorem 5.1 is satisfied with \(t=2\). The rate of convergence of \(\{w_k\}\) to \(w^*\) is then quadratic. A lower bound of \(10^{-16}\) is imposed on this parameter.

The parameters \(\sigma _{k}\) and \(\theta _k\) are updated at Step 4 and Step 5 as in [7, Algorithm 1].

To be able to solve a quadratic problem in only one iteration, we adopt the same procedure as in [22] for the choice of the starting point. Let \(\bar{w}=(\bar{x},\bar{y})\), where \(\bar{x}\) is the default starting point and \(\bar{y}=(1,\ldots ,1)^\top \). Initially, the following linear system \(J_{1,0,0}(\bar{w})d=-\varPhi (\bar{w},\bar{y},1,0)\) is solved. If the inequality \(\Vert \varPhi (\bar{w}+d,0,1,0)\Vert _\infty \le \Vert \varPhi (\bar{w},0,1,0)\Vert _\infty \) is satisfied, then \(w_0=\bar{w}+d\), otherwise \(w_0=\bar{w}\).
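The rationale behind this pre-iteration can be seen on a toy equality-constrained quadratic program (the problem data below is invented, and \(\varPhi \) is replaced by the plain KKT residual, a simplified stand-in for the paper's primal-dual function): since the KKT system of a quadratic problem is affine, the Newton step from \(\bar{w}\) lands exactly on the solution, so the acceptance test succeeds and the problem is solved in one iteration.

```python
import numpy as np

# Toy QP: minimize 0.5 x^T Q x + q^T x  subject to  a^T x = b.
Q = np.array([[2.0, 0.0], [0.0, 4.0]])
q = np.array([-1.0, -2.0])
a = np.array([1.0, 1.0])
b = 1.0

def Phi(x, y):
    """Plain KKT residual (Qx + q + a*y, a^T x - b)."""
    return np.concatenate([Q @ x + q + a * y, [a @ x - b]])

xbar, ybar = np.zeros(2), 1.0
K = np.block([[Q, a.reshape(2, 1)], [a.reshape(1, 2), np.zeros((1, 1))]])
d = np.linalg.solve(K, -Phi(xbar, ybar))
x0, y0 = xbar + d[:2], ybar + d[2]

# The step satisfies the acceptance test and solves the KKT system exactly.
assert np.linalg.norm(Phi(x0, y0), np.inf) <= np.linalg.norm(Phi(xbar, ybar), np.inf)
assert np.linalg.norm(Phi(x0, y0), np.inf) < 1e-10
```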

The algorithm is terminated and an optimal solution is declared to be found if \(\Vert (g_k+A_k y_k/\rho _k,c_k)\Vert _\infty \le \varepsilon _{\text {tol}}\) with \(\varepsilon _{\text {tol}}=10^{-8}\). Otherwise, the algorithm reports that an infeasible stationary point has been found if \(\rho _k\le \varepsilon _{\text {tol}}\), \(\Vert c_k\Vert >\varepsilon _{\text {tol}}\) and \(\Vert \varPhi (w_k,0,0,\sigma _k)\Vert _\infty \le \varepsilon _{\text {tol}}\). For SPDOPT-AL, we also add the stopping condition \(\Vert c_k\Vert > \varepsilon _{\text {tol}}\), \(\Vert A_kc_k\Vert \le \varepsilon _{\text {tol}}\) and \(\sigma _k\le \varepsilon _{\text {tol}}\) to terminate this algorithm at an infeasible stationary point.

As mentioned in Sect. 5, the feasibility tolerance at Step 1 is set to \(\epsilon =\varepsilon _{\text {tol}}\), to get a fast local convergence when the algorithm converges to an infeasible stationary point.

At Step 2 of Algorithm 1, we choose \(a=0.9\), \(\ell =2\) and \(\zeta _k=10\sigma _k\rho _k\) for all iterations k.

The sequence of tolerances \(\{\varepsilon _k\}\) in Step 6 is defined by the following formula:

$$\begin{aligned} \varepsilon _k =0.9\max \{\Vert \varPhi (w_i,\lambda _i,\rho _i,\sigma _i)\Vert : (k-4)_+\le i\le k\} +\zeta _k. \end{aligned}$$
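A direct transcription of this formula (the function name and the history list are ours):

```python
def eps_k(phi_norms, zeta_k, k):
    """Inner-stopping tolerance from the text:
    eps_k = 0.9 * max{ ||Phi_i|| : (k-4)_+ <= i <= k } + zeta_k,
    where phi_norms[i] holds ||Phi(w_i, lambda_i, rho_i, sigma_i)||."""
    lo = max(k - 4, 0)
    return 0.9 * max(phi_norms[lo:k + 1]) + zeta_k

# With history [5, 4, 3, 2, 1, 0.5] at k = 5, the window is i = 1..5.
tol = eps_k([5.0, 4.0, 3.0, 2.0, 1.0, 0.5], 0.1, 5)
```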

The convergence to zero of the sequence \(\{\varepsilon _k\}\) is a consequence of [22, Proposition 1]. This choice meets the requirements for fast convergence both in the feasible case, i.e., \(\varepsilon _k=\varOmega (\sigma _{k+1})\), and in the infeasible case, i.e., \(\varepsilon _k=\varOmega (\rho _k^{t'})\), with \(t'=1\).

The symmetric indefinite factorization code MA57 [23] is used to factorize and regularize the matrix \(J_k\). Since this factorization reveals the inertia of the matrix, the correction parameter \(\theta _k\), initially set to zero, is increased and a new factorization is performed until the inertia of \(J_k\) has the correct value.
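The inertia-correction loop can be sketched as follows. This is illustrative only: the paper reads the inertia off the MA57 factorization, whereas here a dense symmetric eigensolve plays that role, and the initial value and growth factor for \(\theta \) are assumptions of ours.

```python
import numpy as np

def inertia(M, tol=1e-10):
    """Numbers of positive, negative and (numerically) zero eigenvalues."""
    eig = np.linalg.eigvalsh(M)
    return ((eig > tol).sum(), (eig < -tol).sum(), (np.abs(eig) <= tol).sum())

def regularize(J, n, m, theta0=1e-4, factor=10.0):
    """Increase the correction theta on the (1,1) block until the
    inertia equals (n, m, 0), mimicking the loop at Step 5."""
    theta = 0.0
    while True:
        Jt = J.copy()
        Jt[:n, :n] += theta * np.eye(n)
        if inertia(Jt) == (n, m, 0):
            return Jt, theta
        theta = theta0 if theta == 0.0 else factor * theta

# A small KKT-like matrix with an indefinite (1,1) block (made-up data).
J = np.array([[-1.0, 0.0,  0.0],
              [ 0.0, 1.0,  1.0],
              [ 0.0, 1.0, -0.5]])
Jt, theta = regularize(J, 2, 1)
```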

The maximum number of iterations, counting both the inner and the outer iterations, is limited to 3000.

Fig. 1

Performance profiles comparing the three algorithms on the set of standard problems

For the standard problems, only the 129 problems solved by at least one of the three algorithms are selected for comparison purposes (problem dixchlng has not been solved). Figure 1 shows the performance profiles of Dolan and Moré [24] for the numbers of function and gradient evaluations. For \(\tau \ge 0\), \(\rho _s(\tau )\) is the fraction of test problems for which the performance of solver s is within a factor \(2^\tau \) of the best one. These profiles show that the performances of the three algorithms are very similar; the differences are not significant. In terms of robustness, the three algorithms successfully solve the same number of problems (128). We conclude that the infeasibility detection does not degrade the performance of the algorithm on standard problems.
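For reference, the profile value \(\rho _s(\tau )\) can be computed from a cost table as follows (the toy costs below are invented; `np.inf` marks a failure):

```python
import numpy as np

def profile(perf, tau):
    """Dolan-More performance profile: rho_s(tau) is the fraction of
    problems on which solver s is within a factor 2**tau of the best.
    perf has shape (n_solvers, n_problems); np.inf marks a failure."""
    perf = np.asarray(perf, dtype=float)
    best = perf.min(axis=0)            # best cost per problem
    ratios = perf / best
    return (ratios <= 2.0 ** tau).mean(axis=1)

costs = [[10.0, 20.0, np.inf],         # solver A (fails on problem 3)
         [20.0, 20.0, 30.0]]           # solver B
rho0 = profile(costs, 0.0)             # fractions solved at tau = 0
```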

Fig. 2

Performance profiles comparing the three algorithms on the set of infeasible problems

Figure 2 shows the performances of these algorithms in terms of numbers of function and gradient evaluations on a set of 126 infeasible problems (the problems gilbert, hager3, porous1 and porous2 have been eliminated, since none of the three algorithms can detect their infeasibility). We observe that SPDOPT-ID is the most efficient algorithm for detecting infeasible problems, with an efficiency rate of approximately \(90\%\). In any case, the efficiency of SPDOPT-ID and SPDOPT-IDOld is significantly higher than that of SPDOPT-AL. In terms of robustness, our two algorithms are more robust than SPDOPT-AL, since they detect the infeasibility of more than \(95\%\) of the problems, whereas SPDOPT-AL detects less than \(60\%\). The figure also shows that SPDOPT-ID performs better than SPDOPT-IDOld, justifying the choice of the new criterion for updating the parameters.

Fig. 3

Values of \(\log _{10}\Vert A_k c_k\Vert \) and \(\log _{10}\Vert F_k\Vert \) for the last ten iterations of SPDOPT-AL and SPDOPT-ID. T represents the index of the stopping iteration for each run

We conclude this section with a comparison of a numerical estimate of the rate of convergence of the new algorithm SPDOPT-ID with that of the original SPDOPT-AL, when the sequence of iterates converges to an infeasible stationary point. We use a graphical representation inspired by [14]. From the collection of infeasible problems, we selected the 58 problems for which both algorithms generate a sequence converging to an infeasible stationary point. Figure 3 shows the last ten values of \(\Vert A_k c_k\Vert \) for SPDOPT-AL and of \(\Vert F_k\Vert \) for SPDOPT-ID. We cannot plot the values \(\Vert F_k\Vert \) for SPDOPT-AL, because when the sequence of iterates converges to an infeasible stationary point, \(\{\sigma _k\}\) goes to zero and \(\{y_k\}\) becomes unbounded. Under some regularity assumptions, we have \(\Vert A_kc_k\Vert = \varTheta (\Vert x_k-x^*\Vert )\) and \(\Vert F_k\Vert =\varTheta (\Vert w_k-w^*\Vert )\). These curves empirically show a genuine improvement in the rate of convergence of the algorithm, from linear to quadratic.

For the solution of these infeasible problems, we observed that for \(90\%\) of the problems there are no inner iterations (i.e., \(w_{k+1}=w_k^+\)) during the last four outer iterations. This percentage rises above \(93\%\) if one considers only the last three outer iterations. For the infeasible problems for which SPDOPT-ID performs inner iterations during the last outer iterations, either the algorithm does not terminate successfully or the quadratic convergence is not observed. These observations confirm the asymptotic property of our algorithm.

7 Conclusions

During the solution of an optimization problem, a rapid detection of the infeasibility is a difficult task. In the framework of an augmented Lagrangian method, we have proposed to add a new parameter, which scales down the objective function when the infeasibility measure is not sufficiently reduced. The global and local convergence analyses, as well as the numerical results, show that this method is reliable both for solving a feasible optimization problem and for quickly detecting infeasibility.

The original algorithm SPDOPT-AL [7] uses only one penalty parameter \(\sigma _k\) to balance optimality and feasibility, as in a classical augmented Lagrangian algorithm. According to [7, Theorem 3.4], when the problem is infeasible, any limit point of \(\{x_k\}\) is an infeasible stationary point, but the sequence \(\{y_k\}\) becomes unbounded. For this classical scheme, we did not find a way to update the penalty parameter so as to get a fast rate of convergence to an infeasible stationary point. In practice, we only observed a linear rate of convergence, as shown in Fig. 3. The introduction of the feasibility parameter \(\rho _k\) makes it possible to balance the solution of the first-order optimality conditions of the original problem \((\hbox {P}_1)\) against those of the feasibility problem (1). In this sense, taking \(\sigma _k\rightarrow 0\) in the classical scheme is not equivalent to taking \(\rho _k\rightarrow 0\) while keeping \(\sigma _k\) constant in the new scheme.

Another advantage of our new scheme is that, in case of convergence to an infeasible stationary point, the quadratic penalty parameter \(\sigma _k\) is kept constant and becomes a regularization term for the matrix \(F'(w)\). For this reason, the asymptotic analysis near an infeasible stationary point does not require a constraint qualification, in contrast with other works in the literature, see e.g., [13, Assumption 3.2-(b)] and [14, Assumption 4.19-(a)].

The quadratic convergence to an infeasible stationary point is an original result for an augmented Lagrangian algorithm. However, the fast local convergence is only guaranteed under the assumption that the infeasible stationary point, i.e., the limit point of the sequence of iterates, is sufficiently infeasible, in the sense that \(\Vert c^*\Vert >\epsilon \) for some \(\epsilon >0\). On some examples, we observed that the algorithm converges with a linear rate to an infeasible stationary point, because the norm of the constraints evaluated at this point is very close to the stopping tolerance. Hence, an open question is to design an algorithm with rapid convergence in the infeasible case, regardless of the norm of the constraints at the limit point.

Another natural question is the extension of this approach to the solution of general optimization problems with equality and inequality constraints. One possibility is to introduce slack variables for the inequality constraints and to apply an augmented Lagrangian method to the resulting problem with simple bounds, as in [4, 25]. On the other hand, we note that infeasibility detection has been used in the framework of interior point methods [26, 27]. However, these works did not report complete global and local convergence analyses of their methods, despite good numerical results. To our knowledge, there is no local convergence result for nonlinear interior point methods in the infeasible case. To extend our approach, a possibility is to combine an augmented Lagrangian method with a log-barrier penalty to handle the inequalities, as in [28]. The introduction of a feasibility parameter, however, makes the convergence analysis quite different. Indeed, in that case the feasibility parameter becomes a scaling factor between the log-barrier function and the quadratic penalty term; therefore, when the sequence of iterates converges to an infeasible stationary point, this sequence follows a path of solutions parameterized by the log-barrier penalty parameter, see [29].