1 Introduction

In this paper, we consider a numerical algorithm for the following nonlinear equality constrained optimization problem

$$\begin{aligned} \left\{ {\begin{array}{rl} \mathop {\min }&{}f(x)\\ \text {s.t.}&{}c(x)=0, \end{array}}\right. \end{aligned}$$
(1)

where the objective \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) and the constraints \(c:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^m\) with \(m\le n\) are sufficiently smooth.

The proposed algorithm belongs to the class of two-phase trust region algorithms, for which the reader is referred to, e.g., [16]. More specifically, our approach is based on the Byrd–Omojokun trust region SQP method [4, 5], which is widely regarded as one of the most practical trust region methods for general nonlinear equality constrained optimization problems. In the Byrd–Omojokun method, a complete step is decomposed into a normal step and a tangent step, which are computed by solving a pair of relaxed trust region subproblems.

It is commonly accepted that the technique used to accept or reject steps deeply affects the efficiency of methods for nonlinear constrained optimization. In many traditional algorithms for nonlinear constrained optimization, penalty functions or filters are used to judge the quality of a trial step. The most obvious disadvantage of penalty functions is their overdependence on penalty parameters. An inappropriate penalty parameter can reject a good step, and even a carefully designed strategy for updating the penalty parameter is often inefficient in practice. This motivated Fletcher and Leyffer to propose the concept of filters [7]. Convergence theory for filter methods can be found, e.g., in [8, 9]. Nevertheless, filter methods also have an Achilles’ heel: a restoration phase is needed to reduce infeasibility until a feasible subproblem is obtained. Gould, Loh, and Robinson [10] recently proposed a new robust filter method which is free of restoration phases. This method uses a fairly involved unified step computation process and a mixed step acceptance strategy based on filters and steering penalty functions.

Over the last few years, methods without a penalty function or a filter have been an active topic in the nonlinear optimization community. Bielschowsky and Gomes [11] introduced an infeasibility control technique based on trust cylinders; this method requires a possibly computationally expensive restoration step at every iteration. Gould and Toint [12] introduced a trust funnel technique for step acceptance. Their method uses different trust regions for the normal and tangent steps, but the strategy for coordinating the two trust region radii is quite involved and still under investigation. Zhu and Pu [13] proposed a step acceptance mechanism based on a control set. Liu and Yuan [14] proposed a penalty-filter-free technique in the line search framework. These four methods are designed specifically for equality constrained optimization. More recently, Shen et al. [15] proposed a non-monotone SQP method for general nonlinear constrained optimization without a penalty function or a filter. Although the penalty-filter-free methods mentioned above share some similarities, they differ substantially from one another.

This paper builds mostly on the method of Zhu and Pu [13]. The principal novelty of their work is a technique that controls infeasibility via a set of constraint violation values recorded at previous iterations. This technique is very useful in practice, but in theory it has one weak point: a strong assumption on the step size is required to establish global convergence. The root cause of this assumption is that a double trust-region strategy, similar to that of Gould and Toint [12], is used for the step computation, and the ratio of the normal and tangent trust region radii cannot be controlled in theory, even though this rarely causes trouble in practice.

The main contribution of this paper is to complete the global convergence theory of the new infeasibility control technique in [13]. Compared with [13], the most significant modification in the proposed algorithm is that the double trust-region strategy is replaced by a standard (single) trust region strategy. Global convergence to first order critical points is then proved under mild assumptions. We also present extended numerical results on some CUTEr problems to demonstrate the efficiency of the proposed algorithm.

The paper is organized as follows. In Sect. 2, a complete description of the proposed algorithm is given. Assumptions and the global convergence analysis are presented in Sect. 3. Section 4 is devoted to numerical results.

2 The algorithm

2.1 Step computation

We compute steps on the basis of the Byrd–Omojokun trust region method [5]. Each complete step \(s_k\) is composed of a normal step \(n_k\) and a tangent step \(t_k\), i.e.,

$$\begin{aligned} s_k=n_k+t_k. \end{aligned}$$
(2)

The normal step \(n_k\) aims at reducing the constraint violation function \(h(x)\), where

$$\begin{aligned} h(x):=\frac{1}{2}||c(x)||^2 \end{aligned}$$
(3)

with \(||\cdot ||\) denoting the Euclidean norm. This function can be viewed as an infeasibility measure at a point \(x\). The tangent step \(t_k\) aims at reducing the objective as much as possible while preserving the constraint violation. Specifically, \(n_k\) and \(t_k\) are computed as follows.

For the normal step \(n_k\), we solve the following trust region least squares problem

$$\begin{aligned} \left\{ {\begin{array}{rl} \mathop {\min }&{}\frac{1}{2}||c_k+A_k v||^2\\ \text {s.t.}&{}||v||\le \tau \Delta _k, \end{array}}\right. \end{aligned}$$
(4)

where \(\tau \in (0,1)\), \(c_k=c(x_k)\), and \(A_k=A(x_k)\), with \(A(x)\) denoting the Jacobian of \(c\) at \(x\). We assume the solution to (4), the normal step \(n_k\), satisfies

$$\begin{aligned} ||n_k||\le \kappa _n ||c_k||, \end{aligned}$$
(5)

where \(\kappa _n>0\). This assumption is in fact a regularity condition on the Jacobian of the constraints. Indeed, suppose \(A_k\) has the SVD \(A_k=U_k \Sigma _k V_k^T\). Then \(v_k=-V_k\Sigma _k^{\dagger } U_k^Tc_k\), where \(\Sigma _k^{\dagger }\) is the pseudo-inverse of \(\Sigma _k\), solves the least squares problem

$$\begin{aligned} \min ||c_k+A_k v||^2. \end{aligned}$$

Thus, a sufficient condition for (5) is that the smallest positive singular value of \(A_k\) is bounded away from zero. Notice that \(n_k=0\) when \(x_k\) is feasible.
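To make the normal step computation concrete, the following is a minimal sketch of a dogleg solver for (4) in Python/NumPy. It combines the Cauchy step of the least squares model with the minimum-norm Gauss–Newton step; the function name, the default value of \(\tau\), and the use of the pseudo-inverse are illustrative choices, not a description of the exact routine in our implementation.

```python
import numpy as np

def normal_step(c, A, Delta, tau=0.8):
    """Dogleg approximation to the trust-region least-squares problem (4):
    min 0.5*||c + A v||^2  s.t.  ||v|| <= tau*Delta."""
    radius = tau * Delta
    g = A.T @ c                          # gradient of the model at v = 0
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros(A.shape[1])
    # Cauchy (steepest-descent) minimizer of the least-squares model
    Ag = A @ g
    v_c = -((g @ g) / (Ag @ Ag)) * g
    # minimum-norm Gauss-Newton step via the pseudo-inverse
    v_n = -np.linalg.pinv(A) @ c
    if np.linalg.norm(v_n) <= radius:    # Gauss-Newton point inside the ball
        return v_n
    if np.linalg.norm(v_c) >= radius:    # truncate the Cauchy step
        return -(radius / gnorm) * g
    # otherwise follow the dogleg path from v_c toward v_n to the boundary
    d = v_n - v_c
    a, b, e = d @ d, 2.0 * (v_c @ d), v_c @ v_c - radius**2
    t = (-b + np.sqrt(b * b - 4.0 * a * e)) / (2.0 * a)
    return v_c + t * d
```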

After computing the normal step \(n_k\), the next task is to find a tangent step \(t_k\) that improves the optimality of the current iterate \(x_k\). Consider a quadratic model of the Lagrangian at \(x_k\)

$$\begin{aligned} m_k(x_k+d):=f_k+g_k^Td+\frac{1}{2}d^T B_kd, \end{aligned}$$

where \(f_k=f(x_k)\), \(g_k=\nabla f(x_k)\), and \(B_k\) is an approximate Hessian of the Lagrangian

$$\begin{aligned} {\mathcal {L}}(x,\lambda )=f(x)+\lambda ^T c(x) \end{aligned}$$

at \(x_k\). It follows that

$$\begin{aligned} m_k(x_k+n_k)=f_k+ g_k^Tn_k+\frac{1}{2}n_k^TB_k n_k \end{aligned}$$

and

$$\begin{aligned} m_k(x_k+n_k+t)= m_k(x_k+n_k)+ (g_k^n)^T t+ \frac{1}{2}t^TB_k t , \end{aligned}$$

where \(g_k^n=g_k+B_k n_k\). The tangent step \(t_k\) should then be a solution to the following problem

$$\begin{aligned} \left\{ {\begin{array}{rl} \mathop {\min }&{} (g_k^n)^T t+\frac{1}{2}t^TB_k t\\ \text {s.t.}&{} A_k t=0,\\ &{}||n_k+t||\le \Delta _k.\end{array}}\right. \end{aligned}$$

In practice, however, we obtain \(t_k\) by solving the problem

$$\begin{aligned} \left\{ {\begin{array}{rl} \mathop {\min }&{} (Z_k^T g_k^n)^T v+\frac{1}{2}v^T Z_k^T B_k Z_k v\\ \text {s.t.}&{}||v||\le \sqrt{\Delta _k^2-||n_k||^2},\end{array}}\right. \end{aligned}$$
(6)

where \(Z_k\) is an orthonormal basis matrix of the null space of \(A_k\). Hence the dogleg method [16], the CG-Steihaug method [17], and the generalized Lanczos trust region (GLTR) method [18] are all applicable. Letting \(v_k\) denote the computed solution to (6), we set \(t_k=Z_kv_k\). Since the tangent step \(t_k\) lies in the null space of \(A_k\), we have from (2) that

$$\begin{aligned} c_k+A_ks_k=c_k+A_k n_k, \end{aligned}$$
(7)

which means that the linearized constraint violation remains unchanged after the tangent step \(t_k\) is taken.
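The reduced problem (6) can be assembled, for instance, from an SVD-based null-space basis and the CG-Steihaug iteration. The sketch below is one such realization; all function names are our own illustrative choices.

```python
import numpy as np

def nullspace_basis(A, tol=1e-12):
    """Orthonormal basis Z of null(A) from the SVD, so that A @ Z ~ 0."""
    U, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol * s[0])) if s.size else 0
    return Vt[rank:].T

def steihaug_cg(g, H, radius, tol=1e-8):
    """CG-Steihaug for  min g'v + 0.5 v'Hv  s.t. ||v|| <= radius, cf. (6)."""
    v = np.zeros_like(g)
    r, p = g.copy(), -g.copy()
    if np.linalg.norm(r) < tol:
        return v
    for _ in range(2 * g.size):
        Hp = H @ p
        curv = p @ Hp
        if curv <= 0.0:                        # negative curvature: hit the boundary
            return _to_boundary(v, p, radius)
        alpha = (r @ r) / curv
        if np.linalg.norm(v + alpha * p) >= radius:
            return _to_boundary(v, p, radius)  # step would leave the region
        v = v + alpha * p
        r_new = r + alpha * Hp
        if np.linalg.norm(r_new) < tol:
            return v
        p = -r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return v

def _to_boundary(v, p, radius):
    """Positive root t of ||v + t p|| = radius."""
    a, b, c = p @ p, 2.0 * (v @ p), v @ v - radius**2
    return v + ((-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)) * p
```

With these pieces the tangent step is `t = Z @ steihaug_cg(Z.T @ (g + B @ n), Z.T @ B @ Z, np.sqrt(Delta**2 - n @ n))`, matching (6).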

The Lagrange multiplier vector \(\lambda _{k+1}\) is obtained by solving the following least squares problem

$$\begin{aligned} \min _{\lambda } \frac{1}{2}||g_k+A_k^T \lambda ||^2 \end{aligned}$$
(8)
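Since (8) is an unconstrained linear least squares problem in \(\lambda\), a short NumPy sketch suffices (the function name is ours):

```python
import numpy as np

def multipliers(g, A):
    """Least-squares multiplier estimate from (8): min_lambda 0.5*||g + A'lambda||^2."""
    lam, *_ = np.linalg.lstsq(A.T, -g, rcond=None)
    return lam
```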

2.2 Step acceptance

The mechanism for step acceptance was introduced by Zhu and Pu [13]. It uses a novel infeasibility control technique to promote global convergence to first order critical points. We now describe this technique in detail.

The key is the concept of a “control set”, a set of \(l\) positive numbers denoted by

$$\begin{aligned} H_k:=\{H_{k,1},H_{k,2},\ldots ,H_{k,l}\}, \end{aligned}$$

where the \(l\) elements are sorted in non-increasing order, i.e., \(H_{k,1}\ge H_{k,2} \ge \cdots \ge H_{k,l}\). Initially, we define \(H_0=\{u,\ldots ,u\}\), where \(u\) is a sufficiently large constant such that

$$\begin{aligned} u\ge \max \{h(x_0),1\}. \end{aligned}$$
(9)

For an arbitrary iteration \(k\), when the complete step \(s_k\) is computed, we consider the following three cases.

$$\begin{aligned}&\bullet ~~h(x_k)=0,~~h(x_k+s_k)\le H_{k,1},\end{aligned}$$
(10)
$$\begin{aligned}&\bullet ~~h(x_k)>0,~~h(x_k+s_k)\le \beta h(x_k),\end{aligned}$$
(11)
$$\begin{aligned}&\bullet ~~h(x_k)>0,~~f(x_k+s_k)\le f(x_k)-\gamma h(x_k+s_k),~~ h(x_k+s_k)\le \beta H_{k,2}, \end{aligned}$$
(12)

where \(\beta \) and \(\gamma \) are two constants such that \(0<\gamma <\beta <1\). If one of (11) and (12) is satisfied, then

$$\begin{aligned} f(x_k+s_k)\le f(x_k)-\gamma h(x_k+s_k)~~\text {or}~~ h(x_k+s_k)\le \beta h(x_k). \end{aligned}$$
(13)

After \(x_k+s_k\) is accepted as the next iterate \(x_{k+1}\), we may update the control set \(H_k\) by substituting a new element \(h_k^+\) for the largest element \(H_{k,1}\), where

$$\begin{aligned} h_k^+:= (1-\theta )h(x_k)+\theta h(x_{k+1}) \end{aligned}$$
(14)

with \(\theta \in (0,1)\). All elements of the new control set \(H_{k+1}\) are then rearranged in non-increasing order as well. The purpose of the control set is evidently to compel the infeasibility of the iterates to approach zero progressively. Although only the first two elements \(H_{k,1}\) and \(H_{k,2}\) of \(H_k\) are involved in conditions (10)–(13), the length \(l\) of \(H_k\) affects the strength of the infeasibility control. For example, consider the cases \(l=2\) and \(l=3\) with the same initial value and the same entering values:

$$\begin{aligned}&H_0=\{100,100\},~~~~~~~H_1=H_0\oplus 10=\{100,10\},~~~~~~~H_2=H_1\oplus 1=\{10,1\},\\&H_0\!=\!\{100,100,100\},~H_1\!=\!H_0\oplus 10=\{100,100,10\},~H_2\!=\!H_1\oplus 1\!=\!\{100,10,1\}. \end{aligned}$$

Here the notation “\(\oplus \)” means the control set is updated with some new entry.

It is observed that the first two elements change faster for smaller \(l\); a larger \(l\) thus enforces the infeasibility control more mildly.
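In code, the \(\oplus\) update is a replace-the-largest-and-resort operation on a list kept in non-increasing order; the following sketch reproduces the \(l=3\) example above.

```python
def update_control_set(H, h_plus):
    """Replace the largest element H[0] by h_plus and re-sort non-increasingly."""
    return sorted(H[1:] + [h_plus], reverse=True)

H = [100, 100, 100]
H = update_control_set(H, 10)   # -> [100, 100, 10]
H = update_control_set(H, 1)    # -> [100, 10, 1]
```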

All iterations are classified into the following three types.

\(\bullet \) \(f\)-type. At least one of (10)–(12) holds and

$$\begin{aligned} \chi _k>\sigma _1 ||c_k||^{\sigma _2},~~\delta _k^f\ge \zeta \delta _k^{f,t}, \end{aligned}$$
(15)

where \(\sigma _1,\sigma _2,\zeta \in (0,1)\) and \(\chi _k,\delta _k^f,\delta _k^{f,t}\) are defined by

$$\begin{aligned} \chi _k&:= ||Z_k^T g_k^n||,\end{aligned}$$
(16)
$$\begin{aligned} \delta _k^f&:= f(x_k)-m_k(x_k+s_k),\end{aligned}$$
(17)
$$\begin{aligned} \delta _k^{f,t}&:= m_k(x_k+n_k)-m_k(x_k+s_k). \end{aligned}$$
(18)

\(\bullet \) \(h\)-type. At least one of (10)–(12) holds but (15) fails.

\(\bullet \) \(c\)-type. None of (10)–(12) holds.
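The classification is a direct evaluation of conditions (10)–(12) and (15); a sketch follows, whose default parameter values are those later listed in Sect. 4 and are for illustration only.

```python
def iteration_type(hk, hs, fk, fs, chi, ck_norm, delta_f, delta_ft, H,
                   beta=0.9999, gamma=1e-4, sigma1=1e-8, sigma2=0.5, zeta=1e-4):
    """Classify iteration k as 'f', 'h', or 'c'. Here hk = h(x_k), hs = h(x_k+s_k),
    fk = f(x_k), fs = f(x_k+s_k), and H is the control set sorted non-increasingly."""
    acceptable = ((hk == 0.0 and hs <= H[0]) or                  # condition (10)
                  (hk > 0.0 and hs <= beta * hk) or              # condition (11)
                  (hk > 0.0 and fs <= fk - gamma * hs
                            and hs <= beta * H[1]))              # condition (12)
    if not acceptable:
        return 'c'
    if chi > sigma1 * ck_norm**sigma2 and delta_f >= zeta * delta_ft:  # (15)
        return 'f'
    return 'h'
```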

Given some constants \(\eta _1,\eta _2,\tau _1,\tau _2,\bar{\Delta },\hat{\Delta }\) such that \(0<\eta _1<\eta _2<1\), \(0<\tau _1<1\le \tau _2\), \(0<\bar{\Delta }<\hat{\Delta }\), we accept or reject the trial step according to the following strategy.

When \(k\) is an \(f\)-type iteration, we accept \(x_k+s_k\) if

$$\begin{aligned} \rho _k^f:=\frac{f(x_k)-f(x_k+s_k)}{\delta _k^f}\ge \eta _1. \end{aligned}$$
(19)

The corresponding update rule for the trust region radius \(\Delta _k\) is

$$\begin{aligned} \Delta _{k+1}=\left\{ {\begin{array}{ll} \min \{\max \{\tau _2\Delta _k,\bar{\Delta }\},\hat{\Delta }\} &{}\text {if}~~\rho _k^f \ge \eta _2,\\ \max \{\Delta _k,\bar{\Delta }\},&{}\text {if}~~\eta _1\le \rho _k^f < \eta _2,\\ \tau _1\Delta _k,&{}\text {if}~~\rho _k^f<\eta _1. \end{array}}\right. \end{aligned}$$
(20)

When \(k\) is an \(h\)-type iteration, we always accept \(x_k+s_k\) and update the trust region radius \(\Delta _k\) according to the following rule

$$\begin{aligned} \Delta _{k+1}= \max \{\Delta _k,\bar{\Delta }\}. \end{aligned}$$
(21)

When \(k\) is a \(c\)-type iteration, we accept \(x_k+s_k\) if

$$\begin{aligned} \delta _k^c>0, ~~\rho _k^c:=\frac{h(x_k)-h(x_k+s_k)}{\delta _k^c}\ge \eta _1 \end{aligned}$$
(22)

where

$$\begin{aligned} \delta _k^c:=\frac{1}{2}||c_k||^2-\frac{1}{2}||c_k+A_ks_k||^2. \end{aligned}$$
(23)

The trust region radius \(\Delta _k\) is then updated by

$$\begin{aligned} \Delta _{k+1}=\left\{ {\begin{array}{ll} \min \{\max \{\tau _2\Delta _k,\bar{\Delta }\},\hat{\Delta }\} &{}\text {if}~~\rho _k^c \ge \eta _2,\\ \max \{\Delta _k,\bar{\Delta }\},&{}\text {if}~~\eta _1\le \rho _k^c < \eta _2,\\ \tau _1\Delta _k,&{}\text {if}~~\rho _k^c<\eta _1. \end{array}}\right. \end{aligned}$$
(24)

Before we present a formal description of our trust region infeasibility control algorithm, the reader should notice that formulae (20), (21), and (24) imply that

$$\begin{aligned} \Delta _{k+1}\ge \bar{\Delta }\end{aligned}$$
(25)

if \(k\) is a successful iteration, which is important for the global convergence analysis in the next section.
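The radius updates (20) and (24) share one rule; a sketch with illustrative defaults from Sect. 4, where `Delta_lo` and `Delta_hi` stand for \(\bar{\Delta }\) and \(\hat{\Delta }\):

```python
def update_radius(Delta, rho, eta1=1e-4, eta2=0.7, tau1=0.5, tau2=1.2,
                  Delta_lo=1e-4, Delta_hi=1e3):
    """Rules (20)/(24); on success the result is >= Delta_lo, which gives (25)."""
    if rho >= eta2:
        return min(max(tau2 * Delta, Delta_lo), Delta_hi)
    if rho >= eta1:
        return max(Delta, Delta_lo)
    return tau1 * Delta
```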

2.3 The algorithm

Now a formal statement of the algorithm is presented as follows.

Algorithm 1

A trust-region algorithm with infeasibility control (TRIC)

  • Initialization: Choose \(x_0,B_0\) and parameters \(\beta ,\gamma ,\theta ,\zeta ,\eta _1,\eta _2,\sigma _1,\sigma _2,\tau ,\tau _1\in (0,1),u,\tau _2\in [1,+\infty )\), \(l\in \{2,3,\cdots \}\) and \(\Delta _0,\bar{\Delta },\hat{\Delta }\in (0,+\infty )\) such that \(\bar{\Delta }<\Delta _0<\hat{\Delta }\). Set \(k=0\).

  • Step 1: Stop if \(x_k\) is a KT point.

  • Step 2: Solve (4) for \(n_k\) if \(c_k\ne 0\) and set \(n_k=0\) if \(c_k=0\).

  • Step 3: Compute \(Z_k\), solve (6) for \(v_k\), set \(t_k=Z_kv_k\), and obtain \(s_k=n_k+t_k\).

  • Step 4: When \(k\) is \(f\)-type, accept \(x_k+s_k\) if (19) holds and update \(\Delta _k\) according to (20).

    • When \(k\) is \(h\)-type, accept \(x_k+s_k\), update \(\Delta _k\) according to (21), and update \(H_k\).

    • When \(k\) is \(c\)-type, accept \(x_k+s_k\) if (22) holds, update \(\Delta _k\) according to (24), and update \(H_k\).

  • Step 5: If \(x_k+s_k\) has been accepted, set \(x_{k+1}=x_k+s_k\), and set \(x_{k+1}=x_k\) otherwise.

  • Step 6: If \(x_k+s_k\) has been accepted, solve (8) for \(\lambda _{k+1}\).

  • Step 7: If \(x_k+s_k\) has been accepted, choose a symmetric matrix \(B_{k+1}\).

  • Step 8: Increment \(k\) by one and go to Step 1.

Remark 1

From Step 4 we observe that the control set \(H_k\) is updated if \(k\) is a successful \(h\)-type or \(c\)-type iteration, and left unchanged otherwise. Also, from Step 4 we see that \(k\) is always a successful iteration if it is \(h\)-type.
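For orientation, the pieces sketched in Sects. 2.1 and 2.2 assemble into the following abbreviated driver for Algorithm 1. This is a hedged sketch, not our MATLAB implementation: the quasi-Newton update of \(B_k\) (Step 7) is omitted, the stopping test is simplified, and all helper names (normal_step, nullspace_basis, steihaug_cg, iteration_type, update_radius, update_control_set) refer to the illustrative routines above.

```python
import numpy as np

def tric(f, c, grad, jac, x0, l=3, Delta0=1.0, Delta_lo=1e-4, Delta_hi=10.0,
         tau=0.8, eta1=1e-4, theta=1e-4, tol=1e-6, max_iter=500):
    """Abbreviated TRIC driver (Algorithm 1); B is kept fixed for brevity."""
    x, Delta, B = np.asarray(x0, float), Delta0, np.eye(len(x0))
    h = lambda z: 0.5 * float(c(z) @ c(z))              # violation measure (3)
    H = [10.0 * max(h(x), 1.0)] * l                     # control set, cf. (9)
    for _ in range(max_iter):
        g, A, ck = grad(x), jac(x), c(x)
        lam, *_ = np.linalg.lstsq(A.T, -g, rcond=None)  # Step 6 / (8)
        if max(np.abs(ck).max(), np.abs(g + A.T @ lam).max()) <= tol:
            return x                                    # Step 1: KT point
        n = normal_step(ck, A, Delta, tau) if np.any(ck) else np.zeros_like(x)
        Z = nullspace_basis(A)                          # Steps 2-3
        v = steihaug_cg(Z.T @ (g + B @ n), Z.T @ B @ Z,
                        np.sqrt(max(Delta**2 - n @ n, 0.0)))
        s = n + Z @ v
        fk, hk, fs, hs = f(x), h(x), f(x + s), h(x + s)
        m_s = fk + g @ s + 0.5 * (s @ B @ s)            # model value at x_k + s_k
        m_n = fk + g @ n + 0.5 * (n @ B @ n)            # model value at x_k + n_k
        chi = np.linalg.norm(Z.T @ (g + B @ n))
        kind = iteration_type(hk, hs, fk, fs, chi, np.linalg.norm(ck),
                              fk - m_s, m_n - m_s, H)   # Step 4
        if kind == 'f':
            rho = (fk - fs) / max(fk - m_s, 1e-16)      # safeguarded ratio (19)
            accepted = rho >= eta1
            Delta = update_radius(Delta, rho, Delta_lo=Delta_lo, Delta_hi=Delta_hi)
        elif kind == 'h':
            accepted = True                             # always accepted
            Delta = max(Delta, Delta_lo)                # rule (21)
        else:                                           # 'c'-type, test (22)
            delta_c = 0.5 * (ck @ ck) - 0.5 * np.sum((ck + A @ s) ** 2)
            rho = (hk - hs) / delta_c if delta_c > 0 else -np.inf
            accepted = np.isfinite(rho) and rho >= eta1
            Delta = update_radius(Delta, rho, Delta_lo=Delta_lo, Delta_hi=Delta_hi)
        if accepted:
            if kind in ('h', 'c'):
                H = update_control_set(H, (1 - theta) * hk + theta * hs)  # (14)
            x = x + s                                   # Step 5
    return x
```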

3 Global convergence

First, we make the following assumptions, which are essential for our convergence analysis.

Assumptions

A1. :

The objective \(f\) and the constraints \(c\) are twice continuously differentiable.

A2. :

The set \(\{x_k\}\cup \{x_k+s_k\}\) is contained in a compact and convex set \(\Omega \).

A3. :

There exists a positive constant \(\kappa _B\) such that \(||B_k||\le \kappa _B\) for all \(k\).

A4. :

Inequality (5) is satisfied for all \(k\).

A5. :

There exist two constants \(\kappa _{h},\kappa _{\sigma }>0\) such that

$$\begin{aligned} h(x)\le \kappa _h~~\Longrightarrow ~~\sigma _{\min }(A(x))\ge \kappa _{\sigma }, \end{aligned}$$
(26)

where \(\sigma _{\min }(A)\) represents the smallest singular value of \(A\).

Remark

These assumptions are weaker than those in [13]. In [13], the authors use a double trust-region strategy and impose

$$\begin{aligned} ||s_k||\le \kappa _s \min \{\Delta _k^c,\Delta _k^f\}, \end{aligned}$$

where \(\kappa _s\) is a positive constant and \(\Delta _k^c\) and \(\Delta _k^f\) are the trust region radii for the normal step and the tangent step, respectively. This assumption is strong in theory because \(\Delta _k^c\) and \(\Delta _k^f\) are updated independently there.

In the rest of this section, we denote the index set of successful iterations by \({\mathcal {S}}\) and the index sets of \(f\)-type, \(h\)-type, and \(c\)-type iterations by \({\mathcal {F}}\), \({\mathcal {H}}\), and \({\mathcal {C}}\), respectively.

Lemma 1

Suppose that \(k\in {\mathcal {S}}\) and that \(x_k\) is a feasible point which is not a KT point. Then \(k\) must be an \(f\)-type iteration and therefore all elements of the control set are positive.

Proof

The feasibility of \(x_{k}\) implies that \(n_k=0\), \(\delta _k^f=\delta _k^{f,t}\), and \(\delta _k^c=0\), so by (22) \(k\) cannot be a successful \(c\)-type iteration. The hypothesis that \(x_k\) is not a KT point implies by (16) that \(\chi _k=||Z_k^T g_k^n||>0\), and therefore (15) holds. Thus, \(k\) must be a successful \(f\)-type iteration. By the mechanism of the algorithm, the control set \(H_k\) is updated only at successful \(h\)-type and \(c\)-type iterations. Recalling that the update rule substitutes \(h_k^+\) defined by (14) for \(H_{k,1}\), we deduce by induction that \(H_{k,i}>0,~i=1,\ldots ,l\), for all \(k\). \(\square \)

Lemma 2

For all \(k\), we have

$$\begin{aligned}&h(x_j)\le H_{k,1},~~\forall ~j\ge k, \end{aligned}$$
(27)

and \(H_{k,1}\) is monotonically non-increasing in \(k\).

Proof

Without loss of generality, we can assume that all \(k\) are successful iterations. We first prove the inequality

$$\begin{aligned} h(x_k)\le H_{k,1} \end{aligned}$$
(28)

by induction. Obviously, (9) implies that (28) holds for \(k=0\). For \(k\ge 1\), we assume that (28) holds for \(k-1\) and consider the following three cases.

The first case is that \(k-1\in {\mathcal {F}}\). Then at least one of (10)–(12) holds and therefore, according to the hypothesis \(h(x_{k-1})\le H_{k-1,1}\), we have from (10)–(12) that

$$\begin{aligned} h(x_k)\le \max \{H_{k-1,1},\beta h(x_{k-1}),\beta H_{k-1,2}\} =H_{k-1,1}. \end{aligned}$$

Since the control set is not updated at the \(f\)-type iteration \(k-1\), we have \(H_{k,1}=H_{k-1,1}\). Thus (28) follows.

The second case is that \(k-1\in {\mathcal {H}}\). Lemma 1 implies that \(x_{k-1}\) is infeasible. Then either (11) or (12) is satisfied and \(H_{k-1}\) is updated by substituting \(h_{k-1}^+\) for \(H_{k-1,1}\). It follows from (11) and (12) that

$$\begin{aligned} h(x_k)\le \beta \max \{h(x_{k-1}),H_{k-1,2}\}. \end{aligned}$$

Therefore, by the update rule of the control set together with (14), we obtain

$$\begin{aligned} H_{k,1}=\max \{h_{k-1}^+,H_{k-1,2}\}=\max \{(1-\theta )h(x_{k-1})+\theta h(x_k),H_{k-1,2}\}> h(x_k). \end{aligned}$$

Thus (28) follows.

The third case is that \(k-1\in {\mathcal {C}}\). Then (22) holds and \(H_{k-1}\) is updated by substituting \(h_{k-1}^+\) for \(H_{k-1,1}\). According to (14) and (22), we have

$$\begin{aligned} h(x_{k})<(1-\theta )h(x_{k-1})+\theta h(x_k)=h_{k-1}^+. \end{aligned}$$

Therefore, by the update rule of the control set, we have \(h_{k-1}^+\le H_{k,1}\). Hence we obtain (28) from the last two inequalities.

Now we can finish the proof of this lemma based on (28). Note that

$$\begin{aligned} \max \{h(x_{k+1}),h(x_k)\}\le H_{k,1} \end{aligned}$$

according to (10)–(12), (22), (28) and the mechanism of the algorithm. Then we have \(h_{k}^+\le H_{k,1}\) from (14). Thus the monotonicity of \(H_{k,1}\) follows from the update rule of the control set. Finally, (27) follows immediately from (28) and the monotonicity of \(H_{k,1}\).    \(\square \)

Lemma 3

For all \(k\), we have that

$$\begin{aligned} \delta _k^c\ge \kappa _c||A_k^Tc_k|| \min \left\{ \frac{||A_k^Tc_k||}{||A_k^TA_k||},\Delta _k\right\} , \end{aligned}$$
(29)

where \(\kappa _c=\frac{1}{2}\tau \), and

$$\begin{aligned} \delta _k^{f,t}\ge \kappa _f\chi _k \min \left\{ \frac{\chi _k}{||B_k||},\Delta _k\right\} , \end{aligned}$$
(30)

where \(\kappa _f=\frac{1}{2}\sqrt{1-\tau ^2}\).

Proof

It follows from Lemma 4.3 in [19] that a solution \(d^*\) to the problem

$$\begin{aligned} \left\{ {\begin{array}{rl} \mathop {\min }&{}g^Td+\frac{1}{2}d^TBd\\ \text {s.t.}&{}||d||\le \Delta \end{array}}\right. \end{aligned}$$

must satisfy the Cauchy condition

$$\begin{aligned} -g^T d^*-\frac{1}{2}(d^*)^T B d^*\ge \frac{1}{2}||g||\min \left\{ \frac{||g||}{||B||},\Delta \right\} . \end{aligned}$$
(31)

Then, according to (4), (7), (23), and (31), we have

$$\begin{aligned} \delta _k^c&=\frac{1}{2}||c_k||^2-\frac{1}{2}||c_k+A_kn_k||^2\\&\ge \frac{1}{2}||A_k^Tc_k||\min \left\{ \frac{||A_k^Tc_k||}{||A_k^TA_k||},\tau \Delta _k\right\} \\&\ge \frac{1}{2}\tau ||A_k^Tc_k||\min \left\{ \frac{||A_k^Tc_k||}{||A_k^TA_k||},\Delta _k\right\} . \end{aligned}$$

Similarly, according to (6), (16), (18), and (31), we have

$$\begin{aligned} \delta _k^{f,t}&\ge \frac{1}{2}\chi _k\min \left\{ \frac{\chi _k}{||Z_k^T B_kZ_k||},\sqrt{\Delta _k^2-||n_k||^2}\right\} \\&\ge \frac{1}{2}\chi _k\min \left\{ \frac{\chi _k}{||B_k||},\sqrt{1-\tau ^2}\Delta _k\right\} \\&\ge \frac{1}{2}\sqrt{1-\tau ^2}\chi _k\min \left\{ \frac{\chi _k}{|| B_k||},\Delta _k\right\} . \end{aligned}$$

The proof is complete.    \(\square \)

Lemma 4

For all \(k\), we have that

$$\begin{aligned} |f(x_k+s_k)-m_k(x_k+s_k)|\le \kappa _D ||s_k||^2, \end{aligned}$$
(32)

and

$$\begin{aligned} |~||c(x_k+s_k)||^2-||c_k+A_ks_k||^2~|\le \kappa _D||s_k||^2, \end{aligned}$$
(33)

where \(\kappa _D>0\) is a constant.

Proof

Inequalities (32) and (33) are just two consequences of the assumptions at the beginning of this section and Taylor’s theorem. \(\square \)

Lemma 5

If \(k\in {\mathcal {F}}\) and

$$\begin{aligned} \Delta _k\le \kappa _{\Delta }^f\chi _k, \end{aligned}$$
(34)

where \(\kappa _{\Delta }^f=\min \{\frac{1}{\kappa _B},\frac{(1-\eta _1)\zeta \kappa _f}{2\kappa _D}\}\), then \(\rho _k^f>\eta _1\). Similarly, if \(k\in {\mathcal {C}}\), \(c_k\ne 0\), and

$$\begin{aligned} \Delta _k\le \kappa _{\Delta }^c||A_k^Tc_k||, \end{aligned}$$
(35)

where \(\kappa _{\Delta }^c=\min \{\frac{1}{\kappa _A},\frac{(1-\eta _1)\kappa _c}{2\kappa _D}\}\) with \(\kappa _A\) being a positive constant, then \(\rho _k^c>\eta _1\).

Proof

It follows from (15), (30), and A4 that

$$\begin{aligned} \delta _k^f \ge \zeta \kappa _f \chi _k \min \left\{ \frac{\chi _k}{||B_k||},\Delta _k \right\} \ge \zeta \kappa _f \chi _k \min \left\{ \frac{\chi _k}{\kappa _B},\Delta _k \right\} . \end{aligned}$$

This, together with (32) and the fact that

$$\begin{aligned} ||s_k||\le ||n_k||+||t_k||\le \left( \tau +\sqrt{1-\tau ^2}\right) \Delta _k\le \sqrt{2}\Delta _k, \end{aligned}$$
(36)

implies if (34) holds then

$$\begin{aligned} |1-\rho _k^f|&=\left| \frac{f(x_k+s_k)-m_k(x_k+s_k)}{\delta _k^f}\right| \le \frac{\kappa _D||s_k||^2}{ \zeta \kappa _f \chi _k \min \left\{ \frac{\chi _k}{\kappa _B},\Delta _k \right\} }\\ {}&\le \frac{2\kappa _D\Delta _k^2}{\zeta \kappa _f \chi _k \Delta _k}\le 1-\eta _1. \end{aligned}$$

Hence, the first assertion follows. Similarly, using (29) and assumptions A1 and A2, we have

$$\begin{aligned} \delta _k^c\ge \kappa _c||A_k^Tc_k|| \min \left\{ \frac{||A_k^Tc_k||}{\kappa _A},\Delta _k\right\} , \end{aligned}$$

where \(\kappa _A=\max \limits _k\{||A_k^TA_k||\}\). This, together with (33) and (36), implies that if (35) holds then

$$\begin{aligned} |1-\rho _k^c|&= \left| \frac{||c(x_k+s_k)||^2-||c_k+A_ks_k||^2}{2\delta _k^c}\right| \le \frac{\kappa _D||s_k||^2}{ \kappa _c ||A_k^Tc_k|| \min \left\{ \frac{||A_k^Tc_k||}{\kappa _A},\Delta _k \right\} }\\&\le \frac{2\kappa _D\Delta _k^2}{\kappa _c ||A_k^Tc_k|| \Delta _k}\le 1-\eta _1. \end{aligned}$$

Then the second assertion follows as well. \(\square \)

We show below that our algorithm can eventually make a step forward at any iterate which is not an infeasible stationary point of \(h(x)\). We first recall the definition of an infeasible stationary point of \(h(x)\).

Definition 1

A point \(\hat{x}\) is an infeasible stationary point of \(h(x)\) if \(\hat{x}\) satisfies

$$\begin{aligned} A(\hat{x})^T c(\hat{x})=0~~\text {and}~~c(\hat{x})\ne 0. \end{aligned}$$

Lemma 6

Suppose that no iterate \(x_k\) is a KT point or an infeasible stationary point. Then \(|{\mathcal {S}}|=+\infty \).

Proof

According to the mechanism of the algorithm, \(x_k+s_k\) must be accepted if \(k\) is an \(h\)-type iteration, so we only consider the cases \(k\in {\mathcal {F}}\) and \(k\in {\mathcal {C}}\).

Suppose that \(x_k\) is infeasible. Since by hypothesis \(x_k\) is not an infeasible stationary point, we have \(||A_k^Tc_k||>0\). It then follows from (29) that \(\delta _k^c>0\). Therefore, when \(k\in {\mathcal {C}}\), Lemma 5 ensures that \(\rho _k^c\ge \eta _1\) whenever \(\Delta _k\le \kappa _{\Delta }^c ||A_k^Tc_k||\), so after finitely many reductions of the radius a successful \(c\)-type iteration occurs. When \(k\in {\mathcal {F}}\), we know by (15) that \(\chi _k>\sigma _1 ||c_k||^{\sigma _2}\) and by Lemma 5 that \(\rho _k^f\ge \eta _1\) whenever \(\Delta _k\le \kappa _{\Delta }^f \chi _k\). Note that (16) implies \(\chi _k\) depends on \(g_k^n=B_kn_k+g_k\) and therefore on \(n_k\), which may change as \(\Delta _k\) decreases. Since \(||A_k^Tc_k||>0\), it follows from Theorem 4.1 in [19] that \(||n_k||=\tau \Delta _k\) for all sufficiently small \(\Delta _k\). Using the arguments above and (5), we have

$$\begin{aligned} \chi _k\ge O(||n_k||^{\sigma _2})=O(\Delta _k^{\sigma _2}). \end{aligned}$$

Thus, (34) must be satisfied for all sufficiently small \(\Delta _k\). Therefore, a successful \(f\)-type iteration eventually occurs at \(x_k\).

Now we suppose that \(x_k\) is feasible. Since \(x_k\) is not a KT point, we have

$$\begin{aligned} \chi _k=||Z_k^T g_k^n||=||Z_k^T g_k||>0. \end{aligned}$$

So, (15) is satisfied. It follows from \(c_k+A_ks_k=0\), (36), and Taylor’s theorem that

$$\begin{aligned} h(x_k+s_k)=\frac{1}{2}\sum _{i=1}^m c_i^2(x_k+s_k)=\frac{1}{8}\sum _{i=1}^m (s_k^T \nabla ^2 c_i(\xi _i)s_k)^2 \le \frac{1}{8}m \kappa _C^2||s_k||^4\le \frac{1}{2}m\kappa _C^2\Delta _k^4, \end{aligned}$$
(37)

where

$$\begin{aligned} \kappa _C=\max \limits _{x\in \Omega ,1\le i\le m}\{||\nabla ^2 c_i(x)||\}. \end{aligned}$$

So, (10) holds whenever \(\Delta _k\le \left( \frac{2H_{k,1}}{m\kappa _C^2}\right) ^{1/4}\). Applying Lemma 5 once again, we have \(\rho _k^f\ge \eta _1\) when \(\Delta _k\) is sufficiently small. Hence, a successful \(f\)-type iteration eventually occurs at \(x_k\). \(\square \)

Lemma 7

If \(x_k\) is infeasible but not an infeasible stationary point, then

$$\begin{aligned} \Delta _k\ge \min \left\{ \tau _1\kappa _{\Delta }^f\max (\chi _k,\sigma _1||c_k||^{\sigma _2}),\bar{\Delta }\right\} , \end{aligned}$$
(38)

or

$$\begin{aligned} \Delta _k\ge \min \left\{ \tau _1\kappa _{\Delta }^c||A_k^Tc_k||,\bar{\Delta }\right\} . \end{aligned}$$
(39)

Proof

The result follows immediately from (15), (25), Lemma 5, the proof of Lemma 6, and the mechanism of the algorithm. \(\square \)

Lemma 8

Suppose \(x^*\in \Omega \) is a feasible point but not a KT point. Then there exist a neighbourhood \({\mathcal {N}}(x^*)\) of \(x^*\) and positive constants \(\delta ,\mu ,\kappa \) such that for any \(x_k\in {\mathcal {N}}(x^*)\cap \Omega \), if \(\Delta _k\ge \mu ||c_k||\), then \(c_k+A_ks_k=0\) and (15) holds; moreover, if

$$\begin{aligned} \mu ||c_k||\le \Delta _k\le \min \{\kappa ,(\kappa _H H_{k,2})^{1/4}\}, \end{aligned}$$

where \(\kappa _H=\frac{2\beta }{m\kappa _C^2}\), then (12) and (19) hold and \(\delta _k^f\ge \delta \Delta _k\).

Proof

Assumptions A1, A2, and A5 imply that when \(x_k\) is sufficiently close to \(x^*\), \((A_kA_k^T)^{-1}\) exists and

$$\begin{aligned} ||A_k^T(A_kA_k^T)^{-1}c_k||\le \kappa _I ||c_k|| \end{aligned}$$
(40)

for some constant \(\kappa _I>0\). Therefore, if

$$\begin{aligned} \Delta _k\ge \frac{\kappa _I}{\tau }||c_k||, \end{aligned}$$
(41)

we have

$$\begin{aligned} n_k=-A_k^T(A_kA_k^T)^{-1}c_k \end{aligned}$$
(42)

and \(c_k+A_ks_k=0\).

Because \(x^*\) is a feasible point but not a KT point, there exists a constant \(\epsilon >0\) such that, for all \(x_k\) sufficiently close to \(x^*\),

$$\begin{aligned} \chi _k\ge \epsilon >\sigma _1 ||c_k||^{\sigma _2}, \end{aligned}$$
(43)

and therefore by (30) and assumption A3

$$\begin{aligned} \delta _k^{f,t}\ge \kappa _f \epsilon \min \left\{ \frac{\epsilon }{\kappa _B},\Delta _k\right\} . \end{aligned}$$
(44)

Define

$$\begin{aligned} \delta _k^{f,n}:=f(x_k)-m_k(x_k+n_k)=-g_k^T n_k-\frac{1}{2}n_k^TB_k n_k. \end{aligned}$$

It follows from (40)–(42) and assumptions A1–A3 that if \(x_k\) is sufficiently close to \(x^*\) and (41) is satisfied,

$$\begin{aligned} |\delta _k^{f,n}|\le \left( ||g_k||+\frac{1}{2}\kappa _B||n_k||\right) ||n_k||\le \left( ||g_k||+\frac{1}{2}\kappa _B\kappa _I||c_k||\right) \kappa _I||c_k||\le \kappa _G ||c_k||,\nonumber \\ \end{aligned}$$
(45)

where \(\kappa _G=\kappa _I \max \limits _{x\in \Omega }\Big \{||\nabla f(x)||+\frac{1}{2}\kappa _B\kappa _I ||c(x)||\Big \}\). Then, applying (41), (44), and (45), we have that if \(x_k\) is sufficiently close to \(x^*\) and

$$\begin{aligned} \max \left\{ \frac{\kappa _I}{\tau },\frac{\kappa _G}{(1-\zeta )\kappa _f\epsilon }\right\} ||c_k||\le \Delta _k\le \frac{\epsilon }{\kappa _B}, \end{aligned}$$

then

$$\begin{aligned} (1-\zeta )\delta _k^{f,t}\ge -\delta _k^{f,n}, \end{aligned}$$

and therefore

$$\begin{aligned} \delta _k^f=\delta _k^{f,t}+\delta _k^{f,n}\ge \zeta \delta _k^{f,t}, \end{aligned}$$

which together with (43) implies (15).

We deduce from Lemma 5 and (43) that if

$$\begin{aligned} \Delta _k\le \kappa _{\Delta }^f \epsilon , \end{aligned}$$
(46)

then (19) holds. If, in addition, \(\Delta _k\) satisfies

$$\begin{aligned} \Delta _k\le \min \left\{ \frac{\epsilon }{\kappa _B}, \left( \frac{2\eta _1\zeta \kappa _f\epsilon }{m\gamma \kappa _C^2}\right) ^{1/3},\left( \frac{2\beta H_{k,2}}{m\kappa _C^2} \right) ^{1/4}\right\} , \end{aligned}$$

then by (19), (37), and (44), we have

$$\begin{aligned} f(x_k)-f(x_k+s_k)\ge \eta _1\delta _k^f\ge \eta _1 \zeta \kappa _f\epsilon \Delta _k\ge \gamma h(x_k+s_k), \end{aligned}$$

and

$$\begin{aligned} h(x_k+s_k)\le \beta H_{k,2}. \end{aligned}$$

The last two inequalities mean (12) holds. Finally, defining \(\delta =\zeta \kappa _f\epsilon \), \(\mu =\max \left\{ \frac{\kappa _I}{\tau },\frac{\kappa _G}{(1-\zeta )\kappa _f\epsilon }\right\} \), \(\kappa =\min \left\{ \frac{\epsilon }{\kappa _B},\kappa _{\Delta }^f \epsilon ,\left( \frac{2\eta _1\zeta \kappa _f\epsilon }{m\gamma \kappa _C^2}\right) ^{1/3}\right\} \), and choosing a sufficiently small neighbourhood \({\mathcal {N}}(x^*)\), we complete the proof. \(\square \)

We now consider convergence in the case where there are only finitely many successful \(c\)-type and \(h\)-type iterations.

Lemma 9

Suppose that \(|{\mathcal {S}}|=+\infty \) and \(|({\mathcal {H}} \cup {\mathcal {C}})\cap {\mathcal {S}}|<+\infty \). Then there exists a subsequence \({\mathcal {K}}\subset {\mathcal {S}}\) such that

$$\begin{aligned} \lim _{k\rightarrow \infty ,k\in {\mathcal {K}}}h(x_k)=0, \end{aligned}$$
(47)

and any limit point of \(\{x_k\}_{k\in {\mathcal {K}}}\) is a KT point.

Proof

Suppose that \(x_k\) is infeasible for all sufficiently large \(k\), for otherwise (47) must hold for some subsequence \({\mathcal {K}}\). The hypothesis of this lemma implies \(k\in {\mathcal {F}}\) for all sufficiently large \(k\in {\mathcal {S}}\). Then \(\{f(x_k)\}\) is monotonically non-increasing from (15) and (19). It follows from (13) and Lemma 1 of [8] that \(\lim _{k\rightarrow \infty }h(x_k)=0\). Thus, (47) follows immediately.

Let \(x^*\) be an arbitrary limit point of \(\{x_k\}_{k\in {\mathcal {K}}}\). From (47), we deduce that \(x^*\) is feasible. Without loss of generality, suppose that \(\lim _{k\rightarrow \infty ,k\in {\mathcal {K}}}x_k=x^*\). To derive a contradiction, we assume \(x^*\) is not a KT point. Then, for sufficiently large \(k\in {\mathcal {K}}\), we have \(x_k\in {\mathcal {N}}(x^*)\), where \({\mathcal {N}}(x^*)\) is a neighbourhood of \(x^*\) characterized in Lemma 8. Applying Lemma 8, if

$$\begin{aligned} \mu ||c_k||\le \Delta _k\le \min \{\kappa ,(\kappa _H H_{k,2})^{1/4}\}, \end{aligned}$$
(48)

\(x_k+s_k\) must satisfy all the conditions for a successful \(f\)-type iteration. Note that the control set is not updated at a successful \(f\)-type iteration. Therefore, we can find an index \(k_0\) such that \(H_{k_0}=H_{k}\) for all \(k\ge k_0\). Hence, for all sufficiently large \(k\in {\mathcal {K}}\), the interval in (48) becomes

$$\begin{aligned} \mu ||c_k||\le \Delta _k\le \min \{\kappa ,(\kappa _H H_{k_0,2})^{1/4}\}, \end{aligned}$$

where the lower bound approaches zero and the upper bound is a positive constant. It then follows from the mechanism of the algorithm that, for all sufficiently large \(k\in {\mathcal {K}}\),

$$\begin{aligned} \Delta _k\ge \Delta _{\min }:=\min \left\{ \tau _1\kappa ,\tau _1(\kappa _H H_{k_0,2})^{1/4},\bar{\Delta }\right\} . \end{aligned}$$

Therefore, by Lemma 8 and the non-decreasing monotonicity of \(\delta _k^f\) in \(\Delta _k\) on the interval \([\mu ||c_k||,+\infty )\), we have

$$\begin{aligned} f(x_k)-f(x_k+s_k)\ge \eta _1\delta _k^f\ge \eta _1\delta \Delta _{\min }, \end{aligned}$$

which, together with the non-increasing monotonicity of \(\{f(x_k)\}\), implies \(f(x_k)\rightarrow -\infty \) as \(k\rightarrow \infty \). This contradicts assumptions A1 and A2, so the proof is complete. \(\square \)

Next we consider convergence in the case where there are infinitely many successful \(c\)-type or \(h\)-type iterations.

Lemma 10

Suppose \(|{\mathcal {H}} |=+\infty \). Then \(\lim _{k\rightarrow \infty }h(x_k)=0\).

Proof

Denote \({\mathcal {H}}=\{k_i\}\). Since at least one of (10)–(12) holds at \(h\)-type iterations and \(x_{k_i}\) is infeasible by Lemma 1, we deduce from (11), (12), (14), and (27) that

$$\begin{aligned} h_{k_i}^+&=(1-\theta )h(x_{k_i})+\theta h(x_{k_i+1}) \\&\le (1-\theta )H_{k_i,1}+\theta \beta \max (H_{k_i,2},h(x_{k_i}))\\&\le (1-\theta +\theta \beta )H_{k_i,1}. \end{aligned}$$

It then follows from Lemma 2 and the update rule of the control set that

$$\begin{aligned} H_{k_{i+l},1}\le (1-\theta +\beta \theta ) H_{k_i,1}. \end{aligned}$$
(49)

Applying Lemma 2 once again together with (49), we have

$$\begin{aligned} \lim _{k\rightarrow \infty }H_{k,1}=0. \end{aligned}$$
(50)

Thus, the result follows from (27) and (50). \(\square \)

In what follows, to obtain global convergence, we will rule out the following undesirable scenario for successful \(c\)-type iterations:

$$\begin{aligned} \lim _{k\rightarrow \infty ,k\in {\mathcal {C}} \cap {\mathcal {S}} }||A_k^Tc_k||=0~~ {\text {with}} ~~\liminf \limits _{k\rightarrow \infty ,k\in {\mathcal {C}} \cap {\mathcal {S}}}||c_k||>0. \end{aligned}$$
(51)

Lemma 11

Suppose \(|{\mathcal {C}}\cap {\mathcal {S}}|=+\infty \) and (51) is avoided. Then \(\lim _{k\rightarrow \infty }h(x_k)=0\).

Proof

We first prove

$$\begin{aligned} \lim \limits _{k\rightarrow \infty ,k\in {\mathcal {C}}\cap {\mathcal {S}}} ~||A_k^T c_k||=0. \end{aligned}$$
(52)

Denote \({\mathcal {C}}\cap {\mathcal {S}}=\{k_i\}\). From (14), (22), (27), (29), we have

$$\begin{aligned} H_{k_i,1}-h_{k_i}^+&\ge h(x_{k_i})-h_{k_i}^+=\theta (h(x_{k_i})-h(x_{k_i+1}))\ge \theta \eta _1\delta _{k_i}^c\nonumber \\&\ge \theta \eta _1\kappa _c||A_{k_i}^Tc_{k_i}|| \min \left\{ \frac{||A_{k_i}^Tc_{k_i}||}{||A_{k_i}^TA_{k_i}||},\Delta _{k_i}\right\} \nonumber \\&\ge \theta \eta _1\kappa _c||A_{k_i}^Tc_{k_i}|| \min \left\{ \frac{||A_{k_i}^Tc_{k_i}||}{\kappa _A},\Delta _{k_i}\right\} , \end{aligned}$$
(53)

where \(\kappa _A\) is still defined by \(\kappa _A=\max \limits _k\{||A_k^TA_k||\}\) as in the proof of Lemma 5.

Since \(k_i\in {\mathcal {C}}\cap {\mathcal {S}}\), \(x_{k_i}\) is an infeasible point by Lemma 1. Lemma 7 implies

$$\begin{aligned} \Delta _{k_i}\ge \min \left\{ \tau _1\kappa _{\Delta }^f\sigma _1||c_{k_i}||^{\sigma _2}, \tau _1\kappa _{\Delta }^c||A_{k_i}^T c_{k_i}||,\bar{\Delta }\right\} . \end{aligned}$$
(54)

From (53), (54), and \(\kappa _{\Delta }^c\le \frac{1}{\kappa _A}\), we then have

$$\begin{aligned} H_{k_i,1}-h_{k_i}^+\ge \theta \eta _1\kappa _c||A_{k_i}^Tc_{k_i}|| \min \left\{ \tau _1\kappa _{\Delta }^f\sigma _1||c_{k_i}||^{\sigma _2}, \tau _1\kappa _{\Delta }^c||A_{k_i}^T c_{k_i}||,\bar{\Delta }\right\} . \end{aligned}$$
(55)

It therefore follows from (55), Lemma 2, and the update rule of the control set that

$$\begin{aligned} H_{k_i,1}-H_{k_{i+l},1}\ge \theta \eta _1\kappa _c||A_{k_i}^Tc_{k_i}|| \min \left\{ \tau _1\kappa _{\Delta }^f\sigma _1||c_{k_i}||^{\sigma _2}, \tau _1\kappa _{\Delta }^c||A_{k_i}^T c_{k_i}||,\bar{\Delta }\right\} , \end{aligned}$$

which implies (52) immediately.

Since (51) is avoided, it follows from (52) that \(\liminf \limits _{k\rightarrow \infty ,k\in {\mathcal {C}}\cap {\mathcal {S}}}||c_k||=0\). So, there exists a subsequence \({\mathcal {J}}\subset {\mathcal {C}}\cap {\mathcal {S}}\) such that

$$\begin{aligned} \lim \limits _{k\rightarrow \infty ,k\in {\mathcal {J}}}||c_k||=0. \end{aligned}$$
(56)

Remembering \(h(x_{k+1})< h(x_k)\) for all \(k\in {\mathcal {C}}\cap {\mathcal {S}}\), we have from (14) and (56) that \(\lim _{k\rightarrow \infty ,k\in {\mathcal {J}}} h_k^+=0\), which, together with Lemma 2 and the update rule of the control set, implies (50). The result then follows from (27) and (50). \(\square \)

Lemma 12

Suppose \(|({\mathcal {H}} \cup {\mathcal {C}})\cap {\mathcal {S}}|=+\infty \) and (51) is avoided. Then

$$\begin{aligned} \lim _{k\rightarrow \infty }h(x_k)=0 \end{aligned}$$
(57)

and there exists a constant \(\kappa _{\beta }\in (0,1)\) such that at least one limit point of \(\{x_k\}\) is a KT point whenever \(\beta \in [\kappa _{\beta },1)\).

Proof

Equality (57) follows immediately from Lemma 10 and Lemma 11. It follows from (14) and (57) that \(\lim _{k\rightarrow \infty }h_k^+=0\). Denote \({\mathcal {K}}=({\mathcal {H}}\cup {\mathcal {C}})\cap {\mathcal {S}}\). Therefore, by \(|{\mathcal {K}}|=+\infty \), the positivity of any \(H_{k,i}\), and the update rule of the control set, we can find a subsequence \(\{k_i\}\subset {\mathcal {K}}\) such that

$$\begin{aligned} h_{k_i}^+<H_{k_i,2}. \end{aligned}$$
(58)

Suppose \(x^*\) is a limit point of \(\{x_{k_i}\}\), which by (57) is a feasible point. To derive a contradiction, we assume \(x^*\) is not a KT point. Without loss of generality, we further assume \(\lim _{i\rightarrow \infty }x_{k_i}=x^*\). Thus, for sufficiently large \(k_i\), we have \(x_{k_i}\in {\mathcal {N}}(x^*)\) and

$$\begin{aligned} \chi _{k_i}\ge \epsilon , \end{aligned}$$
(59)

where \({\mathcal {N}}(x^*)\) is a neighbourhood of \(x^*\) characterized in Lemma 8, and \(\epsilon >0\) is a constant. According to (14) and (58), we have

$$\begin{aligned} h(x_{k_i})\le \frac{1}{1-\theta }h_{k_i}^+<\frac{1}{1-\theta }H_{k_i,2}, \end{aligned}$$

and therefore

$$\begin{aligned} ||c(x_{k_i})||\le \Big (2h(x_{k_i})\Big )^{1/2}\le O\Big ((H_{k_i,2})^{1/2}\Big ). \end{aligned}$$
(60)

We investigate the interval described in Lemma 8

$$\begin{aligned} \mu ||c_{k_i}||\le \Delta _{k_i}\le \min \Big \{\kappa ,(\kappa _H H_{k_i,2})^{1/4}\Big \}. \end{aligned}$$
(61)

It follows from (60) and \(c(x_{k_i})\rightarrow 0\) that the lower bound in (61) is eventually smaller than \(\tau _1\) times the upper bound in (61). Thus, we have from (25) and Lemma 8 that, for all sufficiently large \(k_i\),

$$\begin{aligned} \Delta _{k_i}\ge \tau _1\Big (\kappa _H H_{k_i,2}\Big )^{1/4}. \end{aligned}$$
(62)

In addition, Lemma 8 ensures that, in this situation, (15) must hold, and therefore \(k_i\) cannot be an \(h\)-type iteration.

Now we consider any sufficiently large \(k_i\) such that \(x_{k_i}\in {\mathcal {N}}(x^*)\), (59) holds, and

$$\begin{aligned} h(x_{k_i})\le \kappa _h. \end{aligned}$$
(63)

Using the arguments above, we know \(k_i\in {\mathcal {C}}\cap {\mathcal {S}}\), which implies by Lemma 1 that \(x_{k_i}\) is infeasible. It then follows from (29) and assumptions A1, A2, and A5 that

$$\begin{aligned} \delta _{k_i}^c\ge \kappa _c||A_{k_i}^Tc_{k_i}|| \min \left\{ \frac{||A_{k_i}^Tc_{k_i}||}{||A_{k_i}^TA_{k_i}||},\Delta _{k_i}\right\} \ge \kappa _c\kappa _{\sigma } ||c_{k_i}|| \min \left\{ \frac{\kappa _{\sigma }||c_{k_i}||}{\kappa _A},\Delta _{k_i}\right\} . \end{aligned}$$
(64)

According to (60) and (62), we have \(\Delta _{k_i}\ge O(||c_{k_i}||^{1/2})\), which together with (64), implies for all sufficiently large \(k_i\),

$$\begin{aligned} \delta _{k_i}^c\ge \kappa _c\kappa _{\sigma } ||c_{k_i}|| \min \left\{ \frac{\kappa _{\sigma }||c_{k_i}||}{\kappa _A},O(||c_{k_i}||^{1/2})\right\} \ge \frac{\kappa _c\kappa _{\sigma }^2}{\kappa _A} ||c_{k_i}||^2. \end{aligned}$$
(65)

Since \(k_i\in {\mathcal {C}}\cap {\mathcal {S}}\), (22) holds, and therefore, by (65), we have

$$\begin{aligned} h(x_{k_i+1})\le h(x_{k_i})-\eta _1\delta _{k_i}^c\le \left( 1-\frac{2\eta _1\kappa _c\kappa _{\sigma }^2}{\kappa _A}\right) h(x_{k_i}). \end{aligned}$$

This implies that if

$$\begin{aligned} \kappa _{\beta }:=\left( 1-\frac{2\eta _1\kappa _c\kappa _{\sigma }^2}{\kappa _A}\right) \le \beta <1, \end{aligned}$$

then \(h(x_{k_i+1})\le \beta h(x_{k_i})\) and therefore (11) holds for \(k_i\). Thus, \(k_i\) cannot be a \(c\)-type iteration, which produces a contradiction. Hence, \(x^*\) is a KT point. \(\square \)

We now present our main result below.

Theorem 1

Suppose that no iterate \(x_k\) is a KT point or an infeasible stationary point and that (51) is avoided. Then at least one limit point of \(\{x_k\}\) is a KT point whenever the parameter \(\beta \) in (11) and (12) satisfies \(\beta \in [\kappa _{\beta },1)\), where \(\kappa _{\beta }\in (0,1)\) is the constant from Lemma 12.

Proof

The result follows immediately from Lemmas 6, 9, and 12. \(\square \)

4 Numerical results

In this section, preliminary numerical results are presented to demonstrate the potential of the new trust region infeasibility control algorithm. The code for the new algorithm was written in MATLAB 7.9. Details of our implementation are described as follows.

A standard stopping criterion

$$\begin{aligned} ||c_k||_{\infty }\le 10^{-6}(1+||x_k||_2), \end{aligned}$$

and

$$\begin{aligned} ||g_k+A_k^T\lambda _{k+1}||_{\infty }\le 10^{-6}(1+||\lambda _{k+1}||_2) \end{aligned}$$

was used for our algorithm. The approximate Hessian \(B_k\) was initialized to the identity and updated by the damped BFGS formula (a sketch follows the parameter list below). The dogleg method was applied to compute both the normal and tangent steps. The Lagrange multipliers were computed via MATLAB’s lsqlin function. All the parameters were chosen as:

$$\begin{aligned}&\tau =0.8,\tau _1=0.5, \tau _2=1.2,\beta =0.9999, \gamma =\theta =\zeta =\eta _1=10^{-4}, \eta _2=0.7,\\&\sigma _1=10^{-8},\sigma _2=0.5,l=\max \{\min \{15,\lceil n/5\rceil \},3\},u=\max \{1000,1.5h(x_0)\},\\&\Delta _0=\max \{0.4||x_0||,1.2\sqrt{n}\},\hat{\Delta }=10\Delta _0, \bar{\Delta }=10^{-4}. \end{aligned}$$
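The damped BFGS update mentioned above is Powell’s modification, which keeps \(B_k\) positive definite even when the standard curvature condition fails. A minimal sketch follows; the damping constant 0.2 is the customary choice, and the exact constant used in our implementation is not essential here.

```python
import numpy as np

def damped_bfgs(B, s, y, phi=0.2):
    """Powell-damped BFGS update of B, with s = x_{k+1} - x_k and
    y = grad_x L(x_{k+1}, lambda_{k+1}) - grad_x L(x_k, lambda_{k+1})."""
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    theta = 1.0 if sy >= phi * sBs else (1.0 - phi) * sBs / (sBs - sy)
    r = theta * y + (1.0 - theta) * Bs        # damped secant vector: s @ r > 0
    return B - np.outer(Bs, Bs) / sBs + np.outer(r, r) / (s @ r)
```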

We compared our algorithm with the well-known nonlinear optimization solver SNOPT [20]. The corresponding results are shown in Tables 1 and 2, where “TRIC” denotes our trust region infeasibility control algorithm, “Nit” denotes the number of iterations, “Nf” denotes the number of function evaluations, and “Ng” denotes the number of gradient evaluations. The test problems were equality constrained problems chosen from the CUTEr collection [21].

Table 1 Numerical results
Table 2 Numerical results

We also plot the logarithmic performance profile of Dolan and Moré [22] in Fig. 1. In the plots, the performance profile is defined by

$$\begin{aligned} \pi _s(t)\triangleq \frac{\text {no. of problems where}~\log _2{(r_{p,s})}\le t}{\text {total no. of problems}}, \end{aligned}$$

where \(r_{p,s}\) is the ratio of the Nf (or Ng) required by solver \(s\) to solve problem \(p\) to the lowest value of that measure required by any solver on the problem. The ratio \(r_{p,s}\) is set to infinity whenever solver \(s\) fails to solve problem \(p\). It can be observed from Fig. 1 that TRIC is comparable with SNOPT.
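For reference, the profile can be computed from a cost table in a few lines; the sketch below assumes each problem is solved by at least one solver and uses np.inf to mark failures.

```python
import numpy as np

def performance_profile(T, n_grid=200):
    """Dolan-More profile from an (n_problems x n_solvers) cost array T
    (e.g. Nf or Ng); returns a grid t and pi[s, j] = pi_s(t_j)."""
    ratios = T / np.min(T, axis=1, keepdims=True)   # r_{p,s}
    logr = np.log2(ratios)                          # inf where solver s failed
    t = np.linspace(0.0, logr[np.isfinite(logr)].max() + 1.0, n_grid)
    pi = np.array([[np.mean(logr[:, s] <= tj) for tj in t]
                   for s in range(T.shape[1])])
    return t, pi
```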

Fig. 1 Performance profile