1 Introduction

In this paper, we consider a numerical algorithm for the following nonlinear equality constrained optimization problem

$$\begin{aligned} \left\{ {\begin{array}{rl} \mathop {\min }&{}f(x)\\ \text {s.t.}&{}c(x)=0, \end{array}}\right. \end{aligned}$$
(1)

where the objective \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) and the constraints \(c:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^m\) with \(m\le n\) are sufficiently smooth.

The proposed algorithm belongs to the class of two-phase trust region algorithms, for which the reader is referred to, e.g., [16]. More specifically, our approach is based on the Byrd–Omojokun trust region SQP method [4, 5], which is widely regarded as one of the most practical trust region methods for general nonlinear equality constrained optimization problems. In the Byrd–Omojokun method, a complete step is decomposed into a normal step and a tangent step, which are computed by solving a pair of relaxed trust region subproblems.

It is commonly accepted that the technique used to accept or reject steps deeply affects the efficiency of methods for nonlinear constrained optimization. In many traditional algorithms for nonlinear constrained optimization, penalty functions or filters are used to judge the quality of a trial step. The most obvious disadvantage of penalty functions is their overdependence on penalty parameters. An inappropriate penalty parameter can reject a good step, and even a carefully designed strategy for updating the penalty parameter is often inefficient in practice. This motivated Fletcher and Leyffer to propose the concept of filters [7]. Convergence theory for filter methods can be found, e.g., in [8, 9]. Nevertheless, filter methods also have an Achilles’ heel: a restoration phase is needed to reduce infeasibility until a feasible subproblem is obtained. Gould, Loh, and Robinson [10] recently proposed a new robust filter method which is free of restoration phases. This method uses a fairly involved unified step computation process and a mixed step acceptance strategy based on filters and steering penalty functions.

Over the last few years, methods without a penalty function or a filter have been an active topic in the nonlinear optimization community. Bielschowsky and Gomes [11] introduced an infeasibility control technique based on trust cylinders; this method requires a possibly computationally expensive restoration step at every iteration. Gould and Toint [12] introduced a trust funnel technique for step acceptance. Their method uses different trust regions for the normal and tangent steps, but the strategy for coordinating the two trust region radii is quite involved and still under investigation. Zhu and Pu [13] proposed a step acceptance mechanism based on a control set. Liu and Yuan [14] proposed a penalty-filter-free technique in the line search framework. These four methods are designed specifically for equality constrained optimization. More recently, Shen et al. [15] proposed a non-monotone SQP method for general nonlinear constrained optimization without a penalty function or a filter. Although the penalty-filter-free methods mentioned above share some similarities, they differ substantially from one another.

This paper builds mostly on the method of Zhu and Pu [13]. The principal novelty of their work is a technique that controls infeasibility via a set of constraint violation values recorded at previous iterations. This technique is very useful in practice, but in theory it has one weak point: a strong assumption on the step size is required to establish global convergence. The root cause of this assumption is that a double trust-region strategy, similar to that of Gould and Toint [12], is used for the step computation, and the ratio of the normal and tangent trust region radii cannot be controlled in theory, even though this rarely causes trouble in practice.

The main contribution of this paper is to complete the global convergence theory of the new infeasibility control technique in [13]. Compared with [13], the most significant modification in the proposed algorithm is that the double trust-region strategy is replaced by a standard (single) trust region strategy. Global convergence to first order critical points is then proved under mild assumptions. We also present extended numerical results on some CUTEr problems to demonstrate the efficiency of the proposed algorithm.

The paper is organized as follows. In Sect. 2, a complete description of the proposed algorithm is given. Assumptions and the global convergence analysis are presented in Sect. 3. Section 4 is devoted to numerical results.

2 The algorithm

2.1 Step computation

We compute steps on the basis of the Byrd–Omojokun trust region method [5]. Each complete step \(s_k\) is composed of a normal step \(n_k\) and a tangent step \(t_k\), i.e.,

$$\begin{aligned} s_k=n_k+t_k. \end{aligned}$$
(2)

The normal step \(n_k\) aims at reducing the constraint violation function \(h(x)\), where

$$\begin{aligned} h(x):=\frac{1}{2}||c(x)||^2 \end{aligned}$$
(3)

with \(||\cdot ||\) denoting the Euclidean norm. This function can be viewed as an infeasibility measure at a point \(x\). The tangent step \(t_k\) aims at reducing the objective as much as possible while preserving the constraint violation. Specifically, \(n_k\) and \(t_k\) are computed as follows.

For the normal step \(n_k\), we solve the following trust region least squares problem

$$\begin{aligned} \left\{ {\begin{array}{rl} \mathop {\min }&{}\frac{1}{2}||c_k+A_k v||^2\\ \text {s.t.}&{}||v||\le \tau \Delta _k, \end{array}}\right. \end{aligned}$$
(4)

where \(\tau \in (0,1)\), \(c_k=c(x_k)\), and \(A_k=A(x_k)\), with \(A(x)\) denoting the Jacobian of \(c\) at \(x\). We assume the solution to (4), the normal step \(n_k\), satisfies

$$\begin{aligned} ||n_k||\le \kappa _n ||c_k||, \end{aligned}$$
(5)

where \(\kappa _n>0\). This assumption is in fact a regularity condition on the Jacobian of the constraints. Indeed, suppose \(A_k\) has the SVD \(A_k=U_k \Sigma _k V_k^T\). Then \(v_k=-V_k\Sigma _k^{\dagger } U_k^Tc_k\), where \(\Sigma _k^{\dagger }\) is the pseudo-inverse of \(\Sigma _k\), solves the least squares problem

$$\begin{aligned} \min ||c_k+A_k v||^2. \end{aligned}$$

Thus, a sufficient condition for (5) is that the smallest positive singular value of \(A_k\) is bounded away from zero. Notice that \(n_k=0\) when \(x_k\) is feasible.
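To make the normal step computation concrete, the following is a minimal sketch of a dogleg solver for (4) in Python/NumPy. It combines the Cauchy step of the least squares model with the minimum-norm Gauss–Newton step; the function name, the default value of \(\tau\), and the use of the pseudo-inverse are illustrative choices, not a description of the exact routine in our implementation.

```python
import numpy as np

def normal_step(c, A, Delta, tau=0.8):
    """Dogleg approximation to the trust-region least-squares problem (4):
    min 0.5*||c + A v||^2  s.t.  ||v|| <= tau*Delta."""
    radius = tau * Delta
    g = A.T @ c                          # gradient of the model at v = 0
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros(A.shape[1])
    # Cauchy (steepest-descent) minimizer of the least-squares model
    Ag = A @ g
    v_c = -((g @ g) / (Ag @ Ag)) * g
    # minimum-norm Gauss-Newton step via the pseudo-inverse
    v_n = -np.linalg.pinv(A) @ c
    if np.linalg.norm(v_n) <= radius:    # Gauss-Newton point inside the ball
        return v_n
    if np.linalg.norm(v_c) >= radius:    # truncate the Cauchy step
        return -(radius / gnorm) * g
    # otherwise follow the dogleg path from v_c toward v_n to the boundary
    d = v_n - v_c
    a, b, e = d @ d, 2.0 * (v_c @ d), v_c @ v_c - radius**2
    t = (-b + np.sqrt(b * b - 4.0 * a * e)) / (2.0 * a)
    return v_c + t * d
```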

After computing the normal step \(n_k\), the next task is to find a tangent step \(t_k\) that improves the optimality of the current iterate \(x_k\). Consider a quadratic model of the Lagrangian at \(x_k\)

$$\begin{aligned} m_k(x_k+d):=f_k+g_k^Td+\frac{1}{2}d^T B_kd, \end{aligned}$$

where \(f_k=f(x_k)\), \(g_k=\nabla f(x_k)\), and \(B_k\) is an approximate Hessian of the Lagrangian

$$\begin{aligned} {\mathcal {L}}(x,\lambda )=f(x)+\lambda ^T c(x) \end{aligned}$$

at \(x_k\). It follows that

$$\begin{aligned} m_k(x_k+n_k)=f_k+ g_k^Tn_k+\frac{1}{2}n_k^TB_k n_k \end{aligned}$$

and

$$\begin{aligned} m_k(x_k+n_k+t)= m_k(x_k+n_k)+ (g_k^n)^T t+ \frac{1}{2}t^TB_k t , \end{aligned}$$

where \(g_k^n=g_k+B_k n_k\). The tangent step \(t_k\) should then be a solution to the following problem

$$\begin{aligned} \left\{ {\begin{array}{rl} \mathop {\min }&{} (g_k^n)^T t+\frac{1}{2}t^TB_k t\\ \text {s.t.}&{} A_k t=0,\\ &{}||n_k+t||\le \Delta _k.\end{array}}\right. \end{aligned}$$

In practice, however, we obtain \(t_k\) by solving the problem

$$\begin{aligned} \left\{ {\begin{array}{rl} \mathop {\min }&{} (Z_k^T g_k^n)^T v+\frac{1}{2}v^T Z_k^T B_k Z_k v\\ \text {s.t.}&{}||v||\le \sqrt{\Delta _k^2-||n_k||^2},\end{array}}\right. \end{aligned}$$
(6)

where \(Z_k\) is an orthonormal basis matrix of the null space of \(A_k\). Hence the dogleg method [16], the CG-Steihaug method [17], and the generalized Lanczos trust region (GLTR) method [18] are all applicable. Letting \(v_k\) denote the computed solution to (6), we set \(t_k=Z_kv_k\). Since the tangent step \(t_k\) lies in the null space of \(A_k\), we have from (2) that

$$\begin{aligned} c_k+A_ks_k=c_k+A_k n_k, \end{aligned}$$
(7)

which means that the linearized constraint violation remains unchanged after the tangent step \(t_k\) is taken.
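The reduced problem (6) can be assembled, for instance, from an SVD-based null-space basis and the CG-Steihaug iteration. The sketch below is one such realization; all function names are our own illustrative choices.

```python
import numpy as np

def nullspace_basis(A, tol=1e-12):
    """Orthonormal basis Z of null(A) from the SVD, so that A @ Z ~ 0."""
    U, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol * s[0])) if s.size else 0
    return Vt[rank:].T

def steihaug_cg(g, H, radius, tol=1e-8):
    """CG-Steihaug for  min g'v + 0.5 v'Hv  s.t. ||v|| <= radius, cf. (6)."""
    v = np.zeros_like(g)
    r, p = g.copy(), -g.copy()
    if np.linalg.norm(r) < tol:
        return v
    for _ in range(2 * g.size):
        Hp = H @ p
        curv = p @ Hp
        if curv <= 0.0:                        # negative curvature: hit the boundary
            return _to_boundary(v, p, radius)
        alpha = (r @ r) / curv
        if np.linalg.norm(v + alpha * p) >= radius:
            return _to_boundary(v, p, radius)  # step would leave the region
        v = v + alpha * p
        r_new = r + alpha * Hp
        if np.linalg.norm(r_new) < tol:
            return v
        p = -r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return v

def _to_boundary(v, p, radius):
    """Positive root t of ||v + t p|| = radius."""
    a, b, c = p @ p, 2.0 * (v @ p), v @ v - radius**2
    return v + ((-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)) * p
```

With these pieces the tangent step is `t = Z @ steihaug_cg(Z.T @ (g + B @ n), Z.T @ B @ Z, np.sqrt(Delta**2 - n @ n))`, matching (6).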

The Lagrange multiplier vector \(\lambda _{k+1}\) is obtained by solving the following least squares problem

$$\begin{aligned} \min _{\lambda } \frac{1}{2}||g_k+A_k^T \lambda ||^2 \end{aligned}$$
(8)
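Since (8) is an unconstrained linear least squares problem in \(\lambda\), a short NumPy sketch suffices (the function name is ours):

```python
import numpy as np

def multipliers(g, A):
    """Least-squares multiplier estimate from (8): min_lambda 0.5*||g + A'lambda||^2."""
    lam, *_ = np.linalg.lstsq(A.T, -g, rcond=None)
    return lam
```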

2.2 Step acceptance

The mechanism for step acceptance was introduced by Zhu and Pu [13]. It uses a novel infeasibility control technique to promote global convergence to first order critical points. We now describe this technique in detail.

The key is the concept of a “control set”, a set of \(l\) positive numbers denoted by

$$\begin{aligned} H_k:=\{H_{k,1},H_{k,2},\ldots ,H_{k,l}\}, \end{aligned}$$

where the \(l\) elements are sorted in non-increasing order, i.e., \(H_{k,1}\ge H_{k,2} \ge \cdots \ge H_{k,l}\). Initially, we define \(H_0=\{u,\ldots ,u\}\), where \(u\) is a sufficiently large constant such that

$$\begin{aligned} u\ge \max \{h(x_0),1\}. \end{aligned}$$
(9)

For an arbitrary iteration \(k\), when the complete step \(s_k\) is computed, we consider the following three cases.

$$\begin{aligned}&\bullet ~~h(x_k)=0,~~h(x_k+s_k)\le H_{k,1},\end{aligned}$$
(10)
$$\begin{aligned}&\bullet ~~h(x_k)>0,~~h(x_k+s_k)\le \beta h(x_k),\end{aligned}$$
(11)
$$\begin{aligned}&\bullet ~~h(x_k)>0,~~f(x_k+s_k)\le f(x_k)-\gamma h(x_k+s_k),~~ h(x_k+s_k)\le \beta H_{k,2}, \end{aligned}$$
(12)

where \(\beta \) and \(\gamma \) are two constants such that \(0<\gamma <\beta <1\). If one of (11) and (12) is satisfied, then

$$\begin{aligned} f(x_k+s_k)\le f(x_k)-\gamma h(x_k+s_k)~~\text {or}~~ h(x_k+s_k)\le \beta h(x_k). \end{aligned}$$
(13)

After \(x_k+s_k\) is accepted as the next iterate \(x_{k+1}\), we may update the control set \(H_k\) by substituting a new element \(h_k^+\) for the largest element \(H_{k,1}\), where

$$\begin{aligned} h_k^+:= (1-\theta )h(x_k)+\theta h(x_{k+1}) \end{aligned}$$
(14)

with \(\theta \in (0,1)\). All elements of the new control set \(H_{k+1}\) are then rearranged in non-increasing order as well. The purpose of the control set is evidently to compel the infeasibility of the iterates to approach zero progressively. Although only the first two elements \(H_{k,1}\) and \(H_{k,2}\) of \(H_k\) are involved in conditions (10)–(13), the length \(l\) of \(H_k\) affects the strength of the infeasibility control. For example, consider the cases \(l=2\) and \(l=3\) with the same initial value and the same entering values:

$$\begin{aligned}&H_0=\{100,100\},~~~~~~~H_1=H_0\oplus 10=\{100,10\},~~~~~~~H_2=H_1\oplus 1=\{10,1\},\\&H_0\!=\!\{100,100,100\},~H_1\!=\!H_0\oplus 10=\{100,100,10\},~H_2\!=\!H_1\oplus 1\!=\!\{100,10,1\}. \end{aligned}$$

Here the notation “\(\oplus \)” means the control set is updated with some new entry.

It is observed that the first two elements change faster for smaller \(l\); a larger \(l\) thus enforces the infeasibility control more mildly.
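In code, the \(\oplus\) update is a replace-the-largest-and-resort operation on a list kept in non-increasing order; the following sketch reproduces the \(l=3\) example above.

```python
def update_control_set(H, h_plus):
    """Replace the largest element H[0] by h_plus and re-sort non-increasingly."""
    return sorted(H[1:] + [h_plus], reverse=True)

H = [100, 100, 100]
H = update_control_set(H, 10)   # -> [100, 100, 10]
H = update_control_set(H, 1)    # -> [100, 10, 1]
```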

All iterations are classified into the following three types.

\(\bullet \) \(f\)-type. At least one of (10)–(12) holds and

$$\begin{aligned} \chi _k>\sigma _1 ||c_k||^{\sigma _2},~~\delta _k^f\ge \zeta \delta _k^{f,t}, \end{aligned}$$
(15)

where \(\sigma _1,\sigma _2,\zeta \in (0,1)\) and \(\chi _k,\delta _k^f,\delta _k^{f,t}\) are defined by

$$\begin{aligned} \chi _k&:= ||Z_k^T g_k^n||,\end{aligned}$$
(16)
$$\begin{aligned} \delta _k^f&:= f(x_k)-m_k(x_k+s_k),\end{aligned}$$
(17)
$$\begin{aligned} \delta _k^{f,t}&:= m_k(x_k+n_k)-m_k(x_k+s_k). \end{aligned}$$
(18)

\(\bullet \) \(h\)-type. At least one of (10)–(12) holds but (15) fails.

\(\bullet \) \(c\)-type. None of (10)–(12) holds.
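The classification is a direct evaluation of conditions (10)–(12) and (15); a sketch follows, whose default parameter values are those later listed in Sect. 4 and are for illustration only.

```python
def iteration_type(hk, hs, fk, fs, chi, ck_norm, delta_f, delta_ft, H,
                   beta=0.9999, gamma=1e-4, sigma1=1e-8, sigma2=0.5, zeta=1e-4):
    """Classify iteration k as 'f', 'h', or 'c'. Here hk = h(x_k), hs = h(x_k+s_k),
    fk = f(x_k), fs = f(x_k+s_k), and H is the control set sorted non-increasingly."""
    acceptable = ((hk == 0.0 and hs <= H[0]) or                  # condition (10)
                  (hk > 0.0 and hs <= beta * hk) or              # condition (11)
                  (hk > 0.0 and fs <= fk - gamma * hs
                            and hs <= beta * H[1]))              # condition (12)
    if not acceptable:
        return 'c'
    if chi > sigma1 * ck_norm**sigma2 and delta_f >= zeta * delta_ft:  # (15)
        return 'f'
    return 'h'
```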

Given some constants \(\eta _1,\eta _2,\tau _1,\tau _2,\bar{\Delta },\hat{\Delta }\) such that \(0<\eta _1<\eta _2<1\), \(0<\tau _1<1\le \tau _2\), \(0<\bar{\Delta }<\hat{\Delta }\), we accept or reject the trial step according to the following strategy.

When \(k\) is an \(f\)-type iteration, we accept \(x_k+s_k\) if

$$\begin{aligned} \rho _k^f:=\frac{f(x_k)-f(x_k+s_k)}{\delta _k^f}\ge \eta _1. \end{aligned}$$
(19)

The corresponding update rule for the trust region radius \(\Delta _k\) is

$$\begin{aligned} \Delta _{k+1}=\left\{ {\begin{array}{ll} \min \{\max \{\tau _2\Delta _k,\bar{\Delta }\},\hat{\Delta }\} &{}\text {if}~~\rho _k^f \ge \eta _2,\\ \max \{\Delta _k,\bar{\Delta }\},&{}\text {if}~~\eta _1\le \rho _k^f < \eta _2,\\ \tau _1\Delta _k,&{}\text {if}~~\rho _k^f<\eta _1. \end{array}}\right. \end{aligned}$$
(20)

When \(k\) is an \(h\)-type iteration, we always accept \(x_k+s_k\) and update the trust region radius \(\Delta _k\) according to the following rule

$$\begin{aligned} \Delta _{k+1}= \max \{\Delta _k,\bar{\Delta }\}. \end{aligned}$$
(21)

When \(k\) is a \(c\)-type iteration, we accept \(x_k+s_k\) if

$$\begin{aligned} \delta _k^c>0, ~~\rho _k^c:=\frac{h(x_k)-h(x_k+s_k)}{\delta _k^c}\ge \eta _1 \end{aligned}$$
(22)

where

$$\begin{aligned} \delta _k^c:=\frac{1}{2}||c_k||^2-\frac{1}{2}||c_k+A_ks_k||^2. \end{aligned}$$
(23)

The trust region radius \(\Delta _k\) is then updated by

$$\begin{aligned} \Delta _{k+1}=\left\{ {\begin{array}{ll} \min \{\max \{\tau _2\Delta _k,\bar{\Delta }\},\hat{\Delta }\} &{}\text {if}~~\rho _k^c \ge \eta _2,\\ \max \{\Delta _k,\bar{\Delta }\},&{}\text {if}~~\eta _1\le \rho _k^c < \eta _2,\\ \tau _1\Delta _k,&{}\text {if}~~\rho _k^c<\eta _1. \end{array}}\right. \end{aligned}$$
(24)

Before we present a formal description of our trust region infeasibility control algorithm, the reader should notice that formulae (20), (21), and (24) imply that

$$\begin{aligned} \Delta _{k+1}\ge \bar{\Delta }\end{aligned}$$
(25)

if \(k\) is a successful iteration, which is important for the global convergence analysis in the next section.
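The radius updates (20) and (24) share one rule; a sketch with illustrative defaults from Sect. 4, where `Delta_lo` and `Delta_hi` stand for \(\bar{\Delta }\) and \(\hat{\Delta }\):

```python
def update_radius(Delta, rho, eta1=1e-4, eta2=0.7, tau1=0.5, tau2=1.2,
                  Delta_lo=1e-4, Delta_hi=1e3):
    """Rules (20)/(24); on success the result is >= Delta_lo, which gives (25)."""
    if rho >= eta2:
        return min(max(tau2 * Delta, Delta_lo), Delta_hi)
    if rho >= eta1:
        return max(Delta, Delta_lo)
    return tau1 * Delta
```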

2.3 The algorithm

Now a formal statement of the algorithm is presented as follows.

Algorithm 1

A trust-region algorithm with infeasibility control (TRIC)

  • Initialization: Choose \(x_0,B_0\) and parameters \(\beta ,\gamma ,\theta ,\zeta ,\eta _1,\eta _2,\sigma _1,\sigma _2,\tau ,\tau _1\in (0,1),u,\tau _2\in [1,+\infty )\), \(l\in \{2,3,\cdots \}\) and \(\Delta _0,\bar{\Delta },\hat{\Delta }\in (0,+\infty )\) such that \(\bar{\Delta }<\Delta _0<\hat{\Delta }\). Set \(k=0\).

  • Step 1: Stop if \(x_k\) is a KT point.

  • Step 2: Solve (4) for \(n_k\) if \(c_k\ne 0\) and set \(n_k=0\) if \(c_k=0\).

  • Step 3: Compute \(Z_k\), solve (6) for \(v_k\), set \(t_k=Z_kv_k\), and obtain \(s_k=n_k+t_k\).

  • Step 4: When \(k\) is \(f\)-type, accept \(x_k+s_k\) if (19) holds and update \(\Delta _k\) according to (20).

    • When \(k\) is \(h\)-type, accept \(x_k+s_k\), update \(\Delta _k\) according to (21), and update \(H_k\).

    • When \(k\) is \(c\)-type, accept \(x_k+s_k\) if (22) holds, update \(\Delta _k\) according to (24), and update \(H_k\).

  • Step 5: If \(x_k+s_k\) has been accepted, set \(x_{k+1}=x_k+s_k\), and set \(x_{k+1}=x_k\) otherwise.

  • Step 6: If \(x_k+s_k\) has been accepted, solve (8) for \(\lambda _{k+1}\).

  • Step 7: If \(x_k+s_k\) has been accepted, choose a symmetric matrix \(B_{k+1}\).

  • Step 8: Increment \(k\) by one and go to Step 1.

Remark 1

From Step 4 we observe that the control set \(H_k\) is updated if \(k\) is a successful \(h\)-type or \(c\)-type iteration, and left unchanged otherwise. Also, from Step 4 we see that \(k\) is always a successful iteration if it is \(h\)-type.
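For orientation, the pieces sketched in Sects. 2.1 and 2.2 assemble into the following abbreviated driver for Algorithm 1. This is a hedged sketch, not our MATLAB implementation: the quasi-Newton update of \(B_k\) (Step 7) is omitted, the stopping test is simplified, and all helper names (normal_step, nullspace_basis, steihaug_cg, iteration_type, update_radius, update_control_set) refer to the illustrative routines above.

```python
import numpy as np

def tric(f, c, grad, jac, x0, l=3, Delta0=1.0, Delta_lo=1e-4, Delta_hi=10.0,
         tau=0.8, eta1=1e-4, theta=1e-4, tol=1e-6, max_iter=500):
    """Abbreviated TRIC driver (Algorithm 1); B is kept fixed for brevity."""
    x, Delta, B = np.asarray(x0, float), Delta0, np.eye(len(x0))
    h = lambda z: 0.5 * float(c(z) @ c(z))              # violation measure (3)
    H = [10.0 * max(h(x), 1.0)] * l                     # control set, cf. (9)
    for _ in range(max_iter):
        g, A, ck = grad(x), jac(x), c(x)
        lam, *_ = np.linalg.lstsq(A.T, -g, rcond=None)  # Step 6 / (8)
        if max(np.abs(ck).max(), np.abs(g + A.T @ lam).max()) <= tol:
            return x                                    # Step 1: KT point
        n = normal_step(ck, A, Delta, tau) if np.any(ck) else np.zeros_like(x)
        Z = nullspace_basis(A)                          # Steps 2-3
        v = steihaug_cg(Z.T @ (g + B @ n), Z.T @ B @ Z,
                        np.sqrt(max(Delta**2 - n @ n, 0.0)))
        s = n + Z @ v
        fk, hk, fs, hs = f(x), h(x), f(x + s), h(x + s)
        m_s = fk + g @ s + 0.5 * (s @ B @ s)            # model value at x_k + s_k
        m_n = fk + g @ n + 0.5 * (n @ B @ n)            # model value at x_k + n_k
        chi = np.linalg.norm(Z.T @ (g + B @ n))
        kind = iteration_type(hk, hs, fk, fs, chi, np.linalg.norm(ck),
                              fk - m_s, m_n - m_s, H)   # Step 4
        if kind == 'f':
            rho = (fk - fs) / max(fk - m_s, 1e-16)      # safeguarded ratio (19)
            accepted = rho >= eta1
            Delta = update_radius(Delta, rho, Delta_lo=Delta_lo, Delta_hi=Delta_hi)
        elif kind == 'h':
            accepted = True                             # always accepted
            Delta = max(Delta, Delta_lo)                # rule (21)
        else:                                           # 'c'-type, test (22)
            delta_c = 0.5 * (ck @ ck) - 0.5 * np.sum((ck + A @ s) ** 2)
            rho = (hk - hs) / delta_c if delta_c > 0 else -np.inf
            accepted = np.isfinite(rho) and rho >= eta1
            Delta = update_radius(Delta, rho, Delta_lo=Delta_lo, Delta_hi=Delta_hi)
        if accepted:
            if kind in ('h', 'c'):
                H = update_control_set(H, (1 - theta) * hk + theta * hs)  # (14)
            x = x + s                                   # Step 5
    return x
```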

3 Global convergence

First, we make the following assumptions, which are essential for our convergence analysis.

Assumptions

A1. :

The objective \(f\) and the constraints \(c\) are twice continuously differentiable.

A2. :

The set \(\{x_k\}\cup \{x_k+s_k\}\) is contained in a compact and convex set \(\Omega \).

A3. :

There exists a positive constant \(\kappa _B\) such that \(||B_k||\le \kappa _B\) for all \(k\).

A4. :

Inequality (5) is satisfied for all \(k\).

A5. :

There exist two constants \(\kappa _{h},\kappa _{\sigma }>0\) such that

$$\begin{aligned} h(x)\le \kappa _h~~\Longrightarrow ~~\sigma _{\min }(A(x))\ge \kappa _{\sigma }, \end{aligned}$$
(26)

where \(\sigma _{\min }(A)\) represents the smallest singular value of \(A\).

Remark

These assumptions are weaker than those in [13]. In [13], the authors use a double trust-region strategy and impose

$$\begin{aligned} ||s_k||\le \kappa _s \min \{\Delta _k^c,\Delta _k^f\}, \end{aligned}$$

where \(\kappa _s\) is a positive constant and \(\Delta _k^c\) and \(\Delta _k^f\) are the trust region radii for the normal step and the tangent step, respectively. This assumption is strong in theory because \(\Delta _k^c\) and \(\Delta _k^f\) are updated independently there.

In the rest of this section, we denote the index set of successful iterations by \({\mathcal {S}}\) and the index sets of \(f\)-type, \(h\)-type, and \(c\)-type iterations by \({\mathcal {F}}\), \({\mathcal {H}}\), and \({\mathcal {C}}\), respectively.

Lemma 1

Suppose that \(k\in {\mathcal {S}}\) and that \(x_k\) is a feasible point which is not a KT point. Then \(k\) must be an \(f\)-type iteration and therefore all elements of the control set are positive.

Proof

The feasibility of \(x_{k}\) implies that \(n_k=0\), \(\delta _k^f=\delta _k^{f,t}\), and \(\delta _k^c=0\), so by (22) \(k\) cannot be a successful \(c\)-type iteration. The hypothesis that \(x_k\) is not a KT point implies by (16) that \(\chi _k=||Z_k^T g_k^n||>0\), and therefore (15) holds. Thus, \(k\) must be a successful \(f\)-type iteration. By the mechanism of the algorithm, the control set \(H_k\) is updated only at successful \(h\)-type and \(c\)-type iterations. Recalling that the update rule substitutes \(h_k^+\) defined by (14) for \(H_{k,1}\), we deduce by induction that \(H_{k,i}>0,~i=1,\ldots ,l\), for all \(k\). \(\square \)

Lemma 2

For all \(k\), we have

$$\begin{aligned}&h(x_j)\le H_{k,1},~~\forall ~j\ge k, \end{aligned}$$
(27)

and \(H_{k,1}\) is monotonically non-increasing in \(k\).

Proof

Without loss of generality, we can assume that all \(k\) are successful iterations. We first prove the inequality

$$\begin{aligned} h(x_k)\le H_{k,1} \end{aligned}$$
(28)

by induction. Obviously, (9) implies that (28) holds for \(k=0\). For \(k\ge 1\), we assume that (28) holds for \(k-1\) and consider the following three cases.

The first case is that \(k-1\in {\mathcal {F}}\). Then at least one of (10)–(12) holds and therefore, according to the hypothesis \(h(x_{k-1})\le H_{k-1,1}\), we have from (10)–(12) that

$$\begin{aligned} h(x_k)\le \max \{H_{k-1,1},\beta h(x_{k-1}),\beta H_{k-1,2}\} =H_{k-1,1}. \end{aligned}$$

Since the control set is not updated at the \(f\)-type iteration \(k-1\), we have \(H_{k,1}=H_{k-1,1}\). Thus (28) follows.

The second case is that \(k-1\in {\mathcal {H}}\). Lemma 1 implies that \(x_{k-1}\) is infeasible. Then either (11) or (12) is satisfied and \(H_{k-1}\) is updated by substituting \(h_{k-1}^+\) for \(H_{k-1,1}\). It follows from (11) and (12) that

$$\begin{aligned} h(x_k)\le \beta \max \{h(x_{k-1}),H_{k-1,2}\}. \end{aligned}$$

Therefore, by the update rule of the control set together with (14), we obtain

$$\begin{aligned} H_{k,1}=\max \{h_{k-1}^+,H_{k-1,2}\}=\max \{(1-\theta )h(x_{k-1})+\theta h(x_k),H_{k-1,2}\}> h(x_k). \end{aligned}$$

Thus (28) follows.

The third case is that \(k-1\in {\mathcal {C}}\). Then (22) holds and \(H_{k-1}\) is updated by substituting \(h_{k-1}^+\) for \(H_{k-1,1}\). According to (14) and (22), we have

$$\begin{aligned} h(x_{k})<(1-\theta )h(x_{k-1})+\theta h(x_k)=h_{k-1}^+. \end{aligned}$$

Therefore, by the update rule of the control set, we have \(h_{k-1}^+\le H_{k,1}\). Hence we obtain (28) from the last two inequalities.

Now we can finish the proof of this lemma based on (28). Note that

$$\begin{aligned} \max \{h(x_{k+1}),h(x_k)\}\le H_{k,1} \end{aligned}$$

according to (10)–(12), (22), (28) and the mechanism of the algorithm. Then we have \(h_{k}^+\le H_{k,1}\) from (14). Thus the monotonicity of \(H_{k,1}\) follows from the update rule of the control set. Finally, (27) follows immediately from (28) and the monotonicity of \(H_{k,1}\).    \(\square \)

Lemma 3

For all \(k\), we have that

$$\begin{aligned} \delta _k^c\ge \kappa _c||A_k^Tc_k|| \min \left\{ \frac{||A_k^Tc_k||}{||A_k^TA_k||},\Delta _k\right\} , \end{aligned}$$
(29)

where \(\kappa _c=\frac{1}{2}\tau \), and

$$\begin{aligned} \delta _k^{f,t}\ge \kappa _f\chi _k \min \left\{ \frac{\chi _k}{||B_k||},\Delta _k\right\} , \end{aligned}$$
(30)

where \(\kappa _f=\frac{1}{2}\sqrt{1-\tau ^2}\).

Proof

It follows from Lemma 4.3 in [19] that a solution \(d^*\) to the problem

$$\begin{aligned} \left\{ {\begin{array}{rl} \mathop {\min }&{}g^Td+\frac{1}{2}d^TBd\\ \text {s.t.}&{}||d||\le \Delta \end{array}}\right. \end{aligned}$$

must satisfy the Cauchy condition

$$\begin{aligned} -g^T d^*-\frac{1}{2}(d^*)^T B d^*\ge \frac{1}{2}||g||\min \left\{ \frac{||g||}{||B||},\Delta \right\} . \end{aligned}$$
(31)

Then, according to (4), (7), (23), and (31), we have

$$\begin{aligned} \delta _k^c&=\frac{1}{2}||c_k||^2-\frac{1}{2}||c_k+A_kn_k||^2\\&\ge \frac{1}{2}||A_k^Tc_k||\min \left\{ \frac{||A_k^Tc_k||}{||A_k^TA_k||},\tau \Delta _k\right\} \\&\ge \frac{1}{2}\tau ||A_k^Tc_k||\min \left\{ \frac{||A_k^Tc_k||}{||A_k^TA_k||},\Delta _k\right\} . \end{aligned}$$

Similarly, according to (6), (16), (18), and (31), we have

$$\begin{aligned} \delta _k^{f,t}&\ge \frac{1}{2}\chi _k\min \left\{ \frac{\chi _k}{||Z_k^T B_kZ_k||},\sqrt{\Delta _k^2-||n_k||^2}\right\} \\&\ge \frac{1}{2}\chi _k\min \left\{ \frac{\chi _k}{||B_k||},\sqrt{1-\tau ^2}\Delta _k\right\} \\&\ge \frac{1}{2}\sqrt{1-\tau ^2}\chi _k\min \left\{ \frac{\chi _k}{|| B_k||},\Delta _k\right\} . \end{aligned}$$

The proof is complete.    \(\square \)

Lemma 4

For all \(k\), we have that

$$\begin{aligned} |f(x_k+s_k)-m_k(x_k+s_k)|\le \kappa _D ||s_k||^2, \end{aligned}$$
(32)

and

$$\begin{aligned} |~||c(x_k+s_k)||^2-||c_k+A_ks_k||^2~|\le \kappa _D||s_k||^2, \end{aligned}$$
(33)

where \(\kappa _D>0\) is a constant.

Proof

Inequalities (32) and (33) are just two consequences of the assumptions at the beginning of this section and Taylor’s theorem. \(\square \)

Lemma 5

If \(k\in {\mathcal {F}}\) and

$$\begin{aligned} \Delta _k\le \kappa _{\Delta }^f\chi _k, \end{aligned}$$
(34)

where \(\kappa _{\Delta }^f=\min \{\frac{1}{\kappa _B},\frac{(1-\eta _1)\zeta \kappa _f}{2\kappa _D}\}\), then \(\rho _k^f>\eta _1\). Similarly, if \(k\in {\mathcal {C}}\), \(c_k\ne 0\), and

$$\begin{aligned} \Delta _k\le \kappa _{\Delta }^c||A_k^Tc_k||, \end{aligned}$$
(35)

where \(\kappa _{\Delta }^c=\min \{\frac{1}{\kappa _A},\frac{(1-\eta _1)\kappa _c}{2\kappa _D}\}\) with \(\kappa _A\) being a positive constant, then \(\rho _k^c>\eta _1\).

Proof

It follows from (15), (30), and A4 that

$$\begin{aligned} \delta _k^f \ge \zeta \kappa _f \chi _k \min \left\{ \frac{\chi _k}{||B_k||},\Delta _k \right\} \ge \zeta \kappa _f \chi _k \min \left\{ \frac{\chi _k}{\kappa _B},\Delta _k \right\} . \end{aligned}$$

This, together with (32) and the fact that

$$\begin{aligned} ||s_k||\le ||n_k||+||t_k||\le \left( \tau +\sqrt{1-\tau ^2}\right) \Delta _k\le \sqrt{2}\Delta _k, \end{aligned}$$
(36)

implies if (34) holds then

$$\begin{aligned} |1-\rho _k^f|&=\left| \frac{f(x_k+s_k)-m_k(x_k+s_k)}{\delta _k^f}\right| \le \frac{\kappa _D||s_k||^2}{ \zeta \kappa _f \chi _k \min \left\{ \frac{\chi _k}{\kappa _B},\Delta _k \right\} }\\ {}&\le \frac{2\kappa _D\Delta _k^2}{\zeta \kappa _f \chi _k \Delta _k}\le 1-\eta _1. \end{aligned}$$

Hence, the first assertion follows. Similarly, using (29) and assumptions A1 and A2, we have

$$\begin{aligned} \delta _k^c\ge \kappa _c||A_k^Tc_k|| \min \left\{ \frac{||A_k^Tc_k||}{\kappa _A},\Delta _k\right\} , \end{aligned}$$

where \(\kappa _A=\max \limits _k\{||A_k^TA_k||\}\). This, together with (33) and (36), implies that if (35) holds then

$$\begin{aligned} |1-\rho _k^c|&= \left| \frac{||c(x_k+s_k)||^2-||c_k+A_ks_k||^2}{2\delta _k^c}\right| \le \frac{\kappa _D||s_k||^2}{ \kappa _c ||A_k^Tc_k|| \min \left\{ \frac{||A_k^Tc_k||}{\kappa _A},\Delta _k \right\} }\\&\le \frac{2\kappa _D\Delta _k^2}{\kappa _c ||A_k^Tc_k|| \Delta _k}\le 1-\eta _1. \end{aligned}$$

Then the second assertion follows as well. \(\square \)

We show below that our algorithm can eventually make a step forward at any iterate which is not an infeasible stationary point of \(h(x)\). We first recall the definition of an infeasible stationary point of \(h(x)\).

Definition 1

A point \(\hat{x}\) is an infeasible stationary point of \(h(x)\) if \(\hat{x}\) satisfies

$$\begin{aligned} A(\hat{x})^T c(\hat{x})=0~~\text {and}~~c(\hat{x})\ne 0. \end{aligned}$$

Lemma 6

Suppose that no iterate \(x_k\) is a KT point or an infeasible stationary point. Then \(|{\mathcal {S}}|=+\infty \).

Proof

According to the mechanism of the algorithm, \(x_k+s_k\) must be accepted if \(k\) is an \(h\)-type iteration, so we only consider the cases \(k\in {\mathcal {F}}\) and \(k\in {\mathcal {C}}\).

Suppose that \(x_k\) is infeasible. Since by hypothesis \(x_k\) is not an infeasible stationary point, we have \(||A_k^Tc_k||>0\). It then follows from (29) that \(\delta _k^c>0\). Therefore, when \(k\in {\mathcal {C}}\), Lemma 5 ensures that \(\rho _k^c\ge \eta _1\) whenever \(\Delta _k\le \kappa _{\Delta }^c ||A_k^Tc_k||\), so after finitely many reductions of the radius a successful \(c\)-type iteration occurs. When \(k\in {\mathcal {F}}\), we know by (15) that \(\chi _k>\sigma _1 ||c_k||^{\sigma _2}\) and by Lemma 5 that \(\rho _k^f\ge \eta _1\) whenever \(\Delta _k\le \kappa _{\Delta }^f \chi _k\). Note that (16) implies \(\chi _k\) depends on \(g_k^n=B_kn_k+g_k\) and therefore on \(n_k\), which may change as \(\Delta _k\) decreases. Since \(||A_k^Tc_k||>0\), it follows from Theorem 4.1 in [19] that \(||n_k||=\tau \Delta _k\) for all sufficiently small \(\Delta _k\). Using the arguments above and (5), we have

$$\begin{aligned} \chi _k\ge O(||n_k||^{\sigma _2})=O(\Delta _k^{\sigma _2}). \end{aligned}$$

Thus, (34) must be satisfied for all sufficiently small \(\Delta _k\). Therefore, a successful \(f\)-type iteration eventually occurs at \(x_k\).

Now we suppose that \(x_k\) is feasible. Since \(x_k\) is not a KT point, we have

$$\begin{aligned} \chi _k=||Z_k^T g_k^n||=||Z_k^T g_k||>0. \end{aligned}$$

So, (15) is satisfied. It follows from \(c_k+A_ks_k=0\), (36), and Taylor’s theorem that

$$\begin{aligned} h(x_k+s_k)=\frac{1}{2}\sum _{i=1}^m c_i^2(x_k+s_k)=\frac{1}{8}\sum _{i=1}^m (s_k^T \nabla ^2 c_i(\xi _i)s_k)^2 \le \frac{1}{8}m \kappa _C^2||s_k||^4\le \frac{1}{2}m\kappa _C^2\Delta _k^4, \end{aligned}$$
(37)

where

$$\begin{aligned} \kappa _C=\max \limits _{x\in \Omega ,1\le i\le m}\{||\nabla ^2 c_i(x)||\}. \end{aligned}$$

So, (10) holds whenever \(\Delta _k\le \left( \frac{2H_{k,1}}{m\kappa _C^2}\right) ^{1/4}\). Applying Lemma 5 once again, we have \(\rho _k^f\ge \eta _1\) when \(\Delta _k\) is sufficiently small. Hence, a successful \(f\)-type iteration eventually occurs at \(x_k\). \(\square \)

Lemma 7

If \(x_k\) is infeasible but not an infeasible stationary point, then

$$\begin{aligned} \Delta _k\ge \min \left\{ \tau _1\kappa _{\Delta }^f\max (\chi _k,\sigma _1||c_k||^{\sigma _2}),\bar{\Delta }\right\} , \end{aligned}$$
(38)

or

$$\begin{aligned} \Delta _k\ge \min \left\{ \tau _1\kappa _{\Delta }^c||A_k^Tc_k||,\bar{\Delta }\right\} . \end{aligned}$$
(39)

Proof

The result follows immediately from (15), (25), Lemma 5, the proof of Lemma 6, and the mechanism of the algorithm. \(\square \)

Lemma 8

Suppose \(x^*\in \Omega \) is a feasible point but not a KT point. Then there exist a neighbourhood \({\mathcal {N}}(x^*)\) of \(x^*\) and positive constants \(\delta ,\mu ,\kappa \) such that for any \(x_k\in {\mathcal {N}}(x^*)\cap \Omega \), if \(\Delta _k\ge \mu ||c_k||\), then \(c_k+A_ks_k=0\) and (15) holds; moreover, if

$$\begin{aligned} \mu ||c_k||\le \Delta _k\le \min \{\kappa ,(\kappa _H H_{k,2})^{1/4}\}, \end{aligned}$$

where \(\kappa _H=\frac{2\beta }{m\kappa _C^2}\), then (12) and (19) hold and \(\delta _k^f\ge \delta \Delta _k\).

Proof

Assumptions A1, A2, and A5 imply that when \(x_k\) is sufficiently close to \(x^*\), \((A_kA_k^T)^{-1}\) exists and

$$\begin{aligned} ||A_k^T(A_kA_k^T)^{-1}c_k||\le \kappa _I ||c_k|| \end{aligned}$$
(40)

for some constant \(\kappa _I>0\). Therefore, if

$$\begin{aligned} \Delta _k\ge \frac{\kappa _I}{\tau }||c_k||, \end{aligned}$$
(41)

we have

$$\begin{aligned} n_k=-A_k^T(A_kA_k^T)^{-1}c_k \end{aligned}$$
(42)

and \(c_k+A_ks_k=0\).

Because \(x^*\) is a feasible point but not a KT point, there exists a constant \(\epsilon >0\) such that, for all \(x_k\) sufficiently close to \(x^*\),

$$\begin{aligned} \chi _k\ge \epsilon >\sigma _1 ||c_k||^{\sigma _2}, \end{aligned}$$
(43)

and therefore by (30) and assumption A3

$$\begin{aligned} \delta _k^{f,t}\ge \kappa _f \epsilon \min \left\{ \frac{\epsilon }{\kappa _B},\Delta _k\right\} . \end{aligned}$$
(44)

Define

$$\begin{aligned} \delta _k^{f,n}:=f(x_k)-m_k(x_k+n_k)=-g_k^T n_k-\frac{1}{2}n_k^TB_k n_k. \end{aligned}$$

It follows from (40)–(42) and assumptions A1–A3 that if \(x_k\) is sufficiently close to \(x^*\) and (41) is satisfied,

$$\begin{aligned} |\delta _k^{f,n}|\le \left( ||g_k||+\frac{1}{2}\kappa _B||n_k||\right) ||n_k||\le \left( ||g_k||+\frac{1}{2}\kappa _B\kappa _I||c_k||\right) \kappa _I||c_k||\le \kappa _G ||c_k||,\nonumber \\ \end{aligned}$$
(45)

where \(\kappa _G=\kappa _I \max \limits _{x\in \Omega }\Big \{||\nabla f(x)||+\frac{1}{2}\kappa _B\kappa _I ||c(x)||\Big \}\). Then, applying (41), (44), and (45), we have that if \(x_k\) is sufficiently close to \(x^*\) and

$$\begin{aligned} \max \left\{ \frac{\kappa _I}{\tau },\frac{\kappa _G}{(1-\zeta )\kappa _f\epsilon }\right\} ||c_k||\le \Delta _k\le \frac{\epsilon }{\kappa _B}, \end{aligned}$$

then

$$\begin{aligned} (1-\zeta )\delta _k^{f,t}\ge -\delta _k^{f,n}, \end{aligned}$$

and therefore

$$\begin{aligned} \delta _k^f=\delta _k^{f,t}+\delta _k^{f,n}\ge \zeta \delta _k^{f,t}, \end{aligned}$$

which together with (43) implies (15).

We deduce from Lemma 5 and (43) that if

$$\begin{aligned} \Delta _k\le \kappa _{\Delta }^f \epsilon , \end{aligned}$$
(46)

then (19) holds. If, in addition, \(\Delta _k\) satisfies

$$\begin{aligned} \Delta _k\le \min \left\{ \frac{\epsilon }{\kappa _B}, \left( \frac{2\eta _1\zeta \kappa _f\epsilon }{m\gamma \kappa _C^2}\right) ^{1/3},\left( \frac{2\beta H_{k,2}}{m\kappa _C^2} \right) ^{1/4}\right\} , \end{aligned}$$

then by (19), (37), and (44), we have

$$\begin{aligned} f(x_k)-f(x_k+s_k)\ge \eta _1\delta _k^f\ge \eta _1 \zeta \kappa _f\epsilon \Delta _k\ge \gamma h(x_k+s_k), \end{aligned}$$

and

$$\begin{aligned} h(x_k+s_k)\le \beta H_{k,2}. \end{aligned}$$

The last two inequalities mean (12) holds. Finally, defining \(\delta =\zeta \kappa _f\epsilon \), \(\mu =\max \left\{ \frac{\kappa _I}{\tau },\frac{\kappa _G}{(1-\zeta )\kappa _f\epsilon }\right\} \), \(\kappa =\min \left\{ \frac{\epsilon }{\kappa _B},\kappa _{\Delta }^f \epsilon ,\left( \frac{2\eta _1\zeta \kappa _f\epsilon }{m\gamma \kappa _C^2}\right) ^{1/3}\right\} \), and choosing a sufficiently small neighbourhood \({\mathcal {N}}(x^*)\), we complete the proof. \(\square \)

We now consider convergence in the case where there are only finitely many successful \(c\)-type and \(h\)-type iterations.

Lemma 9

Suppose that \(|{\mathcal {S}}|=+\infty \) and \(|({\mathcal {H}} \cup {\mathcal {C}})\cap {\mathcal {S}}|<+\infty \). Then there exists a subsequence \({\mathcal {K}}\subset {\mathcal {S}}\) such that

$$\begin{aligned} \lim _{k\rightarrow \infty ,k\in {\mathcal {K}}}h(x_k)=0, \end{aligned}$$
(47)

and any limit point of \(\{x_k\}_{k\in {\mathcal {K}}}\) is a KT point.

Proof

Suppose that \(x_k\) is infeasible for all sufficiently large \(k\), for otherwise (47) must hold for some subsequence \({\mathcal {K}}\). The hypothesis of this lemma implies \(k\in {\mathcal {F}}\) for all sufficiently large \(k\in {\mathcal {S}}\). Then \(\{f(x_k)\}\) is monotonically non-increasing from (15) and (19). It follows from (13) and Lemma 1 of [8] that \(\lim _{k\rightarrow \infty }h(x_k)=0\). Thus, (47) follows immediately.

Let \(x^*\) be an arbitrary limit point of \(\{x_k\}_{k\in {\mathcal {K}}}\). From (47), we deduce that \(x^*\) is feasible. Without loss of generality, suppose that \(\lim _{k\rightarrow \infty ,k\in {\mathcal {K}}}x_k=x^*\). To derive a contradiction, we assume \(x^*\) is not a KT point. Then, for sufficiently large \(k\in {\mathcal {K}}\), we have \(x_k\in {\mathcal {N}}(x^*)\), where \({\mathcal {N}}(x^*)\) is a neighbourhood of \(x^*\) characterized in Lemma 8. Applying Lemma 8, if

$$\begin{aligned} \mu ||c_k||\le \Delta _k\le \min \{\kappa ,(\kappa _H H_{k,2})^{1/4}\}, \end{aligned}$$
(48)

\(x_k+s_k\) must satisfy all the conditions for a successful \(f\)-type iteration. Note that the control set is not updated at a successful \(f\)-type iteration. Therefore, we can find an index \(k_0\) such that \(H_{k_0}=H_{k}\) for all \(k\ge k_0\). Hence, for all sufficiently large \(k\in {\mathcal {K}}\), the interval in (48) becomes

$$\begin{aligned} \mu ||c_k||\le \Delta _k\le \min \{\kappa ,(\kappa _H H_{k_0,2})^{1/4}\}, \end{aligned}$$

where the lower bound approaches zero and the upper bound is a positive constant. It then follows from the mechanism of the algorithm that, for all sufficiently large \(k\in {\mathcal {K}}\),

$$\begin{aligned} \Delta _k\ge \Delta _{\min }:=\min \left\{ \tau _1\kappa ,\tau _1(\kappa _H H_{k_0,2})^{1/4},\bar{\Delta }\right\} . \end{aligned}$$

Therefore, by Lemma 8 and the non-decreasing monotonicity of \(\delta _k^f\) in \(\Delta _k\) on the interval \([\mu ||c_k||,+\infty )\), we have

$$\begin{aligned} f(x_k)-f(x_k+s_k)\ge \eta _1\delta _k^f\ge \eta _1\delta \Delta _{\min }, \end{aligned}$$

which, together with the non-increasing monotonicity of \(\{f(x_k)\}\), implies \(f(x_k)\rightarrow -\infty \) as \(k\rightarrow \infty \). This contradicts assumptions A1 and A2, so the proof is complete. \(\square \)

Next we consider convergence in the case where there are infinitely many successful \(c\)-type or \(h\)-type iterations.

Lemma 10

Suppose \(|{\mathcal {H}} |=+\infty \). Then \(\lim _{k\rightarrow \infty }h(x_k)=0\).

Proof

Denote \({\mathcal {H}}=\{k_i\}\). Since at least one of (10)–(12) holds at \(h\)-type iterations and \(x_{k_i}\) is infeasible by Lemma 1, we deduce from (11), (12), (14), and (27) that

$$\begin{aligned} h_{k_i}^+&=(1-\theta )h(x_{k_i})+\theta h(x_{k_i+1}) \\&\le (1-\theta )H_{k_i,1}+\theta \beta \max (H_{k_i,2},h(x_{k_i}))\\&\le (1-\theta +\theta \beta )H_{k_i,1}. \end{aligned}$$

It then follows from Lemma 2 and the update rule of the control set that

$$\begin{aligned} H_{k_{i+l},1}\le (1-\theta +\beta \theta ) H_{k_i,1}. \end{aligned}$$
(49)

Applying Lemma 2 once again together with (49), we have

$$\begin{aligned} \lim _{k\rightarrow \infty }H_{k,1}=0. \end{aligned}$$
(50)

Thus, the result follows from (27) and (50). \(\square \)

In what follows, to obtain global convergence, we will rule out the following undesirable scenario for successful \(c\)-type iterations:

$$\begin{aligned} \lim _{k\rightarrow \infty ,k\in {\mathcal {C}} \cap {\mathcal {S}} }||A_k^Tc_k||=0~~ {\text {with}} ~~\liminf \limits _{k\rightarrow \infty ,k\in {\mathcal {C}} \cap {\mathcal {S}}}||c_k||>0. \end{aligned}$$
(51)

Lemma 11

Suppose \(|{\mathcal {C}}\cap {\mathcal {S}}|=+\infty \) and (51) is avoided. Then \(\lim _{k\rightarrow \infty }h(x_k)=0\).

Proof

We first prove

$$\begin{aligned} \lim \limits _{k\rightarrow \infty ,k\in {\mathcal {C}}\cap {\mathcal {S}}} ~||A_k^T c_k||=0. \end{aligned}$$
(52)

Denote \({\mathcal {C}}\cap {\mathcal {S}}=\{k_i\}\). From (14), (22), (27), (29), we have

$$\begin{aligned} H_{k_i,1}-h_{k_i}^+&\ge h(x_{k_i})-h_{k_i}^+=\theta (h(x_{k_i})-h(x_{k_i+1}))\ge \theta \eta _1\delta _{k_i}^c\nonumber \\&\ge \theta \eta _1\kappa _c||A_{k_i}^Tc_{k_i}|| \min \left\{ \frac{||A_{k_i}^Tc_{k_i}||}{||A_{k_i}^TA_{k_i}||},\Delta _{k_i}\right\} \nonumber \\&\ge \theta \eta _1\kappa _c||A_{k_i}^Tc_{k_i}|| \min \left\{ \frac{||A_{k_i}^Tc_{k_i}||}{\kappa _A},\Delta _{k_i}\right\} , \end{aligned}$$
(53)

where \(\kappa _A\) is still defined by \(\kappa _A=\max \limits _k\{||A_k^TA_k||\}\) as in the proof of Lemma 5.

Since \(k_i\in {\mathcal {C}}\cap {\mathcal {S}}\), \(x_{k_i}\) is an infeasible point by Lemma 1. Lemma 7 implies

$$\begin{aligned} \Delta _{k_i}\ge \min \left\{ \tau _1\kappa _{\Delta }^f\sigma _1||c_{k_i}||^{\sigma _2}, \tau _1\kappa _{\Delta }^c||A_{k_i}^T c_{k_i}||,\bar{\Delta }\right\} . \end{aligned}$$
(54)

From (53), (54), and \(\kappa _{\Delta }^c\le \frac{1}{\kappa _A}\), we then have

$$\begin{aligned} H_{k_i,1}-h_{k_i}^+\ge \theta \eta _1\kappa _c||A_{k_i}^Tc_{k_i}|| \min \left\{ \tau _1\kappa _{\Delta }^f\sigma _1||c_{k_i}||^{\sigma _2}, \tau _1\kappa _{\Delta }^c||A_{k_i}^T c_{k_i}||,\bar{\Delta }\right\} . \end{aligned}$$
(55)

It therefore follows from (55), Lemma 2, and the update rule of the control set that

$$\begin{aligned} H_{k_i,1}-H_{k_{i+l},1}\ge \theta \eta _1\kappa _c||A_{k_i}^Tc_{k_i}|| \min \left\{ \tau _1\kappa _{\Delta }^f\sigma _1||c_{k_i}||^{\sigma _2}, \tau _1\kappa _{\Delta }^c||A_{k_i}^T c_{k_i}||,\bar{\Delta }\right\} , \end{aligned}$$

which implies (52) immediately.

Since (51) is avoided, it follows from (52) that \(\liminf \limits _{k\rightarrow \infty ,k\in {\mathcal {C}}\cap {\mathcal {S}}}||c_k||=0\). So, there exists a subsequence \({\mathcal {J}}\subset {\mathcal {C}}\cap {\mathcal {S}}\) such that

$$\begin{aligned} \lim \limits _{k\rightarrow \infty ,k\in {\mathcal {J}}}||c_k||=0. \end{aligned}$$
(56)

Remembering \(h(x_{k+1})< h(x_k)\) for all \(k\in {\mathcal {C}}\cap {\mathcal {S}}\), we have from (14) and (56) that \(\lim _{k\rightarrow \infty ,k\in {\mathcal {J}}} h_k^+=0\), which, together with Lemma 2 and the update rule of the control set, implies (50). The result then follows from (27) and (50). \(\square \)

Lemma 12

Suppose \(|({\mathcal {H}} \cup {\mathcal {C}})\cap {\mathcal {S}}|=+\infty \) and (51) is avoided. Then

$$\begin{aligned} \lim _{k\rightarrow \infty }h(x_k)=0 \end{aligned}$$
(57)

and there exists a constant \(\kappa _{\beta }\in (0,1)\) such that at least one limit point of \(\{x_k\}\) is a KT point whenever \(\beta \in [\kappa _{\beta },1)\).

Proof

Equality (57) follows immediately from Lemma 10 and Lemma 11. It follows from (14) and (57) that \(\lim _{k\rightarrow \infty }h_k^+=0\). Denote \({\mathcal {K}}=({\mathcal {H}}\cup {\mathcal {C}})\cap {\mathcal {S}}\). Therefore, by \(|{\mathcal {K}}|=+\infty \), the positivity of any \(H_{k,i}\), and the update rule of the control set, we can find a subsequence \(\{k_i\}\subset {\mathcal {K}}\) such that

$$\begin{aligned} h_{k_i}^+<H_{k_i,2}. \end{aligned}$$
(58)

Suppose \(x^*\) is a limit point of \(\{x_{k_i}\}\), which by (57) is a feasible point. To derive a contradiction, we assume \(x^*\) is not a KT point. Without loss of generality, we further assume \(\lim _{i\rightarrow \infty }x_{k_i}=x^*\). Thus, for sufficiently large \(k_i\), we have \(x_{k_i}\in {\mathcal {N}}(x^*)\) and

$$\begin{aligned} \chi _{k_i}\ge \epsilon , \end{aligned}$$
(59)

where \({\mathcal {N}}(x^*)\) is a neighbourhood of \(x^*\) characterized in Lemma 8, and \(\epsilon >0\) is a constant. According to (14) and (58), we have

$$\begin{aligned} h(x_{k_i})\le \frac{1}{1-\theta }h_{k_i}^+<\frac{1}{1-\theta }H_{k_i,2}, \end{aligned}$$

and therefore

$$\begin{aligned} ||c(x_{k_i})||\le \Big (2h(x_{k_i})\Big )^{1/2}\le O\Big ((H_{k_i,2})^{1/2}\Big ). \end{aligned}$$
(60)

We investigate the interval described in Lemma 8

$$\begin{aligned} \mu ||c_{k_i}||\le \Delta _{k_i}\le \min \Big \{\kappa ,(\kappa _H H_{k_i,2})^{1/4}\Big \}. \end{aligned}$$
(61)

It follows from (60) and \(c(x_{k_i})\rightarrow 0\) that the lower bound in (61) is eventually smaller than \(\tau _1\) times the upper bound in (61). Thus, we have from (25) and Lemma 8 that, for all sufficiently large \(k_i\),

$$\begin{aligned} \Delta _{k_i}\ge \tau _1\Big (\kappa _H H_{k_i,2}\Big )^{1/4}. \end{aligned}$$
(62)

In addition, Lemma 8 ensures that, in this situation, (15) must hold, and therefore \(k_i\) cannot be an \(h\)-type iteration.

Now we consider any sufficiently large \(k_i\) such that \(x_{k_i}\in {\mathcal {N}}(x^*)\), (59) holds, and

$$\begin{aligned} h(x_{k_i})\le \kappa _h. \end{aligned}$$
(63)

Using the arguments above, we know \(k_i\in {\mathcal {C}}\cap {\mathcal {S}}\), which implies by Lemma 1 that \(x_{k_i}\) is infeasible. It then follows from (29) and assumptions A1, A2, and A5 that

$$\begin{aligned} \delta _{k_i}^c\ge \kappa _c||A_{k_i}^Tc_{k_i}|| \min \left\{ \frac{||A_{k_i}^Tc_{k_i}||}{||A_{k_i}^TA_{k_i}||},\Delta _{k_i}\right\} \ge \kappa _c\kappa _{\sigma } ||c_{k_i}|| \min \left\{ \frac{\kappa _{\sigma }||c_{k_i}||}{\kappa _A},\Delta _{k_i}\right\} . \end{aligned}$$
(64)

According to (60) and (62), we have \(\Delta _{k_i}\ge O(||c_{k_i}||^{1/2})\), which together with (64), implies for all sufficiently large \(k_i\),

$$\begin{aligned} \delta _{k_i}^c\ge \kappa _c\kappa _{\sigma } ||c_{k_i}|| \min \left\{ \frac{\kappa _{\sigma }||c_{k_i}||}{\kappa _A},O(||c_{k_i}||^{1/2})\right\} \ge \frac{\kappa _c\kappa _{\sigma }^2}{\kappa _A} ||c_{k_i}||^2. \end{aligned}$$
(65)

Since \(k_i\in {\mathcal {C}}\cap {\mathcal {S}}\), (22) holds, and therefore, by (65), we have

$$\begin{aligned} h(x_{k_i+1})\le h(x_{k_i})-\eta _1\delta _{k_i}^c\le \left( 1-\frac{2\eta _1\kappa _c\kappa _{\sigma }^2}{\kappa _A}\right) h(x_{k_i}). \end{aligned}$$

This implies that if

$$\begin{aligned} \kappa _{\beta }:=\left( 1-\frac{2\eta _1\kappa _c\kappa _{\sigma }^2}{\kappa _A}\right) \le \beta <1, \end{aligned}$$

then \(h(x_{k_i+1})\le \beta h(x_{k_i})\) and therefore (11) holds for \(k_i\). Thus, \(k_i\) cannot be a \(c\)-type iteration, which produces a contradiction. Hence, \(x^*\) is a KT point. \(\square \)

We now present our main result below.

Theorem 1

Suppose that no iterate \(x_k\) is a KT point or an infeasible stationary point and that (51) is avoided. Then at least one limit point of \(\{x_k\}\) is a KT point whenever the parameter \(\beta \) in (11) and (12) satisfies \(\beta \in [\kappa _{\beta },1)\), where \(\kappa _{\beta }\in (0,1)\) is the constant from Lemma 12.

Proof

The result follows immediately from Lemmas 6, 9, and 12. \(\square \)

4 Numerical results

In this section, preliminary numerical results are presented to demonstrate the potential of the new trust region infeasibility control algorithm. The code for the new algorithm was written in MATLAB 7.9. Details of our implementation are described as follows.

A standard stopping criterion

$$\begin{aligned} ||c_k||_{\infty }\le 10^{-6}(1+||x_k||_2), \end{aligned}$$

and

$$\begin{aligned} ||g_k+A_k^T\lambda _{k+1}||_{\infty }\le 10^{-6}(1+||\lambda _{k+1}||_2) \end{aligned}$$

was used for our algorithm. The approximate Hessian \(B_k\) was initialized to the identity and updated by the damped BFGS formula (a sketch follows the parameter list below). The dogleg method was applied to compute both the normal and tangent steps. The Lagrange multipliers were computed via MATLAB’s lsqlin function. All the parameters were chosen as:

$$\begin{aligned}&\tau =0.8,\tau _1=0.5, \tau _2=1.2,\beta =0.9999, \gamma =\theta =\zeta =\eta _1=10^{-4}, \eta _2=0.7,\\&\sigma _1=10^{-8},\sigma _2=0.5,l=\max \{\min \{15,\lceil n/5\rceil \},3\},u=\max \{1000,1.5h(x_0)\},\\&\Delta _0=\max \{0.4||x_0||,1.2\sqrt{n}\},\hat{\Delta }=10\Delta _0, \bar{\Delta }=10^{-4}. \end{aligned}$$
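The damped BFGS update mentioned above is Powell’s modification, which keeps \(B_k\) positive definite even when the standard curvature condition fails. A minimal sketch follows; the damping constant 0.2 is the customary choice, and the exact constant used in our implementation is not essential here.

```python
import numpy as np

def damped_bfgs(B, s, y, phi=0.2):
    """Powell-damped BFGS update of B, with s = x_{k+1} - x_k and
    y = grad_x L(x_{k+1}, lambda_{k+1}) - grad_x L(x_k, lambda_{k+1})."""
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    theta = 1.0 if sy >= phi * sBs else (1.0 - phi) * sBs / (sBs - sy)
    r = theta * y + (1.0 - theta) * Bs        # damped secant vector: s @ r > 0
    return B - np.outer(Bs, Bs) / sBs + np.outer(r, r) / (s @ r)
```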

We compared our algorithm with the well-known nonlinear optimization solver SNOPT [20]. The corresponding results are shown in Tables 1 and 2, where “TRIC” denotes our trust region infeasibility control algorithm, “Nit” denotes the number of iterations, “Nf” denotes the number of function evaluations, and “Ng” denotes the number of gradient evaluations. The test problems were equality constrained problems chosen from the CUTEr collection [21].

Table 1 Numerical results
Table 2 Numerical results

We also plot the logarithmic performance profile of Dolan and Moré [22] in Fig. 1. In the plots, the performance profile is defined by

$$\begin{aligned} \pi _s(t)\triangleq \frac{\text {no. of problems where}~\log _2{(r_{p,s})}\le t}{\text {total no. of problems}}, \end{aligned}$$

where \(r_{p,s}\) is the ratio of the Nf (or Ng) required by solver \(s\) to solve problem \(p\) to the lowest value of that measure required by any solver on the problem. The ratio \(r_{p,s}\) is set to infinity whenever solver \(s\) fails to solve problem \(p\). It can be observed from Fig. 1 that TRIC is comparable with SNOPT.
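For reference, the profile can be computed from a cost table in a few lines; the sketch below assumes each problem is solved by at least one solver and uses np.inf to mark failures.

```python
import numpy as np

def performance_profile(T, n_grid=200):
    """Dolan-More profile from an (n_problems x n_solvers) cost array T
    (e.g. Nf or Ng); returns a grid t and pi[s, j] = pi_s(t_j)."""
    ratios = T / np.min(T, axis=1, keepdims=True)   # r_{p,s}
    logr = np.log2(ratios)                          # inf where solver s failed
    t = np.linspace(0.0, logr[np.isfinite(logr)].max() + 1.0, n_grid)
    pi = np.array([[np.mean(logr[:, s] <= tj) for tj in t]
                   for s in range(T.shape[1])])
    return t, pi
```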

Fig. 1 Performance profile