Abstract
Recently, a novel step acceptance mechanism for equality constrained optimization was proposed by Zhu and Pu (Comput. Appl. Math. 31(2):407–432, 2012). This new mechanism uses an infeasibility control technique that is quite different from traditional penalty functions and filters. However, in that paper, global convergence of the algorithm with this new mechanism was proved only in a double-trust-regions framework, where a strong assumption on step sizes had to be imposed. In this paper, we improve Zhu and Pu’s work and furnish a complete global convergence proof for this infeasibility control mechanism in a standard trust region framework under mild assumptions. In addition, numerical results on a number of CUTEr problems, together with a comparison with SNOPT, show the efficiency of the proposed algorithm.
1 Introduction
In this paper, we consider a numerical algorithm for the following nonlinear equality constrained optimization problem
where the objective \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) and the constraints \(c:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^m\) with \(m\le n\) are sufficiently smooth.
The proposed algorithm is a member of the class of two-phase trust region algorithms for which the reader is referred to, e.g., [1–6]. More specifically, our approach is based on the Byrd–Omojokun trust region SQP method [4, 5], which is recognized as the most practical trust region method for general nonlinear equality constrained optimization problems. In the Byrd–Omojokun method, a complete step is decomposed into a normal step and a tangent step, which are computed by solving a two-phase relaxation QP subproblem.
It is commonly held that the technique used to accept or reject steps deeply impacts the efficiency of methods for nonlinear constrained optimization. In many traditional algorithms for nonlinear constrained optimization, penalty functions or filters are used to judge the quality of a trial step. The most obvious disadvantage of penalty functions is their over-dependence on penalty parameters. An inappropriate penalty parameter may reject a good step, and even a carefully designed strategy for updating the penalty parameter is often not very efficient in practice. That is why Fletcher and Leyffer proposed the creative concept of filters [7]. Convergence theories for filter methods can be found, e.g., in [8, 9]. Nevertheless, filter methods also have an Achilles’ heel: a restoration phase is needed to reduce infeasibility until a feasible subproblem is obtained. Gould, Loh, and Robinson [10] recently proposed a new robust filter method that is free of restoration phases. This method uses a unified step computation process and a mixed step acceptance strategy based on filters and steering penalty functions.
Over the last few years, methods without a penalty function or a filter have been a hot topic in the nonlinear optimization community. Bielschowsky and Gomes [11] introduced an infeasibility control technique based on trust cylinders; this method needs a possibly computationally expensive restoration step per iteration. Gould and Toint [12] introduced a trust funnel technique for step acceptance; this method uses different trust regions for normal and tangent steps, but the strategy for coordinating the normal and tangent trust region radii is very sophisticated and still under exploration. Zhu and Pu [13] proposed a step acceptance mechanism based on a control set. Liu and Yuan [14] proposed a penalty-filter-free technique in the line search framework. These four methods are designed especially for equality constrained optimization. More recently, Shen et al. [15] proposed a non-monotone SQP method for general nonlinear constrained optimization without a penalty function or a filter. Although the penalty-filter-free methods mentioned above share some similarities, they are quite different from each other.
This paper is mostly based on the method of Zhu and Pu [13]. The most outstanding novelty of Zhu and Pu’s work is a new technique that controls infeasibility via a set of constraint violations from some previous iterations. This technique is very useful in practice, but its theory has a fly in the ointment: a strong assumption on the step size must be imposed to establish global convergence. The primary cause of this assumption is that a double-trust-regions strategy similar to that of Gould and Toint [12] is used for step computations, and the ratio of the normal and tangent trust region radii cannot be controlled in theory, although in practice this is not an issue.
The main contribution of this paper is to complete the global convergence theory of the new infeasibility control technique in [13]. Compared with [13], the most significant modification made in the proposed algorithm is that the double trust regions strategy is replaced by a standard (single) trust region strategy. Global convergence to first-order critical points is then proved under mild assumptions. We also present extended numerical results on some CUTEr problems to demonstrate the efficiency of the proposed algorithm.
The paper is organized as follows. In Sect. 2, a complete description of the proposed algorithm is given. Assumptions and the global convergence analysis are presented in Sect. 3. Section 4 is devoted to some numerical results.
2 The algorithm
2.1 Step computation
We compute steps on the basis of the Byrd–Omojokun trust region method [5]. Each complete step \(s_k\) is composed of a normal step \(n_k\) and a tangent step \(t_k\), i.e.,
The normal step \(n_k\) aims at reducing the constraint violation function \(h(x)\), where
with \(||\cdot ||\) denoting the Euclidean norm. This function can be viewed as an infeasibility measure at a point \(x\). The tangent step \(t_k\) aims at reducing the objective as much as possible while preserving the constraint violation. Specifically, \(n_k\) and \(t_k\) are computed as follows.
For the normal step \(n_k\), we solve the following trust region least squares problem
where \(\tau \in (0,1)\), \(c_k=c(x_k)\), and \(A_k=A(x_k)\) which is the Jacobian of \(c(x)\) at \(x_k\). We assume the solution to (4), the normal step \(n_k\), satisfies
where \(\kappa _n>0\). This assumption is actually a regularization condition on the Jacobian of constraints. In fact, suppose \(A_k\) has the SVD: \(A_k=U_k \Sigma _k V_k^T\). Then \(v_k=-V_k\Sigma _k^{\dagger } U_k^Tc_k\), where \(\Sigma _k^{\dagger }\) is the pseudo-inverse of \(\Sigma _k\), solves the least squares problem
Thus, a sufficient condition for (5) is that the smallest positive singular value of \(A_k\) is bounded below away from zero. Notice that \(n_k=0\) when \(x_k\) is feasible.
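To make the SVD argument above concrete, here is a minimal numerical sketch of a normal-step computation. It forms the minimum-norm least-squares step \(-A_k^{\dagger}c_k\) via the pseudo-inverse and, if necessary, scales it back onto the trust-region boundary \(||n||=\tau \Delta _k\). This truncation is an illustrative simplification of subproblem (4), not the dogleg computation actually used in the paper's implementation; the function name is ours.

```python
import numpy as np

def normal_step(A, c, tau, delta):
    """Sketch of a normal step: minimum-norm least-squares solution
    v = -pinv(A) @ c via the SVD, truncated to ||n|| <= tau*delta.
    (Illustrative simplification of subproblem (4), not a dogleg.)"""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Drop (near-)zero singular values when applying the pseudo-inverse.
    tol = max(A.shape) * np.finfo(float).eps * s.max()
    s_inv = np.array([1.0 / si if si > tol else 0.0 for si in s])
    v = -Vt.T @ (s_inv * (U.T @ c))
    nv = np.linalg.norm(v)
    if nv > tau * delta:               # scale back onto the trust region
        v = (tau * delta / nv) * v
    return v
```

When \(A_k\) has full row rank and the trust region is inactive, this step satisfies \(c_k+A_kn_k=0\) exactly, consistent with the discussion following (5).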
After computing the normal step \(n_k\), we proceed to the next task, which is to find a tangent step \(t_k\) that improves the optimality of the current iterate \(x_k\). Consider a quadratic model of the Lagrangian at \(x_k\)
where \(f_k=f(x_k)\), \(g_k=\nabla f(x_k)\), and \(B_k\) is an approximate Hessian of the Lagrangian
at \(x_k\). It follows that
and
where \(g_k^n=g_k+B_k n_k\). Then the tangent step \(t_k\) should be a solution to the following problem
In practice, however, we solve for \(t_k\) the following problem
where \(Z_k\) is an orthonormal basis matrix of the null space of \(A_k\). Therefore, the dogleg method [16], the CG-Steihaug method [17], and the generalized Lanczos trust region (GLTR) method [18] can be applied. Let \(v_k\) be the computed solution to (6); we then set \(t_k=Z_kv_k\). Since the tangent step \(t_k\) is in the null space of \(A_k\), we have from (2) that
which means that the linearized constraint violation remains unchanged after the tangent step \(t_k\) is taken.
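The null-space construction above can be sketched numerically as follows. We extract an orthonormal basis \(Z_k\) of the null space of \(A_k\) from the SVD and take a simple Cauchy (steepest-descent) step for a reduced model \(\min_v (Z_k^Tg_k^n)^Tv+\frac{1}{2}v^T(Z_k^TB_kZ_k)v\) with a given radius; the reduced model and radius are illustrative assumptions standing in for the elided subproblem (6), and a Cauchy step is only a stand-in for the dogleg/CG solvers named above.

```python
import numpy as np

def tangent_step(A, g_n, B, radius):
    """Sketch of a tangent step: build an orthonormal null-space basis
    Z of A from the SVD, then take a Cauchy-point step for the reduced
    model (an illustrative stand-in for subproblem (6))."""
    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > 1e-12))         # numerical rank of A
    Z = Vt[r:].T                       # columns span the null space of A
    gr = Z.T @ g_n                     # reduced gradient
    ng = np.linalg.norm(gr)
    if ng == 0.0:
        return np.zeros(A.shape[1])
    Br = Z.T @ B @ Z                   # reduced Hessian
    gBg = gr @ Br @ gr
    # Steepest-descent step length, capped by the trust-region radius.
    if gBg > 0:
        alpha = min((gr @ gr) / gBg, radius / ng)
    else:
        alpha = radius / ng
    v = -alpha * gr
    return Z @ v                       # t_k = Z_k v_k lies in null(A_k)
```

By construction \(A_kt_k=0\), so the linearized constraint violation is indeed unchanged, as noted above.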
The Lagrangian multiplier vector \(\lambda _{k+1}\) is obtained by solving the following least squares problem
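As a small illustration, assuming (8) is the standard least-squares multiplier problem \(\min_{\lambda}||g_{k+1}+A_{k+1}^T\lambda||\) (its displayed form is not reproduced here), the multiplier estimate can be sketched with numpy's `lstsq`; the function name is ours.

```python
import numpy as np

def multiplier_estimate(A, g):
    """Least-squares multiplier estimate, assuming (8) has the standard
    form  min_lambda || g + A^T lambda ||  (solved via numpy.linalg.lstsq;
    the paper's implementation uses MATLAB's lsqlin)."""
    lam, *_ = np.linalg.lstsq(A.T, -g, rcond=None)
    return lam
```

At a KT point the residual \(g+A^T\lambda \) of this problem vanishes, which is why the estimate is a natural companion to the stopping test of Sect. 4.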
2.2 Step acceptance
The mechanism for step acceptance is introduced by Zhu and Pu [13]. This mechanism uses a novel infeasibility control technique to promote global convergence to first order critical points. Now we describe this technique in detail as follows.
The key is the innovative concept of “control set” which is a set of \(l\) positive numbers and denoted by
where the \(l\) elements are sorted in a non-increasing order, i.e., \(H_{k,1}\ge H_{k,2} \ge \cdots \ge H_{k,l}\). At the beginning, we define \(H_0\) as \(H_0=\{u,\cdots ,u\}\) where \(u\) is a sufficiently large constant such that
For an arbitrary iteration \(k\), when the complete step \(s_k\) is computed, we consider the following three cases.
where \(\beta \) and \(\gamma \) are two constants such that \(0<\gamma <\beta <1\). If one of (11) and (12) is satisfied, then
After \(x_k+s_k\) is accepted as the next iterate \(x_{k+1}\), we may update the control set \(H_k\) by substituting a new element \(h_k^+\) for the biggest element \(H_{k,1}\), where
with \(\theta \in (0,1)\). Of course, all the elements in the new control set \(H_{k+1}\) are rearranged in non-increasing order as well. The purpose of the control set is evidently to compel the infeasibility of the iterates to approach zero progressively. Although only the first two elements \(H_{k,1}\) and \(H_{k,2}\) of \(H_k\) are involved in conditions (10)–(13), the length \(l\) of \(H_k\) affects the strength of the infeasibility control. For example, consider the two cases \(l=2\) and \(l=3\) with the same values for the initial number and the entering numbers:
Here the notation “\(\oplus \)” means the control set is updated with some new entry.
It is observed that the first two elements change faster when \(l\) is bigger.
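The update mechanics described above are simple enough to state in a few lines of code: the biggest element is replaced by the entering value \(h_k^+\) and the set is re-sorted in non-increasing order. This sketch (with a function name of our choosing) reproduces the \(l=3\) case of the example:

```python
def update_control_set(H, h_plus):
    """Control-set update: replace the biggest element H[0] by the
    entering value h_plus, then re-sort in non-increasing order.
    H is a list of l positive numbers with H[0] >= ... >= H[-1]."""
    H = [h_plus] + H[1:]
    return sorted(H, reverse=True)
```

For instance, starting from \(H=\{10,10,10\}\), successive updates with entering values 5 and 4 give \(\{10,10,5\}\) and then \(\{10,5,4\}\).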
All iterations are classified into the following three types.
\(\bullet \) \(f\)-type. At least one of (10)–(12) holds and
where \(\sigma _1,\sigma _2,\zeta \in (0,1)\) and \(\chi _k,\delta _k^f,\delta _k^{f,t}\) are defined by
\(\bullet \) \(h\)-type. At least one of (10)–(12) holds but (15) fails.
\(\bullet \) \(c\)-type. None of (10)–(12) holds.
Given some constants \(\eta _1,\eta _2,\tau _1,\tau _2,\bar{\Delta },\hat{\Delta }\) such that \(0<\eta _1<\eta _2<1\), \(0<\tau _1<1\le \tau _2\), \(0<\bar{\Delta }<\hat{\Delta }\), we accept or reject the trial step according to the following strategy.
When \(k\) is an \(f\)-type iteration, we accept \(x_k+s_k\) if
The corresponding update rule for the trust region radius \(\Delta _k\) is
When \(k\) is an \(h\)-type iteration, we always accept \(x_k+s_k\) and update the trust region radius \(\Delta _k\) according to the following rule
When \(k\) is a \(c\)-type iteration, we accept \(x_k+s_k\) if
where
The trust region radius \(\Delta _k\) is then updated by
Before we present a formal description of our trust region infeasibility control algorithm, the reader should note that formulae (20), (21), and (24) imply that
if \(k\) is a successful iteration, which is important for the global convergence analysis in the next section.
2.3 The algorithm
Now a formal statement of the algorithm is presented as follows.
Algorithm 1
A trust-region algorithm with infeasibility control (TRIC)
-
Initialization: Choose \(x_0,B_0\) and parameters \(\beta ,\gamma ,\theta ,\zeta ,\eta _1,\eta _2,\sigma _1,\sigma _2,\tau ,\tau _1\in (0,1),u,\tau _2\in [1,+\infty )\), \(l\in \{2,3,\cdots \}\) and \(\Delta _0,\bar{\Delta },\hat{\Delta }\in (0,+\infty )\) such that \(\bar{\Delta }<\Delta _0<\hat{\Delta }\). Set \(k=0\).
-
Step 1: Stop if \(x_k\) is a KT point.
-
Step 2: Solve (4) for \(n_k\) if \(c_k\ne 0\) and set \(n_k=0\) if \(c_k=0\).
-
Step 3: Compute \(Z_k\), solve (6) for \(v_k\), set \(t_k=Z_kv_k\), and obtain \(s_k=n_k+t_k\).
-
Step 4: When \(k\) is \(f\)-type, accept \(x_k+s_k\) if (19) holds and update \(\Delta _k\) according to (20).
-
Step 5: If \(x_k+s_k\) has been accepted, set \(x_{k+1}=x_k+s_k\), and set \(x_{k+1}=x_k\) otherwise.
-
Step 6: If \(x_k+s_k\) has been accepted, solve (8) for \(\lambda _{k+1}\).
-
Step 7: If \(x_k+s_k\) has been accepted, choose a symmetric matrix \(B_{k+1}\).
-
Step 8: Increment \(k\) by one and go to Step 1.
Remark 1
From Step 4 we observe that the control set \(H_k\) is updated if \(k\) is a successful \(h\)-type or \(c\)-type iteration and is left unchanged otherwise. Also from Step 4 we see that \(k\) is always a successful iteration if it is \(h\)-type.
3 Global convergence
First, we make the following assumptions, which are essential for our convergence analysis.
Assumptions
- A1. :
-
The objective \(f\) and the constraints \(c\) are twice continuously differentiable.
- A2. :
-
The set \(\{x_k\}\cup \{x_k+s_k\}\) is contained in a compact and convex set \(\Omega \).
- A3. :
-
There exists a positive constant \(\kappa _B\) such that \(||B_k||\le \kappa _B\) for all \(k\).
- A4. :
-
Inequality (5) is satisfied for all \(k\).
- A5. :
-
There exist two constants \(\kappa _{h},\kappa _{\sigma }>0\) such that
$$\begin{aligned} h(x)\le \kappa _h~~\Longrightarrow ~~\sigma _{\min }(A(x))\ge \kappa _{\sigma }, \end{aligned}$$(26)
where \(\sigma _{\min }(A)\) represents the smallest singular value of \(A\).
Remark
By contrast, these assumptions are weaker than those in [13]. In [13], the authors use a double trust regions strategy and impose
where \(\kappa _s\) is a positive constant and \(\Delta _k^c\) and \(\Delta _k^f\) are the trust regions for the normal step and the tangent step, respectively. This assumption is strong in theory because \(\Delta _k^c\) and \(\Delta _k^f\) there are updated independently.
In the rest of this section, we denote the index set of successful iterations by \({\mathcal {S}}\) and the index sets of \(f\)-type, \(h\)-type, and \(c\)-type iterations by \({\mathcal {F}}\), \({\mathcal {H}}\), and \({\mathcal {C}}\), respectively.
Lemma 1
Suppose that \(k\in {\mathcal {S}}\) and that \(x_k\) is a feasible point which is not a KT point. Then \(k\) must be an \(f\)-type iteration and therefore all elements of the control set are positive.
Proof
The feasibility of \(x_{k}\) implies that \(n_k=0\), \(\delta _k^f=\delta _k^{f,t}\), \(\delta _k^c=0\), and, by (22), that \(k\) cannot be a successful \(c\)-type iteration. The hypothesis that \(x_k\) is not a KT point implies by (16) that \(\chi _k=||Z_k^T g_k^n||>0\) and therefore (15) holds. Thus, \(k\) must be a successful \(f\)-type iteration. It follows from the mechanism of the algorithm that the control set \(H_k\) is updated only in successful \(h\)-type and \(c\)-type iterations. Recalling that the update rule of the control set substitutes \(h_k^+\) defined by (14) for \(H_{k,1}\), we can deduce by induction that \(H_{k,i}>0,~i=1,\ldots ,l\) for all \(k\). \(\square \)
Lemma 2
For all \(k\), we have
and \(H_{k,1}\) is monotonically non-increasing in \(k\).
Proof
Without loss of generality, we can assume that all \(k\) are successful iterations. We first prove the inequality
by induction. Obviously, (9) implies that (28) holds for \(k=0\). For \(k\ge 1\), we assume that (28) holds for \(k-1\) and consider the following three cases.
The first case is that \(k-1\in {\mathcal {F}}\). Then at least one of (10)–(12) holds and therefore, according to the hypothesis \(h(x_{k-1})\le H_{k-1,1}\), we have from (10)–(12) that
Since the control set is not updated in an \(f\)-type iteration, we have \(H_{k,1}=H_{k-1,1}\). Thus (28) follows.
The second case is that \(k-1\in {\mathcal {H}}\). Lemma 1 implies that \(x_{k-1}\) is infeasible. Then either (11) or (12) is satisfied and \(H_{k-1}\) is updated by substituting \(h_{k-1}^+\) for \(H_{k-1,1}\). It follows from (11) and (12) that
Therefore, by the update rule of the control set together with (14), we obtain
Thus (28) follows.
The third case is that \(k-1\in {\mathcal {C}}\). Then (22) holds and \(H_{k-1}\) is updated by substituting \(h_{k-1}^+\) for \(H_{k-1,1}\). According to (14) and (22), we have
Therefore, by the update rule of the control set, we have \(h_{k-1}^+\le H_{k,1}\). Hence we obtain (28) from the last two inequalities.
Now we can finish the proof of this lemma based on (28). Note that
according to (10)–(12), (22), (28) and the mechanism of the algorithm. Then we have \(h_{k}^+\le H_{k,1}\) from (14). Thus the monotonicity of \(H_{k,1}\) follows from the update rule of the control set. Finally, (27) follows immediately from (28) and the monotonicity of \(H_{k,1}\). \(\square \)
Lemma 3
For all \(k\), we have that
where \(\kappa _c=\frac{1}{2}\tau \), and
where \(\kappa _f=\frac{1}{2}\sqrt{1-\tau ^2}\).
Proof
It follows from Lemma 4.3 in [19] that a solution \(d^*\) to the problem
must satisfy the Cauchy condition
Then, according to (4), (7), (23), and (31), we have
Similarly, according to (6), (16), (18), and (31), we have
The proof is complete. \(\square \)
Lemma 4
For all \(k\), we have that
and
where \(\kappa _D>0\) is a constant.
Proof
Inequalities (32) and (33) are just two consequences of the assumptions at the beginning of this section and Taylor’s theorem. \(\square \)
Lemma 5
If \(k\in {\mathcal {F}}\) and
where \(\kappa _{\Delta }^f=\min \{\frac{1}{\kappa _B},\frac{(1-\eta _1)\zeta \kappa _f}{2\kappa _D}\}\), then \(\rho _k^f>\eta _1\). Similarly, if \(k\in {\mathcal {C}}\), \(c_k\ne 0\), and
where \(\kappa _{\Delta }^c=\min \{\frac{1}{\kappa _A},\frac{(1-\eta _1)\kappa _c}{2\kappa _D}\}\) with \(\kappa _A\) being a positive constant, then \(\rho _k^c>\eta _1\).
Proof
It follows from (15), (30), and A4 that
This, together with (32) and the fact that
implies if (34) holds then
Hence, the first assertion follows. Similarly, using (29) and assumptions A1 and A2, we have
where \(\kappa _A=\max \limits _k\{||A_k^TA_k||\}\). This, together with (33) and (36), implies that if (35) holds then
Then the second assertion follows as well. \(\square \)
We show below that our algorithm can eventually make a step forward at any iterate which is not an infeasible stationary point of \(h(x)\). We recall beforehand the definition of an infeasible stationary point of \(h(x)\).
Definition 1
A point \(\hat{x}\) is an infeasible stationary point of \(h(x)\) if \(\hat{x}\) satisfies
Lemma 6
Suppose that KT points and infeasible stationary points never occur. Then we have \(|{\mathcal {S}}|=+\infty \).
Proof
Since, according to the mechanism of the algorithm, \(x_k+s_k\) must be accepted if \(k\) is an \(h\)-type iteration, we only consider the cases \(k\in {\mathcal {F}}\) and \(k\in {\mathcal {C}}\).
Suppose that \(x_k\) is infeasible. Since by the hypothesis \(x_k\) is not an infeasible stationary point, we have \(||A_k^Tc_k||>0\). It then follows from (29) that \(\delta _k^c>0\). Therefore, when \(k\in {\mathcal {C}}\), Lemma 5 ensures that \(\rho _k^c\ge \eta _1\) for all \(\Delta _k\) such that \(\Delta _k\le \kappa _{\Delta }^c ||A_k^Tc_k||\). Thus, \(k\) is a successful \(c\)-type iteration. When \(k\in {\mathcal {F}}\), we know by (15) that \(\chi _k>\sigma _1 ||c_k||^{\sigma _2}\) and by Lemma 5 that \(\rho _k^f\ge \eta _1\) for all \(\Delta _k\) such that \(\Delta _k\le \kappa _{\Delta }^f \chi _k\). Note that (16) implies \(\chi _k\) depends on \(g_k^n=B_kn_k+g_k\) and therefore depends on \(n_k\) which may change as \(\Delta _k\) decreases. Since \(||A_k^Tc_k||>0\), it follows from Theorem 4.1 in [19] that \(||n_k||=\tau \Delta _k\) for all sufficiently small \(\Delta _k\). Using the arguments above and (5), we have
Thus, (34) must be satisfied for all sufficiently small \(\Delta _k\). Therefore, a successful \(f\)-type iteration will eventually occur at \(x_k\).
Now we suppose that \(x_k\) is feasible. Since \(x_k\) is not a KT point, we have
So, (15) is satisfied. It follows from \(c_k+A_ks_k=0\), (36), and Taylor’s theorem that
where
So, (10) holds whenever \(\Delta _k\le \left( \frac{2H_{k,1}}{m\kappa _C^2}\right) ^{1/4}\). Applying Lemma 5 once again, we have \(\rho _k^f\ge \eta _1\) when \(\Delta _k\) is sufficiently small. Hence, a successful \(f\)-type iteration will eventually occur at \(x_k\). \(\square \)
Lemma 7
If \(x_k\) is infeasible but not a stationary point, then
or
Proof
The result follows immediately from (15), (25), Lemma 5, the proof of Lemma 6, and the mechanism of the algorithm. \(\square \)
Lemma 8
Suppose \(x^*\in \Omega \) is a feasible point but not a KT point. Then there exists a neighbourhood \({\mathcal {N}}(x^*)\) of \(x^*\) and positive constants \(\delta ,\mu ,\kappa \) such that for any \(x\in {\mathcal {N}}(x^*)\cap \Omega \), if \(\Delta _k\ge \mu ||c_k||\), then \(c_k+A_ks_k=0\) and (15) holds, and moreover, if
where \(\kappa _H=\frac{2\beta }{m\kappa _C^2}\), then (12) and (19) hold and \(\delta _k^f\ge \delta \Delta _k\).
Proof
Assumptions A1, A2, and A5 imply that when \(x_k\) is sufficiently close to \(x^*\), \((A_kA_k^T)^{-1}\) exists and
for some constant \(\kappa _I>0\). Therefore, if
we have
and \(c_k+A_ks_k=0\).
Because \(x^*\) is a feasible point but not a KT point, there exists a constant \(\epsilon >0\) such that, for all \(x_k\) sufficiently close to \(x^*\),
and therefore by (30) and assumption A3
Define
It follows from (40)–(42) and assumptions A1–A3 that if \(x_k\) is sufficiently close to \(x^*\) and (41) is satisfied,
where \(\kappa _G=\kappa _I \max \limits _{x\in \Omega }\Big \{||\nabla f(x)||+\frac{1}{2}\kappa _B\kappa _I ||c(x)||\Big \}\). Then, applying (41), (44), and (45), we have that if \(x_k\) is sufficiently close to \(x^*\) and
then
and therefore
which together with (43) implies (15).
We deduce from Lemma 5 and (43) that if
then (19) holds. If, in addition, \(\Delta _k\) satisfies
then by (19), (37), and (44), we have
and
The last two inequalities mean (12) holds. Finally, defining \(\delta =\zeta \kappa _f\epsilon \), \(\mu =\max \left\{ \frac{\kappa _I}{\tau },\frac{\kappa _G}{(1-\zeta )\kappa _f\epsilon }\right\} \), \(\kappa =\min \left\{ \frac{\epsilon }{\kappa _B},\kappa _{\Delta }^f \epsilon \left( \frac{2\eta _1\zeta \kappa _f\epsilon }{m\gamma \kappa _C^2}\right) ^{1/3}\right\} \), and choosing a sufficiently small neighbourhood \({\mathcal {N}}(x^*)\), we complete the proof. \(\square \)
Now we consider convergence of the case that successful \(c\)-type and \(h\)-type iterations are finitely many.
Lemma 9
Suppose that \(|{\mathcal {S}}|=+\infty \) and \(|({\mathcal {H}} \cup {\mathcal {C}})\cap {\mathcal {S}}|<+\infty \). Then there exists a subsequence \({\mathcal {K}}\subset {\mathcal {S}}\) such that
and any limit point of \(\{x_k\}_{k\in {\mathcal {K}}}\) is a KT point.
Proof
Suppose that \(x_k\) is infeasible for all sufficiently large \(k\), for otherwise (47) must hold for some subsequence \({\mathcal {K}}\). The hypothesis of this lemma implies \(k\in {\mathcal {F}}\) for all sufficiently large \(k\in {\mathcal {S}}\). Then \(\{f(x_k)\}\) is monotonically non-increasing by (15) and (19). It follows from (13) and Lemma 1 of [8] that \(\lim _{k\rightarrow \infty }h(x_k)=0\). Thus, (47) follows immediately.
Let \(x^*\) be an arbitrary limit point of \(\{x_k\}_{k\in {\mathcal {K}}}\). From (47), we deduce that \(x^*\) is feasible. Without loss of generality, suppose that \(\lim _{k\rightarrow \infty ,k\in {\mathcal {K}}}x_k=x^*\). To derive a contradiction, we assume \(x^*\) is not a KT point. Then, for sufficiently large \(k\in {\mathcal {K}}\), we have \(x_k\in {\mathcal {N}}(x^*)\), where \({\mathcal {N}}(x^*)\) is a neighbourhood of \(x^*\) characterized in Lemma 8. Applying Lemma 8, if
\(x_k+s_k\) must satisfy all the conditions for a successful \(f\)-type iteration. Note that the control set is not updated in a successful \(f\)-type iteration. Therefore, we can find an index \(k_0\) such that \(H_{k_0}=H_{k}\) for all \(k\ge k_0\). Hence, for all sufficiently large \(k\in {\mathcal {K}}\), the interval in (48) becomes
where the lower bound approaches zero and the upper bound is a positive constant. It then follows from the mechanism of the algorithm that, for all sufficiently large \(k\in {\mathcal {K}}\),
Therefore, by Lemma 8 and the non-decreasing monotonicity of \(\delta _k^f\) in \(\Delta _k\) on the interval \([\mu ||c_k||,+\infty )\), we have
This, together with the non-increasing monotonicity of \(\{f(x_k)\}\), implies \(f(x_k)\rightarrow -\infty \) as \(k\rightarrow \infty \), which contradicts assumptions A1 and A2. So, the proof is complete. \(\square \)
Next we consider convergence of the case that successful \(c\)-type or \(h\)-type iterations are infinitely many.
Lemma 10
Suppose \(|{\mathcal {H}} |=+\infty \). Then \(\lim _{k\rightarrow \infty }h(x_k)=0\).
Proof
Denote \({\mathcal {H}}=\{k_i\}\). Since at least one of (10)–(12) holds in \(h\)-type iterations and, by Lemma 1, \(x_{k_i}\) is infeasible, we deduce from (11), (12), (14), and (27) that
It then follows from Lemma 2 and the update rule of the control set that
Applying Lemma 2 once again together with (49), we have
Thus, the result follows from (27) and (50). \(\square \)
In what follows, to obtain global convergence, we will rule out a bad scenario of successful \(c\)-type iterations that is
Lemma 11
Suppose \(|{\mathcal {C}}\cap {\mathcal {S}}|=+\infty \) and (51) is avoided. Then \(\lim _{k\rightarrow \infty }h(x_k)=0\).
Proof
We first prove
Denote \({\mathcal {C}}\cap {\mathcal {S}}=\{k_i\}\). From (14), (22), (27), and (29), we have
where \(\kappa _A\) is still defined by \(\kappa _A=\max \limits _k\{||A_k^TA_k||\}\) as in the proof of Lemma 5.
Since \(k_i\in {\mathcal {C}}\cap {\mathcal {S}}\), \(x_{k_i}\) is an infeasible point by Lemma 1. Lemma 7 implies
From (53), (54), and \(\kappa _{\Delta }^c\le \frac{1}{\kappa _A}\), we then have
It therefore follows from (55), Lemma 2, and the update rule of the control set that
which implies (52) immediately.
Since (51) is avoided, it follows from (52) that \(\liminf \limits _{k\rightarrow \infty ,k\in {\mathcal {C}}\cap {\mathcal {S}}}||c_k||=0\). So, there exists a subsequence \({\mathcal {J}}\subset {\mathcal {C}}\cap {\mathcal {S}}\) such that
Remembering \(h(x_{k+1})< h(x_k)\) for all \(k\in {\mathcal {C}}\cap {\mathcal {S}}\), we have from (14) and (56) that \(\lim _{k\rightarrow \infty ,k\in {\mathcal {J}}} h_k^+=0\), which, together with Lemma 2 and the update rule of the control set, implies (50). The result then follows from (27) and (50). \(\square \)
Lemma 12
Suppose \(|({\mathcal {H}} \cup {\mathcal {C}})\cap {\mathcal {S}}|=+\infty \) and (51) is avoided. Then
and there exists a constant \(\kappa _{\beta }\in (0,1)\) such that at least one limit point of \(\{x_k\}\) is a KT point whenever \(\beta \in [\kappa _{\beta },1)\).
Proof
Equality (57) follows immediately from Lemma 10 and Lemma 11. It follows from (14) and (57) that \(\lim _{k\rightarrow \infty }h_k^+=0\). Denote \({\mathcal {K}}=({\mathcal {H}}\cup {\mathcal {C}})\cap {\mathcal {S}}\). Therefore, by \(|{\mathcal {K}}|=+\infty \), the positivity of any \(H_{k,i}\), and the update rule of the control set, we can find a subsequence \(\{k_i\}\subset {\mathcal {K}}\) such that
Suppose \(x^*\) is a limit point of \(\{x_{k_i}\}\), which by (57) is a feasible point. To derive a contradiction, we assume \(x^*\) is not a KT point. Without loss of generality, we further assume \(\lim _{i\rightarrow \infty }x_{k_i}=x^*\). Thus, for sufficiently large \(k_i\), we have \(x_{k_i}\in {\mathcal {N}}(x^*)\) and
where \({\mathcal {N}}(x^*)\) is a neighbourhood of \(x^*\) characterized in Lemma 8, and \(\epsilon >0\) is a constant. According to (14) and (58), we have
and therefore
We investigate the interval described in Lemma 8
It follows from (60) and \(c(x_{k_i})\rightarrow 0\) that the lower bound in (61) is eventually smaller than \(\tau _1\) times the upper bound in (61). Thus, we have from (25) and Lemma 8 that, for all sufficiently large \(k_i\),
In addition, Lemma 8 ensures that, in this situation, (15) must hold, and therefore \(k_i\) cannot be an \(h\)-type iteration.
Now we consider any sufficiently large \(k_i\) such that \(x_{k_i}\in {\mathcal {N}}(x^*)\), (59) holds, and
Using the arguments above, we know \(k_i\in {\mathcal {C}}\cap {\mathcal {S}}\), which implies by Lemma 1 that \(x_{k_i}\) is infeasible. It then follows from (29) and assumptions A1, A2, and A5 that
According to (60) and (62), we have \(\Delta _{k_i}\ge O(||c_{k_i}||^{1/2})\), which together with (64), implies for all sufficiently large \(k_i\),
Since \(k_i\in {\mathcal {C}}\cap {\mathcal {S}}\), (22) holds, and therefore, by (65), we have
This implies that if
then \(h(x_{k_i+1})\le \beta h(x_{k_i})\) and therefore (11) holds for \(k_i\). Thus, \(k_i\) cannot be a \(c\)-type iteration, which produces a contradiction. Hence, \(x^*\) is a KT point. \(\square \)
We now present our main result below.
Theorem 1
Suppose that KT points and infeasible stationary points never occur and that (51) is avoided. Then at least one limit point of \(\{x_k\}\) is a KT point whenever the parameter \(\beta \) in (11) and (12) satisfies \(\beta \in [\kappa _{\beta },1)\) where \(\kappa _{\beta }\in (0,1)\) is a constant.
Proof
The result follows immediately from Lemmas 6, 9, and 12. \(\square \)
4 Numerical results
In this section, preliminary numerical results are shown to demonstrate the potential of the new trust region infeasibility control algorithm. The algorithm was implemented in MATLAB 7.9. Details of our implementation are described as follows.
A standard stopping criterion
and
was used for our algorithm. The approximate Hessian \(B_k\) was initialized to the identity and updated by the damped BFGS formula. The dogleg method was applied to compute both normal and tangent steps. The Lagrangian multipliers were computed via MATLAB’s lsqlin function. All the parameters were chosen as:
We compared our algorithm with the famous nonlinear optimization solver SNOPT [20]. The corresponding results are shown in Tables 1 and 2, where “TRIC” denotes our trust region infeasibility control algorithm, “Nit” denotes the number of iterations, “Nf” denotes the number of function evaluations, and “Ng” denotes the number of gradient evaluations. The test problems were a number of equality constrained problems chosen from the CUTEr collection [21].
We also plot the logarithmic performance profile of Dolan and Moré [22] in Fig. 1. In the plots, the performance profile is defined by
where \(r_{p,s}\) is the ratio between the Nf (or Ng) required by solver \(s\) to solve problem \(p\) and the lowest such value required by any solver on this problem. The ratio \(r_{p,s}\) is set to infinity whenever solver \(s\) fails to solve problem \(p\). It can be observed from Fig. 1 that TRIC is comparable with SNOPT.
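The profile construction just described can be sketched as follows, with a cost table whose rows are problems and columns are solvers, and `np.inf` marking failures; the function names and the tiny example table are ours.

```python
import numpy as np

def performance_profile(T):
    """Sketch of the Dolan-More performance profile.  T[p, s] is the
    cost (e.g. Nf or Ng) of solver s on problem p, with np.inf marking
    a failure.  Returns rho(s, tau_): the fraction of problems on which
    solver s's ratio r_{p,s} is within a factor tau_ of the best solver."""
    best = T.min(axis=1, keepdims=True)   # best cost on each problem
    R = T / best                          # performance ratios r_{p,s}
    def rho(s, tau_):
        return np.mean(R[:, s] <= tau_)
    return rho
```

Because a failed run keeps \(r_{p,s}=\infty \), it is never counted as solved for any finite \(\tau \), so \(\rho (s,\tau )\) approaches the solver's success rate as \(\tau \) grows.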
References
Byrd, R.H., Schnabel, R.B., Shultz, G.A.: A trust region algorithm for nonlinearly constrained optimization. SIAM J. Numer. Anal. 24, 1152–1170 (1987)
Celis, M.R., Dennis Jr, J.E., Tapia, R.A.: A trust region strategy for nonlinear equality constrained optimization. In: Boggs, P.T., Byrd, R.H., Schnabel, R.B. (eds.) Numerical Optimization, pp. 71–82. SIAM, Philadelphia (1985)
Conn, A.R., Gould, N.I.M., Toint, PhL: Trust-Region Methods, MPS-SIAM Series on Optimization. SIAM, Philadelphia (2000)
Lalee, M., Nocedal, J., Plantenga, T.: On the implementation of an algorithm for large-scale equality constrained optimization. SIAM J. Optim. 8, 682–706 (1998)
Omojokun, E.O.: Trust region algorithms for optimization with nonlinear equality and inequality constraints, Ph.D. thesis, Dept. of Computer Science, University of Colorado, Boulder (1989)
Powell, M.J.D., Yuan, Y.: A trust region algorithm for equality constrained optimization. Math. Program. 49, 189–211 (1991)
Fletcher, R., Leyffer, S.: Nonlinear programming without a penalty function. Math. Program. 91, 239–269 (2002)
Fletcher, R., Leyffer, S., Toint, PhL: On the global convergence of a filter-SQP algorithm. SIAM J. Optim. 13, 44–59 (2002)
Shen, C., Leyffer, S., Fletcher, R.: A nonmonotone filter method for nonlinear optimization. Comput. Optim. Appl. 52, 583–607 (2012)
Gould, N.I.M., Loh, Y., Robinson, D.P.: A filter method with unified step computation for nonlinear optimization. SIAM J. Optim. 24, 175–209 (2014)
Bielschowsky, R.H., Gomes, F.A.M.: Dynamic control of infeasibility in equality constrained optimization. SIAM J. Optim. 19, 1299–1325 (2008)
Gould, N.I.M., Toint, PhL: Nonlinear programming without a penalty function or a filter. Math. Program. 122, 155–196 (2010)
Zhu, X., Pu, D.: A new double trust regions SQP method without a penalty function or a filter. Comput. Appl. Math. 31, 407–432 (2012)
Liu, X., Yuan, Y.: A sequential quadratic programming method without a penalty function or a filter for nonlinear equality constrained optimization. SIAM J. Optim. 21, 545–571 (2011)
Shen, C., Zhang, L., Wang, B., Shao, W.: Global and local convergence of a nonmonotone SQP method for constrained nonlinear optimization. Comput. Optim. Appl. 59, 435–473 (2014)
Powell, M.J.D.: A hybrid method for nonlinear equations. In: Rabinowitz, P. (ed.) Numerical Methods for Nonlinear Algebraic Equations, pp. 87–114. Gordon and Breach, London (1970)
Steihaug, T.: The conjugate gradient method and trust regions in large scale optimization. SIAM J. Numer. Anal. 20, 626–637 (1983)
Gould, N.I.M., Lucidi, S., Roma, M., Toint, PhL: Solving the trust-region subproblem using the Lanczos method. SIAM J. Optim. 9, 504–525 (1999)
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)
Gill, P.E., Murray, W., Saunders, M.A.: SNOPT: an SQP algorithm for large-scale constrained optimization. SIAM Rev. 47, 99–131 (2005)
Gould, N.I.M., Orban, D., Toint, Ph.L.: CUTEr (and SifDec), a Constrained and Unconstrained Testing Environment, revisited. Technical Report TR/PA/01/04, CERFACS, Toulouse, France (2003)
Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91, 201–213 (2002)
Acknowledgments
I am very grateful to the two anonymous referees for their helpful comments and suggestions which helped improve the quality of this paper. This work is supported by Talent Introduction Foundation of Shanghai University of Electric Power, No. K2013-018.
Zhu, X. On a globally convergent trust region algorithm with infeasibility control for equality constrained optimization. J. Appl. Math. Comput. 50, 275–298 (2016). https://doi.org/10.1007/s12190-015-0870-1