1 Introduction

The sequential quadratic programming (SQP) method is one of the most successful methods for solving constrained nonlinear optimization problems. It is an iterative procedure that generates a sequence of points (not necessarily feasible), obtained by solving quadratic programming subproblems, and converges to a Karush-Kuhn-Tucker (KKT) point. The idea was first proposed by Wilson [1] in 1963, and since then the SQP method has been studied extensively by many researchers (see [2–8]). The reader may consult Boggs [9], Gould et al. [10], and Schittkowski et al. [11] for good reviews of the SQP algorithms discussed so far. Some SQP methods use a convex quasi-Newton approximation, which slows the algorithm on large-scale problems, whereas others employ the exact Hessian of the Lagrangian and are therefore Newton-like methods. Under suitable assumptions, the Newton version of the SQP algorithm converges to the minimum point without a line search or additional parameters. The Newton-SQP framework amounts to solving a Newton system derived from the KKT conditions, in which case it is often difficult to choose an initial iterate close enough to the true solution to guarantee convergence. The SQP methods discussed so far exhibit at most superlinear or quadratic local convergence. Over the past few decades, researchers have shown interest (see [12–16]) in iterative algorithms with higher-order convergence. We also note that equality-constrained optimization has emerged as a branch of special interest within constrained optimization theory (see [17, 18]).

In this paper, we propose a two-phase-SQP method for equality-constrained optimization problems that attains local cubic-order convergence under mild assumptions. We also discuss conditions under which a line search based on the \(l_{1}\) merit function can be associated with the scheme.

Section 2 reviews the Newton-SQP method. The two-phase-SQP scheme is proposed in Sect. 3, followed by its convergence analysis in Sect. 4. A note on the \(l_{1}\) merit function for the new scheme is given in Sect. 5. Numerical examples are provided in Sect. 6, and some concluding remarks in Sect. 7.

2 Background: Existing Newton-SQP Method

Consider the following optimization problem with equality constraints.

$$\begin{aligned} \text{(EP): }\quad&\text{ Min }\;f(x) \end{aligned}$$
(2.1a)
$$\begin{aligned}&\text{ s.t. }\; h(x)=0, \end{aligned}$$
(2.1b)

where \(f : {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) and \(h: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^m\) are smooth functions. The motivation behind the local SQP approach is to model (EP) at the current iterate \(x_k\) by a quadratic programming subproblem, then use the minimizer of the subproblem to define a new iterate \(x_{k+1}\).

The Lagrangian function for problem (EP) is \({\mathscr {L}} (x,\lambda )=f(x)-\lambda ^\mathrm{T}h(x)\), \(\lambda \in {\mathbb {R}}^{m}\). Denote \(A(x)=[ \nabla h_{1}(x), \nabla h_{2}(x), \cdots , \nabla h_{m}(x)]^\mathrm{T}\), where \(h_{i}(x)\) is the \(i\mathrm{th}\) component of h(x). The KKT system for (EP) is

$$\begin{aligned} P(x, \lambda ) \triangleq \begin{bmatrix} \nabla f(x) - A(x)^\mathrm{T} \lambda \\ h(x) \end{bmatrix} =0. \end{aligned}$$
(2.2)

Any solution \((x^{*}, \lambda ^{*})\) of (EP) at which \(A(x^{*})\) has full rank satisfies (2.2). System (2.2) consists of \(n+m\) equations in the \(n+m\) unknowns x and \(\lambda \). Let \(P^{\prime } (x_{k}, \lambda _{k})\) denote the first-order Fréchet derivative of P at \((x_{k}, \lambda _{k})\). The next iterate \((x_{k+1}, \lambda _{k+1})\) is given by \(x_{k+1}=x_{k}+\widehat{x}_{k}\) and \(\lambda _{k+1}=\lambda _{k}+\widehat{\lambda }_{k}\), where \((\widehat{x}_{k}, \widehat{\lambda }_{k})\) is the value of \((\alpha , \beta ) \in {\mathbb {R}}^n \times {\mathbb {R}}^m\) satisfying the following system:

$$\begin{aligned} P^{\prime }(x_{k},\lambda _{k}) \begin{bmatrix} \alpha \\ \beta \end{bmatrix}= -P(x_{k}, \lambda _{k}), \end{aligned}$$

which is the same as

$$\begin{aligned} \begin{bmatrix} \nabla _{xx}^{2} {\mathscr {L}}_{k}&\quad -A_{k}^\mathrm{T} \\ A_{k}&\quad 0 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix}= \begin{bmatrix} -\nabla f_{k} + A_{k}^\mathrm{T} \lambda _{k} \\ -h_{k} \end{bmatrix}, \end{aligned}$$
(2.3)

where \({\mathscr {L}}_{k} \triangleq {\mathscr {L}}(x_{k}, \lambda _{k})\), \(f_{k} \triangleq f(x_{k})\), \(\nabla f_{k} \triangleq \nabla f(x_{k})\), \(h_{k} \triangleq h(x_{k})\), and \(A_{k} \triangleq A(x_{k})\).

The Newton step generated from the iterate \((x_{k}, \lambda _{k})\) is thus given by

$$\begin{aligned} \begin{bmatrix} x_{k+1} \\ \lambda _{k+1} \end{bmatrix} = \begin{bmatrix} x_{k}\\ \lambda _{k} \end{bmatrix} + \begin{bmatrix} \widehat{x}_{k}\\ \widehat{\lambda }_{k} \end{bmatrix}, \end{aligned}$$
(2.4)

where \(\widehat{x}_{k}\) and \(\widehat{\lambda }_{k}\) are obtained by solving (2.3). If the following assumptions hold, then (2.3) has a unique solution (see [19]).

Assumptions

(A-1) The constraint Jacobian matrix A(x) has full rank.

(A-2) The matrix \(\nabla _{xx}^{2} {\mathscr {L}}(x, \lambda )\) is positive definite on the tangent space of the constraints, i.e., \(d^\mathrm{T} \nabla _{xx}^{2} {\mathscr {L}}(x, \lambda ) d>0\) for all \(d \in \{d \in {\mathbb {R}}^{n} ~|~ A(x)d=0, d \ne 0 \}\).
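Under assumptions (A-1) and (A-2), one local Newton-SQP step reduces to a single linear solve with the KKT matrix of (2.3). The following minimal NumPy sketch illustrates this; the callables `grad_f`, `h`, `jac_h`, and `hess_L` (returning \(\nabla f(x)\), h(x), A(x), and \(\nabla _{xx}^{2} {\mathscr {L}}(x,\lambda )\)) are illustrative assumptions, not part of the original presentation:

```python
import numpy as np

def kkt_matrix(x, lam, jac_h, hess_L):
    """Assemble P'(x, lam) from (2.3): [[hess_L, -A^T], [A, 0]]."""
    A = jac_h(x)          # constraint Jacobian A(x), shape (m, n)
    H = hess_L(x, lam)    # Hessian of the Lagrangian, shape (n, n)
    m = A.shape[0]
    return np.block([[H, -A.T], [A, np.zeros((m, m))]])

def newton_sqp_step(x, lam, grad_f, h, jac_h, hess_L):
    """One local Newton-SQP step: solve (2.3) and update via (2.4)."""
    n = x.size
    rhs = -np.concatenate([grad_f(x) - jac_h(x).T @ lam, h(x)])  # -P(x, lam), cf. (2.2)
    d = np.linalg.solve(kkt_matrix(x, lam, jac_h, hess_L), rhs)
    return x + d[:n], lam + d[n:]
```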

Consider the following quadratic programming problem as an approximate model of (EP) at \((x_{k}, \lambda _{k})\):

$$\begin{aligned} (\mathrm{QP}_k){:}\quad&\text{ Min }_{s \in {\mathbb {R}}^{n}}~~f_{k}+\nabla f_{k}^\mathrm{T} s+ \frac{1}{2} s^\mathrm{T} \nabla _{xx}^{2} {\mathscr {L}}_{k}~s \\&\text{ s.t. }~~A_{k} s+h_{k}=0. \end{aligned}$$

The Lagrangian function of (QP\(_k\)) is

$$\begin{aligned} {\mathscr {L}}_\mathrm{QP} \triangleq f_{k}+\nabla f_{k}^\mathrm{T} s+ \frac{1}{2} s^\mathrm{T} \nabla _{xx}^{2} {\mathscr {L}}_{k}~s-l^\mathrm{T}(A_{k} s+h_{k}), \end{aligned}$$

where l is the Lagrange multiplier of (QP\(_k\)). If assumptions (A-1) and (A-2) hold, then (\(\mathrm{QP}_k\)) has a unique solution \((s_{k},l_{k})\) that satisfies the first-order KKT conditions of (\(\mathrm{QP}_k\)), namely:

$$\begin{aligned}&\displaystyle \nabla _{xx}^{2} {\mathscr {L}}_{k} s_{k} + \nabla f_{k} - A_{k}^\mathrm{T} l_{k}=0, \end{aligned}$$
(2.5a)
$$\begin{aligned}&\displaystyle A_{k} s_{k} +h_{k}=0. \end{aligned}$$
(2.5b)

These two equations can be written in the following matrix form:

$$\begin{aligned} \begin{bmatrix} \nabla _{xx}^{2} {{\mathscr {L}}}_{k}&\quad -A_{k}^\mathrm{T} \\ A_{k}&\quad 0 \end{bmatrix} \begin{bmatrix} s_{k} \\ l_{k} \end{bmatrix}= \begin{bmatrix} -\nabla f_{k} \\ -h_{k} \end{bmatrix}. \end{aligned}$$
(2.6)

Subtracting \(A_{k}^\mathrm{T} \lambda _{k}\) from both sides of the first equation of (2.3), we have

$$\begin{aligned} \begin{bmatrix} \nabla _{xx}^{2} {\mathscr {L}}_{k}&\quad -A_{k}^\mathrm{T} \\ A_{k}&\quad 0 \end{bmatrix} \begin{bmatrix} \widehat{x}_{k} \\ \lambda _{k}+\widehat{\lambda }_{k} \end{bmatrix}= \begin{bmatrix} -\nabla f_{k} \\ -h_{k} \end{bmatrix}, \end{aligned}$$

i.e.,

$$\begin{aligned} \begin{bmatrix} \nabla _{xx}^{2} {\mathscr {L}}_{k}&\quad -A_{k}^\mathrm{T} \\ A_{k}&\quad 0 \end{bmatrix} \begin{bmatrix} \widehat{x}_{k} \\ \lambda _{k+1} \end{bmatrix}= \begin{bmatrix} -\nabla f_{k} \\ -h_{k} \end{bmatrix}, \end{aligned}$$
(2.7)

where \(\lambda _{k+1}= \lambda _{k}+\widehat{\lambda }_{k}\).

The local SQP iterates are thus \((x_{k+1}, \lambda _{k+1})\), where \(x_{k+1}=x_{k}+s_{k}\) and \(\lambda _{k+1}=l_{k}\). If assumptions (A-1) and (A-2) hold, then \(P^{\prime }(x_{k}, \lambda _{k})\) is a nonsingular matrix, and hence from (2.6) and (2.7) we have \(s_{k}=\widehat{x}_{k}\) and \(l_{k}=\lambda _{k+1}\). In this way, the new iterate \((x_{k+1}, \lambda _{k+1})\) can be generated by solving (\(\mathrm{QP}_k\)). Further, it is proved in [9] that if \((x_{0}, \lambda _{0})\) is chosen close to the solution \((x^{*}, \lambda ^{*})\) of (EP), then the sequence \(\{(x_{k}, \lambda _{k})\}\) converges to \((x^{*}, \lambda ^{*})\) quadratically.

In the next section, this concept is used to develop a new algorithm to solve (EP) with higher-order convergence property.

3 Two-Phase-SQP Technique

Let \((s_{k}, l_{k})\) be the solution of (\(\mathrm{QP}_k\)) obtained by the process described in Sect. 2, where \(l_{k}\) is the Lagrange multiplier at the \(k\mathrm{th}\) stage. Denote

$$\begin{aligned} x_{\bar{k}}=x_{k}+s_{k}\quad \text{ and }\quad \lambda _{\bar{k}}=l_{k}. \end{aligned}$$

At this new point \((x_{\bar{k}},\lambda _{\bar{k}})\), compute \(A_{\bar{k}}\triangleq [\nabla h_{1}(x_{\bar{k}}),\nabla h_{2}(x_{\bar{k}}), \cdots , \nabla h_{m}(x_{\bar{k}})]^\mathrm{T}\) and \(\nabla _{xx}^{2}{\mathscr {L}}_{{\bar{k}}} \triangleq \nabla _{xx}^{2} {\mathscr {L}}(x_{\bar{k}},\lambda _{\bar{k}})\). Consider the following optimization problem \((\mathrm{QP}_k1)\) at \((x_{k},\lambda _{k})\):

$$\begin{aligned}&(\mathrm{QP}_k1){:}\quad \text{ Min }_{s \in {\mathbb {R}}^{n}}~~f_{k}+\big (\nabla f_{k}^\mathrm{T}+\frac{1}{2}\lambda _{k}^\mathrm{T} B_{k} \big ) s+ \frac{1}{4} s^\mathrm{T} \left( \nabla _{xx}^{2} {\mathscr {L}}_{k}+ \nabla _{xx}^{2} {\mathscr {L}}_{{\bar{k}}}\right) ~s \end{aligned}$$
(3.1a)
$$\begin{aligned}&\text{ s.t. }~~\frac{1}{2}C_{k} s+h_{k}=0, \end{aligned}$$
(3.1b)

where \(B_{k}=A_{\bar{k}}-A_{k}\) and \(C_{k}=A_{\bar{k}} +A_{k}\).

\((\mathrm{QP}_k1)\) is a convex quadratic programming problem on the tangent subspace \( \{ d~|~A(x)d=0,~ d^\mathrm{T} \nabla _{xx}^{2}{\mathscr {L}}(x,\lambda ) d>0 \}\). Assume that \(C_{k}\) has full rank. Then \((\mathrm{QP}_k1)\) has a unique solution \((s_{\bar{k}}, l_{\bar{k}})\) that satisfies the first-order KKT optimality conditions of \((\mathrm{QP}_k1)\), where \(l_{\bar{k}}\) is the Lagrange multiplier. These conditions are

$$\begin{aligned}&\displaystyle \frac{1}{2}\left( \nabla _{xx}^{2} {\mathscr {L}}_{k}+ \nabla _{xx}^{2} {\mathscr {L}}_{{\bar{k}}}\right) s_{\bar{k}}+\nabla f_{k}+ \frac{1}{2}B_{k}^\mathrm{T} \lambda _{k}- \frac{1}{2}C_{k}^\mathrm{T} l_{\bar{k}}=0, \end{aligned}$$
(3.2a)
$$\begin{aligned}&\displaystyle \frac{1}{2}C_{k} s_{\bar{k}} +h_{k}=0. \end{aligned}$$
(3.2b)

Adding \(\frac{1}{2}C_{k}^\mathrm{T} \lambda _{k}\) to both sides of (3.2a), we get

$$\begin{aligned} \frac{1}{2}\left( \nabla _{xx}^{2} {\mathscr {L}}_{k}+ \nabla _{xx}^{2} {\mathscr {L}}_{{\bar{k}}}\right) s_{\bar{k}}+\nabla f_{k}+&\frac{1}{2}B_{k}^\mathrm{T}\lambda _{k}-\frac{1}{2}C_{k}^\mathrm{T} (l_{\bar{k}}-\lambda _{k})= \frac{1}{2}C_{k}^\mathrm{T} \lambda _{k}, \end{aligned}$$
(3.3a)
$$\begin{aligned}&\frac{1}{2}C_{k} s_{\bar{k}} +h_{k}=0 . \end{aligned}$$
(3.3b)

(3.3a) and (3.3b) can be written as the following matrix equation:

$$\begin{aligned} \begin{bmatrix} \frac{1}{2}(\nabla _{xx}^{2} {\mathscr {L}}_{k}+ \nabla _{xx}^{2} {\mathscr {L}}_{{\bar{k}}})&\quad -\frac{1}{2}C_{k}^\mathrm{T} \\ \frac{1}{2}C_{k}&\quad 0 \end{bmatrix} \begin{bmatrix} s_{\bar{k}} \\ l_{\bar{k}}-\lambda _{k} \end{bmatrix}= \begin{bmatrix} -\nabla f_{k}+A_{k}^\mathrm{T} \lambda _{k} \\ -h_{k} \end{bmatrix}. \end{aligned}$$

This is equivalent to

$$\begin{aligned} \frac{1}{2} \Bigg ( \begin{bmatrix} \nabla _{xx}^{2} {\mathscr {L}}_{k}&\quad -A_{k}^\mathrm{T} \\ A_{k}&\quad 0 \end{bmatrix}+ \begin{bmatrix} \nabla _{xx}^{2} {\mathscr {L}}_{{\bar{k}}}&\quad - A_{\bar{k}}^\mathrm{T}\\ A_{\bar{k}}&\quad 0 \end{bmatrix} \Bigg ) \begin{bmatrix} s_{\bar{k}} \\ l_{\bar{k}}-\lambda _{k} \end{bmatrix}= \begin{bmatrix} -\nabla f_{k}+A_{k}^\mathrm{T} \lambda _{k} \\ -h_{k} \end{bmatrix}, \end{aligned}$$

i.e.,

$$\begin{aligned} \frac{1}{2} \big ( P^{\prime }(x_{k},\lambda _{k})+ P^{\prime }(x_{\bar{k}},\lambda _{\bar{k}})\big ) \begin{bmatrix} s_{\bar{k}} \\ l_{\bar{k}}-\lambda _{k} \end{bmatrix}= -P(x_{k}, \lambda _{k}). \end{aligned}$$
(3.4)

Denote

$$\begin{aligned} \widehat{x}_{\bar{k}}= s_{\bar{k}}\quad \text{ and }\;\widehat{\lambda }_{\bar{k}}=l_{\bar{k}}-\lambda _{k}. \end{aligned}$$

The iteration process discussed above can be summarized in the following two steps:

First step:

$$\begin{aligned} \begin{bmatrix} x_{\bar{k}} \\ \lambda _{\bar{k}} \end{bmatrix} = \begin{bmatrix} x_{k} \\ \lambda _{k} \end{bmatrix}+ \begin{bmatrix} \widehat{x}_{k} \\ \widehat{\lambda }_{k} \end{bmatrix}, \end{aligned}$$
(3.5a)

where (\(\widehat{x}_{k}\),\(\widehat{\lambda }_{k}\)) is the value of \((\alpha , \beta )\) satisfying

$$\begin{aligned} P^{\prime }(x_{k},\lambda _{k}) \begin{bmatrix} \alpha \\ \beta \end{bmatrix}= -P(x_{k}, \lambda _{k}). \end{aligned}$$
(3.5b)

Second step:

$$\begin{aligned} \begin{bmatrix} x_{k+1} \\ \lambda _{k+1} \end{bmatrix} = \begin{bmatrix} x_{k} \\ \lambda _{k} \end{bmatrix}+ \begin{bmatrix} \widehat{x}_{\bar{k}} \\ \widehat{\lambda }_{\bar{k}} \end{bmatrix}, \end{aligned}$$
(3.6a)

where (\(\widehat{x}_{\bar{k}}\),\(\widehat{\lambda }_{\bar{k}}\)) is the value of \((\alpha , \beta )\) satisfying

$$\begin{aligned} \frac{1}{2} \big ( P^{\prime }(x_{k},\lambda _{k})+ P^{\prime }(x_{\bar{k}},\lambda _{\bar{k}})\big ) \begin{bmatrix} \alpha \\ \beta \end{bmatrix}= -P(x_{k}, \lambda _{k}). \end{aligned}$$
(3.6b)

Since \(\lambda _{k+1}=\lambda _{k} +\widehat{ \lambda }_{\bar{k}}\) by (3.6a) and \(\widehat{\lambda }_{\bar{k}}=l_{\bar{k}}-\lambda _{k}\), we get \(\lambda _{k+1}=\lambda _{k}+(l_{\bar{k}}-\lambda _{k})=l_{\bar{k}}\). Hence the new iterate \((x_{k+1}, \lambda _{k+1})\) can be generated by solving the KKT optimality conditions (3.2) or their equivalent form (3.5)-(3.6).

We state the proposed two-phase-SQP (TP-SQP) scheme in its simplest form as Algorithm 1:

Algorithm 1 (TP-SQP) Starting from an initial pair \((x_{0}, \lambda _{0})\), repeat the first step (3.5) and the second step (3.6) until \(\Vert P(x_{k}, \lambda _{k})\Vert \) is sufficiently small.
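In code, Algorithm 1 amounts to one extra linear solve per iteration with the averaged KKT matrix. Below is a minimal sketch of a single TP-SQP iteration, reusing the illustrative `kkt_matrix` helper from the Sect. 2 sketch (again an illustration, not the authors' implementation):

```python
import numpy as np

def tp_sqp_step(x, lam, grad_f, h, jac_h, hess_L):
    """One TP-SQP iteration: the Newton phase (3.5) followed by the
    averaged phase (3.6). kkt_matrix is the helper sketched in Sect. 2."""
    n = x.size
    P = np.concatenate([grad_f(x) - jac_h(x).T @ lam, h(x)])  # P(x_k, lam_k)
    K1 = kkt_matrix(x, lam, jac_h, hess_L)                    # P'(x_k, lam_k)
    d = np.linalg.solve(K1, -P)                               # first step (3.5b)
    xb, lamb = x + d[:n], lam + d[n:]                         # (x_kbar, lam_kbar)
    K2 = kkt_matrix(xb, lamb, jac_h, hess_L)                  # P'(x_kbar, lam_kbar)
    d = np.linalg.solve(0.5 * (K1 + K2), -P)                  # second step (3.6b)
    return x + d[:n], lam + d[n:]
```

Iterating `tp_sqp_step` until \(\Vert P(x_{k},\lambda _{k})\Vert \) falls below a tolerance reproduces Algorithm 1. Only \(P(x_{k},\lambda _{k})\) is reused between the two phases, so the extra cost over one Newton step is essentially one further derivative evaluation and one further linear solve, consistent with the operation count discussed in Sect. 4.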

For solving the equality-constrained optimization problem, a good initial estimate \(x_{0}\) for \(x^{*}\) can be used to obtain a good initial estimate \(\lambda _{0}\) for the optimal multiplier vector \(\lambda ^{*}\), which, in the light of [9], may be chosen as follows:

$$\begin{aligned} \lambda _{0}=\left[ A(x_{0})A(x_{0})^\mathrm{T}\right] ^{-1} A(x_{0})\nabla f(x_{0}). \end{aligned}$$
(3.7)

\(\lambda _{0}\) can be made arbitrarily close to \(\lambda ^{*}\) by choosing \(x_{0}\) close to \(x^{*}\).
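In the same illustrative NumPy setting as above, (3.7) is a single linear solve (the function name is an illustrative choice):

```python
import numpy as np

def initial_multiplier(x0, grad_f, jac_h):
    """Least-squares multiplier estimate (3.7): lam0 = (A A^T)^{-1} A grad_f."""
    A = jac_h(x0)
    return np.linalg.solve(A @ A.T, A @ grad_f(x0))
```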

4 Convergence Analysis

Theorem 4.1

If P is a thrice Fréchet differentiable function and \((x_{0},\lambda _{0})\) is sufficiently close to the solution \((x^{*},\lambda ^{*})\) of (EP), then the sequence \(\{(x_{k}, \lambda _{k})\}\) generated by Algorithm 1 converges cubically to \((x^{*}, \lambda ^{*})\).

Proof

Let \(u_{k}\) and \(v_{k}\) be the \((n+m)\)-dimensional vectors \(u_{k} \triangleq (x_{k}, \lambda _{k})\) and \(v_{k} \triangleq (x_{\bar{k}}, \lambda _{\bar{k}})\), respectively. Then steps (3.5) and (3.6) can be written as follows.

$$\begin{aligned} v_{k}&= u_{k} - \left[ P^{\prime }(u_{k}) \right] ^{-1} P(u_{k}), \end{aligned}$$
(4.1a)
$$\begin{aligned} u_{k+1}&=u_{k}- 2 \left[ P^{\prime }(u_{k})+P^{\prime }(v_{k})\right] ^{-1} P(u_{k}). \end{aligned}$$
(4.1b)

We denote \(\alpha =(x^{*}, \lambda ^{*})\); here \(P:{\mathbb {R}}^{n+m}\rightarrow {\mathbb {R}}^{n+m}\). Using the Taylor expansion of \(P(\alpha )\) about \(u_{k}\), we have

$$\begin{aligned} P(\alpha )= & {} P(u_{k})+P^{\prime }(u_{k})(\alpha - u_{k})+\frac{1}{2!}P^{\prime \prime }(u_{k})(\alpha - u_{k})^{2}\nonumber \\&+ \frac{1}{3!}P^{\prime \prime \prime }(u_{k})(\alpha - u_{k})^{3} + \cdots \end{aligned}$$
(4.2)

Subtracting \(\alpha \) from both sides of (4.1b) and denoting \(e_{k}=u_{k}-\alpha \), we have

$$\begin{aligned}&\displaystyle e_{k+1}=e_{k}-2\left[ P^{\prime }(u_{k})+P^{\prime }(v_{k})\right] ^{-1} P(u_{k}), \nonumber \\&\displaystyle \left[ P^{\prime }(u_{k})+P^{\prime }(v_{k})\right] e_{k+1}=\left[ P^{\prime }(u_{k})+P^{\prime }(v_{k})\right] e_{k}-2P(u_{k}). \end{aligned}$$
(4.3)

Since \((x^{*}, \lambda ^{*})\) is the solution of (EP), \(P(\alpha )=P(x^{*}, \lambda ^{*})=0\). Hence from (4.2),

$$\begin{aligned} P(u_{k})&=-P^{\prime }(u_{k})(\alpha - u_{k})-\frac{1}{2!}P^{\prime \prime }(u_{k})(\alpha - u_{k})^{2}\\&\qquad -\frac{1}{3!} P^{\prime \prime \prime }(u_{k})(\alpha - u_{k})^{3} + \cdots \\&=P^{\prime }(u_{k})e_{k}-\frac{1}{2!}P^{\prime \prime }(u_{k})e_{k}^{2}+ \frac{1}{3!}P^{\prime \prime \prime }(u_{k})e_{k}^{3} +\cdots \end{aligned}$$
$$\begin{aligned}&\quad \;[P^{\prime }(u_{k})]^{-1}P(u_{k})\nonumber \\&=e_{k}-[P^{\prime }(u_{k})]^{-1}P^{\prime \prime }(u_{k})\frac{1}{2!}e_{k}^{2}+ [P^{\prime }(u_{k})]^{-1}P^{\prime \prime \prime }(u_{k})\frac{1}{3!}e_{k}^{3} +\cdots \end{aligned}$$
(4.4)

Expanding \(P^{\prime }(v_{k})\) about \(u_{k}\), we get

$$\begin{aligned} P^{\prime }(v_{k})&= P^{\prime }(u_{k})+P^{\prime \prime }(u_{k})(v_{k}-u_{k})+\frac{1}{2!} P^{\prime \prime \prime }(u_{k})(v_{k}-u_{k})^{2}+ \cdots \nonumber \\&= P^{\prime }(u_{k})-P^{\prime \prime }(u_{k})\left[ P^{\prime }(u_{k})^{-1}P(u_{k})\right] +\frac{1}{2!}P^{\prime \prime \prime }(u_{k})\left[ P^{\prime }(u_{k})^{-1}P(u_{k})\right] ^{2}+\cdots \nonumber \\&\qquad \text{(From } \text{(4.1a)) } \nonumber \\&= P^{\prime }(u_{k})-P^{\prime \prime }(u_{k})\big [e_{k}-[P^{\prime }(u_{k})]^{-1} P^{\prime \prime }(u_{k})\frac{1}{2!}e_{k}^{2} \nonumber \\&\quad +[P^{\prime }(u_{k})]^{-1}P^{\prime \prime \prime }(u_{k})\frac{1}{3!}e_{k}^{3} +\cdots \big ]+\frac{1}{2!}P^{\prime \prime \prime }(u_{k})e_{k}^{2}+\cdots ~~ \text{(from } \text{(4.4)) } \end{aligned}$$
(4.5)

The right-hand side of (4.3) gives

$$\begin{aligned} \left[ P^{\prime }(u_{k})+P^{\prime }(v_{k})\right] e_{k}-2P(u_{k})=&\,\frac{1}{2!}P^{\prime \prime } (u_{k})\left[ P^{\prime }(u_{k})\right] ^{-1}P^{\prime \prime }(u_{k})e_{k}^{3} \nonumber \\&+\frac{1}{2!}P^{\prime \prime \prime }(u_{k})e_{k}^{3}- \frac{2}{3!}P^{\prime \prime \prime }(u_{k}) e_{k}^{3}+ \cdots \end{aligned}$$
(4.6)

Using (4.6), expression (4.3) can be written as

$$\begin{aligned} \left[ P^{\prime }(u_{k})+P^{\prime }(v_{k})\right] e_{k+1}=\mathscr {B}_{k} e_{k}^{3} + O(\vert \vert e_{k} \vert \vert ^{4}), \end{aligned}$$
(4.7)

where

$$\begin{aligned} \mathscr {B}_{k}=\bigg [ \frac{1}{2!}P^{\prime \prime }(u_{k})\left[ P^{\prime }(u_{k}) \right] ^{-1}P^{\prime \prime }(u_{k})+\frac{1}{3!}P^{\prime \prime \prime }(u_{k}) \bigg ]. \end{aligned}$$

This shows that Algorithm 1 has cubic-order convergence.
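Theorem 4.1 can also be checked empirically: near the solution, the ratios \(\Vert e_{k+1}\Vert / \Vert e_{k}\Vert ^{3}\) should remain bounded. A small illustrative helper, assuming the iterates and the known solution are collected as NumPy vectors:

```python
import numpy as np

def cubic_order_ratios(iterates, u_star):
    """Ratios ||e_{k+1}|| / ||e_k||^3 for e_k = u_k - u_star; these should
    stay bounded for a cubically convergent sequence (Theorem 4.1)."""
    e = [np.linalg.norm(np.asarray(u) - u_star) for u in iterates]
    return [e[k + 1] / e[k] ** 3 for k in range(len(e) - 1) if e[k] > 0]
```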

One may observe the following theoretical advantages of the TP-SQP scheme.

  • Without computing higher-order derivatives, higher-order convergence is achieved in comparison with the Newton-SQP framework.

  • The TP-SQP scheme involves the same order of arithmetic operations per iteration as the Newton-SQP method.

One may associate a line search technique with each iteration of Algorithm 1 with the help of a merit function, which decides whether a trial step should be accepted. In the next section, we limit our discussion to the \(l_{1}\) merit function for the proposed scheme and discuss how to choose the penalty parameter \(\mu \) so that the chosen direction is a descent direction in every iteration.

5 A Note on \(l_{1}\)-Merit Function for the Proposed Scheme

The \(l_{1}\)-merit function \(\phi _{1} : {\mathbb {R}}^ {n+1} \rightarrow {\mathbb {R}}\) is given by

$$\begin{aligned} \phi _{1}(x; \mu )=f(x)+ \mu \Vert h(x) \Vert _{1}, \end{aligned}$$

where \(\mu \) is the penalty parameter and \(\Vert \cdot \Vert _{1} \) is the \(l_{1}\) norm. In the TP-SQP method, the full step has unit length. A step length \(\alpha _{k} \in (0,1]\) is accepted if the following sufficient decrease condition holds:

$$\begin{aligned} \phi _{1}(x_{k}+\alpha _{k}p_{k}; \mu _{k}) \leqslant \phi _{1}(x_{k}; \mu _{k})+\eta \alpha _{k} {\mathscr {D}}(\phi _{1}(x_{k}; \mu );p_{k}), \end{aligned}$$

where \(\eta \in (0,1)\) and \({\mathscr {D}}(\phi _{1}(x_{k}; \mu );p_{k})\) denotes the directional derivative of \(\phi _{1}\) at \(x_{k}\) in the direction \(p_{k}\). This requirement is analogous to the Armijo condition of unconstrained optimization, provided that \(p_{k}\) is a descent direction, that is, \({\mathscr {D}}(\phi _{1}(x_{k}; \mu );p_{k}) \leqslant 0\). The descent property holds if \(\mu \) is chosen sufficiently large, as shown in the following theorem:

Theorem 5.1

For the TP-SQP scheme, if \(\max ~ \{ \Vert A_{k}s_{\bar{k}} \Vert _{1} , \Vert \frac{1}{2}(A_{\bar{k}}- A_{k}) s_{\bar{k}} \Vert _{1} \} \leqslant \Vert h_{k} \Vert _{1}\), then \( \mu > 4 \Vert \lambda _{k} \Vert _{\infty } + 2\Vert \lambda _{k+1} \Vert _{\infty }~\) implies

$$\begin{aligned} {\mathscr {D}}(\phi _{1}(x_{k}; \mu );s_{\bar{k}}) \leqslant 0. \end{aligned}$$

Proof

$$\begin{aligned}&\phi _{1}(x_{k}+\alpha s_{\bar{k}}; \mu ) - \phi _{1}(x_{k};\mu ) \\= & {} f(x_{k}+\alpha s_{\bar{k}}) - f_{k}+ \mu \Vert h(x_{k} + \alpha s_{\bar{k}}) \Vert _{1} - \mu \Vert h_{k} \Vert _{1} \\= & {} ( f_{k}+ \alpha \nabla f_{k}^\mathrm{T} s_{\bar{k}}+ \frac{1}{2} \alpha ^{2} s_{\bar{k}}^\mathrm{T} \nabla ^{2}f(x_{k}+t_{1} s_{\bar{k}}) s_{\bar{k}} )- f_{k} \\&+\, \mu \Vert h_{k} + \alpha A_{k} s_{\bar{k}} + \frac{1}{2} \alpha ^{2} s_{\bar{k}}^\mathrm{T} \nabla ^{2} h(x_{k}+t_{2}s_{\bar{k}}) s_{\bar{k}}\Vert _{1} - \mu \Vert h_{k} \Vert _{1} \\&\quad ( \text{ where } t_{1},t_{2} \in (0,\alpha ) ) \\\leqslant & {} \alpha \nabla f_{k}^\mathrm{T} s_{\bar{k}}+ \mu \Vert h_{k} + \alpha A_{k} s_{\bar{k}} \Vert _{1} - \mu \Vert h_{k} \Vert _{1} + \gamma \alpha ^{2} \Vert s_{\bar{k}} \Vert _{1}^{2} \\&\quad (\gamma \hbox { is an upper bound on the second derivatives of }f\hbox { and }h) \\= & {} \alpha \nabla f_{k}^\mathrm{T} s_{\bar{k}}+ \mu \Vert h_{k} + \alpha (-2 h_{k}- A_{\bar{k}} s_{\bar{k}}) \Vert _{1} - \mu \Vert h_{k} \Vert _{1} + \gamma \alpha ^{2} \Vert s_{\bar{k}} \Vert _{1}^{2} \\&\quad \left( \text{ since } \text{ from } \text{(3.3b) },\; \frac{1}{2}(A_{k}+ A_{\bar{k}}) s_{\bar{k}} +h_{k}=0\right) \\= & {} \alpha \nabla f_{k}^\mathrm{T} s_{\bar{k}}+ \mu \Vert (1- \alpha )h_{k} - \alpha ( h_{k}+ A_{\bar{k}} s_{\bar{k}}) \Vert _{1} - \mu \Vert h_{k} \Vert _{1} \\&+\, \gamma \alpha ^{2} \Vert s_{\bar{k}} \Vert _{1}^{2} \\\leqslant & {} \alpha \nabla f_{k}^\mathrm{T} s_{\bar{k}} + \mu (1- \alpha ) \Vert h_{k}\Vert _{1} + \mu \alpha \Vert h_{k}+ A_{\bar{k}} s_{\bar{k}}\Vert _{1} -\mu \Vert h_{k} \Vert _{1} \\&+\, \gamma \alpha ^{2} \Vert s_{\bar{k}} \Vert _{1}^{2} \quad (\text{ for }\; 0< \alpha \leqslant 1). \end{aligned}$$

This implies

$$\begin{aligned} \lim _{\alpha \rightarrow 0^{+}} \frac{\phi _{1}(x_{k}+\alpha s_{\bar{k}}; \mu )- \phi _{1}(x_{k};\mu )}{\alpha } \leqslant \nabla f_{k}^\mathrm{T} s_{\bar{k}} - \mu \Vert h_{k} \Vert _{1} + \mu \Vert h_{k}+ A_{\bar{k}} s_{\bar{k}} \Vert _{1}, \end{aligned}$$

i.e.,

$$\begin{aligned} {\mathscr {D}}(\phi _{1}(x_{k}; \mu ); s_{\bar{k}} ) \leqslant \nabla f_{k}^\mathrm{T} s_{\bar{k}} - \mu \Vert h_{k} \Vert _{1} + \mu \Vert h_{k}+ A_{\bar{k}} s_{\bar{k}} \Vert _{1}. \end{aligned}$$

Multiplying both sides of (3.2a) on the left by \(s_{\bar{k}}^\mathrm{T}\) and using \(l_{\bar{k}}= \lambda _{k+1}\), the above inequality becomes

$$\begin{aligned}&\quad {\mathscr {D}}(\phi _{1}(x_{k}; \mu ); s_{\bar{k}} ) \\&\leqslant -s_{\bar{k}}^\mathrm{T} \big ( \frac{1}{2}(\nabla ^{2}_{xx} {\mathscr {L}}_{k}+ \nabla ^{2}_{xx} {\mathscr {L}}_{{\bar{k}}}) \big ) s_{\bar{k}} + s_{\bar{k}}^\mathrm{T} \big ( \frac{1}{2} (A_{k}^\mathrm{T}+ A_{\bar{k}}^\mathrm{T})(\lambda _{k+1}-\lambda _{k}) \big ) \\&+ s_{\bar{k}}^\mathrm{T} A_{k}^\mathrm{T} \lambda _{k} - \mu \Vert h_{k} \Vert _{1} + \mu \Vert h_{k} + A_{\bar{k}} s_{\bar{k}} \Vert _{1}. \end{aligned}$$

Using (3.3b), we get

$$\begin{aligned} {\mathscr {D}}(\phi _{1}(x_{k}; \mu ); s_{\bar{k}} )&\leqslant -s_{\bar{k}}^\mathrm{T} \left( \frac{1}{2}\left( \nabla ^{2}_{xx} {\mathscr {L}}_{k}+ \nabla ^{2}_{xx} {\mathscr {L}}_{{\bar{k}}}\right) \right) s_{\bar{k}}-h_{k}^\mathrm{T}\lambda _{k+1}+h_{k}^\mathrm{T} \lambda _{k}+(A_{k} s_{\bar{k}})^\mathrm{T} \lambda _{k} \\&\quad - \mu \Vert h_{k} \Vert _{1} + \mu \Vert \frac{1}{2}(A_{\bar{k}}- A_{k}) s_{\bar{k}} \Vert _{1} \\&\leqslant -s_{\bar{k}}^\mathrm{T} \left( \frac{1}{2}\left( \nabla ^{2}_{xx} {\mathscr {L}}_{k}+ \nabla ^{2}_{xx} {\mathscr {L}}_{{\bar{k}}}\right) \right) s_{\bar{k}} + \Vert h_{k} \Vert _{1} \Vert \lambda _{k+1} \Vert _{\infty } + \Vert h_{k} \Vert _{1} \Vert \lambda _{k} \Vert _{\infty } \\&\quad + \Vert A_{k} s_{\bar{k}} \Vert _{1} \Vert \lambda _{k} \Vert _{\infty } - \mu \Vert h_{k} \Vert _{1} + \frac{\mu }{2} \Vert h_{k} \Vert _{1}. \end{aligned}$$

Using the given condition \(\max ~ \{ \Vert A_{k} s_{\bar{k}} \Vert _{1} , \Vert \frac{1}{2}(A_{\bar{k}}- A_{k}) s_{\bar{k}} \Vert _{1} \} \leqslant \Vert h_{k} \Vert _{1}\), we get

$$\begin{aligned} {\mathscr {D}}(\phi _{1}(x_{k}; \mu ); s_{\bar{k}} )&\leqslant - s_{\bar{k}}^\mathrm{T} \left( \frac{1}{2}(\nabla ^{2}_{xx} {\mathscr {L}}_{k}+ \nabla ^{2}_{xx} {\mathscr {L}}_{{\bar{k}}}) \right) s_{\bar{k}} \\&\quad + \Vert h_{k} \Vert _{1} \left( \Vert \lambda _{k+1} \Vert _{\infty } + 2 \Vert \lambda _{k} \Vert _{\infty } - \frac{\mu }{2} \right) \\&\leqslant 0 \qquad \left( \text{ for } ~ \mu > 2 \Vert \lambda _{k+1} \Vert _{\infty } + 4 \Vert \lambda _{k} \Vert _{\infty }\right) . \end{aligned}$$

It follows that, under the stated assumption, \(s_{\bar{k}}\) is a descent direction for \(\phi _{1}\) if \(\nabla ^{2}_{xx} {\mathscr {L}}_{{\bar{k}}}\) is positive definite and \(\mu > 2 \Vert \lambda _{k+1} \Vert _{\infty } + 4 \Vert \lambda _{k} \Vert _{\infty }\).
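In practice, Theorem 5.1 translates into a simple backtracking rule: pick \(\mu \) beyond the threshold \(2\Vert \lambda _{k+1}\Vert _{\infty } + 4\Vert \lambda _{k}\Vert _{\infty }\), bound the directional derivative as in the display following the proof's "i.e.," step, and halve the step until the decrease condition of this section holds. A hedged sketch; the safeguard constants and the callables are illustrative choices, not taken from the paper:

```python
import numpy as np

def phi1(x, mu, f, h):
    """l1 merit function: phi_1(x; mu) = f(x) + mu * ||h(x)||_1."""
    return f(x) + mu * np.linalg.norm(h(x), 1)

def line_search_l1(x, s, lam, lam_next, f, h, grad_f, A_bar, eta=1e-4):
    """Backtracking on the decrease condition of Sect. 5. The penalty
    parameter follows Theorem 5.1 (with an illustrative +1 margin), and
    D is the bound on the directional derivative derived in the proof.
    A_bar is the constraint Jacobian at the intermediate point x_kbar."""
    mu = 2 * np.linalg.norm(lam_next, np.inf) + 4 * np.linalg.norm(lam, np.inf) + 1.0
    D = grad_f(x) @ s - mu * np.linalg.norm(h(x), 1) \
        + mu * np.linalg.norm(h(x) + A_bar @ s, 1)
    alpha = 1.0
    while phi1(x + alpha * s, mu, f, h) > phi1(x, mu, f, h) + eta * alpha * D \
            and alpha > 1e-12:   # safeguard against an endless loop
        alpha *= 0.5
    return alpha
```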

6 Numerical Examples

The following two examples are examined with the proposed algorithm and the results analyzed. The algorithm terminates when the norm of the gradient of the Lagrangian becomes sufficiently close to zero.

Example 6.1

$$\begin{aligned} \text{(P1): }&\hbox { min}\; x_{1}+x_{2} \\&\text{ s.t. }\;x_{1}^{2}+x_{2}^{2}-2=0 \end{aligned}$$

Example 6.2

$$\begin{aligned} \text{(P2) }{:}&\hbox { min}\; (\sqrt{3})^{3} x_{1}x_{2}x_{3}\\&\text{ s.t. }\; x_{1}^{2}+x_{2}^{2}+x_{3}^{2}-1=0 \end{aligned}$$
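As an illustration, (P1) can be fed to the sketches of Sects. 2-3 ((P2) is handled analogously). The derivatives below follow directly from \(f(x)=x_{1}+x_{2}\) and \(h(x)=x_{1}^{2}+x_{2}^{2}-2\); the starting point and the driver loop are illustrative choices, not the paper's exact test setup:

```python
import numpy as np

# Problem (P1): min x1 + x2  s.t.  x1^2 + x2^2 - 2 = 0.
f      = lambda x: x[0] + x[1]
grad_f = lambda x: np.array([1.0, 1.0])
h      = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 2.0])
jac_h  = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]]])
# L = f - lam^T h, so for (P1) the Hessian of L is -2 * lam_1 * I.
hess_L = lambda x, lam: -2.0 * lam[0] * np.eye(2)

x = np.array([-0.5, -0.8])                  # illustrative starting point
lam = initial_multiplier(x, grad_f, jac_h)  # lambda_0 from (3.7)
for _ in range(20):
    if np.linalg.norm(np.concatenate([grad_f(x) - jac_h(x).T @ lam, h(x)])) < 1e-15:
        break
    x, lam = tp_sqp_step(x, lam, grad_f, h, jac_h, hess_L)
# The KKT point of (P1) with minimal objective is x* = (-1, -1), lambda* = -1/2.
```

Here `initial_multiplier` and `tp_sqp_step` are the illustrative helpers sketched in Sects. 2-3.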

(P1) and (P2) are solved in MATLAB R2013b by the Newton-SQP method and the TP-SQP method with tolerance \(10^{-15}\) and different initial points. The value of \(\lambda _{0}\) is chosen using (3.7). The number of iterations required by both methods is provided in Table 1. Since the proposed scheme has the local cubic-order convergence property, the number of iterations it needs to attain the solution is expected to be smaller than that of the Newton-SQP scheme, which converges quadratically. Table 1 meets this expectation.

Table 1 Comparison between the number of iterations for Newton-SQP and TP-SQP

7 Conclusion

In this paper, we have proposed a local scheme with a higher-order convergence property for solving nonlinear equality-constrained optimization problems. Without computing higher-order derivatives, the TP-SQP scheme achieves a higher order of convergence than the Newton-SQP method. However, since two quadratic subproblems are solved in each iteration, from the viewpoint of software implementation the TP-SQP scheme is preferable to the Newton-SQP method on those optimization problems for which it takes roughly half as many iterations (or fewer) as the latter. In the future, in the light of the TP-SQP scheme, one may propose schemes with order of convergence higher than three.