1 Introduction

The sequential quadratic programming (SQP) method is one of the most successful methods for solving constrained nonlinear optimization problems. It is an iterative procedure that generates a sequence of points (not necessarily feasible), obtained by solving quadratic programming subproblems, and converges to a Karush-Kuhn-Tucker (KKT) point. The idea was first proposed by Wilson [1] in 1963, and since then the SQP method has been studied extensively by many researchers (see [2–8]). The reader may consult Boggs [9], Gould et al. [10], and Schittkowski et al. [11] for good reviews of the SQP algorithms discussed so far. Some SQP methods use a convex quasi-Newton approximation, which slows the algorithm on large-scale problems, whereas others employ the exact Hessian of the Lagrangian and are therefore Newton-like methods. Under suitable assumptions, the Newton version of the SQP algorithm converges to the minimum point without a line search or additional parameters. The Newton-SQP framework amounts to solving a Newton system derived from the KKT conditions, in which case it is often difficult to choose an initial iterate close enough to the true solution to guarantee convergence. The SQP methods discussed so far exhibit at most superlinear or quadratic local convergence. Over the past few decades, researchers have shown interest (see [12–16]) in iterative algorithms with higher-order convergence. We also note that equality-constrained optimization has emerged as a branch of special interest within constrained optimization theory (see [17, 18]).

In this paper, we propose a two-phase-SQP method for equality-constrained optimization problems that attains local cubic-order convergence under mild assumptions. We also discuss conditions under which a line search based on the \(l_{1}\) merit function can be associated with the scheme.

Section 2 reviews the Newton-SQP method. The two-phase-SQP scheme is proposed in Sect. 3, followed by its convergence analysis in Sect. 4. A note on the \(l_{1}\) merit function for the new scheme is given in Sect. 5. Numerical examples are provided in Sect. 6, and some concluding remarks in Sect. 7.

2 Background: Existing Newton-SQP Method

Consider the following optimization problem with equality constraints.

$$\begin{aligned} \text{(EP): }\quad&\text{ Min }\;f(x) \end{aligned}$$
(2.1a)
$$\begin{aligned}&\text{ s.t. }\; h(x)=0, \end{aligned}$$
(2.1b)

where \(f : {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) and \(h: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^m\) are smooth functions. The motivation behind the local SQP approach is to model (EP) at the current iterate \(x_k\) by a quadratic programming subproblem, then use the minimizer of the subproblem to define a new iterate \(x_{k+1}\).

The Lagrangian function for problem (EP) is \({\mathscr {L}} (x,\lambda )=f(x)-\lambda ^\mathrm{T}h(x)\), \(\lambda \in {\mathbb {R}}^{m}\). Denote \(A(x)=[ \nabla h_{1}(x), \nabla h_{2}(x), \cdots , \nabla h_{m}(x)]^\mathrm{T}\), where \(h_{i}(x)\) is the \(i\mathrm{th}\) component of h(x). The KKT system for (EP) is

$$\begin{aligned} P(x, \lambda ) \triangleq \begin{bmatrix} \nabla f(x) - A(x)^\mathrm{T} \lambda \\ h(x) \end{bmatrix} =0. \end{aligned}$$
(2.2)

Any solution \((x^{*}, \lambda ^{*})\) of (EP) at which \(A(x^{*})\) has full rank satisfies (2.2). System (2.2) consists of \(n+m\) equations in the \(n+m\) unknowns x and \(\lambda \). Let \(P^{\prime } (x_{k}, \lambda _{k})\) denote the first-order Fréchet derivative of P at \((x_{k}, \lambda _{k})\). The next iterate \((x_{k+1}, \lambda _{k+1})\) is given by \(x_{k+1}=x_{k}+\widehat{x}_{k}\) and \(\lambda _{k+1}=\lambda _{k}+\widehat{\lambda }_{k}\), where \((\widehat{x}_{k}, \widehat{\lambda }_{k})\) is the value of \((\alpha , \beta ) \in {\mathbb {R}}^n \times {\mathbb {R}}^m\) satisfying the following system:

$$\begin{aligned} P^{\prime }(x_{k},\lambda _{k}) \begin{bmatrix} \alpha \\ \beta \end{bmatrix}= -P(x_{k}, \lambda _{k}), \end{aligned}$$

which is the same as

$$\begin{aligned} \begin{bmatrix} \nabla _{xx}^{2} {\mathscr {L}}_{k}&\quad -A_{k}^\mathrm{T} \\ A_{k}&\quad 0 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix}= \begin{bmatrix} -\nabla f_{k} + A_{k}^\mathrm{T} \lambda _{k} \\ -h_{k} \end{bmatrix}, \end{aligned}$$
(2.3)

where \({\mathscr {L}}_{k} \triangleq {\mathscr {L}}(x_{k}, \lambda _{k})\), \(f_{k} \triangleq f(x_{k})\), \(\nabla f_{k} \triangleq \nabla f(x_{k})\), \(h_{k} \triangleq h(x_{k})\), and \(A_{k} \triangleq A(x_{k})\).

The Newton step generated from the iterate \((x_{k}, \lambda _{k})\) is thus given by

$$\begin{aligned} \begin{bmatrix} x_{k+1} \\ \lambda _{k+1} \end{bmatrix} = \begin{bmatrix} x_{k}\\ \lambda _{k} \end{bmatrix} + \begin{bmatrix} \widehat{x}_{k}\\ \widehat{\lambda }_{k} \end{bmatrix}, \end{aligned}$$
(2.4)

where \(\widehat{x}_{k}\) and \(\widehat{\lambda }_{k}\) are obtained by solving (2.3). If the following assumptions hold, then (2.3) has a unique solution (see [19]).

Assumptions

(A-1) The constraint Jacobian matrix A(x) has full rank.

(A-2) The matrix \(\nabla _{xx}^{2} {\mathscr {L}}(x, \lambda )\) is positive definite on the tangent space of the constraints, i.e., \(d^\mathrm{T} \nabla _{xx}^{2} {\mathscr {L}}(x, \lambda ) d>0\) for all \(d \in \{d \in {\mathbb {R}}^{n} ~|~ A(x)d=0, d \ne 0 \}\).
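Under assumptions (A-1) and (A-2), one local Newton-SQP step reduces to a single linear solve with the KKT matrix of (2.3). The following minimal NumPy sketch illustrates this; the callables `grad_f`, `h`, `jac_h`, and `hess_L` (returning \(\nabla f(x)\), h(x), A(x), and \(\nabla _{xx}^{2} {\mathscr {L}}(x,\lambda )\)) are illustrative assumptions, not part of the original presentation:

```python
import numpy as np

def kkt_matrix(x, lam, jac_h, hess_L):
    """Assemble P'(x, lam) from (2.3): [[hess_L, -A^T], [A, 0]]."""
    A = jac_h(x)          # constraint Jacobian A(x), shape (m, n)
    H = hess_L(x, lam)    # Hessian of the Lagrangian, shape (n, n)
    m = A.shape[0]
    return np.block([[H, -A.T], [A, np.zeros((m, m))]])

def newton_sqp_step(x, lam, grad_f, h, jac_h, hess_L):
    """One local Newton-SQP step: solve (2.3) and update via (2.4)."""
    n = x.size
    rhs = -np.concatenate([grad_f(x) - jac_h(x).T @ lam, h(x)])  # -P(x, lam), cf. (2.2)
    d = np.linalg.solve(kkt_matrix(x, lam, jac_h, hess_L), rhs)
    return x + d[:n], lam + d[n:]
```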

Consider the following quadratic programming problem as an approximate model of (EP) at \((x_{k}, \lambda _{k})\):

$$\begin{aligned} (\mathrm{QP}_k){:}\quad&\text{ Min }_{s \in {\mathbb {R}}^{n}}~~f_{k}+\nabla f_{k}^\mathrm{T} s+ \frac{1}{2} s^\mathrm{T} \nabla _{xx}^{2} {\mathscr {L}}_{k}~s \\&\text{ s.t. }~~A_{k} s+h_{k}=0. \end{aligned}$$

The Lagrangian function of (QP\(_k\)) is

$$\begin{aligned} {\mathscr {L}}_\mathrm{QP} \triangleq f_{k}+\nabla f_{k}^\mathrm{T} s+ \frac{1}{2} s^\mathrm{T} \nabla _{xx}^{2} {\mathscr {L}}_{k}~s-l^\mathrm{T}(A_{k} s+h_{k}), \end{aligned}$$

where l is the Lagrange multiplier of (QP\(_k\)). If assumptions (A-1) and (A-2) hold, then (\(\mathrm{QP}_k\)) has a unique solution \((s_{k},l_{k})\) that satisfies the first-order KKT conditions of (\(\mathrm{QP}_k\)), namely:

$$\begin{aligned}&\displaystyle \nabla _{xx}^{2} {\mathscr {L}}_{k} s_{k} + \nabla f_{k} - A_{k}^\mathrm{T} l_{k}=0, \end{aligned}$$
(2.5a)
$$\begin{aligned}&\displaystyle A_{k} s_{k} +h_{k}=0. \end{aligned}$$
(2.5b)

These two equations can be written in the following matrix form:

$$\begin{aligned} \begin{bmatrix} \nabla _{xx}^{2} {{\mathscr {L}}}_{k}&\quad -A_{k}^\mathrm{T} \\ A_{k}&\quad 0 \end{bmatrix} \begin{bmatrix} s_{k} \\ l_{k} \end{bmatrix}= \begin{bmatrix} -\nabla f_{k} \\ -h_{k} \end{bmatrix}. \end{aligned}$$
(2.6)

Subtracting \(A_{k}^\mathrm{T} \lambda _{k}\) from both sides of the first equation of (2.3), we have

$$\begin{aligned} \begin{bmatrix} \nabla _{xx}^{2} {\mathscr {L}}_{k}&\quad -A_{k}^\mathrm{T} \\ A_{k}&\quad 0 \end{bmatrix} \begin{bmatrix} \widehat{x}_{k} \\ \lambda _{k}+\widehat{\lambda }_{k} \end{bmatrix}= \begin{bmatrix} -\nabla f_{k} \\ -h_{k} \end{bmatrix}, \end{aligned}$$

i.e.,

$$\begin{aligned} \begin{bmatrix} \nabla _{xx}^{2} {\mathscr {L}}_{k}&\quad -A_{k}^\mathrm{T} \\ A_{k}&\quad 0 \end{bmatrix} \begin{bmatrix} \widehat{x}_{k} \\ \lambda _{k+1} \end{bmatrix}= \begin{bmatrix} -\nabla f_{k} \\ -h_{k} \end{bmatrix}, \end{aligned}$$
(2.7)

where \(\lambda _{k+1}= \lambda _{k}+\widehat{\lambda }_{k}\).

The local SQP iterates are thus \((x_{k+1}, \lambda _{k+1})\), where \(x_{k+1}=x_{k}+s_{k}\) and \(\lambda _{k+1}=l_{k}\). If assumptions (A-1) and (A-2) hold, then \(P^{\prime }(x_{k}, \lambda _{k})\) is a nonsingular matrix, and hence from (2.6) and (2.7) we have \(s_{k}=\widehat{x}_{k}\) and \(l_{k}=\lambda _{k+1}\). In this way, the new iterate \((x_{k+1}, \lambda _{k+1})\) can be generated by solving (\(\mathrm{QP}_k\)). Further, it is proved in [9] that if \((x_{0}, \lambda _{0})\) is chosen close to the solution \((x^{*}, \lambda ^{*})\) of (EP), then the sequence \(\{(x_{k}, \lambda _{k})\}\) converges to \((x^{*}, \lambda ^{*})\) quadratically.

In the next section, this concept is used to develop a new algorithm to solve (EP) with higher-order convergence property.

3 Two-Phase-SQP Technique

Let \((s_{k}, l_{k})\) be the solution of (\(\mathrm{QP}_k\)) obtained by the process described in Sect. 2, where \(l_{k}\) is the Lagrange multiplier at the \(k\mathrm{th}\) stage. Denote

$$\begin{aligned} x_{\bar{k}}=x_{k}+s_{k}\quad \text{ and }\quad \lambda _{\bar{k}}=l_{k}. \end{aligned}$$

At this new point \((x_{\bar{k}},\lambda _{\bar{k}})\), compute \(A_{\bar{k}}\triangleq [\nabla h_{1}(x_{\bar{k}}),\nabla h_{2}(x_{\bar{k}}), \cdots , \nabla h_{m}(x_{\bar{k}})]^\mathrm{T}\) and \(\nabla _{xx}^{2}{\mathscr {L}}_{{\bar{k}}} \triangleq \nabla _{xx}^{2} {\mathscr {L}}(x_{\bar{k}},\lambda _{\bar{k}})\). Consider the following optimization problem \((\mathrm{QP}_k1)\) at \((x_{k},\lambda _{k})\):

$$\begin{aligned}&(\mathrm{QP}_k1){:}\quad \text{ Min }_{s \in {\mathbb {R}}^{n}}~~f_{k}+\big (\nabla f_{k}^\mathrm{T}+\frac{1}{2}\lambda _{k}^\mathrm{T} B_{k} \big ) s+ \frac{1}{4} s^\mathrm{T} \left( \nabla _{xx}^{2} {\mathscr {L}}_{k}+ \nabla _{xx}^{2} {\mathscr {L}}_{{\bar{k}}}\right) ~s \end{aligned}$$
(3.1a)
$$\begin{aligned}&\text{ s.t. }~~\frac{1}{2}C_{k} s+h_{k}=0, \end{aligned}$$
(3.1b)

where \(B_{k}=A_{\bar{k}}-A_{k}\) and \(C_{k}=A_{\bar{k}} +A_{k}\).

\((\mathrm{QP}_k1)\) is a convex quadratic programming problem on the tangent subspace \( \{ d~|~A(x)d=0,~ d^\mathrm{T} \nabla _{xx}^{2}{\mathscr {L}}(x,\lambda ) d>0 \}\). Assume that \(C_{k}\) has full rank. Then \((\mathrm{QP}_k1)\) has a unique solution \((s_{\bar{k}}, l_{\bar{k}})\) that satisfies the first-order KKT optimality conditions of \((\mathrm{QP}_k1)\), where \(l_{\bar{k}}\) is the Lagrange multiplier. These conditions are

$$\begin{aligned}&\displaystyle \frac{1}{2}\left( \nabla _{xx}^{2} {\mathscr {L}}_{k}+ \nabla _{xx}^{2} {\mathscr {L}}_{{\bar{k}}}\right) s_{\bar{k}}+\nabla f_{k}+ \frac{1}{2}B_{k}^\mathrm{T} \lambda _{k}- \frac{1}{2}C_{k}^\mathrm{T} l_{\bar{k}}=0, \end{aligned}$$
(3.2a)
$$\begin{aligned}&\displaystyle \frac{1}{2}C_{k} s_{\bar{k}} +h_{k}=0. \end{aligned}$$
(3.2b)

Adding \(\frac{1}{2}C_{k}^\mathrm{T} \lambda _{k}\) to both sides of (3.2a), we get

$$\begin{aligned} \frac{1}{2}\left( \nabla _{xx}^{2} {\mathscr {L}}_{k}+ \nabla _{xx}^{2} {\mathscr {L}}_{{\bar{k}}}\right) s_{\bar{k}}+\nabla f_{k}+&\frac{1}{2}B_{k}^\mathrm{T}\lambda _{k}-\frac{1}{2}C_{k}^\mathrm{T} (l_{\bar{k}}-\lambda _{k})= \frac{1}{2}C_{k}^\mathrm{T} \lambda _{k}, \end{aligned}$$
(3.3a)
$$\begin{aligned}&\frac{1}{2}C_{k} s_{\bar{k}} +h_{k}=0 . \end{aligned}$$
(3.3b)

(3.3a) and (3.3b) can be written as the following matrix equation:

$$\begin{aligned} \begin{bmatrix} \frac{1}{2}(\nabla _{xx}^{2} {\mathscr {L}}_{k}+ \nabla _{xx}^{2} {\mathscr {L}}_{{\bar{k}}})&\quad -\frac{1}{2}C_{k}^\mathrm{T} \\ \frac{1}{2}C_{k}&\quad 0 \end{bmatrix} \begin{bmatrix} s_{\bar{k}} \\ l_{\bar{k}}-\lambda _{k} \end{bmatrix}= \begin{bmatrix} -\nabla f_{k}+A_{k}^\mathrm{T} \lambda _{k} \\ -h_{k} \end{bmatrix}. \end{aligned}$$

This is equivalent to

$$\begin{aligned} \frac{1}{2} \Bigg ( \begin{bmatrix} \nabla _{xx}^{2} {\mathscr {L}}_{k}&\quad -A_{k}^\mathrm{T} \\ A_{k}&\quad 0 \end{bmatrix}+ \begin{bmatrix} \nabla _{xx}^{2} {\mathscr {L}}_{{\bar{k}}}&\quad - A_{\bar{k}}^\mathrm{T}\\ A_{\bar{k}}&\quad 0 \end{bmatrix} \Bigg ) \begin{bmatrix} s_{\bar{k}} \\ l_{\bar{k}}-\lambda _{k} \end{bmatrix}= \begin{bmatrix} -\nabla f_{k}+A_{k}^\mathrm{T} \lambda _{k} \\ -h_{k} \end{bmatrix}, \end{aligned}$$

i.e.,

$$\begin{aligned} \frac{1}{2} \big ( P^{\prime }(x_{k},\lambda _{k})+ P^{\prime }(x_{\bar{k}},\lambda _{\bar{k}})\big ) \begin{bmatrix} s_{\bar{k}} \\ l_{\bar{k}}-\lambda _{k} \end{bmatrix}= -P(x_{k}, \lambda _{k}). \end{aligned}$$
(3.4)

Denote

$$\begin{aligned} \widehat{x}_{\bar{k}}= s_{\bar{k}}\quad \text{ and }\;\widehat{\lambda }_{\bar{k}}=l_{\bar{k}}-\lambda _{k}. \end{aligned}$$

The iteration process discussed above can be summarized in the following two steps:

First step:

$$\begin{aligned} \begin{bmatrix} x_{\bar{k}} \\ \lambda _{\bar{k}} \end{bmatrix} = \begin{bmatrix} x_{k} \\ \lambda _{k} \end{bmatrix}+ \begin{bmatrix} \widehat{x}_{k} \\ \widehat{\lambda }_{k} \end{bmatrix}, \end{aligned}$$
(3.5a)

where (\(\widehat{x}_{k}\),\(\widehat{\lambda }_{k}\)) is the value of \((\alpha , \beta )\) satisfying

$$\begin{aligned} P^{\prime }(x_{k},\lambda _{k}) \begin{bmatrix} \alpha \\ \beta \end{bmatrix}= -P(x_{k}, \lambda _{k}). \end{aligned}$$
(3.5b)

Second step:

$$\begin{aligned} \begin{bmatrix} x_{k+1} \\ \lambda _{k+1} \end{bmatrix} = \begin{bmatrix} x_{k} \\ \lambda _{k} \end{bmatrix}+ \begin{bmatrix} \widehat{x}_{\bar{k}} \\ \widehat{\lambda }_{\bar{k}} \end{bmatrix}, \end{aligned}$$
(3.6a)

where (\(\widehat{x}_{\bar{k}}\),\(\widehat{\lambda }_{\bar{k}}\)) is the value of \((\alpha , \beta )\) satisfying

$$\begin{aligned} \frac{1}{2} \big ( P^{\prime }(x_{k},\lambda _{k})+ P^{\prime }(x_{\bar{k}},\lambda _{\bar{k}})\big ) \begin{bmatrix} \alpha \\ \beta \end{bmatrix}= -P(x_{k}, \lambda _{k}). \end{aligned}$$
(3.6b)

Since \(\lambda _{k+1}=\lambda _{k} +\widehat{ \lambda }_{\bar{k}}\) by (3.6a) and \(\widehat{\lambda }_{\bar{k}}=l_{\bar{k}}-\lambda _{k}\), we get \(\lambda _{k+1}=\lambda _{k}+(l_{\bar{k}}-\lambda _{k})=l_{\bar{k}}\). Hence the new iterate \((x_{k+1}, \lambda _{k+1})\) can be generated by solving the KKT optimality conditions (3.2) or their equivalent form (3.5)-(3.6).

We state the proposed two-phase-SQP (TP-SQP) scheme in its simplest form as Algorithm 1:

Algorithm 1 (TP-SQP) Starting from an initial pair \((x_{0}, \lambda _{0})\), repeat the first step (3.5) and the second step (3.6) until \(\Vert P(x_{k}, \lambda _{k})\Vert \) is sufficiently small.
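In code, Algorithm 1 amounts to one extra linear solve per iteration with the averaged KKT matrix. Below is a minimal sketch of a single TP-SQP iteration, reusing the illustrative `kkt_matrix` helper from the Sect. 2 sketch (again an illustration, not the authors' implementation):

```python
import numpy as np

def tp_sqp_step(x, lam, grad_f, h, jac_h, hess_L):
    """One TP-SQP iteration: the Newton phase (3.5) followed by the
    averaged phase (3.6). kkt_matrix is the helper sketched in Sect. 2."""
    n = x.size
    P = np.concatenate([grad_f(x) - jac_h(x).T @ lam, h(x)])  # P(x_k, lam_k)
    K1 = kkt_matrix(x, lam, jac_h, hess_L)                    # P'(x_k, lam_k)
    d = np.linalg.solve(K1, -P)                               # first step (3.5b)
    xb, lamb = x + d[:n], lam + d[n:]                         # (x_kbar, lam_kbar)
    K2 = kkt_matrix(xb, lamb, jac_h, hess_L)                  # P'(x_kbar, lam_kbar)
    d = np.linalg.solve(0.5 * (K1 + K2), -P)                  # second step (3.6b)
    return x + d[:n], lam + d[n:]
```

Iterating `tp_sqp_step` until \(\Vert P(x_{k},\lambda _{k})\Vert \) falls below a tolerance reproduces Algorithm 1. Only \(P(x_{k},\lambda _{k})\) is reused between the two phases, so the extra cost over one Newton step is essentially one further derivative evaluation and one further linear solve, consistent with the operation count discussed in Sect. 4.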

For solving the equality-constrained optimization problem, a good initial estimate \(x_{0}\) for \(x^{*}\) can be used to obtain a good initial estimate \(\lambda _{0}\) for the optimal multiplier vector \(\lambda ^{*}\), which, in the light of [9], may be chosen as follows:

$$\begin{aligned} \lambda _{0}=\left[ A(x_{0})A(x_{0})^\mathrm{T}\right] ^{-1} A(x_{0})\nabla f(x_{0}). \end{aligned}$$
(3.7)

\(\lambda _{0}\) can be made arbitrarily close to \(\lambda ^{*}\) by choosing \(x_{0}\) close to \(x^{*}\).
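In the same illustrative NumPy setting as above, (3.7) is a single linear solve (the function name is an illustrative choice):

```python
import numpy as np

def initial_multiplier(x0, grad_f, jac_h):
    """Least-squares multiplier estimate (3.7): lam0 = (A A^T)^{-1} A grad_f."""
    A = jac_h(x0)
    return np.linalg.solve(A @ A.T, A @ grad_f(x0))
```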

4 Convergence Analysis

Theorem 4.1

If P is a thrice Fréchet differentiable function and \((x_{0},\lambda _{0})\) is sufficiently close to the solution \((x^{*},\lambda ^{*})\) of (EP), then the sequence \(\{(x_{k}, \lambda _{k})\}\) generated by Algorithm 1 converges cubically to \((x^{*}, \lambda ^{*})\).

Proof

Let \(u_{k}\) and \(v_{k}\) be the \((n+m)\)-dimensional vectors \(u_{k} \triangleq (x_{k}, \lambda _{k})\) and \(v_{k} \triangleq (x_{\bar{k}}, \lambda _{\bar{k}})\), respectively. Then steps (3.5) and (3.6) can be written as follows.

$$\begin{aligned} v_{k}&= u_{k} - \left[ P^{\prime }(u_{k}) \right] ^{-1} P(u_{k}), \end{aligned}$$
(4.1a)
$$\begin{aligned} u_{k+1}&=u_{k}- 2 \left[ P^{\prime }(u_{k})+P^{\prime }(v_{k})\right] ^{-1} P(u_{k}). \end{aligned}$$
(4.1b)

We denote \(\alpha =(x^{*}, \lambda ^{*})\); here \(P:{\mathbb {R}}^{n+m}\rightarrow {\mathbb {R}}^{n+m}\). Using the Taylor expansion of \(P(\alpha )\) about \(u_{k}\), we have

$$\begin{aligned} P(\alpha )= & {} P(u_{k})+P^{\prime }(u_{k})(\alpha - u_{k})+\frac{1}{2!}P^{\prime \prime }(u_{k})(\alpha - u_{k})^{2}\nonumber \\&+ \frac{1}{3!}P^{\prime \prime \prime }(u_{k})(\alpha - u_{k})^{3} + \cdots \end{aligned}$$
(4.2)

Subtracting \(\alpha \) from both sides of (4.1b) and denoting \(e_{k}=u_{k}-\alpha \), we have

$$\begin{aligned}&\displaystyle e_{k+1}=e_{k}-2\left[ P^{\prime }(u_{k})+P^{\prime }(v_{k})\right] ^{-1} P(u_{k}), \nonumber \\&\displaystyle \left[ P^{\prime }(u_{k})+P^{\prime }(v_{k})\right] e_{k+1}=\left[ P^{\prime }(u_{k})+P^{\prime }(v_{k})\right] e_{k}-2P(u_{k}). \end{aligned}$$
(4.3)

Since \((x^{*}, \lambda ^{*})\) is the solution of (EP), \(P(\alpha )=P(x^{*}, \lambda ^{*})=0\). Hence from (4.2),

$$\begin{aligned} P(u_{k})&=-P^{\prime }(u_{k})(\alpha - u_{k})-\frac{1}{2!}P^{\prime \prime }(u_{k})(\alpha - u_{k})^{2}\\&\qquad -\frac{1}{3!} P^{\prime \prime \prime }(u_{k})(\alpha - u_{k})^{3} + \cdots \\&=P^{\prime }(u_{k})e_{k}-\frac{1}{2!}P^{\prime \prime }(u_{k})e_{k}^{2}+ \frac{1}{3!}P^{\prime \prime \prime }(u_{k})e_{k}^{3} +\cdots \end{aligned}$$
$$\begin{aligned}&\quad \;[P^{\prime }(u_{k})]^{-1}P(u_{k})\nonumber \\&=e_{k}-[P^{\prime }(u_{k})]^{-1}P^{\prime \prime }(u_{k})\frac{1}{2!}e_{k}^{2}+ [P^{\prime }(u_{k})]^{-1}P^{\prime \prime \prime }(u_{k})\frac{1}{3!}e_{k}^{3} +\cdots \end{aligned}$$
(4.4)

Expanding \(P^{\prime }(v_{k})\) about \(u_{k}\), we get

$$\begin{aligned} P^{\prime }(v_{k})&= P^{\prime }(u_{k})+P^{\prime \prime }(u_{k})(v_{k}-u_{k})+\frac{1}{2!} P^{\prime \prime \prime }(u_{k})(v_{k}-u_{k})^{2}+ \cdots \nonumber \\&= P^{\prime }(u_{k})-P^{\prime \prime }(u_{k})\left[ P^{\prime }(u_{k})^{-1}P(u_{k})\right] +\frac{1}{2!}P^{\prime \prime \prime }(u_{k})\left[ P^{\prime }(u_{k})^{-1}P(u_{k})\right] ^{2}+\cdots \nonumber \\&\qquad \text{(From } \text{(4.1a)) } \nonumber \\&= P^{\prime }(u_{k})-P^{\prime \prime }(u_{k})\big [e_{k}-[P^{\prime }(u_{k})]^{-1} P^{\prime \prime }(u_{k})\frac{1}{2!}e_{k}^{2} \nonumber \\&\quad +[P^{\prime }(u_{k})]^{-1}P^{\prime \prime \prime }(u_{k})\frac{1}{3!}e_{k}^{3} +\cdots \big ]+\frac{1}{2!}P^{\prime \prime \prime }(u_{k})e_{k}^{2}+\cdots ~~ \text{(from } \text{(4.4)) } \end{aligned}$$
(4.5)

The right-hand side of (4.3) gives

$$\begin{aligned} \left[ P^{\prime }(u_{k})+P^{\prime }(v_{k})\right] e_{k}-2P(u_{k})=&\,\frac{1}{2!}P^{\prime \prime } (u_{k})\left[ P^{\prime }(u_{k})\right] ^{-1}P^{\prime \prime }(u_{k})e_{k}^{3} \nonumber \\&+\frac{1}{2!}P^{\prime \prime \prime }(u_{k})e_{k}^{3}- \frac{2}{3!}P^{\prime \prime \prime }(u_{k}) e_{k}^{3}+ \cdots \end{aligned}$$
(4.6)

Using (4.6), expression (4.3) can be written as

$$\begin{aligned} \left[ P^{\prime }(u_{k})+P^{\prime }(v_{k})\right] e_{k+1}=\mathscr {B}_{k} e_{k}^{3} + O(\vert \vert e_{k} \vert \vert ^{4}), \end{aligned}$$
(4.7)

where

$$\begin{aligned} \mathscr {B}_{k}=\bigg [ \frac{1}{2!}P^{\prime \prime }(u_{k})\left[ P^{\prime }(u_{k}) \right] ^{-1}P^{\prime \prime }(u_{k})+\frac{1}{3!}P^{\prime \prime \prime }(u_{k}) \bigg ]. \end{aligned}$$

This shows that Algorithm 1 has cubic-order convergence.
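Theorem 4.1 can also be checked empirically: near the solution, the ratios \(\Vert e_{k+1}\Vert / \Vert e_{k}\Vert ^{3}\) should remain bounded. A small illustrative helper, assuming the iterates and the known solution are collected as NumPy vectors:

```python
import numpy as np

def cubic_order_ratios(iterates, u_star):
    """Ratios ||e_{k+1}|| / ||e_k||^3 for e_k = u_k - u_star; these should
    stay bounded for a cubically convergent sequence (Theorem 4.1)."""
    e = [np.linalg.norm(np.asarray(u) - u_star) for u in iterates]
    return [e[k + 1] / e[k] ** 3 for k in range(len(e) - 1) if e[k] > 0]
```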

One may observe the following theoretical advantages of the TP-SQP scheme.

  • Without computing higher-order derivatives, higher-order convergence is achieved in comparison with the Newton-SQP framework.

  • The TP-SQP scheme involves the same order of arithmetic operations per iteration as the Newton-SQP method.

One may associate a line search technique with each iteration of Algorithm 1 with the help of a merit function, which decides whether a trial step should be accepted. In the next section, we limit our discussion to the \(l_{1}\) merit function for the proposed scheme and discuss how to choose the penalty parameter \(\mu \) so that the chosen direction is a descent direction in every iteration.

5 A Note on \(l_{1}\)-Merit Function for the Proposed Scheme

The \(l_{1}\)-merit function \(\phi _{1} : {\mathbb {R}}^ {n+1} \rightarrow {\mathbb {R}}\) is given by

$$\begin{aligned} \phi _{1}(x; \mu )=f(x)+ \mu \Vert h(x) \Vert _{1}, \end{aligned}$$

where \(\mu \) is the penalty parameter and \(\Vert \cdot \Vert _{1} \) is the \(l_{1}\) norm. In the TP-SQP method, the full step has unit length. A step length \(\alpha _{k} \in (0,1]\) is accepted if the following sufficient decrease condition holds:

$$\begin{aligned} \phi _{1}(x_{k}+\alpha _{k}p_{k}; \mu _{k}) \leqslant \phi _{1}(x_{k}; \mu _{k})+\eta \alpha _{k} {\mathscr {D}}(\phi _{1}(x_{k}; \mu );p_{k}), \end{aligned}$$

where \(\eta \in (0,1)\) and \({\mathscr {D}}(\phi _{1}(x_{k}; \mu );p_{k})\) denotes the directional derivative of \(\phi _{1}\) at \(x_{k}\) in the direction \(p_{k}\). This requirement is analogous to the Armijo condition of unconstrained optimization, provided that \(p_{k}\) is a descent direction, that is, \({\mathscr {D}}(\phi _{1}(x_{k}; \mu );p_{k}) \leqslant 0\). The descent property holds if \(\mu \) is chosen sufficiently large, as shown in the following theorem:

Theorem 5.1

For the TP-SQP scheme, if \(\max ~ \{ \Vert A_{k}s_{\bar{k}} \Vert _{1} , \Vert \frac{1}{2}(A_{\bar{k}}- A_{k}) s_{\bar{k}} \Vert _{1} \} \leqslant \Vert h_{k} \Vert _{1}\), then \( \mu > 4 \Vert \lambda _{k} \Vert _{\infty } + 2\Vert \lambda _{k+1} \Vert _{\infty }~\) implies

$$\begin{aligned} {\mathscr {D}}(\phi _{1}(x_{k}; \mu );s_{\bar{k}}) \leqslant 0. \end{aligned}$$

Proof

$$\begin{aligned}&\phi _{1}(x_{k}+\alpha s_{\bar{k}}; \mu ) - \phi _{1}(x_{k};\mu ) \\= & {} f(x_{k}+\alpha s_{\bar{k}}) - f_{k}+ \mu \Vert h(x_{k} + \alpha s_{\bar{k}}) \Vert _{1} - \mu \Vert h_{k} \Vert _{1} \\= & {} ( f_{k}+ \alpha \nabla f_{k}^\mathrm{T} s_{\bar{k}}+ \frac{1}{2} \alpha ^{2} s_{\bar{k}}^\mathrm{T} \nabla ^{2}f(x_{k}+t_{1} s_{\bar{k}}) s_{\bar{k}} )- f_{k} \\&+\, \mu \Vert h_{k} + \alpha A_{k} s_{\bar{k}} + \frac{1}{2} \alpha ^{2} s_{\bar{k}}^\mathrm{T} \nabla ^{2} h(x_{k}+t_{2}s_{\bar{k}}) s_{\bar{k}}\Vert _{1} - \mu \Vert h_{k} \Vert _{1} \\&\quad ( \text{ where } t_{1},t_{2} \in (0,\alpha ) ) \\\leqslant & {} \alpha \nabla f_{k}^\mathrm{T} s_{\bar{k}}+ \mu \Vert h_{k} + \alpha A_{k} s_{\bar{k}} \Vert _{1} - \mu \Vert h_{k} \Vert _{1} + \gamma \alpha ^{2} \Vert s_{\bar{k}} \Vert _{1}^{2} \\&\quad (\gamma \hbox { is an upper bound on the second derivatives of }f\hbox { and }h) \\= & {} \alpha \nabla f_{k}^\mathrm{T} s_{\bar{k}}+ \mu \Vert h_{k} + \alpha (-2 h_{k}- A_{\bar{k}} s_{\bar{k}}) \Vert _{1} - \mu \Vert h_{k} \Vert _{1} + \gamma \alpha ^{2} \Vert s_{\bar{k}} \Vert _{1}^{2} \\&\quad \left( \text{ since } \text{ from } \text{(3.3b) },\; \frac{1}{2}(A_{k}+ A_{\bar{k}}) s_{\bar{k}} +h_{k}=0\right) \\= & {} \alpha \nabla f_{k}^\mathrm{T} s_{\bar{k}}+ \mu \Vert (1- \alpha )h_{k} - \alpha ( h_{k}+ A_{\bar{k}} s_{\bar{k}}) \Vert _{1} - \mu \Vert h_{k} \Vert _{1} \\&+\, \gamma \alpha ^{2} \Vert s_{\bar{k}} \Vert _{1}^{2} \\\leqslant & {} \alpha \nabla f_{k}^\mathrm{T} s_{\bar{k}} + \mu (1- \alpha ) \Vert h_{k}\Vert _{1} + \mu \alpha \Vert h_{k}+ A_{\bar{k}} s_{\bar{k}}\Vert _{1} -\mu \Vert h_{k} \Vert _{1} \\&+\, \gamma \alpha ^{2} \Vert s_{\bar{k}} \Vert _{1}^{2} \quad (\text{ for }\; 0< \alpha \leqslant 1). \end{aligned}$$

This implies

$$\begin{aligned} \lim _{\alpha \rightarrow 0^{+}} \frac{\phi _{1}(x_{k}+\alpha s_{\bar{k}}; \mu )- \phi _{1}(x_{k};\mu )}{\alpha } \leqslant \nabla f_{k}^\mathrm{T} s_{\bar{k}} - \mu \Vert h_{k} \Vert _{1} + \mu \Vert h_{k}+ A_{\bar{k}} s_{\bar{k}} \Vert _{1}, \end{aligned}$$

i.e.,

$$\begin{aligned} {\mathscr {D}}(\phi _{1}(x_{k}; \mu ); s_{\bar{k}} ) \leqslant \nabla f_{k}^\mathrm{T} s_{\bar{k}} - \mu \Vert h_{k} \Vert _{1} + \mu \Vert h_{k}+ A_{\bar{k}} s_{\bar{k}} \Vert _{1}. \end{aligned}$$

Multiplying both sides of (3.2a) on the left by \(s_{\bar{k}}^\mathrm{T}\) and using \(l_{\bar{k}}= \lambda _{k+1}\), the above inequality becomes

$$\begin{aligned}&\quad {\mathscr {D}}(\phi _{1}(x_{k}; \mu ); s_{\bar{k}} ) \\&\leqslant -s_{\bar{k}}^\mathrm{T} \big ( \frac{1}{2}(\nabla ^{2}_{xx} {\mathscr {L}}_{k}+ \nabla ^{2}_{xx} {\mathscr {L}}_{{\bar{k}}}) \big ) s_{\bar{k}} + s_{\bar{k}}^\mathrm{T} \big ( \frac{1}{2} (A_{k}^\mathrm{T}+ A_{\bar{k}}^\mathrm{T})(\lambda _{k+1}-\lambda _{k}) \big ) \\&+ s_{\bar{k}}^\mathrm{T} A_{k}^\mathrm{T} \lambda _{k} - \mu \Vert h_{k} \Vert _{1} + \mu \Vert h_{k} + A_{\bar{k}} s_{\bar{k}} \Vert _{1}. \end{aligned}$$

Using (3.3b), we get

$$\begin{aligned} {\mathscr {D}}(\phi _{1}(x_{k}; \mu ); s_{\bar{k}} )&\leqslant -s_{\bar{k}}^\mathrm{T} \left( \frac{1}{2}\left( \nabla ^{2}_{xx} {\mathscr {L}}_{k}+ \nabla ^{2}_{xx} {\mathscr {L}}_{{\bar{k}}}\right) \right) s_{\bar{k}}-h_{k}^\mathrm{T}\lambda _{k+1}+h_{k}^\mathrm{T} \lambda _{k}+(A_{k} s_{\bar{k}})^\mathrm{T} \lambda _{k} \\&\quad - \mu \Vert h_{k} \Vert _{1} + \mu \Vert \frac{1}{2}(A_{\bar{k}}- A_{k}) s_{\bar{k}} \Vert _{1} \\&\leqslant -s_{\bar{k}}^\mathrm{T} \left( \frac{1}{2}\left( \nabla ^{2}_{xx} {\mathscr {L}}_{k}+ \nabla ^{2}_{xx} {\mathscr {L}}_{{\bar{k}}}\right) \right) s_{\bar{k}} + \Vert h_{k} \Vert _{1} \Vert \lambda _{k+1} \Vert _{\infty } + \Vert h_{k} \Vert _{1} \Vert \lambda _{k} \Vert _{\infty } \\&\quad + \Vert A_{k} s_{\bar{k}} \Vert _{1} \Vert \lambda _{k} \Vert _{\infty } - \mu \Vert h_{k} \Vert _{1} + \frac{\mu }{2} \Vert h_{k} \Vert _{1}. \end{aligned}$$

Using the given condition \(\max ~ \{ \Vert A_{k} s_{\bar{k}} \Vert _{1} , \Vert \frac{1}{2}(A_{\bar{k}}- A_{k}) s_{\bar{k}} \Vert _{1} \} \leqslant \Vert h_{k} \Vert _{1}\), we get

$$\begin{aligned} {\mathscr {D}}(\phi _{1}(x_{k}; \mu ); s_{\bar{k}} )&\leqslant - s_{\bar{k}}^\mathrm{T} \left( \frac{1}{2}(\nabla ^{2}_{xx} {\mathscr {L}}_{k}+ \nabla ^{2}_{xx} {\mathscr {L}}_{{\bar{k}}}) \right) s_{\bar{k}} \\&\quad + \Vert h_{k} \Vert _{1} \left( \Vert \lambda _{k+1} \Vert _{\infty } + 2 \Vert \lambda _{k} \Vert _{\infty } - \frac{\mu }{2} \right) \\&\leqslant 0 \qquad \left( \text{ for } ~ \mu > 2 \Vert \lambda _{k+1} \Vert _{\infty } + 4 \Vert \lambda _{k} \Vert _{\infty }\right) . \end{aligned}$$

It follows that, under the stated assumption, \(s_{\bar{k}}\) is a descent direction for \(\phi _{1}\) if \(\nabla ^{2}_{xx} {\mathscr {L}}_{{\bar{k}}}\) is positive definite and \(\mu > 2 \Vert \lambda _{k+1} \Vert _{\infty } + 4 \Vert \lambda _{k} \Vert _{\infty }\).
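In practice, Theorem 5.1 translates into a simple backtracking rule: pick \(\mu \) beyond the threshold \(2\Vert \lambda _{k+1}\Vert _{\infty } + 4\Vert \lambda _{k}\Vert _{\infty }\), bound the directional derivative as in the display following the proof's "i.e.," step, and halve the step until the decrease condition of this section holds. A hedged sketch; the safeguard constants and the callables are illustrative choices, not taken from the paper:

```python
import numpy as np

def phi1(x, mu, f, h):
    """l1 merit function: phi_1(x; mu) = f(x) + mu * ||h(x)||_1."""
    return f(x) + mu * np.linalg.norm(h(x), 1)

def line_search_l1(x, s, lam, lam_next, f, h, grad_f, A_bar, eta=1e-4):
    """Backtracking on the decrease condition of Sect. 5. The penalty
    parameter follows Theorem 5.1 (with an illustrative +1 margin), and
    D is the bound on the directional derivative derived in the proof.
    A_bar is the constraint Jacobian at the intermediate point x_kbar."""
    mu = 2 * np.linalg.norm(lam_next, np.inf) + 4 * np.linalg.norm(lam, np.inf) + 1.0
    D = grad_f(x) @ s - mu * np.linalg.norm(h(x), 1) \
        + mu * np.linalg.norm(h(x) + A_bar @ s, 1)
    alpha = 1.0
    while phi1(x + alpha * s, mu, f, h) > phi1(x, mu, f, h) + eta * alpha * D \
            and alpha > 1e-12:   # safeguard against an endless loop
        alpha *= 0.5
    return alpha
```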

6 Numerical Examples

The following two examples are examined with the proposed algorithm and the results analyzed. The algorithm terminates when the norm of the gradient of the Lagrangian becomes sufficiently close to zero.

Example 6.1

$$\begin{aligned} \text{(P1): }&\hbox { min}\; x_{1}+x_{2} \\&\text{ s.t. }\;x_{1}^{2}+x_{2}^{2}-2=0 \end{aligned}$$

Example 6.2

$$\begin{aligned} \text{(P2) }{:}&\hbox { min}\; (\sqrt{3})^{3} x_{1}x_{2}x_{3}\\&\text{ s.t. }\; x_{1}^{2}+x_{2}^{2}+x_{3}^{2}-1=0 \end{aligned}$$
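As an illustration, (P1) can be fed to the sketches of Sects. 2-3 ((P2) is handled analogously). The derivatives below follow directly from \(f(x)=x_{1}+x_{2}\) and \(h(x)=x_{1}^{2}+x_{2}^{2}-2\); the starting point and the driver loop are illustrative choices, not the paper's exact test setup:

```python
import numpy as np

# Problem (P1): min x1 + x2  s.t.  x1^2 + x2^2 - 2 = 0.
f      = lambda x: x[0] + x[1]
grad_f = lambda x: np.array([1.0, 1.0])
h      = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 2.0])
jac_h  = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]]])
# L = f - lam^T h, so for (P1) the Hessian of L is -2 * lam_1 * I.
hess_L = lambda x, lam: -2.0 * lam[0] * np.eye(2)

x = np.array([-0.5, -0.8])                  # illustrative starting point
lam = initial_multiplier(x, grad_f, jac_h)  # lambda_0 from (3.7)
for _ in range(20):
    if np.linalg.norm(np.concatenate([grad_f(x) - jac_h(x).T @ lam, h(x)])) < 1e-15:
        break
    x, lam = tp_sqp_step(x, lam, grad_f, h, jac_h, hess_L)
# The KKT point of (P1) with minimal objective is x* = (-1, -1), lambda* = -1/2.
```

Here `initial_multiplier` and `tp_sqp_step` are the illustrative helpers sketched in Sects. 2-3.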

(P1) and (P2) are solved in MATLAB R2013b by the Newton-SQP method and the TP-SQP method with tolerance \(10^{-15}\) and different initial points. The value of \(\lambda _{0}\) is chosen using (3.7). The number of iterations required by both methods is provided in Table 1. Since the proposed scheme has the local cubic-order convergence property, the number of iterations it needs to attain the solution is expected to be smaller than that of the Newton-SQP scheme, which converges quadratically. Table 1 meets this expectation.

Table 1 Comparison between the number of iterations for Newton-SQP and TP-SQP

7 Conclusion

In this paper, we have proposed a local scheme with a higher-order convergence property for solving nonlinear equality-constrained optimization problems. Without computing higher-order derivatives, the TP-SQP scheme achieves a higher order of convergence than the Newton-SQP method. However, since two quadratic subproblems are solved in each iteration, from the viewpoint of software implementation the TP-SQP scheme is preferable to the Newton-SQP method on those optimization problems for which it takes roughly half as many iterations (or fewer) as the latter. In the future, in the light of the TP-SQP scheme, one may propose schemes with order of convergence higher than three.