1 Introduction

In this paper, we consider equality constrained optimization problems of the form

$$\begin{aligned}&\min _{x\in {\mathbb {R}}^n} f(x) \end{aligned}$$
(1a)
$$\begin{aligned}&\text {subject to }\, c_j(x)=0, \quad j=1, \ldots , m, \end{aligned}$$
(1b)

where f, \(c_j:{\mathbb {R}}^n \rightarrow {\mathbb {R}} \,\, (j=1, \ldots , m)\) are continuously differentiable.

One of the most popular algorithms for (1) is the Sequential Quadratic Programming (SQP) method. At the k-th iteration of the SQP method, a search direction \(d_k\) is computed by solving the subproblem:

$$\begin{aligned}&\min _{d\in {\mathbb {R}}^n} \varphi _k(d)=g_k^Td+\frac{1}{2}d^TB_kd \end{aligned}$$
(2a)
$$\begin{aligned}&\text {subject to }\, A_k^Td+c_k=0, \end{aligned}$$
(2b)

where

$$\begin{aligned} g_k=\nabla f(x_k), \,\, c_k=(c_1(x_k),\ldots ,c_{m}(x_k))^T, \,\, A_k=\begin{bmatrix}\nabla c_1(x_k), \ldots , \nabla c_{m}(x_k) \end{bmatrix}, \end{aligned}$$

and \(B_k\) is an approximation to the Hessian of the Lagrangian of (1). Formally,

$$\begin{aligned} B_k \approx \nabla _x^2 \mathcal {L}(x_k,\lambda _k). \end{aligned}$$

We refer to [2, 6] for a comprehensive treatment of SQP methods and further references. SQP is a very successful algorithm for small and medium-size problems of the form (1). However, it may not be suitable for large scale problems, for example because of the computational cost and memory requirements of each iteration.
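As an illustration of this step (a minimal NumPy sketch of our own; the experiments in Sect. 5 use MATLAB), the subproblem (2) can be solved by assembling and solving its KKT system directly. For large n one would instead use the range space approach of Sect. 3 or an iterative solver.

```python
import numpy as np

def sqp_step_full_kkt(g, c, A, B):
    """Solve the SQP subproblem (2) via its KKT system.

    min_d  g^T d + 0.5 d^T B d   s.t.  A^T d + c = 0,
    with g in R^n, c in R^m, A in R^{n x m}, B in R^{n x n} positive definite.
    Returns the step d and the multipliers lam.
    """
    n, m = A.shape
    # KKT matrix [[B, A], [A^T, 0]] and right-hand side [-g; -c]
    K = np.block([[B, A], [A.T, np.zeros((m, m))]])
    rhs = -np.concatenate([g, c])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]

# tiny example: n = 3 variables, m = 1 linearized constraint
if __name__ == "__main__":
    B = np.eye(3)
    g = np.array([1.0, -2.0, 0.5])
    A = np.array([[1.0], [1.0], [1.0]])   # gradient of a single constraint
    c = np.array([0.3])
    d, lam = sqp_step_full_kkt(g, c, A, B)
    print(d, lam, A.T @ d + c)            # last term should be ~ 0
```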

Currently, solving large scale problems is one of the most important issues in nonlinear optimization. For such problems, the large number of variables, the limited memory storage, and the massive computations required at each iteration constitute the major difficulties. A recent review by Gould et al. [7] discusses the main approaches for handling large scale problems, such as step computation, active set, gradient projection, and interior point methods.

To solve large scale problems, such as large scale nonlinear optimization, nonlinear equations, and nonlinear least squares, Yuan [19, 20] proposed a new approach based on subspace techniques. In addition, a model subspace algorithm was given for unconstrained and equality constrained optimization, and possible choices of subspaces were introduced. In subspace methods, the main issue is how to construct low-dimensional subproblems so that the computational cost of each iteration is reduced compared to standard approaches.

Yuan [19, 20] also showed that many standard algorithms, such as the conjugate gradient method, limited-memory quasi-Newton methods, the projected gradient method, and the null space method, contain ideas or techniques which can be viewed as subspace approaches. Based on these observations, several algorithms have been proposed. In particular, for unconstrained optimization, Wang and Yuan [18] and Wang et al. [17] proposed subspace trust region methods using standard quasi-Newton and limited-memory quasi-Newton approaches, respectively. More detailed discussions on subspace methods for unconstrained optimization can be found in [5, 15,16,17,18, 21]. For nonlinear equality constrained optimization, Grapiglia et al. [9] investigated subspace properties of the Celis-Dennis-Tapia (CDT) problem [3] and proposed a subspace method. In this method, the dimension of the subspace grows as the iterations proceed, and so do the memory requirements and the computational cost; this is the main drawback of the method proposed by Grapiglia et al. [9]. Recently, subspace choices for the CDT problem and subspace properties of the quadratically constrained quadratic program (QCQP) were investigated by Zhao and Fan [22, 23], respectively.

By extending the method in [19], we propose a new subspace SQP method for large scale nonlinear equality constrained optimization problems. Since our method controls the dimension of the subspace at each iteration, a rapid growth of the dimension is avoided. Furthermore, we adopt a line search; as far as we know, a line search has not yet been applied in the context of subspace methods for equality constrained optimization.

In the case of few constraints, our method is equivalent to the standard SQP method in the full space. In the case of many constraints, we reduce the number of constraints in the subproblem and we show that the solution of the subspace subproblem is a descent direction of a particular exact penalty function. Global convergence properties of the proposed method are given for both cases.

This paper is organized as follows. In Sect. 2, we introduce the subspace method for unconstrained optimization in [19]. In Sect. 3, we describe a new subspace technique. Based on it, we propose a subspace method for equality constrained optimization by extending the method in [19]. In Sect. 4, the convergence of the proposed method is established. Numerical results are reported in Sect. 5. Finally, concluding remarks are given in Sect. 6.

2 Subspace methods for unconstrained optimization

In this section, we review the subspace method proposed in [19]. Let us consider unconstrained optimization problems of the form

$$\begin{aligned} \min _{x\in {\mathbb {R}}^n} f(x). \end{aligned}$$
(3)

The choice of a subspace can be motivated by limited-memory quasi-Newton methods [11]. In general, to approximate the Hessian \(\nabla ^2f(x_k)\), quasi-Newton methods for (3) use the secant equation

$$\begin{aligned} B_ks_{k-1}=y_{k-1}, \end{aligned}$$

where \(s_{k-1}=x_k-x_{k-1}\) and \(y_{k-1}=g_k-g_{k-1}\). A typical example is the famous BFGS method [12]:

$$\begin{aligned} B_k=B_{k-1}-\frac{B_{k-1}s_{k-1}s_{k-1}^TB_{k-1}}{s_{k-1}^TB_{k-1}s_{k-1}} +\frac{y_{k-1}y_{k-1}^T}{s_{k-1}^Ty_{k-1}}. \end{aligned}$$

Thus, \(B_k\) is updated from \(B_{k-1}\), \(s_{k-1}\), and \(y_{k-1}\). However, for large scale problems, storing the dense \(n\times n\) matrix \(B_{k-1}\) may exceed the available memory.
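For illustration only (the method below uses the limited-memory variant rather than the full matrix), the BFGS formula translates directly into NumPy; it assumes \(s_{k-1}^Ty_{k-1}>0\), which keeps the update positive definite:

```python
import numpy as np

def bfgs_update(B, s, y):
    """One BFGS update of the Hessian approximation B with the pair (s, y).

    Requires s^T y > 0 so that the updated matrix stays positive definite.
    """
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)
```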

As a variant of quasi-Newton methods, limited-memory quasi-Newton methods [11] use only a few of the most recent pairs; given \({\bar{m}}\), they periodically approximate the Hessian using \(B_{k}^{(i-1)}\), \(s_{k-{\bar{m}}-1+i}\), and \(y_{k-{\bar{m}}-1+i}\) for \(i=1,\ldots , {\bar{m}}\), starting from the initial guess \(B_k^{(0)}=\delta _k I\) with \(\delta _k > 0\). Byrd et al. [1] proposed a compact representation of the limited-memory BFGS matrix

$$\begin{aligned} B_k=B_k^{({\bar{m}})}=\delta _kI+W_kD_kW_k^T, \end{aligned}$$

where \(D_k\) is a \(2{\bar{m}}\times 2{\bar{m}}\) matrix and

$$\begin{aligned} W_k=\left[ S_k \quad Y_k \right] =\left[ s_{k-1}, s_{k-2}, \ldots , s_{k-{\bar{m}}}, y_{k-1}, y_{k-2}, \ldots , y_{k-{\bar{m}}}\right] \in {\mathbb {R}}^{n\times 2{\bar{m}}}. \end{aligned}$$

If a line search type method is applied, we have \(s_k=\alpha _kd_k=-\alpha _kB_k^{-1}g_k\), while for trust region algorithms one has \(s_k=-(B_k+\nu _kI)^{-1}g_k\) with \(\nu _k \ge 0\). In either case, it can be shown that \(s_k\) belongs to the subspace

$$\begin{aligned} \text {span} \left\{ g_k, s_{k-1}, \ldots , s_{k-{\bar{m}}}, y_{k-1}, \ldots , y_{k-{\bar{m}}} \right\} . \end{aligned}$$
(4)

Based on the subspace (4), Wang et al. [17] proposed a subspace trust region method for large scale unconstrained optimization problems (3). Their preliminary numerical results are encouraging: in their experiments, the problem dimensions range from 500 to 10,000, while the subspace dimensions remain low, between 6 and 16. In the next section, we propose a new subspace method for the equality constrained optimization problem, using a line search framework.

3 A new subspace method for equality constrained optimization

In this section, we describe a new subspace technique for solving the equality constrained optimization problem (1). Suppose that at the k-th iteration, we have the current iterate point \(x_k\) and the subspace \({\mathcal {T}}_k\). We denote the dimension of \({\mathcal {T}}_k\) by \(i_k\) and let \(Q_k=\left[ q_1^{(k)},q_2^{(k)},\ldots ,q_{i_k}^{(k)}\right] \) be a matrix of linearly independent vectors in \({\mathcal {T}}_k\). Since any vector \(d\in {\mathcal {T}}_k\) can be written as \(d=Q_kz\) for some \(z\in {\mathbb {R}}^{i_k}\), the SQP subproblem with respect to the subspace \({\mathcal {T}}_k\) is given as

$$\begin{aligned}&\min _{z\in {\mathbb {R}}^{i_k}} {\bar{\varphi }}_k(z)=(Q_k^Tg_k)^Tz+\frac{1}{2}z^TQ_k^TB_kQ_kz \end{aligned}$$
(5a)
$$\begin{aligned}&\text {subject to }\, A_k^TQ_kz+c_k=0. \end{aligned}$$
(5b)

However, when \(m > i_k\), the linearized constraints (5b) may be infeasible because the system is over-determined. In this case, to guarantee that the feasible set is nonempty, we consider the following version of (5) with a reduced set of constraints:

$$\begin{aligned}&\min _{z\in {\mathbb {R}}^{i_k}} {\bar{\varphi }}_k(z)=(Q_k^Tg_k)^Tz+\frac{1}{2}z^TQ_k^TB_kQ_kz \end{aligned}$$
(6a)
$$\begin{aligned}&\text {subject to }\, P_k^T(A_k^TQ_kz+c_k)=0, \end{aligned}$$
(6b)

where \(P_k\) is a projection (selection) matrix; a specific choice of \(P_k\) is given in (14) below. The Karush–Kuhn–Tucker (KKT) system of (6) is as follows:

$$\begin{aligned} Q_k^T (B_k Q_k z + A_kP_k \mu + g_k)= & {} 0, \end{aligned}$$
(7)
$$\begin{aligned} P_k^T(A_k^TQ_k z + c_k)= & {} 0, \end{aligned}$$
(8)

where \(\mu \) is the vector of Lagrange multipliers of (6).
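Since (7)–(8) involve only the \(i_k\) subspace coordinates \(z\) and the \(j_k\) selected multipliers \(\mu \), the reduced KKT system is small and can be solved directly. The following NumPy sketch of this step is our own illustration; it assumes the reduced constraint gradients \(P_k^TA_k^TQ_k\) have full row rank so that the system is nonsingular.

```python
import numpy as np

def subspace_sqp_step(g, c, A, B, Q, P):
    """Solve the reduced KKT system (7)-(8) of subproblem (6).

    Q: n x i_k basis of the subspace, P: m x j_k selection of constraints.
    Returns the full-space direction d = Q z and lambda^+ = P mu.
    """
    ik, jk = Q.shape[1], P.shape[1]
    AP = A @ P                                   # n x j_k
    K = np.block([[Q.T @ B @ Q, Q.T @ AP],
                  [AP.T @ Q,    np.zeros((jk, jk))]])
    rhs = -np.concatenate([Q.T @ g, P.T @ c])
    sol = np.linalg.solve(K, rhs)
    z, mu = sol[:ik], sol[ik:]
    return Q @ z, P @ mu                         # d_k and lambda_k^+
```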

Now we present a subspace SQP method for equality constrained optimization. It can be considered as a modification of the subspace method for equality constrained optimization proposed by Yuan [19].

Algorithm 1 (Subspace SQP algorithm)

  Step 1

    Given \(x_0\), set an initial subspace \({\mathcal {T}}_0\), an initial penalty parameter \(\sigma _0> 0\), a tolerance \(\epsilon >0\), and the iteration number \(k{:}{=}0\).

  Step 2

    Solve the subspace subproblem (6):

    $$\begin{aligned}&\min _{z\in {\mathbb {R}}^{i_k}} {\bar{\varphi }}_k(z)=(Q_k^Tg_k)^Tz+\frac{1}{2}z^TQ_k^TB_kQ_kz\\&\text {subject to }\, P_k^T(A_k^TQ_kz+c_k)=0, \end{aligned}$$

    to obtain \(z_k\) and the Lagrange multipliers \(\mu _k\), and set \(d_k=Q_kz_k\) and \(\lambda _k^{+}=P_k\mu _k\). If \(\left\| d_k\right\| \le \epsilon \), then stop.

  Step 3

    Perform a line search to obtain the stepsize \(\alpha _k\) which satisfies

    $$\begin{aligned} \phi (x_k+\alpha _k d_k, \sigma _k) < \phi (x_k, \sigma _k) \end{aligned}$$

    and set

    $$\begin{aligned} x_{k+1}= & {} x_k+\alpha _k d_k\\ \lambda _{k+1}= & {} \lambda _k+\alpha _k (\lambda _k^{+}-\lambda _k) \end{aligned}$$

    where \(\phi (x,\sigma )\) is a penalty function and \(\sigma \) is a penalty parameter.

  Step 4

    Generate \(\sigma _{k+1}\), subspace \({\mathcal {T}}_{k+1}\), and \(\bar{\varphi }_{k+1}(z)\).

  Step 5

    Set \(k{:}{=}k+1\) and go to Step 2.
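To fix ideas, the following self-contained Python sketch runs Algorithm 1 on a small artificial problem of our own. It simplifies several ingredients of the actual implementation in Sect. 5: \(B_k\) is kept equal to the identity instead of being updated by the damped limited-memory BFGS formula, plain gradient differences replace \({\bar{y}}_k\), and \(P_k=I\) since the toy problem has a single constraint.

```python
import numpy as np

# illustrative toy instance (our own, not from the paper): n = 5, m = 1
a = np.array([1.0, 2.0, 0.5, -1.0, 0.3])

def f(x):      return 0.5 * np.sum((x - a) ** 2)
def grad_f(x): return x - a
def c(x):      return np.array([x @ x - 1.0])          # single equality constraint
def jac_c(x):  return (2.0 * x).reshape(-1, 1)         # A_k: constraint gradients as columns

def penalty(x, sigma):                                  # L_inf exact penalty, cf. (12)
    return f(x) + sigma * np.max(np.abs(c(x)))

x = np.full(5, 0.1)
B = np.eye(5)                                           # B_k kept fixed for brevity
s_prev = ybar_prev = None
eps, eta, tau, rho_bar = 1e-6, 1e-3, 0.5, 1e-3

for k in range(50):
    g, ck, A = grad_f(x), c(x), jac_c(x)
    # Step 1/4: build the subspace T_k and an orthonormal basis Q_k
    cols = [g] + ([s_prev, ybar_prev] if s_prev is not None else []) + [A[:, 0]]
    Q, _ = np.linalg.qr(np.column_stack(cols))
    # Step 2: reduced KKT system (7)-(8); here m is small, so P_k = I
    ik, m = Q.shape[1], A.shape[1]
    K = np.block([[Q.T @ B @ Q, Q.T @ A], [A.T @ Q, np.zeros((m, m))]])
    sol = np.linalg.solve(K, -np.concatenate([Q.T @ g, ck]))
    d, mu = Q @ sol[:ik], sol[ik:]
    if np.linalg.norm(d) <= eps:
        break
    # Step 3: backtracking line search on the penalty function
    sigma = np.sum(np.abs(mu)) + rho_bar                # penalty parameter rule (15)
    slope = g @ d - sigma * np.max(np.abs(ck))          # bound on the directional derivative
    alpha = 1.0
    while alpha > 1e-10 and penalty(x + alpha * d, sigma) > penalty(x, sigma) + eta * alpha * slope:
        alpha *= tau
    # plain gradient differences; the paper uses Lagrangian gradients and damping
    s_prev, ybar_prev = alpha * d, grad_f(x + alpha * d) - g
    x = x + alpha * d

print("x* =", x, " c(x*) =", c(x))
```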

The key issues in Algorithm 1 are how to choose the subspace and how to solve the corresponding subspace subproblem quickly. We adopt the damped limited-memory BFGS update formula for equality constrained problems, which guarantees that \(B_k\) is positive definite [13]. With the relation \(H_k=B_k^{-1}\), we apply the following formulae:

$$\begin{aligned} B_0=I, \quad H_0=I. \end{aligned}$$

For \(k \ge 1\),

$$\begin{aligned}&B_k=\delta _k I -\left[ \delta _k S_k\quad {\bar{Y}}_k \right] \begin{bmatrix} \delta _k S_k^T S_k&L_k\\ L_k^T&-D_k \end{bmatrix}^{-1}\begin{bmatrix} \delta _kS_k^T \\ {\bar{Y}}_k^T \end{bmatrix}, \\&H_k=\gamma _k I + \left[ S_k\quad \gamma _k{\bar{Y}}_k \right] \begin{bmatrix} R_k^{-T}(D_k+\gamma _k {\bar{Y}}_k^T{\bar{Y}}_k)R_k^{-1}&-R_k^{-T}\\ -R_k^{-1}&0\end{bmatrix}\begin{bmatrix} S_k^T \\ \gamma _k{\bar{Y}}_k^T \end{bmatrix}, \end{aligned}$$

where

$$\begin{aligned}&S_k = \left\{ \begin{array}{ll} \left[ s_0,\ldots ,s_{k-1}\right] &{} \text { if}\quad k \le {\bar{m}}\\ \left[ s_{k-{\bar{m}}},\ldots ,s_{k-1}\right] &{} \text { otherwise}\\ \end{array} \right. , \\&{\bar{Y}}_k = \left\{ \begin{array}{ll} \left[ {\bar{y}}_0,\ldots ,{\bar{y}}_{k-1}\right] &{} \text { if} \quad k \le {\bar{m}}\\ \left[ {\bar{y}}_{k-{\bar{m}}},\ldots , {\bar{y}}_{k-1}\right] &{} \text { otherwise}\\ \end{array} \right. , \\&D_k = \left\{ \begin{array}{ll} \text {diag}\left( s_0^T{\bar{y}}_0,\ldots ,s_{k-1}^T{\bar{y}}_{k-1}\right) &{} \text { if}\quad k \le {\bar{m}}\\ \text {diag}\left( s_{k-{\bar{m}}}^T{\bar{y}}_{k-{\bar{m}}},\ldots ,s_{k-1}^T{\bar{y}}_{k-1}\right) &{} \text { otherwise}\\ \end{array} \right. , \\&\delta _k=\frac{s_{k-1}^T{\bar{y}}_{k-1}}{\left\| s_{k-1}\right\| _2^2}, \quad \gamma _k = \frac{1}{\delta _k}, \\&(L_k)_{ij} = \left\{ \begin{array}{ll} {\tilde{s}}_{i-1}^T {\tilde{y}}_{j-1} &{} \text { if}\quad i> j \\ 0&{} \text { otherwise}\\ \end{array} \right. ,\quad i,j = \left\{ \begin{array}{ll} 1,2,\ldots ,k &{} \text { if}\quad k \le {\bar{m}}\\ 1,2,\ldots ,{\bar{m}} &{} \text { otherwise}\\ \end{array} \right. , \\&(R_k)_{ij} = \left\{ \begin{array}{ll} {\tilde{s}}_{i-1}^T {\tilde{y}}_{j-1} &{} \text { if}\quad i \le j \\ 0&{} \text { otherwise}\\ \end{array} \right. ,\quad i,j = \left\{ \begin{array}{ll} 1,2,\ldots ,k &{} \text { if}\quad k \le {\bar{m}}\\ 1,2,\ldots ,{\bar{m}} &{} \text { otherwise}\\ \end{array} \right. , \\&{\tilde{s}}_i = \left\{ \begin{array}{ll} s_i &{} \text { if}\quad k \le {\bar{m}}\\ s_{k-{\bar{m}}+i}&{} \text { otherwise}\\ \end{array} \right. ,\quad {\tilde{y}}_i = \left\{ \begin{array}{ll} {\bar{y}}_i &{} \text { if}\quad k \le {\bar{m}}\\ {\bar{y}}_{k-{\bar{m}}+i}&{} \text { otherwise}\\ \end{array} \right. , \\&s_i=x_{i+1}-x_i, \quad y_i=\nabla _x {\mathcal {L}}(x_{i+1},\lambda _{i+1})-\nabla _x \mathcal {L}(x_i,\lambda _i), \\&\theta _i= \left\{ \begin{array}{ll} 1 &{} \text { if}\quad s_i^Ty_i\ge 0.2s_i^TB_is_i\\ (0.8s_i^TB_is_i)/(s_i^TB_is_i-s_i^Ty_i) &{} \text { otherwise}\\ \end{array} \right. , \\&{\bar{y}}_i=\theta _iy_i+(1-\theta _i)B_is_i, \end{aligned}$$

and \(\lambda _i\) are the Lagrange multipliers. The modified differences \({\bar{y}}_i\) of the gradients of the Lagrangian are used to guarantee that \(B_k\) is positive definite. See [13] for details.
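A sketch of the damping step and of the compact form of \(B_k\) above (dense NumPy for clarity; this is our own transcription, not the code used in Sect. 5). The stored pairs are assumed to be kept as the columns of \(S_k\) and \({\bar{Y}}_k\):

```python
import numpy as np

def damped_y(s, y, B):
    """Damping (the theta_i rule above): ybar = theta*y + (1-theta)*B s, so that s^T ybar > 0."""
    Bs = B @ s
    sBs, sy = s @ Bs, s @ y
    theta = 1.0 if sy >= 0.2 * sBs else (0.8 * sBs) / (sBs - sy)
    return theta * y + (1.0 - theta) * Bs

def compact_lbfgs_B(S, Ybar, n):
    """Dense B_k from the compact representation above; S, Ybar hold the stored pairs as columns."""
    delta = (S[:, -1] @ Ybar[:, -1]) / (S[:, -1] @ S[:, -1])
    SY = S.T @ Ybar
    D = np.diag(np.diag(SY))            # D_k = diag(s_i^T ybar_i)
    L = np.tril(SY, -1)                 # L_k = strictly lower triangular part of S^T Ybar
    W = np.hstack([delta * S, Ybar])
    M = np.block([[delta * (S.T @ S), L], [L.T, -D]])
    return delta * np.eye(n) - W @ np.linalg.solve(M, W.T)
```

In an actual implementation \(B_k\) is never formed densely; the compact form is used only to apply \(B_k\) (or \(H_k\)) to vectors, at a cost proportional to \(n{\bar{m}}\) per product.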

There are a variety of methods for solving the subproblem (2) and the corresponding subspace subproblem (6). In our numerical test, we use the range space method for these subproblems since \(B_k^{-1}\) is explicitly given as \(H_k\) through the quasi-Newton update formula [12]. More specifically, the range space method solves the following two systems

$$\begin{aligned} (A_k^TB_k^{-1}A_k) \lambda= & {} -A_k^TB_k^{-1}g_k +c_k, \end{aligned}$$
(9)
$$\begin{aligned} d= & {} -B_k^{-1}(g_k+A_k\lambda ), \end{aligned}$$
(10)

for the solution of the subproblem (2).
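A minimal sketch of this range space solve (our own illustration): `apply_H` stands for the action of \(H_k=B_k^{-1}\) on a vector or matrix, which the quasi-Newton representation provides without forming \(B_k^{-1}\) explicitly.

```python
import numpy as np

def range_space_step(g, c, A, apply_H):
    """Range-space solution of subproblem (2): solve (9) for lambda, then (10) for d."""
    HA = apply_H(A)                                 # B^{-1} A, n x m
    Hg = apply_H(g)                                 # B^{-1} g
    lam = np.linalg.solve(A.T @ HA, -A.T @ Hg + c)  # system (9)
    d = -apply_H(g + A @ lam)                       # step (10)
    return d, lam

# usage with an explicit inverse Hessian approximation H (illustrative)
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 6, 2
    H = np.eye(n)
    A = rng.standard_normal((n, m))
    g = rng.standard_normal(n)
    c = rng.standard_normal(m)
    d, lam = range_space_step(g, c, A, lambda v: H @ v)
    print(np.linalg.norm(A.T @ d + c))              # ~ 0: linearized constraints satisfied
```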

In Step 3, we use a backtracking line search, which starts with \(\alpha =1\). The stepsize \(\alpha _k=\alpha \) is accepted if

$$\begin{aligned} \phi (x_k+\alpha _kd_k, \sigma _k)\le \phi (x_k, \sigma _k)+\eta \alpha _kD_{d_k}\phi (x_k, \sigma _k), \end{aligned}$$

where \(\eta \in (0, \frac{1}{2})\) and \(D_{d}\phi (x, \sigma )\) is the directional derivative of \(\phi \) at \(x\) in the direction \(d\). Otherwise, we set \(\alpha \leftarrow \tau \alpha \) with \(\tau \in (0,1)\) and repeat the test. More details of our numerical experiments are given in Sect. 5.
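A sketch of this backtracking procedure (our own illustration, with the same default parameter values as in Sect. 5 and a tiny-stepsize safeguard): `phi` is the penalty function with \(\sigma _k\) fixed, and `dphi` is the directional derivative \(D_{d_k}\phi (x_k,\sigma _k)\).

```python
import numpy as np

def backtracking(phi, x, d, dphi, eta=1e-3, tau=0.5, alpha_min=1e-6):
    """Armijo backtracking on the penalty function phi along direction d.

    dphi is (an upper bound on) the directional derivative D_d phi(x, sigma) < 0.
    """
    alpha = 1.0
    phi0 = phi(x)
    while phi(x + alpha * d) > phi0 + eta * alpha * dphi:
        alpha *= tau
        if alpha < alpha_min:        # mirrors the tiny-stepsize safeguard of Sect. 5
            return None
    return alpha

# example: simple quadratic merit function along a descent direction
if __name__ == "__main__":
    phi = lambda x: float(x @ x)
    x, d = np.array([1.0, 1.0]), np.array([-1.0, -1.0])
    print(backtracking(phi, x, d, dphi=2 * (x @ d)))
```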

4 Convergence analysis

In this section, we analyze convergence properties of the proposed method under the following assumptions:

Assumption 1

The starting point and all succeeding iterates lie in some closed, bounded and convex region \({\mathcal {C}}\) in \({\mathbb {R}}^n\).

Assumption 2

The columns of \(A(x)=\begin{bmatrix}\nabla c_1(x), \ldots , \nabla c_{m}(x) \end{bmatrix}\) are linearly independent for all \(x\in {\mathcal {C}}\).

Assumption 3

For all \(d\in {\mathbb {R}}^n\), there are positive constants \(\beta _1\) and \(\beta _2\) such that

$$\begin{aligned} \beta _1 \Vert d\Vert _2^2 \le d^TB_kd \le \beta _2 \Vert d\Vert _2^2 \end{aligned}$$

for all k.

These assumptions are commonly used to prove global convergence of algorithms for constrained optimization problems [2].

We consider two cases depending on the number of constraints m relative to the number of variables n. For simplicity, we assume \(k \ge {\bar{m}}\).

4.1 Case I: Few constraints

First, we consider the case of relatively few constraints in the problem (1). Since \(m \ll n\), the subproblem (5) is used at each iteration. The following theorem shows that subproblem (2) is equivalent to subproblem (5). Its proof is an adaptation of Lemma 2.2 in [18].

Theorem 1

Suppose \(B_k\) is the k-th updated matrix by the damped limited-memory BFGS update formula. Let \(d_k\) be a solution of problem (2), and \(s_k=\alpha _k d_k\) with stepsize \(\alpha _k>0\). Let

$$\begin{aligned} {{\mathcal {T}}^F_k}=\text {span} \left\{ g_k, s_{k-{\bar{m}}}, \ldots , s_{k-1}, {\bar{y}}_{k-{\bar{m}}}, \ldots , {\bar{y}}_{k-1}, \nabla c_1(x_k), \ldots , \nabla c_{m}(x_k)\right\} . \end{aligned}$$

If \(Q_k\) is an orthonormal basis matrix of \({\mathcal {T}}^F_k\), then subproblem (2) is equivalent to (5).

Proof

We denote the dimension of the subspace \({\mathcal {T}}^F_k\) by r. Let \(U_k\in {\mathbb {R}}^{n\times (n-r)}\) be a matrix such that \(\left[ Q_k, \,\, U_k\right] \) is an \(n\times n\) orthogonal matrix. Then, for each \(d\in {\mathbb {R}}^n\), there exists one and only one pair \(z\in {\mathbb {R}}^r\), \(u\in {\mathbb {R}}^{n-r}\) such that \(d=Q_kz+U_ku\). Thus, it follows that

$$\begin{aligned} \varphi _k(d)= & {} g_k^T d + \frac{1}{2} d^T B_k d\nonumber \\= & {} g_k^T (Q_k z + U_k u) + \frac{1}{2} (Q_k z + U_k u)^T B_k (Q_k z + U_k u)\nonumber \\= & {} (Q_k^T g_k)^T z + (U_k^T g_k)^T u+\frac{1}{2} z^TQ_k^T B_k Q_k z +u^TU_k^T B_k Q_k z\nonumber \\&+\frac{1}{2} u^TU_k^T B_k U_k u. \end{aligned}$$
(11)

By the choice of \({\mathcal {T}}^F_k\), we have \(S_k^TU_k=0\), \({\bar{Y}}_k^TU_k=0\), and \(g_k^TU_k=0\). Hence, it follows that

$$\begin{aligned} B_kU_k=\delta _kU_k-\left[ \delta _k S_k\quad {\bar{Y}}_k \right] \begin{bmatrix} \delta _k S_k^T S_k&L_k\\ L_k^T&-D_k \end{bmatrix}^{-1}\begin{bmatrix} \delta _kS_k^TU_k \\ {\bar{Y}}_k^TU_k \end{bmatrix}=\delta _kU_k, \end{aligned}$$

and \(U_k^T B_k Q_k=\delta _k U_k^T Q_k =0\). By (11), we have

$$\begin{aligned} \varphi _k(d)=(Q_k^T g_k)^T z + \frac{1}{2} z^TQ_k^T B_k Q_k z+\frac{1}{2}\delta _k \left\| u\right\| _2^2. \end{aligned}$$

Since \(A_k^TU_k=0\), we have

$$\begin{aligned} 0=A_k^T d + c_k=A_k^T(Q_kz+U_k u)+c_k=A_k^T Q_k z+c_k. \end{aligned}$$

The above relations indicate that (2) is equivalent to

$$\begin{aligned}&\min _{z, \, u} \quad \quad \quad (Q_k^Tg_k)^Tz+\frac{1}{2}z^TQ_k^TB_kQ_kz+\frac{1}{2}\delta _k \left\| u \right\| _2^2\\&\text {subject to }\, A_k^TQ_kz+c_k=0. \end{aligned}$$

Since \(\delta _k>0\), the above problem attains its minimum with respect to u at \(u=0\), and hence it is equivalent to (5). Therefore (2) is equivalent to (5) with \(d_k=Q_kz_k\).    \(\square \)

Theorem 1 ensures that, in this case, the proposed method generates the same iterates as the standard SQP method. Thus, we can obtain the global convergence of the proposed method from Theorem 4.3 of Boggs and Tolle [2]. For completeness, we restate this theorem for the \(L_1\) exact penalty function [10]

$$\begin{aligned} \phi _1(x,\sigma )=f(x)+\sigma \left\| c(x)\right\| _1. \end{aligned}$$

Theorem 2

(Boggs and Tolle [2]) Assume that \(\sigma \) is chosen by

$$\begin{aligned} \sigma = \left\| \lambda _k^{+}\right\| _{\infty } +\bar{\rho } \end{aligned}$$

for some constant \(\bar{\rho } > 0\). Then the proposed algorithm started at any point \(x_0\) with stepsize \(\alpha \ge \bar{\alpha } > 0\) chosen to satisfy

$$\begin{aligned} \phi _1(x_k+\alpha d_k, \sigma _k)\le \phi _1(x_k, \sigma _k)+\eta \alpha D_{d_k}\phi _1(x_k, \sigma _k) \end{aligned}$$

converges to a KKT point of (1).

4.2 Case II: Many constraints

Now, we consider the case of relatively many constraints, i.e. \(m \approx n\) in (1). In this case, problem (6) is solved at each iteration. Since we reduce the number of constraints in the subproblem, we cannot expect the proposed method to generate the same iterates as the standard SQP method. Nevertheless, we can show that the search direction is a descent direction for the \(L_{\infty }\) exact penalty function [10]

$$\begin{aligned} \phi _{\infty }(x,\sigma )=f(x)+\sigma \left\| c(x)\right\| _{\infty }. \end{aligned}$$
(12)

Using this property, the global convergence of the proposed method is obtained.

First of all, we state the following result of Yuan [20]. It is used to show that the solution of the subproblem (6) is a descent direction of \(\phi _{\infty }(x,\sigma )\).

Lemma 1

(Yuan [20]) Suppose we sort the violations in decreasing order:

$$\begin{aligned} \left| c_{l_1}(x_k)\right| \ge \left| c_{l_2}(x_k)\right| \ge \cdots \ge \left| c_{l_{m}}(x_k)\right| , \end{aligned}$$
(13)

and suppose that \(j_k\) satisfies \(|c_{l_{j_k+1}}(x_k)| < \Vert c_k\Vert _{\infty }\). If the search direction \(d_k\) satisfies

$$\begin{aligned} P_k^T(c_k+A_k^Td_k)=0, \end{aligned}$$

where

$$\begin{aligned} P_k=\left[ e_{l_1},e_{l_2},\ldots ,e_{l_{j_k}}\right] \in {\mathbb {R}}^{m \times j_k}, \end{aligned}$$
(14)

then \(d_k\) is a descent direction of \(\left\| c(x)\right\| _{\infty }\) at \(x_k\).

Using Lemma 1, we derive the following theorem for a descent property of the search direction.

Theorem 3

Suppose \(\{x_k\}\), \(\{d_k\}\), \(\{\sigma _k\}\), \(\{c_k\}\), \(\{Q_k\}\), \(\{P_k\}\), and \(\{B_k\}\) are sequences generated by the proposed subspace SQP algorithm under Assumptions 2 and 3. If \(|c_{l_{j_k+1}}(x_k)| < \left\| c_k\right\| _{\infty }\), \(P_k\) is given by (14), and \(\sigma _k\) is chosen sufficiently large, then the search direction \(d_k\) is a descent direction for \(\phi _{\infty }(x,\sigma )\); that is, the directional derivative at \(x_k\) in the direction \(d_k\) satisfies

$$\begin{aligned} D_{d_k}\phi _{\infty }(x_k,\sigma _k)<0. \end{aligned}$$

Proof

Using Lemma 1, we obtain that

$$\begin{aligned} D_{d_k}\phi _{\infty }(x_k,\sigma _k)= & {} g_k ^Td_k-\sigma _k \Vert c_k\Vert _{\infty }\\= & {} g_k^T Q_k z_k - \sigma _k \Vert c_k\Vert _{\infty }\\= & {} -z_k^T Q_k^T B_k Q_k z_k - \mu _k^T P_k^T A_k^T Q_k z_k -\sigma _k \Vert c_k\Vert _{\infty }\\= & {} -z_k^T Q_k^T B_k Q_k z_k + \mu _k^T P_k^T c_k -\sigma _k \Vert c_k\Vert _{\infty }\\\le & {} -z_k^T Q_k^T B_k Q_k z_k -(\sigma _k - \Vert \mu _k\Vert _1)\Vert c_k\Vert _{\infty }, \end{aligned}$$

where the third equality follows from (7), the fourth equality from (8), and the last inequality from Hölder’s inequality. Assumption 3 implies that \(-z_k^T Q_k^T B_k Q_k z_k\) is negative whenever \(z_k\ne 0\), which holds as long as the algorithm has not terminated. Thus, if \(\sigma _k\) is sufficiently large, that is, \(\sigma _k > \Vert \mu _k\Vert _1\), then \(D_{d_k}\phi _{\infty }(x_k,\sigma _k)<0\). Hence the search direction \(d_k\) is a descent direction for \(\phi _{\infty }(x,\sigma )\). \(\square \)

Now we establish the global convergence of our algorithm by applying the idea of Powell [14].

Theorem 4

Suppose \(\{x_k\}\), \(\{d_k\}\), \(\{\sigma _k\}\), \(\{c_k\}\), \(\{Q_k\}\), \(\{P_k\}\), and \(\{B_k\}\) are sequences generated by the proposed subspace SQP algorithm under Assumptions 1–3. If \(|c_{l_{j_k+1}}(x_k)| < \left\| c_k\right\| _{\infty }\), \(P_k\) is given by (14), \(\sigma _k\) is chosen as

$$\begin{aligned} \sigma _k = \left\| \mu _k\right\| _1 +\bar{\rho } \end{aligned}$$
(15)

for some constant \(\bar{\rho } > 0\), \(Q_k\) is an orthonormal basis matrix of \({\mathcal {T}}^M_k\) where

$$\begin{aligned} {{\mathcal {T}}^M_k}=\text {span} \left\{ g_k, s_{k-{\bar{m}}}, \ldots , s_{k-1}, {\bar{y}}_{k-{\bar{m}}}, \ldots , {\bar{y}}_{k-1}, \nabla c_{l_1}(x_k), \ldots , \nabla c_{l_{j_k}}(x_k)\right\} , \end{aligned}$$

and the stepsize \(\alpha _k > 0\) satisfies

$$\begin{aligned} \phi _{\infty }(x_k+\alpha _k d_k, \sigma _k)\le \phi _{\infty }(x_k, \sigma _k)+\eta \alpha _k D_{d_k}\phi _{\infty }(x_k, \sigma _k) \end{aligned}$$
(16)

for \(\eta \in (0, \frac{1}{2})\), then any limit point of \(\{x_k\}\) is a KKT point of (1).

Proof

First, we show that if \(d_k=0\), then \(x_k\) is a KKT point of the original problem (1). Using \(d_k=Q_kz_k\) and \(\lambda _k^{+}=P_k\mu _k\), we can rewrite Eqs. (7) and (8) as follows:

$$\begin{aligned} Q_k^T(B_k d_k +A_k\lambda _k^{+}+g_k)= & {} 0, \end{aligned}$$
(17)
$$\begin{aligned} P_k^T(A_k^Td_k+c_k)= & {} 0. \end{aligned}$$
(18)

With a matrix \(U_k\) which makes \(\left[ Q_k, \,\, U_k\right] \) an \(n\times n\) orthogonal matrix, we have the following relations:

$$\begin{aligned}&B_kU_k=\delta _kU_k,\,\, U_k^TA_kP_k=0,\,\, U_k^TB_kQ_k=\delta _kU_k^TQ_k=0,\,\, U_k^Tg_k=0, \nonumber \\&U_k^T(B_k d_k +A_k\lambda _k^{+}+g_k)=\delta _k U_k^T Q_k z_k+U_k^TA_kP_k\mu _k+U_k^Tg_k=0. \end{aligned}$$
(19)

Using (17) and (19), we have

$$\begin{aligned} B_k d_k +A_k\lambda _k^{+}+g_k= & {} 0, \nonumber \\ P_k^T(A_k^Td_k+c_k)= & {} 0. \end{aligned}$$
(20)

If \(d_k=0\), then

$$\begin{aligned} A_k\lambda _k^{+}+g_k= & {} 0, \\ P_k^Tc_k= & {} 0. \end{aligned}$$

From the ordering (13), \(P_k^Tc_k=0\) implies \(c_k=0\), since the components selected by \(P_k\) are those with the largest magnitude. Thus, \(x_k\) is a KKT point of the original problem (1) with the Lagrange multiplier \(\lambda _k^{+}\).

Now suppose \(d_k \ne 0\) for all k. We define the following two functions \({\bar{L}}_k\) and \(\bar{\phi }_k\) of d:

$$\begin{aligned} {\bar{L}}_k(x_k+d)= & {} f_k+g_k^T(Q_kz)+\frac{1}{2}(Q_kz)^TB_k(Q_kz)+\mu _k^TP_k^T(A_k^TQ_kz+c_k)\nonumber \\= & {} f_k+g_k^Td+\frac{1}{2}d^TB_kd+{\lambda _k^+}^T(A_k^Td+c_k),\nonumber \\ \bar{\phi }_k(x_k+d)= & {} f_k+g_k^T(Q_kz)+\frac{1}{2}(Q_kz)^TB_k(Q_kz)+\sigma _k \left\| P_k^T(A_k^TQ_kz+c_k)\right\| _{\infty } \nonumber \\= & {} f_k+g_k^Td+\frac{1}{2}d^TB_kd+\sigma _k \left\| P_k^T(A_k^Td+c_k)\right\| _{\infty }, \end{aligned}$$
(21)

with \(d=Q_kz\), \(\lambda _k^{+}=P_k\mu _k\). These functions are convex and achieve their minimum values at \(d=d_k\). Because of (20), we have

$$\begin{aligned} {\bar{L}}_k(x_k)={\bar{L}}_k(x_k+d_k)+\frac{1}{2}d_k^TB_kd_k\ge {\bar{L}}_k(x_k+d_k). \end{aligned}$$
(22)

The difference between these two functions is bounded from below as follows:

$$\begin{aligned} \bar{\phi }_k(x_k+d)-{\bar{L}}_k(x_k+d)= & {} \sigma _k\left\| P_k^T(A_k^Td+c_k)\right\| _{\infty }-\mu _k^TP_k^T(A_k^Td+c_k) \nonumber \\\ge & {} (\sigma _k-\Vert \mu _k\Vert _1)\left\| P_k^T(A_k^Td+c_k)\right\| _{\infty } \nonumber \\\ge & {} \bar{\rho }\left\| P_k^T(A_k^Td+c_k)\right\| _{\infty }. \end{aligned}$$
(23)

If any of the linearized constraints in (6b) is violated by d, then we have

$$\begin{aligned} \bar{\phi }_k(x_k+d) > {\bar{L}}_k(x_k+d). \end{aligned}$$

In this case, if \(x_k\) is infeasible with respect to the given nonlinear constraints (1b), we have the bound

$$\begin{aligned} \bar{\phi }_k(x_k) > {\bar{L}}_k(x_k)\ge {\bar{L}}_k(x_k+d_k)=\bar{\phi }_k(x_k+d_k). \end{aligned}$$
(24)

If, on the other hand, \(x_k\) is feasible, we have the condition

$$\begin{aligned} \bar{\phi }_k(x_k) = {\bar{L}}_k(x_k)> {\bar{L}}_k(x_k+d_k)=\bar{\phi }_k(x_k+d_k), \end{aligned}$$
(25)

where the strict inequality follows from the fact that the term \(d_k^TB_kd_k\) in (22) is positive. Hence the quantity

$$\begin{aligned} t_k=\bar{\phi }_k(x_k)-\bar{\phi }_k(x_k+d_k) \end{aligned}$$

is positive. In addition, we have

$$\begin{aligned} \Vert c(x_k+\alpha d_k)\Vert _{\infty }= & {} \Vert c_k\Vert _{\infty }+\alpha D_{d_k}\Vert c(x_k)\Vert _{\infty }+o(\alpha )\\= & {} \Vert c_k\Vert _{\infty }-\alpha \Vert c_k\Vert _{\infty }+o(\alpha ) \end{aligned}$$

and

$$\begin{aligned} \bar{\phi }_k(x_k+\alpha d_k) = f_k+\alpha g_k^Td_k+\frac{1}{2}\alpha ^2 d_k^TB_kd_k+\sigma _k (1-\alpha )\left\| c_k\right\| _{\infty }. \end{aligned}$$

Owing to the above two relations and the convexity of the function (21), we have

$$\begin{aligned} \phi _\infty (x_k,\sigma )-\phi _{\infty }(x_k+\alpha d_k,\sigma )= & {} \bar{\phi }_k(x_k)-\bar{\phi }_k(x_k+\alpha d_k)+o(\alpha )\nonumber \\\ge & {} \alpha [\bar{\phi }_k(x_k)-\bar{\phi }_k(x_k+d_k)]+o(\alpha )\nonumber \\= & {} \alpha t_k +o(\alpha ),\,\,\, 0\le \alpha \le 1. \end{aligned}$$
(26)

Therefore the reduction of the penalty function

$$\begin{aligned} \phi _{\infty }(x_k+\alpha _k d_k, \sigma _k) < \phi _{\infty }(x_k, \sigma _k) \end{aligned}$$

can be achieved by a sufficiently small and positive \(\alpha _k\). From the choice of \(\alpha _k\) satisfying (16), we can replace the above condition on the stepsize by the inequality

$$\begin{aligned} \phi _{\infty }(x_{k+1},\sigma )\le \phi _{\infty }(x_k,\sigma )-\eta \alpha _k t_k, \end{aligned}$$
(27)

since \(D_{d_k}\phi _{\infty }(x_k,\sigma _k)\le \bar{\phi }_k(x_k+d_k)-\bar{\phi }_k(x_k)=-t_k<0\).

Let \(\xi \) be a small positive constant, and consider the iterations with \(t_k > \xi \). Because continuity of the first order derivatives on a compact domain implies uniform continuity, and the vectors \(d_k\) in (26) are bounded, there is a positive constant \(\beta (\xi )\) such that the inequality (27) holds for any \(\alpha _k \in [0,\beta (\xi )]\). Thus, by the line search process, we deduce that

$$\begin{aligned} \left[ \phi _\infty (x_k,\sigma )-\phi _\infty (x_{k+1},\sigma )\right] \ge \eta \xi \bar{\alpha }, \end{aligned}$$

where \(\bar{\alpha }\in (0,\beta (\xi )]\). However, the reduction in the line search tends to zero, because \(\{\phi _{\infty }(x_k,\sigma ):k=1,2,3,\ldots \}\) is a monotonically decreasing sequence which is bounded below. Therefore \(t_k\) tends to zero, that is,

$$\begin{aligned} \lim _{k \rightarrow \infty }[\bar{\phi }_k(x_k)-\bar{\phi }_k(x_k+d_k)]=0. \end{aligned}$$
(28)

Because either (24) or (25) is satisfied for each k, the limit (28) gives

$$\begin{aligned} \lim _{k \rightarrow \infty }[\bar{\phi }_k(x_k)-{\bar{L}}_k(x_k)]=0 \end{aligned}$$
(29)

and

$$\begin{aligned} \lim _{k \rightarrow \infty }[{\bar{L}}_k(x_k)-{\bar{L}}_k(x_k+d_k)]=0. \end{aligned}$$
(30)

By letting \(d=0\) in the estimate (23) and using the condition (15), we deduce from (29) that \(\Vert P_k^Tc_k\Vert _{\infty }=\Vert c_k\Vert _{\infty }\rightarrow 0\); hence all limit points of the sequence \(\{x_k:k=1,2,3,\ldots \}\) satisfy the nonlinear constraints (1b).

Without loss of generality, we assume that \(\{x_k:k=1,2,3,\ldots \}\) has only one limit point, say \(x_*\). By the above arguments, \(c_j(x_*)=0\) for all \(j=1,\dots ,m\). From (20), we have that

$$\begin{aligned} \lim _{k \rightarrow \infty } \left\| B_k d_k+A_k\lambda _k^{+}+g_k\right\| =0. \end{aligned}$$
(31)

Since (22) and (30) imply \(B_k^{1/2}d_k\rightarrow 0\), and the matrices \(\{B_k : k=1,2,3,\ldots \}\) are uniformly bounded, the limit (31) still holds after removing the term \(B_k d_k\). Because the Lagrange multipliers are bounded, by passing to a subsequence if necessary we deduce that \(A_*\lambda _*^{+}+g_*=0\), where \(\lambda _*^{+}\) is a cluster point of \(\{\lambda _k^{+}\}\), \(\lim _{k \rightarrow \infty }A_k=A_*\), and \(\lim _{k \rightarrow \infty }g_k=g_*\). Hence \(x_*\) is a KKT point of (1). \(\square \)

5 Numerical results

In this section, we describe the implementation of the proposed algorithm SubSQP and report numerical results on a selection of CUTEst problems [8]. To show the effectiveness of SubSQP on large scale nonlinear equality constrained problems, we test it on 11 CUTEst problems. In most of these problems, the number of variables is at least 5000, the number of constraints exceeds half of the number of variables, and all constraints are nonlinear equalities. The experiment also includes 5 problems with few constraints. To show the soundness and efficiency of SubSQP, we compare it with the standard SQP method. All tests are performed on an Intel Core i5 3.50 GHz CPU with 8 GB memory, running Ubuntu 16.04 LTS and MATLAB R2016b.

For numerical tests, parameters are chosen as \(\eta = 10^{-3}\), \(\tau =0.5\), \(\sigma _k = \Vert \mu _k\Vert _1+\bar{\rho }\), \(\bar{\rho }=10^{-3}\), \(\epsilon = 10^{-3}\) and the maximum number of iterations is 100. Since we are mostly interested in large scale problems, we use the \(L_{\infty }\) exact penalty function

$$\begin{aligned} \phi _{\infty }(x,\sigma )=f(x)+\sigma \left\| c(x)\right\| _{\infty }, \end{aligned}$$

to measure progress towards a solution.

In order to prevent a rapid increase of the subspace dimension in the case of many constraints, we limit the number of constraint gradients \(\nabla c_{l_j}(x_k)\) in the subspace to \({\bar{m}}\) for all k, unless specified otherwise. To decrease the constraint violations effectively, we select the constraints with the \({\bar{m}}\) largest violations in magnitude after sorting:

$$\begin{aligned} \left| c_{l_1}(x_k)\right| \ge \left| c_{l_2}(x_k)\right| \ge \cdots \ge \left| c_{l_{m}}(x_k)\right| . \end{aligned}$$

By including the gradients of those constraints in the subspace \({\mathcal {T}}_k^M\), the step \(s_k=\alpha _k d_k\) generated by the proposed method moves in a direction that reduces the most severe constraint violations. Accordingly, we choose \(P_k\) in (14) as follows:

$$\begin{aligned} P_k=\left[ e_{l_1},e_{l_2},\ldots ,e_{l_{{\bar{m}}}}\right] \in {\mathbb {R}}^{m \times {\bar{m}}}, \end{aligned}$$

and set the subspace \({\mathcal {T}}_k^M\):

$$\begin{aligned} {\mathcal {T}}_k^M=\text {span} \left\{ g_k, s_{k-{\bar{m}}}, \ldots , s_{k-1}, {\bar{y}}_{k-{\bar{m}}}, \ldots , {\bar{y}}_{k-1}, \nabla c_{l_1}(x_k), \ldots , \nabla c_{l_{{\bar{m}}}}(x_k)\right\} . \end{aligned}$$

In the case of few constraints, all m constraint gradients are used to construct the subspace \({\mathcal {T}}_k^F\).
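As an illustration of this construction (our own sketch; the actual implementation uses the reorthogonalization procedure of [4] mentioned below, here replaced by a plain QR factorization), assuming the stored pairs are kept as the columns of \(S\) and \({\bar{Y}}\):

```python
import numpy as np

def build_subspace(g, c_val, A, S, Ybar, m_bar):
    """Build P_k and an orthonormal basis Q_k of T_k^M for the many-constraints case.

    A has the constraint gradients as columns; S, Ybar hold the stored pairs.
    """
    m = A.shape[1]
    order = np.argsort(-np.abs(c_val))            # indices l_1, l_2, ... by decreasing |c_j(x_k)|
    sel = order[:min(m_bar, m)]                   # keep the m_bar most violated constraints
    P = np.eye(m)[:, sel]                         # P_k = [e_{l_1}, ..., e_{l_{m_bar}}]
    cols = np.column_stack([g, S, Ybar, A[:, sel]])
    # plain QR for brevity; nearly dependent columns would be dropped in practice
    Q, _ = np.linalg.qr(cols)
    return Q, P
```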

To compute the orthonormal basis matrix \(Q_k\), the reorthogonalization procedure in [4] is applied. If \(A_k\) is nearly rank deficient, SubSQP may discard some constraint gradients during the orthogonalization process. We use the conjugate gradient method to solve (9), with the relative residual norm and a tolerance of \(10^{-4}\) as its stopping condition.
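The coefficient matrix \(A_k^TH_kA_k\) of (9) need not be formed explicitly; the conjugate gradient method only requires products with it. The following sketch of CG with the relative-residual stopping rule is our own transcription in Python (the experiments themselves are in MATLAB), with `matvec` standing for the map \(\lambda \mapsto A_k^TH_k(A_k\lambda )\).

```python
import numpy as np

def cg(matvec, b, tol=1e-4, max_iter=500):
    """Conjugate gradient for S x = b, stopping when ||b - S x|| <= tol * ||b||."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rr) <= tol * b_norm:
            break
        Sp = matvec(p)
        alpha = rr / (p @ Sp)
        x += alpha * p
        r -= alpha * Sp
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

# solving (9): the matvec applies lambda -> A^T H (A lambda) without forming the matrix
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, m = 50, 10
    A = rng.standard_normal((n, m))
    H = np.eye(n)
    g = rng.standard_normal(n)
    c = rng.standard_normal(m)
    rhs = -A.T @ (H @ g) + c
    lam = cg(lambda v: A.T @ (H @ (A @ v)), rhs)
    print(np.linalg.norm(A.T @ (H @ (A @ lam)) - rhs))   # small residual
```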

Table 1 Numerical results on CUTEst problems

To improve the performance of the proposed method, we skip the update \((s_k, {\bar{y}}_k)\) if

$$\begin{aligned} 10^{-8}{\bar{y}}_k^T{\bar{y}}_k\ge {\bar{y}}_k^Ts_k, \end{aligned}$$

and reset the subspace \({\mathcal {T}}_k\) if

$$\begin{aligned} |f_{k}-f_{k+1}|\le 10^{-4}|f_{k}| \end{aligned}$$

and

$$\begin{aligned} | \Vert c_{k}\Vert _{\infty }-\Vert c_{k+1}\Vert _{\infty } |\le 10^{-4}\Vert c_{k}\Vert _{\infty }. \end{aligned}$$

The algorithm is stopped when

$$\begin{aligned} \Vert g_k+A_k\lambda _k\Vert _{\infty }\le \epsilon \ \text { and } \ \Vert c_k\Vert _{\infty }\le \epsilon \quad \text {or} \quad \Vert d_k\Vert _2\le \epsilon . \end{aligned}$$

We also terminate each algorithm when it generates a tiny stepsize \(\alpha _k < 10^{-6}\).

The numerical results on the CUTEst test problems are reported in Tables 1 and 2. In Table 1, n denotes the number of variables, m the number of constraints, and \({\bar{m}}\) the parameter used for constructing the subspace. Since SQP and SubSQP may return different solutions due to the use of a subspace in SubSQP, we report the number of iterations, the CPU time, the final objective value \(f_*\), and the infinity norm of the final constraint violation \(\Vert c_*\Vert _{\infty }\) in Tables 1 and 2.

Table 2 Function values and constraint violations on CUTEst problems

In the case of many constraints, the computational cost per iteration of SubSQP is lower than that of SQP, especially on large scale problems, since SubSQP works in the proposed low-dimensional subspace. However, SubSQP may require more iterations, depending on the choice of subspace. Overall, SubSQP outperforms SQP in terms of the final objective value, as shown in Table 2. In particular, for the problems DTOC2, LUKVLE16, and LUKVLE2, SubSQP is faster than SQP and attains a smaller constraint violation. In the case of few constraints, the results of SubSQP are similar to those of SQP; since SubSQP requires an additional orthogonalization process, it is slower than SQP.

6 Conclusion

In this paper, we present a subspace SQP method to solve large scale nonlinear equality constrained optimization problems. The subspace technique is a promising approach for solving large scale optimization problems [21], since it can reduce both computational cost and memory requirements. To apply the technique efficiently to equality constrained problems, we propose a subspace spanned by the gradient of the objective function, the previous steps, the modified differences of the gradients of the Lagrangian, and the gradients of the constraints with the largest violations, based on the damped limited-memory BFGS update.

With this subspace, we prove that our method has a global convergence property. Numerical results on a selection of CUTEst problems show that the proposed algorithm performs better on large scale problems than the standard SQP method. Future work concerns different choices of the subspace, together with the accompanying analysis and implementation.