1 Introduction

In this paper, we consider the following problem: find \(x\in C\) such that

$$\begin{aligned} Ax\in Q, \end{aligned}$$

where C and Q are nonempty closed convex sets in \(\mathfrak {R}^n\) and \(\mathfrak {R}^m\), respectively, and A is an \(m\times n\) real matrix. This problem was called the split feasibility problem (SFP) by Censor and Elfving [1]. It has important and wide applications in signal processing, image reconstruction and so on, and hence it has attracted much attention from researchers over the past 20 years.

In [1], the authors used their multidistance idea to obtain iterative algorithms for solving the SFP. Their algorithms, as well as others obtained later, involve matrix inverses at each iteration. In [2], Byrne presented a projection method, called the CQ algorithm, for solving the SFP that does not involve matrix inverses, under the assumption that the orthogonal projections onto C and Q are easily calculated. However, in some cases it is impossible, or requires too much work, to compute the orthogonal projection exactly. In such cases, the efficiency of projection-type methods, including the CQ algorithm, is seriously affected. In [3], using the relaxed projection technique, Yang presented a relaxed CQ algorithm for solving the SFP, in which two halfspaces \(C_k\) and \(Q_k\) are used in place of C and Q, respectively, at the kth iteration, so that the orthogonal projections onto \(C_k\) and \(Q_k\) are easily executed. Recently, Qu and Xiu [4] gave a new reformulation of the SFP and proposed a new halfspace-relaxation projection method based on it. For more effective algorithms and extensions of the SFP, the reader may consult [5,6,7,8] and the survey papers [9, 10].

The global convergence of all the algorithms mentioned above has been proved. It is worth noting that most of these algorithms belong to the class of gradient projection methods [11]. Projection-type methods are important tools for finding approximate solutions of nonlinear programs and variational inequalities. The main idea of this technique is to establish, via the projection operator, the equivalence between the problem under consideration and a fixed-point problem. This alternative formulation has played a significant part in developing various projection-type methods for solving nonlinear programs and variational inequalities. It is well known that the convergence rate of such projection-type methods is linear at best, and, to our knowledge, no superlinearly convergent algorithm for solving the SFP can be found in the literature. Designing an algorithm with a fast convergence rate for solving the SFP is therefore a meaningful topic, and it is the main purpose of this paper.

Based on a merit function of the SFP, a Newton projection algorithm for solving the SFP is presented in this paper. The subproblem of this algorithm is to find an inexact solution of a convex quadratic program, where the Newton step is accepted according to an inequality satisfied by the projection operator onto the set Q. We analyze the convergence properties of the method in detail. Under the condition that the solution set of the SFP is nonempty, we prove that the sequence of iterates converges globally to a solution of the SFP as long as the regularization parameter matrix is chosen properly. Further, the analysis of the convergence rate requires second-order information about the merit function. Because the gradient of the merit function is a linear composition with the projection operator, it is nonsmooth in general. So, we have two choices of technique: (i) since the gradient is globally Lipschitzian, we can choose a generalized Hessian directly from its generalized Jacobian in the sense of Clarke (see, e.g., [12,13,14,15,16,17]); (ii) we can obtain second-order information by regularizing and smoothing the gradient (see, e.g., [18,19,20,21,22]). In this paper, we adopt the first choice. Under some local assumptions, which are necessary for the case where the projection operator is nonsmooth, we prove that the sequence of iterates generated by the algorithm converges superlinearly to a regular solution of the SFP. It is worth noting that the regular-solution condition is weaker than the conditions in [23,24,25,26,27,28,29,30,31,32], since those conditions are sufficient conditions for a regular solution.

The rest of this paper is organized as follows. Section 2 introduces the merit function of the SFP and some of its related properties and concepts. Section 3 presents the Newton projection algorithm and proves its global convergence. Under some local assumptions, which are necessary for the case where the gradient of the merit function is nonsmooth, Sect. 4 proves that the iterative sequence generated by the algorithm converges superlinearly to a regular solution of the SFP. Section 5 contains some numerical results. Section 6 gives concluding remarks, in which we point out that the method presented in this paper can be extended to solve a class of convex optimization problems more general than the SFP, or a class of nonsmooth variational inequality problems.

2 A merit function of SFP and its properties

In this section, we introduce a merit function of the SFP and review some of its properties.

Byrne [10] gave a merit function of the SFP as follows:

$$\begin{aligned} f(x)=\frac{1}{2}\Vert Ax-P_Q(Ax)\Vert ^2. \end{aligned}$$
(2.1)

The function f(x) is convex and continuously differentiable on \(\mathfrak {R}^n\), and its gradient is the operator

$$\begin{aligned} g(x)=A^T(I-P_Q)Ax, \end{aligned}$$
(2.2)

where \(P_Q(x)\) denotes the orthogonal projection of x onto Q, that is,

$$\begin{aligned} P_{Q}(x)=\mathrm{argmin}\{\Vert x-y\Vert |~y\in Q\}. \end{aligned}$$

Because Q is a closed convex set, \(P_Q(x)\) is single-valued. For the projection operator, the following properties are well known.
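For concreteness, projections onto the sets used later in this paper often have closed forms. The following is a minimal Python sketch of \(P_Q\) for a Euclidean ball, a box and a halfspace; the function names `proj_ball`, `proj_box` and `proj_halfspace` are our own illustrations, not notation from the paper.

```python
import numpy as np

def proj_ball(x, center, radius):
    """Project x onto the ball {y : ||y - center|| <= radius}."""
    d = x - center
    nd = np.linalg.norm(d)
    return x if nd <= radius else center + (radius / nd) * d

def proj_box(x, lo, hi):
    """Project x onto the box {y : lo <= y <= hi} (componentwise)."""
    return np.minimum(np.maximum(x, lo), hi)

def proj_halfspace(x, a, b):
    """Project x onto the halfspace {y : <a, y> <= b}."""
    viol = a @ x - b
    return x if viol <= 0 else x - (viol / (a @ a)) * a
```

Each routine realizes the minimizer in the definition of \(P_Q\) above.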

Proposition 2.1

For any \(x, y\in \mathfrak {R}^n,\) we have

  1. (i)

    \(\Vert P_Q(y)-P_Q(x)\Vert \le \Vert y-x\Vert ;\)

  2. (ii)

    \(\langle P_Q(y)-P_Q(x), y-x\rangle \ge \Vert P_Q(y)-P_Q(x)\Vert ^2.\)

From part (i) of Proposition 2.1, we know that \(P_{Q}\) is a globally Lipschitzian operator. Part (ii) tells us that \(P_{Q}\) is co-coercive, and thereby monotone. From (i) and Rademacher’s Theorem, \(P_Q\) is differentiable almost everywhere, so the generalized Jacobian in the sense of Clarke, \(\partial P_Q(x),\) is well defined and has the following property.

Proposition 2.2

([20]) For \(x\in \mathfrak {R}^n,\) all \(V\in \partial P_Q(x)\) are symmetric, positive semidefinite and \(\Vert V\Vert \le 1.\)

Using the calculus of Clarke’s generalized Jacobian ([33, 34]), we deduce the following from Proposition 2.2 and (2.2):

$$\begin{aligned} \partial g(x)d=A^T(I-\partial P_Q(Ax))Ad \end{aligned}$$
(2.3)

for all \(d\in \mathfrak {R}^n\).

In addition, we also have

Proposition 2.3

([7]) The following two conclusions hold.

  1. (i)

    g(x) is globally Lipschitz continuous on \(\mathfrak {R}^n;\)

  2. (ii)

    g(x) is co-coercive on \(\mathfrak {R}^n,\) and thereby monotone.

From (2.1) and (2.2), we can easily prove the following proposition.

Proposition 2.4

([7]) The following statements are equivalent:

  1. (i)

    \(x^*\) is a solution of the SFP;

  2. (ii)

    \(x^*\in C\) and \(f(x^*)=0;\)

  3. (iii)

    \(x^*\in C\) and \(g(x^*)=0.\)

From Proposition 2.4, we know that if the solution set of the SFP is nonempty, then we have

$$\begin{aligned} \mathrm{The\ solution\ set\ of\ the\ SFP} =\mathrm{argmin}\{f(x)|x\in C\}. \end{aligned}$$
(2.4)

Thus, solving the SFP is equivalent to solving the convex optimization problem:

$$\begin{aligned} \min _{x\in C} f(x). \end{aligned}$$
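As an illustration of (2.1) and (2.2), the following is a minimal Python sketch of the merit function and its gradient; it assumes a routine `proj_Q` that evaluates \(P_Q\) (for instance, one of the projections sketched above).

```python
import numpy as np

def merit_f(x, A, proj_Q):
    """Merit function (2.1): f(x) = 0.5 * ||Ax - P_Q(Ax)||^2."""
    Ax = A @ x
    res = Ax - proj_Q(Ax)
    return 0.5 * res @ res

def grad_g(x, A, proj_Q):
    """Gradient (2.2): g(x) = A^T (I - P_Q)(Ax)."""
    Ax = A @ x
    return A.T @ (Ax - proj_Q(Ax))
```

By Proposition 2.4, x solves the SFP exactly when \(x\in C\) and `merit_f(x, A, proj_Q)` is zero.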

Next, we introduce an important concept from nonsmooth analysis that will be used in this paper.

Definition 2.5

([15, 35]) Suppose \(F: \mathfrak {R}^n\rightarrow \mathfrak {R}^m\) is a locally Lipschitzian function. We say that F is semismooth at x if for any \(h\in \mathfrak {R}^n, h\ne 0,\) the limit

$$\begin{aligned} \lim _{\substack{V\in \partial F(x+th')\\ h'\rightarrow h,\ t\downarrow 0}}\{Vh'\} \end{aligned}$$

exists. F is called semismooth on a set \(S\subseteq \mathfrak {R}^n\) if F is semismooth at each \(x\in S.\)

Proposition 2.6

([15]) Suppose \(F: \mathfrak {R}^n\rightarrow \mathfrak {R}^m\) is a locally Lipschitzian function. Then F is semismooth at x if and only if for any \(h\in \mathfrak {R}^n, h\ne 0,\) and \(\forall ~ V\in \partial F(x+h),\) we have

$$\begin{aligned} F(x+h)-F(x)-Vh=o(\Vert h\Vert ). \end{aligned}$$

It was pointed out in [15] that sums, differences, scalar products and compositions of semismooth functions are still semismooth. In particular, when Q is polyhedral, the projection operator \(P_Q(\cdot )\) is semismooth. Thus, from (2.2), we have

Proposition 2.7

Let Q be polyhedral. Then

  1. (i)

    \(P_Q(x)\) is semismooth on \(\mathfrak {R}^m\);

  2. (ii)

    g(x) is semismooth on \(\mathfrak {R}^n.\)

3 Algorithm and its global convergence

In this section, we present an algorithm for solving the SFP and prove its global convergence. From (2.4), we know that when the solution set of the SFP is nonempty, we only need to design an algorithm to solve the optimization problem \(\min \{f(x)|x\in C\}\). As in [2], we assume all the projections in the algorithm are easily calculated.

Let the natural residual be \(r(x)=x-P_C(x-g(x)).\) For the current iterate \(x^k\in C,\) select a positive semidefinite matrix \(G_k\) and a regularization parameter \(\mu _k>0,\) and let

$$\begin{aligned} \varphi _k(z)=g(x^k)+(G_k+\mu _k I)(z-x^k). \end{aligned}$$

Consider a linearized subproblem: find \(\hat{z}^k\in C\) such that for all \(z\in C,\)

$$\begin{aligned} \left\langle \varphi _k(\hat{z}^k), z-\hat{z}^k\right\rangle \ge 0. \end{aligned}$$
(3.1)

As \(G_k+\mu _k I\) is positive definite, the subproblem (3.1) always has a unique solution \(\hat{z}^k,\) which therefore has zero residual:

$$\begin{aligned} \hat{z}^k-P_C(\hat{z}^k-\varphi _k(\hat{z}^k))=0. \end{aligned}$$

So, for any \(\epsilon _k>0,\) the subproblem (3.1) always has inexact solutions \(z^k\) satisfying

$$\begin{aligned} \Vert e^k\Vert \le \epsilon _k, \end{aligned}$$

where

$$\begin{aligned} e^k={z}^k-P_C({z}^k-\varphi _k({z}^k)). \end{aligned}$$
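Since \(\varphi _k\) is affine and \(G_k+\mu _k I\) is positive definite, the subproblem (3.1) is a strongly monotone linear variational inequality, so an inexact solution can be obtained, for example, by a damped projection iteration. The following Python sketch is one possible inner solver; the step-size rule and the fixed tolerance `tol` are our own choices (in Algorithm 3.1 below, `tol` is replaced by the state-dependent bound (3.2)), and `proj_C` is assumed to evaluate \(P_C\).

```python
import numpy as np

def solve_subproblem(xk, g_xk, M, proj_C, tol, max_iter=1000):
    """Inexactly solve (3.1): find z in C with <phi_k(z), w - z> >= 0 for all
    w in C, where phi_k(z) = g(x^k) + M (z - x^k) and M = G_k + mu_k I is
    positive definite.  Iterates z <- P_C(z - gamma * phi_k(z)) and stops once
    the unit-step residual e = z - P_C(z - phi_k(z)) satisfies ||e|| <= tol."""
    mu = np.linalg.eigvalsh(M).min()      # strong monotonicity modulus of phi_k
    L = np.linalg.norm(M, 2)              # Lipschitz constant of phi_k
    gamma = mu / L ** 2                   # safe step: any gamma < 2*mu/L^2 works
    z = xk.copy()
    for _ in range(max_iter):
        phi = g_xk + M @ (z - xk)
        e = z - proj_C(z - phi)           # residual e^k from the text
        if np.linalg.norm(e) <= tol:
            break
        z = proj_C(z - gamma * phi)
    return z, e
```

With this step size the iteration map is a contraction, so the residual can be driven below any \(\epsilon _k>0,\) in accordance with the discussion above.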

We now formally state the algorithm.

Algorithm 3.1

Step 0. Take an \(x^0\in C,\) and choose parameters \(\eta , \alpha , \beta \in (0, 1), \rho \in (0, \eta ).\) Set \(k:=0.\)

Step 1. Choose a positive semidefinite matrix \(G_k=A^T(I-V_k)A,\) where \(V_k\) is a positive semidefinite matrix with \(\Vert V_k\Vert \le 1,\) a regularization parameter \(\mu _k>0\) and \(\rho _k\in (0, \frac{\rho }{M_k}],\) where \(M_k=\Vert g(x^k)\Vert +\Vert G_k\Vert +2\mu _k+1.\) Compute \(z^k,\) an inexact solution of the subproblem (3.1), such that

$$\begin{aligned} \Vert e^k\Vert \le \rho _k \mu _k\Vert z^k-x^k\Vert \cdot \min \{\Vert z^k-x^k\Vert , 1\}. \end{aligned}$$
(3.2)

If \(z^k=x^k\), then stop.

Step 2. Let \(y_C^k=P_C(z^k-\varphi _k(z^k)), p^k=P_Q(Ay_C^k)-P_Q(Ax^k)-V_kA(y_C^k-x^k).\) If

$$\begin{aligned} \langle p^k, Ax^k-Ay_C^k\rangle \le (1-\eta )\mu _k\Vert y_C^k-x^k\Vert ^2, \end{aligned}$$
(3.3)

then set \(y^k=y_C^k, \hat{p}^k=g(y^k)-\varphi _k(z^k)+e^k\) and go to Step 4.

Step 3. Let \(y^k=x^k+\lambda _k(z^k-x^k),\) where \(\lambda _k=\beta ^{m_k}\) with \(m_k\) being the smallest nonnegative integer m such that

$$\begin{aligned} \Vert g(x^k+\beta ^m(z^k-x^k))-g(x^k)\Vert \le \alpha (1-\rho )\mu _k\Vert z^k-x^k\Vert . \end{aligned}$$
(3.4)

Set \(\hat{p}^k=g(y^k).\)

Step 4. Compute

$$\begin{aligned} \hat{x}^k= & {} x^k-\frac{\langle \hat{p}^k, x^k-y^k\rangle }{\Vert \hat{p}^k\Vert ^2}\hat{p}^k;\\ {x}^{k+1}= & {} P_C(\hat{x}^k). \end{aligned}$$

Set \(k:=k+1,\) and go back to Step 1.
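To make the acceptance test (3.3) and the fallback line search (3.4) concrete, the following is a hedged Python sketch of Steps 2 and 3; the function name `step_2_3` and its argument list are our own illustration, with `g` evaluating (2.2), `Vk` and `Gk` the matrices chosen in Step 1, and `e` the residual of the inexact subproblem solution.

```python
import numpy as np

def step_2_3(xk, zk, e, gx, g, A, proj_Q, proj_C, Vk, Gk, mu,
             eta=0.5, alpha=0.5, beta=0.5, rho=0.25):
    """Return (y^k, p-hat^k) following Steps 2-3 of Algorithm 3.1."""
    phi_z = gx + (Gk + mu * np.eye(len(xk))) @ (zk - xk)    # phi_k(z^k)
    yC = proj_C(zk - phi_z)                                 # y_C^k
    p = proj_Q(A @ yC) - proj_Q(A @ xk) - Vk @ (A @ (yC - xk))
    if p @ (A @ (xk - yC)) <= (1 - eta) * mu * np.linalg.norm(yC - xk) ** 2:
        return yC, g(yC) - phi_z + e                        # test (3.3) passed
    lam = 1.0                                               # line search (3.4)
    while np.linalg.norm(g(xk + lam * (zk - xk)) - gx) > \
            alpha * (1 - rho) * mu * np.linalg.norm(zk - xk):
        lam *= beta
    y = xk + lam * (zk - xk)
    return y, g(y)
```

The search in (3.4) terminates because \(g(\cdot )\) is globally Lipschitz continuous (Proposition 2.3).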

In order to explain the stopping criterion and the feasibility of the algorithm, we give the following two lemmas, which will also be used in both the global convergence and the convergence rate analysis.

Lemma 3.2

The following inequality holds for all k

$$\begin{aligned} \langle g(x^k), x^k-z^k\rangle \ge \langle (G_k+(1-\rho )\mu _kI)(z^k-x^k), z^k-x^k\rangle . \end{aligned}$$

Proof

From the definition of \(y_C^k,\) we have

$$\begin{aligned} \langle y_C^k-z^k+\varphi _k(z^k), x^k-y_C^k\rangle \ge 0, \end{aligned}$$

which, together with (3.2) and the definition of \(\rho _k,\) implies that

$$\begin{aligned} \langle g(x^k), x^k-z^k\rangle\ge & {} \langle (G_k+\mu _kI)(z^k-x^k), z^k-x^k \rangle \\&+\Vert e^k\Vert ^2-\langle \varphi _k(z^k)+z^k-x^k, e^k\rangle \\\ge & {} \langle (G_k+\mu _kI)(z^k-x^k), z^k-x^k \rangle -\langle \varphi _k(z^k)+z^k-x^k, e^k\rangle \\\ge & {} \langle (G_k+\mu _kI)(z^k-x^k), z^k-x^k \rangle \\&-(\Vert g(x^k)\Vert \Vert e^k\Vert +(\Vert G_k\Vert +\mu _k+1)\Vert z^k-x^k\Vert \Vert e^k\Vert )\\\ge & {} \langle (G_k+\mu _kI)(z^k-x^k), z^k-x^k \rangle \\&-(\Vert g(x^k)\Vert +\Vert G_k\Vert +\mu _k+1)\rho _k\mu _k\Vert z^k-x^k\Vert ^2\\\ge & {} \langle (G_k+(1-\rho )\mu _kI)(z^k-x^k), z^k-x^k\rangle . \end{aligned}$$

\(\square \)

Lemma 3.3

For all k,  the following inequality is true

$$\begin{aligned} \langle \hat{p}^k, x^k-y^k\rangle \ge \left\{ \begin{array}{ll} (\eta -\rho )\mu _k\Vert y^k-x^k\Vert ^2, &{}\mathrm{if~} (3.3) \mathrm{~holds};\\ \lambda _k(1-\alpha )(1-\rho )\mu _k\Vert z^k-x^k\Vert ^2, &{}\mathrm{otherwise}. \end{array} \right. \end{aligned}$$

Proof

If (3.3) holds, then from (2.2) and the choice of \(G_k,\) we have

$$\begin{aligned} \langle \hat{p}^k, x^k-y^k\rangle= & {} \left\langle g(y^k)-\varphi _k(z^k)+e^k, x^k-y^k\right\rangle \\= & {} \left\langle g(y^k)-g(x^k)-(G_k+\mu _kI)(z^k-x^k)+e^k, x^k-y^k\right\rangle \\= & {} \langle A^T(I-P_Q)Ay^k-A^T(I-P_Q)Ax^k\\&-(A^T(I-V_k)A+\mu _kI)(z^k-x^k)+e^k, x^k-y^k\rangle \\= & {} \mu _k\Vert y^k-x^k\Vert ^2-\langle P_Q(Ay^k)\\&-P_Q(Ax^k)-V_kA(y^k-x^k), A(x^k-y^k)\rangle \\&-\left\langle (G_k+\mu _kI-I)e^k, x^k-y^k\right\rangle \\= & {} \mu _k\Vert y^k-x^k\Vert ^2-\langle p^k, Ax^k-Ay^k\rangle \\&-\langle (G_k+\mu _kI-I)e^k, x^k-y^k\rangle , \end{aligned}$$

which, together with (3.2), (3.3) and the definition of \(\rho _k,\) yields

$$\begin{aligned} \langle \hat{p}^k, x^k-y^k\rangle\ge & {} \eta \mu _k\Vert y^k-x^k\Vert ^2-(\Vert G_k\Vert +\mu _k+1)\Vert e^k\Vert \Vert y^k-x^k\Vert \\\ge & {} [\eta -(\Vert G_k\Vert +\mu _k+1)\frac{\rho _k}{1-\rho _k\mu _k}]\mu _k\Vert y^k-x^k\Vert ^2\\\ge & {} \frac{\eta -(\Vert G_k\Vert +2\mu _k+1)\rho _k}{1-\rho _k\mu _k}\mu _k\Vert y^k-x^k\Vert ^2\\\ge & {} (\eta -\rho )\mu _k\Vert y^k-x^k\Vert ^2. \end{aligned}$$

Thus, the first inequality is proved. If (3.3) does not hold, then (3.4) holds. From Lemma 3.2, we have

$$\begin{aligned} \langle \hat{p}^k, x^k-y^k\rangle= & {} \langle g(y^k), x^k-y^k\rangle \\= & {} \lambda _k[\langle g(x^k), x^k-z^k\rangle +\langle g(y^k)-g(x^k), x^k-z^k\rangle ]\\\ge & {} \lambda _k[(1-\rho )\mu _k\Vert z^k-x^k\Vert ^2-\Vert g(y^k)-g(x^k)\Vert \Vert z^k-x^k\Vert ]\\\ge & {} \lambda _k(1-\alpha )(1-\rho )\mu _k\Vert z^k-x^k\Vert ^2. \end{aligned}$$

This proves the second inequality. \(\square \)

Remark 3.4

From Lemma 3.2 and (3.2), we know that the stopping criterion is both sufficient and necessary, that is, \(x^k\in \mathrm{arg}\min \limits _{x\in C} f(x)\) if and only if \(z^k=x^k.\) From now on, we assume that \(z^k\ne x^k\) for all k, so that an infinite sequence \(\{x^k\}\) is generated. Next, we show that the algorithm is well defined. Firstly, if (3.3) holds, then \(\hat{p}^k\ne 0;\) otherwise, from the first inequality of Lemma 3.3 and from Lemma 3.2, we would obtain \(z^k=x^k.\) Secondly, the line search rule (3.4) is clearly feasible and \(\hat{p}^k=g(y^k)\ne 0\). In fact, if \(g(y^k)=0,\) then from (3.4) we would have

$$\begin{aligned} \Vert g(x^k)\Vert \le \alpha (1-\rho )\mu _k\Vert z^k-x^k\Vert . \end{aligned}$$

Because \(\alpha \in (0,1)\) and the matrix \(G_k\) is positive semidefinite, the above inequality contradicts Lemma 3.2. Overall, we have proved that Algorithm 3.1 is well defined.

Lemma 3.5

If \(\mathrm{arg}\min \limits _{x\in C} f(x)\ne \emptyset ,\) then for \(\forall x^*\in \mathrm{arg}\min \limits _{x\in C} f(x)\) and \(\forall k,\) we have

$$\begin{aligned} \frac{\langle \hat{p}^k, x^k-y^k\rangle ^2}{\Vert \hat{p}^k\Vert ^2}\le \Vert x^k-x^*\Vert ^2-\Vert x^{k+1}-x^*\Vert ^2. \end{aligned}$$

Proof

It is easy to prove that for any \(\bar{x}\in \mathfrak {R}^n,\) the following equality holds:

$$\begin{aligned} \Vert \hat{x}^k-\bar{x}\Vert ^2= & {} \Vert {x}^k-\bar{x}\Vert ^2-\frac{\langle \hat{p}^k, x^k-y^k\rangle ^2}{\Vert \hat{p}^k\Vert ^2} \nonumber \\&+2\frac{\langle \hat{p}^k, x^k-y^k\rangle }{\Vert \hat{p}^k\Vert ^2}\langle \hat{p}^k, \bar{x}-y^k\rangle . \end{aligned}$$
(3.5)

Now take an arbitrary \(x^*\in \mathrm{arg}\min \limits _{x\in C} f(x).\) Since \(y^k\in C,\) we have

$$\begin{aligned} \langle g(x^*), y^k-x^*\rangle \ge 0. \end{aligned}$$

Using the monotonicity of \(g(\cdot ),\) we obtain

$$\begin{aligned} \langle g(y^k), y^k-x^*\rangle \ge 0. \end{aligned}$$
(3.6)

Letting \(\bar{x}=x^*\) in (3.5), we consider the last term on the right-hand side. Let

$$\begin{aligned} F_k=\frac{\langle \hat{p}^k, x^k-y^k\rangle }{\Vert \hat{p}^k\Vert ^2}\langle \hat{p}^k, {x}^*-y^k\rangle . \end{aligned}$$

First suppose (3.3) holds. From the first inequality of Lemma 3.3, we have

$$\begin{aligned} \langle \hat{p}^k, x^k-y^k\rangle \ge 0. \end{aligned}$$
(3.7)

As \(y^k=P_C(z^k-\varphi _k(z^k))\) and \(x^*\in C,\) from the property of projection, we have

$$\begin{aligned} \langle \varphi _k(z^k)-e^k, x^*-y^k\rangle \ge 0. \end{aligned}$$
(3.8)

Thus, from (3.6) and (3.8) and the definition of \(\hat{p}^k,\) we see that

$$\begin{aligned} \langle \hat{p}^k, x^*-y^k\rangle \le 0. \end{aligned}$$
(3.9)

Hence, from (3.7) and (3.9), we obtain \(F_k\le 0.\)

Second, assume (3.4) holds. In this case, \(\hat{p}^k=g(y^k),\) hence, from (3.6), we have

$$\begin{aligned} \langle \hat{p}^k, x^*-y^k\rangle \le 0. \end{aligned}$$

So, from the second inequality of Lemma 3.3, we see that \(\langle \hat{p}^k, x^k-y^k\rangle \ge 0.\) Thus we obtain \(F_k\le 0.\)

In conclusion, for each \(k, F_k\le 0.\) Therefore, from the property of projection and (3.5), we have

$$\begin{aligned} \Vert x^{k+1}-x^*\Vert ^2\le & {} \Vert \hat{x}^{k}-x^*\Vert ^2\\\le & {} \Vert x^{k}-x^*\Vert ^2-\frac{\langle \hat{p}^k, x^k-y^k\rangle ^2}{\Vert \hat{p}^k\Vert ^2}, \end{aligned}$$

i.e., the lemma holds. \(\square \)

Lemma 3.6

Suppose that \(\mathrm{arg}\min \limits _{x\in C} f(x)\ne \emptyset \) and that there exist constants \(0<m<M\) and an index \(k_0\) such that \(\mu _k\in [m, M]\) for all \(k\ge k_0.\) Then

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\Vert z^k-x^k\Vert =0, \lim \limits _{k\rightarrow \infty }\Vert y^k-x^k\Vert =0. \end{aligned}$$

Proof

From Lemma 3.5, we know that the sequence \(\{x^k\}\) is bounded and

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\frac{\langle \hat{p}^k, x^k-y^k\rangle }{\Vert \hat{p}^k\Vert }=0. \end{aligned}$$
(3.10)

From Lemma 3.2 and the definition of \(\mu _k,\) we have

$$\begin{aligned} \Vert g(x^k)\Vert \ge m(1-\rho )\Vert z^k-x^k\Vert . \end{aligned}$$

As \(g(\cdot )\) is continuous and \(\{x^k\}\) is bounded, from the above inequality we see that \(\{z^k\}\) is bounded. Using (3.2) and the boundedness of \(\{G_k\},\) we know that \(\{y^k\}\) and \(\{\hat{p}^k\}\) are also bounded. Hence, from (3.10) we obtain

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\langle \hat{p}^k, x^k-y^k\rangle =0. \end{aligned}$$
(3.11)

Below we consider the possible cases. To this end, we divide all indices k into two classes:

$$\begin{aligned} K_N= & {} \{k|y^k=y_C^k, \mathrm{i.e.}, (3.3) ~\mathrm{holds}\}.\\ K_A= & {} \{k|y^k=x^k+\lambda _k(z^k-x^k), \lambda _k ~\mathrm{is~ obtained~ by}~ (3.4)\}. \end{aligned}$$

It is obvious that \(K_N\bigcup K_A=\{1,2,\ldots \}.\) From Lemma 3.3 and the definition of \(\mu _k,\) we have

$$\begin{aligned} \langle \hat{p}^k, x^k-y^k\rangle \ge \left\{ \begin{array}{ll} (\eta -\rho )m\Vert y^k-x^k\Vert ^2, &{} \mathrm{~if~} k\in K_N;\\ \lambda _k(1-\alpha )(1-\rho )m\Vert z^k-x^k\Vert ^2, &{} \mathrm{~if~} k\in K_A. \end{array} \right. \end{aligned}$$
(3.12)

If \(K_N\) is infinite, then from (3.11) and the first inequality of (3.12), we obtain \(\lim \limits _{k\in K_N, k\rightarrow \infty } \Vert y^k-x^k\Vert =0.\) From (3.2), we have

$$\begin{aligned} \Vert z^k-x^k\Vert\le & {} \frac{1}{1-\rho _k\mu _k}\Vert y^k-x^k\Vert \\\le & {} \frac{1}{1-\rho }\Vert y^k-x^k\Vert . \end{aligned}$$

Thus, \(\lim \limits _{k\in K_N, k\rightarrow \infty } \Vert z^k-x^k\Vert =0.\)

If \(K_A\) is infinite, then from (3.11) and the second inequality of (3.12), we obtain

$$\begin{aligned} \lim \limits _{k\in K_A, k\rightarrow \infty } \lambda _k\Vert z^k-x^k\Vert ^2=0. \end{aligned}$$
(3.13)

Suppose that there exists an infinite subset \(K_0\subseteq K_A\) such that

$$\begin{aligned} \lim \limits _{k\in K_0, k\rightarrow \infty } \Vert z^k-x^k\Vert >0. \end{aligned}$$
(3.14)

As \(\{z^k\}\) and \(\{x^k\}\) are bounded, without loss of generality, we may assume that

$$\begin{aligned} \lim \limits _{k\in K_0, k\rightarrow \infty } z^k=\bar{z}, \lim \limits _{k\in K_0, k\rightarrow \infty } x^k=\bar{x}. \end{aligned}$$

From (3.14) we know that \(\bar{z}\ne \bar{x}.\) Then by (3.13) and (3.14),

$$\begin{aligned} \lim \limits _{k\in K_0, k\rightarrow \infty } \lambda _k=0. \end{aligned}$$
(3.15)

By (3.15), the line search rule (3.4) and the definition of \(\mu _k,\) when \(k\in K_0\) and k is sufficiently large,

$$\begin{aligned} \Vert g(x^k+\beta ^{-1}\lambda _k(z^k-x^k))-g(x^k)\Vert\ge & {} \alpha (1-\rho )\mu _k\Vert z^k-x^k\Vert \\\ge & {} \alpha (1-\rho ){{m}}\Vert z^k-x^k\Vert . \end{aligned}$$

Since (3.15) holds, passing to the limit as \(k\rightarrow \infty ,\) we obtain \(\Vert \bar{z}-\bar{x}\Vert =0,\) which is a contradiction. Thus we obtain \(\lim \limits _{k\in K_A, k\rightarrow \infty }\Vert z^k-x^k\Vert =0\) and \(\lim \limits _{k\in K_A, k\rightarrow \infty }\Vert y^k-x^k\Vert =0.\) \(\square \)

We now give the global convergence theorem of Algorithm 3.1.

Theorem 3.7

The following two conclusions hold for Algorithm 3.1.

  1. (1)

    If \(\mathrm{arg}\min \limits _{x\in C}f(x)\ne \emptyset ,\) then \(\{x^k\}\) is bounded. Suppose that there exist a constant \(M>0\) and an index \(k_0\) such that \(\mu _k\in [\Vert r(x^k)\Vert , M]\) for all \(k\ge k_0.\) Then

    $$\begin{aligned} \lim \limits _{k\rightarrow \infty }x^k=x^*\in \mathrm{arg}\min _{x\in C}f(x). \end{aligned}$$
    (3.16)
  2. (2)

    If the sequence \(\{x^k\}\) is bounded and, starting with some index \(k_0,\) \(\mu _k\in [\Vert r(x^k)\Vert , M],\) then \(\mathrm{arg}\min \limits _{x\in C}f(x)\ne \emptyset \) and (3.16) holds.

Proof

(1) If \(\mathrm{arg}\min \limits _{x\in C}f(x)\ne \emptyset ,\) then from Lemma 3.5, \(\{x^k\}\) is bounded and, for any \(x^*\in \mathrm{arg}\min \limits _{x\in C}f(x),\) \(\{\Vert x^k-x^*\Vert \}\) is monotonically decreasing. So, in order to prove conclusion (1), it suffices to show that \(\liminf \limits _{k\rightarrow \infty }\Vert r(x^k)\Vert =0.\) Suppose this does not hold; then

$$\begin{aligned} \liminf \limits _{k\rightarrow \infty }\Vert r(x^k)\Vert > 0. \end{aligned}$$
(3.17)

From the definition of \(\mu _k,\) there exists a constant \(m>0\) such that for all sufficiently large \(k,\) \(\mu _k\in [m, M].\) Thus, from Lemma 3.6 and (3.2), we have

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\Vert z^k-x^k\Vert =0, \lim \limits _{k\rightarrow \infty }\Vert y_C^k-x^k\Vert =0. \end{aligned}$$
(3.18)

As \(\{x^k\}\) is bounded, there is a subsequence \(K\subseteq \{1,2,\cdots \}\) such that

$$\begin{aligned} \lim \limits _{k\in K, k\rightarrow \infty }x^k=\bar{x}. \end{aligned}$$
(3.19)

From (3.18) and (3.19) and the fact that \(\{G_k\}\) and \(\{\mu _k\}\) are bounded, we have

$$\begin{aligned} \lim \limits _{k\in K, k\rightarrow \infty }\varphi _k(z^k)=g(\bar{x}). \end{aligned}$$
(3.20)

From the definition of \(y_C^k\) and the property of projection, for all \(x\in C,\) we have

$$\begin{aligned} \langle \varphi _k(z^k)-e^k, x-y_C^k\rangle \ge 0. \end{aligned}$$

Using (3.18), (3.19) and (3.20) and taking limits on both sides of the above inequality as \(k\in K\) and \(k\rightarrow \infty ,\) we obtain

$$\begin{aligned} \langle g(\bar{x}), x-\bar{x}\rangle \ge 0. \end{aligned}$$

As x can be any point in C, the above inequality means that \(\bar{x}\in \mathrm{arg}\min \limits _{x\in C}f(x).\) From this and the continuity of \(r(\cdot ),\) we conclude that \(\lim \limits _{k\in K, k\rightarrow \infty } \Vert r(x^k)\Vert =0,\) which contradicts (3.17). So part (1) is proved.

(2) Now suppose that \(\{x^k\}\) is bounded and \(\mu _k\in [\Vert r(x^k)\Vert , M].\) In this case we only need to prove that

$$\begin{aligned} \liminf \limits _{k\rightarrow \infty }\Vert r(x^k)\Vert =0, \end{aligned}$$

which implies that \(\mathrm{arg}\min \limits _{x\in C}f(x)\ne \emptyset ;\) then, by conclusion (1), conclusion (2) also holds. Again we argue by contradiction. Suppose (3.17) holds. Then, from the definition of \(\mu _k,\) there exists \(m>0\) such that for all sufficiently large \(k,\) \(\mu _k\in [m, M].\) From Lemma 3.2, following the same lines as in the proof of Lemma 3.6, we see that \(\{z^k\}\) and \(\{y^k\}\) are bounded. Hence, we can select a constant \(M_0>0\) such that for all k, we have

$$\begin{aligned} \max \{\Vert x^k\Vert , \Vert z^k\Vert , \Vert y^k\Vert \}\le M_0. \end{aligned}$$

We arbitrarily take an \(\bar{x}\in C,\) and let

$$\begin{aligned} Y=\{x\in \mathfrak {R}^n| \Vert x\Vert \le M_0+\Vert \bar{x}\Vert \}\bigcap C. \end{aligned}$$

As \(\bar{x}\in Y,\) Y is nonempty. Also, Y is a compact set contained in C. Hence \(\mathrm{arg}\min \limits _{x\in Y}f(x)\ne \emptyset .\) Since \(\{x^k\}, \{z^k\}\) and \(\{y^k\}\subseteq Y,\) we can regard \(\{x^k\}\) as the iterative sequence obtained by applying Algorithm 3.1 to the problem \(\min \limits _{x\in Y}f(x).\) As \(\mathrm{arg}\min \limits _{x\in Y}f(x)\ne \emptyset ,\) by the already proved conclusion (1), we have

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }x^k=x^*\in \mathrm{arg}\min \limits _{x\in Y}f(x). \end{aligned}$$
(3.21)

As \(\bar{x}\in Y,\) from (3.21), we have

$$\begin{aligned} \langle g(x^*), \bar{x}-x^*\rangle \ge 0. \end{aligned}$$
(3.22)

Noticing that \(x^*\in C\) and that \(\bar{x}\) can be any point in C, (3.22) implies that \(x^*\in \mathrm{arg}\min \limits _{x\in C}f(x).\) Hence we see that \(\lim \limits _{k\rightarrow \infty }\Vert r(x^k)\Vert =0,\) which contradicts (3.17). \(\square \)

In Algorithm 3.1, we can choose a special case of \(V_k,\) namely \(V_k=0.\) Then, from the monotonicity of \(P_Q(\cdot ),\) we have

$$\begin{aligned} \langle p^k, Ax^k-Ay_C^k\rangle =\langle P_Q(Ay_C^k)-P_Q(Ax^k), Ax^k-Ay_C^k\rangle \le 0. \end{aligned}$$

So, (3.3) is obviously true. Thus we can derive a special case of Algorithm 3.1, namely the following algorithm.

Algorithm 3.8

Step 0. Take an \(x^0\in C,\) and choose parameters \(\eta \in (0, 1), \rho \in (0, \eta ).\) Set \(k:=0.\)

Step 1. Choose a positive semidefinite matrix \(G_k=A^TA,\) a regularization parameter \(\mu _k>0\) and \(\rho _k\in (0, \frac{\rho }{M_k}],\) where \(M_k=\Vert g(x^k)\Vert +\Vert A^TA\Vert +2\mu _k+1.\) Compute \(z^k,\) an inexact solution of the subproblem (3.1), such that (3.2) holds. If \(z^k=x^k,\) then stop.

Step 2. Set \(y^k=P_C(z^k-\varphi _k(z^k)), \hat{p}^k=g(y^k)-\varphi _k(z^k)+e^k.\) Compute

$$\begin{aligned} \hat{x}^k=x^k-\frac{\langle \hat{p}^k, x^k-y^k\rangle }{\Vert \hat{p}^k\Vert ^2}\hat{p}^k; \end{aligned}$$
$$\begin{aligned} {x}^{k+1}=P_C(\hat{x}^k). \end{aligned}$$

Set \(k:=k+1,\) and go back to Step 1.

From Theorem 3.7, we can derive the global convergence of Algorithm 3.8.
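For completeness, the following is a hedged Python sketch of Algorithm 3.8 as a whole; the choices \(\mu _k=\Vert r(x^k)\Vert \) (the theory only requires \(\mu _k\in [\Vert r(x^k)\Vert , M]\)), the inner projected iteration and its step size are our own illustrations, and `proj_C`, `proj_Q` are assumed to evaluate \(P_C\) and \(P_Q.\)

```python
import numpy as np

def algorithm_3_8(x0, A, proj_C, proj_Q, rho=0.25, eps=1e-6,
                  max_outer=500, max_inner=1000):
    """Sketch of Algorithm 3.8 (the case G_k = A^T A, V_k = 0).
    rho must lie in (0, eta) for some eta in (0, 1)."""
    G = A.T @ A
    normG = np.linalg.norm(G, 2)
    n = len(x0)

    def g(x):                                   # gradient (2.2)
        Ax = A @ x
        return A.T @ (Ax - proj_Q(Ax))

    x = proj_C(x0)
    for k in range(max_outer):
        gx = g(x)
        r = x - proj_C(x - gx)                  # natural residual r(x^k)
        mu = max(np.linalg.norm(r), 1e-12)      # regularization parameter mu_k
        Mk = np.linalg.norm(gx) + normG + 2 * mu + 1
        rho_k = rho / Mk                        # rho_k in (0, rho / M_k]
        M = G + mu * np.eye(n)
        gamma = mu / np.linalg.norm(M, 2) ** 2  # safe step for the inner solver

        z = x.copy()                            # Step 1: inexact solve of (3.1)
        for _ in range(max_inner):
            phi = gx + M @ (z - x)
            e = z - proj_C(z - phi)             # residual e^k
            d = np.linalg.norm(z - x)
            if np.linalg.norm(e) <= rho_k * mu * d * min(d, 1.0):  # test (3.2)
                break
            z = proj_C(z - gamma * phi)

        if np.linalg.norm(z - x) < eps:         # stopping rule used in Sect. 5
            return x, k
        phi_z = gx + M @ (z - x)
        y = proj_C(z - phi_z)                   # Step 2: y^k
        p = g(y) - phi_z + e                    # p-hat^k (nonzero when z^k != x^k)
        x = proj_C(x - (p @ (x - y)) / (p @ p) * p)
    return x, max_outer
```

The experiments in Sect. 5 were run in MATLAB; the sketch above is only an illustrative rendering of the same steps.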

4 Convergence rate analysis

In this section, we discuss the convergence rate of the algorithm. To this end, for the nonsmooth \(g(\cdot ),\) we first introduce the definition of a regular solution [28, 36]. Consider the perturbed linear variational inequality problem \((VIP(C, g^\omega )),\) where

$$\begin{aligned} g^\omega (x)=g(x^*)+\omega +G^*(x-x^*), \end{aligned}$$
(4.1)

and \(G^*=A^T(I-V(x^*))A\) with \(V(x^*)\in \partial P_Q(Ax^*).\)

Definition 4.1

Let \(x^*\in \mathrm{arg}\min \limits _{x\in C}f(x).\) We call \(x^*\) a regular solution if there exist a neighborhood \(U(x^*)\) of \(x^*\) and a neighborhood W(0) of the origin, such that for all \(\omega \in W(0),\) problem \((VIP(C, g^\omega ))\) has a unique solution \(x(\omega )\) over \(U(x^*),\) and furthermore, \(x(\omega )\) is Lipschitz continuous over W(0),  i.e., there is a constant \(L^*>0,\) such that for all \(\omega '\) and \(\omega ''\in W(0),\) we have

$$\begin{aligned} \Vert x(\omega ')-x(\omega '')\Vert \le L^*\Vert \omega '-\omega ''\Vert , \end{aligned}$$

where \(L^*\) is called a regular Lipschitz constant for the solution \(x^*.\)

Note that [28, 36] introduced the notion of regular solution in the smooth setting. Although this concept has been applied in optimization, especially in the local convergence analysis of algorithms for solving variational inequality problems, most references use sufficient conditions for a regular solution. For example, the sufficient condition used in [25,26,27, 29] and the references therein is stated under the assumption that the mapping F satisfies some local assumptions and some constraint qualifications. Armand et al., Kanzow and Fukushima, Peng and Fukushima, Solodov and Svaiter, Sun et al. and Taji et al. [23, 24, 28, 30,31,32] used sufficient conditions stronger than those in the references just mentioned. Unlike these references, in this section, under some local assumptions which are necessary for the case where \(P_Q(\cdot )\) is nonsmooth, we prove that the sequence of iterates generated by the algorithm converges superlinearly to a regular solution of the SFP.

Let \(\mathrm{arg}\min \limits _{x\in C}f(x)\ne \emptyset .\) In Algorithm 3.1, we choose \(G_k=A^T(I-V(x^k))A,\) where \(V(x^k)\in \partial P_Q(Ax^k).\) From Proposition 2.2, we know that \(G_k\) is symmetric and positive semidefinite and that \(\{G_k\}\) is bounded. Choose \(\rho _k\rightarrow 0\ (k\rightarrow \infty )\) and \(\mu _k=\max \{\Vert r(x^k)\Vert , S(x^k)^t\},\) \(t\in (0,1),\) where

$$\begin{aligned} S(x^k)=\sup \left\{ \frac{\Vert P_Q(Ax^k+u)-P_Q(Ax^k)-V(x^k)u\Vert }{\Vert u\Vert }|u\ne 0, \Vert u\Vert \le \Vert r(x^k)\Vert ^t\right\} . \end{aligned}$$

From Propositions 2.1 and 2.2, we know that \(\{S(x^k)\}\) is bounded.
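In practice the supremum defining \(S(x^k)\) is rarely available in closed form. The following Python sketch estimates it by random sampling; this Monte Carlo heuristic is our own illustration, not a procedure from the paper, and it assumes `proj_Q` evaluates \(P_Q\) and `V` is a matrix chosen from \(\partial P_Q(Ax^k).\)

```python
import numpy as np

def estimate_S(Axk, proj_Q, V, radius, n_samples=2000, seed=0):
    """Monte Carlo lower estimate of
    S(x^k) = sup { ||P_Q(Ax^k + u) - P_Q(Ax^k) - V u|| / ||u|| :
                   u != 0, ||u|| <= radius },
    where radius = ||r(x^k)||^t."""
    rng = np.random.default_rng(seed)
    base = proj_Q(Axk)
    best = 0.0
    for _ in range(n_samples):
        u = rng.standard_normal(len(Axk))
        u *= radius * rng.random() / np.linalg.norm(u)   # enforce ||u|| <= radius
        ratio = np.linalg.norm(proj_Q(Axk + u) - base - V @ u) / np.linalg.norm(u)
        best = max(best, ratio)
    return best
```

Being a sampled maximum, the returned value underestimates the true supremum, so an implementation would use it only as a heuristic surrogate for \(S(x^k)\) in the rule \(\mu _k=\max \{\Vert r(x^k)\Vert , S(x^k)^t\}.\)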

Under the above choices, from Theorem 3.7, (3.16) holds, that is,

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }x^k=x^*\in \mathrm{arg}\min \limits _{x\in C}f(x). \end{aligned}$$

All the discussion below concerns the rate at which the sequence \(\{x^k\}\) in (3.16) converges to \(x^*\) under the above choices.

We need the following assumptions.

  1. (A1):

    \(\lim \limits _{k\rightarrow \infty }z^k=x^*;\)

  2. (A2):

    \(\lim \limits _{k\rightarrow \infty }V(x^k)=V(x^*),\) where, \(V(x^*)\) is chosen by (4.1);

  3. (A3):

    \(\lim \limits _{k\rightarrow \infty }S(x^k)=0;\)

  4. (A4):

    \(P_Q(Ax^k)-P_Q(Ax^*)-V(x^k)A(x^k-x^*)=o(\Vert x^k-x^*\Vert ).\)

The above assumptions, especially (A2), (A3) and (A4), are necessary for the case where the projection operator \(P_Q(\cdot )\) is nonsmooth. They have a clear background, which we will explain in detail later.

Lemma 4.2

Let \(x^*\) be a regular solution and let assumptions (A1), (A2) and (A3) hold. Then, for all sufficiently large k, we have

$$\begin{aligned} \Vert A(y_C^k-x^k)\Vert \le \Vert r(x^k)\Vert ^t. \end{aligned}$$

Proof

Let

$$\begin{aligned} \omega _k'= & {} \varphi _k(z^k)-e^k-g(x^*)-G^*(y_C^k-x^*), \end{aligned}$$
(4.2)
$$\begin{aligned} \omega _k''= & {} g(x^k)-r(x^k)-g(x^*)-G^*(x_C^k-x^*), \end{aligned}$$
(4.3)

where \(x_C^k=P_C(x^k-g(x^k))\). From the definitions of \(y_C^k\) and \(x_C^k,\) they are the solutions of the perturbed problems \(VIP(C,g^{\omega _k'})\) and \(VIP(C,g^{\omega _k''}),\) respectively, where

$$\begin{aligned} g^{\omega _k'}(x)= & {} g(x^*)+\omega _k'+G^*(x-x^*), \end{aligned}$$
(4.4)
$$\begin{aligned} g^{\omega _k''}(x)= & {} g(x^*)+\omega _k''+G^*(x-x^*). \end{aligned}$$
(4.5)

From assumption (A1) and (3.16), we have

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }y_C^k=x^*, \lim \limits _{k\rightarrow \infty }x_C^k=x^*. \end{aligned}$$
(4.6)

In addition, from (4.2) and (4.3), we have

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\omega _k'=0, \lim \limits _{k\rightarrow \infty }\omega _k''=0. \end{aligned}$$
(4.7)

From (4.6) and (4.7), for all sufficiently large k, we obtain \(y_C^k, x_C^k\in U(x^*)\) and \(\omega _k', \omega _k''\in W(0),\) where \(U(x^*)\) and W(0) are defined as in Definition 4.1. By assumption, \(x^*\) is a regular solution. Thus, from Definition 4.1, \(y_C^k\) and \(x_C^k\) are the unique solutions of \(VIP(C,g^{\omega _k'})\) and \(VIP(C,g^{\omega _k''})\) over \(U(x^*)\) and satisfy

$$\begin{aligned} \Vert y_C^k-x_C^k\Vert \le L^*\Vert \omega _k'-\omega _k''\Vert . \end{aligned}$$
(4.8)

From (3.2) and (4.8), we have

$$\begin{aligned} \Vert y_C^k-x^k\Vert\le & {} \Vert y_C^k-x_C^k\Vert +\Vert r(x^k)\Vert \nonumber \\\le & {} L^*\Vert \omega _k'-\omega _k''\Vert +\Vert r(x^k)\Vert \nonumber \\= & {} L^*\Vert \varphi _k(z^k)-e^k-G^*(y_C^k-x^*)\nonumber \\&-g(x^k)+r(x^k)+G^*(x_C^k-x^*)\Vert +\Vert r(x^k)\Vert \nonumber \\= & {} L^*\Vert (G_k-G^*+\mu _kI)(z^k-x^k)\nonumber \\&+(I-G^*)r(x^k)+(G^*-I)e^k\Vert +\Vert r(x^k)\Vert \nonumber \\\le & {} L^*[(\Vert G^*\Vert +1)\Vert r(x^k)\Vert +(\Vert G^*\Vert +1)\Vert e^k\Vert \nonumber \\&+(\Vert G_k-G^*\Vert +\mu _k)\Vert z^k-x^k\Vert ]+\Vert r(x^k)\Vert \nonumber \\\le & {} L^*[(\Vert G^*\Vert +1)\rho _k\mu _k+\Vert G_k-G^*\Vert +\mu _k]\Vert z^k-x^k\Vert \nonumber \\&+[L^*(\Vert G^*\Vert +1)+1]\Vert r(x^k)\Vert \nonumber \\\le & {} L^*[(\Vert G^*\Vert +1)\rho _k\mu _k+(\Vert G_k-G^*\Vert +\mu _k)]\frac{1}{1-\rho _k\mu _k}\Vert y_C^k-x^k\Vert \nonumber \\&+[L^*(\Vert G^*\Vert +1)+1]\Vert r(x^k)\Vert . \end{aligned}$$
(4.9)

From the definitions of \(G_k\) and \(\mu _k\), using assumptions (A2) and (A3), we obtain \(\Vert G_k-G^*\Vert \rightarrow 0\) and \(\mu _k\rightarrow 0\ (k\rightarrow \infty ).\) Thus, for all sufficiently large k, we have

$$\begin{aligned} L^*[(\Vert G^*\Vert +1)\rho _k\mu _k+(\Vert G_k-G^*\Vert +\mu _k)]\frac{1}{1-\rho _k\mu _k}\le \frac{1}{2+L^*(\Vert G^*\Vert +1)}. \end{aligned}$$

Thus, from (4.9), we have

$$\begin{aligned} \Vert y_C^k-x^k\Vert\le & {} \frac{1}{2+L^*(\Vert G^*\Vert +1)}\Vert y_C^k-x^k\Vert \\&+[L^*(\Vert G^*\Vert +1)+1]\Vert r(x^k)\Vert . \end{aligned}$$

Therefore, we have

$$\begin{aligned} \Vert y_C^k-x^k\Vert\le & {} [2+L^*(\Vert G^*\Vert +1)]\Vert r(x^k)\Vert ,\\ \Vert A(y_C^k-x^k)\Vert\le & {} \Vert A\Vert [2+L^*(\Vert G^*\Vert +1)]\Vert r(x^k)\Vert . \end{aligned}$$

Because \(\Vert r(x^k)\Vert \rightarrow 0(k\rightarrow \infty ),\) for all sufficiently large k,  we have

$$\begin{aligned} \Vert A\Vert [2+L^*(\Vert G^*\Vert +1)]\Vert r(x^k)\Vert ^{1-t}\le 1. \end{aligned}$$

Thus, we can get

$$\begin{aligned} \Vert A(y_C^k-x^k)\Vert \le \Vert r(x^k)\Vert ^t. \end{aligned}$$

\(\square \)

Lemma 4.3

The following inequality holds for all k:

$$\begin{aligned} \Vert \hat{x}^k-y^k\Vert \le \frac{\Vert \hat{p}^k+\mu _k(y^k-x^k)\Vert }{\mu _k}. \end{aligned}$$

Proof

This conclusion follows from the following equality:

$$\begin{aligned}&\Vert \hat{p}^k+\mu _k(y^k-x^k)\Vert ^2-\mu _k^2\Vert \hat{x}^k-y^k\Vert ^2\\&\quad =\Vert \hat{p}^k\Vert ^2-2\mu _k\langle \hat{p}^k, x^k-y^k\rangle +\mu _k^2\frac{\langle \hat{p}^k, x^k-y^k\rangle ^2}{\Vert \hat{p}^k\Vert ^2}\\&\quad =\Vert \hat{p}^k-\mu _k\frac{\langle \hat{p}^k, x^k-y^k\rangle }{\Vert \hat{p}^k\Vert ^2}\hat{p}^k\Vert ^2. \end{aligned}$$

\(\square \)

Lemma 4.4

Let \(x^*\) be a regular solution and let assumptions (A1), (A2) and (A3) hold. Then (3.3) holds for all sufficiently large k.

Proof

If \(A(x^k-y_C^k)=0,\) then (3.3) is obviously true. Now, we suppose that \(A(x^k-y_C^k)\ne 0.\) From Lemma 4.2, for all sufficiently large k,  we have

$$\begin{aligned} \Vert A(y_C^k-x^k)\Vert \le \Vert r(x^k)\Vert ^t. \end{aligned}$$
(4.10)

Thus, from (4.10) and the definition of \(\mu _k,\) we have

$$\begin{aligned}&\langle {{{p}^k}}, A(x^k-y_C^k)\rangle \\&\quad =\left\langle P_Q(Ay_C^k)-P_Q(Ax^k)-V(x^k)A(y_C^k-x^k), A(x^k-y_C^k)\right\rangle \\&\quad \le \Vert A\Vert ^2\frac{\Vert P_Q(Ax^k+A(y_C^k-x^k))-P_Q(Ax^k)-V(x^k)A(y_C^k-x^k)\Vert }{\Vert A(y_C^k-x^k)\Vert }\Vert y_C^k-x^k\Vert ^2\\&\quad \le \Vert A\Vert ^2\sup \left\{ \frac{\Vert P_Q(Ax^k+u)-P_Q(Ax^k)-V(x^k)u\Vert }{\Vert u\Vert }| u\ne 0, \Vert u\Vert \le \Vert r(x^k)\Vert ^t\right\} \Vert y_C^k-x^k\Vert ^2\\&\quad =\Vert A\Vert ^2S(x^k)\Vert y_C^k-x^k\Vert ^2\\&\quad \le \Vert A\Vert ^2S(x^k)^{1-t}\mu _k\Vert y_C^k-x^k\Vert ^2. \end{aligned}$$

From assumption (A3), \(S(x^k)\rightarrow 0(k\rightarrow \infty ).\) So, for all sufficiently large k, \(\Vert A\Vert ^2S(x^k)^{1-t}\le 1-\eta .\) Thus, we have

$$\begin{aligned} \langle {{{p}^k}}, A(x^k-y_C^k)\rangle \le (1-\eta )\mu _k\Vert y_C^k-x^k\Vert ^2, \end{aligned}$$

i.e., (3.3) holds. \(\square \)

Now we give the convergence rate theorem for Algorithm 3.1.

Theorem 4.5

Let \(x^*\) be a regular solution and let assumptions (A1), (A2), (A3) and (A4) hold. Then \(\{x^k\}\) converges superlinearly to \(x^*,\) that is,

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\frac{\Vert x^{k+1}-x^*\Vert }{\Vert x^{k}-x^*\Vert }=0. \end{aligned}$$

Proof

By Lemma 4.4, (3.3) holds for all sufficiently large k. Thus, \(y^k=y_C^k\) and \(\hat{p}^k=g(y^k)-\varphi _k(z^k)+e^k.\) Next, we estimate \(\Vert x^{k+1}-x^*\Vert .\) We have

$$\begin{aligned} \begin{array}{lll} \Vert x^{k+1}-x^*\Vert &{}\le &{}\Vert \hat{x}^{k}-x^*\Vert \\ {} &{}\le &{}\Vert \hat{x}^{k}-y^k\Vert +\Vert y^k-x^*\Vert . \end{array} \end{aligned}$$
(4.11)

We first estimate \(\Vert y^k-x^*\Vert .\) By assumption, \(x^*\) is a regular solution, hence it is the unique solution for \(\omega =0,\) i.e., the solution of the problem \(VIP(C, g^0)\) over \(U(x^*).\) Thus, from Definition 4.1, we have

$$\begin{aligned} \Vert y^k-x^*\Vert \le L^*\Vert \omega _k'\Vert . \end{aligned}$$

Noting the definitions of \(G_k\) and \(G^*,\) by (3.2), (4.2) and assumption (A4), we obtain from the above inequality

$$\begin{aligned} \Vert y^k-x^*\Vert\le & {} L^*\Vert \varphi _k(z^k)-e^k-g(x^*)-G^*(y^k-x^*)\Vert \nonumber \\= & {} L^*\Vert g(x^k)-g(x^*)-G^*(x^k-x^*)\nonumber \\&+(G_k-G^*+\mu _kI)(z^k-x^k)+(G^*-I)e^k\Vert \nonumber \\= & {} L^*\Vert A^T(P_Q(Ax^*)-P_Q(Ax^k)+V(x^*)A(x^k-x^*))\nonumber \\&+(G_k-G^*+\mu _kI)(z^k-x^k)+(G^*-I)e^k\Vert \nonumber \\\le & {} L^*[\Vert A\Vert \Vert P_Q(Ax^k)-P_Q(Ax^*)-V(x^*)A(x^k-x^*)\Vert \nonumber \\&+(\Vert G_k-G^*\Vert +\mu _k)\Vert z^k-x^k\Vert \nonumber \\&+(\Vert G^*\Vert +1)\Vert e^k\Vert ]\nonumber \\\le & {} L^*\Vert A\Vert [\Vert P_Q(Ax^k)-P_Q(Ax^*)-V(x^k)A(x^k-x^*)\Vert \nonumber \\&+\Vert V(x^k)-V(x^*)\Vert \Vert A\Vert \Vert x^k-x^*\Vert ]\nonumber \\&+L^*[\Vert G_k-G^*\Vert +\mu _k+(\Vert G^*\Vert +1)\rho _k\mu _k]\Vert z^k-x^k\Vert \nonumber \\\le & {} L^*\Vert A\Vert \left[ \frac{o(\Vert x^k-x^*\Vert )}{\Vert x^k-x^*\Vert }+\Vert V(x^k)-V(x^*)\Vert \Vert A\Vert \right] \Vert x^k-x^*\Vert \nonumber \\&+L^*[\Vert G_k-G^*\Vert +\mu _k+(\Vert G^*\Vert +1)\rho _k\mu _k]\frac{1}{1-\rho _k\mu _k}\Vert y^k-x^k\Vert \nonumber \\\le & {} S_{1k}\Vert x^k-x^*\Vert +S_{2k}\Vert y^k-x^*\Vert , \end{aligned}$$
(4.12)

where,

$$\begin{aligned} S_{1k}= & {} L^*\Vert A\Vert [o(1)+\Vert V(x^k)-V(x^*)\Vert \Vert A\Vert ],\\ S_{2k}= & {} L^*[\Vert G_k-G^*\Vert +\mu _k+(\Vert G^*\Vert +1)\rho _k\mu _k]\frac{1}{1-\rho _k\mu _k}. \end{aligned}$$

From assumptions (A2) and (A3), \(S_{1k}\rightarrow 0\) and \(S_{2k}\rightarrow 0\ (k\rightarrow \infty ).\) Thus, from (4.12), we have

$$\begin{aligned} \Vert y^k-x^*\Vert \le \frac{S_{1k}}{1-S_{2k}}\Vert x^k-x^*\Vert . \end{aligned}$$
(4.13)

We next estimate \(\Vert \hat{x}^k-y^k\Vert .\) By Lemma 4.3, (3.2) and (4.13), we have

$$\begin{aligned} \Vert \hat{x}^k-y^k\Vert\le & {} \frac{\Vert \hat{p}^k+\mu _k(y^k-x^k)\Vert }{\mu _k}\nonumber \\= & {} \frac{\Vert g(y^k)-\varphi _k(z^k)+e^k+\mu _k(y^k-x^k)\Vert }{\mu _k}\nonumber \\= & {} \frac{\Vert g(y^k)-g(x^k)-G_k(y^k-x^k)-(G_k+\mu _kI)e^k+e^k\Vert }{\mu _k}\nonumber \\\le & {} \frac{\Vert A\Vert \Vert P_Q(Ay^k)-P_Q(Ax^k)-V(x^k)A(y^k-x^k)\Vert }{\mu _k\Vert A(y^k-x^k)\Vert }\Vert A(y^k-x^k)\Vert \nonumber \\&+\frac{\Vert G_k\Vert +\mu _k+1}{\mu _k}\Vert e^k\Vert \nonumber \\\le & {} \frac{\Vert A\Vert ^2S(x^k)\Vert y^k-x^k\Vert }{\mu _k}+\frac{\Vert G_k\Vert +\mu _k+1}{\mu _k}\Vert e^k\Vert \nonumber \\\le & {} \left[ \Vert A\Vert ^2S(x^k)^{1-t}+(\Vert G_k\Vert +\mu _k+1) \frac{\rho _k}{1-\rho _k\mu _k}\right] \Vert y^k-x^k\Vert \nonumber \\\le & {} \left[ \Vert A\Vert ^2S(x^k)^{1-t}+(\Vert G_k\Vert +\mu _k+1) \frac{\rho _k}{1-\rho _k\mu _k}\right] \nonumber \\&\left( \frac{S_{1k}}{1-S_{2k}}+1\right) \Vert x^k-x^*\Vert \nonumber \\= & {} S_{3k}\left( \frac{S_{1k}}{1-S_{2k}}+1\right) \Vert x^k-x^*\Vert , \end{aligned}$$
(4.14)

where \(S_{3k}=\Vert A\Vert ^2S(x^k)^{1-t}+(\Vert G_k\Vert +\mu _k+1)\frac{\rho _k}{1-\rho _k\mu _k}.\) Because \(\{G_k\}\) is bounded, \(\rho _k\rightarrow 0\ (k\rightarrow \infty )\) and, from (A3), \(S(x^k)\rightarrow 0\ (k\rightarrow \infty ),\) we obtain \(S_{3k}\rightarrow 0\ (k\rightarrow \infty ).\) By (4.11), (4.13) and (4.14), we have

$$\begin{aligned} \Vert x^{k+1}-x^*\Vert \le \left[ \frac{S_{1k}}{1-S_{2k}}+S_{3k}\left( \frac{S_{1k}}{1-S_{2k}}+1\right) \right] \Vert x^{k}-x^*\Vert . \end{aligned}$$

As \(S_{1k}\rightarrow 0, S_{2k}\rightarrow 0, S_{3k}\rightarrow 0(k\rightarrow \infty ),\) we have

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\frac{\Vert x^{k+1}-x^*\Vert }{\Vert x^{k}-x^*\Vert }=0. \end{aligned}$$

This completes the proof of this theorem. \(\square \)

Remark 4.6

Now we are ready to analyze the background of assumptions (A1), (A2), (A3) and (A4), which shows that these assumptions are necessary for the case where \(P_Q\) is nonsmooth.

  1. (1)

    If \(P_Q(\cdot )\) is semismooth at \(x^*\) (for example, if Q is a polyhedron), then from Proposition 2.6, assumption (A4) holds.

  2. (2)

    If \(P_Q(\cdot )\) is smooth at \(x^*\) (for example, \(Q=\{z\in \mathfrak {R}^m| Bz=b\}\), where B is a \(p\times m\) matrix with full row rank), then assumptions (A2), (A3) and (A4) are satisfied.

  3. (3)

    If \(P_Q(\cdot )\) is smooth at \(x^*\) and \(\nabla g(x^*)\) is positive definite, then assumptions (A2), (A3) and (A4) are satisfied. Furthermore, by Lemma 3.2, assumption (A1) is also satisfied. From Lemma 6 of [28], \(x^*\) is a regular solution. Thus, in this case, we only need to choose \(G_k=\nabla g(x^k)\) and \(\mu _k=\max \{\Vert r(x^k)\Vert , S(x^k)^t\}\) in Theorem 4.5; then \(\{x^k\}\) converges superlinearly to \(x^*.\) It is worth noting that the conditions of this result are weaker than those used in the related literature. For example, Theorem 4.3 of [30] requires not only that \(\nabla F(x^*)\) be positive definite, but also that \(\nabla F(\cdot )\) be Hölder continuous around \(x^*\) with degree p. On the other hand, we must point out that Algorithm 2.2 of [30] played an enlightening role for the algorithms of this paper.

By an argument analogous to that used in Lemma 6 of [28], in the nonsmooth case we can prove that if \(G^*\) is positive definite, then \(x^*\) is a regular solution. In addition, by Lemma 3.2, we know that (A1) holds. Thus, from Theorem 4.5, we obtain the following corollary.

Corollary 4.7

Suppose that \(G^*\) in (4.1) is positive definite and that assumptions (A2), (A3) and (A4) hold. Then \(\{x^k\}\) converges superlinearly to \(x^*.\)

Below we give the convergence analysis for the algorithm under the following assumption.

Assumption \((A_1'): \{z^k\}\) is bounded and \(\inf \limits _{k}\lambda _k> 0.\)

If assumption (A1) is replaced by \((A_1')\) in Theorem 4.5, does the convergence property of the algorithm still hold? We first give the following lemma.

Lemma 4.8

Suppose that \(\mathrm{arg}\min \limits _{x\in C}f(x)=\{x^*\}\) and assumption \((A_1')\) holds. Then

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }z^k=x^*. \end{aligned}$$

Proof

From the assumption, we know that \(\{z^k\}\) is bounded. So, we only need to prove that each of its accumulation points \(z^*\) equals \(x^*.\) Suppose there exists an infinite subset \(K\subseteq \{1,2,\ldots \}\) such that

$$\begin{aligned} \lim \limits _{k\in K,k\rightarrow \infty }z^k=z^*. \end{aligned}$$
(4.15)

Because \(\rho _k\rightarrow 0 (k\rightarrow \infty ),\) from (3.2) and (4.15), we have

$$\begin{aligned} \lim \limits _{k\in K,k\rightarrow \infty }y_C^k=z^*. \end{aligned}$$
(4.16)

In addition, as \(\{\lambda _k\}\) is bounded, without loss of generality, we may assume

$$\begin{aligned} \lim \limits _{k\in K,k\rightarrow \infty }\lambda _k=\lambda ^*. \end{aligned}$$
(4.17)

From (4.15) and Lemma 3.2, noting that \(x^*\in \mathrm{arg}\min \limits _{x\in C}f(x),\) we can conclude that

$$\begin{aligned} \langle g(x^*), x^*-z^*\rangle =0. \end{aligned}$$
(4.18)

From the boundedness of \(\{z^k\}, \{y^k\}, \{\mu _k\}, \{G_k\}\) and Lemma 3.5, we have

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\langle \hat{p}^k, x^k-y^k\rangle =0. \end{aligned}$$
(4.19)

If (3.3) holds, then \(y^k=y_C^k, \hat{p}^k=g(y^k)-\varphi _k(z^k)+e^k,\) hence we have

$$\begin{aligned} \langle g(y^k), x^k-y^k\rangle \ge \langle \hat{p}^k, x^k-y^k\rangle . \end{aligned}$$
(4.20)

Thus, from (4.16), (4.18), (4.19) and (4.20), and taking the monotonicity of \(g(\cdot )\) into account, we have

$$\begin{aligned} \langle g(x^*)-g(z^*), x^*-z^*\rangle =0, \end{aligned}$$

which, together with the co-coercivity of \(g(\cdot ),\) implies that

$$\begin{aligned} g(x^*)=g(z^*). \end{aligned}$$
(4.21)

Choosing any \(x\in C,\) from (4.18) and (4.21), we have

$$\begin{aligned} \langle g(z^*), x-z^*\rangle= & {} \langle g(x^*), x-x^*\rangle +\langle g(x^*), x^*-z^*\rangle \\= & {} \langle g(x^*), x-x^*\rangle \\\ge & {} 0. \end{aligned}$$

It follows that \(z^*=x^*.\)

If \(y^k=x^k+\lambda _k(z^k-x^k),\) where \(\lambda _k\) is generated by the line search (3.4), then from (4.15) and (4.17), we have

$$\begin{aligned} \lim \limits _{k\in K, k\rightarrow \infty }y^k=y^*=x^*+\lambda ^*(z^*-x^*). \end{aligned}$$

Following the same lines as in the proof above, we get

$$\begin{aligned} y^*=x^*+\lambda ^*(z^*-x^*)=x^*. \end{aligned}$$

From the assumption, we know that \(\lambda ^*>0.\) It follows immediately from the above equality that \(z^*=x^*.\) \(\square \)

From Lemma 4.8 and Theorem 4.5, we obtain the following corollary.

Corollary 4.9

Suppose that \(\mathrm{arg}\min \limits _{x\in C}f(x)=\{x^*\},\) that \(x^*\) is a regular solution, and that assumptions \((A_1'),\) (A2), (A3) and (A4) hold. Then \(\{x^k\}\) converges superlinearly to \(x^*.\)

5 Numerical results

To give some insight into the behavior of the algorithms presented in this paper, we implemented them in MATLAB and solved the following three examples. We use \(\Vert z^{k}-x^k\Vert <\epsilon \) as the stopping criterion. Throughout the computational experiments, the parameter \(\epsilon \) used in Algorithm 3.8 was set to \(\epsilon =10^{-6}.\) In the results reported below, the approximate solution refers to the last iterate, and the number of iterations indicated for each test problem and each specified initial point is the number of main iterations; the number of inner iterations in the subroutine is not included.

Example 1

(An \(\ell _2\)-norm problem of linear equations [37])

Find a solution satisfying \(\Vert x\Vert _2\le 2\) of the following linear equations:

$$\begin{aligned}\left\{ \begin{array}{rr} 2x_1+3x_2+x_3=4\\ x_1-2x_2+4x_3=-5\\ 3x_1+8x_2-2x_3=13\\ 4x_1-x_2+9x_3=-6\\ \end{array} \right. \end{aligned}$$

This problem is an SFP with \(A=\left( \begin{array}{rrr} 2 &{} 3 &{} 1 \\ 1 &{} -2 &{} 4 \\ 3 &{} 8 &{} -2 \\ 4 &{} -1 &{} 9 \\ \end{array} \right) ,\) \(C=\{x| \Vert x\Vert _2\le 2\}\) and \(Q=\{y| y=(4, -5, 13, -6)^T\}.\) The test results for Algorithm 3.8 applied to Example 1 are listed in Table 1, using different starting points.

Table 1 Number of iterations for Algorithm 3.8 applied to Example 1
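For this example both projections have closed forms: C is a Euclidean ball and Q is a singleton. A hedged sketch of the setup in Python, reusing the illustrative `algorithm_3_8` from Sect. 3 (the starting point is our own choice):

```python
import numpy as np

A = np.array([[2., 3., 1.],
              [1., -2., 4.],
              [3., 8., -2.],
              [4., -1., 9.]])
b = np.array([4., -5., 13., -6.])

def proj_C(x):                       # projection onto the ball ||x||_2 <= 2
    n = np.linalg.norm(x)
    return x if n <= 2 else 2 * x / n

proj_Q = lambda y: b                 # Q = {(4, -5, 13, -6)^T} is a singleton

x_sol, iters = algorithm_3_8(np.zeros(3), A, proj_C, proj_Q)
print(x_sol, iters)                  # expect A @ x_sol close to b, ||x_sol|| <= 2
```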

Example 2

(An SFP)

Let \(C=\{x\in \mathfrak {R}^3| x_1+x_2+x_3\le 3\},\) \(Q=\{y\in \mathfrak {R}^4| y_1-2y_2+3y_3+7y_4\le 5\}\) and \(A=\left( \begin{array}{rrr} 2 &{} -1 &{} 3 \\ 4 &{} 2 &{} 5 \\ 2 &{} 0 &{} 2 \\ 0 &{} 1 &{} 1 \\ \end{array} \right) .\) Find a point \(x\in C\) with \(Ax\in Q.\)

The test results for Algorithm 3.8 applied to Example 2 are listed in Table 2, using different starting points.

Table 2 Number of iterations for Algorithm 3.8 applied to Example 2

Example 3

(An SFP generated randomly [38])

The SFP is to find \(x\in C=\{x\in \mathfrak {R}^n| \Vert x\Vert _2\le 0.25\}\) such that \(Ax\in Q=\{y\in \mathfrak {R}^m|0.6\le y_j\le 1, j=1,2,\ldots , m\},\) where the entries \(a_{ij}\in (0,50)\) of the matrix \(A=(a_{ij})_{m\times n}\) are generated randomly. In what follows, we denote \(e_1=(1,1,\ldots ,1)^T.\)

The test results for Algorithm 3.8 applied to Example 3 are listed in Table 3, using different starting points.

Table 3 Number of iterations for Algorithm 3.8 applied to Example 3

Finally, we compared our algorithm with the CQ algorithm of [2]. As a benchmark, we also tested the CQ algorithm on Examples 1, 2 and 3. The test results are listed in Tables 4, 5 and 6.

Table 4 Number of iterations for the CQ algorithm applied to Example 1
Table 5 Number of iterations for the CQ algorithm applied to Example 2
Table 6 Number of iterations for the CQ algorithm applied to Example 3

The results given in Tables 1, 2 and 3 are quite promising, as these test problems were solved in a small number of iterations. They show that the algorithms of this paper are effective. Tables 1, 2, 3, 4, 5 and 6 show that the iteration numbers of our algorithm are relatively small. The numerical experiments demonstrate the viability of the new method proposed in this paper. We know that the choice of parameter values affects the performance of a method in practice; as this is not the main objective of the paper, we do not give detailed tests in this respect.

6 Concluding remarks

In this paper, we give two Newton projection methods, Algorithms 3.1 and 3.8, for solving the split feasibility problem. Theorem 3.7 proves that, as long as the solution set of the SFP is nonempty, the two algorithms converge globally to a solution of the SFP, while Theorem 4.5 and its corollaries prove that Algorithm 3.1 converges superlinearly to a regular solution. It is worth noting that, with a feasible modification of Step 3 in Algorithm 3.1, the modified algorithm can solve a class of convex optimization problems more general than the SFP, in which the gradient of the objective function is nonsmooth but locally Lipschitzian; it can also be used to solve a class of nonsmooth monotone \(VIP(C, F),\) where F is locally Lipschitzian. For the extended algorithm, the conclusions corresponding to Theorems 3.7 and 4.5 can also be proved.