1 Introduction

In this paper, we consider numerically solving the following constrained nonlinear monotone equations of the form:

$$ \begin{array}{@{}rcl@{}} F(x)=0,\ x\in \mathcal{C}, \end{array} $$
(1)

where \(\mathcal {C}\subset \mathbb {R}^{n}\) is a nonempty closed convex set, and F is a continuous and monotone mapping from \(\mathbb {R}^{n}\) to itself. The monotonicity of the mapping F means that:

$$ \begin{array}{@{}rcl@{}} \left( F(x)-F(y)\right)^{\mathrm{T}}(x-y) \geq 0,~\forall~x,~y\in \mathbb{R}^{n}. \end{array} $$
(2)

We call problem (1) the unconstrained nonlinear monotone equations if \(\mathcal {C}=\mathbb {R}^{n}\).

Problem (1) arises in many important modern applications in applied mathematics, economics, engineering, and so on. Moreover, some mathematical problems can be converted into the form of (1), such as certain nonlinear monotone complementarity problems, compressed sensing problems, and optimal power flow control problems in electricity (see [1,2,3,4] and the references therein). The wide range of applications of nonlinear monotone equations has motivated many researchers to study efficient methods for solving (1), and numerous iterative methods have been developed to date. Among these methods, we focus only on the derivative-free projection-based methods (DFPMs for short); see, for example, [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]. This kind of method is well suited to large-scale constrained nonlinear monotone equations due to its simplicity and low storage requirement.

Inspired by the projection technique [20] and the spectral gradient method [21], Yu et al. [5] first extended the spectral gradient projection method in [22] from unconstrained nonlinear monotone equations to constrained nonlinear monotone equations. The numerical results reported in [5] show that the proposed method works quite well even for large-scale problems. Since then, this approach has attracted much attention, and many DFPMs have sprung up, such as the spectral gradient-type projection methods [6, 7], the conjugate gradient-type projection methods [8,9,10,11,12,13,14,15], and the spectral conjugate gradient-type projection methods [16,17,18,19], to mention just a few. Recently, the spectral gradient-type projection method has been further developed. Abubakar et al. [23] proposed a positive spectral gradient projection method whose search direction is a convex combination of two different positive spectral coefficients multiplied by the residual vector. Awwal et al. [24] presented a two-step spectral gradient projection method. Mohammad and Abubakar [25] further designed an adaptive spectral gradient projection method. The numerical results in [23,24,25] show that these methods work well and compare favorably with similar methods for solving (1).

In [26], Wang et al. proposed a three-term conjugate gradient projection method based on the HS method [27] for solving (1). Subsequently, motivated by [26], Gao and He [28] proposed another three-term conjugate gradient projection method by choosing part of the LS method in [29] as the conjugate parameter. Recently, Gao et al. [30] developed an adaptive three-term conjugate gradient projection method, in which the search direction is close to the direction of the self-scaling memoryless BFGS method proposed by Oren and Luenberger [31] in the sense of the minimum Frobenius norm, and they successfully applied this algorithm to compressed sensing problems in [30]. Quite recently, Awwal et al. [32] proposed a Perry-type derivative-free method based on the BFGS quasi-Newton search direction. The numerical results reported in [32] show that the proposed method is effective and promising for solving (1) and signal recovery problems. On the other hand, based on the ideas of the hybrid conjugate gradient method in [33], Sun and Liu [34] developed a hybrid conjugate gradient projection method for solving (1), in which the search direction satisfies the sufficient descent property and the trust region property independently of the line search technique. The large-scale numerical results in [34] show that the proposed method is quite competitive for some fixed initial points.

Recently, based on the memoryless BFGS method proposed by Nocedal [35] and Shanno [36], Li [37,38,39] developed a family of three-term conjugate gradient methods, in which the search directions are close to that of the memoryless BFGS method. A large number of numerical comparisons indicate that the proposed methods are quite efficient for solving unconstrained optimization problems.

To the best of our knowledge, no existing conjugate gradient method in the literature combines the three-term conjugate gradient method, whose search direction is close to that of the memoryless BFGS method, with the ideas of hybrid methods. Hence, it is interesting to design a hybrid three-term conjugate gradient projection method for solving problem (1) such that its search direction is close to that of the memoryless BFGS method and possesses the sufficient descent property and the trust region property independently of the line search technique.

In this paper, inspired by the hybrid conjugate gradient projection method in [34] and the three-term conjugate gradient methods in [37,38,39], we propose a hybrid three-term conjugate gradient projection method for solving (1). The rest of the paper is organized as follows. In the next section, we describe in detail the motivation of this paper and propose the algorithm. In Section 3, we establish the global convergence under some mild conditions. In Section 4, we perform numerical experiments to illustrate the performance of the proposed method and apply it to solve compressed sensing problems in Section 5. Some conclusions are summarized in the last section.

Notation

Throughout this paper, the symbols ∥⋅∥ and ∥⋅∥1 denote the Euclidean norm and 1-norm on \(\mathbb {R}^{n}\), respectively. For convenience, we abbreviate F(xk) to Fk, and similar abbreviations are also used. For a nonempty closed convex set \(\mathcal {C}\subset \mathbb {R}^{n}\), \(P_{\mathcal {C}}[\cdot ]\) denotes the projection mapping from \(\mathbb {R}^{n}\) onto the set \(\mathcal {C}\), namely, \(P_{\mathcal {C}}[x]=\arg \min \limits \{\|y-x\|~|~y\in \mathcal {C}\}\).

2 Motivation and algorithm

In this section, we are devoted to developing a hybrid three-term conjugate gradient projection method for solving problem (1). To this end, we first briefly review the three-term conjugate gradient methods for solving unconstrained optimization problems proposed in [37,38,39] and the hybrid conjugate gradient projection method for solving problem (1) in [34].

As is well known, for solving the large-scale unconstrained optimization problem:

$$ \begin{array}{@{}rcl@{}} \min\limits_{x\in \mathbb{R}^{n}}f(x), \end{array} $$

where \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a continuously differentiable function, one of the most effective methods is the three-term conjugate gradient method, which dates back to Beale [40] and was later developed by Nazareth [41], Zhang et al. [42], among others. These methods are of the form:

$$ \begin{array}{@{}rcl@{}} x_{k+1}=x_{k}+\alpha_{k}d_{k}, \end{array} $$

where αk > 0 is the steplength determined by some line search and dk is the search direction generated by:

$$ \begin{array}{@{}rcl@{}} d_{k}=-F_{k}+\beta_{k}d_{k-1}+\theta_{k}y_{k-1},~k\geq 1,~d_{0}=-F_{0}, \end{array} $$
(3)

where Fk := ∇f(xk), βk and 𝜃k are scalar parameters, and yk− 1 := FkFk− 1. Different choices for the parameters βk and 𝜃k correspond to different three-term conjugate gradient methods. Clearly, the three-term conjugate gradient method reduces to the classical conjugate gradient method when 𝜃k ≡ 0 in (3). There are at least four well-known formulas for βk, such as:

$$ \begin{array}{@{}rcl@{}} \beta_{k}^{\text{PRP}}=\frac{F_{k}^{\top}y_{k-1}}{\|F_{k-1}\|^{2}}, ~\beta_{k}^{\text{HS}}=\frac{F_{k}^{\top}y_{k-1}}{d_{k-1}^{\top}y_{k-1}}, ~\beta_{k}^{\text{DY}}=\frac{\|F_{k}\|^{2}}{d_{k-1}^{\top}y_{k-1}}, ~\beta_{k}^{\text{LS}}=\frac{F_{k}^{\top}y_{k-1}}{-d_{k-1}^{\top}F_{k-1}}. \end{array} $$

We refer the readers to the monograph [43] and the comprehensive survey [44] for more details.

Based on the memoryless BFGS method proposed by Nocedal [35] and Shanno [36], in which the search direction can be written as:

$$ \begin{array}{@{}rcl@{}} d_{k}^{\text{LBFGS}}=-F_{k} + \left( \frac{F_{k}^{\top}y_{k-1}}{d_{k-1}^{\top}y_{k-1}} - \frac{\|y_{k-1}\|^{2}F_{k}^{\top}d_{k-1}}{(d_{k-1}^{\top}y_{k-1})^{2}}\right)d_{k-1} + \frac{F_{k}^{\top}d_{k-1}}{d_{k-1}^{\top}y_{k-1}}(y_{k-1} - s_{k-1}), \end{array} $$

where sk− 1 := xkxk− 1, Li [37] proposed a three-term HS conjugate gradient method, in which:

$$ \beta_{k}^{\text{THS}}= \frac{F_{k}^{\top}y_{k-1}}{d_{k-1}^{\top}y_{k-1}}-\frac{\|y_{k-1}\|^{2}F_{k}^{\top}d_{k-1}}{(d_{k-1}^{\top}y_{k-1})^{2}},~ \theta_{k}=t_{k}\frac{F_{k}^{\top}d_{k-1}}{d_{k-1}^{\top}y_{k-1}}, $$
(4)

and \(0\leq t_{k}\leq \bar {t}<1\). In practical computation, the parameter \(\bar {t}\) is set to 0.3 and then tk is calculated by:

$$ t_{k}=\min\left\{\bar{t},~\max\left\{0,~1-\frac{y_{k-1}^{\top}s_{k-1}}{\|y_{k-1}\|^{2}}\right\}\right\}, $$
(5)

where \(1-\frac {y_{k-1}^{\top }s_{k-1}}{\|y_{k-1}\|^{2}}\) is the solution of the univariate minimization problem:

$$ \begin{array}{@{}rcl@{}} \min\limits_{t\in \mathbb{R}}\|(y_{k-1}-s_{k-1})-t\cdot y_{k-1}\|^{2}. \end{array} $$
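Indeed, this quadratic function of t attains its minimum where its derivative vanishes, which (provided that \(y_{k-1}\neq 0\)) recovers the value used in (5):

$$ \begin{array}{@{}rcl@{}} \frac{\mathrm{d}}{\mathrm{d}t}\|(y_{k-1}-s_{k-1})-t\cdot y_{k-1}\|^{2}=-2y_{k-1}^{\top}\left( (y_{k-1}-s_{k-1})-t\cdot y_{k-1}\right)=0 ~\Longrightarrow~ t=\frac{y_{k-1}^{\top}(y_{k-1}-s_{k-1})}{\|y_{k-1}\|^{2}}=1-\frac{y_{k-1}^{\top}s_{k-1}}{\|y_{k-1}\|^{2}}. \end{array} $$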

It is not hard to see that the search direction defined in (3)–(5) is close to that of the memoryless BFGS method in the sense of the above univariate minimization problem. Subsequently, based on the fact that \(d_{k-1}^{\top }y_{k-1}=\|F_{k-1}\|^{2}\) holds when the objective function f is a strongly convex quadratic and the exact line search is used, Li [38] proposed a three-term PRP conjugate gradient method with the search direction of the form (3) generated by replacing \(d_{k-1}^{\top }y_{k-1}\) with \(\|F_{k-1}\|^{2}\) in (4), i.e.:

$$ \begin{array}{@{}rcl@{}} \beta_{k}^{\text{TPRP}}= \frac{F_{k}^{\top}y_{k-1}}{\|F_{k-1}\|^{2}}-\frac{\|y_{k-1}\|^{2}F_{k}^{\top}d_{k-1}}{\|F_{k-1}\|^{4}},~ \hat{\theta}_{k}=t_{k}\frac{F_{k}^{\top}d_{k-1}}{\|F_{k-1}\|^{2}}, \end{array} $$
(6)

and \(0\leq t_{k}\leq \bar {t}<1\). Recently, based on the comprehensive analysis of [37, 38], Li [39] proposed a family of three-term conjugate gradient methods for solving unconstrained optimization problems. Interestingly, a remarkable feature of these methods is that the search directions defined by [37,38,39] always satisfy the sufficient descent property:

$$ \begin{array}{@{}rcl@{}} F_{k}^{\top}d_{k}\leq -\left( 1-\frac{(1+\bar{t})^{2}}{4}\right)\|F_{k}\|^{2}. \end{array} $$

On the other hand, based on the ideas of hybrid conjugate gradient method in [33], Sun and Liu [34] developed a hybrid conjugate gradient method and then extended it to solve problem (1). In [34], the search direction is determined by the following scheme:

$$ \begin{array}{@{}rcl@{}} d_{k}=-F_{k}+\beta_{k}^{\text{HCG}}d_{k-1},~k\geq 1,~d_{0}=-F_{0}, \end{array} $$

where:

$$ \begin{array}{@{}rcl@{}} \beta_{k}^{\text{HCG}}=\frac{\|F_{k}\|^{2}-\max\left\{0,~\frac{\|F_{k}\|}{\|F_{k-1}\|}F_{k}^{\top}F_{k-1}\right\}} {\max\{\kappa\|d_{k-1}\|\|F_{k}\|,~d_{k-1}^{\top}y_{k-1}\}}, \end{array} $$

with κ > 1 being a constant. It is easy to show that the search direction defined in the above two relations satisfies the sufficient descent property and the trust region property independent of line search technique; see Lemma 3.1 and Lemma 3.2 of [34], respectively. Notice that the parameter \(\beta _{k}^{\text {HCG}}\) can be viewed as a hybrid of \(\beta _{k}^{\text {HS}}\) and \(\beta _{k}^{\text {DY}}\).

Inspired by the three-term conjugate gradient methods and the hybrid conjugate gradient projection method introduced above, we propose a hybrid three-term conjugate gradient method by incorporating the projection technique to solve nonlinear monotone equations with convex constraints. The search direction dk for the proposed method is given by:

$$ \begin{array}{@{}rcl@{}} d_{k}=-F_{k}+\beta_{k}^{\text{HTP}}d_{k-1}+\tilde{\theta}_{k}y_{k-1},~k\geq 1,~d_{0}=-F_{0}, \end{array} $$
(7)

where:

$$ \begin{array}{@{}rcl@{}} w_{k}&=&\max\{\mu\|d_{k-1}\|\|y_{k-1}\|,d_{k-1}^{\top}y_{k-1},\|F_{k-1}\|^{2}\},\\ \beta_{k}^{\text{HTP}}&=&\frac{F_{k}^{\top}y_{k-1}}{w_{k}}-\frac{\|y_{k-1}\|^{2}F_{k}^{\top}d_{k-1}}{{w_{k}^{2}}},\\ \tilde{\theta}_{k}&=&t_{k}\frac{F_{k}^{\top}d_{k-1}}{w_{k}}, \end{array} $$
(8)

with \(0\leq t_{k}\leq \bar {t}<1\) and μ > 0. Clearly, due to the definition of wk in (8), the parameter \(\beta _{k}^{\text {HTP}}\) is a hybrid of \(\beta _{k}^{\text {THS}}\) in (4) and \(\beta _{k}^{\text {TPRP}}\) in (6). Similarly, the parameter \(\tilde {\theta }_{k}\) is a hybrid of 𝜃k in (4) and \(\hat {\theta }_{k}\) in (6). Thus, the search direction defined in (7) is close to that of the memoryless BFGS method when \(t_{k}=1-\frac {y_{k-1}^{\top }s_{k-1}}{\|y_{k-1}\|^{2}}\). Notice that the first term inside the maximum in the first relation of (8) is chosen so that the search direction possesses a trust region property; see Lemma 2 below. Moreover, wk > 0 due to the presence of \(\|F_{k-1}\|^{2}\) in (8) and the fact that \(F_{k-1}\neq 0\) before the algorithm terminates. Therefore, the parameters \(\beta _{k}^{\text {HTP}}\) and \(\tilde {\theta }_{k}\) in (8) are both well defined.
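For concreteness, the direction update (7)–(8), with tk computed as in (5), can be sketched in MATLAB as follows; the function and variable names are illustrative, s_prev stands for \(s_{k-1}=x_{k}-x_{k-1}\), and the case \(y_{k-1}=0\) is handled as in the proof of Lemma 2 below:

function d = direction_httcgp(Fk, Fk_prev, d_prev, s_prev, mu, tbar)
% Search direction (7)-(8) with t_k from (5); the caller uses d_0 = -F_0.
y = Fk - Fk_prev;                                      % y_{k-1} = F_k - F_{k-1}
if norm(y) == 0, d = -Fk; return; end                  % then (7) reduces to d_k = -F_k
w = max([mu*norm(d_prev)*norm(y), d_prev'*y, norm(Fk_prev)^2]);
beta  = (Fk'*y)/w - norm(y)^2*(Fk'*d_prev)/w^2;        % beta_k^{HTP}
tk    = min(tbar, max(0, 1 - (y'*s_prev)/norm(y)^2));  % t_k as in (5)
theta = tk*(Fk'*d_prev)/w;                             % tilde{theta}_k
d = -Fk + beta*d_prev + theta*y;                       % d_k in (7)
end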

We now give the steps of our algorithm which we call the hybrid three-term conjugate gradient projection (HTTCGP) method.

Algorithm 1 (HTTCGP). In brief: given \(x_{0}\in \mathcal {C}\) and the parameters, the method computes dk by (7)–(8) (with d0 = −F0), stops if dk = 0, determines the steplength αk by the adaptive line search (9), sets zk = xk + αkdk, and updates xk+1 by the projection step (10) with ξk given in (11).

Remark 1

(i) The adaptive line search (9) was proposed in our recent work [45]. Here, P[λ, ν][⋅] denotes the projection of a scalar onto the interval [λ, ν], i.e., \(P_{[\lambda ,\nu ]}[u]=\min \{\nu ,\max \{\lambda ,u\}\}\), so that λ ≤ P[λ, ν][u] ≤ ν for any \(u\in \mathbb {R}\). The parameter ν is introduced to prevent the right-hand side of (9) from becoming too large, especially when xk is far from the solution set of (1) and ∥F(zk)∥ is large; this can effectively reduce the computational cost of the line search. The parameter λ is chosen to guard against the worst possible outcome, i.e., F(zk) = 0 and \(z_{k}\notin \mathcal {C}\). Indeed, from step 2 of Algorithm 1, we have ∥dk∥ > 0. Combining this with the fact that \(P_{[\lambda ,\nu ]}[\|F(x_{k}+\varsigma \rho ^{i}d_{k})\|]\geq \lambda >0\), we know that the right-hand side of (9) is positive for any finite nonnegative integer i. Thus, we see immediately from Lemma 3 below that F(zk)≠ 0 holds.

(ii) The adaptive line search (9) is a derivative-free line search and contains two well-known line search procedures from [22, 46], which have been used in many works. In fact, if λ = ν = 1, then P[λ, ν][∥F(zk)∥] ≡ 1, and thus (9) reduces to the line search used in [22]. If λ = 0 and ν is sufficiently large, say ν ≥∥F0∥, then P[λ, ν][∥F(zk)∥] ≡∥F(zk)∥, and (9) is equivalent to the line search used in [46]. Notice that when λ = 0 and ν = 1, (9) is equivalent to the line search recently proposed by Sun and Tian in [47]. Quite recently, Awwal et al. [24] proposed another adaptive line search, namely \(\alpha _{k}=\varsigma \rho ^{i_{k}}\) with ik being the smallest nonnegative integer i such that:

$$ -F(x_{k}+\alpha_{k}d_{k})^{\top}d_{k}\geq \sigma\alpha_{k}\|F(x_{k}+\alpha_{k}d_{k})\|^{1/c}\cdot\|d_{k}\|^{2},~c\geq 1. $$
(12)

Clearly, if c = 1, (12) can be viewed as a special case of (9), provided that λ = 0 and ν is sufficiently large. On the other hand, if c is sufficiently large, then (12) behaves like the adaptive line search (9) with λ = ν = 1.

(iii) Notice that γ ∈ (0,2) in (10) is a relaxation factor; a suitable value of γ may speed up the convergence of projection and contraction methods, as stated in [48]. In practical computation, one can take a larger relaxation factor γ ∈ [1,2).
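To make the remaining steps of Algorithm 1 concrete, the following MATLAB sketch carries out the adaptive line search (9) and the projection step (10)–(11), written consistently with how these relations are used in Remark 1 and in the convergence analysis of Section 3; F and projC are function handles for the mapping and for \(P_{\mathcal {C}}\), P[λ, ν] is evaluated as min{ν, max{λ, ⋅}}, and all names are illustrative:

% Adaptive line search (9): alpha_k = varsigma*rho^i for the smallest i >= 0
alpha = varsigma;
while true
    z  = xk + alpha*dk;                                 % trial point z_k
    Fz = F(z);
    rhs = sigma*alpha*min(nu, max(lambda, norm(Fz)))*norm(dk)^2;
    if -Fz'*dk >= rhs, break; end                       % (9) holds
    alpha = rho*alpha;                                  % backtrack
end
% Projection step (10)-(11) with relaxation factor gamma in (0,2)
xi = (Fz'*(xk - z))/norm(Fz)^2;                         % xi_k in (11)
xk_next = projC(xk - gamma*xi*Fz);                      % x_{k+1} in (10)

In the experiments of Section 4, a floor of \(10^{-10}\) on the trial steplength is additionally imposed as a safeguard.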

3 Convergence property

In this section, we analyze the global convergence of the proposed method. To this end, we need the following standard assumptions used in most DFPMs.

(A1) The solution set of problem (1), denoted by \(\mathcal {S}\), is nonempty.

(A2) The mapping F is Lipschitz continuous on \(\mathbb {R}^{n}\), i.e., there is a constant L > 0 such that:

$$ \begin{array}{@{}rcl@{}} \|F(x)-F(y)\|\leq L\|x-y\|,~\forall~x,~y\in \mathbb{R}^{n}. \end{array} $$
(13)

We now start our convergence analysis by giving a well-known nonexpansive property of the projection operator [49].

Lemma 1

Let \(\mathcal {C}\) be a nonempty closed convex subset of \(\mathbb {R}^{n}\). Then:

$$ \begin{array}{@{}rcl@{}} \|P_{\mathcal{C}}[x]-P_{\mathcal{C}}[y]\|\leq \|x-y\|,~\forall~x,~y\in \mathbb{R}^{n}. \end{array} $$

The next lemma shows that the search direction defined in (7) has some nice properties independent of line search technique.

Lemma 2

For all k ≥ 0, the search direction sequence {dk} generated by the HTTCGP always satisfies the sufficient descent property:

$$ \begin{array}{@{}rcl@{}} F_{k}^{\top}d_{k}\leq -\left( 1-\frac{(1+\bar{t})^{2}}{4}\right)\|F_{k}\|^{2} \end{array} $$
(14)

and the trust region property:

$$ \begin{array}{@{}rcl@{}} \left( 1-\frac{(1+\bar{t})^{2}}{4}\right)\|F_{k}\|\leq\|d_{k}\|\leq \left( 1+\frac{1+\bar{t}}{\mu}+\frac{1}{\mu^{2}}\right)\|F_{k}\| \end{array} $$
(15)

with \(0\leq t_{k}\leq \bar {t}<1\) and μ > 0.

Proof

Since d0 = −F0, we have \(F_{0}^{\top }d_{0}=-\|F_{0}\|^{2}\), which satisfies (14) for k = 0. From the definition of dk in (7), we conclude that for all k ≥ 1,

$$ \begin{array}{@{}rcl@{}} F_{k}^{\top}d_{k}&=& -\|F_{k}\|^{2}+\beta_{k}^{\text{HTP}}F_{k}^{\top}d_{k-1}+\tilde{\theta}_{k}F_{k}^{\top}y_{k-1}\\ &=& -\|F_{k}\|^{2}+\left( \frac{F_{k}^{\top}y_{k-1}}{w_{k}}-\frac{\|y_{k-1}\|^{2}F_{k}^{\top}d_{k-1}}{{w_{k}^{2}}}\right)F_{k}^{\top}d_{k-1}+t_{k}\frac{F_{k}^{\top}d_{k-1}}{w_{k}}F_{k}^{\top}y_{k-1}\\ &=& -\|F_{k}\|^{2}+(1+t_{k})\frac{F_{k}^{\top}y_{k-1}F_{k}^{\top}d_{k-1}}{w_{k}}-\frac{\|y_{k-1}\|^{2}(F_{k}^{\top}d_{k-1})^{2}}{{w_{k}^{2}}}\\ &=& -\|F_{k}\|^{2}+2\left( \frac{1+t_{k}}{2}F_{k}^{\top}\right)\frac{y_{k-1}F_{k}^{\top}d_{k-1}}{w_{k}}-\frac{\|y_{k-1}\|^{2}(F_{k}^{\top}d_{k-1})^{2}}{{w_{k}^{2}}}\\ &\leq& -\|F_{k}\|^{2}+\frac{(1+t_{k})^{2}}{4}\|F_{k}\|^{2}+\frac{\|y_{k-1}\|^{2}(F_{k}^{\top}d_{k-1})^{2}}{{w_{k}^{2}}}-\frac{\|y_{k-1}\|^{2}(F_{k}^{\top}d_{k-1})^{2}}{{w_{k}^{2}}}\\ &=& -\|F_{k}\|^{2}+\frac{(1+t_{k})^{2}}{4}\|F_{k}\|^{2}\leq-\left( 1-\frac{(1+\bar{t})^{2}}{4}\right)\|F_{k}\|^{2}, \end{array} $$

which implies that (14) holds.

Again, from d0 = −F0, we obtain ∥d0∥ = ∥F0∥, which satisfies (15) for k = 0. We next prove that the relation (15) holds for all k ≥ 1. Note from step 2 of Algorithm 1 that dk≠ 0 for all k. If yk− 1 = 0, then the desired result immediately follows from the fact that dk = −Fk due to the definition of dk in (7). Therefore, for the remainder of the proof, we assume that yk− 1≠ 0 for all k ≥ 1. From (14) and the Cauchy-Schwarz inequality, we know that the first inequality in (15) holds. To proceed, we note from (8) that:

$$ \begin{array}{@{}rcl@{}} |\beta_{k}^{\text{HTP}}|&=&\left|\frac{F_{k}^{\top}y_{k-1}}{w_{k}}-\frac{\|y_{k-1}\|^{2}F_{k}^{\top}d_{k-1}}{{w_{k}^{2}}}\right| \\ &\leq& \frac{\|F_{k}\|\|y_{k-1}\|}{w_{k}}+\frac{\|y_{k-1}\|^{2}\|F_{k}\|\|d_{k-1}\|}{{w_{k}^{2}}} \\ &\leq& \frac{\|F_{k}\|\|y_{k-1}\|}{\mu\|d_{k-1}\|\|y_{k-1}\|}+\frac{\|y_{k-1}\|^{2}\|F_{k}\|\|d_{k-1}\|}{(\mu\|d_{k-1}\|\|y_{k-1}\|)^{2}} \\ &=&\left( \frac{1}{\mu}+\frac{1}{\mu^{2}}\right)\frac{\|F_{k}\|}{\|d_{k-1}\|} \end{array} $$

and

$$ \begin{array}{@{}rcl@{}} |\tilde{\theta}_{k}|&=&t_{k}\left|\frac{F_{k}^{\top}d_{k-1}}{w_{k}}\right| \\ &\leq& \bar{t}\frac{\|F_{k}\|\|d_{k-1}\|}{w_{k}} \end{array} $$
$$ \begin{array}{@{}rcl@{}} &\leq& \frac{\bar{t}\|F_{k}\|\|d_{k-1}\|}{\mu\|d_{k-1}\|\|y_{k-1}\|} \\ &=& \frac{\bar{t}\|F_{k}\|}{\mu\|y_{k-1}\|}. \end{array} $$

Combining the above two relations with the definition of dk in (7) yields:

$$ \begin{array}{@{}rcl@{}} \|d_{k}\|&=&\|-F_{k}+\beta_{k}^{\text{HTP}}d_{k-1}+\tilde{\theta}_{k}y_{k-1}\| \\ &\leq& \|F_{k}\|+|\beta_{k}^{\text{HTP}}|\|d_{k-1}\|+|\tilde{\theta}_{k}|\|y_{k-1}\|\\ &\leq& \left( 1+\frac{1+\bar{t}}{\mu}+\frac{1}{\mu^{2}}\right)\|F_{k}\|. \end{array} $$

This completes the proof. □

Remark 2

The first inequality in (15) implies that we find a solution of problem (1) when dk = 0. This is the reason why we choose dk = 0 as the stopping criterion in step 2 of Algorithm 1.

The following lemma not only shows that the adaptive line search (9) is well-defined but also provides a uniform lower bound of the sequence {αk}.

Lemma 3

(i) Let {xk} and {dk} be the sequences generated by the HTTCGP. If the mapping F is continuous on \(\mathbb {R}^{n}\), then for all k ≥ 0, there exists a nonnegative integer ik satisfying (9).

(ii) Suppose that assumption (A2) holds, then the steplength αk yielded by (9) satisfies:

$$ \begin{array}{@{}rcl@{}} \alpha_{k}\geq \alpha:=\min \left\{\varsigma,~\frac{\rho\mu^{4}[4-(1+\bar{t})^{2}]}{4(L+\sigma\nu)[\mu^{2}+\mu(1+\bar{t})+1]^{2}}\right\}. \end{array} $$
(16)

Proof

(i) We proceed by contradiction and assume that there exists an integer k0 ≥ 0 such that the adaptive line search (9) does not hold for any nonnegative integer i. Hence, for all i ≥ 0, we have:

$$ \begin{array}{@{}rcl@{}} -F(x_{k_{0}}+\varsigma\rho^{i}d_{k_{0}})^{\top}d_{k_{0}}&<& \sigma\varsigma\rho^{i}P_{[\lambda,\nu]}[\|F(x_{k_{0}}+\varsigma\rho^{i}d_{k_{0}})\|]\cdot\|d_{k_{0}}\|^{2}\\ &\leq& \sigma\nu\varsigma\rho^{i}\|d_{k_{0}}\|^{2}, \end{array} $$

where the second inequality follows from the fact that \(P_{\left [\lambda ,\nu \right ]}[\cdot ]\leq \nu \). Taking the limit in the above inequality, using the continuity of the mapping F and rearranging terms in the resulting relation, we obtain:

$$ \begin{array}{@{}rcl@{}} F(x_{k_{0}})^{\top}d_{k_{0}}\geq 0, \end{array} $$

which contradicts (14) since \(F(x_{k_{0}})^{\top }d_{k_{0}}\leq -\left (1-\frac {(1+\bar {t})^{2}}{4}\right )\|F(x_{k_{0}})\|^{2}<0\). Thus, the desired result holds.

(ii) We now prove part (ii). Clearly, if αk = ς, then (16) holds. Otherwise, \(\widehat {\alpha }_{k}:=\rho ^{-1}\alpha _{k}\) does not satisfy the adaptive line search (9), namely:

$$ \begin{array}{@{}rcl@{}} -F(\widehat{z}_{k})^{\mathrm{T}}d_{k}<\sigma\widehat{\alpha}_{k}P_{[\lambda,\nu]}[\|F(\widehat{z}_{k})\|]\|d_{k}\|^{2}\leq \sigma\nu\widehat{\alpha}_{k}\|d_{k}\|^{2}, \end{array} $$

where \(\widehat {z}_{k}=x_{k}+\widehat {\alpha }_{k}d_{k}\). This together with (13) and (14) yields:

$$ \begin{array}{@{}rcl@{}} \left( 1-\frac{(1+\bar{t})^{2}}{4}\right)\|F_{k}\|^{2}&\leq& -F_{k}^{\mathrm{T}}d_{k}=(F(\widehat{z}_{k})-F_{k})^{\mathrm{T}}d_{k}-F(\widehat{z}_{k})^{\mathrm{T}}d_{k} \\ &<& \widehat{\alpha}_{k}(L+\sigma\nu)\|d_{k}\|^{2}=\rho^{-1}\alpha_{k}(L+\sigma\nu)\|d_{k}\|^{2}. \end{array} $$

Therefore:

$$ \begin{array}{@{}rcl@{}} \alpha_{k}&>&\frac{\rho[4-(1+\bar{t})^{2}]}{4(L+\sigma\nu)}\frac{\|F_{k}\|^{2}}{\|d_{k}\|^{2}}\\ &\geq& \frac{\rho[4-(1+\bar{t})^{2}]}{4(L+\sigma\nu)} \frac{\|F_{k}\|^{2}}{\left( 1+\frac{1+\bar{t}}{\mu}+\frac{1}{\mu^{2}}\right)^{2}\|F_{k}\|^{2}}\\ &=&\frac{\rho\mu^{4}[4-(1+\bar{t})^{2}]}{4(L+\sigma\nu)[\mu^{2}+\mu(1+\bar{t})+1]^{2}}, \end{array} $$

where the second inequality follows from the second inequality in (15). This yields the desired inequality (16). □

The following result is adapted from [20, Lemma 2.1] to show that for any \(x^{\ast }\in \mathcal {S}\), the whole sequence {∥xkx∥} generated by the HTTCGP is convergent.

Lemma 4

Suppose that the assumptions (A1)-(A2) hold. Let {xk} and {zk} be the sequences generated by the HTTCGP. Then for any \(x^{\ast }\in \mathcal {S}\) the sequence {∥xkx∥} is convergent, therefore the sequence {xk} is bounded. Furthermore, it holds that:

$$ \begin{array}{@{}rcl@{}} \lim\limits_{k\rightarrow \infty}\alpha_{k}\|d_{k}\|=0. \end{array} $$
(17)

Proof

We start our proof by noticing from the definition of zk and (9) that:

$$ \begin{array}{lcl} F(z_{k})^{\mathrm{T}}(x_{k}-z_{k})&\geq& {\sigma\alpha_{k}^{2}}P_{[\lambda,\nu]}[\|F(z_{k})\|]\|d_{k}\|^{2}\\ &\geq& {\sigma\lambda\alpha_{k}^{2}}\|d_{k}\|^{2}=\sigma\lambda \|x_{k}-z_{k}\|^{2}>0. \end{array} $$
(18)

From (2) and \(x^{\ast }\in \mathcal {S}\), we obtain that:

$$ \begin{array}{lcl} F(z_{k})^{\mathrm{T}}(x_{k}-x^{\ast})&=& F(z_{k})^{\mathrm{T}}(x_{k}-z_{k})+F(z_{k})^{\mathrm{T}}(z_{k}-x^{\ast})\\ &\geq& F(z_{k})^{\mathrm{T}}(x_{k}-z_{k})+F(x^{\ast})^{\mathrm{T}}(z_{k}-x^{\ast})\\ &=& F(z_{k})^{\mathrm{T}}(x_{k}-z_{k}). \end{array} $$

Combining the above two relations with (10), Lemma 1 and (11), we conclude that:

$$ \begin{array}{@{}rcl@{}} \|x_{k+1}-x^{\ast}\|^{2}&=& \|P_{\mathcal{C}}[x_{k}-\gamma\cdot\xi_{k}F(z_{k})]-P_{\mathcal{C}}[x^{\ast}]\|^{2} \\ &\leq& \|x_{k}-\gamma\cdot\xi_{k}F(z_{k})-x^{\ast}\|^{2} \\ &=& \|x_{k}-x^{\ast}\|^{2}-2\gamma\cdot\xi_{k}F(z_{k})^{\mathrm{T}}(x_{k}-x^{\ast})+\gamma^{2}{\xi_{k}^{2}}\|F(z_{k})\|^{2} \\ &\leq& \|x_{k}-x^{\ast}\|^{2}-2\gamma\cdot\xi_{k}F(z_{k})^{\mathrm{T}}(x_{k}-z_{k})+\gamma^{2}{\xi_{k}^{2}}\|F(z_{k})\|^{2} \end{array} $$
$$ \begin{array}{@{}rcl@{}} &=& \|x_{k}-x^{\ast}\|^{2}-\gamma(2-\gamma)\frac{\left( F(z_{k})^{\mathrm{T}}(x_{k}-z_{k})\right)^{2}}{\|F(z_{k})\|^{2}} \\ &\leq& \|x_{k}-x^{\ast}\|^{2}-\gamma(2-\gamma)\frac{\sigma^{2}\lambda^{2}\|x_{k}-z_{k}\|^{4}}{\|F(z_{k})\|^{2}}, \end{array} $$
(19)

which implies 0 ≤∥xk+ 1x∥≤∥xkx∥ since 0 < γ < 2 from the choice of γ in the HTTCGP. That is, the sequence {∥xkx∥} is non-increasing and bounded below, and therefore is convergent. So, the boundedness of {xk} is at hand.

We next derive an upper bound of ∥F(zk)∥. To proceed, notice from the boundedness of {xk} and the continuity of F that there exists a constant M > 0 such that:

$$ \begin{array}{@{}rcl@{}} \|x_{k}\|\leq M,\ \|F_{k}\|\leq M,~\forall~k\geq 0. \end{array} $$

Hence, we see from the Cauchy-Schwarz inequality, (2) and (18) that:

$$ \begin{array}{@{}rcl@{}} M\geq \|F_{k}\|\geq \frac{F_{k}^{\mathrm{T}}(x_{k}-z_{k})}{\|x_{k}-z_{k}\|}\geq \frac{F(z_{k})^{\mathrm{T}}(x_{k}-z_{k})}{\|x_{k}-z_{k}\|}\geq \sigma\lambda\|x_{k}-z_{k}\|\geq\sigma\lambda\|z_{k}\|-\sigma\lambda M, \end{array} $$

which shows that {zk} is bounded. Therefore, it again follows from the continuity of F that there exists a constant \(\widehat {M}>0\) such that:

$$ \begin{array}{@{}rcl@{}} \|F(z_{k})\|\leq \widehat{M},~\forall~k\geq 0. \end{array} $$

This together with (19) implies:

$$ \begin{array}{@{}rcl@{}} \gamma(2-\gamma)\frac{\sigma^{2}\lambda^{2}}{\widehat{M}^{2}}\sum\limits_{k=0}^{\infty}\|x_{k}-z_{k}\|^{4}&\leq & \sum\limits_{k=0}^{\infty}\left( \|x_{k}-x^{\ast}\|^{2}-\|x_{k+1}-x^{\ast}\|^{2}\right)\\ &=&\|x_{0}-x^{\ast}\|^{2}- \lim\limits_{k\rightarrow\infty}\|x_{k+1}-x^{\ast}\|^{2}<\infty, \end{array} $$

which shows \(0=\lim \limits _{k\rightarrow \infty }\|x_{k}-z_{k}\|=\lim \limits _{k\rightarrow \infty }\alpha _{k}\|d_{k}\|\) since zk = xk + αkdk and 0 < γ < 2 from the HTTCGP. This completes the proof. □

Remark 3

Notice that the proof of the above lemma relies heavily on the projection technique and the line search technique, and is independent of the specific structure of the search direction. It should be mentioned that although the whole sequence {∥xk − x∗∥} is convergent by the above lemma, this does not guarantee that {xk} converges to some \(x^{\ast }\in \mathcal {S}\); indeed, {xk} is not even necessarily convergent.

We now combine these results to prove the global convergence of the HTTCGP under some mild conditions.

Theorem 1

Suppose that the assumptions (A1)-(A2) hold. Then the whole sequence {xk} generated by the HTTCGP converges to a solution of problem (1).

Proof

We start our proof by noticing from (16) and (17) that \(0\leq \alpha \|d_{k}\|\leq \alpha _{k}\|d_{k}\|\rightarrow 0\). This shows \(\lim _{k\rightarrow \infty }\|d_{k}\|=0\). Combining this with (15), we have:

$$ \begin{array}{@{}rcl@{}} 0\leq\left( 1-\frac{(1+\bar{t})^{2}}{4}\right)\|F_{k}\|\leq\|d_{k}\|\rightarrow 0, \end{array} $$

which implies \(\lim _{k\rightarrow \infty }\|F_{k}\|=0\). Since \(x_{0}\in \mathcal {C}\) and the update (10) keeps every iterate in \(\mathcal {C}\), the sequence \(\{x_{k}\}\subset \mathcal {C}\); moreover, it is bounded by Lemma 4, so there exists at least one cluster point of {xk}. Suppose that \(\bar {x}\) is such a cluster point and let \(\mathcal {K}\subset \{0,1,2,\dots \}\) be an infinite index set such that:

$$ \begin{array}{@{}rcl@{}} \lim\limits_{k\rightarrow \infty, k\in \mathcal{K}}x_{k}=\bar{x}\in \mathcal{C}, \end{array} $$

where the inclusion follows from the closedness of \(\mathcal {C}\). Thus, it follows from these and the continuity of F that:

$$ \begin{array}{@{}rcl@{}} 0=\lim\limits_{k\rightarrow \infty}\|F_{k}\|=\lim\limits_{k\rightarrow \infty, k\in\mathcal{K}}\|F_{k}\|=\|F(\bar{x})\|, \end{array} $$

which implies that \(\bar {x}\) is a solution of problem (1), i.e., \(\bar {x}\in \mathcal {S}\). Therefore, setting \(x^{\ast }:=\bar {x}\) in Lemma 4, we know that the sequence \(\{\|x_{k}-\bar {x}\|\}\) is convergent and then:

$$ \begin{array}{@{}rcl@{}} \lim\limits_{k\rightarrow \infty}\|x_{k}-\bar x\|=\lim_{k\rightarrow \infty, k\in\mathcal{K}}\|x_{k}-\bar{x}\|=0. \end{array} $$

Consequently, we conclude that the whole sequence {xk} converges to \(\bar {x}\in \mathcal {S}\). This completes the proof. □

4 Numerical experiments

In this section, we perform numerical experiments to illustrate the effectiveness and robustness of the proposed method (HTTCGP), in comparison with two existing algorithms, namely Algorithm 2.1 in [30] (denoted by ATTCGP) and Algorithm 3.1 in [34] (denoted by HCGP), in terms of the total number of iterations (Itr), the number of function evaluations (NF), and the CPU time in seconds (Tcpu) in the same computing environment. All codes are written in MATLAB R2007b and run on a 64-bit Dell PC with an Intel(R) Core(TM) i7-9700 CPU (3.00 GHz), 16.00 GB of RAM, and the Windows 10.0 operating system.

The parameters in the HTTCGP are chosen as follows: \(\rho =0.5,\ \varsigma =1,\ \sigma =0.01,\ \lambda =0.001,\ \nu =0.8,\ \mu =0.2,\ \gamma =1.6,\ \bar {t}=0.3\) with tk being calculated by:

$$ \begin{array}{@{}rcl@{}} t_{k}=\min\left\{\bar{t},~\max\left\{0,~1-\frac{y_{k-1}^{\top}s_{k-1}}{\|y_{k-1}\|^{2}}\right\}\right\}. \end{array} $$

The parameters of ATTCGP are taken from Section 5 of [30], i.e., \(\rho =0.4,\ \varsigma =1,\ \sigma =0.0001,\ r=0.1,\ \mu _{1}=0.01,\ \mu _{2}=0.1,\ t_{k}={t_{k}^{b}}\). The parameters of HCGP are taken from Section 4 of [34], i.e., ρ = 0.5, ς = 1, σ = 0.0001, μ = 1.4, γ = 1.5. For each algorithm and each test problem, we stop the program when one of the following three conditions is satisfied:

$$ \begin{array}{@{}rcl@{}} (\text{ i})\ \ \| F_{k}\|\leq 10^{-6},\ (\text{ii})\ \ \| d_{k}\|\leq 10^{-7},\ {\text{or~(iii)}}\ \text{Itr}>2000. \end{array} $$

Although Lemma 3 shows that the adaptive line search (9) terminates finitely, we impose a restriction on the steplength computation in practice: if, for some integer i > 0, the trial steplength satisfies \(\varsigma \rho ^{i}\leq 10^{-10}\) but still does not satisfy (9), then we set \(\alpha _{k}=\varsigma \rho ^{i}\) and terminate the line search. We now give the following popular test problems, in which the mapping F is taken as:

$$ \begin{array}{@{}rcl@{}} F(x)=\left( f_{1}(x),f_{2}(x),\dots,f_{n}(x)\right)^{\mathrm{T}}. \end{array} $$

Problem 4.1

[50] Set \(f_{i}(x)=e^{x_{i}}-1\), for \(i=1,2,\dots ,n\) and \(\mathcal {C}=\mathbb {R}_{+}^{n}\). Clearly, this problem has a unique solution \(x^{\ast }=(0,0,\dots ,0)^{\mathrm {T}}\).
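As an illustration, this problem can be passed to the sketches in Section 2 through two function handles (names illustrative):

F = @(x) exp(x) - 1;        % f_i(x) = e^{x_i} - 1, applied componentwise
projC = @(x) max(x, 0);     % projection onto C = R_+^n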

Problem 4.2

[7] Set

$$ \begin{array}{@{}rcl@{}} &&f_{1}(x)=x_{1}-e^{\cos(\frac{x_{1}+x_{2}}{n+1})}, \\ &&f_{i}(x)=x_{i}-e^{\cos(\frac{x_{i-1}+x_{i}+x_{i+1}}{n+1})}, \ \text{for}\ i=2,3,\dots,n-1,\\ &&f_{n}(x)=x_{n}-e^{\cos(\frac{x_{n-1}+x_{n}}{n+1})}, \end{array} $$

and \(\mathcal {C}=\mathbb {R}_{+}^{n}\).

Problem 4.3

[5] Set \(f_{i}(x)=x_{i}-\sin \limits |x_{i}-1|\), for \(i=1,2,\dots ,n\) and \(\mathcal {C}=\{x\in \mathbb {R}^{n}|{\sum }_{i=1}^{n}x_{i}\leq n,\ x_{i}\geq -1,\ i=1,2,\dots ,n\}\). Clearly, Problem 4.3 is nonsmooth at \(x=(1,\dots ,1)^{\mathrm {T}}\).
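Here \(\mathcal {C}\) is a polyhedron rather than the nonnegative orthant, so the projection in the algorithm has no closed form; as mentioned with the experimental setup below, quadprog.m is used for this step. A minimal sketch of this projection (helper name illustrative):

function px = projC_43(y)
% Projection onto C = {x : sum(x) <= n, x_i >= -1}, via the QP
%   min 0.5*||x - y||^2  s.t.  ones(1,n)*x <= n,  x >= -1
n  = numel(y);
px = quadprog(speye(n), -y, ones(1,n), n, [], [], -ones(n,1), []);
end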

Problem 4.4

[28] Set

$$ \begin{array}{@{}rcl@{}} &&f_{1}(x)=2x_{1}+\sin x_{1}-1, \\ &&f_{i}(x)=2x_{i-1}+2x_{i}+\sin x_{i}-1, \ \text{for}\ i=2,3,\dots,n-1,\\ &&f_{n}(x)=2x_{n}+\sin x_{n}-1, \end{array} $$

and \(\mathcal {C}=\mathbb {R}_{+}^{n}\).

Problem 4.5

[28] Set

$$ \begin{array}{@{}rcl@{}} &&f_{1}(x)=x_{1}-e^{\cos(\frac{x_{1}+x_{2}}{2})}, \\ &&f_{i}(x)=x_{i}-e^{\cos(\frac{x_{i-1}+x_{i}+x_{i+1}}{i})}, \ \text{for}\ i=2,3,\dots,n-1,\\ &&f_{n}(x)=x_{n}-e^{\cos(\frac{x_{n-1}+x_{n}}{n})}, \end{array} $$

and \(\mathcal {C}=\mathbb {R}_{+}^{n}\).

Problem 4.6

[28] Set \(f_{i}(x)=(e^{x_{i}})^{2}+3{\sin \limits } x_{i} \cdot {\cos \limits } x_{i}-1\), for \(i=1,2,\dots ,n\) and \(\mathcal {C}=\mathbb {R}_{+}^{n}\).

Problem 4.7

This problem is Problem 2 in [51] with additional \(\mathcal {C}=\mathbb {R}_{+}^{n}\), i.e., \(f_{i}(x)=2{x_{i}}-{\sin \limits } x_{i}\), for \(i=1,2,\dots ,n\) and \(\mathcal {C}=\mathbb {R}_{+}^{n}\).

Problem 4.8

This problem is Problem 10 in [52] with additional \(\mathcal {C}=\mathbb {R}_{+}^{n}\), i.e., \(f_{i}(x)=\ln (x_{i}+1)-\frac {x_{i}}{n}\), for \(i=1,2,\dots ,n\) and \(\mathcal {C}=\mathbb {R}_{+}^{n}\).

For Problem 4.3, we use the quadratic program solver quadprog.m from the MATLAB optimization toolbox to compute a new iterate xk+1 in step 4 of HTTCGP. For the above test problems, we use the following different initial points: \(x_{1}=(1,1,\dots ,1)^{\mathrm {T}},\ x_{2}=(\frac {1}{3},\frac {1}{3^{2}},\dots ,\frac {1}{3^{n}})^{\mathrm {T}},\ x_{3}=(\frac {1}{2},\frac {1}{2^{2}},\dots ,\frac {1}{2^{n}})^{\mathrm {T}},\ x_{4}=(0,\frac {1}{n},\dots ,\frac {n-1}{n})^{\mathrm {T}},\ x_{5}=(1,\frac {1}{2},\dots ,\frac {1}{n})^{\mathrm {T}},\ x_{6}=(\frac {1}{n},\frac {2}{n},\dots ,1)^{\mathrm {T}},\ x_{7}=(1-\frac {1}{n},1-\frac {2}{n},\dots ,0)^{\mathrm {T}}\). The numerical results are listed in Tables 1, 2, 3, 4, 5, 6, 7, and 8, where “Init” means the initial point, “n” is the problem dimension multiplied by 10^4, and “∥F∗∥” stands for the final value of ∥F(xk)∥ when the program is stopped.

Table 1 Numerical results on Problem 4.1
Table 2 Numerical results on Problem 4.2
Table 3 Numerical results on Problem 4.3
Table 4 Numerical results on Problem 4.4
Table 5 Numerical results on Problem 4.5
Table 6 Numerical results on Problem 4.6
Table 7 Numerical results on Problem 4.7
Table 8 Numerical results on Problem 4.8

In order to compare the performance of these methods clearly, we adopt the performance profiles introduced by Dolan and Moré [53] based on Tcpu, NF, and Itr, respectively. From Tables 1–8 and Figs. 1, 2, and 3, we conclude that for these fixed initial points, the proposed method is in general the most effective and robust in terms of Tcpu, NF, and Itr, respectively. The reason may be twofold: one is that the search direction defined by (7) inherits the effectiveness and robustness of the memoryless BFGS method, and the other is that the hybrid technique may further improve the numerical performance of the three-term conjugate gradient methods.

Fig. 1 Performance profiles on Tcpu

Fig. 2 Performance profiles on NF

Fig. 3 Performance profiles on Itr

To further explore the influence of initial points on the algorithm, we test each problem by using initial points randomly generated from the interval (0,1) with a gradually increasing number of dimensions. For each given dimension, we run the same test 20 times to obtain the average CPU time in seconds (\(\overline {\text {Tcpu}}\)), the average number of function evaluations (\(\overline {\text {NF}}\)), the average number of iterations (\(\overline {\text {Itr}}\)), and the average value of ∥F∗∥ (\(\overline {\|F^{\ast }\|}\)). The numerical results are listed in Table 9, where “P” stands for the test problem used and “n” is still the problem dimension multiplied by 10^4. We also plot the performance profiles of \(\overline {\text {Tcpu}}\), \(\overline {\text {NF}}\), and \(\overline {\text {Itr}}\), respectively. It is not hard to see from Table 9 and Figs. 4, 5, and 6 that (i) all the methods solve all the test problems successfully; (ii) our method outperforms the other two methods in about 68%, 72%, and 70% of the experiments in terms of \(\overline {\text {Tcpu}}\), \(\overline {\text {NF}}\), and \(\overline {\text {Itr}}\), respectively. In most of the experiments, our method needs fewer function evaluations to obtain an approximate solution than the other two methods. The reason may be that our method benefits from the adaptive line search technique proposed by Yin et al. [45]. In a word, the proposed method is more efficient and competitive than the other two methods for the given test problems.

Table 9 Numerical results on Problems 4.1–4.8 with randomly generated initial points
Fig. 4 Performance profiles on \(\overline {\text {Tcpu}}\)

Fig. 5 Performance profiles on \(\overline {\text {NF}}\)

Fig. 6 Performance profiles on \(\overline {\text {Itr}}\)

5 Applications in compressed sensing

In this section, we apply our method to solve compressed sensing problems and compare it with two existing solvers: CGD [9] and ATTCGP [30].

5.1 Problem description

In compressed sensing problems, the task is to find/recover sparse solutions/signals of underdetermined linear systems Ax = b, where \(A\in \mathbb {R}^{m\times n}\) (m ≪ n) is a sampling matrix, and \(b\in \mathbb {R}^{m}\) is the observation signal. To this end, a popular approach is to solve the 1-norm regularized optimization problem:

$$ \begin{array}{@{}rcl@{}} \min\limits_{x\in \mathbb{R}^{n}}f(x):=\frac{1}{2}\|Ax-b\|^{2}+\tau\|x\|_{1}, \end{array} $$
(20)

where τ > 0 is a constant balancing data fitting and sparsity. Notice that the objective function f(x) in (20) is convex but nonsmooth. Interestingly, following Figueiredo, Nowak, and Wright [54], one can reformulate (20) as a convex quadratic program. Here we provide the concrete process for completeness. By introducing the auxiliary variables \(u\in \mathbb {R}^{n}\) and \(v\in \mathbb {R}^{n}\), we can decompose x as:

$$ \begin{array}{@{}rcl@{}} x=u-v,~u\geq 0,~v\geq 0, \end{array} $$

where \(u_{i}=\max \limits \{x_{i},0\}\) and \(v_{i}=\max \limits \{-x_{i},0\}\) for all \(i=1,2,\dots ,n\). Then, \(\|x\|_{1}=e_{n}^{\top }u+e_{n}^{\top }v\), where \(e_{n}=(1,1,\dots ,1)^{\top }\in \mathbb {R}^{n}\). Mathematically, the 1-norm regularized optimization problem (20) can be equivalently expressed as:

$$ \begin{array}{@{}rcl@{}} \begin{array}{rl} \min\limits_{u,v} & \frac{1}{2}\|A(u-v)-b\|^{2}+\tau e_{n}^{\top}u+\tau e_{n}^{\top}v,\\ \text{s.t.} & u^{\top}v=0,~u\geq0,~v\geq 0. \end{array} \end{array} $$

However, due to the special structure of the objective function above, as analyzed in [54], the complementarity constraint \(u^{\top }v=0\) can be dropped, so that the above optimization problem is equivalent to:

$$ \begin{array}{@{}rcl@{}} \begin{array}{rl} \min\limits_{u,v} & \frac{1}{2}\|A(u-v)-b\|^{2}+\tau e_{n}^{\top}u+\tau e_{n}^{\top}v,\\ \text{s.t.} & u\geq0,~v\geq 0. \end{array} \end{array} $$
(21)

Then by expanding the quadratic term ∥A(uv) − b2 in (21) and rearranging terms in the resulting relation, (21) can be written as the following simple bound constrained quadratic programming:

$$ \begin{array}{@{}rcl@{}} \begin{array}{rl} \min\limits_{z\in \mathbb{R}^{2n}} & \frac{1}{2}z^{\top}Hz+c^{\top}z,\\ \text{s.t.} & z\geq 0, \end{array} \end{array} $$
(22)

where:

$$ \begin{array}{@{}rcl@{}} z=\left[ \begin{array}{cc} u\\ v \end{array} \right],~c=\left[ \begin{array}{cc} \tau e_{n}-A^{\top}b\\ \tau e_{n}+A^{\top}b \end{array} \right],~H=\left[ \begin{array}{cc} A^{\top}A & -A^{\top}A\\ -A^{\top}A & A^{\top}A \end{array} \right]. \end{array} $$

It is easy to show that H is positive semi-definite. Then, (22) is a convex quadratic programming. Hence, it follows from the first-order optimality conditions of (22) that z is a minimizer of (22) if and only if z is a solution of the following unconstrained equations:

$$ \begin{array}{@{}rcl@{}} F(z)=\min\{z,Hz+c\}=0, \end{array} $$

where the “min” is interpreted as componentwise minimum. We then have from [55, Lemma 3] and [2, Lemma 2.2] that \(F: \mathbb {R}^{2n}\rightarrow \mathbb {R}^{2n}\) is Lipschitz continuous and monotone. Thus, we can apply the proposed method to solve the unconstrained equations above.
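A minimal MATLAB sketch of evaluating this residual is given below, where the products with H are formed through A so that \(A^{\top }A\) is never built explicitly; the function and variable names are illustrative, and since the equations are unconstrained the projection in the algorithm is simply the identity:

function Fz = residual_cs(z, A, b, tau)
% F(z) = min(z, Hz + c) for the bound-constrained quadratic program (22)
n = size(A, 2);
u = z(1:n);  v = z(n+1:end);
g = A'*(A*(u - v) - b);          % A'(A(u-v) - b)
Hzc = [g + tau; -g + tau];       % Hz + c, using c = [tau*e - A'b; tau*e + A'b]
Fz = min(z, Hzc);                % componentwise minimum
end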

5.2 Numerical results

In this subsection, we perform two types of numerical experiments to verify the effectiveness and efficiency of the proposed method.

In our first experiment, the main goal is to restore a one-dimensional sparse signal of length n from m observations, where m ≪ n. For fairness, the parameters for our method HTTCGP are the same as those in Section 4 except that μ = 2; the parameters of CGD are taken from Section 5.2 of [9]; the parameters of ATTCGP are chosen from Section 6.2 of [30] with \(t_{k}={t_{k}^{b}}\). As usual, the quality of recovery is measured by the mean squared error (MSE):

$$ \begin{array}{@{}rcl@{}} \text{MSE}=\frac{1}{n}\|\bar{x}-x\|^{2}, \end{array} $$

where \(\bar {x}\) is the restored signal and x is the original sparse one. From the definition of MSE, a lower MSE value reflects a better quality of the restored signal, so the lower the final MSE value is, the better the method is. All the algorithms start their iterations with \(x_{0}=A^{\top }b\), and all take the stopping criterion as:

$$ \begin{array}{@{}rcl@{}} \frac{|f(x_{k})-f(x_{k-1})|}{|f(x_{k-1})|}\leq 10^{-5}, \end{array} $$

where f(x) is defined in (20). The parameter τ in (20) is, for all three algorithms, obtained by the same continuation technique as used in [54].
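In MATLAB, this test can be written as follows (handle and variable names illustrative), where the candidate solution is recovered from the iterate \(z_{k}\) through \(x_{k}=u_{k}-v_{k}\):

f = @(x) 0.5*norm(A*x - b)^2 + tau*norm(x, 1);     % objective function (20)
xk = zk(1:n) - zk(n+1:end);                        % x_k = u_k - v_k
stop = abs(f(xk) - f(xk_prev)) <= 1e-5*abs(f(xk_prev));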

In our experiments, let r denote the number of nonzero entries of x. For a given triple (m, n, r), we use the following MATLAB codes to generate the test data:

x = zeros(n,1);              % initialize the original sparse signal
q = randperm(n);
x(q(1:r)) = randn(r,1);
A = randn(m,n);
A = orth(A')';               % orthonormalize the rows of A
b = A*x + 0.001*randn(m,1);

From Fig. 7, we observe that the original signal is almost exactly restored from the disturbed one by the above-mentioned three algorithms, while our algorithm outperforms the other two algorithms in terms of MSE, Itr, and Tcpu. To further demonstrate the performance of our algorithm, we set (m, n, r) = (256i, 1024i, 32i) with i = 1 : 0.5 : 8. For each such i, we generate 20 random instances as described above. The detailed numerical results are reported in Table 10, where we report the average number of iterations (\(\overline {\text {Itr}}\)), the average CPU time in seconds (\(\overline {\text {Tcpu}}\)), and the average MSE (\(\overline {\text {MSE}}\)) over the 20 instances. We see from Table 10 that HTTCGP clearly outperforms the other two algorithms in terms of \(\overline {\text {Itr}}\), \(\overline {\text {Tcpu}}\), and \(\overline {\text {MSE}}\). Moreover, ATTCGP is always faster than CGD, and its solution quality in terms of \(\overline {\text {MSE}}\) is also better than that of CGD. In summary, the reported numerical results indicate that the proposed method is promising for recovering sparse signals in compressed sensing.

Fig. 7 From top to bottom: the original signal, the measurement, and the recovered signals by three algorithms

Table 10 Numerical results for sparse signal restoration

In our second experiment, we concentrate on the image de-blurring problem and apply the proposed method to restore the original image from its blurred image. We refer the readers to the monograph [56] for the background of the digital image restoration problem. In the literature, the quality of restoration is usually measured by the peak signal-to-noise ratio (PSNR):

$$ \begin{array}{@{}rcl@{}} \text{PSNR}=10\times \log_{10}\frac{V^{2}}{\text{MSE}} \end{array} $$

and the structural similarity (SSIM) index, which measures the similarity between the original image and the restored one; see [57] for more details. Here V is the maximum absolute value of the reconstruction. The MATLAB implementation of the SSIM index can be obtained at http://www.cns.nyu.edu/~lcv/ssim/. The larger the PSNR and SSIM are, the better the corresponding method is. The initialization and stopping criterion of all three algorithms are exactly the same as in the first experiment above. The parameters for HTTCGP are chosen as in the first experiment. For CGD and ATTCGP, we empirically determine appropriate values for the parameters involved; the tuned values are ρ = 0.1 for CGD and ς = 0.8 for ATTCGP, and the other parameters are taken from their original papers. Here, the matrix A defined in (20) is a partial discrete Walsh-Hadamard transform (DWT) matrix. We refer the readers to [2] and the references therein for a discussion of the computational cost associated with the matrix A.

The tested images are Chart.tiff (256 × 256), Barbara.png (512 × 512), Lena.png (512 × 512), and Man.bmp (1024 × 1024). See Fig. 8 for the original, blurred, and restored images by the three algorithms. We report the corresponding results in Table 11. According to Table 11, our method is attractive and efficient in terms of solution quality. Although the proposed method requires more iterations, its CPU time is usually less than that of CGD. In a word, all the numerical results show that our method provides an effective approach for solving compressed sensing problems.

Fig. 8 The original images (first row), the blurred images (second row), and the restored images by CGD (third row), HTTCGP (fourth row), and ATTCGP (last row)

Table 11 Numerical results for the image de-blurring problem

6 Conclusions

In this paper, we propose a hybrid three-term conjugate gradient projection method for solving large-scale constrained nonlinear monotone equations, in which the search direction is close to that of the memoryless BFGS method in the sense of a univariate minimization problem and possesses the sufficient descent property and the trust region property independently of the line search technique. With the help of the adaptive line search technique, the global convergence of the proposed method is established under some mild conditions. A large number of numerical experiments on constrained nonlinear monotone equations indicate that our method not only inherits the good properties of the three-term conjugate gradient methods and the hybrid conjugate gradient methods but also benefits from the adaptive line search technique, which greatly improves its computational efficiency compared with the two existing algorithms. Moreover, we apply the proposed method to sparse signal restoration and image de-blurring problems and compare it with two existing methods. The numerical results show that our method is practical and promising.