1 Introduction

The conjugate gradient method (CGM) is widely used to solve the following unconstrained optimization problem

$$ \begin{array}{@{}rcl@{}} \underset{x{\in}R^{n}}{\min} f(x), \end{array} $$
(1)

where \(f:R^{n}\rightarrow R\) is continuously differentiable and g(x) denotes its gradient at x, i.e., g(x) := ∇f(x). Generally, the CGM generates iterates of the form

$$ \begin{array}{@{}rcl@{}} x_{k+1}=x_{k}+ \alpha_{k}d_{k} \end{array} $$
(2)

and

$$ \begin{array}{@{}rcl@{}} d_{k}=\left\{ \begin{array}{lc} -g_{k}, & ~\text{if}~k=1, \\ -g_{k}+\beta_{k}d_{k-1}, & ~\text{if}~k\geq 2, \end{array} \right. \end{array} $$

where αk > 0 is the step-length and dk is the search direction determined by the conjugate parameter βk, with gk := g(xk). It is well known that the convergence and numerical performance of the CGM depend on the conjugate parameter, and different choices of this parameter lead to different conjugate gradient methods (CGMs). The classical CGMs include the Hestenes-Stiefel (HS) method [1], the Fletcher-Reeves (FR) method [2], the Polak-Ribière-Polyak (PRP) method [3, 4], the Conjugate Descent (CD) method [5], the Liu-Storey (LS) method [6] and the Dai-Yuan (DY) method [7], whose conjugate parameters are, respectively, given by

$$ \begin{array}{@{}rcl@{}} \beta_{k}^{\text{HS}}&=&\frac{{g_{k}^{T}}y_{k-1}}{d_{k-1}^{T}y_{k-1}}, \beta_{k}^{\text{FR}}=\frac{\|g_{k}\|^{2}}{\|g_{k-1}\|^{2}}, \beta_{k}^{\text{PRP}}=\frac{{g_{k}^{T}}y_{k-1}}{\|g_{k-1}\|^{2}},\\ \beta_{k}^{\text{CD}}&=&\frac{\|g_{k}\|^{2}}{-g_{k-1}^{T} d_{k-1}}, \beta_{k}^{\text{LS}}=\frac{{g_{k}^{T}}y_{k-1}}{-g_{k-1}^{T}d_{k-1}}, \beta_{k}^{\text{DY}}=\frac{\|g_{k}\|^{2}}{d_{k-1}^{T}y_{k-1}}, \end{array} $$

where ∥⋅∥ denotes the Euclidean norm and \(y_{k-1}:=g_{k}-g_{k-1}\). More well-known CGMs can be found in Refs. [8,9,10,11].
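For readers who prefer code, a minimal NumPy sketch of these formulas (our own illustration, not the authors' Matlab implementation) is given below; the argument names g, g_prev and d_prev stand for gk, gk−1 and dk−1.

```python
import numpy as np

def classical_betas(g, g_prev, d_prev):
    """Classical conjugate parameters; g, g_prev, d_prev play the roles of
    g_k, g_{k-1}, d_{k-1} (all 1-D NumPy arrays)."""
    y = g - g_prev                                 # y_{k-1} = g_k - g_{k-1}
    return {
        "HS":  (g @ y) / (d_prev @ y),
        "FR":  (g @ g) / (g_prev @ g_prev),
        "PRP": (g @ y) / (g_prev @ g_prev),
        "CD":  (g @ g) / -(g_prev @ d_prev),
        "LS":  (g @ y) / -(g_prev @ d_prev),
        "DY":  (g @ g) / (d_prev @ y),
    }
```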

In the past two decades, CGMs have been studied extensively. In particular, by varying the structure of the search direction of the classical CGMs, many new CGMs have been proposed, for example, the preconditioned CGM [12], the spectral CGM [13], the three-term CGM (TTCGM) [14, 15], and the spectral three-term CGM [16], to name just a few.

On the other hand, besides solving (1), CGMs have been applied to problems arising in other areas, such as tensor optimization [17], stochastic optimization [18], optimization on Riemannian manifolds [19], and sparse optimization [20]. In particular, using CGMs to solve image restoration problems arising in sparse optimization has received wide attention. Specifically, by adopting smoothing functions, Chen and Zhou [21] proposed a TTCGM to deal with the nonsmooth and nonconvex optimization problems arising from image restoration. Recently, Yin et al. [22] transformed the nonsmooth convex problem in image restoration into a system of nonlinear monotone equations, as in [23, 24], and then designed an efficient hybrid three-term conjugate gradient projection method to solve it. More related works can be found in Refs. [25,26,27].

In this paper, we focus on efficient TTCGMs and their application to image restoration. To this end, we first explore a family of hybrid three-term CGMs (HTTCGMs); we then design a new conjugate parameter for this family and apply the resulting method to image restoration problems. The main contributions of this paper are as follows.

  • A new family of HTTCGMs is proposed, in which the conjugate parameter \(\beta _{k}^{\text {new}}\) in the search direction is a hybrid of \(\beta _{k}^{\text {DY}}\) and any other conjugate parameter. The search direction is the sum of the negative gradient direction \(-g_{k}\) and a convex combination of \(\beta _{k}^{\text {new}}d_{k-1}\) and \(\theta_{k}g_{k-1}\), where \(\theta_{k}\) is an appropriate parameter; see (7) below.

  • The search direction generated by the proposed family is a descent direction at each iteration, regardless of the choice of conjugate parameter and line search criterion. Furthermore, this family is proved to be globally convergent under standard assumptions.

  • Motivated by the idea of the hybrid CGM introduced in [28], an efficient conjugate parameter is designed for the family.

  • Numerical experiments on problem (1) and on image restoration problems are carried out for the proposed family, and the corresponding results show that the new methods are efficient and promising.

The rest of this article is organized as follows. In Section 2, a family of HTTCGMs is proposed and the corresponding algorithmic framework for problem (1) is given. In Section 3, the descent property and global convergence of the new family are established. In Section 4, a new conjugate parameter is designed for the family, and its effectiveness is verified by solving medium- and large-scale unconstrained optimization and image restoration problems. Section 5 concludes this work.

2 Motivation and algorithm

Beale [29] observed that restarting frequently along the negative gradient direction may not be optimal. He therefore suggested defining the search direction of the restarted CGM by

$$ \begin{array}{@{}rcl@{}} d_{k}=-g_{k}+\beta_{k}d_{k-1}+\gamma_{k}d_{t}, \end{array} $$

where \(\gamma _{k}=\frac {{g_{k}^{T}} y_{t}}{{d_{t}^{T}} y_{t}}\) and 1 ≤ t < k. This is an early form of the TTCGM. Clearly, whenever the conjugate parameter βk equals zero, the next iteration restarts along the direction dk = −gk + γkdt. Since then, the TTCGM has received much attention, and many TTCGMs have been proposed with the following direction structure

$$ \begin{array}{@{}rcl@{}} d_{k}=-g_{k}+\beta_{k}d_{k-1}+\theta_{k}y_{k-1}, \end{array} $$
(3)

where θk is a parameter to be determined. Letting \(\beta _{k}:=\beta _{k}^{\text {PRP}}\) in (3) and then solving the equation \({g_{k}^{T}}d_{k}=-\|g_{k}\|^{2}\) to yield θk, Zhang et al. [30] proposed an improved TTCGM as follows:

$$ \begin{array}{@{}rcl@{}} d_{k}=\left\{ \begin{array}{lc} -g_{k}, & ~\text{if}~k=1, \\ -g_{k}+\beta_{k}^{\text{PRP}}d_{k-1}-\frac{{g_{k}^{T}}d_{k-1}}{\|g_{k-1}\|^{2}}y_{k-1}, & ~\text{if}~k\geq 2. \end{array} \right. \end{array} $$

Based on the form of (3), Kou and Dai [31] proposed a TTCGM with restarting procedure (KD method for short), where

$$ \begin{array}{@{}rcl@{}}\beta_{k}&=&\max\left\{\frac{{g_{k}^{T}} y_{k-1}}{d_{k-1}^{T} y_{k-1}}-\left( \frac{s_{k-1}^{T} y_{k-1}}{\|s_{k-1}\|^{2}}+\frac{\|y_{k-1}\|^{2}}{s_{k-1}^{T} y_{k-1}}\right) \frac{{g_{k}^{T}} s_{k-1}}{d_{k-1}^{T} y_{k-1}},\zeta\frac{{g_{k}^{T}}d_{k-1}}{\|d_{k-1}\|^{2}}\right\},\\ 0&<&\zeta<1, \theta_{k}=\xi_{k}\frac{{g_{k}^{T}}d_{k-1}}{d_{k-1}^{T}y_{k-1}},\ s_{k-1}=x_{k}-x_{k-1},\ 0\leq\xi_{k}\leq1.\end{array} $$

Due to the introduction of the restarting procedure, the KD method achieves much better numerical performance, especially on hard problems. In [32], Narushima et al. proposed a family of TTCGMs that always generate a search direction satisfying the sufficient descent condition, a property that is independent of the choices of βk and line searches. Specifically, the search direction in [32] is determined by

$$ \begin{array}{@{}rcl@{}} d_{k}=\left\{ \begin{array}{ll} -g_{k}, & ~\text{if}~k=0 ~\text{or} ~ {g_{k}^{T}}p_{k}=0, \\ -g_{k}+\beta_{k}d_{k-1}-\beta_{k}\frac{{g_{k}^{T}}d_{k-1}}{{g_{k}^{T}}p_{k}}p_{k}, & ~\text{otherwise}, \end{array} \right. \end{array} $$

where \(p_{k}\in \mathbb {R}^{n}\) is an arbitrary vector; it could be \(g_{k}\), \(d_{k-1}\), \(y_{k-1}\), etc. Recently, inspired by [33], Liu et al. [34] proposed four efficient TTCGMs with the following direction structure

$$ \begin{array}{@{}rcl@{}} d_{k}= -g_{k}+\beta_{k}d_{k-1}+\theta_{k}g_{k-1}. \end{array} $$
(4)

Setting \(\beta _{k}:=\beta _{k}^{\text {LS}}-\frac {\|g_{k-1}\|^{2}{g_{k}^{T}}s_{k-1}}{(d_{k-1}^{T}g_{k-1})^{2}}\) and \(\theta _{k}:=\frac {{g_{k}^{T}}d_{k-1}}{-d_{k-1}^{T}g_{k-1}}\) in (4), they established the global convergence of the resulting algorithm under the strong Wolfe line search.

Based on the convex combination technique, Yuan et al. [25] gave an efficient strategy to design the search direction as follows:

$$ \begin{array}{@{}rcl@{}} d_{k}=\left\{ \begin{array}{lc} -g_{k}, & ~\text{if}~k=0, \\ -N_{k}g_{k}+(1-N_{k})\frac{{g_{k}^{T}}y_{k-1}d_{k-1}-d_{k-1}^{T}g_{k}y_{k-1}}{\max\{ 2\chi\|d_{k-1}\|\|y_{k-1}\|,-d_{k-1}^{T}g_{k-1}\}}, & ~\text{if}~k\geq 1, \end{array} \right. \end{array} $$
(5)

where \(\chi \in (0, 1)\), \(N_{k}=\frac {y_{k-1}^{T}y_{k-1}}{y_{k-1}^{T}s_{k-1}^{\ast }}\in (0, 1]\), \(s_{k-1}^{\ast }=s_{k-1}+(\max \{0,\frac {-s_{k-1}^{T}y_{k-1}}{\|y_{k-1}\|^{2}}\}+1)y_{k-1}\), and \(y_{k-1}=g_{k}-g_{k-1}\). The numerical results for image restoration problems reported in [25] verify the effectiveness of the resulting method.

On the other hand, to guarantee that \(\beta _{k}^{\text {HS}}\) is nonnegative and that the HS CGM is globally convergent, Dai and Yuan [35] proposed a hybrid CGM (HD CGM for brevity), whose conjugate parameter is given by

$$ \begin{array}{@{}rcl@{}} \beta_{k}^{\text{HD}}=\max\left\{0,\min\left\{\beta_{k}^{\text{HS}},\beta_{k}^{\text{DY}}\right\}\right\}. \end{array} $$

Under usual assumptions, the authors of [35] showed the global convergence of the HD CGM with the weak Wolfe line search, and a large number of numerical experiments illustrate its efficiency.

In this paper, we are devoted to making fuller use of the information of the objective function at the current iteration to construct efficient algorithms for large-scale unconstrained optimization and image restoration problems. Motivated by the studies in [32, 34], the convex combination technique used in [25], and the idea of the hybrid CGM proposed in [35], we present a new family of hybrid TTCGMs with the following search direction:

$$ \begin{array}{@{}rcl@{}} d_{k}&=&\left\{ \begin{array}{lc} -g_{k}, & ~\text{if}~k=1, \\ -g_{k}+(1-\lambda_{k})\beta_{k}^{\text{new}}d_{k-1}+\lambda_{k}\theta_{k}g_{k-1}, & ~\text{if}~k\geq 2, \end{array} \right. \end{array} $$
(6)
$$ \begin{array}{@{}rcl@{}} \beta_{k}^{\text{new}}&=&\max\left\{0,\min\left\{\beta_{k},\beta_{k}^{\text{DY}}\right\}\right\},\ \lambda_{k}=\frac{|{g_{k}^{T}}d_{k-1}|}{\|g_{k}\|\|d_{k-1}\|},\ \theta_{k}=-\eta\frac{{g_{k}^{T}}g_{k-1}}{\|g_{k-1}\|^{2}}, \end{array} $$
(7)

where βk is any conjugate parameter and 0 < η < 1. It is easy to see from the definition of λk that λk ∈ [0, 1] for all k ≥ 2. In particular, if λk = 0 in (6) and \(\beta _{k}:=\beta _{k}^{\text {HS}}\) in (7), then the search direction (6) reduces to that of the HD CGM. It is worth mentioning that, in the forthcoming analysis (see Lemma 1 and Theorem 1 below), the descent property of the search direction (6) and the global convergence of the proposed family are independent of the choices of βk and line searches. These facts allow more flexibility from both a theoretical and a practical viewpoint.
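As an illustration of how (6)-(7) can be evaluated in practice, here is a minimal NumPy sketch of ours (not the authors' Matlab code); the function name and the default eta=0.4 (the value used later in Section 4) are our own choices, and beta_k denotes any conjugate parameter supplied by the caller.

```python
import numpy as np

def httcgm_direction(g, g_prev, d_prev, beta_k, eta=0.4):
    """Search direction (6)-(7): a hedged sketch, not the authors' code.
    g, g_prev: gradients at x_k and x_{k-1}; d_prev: previous direction;
    beta_k: any conjugate parameter; 0 < eta < 1."""
    y = g - g_prev
    beta_dy = (g @ g) / (d_prev @ y)                      # beta_k^{DY}
    beta_new = max(0.0, min(beta_k, beta_dy))             # hybridization in (7)
    lam = abs(g @ d_prev) / (np.linalg.norm(g) * np.linalg.norm(d_prev))
    theta = -eta * (g @ g_prev) / (g_prev @ g_prev)
    return -g + (1.0 - lam) * beta_new * d_prev + lam * theta * g_prev
```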

Now, based on the search direction (6) and the weak Wolfe line search (8), we formally present the detailed steps of the family (FHTTCGMs).

Algorithm 1 (FHTTCGMs)

Step 0. Choose an initial point x1 ∈ Rn and parameters 0 < δ < σ < 1, 0 < η < 1, ε ≥ 0; set k := 1.

Step 1. If ∥gk∥ ≤ ε, stop.

Step 2. Compute the search direction dk by (6) and (7).

Step 3. Determine the step-length αk by the weak Wolfe line search

$$ \begin{array}{@{}rcl@{}} f(x_{k}+\alpha_{k}d_{k})&\leq& f(x_{k})+\delta\alpha_{k}{g_{k}^{T}}d_{k}, \\ g(x_{k}+\alpha_{k}d_{k})^{T}d_{k}&\geq&\sigma {g_{k}^{T}}d_{k}. \end{array} $$
(8)

Step 4. Set xk+1 = xk + αkdk according to (2), let k := k + 1, and go to Step 1.
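The following Python sketch assembles the pieces into the FHTTCGM loop, using a simple bisection-type routine for the weak Wolfe conditions (8); it reuses httcgm_direction from the sketch above. All function names, the bisection strategy and the defaults (δ = 0.01, σ = 0.1, η = 0.4, tolerance 1e-6, at most 2000 iterations, matching the settings reported in Section 4) are our own assumptions rather than the authors' implementation.

```python
import numpy as np

def weak_wolfe(f, grad, x, d, gTd, delta=0.01, sigma=0.1, max_iter=50):
    """Bisection-type search for a step satisfying the weak Wolfe conditions (8).
    A generic sketch; returns the last trial step if the budget is exhausted."""
    lo, hi, alpha = 0.0, np.inf, 1.0
    fx = f(x)
    for _ in range(max_iter):
        if f(x + alpha * d) > fx + delta * alpha * gTd:      # Armijo fails: shrink
            hi = alpha
        elif grad(x + alpha * d) @ d < sigma * gTd:          # curvature fails: grow
            lo = alpha
        else:
            return alpha
        alpha = 0.5 * (lo + hi) if np.isfinite(hi) else 2.0 * lo
    return alpha

def fhttcgm(f, grad, x0, beta_rule, eta=0.4, tol=1e-6, max_iter=2000):
    """FHTTCGM framework sketch: direction (6)-(7) plus weak Wolfe step (8).
    beta_rule(g, g_prev, d_prev) returns the chosen conjugate parameter beta_k."""
    x, g = np.asarray(x0, dtype=float), grad(x0)
    d = -g                                                   # d_1 = -g_1
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        alpha = weak_wolfe(f, grad, x, d, g @ d)
        x_new = x + alpha * d
        g_new = grad(x_new)
        beta_k = beta_rule(g_new, g, d)
        d = httcgm_direction(g_new, g, d, beta_k, eta)       # from the sketch above
        x, g = x_new, g_new
    return x
```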

3 The descent property and global convergence

In this section, we firstly analyze the descent property for the search direction generated by FHTTCGMs. Subsequently, we focus on proving its global convergence.

The following lemma shows that dk defined in (6) is a descent direction, and provides an estimate of \((1-\lambda _{k})\beta _{k}^{\text {new}}\), which is critical for the subsequent convergence analysis.

Lemma 1

Let {dk} be the sequence of search directions generated by FHTTCGMs. Then the following relation holds:

$$ \begin{array}{@{}rcl@{}} {g_{k}^{T}}d_{k}< 0,\ k\geq 1, \end{array} $$
(9)

which implies that the search direction generated by FHTTCGMs is a descent direction. Furthermore, for all k ≥ 2, we have

$$ \begin{array}{@{}rcl@{}} 0\leq(1-\lambda_{k})\beta_{k}^{\text{new}}\leq \frac{{g_{k}^{T}}d_{k}}{g_{k-1}^{T}d_{k-1}}. \end{array} $$
(10)

Proof

We prove the first assertion by induction. When k = 1, it follows from (6) that \({g_{1}^{T}}d_{1}=-\|g_{1}\|^{2}< 0\). Suppose now that (9) holds for k − 1 with k ≥ 2. Recall that λk ∈ [0, 1] for all k ≥ 2. Next, we prove that (9) also holds for k by considering the following three cases.

Case I: \(\beta _{k}^{\text {new}}=0\). Multiplying both sides of (6) by \({g_{k}^{T}}\), it follows that

$$ \begin{array}{@{}rcl@{}} {g_{k}^{T}}d_{k}&=&-\|g_{k}\|^{2}-\eta\lambda_{k}\frac{\left( {g_{k}^{T}}g_{k-1}\right)^{2}}{\|g_{k-1}\|^{2}} =-\left( 1+\eta\lambda_{k}\cos^{2}\vartheta_{k}\right)\|g_{k}\|^{2}<0, \end{array} $$

where 𝜗k is the angle between gk and gk− 1.

Case II: \(\beta _{k}^{\text {new}}=\beta _{k}^{\text {DY}}\). By the definition of \(\beta _{k}^{\text {new}}\) in (7), we have \(\beta _{k}^{\text {DY}}>0\). Further, multiplying both sides of (6) by \({g_{k}^{T}}\), we get

$$ \begin{array}{@{}rcl@{}} {g_{k}^{T}}d_{k}\!&=&-\|g_{k}\|^{2}+(1-\lambda_{k})\beta_{k}^{\text{new}}{g_{k}^{T}}d_{k-1}-\eta\lambda_{k}\frac{\left( {g_{k}^{T}}g_{k-1}\right)^{2}}{\|g_{k-1}\|^{2}}\\ &=&-\left( 1+\eta\lambda_{k}\cos^{2}\vartheta_{k}\right)\|g_{k}\|^{2}+(1-\lambda_{k})\beta_{k}^{\text{DY}}{g_{k}^{T}}d_{k-1}\\ &=\!& - \left( 1 + \eta\lambda_{k}\cos^{2}\vartheta_{k}\right)\|g_{k}\|^{2} + (1 - \lambda_{k}) \left( \frac{\|g_{k}\|^{2}}{d_{k-1}^{T}y_{k-1}}d_{k-1}^{T}y_{k-1} + \beta_{k}^{\text{DY}}g_{k-1}^{T}d_{k-1}\!\right)\\ &=&-\left( 1+\eta\cos^{2}\vartheta_{k}\right)\lambda_{k}\|g_{k}\|^{2}+(1-\lambda_{k})\beta_{k}^{\text{DY}}g_{k-1}^{T}d_{k-1}. \end{array} $$

The above relation can be rewritten as

$$ \begin{array}{@{}rcl@{}} {g_{k}^{T}}d_{k}=\left\{ \begin{array}{ll} (1-\lambda_{k})\beta_{k}^{\text{new}}g_{k-1}^{T}d_{k-1}, & \text{if}~\lambda_{k}=0, \\ -\left( 1+\eta\cos^{2}\vartheta_{k}\right)\|g_{k}\|^{2}, & \text{if}~\lambda_{k}=1, \\ -\left( 1+\eta\cos^{2}\vartheta_{k}\right)\lambda_{k}\|g_{k}\|^{2}+(1-\lambda_{k})\beta_{k}^{\text{new}}g_{k-1}^{T}d_{k-1}, & \text{if}~0<\lambda_{k}<1. \end{array} \right. \end{array} $$
(11)

This together with the induction hypothesis yields \({g_{k}^{T}}d_{k}<0\).

Case III: \(\beta _{k}^{\text {new}}=\beta _{k}\). Again, using the definition of \(\beta _{k}^{\text {new}}\) in (7) leads us to the relation \(0<\beta _{k} \leq \beta _{k}^{\text {DY}}\), which further implies \(d_{k-1}^{T}y_{k-1}>0\) from the definition of \(\beta _{k}^{\text {DY}}\). Hence, we obtain from (6) that

$$ \begin{array}{@{}rcl@{}} {g_{k}^{T}}d_{k}\!&=&-\|g_{k}\|^{2}+(1-\lambda_{k})\beta_{k}^{\text{new}}{g_{k}^{T}}d_{k-1}-\eta\lambda_{k}\frac{\left( {g_{k}^{T}}g_{k-1}\right)^{2}}{\|g_{k-1}\|^{2}}\\ &=&-\left( 1+\eta\lambda_{k}\cos^{2}\vartheta_{k}\right)\|g_{k}\|^{2}+(1-\lambda_{k})\beta_{k}\left( d_{k-1}^{T}y_{k-1}+g_{k-1}^{T}d_{k-1}\right)\\ &\leq& - \left( 1 + \eta\lambda_{k}\cos^{2}\vartheta_{k}\right)\|g_{k}\|^{2}+(1 - \lambda_{k})\left( \frac{\|g_{k}\|^{2}}{d_{k-1}^{T}y_{k-1}} d_{k-1}^{T}y_{k-1}+\beta_{k}g_{k-1}^{T}d_{k-1}\right)\\ &=&-\left( 1+\eta\cos^{2}\vartheta_{k}\right)\lambda_{k}\|g_{k}\|^{2}+(1-\lambda_{k})\beta_{k}g_{k-1}^{T}d_{k-1}. \end{array} $$

Similarly, we conclude that the relation in (11) still holds. Combining this with the induction hypothesis yields \({g_{k}^{T}}d_{k}<0\). Thus, we have shown that the relation in (9) holds.

Now, we establish the second assertion. If \(\beta _{k}^{\text {new}}=0\), we have from (9) that

$$ 0=(1-\lambda_{k})\beta_{k}^{\text{new}}< \frac{{g_{k}^{T}}d_{k}}{g_{k-1}^{T}d_{k-1}}. $$

If \(\beta _{k}^{\text {new}}=\beta _{k}^{\text {DY}}\) or βk, we deduce from (9) and (11) that

$$ 0\leq(1-\lambda_{k})\beta_{k}^{\text{new}}\leq \frac{{g_{k}^{T}}d_{k}}{g_{k-1}^{T}d_{k-1}}. $$

Thus, the proof is complete. □

From the above proof, it can be seen that the descent property of the search direction defined in (6) does not depend on any specific conjugate parameter or line search criterion.

To analyze the global convergence property of the FHTTCGMs, the following assumptions for the objective function are required:

  • A1 The level set \({\Lambda} = \{x\in R^{n} \mid f(x) \leq f(x_{1})\}\) is bounded;

  • A2 In a neighborhood U of Λ, f(x) is differentiable and its gradient g(x) is Lipschitz continuous, namely, there exists a constant L > 0 such that

$$ \|g(x)-g(y)\|\leq L\|x-y\|,\ \forall\ x,y\in U. $$

From the weak Wolfe line search (8), we know that the sequence {f(xk)} is monotonically nonincreasing. Combining this with assumption A1, we conclude that the sequence {xk} is bounded.

The following lemma is the well-known Zoutendijk condition. It is very important for the convergence analysis of the CGM; see [36] for details.

Lemma 2

Consider an iteration of the form (2), where dk satisfies the descent condition \({g_{k}^{T}}d_{k}<0\) and αk satisfies the weak Wolfe line search (8). If assumptions A1-A2 hold, then \(\sum\limits _{k=1}^{\infty }\frac {({g_{k}^{T}}d_{k})^{2}}{\|d_{k}\|^{2}}<+\infty \).

With Lemmas 1 and 2 at hand, we give the convergence analysis of FHTTCGMs.

Theorem 1

Let {xk} be a sequence generated by FHTTCGMs. If assumptions A1-A2 hold, then it holds that \(\underset {k\to \infty }{\liminf } \|g_{k}\|=0\).

Proof

We prove the claim by contradiction. Suppose that there exists γ > 0 such that ∥gk∥ ≥ γ for all k ≥ 1. From assumption A2 and the boundedness of {xk}, we know that {gk} is bounded; that is, there is a positive constant \(\widetilde {\gamma }\) such that

$$ \begin{array}{@{}rcl@{}} \gamma\leq \|g_{k}\|\leq \widetilde{\gamma}, \ \forall\ k\geq 1. \end{array} $$
(12)

From (6), we immediately obtain

$$ \begin{array}{@{}rcl@{}} d_{k}+g_{k}-\lambda_{k}\theta_{k}g_{k-1}=(1-\lambda_{k})\beta_{k}^{\text{new}}d_{k-1}. \end{array} $$

Next, taking the squared Euclidean norm on both sides of the above equality, we obtain

$$ \begin{array}{@{}rcl@{}} \|d_{k}\|^{2}+2{g_{k}^{T}}d_{k}+2\eta\lambda_{k}\frac{{g_{k}^{T}}g_{k-1}}{\|g_{k-1}\|^{2}}g_{k-1}^{T}d_{k}+\|g_{k}\|^{2}+ 2\eta\lambda_{k}\frac{{g_{k}^{T}}g_{k-1}}{\|g_{k-1}\|^{2}}{g_{k}^{T}}g_{k-1} \\ + \eta^{2}{\lambda_{k}^{2}}\left( \frac{{g_{k}^{T}}g_{k-1}}{\|g_{k-1}\|^{2}}\right)^{2}\|g_{k-1}\|^{2}=(1-\lambda_{k})^{2}\left( \beta_{k}^{\text{new}}\right)^{2}\|d_{k-1}\|^{2}. \end{array} $$
(13)

In addition, multiplying both sides of (6) by \(g_{k-1}^{T}\), we know that

$$ \begin{array}{@{}rcl@{}} g_{k-1}^{T}d_{k} &=& -{g_{k}^{T}}g_{k-1}+(1-\lambda_{k})\beta_{k}^{\text{new}}g_{k-1}^{T}d_{k-1}-\eta\lambda_{k}\|g_{k-1}\|^{2}\frac{{g_{k}^{T}}g_{k-1}}{\|g_{k-1}\|^{2}} \\ &=& -(1+\eta\lambda_{k}){g_{k}^{T}}g_{k-1}+(1-\lambda_{k})\beta_{k}^{\text{new}}g_{k-1}^{T}d_{k-1}. \end{array} $$
(14)

Substituting (14) into (13) and rearranging terms, we deduce that

$$ \begin{array}{@{}rcl@{}} \|d_{k}\|^{2}& =&(1-\lambda_{k})^{2}\left( \beta_{k}^{\text{new}}\right)^{2}\|d_{k-1}\|^{2}-2{g_{k}^{T}}d_{k}-2\eta\lambda_{k}(1-\lambda_{k})\frac{{g_{k}^{T}}g_{k-1}}{\|g_{k-1}\|^{2}} \beta_{k}^{\text{new}}g_{k-1}^{T}d_{k-1} \\ & &+2\eta\lambda_{k}\left( 1+\eta\lambda_{k}\right)\frac{\left( {g_{k}^{T}}g_{k-1}\right)^{2}}{\|g_{k-1}\|^{2}}-\|g_{k}\|^{2}-2\eta\lambda_{k} \frac{\left( {g_{k}^{T}}g_{k-1}\right)^{2}}{\|g_{k-1}\|^{2}}-\eta^{2}{\lambda_{k}^{2}}\frac{\left( {g_{k}^{T}}g_{k-1}\right)^{2}}{\|g_{k-1}\|^{2}} \\ &=& (1-\lambda_{k})^{2}\left( \beta_{k}^{\text{new}}\right)^{2}\|d_{k-1}\|^{2}-2{g_{k}^{T}}d_{k}-2\eta\lambda_{k}(1-\lambda_{k})\frac{{g_{k}^{T}}g_{k-1}}{\|g_{k-1}\|^{2}} \beta_{k}^{\text{new}}g_{k-1}^{T}d_{k-1} \\ && +\eta^{2}{\lambda_{k}^{2}}\frac{\left( {g_{k}^{T}}g_{k-1}\right)^{2}}{\|g_{k-1}\|^{2}}-\|g_{k}\|^{2}. \end{array} $$

Combining this with (9), (10), λk ∈ [0, 1] and the Cauchy-Schwarz inequality, we conclude that

$$ \begin{array}{@{}rcl@{}} \|d_{k}\|^{2} &\leq& (1-\lambda_{k})^{2}\left( \beta_{k}^{\text{new}}\right)^{2}\|d_{k-1}\|^{2}-2{g_{k}^{T}}d_{k}-2\eta\lambda_{k}(1-\lambda_{k})\frac{|{g_{k}^{T}}g_{k-1}|}{\|g_{k-1}\|^{2}} \beta_{k}^{\text{new}}g_{k-1}^{T}d_{k-1} \\ &&-\left( 1-\eta^{2}\right)\|g_{k}\|^{2} \\ &\leq& \frac{\left( {g_{k}^{T}}d_{k}\right)^{2}}{\left( g_{k-1}^{T}d_{k-1}\right)^{2}}\|d_{k-1}\|^{2}-2{g_{k}^{T}}d_{k}- 2\eta\lambda_{k}\frac{|{g_{k}^{T}}g_{k-1}|}{\|g_{k-1}\|^{2}}{g_{k}^{T}}d_{k} -\left( 1-\eta^{2}\right)\|g_{k}\|^{2} \\ &\leq& \frac{\left( {g_{k}^{T}}d_{k}\right)^{2}}{\left( g_{k-1}^{T}d_{k-1}\right)^{2}}\|d_{k-1}\|^{2}-2{g_{k}^{T}}d_{k}- 2\frac{\eta\widetilde{\gamma}^{2}}{\gamma^{2}}{g_{k}^{T}}d_{k} -\left( 1-\eta^{2}\right)\|g_{k}\|^{2} \\ &=& \frac{\left( {g_{k}^{T}}d_{k}\right)^{2}}{\left( g_{k-1}^{T}d_{k-1}\right)^{2}}\|d_{k-1}\|^{2}- 2\left( 1+\frac{\eta\widetilde{\gamma}^{2}}{\gamma^{2}}\right){g_{k}^{T}}d_{k}- \left( 1-\eta^{2}\right)\|g_{k}\|^{2}, \end{array} $$

where we made use of (12) for the third inequality. Letting \(P=1+\frac {\eta \widetilde {\gamma }^{2}}{\gamma ^{2}}>1\) and Q = 1 − η2 ∈ (0, 1), we then obtain

$$ \begin{array}{@{}rcl@{}} \|d_{k}\|^{2}\leq\frac{\left( {g_{k}^{T}}d_{k}\right)^{2}}{\left( g_{k-1}^{T}d_{k-1}\right)^{2}}\|d_{k-1}\|^{2}-2P{g_{k}^{T}}d_{k}-Q\|g_{k}\|^{2}. \end{array} $$

Dividing both sides of the above inequality by \(\left ({g_{k}^{T}}d_{k}\right )^{2}\) yields

$$ \begin{array}{@{}rcl@{}} \frac{\|d_{k}\|^{2}}{\left( {g_{k}^{T}}d_{k}\right)^{2}}&\leq& \frac{\|d_{k-1}\|^{2}}{\left( g_{k-1}^{T}d_{k-1}\right)^{2}} -\frac{2P}{{g_{k}^{T}}d_{k}}-Q\frac{\|g_{k}\|^{2}}{\left( {g_{k}^{T}}d_{k}\right)^{2}} \\ &=& \frac{\|d_{k-1}\|^{2}}{\left( g_{k-1}^{T}d_{k-1}\right)^{2}}- \left( \frac{P}{\sqrt{Q}\|g_{k}\|}+\frac{\sqrt{Q}\|g_{k}\|}{{g_{k}^{T}}d_{k}}\right)^{2}+ \frac{P^{2}}{Q\|g_{k}\|^{2}} \\ &\leq& \frac{\|d_{k-1}\|^{2}}{\left( g_{k-1}^{T}d_{k-1}\right)^{2}}+\frac{P^{2}}{Q}\frac{1}{\|g_{k}\|^{2}} \\ &\leq& \frac{\|d_{k-2}\|^{2}}{\left( g_{k-2}^{T}d_{k-2}\right)^{2}}+\frac{P^{2}}{Q}\frac{1}{\|g_{k-1}\|^{2}}+ \frac{P^{2}}{Q}\frac{1}{\|g_{k}\|^{2}} \\ &\leq& \frac{\|d_{1}\|^{2}}{({g_{1}^{T}}d_{1})^{2}}+\frac{P^{2}}{Q}\sum^{k}_{i=2}\frac{1}{\|g_{i}\|^{2}}. \end{array} $$

Hence, using d1 = −g1, P > 1, Q ∈ (0, 1) and (12), we obtain

$$ \begin{array}{@{}rcl@{}} \frac{\|d_{k}\|^{2}}{\left( {g_{k}^{T}}d_{k}\right)^{2}}\leq \frac{P^{2}}{Q}\sum^{k}_{i=1}\frac{1}{\|g_{i}\|^{2}} \leq\frac{P^{2}}{Q}\frac{k}{\gamma^{2}}, \end{array} $$

which further implies that \(\frac {({g_{k}^{T}}d_{k})^{2}}{\|d_{k}\|^{2}}\geq \frac {Q\gamma ^{2}}{P^{2}}\frac {1}{k}\). It is not hard to see that \(\sum\limits _{k=1}^{\infty } \frac {({g_{k}^{T}}d_{k})^{2}}{\|d_{k}\|^{2}}=\infty \). This contradicts Lemma 2, and the proof is therefore complete. □

4 Numerical experiments

To verify the effectiveness and efficiency of FHTTCGMs, we first design a new conjugate parameter for \(\beta _{k}^{\text {new}}\) in the FHTTCGMs. Subsequently, we apply the FHTTCGMs to solve unconstrained optimization and image restoration problems.

In [28], Shi and Guo proposed a family of CGMs, in which the conjugate parameter is defined by

$$ \begin{array}{@{}rcl@{}} \beta_{k}^{\text{SG}}=\frac{{g_{k}^{T}}\left( g_{k}-g_{k-1}\right)}{(1-\mu)\|g_{k-1}\|^{2}-\mu g_{k-1}^{T}d_{k-1}}, \end{array} $$
(15)

where the hybrid parameter μ ∈ [0, 1]. Clearly, the conjugate parameter above is a hybrid of \(\beta _{k}^{\text {PRP}}\) and \(\beta _{k}^{\text {LS}}\). If μ ≠ 1, then formula (15) can be rewritten as

$$ \begin{array}{@{}rcl@{}} \beta_{k}^{\text{SG}}=\frac{1}{1-\mu}\cdot\frac{{g_{k}^{T}}(g_{k}-g_{k-1})}{\|g_{k-1}\|^{2}-\frac{\mu}{1-\mu} g_{k-1}^{T}d_{k-1}}. \end{array} $$
(16)

Inspired by (16), and to avoid the difficulty of choosing the hybrid parameter μ, we design a new conjugate parameter as follows:

$$ \begin{array}{@{}rcl@{}} \beta_{k}^{\mathrm{N}}=\frac{{g_{k}^{T}}(g_{k}-g_{k-1})}{\|g_{k-1}\|^{2}-\sigma g_{k-1}^{T}d_{k-1}}, \end{array} $$
(17)

where σ is the same scalar as that in the weak Wolfe line search (8). Substituting \(\beta _{k}:=\beta _{k}^{\mathrm {N}}\) into (7) to obtain \(\beta _{k}^{\text {new}}\), and then embedding it in the FHTTCGMs, we call the resulting algorithm FHTTCGM-N.
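As a small illustration (ours, not the authors' code), \(\beta _{k}^{\mathrm {N}}\) from (17) can be written so that it plugs directly into the fhttcgm loop sketched in Section 2; the default sigma=0.1 matches the line search parameter used in the experiments below.

```python
def beta_N(g, g_prev, d_prev, sigma=0.1):
    """Conjugate parameter (17); sigma is the weak Wolfe parameter from (8)."""
    return g @ (g - g_prev) / (g_prev @ g_prev - sigma * (g_prev @ d_prev))
```

With the earlier sketches, a rough FHTTCGM-N run would then look like x_star = fhttcgm(f, grad, x0, beta_rule=beta_N).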

4.1 Unconstrained optimization problems

In this subsection, two groups of experiments are conducted. The first group is used to verify the effectiveness of FHTTCGMs. To this end, the conjugate parameter βk in (7) is set to \(\beta _{k}^{\text {HS}}\), \(\beta _{k}^{\text {PRP}}\) and \(\beta _{k}^{\text {LS}}\), and the corresponding methods are denoted by FHTTCGM-HS, FHTTCGM-PRP and FHTTCGM-LS, respectively; they are compared with the original HS, PRP and LS CGMs. The second group is used to show that the proposed FHTTCGM-N is efficient, so some state-of-the-art methods are chosen for comparison: the well-known CG-DESCENT method (HZ) [9], the three-term CGM with restart direction (KD) [31], the three-term CGM with the modified direction structure (MTTLS) [34] and the HD CGM [35].

In both groups of experiments, the same 100 unconstrained problems are tested, where problems 1-53 are taken from the CUTE library [37] and the others come from the unconstrained problem collections [38, 39]. The dimensions of the test problems vary from 11 to 800,000. For fairness, all compared methods use the weak Wolfe line search (8) to compute the step-length αk, with parameters δ = 0.01 and σ = 0.1. For our methods, we set η = 0.4. Moreover, we adopt the strategy described in [40] to compute the initial step-length. The termination criterion is (1) \(\|g_{k}\|\leq 10^{-6}\) or (2) Itr > 2000, where "Itr" denotes the number of iterations. When (2) occurs, we declare that the corresponding algorithm has failed on that test problem and mark it with "F". All codes are written in Matlab 2018b and run on a Lenovo PC with a 3.60 GHz CPU, 8 GB of RAM and the Windows 10 operating system. For the two groups of experiments, we report the number of iterations (Itr), the CPU time (Tcpu) and the final value of ∥gk∥ (∥g∥) in Tables 1, 2, 3 and 4, and use the performance profiles proposed by Dolan and Moré [41] to visually compare the algorithms in terms of Tcpu and Itr. For the interpretation of performance profiles, roughly speaking, the higher a curve lies, the better the corresponding method performs; see [41] for more details.
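For completeness, a performance profile in the sense of Dolan and Moré [41] can be generated from a cost table as in the following sketch (our own script, assuming matplotlib is available; the names T, labels and tau_max are our choices, not from the paper).

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(T, labels, tau_max=10.0):
    """Dolan-More performance profiles. T[i, s] is the cost (e.g. Tcpu or Itr)
    of solver s on problem i; failures can be encoded as np.inf. Plots
    rho_s(tau) = fraction of problems where solver s is within a factor tau
    of the best solver on that problem."""
    ratios = T / T.min(axis=1, keepdims=True)          # performance ratios r_{p,s}
    taus = np.linspace(1.0, tau_max, 200)
    for s, name in enumerate(labels):
        rho = [(ratios[:, s] <= t).mean() for t in taus]
        plt.plot(taus, rho, label=name)
    plt.xlabel("tau"); plt.ylabel("rho(tau)"); plt.legend()
    plt.show()
```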

Table 1 Numerical results for the first group
Table 2 Numerical results for the first group (continued)
Table 3 Numerical results for the second group
Table 4 Numerical results for the second group (continued)

Tables 1 and 2 and Figs. 1 and 2 show that FHTTCGM-PRP, FHTTCGM-HS and FHTTCGM-LS perform better than the corresponding original algorithms, which directly indicates that the proposed FHTTCGM family is effective. From Tables 3 and 4 and Figs. 3 and 4, we see that the curve of the proposed FHTTCGM-N stays at the top and that it solves about 97% of the test problems successfully. Hence, in the second group of experiments, the numerical performance of FHTTCGM-N is superior to that of the other four methods on the given test problems.

Fig. 1
figure 1

Performance profiles on Tcpu of the first group

Fig. 2
figure 2

Performance profiles on Itr of the first group

Fig. 3
figure 3

Performance profiles on Tcpu of the second group

Fig. 4
figure 4

Performance profiles on Itr of the second group

4.2 Image restoration problems

In this part, we use the presented FHTTCGM-N to deal with the image restoration problems.

In [42], a two-phase scheme was utilized to restore images corrupted by impulse noise. In the first phase, a median filter is used to detect noisy pixels. Let X be an image of size M-by-N and \(A =\{1,2,\dots ,M\}\times \{1,2,\dots ,N\}\) be the index set of the image X. Denote by \(\mathcal {N}\subset A\) the set of indices of the noisy pixels detected in the first phase, and let \(|\mathcal {N}|\) denote the number of elements of \(\mathcal {N}\). Let \(\mathcal {V}_{i,j}\) be the set of the four closest neighbors of the pixel at location (i,j) ∈ A, i.e., \(\mathcal {V}_{i,j}=\{(i,j-1),(i,j+1),(i-1,j),(i+1,j)\}\), and let yi,j be the observed pixel value at location (i,j). In the second phase, the noisy pixels are restored by solving the nonsmooth minimization problem

$$ \begin{array}{@{}rcl@{}} \underset{\mathbf{u}}{\min} \sum_{(i, j) \in \mathcal{N}}\left[|u_{i, j}-y_{i, j}|+\frac{\beta}{2}\left( 2 \cdot S_{i, j}^{1}+S_{i, j}^{2}\right)\right], \end{array} $$
(18)

where

$$ S_{i, j}^{1} =\sum_{(m, n) \in \mathcal{V}_{i, j} \backslash \mathcal{N}} \varphi_{\alpha}\left( u_{i, j}-y_{m, n}\right),\ S_{i, j}^{2} =\sum_{(m, n) \in \mathcal{V}_{i, j} \cap \mathcal{N}} \varphi_{\alpha}\left( u_{i, j}-u_{m, n}\right). $$

Here, \(\varphi _{\alpha }(t)=\sqrt {t^{2}+\alpha }\) is an edge-preserving function with parameter α > 0, and \(\mathbf {u} = [u_{i,j}]_{(i,j)\in \mathcal {N}}\) is a column vector of length \(|\mathcal {N}|\) ordered lexicographically. However, solving the nonsmooth minimization problem (18) exactly is time-consuming. Cai et al. [43] removed the nonsmooth term and obtained the following smooth unconstrained optimization problem:

$$ \begin{array}{@{}rcl@{}} \!\!\!\!\!\!\!\!\!\!\underset{\mathbf{u}}{\min} F_{\alpha}(\mathbf{u}):=\sum_{(i,j)\in \mathcal{N}} \left( 2\sum_{(m,n)\in \mathcal{V}_{i,j}\backslash \mathcal{N}} \varphi_{\alpha}(u_{i,j}-y_{m,n})+\sum_{(m,n)\in \mathcal{V}_{i,j}\cap \mathcal{N}} \varphi_{\alpha}(u_{i,j}-u_{m,n})\right). \end{array} $$
(19)

Obviously, the higher the noise ratio, the larger the scale of (19). The authors of [43] discovered that contaminated images can be restored efficiently by applying the CGM to problem (19), even when the noise ratio is as high as 90%. We refer the reader to [44,45,46] and the references therein for further applications of CGMs in image restoration.
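To make the structure of (19) concrete, the sketch below (ours, under our own data-layout assumptions: Y is the observed image, noisy_mask marks the pixels detected in the first phase, and u stacks the unknowns in row-major order; the default alpha=100.0 is only a placeholder) evaluates F_α and its gradient, which is all that a conjugate gradient method needs.

```python
import numpy as np

def F_alpha_and_grad(u, Y, noisy_mask, alpha=100.0):
    """Smoothed objective (19) and its gradient: a sketch under our own
    data-layout assumptions (see the text above)."""
    u = np.asarray(u, dtype=float)
    U = Y.astype(float).copy()
    idx = np.argwhere(noisy_mask)              # noisy-pixel coordinates, row-major
    U[noisy_mask] = u                          # plug the current estimate into the image
    phi  = lambda t: np.sqrt(t * t + alpha)    # edge-preserving function phi_alpha
    dphi = lambda t: t / np.sqrt(t * t + alpha)
    M, N = Y.shape
    F, grad = 0.0, np.zeros_like(u)
    for p, (i, j) in enumerate(idx):
        for (m, n) in ((i, j - 1), (i, j + 1), (i - 1, j), (i + 1, j)):
            if not (0 <= m < M and 0 <= n < N):
                continue                       # neighbour falls outside the image
            t = U[i, j] - U[m, n]
            if noisy_mask[m, n]:
                F += phi(t)                    # symmetric term is added when (m, n) is the centre
            else:
                F += 2.0 * phi(t)              # clean neighbours carry weight 2 in (19)
            grad[p] += 2.0 * dphi(t)           # derivative w.r.t. u_{i,j} in both cases
    return F, grad
```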

Now, we focus on applying the two-phase scheme to remove salt-and-pepper noise, which is a special case of impulse noise. In the first phase, we use the adaptive median filter [47] to detect noisy pixels. In the second phase, we use FHTTCGM-N to solve (19) and compare it with the PRP CGM used in [43], the classical PRP CGM and the HZ method. Notice that both the classical PRP CGM and the HZ method use the weak Wolfe line search (8) to compute the step-length αk, while the PRP CGM used in [43] adopts an explicit expression to obtain αk. For convenience, we denote the PRP CGM used in [43] by PRP, and the classical PRP CGM with the weak Wolfe line search by PRP-W. The step-length formula and the related parameters for PRP are the same as those in [43].

The test images are Boat(512 × 512), Hill(512 × 512), Lena(512 × 512) and Man(512 × 512). All the compared methods use the following stopping criterion

$$ \text{Itr}>300\ \text{or}\ \frac{|F_{\alpha}(\mathbf{u}_{k})-F_{\alpha}(\mathbf{u}_{k-1})|}{|F_{\alpha}(\mathbf{u}_{k})|}\leq 10^{-4}. $$

Throughout this part, the operating environment is the same as in Section 4.1. To assess the restoration quality quantitatively, we adopt the peak signal-to-noise ratio (PSNR, see [48]), defined as

$$ \text{PSNR} = 10 \log_{10}\frac{255^{2}}{\frac{1}{MN}{\sum}_{i,j}\left( x_{i,j}^{r}-x_{i,j}^{*}\right)^{2}}, $$

where \(x_{i,j}^{r}\) and \(x_{i,j}^{*}\) denote the pixel values of the restored image and the original one, respectively.
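A direct NumPy transcription of this definition for 8-bit images might read as follows (our own sketch; the argument names are assumptions).

```python
import numpy as np

def psnr(restored, original):
    """Peak signal-to-noise ratio as defined above (8-bit grayscale images)."""
    mse = np.mean((restored.astype(float) - original.astype(float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```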

Table 5 reports the number of iterations (Itr), the CPU time (Tcpu) and the PSNR values of the restored images. To save space, we only plot the original, noisy and restored images for the four algorithms when the salt-and-pepper noise ratio is 70% and 90%; see Figs. 5 and 6 for the corresponding results. From Table 5, we observe that the proposed FHTTCGM-N usually requires less time than the other three algorithms. Moreover, the PSNR values of the images restored by FHTTCGM-N are often higher than those of the other three algorithms, except in a few cases. It is noticeable that the HZ method is usually superior to PRP and PRP-W in terms of Itr, Tcpu and PSNR; in this regard, we believe that the numerical performance of the HZ method would improve further if its own line search were used to compute the step-length. Overall, the proposed FHTTCGM-N is superior to the other three methods on the given test images.

Fig. 5
figure 5

The original images (first row), the noisy images with 70% salt-and-pepper noise (second row) and the images restored by FHTTCGM-N (third row), PRP (fourth row), PRP-W (fifth row) and HZ (last row)

Fig. 6
figure 6

The original images (first row), the noisy images with 90% salt-and-pepper noise (second row) and the images restored by FHTTCGM-N (third row), PRP (fourth row), PRP-W (fifth row) and HZ (last row)

Table 5 Numerical results for image restoration problems

5 Conclusions

In this work, we propose a family of hybrid three-term conjugate gradient methods (FHTTCGMs), in which the search direction always satisfies the descent condition, independently of the choices of the conjugate parameter and the line search. Under some mild conditions, the global convergence of the proposed family is established. By embedding the classical HS, PRP and LS conjugate parameters in the FHTTCGMs, respectively, the numerical comparisons show that the resulting methods, and hence the family, are very promising. Moreover, we design a specific conjugate parameter for the family and thus obtain a concrete method. Finally, we apply it to medium- and large-scale unconstrained optimization and image restoration problems; the numerical results illustrate its encouraging efficiency and applicability, even compared with state-of-the-art methods.