1 Introduction

Consider unconstrained optimization problems

$$ \min\{f(x)|~x\in R^{n}\}, $$
(1.1)

where the objective function f : Rn → R is continuously differentiable. The conjugate gradient (CG) algorithm for (1.1) has the iteration formula:

$$ x_{k+1}=x_{k}+\alpha_{k}d_{k}, ~k=0,1,2,\cdots, $$
(1.2)

where xk is the current iteration point, αk is the step size, and dk is the search direction, which is defined by

$$ \begin{array}{@{}rcl@{}} d_{k}=\left\{ \begin{array}{ll} -g_{k}+\beta_{k-1}d_{k-1}, & \text{if}~k\geq 1,\\ -g_{k},& \text{if}~k=0, \end{array} \right. \end{array} $$
(1.3)

where \(g_{k}=\triangledown f(x_{k})\) is the gradient of f(x) at xk, and βk ∈ R is a CG scalar. Various well-known conjugate gradient methods have been proposed (see [4,5,6, 9, 15,16,17]), including the Fletcher-Reeves (FR) method [6], the Polak-Ribière-Polyak (PRP) method [16, 17], the Hestenes-Stiefel (HS) method [9], the Dai-Yuan (DY) method [4] and the conjugate descent (CD) method [5]. Zoutendijk [42] proved the global convergence of the FR method for general functions under an exact line search. In practice, however, the FR method [6] may produce a sequence of consecutively small step lengths, which leads to poor numerical performance despite its satisfactory convergence theory. The global convergence of the DY method [4] and the CD method [5] has also been established, and for the DY method [4] it can be proven without the descent condition. However, the numerical performance of the DY and CD methods is weaker than that of the PRP method [16, 17] and the HS method [9], and the HS method behaves similarly to the PRP method. Furthermore, the PRP method has a built-in restart feature: when successive gradients are close, βk automatically becomes small and the search direction is reset toward the steepest descent direction, which helps avoid jamming. Therefore, we are more interested in the PRP formula with

$$\beta_{k}^{PRP}=\frac{{g_{k}^{T}}y_{k-1}}{\|g_{k-1}\|^{2}},$$

where ∥⋅∥ denotes the Euclidean norm of vectors and yk− 1 = gk − gk− 1. Many scholars have studied the convergence of the PRP method (see [2, 3, 17,18,19, 35, 41] et al.). Polak and Ribière showed that, for convex problems, the global convergence of the PRP method can be proven using an exact line search [16, 17]. However, for nonconvex problems, Powell [19] gave a three-dimensional counterexample and pointed out that the convergence of the PRP method cannot be guaranteed even if an exact line search is adopted. Based on this work, Gilbert and Nocedal [7] showed that if \(\beta _{k}^{PRP}\) is restricted to be non-negative and the step size αk is determined by a line search step that satisfies sufficient descent, then the PRP method is globally convergent.
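To fix ideas, the following is a minimal numerical sketch of the PRP direction update; the function name, the toy quadratic objective and the step choice are our own illustrative assumptions, not part of the cited works.

```python
import numpy as np

def prp_direction(g_k, g_prev, d_prev):
    """PRP direction d_k = -g_k + beta_k^PRP * d_{k-1},
    where beta_k^PRP = g_k^T y_{k-1} / ||g_{k-1}||^2 and y_{k-1} = g_k - g_{k-1}."""
    y_prev = g_k - g_prev
    beta_prp = g_k.dot(y_prev) / g_prev.dot(g_prev)
    return -g_k + beta_prp * d_prev

# Purely illustrative usage on a toy quadratic f(x) = 0.5 x^T A x (gradient A x).
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
x_prev = np.array([1.0, 1.0])
d_prev = -grad(x_prev)              # d_0 = -g_0
x_k = x_prev + 0.1 * d_prev         # one illustrative step
d_k = prp_direction(grad(x_k), grad(x_prev), d_prev)
```

Note that when successive gradients are close, yk− 1 ≈ 0 and hence βkPRP ≈ 0, which is exactly the built-in restart behavior mentioned above.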

It is well known that sufficient descent is a very important property for CG algorithms; it has the form

$$ {d_{k}^{T}}g_{k}\leq -c\|g_{k}\|^{2}, $$
(1.4)

where c > 0 is a constant. This property has been proven to play a critical role in analyzing the global convergence of CG methods (see [7, 10, 36, 39], etc.). To ensure that the sufficient descent condition is satisfied and to establish the convergence of the PRP method, the weak Wolfe-Powell (WWP) inexact line search technique is commonly used to find αk such that

$$ f(x_{k}+\alpha_{k}d_{k})\leq f(x_{k})+\delta \alpha_{k}{g_{k}^{T}}d_{k} $$
(1.5)

and

$$ g(x_{k}+\alpha_{k}d_{k})^{T}d_{k}\geq\sigma {g_{k}^{T}}d_{k}, $$
(1.6)

where \(\delta \in (0,\frac {1}{2})\) and σ ∈ (δ,1).
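For concreteness, a minimal bracketing/bisection sketch of a step size search aiming at (1.5)-(1.6) is given below; the bracketing strategy, the default parameters and the function name are our own assumptions rather than part of the WWP conditions themselves.

```python
import numpy as np

def weak_wolfe_step(f, grad, x, d, delta=0.2, sigma=0.9, max_iter=50):
    """Search for alpha satisfying the weak Wolfe-Powell conditions
    (1.5) f(x + a d) <= f(x) + delta * a * g^T d  and
    (1.6) grad(x + a d)^T d >= sigma * g^T d,
    by maintaining and refining a bracket [lo, hi]."""
    fx, gtd = f(x), grad(x).dot(d)   # g^T d should be negative for a descent direction
    lo, hi, alpha = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        if f(x + alpha * d) > fx + delta * alpha * gtd:    # (1.5) fails: step too long
            hi = alpha
        elif grad(x + alpha * d).dot(d) < sigma * gtd:     # (1.6) fails: step too short
            lo = alpha
        else:
            return alpha                                   # both WWP conditions hold
        alpha = 2.0 * lo if np.isinf(hi) else 0.5 * (lo + hi)
    return alpha                                           # fallback after max_iter trials
```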

The above work is particularly important for the global convergence of algorithms, and the gradient Lipschitz continuity condition is usually assumed in the convergence proof of each algorithm. Next, we analyze the convergence of several classical CG formulas and how the gradient Lipschitz continuity condition is used in the corresponding proofs.

The global convergence of the FR method under the strong Wolfe line search was established in [1], where it is also shown that the method generates sufficient descent directions. In this global convergence proof, the following three conditions are required: (i) the objective function f is continuously differentiable; (ii) the level set S = {x ∈ Rn : f(x) ≤ f(x1)} is bounded; (iii) the gradient function g(x) satisfies the gradient Lipschitz continuity condition. Under these conditions, one obtains \(\liminf _{k \to \infty } \|g_{k}\|=0.\)

The DY method [4] is a classic CG formula that always generates descent directions under a standard Wolfe line search. The global convergence of the DY method is established under two conditions: (i) f is bounded below on Rn and is continuously differentiable in a neighborhood N of the level set S = {x ∈ Rn : f(x) ≤ f(x1)}; (ii) the gradient ∇f(x) is Lipschitz continuous in N, i.e., there exists a constant L > 0 such that ∥∇f(x) −∇f(y)∥≤ L∥x − y∥ for any x,y ∈ N. In addition, the DY method satisfies the Zoutendijk condition with

$$ {\sum}_{k=0 }^{\infty}\frac{({g_{k}^{T}}d_{k})^{2}}{\|d_{k}\|^{2}}<\infty $$

under the assumption of the gradient Lipschitz continuity condition.

In [16, 17], the PRP method is shown to converge at a quadratic rate, provided that the finite-dimensional smooth objective function f(x) is twice differentiable and strongly convex with a bounded second derivative; in particular, the gradient Lipschitz continuity condition then holds. Under these requirements one obtains

$$ \|x^{kN}-x^{*}\|\leq cq^{2k}, q<1. $$

Moreover, the gradient Lipschitz continuity condition is used in the proof of the above inequality. This also indicates that the PRP method with an exact line search is globally convergent under the assumption that f is strongly convex.

The method proposed by Hager and Zhang in 2005 (see [10]) is also a well-known CG formula. In its analysis, the step size is assumed to satisfy either the Goldstein conditions [8]

$$ \delta_{1}\alpha_{k}{g_{k}^{T}}d_{k}\leq f(x_{k}+\alpha_{k}d_{k})-f(x_{k})\leq \delta_{2}\alpha_{k}{g_{k}^{T}}d_{k}, $$

where \(0<\delta _{2}<\frac {1}{2}<\delta _{1}<1\) and αk > 0, or the Wolfe conditions [25, 26]. Assuming that the objective function f is strongly convex and its gradient satisfies the gradient Lipschitz continuity condition, global convergence can be established. Moreover, if the line search satisfies the Goldstein conditions, then

$$ \alpha_{k}\geq \frac{(1-\delta_{1})}{L}\frac{|{g_{k}^{T}}d_{k}|}{\|d_{k}\|^{2}}, $$

and if the line search satisfies the Wolfe conditions [25, 26], then

$$ \alpha_{k}\geq \frac{(1-\sigma)}{L}\frac{|{g_{k}^{T}}d_{k}|}{\|d_{k}\|^{2}}. $$
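For the Wolfe case, this lower bound follows directly from (1.6), the Cauchy-Schwarz inequality and the Lipschitz condition; we record the short derivation here for completeness:

$$ (\sigma-1){g_{k}^{T}}d_{k}\leq (g_{k+1}-g_{k})^{T}d_{k}\leq \|g_{k+1}-g_{k}\|\|d_{k}\|\leq L\alpha_{k}\|d_{k}\|^{2}, $$

and since \({g_{k}^{T}}d_{k}<0\) for a descent direction, \((\sigma-1){g_{k}^{T}}d_{k}=(1-\sigma)|{g_{k}^{T}}d_{k}|\), which gives the stated bound on αk.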

Inspired by the above observations, many scholars have proposed modified CG methods based on the PRP formula (see [7, 10, 11, 14, 27,28,29,30,31,32, 36, 38, 40] etc.). An improved three-term Polak-Ribière-Polyak (MPRP) method proposed by Zhang et al. [43] is one such result. The global convergence of the MPRP algorithm is proved by contradiction, and the bound ∥dk∥≤ Md (where Md is a positive constant) obtained during that proof naturally suggests a connection with the trust region property. It is well known that the trust region method is an effective approach to optimization problems ([13, 20,21,22,23,24, 33, 34] etc.) with desirable theoretical properties, i.e., very strong convergence features. The trust region subproblem used to find a trial step dk is

$$ \begin{array}{lll} &\min\limits_{d\in R^{n}} q_{k}(d)={g_{k}^{T}}d+\frac{1}{2}d^{T}B_{k}d,\\ &s.t.~~~\|d\| \leq {\Delta}_{k}, \end{array} $$

where Δk is called the trust region radius. In view of this subproblem, we say that dk possesses the trust region feature if it satisfies the relation

$$ \|d_{k}\| \leq c_{M}\|g_{k}\|. $$
(1.7)

holds, where cM is a positive constant. Inspired by the MPRP algorithm, we consider proving the global convergence of algorithms by combining it with trust region properties. However, it is well known that the gradient Lipschitz continuity condition is still needed to prove the global convergence of the MPRP algorithm. This raises the following question: can global convergence be established without the gradient Lipschitz continuity condition? At present, few scholars have obtained good results in this direction [12, 37]. Based on the MPRP method, this paper gives a positive answer, and the following results are obtained:

  • Algorithm 1 retains the sufficient descent property (1.4) of the MPRP algorithm, independent of any line search technique.

  • For nonconvex and gradient non-Lipschitz continuous functions, the global convergence of Algorithm 1 is established by combining trust region properties and using the weak Wolfe-Powell line search technique.

  • Algorithm 2 is further improved from Algorithm 1, and the global convergence can be obtained independently of the line search technique.

  • The proposed algorithms perform competitively with other similar algorithms.

The motivation and Algorithm 1 are presented in the next section. Section 3 establishes the sufficient descent property and the global convergence of Algorithm 1 under the WWP line search. In Section 4, Algorithm 2 is analyzed and its global convergence is established without the gradient Lipschitz continuity condition. In Section 5, numerical results on image restoration are reported for the A-T-PRP-A algorithm, the A-T-PRP-A-W algorithm, the MPRP algorithm and the Hager-Zhang algorithm.

2 Motivation and algorithm

The sufficient descent property plays an important role in the proof of the global convergence of CG algorithms. The first three-term CG formula, proposed by Zhang et al. [43], is called the MPRP method; it has the sufficient descent property described above with \({d_{k}^{T}}g_{k}=-\|g_{k}\|^{2}\) independent of any line search. The iterative formula for the search direction dk is as follows

$$ d_{k}=\left\{ \begin{array}{ll} -g_{k},& \text{if}~k=0,\\ -g_{k}+\frac{{g_{k}^{T}}y_{k-1}d_{k-1}-{g_{k}^{T}}d_{k-1}y_{k-1}}{\|g_{k-1}\|^{2}}, & \text{if}~k\geq 1. \end{array} \right. $$
(2.1)

The global convergence of the MPRP algorithm is proved under the Armijo line search technique and the gradient Lipschitz continuity condition. In addition, in the proof by contradiction, under the assumption ∥gk∥≥ 𝜖 > 0, one obtains

$$ \|d_{k}\|\leq M_{d}, $$

where Md is a positive constant. Inspired by the form of this bound, we consider incorporating the trust region property into the proof of global convergence and expect a trust region property of the following form:

$$ \|d_{k}\| \leq c_{M}\|g_{k}\| $$

holds, where cM is a positive constant. Accordingly, for gradient non-Lipschitz continuous functions, we present a new formula based on formula (2.1) and combine it with the trust region property to prove global convergence. The iterative formula for the search direction dk is as follows:

$$ d_{k}=\left\{ \begin{array}{ll} -g_{k},& \text{if}~k=0,\\ -g_{k}+\alpha_{k-1}\frac{{g_{k}^{T}}y_{k-1}d_{k-1}-{g_{k}^{T}}d_{k-1}y_{k-1}}{\|g_{k-1}\|^{2}}, & \text{if}~k\geq 1, \end{array} \right. $$
(2.2)

In this formula, the factor αk− 1 is added in front of the three-term part of formula (2.1), and the resulting formula is called an adaptive three-term CG formula. This modification is helpful for the implementation of the algorithm in this paper. The sufficient descent property, global convergence and other properties of this method will be proven in the next section. Based on the above discussions, Algorithm 1 is presented as follows; a small numerical sketch of its iteration is given after the listing.

Algorithm 1: An adaptive three-term PRP algorithm (A-T-PRP-A)

  1. Step 0:

    Given any initial point x0 ∈ Rn, d0 = −g0, constants 𝜖 ∈ (0,1), \(\delta \in (0,\frac {1}{2})\), σ ∈ (δ,1), μ > 0, and set k = 0.

  2. Step 1:

    Stop if ∥gk∥≤ 𝜖 holds.

  3. Step 2:

    Find αk satisfying the WWP line search (1.5) and (1.6).

  4. Step 3:

    The next point xk+ 1 = xk + αkdk is defined.

  5. Step 4:

    Stop if ∥gk+ 1∥≤ 𝜖 is true.

  6. Step 5:

    Compute the conjugate gradient direction by (2.2).

  7. Step 6:

    Set k = k + 1, and go to Step 2.
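To make the steps above concrete, the following is a compact sketch of Algorithm 1 in code; it reuses the hypothetical weak_wolfe_step helper from the WWP sketch in Section 1, and the function names and default parameters are our own choices, not a definitive implementation.

```python
import numpy as np

def adaptive_three_term_direction(g_k, g_prev, d_prev, alpha_prev):
    """Adaptive three-term PRP direction (2.2):
    d_k = -g_k + alpha_{k-1} (g_k^T y_{k-1} d_{k-1} - g_k^T d_{k-1} y_{k-1}) / ||g_{k-1}||^2."""
    y_prev = g_k - g_prev
    return -g_k + alpha_prev * (g_k.dot(y_prev) * d_prev
                                - g_k.dot(d_prev) * y_prev) / g_prev.dot(g_prev)

def a_t_prp_a(f, grad, x0, eps=1e-6, delta=0.2, sigma=0.9, max_iter=10000):
    """Sketch of Algorithm 1 (A-T-PRP-A) with the WWP line search."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                                    # Step 0: d_0 = -g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:                          # Steps 1/4: stopping test
            break
        alpha = weak_wolfe_step(f, grad, x, d, delta, sigma)  # Step 2: WWP search (1.5)-(1.6)
        x = x + alpha * d                                     # Step 3: x_{k+1} = x_k + alpha_k d_k
        g_new = grad(x)
        d = adaptive_three_term_direction(g_new, g, d, alpha) # Step 5: direction (2.2)
        g = g_new                                             # Step 6: k = k + 1
    return x
```

One can check numerically that g.dot(d) equals −∥g∥² at every iteration, in agreement with Lemma 1 below.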

3 The global convergence of Algorithm 1

In this section, we will prove that Algorithm 1 is globally convergent. First, the sufficient descent property of the new formula will be demonstrated.

Lemma 1

The direction generated by Algorithm 1 satisfies \({g_{k}^{T}}d_{k}=-\|g_{k}\|^{2}\), i.e., Algorithm 1 has the sufficient descent property (1.4).

Proof

Let {xk,dk,αk,gk} be the sequence generated by Algorithm 1. If k = 0, then d0 = −g0 and the sufficient descent condition (1.4) holds trivially. If k ≥ 1, the following relation can be obtained from the iterative formula (2.2):

$$ \begin{array}{@{}rcl@{}} {g_{k}^{T}}d_{k} &=&{g_{k}^{T}}(-g_{k}+\alpha_{k-1}\frac{{g_{k}^{T}}y_{k-1}d_{k-1}-{g_{k}^{T}}d_{k-1}y_{k-1}}{\|g_{k-1}\|^{2}})\\ &=&-\|g_{k}\|^{2}+{g_{k}^{T}}\frac{{g_{k}^{T}}y_{k-1}(x_{k}-x_{k-1})-{g_{k}^{T}}(x_{k}-x_{k-1})y_{k-1}}{\|g_{k-1}\|^{2}}\\ &=&-\|g_{k}\|^{2}. \end{array} $$

Then sufficient descent property (1.4) holds. The proof is complete. □

Remark 3.1

  1. (1)

    It is not difficult to observe that the direction has the property of sufficient descent and requires no additional assumptions.

  2. (2)

    From the above results, the following relation can be obtained:

    $$ -\|d_{k}\|\|g_{k}\|\leq {g_{k}^{T}}d_{k} = -\|g_{k}\|^{2}, $$

    this means that

    $$ \|g_{k}\|\leq \|d_{k}\|,\forall k. $$
    (3.1)

The following assumption is needed to prove the global convergence of Algorithm 1.

Assumption (I). The level set S = {x| f(x) ≤ f(x0)} is defined and bounded, and the nonconvex function f(x) is continuously differentiable and bounded below.

Remark 3.2

  1. (1)

    Under Assumption I, we conclude that there exists a constant c0 > 0 satisfying

    $$ \|x_{k}\|\leq c_{0}, ~\forall k. $$
  2. (2)

    From Assumption I, the nonconvex function f(x) is continuously differentiable, so the gradient function g(x) is continuous. In addition, the level set S is bounded; therefore, the continuous functions f(x) and g(x) are bounded on the level set S. Namely, there exists a constant Mg > 0 such that

    $$ \|g(x)\|\leq M_{g}, ~\forall x\in S. $$
    (3.2)
  3. (3)

    Note that gradient non-Lipschitz continuous functions are common. A simple example is the function \(f(x)=x^{\frac {3}{2}}\) on \(x\in [0,+\infty )\), whose gradient is \(g(x)=\frac {3}{2}x^{\frac {1}{2}}\); this gradient is not Lipschitz continuous at x = 0, as the short computation below shows.
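    Indeed, the difference quotient of the gradient at the origin is unbounded:

    $$ \frac{|g(x)-g(0)|}{|x-0|}=\frac{\frac{3}{2}x^{\frac{1}{2}}}{x}=\frac{3}{2}x^{-\frac{1}{2}}\rightarrow +\infty \quad \text{as}~x\rightarrow 0^{+}, $$

    so no finite constant L can satisfy |g(x) − g(0)|≤ L|x − 0| near x = 0.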

Theorem 1

Suppose that Assumption I is satisfied and the sequence {xk,dk,αk,gk} is generated by Algorithm 1. Then we get

$$ \liminf_{k \to \infty} \|g_{k}\|=0. $$
(3.3)

Proof

In this proof, global convergence is obtained by contradiction. First of all, suppose that

$$ \|g_{k}\|\geq\varepsilon_{0} $$
(3.4)

holds for all k. We now derive a contradiction. Using (1.5) and relation (1.4) of Lemma 1, we have

$$ f(x_{k}+\alpha_{k}d_{k})\leq f(x_{k})+\delta \alpha_{k}{g_{k}^{T}}d_{k} \leq f(x_{k})-\delta \alpha_{k}\|g_{k}\|^{2}, $$

and

$$ \delta \alpha_{k}\|g_{k}\|^{2}\leq f(x_{k})-f(x_{k}+\alpha_{k}d_{k}). $$

Summing the above inequalities from k = 0 to \(\infty \), we obtain

$$ \delta{\sum}_{k=0 }^{\infty}\alpha_{k}\|g_{k}\|^{2}<\infty, $$

then

$$ \lim_{k \rightarrow \infty}\alpha_{k}\|g_{k}\|^{2}=0. $$
(3.5)

Therefore, the following relation can be obtained from (3.4) and (3.5)

$$ \alpha_{k}\rightarrow 0, k\rightarrow \infty. $$

Next, similarly to the proof of Lemma 1, if k = 0, then d0 = −g0 and hence

$$ \|d_{0}\|=\|g_{0}\|. $$
(3.6)

If k ≥ 1, it can be deduced from (3.4) and Remark 3.2 that

$$ \begin{array}{@{}rcl@{}} \|d_{k}\|&=&\|-g_{k}+\alpha_{k-1}\frac{{g_{k}^{T}}y_{k-1}d_{k-1}-{g_{k}^{T}}d_{k-1}y_{k-1}}{\|g_{k-1}\|^{2}}\|\\ &=&\|-g_{k}+\frac{{g_{k}^{T}}y_{k-1}(x_{k}-x_{k-1})-{g_{k}^{T}}(x_{k}-x_{k-1})y_{k-1}}{\|g_{k-1}\|^{2}}\|\\ &\leq&\|g_{k}\|+\frac{2\|g_{k}\|\|y_{k-1}\|\|x_{k}-x_{k-1}\|}{\|g_{k-1}\|^{2}}\\ &\leq&\|g_{k}\|+\frac{2\|g_{k}\|(\|g_{k}\|+\|g_{k-1}\|)\cdot 2c_{0}}{\|g_{k-1}\|^{2}}\\ &\leq&(1+\frac{8c_{0}M_{g}}{{\epsilon_{0}^{2}}})\|g_{k}\|. \end{array} $$

Let \(M=1+\frac {8c_{0}M_{g}}{{\epsilon _{0}^{2}}}>0\). Combining the above derivation with (3.6), we obtain

$$ \|d_{k}\|\leq M\|g_{k}\|, $$
(3.7)

where k ≥ 0. In addition, according to (3.1) in Remark 3.1, we can infer that

$$ \|g_{k}\|\leq\|d_{k}\|\leq M\|g_{k}\|. $$

From the above derivation and the boundedness of the level set, the sequence {xk} has a convergent subsequence, i.e.,

$$ x_{k_{j}}\rightarrow x^{*}, j\rightarrow \infty, $$

where \(\{x_{k_{j}}\}\) is a convergent subsequence of {xk}. By Assumption I, the objective function f(x) is continuously differentiable. Therefore, for all 𝜖1 > 0, there exists an integer N1 > 0 such that

$$ \|g(x_{k_{j}})-g(x^{*})\|<\epsilon_{1}, \forall j>N_{1}. $$
(3.8)

Using (3.7), for all 𝜖2 > 0, there exists an integer N2 > 0 that satisfies

$$ \|d(x_{k_{j}})-d(x^{*})\|<\epsilon_{2}, \forall j>N_{2}. $$
(3.9)

From (3.4), (3.8), (3.9) and Remark 3.1(2), we obtain

$$ g(x^{*})^{T}d(x^{*})\leq -\|g(x^{*})\|^{2}\leq -\varepsilon_{0}^{2}<0. $$
(3.10)

According to the second relation (1.6) of the WWP line search, the following relation can be obtained

$$ g_{k+1}^{T}d_{k}\geq\sigma {g_{k}^{T}}d_{k}, $$

this means that

$$ g_{k+1}^{T}d_{k}-\sigma {g_{k}^{T}}d_{k}\geq0, $$

Let \(N={\max \limits } \{N_{1},N_{2}\}\). For k > N, we can deduce that

$$ \lim_{k \rightarrow \infty}(g_{k+1}^{T}d_{k}-\sigma {g_{k}^{T}}d_{k})=(1-\sigma)g(x^{*})^{T}d(x^{*})\geq0. $$

Finally, the above relation also implies that

$$ g(x^{*})^{T}d(x^{*})\geq 0 $$

holds. This contradicts relation (3.10), which was obtained under the hypothesis (3.4). Hence (3.3) is true, and the proof is completed. □

Remark 3.3

  1. (1)

    According to the above proof process, the global convergence can be proven in the absence of gradient Lipschitz continuity condition.

  2. (2)

    In the process of proving global convergence, we argue by contradiction, and the trust region property (3.7) simplifies the whole proof process.

4 Algorithm 2 and its proof of convergence

In the previous section, we used the WWP line search to determine αk and proposed a new CG formula for determining the descent direction dk. It is not difficult to observe that the proof of global convergence remains valid in the absence of the gradient Lipschitz continuity condition. This section goes a step further and gives an algorithm that requires neither a line search nor the gradient Lipschitz continuity condition, for which global convergence still holds. The algorithm is presented as follows, and a small numerical sketch is given after the listing:

Algorithm 2: An adaptive three-term PRP algorithm without gradient Lipschitz continuity condition (A-T-PRP-A-W)

  1. Step 0:

    Given any initial point x0 ∈ Rn, d0 = −g0, constants 𝜖 ∈ (0,1), δ0 ∈ (0,1), σ0 ∈ (0,1), μ > 0, and set k = 0.

  2. Step 1:

    Stop if ∥gk∥≤ 𝜖 holds.

  3. Step 2:

    The next point xk+ 1 = xk + dk is defined.

  4. Step 3:

    If \(f(x_{k+1})>f(x_{k})+\delta _{0}{g_{k}^{T}}d_{k}\), set \(f(x_{k+1})=f(x_{k})+\sigma _{0}\delta _{0}{g_{k}^{T}}d_{k}\).

  5. Step 4:

    Stop if ∥gk+ 1∥≤ 𝜖 is true.

  6. Step 5:

    Compute the conjugate gradient direction by (2.1).

  7. Step 6:

    Set k=k + 1, and go to Step 2.
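A compact sketch of Algorithm 2 in code is given below; it reuses the hypothetical adaptive_three_term_direction helper from the sketch of Algorithm 1, with α fixed to 1 so that (2.2) reduces to (2.1), and it tracks the (possibly relaxed) function value of Step 3 explicitly. Names and defaults are our own assumptions, not a definitive implementation.

```python
import numpy as np

def a_t_prp_a_w(f, grad, x0, eps=1e-6, delta0=0.2, sigma0=0.9, max_iter=10000):
    """Sketch of Algorithm 2 (A-T-PRP-A-W): no line search, alpha_k = 1."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                        # Step 0: d_0 = -g_0
    f_k = f(x)                                    # current (possibly relaxed) function value
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:              # Steps 1/4: stopping test
            break
        x = x + d                                 # Step 2: x_{k+1} = x_k + d_k
        g_new = grad(x)
        gtd = g.dot(d)                            # equals -||g_k||^2 by Lemma 1
        f_next = f(x)
        if f_next > f_k + delta0 * gtd:           # Step 3: relax the stored value (Case (ii))
            f_next = f_k + sigma0 * delta0 * gtd
        d = adaptive_three_term_direction(g_new, g, d, 1.0)  # Step 5: (2.2) with alpha = 1, i.e. (2.1)
        g, f_k = g_new, f_next                    # Step 6: k = k + 1
    return x
```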

Remark 4.1

  1. (1)

    The difference between Algorithm 1 and Algorithm 2 lies in Step 2 and Step 3. In Algorithm 2, αk is always taken to be 1, so formula (2.2) reduces to the classical formula (2.1).

  2. (2)

    According to the Taylor formula, we obtain

    $$ f(x_{k}+d_{k})=f(x_{k})+{g_{k}^{T}}d_{k}+O(\|d_{k}\|^{2}). $$

    By Lemma 1, \({g_{k}^{T}}d_{k}=-\|g_{k}\|^{2}\), so the Taylor expansion gives \(f(x_{k+1})= f(x_{k})-\|g_{k}\|^{2}+O(\|d_{k}\|^{2})\). Hence, if a sufficiently small scalar δ0 is chosen in the third step of Algorithm 2, the test \(f(x_{k+1})\leq f(x_{k})+\delta_{0}{g_{k}^{T}}d_{k}\) will be satisfied in most cases, which means that the replacement \(f(x_{k+1})=f(x_{k})+\sigma _{0}\delta _{0}{g_{k}^{T}}d_{k}\) is rarely needed. Therefore, the iterations can be divided into the following two cases:

    $$ \textbf{Case (i):}~ f(x_{k+1})\leq f(x_{k}) + \delta_{0}{g_{k}^{T}}d_{k} \qquad \text{and} \qquad \textbf{Case (ii):}~ f(x_{k+1}) = f(x_{k}) + \sigma_{0}\delta_{0}{g_{k}^{T}}d_{k}. $$
  3. (3)

    In the following, we denote by fb(xk+ 1) the value of f(xk+ 1) assigned in Case (ii).

Lemma 2

Suppose that Assumption I holds. Then, in Case (ii), fb(xk) is bounded below.

Proof

By the definition of fb, we have

$$ f^{b}(x_{k+1})=f(x_{k})+\sigma_{0}\delta_{0}{g_{k}^{T}}d_{k}=f(x_{k})-\sigma_{0}\delta_{0}\|g_{k}\|^{2}. $$

Then, by (3.2),

$$ f(x_{k})-\sigma_{0}\delta_{0}M_{g}^{2}\leq f^{b}(x_{k+1})\leq f(x_{k}). $$

Since f(x) is bounded below, the result of this lemma follows. The proof is complete. □

Theorem 2

Suppose that Assumption I holds. Then we get

$$ \lim_{k\rightarrow \infty}\|g_{k}\|=0. $$
(4.1)

Proof

This theorem is proved by considering the following four situations.

  • Situation I: Case (i) always holds. According to Assumption I and (1.4), we obtain

    $$ f(x_{k+1})\leq f(x_{k})+\delta_{0}{g_{k}^{T}}d_{k}=f(x_{k})-\delta_{0}\|g_{k}\|^{2}, $$

    then

    $$ \delta_{0}\|g_{k}\|^{2}\leq f(x_{k})-f(x_{k+1}). $$

    Summing the above inequalities from k = 0 to \(\infty \) and using Assumption I, we get

    $$ \delta_{0}{\sum}_{k=0 }^{\infty}\|g_{k}\|^{2}\leq f_{0}-f_{\infty}<\infty, $$

    so (4.1) is true.

  • Situation II: Case(ii) always holds. According to the above definition and (1.4), we obtain

    $$ \begin{array}{@{}rcl@{}} f^{b}(x_{1}) &=& f(x_{0})+\sigma_{0}\delta_{0}{g_{0}^{T}}d_{0} = f(x_{0})-\sigma_{0}\delta_{0}\|g_{0}\|^{2},\\ f^{b}(x_{2}) &=& f^{b}(x_{1})+\sigma_{0}\delta_{0}{g_{1}^{T}}d_{1} = f^{b}(x_{1})-\sigma_{0}\delta_{0}\|g_{1}\|^{2},\\ f^{b}(x_{3}) &=& f^{b}(x_{2})+\sigma_{0}\delta_{0}{g_{2}^{T}}d_{2} = f^{b}(x_{2})-\sigma_{0}\delta_{0}\|g_{2}\|^{2},\\ f^{b}(x_{4}) &=& f^{b}(x_{3})+\sigma_{0}\delta_{0}{g_{3}^{T}}d_{3} = f^{b}(x_{3})-\sigma_{0}\delta_{0}\|g_{3}\|^{2},\\ &\cdots,&\\ f^{b}(x_{k+1}) &=& f^{b}(x_{k})+\sigma_{0}\delta_{0}{g_{k}^{T}}d_{k}=f^{b}(x_{k})-\sigma_{0}\delta_{0}\|g_{k}\|^{2},\\ &\cdots,& \end{array} $$

    summing up the above relations yields

    $$ \sigma_{0}\delta_{0}{\sum}_{k=0}^{\infty}\|g_{k}\|^{2}=f(x_{0})-f^{b}(x_{\infty})<\infty, $$

    From Lemma 2 and Assumption I, it can also be inferred that (4.1) holds.

  • Situation III: Case (i) and Case (ii) both occur, alternating regularly. Then we have

    $$ \begin{array}{@{}rcl@{}} f^{b}(x_{1}) &=& f(x_{0})+\sigma_{0}\delta_{0}{g_{0}^{T}}d_{0}=f(x_{0})-\sigma_{0}\delta_{0}\|g_{0}\|^{2},\\ f(x_{2}) &\leq& f^{b}(x_{1})+\delta_{0}{g_{1}^{T}}d_{1}\leq f^{b}(x_{1})-\sigma_{0}\delta_{0}\|g_{1}\|^{2},\\ f^{b}(x_{3}) &=& f(x_{2})+\sigma_{0}\delta_{0}{g_{2}^{T}}d_{2}=f(x_{2})-\sigma_{0}\delta_{0}\|g_{2}\|^{2},\\ f(x_{4}) &\leq& f^{b}(x_{3})+\delta_{0}{g_{3}^{T}}d_{3}\leq f^{b}(x_{3})-\sigma_{0}\delta_{0}\|g_{3}\|^{2},\\ f^{b}(x_{5}) &=& f(x_{4})+\sigma_{0}\delta_{0}{g_{4}^{T}}d_{4}=f(x_{4})-\sigma_{0}\delta_{0}\|g_{4}\|^{2},\\ &\cdots,& \end{array} $$

    similar to Situations I and II, we can also obtain

    $$ \sigma_{0}\delta_{0}{\sum}_{k=0}^{\infty}\|g_{k}\|^{2}<\infty, $$

    which means that (4.1) is true.

  • Situation IV: Case (i) and Case (ii) alternate irregularly. By an argument similar to the above analysis, we can show that (4.1) is true. The proof is complete. □

5 Image restoration problems

In this section, numerical results on image restoration are presented. The aim is to restore the original image from an image damaged by impulse noise. Image restoration is an important class of test problems for optimization algorithms and arises in many fields. All the numerical tests are run on an Intel (R) Core (TM) i5-4590 CPU @ 3.30 GHz with 4.0 GB of RAM on the Windows 10 operating system.

Usually, after processing such as compression or restoration, the output image differs from the original image. In the experiments, to measure the image quality after processing, we use the peak signal-to-noise ratio (PSNR). In general, the higher the PSNR value, the better the quality of the output image. PSNR is therefore an objective standard for image evaluation, which is defined as

$$ \begin{array}{@{}rcl@{}} PSNR=10\times\log_{10}(\frac{(2^{n}-1)^{2}}{MSE}), \end{array} $$

where MSE is the mean square error between the original image and the processed image, and n is the number of bits per sample value. The algorithm stops if \(\frac {\|f_{k+1}-f_{k}\|}{\|f_{k}\|}<10^{-4}\) is satisfied. The parameters of the algorithms are: δ = 0.2,σ = 0.9,δ0 = 0.2,σ0 = 0.9,μ = 0.01.
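As an illustration, the PSNR used here can be computed as follows (a minimal sketch; the 8-bit default and the array handling are our own assumptions).

```python
import numpy as np

def psnr(original, restored, bits_per_sample=8):
    """PSNR = 10 * log10((2^n - 1)^2 / MSE), with MSE the mean square error
    between the original image and the processed image."""
    original = np.asarray(original, dtype=float)
    restored = np.asarray(restored, dtype=float)
    mse = np.mean((original - restored) ** 2)
    peak_sq = float(2 ** bits_per_sample - 1) ** 2
    return np.inf if mse == 0 else 10.0 * np.log10(peak_sq / mse)
```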

In the experiment, we choose Lena (512×512), Baboon (512×512), Barbara (512×512) and Cameraman (512×512) as the test images. We use the A-T-PRP-A algorithm and the A-T-PRP-A-W algorithm to restore the images after adding noise, and compare the results with those of the MPRP algorithm and the Hager-Zhang algorithm. The detailed performance results are shown in Figs. 1, 2 and 3, which describe the restorations obtained by the A-T-PRP-A algorithm, the A-T-PRP-A-W algorithm, the MPRP algorithm and the Hager-Zhang algorithm under noise levels of 30%, 45% and 60%, respectively. As shown in Figs. 1, 2 and 3, the A-T-PRP-A algorithm and the A-T-PRP-A-W algorithm can successfully restore the noisy images. In addition, Table 1 lists the PSNR values obtained in the experiment. According to Table 1, although the differences in the PSNR values of the restored images obtained by the four algorithms are small, the PSNR values obtained by the A-T-PRP-A algorithm and the A-T-PRP-A-W algorithm are superior to those obtained by the MPRP algorithm and the Hager-Zhang algorithm. This shows that the A-T-PRP-A algorithm and the A-T-PRP-A-W algorithm are competitive with the MPRP algorithm and the Hager-Zhang algorithm in performance.

Fig. 1

From left to right, the images disturbed by 30% salt-and-pepper noise, the images restored by A-T-PRP-A, the images restored by A-T-PRP-A-W algorithm, the images restored by MPRP algorithm and the images restored by Hager-Zhang algorithm

Fig. 2

From left to right, the images disturbed by 45% salt-and-pepper noise, the images restored by A-T-PRP-A, the images restored by A-T-PRP-A-W algorithm, the images restored by MPRP algorithm and the images restored by Hager-Zhang algorithm

Fig. 3

From left to right, the images disturbed by 60% salt-and-pepper noise, the images restored by A-T-PRP-A, the images restored by A-T-PRP-A-W algorithm, the images restored by MPRP algorithm and the images restored by Hager-Zhang algorithm

Table 1 Peak signal-to-noise ratio (PSNR) of A-T-PRP-A algorithm, A-T-PRP-A-W algorithm, MPRP algorithm and Hager-Zhang algorithm

6 Conclusion

For nonconvex and gradient non-Lipschitz continuous functions, two adaptive three-term PRP algorithms are presented in this paper. Algorithm 1 is proposed on the basis of the MPRP algorithm, and its sufficient descent property is preserved independently of any line search. The global convergence of Algorithm 1 is obtained using the WWP line search technique without the gradient Lipschitz continuity condition; the proof combines the trust region property with an argument by contradiction. Algorithm 2 is further improved from Algorithm 1 to obtain global convergence independently of the line search technique. In addition, numerical experiments show that Algorithm 1 and Algorithm 2 are competitive with other similar algorithms.

Algorithm 1 and Algorithm 2 are proposed based on the MPRP algorithm. In addition, it can be seen from the proofs of Lemma 1 and Theorem 1 in this paper that if the sufficient descent and trust region properties are satisfied, global convergence may be established. However, for classical CG algorithms such as the PRP algorithm, unfortunately even the sufficient descent property cannot be guaranteed.

In the future, several aspects can be further studied. The first is whether there exist other methods of proving the global convergence of CG algorithms for nonconvex functions without the gradient Lipschitz continuity condition. The second is the convergence of other improved CG methods in the absence of the gradient Lipschitz continuity condition.