1 Introduction

The conjugate gradient method (CGM) is well suited to large-scale unconstrained optimization problems owing to its simple structure and low storage requirements, and it has therefore been extended to solve optimization problems arising in many fields of science and engineering, for example, sparse recovery [6], heat conduction problems [21], the inverse blackbody radiation problem [15], and power systems [16, 22, 23].

In this paper, we consider the following unconstrained optimization problem:

$$\begin{aligned} \min \{ f(x)| \ x\in R^n\}, \end{aligned}$$

where \(f:R^n\rightarrow R\) is a continuously differentiable function. Denote \(g_{k}=g(x_{k})=\nabla f(x_{k})\) for \(x_{k}\in R^{n}\). For CGM, the classic formula for generating the new iterate \(x_{k+1}\) is

$$\begin{aligned}&\begin{array}{c}x_{k+1}=x_k+\alpha _k d_k, \end{array}\end{aligned}$$
(1.1)
$$\begin{aligned}&d_k=\left\{ \begin{array}{ll}-g_k,&{}k={0},\\ {}-g_k+\beta _k d_{k-1},&{}k\ge {1}, \end{array}\right. \end{aligned}$$
(1.2)

where \(d_{k}\) is the search direction, and \(\alpha _k\) is a step-length computed by a suitable inexact line search. Two popular inexact line searches are the weak Wolfe line search

$$\begin{aligned} \left\{ \begin{array}{ll} f(x_k+\alpha _k d_k)\le f(x_k)+\delta \alpha _k g_k^\mathrm {T}d_k,\\ g(x_k+\alpha _k d_k)^\mathrm {T} d_k\ge \sigma g_k^\mathrm {T}d_k, \end{array} \right. \end{aligned}$$
(1.3)

and the strong Wolfe line search

$$\begin{aligned} \left\{ \begin{array}{ll} f(x_k+\alpha _k d_k)\le f(x_k)+\delta \alpha _k g_k^\mathrm {T}d_k,\\ |g(x_k+\alpha _k d_k)^\mathrm {T} d_k|\le \sigma |g_k^\mathrm {T}d_k|, \end{array} \right. \end{aligned}$$
(1.4)

where the parameters \(\delta \) and \(\sigma \) in (1.3) and (1.4) are required to satisfy \(0<\delta<\sigma <1\). The scalar \(\beta _k\) in (1.2) is the so-called conjugate parameter, and different choices for \(\beta _k\) lead to different CGMs. The classical CGMs include the Hestenes–Stiefel (HS, 1952) CGM [11], the Fletcher–Reeves (FR, 1964) CGM [7], the Polak–Ribière–Polyak (PRP, 1969) CGM [18, 19] and the Dai–Yuan (DY, 1999) CGM [3], and their parameters \(\beta _k\) are specified as follows:

$$\begin{aligned} \beta _k^{\mathrm {HS}}= & {} \frac{g_k^{\mathrm {T}}y_{k-1}}{d_{k-1}^{\mathrm {T}}y_{k-1}}, \quad \beta _k^{\mathrm {FR}}=\frac{\Vert g_k\Vert ^2}{\Vert g_{k-1}\Vert ^2}, \\ \beta _k^{\mathrm {PRP}}= & {} \frac{g_k^{\mathrm {T}}y_{k-1}}{\Vert g_{k-1}\Vert ^2}, \quad \beta _k^{\mathrm {DY}}=\frac{\Vert g_k\Vert ^2}{d_{k-1}^{\mathrm {T}}y_{k-1}}. \end{aligned}$$

Here \(y_{k-1}=g_k-g_{k-1}\) and \(\Vert \cdot \Vert \) stands for the Euclidean norm.
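For concreteness, these four parameters can be evaluated as in the following minimal NumPy sketch (the function name and calling convention are ours, not part of any reference implementation):

```python
import numpy as np

def classical_betas(g, g_prev, d_prev):
    """Classical conjugate parameters, given the current gradient g = g_k,
    the previous gradient g_prev = g_{k-1} and direction d_prev = d_{k-1}."""
    y = g - g_prev                           # y_{k-1} = g_k - g_{k-1}
    beta_hs  = (g @ y) / (d_prev @ y)        # Hestenes-Stiefel
    beta_fr  = (g @ g) / (g_prev @ g_prev)   # Fletcher-Reeves
    beta_prp = (g @ y) / (g_prev @ g_prev)   # Polak-Ribiere-Polyak
    beta_dy  = (g @ g) / (d_prev @ y)        # Dai-Yuan
    return beta_hs, beta_fr, beta_prp, beta_dy
```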

In the past few decades, based on the methods above, many CGMs with excellent theoretical properties and numerical performance have been proposed; see [9, 10, 12,13,14, 20, 24,25,26] for details.

Based on the PRP CGM, Huang et al. [12] proposed a new CGM, in which \(\beta _k\) is given by

$$\begin{aligned} \beta _k^{\mathrm {MPRP}}=\frac{\Vert g_{k}\Vert ^{2}-\frac{(g_k^{\mathrm{T}}g_{k-1})^2}{\Vert g_{k-1}\Vert ^2}}{\Vert g_{k-1}\Vert ^2}. \end{aligned}$$
(1.5)

Zhou and Lu [25] modified the HS formula as follows:

$$\begin{aligned} \beta _k^{\mathrm {MHS}}=\frac{\Vert g_{k}\Vert ^{2}-\frac{(g_k^{\mathrm{T}}g_{k-1})^2}{\Vert g_{k-1}\Vert ^2}}{d_{k-1}^{\mathrm{T}}(g_k-g_{k-1})}. \end{aligned}$$
(1.6)
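Both (1.5) and (1.6) share the numerator \(\Vert g_{k}\Vert ^{2}-(g_k^{\mathrm{T}}g_{k-1})^2/\Vert g_{k-1}\Vert ^2\), which is nonnegative by the Cauchy–Schwarz inequality. A minimal sketch of the two formulas (function names are ours):

```python
import numpy as np

def mprp_mhs_numerator(g, g_prev):
    """Shared numerator of (1.5) and (1.6); nonnegative by Cauchy-Schwarz."""
    return g @ g - (g @ g_prev) ** 2 / (g_prev @ g_prev)

def beta_mprp(g, g_prev):
    return mprp_mhs_numerator(g, g_prev) / (g_prev @ g_prev)        # Eq. (1.5)

def beta_mhs(g, g_prev, d_prev):
    return mprp_mhs_numerator(g, g_prev) / (d_prev @ (g - g_prev))  # Eq. (1.6)
```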

Recently, Jiang and Jian [13] improved the FR and DY CGMs, and proposed three CGMs with good convergence and high efficiency, in which the conjugate parameters \(\beta _k^{\mathrm{IFR}}\), \(\beta _k^{\mathrm{IDY}}\) and \(\beta _k^{\mathrm{IFD}}\) are computed by

$$\begin{aligned} \beta _k^{\mathrm {IFR}}= & {} \tau _k \cdot \beta _k^{\mathrm {FR}}=\frac{|g_{k}^{\mathrm {T}} d_{k-1}|}{-g_{k-1}^{\mathrm {T}} d_{k-1}} \cdot \frac{\Vert g_{k}\Vert ^{2}}{\Vert g_{k-1}\Vert ^{2}}, \\ \beta _k^{\mathrm {IDY}}= & {} \tau _k \cdot \beta _k^{\mathrm {DY}}=\frac{|g_{k}^{\mathrm {T}} d_{k-1}|}{-g_{k-1}^{\mathrm {T}} d_{k-1}} \cdot \frac{\Vert g_{k}\Vert ^{2}}{d_{k-1}^{\mathrm {T}}y_{k-1}}, \\ \beta _k^{\mathrm{IFD}}= & {} \frac{\Vert g_{k}\Vert ^{2}}{\Vert g_{k-1}\Vert ^{2}}-\frac{|g_{k}^{\mathrm{T}} d_{k-1}|}{-g_{k-1}^{\mathrm{T}} d_{k-1}} \cdot \frac{\Vert g_{k}\Vert ^{2}}{d_{k-1}^{\mathrm{T}}(g_k-g_{k-1})}, \end{aligned}$$

where \(\tau _k=\frac{|g_{k}^{\mathrm {T}} d_{k-1}|}{-g_{k-1}^{\mathrm {T}} d_{k-1}}\) is the disturbance parameter.
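A minimal sketch of \(\tau _k\) and the three improved parameters, under the same conventions as above:

```python
import numpy as np

def tau(g, g_prev, d_prev):
    """Disturbance parameter tau_k = |g_k^T d_{k-1}| / (-g_{k-1}^T d_{k-1});
    assumes d_{k-1} is a descent direction, i.e. g_{k-1}^T d_{k-1} < 0."""
    return abs(g @ d_prev) / (-(g_prev @ d_prev))

def improved_betas(g, g_prev, d_prev):
    y = g - g_prev
    t = tau(g, g_prev, d_prev)
    beta_ifr = t * (g @ g) / (g_prev @ g_prev)   # tau_k * FR
    beta_idy = t * (g @ g) / (d_prev @ y)        # tau_k * DY
    beta_ifd = (g @ g) / (g_prev @ g_prev) - t * (g @ g) / (d_prev @ y)
    return beta_ifr, beta_idy, beta_ifd
```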

Based on the numerical results in Ref. [13], we further tested the \(\tau _k \cdot \mathrm{FR}, \ (1- \tau _k) \cdot \mathrm{FR}, \ \tau _k \cdot \mathrm{PRP}, \ (1- \tau _k) \cdot \mathrm{PRP}, \ \tau _k \cdot \mathrm{HS}, \ (1- \tau _k) \cdot \mathrm{HS}, \ \tau _k \cdot \mathrm{DY},\) and \((1- \tau _k) \cdot \mathrm{DY}\) CGMs, where the test problems are randomly taken from Refs. [1, 2, 17], and all step-lengths are generated by the strong Wolfe line search with \(\sigma =0.1, \ \delta =0.01\). The numerical results are illustrated by the performance profiles in Figs. 1 and 2.

Fig. 1 Performance profiles on Tcpu

Fig. 2 Performance profiles on Tcpu

We notice an interesting phenomenon: the methods perform well when the disturbance parameter is \({\tau _k}\) if the denominator of \(\beta _k\) is \(d_{k-1}^{\mathrm{T}}(g_k-g_{k-1})\), and when the disturbance parameter is \(1-{\tau _k}\) if the denominator of \(\beta _k\) is \(\Vert g_{k-1}\Vert ^{2}\). Along these lines, and inspired by [12, 13, 25], we propose two improved conjugate parameters as follows:

$$\begin{aligned}&\beta _k^{\mathrm {IMPRP}}=(1-{\tau _k}) \cdot \frac{\Vert g_{k}\Vert ^{2}-\frac{(g_k^{\mathrm{T}}g_{k-1})^2}{\Vert g_{k-1}\Vert ^2}}{\Vert g_{k-1}\Vert ^{2}}, \end{aligned}$$
(1.7)
$$\begin{aligned}&\beta _k^{\mathrm {IMHS}}={\tau _k} \cdot \frac{\Vert g_{k}\Vert ^{2}-\frac{(g_k^{\mathrm{T}}g_{k-1})^2}{\Vert g_{k-1}\Vert ^2}}{d_{k-1}^{\mathrm{T}}(g_k-g_{k-1})}. \end{aligned}$$
(1.8)

In particular, by the second inequality of the strong Wolfe line search (1.4), we know that \(0 \le {\tau _k= \frac{|g_k^{\mathrm{T}}d_{k-1}|}{-g_{k-1}^{\mathrm{T}}d_{k-1}}} \le \sigma \) if \(g_{k-1}^{\mathrm{T}} d_{k-1}<0\).
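In code, the two new parameters can be sketched as follows (a straightforward transcription of (1.7) and (1.8); the function names are ours):

```python
import numpy as np

def beta_imprp(g, g_prev, d_prev):
    """beta_k^{IMPRP} of Eq. (1.7): (1 - tau_k) times the MPRP parameter."""
    t = abs(g @ d_prev) / (-(g_prev @ d_prev))            # tau_k
    num = g @ g - (g @ g_prev) ** 2 / (g_prev @ g_prev)   # shared numerator
    return (1.0 - t) * num / (g_prev @ g_prev)

def beta_imhs(g, g_prev, d_prev):
    """beta_k^{IMHS} of Eq. (1.8): tau_k times the MHS parameter."""
    t = abs(g @ d_prev) / (-(g_prev @ d_prev))
    num = g @ g - (g @ g_prev) ** 2 / (g_prev @ g_prev)
    return t * num / (d_prev @ (g - g_prev))
```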

This paper is organized as follows. In Sect. 2, the algorithms and relevant properties are demonstrated. The preliminary numerical results are reported in Sect. 3. The conclusion is presented in the final section.

2 Algorithms and Relevant Properties

In this section, based on the conjugate parameters (1.7) and (1.8), we first introduce the corresponding algorithms with the strong Wolfe line search (1.4) as follows:

Step 0. Choose constants \(\epsilon >0\), \(0<\delta<\sigma <1\), and initial point \({ x_0}\in R^n\). Let \({ d_0=-g_0}\), \({ k:=0}\).

Step 1. If \(\Vert g_k\Vert \le \epsilon \), then stop. Otherwise, go to Step 2.

Step 2. Determine a step-length \(\alpha _k\) by the strong Wolfe line search (1.4).

Step 3. Let \(x_{k+1}:=x_k+\alpha _kd_k\), compute \(g_{k+1}=g(x_{k+1})\) and parameter \(\beta _{k+1}\) by (1.7) or (1.8), i.e., \(\beta _{k+1}=\beta _{k+1}^{\mathrm {IMPRP}}\) or \(\beta _{k+1}^{\mathrm {IMHS}}\).

Step 4. Compute \(d_{k+1}=-g_{k+1}+\beta _{k+1}d_k\). Let \(k:=k+1\), and go back to Step 1.

For the sake of convenience, we refer to the iterative methods generated by the algorithm with \(\beta _{k+1}^{\mathrm {IMPRP}}\) and with \(\beta _{k+1}^{\mathrm {IMHS}}\) as the IMPRP CGM and the IMHS CGM, respectively.
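For illustration, the whole algorithm can be sketched in Python as below. This is a minimal sketch under our own conventions, not the authors' MATLAB implementation; we use scipy.optimize.line_search, whose parameters c1 and c2 play the roles of \(\delta \) and \(\sigma \) in (1.4), as a stand-in for the strong Wolfe line search.

```python
import numpy as np
from scipy.optimize import line_search

def imprp_cgm(f, grad, x0, delta=0.01, sigma=0.1, eps=1e-6, max_itr=2000):
    """Sketch of the IMPRP CGM (Steps 0-4); replacing the beta formula
    below with (1.8) gives the IMHS CGM."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                    # Step 0: d_0 = -g_0
    for k in range(max_itr):
        if np.linalg.norm(g) <= eps:          # Step 1: stopping test
            return x, k
        # Step 2: strong Wolfe step-length (c1 = delta, c2 = sigma)
        alpha = line_search(f, grad, x, d, gfk=g, c1=delta, c2=sigma)[0]
        if alpha is None:                     # line search failed
            break
        x_new = x + alpha * d                 # Step 3: new iterate
        g_new = grad(x_new)
        t = abs(g_new @ d) / (-(g @ d))       # tau_{k+1}
        num = g_new @ g_new - (g_new @ g) ** 2 / (g @ g)
        beta = (1.0 - t) * num / (g @ g)      # beta^{IMPRP}, Eq. (1.7)
        d = -g_new + beta * d                 # Step 4: new direction
        x, g = x_new, g_new
    return x, max_itr

# Illustrative usage on a simple quadratic:
# x_star, itr = imprp_cgm(lambda x: 0.5 * x @ x, lambda x: x, np.ones(100))
```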

The following lemmas state that the search directions yielded by the above two CGMs all satisfy the sufficient descent condition and possess some useful properties.

Lemma 2.1

Let \(d_{k}\) be yielded by the IMPRP CGM. If \(0<\sigma <\frac{1}{2}\), then the relation

$$\begin{aligned} -\frac{1}{1-\sigma } \le \frac{g_k^{\mathrm {T}}d_k}{\Vert g_{k}\Vert ^2}\le -\frac{1-2\sigma }{1-\sigma } \quad (\forall \ k\ge { 0}) \end{aligned}$$
(2.1)

holds. In other words, the direction \(d_k\) satisfies the sufficient descent condition, that is

$$\begin{aligned} g_{k}^{\mathrm {T}}d_{k}\le -c_{\mathrm {1}}\Vert g_{k}\Vert ^2, \quad c_{\mathrm {1}}:=\frac{1-2\sigma }{1-\sigma }>0 \quad (\forall \ k\ge { 0}). \end{aligned}$$
(2.2)

Moreover, the relation \(0 \le \beta _k^{\mathrm {IMPRP}} \le \beta _k^{\mathrm {FR}}\) holds.

Proof

Observe that \(g_0^{\mathrm {T}}d_0=-\Vert g_0\Vert ^2\), so the relation (2.1) holds for \(k={0}\). Now, assume that (2.1) holds for \(k-1\) (\(k\ge {1}\)); we will prove that (2.1) holds for k. Firstly, it follows from \(g_{k-1}^{\mathrm {T}}d_{k-1}<0\) and the second inequality of (1.4) that

$$\begin{aligned} |g^{\mathrm {T}}_kd_{k-1}|\le \sigma |g_{k-1}^{\mathrm {T}}d_{k-1}|=-\sigma g_{k-1}^{\mathrm {T}}d_{k-1}, \end{aligned}$$

and we further have \(0 \le {\tau _k} \le \sigma .\) This together with (1.7) implies that

$$\begin{aligned} 0 \le \beta _k^{\mathrm {IMPRP}} = (1-{\tau _k}) \cdot \frac{\Vert g_{k}\Vert ^{2} (1-\cos ^{2} \xi _{k})}{\Vert g_{k-1}\Vert ^{2}} \le \frac{\Vert g_{k}\Vert ^{2}}{\Vert g_{k-1}\Vert ^2} = \beta _k^{\mathrm {FR}}, \end{aligned}$$
(2.3)

where \(\cos \xi _k = \frac{g_k^Tg_{k-1}}{\Vert g_k\Vert \Vert g_{k-1}\Vert }\).

To proceed, based on the inequality (2.3), and with the help of [8, Lemma 3.1], the relations (2.1) and (2.2) hold, and the proof is completed. \(\square \)
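The two-sided bound \(0 \le \beta _k^{\mathrm {IMPRP}} \le \beta _k^{\mathrm {FR}}\) in (2.3) is also easy to check numerically; the following sketch verifies it on random data, with \(\tau _k\) clamped to \([0,\sigma ]\) to emulate the bound enforced by the line search (1.4):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.1
for _ in range(1000):
    g, g_prev = rng.standard_normal(50), rng.standard_normal(50)
    d_prev = -g_prev + 0.1 * rng.standard_normal(50)  # near-descent direction
    if g_prev @ d_prev >= 0:                          # keep only descent cases
        continue
    t = min(abs(g @ d_prev) / (-(g_prev @ d_prev)), sigma)  # emulate 0 <= tau_k <= sigma
    num = g @ g - (g @ g_prev) ** 2 / (g_prev @ g_prev)     # >= 0 by Cauchy-Schwarz
    beta_imprp = (1.0 - t) * num / (g_prev @ g_prev)
    beta_fr = (g @ g) / (g_prev @ g_prev)
    assert 0.0 <= beta_imprp <= beta_fr + 1e-12             # the bound (2.3)
```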

Lemma 2.2

Let \(d_{k}\) be yielded by the IMHS CGM. If \(0<\sigma <1\), then the direction \(d_{k}\) satisfies the sufficient descent condition

$$\begin{aligned} g_k^{\mathrm {T}}d_k\le -c_{\mathrm {2}}\Vert g_{k}\Vert ^2, \ c_{\mathrm {2}}:=1-\sigma >0 \ (\forall \ k \ge { 0}), \end{aligned}$$
(2.4)

and the relation

$$\begin{aligned} 0 \le \beta _{k}^{\mathrm {IMHS}}\le \frac{g_k^{\mathrm {T}}d_k}{g_{k-1}^{\mathrm {T}}d_{k-1}} \ (\forall \ k \ge { 1}) \end{aligned}$$
(2.5)

holds.

Proof

We use mathematical induction. Firstly, it is easy to see that (2.4) holds for \(k={0}\) since \({d_{0}}={-g_{0}}.\) Assume now that (2.4) holds for \(k-1\) (\(k\ge {1}\)), which implies that \(g_{k-1}^{\mathrm {T}}d_{k-1}<0\). This together with the second inequality of (1.4) indicates that \(0 \le {\tau _k} \le \sigma \) and

$$\begin{aligned} d_{k-1}^{\mathrm {T}}(g_k-g_{k-1})\ge -(1-\sigma )g_{k-1}^{\mathrm {T}} d_{k-1}>0, \end{aligned}$$
(2.6)

which further gives

$$\begin{aligned} \beta _k^{\mathrm{IMHS}}={\tau _k} \cdot \frac{\Vert g_{k}\Vert ^{2}(1-\cos ^{2} \phi _k)}{d_{k-1}^{\mathrm{T}}(g_k-g_{k-1})} \ge 0, \end{aligned}$$
(2.7)

where \(\cos \phi _k = \frac{g_k^{\mathrm{T}}g_{k-1}}{\Vert g_k\Vert \Vert g_{k-1}\Vert }\). Multiplying both sides of (1.2) by \(g_{k}^{\mathrm {T}}\), and then using (2.7), we have

$$\begin{aligned} g_k^{\mathrm {T}}d_k =&-\Vert g_k\Vert ^2+{\tau _k} \cdot \frac{\Vert g_{k}\Vert ^{2}(1-\cos ^{2} \phi _k)}{d_{k-1}^{\mathrm{T}}(g_k-g_{k-1})} \cdot g_k^{\mathrm {T}}d_{k-1} \\ =&-\Vert g_k\Vert ^2+{\tau _k} \cdot \Vert g_{k}\Vert ^{2} \cdot (1-\cos ^{2} \phi _k)+{\tau _k} \cdot \frac{\Vert g_{k}\Vert ^{2}(1-\cos ^{2} \phi _k)}{d_{k-1}^{\mathrm{T}}(g_k-g_{k-1})} \cdot g_{k-1}^{\mathrm{T}}d_{k-1} \\ \le&-\Vert g_k\Vert ^2+{\tau _k} \cdot \Vert g_k\Vert ^2+\beta _k^{\mathrm{IMHS}} \cdot g_{k-1}^{\mathrm{T}}d_{k-1} \\ \le&-(1-\sigma )\Vert g_k\Vert ^2+\beta _k^{\mathrm{IMHS}} \cdot g_{k-1}^{\mathrm{T}}d_{k-1}. \end{aligned}$$
(2.8)

Finally, it follows from (2.7), (2.8) and \(g_{k-1}^{\mathrm {T}}d_{k-1}<0\) that relations (2.4), (2.5) hold, which ends the proof. \(\square \)

By [8, Lemma 3.1] for the IMPRP CGM and [4, Theorem 2.3] for the IMHS CGM, the global convergence results for the two methods follow directly; we omit the details for brevity.

3 Numerical Results

In this section, we report some preliminary numerical results for our methods to illustrate their effectiveness and potential. The numerical experiments are presented in the following two subsections.

3.1 Experimental Set-up

To demonstrate the effectiveness of our methods more clearly, we ran two groups of preliminary numerical experiments, one for the IMPRP CGM and one for the IMHS CGM.

  • In Group A, we compare the IMPRP CGM with the MPRP CGM [12] and the KD CGM [14]. A total of 102 problems are tested, in which problems 1–52 (from bard to vardim) come from the CUTE library [2], and the others come from [1, 17].

  • In Group B, the IMHS CGM is compared with the MHS CGM [25] and the NHS CGM [24]. Specifically, the test problems 1–62 (from bard to woods) are taken from the CUTE library [2], and the others (63–107) are taken from [1, 17].

In the testing process, the dimensions of the test problems range from 2 to 1,000,000. The initial points for all test problems are the same as those given in [1, 2, 17]. All codes were written in MATLAB R2014a and run on a DELL laptop (Intel(R) Core(TM) i5-5200U CPU, 2.20 GHz) with 4 GB of RAM and the Windows 10 operating system. For all methods, we adopt the strong Wolfe line search rule with \(\sigma =0.1, \ \delta =0.01\). We stop the iteration when \(\Vert g_k\Vert \le 10^{-6}\). In addition, when the number of iterations exceeds 2000 (\(\mathrm{Itr}>2000\)), we terminate the run, regard the test problem as unsolved, and denote it by “NaN”.

Table 1 Numerical comparisons of Group A
Table 2 Numerical comparisons of Group A

The comparison results are shown in Tables 1, 2, 3 and 4, where “name” is the abbreviation of the test problem and “n” is its dimension. Moreover, “\(\mathrm{Itr}\)”, “\(\mathrm{NF}\)”, “\(\mathrm{NG}\)”, “\(\mathrm{Tcpu}\)” and “\(\Vert g_{*}\Vert \)” denote the number of iterations, the number of function evaluations, the number of gradient evaluations, the CPU time, and the final gradient norm, respectively.

To show the test results clearly, we adopt the performance profiles introduced by Dolan and Moré [5] to compare the performance on Tcpu and Itr; see [5] for details. Generally speaking, the higher the curve \(\rho (\tau )\), the better the numerical performance of the corresponding method.
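For reproducibility, a Dolan–Moré profile can be computed as in the following sketch, where the cost matrix T (one row per problem, one column per solver, NaN marking a failure) would be filled from Tables 1–4; this is our own illustrative code, not the plotting script used for the figures:

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(T, labels):
    """Dolan-More performance profile: T[i, s] is the cost (e.g. Tcpu or Itr)
    of solver s on problem i, with np.nan marking a failure."""
    ratios = T / np.nanmin(T, axis=1, keepdims=True)  # performance ratios r_{i,s}
    ratios[np.isnan(ratios)] = np.inf                 # failures never count as solved
    taus = np.linspace(1.0, ratios[np.isfinite(ratios)].max(), 200)
    for s, name in enumerate(labels):
        rho = [(ratios[:, s] <= t).mean() for t in taus]  # fraction solved within factor t
        plt.step(taus, rho, where="post", label=name)
    plt.xlabel(r"$\tau$"); plt.ylabel(r"$\rho(\tau)$"); plt.legend(); plt.show()
```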

Fig. 3 Performance profiles on Tcpu of Group A

Fig. 4 Performance profiles on Itr of Group A

3.2 Numerical Testing Reports

We observe from Tables 1, 2 and Figs. 3, 4 that the IMPRP CGM solves about \(97\%\) of the test problems in Group A, while the MPRP CGM and the KD CGM solve about \(85\%\) and \(89\%\) of this group, respectively. Furthermore, the IMPRP CGM is the fastest, since it solves about 40% of the test problems with the least Itr and Tcpu, respectively.

Tables 3, 4 and Figs. 5, 6 show that the IMHS CGM exhibits the best performance, since it solves almost \(42\%\) of the test problems in Group B with the smallest Itr and Tcpu, whereas the MHS CGM and the NHS CGM are best on about \(25\%\) and \(30\%\) of the test problems, respectively. Moreover, the IMHS CGM solves about 97% of the test problems successfully, while the MHS CGM and the NHS CGM solve about 86% and 90% of this group, respectively.

In summary, the performance profiles in Figs. 3, 4, 5 and 6 show that the curves corresponding to the IMPRP CGM and the IMHS CGM lie at the top, which indicates that the proposed methods perform well, at least on these collections of test problems.

Table 3 Numerical comparisons of Group B
Table 4 Numerical comparisons of Group B
Fig. 5 Performance profiles on Tcpu of Group B

Fig. 6 Performance profiles on Itr of Group B

4 Conclusion

In this work, motivated by an interesting phenomenon observed in experiments, we have constructed two new conjugate parameters based on the second inequality of the strong Wolfe line search. Under general assumptions, the proposed methods satisfy the sufficient descent condition and are globally convergent. Finally, numerical experiments were carried out and reported, which show that the improved CGMs perform well and have promising application prospects.