1 Introduction

We study conjugate gradient methods for unconstrained optimization problems. It is well known that the Fletcher–Reeves (\(\textit{FR}\)) [1], Conjugate Descent (\(\textit{CD}\)) [2] and Dai–Yuan (\(\textit{DY}\)) [3] conjugate gradient methods have strong convergence properties, but they may not perform well in practice due to jamming. On the other hand, the Hestenes–Stiefel (\(\textit{HS}\)) [4], Polak–Ribière–Polyak (\(\textit{PRP}\)) [5, 6] and Liu–Storey (\(\textit{LS}\)) [7] conjugate gradient methods may not converge in general, but they often perform better than \(\textit{FR}\), \(\textit{CD}\) and \(\textit{DY}\). To combine the good numerical performance of the \(\textit{LS}\) method with the global convergence properties of the \(\textit{CD}\) method, Yang et al. [8] proposed a hybrid \(\textit{LS}\)–\(\textit{CD}\) method. Dai and Liao [9] proposed an efficient conjugate gradient-type method (the Dai–Liao method). Later, more efficient Dai–Liao-type conjugate gradient methods, known as \(\textit{DHSDL}\) and \(\textit{DLSDL}\), were designed and studied in [10].

Building on these results, we propose two efficient hybridizations of conjugate gradient (\(\textit{CG}\)) methods for solving unconstrained optimization problems, as well as a hybridization of the Broyden–Fletcher–Goldfarb–Shanno (\(\textit{BFGS}\)) method with \(\textit{CG}\) methods. A convergence analysis of the proposed methods is given. Numerical experiments show that our methods outperform the existing ones from [10].

The rest of the paper is organized as follows. In Sect. 2, we describe various possibilities for determining the step size and the search direction, which can be used to define conjugate gradient methods and their combinations. A hybridization of \(\textit{CG}\) methods with the \(\textit{BFGS}\) method is also presented there. A modification of the \(\textit{LSCD}\) method from [8], termed \(\textit{MLSCD}\), is proposed in Sect. 3, and the global convergence of the \(\textit{MLSCD}\) method for non-convex functions with the backtracking line search is proved. In Sect. 4, we combine the conjugate gradient parameters of the \(\textit{DHSDL}\) and \(\textit{DLSDL}\) methods from [10] and propose a modification of these methods, termed \(\textit{MMDL}\). The global convergence of the \(\textit{MMDL}\) method, supported by the backtracking line search, is proved. A new hybrid \(\textit{BFGS}\)-\(\textit{CG}\) search direction, which combines the search direction of the quasi-Newton \(\textit{BFGS}\) method with \(\textit{CG}\) methods, is proposed in Sect. 5. This method is termed H-\(\textit{BFGS}\)-\(\textit{CG}1\), and its global convergence with the backtracking line search is proved. In Sect. 6, we report numerical results and compare the performance of the proposed methods with some existing methods. Final conclusions are given in the last section.

2 Preliminaries

Unconstrained optimization considers the problem of minimizing an objective function \(f:\mathbb R^n\rightarrow \mathbb R\)

$$\begin{aligned} \min f(\mathbf {x}),\quad \mathbf {x}\in \mathbb R^n, \end{aligned}$$
(1)

that depends on the real variables \(\mathbf {x}=(x_1,\ldots ,x_n)^{\mathrm T}\) without any restrictions on their values. There are many methods for solving (1). Usually, the objective f is assumed to be a continuously differentiable function with gradient \(\mathbf {g}\). The general CG iterative scheme is given by

$$\begin{aligned} \mathbf {x}_{k+1}=\!{\mathbf x}_k+t_k{\mathbf d}_k,\quad k=\!0,1,\ldots , \end{aligned}$$
(2)

where \({\mathbf x}_k\) is the current iterate, \(t_k\) is the step size determined by one of the line search methods, and \({\mathbf d}_k\) is the search direction defined by

$$\begin{aligned} {\mathbf d}_k:=\mathfrak {d}_k:=\mathfrak {d}(\beta _k,{\mathbf g}_k,{\mathbf d}_{k-1})=\left\{ \begin{array}{ll} -\,\mathbf {g}_0, &{}\quad k=0,\\ -\,{\mathbf g}_k+\beta _k{\mathbf d}_{k-1}, &{}\quad k\ge 1, \end{array} \right. \end{aligned}$$
(3)

where \({\mathbf g}_k=\mathbf {g}({\mathbf x}_k)\) and \(\beta _k\) is an appropriately defined real scalar, known as the conjugate gradient parameter. A number of conjugate gradient methods have been defined using various modifications of the conjugate gradient direction \({\mathbf d}_k\) and the parameter \(\beta _k\). The most popular choices of the conjugate gradient parameter \(\beta _k\) are given by the following formulas:

$$\begin{aligned} \begin{aligned}&\beta _k^{\textit{FR}}=\dfrac{\Vert {\mathbf g}_k\Vert ^2}{\Vert {\mathbf g}_{k-1}\Vert ^2}\,\, [1],\quad \beta _k^{\textit{CD}}=-\dfrac{\Vert {\mathbf g}_k\Vert ^2}{{\mathbf g}_{k-1}^{\mathrm T}{\mathbf d}_{k-1}}\,\, [2],\quad \beta _k^{\textit{DY}}=\dfrac{\Vert {\mathbf g}_k\Vert ^2}{{\mathbf y}_{k-1}^{\mathrm T}{\mathbf d}_{k-1}}\,\, [3],\\&\beta _k^{\textit{HS}}=\dfrac{{\mathbf y}_{k-1}^{\mathrm T}{\mathbf g}_k}{{\mathbf y}_{k-1}^{\mathrm T}{\mathbf d}_{k-1}}\,\, [4],\quad \beta _k^{\textit{PRP}}=\dfrac{{\mathbf y}_{k-1}^{\mathrm T}{\mathbf g}_k}{\Vert {\mathbf g}_{k-1}\Vert ^2}\,\, [5,6],\quad \beta _k^{\textit{LS}}=-\dfrac{{\mathbf y}_{k-1}^{\mathrm T}{\mathbf g}_k}{{\mathbf g}_{k-1}^{\mathrm T}{\mathbf d}_{k-1}}\,\, [7], \\&\beta _k^{\textit{DHSDL}}=\frac{\Vert {\mathbf g}_k\Vert ^2-\frac{\Vert {\mathbf g}_k\Vert }{\Vert {\mathbf g}_{k-1}\Vert }\left| {\mathbf g}_k^{\mathrm T}{\mathbf g}_{k-1}\right| }{\mu \left| {\mathbf g}_k^{\mathrm T}{\mathbf d}_{k-1}\right| +{\mathbf d}_{k-1}^{\mathrm T}{\mathbf y}_{k-1}}- t_k\frac{{\mathbf g}_k^{\mathrm T}{\mathbf s}_{k-1}}{{\mathbf d}_{k-1}^{\mathrm T}{\mathbf y}_{k-1}}\,\, [10],\\&\beta _k^{\textit{DLSDL}}=\frac{\Vert {\mathbf g}_k\Vert ^2-\frac{\Vert {\mathbf g}_k\Vert }{\Vert {\mathbf g}_{k-1}\Vert }\left| {\mathbf g}_k^{\mathrm T}{\mathbf g}_{k-1}\right| }{\mu \left| {\mathbf g}_k^{\mathrm T}{\mathbf d}_{k-1}\right| -{\mathbf d}_{k-1}^{\mathrm T}{\mathbf g}_{k-1}}-t_k\frac{{\mathbf g}_k^{\mathrm T}{\mathbf s}_{k-1}}{{\mathbf d}_{k-1}^{\mathrm T}{\mathbf y}_{k-1}}\,\, [10], \end{aligned} \end{aligned}$$
(4)

where \({\mathbf y}_{k-1}={\mathbf g}_k-\mathbf {g}_{k-1}\), \(\mathbf {s}_{k-1}={\mathbf x}_k-{\mathbf x}_{k-1}\), \(\mu >1\) is a parameter of the \(\textit{DHSDL}\) and \(\textit{DLSDL}\) methods from [10], and \(\Vert \cdot \Vert \) stands for the Euclidean vector norm.
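
For illustration only, the classical parameters in (4) can be evaluated directly from the current and previous gradients and the previous direction. The following Python sketch is ours (the experiments in Sect. 6 were coded in Matlab); the function name and interface are assumptions made for this illustration.

```python
import numpy as np

def cg_parameters(g, g_prev, d_prev):
    """Classical CG parameters from (4) for given gradients and the previous direction."""
    y = g - g_prev                        # y_{k-1} = g_k - g_{k-1}
    return {
        "FR":  g.dot(g) / g_prev.dot(g_prev),
        "CD": -g.dot(g) / g_prev.dot(d_prev),
        "DY":  g.dot(g) / y.dot(d_prev),
        "HS":  y.dot(g) / y.dot(d_prev),
        "PRP": y.dot(g) / g_prev.dot(g_prev),
        "LS": -y.dot(g) / g_prev.dot(d_prev),
    }
```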

As usual, the step size \(t_k\) is determined using the backtracking line search procedure from [11]. It starts from \(t=1\) and ensures a sufficient decrease of the objective function in each iteration. We use Algorithm 1 from [12] for the inexact line search that determines \(t_k\).

[Algorithm 1 (backtracking line search) is presented as a figure in the original.]
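
Since Algorithm 1 is available only as a figure, the following Python sketch shows one standard form of such a backtracking procedure, consistent with the description above and with the parameter values \(\sigma \) and \(\beta \) reported in Sect. 6; the exact details of Algorithm 1 in [12] may differ.

```python
def backtracking_line_search(f, x, g, d, sigma=1e-4, beta=0.8):
    """Backtracking: start from t = 1 and multiply by beta until the sufficient
    decrease condition f(x + t*d) <= f(x) + sigma * t * g^T d holds.
    A minimal sketch; details of Algorithm 1 in [12] may differ."""
    t = 1.0
    fx = f(x)
    gtd = g.dot(d)  # negative for a descent direction
    while f(x + t * d) > fx + sigma * t * gtd:
        t *= beta
    return t
```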

To combine the good numerical performance of the \(\textit{PRP}\) method with the global convergence properties of the \(\textit{FR}\) method, Touati-Ahmed and Storey [13] proposed a hybrid \(\textit{PRP}\)–\(\textit{FR}\) method, called the H1 method in [14], with the conjugate gradient parameter defined as

$$\begin{aligned} \beta _k^{\textit{H}1}=\max \Big \{0, \min \Big \{\beta _k^{\textit{PRP}},\beta _k^{\textit{FR}}\Big \}\Big \}. \end{aligned}$$
(5)

Gilbert and Nocedal in [15] modified (5) to

$$\begin{aligned} \beta _k=\max \Big \{-\beta _k^{\textit{FR}},\min \Big \{\beta _k^{\textit{PRP}},\beta _k^{\textit{FR}}\Big \}\Big \}. \end{aligned}$$

A hybrid \(\textit{HS}\)–\(\textit{DY}\) conjugate gradient method was proposed by Dai and Yuan in [16]. This method is termed the H2 method in [14] and is defined by the conjugate gradient parameter

$$\begin{aligned} \beta _k^{\textit{H}2}=\max \Big \{0, \min \Big \{\beta _k^{\textit{HS}},\beta _k^{\textit{DY}}\Big \}\Big \}. \end{aligned}$$
(6)

Numerical results reported in [13, 16, 17] show that the H1 and H2 methods perform better than the \(\textit{PRP}\) method.

We consider hybrid CG algorithms where the search direction \({\mathbf d}_k:=\mathfrak {d}_k\), \(k\ge 1\), from (3) is modified using one of the following two rules:

$$\begin{aligned} {\mathbf d}_k:= & {} \mathfrak {D}(\beta _k,{\mathbf g}_k,{\mathbf d}_{k-1}) =-\left( 1+\beta _k\dfrac{{\mathbf g}_k^{\mathrm T}{\mathbf d}_{k-1}}{\Vert {\mathbf g}_k\Vert ^2}\right) {\mathbf g}_k+\beta _k{\mathbf d}_{k-1}, \end{aligned}$$
(7)
$$\begin{aligned} {\mathbf d}_k:= & {} \mathfrak {D}_1(\beta _k,{\mathbf g}_k,{\mathbf d}_{k-1}) =\!-B_k{\mathbf g}_k +\mathfrak {D}(\beta _k,{\mathbf g}_k,{\mathbf d}_{k-1}), \end{aligned}$$
(8)

and the conjugate gradient parameter \(\beta _k\) is defined using appropriate combinations of the parameters from (4) and previously defined hybridizations of these parameters.
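
For clarity, the two modification rules (7) and (8) can be sketched as follows (a Python sketch with NumPy arrays; B denotes the matrix \(B_k\) appearing in (8), which is defined by (14)–(15) below; the function names are ours):

```python
def direction_D(beta, g, d_prev):
    """Rule (7): d_k = -(1 + beta * g_k^T d_{k-1} / ||g_k||^2) * g_k + beta * d_{k-1}."""
    scale = 1.0 + beta * g.dot(d_prev) / g.dot(g)
    return -scale * g + beta * d_prev

def direction_D1(beta, g, d_prev, B):
    """Rule (8): d_k = -B_k g_k + D(beta, g_k, d_{k-1})."""
    return -B.dot(g) + direction_D(beta, g, d_prev)
```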

Following the idea of the descent three-term \(\textit{PRP}\) method from [18], Zhang et al. [19, 20] proposed a modification of the \(\textit{FR}\) method, termed the \(\textit{MFR}\) method, which uses the search direction

$$\begin{aligned} \begin{aligned} {\mathbf d}_k&:=\mathfrak {D}\Big (\beta _k^{\textit{FR}},{\mathbf g}_k,{\mathbf d}_{k-1}\Big ). \end{aligned} \end{aligned}$$
(9)

Zhang [20] also proposed a modified \(\textit{DY}\) method, known as the \(\textit{MDY}\) method, which is defined by the search direction

$$\begin{aligned} \begin{aligned} {\mathbf d}_k&:=\mathfrak {D}\Big (\beta _k^{\textit{DY}},{\mathbf g}_k,{\mathbf d}_{k-1}\Big ). \end{aligned} \end{aligned}$$
(10)

The \(\textit{MFR}\) and \(\textit{MDY}\) methods possess the very useful property

$$\begin{aligned} {\mathbf g}_k^{\mathrm T}{\mathbf d}_k =\! -\Vert {\mathbf g}_k\Vert ^2. \end{aligned}$$
(11)

Further, the \(\textit{MFR}\) and \(\textit{MDY}\) methods reduce to the \(\textit{FR}\) and \(\textit{DY}\) methods, respectively, if the exact line search is used. The \(\textit{MFR}\) method has been proved to be globally convergent for non-convex functions with the Wolfe or Armijo line search, and it is very efficient in practical computations [19].

However, it is not known whether the \(\textit{MDY}\) method converges globally. Therefore, in [14], the authors replaced \(\beta _k^{\textit{FR}}\) in (9) and \(\beta _k^{\textit{DY}}\) in (10) by \(\beta _k^{\textit{H}1}\) from (5) and \(\beta _k^{\textit{H}2}\) from (6), respectively. In this way, they defined new hybrid \(\textit{PRP}\)–\(\textit{FR}\) and \(\textit{HS}\)–\(\textit{DY}\) methods, called the \(\textit{NH}1\) and \(\textit{NH}2\) methods, respectively. These methods are based upon the search directions

$$\begin{aligned}&\text {NH1:}\quad {\mathbf d}_k :=\mathfrak {D}\Big (\beta _k^{H1},{\mathbf g}_k,{\mathbf d}_{k-1}\Big ), \end{aligned}$$
(12)
$$\begin{aligned}&\text {NH2:} \quad {\mathbf d}_k :=\mathfrak {D}\Big (\beta _k^{H2},{\mathbf g}_k,{\mathbf d}_{k-1}\Big ). \end{aligned}$$
(13)

It is easy to see that the \(\textit{NH}1\) and the \(\textit{NH}2\) methods still satisfy (11), which shows that they are descent methods.

On the other hand, the search direction \({\mathbf d}_k\) in quasi-Newton methods is obtained as a solution of the linear algebraic system

$$\begin{aligned} B_k{\mathbf d}_k=\!-{\mathbf g}_k, \end{aligned}$$
(14)

where \(B_k\) is an approximation of the Hessian. The initial approximation is the identity matrix \(B_0=I\), and the subsequent approximations \(B_k\) are generated by an appropriate update formula. Here we are interested in the \(\textit{BFGS}\) update formula, defined by

$$\begin{aligned} B_{k+1}=B_k-\dfrac{B_k{\mathbf s}_k {\mathbf s}_k^{\mathrm T}B_k}{{\mathbf s}_k^{\mathrm T}B_k{\mathbf s}_k}+\frac{{\mathbf y}_k{\mathbf y}_k^{\mathrm T}}{{\mathbf s}_k^{\mathrm T}{\mathbf y}_k}, \end{aligned}$$
(15)

where \({\mathbf s}_k={\mathbf x}_{k+1}-{\mathbf x}_k\), \({\mathbf y}_k={\mathbf g}_{k+1}-{\mathbf g}_k\). The following secant equation must hold:

$$\begin{aligned} B_{k+1}{\mathbf s}_k={\mathbf y}_k, \end{aligned}$$
(16)

which is possible only if the curvature condition

$$\begin{aligned} {\mathbf s}_k^{\mathrm T}\, {\mathbf y}_k>0 \end{aligned}$$
(17)

is satisfied.
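
As an illustration of the update (15) together with the curvature condition (17), a single BFGS update step might be sketched as follows; skipping the update when (17) fails is a common safeguard, not necessarily the choice made in the cited works.

```python
import numpy as np

def bfgs_update(B, s, y, eps=1e-12):
    """BFGS update (15); the update is skipped if the curvature condition (17) fails."""
    sy = s.dot(y)
    if sy <= eps:          # curvature condition s^T y > 0 violated
        return B
    Bs = B.dot(s)
    return B - np.outer(Bs, Bs) / s.dot(Bs) + np.outer(y, y) / sy
```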

A three-term hybrid \(\textit{BFGS}\) conjugate gradient method was proposed in [21]. That method exploits the best properties of both the \(\textit{BFGS}\) and \(\textit{CG}\) methods and defines a hybrid \(\textit{BFGS}\)-\(\textit{CG}\) method for solving selected unconstrained optimization problems, resulting in improvements in the total number of iterations and the CPU time.

3 A Mixed LS-CD Conjugate Gradient Method

We consider a modification of the \(\textit{LSCD}\) method, defined in [8] by

$$\begin{aligned} \begin{aligned} \beta _k^{\textit{LSCD}}&=\max \Big \{0,\min \Big \{\beta _k^{\textit{LS}},\beta _k^{\textit{CD}}\Big \}\Big \},\\ {\mathbf d}_k&=\mathfrak {d}\Big (\beta _k^{\textit{LSCD}},{\mathbf g}_k,{\mathbf d}_{k-1}\Big ) \end{aligned} \end{aligned}$$
(18)

and define the \(\textit{MLSCD}\) method with the search direction

$$\begin{aligned} {\mathbf d}_k\!:=\!\mathfrak {D}\Big (\beta _k^{\textit{LSCD}},{\mathbf g}_k,{\mathbf d}_{k-1}\Big ). \end{aligned}$$
(19)

In general, our idea is to replace the direction \(\mathfrak {d}(\beta _k^{\textit{LSCD}},{\mathbf g}_k,{\mathbf d}_{k-1})\) from [8] with \({\mathbf d}_k=\mathfrak {D}(\beta _k^{\textit{LSCD}},{\mathbf g}_k,{\mathbf d}_{k-1})\).

The corresponding computational procedure is given in Algorithm 2.

[Algorithm 2 (the MLSCD method) is presented as a figure in the original.]
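
Since Algorithm 2 is given only as a figure, the core of the MLSCD iteration, i.e., the direction (19) with the hybrid parameter (18), can be sketched as follows (a Python sketch reusing direction_D from Sect. 2; not the authors' implementation):

```python
def mlscd_direction(g, g_prev, d_prev):
    """MLSCD search direction (19): d_k = D(beta_k^{LSCD}, g_k, d_{k-1})."""
    y = g - g_prev
    beta_ls = -y.dot(g) / g_prev.dot(d_prev)       # beta_k^{LS}
    beta_cd = -g.dot(g) / g_prev.dot(d_prev)       # beta_k^{CD}
    beta_lscd = max(0.0, min(beta_ls, beta_cd))    # hybrid parameter (18)
    return direction_D(beta_lscd, g, d_prev)       # rule (7)
```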

3.1 Convergence of the MLSCD Conjugate Gradient Method

First, it is easy to prove the next theorem.

Theorem 3.1

   Let \(\beta _k\) be any \(\textit{CG}\) parameter. Then, the search direction \({\mathbf d}_k \!:=\!\mathfrak {D}(\beta _k,{\mathbf g}_k,{\mathbf d}_{k-1})\) satisfies

$$\begin{aligned} {\mathbf g}_k^{\mathrm T}{\mathbf d}_k=-\Vert {\mathbf g}_k\Vert ^2. \end{aligned}$$
(20)

We are going to prove the global convergence of the \(\textit{MLSCD}\) method under the following assumptions.

Assumption 3.1

(1)

    The level set \(S=\{\mathbf {x}\in \mathbb {R}^n|\ f(\mathbf {x})\le f(\mathbf {x}_0)\}\) is bounded.

(2)

    The objective function f is continuously differentiable in some neighborhood \(\mathcal {N}\) of S, and its gradient \(\mathbf {g}\) is Lipschitz continuous on \(\mathcal {N}\). Namely, there exists a constant \(L>0\) such that

    $$\begin{aligned} \Vert \mathbf {g}(\mathbf {x})-\mathbf {g}(\mathbf {y})\Vert \le L\Vert \mathbf {x}-\mathbf {y}\Vert ,\quad \text {for all}\quad \mathbf {x},\mathbf {y}\in \mathcal {N}. \end{aligned}$$
    (21)

It is well known that if Assumption 3.1 holds, then there exists a positive constant \(\Gamma \), such that

$$\begin{aligned} \Vert {\mathbf g}_k\Vert \le \Gamma ,\quad \text {for all}\quad k. \end{aligned}$$
(22)

The result of the next lemma, often called the Zoutendijk condition, is used to prove the global convergence of nonlinear \(\textit{CG}\) methods. It was originally given in [22].

Lemma 3.1

[22, 23] Let the conditions in Assumption 3.1 be satisfied. Let the sequence \(\{{\mathbf x}_k\}\) be generated by the \(\textit{MLSCD}\) method with the backtracking line search. Then it holds that

$$\begin{aligned} \sum _{k=0}^{\infty } {\dfrac{\Vert {\mathbf g}_k\Vert ^4}{\Vert {\mathbf d}_k\Vert ^2}}<+\,\infty . \end{aligned}$$
(23)

Proof

Consider the case where the backtracking line search is used. If \(t_k \ne \zeta \), where \(\zeta \) denotes the initial trial step size of the line search, then it is not difficult to show that there is a constant \(c>0\) such that

$$\begin{aligned} t_k \ge c {\dfrac{-{\mathbf g}_k^{\mathrm T}{\mathbf d}_k}{\Vert {\mathbf d}_k\Vert ^2}}= c {\dfrac{\Vert {\mathbf g}_k\Vert ^2}{\Vert {\mathbf d}_k\Vert ^2}}. \end{aligned}$$
(24)

This, together with the backtracking line search, implies that there exists a constant \(M > 0 \) such that

$$\begin{aligned} {\dfrac{\Vert {\mathbf g}_k\Vert ^4}{\Vert {\mathbf d}_k\Vert ^2}}\le M(f({\mathbf x}_k)-f({\mathbf x}_{k+1})). \end{aligned}$$
(25)

On the other hand, if \(t_k = \zeta \) then it follows that

$$\begin{aligned} {\dfrac{\Vert {\mathbf g}_k\Vert ^4}{\Vert {\mathbf d}_k\Vert ^2}}\le \Vert {\mathbf g}_k\Vert ^2\le \Vert {\mathbf d}_k\Vert ^2\le \delta ^{-1} \zeta ^{-2}(f({\mathbf x}_k)-f({\mathbf x}_{k+1})). \end{aligned}$$

This also implies (25), with \(M=\delta ^{-1} \zeta ^{-2}\). \(\square \)

Theorem 3.2

Let the conditions of Assumption 3.1 hold. Then the sequence \(\{{\mathbf x}_k\}\), generated by the \(\textit{MLSCD}\) method with the backtracking line search satisfies

$$\begin{aligned} \liminf _{k\rightarrow \infty }{\Vert {\mathbf g}_k\Vert }=0. \end{aligned}$$
(26)

Proof

To obtain a contradiction, suppose that (26) does not hold. Then there exists a constant \(c>0\) such that

$$\begin{aligned} \Vert {\mathbf g}_k\Vert \ge c,\quad \text {for all}\quad k\ge 0. \end{aligned}$$
(27)

Clearly, (19) can be rewritten in the form

$$\begin{aligned} {\mathbf d}_k=-l_k {\mathbf g}_k+\beta _k^{\textit{LSCD}}\,{\mathbf d}_{k-1},\quad l_k=1+\beta _k^{\textit{LSCD}}\,\frac{{\mathbf g}_k^{\mathrm T}{\mathbf d}_{k-1}}{\Vert {\mathbf g}_k\Vert ^2}. \end{aligned}$$
(28)

Now, from (28), it follows that

$$\begin{aligned} {\mathbf d}_k+l_k {\mathbf g}_k=\beta _k^{\textit{LSCD}}\,{\mathbf d}_{k-1}, \end{aligned}$$

which further implies

$$\begin{aligned} \begin{aligned} \Vert {\mathbf d}_k+l_k {\mathbf g}_k\Vert ^2\!&=\!\Big (\beta _k^{\textit{LSCD}}\Big )^2\Vert {\mathbf d}_{k-1}\Vert ^2,\\ \Vert {\mathbf d}_k\Vert ^2 + 2l_k {\mathbf d}_k^{\mathrm T}{\mathbf g}_k + l_k^2\Vert {\mathbf g}_k\Vert ^2 \!&=\! \Big (\beta _k^{\textit{LSCD}}\Big )^2\Vert {\mathbf d}_{k-1}\Vert ^2, \end{aligned} \end{aligned}$$

and subsequently

$$\begin{aligned} \Vert {\mathbf d}_k\Vert ^2 = \Big (\beta _k^{\textit{LSCD}}\Big )^2 \Vert {\mathbf d}_{k-1}\Vert ^2 - 2l_k {\mathbf d}_k^{\mathrm T}{\mathbf g}_k - l_k^2\Vert {\mathbf g}_k\Vert ^2. \end{aligned}$$
(29)

Notice that

$$\begin{aligned} \beta _k^{\textit{LSCD}}=\max \Big \{0,\min \Big \{\beta _k^{\textit{LS}},\beta _k^{\textit{CD}}\Big \}\Big \} \le |\beta _k^{\textit{CD}}|. \end{aligned}$$
(30)

Dividing both sides of (29) by \(({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2\), we get from (30), (20), (27) and the definition of \(\beta _k^{\textit{CD}}\) that

$$\begin{aligned} \begin{aligned} \frac{\Vert {\mathbf d}_k\Vert ^2}{\Vert {\mathbf g}_k\Vert ^4}=\frac{\Vert {\mathbf d}_k\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2}\!&=\!\Big (\beta _k^{\textit{LSCD}}\Big )^2 \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2} - \frac{2l_k {\mathbf d}_k^{\mathrm T}{\mathbf g}_k}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2} - l_k^2 \frac{\Vert {\mathbf g}_k\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2}\\&\le \Big (\beta _k^{\textit{CD}}\Big )^2 \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2} - \frac{2l_k}{{\mathbf g}_k^{\mathrm T}{\mathbf d}_k} - l_k^2 \frac{\Vert {\mathbf g}_k\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2}\\&=\left( \frac{\Vert {\mathbf g}_k\Vert ^2}{-{\mathbf g}_{k-1}^{\mathrm T}{\mathbf d}_{k-1}}\right) ^2 \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2} - \frac{2l_k}{{\mathbf g}_k^{\mathrm T}{\mathbf d}_k} - l_k^2 \frac{\Vert {\mathbf g}_k\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2}. \end{aligned} \end{aligned}$$
(31)

Now, applying (20) in (31), using the obtained estimate recursively, and taking into account \({\mathbf d}_0=-{\mathbf g}_0\) and (27), one can verify

$$\begin{aligned} \begin{aligned} \frac{\Vert {\mathbf d}_k\Vert ^2}{\Vert {\mathbf g}_k\Vert ^4}&\le \frac{\Vert {\mathbf g}_k\Vert ^4}{\Vert {\mathbf g}_{k-1}\Vert ^4}\,\, \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{\Vert {\mathbf g}_k\Vert ^4} + \frac{2l_k}{\Vert {\mathbf g}_k\Vert ^2} - l_k^2 \frac{\Vert {\mathbf g}_k\Vert ^2}{\Vert {\mathbf g}_k\Vert ^4}\\&=\frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{\Vert {\mathbf g}_{k-1}\Vert ^4} + \frac{2l_k}{\Vert {\mathbf g}_k\Vert ^2} - l_k^2 \frac{1}{\Vert {\mathbf g}_k\Vert ^2}\\&=\frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{\Vert {\mathbf g}_{k-1}\Vert ^4} - \frac{(l_k-1)^2}{\Vert {\mathbf g}_k\Vert ^2} +\frac{1}{\Vert {\mathbf g}_k\Vert ^2}\\&\le \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{\Vert {\mathbf g}_{k-1}\Vert ^4} +\frac{1}{\Vert {\mathbf g}_k\Vert ^2}\\&\le \sum _{j=0}^k\frac{1}{\Vert \mathbf {g}_j\Vert ^2}\\&\le \frac{k+1}{c^2}. \end{aligned} \end{aligned}$$

The last inequalities imply

$$\begin{aligned} \sum _{k\ge 1}\frac{\Vert {\mathbf g}_k\Vert ^4}{\Vert {\mathbf d}_k\Vert ^2}\ge c^2\sum _{k\ge 1}\frac{1}{k+1}=\infty , \end{aligned}$$
(32)

which contradicts (23). The proof is thus complete. \(\square \)

4 A Mixed DHSDL-DLSDL Conjugate Gradient Method

In this section, we propose the hybrid \(\textit{MMDL}\) method which is defined by the search direction \({\mathbf d}_k\) as follows:

$$\begin{aligned} \beta _k^{\textit{MMDL}}=\max \Big \{0,\min \Big \{\beta _k^{\textit{DHSDL}}, \beta _k^{\textit{DLSDL}}\Big \}\Big \}, \end{aligned}$$
(33)
$$\begin{aligned} {\mathbf d}_k=\mathfrak {D}\Big (\beta _k^{\textit{MMDL}},{\mathbf g}_k,{\mathbf d}_{k-1}\Big ). \end{aligned}$$
(34)

The computational algorithm of this method is presented in Algorithm 3.

[Algorithm 3 (the MMDL method) is presented as a figure in the original.]
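
Analogously to the previous sketch, the MMDL direction (33)–(34) can be written as follows (a Python sketch reusing direction_D; tk denotes the factor \(t_k\) appearing in (4), s_prev the step \(\mathbf {s}_{k-1}={\mathbf x}_k-{\mathbf x}_{k-1}\), and the value \(\mu =1.5\) below is only a placeholder assumption):

```python
import numpy as np

def mmdl_direction(g, g_prev, d_prev, tk, s_prev, mu=1.5):
    """MMDL search direction (34) with the hybrid parameter (33)."""
    y = g - g_prev
    num = g.dot(g) - (np.linalg.norm(g) / np.linalg.norm(g_prev)) * abs(g.dot(g_prev))
    dl_term = tk * g.dot(s_prev) / d_prev.dot(y)
    beta_dhsdl = num / (mu * abs(g.dot(d_prev)) + d_prev.dot(y)) - dl_term
    beta_dlsdl = num / (mu * abs(g.dot(d_prev)) - d_prev.dot(g_prev)) - dl_term
    beta_mmdl = max(0.0, min(beta_dhsdl, beta_dlsdl))   # hybrid parameter (33)
    return direction_D(beta_mmdl, g, d_prev)            # rule (7)
```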

4.1 Convergence of the MMDL Conjugate Gradient Method

We now prove the global convergence of the \(\textit{MMDL}\) method for arbitrary objective functions.

Theorem 4.1

Let the conditions in Assumption 3.1 hold. Then the sequence \(\{{\mathbf x}_k\}\) generated by the \(\textit{MMDL}\) method with the backtracking line search satisfies

$$\begin{aligned} \liminf _{k\rightarrow \infty }{\Vert {\mathbf g}_k\Vert }=0. \end{aligned}$$
(35)

Proof

Assume, on the contrary, that (35) does not hold. Then, there exists a constant \(c>0\) such that

$$\begin{aligned} \Vert {\mathbf g}_k\Vert \ge c,\quad \text {for all}\quad k. \end{aligned}$$
(36)

Denote

$$\begin{aligned} l_k=1+\beta _k^{\textit{MMDL}}\,\frac{{\mathbf g}_k^{\mathrm T}{\mathbf d}_{k-1}}{\Vert {\mathbf g}_k\Vert ^2}. \end{aligned}$$
(37)

Then we can write

$$\begin{aligned} {\mathbf d}_k+l_k {\mathbf g}_k=\beta _k^{\textit{MMDL}}\,{\mathbf d}_{k-1}, \end{aligned}$$
(38)

and further

$$\begin{aligned} \begin{aligned} \Vert {\mathbf d}_k+l_k {\mathbf g}_k\Vert ^2\!&=\!\Big (\beta _k^{\textit{MMDL}}\Big )^2\Vert {\mathbf d}_{k-1}\Vert ^2,\\ \Vert {\mathbf d}_k\Vert ^2 + 2l_k {\mathbf d}_k^{\mathrm T}{\mathbf g}_k + l_k^2\Vert {\mathbf g}_k\Vert ^2 \!&=\! \Big (\beta _k^{\textit{MMDL}}\Big )^2\Vert {\mathbf d}_{k-1}\Vert ^2. \end{aligned} \end{aligned}$$

Thus,

$$\begin{aligned} \Vert {\mathbf d}_k\Vert ^2 = \Big (\beta _k^{\textit{MMDL}}\Big )^2 \Vert {\mathbf d}_{k-1}\Vert ^2 - 2l_k {\mathbf d}_k^{\mathrm T}{\mathbf g}_k - l_k^2\Vert {\mathbf g}_k\Vert ^2. \end{aligned}$$
(39)

Bearing in mind that \(\mu >1\) and \({\mathbf d}_k^{\mathrm T}{\mathbf g}_k<0\), and applying the extended conjugacy condition \(\mathbf {d}_k^{\mathrm T}{\mathbf y}_{k-1}=t\,\mathbf {g}_k^{\mathrm T}{\mathbf s}_{k-1}\), \(t>0\), which was exploited in [9, 10], we get

$$\begin{aligned} \begin{aligned} {\beta }^{\textit{DHSDL}}_{k+1}&=\frac{{\left\| {\mathbf g}_{k+1}\right\| }^2-\frac{\left\| {\mathbf g}_{k+1}\ \right\| }{\left\| {\mathbf g}_k\ \right\| }\left| \mathbf {g}_{k+1}^{\mathrm T}{\mathbf g}_k\right| }{\mu \left| \mathbf {g}_{k+1}^{\mathrm T}{\mathbf d}_k\right| +\mathbf {d}_k^{\mathrm T}{\mathbf y}_k}-t_k\frac{\mathbf {g}_{k+1}^{\mathrm T}{\mathbf s}_k}{\mathbf {d}_k^{\mathrm T}{\mathbf y}_k}\\&\le \frac{{\left\| {\mathbf g}_{k+1}\right\| }^2-\frac{\left\| {\mathbf g}_{k+1}\ \right\| }{\left\| {\mathbf g}_k\ \right\| }\left| \mathbf {g}_{k+1}^{\mathrm T}{\mathbf g}_k\right| }{\mu \left| \mathbf {g}_{k+1}^{\mathrm T}{\mathbf d}_k\right| +\mathbf {d}_k^{\mathrm T}{\mathbf y}_k}\\&=\frac{{\left\| {\mathbf g}_{k+1}\right\| }^2-\frac{\left\| {\mathbf g}_{k+1}\ \right\| }{\left\| {\mathbf g}_k\ \right\| }\left| {\mathbf g}_{k+1}^{\mathrm T}{\mathbf g}_k\right| }{\mu \left| \mathbf {g}_{k+1}^{\mathrm T}{\mathbf d}_k\right| + \mathbf {d}_k^{\mathrm T}({\mathbf g}_{k+1}-{\mathbf g}_k)}\\&=\frac{{\left\| {\mathbf g}_{k+1}\right\| }^2-\frac{\left\| {\mathbf g}_{k+1}\ \right\| }{\left\| {\mathbf g}_k\ \right\| }\left| \mathbf {g}_{k+1}^{\mathrm T}{\mathbf g}_k\right| }{\mu \left| \mathbf {g}_{k+1}^{\mathrm T}{\mathbf d}_k\right| +\mathbf {d}_k^{\mathrm T}{\mathbf g}_{k+1}-{\mathbf d}_k^{\mathrm T}{\mathbf g}_k}\\&\le \frac{{\left\| {\mathbf g}_{k+1}\right\| }^2}{\mu \left| \mathbf {g}_{k+1}^{\mathrm T}{\mathbf d}_k\right| +{\mathbf d}_k^{\mathrm T}{\mathbf g}_{k+1}-\mathbf {d}_k^{\mathrm T}{\mathbf g}_k}\\&\le \frac{{\left\| {\mathbf g}_{k+1}\right\| }^2}{-\mathbf {d}_k^{\mathrm T}{\mathbf g}_k}. \end{aligned} \end{aligned}$$

Further,

$$\begin{aligned} \begin{aligned} {\beta }^{\textit{DLSDL}}_{k+1}&=\frac{{\left\| {\mathbf g}_{k+1}\right\| }^2-\frac{\left\| {\mathbf g}_{k+1}\ \right\| }{\left\| {\mathbf g}_k\ \right\| }\left| {\mathbf g}_{k+1}^{\mathrm T}{\mathbf g}_k\right| }{\mu \left| {\mathbf g}_{k+1}^{\mathrm T}{\mathbf d}_k\right| -{\mathbf d}_k^{\mathrm T}{\mathbf g}_k}-t_k\frac{{\mathbf g}_{k+1}^{\mathrm T}{\mathbf s}_k}{{\mathbf d}_k^{\mathrm T}{\mathbf y}_k}\\&\le \frac{{\left\| {\mathbf g}_{k+1}\right\| }^2-\frac{\left\| {\mathbf g}_{k+1}\ \right\| }{\left\| {\mathbf g}_k\ \right\| }\left| {\mathbf g}_{k+1}^{\mathrm T}{\mathbf g}_k\right| }{\mu \left| {\mathbf g}_{k+1}^{\mathrm T}{\mathbf d}_k\right| -{\mathbf d}_k^{\mathrm T}{\mathbf g}_k}\\&\le \frac{{\left\| {\mathbf g}_{k+1}\right\| }^2}{\mu \left| {\mathbf g}_{k+1}^{\mathrm T}{\mathbf d}_k\right| -{\mathbf d}_k^{\mathrm T}{\mathbf g}_k}\\&\le \frac{{\left\| {\mathbf g}_{k+1}\right\| }^2}{-{\mathbf d}_k^{\mathrm T}{\mathbf g}_k}. \end{aligned} \end{aligned}$$

Now, we easily conclude

$$\begin{aligned} \beta _k^{\textit{MMDL}}=\max \Big \{0,\min \Big \{\beta _k^{\textit{DHSDL}},\beta _k^{\textit{DLSDL}}\Big \}\Big \}\le \frac{\Vert {\mathbf g}_k\Vert ^2}{-{\mathbf d}_{k-1}^{\mathrm T}{\mathbf g}_{k-1}}. \end{aligned}$$
(40)

Next, dividing both sides of (39) by \(({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2\), using (20), (40) and (36), and applying the obtained estimate recursively together with \({\mathbf d}_0=-{\mathbf g}_0\), we get

$$\begin{aligned} \begin{aligned} \frac{\Vert {\mathbf d}_k\Vert ^2}{\Vert {\mathbf g}_k\Vert ^4}=\frac{\Vert {\mathbf d}_k\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2}&= \Big (\beta _k^{\textit{MMDL}}\Big )^2 \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2} - \frac{2l_k {\mathbf d}_k^{\mathrm T}{\mathbf g}_k}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2} - l_k^2 \frac{\Vert {\mathbf g}_k\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2}\\&= \Big (\beta _k^{\textit{MMDL}}\Big )^2 \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2} - \frac{2l_k}{{\mathbf g}_k^{\mathrm T}{\mathbf d}_k} - l_k^2 \frac{\Vert {\mathbf g}_k\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2}\\&\le \left( \frac{\Vert {\mathbf g}_k\Vert ^2}{-{\mathbf g}_{k-1}^{\mathrm T}{\mathbf d}_{k-1}}\right) ^2 \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2} - \frac{2l_k}{{\mathbf g}_k^{\mathrm T}{\mathbf d}_k} - l_k^2 \frac{\Vert {\mathbf g}_k\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2}\\&=\frac{\Vert {\mathbf g}_k\Vert ^4}{\Vert {\mathbf g}_{k-1}\Vert ^4}\,\, \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{\Vert {\mathbf g}_k\Vert ^4} + \frac{2l_k}{\Vert {\mathbf g}_k\Vert ^2} - l_k^2 \frac{\Vert {\mathbf g}_k\Vert ^2}{\Vert {\mathbf g}_k\Vert ^4}\\&=\frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{\Vert {\mathbf g}_{k-1}\Vert ^4} + \frac{2l_k}{\Vert {\mathbf g}_k\Vert ^2} - l_k^2 \frac{1}{\Vert {\mathbf g}_k\Vert ^2}\\&=\frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{\Vert {\mathbf g}_{k-1}\Vert ^4} - \frac{(l_k-1)^2}{\Vert {\mathbf g}_k\Vert ^2} +\frac{1}{\Vert {\mathbf g}_k\Vert ^2}\\&\le \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{\Vert {\mathbf g}_{k-1}\Vert ^4} +\frac{1}{\Vert {\mathbf g}_k\Vert ^2}\\&\le \sum _{j=0}^k\frac{1}{\Vert \mathbf {g}_j\Vert ^2}\\&\le \frac{k+1}{c^2}. \end{aligned} \end{aligned}$$
(41)

The inequalities in (41) imply

$$\begin{aligned} \sum _{k\ge 1}\frac{\Vert {\mathbf g}_k\Vert ^4}{\Vert {\mathbf d}_k\Vert ^2}\ge c^2\sum _{k\ge 1}\frac{1}{k+1}=\infty . \end{aligned}$$
(42)

Therefore, \(\Vert {\mathbf g}_k\Vert \ge c\) leads to a contradiction with (23). Consequently, (35) is verified. \(\square \)

5 Hybrid Three-Term Broyden–Fletcher–Goldfarb–Shanno Conjugate Gradient Methods in Solving Unconstrained Optimization Problems

It is known that conjugate gradient methods are better than quasi-Newton methods in terms of the CPU time, and \(\textit{BFGS}\) is more costly than \(\textit{CG}\) in terms of memory storage requirements. On the other hand, quasi-Newton methods are better in terms of the number of iterations and the number of function evaluations. For this reason, various hybridizations of quasi-Newton methods and \(\textit{CG}\) methods have been proposed. A hybrid method that solves systems of nonlinear equations by combining the quasi-Newton method with chaos optimization was proposed in [24]. In [25], the authors defined a combination of a quasi-Newton method and the Cauchy descent method for solving unconstrained optimization problems, known as the quasi-Newton SD method. In [21], the authors proposed a hybrid search direction that combines the quasi-Newton and \(\textit{CG}\) methods; the resulting method is known as the \(\textit{BFGS}\)-\(\textit{CG}\) method. The search direction of the method from [21] is defined by

$$\begin{aligned} {\mathbf d}_k={\left\{ \begin{array}{ll} -B_k{\mathbf g}_k,&{}\quad k=0,\\ -B_k{\mathbf g}_k+\eta (-{\mathbf g}_k+\beta _k{\mathbf d}_{k-1}),&{}\quad k\ge 1,\end{array}\right. } \end{aligned}$$

where \(\eta >0\) and \(\beta _k=\dfrac{{\mathbf g}_k^{\mathrm T}{\mathbf g}_{k-1}}{{\mathbf g}_k^{\mathrm T}{\mathbf d}_{k-1}}\). A hybrid search direction, combining the \(\textit{BFGS}\) update of the Hessian approximation with the conjugate coefficient \(\beta _k\), was proposed and investigated in [26, 27]. A hybrid \(\textit{DFP}\)-\(\textit{CG}\) method for solving unconstrained optimization problems was presented in [28].
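
For reference, the BFGS-CG direction from [21] quoted above can be sketched as follows (Python; \(\eta \) is a user-chosen positive parameter, and the default value below is only a placeholder assumption):

```python
def bfgs_cg_direction(k, B, g, g_prev, d_prev, eta=1.0):
    """Hybrid BFGS-CG direction from [21]: d_0 = -B_0 g_0 and, for k >= 1,
    d_k = -B_k g_k + eta * (-g_k + beta_k d_{k-1}), beta_k = g_k^T g_{k-1} / (g_k^T d_{k-1})."""
    if k == 0:
        return -B.dot(g)
    beta = g.dot(g_prev) / g.dot(d_prev)
    return -B.dot(g) + eta * (-g + beta * d_prev)
```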

5.1 Three-Term H-BFGS-CG1 Method

Our idea is to consider a three-term hybrid \(\textit{BFGS}\)-\(\textit{CG}\) method (called H-\(\textit{BFGS}\)-\(\textit{CG}1\)), defined by the search direction

$$\begin{aligned} {\mathbf d}_k\!:=\!{\left\{ \begin{array}{ll} -B_k{\mathbf g}_k,&{} \quad k=0,\\ \mathfrak {D}_1\Big (\beta _k^{\textit{LSCD}},{\mathbf g}_k,{\mathbf d}_{k-1}\Big ),&{} \quad k\ge 1. \end{array}\right. } \end{aligned}$$

Algorithm 4 defines the corresponding computational procedure of the H-\(\textit{BFGS}\)-\(\textit{CG}1\) method.

[Algorithm 4 (the H-BFGS-CG1 method) is presented as a figure in the original.]
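
Since Algorithm 4 is given only as a figure, the H-BFGS-CG1 direction defined above can be sketched as follows (a Python sketch reusing direction_D1 from Sect. 2 and the hybrid parameter \(\beta _k^{\textit{LSCD}}\) from (18); not the authors' implementation):

```python
def h_bfgs_cg1_direction(k, B, g, g_prev, d_prev):
    """H-BFGS-CG1 direction: d_0 = -B_0 g_0 and d_k = D_1(beta_k^{LSCD}, g_k, d_{k-1}), k >= 1."""
    if k == 0:
        return -B.dot(g)
    y = g - g_prev
    beta_ls = -y.dot(g) / g_prev.dot(d_prev)
    beta_cd = -g.dot(g) / g_prev.dot(d_prev)
    beta_lscd = max(0.0, min(beta_ls, beta_cd))   # hybrid parameter (18)
    return direction_D1(beta_lscd, g, d_prev, B)  # rule (8)
```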

5.2 Convergence Analysis of H-BFGS-CG1 Method

The following assumptions are required in this section.

Assumption 5.1

H1:

The objective function f is twice continuously differentiable.

H2:

The level set S is convex. Moreover, there exist positive constants \(c_1\) and \(c_2\) such that

$$\begin{aligned} c_1\Vert \mathbf {z}\Vert ^2\le \mathbf {z}^{\mathrm T}F(\mathbf {x})\mathbf {z}\le c_2\Vert \mathbf {z}\Vert ^2, \end{aligned}$$

for all \(\mathbf {z}\in \mathbb R^n\) and \(\mathbf {x}\in S\), where \(F(\mathbf {x})\) is the Hessian of f.

H3:

The gradient \(\mathbf {g}\) is Lipschitz continuous at the point \(\mathbf {x}^*\); that is, there exists a positive constant \(c_3\) satisfying

$$\begin{aligned} \Vert \mathbf {g}(\mathbf {x})-\mathbf {g}(\mathbf {x}^*)\Vert \le c_3\Vert \mathbf {x}-\mathbf {x}^*\Vert , \end{aligned}$$

for all \(\mathbf {x}\) in a neighborhood of \(\mathbf {x}^*\).

Theorem 5.1

[29] Let \(\{B_k\}\) be generated by the \(\textit{BFGS}\) update formula (15), where \({\mathbf {s}}_k={\mathbf {x}}_{k+1}-{\mathbf {x}}_k\), \({\mathbf {y}}_k={\mathbf {g}}_{k+1}-{\mathbf {g}}_k\). Assume that the matrix \(B_k\) is symmetric and positive definite and satisfies (16) and (17) for all k. Furthermore, assume that \(\{{\mathbf s}_k\}\) and \(\{{\mathbf y}_k\}\) satisfy the inequality

$$\begin{aligned} \frac{\Vert {\mathbf y}_k-G_*{\mathbf s}_k\Vert }{\Vert {\mathbf s}_k\Vert }\le \epsilon _k, \end{aligned}$$

for some symmetric and positive definite matrix \(G_*\) and for some sequence \(\{\epsilon _k\}\) possessing the property

$$\begin{aligned} \sum _{k=1}^{\infty }{\epsilon _k} <\infty . \end{aligned}$$

Then

$$\begin{aligned} \lim _{k\rightarrow \infty }{\dfrac{\Vert (B_k-G_*){\mathbf s}_k\Vert }{\Vert {\mathbf s}_k\Vert }}=0 \end{aligned}$$

and the sequences \(\{\Vert B_k\Vert \}\), \(\{\Vert B_k^{-1}\Vert \}\) are bounded.

Theorem 5.2

(Sufficient descent and global convergence.) Consider Algorithm 4. Assume that the conditions H1, H2 and H3 in Assumption 5.1 are satisfied as well as conditions of Theorem 5.1. Then

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert {\mathbf g}_k\Vert ^2=0. \end{aligned}$$

Proof

   It holds

$$\begin{aligned} \begin{aligned} {\mathbf g}_k^{\mathrm T}{\mathbf d}_k&=-{\mathbf g}_k^{\mathrm T}B_k {\mathbf g}_k-{\mathbf g}_k^{\mathrm T}{\mathbf g}_k-\beta _k^{\textit{LSCD}}{\mathbf g}_k^{\mathrm T}{\mathbf d}_{k-1}+\beta _k^{\textit{LSCD}}{\mathbf g}_k^{\mathrm T}{\mathbf d}_{k-1}\\ {}&\le -c_1\Vert {\mathbf g}_k\Vert ^2-\Vert {\mathbf g}_k\Vert ^2\\ {}&=-(c_1+1)\Vert {\mathbf g}_k\Vert ^2\le -\Vert {\mathbf g}_k\Vert ^2,\,\ c_1+1>1, \end{aligned} \end{aligned}$$
(43)

wherefrom we conclude that the sufficient descent condition holds.

Further, from the backtracking line search condition and (43), it holds that

$$\begin{aligned} f({\mathbf x}_k)-f({\mathbf x}_k+t_k{\mathbf d}_k)\ge -\sigma t_k{\mathbf g}_k^{\mathrm T}{\mathbf d}_k\ge \sigma t_k\Vert {\mathbf g}_k\Vert ^2. \end{aligned}$$
(44)

Since \(f({\mathbf x}_k)\) is decreasing and the sequence \(\{f({\mathbf x}_k)\}\) is bounded below by H2, we have that

$$\begin{aligned} \lim _{k\rightarrow \infty }{(f({\mathbf x}_k)-f({\mathbf x}_k+t_k{\mathbf d}_k))}=0. \end{aligned}$$
(45)

Hence, (44) and (45) imply

$$\begin{aligned} \lim _{k\rightarrow \infty }{\sigma t_k\Vert {\mathbf g}_k\Vert ^2}=0. \end{aligned}$$

Now, from \(t_k>0\) and \(\sigma >0\), we have

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert {\mathbf g}_k\Vert ^2=0, \end{aligned}$$

which completes the proof. \(\square \)

6 Numerical Experiments

In this section, we present numerical results obtained by testing the \(\textit{DHSDL}\), \(\textit{DLSDL}\), \(\textit{MMDL}\), \(\textit{MLSCD}\) and H-\(\textit{BFGS}\)-\(\textit{CG}1\) methods. The codes used in the tests were written in the Matlab programming language, and the tests were performed on a workstation with an Intel Core i3 2.0 GHz processor. The number of iterations, the number of function evaluations and the CPU time are analyzed for all tested methods.

In the first part, we compare the \(\textit{DHSDL}\), \(\textit{DLSDL}\), \(\textit{MMDL}\) and \(\textit{MLSCD}\) methods. The 33 test functions are selected from [30]. For each function, we considered 10 numerical experiments with the number of variables equal to 100, 500, 1000, 2000, 3000, 5000, 7000, 8000, 10,000 and 15,000. Summary numerical results for \(\textit{DHSDL}\), \(\textit{DLSDL}\), \(\textit{MMDL}\) and \(\textit{MLSCD}\), tested on these 33 large-scale test functions, are presented in Tables 1, 2 and 3.

The stopping conditions for all algorithms are

$$\begin{aligned} \Vert {\mathbf g}_k\Vert \le 10^{-6} \quad \mathrm {and}\quad \frac{|f({\mathbf x}_{k+1})-f({\mathbf x}_k)|}{1+|f({\mathbf x}_k)|} \le 10^{-16}. \end{aligned}$$

The backtracking parameters for all algorithms are \(\sigma = 0.0001\) and \(\beta =0.8\), which means that we accept a small decrease in f predicted by a linear approximation at the current point.
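
As a sketch, the stopping test described above corresponds to the following check (Python; the function name and interface are ours, not the code used in the experiments):

```python
import numpy as np

def stopping_test(g, f_new, f_old, tol_g=1e-6, tol_f=1e-16):
    """Stop when ||g_k|| <= 1e-6 and |f(x_{k+1}) - f(x_k)| / (1 + |f(x_k)|) <= 1e-16."""
    return np.linalg.norm(g) <= tol_g and abs(f_new - f_old) / (1.0 + abs(f_old)) <= tol_f
```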

Table 1 Summary numerical results for the number of iterations
Table 2 Summary numerical results for the number of function evaluations

The performance profile, proposed in [31], is a widely used tool for benchmarking and comparing the performance of optimization software on a large test set. As usual, the number of iterations, the number of function evaluations and the computing time (CPU time) are used as performance measures. Figure 1 (left) shows the performance of the compared methods with respect to the number of iterations, while Fig. 1 (right) illustrates their performance with respect to the number of function evaluations. The top curve corresponds to the method that exhibits the best performance with respect to the chosen measure.
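
For completeness, a performance profile in the sense of [31] can be computed roughly as follows (a Python sketch under the assumption that failures are encoded as infinite costs; this is not the code used to produce the figures):

```python
import numpy as np

def performance_profile(T, taus):
    """Performance profiles [31]: T is an (n_problems x n_solvers) array of a cost
    metric (iterations, function evaluations or CPU time). For each tau in taus,
    returns the fraction of problems each solver solves within a factor tau of the best."""
    ratios = T / T.min(axis=1, keepdims=True)   # performance ratios r_{p,s}
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])
```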

Table 3 Summary numerical results for the CPU time

Figure 2 shows the performance of the considered methods relative to the CPU time.

Figure 1 (left) shows that the \(\textit{DHSDL}\), \(\textit{DLSDL}\), \(\textit{MMDL}\) and \(\textit{MLSCD}\) methods successfully solve all the problems and that the \(\textit{MLSCD}\) method is the best in 82% of the test problems, compared to the other methods.

Figure 1 (right) shows that the \(\textit{DHSDL}\), \(\textit{DLSDL}\), \(\textit{MMDL}\) and \(\textit{MLSCD}\) methods successfully solve all the problems and that the \(\textit{MLSCD}\) method is the best in 78% of the test problems, compared to the other methods.

Fig. 1 Performance profiles based on the number of iterations (left) and function evaluations (right)

Fig. 2 Performance profile based on CPU time

The graphs in Fig. 2 show that the \(\textit{DHSDL}\), \(\textit{DLSDL}\), \(\textit{MMDL}\) and \(\textit{MLSCD}\) methods successfully solve all the problems and that the \(\textit{MLSCD}\) method is the best in 73% of the test problems, compared to the other methods.

Consequently, Figs. 1 and 2 show that the \(\textit{MLSCD}\) method generates the best result with respect to all three considered criteria.

Table 4 contains the average number of iterations, the average CPU time and the average number of function evaluations for all 330 numerical experiments.

Table 4 Average numerical outcomes for 33 test functions tested on 10 numerical experiments

Based on the results in Table 4, it can be observed that the \(\textit{MLSCD}\) method gives better results than the \(\textit{DHSDL}\), \(\textit{DLSDL}\) and \(\textit{MMDL}\) methods. This conclusion is confirmed by the performance profiles for the number of iterations, the number of function evaluations and the CPU time.

In order to compare all five methods (\(\textit{DHSDL}\), \(\textit{DLSDL}\), \(\textit{MMDL}\), \(\textit{MLSCD}\) and H-\(\textit{BFGS}\)-\(\textit{CG}1\)), we reduce the number of variables because of the high average CPU time of the H-\(\textit{BFGS}\)-\(\textit{CG}1\) method. For each test problem, we considered 10 numerical experiments with the number of variables equal to 100, 200, 300, 500, 700, 800, 1000, 1500, 2000 and 3000. The stopping conditions are the same as in the previous tests, and the values of the backtracking parameters for all methods are identical. Summary numerical results for all five methods, tested on 26 large-scale test functions, are presented in Tables 5, 6 and 7. Figures 3 and 4 show the performance of these methods with respect to the number of iterations, the number of function evaluations and the CPU time.

In Fig. 3 (left), we see that all five methods successfully solve all the problems; the \(\textit{MLSCD}\) method is the best in 42% of the test problems, compared to \(\textit{DHSDL}\) (7%), \(\textit{DLSDL}\) (7%), \(\textit{MMDL}\) (23%) and H-\(\textit{BFGS}\)-\(\textit{CG}1\) (42%).

In Fig. 3 (right), we see that all five methods successfully solve all test problems; the \(\textit{MLSCD}\) method is the best in 38% of the test problems, compared to \(\textit{DHSDL}\) (12%), \(\textit{DLSDL}\) (15%), \(\textit{MMDL}\) (30%) and H-\(\textit{BFGS}\)-\(\textit{CG}1\) (34%).

In Fig. 4, we see that all five methods successfully solve all the problems; the \(\textit{MLSCD}\) method is the best in 50% of the test problems, compared to \(\textit{DHSDL}\) (23%), \(\textit{DLSDL}\) (12%), \(\textit{MMDL}\) (35%) and H-\(\textit{BFGS}\)-\(\textit{CG}1\) (0%).

Table 8 contains the average number of iterations, the average CPU time, and the average number of function evaluations for all 260 numerical experiments.

Table 5 Summary numerical results for the number of iterations
Table 6 Summary numerical results for the number of function evaluations
Table 7 Summary numerical results for the CPU time
Fig. 3 Performance profiles based on the number of iterations (left) and function evaluations (right)

Fig. 4 Performance profile based on CPU time

The results in Table 8 show remarkable progress in reducing the number of iterations and the number of function evaluations by the H-\(\textit{BFGS}\)-\(\textit{CG}1\) method. Compared to the \(\textit{DHSDL}\), \(\textit{DLSDL}\), \(\textit{MMDL}\) and \(\textit{MLSCD}\) methods, the H-\(\textit{BFGS}\)-\(\textit{CG}1\) method achieves a lower average number of iterative steps, and it also achieves a lower average number of function evaluations than the \(\textit{DHSDL}\), \(\textit{DLSDL}\) and \(\textit{MMDL}\) methods.

However, if we observe the average CPU time for all five methods, we conclude that the H-\(\textit{BFGS}\)-\(\textit{CG}1\) method is slow. The same conclusion follows from the graphs in Fig. 4.

From the above, we conclude that the \(\textit{MLSCD}\) method is the most efficient with respect to all three underlying metrics: the number of iterations, the number of function evaluations and the CPU time.

7 Conclusions

We propose three efficient conjugate gradient methods for unconstrained optimization problems, in which the search directions always satisfy the sufficient descent condition. These methods are derived using various modifications of the conjugate gradient direction \(\mathbf {d}_k\) of the form (7) or (8), or using various combinations of the scaling parameters \(\beta _k\). The comparison criteria are the number of iterative steps, the CPU time and the number of function evaluations. Based on the backtracking line search conditions, we show that our methods are strongly convergent for uniformly convex functions and globally convergent for general functions. Numerical results illustrate that the proposed methods can outperform the existing ones. They also show that hybridizing a quasi-Newton method with a \(\textit{CG}\) method reduces the number of iterations and the number of function evaluations, but increases the CPU time.

Table 8 Average numerical outcomes for 26 test functions tested on 10 numerical experiments