1 Introduction

We study conjugate gradient methods for unconstrained optimization problems. It is well known that the Fletcher–Reeves (\(\textit{FR}\)) [1], Conjugate Descent (\(\textit{CD}\)) [2] and Dai–Yuan (\(\textit{DY}\)) [3] conjugate gradient methods have strong convergence properties, but they may not perform well in practice due to jamming. On the other hand, the Hestenes–Stiefel (\(\textit{HS}\)) [4], Polak–Ribière–Polyak (\(\textit{PRP}\)) [5, 6] and Liu–Storey (\(\textit{LS}\)) [7] conjugate gradient methods may not converge in general, but they often perform better than \(\textit{FR}\), \(\textit{CD}\) and \(\textit{DY}\). To combine the good numerical performance of the \(\textit{LS}\) method with the global convergence properties of the \(\textit{CD}\) method, Yang et al. [8] proposed a hybrid \(\textit{LS}\)–\(\textit{CD}\) method. Dai and Liao [9] proposed an efficient conjugate gradient-type method (the Dai–Liao method). Later, more efficient Dai–Liao-type conjugate gradient methods, known as \(\textit{DHSDL}\) and \(\textit{DLSDL}\), were designed and studied in [10].

Building on these results, we propose two efficient hybridizations of conjugate gradient (\(\textit{CG}\)) methods for solving unconstrained optimization problems, as well as a hybridization of the Broyden–Fletcher–Goldfarb–Shanno (\(\textit{BFGS}\)) method with \(\textit{CG}\) methods. A convergence analysis of the proposed methods is given. Numerical experiments show that our methods outperform the existing ones from [10].

The rest of the paper is organized as follows. In Sect. 2, we describe various possibilities for determining the step size and the search direction, which can be used to define conjugate gradient methods and their combinations. A hybridization of \(\textit{CG}\) methods with the \(\textit{BFGS}\) method is also presented there. A modification of the \(\textit{LSCD}\) method from [8], termed \(\textit{MLSCD}\), is proposed in Sect. 3, and the global convergence of the \(\textit{MLSCD}\) method for non-convex functions with the backtracking line search is proved. In Sect. 4, we combine the conjugate gradient parameters of the \(\textit{DHSDL}\) and \(\textit{DLSDL}\) methods from [10] and propose a modification of these methods, termed \(\textit{MMDL}\). The global convergence of the \(\textit{MMDL}\) method, supported by the backtracking line search, is proved. A new hybrid \(\textit{BFGS}\)-\(\textit{CG}\) search direction, which combines the search direction of the quasi-Newton \(\textit{BFGS}\) method with \(\textit{CG}\) methods, is proposed in Sect. 5. This method is termed H-\(\textit{BFGS}\)-\(\textit{CG}1\), and its global convergence with the backtracking line search is proved. In Sect. 6, we report numerical results and compare the performance of the proposed methods with some existing methods. Final conclusions are given in the last section.

2 Preliminaries

Unconstrained optimization considers the problem of minimizing an objective function \(f:\mathbb R^n\rightarrow \mathbb R\)

$$\begin{aligned} \min f(\mathbf {x}),\quad \mathbf {x}\in \mathbb R^n, \end{aligned}$$
(1)

that depends on the real variables \(\mathbf {x}=(x_1,\ldots ,x_n)^{\mathrm T}\) without any restrictions on their values. There are many methods for solving (1). Usually, the objective f is assumed to be a continuously differentiable function with gradient \(\mathbf {g}\). The general CG iterative scheme is given by

$$\begin{aligned} \mathbf {x}_{k+1}=\!{\mathbf x}_k+t_k{\mathbf d}_k,\quad k=\!0,1,\ldots , \end{aligned}$$
(2)

where \({\mathbf x}_k\) is the current iterate, \(t_k\) is the step size determined by one of the line search methods, and \({\mathbf d}_k\) is the search direction defined by

$$\begin{aligned} {\mathbf d}_k:=\mathfrak {d}_k:=\mathfrak {d}(\beta _k,{\mathbf g}_k,{\mathbf d}_{k-1})=\left\{ \begin{array}{ll} -\,\mathbf {g}_0, &{}\quad k=0,\\ -\,{\mathbf g}_k+\beta _k{\mathbf d}_{k-1}, &{}\quad k\ge 1, \end{array} \right. \end{aligned}$$
(3)

where \({\mathbf g}_k=\mathbf {g}({\mathbf x}_k)\) and \(\beta _k\) is an appropriately defined real scalar, known as the conjugate gradient parameter. A number of conjugate gradient methods have been defined using various modifications of the conjugate gradient direction \({\mathbf d}_k\) and the parameter \(\beta _k\). The most popular choices of the conjugate gradient parameter \(\beta _k\) are given by the following formulas:

$$\begin{aligned} \begin{aligned}&\beta _k^{\textit{FR}}=\dfrac{\Vert {\mathbf g}_k\Vert ^2}{\Vert {\mathbf g}_{k-1}\Vert ^2}\,\, [1],\quad \beta _k^{\textit{CD}}=-\dfrac{\Vert {\mathbf g}_k\Vert ^2}{{\mathbf g}_{k-1}^{\mathrm T}{\mathbf d}_{k-1}}\,\, [2],\quad \beta _k^{\textit{DY}}=\dfrac{\Vert {\mathbf g}_k\Vert ^2}{{\mathbf y}_{k-1}^{\mathrm T}{\mathbf d}_{k-1}}\,\, [3],\\&\beta _k^{\textit{HS}}=\dfrac{{\mathbf y}_{k-1}^{\mathrm T}{\mathbf g}_k}{{\mathbf y}_{k-1}^{\mathrm T}{\mathbf d}_{k-1}}\,\, [4],\quad \beta _k^{\textit{PRP}}=\dfrac{{\mathbf y}_{k-1}^{\mathrm T}{\mathbf g}_k}{\Vert {\mathbf g}_{k-1}\Vert ^2}\,\, [5,6],\quad \beta _k^{\textit{LS}}=-\dfrac{{\mathbf y}_{k-1}^{\mathrm T}{\mathbf g}_k}{{\mathbf g}_{k-1}^{\mathrm T}{\mathbf d}_{k-1}}\,\, [7], \\&\beta _k^{\textit{DHSDL}}=\frac{\Vert {\mathbf g}_k\Vert ^2-\frac{\Vert {\mathbf g}_k\Vert }{\Vert {\mathbf g}_{k-1}\Vert }\left| {\mathbf g}_k^{\mathrm T}{\mathbf g}_{k-1}\right| }{\mu \left| {\mathbf g}_k^{\mathrm T}{\mathbf d}_{k-1}\right| +{\mathbf d}_{k-1}^{\mathrm T}{\mathbf y}_{k-1}}- t_k\frac{{\mathbf g}_k^{\mathrm T}{\mathbf s}_{k-1}}{{\mathbf d}_{k-1}^{\mathrm T}{\mathbf y}_{k-1}}\,\, [10],\\&\beta _k^{\textit{DLSDL}}=\frac{\Vert {\mathbf g}_k\Vert ^2-\frac{\Vert {\mathbf g}_k\Vert }{\Vert {\mathbf g}_{k-1}\Vert }\left| {\mathbf g}_k^{\mathrm T}{\mathbf g}_{k-1}\right| }{\mu \left| {\mathbf g}_k^{\mathrm T}{\mathbf d}_{k-1}\right| -{\mathbf d}_{k-1}^{\mathrm T}{\mathbf g}_{k-1}}-t_k\frac{{\mathbf g}_k^{\mathrm T}{\mathbf s}_{k-1}}{{\mathbf d}_{k-1}^{\mathrm T}{\mathbf y}_{k-1}}\,\, [10], \end{aligned} \end{aligned}$$
(4)

where \({\mathbf y}_{k-1}={\mathbf g}_k-\mathbf {g}_{k-1}\), \(\mathbf {s}_{k-1}={\mathbf x}_k-{\mathbf x}_{k-1}\), \(\mu >1\) is a parameter of the \(\textit{DHSDL}\) and \(\textit{DLSDL}\) methods from [10], and \(\Vert \cdot \Vert \) stands for the Euclidean vector norm.
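
For illustration only, the classical parameters in (4) can be evaluated directly from the current and previous gradients and the previous direction. The following Python sketch is ours (the experiments in Sect. 6 were coded in Matlab); the function name and interface are assumptions made for this illustration.

```python
import numpy as np

def cg_parameters(g, g_prev, d_prev):
    """Classical CG parameters from (4) for given gradients and the previous direction."""
    y = g - g_prev                        # y_{k-1} = g_k - g_{k-1}
    return {
        "FR":  g.dot(g) / g_prev.dot(g_prev),
        "CD": -g.dot(g) / g_prev.dot(d_prev),
        "DY":  g.dot(g) / y.dot(d_prev),
        "HS":  y.dot(g) / y.dot(d_prev),
        "PRP": y.dot(g) / g_prev.dot(g_prev),
        "LS": -y.dot(g) / g_prev.dot(d_prev),
    }
```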

As usual, the step size \(t_k\) is determined using the backtracking line search procedure from [11]. It starts from \(t=1\) and ensures a sufficient decrease of the objective function in each iteration. We use Algorithm 1 from [12] for the inexact line search that determines \(t_k\).

[Algorithm 1 (backtracking line search) is presented as a figure in the original.]
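
Since Algorithm 1 is available only as a figure, the following Python sketch shows one standard form of such a backtracking procedure, consistent with the description above and with the parameter values \(\sigma \) and \(\beta \) reported in Sect. 6; the exact details of Algorithm 1 in [12] may differ.

```python
def backtracking_line_search(f, x, g, d, sigma=1e-4, beta=0.8):
    """Backtracking: start from t = 1 and multiply by beta until the sufficient
    decrease condition f(x + t*d) <= f(x) + sigma * t * g^T d holds.
    A minimal sketch; details of Algorithm 1 in [12] may differ."""
    t = 1.0
    fx = f(x)
    gtd = g.dot(d)  # negative for a descent direction
    while f(x + t * d) > fx + sigma * t * gtd:
        t *= beta
    return t
```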

To combine the good numerical performance of the \(\textit{PRP}\) method with the global convergence properties of the \(\textit{FR}\) method, Touati-Ahmed and Storey [13] proposed a hybrid \(\textit{PRP}\)–\(\textit{FR}\) method, called the H1 method in [14], with the conjugate gradient parameter defined as

$$\begin{aligned} \beta _k^{\textit{H}1}=\max \Big \{0, \min \Big \{\beta _k^{\textit{PRP}},\beta _k^{\textit{FR}}\Big \}\Big \}. \end{aligned}$$
(5)

Gilbert and Nocedal in [15] modified (5) to

$$\begin{aligned} \beta _k=\max \Big \{-\beta _k^{\textit{FR}},\min \Big \{\beta _k^{\textit{PRP}},\beta _k^{\textit{FR}}\Big \}\Big \}. \end{aligned}$$

A hybrid \(\textit{HS}\)–\(\textit{DY}\) conjugate gradient method was proposed by Dai and Yuan in [16]. This method is termed the H2 method in [14] and is defined by the conjugate gradient parameter

$$\begin{aligned} \beta _k^{\textit{H}2}=\max \Big \{0, \min \Big \{\beta _k^{\textit{HS}},\beta _k^{\textit{DY}}\Big \}\Big \}. \end{aligned}$$
(6)

Numerical results reported in [13, 16, 17] show that the H1 and H2 methods perform better than the \(\textit{PRP}\) method.

We consider hybrid CG algorithms where the search direction \({\mathbf d}_k:=\mathfrak {d}_k\), \(k\ge 1\), from (3) is modified using one of the following two rules:

$$\begin{aligned} {\mathbf d}_k:= & {} \mathfrak {D}(\beta _k,{\mathbf g}_k,{\mathbf d}_{k-1}) =-\left( 1+\beta _k\dfrac{{\mathbf g}_k^{\mathrm T}{\mathbf d}_{k-1}}{\Vert {\mathbf g}_k\Vert ^2}\right) {\mathbf g}_k+\beta _k{\mathbf d}_{k-1}, \end{aligned}$$
(7)
$$\begin{aligned} {\mathbf d}_k:= & {} \mathfrak {D}_1(\beta _k,{\mathbf g}_k,{\mathbf d}_{k-1}) =\!-B_k{\mathbf g}_k +\mathfrak {D}(\beta _k,{\mathbf g}_k,{\mathbf d}_{k-1}), \end{aligned}$$
(8)

and the conjugate gradient parameter \(\beta _k\) is defined using appropriate combinations of the parameters from (4) and previously defined hybridizations of these parameters.
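
For clarity, the two modification rules (7) and (8) can be sketched as follows (a Python sketch with NumPy arrays; B denotes the matrix \(B_k\) appearing in (8), which is defined by (14)–(15) below; the function names are ours):

```python
def direction_D(beta, g, d_prev):
    """Rule (7): d_k = -(1 + beta * g_k^T d_{k-1} / ||g_k||^2) * g_k + beta * d_{k-1}."""
    scale = 1.0 + beta * g.dot(d_prev) / g.dot(g)
    return -scale * g + beta * d_prev

def direction_D1(beta, g, d_prev, B):
    """Rule (8): d_k = -B_k g_k + D(beta, g_k, d_{k-1})."""
    return -B.dot(g) + direction_D(beta, g, d_prev)
```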

Following the idea of the descent three-term \(\textit{PRP}\) method from [18], Zhang et al. [19, 20] proposed a modification of the \(\textit{FR}\) method, termed the \(\textit{MFR}\) method, which uses the search direction

$$\begin{aligned} \begin{aligned} {\mathbf d}_k&:=\mathfrak {D}\Big (\beta _k^{\textit{FR}},{\mathbf g}_k,{\mathbf d}_{k-1}\Big ). \end{aligned} \end{aligned}$$
(9)

Zhang [20] also proposed a modified \(\textit{DY}\) method, known as the \(\textit{MDY}\) method, which is defined by the search direction

$$\begin{aligned} \begin{aligned} {\mathbf d}_k&:=\mathfrak {D}\Big (\beta _k^{\textit{DY}},{\mathbf g}_k,{\mathbf d}_{k-1}\Big ). \end{aligned} \end{aligned}$$
(10)

The \(\textit{MFR}\) and \(\textit{MDY}\) methods possess the very useful property

$$\begin{aligned} {\mathbf g}_k^{\mathrm T}{\mathbf d}_k =\! -\Vert {\mathbf g}_k\Vert ^2. \end{aligned}$$
(11)

Further, the \(\textit{MFR}\) and \(\textit{MDY}\) methods reduce to the \(\textit{FR}\) and \(\textit{DY}\) methods, respectively, if the exact line search is used. The \(\textit{MFR}\) method has been proved to be globally convergent for non-convex functions with the Wolfe or Armijo line search, and it is very efficient in practical computations [19].

However, it is not known whether the \(\textit{MDY}\) method converges globally. Therefore, in [14], the authors replaced \(\beta _k^{\textit{FR}}\) in (9) and \(\beta _k^{\textit{DY}}\) in (10) by \(\beta _k^{\textit{H}1}\) from (5) and \(\beta _k^{\textit{H}2}\) from (6), respectively. In this way, they defined new hybrid \(\textit{PRP}\)–\(\textit{FR}\) and \(\textit{HS}\)–\(\textit{DY}\) methods, called the \(\textit{NH}1\) and \(\textit{NH}2\) methods, respectively. These methods are based upon the search directions

$$\begin{aligned}&\text {NH1:}\quad {\mathbf d}_k :=\mathfrak {D}\Big (\beta _k^{H1},{\mathbf g}_k,{\mathbf d}_{k-1}\Big ), \end{aligned}$$
(12)
$$\begin{aligned}&\text {NH2:} \quad {\mathbf d}_k :=\mathfrak {D}\Big (\beta _k^{H2},{\mathbf g}_k,{\mathbf d}_{k-1}\Big ). \end{aligned}$$
(13)

It is easy to see that the \(\textit{NH}1\) and the \(\textit{NH}2\) methods still satisfy (11), which shows that they are descent methods.

On the other hand, the search direction \({\mathbf d}_k\) in quasi-Newton methods is obtained as a solution of the linear algebraic system

$$\begin{aligned} B_k{\mathbf d}_k=\!-{\mathbf g}_k, \end{aligned}$$
(14)

where \(B_k\) is an approximation of the Hessian. The initial approximation is the identity matrix \(B_0=I\), and the subsequent approximations \(B_k\) are generated by an appropriate update formula. Here we are interested in the \(\textit{BFGS}\) update formula, defined by

$$\begin{aligned} B_{k+1}=B_k-\dfrac{B_k{\mathbf s}_k {\mathbf s}_k^{\mathrm T}B_k}{{\mathbf s}_k^{\mathrm T}B_k{\mathbf s}_k}+\frac{{\mathbf y}_k{\mathbf y}_k^{\mathrm T}}{{\mathbf s}_k^{\mathrm T}{\mathbf y}_k}, \end{aligned}$$
(15)

where \({\mathbf s}_k={\mathbf x}_{k+1}-{\mathbf x}_k\), \({\mathbf y}_k={\mathbf g}_{k+1}-{\mathbf g}_k\). The following secant equation must hold:

$$\begin{aligned} B_{k+1}{\mathbf s}_k={\mathbf y}_k, \end{aligned}$$
(16)

which is possible only if the curvature condition

$$\begin{aligned} {\mathbf s}_k^{\mathrm T}\, {\mathbf y}_k>0 \end{aligned}$$
(17)

is satisfied.
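
As an illustration of the update (15) together with the curvature condition (17), a single BFGS update step might be sketched as follows; skipping the update when (17) fails is a common safeguard, not necessarily the choice made in the cited works.

```python
import numpy as np

def bfgs_update(B, s, y, eps=1e-12):
    """BFGS update (15); the update is skipped if the curvature condition (17) fails."""
    sy = s.dot(y)
    if sy <= eps:          # curvature condition s^T y > 0 violated
        return B
    Bs = B.dot(s)
    return B - np.outer(Bs, Bs) / s.dot(Bs) + np.outer(y, y) / sy
```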

A three-term hybrid \(\textit{BFGS}\) conjugate gradient method was proposed in [21]. That method exploits the best properties of both the \(\textit{BFGS}\) and \(\textit{CG}\) methods and defines a hybrid \(\textit{BFGS}\)-\(\textit{CG}\) method for solving selected unconstrained optimization problems, resulting in improvements in the total number of iterations and the CPU time.

3 A Mixed LS-CD Conjugate Gradient Method

We consider a modification of the \(\textit{LSCD}\) method, defined in [8] by

$$\begin{aligned} \begin{aligned} \beta _k^{\textit{LSCD}}&=\max \Big \{0,\min \Big \{\beta _k^{\textit{LS}},\beta _k^{\textit{CD}}\Big \}\Big \},\\ {\mathbf d}_k&=\mathfrak {d}\Big (\beta _k^{\textit{LSCD}},{\mathbf g}_k,{\mathbf d}_{k-1}\Big ) \end{aligned} \end{aligned}$$
(18)

and define the \(\textit{MLSCD}\) method with the search direction

$$\begin{aligned} {\mathbf d}_k\!:=\!\mathfrak {D}\Big (\beta _k^{\textit{LSCD}},{\mathbf g}_k,{\mathbf d}_{k-1}\Big ). \end{aligned}$$
(19)

In general, our idea is to replace the direction \(\mathfrak {d}(\beta _k^{\textit{LSCD}},{\mathbf g}_k,{\mathbf d}_{k-1})\) from [8] with \({\mathbf d}_k=\mathfrak {D}(\beta _k^{\textit{LSCD}},{\mathbf g}_k,{\mathbf d}_{k-1})\).

The corresponding computational procedure is given in Algorithm 2.

[Algorithm 2 (the MLSCD method) is presented as a figure in the original.]
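
Since Algorithm 2 is given only as a figure, the core of the MLSCD iteration, i.e., the direction (19) with the hybrid parameter (18), can be sketched as follows (a Python sketch reusing direction_D from Sect. 2; not the authors' implementation):

```python
def mlscd_direction(g, g_prev, d_prev):
    """MLSCD search direction (19): d_k = D(beta_k^{LSCD}, g_k, d_{k-1})."""
    y = g - g_prev
    beta_ls = -y.dot(g) / g_prev.dot(d_prev)       # beta_k^{LS}
    beta_cd = -g.dot(g) / g_prev.dot(d_prev)       # beta_k^{CD}
    beta_lscd = max(0.0, min(beta_ls, beta_cd))    # hybrid parameter (18)
    return direction_D(beta_lscd, g, d_prev)       # rule (7)
```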

3.1 Convergence of the MLSCD Conjugate Gradient Method

First, it is easy to prove the next theorem.

Theorem 3.1

   Let \(\beta _k\) be any \(\textit{CG}\) parameter. Then, the search direction \({\mathbf d}_k \!:=\!\mathfrak {D}(\beta _k,{\mathbf g}_k,{\mathbf d}_{k-1})\) satisfies

$$\begin{aligned} {\mathbf g}_k^{\mathrm T}{\mathbf d}_k=-\Vert {\mathbf g}_k\Vert ^2. \end{aligned}$$
(20)

We are going to prove the global convergence of the \(\textit{MLSCD}\) method under the following assumptions.

Assumption 3.1

(1)

    The level set \(S=\{\mathbf {x}\in \mathbb {R}^n|\ f(\mathbf {x})\le f(\mathbf {x}_0)\}\) is bounded.

(2)

    The objective function f is continuously differentiable in some neighborhood \(\mathcal {N}\) of S, and its gradient \(\mathbf {g}\) is Lipschitz continuous on \(\mathcal {N}\). Namely, there exists a constant \(L>0\) such that

    $$\begin{aligned} \Vert \mathbf {g}(\mathbf {x})-\mathbf {g}(\mathbf {y})\Vert \le L\Vert \mathbf {x}-\mathbf {y}\Vert ,\quad \text {for all}\quad \mathbf {x},\mathbf {y}\in \mathcal {N}. \end{aligned}$$
    (21)

It is well known that if Assumption 3.1 holds, then there exists a positive constant \(\Gamma \), such that

$$\begin{aligned} \Vert {\mathbf g}_k\Vert \le \Gamma ,\quad \text {for all}\quad k. \end{aligned}$$
(22)

The result of the next lemma, often called the Zoutendijk condition, is used to prove the global convergence of nonlinear \(\textit{CG}\) methods. It was originally given in [22].

Lemma 3.1

[22, 23] Let the conditions in Assumption 3.1 be satisfied. Let the sequence \(\{{\mathbf x}_k\}\) be generated by the \(\textit{MLSCD}\) method with the backtracking line search. Then it holds that

$$\begin{aligned} \sum _{k=0}^{\infty } {\dfrac{\Vert {\mathbf g}_k\Vert ^4}{\Vert {\mathbf d}_k\Vert ^2}}<+\,\infty . \end{aligned}$$
(23)

Proof

Consider the case where the backtracking line search is used. If \(t_k \ne \zeta \), where \(\zeta \) denotes the initial trial step size of the line search, then it is not difficult to show that there is a constant \(c>0\) such that

$$\begin{aligned} t_k \ge c {\dfrac{-{\mathbf g}_k^{\mathrm T}{\mathbf d}_k}{\Vert {\mathbf d}_k\Vert ^2}}= c {\dfrac{\Vert {\mathbf g}_k\Vert ^2}{\Vert {\mathbf d}_k\Vert ^2}}. \end{aligned}$$
(24)

This, together with the backtracking line search, implies that there exists a constant \(M > 0 \) such that

$$\begin{aligned} {\dfrac{\Vert {\mathbf g}_k\Vert ^4}{\Vert {\mathbf d}_k\Vert ^2}}\le M(f({\mathbf x}_k)-f({\mathbf x}_{k+1})). \end{aligned}$$
(25)

On the other hand, if \(t_k = \zeta \) then it follows that

$$\begin{aligned} {\dfrac{\Vert {\mathbf g}_k\Vert ^4}{\Vert {\mathbf d}_k\Vert ^2}}\le \Vert {\mathbf g}_k\Vert ^2\le \Vert {\mathbf d}_k\Vert ^2\le \delta ^{-1} \zeta ^{-2}(f({\mathbf x}_k)-f({\mathbf x}_{k+1})). \end{aligned}$$

This also implies (25), with \(M=\delta ^{-1} \zeta ^{-2}\). \(\square \)

Theorem 3.2

Let the conditions of Assumption 3.1 hold. Then the sequence \(\{{\mathbf x}_k\}\), generated by the \(\textit{MLSCD}\) method with the backtracking line search satisfies

$$\begin{aligned} \liminf _{k\rightarrow \infty }{\Vert {\mathbf g}_k\Vert }=0. \end{aligned}$$
(26)

Proof

To obtain a contradiction, suppose that (26) does not hold. Then there exists a constant \(c>0\) such that

$$\begin{aligned} \Vert {\mathbf g}_k\Vert \ge c,\quad \text {for all}\quad k\ge 0. \end{aligned}$$
(27)

Clearly, (19) can be rewritten in the form

$$\begin{aligned} {\mathbf d}_k=-l_k {\mathbf g}_k+\beta _k^{\textit{LSCD}}\,{\mathbf d}_{k-1},\quad l_k=1+\beta _k^{\textit{LSCD}}\,\frac{{\mathbf g}_k^{\mathrm T}{\mathbf d}_{k-1}}{\Vert {\mathbf g}_k\Vert ^2}. \end{aligned}$$
(28)

Now, from (28), it follows that

$$\begin{aligned} {\mathbf d}_k+l_k {\mathbf g}_k=\beta _k^{\textit{LSCD}}\,{\mathbf d}_{k-1}, \end{aligned}$$

which further implies

$$\begin{aligned} \begin{aligned} \Vert {\mathbf d}_k+l_k {\mathbf g}_k\Vert ^2\!&=\!\Big (\beta _k^{\textit{LSCD}}\Big )^2\Vert {\mathbf d}_{k-1}\Vert ^2,\\ \Vert {\mathbf d}_k\Vert ^2 + 2l_k {\mathbf d}_k^{\mathrm T}{\mathbf g}_k + l_k^2\Vert {\mathbf g}_k\Vert ^2 \!&=\! \Big (\beta _k^{\textit{LSCD}}\Big )^2\Vert {\mathbf d}_{k-1}\Vert ^2, \end{aligned} \end{aligned}$$

and subsequently

$$\begin{aligned} \Vert {\mathbf d}_k\Vert ^2 = \Big (\beta _k^{\textit{LSCD}}\Big )^2 \Vert {\mathbf d}_{k-1}\Vert ^2 - 2l_k {\mathbf d}_k^{\mathrm T}{\mathbf g}_k - l_k^2\Vert {\mathbf g}_k\Vert ^2. \end{aligned}$$
(29)

Notice that

$$\begin{aligned} \beta _k^{\textit{LSCD}}=\max \Big \{0,\min \Big \{\beta _k^{\textit{LS}},\beta _k^{\textit{CD}}\Big \}\Big \} \le |\beta _k^{\textit{CD}}|. \end{aligned}$$
(30)

Dividing both sides of (29) by \(({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2\), we get from (30), (20), (27) and the definition of \(\beta _k^{\textit{CD}}\) that

$$\begin{aligned} \begin{aligned} \frac{\Vert {\mathbf d}_k\Vert ^2}{\Vert {\mathbf g}_k\Vert ^4}=\frac{\Vert {\mathbf d}_k\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2}\!&=\!\Big (\beta _k^{\textit{LSCD}}\Big )^2 \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2} - \frac{2l_k {\mathbf d}_k^{\mathrm T}{\mathbf g}_k}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2} - l_k^2 \frac{\Vert {\mathbf g}_k\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2}\\&\le \Big (\beta _k^{\textit{CD}}\Big )^2 \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2} - \frac{2l_k}{{\mathbf g}_k^{\mathrm T}{\mathbf d}_k} - l_k^2 \frac{\Vert {\mathbf g}_k\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2}\\&=\left( \frac{\Vert {\mathbf g}_k\Vert ^2}{-{\mathbf g}_{k-1}^{\mathrm T}{\mathbf d}_{k-1}}\right) ^2 \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2} - \frac{2l_k}{{\mathbf g}_k^{\mathrm T}{\mathbf d}_k} - l_k^2 \frac{\Vert {\mathbf g}_k\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2}. \end{aligned} \end{aligned}$$
(31)

Now, applying (20) in (31), using the obtained estimate recursively, and taking into account \({\mathbf d}_0=-{\mathbf g}_0\) and (27), one can verify

$$\begin{aligned} \begin{aligned} \frac{\Vert {\mathbf d}_k\Vert ^2}{\Vert {\mathbf g}_k\Vert ^4}&\le \frac{\Vert {\mathbf g}_k\Vert ^4}{\Vert {\mathbf g}_{k-1}\Vert ^4}\,\, \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{\Vert {\mathbf g}_k\Vert ^4} + \frac{2l_k}{\Vert {\mathbf g}_k\Vert ^2} - l_k^2 \frac{\Vert {\mathbf g}_k\Vert ^2}{\Vert {\mathbf g}_k\Vert ^4}\\&=\frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{\Vert {\mathbf g}_{k-1}\Vert ^4} + \frac{2l_k}{\Vert {\mathbf g}_k\Vert ^2} - l_k^2 \frac{1}{\Vert {\mathbf g}_k\Vert ^2}\\&=\frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{\Vert {\mathbf g}_{k-1}\Vert ^4} - \frac{(l_k-1)^2}{\Vert {\mathbf g}_k\Vert ^2} +\frac{1}{\Vert {\mathbf g}_k\Vert ^2}\\&\le \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{\Vert {\mathbf g}_{k-1}\Vert ^4} +\frac{1}{\Vert {\mathbf g}_k\Vert ^2}\\&\le \sum _{j=0}^k\frac{1}{\Vert \mathbf {g}_j\Vert ^2}\\&\le \frac{k+1}{c^2}. \end{aligned} \end{aligned}$$

The last inequalities imply

$$\begin{aligned} \sum _{k\ge 1}\frac{\Vert {\mathbf g}_k\Vert ^4}{\Vert {\mathbf d}_k\Vert ^2}\ge c^2\sum _{k\ge 1}\frac{1}{k+1}=\infty , \end{aligned}$$
(32)

which contradicts (23). The proof is thus complete. \(\square \)

4 A Mixed DHSDL-DLSDL Conjugate Gradient Method

In this section, we propose the hybrid \(\textit{MMDL}\) method which is defined by the search direction \({\mathbf d}_k\) as follows:

$$\begin{aligned} \beta _k^{\textit{MMDL}}=\max \Big \{0,\min \Big \{\beta _k^{\textit{DHSDL}}, \beta _k^{\textit{DLSDL}}\Big \}\Big \}, \end{aligned}$$
(33)
$$\begin{aligned} {\mathbf d}_k=\mathfrak {D}\Big (\beta _k^{\textit{MMDL}},{\mathbf g}_k,{\mathbf d}_{k-1}\Big ). \end{aligned}$$
(34)

The computational algorithm of this method is presented in Algorithm 3.

[Algorithm 3 (the MMDL method) is presented as a figure in the original.]
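
Analogously to the previous sketch, the MMDL direction (33)–(34) can be written as follows (a Python sketch reusing direction_D; tk denotes the factor \(t_k\) appearing in (4), s_prev the step \(\mathbf {s}_{k-1}={\mathbf x}_k-{\mathbf x}_{k-1}\), and the value \(\mu =1.5\) below is only a placeholder assumption):

```python
import numpy as np

def mmdl_direction(g, g_prev, d_prev, tk, s_prev, mu=1.5):
    """MMDL search direction (34) with the hybrid parameter (33)."""
    y = g - g_prev
    num = g.dot(g) - (np.linalg.norm(g) / np.linalg.norm(g_prev)) * abs(g.dot(g_prev))
    dl_term = tk * g.dot(s_prev) / d_prev.dot(y)
    beta_dhsdl = num / (mu * abs(g.dot(d_prev)) + d_prev.dot(y)) - dl_term
    beta_dlsdl = num / (mu * abs(g.dot(d_prev)) - d_prev.dot(g_prev)) - dl_term
    beta_mmdl = max(0.0, min(beta_dhsdl, beta_dlsdl))   # hybrid parameter (33)
    return direction_D(beta_mmdl, g, d_prev)            # rule (7)
```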

4.1 Convergence of the MMDL Conjugate Gradient Method

We now prove the global convergence of the \(\textit{MMDL}\) method for arbitrary objective functions.

Theorem 4.1

Let the conditions in Assumption 3.1 hold. Then the sequence \(\{{\mathbf x}_k\}\) generated by the \(\textit{MMDL}\) method with the backtracking line search satisfies

$$\begin{aligned} \liminf _{k\rightarrow \infty }{\Vert {\mathbf g}_k\Vert }=0. \end{aligned}$$
(35)

Proof

Assume, on the contrary, that (35) does not hold. Then, there exists a constant \(c>0\) such that

$$\begin{aligned} \Vert {\mathbf g}_k\Vert \ge c,\quad \text {for all}\quad k. \end{aligned}$$
(36)

Denote

$$\begin{aligned} l_k=1+\beta _k^{\textit{MMDL}}\,\frac{{\mathbf g}_k^{\mathrm T}{\mathbf d}_{k-1}}{\Vert {\mathbf g}_k\Vert ^2}. \end{aligned}$$
(37)

Then we can write

$$\begin{aligned} {\mathbf d}_k+l_k {\mathbf g}_k=\beta _k^{\textit{MMDL}}\,{\mathbf d}_{k-1}, \end{aligned}$$
(38)

and further

$$\begin{aligned} \begin{aligned} \Vert {\mathbf d}_k+l_k {\mathbf g}_k\Vert ^2\!&=\!\Big (\beta _k^{\textit{MMDL}}\Big )^2\Vert {\mathbf d}_{k-1}\Vert ^2,\\ \Vert {\mathbf d}_k\Vert ^2 + 2l_k {\mathbf d}_k^{\mathrm T}{\mathbf g}_k + l_k^2\Vert {\mathbf g}_k\Vert ^2 \!&=\! \Big (\beta _k^{\textit{MMDL}}\Big )^2\Vert {\mathbf d}_{k-1}\Vert ^2. \end{aligned} \end{aligned}$$

Thus,

$$\begin{aligned} \Vert {\mathbf d}_k\Vert ^2 = \Big (\beta _k^{\textit{MMDL}}\Big )^2 \Vert {\mathbf d}_{k-1}\Vert ^2 - 2l_k {\mathbf d}_k^{\mathrm T}{\mathbf g}_k - l_k^2\Vert {\mathbf g}_k\Vert ^2. \end{aligned}$$
(39)

Bearing in mind that \(\mu >1\) and \({\mathbf d}_k^{\mathrm T}{\mathbf g}_k<0\), and applying the extended conjugacy condition \(\mathbf {d}_k^{\mathrm T}{\mathbf y}_{k-1}=t\,\mathbf {g}_k^{\mathrm T}{\mathbf s}_{k-1}\), \(t>0\), which was exploited in [9, 10], we get

$$\begin{aligned} \begin{aligned} {\beta }^{\textit{DHSDL}}_{k+1}&=\frac{{\left\| {\mathbf g}_{k+1}\right\| }^2-\frac{\left\| {\mathbf g}_{k+1}\ \right\| }{\left\| {\mathbf g}_k\ \right\| }\left| \mathbf {g}_{k+1}^{\mathrm T}{\mathbf g}_k\right| }{\mu \left| \mathbf {g}_{k+1}^{\mathrm T}{\mathbf d}_k\right| +\mathbf {d}_k^{\mathrm T}{\mathbf y}_k}-t_k\frac{\mathbf {g}_{k+1}^{\mathrm T}{\mathbf s}_k}{\mathbf {d}_k^{\mathrm T}{\mathbf y}_k}\\&\le \frac{{\left\| {\mathbf g}_{k+1}\right\| }^2-\frac{\left\| {\mathbf g}_{k+1}\ \right\| }{\left\| {\mathbf g}_k\ \right\| }\left| \mathbf {g}_{k+1}^{\mathrm T}{\mathbf g}_k\right| }{\mu \left| \mathbf {g}_{k+1}^{\mathrm T}{\mathbf d}_k\right| +\mathbf {d}_k^{\mathrm T}{\mathbf y}_k}\\&=\frac{{\left\| {\mathbf g}_{k+1}\right\| }^2-\frac{\left\| {\mathbf g}_{k+1}\ \right\| }{\left\| {\mathbf g}_k\ \right\| }\left| {\mathbf g}_{k+1}^{\mathrm T}{\mathbf g}_k\right| }{\mu \left| \mathbf {g}_{k+1}^{\mathrm T}{\mathbf d}_k\right| + \mathbf {d}_k^{\mathrm T}({\mathbf g}_{k+1}-{\mathbf g}_k)}\\&=\frac{{\left\| {\mathbf g}_{k+1}\right\| }^2-\frac{\left\| {\mathbf g}_{k+1}\ \right\| }{\left\| {\mathbf g}_k\ \right\| }\left| \mathbf {g}_{k+1}^{\mathrm T}{\mathbf g}_k\right| }{\mu \left| \mathbf {g}_{k+1}^{\mathrm T}{\mathbf d}_k\right| +\mathbf {d}_k^{\mathrm T}{\mathbf g}_{k+1}-{\mathbf d}_k^{\mathrm T}{\mathbf g}_k}\\&\le \frac{{\left\| {\mathbf g}_{k+1}\right\| }^2}{\mu \left| \mathbf {g}_{k+1}^{\mathrm T}{\mathbf d}_k\right| +{\mathbf d}_k^{\mathrm T}{\mathbf g}_{k+1}-\mathbf {d}_k^{\mathrm T}{\mathbf g}_k}\\&\le \frac{{\left\| {\mathbf g}_{k+1}\right\| }^2}{-\mathbf {d}_k^{\mathrm T}{\mathbf g}_k}. \end{aligned} \end{aligned}$$

Further,

$$\begin{aligned} \begin{aligned} {\beta }^{\textit{DLSDL}}_{k+1}&=\frac{{\left\| {\mathbf g}_{k+1}\right\| }^2-\frac{\left\| {\mathbf g}_{k+1}\ \right\| }{\left\| {\mathbf g}_k\ \right\| }\left| {\mathbf g}_{k+1}^{\mathrm T}{\mathbf g}_k\right| }{\mu \left| {\mathbf g}_{k+1}^{\mathrm T}{\mathbf d}_k\right| -{\mathbf d}_k^{\mathrm T}{\mathbf g}_k}-t_k\frac{{\mathbf g}_{k+1}^{\mathrm T}{\mathbf s}_k}{{\mathbf d}_k^{\mathrm T}{\mathbf y}_k}\\&\le \frac{{\left\| {\mathbf g}_{k+1}\right\| }^2-\frac{\left\| {\mathbf g}_{k+1}\ \right\| }{\left\| {\mathbf g}_k\ \right\| }\left| {\mathbf g}_{k+1}^{\mathrm T}{\mathbf g}_k\right| }{\mu \left| {\mathbf g}_{k+1}^{\mathrm T}{\mathbf d}_k\right| -{\mathbf d}_k^{\mathrm T}{\mathbf g}_k}\\&\le \frac{{\left\| {\mathbf g}_{k+1}\right\| }^2}{\mu \left| {\mathbf g}_{k+1}^{\mathrm T}{\mathbf d}_k\right| -{\mathbf d}_k^{\mathrm T}{\mathbf g}_k}\\&\le \frac{{\left\| {\mathbf g}_{k+1}\right\| }^2}{-{\mathbf d}_k^{\mathrm T}{\mathbf g}_k}. \end{aligned} \end{aligned}$$

Now, we easily conclude

$$\begin{aligned} \beta _k^{\textit{MMDL}}=\max \Big \{0,\min \Big \{\beta _k^{\textit{DHSDL}},\beta _k^{\textit{DLSDL}}\Big \}\Big \}\le \frac{\Vert {\mathbf g}_k\Vert ^2}{-{\mathbf d}_{k-1}^{\mathrm T}{\mathbf g}_{k-1}}. \end{aligned}$$
(40)

Next, dividing both sides of (39) by \(({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2\), using (20), (40) and (36), and applying the obtained estimate recursively together with \({\mathbf d}_0=-{\mathbf g}_0\), we get

$$\begin{aligned} \begin{aligned} \frac{\Vert {\mathbf d}_k\Vert ^2}{\Vert {\mathbf g}_k\Vert ^4}=\frac{\Vert {\mathbf d}_k\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2}&= \Big (\beta _k^{\textit{MMDL}}\Big )^2 \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2} - \frac{2l_k {\mathbf d}_k^{\mathrm T}{\mathbf g}_k}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2} - l_k^2 \frac{\Vert {\mathbf g}_k\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2}\\&= \Big (\beta _k^{\textit{MMDL}}\Big )^2 \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2} - \frac{2l_k}{{\mathbf g}_k^{\mathrm T}{\mathbf d}_k} - l_k^2 \frac{\Vert {\mathbf g}_k\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2}\\&\le \left( \frac{\Vert {\mathbf g}_k\Vert ^2}{-{\mathbf g}_{k-1}^{\mathrm T}{\mathbf d}_{k-1}}\right) ^2 \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2} - \frac{2l_k}{{\mathbf g}_k^{\mathrm T}{\mathbf d}_k} - l_k^2 \frac{\Vert {\mathbf g}_k\Vert ^2}{({\mathbf g}_k^{\mathrm T}{\mathbf d}_k)^2}\\&=\frac{\Vert {\mathbf g}_k\Vert ^4}{\Vert {\mathbf g}_{k-1}\Vert ^4}\,\, \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{\Vert {\mathbf g}_k\Vert ^4} + \frac{2l_k}{\Vert {\mathbf g}_k\Vert ^2} - l_k^2 \frac{\Vert {\mathbf g}_k\Vert ^2}{\Vert {\mathbf g}_k\Vert ^4}\\&=\frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{\Vert {\mathbf g}_{k-1}\Vert ^4} + \frac{2l_k}{\Vert {\mathbf g}_k\Vert ^2} - l_k^2 \frac{1}{\Vert {\mathbf g}_k\Vert ^2}\\&=\frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{\Vert {\mathbf g}_{k-1}\Vert ^4} - \frac{(l_k-1)^2}{\Vert {\mathbf g}_k\Vert ^2} +\frac{1}{\Vert {\mathbf g}_k\Vert ^2}\\&\le \frac{\Vert {\mathbf d}_{k-1}\Vert ^2}{\Vert {\mathbf g}_{k-1}\Vert ^4} +\frac{1}{\Vert {\mathbf g}_k\Vert ^2}\\&\le \sum _{j=0}^k\frac{1}{\Vert \mathbf {g}_j\Vert ^2}\\&\le \frac{k+1}{c^2}. \end{aligned} \end{aligned}$$
(41)

The inequalities in (41) imply

$$\begin{aligned} \sum _{k\ge 1}\frac{\Vert {\mathbf g}_k\Vert ^4}{\Vert {\mathbf d}_k\Vert ^2}\ge c^2\sum _{k\ge 1}\frac{1}{k+1}=\infty . \end{aligned}$$
(42)

Therefore, \(\Vert {\mathbf g}_k\Vert \ge c\) leads to a contradiction with (23). Consequently, (35) is verified. \(\square \)

5 Hybrid Three-Term Broyden–Fletcher–Goldfarb–Shanno Conjugate Gradient Methods in Solving Unconstrained Optimization Problems

It is known that conjugate gradient methods are better than quasi-Newton methods in terms of the CPU time, and \(\textit{BFGS}\) is more costly than \(\textit{CG}\) in terms of memory storage requirements. On the other hand, quasi-Newton methods are better in terms of the number of iterations and the number of function evaluations. For this reason, various hybridizations of quasi-Newton methods and \(\textit{CG}\) methods have been proposed. A hybrid method that solves systems of nonlinear equations by combining the quasi-Newton method with chaos optimization was proposed in [24]. In [25], the authors defined a combination of a quasi-Newton method and the Cauchy descent method for solving unconstrained optimization problems, known as the quasi-Newton SD method. In [21], the authors proposed a hybrid search direction that combines the quasi-Newton and \(\textit{CG}\) methods; the resulting method is known as the \(\textit{BFGS}\)-\(\textit{CG}\) method. The search direction of the method from [21] is defined by

$$\begin{aligned} {\mathbf d}_k={\left\{ \begin{array}{ll} -B_k{\mathbf g}_k,&{}\quad k=0,\\ -B_k{\mathbf g}_k+\eta (-{\mathbf g}_k+\beta _k{\mathbf d}_{k-1}),&{}\quad k\ge 1,\end{array}\right. } \end{aligned}$$

where \(\eta >0\) and \(\beta _k=\dfrac{{\mathbf g}_k^{\mathrm T}{\mathbf g}_{k-1}}{{\mathbf g}_k^{\mathrm T}{\mathbf d}_{k-1}}\). A hybrid search direction, combining the \(\textit{BFGS}\) update of the Hessian approximation with the conjugate coefficient \(\beta _k\), was proposed and investigated in [26, 27]. A hybrid \(\textit{DFP}\)-\(\textit{CG}\) method for solving unconstrained optimization problems was presented in [28].
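
For reference, the BFGS-CG direction from [21] quoted above can be sketched as follows (Python; \(\eta \) is a user-chosen positive parameter, and the default value below is only a placeholder assumption):

```python
def bfgs_cg_direction(k, B, g, g_prev, d_prev, eta=1.0):
    """Hybrid BFGS-CG direction from [21]: d_0 = -B_0 g_0 and, for k >= 1,
    d_k = -B_k g_k + eta * (-g_k + beta_k d_{k-1}), beta_k = g_k^T g_{k-1} / (g_k^T d_{k-1})."""
    if k == 0:
        return -B.dot(g)
    beta = g.dot(g_prev) / g.dot(d_prev)
    return -B.dot(g) + eta * (-g + beta * d_prev)
```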

5.1 Three-Term H-BFGS-CG1 Method

Our idea is to consider a three-term hybrid \(\textit{BFGS}\)-\(\textit{CG}\) method (called H-\(\textit{BFGS}\)-\(\textit{CG}1\)), defined by the search direction

$$\begin{aligned} {\mathbf d}_k\!:=\!{\left\{ \begin{array}{ll} -B_k{\mathbf g}_k,&{} \quad k=0,\\ \mathfrak {D}_1\Big (\beta _k^{\textit{LSCD}},{\mathbf g}_k,{\mathbf d}_{k-1}\Big ),&{} \quad k\ge 1. \end{array}\right. } \end{aligned}$$

Algorithm 4 defines the corresponding computational procedure of the H-\(\textit{BFGS}\)-\(\textit{CG}1\) method.

[Algorithm 4 (the H-BFGS-CG1 method) is presented as a figure in the original.]
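
Since Algorithm 4 is given only as a figure, the H-BFGS-CG1 direction defined above can be sketched as follows (a Python sketch reusing direction_D1 from Sect. 2 and the hybrid parameter \(\beta _k^{\textit{LSCD}}\) from (18); not the authors' implementation):

```python
def h_bfgs_cg1_direction(k, B, g, g_prev, d_prev):
    """H-BFGS-CG1 direction: d_0 = -B_0 g_0 and d_k = D_1(beta_k^{LSCD}, g_k, d_{k-1}), k >= 1."""
    if k == 0:
        return -B.dot(g)
    y = g - g_prev
    beta_ls = -y.dot(g) / g_prev.dot(d_prev)
    beta_cd = -g.dot(g) / g_prev.dot(d_prev)
    beta_lscd = max(0.0, min(beta_ls, beta_cd))   # hybrid parameter (18)
    return direction_D1(beta_lscd, g, d_prev, B)  # rule (8)
```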

5.2 Convergence Analysis of H-BFGS-CG1 Method

The following assumptions are required in this section.

Assumption 5.1

H1:

The objective function f is twice continuously differentiable.

H2:

The level set S is convex. Moreover, there exist positive constants \(c_1\) and \(c_2\) such that

$$\begin{aligned} c_1\Vert \mathbf {z}\Vert ^2\le \mathbf {z}^{\mathrm T}F(\mathbf {x})\mathbf {z}\le c_2\Vert \mathbf {z}\Vert ^2, \end{aligned}$$

for all \(\mathbf {z}\in \mathbb R^n\) and \(\mathbf {x}\in S\), where \(F(\mathbf {x})\) is the Hessian of f.

H3:

The gradient \(\mathbf {g}\) is Lipschitz continuous at the point \(\mathbf {x}^*\); that is, there exists a positive constant \(c_3\) satisfying

$$\begin{aligned} \Vert \mathbf {g}(\mathbf {x})-\mathbf {g}(\mathbf {x}^*)\Vert \le c_3\Vert \mathbf {x}-\mathbf {x}^*\Vert , \end{aligned}$$

for all \(\mathbf {x}\) in a neighborhood of \(\mathbf {x}^*\).

Theorem 5.1

[29] Let \(\{B_k\}\) be generated by the \(\textit{BFGS}\) update formula (15), where \({\mathbf {s}}_k={\mathbf {x}}_{k+1}-{\mathbf {x}}_k\), \({\mathbf {y}}_k={\mathbf {g}}_{k+1}-{\mathbf {g}}_k\). Assume that the matrix \(B_k\) is symmetric and positive definite and satisfies (16) and (17) for all k. Furthermore, assume that \(\{{\mathbf s}_k\}\) and \(\{{\mathbf y}_k\}\) satisfy the inequality

$$\begin{aligned} \frac{\Vert {\mathbf y}_k-G_*{\mathbf s}_k\Vert }{\Vert {\mathbf s}_k\Vert }\le \epsilon _k, \end{aligned}$$

for some symmetric and positive definite matrix \(G_*\) and for some sequence \(\{\epsilon _k\}\) possessing the property

$$\begin{aligned} \sum _{k=1}^{\infty }{\epsilon _k} <\infty . \end{aligned}$$

Then

$$\begin{aligned} \lim _{k\rightarrow \infty }{\dfrac{\Vert (B_k-G_*){\mathbf s}_k\Vert }{\Vert {\mathbf s}_k\Vert }}=0 \end{aligned}$$

and the sequences \(\{\Vert B_k\Vert \}\), \(\{\Vert B_k^{-1}\Vert \}\) are bounded.

Theorem 5.2

(Sufficient descent and global convergence.) Consider Algorithm 4. Assume that the conditions H1, H2 and H3 in Assumption 5.1 are satisfied as well as conditions of Theorem 5.1. Then

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert {\mathbf g}_k\Vert ^2=0. \end{aligned}$$

Proof

   It holds

$$\begin{aligned} \begin{aligned} {\mathbf g}_k^{\mathrm T}{\mathbf d}_k&=-{\mathbf g}_k^{\mathrm T}B_k {\mathbf g}_k-{\mathbf g}_k^{\mathrm T}{\mathbf g}_k-\beta _k^{\textit{LSCD}}{\mathbf g}_k^{\mathrm T}{\mathbf d}_{k-1}+\beta _k^{\textit{LSCD}}{\mathbf g}_k^{\mathrm T}{\mathbf d}_{k-1}\\ {}&\le -c_1\Vert {\mathbf g}_k\Vert ^2-\Vert {\mathbf g}_k\Vert ^2\\ {}&=-(c_1+1)\Vert {\mathbf g}_k\Vert ^2\le -\Vert {\mathbf g}_k\Vert ^2,\,\ c_1+1>1, \end{aligned} \end{aligned}$$
(43)

wherefrom we conclude that the sufficient descent condition holds.

Further, from the backtracking line search condition and (43), it holds that

$$\begin{aligned} f({\mathbf x}_k)-f({\mathbf x}_k+t_k{\mathbf d}_k)\ge -\sigma t_k{\mathbf g}_k^{\mathrm T}{\mathbf d}_k\ge \sigma t_k\Vert {\mathbf g}_k\Vert ^2. \end{aligned}$$
(44)

Since \(f({\mathbf x}_k)\) is decreasing and the sequence \(\{f({\mathbf x}_k)\}\) is bounded below by H2, we have that

$$\begin{aligned} \lim _{k\rightarrow \infty }{(f({\mathbf x}_k)-f({\mathbf x}_k+t_k{\mathbf d}_k))}=0. \end{aligned}$$
(45)

Hence, (44) and (45) imply

$$\begin{aligned} \lim _{k\rightarrow \infty }{\sigma t_k\Vert {\mathbf g}_k\Vert ^2}=0. \end{aligned}$$

Now, from \(t_k>0\) and \(\sigma >0\), we have

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert {\mathbf g}_k\Vert ^2=0, \end{aligned}$$

which completes the proof. \(\square \)

6 Numerical Experiments

In this section, we present numerical results obtained by testing the \(\textit{DHSDL}\), \(\textit{DLSDL}\), \(\textit{MMDL}\), \(\textit{MLSCD}\) and H-\(\textit{BFGS}\)-\(\textit{CG}1\) methods. The codes used in the tests were written in the Matlab programming language, and the tests were performed on a workstation with an Intel Core i3 2.0 GHz processor. The number of iterations, the number of function evaluations and the CPU time are analyzed for all tested methods.

In the first part, we compare the \(\textit{DHSDL}\), \(\textit{DLSDL}\), \(\textit{MMDL}\) and \(\textit{MLSCD}\) methods. The 33 test functions are selected from [30]. For each function, we considered 10 numerical experiments with the number of variables equal to 100, 500, 1000, 2000, 3000, 5000, 7000, 8000, 10,000 and 15,000. Summary numerical results for \(\textit{DHSDL}\), \(\textit{DLSDL}\), \(\textit{MMDL}\) and \(\textit{MLSCD}\), tested on these 33 large-scale test functions, are presented in Tables 1, 2 and 3.

The stopping conditions for all algorithms are

$$\begin{aligned} \Vert {\mathbf g}_k\Vert \le 10^{-6} \quad \mathrm {and}\quad \frac{|f({\mathbf x}_{k+1})-f({\mathbf x}_k)|}{1+|f({\mathbf x}_k)|} \le 10^{-16}. \end{aligned}$$

The backtracking parameters for all algorithms are \(\sigma = 0.0001\) and \(\beta =0.8\), which means that we accept a small decrease in f predicted by a linear approximation at the current point.
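
As a sketch, the stopping test described above corresponds to the following check (Python; the function name and interface are ours, not the code used in the experiments):

```python
import numpy as np

def stopping_test(g, f_new, f_old, tol_g=1e-6, tol_f=1e-16):
    """Stop when ||g_k|| <= 1e-6 and |f(x_{k+1}) - f(x_k)| / (1 + |f(x_k)|) <= 1e-16."""
    return np.linalg.norm(g) <= tol_g and abs(f_new - f_old) / (1.0 + abs(f_old)) <= tol_f
```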

Table 1 Summary numerical results for the number of iterations
Table 2 Summary numerical results for the number of function evaluations

The performance profile, proposed in [31], is a widely used tool for benchmarking and comparing the performance of optimization software on a large test set. As usual, the number of iterations, the number of function evaluations and the computing time (CPU time) are used as performance measures. Figure 1 (left) shows the performance of the compared methods with respect to the number of iterations, while Fig. 1 (right) illustrates their performance with respect to the number of function evaluations. The top curve corresponds to the method that exhibits the best performance with respect to the chosen measure.
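
For completeness, a performance profile in the sense of [31] can be computed roughly as follows (a Python sketch under the assumption that failures are encoded as infinite costs; this is not the code used to produce the figures):

```python
import numpy as np

def performance_profile(T, taus):
    """Performance profiles [31]: T is an (n_problems x n_solvers) array of a cost
    metric (iterations, function evaluations or CPU time). For each tau in taus,
    returns the fraction of problems each solver solves within a factor tau of the best."""
    ratios = T / T.min(axis=1, keepdims=True)   # performance ratios r_{p,s}
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])
```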

Table 3 Summary numerical results for the CPU time

Figure 2 shows the performance of the considered methods relative to the CPU time.

Figure 1 (left) shows that the \(\textit{DHSDL}\), \(\textit{DLSDL}\), \(\textit{MMDL}\) and \(\textit{MLSCD}\) methods successfully solve all the problems and that the \(\textit{MLSCD}\) method is the best in 82% of the test problems, compared to the other methods.

Figure 1 (right) shows that the \(\textit{DHSDL}\), \(\textit{DLSDL}\), \(\textit{MMDL}\) and \(\textit{MLSCD}\) methods successfully solve all the problems and that the \(\textit{MLSCD}\) method is the best in 78% of the test problems, compared to the other methods.

Fig. 1 Performance profiles based on the number of iterations (left) and function evaluations (right)

Fig. 2 Performance profile based on CPU time

The graphs in Fig. 2 show that the \(\textit{DHSDL}\), \(\textit{DLSDL}\), \(\textit{MMDL}\) and \(\textit{MLSCD}\) methods successfully solve all the problems and that the \(\textit{MLSCD}\) method is the best in 73% of the test problems, compared to the other methods.

Consequently, Figs. 1 and 2 show that the \(\textit{MLSCD}\) method generates the best result with respect to all three considered criteria.

Table 4 contains the average number of iterations, the average CPU time and the average number of function evaluations for all 330 numerical experiments.

Table 4 Average numerical outcomes for 33 test functions tested on 10 numerical experiments

Based on the results in Table 4, it can be observed that the \(\textit{MLSCD}\) method gives better results than the \(\textit{DHSDL}\), \(\textit{DLSDL}\) and \(\textit{MMDL}\) methods. This conclusion is confirmed by the performance profiles for the number of iterations, the number of function evaluations and the CPU time.

In order to compare all five methods (\(\textit{DHSDL}\), \(\textit{DLSDL}\), \(\textit{MMDL}\), \(\textit{MLSCD}\) and H-\(\textit{BFGS}\)-\(\textit{CG}1\)), we reduce the number of variables because of the high average CPU time of the H-\(\textit{BFGS}\)-\(\textit{CG}1\) method. For each test problem, we considered 10 numerical experiments with the number of variables equal to 100, 200, 300, 500, 700, 800, 1000, 1500, 2000 and 3000. The stopping conditions are the same as in the previous tests, and the values of the backtracking parameters for all methods are identical. Summary numerical results for all five methods, tested on 26 large-scale test functions, are presented in Tables 5, 6 and 7. Figures 3 and 4 show the performance of these methods with respect to the number of iterations, the number of function evaluations and the CPU time.

In Fig. 3 (left), we see that all five methods successfully solve all the problems; the \(\textit{MLSCD}\) method is the best in 42% of the test problems, compared to \(\textit{DHSDL}\) (7%), \(\textit{DLSDL}\) (7%), \(\textit{MMDL}\) (23%) and H-\(\textit{BFGS}\)-\(\textit{CG}1\) (42%).

In Fig. 3 (right), we see that all five methods successfully solve all test problems; the \(\textit{MLSCD}\) method is the best in 38% of the test problems, compared to \(\textit{DHSDL}\) (12%), \(\textit{DLSDL}\) (15%), \(\textit{MMDL}\) (30%) and H-\(\textit{BFGS}\)-\(\textit{CG}1\) (34%).

In Fig. 4, we see that all five methods successfully solve all the problems; the \(\textit{MLSCD}\) method is the best in 50% of the test problems, compared to \(\textit{DHSDL}\) (23%), \(\textit{DLSDL}\) (12%), \(\textit{MMDL}\) (35%) and H-\(\textit{BFGS}\)-\(\textit{CG}1\) (0%).

Table 8 contains the average number of iterations, the average CPU time, and the average number of function evaluations for all 260 numerical experiments.

Table 5 Summary numerical results for the number of iterations
Table 6 Summary numerical results for the number of function evaluations
Table 7 Summary numerical results for the CPU time
Fig. 3 Performance profiles based on the number of iterations (left) and function evaluations (right)

Fig. 4 Performance profile based on CPU time

The results in Table 8 show remarkable progress in reducing the number of iterations and the number of function evaluations by the H-\(\textit{BFGS}\)-\(\textit{CG}1\) method. Compared to the \(\textit{DHSDL}\), \(\textit{DLSDL}\), \(\textit{MMDL}\) and \(\textit{MLSCD}\) methods, the H-\(\textit{BFGS}\)-\(\textit{CG}1\) method achieves a lower average number of iterative steps, and it also achieves a lower average number of function evaluations than the \(\textit{DHSDL}\), \(\textit{DLSDL}\) and \(\textit{MMDL}\) methods.

However, if we observe the average CPU time for all five methods, we conclude that the H-\(\textit{BFGS}\)-\(\textit{CG}1\) method is slow. The same conclusion follows from the graphs in Fig. 4.

From the above, we conclude that the \(\textit{MLSCD}\) method is the most efficient with respect to all three underlying metrics: the number of iterations, the number of function evaluations and the CPU time.

7 Conclusions

We propose three efficient conjugate gradient methods for unconstrained optimization problems, in which the search directions always satisfy the sufficient descent condition. These methods are derived using various modifications of the conjugate gradient direction \(\mathbf {d}_k\) of the form (7) or (8), or using various combinations of the scaling parameters \(\beta _k\). The comparison criteria are the number of iterative steps, the CPU time and the number of function evaluations. Based on the backtracking line search conditions, we show that our methods are strongly convergent for uniformly convex functions and globally convergent for general functions. Numerical results illustrate that the proposed methods can outperform the existing ones. They also show that hybridizing a quasi-Newton method with a \(\textit{CG}\) method reduces the number of iterations and the number of function evaluations, but increases the CPU time.

Table 8 Average numerical outcomes for 26 test functions tested on 10 numerical experiments