1 Introduction

Consider the unconstrained nonlinear optimization problem:

$$\begin{aligned} \min f(x), \ \ \ x \in \mathbb {R}^n, \end{aligned}$$

where \(f:\mathbb {R}^n\rightarrow \mathbb {R}\) is twice continuously differentiable.

Conjugate gradient (CG) methods are popular for solving large-scale unconstrained minimization problems, since they require no matrix storage. One step of these iterative methods is defined as follows:

$$\begin{aligned} x_{k+1} = x_k + \alpha _k d_k, \end{aligned}$$
(1)

where \(\alpha _k>0\) is a step length and \(d_k\) is the search direction defined by:

$$\begin{aligned} d_k=\left\{ \begin{array}{ll} -g_k, &{} \hbox { for } \quad k = 0, \\ -g_k+\beta _k d_{k-1}, &{} \hbox {for } \quad k> 0, \end{array} \right. \end{aligned}$$
(2)

where \(g_{k} = \nabla f (x_{k})\) and \(\beta _k \) is a scalar called the CG parameter. Different choices of the CG parameter in (2) lead to different CG methods.

The step length \(\alpha _k\) is determined by an exact or inexact line search technique (Nocedal and Wright 2006). Throughout this work, we use the strong Wolfe conditions (Nocedal and Wright 2006) to determine the step length:

$$\begin{aligned}&f(x_k + \alpha _k d_k) \le f(x_k) +\sigma _1 \alpha _k g_k^{T} d_k,\\&\Big \vert g(x_k + \alpha _k d_k)^{T} d_k \Big \vert \le \sigma _2 \Big \vert g(x_k)^{T} d_k \Big \vert , \end{aligned}$$

with \(0<\sigma _1<\sigma _2<1.\)
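
For concreteness, the following minimal Python sketch checks whether a candidate step length satisfies the strong Wolfe conditions; the helper `satisfies_strong_wolfe` and the quadratic test function are illustrative constructions of ours, not part of the method.

```python
import numpy as np

def satisfies_strong_wolfe(f, grad, x, d, alpha, sigma1=1e-3, sigma2=0.1):
    """Return True if alpha satisfies the strong Wolfe conditions along d."""
    slope0 = grad(x) @ d                      # directional derivative g_k^T d_k
    x_new = x + alpha * d
    sufficient_decrease = f(x_new) <= f(x) + sigma1 * alpha * slope0
    curvature = abs(grad(x_new) @ d) <= sigma2 * abs(slope0)
    return sufficient_decrease and curvature

# Illustration on the quadratic f(x) = 0.5 ||x||^2 with a steepest-descent direction
f = lambda x: 0.5 * x @ x
grad = lambda x: x
x = np.array([1.0, -2.0])
d = -grad(x)
print(satisfies_strong_wolfe(f, grad, x, d, alpha=1.0))   # True: exact minimizer along d
```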

Well-known CG methods include those proposed by Hestenes–Stiefel (HS) (1952), Fletcher–Reeves (FR) (1964), and Polak–Ribière (PR) (Polak and Ribière 1969; Polyak 1969).

Zoutendijk (1970) established the global convergence of the FR method with an exact line search, and Al-Baali (1985) extended this result to inexact line searches. Powell (1977) showed that the FR method can perform poorly in practice. In contrast, although the PR and HS methods work very well in practice, their global convergence is not guaranteed. To overcome this problem, Powell (1984) suggested a modified conjugate parameter.

Recently, Dai and Liao (2001) proposed some formulae for \(\beta _k\) by exploiting a conjugacy condition based on the standard secant relation:

$$\begin{aligned} B_{k+1}s_k=y_k , \end{aligned}$$
(3)

where \( s_{k} = x_{k+1} -x_{k}\) and \(y_k=g_{k+1}-g_{k}\).

The usual secant relation employs only gradient information and ignores the available function values. To approximate the Hessian of the objective function with higher accuracy, several researchers have modified the usual secant relation (3) to make full use of both the gradient and function values (see Wei et al. 2006; Yuan and Wei 2010; Yabe and Takano 2004; Zhang and Xu 2001). Using Taylor's series, Wei et al. (2006) modified (3) as follows:

$$\begin{aligned} B_{k+1}s_k=y_k + \frac{\vartheta _k}{\Vert s_k\Vert ^{2}} s_k, \end{aligned}$$
(4)

where \(\vartheta _{k}=2 (f_{k}-f_{k+1})+ (g_{k}+g_{k+1})^T s_{k}.\) Recently, Yuan and Wei (2010) extended the modified secant relation (4) as follows:

$$\begin{aligned} B_{k+1}s_k={\tilde{y}}_k, \end{aligned}$$
(5)

where \({\tilde{y}}_k =y_k + \frac{\max (\vartheta _k , 0)}{\Vert s_k\Vert ^{2}} s_k\).
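
As an illustration of the correction term in (4)–(5), here is a small Python sketch; the names `yuan_wei_vector` and `theta_k` are ours, and the test function is purely illustrative.

```python
import numpy as np

def yuan_wei_vector(f_k, f_k1, g_k, g_k1, s_k):
    """Return y_k and the corrected vector y_tilde of relation (5)."""
    y_k = g_k1 - g_k
    # vartheta_k = 2(f_k - f_{k+1}) + (g_k + g_{k+1})^T s_k, as defined below (4)
    theta_k = 2.0 * (f_k - f_k1) + (g_k + g_k1) @ s_k
    y_tilde = y_k + max(theta_k, 0.0) / (s_k @ s_k) * s_k
    return y_k, y_tilde

# Illustration on f(x) = x_1^4 + x_2^2, whose third derivatives are nonzero
f = lambda x: x[0]**4 + x[1]**2
g = lambda x: np.array([4.0 * x[0]**3, 2.0 * x[1]])
x_k, x_k1 = np.array([1.0, 1.0]), np.array([0.8, 0.5])
s_k = x_k1 - x_k
print(yuan_wei_vector(f(x_k), f(x_k1), g(x_k), g(x_k1), s_k))
```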

Such modified secant relations make use of both the available gradient and function values, but only at the latest available point. Ford and Moghrabi (1993, 1994, 1997) employed interpolating polynomials and introduced a different secant relation utilizing information from the three most recent points.

Ford and Moghrabi (1993), by constructing interpolating quadratic curves for x(t), modified the secant relation (3) as follows:

$$\begin{aligned} B_{k+1}r_k=w_k, \quad r_k=s_k-\psi s_{k-1}, \quad w_k=y_k-\psi y_{k-1}, \end{aligned}$$
(6)

where

$$\begin{aligned} \psi =\frac{\delta ^{2}}{1+2\delta }, \quad \delta =\frac{t_2-t_1}{t_1-t_0}. \end{aligned}$$
(7)

Notice that \(t_0, t_1\) and \(t_2\) are chosen, such that:

$$\begin{aligned} x(t_0)=x_{k-1},\quad x(t_1)=x_{k},\quad x(t_2)=x_{k+1}. \end{aligned}$$
(8)
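
For illustration, a short Python sketch of the multi-step quantities in (6)–(7) follows; the interpolation parameters \(t_0, t_1, t_2\) are taken as given inputs here (their concrete choice is discussed in Sect. 3), and all identifiers are illustrative names of ours.

```python
import numpy as np

def multi_step_pair(s_k, s_km1, y_k, y_km1, t0, t1, t2):
    """Return r_k, w_k and psi of relations (6)-(7) for given t0 < t1 < t2."""
    delta = (t2 - t1) / (t1 - t0)
    psi = delta**2 / (1.0 + 2.0 * delta)
    r_k = s_k - psi * s_km1
    w_k = y_k - psi * y_km1
    return r_k, w_k, psi

# Illustration with arbitrary data and equally spaced t-values
s_k, s_km1 = np.array([0.2, -0.1]), np.array([0.3, 0.4])
y_k, y_km1 = np.array([0.5, -0.2]), np.array([0.6, 0.8])
print(multi_step_pair(s_k, s_km1, y_k, y_km1, t0=-2.0, t1=-1.0, t2=0.0))
```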

We observe that the modified secant relation (6) utilizes information from the three most recent iterates, but does not exploit function values. Here, we use Taylor's series to construct an alternative secant relation that uses both the available gradient and function values. Moreover, we utilize information from the three most recent points instead of merely the last two. We then combine the new secant relation with the ideas of Dai and Liao (2001) and Moghrabi (2019) to obtain a new formula for \(\beta _k\) (see Eq. (48)).

The rest of our work is organized as follows. In Sect. 2, we briefly discuss the conjugacy condition proposed by Dai and Liao. In Sect. 3, we use Taylor's series to construct an alternative secant relation and, based on this relation, introduce a new formula for the CG parameter. In Sect. 4, we investigate the convergence of the proposed method. In Sect. 5, we report numerical results. We conclude in Sect. 6.

2 Dai–Liao’s nonlinear CG methods

Here, we briefly review Dai–Liao's conjugacy condition. The corresponding methods are based on the standard secant equation:

$$\begin{aligned} B_ks_{k-1}=y_{k-1}, \end{aligned}$$
(9)

where \(B_k\) is an approximation of the Hessian of f at \(x_k\).

The search direction \(d_k\) is a quasi-Newton direction obtained by solving:

$$\begin{aligned} B_kd=-g_k. \end{aligned}$$
(10)

It is well known that the search direction generated by the linear CG methods satisfies the following conjugacy conditions:

$$\begin{aligned} d_{i}^{T}Qd_{j}=0,~~~\forall i\ne j, \end{aligned}$$

where Q is a positive definite Hessian matrix of the quadratic objective function.

For a general nonlinear function f, the mean value theorem yields a \(\tau \in (0, 1)\) such that:

$$\begin{aligned} d_{k}^{T}y_{k-1}=\alpha _{k-1}d_{k}^{T} \nabla ^2 f(x_{k-1} +\tau \alpha _{k-1} d_{k-1})d_{k-1}. \end{aligned}$$
(11)

Hence, it is reasonable to replace (11) with:

$$\begin{aligned} d_{k}^{T}y_{k-1}=0. \end{aligned}$$

Using (9) and (10), we have:

$$\begin{aligned} d_{k}^{T}y_{k-1}=d_{k}^{T}( B_ks_{k-1})=-g_{k}^{T}s_{k-1}. \end{aligned}$$
(12)

Dai and Liao (2001) modified the conjugacy condition (12) as follows:

$$\begin{aligned} d_{k}^{T}y_{k-1}=-tg_{k}^{T}s_{k-1}, \end{aligned}$$
(13)

where t is a scalar. Then, using (2) and (13), they obtained the following formula for \(\beta _{k}\):

$$\begin{aligned} \beta _{k}^{DL}=\frac{ g_{k}^{T} (y_{k-1}-ts_{k-1})}{d_{k-1}^{T} y_{k-1}}. \end{aligned}$$

Recently, several CG methods based on other secant relations have been investigated (see Babaie-Kafaki et al. 2010; Li et al. 2007; Yabe and Takano 2004). In addition, using the multi-step secant relation (6), Ford et al. (2008) derived the following formula for \(\beta _{k}\):

$$\begin{aligned} \beta _{k}^{FNY}=\frac{ g_{k}^{T} (w_{k-1}-tr_{k-1})}{d_{k-1}^{T}w_{k-1}}. \end{aligned}$$
(14)
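
For concreteness, a minimal Python sketch of the Dai–Liao parameter and of formula (14) follows; `w_prev` and `r_prev` denote the multi-step vectors of (6), formed as in the sketch after (8), and the function names are illustrative ones of ours.

```python
import numpy as np

def beta_dl(g_k, y_prev, s_prev, d_prev, t):
    """Dai-Liao parameter: g_k^T (y_{k-1} - t s_{k-1}) / (d_{k-1}^T y_{k-1})."""
    return g_k @ (y_prev - t * s_prev) / (d_prev @ y_prev)

def beta_fny(g_k, w_prev, r_prev, d_prev, t):
    """Parameter (14) of Ford et al.: g_k^T (w_{k-1} - t r_{k-1}) / (d_{k-1}^T w_{k-1})."""
    return g_k @ (w_prev - t * r_prev) / (d_prev @ w_prev)

# Illustration with arbitrary vectors
g_k = np.array([0.3, -0.7]); d_prev = np.array([-1.0, 0.5])
y_prev = np.array([0.4, 0.1]); s_prev = np.array([0.2, -0.1])
print(beta_dl(g_k, y_prev, s_prev, d_prev, t=1.0))
```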

3 Two-step conjugate gradient method

Here, we use Taylor's series to construct an alternative secant relation that approximates the curvature of the objective function with high accuracy. Then, based on this new secant relation and adapting the idea of Dai and Liao (2001), we propose a new CG parameter.

3.1 A modified secant relation

Here, we make use of the three iterates \(x_{k-1}\), \(x_{k}\), and \(x_{k+1}\) generated by a quasi-Newton-type algorithm. Let

$$\begin{aligned} \varphi (t)=f(x(t)), \end{aligned}$$

where x(t) is a differentiable curve in \(\mathbb {R}^n.\) Since f is sufficiently smooth, so is \(\varphi (t)\).

Applying the chain rule to \(\varphi (t)\), we obtain:

$$\begin{aligned} \varphi '(t)=\left( \frac{{\text {d}}x}{{\text {d}}t}\right) ^{T}g(x(t)) \end{aligned}$$
(15)

and

$$\begin{aligned} \varphi ''(t)=\left( \frac{{\text {d}}x}{{\text {d}}t}\right) ^{T}G(x(t))\left( \frac{{\text {d}}x}{{\text {d}}t}\right) +\left( \frac{{\text {d}}^{2}x}{{\text {d}}t^{2}}\right) ^{T}g(x(t)). \end{aligned}$$
(16)

Now, suppose that x(t) is the quadratic interpolating polynomial satisfying:

$$\begin{aligned} x(t_0)=x_{k-1}, \quad x(t_1)=x_{k}, \quad x(t_2)=x_{k+1}. \end{aligned}$$

Of course, there are various choices for x(t). Using Lagrange interpolation, we have:

$$\begin{aligned} x(t)\equiv \sum _{j=0}^{2} L_j(t)x_{k+j-1}, \end{aligned}$$
(17)

where \(L_j(t), \; j=0, 1, 2,\) are the Lagrange basis polynomials:

$$\begin{aligned} L_j(t)=\prod _{i=0,i\ne j}^{2}\frac{t-t_i}{t_j-t_i}. \end{aligned}$$

Taking the derivative of (17), we obtain:

$$\begin{aligned} \frac{{\text {d}}x(t)}{{\text {d}}t}\equiv \sum _{j=0}^{2} L'_j(t)x_{k+j-1}. \end{aligned}$$
(18)

On the other hand, we know:

$$\begin{aligned} \sum _{j=0}^{2}L'_j(t)=0. \end{aligned}$$
(19)

Using (19), we can write (18) as:

$$\begin{aligned} \frac{{\text {d}}x(t)}{{\text {d}}t}\equiv L'_2(t)\Big (s_k -\frac{L'_0(t)}{L'_2(t)}s_{k-1} \Big ). \end{aligned}$$
(20)

Moreover, by taking the derivative of (20), we have:

$$\begin{aligned} \frac{{\text {d}}^{2}x(t)}{{\text {d}}t^{2}}\equiv L''_2(t)\Big (s_k -\frac{L''_0(t)}{L''_2(t)}s_{k-1} \Big ). \end{aligned}$$
(21)

Evaluating at \(t=t_2\), we can represent (20) and (21) as follows:

$$\begin{aligned} \frac{{\text {d}}x(t)}{{\text {d}}t}\equiv L'_2(t)\Big (s_k -\psi s_{k-1} \Big ), \end{aligned}$$
(22)

and

$$\begin{aligned} \frac{{\text {d}}^{2}x(t)}{{\text {d}}t^{2}}\equiv L''_2(t)\Big (s_k -\delta s_{k-1} \Big ), \end{aligned}$$
(23)

where \(\psi \) and \(\delta \) are given by (7).

To determine \(\delta \) and \(\psi \) in (7), the values \(t_j,\; j=0, 1, 2,\) are needed. For this purpose, we consider one of the successful approaches introduced by Ford and Moghrabi (1993). Consider a norm of the general form:

$$\begin{aligned} \Vert z\Vert _{M}=\{z^{T}Mz\}^{\frac{1}{2}}, \end{aligned}$$

where M is a symmetric positive definite matrix. Choosing \(t_2=0\) and taking M to be the identity matrix, \(t_1\) and \(t_0\) can be computed as follows:

$$\begin{aligned} t_1=-\Vert s_k\Vert , ~~t_0=-\Vert s_k\Vert -\Vert s_{k-1}\Vert . \end{aligned}$$
(24)

Using the Taylor expansion of the function \(\varphi \),

$$\begin{aligned} \varphi (t)\simeq \varphi (0)+t\varphi '(0)+\frac{t^{2}}{2}\varphi ''(0); \end{aligned}$$
(25)

that is:

$$\begin{aligned} {t^{2}}\varphi ''(0)\simeq 2\Big (\varphi (t)-\varphi (0)-t\varphi '(0)\Big ). \end{aligned}$$
(26)

Substituting \(t =t_1\) in (26), and using (15), (16) and the fact that \(\varphi (t_1)=f_{k}\) and \(\varphi (0)=\varphi (t_2)=f_{k+1}\) (since \(t_2=0\)), relation (26) implies that:

$$\begin{aligned} \begin{array}{l} \Vert s_k\Vert ^{2} \left( \frac{{\text {d}}x}{{\text {d}}t}\vert _{ t=0}\right) ^{T}G_{k+1}\left( \frac{{\text {d}}x}{{\text {d}}t}\vert _{ t=0}\right) +\Vert s_k\Vert ^{2}\left( \frac{{\text {d}}^{2}x}{{\text {d}}t^{2}}\right) ^Tg_{k+1}\\ \simeq 2(f_{k}-f_{k+1})+2\Vert s_k\Vert \left( \frac{{\text {d}}x}{{\text {d}}t}\vert _{ t=0}\right) ^{T}g_{k+1};\end{array} \end{aligned}$$

that is:

$$\begin{aligned}&{\left( \Vert s_k\Vert \frac{{\text {d}}x}{{\text {d}}t}\vert _{ t=0}\right) ^{T}}G_{k+1} {\left( \Vert s_k\Vert \frac{{\text {d}}x}{{\text {d}}t}\vert _{ t=0}\right) }\nonumber \\&\quad \simeq 2(f_{k}-f_{k+1})+2{\left( \Vert s_k\Vert \frac{{\text {d}}x}{{\text {d}}t}\vert _{ t=0}\right) ^{T}}g_{k+1}-\Vert s_k\Vert ^{2}\left( \frac{{\text {d}}^{2}x}{{\text {d}}t^{2}}\right) ^Tg_{k+1}. \end{aligned}$$
(27)

By considering \(\{t_j\}_{j=0}^{2}\) as in (24), relations (22) and (23) can be written as:

$$\begin{aligned} \frac{{\text {d}}x}{{\text {d}}t}\vert _{t=0}=\frac{\eta _1}{\Vert s_k\Vert }(s_k-\psi s_{k-1}), \end{aligned}$$
(28)

and

$$\begin{aligned} \frac{{\text {d}}^{2}x}{{\text {d}}t^{2}}=\frac{2(s_k-\delta s_{k-1})}{t_{0}t_{1}}, \end{aligned}$$

where \(\psi \) and \(\delta \) are defined by (7), and \(\eta _1\) is given by:

$$\begin{aligned} \eta _1=\frac{t_0+t_1}{t_0}. \end{aligned}$$

This implies that:

$$\begin{aligned} -\Vert s_k\Vert ^{2}\left( \frac{{\text {d}}^{2}x}{{\text {d}}t^{2}}\right) ^{T}g_{k+1}=-2\eta _{2}{(s_k-\delta s_{k-1})^{T}g_{k+1}}, \end{aligned}$$
(29)

where

$$\begin{aligned} \eta _2= -\frac{\Vert s_k\Vert }{t_0}. \end{aligned}$$

Now, using (28) and (29), the fact that \(y_{k}=g_{k+1}-g_{k}\), and considering \(B_{k+1}\) as a new approximation of \(G_{k+1}\), relation (27) implies that:

$$\begin{aligned} {\overline{s}}_k^{T}B_{k+1}{\overline{s}}_k={\overline{s}}_k^{T}(y_k-\psi y_{k-1})+\vartheta _k, \end{aligned}$$
(30)

with

$$\begin{aligned} {\overline{s}}_k=\eta _1(s_k-\psi s_{k-1}), \end{aligned}$$
(31)

and

$$\begin{aligned} \vartheta _k= 2(f_{k}-f_{k+1})-2\eta _2(s_k-\delta s_{k-1})^Tg_{k+1}+{\overline{s}}_k^{T}(g_{k+1}+(1+\psi )g_k-\psi g_{k-1}). \end{aligned}$$

Equation (30) provides a new modified secant relation as follows:

$$\begin{aligned} B_{k+1} {\overline{s}}_k={\overline{y}}_k, \end{aligned}$$
(32)

where \({\overline{s}}_k\) is given by (31) and \({\overline{y}}_k\) defined by:

$$\begin{aligned} {\overline{y}}_k= y_k-\psi y_{k-1}+\frac{\vartheta _k}{\Vert {\overline{s}}_k\Vert ^{2}}{\overline{s}}_k. \end{aligned}$$
(33)

We note that this new secant relation uses information from the three most recent points, whereas the usual secant relation uses information merely from the two latest points. Moreover, both the available gradient and function values are utilized, while the multi-step secant relation (6) of Ford and Moghrabi made use of only the gradient information from the three most recent points and ignored the available function values.
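
The construction of \({\overline{s}}_k\) and \({\overline{y}}_k\) can be summarized in the following short Python sketch, which follows (24), (7), (31), and (33); all identifiers and the test function are illustrative choices of ours.

```python
import numpy as np

def modified_secant_pair(f_k, f_k1, g_km1, g_k, g_k1, s_km1, s_k):
    """Return s_bar and y_bar of (31) and (33) from the three latest iterates."""
    # t-values of (24) with t2 = 0, and the scalars psi, delta of (7)
    t1 = -np.linalg.norm(s_k)
    t0 = t1 - np.linalg.norm(s_km1)
    delta = -t1 / (t1 - t0)                  # equals ||s_k|| / ||s_{k-1}||
    psi = delta**2 / (1.0 + 2.0 * delta)
    eta1 = (t0 + t1) / t0
    eta2 = -np.linalg.norm(s_k) / t0

    s_bar = eta1 * (s_k - psi * s_km1)       # relation (31)
    y_k, y_km1 = g_k1 - g_k, g_k - g_km1
    # vartheta_k as defined below (31)
    theta = (2.0 * (f_k - f_k1)
             - 2.0 * eta2 * (s_k - delta * s_km1) @ g_k1
             + s_bar @ (g_k1 + (1.0 + psi) * g_k - psi * g_km1))
    y_bar = y_k - psi * y_km1 + theta / (s_bar @ s_bar) * s_bar   # relation (33)
    return s_bar, y_bar

# Illustration on f(x) = x_1^4 + x_2^2 with three consecutive points
f = lambda x: x[0]**4 + x[1]**2
g = lambda x: np.array([4.0 * x[0]**3, 2.0 * x[1]])
x_km1, x_k, x_k1 = np.array([1.2, 1.0]), np.array([1.0, 0.7]), np.array([0.7, 0.4])
print(modified_secant_pair(f(x_k), f(x_k1), g(x_km1), g(x_k), g(x_k1),
                           x_k - x_km1, x_k1 - x_k))
```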

A merit of the new modified secant relation can be seen from the following theorem.

Theorem 1

Suppose that the function f is sufficiently smooth. If \(\Vert s_k \Vert \) is small enough, then we have:

$$\begin{aligned} {\overline{s}}_{k}^{T}(G_{k+1}{\overline{s}}_{k} -{\overline{y}}_{k})=\frac{1}{3} {\overline{s}}_{k}^T (T_{k+1} {\overline{s}}_{k}) {\overline{s}}_{k}+O (\parallel {\overline{s}}_{k}\parallel ^{4}), \end{aligned}$$
(34)

and

$$\begin{aligned} r_{k}^{T}(G_{k+1}r_k -w_{k})=\frac{1}{2} {\overline{s}}_{k}^T (T_{k+1} {\overline{s}}_{k}) {\overline{s}}_{k}+O (\parallel {\overline{s}}_{k} \parallel ^{4}), \end{aligned}$$
(35)

where

$$\begin{aligned} {\overline{s}}_k^T (T_{k+1} {\overline{s}}_k) {\overline{s}}_k=\sum _{i,j,l=1}^n \frac{\partial ^{3} f(x_{k+1})}{\partial x^i \partial x^j \partial x^l } {\overline{s}}_{k}^i {\overline{s}}_{k}^j {\overline{s}}_{k}^l. \end{aligned}$$
(36)

Proof

The result follows straightforwardly from (25). \(\square \)

Theorem 1 implies that the quantity \({\overline{s}}_{k}^{T}{\overline{y}}_{k}\) generated by the modified secant method approximates the second curvature \( {\overline{s}}_{k}^{T}G_{k+1} {\overline{s}}_{k}\) with a higher accuracy than the quantity \(r_{k}^{T}w_{k}\) does.

Remark 1

It is easy to see that:

$$\begin{aligned} \eta _1 =\frac{t_0+t_1}{t_0} = 1+\frac{t_1}{t_0} = 1+\frac{\Vert s_k\Vert }{\Vert s_k\Vert +\Vert s_{k-1}\Vert }\le 2. \end{aligned}$$
(37)

This relation implies that:

$$\begin{aligned} 1 \le \eta _1 \le 2. \end{aligned}$$
(38)

Moreover:

$$\begin{aligned} \eta _2= -\frac{\Vert s_k\Vert }{t_0}=\frac{\Vert s_k\Vert }{\Vert s_k\Vert +\Vert s_{k-1}\Vert }\le 1. \end{aligned}$$
(39)

Here, we consider some common assumptions.

Assumption A

The level set \(D = \{x \mid f(x) \le f(x_0)\}\) is bounded.

Assumption B

The function f is continuously differentiable on D, and its gradient is Lipschitz continuous; that is, there exists a constant \(L > 0\) such that \(\Vert g(x) - g(y) \Vert \le L \Vert x - y \Vert \) for all \( x, y \in D\).

It is clear that assumptions A and B imply that there is a positive constant \(\gamma \), such that:

$$\begin{aligned} \Vert g(x) \Vert \le \gamma , \quad \forall \quad x\in D. \end{aligned}$$

Lemma 1

Suppose that Assumption A holds and that \({\overline{s}}_k\) is defined by (31). Then, we have:

$$\begin{aligned} \frac{1}{2}\Vert s_k\Vert \le \Vert {\overline{s}}_k\Vert \le 3\Vert s_k\Vert . \end{aligned}$$
(40)

Proof

By the definitions of \(t_0, t_1\) and \(t_2\), we can get:

$$\begin{aligned} \delta =\frac{\Vert s_k\Vert }{\Vert s_{k-1}\Vert }. \end{aligned}$$

We see that \(\delta >0\) and \(\eta _{1} =\frac{t_0+t_1}{t_0}>0\); hence, we can deduce the following inequality:

$$\begin{aligned} \psi =\frac{\delta ^{2}}{1+2\delta }<\frac{\delta ^{2}}{2\delta }=\frac{\Vert s_k\Vert }{2\Vert s_{k-1}\Vert }. \end{aligned}$$
(41)

Now, using (31), (38), and (41), we have:

$$\begin{aligned} \Vert {\overline{s}}_k\Vert \le { \eta _1 } (\Vert s_k \Vert + \psi \Vert s_{k-1}\Vert ) \le 3\Vert s_k \Vert . \end{aligned}$$
(42)

Moreover, we can give:

$$\begin{aligned} \Vert {\overline{s}}_k\Vert \ge {\eta _1 } (\Vert s_k \Vert - \psi \Vert s_{k-1}\Vert ) \ge \frac{1}{2} \Vert s_k \Vert . \end{aligned}$$
(43)

Hence, (42) and (43) give the desired result (40). \(\square \)

3.2 A new formula for \(\beta _k\)

Here, considering the approach proposed by Dai and Liao, and using our modified secant relation given by (32), we propose a new conjugacy relation:

$$\begin{aligned} d_{k}^{T}{{\overline{y}}}_{k-1}=-tg_{k}^{T}{\overline{s}}_{k-1}. \end{aligned}$$
(44)

Using the conjugacy relation (44), we present a new formula for \(\beta _k\):

$$\begin{aligned} \beta _{k}=\frac{ g_{k}^{T} ({{\overline{y}}}_{k-1}-t{\overline{s}}_{k-1})}{d_{k-1}^{T}{{\overline{y}}}_{k-1}}. \end{aligned}$$
(45)

For a general function f, it is possible that the denominator \(d_{k-1}^{T} {{\overline{y}}}_{k-1}\) in (45) becomes zero. To overcome this difficulty, we modify (32) as follows:

$$\begin{aligned} B_{k+1} {\overline{s}}_k=y_k^{*} , \end{aligned}$$
(46)

with

$$\begin{aligned} y_k^{*} ={\overline{y}}_k +h_k {\overline{s}}_k,\quad h_k=C\Vert g_k\Vert +\max \left( -\frac{{\overline{y}}_k^T {\overline{s}}_k}{\Vert {\overline{s}}_k\Vert ^{2}},0\right) , \end{aligned}$$
(47)

where C is a positive constant.

Now, using the secant relation (46), we obtain another CG parameter as follows:

$$\begin{aligned} \beta _{k}^{{\text {new}}}=\frac{ g_{k}^{T} (y^{*}_{k-1}-ts_{k-1})}{d_{k-1}^{T} y^{*}_{k-1}}. \end{aligned}$$
(48)

From (38) and (47), it is not difficult to see that:

$$\begin{aligned}&d_{k-1}^{T}y^{*}_{k-1} \ge C \Vert g_{k-1}\Vert d_{k-1}^{T} {\overline{s}}_{k-1}\nonumber \\&\quad \ge \frac{1}{2}C \Vert g_{k-1}\Vert d_{k-1}^{T} {s}_{k-1} \ge \frac{1}{2}C\alpha ^{-1}_{k-1} \Vert g_{k-1}\Vert \Vert s_{k-1}\Vert ^{2}>0. \end{aligned}$$
(49)
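
A small Python sketch of the safeguarded vector (47) and of the parameter (48) follows; the names `y_star` and `beta_new` are ours, and the inputs are assumed to be formed as in the sketch after (33).

```python
import numpy as np

def y_star(y_bar, s_bar, g_k, C=1e-6):
    """Safeguarded vector of (47): y*_k = y_bar_k + h_k * s_bar_k."""
    h_k = C * np.linalg.norm(g_k) + max(-(y_bar @ s_bar) / (s_bar @ s_bar), 0.0)
    return y_bar + h_k * s_bar

def beta_new(g_k, ystar_prev, s_prev, d_prev, t):
    """CG parameter (48): g_k^T (y*_{k-1} - t s_{k-1}) / (d_{k-1}^T y*_{k-1})."""
    return g_k @ (ystar_prev - t * s_prev) / (d_prev @ ystar_prev)
```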

We can now state a new CG algorithm based on the proposed CG parameter (48).

Algorithm 1: New conjugate gradient method

Step 1: Choose \( \varepsilon>0 \) as a tolerance for the stopping criterion, \( C>0 \), \( \lambda > 0 \), \(\sigma _1 \in (0,1)\), \(\sigma _2\in (\sigma _1,1)\), and a starting point \(x_0 \in \mathbb {R}^n\). Set \(\alpha _{{\text {initial}}}=1\), \(d_0=-g_0\), and \(k = 0\).

Step 2: If \(\Vert g_k \Vert <\varepsilon \) then stop.

Step 3: Compute the step length \(\alpha _k\) to satisfy the strong Wolfe conditions with the initial step length set to \(\alpha _{initial}\).

Step 4: Set \(x_{k+1} = x_k + \alpha _k d_k.\)

Step 5: Compute \({\overline{s}}_k\) and \(y_{k}^{*}\) by (31) and (47), respectively. Then, calculate \(\beta _{k} \) by:

$$\begin{aligned} \beta _{k}\equiv \beta ^{{\text {New}}}_{k} =\frac{ g_{k+1}^{T} (y^{*}_{k}-ts_{k})}{d_{k}^{T} y^{*}_{k}}. \end{aligned}$$

Step 6: Compute \(d_{k+1}\) by \(d_{k+1}=-g_{k+1}+\beta _kd_k\).

Step 7: Compute the new initial step length by:

$$\begin{aligned} \alpha _{{\text {initial}}}=\lambda \frac{ \Vert s_{k} \Vert }{\Vert d_{k+1}\Vert }. \end{aligned}$$

Step 8: Set \(k = k + 1\) and go to Step 2.
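
The following Python sketch outlines Algorithm 1 under some simplifying assumptions: SciPy's Wolfe line search stands in for the strong Wolfe procedure (so the initial-step heuristic of Step 7 is left to the line search), the very first iteration falls back to steepest descent because a third point is not yet available, and the Rosenbrock test problem is used only for illustration.

```python
import numpy as np
from scipy.optimize import line_search, rosen, rosen_der

def new_cg(f, grad, x0, t=1.0, C=1e-6, eps=1e-6, max_iter=3000,
           sigma1=1e-3, sigma2=0.1):
    """A minimal, illustrative sketch of Algorithm 1 (not the reference code)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    x_prev = None                                  # x_{k-1}, needed by (31) and (47)
    for _ in range(max_iter):
        if np.linalg.norm(g) < eps:
            break
        alpha = line_search(f, grad, x, d, gfk=g, c1=sigma1, c2=sigma2)[0]
        if alpha is None:                          # line search failure: small fallback step
            alpha = 1e-4
        x_new = x + alpha * d
        g_new = grad(x_new)
        if x_prev is None:
            beta = 0.0                             # no third point yet: steepest descent
        else:
            s_km1, s_k = x - x_prev, x_new - x
            g_km1 = grad(x_prev)
            # scalars of (24) and (7)
            t1 = -np.linalg.norm(s_k)
            t0 = t1 - np.linalg.norm(s_km1)
            delta = -t1 / (t1 - t0)
            psi = delta**2 / (1.0 + 2.0 * delta)
            eta1, eta2 = (t0 + t1) / t0, -np.linalg.norm(s_k) / t0
            # modified secant pair (31), (33) and the safeguarded vector (47)
            s_bar = eta1 * (s_k - psi * s_km1)
            y_k, y_km1 = g_new - g, g - g_km1
            theta = (2.0 * (f(x) - f(x_new))
                     - 2.0 * eta2 * (s_k - delta * s_km1) @ g_new
                     + s_bar @ (g_new + (1.0 + psi) * g - psi * g_km1))
            y_bar = y_k - psi * y_km1 + theta / (s_bar @ s_bar) * s_bar
            h = C * np.linalg.norm(g) + max(-(y_bar @ s_bar) / (s_bar @ s_bar), 0.0)
            ystar = y_bar + h * s_bar
            beta = g_new @ (ystar - t * s_k) / (d @ ystar)     # formula (48)
        d = -g_new + beta * d
        x_prev, x, g = x, x_new, g_new
    return x

# Illustration on the Rosenbrock function
print(new_cg(rosen, rosen_der, np.array([-1.2, 1.0])))
```

In the experiments of Sect. 5, the Moré–Thuente line search and the adaptive choice of t given there are used instead of these simplifications.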

In the next section, we proceed to establish the global convergence of Algorithm 1.

4 Convergence analysis

First, we consider the following definition.

Definition 1

For the search directions \(\{d_k\}\), we say that the descent condition holds if and only if:

$$\begin{aligned} g_{k}^{T}d_{k}<0, \quad \forall \quad k\ge 0. \end{aligned}$$

To establish convergence of Algorithm 1, we first provide some lemmas.

Lemma 2

Let f satisfy Assumptions A and B. For \({\overline{y}}_{k}\) defined by (33), we have:

$$\begin{aligned} \Vert {\overline{y}}_{k} \Vert \le M_1 \Vert s_{k}\Vert , \end{aligned}$$
(50)

where \(M_1 > 0\) is a constant.

Proof

Using Assumptions A and B, Remark 1, the mean value theorem, and relation (40), we obtain:

$$\begin{aligned} \vert \vartheta _k \vert&= \vert 2(f_{k}-f_{k+1})-2\eta _{2}(s_k-\delta s_{k-1})^{T}g_{k+1}+{\overline{s}}_k^{T}(g_{k+1}+(1+\psi )g_k-\psi g_{k-1})\vert \\&= \vert -2g(x_{k}+\theta s_k)^{T}s_k-2\eta _{2}(s_k-\delta s_{k-1})^{T}g_{k+1}+{\overline{s}}_k^{T}(g_{k+1}+(1+\psi )g_k-\psi g_{k-1})\vert \\&\le 2\Vert g_{k+1}-g(x_{k}+\theta s_k)\Vert \Vert s_k\Vert +2\delta \Vert s_{k-1}\Vert \Vert g_{k+1}\Vert +\Vert {\overline{s}}_k \Vert \Vert g_{k+1}+(1+\psi )g_k-\psi g_{k-1}\Vert \\&\le 2L(1-\theta )\Vert s_k \Vert ^{2}+2\gamma \Vert s_k \Vert +6(1+\psi )\gamma \Vert {s}_k\Vert , \end{aligned}$$

where \(\theta \in (0, 1)\). Thus, we can deduce that there exists a constant \(a_1>0\), such that:

$$\begin{aligned} \vert \vartheta _k \vert \le a_1\Vert s_k\Vert ^{2}. \end{aligned}$$

Also, from the definition (33) of \({\overline{y}}_k\) and relation (40), we have:

$$\begin{aligned} \Vert {\overline{y}}_k\Vert= & {} \Big \Vert y_k-\psi y_{k-1}+\frac{\vartheta _k}{\Vert {\overline{s}}_k\Vert ^{2}}{\overline{s}}_k\Big \Vert \\\le & {} \Vert y_k\Vert + \vert \psi \vert \Vert y_{k-1}\Vert +\frac{\vert \vartheta _k \vert }{\Vert {\overline{s}}_k\Vert }\\\le & {} L \Vert s_k\Vert + \vert \psi \vert L \Vert s_{k-1}\Vert +2a_1\Vert {s}_k\Vert \\\le & {} \frac{3}{2}L \Vert s_k\Vert +2a_1\Vert {s}_k\Vert . \end{aligned}$$

Therefore, the inequality (50) holds with \(M_1=\frac{3}{2}L+2a_1.\)\(\square \)

Corollary 1

Let f satisfy Assumptions A and B, and let \(y^{*}_{k}\) be defined by (47). Then, there exists a constant \(M_2>0\) such that:

$$\begin{aligned} \Vert y_{k} ^{*}\Vert \le M_2 \Vert s_{k}\Vert . \end{aligned}$$
(51)

Proof

Considering Lemma 2 and assumptions A and B, we have:

$$\begin{aligned} \Vert y_{k} ^{*}\Vert&=\Vert {\overline{y}}_{k}+h_{k} {\overline{s}}_{k}\Vert \\&=\Big \Vert {\overline{y}}_{k}+\left( C\Vert g_{k}\Vert +\max \left( -\frac{{\overline{y}}_{k}^{T} {\overline{s}}_{k}}{\Vert {\overline{s}}_{k} \Vert ^{2}},0\right) \right) {\overline{s}}_{k}\Big \Vert \\&\le \Vert {\overline{y}}_{k}\Vert + C\Vert g_{k}\Vert \Vert {\overline{s}}_{k} \Vert + \frac{\vert {\overline{y}}_{k}^{T} {\overline{s}}_{k}\vert }{\Vert {\overline{s}}_{k} \Vert ^{2}}\Vert {\overline{s}}_{k }\Vert \\&\le 2 \Vert {\overline{y}}_{k}\Vert + C\Vert g_{k}\Vert \Vert {\overline{s}}_{k} \Vert \le (2M_1 +3C\gamma ) \Vert {s}_{k} \Vert , \end{aligned}$$

which implies (51) with \(M_2=2M_1 +3C\gamma .\)\(\square \)

For any CG method using the strong Wolfe line search conditions, the following general result was established in Dai et al. (1999).

Lemma 3

(Dai et al. 1999) Let Assumptions A and B hold. Consider any CG method of the form (1)–(2), where \(d_k\) is a descent direction and \(\alpha _{k}\) is computed using the strong Wolfe line search conditions. If

$$\begin{aligned} \sum _{k\ge 1}{} \frac{1}{\Vert d_k \Vert ^{2}}=\infty , \end{aligned}$$
(52)

then we have:

$$\begin{aligned} \liminf _{k\longrightarrow \infty }\Vert g(x_k)\Vert =0. \end{aligned}$$

Next, we establish the global convergence of Algorithm 1.

Theorem 2

Let f satisfy Assumptions A and B, and let \(\{x_k\}\) be generated by Algorithm 1. Then, we have:

$$\begin{aligned} \liminf _{k\rightarrow \infty }\Vert g(x_k)\Vert =0. \end{aligned}$$
(53)

Proof

In view of Lemma 3, it suffices to show that (52) holds.

From (49), we have:

$$\begin{aligned} d_{k-1}^{T}y^{*}_{k-1} \ge \frac{1}{2}C\alpha ^{-1}_{k-1} \Vert g_{k-1}\Vert \Vert s_{k-1}\Vert ^{2}. \end{aligned}$$
(54)

On the other hand, using (40) and (51), we have:

$$\begin{aligned}&\vert g_{k}^{T} y^{*}_{k-1}-tg_{k}^{T}{\overline{s}}_{k-1}\vert \le \vert g_{k}^{T} y^{*}_{k-1}\vert +t\vert g_{k}^{T}{\overline{s}}_{k-1}\vert \end{aligned}$$
(55)
$$\begin{aligned}&\quad \le \Vert g_{k}\Vert \Vert y^{*}_{k-1}\Vert +3 t\Vert g_{k} \Vert \Vert s_{k-1}\Vert \nonumber \\&\quad \le M_2 \Vert g_k\Vert \Vert s_{k-1}\Vert +3t \Vert g_k\Vert \Vert s_{k-1}\Vert \nonumber \\&\quad = (M_2+3t) \Vert g_k\Vert \Vert s_{k-1}\Vert . \end{aligned}$$
(56)

Consequently, from (2), (54) and (56), we have:

$$\begin{aligned} \Vert d_k\Vert= & {} \Vert - g_k+ \beta ^{{\text {New}}}_{k} d_{k-1}\Vert \\\le & {} \Vert g_k\Vert +\vert \beta ^{{\text {New}}}_{k} \vert \Vert d_{k-1}\Vert \\= & {} \Vert g_k\Vert + \frac{ \vert g_{k}^{T} y^{*}_{k-1}-tg_{k}^{T} s_{k-1}\vert }{\vert d_{k-1}^{T} y^{*}_{k-1}\vert } \Vert d_{k-1}\Vert \\\le & {} \gamma +2 \frac{( M_2+3 t )\Vert s_{k-1}\Vert }{C\alpha ^{-1}_{k-1} \Vert s_{k-1}\Vert ^{2}} \Vert d_{k-1}\Vert \\\le & {} \gamma + 2 \frac{(M_2+3 t ) }{ C}. \end{aligned}$$

Now, summing up over k,  we get:

$$\begin{aligned} \sum _{k\ge 1}^{} \frac{1}{\Vert d_k\Vert ^{2}}\ge \left( \frac{C}{ C\gamma +2( M_2+3t )}\right) ^{2} \sum _{k\ge 1}^{}1=\infty . \end{aligned}$$

Thus, by Lemma 3, the proof is complete. \(\square \)

5 Numerical results

Numerical results presented by Ford et al. (2008) show that the CG method with \({\beta }^{FNY}_k\) given by (14) performs better than the DL method of Dai and Liao (2001), the YT method of Yabe and Takano (2004), and the HS method of Hestenes and Stiefel (1952). Thus, we compare the performance of our CG method, denoted by M1, with the aforementioned CG method of Ford et al. (2008), denoted by M2. Our results are therefore reported for the following two algorithms:

M1: Our proposed method (Algorithm 1).

M2: The CG method of Ford et al. (2008) (Algorithm 1 with \( \beta ^{FNY}_k\) as given by (14)).

Since CG methods are mainly designed for large-scale problems, our experiments were performed on a set of 75 test problems from the CUTEr library (Gould et al. 2015), each with dimension at least 1000. A summary of these problems is given in Table 1.

Fig. 1: The Dolan–Moré performance profiles using the number of function evaluations

Fig. 2: The Dolan–Moré performance profiles using the number of iterations

Fig. 3: The Dolan–Moré performance profiles using CPU times

All the codes were written in Matlab 2011 and run on a PC with an Intel(R) Core(TM) i5-4200 3.6 GHz CPU, 4 GB of RAM, and the CentOS 6.2 Linux server operating system. The algorithms stop when

$$\begin{aligned} \Vert g_k\Vert \le 10^{-6}, \end{aligned}$$

or the total number of iterations exceeds 3000. An algorithm was terminated and considered to have failed on a problem when this iteration limit was reached.

For both algorithms, the step length \(\alpha _k\) is computed using the strong Wolfe conditions with \(\sigma _1 =0.001\) and \(\sigma _2 =0.1\), via the line search procedure of Moré and Thuente (1994). Moreover, we set \(C=10^{-6}\) in relation (47).

It is worth noting that the numerical performance of the DL method depends strongly on the choice of the parameter t, and recent work has been devoted to proper selections of t. Here, we set the parameter t as follows (Babaie-Kafaki and Ghanbari 2014):

$$\begin{aligned} t_k=\frac{\Vert y_{k}\Vert }{\Vert s_{k}\Vert }. \end{aligned}$$
(57)

We used the performance profiles of Dolan and Moré (2002) to evaluate the performance of the two algorithms with respect to CPU time, the number of iterations, and the total number of function and gradient evaluations, computed as:

$$\begin{aligned} N_{\text {f}} + 3N_{\text {g}}, \end{aligned}$$

where \(N_{\text {f}}\) and \(N_{\text {g}}\) denote the number of function and gradient evaluations, respectively. Figures 1, 2, and 3 show the results of the comparisons. From these figures, it is clearly observed that Algorithm 1 (M1) is more efficient than the method of Ford et al. (2008).
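
For completeness, the following Python sketch shows one way such Dolan–Moré profiles can be produced from a problems-by-solvers cost table (the cost here would be \(N_{\text {f}} + 3N_{\text {g}}\)); the numbers in the example are made up purely for illustration and are not the results reported in Figs. 1–3.

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(costs, labels, tau_max=10.0):
    """Plot Dolan-More profiles; costs is (n_problems, n_solvers), np.inf = failure."""
    best = np.min(costs, axis=1, keepdims=True)          # best cost on each problem
    ratios = costs / best                                 # performance ratios
    taus = np.linspace(1.0, tau_max, 200)
    for s, label in enumerate(labels):
        rho = [np.mean(ratios[:, s] <= tau) for tau in taus]
        plt.step(taus, rho, where="post", label=label)
    plt.xlabel(r"$\tau$")
    plt.ylabel(r"$\rho_s(\tau)$")
    plt.legend()
    plt.show()

# Made-up costs N_f + 3*N_g for two solvers on three problems (illustrative only)
costs = np.array([[120.0, 150.0],
                  [ 80.0, np.inf],
                  [200.0, 180.0]])
performance_profile(costs, ["M1", "M2"])
```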

6 Conclusion

We presented a new secant relation for unconstrained optimization that approximates the curvature of the objective function with high accuracy. Based on the proposed secant relation, and adapting the approach of Dai and Liao, we then derived a conjugate gradient method for solving unconstrained optimization problems. An interesting feature of the proposed method is that it takes both gradient and function values into account. Another important property is the utilization of information from the three most recent iterates instead of merely the last two. Under suitable assumptions, we established the global convergence of the proposed method without any convexity assumption on the objective function. Numerical results on a collection of problems from the CUTEr library show that the proposed method is more efficient than the conjugate gradient method proposed by Ford et al.