1 Introduction

The general form of optimization problems can be written as follows:

$$\begin{aligned} \min&f(x),&\nonumber \\ s.t.&~~~c_{i}(x)=0,&i \in E, \nonumber \\&~~~ c_{j}(x)\ge 0,&j \in I, \nonumber \\&x \in \mathbb {R}^{n}, \end{aligned}$$
(1)

where E and I are, respectively, the index sets of equality and inequality constraints, and \(c_{i}(x)\), \(i \in E \cup I = \{1,\ldots ,m\}\), are the constraint functions. When both the objective function and the constraint functions are linear, the problem is called a linear program; otherwise, it is called a nonlinear program. A problem that has no equality or inequality constraints is said to be an unconstrained optimization problem. We now consider the unconstrained optimization problem

$$\begin{aligned} \min _{x\in \mathbb {R}^n} f(x), \end{aligned}$$
(2)

where \(f: \mathbb {R}^n \rightarrow \mathbb {R}\) is a smooth, continuously differentiable function. Finding a global minimizer of f(x) is difficult, because our knowledge of the objective function is usually only local. Therefore, most algorithms are able to find only a local minimizer, that is, a point that achieves the smallest value of f(x) in a neighborhood [2]. In other words, a point \(x^{*}\) is a local minimizer if there is a neighborhood \(\mathcal {N}\) of \(x^{*}\) (an open set containing \(x^{*}\)) such that \(f(x^{*}) \le f(x)\) for all \(x\in \mathcal {N}\). Most numerical methods for unconstrained optimization can be classified into two groups: line search algorithms and trust region algorithms. Many useful algorithms exist for solving problem (2), such as conjugate gradient methods, trust region methods, quasi-Newton methods, the classical Newton method, the Nelder–Mead simplex method for problems with noisy functions, and the Levenberg–Marquardt method [2,3,4]. Among these, the classical Newton method is well known for its fast convergence. Several modifications of the Newton method for unconstrained minimization have been proposed to achieve global and local convergence; see [2, 4] and the references therein. In the Newton method, positive definiteness of the Hessian matrix of the objective function is an essential condition for obtaining the local minimum and fast local convergence. Zhou et al. [5] introduced a new algorithm for monotone equations and showed its superlinear convergence under a local error-bound assumption that is weaker than the standard nonsingularity condition. A new trust region method for nonlinear equations with the trust region radius converging to zero is proposed in [6], together with its convergence under some weak conditions. Li et al. [7] obtained two regularized Newton methods for convex minimization problems in which the Hessian at solutions may be singular, and showed that if the objective function is in LC\(^2\), the methods possess local quadratic convergence under a local error-bound condition without requiring isolated nonsingular solutions. Zhou and Chen [8] proposed a modified regularized Newton method for convex minimization problems whose Hessian matrices may be singular. Nesterov and Polyak [9] proposed a cubic regularization of the Newton method, which requires solving an unconstrained minimization problem at each iteration. Nesterov and Nemirovsky [10] presented the class of self-concordant functions, namely three times differentiable convex functions whose second and third derivatives satisfy a particular condition at each point. Polyak [11] introduced a regularized Newton method (RNM) for unconstrained convex optimization; for any convex function with a bounded optimal set, the RNM generates a sequence that converges to the optimal set from any starting point. Dehghan et al. [12] introduced a new modification of the Newton method to solve unconstrained optimization problems; they also used a new improved Newton method in a trust region algorithm for unconstrained minimization problems and analyzed its local and global convergence [13]. Recently, Dehghan et al. [14] proposed a new regularized Newton method based on the Q.I.F. method to solve optimization problems. One application of the above algorithms is the solution of systems of nonlinear equations.
For example, the systems of nonlinear equations appearing in [15,16,17,18,19] can be solved by these methods.

In this paper, we introduce a new algorithm for solving convex unconstrained optimization problems. The organization of the paper is as follows. In Sect. 2, a new algorithm for solving unconstrained minimization problems is presented. Its convergence analysis is given in Sect. 3. Numerical results comparing the proposed method with other algorithms are reported in Sect. 4, and conclusions are drawn in Sect. 5.

2 Description of the method

In this section, a brief review of the Newton method for unconstrained optimization problems, as described in [2], is given. Consider the minimization problem

$$\begin{aligned} \min _{x\in \mathbb {R}^n} f(x), \end{aligned}$$
(3)

where \(f:\mathbb {R}^n \longrightarrow \mathbb {R}\) is a smooth, twice continuously differentiable function. The gradient \(\nabla f(x)\) and the Hessian matrix \(\nabla ^{2} f(x)\) are denoted by g(x) and H(x), respectively. In a line search method, each iteration computes a search direction \(p_{k}\) and then decides how far to move along that direction. The iterations have the form

$$\begin{aligned} x_{k+1}=x_{k} + \alpha _{k}p_{k}, \end{aligned}$$
(4)

where the positive scalar \( \alpha _{k} \) is called the step length. The success of a line search method depends on appropriate choices of the step length \( \alpha _{k} \) and the direction \(p_{k}\). Most line search methods require \(p_{k}\) to be a descent direction, because this property guarantees that f(x) can be reduced along that direction. The search direction can be defined as

$$\begin{aligned} p_{k} = - B_{k}^{-1}\nabla f_{k}, \end{aligned}$$
(5)

where \( B_{k} \) is a symmetric and nonsingular matrix.
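As an illustration only (not part of the original text), the following minimal sketch shows one iteration of the form (4)–(5) in Python, assuming user-supplied handles for f, its gradient, and the matrix \(B_{k}\); the Armijo backtracking rule used to pick \(\alpha _{k}\) is just one common choice and is not prescribed by the text.

```python
import numpy as np

def line_search_step(f, grad, B, x, c=1e-4, rho=0.5):
    """One iteration of (4)-(5): p = -B^{-1} grad f(x), then a backtracking
    (Armijo) choice of the step length alpha_k."""
    g = grad(x)
    p = np.linalg.solve(B, -g)        # search direction (5); B assumed nonsingular
    alpha = 1.0
    while f(x + alpha * p) > f(x) + c * alpha * g.dot(p) and alpha > 1e-12:
        alpha *= rho                  # shrink the step until sufficient decrease holds
    return x + alpha * p              # update (4)
```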

The Newton method is a powerful technique for solving nonlinear equations. It is an application of Taylor polynomials for finding roots of functions. The Newton method is an iterative method for finding a simple root \(x^{*}\) of a nonlinear equation \(f(x)=0\), i.e., \(f(x^{*})=0\) and \(f'(x^{*})\ne 0\), by using

$$\begin{aligned} x_{k+1}=x_{k}-\frac{f (x_{k})}{f'(x_{k})}, ~~~~k=0,1,\ldots , \end{aligned}$$
(6)

which converges quadratically in some neighborhood of \(x^{*}\) [1, 20,21,22]. The Newton method can also be used to find a minimum or maximum of a function. Since the derivative is zero at a minimum or maximum, minima and maxima can be found by applying the Newton method to the derivative. The iteration becomes

$$\begin{aligned} x_{k+1}=x_{k}-\frac{f' (x_{k})}{f''(x_{k})}, ~~~~k=0,1,\ldots . \end{aligned}$$
(7)

The above method can be generalized to several dimensions by replacing the derivative with the gradient \( \nabla f(x_{k})\) and the reciprocal of the second derivative with the inverse of the Hessian matrix \( \nabla ^{2}f(x_{k}) \). This gives the iterative formula

$$\begin{aligned} x_{k+1}=x_{k}-\left( \nabla ^{2}f(x_{k})\right) ^{-1}\nabla f(x_{k}),~~~k=0,1,\ldots . \end{aligned}$$
(8)
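As a small sanity check (our illustration, not taken from the paper), applying (8) to the strictly convex quadratic \(f(x)=\frac{1}{2}x^{T}Ax-b^{T}x\), whose gradient is \(Ax-b\) and whose Hessian is A, reaches the minimizer \(x^{*}=A^{-1}b\) in a single step:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])        # symmetric positive definite Hessian
b = np.array([1.0, 1.0])
x = np.zeros(2)                               # starting point x_0

grad = A @ x - b                              # gradient of 0.5*x^T A x - b^T x at x_0
x = x - np.linalg.solve(A, grad)              # Newton update (8)
print(np.allclose(x, np.linalg.solve(A, b)))  # True: one step reaches x*
```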

In the line search Newton method, \( B_{k} \) is the Hessian matrix \( \nabla ^{2}f(x_{k}) \). If the Hessian matrix is not positive definite, or is close to singular, then this matrix can be modified before or during the solution process. A general description of this method follows.

Algorithm 1 (line search Newton method with Hessian modification \(E_{k}\))
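Since the algorithm itself is not reproduced here, the following minimal sketch illustrates one common realization of the Hessian modification, \(E_{k}=\tau _{k} I\) with \(\tau _{k}\) increased until a Cholesky factorization succeeds (cf. [2]); the constants and the strategy are our assumptions for illustration and need not coincide with those used in Algorithm 1.

```python
import numpy as np

def modified_newton_direction(H, g, beta=1e-3):
    """Return p solving (H + E_k) p = -g, where E_k = tau*I is chosen so that
    H + tau*I is positive definite (tested via the Cholesky factorization)."""
    min_diag = np.min(np.diag(H))
    tau = 0.0 if min_diag > 0 else beta - min_diag
    while True:
        try:
            L = np.linalg.cholesky(H + tau * np.eye(len(g)))
            break                              # H + tau*I is positive definite
        except np.linalg.LinAlgError:
            tau = max(2.0 * tau, beta)         # increase the shift and try again
    y = np.linalg.solve(L, -g)                 # forward solve  L y = -g
    return np.linalg.solve(L.T, y)             # backward solve L^T p = y
```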

The choice of the Hessian modification \( E_{k}\) is crucial to the effectiveness of the method. The modified Newton method for a multiple root \(x^{*}\) of multiplicity m, i.e., \(f^{(j)}(x^{*})=0\) for \(j=0,1,\ldots ,m-1\) and \(f^{(m)}(x^{*})\ne 0\), is quadratically convergent and is written as

$$\begin{aligned} x_{k+1}=x_{k}-m\frac{f (x_{k})}{f'(x_{k})}, ~~~~k=0,1,\ldots . \end{aligned}$$
(9)

If the multiplicity m is unknown, the standard Newton method converges only linearly, with rate \(\frac{m-1}{m}\) [23]. Traub [1] used the transformation \(u(x)=\frac{f(x)}{f'(x)}\) instead of f(x) for computing a multiple root of \(f(x)=0\). The problem of finding a multiple root is thereby reduced to finding a simple root of the transformed equation \(u(x)=0\), so any iterative method can be used while preserving its original convergence order. Applying the standard Newton method (6) to \(u(x)=0\), we obtain

$$\begin{aligned} x_{k+1}=x_{k}-\frac{f(x_{k})f'(x_{k})}{f '(x_{k})^{2}- f(x_{k}) f ''(x_{k})},~~~~k=0,1,\ldots . \end{aligned}$$
(10)
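As a small numerical illustration (ours, not from the paper), applying iteration (10) to \(f(x)=(e^{x}-1)^{2}\), which has a root of multiplicity two at \(x^{*}=0\), restores fast convergence even though the multiplicity is not supplied:

```python
import math

f   = lambda x: (math.exp(x) - 1.0) ** 2
df  = lambda x: 2.0 * (math.exp(2 * x) - math.exp(x))        # f'(x)
d2f = lambda x: 2.0 * (2.0 * math.exp(2 * x) - math.exp(x))  # f''(x)

x = 0.5
for k in range(5):
    x -= f(x) * df(x) / (df(x) ** 2 - f(x) * d2f(x))         # iteration (10)
    print(k, x)   # the iterates approach the double root x* = 0 rapidly
```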

This method can be extended to n-variable functions \((f:\mathbb {R}^{n} \rightarrow \mathbb {R})\) as

$$\begin{aligned} x_{k+1}=x_{k}-\left( \nabla f(x_{k})\nabla f(x_{k})^{T}- f(x_{k}) \nabla ^{2}f(x_{k})\right) ^{-1}f(x_{k})\nabla f(x_{k}),~~ k=0,1,\ldots .\nonumber \\ \end{aligned}$$
(11)

Now, we propose a two-step algorithm to solve unconstrained optimization problems by using methods (8) and (11).

Algorithm 2 (the proposed two-step method)

Clearly, the matrix \( g_{k}g_{k}^{T}-f_{k}H_{k}\) is symmetric. Furthermore, the algorithm does not need to compute a step length: \( \alpha _{k}=1\) is used at each iteration.
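A minimal sketch of one iteration of the proposed scheme, as it is stated later in Theorem 3.5 (solve the two linear systems coming from (8) and (11) and combine the resulting directions with the weight \(\theta \)), could look as follows; the function and derivative handles are our assumptions for illustration.

```python
import numpy as np

def proposed_step(f, grad, hess, x, theta=0.25):
    """One iteration of the proposed two-step method with alpha_k = 1:
       (g g^T - f H) p = -f g   gives the direction from (11),
       H p_tilde = -g           gives the Newton direction from (8),
       x_new = x + theta * p + (1 - theta) * p_tilde."""
    fx, g, H = f(x), grad(x), hess(x)
    A = np.outer(g, g) - fx * H                 # the symmetric matrix g g^T - f H
    p = np.linalg.solve(A, -fx * g)
    p_tilde = np.linalg.solve(H, -g)
    return x + theta * p + (1.0 - theta) * p_tilde
```

With \(\theta =0\) the returned point is exactly the Newton update (8), consistent with Remark 1 below.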

3 Convergence analysis

In this section, we study the convergence of the proposed method. The following assumptions are imposed throughout the paper.

Assumption 3.1

(A1):

Let \(f:\Lambda \subset \mathbb {R}^{n} \rightarrow \mathbb {R}\) be defined on the bounded and closed set \(\Lambda \). Suppose f is twice continuously differentiable and let \(\overline{\Lambda } (x_{0})\) denote the closure of the level set

$$\begin{aligned} \Lambda (x_{0})=\lbrace x: x\in \Lambda , f(x)\le f(x_{0}) \rbrace . \end{aligned}$$
(12)
(A2):

\(\nabla f(x)\) and \(\nabla ^{2}f (x)\) are both Lipschitz continuous, that is, there exist constants \(L_{1} > 0\) and \(L_{2} > 0\) such that

$$\begin{aligned} \Vert \nabla f(x)-\nabla f(y)\Vert \le L_{1} \Vert x-y\Vert ,~~~ x, y\in \mathbb {R}^{n}, \end{aligned}$$
(13)

and

$$\begin{aligned} \Vert \nabla ^{2}f (x)-\nabla ^{2}f (y)\Vert \le L_{2} \Vert x-y\Vert ,~~~ x, y\in \mathbb {R}^{n}. \end{aligned}$$
(14)
(A3):

\( 2(\nabla f_{k}^{T}(f_{k} \nabla ^{2}f_{k})^{-1} \nabla f_{k})\le 1. \)

(A4):

\( L= \max \{L_{1}, L_{2}\} \) and \( \gamma =L\Vert (\nabla ^{2} f(x^{*}))^{-1}\Vert < 1\).

(A5):

\(\theta \in \mathbb {R} \) and \(\vert \theta \vert < \frac{1-\gamma }{6\gamma }. \)

Theorem 3.2

Suppose A is a nonsingular \(N \times N\) matrix, U is \(N \times M\), and V is \(M \times N\). Then \(A + UV\) is nonsingular if and only if \(I + V A^{-1} U\) is a nonsingular \(M \times M\) matrix. If this is the case, then

$$\begin{aligned} (A + UV )^{-1} = A^{-1}-A^{-1}U(I +V A^{-1}U)^{-1}V A^{-1}. \end{aligned}$$
(15)

This is the Sherman–Morrison–Woodbury formula [3, 24, 25]. See [3] for further generalizations.

Proposition 3.3

[3] Let B be a nonsingular \(n \times n\) matrix and let \( u, v \in \mathbb {R}^n. \) Then \(B + uv^{T}\) is invertible if and only if \(1 + v^{T}B^{-1}u \ne 0.\) In this case,

$$\begin{aligned} (B + uv^{T})^{-1}=\left( I-\frac{B^{-1}uv^{T}}{1+v^{T}B^{-1}u}\right) B^{-1}. \end{aligned}$$
(16)
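Identity (16) can be checked numerically; the following small script (our illustration, not part of the original text) compares both sides on randomly generated data:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)   # almost surely nonsingular
u, v = rng.standard_normal(4), rng.standard_normal(4)

Binv = np.linalg.inv(B)
lhs = np.linalg.inv(B + np.outer(u, v))
rhs = (np.eye(4) - Binv @ np.outer(u, v) / (1.0 + v @ Binv @ u)) @ Binv
print(np.allclose(lhs, rhs))                        # should print True
```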

Lemma 3.4

Suppose that Assumption 3.1(A1) and (A3) hold. Then

(I):

\( \Big | \frac{\nabla f_{k}^{T} (-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}} \Big | \le 1\).

(II):

\((-f_{k}\nabla ^{2}f_{k}+ \nabla f_{k} \nabla f_{k}^{T})^{-1}=(-f_{k}\nabla ^{2}f_{k})^{-1}\left( I-\frac{\nabla f_{k} \nabla f_{k}^{T} (-f_{k}\nabla ^{2}f_{k})^{-1}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}}\right) \).

Proof

From Assumption 3.1(A3), we have

$$\begin{aligned} 2(\nabla f_{k}^{T}(f_{k} \nabla ^{2}f_{k})^{-1} \nabla f_{k})\le 1 \Longrightarrow (\nabla f_{k}^{T}(-f_{k} \nabla ^{2}f_{k})^{-1} \nabla f_{k})\ge -\frac{1}{2}, \end{aligned}$$
(17)

and hence

$$\begin{aligned} \left| \frac{\nabla f_{k}^{T} (-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}} \right| \le 1. \end{aligned}$$
(18)

According to Theorem 3.2 and Proposition 3.3, we set \( B=-f_{k}\nabla ^{2}f_{k}\) and \(u=v=\nabla f_{k}\). From (17) we obtain

$$\begin{aligned} 1 + v^{T}B^{-1}u =1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}\ge \frac{1}{2}. \end{aligned}$$
(19)

Therefore, the matrix \(B + uv^{T}\) is invertible and we can get

$$\begin{aligned} (B + uv^{T})^{-1}= & {} (-f_{k}\nabla ^{2}f_{k}+ \nabla f_{k} \nabla f_{k}^{T})^{-1}\nonumber \\= & {} (-f_{k}\nabla ^{2}f_{k})^{-1}-(-f_{k}\nabla ^{2} f_{k})^{-1}\nabla f_{k}(1+ \nabla f_{k}^{T}\nonumber \\&(- f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k})^{-1}\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nonumber \\= & {} \frac{-1}{f_{k}}(\nabla ^{2} f_{k})^{-1} \left( I-\frac{\nabla f_{k} \nabla f_{k}^{T} (-f_{k}\nabla ^{2} f_{k})^{-1}}{1+\nabla f_{k}^{T} (-f_{k}\nabla ^{2} f_{k})^{-1} \nabla f_{k}}\right) . \end{aligned}$$
(20)

\(\square \)

Theorem 3.5

Suppose that Assumptions 3.1(A1)–(A5) hold. Assume that

$$\begin{aligned} \left( \nabla f(x_{k})\nabla f(x_{k})^{T}-f(x_{k})\nabla ^2 f(x_{k})\right) p_{k} = - f(x_{k})\nabla f(x_{k}), \end{aligned}$$
(21)

and

$$\begin{aligned} \nabla ^{2} f(x_{k})\tilde{p}_{k} = - \nabla f(x_{k}). \end{aligned}$$
(22)

Then the iteration \(x_{k+1} = x_{k} +\theta p_{k}+ (1-\theta ) \tilde{p}_{k}\) generated by Algorithm 2 converges to \(x^{*}\), provided the starting point \( x_{0} \) is sufficiently close to \(x^{*}\).

Proof

Throughout this proof, \(f_k\), \(\nabla f_k\) and \(\nabla ^2 f_k\) denote \(f(x_k)\), \(\nabla f(x_k)\) and \(\nabla ^2 f(x_k)\), respectively. By Lemma 3.4(II),

$$\begin{aligned} p_{k}=(\nabla ^{2} f_{k})^{-1} \left( I-\frac{\nabla f_{k} \nabla f_{k}^{T} (-f_{k}\nabla ^{2} f_{k})^{-1}}{1+\nabla f_{k}^{T} (-f_{k}\nabla ^{2} f_{k})^{-1} \nabla f_{k}}\right) \nabla f_{k}. \end{aligned}$$
(23)

Without loss of generality, we assume \( \nabla ^{2} f_{k} \) is a positive definite matrix. According to (22) and (23) we have

$$\begin{aligned}&x_{k+1}-x^{*} \nonumber \\&\quad = x_{k}+\theta p_{k}+(1-\theta )\tilde{p}_{k}-x^{*} \nonumber \\&\quad =x_{k}-x^{*}+\theta (\nabla ^{2}f_{k})^{-1}\left( I-\frac{\nabla f_{k} \nabla f_{k}^{T} (-f_{k} \nabla ^{2}f_{k})^{-1}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}} \right) \nabla f_{k} -(1-\theta ) (\nabla ^{2}f_{k})^{-1} \nabla f_{k}\nonumber \\&\quad = (\nabla ^{2}f_{k})^{-1}\left( \nabla ^{2}f_{k}(x_{k}-x^{*})+ \theta \left( I-\frac{\nabla f_{k} \nabla f_{k}^{T} (-f_{k}\nabla ^{2}f_{k})^{-1}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}} \right) \nabla f_{k} -(1-\theta ) \nabla f_{k} \right) \nonumber \\&\quad = (\nabla ^{2}f_{k})^{-1}\left( \nabla ^{2}f_{k}(x_{k}-x^{*})+ \theta \nabla f_{k}- \theta \frac{\nabla f_{k} \nabla f_{k}^{T} (-f_{k} \nabla ^{2}f_{k})^{-1}\nabla f_{k}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}} -(1- \theta ) \nabla f_{k} \right) \nonumber \\&\quad = (\nabla ^{2}f_{k})^{-1}\left( \nabla ^{2}f_{k}(x_{k}-x^{*})+ 2\theta \nabla f_{k}-\nabla f_{k}- \theta \frac{\nabla f_{k} \nabla f_{k}^{T} (-f_{k} \nabla ^{2}f_{k})^{-1}\nabla f_{k}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}} \right) , \end{aligned}$$
(24)

and hence

$$\begin{aligned}&\Vert x_{k+1}-x^{*} \Vert \nonumber \\&\quad = \left\| (\nabla ^{2}f_{k})^{-1}\left( \nabla ^{2}f_{k}(x_{k}-x^{*})+ 2\theta \nabla f_{k}-\nabla f_{k}- \theta \frac{\nabla f_{k} \nabla f_{k}^{T} (-f_{k} \nabla ^{2}f_{k})^{-1}\nabla f_{k}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}} \right) \right\| \nonumber \\&\quad \le \Vert (\nabla ^{2}f_{k})^{-1}\Vert \left\| \nabla ^{2}f_{k}(x_{k}-x^{*})+ 2\theta \nabla f_{k}-\nabla f_{k}- \theta \frac{\nabla f_{k} \nabla f_{k}^{T}(-f_{k} \nabla ^{2}f_{k})^{-1}\nabla f_{k}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}} \right\| \nonumber \\&\quad \le \Vert (\nabla ^{2}f_{k})^{-1}\Vert \left( \Vert \nabla ^{2}f_{k}(x_{k}-x^{*})-\nabla f_{k} \Vert + 2\vert \theta \vert \Vert \nabla f_{k} \Vert +\vert \theta \vert \Vert \nabla f_{k}\Vert \right) \nonumber \\&\quad = \Vert (\nabla ^{2}f_{k})^{-1}\Vert \left( \Vert \nabla ^{2}f_{k}(x_{k}-x^{*})-(\nabla f_{k}-\nabla f^{*} )\Vert + 2\vert \theta \vert \Vert \nabla f_{k} \Vert +\vert \theta \vert \Vert \nabla f_{k}\Vert \right) . \end{aligned}$$
(25)

Since

$$\begin{aligned} \nabla f_{k}-\nabla f^{*}=\int _{0}^{1} \nabla ^{2}f(x_{k}+t(x^{*}-x_{k}))(x_{k}-x^{*})dt, \end{aligned}$$
(26)

then

$$\begin{aligned}&\Vert \nabla ^{2}f(x_{k})(x_{k}-x^{*})-(\nabla f_{k}-\nabla f^{*})\Vert \nonumber \\&\quad = \left\| \int _{0}^{1}(\nabla ^{2}f(x_{k})-\nabla ^{2}f(x_{k}+t(x^{*}-x_{k})))(x_{k} -x^{*})dt\right\| \nonumber \\&\quad \le \int _{0}^{1} \Vert \nabla ^{2} f(x_{k})-\nabla ^{2} f(x_{k}+t(x^{*} -x_{k}))\Vert \Vert (x_{k}-x^{*})\Vert dt\nonumber \\&\quad \le \frac{1}{2} L \Vert x_{k}-x^{*} \Vert ^{2}. \end{aligned}$$
(27)

Now from Lemma 3.4-I, Assumption 3.1(A2) and (27), we obtain

$$\begin{aligned} \Vert x_{k+1}-x^{*} \Vert \le \Vert (\nabla ^{2}f_{k})^{-1}\Vert \left( \frac{1}{2} L \Vert x_{k}-x^{*} \Vert ^{2}+ 2 \vert \theta \vert L \Vert x_{k}-x^{*} \Vert +\vert \theta \vert L \Vert x_{k}-x^{*} \Vert \right) . \nonumber \\ \end{aligned}$$
(28)

Since \(\nabla ^{2}f(x^{*})\) is nonsingular and \( \nabla ^{2}f(x_{k})\rightarrow \nabla ^{2}f(x^{*})\), we have \( \Vert (\nabla ^{2} f(x_{k}))^{-1}\Vert \le 2 \Vert (\nabla ^{2} f(x^{*}))^{-1}\Vert \) for all sufficiently large k. Therefore, from Assumption 3.1(A4),

$$\begin{aligned} \Vert x_{k+1}-x^{*} \Vert \le \gamma (\Vert x_{k}-x^{*} \Vert ^{2}+ 6\vert \theta \vert \Vert x_{k}-x^{*} \Vert ). \end{aligned}$$
(29)

We now consider two cases: (i) \( \Vert x_{k}-x^{*} \Vert \le 1\) and (ii) \(\Vert x_{k}-x^{*} \Vert > 1\).

Case (i): This case implies that

$$\begin{aligned} \Vert x_{k+1} -x^{*}\Vert \le \gamma \left( 1+ 6\vert \theta \vert \right) \Vert x_{k}-x^{*} \Vert , \end{aligned}$$
(30)

and with the help of (30) we have

$$\begin{aligned} \Vert x_{k+1} -x^{*}\Vert \le \gamma \left( 1+ 6\vert \theta \vert \right) \Vert x_{k}-x^{*} \Vert \le \cdots \le (\gamma \left( 1+ 6\vert \theta \vert \right) )^{k+1} \Vert x_{0}-x^{*} \Vert . \end{aligned}$$
(31)

If the starting point is sufficiently near \( x^{*} \), then, by Assumption 3.1(A5), which gives \(\gamma \left( 1+ 6\vert \theta \vert \right) <1\), the sequence \( \lbrace x_{k}\rbrace \) converges to \( x^{*} \).

Case (ii): This case implies that

$$\begin{aligned} \Vert x_{k+1} -x^{*}\Vert \le \gamma \left( 1+ 6\vert \theta \vert \right) \Vert x_{k}-x^{*} \Vert ^{2}, \end{aligned}$$
(32)

and therefore

$$\begin{aligned} \Vert x_{k+1} -x^{*}\Vert \le \gamma \left( 1+ 6\vert \theta \vert \right) \Vert x_{k}-x^{*} \Vert ^{2} \le \cdots \le (\gamma \left( 1+ 6\vert \theta \vert \right) )^{2^{k+1}-1}\Vert x_{0}-x^{*}\Vert ^{2^{k+1}},\nonumber \\ \end{aligned}$$
(33)

and, similarly to the previous case, by Assumption 3.1(A5) the sequence \( \lbrace x_{k}\rbrace \) converges to \( x^{*} \). \(\square \)

Remark 1

For \( \theta =0\), the proposed method reduces to the standard Newton method.

4 Numerical results

In this section, we report the results of numerical experiments with the proposed algorithm. The effectiveness of the proposed method is compared with the classical Newton method and the modified regularized Newton (MRN) method proposed by Zhou and Chen [8]. We set \( \tau =10^{-6} \), and the stopping condition is \( \Vert g(x_{k})\Vert \le 10^{-5} \). Moreover, we set \( p_{0}=0.0001\), \(p_{1}=0.25\), \(p_{2}=0.75\), \(\mu _{1}=10^{-2} \) and \(m=10^{-8}\) in the MRN method. \(N_f\) denotes the number of objective function evaluations and \(N_g\) the number of gradient evaluations. The codes were written in Matlab 12.0 and run on a 2.3 GHz PC with 8 GB of memory. The test functions below, with standard starting points [26, 27] and [28], are used for the comparison; "Dim" denotes the dimension of the problem. The obtained numerical results are summarized in Tables 1 and 2. As these tables show, the proposed method (PM) is preferable to the classical Newton method (NM) and the MRN method. We also used the performance profiles proposed by Dolan and Moré [29] to compare and evaluate each implementation on the test functions. Figures 1, 2, 3, 4, 5 and 6 give the performance profiles of the proposed method with \(\theta =0.25, 0.5\) and 0.75, the MRN method and the classical Newton method with respect to the number of objective function evaluations \((N_{f})\) and the number of gradient evaluations \((N_{g})\). Figures 1, 2 and 3 show the performance of the considered algorithms in terms of the number of function evaluations; for \(\theta =0.25\) and \(\theta =0.5\) the proposed algorithm performs better than the others. According to Figs. 4, 5 and 6, the proposed method with \(\theta =0.25\) also performs better than the other methods in terms of gradient evaluations.
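For completeness, a minimal sketch of how the Dolan–Moré profile \(\rho _{s}(\tau )\) can be computed from a cost table (for instance the \(N_{f}\) counts) is given below; the array layout is our assumption for illustration and is not taken from the paper.

```python
import numpy as np

def performance_profile(T):
    """T[p, s] is the cost (e.g. N_f) of solver s on problem p; failures can be
    marked with np.inf.  Returns the ratios r[p, s] = T[p, s] / min_s T[p, s]
    and a function rho(s, tau) = fraction of problems with r[p, s] <= tau."""
    r = T / T.min(axis=1, keepdims=True)
    rho = lambda s, tau: float(np.mean(r[:, s] <= tau))
    return r, rho
```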

Table 1 Numerical results
Table 2 Numerical results
Fig. 1 Performance profile for the number of function evaluations with \(\theta =0.25\)
Fig. 2 Performance profile for the number of function evaluations with \(\theta =0.5\)
Fig. 3 Performance profile for the number of function evaluations with \(\theta =0.75\)
Fig. 4 Performance profile for the number of gradient evaluations with \(\theta =0.25\)
Fig. 5 Performance profile for the number of gradient evaluations with \(\theta =0.5\)
Fig. 6 Performance profile for the number of gradient evaluations with \(\theta =0.75\)

4.1 Systems of nonlinear equations

In this part, we solve systems of nonlinear equations by using the proposed algorithm. Consider the following nonlinear system of equations

$$\begin{aligned} F(x)=0, \end{aligned}$$
(34)

where \(F(x)=(f_{1}(x),f_{2}(x), \ldots , f_{n}(x))\) and \( x \in \mathbb {R}^{n} \). This system can be written componentwise as

$$\begin{aligned}&f_{1}(x_{1},x_{2}, \ldots , x_{n})=0,\nonumber \\&f_{2}(x_{1},x_{2}, \ldots , x_{n})=0,\nonumber \\&\vdots \nonumber \\&f_{n}(x_{1},x_{2}, \ldots , x_{n})=0. \end{aligned}$$
(35)

To solve (34) with the proposed algorithm, we set \( f(x)=\sum \nolimits _{i=1}^{n} f_{i}^{2}(x) \). We have applied the proposed method to the following test problems. In all problems, the stopping criterion is \( \Vert f(x_{k})\Vert < 10^{-15}\).
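With this choice of merit function, the gradient and Hessian required by the algorithm follow from a standard computation (stated here for the reader's convenience):

$$\begin{aligned} \nabla f(x)=2J(x)^{T}F(x),\qquad \nabla ^{2}f(x)=2\Big (J(x)^{T}J(x)+\sum _{i=1}^{n}f_{i}(x)\nabla ^{2}f_{i}(x)\Big ), \end{aligned}$$

where J(x) denotes the Jacobian matrix of F at x.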

Problem 1

Consider the system of two equations [30]:

$$\begin{aligned} F(x)=\left\{ \begin{array}{ll} x_{1}+e^{x_{2}}-\cos x_{2} =0, \\ 3 x_{1}-\sin x_{1}-x_{2} =0, \end{array}\right. \end{aligned}$$

with the initial value \( x_{0}=(1.5, 1.5)\) and exact solution \(x^{*} =(0, 0).\)

Problem 2

Consider the system of three equations [30]:

$$\begin{aligned} F(x)=\left\{ \begin{array}{ll} 10 x_{1}+\sin (x_1 +x_2)-1 =0, \\ 8 x_2- \cos ^2 (x_3 -x_2)-1 =0, \\ 12 x_3 + \sin x_3 -1 =0,~~~~~~~~ \end{array}\right. \end{aligned}$$

with the initial value \(x_{0}=(-1, 1, -1)\) and exact solution

$$\begin{aligned} x^{*}=(0.0689783491726666, 0.2464424186091830, 0.0769289119875370). \end{aligned}$$
Table 3 Numerical results for Problems 1–4

Problem 3

Consider the following system of \( n=16 \) equations, where \( 1\le i \le n-1 \) [31]:

$$\begin{aligned} F(x)=\left\{ \begin{array}{l} x_{i} \sin (x_{i+1})-1=0,\\ x_{n} \sin (x_{1})-1=0, \end{array}\right. \end{aligned}$$

with the initial value \( x_{0}=(-0.85,-0.85, \ldots , -0.85)\) and exact solution

$$\begin{aligned} x^{*}=(-1.114157140871930087, \ldots , -1.114157140871930087). \end{aligned}$$

Problem 4

Next, consider a system of 25 equations [32]:

$$\begin{aligned} f_i(x)= x_{i}- \cos \left( 2x_{i}-\displaystyle \sum _{j=1}^{25} x_{j}\right) =0,~~ 1 \le i \le 25. \end{aligned}$$

The initial guess is \( x_{0}=(0.2, 0.2, \ldots , 0.2) \) and the root correct up to 16 digits is given by \( x^{*}=(0.2142756147552158,0.2142756147552158, \ldots , 0.2142756147552158).\)

Table 3 shows the results for the considered nonlinear systems solved by the proposed algorithm, the MRN method and the classical Newton method. From Table 3, we can see that the proposed method performs better on these problems than the other two algorithms.

5 Conclusions

In the present work, we proposed a new algorithm for solving convex unconstrained optimization problems. The proposed method is based on Traub's iterative algorithm [1], extended to n variables. The local convergence of the proposed method was also established. A key property of this method is that it does not need to compute a step length: \(\alpha _{k}=1\) is used at each iteration. The numerical results and comparisons with other algorithms confirm the efficiency and robustness of our algorithm. In addition, comparing the performance profiles for the three values \(\theta =0.25, 0.5, 0.75\) shows that the numerical results for \(\theta = 0.25\) are better than those for the other two values of \(\theta \). The algorithm was also used to solve systems of nonlinear equations, where it gave better numerical results than the MRN method and the classical Newton method.