1 Introduction

The general form of optimization problems can be written as follows:

$$\begin{aligned} \min&f(x),&\nonumber \\ s.t.&~~~c_{i}(x)=0,&i \in E, \nonumber \\&~~~ c_{j}(x)\ge 0,&j \in I, \nonumber \\&x \in \mathbb {R}^{n}, \end{aligned}$$
(1)

where E and I are, respectively, the index sets of equality and inequality constraints, and \(c_{i}(x)\), \(i \in E \cup I = \{1,\ldots ,m\}\), are the constraint functions. When both the objective function and the constraint functions are linear, the problem is called a linear program; otherwise, it is called a nonlinear program. A problem that has no equality or inequality constraints is said to be an unconstrained optimization problem. We now consider the unconstrained optimization problem

$$\begin{aligned} \min _{x\in \mathbb {R}^n} f(x), \end{aligned}$$
(2)

where \(f: \mathbb {R}^n \rightarrow \mathbb {R}\) is a smooth, continuously differentiable function. Finding a global minimizer of f(x) is difficult, because our knowledge of the objective function is usually only local. Therefore, most algorithms are able to find only a local minimizer, that is, a point that achieves the smallest value of f(x) in a neighborhood [2]. In other words, a point \(x^{*}\) is a local minimizer if there is a neighborhood \(\mathcal {N}\) of \(x^{*}\) (an open set containing \(x^{*}\)) such that \(f(x^{*}) \le f(x)\) for all \(x\in \mathcal {N}\). Most numerical methods for unconstrained optimization can be classified into two groups: line search algorithms and trust region algorithms. Many useful algorithms exist for solving problem (2), such as conjugate gradient methods, trust region methods, quasi-Newton methods, the classical Newton method, the Nelder–Mead simplex method for problems with noisy functions, and the Levenberg–Marquardt method [2,3,4]. Among these, the classical Newton method is well known for its fast convergence. Several modifications of the Newton method for unconstrained minimization have been proposed to achieve global and local convergence; see [2, 4] and the references therein. In the Newton method, positive definiteness of the Hessian matrix of the objective function is an essential condition for obtaining the local minimum and fast local convergence. Zhou et al. [5] introduced a new algorithm for monotone equations and showed its superlinear convergence under a local error-bound assumption that is weaker than the standard nonsingularity condition. A new trust region method for nonlinear equations with the trust region radius converging to zero is proposed in [6], together with its convergence under some weak conditions. Li et al. [7] obtained two regularized Newton methods for convex minimization problems in which the Hessian at solutions may be singular, and showed that if the objective function is in LC\(^2\), the methods possess local quadratic convergence under a local error-bound condition without requiring isolated nonsingular solutions. Zhou and Chen [8] proposed a modified regularized Newton method for convex minimization problems whose Hessian matrices may be singular. Nesterov and Polyak [9] proposed a cubic regularization of the Newton method, which requires solving an unconstrained minimization problem at each iteration. Nesterov and Nemirovsky [10] presented the class of self-concordant functions, namely three times differentiable convex functions whose second and third derivatives satisfy a particular condition at each point. Polyak [11] introduced a regularized Newton method (RNM) for unconstrained convex optimization; for any convex function with a bounded optimal set, the RNM generates a sequence that converges to the optimal set from any starting point. Dehghan et al. [12] introduced a new modification of the Newton method to solve unconstrained optimization problems; they also used a new improved Newton method in a trust region algorithm for unconstrained minimization problems and analyzed its local and global convergence [13]. Recently, Dehghan et al. [14] proposed a new regularized Newton method based on the Q.I.F. method to solve optimization problems. One application of the above algorithms is the solution of systems of nonlinear equations.
For example, the systems of nonlinear equations appearing in [15,16,17,18,19] can be solved by these methods.

In this paper, we introduce a new algorithm for solving convex unconstrained optimization problems. The organization of the paper is as follows. In Sect. 2, a new algorithm for solving unconstrained minimization problems is presented. Its convergence analysis is given in Sect. 3. Numerical results comparing the proposed method with other algorithms are reported in Sect. 4, and conclusions are drawn in Sect. 5.

2 Description of the method

In this section, a brief review of the Newton method for unconstrained optimization problems, as described in [2], is given. Consider the minimization problem

$$\begin{aligned} \min _{x\in \mathbb {R}^n} f(x), \end{aligned}$$
(3)

where \(f:\mathbb {R}^n \longrightarrow \mathbb {R}\) is a smooth, twice continuously differentiable function. The gradient \(\nabla f(x)\) and the Hessian matrix \(\nabla ^{2} f(x)\) are denoted by g(x) and H(x), respectively. In a line search method, each iteration computes a search direction \(p_{k}\) and then decides how far to move along that direction. The iterations have the form

$$\begin{aligned} x_{k+1}=x_{k} + \alpha _{k}p_{k}, \end{aligned}$$
(4)

where the positive scalar \( \alpha _{k} \) is called the step length. The success of a line search method depends on appropriate choices of the step length \( \alpha _{k} \) and the direction \(p_{k}\). Most line search methods require \(p_{k}\) to be a descent direction, because this property guarantees that f(x) can be reduced along that direction. The search direction can be defined as

$$\begin{aligned} p_{k} = - B_{k}^{-1}\nabla f_{k}, \end{aligned}$$
(5)

where \( B_{k} \) is a symmetric and nonsingular matrix.
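As an illustration only (not part of the original text), the following minimal sketch shows one iteration of the form (4)–(5) in Python, assuming user-supplied handles for f, its gradient, and the matrix \(B_{k}\); the Armijo backtracking rule used to pick \(\alpha _{k}\) is just one common choice and is not prescribed by the text.

```python
import numpy as np

def line_search_step(f, grad, B, x, c=1e-4, rho=0.5):
    """One iteration of (4)-(5): p = -B^{-1} grad f(x), then a backtracking
    (Armijo) choice of the step length alpha_k."""
    g = grad(x)
    p = np.linalg.solve(B, -g)        # search direction (5); B assumed nonsingular
    alpha = 1.0
    while f(x + alpha * p) > f(x) + c * alpha * g.dot(p) and alpha > 1e-12:
        alpha *= rho                  # shrink the step until sufficient decrease holds
    return x + alpha * p              # update (4)
```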

The Newton method is a powerful technique for solving nonlinear equations. It is an application of Taylor polynomials for finding roots of functions. The Newton method is an iterative method for finding a simple root \(x^{*}\) of a nonlinear equation \(f(x)=0\), i.e., \(f(x^{*})=0\) and \(f'(x^{*})\ne 0\), by using

$$\begin{aligned} x_{k+1}=x_{k}-\frac{f (x_{k})}{f'(x_{k})}, ~~~~k=0,1,\ldots , \end{aligned}$$
(6)

which converges quadratically in some neighborhood of \(x^{*}\) [1, 20,21,22]. The Newton method can also be used to find a minimum or maximum of a function. Since the derivative is zero at a minimum or maximum, minima and maxima can be found by applying the Newton method to the derivative. The iteration becomes

$$\begin{aligned} x_{k+1}=x_{k}-\frac{f' (x_{k})}{f''(x_{k})}, ~~~~k=0,1,\ldots . \end{aligned}$$
(7)

The above method can be generalized to several dimensions by replacing the derivative with the gradient \( \nabla f(x_{k})\) and the reciprocal of the second derivative with the inverse of the Hessian matrix \( \nabla ^{2}f(x_{k}) \). This gives the iterative formula

$$\begin{aligned} x_{k+1}=x_{k}-\left( \nabla ^{2}f(x_{k})\right) ^{-1}\nabla f(x_{k}),~~~k=0,1,\ldots . \end{aligned}$$
(8)
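As a small sanity check (our illustration, not taken from the paper), applying (8) to the strictly convex quadratic \(f(x)=\frac{1}{2}x^{T}Ax-b^{T}x\), whose gradient is \(Ax-b\) and whose Hessian is A, reaches the minimizer \(x^{*}=A^{-1}b\) in a single step:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])        # symmetric positive definite Hessian
b = np.array([1.0, 1.0])
x = np.zeros(2)                               # starting point x_0

grad = A @ x - b                              # gradient of 0.5*x^T A x - b^T x at x_0
x = x - np.linalg.solve(A, grad)              # Newton update (8)
print(np.allclose(x, np.linalg.solve(A, b)))  # True: one step reaches x*
```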

In the line search Newton method, \( B_{k} \) is the Hessian matrix \( \nabla ^{2}f(x_{k}) \). If the Hessian matrix is not positive definite, or is close to singular, then this matrix can be modified before or during the solution process. A general description of this method follows.

Algorithm 1 (line search Newton method with Hessian modification \(E_{k}\))
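Since the algorithm itself is not reproduced here, the following minimal sketch illustrates one common realization of the Hessian modification, \(E_{k}=\tau _{k} I\) with \(\tau _{k}\) increased until a Cholesky factorization succeeds (cf. [2]); the constants and the strategy are our assumptions for illustration and need not coincide with those used in Algorithm 1.

```python
import numpy as np

def modified_newton_direction(H, g, beta=1e-3):
    """Return p solving (H + E_k) p = -g, where E_k = tau*I is chosen so that
    H + tau*I is positive definite (tested via the Cholesky factorization)."""
    min_diag = np.min(np.diag(H))
    tau = 0.0 if min_diag > 0 else beta - min_diag
    while True:
        try:
            L = np.linalg.cholesky(H + tau * np.eye(len(g)))
            break                              # H + tau*I is positive definite
        except np.linalg.LinAlgError:
            tau = max(2.0 * tau, beta)         # increase the shift and try again
    y = np.linalg.solve(L, -g)                 # forward solve  L y = -g
    return np.linalg.solve(L.T, y)             # backward solve L^T p = y
```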

The choice of the Hessian modification \( E_{k}\) is crucial to the effectiveness of the method. The modified Newton method for a multiple root \(x^{*}\) of multiplicity m, i.e., \(f^{(j)}(x^{*})=0\) for \(j=0,1,\ldots ,m-1\) and \(f^{(m)}(x^{*})\ne 0\), is quadratically convergent and is written as

$$\begin{aligned} x_{k+1}=x_{k}-m\frac{f (x_{k})}{f'(x_{k})}, ~~~~k=0,1,\ldots . \end{aligned}$$
(9)

If the multiplicity m is unknown, the standard Newton method converges only linearly, with rate \(\frac{m-1}{m}\) [23]. Traub [1] used the transformation \(u(x)=\frac{f(x)}{f'(x)}\) instead of f(x) for computing a multiple root of \(f(x)=0\). The problem of finding a multiple root is thereby reduced to finding a simple root of the transformed equation \(u(x)=0\), so any iterative method can be used while preserving its original convergence order. Applying the standard Newton method (6) to \(u(x)=0\), we obtain

$$\begin{aligned} x_{k+1}=x_{k}-\frac{f(x_{k})f'(x_{k})}{f '(x_{k})^{2}- f(x_{k}) f ''(x_{k})},~~~~k=0,1,\ldots . \end{aligned}$$
(10)
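As a small numerical illustration (ours, not from the paper), applying iteration (10) to \(f(x)=(e^{x}-1)^{2}\), which has a root of multiplicity two at \(x^{*}=0\), restores fast convergence even though the multiplicity is not supplied:

```python
import math

f   = lambda x: (math.exp(x) - 1.0) ** 2
df  = lambda x: 2.0 * (math.exp(2 * x) - math.exp(x))        # f'(x)
d2f = lambda x: 2.0 * (2.0 * math.exp(2 * x) - math.exp(x))  # f''(x)

x = 0.5
for k in range(5):
    x -= f(x) * df(x) / (df(x) ** 2 - f(x) * d2f(x))         # iteration (10)
    print(k, x)   # the iterates approach the double root x* = 0 rapidly
```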

This method can be extended to n-variable functions \((f:\mathbb {R}^{n} \rightarrow \mathbb {R})\) as

$$\begin{aligned} x_{k+1}=x_{k}-\left( \nabla f(x_{k})\nabla f(x_{k})^{T}- f(x_{k}) \nabla ^{2}f(x_{k})\right) ^{-1}f(x_{k})\nabla f(x_{k}),~~ k=0,1,\ldots .\nonumber \\ \end{aligned}$$
(11)

Now, we propose a two-step algorithm to solve unconstrained optimization problems by using methods (8) and (11).

Algorithm 2 (the proposed two-step method)

Clearly, the matrix \( g_{k}g_{k}^{T}-f_{k}H_{k}\) is symmetric. Furthermore, the algorithm does not need to compute a step length: \( \alpha _{k}=1\) is used at each iteration.
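A minimal sketch of one iteration of the proposed scheme, as it is stated later in Theorem 3.5 (solve the two linear systems coming from (8) and (11) and combine the resulting directions with the weight \(\theta \)), could look as follows; the function and derivative handles are our assumptions for illustration.

```python
import numpy as np

def proposed_step(f, grad, hess, x, theta=0.25):
    """One iteration of the proposed two-step method with alpha_k = 1:
       (g g^T - f H) p = -f g   gives the direction from (11),
       H p_tilde = -g           gives the Newton direction from (8),
       x_new = x + theta * p + (1 - theta) * p_tilde."""
    fx, g, H = f(x), grad(x), hess(x)
    A = np.outer(g, g) - fx * H                 # the symmetric matrix g g^T - f H
    p = np.linalg.solve(A, -fx * g)
    p_tilde = np.linalg.solve(H, -g)
    return x + theta * p + (1.0 - theta) * p_tilde
```

With \(\theta =0\) the returned point is exactly the Newton update (8), consistent with Remark 1 below.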

3 Convergence analysis

In this section, we study the convergence of the proposed method. The following assumptions are imposed throughout the paper.

Assumption 3.1

(A1):

Let \(f:\Lambda \subset \mathbb {R}^{n} \rightarrow \mathbb {R}\) be defined on the bounded and closed set \(\Lambda \). Suppose f is twice continuously differentiable and let \(\overline{\Lambda } (x_{0})\) denote the closure of the level set

$$\begin{aligned} \Lambda (x_{0})=\lbrace x: x\in \Lambda , f(x)\le f(x_{0}) \rbrace . \end{aligned}$$
(12)
(A2):

\(\nabla f(x)\) and \(\nabla ^{2}f (x)\) are both Lipschitz continuous, that is, there exist constants \(L_{1} > 0\) and \(L_{2} > 0\) such that

$$\begin{aligned} \Vert \nabla f(x)-\nabla f(y)\Vert \le L_{1} \Vert x-y\Vert ,~~~ x, y\in \mathbb {R}^{n}, \end{aligned}$$
(13)

and

$$\begin{aligned} \Vert \nabla ^{2}f (x)-\nabla ^{2}f (y)\Vert \le L_{2} \Vert x-y\Vert ,~~~ x, y\in \mathbb {R}^{n}. \end{aligned}$$
(14)
(A3):

\( 2(\nabla f_{k}^{T}(f_{k} \nabla ^{2}f_{k})^{-1} \nabla f_{k})\le 1. \)

(A4):

\( L= \max \{L_{1}, L_{2}\} \) and \( \gamma =L\Vert (\nabla ^{2} f(x^{*}))^{-1}\Vert < 1\).

(A5):

\(\theta \in \mathbb {R} \) and \(\vert \theta \vert < \frac{1-\gamma }{6\gamma }. \)

Theorem 3.2

Suppose A is a nonsingular \(N \times N\) matrix, U is \(N \times M\), and V is \(M \times N\). Then \(A + UV\) is nonsingular if and only if \(I + V A^{-1} U\) is a nonsingular \(M \times M\) matrix. If this is the case, then

$$\begin{aligned} (A + UV )^{-1} = A^{-1}-A^{-1}U(I +V A^{-1}U)^{-1}V A^{-1}. \end{aligned}$$
(15)

This is the Sherman–Morrison–Woodbury formula [3, 24, 25]. See [3] for further generalizations.

Proposition 3.3

[3] Let B be a nonsingular \(n \times n\) matrix and let \( u, v \in \mathbb {R}^n. \) Then \(B + uv^{T}\) is invertible if and only if \(1 + v^{T}B^{-1}u \ne 0.\) In this case,

$$\begin{aligned} (B + uv^{T})^{-1}=\left( I-\frac{B^{-1}uv^{T}}{1+v^{T}B^{-1}u}\right) B^{-1}. \end{aligned}$$
(16)
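Identity (16) can be checked numerically; the following small script (our illustration, not part of the original text) compares both sides on randomly generated data:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)   # almost surely nonsingular
u, v = rng.standard_normal(4), rng.standard_normal(4)

Binv = np.linalg.inv(B)
lhs = np.linalg.inv(B + np.outer(u, v))
rhs = (np.eye(4) - Binv @ np.outer(u, v) / (1.0 + v @ Binv @ u)) @ Binv
print(np.allclose(lhs, rhs))                        # should print True
```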

Lemma 3.4

Suppose that Assumption 3.1(A1) and (A3) hold. Then

(I):

\( \Big | \frac{\nabla f_{k}^{T} (-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}} \Big | \le 1\).

(II):

\((-f_{k}\nabla ^{2}f_{k}+ \nabla f_{k} \nabla f_{k}^{T})^{-1}=(-f_{k}\nabla ^{2}f_{k})^{-1}\left( I-\frac{\nabla f_{k} \nabla f_{k}^{T} (-f_{k}\nabla ^{2}f_{k})^{-1}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}}\right) \).

Proof

From Assumption 3.1(A3), we have

$$\begin{aligned} 2(\nabla f_{k}^{T}(f_{k} \nabla ^{2}f_{k})^{-1} \nabla f_{k})\le 1 \Longrightarrow (\nabla f_{k}^{T}(-f_{k} \nabla ^{2}f_{k})^{-1} \nabla f_{k})\ge -\frac{1}{2}, \end{aligned}$$
(17)

and hence

$$\begin{aligned} \left| \frac{\nabla f_{k}^{T} (-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}} \right| \le 1. \end{aligned}$$
(18)

According to Theorem 3.2 and Proposition 3.3, we set \( B=-f_{k}\nabla ^{2}f_{k}\) and \(u=v=\nabla f_{k}\). From (17) we obtain

$$\begin{aligned} 1 + v^{T}B^{-1}u =1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}\ge \frac{1}{2}. \end{aligned}$$
(19)

Therefore, the matrix \(B + uv^{T}\) is invertible and we can get

$$\begin{aligned} (B + uv^{T})^{-1}= & {} (-f_{k}\nabla ^{2}f_{k}+ \nabla f_{k} \nabla f_{k}^{T})^{-1}\nonumber \\= & {} (-f_{k}\nabla ^{2}f_{k})^{-1}-(-f_{k}\nabla ^{2} f_{k})^{-1}\nabla f_{k}(1+ \nabla f_{k}^{T}\nonumber \\&(- f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k})^{-1}\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nonumber \\= & {} \frac{-1}{f_{k}}(\nabla ^{2} f_{k})^{-1} \left( I-\frac{\nabla f_{k} \nabla f_{k}^{T} (-f_{k}\nabla ^{2} f_{k})^{-1}}{1+\nabla f_{k}^{T} (-f_{k}\nabla ^{2} f_{k})^{-1} \nabla f_{k}}\right) . \end{aligned}$$
(20)

\(\square \)

Theorem 3.5

Suppose that Assumptions 3.1(A1)–(A5) hold. Assume that

$$\begin{aligned} \left( \nabla f(x_{k})\nabla f(x_{k})^{T}-f(x_{k})\nabla ^2 f(x_{k})\right) p_{k} = - f(x_{k})\nabla f(x_{k}), \end{aligned}$$
(21)

and

$$\begin{aligned} \nabla ^{2} f(x_{k})\tilde{p}_{k} = - \nabla f(x_{k}). \end{aligned}$$
(22)

Then the iteration \(x_{k+1} = x_{k} +\theta p_{k}+ (1-\theta ) \tilde{p}_{k}\) generated by Algorithm 2 converges to \(x^{*}\), provided the starting point \( x_{0} \) is sufficiently close to \(x^{*}\).

Proof

Throughout this proof, \(f_k\), \(\nabla f_k\) and \(\nabla ^2 f_k\) denote \(f(x_k)\), \(\nabla f(x_k)\) and \(\nabla ^2 f(x_k)\), respectively. By Lemma 3.4(II),

$$\begin{aligned} p_{k}=(\nabla ^{2} f_{k})^{-1} \left( I-\frac{\nabla f_{k} \nabla f_{k}^{T} (-f_{k}\nabla ^{2} f_{k})^{-1}}{1+\nabla f_{k}^{T} (-f_{k}\nabla ^{2} f_{k})^{-1} \nabla f_{k}}\right) \nabla f_{k}. \end{aligned}$$
(23)

Without loss of generality, we assume \( \nabla ^{2} f_{k} \) is a positive definite matrix. According to (22) and (23) we have

$$\begin{aligned}&x_{k+1}-x^{*} \nonumber \\&\quad = x_{k}+\theta p_{k}+(1-\theta )\tilde{p}_{k}-x^{*} \nonumber \\&\quad =x_{k}-x^{*}+\theta (\nabla ^{2}f_{k})^{-1}\left( I-\frac{\nabla f_{k} \nabla f_{k}^{T} (-f_{k} \nabla ^{2}f_{k})^{-1}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}} \right) \nabla f_{k} -(1-\theta ) (\nabla ^{2}f_{k})^{-1} \nabla f_{k}\nonumber \\&\quad = (\nabla ^{2}f_{k})^{-1}\left( \nabla ^{2}f_{k}(x_{k}-x^{*})+ \theta \left( I-\frac{\nabla f_{k} \nabla f_{k}^{T} (-f_{k}\nabla ^{2}f_{k})^{-1}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}} \right) \nabla f_{k} -(1-\theta ) \nabla f_{k} \right) \nonumber \\&\quad = (\nabla ^{2}f_{k})^{-1}\left( \nabla ^{2}f_{k}(x_{k}-x^{*})+ \theta \nabla f_{k}- \theta \frac{\nabla f_{k} \nabla f_{k}^{T} (-f_{k} \nabla ^{2}f_{k})^{-1}\nabla f_{k}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}} -(1- \theta ) \nabla f_{k} \right) \nonumber \\&\quad = (\nabla ^{2}f_{k})^{-1}\left( \nabla ^{2}f_{k}(x_{k}-x^{*})+ 2\theta \nabla f_{k}-\nabla f_{k}- \theta \frac{\nabla f_{k} \nabla f_{k}^{T} (-f_{k} \nabla ^{2}f_{k})^{-1}\nabla f_{k}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}} \right) , \end{aligned}$$
(24)

and hence

$$\begin{aligned}&\Vert x_{k+1}-x^{*} \Vert \nonumber \\&\quad = \left\| (\nabla ^{2}f_{k})^{-1}\left( \nabla ^{2}f_{k}(x_{k}-x^{*})+ 2\theta \nabla f_{k}-\nabla f_{k}- \theta \frac{\nabla f_{k} \nabla f_{k}^{T} (-f_{k} \nabla ^{2}f_{k})^{-1}\nabla f_{k}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}} \right) \right\| \nonumber \\&\quad \le \Vert (\nabla ^{2}f_{k})^{-1}\Vert \left\| \nabla ^{2}f_{k}(x_{k}-x^{*})+ 2\theta \nabla f_{k}-\nabla f_{k}- \theta \frac{\nabla f_{k} \nabla f_{k}^{T}(-f_{k} \nabla ^{2}f_{k})^{-1}\nabla f_{k}}{1+\nabla f_{k}^{T}(-f_{k}\nabla ^{2}f_{k})^{-1}\nabla f_{k}} \right\| \nonumber \\&\quad \le \Vert (\nabla ^{2}f_{k})^{-1}\Vert \left( \Vert \nabla ^{2}f_{k}(x_{k}-x^{*})-\nabla f_{k} \Vert + 2\vert \theta \vert \Vert \nabla f_{k} \Vert +\vert \theta \vert \Vert \nabla f_{k}\Vert \right) \nonumber \\&\quad = \Vert (\nabla ^{2}f_{k})^{-1}\Vert \left( \Vert \nabla ^{2}f_{k}(x_{k}-x^{*})-(\nabla f_{k}-\nabla f^{*} )\Vert + 2\vert \theta \vert \Vert \nabla f_{k} \Vert +\vert \theta \vert \Vert \nabla f_{k}\Vert \right) . \end{aligned}$$
(25)

Since

$$\begin{aligned} \nabla f_{k}-\nabla f^{*}=\int _{0}^{1} \nabla ^{2}f(x_{k}+t(x^{*}-x_{k}))(x_{k}-x^{*})dt, \end{aligned}$$
(26)

then

$$\begin{aligned}&\Vert \nabla ^{2}f(x_{k})(x_{k}-x^{*})-(\nabla f_{k}-\nabla f^{*})\Vert \nonumber \\&\quad = \left\| \int _{0}^{1}(\nabla ^{2}f(x_{k})-\nabla ^{2}f(x_{k}+t(x^{*}-x_{k})))(x_{k} -x^{*})dt\right\| \nonumber \\&\quad \le \int _{0}^{1} \Vert \nabla ^{2} f(x_{k})-\nabla ^{2} f(x_{k}+t(x^{*} -x_{k}))\Vert \Vert (x_{k}-x^{*})\Vert dt\nonumber \\&\quad \le \frac{1}{2} L \Vert x_{k}-x^{*} \Vert ^{2}. \end{aligned}$$
(27)

Now from Lemma 3.4-I, Assumption 3.1(A2) and (27), we obtain

$$\begin{aligned} \Vert x_{k+1}-x^{*} \Vert \le \Vert (\nabla ^{2}f_{k})^{-1}\Vert \left( \frac{1}{2} L \Vert x_{k}-x^{*} \Vert ^{2}+ 2 \vert \theta \vert L \Vert x_{k}-x^{*} \Vert +\vert \theta \vert L \Vert x_{k}-x^{*} \Vert \right) . \nonumber \\ \end{aligned}$$
(28)

Since \(\nabla ^{2}f(x^{*})\) is nonsingular and \( \nabla ^{2}f(x_{k})\rightarrow \nabla ^{2}f(x^{*})\), we have \( \Vert (\nabla ^{2} f(x_{k}))^{-1}\Vert \le 2 \Vert (\nabla ^{2} f(x^{*}))^{-1}\Vert \) for all sufficiently large k. Therefore, from Assumption 3.1(A4),

$$\begin{aligned} \Vert x_{k+1}-x^{*} \Vert \le \gamma (\Vert x_{k}-x^{*} \Vert ^{2}+ 6\vert \theta \vert \Vert x_{k}-x^{*} \Vert ). \end{aligned}$$
(29)

We now consider two cases: (i) \( \Vert x_{k}-x^{*} \Vert \le 1\) and (ii) \(\Vert x_{k}-x^{*} \Vert > 1\).

Case (i): This case implies that

$$\begin{aligned} \Vert x_{k+1} -x^{*}\Vert \le \gamma \left( 1+ 6\vert \theta \vert \right) \Vert x_{k}-x^{*} \Vert , \end{aligned}$$
(30)

and with the help of (30) we have

$$\begin{aligned} \Vert x_{k+1} -x^{*}\Vert \le \gamma \left( 1+ 6\vert \theta \vert \right) \Vert x_{k}-x^{*} \Vert \le \cdots \le (\gamma \left( 1+ 6\vert \theta \vert \right) )^{k+1} \Vert x_{0}-x^{*} \Vert . \end{aligned}$$
(31)

If the starting point is sufficiently near \( x^{*} \), then, by Assumption 3.1(A5), which gives \(\gamma \left( 1+ 6\vert \theta \vert \right) <1\), the sequence \( \lbrace x_{k}\rbrace \) converges to \( x^{*} \).

Case (ii): This case implies that

$$\begin{aligned} \Vert x_{k+1} -x^{*}\Vert \le \gamma \left( 1+ 6\vert \theta \vert \right) \Vert x_{k}-x^{*} \Vert ^{2}, \end{aligned}$$
(32)

and therefore

$$\begin{aligned} \Vert x_{k+1} -x^{*}\Vert \le \gamma \left( 1+ 6\vert \theta \vert \right) \Vert x_{k}-x^{*} \Vert ^{2} \le \cdots \le (\gamma \left( 1+ 6\vert \theta \vert \right) )^{2^{k+1}-1}\Vert x_{0}-x^{*}\Vert ^{2^{k+1}},\nonumber \\ \end{aligned}$$
(33)

and, similarly to the previous case, by Assumption 3.1(A5) the sequence \( \lbrace x_{k}\rbrace \) converges to \( x^{*} \). \(\square \)

Remark 1

For \( \theta =0\), the proposed method reduces to the standard Newton method.

4 Numerical results

In this section, we report the results of numerical experiments with the proposed algorithm. The effectiveness of the proposed method is compared with the classical Newton method and the modified regularized Newton (MRN) method proposed by Zhou and Chen [8]. We set \( \tau =10^{-6} \), and the stopping condition is \( \Vert g(x_{k})\Vert \le 10^{-5} \). Moreover, we set \( p_{0}=0.0001\), \(p_{1}=0.25\), \(p_{2}=0.75\), \(\mu _{1}=10^{-2} \) and \(m=10^{-8}\) in the MRN method. \(N_f\) denotes the number of objective function evaluations and \(N_g\) the number of gradient evaluations. The codes were written in Matlab 12.0 and run on a 2.3 GHz PC with 8 GB of memory. The test functions below, with standard starting points [26, 27] and [28], are used for the comparison; "Dim" denotes the dimension of the problem. The obtained numerical results are summarized in Tables 1 and 2. As these tables show, the proposed method (PM) is preferable to the classical Newton method (NM) and the MRN method. We also used the performance profiles proposed by Dolan and Moré [29] to compare and evaluate each implementation on the test functions. Figures 1, 2, 3, 4, 5 and 6 give the performance profiles of the proposed method with \(\theta =0.25, 0.5\) and 0.75, the MRN method and the classical Newton method with respect to the number of objective function evaluations \((N_{f})\) and the number of gradient evaluations \((N_{g})\). Figures 1, 2 and 3 show the performance of the considered algorithms in terms of the number of function evaluations; for \(\theta =0.25\) and \(\theta =0.5\) the proposed algorithm performs better than the others. According to Figs. 4, 5 and 6, the proposed method with \(\theta =0.25\) also performs better than the other methods in terms of gradient evaluations.
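For completeness, a minimal sketch of how the Dolan–Moré profile \(\rho _{s}(\tau )\) can be computed from a cost table (for instance the \(N_{f}\) counts) is given below; the array layout is our assumption for illustration and is not taken from the paper.

```python
import numpy as np

def performance_profile(T):
    """T[p, s] is the cost (e.g. N_f) of solver s on problem p; failures can be
    marked with np.inf.  Returns the ratios r[p, s] = T[p, s] / min_s T[p, s]
    and a function rho(s, tau) = fraction of problems with r[p, s] <= tau."""
    r = T / T.min(axis=1, keepdims=True)
    rho = lambda s, tau: float(np.mean(r[:, s] <= tau))
    return r, rho
```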

Table 1 Numerical results
Table 2 Numerical results
Fig. 1 Performance profile for the number of function evaluations with \(\theta =0.25\)
Fig. 2 Performance profile for the number of function evaluations with \(\theta =0.5\)
Fig. 3 Performance profile for the number of function evaluations with \(\theta =0.75\)
Fig. 4 Performance profile for the number of gradient evaluations with \(\theta =0.25\)
Fig. 5 Performance profile for the number of gradient evaluations with \(\theta =0.5\)
Fig. 6 Performance profile for the number of gradient evaluations with \(\theta =0.75\)

4.1 Systems of nonlinear equations

In this part, we solve systems of nonlinear equations by using the proposed algorithm. Consider the following nonlinear system of equations

$$\begin{aligned} F(x)=0, \end{aligned}$$
(34)

where \(F(x)=(f_{1}(x),f_{2}(x), \ldots , f_{n}(x))\) and \( x \in \mathbb {R}^{n} \). This system can be written componentwise as

$$\begin{aligned}&f_{1}(x_{1},x_{2}, \ldots , x_{n})=0,\nonumber \\&f_{2}(x_{1},x_{2}, \ldots , x_{n})=0,\nonumber \\&\vdots \nonumber \\&f_{n}(x_{1},x_{2}, \ldots , x_{n})=0. \end{aligned}$$
(35)

To solve (34) with the proposed algorithm, we set \( f(x)=\sum \nolimits _{i=1}^{n} f_{i}^{2}(x) \). We have applied the proposed method to the following test problems. In all problems, the stopping criterion is \( \Vert f(x_{k})\Vert < 10^{-15}\).
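With this choice of merit function, the gradient and Hessian required by the algorithm follow from a standard computation (stated here for the reader's convenience):

$$\begin{aligned} \nabla f(x)=2J(x)^{T}F(x),\qquad \nabla ^{2}f(x)=2\Big (J(x)^{T}J(x)+\sum _{i=1}^{n}f_{i}(x)\nabla ^{2}f_{i}(x)\Big ), \end{aligned}$$

where J(x) denotes the Jacobian matrix of F at x.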

Problem 1

Consider the system of two equations [30]:

$$\begin{aligned} F(x)=\left\{ \begin{array}{ll} x_{1}+e^{x_{2}}-\cos x_{2} =0, \\ 3 x_{1}-\sin x_{1}-x_{2} =0, \end{array}\right. \end{aligned}$$

with the initial value \( x_{0}=(1.5, 1.5)\) and exact solution \(x^{*} =(0, 0).\)

Problem 2

Consider the system of three equations [30]:

$$\begin{aligned} F(x)=\left\{ \begin{array}{ll} 10 x_{1}+\sin (x_1 +x_2)-1 =0, \\ 8 x_2- \cos ^2 (x_3 -x_2)-1 =0, \\ 12 x_3 + \sin x_3 -1 =0,~~~~~~~~ \end{array}\right. \end{aligned}$$

with the initial value \(x_{0}=(-1, 1, -1)\) and exact solution

$$\begin{aligned} x^{*}=(0.0689783491726666, 0.2464424186091830, 0.0769289119875370). \end{aligned}$$
Table 3 Numerical results for Problems 1–4

Problem 3

Consider the following system of \( n=16 \) equations, where \( 1\le i \le n-1 \) [31]:

$$\begin{aligned} F(x)=\left\{ \begin{array}{l} x_{i} \sin (x_{i+1})-1=0,\\ x_{n} \sin (x_{1})-1=0, \end{array}\right. \end{aligned}$$

with the initial value \( x_{0}=(-0.85,-0.85, \ldots , -0.85)\) and exact solution

$$\begin{aligned} x^{*}=(-1.114157140871930087, \ldots , -1.114157140871930087). \end{aligned}$$

Problem 4

Next, consider a system of 25 equations [32]:

$$\begin{aligned} f_i(x)= x_{i}- \cos \left( 2x_{i}-\displaystyle \sum _{j=1}^{25} x_{j}\right) =0,~~ 1 \le i \le 25. \end{aligned}$$

The initial guess is \( x_{0}=(0.2, 0.2, \ldots , 0.2) \) and the root correct up to 16 digits is given by \( x^{*}=(0.2142756147552158,0.2142756147552158, \ldots , 0.2142756147552158).\)

Table 3 shows the results for the considered nonlinear systems solved by the proposed algorithm, the MRN method and the classical Newton method. From Table 3, we can see that the proposed method performs better on these problems than the other two algorithms.

5 Conclusions

In the present work, we proposed a new algorithm for solving convex unconstrained optimization problems. The proposed method is based on Traub's iterative algorithm [1], extended to n variables. The local convergence of the proposed method was also established. A key property of this method is that it does not need to compute a step length: \(\alpha _{k}=1\) is used at each iteration. The numerical results and comparisons with other algorithms confirm the efficiency and robustness of our algorithm. In addition, comparing the performance profiles for the three values \(\theta =0.25, 0.5, 0.75\) shows that the numerical results for \(\theta = 0.25\) are better than those for the other two values of \(\theta \). The algorithm was also used to solve systems of nonlinear equations, where it gave better numerical results than the MRN method and the classical Newton method.