1 Introduction

One of the most studied problems in numerical mathematics is the solution of nonlinear equations \(F(x)=0\), where \(F\) is a nonlinear operator defined on a non-empty open convex subset \(\Omega \) of a Banach space \(X\) with values in \(X\). Iterative methods are a powerful tool for solving these equations.

It is well-known that Newton’s method,

$$\begin{aligned} x_{0}\in \Omega ,\quad x_{n}=x_{n-1}-[F'(x_{n-1})]^{-1}F(x_{n-1}),\quad n\in \mathbb {N}, \end{aligned}$$

is one of the most widely used iterative methods to approximate a solution \(x^*\) of \(F(x)=0\). The quadratic convergence and low operational cost of Newton's method give it good computational efficiency.
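
For reference, a minimal sketch of Newton's method for a finite-dimensional system is given below (NumPy is assumed; the names `newton_method`, `F` and `Fprime` are illustrative):

```python
import numpy as np

def newton_method(F, Fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method: x_n = x_{n-1} - [F'(x_{n-1})]^{-1} F(x_{n-1})."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # Solve the linear system F'(x) s = F(x) instead of inverting F'(x).
        s = np.linalg.solve(Fprime(x), F(x))
        x = x - s
        if np.linalg.norm(s) < tol:
            break
    return x
```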

Kung and Traub presented in [7] a class of derivative-free multipoint iterative functions. These methods contain Steffensen's method as a special case, in which the evaluation of \(F'(x_{n-1})\) in each step of Newton's method is approximated by the first-order divided difference \([x_{n-1},x_{n-1}+F(x_{n-1});F]\), so that we obtain the well-known Steffensen's method,

$$\begin{aligned} x_{0}\in \Omega ,\quad x_{n}=x_{n-1}-[x_{n-1},x_{n-1}+F(x_{n-1});F]^{-1}F(x_{n-1}), \quad n\in \mathbb {N}, \end{aligned}$$

which has quadratic convergence and the same computational efficiency as Newton’s method.

However, to attain second order in practice, the iterates must be close enough to the solution for the divided difference to be a good approximation of the first derivative of \(F\) used in Newton's method. Otherwise, some extra iterations are required in comparison with Newton's method. In short, when the norm of \(F(x)\) is large, the divided difference approximates the first derivative of \(F\) poorly.

Another important aspect to consider is the applicability of an iterative method; that is, the set of starting points from which the iterative method converges to a solution of the equation. In [5], we observe this experimentally by means of the basin of attraction of the iterative method and justify why Steffensen's method is used less often than Newton's method to approximate solutions of equations.

The aim of this work is therefore to improve the applicability of Steffensen's method while keeping its second order of convergence. For this, we present the following Steffensen-like method, which is a modification of Steffensen's method:

$$\begin{aligned} \left\{ \begin{array}{l} x_{0}\in \Omega ,\\ y_{n-1} = x_{n-1} - a F(x_{n-1}),\quad a\in \mathbb {R}^{+},\quad n\in \mathbb {N},\\ z_{n-1} = x_{n-1} + b F(x_{n-1}),\quad b\in \mathbb {R}^{+},\\ x_{n}=x_{n-1}-[y_{n-1}, z_{n-1}; F]^{-1}F(x_{n-1}), \end{array} \right. \end{aligned}$$
(1)

which can be used to approximate the solution \(x^*\) of \(F(x)=0\) when Steffensen's method cannot. In Sect. 2, we give a motivation for using method (1) and show its better numerical performance with respect to Steffensen's method.
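
In the scalar case, where the divided difference reduces to \([y,z;F]=(F(z)-F(y))/(z-y)\), iteration (1) can be sketched as follows (the function name is illustrative; \(a=0\) and \(b=1\) recover Steffensen's method):

```python
def steffensen_like(F, x0, a=0.0, b=1e-3, tol=1e-12, max_iter=50):
    """Scalar sketch of method (1): x_n = x_{n-1} - [y, z; F]^{-1} F(x_{n-1})."""
    x = float(x0)
    for _ in range(max_iter):
        fx = F(x)
        if fx == 0.0:                     # already at a root
            return x
        y, z = x - a * fx, x + b * fx     # auxiliary points of (1)
        dd = (F(z) - F(y)) / (z - y)      # divided difference [y, z; F]
        x_new = x - fx / dd
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

For instance, `steffensen_like(lambda t: t**2 - 2.0, 1.0)` approximates \(\sqrt{2}\); a small \(b\) keeps the auxiliary point \(z\) close to \(x\) even when \(\Vert F(x)\Vert \) is large.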

We can also study the accessibility of an iterative method from the convergence conditions required of it. In Sect. 3 we analyze the semilocal convergence of method (1), imposing conditions on the starting point \(x_{0}\) and on the operator \(F\), by means of the majorant principle of Kantorovich [6]. We see that these conditions reduce to those imposed on Steffensen's method in [5] when the new method reduces to Steffensen's method (namely, \(a=0\) and \(b=1\)). In Sect. 4, some conclusions about the better accessibility of method (1) over Steffensen's method are justified; for this, regions of accessibility and domains of parameters of both methods are analyzed. Finally, in Sect. 5, an application illustrating the previous study is shown.

Throughout the paper we denote \(\overline{B(x,\varrho )}=\{y\in X;\Vert y-x\Vert \le \varrho \}\) and \(B(x,\varrho )=\{y\in X;\Vert y-x\Vert <\varrho \}\).

2 Motivation of the construction of method (1)

We consider the following boundary value problem

$$\begin{aligned} y''(t)=f(t,y(t),y'(t)),\quad y(A)=\alpha , \quad y(B)=\beta , \end{aligned}$$
(2)

choose a discretization of \([A,B]\) with \(N\) subintervals,

$$\begin{aligned} t_j = A + \dfrac{T}{N}\,j,\quad T=B-A, \quad j=0,1,\ldots ,N, \end{aligned}$$

and propose the use of the multiple shooting method for solving it. Then, on each subinterval \([t_j,t_{j+1}]\), we proceed recursively: given the previously computed function \(y(t;s_0,s_1,\ldots ,s_{j-1})\), we solve the initial value problems

$$\begin{aligned} y''(t)=f(t,y(t),y'(t)), \quad y(t_j)=y(t_j;s_0,s_1,\ldots ,s_{j-1}), \quad y'(t_j)=s_j, \end{aligned}$$

whose solution is denoted by \(y(t;s_0,s_1,\ldots ,s_{j})\).

To approximate a solution of problem (2), we solve the nonlinear system of equations \(F(s)=0\), where \(F:\mathbb {R}^N\longrightarrow \mathbb {R}^N\) and

$$\begin{aligned} \left\{ \begin{array}{c} F_1(s_0,s_1,\ldots ,s_{N-1})=s_1-y'(t_1;s_0) \\ F_2(s_0,s_1,\ldots ,s_{N-1})=s_2-y'(t_2;s_0,s_1) \\ \vdots \\ F_{N-1}(s_0,s_1,\ldots ,s_{N-1})=s_{N-1}-y'(t_{N-1};s_0,s_1,\ldots ,s_{N-2}) \\ F_N(s_0,s_1,\ldots ,s_{N-1})=\beta -y(t_N;s_0,s_1,\ldots ,s_{N-2},s_{N-1}). \end{array} \right. \end{aligned}$$

For this, we consider Steffensen's method and method (1) and compare their numerical performance. In our study, we consider the usual divided difference of first order. So, for \(\mathbf{u}, \mathbf{v} \in \mathbb {R}^N\) with \(\mathbf{u}\ne \mathbf{v}\), we consider \( [\mathbf{u},\mathbf{v}; F ]=\left( [\mathbf{u},\mathbf{v}; F ]_{ij}\right) _{i,j=1}^{N} \in \mathcal{L} (\mathbb {R}^N,\mathbb {R}^N),\) where

$$\begin{aligned}{}[\mathbf{u}, \mathbf{v}; F]_{ij} = \frac{1}{u_j-v_j}\left( F_i (u_1, \dots , u_j, v_{j+1}, \dots , v_N) - F_i (u_1, \dots ,u_{j-1}, v_j, \dots , v_N)\right) . \end{aligned}$$
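
A direct NumPy translation of this componentwise formula might read as follows (a sketch; the function name is illustrative):

```python
import numpy as np

def divided_difference(F, u, v):
    """Usual first-order divided difference [u, v; F] for F: R^N -> R^N."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    N = u.size
    A = np.empty((N, N))
    for j in range(N):
        w_left = np.concatenate((u[:j + 1], v[j + 1:]))   # (u_1,...,u_j, v_{j+1},...,v_N)
        w_right = np.concatenate((u[:j], v[j:]))          # (u_1,...,u_{j-1}, v_j,...,v_N)
        A[:, j] = (F(w_left) - F(w_right)) / (u[j] - v[j])
    return A
```

As written, the sketch evaluates \(F\) twice per column; since the second point of column \(j\) coincides with the first point of column \(j-1\), a cached version needs only \(N+1\) evaluations of \(F\).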

For the initial slope \(\mathbf {s}_0=\left( s_{0}^{0},s_{1}^{0},\ldots ,s_{N-1}^{0}\right) \), needed to start Steffensen's method and method (1), we consider

$$\begin{aligned} \left\{ \begin{array}{rcl} s_{0}^{0} &{}=&{}\displaystyle \frac{\beta -\alpha }{B-A}= \displaystyle \frac{y(t_N)-y(t_0)}{t_N-t_0}, \\ s_{1}^{0} &{}=&{}\displaystyle \frac{y(t_N)-y(t_1;s_0)}{t_N-t_1}, \\ s_{2}^{0} &{}=&{}\displaystyle \frac{y(t_N)-y(t_2;s_0,s_1)}{t_N-t_2}, \\ &{}\vdots &{} \\ s_{N-1}^{0} &{}=&{}\displaystyle \frac{y(t_N)-y(t_{N-1};s_0,s_1,\ldots ,s_{N-2})}{t_N-t_{N-1}}. \end{array} \right. \end{aligned}$$

In particular, to show the performance of method (1), we consider the following boundary value problem:

$$\begin{aligned} y''(t)=y(t) \left( y'(t)^2+\cos ^2 t\right) , \quad y(0)=-1, \quad y(1)=1. \end{aligned}$$

In this case, we have \(T=1\) and consider three iterations of the schemes for \(N=2,3\) and \(4\) subintervals in the multiple shooting method. The exact solution is obtained with NDSolve of Mathematica, taking \(y'(0)=0.6500356840546128\) in order to have a trustworthy error for values near \(10^{-15}\) (double-precision tolerance).
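
A simplified single-interval version of this experiment can be reproduced along the following lines, with SciPy's `solve_ivp` playing the role of NDSolve (a sketch, not the exact computation behind the tables):

```python
import numpy as np
from scipy.integrate import solve_ivp

def shoot(s):
    """F(s) = 1 - y(1; s) for y'' = y (y'^2 + cos^2 t), y(0) = -1, y'(0) = s."""
    sol = solve_ivp(lambda t, Y: [Y[1], Y[0] * (Y[1] ** 2 + np.cos(t) ** 2)],
                    (0.0, 1.0), [-1.0, s], rtol=1e-12, atol=1e-12)
    return 1.0 - sol.y[0, -1]          # mismatch in the boundary condition y(1) = 1

s, b = 2.0, 1e-3                       # initial slope s_0^0 = (beta - alpha)/(B - A) = 2
for _ in range(3):                     # three iterations, as in the experiments
    Fs = shoot(s)
    dd = (shoot(s + b * Fs) - Fs) / (b * Fs)   # [s, s + b F(s); F], i.e. a = 0
    s -= Fs / dd
print(s)                               # should approach y'(0) = 0.6500356840546128
```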

In Tables 1, 2, 3, 4, 5, we observe that Steffensen's method obtains poor results. Notice that when \(N\) decreases (i.e., the subintervals become larger), the initial guess is farther from the solution. This is the reason for the improvements of method (1) proposed in this work. In the worst case, \(N=2\), Steffensen's method diverges. And, for \(N=3,4\), we clearly observe the second order of the methods, as well as the better performance of method (1).

Table 1 Method (1), \(a=0\), \(b=10^{-3}\); \(N=2\)
Table 2 Method (1), \(a=0\), \(b=10^{-3}\); \(N=3\)
Table 3 Steffensen’s method; \(N=3\)
Table 4 Method (1), \(a=0\), \(b=10^{-3}\); \(N=4\)
Table 5 Steffensen’s method; \(N=4\)

Remark 1

In Table 6, we compute the error when the derivative \(f'(x)\) of \(f(x)=\exp (x)\) at \(x=0\) is approximated by the lateral divided difference

$$\begin{aligned}{}[x,x+h;f]=\frac{f(x+h)-f(x)}{h}. \end{aligned}$$

Similar results appear for other functions; thus, in the numerical examples we take \(\text {Tol}<10^{-9}\). For the central divided difference, rounding errors appear at \(10^{-7}\).

Table 6 Error when the derivative is approximated by a lateral divided difference
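
The behaviour behind Table 6 is easy to reproduce; the following sketch prints the error of the lateral divided difference for \(f(x)=\exp (x)\) at \(x=0\), where \(f'(0)=1\) (exact figures depend on the floating-point environment):

```python
import numpy as np

for k in range(1, 16):
    h = 10.0 ** (-k)
    approx = (np.exp(h) - np.exp(0.0)) / h   # lateral divided difference [x, x+h; f]
    print(f"h = 1e-{k:02d}   error = {abs(approx - 1.0):.3e}")
```

The error decreases like \(h/2\) until rounding errors dominate, with a minimum near \(h\approx 10^{-8}\) in double precision.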

Then, we analyse the influence of increasing the interval of integration of a boundary value problem. So, we consider

$$\begin{aligned} y''(t) + \sin (y(t))\, y(t) = 0, \quad y(0)=1, \quad y(T)=\mu , \end{aligned}$$

and choose \(\mu \) such that the solution \(y(t)\) of the problem satisfies \(y'(0)=0\) (using NDSolve of Mathematica). We consider \(a=0\) and \(b\) such that \(\Vert b F(x_0)\Vert <1\).

We start with the case \(T=3\) and \(N=1\); that is, \(T\) is slightly larger than in the previous analysis and there is only one subinterval. As we can see in Table 7, with such a partition Steffensen's method does not work correctly, while method (1) does.

Table 7 Error \(\Vert y(t)-y_n\Vert _\infty \); \(T=3\), \(N=1\)

If we increase the value of \(N\), for example to \(N=3\), both methods work well. Note that it is interesting to obtain good results for small values of \(N\), since this reduces the operational cost of the process.

For the obtained solution, whose form is shown in Fig. 1, an especially interesting case from the numerical point of view is \(T=5\). Considering \(N=7\) and \(N=9\), we observe in Tables 8 and 9 that these conditions greatly influence both methods. However, the better performance of method (1) is clear: it converges, whereas Steffensen's method does not.

Fig. 1

Solution of the equation \(y''(t) + \sin (y(t))\, y(t) = 0\) with \(y(0)=1\) and \(y'(0)=0\)

Table 8 Error \(\Vert y(t)-y_n\Vert _\infty \); \(T=5\)
Table 9 Error \(\Vert y(t)-y_n\Vert _\infty \); \(T=5\)

3 Main results for method (1)

Having motivated the construction of method (1), we now analyze its semilocal convergence and give some error estimates that lead to its quadratic convergence.

A divided difference of first order of the operator \(F\) at the points \(x,y\in \Omega \) (\(x\ne y\)) is defined by

$$\begin{aligned}{}[x, y;~F] = \int _0^1 F'(tx+(1-t)y)\, dt. \end{aligned}$$
(3)

Notice that \([x,x;F]=F'(x)\) if \(F\) is differentiable.

3.1 Semilocal convergence

We establish the semilocal convergence of the Steffensen-like method given in (1) by using the majorant principle (see [6]). For this, we first suppose the following initial conditions:

  (C1) \(\Vert F(x_0) \Vert \le \delta \),

  (C2) \(\Vert \Gamma _0\Vert =\Vert [F'(x_0)]^{-1} \Vert \le \beta \),

  (C3) \(\Vert F'(x)- F'(y)\Vert \le K \Vert x- y \Vert \), \(x,y\in \Omega \), \(K\in \mathbb {R}^{+}\).

Under conditions (C1)–(C3), we proceed as follows.

First of all, we prove the existence of the operator \([y_0,z_0;F]^{-1}\in \mathcal {L}(X,X)\) for \(y_0,z_0\in \Omega \). From

$$\begin{aligned} \Vert I-\Gamma _0 [y_0,z_0; F]\Vert \le \Vert \Gamma _0\Vert \left\| F'(x_0)- \displaystyle {\int _{0}^{1}F'(y_0+\tau (z_0-y_0))\, d\tau }\right\| \le \dfrac{a+b}{2}K\beta \delta , \end{aligned}$$

if \((a+b)K\beta \delta <2\), then, by the Banach lemma on invertible operators, the operator \([y_0,z_0; F]^{-1}\in \mathcal {L}(X,X)\) exists, for \(y_0,z_0\in \Omega \), and is such that

$$\begin{aligned} \left\| [y_0,z_0; F]^{-1}\right\| \le \dfrac{2\beta }{2-(a+b)K\beta \delta }. \end{aligned}$$

Now, we define \(\gamma =\frac{2\beta }{2-(a+b)K\beta \delta }\), \(M=K\left( 1+\frac{a+b}{\gamma }\right) \) and consider the polynomial

$$\begin{aligned} q(t) = \frac{M}{2}t^2-\frac{t}{\gamma }+\delta , \qquad t\ge 0. \end{aligned}$$
(4)

Note that polynomial (4) has two positive roots \(t^{*}=\frac{1-\sqrt{1-2M\delta \gamma ^2}}{M\gamma }\) and \(t^{**}=\frac{1+\sqrt{1-2M\delta \gamma ^2}}{M\gamma }\), with \(t^{*}\le t^{**}\), provided that \(M\delta \gamma ^2\le \frac{1}{2}\). In addition, we define the scalar sequence \(\{t_n\}\) by

$$\begin{aligned} t_{0} =0, \quad t_{n+1} = t_{n}-\dfrac{q(t_{n})}{q'(t_{n})},\quad n\ge 0. \end{aligned}$$
(5)

Note that sequence (5) is the Newton sequence for \(q\) starting at \(t_0=0\); it is nondecreasing and bounded above by \(t^{*}\), so it converges to \(t^{*}\).
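
For given \(\delta \), \(\beta \), \(K\), \(a\) and \(b\), the quantities \(\gamma \), \(M\), the roots \(t^{*}\), \(t^{**}\) and sequence (5) can be computed as in the following sketch (it assumes \((a+b)K\beta \delta <2\) and \(M\delta \gamma ^2\le 1/2\)):

```python
import math

def majorizing_data(delta, beta, K, a, b, n_terms=10):
    """gamma, M, the roots t*, t** of (4) and the Newton iterates (5) for q."""
    gamma = 2 * beta / (2 - (a + b) * K * beta * delta)   # assumes (a+b)*K*beta*delta < 2
    M = K * (1 + (a + b) / gamma)
    disc = 1 - 2 * M * delta * gamma ** 2                 # assumes M*delta*gamma^2 <= 1/2
    t_star = (1 - math.sqrt(disc)) / (M * gamma)
    t_star_star = (1 + math.sqrt(disc)) / (M * gamma)
    q = lambda t: 0.5 * M * t ** 2 - t / gamma + delta
    dq = lambda t: M * t - 1 / gamma
    t, ts = 0.0, [0.0]
    for _ in range(n_terms):
        t -= q(t) / dq(t)                                 # Newton step for q
        ts.append(t)
    return gamma, M, t_star, t_star_star, ts
```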

Theorem 1

Let \(F:\Omega \subseteq X\rightarrow X\) be a continuously differentiable operator defined on a nonempty open convex domain \(\Omega \) of a Banach space \(X\). Suppose that conditions (C1)–(C3) hold with

$$\begin{aligned} (a+b)K\beta \delta < 2,\quad M \delta \gamma ^2\le \frac{1}{2} \end{aligned}$$
(6)

and that \(B(x_0,t^{*}+m\delta )\subset \Omega \), where \(m=\max \{a,b\}\). Then, method (1), starting at \(x_0\), converges to a solution \(x^{*}\) of the equation \(F(x)=0\), and \(x_n,y_n,z_n,x^*\in \overline{B(x_0,t^{*}+m\delta )}\), for all \(n=0,1,2,\ldots \). Moreover, the solution \(x^{*}\) is unique in \(B(x_{0},r)\cap \Omega \), where \(r=\frac{2}{K\beta }-(t^{*}+m\delta )\), provided that \(K\beta (t^{*}+m\delta )<2\).

Proof

To prove the semilocal convergence of (1), we use an inductive process. From

$$\begin{aligned} \Vert y_0-x_0\Vert\le & {} a\,\delta < t^*+m\,\delta ,\\ \Vert z_0-x_0\Vert\le & {} b\,\delta < t^*+m\,\delta ,\\ \Vert x_1-x_0\Vert\le & {} \gamma \,\delta = t_{1}-t_{0} < t^{*}+m\,\delta , \end{aligned}$$

it follows that \(y_0,z_0,x_1\in B(x_0,t^{*}+m\delta )\subset \Omega \), and then we can define \(y_1\) and \(z_1\). Next, since

$$\begin{aligned} F(x_1)= & {} \displaystyle \int _{0}^{1}(F'(x_0+\tau (x_1 - x_0))-F'(x_0))\, d\tau (x_1 - x_0)\\&+\left( F'(x_0)-[y_0,z_0; F]\right) (x_1 - x_0), \end{aligned}$$

we have

$$\begin{aligned} \Vert F(x_1)\Vert \le \dfrac{K}{2}\Vert x_1 - x_0\Vert ^2 + \dfrac{a+b}{2} K \Vert F(x_0)\Vert \cdot \Vert x_1 - x_0\Vert \le \dfrac{M}{2}(t_1-t_0)^2 = q(t_1). \end{aligned}$$

Moreover, as sequence (5) is nondecreasing and polynomial (4) is nonincreasing in \([0,t^*]\), we get

$$\begin{aligned} \Vert y_1-x_0 \Vert\le & {} \Vert x_1-x_0 \Vert + a\,\Vert F(x_1) \Vert < t^*+a\,q(t_0) \le t^*+ m\,\delta ,\\ \Vert z_1-x_0 \Vert\le & {} \Vert x_1-x_0 \Vert + b\,\Vert F(x_1) \Vert < t^*+b\,q(t_0) \le t^*+ m\,\delta . \end{aligned}$$

Note that we need the existence of the operator \([y_1,z_1; F]^{-1}\) to define \(x_2\). Taking into account that sequence (5) is nondecreasing and polynomial (4) is nonincreasing in \([0,t^*]\), we have

$$\begin{aligned} \Vert I-\Gamma _0 [y_1,z_1; F]\Vert\le & {} \Vert \Gamma _0\Vert \int _{0}^1 \Vert F'(x_{0})-F'(y_{1}+\tau (z_{1}-y_{1}))\Vert \,d\tau \\\le & {} \dfrac{K\beta }{2} \left( 2(t_1-t_0)+(a+b)q(t_1)\right) \\< & {} \beta \left( q'(t_1)+\dfrac{1}{\gamma } \right) \\< & {} 1, \end{aligned}$$

so that the operator \([y_1,z_1; F]^{-1}\) exists and is such that

$$\begin{aligned} \Vert [y_1,z_1; F]^{-1}\Vert \le \dfrac{\beta }{1-\Vert I-\Gamma _0[y_1,z_1; F]\Vert } \le - \dfrac{1}{q'(t_1)}. \end{aligned}$$

As a consequence,

$$\begin{aligned} \Vert x_2-x_1\Vert\le & {} \Vert [y_1,z_1; F]^{-1}\Vert \Vert F(x_1)\Vert \le -\dfrac{q(t_1)}{q'(t_1)} \le t_2-t_{1}, \\ \Vert x_2-x_0\Vert\le & {} \Vert x_2-x_{1}\Vert + \Vert x_1-x_{0}\Vert \le t_2-t_{0} < t^{*}-t_{0} < t^{*}+m\,\delta , \end{aligned}$$

\(x_2\in B(x_0,t^{*}+m\,\delta )\subset \Omega \) and then we can define \(y_2\) and \(z_2\).

Now, from

$$\begin{aligned} F(x_n)= & {} \displaystyle \int _{0}^{1} (F'(x_{n-1}+\tau (x_n - x_{n-1}))-F'(x_{n-1}))\, d\tau (x_n - x_{n-1}) \\&+\displaystyle \int _{0}^{1} \left( F'(x_{n-1})-F'(y_{n-1}+\tau (z_{n-1}-y_{n-1}))\right) \, d\tau (x_n - x_{n-1}) \end{aligned}$$

and \(q(t_n)=\dfrac{M}{2}(t_n - t_{n-1})^2\), it follows \(\Vert F(x_n)\Vert \le q(t_n)\), for all \(n\in \mathbb {N}\), since

$$\begin{aligned} \Vert F(x_n)\Vert\le & {} \dfrac{K}{2} \Vert x_n-x_{n-1}\Vert ^2 \\&+K\displaystyle \int _{0}^{1} \left( (1-\tau )\Vert y_{n-1}-x_{n-1}\Vert +\tau \Vert z_{n-1}-x_{n-1}\Vert \right) \, d\tau \Vert x_n-x_{n-1}\Vert \\\le & {} \dfrac{K}{2} \left( 1-(a+b)\,q'(t_{n-1})\right) (t_n-t_{n-1})^2 \\\le & {} q(t_n). \end{aligned}$$

In addition, as polynomial (4) is nonincreasing in \([0,t^*]\), we have

$$\begin{aligned} \Vert y_n-x_{0} \Vert\le & {} \Vert x_n-x_{0}\Vert + a\,\Vert F(x_{n})\Vert < t^{*} + a\,q(t_0) \le t^{*} + m\,\delta , \\ \Vert z_n-x_{0} \Vert\le & {} \Vert x_n-x_{0}\Vert + b\,\Vert F(x_{n})\Vert < t^{*} + b\,q(t_0) \le t^{*} + m\,\delta \end{aligned}$$

and, therefore, \(y_n,z_n\in B(x_0,t^{*} + m\,\delta )\subset \Omega \).

Next, we prove the existence of the operator \([y_n,z_n; F]^{-1}\). As

$$\begin{aligned} \Vert I-\Gamma _0[y_n,z_n; F]\Vert\le & {} \Vert \Gamma _0\Vert \int _0^1 \Vert F'(x_0)-F'(y_n+\tau (z_n-y_n))\Vert \,d\tau \\\le & {} \dfrac{K\beta }{2} \left( 2(t_n-t_0)+(a+b)q(t_n)\right) \\< & {} \beta \left( q'(t_n)+\dfrac{1}{\gamma } \right) \\< & {} 1, \end{aligned}$$

we have that the operator \([y_n,z_n;F]^{-1}\) exists and is such that

$$\begin{aligned} \Vert [y_n,z_n; F]^{-1}\Vert \le \dfrac{\beta }{1-\Vert I-\Gamma _0[y_n,z_n; F]\Vert } \le -\dfrac{1}{q'(t_n)}. \end{aligned}$$

Thus,

$$\begin{aligned} \Vert x_{n+1}-x_{n}\Vert \le \Vert [y_n,z_n; F]^{-1}\Vert \Vert F(x_n)\Vert \le -\dfrac{q(t_n)}{q'(t_n)} = t_{n+1}-t_{n}, \end{aligned}$$
(7)
$$\begin{aligned} \Vert x_{n+1}-x_{0} \Vert \le \Vert x_{n+1}-x_{n} \Vert + \Vert x_{n}-x_{0} \Vert \le t_{n+1}-t_{0} < t^{*}-t_{0} < t^{*}+m\,\delta . \end{aligned}$$

After that, as \(\{t_n\}\) is a Cauchy sequence, the sequence \(\{x_n\}\) is also a Cauchy sequence and, consequently, convergent. Let \(\displaystyle {\lim _{n}x_{n}=x^{*}\in \overline{B(x_0,t^{*}+m\,\delta )}}\). To see that \(x^{*}\) is a solution of \(F(x)=0\), it is enough to note that \(\Vert F(x_n)\Vert \le q(t_n)\) and, by the continuity of \(F\) and \(q\), it follows that \(F(x^{*})=0\).

Finally, we prove the uniqueness of the solution \(x^*\). We suppose that we have a solution \(y^{*}\in B(x_0,r)\cap \Omega \) of \(F(x)=0\) such that \(y^*\ne x^*\). Consider

$$\begin{aligned} F(y^{*}) - F(x^{*}) = \displaystyle \int _{x^{*}}^{y^{*}} F'(x)\, dx = \displaystyle {\int _{0}^{1} F'(x^{*}+\tau (y^{*}-x^{*}))\, d\tau \, (y^{*}-x^{*})}=0 \end{aligned}$$

and the operator \(J=\displaystyle \int _{0}^{1} F'(x^{*}+\tau (y^{*}-x^{*}))\, d\tau \). From

$$\begin{aligned} \Vert I\!-\!\Gamma _0 J\Vert \le \Vert \Gamma _0\Vert \displaystyle \int _{0}^{1}\Vert F'(x^{*}\!+\!\tau (y^{*}-x^{*}))\!-\!F'(x_0)\Vert \, d\tau < \dfrac{\beta K}{2}(t^{*}+m\delta +r) = 1, \end{aligned}$$

the operator \(J\) is invertible, provided that \(K\beta (t^{*}+m\,\delta )<2\), and then \(x^{*}=y^{*}\). \(\square \)

Note that Theorem 1 extends the semilocal convergence result given for Steffensen's method, method (1) with \(a=0\) and \(b=1\), in [5]. On the other hand, if \(a=b=0\), method (1) reduces to Newton's method for differentiable operators.

3.2 A priori error estimates and \(R\)-order of convergence

In the next theorem, we give some a priori error estimates for method (1), which are obtained by Ostrowski's technique, see [8]. This technique bounds the error made by method (1) in terms of the zeros of polynomial (4).

Theorem 2

Under the conditions of Theorem 1, we consider polynomial (4) and the two positive zeros \(t^*\) and \(t^{**}\) of (4) such that \(t^{*}\le t^{**}\). Then, we obtain the following error estimates for method (1):

  (a) If \(t^{*}<t^{**}\), then

    $$\begin{aligned} \Vert x^{*}-x_n\Vert \le t^{*}-t_n = \frac{(t^{**}-t^{*})\theta ^{2^{n}}}{1-\theta ^{2^{n}}},\qquad \text {where}\quad \theta =\dfrac{t^{*}}{t^{**}}. \end{aligned}$$
    (8)

  (b) If \(t^{*}=t^{**}\), then

    $$\begin{aligned} \Vert x^{*}-x_n\Vert \le t^{*}- t_n = \frac{t^{*}}{2^{n}}. \end{aligned}$$
    (9)

Proof

First, from (7), it follows that \(\{t_{n}\}\) is a majorizing sequence of \(\{x_{n}\}\). Then, for \(k\ge 1\) and \(n\ge 1\), we have

$$\begin{aligned} \Vert x_{n+k}-x_n\Vert \le \sum _{i=n}^{n+k-1}\Vert x_{i+1}-x_{i}\Vert \le \sum _{i=n}^{n+k-1} (t_{i+1}-t_{i}) = t_{n+k}-t_{n}, \end{aligned}$$

so that, letting \(k\rightarrow \infty \), from the convergence of \(\{x_{n}\}\) and \(\{t_{n}\}\), it follows that

$$\begin{aligned} \Vert x^{*}-x_n\Vert \le t^{*}-t_{n}. \end{aligned}$$

Second, we prove item (a). Since \(t^{*}<t^{**}\), we can write

$$\begin{aligned} q(t)=\frac{M}{2}(t^{*}-t)(t^{**}-t). \end{aligned}$$

If we denote \(a_n = t^{*}-t_n\) and \(b_n = t^{**}-t_n\), for all \(n\ge 0\), then we can write \(q(t_n) = \frac{M}{2}a_nb_n\). Since \(q'(t_n) = -\frac{M}{2}(a_n+b_n)\), we obtain

$$\begin{aligned} a_{n+1} = t^{*}-t_{n+1} = \frac{a_n^2}{a_n+b_n},\quad b_{n+1} = t^{**}-t_{n+1} = \frac{b_n^2}{a_n+b_n},\quad n\ge 0. \end{aligned}$$

Moreover,

$$\begin{aligned} \dfrac{a_{n+1}}{b_{n+1}} = \dfrac{a_n^{2}}{b_n^{2}} = \cdots = \theta ^{2^{n+1}},\quad n\ge 0. \end{aligned}$$

Finally, from \(b_{n+1} = (t^{**}-t^{*})+a_{n+1}\), we have (8).

Third, we prove item (b). Since \(t^{*} = t^{**}\), we have \(a_n = b_n\) and \(q(t_n) = \frac{M}{2}a_n^2\), so that

$$\begin{aligned} a_{n+1} = \frac{a_n}{2} = \cdots = \frac{t^{*}}{2^{n+1}},\quad n\ge 0, \end{aligned}$$

and we have (9). \(\square \)

From the last theorem, we notice that method (1) has \(R\)-order of convergence at least two if \(t^{*}<t^{**}\) and at least one if \(t^{*}=t^{**}\).
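
The bounds (8) and (9) are immediate to evaluate once \(t^{*}\) and \(t^{**}\) are known; a sketch:

```python
def a_priori_bound(t_star, t_star_star, n):
    """A priori bound for ||x* - x_n|| given by Theorem 2."""
    if t_star < t_star_star:                 # case (a): R-order at least two
        theta = t_star / t_star_star
        return (t_star_star - t_star) * theta ** (2 ** n) / (1 - theta ** (2 ** n))
    return t_star / 2 ** n                   # case (b): R-order at least one
```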

4 Accessibility of method (1)

As indicated in the introduction, an important aspect to consider when studying the applicability of an iterative method is the set of starting points from which the iterative method converges to a solution of the equation; this is what we call the accessibility of the iterative method.

We can study the accessibility of an iterative method experimentally as follows. Each point \(x\in \Omega \) has associated certain parameters of convergence. If these parameters satisfy the convergence conditions, we colour the point \(x\); otherwise, we do not. The region that is finally coloured is what we call a region of accessibility of the iterative method. As a consequence, the region of accessibility provides the domain of starting points from which the convergence of the iterative method is guaranteed for a particular equation.

To clarify the advantage of method (1) over Steffensen's method, we consider a simple academic example, the complex equation \(g(z)=z^3-1=0\), and analyze the region of accessibility of the root \(z^{*}=1\) of \(g(z)=0\) when it is approximated by method (1). For this, we consider the ball \(B(0,2)\) as the domain of the function \(g\); as a consequence, \(K=12\) in this case. To paint the region of accessibility, we take \(z_{0}\in B(0,2)\) and colour all the points \(z_{0}\) that satisfy conditions (6) of Theorem 1. We show in Fig. 2 the regions of accessibility of \(z^{*}=1\) when it is approximated by different members of family (1): blue for \(a=0\) and \(b=1/15\), red for \(a=0\) and \(b=1/10\), yellow for \(a=0\) and \(b=1/5\), and green for \(a=0\) and \(b=1\) (Steffensen's method). We observe in Fig. 2 that the smaller \((a+b)\) is, the bigger the region of accessibility is, so that the optimal situation is obtained when \(a+b=0\) (i.e., \(a=b=0\)).

Fig. 2

Regions of accessibility of \(z^{*}=1\)
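
The regions of Fig. 2 can be generated by scanning starting points \(z_{0}\in B(0,2)\) and testing conditions (6), as in the following sketch (grid resolution and colouring are illustrative):

```python
import numpy as np

K = 12.0                                # Lipschitz constant of g'(z) = 3z^2 on B(0,2)
g = lambda z: z ** 3 - 1.0

def accessible(z0, a, b):
    """Do conditions (6) of Theorem 1 hold at the starting point z0?"""
    if z0 == 0 or abs(z0) >= 2:         # g'(z0) is singular, or z0 lies outside B(0,2)
        return False
    delta, beta = abs(g(z0)), 1.0 / abs(3.0 * z0 ** 2)
    if (a + b) * K * beta * delta >= 2.0:
        return False
    gamma = 2.0 * beta / (2.0 - (a + b) * K * beta * delta)
    M = K * (1.0 + (a + b) / gamma)
    return M * delta * gamma ** 2 <= 0.5

xs = np.linspace(-2.0, 2.0, 400)
blue = [[accessible(complex(u, v), 0.0, 1.0 / 15.0) for u in xs] for v in xs]
```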

On the other hand, we can also study, in general, the accessibility of an iterative method from the convergence conditions required of it. It is well-known that the semilocal convergence conditions are of two kinds: conditions on the starting point \(x_{0}\) and conditions on the operator \(F\) involved. For this, we consider the semilocal convergence result given by Theorem 1.

If we want to study the accessibility of method (1) from Theorem 1, we consider the pair of parameters \((\delta ,\beta )\) given in conditions (C1) and (C2) and the conditions given in (6), and take into account what we call the domain of parameters associated to Theorem 1 for method (1): \(\{(\delta ,\beta )\in \mathbb {R}^{2}:\ \text {conditions (6) are satisfied}\}\). Observe that the parameter \(K\) is fixed by the operator \(F\), so the domain of parameters is described in terms of \(\delta \) and \(\beta \) alone.

After that, we draw the domain of parameters associated to Theorem 1. Setting \(x=\beta \) and \(y=K\delta \), we can define, from conditions (6), the domain of parameters associated to Theorem 1 for method (1) as the region of the \(xy\)-plane whose points satisfy conditions (6), so that the convergence of method (1) is guaranteed from the hypotheses imposed in Theorem 1. For this, we colour the values of the parameters that satisfy conditions (6) in the \(xy\)-plane. Note that the initial conditions (C1) and (C2), required of the initial approximation \(x_{0}\), define the parameters \(\delta \) and \(\beta \), while condition (C3), required of the operator \(F\), defines the fixed parameter \(K\).
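
Each point \((x,y)=(\beta ,K\delta )\) of the plane can then be tested directly against conditions (6), as in the following sketch (with \(a=b=0\) the test reduces to the Kantorovich condition \(K\beta ^{2}\delta \le 1/2\)):

```python
import numpy as np

def in_parameter_domain(x, y, a, b):
    """Does (x, y) = (beta, K*delta) satisfy conditions (6) of Theorem 1?"""
    if (a + b) * x * y >= 2.0:                 # first condition: (a+b)*K*beta*delta < 2
        return False
    gamma = 2.0 * x / (2.0 - (a + b) * x * y)  # gamma, with K*beta*delta = x*y
    # second condition, M*delta*gamma^2 <= 1/2, with M*delta = y*(1 + (a+b)/gamma)
    return y * (1.0 + (a + b) / gamma) * gamma ** 2 <= 0.5

# Shade the domain on a grid, e.g. for a = b = 1/2 (the green region of Fig. 5):
xs, ys = np.linspace(0.01, 3.0, 300), np.linspace(0.01, 3.0, 300)
green = [[in_parameter_domain(x, y, 0.5, 0.5) for x in xs] for y in ys]
```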

Next, we draw the domain of parameters associated to Theorem 1 for method (1). From Figs. 3, 4, 5, we can draw the following conclusions.

Fig. 3

Domain of parameters of method (1) with \(a=0\) and \(b=1\) associated to Theorem 1

Fig. 4

Domain of parameters of method (1) with \(a=1/2\) and \(b=1/2\) associated to Theorem 1

Fig. 5

Domains of parameters of method (1) associated to Theorem 1 when \(a=1\) and \(b=1\) (red region), \(a=1/2\) and \(b=1/2\) (green region) and \(a=1/4\) and \(b=1/4\) (blue region)

Conclusions. Once the domain of parameters associated to Theorem 1 for method (1) is drawn, we can say the following.

  • The parameters \(a\) and \(b\) play symmetrical roles.

  • The quantity \((a+b)\) is what determines the size of the domain of parameters associated to Theorem 1.

  • The smaller the quantity \((a+b)\) is, the bigger the domain of parameters is.

  • If \(a=0\) and \(b=1\), method (1) is reduced to Steffensen’s method and the domain of parameters is that obtained in [5].

  • If \((a+b)\rightarrow 0\), the domain of parameters tends to that obtained by Kantorovich for Newton's method [6].

  • If \(a=b=0\), method (1) is reduced to Newton's method for differentiable operators and the domain of parameters is that obtained by Kantorovich [6].

In order to control the stability in practice, it is possible to consider different parameters \((a,b)\) in each iteration, as in [13]. We are preparing a forthcoming work in this direction.

5 Application

With the following application we show that we cannot apply Steffensen's method (method (1) with \(a=0\) and \(b=1\)) to approximate a solution of the discrete problem corresponding to a nonlinear integral equation of Hammerstein type, since the convergence conditions of Theorem 1 are not satisfied when \(a=0\) and \(b=1\). However, we can do it with a method of family (1) other than Steffensen's method.

First, we consider a nonlinear integral equation of mixed Hammerstein type

$$\begin{aligned} x(s) = f(s)+\int _{T_0}^{T_1}G(s,t)H(x(t))\,dt,\quad s\in [T_0,T_1], \end{aligned}$$
(10)

where \(-\infty<T_0<T_1<+\infty \), \(f(s)\) is a given continuous function on \([T_0,T_1]\), \(H(\xi )\) is a known function and the kernel \(G(s,t)\) is the Green function in \([T_0,T_1]\times [T_0,T_1]\). If we use a discretization process to transform Eq. (10) into a finite-dimensional problem, then Eq. (10) is transformed into the following system:

$$\begin{aligned} F(\mathbf {x})\equiv \mathbf {x}-\mathbf f -D\,\mathbf {z} = 0, \quad F:\mathbb {R}^m\longrightarrow \mathbb {R}^m, \end{aligned}$$
(11)

where

$$\begin{aligned} \mathbf {x}=(x_1,x_2,\dots ,x_m)^T,\;\; \mathbf f =(f_1,f_2,\dots ,f_m)^T,\;\; \mathbf {z}=(H(x_1),H(x_2),\dots ,H(x_m))^T, \end{aligned}$$
$$\begin{aligned} D=(d_{ij})_{i,j=1}^m,\quad d_{ij} = w_j G(t_i,t_j)= \left\{ \begin{array}{ll} w_j \frac{(T_1-t_i)(t_j-T_0)}{T_1-T_0}, &{} j\le i, \\ w_j \frac{(T_1-t_j)(t_i-T_0)}{T_1-T_0}, &{} j > i, \end{array} \right. \end{aligned}$$

and \(t_i\) and \(w_i\) are, respectively, the \(m\) known nodes and weights of the quadrature formula used in the discretization process.

Now, we approximate a solution of a nonlinear system of form (11). In particular, if \(m=8\), \(\mathbf f =\mathbf 1 \) and \(H(x)=x^{2}\), we have

$$\begin{aligned} F(\mathbf {x})\equiv \mathbf {x}-\mathbf 1 -D\mathbf {v_{x}} = 0, \quad F:\mathbb {R}^8\longrightarrow \mathbb {R}^8, \end{aligned}$$
(12)

where

$$\begin{aligned} \mathbf {x}=(x_1,x_2,\dots ,x_8)^T,\quad \mathbf 1 =(1,1,\dots ,1)^T,\quad \mathbf {v_{x}}=(x_1^2,x_2^2,\dots ,x_8^2)^T \end{aligned}$$

and \(t_i\) and \(w_i\) are, respectively, the \(8\) known nodes and weights of the Gauss–Legendre quadrature formula in \([T_0,T_1]=[0,1]\). Moreover, in this case, we have \(F'(\mathbf {x})=I-2D\,\text {diag}\{x_1,x_2,\dots ,x_8\}\).
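
A sketch of the discrete system (12) in Python; `numpy.polynomial.legendre.leggauss` provides the Gauss–Legendre nodes and weights on \([-1,1]\), which we map to \([0,1]\):

```python
import numpy as np

m = 8
t, w = np.polynomial.legendre.leggauss(m)   # nodes and weights on [-1, 1]
t, w = (t + 1.0) / 2.0, w / 2.0             # mapped to [T0, T1] = [0, 1]

# d_ij = w_j * G(t_i, t_j), with the Green function on [0, 1]
D = np.empty((m, m))
for i in range(m):
    for j in range(m):
        D[i, j] = w[j] * ((1 - t[i]) * t[j] if j <= i else (1 - t[j]) * t[i])

F = lambda x: x - 1.0 - D @ x ** 2           # F(x) = x - 1 - D v_x, with H(x) = x^2
Fprime = lambda x: np.eye(m) - 2.0 * D * x   # F'(x) = I - 2 D diag(x_1,...,x_8)
```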

Choosing as starting point \(\mathbf {x_0}=(1.7,1.7,\ldots ,1.7)^T\) and the max-norm, we obtain \(\delta =0.6713\ldots \), \(\beta =1.6549\ldots \) and \(K=0.2471\ldots \) Consequently, we cannot apply Steffensen's method (method (1) with \(a=0\) and \(b=1\)), since the second condition in (6) of Theorem 1 is not satisfied:

$$\begin{aligned} M\delta \gamma ^2 = 0.9286\ldots >\frac{1}{2}, \end{aligned}$$

where \(\gamma =1.9182\ldots \) and \(M=0.3759\ldots \) However, if we choose, for example, \(a=0\) and \(b=1/10\), we can apply method (1), since the two conditions in (6) of Theorem 1 are satisfied:

$$\begin{aligned} (a+b)K\beta \delta = 0.0274\ldots < 2,\quad M \delta \gamma ^2 = 0.4914\ldots \le \frac{1}{2}. \end{aligned}$$

In addition, in Fig. 6, we can see visually that the initial point \(\mathbf {x_0}\) satisfies the convergence conditions of method (1) with \(a=0\) and \(b=1/10\), but not those of Steffensen's method. Observe that the black point belongs to the domain of parameters of method (1) with \(a=0\) and \(b=1/10\) (orange region), but not to that of Steffensen's method (yellow region).

Fig. 6

Domains of parameters for Steffensen’s method (yellow region) and method (1) with \(a=0\) and \(b=1/10\) (orange region)
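
Continuing the sketch of system (12) given above, the parameters and conditions (6) can be checked numerically (the printed values should reproduce the figures quoted in the text up to rounding):

```python
x0 = 1.7 * np.ones(m)
delta = np.linalg.norm(F(x0), np.inf)                      # ~ 0.6713
beta = np.linalg.norm(np.linalg.inv(Fprime(x0)), np.inf)   # ~ 1.6549
K = 2.0 * np.linalg.norm(D, np.inf)                        # ~ 0.2471, since
                                                           # ||F'(x)-F'(y)|| <= 2||D|| ||x-y||
for a, b in [(0.0, 1.0), (0.0, 0.1)]:      # Steffensen's method vs. method (1)
    gamma = 2.0 * beta / (2.0 - (a + b) * K * beta * delta)
    M = K * (1.0 + (a + b) / gamma)
    print(f"a={a}, b={b}: M*delta*gamma^2 = {M * delta * gamma ** 2:.4f}")
```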

As method (1) with \(a=0\) and \(b=1/10\) is convergent by Theorem 1, we use it to approximate the numerical solution \(\mathbf {x^{*}}=(x_1^{*},x_2^{*},\dots ,x_8^{*})^T\) of system (12), shown in Table 10, after five iterations and using the stopping criterion \(\Vert \mathbf {x_n}-\mathbf {x_{n-1}}\Vert <10^{-16}\). In Table 11 we show the errors \(\Vert \mathbf {x_n}-\mathbf {x^*}\Vert \) obtained with the same stopping criterion. Notice that the vector shown in Table 10 is a good approximation of a solution of system (12), since \(\Vert F(\mathbf {x_{n}})\Vert \le \text {constant}\times 10^{-16}\); see the sequence \(\{\Vert F(\mathbf {x_n})\Vert \}\) in Table 11. Moreover, by Theorem 1, the existence of \(\mathbf {x^{*}}\) is guaranteed in the ball \(\overline{B(\mathbf {x_0},2.0594\ldots )}\) and its uniqueness in \(B(\mathbf {x_0},2.8308\ldots )\).

Table 10 Numerical solution \(\mathbf {x^{*}}\) of (12)
Table 11 Absolute errors obtained by method (1) with \(a=0\) and \(b=1/10\) and \(\Vert F(\mathbf {x_n})\Vert \) for the system (12)

Finally, it is easy to see that the smaller the quantity \((a+b)\) is, the better the a priori error estimates obtained in Theorem 2 are.

6 Conclusions

In this paper, we have analyzed a Steffensen-type method depending on two parameters and including the original Steffensen's method. We have proposed a choice of these parameters in order to improve the numerical and theoretical behavior of the method. We have presented sufficient semilocal convergence results using weak conditions, similar to those used for Newton's method. A nonlinear boundary value problem and a nonlinear integral equation are solved numerically, showing the advantages in convergence and accessibility of the proposed method.