10.1 Iteration Functions

Many problems in computational sciences and other disciplines can be formulated by means of an equation like the following

$$\displaystyle{ \mathcal{F}(x) = 0\,, }$$
(10.1)

where \(\mathcal{F}: D \subset \mathbb{X}\longrightarrow \mathbb{Y}\) is a continuous operator defined on a nonempty convex subset D of a Banach space \(\mathbb{X}\) with values in a Banach space \(\mathbb{Y}\). We face the problem of approximating a local unique solution \(\alpha \in \mathbb{X}\) of Eq. (10.1). Since the exact solution of this equation can rarely be found, then we need to use iterative techniques to approximate it to the desired precision from one or several initial approximations. This procedure generates a sequence of approximations of the solution.

Traub [41] includes a classification of iteration functions, according to the information that is required to carry them out. We build up a sequence {x n } n ≥ 1 in a Banach space \(\mathbb{X}\) using the initial conditions \(x_{-k},\ldots,x_{-1},x_{0}\), 0 ≤ k ≤ j − 1. Traub’s classification of iteration functions is the following.

Type I.:

Term x n+1 is obtained using only the information at x n and no other information. That is, given x 0 ∈ D we have

$$\displaystyle{ x_{n+1} =\varPhi (x_{n})\,,\;\;\:n \geq 0\,. }$$
(10.2)

The function Φ is called a one-point iteration function and Eq. (10.2) is called the one-point iterative method without memory.

Type II.:

Term x n+1 is obtained using the information at x n and previous information at \(x_{n-1}\,,\ldots \,,x_{n-j} \in D\). Namely,

$$\displaystyle{ x_{n+1} =\varPhi (x_{n}\,;x_{n-1}\,,\ldots \,,x_{n-j})\,,\;\;\:n \geq 0\,,\;\;\:j \geq 1\,. }$$
(10.3)

Function Φ is called a one-point iteration function with memory and Eq. (10.3) is called a one-point iterative method with memory (j points). The semicolon in (10.3) is written to distinguish the information provided by the new data from the information that was previously used.

Type III.:

Term x n+1 is determined using the new information at x n and information at the points \(\varphi _{1} =\varphi _{1}(x_{n}),\,\varphi _{2} =\varphi _{2}(\varphi _{1},x_{n}),\ldots,\) \(\varphi _{r} =\varphi _{r}(\varphi _{r-1},\ldots,\varphi _{1},x_{n}) \in D\), r ≥ 1. That is,

$$\displaystyle{ x_{n+1} =\varPhi \left (x_{n}\,,\varphi _{1}\,,\ldots \,,\varphi _{r}\right )\,,\;\:n \geq 0\,,\;\;\:r \geq 1\,. }$$
(10.4)

Here, function Φ is called a multipoint iteration function without memory and Eq. (10.4) is called a multipoint iterative method without memory (r steps).

Type IV.:

Term x n+1 is obtained from new information at x n and previous information at

$$\displaystyle\begin{array}{rcl} & & \varphi _{1} =\varphi _{1}(x_{n}\,;x_{n-1},\ldots \,,x_{n-j}), {}\\ & & \vdots {}\\ & & \varphi _{r} =\varphi _{r}(x_{n},\varphi _{1},\ldots,\varphi _{r-1}\,;x_{n-1},\ldots \,,x_{n-j}). {}\\ \end{array}$$

Namely,

$$\displaystyle{ x_{n+1} =\varPhi \left (x_{n}\,,\varphi _{1}\,,\ldots \,,\varphi _{r};x_{n-1}\,,\ldots \,,x_{n-j}\right )\,,\;\;\:n \geq 0\,,\;\;\:r \geq 1\,,\;\;\:j \geq 1. }$$
(10.5)

Function Φ is called a multipoint iteration function with memory and (10.5) is called a multipoint iteration method with memory (r steps and j points).

10.1.1 One-Dimensional Case

In particular, when the Banach spaces \(\mathbb{X} = \mathbb{Y} = \mathbf{R}\), we face the simplest, classical nonlinear problem. Namely, let f: I ⊆ R → R be a nonlinear function. We have to approximate a simple root α of the equation

$$\displaystyle{ f(x) = 0, }$$
(10.6)

where I is a neighborhood of α. An approximation of α is usually obtained by means of an iteration function of type I, II, III or IV, defined in (10.2), (10.3), (10.4) or (10.5), whereby a sequence {x n } n ≥ 1 is generated that converges to α.

Definition 1

The sequence {x n } is said to converge to α with order of convergence ρ ∈ R, ρ ≥ 1, if there exists a real constant C, with 0 < C < ∞, such that

$$\displaystyle{ \lim _{n\rightarrow \infty }\,\dfrac{\vert e_{n+1}\vert } {\,\vert e_{n}\vert ^{\,\rho }\,} \, =\, C, }$$
(10.7)

where \(\,e_{n} = x_{n}-\alpha\) is the error in the nth iterate, and the constant C is called the asymptotic error constant (see [41]).

The local order of convergence of an iterative method in a neighborhood of a root is the order of its corresponding sequence generated by the iterative function and the corresponding initial approximations. For iterative methods without memory, the local order is a positive integer. The convergence is said to be linear if ρ = 1, quadratic if ρ = 2, cubic if ρ = 3, and, in general, superlinear if ρ > 1, superquadratic if ρ > 2, and so on.

For the one-point iterative method without memory (10.2), relation (10.7) can be written as

$$\displaystyle{ e_{n+1} = C\,e_{n}^{\,\rho }\,\, +\, O\big(\,e_{ n}^{\,\rho +1}\big)\,,\ n \geq n_{ 0}. }$$
(10.8)

The expression (10.8) is called the error difference equation for the one-point iterative method. Note that the higher order terms in (10.8) are powers of e n of degree at least ρ + 1.

For the one-point iterative method without memory, an approximation of the number of correct decimal places in the nth iterate, d n , is given by

$$\displaystyle{ d_{n} = -\log _{10}\vert x_{n} -\alpha \vert. }$$
(10.9)

From (10.8), for n large enough we have \(e_{n+1} \approx C\,e_{n}^{\,\rho }\), which, taking logarithms, yields

$$\displaystyle{ d_{n+1} \approx -\log _{10}C +\rho \cdot d_{n}, }$$
(10.10)

from which it follows that

$$\displaystyle{ d_{n+1} \approx \rho \cdot d_{n}. }$$
(10.11)

This means that, in each iteration, the number of correct decimal places is approximately the number of correct decimals in the previous iteration multiplied by the local order of convergence ρ.
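As a quick illustration of (10.11), the following minimal sketch (our own, not part of the original study) runs Newton's method (ρ = 2) on the sample equation x² − 2 = 0 in multiprecision and prints d n , which roughly doubles at every step:

```python
# Minimal sketch of the digit-doubling rule (10.11); the test equation
# x^2 - 2 = 0 and the starting point are illustrative assumptions.
from mpmath import mp, mpf, sqrt, log10

mp.dps = 200                      # work with 200 decimal digits
alpha = sqrt(mpf(2))              # known root, used only to measure d_n

x = mpf('1.5')                    # initial approximation x_0
for n in range(6):
    d_n = -log10(abs(x - alpha))  # d_n = -log10 |x_n - alpha|, cf. (10.9)
    print(f"n = {n}: d_n = {float(d_n):8.2f}")
    x = x - (x * x - 2) / (2 * x) # Newton step, rho = 2
```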

This is in agreement with Wall's definition [42]; that is, the local order of convergence of a one-point iteration function indicates the rate of convergence of the iterative method. Accordingly, Wall defines the order ρ of the iteration formula by

$$\displaystyle{ \rho =\lim _{n\rightarrow \infty }\dfrac{\,\log \vert e_{n+1}\vert \,} {\log \vert e_{n}\vert } =\lim _{n\rightarrow \infty }\dfrac{\,d_{n+1}\,} {d_{n}}. }$$
(10.12)

This expression will be used later on when we define some parameters employed in the computation of the local order of convergence of an iterative method.

For the one-point iterative method with memory (10.3) the error difference equation is

$$\displaystyle\begin{array}{rcl} e_{n+1}& =& Ce_{n}^{a_{1} }e_{n-1}^{a_{2} }\ldots e_{n-j+1}^{a_{j} } + o(e_{n}^{a_{1} }e_{n-1}^{a_{2} }\ldots e_{n-j+1}^{a_{j} })\,,{}\end{array}$$
(10.13)

where the a k are nonnegative integers for 1 ≤ k ≤ j and \(o(e_{n}^{a_{1}}e_{n-1}^{a_{2}}\ldots e_{n-j+1}^{a_{j}})\) represents terms of higher order than \(e_{n}^{a_{1}}e_{n-1}^{a_{2}}\ldots e_{n-j+1}^{a_{j}}\). In this case, the order of convergence ρ is the unique real positive root of the indicial polynomial (see [27, 28, 40, 41]) of the error difference equation (10.13), given by

$$\displaystyle{ p_{j}(t) = t^{j} - a_{ 1}t^{j-1} -\ldots -a_{ j-1}t - a_{j}. }$$
(10.14)

Notice that p j (t) in (10.14) has a unique real positive root ρ on account of Descartes’s rule of signs. Moreover, we can write \(\,e_{n+1} = Ce_{n}^{\rho } + o(e_{n}^{\rho })\).
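The positive root of (10.14) is easy to obtain numerically; the sketch below (an illustration of ours, not from the text) does so for two sample exponent vectors, recovering the golden ratio for the Secant-type case a = (1, 1):

```python
# Illustrative sketch: the order of a method with memory is the unique
# positive real root of the indicial polynomial (10.14).
import numpy as np

def order_from_indicial(a):
    """Positive real root of t^j - a_1 t^(j-1) - ... - a_j."""
    coeffs = [1.0] + [-float(ak) for ak in a]
    roots = np.roots(coeffs)
    pos = [r.real for r in roots if abs(r.imag) < 1e-12 and r.real > 0]
    return max(pos)               # unique by Descartes' rule of signs

print(order_from_indicial([1, 1]))   # 1.6180...  = (1 + sqrt(5))/2
print(order_from_indicial([2, 1]))   # 2.4142...  = 1 + sqrt(2)
```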

10.1.2 Multidimensional Case

When the Banach spaces \(\mathbb{X} = \mathbb{Y} = \mathbf{R}^{m}\), we have to solve a system of nonlinear equations. Namely, let F: D ⊆ R m → R m be a nonlinear function, F ≡ (F 1, F 2, …, F m ) with F i : D ⊆ R m → R, i = 1, 2, …, m, where D is an open convex domain in R m, so that we have to approximate a solution α ∈ D of the equation F(x) = 0. 

Starting with a given set of initial approximations of the root α, the iteration function Φ: D ⟶ D of type I, II, III or IV is defined by (10.2), (10.3), (10.4) or (10.5), whereby a sequence {x n } n ≥ 1 is generated that converges to α.

Definition 2

The sequence {x n } converges to α with an order of convergence of at least ρ ∈ R, ρ ≥ 1, if there is a real constant C, with 0 < C < ∞, such that

$$\displaystyle{ \|e_{n+1}\| \leq C\|e_{n}\|^{\,\rho }\,, }$$
(10.15)

where \(\,e_{n} = x_{n}-\alpha \,\) is the error in the nth iterate, and the constant C is called the asymptotic error constant (see [41]). Here the norm used is the maximum norm.

The local order of convergence of an iterative method in a neighborhood of a root is the order of the corresponding sequence generated (in R m) by the iterative function Φ and the corresponding initial approximations.

Without using norms, a definition of the local order of convergence for the one-point iterative method without memory can be given as follows. The local order of convergence is \(\rho \in \mathbb{N}\) if there is a ρ-linear function \(C \in \mathcal{L}\left (\mathbf{R}^{m} \times \stackrel{\rho }{\breve{\cdots }} \times \mathbf{R}^{m},\mathbf{R}^{m}\right ) \equiv \mathcal{L}_{\rho }\left (\mathbf{R}^{m},\mathbf{R}^{m}\right )\) such that

$$\displaystyle{ e_{n+1} = C\,e_{n}^{\,\rho }\,\, +\, O\big(\,e_{ n}^{\,\rho +1}\big)\,,\ n \geq n_{ 0}\, }$$
(10.16)

where \(\,e_{n}^{\rho }\) is \((e_{n},\stackrel{\rho }{\breve{\cdots }},e_{n}) \in \mathbf{R}^{m} \times \stackrel{\rho }{\breve{\cdots }} \times \mathbf{R}^{m}\). When a constant C with 0 < C < ∞ exists for some ρ ∈ [1, ∞) in (10.15), then ρ is the R-order of convergence of the iterative method defined by Ortega and Rheinboldt [27]. Moreover, the local order ρ in (10.16) is also the R-order of convergence of the method.

For the one-point iterative method with memory, the error difference equation can be expressed by

$$\displaystyle{ e_{n+1} = Ce_{n}^{a_{1} }e_{n-1}^{a_{2} }\ldots e_{n-j+1}^{a_{j} } + o(e_{n}^{a_{1} }e_{n-1}^{a_{2} }\ldots e_{n-j+1}^{a_{j} })\,, }$$
(10.17)

where \(C \in \mathcal{L}_{a_{1}+\ldots +a_{j}}\left (\mathbf{R}^{m},\mathbf{R}^{m}\right )\) and a k are nonnegative integers for 1 ≤ k ≤ j.

As in the one-dimensional case, we can write the indicial polynomial equation associated with (10.17), \(p_{j}(t) = t^{j} - a_{1}t^{j-1} -\ldots -a_{j-1}t - a_{j} = 0\). Applying Descartes's rule of signs to this polynomial, there is a unique real positive root ρ, which coincides with the local order of convergence (see [27, 40]).

10.2 Computational Estimations of the Order

After testing the new iterative methods, we need to check the theoretical local order of convergence. The parameter Computational Order of Convergence (COC) is used in most studies published after Weerakoon and Fernando [43]. This parameter can only be used when the root α is known. To overcome this problem, the following parameters have been introduced:

  • Approximated Computational Order of Convergence (ACOC) by Hueso et al. [22],

  • Extrapolated Computational Order of Convergence (ECOC) by Grau et al. [12],

  • and the Petković Computational Order of Convergence (PCOC) by Petković [29].

The paper by Grau et al. [14] examines the relations between the parameters COC, ACOC and ECOC and the theoretical convergence order of iterative methods without memory.

Subsequently, using Wall’s definition of the order (10.12), four new parameters (CLOC, ACLOC, ECLOC and PCLOC) were given in [19] to check this order. Note that the last three parameters do not require knowledge of the root.

Generalizations of COC, ACOC and ECOC from the one-dimensional case to the multi-dimensional one can be found in [15]. They will be presented in detail in the sequel.

10.2.1 Computational Order of Convergence and Its Variants

Let {x n } n ≥ 1 be a sequence of real numbers converging to α. It is obtained by carrying out an iteration function in R, starting with an initial approximation x 0, or \(x_{-j+1},\ldots,x_{0}\), of the root α of (10.6). Let {e n } n ≥ 1 be the sequence of errors given by \(e_{n} = x_{n}-\alpha\). If functions (10.2)–(10.5) have local order of convergence ρ, then from (10.10) we have

$$\displaystyle\begin{array}{rcl} \log \vert e_{n}\vert \approx \rho \cdot \log \vert e_{n-1}\vert +\log C,& & {}\\ \log \vert e_{n-1}\vert \approx \rho \cdot \log \vert e_{n-2}\vert +\log C.& & {}\\ \end{array}$$

By subtracting the second expression from the first one we get

$$\displaystyle{ \rho \approx \dfrac{\log \vert e_{n}\;/\,e_{n-1}\vert } {\,\log \vert e_{n-1}\;/\,e_{n-2}\vert \,}\,. }$$
(10.18)

This expression is the same as that described in papers by Weerakoon and Fernando [43], and Jay [23].

Definition 3

The values \(\overline{\rho }_{n}\) (COC), \(\widehat{\rho _{n}}\) (ACOC), \(\widetilde{\rho }_{n}\) (ECOC) and \(\breve{\rho }_{n}\) (PCOC) are defined by

$$\displaystyle\begin{array}{rcl} \overline{\rho }_{n}& =& \dfrac{\log \vert e_{n}\;/e_{n-1}\vert } {\log \vert e_{n-1}\;/e_{n-2}\vert },\quad e_{n} = x_{n}-\alpha \,,\quad n \geq 3,{}\end{array}$$
(10.19)
$$\displaystyle\begin{array}{rcl} \widehat{\rho _{n}}& =& \dfrac{\log \,\left \vert \hat{e}_{n}\;/\hat{e}_{n-1}\right \vert } {\log \,\left \vert \hat{e}_{n-1}\;/\hat{e}_{n-2}\right \vert },\quad \hat{e}_{n} = x_{n} - x_{n-1},\quad n \geq 4,{}\end{array}$$
(10.20)
$$\displaystyle\begin{array}{rcl} \widetilde{\rho }_{n}& =& \dfrac{\log \left \vert \tilde{e}_{n}\;/\,\tilde{e}_{n-1}\right \vert } {\log \left \vert \,\tilde{e}_{n-1}\;/\tilde{e}_{n-2}\right \vert },\quad \tilde{e}_{n} = x_{n} -\widetilde{\alpha }_{n},\quad n \geq 5,{}\end{array}$$
(10.21)
$$\displaystyle\begin{array}{rcl} & & \widetilde{\alpha }_{n} = x_{n} -\dfrac{\left (\delta x_{n-1}\right )^{2}} {\delta ^{2}x_{n-2}},\quad \delta x_{n} = x_{n+1} - x_{n}, \\ \breve{\rho }_{n}& =& \dfrac{\log \left \vert \breve{e}_{n}\right \vert } {\log \left \vert \breve{e}_{n-1}\right \vert },\quad \breve{e}_{n} = \dfrac{f(x_{n})} {\,f(x_{n-1})}\,\,,\quad n \geq 2. {}\end{array}$$
(10.22)

Note that the first variant of COC, namely ACOC, involves the parameter \(\hat{e}_{n} = x_{n} - x_{n-1}\), while the second variant, ECOC, is obtained using Aitken's extrapolation procedure [1]. That is, from the iterates \(x_{n-2},\,x_{n-1},\,x_{n}\) we can obtain the approximation \(\widetilde{\alpha }_{n}\) of the root α.

Sequences \(\{\widetilde{\rho }_{n}\}_{n\geq 5}\) and \(\{\widehat{\rho }_{n}\}_{n\geq 4}\) converge to ρ. The details of the preceding claim can be found in [14], where the relations between the error e n and \(\tilde{e}_{n}\) and \(\hat{e}_{n}\) are also described.

From a computational viewpoint, ACOC has the least computational cost, followed by PCOC. Inspired by (10.12) given in [42], in our study [19] we present four new parameters that will be described in the following section.
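Before moving to the new parameters, a minimal sketch of Definition 3 in the scalar case may help; it is our own illustration (the sample equation x² − 2 = 0 and the Newton sequence, ρ = 2, are assumptions, not part of the original study):

```python
# Sketch of COC, ACOC, ECOC and PCOC from (10.19)-(10.22) for a scalar
# Newton sequence on the assumed sample equation x^2 - 2 = 0.
from mpmath import mp, mpf, sqrt, log

mp.dps = 60                      # enough precision so rounding does not bite

def newton_iterates(x0, steps):
    xs = [mpf(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - (x * x - 2) / (2 * x))
    return xs

def coc(x, alpha):
    """COC (10.19): the last three iterates plus the known root."""
    e = [abs(v - alpha) for v in x[-3:]]
    return log(e[2] / e[1]) / log(e[1] / e[0])

def acoc(x):
    """ACOC (10.20): the last four iterates; the root is not needed."""
    d = [abs(x[i] - x[i - 1]) for i in (-3, -2, -1)]
    return log(d[2] / d[1]) / log(d[1] / d[0])

def ecoc(x):
    """ECOC (10.21): errors estimated via Aitken's delta^2 extrapolation."""
    def et(i):   # \tilde{e}_n = (delta x_{n-1})^2 / delta^2 x_{n-2}
        return abs((x[i] - x[i - 1]) ** 2 / (x[i] - 2 * x[i - 1] + x[i - 2]))
    t = [et(i) for i in (-3, -2, -1)]
    return log(t[2] / t[1]) / log(t[1] / t[0])

def pcoc(x, f):
    """PCOC (10.22): quotients of residuals f(x_n)."""
    r = [abs(f(v)) for v in x[-3:]]
    return log(r[2] / r[1]) / log(r[1] / r[0])

xs = newton_iterates('1.5', 5)   # x_0, ..., x_5
print(coc(xs, sqrt(mpf(2))))     # all four parameters tend to rho = 2
print(acoc(xs), ecoc(xs), pcoc(xs, lambda v: v * v - 2))
```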

10.2.2 New Parameters to Compute the Local Order of Convergence

A. Definitions Given the sequence {x n } n ≥ 1 of iterates converging to α with order ρ, we consider the sequences of errors \(e_{n} = x_{n}-\alpha\) and error parameters \(\hat{e}_{n} = x_{n} - x_{n-1}\), \(\tilde{e}_{n} = x_{n} -\widetilde{\alpha }_{n}\) and \(\breve{e}_{n} = \frac{f(x_{n})} {f(x_{n-1})}\) defined previously in (10.20), (10.21), (10.22). From the preceding, we define the following sequences \(\{\overline{\lambda }_{n}\}_{n\geq 2}\) (CLOC), \(\{\widehat{\lambda }_{n}\}_{n\geq 3}\) (ACLOC), \(\{\widetilde{\lambda }_{n}\}_{n\geq 4}\) (ECLOC) and \(\{\breve{\lambda }_{n}\}_{n\geq 2}\) (PCLOC):

$$\displaystyle{ \overline{\lambda }_{n} = \dfrac{\log \vert e_{n}\vert } {\log \left \vert e_{n-1}\right \vert },\;\:\widehat{\lambda }_{n} = \dfrac{\log \left \vert \hat{e}_{n}\right \vert } {\log \left \vert \widehat{e}_{n-1}\right \vert },\;\:\widetilde{\lambda }_{n} = \dfrac{\log \left \vert \tilde{e}_{n}\right \vert } {\log \left \vert \widetilde{e}_{n-1}\right \vert },\;\:\breve{\lambda }_{n} = \dfrac{\log \left \vert \,f(x_{n})\right \vert } {\log \left \vert \,f(x_{n-1})\right \vert \,}. }$$
(10.23)

Note the analogy between \(\overline{\lambda }_{n}\) and the definitions given by Wall in [42] and by Tornheim in [40]. To obtain \(\overline{\lambda }_{n}\) we need knowledge of α, while to obtain \(\widehat{\lambda }_{n}\), \(\widetilde{\lambda }_{n}\) and \(\breve{\lambda }_{n}\) we do not. The new parameters CLOC, ACLOC, ECLOC and PCLOC have a lower computational cost than their predecessors. A detailed description of their convergence can be found in our studies [19] and [20].
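A companion sketch of (10.23), under the same illustrative assumptions as before (Newton on x² − 2 = 0); note that each λ-parameter needs one error term fewer than its COC counterpart:

```python
# Sketch of CLOC, ACLOC (via \hat{e}) and PCLOC from (10.23); ECLOC is
# analogous with \tilde{e}.  Sample equation and iterates are assumed.
from mpmath import mp, mpf, sqrt, log

mp.dps = 60
f = lambda v: v * v - 2                  # assumed test equation
xs = [mpf('1.5')]
for _ in range(5):                       # Newton iterates x_0, ..., x_5
    xs.append(xs[-1] - f(xs[-1]) / (2 * xs[-1]))

alpha = sqrt(mpf(2))
cloc  = log(abs(xs[-1] - alpha)) / log(abs(xs[-2] - alpha))   # needs the root
acloc = log(abs(xs[-1] - xs[-2])) / log(abs(xs[-2] - xs[-3])) # root-free
pcloc = log(abs(f(xs[-1]))) / log(abs(f(xs[-2])))             # residuals only
print(cloc, acloc, pcloc)                # all three tend to rho = 2
```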

B. Relations Between Error and Error Parameters In the case of iterative methods to obtain approximations of the root α of f(x) = 0, where f: I ⊂ R → R, the error difference equation is given by

$$\displaystyle{ e_{n+1} = C\,e_{n}^{\,\rho }\,\big(1\, +\, O(\,e_{ n}^{\,\sigma })\big)\,,\quad 0 <\sigma \leq 1, }$$
(10.24)

where C is the asymptotic error constant. With the additional hypothesis on the order, say \(\rho \geq (1 + \sqrt{5})/2\), in [19] the relations between ρ and the parameters \(\overline{\lambda }_{n}\), \(\widehat{\lambda }_{n}\), \(\widetilde{\lambda }_{n}\) and \(\breve{\lambda }_{n}\) are presented.

Using (10.24) and the definitions of \(\hat{e}_{n}\), \(\tilde{e}_{n}\) and \(\breve{e}_{n}\,\), we obtain the following theoretical approximations of e n . Namely,

$$\displaystyle\begin{array}{rcl} e_{n}& & \approx C^{\: \frac{1} {1-\rho }}\,\left ( \dfrac{\hat{e}_{n}} {\:\hat{e}_{n-1}\:}\right )^{\rho ^{2}/(\rho -1) }\quad n \geq 3\,,{}\end{array}$$
(10.25a)
$$\displaystyle\begin{array}{rcl} e_{n}& \approx & C^{\: \frac{\rho -1} {\:2\rho -1\:} }\,\left (\tilde{e}_{n}\:\right )^{\rho ^{2}/\,(2\rho -1) }\quad n \geq 3,{}\end{array}$$
(10.25b)
$$\displaystyle\begin{array}{rcl} e_{n}& \approx & C^{\: \frac{1} {1-\rho }}\;\left (\breve{e}_{n}\:\right )^{\rho /(\rho -1)}\quad n \geq 2.{}\end{array}$$
(10.25c)

From (10.25a), (10.25b) and (10.25c) we can obtain bounds on the error, predict the number of correct figures and establish a stopping criterion, all without knowledge of the root α.

Table 10.1 Test functions, their roots and the initial points considered

C. Numerical Test The convergence of the new parameters has been tested in six iterative schemes with local convergence order equal to 2, 3, 4, \((1 + \sqrt{5}\:)/2\), \(1 + \sqrt{2}\) and \(1 + \sqrt{3}\) respectively, on a set of seven real functions shown in Table 10.1. The first three schemes are one-point iterative methods without memory: Newton's method, the Chebyshev method [11] and the Schröder method [35]. The other three are iterative methods with memory, namely the Secant method and two of its variants (see [13]).

They are defined by

$$\displaystyle\begin{array}{rcl} \phi _{1}(x_{n})& =& x_{n} -\, u(x_{n}),{}\end{array}$$
(10.26)
$$\displaystyle\begin{array}{rcl} \phi _{2}(x_{n})& =& \phi _{1}(x_{n}) -\frac{1} {2}\,L(x_{n})\,u(x_{n}),{}\end{array}$$
(10.27)
$$\displaystyle\begin{array}{rcl} \phi _{3}(x_{n})& =& \phi _{2}(x_{n}) -\left (\frac{1} {2}\,L(x_{n})^{2}\, - M(x_{ n})\right )\,u(x_{n}),{}\end{array}$$
(10.28)
$$\displaystyle\begin{array}{rcl} \phi _{4}(x_{n})& =& x_{n} -\, [x_{n-1},x_{n}]_{f}^{-1}\:f(x_{ n}),{}\end{array}$$
(10.29)
$$\displaystyle\begin{array}{rcl} \phi _{5}(x_{n})& =& \phi _{4}(x_{n}) -\, [x_{n},\phi _{4}(x_{n})]_{f}^{-1}\:f(\phi _{ 4}(x_{n})),{}\end{array}$$
(10.30)
$$\displaystyle\begin{array}{rcl} \phi _{6}(x_{n})& =& \phi _{4}(x_{n}) -\, [x_{n},2\phi _{4}(x_{n}) - x_{n}]_{f}^{-1}\:f(\phi _{ 4}(x_{n})),{}\end{array}$$
(10.31)

where

$$\displaystyle{u(x) = \frac{f(x)} {f^{{\prime}}(x)},\;L(x) = \frac{f^{{\prime\prime}}(x)} {f'(x)} \,u(x),\;M(x) = \frac{f^{{\prime\prime\prime}}(x)} {3!\,f'(x)}\,u(x)^{2},\;[x,y]_{ f}^{-1} = \dfrac{y - x} {f(y) - f(x)}.}$$
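For illustration only, the six schemes are easy to transcribe in the scalar case; in the sketch below the test function f(x) = x³ − 2, its hand-coded derivatives and the starting points are our own assumptions (the original experiments use the functions of Table 10.1 in multiprecision C++):

```python
# Scalar transcription of schemes (10.26)-(10.31); the test function
# f(x) = x^3 - 2 and the starting points are illustrative assumptions.
from mpmath import mp, mpf, log

mp.dps = 300
f  = lambda x: x**3 - 2
f1 = lambda x: 3 * x**2        # f'
f2 = lambda x: 6 * x           # f''
f3 = lambda x: mpf(6)          # f'''

u = lambda x: f(x) / f1(x)
L = lambda x: f2(x) / f1(x) * u(x)
M = lambda x: f3(x) / (6 * f1(x)) * u(x)**2
dd_inv = lambda x, y: (y - x) / (f(y) - f(x))          # [x, y]_f^{-1}

phi1 = lambda x: x - u(x)                              # Newton (10.26)
phi2 = lambda x: phi1(x) - L(x) * u(x) / 2             # Chebyshev (10.27)
phi3 = lambda x: phi2(x) - (L(x)**2 / 2 - M(x)) * u(x) # Schroeder-type (10.28)
phi4 = lambda xp, x: x - dd_inv(xp, x) * f(x)          # Secant (10.29)

def phi5(xp, x):                                       # variant (10.30)
    y = phi4(xp, x)
    return y - dd_inv(x, y) * f(y)

def phi6(xp, x):                                       # variant (10.31)
    y = phi4(xp, x)
    return y - dd_inv(x, 2 * y - x) * f(y)

# PCLOC along the phi5 iterates should approach 1 + sqrt(2) = 2.414...
xp, x = mpf('1.0'), mpf('1.2')
for _ in range(6):
    xp, x = x, phi5(xp, x)
    print(float(log(abs(f(x))) / log(abs(f(xp)))))
```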

The numerical results can be found in [20]. For each method from (10.26) to (10.31) and each function in Table 10.1, we have applied the four techniques with adaptive multi-precision arithmetic (see below) derived from relations (10.25a), (10.25b) and (10.25c), with a desired precision that for this study is \(10^{-2200}\). The number of iterations needed to obtain the desired precision and the values of the iterated points x 1, …, x I are the same.

From the results of [20], we can conclude that CLOC gives the best approximation of the theoretical order of convergence of an iterative method; however, it requires knowledge of the root. Conversely, as we can see in the definitions (10.23), ACLOC, ECLOC and PCLOC do not involve the root α. In real problems, the root is precisely what is not known in advance. For practical purposes, we recommend ECLOC because it gives the best approximation of the local order (see [20]). Nevertheless, PCLOC is a good practical parameter in many cases because it requires fewer computations.

10.2.3 Multidimensional Case

A generalization to several variables of the parameters presented in the previous sections is carried out here to approximate the local convergence order of an iterative method. In order to define the new parameters, we substitute the absolute value by the maximum norm, and all computations are done using the components of the vectors. Let \(\{x_{n}\}_{n\in \mathbb{N}}\) be a sequence in R m converging to α ∈ R m, where x n  = (x n (1), x n (2), …, x n (m))t and α = (α (1), α (2), …, α (m))t. We consider the vectorial sequence of the error \(e_{n} = x_{n}-\alpha\) and the following vectorial sequences of parameters:

$$\displaystyle{ \hat{e}_{n} = x_{n} - x_{n-1}\,, \tilde{e}_{n} =\max _{1\leq r\leq m}\left \vert \dfrac{\left (\delta x_{n-1}^{(r)}\right )^{2}} {\delta ^{2}x_{n-2}^{(r)}} \right \vert }$$
(10.32)

where \(\delta x_{n} = x_{n+1} - x_{n}\). Notice that \(\tilde{e}_{n}\) is the δ 2-Aitken procedure applied to the components of x n−1, x n and x n+1, and all parameters are independent of knowledge of the root.

Definitions

Let \(\{\overline{\rho }_{n}\}_{n\geq 3}\), \(\{\widehat{\rho }_{n}\}_{n\geq 4}\), \(\{\widetilde{\rho }_{n}\}_{n\geq 5}\), \(\{\breve{\rho }_{n}\}_{n\geq 3}\), \(\{\overline{\lambda }_{n}\}_{n\geq 2}\), \(\{\widehat{\lambda }_{n}\}_{n\geq 3}\), \(\{\widetilde{\lambda }_{n}\}_{n\geq 4}\) and \(\{\breve{\lambda }_{n}\}_{n\geq 2}\) be the following real sequences:

  • Parameters COC, \(\{\overline{\rho }_{n}\}_{n\geq 3}\) and CLOC, \(\{\overline{\lambda }_{n}\}_{n\geq 2}\):

    $$\displaystyle\begin{array}{rcl} \overline{\rho }_{n}& =& \dfrac{\log \left (\|e_{n}\|/\|e_{n-1}\|\right )} {\,\log \left (\|e_{n-1}\|/\|e_{n-2}\|\right )\,}\,,\ n \geq 3\,,\quad \overline{\lambda }_{n} = \dfrac{\log \|e_{n}\|} {\,\log \|e_{n-1}\|\,}\,,\ n \geq 2\,. {}\end{array}$$
    (10.33a)
  • Parameters ACOC, \(\{\widehat{\rho }_{n}\}_{n\geq 4}\) and ACLOC \(\{\widehat{\lambda }_{n}\}_{n\geq 3}\):

    $$\displaystyle\begin{array}{rcl} \widehat{\rho }_{n}& =& \dfrac{\log \left (\|\hat{e}_{n}\|/\|\hat{e}_{n-1}\|\right )} {\,\log \left (\|\hat{e}_{n-1}\|/\|\hat{e}_{n-2}\|\right )\,}\,,\ n \geq 4\,,\quad \widehat{\lambda }_{n} = \dfrac{\log \|\hat{e}_{n}\|} {\,\log \|\hat{e}_{n-1}\|\,}\,,\ n \geq 3\,. {}\end{array}$$
    (10.33b)
  • Parameters ECOC \(\{\widetilde{\rho }_{n}\}_{n\geq 5}\) and ECLOC \(\{\widetilde{\lambda }_{n}\}_{n\geq 4}\):

    $$\displaystyle\begin{array}{rcl} \widetilde{\rho }_{n}& =& \dfrac{\log \left (\|\tilde{e}_{n}\|/\|\tilde{e}_{n-1}\|\right )} {\,\log \left (\|\tilde{e}_{n-1}\|/\|\tilde{e}_{n-2}\|\right )\,}\,,\ n \geq 5\,,\quad \widetilde{\lambda }_{n} = \dfrac{\log \|\tilde{e}_{n}\|} {\,\log \|\tilde{e}_{n-1}\|\,}\,,\ n \geq 4\,. {}\end{array}$$
    (10.33c)
  • Parameters PCOC \(\{\breve{\rho }_{n}\}_{n\geq 3}\) and PCLOC, \(\{\breve{\lambda }_{n}\}_{n\geq 2}\):

    $$\displaystyle\begin{array}{rcl} \breve{\rho }_{n}& =& \dfrac{\log \left (\|F(x_{n})\|/\|F(x_{n-1})\|\right )} {\,\log \left (\|F(x_{n-1})\|/\|F(x_{n-2})\|\right )\,},\ n \geq 3,\;\;\breve{\lambda }_{n} = \dfrac{\log \left \|F(x_{n})\right \|} {\log \left \|F(x_{n-1})\right \|},\ n \geq 2. {}\end{array}$$
    (10.33d)

Approximations COC, ACOC and ECOC have been used in Grau et al. [15]. A complete study of these parameters has been carried out to compute the local convergence order for four iterative methods and seven systems of nonlinear equations.

10.3 The Vectorial Error Difference Equation

Here we present a generalization to several variables of a technique used to compute analytically the error equation of iterative methods without memory for one variable. We consider iterative methods to find a simple root of a system of non-linear equations

$$\displaystyle{F(x) = 0\,,}$$

where F: D ⊆ R m → R m is sufficiently differentiable and D is an open convex domain in R m. Assume that the solution of F(x) = 0 is α ∈ D, at which F′(α) is nonsingular.

The key idea is to use formal power series. The vectorial expression of the error equation obtained by carrying out this procedure is

$$\displaystyle{e_{n+1} = G\left (F'(\alpha ),\,F''(\alpha ),\ldots \right )\,e_{n}^{\,\rho } +\, O\left (e_{ n}^{\rho +1}\right ),}$$

where ρ is a positive integer. If the iterative scheme is with memory, we obtain [see (10.13)]

$$\displaystyle{e_{n+1} = H\left (F'(\alpha ),\,F''(\alpha ),\ldots \right )\,e_{n}^{a_{1} }\,e_{n-1}^{a_{2} }\,\cdots e_{n-j+1}^{a_{j} } +\, o\left (e_{n}^{a_{1} }\,e_{n-1}^{a_{2} }\,\cdots e_{n-j+1}^{a_{j} }\right ),}$$

where a k are nonnegative integers for 1 ≤ k ≤ j.

10.3.1 Notation

To obtain the vectorial equation of the error, we need some known results that, for ease of reference, are included in the following. Let F: D ⊆ R m → R m be sufficiently differentiable (Fréchet-differentiable) in D, and therefore with continuous differentials. If we consider the kth derivative of F at a ∈ R m, we have the k-linear function

$$\displaystyle\begin{array}{rcl} F^{(k)}(a): \mathbf{R}^{m} \times \stackrel{ k}{\breve{\cdots }} \times \mathbf{R}^{m}& \longrightarrow & \mathbf{R}^{m} {}\\ (h_{1},\ldots,h_{k})\;\;\;& \longmapsto & F^{(k)}(a)\,(h_{ 1},\ldots,h_{k}). {}\\ \end{array}$$

That is, \(F^{(k)}(a) \in \mathcal{L}\big(\mathbf{R}^{m} \times \stackrel{k}{\breve{\cdots }} \times \mathbf{R}^{m},\mathbf{R}^{m}\big) \equiv \mathcal{L}_{k}\left (\mathbf{R}^{m},\mathbf{R}^{m}\right )\). It has the following properties:

  1. P1.

    \(\;\:F^{(k)}(a)\,(h_{1},\ldots,h_{k-1},\cdot \,) \in \mathcal{L}\left (\mathbf{R}^{m},\,\mathbf{R}^{m}\right ) \equiv \mathcal{L}\left (\mathbf{R}^{m}\right )\).

  2. P2.

      F (k)(a) (h σ(1), , h σ(k)) = F (k)(a) (h 1, , h k ), where σ is any permutation of the set {1, 2, , k}.

Notice that from P1 and P2 we can use the following notation:

  1. N1.

       F (k)(a) (h 1, , h k ) = F (k)(a) h 1h k . For h j  = h, 1 ≤ j ≤ k, we write F (k)(a) h k.

  2. N2.

    \(\;\:\;F^{(k)}(a)\,h^{k-1}\;F^{(l)}(a)\,h^{l} = F^{(k)}(a)\,F^{(l)}(a)\:h^{k+l-1}\).

Hence, we can also express F (k)(a) (h 1, , h k ) as

$$\displaystyle\begin{array}{rcl} F^{(k)}(a)\,(h_{ 1},\ldots,h_{k-1})\,h_{k}& =& F^{(k)}(a)\,(h_{ 1},\ldots,h_{k-2})\;h_{k-1}\,h_{k} {}\\ & \vdots & {}\\ & =& F^{(k)}(a)\,h_{ 1}\cdots h_{k}\,. {}\\ \end{array}$$

For any \(\,q = a + h \in \mathbf{R}^{m}\) lying in a neighborhood of a ∈ R m, assuming that \(\,\left [F^{{\prime}}\left (a\right )\right ]^{-1}\) exists, and taking into account this notation, we write Taylor’s formulae in the following way:

$$\displaystyle\begin{array}{rcl} F(a + h)& =& F(a) + F'(a)\,h + \frac{1} {2!}F^{(2)}(a)\,h^{2} +\ldots + \frac{1} {p!}F^{(\,p)}(a)\,h^{p} +\, O_{ p+1}\,, \\ & =& F(a) + F'(a)\left (\,h +\,\sum _{ k=2}^{p}\,A_{ k}(a)\,h^{k} +\, O_{ p+1}\,\right ), {}\end{array}$$
(10.34)

where \(A_{k}(a) = \frac{1} {k!}\;\left [F'(a)\right ]^{-1}\:F^{(k)}(a) \in \mathcal{L}_{ k}\left (\mathbf{R}^{m},\mathbf{R}^{m}\right ),\;\;2 \leq k \leq p,\) and \(O_{p+1} =\, O(h^{p+1})\,\).

10.3.2 Symbolic Computation of the Inverse of a Function of Several Variables

We assume that F: D ⊆ R m → R m has at least p-order derivatives with continuity on D for any x ∈ R m lying in a neighborhood of a simple zero, α ∈ D, of the system F(x) = 0. We can apply Taylor's formulae to F(x). By setting \(\,e = x-\alpha\) and assuming that \(\,\left [F^{{\prime}}\left (\alpha \right )\right ]^{-1}\) exists, we have

$$\displaystyle{ F(x)\, =\, F(\alpha +e)\, =\,\varGamma \left (\,e +\,\sum _{ k=2}^{p-1}\,A_{ k}\,e^{k}\right )\, +\,\,\, O_{ p}\,, }$$
(10.35)

where

$$\displaystyle{A_{k} = A_{k}(\alpha )\,,\ k \geq 2\,,\ \text{ with }\ \varGamma = F^{{\prime}}\left (\alpha \right )\,,}$$
$$\displaystyle{\,e^{k}\, =\, (e,\stackrel{k}{\breve{\cdots }},e) \in \mathbf{R}^{m} \times \stackrel{k}{\breve{\cdots }} \times \mathbf{R}^{m}\ \text{ and }\ O_{p} =\,\, O(e^{p}).}$$

Moreover, from (10.35), denoting the identity operator by I, the derivatives of F(x) can be written as

$$\displaystyle{ F^{{\prime}}(x) =\varGamma \left (\,I +\,\sum _{ k=2}^{p-1}\,k\,A_{ k}\,e^{k-1}\right ) +\,\,\, O_{ p}\,, }$$
(10.36)
$$\displaystyle{ F^{{\prime\prime}}(x) =\varGamma \left (\,\sum _{ k=2}^{p-2}\,k\,(k - 1)\,A_{ k}\,e^{k-2}\right ) +\,\,\, O_{ p-1}\,, }$$
(10.37)
$$\displaystyle{ F^{{\prime\prime\prime}}(x) =\varGamma \left (\,\sum _{ k=3}^{p-3} \dfrac{k!} {(k - 3)!}\,A_{k}\,e^{k-3}\right ) +\,\,\, O_{ p-2}\,, }$$
(10.38)

and so forth up to order p.

By developing a formal power series expansion in e, the inverse of F′(x) is

$$\displaystyle{ \left [F^{{\prime}}(x)\right ]^{-1} = \left (I +\sum _{ j=1}^{4}\,K_{ j}e^{j} + O_{ 5}\right )\varGamma ^{-1}, }$$
(10.39)

where

$$\displaystyle\begin{array}{rcl} K_{1}& =& -2\,A_{2}, {}\\ K_{2}& =& 4\,A_{2}^{2} - 3\,A_{ 3}, {}\\ K_{3}& =& -8\,A_{2}^{3} + 6\,A_{ 2}\,A_{3} + 6\,A_{3}\,A_{2} - 4\,A_{4}, {}\\ K_{4}& =& 16\,A_{2}^{4} - 12\,A_{ 2}^{2}\,A_{ 3} - 12\,A_{2}\,A_{3}\,A_{2} - 12\,A_{3}\,A_{2}^{2} + 9\,A_{3}^{2} + 8\,A_{ 2}\,A_{4} + 8\,A_{4}\,A_{2}. {}\\ \end{array}$$
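These coefficients follow from the standard formal-series inversion recursion \(K_{n} = -\sum _{i=1}^{n}c_{i}\,K_{n-i}\), where \(c_{k} = (k + 1)A_{k+1}\). The short sketch below (ours, for verification only) reproduces them with sympy's noncommutative symbols; A 5-terms are absent because the input series is truncated at A 4:

```python
# Verification sketch: formally invert I + 2*A2*e + 3*A3*e^2 + 4*A4*e^3
# with noncommutative coefficients, reproducing K_1, ..., K_4 above.
import sympy as sp

A2, A3, A4 = sp.symbols('A2 A3 A4', commutative=False)
c = {1: 2 * A2, 2: 3 * A3, 3: 4 * A4}   # F'(x) = Gamma(I + sum c_k e^k) + O_p

K = {0: sp.Integer(1)}                  # K_0 = I
for n in range(1, 5):
    # series inverse: sum_{i=0..n} c_i K_{n-i} = 0, with c_0 = I
    K[n] = sp.expand(-sum(c.get(i, 0) * K[n - i] for i in range(1, n + 1)))

for n in range(1, 5):
    print(f"K{n} =", K[n])
```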

Example (Newton Method)

We consider Newton's method, which we can write as

$$\displaystyle{ X = x -\, F'(x)^{-1}\,F(x). }$$
(10.40)

The expression of the error \(\,E = X-\alpha\) in terms of e is built up by subtracting α from both sides of (10.40) and taking into account (10.35) and (10.39). Namely,

$$\displaystyle\begin{array}{rcl} E& =& \,e -\,\left (I +\sum _{ j=1}^{3}\,K_{ j}\,e^{j} + O_{ 4}\right )\,\varGamma ^{-1}\,\varGamma \,\left (\,e +\,\sum _{ k=2}^{4}\,A_{ k}\,e^{k} +\, O_{ 5}\,\right ) \\ & =& A_{2}\,e^{2} + 2(A_{ 3} - A_{2}^{2})\,e^{3} + (3\,A_{ 4} - 4\,A_{2}A_{3} - 3\,A_{3}A_{2} + 4\,A_{2}^{3})\,e^{4} +\, O_{ 5}.{}\end{array}$$
(10.41)

The result (10.41) agrees with the classical asymptotic error constant in the one-dimensional case and states that Newton's method has local order at least 2. Note that the terms A 2 A 3 and A 3 A 2 do not commute.

10.3.3 A Development of the Inverse of the First Order Divided Differences of a Function of Several Variables

We assume that F: D ⊆ R m → R m has at least fifth-order derivatives with continuity on D. We consider the first divided difference operator of F in R m as a mapping

$$\displaystyle\begin{array}{rcl} \left [-,-;F\right ]\,:\,& & \ \ D \times D\ \,\longrightarrow \,\mathcal{L}(\mathbf{R}^{m},\mathbf{R}^{m}) {}\\ & & (x + h,x)\longrightarrow \left [x + h,x\,;\;F\right ]\,, {}\\ \end{array}$$

which, for all x, x + h  ∈ D, is defined by

$$\displaystyle{ [x + h\,,x\,;\;F]\,h = F(x + h) - F(x)\,, }$$
(10.42)

where \(\mathcal{L}(\mathbf{R}^{m},\mathbf{R}^{m})\) denotes the set of bounded linear functions (see [27, 32] and references therein). For F sufficiently differentiable in D, we can write:

$$\displaystyle{ F(x + h) - F(x) =\int _{ x}^{x+h}F'(z)\,dz\ = \left (\int _{0}^{1}F'(x + th)\,dt\right )h. }$$
(10.43)

By developing F (x + t h) in Taylor’s series at the point x ∈ R m and integrating, we obtain

$$\displaystyle{ \left [x + h\,,x\,;\;F\right ]\, = F^{{\prime}}(x) + \dfrac{1} {2}F^{{\prime\prime}}(x)\,h +\ldots + \dfrac{1} {p!}F^{(\,p)}(x)\,h^{p-1} +\, O_{ p}. }$$
(10.44)

By developing F(x) and its derivatives in Taylor’s series at the point \(x =\alpha +e\) lying in a neighborhood of a simple zero, α ∈ D, of the system F(x) = 0, and assuming that \(\,\left [F^{{\prime}}\left (\alpha \right )\right ]^{-1}\) exists, we obtain the expressions (10.35) and (10.38). Next, by replacing these expressions in (10.44), we obtain:

$$\displaystyle{ \left [x + h\,,x\,;\;F\right ] =\varGamma \left (I + A_{2}(2e + h) + A_{3}(3\,e^{2} + 3\,e\,h + h^{2})+\ldots \right ), }$$
(10.45)

or more precisely

$$\displaystyle{ \left [x + h\,,x\,;\;F\right ] =\varGamma \left (\,I\, +\,\sum _{ k=1}^{p-1}\,S_{ k}(h,e)\, + O_{p}(\varepsilon,e)\right ), }$$
(10.46)

where \(S_{k}(h,e) = A_{k+1}\sum _{j=1}^{k+1}\binom{k + 1}{j}e^{k-j+1}\,h^{j-1}\,,\;\;k \geq 1\).

We say that a function depending on ɛ and e is an O p (ɛ, e) if it is an \(O(\varepsilon ^{q_{0}}\,e^{q_{1}})\) with \(q_{0} + q_{1} = p\,,\) q i  ≥ 0 ,  i = 0, 1. 

Setting \(y = x + h\), \(\varepsilon = y-\alpha\) and \(h =\varepsilon -e\) in (10.45) and (10.46) we obtain

$$\displaystyle{ \left [\,y\,,x\,;\;F\right ] =\varGamma \left (I + A_{2}(\varepsilon +e) + A_{3}(\varepsilon ^{2} +\varepsilon \, e + e^{2})+\ldots \right ), }$$
(10.47)

or more precisely

$$\displaystyle{ \left [\,y\,,x\,;\;F\right ] =\varGamma \left (\,I\, +\,\sum _{ k=1}^{p-1}\,T_{ k}(\varepsilon,e)\, + O_{p}(\varepsilon,e)\right ), }$$
(10.48)

where \(T_{k}(\varepsilon,e) = A_{k+1}\sum _{j=0}^{k}\varepsilon ^{k-j}\,e^{j}\).
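In one dimension, the expansion (10.47)/(10.48) can be checked directly with a computer algebra system; the sketch below (ours) takes f(α + t) = t + A 2 t 2 + A 3 t 3, i.e. Γ = 1, so the identity is exact for this polynomial:

```python
# One-dimensional sanity check of (10.47): for f(alpha + t) with Gamma = 1,
# (f(y) - f(x))/(y - x) equals 1 + A2*(eps + e) + A3*(eps^2 + eps*e + e^2),
# where eps = y - alpha and e = x - alpha.
import sympy as sp

eps, e, A2, A3 = sp.symbols('varepsilon e A2 A3')
f = lambda t: t + A2 * t**2 + A3 * t**3
dd = sp.cancel((f(eps) - f(e)) / (eps - e))   # divided difference [y, x; f]
print(sp.expand(dd))
```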

If we expand in formal power series of e and ε, the inverse of the divided difference given in (10.47) or in (10.48) is:

$$\displaystyle{ \left [\,y\,,x\,;\,F\right ]^{-1}=\left (I - A_{ 2}(\varepsilon +e)-A_{3}(\varepsilon ^{2} +\varepsilon \, e + e^{2}) +\big (A_{ 2}(\varepsilon +e)\big)^{2}+O_{ 3}(\varepsilon,e)\right )\varGamma ^{-1}. }$$
(10.49)

Notice that Eq. (10.49) is written explicitly only up to second degree in ɛ and e; in each specific circumstance it is adapted and reduced to the terms that contribute effectively to the computation of the local order of convergence.

These developments of the divided difference operator (10.49) were first used in our study Grau et al. [18].

Example (Secant Method)

The generic developments (10.47), (10.48) and (10.49) can be adapted to particular cases. For example, the well-known Secant method [27, 32] is defined by the algorithm:

$$\displaystyle{ x_{n+1} = x_{n} -\,\left [x_{n-1}\,,x_{n}\,;\;F\right ]^{-1}\,F(x_{ n})\,,\quad x_{0}\,,x_{1} \in D. }$$
(10.50)

If \(y = x_{n-1}\) and x = x n in (10.47), then we obtain an expression of the operator \(\left [x_{n-1}\,,x_{n}\,;\;F\right ]\) in terms of \(e_{n-1} = x_{n-1}-\alpha\) and \(e_{n} = x_{n}-\alpha\). Expanding the inverse of the divided difference operator in the Secant method in formal power series of e n−1 and e n , we obtain

$$\displaystyle{ \left [x_{n-1},x_{n};F\right ]^{-1} = \left (I - A_{ 2}\,(e_{n-1} + e_{n}) + (A_{2}^{2} - A_{ 3})\,e_{n-1}^{2} + o(e_{ n-1}^{2})\right )\varGamma ^{-1}, }$$
(10.51)

where \(A_{2}^{2}\,e_{n-1}^{2} = \left (A_{2}\,e_{n-1}\right )^{2}\). The expression of the error \(\,e_{n+1} = x_{n+1}-\alpha\) in terms of e n and e n−1 for the Secant method is built up by subtracting α from both sides of (10.50). Taking into account (10.35) and (10.51), we have

$$\displaystyle\begin{array}{rcl} e_{n+1}& =& e_{n} -\left (I - A_{2}(e_{n-1} + e_{n}) + (A_{2}^{2} - A_{ 3})e_{n-1}^{2} + o(e_{ n-1}^{2})\right ) \cdot \\ & &\cdot \left (e_{n} + A_{2}e_{n}^{2} + O(e_{ n}^{3})\right ) \\ & =& A_{2}\,e_{n-1}\,e_{n} +\, (A_{3} - A_{2}^{2})\,e_{ n-1}^{2}\,e_{ n} +\, o(e_{n-1}^{2}\,e_{ n}), {}\end{array}$$
(10.52)

where the indicial polynomial [see (10.17)] of the error difference equation (10.52) is \(t^{2} - t - 1 = 0\), with only one positive real root, which is the R-order of convergence of the Secant method, \(\phi = (1 + \sqrt{5})/2\). The second term of the right side of (10.52) would give order 2, since its associated polynomial equation is \(t^{2} - t - 2 = 0\). This result agrees with the classical asymptotic constant in the one-dimensional case and states that the Secant method has at least local order ϕ.

A more complete expression of the error expression for the Secant method can be found in our study Grau et al. [13].

10.4 Efficiency Indices

We are interested in comparing iterative processes for approximating a solution α of a system of nonlinear equations. In the scalar case, the efficiency index (EI) and the computational efficiency (CE) are possible indicators of the efficiency of a scheme. We then consider the computational efficiency index (CEI) as a generalization to the multi-dimensional case, and we show the power of this parameter by applying it to some numerical examples.

10.4.1 Efficiency Index and Computational Efficiency

To compare different iterative methods for solving scalar nonlinear equations, the efficiency index suggested by Ostrowski [28] is widely used,

$$\displaystyle{ EI =\rho ^{\,1/a}, }$$
(10.53)

where ρ is the local order of convergence of the method and a represents the number of evaluations of functions required to carry out the method per iteration.

Another classical measure of efficiency for iterative methods applied to scalar nonlinear equations is the computational efficiency proposed by Traub [41],

$$\displaystyle{ CE =\rho ^{\,1/\omega }, }$$
(10.54)

where ω is the number of operations, expressed in product units, needed to compute each iteration, without counting the evaluations of functions. In general, if we are interested in the efficiency of a scalar scheme, the most frequently used parameter is EI, rather than any combination of this parameter with CE.

The efficiency index for Newton's method is 21∕2 ≈ 1.414, because an iterative step of Newton requires the evaluation of f(x) and f′(x), so a = 2, and the local order is ρ = 2. Note that the parameter EI is independent of the expressions of f and its derivative, while the parameter CE does not consider the cost of evaluating the functions of the algorithm.

More precisely, note that an iteration step requires two actions: first, the calculation of new function values; and second, the combination of those data to calculate the next iterate. The evaluation of functions requires calls to subroutines, whereas the calculation of the next iterate requires only a few arithmetic operations. In the scalar case, these few arithmetic operations are generally not counted.

10.4.2 Computational Efficiency Index

The traditional way of presenting the computational efficiency index of iterative methods (see [28, 41]) is adapted here to systems of nonlinear equations. When we deal with a system of nonlinear equations, the total operational cost is the sum of the cost of the evaluations of functions (the function and the derivatives involved) and the operational cost of carrying out one step of the iterative method.

In the m-dimensional case, the choice of the most suitable iterative method \(x_{n+1} =\varPhi (x_{n})\) depends mainly on its efficiency, which in turn depends on the convergence order and the computational cost. The number of operations per iteration can increase the computational cost to the point that some algorithms are not efficient enough to be used. In general, we have a scheme such as the following

$$\displaystyle{\varPhi (x_{n}) = x_{n} -\varTheta _{n}^{-1}\,F(x_{ n}),}$$

where instead of computing the inverse of the operator Θ n , we solve the following linear system

$$\displaystyle\begin{array}{rcl} \varTheta _{n}\,y_{n}& =& -F(x_{n}), {}\\ x_{n+1}& =& x_{n} + y_{n}. {}\\ \end{array}$$

Therefore, we choose the LU decomposition plus the solution of two triangular linear systems to compute the inverse operator that appears. In other words, in the multi-dimensional case we have to perform a great number of operations, whereas in the scalar case the number of operations reduces to a very few products.

Let ℓ be the conversion factor of quotients into products (the time needed to perform one quotient, in units of product time). Recall that the number of products and quotients needed to solve an m-dimensional linear system using the LU decomposition is

$$\displaystyle{\omega _{1}\, =\, \frac{m} {6} \,(2\,m^{2} - 3\,m + 1)\; +\,\ell\, \frac{m} {2} \,(m - 1),}$$

and to solve the two triangular linear systems, with ones in the main diagonal of the matrix L, we need \(\,\omega _{2}\, =\, m\,(m - 1)\; +\,\ell\, m\) products. Finally, the total number of products is

$$\displaystyle{\frac{m} {6} \,\left (2\,m^{2} + 3\,(1+\ell)\,m + 3\,\ell - 5\right ).}$$

Definition 4

The computational efficiency index (C E I) and the computational cost per iteration (\(\mathcal{C}\)) are defined by (see [13, 16, 18, 32])

$$\displaystyle{ CEI(\mu _{0},\mu _{1},m,\ell)\, =\,\rho ^{ \dfrac{1} {\mathcal{C}(\mu _{0},\mu _{1},m,\ell)} }, }$$
(10.55)

where \(\mathcal{C}(\mu _{0},\mu _{1},m,\ell)\) is the computational cost given by

$$\displaystyle{ \mathcal{C}(\mu _{0},\mu _{1},m,\ell) = a_{0}(m)\,\mu _{0} + a_{1}(m)\,\mu _{1} +\omega (m,\ell), }$$
(10.56)
a 0(m):

represents the number of evaluations of the scalar functions (F 1, , F m ) used in one step of the iterative method.

a 1(m):

is the number of evaluations of scalar functions of F′, say \(\dfrac{\partial F_{i}} {\partial x_{j}},\,1 \leq i,j \leq m\).

ω(m, ):

represents the number of products needed per iteration.

The constants μ 0 and μ 1 are the ratios between products and evaluations required to express the value of \(\mathcal{C}(\mu _{0},\mu _{1},m,\ell)\) in terms of products, and ℓ is the cost of one quotient in product units.

Note that:

$$\displaystyle{CEI(\mu _{0},\mu _{1},m,\ell)\, >\, 1,\qquad \lim _{m\rightarrow \infty }CEI(\mu _{0},\mu _{1},m,\ell) = 1.}$$

Notice that for \(\mu _{0} =\mu _{1} = 1\) and ω(m, ℓ) = 0, (10.55) reduces to (10.53), the classic efficiency index of an iterative method in the scalar case, \(\,EI =\rho ^{\,1/a}\). Also observe that if \(a_{0} = a_{1} = 0\), then (10.55) reduces in the scalar case to (10.54); namely, \(\,CE =\rho ^{\,1/\omega }\).
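A small sketch of Definition 4 may be useful; the Newton-method counts used in the example (a 0 = m evaluations of F, a 1 = m 2 entries of F′, one LU solve per step) and the sample values of μ 0, μ 1 are our own assumptions:

```python
# Hedged sketch of (10.55)-(10.56): computational cost in product units
# and the resulting CEI.  The Newton counts below are assumed.

def omega(m, ell):
    """Products per iteration: one LU solve plus two triangular solves."""
    return m * (2 * m**2 + 3 * (1 + ell) * m + 3 * ell - 5) / 6.0

def cost(a0, a1, om, mu0, mu1):
    """Computational cost (10.56)."""
    return a0 * mu0 + a1 * mu1 + om

def cei(rho, C):
    """Computational efficiency index (10.55)."""
    return rho ** (1.0 / C)

m, ell, mu0, mu1 = 5, 2.5, 87.8, 87.8     # sample values, cf. Sect. 10.4.5
C_newton = cost(m, m * m, omega(m, ell), mu0, mu1)
print(C_newton, cei(2, C_newton))          # CEI slightly above 1
```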

According to (10.56), an estimation of the factors μ 0 and μ 1 is required. To do this, we express the cost of evaluating the elementary functions in terms of products; this cost depends on the machine, the software and the arithmetic used. In [10, 38], we find comparative studies between a multi-precision library (MPFR) and other computing libraries. In Tables 10.2 and 10.3 we show our own estimates of the cost of the elementary functions in product units, where the running time of one product is measured in milliseconds.

Table 10.2 Computational cost of elementary functions computed with Matlab 2009b and Maple 13 on an IntelⓇCore(TM)2 Duo CPU P8800 (32-bit machine) processor with Microsoft Windows 7 Professional, where \(x = \sqrt{3} - 1\) and \(y = \sqrt{5}\)

The values presented in Table 10.2 have been rounded to the nearest 5 units because of the huge variability in the different repetitions that were carried out. In contrast, the averages are shown in Table 10.3, since the variability was very low. In addition, the C++ compiler that was used ensures that the function clock() returns exactly the CPU time spent by the program. Table 10.3 shows that some relative values for the product are lower in multiple precision than in double precision, although the absolute time spent on a product is much higher in multiple precision.

Table 10.3 Computational cost of elementary functions computed with a program written in C++, compiled by gcc(4.3.3) for i486-linux-gnu with libgmp (v.4.2.4) and libmpfr (v.2.4.0) libraries on an IntelⓇXeon E5420, 2.5 GHz, 6 MB cache processor where \(x = \sqrt{3} - 1\) and \(y = \sqrt{5}\)

This measure of computational efficiency is clearly more satisfactory than considering only the number of iterations or only the number of function evaluations, which are widely used by other authors. Any change of software or hardware requires us to recompute the elapsed times of elementary functions, quotients and products.

In this section we compare some derivative-free iterative methods that use the divided difference operator (see [13]). First, we recall the classical Secant method (10.50), and then we study a few two-step algorithms with memory.

10.4.3 Examples of Iterative Methods

Secant Method We denote by Φ 0 the well-known Secant method (10.50). That is, setting x −1, x 0 ∈ D,

$$\displaystyle\begin{array}{rcl} x_{n+1}\; =\;\varPhi _{0}(x_{n-1},x_{n})& =& x_{n} -\left [\,x_{n-1},x_{n};F\right ]^{-1}\,F(x_{ n}),{}\end{array}$$
(10.57)

with the local order of convergence \(\,\phi = (1 + \sqrt{5})/2 = 1.618\ldots\)

Frozen Divided Difference Method We consider two steps of the Secant method with x −1, x 0 given in D,

$$\displaystyle{ \left \{\begin{array}{rcl} y_{n}& =&\varPhi _{0}(x_{n-1},x_{n}), \\ x_{n+1} & =&\varPhi _{1}(x_{n-1},x_{n})\: =\: y_{n} -\left [x_{n-1},x_{n};F\right ]^{-1}\,F(y_{n}),\quad n \geq 0.\end{array} \right. }$$
(10.58)

In this case, the local order is at least 2. Hence, this is a quadratic method in which the divided difference operator is computed only once, which is why it is called the frozen divided difference method (for more details see [18]).

The next two iterative methods, both two-step algorithms, are pseudo-compositions of two known schemes.

First Superquadratic Method We apply the Secant method twice. That is, with x −1, x 0 given in D,

$$\displaystyle{ \left \{\begin{array}{rcl} y_{n}& =&\varPhi _{0}(x_{n-1},x_{n}), \\ x_{n+1} & =&\varPhi _{2}(x_{n-1},x_{n})\: =\: y_{n} -\left [\,x_{n},y_{n};F\right ]^{-1}\,F(y_{n}),\quad n \geq 0.\end{array} \right. }$$
(10.59)

The order of the two-step iterative method with memory defined in (10.59) is \(1 + \sqrt{2} = 2.414\ldots\)

Second Superquadratic Method We define the pseudo-composition of the Secant method with the Kurchatov method [9, 26]. The result is the following scheme: Given x −1, x 0 in D,

$$\displaystyle{ \left \{\begin{array}{rcl} y_{n}& =&\varPhi _{0}(x_{n-1},x_{n}), \\ x_{n+1} & =&\varPhi _{3}(x_{n-1},x_{n})\: =\: y_{n} -\left [\,x_{n},2y_{n} - x_{n};F\right ]^{-1}\,F(y_{n}),\quad n \geq 0.\end{array} \right. }$$
(10.60)

This two-step scheme with memory has a local order of convergence equal to \(1 + \sqrt{3} = 2.732\ldots\)

Finally, we observe that we have moved from a superlinear method such as the Secant method with local order equal to \((1 + \sqrt{5})/2\) to a superquadratic method with local order equal to \(1 + \sqrt{3}\).

10.4.4 Comparisons Between These Methods

We study the efficiency of the four iterative methods Φ j , 0 ≤ j ≤ 3, given by (10.57), (10.58), (10.59) and (10.60) respectively. The computational efficiency index (CEI j ) of each iterative method and the computational cost per iteration (\(\mathcal{C}_{j}\)) are defined as in (10.55) by

$$\displaystyle{ CEI_{j}(\mu,m,\ell) =\rho ^{\, \dfrac{1} {\mathcal{C}_{j}(\mu,m,\ell)} }, }$$

where

$$\displaystyle{ \mathcal{C}_{j}(\mu,m,\ell) = a_{j}(m)\mu +\omega _{j}(m,\ell). }$$

Note that we denote μ 0 by μ in these examples. For each method Φ j , j = 0, 1, 2, 3, Table 10.4 shows: the local order of convergence ρ j ; the number of evaluations of F (NEF); the number of computations of the divided differences (DD); the value of a j (m); and ω j (m, ℓ). In order to obtain these results, we consider the computational divided difference operator (10.61) below. To compute the operator [x n−1, x n ; F] we need m 2 divisions and m(m − 1) scalar evaluations, whereas for the operators [x n , y n ; F] or [x n , 2y n  − x n ; F] we need m 2 scalar evaluations.

$$\displaystyle\begin{array}{rcl} [\,y,x;F]_{i\,j}^{(1)}& =& \left (F_{ i}(y_{1},\ldots,y_{j-1},y_{j},x_{j+1},\ldots,x_{m})-\right. {} \\ & & \left.F_{i}(y_{1},\ldots,y_{j-1},x_{j},x_{j+1},\ldots,x_{m})\right )/(y_{j} - x_{j}),\quad 1 \leq i,j \leq m. \\ \end{array}$$
(10.61)

Summarizing the results of Table 10.4 we have

$$\displaystyle{\begin{array}{lcl} \mathcal{C}_{0}(\mu,m,\ell)\, =\, \dfrac{m} {6} (2m^{2} + 6m\mu + 3m + 9\ell m + 3\ell - 5), &&\rho _{0}\: =\: \frac{1+\sqrt{5}} {2} \,; \\ \mathcal{C}_{1}(\mu,m,\ell)\, =\, \dfrac{m} {6} (2m^{2} + 6m\mu + 9m + 9\ell m + 6\mu + 9\ell - 11),&&\rho _{1}\: =\: 2; \\ \mathcal{C}_{2}(\mu,m,\ell)\, =\, \dfrac{m} {3} (2m^{2} + 6m\mu + 3m + 9\ell m + 3\ell - 5), &&\rho _{2}\: =\: 1 + \sqrt{2}\,; \\ \mathcal{C}_{3}(\mu,m,\ell)\, =\, \dfrac{m} {3} (2m^{2} + 6m\mu + 3m + 9\ell m + 3\mu + 3\ell - 2), &&\rho _{3}\: =\: 1 + \sqrt{3}\,.\\ \end{array} }$$
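As a quick cross-check of these expressions, the sketch below (ours; μ and ℓ follow the values of the numerical section) evaluates \(\mathcal{C}_{j}\) and CEI j for a sample dimension:

```python
# Sketch evaluating the costs C_j above and the indices CEI_j = rho_j^(1/C_j);
# mu = 87.8 and ell = 2.5 are the values used later in Sect. 10.4.5.
import math

rhos = [(1 + math.sqrt(5)) / 2, 2.0, 1 + math.sqrt(2), 1 + math.sqrt(3)]

def costs(mu, m, ell):
    C0 = m / 6 * (2*m**2 + 6*m*mu + 3*m + 9*ell*m + 3*ell - 5)
    C1 = m / 6 * (2*m**2 + 6*m*mu + 9*m + 9*ell*m + 6*mu + 9*ell - 11)
    C2 = m / 3 * (2*m**2 + 6*m*mu + 3*m + 9*ell*m + 3*ell - 5)
    C3 = m / 3 * (2*m**2 + 6*m*mu + 3*m + 9*ell*m + 3*mu + 3*ell - 2)
    return [C0, C1, C2, C3]

for j, (rho, C) in enumerate(zip(rhos, costs(87.8, 13, 2.5))):
    print(f"CEI_{j} = {rho ** (1 / C):.8f}")   # m = 13 reproduces case 3
```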

In order to compare the corresponding CEIs we use the following quotient

$$\displaystyle{R_{\,i\,j}\, =\, \frac{\log CEI_{i}} {\log CEI_{j}}\, =\, \frac{\log \rho _{i}} {\log \rho _{j}}\;\frac{\mathcal{C}_{j}} {\mathcal{C}_{i}},}$$

and we have the following theorem [13]. In Table 10.5, we present the different situations of this theorem.

Theorem 1

For all ℓ ≥ 1 we have:

  1. 1.

CEI 1 > CEI 2 and CEI 1 > CEI 3, for all m ≥ 2.

  2. 2.

CEI 0 > CEI 2, for all m ≥ 2.

  3. 3.

CEI 1 > CEI 0, for all m ≥ 3.

  4. 4.

CEI 3 > CEI 2, for all m ≥ 4.

  5. 5.

CEI 3 > CEI 0, for all m ≥ 12.

Table 10.4 Local convergence order and computational cost of methods Φ j for 0 ≤ j ≤ 3
Table 10.5 The four situations of Theorem 1

10.4.5 Numerical Results

The numerical computations listed in Tables 10.7, 10.8, 10.9, 10.10, 10.11, 10.12 and 10.13 were performed using the MPFR multiprecision library in C++ [39] with 4096 digits of mantissa. All programs were compiled by gcc(4.3.3) for i486-linux-gnu with libgmp (v.4.2.4) and libmpfr (v.2.4.0) libraries on an IntelⓇXeon E5420, 2.5 GHz, 6 MB cache processor. For this hardware and software, the computational cost of the quotient with respect to the product is ℓ = 2.5 (see Table 10.3). Within each example the starting point is the same for all methods tested. The classical stopping criterion \(\vert \vert e_{I}\vert \vert = \vert \vert x_{I} -\alpha \vert \vert > 0.5 \cdot 10^{-\varepsilon }\) and \(\vert \vert e_{I+1}\vert \vert \leq 0.5 \cdot 10^{-\varepsilon }\), with ɛ = 4096, is replaced by

$$\displaystyle{E_{I} = \frac{\vert \vert \hat{e}_{I}\vert \vert } {\vert \vert \hat{e}_{I-1}\vert \vert } < 0.5 \cdot 10^{-\eta },}$$

where \(\hat{e}_{I} = x_{I} - x_{I-1}\) and \(\eta = \frac{\rho -1} {\rho ^{2}} \:\varepsilon\) [see (10.25a)]. Notice that this criterion is independent of any knowledge of the root. Furthermore, in all computations we have replaced the computational order of convergence (COC) [43] by the approximation ACOC, \(\hat{\rho }_{I}\) (10.33b).
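A hedged sketch of this root-free stopping rule (scalar version; for systems the absolute values become maximum norms):

```python
# Sketch of the stopping rule above: stop when E_I = |ê_I|/|ê_{I-1}|
# falls below 0.5 * 10^(-eta), with eta = (rho - 1)/rho^2 * eps.
import math

def eta(rho, eps=4096):
    return (rho - 1) / rho**2 * eps

def should_stop(x_cur, x_prev, x_prev2, rho, eps=4096):
    E = abs(x_cur - x_prev) / abs(x_prev - x_prev2)
    # compare in log10 to avoid underflow of 0.5 * 10^(-eta) in doubles;
    # a real run would use multiprecision values for E as well
    return math.log10(E) < math.log10(0.5) - eta(rho, eps)

# e.g. for the Secant method, eta ~ 967 digits out of eps = 4096:
print(eta((1 + math.sqrt(5)) / 2))
```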

Table 10.6 The three cases of Theorem 1 for μ = 87.8 and ℓ = 2.5
Table 10.7 Numerical results for case 1, where m = 3 and t p  = 0.1039
Table 10.8 Numerical results for case 2, where m = 5 and t p  = 0.1039
Table 10.9 Numerical results for case 3, where m = 13 and t p  = 0.1039

Examples

We present the system defined by

$$\displaystyle{ F_{i}(x_{1},\ldots,x_{m}) =\sum _{ { j=1 \atop j\neq i} }^{m}x_{ j} -\exp (-x_{i}) = 0,\quad 1 \leq i \leq m, }$$
(10.62)

where m = 3, 5, 13 and μ = 87.8 for arithmetic of 4096 digits, since in (10.62) μ is independent of m. The three values of m correspond to three of the situations of Theorem 1 (see Table 10.6). Tables 10.7, 10.8 and 10.9 show the results obtained for the iterative methods Φ 0, Φ 1, Φ 2 and Φ 3 respectively.
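A sketch of system (10.62) together with the computational divided difference (10.61) and a few Secant steps Φ 0, cf. (10.57), in plain double precision (our own minimal transcription; the original experiments use 4096-digit arithmetic):

```python
# System (10.62): F_i(x) = sum_{j != i} x_j - exp(-x_i), here with m = 3,
# solved by the Secant method (10.57) via the divided difference (10.61).
import numpy as np

def F(x):
    return np.array([x.sum() - xi - np.exp(-xi) for xi in x])

def dd1(y, x):
    """[y, x; F]^(1) of (10.61), assembled column by column."""
    m = len(x)
    J = np.zeros((m, m))
    for j in range(m):
        z_hi = np.concatenate([y[:j + 1], x[j + 1:]])
        z_lo = np.concatenate([y[:j], x[j:]])
        J[:, j] = (F(z_hi) - F(z_lo)) / (y[j] - x[j])
    return J

x_prev = np.array([-0.8, 1.1, 1.1])   # x_{-1} of Case 1
x = np.array([-0.9, 1.2, 1.2])        # x_0 of Case 1
for _ in range(6):                    # double precision limits the run length
    x_prev, x = x, x - np.linalg.solve(dd1(x_prev, x), F(x))
print(x)   # approx (-0.8320250398, 1.148983754, 1.148983754)
```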

For each case, we present one table in which we show: the method Φ i ; the number of iterations I needed to reach the maximum precision requested; the computational elapsed time T, in milliseconds, of the C++ execution for these iterations; the approximate number of correct decimals reached, D I ; the computational efficiency index CEI; the time factor \(\text{TF} = 1/\log CEI\); and an upper bound \(\varDelta \hat{\rho }_{I}\) of the error of the ACOC computation, where \(\rho =\hat{\rho } _{I} \pm \varDelta \hat{\rho }_{I}\).

Case 1 We begin with system (10.62) for m = 3, where CEI 1 > CEI 0 > CEI 2 = CEI 3 > CEI 4. The root α = (α i ), 1 ≤ i ≤ m, and the two initial points x −1, x 0 are

$$\displaystyle{\begin{array}{c} \alpha _{1} = -0.8320250398,\quad \alpha _{2,3} = 1.148983754, \\ x_{-1} = (-0.8,1.1,1.1)^{t}\quad x_{0} = (-0.9,1.2,1.2)^{t}. \end{array} }$$

The numerical results of this case are shown in Table 10.7.

Case 2 The second case is system (10.62) for m = 5, where CEI 1 > CEI 0 > CEI 4 > CEI 2 = CEI 3. The numerical results of this case are shown in Table 10.8. The root α and the two initial points x −1, x 0 are

$$\displaystyle{\begin{array}{c} \alpha _{1,2,5} = -2.153967996,\quad \alpha _{3,4} = 6.463463374, \\ x_{-1} = (-2.1,-2.1,6.4,6.4,-2.1)^{t}\quad x_{0} = (-2.2,-2.2,6.5,6.5,-2.2)^{t}. \end{array} }$$

Case 3 Finally, the third case is system (10.62) for m = 13, where CEI 1 > CEI 4 > CEI 0 > CEI 2 = CEI 3. The numerical results of this case are in Table 10.9. The root α and the two initial points x −1, x 0 are

$$\displaystyle{\begin{array}{c} \alpha _{1,2,3,5,7,10} = 1.371341671,\quad \alpha _{4,6,8,9,11,12,13} = -0.9432774419, \\ x_{-1} = (1.3,1.3,1.3,-0.9,1.3,-0.9,1.3,-0.9,-0.9,1.3,-0.9,-0.9,-0.9)^{t}, \\ x_{0} = (1.4,1.4,1.4,-1.0,1.4,-1.0,1.4,-1.0,-1.0,1.4,-1.0,-1.0,-1.0)^{t}. \end{array} }$$

Remark 1

In case 1, we can rank methods Φ 2 and Φ 3 according to the elapsed time T or to the time factor TF. The results differ because the final precisions D I obtained by each method are not comparable. In Sect. 10.5 we explain a better way to compare elapsed times that is more consistent with the theoretical results of the computational efficiency index CEI.

Remark 2

The first numerical definition of the divided difference, (10.61), has a counterexample in the following 2 × 2 system of nonlinear equations

$$\displaystyle{ F(x_{1},x_{2})\, =\, \left \{\begin{array}{l} x_{1}^{2} + x_{ 2}^{2} - 9\, =\, 0, \\ x_{1}\,x_{2} - 1\, =\, 0. \end{array} \right. }$$
(10.63)

Scheme Φ 3 gives a PCLOC of \(\breve{\rho }= 1 + \sqrt{2}\) instead of the theoretical value \(\rho = 1 + \sqrt{3}\). Furthermore, comparing the expression (10.43) with the definition of the divided difference operator (10.61) gives the following result

$$\displaystyle{\int _{0}^{1}\:F'(x+th)\,dt = \left (\begin{array}{cc} 2x_{1} + h_{1} & 2x_{2} + h_{2} \\ x_{2} + h_{2}/2&x_{1} + h_{1}/2 \end{array} \right )\:\mathbf{\neq }\:\left [\,x + h,x;F\,\right ]^{(1)} = \left (\begin{array}{cc} 2x_{1} + h_{1} & 2x_{2} + h_{2} \\ x_{2} & x_{1} + h_{1} \end{array} \right ),}$$

where x = (x 1, x 2)t, h = (h 1, h 2)t and t ∈ R. Due to Potra [30, 32], we have the following necessary and sufficient condition characterizing the divided difference operator by means of a Riemann integral.

Theorem 2

If F satisfies the Lipschitz condition \(\|[x,y;F] - [u,v;F]\| \leq H\left (\|x - u\| +\| y - v\|\right )\) , then the operator (10.61) coincides with the Riemann integral representation (10.43) for every pair of distinct points (x + h,x) ∈ D × D if, and only if, for all (u,v) ∈ D × D with u ≠ v and 2v − u ∈ D the following relation is satisfied:

$$\displaystyle{ [u,v;F]\, =\: 2\,[u,2v - u;F] - [v,2v - u;F]. }$$
(10.64)

We can check that the function considered in (10.63) does not satisfy (10.64). Hence, to obtain the expected local order when applying the algorithms Φ k ,   k = 0, 1, 2, 3, in this case, we need a new definition of divided differences instead of the one given in (10.61). We use the following method to compute the divided difference operator

$$\displaystyle{ [\,y,x;F]_{i\,j}^{(2)} = \frac{1} {2}\:\left ([\,y,x;F]_{i\,j}^{(1)} + [x,y;F]_{ i\,j}^{(1)}\right ),\;\;1 \leq i,j \leq m. }$$
(10.65)

Notice that the operator defined in (10.65) is symmetric: [ y, x; F] = [x, y; F]. If we use definition (10.65), we have to double the number of evaluations of the scalar functions in the computation of [ y, x; F], while rewriting (10.65) still requires m 2 quotients.
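A sketch of (10.65) built on the column-wise operator (10.61); the 2 × 2 system is the counterexample (10.63), and the evaluation points are assumed for illustration:

```python
# Symmetrized divided difference (10.65) for the counterexample (10.63):
# dd2(y, x) = (dd1(y, x) + dd1(x, y)) / 2 is symmetric in its arguments.
import numpy as np

def F(x):
    return np.array([x[0]**2 + x[1]**2 - 9.0, x[0] * x[1] - 1.0])

def dd1(y, x):
    m = len(x)
    J = np.zeros((m, m))
    for j in range(m):
        z_hi = np.concatenate([y[:j + 1], x[j + 1:]])
        z_lo = np.concatenate([y[:j], x[j:]])
        J[:, j] = (F(z_hi) - F(z_lo)) / (y[j] - x[j])
    return J

def dd2(y, x):
    return 0.5 * (dd1(y, x) + dd1(x, y))

y, x = np.array([3.0, 0.5]), np.array([2.8, 0.3])   # sample points
print(dd1(y, x))              # second row (x2, x1 + h1): not symmetric
print(dd2(y, x) - dd2(x, y))  # zero matrix: dd2 is symmetric
```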

If we use (10.65), then for method Φ 3 applied to system (10.63) the computational order of convergence is equal to \(1 + \sqrt{3}\). Another example with the same behavior as (10.63) is

$$\displaystyle{ F_{i}(x_{1},\ldots,x_{3}) = x_{i} -\cos \left (\sum _{{ j=1 \atop j\neq i} }^{3}x_{ j}\: - x_{i}\right ),\;\;1 \leq i \leq 3. }$$
(10.66)

In Table 10.10, we show the numerical results of method Φ 3 applied to the systems of nonlinear equations (10.63) and (10.66). We denote by Φ 3 ( j), j = 1, 2, method Φ 3 using the numerical definition of the divided difference operator \(\left [\,x + h,x;F\,\right ]^{(\,j)}\). Setting TF\(_{3}^{(\,j)} = 1/\log CEI_{3}^{(\,j)}\) as the time factor of method Φ 3 ( j) and comparing the two time factors for system (10.63), we can conclude (see Table 10.10) that method Φ 3 (2) is more efficient than Φ 3 (1). This behavior is reversed in example (10.66).

Table 10.10 Numerical results for the scheme Φ 3 applied to systems (10.63) and (10.66)

10.5 Theoretical Numerical Considerations

Theoretical and experimental studies of numerical methods often drift apart. Starting from studies by Ostrowski [28], Traub [41] and Ralston [33], we have introduced new concepts that allow us to estimate the execution time. We revisit the time factor [18] and present a relation between these measures and the number of iterations needed to achieve a given precision in the root computation. In other words, in the classical comparison between two iterative methods, the following ratio of efficiency logarithms was introduced

$$\displaystyle{ \frac{\varTheta _{1}} {\varTheta _{2}} = \frac{\log CEI_{2}} {\log CEI_{1}} = \frac{\mathcal{C}_{1}/\log \rho _{1}} {\mathcal{C}_{2}/\log \rho _{2}}, }$$
(10.67)

where Θ is the total cost, in products, of obtaining the required precision when we apply a method. That is, if I is the total number of iterations, then

$$\displaystyle{ \varTheta = I \cdot \mathcal{C}. }$$
(10.68)

In this section, we also introduce a new factor that provides us with an explicit expression of the total time \(\tilde{\varTheta }\).

10.5.1 Theoretical Estimations

For an iterative method with local order of convergence ρ and local error \(\,e_{n} = x_{n}-\alpha \,\), we define \(\,D_{n} = -\log _{10}\vert \vert e_{n}\vert \vert \). That is, \(D_{n}\) is approximately the number of correct decimal places in the nth iteration. From the definition of local order, we have \(\vert \vert e_{I+1}\vert \vert \approx C\,\vert \vert e_{I}\vert \vert ^{\rho }\) and \(D_{I+1} \approx -\log _{10}C +\rho D_{I}\). The solution of this difference equation is

$$\displaystyle{D_{I} \approx D_{0}\rho ^{I} +\log _{ 10}M,\quad \text{where}\quad M = C^{1/(\rho -1)},}$$

and we obtain \(D_{I} \approx D_{0}\,\rho ^{I}\). Applying logarithms to both sides of the preceding equation and taking (10.68) into account, we get

$$\displaystyle{ I = (\log q)/(\log \rho )\quad \text{and}\quad \varTheta =\log q\frac{\mathcal{C}} {\log \rho } = \frac{\log q} {\log CEI}, }$$
(10.69)

where \(\,q = D_{I}/D_{0}\). From (10.68), if we take \(\;\tilde{\varTheta }(q) =\varTheta \, t_{p}\), where \(t_{p}\) is the time required to perform one product, then \(\tilde{\varTheta }(q)\) is the total time. Taking into account (10.55) and (10.69), the total time is \(\tilde{\varTheta }(q) \approx \,\log q\, \dfrac{t_{p}} {\log CEI}\). If we measure time in product units, then 1∕logC E I is called the time factor (TF). Notice that the term logq introduced in (10.69) cancels in the quotient of Eq. (10.67). In [33], (10.67) is obtained from

$$\displaystyle{ \frac{\varTheta _{1}} {\varTheta _{2}} = \frac{I_{1}} {I_{2}} \frac{\mathcal{C}_{1}} {\mathcal{C}_{2}}. }$$
(10.70)

Then, considering \(\varTheta = \mathcal{C}/\ln \rho\), the efficiency index is

$$\displaystyle{EI = \frac{1} {\varTheta } = \frac{\ln \rho } {\mathcal{C}} =\ln \rho ^{1/\mathcal{C}}.}$$

We have not only taken C E I as defined in (10.55), but also expressed the factor that cancels in (10.70) as

$$\displaystyle{I_{k} = \frac{\log q} {\log \rho _{k}},\quad k = 1,2,}$$

as deduced in (10.69). We are now in a position to state the following theorem.

Theorem 3

In order to obtain \(D_{I}\) correct decimals from \(D_{0}\) correct decimals using an iterative method, we can estimate

  • the number of iterations \(\;I \approx \dfrac{\log q} {\log \rho }\) ,

  • the necessary time \(\;\tilde{\varTheta }(q) \approx \log q\; \dfrac{t_{p}} {\log CEI}\) ,

where \(\,q = D_{I}/D_{0}\), \(t_{p}\) is the time needed for one product, and \(\,\mathcal{C}\,\) is the computational cost per iteration.
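The estimates of Theorem 3 are immediate to evaluate. The following C++ sketch (names and signature are ours) computes I and \(\tilde{\varTheta }(q)\) from ρ, \(\mathcal{C}\), \(t_{p}\) and the required precision, using \(CEI =\rho ^{1/\mathcal{C}}\) as implied by (10.69).

#include <cmath>

// Estimates from Theorem 3 (base-10 logarithms, as in the text).
// rho : local order; C : cost per iteration, in products; tp : time of one
// product; D0, DI : correct decimals at the start and at the target.
struct Estimate { double iterations, total_time; };

Estimate forecast(double rho, double C, double tp, double D0, double DI) {
    const double q    = DI / D0;
    const double logq = std::log10(q);
    const double I    = logq / std::log10(rho);       // I ~ log q / log rho
    const double CEI  = std::pow(rho, 1.0 / C);       // log CEI = (log rho)/C
    const double T    = logq * tp / std::log10(CEI);  // total time estimate
    return {I, T};
}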

When we are faced with a numerical problem, we rewrite the time estimate as the following equation of a straight line in the variables \((\log D_{I},\tilde{\varTheta })\):

$$\displaystyle{\tilde{\varTheta }(D_{I}) = \dfrac{t_{p}} {\log CEI}(\log D_{I} -\log D_{0}) =\kappa \log \frac{D_{I}} {D_{0}} =\kappa \log q,}$$

where \(\,\kappa = t_{p}/\log (CEI)\). That is, κ is a coefficient that measures the execution time as a function of the approximate number of correct decimals. In order to study and analyze an iterative method, we can consider the straight line \(\tilde{\varTheta }(D_{I})\) with slope κ in a semi-logarithmic graph. If we approximate the \((\log D_{j},\varTheta (D_{j}))\) pairs in a least-squares sense by a polynomial of degree one, we can compute an experimental slope \(\tilde{\kappa }\), which is used in Figs. 10.1, 10.2 and 10.3, and Tables 10.11, 10.12 and 10.13.
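The experimental slope \(\tilde{\kappa }\) is an ordinary least-squares fit. A C++ sketch (ours) of the computation over the \((\log D_{j},\varTheta (D_{j}))\) pairs follows.

#include <cmath>
#include <vector>

// Experimental slope: least-squares fit of a straight line through the
// pairs (log10 D_j, Theta_j); returns the slope only.
double experimental_slope(const std::vector<double>& D,
                          const std::vector<double>& Theta) {
    const int n = static_cast<int>(D.size());
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (int j = 0; j < n; ++j) {
        const double x = std::log10(D[j]);
        sx += x; sy += Theta[j]; sxx += x * x; sxy += x * Theta[j];
    }
    return (n * sxy - sx * sy) / (n * sxx - sx * sx);
}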

Fig. 10.1 Time t versus number of correct decimals \(D_{i}\) for m = 3

Fig. 10.2 Time t versus number of correct decimals \(D_{i}\) for m = 5

Fig. 10.3 Time t versus number of correct decimals \(D_{i}\) for m = 13

Table 10.11 Numerical results for case 1, where m = 3 and \(t_{p} = 0.1039\)
Table 10.12 Numerical results for case 2, where m = 5 and \(t_{p} = 0.1039\)
Table 10.13 Numerical results for case 3, where m = 13 and \(t_{p} = 0.1039\)

Tables 10.11, 10.12 and 10.13 show the time factor (T F); the last iteration reached (I); the approximate number of correct decimal places in the Ith iteration (\(D_{I}\)); the elapsed time \(\tilde{\varTheta }\); the slope \(\tilde{\kappa }\); and the computed time factor \(\widetilde{TF}\) defined by

$$\displaystyle{ \widetilde{TF} = \frac{\widetilde{\varTheta }(D_{I})} {t_{p}\,\log q} = \frac{\tilde{\kappa }} {t_{p}} \approx TF = \frac{1} {\log CEI}. }$$

Furthermore, the last column shows the percentage of relative error \(r_{TF}\) between T F and \(\widetilde{TF}\). Note that the ordering of the methods according to the time factor (T F) (or C E I) matches the ordering according to \(\tilde{\kappa }\).

Figures 10.1, 10.2 and 10.3 show, in a semi-logarithmic graph, the \((\log D_{j},\varTheta (D_{j}))\) pairs and the straight line of each method. Note that the smaller the slope of the line, the more efficient the method.

10.6 Ball of Local Convergence

In this section, we use the convergence ball to study an aspect of local convergence theory. For this purpose, we present an algorithm devised by Schmidt and Schwetlick [34] and subsequently studied by Potra and Pták [31]. The procedure consists of fixing a natural number k and keeping the same linear operator of the Secant method for sections of the process consisting of k steps each. It may be described as follows: starting with \(x_{n},z_{n} \in D\), for 0 ≤ j ≤ k − 1, n ≥ 0 and \(x_{n}^{(0)} = z_{n}\),

$$\displaystyle\begin{array}{rcl} x_{n}^{(\,j+1)}\, =\,\varPhi _{ 4,\,j+1}(x_{n};\,x_{n-1})& =& x_{n}^{(\,j)} -\left [\,x_{ n},z_{n};F\right ]^{-1}F(x_{ n}^{(\,j)}).{}\end{array}$$
(10.71)

In the last two steps we take \(x_{n+1} =\varPhi _{4,\,k-1}(x_{n};x_{n-1}) = x_{n}^{(k-1)}\) and finally \(z_{n+1} =\varPhi _{4,\,k}(x_{n};x_{n-1}) = x_{n}^{(k)}\).

The iterative method \(\varPhi _{4,\,k}\) defined in (10.71) has a local convergence order of at least \(\rho _{4,\,k} = \frac{1} {2}\,\left (k + \sqrt{k^{2 } + 4}\right )\) [21].
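A minimal C++ sketch of one outer step of (10.71) follows, in double precision and with a plain Gaussian elimination standing in for the single reused factorization; F and the divided difference are supplied by the caller, and all identifiers are ours.

#include <cmath>
#include <functional>
#include <utility>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Helper (ours): solve A u = b by Gaussian elimination with partial pivoting.
Vec solve(Mat A, Vec b) {
    const int m = static_cast<int>(b.size());
    for (int c = 0; c < m; ++c) {
        int p = c;
        for (int r = c + 1; r < m; ++r)
            if (std::abs(A[r][c]) > std::abs(A[p][c])) p = r;
        std::swap(A[c], A[p]); std::swap(b[c], b[p]);
        for (int r = c + 1; r < m; ++r) {
            const double f = A[r][c] / A[c][c];
            for (int j = c; j < m; ++j) A[r][j] -= f * A[c][j];
            b[r] -= f * b[c];
        }
    }
    Vec u(m);
    for (int r = m - 1; r >= 0; --r) {
        double s = b[r];
        for (int j = r + 1; j < m; ++j) s -= A[r][j] * u[j];
        u[r] = s / A[r][r];
    }
    return u;
}

// One outer step of the frozen Schmidt-Schwetlick method (10.71): the divided
// difference [x_n, z_n; F] is kept fixed for the k inner corrections; on exit
// xn holds x_{n+1} = x_n^(k-1) and zn holds z_{n+1} = x_n^(k).
void schmidt_schwetlick_step(const std::function<Vec(const Vec&)>& F,
                             const std::function<Mat(const Vec&, const Vec&)>& dd,
                             Vec& xn, Vec& zn, int k) {
    const Mat A = dd(xn, zn);          // frozen linear operator for this section
    Vec x = zn;                        // x_n^(0) = z_n
    Vec xprev = x;
    for (int j = 0; j < k; ++j) {
        xprev = x;
        const Vec d = solve(A, F(x));  // in practice one factorization is reused
        for (std::size_t i = 0; i < x.size(); ++i) x[i] -= d[i];
    }
    xn = xprev;
    zn = x;
}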

We introduce a theorem (see [21]) on the local convergence of the sequences defined in (10.71), following the ideas given in [8, 36]. We denote by B(α, r) the open ball \(\{x \in \mathbf{R}^{m};\ \|x-\alpha\| < r\}\).

Theorem 4

Let α be a solution of F(x) = 0 such that [F′(α)] −1 exists. We suppose that there is a first-order divided difference \([x,y;F] \in \mathcal{L}(D,\mathbf{R^{m}})\) , for all x,y ∈ D, that satisfies

$$\displaystyle{ \left \|[F'(\alpha )]^{-1}\left ([x,y;F] - [u,v;F]\right )\right \| \leq K\left (\|x - u\| +\| y - v\|\right ),\;x,y,u,v \in D, }$$
(10.72)

and B(α,r) ⊂ D, where \(r = \frac{1} {5K}\). Then, for \(x_{0},z_{0} \in B(\alpha,r)\), the sequence \(\{x_{n}^{(\,j+1)}\}\), 0 ≤ j ≤ k − 1, n ≥ 0, given in (10.71) is well-defined, belongs to B(α,r) and converges to α. Moreover,

$$\displaystyle{ \begin{array}{rcl} \|x_{n}^{(\,j+1)}-\alpha \|& \leq &\dfrac{K\left (\|x_{n} -\alpha \| +\|z_{n} -\alpha \| +\|x_{n}^{(\,j)}-\alpha \|\right )} {1 - K\left (\|x_{n} -\alpha \| +\|z_{n}-\alpha \|\right )} \;\|x_{n}^{(\,j)} -\alpha \|. \end{array} }$$
(10.73)

Theorem 4 can be seen as a result on the accessibility of the solution in the following way: if \(x_{0},z_{0} \in B(\alpha,r)\), where \(r = \frac{1} {5K}\), then the sequence \(\{x_{n}^{(\,j+1)}\}\) given in (10.71) converges to α. The radius r can be slightly increased if we consider center-Lipschitz conditions, as in [3] or [4], for instance. In fact, let us assume that, together with (10.72), the following condition holds:

$$\displaystyle{ \left \|[F'(\alpha )]^{-1}\left ([\alpha,\alpha;F] - [u,v;F]\right )\right \| \leq K_{ 1}\left (\|\alpha -u\| +\|\alpha -v\|\right ),\;\;\:u,v \in D. }$$
(10.74)

Obviously K 1 is always less than or equal to K. Then, we can mimic the proof of Theorem 4 (see [21]) to obtain the radius \(r_{1} = 1/(2K_{1} + 3K) \geq r.\)

Application Now, we consider an application of the previous analysis to the nonlinear integral equation of mixed Hammerstein type of the form:

$$\displaystyle{x(s) = 1 + \frac{1} {2}\int _{0}^{1}G(s,t)\,x(t)^{3}\,dt,\quad s \in [0,1],}$$

where x ∈ C[0, 1], t ∈ [0, 1], and the kernel G is the Green function

$$\displaystyle{G(s,t) = \left \{\begin{array}{ll} (1 - s)\,t,&t \leq s,\\ s\,(1 - t), &s < t. \end{array} \right.}$$

Using a Gauss-Legendre quadrature formula to approximate the integral, we obtain the following nonlinear system of equations

$$\displaystyle{ x_{i} = 1+\frac{1} {2}\sum _{j=1}^{8}b_{ ij}\,x_{j}^{3},\quad b_{ ij} = \left \{\begin{array}{ll} \varpi _{j}t_{j}(1 - t_{i}),&\mbox{ if $j \leq i$,} \\ \varpi _{j}t_{i}(1 - t_{j}),&\mbox{ if $j > i$,} \end{array} \right.\quad i = 1,\ldots,8, }$$
(10.75)

where the abscissas \(t_{j}\) and the weights \(\varpi_{j}\) are determined for m = 8 (see [6]). We have denoted the approximation of \(x(t_{i})\) by \(x_{i}\) (\(i = 1,2,\ldots,8\)). The solution of this system is \(\alpha = \left (\alpha _{1},\alpha _{2},\alpha _{3},\alpha _{4},\alpha _{4},\alpha _{3},\alpha _{2},\alpha _{1}\right )^{t}\), where

$$\displaystyle{\alpha _{1} \approx 1.00577,\quad \alpha _{2} \approx 1.02744,\quad \alpha _{3} \approx 1.05518\quad \mathrm{and}\quad \alpha _{4} \approx 1.07441.}$$

The nonlinear system (10.75) can be written in the form

$$\displaystyle{F(x) = x -\overline{1} -\frac{1} {2}\,B\,\hat{x}\, =\, 0,}$$

for \(x = (x_{1},\ldots,x_{m})^{t}\), \(\overline{1} = (1,1,\ldots,1)^{t}\), \(\hat{x} = (x_{1}^{3},\ldots,x_{m}^{3})^{t}\) and B = (b i j ).
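In code, system (10.75) in this operator form can be sketched as follows (C++, double precision; the Gauss-Legendre nodes t and weights w for m = 8 are inputs, tabulated, e.g., in [6]; the closed-form divided difference used in the last member is the one derived just below, and all identifiers are ours).

#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Discretized Hammerstein system (10.75): F(x) = x - 1 - (1/2) B xhat with
// xhat_i = x_i^3. Nodes t and weights w are Gauss-Legendre data on [0,1].
struct Hammerstein {
    Mat B;
    explicit Hammerstein(const Vec& t, const Vec& w) {
        const int m = static_cast<int>(t.size());
        B.assign(m, Vec(m));
        for (int i = 0; i < m; ++i)
            for (int j = 0; j < m; ++j)
                B[i][j] = (j <= i) ? w[j] * t[j] * (1.0 - t[i])
                                   : w[j] * t[i] * (1.0 - t[j]);
    }
    Vec F(const Vec& x) const {
        const int m = static_cast<int>(x.size());
        Vec r(m);
        for (int i = 0; i < m; ++i) {
            double s = 0.0;
            for (int j = 0; j < m; ++j) s += B[i][j] * x[j] * x[j] * x[j];
            r[i] = x[i] - 1.0 - 0.5 * s;
        }
        return r;
    }
    // Closed-form divided difference [x,y;F] = I - (1/2) B diag(p), with
    // p_j = x_j^2 + x_j y_j + y_j^2 (derived just below in the text).
    Mat divdiff(const Vec& x, const Vec& y) const {
        const int m = static_cast<int>(x.size());
        Mat A(m, Vec(m));
        for (int i = 0; i < m; ++i)
            for (int j = 0; j < m; ++j) {
                const double p = x[j] * x[j] + x[j] * y[j] + y[j] * y[j];
                A[i][j] = (i == j ? 1.0 : 0.0) - 0.5 * B[i][j] * p;
            }
        return A;
    }
};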

Taking the definition of the divided difference operator (10.61) we have

$$\displaystyle{[x,y;F] = I -\frac{1} {2}\,B\,\mathrm{diag}(\tilde{p}),}$$

where \(\tilde{p} \in \mathbf{R^{m}}\) and \(\tilde{p}_{i} = x_{i}^{2} + x_{i}y_{i} + y_{i}^{2}\), 1 ≤ i ≤ m. The Fréchet derivative of the operator F is given by

$$\displaystyle{F'(X) = I -\frac{3} {2}\,B\,\mathrm{diag}(q),}$$

where \(q \in \mathbf{R}^{m}\) and \(q_{i} = X_{i}^{2}\), 1 ≤ i ≤ m, and, in addition, we have \(F'(X) - F'(Y ) = -\frac{3} {2}\,B\,\mathrm{diag}(r)\), where \(r \in \mathbf{R}^{m}\) and \(r_{i} = X_{i}^{2} - Y _{i}^{2}\), 1 ≤ i ≤ m. Setting

$$\displaystyle{ \varOmega =\{ X \in \mathbf{R^{m}}\vert \|X\|_{ \infty }\leq \delta \} }$$
(10.76)

and taking norms we obtain

$$\displaystyle{ \|F'(X) - F'(Y )\| \leq \, \frac{3} {2}\,\|B\|\,\max _{1\leq i\leq m}\vert 2c_{i}\vert \:\|X - Y \|, }$$

where c ∈ Ω, and we get

$$\displaystyle{ \|F'(X) - F'(Y )\| \leq \, 3\,\delta \,\|B\|\,\|X - Y \|. }$$
(10.77)

The divided difference operator can also be written as follows (see [32]):

$$\displaystyle{[x,y;F] =\int _{ 0}^{1}\,F'\left (\tau x + (1-\tau )y\right )\:d\tau.}$$

Then we have

$$\displaystyle\begin{array}{rcl} \|[x,y;F] - [u,v;F]\|& \leq & \int _{0}^{1}\,\|F'\left (\tau x + (1-\tau )y\right ) - F'\left (\tau u + (1-\tau )v\right )\|\,d\tau {}\\ & \leq & 3\,\delta \,\|B\|\,\int _{0}^{1}\,\left (\tau \|x - u\| + (1-\tau )\|y - v\|\right )\,d\tau {}\\ & =& \frac{3} {2}\,\delta \,\|B\|\left (\|x - u\| +\| y - v\|\right ). {}\\ \end{array}$$

Next, we compute an upper bound for the norm of the inverse of F′(α), \(\|F'(\alpha )^{-1}\| \leq 1.233\), and finally, taking into account

$$\displaystyle{\|F'(\alpha )^{-1}\left ([x,y;F] - [u,v;F]\right )\| \leq \| F'(\alpha )^{-1}\|\,\frac{3} {2}\,\delta \,\|B\|\left (\|x - u\| +\| y - v\|\right ),}$$

we deduce the following value for the parameter K introduced in (10.72):

$$\displaystyle{ K =\, \frac{3} {2}\,\delta \,\|F'(\alpha )^{-1}\|\,\|B\|. }$$
(10.78)
Table 10.14 Numerical results for the radius of the existence ball in the nonlinear system (10.75) for different values of δ defined in (10.76) and the corresponding K introduced in (10.78)

Table 10.14 shows the size of the balls centered at α, \(r = 1/(5K)\), for the frozen Schmidt-Schwetlick method and for \(x_{-1} = \overline{\delta }\) and \(x_{0} = \overline{1}\) (see Theorem 4).

The numerical computations were performed with the MPFR multiprecision C++ library, using a mantissa of 4096 digits. All programs were compiled by g++ (4.2.1) for i686-apple-darwin1 with the libgmp (v.4.2.4) and libmpfr (v.2.4.0) libraries on an Intel® Core i7, 2.8 GHz (64-bit) processor. For this hardware and software, the computational cost of a quotient relative to a product is 1.7. Within each example, the starting point is the same for all the methods tested. The classical stopping criterion \(\vert \vert e_{I+1}\vert \vert = \vert \vert x_{I+1} -\alpha \vert \vert < 10^{-\varepsilon }\) and \(\vert \vert e_{I}\vert \vert > 10^{-\varepsilon }\), where ɛ = 4096, is replaced by \(\vert \vert \tilde{e}_{I+1}\vert \vert = \vert \vert x_{I+1} -\tilde{\alpha }_{I+1}\vert \vert <\varepsilon\) and \(\vert \vert \tilde{e}_{I}\vert \vert >\varepsilon\), where \(\tilde{\alpha }_{n}\) is obtained by the δ 2-Aitken procedure; that is,

$$\displaystyle{ \tilde{e}_{n} = \left (\frac{(\delta x_{n-1}^{(r)})^{2}} {\delta ^{2}x_{n-2}^{(r)}} \right )_{r=1,\ldots,m} }$$
(10.79)

where \(\delta x_{n-1} = x_{n} - x_{n-1}\), \(\delta ^{2}x_{n-2} = x_{n} - 2x_{n-1} + x_{n-2}\), and the stopping criterion is now \(\vert \vert \tilde{e}_{I+1}\vert \vert < 10^{-\eta },\) where \(\eta = \left [\varepsilon \,(2\rho - 1)/\rho ^{2}\right ]\). Note that this criterion is independent of the knowledge of the root (see [14]).
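In code, the estimate (10.79) needs only the last three iterates. A double-precision C++ sketch (ours) of the componentwise computation follows.

#include <vector>

using Vec = std::vector<double>;

// Error estimate (10.79) via the delta^2-Aitken procedure, componentwise on
// the last three iterates x_n, x_{n-1}, x_{n-2}; a double-precision sketch
// of what the chapter computes in multiprecision arithmetic.
Vec aitken_error(const Vec& xn, const Vec& xn1, const Vec& xn2) {
    Vec e(xn.size());
    for (std::size_t r = 0; r < xn.size(); ++r) {
        const double d1 = xn[r] - xn1[r];                 // delta x_{n-1}
        const double d2 = xn[r] - 2.0 * xn1[r] + xn2[r];  // delta^2 x_{n-2}
        e[r] = d1 * d1 / d2;
    }
    return e;
}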

In this case the concrete values of the parameters for method \(\varPhi _{4,\,k}\) are (m, μ) = (8, 11). Taking as initial approximations \(x_{-1} = \overline{1}\) and \(x_{0} = \overline{1.1}\), which satisfy the conditions of Theorem 4 (see Table 10.14), we compare the convergence of the methods \(\varPhi _{4,\,k}\) towards the root α. We get the results shown in Table 10.15.

Table 10.15 Numerical results for the nonlinear system (10.75), where we show I, the number of iterations; ρ 4, k , the local order of convergence; C E I, the computational efficiency index; T F, the time factor; and \(D_{I}\), the estimate of the number of correct decimals in the last iteration \(x_{I}\)

Finally, in the computations we replace the computational order of convergence (COC) [43] with an extrapolation (ECLOC), denoted by \(\tilde{\rho }\) and defined as follows

$$\displaystyle{\tilde{\rho }= \frac{\ln \vert \vert \tilde{e}_{I}\vert \vert } {\ln \vert \vert \tilde{e}_{I-1}\vert \vert },}$$

where \(\tilde{e}_{I}\) is given in (10.79). If \(\rho =\tilde{\rho } \pm \varDelta \tilde{\rho }\), where ρ is the local order of convergence and \(\varDelta \tilde{\rho }\) is the error of ECLOC, then we get \(\varDelta \tilde{\rho }< 10^{-3}\). This means that in all computations of ECLOC we obtain at least three significant digits. Therefore, it provides a good check of the local convergence orders of the family of iterative methods presented in this section.
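A one-line helper (ours) computes \(\tilde{\rho }\) from two consecutive Aitken error norms.

#include <cmath>

// ECLOC: the quotient of logarithms of two consecutive Aitken error norms.
double ecloc(double norm_eI, double norm_eIm1) {
    return std::log(norm_eI) / std::log(norm_eIm1);
}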

10.7 Adaptive Arithmetic

In this section, we present a new way to compute the iterates. It consists of adapting the length of the mantissa to the number of significant figures that should be computed in the next iteration. The local convergence order ρ is the key concept for forecasting the precision of the next iterate. Furthermore, if the root is known, then an expression for the forecast of digits in terms of \(x_{n}\) and \(e_{n}\) is

$$\displaystyle{\varDelta _{e_{n}} = \lceil \rho \left (-\log _{10}\vert \vert e_{n}\vert \vert + 4\right ) +\log _{10}\vert \vert x_{n}\vert \vert \rceil,}$$

where 4 is a security term that has been obtained empirically. When the root is unknown, then, according to

$$\displaystyle{ D_{k} \approx -\log _{10}\|e_{k}\| \approx -\, \frac{\rho } {\rho -1}\,\log _{10}\|\breve{e}_{k}\|. }$$
(10.80)

(see Sect. 10.2 for more details), we may use the following forecast formula

$$\displaystyle{\varDelta _{\breve{e}_{n}} = \left \lceil \frac{\rho ^{2}} {\rho -1}\left (-\log _{10}\vert \vert \breve{e}_{n}\vert \vert + 2\right ) +\log _{10}\vert \vert x_{n}\vert \vert \right \rceil,}$$

where \(\varDelta _{e_{n}}\) and \(\varDelta _{\breve{e}_{n}}\) are the lengths of the mantissa for the next iteration.
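Both forecasts are simple ceiling expressions. The following C++ sketch (ours, double precision) evaluates them from ρ, the error norm (or its Aitken estimate) and the norm of the current iterate.

#include <cmath>

// Mantissa-length forecasts (in decimal digits) for the next iterate.
// Known root:   ceil(rho*(-log10||e_n|| + 4) + log10||x_n||), 4: security term.
// Unknown root: ceil(rho^2/(rho-1)*(-log10||e_n est|| + 2) + log10||x_n||).
long forecast_known(double rho, double norm_en, double norm_xn) {
    return static_cast<long>(
        std::ceil(rho * (-std::log10(norm_en) + 4.0) + std::log10(norm_xn)));
}

long forecast_unknown(double rho, double norm_en_breve, double norm_xn) {
    return static_cast<long>(
        std::ceil(rho * rho / (rho - 1.0) * (-std::log10(norm_en_breve) + 2.0)
                  + std::log10(norm_xn)));
}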

10.7.1 Iterative Method

The composition of two Newton iteration functions is a well-known technique that allows us to improve the efficiency of iterative methods in the scalar case. The two-step iteration function obtained in this way is

$$\displaystyle{ \left \{\begin{array}{ccl} y & =&\mathcal{N}(x)\: =\: x -\,\dfrac{f(x)} {f'(x)}, \\ X & =&\mathcal{N}(y).\end{array} \right. }$$
(10.81)

In order to avoid the computation and the evaluation of a new derivative function f′(y) in the second step, some authors have proposed the following variant:

$$\displaystyle\begin{array}{rcl} X& =& y -\,\frac{f(y)} {f'(x)}.{}\end{array}$$
(10.82)

In this case, the derivative is “frozen” (we only need to calculate f′(x) at each step). Note that the local order of convergence decreases from ρ = 4 to ρ = 3, but the efficiency index of (10.81) is \(EI = 4^{1/4} \approx 1.414\), while in (10.82) we have an improvement: \(EI = 3^{1/3} \approx 1.442\).
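The contrast between (10.81) and (10.82) fits in a few lines; the following scalar C++ sketch (ours) implements one step of each scheme.

#include <functional>

using Fn = std::function<double(double)>;

// One step of the composed scheme (10.81) and of its frozen variant (10.82).
double two_step_newton(const Fn& f, const Fn& fp, double x) {
    const double y = x - f(x) / fp(x);
    return y - f(y) / fp(y);          // (10.81): order 4, EI = 4^{1/4}
}

double frozen_two_step(const Fn& f, const Fn& fp, double x) {
    const double y = x - f(x) / fp(x);
    return y - f(y) / fp(x);          // (10.82): order 3, EI = 3^{1/3}
}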

Chung in [5] considers a different denominator in the second step of (10.81); namely,

$$\displaystyle\begin{array}{rcl} X& =& y -\, \frac{f(y)} {f'(x)\,h(t)},{}\end{array}$$
(10.83)

where \(\,t = f(y)/f(x)\) and h(t) is a real-valued function. The conditions established by Chung [5] in order to obtain a fourth-order method with only three function evaluations, f(x), f(y), f′(x), are the following

$$\displaystyle{h(0) = 1,\;h'(0) = -2\;\mbox{ and }\;\vert h''(0)\vert < \infty.}$$

In particular, we consider a specific member of King’s family [25] defined by \(h(t) = \dfrac{1} {1 + 2\,t}\) (see also [2]), for which the second step of (10.81) becomes

$$\displaystyle{X\, =\, y -\, (1 + 2\,t)\,\frac{f(y)} {f'(x)}.}$$

Taking into account that

$$\displaystyle{ t\, =\, \dfrac{f(y)} {f(x)}\, =\, \dfrac{f(x) + f(y) - f(x)} {f(x)} \, =\, 1 +\: \dfrac{f(y) - f(x)} {f(x)}, }$$

and \(\,f(x) = -\,f'(x)\,(y -\, x)\), we have

$$\displaystyle{ t\, =\, 1 -\:\dfrac{f(y) - f(x)} {f'(x)\,(y -\, x)}\, =\, 1 -\:\dfrac{[x,y]_{f}} {f'(x)}, }$$
(10.84)

Hence, we generalize t to the operator T defined by

$$\displaystyle{T = I - F'(x)^{-1}\,[x,y;F].}$$

So we consider the iterative method

$$\displaystyle{ \left \{\begin{array}{rcl} y_{n}& =&x_{n} - F'(x_{n})^{-1}\,F(x_{n}), \\ x_{n+1} & =&\varPhi _{5}(x_{n}) = y_{n} -\left (I + 2\,T\right )\,F'(x_{n})^{-1}\,F(y_{n}) \\ & =&y_{n} -\left (3\,I - 2\,F'(x_{n})^{-1}\,[\,x_{n},y_{n};F\,]\right )\,F'(x_{n})^{-1}\,F(y_{n}). \end{array} \right. }$$
(10.85)

This algorithm was derived independently by Sharma and Arora [37], who prove that its R-order is at least four. In Table 10.16 we present the parameters of the computational cost of this scheme.
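A C++ sketch of one step of \(\varPhi _{5}\), reusing the Vec, Mat and solve helpers from the sketch in Sect. 10.6, follows; only the Jacobian \(F'(x_{n})\) appears on the left-hand sides, so a single factorization serves the three solves (the sketch simply calls solve three times with the same matrix). All identifiers are ours.

#include <functional>

// One step of the method (10.85).
Vec phi5_step(const std::function<Vec(const Vec&)>& F,
              const std::function<Mat(const Vec&)>& Fprime,
              const std::function<Mat(const Vec&, const Vec&)>& dd,
              const Vec& x) {
    const int m = static_cast<int>(x.size());
    const Mat Jx = Fprime(x);
    Vec y = x;
    const Vec d = solve(Jx, F(x));
    for (int i = 0; i < m; ++i) y[i] -= d[i];      // Newton predictor
    const Vec u = solve(Jx, F(y));                 // u = F'(x)^{-1} F(y)
    const Mat A = dd(x, y);                        // [x_n, y_n; F]
    Vec Au(m, 0.0);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < m; ++j) Au[i] += A[i][j] * u[j];
    const Vec w = solve(Jx, Au);                   // w = F'(x)^{-1} [x,y;F] u
    Vec xnew = y;
    for (int i = 0; i < m; ++i) xnew[i] -= 3.0 * u[i] - 2.0 * w[i];
    return xnew;                                   // y - (3I - 2T') u, cf. (10.85)
}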

10.7.2 Numerical Example

We have applied method (10.85) to system (10.62) for m = 11, taking as the initial vector \(x_{0}\) the vector with all components equal to 0.1. The solution reached from this initial point is the vector α whose eleven components are approximately equal to 0.09127652716. In this case, taking into account \(\mu _{1} =\mu _{0}/m\), we have (m, μ 0, μ 1) = (11, 76.4, 6.95).

Table 10.16 Computational cost of the iterative method Φ 5

Table 10.17 shows the correct decimals and the forecasts for this function using method Φ 5. In the first two rows, we give the number of correct decimals \(D_{n}\) using fixed arithmetic (FA) and adaptive arithmetic (AA). In the third and fourth rows, we give the forecasts of the mantissa lengths obtained in adaptive arithmetic when the root is known, \(\varDelta _{e_{n}}\), or unknown, \(\varDelta _{\breve{e}_{n}}\), respectively.

Table 10.17 Number of correct decimals and forecasts for each iteration 1 ≤ n ≤ 5

As can be seen, all the forecasts overestimate the real values. Note that the values of \(\varDelta _{e_{n}}\) and \(\varDelta _{\breve{e}_{n}}\) are very similar.

Table 10.18 Partial and total elapsed time in ms for each iteration 1 ≤ n ≤ 5

The first two rows in Table 10.18 show the partial and total elapsed times (t e  and T e ) when the root is known. The third and fourth rows show these times (\(t_{\breve{e}}\) and \(T_{\breve{e}}\)) when the root is unknown. Moreover, using fixed arithmetic, the total elapsed time to obtain the same solution is 1364.151 ms. Hence, the time spent forecasting the mantissa length is sufficiently short that we can state that adaptive arithmetic is roughly five times faster than fixed arithmetic (in fact, 5.7 using e and 5.3 using \(\breve{e}\)).

Table 10.19 Elapsed time in ms for 100,000 products

10.7.3 Practical Result

Note that, if the number of digits is increased, then the time needed to compute a product also increases. In particular, when the number of digits is doubled, from 256 digits onwards, the computational time is roughly tripled (see Table 10.19). Following Karatsuba’s algorithm [24], we have

$$\displaystyle{t_{n} = a\,\varDelta _{n}^{\lambda },}$$

where \(\lambda =\log _{2}3 = 1.58496\ldots\) Note that \(\varDelta _{n} = D_{n}+\varLambda\), where \(D_{n}\) is the number of correct decimals and Λ is the number of integer digits of \(x_{n}\). For the range 256 ≤ Δ n  ≤ 4096 we have \(\varDelta _{n}^{\lambda } \approx D_{n}^{\lambda }\). Therefore, from \(D_{n+1} \approx \rho D_{n}\) we obtain

$$\displaystyle{\mathcal{C}_{n+1} = a\,\varDelta _{n+1}^{\lambda } \approx a\,D_{ n+1}^{\lambda } \approx a\,\rho ^{\lambda }\,D_{ n}^{\lambda } \approx \rho ^{\lambda }\,\mathcal{C}_{ n}.}$$

Denoting by \(\mathcal{C}_{I}\) the computational cost of the last iteration, if we consider an infinite number of iterations, the total cost is

$$\displaystyle{\widetilde{\mathcal{C}} = \mathcal{C}_{I}\left (1 + \frac{1} {\rho ^{\lambda }} + \frac{1} {\rho ^{2\lambda }} + \cdots \,\right ) = \mathcal{C}_{I}\; \frac{\rho ^{\lambda }} {\rho ^{\lambda } - 1}.}$$

If we only consider the last iterate then we have

$$\displaystyle{r = \frac{\mathcal{C}_{I}} {\widetilde{\mathcal{C}}} = \frac{\rho ^{\lambda } - 1} {\rho ^{\lambda }}.}$$

Notice that for ρ = 4 we have \(4^{\lambda } = 4^{\log _{2}3} = 3^{\log _{2}4} = 3^{2} = 9\), and hence r = 8∕9 ≈ 0.889. From Table 10.18 we can deduce that, in both cases (root known or unknown), in adaptive arithmetic the computational cost of the last iteration is 87.5 % of the total elapsed time. Actually, we can assert that for iterative methods of order four or higher, we only need to consider the cost of the last iteration to obtain a first approximation of the total cost.
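The ratio r is easy to tabulate. The following self-contained C++ sketch (ours) prints it for several orders and reproduces r ≈ 0.889 for ρ = 4.

#include <cmath>
#include <cstdio>

// Share of the total cost spent in the last iteration, r = (rho^l - 1)/rho^l,
// with the Karatsuba exponent l = log2(3).
int main() {
    const double lambda = std::log2(3.0);          // about 1.58496
    for (double rho : {2.0, 3.0, 4.0}) {
        const double rl = std::pow(rho, lambda);
        std::printf("rho = %.0f   r = %.3f\n", rho, (rl - 1.0) / rl);
    }
    return 0;
}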