1 Introduction

In this study we are concerned with the problem of computing a square root of a given matrix A. A square root of an \(r\times r\) matrix A with complex elements, \(A\in \mathbb {C}^{r \times r}\), is a solution \(X\in \mathbb {C}^{r \times r}\) of the quadratic matrix equation

$$\begin{aligned} \mathcal {F}(X)\equiv X^2-A=0, \end{aligned}$$

which is denoted by \(X=A^{1/2}\). Unlike the square root of a scalar, the square root of a matrix may not exist. The theory of existence of matrix square roots is well presented in [15], and the number of matrix square roots is obtained in [13, 15].

One approach to computing a square root of a matrix is to apply iterative methods to the previous quadratic matrix equation. For a general function \(\mathcal {G}:\mathbb {C}^{r \times r}\longrightarrow \mathbb {C}^{r \times r}\), iterative methods for the solution of \(\mathcal {F}(X)=0\) are specified by an initial approximation \(X_0\) and the iteration \(X_{n+1}=\mathcal {G}(X_{n})\), \(n\ge 0\), so that, for suitable initial values \(X_0\), the iteration generates a sequence \(\{X_n\}\) that converges to \(A^{1/2}\).

Algorithms for computing the square root of a matrix are a subject of current research (see [14]) and continue to attract considerable attention. The most popular method for computing the square root of a matrix is the Newton method [12], which converges quadratically. In this paper, we investigate the convergence of a family of iterations with high order of convergence for computing the square root of a matrix, proposed in [3], which includes the Newton and Chebyshev methods. Since the Newton method has already been studied in detail by other authors for computing the square root of a matrix [12, 13, 16], we begin with the Chebyshev method in order to present the technique for proving its semilocal convergence, which we later use to prove the semilocal convergence of the whole family. This family is analysed in detail and applied to solve the above quadratic matrix equation. We show that the methods have good convergence properties and converge semilocally to \(A^{1/2}\). In addition, the theoretical analysis and numerical experiments show that the most efficient method depends on the matrix and on the way the method is implemented.

2 The Chebyshev method

First, we introduce the idea of the Chebyshev method and some corresponding results. It is well known that, for a given operator \(F:X\longrightarrow Y\) defined in a Banach space X and with values in a Banach space Y, the third-order Chebyshev method [5] for solving \(F(X)=0\) is:

$$\begin{aligned} X_0\ \text {given},\quad X_{n+1} = X_n-\left( I+\frac{1}{2}L_{F}(X_n)\right) [F'(X_n)]^{-1}F(X_n),\quad n\ge 0, \end{aligned}$$

where \(L_{F}(X_n)=[F'(X_n)]^{-1}F''(X_n)[F'(X_n)]^{-1}F(X_n)\).

If we particularize the Chebyshev method to the computation of the square root of a matrix, we consider the set

$$\begin{aligned} \Theta =\left\{ A \in \mathbb {C}^{r \times r}\,\text {s.t.}\, A \, \text {has no nonpositive real eigenvalues} \right\} \end{aligned}$$

and approximate \(A^{1/2}\) for a given matrix \(A\in \Theta \). For this, we consider

$$\begin{aligned} \mathcal {F}:\Theta \longrightarrow \mathbb {C}^{r \times r} \quad \text {and}\quad \mathcal {F}(X)=X^{2}-A, \end{aligned}$$

so that \(A^{1/2}\) is the solution of the equation \(\mathcal {F}(X)=0\). So, the Chebyshev method is reduced (see [3]) to

$$\begin{aligned} X_0\in \Theta ,\quad X_{n+1}= \frac{3}{8} X_n + \frac{3}{4} A X_n^{-1} - \frac{1}{8}A^2 X_n^{-3},\quad n\ge 0. \end{aligned}$$
(1)
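As an illustration, the following NumPy sketch implements iteration (1) with \(X_0=I\) on a small test matrix; it is only a sketch (the function name, the test matrix and the stopping tolerance are choices of ours), and the powers \(X_n^{-1}\) and \(X_n^{-3}\) are applied through an explicit inverse computed at every step.

```python
import numpy as np

def chebyshev_sqrt_v1(A, tol=1e-12, max_iter=50):
    """Iteration (1): X_{n+1} = (3/8) X_n + (3/4) A X_n^{-1} - (1/8) A^2 X_n^{-3}."""
    r = A.shape[0]
    X = np.eye(r)                          # X_0 = I commutes with A
    A2 = A @ A
    for _ in range(max_iter):
        Xinv = np.linalg.inv(X)            # an inverse is needed at every step
        X_new = 0.375 * X + 0.75 * A @ Xinv - 0.125 * A2 @ Xinv @ Xinv @ Xinv
        if np.linalg.norm(X_new - X) < tol:
            return X_new
        X = X_new
    return X

A = np.array([[4.0, 1.0], [1.0, 3.0]])     # no nonpositive real eigenvalues
X = chebyshev_sqrt_v1(A)
print(np.linalg.norm(X @ X - A))           # residual of the computed square root
```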

Observe that we need to calculate the inverses \(X_{n}^{-1}\) at each step of the Chebyshev method defined in (1). We are interested in obtaining another expression of the Chebyshev method in which the computation of these inverses is not necessary. For this, we now consider the equation

$$\begin{aligned} \mathcal {F}(X) = X^{-2}-A^{-1} = 0, \end{aligned}$$
(2)

where \(\mathcal {F}:\Theta \longrightarrow \mathbb {C}^{r \times r}\). In this case, \(A^{1/2}\) is also the solution of the equation \(\mathcal {F}(X)=0\).

From (2), we obtain

$$\begin{aligned} \mathcal {F}'(X)Y = -2 X^{-1} Y X^{-2},\qquad \mathcal {F}''(X)Y^{2} = 6 (X^{-1} Y)^{2} X^{-2} \end{aligned}$$

and taking into account that the Chebyshev method can also be written as

$$\begin{aligned} \left\{ \begin{array}{l} X_0\in \Theta , \\ \mathcal {F}'(X_{n})(Y_{n}-X_{n}) = - \mathcal {F}(X_{n}),\quad n\ge 0,\\ \mathcal {F}'(X_{n})(X_{n+1}-Y_{n}) = - \frac{1}{2} \mathcal {F}''(X_{n})(Y_{n}-X_{n})^{2}, \end{array} \right. \end{aligned}$$

the algorithm of the Chebyshev method for solving \(\mathcal {F}(X)=0\), where \(\mathcal {F}:\Theta \longrightarrow \mathbb {C}^{r \times r}\), is then

$$\begin{aligned} X_0\in \Theta ,\quad X_{n+1} = X_n\left( I+\dfrac{1}{2}\left( I-A^{-1}X_n^2\right) + \dfrac{3}{8}\left( I-A^{-1}X_n^2\right) ^2\right) ,\quad n\ge 0. \end{aligned}$$
(3)

Observe that we only need to calculate \(A^{-1}\) once, instead of \(X_{n}^{-1}\) at each step, in order to apply algorithm (3) to solve \(\mathcal {F}(X)=0\), so that this algorithm is more efficient than algorithm (1).
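A corresponding sketch of the inverse-free form (3) is given below; here \(A^{-1}\) is computed only once, before the iteration starts (again, the function name, tolerance and test matrix are illustrative choices of ours).

```python
import numpy as np

def chebyshev_sqrt(A, tol=1e-12, max_iter=50):
    """Inverse-free Chebyshev iteration (3) with X_0 = I, so that X_n A = A X_n."""
    r = A.shape[0]
    I = np.eye(r)
    Ainv = np.linalg.inv(A)                       # the only inverse, computed once
    X = I.copy()
    for _ in range(max_iter):
        R = I - Ainv @ X @ X                      # residual R(X_n) = I - A^{-1} X_n^2
        X = X @ (I + 0.5 * R + 0.375 * R @ R)
        if np.linalg.norm(R) < tol:
            break
    return X

A = np.array([[4.0, 1.0], [1.0, 3.0]])
X = chebyshev_sqrt(A)
print(np.linalg.norm(X @ X - A))                  # residual of the computed square root
```

For this test matrix, \(\Vert I-A^{-1}X_0^2\Vert \approx 0.78\) in the 2-norm, so the hypothesis of Theorem 2.1 below is satisfied.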

Next, we analyse the convergence of the Chebyshev method. For this, we suppose that \(AX_0=X_0A\) and consider the following residual of algorithm (3):

$$\begin{aligned} R(X_n)=I-A^{-1}X_n^2,\quad n\ge 0. \end{aligned}$$

To prove the convergence of algorithm (3), we consider a submultiplicative matrix norm \(\Vert \cdot \Vert \) defined on \(\mathbb {C}^{r \times r}\) and prove that \(\{\Vert R(X_n)\Vert \}\) is a decreasing scalar sequence convergent to zero. First of all, from \(AX_0=X_0A\), it follows that \(AX_n=X_nA\), \(AX_n^2=X_n^2A\) and \(A^{-1}X_n^2=X_n^2A^{-1}\). As a consequence, we have

$$\begin{aligned} R(X_n)= & {} I-A^{-1}X_n^2\\= & {} I-A^{-1}\left( X_{n-1}\left( I+\dfrac{1}{2}R(X_{n-1})+\dfrac{3}{8}R(X_{n-1})^2\right) \right) ^2 \\= & {} I-(I-R(X_{n-1}))\left( I+\dfrac{1}{2}R(X_{n-1})+\dfrac{3}{8}R(X_{n-1})^2\right) ^2. \end{aligned}$$

Next, taking into account Villareal’s formula,

$$\begin{aligned} P(y)^2=\left( \sum _{i=0}^m a_{i}\,y^{i}\right) ^2=\sum _{i=0}^{2m}P_{i}\,y^{i}, \end{aligned}$$

where

$$\begin{aligned} P_{0}=a_{0}^2,\quad \displaystyle {P_{i}=\sum _{j=0}^{i-1}P_{j}\,\frac{a_{i-j}}{a_{0}}\,\frac{2i-3j}{i}},\quad a_i=\dfrac{d_i}{i!},\quad d_0=1,\quad \displaystyle {d_i=\prod _{j=0}^{i-1}\left( \dfrac{1}{2}+j\right) }, \end{aligned}$$

it follows, with the convention that \(a_{k}=0\) for \(k>m\),

$$\begin{aligned} R(X_n) = I-(I-R(X_{n-1}))\sum _{i=0}^{4} P_{i}\,R(X_{n-1})^{i}, \end{aligned}$$
(4)

with \(P_{0}=P_1=P_2=1\), \(P_3=3/8=0.375\) and \(P_4=9/64\approx 0.1406\).
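The coefficients above can be checked with a few lines of Python; the helper below applies the recursion (with \(a_{k}=0\) for \(k>m\)) and compares the result with a direct convolution (the function name is ours).

```python
import math
import numpy as np

def villareal_square(a):
    """Coefficients P_i of P(y)^2 for P(y) = sum_i a[i] y^i of degree m = len(a)-1."""
    m = len(a) - 1
    P = [a[0] ** 2]
    for i in range(1, 2 * m + 1):
        P.append(sum(P[j] * (a[i - j] / a[0]) * (2 * i - 3 * j) / i
                     for j in range(max(0, i - m), i)))   # a_{i-j} = 0 when i-j > m
    return P

d = [1.0, 0.5, 0.75]                                # d_0, d_1, d_2 for the Chebyshev case
a = [d[k] / math.factorial(k) for k in range(3)]    # 1, 1/2, 3/8
print(villareal_square(a))                          # [1.0, 1.0, 1.0, 0.375, 0.140625]
print(np.convolve(a, a))                            # the same coefficients, directly
```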

Note that we can always ensure that \(X_n A = A X_n\); for this, it is enough to choose \(X_0=I_{r\times r}\), as Guo does in [9].

Now, we establish the semilocal convergence of the Chebyshev method in the following theorem.

Theorem 2.1

Let \(A\in \mathbb {C}^{r \times r}\) and \(X_{0}\in \Theta \) such that \(AX_0=X_0A\) and \(\Vert R(X_0)\Vert =\Vert I-A^{-1}X_0^2\Vert <1\). Then, the Chebyshev method defined in (3) converges to \(A^{1/2}\). Moreover, \(\Vert R(X_n)\Vert \le \Vert R(X_0)\Vert ^{3^n}\), \(n\in \mathbb {N}\).

Proof

From (4), we have

$$\begin{aligned} R(X_1)= & {} I-\sum _{i=0}^{4}P_iR(X_0)^i+\sum _{i=0}^{4}P_{i}R(X_0)^{i+1} \\= & {} \sum _{i=0}^{3}(P_{i}-P_{i+1})R(X_0)^{i+1}+P_{4}R(X_0)^{5} \\= & {} (1-P_3)R(X_0)^3+ (P_3-P_{4})R(X_0)^4+P_{4}R(X_0)^{5} \end{aligned}$$

and, as \(\Vert R(X_0)\Vert <1\), \(P_3-P_{4}\ge 0\) and \(P_{4}\ge 0\), it follows

$$\begin{aligned} \Vert R(X_1)\Vert\le & {} (1-P_3)\Vert R(X_0)\Vert ^3+(P_3-P_{4})\Vert R(X_0)\Vert ^4+P_{4}\Vert R(X_0)\Vert ^{5} \\< & {} (1-P_3)\Vert R(X_0)\Vert ^3+(P_3-P_{4})\Vert R(X_0)\Vert ^3+P_{4}\Vert R(X_0)\Vert ^3 \\= & {} \Vert R(X_0)\Vert ^3. \end{aligned}$$

Now, taking into account \(\Vert R(X_1)\Vert <\Vert R(X_0)\Vert ^3<1\), we prove, by mathematical induction on n, that \(\Vert R(X_n)\Vert <\Vert R(X_{n-1})\Vert ^3\), for \(n\in \mathbb {N}\). As a consequence,

$$\begin{aligned} \Vert R(X_n)\Vert <\Vert R(X_{n-1})\Vert ^3<\Vert R(X_{n-2})\Vert ^{3^2}<\cdots <\Vert R(X_0)\Vert ^{3^n}. \end{aligned}$$

Since \(\Vert R(X_0)\Vert <1\), it follows that \(\lim _{n}\Vert I-A^{-1}X_n^2\Vert =0\) and \(A^{1/2}=\lim _{n}X_n\). \(\square \)

From Theorem 2.1, we also deduce that the Chebyshev method is of R-order at least three.

Example We compare the behavior of the Chebyshev and Halley methods for solving the following matrix differential equation

$$\begin{aligned} y''(t) + A y(t)=0 \end{aligned}$$

with the initial conditions \(y(0)=\alpha \) and \(y'(0)=\beta \). Note that the Halley method is the method usually proposed to solve this type of problem [9, 17].

The exact solution of this problem can be expressed as

$$\begin{aligned} y(t) = \alpha \cos \left( A^{1/2} t\right) + \beta A^{-1/2} \sin \left( A^{1/2} t\right) , \end{aligned}$$

so that we need to compute both \(A^{1/2} \) and \(A^{-1/2}\) at the same time. We then consider the matrix

$$\begin{aligned} A = \left( \begin{array}{cccccc} 1-2\lambda &{}\quad \lambda &{} &{} &{} &{} \\ \lambda &{}\quad 1-2\lambda &{}\quad \lambda &{} &{}\quad 0 &{} \\ &{}\quad \lambda &{}\quad 1-2\lambda &{}\quad \lambda &{} &{} \\ &{} &{}\quad \ddots &{}\quad \ddots &{}\quad \ddots &{} \\ &{} 0 &{} &{}\quad \lambda &{}\quad 1-2\lambda &{}\quad \lambda \\ &{} &{} &{} &{}\quad \lambda &{}\quad 1-2\lambda \end{array} \right) \end{aligned}$$

with \(\lambda =1/16\). This matrix is related to the finite difference discretization of second-order derivatives in space.
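For completeness, a short sketch of this test problem follows: it builds the tridiagonal matrix A for an illustrative dimension and evaluates the exact solution y(t) through a reference square root (here taken from scipy.linalg; any of the iterations above could be used instead). The dimension, t, \(\alpha \) and \(\beta \) are arbitrary choices of ours.

```python
import numpy as np
from scipy.linalg import sqrtm, cosm, sinm

r, lam = 10, 1.0 / 16.0                                  # illustrative dimension
A = ((1 - 2 * lam) * np.eye(r)
     + lam * np.eye(r, k=1) + lam * np.eye(r, k=-1))     # tridiagonal matrix above

alpha, beta, t = np.ones(r), np.zeros(r), 1.0            # initial data and time
S = sqrtm(A)                                             # reference A^{1/2}

# y(t) = alpha*cos(A^{1/2} t) + beta*A^{-1/2} sin(A^{1/2} t)
y = cosm(S * t) @ alpha + np.linalg.solve(S, sinm(S * t) @ beta)
print(y[:4])
```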

In Table 1 we observe the improvements of the Chebyshev method when the dimension of the matrix A increases. The exact square roots have been computed previously using Newton's method, and the errors at each iteration are then measured in the 2-norm. The CPU time is computed with the cputime command of MATLAB and is an average over 100 computations.

Table 1 Error and CPU time comparisons of Chebyshev and Halley methods after three iterations for the matrix differential problem with initial guess \(\text {diag}(A)\)

Motivated by this behavior of the methods, we turn our attention to constructing a generalization of the Chebyshev method with any prefixed order of convergence.

3 A new family of iterative methods

We are interested in generalizing the previous situations and constructing iterative methods with any prefixed order of convergence. For this, we first observe that the Newton and the Chebyshev methods can be obtained from inverse interpolation: if the inverse operator \(\Phi \) of \(\mathcal {F}\) exists, then the solution of the equation \(\mathcal {F}(X)=0\) is \(\Phi (0)\). So, by considering different approximations of \(\Phi (0)\), we can construct different iterative methods. For example, in the setting above, if \(\mathcal {F}(X)=X^{-2}-A^{-1}=W\in \mathbb {C}^{r\times r}\) and the necessary conditions are satisfied, we can consider \(\Phi (W)=(W+A^{-1})^{- \frac{1}{2}}=X\), so that \(\Phi (0)=A^{1/2}\) is a solution of \(\mathcal {F}(X)=X^{-2}-A^{-1}=0\).

In this case, given \(X_0\in \Theta \), we set \(W_0=X_0^{-2}-A^{-1}\). Next, taking into account

$$\begin{aligned} \Phi '(W)W= & {} \lim _{h\rightarrow 0}\frac{1}{h}\left( \Phi (W+hW)-\Phi (W) \right) \\= & {} \lim _{h\rightarrow 0}\frac{1}{h} \left( \left( W+A^{-1}\right) ^{-\frac{1}{2}} \left( I+hW(W+A^{-1})^{-1}\right) ^{-\frac{1}{2}} -\left( W+A^{-1}\right) ^{-\frac{1}{2}}\right) \\= & {} -\frac{1}{2} (W+A^{-1})^{-\frac{1}{2}} W (W+A^{-1})^{-1}, \end{aligned}$$

we obtain

$$\begin{aligned} \left( W_0+A^{-1}\right) ^{-\frac{1}{2}} W_0 (W_0+A^{-1})^{-1} = X_0(I-A^{-1}X_0^2). \end{aligned}$$

Then, if we consider two terms of the Taylor formula for approximating \(\Phi (0)\),

$$\begin{aligned} \Phi (0)\approx \Phi (W_0)-\Phi '(W_0)W_0 = X_0 \left( I + \frac{1}{2} \left( I-A^{-1}X_0^2\right) \right) , \end{aligned}$$

a new approximation, \(X_1=X_0 \left( I+\frac{1}{2}\left( I-A^{-1}X_0^2\right) \right) \), of \(A^{1/2}=\Phi (0)\) is obtained. Reiterating this procedure, we obtain another version of the Newton method,

$$\begin{aligned} X_0\in \Theta ,\quad X_{n+1} = X_n \left( I + \frac{1}{2} \left( I-A^{-1}X_n^2\right) \right) ,\quad n\ge 0, \end{aligned}$$

for computing the matrix square root.

If we now take into account three terms of the Taylor formula for approximating \(\Phi (0)\),

$$\begin{aligned} \Phi (0)\approx \Phi (W_0)-\Phi '(W_0)W_0+\frac{1}{2}\Phi ''(W_0)W_0^2, \end{aligned}$$

where

$$\begin{aligned} \Phi ''(W) W^{2}= & {} \lim _{h\rightarrow 0}\frac{1}{h}\left( \Phi '(W+hW)W-\Phi '(W)W\right) \\= & {} \frac{3}{4} (W+A^{-1})^{-\frac{1}{2}} \left( W (W+A^{-1})^{-1}\right) ^{2}, \end{aligned}$$

and proceed as before, we obtain the version of the Chebyshev method given in (3).

We can now generalize the above, so that \(\Phi (0)=A^{1/2}\) is approximated by \(m+1\) terms of the Taylor formula for \(\Phi (0)\),

$$\begin{aligned} \Phi (0)\approx \sum _{k=0}^{m}\frac{(-1)^k}{k!}\,\Phi ^{(k)}(W_0)W_0^k. \end{aligned}$$

Before doing so, we need the following result.

Theorem 3.1

Let \(\Phi :\mathbb {C}^{r \times r}\longrightarrow \Theta \) be such that \(\Phi (W)=\left( W+A^{-1}\right) ^{-\frac{1}{2}}\). Then,

$$\begin{aligned} \Phi ^{(k)}(W)W^k = (-1)^{k} d_{k} (W+A^{-1})^{-\frac{1}{2}} \left( W (W+A^{-1})^{-1}\right) ^{k},\quad k\ge 2, \end{aligned}$$
(5)

where \(d_{k}=\prod _{i=0}^{k-1}\left( \frac{1}{2}+i\right) \).

Proof

Having obtained the expressions for the derivatives \(\Phi '(W)W\) and \(\Phi ''(W)W^{2}\), we now assume that

$$\begin{aligned} \Phi ^{(k-1)}(W)W^{k-1} = (-1)^{k-1} d_{k-1} (W+A^{-1})^{-\frac{1}{2}} \left( W (W+A^{-1})^{-1}\right) ^{k-1}, \end{aligned}$$

and it follows, proceeding in the same way and taking into account the next two expansions, obtained from Newton's generalized binomial theorem,

$$\begin{aligned} \left( W+hW+A^{-1}\right) ^{-\frac{1}{2}}= & {} \left( W+A^{-1}\right) ^{-\frac{1}{2}}\left( I-\dfrac{h}{2}W\left( W+A^{-1}\right) ^{-1}\right) + \mathcal {O}(h^{2}),\\ \left( W\left( W+hW+A^{-1}\right) ^{-1}\right) ^{k-1}= & {} \left( W\left( W+A^{-1}\right) ^{-1}\right) ^{k-1}\\&-\, (k-1)h\left( W\left( W+A^{-1}\right) ^{-1}\right) ^{k} + \mathcal {O}(h^{2}), \end{aligned}$$

that

$$\begin{aligned} \Phi ^{(k)}(W)W^k= & {} \lim _{h\longrightarrow 0}\frac{1}{h}\left( \Phi ^{(k-1)}(W+hW)W^{k-1}-\Phi ^{(k-1)}(W)W^{k-1}\right) \\= & {} (-1)^{k} d_{k} (W+A^{-1})^{-\frac{1}{2}} \left( W (W+A^{-1})^{-1}\right) ^{k}, \end{aligned}$$

so that (5) is true for all positive integers k by mathematical induction. \(\square \)

As a consequence of Theorem 3.1, given \(X_{0}\in \Theta \) and \(W_{0}=X_{0}^{-2}-A^{-1}\), and following the arguments preceding the theorem, we have

$$\begin{aligned} A^{1/2} = \Phi (0) \approx X_{0}\sum _{k=0}^{m}\frac{d_{k}}{k!} \left( W_0\left( W_0+A^{-1}\right) ^{-1}\right) ^k = X_{0}\sum _{k=0}^{m}\frac{d_{k}}{k!} \left( I-A^{-1}X_{0}^2\right) ^k, \end{aligned}$$

where \(d_{k}=\prod _{i=0}^{k-1}\left( \frac{1}{2}+i\right) \). If we set \(X_{1}=X_{0}\sum _{k=0}^{m}\frac{d_{k}}{k!}\left( I-A^{-1}X_0^2\right) ^k\) and reiterate the procedure, we obtain the following general method for approximating \(\Phi (0)=A^{1/2}\):

$$\begin{aligned} X_0\in \Theta ,\quad X_{n+1} = \displaystyle {X_n\sum _{k=0}^{m}\frac{d_{k}}{k!} \left( I-A^{-1}X_n^2\right) ^k},\quad n\ge 0. \end{aligned}$$
(6)
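A sketch of the whole family (6) follows; the coefficients \(d_k/k!\) are generated directly from their definition and the matrix polynomial is evaluated by a Horner-type scheme (the function names, tolerance and test matrix are ours; \(m=1\) reproduces the Newton version and \(m=2\) the Chebyshev method (3)).

```python
import math
import numpy as np

def taylor_coeffs(m):
    """c_k = d_k / k! with d_0 = 1 and d_k = prod_{i=0}^{k-1} (1/2 + i)."""
    d, c = 1.0, [1.0]
    for k in range(1, m + 1):
        d *= 0.5 + (k - 1)
        c.append(d / math.factorial(k))
    return c

def family_sqrt(A, m, tol=1e-12, max_iter=50):
    """General iteration (6): X_{n+1} = X_n * sum_{k=0}^m c_k (I - A^{-1} X_n^2)^k."""
    r = A.shape[0]
    I = np.eye(r)
    Ainv = np.linalg.inv(A)
    c = taylor_coeffs(m)
    X = I.copy()                              # X_0 = I ensures X_n A = A X_n
    for _ in range(max_iter):
        R = I - Ainv @ X @ X
        S = c[m] * I
        for k in range(m - 1, -1, -1):        # Horner evaluation of the polynomial in R
            S = S @ R + c[k] * I
        X = X @ S
        if np.linalg.norm(R) < tol:
            break
    return X

A = np.array([[4.0, 1.0], [1.0, 3.0]])
for m in (1, 2, 3, 4):
    X = family_sqrt(A, m)
    print(m, np.linalg.norm(X @ X - A))
```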

Note again that, by choosing \(X_0=I_{r\times r}\), we can obtain \(X_n A = A X_n\) (see [9]).

In the next section, we establish the semilocal convergence of the general method given in (6).

4 Semilocal convergence

Several techniques are usually considered to study the convergence of iterative methods, as we can see in [6–8, 11]. Among these, the two most common are based on the majorant principle and on recurrence relations. However, in this paper, we use a simpler technique.

If \(R(X_{n})=I-A^{-1}X_n^2\), then

$$\begin{aligned} R(X_n) = I-A^{-1}X_{n}^2 = I-(I-R(X_{n-1}))\left( \sum _{k=0}^{m}\frac{d_{k}}{k!}R(X_{n-1})^k\right) ^2 \end{aligned}$$

and, taking into account Villareal’s formula, it follows

$$\begin{aligned} R(X_n) = \sum _{i=1}^{2m}\left( P_{i-1}-P_{i}\right) R(X_{n-1})^{i}+P_{2m}R(X_{n-1})^{2m+1}, \end{aligned}$$

where \(\displaystyle {P_{i}=\sum \nolimits _{j=\max \{0,\,i-m\}}^{i-1}P_{j}\,\frac{d_{i-j}}{(i-j)!}\,\frac{2i-3j}{i}}\) and \(P_0=1\); the terms with \(i-j>m\) are omitted because the polynomial being squared has degree m.

Moreover, since \(\sum _{k\ge 0}\frac{d_{k}}{k!}\,y^{k}=(1-y)^{-1/2}\) and hence \(\left( \sum _{k\ge 0}\frac{d_{k}}{k!}\,y^{k}\right) ^2=\sum _{i\ge 0}y^{i}\), while the terms with \(k>m\) do not affect the coefficients of degree at most m, we have the following property:

$$\begin{aligned} P_{i} = 1, \quad i=0,1,\ldots ,m. \end{aligned}$$

Now, we are ready to establish the semilocal convergence of the general method given in (6). Notice that \(\Vert \cdot \Vert \) denotes a submultiplicative matrix norm.

Theorem 4.1

Let \(A\in \mathbb {C}^{r \times r}\) and \(X_{0}\in \Theta \) be such that \(AX_0=X_0A\) and \(\Vert R(X_0)\Vert =\Vert I-A^{-1}X_0^2\Vert <1\). If \(\{P_{i}\}_{i=m+1}^{2m}\) is a nonincreasing sequence of nonnegative numbers, then the general method given in (6) converges to \(A^{1/2}\) for each fixed \(m\in \mathbb {N}\). Moreover, \(\Vert R(X_n)\Vert \le \Vert R(X_0)\Vert ^{(m+1)^n}\), \(n\in \mathbb {N}\).

Proof

Proceeding as in the proof of Theorem 2.1, we have

$$\begin{aligned} R(X_1)= & {} I-A^{-1}\left( X_0\sum _{k=0}^{m}\frac{d_{k}}{k!}\,R(X_0)^k\right) ^2 \\= & {} I-(I-R(X_0))\left( \sum _{k=0}^{m}\frac{d_{k}}{k!}\,R(X_0)^k\right) ^2 \\= & {} I-(I-R(X_0)) \sum _{k=0}^{2m}P_k\,R(X_0)^k \\= & {} (1-P_{m+1})R(X_0)^{m+1}+\sum _{k=m+2}^{2m}(P_{k-1}-P_k)R(X_0)^{k}+P_{2m}R(X_0)^{2m+1}, \end{aligned}$$
$$\begin{aligned} \Vert R(X_1)\Vert\le & {} |1-P_{m+1}|\Vert R(X_0)\Vert ^{m+1}+\sum _{k=m+2}^{2m}|P_{k-1}-P_k|\Vert R(X_0)\Vert ^k+|P_{2m}|\Vert R(X_0)\Vert ^{2m+1} \\= & {} (1-P_{m+1})\Vert R(X_0)\Vert ^{m+1}+\sum _{k=m+2}^{2m}(P_{k-1}-P_k)\Vert R(X_0)\Vert ^k+P_{2m}\Vert R(X_0)\Vert ^{2m+1} \\< & {} \left( 1-P_{m+1}+\sum _{k=m+2}^{2m} (P_{k-1}-P_k) +P_{2m}\right) \Vert R(X_0)\Vert ^{m+1} \\= & {} \Vert R(X_0)\Vert ^{m+1}, \end{aligned}$$

so that \(\Vert R(X_1)\Vert <1\), since \(\Vert R(X_0)\Vert <1\).

Iterating this procedure, it is easy to prove

$$\begin{aligned} \Vert R(X_n)\Vert < \Vert R(X_{n-1})\Vert ^{m+1},\quad n\in \mathbb {N}, \end{aligned}$$

and, consequently,

$$\begin{aligned} \Vert R(X_n)\Vert < \Vert R(X_{n-1})\Vert ^{m+1} < \Vert R(X_{n-2})\Vert ^{(m+1)^2} <\cdots < \Vert R(X_0)\Vert ^{(m+1)^n}. \end{aligned}$$

As \(\Vert R(X_0)\Vert <1\), it follows that \(\lim _{n}\Vert I-A^{-1}X_n^2\Vert =0\) and \(A^{1/2}=\lim _{n}X_n\). \(\square \)

From Theorem 4.1, we also deduce that the general method given in (6) is of R-order at least \(m+1\).

Finally, experiments suggest that \(\{P_{i}\}_{i=m+1}^{2m}\) is a nonincreasing sequence of nonnegative numbers. We have checked this property of the sequence \(\{P_{i}\}_{i=m+1}^{2m}\) numerically for \(m=2,3,\ldots ,100\); a short numerical check is sketched after Table 2. In Table 2, we can see this sequence for \(m=2,3,\ldots ,10\). On the other hand, if these properties do not hold in a particular case, we should require, following the ideas of [9] and [17], the existence of a positive real number \(q<1\) such that

$$\begin{aligned} \sum _{i=m}^{2m-1} | P_{i}-P_{i+1} | q^{i-m} + |P_{2m}| q^{m} = 1. \end{aligned}$$
Table 2 Sequence \(\{P_{i}\}_{i=m+1}^{2m}\) for \(m=2,3,\ldots ,10\)
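The numerical check mentioned above can be reproduced with the short script below, which uses the recursion of this section (with the convention \(a_{k}=0\) for \(k>m\)) to verify that \(P_i=1\) for \(i\le m\) and that the tail \(\{P_{i}\}_{i=m+1}^{2m}\) is nonnegative and nonincreasing; the helper name and the tolerances are ours.

```python
import math

def tail_P(m):
    """P_{m+1}, ..., P_{2m} for the square of the truncated series of degree m."""
    d, a = 1.0, [1.0]
    for k in range(1, m + 1):
        d *= 0.5 + (k - 1)
        a.append(d / math.factorial(k))                  # a_k = d_k / k!
    P = [1.0]
    for i in range(1, 2 * m + 1):
        P.append(sum(P[j] * a[i - j] * (2 * i - 3 * j) / i
                     for j in range(max(0, i - m), i)))  # a_{i-j} = 0 when i-j > m
    assert all(abs(P[i] - 1.0) < 1e-9 for i in range(m + 1))   # P_i = 1 for i <= m
    return P[m + 1:]

for m in range(2, 101):
    tail = tail_P(m)
    assert all(p >= -1e-12 for p in tail)                             # nonnegative
    assert all(tail[i] >= tail[i + 1] - 1e-12 for i in range(m - 1))  # nonincreasing
print(tail_P(2))   # [0.375, 0.140625], the values obtained in Sect. 2
```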

5 On the efficiency

In this section we analyse the efficiency of the introduced methods. The best method depends on the particular problem that we are interested in solving and on the way it is implemented.

If A is an \(r\times r\) matrix, then the computational cost of (6) is \(mr^{2}+(m+1)r^{3}\), its computational efficiency is \(CE(m,r)=(m+1)^{\frac{1}{mr^{2}+(m+1)r^{3}}}\) [18], and we can observe in Fig. 1 that the Chebyshev method (\(m=2\)) is the most efficient member of (6).

Fig. 1 \(CE(m,r)=(m+1)^{\frac{1}{mr^{2}+(m+1)r^{3}}}\) of (6) for \(r=15,20,25\)
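The comparison in Fig. 1 can be reproduced numerically; the following sketch evaluates \(CE(m,r)\) for \(m=1,\ldots ,10\) and the values of r used in the figure, and reports the most efficient order.

```python
def CE(m, r):
    """Computational efficiency (m+1)^{1/(m r^2 + (m+1) r^3)} of method (6)."""
    return (m + 1) ** (1.0 / (m * r**2 + (m + 1) * r**3))

for r in (15, 20, 25):
    best_m = max(range(1, 11), key=lambda m: CE(m, r))
    print(f"r = {r}: most efficient m = {best_m}")   # m = 2, i.e. the Chebyshev method
```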

For a problem of moderate size, or a problem where the matrix A is sparse (in this case r will be related to the number of nonzero elements and not to the size of the matrix A), the Chebyshev method is the best option, since

$$\begin{aligned} 3^{\frac{1}{ 2 r^{2}+ 3 r^{3}}} \ge CE(m,r), \quad \text {for all}\quad m, r. \end{aligned}$$

This is the case in the example introduced above, where our scheme improves the numerical behavior of the Halley method.

However, the computational cost of the methods for a dense and large matrix A is prohibitive. In these cases, some manipulations are necessary in order to reduce it. An example is given by multiresolution representations of data, such as wavelet decompositions. These algorithms are useful tools for data compression [10].

Given a finite sequence \(f^L\), which represents a sampling of weighted averages of a function f(x) at the finest resolution level L, multiresolution algorithms connect it with its multiscale representation \(\left\{ f^0,d^1,d^2,\ldots ,d^L\right\} \), where \(f^0\) corresponds to the sampling at the coarsest resolution level and each sequence \(d^k\) represents the intermediate details which are necessary to recover \(f^k\) from \(f^{k-1}\). These algorithms can be viewed as a simple change of basis. We denote this change by the matrix M, that is,

$$\begin{aligned} M f^{L} = \left\{ f^0,d^1,d^2,\ldots ,d^L\right\} . \end{aligned}$$

With these algorithms, a discrete sequence \(f^L\) is encoded to produce a multiscale representation of its information contents, \(\left( f^0,d^1,d^2,\ldots ,d^L\right) \); this representation is then processed by truncation (see the sketch after this paragraph), \(\hat{f}^0_j=\mathbf{tr}\left( f^0_j,\epsilon _0\right) \), \(\hat{d}^k_j=\mathbf{tr}\left( d^k_j,\epsilon _k\right) \) (if the vector represents a piecewise smooth function, the number of significant coefficients will be \(O(\log J_L)\)), where

$$\begin{aligned} \hat{e}^k_{j} = \mathbf{tr}( {e}^{k}_{j};\epsilon _{k}):= \left\{ \begin{array}{ll} 0, &{}\quad \left| {e}^{k}_{j}\right| \le \epsilon _{k}, \\ {e}^{k}_{j}, &{}\quad \text{ otherwise }, \end{array} \right. \end{aligned}$$
(7)

and the end result is a modified multi-scale representation \(\left( \hat{f}^0,\hat{d}^1, \hat{d}^2,\ldots ,\hat{d}^L\right) \) which is close to the original one; i.e., such that (in norm)

$$\begin{aligned} \left\| \hat{f}^0 - f^0 \right\| \le \epsilon _0, \qquad \left\| \hat{d}^k - d^k \right\| \le \epsilon _k \quad 1 \le k \le L, \end{aligned}$$

where the truncation parameters \(\epsilon _0,\epsilon _1,\ldots ,\epsilon _L\) are chosen according to some criteria specified by the user. After decoding the processed representation, we obtain a discrete set \(\hat{f}^L\) which is expected to be close to the original discrete set \(f^L\). For this to be true, some form of stability is needed; i.e., we must require that

$$\begin{aligned} \left\| \hat{f}^L - f^L \right\| \le \sigma (\epsilon _0,\epsilon _1,\ldots , \epsilon _L), \end{aligned}$$

where \(\sigma (\cdot ,\ldots ,\cdot )\) satisfies

$$\begin{aligned} \lim _{\epsilon _l \rightarrow 0, \ 0\le l \le L} \sigma (\epsilon _0,\epsilon _1,\ldots ,\epsilon _L)=0. \end{aligned}$$
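A minimal sketch of the thresholding step (7) is the following; the function name tr and the sample data are ours.

```python
import numpy as np

def tr(e, eps):
    """Hard thresholding (7): a coefficient is kept only if its modulus exceeds eps."""
    e = np.asarray(e, dtype=float)
    return np.where(np.abs(e) <= eps, 0.0, e)

d = np.array([0.003, -0.4, 0.0007, 1.2])
print(tr(d, 0.01))   # [ 0.  -0.4  0.   1.2]
```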

Given the matrix A, its standard form, using the above multiresolution scheme, is defined by the matrix

$$\begin{aligned} A^s=M A M^{-1}. \end{aligned}$$
(8)

The standard form satisfies

$$\begin{aligned} M(A f)=A^s M(f) \quad \text {for every vector} \ f; \quad \left( A^2\right) ^s = A^s A^s. \end{aligned}$$

Both M and \(M^{-1}\) are continuous transformations. Thus, if an \(r\times r\) matrix A represents a piecewise smooth operator, each transformed row in \(AM^{-1}\) represents a piecewise smooth operator. Sparsity is obtained by truncation of each transformed column in \(M A M^{-1}\). The matrix \(A^s\) will have \(O(r \log (r))\) significant coefficients, \(O(\log (r))\) for each column. The transformation of the columns leaves the significant coefficients in the typical finger shape. On the other hand, the computational cost of this transform is \(O(r^2)\). We refer the reader to [1, 2, 4] for more details.
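The identities above are easy to check numerically. In the sketch below, M is taken to be a one-level orthonormal Haar transform, which is only an illustrative choice (the multiresolution transforms actually used are those of [1, 2, 4]); with it, the standard form (8) satisfies both properties stated above.

```python
import numpy as np

def haar_matrix(r):
    """One-level orthonormal Haar transform (an illustrative choice of M)."""
    H, s = np.zeros((r, r)), 1.0 / np.sqrt(2.0)
    for i in range(r // 2):
        H[i, 2 * i], H[i, 2 * i + 1] = s, s                       # averages -> coarse level
        H[r // 2 + i, 2 * i], H[r // 2 + i, 2 * i + 1] = s, -s    # details  -> d^L
    return H

r = 8
rng = np.random.default_rng(0)
A = rng.standard_normal((r, r))
f = rng.standard_normal(r)

M = haar_matrix(r)
Minv = M.T                        # M is orthogonal here, so M^{-1} = M^T
As = M @ A @ Minv                 # standard form (8)

print(np.allclose(M @ (A @ f), As @ (M @ f)))     # M(Af) = A^s M(f)
print(np.allclose(M @ (A @ A) @ Minv, As @ As))   # (A^2)^s = A^s A^s
```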

Table 3 Order of the method with the highest \(CE(m,r,C,D)\)

For simplicity, we assume that \(O(r \log (r)) \approx C r \log (r)\) and \(O(r^2)\approx D r^2\); then the computational cost of (6) is \(m C r \log (r)+(m+1)C r^{2}\log (r)+D r^2\) and its computational efficiency is \(CE(m,r,C,D)=(m+1)^{\frac{1}{m C r \log (r)+(m+1) C r^{2}\log (r)+D r^2}}\). We can observe in Table 3 that the best method depends on the sparsity of the operator A and on the computational cost of the multiresolution scheme, that is, on C and D, but not on r. The sparsity is related to the smoothness of A and to the size of the truncation parameters. For stable multiresolution algorithms, as we have mentioned, the approximation error will be bounded by a decreasing function of these parameters (see our previous works [1, 2] for more details).