Abstract
This paper is devoted to the approximation of the square root of a matrix. We present a family of iterative methods of arbitrary order of convergence that generalizes the classical Newton and Chebyshev methods. The semilocal convergence of the family is established. After an analysis of the efficiency, we conclude that the most efficient method depends on the matrix and on the way the method is implemented.
1 Introduction
In this study we are concerned with the problem of computing a square root of a given matrix A. A square root of an \(r\times r\) matrix A with complex elements, \(A\in \mathbb {C}^{r \times r}\), is a solution \(X\in \mathbb {C}^{r \times r}\) of the quadratic matrix equation \(X^{2}=A,\)
which is denoted by \(X=A^{1/2}\). Unlike the square root of a scalar, the square root of a matrix may not exist. The theory of the existence of matrix square roots is well presented in [15], and the number of matrix square roots is obtained in [13, 15].
One approach to computing a square root of a matrix is to apply iterative methods to the previous quadratic matrix equation. Given a function \(\mathcal {G}:\mathbb {C}^{r \times r}\longrightarrow \mathbb {C}^{r \times r}\), an iterative method for the solution of \(\mathcal {F}(X)=0\) is specified by an initial approximation \(X_0\) and the algorithm \(X_{n+1}=\mathcal {G}(X_{n})\), \(n\ge 0\), so that, for suitable initial values \(X_0\), the algorithm provides a sequence \(\{X_n\}\) that converges to \(A^{1/2}\).
Algorithms for computing the square root of a matrix are a subject of current research (see [14]) and continue to attract considerable attention. The most popular method for computing the square root of a matrix is Newton's method [12], which converges quadratically. In this paper, we investigate the convergence of a family of iterations of high order of convergence for computing the square root of a matrix, proposed in [3], which includes the Newton and Chebyshev methods. Since Newton's method has already been studied thoroughly by other authors for computing the square root of a matrix [12, 13, 16], we begin with the Chebyshev method to present the technique for proving its semilocal convergence, which we later use to prove the semilocal convergence of the whole family. This family is analysed in detail and applied to solve the above quadratic matrix equation. We show that the methods have good convergence properties and converge semilocally to \(A^{1/2}\). In addition, the theoretical analysis and numerical experiments show that the most efficient method depends on the matrix and on the way the method is implemented.
2 The Chebyshev method
First, we introduce the idea of the Chebyshev method and some corresponding results. It is well known that, for a given operator \(F:X\longrightarrow Y\) defined in a Banach space X and with values in a Banach space Y, the third-order Chebyshev method [5] for solving \(F(X)=0\) is: \(X_{n+1}=X_n-\left( I+\frac{1}{2}\,L_{F}(X_n)\right) [F'(X_n)]^{-1}F(X_n),\quad n\ge 0,\)
where \(L_{F}(X_n)=[F'(X_n)]^{-1}F''(X_n)[F'(X_n)]^{-1}F(X_n)\).
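For intuition, the following scalar sketch (our own illustration, not taken from the paper) applies this step to \(f(x)=x^{2}-a\), whose root is \(\sqrt{a}\); here \(L_f(x)=f''(x)f(x)/f'(x)^{2}\):

```python
# Scalar illustration: Chebyshev's method for f(x) = x^2 - a, root sqrt(a).
def chebyshev_sqrt(a, x0, tol=1e-15, max_iter=20):
    x = x0
    for _ in range(max_iter):
        f = x * x - a                 # f(x)
        fp = 2.0 * x                  # f'(x)
        L = (2.0 * f) / (fp * fp)     # L_f(x) = f''(x) f(x) / f'(x)^2, f'' = 2
        x_new = x - (1.0 + 0.5 * L) * f / fp   # Chebyshev step
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

print(chebyshev_sqrt(2.0, 1.0))  # ~ 1.4142135623730951
```

Starting from \(x_0=1\), three steps already give \(\sqrt{2}\) to about five digits, in line with the cubic order.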
If we particularize the Chebyshev method for the computation of the square root of a matrix, we then consider the space \(\Theta =\left\{ X\in \mathbb {C}^{r \times r}: X \ \text {nonsingular}\right\} \)
and approximate \(A^{1/2}\) for a given matrix \(A\in \Theta \). For this, we consider \(\mathcal {F}(X)=X^{2}-A,\)
so that \(A^{1/2}\) is the solution of the equation \(\mathcal {F}(X)=0\). So, the Chebyshev method is reduced (see [3]) to \(X_{n+1}=X_n-\frac{1}{2}\,X_{n}^{-1}\left( X_n^{2}-A\right) -\frac{1}{8}\,X_{n}^{-3}\left( X_n^{2}-A\right) ^{2},\quad n\ge 0. \qquad (1)\)
Observe that we need to calculate the inverse \(X_{n}^{-1}\) at each step of the Chebyshev method defined in (1). We are interested in obtaining another expression of the Chebyshev method in which the computation of these inverses is not necessary. For this, we now consider the equation \(\mathcal {F}(X)=X^{-2}-A^{-1}=0, \qquad (2)\)
where \(\mathcal {F}:\Theta \longrightarrow \mathbb {C}^{r \times r}\). In this case, \(A^{1/2}\) is also the solution of the equation \(\mathcal {F}(X)=0\).
From (2), we obtain
and taking into account that the Chebyshev method can also be written as
the algorithm of the Chebyshev method for solving \(\mathcal {F}(X)=0\), where \(\mathcal {F}:\Theta \longrightarrow \mathbb {C}^{r \times r}\), is then \(X_{n+1}=X_n\left( I+\frac{1}{2}\left( I-A^{-1}X_n^{2}\right) +\frac{3}{8}\left( I-A^{-1}X_n^{2}\right) ^{2}\right) ,\quad n\ge 0. \qquad (3)\)
Observe that we only need to calculate \(A^{-1}\) once, and not \(X_{n}^{-1}\) at each step, when applying algorithm (3) to solve \(\mathcal {F}(X)=0\), so that this algorithm is more efficient than algorithm (1).
After that, we analyse the convergence of the Chebyshev method. For this, we suppose \(AX_0=X_0A\) and consider the following residual of algorithm (3): \(R(X_{n})=I-A^{-1}X_n^{2}.\)
To prove the convergence of algorithm (3), we consider a submultiplicative matrix norm \(\Vert \cdot \Vert \) defined in \(\mathbb {C}^{r \times r}\) and prove that \(\{\Vert R(X_n)\Vert \}\) is a decreasing scalar sequence convergent to zero. First of all, from \(AX_0=X_0A\), it follows that \(AX_n=X_nA\), \(AX_n^2=X_n^2A\) and \(A^{-1}X_n^2=X_n^2A^{-1}\). As a consequence, we have
Next, taking into account Villareal’s formula,
where
it follows
with \(P_{0}=P_1=P_2=1\), \(P_3=0.375\) and \(P_4=0.1406\).
Note that we can always guarantee \(X_n A = A X_n\): it is enough to choose \(X_0=I_{r\times r}\), as Guo does in [9].
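The inversion-free iteration can be sketched in a few lines; the coefficients \(1\), \(1/2\), \(3/8\) below are the degree-2 Taylor coefficients of \((1-t)^{-1/2}\), and the test matrix is our own illustrative choice (it satisfies \(\Vert I-A^{-1}\Vert <1\), so \(X_0=I\) is admissible):

```python
import numpy as np

# Sketch of the inversion-free Chebyshev iteration:
#   X_{n+1} = X_n (I + 1/2 R_n + 3/8 R_n^2),  R_n = I - A^{-1} X_n^2,
# with X_0 = I so that A X_n = X_n A; A^{-1} is computed only once.
def chebyshev_sqrtm(A, n_iter=15):
    I = np.eye(A.shape[0])
    Ainv = np.linalg.inv(A)       # the only inverse needed
    X = I.copy()                  # X_0 = I guarantees commutation
    for _ in range(n_iter):
        R = I - Ainv @ X @ X
        X = X @ (I + 0.5 * R + 0.375 * R @ R)
    return X

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # illustrative SPD matrix
X = chebyshev_sqrtm(A)
print(np.allclose(X @ X, A))  # True
```

Each iterate is a polynomial in \(A\), so the commutation hypothesis is preserved automatically.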
Now, we establish the semilocal convergence of the Chebyshev method in the following theorem.
Theorem 2.1
Let \(A\in \mathbb {C}^{r \times r}\) and \(X_{0}\in \Theta \) such that \(AX_0=X_0A\) and \(\Vert R(X_0)\Vert =\Vert I-A^{-1}X_0^2\Vert <1\). Then, the Chebyshev method defined in (3) converges to \(A^{1/2}\). Moreover, \(\Vert R(X_n)\Vert \le \Vert R(X_0)\Vert ^{3^n}\), \(n\in \mathbb {N}\).
Proof
From (4), we have
and, as \(\Vert R(X_0)\Vert <1\), \(P_3-P_{4}\ge 0\) and \(P_{4}\ge 0\), it follows
Now, taking into account \(\Vert R(X_1)\Vert <\Vert R(X_0)\Vert ^3<1\), we prove, by mathematical induction on n, that \(\Vert R(X_n)\Vert <\Vert R(X_{n-1})\Vert ^3\) for \(n\in \mathbb {N}\). As a consequence,
Since \(\Vert R(X_0)\Vert <1\), then \(\lim _{n}\Vert I-A^{-1}X_n^2\Vert =0\) and \(A^{1/2}=\lim _{n}X_n\). \(\square \)
From Theorem 2.1, we also deduce that the Chebyshev method is of R-order at least three.
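The cubic bound of Theorem 2.1 is easy to observe numerically; the following script (our own check, with an arbitrary symmetric positive definite test matrix and \(X_0=I\)) verifies \(\Vert R(X_n)\Vert \le \Vert R(X_0)\Vert ^{3^n}\) in the 2-norm:

```python
import numpy as np

# Check of the bound ||R(X_n)|| <= ||R(X_0)||^(3^n) for the Chebyshev
# iteration X_{n+1} = X_n (I + 1/2 R_n + 3/8 R_n^2), R_n = I - A^{-1} X_n^2.
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # test matrix: ||I - A^{-1}||_2 < 1
I = np.eye(2)
Ainv = np.linalg.inv(A)

X = I.copy()                              # X_0 = I, so A X_n = X_n A
norms = []
for n in range(4):
    R = I - Ainv @ X @ X
    norms.append(np.linalg.norm(R, 2))    # ||R(X_n)||
    X = X @ (I + 0.5 * R + 0.375 * R @ R)

for n, rn in enumerate(norms):
    print(n, rn, norms[0] ** (3 ** n))    # residual vs. theoretical bound
```

For this matrix the observed residuals decrease noticeably faster than the bound, as the proof suggests.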
Example We compare the behavior of the Chebyshev and Halley methods for solving the following matrix differential equation
with the initial conditions \(y(0)=\alpha \) and \(y'(0)=\beta \). Note that the Halley method is the usual method proposed to solve this type of problem [9, 17].
The exact solution of this problem can be expressed as
so that we need to compute both \(A^{1/2} \) and \(A^{-1/2}\) at the same time. We then consider the matrix
with \(\lambda =1/16\). This matrix is related to the finite difference discretization of second-order derivatives in space.
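As background for computing \(A^{1/2}\) and \(A^{-1/2}\) simultaneously, we recall the classical Denman-Beavers coupled iteration (a standard alternative, not part of the family studied here); the test matrix below is our own illustration, not the matrix of the example:

```python
import numpy as np

# Denman-Beavers coupled iteration: Y_k -> A^{1/2} and Z_k -> A^{-1/2},
# quadratically convergent for matrices with no eigenvalues on R^-.
def denman_beavers(A, n_iter=20):
    Y = A.copy()
    Z = np.eye(A.shape[0])
    for _ in range(n_iter):
        # simultaneous update: both right-hand sides use the old Y, Z
        Y, Z = 0.5 * (Y + np.linalg.inv(Z)), 0.5 * (Z + np.linalg.inv(Y))
    return Y, Z

A = np.array([[4.0, 1.0], [1.0, 3.0]])    # illustrative SPD test matrix
Y, Z = denman_beavers(A)
print(np.allclose(Y @ Y, A), np.allclose(Y @ Z, np.eye(2)))  # True True
```

Unlike algorithm (3), this iteration needs a matrix inversion at every step, which is precisely the cost the inversion-free family avoids.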
In Table 1 we observe the improvement of the Chebyshev method as the dimension of the matrix A increases. We first computed the exact square roots using Newton's method and then the errors at each iteration using the 2-norm. The CPU time is computed with the cputime command of MATLAB and is an average over 100 runs.
Motivated by the good behaviour of these methods, we now turn to constructing a generalization of the Chebyshev method with any prefixed order of convergence.
3 A new family of iterative methods
We are interested in generalizing the previous situations and constructing iterative methods with any prefixed order of convergence. For this, we first observe that the Newton and Chebyshev methods can be obtained from inverse interpolation: if the inverse operator \(\Phi \) of \(\mathcal {F}\) exists, then the solution of the equation \(\mathcal {F}(X)=0\) is \(\Phi (0)\), so different approximations of \(\Phi (0)\) yield different iterative methods. For example, in the setting above, if \(\mathcal {F}(X)=X^{-2}-A^{-1}=W\in \mathbb {C}^{r\times r}\) and the necessary conditions are satisfied, we can consider \(\Phi (W)=(W+A^{-1})^{- \frac{1}{2}}=X\), so that \(\Phi (0)=A^{1/2}\) is a solution of \(\mathcal {F}(X)=X^{-2}-A^{-1}=0\).
In this case, given \(X_0\in \Theta \), we set \(W_0=X_0^{-2}-A^{-1}\). Next, taking into account
we obtain
Then, if we consider two terms of the Taylor formula for approximating \(\Phi (0)\),
a new approximation, \(X_1=X_0 \left( I+\frac{1}{2}\left( I-A^{-1}X_0^2\right) \right) \), of \(A^{1/2}=\Phi (0)\) is obtained. Iterating this procedure, we obtain another version of the Newton method,
for computing the matrix square root.
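This inversion-free Newton variant is straightforward to implement; the sketch below (our own, following the displayed update for \(X_1\), with an arbitrary test matrix satisfying \(\Vert I-A^{-1}X_0^2\Vert <1\)) uses \(X_0=I\), so only \(A^{-1}\) is needed:

```python
import numpy as np

# Sketch of the inversion-free Newton variant:
#   X_{n+1} = X_n (I + 1/2 (I - A^{-1} X_n^2)),  X_0 = I.
def newton_sqrtm(A, n_iter=20):
    I = np.eye(A.shape[0])
    Ainv = np.linalg.inv(A)
    X = I.copy()
    for _ in range(n_iter):
        X = X @ (I + 0.5 * (I - Ainv @ X @ X))
    return X

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # illustrative SPD matrix
X = newton_sqrtm(A)
print(np.allclose(X @ X, A))  # True
```

The quadratic order is visible in practice: the residual norm roughly squares from one iteration to the next.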
If we now take into account three terms of the Taylor formula for approximating \(\Phi (0)\),
where
and do the same as previously, we obtain the version of the Chebyshev method given in (3).
We can now generalize the above, so that \(\Phi (0)=A^{1/2}\) is approximated by \(m+1\) terms of the Taylor formula for \(\Phi (0)\),
First, we need the following result.
Theorem 3.1
Let \(\Phi :\mathbb {C}^{r \times r}\longrightarrow \Theta \) such that \(\Phi (W)=\left( W+A^{-1}\right) ^{-\frac{1}{2}}\). Then,
where \(d_{k}=\prod _{i=0}^{k-1}\left( \frac{1}{2}+i\right) \).
Proof
Once we have seen what the expressions for the derivatives \(\Phi '(W)W\) and \(\Phi ''(W)W^{2}\) are, we now assume
and it follows, in the same way, taking into account the next two approximations, obtained from Newton's generalized binomial,
that
so that (5) is true for all positive integers k by mathematical induction. \(\square \)
As a consequence of Theorem 3.1, given \(X_{0}\in \Theta \) and \(W_{0}=X_{0}^{-2}-A^{-1}\) and following the arguments previous to the theorem, we have
where \(d_{k}=\prod _{i=0}^{k-1}\left( \frac{1}{2}+i\right) \). If we set \(X_{1}=X_{0}\sum _{k=0}^{m}\frac{d_{k}}{k!}\left( I-A^{-1}X_0^2\right) ^k\) and iterate the procedure, we obtain, for approximating \(\Phi (0)=A^{1/2}\), the following general method:
Note again that, by choosing \(X_0=I_{r\times r}\), we can obtain \(X_n A = A X_n\) (see [9]).
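A minimal implementation of the general method, with the coefficients \(d_k/k!\) built by the recursion \(c_k=c_{k-1}(1/2+k-1)/k\) (they are the Taylor coefficients of \((1-t)^{-1/2}\)); the test matrix is our own illustrative choice:

```python
import numpy as np

# Sketch of the general method of R-order m+1:
#   X_{n+1} = X_n * sum_{k=0}^{m} (d_k/k!) R_n^k,  R_n = I - A^{-1} X_n^2,
# with d_k = prod_{i=0}^{k-1} (1/2 + i) and X_0 = I.
def family_sqrtm(A, m, n_iter=10):
    I = np.eye(A.shape[0])
    Ainv = np.linalg.inv(A)
    c = [1.0]                                  # c_k = d_k / k!
    for k in range(1, m + 1):
        c.append(c[-1] * (0.5 + (k - 1)) / k)
    X = I.copy()
    for _ in range(n_iter):
        R = I - Ainv @ X @ X
        S, P = c[0] * I, I
        for k in range(1, m + 1):              # Horner-like accumulation
            P = P @ R
            S = S + c[k] * P
        X = X @ S
    return X

A = np.array([[4.0, 1.0], [1.0, 3.0]])         # illustrative SPD matrix
for m in (1, 2, 3, 4):
    X = family_sqrtm(A, m)
    print(m, np.allclose(X @ X, A))            # True for each m
```

The cases \(m=1\) and \(m=2\) reproduce the Newton and Chebyshev variants above.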
In the next section, we establish the semilocal convergence of the general method given in (6).
4 Semilocal convergence
Several techniques are usually considered to study the convergence of iterative methods, as can be seen in [6–8, 11]. Among these, the two most common are based on the majorant principle and on recurrence relations. However, in this paper, we use a simpler technique.
If \(R(X_{n})=I-A^{-1}X_n^2\), then
and, taking into account Villareal’s formula, it follows
where \(\displaystyle {P_{i}=\sum \nolimits _{j=0}^{i-1}P_{j}\,\frac{d_{i-j}}{(i-j)!}\,\frac{2i-3j}{i}}\) and \(P_0=1\).
Moreover, we have the following property:
Now, we are ready to establish the semilocal convergence of the general method given in (6). Notice that \(\Vert \cdot \Vert \) denotes a submultiplicative matrix norm.
Theorem 4.1
Let \(A\in \mathbb {C}^{r \times r}\) and \(X_{0}\in \Theta \) such that \(AX_0=X_0A\) and \(\Vert R(X_0)\Vert =\Vert I-A^{-1}X_0^2\Vert <1\). If \(\{P_{i}\}_{i=m+1}^{2m}\) is a nonincreasing sequence such that \(P_{i}\) are nonnegative for all \(i=m+1,m+2,\ldots ,2m\), then the general method given in (6) converges to \(A^{1/2}\) for each fixed \(m\in \mathbb {N}\). Moreover, \(\Vert R(X_n)\Vert \le \Vert R(X_0)\Vert ^{(m+1)^n}\), \(n\in \mathbb {N}\).
Proof
From (4), we have
so that \(\Vert R(X_1)\Vert <1\), since \(\Vert R(X_0)\Vert <1\).
Iterating the procedure, it is easy to prove
and, consequently,
As \(\Vert R(X_0)\Vert <1\), then \(\lim _{n}\Vert I-A^{-1}X_n^2\Vert =0\) and \(A^{1/2}=\lim _{n}X_n\). \(\square \)
From Theorem 4.1, we also deduce that the general method given in (6) is of R-order at least \(m+1\).
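The hypothesis on \(\{P_i\}\) can be checked by computing the sequence directly from the recurrence stated above, taking \(d_k=0\) for \(k>m\) (the truncation implicit in the degree-m method); this small script, our own check, reproduces the Chebyshev values \(P_3=0.375\) and \(P_4=0.1406\) and tests the monotonicity:

```python
# Check of the sequence {P_i}: P_0 = 1 and
#   P_i = sum_{j=0}^{i-1} P_j * (d_{i-j}/(i-j)!) * (2i - 3j) / i,
# where d_k/k! are the Taylor coefficients of (1-t)^{-1/2} truncated
# at degree m (so d_k = 0 for k > m).
def P_sequence(m, length):
    c = [1.0]                                  # c_k = d_k / k!
    for k in range(1, m + 1):
        c.append(c[-1] * (0.5 + (k - 1)) / k)
    c += [0.0] * max(0, length - m)            # truncation: d_k = 0, k > m
    P = [1.0]
    for i in range(1, length + 1):
        P.append(sum(P[j] * c[i - j] * (2 * i - 3 * j) / i
                     for j in range(i)))
    return P

P = P_sequence(2, 5)                           # Chebyshev case, m = 2
print(P)   # includes P_3 = 0.375 and P_4 = 0.140625, as quoted in Sect. 2

# monotonicity claim: P_{m+1} >= ... >= P_{2m} >= 0
for m in range(1, 11):
    Q = P_sequence(m, 2 * m)
    tail = Q[m + 1: 2 * m + 1]
    assert all(x >= -1e-12 for x in tail)
    assert all(tail[k] >= tail[k + 1] - 1e-12 for k in range(len(tail) - 1))
```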
Finally, experiments suggest that \(\{P_{i}\}_{i=m+1}^{2m}\) is a nonincreasing sequence with \(P_{i}\) nonnegative for all \(i=m+1,m+2,\ldots ,2m\). We have checked this property of the sequence \(\{P_{i}\}_{i=m+1}^{2m}\) numerically for \(m=1,2,\ldots ,100\). In Table 2, we can see this sequence for \(m=1,2,\ldots ,10\). On the other hand, if this property does not hold in a particular case, we should consider, following the ideas of [9] and [17], the existence of a positive real number \(q<1\) such that
5 On the efficiency
In this section we analyse the efficiency of the methods introduced above. The best method depends on the particular problem to be solved and on the way it is implemented.
If A is an \(r\times r\) matrix, then the computational cost of (6) is \(mr^{2}+(m+1)r^{3}\), its computational efficiency is \(CE(m,r)=(m+1)^{\frac{1}{mr^{2}+(m+1)r^{3}}}\) [18], and we can observe in Figure 1 that the Chebyshev method is the most efficient member of (6).
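This comparison is easy to reproduce; since \(CE(m,r)\) is extremely close to 1, it is convenient to compare \(\log CE\) instead, which preserves the ordering (our own illustration):

```python
from math import log

# Compare CE(m, r) = (m+1)^(1/(m r^2 + (m+1) r^3)) through its logarithm:
#   log CE = log(m+1) / (m r^2 + (m+1) r^3).
def log_ce(m, r):
    return log(m + 1) / (m * r ** 2 + (m + 1) * r ** 3)

for r in (10, 100, 1000):
    best = max(range(1, 11), key=lambda m: log_ce(m, r))
    print(r, best)   # best = 2 in every case: the Chebyshev method
```

For large r the dominant factor is \(\ln (m+1)/(m+1)\), which is maximized at \(m=2\), in agreement with Figure 1.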
For a problem of moderate size, or a problem where the matrix A is sparse (in this case r is related to the number of nonzero elements and not to the size of the matrix A), the Chebyshev method is the best option, since
This is the case of the example introduced above where our scheme improves the numerical behavior of the Halley method.
However, the computational cost of the methods for a dense and large matrix A is prohibitive. In these cases, some manipulations may be necessary in order to reduce it. An example is given by multiresolution representations of data, such as wavelet decompositions. These algorithms are useful tools for data compression [10].
Given a finite sequence \(f^L\), which represents a sampling of weighted averages of a function f(x) at the finest resolution level L, multiresolution algorithms connect it with its multiscale representation \(\left\{ f^0,d^1,d^2,\ldots ,d^L\right\} \), where \(f^0\) corresponds to the sampling at the coarsest resolution level and each sequence \(d^k\) represents the intermediate details which are necessary to recover \(f^k\) from \(f^{k-1}\). These algorithms can be viewed as a simple change of basis. We denote this change by the matrix M, that is
With these algorithms, a discrete sequence \(f^L\) is encoded to produce a multiscale representation of its information contents, \(\left( f^0,d^1,d^2,\ldots ,d^L\right) \). This representation is then processed (if the vector represents a piecewise smooth function, then the number of significant coefficients will be \(O(\log J_L)\)) by truncation, \(\hat{f}^0_j=\mathbf{tr}\left( f^0_j,\epsilon _0\right) \), \(\hat{d}^k_j=\mathbf{tr}\left( d^k_j,\epsilon _k\right) \), where
and the end result is a modified multi-scale representation \(\left( \hat{f}^0,\hat{d}^1, \hat{d}^2,\ldots ,\hat{d}^L\right) \) which is close to the original one; i.e., such that (in norm)
where the truncation parameters \(\epsilon _0,\epsilon _1,\ldots ,\epsilon _L\) are chosen according to some criteria specified by the user. After decoding the processed representation, we obtain a discrete set \(\hat{f}^L\) which is expected to be close to the original discrete set \(f^L\). For this to be true, some form of stability is needed; i.e., we must require that
where \(\sigma (\cdot ,\ldots ,\cdot )\) satisfies
Given the matrix A, its standard form, using the above multiresolution scheme, is defined by the matrix
The standard form satisfies
Both M and \(M^{-1}\) are continuous transformations. Thus, if an \(r\times r\) matrix A represents a piecewise smooth operator, each transformed row in \(AM^{-1}\) represents a piecewise smooth operator. Sparsity is obtained by truncation of each transformed column in \(M A M^{-1}\). The matrix \(A^s\) will have \(O(r \log (r))\) significant coefficients, \(O(\log (r))\) per column. The transformation of the columns leaves the significant coefficients in the typical finger shape. On the other hand, the computational cost of this transform is \(O(r^2)\). We refer to [1, 2, 4] for more details.
For simplicity we assume that \(O(r \log (r)) \approx C r \log (r)\) and \(O(r^2)\approx D r^2\); then the computational cost of (6) is \(m C r \log (r)+(m+1)C r^{2}\log (r)+D r^2\) and its computational efficiency is \(CE(m,r,C,D)=(m+1)^{\frac{1}{m C r \log (r)+(m+1) C r^{2}\log (r)+D r^2}}\). We can observe in Table 3 that the best method depends on the sparsity of the operator A and on the computational cost of the multiresolution scheme, that is, on C and D, but not on r. The sparsity is related to the smoothness of A and to the size of the truncation parameters. For stable multiresolution algorithms, as we have mentioned, the approximation error is bounded by a decreasing function of these parameters (see our previous works [1, 2] for more details).
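The dependence on C and D can be explored with a short script (our own illustration; the values of C and D below are arbitrary, chosen only to show how the optimum shifts):

```python
from math import log

# CE(m, r, C, D) = (m+1)^(1 / (m C r log r + (m+1) C r^2 log r + D r^2)),
# with C, D the assumed constants hidden in O(r log r) and O(r^2).
def log_ce(m, r, C, D):
    cost = m * C * r * log(r) + (m + 1) * C * r ** 2 * log(r) + D * r ** 2
    return log(m + 1) / cost

def best_m(r, C, D, m_max=10):
    return max(range(1, m_max + 1), key=lambda m: log_ce(m, r, C, D))

print(best_m(100, C=1.0, D=0.0))   # 2: Chebyshev when truncation cost is negligible
print(best_m(100, C=1.0, D=1e6))   # > 2: higher order pays off when D dominates
```

When the fixed cost \(D r^2\) of the transform dominates, the denominators become nearly independent of m and a higher-order member of the family becomes preferable.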
References
Amat, S., Busquier, S., Gutiérrez, J.M.: An adaptive version of a fourth-order iterative method for quadratic equations. J. Comput. Appl. Math. 191(2), 259–268 (2006)
Amat, S., Busquier, S., Negra, M.: Adaptive approximation of nonlinear operators. Numer. Funct. Anal. Optim. 25(5–6), 397–405 (2004)
Amat, S., Ezquerro, J.A., Hernández-Verón, M.A.: On a new family of high-order iterative methods for the matrix \(p\)th root. Numer. Linear Algebra Appl. (2015). doi:10.1002/nla.1974
Aràndiga, F., Candela, V.F.: Multiresolution standard form of a matrix. SIAM J. Numer. Anal. 33, 417–434 (1996)
Argyros, I.K., Chen, D.: Results on the Chebyshev method in Banach spaces. Proyecciones 12(2), 119–128 (1993)
Argyros, I.K.: The convergence of a Halley-Chebyshev-type method under Newton–Kantorovich hypotheses. Appl. Math. Lett. 6(5), 71–74 (1993)
Argyros, I.K.: A convergence analysis of Newton’s method under the gamma-condition in Banach spaces. Appl. Math. 36(2), 225–239 (2009)
Ezquerro, J.A., Hernández, M.A.: New Kantorovich-type conditions for Halley’s method. Appl. Numer. Anal. Comput. Math. 2(1), 70–77 (2005)
Guo, C.-H.: On Newton’s method and Halley’s method for the principal \(p\)th root of a matrix. Linear Algebra Appl. 432(8), 1905–1922 (2010)
Harten, A.: Multiresolution representation of data II. SIAM J. Numer. Anal. 33(3), 1205–1256 (1996)
Hernández, M.A., Salanova, M.A.: Modification of the Kantorovich assumptions for semilocal convergence of the Chebyshev method. J. Comput. Appl. Math. 126(1–2), 131–143 (2000)
Higham, N.J.: Newton’s method for the matrix square root. Math. Comp. 46, 537–549 (1986)
Higham, N.J.: Computing real square roots of a real matrix. Linear Algebra Appl. 88–89, 405–430 (1987)
Higham, N.J.: Functions of matrices: theory and computation. SIAM, Philadelphia (2008)
Horn, R.A., Johnson, C.R.: Topics in matrix analysis. Cambridge University Press, Cambridge (1991)
Iannazzo, B.: A note on computing the matrix square root. Calcolo 40, 273–283 (2003)
Lin, M.: A residual recurrence for Halley’s method for the matrix \(p\)th root. Linear Algebra Appl. 432(11), 2928–2930 (2010)
Traub, J.F.: Iterative methods for the solution of equations. Prentice-Hall, Englewood Cliffs, New Jersey (1964)
Acknowledgments
This work has been partially supported by the project MTM2014-52016-C2-1-P of Spanish Ministry of Economy and Competitiveness.
Amat, S., Ezquerro, J.A. & Hernández-Verón, M.A. Iterative methods for computing the matrix square root. SeMA 70, 11–21 (2015). https://doi.org/10.1007/s40324-015-0038-9