A note on the convergence theorem of the tridiagonal QR algorithm with Wilkinson’s shift

Aishima, Kensuke

doi:10.1007/s13160-015-0171-y

A note on the convergence theorem of the tridiagonal QR algorithm with Wilkinson’s shift

Original Paper
Area 2
Published: 12 June 2015

Volume 32, pages 465–487, (2015)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Japan Journal of Industrial and Applied Mathematics Aims and scope Submit manuscript

A note on the convergence theorem of the tridiagonal QR algorithm with Wilkinson’s shift

Download PDF

Kensuke Aishima¹

197 Accesses
Explore all metrics

Abstract

We discuss the convergence rate of the QR algorithm with Wilkinson’s shift for tridiagonal symmetric eigenvalue problems. It is well known that the convergence rate is theoretically at least quadratic, and practically better than cubic for most matrices. In an effort to derive the convergence rate, the limiting patterns of some lower right submatrices have been intensively investigated. In this paper, we first describe a new limiting pattern of the lower right 3-by-3 submatrix with a concrete example, and then prove that the convergence rate of this new pattern is strictly cubic. In addition, we stress that our analysis identifies three classes of the limiting patterns of the tridiagonal QR algorithm with Wilkinson’s shift.

Global Convergence of Hessenberg Shifted QR I: Exact Arithmetic

Article 13 August 2024

Global convergence of the restarted Lanczos and Jacobi–Davidson methods for symmetric eigenvalue problems

Article 09 January 2015

Convergence Acceleration of Shifted LR Transformations for Totally Nonnegative Hessenberg Matrices

Article 07 September 2020

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

The standard method for computing the eigenvalues of a real symmetric matrix A consists of two phases. First, A is transformed to a tridiagonal matrix T by an appropriate orthogonal similarity transformation. Then some iterative method is applied to T to compute its eigenvalues. There are several approaches in the second phase. Among them, the historical QR algorithm is still widely used as a reliable tool, particularly with Wilkinson’s shift for accelerating the convergence. In this paper we consider the convergence behavior of this algorithm.

Hence, we write a symmetric irreducible tridiagonal matrix as

$$\begin{aligned} T = \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} \alpha _{1} &{} \beta _{1} &{} &{} \\ \beta _{1} &{} \alpha _{2} &{} \ddots &{} \\ &{} \ddots &{} \ddots &{} \beta _{m-1} \\ &{} &{} \beta _{m-1} &{} \alpha _{m} \end{array} \right) . \end{aligned}$$

(1)

The eigenvalues of T are all distinct [7], which are denoted as $\lambda _1>\cdots >\lambda _m$ here. The shifted QR algorithm is described as

$$\begin{aligned} T^{(n)}-s^{(n)}I= & {} Q^{(n)}R^{(n)}, \end{aligned}$$

(2)

$$\begin{aligned} T^{(n+1)}= & {} R^{(n)}Q^{(n)}+s^{(n)}I \end{aligned}$$

(3)

for $n=0,1,\ldots $, where $Q^{(n)}$ is orthogonal, $R^{(n)}$ is upper triangular with nonnegative diagonal elements, and $s^{(n)}$ is a shift. Similarly to (1), tridiagonal elements of $T^{(n)}$ are denoted by

$$\begin{aligned} T^{(n)} = \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} \alpha _{1}^{(n)}&{} \beta _{1}^{(n)} &{} &{} \\ \beta _{1}^{(n)}&{} \alpha _{2}^{(n)} &{} \ddots &{} \\ &{} \ddots &{} \ddots &{} \beta _{m-1}^{(n)} \\ &{} &{} \beta _{m-1}^{(n)} &{} \alpha _{m}^{(n)} \end{array} \right) . \end{aligned}$$

(4)

In (2), one way to determine an efficient shift $s^{(n)}$ is to consider the lower right 2-by-2 submatrix, and pick its eigenvalue closer to $\alpha _{m}^{(n)}$. This is the so called Wilkinson shift.

For this algorithm, there is a long history of convergence analysis. Global convergence (i.e. $\lim _{n\rightarrow \infty }\beta _{m-1}^{(n)}=0$ for any initial matrix) was first proved by Wilkinson in [10], and then another elegant proof was given by Hoffmann–Parlett [1]. Regarding the convergence rate, Wilkinson [10] theoretically proved that it is at least quadratic, after which Hoffmann–Parlett [1] showed that for most matrices convergence better than cubic is achieved. Today, the convergence scenarios are classified by the asymptotic behavior of the lower right elements as follows.

if $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-\alpha _{m}^{(n)})=0$, then quadratic convergence only;
if $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-\alpha _{m}^{(n)})=D\not =0$, then cubic convergence at least;

if, in addition, $\lim _{n\rightarrow \infty }\beta _{m-2}^{(n)}=0$, then the rate is better than cubic.

This view was not explicitly described in any literature, as far as the authors know. However, in actual fact, this view was first implied by Hoffmann–Parlett [1], and then followed by other researchers [7, 8, 11] (for more detail, see Sects. 2 and 3). In the present study, we focus on the situation $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-\alpha _{m}^{(n)})=D\not =0,\ \lim _{n\rightarrow \infty }\beta _{m-2}^{(n)}=C\not =0$ where the rate is at least cubic. This situation is relatively exceptional. In fact, concrete matrices that belong to this case have not been found in any literature, and most initial matrices result in the case $\lim _{n\rightarrow \infty }\beta _{m-2}^{(n)}=0$ where the rate is better than cubic. Mathematically strictly speaking, however, it is desired to determine if such an exceptional example actually exists and, if so, derive its exact convergence rate.

In this paper, we first point out an example where $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-\alpha _{m}^{(n)})=D\not =0,\ \lim _{n\rightarrow \infty }\beta _{m-2}^{(n)}=C\not =0$ are observed. Next, we prove $\lim _{n\rightarrow \infty }(\alpha _{m-2}^{(n)}-\alpha _{m}^{(n)})=-D$ in this case. In other words, we prove that there exists a matrix with the new limiting pattern of the lower right 3-by-3 matrix

$$\begin{aligned} \lim _{n\rightarrow \infty } \left( \begin{array}{c@{\quad }c@{\quad }c} \alpha _{m-2}^{(n)}&{} \beta _{m-2}^{(n)} &{} 0 \\ \beta _{m-2}^{(n)}&{} \alpha _{m-1}^{(n)} &{} \beta _{m-1}^{(n)} \\ 0 &{} \beta _{m-1}^{(n)} &{} \alpha _{m}^{(n)} \end{array} \right) = \left( \begin{array}{c@{\quad }c@{\quad }c} \lambda _l-D &{} C &{} 0 \\ C &{} \lambda _l+D &{} 0 \\ 0 &{} 0 &{} \lambda _l \end{array} \right) , \end{aligned}$$

where $C\not =0$ and $D\not =0$. Although several literatures imply that the rate of this case is at least cubic, we theoretically show a stronger result that the rates of all the examples belonging to this limiting pattern are “strictly” cubic. Our analysis is based on the result by Jiang–Zhang [3, Lemma 1]. Also note that our analysis identifies three classes of the limiting patterns of the tridiagonal QR algorithm with Wilkinson’s shift.

This paper is organized as follows. After a brief summary of Wilkinson’s results [10] in Sect. 2, we summarize the limiting matrix patterns owing to Hoffmann–Parlett [1] in Sect. 3. Then in Sect. 4, we present an example where the new limiting pattern above is observed. In Sect. 5, we review a part of the convergence analysis by Jiang–Zhang [3] for our convergence analysis. In Sect. 6, we prove that the convergence rate for the new limiting pattern is strictly cubic, and we give a convergence theorem that covers all the 3-by-3 limiting submatrices. Section 7 is devoted to the conclusions.

Remark 1

Strictly speaking, in order for our classifications by limiting matrices to work for every initial matrix, it should be also proved that the 3-by-3 submatrix in question always tends to a constant matrix (without exhibiting any oscillatory behaviors). Actually this holds true. This fact might have been noticed by the experts in this research field because its proof is almost the same as those by [2, 6, 9] for the unshifted QR algorithm. However, the present authors do not know any reference where the proof for the shifted algorithm is explicitly stated. For the readers’ convenience, in the present paper a stronger result stating all of the tridiagonal elements in fact tend to constants is shown in Appendix A.

Remark 2

In this paper, we assume that all the diagonal elements of $R^{(n)}$ in (2) are nonnegative in the QR algorithm. Even if some diagonal elements of $R^{(n)}$ are not nonnegative, the absolute values of the subdiagonal elements $|\beta _{1}^{(n)}|,\ldots ,|\beta _{m-1}^{(n)}|$ behave in the same way. We explain it more precisely below. Let $\hat{T}^{(n)}\ (n=0,1,\ldots )$ be the tridiagonal matrices computed by the QR algorithm where some diagonal elements of $R^{(n)}$ are not nonnegative. Then $\hat{T}^{(n)}=D^{(n)}T^{(n)}D^{(n)}$ for $n=0,1,\ldots $ are satisfied where $D^{(n)}$ are certain diagonal sign matrices. This is proved by induction. Obviously $D^{(0)}=I$ is a sign matrix. Note that the Wilkinson’s shift $s^{(n)}$ for $\hat{T}^{(n)}$ is the same as $T^{(n)}$. Hence, any QR decomposition of $\hat{T}^{(n)}-s^{(n)}I$ is described as $(D^{(n)}Q^{(n)}D^{(n+1)})(D^{(n+1)}R^{(n)}D^{(n)})$ for a sign matrix $D^{(n+1)}$. Then, we have $\hat{T}^{(n+1)}=D^{(n+1)}T^{(n+1)}D^{(n+1)}$. Therefore, the absolute values of the subdiagonal elements $\hat{T}^{(n+1)}$ are equivalent to those of $T^{(n+1)}$. Thus, in some sense, we also cover the convergence analysis of $\hat{T}^{(n)}$ by replacing $\beta _{1}^{(n)},\ldots ,\beta _{m-1}^{(n)}$ with $|\beta _{1}^{(n)}|,\ldots ,|\beta _{m-1}^{(n)}|$ in the following sections.

2 QR algorithm with Wilkinson’s shift

In this section, Wilkinson’s convergence theorems in [10] are summarized. In the QR algorithm with Wilkinson’s shift, the global convergence (in the sense that $\lim _{n\rightarrow \infty }\beta _{m-1}^{(n)}=0$ for any initial matrix) is theoretically guaranteed, and the convergence rate is at least quadratic as the next theorem indicates.

Theorem 1

(Wilkinson [10]) Suppose the QR algorithm with Wilkinson’s shift is applied to an irreducible tridiagonal matrix T. Then we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\beta _{m-1}^{(n)}=0, \qquad \left| \beta _{m-1}^{(n+1)}\right| =\mathrm{O}\left( \left| \beta _{m-1}^{(n)}\right| ^{2}\right) \end{aligned}$$

(5)

for all initial matrices.

Here we like to point out that, although not explicitly written, from the discussions in [10] we can deduce two important facts.

First, Wilkinson’s paper [10] implies $\lim _{n\rightarrow \infty }\alpha _{m}^{(n)}=\lambda _l$. In other words, $\alpha _{m}^{(n)}$ converges to a fixed eigenvalue $\lambda _l$. It is easy to see that (21) on p. 412 of [10] corresponds to

$$\begin{aligned} \alpha _{m}^{(n+1)}=s^{(n)}+r_{mm}^{(n)}\cos {\theta ^{(n)}}, \end{aligned}$$

(6)

where $r_{mm}^{(n)}$ is the lower right element of $R^{(n)}$, and $\theta ^{(n)}$ is the last angle of the Givens rotation to delete the last subdiagonal element. Obviously, $r_{mm}^{(\infty )}=0$. Hence, we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\left| \alpha _{m}^{(n+1)}-\alpha _{m}^{(n)}\right| \le \lim _{n\rightarrow \infty }\left| \alpha _{m}^{(n+1)}-s^{(n)}\right| +\lim _{n\rightarrow \infty }\left| \alpha _{m}^{(n)}-s^{(n)}\right| =0. \end{aligned}$$

(7)

This shows that $\alpha _{m}^{(n)}$ converges to a fixed eigenvalue.

Second, Wilkinson’s paper [10] implies the cubic convergence of $\beta _{m-1}^{(n)}$ in a certain case. The first inequality in (62) on p. 418 of [10] corresponds to

$$\begin{aligned} \left| \beta _{m-1}^{(n+1)}\right| \le \frac{\left| \beta _{m-1}^{(n)}\right| ^3}{\left| \alpha _{m-1}^{(n)}-s^{(n)}\right| \left( \delta -3\beta _{m-1}^{(n)}\right) }+\frac{\left| \beta _{m-1}^{(n)}\right| ^3}{\left( \delta -3\beta _{m-1}^{(n)}\right) ^2} \end{aligned}$$

(8)

for all sufficiently large n, where $\delta =\min _{i\not =l}|\lambda _{i}-\lambda _{l}|$. It is easy to see that, if $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-s^{(n)})=D\not =0$, then $|\beta _{m-1}^{(n+1)}|=\mathrm{O}(|\beta _{m-1}^{(n)}|^{3})$. Therefore, the following theorem is established.

Theorem 2

Suppose the QR algorithm with Wilkinson’s shift is applied to an irreducible tridiagonal matrix T. Then $\lim _{n\rightarrow \infty }\alpha _{m}^{(n)}=\lambda _l$ holds for some l. If the lower right 2-by-2 submatrix converges to the limiting matrix

$$\begin{aligned} \lim _{n\rightarrow \infty } \left( \begin{array}{c@{\quad }c} \alpha _{m-1}^{(n)} &{} \beta _{m-1}^{(n)} \\ \beta _{m-1}^{(n)} &{} \alpha _{m}^{(n)} \end{array} \right) = \left( \begin{array}{c@{\quad }c} \lambda _l+D &{} 0 \\ 0 &{} \lambda _l \end{array} \right) , \end{aligned}$$

(9)

where $D\not =0$, then $|\beta _{m-1}^{(n+1)}|=\mathrm{O}(|\beta _{m-1}^{(n)}|^{3})$. If the lower right 2-by-2 submatrix converges to the limiting matrix

$$\begin{aligned} \lim _{n\rightarrow \infty } \left( \begin{array}{c@{\quad }c} \alpha _{m-1}^{(n)} &{} \beta _{m-1}^{(n)} \\ \beta _{m-1}^{(n)} &{} \alpha _{m}^{(n)} \end{array} \right) = \left( \begin{array}{c@{\quad }c@{\quad }c} \lambda _l &{} 0 \\ 0 &{} \lambda _l \end{array} \right) , \end{aligned}$$

(10)

then $|\beta _{m-1}^{(n+1)}|=\mathrm{O}(|\beta _{m-1}^{(n)}|^{2})$.

3 Convergence rate analysis by Hoffmann–Parlett [1]

Actually, quite often it is observed that the convergence rate is better than cubic. This phenomena is mathematically described in the next theorem, which states that if the second lower right element $\beta _{m-2}^{(n)}$ converges to 0, then the convergence rate of the lower right element $\beta _{m-1}^{(n)}$ is better than cubic.

Theorem 3

(Hoffmann–Parlett [1]) Suppose the QR algorithm with Wilkinson’s shift is applied to an irreducible tridiagonal matrix T. If the lower right 3-by-3 submatrix converges to the limiting matrix

$$\begin{aligned} \lim _{n\rightarrow \infty } \left( \begin{array}{c@{\quad }c@{\quad }c} \alpha _{m-2}^{(n)}&{} \beta _{m-2}^{(n)} &{} 0 \\ \beta _{m-2}^{(n)}&{} \alpha _{m-1}^{(n)} &{} \beta _{m-1}^{(n)} \\ 0 &{} \beta _{m-1}^{(n)} &{} \alpha _{m}^{(n)} \end{array} \right) = \left( \begin{array}{c@{\quad }c@{\quad }c} * &{} 0 &{} 0 \\ 0 &{} \lambda _k &{} 0 \\ 0 &{} 0 &{} \lambda _l \end{array} \right) , \end{aligned}$$

(11)

where $k\not =l$, then $|\beta _{m-1}^{(n+1)}|=\mathrm{O}(|\beta _{m-2}^{(n)}|^{2}|\beta _{m-1}^{(n)}|^{3})$. If the lower right 3-by-3 submatrix converges to the limiting matrix

$$\begin{aligned} \lim _{n\rightarrow \infty } \left( \begin{array}{c@{\quad }c@{\quad }c} \alpha _{m-2}^{(n)}&{} \beta _{m-2}^{(n)} &{} 0 \\ \beta _{m-2}^{(n)}&{} \alpha _{m-1}^{(n)} &{} \beta _{m-1}^{(n)} \\ 0 &{} \beta _{m-1}^{(n)} &{} \alpha _{m}^{(n)} \end{array} \right) = \left( \begin{array}{c@{\quad }c@{\quad }c} * &{} C &{} 0 \\ C &{} \lambda _l &{} 0 \\ 0 &{} 0 &{} \lambda _l \end{array} \right) , \end{aligned}$$

(12)

where $C\not =0$, then $|\beta _{m-1}^{(n+1)}|=\mathrm{O}(|\beta _{m-1}^{(n)}|^{2})$.

More precisely, in (12), Hoffmann–Parlett do not assume $\lim _{n\rightarrow \infty }\alpha _{m-1}^{(n)}=\lim _{n\rightarrow \infty }\alpha _{m}^{(n)}=\lambda _l$ but $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-s^{(n)})=0$. However, it is easy to see that $\lim _{n\rightarrow \infty }\alpha _{m-1}^{(n)}=\lim _{n\rightarrow \infty }\alpha _{m}^{(n)}=\lambda _l$ is mathematically equivalent to $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-s^{(n)})=0$ in the same way as the previous section. Combining the above theorem with Theorem 2, we have the following corollary.

Corollary 1

The convergence rate of $\beta _{m-1}^{(n)}$ is summarized as follows.

if $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-\alpha _{m}^{(n)})=0$, then quadratic convergence only;
if $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-\alpha _{m}^{(n)})=D\not =0$, then cubic convergence at least; if, in addition, $\lim _{n\rightarrow \infty }\beta _{m-2}^{(n)}=0$, then the rate is better than cubic.

In most cases, the limiting matrix satisfies (11), and thus the actual convergence speed is better than cubic [1, 7, 8, 11]. In the case (11), Parlett demonstrates the asymptotic ratio $|\beta _{m-1}^{(n+1)}|/|\beta _{m-2}^{(n)}|^{2}|\beta _{m-1} ^{(n)}|^{3}$ approaches a computable limit under the assumption that $\lim _{n\rightarrow \infty }\beta _{m-3}^{(n)}=0$ and $\lim _{n\rightarrow \infty }\alpha _{m-3}^{(n)}=\lambda _{j}$ for some j [7, Theorem 8.11.1]. The possibility of the loss of cubic convergence, i.e., the occurrence of (12), was discussed in several studies (see [1, 7, 8, 11]). For example, Parlett [7] and Wang [8] show that, in the case (12), the limiting pattern of the lower right 3-by-3 matrix is

$$\begin{aligned} \lim _{n\rightarrow \infty } \left( \begin{array}{c@{\quad }c@{\quad }c} \alpha _{m-2}^{(n)}&{} \beta _{m-2}^{(n)} &{} 0 \\ \beta _{m-2}^{(n)}&{} \alpha _{m-1}^{(n)} &{} \beta _{m-1}^{(n)} \\ 0 &{} \beta _{m-1}^{(n)} &{} \alpha _{m}^{(n)} \end{array} \right) = \left( \begin{array}{c@{\quad }c@{\quad }c} \lambda _l &{} C &{} 0 \\ C &{} \lambda _l &{} 0 \\ 0 &{} 0 &{} \lambda _l \end{array} \right) \end{aligned}$$

(13)

and $\lim _{n\rightarrow \infty }\beta _{m-3}^{(n)}=0$. But it is still mathematically open whether or not there in fact exists such a matrix that leads to the case (except the case $m=3$, for which a rigorous analysis is given in [4]; see Remark 3 below). However, we notice that the 3-by-3 limiting submatrix is described in a simple way (13) in the case $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-\alpha _{m}^{(n)})=0$. Also in the case $\lim _{n\rightarrow \infty }\beta _{m-2}^{(n)}=0$, the corresponding limiting pattern is clearly described as (11).

We here focus on the other situation: $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-\alpha _{m}^{(n)})=D\ne 0$ and $\lim _{n\rightarrow \infty }\beta _{m-2}^{(n)} = C\ne 0$. Does it never happen? If it happens, what is the limiting pattern of the lower right 3-by-3 matrix including $\lim _{n\rightarrow \infty }\alpha _{m-2}^{(n)}$? What is the exact convergence rate?

Below we show an answer to these questions; it turns out that such a situation can actually happen (we show an example), and there the convergence rate is strictly cubic.

4 A numerical experiment

Let us apply the QR algorithm with Wilkinson’s shift to a 101-by-101 matrix

$$\begin{aligned} T^{(0)}=\left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} 0 &{} 100 &{} &{} &{} \\ 100 &{} \ddots &{} \ddots &{} &{} \\ &{} \ddots &{} 0 &{} 100 &{} \\ &{} &{} 100 &{} 0 &{} 1 \\ &{} &{} &{} 1 &{} 0 \end{array} \right) , \end{aligned}$$

(14)

whose eigenvalues are 0, $\pm c_{k}\ (k=1,\ldots ,50)$. Also let $T_{2}^{(n)}$ be the lower right 3-by-3 submatrix of $T^{(n)}$; for $n=0$ we see

$$\begin{aligned} T_{2}^{(0)}=\left( \begin{array}{c@{\quad }c@{\quad }c} 0 &{} 100 &{} \\ 100 &{} 0 &{} 1 \\ &{} 1 &{} 0 \end{array} \right) . \end{aligned}$$

(15)

Then we find the following convergence behavior of $T_{2}^{(n)}$ for $n=3,4,5$:

$$\begin{aligned} T_{2}^{(3)}= & {} \left( \begin{array}{c@{\quad }c@{\quad }c} 1.80 &{} 2.55 &{} \\ 2.55 &{} -1.81 &{} 2.55\times 10^{-14} \\ &{} 2.55\times 10^{-14} &{} -1.21\times 10^{-28} \end{array} \right) ,\\ T_{2}^{(4)}= & {} \left( \begin{array}{c@{\quad }c@{\quad }c} 1.80 &{} 2.53 &{} \\ 2.53 &{} -1.81 &{} -1.96\times 10^{-42} \\ &{} -1.96\times 10^{-42} &{} -7.18\times 10^{-85} \end{array} \right) ,\\ T_{2}^{(5)}= & {} \left( \begin{array}{c@{\quad }c@{\quad }c} 1.81 &{} 2.53 &{} \\ 2.53 &{} -1.81 &{} -8.88\times 10^{-127} \\ &{} -8.88\times 10^{-127} &{} -1.47\times 10^{-253} \end{array} \right) . \end{aligned}$$

Thus it tends to

$$\begin{aligned} \lim _{n\rightarrow \infty } \left( \begin{array}{c@{\quad }c@{\quad }c} \alpha _{m-2}^{(n)}&{} \beta _{m-2}^{(n)} &{} 0 \\ \beta _{m-2}^{(n)}&{} \alpha _{m-1}^{(n)} &{} \beta _{m-1}^{(n)} \\ 0 &{} \beta _{m-1}^{(n)} &{} \alpha _{m}^{(n)} \end{array} \right) = \left( \begin{array}{c@{\quad }c@{\quad }c} \lambda _l-D &{} C &{} 0 \\ C &{} \lambda _l+D &{} 0 \\ 0 &{} 0 &{} \lambda _l \end{array} \right) , \end{aligned}$$

(16)

where $\lambda _l=0$, $C\approx 2.5$, $D\approx -1.8$. The observed convergence rate of $\beta _{m-1}^{(n)}$ is cubic: $|\beta _{m-1}^{(n+1)}|/|\beta _{m-1}^{(n)}|^{3}\approx 0.11$. Despite the long history of the convergence analysis for Wilkinson’s shift, no similar example has been pointed out in the research field of numerical linear algebra, as far as the authors know. In this sense, one of the contribution of this paper is the finding of the matrix (16). In the following sections, we prove that the convergence rate of (16) is strictly cubic.

5 Convergence rate analysis by Jiang-Zhang [3]

For the convergence analysis of (16), we review the paper [3] by Jiang and Zhang in 1985. It implies $|\beta _{m-1}^{(n+1)}|=\mathrm{O}(|\beta _{m-2}^{(n)}|^{2}|\beta _{m-1}^{(n)}|^{3})$ in the case $\lim _{n\rightarrow \infty }\beta _{m-2}^{(n)}=0$, though the convergence analysis in [3] focuses on their own shift strategy where both Wilkinson’s shift and the Rayleigh quotient shift are exploited. The convergence rate analysis for the Wilkinson’s shift is described on p. 270 in [3] in the case $\lim _{n\rightarrow \infty }\beta _{m-2}^{(n)}=0$.

Let $T_{k}^{(n)}$ be the $k\times k$ leading principal submatrix of $T^{(n)}$ and

$$\begin{aligned} d_{k}^{(n)}:=\det \left( T_{k}^{(n)}-s^{(n)}I\right) \quad (k=1,\ldots , m). \end{aligned}$$

(17)

The following lemma is crucial [3, Lemma2].

Lemma 1

([3]) For any shift strategy in the tridiagonal QR algorithm,

$$\begin{aligned} \left| \beta _{m-1}^{(n+1)}\right| =\frac{\left| \beta _{m-1}^{(n)}d_{m}^{(n)} \gamma ^{(n)}\right| }{\left| \left( d_{m-1}^{(n)}\right) ^2 +\left( \beta _{m-1}^{(n)}\gamma ^{(n)}\right) ^2\right| } \end{aligned}$$

(18)

hold for all n, where

$$\begin{aligned} \left( \gamma ^{(n)}\right) ^2= & {} \left( d_{m-2}^{(n)}\right) ^2 +\left( \beta _{m-2}^{(n)}d_{m-3}^{(n)}\right) ^2 +\left( \beta _{m-2}^{(n)}\beta _{m-3}^{(n)}d_{m-4}^{(n)}\right) ^2+\cdots \nonumber \\&+\left( \beta _{m-2}^{(n)}\beta _{m-3}^{(n)}\ldots \beta _{2}^{(n)}d_{1}^{(n)}\right) ^2 +\left( \beta _{m-2}^{(n)}\beta _{m-3}^{(n)}\ldots \beta _{1}^{(n)}\right) ^2 \end{aligned}$$

(19)

for all n.

In order to derive the convergence rate, let us consider the right-hand side of (18). It is easy to see that

$$\begin{aligned} \lim _{n\rightarrow \infty }d_{m-1}^{(n)}=\prod _{i\not =l}(\lambda _{i}-\lambda _{l})\not =0 \end{aligned}$$

(20)

in view of $\lim _{n\rightarrow \infty }\beta _{m-1}^{(n)}=0$ and $s^{(\infty )}=\lambda _{l}$. Noting (17) and the definition of the Wilkinson’s shift, we have

$$\begin{aligned} d_{m}^{(n)}= & {} \left[ \left( \alpha _{m}^{(n)}-s^{(n)}\right) \left( \alpha _{m-1}^{(n)} -s^{(n)}\right) -\left( \beta _{m-1}^{(n)}\right) ^2\right] d_{m-2}^{(n)} -\left( \alpha _{m}^{(n)}-s^{(n)}\right) \left( \beta _{m-2}^{(n)}\right) ^2 d_{m-3}^{(n)}\nonumber \\= & {} -\frac{\left( \beta _{m-1}^{(n)}\right) ^2\left( \beta _{m-2}^{(n)}\right) ^2 d_{m-3}^{(n)}}{\left( \alpha _{m-1}^{(n)}-s^{(n)}\right) }. \end{aligned}$$

(21)

Since we assume here that $\lim _{n\rightarrow \infty }\beta _{m-2}^{(n)}=0$, $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-s^{(n)})= \lambda _{k}-\lambda _{l}\not =0$ holds for some $k,\ l$. Therefore, we have

$$\begin{aligned} \left| d_{m}^{(n)}\right| \sim \frac{\left| \beta _{m-2}^{(n)}\right| ^2\left| \beta _{m-1}^{(n)}\right| ^2 \left| d_{m-3}^{(n)}\right| }{\left| \lambda _{k}-\lambda _{l}\right| } \end{aligned}$$

(22)

as $n\rightarrow \infty $. Furthermore, $|\gamma ^{(n)}|$ in (18) is bounded because $\beta _{k}^{(n)}\ (1\le k \le m-1)$ and $d_{k}^{(n)}\ (1\le k \le m)$ in (19) are bounded for all n. In addition, $\lim _{n\rightarrow \infty }\beta _{m-1}^{(n)}=0$ holds. Therefore, we obtain $|\beta _{m-1}^{(n+1)}|=\mathrm{O}(|\beta _{m-2}^{(n)}|^{2}|\beta _{m-1}^{(n)}|^{3})$ from (18), (20) and (22).

Although the situation: $\beta _{m-2}^{(\infty )}\not =0,\ \alpha _{m-1}^{(\infty )}-s^{(\infty )}\not =0$ is not explicitly considered in [3], it is easy to see from the above discussion that $|\beta _{m-1}^{(n+1)}|=\mathrm{O}(|\beta _{m-1}^{(n)}|^{3})$ is realized in this case. In the next section, we prove that the convergence rate of $\beta _{m-1}^{(n)}$ is strictly cubic based on Lemma 1.

6 Convergence theorem covering the new limiting pattern

In this section, we prove that all the possible limiting patterns are (11), (13), and (16). Note that (16) is the new pattern. Based on Lemma 1, we prove that the convergence rate of (16) is strictly cubic. As a result, we have a convergence theorem that covers all the possible limiting patterns.

First of all, we note the following lemma. See Appendix B for the proof.

Lemma 2

If $\lim _{n\rightarrow \infty }\beta _{m-2}^{(n)}=C\not =0$, then $|\lambda _{l}-\lambda _{l-1}|=|\lambda _{l}-\lambda _{l+1}|$ holds. Moreover, the lower right 3-by-3 matrix of $T^{(n)}$ converges as

$$\begin{aligned} \lim _{n\rightarrow \infty } \left( \begin{array}{c@{\quad }c@{\quad }c} \alpha _{m-2}^{(n)}&{} \beta _{m-2}^{(n)} &{} 0 \\ \beta _{m-2}^{(n)}&{} \alpha _{m-1}^{(n)} &{} \beta _{m-1}^{(n)} \\ 0 &{} \beta _{m-1}^{(n)} &{} \alpha _{m}^{(n)} \end{array} \right) = \left( \begin{array}{c@{\quad }c@{\quad }c} \lambda _l-D &{} C &{} 0 \\ C &{} \lambda _l+D &{} 0 \\ 0 &{} 0 &{} \lambda _l \end{array} \right) \end{aligned}$$

(23)

for $D\in {\mathbb {R}}$ such that $\sqrt{C^{2}+D^{2}}=|\lambda _{l} -\lambda _{l-1}|=|\lambda _{l}-\lambda _{l+1}|$.

As stated above, [7, 8] showed that, if $\lim _{n\rightarrow \infty }\beta _{m-2}^{(n)}=C\not =0$ and $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-\alpha _{m}^{(n)})=D=0$, then the limiting pattern of the 3-by-3 matrix is (13). Lemma 2 is a generalization of this result to $D\in {\mathbb {R}}$.

Fortunately, in the case $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-\alpha _{m}^{(n)})=D\not =0$, we can obtain $\lim _{n\rightarrow \infty }|\beta _{m-1}^{(n+1)}|/|\beta _{m-2}^{(n)}|^2|\beta _{m-1}^{(n)}|^3$ using (18) in Lemma 1. Recall that $\lambda _{k}$ is the closest eigenvalue to $\lambda _{l}$. Let $\lambda _{j}$ be the second closest one. We derive the following lemma from (18).

Lemma 3

If $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-\alpha _{m}^{(n)})=D\not =0$, then

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\left| \beta _{m-1}^{(n+1)}\right| }{\left| \beta _{m-2}^{(n)}\right| ^2\left| \beta _{m-1}^{(n)}\right| ^3} =\frac{\left| \rho _{2}\right| }{\left| \lambda _{k}-\lambda _{l}\right| ^3\left| \lambda _{j} -\lambda _{l}\right| \left| \rho _{1}\right| }, \end{aligned}$$

(24)

where

$$\begin{aligned} |\rho _{1}|= & {} |D|/|\lambda _{k}-\lambda _{l}|, \end{aligned}$$

(25)

$$\begin{aligned} |\rho _{2}|= & {} \sqrt{1-|\beta _{m-3}^{(\infty )}|^2/|\lambda _{j} -\lambda _{l}|^2}. \end{aligned}$$

(26)

In addition,

$$\begin{aligned} 0<|\rho _{1}|=|D|/|\lambda _{k}-\lambda _{l}|\le 1 \end{aligned}$$

(27)

holds. Even if $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-\alpha _{m}^{(n)})=0$,

$$\begin{aligned} \limsup _{n\rightarrow \infty }\frac{\left| \beta _{m-1}^{(n+1)}\right| }{\left| \beta _{m-1}^{(n)}\right| ^2} \le \frac{1}{\left| \lambda _{k}-\lambda _{l}\right| } \end{aligned}$$

(28)

holds.

See Appendix B for the proof. In fact, (24) is an extension of [7, Theorem 8.11.1]:

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\left| \beta _{m-1}^{(n+1)}\right| }{\left| \beta _{m-2}^{(n)}\right| ^2\left| \beta _{m-1}^{(n)}\right| ^3} =\frac{1}{\left| \lambda _{k}-\lambda _{l}\right| ^3\left| \lambda _{j}-\lambda _{l}\right| } \end{aligned}$$

under the assumption: $\alpha _{m-2}^{(\infty )}=\lambda _{j},\ \alpha _{m-1}^{(\infty )}=\lambda _{k},\ \beta _{m-2}^{(\infty )}=0,\ \beta _{m-3}^{(\infty )}=0$. In addition, (24) also covers the answer to [7, Exercise 8.11.3]. In other words, in the case $\beta _{m-2}^{(\infty )}=0,\ \alpha _{m-3}^{(\infty )}=\lambda _{l}(=\alpha _{m}^{(\infty )}),\ \beta _{m-3}^{(\infty )}\not =0$, we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\left| \beta _{m-1}^{(n+1)}\right| }{\left| \beta _{m-2}^{(n)}\right| ^2\left| \beta _{m-1}^{(n)}\right| ^3} =\frac{\sqrt{1-\left| \beta _{m-3}^{(\infty )}\right| ^2/\left| \lambda _{j} -\lambda _{l}\right| ^2}}{\left| \lambda _{k}-\lambda _{l}\right| ^3\left| \lambda _{j} -\lambda _{l}\right| }=0, \end{aligned}$$

where the first equality is due to (24), (25) and (26), the second equality is due to (69) in Appendix B. Also note that (28) is proved by Wilkinson [10]. We prove this based on Lemma 1 in Appendix B.

Using Lemmas 2 and 3, we have the following classification.

Theorem 4

Suppose the QR algorithm with Wilkinson’s shift is applied to an irreducible tridiagonal matrix T. Let $\lambda _{l}=\alpha _{m}^{(\infty )}$, and $\lambda _{k}$ be the closest eigenvalue to $\lambda _{l}$. Moreover, let $\lambda _{j}$ be the second closest one. All the possible limiting 3-by-3 matrices are divided into three patterns. Based on the limiting patterns, the convergence rates are classified as follows. If the lower right 3-by-3 submatrix converges to the limiting matrix

$$\begin{aligned} \lim _{n\rightarrow \infty } \left( \begin{array}{c@{\quad }c@{\quad }c} \alpha _{m-2}^{(n)}&{} \beta _{m-2}^{(n)} &{} 0 \\ \beta _{m-2}^{(n)}&{} \alpha _{m-1}^{(n)} &{} \beta _{m-1}^{(n)} \\ 0 &{} \beta _{m-1}^{(n)} &{} \alpha _{m}^{(n)} \end{array} \right) = \left( \begin{array}{c@{\quad }c@{\quad }c} * &{} 0 &{} 0 \\ 0 &{} \lambda _k &{} 0 \\ 0 &{} 0 &{} \lambda _l \end{array} \right) , \end{aligned}$$

(29)

where $k\not =l$, then the convergence rate is better that cubic:

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\left| \beta _{m-1}^{(n+1)}\right| }{\left| \beta _{m-2}^{(n)}\right| ^2\left| \beta _{m-1}^{(n)}\right| ^3}= & {} \frac{\left| \rho _{2}\right| }{\left| \lambda _{k}-\lambda _{l}\right| ^3\left| \lambda _{j} -\lambda _{l}\right| }, \end{aligned}$$

(30)

$$\begin{aligned} \left| \rho _{2}\right|= & {} \sqrt{1-\left| \beta _{m-3}^{(\infty )}\right| ^2/\left| \lambda _{j} -\lambda _{l}\right| ^2}\le 1. \end{aligned}$$

(31)

If the lower right 3-by-3 submatrix converges to the limiting matrix

$$\begin{aligned} \lim _{n\rightarrow \infty } \left( \begin{array}{c@{\quad }c@{\quad }c} \alpha _{m-2}^{(n)}&{} \beta _{m-2}^{(n)} &{} 0 \\ \beta _{m-2}^{(n)}&{} \alpha _{m-1}^{(n)} &{} \beta _{m-1}^{(n)} \\ 0 &{} \beta _{m-1}^{(n)} &{} \alpha _{m}^{(n)} \end{array} \right) = \left( \begin{array}{c@{\quad }c@{\quad }c} \lambda _l-D &{} C &{} 0 \\ C &{} \lambda _l+D &{} 0 \\ 0 &{} 0 &{} \lambda _l \end{array} \right) , \end{aligned}$$

(32)

where $C\not =0,\ D\not =0,\ \sqrt{C^2+D^2}=|\lambda _{k}-\lambda _l|=|\lambda _{j}-\lambda _{l}|$, then the convergence rate is strictly cubic:

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\left| \beta _{m-1}^{(n+1)}\right| }{\left| \beta _{m-1}^{(n)}\right| ^3} =\frac{|C|^2}{\left| \lambda _{k}-\lambda _{l}\right| ^3|D|}. \end{aligned}$$

(33)

If the lower right 3-by-3 submatrix converges to the limiting matrix

$$\begin{aligned} \lim _{n\rightarrow \infty } \left( \begin{array}{c@{\quad }c@{\quad }c} \alpha _{m-2}^{(n)}&{} \beta _{m-2}^{(n)} &{} 0 \\ \beta _{m-2}^{(n)}&{} \alpha _{m-1}^{(n)} &{} \beta _{m-1}^{(n)} \\ 0 &{} \beta _{m-1}^{(n)} &{} \alpha _{m}^{(n)} \end{array} \right) = \left( \begin{array}{c@{\quad }c@{\quad }c} \lambda _l &{} C &{} 0 \\ C &{} \lambda _l &{} 0 \\ 0 &{} 0 &{} \lambda _l \end{array} \right) , \end{aligned}$$

(34)

where $C\not =0,\ |C|=|\lambda _{k}-\lambda _l|=|\lambda _{j}-\lambda _{l}|$, then the convergence rate is at least quadratic:

$$\begin{aligned} \limsup _{n\rightarrow \infty }\frac{|\beta _{m-1}^{(n+1)}|}{|\beta _{m-1}^{(n)}|^2} \le \frac{1}{|\lambda _{k}-\lambda _{l}|}. \end{aligned}$$

(35)

Proof

If $\beta _{m-2}^{(\infty )}=0$, then $|D|=|\lambda _{k}-\lambda _{l}|$ in Lemma 3. Hence, we have (30). If $\beta _{m-2}^{(\infty )}\not =0$, then $\beta _{m-3}^{(\infty )}=0$ because the block size of $T^{(\infty )}$ is at most 2. See Appendix A for its proof. In addition, $|\lambda _{k}-\lambda _l|=|\lambda _{j}-\lambda _{l}|$ is proved in Appendix A. Hence, we obtain (33) by Lemma 3. Lemma 3 also shows (35). $\square $

As noted above, the previous analysis did not explicitly point out the limiting pattern (32), though the cubic convergence of this pattern is guaranteed by Theorem 2. In our experience, this limiting pattern is relatively rare compared to the previously known case (29), but not extremely rare so that it can be left outside our consideration (see also Remark 3 below). We here also like to emphasize that the above classification covers all the possible limiting patterns.

Our analysis in this section is based on the elegant result [3, Lemma2] that is derived by cleverly utilizing the special features of the shifted tridiagonal QR algorithm and Cramer’s rule. Note that Theorem 4 in this section is the stronger result than Corollary 1 that was previously implied by several literatures [1, 7, 8, 11]. By carefully reading these literatures, we immediately find that all the existing proofs of Corollary 1 are also based on some special features of the tridiagonal QR algorithm. However, it is also possible to prove Corollary 1 based on the general gap theorem that is one of the useful perturbation theorems. In Appendix C, we describe another proof of Corollary 1 along this line: the gap theorem and the Jacobi transformation. The alternative proof is readily accessible to many readers in the research field of numerical linear algebra.

Remark 3

The above result is closely related to the work by Leitte–Saldanha–Tomei [4], which considered the case of 3-by-3 matrices, and proved in the language of dynamical systems theory that there exists a 3-by-3 matrix that converges only quadratically by the QR iteration with Wilkinson’s shift. More precisely, they considered the matrix

$$\begin{aligned} \tilde{T}=\left( \begin{array}{c@{\quad }c@{\quad }c} 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 \end{array} \right) , \end{aligned}$$

(36)

and regarded the QR iterations with Wilkinson’s shift as maps generating a discrete dynamical system in the space of symmetric tridiagonal matrices with the same spectrum as $\tilde{T}$ (see the original paper for the detail). Then they proved that there exists an open neighborhood of $\tilde{T}$ such that (i) the iteration maps a point (matrix) back to the set, (ii) the convergence is strictly quadratic if it tends to $\tilde{T}$ and cubic otherwise, and (iii) the Hausdorff dimension of the set of such initial points that leads to the quadratic convergence is 1.

Although the main topic of [4] is the existence of the quadratic cases, if we view the result from the opposite direction, it is also claiming that there are quite many initial matrices close to $\tilde{T}$ resulting in cubic convergence (observe (iii) of [4], which states that the quadratic cases are “very thin” [4]). This strongly suggests that cubic convergence can be observed even when $\beta _{m-2}^{(n)}$ does not tend to 0 (since it should stay around 1). In this way, the work [4] suggests a similar result as above in the case of 3-by-3 matrices, although explicit initial matrix examples are not given. By shrinking the size of the example (14) in the present paper, it is easy to obtain a 3-by-3 example

$$\begin{aligned} T^{(0)}=\left( \begin{array}{c@{\quad }c@{\quad }c} 0 &{} 100 &{} \\ 100 &{} 0 &{} 1 \\ &{} 1 &{} 0 \end{array} \right) , \end{aligned}$$

(37)

which actually tends to the form (16) cubically.

7 Conclusions and future works

We pointed out a limiting matrix pattern (32) where the convergence rate is strictly cubic as (33). Taking into account the fact that all the elements always converge (see Appendix A), we see that the classification by Theorem 4 covers all the possible scenarios, starting from any initial matrix.

It is still open, however, whether or not there actually exists a strictly quadratic case for 4-by-4 or larger matrices, i.e., if the limiting case (34) actually occurs. Furthermore, even if it is confirmed, still there remains the possibility that the estimate $|\beta _{m-1}^{(n+1)}|=\mathrm{O}(|\beta _{m-1}^{(n)}|^2)$ is an overestimate, and the actual rate is cubic. These issues are left as future works. (Recall that for 3-by-3 matrices they have been completely settled in [4]; see Remark 3).

References

Hoffmann, W., Parlett, B.N.: A new proof of global convergence for the tridiagonal QL algorithm. SIAM J. Numer. Anal. 15, 929–937 (1978)
Article MathSciNet MATH Google Scholar
Huang, H., Tam, T.: On the QR iterations of real matrices. Linear Algebra Appl. 408, 161–176 (2005)
Article MathSciNet Google Scholar
Jiang, E., Zhang, Z.: A new shift of the QL algorithm for irreducible symmetric matrices. Linear Algebra Appl. 65, 261–272 (1985)
Article MathSciNet MATH Google Scholar
Leitte, R.S., Saldanha, N.C., Tomei, C.: The asymptotics of Wilkinson’s shift: loss of cubic convergence. Found. Comput. Math. 10, 15–36 (2010)
Article MathSciNet MATH Google Scholar
Parlett, B.N.: Canonical decomposition of Hessenberg matrices. Math. Comput. 21, 223–227 (1967)
Article MathSciNet MATH Google Scholar
Parlett, B.N.: Global convergence of the basic QR algorithm on Hessenberg matrices. Math. Comput. 22, 803–817 (1968)
MathSciNet Google Scholar
Parlett, B.N.: The symmetric eigenvalue problem. Prentice-Hall,Englewood Cliffs, New Jersey, 1980; SIAM, Philadelphia (1998)
Wang, T.: Convergence of the tridiagonal QR algorithm. Linear Algebra Appl. 322, 1–17 (2001)
Article MathSciNet MATH Google Scholar
Wilkinson, J.H.: The algebraic eigenvalue problem. Clarendon Press, Oxford (1965)
MATH Google Scholar
Wilkinson, J.H.: Global convergence of tridiagonal QR algorithm with origin shifts. Linear Algebra Appl. 1, 409–420 (1968)
Article MathSciNet MATH Google Scholar
Zhang, G.: On the convergence rate of the QL algorithm with Wilkinson’s shift. Linear Algebra Appl. 113, 131–137 (1989)
Article MathSciNet Google Scholar

Download references

Acknowledgments

The author is grateful to Professor Takayasu Matsuo and Professor Beresford Parlett for their valuable comments and suggestions. The author also thanks the anonymous reviewer for the helpful comments.

Author information

Authors and Affiliations

The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, Japan
Kensuke Aishima

Authors

Kensuke Aishima
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Kensuke Aishima.

Additional information

The author is supported by JSPS Grant-in-Aid for Young Scientists (Grant Number 25790096).

Appendices

Appendix A: Proof of convergence

We prove here that all the tridiagonal elements of $T^{(n)}$ converge for any initial matrix $T^{(0)}$. To this end, we consider a general shift $s^{(n)}$ satisfying the following conditions:

(i)
The shift $s^{(n)}$ converges to a certain eigenvalue $s^{(\infty )}=\lambda _{l}$;
(ii)
$|s^{(n)}-\lambda _{l}|=\mathrm{o}(c^{n})$ for a positive constant $c<1$.

Note that Wilkinson’s shift satisfies the two conditions: (i) has been proved by [8]; then the convergence rate by the shifts $s^{(n)}\rightarrow \lambda _{l}$ is at least quadratic because $|s^{(n)}-\lambda _{l}|\le |s^{(n)}-\alpha _{m}^{(n)}|+|\alpha _{m}^{(n)}-\lambda _{l}| \le 2|\beta _{m-1}^{(n)}|$ by the definition of Wilkinson’s shift and Gershgorin’s circle theorem, and then we have $|\beta _{m-1}^{(n+1)}|=\mathrm{O}(|\beta _{m-1}^{(n)}|^2)$ by Wilkinson’s proof, which implies (ii).

In what follows, we prove that all the tridiagonal elements $T^{(n)}$ always converge for such general shifts. The following convergence proof might have been noticed by the experts in this research field because its proof is almost the same as those by [2, 6, 9] for the unshifted QR algorithm. However, to the best of the authors’ knowledge, the proof for the shifted algorithm is not explicitly stated in any reference. For the readers’ convenience, we prove it as follows.

Theorem 5

Suppose that the QR algorithm with a general shift satisfying the above conditions (i) and (ii) is applied to an irreducible tridiagonal matrix T. Then, $T^{(n)}$ converges to a block diagonal matrix whose block size is at most 2. Further, the lower right diagonal element $T^{(n)}(m,m)$ converges to $\lambda _{l}$ and the lower right off-diagonal element $T^{(n)}(m,m-1)$ converges to 0.

Proof

First of all, we show several important facts for the convergence analysis. Similarly to the discussion in [6] and [9, Chapter8, §28], let $\tilde{Q}^{(n)},\ \tilde{R}^{(n)}$ be

$$\begin{aligned} \tilde{Q}^{(n)}= & {} Q^{(0)}\ldots Q^{(n-1)},\\ \tilde{R}^{(n)}= & {} R^{(n-1)}\ldots R^{(0)}. \end{aligned}$$

By the orthogonal matrix $\tilde{Q}^{(n)}$, $T^{(n)}$ is described as

$$\begin{aligned} T^{(n)} =\left( \tilde{Q}^{(n)}\right) ^\mathrm{T}T\tilde{Q}^{(n)}. \end{aligned}$$

(38)

in view of $T^{(n+1)}=R^{(n)}Q^{(n)}+s^{(n)}I=(Q^{(n)})^\mathrm{T}(Q^{(n)}R^{(n)}+s^{(n)}I)Q^{(n)}=(Q^{(n)})^\mathrm{T}T^{(n)}Q^{(n)}$. Using (38), we see

$$\begin{aligned} \tilde{Q}^{(n)}\tilde{R}^{(n)}= & {} \tilde{Q}^{(n-1)}\left( T^{(n-1)}-s^{(n-1)}I\right) \tilde{R}^{(n-1)} \nonumber \\= & {} \left( T-s^{(n-1)}I\right) \tilde{Q}^{(n-1)}\tilde{R}^{(n-1)} \nonumber \\= & {} \left( T-s^{(n-1)}I\right) \ldots \left( T-s^{(0)}I\right) . \end{aligned}$$

(39)

Let p(l) denote a permutation of the indices l $(l = 1, \ldots , m)$. Then in view of the condition (i) we can place the shifted eigenvalues in a descending order as

$$\begin{aligned} \left| \lambda _{p(1)}-s^{(\infty )}\right| \ge \cdots \ge \left| \lambda _{p(m-1)}-s^{(\infty )}\right| > \left| \lambda _{p(m)}-s^{(\infty )}\right| =0. \end{aligned}$$

(40)

The last inequality follows from (i) (note that all the eigenvalues are distinct since T is irreducible).

Next, we focus on the eigendecomposition

$$\begin{aligned} T=X\varLambda X^\mathrm{T}, \end{aligned}$$

(41)

where X is the orthogonal matrix consisting of the eigenvectors and $\varLambda $ is the diagonal matrix with the eigenvalues: $\mathrm{diag}(\lambda _{p(1)},\ldots ,\lambda _{p(m)})$. Then we see

$$\begin{aligned} \left( T-s^{(0)}I\right) \ldots \left( T-s^{(n-1)}I\right) =X\varLambda ^{(n)} X^\mathrm{T}, \end{aligned}$$

(42)

where

$$\begin{aligned} \varLambda ^{(n)} =\left( \varLambda -s^{(0)}I\right) \ldots \left( \varLambda -s^{(n-1)}I\right) . \end{aligned}$$

(43)

From (38) and (41), we have $T^{(n)}=(\tilde{Q}^{(n)})^\mathrm{T}X\varLambda X^\mathrm{T}\tilde{Q}^{(n)}$. If all the inequalities are strict in (40), $T^{(n)}$ converges to $\varLambda $. Actually, $T^{(n)}=(\tilde{Q}^{(n)})^\mathrm{T}X\varLambda X^\mathrm{T}\tilde{Q}^{(n)}$ always converges to a block diagonal matrix whose block size is at most 2. In order to prove it, we firstly apply the LU factorization $X^\mathrm{T}=LU$. Note that $X^\mathrm{T}$ constructed by the normalized eigenvectors of an irreducible tridiagonal matrix is always LU factorizable [2, 5]. Hence, we see

$$\begin{aligned} \left( T-s^{(0)}I\right) \ldots \left( T-s^{(n-1)}I\right) = X\varLambda ^{(n)} L\left( \varLambda ^{(n)}\right) ^{-1} \varLambda ^{(n)} U. \end{aligned}$$

(44)

Combining it with (39) we have

$$\begin{aligned} \tilde{Q}^{(n)}\tilde{R}^{(n)}= X\varLambda ^{(n)} L(\varLambda ^{(n)})^{-1} \varLambda ^{(n)} U. \end{aligned}$$

(45)

To reveal the relationship between $\tilde{Q}^{(n)}$ and X, we discuss the behavior of the QR factorization of $\varLambda ^{(n)} L(\varLambda ^{(n)})^{-1}$ as follows. Let $D_{\varLambda ^{(n)}}$ be an orthogonal matrix

$$\begin{aligned} \mathrm{diag}\left( \prod _{l=0}^{n-1}\frac{|\lambda _{p(1)} -s^{(l)}|}{(\lambda _{p(1)}-s^{(l)})},\ldots ,\prod _{l=0}^{n-1}\frac{|\lambda _{p(m)} -s^{(l)}|}{(\lambda _{p(m)}-s^{(l)})}\right) . \end{aligned}$$

(46)

It is easy to see that

$$\begin{aligned} D_{\varLambda ^{(n)}} \varLambda ^{(n)}= \mathrm{diag}\left( \prod _{l=0}^{n-1}|\lambda _{p(1)}-s^{(l)}|, \ldots ,\prod _{l=0}^{n-1}|\lambda _{p(m)}-s^{(l)}|\right) . \end{aligned}$$

(47)

In the right-hand side of (45), we have

$$\begin{aligned} \varLambda ^{(n)} L\left( \varLambda ^{(n)}\right) ^{-1}= D_{\varLambda ^{(n)}}^{-1}\left( D_{\varLambda ^{(n)}}\varLambda ^{(n)} L\left( D_{\varLambda ^{(n)}}\varLambda ^{(n)}\right) ^{-1}\right) D_{\varLambda ^{(n)}}, \end{aligned}$$

(48)

and by applying the QR factorization we see

$$\begin{aligned} D_{\varLambda ^{(n)}}\varLambda ^{(n)} L\left( D_{\varLambda ^{(n)}}\varLambda ^{(n)}\right) ^{-1} =P^{(n)}\varGamma ^{(n)}, \end{aligned}$$

(49)

where $P^{(n)}$ is an orthogonal matrix, $\varGamma ^{(n)}$ is an upper triangular matrix whose diagonal elements are positive. Let $D_{U}$ be an orthogonal matrix $D_{U}=\mathrm{diag}(|u_{11}|/u_{11}, \ldots ,|u_{mm}|/u_{mm})$. Then we see

$$\begin{aligned} \tilde{Q}^{(n)}= & {} XD_{\varLambda ^{(n)}}^{-1}P^{(n)}D_{U}^{-1}, \end{aligned}$$

(50)

$$\begin{aligned} \tilde{R}^{(n)}= & {} D_{U}\varGamma ^{(n)} D_{\varLambda ^{(n)}} \varLambda ^{(n)} U \end{aligned}$$

(51)

from (45), (48), and (49). Therefore, we have

$$\begin{aligned} T^{(n)}=D_{U}\left( P^{(n)}\right) ^\mathrm{T}\varLambda P^{(n)}D_{U}^{-1} \end{aligned}$$

(52)

from (38) and (41).

Since our aim is to prove the convergence of all the elements of $T^{(n)}$, let us discuss the behavior of the orthogonal matrix $P^{(n)}$ as $n\rightarrow \infty $. To this end, we focus on (49). The lower left elements are

$$\begin{aligned} \left( D_{\varLambda ^{(n)}}\varLambda ^{(n)} L\left( D_{\varLambda ^{(n)}}\varLambda ^{(n)}\right) ^{-1}\right) _{ij}=l_{ij} \prod _{l=0}^{n-1}\left| \frac{\lambda _{p(j)} -s^{(l)}}{\lambda _{p(i)}-s^{(l)}}\right| \quad (i>j) \end{aligned}$$

(53)

from (47). Obviously $\lim _{n\rightarrow \infty }(D_{\varLambda ^{(n)}}\varLambda ^{(n)} L(D_{\varLambda ^{(n)}}\varLambda ^{(n)})^{-1})_{ij}=0$, when $|\lambda _{p(i)}-s^{(\infty )}| > |\lambda _{p(j)}-s^{(\infty )}|$. Otherwise, from (40) and the condition (ii), we have

$$\begin{aligned} \left| \frac{\lambda _{p(j)}-s^{(l)}}{\lambda _{p(i)}-s^{(l)}}\right| =\left| \frac{\lambda _{p(j)}-\lambda _{p(m)} +\lambda _{p(m)}-s^{(l)}}{\lambda _{p(i)} -\lambda _{p(m)}+\lambda _{p(m)}-s^{(l)}}\right| =1+\mathrm{o}(c^{l}). \end{aligned}$$

(54)

Since a sequence of the size $\mathrm{o}(c^{l})$ with $0<c<1$ absolutely converges, $(D_{\varLambda ^{(n)}}\varLambda ^{(n)} L(D_{\varLambda ^{(n)}}\varLambda ^{(n)})^{-1})_{ij}$ represented by the infinite product (53) is convergent:

$$\begin{aligned} \lim _{n\rightarrow \infty }D_{\varLambda ^{(n)}}\varLambda ^{(n)} L\left( D_{\varLambda ^{(n)}}\varLambda ^{(n)}\right) ^{-1}=\tilde{L}. \end{aligned}$$

(55)

The resulting matrix $\tilde{L}$ is not only unit lower triangular, but also block diagonal with the block sizes at most 2, because if the equality $|\lambda _{p(k)}-s^{(\infty )}|=|\lambda _{p(k+1)}-s^{(\infty )}|$ in (40) holds, then both inequalities $|\lambda _{p(k-1)}-s^{(\infty )}|>|\lambda _{p(k)}-s^{(\infty )}|$ and $|\lambda _{p(k+1)}-s^{(\infty )}|>|\lambda _{p(k+2)}-s^{(\infty )}|$ are satisfied thanks to the fact that the eigenvalues are all distinct. Hence, the orthogonal matrix $P^{(n)}$ given by the QR factorization of $D_{\varLambda ^{(n)}}\varLambda ^{(n)} L(D_{\varLambda ^{(n)}}\varLambda ^{(n)})^{-1}$ is convergent:

$$\begin{aligned} \lim _{n\rightarrow \infty }{P}^{(n)}=\tilde{P}, \end{aligned}$$

(56)

where $\tilde{P}$ is a block diagonal matrix whose block size is at most 2. It then follows that

$$\begin{aligned} \lim _{n\rightarrow \infty }T^{(n)} =D_{U}{\tilde{P}}^\mathrm{T}\varLambda \tilde{P}D_{U}^{-1} \end{aligned}$$

(57)

from (52). Therefore, $T^{(n)}$ converges to a block diagonal matrix whose block size is at most 2. $\square $

Appendix B: Proofs of Lemmas 2 and 3

We prove here Lemmas 2 and 3 in turn.

By the discussion in Appendix A, $\lambda _{p(m-1)},\ \lambda _{p(m-2)}$ are the eigenvalues of the $2\times 2$ submatrix

$$\begin{aligned} \left( \begin{array}{c@{\quad }c} \alpha _{m-2}^{(\infty )} &{} \beta _{m-2}^{(\infty )} \\ \beta _{m-2}^{(\infty )} &{} \alpha _{m-1}^{(\infty )} \\ \end{array} \right) . \end{aligned}$$

(58)

In view of $\beta _{m-2}^{(\infty )}\not = 0$, we see $|\lambda _{p(m-2)}-\lambda _{p(m)}|=|\lambda _{p(m-1)}-\lambda _{p(m)}|$. The eigenvalues are real and distinct, which implies $(\lambda _{p(m-2)}-\lambda _{p(m)})+(\lambda _{p(m-1)}-\lambda _{p(m)})=0$. Since the sum of the eigenvalues of the matrix (58) is equal to the trace of that, we have $(\alpha _{m-2}^{(\infty )}-s^{(\infty )}) +(\alpha _{m-1}^{(\infty )}-s^{(\infty )})=0$. In other words,

$$\begin{aligned} \lim _{n\rightarrow \infty }\left( \begin{array}{c@{\quad }c} \alpha _{m-2}^{(n)}-s^{(n)} &{} \beta _{m-2}^{(n)} \\ \beta _{m-2}^{(n)} &{} \alpha _{m-1}^{(n)}-s^{(n)} \\ \end{array} \right) = \left( \begin{array}{c@{\quad }c} -D &{} C \\ C &{} D \\ \end{array} \right) \end{aligned}$$

(59)

holds for constants C and D. Obviously, the eigenvalues of the matrix (59) are $\pm \sqrt{C^2+D^2}$. It then follows that

$$\begin{aligned} \sqrt{C^2+D^2}=\left| \lambda _{p(m-2)}-\lambda _{p(m)}\right| =\left| \lambda _{p(m-1)}-\lambda _{p(m)}\right| . \end{aligned}$$

(60)

This completes the proof of Lemma 2.

Next, we we prove Lemma 3 based on Lemma 1. Actually, in Lemma 1,

$$\begin{aligned} \lim _{n\rightarrow \infty }\left| \gamma ^{(n)}\right| =\prod _{1\le i \le m-2}\left| \lambda _{p(i)}-\lambda _{p(m)}\right| >0 \end{aligned}$$

(61)

holds. We prove (61) below. If $\lim _{n\rightarrow \infty }\beta _{m-2}^{(n)}= 0$, then $\lim _{n\rightarrow \infty }\alpha _{m-1}^{(n)}= \lambda _{p(m-1)}$ holds. Therefore, we obtain

$$\begin{aligned} \lim _{n\rightarrow \infty }\left| \gamma ^{(n)}\right| =\lim _{n\rightarrow \infty }\left| d_{m-2}^{(n)}\right| =\lim _{n\rightarrow \infty }\prod _{1\le i \le m-2}\left| \lambda _{p(i)}-\lambda _{p(m)}\right| \end{aligned}$$

from (17) and (19). Next, we consider the situation $\lim _{n\rightarrow \infty }\beta _{m-2}^{(n)} \not =0$. It is easy to see that, if $\lim _{n\rightarrow \infty }\beta _{m-2}^{(n)} \not =0$, then $\lim _{n\rightarrow \infty }\beta _{m-3}^{(n)}= 0$ because the block size of $T^{(\infty )}$ is at most 2. Combining it with (17), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }d_{m-3}^{(n)}=\prod _{1\le i\le m-3}\left( \lambda _{p(i)}-\lambda _{p(m)}\right) \not =0. \end{aligned}$$

(62)

In addition, we see

$$\begin{aligned} \lim _{n\rightarrow \infty }(\gamma ^{(n)})^2=\lim _{n\rightarrow \infty }\left[ \left( d_{m-2}^{(n)}\right) ^2 +\left( \beta _{m-2}^{(n)}d_{m-3}^{(n)}\right) ^2\right] \end{aligned}$$

(63)

in (19). Noting $\beta _{m-3}^{(\infty )}=0$ and (17), we see that

$$\begin{aligned} \lim _{n\rightarrow \infty }|\gamma ^{(n)}|= & {} \lim _{n\rightarrow \infty }\sqrt{\left( \alpha _{m-2}^{(n)}-s^{(n)}\right) ^{2} +\left( \beta _{m-2}^{(n)}\right) ^{2}}|d_{m-3}^{(n)}|\nonumber \\= & {} \sqrt{\left( \alpha _{m-2}^{(\infty )}-\lambda _{p(m)}\right) ^2 +\left( \beta _{m-2}^{(\infty )}\right) ^2}\prod _{1\le i \le m-3}|\lambda _{p(i)}-\lambda _{p(m)}| \end{aligned}$$

(64)

holds. Therefore, we obtain (61) from (59) and (60).

Obviously,

$$\begin{aligned} \lim _{n\rightarrow \infty }d_{m-1}^{(n)}=\lim _{n\rightarrow \infty }\prod _{1\le i \le m-1}\left( \lambda _{p(i)}-s^{(n)}\right) \not =0 \end{aligned}$$

(65)

holds. Moreover, if $D\not =0$, then

$$\begin{aligned} \left| d_{m}^{(n)}\right| \sim \frac{\left| \beta _{m-2}^{(n)}\right| ^2\left| \beta _{m-1}^{(n)}\right| ^2 \left| d_{m-3}^{(n)}\right| }{|D|} \end{aligned}$$

(66)

in the same way as (22).

From (18), (61), (65), and (66), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\left| \beta _{m-1}^{(n+1)}\right| }{\left| \beta _{m-2}^{(n)}\right| ^2\left| \beta _{m-1}^{(n)}\right| ^3} =\frac{\left| d_{m-3}^{(\infty )}\right| }{|D|\left| \lambda _{p(m-1)} -\lambda _{p(m)}\right| ^2\prod _{1\le i \le m-2}\left| \lambda _{p(i)}-\lambda _{p(m)}\right| }. \end{aligned}$$

(67)

Noting $|\rho _{1}|$ is defined as (25), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\left| \beta _{m-1}^{(n+1)}\right| }{\left| \beta _{m-2}^{(n)}\right| ^2\left| \beta _{m-1}^{(n)}\right| ^3} =\frac{\left| d_{m-3}^{(\infty )}\right| }{\left| \rho _{1}\right| \left| \lambda _{p(m-1)} -\lambda _{p(m)}\right| ^3\prod _{1\le i \le m-2}\left| \lambda _{p(i)}-\lambda _{p(m)}\right| } \end{aligned}$$

(68)

from (67).

We investigate the behavior of $|d_{m-3}^{(n)}|$. First of all, we assume $\beta _{m-3}^{(\infty )}\not =0$. Then $\beta _{m-4}^{(\infty )}=0$ and $\beta _{m-2}^{(\infty )}=0$. Similarly to (59) and (60),

$$\begin{aligned} \left| \alpha _{m-3}^{(\infty )}-\lambda _{p(m)}\right| ^2 +\left| \beta _{m-3}^{(\infty )}\right| ^2=\left| \lambda _{p(m-3)} -\lambda _{p(m)}\right| ^2=\left| \lambda _{p(m-2)}-\lambda _{p(m)}\right| ^2 \end{aligned}$$

(69)

holds. Noting $\beta _{m-4}^{(\infty )}=0$, we have

$$\begin{aligned} \left| d_{m-3}^{(\infty )}\right|= & {} \left| \alpha _{m-3}^{(\infty )}-\lambda _{p(m)}\right| \prod _{1\le i \le m-4}\left| \lambda _{p(i)}-\lambda _{p(m)}\right| \\= & {} \sqrt{\left| \lambda _{p(m-3)}-\lambda _{p(m)}\right| ^2 -\left| \beta _{m-3}^{(\infty )}\right| ^2}\prod _{1\le i \le m-4}\left| \lambda _{p(i)}-\lambda _{p(m)}\right| \end{aligned}$$

in the same way as (59) and (60). Moreover, noting $\left| \lambda _{p(m-3)}-\lambda _{p(m)}\right| =\left| \lambda _{p(m-2)}\right. \left. -\lambda _{p(m)}\right| $, we obtain

$$\begin{aligned} \left| d_{m-3}^{(\infty )}\right| = \sqrt{1-\left| \beta _{m-3}^{(\infty )}\right| ^2/\left| \lambda _{p(m-2)} -\lambda _{p(m)}\right| ^2}\prod _{1\le i \le m-3}\left| \lambda _{p(i)}-\lambda _{p(m)}\right| . \end{aligned}$$

(70)

Actually, it is easy to see that (70) covers the case $\beta _{m-3}^{(\infty )}=0$. Therefore, we obtain

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\left| \beta _{m-1}^{(n+1)}\right| }{\left| \beta _{m-2}^{(n)}\right| ^2\left| \beta _{m-1}^{(n)}\right| ^3}= & {} \frac{\sqrt{1-\left| \beta _{m-3}^{(\infty )}\right| ^2/\left| \lambda _{p(m-2)} -\lambda _{p(m)}\right| ^2}}{\left| \rho _{1}\right| \left| \lambda _{p(m-1)} -\lambda _{p(m)}\right| ^3\left| \lambda _{p(m-2)}-\lambda _{p(m)}\right| }\nonumber \\= & {} \frac{\left| \rho _{2}\right| }{\left| \rho _{1}\right| \left| \lambda _{p(m-1)} -\lambda _{p(m)}\right| ^3\left| \lambda _{p(m-2)}-\lambda _{p(m)}\right| }, \end{aligned}$$

(71)

where the first equality is due to (68) and (70), the second equality is due to the definition of $|\rho _{2}|$ in (26). This completes the proof of (24).

The final task is to derive (28) in the case $\alpha _{m-1}^{(\infty )}=\alpha _{m}^{(\infty )}$. In (21), $|\beta _{m-1}^{(n)}|/|\alpha _{m-1}^{(n)}-s^{(n)}|\le 1$ in view of the definition of Wilkinson’s shift. Hence,

$$\begin{aligned} \limsup _{n\rightarrow \infty }\frac{\left| d_{m}^{(n)}\right| }{\left| \beta _{m-1}^{(n)}\right| }\le \left| \beta _{m-2}^{(\infty )}\right| ^2\left| d_{m-3}^{(\infty )}\right| \end{aligned}$$

from (21). Also noting $\left| \beta _{m-2}^{(\infty )}\right| =\left| \lambda _{p(m-1)} -\lambda _{p(m)}\right| =\left| \lambda _{p(m-2)}-\lambda _{p(m)}\right| $, $\beta _{m-3}^{(\infty )}=0$, we have

$$\begin{aligned} \limsup _{n\rightarrow \infty }\frac{\left| d_{m}^{(n)}\right| }{\left| \beta _{m-1}^{(n)}\right| }\le \prod _{1\le i \le m-1}\left| \lambda _{p(i)}-\lambda _{p(m)}\right| . \end{aligned}$$

Therefore, we obtain (28) from (18), (61), and (65).

Appendix C: Convergence analysis based on perturbation theory

In this section, using perturbation theory we prove Corollary 1. In other words, we prove the following facts:

if $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-\alpha _{m}^{(n)})=0$, then $|\beta _{m-1}^{(n+1)}|=\mathrm{O}(|\beta _{m-1}^{(n)}|^2)$;
if $\lim _{n\rightarrow \infty }(\alpha _{m-1}^{(n)}-\alpha _{m}^{(n)})=D\not =0$, then $|\beta _{m-1}^{(n+1)}|=\mathrm{O}(|\beta _{m-2}^{(n)}|^2|\beta _{m-1}^{(n)}|^3)$.

In what follows, we use the so-called gap theorem [7, Theorem 11.7.1].

Lemma 4

([7]) Let y be a unit vector, A be a symmetric matrix, and $\lambda _{l}$ be the eigenvalue of A closest to $y^\mathrm{T}Ay$. Then

$$\begin{aligned} \left| y^\mathrm{T}Ay-\lambda _{l}\right| \le \frac{\left\| Ay-\left( y^\mathrm{T}Ay\right) y\right\| ^2}{\min _{i\not =l}\left| \lambda _{i}-y^\mathrm{T}Ay\right| } \end{aligned}$$

holds.

First of all, we note

$$\begin{aligned} \left| \beta _{m-1}^{(n+1)}\right| \sim \frac{\left| \lambda _{p(m)}-s^{(n)}\right| }{\left| \lambda _{p(m-1)} -s^{(n)}\right| }\left| \beta _{m-1}^{(n)}\right| \end{aligned}$$

(72)

as $n\rightarrow \infty $ for the general shifts satisfying $\beta _{m-1}^{(\infty )}=0$ and the condition (i) in Appendix A. Although this fact might be noticed by the experts, the authors do not know the literature where its proof is explicitly written. Hence, we prove (72) below. Recall that $T_{k}^{(n)}$ for $k=1,\ldots ,m$ are the $k \times k$ leading principal submatrix of $T^{(n)}$ and $d_{k}^{(n)}=\det (T_{k}^{(n)}-s^{(n)}I)$ for $k=1,\ldots ,m$ defined in (17). Then, (65) holds for the general shifts in view of the condition (i) and $\lim _{n\rightarrow \infty }\beta _{m-1}^{(n)}= 0$. Obviously, $d_{m}^{(n)}=\prod _{1\le i \le m}(\lambda _{p(i)}-s^{(n)})$. Furthermore, noting that $\gamma ^{(n)}$ in (19) is bounded and $\lim _{n\rightarrow \infty }\beta _{m-1}^{(n)}= 0$, in (18) we have

$$\begin{aligned} \left| \beta _{m-1}^{(n+1)}\right| \sim \frac{\left| (\lambda _{p(m)}-s^{(n)})\gamma ^{(n)}\right| }{\prod _{1\le i \le m-1}\left| \lambda _{p(i)}-s^{(n)}\right| }\left| \beta _{m-1}^{(n)}\right| \end{aligned}$$

(73)

as $n\rightarrow \infty $. From (61), we have (72).

For the discussion below, we describe the relation (72) more precisely. For any $\epsilon _{1} >0$,

$$\begin{aligned} \left| \beta _{m-1}^{(n+1)}\right| \le \frac{\left| \lambda _{p(m)}-s^{(n)}\right| }{\left| \lambda _{p(m-1)} -s^{(n)}\right| }\left| \beta _{m-1}^{(n)}\right| \left( 1+\epsilon _{1}\right) \end{aligned}$$

(74)

holds for all sufficiently large n.

In order to reveal the convergence rate, let us estimate $|\lambda _{p(m)}-s^{(n)}|$. To this end, suppose we apply one step of the Jacobi method for the lower right 2-by-2 submatrix of $T^{(n)}$. Then we see from [7, Chapter9] that the angle $\theta ^{(n)}$ of Givens rotation for annihilating $\beta _{m-1}^{(n)}$ satisfies

$$\begin{aligned} \tan \left( 2\theta ^{(n)}\right) =\frac{2\beta _{m-1}^{(n)}}{\alpha _{m-1}^{(n)} -\alpha _{m}^{(n)}}, \end{aligned}$$

(75)

where $\theta ^{(n)}$ is chosen in the interval $[-\pi /4,\ \pi /4]$. It means that the transformed matrix can be described as

$$\begin{aligned} \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} \ddots &{} \ddots &{} &{} &{} \\ \ddots &{} * &{} * &{} &{} \\ &{} * &{} * &{} * &{} z^{(n)} \\ &{} &{} * &{} * &{} 0 \\ &{} &{} z^{(n)} &{} 0 &{} s^{(n)} \\ \end{array} \right) , \end{aligned}$$

where

$$\begin{aligned} z^{(n)}=\beta _{m-2}^{(n)}\sin (\theta ^{(n)}). \end{aligned}$$

(76)

Note that $s^{(\infty )}=\lambda _{p(m)}$. Let

$$\begin{aligned} \delta =\min _{i\not =m}\left| \lambda _{p(i)}-s^{(\infty )}\right| >0. \end{aligned}$$

(77)

For any $\epsilon _{2} >0$, noting Lemma 4 with $y:=(0,0,\ldots ,1)^\mathrm{T}$, we have

$$\begin{aligned} \left| \lambda _{p(m)}-s^{(n)}\right| \le \frac{|z^{(n)}|^{2}}{\delta -\epsilon _{2}} \end{aligned}$$

(78)

for all sufficiently large n. In addition, it is easy to see that, for any $\epsilon _{3} > 0$,

$$\begin{aligned} \left| \lambda _{p(m-1)}-s^{(n)}\right| \ge \delta - \epsilon _{3} \end{aligned}$$

(79)

for all sufficiently large n.

Now we consider the case of $|\alpha _{m-1}^{(n)} - \alpha _m^{(n)}| \rightarrow D \ne 0$. We obtain

$$\begin{aligned} \left| \beta _{m-1}^{(n+1)}\right|\le & {} \frac{\left| \beta _{m-1}^{(n)}\right| \left| z^{(n)}\right| ^{2}\left| 1+\epsilon _{1}\right| }{\left| \delta -\epsilon _{2}\right| \left| \delta -\epsilon _{3}\right| } \nonumber \\= & {} \frac{\left| \beta _{m-1}^{(n)}\right| \left| \beta _{m-2}^{(n)} \sin (\theta ^{(n)})\right| ^{2}\left| 1+\epsilon _{1}\right| }{\left| \delta -\epsilon _{2}\right| \left| \delta -\epsilon _{3}\right| } \nonumber \\= & {} \frac{\left| \beta _{m-1}^{(n)}\right| ^{3}\left| \beta _{m-2}^{(n)}\right| ^{2}\left| 1 +\epsilon _{1}\right| }{\left| \alpha _{m-1}^{(n)}-\alpha _{m}^{(n)}\right| ^{2}\left| \delta -\epsilon _{2}\right| \left| \delta -\epsilon _{3}\right| } \end{aligned}$$

(80)

for all sufficiently large n by using (74), (79), (78), (76), (75) in turn. We see $\epsilon _{1}, \epsilon _{2}, \epsilon _{3} \rightarrow 0$ and $|\alpha _{m-1}^{(n)}-\alpha _{m}^{(n)}| \rightarrow |D| > 0$ as $n\rightarrow \infty $. Therefore, $|\beta _{m-1}^{(n+1)}|=\mathrm{O}(|\beta _{m-2}^{(n)}|^{2}|\beta _{m-1}^{(n)}|^{3})$.

Finally, we prove the quadratic convergence in the case $|\alpha _{m-1}^{(n)}-\alpha _{m}^{(n)}|\rightarrow 0$. Since the estimate (80) by the Jacobi transformation cannot derive the quadratic convergence in the case $|\alpha _{m-1}^{(n)}-\alpha _{m}^{(n)}|\rightarrow 0$, we give another estimate based on Lemma 4. For any $\epsilon _{4} >0$, we see

$$\begin{aligned} \left| \lambda _{p(m)}-\alpha _{m}^{(n)}\right| \le \frac{\left| \beta _{m-1}^{(n)}\right| ^{2}}{\delta -\epsilon _{4}} \end{aligned}$$

(81)

for all sufficiently large n from Lemma 4 with $y:=(0,0,\ldots ,1)^\mathrm{T}$, $A:=T^{(n)}$. From the definition of Wilkinson’s shift, $|\alpha _{m}^{(n)}-s^{(n)}|\le |\beta _{m-1}^{(n)}|$ holds. Hence,

$$\begin{aligned} \left| \lambda _{p(m)}-s^{(n)}\right| \le \left| \lambda _{p(m)}-\alpha _{m}^{(n)}\right| +\left| \alpha _{m}^{(n)}-s^{(n)}\right| \le \frac{\left| \beta _{m-1}^{(n)}\right| ^{2}}{\delta -\epsilon _{4}}+\left| \beta _{m-1}^{(n)}\right| \end{aligned}$$

(82)

for all sufficiently large n. Thus, we have

$$\begin{aligned} \left| \beta _{m-1}^{(n+1)}\right| \le \frac{\left| \beta _{m-1}^{(n)}\right| ^{2}\left( \delta -\epsilon _{4}+\beta _{m-1}^{(n)}\right) \left| 1+\epsilon _{1}\right| }{\left| \delta -\epsilon _{3}\right| \left| \delta -\epsilon _{4}\right| } \end{aligned}$$

for all sufficiently large n from (74), (79), (82). We see $\epsilon _{3}, \epsilon _{4} \rightarrow 0$ and $\beta _{m-1}^{(n)} \rightarrow 0$ as $n\rightarrow \infty $. Therefore, $|\beta _{m-1}^{(n+1)}|=\mathrm{O}(|\beta _{m-1}^{(n)}|^{2})$. This completes the proof.

Although the convergence analysis above is readily accessible to the readers in the research fields of the numerical linear algebra, we also note that the right-hand side of (80) in our proof is an overestimate in the case of $|\lambda _{p(m-1)}-\lambda _{p(m)}|< |\lambda _{p(m-2)}-\lambda _{p(m)}|$ because we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{\left| \alpha _{m-1}^{(n)}-\alpha _{m}^{(n)}\right| ^{2}\left| \delta \right| ^2}= & {} \frac{1}{|D|^2\left| \lambda _{p(m-1)}-\lambda _{p(m)}\right| ^2}\nonumber \\> & {} \frac{1}{|D|\left| \lambda _{p(m-2)} -\lambda _{p(m)}\right| \left| \lambda _{p(m-1)}-\lambda _{p(m)}\right| ^2}\nonumber \\= & {} \frac{1}{\left| \lambda _{p(m-1)} -\lambda _{p(m)}\right| ^3\left| \lambda _{p(m-2)}-\lambda _{p(m)}\right| \left| \rho _{1}\right| }\nonumber \\\ge & {} \frac{\left| \rho _{2}\right| }{\left| \lambda _{p(m-1)} -\lambda _{p(m)}\right| ^3\left| \lambda _{p(m-2)}-\lambda _{p(m)}\right| \left| \rho _{1}\right| }\nonumber \\= & {} \lim _{n\rightarrow \infty }\frac{\left| \beta _{m-1}^{(n+1)}\right| }{\left| \beta _{m-2}^{(n)}\right| ^2\left| \beta _{m-1}^{(n)}\right| ^3}, \end{aligned}$$

(83)

where the first equality is due to the assumption $\alpha _{m-1}^{(\infty )}-\alpha _{m}^{(\infty )}=D$ and the definition of $\delta $ in (77), the next inequality is due to (27) and the above condition $|\lambda _{p(m-1)}-\lambda _{p(m)}|< |\lambda _{p(m-2)}-\lambda _{p(m)}|$, the next equality is due to (25), the next inequality is due to (26), and the last equality is due to (24). Also note that, in the case of $\beta _{m-2}^{(\infty )}=C\not =0$, the above relation (83) holds because we have the strict inequality in (83) from (60). Therefore, the right-hand side of (80) is an overestimate.

About this article

Cite this article

Aishima, K. A note on the convergence theorem of the tridiagonal QR algorithm with Wilkinson’s shift. Japan J. Indust. Appl. Math. 32, 465–487 (2015). https://doi.org/10.1007/s13160-015-0171-y

Download citation

Received: 02 December 2014
Revised: 23 March 2015
Published: 12 June 2015
Issue Date: July 2015
DOI: https://doi.org/10.1007/s13160-015-0171-y

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

A note on the convergence theorem of the tridiagonal QR algorithm with Wilkinson’s shift

Abstract

Similar content being viewed by others

Global Convergence of Hessenberg Shifted QR I: Exact Arithmetic

Global convergence of the restarted Lanczos and Jacobi–Davidson methods for symmetric eigenvalue problems

Convergence Acceleration of Shifted LR Transformations for Totally Nonnegative Hessenberg Matrices

1 Introduction

Remark 1

Remark 2

2 QR algorithm with Wilkinson’s shift

Theorem 1

Theorem 2

3 Convergence rate analysis by Hoffmann–Parlett [1]

Theorem 3

Corollary 1

4 A numerical experiment

5 Convergence rate analysis by Jiang-Zhang [3]

Lemma 1

6 Convergence theorem covering the new limiting pattern

Lemma 2

Lemma 3

Theorem 4

Proof

Remark 3

7 Conclusions and future works

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendices

Appendix A: Proof of convergence

Theorem 5

Proof

Appendix B: Proofs of Lemmas 2 and 3

Appendix C: Convergence analysis based on perturbation theory

Lemma 4

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation