
1 Introduction and Main Result

Gershgorin’s circle theorem [4] is a fundamental and widely used result on localizing the eigenvalues of square matrices. It states that all eigenvalues lie in disks (called Gershgorin disks) centred at the diagonal elements.

The main goal of this paper is to improve Gershgorin’s theorem under special conditions, namely, when the matrix is non-negative and has a multiple eigenvalue. We show that such an eigenvalue lies in a disk of smaller radius around a diagonal element. For the proof we establish various geometric inequalities concerning rearrangements of vector sums. This is an interesting connection between convex geometry and matrix theory. The geometric point of view in eigenvalue problems is certainly not new, but this particular connection seems to be new.

Here we show that if the matrix entries are non-negative and an eigenvalue has geometric multiplicity at least two, then this eigenvalue lies in a smaller disk.

Let D(a, r) denote the disk with center a and radius r on the complex plane:

$$\displaystyle{D(a,r) = \left \{x \in \mathbb{C}: \vert x - a\vert \leqslant r\right \}.}$$

For an n × n complex matrix, A = [a ij ], the Gershgorin disks are D(a ii , R i ) where \(R_{i} =\sum _{j:j\neq i}\vert a_{ij}\vert \). The most commonly cited form of Gershgorin’s theorem says that every eigenvalue of A lies in some D(a ii , R i ). Varga’s nice book Gershgorin and His Circles [15] surveys various applications and extensions of this important theorem. An interesting and recent theorem of Marsli and Hall [7] states that if an eigenvalue of a matrix A has geometric multiplicity k, then it lies in at least k of the Gershgorin disks of A. They have extended this result in subsequent papers [3, 8–10]. Here we focus on the k = 2 case for non-negative matrices.
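As a quick illustration, here is a minimal numpy sketch (ours, not part of the original argument) that checks the classical statement on a random complex matrix; the tolerance guards against floating point error.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))

# Gershgorin radii: R_i is the sum of |a_ij| over j != i.
R = np.abs(A).sum(axis=1) - np.abs(np.diag(A))

# Every eigenvalue must lie in at least one disk D(a_ii, R_i).
for lam in np.linalg.eigvals(A):
    assert np.any(np.abs(lam - np.diag(A)) <= R + 1e-9)
```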

Understanding the spectrum of a matrix is a central question both in applied and pure mathematics. Here are some facts and results. There are two particular eigenvalues for which the multiplicity is of great importance: the largest eigenvalue, which determines the spectral radius of the matrix, and the eigenvalue “0”, whose multiplicity determines the rank of the matrix. There are also applications using the smallest eigenvalue. For example, Roy shows in [12] that the Euclidean representation number of a graph is closely related to the multiplicity of the smallest eigenvalue. The multiplicities of the largest and the second largest eigenvalues play a key role in some numerical methods. Del Corso [2] considers the problem of approximating an eigenvector belonging to the largest eigenvalue by the so-called power method. It is proved that the rate of convergence depends on the ratio of the two largest eigenvalues and on their multiplicities: the rate increases with the multiplicity of the largest eigenvalue and decreases with the multiplicity of the second. In graph theory the Colin de Verdière number is the multiplicity of the second largest eigenvalue of the adjacency matrix, maximized over weightings of the edges and nodes. For more details and the exact definition we refer to the papers [6] and [14].

Gershgorin’s circle theorem is intertwined with the Perron–Frobenius theory. It is one of the tools used to bound the spectral radius of a matrix. It follows from the Perron–Frobenius theorem that the largest magnitude eigenvalue of any non-negative matrix is a non-negative real number; see e.g. [1].

Let us define the half Gershgorin disks, D(a ii , r i ), which are subsets of the original ones. Instead of \(R_{i} =\sum _{j:j\neq i}\vert a_{ij}\vert \) we take the partial sum of the ⌊n∕2⌋ largest terms. This sum is denoted by r i .

Recall that the geometric multiplicity of an eigenvalue λ of A is the dimension of the corresponding eigenspace of A, that is, of the kernel of A − λI. (Its algebraic multiplicity is the multiplicity of the root λ of the polynomial det(A − xI).)

We are going to show that an eigenvalue with geometric multiplicity at least two lies in a smaller Gershgorin disk when the matrix is non-negative.

Theorem 1

Let A = [a ij ] be an n × n non-negative (real) matrix and λ an eigenvalue of A with geometric multiplicity at least two. Then λ lies in a half Gershgorin disk, D(a ii , r i ), for some i.

Actually we are going to prove that such an eigenvalue lies in the disk D(a ii , r) for various values of r and some suitable i. The proofs are based on geometric estimates that are of independent interest; they are given in the next section.
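Before turning to the geometric estimates, here is a small numerical sketch of Theorem 1 (ours, for illustration only). Duplicating a non-negative block along the diagonal forces every eigenvalue to have geometric multiplicity at least two, so each eigenvalue must land in some half Gershgorin disk:

```python
import numpy as np

rng = np.random.default_rng(1)
C = rng.random((4, 4))              # random non-negative block
Z = np.zeros((4, 4))
A = np.block([[C, Z], [Z, C]])      # every eigenvalue now has multiplicity >= 2
n = A.shape[0]

# r_i: sum of the floor(n/2) largest off-diagonal entries of row i.
def half_radius(i):
    off = np.delete(A[i], i)
    return np.sort(off)[::-1][: n // 2].sum()

r = np.array([half_radius(i) for i in range(n)])
for lam in np.linalg.eigvals(A):
    assert np.any(np.abs(lam - np.diag(A)) <= r + 1e-9)   # Theorem 1
```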

2 Rearrangement Inequalities for Vectors

Assume \(V =\{ v_{1},\ldots,v_{n}\} \subset \mathbb{R}^{d}\) and \(\sum _{1}^{n}v_{i} = 0\). Further, let \(\alpha _{1} \geq \cdots \geq \alpha _{n} \geq 0\) be real numbers. We write [n] for the set {1, …, n}.

Theorem 2

Under the above conditions set \(\beta =\alpha _{\lfloor n/2\rfloor +1}\). Then for every permutation σ of [n]

$$\displaystyle{\left \|\sum _{1}^{n}\alpha _{ i}v_{\sigma (i)}\right \| \leq \max _{i\in [n]}\|v_{i}\|\sum _{1}^{n}\vert \alpha _{ i} -\beta \vert.}$$

Corollary 1

Under the above conditions, for every permutation σ of [n]

$$\displaystyle{\left \|\sum _{1}^{n}\alpha _{ i}v_{\sigma (i)}\right \| \leq \max _{i\in [n]}\|v_{i}\|\sum _{1}^{\lfloor n/2\rfloor }\alpha _{ i}.}$$
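The following sketch (ours, on random data satisfying the hypotheses) tests both bounds; the loop asserts the inequality of Theorem 2 for many permutations, and the final line checks that the Corollary 1 bound is never smaller.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 7, 3
V = rng.standard_normal((n, d))
V -= V.mean(axis=0)                       # enforce sum v_i = 0
alpha = np.sort(rng.random(n))[::-1]      # alpha_1 >= ... >= alpha_n >= 0

beta = alpha[n // 2]                      # beta = alpha_{floor(n/2)+1} (0-based index)
vmax = np.linalg.norm(V, axis=1).max()
bound_thm2 = vmax * np.abs(alpha - beta).sum()   # Theorem 2
bound_cor1 = vmax * alpha[: n // 2].sum()        # Corollary 1

for _ in range(1000):
    sigma = rng.permutation(n)
    assert np.linalg.norm(alpha @ V[sigma]) <= bound_thm2 + 1e-9
assert bound_thm2 <= bound_cor1 + 1e-9
```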

In the second geometric estimate we need a technical assumption.

Theorem 3

Let \(V =\{ v_{1},\ldots,v_{n}\} \subset \mathbb{R}^{d}\) satisfy the previous assumption. Suppose further that the v i are ordered by decreasing (Euclidean) length, that is, \(\|v_{1}\| \geq \cdots \geq \| v_{n}\|\). Let γ ∈ [α j+1, α j ] for some j ∈ [n − 1]. Then for every permutation σ of [n]

$$\displaystyle{\left \|\sum _{1}^{n}\alpha _{ i}v_{\sigma (i)}\right \| \leq \sum _{1}^{j}\alpha _{ i}\|v_{i}\| - \frac{\gamma } {2}\left [\sum _{1}^{j}\|v_{ i}\| -\sum _{j+1}^{n}\|v_{ i}\|\right ].}$$

Here of course one wants to choose j and γ so that the right hand side is as small as possible. When j = ⌈n∕2⌉, the sum between the brackets is non-negative. Choosing any γ from the interval [α j+1, α j ] gives the following.

Corollary 2

Under the above conditions for every permutation σ of [n]

$$\displaystyle{\left \|\sum _{1}^{n}\alpha _{ i}v_{\sigma (i)}\right \| \leq \sum _{1}^{\lceil n/2\rceil }\alpha _{ i}\|v_{i}\|.}$$

We mention that the estimates in Theorems 2 and 3 are incomparable; sometimes the first, other times the second gives the better bound.
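This incomparability is easy to observe numerically. In the sketch below (ours) both bounds are computed for random data, with the Theorem 3 bound optimized over j and over the endpoints of [α j+1, α j ] (the right hand side is linear in γ, so an endpoint is optimal); rerunning with different seeds makes either bound win.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 6, 2
V = rng.standard_normal((n, d))
V -= V.mean(axis=0)                              # sum v_i = 0
V = V[np.argsort(-np.linalg.norm(V, axis=1))]    # decreasing length, as in Theorem 3
lens = np.linalg.norm(V, axis=1)
alpha = np.sort(rng.random(n))[::-1]

bound_thm2 = lens.max() * np.abs(alpha - alpha[n // 2]).sum()

def thm3_bound(j, gamma):                        # right hand side of Theorem 3
    return alpha[:j] @ lens[:j] - 0.5 * gamma * (lens[:j].sum() - lens[j:].sum())

bound_thm3 = min(thm3_bound(j, g) for j in range(1, n)
                 for g in (alpha[j], alpha[j - 1]))
print(bound_thm2, bound_thm3)   # either can be the smaller one, depending on the data
```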

3 Proof of the Rearrangement Inequalities

Proof of Theorem 2

First fix some γ ≥ 0. Then

$$\displaystyle{\sum _{1}^{n}\alpha _{ i}v_{\sigma (i)} =\sum _{ 1}^{n}\alpha _{ i}v_{\sigma (i)} -\sum _{1}^{n}\gamma v_{\sigma (i)} =\sum _{ 1}^{n}(\alpha _{ i}-\gamma )v_{\sigma (i)}.}$$

By the triangle inequality

$$\displaystyle{\left \|\sum _{1}^{n}\alpha _{ i}v_{\sigma (i)}\right \| \leq \max _{i\in [n]}\|v_{i}\|\sum _{1}^{n}\vert \alpha _{ i} -\gamma \vert.}$$

Set k = ⌊n∕2⌋ and define β = α k+1. The function \(\gamma \mapsto \sum _{1}^{n}\vert \alpha _{i}-\gamma \vert \) is minimized at the median of the α i : its minimum is taken at γ = β when n is odd, and at every γ from the interval [α k+1, α k ] when n is even. □

Corollary 1 follows immediately since with the above k and β

$$\displaystyle\begin{array}{rcl} \sum _{1}^{n}\vert \alpha _{ i} -\beta \vert & =& \sum _{1}^{k}(\alpha _{ i}-\beta ) +\sum _{ k+1}^{n}(\beta -\alpha _{ i}) {}\\ & =& \sum _{1}^{k}\alpha _{ i} -\sum _{\ell}^{n}\alpha _{ i} \leq \sum _{1}^{k}\alpha _{ i} {}\\ \end{array}$$

where ℓ equals k + 1 for even n and k + 2 for odd n.
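The median-minimization step used above can be verified on toy data (a sketch of ours, not needed for the proof):

```python
import numpy as np

alpha = np.array([5.0, 3.0, 2.0, 0.5, 0.1])          # alpha_1 >= ... >= alpha_5
grid = np.linspace(0.0, 6.0, 60001)
values = np.abs(alpha[None, :] - grid[:, None]).sum(axis=1)
print(grid[np.argmin(values)], np.median(alpha))     # both are 2.0 = alpha_{k+1}
```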

Proof of Theorem 3

The zonotope Z(V ) spanned by V is, by definition, the set

$$\displaystyle{Z(V ) = \left \{\sum _{i\in [n]}\xi _{i}v_{i}: 0 \leq \xi _{i} \leq 1\;(\forall i)\right \}.}$$

Let B denote the Euclidean unit ball of \(\mathbb{R}^{d}\). We claim first that

$$\displaystyle{ Z(V ) \subset \frac{1} {2}\Big(\|v_{1}\| + \cdots +\| v_{n}\|\Big)B. }$$
(1)

It is well-known [11] and easy to check that Z(V ) is the convex hull of the points \(s(W) =\sum _{v\in W}v\) where W ⊆ V. Thus it suffices to show that for every W ⊆ V, \(\|s(W)\| \leq \frac{1} {2}(\|v_{1}\| +\ldots +\|v_{n}\|)\). Fix U ⊆ V such that s(U) has maximal length among all s(W). Set z = s(U) and observe that − z = s(V ∖ U) as s(V ) = 0. Since ∥z∥ = ∥− z∥ evidently, we have

$$\displaystyle{2\|z\| =\| z\| +\| -z\| =\| s(U)\| +\| s(V \setminus U)\| \leq \sum _{1}^{n}\|v_{ i}\|}$$

by the triangle inequality. This implies that \(\|z\| \leq \frac{1} {2}\sum _{1}^{n}\|v_{ i}\|\).
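Since the vertices of Z(V ) are exactly the subset sums s(W), inequality (1) can be confirmed by brute force for small n (our sketch, on random zero-sum data):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n, d = 8, 3
V = rng.standard_normal((n, d))
V -= V.mean(axis=0)                                  # sum v_i = 0, as assumed

half_sum = 0.5 * np.linalg.norm(V, axis=1).sum()
# every vertex s(W) of the zonotope must satisfy ||s(W)|| <= half_sum
for size in range(n + 1):
    for W in combinations(range(n), size):
        assert np.linalg.norm(V[list(W)].sum(axis=0)) <= half_sum + 1e-9
```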

We observe next that

$$\displaystyle\begin{array}{rcl} \sum _{1}^{n}\alpha _{ i}v_{\sigma (i)}& =& \sum _{1}^{j}(\alpha _{ i}-\gamma )v_{\sigma (i)} +\sum _{ 1}^{j}\gamma v_{\sigma (i)} +\sum _{ j+1}^{n}\alpha _{ i}v_{\sigma (i)} {}\\ & =& \sum _{1}^{j}(\alpha _{ i}-\gamma )v_{\sigma (i)} +\gamma \left [\sum _{1}^{j}v_{\sigma (i)} +\sum _{ j+1}^{n}\frac{\alpha _{i}} {\gamma } v_{\sigma (i)}\right ]. {}\\ \end{array}$$

The expression between the brackets is a vector u in Z(V ), since every coefficient lies in [0, 1]: for i > j we have α i ∕γ ≤ 1 because γ ≥ α j+1 ≥ α i . Hence \(\|u\| \leq \frac{1} {2}\sum _{1}^{n}\|v_{ i}\|\). By the triangle inequality the norm of \(\sum _{1}^{n}\alpha _{i}v_{\sigma (i)}\) is at most

$$\displaystyle\begin{array}{rcl} \sum _{1}^{j}(\alpha _{ i}-\gamma )\|v_{\sigma (i)}\| +\gamma \| u\|& \leq & \sum _{1}^{j}(\alpha _{ i}-\gamma )\|v_{i}\| + \frac{\gamma } {2}\sum _{1}^{n}\|v_{ i}\| {}\\ & =& \sum _{1}^{j}\alpha _{ i}\|v_{i}\| - \frac{\gamma } {2}\left [\sum _{1}^{j}\|v_{ i}\| -\sum _{j+1}^{n}\|v_{ i}\|\right ]. {}\\ \end{array}$$

4 Proof of Theorem 1

We first recall the simple proof of Gershgorin’s original theorem. Let v = (v 1, …, v n ) be an eigenvector with eigenvalue λ, where the v i are complex numbers. Assume \(\vert v_{i}\vert =\max _{j\in [n]}\vert v_{j}\vert \). Then \(\sum _{j=1}^{n}a_{ij}v_{j} =\lambda v_{i}\), implying

$$\displaystyle{ (\lambda -a_{ii})v_{i} =\sum _{j:j\neq i}a_{ij}v_{j}. }$$
(2)

Taking absolute values on both sides and using | v i | ≥ | v j | shows that λ ∈ D(a ii , R i ) with \(R_{i} =\sum _{j:j\neq i}\vert a_{ij}\vert \) indeed.

When the eigenvalue λ has geometric multiplicity at least two, its eigenspace contains a nonzero vector v = (v 1, …, v n ) whose components sum to zero: \(\sum _{1}^{n}v_{i} = 0\). Indeed, let u and w be two linearly independent eigenvectors from the eigenspace of λ. If \(\sum _{1}^{n}u_{i} = 0\), then v = u is a suitable eigenvector. If not, then \(v = \left (\sum _{1}^{n}w_{i}\right )u -\left (\sum _{1}^{n}u_{i}\right )w\) has the required property; it is nonzero because u and w are independent and the coefficient of w is nonzero.
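This two-line construction translates directly into code; the block-diagonal toy matrix below is our own minimal example of a non-negative matrix with a double eigenvalue.

```python
import numpy as np

# u, w: two linearly independent eigenvectors for the same eigenvalue lambda.
def zero_sum_eigenvector(u, w):
    if abs(u.sum()) < 1e-12:
        return u
    return w.sum() * u - u.sum() * w    # components now sum to zero

C = np.array([[0.0, 1.0], [1.0, 0.0]])
A = np.block([[C, np.zeros((2, 2))], [np.zeros((2, 2)), C]])
u = np.array([1.0, 1.0, 0.0, 0.0])      # eigenvectors of A for lambda = 1
w = np.array([0.0, 0.0, 1.0, 1.0])
v = zero_sum_eigenvector(u, w)
assert np.allclose(A @ v, v) and abs(v.sum()) < 1e-12
```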

As any scalar multiple of v is still an eigenvector, we can suppose that the largest magnitude component of v, say v i , is a positive real number. Actually we can and do assume that v i = 1. Then the other components v j are complex numbers with | v j | ≤ 1.

The proof of Theorem 1 is based on equation (2) together with the condition \(\sum _{1}^{n}v_{j} = 0\). As \(\mathbb{C}\) is a vector space of dimension 2 over \(\mathbb{R}\), we can consider the components v j of v as vectors in \(\mathbb{R}^{2}\). Then Theorem 2 with d = 2 applies to the \(v_{j} \in \mathbb{R}^{2}\); we just have to imagine that on the right hand side of (2) the term v i is added with coefficient zero. So define b ii = 0 and b ij = a ij if i ≠ j. Let \(b^{{\ast}}\) be the median of the sequence b i1, …, b in . Theorem 2 then gives that λ lies in the disk D(a ii , r) where

$$\displaystyle{ r =\sum _{j\neq i}\vert b_{ij} - b^{{\ast}}\vert. }$$
(3)

The proof of Theorem 1 uses Corollary 1: λ lies in the disk D(a ii , r) where r is the sum of the largest ⌊n∕2⌋ entries in the ith row of A (disregarding a ii ). Note that in general the estimate in (3) gives a better bound on r than Theorem 1. □

We can also apply Corollary 2 to the components of v, considered again as vectors in \(\mathbb{R}^{2}\). This gives that λ lies in the disk D(a ii , r) where r is the sum of the k = ⌈n∕2⌉ largest entries in row i of A (disregarding a ii again). In special cases a better estimate may come from the more general Theorem 3.
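The three radii are easy to compare on concrete data; the row below is made up for illustration (our sketch). Note that for even n the median in (3) may be taken as any point of the middle interval, exactly as in the proof of Theorem 2.

```python
import numpy as np

row = np.array([4.0, 3.0, 3.0, 1.0, 0.5, 0.0])  # off-diagonal entries of row i, so n = 7
n = row.size + 1

b_star = np.median(np.append(row, 0.0))         # median of b_i1, ..., b_in with b_ii = 0
r_median = np.abs(row - b_star).sum()           # radius from (3)
desc = np.sort(row)[::-1]
r_floor = desc[: n // 2].sum()                  # Theorem 1 via Corollary 1
r_ceil = desc[: (n + 1) // 2].sum()             # via Corollary 2
print(r_median, r_floor, r_ceil)                # 8.5, 10.0, 11.0 on this data
```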

Remark 1

One could hope that an eigenvalue with (geometric) multiplicity 3 or higher should lie strictly inside the half Gershgorin disk. The simple example below shows that this is not the case.

Let A be an n × n matrix with n = 3k, consisting of three k × k blocks along the main diagonal, each block being a doubly stochastic matrix. Then λ = 1 is an eigenvalue with multiplicity (at least) 3, and it lies on the boundary of each half Gershgorin disk D(a ii , r i ). Indeed, r i is the sum of the largest ⌊n∕2⌋ entries of the ith row (disregarding a ii ), which equals 1 − a ii .

In this example, however, λ still lies in the “third Gershgorin disk”: the disk centred at a ii whose radius r is the sum of the largest n∕3 entries in the ith row (disregarding again a ii ). We return to this question at the end of the paper.
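Here is a sketch of this construction (ours) with k = 3 and the constant block with every entry 1∕3, which is doubly stochastic:

```python
import numpy as np

k = 3
P = np.roll(np.eye(k), 1, axis=1)          # cyclic permutation matrix
B = (np.eye(k) + P + P @ P) / 3            # doubly stochastic k x k block
Z = np.zeros((k, k))
A = np.block([[B, Z, Z], [Z, B, Z], [Z, Z, B]])
n = 3 * k

vals = np.linalg.eigvals(A)
print(np.sum(np.isclose(vals, 1.0)))       # eigenvalue 1 has multiplicity 3

# lambda = 1 sits exactly on the boundary of every half disk D(a_ii, r_i)
for i in range(n):
    off = np.delete(A[i], i)
    r_i = np.sort(off)[::-1][: n // 2].sum()
    assert np.isclose(abs(1.0 - A[i, i]), r_i)
```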

5 Examples

In what follows we show examples illustrating the limits of possible extensions of the results above. Note that one cannot expect in general that a multiple eigenvalue lies strictly inside the half Gershgorin disk. The simplest illustration of this is the matrix A below, where −1 is an eigenvalue with (geometric) multiplicity two.

$$\displaystyle{A = \left [\begin{array}{ccc} 0&1&1\\ 1 &0 &1 \\ 1&1&0\\ \end{array} \right ]}$$
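A one-line check (ours): the eigenvalues of A are 2, −1, −1, and since ⌊3∕2⌋ = 1, the half radius r i is the single largest off-diagonal entry.

```python
import numpy as np

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
print(np.sort(np.linalg.eigvals(A).real))   # -1, -1, 2: the eigenvalue -1 is double
# r_i = 1 (largest off-diagonal entry), so -1 sits on the boundary of D(0, 1)
```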

Next we are going to give further examples. The first two show that Theorem 1 does not extend to real matrices that have both positive and negative entries. The third is a positive semidefinite Hermitian matrix (with complex entries) where the triple eigenvalue “0” lies on the boundary of the half Gershgorin disk. Perhaps some form of Theorem 1 can be extended to such matrices.

5.1 Real Matrices with Both Positive and Negative Entries

The matrices in Theorem 1 have non-negative entries. This condition cannot be deleted as the following symmetric circulant matrix with 0, ±1 entries shows:

$$\displaystyle{B = \left [\begin{array}{ccccc} 0 & 1 & - 1& - 1& 1\\ 1 & 0 & 1 & - 1 & -1 \\ - 1& 1 & 0 & 1 & - 1\\ - 1 & - 1 & 1 & 0 & 1 \\ 1 & - 1& - 1& 1 & 0 \end{array} \right ]}$$

Like every 5 × 5 symmetric circulant matrix, B has two double eigenvalues. They are \(\sqrt{5} \approx 2.236\) and \(-\sqrt{5}\), and both lie outside the half Gershgorin disk D(0, 2).

The following 7 × 7 matrix is again circulant and has 0, ±1 entries. Its multiple eigenvalue ≈ −3.494 is even further from the half Gershgorin disk, which has radius 3 around the origin.

$$\displaystyle{C = \left [\begin{array}{ccccccc} 0 & 1 & - 1& 1 & 1 & - 1& 1\\ 1 & 0 & 1 & - 1 & 1 & 1 & -1 \\ - 1& 1 & 0 & 1 & - 1& 1 & 1\\ 1 & - 1 & 1 & 0 & 1 & - 1 & 1 \\ 1 & 1 & - 1& 1 & 0 & 1 & - 1\\ - 1 & 1 & 1 & - 1 & 1 & 0 & 1 \\ 1 & - 1& 1 & 1 & - 1& 1 & 0\\ \end{array} \right ]}$$
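Both circulant examples can be checked numerically (our sketch). The printed half radii are 2 for B and 3 for C, and in both cases the double eigenvalues escape the half disks centred at 0.

```python
import numpy as np

def circulant(first_row):
    return np.array([np.roll(first_row, i) for i in range(len(first_row))])

B = circulant([0, 1, -1, -1, 1])
C = circulant([0, 1, -1, 1, 1, -1, 1])

print(np.sort(np.linalg.eigvals(B).real))   # +/- sqrt(5) ~ 2.236, each twice
print(np.sort(np.linalg.eigvals(C).real))   # the double eigenvalue ~ -3.494 comes first

# half radii: sum of the floor(n/2) largest |entries| off the diagonal
for M in (B, C):
    n = M.shape[0]
    off = np.sort(np.abs(np.delete(M[0], 0)))[::-1]
    print(off[: n // 2].sum())               # 2 for B, 3 for C
```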

5.2 A Positive Semidefinite Matrix

The next construction gives a 9 × 9 positive semidefinite Hermitian matrix H with the triple eigenvalue “0” lying on the boundary of the half Gershgorin disk. (This is very different from the example in Remark 1 where the half disk and the third disk were the same.) The other eigenvalue is 6, and it lies on the boundary of the “quarter disk”. This example comes from the Hesse configuration of 9 points and 12 lines in \(\mathbb{C}\mathbb{P}^{2}\) [5]. The matrix H is interesting in its own right. It shows further that strengthening Theorem 1 to more general matrices (with high multiplicity eigenvalues) might be difficult.

One possible realization of the Hesse configuration is given by the following 9 points on the complex projective plane

$$\displaystyle{\begin{array}{ccc} p_{1} = (0,1,-1)& p_{2} = (0,1,-\omega ) &p_{3} = (0,1,-\omega ^{2}) \\ p_{4} = (1,0,-1)&p_{5} = (1,0,-\omega ^{2})& p_{6} = (1,0,-\omega ) \\ p_{7} = (1,-1,0)& p_{8} = (1,-\omega,0) &p_{9} = (1,-\omega ^{2},0)\\ \end{array} }$$

where \(\omega = \frac{-1+i\sqrt{3}} {2}\) is a primitive third root of unity. In this arrangement each point lies on four lines and each line contains three points. Our first matrix, A, records the linear dependencies among the points. It has 9 columns, one for each point, and 12 rows, one for each line. If p i , p j and p k are collinear, then there are nonzero complex multipliers α, β, γ such that αp i + βp j + γp k = 0. For example, the sixth (highlighted) row in the matrix A below represents the equation

$$\displaystyle{-\omega ^{2}(0,1,-1) - (0,1,-\omega ) -\omega (0,1,-\omega ^{2}) = (0,0,0).}$$

Thus the matrix A encodes the linear dependencies of collinear triples in the point-line arrangement of the Hesse configuration.

$$\displaystyle{A = \left [\begin{array}{ccccccccc} 1 & 0 & 0 & -1 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 & -1 & 0 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & -1 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & -\omega ^{2} & -1 & -\omega \\ 0 & 0 & 0 & -\omega ^{2} & -\omega & -1 & 0 & 0 & 0\\ -\boldsymbol{\omega }^{\mathbf{2}} & \mathbf{-1} & -\boldsymbol{\omega } & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0}\\ 0 & \omega & 0 & 0 & -1 & 0 & 0 & 1 & 0\\ 0 & 0 & -\omega ^{2} & 0 & 0 & 1 & 0 & 0 & -1\\ -\omega ^{2} & 0 & 0 & 0 & 1 & 0 & 0 & 0 & -1\\ \omega & 0 & 0 & 0 & 0 & -1 & 0 & 1 & 0\\ 0 & 1 & 0 & -\omega & 0 & 0 & 0 & 0 & \omega \\ 0 & 0 & \omega & -1 & 0 & 0 & 0 & 1 & 0 \end{array} \right ]}$$

The points of the Hesse configuration satisfy the homogeneous system of equations A x = 0 where \(x_{i} \in \mathbb{C}\mathbb{P}^{2}\). An affine image of a solution is also a solution, implying that the rank of A is at most 6. It is easy to see that the rank is exactly 6: the rank remains the same if one multiplies a matrix by its Hermitian transpose (complex conjugate transpose). So consider the 9 × 9 matrix \(H = \overline{A^{T}}A\).

$$\displaystyle{H = \left [\begin{array}{ccccccccc} 4 & \omega & \omega ^{2} & -1 & -\omega & -\omega ^{2} & 1 & \omega ^{2} & \omega \\ \omega ^{2} & 4 & \omega & -\omega & -\omega ^{2} & -1 & 1 & \omega ^{2} & \omega \\ \omega & \omega ^{2} & 4 & -\omega ^{2} & -1 & -\omega & 1 & \omega ^{2} & \omega \\ -1 & -\omega ^{2} & -\omega & 4 & \omega ^{2} & \omega & -1 & -1 & -1\\ -\omega ^{2} & -\omega & -1 & \omega & 4 & \omega ^{2} & -1 & -1 & -1\\ -\omega & -1 & -\omega ^{2} & \omega ^{2} & \omega & 4 & -1 & -1 & -1\\ 1 & 1 & 1 & -1 & -1 & -1 & 4 & \omega & \omega ^{2}\\ \omega & \omega & \omega & -1 & -1 & -1 & \omega ^{2} & 4 & \omega \\ \omega ^{2} & \omega ^{2} & \omega ^{2} & -1 & -1 & -1 & \omega & \omega ^{2} & 4 \end{array} \right ]}$$

Matrix H is a positive semidefinite Hermitian matrix that has two eigenvalues: 0 with multiplicity 3 (so the rank of A is indeed 6) and 6 with multiplicity 6. All off-diagonal entries have absolute value one and the diagonal entries are 4. Thus λ = 0 is on the boundary of the half Gershgorin disk D(4, 4) and λ = 6 is on the boundary of D(4, 2), the “quarter disk” (Fig. 1).

Fig. 1: The Gershgorin disk and half-disk of H
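The whole construction fits in a short script (our transcription of the matrix A above); the printed spectrum of H should consist of 0 with multiplicity 3 and 6 with multiplicity 6.

```python
import numpy as np

w = (-1 + 1j * np.sqrt(3)) / 2        # primitive third root of unity
A = np.array([
    [1, 0, 0, -1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, -1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, -1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, -w**2, -1, -w],
    [0, 0, 0, -w**2, -w, -1, 0, 0, 0],
    [-w**2, -1, -w, 0, 0, 0, 0, 0, 0],
    [0, w, 0, 0, -1, 0, 0, 1, 0],
    [0, 0, -w**2, 0, 0, 1, 0, 0, -1],
    [-w**2, 0, 0, 0, 1, 0, 0, 0, -1],
    [w, 0, 0, 0, 0, -1, 0, 1, 0],
    [0, 1, 0, -w, 0, 0, 0, 0, w],
    [0, 0, w, -1, 0, 0, 0, 1, 0],
], dtype=complex)

H = A.conj().T @ A                    # the 9 x 9 Hermitian matrix above
print(np.round(np.linalg.eigvalsh(H), 10))   # expected: 0 three times, 6 six times
```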

6 Remarks

There are several questions that remain open.

  • What can be said about the location of an eigenvalue with larger multiplicity? Our method, using the zonotope Z(V ) in the proof of Theorem 3, has its limitations. Perhaps inequality (1) can be improved. For instance, for an eigenvalue with multiplicity at least k one would like to use an eigenvector v = (v 1, …, v n ) such that the corresponding zonotope Z(V ) satisfies

    $$\displaystyle{Z(V ) \subset c\left (\|v_{1}\| +\ldots +\|v_{n}\|\right )B}$$

    where c decreases as k grows. Unfortunately one cannot expect c to go below \(\frac{1} {\pi }\) (see Exercise 14.9 in [13]).

  • How about other matrices? What is the radius of the shrunken Gershgorin disk that contains a multiple eigenvalue of a general complex matrix? Are there better bounds for special matrices, like real or positive semidefinite Hermitian ones?