
The interconnections and coupling patterns of dynamical systems are best described in terms of graph theory. This chapter serves the purpose of summarizing the main results and tools from matrix analysis and graph theory that will be important for the analysis of interconnected systems in subsequent chapters. This includes a proof of the Perron–Frobenius theorem for irreducible nonnegative matrices using a contraction mapping principle on convex cones due to Birkhoff (1957). We introduce adjacency matrices and Laplacians associated to a weighted directed graph and study their spectral properties. The analysis of eigenvalues and eigenvectors for graph adjacency matrices and Laplacians is the subject of spectral graph theory, which is briefly summarized in this chapter; see the book by Godsil and Royle (2001) for a comprehensive presentation. Explicit formulas for the eigenvalues and eigenvectors of Laplacians are derived for special types of graphs such as cycles and paths. These formulas will be used later on, in Chapter 9, in an examination of homogeneous networks. The technique of graph compression is briefly discussed owing to its relevance for the model reduction of networks. Properties of graphs are increasingly important for applications to, for example, formation control and molecular geometry. Therefore, a brief section is included on graph rigidity and the characterization of Euclidean distance matrices.

We begin by establishing some notation to be used subsequently and presenting some basic facts on Kronecker products of matrices over a field \(\mathbb{F}\). For rectangular matrices \(A \in \mathbb{F}^{m\times n},B \in \mathbb{F}^{k\times l}\), the Kronecker product is defined as the \(mk \times nl\) matrix

$$\displaystyle{ A\otimes B = \left (\begin{array}{ccccc} a_{11}B &&\ldots && a_{1n}B\\ \vdots & &\ddots & & \vdots \\ a_{m1}B &&\ldots &&a_{\mathit{mn}}B \end{array} \right ). }$$

By this definition, the Kronecker product of an upper triangular matrix A with a rectangular matrix B is block-upper triangular. In particular, the Kronecker product \(B \otimes I_{N}\) is of the form

$$\displaystyle{ B\otimes I_{N} = \left (\begin{array}{ccccc} b_{11}I_{N} &&\ldots &&b_{1l}I_{N}\\ \vdots & &\ddots & & \vdots \\ b_{k1}I_{N}&&\ldots &&b_{\mathit{kl}}I_{N} \end{array} \right ), }$$

while

$$\displaystyle{ I_{N}\otimes A = \text{diag}(A,\ldots,A) = \left (\begin{array}{ccccc} A&&\ldots && 0\\ \vdots & &\ddots & & \vdots\\ 0 & &\ldots & &A \end{array} \right ). }$$

If A and B are invertible n × n and m × m matrices, respectively, then the Kronecker product \(A \otimes B\) is invertible and

$$\displaystyle{ (A \otimes B)^{-1} = A^{-1} \otimes B^{-1}. }$$

The eigenvalues of \(A \otimes B\) are the products \(\lambda _{i}(A)\lambda _{j}(B)\) of the eigenvalues \(\lambda _{i}(A)\) and \(\lambda _{j}(B)\) of A and B, respectively. Therefore, the trace and determinant of \(A \otimes B\) are \(\mathrm{tr}\,(A \otimes B) =\mathrm{tr}\,(A)\,\mathrm{tr}\,(B)\) and \(\det (A \otimes B) =\det (A)^{m}\det (B)^{n}\). Similarly, the eigenvalues of \(A \otimes I_{m} + I_{n} \otimes B\) are the sums \(\lambda _{i}(A) +\lambda _{j}(B)\). The following rules for the Kronecker product are easily verified:

$$\displaystyle\begin{array}{rcl} (A \otimes B)(C \otimes D)& =& \mathit{AC} \otimes \mathit{BD}, {}\\ (A \otimes B)^{\top }& =& A^{\top }\otimes B^{\top }. {}\\ & & {}\\ \end{array}$$

Let \(\mathrm{vec(A)} \in \mathbb{F}^{\mathit{mn}}\) denote a column vector obtained by stacking the second column of A under the first, then the third under the second, and so on. This leads to the following important identity:

$$\displaystyle{ \mathrm{vec(ABC)} =\big (C^{\top }\otimes A\big)\mathrm{vec(B)}. }$$
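
These identities are easy to check numerically. The following minimal sketch (assuming NumPy is available; numpy.kron computes the Kronecker product, and column-stacking is obtained by flattening in column-major order) verifies the mixed-product rule, the trace, determinant and eigenvalue properties, and the vec identity on random matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

def vec(M):
    # Column-stacking operator: stack the columns of M into one long vector.
    return M.flatten(order="F")

n, m = 3, 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
C = rng.standard_normal((n, n))
D = rng.standard_normal((m, m))

# Mixed-product rule: (A (x) B)(C (x) D) = (AC) (x) (BD)
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# Trace and determinant: tr(A (x) B) = tr(A) tr(B), det(A (x) B) = det(A)^m det(B)^n
assert np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B))
assert np.isclose(np.linalg.det(np.kron(A, B)),
                  np.linalg.det(A) ** m * np.linalg.det(B) ** n)

# Every pairwise product lambda_i(A) * lambda_j(B) appears among the
# eigenvalues of A (x) B (checked up to numerical tolerance).
eigs_kron = np.linalg.eigvals(np.kron(A, B))
for ev in np.outer(np.linalg.eigvals(A), np.linalg.eigvals(B)).ravel():
    assert np.isclose(eigs_kron, ev).any()

# vec identity: vec(A X C) = (C^T (x) A) vec(X) for A n-by-n, X n-by-m, C m-by-m
X = rng.standard_normal((n, m))
Cm = rng.standard_normal((m, m))
assert np.allclose(vec(A @ X @ Cm), np.kron(Cm.T, A) @ vec(X))

print("all Kronecker product identities verified")
```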

1 Nonnegative Matrices and Contractions

Let \(\mathbb{R}_{+}\) denote the subset of all nonnegative real numbers. A matrix \(A \in \mathbb{R}^{n\times n}\) is called nonnegative (positive) if all entries a ij of A are nonnegative (positive) real numbers. The notation for nonnegative and positive matrices

$$\displaystyle\begin{array}{rcl} A \geq 0& \;\Longleftrightarrow\;& a_{\mathit{ij}} \geq 0,\quad i,j = 1,\ldots,n, {}\\ A > 0& \;\Longleftrightarrow\;& a_{\mathit{ij}} > 0,\quad i,j = 1,\ldots,n {}\\ & & {}\\ \end{array}$$

should not be confused with the notion of positive definiteness for symmetric matrices. To distinguish these notions from each other, the positive (semi-) definiteness of real symmetric matrices \(A = A^{\top }\in \mathbb{R}^{n\times n}\) is denoted by

$$\displaystyle\begin{array}{rcl} A\succeq 0& \;\Longleftrightarrow\;& x^{\top }\mathit{Ax} \geq 0\quad \text{for all}\;x \in \mathbb{R}^{n}, {}\\ A \succ 0& \;\Longleftrightarrow\;& x^{\top }\mathit{Ax} > 0\quad \text{for all}\;x \in \mathbb{R}^{n}\setminus \{0\}. {}\\ \end{array}$$

The sum and product of finitely many nonnegative matrices are nonnegative. Moreover, the scalar multiple \(\lambda A\) of a nonnegative matrix A with a nonnegative real number λ is a nonnegative matrix. Thus the set of nonnegative matrices forms a closed convex cone \(\mathbb{R}_{+}^{n\times n}\) in the matrix space \(\mathbb{R}^{n\times n}\), which is multiplicatively closed.

The set of nonnegative matrices forms the largest class of matrices that leave the cone \(\mathbb{R}_{+}^{n}\) invariant. The Hilbert metric allows for a natural generalization of this situation. We refer to the papers by Birkhoff (1957), Bushell (1986), and Kohlberg and Pratt (1982) for additional background. Let \(C \subset \mathbb{R}^{n}\) denote a closed convex cone that is pointed, i.e., C has nonempty interior and satisfies \(C \cap (-C) =\{ 0\}\). We use the notation x ≥ 0 whenever x ∈ C and x > 0 whenever x is an interior point of C. Recall that the dual cone of C is defined as

$$\displaystyle{ C^{{\ast}} =\{\lambda \in \mathbb{R}^{1\times n}\;\vert \;\lambda (x) \geq 0\;\text{ for all }\;x \in C\}. }$$

Clearly, the dual cone \((\mathbb{R}_{+}^{n})^{{\ast}}\) of \(\mathbb{R}_{+}^{n}\) is equal to \(\mathbb{R}_{+}^{1\times n}\). We mention, without proof, the well-known fact that for pointed closed convex cones the interior of the dual cone \(C^{{\ast}}\) is nonempty. Note further that for C a closed pointed convex cone, every linear functional λ in the interior of \(C^{{\ast}}\) satisfies λ(x) > 0 for all nonzero x ∈ C. This implies the following lemma.

Lemma 8.1.

Let C be a closed convex and pointed cone in \(\mathbb{R}^{n}\) . Then the subset \(C_{1} =\{ x \in C\;\vert \;\lambda (x) = 1\}\) is compact for all interior points λ of the dual cone \(C^{{\ast}}\).

Proof.

Clearly, \(C_{1}\) is a closed subset of \(\mathbb{R}^{n}\). Suppose \(C_{1}\) is unbounded. Then there exists a sequence \(x_{k} \in C_{1}\), with \(\|x_{k}\| \rightarrow \infty \) and \(\lambda (x_{k}) = 1\) for all k. Thus \(\lambda (\frac{x_{k}} {\|x_{k}\|})\) converges to 0. By the compactness of the unit sphere in \(\mathbb{R}^{n}\), there exists a subsequence \(y_{m},m \in \mathbb{N}\), of \(\frac{x_{k}} {\|x_{k}\|}\) that converges to a unit vector y ∈ C. Thus \(\lambda (y) =\lim _{m\rightarrow \infty }\lambda (y_{m}) = 0\). But λ is in the interior of \(C^{{\ast}}\), and therefore λ(x) > 0 for all x ∈ C ∖{0}. Thus y = 0, which is a contradiction. ■ 

A projective metric on C is a map \(d: C \times C\longrightarrow \mathbb{R} \cup \{\infty \}\) such that for all x, y, z ∈ C (and r ≤ ∞, r + ∞ = ∞ = ∞ + r for all real r):

  1. 1.

    d(x, y) = d(y, x);

  2. 2.

    d(x, y) ≥ 0, d(x, y) = 0 if and only if x = λ y for some real λ > 0;

  3. 3.

    d(x, z) ≤ d(x, y) + d(y, z).

Conditions 1–3 imply the identity

  4. 4.

    d(λ x, μ y) = d(x, y) for all λ > 0, μ > 0.

Let

$$\displaystyle\begin{array}{rcl} M(x,y)& =& \inf \{\lambda \geq 0\;\vert \;x \leq \lambda y\}, {}\\ m(x,y)& =& \sup \{\lambda \geq 0\;\vert \;x \geq \lambda y\} = \frac{1} {M(y,x)}. {}\\ & & {}\\ \end{array}$$

The following properties are easily seen:

$$\displaystyle\begin{array}{rcl} & & 0 \leq m(x,y) \leq M(x,y) \leq \infty, {}\\ & & m(x,y)y \leq x \leq M(x,y)y. {}\\ \end{array}$$

Definition 8.2.

The Hilbert metric on C is a projective metric defined by

$$\displaystyle{ d(x,y) =\log \frac{M(x,y)} {m(x,y)} =\log M(x,y) +\log M(y,x). }$$

Here, d(0, 0) = 0, and d(x, 0) = d(0, y) = ∞ for nonzero x, y ∈ C.

The preceding definitions are illustrated by the following examples.

Example 8.3.

  1. (a)

    Let \(C = \mathbb{R}_{+}^{n}\). Then for all x > 0, y > 0

    $$\displaystyle{ m(x,y) =\min _{i=1,\ldots,n}\frac{x_{i}} {y_{i}},\quad M(x,y) =\max _{i=1,\ldots,n}\frac{x_{i}} {y_{i}}. }$$

    Thus the Hilbert metric on \(\mathbb{R}_{+}^{n}\) is

    $$\displaystyle{ d(x,y) =\max _{1\leq i,j\leq n}\log \frac{x_{i}y_{j}} {x_{j}y_{i}},\quad \text{for}\;x > 0,y > 0. }$$
  2. (b)

    Let \(C =\{ X \in \mathbb{R}^{n\times n}\;\vert \;X = X^{\top }\succeq 0\}\) denote the closed convex cone of positive semidefinite real symmetric matrices. Let \(\lambda _{\mathrm{min}}(X)\) and \(\lambda _{\mathrm{max}}(X)\) denote the smallest and largest eigenvalues of a symmetric matrix X, respectively. For positive definite matrices X ≻ 0, Y ≻ 0 one obtains

    $$\displaystyle{ m(X,Y ) =\lambda _{\mathrm{min}}(\mathit{XY }^{-1}),\quad M(X,Y ) =\lambda _{\mathrm{ max}}(\mathit{XY }^{-1}). }$$

    Thus the Hilbert metric of two positive definite matrices X ≻ 0, Y ≻ 0 is

    $$\displaystyle{ d(X,Y ) =\log \frac{\lambda _{\mathrm{max}}(\mathit{XY }^{-1})} {\lambda _{\mathrm{min}}(\mathit{XY }^{-1})}. }$$
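
The two formulas in Example 8.3 are straightforward to evaluate numerically. A minimal sketch assuming NumPy; the eigenvalues of \(XY^{-1}\) are computed with numpy.linalg.eigvals and are real and positive for X ≻ 0, Y ≻ 0, and the projective invariance d(λx, μy) = d(x, y) is checked as well.

```python
import numpy as np

def hilbert_metric_orthant(x, y):
    # Example 8.3(a): d(x, y) = max_{i,j} log(x_i y_j / (x_j y_i)) for x, y > 0
    r = x / y
    return np.log(r.max() / r.min())

def hilbert_metric_psd(X, Y):
    # Example 8.3(b): d(X, Y) = log(lambda_max(X Y^{-1}) / lambda_min(X Y^{-1}))
    ev = np.linalg.eigvals(X @ np.linalg.inv(Y)).real  # real and positive for X, Y > 0
    return np.log(ev.max() / ev.min())

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 1.0, 5.0])
print("d(x, y) =", hilbert_metric_orthant(x, y))

# Projective invariance (property 4): d(lambda x, mu y) = d(x, y) for lambda, mu > 0
assert np.isclose(hilbert_metric_orthant(3.0 * x, 0.5 * y),
                  hilbert_metric_orthant(x, y))

M = np.array([[2.0, 1.0], [0.0, 1.0]])
X = M @ M.T + np.eye(2)       # positive definite
Y = np.diag([1.0, 4.0])       # positive definite
print("d(X, Y) =", hilbert_metric_psd(X, Y))
```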

For the proof of the following result see Kohlberg and Pratt (1982).

Theorem 8.4.

Let C be a closed convex cone in \(\mathbb{R}^{n}\) that is pointed. Let \(\lambda \in C^{{\ast}}\) be an interior point of \(C^{{\ast}}\) . Then the following properties are satisfied:

  1. 1.

    The Hilbert metric is a projective metric on C. The distance satisfies d(x,y) < ∞ if and only if x and y are interior points of \(L(x,y) \cap C\) , where L(x,y) denotes the line through x and y. In particular, d(x,y) < ∞ for all interior points x,y of C.

  2. 2.

    Let C 1 :={ x ∈ C | λ(x) = 1}. Then (C 1 ,d) is a compact metric space, and there exists a constant γ > 0 with

    $$\displaystyle{ \|x - y\| \leq \gamma d(x,y)\quad \forall x,y \in C_{1}. }$$
  3. 3.

    Let x,y ∈ C be such that d(x,y) < ∞. Then there exist a,b ∈ C such that

    $$\displaystyle{ d(x,y) =\log \frac{\|a - y\|\|x - b\|} {\|a - x\|\|y - b\|}. }$$

A linear map \(A: \mathbb{R}^{n}\longrightarrow \mathbb{R}^{n}\) is called C-monotonic whenever AC ⊂ C. Let d denote the Hilbert metric on C. Then

$$\displaystyle{ k(A) =\inf \{ k \geq 0\;\vert \;d(\mathit{Ax},\mathit{Ay}) \leq \mathit{kd}(x,y)\;\forall x,y \in C\} }$$

denotes the contraction constant of A. The operator A is called a contraction if k(A) < 1. There is a beautiful formula for the contraction constant of a C-monotonic linear map A.

Theorem 8.5 (Birkhoff (1957)).

A linear C-monotonic map \(A: \mathbb{R}^{n}\longrightarrow \mathbb{R}^{n}\) is a contraction if and only if

$$\displaystyle{ \delta =\sup \{ d(\mathit{Ax},\mathit{Ay})\;\vert \;x,y \in C\} < \infty. }$$

Whenever this is satisfied, the contraction constant is equal to

$$\displaystyle{ k(A) = \frac{e^{\delta /2} - 1} {e^{\delta /2} + 1}. }$$

For a linear contraction on a closed convex and pointed cone C the Banach fixed-point theorem applies. The following result extends the classical Perron–Frobenius theorem to monotonic maps on a closed convex pointed cone.

Theorem 8.6 (Contraction Mapping Theorem).

Let \(C \subset \mathbb{R}^{n}\) denote a closed convex and pointed cone, \(\lambda \in C^{{\ast}}\) an interior point of the dual cone \(C^{{\ast}}\) , and \(C_{1}:=\{ x \in C\;\vert \;\lambda (x) = 1\}\) . Let μ ≥ 0 be a nonnegative real number. Let \(N \in \mathbb{N}\) and \(A: \mathbb{R}^{n}\longrightarrow \mathbb{R}^{n}\) be a linear map such that \((\mu I + A)^{N}\) maps C 1 into the interior of C.

  1. 1.

    Then \((\mu I + A)^{N}\) is a contraction on C and A has a unique eigenvector \(x_{{\ast}}\in C_{1}\) . The vector \(x_{{\ast}}\) is contained in the interior of C 1 with a positive eigenvalue \(r_{{\ast}} > 0\).

  2. 2.

    The discrete dynamical system

    $$\displaystyle{ x_{t+1} = \frac{(\mu I + A)^{N}x_{t}} {\lambda \big((\mu I + A)^{N}x_{t}\big)},\quad x_{t} \in C_{1} }$$
    (8.1)

    converges to \(x_{{\ast}}\) from each initial condition \(x_{0} \in C_{1}\) .

Proof.

Let \(B = (\mu I + A)^{N}\). By Lemma 8.1, C 1 is compact, and therefore the image K: = B(C 1) is a compact subset of the interior of C. Thus

$$\displaystyle\begin{array}{rcl} \sup \{d(\mathit{Bx},\mathit{By})\;\vert \;x,y \in C\}& =& \sup \{d(\mathit{Bx},\mathit{By})\;\vert \;x,y \in C_{1}\} {}\\ & \leq & \delta (K) < \infty, {}\\ \end{array}$$

where δ(K) denotes the diameter of the compact set K. The Birkhoff Theorem 8.5 therefore implies that B is a contraction on the complete metric space C 1. Consider the discrete-dynamical system \(x_{t+1} = f(x_{t})\) on C 1, defined by iterating the map

$$\displaystyle{ f: C_{1}\longrightarrow C_{1},\quad f(x) = \frac{\mathit{Bx}} {\lambda (\mathit{Bx})}. }$$

By our assumption on A, the map f is well defined and satisfies d(f(x), f(y)) = d(Bx, By). Since B is a contraction on C 1, so is f. Thus there exists 0 ≤ k < 1 with d(f(x), f(y)) ≤ kd(x, y) for all x, y ∈ C 1. Therefore, one can apply the Banach fixed-point theorem to f and conclude that there exists a unique fixed point \(x_{{\ast}}\in C_{1}\) of f. Moreover, the dynamical system (8.1) converges to \(x_{{\ast}}\) from every initial point \(x_{0} \in C_{1}\). This shows that \(x_{{\ast}}\in C_{1}\) is an eigenvector of B, with \(\mathit{Bx}_{{\ast}} =\sigma x_{{\ast}}\). Since B maps C 1 into the interior of C, \(\sigma x_{{\ast}} = \mathit{Bx}_{{\ast}}\) must be an interior point of C. But this implies that σ > 0 as well as that \(x_{{\ast}}\) is an interior point of C 1. By projective invariance of the Hilbert metric,

$$\displaystyle\begin{array}{rcl} d(\mathit{Ax}_{{\ast}},x_{{\ast}})& =& d(A(\mathit{Bx}_{{\ast}}),\mathit{Bx}_{{\ast}}) {}\\ & =& d(B(\mathit{Ax}_{{\ast}}),\mathit{Bx}_{{\ast}}) \leq \mathit{kd}(\mathit{Ax}_{{\ast}},x_{{\ast}}), {}\\ \end{array}$$

and therefore \(d(\mathit{Ax}_{{\ast}},x_{{\ast}}) = 0\). But this implies \(\mathit{Ax}_{{\ast}} = r_{{\ast}}x_{{\ast}}\) for some \(r_{{\ast}} > 0\). The result follows. ■ 

One can give a steepest-descent interpretation of the preceding arguments that will be useful for the proof of the Perron–Frobenius theorem. Consider the continuous function

$$\displaystyle{ R_{A}: K\longrightarrow \mathbb{R}_{+},\quad R_{A}(x) = m(\mathit{Ax},x). }$$
(8.2)

It is instructive to compute this function in the special case \(C = \mathbb{R}_{+}^{n}\), where for x > 0 one has

$$\displaystyle{ R_{A}(x) =\min _{1\leq i\leq n}\frac{(\mathit{Ax})_{i}} {x_{i}}. }$$

This form is reminiscent of the Rayleigh quotient function .

Proposition 8.7 (Generalized Rayleigh Quotient).

The same notation is used here as in Theorem  8.6 . The function (8.2) has \(x_{{\ast}}\) as its unique maximum, with \(R_{A}(x_{{\ast}}) = r_{{\ast}}\) . Moreover, \(R_{A}(x_{t})\) , t ≥ 0, is monotonically increasing in t and converges to \(R_{A}(x_{{\ast}}) = r_{{\ast}}\) for each sequence of points \((x_{t})_{t\geq 0}\) that is generated by the power iterations (8.1) .

Proof.

We use the notation from the proof of Theorem 8.6. Note that R A is well defined and continuous, because K = B(C 1) is a compact subset of the interior of C. Since m(cx, cy) = m(x, y) for c > 0, one obtains for all x ∈ K

$$\displaystyle\begin{array}{rcl} R_{A}(f(x))& =& m(\mathit{ABx},\mathit{Bx}) =\sup \{\lambda \; \vert \;\mathit{ABx} \geq \lambda \mathit{Bx}\} {}\\ & =& \sup \{\lambda \;\vert \;B(\mathit{Ax} -\lambda x) \geq 0\} {}\\ & \geq & \sup \{\lambda \;\vert \;\mathit{Ax} -\lambda x \geq 0\} = R_{A}(x). {}\\ \end{array}$$

Since Bv > 0 for all nonzero vectors v ∈ C, this shows \(B(\mathit{Ax} -\lambda x) > 0\) for all x ∈ K that are not eigenvectors of A. This implies that the inequality \(\sup \{\lambda \;\vert \;B(\mathit{Ax} -\lambda x) \geq 0\} >\sup \{\lambda \; \vert \;\mathit{Ax} -\lambda x \geq 0\}\) is strict, unless x ∈ K is an eigenvector of A. Thus \(R_{A}(x_{t+1}) = R_{A}(f(x_{t})) > R_{A}(x_{t})\), unless \(x_{t} = f(x_{t})\) is a fixed point of f. By Theorem 8.6, the eigenvector \(x_{{\ast}}\) is the only fixed point of f and satisfies \(R_{A}(x_{{\ast}}) = m(\mathit{Ax}_{{\ast}},x_{{\ast}}) = m(r_{{\ast}}x_{{\ast}},x_{{\ast}}) = r_{{\ast}}\). This completes the proof. ■ 

The next result yields an explicit form for the contraction constant for positive matrices \(A \in \mathbb{R}^{n\times n}\), A > 0. Here \(C = \mathbb{R}_{+}^{n}\).

Corollary 8.8.

Every positive matrix \(A \in \mathbb{R}^{n\times n}\) is a contraction with respect to the Hilbert metric on \(\mathbb{R}_{+}^{n}\) . The contraction constant is

$$\displaystyle{ k(A) = \frac{\sqrt{\gamma }- 1} {\sqrt{\gamma } + 1}, }$$

where

$$\displaystyle{ \gamma =\max _{i,j,k,l}\frac{a_{\mathit{ki}}a_{\mathit{lj}}} {a_{\mathit{kj}}a_{\mathit{li}}}. }$$
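
Corollary 8.8 is easy to check numerically. A minimal sketch assuming NumPy; the positive matrix and the test vectors are random and serve only to illustrate the inequality d(Ax, Ay) ≤ k(A) d(x, y) on the positive orthant.

```python
import numpy as np

rng = np.random.default_rng(1)

def hilbert_metric(x, y):
    # Hilbert metric on the positive orthant (Example 8.3(a))
    r = x / y
    return np.log(r.max() / r.min())

def contraction_constant(A):
    # Corollary 8.8: gamma = max_{i,j,k,l} a_ki a_lj / (a_kj a_li),
    # k(A) = (sqrt(gamma) - 1) / (sqrt(gamma) + 1)
    n = A.shape[0]
    gamma = max(A[k, i] * A[l, j] / (A[k, j] * A[l, i])
                for i in range(n) for j in range(n)
                for k in range(n) for l in range(n))
    return (np.sqrt(gamma) - 1.0) / (np.sqrt(gamma) + 1.0)

A = rng.uniform(0.5, 2.0, size=(4, 4))   # a positive matrix
kA = contraction_constant(A)
print("contraction constant k(A) =", kA)

for _ in range(100):
    x = rng.uniform(0.1, 10.0, size=4)
    y = rng.uniform(0.1, 10.0, size=4)
    assert hilbert_metric(A @ x, A @ y) <= kA * hilbert_metric(x, y) + 1e-12
print("d(Ax, Ay) <= k(A) d(x, y) verified on 100 random pairs")
```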

2 Perron–Frobenius Theorem

The Perron–Frobenius theorem establishes a surprising and deep connection between the spectral properties of nonnegative matrices and the properties of the associated graph. Let us begin by deriving the theorem using the contraction mapping theorem on closed convex cones. For other approaches and further details see, for example, the beautiful books by Fiedler (2008) and Sternberg (2010). The notion of the irreducibility of nonnegative matrices plays a central role in the subsequent analysis. In Section 8.6, a simple graph-theoretic characterization of irreducibility is derived.

Definition 8.9.

A matrix \(A \in \mathbb{R}^{n\times n}\) is called reducible if either (n = 1, A = 0) or n ≥ 2 and there exist a permutation matrix \(P \in \mathbb{R}^{n\times n}\), an integer 1 ≤ r ≤ n − 1, and matrices \(B \in \mathbb{R}^{r\times r},C \in \mathbb{R}^{r\times (n-r)},D \in \mathbb{R}^{(n-r)\times (n-r)}\), with

$$\displaystyle{ P^{\top }\mathit{AP} = \left (\begin{array}{cc} B &C \\ 0 &D \end{array} \right ). }$$

Otherwise, A is called irreducible .

Irreducible nonnegative matrices have useful properties.

Lemma 8.10.

Let \(A \in \mathbb{R}^{n\times n}\) be a nonnegative irreducible matrix and \(x \in \mathbb{R}^{n}\) a vector with nonnegative components. Then Ax = 0 implies x = 0.

Proof.

After a suitable permutation of the entries of x (and an induced similarity transformation on A), one may assume that \(x = (\xi ^{\top },0)^{\top }\), with \(\xi = (x_{1},\ldots,x_{r})^{\top }\) and \(x_{1} > 0,\ldots,x_{r} > 0\). Suppose r ≥ 1. If r = n, then x > 0, and Ax = 0 together with A ≥ 0 implies A = 0, which contradicts irreducibility. Thus let 1 ≤ r ≤ n − 1 and partition the matrix A accordingly as

$$\displaystyle{ A = \left (\begin{array}{cc} A_{11} & A_{12} \\ A_{21} & A_{22} \end{array} \right ), }$$
(8.3)

with \(A_{11} \in \mathbb{R}^{r\times r}\), and so forth. Thus Ax = 0 is equivalent to \(A_{11}\xi = 0,A_{21}\xi = 0\), which implies \(A_{11} = 0,A_{21} = 0\). This is a contradiction to A being irreducible. Therefore, Ax = 0 implies x = 0. ■ 

Using this lemma, we next prove a basic existence and uniqueness result for positive eigenvectors of nonnegative irreducible matrices. Let

$$\displaystyle{ \rho (A) =\mathrm{ max}\{\vert \lambda \vert \;\vert \;\mathrm{det}(\lambda I - A) = 0\} }$$
(8.4)

denote the spectral radius of matrix A. Suppose that A has exactly h eigenvalues with absolute value ρ(A). Then h is called the index of A.

Theorem 8.11.

Let \(A \in \mathbb{R}^{n\times n}\) be a nonnegative irreducible matrix. Let

$$\displaystyle{\mathbf{e} = (1,\ldots,1)^{\top }\in \mathbb{R}^{n},\quad C_{ 1} =\{ x \in \mathbb{R}_{+}^{n}\;\vert \;\mathbf{e}^{\top }x = 1\}.}$$
  1. 1.

    Then A has a unique nonnegative eigenvector \(x_{{\ast}}\in \mathbb{R}_{+}^{n}\) , with \(\mathbf{e}^{\top }x_{{\ast}} = 1\) , called the Perron vector . Both the Perron vector and the associated eigenvalue are positive.

  2. 2.

    If A is positive, then the sequence of power iterates

    $$\displaystyle{ x_{t+1} = \frac{\mathit{Ax}_{t}} {\mathbf{e}^{\top }\mathit{Ax}_{t}},\quad x_{t} \in C_{1} }$$
    (8.5)

    converges to \(x_{{\ast}}\) from each initial condition \(x_{0} \in C_{1},x_{0} > 0\) .

Proof.

C 1 is a compact convex subset of the closed convex and pointed cone \(C = \mathbb{R}_{+}^{n}\). By Lemma 8.10, matrix A maps C ∖{0} into itself. Moreover, \(\mathbf{e}^{\top }\mathit{Ax} > 0\) for all x ∈ C ∖{0}. Since A is irreducible, the matrix \((I + A)^{n-1}\) is positive (see the subsequent Theorem 8.26) and therefore maps C 1 into the interior of C. Thus one can apply Theorem 8.6, with N = n − 1 and μ = 1, to conclude that A possesses a unique eigenvector \(x_{{\ast}}\) in C 1. Moreover, \(x_{{\ast}}\) is contained in the interior of C 1 with positive eigenvalue \(r_{{\ast}} > 0\). In particular, \(x_{{\ast}} > 0\). The second part follows again from Theorem 8.6. ■ 
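
Theorem 8.11 and Proposition 8.7 suggest a simple numerical experiment: run the power iteration (8.5) for a positive matrix and watch the generalized Rayleigh quotient increase toward the Perron root. A minimal sketch assuming NumPy; the matrix is random and chosen positive only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.uniform(0.1, 1.0, size=(n, n))   # a positive (hence irreducible) matrix

x = np.full(n, 1.0 / n)                  # start in C_1 = {x >= 0 : e^T x = 1}
rayleigh = []
for t in range(50):
    rayleigh.append((A @ x / x).min())   # generalized Rayleigh quotient R_A(x_t)
    x = A @ x
    x /= x.sum()                         # power iteration (8.5)

# R_A(x_t) is monotonically increasing (Proposition 8.7) ...
assert all(r2 >= r1 - 1e-12 for r1, r2 in zip(rayleigh, rayleigh[1:]))

# ... and the iterates converge to the Perron vector x_* (Theorem 8.11)
evals, evecs = np.linalg.eig(A)
i = int(np.argmax(evals.real))           # the Perron root is the eigenvalue of largest real part
perron = np.abs(evecs[:, i].real)
perron /= perron.sum()
print("Perron root     :", evals[i].real)
print("final R_A(x_t)  :", rayleigh[-1])
print("max |x_t - x_*| :", np.abs(x - perron).max())
```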

After this first step we can state and prove the Perron–Frobenius theorem.

Theorem 8.12 (Perron–Frobenius).

Let \(A \in \mathbb{R}^{n\times n}\) denote an irreducible nonnegative matrix. Then the spectral radius ρ(A) is a simple positive eigenvalue of A and there exists a positive eigenvector x > 0 for ρ(A). No eigenvector corresponding to other eigenvalues of A is positive.

Proof.

Suppose \(x \in \mathbb{R}_{+}^{n}\) is a nonnegative eigenvector of A, normalized so that \(\mathbf{e}^{\top }x = 1\). By Theorem 8.11, then \(x = x_{{\ast}}\), where \(x_{{\ast}} > 0\) denotes the unique Perron vector of A; thus \(\mathit{Ax}_{{\ast}} = r_{{\ast}}x_{{\ast}}\) and \(r_{{\ast}} > 0\). Since \(\mathbb{R}x_{{\ast}}\) is the only eigenspace that intersects \(\mathbb{R}_{+}^{n}\), the eigenvalue \(r_{{\ast}}\) has an algebraic multiplicity of one. It suffices, therefore, to show that ρ(A) coincides with the eigenvalue \(r_{{\ast}}\) for \(x_{{\ast}}\). In fact, let \(\lambda \in \mathbb{C}\) be an eigenvalue of A so that Az = λ z for some complex vector \(z = (z_{1},\ldots,z_{n})^{\top }\). Let \(\vert z\vert = (\vert z_{1}\vert,\ldots,\vert z_{n}\vert ) \in \mathbb{R}_{+}^{n}\) be the associated nonnegative vector of absolute values. From the triangle inequality one obtains | λ | | z | ≤ A | z | . Let

$$\displaystyle{ m(x,y) =\sup \{\lambda \geq 0\;\vert \;x \geq \lambda y\} }$$

be the order function for \(\mathbb{R}_{+}^{n}\). Applying Proposition 8.7, one obtains

$$\displaystyle{ \vert \lambda \vert = m(\lambda \vert z\vert,\vert z\vert ) \leq m(A\vert z\vert,\vert z\vert ) \leq r_{{\ast}}. }$$

Thus \(r_{{\ast}}\) is an eigenvalue with eigenvector \(x_{{\ast}}\) and is equal to the spectral radius. This completes the proof. ■ 

The following perturbation result is of independent interest.

Proposition 8.13 (Wielandt).

Let \(A \in \mathbb{R}^{n\times n}\) be an irreducible nonnegative matrix. Let \(B \in \mathbb{C}^{n\times n}\) be a complex matrix with

$$\displaystyle{ \vert b_{\mathit{ij}}\vert \leq a_{\mathit{ij}}\quad \text{for all }\;i,j = 1,\ldots n. }$$

Then ρ(B) ≤ρ(A). If ρ(B) = ρ(A) and \(\rho (B)e^{\phi \sqrt{-1}}\) is an eigenvalue of B, then there exists a diagonal matrix \(D =\mathrm{ diag\,}(z_{1},\ldots,z_{n}) \in \mathbb{C}^{n\times n}\) with \(\vert z_{1}\vert =\ldots = \vert z_{n}\vert = 1\) such that

$$\displaystyle{ B = e^{\phi \sqrt{-1}}\mathit{DAD}^{-1}. }$$

In particular, |b ij | = a ij for all \(i,j = 1,\ldots n\) .

Proof.

In the preceding proof of the Perron–Frobenius theorem it was shown that the maximum \(r_{{\ast}} = R_{A}(x_{{\ast}})\) of the function \(R_{A}: \mathbb{R}_{+}^{n}\longrightarrow \mathbb{R} \cup \{\infty \},R_{A}(x) = m(\mathit{Ax},x)\) exists and r  = ρ(A). Let \(z \in \mathbb{C}^{n},z\neq 0,\) and \(\lambda \in \mathbb{C}\), with Bz = λ z. Then, using the triangle inequality, one obtains | λ | | z |  =  | Bz | ≤ A | z | , and therefore

$$\displaystyle{ \vert \lambda \vert = m(\vert \mathit{Bz}\vert,\vert z\vert ) \leq m(A\vert z\vert,\vert z\vert ) \leq r_{{\ast}} =\rho (A). }$$
(8.6)

This shows that ρ(B) ≤ ρ(A). Assume that \(\rho (B) =\rho (A) = r_{{\ast}}\) and λ is an eigenvalue of B, with \(\vert \lambda \vert = r_{{\ast}}\). Then (8.6) implies that \(r_{{\ast}} = \vert \lambda \vert = m(A\vert z\vert,\vert z\vert )\). The Perron–Frobenius theorem therefore implies \(A\vert z\vert = r_{{\ast}}\vert z\vert \) and | z |  > 0. Similarly, for | B |  = ( | b ij  | ) one obtains

$$\displaystyle{ r_{{\ast}}\vert z\vert = \vert \lambda z\vert = \vert \mathit{Bz}\vert \leq \vert B\vert \cdot \vert z\vert \leq A\vert z\vert = r_{{\ast}}\vert z\vert, }$$

and therefore | B | ⋅ | z |  = A ⋅ | z | . Since A − | B | is a nonnegative matrix and | z |  > 0, this implies A =  | B | .

Define

$$\displaystyle{ D =\mathrm{ diag\,}\left ( \frac{z_{1}} {\vert z_{1}\vert },\ldots, \frac{z_{n}} {\vert z_{n}\vert }\right ). }$$

Then D | z |  = z and BD | z |  = Bz = λ D | z | . Thus \(C:= e^{-\phi \sqrt{-1}}D^{-1}\mathit{BD}\) satisfies C | z |  = A | z | and | C |  =  | B |  = A. Split the complex matrix \(C =\mathop{ \mathrm{Re}}\nolimits C + \sqrt{-1}\mathop{\mathrm{Im}}\nolimits C\) into real and imaginary parts \(\mathop{\mathrm{Re}}\nolimits C\) and \(\mathop{\mathrm{Im}}\nolimits C\), respectively. Since A is real, C | z |  = A | z | implies \(\mathop{\mathrm{Re}}\nolimits C\vert z\vert = A\vert z\vert \). From \(\mathop{\mathrm{Re}}\nolimits C \leq \vert C\vert = A\) it follows that \(A -\mathop{\mathrm{Re}}\nolimits C\) is nonnegative, with \((A -\mathop{\mathrm{Re}}\nolimits C)\vert z\vert = 0\). Since | z |  > 0, this implies \(\vert C\vert = A =\mathop{ \mathrm{Re}}\nolimits C\), and therefore C = A, i.e., \(B = e^{\phi \sqrt{-1}}\mathit{DAD}^{-1}\). This completes the proof. ■ 

Let \(A \in \mathbb{R}^{n\times n}\) be nonnegative and irreducible. For i = 1, , n, the period p(i) is defined as the greatest common divisor of all \(m \in \mathbb{N}\) satisfying (A m) ii  > 0. By a theorem of Romanovsky, p(1) = ⋯ = p(n) for all irreducible nonnegative matrices A. The common value \(p(A):= p(1) = \cdots = p(n)\) is called the period of A. A nonnegative matrix A of period 1 is called aperiodic. We now state, without providing full proof details, a full characterization of the structure of irreducible nonnegative matrices. A stronger form of the subsequent result and its proof appeared as Theorem 4.3.1 in the book by Fiedler (2008).

Theorem 8.14.

Let \(A \in \mathbb{R}^{n\times n}\) be irreducible and nonnegative, and let \(\lambda _{0},\;\ldots,\;\lambda _{k-1}\) denote the eigenvalues of A with absolute value equal to the spectral radius ρ(A). The following statements are true:

  1. 1.

    \(\lambda _{0},\ldots,\lambda _{k-1}\) are simple eigenvalues of A and satisfy

    $$\displaystyle{ \lambda _{j} = e^{\frac{2\pi \sqrt{-1}j} {k} }\rho (A),\quad j = 0,\ldots,k - 1. }$$
  2. 2.

    The spectrum of A is invariant under rotations with angle \(\frac{2\pi } {k}\) .

  3. 3.

    If k > 1, then there exists a permutation matrix P such that

    $$\displaystyle{ \mathit{PAP}^{\top } = \left (\begin{array}{ccccc} 0 &B_{12} & 0 & \ldots & 0 \\ 0 & 0 &B_{23} & & \vdots \\ \vdots & & \ddots & \ddots & 0\\ 0 & & &0 &B_{ k-1,k} \\ B_{k1} & 0 & \ldots &0& 0\end{array} \right ), }$$

    with block matrices B ij of suitable sizes.

  4. 4.

    The index k of A in (1) coincides with the period of A.

  5. 5.

    A is primitive if and only if the spectral radius ρ(A) is the only eigenvalue λ of A with |λ| = ρ(A).

Proof.

Only the first two statements are shown; we refer the reader to Fiedler (2008) for a proof of the remaining claims. Let \(\lambda _{j} =\rho e^{2\pi \phi _{j}\sqrt{-1}},\phi _{0} = 0,\) denote the eigenvalues of absolute value ρ = ρ(A). Applying Proposition 8.13 with B = A one obtains

$$\displaystyle{ A = e^{2\pi \phi _{j}\sqrt{-1}}D_{ j}AD_{j}^{-1},\quad j = 0,\cdots \,,k - 1 }$$

for suitable unitary diagonal matrices \(D_{0},\ldots,D_{k-1}\). Thus the spectrum of A is invariant under multiplications by \(e^{2\pi \phi _{j}\sqrt{-1}},j = 0,\cdots \,,k - 1\). Since the spectral radius λ 0 = ρ is a simple eigenvalue of A, \(\lambda _{0},\ldots,\lambda _{k-1}\) are simple eigenvalues of \(e^{2\pi \phi _{j}\sqrt{-1}}D_{j}AD_{j}^{-1} = A\). For all 0 ≤ r, s ≤ k − 1 and \(D_{\mathit{rs}} = D_{r}D_{s}\),

$$\displaystyle\begin{array}{rcl} A& =& e^{2\pi \phi _{r}\sqrt{-1}}D_{ r}AD_{r}^{-1} = e^{2\pi \phi _{r}\sqrt{-1}}e^{2\pi \phi _{s}\sqrt{-1}}D_{ r}D_{s}AD_{r}^{-1}D_{ s}^{-1} {}\\ & =& e^{2\pi (\phi _{r}+\phi _{s})\sqrt{-1}}D_{\mathit{ rs}}\mathit{AD}_{\mathit{rs}}^{-1}. {}\\ \end{array}$$

Thus \(e^{2\pi (\phi _{r}+\phi _{s})\sqrt{-1}}\rho\) are eigenvalues of A for all 0 ≤ r, s ≤ k − 1. This implies that \(\{1,e^{2\pi \phi _{1}\sqrt{-1}},\ldots,e^{2\pi \phi _{k-1}\sqrt{-1}}\}\) is a multiplicative subgroup of \(S^{1} =\{ z \in \mathbb{C}\;\vert \;\vert z\vert = 1\}\) of order k. Thus

$$\displaystyle{ \lambda _{j} = e^{\frac{2\pi j\sqrt{-1}} {k} }\rho (A),\quad j = 0,\ldots,k - 1. }$$

In particular, the spectrum of A is invariant under rotations by \(\frac{2\pi } {k}\). This completes the proof for the first two items. ■ 

The preceding result allows for an interesting dynamical interpretation of irreducible nonnegative matrices A in terms of discrete-time periodic linear systems. In fact, if A is not primitive, then the dynamical system x t+1 = Ax t is permutation equivalent to a periodic time-varying system \(x_{t+1} = A_{[t]}x_{t},\) with local states \(x_{t} \in \mathbb{R}^{n_{t}}\) and a periodic sequence of matrices \(A_{[0]} = B_{12},A_{[1]} = B_{23},\ldots,A_{[k-1]} = B_{k1},A_{[k]} = A_{[0]}\), and so forth.
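
The rotational symmetry of the spectrum described in Theorem 8.14 can be observed on a small block-cyclic matrix. A sketch assuming NumPy; the block sizes and entries below are chosen only for illustration, and the resulting matrix is irreducible with period k = 3.

```python
import numpy as np

# Irreducible nonnegative matrix in the block-cyclic form of Theorem 8.14 with k = 3
B12 = np.array([[1.0, 2.0]])
B23 = np.array([[1.0], [3.0]])
B31 = np.array([[2.0]])
A = np.zeros((4, 4))
A[0:1, 1:3] = B12
A[1:3, 3:4] = B23
A[3:4, 0:1] = B31

evals = np.linalg.eigvals(A)
rho = np.abs(evals).max()
omega = np.exp(2j * np.pi / 3)

def canonical(vals):
    # Sort complex values by (rounded) real and imaginary parts for a stable comparison.
    return sorted(vals, key=lambda z: (round(z.real, 8), round(z.imag, 8)))

# The spectrum is invariant under rotation by 2*pi/3 (Theorem 8.14(2))
assert np.allclose(canonical(omega * evals), canonical(evals))

# Exactly k = 3 eigenvalues have modulus rho(A) (Theorem 8.14(1))
print("peripheral eigenvalues:", evals[np.isclose(np.abs(evals), rho)])
```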

3 Stochastic Matrices and Markov Chains

We present a simple version of the ergodic theorem in the context of finite-dimensional matrix algebras. Recall that a norm \(\|\cdot \|\) on the matrix space \(\mathbb{C}^{n\times n}\) is called submultiplicative if \(\|\mathit{AB}\| \leq \| A\|\|B\|\) for all matrices \(A,B \in \mathbb{C}^{n\times n}\). Standard examples of submultiplicative matrix norms include the 1-norm \(\|A\|_{1} =\sum _{ i,j=1}^{n}\vert a_{\mathit{ij}}\vert \), the Frobenius norm \(\|A\|_{F} = \sqrt{\sum _{i,j=1 }^{n }\vert a_{\mathit{ij } } \vert ^{2}}\), and the operator norm \(\|A\| =\sup _{\|x\|=1}\|\mathit{Ax}\|\). Let ρ(A) denote the spectral radius of A.

Proposition 8.15.

Let ρ(A) denote the spectral radius of a matrix \(A \in \mathbb{C}^{n\times n}\) . Then the following assertions are true:

  1. 1.

    If ρ(A) < 1, then

    $$\displaystyle{ \lim _{k\rightarrow \infty }A^{k} =\lim _{ k\rightarrow \infty }\frac{1} {k}\sum _{i=0}^{k-1}A^{i} = 0. }$$
  2. 2.

    If ρ(A) > 1, then both sequences (A k ) and \(\frac{1} {k}\sum _{i=0}^{k-1}A^{i}\) diverge.

  3. 3.

    Let ρ(A) = 1. The limit \(\lim _{k\rightarrow \infty }A^{k}\) exists if and only if 1 is the only eigenvalue of A with absolute value 1 and all Jordan blocks for 1 are 1 × 1.

  4. 4.

    Let ρ(A) = 1. The limit \(\lim _{k\rightarrow \infty }\frac{1} {k}\sum _{i=0}^{k-1}A^{i}\) exists if and only if all Jordan blocks for the eigenvalues λ of A with absolute value |λ| = 1 are 1 × 1.

Proof.

The simple proofs of assertions 1–3 are omitted. To prove assertion 4, assume that the limit \(\lim _{k\rightarrow \infty }\frac{1} {k}\sum _{i=0}^{k-1}A^{i}\) exists. From assertion 2 it follows that each eigenvalue λ of A must satisfy | λ | ≤ 1. Suppose | λ |  = 1. Without loss of generality, one can assume that A = λ I + N is a Jordan block. Then

$$\displaystyle{ \frac{1} {k}\sum _{i=0}^{k-1}A^{i} = \frac{1} {k}\sum _{i=0}^{k-1}\lambda ^{i}I + \frac{1} {k}\sum _{i=0}^{k-1}i\lambda ^{i-1}N +\ldots +\frac{1} {k}\sum _{i=0}^{k-1}N^{i} }$$

diverges whenever

$$\displaystyle{ \frac{1} {k}\sum _{i=0}^{k-1}i\lambda ^{i-1} = \left \{\begin{array}{@{}l@{\quad }l@{}} \frac{k\lambda ^{k-1}(\lambda -1)-(\lambda ^{k}-1)} {k(\lambda -1)^{2}} \quad &\text{if}\quad \lambda \neq 1 \\ \frac{k-1} {2} \quad &\text{if}\quad \lambda = 1 \end{array} \right. }$$

diverges. This completes the proof. ■ 
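
Assertion 4, and its contrast with assertion 3, can be illustrated by a planar rotation matrix: its two eigenvalues e^{±iθ} lie on the unit circle and are semisimple, so the Cesàro averages converge (here to the zero matrix, consistent with Ker(A − I) = {0}), although the powers A^k themselves do not converge. A sketch assuming NumPy:

```python
import numpy as np

theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # eigenvalues e^{+-i*theta}, both of modulus 1

def cesaro(A, k):
    # (1/k) * sum_{i=0}^{k-1} A^i
    S, P = np.zeros_like(A), np.eye(A.shape[0])
    for _ in range(k):
        S += P
        P = P @ A
    return S / k

print(np.abs(cesaro(A, 10)).max())      # still far from zero
print(np.abs(cesaro(A, 10000)).max())   # of order 1e-4: the averages tend to P = 0
print(np.linalg.matrix_power(A, 10000)) # the powers keep rotating and do not converge
```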

Theorem 8.16 (Ergodic Theorem ).

Let \(\|\cdot \|\) be a submultiplicative matrix norm on \(\mathbb{C}^{n\times n}\) and \(A \in \mathbb{C}^{n\times n}\) with \(\|A\| \leq 1\) . Then:

  1. 1.

    The limit

    $$\displaystyle{ P =\lim _{k\rightarrow \infty }\frac{1} {k}\sum _{i=0}^{k-1}A^{i} }$$

    exists and satisfies P 2 = P = PA = AP. Moreover,

    $$\displaystyle\begin{array}{rcl} \mathrm{Im}\;P& =& \mathrm{Ker\,}\;(A - I) {}\\ \mathrm{Ker\,}\;P& =& \bigoplus _{\lambda \neq 1}\mathrm{Ker\,}\;(A -\lambda I)^{n}; {}\\ \end{array}$$
  2. 2.

    If 1 is the only eigenvalue of A with an absolute value of one, then

    $$\displaystyle{ P =\lim _{k\rightarrow \infty }A^{k} }$$

    exists.

Proof.

Assume that the limit

$$\displaystyle{ P =\lim _{k\rightarrow \infty }\frac{1} {k}\sum _{i=0}^{k-1}A^{i} }$$
(8.7)

exists. Proposition 8.15 then implies ρ(A) ≤ 1. For complex numbers λ with | λ | ≤ 1,

$$\displaystyle{ \lim _{k\rightarrow \infty }\frac{1} {k}\sum _{i=0}^{k-1}\lambda ^{i} =\lim _{ k\rightarrow \infty }\frac{1} {k} \frac{\lambda ^{k} - 1} {\lambda -1} = \left \{\begin{array}{@{}l@{\quad }l@{}} 0\quad &\text{if}\quad \lambda \neq 1,\\ 1\quad &\text{if} \quad \lambda = 1. \end{array} \right. }$$

Thus, by decomposing the Jordan canonical form \(J =\mathrm{ diag\,}(J_{1},J_{2})\) of A, where J 1 and J 2 collect the eigenvalues λ = 1 and λ ≠ 1, respectively, one obtains \(\lim _{k\rightarrow \infty }\frac{1} {k}\sum _{i=0}^{k-1}J_{ 1}^{i} = I\) and \(\lim _{k\rightarrow \infty }\frac{1} {k}\sum _{i=0}^{k-1}J_{ 2}^{i} = 0\). Thus \(P =\lim _{k\rightarrow \infty }\frac{1} {k}\sum _{i=0}^{k-1}A^{i}\) is a projection operator that commutes with A. Therefore, P 2 = P = AP = PA. The preceding argument also implies the formulas for the image space Im P and kernel of P. The second claim follows from Proposition 8.15. Thus it remains to show that the limit (8.7) exists.

For all complex matrices and submultiplicative norms the inequality \(\rho (A) \leq \| A\|\) is valid. By Proposition 8.15, it is enough to show that the Jordan blocks for the eigenvalues λ with | λ |  = 1 are 1 × 1. Assume that \(A = \mathit{SJS}^{-1}\), where \(J =\mathrm{ diag\,}(J_{1},\ldots,J_{r})\) is in Jordan canonical form. Let \(J_{1},\ldots,J_{\nu }\) be the Jordan blocks for the eigenvalues λ with an absolute value of one. For all \(m \in \mathbb{N}\), therefore, \(\|J^{m}\| \leq \| S\|\|S^{-1}\|\|A^{m}\| \leq \| S\|\|S^{-1}\|\). Since all norms on \(\mathbb{C}^{n\times n}\) are equivalent, one obtains, for \(i = 1,\ldots,\nu\),

$$\displaystyle{ \|J_{i}^{m}\|_{ 1} \leq \| J^{m}\|_{ 1} \leq \gamma \| J^{m}\| \leq \gamma \| S\|\|S^{-1}\|. }$$

For every Jordan block \(J_{i} =\lambda I + N,i = 1,\ldots,\nu,\) of size s and m ≥ 1, one has the estimate

$$\displaystyle{ \|J_{i}^{m}\|_{ 1} =\| (\lambda I+N)^{m}\|_{ 1} \geq \left \{\begin{array}{@{}l@{\quad }l@{}} 1 \quad &s = 1, \\ m + 1\quad &s > 1 \end{array} \right.. }$$

Thus the sequence \((\|J^{m}\|_{1})_{m}\) grows unbounded if some Jordan block for an eigenvalue λ with | λ |  = 1 has size greater than one. This completes the proof. ■ 

We establish a concrete form of the ergodic theorem for stochastic and doubly stochastic matrices.

Definition 8.17.

A nonnegative matrix \(A \in \mathbb{R}^{n\times n}\) is called stochastic if

$$\displaystyle{ \sum _{j=1}^{n}a_{\mathit{ ij}} = 1,\quad i = 1,\ldots,n. }$$

A is called doubly stochastic if it is nonnegative and satisfies

$$\displaystyle{ \sum _{j=1}^{n}a_{\mathit{ ij}} = 1,\quad \sum _{l=1}^{n}a_{\mathit{ lj}} = 1,\quad i,j = 1,\ldots,n. }$$
(8.8)

Let

$$\displaystyle{ \mathbf{e}_{n} = \left (\begin{array}{c} 1\\ \vdots\\ 1 \end{array} \right ) }$$

denote a vector in \(\mathbb{R}^{n}\) with all components equal to one. Then a nonnegative matrix A is stochastic (or doubly stochastic) if and only if \(A\mathbf{e}_{n} = \mathbf{e}_{n}\) (or \(A\mathbf{e}_{n} = \mathbf{e}_{n},\mathbf{e}_{n}^{\top }A = \mathbf{e}_{n}^{\top }\)).

Theorem 8.18.

Let \(A \in \mathbb{R}^{n\times n}\) be a stochastic matrix. Then

  1. 1.

    1 is an eigenvalue of A and A has a spectral radius equal to 1;

  2. 2.

    The limit

    $$\displaystyle{ P =\lim _{k\rightarrow \infty }\frac{1} {k}\sum _{i=0}^{k-1}A^{i} }$$

    exists and is a stochastic matrix that satisfies P 2 = P = PA = AP;

  3. 3.

    If 1 is the only eigenvalue of A with an absolute value of one, then

    $$\displaystyle{ P =\lim _{k\rightarrow \infty }A^{k} }$$

    exists. In particular, this is the case if A is primitive;

  4. 4.

    If A is irreducible, then

    $$\displaystyle{ \lim _{k\rightarrow \infty }\frac{1} {k}\sum _{i=0}^{k-1}A^{i} = \mathbf{e}_{ n}y^{\top }. }$$

    Here y is a uniquely determined positive vector with

    $$\displaystyle{ y^{\top }A = y^{\top },\quad y_{ 1} + \cdots + y_{n} = 1. }$$

Proof.

The matrix norm \(\|A\| =\max _{1\leq i\leq n}\sum _{j=1}^{n}\vert a_{\mathit{ij}}\vert \) is submultiplicative, and every stochastic matrix A satisfies \(\|A\| = 1\). The vector e n is an eigenvector of A with eigenvalue 1. Therefore, the spectral radius satisfies \(1 \leq \rho (A) \leq \| A\| = 1\), i.e.,

$$\displaystyle{ \rho (A) =\| A\| = 1. }$$

This proves claim 1. Moreover, claim 2 follows from the Ergodic Theorem 8.16, together with the observation that A being stochastic implies that for each k the matrix \(\frac{1} {k}\sum _{i=0}^{k-1}A^{i}\) is nonnegative, with \(\frac{1} {k}\sum _{i=0}^{k-1}A^{i}\mathbf{e}_{ n} = \frac{1} {k}\sum _{i=0}^{k-1}\mathbf{e}_{ n} = \mathbf{e}_{n}\). The first claim in statement 3 follows from Theorem 8.16, while the second claim follows from the Perron–Frobenius theorem. To prove the last claim, one applies the Perron–Frobenius theorem to the irreducible matrix A . Thus, there exists a unique vector \(y \in \mathbb{R}^{n},y > 0,\) with \(y^{\top }A = y^{\top }\) and e y = 1. Moreover, 1 is a simple eigenvalue of A. Thus Theorem 8.16 implies that rkP = dimKer (AI) = 1, and therefore P is of the form P = bc for unique nonzero vectors \(b,c \in \mathbb{R}^{n}\), with c 1 = 1. Since P e = e, b = e. Moreover, \(y^{\top }A = y^{\top }\) implies \(y^{\top }P = y^{\top }\), and therefore \(y^{\top } = y^{\top }P = y^{\top }\mathbf{e}c^{\top } = c^{\top }\), since y e = 1. This completes the proof. ■ 

Corollary 8.19.

For every irreducible, doubly stochastic matrix \(A \in \mathbb{R}^{n\times n}\) ,

$$\displaystyle{ \lim _{k\rightarrow \infty }\frac{1} {k}\sum _{i=0}^{k-1}A^{i} = \frac{1} {n}\mathbf{e}\mathbf{e}^{\top }. }$$

It is straightforward to apply the preceding results to Markov chains. Consider random variables with values in the finite alphabet \(\{1,\ldots,n\}\) and associated probabilities \(\pi (i) = P(X = i),i = 1,\ldots,n\). Thus \(\pi = (\pi (1),\ldots,\pi (n))^{\top }\in \mathbb{R}^{n}\) satisfies π ≥ 0 and \(\pi (1) + \cdots +\pi (n) = 1\). One can easily generalize this simple static model to a dynamic one by considering a stochastic process (X t ) defined by a sequence of random variables \(X_{t},t \in \mathbb{N}_{0},\) with values in \(\{1,\ldots,n\}\). Let \(\pi _{t} \in \mathbb{R}^{n}\) denote the associated vector of probabilities. Markov chains are special stochastic processes where for each time t the vector of probabilities π t depends only on π t−1. More precisely, it is assumed that for each \(i,j \in \{ 1,\ldots,n\}\) the conditional probabilities

$$\displaystyle{ p_{\mathit{ij}}:= P(X_{t+1} = j\vert X_{t} = i),\quad t \in \mathbb{N}_{0}, }$$

are independent of time t. Therefore, one can describe the transition probabilities between states i and j by a matrix

$$\displaystyle{ A = (p_{\mathit{ij}}) \in \mathbb{R}^{n\times n} }$$

of real numbers p ij  ≥ 0, with \(\sum _{j=1}^{n}p_{\mathit{ij}} = 1\) for i = 1, , n. Thus \(A \in \mathbb{R}^{n\times n}\) is a stochastic matrix of transition probabilities, with A e = e.

Definition 8.20.

A Markov chain on the finite state space {1, , n} is a discrete dynamical system

$$\displaystyle{ \pi _{t+1}^{\top } =\pi _{ t}^{\top }A,\quad \pi _{ 0} =\pi,\quad t \in \mathbb{N}_{0} }$$
(8.9)

defined by a stochastic matrix A. Here the initial probability distribution π is allowed to be an arbitrary vector of nonnegative numbers \(p_{1},\ldots,p_{n}\), \(p_{1} + \cdots + p_{n} = 1\).

The preceding results on stochastic matrices can be reformulated as follows.

Theorem 8.21.

Let (A,π) be a Markov chain on {1,…,n} with initial probability distribution π. Let \(\pi _{t}^{\top } = (\pi _{t}(1),\ldots,\pi _{t}(n))\) denote the probability distributions that evolve according to the Markov chain  (8.9) .

  1. 1.

    If A is irreducible, then there exists a unique stationary probability distribution \(\pi _{\infty }^{\top } = (\pi _{\infty }(1),\ldots,\pi _{\infty }(n)) \in \mathbb{R}^{1\times n}\) satisfying

    $$\displaystyle{ \pi _{\infty } > 0,\quad \pi _{\infty }^{\top }A =\pi _{ \infty }^{\top },\quad \mathbf{e}^{\top }\pi _{ \infty } = 1. }$$

    Moreover,

    $$\displaystyle{ \lim _{k\rightarrow \infty }\frac{1} {k}\sum _{i=0}^{k-1}A^{i} = \mathbf{e}\pi _{ \infty }^{\top }, }$$

    which implies that

    $$\displaystyle{ \lim _{k\rightarrow \infty }\frac{1} {k}\sum _{t=0}^{k-1}\pi _{ t}^{\top } =\pi _{ \infty }^{\top }. }$$
  2. 2.

    Assume A is primitive, i.e., A m > 0 for some \(m \in \mathbb{N}\) . Then the following limits exist:

    $$\displaystyle\begin{array}{rcl} \lim _{k\rightarrow \infty }A^{k}& =& \mathbf{e}\pi _{ \infty }^{\top }, {}\\ \lim _{t\rightarrow \infty }E(X_{t})& =& \sum _{i=1}^{n}i\pi _{ \infty }(i). {}\\ \end{array}$$

    Here the expectation value of X t is defined as \(E(X_{t}):=\sum _{ i=1}^{n}i\pi _{t}(i)\) .

Example 8.22.

We discuss the Ehrenfest diffusion model from statistical mechanics. Assume a domain Ω is partitioned into two regions, Ω 1 and Ω 2. Assume further that Ω contains exactly n particles that may move around in Ω, passing from one region to the other. Let X t  ∈ { 0, , n} denote the number of particles that are in region Ω 1 at time t. Assume that the probability for a change of a particle from region Ω 1 to region Ω 2, or vice versa, is exactly \(\frac{1} {n}\). The transition probability matrix, then, is the (n + 1) × (n + 1) tridiagonal matrix

$$\displaystyle{ A = \left (\begin{array}{*{10}c} 0 & 1 & 0 & \cdots &\cdots & 0 \\ \frac{1} {n} & 0 & \frac{n-1} {n} & & & \vdots \\ 0 & \frac{2} {n} & 0 & \frac{n-2} {n} & & \vdots \\ \vdots & & \ddots & \ddots & \ddots & 0 \\ \vdots & & & \ddots & \ddots & \frac{1} {n} \\ 0 & \cdots & \cdots & 0 & 1 & 0 \end{array} \right ). }$$

Note that A is an irreducible stochastic matrix. Therefore, a unique stationary probability distribution π of X t exists and satisfies \(\pi _{\infty }^{\top }A =\pi _{ \infty }^{\top }\). Define \(y = (y_{0},\ldots,y_{n})^{\top }\), with \(y_{j} = 2^{-n}{n\choose j}\). A straightforward computation shows that \(y^{\top }A = y^{\top },y^{\top }\mathbf{e} = 1\). Therefore, the stationary probabilities are

$$\displaystyle{ \pi _{\infty }(j) = \frac{1} {2^{n}}{n\choose j},\quad j = 0,\ldots,n. }$$

In particular, the expectation value of the number of particles in region Ω 1 is equal to

$$\displaystyle{ E(X_{t}) =\sum _{ j=0}^{n}j\pi _{ \infty }(j) =\sum _{ j=0}^{n}j \frac{1} {2^{n}}{n\choose j} = \frac{n} {2}, }$$

as expected. One can show that the eigenvalues of A are the real numbers \(1-\frac{2k} {n},\) \(k = 0,\ldots,n\). The convergence rate of the Markov chain is dependent on the second largest eigenvalue of A, i.e., it is equal to \(1 - \frac{2} {n}\). Thus, for large numbers n of particles, the Markov chain will converge quite slowly to the equilibrium distribution.
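
The stationary distribution and the Cesàro convergence of the Ehrenfest chain are easy to confirm numerically. A sketch assuming NumPy (n = 10 is chosen only for illustration): it builds the tridiagonal transition matrix, verifies that the binomial vector is a left fixed vector, and compares the Cesàro averages with \(\mathbf{e}\pi_{\infty}^{\top}\) as in Theorem 8.18.

```python
import numpy as np
from math import comb

n = 10
A = np.zeros((n + 1, n + 1))
for j in range(n + 1):
    if j > 0:
        A[j, j - 1] = j / n          # one of the j particles in region 1 leaves
    if j < n:
        A[j, j + 1] = (n - j) / n    # one of the n - j particles in region 2 enters

pi = np.array([comb(n, j) / 2.0 ** n for j in range(n + 1)])   # binomial(n, 1/2)
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(pi @ A, pi)       # stationarity: pi^T A = pi^T

# Cesaro averages converge to e * pi^T (Theorem 8.18(4)); A itself has the
# eigenvalue -1, so the powers A^k do not converge.
k = 50000
S, P = np.zeros_like(A), np.eye(n + 1)
for _ in range(k):
    S += P
    P = P @ A
print("max |(1/k) sum A^i - e pi^T|:", np.abs(S / k - np.outer(np.ones(n + 1), pi)).max())
print("expected particle number in equilibrium:", pi @ np.arange(n + 1))   # equals n / 2
```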

4 Graphs and Matrices

We now introduce some of the basic notions from graph theory and relate the graph concepts to the structure of nonnegative matrices. A directed graph (digraph) Γ = (V, E) consists of a finite set \(V =\{ v_{1},\ldots,v_{N}\}\) of vertices, together with a finite subset E ⊂ V × V of pairs of vertices called edges. Thus each edge of a graph is a pair (v, w) of vertices v and w, which are called the initial and terminal vertices of e, respectively. This leads to well-defined maps \(\iota,\tau: E\longrightarrow V\) that assign to each edge the initial vertex \(\iota (v,w) = v\) and terminal vertex τ(v, w) = w, respectively. We refer to the pair \(\iota,\tau\) as the canonical orientation on a digraph (Figures 8.1 and 8.2).

Fig. 8.1 Directed graph

Fig. 8.2 Spanning tree

Each vertex v in a digraph has two kinds of neighborhoods, the in-neighborhood,

$$\displaystyle{ \mathcal{N}^{i}(v) =\{ u \in V \;\vert \;(u,v) \in E\}, }$$

and the out-neighborhood,

$$\displaystyle{ \mathcal{N}^{o}(v) =\{ w \in V \;\vert \;(v,w) \in E\}. }$$

The cardinalities \(d_{i}(v) = \vert \mathcal{N}^{i}(v)\vert \) and \(d_{o}(v) = \vert \mathcal{N}^{o}(v)\vert \) are called the in-degree and out-degree of v, respectively. A subgraph of a digraph Γ = (V, E) is a digraph \(\varGamma ^{{\prime}} = (V ^{{\prime}},E^{{\prime}})\), with V  ⊂ V and E  ⊂ E. It is called a spanning subgraph if V  = V and E  ⊂ E. An induced subgraph of Γ = (V, E) is a subgraph \(\varGamma ^{{\prime}} = (V ^{{\prime}},E^{{\prime}})\) that contains all edges in E between pairs of vertices in V . A walk in a directed graph Γ of length r − 1 is a finite sequence of vertices \((v_{1},\ldots,v_{r})\) such that \((v_{i},v_{i+1})\) are edges for \(i = 1,\ldots,r - 1\). A walk is cyclic if \(v_{1} = v_{r}\). A path is a walk where all vertices \(v_{1},\ldots,v_{r}\) are distinct. Thus a path cannot be cyclic. A directed graph is called acyclic if it does not contain a cycle.

An important topological concept in graph theory is that of connectivity. A digraph Γ is called strongly connected if there exists a directed path between all pairs (u, v) ∈ V × V of distinct vertices. Γ is called connected if, for each (u, v) ∈ V × V, there exists a directed path from u to v or from v to u. A strong component of a digraph is a maximal, strongly connected induced subgraph (Figure 8.3).

Fig. 8.3 Strongly connected graph

For a proof of the following characterization of strong components we refer the reader to Fiedler (2008).

Proposition 8.23.

Let Γ = (V,E) be a digraph.

  1. 1.

    Every vertex v is contained in a unique strong component.

  2. 2.

    Two distinct strong components have disjoint sets of vertices.

  3. 3.

    Two distinct vertices u,v ∈ V belong to the same strong component if and only if there are directed paths from u to v and from v to u.

An undirected graph Γ = (V, E), often simply called a graph, consists of a finite set \(V =\{ v_{1},\ldots,v_{N}\}\) of vertices, together with a finite set \(E =\{\{ v_{i},v_{j}\}\;\vert \;(i,j) \in I\}\) of edges. Here I denotes a finite subset of \(\{1,\ldots,N\} \times \{ 1,\ldots,N\}\). Thus the edges of an undirected graph are unordered pairs of vertices. Frequently, self-loops are excluded from the definition. A graph Γ is oriented if there exist maps \(\iota,\tau: E\longrightarrow V\) such that for each edge, \(e =\{\iota (e),\tau (e)\}\). Thus, \(\hat{\varGamma }= (V,\hat{E})\) is a directed graph with the set of edges \(\hat{E} =\{ (\iota (e),\tau (e))\;\vert \;e \in E\}\). In many situations concerning graphs one assumes that an underlying orientation of the graph is specified. A directed graph Γ = (V, E) carries a natural orientation by defining \(\iota (v,w) = v,\tau (v,w) = w\) for all edges (v,w) (Figures 8.4 and 8.5).

Fig. 8.4 Undirected graph

Fig. 8.5 Orientation of a graph

We briefly mention a number of elementary operations one can perform with graphs. Let Γ = (V, E) and \(\varGamma ^{{\prime}} = (V ^{{\prime}},E^{{\prime}})\) denote graphs with a disjoint set of vertices, i.e., \(V \cap V ^{{\prime}} =\emptyset\). Then the following operations yield new graphs:

  1. 1.

    Union: \(\varGamma \cup \varGamma ^{{\prime}}:= (V \cup V ^{{\prime}},E \cup E^{{\prime}})\).

  2. 2.

    Join: \(\varGamma +\varGamma ^{{\prime}}:= (V \cup V ^{{\prime}},< E \cup E^{{\prime}} >)\), where

    $$\displaystyle{ < E \cup E^{{\prime}} >:= E \cup E^{{\prime}}\cup \{\{ v,v^{{\prime}}\}\;\vert \;v \in V,\;v^{{\prime}}\in V ^{{\prime}}\}. }$$
  3. 3.

    Product: \(\varGamma \times \varGamma ^{{\prime}}:= (V \times V ^{{\prime}},\hat{E})\), where the set of edges \(\hat{E}\) is defined as

    $$\displaystyle{ \{(v,v^{{\prime}}),(w,w^{{\prime}})\} \in \hat{ E}\;\Longleftrightarrow\;\left \{\begin{array}{@{}l@{\quad }l@{}} \{v,w\} \in E\;\text{and}\;v^{{\prime}} = w^{{\prime}} \quad & \\ \text{or} \quad & \\ \{v^{{\prime}},w^{{\prime}}\}\in E^{{\prime}}\;\text{and}\;v = w.\quad &\end{array} \right. }$$
  4. 4.

    Insertion of a vertex into an edge: Geometrically this means one places a new vertex w in the middle of an edge uv and replaces the old edge uv with the two new ones uw, wv. This operation does not change the topology of the graph. Thus, for every edge {u, v} ∈ E, a new graph \(\varGamma ^{{\prime}} = (V ^{{\prime}},E^{{\prime}})\) is defined as

    $$\displaystyle\begin{array}{rcl} V ^{{\prime}}& =& V \cup \{ w\} {}\\ E^{{\prime}}& =& E\setminus \{\{u,v\}\} \cup \{\{ u,w\},\{w,v\}\}\;. {}\\ \end{array}$$
  5. 5.

    Contraction of an edge: Here one replaces an edge {u, v} with a new vertex w and adds new edges to all neighboring vertices of u, v.

The preceding definitions and constructions for digraphs carry over to undirected graphs in an obvious way. For a vertex v ∈ V, let

$$\displaystyle{ N(v) =\{ w \in V \;\vert \;\{v,w\} \in E\} }$$

denote the neighborhood of v in the graph. The degree of v is the number of all neighboring vertices, i.e., it is equal to | N(v) | . A graph is called k-regular if all vertices have the same degree k. A subgraph of Γ is defined by a pair \(\varGamma ^{{\prime}} = (V ^{{\prime}},E^{{\prime}})\) such that V  ⊂ V and E  ⊂ E. A spanning subgraph of Γ is a subgraph \(\varGamma ^{{\prime}} = (V ^{{\prime}},E^{{\prime}})\) with the same set of vertices V  = V. A path in Γ of length r is a finite sequence of vertices \((v_{0},\ldots,v_{r})\) such that \(e_{i} =\{ v_{i-1},v_{i}\}\) are edges of Γ for \(i = 1,\ldots,r\). One says that the path connects the vertices v 0 and v r . If \(v_{r} = v_{0}\), then the path is called closed or a cycle. A graph is called connected if two distinct vertices v ≠ w are always connected through a suitable path in Γ. A maximal, connected, induced subgraph of a graph is called a connected component. The counterpart to Proposition 8.23 is true, too, i.e., each vertex is contained in a unique connected component, and the connected components of a graph form a disjoint decomposition of the set of vertices. A tree is a connected graph Γ without cycles. For a connected graph, this is easily seen to be equivalent to | V |  =  | E | + 1. A forest is a graph whose connected components are trees. A spanning tree in a graph Γ = (V, E) is a spanning subgraph \(\varGamma ^{{\prime}} = (V ^{{\prime}},E^{{\prime}})\), which is a tree. The number of spanning trees in a graph can be counted by the so-called Matrix-Tree Theorem 8.43.

Weighted Digraphs and Matrices. Nonnegative matrices are associated with digraphs in various ways. A digraph Γ = (V, E) is called weighted if for each edge \((v_{i},v_{j}) \in E\) one specifies a nonzero real number \(a_{\mathit{ij}} \in \mathbb{R}\). For \((v_{i},v_{j})\notin E\) set a ij  = 0. Thus, using a labeling \(\{1,\ldots,N\}\longrightarrow V\), one associates with the graph a real N × N matrix \(A(\varGamma ) = (a_{\mathit{ij}}) \in \mathbb{R}^{N\times N}\). We refer to A(Γ) as the weighted adjacency matrix. The labelings of the set of vertices differ from each other by a permutation π on \(\{1,\ldots N\}\). Thus the associated adjacency matrix changes by a similarity transformation π A(Γ)π −1. Conversely, if A denotes a real N × N matrix, then let \(\varGamma _{A} = (V _{A},E_{A})\) denote the associated finite directed graph with vertex set \(V _{A} =\{ 1,\ldots,N\}\). A pair \((i,j) \in V _{A} \times V _{A}\) is an edge of Γ A if and only if a ij ≠ 0. Then A is the weighted adjacency matrix of Γ A . Similarly, weighted undirected graphs are defined by specifying for each edge \(\{v_{i},v_{j}\}\) a real number a ij and a ij  = 0 for \(\{v_{i},v_{j}\}\notin E\). Thus the weight matrix A = (a ij ) of an undirected graph is always a real symmetric matrix and therefore has only real eigenvalues.

Every digraph can be considered in a canonical way as a weighted digraph by defining the weight matrix with 0, 1 entries as

$$\displaystyle{ \mathfrak{A} = (a_{\mathit{ij}}) \in \{ 0,1\}^{N\times N},\quad \text{with}\quad a_{\mathit{ ij}} = \left \{\begin{array}{@{}l@{\quad }l@{}} 1\quad &(v_{i},v_{j}) \in E, \\ 0\quad &\text{otherwise}. \end{array} \right. }$$

We refer to the digraph \(\varGamma _{\mathfrak{A}}\) as a canonically weighted digraph.

Example 8.24.

A simple example of digraphs with nonnegative weights arises in Euclidean distance geometry and shape analysis. Thus, consider an arbitrary directed graph Γ = (V, E) with vertex set \(V =\{ v_{1},\ldots,v_{N}\} \subset \mathbb{R}^{m}\). Using the Euclidean distance \(\|v - w\|\) between two vertices, define the weights as \(a_{\mathit{ij}} =\| v_{i} - v_{j}\|\) if and only if \((v_{i},v_{j}) \in E\), and a ij  = 0 otherwise. Then the weighted adjacency matrix contains all the mutual distances between ordered pairs of points \((v_{i},v_{j})\) that are specified by the edges of the graph. Thus this matrix contains very interesting information on the geometric configuration of the vertex points.

One can express the classical adjacency matrices of a graph in terms of basic graph operations. Let Γ and Γ be graphs on m and n vertices, respectively. The classical adjacency matrices for unions, sums, and products are

$$\displaystyle\begin{array}{rcl} \mathfrak{A}_{\varGamma \cup \varGamma ^{{\prime}}}& =& \mathrm{diag\,}(\mathfrak{A}_{\varGamma },\mathfrak{A}_{\varGamma ^{{\prime}}}),\quad \mathfrak{A}_{\varGamma \times \varGamma ^{{\prime}}} = \mathfrak{A}_{\varGamma } \otimes I_{n} + I_{m} \otimes \mathfrak{A}_{\varGamma ^{{\prime}}}\,, {}\\ \mathfrak{A}_{\varGamma +\varGamma ^{{\prime}}}& =& \left (\begin{array}{cc} \mathfrak{A}_{\varGamma } & J\\ J^{\top }&\mathfrak{A}_{\varGamma ^{{\prime}}} \end{array} \right )\;.{}\\ \end{array}$$

Here J denotes the m × n matrix with all entries equal to 1.
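
These formulas translate directly into matrix computations. A minimal sketch assuming NumPy; the helper path_adjacency, which builds the 0,1-adjacency matrix of an undirected path graph, is introduced here only for illustration. The final check uses the fact that in the product graph the degree of a vertex (v, v′) equals deg(v) + deg(v′), which follows from the Kronecker-sum formula.

```python
import numpy as np

def path_adjacency(n):
    # 0,1-adjacency matrix of the undirected path graph on n vertices
    A = np.zeros((n, n), dtype=int)
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1
    return A

A1, A2 = path_adjacency(3), path_adjacency(4)     # graphs on m = 3 and n = 4 vertices
m, n = A1.shape[0], A2.shape[0]

# Union: block-diagonal adjacency matrix
A_union = np.block([[A1, np.zeros((m, n), dtype=int)],
                    [np.zeros((n, m), dtype=int), A2]])

# Join: the union plus all edges between the two vertex sets
J = np.ones((m, n), dtype=int)
A_join = np.block([[A1, J], [J.T, A2]])

# Product: Kronecker-sum formula A (x) I_n + I_m (x) A'
A_product = np.kron(A1, np.eye(n, dtype=int)) + np.kron(np.eye(m, dtype=int), A2)

print(A_union.shape, A_join.shape, A_product.shape)   # (7, 7) (7, 7) (12, 12)

# In the product graph the degree of (v, v') is deg(v) + deg(v')
deg1, deg2 = A1.sum(axis=1), A2.sum(axis=1)
assert np.array_equal(A_product.sum(axis=1),
                      (deg1[:, None] + deg2[None, :]).ravel())
```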

The analysis of the structural properties of matrices is closely related to graph theory. The basic connectivity properties of digraphs are reflected in the associated properties of the weighted adjacency matrix. There is a simple graph-theoretic characterization of irreducible matrices.

Proposition 8.25.

The following conditions are equivalent for a matrix \(A \in \mathbb{R}^{N\times N}\) :

  1. 1.

    A is irreducible.

  2. 2.

    The digraph Γ A is strongly connected.

Proof.

Assume that Γ A is not strongly connected. Then there exist vertices \(w\neq j\) in \(V _{A}\) such that there exists no directed path from w to j. Let \(V ^{{\prime}}\subset V _{A}\) denote the set of vertices v such that there exists a directed path from v to j. Define \(V _{1} = V ^{{\prime}}\cup \{ j\}\) and \(V _{2} = V _{A}\setminus V _{1}\). By construction of these sets, there does not exist a path from a vertex \(w^{{\prime}}\) in V 2 to some vertex v in V 1; otherwise, one could concatenate the path from \(w^{{\prime}}\) to v with the path from v to j to obtain a path from \(w^{{\prime}}\) to j, so that \(w^{{\prime}}\in V _{1}\), contradicting \(V _{2} \cap V _{1} =\emptyset\). In particular, there is no edge from V 2 to V 1, i.e., \(a_{\mathit{ij}} = 0\) for \(i \in V _{2}\) and \(j \in V _{1}\). By assumption on w, j, one has \(j \in V _{1}\neq \emptyset,w \in V _{2}\). After a suitable renumbering of vertices one can assume, without loss of generality, that \(V _{1} =\{ 1,\ldots,r\},V _{2} =\{ r + 1,\ldots,N\}\), 1 ≤ r ≤ N − 1. Thus there exists a permutation matrix P such that

$$\displaystyle{ P^{\top }\mathit{AP} = \left (\begin{array}{cc} B &C \\ 0 &D \end{array} \right ). }$$

Therefore, A is reducible. Conversely, if A is reducible, then there exists no path from the set \(V _{2} =\{ r + 1,\ldots,N\}\) to \(V _{1} =\{ 1,\ldots,r\}\). Thus the graph is not strongly connected. This completes the proof. ■ 

For nonnegative matrices a stronger form of Proposition 8.25 is valid.

Theorem 8.26.

The following conditions are equivalent for a nonnegative \(A \in \mathbb{R}^{N\times N}\) .

  1. 1.

    A is irreducible.

  2. 2.

    For every pair of indices \(i,j \in \{ 1,\ldots,N\}\) there exists \(m \in \mathbb{N}\) , with \((A^{m})_{\mathit{ij}}\neq 0\) .

  3. 3.

    \((I_{N} + A)^{N-1} > 0\) .

  4. 4.

    The digraph Γ A is strongly connected.

Proof.

The equivalence of conditions 1 and 4 has already been shown. For \(m \in \mathbb{N}\), the sum

$$\displaystyle{ (A^{m})_{\mathit{ ij}} =\sum _{ k_{1},\ldots,k_{m-1}}^{}a_{\mathit{ik}_{1}}a_{k_{1}k_{2}}\ldots a_{k_{m-1}j} = 0 }$$

is zero if and only if each summand is zero. But this is equivalent to the property that there exists no walk from i to j of length m. Thus condition 2 is equivalent to condition 4.

Assume condition 3, i.e., that the entries of the matrix

$$\displaystyle{ (I + A)^{N-1} =\sum _{ j=0}^{N-1}{N - 1\choose j}A^{j} }$$

are all positive. Since A is nonnegative, for each pair of indices i, j the ij entry of \(A^{m}\) must then be positive for some \(m \in \{ 0,\ldots,N - 1\}\); for i ≠ j one can choose m ≥ 1. Thus condition 3 implies condition 2 and, therefore, condition 4. Conversely, assume that Γ A is strongly connected. Between every two distinct vertices there exists, then, a path of length ≤ N − 1 that connects them. Thus the off-diagonal entries of \((I + A)^{N-1}\) are positive. Since the diagonal entries of \((I + A)^{N-1}\) are positive, too, this implies condition 3. This completes the proof.

 ■ 

The adjacency matrix of the cyclic graph in Figure 8.6 is

$$\displaystyle{ A = \left (\begin{array}{cc} 0&1\\ 1 &0 \end{array} \right ) }$$

and satisfies, for each m,

$$\displaystyle{ A^{2m+1} = \left (\begin{array}{cc} 0&1 \\ 1&0 \end{array} \right ),\quad A^{2m} = \left (\begin{array}{cc} 1&0 \\ 0&1 \end{array} \right ). }$$

Thus A is an example of a nonnegative matrix that satisfies assertion 2 in Theorem 8.26 but does not satisfy A m > 0 for some m. Such phenomena are explained by the Perron–Frobenius theorem.

Fig. 8.6

Cyclic graph

Definition 8.27.

A nonnegative matrix \(A \in \mathbb{R}^{N\times N}\) is called primitive if A m > 0 for some \(m \in \mathbb{N}\). The smallest such integer m is called the primitivity index γ(A).

Thus a primitive nonnegative matrix A is irreducible, but the converse does not hold. In general, the primitivity index does not satisfy γ(A) ≤ N. One can show that the primitivity index always satisfies the sharp bound

$$\displaystyle{ \gamma (A) \leq N^{2} - 2N + 2. }$$
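The bound is attained by Wielandt's example of a directed N-cycle with one additional edge. As a numerical illustration (a Python/NumPy sketch; the helper functions is_irreducible and primitivity_index are ours, not part of the text), condition 3 of Theorem 8.26 yields a simple irreducibility test, and the primitivity index can be found by tracking the zero patterns of the powers A m:

```python
import numpy as np

def is_irreducible(A):
    """Test condition 3 of Theorem 8.26: all entries of (I + A)^(N-1) are positive."""
    N = A.shape[0]
    return np.all(np.linalg.matrix_power(np.eye(N) + (A > 0), N - 1) > 0)

def primitivity_index(A):
    """Smallest m with A^m > 0 entrywise, searched up to the bound N^2 - 2N + 2."""
    N = A.shape[0]
    pattern, B = np.eye(N), (A > 0).astype(float)
    for m in range(1, N * N - 2 * N + 3):
        pattern = (pattern @ B > 0).astype(float)   # zero pattern of A^m
        if np.all(pattern > 0):
            return m
    return None                                     # not primitive

# Wielandt's example: an N-cycle plus one extra edge, which attains the bound.
N = 5
W = np.zeros((N, N))
W[np.arange(N - 1), np.arange(1, N)] = 1            # cycle edges 1 -> 2, ..., (N-1) -> N
W[N - 1, 0] = 1                                     # edge N -> 1 closes the cycle
W[N - 1, 1] = 1                                     # one additional edge
print(is_irreducible(W), primitivity_index(W))      # expected: True 17, with 17 = N^2 - 2N + 2
```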

5 Graph Rigidity and Euclidean Distance Matrices

Graph rigidity is an important notion from Euclidean distance geometry that plays a central role in diverse areas such as civil engineering Henneberg (1911), the characterization of tensegrity structures, molecular geometry and 2D-NMR spectroscopy Havel and Wüthrich (1985), and formation shape control of multiagent systems Anderson et al. (2007). Formations of specified shape are useful in control for sensing and localizing objects, and formations of fixed shape can be used to move massive objects placed upon them. To steer formations of points from one location to another, steepest-descent methods are used to optimize a suitable cost function. Typically, the smooth cost function

$$\displaystyle{ V (X) = \frac{1} {4}\sum _{\mathit{ij}\in E}(\|x_{i} - x_{j}\|^{2} - d_{\mathit{ ij}}^{2})^{2} }$$

on the space of all formations \(X = (x_{1},\ldots,x_{N})\) of N points x i in \(\mathbb{R}^{m}\) is used. The gradient flow of V is

$$\displaystyle{\dot{x}_{i} = -\sum _{ j:\ \mathit{ij}\in E}^{}(\|x_{i} - x_{j}\|^{2} - d_{\mathit{ ij}}^{2})(x_{ i} - x_{j}),\quad i = 1,\ldots,N,}$$

and can be shown to converge from every initial condition pointwise to a single equilibrium point. It thus provides a simple computational approach to find a formation that realizes a specified set of distances d ij  > 0, ij ∈ E, indexed by the edges of a graph. The characterization of such critical formations and the analysis of their local stability properties in terms of the properties of the graph are among the open research problems in this field. Such research depends crucially on a deeper understanding of Euclidean distance geometry and associated graph-theoretic concepts, such as rigidity. We next turn to a brief description of such methods.
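The following sketch (Python/NumPy; the function name, edge list, prescribed distances, and step size are illustrative choices, not taken from the text) implements a simple Euler discretization of this negative gradient flow for a small planar example:

```python
import numpy as np

def formation_descent(edges, d, X0, step=0.01, iters=20000):
    """Euler discretization of the negative gradient flow of V(X)."""
    X = X0.copy()
    for _ in range(iters):
        G = np.zeros_like(X)
        for (i, j) in edges:
            diff = X[:, i] - X[:, j]
            err = diff @ diff - d[(i, j)] ** 2       # ||x_i - x_j||^2 - d_ij^2
            G[:, i] += err * diff
            G[:, j] -= err * diff
        X -= step * G                                # x_dot = -grad V
    return X

# Illustrative target: a unit square with one diagonal in the plane.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
d = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 1.0, (0, 2): np.sqrt(2.0)}
X0 = 0.5 * np.random.default_rng(0).standard_normal((2, 4))
X = formation_descent(edges, d, X0)
# Realized edge lengths; close to the prescribed d_ij if the descent reaches a distance-realizing minimizer.
print([round(float(np.linalg.norm(X[:, i] - X[:, j])), 3) for (i, j) in edges])
```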

In Euclidean distance geometry one considers a finite tuple of points \(x_{1},\ldots,x_{N}\) in Euclidean space \(\mathbb{R}^{m}\), called a formation, together with an undirected, connected graph Γ = (V, E) on the set of vertices V = { 1, , N}, with prescribed distances \(d_{\mathit{ij}} =\| x_{i} - x_{j}\|\) for each edge ij ∈ E of Γ. Conversely, by assigning positive real numbers d ij to the edges ij of a graph Γ, one is asked to find a formation \(x_{1},\ldots,x_{N} \in \mathbb{R}^{m}\) that realizes the d ij as distances \(\|x_{i} - x_{j}\|\). In heuristic terms (see below for a more formal definition), a formation \((x_{1},\ldots,x_{N})\) is then called rigid whenever there does not exist a nontrivial perturbed formation \((x_{1}^{{\prime}},\ldots,x_{N}^{{\prime}})\) near \((x_{1},\ldots,x_{N})\) that realizes the same prescribed distances.

Let us associate with a vertex element i ∈ V a point x i in Euclidean space \(\mathbb{R}^{m}\). The m × N matrix \(X = (x_{1},\ldots,x_{N})\) then describes a formation X of N points labeled by the set of vertices of Γ. With this notation at hand, consider the smooth distance map

$$\displaystyle{ \mathcal{D}: \mathbb{R}^{m\times N}\longrightarrow \mathbb{R}^{M},\quad \mathcal{D}(X) = (\|x_{ i} - x_{j}\|^{2})_{ (i,j)\in E}. }$$

The image set

$$\displaystyle{ \mathit{CM}_{m}(\varGamma ) =\{ \mathcal{D}(X)\;\vert \;X \in \mathbb{R}^{m\times N}\} }$$

is called the Cayley–Menger variety . Being the image of a real polynomial map, the Cayley–Menger variety defines a semialgebraic subset of \(\mathbb{R}^{M}\), which is in fact closed. It is a fundamental geometric object that is attached to the set of all realizations of a graph in \(\mathbb{R}^{m}\). For simplicity, let us focus on the complete graph K N with a set of vertices \(V =\{ 1,\ldots,N\}\) and a set of edges E = V × V. Then the elements of the Cayley–Menger variety \(\mathit{CM}_{m}(K_{N})\) are in bijective correspondence with the set of N × N Euclidean distance matrices

$$\displaystyle{ D(x_{1},\ldots,x_{N}) = \left (\begin{array}{cccc} 0 &\|x_{1} - x_{2}\|^{2} & \mathop{\ldots } & \|x_{1} - x_{N}\|^{2} \\ \|x_{1} - x_{2}\|^{2} & 0 & \mathop{\ldots } & \|x_{2} - x_{N}\|^{2}\\ \\ \vdots & \ddots & \ddots & \vdots \\ \|x_{1} - x_{N-1}\|^{2} & & 0 &\|x_{N-1} - x_{N}\|^{2} \\ \|x_{1} - x_{N}\|^{2} & \mathop{\ldots } &\|x_{N-1} - x_{N}\|^{2} & 0 \end{array} \right ) }$$

defined by \(x_{1}\ldots,x_{N} \in \mathbb{R}^{m}\). Thus,

$$\displaystyle{D(x_{1},\ldots,x_{N}) = -2X^{\top }X + \mathbf{x}\mathbf{e}^{\top } + \mathbf{e}\mathbf{x}^{\top }.}$$

Here \(\mathbf{x} =\mathrm{ col\,}(\|x_{1}\|^{2},\ldots,\|x_{N}\|^{2}) \in \mathbb{R}^{N}\), and \(X^{\top }X = (x_{i}^{\top }x_{j}) \in \mathbb{R}^{N\times N}\) denotes the Gramian matrix associated with \(x_{1},\ldots,x_{N}\). In particular, \(D(x_{1},\ldots,x_{N})\) is a rank two perturbation of the rank ≤ m Gramian matrix \(X^{\top }X\). This observation implies that Euclidean distance matrices of N points in \(\mathbb{R}^{m}\) have rank ≤ m + 2, while for generic choices of \(x_{1},\ldots,x_{N}\) the rank is equal to m + 2. To characterize the set of Euclidean distance matrices, one needs a simple lemma from linear algebra.
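Before turning to that lemma, a short computation (a Python/NumPy sketch; the randomly chosen dimensions and formation are purely illustrative) confirms the identity \(D = -2X^{\top }X + \mathbf{x}\mathbf{e}^{\top } + \mathbf{e}\mathbf{x}^{\top }\) and the rank bound m + 2 for a generic formation:

```python
import numpy as np

rng = np.random.default_rng(1)
m, N = 3, 7
X = rng.standard_normal((m, N))                       # generic formation of N points in R^m

G = X.T @ X                                           # Gramian matrix
x = np.diag(G).reshape(-1, 1)                         # column vector of squared norms
e = np.ones((N, 1))
D = -2 * G + x @ e.T + e @ x.T                        # candidate distance matrix

D_check = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
print(np.allclose(D, D_check))                        # True: D_ij = ||x_i - x_j||^2
print(np.linalg.matrix_rank(D))                       # m + 2 = 5 for generic points
```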

Lemma 8.28.

Let \(A = A^{\top }\in \mathbb{R}^{n\times n}\) , and assume that \(B \in \mathbb{R}^{r\times n}\) has full row rank r. Let Q A denote the quadratic form x Ax defined on the kernel Ker B of B. Then the symmetric matrix

$$\displaystyle{C = \left (\begin{array}{cc} 0 &B \\ B^{\top }& A \end{array} \right )}$$

satisfies the equations for rank and signature

$$\displaystyle\begin{array}{rcl} \mathop{\mathrm{rk}}\nolimits \;C& =& \mathop{\mathrm{rk}}\nolimits \;Q_{A} + 2r, {}\\ \mathop{\mathrm{sign}}\nolimits \;C& =& \mathop{\mathrm{sign}}\nolimits \;Q_{A}. {}\\ \end{array}$$

Proof.

Let L and R be invertible r × r and n × n matrices, respectively, with \(\mathit{LBR} = (I_{r},0)\). Then

$$\displaystyle{\left (\begin{array}{ll} L&0 \\ 0 &R^{\top } \end{array} \right )\left (\begin{array}{ll} 0 &B \\ B^{\top } &A \end{array} \right )\left (\begin{array}{ll} L^{\top }&0 \\ 0 &R \end{array} \right ) = \left (\begin{array}{lc} 0 & \mathit{LBR} \\ (\mathit{LBR})^{\top }& R^{\top }AR \end{array} \right ),}$$

and after such a suitable transformation one can assume without loss of generality that B = (I r , 0). Partition the matrix A

$$\displaystyle{A = \left (\begin{array}{cc} A_{11} & A_{12} \\ A_{12}^{\top }&A_{22} \end{array} \right ),}$$

where A 11, A 12, and A 22 have sizes r × r, r × (nr), and (nr) × (nr). Applying elementary row and column operations we obtain

$$\displaystyle\begin{array}{rcl} C& =& \left (\begin{array}{ccc} 0 & I_{r} & 0 \\ I_{r}& A_{11} & A_{12} \\ 0 &A_{12}^{\top }&A_{22} \end{array} \right ) = \left (\begin{array}{ccc} I &0&0 \\ \frac{1} {2}A_{11} & I &0 \\ A_{12}^{\top }&0&I \end{array} \right )\left (\begin{array}{ccc} 0 &I_{r}& 0 \\ I_{r}& 0 & 0 \\ 0 & 0 &A_{22} \end{array} \right )\left (\begin{array}{ccc} I &\frac{1} {2}A_{11} & A_{12} \\ 0& I & 0\\ 0 & 0 & I \end{array} \right ).{}\\ \end{array}$$

Thus the inertia theorem of Sylvester implies that

$$\displaystyle\begin{array}{rcl} \mathop{\mathrm{rk}}\nolimits \;C& =& \mathop{\mathrm{rk}}\nolimits \;A_{22} + 2r =\mathop{ \mathrm{rk}}\nolimits \;Q_{A} + 2r, {}\\ \mathop{\mathrm{sign}}\nolimits \;C& =& \mathop{\mathrm{sign}}\nolimits \;A_{22} =\mathop{ \mathrm{sign}}\nolimits \;Q_{A}. {}\\ \end{array}$$

 ■ 

A classical result by Menger (1928), see also Blumenthal (1953), asserts that the Euclidean distance matrices \(D(x_{1}\ldots,x_{N})\) of N points in \(\mathbb{R}^{m}\) have nonpositive Cayley–Menger determinants

$$\displaystyle\begin{array}{rcl} \mathop{\mathrm{det}}\nolimits \;\left (\begin{array}{*{10}c} 0& 1 & 1 &\mathop{\ldots }& 1 \\ 1& 0 & -\frac{1} {2}\|x_{1} - x_{2}\|^{2} & \mathop{\ldots }&-\frac{1} {2}\|x_{1} - x_{k}\|^{2} \\ 1& -\frac{1} {2}\|x_{1} - x_{2}\|^{2} & 0 &\mathop{\ldots }&-\frac{1} {2}\|x_{2} - x_{k}\|^{2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1&-\frac{1} {2}\|x_{1} - x_{k}\|^{2} & -\frac{1} {2}\|x_{2} - x_{k}\|^{2} & \mathop{\ldots }& 0 \end{array} \right ) \leq 0& &{}\end{array}$$
(8.10)

for each k ≤ N (and are equal to zero for k > m + 1). One can easily deduce this condition from the following more general characterization of Euclidean distance matrices.

Theorem 8.29.

Let \(A = (a_{\mathit{ij}}) \in \mathbb{R}_{+}^{N\times N}\) be a nonnegative symmetric matrix, with \(a_{11} = \cdots = a_{\mathit{NN}} = 0\) . The following assertions are equivalent:

  1. (a)

    A is a Euclidean distance matrix of N points in \(\mathbb{R}^{m}\) .

  2. (b)

    There exists a nonnegative vector \(\mathbf{a} \in \mathbb{R}_{+}^{N}\) that satisfies the linear matrix inequality with rank constraint

    $$\displaystyle{-A + \mathbf{a}\mathbf{e}^{\top } + \mathbf{e}\mathbf{a}^{\top }\succeq 0,\quad \mathop{\mathrm{rk}}\nolimits (-A + \mathbf{a}\mathbf{e}^{\top } + \mathbf{e}\mathbf{a}^{\top }) \leq m.}$$
  3. (c)

    There exists a positive semidefinite matrix S of rank ≤ m, with

    $$\displaystyle{ -\frac{1} {2}A = S -\frac{1} {2}\left (\mathrm{diag\,}(S)\mathbf{e}\mathbf{e}^{\top } + \mathbf{e}\mathbf{e}^{\top }\mathrm{diag\,}(S)\right ). }$$
    (8.11)
  4. (d)

    The matrix

    $$\displaystyle{S_{A}:= -\frac{1} {2}\left (I_{N} - \frac{1} {N}\mathbf{e}\mathbf{e}^{\top }\right )A\left (I_{ N} - \frac{1} {N}\mathbf{e}\mathbf{e}^{\top }\right )}$$

    is positive semidefinite of rank ≤ m.

  5. (e)

    The restriction of the quadratic form x Ax on \((\mathbb{R}\mathbf{e})^{\perp }\) is negative semidefinite and has rank ≤ m.

  6. (f)

    The Cayley–Menger matrix

    $$\displaystyle{ \mathit{CM}(A):= \left (\begin{array}{*{10}c} 0& 1 & 1 &\mathop{\ldots }& 1 \\ 1& 0 & -\frac{1} {2}a_{12} & \mathop{\ldots }&-\frac{1} {2}a_{1N} \\ 1& -\frac{1} {2}a_{12} & 0 &\mathop{\ldots }&-\frac{1} {2}a_{2N}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1&-\frac{1} {2}a_{1N}&-\frac{1} {2}a_{2N}&\mathop{\ldots }& 0\end{array} \right ) }$$

    has exactly one negative eigenvalue and at most m + 1 positive eigenvalues.

Proof.

Using the identity \(\|x_{i} - x_{j}\|^{2} =\| x_{i}\|^{2} +\| x_{j}\|^{2} - 2x_{i}^{\top }x_{j}\), we see that \(A = D(x_{1},\ldots,x_{N})\) for some points \(x_{1},\ldots,x_{N} \in \mathbb{R}^{m}\) if and only if

$$\displaystyle{ -\frac{1} {2}A = X^{\top }X -\frac{1} {2}\mathbf{a}\mathbf{e}^{\top }-\frac{1} {2}\mathbf{e}\mathbf{a}^{\top }, }$$
(8.12)

where \(\mathbf{a} =\mathrm{ col\,}(\|x_{1}\|^{2},\ldots,\|x_{N}\|^{2})\). Equivalently, A is a Euclidean distance matrix in \(\mathbb{R}^{m}\) if and only if there exists a positive semidefinite matrix S of rank ≤ m with

$$\displaystyle{ -\frac{1} {2}A = S -\frac{1} {2}\left (\mathrm{diag\,}(S)\mathbf{e}\mathbf{e}^{\top } + \mathbf{e}\mathbf{e}^{\top }\mathrm{diag\,}(S)\right ). }$$

Here diag S is a diagonal matrix with the same diagonal entries as S. This completes the proof of the equivalence of the first three conditions. It is easily seen that

$$\displaystyle{S:= -\frac{1} {2}\left (I_{N} - \frac{1} {N}\mathbf{e}\mathbf{e}^{\top }\right )A\left (I_{ N} - \frac{1} {N}\mathbf{e}\mathbf{e}^{\top }\right )}$$

satisfies (8.11). Thus (d) implies (c) and hence also (a). Conversely, (a) implies (8.12), and therefore

$$\displaystyle{S_{A} = \left (I_{N} - \frac{1} {N}\mathbf{e}\mathbf{e}^{\top }\right )X^{\top }X\left (I_{ N} - \frac{1} {N}\mathbf{e}\mathbf{e}^{\top }\right )}$$

is positive semidefinite of rank ≤ m. This shows the equivalence of (a) and (d). The equivalence of (a) with (e) and (f) follows from Lemma 8.28 by noting that S A in (d) satisfies \(x^{\top }S_{A}x = -\frac{1} {2}x^{\top }\mathit{Ax}\) for all \(x \in (\mathbb{R}\mathbf{e})^{\perp }\). This completes the proof. ■ 
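Condition (d) is also constructive: factoring the doubly centered matrix S A recovers a realizing formation, which is the classical multidimensional scaling construction. The following sketch (Python/NumPy; the function name classical_mds and the random test data are illustrative, not from the text) demonstrates this:

```python
import numpy as np

def classical_mds(A, m):
    """Recover a formation in R^m from a Euclidean distance matrix A via S_A = -1/2 J A J."""
    N = A.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N
    S = -0.5 * J @ A @ J                          # positive semidefinite of rank <= m for an EDM
    w, V = np.linalg.eigh(S)
    w, V = w[::-1][:m], V[:, ::-1][:, :m]         # m largest eigenvalues and eigenvectors
    return (V * np.sqrt(np.maximum(w, 0.0))).T    # (m, N) formation, unique up to E(m)

rng = np.random.default_rng(2)
X = rng.standard_normal((2, 5))
A = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
Y = classical_mds(A, 2)
A_rec = ((Y[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)
print(np.allclose(A, A_rec))                      # True: the distances are reproduced
```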

There are two simple facts about Cayley–Menger determinants that are worth mentioning. First, for N = 3 points in \(\mathbb{R}^{2}\), one has the following expression for the Cayley–Menger determinant of the distance matrix (with \(a_{\mathit{ij}}:=\| x_{i} - x_{j}\|\))

$$\displaystyle\begin{array}{rcl} & & \det \mathit{CM}(D(x_{1},x_{2},x_{3})) {}\\ & & = -\frac{1} {4}(a_{12} + a_{13} + a_{23})(a_{12} + a_{13} - a_{23})(a_{12} - a_{13} + a_{23})(-a_{12} + a_{13} + a_{23}).{}\\ \end{array}$$

This relates to the familiar triangle inequalities that characterize a triple of nonnegative real numbers \(d_{1},d_{2},d_{3}\) as the side lengths of a triangle. Second, a well-known formula for the volume \(\mathop{\mathrm{Vol}}\nolimits \;(\varSigma )\) of the simplex Σ defined by N + 1 points \(x_{0},\ldots,x_{N}\) in \(\mathbb{R}^{N}\) asserts that

$$\displaystyle{\mathop{\mathrm{Vol}}\nolimits \;(\varSigma ) = \frac{1} {N!}\vert \mathop{\mathrm{det}}\nolimits \;(x_{1} - x_{0},\ldots,x_{N} - x_{0})\vert.}$$

By the translational invariance of the norm, the distance matrix satisfies \(A:= D(x_{0},\ldots,x_{N}) = D(0,p_{1},\ldots,p_{N})\), with \(p_{i}:= x_{i} - x_{0}\) for \(i = 1,\ldots,N\). Applying (8.12) to \(d:=\mathrm{ col\,}(0,\|p_{1}\|^{2},\ldots,\|p_{N}\|^{2})\), \(P:= (p_{1},\ldots,p_{N}) \in \mathbb{R}^{N\times N}\), we obtain

$$\displaystyle\begin{array}{rcl} (N!)^{2}\mathop{\mathrm{Vol}}\nolimits ^{2}(\varSigma )& =& \vert \det P\vert ^{2} =\det \left (\begin{array}{cc} I_{2} & 0 \\ 0 &P^{\top }P \end{array} \right ) {}\\ & =& -\det \left (\begin{array}{ccccc} 0&1& 1 &\cdots & 1\\ 1 &0 & 0 &\cdots & 0 \\ 1&0& p_{1}^{\top }p_{1} & \cdots & p_{1}^{\top }p_{N}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1&0&p_{N}^{\top }p_{1} & \cdots &p_{N}^{\top }p_{N} \end{array} \right ) {}\\ & =& -\det \left (\begin{array}{cc} 0& \mathbf{e}^{\top } \\ \mathbf{e}& -\frac{1} {2}D \end{array} \right ). {}\\ \end{array}$$

Thus we obtain the formula for the squared volume of the simplex in terms of the mutual distances as

$$\displaystyle{\mathop{\mathrm{Vol}}\nolimits ^{2}(\varSigma ) = -\left ( \frac{1} {N!}\right )^{2}\det \mathit{CM},}$$

where CM is the Cayley–Menger matrix of the distance matrix \(D(x_{0},\ldots,x_{N})\).
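The volume formula is convenient for computation. The following sketch (Python/NumPy; the helper names are illustrative) evaluates it for a right triangle and for the standard simplex in \(\mathbb{R}^{3}\):

```python
import numpy as np
from math import factorial

def cayley_menger(D):
    """Bordered Cayley-Menger matrix [[0, e^T], [e, -D/2]] of a squared-distance matrix D."""
    n = D.shape[0]
    CM = np.zeros((n + 1, n + 1))
    CM[0, 1:] = 1.0
    CM[1:, 0] = 1.0
    CM[1:, 1:] = -0.5 * D
    return CM

def simplex_volume(points):
    """Volume of the simplex on N+1 points in R^N via Vol^2 = -(1/N!)^2 det CM."""
    X = np.asarray(points, dtype=float)           # rows are the points x_0, ..., x_N
    N = X.shape[0] - 1
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    vol2 = -np.linalg.det(cayley_menger(D)) / factorial(N) ** 2
    return np.sqrt(max(vol2, 0.0))

print(simplex_volume([[0, 0], [1, 0], [0, 1]]))                       # 0.5 (area of the triangle)
print(simplex_volume([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]))   # 1/6 (standard simplex)
```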

Returning to the situation of formations defined over a graph, the rigidity matrix of a formation is defined as the M ×mN Jacobi matrix \(R(X) = \mathit{Jac}_{\mathcal{D}}(X)\) whose ijth row (ij ∈ E) is

$$\displaystyle{ R(X)_{\mathit{ij}} = (e_{i} - e_{j})^{\top }\otimes (x_{ i} - x_{j})^{\top }. }$$

A formation X is called regular whenever

$$\displaystyle{ \mathop{\mathrm{rk}}\nolimits R(X) =\max _{Z\in \mathbb{R}^{m\times N}}\mathop{ \mathrm{rk}}\nolimits R(Z). }$$

The regular formations form an open and dense subset in the space of all formations. The geometry of formations is closely connected with the standard action of the Euclidean group on \(\mathbb{R}^{m}\). Let O(m) denote the compact matrix Lie group of all real orthogonal m × m matrices. The Euclidean group E(m) then parameterizes all Euclidean group transformations of the form \(p\mapsto \mathit{gp} + v\), where g ∈ O(m) and \(v \in \mathbb{R}^{m}\) denotes a translation vector. Thus E(m) is a Lie group of dimension \(\frac{m(m+1)} {2}\), which is in fact a semidirect product of O(m) and \(\mathbb{R}^{m}\). Since \(\mathcal{D}\) is invariant under orthogonal rotations, i.e., \(\mathcal{D}(\mathit{SX}) = \mathcal{D}(X)\) for S ∈ O(m), the tangent space to such a group orbit is contained in the kernel of the rigidity matrix R(X).

Lemma 8.30.

The kernel of the rigidity map R(X) contains the tangent space T X (O(m) ⋅ X). Suppose the linear span of the columns \(x_{1},\ldots,x_{N}\) of X has dimension r. Then the kernel of R(X) has at least dimension \(\frac{1} {2}r(2m - r - 1)\) .

Proof.

The first statement is a simple consequence of the invariance of \(\mathcal{D}\) under the group of orthogonal transformations \(X\mapsto \mathit{SX}\). Note that the stabilizer group O(m) X of X coincides with the subgroup of O(m) that leaves the elements of the linear span \(< x_{1},\ldots,x_{N} >\) pointwise invariant. Thus a straightforward computation reveals that the dimension of O(m) X is equal to \(\frac{1} {2}(m - r)(m - r - 1)\). Therefore, the dimension of the group orbit O(m) ⋅ X is equal to \(\frac{1} {2}m(m - 1) -\frac{1} {2}(m - r)(m - r - 1) = \frac{1} {2}r(2m - r - 1)\). This completes the proof. ■ 

A formation is called infinitesimally rigid if the kernel of the rigidity matrix coincides with the tangent space T X (O(m) ⋅ X). Equivalently, infinitesimal rigidity holds if and only if the following rank condition is satisfied:

$$\displaystyle{ \mathop{\mathrm{rk}}\nolimits R(X) = m(N - 1) -\frac{1} {2}\mathop{\mathrm{rk}}\nolimits X(2m -\mathop{\mathrm{rk}}\nolimits X - 1). }$$

Note that from the structure of R(X) one can easily check that \(\mathop{\mathrm{rk}}\nolimits R(X) \leq N - 1\) for r = 1. Likewise one can check that \(\mathop{\mathrm{rk}}\nolimits R(X) \leq 2N - 3\) for r = 2. With the aid of these bounds, one can then verify using the rank condition that a formation of N points in the plane \(\mathbb{R}^{2}\) is infinitesimally rigid if and only if r = 2 and the rank of R(X) is equal to 2N − 3. Similarly, a formation of N ≥ 4 points in \(\mathbb{R}^{3}\) is infinitesimally rigid if and only if r = 3 and the rank of R(X) is equal to 3N − 6. A formation X is called rigid whenever the orbit SO(m) ⋅ X is isolated in the fiber \(\mathcal{D}^{-1}(\mathcal{D}(X))\). Every infinitesimally rigid formation is rigid, but the converse does not hold. In fact, regular formations are infinitesimally rigid if and only if they are rigid (Figures 8.7 and 8.8).
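The rank test for infinitesimal rigidity is easy to carry out numerically. The following sketch (Python/NumPy; the helper names and the square example are illustrative, not from the text) assembles R(X) from its rows \((e_{i} - e_{j})^{\top }\otimes (x_{i} - x_{j})^{\top }\) and applies the planar rank condition rk R(X) = 2N − 3:

```python
import numpy as np

def rigidity_matrix(X, edges):
    """Rigidity matrix R(X): one row (e_i - e_j)^T (x) (x_i - x_j)^T per edge ij."""
    m, N = X.shape
    R = np.zeros((len(edges), m * N))
    for row, (i, j) in enumerate(edges):
        diff = X[:, i] - X[:, j]
        R[row, m * i:m * (i + 1)] = diff
        R[row, m * j:m * (j + 1)] = -diff
    return R

def infinitesimally_rigid_in_plane(X, edges):
    """Rank test rk R(X) = 2N - 3 for planar formations whose points affinely span R^2."""
    return np.linalg.matrix_rank(rigidity_matrix(X, edges)) == 2 * X.shape[1] - 3

X = np.array([[0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])                    # unit square in the plane
square = [(0, 1), (1, 2), (2, 3), (3, 0)]               # 4 edges: a flexible framework
braced = square + [(0, 2)]                              # one diagonal: 2N - 3 = 5 edges
print(infinitesimally_rigid_in_plane(X, square))        # False
print(infinitesimally_rigid_in_plane(X, braced))        # True
```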

Fig. 8.7

Complete graphs are rigid

Fig. 8.8

Nonrigid graph

A rigid graph in \(\mathbb{R}^{m}\) is one for which almost every formation \(X \in \mathbb{R}^{m\times N}\) is infinitesimally rigid. Thus Γ is rigid in \(\mathbb{R}^{m}\) if and only if the rigidity matrix R(X) has generic rank equal to \(\mathit{mN} -\frac{m(m+1)} {2}\). A rigid graph is called minimally rigid if it has exactly \(\mathit{mN} -\frac{m(m+1)} {2}\) edges (Figure 8.9). An example of a rigid graph is the complete graph K N that has an edge between each pair of the N vertices. In the plane, K N is minimally rigid if and only if N = 2, 3. In contrast, a graph with 4 vertices and 5 edges realized in \(\mathbb{R}^{2}\) is minimally rigid.

Fig. 8.9

Minimally rigid graph

Rigid graphs in \(\mathbb{R}^{2}\) are characterized in a combinatorial manner by the so-called Laman condition stated in the next theorem.

Theorem 8.31 (Laman (1970)).

A graph Γ with \(M\) edges and N vertices is minimally rigid in \(\mathbb{R}^{2}\) if and only if the following two conditions are satisfied:

  1. (a)

    M = 2N − 3.

  2. (b)

    Every subgraph Γ′ of Γ with N′ ≥ 2 vertices and M′ edges satisfies \(M^{{\prime}}\leq 2N^{{\prime}}- 3\) .

An explicit combinatorial characterization of rigid graphs in \(\mathbb{R}^{3}\) is unknown.

6 Spectral Graph Theory

Definition 8.32.

The spectrum of a weighted graph is defined as the set of eigenvalues of the adjacency matrix A(Γ), counted with their multiplicities. The characteristic polynomial of Γ is defined as the characteristic polynomial

$$\displaystyle{ \det (\mathit{zI} - A(\varGamma )) = z^{N} + c_{ 1}z^{N-1} + \cdots + c_{ N}. }$$

The field of spectral graph theory is concerned with attempting to characterize the properties of graphs using information on the spectrum. Typically, the graphs are not weighted, and thus the adjacency matrix considered is the classical adjacency matrix of a graph. The first three coefficients of the characteristic polynomial of an unweighted graph (without self-loops) are easily characterized as follows:

  1. 1.

    c 1 = 0,

  2. 2.

    c 2 = − | E | ,

  3. 3.

    c 3 = −2δ, where δ denotes the number of triangles in Γ.

We refer the reader to Cvetkovic, Rowlinson and Simic (2010) for a detailed study of graph spectra.
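These coefficient formulas are easy to verify numerically. The following sketch (Python/NumPy; the random graph is an illustrative choice) compares them with the characteristic polynomial of the adjacency matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 6
A = np.triu((rng.random((N, N)) < 0.5).astype(float), k=1)
A = A + A.T                                     # adjacency matrix of a random simple graph

c = np.poly(A)                                  # det(zI - A) = z^N + c[1] z^(N-1) + ... + c[N]
num_edges = int(A.sum() / 2)
num_triangles = int(round(np.trace(np.linalg.matrix_power(A, 3)) / 6))

print(np.isclose(c[1], 0))                      # c_1 = 0
print(np.isclose(c[2], -num_edges))             # c_2 = -|E|
print(np.isclose(c[3], -2 * num_triangles))     # c_3 = -2 (number of triangles)
```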

Laplacian matrices are constructed from the adjacency matrix of a weighted graph through the notion of oriented incidence matrices. Their spectral properties dominate the present theory of consensus and synchronization, as will be further explained in Chapter 11. Let Γ = (V, E) be an oriented (directed or undirected) weighted graph with associated maps \(\iota: E\longrightarrow V\) and \(\tau: E\longrightarrow V\) assigning to each edge its initial and terminal point, respectively. Here and in the sequel we will always use the canonical orientation on a digraph. Assume that Γ has N vertices \(\{v_{1},\ldots,v_{N}\}\) and M edges \(\{e_{1},\ldots,e_{M}\}\). Thus, for an edge e ∈ E with initial and terminal points \(v_{i} =\iota (e),v_{j} =\tau (e),\) respectively, there are associated weights \(a_{e} = a_{\mathit{ij}} \geq 0\). Let \(A \in \mathbb{R}_{+}^{N\times N}\) denote the weighted adjacency matrix of Γ = (V, E). Equivalently, one can present the weights as the diagonal matrix

$$\displaystyle{ W =\mathrm{ diag\,}(a_{e})_{e\in E}\; \in \mathbb{R}^{M\times M}. }$$

The oriented incidence matrix is defined as

$$\displaystyle{ B = (b_{\mathit{ij}}) \in \mathbb{R}^{N\times M},\quad b_{\mathit{ ij}} = \left \{\begin{array}{@{}l@{\quad }l@{}} 1 \quad &\text{if }\;v_{i} =\tau (e_{j})\neq \iota (e_{j}), \\ -1\quad &\text{if}\;v_{i} =\iota (e_{j})\neq \tau (e_{j}), \\ 0 \quad &\text{otherwise}. \end{array} \right. }$$
(8.13)

Thus every incidence matrix has in each of its columns a single entry 1 and a single entry − 1. All other entries in the column are zero. If B, B′ are incidence matrices of two identical graphs but with different orientations, then B′ = BS for a unique diagonal matrix \(S =\mathrm{ diag\,}(s_{1},\ldots,s_{M})\) with s i  = ±1. Thus the product \(B^{{\prime}}W(B^{{\prime}})^{\top } = \mathit{BWB}^{\top }\) is independent of the orientation. If the graph Γ is strongly connected, then the incidence matrix B is known to have rank N − 1; see, for example, Fiedler (2008). Thus the kernel of B has dimension M − N + 1. Each vector in Ker B describes a cycle in the graph Γ. Thus there are exactly M − N + 1 linearly independent cycles for a (directed or undirected) strongly connected graph, defined by a basis of the kernel of B with integer coefficients (Figure 8.10).
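The following sketch (Python/NumPy; the small digraph and its weights are illustrative choices) builds an oriented incidence matrix according to (8.13) and confirms the statements on the rank of B and the dimension of the cycle space:

```python
import numpy as np

def incidence_matrix(N, edges):
    """Oriented incidence matrix (8.13): column e has -1 at the initial and +1 at the terminal vertex."""
    B = np.zeros((N, len(edges)))
    for k, (i, j) in enumerate(edges):          # edge number k runs from vertex i to vertex j
        B[i, k] = -1.0
        B[j, k] = 1.0
    return B

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]    # strongly connected digraph, N = 4, M = 5
N, M = 4, len(edges)
B = incidence_matrix(N, edges)
W = np.diag([1.0, 2.0, 1.0, 3.0, 2.0])              # positive edge weights

print(np.linalg.matrix_rank(B))                     # N - 1 = 3
print(M - np.linalg.matrix_rank(B))                 # M - N + 1 = 2 independent cycles
print(np.allclose(B @ W @ B.T @ np.ones(N), 0))     # True: BWB^T annihilates e
```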

Fig. 8.10

Orientation labeling of a graph

Let \(\overleftarrow{\varGamma } = (V,\overleftarrow{E})\) denote the so-called reverse graph, i.e., Γ and \(\overleftarrow{\varGamma }\) have the same set V of vertices and for each pair of vertices u, v:

$$\displaystyle{ (u,v) \in E\;\Longleftrightarrow\;(v,u) \in \overleftarrow{ E}. }$$

Then the adjacency matrix of \(\overleftarrow{\varGamma }\) is the transpose of that of Γ, i.e.,

$$\displaystyle{ A(\overleftarrow{\varGamma }) = A(\varGamma )^{\top }. }$$

Definition 8.33 (Laplacian).

Let Γ = (V, E) be a weighted (directed or undirected) graph with nonnegative weight matrix \(A(\varGamma ) \in \mathbb{R}_{+}^{N\times N}\). Let \(D_{\varGamma } =\mathrm{ diag\,}(d_{1},\ldots,d_{N})\) denote the real diagonal matrix with entries \(d_{i}:=\sum _{ j=1}^{N}a_{\mathit{ij}}\). The Laplacian of the weighted digraph is defined as

$$\displaystyle{ L(\varGamma ) = D(\varGamma ) - A(\varGamma ) \in \mathbb{R}^{N\times N}. }$$

The canonical Laplacian \(\mathcal{L}(\varGamma ) \in \mathbb{Z}^{N\times N}\) of a (directed or undirected) graph is defined as a Laplacian with the canonical 0, 1 adjacency weight matrix \(\mathfrak{A}(\varGamma )\).

Proposition 8.34.

Let Γ = (V,E) be an oriented directed graph with nonnegative weight matrix A Γ . Then

$$\displaystyle{ \mathit{BWB}^{\top } = L(\varGamma ) + L(\overleftarrow{\varGamma }). }$$

If Γ is undirected, then

$$\displaystyle{ \mathit{BWB}^{\top } = L(\varGamma ). }$$

Proof.

Let \(e_{1},\ldots,e_{M}\) denote the edges of Γ. The ij entry of \(\mathit{BWB}^{\top }\) is equal to \(\sum _{k=1}^{M}a_{e_{k}}b_{\mathit{ik}}b_{\mathit{jk}}\). Note that \(b_{\mathit{ik}}b_{\mathit{jk}} = 0\) if and only if either i or j is not a vertex of the edge e k . This shows that

$$\displaystyle\begin{array}{rcl} (\mathit{BWB}^{\top })_{\mathit{ ij}} = \left \{\begin{array}{@{}l@{\quad }l@{}} -(a_{\mathit{ij}} + a_{\mathit{ji}}) \quad &\text{for}\;i\neq j \\ \sum _{r=1}^{N}(a_{\mathit{ir}} + a_{\mathit{ri}})\quad &\text{for}\;i = j\;. \end{array} \right.& & {}\\ \end{array}$$

This shows the formula for directed graphs. For an undirected graph, each edge {i,j} appears only once in the edge set, so its weight a ij enters the sum only once; since a ij  = a ji , this yields \(\mathit{BWB}^{\top } = L(\varGamma )\). This completes the proof. ■ 

The Laplacian of an undirected, weighted graph is always a real symmetric matrix and therefore has only real eigenvalues. Of course, for a directed graph this does not need to be true. However, there are important constraints on the spectrum of a Laplacian that are specified by the next theorem.

For \(\xi =\mathrm{ col\,}(\xi _{1},\ldots,\xi _{N}) \in \mathbb{R}^{N}\), introduce the diagonal matrix \(\mathrm{diag\,}(\xi _{j} -\xi _{i})_{(i,j)\in E}\; \in \mathbb{R}^{M\times M}.\) The oriented incidence matrix then satisfies the useful identity

$$\displaystyle{ B^{\top }\xi =\mathrm{ diag\,}(\xi _{ j} -\xi _{i})\mathbf{e}. }$$

Together with Proposition 8.34, this implies the explicit representation of the associated quadratic form as

$$\displaystyle\begin{array}{rcl} Q_{\varGamma }(\xi )& =& \xi ^{\top }(L(\varGamma ) + L(\overleftarrow{\varGamma }))\xi =\xi ^{\top }\mathit{BWB}^{\top }\xi \\ & =& \sum _{i,j=1}^{N}a_{\mathit{ ij}}(\xi _{i} -\xi _{j})^{2}. {}\end{array}$$
(8.14)

Most spectral properties of Laplacian matrices can be derived via the canonical quadratic form \(Q_{\varGamma }: \mathbb{R}^{N}\longrightarrow \mathbb{R}\) associated with the graph.

Lemma 8.35.

The quadratic form Q Γ vanishes on \(\mathbb{R}\mathbf{e}\) . If the graph Γ is strongly connected, then Q Γ (ξ) = 0 holds if and only if \(\xi \in \mathbb{R}\mathbf{e}\) .

Proof.

The first claim is obvious. By (8.14), it follows that Q Γ (ξ) = 0 if and only if \(a_{\mathit{ij}}(\xi _{i} -\xi _{j})^{2} = 0\) for all i, j. This implies \(\xi _{i} =\xi _{j}\) for all edges (i, j) ∈ E. Thus, if Γ is strongly connected, this implies \(\xi _{1} = \cdots =\xi _{N}\). ■ 

Theorem 8.36.

Let L(Γ) denote the Laplacian of a weighted graph Γ on N vertices with nonnegative weights. Let \(\mathbf{e} = (1,\ldots,1)^{\top }\in \mathbb{R}^{N}\) . Then \(L(\varGamma )\mathbf{e} = L(\overleftarrow{\varGamma })\mathbf{e} = 0\) . Moreover:

  1. 1.

    The eigenvalues of L(Γ) have nonnegative real part;

  2. 2.

    If Γ is strongly connected, then L(Γ) has rank N − 1, i.e.,

    $$\displaystyle{ \mathrm{Ker\,}L(\varGamma ) = \mathbb{R}\mathbf{e}. }$$

    Thus, 0 is a simple eigenvalue of L(Γ), and all other eigenvalues of L Γ have positive real part.

  3. 3.

    The quadratic form \(\xi \mapsto \xi ^{\top }L(\varGamma )\xi\) is positive semidefinite if and only if e ⊤ L(Γ) = 0.

Proof.

The claim \(L(\varGamma )\mathbf{e} = L(\overleftarrow{\varGamma })\mathbf{e} = 0\) is obvious from the definition of Laplacians. Note that \(\mathbf{e}^{\top }L(\varGamma ) = 0\) if and only if \(\sum _{j=1}^{N}a_{\mathit{ij}} =\sum _{ j=1}^{N}a_{\mathit{ji}}\) for all i = 1, , N. This in turn is equivalent to \(L(\overleftarrow{\varGamma }) = L(\varGamma )^{\top }\). For simplicity, we only prove claims 1 and 2 for symmetric weights, i.e., for \(L(\overleftarrow{\varGamma }) = L(\varGamma )^{\top }\). See Bullo, Cortés and Martínez (2009), Theorem 1.32, for the proof in the general case.

Claims 1 and 2. Assume Lx = λ x for a nonzero complex vector x and \(\lambda \in \mathbb{C}\). By Proposition 8.34, then \(L + L^{\top } = \mathit{BWB}^{\top }\), and therefore, using \(x^{{\ast}} = \overline{x}^{\top }\),

$$\displaystyle{ 2\mathrm{Re}\lambda \|x\|^{2} = x^{{\ast}}\mathit{Lx} + x^{{\ast}}L^{\top }x = x^{{\ast}}\mathit{BWB}^{\top }x \geq 0. }$$

Thus Reλ ≥ 0. Now suppose that Γ is strongly connected. By Lemma 8.35, the symmetric matrix L(Γ) + L(Γ) is positive semidefinite and degenerates exactly on \(\mathbb{R}\mathbf{e}\). This proves the claim.

Claim 3. If e L(Γ) = 0, then \(L(\varGamma ) + L(\varGamma )^{\top } = L(\varGamma ) + L(\overleftarrow{\varGamma }) = \mathit{BWB}^{\top }\) is positive semidefinite. Conversely, assume that L(Γ) + L(Γ) is positive semidefinite. Then

$$\displaystyle{ 0 = \mathbf{e}^{\top }(L(\varGamma ) + L(\varGamma )^{\top })\mathbf{e}. }$$

By the positive semidefiniteness of L(Γ) + L(Γ) ⊤ , this implies (L(Γ) + L(Γ) ⊤ )e = 0. Since L(Γ)e = 0, it follows that L(Γ) ⊤ e = 0, i.e., e ⊤ L(Γ) = 0. This proves (3). ■ 

A classical result is the Courant–Fischer minimax principle , which characterizes the eigenvalues of Hermitian matrices.

Theorem 8.37 (Courant–Fischer).

Let \(A = A^{{\ast}}\in \mathbb{C}^{n\times n}\) be Hermitian with eigenvalues \(\lambda _{1} \geq \cdots \geq \lambda _{n}\) . Let E i denote the direct sum of the eigenspaces corresponding to the eigenvalues \(\lambda _{1},\ldots,\lambda _{i}\) . Then

$$\displaystyle{\lambda _{i} =\min _{\dim S=n-i+1}\;\max _{0\neq x\in S}\frac{x^{{\ast}}\mathit{Ax}} {x^{{\ast}}x} =\min _{0\neq x\in E_{i}}\frac{x^{{\ast}}\mathit{Ax}} {x^{{\ast}}x}.}$$

The minimax principle can be extended to establish bounds for the eigenvalues of sums of Hermitian matrices, such as the Weyl inequalities. The Weyl inequality asserts that the eigenvalues of the sum A + B of Hermitian matrices \(A,B \in \mathbb{C}^{n\times n}\), ordered decreasingly, satisfy, for 1 ≤ i + j − 1 ≤ n, the inequality

$$\displaystyle{\lambda _{i+j-1}(A + B) \leq \lambda _{i}(A) +\lambda _{j}(B).}$$

More refined eigenvalue estimates are obtained from generalizations of the Weyl inequalities such as the Freede–Thompson inequality. We next state a straightforward application of the minimax principle to Laplacian matrices.

Corollary 8.38.

Let L(Γ) denote the Laplacian of a weighted, directed graph on N vertices with nonnegative weights. Assume that Γ is strongly connected and satisfies e L = 0. Then the eigenvalue λ N−1 of L(Γ) with smallest positive real part satisfies

$$\displaystyle{ \mathop{\mathrm{Re}}\nolimits (\lambda _{N-1}) \geq \min _{0\neq x\in (\mathbb{R}\mathbf{e})^{\perp }}\frac{x^{{\ast}}\mathit{Lx}} {x^{{\ast}}x} > 0. }$$
(8.15)

Moreover, if L(Γ) is symmetric, then λ N−1 is real, and equality in (8.15) holds.

Proof.

By Theorem 8.36 the quadratic form \(x^{{\ast}}(L + L^{{\ast}})x = 2\mathop{\mathrm{Re}}\nolimits (x^{{\ast}}\mathit{Lx})\) is positive semidefinite and degenerates exactly on \(\mathbb{R}\mathbf{e}\). Moreover, all nonzero eigenvalues σ of L have positive real part. Thus the eigenvectors v of L with Lv = σ v, σ ≠ 0, satisfy v ∗ e = 0 and \(v^{{\ast}}(L + L^{{\ast}})v = 2\mathop{\mathrm{Re}}\nolimits (\sigma )\|v\|^{2}\). Thus the result follows from the minimax principle applied to A := L + L ∗ . ■ 

Let us briefly mention a coordinate-free approach to Laplacian matrices. Let S(N) denote the vector space of real symmetric N × N matrices. For an N × N matrix S, let \(\delta (S) = (s_{11},s_{22},\ldots,s_{\mathit{NN}})^{\top }\) denote the column vector defined by the diagonal entries of S. Moreover, let diag (S) denote the diagonal matrix obtained from S by setting all off-diagonal entries equal to zero. Define a linear map \(\mathcal{D}: S(N)\longrightarrow S(N)\) as

$$\displaystyle{ \mathcal{D}(S) = S -\frac{1} {2}(\delta (S)\mathbf{e}^{\top } + \mathbf{e}\delta (S)^{\top }), }$$

where \(\mathbf{e} = (1,\ldots,1)^{\top }\in \mathbb{R}^{N}\).

Lemma 8.39.

$$\displaystyle{ \mathop{\mathrm{Ker}}\nolimits \mathcal{D} =\{ a\mathbf{e}^{\top } + \mathbf{e}a^{\top }\;\vert \;a \in \mathbb{R}^{N}\}. }$$

Proof.

The inclusion ⊂ follows from the definition of \(\mathcal{D}(S)\) with \(a = \frac{1} {2}\delta (S)\). For the other direction let \(S = a\mathbf{e}^{\top } + \mathbf{e}a^{\top }\). Then δ(S) = 2a, and thus \(\mathcal{D}(S) = 0\). ■ 

It is easily seen that the adjoint operator of \(\mathcal{D}\) with respect to the Frobenius inner product \(< S_{1},S_{2} >:=\mathop{ \mathrm{tr}}\nolimits (S_{1}S_{2})\) is \(\mathcal{D}^{{\ast}}: S(N)\longrightarrow S(N)\), with

$$\displaystyle{ \mathcal{D}^{{\ast}}(S) = S -\frac{1} {2}\mathrm{diag\,}(S\mathbf{e}\mathbf{e}^{\top } + \mathbf{e}\mathbf{e}^{\top }S). }$$

The Laplacian operator is the linear map \(L = \mathcal{D}^{{\ast}}\circ \mathcal{D}\). Obviously, a symmetric matrix S satisfies

$$\displaystyle{ \mathop{\mathrm{tr}}\nolimits (L(S)S) =\mathop{ \mathrm{tr}}\nolimits (\mathcal{D}(S)\mathcal{D}(S)) =\| \mathcal{D}(S)\|^{2} \geq 0, }$$

and therefore \(\mathop{\mathrm{tr}}\nolimits (L(S)S) = 0\) if and only if \(\mathcal{D}(S) = 0\). Note further that \(\mathcal{D}\) and \(\mathcal{D}^{{\ast}}\) are idempotent, i.e., \(\mathcal{D}\circ \mathcal{D} = \mathcal{D}\) and \(\mathcal{D}^{{\ast}}\circ \mathcal{D}^{{\ast}} = \mathcal{D}^{{\ast}}\). A brute force calculation shows

$$\displaystyle{ L(S) = S-\frac{1} {2}(\delta (S)\mathbf{e}^{\top }+\mathbf{e}\delta (S)^{\top })-\frac{1} {2}\mathrm{diag\,}(S\mathbf{e}\mathbf{e}^{\top }+\mathbf{e}\mathbf{e}^{\top }S)+\frac{N} {2} \mathrm{diag\,}(S)+\frac{1} {2}\mathop{\mathrm{tr}}\nolimits (S)I_{N}. }$$

Using the preceding formula for the Laplacian operator one concludes for each symmetric matrix S that

$$\displaystyle{ L_{\mathit{ij}}(S) = \left \{\begin{array}{@{}l@{\quad }l@{}} s_{\mathit{ij}} -\frac{s_{\mathit{ii}}+s_{\mathit{jj}}} {2} \quad &\text{if}\quad i\neq j \\ -\sum _{k=1}^{N}(s_{\mathit{ik}} -\frac{s_{\mathit{ii}}+s_{\mathit{kk}}} {2} )\quad &\text{if}\quad i = j. \end{array} \right. }$$

In particular, L(S)e = 0 for all S. This explicit formula implies the following corollary.

Corollary 8.40.

For \((x_{1},\ldots,x_{N}) \in \mathbb{R}^{m\times N}\) , define the distance matrix \(D(x_{1},\ldots,x_{N})\) \(= (\|x_{i} - x_{j}\|^{2})\) . Then the Laplacian of S = X X is

$$\displaystyle{ L(X^{\top }X) = -\frac{1} {2}\left (D(x_{1},\ldots,x_{N}) -\mathrm{ diag\,}(D(x_{1},\ldots,x_{N})\mathbf{e})\right ). }$$

In particular, if the matrix \(D(x_{1},\ldots,x_{N})\) is irreducible, then

$$\displaystyle{ \mathop{\mathrm{Ker}}\nolimits L(X^{\top }X) = \mathbb{R}\mathbf{e}. }$$

Laplacian operators share an important monotonicity property.

Proposition 8.41.

L is a monotonic operator, i.e., if \(S_{1} - S_{2}\) is positive semidefinite, then so is \(L(S_{1}) - L(S_{2})\) . The kernel of L is equal to the kernel of \(\mathcal{D}\) .

Proof.

The second claim is obvious from the definition of \(L = \mathcal{D}^{{\ast}}\circ \mathcal{D}\). For the first claim it suffices to show that L maps positive semidefinite matrices into positive semidefinite matrices. If S is positive semidefinite, then there exists a matrix \(X = (x_{1},\ldots,x_{N})\) of full row rank, with \(S = X^{\top }X\). Therefore, \(L_{\mathit{ij}}(X^{\top }X) = -\frac{1} {2}\|x_{i} - x_{j}\|^{2}\) for i ≠ j and \(L_{\mathit{ii}}(X^{\top }X) = \frac{1} {2}\sum _{ j=1}^{N}\|x_{i} - x_{j}\|^{2}\). Thus, for a vector ξ,

$$\displaystyle{ \xi ^{\top }L(X^{\top }X)\xi = \frac{1} {2}\sum _{i<j}^{}\|x_{i} - x_{j}\|^{2}(\xi _{ i} -\xi _{j})^{2} \geq 0. }$$

This completes the proof. ■ 

A different version of the Laplacian matrix of a graph that is frequently of interest in applications is the normalized Laplacian , or the flocking matrix

$$\displaystyle{ \mathcal{L} = D^{-1}A. }$$

Here A denotes the weighted adjacency matrix and D = diag (A e). We list a few spectral properties of the normalized Laplacian for undirected graphs.

Theorem 8.42.

The normalized Laplacian \(\mathcal{L}\) of an undirected, weighted, connected graph Γ has the following properties:

  1. 1.

    \(\mathcal{L}\) is a stochastic matrix with only real eigenvalues − 1 ≤λ ≤ 1;

  2. 2.

    1 is a simple eigenvalue of ℒ with eigenspace \(\mathbb{R}\mathbf{e}\) . Moreover, − 1 is not an eigenvalue of \(\mathcal{L}\) if and only if Γ is not bipartite;

  3. 3.

    If A has at least one positive entry on the diagonal, then − 1 is not an eigenvalue of \(\mathcal{L}\) .

Proof.

\(\mathcal{L}\) is similar to the real symmetric matrix

$$\displaystyle{D^{\frac{1} {2} }\mathcal{L}D^{-\frac{1} {2} } = D^{-\frac{1} {2} }\mathit{AD}^{-\frac{1} {2} } = I - D^{-\frac{1} {2} }\mathit{LD}^{-\frac{1} {2} }}$$

and therefore has only real eigenvalues. Moreover, D −1 Ax = x if and only if Lx = (DA)x = 0. Theorem 8.36 implies that 1 is a simple eigenvalue of \(\mathcal{L}\) with eigenspace equal to the kernel of L, i.e., it coincides with \(\mathbb{R}\mathbf{e}\). \(\mathcal{L}\) is nonnegative, with \(\mathcal{L}\mathbf{e} = \mathbf{e}\), and therefore a stochastic matrix. Thus Theorem 8.18 implies that \(\mathcal{L}\) has spectral radius 1. Moreover, the irreducibility of the adjacency matrix A implies that \(\mathcal{L}\) is irreducible. Suppose that − 1 is an eigenvalue of \(\mathcal{L}\). Applying Theorem 8.23 we conclude that \(\mathcal{L}\) and, hence, A are permutation equivalent to a matrix of the form

$$\displaystyle{ \left (\begin{array}{cc} 0 &B_{1} \\ B_{1}^{\top }& 0 \end{array} \right ). }$$
(8.16)

But this is equivalent to the graph being bipartite. Conversely, assume that A is the adjacency matrix of a bipartite graph. Then \(\mathcal{L}\) is permutation equivalent to (8.16). Thus the characteristic polynomial of \(\mathcal{L}\) is even, and therefore − 1 is an eigenvalue. This proves the first two claims. Now suppose that A and, hence, \(\mathcal{L}\) have at least one positive diagonal entry. Then \(\mathcal{L}\) cannot be permutation equivalent to a matrix of the form (8.16) because diagonal entries remain on the diagonal under permutations. Thus Theorem 8.23 implies that − 1 cannot be an eigenvalue of \(\mathcal{L}\). This completes the proof. ■ 

Finally, we prove the classical matrix-tree theorem for weighted graphs, which plays an important role in algebraic combinatorics. We use the following notation. For each edge e ∈ E let a e  > 0 denote the associated weight. For a subset E  ⊂ E define

$$\displaystyle{ a_{E^{{\prime}}} =\prod _{ e\in E^{{\prime}}}^{}a_{e}. }$$

Note that the classical adjoint \(\mathrm{adj\,}(A)\) of a matrix A is the transpose of the matrix of cofactors, i.e., \((\mathrm{adj\,}A)_{\mathit{ij}} = (-1)^{i+j}\det A_{\mathit{ji}}\), where \(A_{\mathit{ji}}\) denotes the submatrix of A obtained by deleting row j and column i.

Theorem 8.43 (Matrix-Tree Theorem).

Let Γ be an undirected weighted graph and L(Γ) the associated Laplacian with real eigenvalues \(0 =\lambda _{1} \leq \lambda _{2} \leq \cdots \leq \lambda _{N}\) . Then:

  1. 1.

    The (N − 1) × (N − 1) leading principal minor of L(Γ) is equal to

    $$\displaystyle{ \kappa (\varGamma ) =\sum _{ \vert E^{{\prime}}\vert =N-1}^{}a_{E^{{\prime}}}, }$$
    (8.17)

    where the sum is over all spanning subtrees (V,E ) of Γ;

  2. 2.

    The adjoint matrix of L(Γ) is

    $$\displaystyle{ \mathrm{adj\,}\;L(\varGamma ) =\kappa (\varGamma )\mathbf{e}\mathbf{e}^{\top }; }$$
    (8.18)
  3. 3.
    $$\displaystyle{ \kappa (\varGamma ) = \frac{\lambda _{2}\cdots \lambda _{N}} {N}. }$$
    (8.19)

Proof.

By Proposition 8.34, the leading (N − 1) × (N − 1) principal minor of the Laplacian is equal to \(\det (B_{1}\mathit{WB}_{1}^{\top })\), where \(B_{1} \in \mathbb{R}^{(N-1)\times M}\) denotes the (N − 1) × M submatrix of the oriented incidence matrix B formed by the first N − 1 rows. By the Cauchy–Binet formula,

$$\displaystyle{ \det (B_{1}\mathit{WB}_{1}^{\top }) =\sum _{ \vert E^{{\prime}}\vert =N-1}^{}\det ^{2}(B_{ E^{{\prime}}})\det W_{E^{{\prime}}}, }$$
(8.20)

where the summation is over all subsets of edges \(E^{{\prime}}\subset E\) of cardinality N − 1. One can restrict the summation to those E′ for which the subgraph \((V,E^{{\prime}})\) is connected, because otherwise the rows of \(B_{E^{{\prime}}}\) indexed by a connected component not containing the vertex N sum to zero, and thus \(\det ^{2}(B_{E^{{\prime}}}) = 0\). Assume next that E′ contains a cycle of length r ≤ N − 1 whose vertices all lie in \(\{1,\ldots,N - 1\}\). Then, after a suitable permutation of rows and columns, \(B_{E^{{\prime}}}\) would be of the form

$$\displaystyle{ \left (\begin{array}{cc} B_{11} & B_{12} \\ 0 &B_{22}\\ \end{array} \right ), }$$

with

$$\displaystyle{ B_{11} = \left (\begin{array}{ccccc} 1 & 0 & \cdots & 0 & - 1\\ - 1 & 1 & 0 & & 0 \\ 0 & \ddots & \ddots & \ddots & \vdots\\ \vdots & & - 1 & 1 & 0 \\ 0 &\cdots & 0 & - 1& 1 \end{array} \right ) }$$

a circulant matrix whose last column is the negative of the sum of the previous ones. Thus detB 11 = 0, and therefore the corresponding summand in (8.20) vanishes. (A cycle passing through the vertex N forces \((V,E^{{\prime}})\), having only N − 1 edges, to be disconnected, a case already excluded.) Thus only those E′ contribute for which \((V,E^{{\prime}})\) is connected and cycle free; these are exactly the spanning trees of Γ, and for them \(\det B_{E^{{\prime}}} = \pm 1\). This proves the first claim.

For the second claim note that Γ is connected if and only if the rank of the Laplacian is N − 1. Thus, if Γ is not connected, then both sides of (8.18) are zero. Hence, one can assume that Γ is connected. From L(Γ)adj  L(Γ) = det L(Γ)I = 0 we conclude that every column of adj  L(Γ) is in the kernel of L(Γ), i.e., is a scalar multiple of e. Since L(Γ) is symmetric, so is adj  L(Γ). Thus adj  L(Γ) is a multiple of ee ⊤ . By the first claim, the (N,N) entry of adj  L(Γ) equals the leading (N − 1) × (N − 1) principal minor κ(Γ), and therefore adj  L(Γ) = κ(Γ)ee ⊤ .

The last claim follows by taking traces in (8.18). Thus N κ(Γ) = tr  adj  L(Γ) coincides with the sum of the eigenvalues of adj  L(Γ). If \(\lambda _{1},\ldots,\lambda _{N}\) denote the eigenvalues of a matrix M, then the eigenvalues of the adjoint adj  M are \(\prod _{j\neq i}^{}\lambda _{j}\), i = 1,…,N. Since \(\lambda _{1} = 0\), we obtain \(\mathrm{tr\,}\;\mathrm{adj\,}\;L(\varGamma ) =\lambda _{2}\cdots \lambda _{N}\). This completes the proof.  ■ 

As a consequence of the matrix-tree theorem one can derive an explicit formula for the number of spanning trees in a graph.

Corollary 8.44 (Temperley (1964)).

Let L:= L(Γ) be the Laplacian of an undirected weighted graph on N vertices and J = ee . Then

$$\displaystyle{ \kappa (\varGamma ) = \frac{\det \;(J + L)} {N^{2}}. }$$

Proof.

The identities NJ = J 2, JL = 0 imply (NIJ)(J + L) = NL. By taking adjoints, therefore, adj  (J + L)adj  (NIJ) = adj  (NL) = N N−1adj  L. Thus, using adj  (NIJ) = N N−2 J, the matrix-tree theorem implies

$$\displaystyle{ N^{N-1}\kappa (\varGamma )J = N^{N-1}\mathrm{adj\,}\;L = N^{N-2}\mathrm{adj\,}\;(J + L)J. }$$

Thus N κ(Γ)J = adj  (J + L)J, and therefore

$$\displaystyle\begin{array}{rcl} \det \;(J + L)J& =& (J + L)\mathrm{adj\,}\;(J + L)J =\mathrm{ adj\,}\;(J + L)(J + L)J = N\mathrm{adj\,}\;(J + L)J {}\\ & =& N^{2}\kappa (\varGamma )J. {}\\ \end{array}$$

Comparing any nonzero entry of both sides yields \(\det \;(J + L) = N^{2}\kappa (\varGamma )\).

 ■ 

For the complete graph K N on N vertices, the classical graph Laplacian is \(\mathcal{L} = \mathit{NI} - J\). This implies the well-known formula

$$\displaystyle{ \kappa (K_{N}) = \frac{\det \;(J + \mathcal{L})} {N^{2}} = \frac{N^{N}} {N^{2}} = N^{N-2} }$$

for the number of spanning trees in K N .
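Temperley's formula is a convenient way to count spanning trees numerically. The following sketch (Python/NumPy; the helper name is illustrative) reproduces \(\kappa (K_{5}) = 5^{3} = 125\) and the four spanning trees of the 4-cycle:

```python
import numpy as np

def spanning_tree_count(A):
    """Number (or weighted count) of spanning trees via kappa(Gamma) = det(J + L)/N^2."""
    N = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    J = np.ones((N, N))
    return np.linalg.det(J + L) / N ** 2

K5 = np.ones((5, 5)) - np.eye(5)
print(round(spanning_tree_count(K5)))        # 125 = 5^(5-2)

C4 = np.array([[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]], dtype=float)
print(round(spanning_tree_count(C4)))        # 4: remove any one of the four edges
```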

7 Laplacians of Simple Graphs

We determine spectral information for important classes of classical Laplacians and adjacency matrices for directed and undirected graphs Γ. For simplicity we focus on unweighted graphs, i.e., on classical Laplacians and adjacency matrices. Note that both Laplacians and adjacency matrices of undirected, weighted graphs satisfy the following properties.

$$\displaystyle\begin{array}{rcl} L(\varGamma \cup \varGamma ^{{\prime}})& =& \mathrm{diag\,}\;(L(\varGamma ),L(\varGamma ^{{\prime}})), {}\\ L(\varGamma \times \varGamma ^{{\prime}})& =& L(\varGamma ) \otimes I_{ n} + I_{m} \otimes L(\varGamma ^{{\prime}})\;. {}\\ \end{array}$$

In particular, the eigenvalues of the Laplacian of the direct product graph Γ ×Γ are the sums \(\lambda _{i} +\mu _{j}\) of the eigenvalues λ i and μ j of Γ and Γ , respectively.

1. Simple Path Graph. Our first example is the simple directed path graph Γ N on the vertex set V = { 1,…,N} and edges E = { (1, 2), (2, 3),…, (N − 1, N)} (Figure 8.11).

Fig. 8.11

Directed simple path

The adjacency matrix and Laplacian of Γ N are respectively

$$\displaystyle{ \mathfrak{A} = \left (\begin{array}{cccc} 0&\cdots &\cdots &0\\ 1 & \ddots & & \vdots\\ \vdots & \ddots & \ddots & \vdots \\ 0&\cdots &1 &0 \end{array} \right ),\quad L = \left (\begin{array}{cccc} 0 &\cdots & \cdots &0\\ - 1 & 1 & & \vdots\\ \vdots & \ddots & \ddots & \vdots \\ 0 &\cdots & - 1&1 \end{array} \right ). }$$

Thus \(\mathfrak{A}\) is the standard cyclic nilpotent matrix, while L has 0 as a simple eigenvalue and the eigenvalue 1 has a geometric multiplicity of one (and an algebraic multiplicity of N − 1) (Figure 8.12).

Fig. 8.12

Undirected simple path

More interesting is the associated undirected graph with a set of edges E = {{ 1, 2}, {2, 3}, , {N − 1, N}} and graph adjacency and Laplacian matrices, respectively,

$$\displaystyle{ \mathfrak{A}_{N} = \left (\begin{array}{*{10}c} 0&1& & &\\ 1 &0 &1 && \\ &1& \ddots & \ddots &\\ & & \ddots & \ddots&1 \\ & & &1&0 \end{array} \right ),\quad L_{N} = \left (\begin{array}{*{10}c} 1 &-1& & & &\\ -1 & 2 &-1 & && \\ &-1& \ddots & \ddots & &\\ & & \ddots & \ddots &-1& \\ & & &-1& 2 &-1\\ & & & &-1 & 1 \end{array} \right ). }$$

We begin with a spectral analysis of \(\mathfrak{A}_{N}\). This is a classical exercise from analysis.

Theorem 8.45.

  1. 1.

    The eigenvalues of \(\mathfrak{A}_{N}\) are the distinct real numbers \(\lambda _{k}(\mathfrak{A}_{N}) = 2\cos \frac{k\pi } {N+1},\;k = 1,\ldots,N\) .

  2. 2.

    The unique eigenvector \(x^{(k)} = (\xi _{0}^{(k)},\ldots,\xi _{N-1}^{(k)})^{\top }\) , normalized as \(\xi _{0}^{(k)}:= 1\) , for the eigenvalue \(2\cos \frac{k\pi } {N+1}\) is

    $$\displaystyle{ \xi _{\nu }^{(k)} = \frac{\sin k( \frac{\nu +1} {N+1})\pi } {\sin ( \frac{k\pi } {N+1})},\quad \nu = 0,\ldots,N - 1. }$$

    In particular, the coordinates of the eigenvector x (k) are reflection symmetric up to sign, that is, they satisfy \(\xi _{\nu }^{(k)} = (-1)^{k+1}\xi _{N-1-\nu }^{(k)}\) .

Proof.

Let

$$\displaystyle{ e_{N}(z) =\det (\mathit{zI} - \mathfrak{A}_{N}) }$$

be the characteristic polynomial of \(\mathfrak{A}_{N}\). Expanding by the first row leads to the three-term recursion

$$\displaystyle{ e_{N}(z) = \mathit{ze}_{N-1}(z) - e_{N-2}(z). }$$
(8.21)

For the 1-norm \(\|\mathfrak{A}\| \leq 2\), and hence it follows that the eigenvalues satisfy | λ | ≤ 2 and, as eigenvalues of a real symmetric matrix, they are all real. So one can set λ = 2cosx. From (8.21) it follows that λ is an eigenvalue if and only if e N (λ) = 0. The difference equation (8.21) can be written now as

$$\displaystyle{ e_{N}(2\cos x) = 2\cos x \cdot e_{N-1}(2\cos x) - e_{N-2}(2\cos x). }$$
(8.22)

We try an exponential solution to this difference equation, i.e., we put \(e_{N} = A\zeta _{1}^{N} + B\zeta _{2}^{N}\), where ζ 1 and ζ 2 are the two roots of the characteristic polynomial \(\zeta ^{2} - 2\zeta \cos x + 1 = 0\). This leads to \(\zeta = e^{\pm \sqrt{-1}x}\). The initial conditions for the difference equation (8.21) are e 0(z) = 1 and e 1(z) = z. Setting \(e_{N}(2\cos x) = \mathit{Ae}^{\sqrt{-1}\mathit{Nx}} + \mathit{Be}^{-\sqrt{-1}\mathit{Nx}}\) leads to the pair of equations

$$\displaystyle{ A + B = 1,\quad \mathit{Ae}^{\sqrt{-1}x} + Be^{-\sqrt{-1}x} = 2\cos x. }$$

Solving and substituting back in (8.22) one obtains

$$\displaystyle{ e_{N}(2\cos x) = \frac{\sin (N + 1)x} {\sin x}. }$$

The right-hand side vanishes for \(x = \frac{k\pi } {N+1}\), k = 1, , N, and therefore the eigenvalues of \(\mathfrak{A}_{N}\) are \(2\cos \frac{k\pi } {N+1}\).

We proceed now to the computation of the eigenvectors of \(\mathfrak{A}_{N}\). Let \(x^{(k)} = (\xi _{0}^{(k)},\ldots,\xi _{N-1}^{(k)})\) be the eigenvector corresponding to the eigenvalue \(\lambda _{k} = 2\cos ( \frac{k\pi } {N+1})\). The eigenvalue equation \(\mathfrak{A}_{N}x^{(k)} =\lambda _{k}x^{(k)}\) is equivalent to the system

$$\displaystyle\begin{array}{rcl} \xi _{1}^{(k)}& =& \lambda _{ k}\xi _{0}^{(k)} \\ \xi _{0}^{(k)} +\xi _{ 2}^{(k)}& =& \lambda _{ k}\xi _{1}^{(k)} \\ & \vdots & \\ \xi _{N-3}^{(k)} +\xi _{ N-1}^{(k)}& =& \lambda _{ k}\xi _{N-2}^{(k)} \\ \xi _{N-2}^{(k)}& =& \lambda _{ k}\xi _{N-1}^{(k)}\;.{}\end{array}$$
(8.23)

The coordinates of the eigenvector x (k) satisfy the recursion

$$\displaystyle{ \xi _{\nu }^{(k)} = 2\cos ( \frac{k\pi } {N + 1})\xi _{\nu -1}^{(k)} -\xi _{\nu -2}^{(k)}. }$$

Normalize the eigenvector by requiring \(\xi _{0}^{(k)} = 1\). The second coordinate is determined by the first equation in (8.23) and \(\xi _{1}^{(k)} = 2\cos ( \frac{k\pi } {N+1})\). As is the case with eigenvalues, one solves the difference equation using a linear combination of two exponential solutions. Thus

$$\displaystyle{ \xi _{\nu }^{(k)} = \mathit{Ae}^{\frac{\sqrt{-1}k\nu \pi } {N+1} } + \mathit{Be}^{\frac{-\sqrt{-1}k\nu \pi } {N+1} }. }$$

The initial conditions determine the ξ ν , and we obtain the explicit formula

$$\displaystyle{ \xi _{\nu }^{(k)} = \frac{\sin k( \frac{\nu +1} {N+1})\pi } {\sin ( \frac{k\pi } {N+1})}. }$$

 ■ 
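A quick numerical check of Theorem 8.45 (a Python/NumPy sketch; N = 6 is an arbitrary choice):

```python
import numpy as np

N = 6
A = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)         # undirected path graph

k = np.arange(1, N + 1)
predicted = 2 * np.cos(k * np.pi / (N + 1))
computed = np.sort(np.linalg.eigvalsh(A))[::-1]                      # descending order
print(np.allclose(predicted, computed))                              # True

nu = np.arange(N)
x1 = np.sin((nu + 1) * np.pi / (N + 1)) / np.sin(np.pi / (N + 1))    # sine eigenvector for k = 1
print(np.allclose(A @ x1, predicted[0] * x1))                        # True
```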

For later use we formulate a similar result for the matrices

$$\displaystyle{ M_{N} = \left (\begin{array}{*{10}c} 1 &-1& & & &\\ -1 & 2 &-1 & && \\ &-1& \ddots & \ddots & &\\ & & \ddots & \ddots &-1& \\ & & &-1& 2 &-1\\ & & & &-1 & 2 \end{array} \right )\quad L_{N} = \left (\begin{array}{*{10}c} 1 &-1& & & &\\ -1 & 2 &-1 & && \\ &-1& \ddots & \ddots & &\\ & & \ddots & \ddots &-1& \\ & & &-1& 2 &-1\\ & & & &-1 & 1 \end{array} \right ). }$$

Expanding the characteristic polynomial of M N

$$\displaystyle{ g_{N}(z) =\det (\mathit{zI} - M_{N}) }$$

by the last row one obtains the recursion

$$\displaystyle{ g_{N}(z) = (z - 2)g_{N-1}(z) - g_{N-2}(z), }$$

with initial conditions \(g_{0}(z) = 1,g_{1}(z):= z - 1,g_{2}(z) = (z - 1)(z - 2) - 1\). Note that \(\gamma _{N}(z) = e_{N}(z - 2)\) satisfies the same recursion, but with different initial conditions. For a proof of the next result we refer to Yueh (2005); see Willms (2008) for further eigenvalue formulas for tridiagonal matrices.

Theorem 8.46.

  1. 1.

    The eigenvalues of M N are distinct and are

    $$\displaystyle{ \lambda _{k}(M_{N}) = 2 - 2\cos \frac{(2k - 1)\pi } {2N + 1},\quad k = 1,\ldots,N. }$$

    An eigenvector \(x^{(k)} = (\xi _{0}^{(k)},\ldots,\xi _{N-1}^{(k)})^{\top }\) for the eigenvalue \(\lambda _{k}(M_{N})\) is

    $$\displaystyle{ \xi _{\nu }^{(k)} =\sin \Big (\frac{(2k - 1)(\nu +N + 1)\pi } {2N + 1} \Big),\quad \nu = 0,\ldots,N - 1. }$$
  2. 2.

    The eigenvalues of L N are distinct

    $$\displaystyle{ \lambda _{k}(L_{N}) = 2 - 2\cos \frac{(k - 1)\pi } {N},\quad k = 1,\ldots,N. }$$

    An eigenvector \(x^{(k)} = (\xi _{0}^{(k)},\ldots,\xi _{N-1}^{(k)})^{\top }\) for the eigenvalue \(\lambda _{k}(L_{N})\) is

    $$\displaystyle{ \xi _{\nu }^{(k)} =\cos \Big (\frac{(k - 1)(2\nu + 1)\pi } {2N} \Big),\quad \nu = 0,\ldots,N - 1. }$$

2. Simple Cycle Graph. In this case the set of edges of the digraph is \(E =\{ (1,2),\ldots,(N - 1,N),(N,1)\}\) and in the undirected case

$$\displaystyle{E =\{\{ 1,2\},\ldots,\{N - 1,N\},\{1,N\}\}.}$$
Fig. 8.13

Directed cycle graph

Consider first the directed graph case (Figure 8.13). Then the adjacency matrix is the circulant matrix

$$\displaystyle{ C_{N} = \left (\begin{array}{*{10}c} 0& & &1\\ 1 &\ddots & &\\ &\ddots & \ddots & \\ \ & &1&0 \end{array} \right )\;. }$$
(8.24)

Being a circulant matrix, C is diagonalized by the Fourier matrix. Explicitly, let

$$\displaystyle{ \varPhi = \frac{1} {\sqrt{N}}\left (\begin{array}{*{10}c} 1& 1 & 1 & \ldots & 1 \\ 1& \omega & \omega ^{2} & \cdots & \omega ^{N-1} \\ 1& \omega ^{2} & \omega ^{4} & \cdots & \omega ^{2N-2}\\ \vdots & & & & \vdots \\ 1&\omega ^{N-1} & \omega ^{2N-2} & \cdots &\omega ^{(N-1)^{2} } \end{array} \right ) }$$
(8.25)

denote the Fourier matrix, where \(\omega = e^{2\pi \sqrt{-1}/N}\) denotes a primitive Nth root of unity. Notice that Φ is both a unitary and a symmetric matrix, and that

$$\displaystyle{ C_{N} =\varPhi \mathrm{ diag\,}(1,\overline{\omega },\ldots,\overline{\omega }^{N-1})\varPhi ^{{\ast}} =\varPhi ^{{\ast}}\mathrm{ diag\,}(1,\omega,\ldots,\omega ^{N-1})\varPhi \,. }$$

This proves the following theorem.

Theorem 8.47.

The eigenvalues of C N are distinct and are the Nth roots of unity:

$$\displaystyle{ \lambda _{k}(C_{N}) =\omega ^{k} = e^{\frac{2k\sqrt{-1}\pi } {N} },\quad k = 1,\ldots,N. }$$

An eigenvector \(x^{(k)} = (\xi _{0}^{(k)},\ldots,\xi _{N-1}^{(k)})^{\top }\) for the eigenvalue \(\lambda _{k}(C_{N})\) is

$$\displaystyle{ x^{(k)} =\sum _{ j=0}^{N-1}\omega ^{-\mathit{kj}}e_{ j+1},\quad k = 1,\ldots,N. }$$

The associated Laplacian matrix is equal to \(L_{N} = I_{N} - C_{N}\). Thus the eigenvalues and eigenvectors are trivially related to those of C N (Figure 8.14).

Fig. 8.14

Undirected cycle graph

The undirected case is more interesting. The adjacency matrix and Laplacian matrices are

$$\displaystyle{ \mathfrak{A}_{N} = \left (\begin{array}{*{10}c} 0&1& & &1\\ 1 &0 &1 & & \\ &1& \ddots & \ddots &\\ & & \ddots & \ddots&1 \\ 1& & &1&0 \end{array} \right )\;,\quad L_{N} = 2I_{N}-\mathfrak{A}_{N} = \left (\begin{array}{*{10}c} 2 & -1 & & &-1\\ -1 & 2 &-1 & & \\ &-1& \ddots & \ddots &\\ & & \ddots & \ddots&-1 \\ -1& & &-1& 2 \end{array} \right ). }$$
(8.26)

Theorem 8.48.

  1. 1.

    The eigenvalues of \(\mathfrak{A}_{N}\) and L N defined in (8.26) are

    $$\displaystyle\begin{array}{rcl} \lambda _{k}(\mathfrak{A}_{N})& =& 2\cos (\frac{2k\pi } {N} ),\quad k = 1,\ldots,N, {}\\ \lambda _{k}(L_{N})& =& 2 - 2\cos (\frac{2k\pi } {N} ),\quad k = 1,\ldots,N. {}\\ \end{array}$$

    In either case, \(\lambda _{k} =\lambda _{l}\) for 1 ≤ k < l ≤ N if and only if l = N − k. For N = 2m even, λ m and λ N are simple and λ k has a multiplicity of two for all other k. For N = 2m + 1 odd, λ N is simple and all other eigenvalues have a multiplicity of two.

  2. 2.

    An orthogonal basis for the eigenspaces of \(\mathfrak{A}_{N}\) and L N for the eigenvalue \(\lambda _{k}(\mathfrak{A}_{N})\) and \(\lambda _{k}(L_{N})\) , respectively, is as follows:

    1. (a)

      A single generator

      $$\displaystyle{ \frac{1} {\sqrt{N}}\left (\begin{array}{c} 1\\ 1 \\ 1\\ \vdots\\ 1\end{array} \right ),\quad \quad \frac{1} {\sqrt{N}}\left (\begin{array}{c} 1\\ -1 \\ 1\\ \vdots\\ -1\\ \end{array} \right ) }$$

      for k = N or k = m,N = 2m, respectively.

    2. (b)

      Otherwise, a basis of two orthonormal vectors

      $$\displaystyle{ x^{(k)} = \frac{1} {\sqrt{N}}\left (\begin{array}{c} 1 \\ \cos (\frac{2k\pi } {N} ) \\ \cos (\frac{4k\pi } {N} )\\ \vdots \\ \cos (\frac{2(N-1)k\pi } {N} )\\ \end{array} \right ),\quad y^{(k)} = \frac{1} {\sqrt{N}}\left (\begin{array}{c} 0 \\ \sin (\frac{2k\pi } {N} ) \\ \sin (\frac{4k\pi } {N} )\\ \vdots \\ \sin (\frac{2(N-1)k\pi } {N} )\\ \end{array} \right ). }$$

Proof.

Since

$$\displaystyle{\mathfrak{A}_{N} = C_{N} + C_{N}^{\top } =\varPhi \mathrm{ diag\,}(1,\omega,\ldots,\omega ^{N-1})\varPhi ^{{\ast}} +\varPhi \mathrm{ diag\,}(1,\overline{\omega },\ldots,\overline{\omega }^{N-1})\varPhi ^{{\ast}},}$$

the eigenvalues of C + C are equal to \(\mathrm{Re}(\omega ^{k} + \overline{\omega }^{k}) = 2\cos (\frac{2k\pi } {N} )\). Moreover, the complex eigenvectors of \(\mathfrak{A}_{N}\) are simply the columns

$$\displaystyle{ \phi _{k} = \frac{1} {\sqrt{N}}\left (\begin{array}{c} 1 \\ \omega ^{k} \\ \omega ^{2k}\\ \vdots \\ \omega ^{(N-1)k}\\ \end{array} \right ) }$$

of the Fourier matrix Φ N . Thus the real and imaginary parts

$$\displaystyle{ x^{(k)} = \frac{1} {2}(\phi _{k} + \overline{\phi _{k}}),\quad y^{(k)} = \frac{1} {2i}(\phi _{k} -\overline{\phi _{k}}) }$$

form a real basis of the corresponding eigenspaces. Writing \(x^{(k)} = \frac{1} {\sqrt{N}}(\xi _{0}^{(k)},\ldots\), \(\xi _{N-1}^{(k)})^{\top }\) and \(y^{(k)} = \frac{1} {\sqrt{N}}(\eta _{0}^{(k)},\ldots,\) \(\eta _{N-1}^{(k)})^{\top }\) one obtains for each \(k = 1,\ldots,N\)

$$\displaystyle{ \xi _{\nu }^{(k)} =\cos \frac{2k\nu \pi } {N},\quad \eta _{\nu }^{(k)} =\sin \frac{2k\nu \pi } {N},\quad k = 1,\ldots,N,\nu = 0,\ldots,N - 1. }$$

This completes the proof for \(\mathfrak{A}_{N}\). The result on the Laplacian follows trivially as \(L_{N} = 2I_{N} - \mathfrak{A}_{N}\). ■ 
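A numerical check of Theorem 8.48, exploiting the circulant structure (a Python/NumPy sketch; N = 7 is an arbitrary choice):

```python
import numpy as np

N = 7
A = np.zeros((N, N))
idx = np.arange(N)
A[idx, (idx + 1) % N] = 1
A[idx, (idx - 1) % N] = 1                                 # undirected N-cycle

k = np.arange(1, N + 1)
predicted = 2 * np.cos(2 * np.pi * k / N)
print(np.allclose(np.sort(np.linalg.eigvalsh(A)), np.sort(predicted)))   # True

Phi = np.fft.ifft(np.eye(N)) * np.sqrt(N)                 # Fourier matrix, Phi[i, j] = omega^(ij)/sqrt(N)
D = Phi.conj().T @ A @ Phi
print(np.allclose(D, np.diag(np.diag(D))))                # True: Phi diagonalizes the circulant A
```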

8 Compressions and Extensions of Laplacians

We begin by recalling the definition of the Schur complement. Let M be an N × N matrix and I, J ⊂ { 1, , N}. Then M IJ denotes the submatrix of M with row indices in I and column indices in J, respectively.

Definition 8.49.

Let M be an N × N matrix and \(I \subset \{ 1,\ldots,N\}\) such that M II is invertible. Let \(J =\{ 1,\ldots,N\}\setminus I\neq \emptyset\). Then

$$\displaystyle{ M/M_{\mathit{II}}:= M_{\mathit{JJ}} - M_{\mathit{JI}}M_{\mathit{II}}^{-1}M_{\mathit{ IJ}} }$$

is called the Schur complement of \(M_{\mathit{II}}\) in M.

The Schur complement has some basic properties that are easily established, as follows.

Proposition 8.50.

Let M be an N × N matrix and \(I \subset \{ 1,\ldots,N\}\) such that M II is invertible. The Schur complement M∕M II has the following properties:

  1. 1.

    \(\mathrm{rk}M =\mathrm{ rk}M_{\mathit{II}} +\mathrm{ rk}M/M_{\mathit{II}}\) ;

  2. 2.

    Let M be Hermitian; then M∕M II is Hermitian with signature

    $$\displaystyle{ \mathrm{sign\,}(M) =\mathrm{ sign\,}(M_{\mathit{II}}) +\mathrm{ sign\,}(M/M_{\mathit{II}}). }$$

Proof.

Without loss of generality, assume that \(I =\{ 1,\ldots,r\}\), with \(1 \leq r < N\), and that M is partitioned as

$$\displaystyle{ M = \left (\begin{array}{cc} M_{11} & M_{12} \\ M_{21} & M_{22} \end{array} \right ). }$$

The result follows easily from the identity

$$\displaystyle{ \left (\begin{array}{cc} I &0\\ - M_{ 21}M_{11}^{-1} & I \end{array} \right )\left (\begin{array}{cc} M_{11} & M_{12} \\ M_{21} & M_{22} \end{array} \right )\left (\begin{array}{cc} I & - M_{11}^{-1}M_{ 12} \\ 0& I \end{array} \right ) = \left (\begin{array}{cc} M_{11} & 0 \\ 0 &M_{22} - M_{21}M_{11}^{-1}M_{12} \end{array} \right ). }$$

 ■ 
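
As an illustration, the following sketch (Python with NumPy; the random symmetric test matrix, index sets, and tolerance are our own choices) checks the rank and signature identities of Proposition 8.50.

```python
# Sketch: rank and signature additivity of the Schur complement (Proposition 8.50).
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
M = B + B.T                                  # symmetric (Hermitian) test matrix
I, J = [0, 1], [2, 3, 4]                     # index set I and its complement J

M_II = M[np.ix_(I, I)]
S = M[np.ix_(J, J)] - M[np.ix_(J, I)] @ np.linalg.inv(M_II) @ M[np.ix_(I, J)]

def signature(X, tol=1e-10):
    """Number of positive minus number of negative eigenvalues."""
    ev = np.linalg.eigvalsh(X)
    return int((ev > tol).sum() - (ev < -tol).sum())

print(np.linalg.matrix_rank(M) ==
      np.linalg.matrix_rank(M_II) + np.linalg.matrix_rank(S))    # True
print(signature(M) == signature(M_II) + signature(S))            # True
```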

The 2 × 2 matrix

$$\displaystyle{ \left (\begin{array}{cc} 1& - 2\\ 2 & - 3 \end{array} \right ) }$$

shows that the Schur complement of a Hurwitz matrix need not be a Hurwitz matrix. Information about the spectral properties of the Schur complement is provided by the next result. For the proof we refer the reader to Fiedler (2008).

Theorem 8.51.

Let M be a real N × N matrix with nonpositive off-diagonal entries m ij ≤ 0,i ≠ j. Let I ⊂{ 1,…,N}, with M II invertible. Suppose there exists a vector x ≥ 0 with Mx > 0. Then:

  1. 1.

    All eigenvalues of M have positive real part;

  2. 2.

    The eigenvalues of M II and M∕M II have positive real parts, respectively;

  3. 3.

    The off-diagonal entries of \(M_{\mathit{II}}\) and of the Schur complement \(M/M_{\mathit{II}}\) are nonpositive. The inverses \(M_{\mathit{II}}^{-1}\) and \((M/M_{\mathit{II}})^{-1}\) exist and are nonnegative matrices.

Let L denote the Laplacian matrix of an undirected, weighted graph Γ, and let \(\varDelta =\mathrm{ diag\,}(\delta _{1},\ldots,\delta _{N})\) denote a diagonal matrix with nonnegative entries \(\delta _{1} \geq 0,\ldots,\delta _{N} \geq 0\). Then the matrix

$$\displaystyle{ \mathcal{L} = L+\varDelta }$$

is called a generalized Laplacian for Γ. Thus the generalized Laplacians \(\mathcal{L}\) are characterized as those matrices with nonpositive off-diagonal entries that satisfy \(\mathcal{L}\mathbf{e} \geq 0\). Let A denote the weighted adjacency matrix of Γ. The loopy Laplacian is then the generalized Laplacian \(Q = L +\varDelta\) defined by setting \(\delta _{i}:= a_{\mathit{ii}}\) for \(i = 1,\ldots,N\).

We now prove a central result on the submatrices of generalized Laplacians.

Theorem 8.52.

Let Γ be an undirected, weighted graph with generalized Laplacian matrix \(\mathcal{L}\) .

  1. 1.

    Γ is connected if and only if \(\mathcal{L}\) is irreducible.

  2. 2.

    Every eigenvalue and every principal minor of \(\mathcal{L}\) are nonnegative.

  3. 3.

    For each \(I\neq \{1,\ldots,N\}\) , \(\mathcal{L}_{\mathit{II}}\) is a positive definite matrix and its inverse \((\mathcal{L}_{\mathit{II}})^{-1}\) is a nonnegative matrix.

  4. 4.

    Let Γ be connected. Then, for each \(I\neq \{1,\ldots,N\}\) , both \(\mathcal{L}_{\mathit{II}}\) and the Schur complement \(\mathcal{L}/\mathcal{L}_{\mathit{II}}\) are generalized Laplacians.

Proof.

Since the off-diagonal entries of \(-\mathcal{L}\) coincide with those of the graph adjacency matrix A of Γ, it follows that \(\mathcal{L}\) is irreducible if and only if A is irreducible. But this is equivalent to Γ being connected.

By Proposition 8.34, every principal submatrix of \(\mathcal{L}\) is of the form

$$\displaystyle{ \mathcal{L}_{\mathit{II}} =\varDelta _{\mathit{II}} + B_{I}WB_{I}^{\top }, }$$

where B I denotes the submatrix of the incidence matrix B formed by the rows indexed by I, and Δ II is a diagonal matrix with nonnegative entries. Thus \(\mathcal{L}_{\mathit{II}}\) has nonpositive off-diagonal terms and is positive semidefinite. This proves claim 2.

Assume that Γ is connected. Then for each proper subset \(I \subset \{ 1,\ldots,N\}\), the matrix \(B_{I}WB_{I}^{\top }\) is positive definite. In fact, \(x^{\top }B_{I}WB_{I}^{\top }x = 0\) implies \(B_{I}^{\top }x = 0\). Extend x to \(z \in \mathbb{R}^{N}\) by adding zeros, so that \(B^{\top }z = B_{I}^{\top }x = 0\). Since \(\mathrm{Ker\,}\;B^{\top } = \mathbb{R}\mathbf{e}\), it follows that z = λ e. Since at least one entry of z is zero, we obtain λ = 0. Thus x = 0, which proves the positive definiteness of \(B_{I}WB_{I}^{\top }\). In particular, \(\mathcal{L}_{\mathit{II}} =\varDelta _{\mathit{II}} + B_{I}WB_{I}^{\top }\) is positive definite for all proper index sets I. Moreover, \(\mathcal{L}_{\mathit{II}}\) is a generalized Laplacian matrix because its off-diagonal entries are all nonpositive and \(\mathcal{L}_{\mathit{II}}\mathbf{e} \geq 0\). Now let A denote any real matrix with nonpositive off-diagonal entries all of whose eigenvalues have positive real part. By Theorem 5.2.1 in Fiedler (2008), \(A^{-1}\) is a nonnegative matrix. Applying this result to \(A = \mathcal{L}_{\mathit{II}}\), we conclude that \(\mathcal{L}_{\mathit{II}}^{-1}\) is nonnegative. This completes the proof of claim 3.

Since \(\mathcal{L}_{\mathit{II}}\) is invertible, the Schur complement \(\mathcal{L}/\mathcal{L}_{\mathit{II}} = \mathcal{L}_{\mathit{JJ}} -\mathcal{L}_{\mathit{JI}}\mathcal{L}_{\mathit{II}}^{-1}\mathcal{L}_{\mathit{IJ}}\) exists. Moreover, \(\mathcal{L}_{\mathit{II}}^{-1}\) is nonnegative and the entries of \(\mathcal{L}_{\mathit{JI}},\mathcal{L}_{\mathit{IJ}}\) are nonpositive. Thus all entries of \(-\mathcal{L}_{\mathit{JI}}\mathcal{L}_{\mathit{II}}^{-1}\mathcal{L}_{\mathit{IJ}}\) are ≤ 0. Since the off-diagonal entries of \(\mathcal{L}_{\mathit{JJ}}\) are all ≤ 0, this shows that the off-diagonal entries of \(\mathcal{L}/\mathcal{L}_{\mathit{II}}\) are nonpositive. It remains to show that the diagonal entries of the Schur complement are nonnegative. To this end, we simplify the notation by assuming that \(I =\{ 1,\ldots,r\}\). Then the diagonal entries of \(\mathcal{L}/\mathcal{L}_{\mathit{II}}\) are of the form

$$\displaystyle{ v^{\top }\left (\begin{array}{cc} -\mathcal{L}_{ 21}\mathcal{L}_{11}^{-1} & I \end{array} \right )\left (\begin{array}{cc} \mathcal{L}_{11} & \mathcal{L}_{12} \\ \mathcal{L}_{21} & \mathcal{L}_{22} \end{array} \right )\left (\begin{array}{c} -\mathcal{L}_{11}^{-1}\mathcal{L}_{ 12} \\ I \end{array} \right )v = w^{\top }\mathcal{L}w, }$$

for suitable choices of v and w. By claim 2, \(w^{\top }\mathcal{L}w \geq 0\), and the result follows. ■ 
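
The claims of Theorem 8.52 are easily tested on small examples. The sketch below (Python with NumPy; the weighted graph is a hypothetical example of ours) verifies that \(\mathcal{L}_{\mathit{II}}\) is positive definite with nonnegative inverse and that the Schur complement again has nonpositive off-diagonal entries and nonnegative row sums.

```python
# Sketch: Theorem 8.52 on a connected weighted graph (our example, 5 vertices).
import numpy as np

A = np.array([[0, 2, 0, 0, 1],
              [2, 0, 3, 0, 0],
              [0, 3, 0, 1, 0],
              [0, 0, 1, 0, 4],
              [1, 0, 0, 4, 0]], dtype=float)   # weighted adjacency of a 5-cycle
L = np.diag(A.sum(axis=1)) - A                  # graph Laplacian, L e = 0

I, J = [0, 1], [2, 3, 4]                        # proper index set and complement
L_II = L[np.ix_(I, I)]
S = L[np.ix_(J, J)] - L[np.ix_(J, I)] @ np.linalg.inv(L_II) @ L[np.ix_(I, J)]

print(np.all(np.linalg.eigvalsh(L_II) > 0))               # L_II positive definite
print(np.all(np.linalg.inv(L_II) >= 0))                   # nonnegative inverse
off = S - np.diag(np.diag(S))
print(np.all(off <= 1e-12))                               # nonpositive off-diagonal
print(np.all(S @ np.ones(len(J)) >= -1e-12))              # S e >= 0
```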

We now explore in more detail the underlying graphs that are associated with forming submatrices and Schur complements. Let Γ = (V, E) be an undirected weighted graph and \(V ^{{\prime}}\subset V\) a nonempty subset of r vertices. Let \(\varGamma _{V ^{{\prime}}} = (V ^{{\prime}},E \cap (V ^{{\prime}}\times V ^{{\prime}}))\) denote the induced subgraph, with induced weighted adjacency matrix \(A^{{\prime}}\). The relation between the Laplacians of \(\varGamma _{V ^{{\prime}}}\) and Γ is established by the following result, whose easy proof is omitted.

Proposition 8.53.

Let \(I =\{ i_{1} <\ldots < i_{r}\} \subset \{ 1,\ldots,N\}\) and \(V ^{{\prime}} =\{ v_{i_{1}},\ldots,v_{i_{r}}\}\) denote the corresponding set of vertices in V. Let

$$\displaystyle{ L(\varGamma )_{\mathit{II}} = (L_{\mathit{ab}})_{a,b\in I} }$$

denote the r × r principal submatrix of the Laplacian \(L(\varGamma ) = (L_{\mathit{ij}})\) , with row and column indices in I. Then

$$\displaystyle{ L(\varGamma )_{\mathit{II}} = L(\varGamma _{V ^{{\prime}}}) + D_{V ^{{\prime}}}, }$$

where \(D_{V ^{{\prime}}} =\mathrm{ diag\,}\;(\delta _{1},\ldots,\delta _{r})\) is a diagonal matrix with nonnegative entries \(\delta _{i} =\sum _{ j\not\in I}^{}(a_{\mathit{ij}} + a_{\mathit{ji}})\). In particular, the submatrix \(L(\varGamma )_{\mathit{II}}\) of the Laplacian L is a generalized Laplacian of the induced subgraph \(\varGamma _{V ^{{\prime}}} = (V ^{{\prime}},E \cap (V ^{{\prime}}\times V ^{{\prime}}))\).

For the Schur complement we introduce the following notion; see Fiedler (2008) and Horn and Johnson (1990).

Definition 8.54.

Let Γ = (V, E) be an undirected weighted graph on a vertex set \(V =\{ 1,\ldots,N\}\) and I ⊂ V. The Schur complement, or Kron reduction, of Γ on J := V ∖ I is the graph \(\varGamma _{J} = (J,E_{J})\) with vertex set J. For vertices i, j ∈ J, the edge \((i,j) \in E_{J}\) is defined if and only if there exists a path in Γ from i to j all of whose interior vertices (i.e., those vertices of the path that differ from i and j) belong to I.

The Kron reduction of an undirected graph is an undirected graph on a subset of vertices; however, it may contain self-loops even if Γ = (V, E) does not have self-loops. The Kron reduction graph has some appealing properties that are partially stated in the next result.

Theorem 8.55.

Let Γ = (V,E) be a connected, undirected weighted graph. Then the Kron reduction of Γ is connected, and the Schur complement \(\mathcal{L}/\mathcal{L}_{\mathit{II}}\) of a generalized Laplacian of Γ is a generalized Laplacian of the Kron reduction graph \(\varGamma _{J} = (J,E_{J})\).

Proof.

For a proof that the Kron reduction is connected, we refer to Doerfler and Bullo (2013). By the preceding reasoning, \(L(J):= \mathcal{L}/\mathcal{L}_{\mathit{II}}\) is a generalized Laplacian on the vertex set J. Thus it remains to show that the off-diagonal entry \(L_{\mathit{ij}}(J)\) is nonzero if and only if (i, j) is an edge of the Kron reduction \(\varGamma _{J} = (J,E_{J})\). This is shown in Theorem 14.1.2 of Fiedler (2008) in the case where \(\mathcal{L}\) possesses a vector x ≥ 0 with \(\mathcal{L}x > 0\). In Theorem 3.4 of Doerfler and Bullo (2013), this is shown for the so-called loopy Laplacian matrix of a graph. ■ 
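
The correspondence between nonzero off-diagonal entries of the Schur complement and edges of the Kron reduction can be tested directly. The sketch below (Python with NumPy; the 6-vertex example graph and the helper connected_through_I are ours) compares the edges read off from the Schur complement with the path criterion of Definition 8.54.

```python
# Sketch: edges of the Kron reduction vs. fill-in of the Schur complement.
import numpy as np
from itertools import combinations

A = np.array([[0, 1, 0, 0, 0, 2],
              [1, 0, 3, 0, 0, 0],
              [0, 3, 0, 1, 0, 0],
              [0, 0, 1, 0, 2, 0],
              [0, 0, 0, 2, 0, 1],
              [2, 0, 0, 0, 1, 0]], dtype=float)   # weighted 6-cycle (our example)
L = np.diag(A.sum(axis=1)) - A
I = [1, 2]                                         # eliminated vertices
J = [0, 3, 4, 5]                                   # retained vertices

L_red = (L[np.ix_(J, J)]
         - L[np.ix_(J, I)] @ np.linalg.inv(L[np.ix_(I, I)]) @ L[np.ix_(I, J)])

def connected_through_I(i, j):
    """Is there a path from i to j whose interior vertices all lie in I?"""
    frontier, seen = [i], set()
    while frontier:
        v = frontier.pop()
        for w in np.nonzero(A[v])[0]:
            if w == j:
                return True
            if w in I and w not in seen:
                seen.add(w)
                frontier.append(w)
    return False

for a, b in combinations(range(len(J)), 2):
    assert (abs(L_red[a, b]) > 1e-12) == connected_through_I(J[a], J[b])
print("Schur-complement edges match the path criterion of Definition 8.54.")
```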

The Courant–Fischer minimax principle has important implications for the characterization of the eigenvalues of submatrices of Hermitian matrices via interlacing conditions. We state one of the simplest known results here, which is often attributed to Cauchy and Weyl.

Theorem 8.56 (Eigenvalue Interlacing Theorem).

Let M be a Hermitian n × n matrix and \(I \subset \{ 1,\ldots,n\}\) a subset of cardinality r. Assume that the eigenvalues of a Hermitian matrix A are ordered increasingly as \(\lambda _{1}(A) \leq \cdots \leq \lambda _{n}(A)\). Then, for 1 ≤ k ≤ r,

$$\displaystyle{ \lambda _{k}(M) \leq \lambda _{k}(M_{\mathit{II}}) \leq \lambda _{k+n-r}(M). }$$
(8.27)

If in addition M II is positive definite, then for 1 ≤ k ≤ n − r,

$$\displaystyle{ \lambda _{k}(M) \leq \lambda _{k}(M/M_{\mathit{II}}) \leq \lambda _{k}(M_{\mathit{JJ}}) \leq \lambda _{k+r}(M). }$$

Proof.

For the proof of the first inequality we refer to Theorem 4.3.15 in Horn and Johnson (1990). For the second claim note that the positive definiteness of \(M_{\mathit{II}}\) implies that \(M_{\mathit{JI}}M_{\mathit{II}}^{-1}M_{\mathit{IJ}}\) is positive semidefinite. Therefore, \(M_{\mathit{JJ}}\succeq M_{\mathit{JJ}} - M_{\mathit{JI}}M_{\mathit{II}}^{-1}M_{\mathit{IJ}}\), and thus \(\lambda _{k}(M/M_{\mathit{II}}) \leq \lambda _{k}(M_{\mathit{JJ}})\) for all 1 ≤ k ≤ n − r. Applying (8.27) to the submatrix \(M_{\mathit{JJ}}\) gives the result. ■ 
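
Both chains of inequalities in Theorem 8.56 can be checked numerically. The following sketch (Python with NumPy; the random positive definite test matrix and the dimensions n = 6, r = 3 are arbitrary choices of ours) does so.

```python
# Sketch: interlacing inequalities of Theorem 8.56 on a random symmetric matrix.
import numpy as np

rng = np.random.default_rng(1)
n, r = 6, 3
B = rng.standard_normal((n, n))
M = B @ B.T + np.eye(n)                  # symmetric positive definite, so M_II > 0
I, J = [0, 1, 2], [3, 4, 5]

lam = np.sort(np.linalg.eigvalsh(M))
lam_II = np.sort(np.linalg.eigvalsh(M[np.ix_(I, I)]))
S = (M[np.ix_(J, J)]
     - M[np.ix_(J, I)] @ np.linalg.inv(M[np.ix_(I, I)]) @ M[np.ix_(I, J)])
lam_S = np.sort(np.linalg.eigvalsh(S))
lam_JJ = np.sort(np.linalg.eigvalsh(M[np.ix_(J, J)]))

for k in range(r):                  # lambda_k(M) <= lambda_k(M_II) <= lambda_{k+n-r}(M)
    assert lam[k] <= lam_II[k] <= lam[k + n - r]
for k in range(n - r):              # second chain of inequalities
    assert lam[k] <= lam_S[k] <= lam_JJ[k] <= lam[k + r]
print("All interlacing inequalities hold.")
```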

We next describe inequalities between the eigenvalues of the Laplacian of a graph and those of an induced subgraph.

Theorem 8.57.

Let Γ = (V,E) be an undirected weighted graph and \(\varGamma _{V ^{{\prime}}} = (V ^{{\prime}},E \cap (V ^{{\prime}}\times V ^{{\prime}}))\) the induced subgraph on \(V ^{{\prime}}\subset V\). Let \(0 =\lambda _{1} \leq \lambda _{2} \leq \cdots \leq \lambda _{N}\), \(\lambda _{1}^{{\prime}} \leq \lambda _{2}^{{\prime}}\leq \cdots \leq \lambda _{r}^{{\prime}}\), and \(0 =\mu _{1} \leq \mu _{2} \leq \cdots \leq \mu _{r}\) denote the eigenvalues of L(Γ), \(L(\varGamma )_{V ^{{\prime}}}\), and \(L(\varGamma _{V ^{{\prime}}})\), respectively. Then for all 1 ≤ k ≤ r,

$$\displaystyle\begin{array}{rcl} \lambda _{k}& \leq & \lambda _{k}^{{\prime}}\leq \lambda _{ N-r+k}, {}\\ \sum _{j=1}^{k}\lambda _{ j}^{{\prime}}& \geq & \sum _{ j=1}^{k}\mu _{ j} +\sum _{ j=1}^{k}\delta _{ j}. {}\\ \end{array}$$

In particular,

$$\displaystyle{ \lambda _{N} + \cdots +\lambda _{N-r+1} \geq \sum _{i=1}^{r}\sum _{ j\neq i}^{}(a_{\mathit{ij}} + a_{\mathit{ji}}). }$$

Proof.

The first inequality follows from the interlacing theorem for eigenvalues of nested Hermitian matrices; see, for example, Horn, Rhee and So (1998). The second estimate follows from a standard eigenvalue inequality for sums of Hermitian matrices. The last inequality follows from the other two. In fact, by the first inequality, \(\lambda _{N} + \cdots +\lambda _{N-r+1} \geq \mathrm{ tr\,}(L(\varGamma )_{V ^{{\prime}}}) =\mathrm{ tr\,}L(\varGamma _{V ^{{\prime}}}) +\mathrm{ tr\,}D_{V ^{{\prime}}}\). This completes the proof. ■ 

Similar eigenvalue inequalities exist for the Schur complement of generalized Laplacians. The straightforward proof of the next theorem is omitted.

Theorem 8.58.

Let \(\mathcal{L}\) denote a generalized Laplacian of an undirected, connected graph, and let \(\mathcal{L}_{\mathrm{red}} = \mathcal{L}/\mathcal{L}_{\mathit{II}}\) denote the Schur complement, |I| = r. Then the following interlacing conditions for eigenvalues are satisfied:

$$\displaystyle{ \lambda _{k}(\mathcal{L}) \leq \lambda _{k}(\mathcal{L}_{\mathrm{red}}) \leq \lambda _{k}(\mathcal{L}_{\mathit{JJ}}) \leq \lambda _{k+r}(\mathcal{L})\quad \text{for}\;1 \leq k \leq N - r. }$$

9 Exercises

  1. 1.

    Let \(\lambda _{1},\ldots,\lambda _{n}\) and \(\mu _{1},\ldots,\mu _{m}\) be the eigenvalues of the matrices \(A \in \mathbb{R}^{n\times n}\) and \(B \in \mathbb{R}^{m\times m}\), respectively. Prove that the eigenvalues of the Kronecker product \(A \otimes B\) and of \(A \otimes I_{m} + I_{n} \otimes B\) are λ i μ j and \(\lambda _{i} +\mu _{j}\), respectively, for \(i = 1,\ldots,n\), \(j = 1,\ldots,m\). Deduce that the Sylvester operator \(A \otimes I_{m} - I_{n} \otimes B\) is invertible if and only if A and B have disjoint spectra.

  2. 2.

    Let \(\lambda _{1},\ldots,\lambda _{n}\) and \(\mu _{1},\ldots,\mu _{m}\) be the eigenvalues of the matrices \(A \in \mathbb{R}^{n\times n}\) and \(B \in \mathbb{R}^{m\times m}\), respectively. Let \(p(x,y) =\sum _{ i,j}^{}c_{\mathit{ij}}x^{i}y^{j}\) denote a real polynomial in two commuting variables. Generalize the preceding exercise by showing that the eigenvalues of

    $$\displaystyle{\sum _{\mathit{ij}}^{}c_{\mathit{ij}}A^{i} \otimes B^{j}}$$

    are equal to \(p(\lambda _{k},\mu _{l})\).

  3. 3.

    The Hadamard product of two matrices \(A,B \in \mathbb{R}^{n\times n}\) is defined as the n × n matrix \(A {\ast} B = (a_{\mathit{ij}}b_{\mathit{ij}})\). Prove that \(A {\ast} B\) is a principal submatrix of \(A \otimes B\). Deduce that the Hadamard product \(A {\ast} B\) of two positive definite symmetric matrices A and B is again positive definite.

  4. 4.

    Prove that the set of matrices \(A \in \mathbb{R}^{n\times n}\), with \(\mathbf{e}^{\top }A = \mathbf{e}^{\top }\) and A e = e, forms an affine space of dimension \((n - 1)^{2}\).

  5. 5.

    Prove Birkhoff’s theorem, stating that the set of n × n doubly stochastic matrices forms a convex polyhedron whose n! vertices are the permutation matrices.

  6. 6.

    Let \(A \in \mathbb{C}^{n\times n}\) be unitary. Prove that the n × n matrix \((\vert a_{\mathit{ij}}\vert ^{2})\) is doubly stochastic.

  7. 7.

    Let \(A \in \mathbb{R}^{n\times n}\) be irreducible and \(D \in \mathbb{R}^{n\times n}\) be diagonal with AD = DA. Prove that \(D =\lambda I_{n}\) for a suitable \(\lambda \in \mathbb{R}\).

  8. 8.

    Let \(A \in \mathbb{R}^{n\times n}\) be irreducible and \(D_{1},\ldots,D_{N-1} \in \mathbb{R}^{n\times n}\) diagonal, with

    $$\displaystyle{A = e^{2\pi \sqrt{-1}k/N}D_{ k}AD_{k}^{-1},\quad k = 1,\ldots,N - 1.}$$

    Prove that there exist \(\lambda _{k} \in \mathbb{C}\) with \(D_{k} =\lambda _{k}D_{1}^{k}\) for \(k = 1,\ldots,N - 1\).

  9. 9.

    A connected graph with N vertices, without loops and multiple edges, has at least N − 1 edges. If the graph has more than N − 1 edges, then it contains a polygon as a subgraph.

  10. 10.

    Prove that a graph is connected if and only if it has a spanning tree.

  11. 11.

    Consider the directed graph Γ on the vertex set \(\mathcal{V} =\{ 1,2,3,4,5,6\}\) with adjacency matrix

    $$\displaystyle{A = \left (\begin{array}{ccccccc} 0&1&0&0&0&0\\ 1 &0 &0 &0 &1 &0 \\ 0&1&0&0&0&0\\ 1 &0 &1 &0 &0 &0 \\ 0&0&0&1&0&1\\ 0 &0 &1 &0 &1 &0\end{array} \right ).}$$
    1. (a)

      Prove that Γ is strongly connected.

    2. (b)

      Prove that there exists a cycle of length two through vertex 1 and that vertex 1 lies on no cycle of odd length.

    3. (c)

      Prove that the period of A is 2.

    4. (d)

      Compute the eigenvalues of A.

  12. 12.

    Let \(A \in \mathbb{R}^{n\times n}\) be nonnegative and irreducible. Show that \((A +\epsilon I)^{n-1} > 0\) for all ε > 0.

  13. 13.

    Consider the matrices

    $$\displaystyle{ A_{1} = \left (\begin{array}{cccc} 0&1&0&0\\ 0 &0 &1 &0 \\ 0&0&0&1\\ 1 &0 &0 &0\end{array} \right ),\quad A_{2} = \left (\begin{array}{cccc} 0&1&0&0\\ 0 &0 &1 &0 \\ 0&0&0&1\\ 1 &1 &0 &0\end{array} \right ). }$$

    Check whether the matrices are primitive and, if possible, determine the smallest \(m \in \mathbb{N}\) such that \(A_{i}^{m} > 0\).

  14. 14.

    Show that the contraction constant for the Hilbert metric of

    $$\displaystyle{A = \left (\begin{array}{cc} 1 &\frac{1} {2} \\ \frac{1} {2} & \frac{1} {3}\end{array} \right )}$$

    is equal to

    $$\displaystyle{k(A) = \frac{2 -\sqrt{3}} {2 + \sqrt{3}},}$$

    while the eigenvalues of A are \(\lambda _{\pm } = \frac{4\pm \sqrt{13}} {6}\). Deduce that k(A) is strictly smaller than the convergence rate for the power iteration defined by A.

  15. 15.

    The primitivity index γ(A) of a nonnegative matrix A is defined as the smallest \(m \in \mathbb{N}\), with A m > 0. Prove that the n × n Wielandt matrix

    $$\displaystyle{A = \left (\begin{array}{cccc} 0&1&\ldots &0\\ \vdots & \ddots &\ddots & \vdots\\ 0 & &\ddots &1 \\ 1&1&\ldots &0\end{array} \right )}$$

    is primitive with primitivity index \(\gamma (A) = n^{2} - 2n + 2\).

  16. 16.

    Prove that every nonnegative irreducible matrix \(A \in \mathbb{R}^{n\times n}\) with at least one positive diagonal element is primitive.

  17. 17.

    Consider a real n × n tridiagonal matrix

    $$\displaystyle{A = \left (\begin{array}{cccc} a_{1} & b_{1} & \cdots & 0\\ c_{ 1} & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & b_{n-1} \\ 0 & \cdots &c_{n-1} & a_{n}\end{array} \right )}$$

    with spectral radius r(A). Prove:

    1. (a)

      If \(b_{j}c_{j} \geq 0\) for all j, then A has only real eigenvalues.

    2. (b)

      If \(b_{j}c_{j} > 0\) for all j, then A has only real simple eigenvalues.

    3. (c)

      Assume b j  > 0, c j  > 0, and a j  ≥ 0 for all j. Then A is irreducible. Matrix A is primitive if at least one a j  > 0. If \(a_{1} =\ldots = a_{n} = 0\), then − r(A) is an eigenvalue of A.

  18. 18.

    Let \(\varGamma = (V =\{ 1,\ldots,N\},E)\) be a finite directed graph and \(d_{\text{o}}(j) = \vert \mathcal{N}^{\text{o}}(j)\vert = \vert \{i \in V \;\vert \;(j,i) \in E\}\vert \) the out-degree of vertex j; a numerical sketch of this construction is given after the exercise list. For a real number 0 ≤ α < 1 define the N × N Google matrix \(\mathcal{G} = (g_{ij})\) of the digraph Γ as

    $$\displaystyle{g_{ij}:= \left \{\begin{array}{@{}l@{\quad }l@{}} \frac{\alpha }{d_{ \text{o}}(j)} + \frac{1-\alpha } {N} \quad &i \in \mathcal{N}^{\text{o}}(j)\neq \emptyset \\ \frac{1-\alpha } {N} \quad &i\not\in \mathcal{N}^{\text{o}}(j)\neq \emptyset \\ \frac{1} {N} \quad &\mathcal{N}^{\text{o}}(j) =\emptyset.\end{array} \right.}$$
    1. (a)

      Prove that \(\mathcal{G}\) is column stochastic and primitive,

    2. (b)

      Prove that the largest eigenvalue of \(\mathcal{G}\) is λ 1 = 1 and the second largest eigenvalue of \(\mathcal{G}\) is λ 2 = α.

  19. 19.

    The Leslie matrix is a nonnegative matrix of the form

    $$\displaystyle{A = \left (\begin{array}{cccc} a_{1} & a_{2} & \cdots &a_{n} \\ b_{1} & 0 & \cdots & 0\\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots &b_{n-1} & 0\end{array} \right ).}$$

    We assume \(a_{n} > 0\) and \(b_{1} > 0,\ldots,b_{n-1} > 0\).

    1. (a)

      Show that A is irreducible.

    2. (b)

      Show that A is primitive whenever there exists i with a i  > 0 and a i+1 > 0.

    3. (c)

      Show that A is not primitive if n = 3 and \(a_{1} = a_{2} = 0\).

  20. 20.

    Prove that the Cayley–Menger determinants of a formation \(x_{1},\ldots,x_{N} \in \mathbb{R}^{m}\) are nonpositive for k ≤ N and are zero for k > m + 1.
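
As announced in Exercise 18, the following sketch (Python with NumPy; the small example digraph, which includes a dangling vertex, is hypothetical) builds the Google matrix \(\mathcal{G}\), checks that it is column stochastic and positive (hence primitive), and reports its two largest eigenvalue moduli.

```python
# Sketch: the Google matrix of Exercise 18 for a small example digraph.
import numpy as np

out_nbrs = {0: [1, 2], 1: [2], 2: [0], 3: []}     # vertex 3 has no out-links
N, alpha = 4, 0.85

G = np.zeros((N, N))
for j in range(N):
    if out_nbrs[j]:                               # N^o(j) nonempty
        G[:, j] = (1 - alpha) / N
        for i in out_nbrs[j]:
            G[i, j] += alpha / len(out_nbrs[j])
    else:                                         # dangling column
        G[:, j] = 1.0 / N

print(np.allclose(G.sum(axis=0), 1))              # column stochastic
print(np.all(G > 0))                              # positive, hence primitive
ev = np.sort(np.abs(np.linalg.eigvals(G)))[::-1]
print(ev[0], ev[1])                               # 1.0 and a value <= alpha
```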

10 Notes and References

Classical references for nonnegative matrices, Markov chains, and the Perron–Frobenius theorem include Gantmacher (1959), Horn and Johnson (1990), and Seneta (1981). We also mention the excellent book by Fiedler (2008), which provides a useful collection of results on special matrices and connections to graph theory. Part of the material on stochastic matrices and the ergodic theorem in Section 8.3 was inspired by the book of Huppert (1990). Infinite-dimensional generalizations of the Perron–Frobenius theory can be found in the work of Jentzsch (1912), Krein and Rutman (1950), and Krasnoselskii (1964). A special case of the Contraction Mapping Theorem 8.6 was applied by Pollicot and Yuri (1998) to prove the existence of a unique Perron vector for aperiodic {0, 1} matrices. For positive matrices A, the existence of the Perron vector in Theorem 8.11 is well known and easily deduced from purely topological arguments. In fact, the standard simplex C 1 is homeomorphic to the closed unit ball, on which

$$\displaystyle{x\mapsto \frac{\mathit{Ax}} {\mathbf{e}^{\top }\mathit{Ax}}}$$

defines a continuous map. Thus the Brouwer fixed-point theorem implies the existence of an eigenvector x ∈ C 1 with positive eigenvalue. The papers by Bushell (1986) and Kohlberg and Pratt (1982) provide further background on the Hilbert metric and the analysis of positive operators. The sequence of power iterates (8.5) to compute the Perron vector is reminiscent of the well-known power method from numerical linear algebra,

$$\displaystyle{x_{t+1} = \frac{\mathit{Ax}_{t}} {\|\mathit{Ax}_{t}\|},}$$

for computing dominant eigenvectors of a matrix A. The convergence speed of the general power method depends on the ratio \(\frac{\vert \lambda _{1}\vert } {\vert \lambda _{2}\vert }\) of the largest and second largest eigenvalues. See Parlett and Poole (1973) and Helmke and Moore (1994) for convergence proofs of the power method on projective spaces and Grassmann manifolds.
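
For illustration, here is a minimal sketch of this normalized power iteration (Python with NumPy; the iteration count and the Rayleigh-quotient readout are our choices), applied to the positive matrix from Exercise 14.

```python
# Sketch: normalized power iteration for the dominant eigenvalue/eigenvector.
import numpy as np

A = np.array([[1.0, 0.5],
              [0.5, 1.0 / 3.0]])          # the positive matrix of Exercise 14
x = np.ones(2)
for _ in range(100):
    x = A @ x
    x = x / np.linalg.norm(x)

lam = x @ A @ x                           # Rayleigh quotient approximates lambda_1
print(lam, x)                             # dominant eigenvalue and Perron vector
print((4 + np.sqrt(13)) / 6)              # exact dominant eigenvalue of this A
```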

Graph-theoretic methods have long been used for studying the convergence properties of random walks on a graph, for analyzing synchronization and clustering phenomena in physical systems, and for algorithms in distributed computing, formation control, and networked control systems; we refer the reader to the monographs by Bullo, Cortés and Martínez (2009) and Meshbahi and Egerstedt (2010) for extensive background material and further references. Boyd, Diaconis and Xiao (2004) developed linear matrix inequalities characterizing the fastest mixing Markov chain on a graph, while Xiao and Boyd (2004) studied linear iterations for distributed averaging and consensus in networks. Ottaviani and Sturmfels (2013) studied the problem of finding weights in a complete graph such that the associated Markov chain has a stationary probability distribution that is contained in a specified linear subspace. This problem is equivalent to characterizing the Laplacian matrices A of a graph such that the pair (C, A) is not observable. This problem seems to be largely open, but in a special situation (complete graph; weights are complex numbers), Ottaviani and Sturmfels (2013) successfully computed the degree of the variety of unobservable pairs.

The literature on formation shape control via distances and graph concepts includes early work by Olfati-Saber, Fax and Murray (2007) and Doerfler and Francis (2010). For characterizations of rigid graphs see Asimov and Roth (1978) and Connelly (1993). References on Euclidean distance geometry and applications include Crippen and Havel (1988), Dress and Havel (1993), and Blumenthal (1953). The nonpositivity condition (8.10) for the Cayley–Menger determinants yields a simple determinantal condition that is necessary for a nonnegative symmetric matrix A with zero diagonal entries to be a Euclidean distance matrix. Blumenthal (1953), Chapter IV, p. 105, has shown that such a matrix A is a Euclidean distance matrix if and only if the Cayley–Menger determinants of all k × k principal submatrices of A are nonpositive, \(k = 1,\ldots,N\). The conditions of parts (d) and (e) in Theorem 8.29 are due to Gower (1985) and Schoenberg (1935), respectively.

A reference for the proof of Theorem 8.45 and related material is Grenander and Szegö (1958). The spectral properties of circulant matrices are well studied. An important fact is that the set of all circulant matrices is simultaneously diagonalized by the Fourier matrix (8.25). Further information on circulant matrices can be found in Davis (1979) and the recent survey by Kra and Simanca (2012). For a statement and proof of the Courant–Fischer minimax principle, see Horn and Johnson (1990). A generalization is Wielandt’s minimax theorem on partial sums of eigenvalues. The eigenvalue inequalities appearing in Theorem 8.57 are the simplest of a whole series of eigenvalue inequalities, which can be derived from eigenvalue inequalities on sums of Hermitian matrices. For a derivation of such eigenvalue inequalities via Schubert calculus on Grassmann manifolds, see Helmke and Rosenthal (1995). The full set of eigenvalue inequalities for sums of Hermitian matrices has been characterized by Klyachko; his work is nicely summarized by Fulton (2000). Such results should be useful in deriving sharp eigenvalue bounds for the Schur complement of Laplacian matrices.