
1 Introduction

Given an n × n matrix A(z) whose elements are analytic functions of a complex parameter z, we consider the problem of finding the values of z for which the linear simultaneous equation A(z)w = 0 has a nonzero solution w. Such a problem is known as the nonlinear eigenvalue problem, and the values of z and w that satisfy this condition are called an eigenvalue and an eigenvector, respectively. The nonlinear eigenvalue problem arises in many fields of scientific and engineering computing, such as electronic structure calculation, nonlinear elasticity and theoretical fluid dynamics.

There are several algorithms for solving the nonlinear eigenvalue problem, including the multivariate Newton’s method [14] and its variants [13], the nonlinear Arnoldi method [21], the nonlinear Jacobi-Davidson method [3] and methods based on complex contour integration [1, 2, 4, 22]. Among them, the last class of methods has a unique feature: it can compute all the eigenvalues in a specified region of the complex plane enclosed by a Jordan curve (i.e., a simple closed curve) Γ. In addition, these methods have large-grain parallelism, since the function evaluations for numerical integration can be done independently at each sample point. In fact, [1] reports that a nearly linear speedup can be achieved even in a Grid environment, where the interconnection network among the computing nodes is relatively weak.

These algorithms can be viewed as nonlinear extensions of the Sakurai-Sugiura (SS) method for the generalized eigenvalue problem Ax = λBx [16]. To find the eigenvalues within a closed Jordan curve Γ in the complex plane, the SS method computes the moments \(\mu _{p} = \frac{1}{2\pi i}\oint _{\varGamma }z^{\,p}\mathbf{u}^{{\ast}}(A - zB)^{-1}\mathbf{v}\,dz\), where u and v are some constant vectors, and extracts the information on the eigenvalues from these moments. To justify the algorithm, the Weierstrass canonical form [5] for (linear) matrix pencils is used. Similarly, existing derivations of the SS-type algorithms for the nonlinear eigenvalue problem rely on canonical forms of the analytic matrix function A(z). Specifically, Asakura et al. use the Smith form for analytic matrix functions [6], while Beyn and Yokota et al. employ the theorem of Keldysh [11, 12]. These are intricate structure theorems that give canonical representations of A(z) valid on the whole domain enclosed by Γ. On the other hand, they require advanced knowledge of both analysis and linear algebra and are rarely introduced even in advanced textbooks of linear algebra.

In this paper, we present an elementary derivation of the SS-type method for the nonlinear eigenvalue problem, assuming that all the eigenvalues of A(z) in Γ are simple. Instead of the whole domain enclosed by Γ, we consider an infinitesimally small circle Γ i ε around each eigenvalue z i . This allows us to use the analyticity of the eigenvalues and eigenvectors of a parametrized matrix A(z), which is a well-known result in matrix perturbation theory [9, p. 117][10, Chapter 2, Sections 1 & 2], to evaluate the contour integral along Γ i ε. Then we aggregate the contributions from each Γ i ε to evaluate the contour integral along Γ. This is sufficient for theoretical justification of the nonlinear SS-type algorithm in the case of simple eigenvalues. We believe that this provides an easily accessible approach to the theory of SS-type methods for the nonlinear eigenvalue problem. We emphasize that our focus here is not to propose a new algorithm for the nonlinear eigenvalue problem, but to provide an elementary derivation of the SS-type nonlinear eigensolver.

This paper is structured as follows: In Sect. 2, we develop a theory for computing the eigenvalues of A(z) based on the complex contour integral. The algorithm based on this theory is presented in Sect. 3. Section 4 gives some numerical results. Finally, we give some concluding remarks in Sect. 5.

Throughout this paper, we use capital letters to denote matrices, bold small letters to denote vectors, and roman small letters and Greek letters to denote scalars. A T and A ∗ denote the transpose and the Hermitian conjugate of a matrix A, respectively. I n denotes the identity matrix of dimension n. For x ∈ C n, {x} denotes the subspace of C n spanned by x.

2 The Theory

Let A(z) be an n × n matrix whose elements are analytic functions of a complex parameter z in some region of the complex plane. Let Γ be a closed Jordan curve within that region and assume that A(z) has m eigenvalues z 1, z 2, …, z m within Γ. We further assume that they are simple eigenvalues, that is, simple zeros of det(A(z)), and that the number m is known. In the following, we refer to z 1, z 2, …, z m as the nonlinear eigenvalues of A(z).

For a fixed value of z, A(z) is a constant matrix and therefore has n eigenvalues. We refer to them as linear eigenvalues of A(z). Also, we call the eigenvectors of a constant matrix A(z) linear eigenvectors. If z i is a nonlinear eigenvalue of A(z), then A(z i ) is singular and at least one of the linear eigenvalues of A(z i ) is zero. Moreover, since z 1, z 2, …, z m are simple eigenvalues, only one of the n linear eigenvalues becomes zero at each of them. We denote the linear eigenvalue that becomes zero at z = z i by λ i (z). Note that z i is a simple zero of λ i (z), because otherwise z i would not be a simple zero of det(A(z)).

Since λ i (z) is a continuous function of z near z i [10, p. 93, Theorem 2.3], it remains a simple linear eigenvalue of A(z) in a neighborhood of z = z i and has one-dimensional right and left eigenspaces there. Let x i (z) and y i (z) be the (linear) right and left eigenvectors, respectively, chosen so that y i ∗(z)x i (z) = 1. Also, let X i (z) ∈ C n×(n−1) and Y i (z) ∈ C n×(n−1) be matrices whose column vectors form bases of the orthogonal complements of {y i (z)} and {x i (z)}, respectively, and which satisfy Y i ∗(z)X i (z) = I n−1. From these definitions, we have

$$\displaystyle{ \left [\begin{array}{c} \mathbf{y}_{i}^{{\ast}}(z) \\ Y _{i}^{{\ast}}(z) \end{array} \right ]\left [\begin{array}{cc} \mathbf{x}_{i}(z)&X_{i}(z) \end{array} \right ] = \left [\begin{array}{cc} \mathbf{y}_{i}^{{\ast}}(z)\mathbf{x}_{i}(z) & \mathbf{y}_{i}^{{\ast}}(z)X_{i}(z) \\ Y _{i}^{{\ast}}(z)\mathbf{x}_{i}(z)&Y _{i}^{{\ast}}(z)X_{i}(z) \end{array} \right ] = \left [\begin{array}{cc} 1 & \mathbf{0}^{T} \\ \mathbf{0}&I_{n-1} \end{array} \right ]. }$$
(1)

Note that x i (z), y i (z), X i (z) and Y i (z) are not yet uniquely determined under these conditions.

Now we show the following basic lemma.

Lemma 2.1

Let Γ i ε be a circle with center z i and radius ε. For sufficiently small ε, λ i (z) is an analytic function of z within Γ i ε and all of x i (z), y i (z), X i (z) and Y i (z) can be chosen to be analytic functions of z within Γ i ε .

Proof

For sufficiently small ε, λ i (z) is a simple linear eigenvalue of A(z) everywhere in Γ i ε. In this case, it is well known that λ i (z) is an analytic function of z in Γ i ε; see [9, p. 117] for a proof. Let P i (z) ∈ C n×n be the spectral projector associated with λ i (z), that is, the projection onto the one-dimensional right eigenspace of A(z) belonging to λ i (z) along the orthogonal complement of the corresponding left eigenspace. It is also shown in [10, p. 93, Theorem 2.3] that P i (z) is an analytic function of z in Γ i ε for sufficiently small ε.

Now, let x i (0) ≠ 0 be a (linear) right eigenvector of A(z i ) corresponding to λ i (z i ) and set x i (z) = P i (z)x i (0). Then x i (z) is an analytic function of z and belongs to the right eigenspace of λ i (z). Moreover, since x i (z i ) = P i (z i )x i (0) = x i (0) ≠ 0, x i (z) remains nonzero within Γ i ε if ε is sufficiently small. Thus we can adopt x i (z) as a (linear) right eigenvector corresponding to λ i (z).

Next, let \(\tilde{\mathbf{y}}_{i}^{(0)}\neq \mathbf{0}\) be a (linear) left eigenvector of A(z i ) corresponding to λ i (z i ) and let X i (0) ∈ C n×(n−1) be a matrix whose column vectors form a basis of the orthogonal complement of \(\{\tilde{\mathbf{y}}_{i}^{(0)}\}\). Set \(X_{i}(z) = \left (I - P_{i}(z)\right )X_{i}^{(0)}\). Then, X i (z) is an analytic function of z. Also, its column vectors are orthogonal to the (linear) left eigenvector of A(z) corresponding to λ i (z), which we denote by \(\tilde{\mathbf{y}}_{i}(z)\), since

$$\displaystyle{ \tilde{\mathbf{y}}_{i}^{{\ast}}(z)X_{ i}(z) =\tilde{ \mathbf{y}}_{i}^{{\ast}}(z)\left (I - P_{ i}(z)\right )X_{i}^{(0)} = \left (\tilde{\mathbf{y}}_{ i}^{{\ast}}(z) -\tilde{\mathbf{y}}_{ i}^{{\ast}}(z)\right )X_{ i}^{(0)} = \mathbf{0}^{T}, }$$
(2)

where we used the fact that \(\tilde{\mathbf{y}}_{i}^{{\ast}}(z)P_{i}(z) =\tilde{ \mathbf{y}}_{i}^{{\ast}}(z)\), which holds because P i (z) is the spectral projector associated with λ i (z). Moreover, since \(X_{i}(z_{i}) = \left (I - P_{i}(z_{i})\right )X_{i}^{(0)} = X_{i}^{(0)}\), X i (z) remains of rank n − 1 within Γ i ε if ε is sufficiently small. In this situation, the column vectors of X i (z) form a basis of the orthogonal complement of \(\{\tilde{\mathbf{y}}_{i}(z)\}\).

Finally we note that for sufficiently small ε, the matrix \(\left [\mathbf{x}_{i}(z)\;X_{i}(z)\right ]\) is of full rank since the column vectors of X i (z) are orthogonal to the left eigenvector, while the left and right eigenvectors are not orthogonal for a simple eigenvalue [15]. Hence we can define a vector y i (z) and a matrix Y i (z) ∈ C n×(n−1) by

$$\displaystyle{ \left [\begin{array}{c} \mathbf{y}_{i}^{{\ast}}(z) \\ Y _{i}^{{\ast}}(z) \end{array} \right ] = \left [\begin{array}{cc} \mathbf{x}_{i}(z)&X_{i}(z) \end{array} \right ]^{-1}. }$$
(3)

It is clear that y i (z) and Y i (z) are analytic functions of z and that x i (z), y i (z), X i (z) and Y i (z) satisfy Eq. (1). From Eq. (1), it is apparent that Y i (z) is of rank n − 1 and that its columns form a basis of the orthogonal complement of {x i (z)}. Finally, y i (z) is a (linear) left eigenvector corresponding to λ i (z), since it is orthogonal to the columns of X i (z), whose orthogonal complement is the one-dimensional left eigenspace spanned by \(\tilde{\mathbf{y}}_{i}(z)\).

Thus we have constructed x i (z), y i (z), X i (z) and Y i (z) that satisfy all the requirements of the lemma. □

Using the result of Lemma 2.1 and Eq. (1), we can expand A(z) in Γ i ε as

$$\displaystyle\begin{array}{rcl} A(z)& =& \left [\begin{array}{cc} \mathbf{x}_{i}(z)&X_{i}(z) \end{array} \right ]\left [\begin{array}{c} \mathbf{y}_{i}^{{\ast}}(z) \\ Y _{i}^{{\ast}}(z) \end{array} \right ]A(z)\left [\begin{array}{cc} \mathbf{x}_{i}(z)&X_{i}(z) \end{array} \right ]\left [\begin{array}{c} \mathbf{y}_{i}^{{\ast}}(z) \\ Y _{i}^{{\ast}}(z) \end{array} \right ] \\ & =& \left [\begin{array}{cc} \mathbf{x}_{i}(z)&X_{i}(z) \end{array} \right ]\left [\begin{array}{cc} \mathbf{y}_{i}^{{\ast}}(z)A(z)\mathbf{x}_{i}(z) & \mathbf{y}_{i}^{{\ast}}(z)A(z)X_{i}(z) \\ Y _{i}^{{\ast}}(z)A(z)\mathbf{x}_{i}(z)&Y _{i}^{{\ast}}(z)A(z)X_{i}(z) \end{array} \right ]\left [\begin{array}{c} \mathbf{y}_{i}^{{\ast}}(z) \\ Y _{i}^{{\ast}}(z) \end{array} \right ] \\ & =& \left [\begin{array}{cc} \mathbf{x}_{i}(z)&X_{i}(z) \end{array} \right ]\left [\begin{array}{cc} \lambda _{i}(z)& \mathbf{0}^{T} \\ \mathbf{0} &Y _{i}^{{\ast}}(z)A(z)X_{i}(z) \end{array} \right ]\left [\begin{array}{c} \mathbf{y}_{i}^{{\ast}}(z) \\ Y _{i}^{{\ast}}(z) \end{array} \right ] {}\end{array}$$
(4)

where all the elements and submatrices appearing in the last line are analytic functions of z.
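As a concrete illustration of the decomposition (1) and (4) at a single fixed value of z, the following sketch (ours, not part of the original derivation) constructs x i , X i , y i and Y i for a random constant matrix along the lines of the proof of Lemma 2.1 and checks the block structure numerically; the test matrix, the chosen eigenvalue and the use of NumPy/SciPy are our own illustrative assumptions.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
n = 6
A0 = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # plays the role of A(z) at one fixed z

# A simple eigenvalue lam of A0 with right eigenvector x.
lam_all, V = np.linalg.eig(A0)
lam, x = lam_all[0], V[:, 0]

# Left eigenvector y_tilde: right eigenvector of A0^* belonging to conj(lam).
mu_all, W = np.linalg.eig(A0.conj().T)
y_tilde = W[:, np.argmin(np.abs(mu_all - np.conj(lam)))]

# X: orthonormal basis of the orthogonal complement of span{y_tilde}  (n x (n-1)).
X = null_space(y_tilde.conj()[None, :])

# y and Y defined through Eq. (3): the rows of [x  X]^{-1}.
M = np.column_stack([x, X])
Minv = np.linalg.inv(M)                      # first row is y^*, remaining rows are Y^*

# Eq. (1): [y^*; Y^*] [x  X] = I_n.
print(np.abs(Minv @ M - np.eye(n)).max())    # ~ machine precision

# Eq. (4): [y^*; Y^*] A0 [x  X] is block diagonal with (1,1) entry lam.
B = Minv @ A0 @ M
print(abs(B[0, 0] - lam), np.abs(B[0, 1:]).max(), np.abs(B[1:, 0]).max())  # all ~ 0
```

Here the off-diagonal blocks of [y ∗; Y ∗]A[x X] vanish (up to roundoff), which is exactly the structure exploited in Eq. (5) below.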

As for the submatrix Y i ∗(z)A(z)X i (z), we can show the following lemma.

Lemma 2.2

For sufficiently small ε, Y i ∗(z)A(z)X i (z) is nonsingular within Γ i ε.

Proof

Since ε is sufficiently small, we can assume that there are no nonlinear eigenvalues of A(z) in Γ i ε other than z i .

Now, assume that Y i ∗(z)A(z)X i (z) is singular at some point \(z =\hat{ z}\) in Γ i ε. Then there is a nonzero vector p ∈ C n−1 such that \(Y _{i}^{{\ast}}(\hat{z})A(\hat{z})X_{i}(\hat{z})\mathbf{p} = \mathbf{0}\). It then follows from Eqs. (4) and (1) that \(X_{i}(\hat{z})\mathbf{p}\) is a (linear) right eigenvector of \(A(\hat{z})\) corresponding to the linear eigenvalue 0. Hence, \(\hat{z}\) is a nonlinear eigenvalue of A(z). But because A(z) has no nonlinear eigenvalues other than z i in Γ i ε, we have \(\hat{z} = z_{i}\). On the other hand, \(\mathbf{x}_{i}(\hat{z}) = \mathbf{x}_{i}(z_{i})\) is also a (linear) right eigenvector of \(A(\hat{z})\) corresponding to the linear eigenvalue 0. Since the matrix \(\left [\mathbf{x}_{i}(\hat{z})\;X_{i}(\hat{z})\right ]\) is of full rank (see Eq. (1)), \(\mathbf{x}_{i}(\hat{z})\) and \(X_{i}(\hat{z})\mathbf{p}\) are linearly independent. Thus the null space of \(A(\hat{z})\) is at least two-dimensional. But this contradicts the assumption that \(\hat{z} = z_{i}\) is a simple zero of det(A(z)). Hence Y i ∗(z)A(z)X i (z) must be nonsingular within Γ i ε. □

Combining Lemma 2.2 with Eq. (4), we have the following expansion of A(z)−1 valid everywhere in Γ i ε except at z = z i :

$$\displaystyle\begin{array}{rcl} A(z)^{-1}& =& \left [\begin{array}{cc} \mathbf{x}_{ i}(z)&X_{i}(z) \end{array} \right ]\left [\begin{array}{cc} \lambda _{i}(z)^{-1} & \mathbf{0}^{T} \\ \mathbf{0} &\left \{Y _{i}^{{\ast}}(z)A(z)X_{i}(z)\right \}^{-1} \end{array} \right ]\left [\begin{array}{c} \mathbf{y}_{i}^{{\ast}}(z) \\ Y _{i}^{{\ast}}(z) \end{array} \right ].{}\end{array}$$
(5)

In the right-hand side, λ i (z)−1 is analytic except at z = z i , where it has a simple pole. All other elements and submatrices are analytic everywhere in Γ i ε. Note that \(\left \{Y _{i}^{{\ast}}(z)A(z)X_{i}(z)\right \}^{-1}\) is analytic because A(z), X i (z) and Y i (z) are analytic (see Lemma 2.1) and Y i ∗(z)A(z)X i (z) is nonsingular, as proved in Lemma 2.2.

We now define the complex moments μ 0, μ 1, …, μ 2m−1 by complex contour integration as

$$\displaystyle{ \mu _{p}(\mathbf{u},\mathbf{v}) = \frac{1} {2\pi i}\oint _{\varGamma }z^{\,p}\mathbf{u}^{{\ast}}A(z)^{-1}A^{{\prime}}(z)\mathbf{v}\,dz, }$$
(6)

where u and v are some constant vectors in C n. The next lemma shows that these complex moments contain information on the nonlinear eigenvalues of A(z) in Γ.

Lemma 2.3

The complex moments can be written as

$$\displaystyle{ \mu _{p}(\mathbf{u},\mathbf{v}) =\sum _{ i=1}^{m}\nu _{ i}(\mathbf{u},\mathbf{v})z_{i}^{\,p}, }$$
(7)

where {ν i (u, v)} i = 1 m are some complex numbers. Moreover, {ν i (u, v)} i = 1 m are nonzero for generic u and v .

Proof

Let Γ i ε (i = 1, …, m) be a circle with center z i and with sufficiently small radius ε. Inside Γ, the integrand is analytic everywhere except within the circles Γ 1 ε, …, Γ m ε, so the integral along Γ equals the sum of the integrals along Γ 1 ε, …, Γ m ε, and it suffices to evaluate the integration along each Γ i ε.

Since ε is sufficiently small, Lemma 2.1 ensures that we can choose analytic x i (z), y i (z), X i (z) and Y i (z) within Γ i ε. Of course, λ i (z) is also analytic in Γ i ε. In addition, since λ i (z) has a simple zero at z = z i , it can be expressed as λ i (z) = (z − z i )p i (z), where p i (z) is analytic and nonzero in Γ i ε.

Now, from Eq. (5), we have

$$\displaystyle{ A(z)^{-1} =\lambda _{ i}(z)^{-1}\mathbf{x}_{ i}(z)\mathbf{y}_{i}^{{\ast}}(z) + X_{ i}(z)\left \{Y _{i}^{{\ast}}(z)A(z)X_{ i}(z)\right \}^{-1}Y _{ i}^{{\ast}}(z) }$$
(8)

By differentiating Eq. (4) with respect to z, we have

$$\displaystyle\begin{array}{rcl} A^{{\prime}}(z)& =& \lambda _{ i}^{{\prime}}(z)\mathbf{x}_{ i}(z)\mathbf{y}_{i}^{{\ast}}(z) +\lambda _{ i}(z)\left \{\mathbf{x}_{i}(z)\mathbf{y}_{i}^{{\ast}}(z)\right \}^{{\prime}} \\ & &\quad \quad \quad \quad \quad \quad \quad + \left \{X_{i}(z)Y _{i}^{{\ast}}(z)A(z)X_{ i}(z)Y _{i}^{{\ast}}(z)\right \}^{{\prime}}{}\end{array}$$
(9)

Combining Eqs. (8) and (9), we have

$$\displaystyle\begin{array}{rcl} A(z)^{-1}A^{{\prime}}(z)& =& \lambda _{ i}(z)^{-1}\lambda _{ i}^{{\prime}}(z)\mathbf{x}_{ i}(z)\mathbf{y}_{i}^{{\ast}}(z) + \mathbf{x}_{ i}(z)\mathbf{y}_{i}^{{\ast}}(z)\left \{\mathbf{x}_{ i}(z)\mathbf{y}_{i}^{{\ast}}(z)\right \}^{{\prime}} \\ & &\quad +\lambda _{i}(z)\mathbf{x}_{i}(z)\mathbf{y}_{i}^{{\ast}}(z)\left \{X_{ i}(z)Y _{i}^{{\ast}}(z)A(z)X_{ i}(z)Y _{i}^{{\ast}}(z)\right \}^{{\prime}} \\ & &\quad + X_{i}(z)\left \{Y _{i}^{{\ast}}(z)A(z)X_{ i}(z)\right \}^{-1}Y _{ i}^{{\ast}}(z)A^{{\prime}}(z) \\ & =& \frac{1} {z - z_{i}} \cdot \mathbf{x}_{i}(z)\mathbf{y}_{i}^{{\ast}}(z) + \frac{p_{i}^{{\prime}}(z)} {p_{i}(z)} \mathbf{x}_{i}(z)\mathbf{y}_{i}^{{\ast}}(z) \\ & & +\quad \mathbf{x}_{i}(z)\mathbf{y}_{i}^{{\ast}}(z)\left \{\mathbf{x}_{ i}(z)\mathbf{y}_{i}^{{\ast}}(z)\right \}^{{\prime}} \\ & & + \frac{1} {z - z_{i}} \cdot \frac{1} {p_{i}(z)} \cdot \mathbf{x}_{i}(z)\mathbf{y}_{i}^{{\ast}}(z)\left \{X_{ i}(z)Y _{i}^{{\ast}}(z)A(z)X_{ i}(z)Y _{i}^{{\ast}}(z)\right \}^{{\prime}} \\ & & +X_{i}(z)\left \{Y _{i}^{{\ast}}(z)A(z)X_{ i}(z)\right \}^{-1}Y _{ i}^{{\ast}}(z)A^{{\prime}}(z). {}\end{array}$$
(10)

Note that in the last expression of Eq. (10), the second, third and fifth terms are analytic in Γ i ε and therefore vanish under the contour integration. Hence,

$$\displaystyle\begin{array}{rcl} & & \frac{1} {2\pi i}\oint _{\varGamma _{i}^{\epsilon }}z^{\,p}\mathbf{u}^{{\ast}}A(z)^{-1}A^{{\prime}}(z)\mathbf{v}\,dz \\ & & = \frac{1} {2\pi i}\oint _{\varGamma _{i}^{\epsilon }}z^{\,p}\mathbf{u}^{{\ast}}\left [ \frac{1} {z - z_{i}} \cdot \mathbf{x}_{i}(z)\mathbf{y}_{i}^{{\ast}}(z)\right. \\ & & \quad \left.+ \frac{1} {z - z_{i}} \cdot \frac{1} {p_{i}(z)} \cdot \mathbf{x}_{i}(z)\mathbf{y}_{i}^{{\ast}}(z)\left \{X_{ i}(z)Y _{i}^{{\ast}}(z)A(z)X_{ i}(z)Y _{i}^{{\ast}}(z)\right \}^{{\prime}}\right ]\mathbf{v}\,dz \\ & & =\nu _{i}(\mathbf{u},\mathbf{v})z_{i}^{\,p}, {}\end{array}$$
(11)

where

$$\displaystyle\begin{array}{rcl} & & \nu _{i}(\mathbf{u},\mathbf{v}) \\ & & = \mathbf{u}^{{\ast}}\mathbf{x}_{ i}(z_{i})\mathbf{y}_{i}^{{\ast}}(z_{ i})\left [I_{n} + \frac{1} {p_{i}(z_{i})}\left \{X_{i}(z_{i})Y _{i}^{{\ast}}(z_{ i})A(z_{i})X_{i}(z_{i})Y _{i}^{{\ast}}(z_{ i})\right \}^{{\prime}}\right ]\mathbf{v}.{}\end{array}$$
(12)

In deriving the last equality of Eq. (11), we used the fact that all the factors in the integrand except 1∕(z − z i ) are analytic in Γ i ε, so that the integral is given by the residue at z = z i . Moreover, ν i (u, v) is nonzero for generic u and v, since ν i (u, v) can be written as u ∗ C i v for a fixed nonzero constant matrix C i .
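Explicitly, this last step is an application of the Cauchy integral formula: for any function g(z) analytic in Γ i ε,

$$\displaystyle{ \frac{1} {2\pi i}\oint _{\varGamma _{i}^{\epsilon }}\frac{z^{\,p}\,g(z)} {z - z_{i}}\,dz = z_{i}^{\,p}g(z_{i}), }$$

which is applied term by term to the integrand in Eq. (11).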

Finally, we have

$$\displaystyle\begin{array}{rcl} \mu _{p}(\mathbf{u},\mathbf{v})& =& \frac{1} {2\pi i}\oint _{\varGamma }z^{\,p}\mathbf{u}^{{\ast}}A(z)^{-1}A^{{\prime}}(z)\mathbf{v}\,dz \\ & =& \sum _{i=1}^{m} \frac{1} {2\pi i}\oint _{\varGamma _{i}^{\epsilon }}z^{\,p}\mathbf{u}^{{\ast}}A(z)^{-1}A^{{\prime}}(z)\mathbf{v}\,dz \\ & =& \sum _{i=1}^{m}\nu _{ i}(\mathbf{u},\mathbf{v})z_{i}^{\,p}. {}\end{array}$$
(13)

This completes the proof. □

Note that we could adopt the definition

$$\displaystyle{ \mu _{p}(\mathbf{u},\mathbf{v}) = \frac{1} {2\pi i}\oint _{\varGamma }z^{\,p}\mathbf{u}^{{\ast}}A^{{\prime}}(z)A(z)^{-1}\mathbf{v}\,dz, }$$
(14)

instead of Eq. (6) and get the same result, although the expression for ν i (u, v) in Eq. (12) is slightly different. So the order of A(z)−1 and A (z) does not actually matter.

Once {μ p (u, v)} p = 0 2m−1 have been computed, we can extract the information on the nonlinear eigenvalues {z i } i = 1 m from them in the same way as in the algorithm for the linear eigenvalue problem [16]. To this end, we first define two Hankel matrices H m and H m < by

$$\displaystyle{ H_{m} = \left (\begin{array}{cccc} \mu _{0} & \mu _{1} & \cdots & \mu _{m-1} \\ \mu _{1} & \mu _{2} & \cdots & \mu _{m}\\ \vdots & \vdots & \ddots & \vdots \\ \mu _{m-1} & \mu _{m}&\cdots &\mu _{2m-2} \end{array} \right ),\quad H_{m}^{<} = \left (\begin{array}{cccc} \mu _{1} & \mu _{2} & \cdots & \mu _{m}\\ \mu _{2 } & \mu _{3 } & \cdots & \mu _{m+1}\\ \vdots & \vdots & \ddots & \vdots \\ \mu _{m}&\mu _{m+1} & \cdots &\mu _{2m-1} \end{array} \right ). }$$
(15)

Here we have suppressed the dependence of μ p on u and v for brevity. The next theorem shows how to compute the nonlinear eigenvalues from H m and H m <. This is exactly the same theorem used for the linear eigenvalue problem in [16], but we include the proof for completeness.

Theorem 2.4

Assume that A(z) has m simple nonlinear eigenvalues z 1, z 2, …, z m within Γ. Assume further that the ν i ’s defined by Eq. (12) are nonzero for i = 1, …, m. Then, z 1, z 2, …, z m are given as the m eigenvalues of the matrix pencil H m < − λH m defined by Eq. (15).

Proof

Define a Vandermonde matrix V m and two diagonal matrices D m and Λ m by

$$\displaystyle\begin{array}{rcl} V _{m}& =& \left (\begin{array}{cccc} 1 & 1 &\cdots & 1\\ z_{ 1} & z_{2} & \cdots & z_{m}\\ \vdots & \vdots & \vdots & \vdots \\ z_{1}^{m-1} & z_{2}^{m-1} & \cdots &z_{m}^{m-1} \end{array} \right ),{}\end{array}$$
(16)
$$\displaystyle\begin{array}{rcl} D_{m}& =& \mathrm{diag}(\nu _{1},\nu _{2},\cdots \,,\nu _{m}),{}\end{array}$$
(17)
$$\displaystyle\begin{array}{rcl} \varLambda _{m}& =& \mathrm{diag}(z_{1},z_{2},\cdots \,,z_{m}).{}\end{array}$$
(18)

Then it is easy to see that H m = V m D m V m T and H m < = V m D m Λ m V m T. Since ν i ≠ 0 (i = 1, …, m), D m is nonsingular. Also, since the m nonlinear eigenvalues are distinct, V m is nonsingular. Noting that H m < − λH m = V m D m (Λ m − λI m )V m T, we thus have

$$\displaystyle\begin{array}{rcl} & & \lambda \;\mbox{ is an eigenvalue of}\;H_{m}^{<} -\lambda H_{ m}. \\ & \Leftrightarrow & H_{m}^{<} -\lambda H_{ m}\;\mbox{ is singular.} \\ & \Leftrightarrow & \varLambda _{m} -\lambda I_{m}\;\mbox{ is singular.} \\ & \Leftrightarrow & \exists k,\lambda = z_{k}. {}\end{array}$$
(19)

This completes the proof. □
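As a quick numerical illustration of Theorem 2.4 (ours, not from [16]), one can generate moments of the form (7) from arbitrarily chosen z i and nonzero ν i , assemble H m and H m < as in Eq. (15), and check that the pencil H m < − λH m reproduces the z i :

```python
import numpy as np
from scipy.linalg import eig, hankel

# Arbitrarily chosen "eigenvalues" z_i and nonzero weights nu_i (our test data).
z  = np.array([0.3 + 0.1j, -0.2 + 0.4j, 0.5 - 0.3j])
nu = np.array([1.0, 0.7 - 0.2j, -0.4 + 0.9j])
m = len(z)

# Moments mu_p = sum_i nu_i z_i^p, p = 0, ..., 2m-1  (Eq. (7)).
mu = np.array([np.sum(nu * z**p) for p in range(2 * m)])

# Hankel matrices of Eq. (15).
Hm  = hankel(mu[0:m], mu[m - 1:2 * m - 1])
Hml = hankel(mu[1:m + 1], mu[m:2 * m])          # H_m^<

# Eigenvalues of the pencil H_m^< - lambda H_m (Theorem 2.4).
lam = eig(Hml, Hm, right=False)
print(np.sort_complex(lam))   # reproduces z (up to ordering and roundoff)
print(np.sort_complex(z))
```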

We can also compute the (nonlinear) eigenvectors corresponding to z 1, z 2, …, z m by slightly modifying the lemma and the theorem stated above. Let the n-dimensional vectors s 0, s 1, …, s m−1 be defined by

$$\displaystyle{ \mathbf{s}_{p}(\mathbf{v}) = \frac{1} {2\pi i}\oint _{\varGamma }z^{\,p}A(z)^{-1}A^{{\prime}}(z)\mathbf{v}\,dz\quad (\,p = 0,1,\ldots,m - 1). }$$
(20)

Then we have the following lemma.

Lemma 2.5

The vector s p can be written as

$$\displaystyle{ \mathbf{s}_{p}(\mathbf{v}) =\sum _{ i=1}^{m}z_{ i}^{\,p}\sigma _{ i}(\mathbf{v})\mathbf{x}_{i}(z_{i}), }$$
(21)

where {σ i (v)} i = 1 m are some complex numbers. Moreover, {σ i (v)} i = 1 m are nonzero for generic v .

Proof

Let e j be the j-th column of I n . Then we have from Eqs. (20), (13) and (12),

$$\displaystyle\begin{array}{rcl} \mathbf{s}_{p}(\mathbf{v})& =& \sum _{j=1}^{n}\mathbf{e}_{ j}\, \frac{1} {2\pi i}\oint _{\varGamma }z^{\,p}\mathbf{e}_{ j}^{{\ast}}A(z)^{-1}A^{{\prime}}(z)\mathbf{v}\,dz \\ & =& \sum _{j=1}^{n}\mathbf{e}_{ j}\mu _{p}(\mathbf{e}_{j},\mathbf{v}) \\ & =& \sum _{j=1}^{n}\sum _{ i=1}^{m}\mathbf{e}_{ j}\nu _{i}(\mathbf{e}_{j},\mathbf{v})z_{i}^{\,p} \\ & =& \sum _{i=1}^{m}z_{ i}^{\,p}\mathbf{x}_{ i}(z_{i})\mathbf{y}_{i}^{{\ast}}(z_{ i})\left [I_{n} + \frac{1} {p_{i}(z_{i})}\left \{X_{i}(z_{i})Y _{i}^{{\ast}}(z_{ i})A(z_{i})X_{i}(z_{i})Y _{i}^{{\ast}}(z_{ i})\right \}^{{\prime}}\right ]\mathbf{v} \\ & =& \sum _{i=1}^{m}z_{ i}^{\,p}\sigma _{ i}(\mathbf{v})\mathbf{x}_{i}(z_{i}), {}\end{array}$$
(22)

where

$$\displaystyle{ \sigma _{i}(\mathbf{v}) = \mathbf{y}_{i}^{{\ast}}(z_{ i})\left [I_{n} + \frac{1} {p_{i}(z_{i})}\left \{X_{i}(z_{i})Y _{i}^{{\ast}}(z_{ i})A(z_{i})X_{i}(z_{i})Y _{i}^{{\ast}}(z_{ i})\right \}^{{\prime}}\right ]\mathbf{v}. }$$
(23)

Clearly, σ i (v) is nonzero for generic v. □

Denote by w i the (nonlinear) eigenvector of A(z) corresponding to the eigenvalue z i , that is, w i = x i (z i ). Then w 1, w 2, , w m can be computed as follows.

Theorem 2.6

If σ i ≠ 0 for i = 1, …, m, the eigenvectors are given by

$$\displaystyle{ \left [\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{m}\right ] = \left [\mathbf{s}_{0},\mathbf{s}_{1},\ldots,\mathbf{s}_{m-1}\right ]V _{m}^{-T}. }$$
(24)

Proof

From Lemma 2.5, \(\left [\mathbf{s}_{0},\mathbf{s}_{1},\ldots,\mathbf{s}_{m-1}\right ]\) can be written as

$$\displaystyle\begin{array}{rcl} & & \left [\mathbf{s}_{0},\mathbf{s}_{1},\ldots,\mathbf{s}_{m-1}\right ] \\ & & = \left [\sum _{i=1}^{m}z_{ i}^{0}\sigma _{ i}(\mathbf{v})\mathbf{x}_{i}(z_{i}),\sum _{i=1}^{m}z_{ i}^{1}\sigma _{ i}(\mathbf{v})\mathbf{x}_{i}(z_{i}),\ldots,\sum _{i=1}^{m}z_{ i}^{m-1}\sigma _{ i}(\mathbf{v})\mathbf{x}_{i}(z_{i})\right ] \\ & & = \left [\sigma _{1}\mathbf{x}_{1}(z_{1}),\sigma _{2}\mathbf{x}_{2}(z_{2}),\ldots,\sigma _{m}\mathbf{x}_{m}(z_{m})\right ]V _{m}^{T}. {}\end{array}$$
(25)

Hence,

$$\displaystyle{ \left [\sigma _{1}\mathbf{x}_{1}(z_{1}),\sigma _{2}\mathbf{x}_{2}(z_{2}),\ldots,\sigma _{m}\mathbf{x}_{m}(z_{m})\right ] = \left [\mathbf{s}_{0},\mathbf{s}_{1},\ldots,\mathbf{s}_{m-1}\right ]V _{m}^{-T}. }$$
(26)

The theorem follows by noting that if σ i ≠ 0, σ i x i (z i ) is a nonzero vector that satisfies A(z i )x i (z i ) = λ i (z i )x i (z i ) = 0 and is itself a nonlinear eigenvector corresponding to z i . □

3 The Algorithm

In this section, we present an algorithm for computing the nonlinear eigenvalues of A(z) that lie within Γ based on the theory developed in the previous section. For simplicity, we restrict ourselves to the case where Γ is a circle centered at the origin and with radius r.

In the algorithm, we need to approximate the contour integrals in Eqs. (6) and (20) by some quadrature rule. Since, after parametrizing Γ, they are integrals of a periodic analytic function over a full period, we use the trapezoidal rule [19, 20], which converges exponentially in this setting and is therefore an excellent method for the task. When the number of sample points is K, Eqs. (6) and (20) are approximated as

$$\displaystyle\begin{array}{rcl} \mu _{p}(\mathbf{u},\mathbf{v})& =& \frac{r^{p+1}} {K} \sum _{j=0}^{K-1}\omega _{ K}^{(\,p+1)j}\mathbf{u}^{{\ast}}A(r\omega _{ K}^{j})^{-1}A^{{\prime}}(r\omega _{ K}^{j})\mathbf{v},{}\end{array}$$
(27)
$$\displaystyle\begin{array}{rcl} \mathbf{s}_{p}(\mathbf{v})& =& \frac{r^{p+1}} {K} \sum _{j=0}^{K-1}\omega _{ K}^{(\,p+1)j}A(r\omega _{ K}^{j})^{-1}A^{{\prime}}(r\omega _{ K}^{j})\mathbf{v},{}\end{array}$$
(28)

respectively, where \(\omega _{K} =\exp \left (\frac{2\pi i} {K}\right )\).

Using these expressions, the algorithm can be written as in Algorithm 1.
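The following self-contained sketch (ours, not the authors’ implementation of Algorithm 1) goes through the same stages for a circular Γ: it evaluates Eqs. (27) and (28) by the trapezoidal rule, forms the Hankel pencil of Eq. (15), and recovers eigenvectors through Eq. (24). The toy quadratic test problem, the random vectors u and v, the choice of K and the way the radius r is picked are all our own illustrative assumptions.

```python
import numpy as np
from scipy.linalg import eig, hankel

rng = np.random.default_rng(1)
n = 8
# Toy quadratic test problem A(z) = C0 + z*C1 + z^2*C2 (our choice, not from the paper).
C0, C1, C2 = (rng.standard_normal((n, n)) for _ in range(3))
A  = lambda z: C0 + z * C1 + z * z * C2
dA = lambda z: C1 + 2.0 * z * C2                         # A'(z)

def ss_nonlinear(A, dA, r, m, K, n, seed=2):
    """SS-type eigensolver for the m eigenvalues inside |z| = r, following Eqs. (27), (28), (15), (24)."""
    rng = np.random.default_rng(seed)
    u, v = rng.standard_normal(n), rng.standard_normal(n)
    mu = np.zeros(2 * m, dtype=complex)                  # moments mu_0, ..., mu_{2m-1}
    S  = np.zeros((n, m), dtype=complex)                 # columns s_0, ..., s_{m-1}
    for j in range(K):                                   # each sample point is independent (parallelizable)
        xi = r * np.exp(2j * np.pi * j / K)
        f  = np.linalg.solve(A(xi), dA(xi) @ v)          # A(xi)^{-1} A'(xi) v
        w  = np.array([(r ** (p + 1) / K) * np.exp(2j * np.pi * (p + 1) * j / K)
                       for p in range(2 * m)])           # trapezoidal weights
        mu += w * (u @ f)
        S  += np.outer(f, w[:m])
    Hm  = hankel(mu[0:m], mu[m - 1:2 * m - 1])           # H_m   of Eq. (15)
    Hml = hankel(mu[1:m + 1], mu[m:2 * m])               # H_m^< of Eq. (15)
    zs  = eig(Hml, Hm, right=False)                      # Theorem 2.4
    Vm  = np.vander(zs, m, increasing=True).T            # V_m   of Eq. (16)
    W   = np.linalg.solve(Vm, S.T).T                     # [w_1,...,w_m] = [s_0,...,s_{m-1}] V_m^{-T}, Eq. (24)
    return zs, W

# Reference eigenvalues of the quadratic problem via companion linearization.
Z, I = np.zeros((n, n)), np.eye(n)
ref = eig(np.block([[Z, I], [-C0, -C1]]), np.block([[I, Z], [Z, C2]]), right=False)

# Choose the circle so that it cleanly separates the m smallest-magnitude eigenvalues from the rest.
mods = np.sort(np.abs(ref))
m = int(np.argmax(mods[1:7] / mods[:6])) + 1
r = float(np.sqrt(mods[m - 1] * mods[m]))

zs, W = ss_nonlinear(A, dA, r, m, K=256, n=n)
for zi, wi in zip(zs, W.T):
    res = np.linalg.norm(A(zi) @ wi) / (np.linalg.norm(A(zi), np.inf) * np.linalg.norm(wi))
    print(zi, abs(zi) < r, res)                          # eigenvalue, inside Gamma?, residual
```

Note that the loop over the K sample points performs independent linear solves, which is the source of the large-grain parallelism discussed in the first remark below.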

Concerning the use of Algorithm 1, several remarks are in order.

  1. In this algorithm, the computationally dominant part is step 5, where linear simultaneous equations with coefficient matrix A(ξ j ) have to be solved for j = 0, 1, …, K − 1. This operation is repeated for K different values of ξ j . However, as is clear from the algorithm, these K operations can be done completely in parallel. Thus the algorithm has large-grain parallelism.

  2. In step 13, since V m is a Vandermonde matrix, multiplication by V m −T can be done using a specialized solver for Vandermonde systems [7]. This is faster and more accurate than first constructing V m explicitly and then using a general-purpose solver such as Gaussian elimination.

  3. Though this algorithm presupposes that m, the number of eigenvalues in Γ, is known in advance, this is often not the case. When m is unknown, we can choose some integer M, which hopefully satisfies M ≥ m, run the algorithm with m replaced by M, and compute {ν i } i = 1 M by ν i = e i T V M −1 H M V M −T e i . In this case, M − m of the computed values z 1, z 2, …, z M are spurious eigenvalues that do not correspond to nonlinear eigenvalues of A(z) in Γ. These spurious eigenvalues can be distinguished from the true ones because the corresponding | ν i |’s are very small; a small numerical sketch of this filtering is given after this list. This technique was proposed in [17] for the (linear) generalized eigenvalue problem and its detailed analysis is given in [1]. There is also a technique to determine m using the singular value decomposition of H M ; see [8] for details.
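The following self-contained sketch illustrates the filtering described in the third remark on synthetic data (the planted eigenvalues, the noise level mimicking quadrature error and the threshold on | ν i | are all our own choices); a specialized Vandermonde solver could replace the plain solves:

```python
import numpy as np
from scipy.linalg import eig, hankel

# Synthetic data: 3 true eigenvalues inside Gamma, but the extraction is run with M = 5.
z_true  = np.array([0.3 + 0.1j, -0.2 + 0.4j, 0.5 - 0.3j])
nu_true = np.array([1.0, 0.7 - 0.2j, -0.4 + 0.9j])
M = 5
rng = np.random.default_rng(3)
mu = np.array([np.sum(nu_true * z_true ** p) for p in range(2 * M)])
mu += 1e-8 * (rng.standard_normal(2 * M) + 1j * rng.standard_normal(2 * M))  # mimic quadrature error

HM  = hankel(mu[0:M], mu[M - 1:2 * M - 1])
HMl = hankel(mu[1:M + 1], mu[M:2 * M])
zs  = eig(HMl, HM, right=False)                          # M computed values, M - m of them spurious

# nu_i = e_i^T V_M^{-1} H_M V_M^{-T} e_i  (third remark).
VM = np.vander(zs, M, increasing=True).T
D  = np.linalg.solve(VM, np.linalg.solve(VM, HM.T).T)    # V_M^{-1} H_M V_M^{-T}
nu = np.diag(D)
for zi, ni in zip(zs, nu):
    print(zi, abs(ni), "kept" if abs(ni) > 1e-6 else "spurious")
```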

4 Numerical Examples

In this section, we give numerical examples of our Algorithm 1. The experiments were performed on a PC with a Xeon processor and Red Hat Linux, using the GNU C++ compiler. We used LAPACK routines to solve the linear simultaneous equations with coefficient matrices A(ξ j ) and to find the eigenvalues of the matrix pencil H m < − λH m .

Example 1

Our first example is a small symmetric quadratic eigenvalue problem taken from [18] (following [18], the eigenvalue parameter is written as λ in Eq. (29)):

$$\displaystyle{ A(z)=\left [\begin{array}{rrrrr} -10\lambda ^{2}+\lambda +10 & & & & \\ 2\lambda ^{2}+2\lambda +2 & \;-11\lambda ^{2}+\lambda +9 & & \mathrm{sym.}& \\ -\lambda ^{2}+\lambda -1 & 2\lambda ^{2}+2\lambda +3 & -12\lambda ^{2}+10 & & \\ \lambda ^{2}+2\lambda +2 & -2\lambda ^{2}+\lambda -1 & \;-\lambda ^{2}-2\lambda +2 & \;-10\lambda ^{2}+2\lambda +12 & \\ 3\lambda ^{2}+\lambda -2 & -\lambda ^{2}+3\lambda -2 & \lambda ^{2}-2\lambda -1 & 2\lambda ^{2}+3\lambda +1 & \;-11\lambda ^{2}+3\lambda +10 \end{array} \right ]. }$$
(29)

This problem has ten distinct eigenvalues and their values are (to three decimals) [13]:

$$\displaystyle\begin{array}{rcl} \begin{array}{rrrrr} - 1.27&\quad - 1.08&\quad - 1.0048&\quad - 0.779&\quad - 0.512\\ 0.502&0.880&0.937&1.47&1.96.\end{array} & &{}\end{array}$$
(30)

We applied our method with r = 1.0 to find the eigenvalues in the unit disk with center at the origin. There are five eigenvalues of A(z) in this circle. We set M = 8 (see the third remark in the previous section) and K = 128. The computed eigenvalues z i of H M < − λH M are shown in Table 1, along with the residuals of the computed eigenvectors w i and the values of ν i . Here the residual is defined by \(\parallel A(z_{i})\mathbf{w}_{i} \parallel /\left (\parallel A(z_{i}) \parallel _{\infty }\parallel \mathbf{w}_{i} \parallel \right )\).

Table 1 Computed eigenvalues, their residuals and the values of ν i for Example 1

Among the eight computed eigenvalues, z 2 and z 5 through z 8 are inside the circle and have relatively large values of | ν i |. Thus we know that they are the wanted eigenvalues. In fact, they have small residuals of order 10−11. Hence we can say that we have succeeded in finding all five eigenvalues in the circle, together with the corresponding eigenvectors, with high accuracy.

On the other hand, z 4 is located outside the circle, and z 1 and z 3 have small values of | ν i |. This shows that they are either unwanted or spurious eigenvalues. Among these three eigenvalues, z 4 has a large value of | ν i | and its residual is as small as those of the inner eigenvalues. Thus it seems that this is a true outer eigenvalue that has been computed accurately. This occurs because the effect of the poles of u ∗ A(z)−1 A ′(z)v lying just outside the circle remains after numerical integration. The same phenomenon occurs in the algorithm using Tr(A(z)−1 A ′(z)) and is analyzed in detail in [1].

Example 2

Our next example is a medium size problem whose elements have both linear and exponential dependence on z. Specifically,

$$\displaystyle{ A(z) = A - zI_{n} +\epsilon B(z), }$$
(31)

where A is a real nonsymmetric matrix whose elements are drawn from the uniform distribution on [0, 1], B(z) is an anti-diagonal matrix whose antidiagonal elements are all e z , and ε is a parameter that determines the degree of nonlinearity. This test matrix is used in [1]. In the present example, n = 500 and we applied our method with r = 0.7. It is known that there are ten eigenvalues in the circle. We set M = 12 and K = 128. The results are shown in Table 2.
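For concreteness, the test matrix (31) and its derivative might be set up as follows (a sketch based on our reading of Eq. (31); the value of ε used in the experiment is not restated in this excerpt, so the one below is a placeholder, and the random seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n   = 500
eps = 1.0e-2                                  # placeholder value for the nonlinearity parameter
A0  = rng.uniform(0.0, 1.0, size=(n, n))      # real nonsymmetric, entries uniform in [0, 1]
J   = np.fliplr(np.eye(n))                    # ones on the antidiagonal

def A(z):
    """A(z) = A - z I_n + eps B(z), with B(z) anti-diagonal and antidiagonal entries e^z (Eq. (31))."""
    return A0 - z * np.eye(n) + eps * np.exp(z) * J

def dA(z):
    """A'(z) = -I_n + eps B(z), since (e^z)' = e^z."""
    return -np.eye(n) + eps * np.exp(z) * J
```

Such handles for A(z) and A ′(z) can be passed directly to a contour-integral driver like the sketch given in Sect. 3.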

Table 2 Computed eigenvalues, their residuals and the values of ν i for Example 2

Among the twelve computed eigenvalues, the ten eigenvalues except for z 2 and z 11 are inside the circle and have large values of | ν i |. Accordingly, these are judged to be the wanted eigenvalues. This is confirmed by the fact that the corresponding residuals are all of order 10−13. Hence we can conclude that our algorithm again succeeded in finding all the wanted eigenvalues in this example. The computed eigenvalues in the complex plane are shown in Fig. 1.

Fig. 1 Distribution of the eigenvalues in Example 2

5 Conclusion

In this paper, we presented an alternative derivation of the SS-type method for the nonlinear eigenvalue problem. We assumed that all the eigenvalues in the specified region are simple and considered contour integrals along infinitesimally small circles around the eigenvalues. This allowed us to use the analyticity of the eigenvalues and eigenvectors of a parametrized matrix A(z), which is a well-known result in matrix perturbation theory, instead of the canonical forms of A(z) described by the Smith form or the theorem of Keldysh. We believe this will provide an easily accessible approach to the theory of the nonlinear SS-type method.