
1 Introduction

Discretization of the Bethe–Salpeter equation (BSE) [15] leads to an eigenvalue problem \(Hz =\lambda z\), where the coefficient matrix H has the form

$$\displaystyle{ H = \left [\begin{array}{*{10}c} A & B\\ -\overline{B } &-\overline{A} \end{array} \right ]. }$$
(1)

The matrices A and B in (1) satisfy

$$\displaystyle{ A^{{\ast}} = A,\qquad \overline{B}^{{\ast}} = B. }$$
(2)

Here \(A^{{\ast}}\) and \(\overline{A}\) are the conjugate transpose and complex conjugate of \(A\), respectively. In this paper, we call \(H\) a Bethe–Salpeter Hamiltonian matrix, or, in short, a BSE Hamiltonian. In condensed matter physics, the Bethe–Salpeter eigenvalue problem is derived from Dyson’s equation for a two-particle Green’s function used to describe excitation events that involve two particles simultaneously. It is a special case of the J-symmetric eigenvalue problem [3]. This type of eigenvalue problem also appears in linear response (LR) time-dependent density functional theory and in the random phase approximation theory. In these approaches, H is sometimes called a Casida Hamiltonian, a linear response Hamiltonian, or a random phase approximation (RPA) Hamiltonian.

The dimension of A and B can be quite large, because it scales as \(\mathscr{O}(N^{2})\), where N is the number of degrees of freedom required to represent a three-dimensional single particle wavefunction. As a result, efficient numerical algorithms must be developed to solve the Bethe–Salpeter eigenvalue problem. To gain computational efficiency, these methods should take advantage of the special structure of the Hamiltonian in (1).

Let

$$\displaystyle{ C_{n} = \left [\begin{array}{*{10}c} I_{n}& 0 \\ 0 &-I_{n} \end{array} \right ],\qquad \varOmega = \left [\begin{array}{*{10}c} A&B\\ \overline{B } & \overline{A} \end{array} \right ]. }$$
(3)

Then \(H = C_{n}\varOmega\), with both \(C_{n}\) and \(\varOmega\) Hermitian. In most physics problems, the condition

$$\displaystyle{ \varOmega \succ 0 }$$
(4)

holds, that is, the matrix \(\varOmega\) is positive definite. We call \(H\) a definite Bethe–Salpeter Hamiltonian matrix when (4) is satisfied. It has been shown in [16] that, in general, solving a Bethe–Salpeter eigenvalue problem is equivalent to solving a real Hamiltonian eigenvalue problem. However, a definite Bethe–Salpeter eigenvalue problem, which is of most interest in practice, has many additional properties. In this paper we restrict ourselves to this special case, i.e., we assume that the condition (4) holds.
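As a small numerical illustration (not part of the original analysis), the following NumPy sketch builds a random matrix of the form (1) that satisfies (2) and (4), and checks that its spectrum is real and symmetric about zero. The test data, the diagonal shift used to enforce (4), and all variable names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Random Hermitian A and complex symmetric B (test data, not from the paper).
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (A + A.conj().T) / 2
B = (B + B.T) / 2

# Shift A so that the smallest eigenvalue of Omega equals 1, enforcing (4).
Omega = np.block([[A, B], [B.conj(), A.conj()]])
A = A + (1.0 - np.linalg.eigvalsh(Omega)[0]) * np.eye(n)
Omega = np.block([[A, B], [B.conj(), A.conj()]])

C = np.diag(np.r_[np.ones(n), -np.ones(n)])
H = C @ Omega  # equals [[A, B], [-conj(B), -conj(A)]]

lam = np.linalg.eigvals(H)
max_imag = np.abs(lam.imag).max()  # expected to be at rounding level
lam_sorted = np.sort(lam.real)
pair_error = np.abs(lam_sorted + lam_sorted[::-1]).max()  # +/- pairing
print(max_imag, pair_error)
```

Both printed quantities should be at the level of rounding errors for a definite BSE Hamiltonian.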

There are several ways to reformulate the definite Bethe–Salpeter eigenvalue problem. One equivalent formulation of \(Hz =\lambda z\) yields a generalized eigenvalue problem (GEP) \(C_{n}z =\lambda ^{-1}\varOmega z\). As \(\varOmega\) is positive definite, \(C_{n}z =\lambda ^{-1}\varOmega z\) is a Hermitian–definite GEP and hence has real eigenvalues. Another equivalent formulation is \((\varOmega -\lambda C_{n})z = 0\), where \(\varOmega -\lambda C_{n}\) is a definite pencil [7, 20] with a definitizing shift \(\lambda _{0} = 0\). In addition, the eigenvalue problem \(H^{2}z =\lambda ^{2}z\) can be written as a product eigenvalue problem \((C_{n}\varOmega C_{n})\varOmega z =\lambda ^{2}z\) in which both \(C_{n}\varOmega C_{n}\) and \(\varOmega\) are positive definite. These formulations suggest that a definite Bethe–Salpeter eigenvalue problem can be transformed to symmetric eigenvalue problems. As a result, we can analyze various properties of the Bethe–Salpeter eigenvalue problem by combining existing theories of symmetric eigenvalue problems (see, e.g., [14, 20]) with the special structure of H.
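The reformulations above can be checked numerically. The sketch below is an illustration under assumptions: random test data, and a Cholesky factor \(\varOmega = LL^{{\ast}}\) chosen here as one convenient way to reduce the Hermitian–definite GEP to a standard Hermitian problem. It compares the reciprocals of the eigenvalues of \(L^{-1}C_{n}L^{-{\ast}}\) with the eigenvalues of \(H\), and the spectrum of \((C_{n}\varOmega C_{n})\varOmega\) with the squared eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Random definite BSE data (test data; the shift enforces Omega > 0).
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A, B = (A + A.conj().T) / 2, (B + B.T) / 2
Omega = np.block([[A, B], [B.conj(), A.conj()]])
A = A + (1.0 - np.linalg.eigvalsh(Omega)[0]) * np.eye(n)
Omega = np.block([[A, B], [B.conj(), A.conj()]])
C = np.diag(np.r_[np.ones(n), -np.ones(n)])
H = C @ Omega

lam_H = np.sort(np.linalg.eigvals(H).real)

# GEP C z = mu Omega z with mu = 1/lambda, reduced via Omega = L L^*.
L = np.linalg.cholesky(Omega)
L_inv = np.linalg.inv(L)
mu = np.linalg.eigvalsh(L_inv @ C @ L_inv.conj().T)
lam_from_gep = np.sort(1.0 / mu)

# Product form: H^2 = (C Omega C) Omega has eigenvalues lambda^2.
lam_sq = np.sort(np.linalg.eigvals((C @ Omega @ C) @ Omega).real)

print(np.abs(lam_from_gep - lam_H).max())
print(np.abs(lam_sq - np.sort(lam_H**2)).max())
```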

In this paper, we describe several spectral properties of a definite BSE Hamiltonian. These properties include the orthogonality of eigenvectors, Courant–Fischer type min–max characterizations of the eigenvalues, Cauchy type interlacing properties, and Weyl type inequalities for establishing bounds on a structurally perturbed definite BSE Hamiltonian. Most properties take into account the special structure of the BSE Hamiltonian. Although the derivations are relatively straightforward, these properties are important for developing efficient and reliable algorithms for solving the definite Bethe–Salpeter eigenvalue problem.

The rest of this paper is organized as follows. In Sect. 2, we analyze the spectral decomposition of \(H\) and derive two types of orthogonality conditions on the eigenvectors. Variational properties based on these two types of orthogonality conditions are established in Sect. 3. Finally, we provide several eigenvalue perturbation bounds in Sect. 4.

2 Preliminaries

2.1 Spectral Decomposition

As a highly structured matrix, a definite BSE Hamiltonian admits a structured spectral decomposition as stated in the following theorem.

Theorem 1 ([16, Theorem 3])

A definite Bethe–Salpeter Hamiltonian matrix is diagonalizable and has real spectrum. Furthermore, it admits a spectral decomposition of the form

$$\displaystyle{ H = \left [\begin{array}{*{10}c} X & \overline{Y } \\ Y &\overline{X} \end{array} \right ]\left [\begin{array}{*{10}c} \varLambda & 0\\ 0 &-\varLambda \end{array} \right ]\left [\begin{array}{*{10}c} X &-\overline{Y } \\ -Y & \overline{X } \end{array} \right ]^{{\ast}}\,, }$$
(5)

where \(\varLambda =\mathop{ \mathrm{diag}}\nolimits \left \{\lambda _{1},\mathop{\ldots },\lambda _{n}\right \} \succ 0\) , and

$$\displaystyle{ \left [\begin{array}{*{10}c} X &-\overline{Y }\\ -Y & \overline{X } \end{array} \right ]^{{\ast}}\left [\begin{array}{*{10}c} X & \overline{Y } \\ Y &\overline{X} \end{array} \right ] = I_{2n}. }$$
(6)

As the eigenvalues of a definite BSE Hamiltonian appear in pairs \(\pm \lambda\), we denote by \(\lambda _{i}^{+}(U)\) (\(\lambda _{i}^{-}(U)\)) the \(i\) th smallest positive (largest negative) eigenvalue of a matrix \(U\) with real spectrum. When the matrix is omitted, \(\lambda _{i}^{+}\) (or \(\lambda _{i}^{-}\)) represents \(\lambda _{i}^{+}(H)\) (or \(\lambda _{i}^{-}(H)\)), where \(H\) is a definite BSE Hamiltonian. Thus the eigenvalues of \(H\) are labeled as

$$\displaystyle{\lambda _{n}^{-}\leq \cdots \leq \lambda _{ 1}^{-} <\lambda _{ 1}^{+} \leq \cdots \leq \lambda _{ n}^{+}.}$$

To represent the structure of the eigenvectors of \(H\), we introduce the notation

$$\displaystyle{\phi (U,V )\mathop{:=}\left [\begin{array}{*{10}c} U &\overline{V }\\ V & \overline{U } \end{array} \right ],}$$

where \(U\) and \(V\) are matrices of the same size. The structure is preserved under summation, real scaling, complex conjugation, transposition, as well as matrix multiplication

$$\displaystyle{\phi (U_{1},V _{1})\phi (U_{2},V _{2}) =\phi (U_{1}U_{2} + \overline{V }_{1}V _{2},V _{1}U_{2} + \overline{U}_{1}V _{2}).}$$
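The multiplication rule can be confirmed on random data. In the following sketch, the helper name `phi` and the matrix sizes are choices made for this illustration.

```python
import numpy as np

def phi(U, V):
    """Structured block matrix [[U, conj(V)], [V, conj(U)]]."""
    return np.block([[U, V.conj()], [V, U.conj()]])

rng = np.random.default_rng(2)

def crandn(m, p):
    # Random complex test matrix of size m x p.
    return rng.standard_normal((m, p)) + 1j * rng.standard_normal((m, p))

U1, V1 = crandn(3, 4), crandn(3, 4)
U2, V2 = crandn(4, 2), crandn(4, 2)

lhs = phi(U1, V1) @ phi(U2, V2)
rhs = phi(U1 @ U2 + V1.conj() @ V2, V1 @ U2 + U1.conj() @ V2)
product_rule_error = np.abs(lhs - rhs).max()
print(product_rule_error)
```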

Equations (5) and (6) can be rewritten as

$$\displaystyle{C_{n}\varOmega =\phi (X,Y )C_{n}\phi (\varLambda,0)\phi (X,-Y )^{{\ast}},\qquad \phi (X,-Y )^{{\ast}}\phi (X,Y ) = I_{ 2n}.}$$

The converse of Theorem 1 is also true in the following sense: If (5) holds, then \(H\) is a definite BSE Hamiltonian because

$$\displaystyle{H = C_{n}\big[\phi (X,-Y )\phi (\varLambda,0)\phi (X,-Y )^{{\ast}}\big].}$$

As a result, \(H^{-1} = C_{n}\big[\phi (X,-Y )\phi (\varLambda ^{-1},0)\phi (X,-Y )^{{\ast}}\big]\) is also a definite BSE Hamiltonian.

2.2 Orthogonality on the Eigenvectors

From the spectral decomposition of a definite BSE Hamiltonian \(H\), we immediately obtain two types of orthogonality conditions on the eigenvectors of \(H\).

First, the fact \(\varOmega =\phi (X,-Y )\phi (\varLambda,0)\phi (X,-Y )^{{\ast}}\) implies that

$$\displaystyle{\phi (X,Y )^{{\ast}}\varOmega \phi (X,Y ) =\phi (\varLambda,0).}$$

Therefore, the eigenvectors of \(H\) are orthogonal with respect to the \(\varOmega\)-inner product defined by \(\left \langle u,v\right \rangle _{\varOmega }:= v^{{\ast}}\varOmega u\). The eigenvectors can be normalized as

$$\displaystyle{\phi (\tilde{X},\tilde{Y })^{{\ast}}\varOmega \phi (\tilde{X},\tilde{Y }) = I_{ 2n}}$$

through a diagonal scaling \(\phi (\tilde{X},\tilde{Y }) =\phi (X,Y )\phi (\varLambda ^{-1/2},0)\).

Second, it follows directly from (6) that

$$\displaystyle{\phi (X,Y )^{{\ast}}C_{ n}\phi (X,Y ) = C_{n}.}$$

This indicates that the eigenvectors of \(H\) are also orthogonal with respect to the \(C\)-inner product defined by \(\left \langle u,v\right \rangle _{C}:= v^{{\ast}}C_{n}u\), which is an indefinite scalar product [20]. Furthermore, the positive (negative) eigenvalues of \(H\) are also the \(C\)-positive (\(C\)-negative) eigenvalues of the definite pencil \(\varOmega -\lambda C_{n}\).
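Both orthogonality conditions can be observed numerically. The sketch below makes several assumptions for illustration: random test data, and eigenvectors computed from the Hermitian matrix \(L^{{\ast}}C_{n}L\) with \(\varOmega = LL^{{\ast}}\), using the fact that \(L^{{\ast}}C_{n}Lw =\lambda w\) implies \(Hz =\lambda z\) for \(z = C_{n}Lw\). The eigenvectors of the positive eigenvalues are scaled so that \(z^{{\ast}}C_{n}z = 1\).

```python
import numpy as np

def phi(U, V):
    return np.block([[U, V.conj()], [V, U.conj()]])

rng = np.random.default_rng(3)
n = 4

# Random definite BSE data (test data; the shift enforces Omega > 0).
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A, B = (A + A.conj().T) / 2, (B + B.T) / 2
Omega = phi(A, B.conj())
A = A + (1.0 - np.linalg.eigvalsh(Omega)[0]) * np.eye(n)
Omega = phi(A, B.conj())
C = np.diag(np.r_[np.ones(n), -np.ones(n)])
H = C @ Omega

# Eigenpairs of H from the Hermitian matrix L^* C L.
L = np.linalg.cholesky(Omega)
lam, W = np.linalg.eigh(L.conj().T @ C @ L)
lam_pos = lam[n:]                            # the n positive eigenvalues
Z = (C @ L @ W[:, n:]) / np.sqrt(lam_pos)    # columns satisfy z^* C z = 1
X, Y = Z[:n], Z[n:]
Phi = phi(X, Y)

err_C = np.abs(Phi.conj().T @ C @ Phi - C).max()
err_Omega = np.abs(Phi.conj().T @ Omega @ Phi
                   - np.diag(np.r_[lam_pos, lam_pos])).max()
err_eig = np.abs(H @ Phi - Phi @ np.diag(np.r_[lam_pos, -lam_pos])).max()
print(err_C, err_Omega, err_eig)
```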

These two types of orthogonal properties can be used to construct structure preserving projections that play a key role in Krylov subspace based eigensolvers. Suppose that \(\phi (X_{k},Y _{k}) \in \mathbb{C}^{2n\times 2k}\) is orthonormal with respect to the \(\varOmega\)-inner product. Then projection using \(\phi (X_{k},Y _{k})\) yields a \(2k \times 2k\) Hermitian matrix of the form

$$\displaystyle{ H_{k}\mathop{:=}\phi (X_{k},Y _{k})^{{\ast}}\varOmega H\phi (X_{ k},Y _{k}) =\phi (X_{k},Y _{k})^{{\ast}}\varOmega C_{ n}\varOmega \phi (X_{k},Y _{k})\mathop{=:}C_{k}\phi (A_{k},\overline{B}_{k}). }$$
(7)

It can be easily shown that the eigenvalues of the projected Hermitian matrix \(H_{k}\) also occur in pairs \(\pm \theta\), as \(H_{k}\) admits a structured spectral decomposition \(H_{k} =\phi (U_{k},V _{k})C_{k}\phi (\varTheta _{k},0)\phi (U_{k},V _{k})^{{\ast}}\), where \(\phi (U_{k},V _{k})^{{\ast}}\phi (U_{k},V _{k}) = I_{2k}\). Furthermore, the matrix \(\phi (X_{k},Y _{k})\phi (U_{k},V _{k})\) is again orthonormal with respect to the \(\varOmega\)-inner product. Thus we regard (7) as a structure preserving projection. But we remark that \(\varTheta _{k}\) is not always positive definite here as \(H_{k}\) can sometimes be singular.

Similarly, if \(\phi (X_{k},Y _{k}) \in \mathbb{C}^{2n\times 2k}\) is orthonormal with respect to the \(C\)-inner product, that is, \(\phi (X_{k},Y _{k})^{{\ast}}C_{n}\phi (X_{k},Y _{k}) = C_{k}\), then

$$\displaystyle{ H_{k}\mathop{:=}C_{k}\phi (X_{k},Y _{k})^{{\ast}}C_{ n}H\phi (X_{k},Y _{k}) = C_{k}\big[\phi (X_{k},Y _{k})^{{\ast}}\varOmega \phi (X_{ k},Y _{k})\big] }$$
(8)

is a \(2k \times 2k\) definite BSE Hamiltonian. Therefore the projection (8) in \(C\)-inner product can also be regarded as structure preserving.

3 Variational Properties

3.1 Min–Max Principles

The \(i\)th smallest eigenvalue of the Hermitian–definite pencil \(C_{n}-\mu \varOmega\), denoted by \(\mu _{i}\), can be characterized by the Courant–Fischer min–max principle

$$\displaystyle{ \mu _{i} =\min _{\dim (\mathscr{V })=i}\max _{\begin{array}{c}z\in \mathscr{V } \\ z\neq 0 \end{array}}\frac{z^{{\ast}}C_{n}z} {z^{{\ast}}\varOmega z} }$$
(9)
$$\displaystyle{ =\max _{\dim (\mathscr{V })=2n-i+1}\min _{\begin{array}{c}z\in \mathscr{V } \\ z\neq 0 \end{array}}\frac{z^{{\ast}}C_{n}z} {z^{{\ast}}\varOmega z}, }$$
(10)

where \(\mathscr{V }\) is a linear subspace of \(\mathbb{C}^{2n}\). Notice that

$$\displaystyle{\lambda _{i}^{+} = \frac{1} {\mu _{2n-i+1}}> 0,\qquad (1 \leq i \leq n).}$$

Taking the reciprocal of (9) and (10) yields Theorem 2 below. The theorem is also a direct consequence of the Wielandt min–max principle discussed in [10, Theorem 2.2] for the definite pencil \(\varOmega -\lambda C_{n}\).

Theorem 2

Let \(H = C_{n}\varOmega\) be a definite Bethe–Salpeter Hamiltonian matrix as defined in  (1). Then

$$\displaystyle{ \lambda _{i}^{+} =\max _{\dim (\mathscr{V })=2n-i+1}\min _{\begin{array}{c}z\in \mathscr{V } \\ z^{{\ast}}C_{ n}z>0\end{array}} \frac{z^{{\ast}}\varOmega z} {z^{{\ast}}C_{n}z} }$$
(11)
$$\displaystyle{ =\min _{\dim (\mathscr{V })=i}\max _{\begin{array}{c}z\in \mathscr{V } \\ z^{{\ast}}C_{ n}z>0\end{array}} \frac{z^{{\ast}}\varOmega z} {z^{{\ast}}C_{n}z} }$$
(12)

for \(1 \leq i \leq n\) .

An important special case is \(i = 1\), for which we have the following Corollary 1.

Corollary 1 ([19])

The smallest positive eigenvalue of a definite Bethe–Salpeter Hamiltonian matrix \(H = C_{n}\varOmega\) satisfies

$$\displaystyle{ \lambda _{1}^{+} =\min _{ x^{{\ast}}x-y^{{\ast}}y\neq 0}\varrho (x,y), }$$
(13)

where

$$\displaystyle{ \varrho (x,y) = \frac{\left [\begin{array}{*{10}c} x\\ y \end{array} \right ]^{{\ast}}\left [\begin{array}{*{10}c} A&B\\ \overline{B } & \overline{A} \end{array} \right ]\left [\begin{array}{*{10}c} x\\ y \end{array} \right ]} {\vert x^{{\ast}}x - y^{{\ast}}y\vert } }$$
(14)

is the Thouless functional.

Thanks to this result, the computation of \(\lambda _{1}^{+}\) can be converted to minimizing the Thouless functional (14). Thus optimization based eigensolvers, such as the Davidson algorithm [6] and the LOBPCG algorithm [8], can be adopted to compute \(\lambda _{1}^{+}\).
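To illustrate (13) and (14), the following sketch evaluates the Thouless functional at an eigenvector belonging to \(\lambda _{1}^{+}\) and at random trial vectors. The test matrix, the sampling, and the function name `thouless` are assumptions of this example; it only checks the variational characterization and is not an eigensolver.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4

# Random definite BSE data (test data; the shift enforces Omega > 0).
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A, B = (A + A.conj().T) / 2, (B + B.T) / 2
Omega = np.block([[A, B], [B.conj(), A.conj()]])
A = A + (1.0 - np.linalg.eigvalsh(Omega)[0]) * np.eye(n)
Omega = np.block([[A, B], [B.conj(), A.conj()]])
C = np.diag(np.r_[np.ones(n), -np.ones(n)])

def thouless(x, y):
    """Thouless functional (14) for the fixed test matrix Omega."""
    z = np.concatenate([x, y])
    numerator = np.vdot(z, Omega @ z).real
    return numerator / abs(np.vdot(x, x).real - np.vdot(y, y).real)

# Smallest positive eigenvalue and eigenvector via L^* C L, Omega = L L^*.
L = np.linalg.cholesky(Omega)
lam, W = np.linalg.eigh(L.conj().T @ C @ L)
lam1 = lam[n]                 # smallest positive eigenvalue of H
z1 = C @ L @ W[:, n]          # corresponding eigenvector of H = C Omega
rho_at_eigvec = thouless(z1[:n], z1[n:])

# Random trial vectors never fall below lam1.
trials = rng.standard_normal((1000, 2 * n)) + 1j * rng.standard_normal((1000, 2 * n))
rho_min_random = min(thouless(z[:n], z[n:]) for z in trials)
print(lam1, rho_at_eigvec, rho_min_random)
```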

Finally, we remark that, from a computational point of view, the use of (12) requires additional care, because for an arbitrarily chosen subspace \(\mathscr{V } \subset \mathbb{C}^{2n}\) the quantity

$$\displaystyle{\sup _{\begin{array}{c}z\in \mathscr{V } \\ z^{{\ast}}C_{ n}z>0\end{array}} \frac{z^{{\ast}}\varOmega z} {z^{{\ast}}C_{n}z} =\sup _{\begin{array}{c}z\in \mathscr{V } \\ z^{{\ast}}C_{ n}z=1\end{array}}z^{{\ast}}\varOmega z}$$

can easily become \(+\infty\) when \(\mathscr{V }\) contains \(C\)-neutral vectors.

3.2 Trace Minimization Principles

In many applications, only a few of the smallest positive eigenvalues of \(H\) are of practical interest. The computation of these interior eigenvalues requires additional care, since interior eigenvalues are in general much more difficult to compute than extreme ones. Recently, (13) has been extended to a trace minimization principle for real BSE Hamiltonians [1], so that several eigenvalues can be computed simultaneously using a blocked algorithm [2, 9]. In the following, we present two trace minimization principles, corresponding to the two types of structure preserving projections discussed in Sect. 2.2.

Theorem 3

Let \(H = C_{n}\varOmega\) be a definite Bethe–Salpeter Hamiltonian matrix as defined in  (1). Then

$$\displaystyle{ -{\Bigl ( \frac{1} {\lambda _{1}^{+}} + \cdots + \frac{1} {\lambda _{k}^{+}}\Bigr )} =\min _{\phi (X,Y )^{{\ast}}\varOmega \phi (X,Y )=I_{2k}}\mathop{ \mathrm{trace}}\nolimits (X^{{\ast}}X - Y ^{{\ast}}Y ) }$$
(15)

holds for \(1 \leq k \leq n\) .

Proof

We rewrite the eigenvalue problem \(Hz =\lambda z\) as \(C_{n}z =\lambda ^{-1}\varOmega z\). Then by the trace minimization principle for Hermitian–definite GEP, we obtain

$$\displaystyle{-{\Bigl ( \frac{1} {\lambda _{1}^{+}} + \cdots + \frac{1} {\lambda _{k}^{+}}\Bigr )} =\min _{Z^{{\ast}}\varOmega Z=I_{k}}\mathop{ \mathrm{trace}}\nolimits (Z^{{\ast}}C_{ n}Z).}$$

Notice that

$$\displaystyle{\mathscr{S}_{1}\mathop{:=}\left \{\left [\begin{array}{*{10}c} X\\ Y \end{array} \right ] \in \mathbb{C}^{2n\times k}: \phi (X,Y )^{{\ast}}\varOmega \phi (X,Y ) = I_{ 2k}\right \}}$$

is a subset of

$$\displaystyle{\mathscr{S}_{2}\mathop{:=}\left \{Z \in \mathbb{C}^{2n\times k}: Z^{{\ast}}\varOmega Z = I_{ k}\right \}.}$$

We have

$$\displaystyle{\min _{Z\in \mathscr{S}_{2}}\mathop{ \mathrm{trace}}\nolimits (Z^{{\ast}}C_{ n}Z) \leq \min _{Z\in \mathscr{S}_{1}}\mathop{ \mathrm{trace}}\nolimits (Z^{{\ast}}C_{ n}Z).}$$

The equality is attainable, since the minimizer in \(\mathscr{S}_{2}\) can be chosen as the eigenvectors of \(H\), which are also in \(\mathscr{S}_{1}\). As a result, (15) follows directly from the fact that \([X^{{\ast}},Y ^{{\ast}}]C_{n}[X^{{\ast}},Y ^{{\ast}}]^{{\ast}} = X^{{\ast}}X - Y ^{{\ast}}Y\). □

Theorem 4

Let \(H = C_{n}\varOmega\) be a definite Bethe–Salpeter Hamiltonian matrix as defined in  (1). Then

$$\displaystyle{ \lambda _{1}^{+} + \cdots +\lambda _{ k}^{+} =\min _{\phi (X,Y )^{{\ast}}C_{n}\phi (X,Y )=C_{k}}\mathop{ \mathrm{trace}}\nolimits (X^{{\ast}}AX + X^{{\ast}}BY + Y ^{{\ast}}\overline{B}X + Y ^{{\ast}}\overline{A}Y ) }$$
(16)

holds for \(1 \leq k \leq n\) .

Proof

As the eigenvalues of \(H\) are also the eigenvalues of the definite pencil \(\varOmega -\lambda C_{n}\), by the trace minimization property of definite pencils (see, for example, [9, Theorem 2.4]), we obtain

$$\displaystyle{\begin{array}{rl} \lambda _{1}^{+} + \cdots +\lambda _{ k}^{+} & =\min _{Z^{{\ast}}C_{n}Z=I_{k}}\mathop{ \mathrm{trace}}\nolimits (Z^{{\ast}}\varOmega Z) \\ & = \frac{1} {2}\min _{Z^{{\ast}}C_{n}Z=C_{k}}\mathop{ \mathrm{trace}}\nolimits (Z^{{\ast}}\varOmega Z). \end{array} }$$

The rest of the proof is nearly identical to that of Theorem 3. Because

$$\displaystyle{\mathscr{S}_{1}\mathop{:=}\left \{\left [\begin{array}{*{10}c} X\\ Y \end{array} \right ] \in \mathbb{C}^{2n\times k}: \phi (X,Y )^{{\ast}}C_{ n}\phi (X,Y ) = C_{k}\right \}}$$

is a subset of

$$\displaystyle{\mathscr{S}_{2}\mathop{:=}\left \{Z \in \mathbb{C}^{2n\times k}: Z^{{\ast}}C_{ n}Z = I_{k}\right \},}$$

we have

$$\displaystyle{\min _{Z\in \mathscr{S}_{2}}\mathop{ \mathrm{trace}}\nolimits (Z^{{\ast}}\varOmega Z) \leq \min _{ Z\in \mathscr{S}_{1}}\mathop{ \mathrm{trace}}\nolimits (Z^{{\ast}}\varOmega Z).}$$

The equality is attainable by choosing the corresponding eigenvectors of \(H\), which belong to both \(\mathscr{S}_{1}\) and \(\mathscr{S}_{2}\). □

Theorems 3 and 4 can both be used to derive structure preserving optimization based eigensolvers. We shall discuss the computation of eigenvalues in separate publications. We also refer the readers to [10] for more general variational principles.
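A quick numerical check of (16) is sketched below. The feasible point is generated as \(X = Q\cosh (T)\), \(Y = \overline{Q}\sinh (T)\) with \(Q^{{\ast}}Q = I_{k}\) and a real diagonal \(T\); this parametrization is an assumption made here to obtain some \(\phi (X,Y )\) satisfying the constraint, and does not cover the whole feasible set. The objective is evaluated as \(\mathop{\mathrm{trace}}\nolimits ([X;Y ]^{{\ast}}\varOmega [X;Y ])\).

```python
import numpy as np

def phi(U, V):
    return np.block([[U, V.conj()], [V, U.conj()]])

rng = np.random.default_rng(5)
n, k = 6, 2

# Random definite BSE data (test data; the shift enforces Omega > 0).
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A, B = (A + A.conj().T) / 2, (B + B.T) / 2
Omega = phi(A, B.conj())
A = A + (1.0 - np.linalg.eigvalsh(Omega)[0]) * np.eye(n)
Omega = phi(A, B.conj())
C_n = np.diag(np.r_[np.ones(n), -np.ones(n)])
C_k = np.diag(np.r_[np.ones(k), -np.ones(k)])

# Reference values: positive eigenvalues of H via L^* C L, Omega = L L^*.
L = np.linalg.cholesky(Omega)
lam, W = np.linalg.eigh(L.conj().T @ C_n @ L)
lam_sum = lam[n:n + k].sum()

# A feasible point of (16): X = Q cosh(T), Y = conj(Q) sinh(T).
Q, _ = np.linalg.qr(rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k)))
t = rng.uniform(0.0, 1.0, k)
X, Y = Q * np.cosh(t), Q.conj() * np.sinh(t)
constraint_error = np.abs(phi(X, Y).conj().T @ C_n @ phi(X, Y) - C_k).max()
Z = np.vstack([X, Y])
trace_feasible = np.trace(Z.conj().T @ Omega @ Z).real

# The C-normalized eigenvectors of the k smallest positive eigenvalues attain the minimum.
Z_eig = (C_n @ L @ W[:, n:n + k]) / np.sqrt(lam[n:n + k])
trace_eig = np.trace(Z_eig.conj().T @ Omega @ Z_eig).real
print(lam_sum, trace_eig, trace_feasible)
```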

3.3 Interlacing Properties

We have already seen that the two types of orthogonality conditions on the eigenvectors of \(H\) can both be used to construct structure preserving projections for eigenvalue computations. In this subsection we point out some differences in the locations of the Ritz values.

When the \(\varOmega\)-inner product is used for projection, we have the following Cauchy type interlacing property.

Theorem 5

Let \(H = C_{n}\varOmega\) be a definite Bethe–Salpeter Hamiltonian matrix as defined in  (1). Suppose that \(\phi (X,Y )^{{\ast}}\varOmega \phi (X,Y ) = I_{2k}\) , where \(1 \leq k \leq n\) . Then the eigenvalues of \(\phi (X,Y )^{{\ast}}\varOmega H\phi (X,Y )\) are real and appear in pairs \(\pm \theta\) . Moreover

$$\displaystyle{ \lambda _{i}^{+}\big(\phi (X,Y )^{{\ast}}\varOmega H\phi (X,Y )\big) \leq \lambda _{ n+i-k}^{+}(H),\qquad (1 \leq i \leq k). }$$
(17)

Proof

The first half of the theorem follows from the discussions in Sect. 2.2. We only show the interlacing property. Notice that \(U\mathop{:=}\varOmega ^{1/2}\phi (X,Y )\) has orthonormal columns in the standard inner product, that is, \(U^{{\ast}}U = I_{2k}\). By the Cauchy interlacing theorem, we have

$$\displaystyle\begin{array}{rcl} \lambda _{i}^{+}\big(\phi (X,Y )^{{\ast}}\varOmega H\phi (X,Y )\big)& =& \lambda _{ i}^{+}\big(U^{{\ast}}\varOmega ^{1/2}C_{ n}\varOmega ^{1/2}U\big) {}\\ & \leq & \lambda _{n+i-k}^{+}\big(\varOmega ^{1/2}C_{ n}\varOmega ^{1/2}\big) {}\\ & =& \lambda _{n+i-k}^{+}\big(H\big). {}\\ \end{array}$$

In contrast to the standard Cauchy interlacing theorem, there is no nontrivial lower bound on the Ritz value \(\lambda _{i}^{+}\big(\phi (X,Y )^{{\ast}}\varOmega H\phi (X,Y )\big)\) here. In fact, the projected matrix \(\phi (X,Y )^{{\ast}}\varOmega H\phi (X,Y )\) can even be zero. For instance,

$$\displaystyle{A = I_{n},\quad B = 0,\quad X = \frac{1} {2}\left [\begin{array}{*{10}c} I_{k} \\ I_{k} \\ 0 \end{array} \right ],\quad Y = \frac{1} {2}\left [\begin{array}{*{10}c} I_{k} \\ -I_{k} \\ 0 \end{array} \right ]}$$

is an example for such an extreme case (assuming \(2k \leq n\)).
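The extreme case can be reproduced directly. In the sketch below, the sizes \(n = 5\), \(k = 2\) are arbitrary choices, and the scalar prefactor of \(X\) and \(Y\) is computed from the normalization \(\phi (X,Y )^{{\ast}}\varOmega \phi (X,Y ) = I_{2k}\) rather than hard-coded.

```python
import numpy as np

n, k = 5, 2  # any sizes with 2k <= n

A, B = np.eye(n), np.zeros((n, n))
Omega = np.block([[A, B], [B.conj(), A.conj()]])
C = np.diag(np.r_[np.ones(n), -np.ones(n)])
H = C @ Omega

I_k, pad = np.eye(k), np.zeros((n - 2 * k, k))
X = np.vstack([I_k, I_k, pad])
Y = np.vstack([I_k, -I_k, pad])

# Choose the common scalar so that X^* X + Y^* Y = I_k (Omega = I here).
c = 1.0 / np.sqrt((X.T @ X + Y.T @ Y)[0, 0])
X, Y = c * X, c * Y

Phi = np.block([[X, Y.conj()], [Y, X.conj()]])
gram = Phi.conj().T @ Omega @ Phi           # Omega-orthonormality check
projected = Phi.conj().T @ Omega @ H @ Phi  # projected matrix of Theorem 5
print(c, np.abs(gram - np.eye(2 * k)).max(), np.abs(projected).max())
```

All Ritz values of this projection are zero, although every eigenvalue of \(H\) has modulus one.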

For projections based on the \(C\)-inner product, we establish Theorem 6 below. Similar to Theorem 5, Ritz values are only bounded in one direction. However, in this case, it is possible to provide a meaningful (though complicated) upper bound for the Ritz value. We refer the readers to [1, Theorem 4.1] for the case of real BSE. Further investigation in this direction is beyond the scope of this paper.

Theorem 6

Let \(H = C_{n}\varOmega\) be a definite Bethe–Salpeter Hamiltonian matrix as defined in  (1). Suppose that \(\phi (X,Y )^{{\ast}}C_{n}\phi (X,Y ) = C_{k}\) , where \(1 \leq k \leq n\) . Then the eigenvalues of \(C_{k}\phi (X,Y )^{{\ast}}C_{n}H\phi (X,Y )\) appear in pairs \(\pm \theta\) . Moreover

$$\displaystyle{ \lambda _{i}^{+}\big(C_{ k}\phi (X,Y )^{{\ast}}C_{ n}H\phi (X,Y )\big) \geq \lambda _{i}^{+}(H),\qquad (1 \leq i \leq k). }$$
(18)

Proof

Notice that the eigenvalues of \(C_{k}\big(\phi (X,Y )^{{\ast}}\varOmega \phi (X,Y )\big)\) can also be regarded as the eigenvalues of the definite pencil \(\phi (X,Y )^{{\ast}}(\varOmega -\lambda C_{n})\phi (X,Y )\). Then the conclusion follows from the Cauchy interlacing property of definite pencils [9, Theorem 2.3]. □

From a computational perspective, (18) provides more useful information than (17), because the Ritz value \(\lambda _{i}^{+}\big(C_{k}\phi (X,Y )^{{\ast}}C_{n}H\phi (X,Y )\big)\) is bounded in terms of the corresponding eigenvalue \(\lambda _{i}^{+}(H)\) to be approximated. The inequality (17) gives an upper bound of the Ritz value. But we have less control over the location of \(\lambda _{i}^{+}\big(\phi (X,Y )^{{\ast}}\varOmega H\phi (X,Y )\big)\).
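The lower bound (18) can be observed on random data. The sketch below makes two assumptions for illustration: the \(C\)-orthonormal basis is generated as \(X = Q\cosh (T)\), \(Y = \overline{Q}\sinh (T)\) with \(Q^{{\ast}}Q = I_{k}\), and the helper `positive_eigs` computes the positive eigenvalues of \(C\varOmega\) from a Cholesky factor of \(\varOmega\).

```python
import numpy as np

def phi(U, V):
    return np.block([[U, V.conj()], [V, U.conj()]])

def positive_eigs(Omega):
    """Positive eigenvalues (ascending) of C Omega, via L^* C L with Omega = L L^*."""
    m = Omega.shape[0] // 2
    C = np.diag(np.r_[np.ones(m), -np.ones(m)])
    L = np.linalg.cholesky(Omega)
    return np.linalg.eigvalsh(L.conj().T @ C @ L)[m:]

rng = np.random.default_rng(6)
n, k = 6, 3

# Random definite BSE data (test data; the shift enforces Omega > 0).
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A, B = (A + A.conj().T) / 2, (B + B.T) / 2
Omega = phi(A, B.conj())
A = A + (1.0 - np.linalg.eigvalsh(Omega)[0]) * np.eye(n)
Omega = phi(A, B.conj())

# C-orthonormal structured basis: phi(X, Y)^* C_n phi(X, Y) = C_k.
Q, _ = np.linalg.qr(rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k)))
t = rng.uniform(0.0, 1.0, k)
Phi = phi(Q * np.cosh(t), Q.conj() * np.sinh(t))

# Projection (8): H_k = C_k Omega_k with Omega_k = Phi^* Omega Phi.
Omega_k = Phi.conj().T @ Omega @ Phi
Omega_k = (Omega_k + Omega_k.conj().T) / 2   # remove rounding asymmetry

# Omega_k keeps the phi block structure: swapping the blocks and conjugating is the identity.
J = np.kron(np.array([[0.0, 1.0], [1.0, 0.0]]), np.eye(k))
structure_error = np.abs(J @ Omega_k.conj() @ J - Omega_k).max()

ritz = positive_eigs(Omega_k)        # positive Ritz values
lam = positive_eigs(Omega)           # positive eigenvalues of H
print(ritz, lam[:k], structure_error)
```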

Finally, we remark that the trace minimization principle (16) can also be derived by the interlacing property (18).

4 Eigenvalue Perturbation Bounds

4.1 Weyl Type Inequalities

In the perturbation theory of symmetric eigenvalue problems, Weyl’s inequality implies that the eigenvalues of a Hermitian matrix are well conditioned when a Hermitian perturbation is introduced. In the following we establish similar results for definite Bethe–Salpeter eigenvalue problems.

Theorem 7

Let \(H\) and \(H +\varDelta H\) be definite Bethe–Salpeter Hamiltonian matrices. Then

$$\displaystyle{\left \vert \frac{\lambda _{i}^{+}(H +\varDelta H) -\lambda _{i}^{+}(H)} {\lambda _{i}^{+}(H)} \right \vert \leq \kappa _{2}(H)\frac{\|\varDelta H\|_{2}} {\|H\|_{2}},\qquad (1 \leq i \leq n),}$$

where \(\kappa _{2}(H) =\| H\|_{2}\|H^{-1}\|_{2}\) .

Proof

Let \(\varDelta \varOmega = C_{n}\varDelta H\). Then \(\varOmega +\varDelta \varOmega\) is positive definite. We rewrite \(Hz =\lambda z\) as the GEP \(C_{n}z =\lambda ^{-1}\varOmega z\). It follows from the Weyl inequality on Hermitian–definite GEP [12, Theorem 2.1] that

$$\displaystyle{\left \vert \frac{1} {\lambda _{i}^{+}(H)} - \frac{1} {\lambda _{i}^{+}(H +\varDelta H)}\right \vert \leq \frac{\|\varOmega ^{-1}\|_{2}\|\varDelta \varOmega \|_{2}} {\lambda _{i}^{+}(H +\varDelta H)}.}$$

By simple arithmetic manipulations, we arrive at

$$\displaystyle{\left \vert \frac{\lambda _{i}^{+}(H +\varDelta H) -\lambda _{i}^{+}(H)} {\lambda _{i}^{+}(H)} \right \vert \leq \kappa _{2}(\varOmega )\frac{\|\varDelta \varOmega \|_{2}} {\|\varOmega \|_{2}} =\kappa _{2}(H)\frac{\|\varDelta H\|_{2}} {\|H\|_{2}}.}$$
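As an illustration of Theorem 7, the following sketch perturbs a random definite BSE Hamiltonian by a structured \(\varDelta \varOmega\) of norm \(0.1\) and compares the relative eigenvalue changes with the bound. The test data, the perturbation size, and the helper names are assumptions of this example.

```python
import numpy as np

def phi(U, V):
    return np.block([[U, V.conj()], [V, U.conj()]])

def positive_eigs(Omega):
    """Positive eigenvalues (ascending) of C Omega, via L^* C L with Omega = L L^*."""
    m = Omega.shape[0] // 2
    C = np.diag(np.r_[np.ones(m), -np.ones(m)])
    L = np.linalg.cholesky(Omega)
    return np.linalg.eigvalsh(L.conj().T @ C @ L)[m:]

def random_structured(n, rng):
    """Random Hermitian matrix phi(A, conj(B)) with A Hermitian, B symmetric."""
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return phi((A + A.conj().T) / 2, ((B + B.T) / 2).conj())

rng = np.random.default_rng(7)
n = 5
C = np.diag(np.r_[np.ones(n), -np.ones(n)])

# Definite Omega with smallest eigenvalue 1, and a structured perturbation
# of spectral norm 0.1, so that Omega + dOmega stays positive definite.
Omega = random_structured(n, rng)
Omega = Omega + (1.0 - np.linalg.eigvalsh(Omega)[0]) * np.eye(2 * n)
dOmega = random_structured(n, rng)
dOmega = 0.1 * dOmega / np.linalg.norm(dOmega, 2)

H, dH = C @ Omega, C @ dOmega
lam = positive_eigs(Omega)
lam_perturbed = positive_eigs(Omega + dOmega)

relative_change = np.abs(lam_perturbed - lam) / lam
bound = np.linalg.cond(H, 2) * np.linalg.norm(dH, 2) / np.linalg.norm(H, 2)
print(relative_change.max(), bound)
```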

Theorem 7 characterizes the sensitivity of the eigenvalues of \(H\) when a structured perturbation is introduced—the relative condition number of \(\lambda _{i}^{+}(H)\) is bounded by \(\kappa _{2}(H)\). When the perturbation is also a definite BSE Hamiltonian, the eigenvalues are perturbed monotonically. We have the following result.

Theorem 8

Let \(H\) , \(\varDelta H \in \mathbb{C}^{2n\times 2n}\) be definite Bethe–Salpeter Hamiltonian matrices. Then

$$\displaystyle{\lambda _{i}^{+}(H +\varDelta H) \geq \lambda _{ i}^{+}(H) +\lambda _{ 1}^{+}(\varDelta H),\qquad (1 \leq i \leq n).}$$

Proof

Let \(\varDelta \varOmega = C_{n}\varDelta H\). Then by Theorem 2 we have

$$\displaystyle\begin{array}{rcl} \lambda _{i}^{+}(H +\varDelta H)& =& \max _{\dim (\mathscr{V })=2n-i+1}\min _{\begin{array}{c}z\in \mathscr{V } \\ z^{{\ast}}C_{ n}z>0\end{array}}\left ( \frac{z^{{\ast}}\varOmega z} {z^{{\ast}}C_{n}z} + \frac{z^{{\ast}}\varDelta \varOmega z} {z^{{\ast}}C_{n}z}\right ) {}\\ & \geq & \max _{\dim (\mathscr{V })=2n-i+1}\min _{\begin{array}{c}z\in \mathscr{V } \\ z^{{\ast}}C_{ n}z>0\end{array}}\left ( \frac{z^{{\ast}}\varOmega z} {z^{{\ast}}C_{n}z} +\lambda _{ 1}^{+}(\varDelta H)\right ) {}\\ & =& \lambda _{i}^{+}(H) +\lambda _{ 1}^{+}(\varDelta H). {}\\ \end{array}$$
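The monotonicity in Theorem 8 can likewise be checked on random data. In this sketch both \(\varOmega\) and \(\varDelta \varOmega\) are random structured matrices shifted to be positive definite; the data and helper names are assumptions of the example.

```python
import numpy as np

def positive_eigs(Omega):
    """Positive eigenvalues (ascending) of C Omega, via L^* C L with Omega = L L^*."""
    m = Omega.shape[0] // 2
    C = np.diag(np.r_[np.ones(m), -np.ones(m)])
    L = np.linalg.cholesky(Omega)
    return np.linalg.eigvalsh(L.conj().T @ C @ L)[m:]

def random_definite_omega(n, rng, smallest):
    """Random [[A, B], [conj(B), conj(A)]] shifted to have the given smallest eigenvalue."""
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    A, B = (A + A.conj().T) / 2, (B + B.T) / 2
    Omega = np.block([[A, B], [B.conj(), A.conj()]])
    return Omega + (smallest - np.linalg.eigvalsh(Omega)[0]) * np.eye(2 * n)

rng = np.random.default_rng(8)
n = 5
Omega = random_definite_omega(n, rng, 1.0)    # H = C Omega
dOmega = random_definite_omega(n, rng, 0.5)   # Delta H = C dOmega, also definite

lam = positive_eigs(Omega)
lam_sum = positive_eigs(Omega + dOmega)       # eigenvalues of H + Delta H
lam1_delta = positive_eigs(dOmega)[0]         # smallest positive eigenvalue of Delta H

slack = lam_sum - (lam + lam1_delta)          # nonnegative by Theorem 8
print(slack)
```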

A special perturbation in the context of Bethe–Salpeter eigenvalue problems is to drop the off-diagonal blocks in \(H\). Such a perturbation is known as the Tamm–Dancoff approximation (TDA) [5, 18]. Similar to the monotonic perturbation behavior above, it has been shown in [16] that TDA overestimates all positive eigenvalues of \(H\). In the following, we present a simpler proof of this property than the one given in [16].

Theorem 9 ([16, Theorem 4])

Let \(H\) be a definite Bethe–Salpeter Hamiltonian matrix as defined in  (1). Then

$$\displaystyle{\lambda _{i}^{+}(H) \leq \lambda _{ i}^{+}(A),\qquad (1 \leq i \leq n).}$$

Proof

Notice that \(H^{2} = (C_{n}\varOmega C_{n})\varOmega\) with both \(C_{n}\varOmega C_{n}\) and \(\varOmega\) positive definite. By the arithmetic–geometric inequality on positive definite matrices [4, Sect. 3.4], we obtain

$$\displaystyle{\lambda _{i}^{+}(H) = \sqrt{\lambda _{ 2i}^{+}\big((C_{n}\varOmega C_{n})\varOmega \big)} \leq \lambda _{2i}^{+}\bigg(\frac{C_{n}\varOmega C_{n}+\varOmega } {2} \bigg) =\lambda _{ 2i}^{+}\bigg(\left [\begin{array}{*{10}c} A& 0 \\ 0 &\overline{A} \end{array} \right ]\bigg) =\lambda _{ i}^{+}(A).\quad }$$

Combining Theorems 7 and 9, we obtain the following corollary. It characterizes to what extent existing results in the literature obtained from TDA are reliable.

Corollary 2

If \(H\) is a definite Bethe–Salpeter Hamiltonian matrix as defined in  (1), then

$$\displaystyle{0 \leq \frac{\lambda _{i}^{+}(A) -\lambda _{i}^{+}(H)} {\lambda _{i}^{+}(H)} \leq \kappa _{2}(H) \frac{\|B\|_{2}} {\|H\|_{2}},\qquad (1 \leq i \leq n).}$$
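The two-sided estimate of Corollary 2 can be illustrated as follows. The sketch compares the positive eigenvalues of a random definite BSE Hamiltonian with the eigenvalues of its diagonal block \(A\), which are the TDA values; the test data and variable names are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 5

# Random definite BSE data (test data; the shift enforces Omega > 0).
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A, B = (A + A.conj().T) / 2, (B + B.T) / 2
Omega = np.block([[A, B], [B.conj(), A.conj()]])
A = A + (1.0 - np.linalg.eigvalsh(Omega)[0]) * np.eye(n)
Omega = np.block([[A, B], [B.conj(), A.conj()]])
C = np.diag(np.r_[np.ones(n), -np.ones(n)])
H = C @ Omega

# Positive eigenvalues of H via L^* C L with Omega = L L^*.
L = np.linalg.cholesky(Omega)
lam_H = np.linalg.eigvalsh(L.conj().T @ C @ L)[n:]

# TDA drops B: the positive eigenvalues are those of A (A is positive definite).
lam_tda = np.linalg.eigvalsh(A)

overestimate = (lam_tda - lam_H) / lam_H
bound = np.linalg.cond(H, 2) * np.linalg.norm(B, 2) / np.linalg.norm(H, 2)
print(overestimate, bound)
```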

4.2 Residual Bounds

Another type of perturbation bounds on eigenvalues measures the accuracy of approximate eigenvalues in terms of the residual norm. These bounds are of interest in eigenvalue computations. In the following we discuss several residual bounds for the definite Bethe–Salpeter eigenvalue problem.

Theorem 10

Let \(H = C_{n}\varOmega\) be a definite Bethe–Salpeter Hamiltonian matrix. Suppose that \(X\) , \(Y \in \mathbb{C}^{n\times k}\) satisfy

$$\displaystyle{\phi (X,Y )^{{\ast}}\varOmega \phi (X,Y ) = I_{ 2k},\qquad \phi (X,Y )^{{\ast}}C_{ n}\phi (X,Y ) = \left [\begin{array}{*{10}c} \varTheta & 0\\ 0 &-\varTheta \end{array} \right ]^{-1},}$$

for some \(k\) between \(1\) and \(n\) , where \(\varTheta =\mathop{ \mathrm{diag}}\nolimits \left \{\theta _{1},\mathop{\ldots },\theta _{k}\right \} \succ 0\) . Then there exists a BSE Hamiltonian \(\varDelta H = C_{n}\varDelta \varOmega = C_{n}\phi (\varDelta A,\overline{\varDelta B})\) such that

$$\displaystyle{ (H+\varDelta H)\phi (X,Y ) =\phi (X,Y )\left [\begin{array}{*{10}c} \varTheta & 0\\ 0 &-\varTheta \end{array} \right ] }$$
(19)

and

$$\displaystyle{ \|\varDelta H\|_{2} \leq 2\|H\|_{2}^{1/2}\|R\|_{ 2}, }$$
(20)

where

$$\displaystyle{R = H\phi (X,Y )-\phi (X,Y )\left [\begin{array}{*{10}c} \varTheta & 0\\ 0 &-\varTheta \end{array} \right ].}$$

Proof

It follows from the definition of \(R\) that

$$\displaystyle{\phi (X,Y )^{{\ast}}C_{ n}R = I_{2k}-\phi (X,Y )^{{\ast}}C_{ n}\phi (X,Y )\left [\begin{array}{*{10}c} \varTheta & 0\\ 0 &-\varTheta \end{array} \right ] = 0.}$$

Let

$$\displaystyle{\varDelta \varOmega = -\big(C_{n}R\phi (X,Y )^{{\ast}}\varOmega +\varOmega \phi (X,Y )R^{{\ast}}C_{ n}\big).}$$

Then \(\varDelta \varOmega\) is Hermitian. Since \(\varTheta\) is real, we have

$$\displaystyle{R = C_{n}\phi (A,\overline{B})\phi (X,Y ) -\phi (X,Y )C_{n}\phi (\varTheta,0) = C_{n}\phi (AX + BY - X\varTheta,\overline{B}X + \overline{A}Y + Y \varTheta ),}$$

indicating that

$$\displaystyle{\varOmega \phi (X,Y )R^{{\ast}}C_{ n} =\phi (A,\overline{B})\phi (X,Y )\phi (AX + BY - X\varTheta,\overline{B}X + \overline{A}Y + Y \varTheta )^{{\ast}}}$$

has the block structure \(\phi (\cdot,\cdot )\). Thus \(\varDelta H\mathop{:=}C_{n}\varDelta \varOmega\) is a BSE Hamiltonian. It can be easily verified that (19) is satisfied. Finally,

$$\displaystyle{\|\varDelta H\|_{2} =\|\varDelta \varOmega \| _{2} \leq 2\|\varOmega ^{1/2}\|_{ 2}\|\varOmega ^{1/2}\phi (X,Y )\|_{ 2}\|R^{{\ast}}\|_{ 2} = 2\|H\|_{2}^{1/2}\|R\|_{ 2}.\quad }$$
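The construction in this proof can be exercised numerically. The sketch below rests on several assumptions made for illustration: a random structured trial basis is \(\varOmega\)-orthonormalized with the inverse square root of its Gram matrix, and then rotated by the eigenvectors of \(\phi ^{{\ast}}C_{n}\phi\) so that both hypotheses of Theorem 10 hold. With \(R\) as defined in the theorem, the code takes \(\varDelta \varOmega = -(C_{n}R\phi ^{{\ast}}\varOmega +\varOmega \phi R^{{\ast}}C_{n})\), which is the sign for which \(\varDelta H\phi = -R\) and hence (19) holds.

```python
import numpy as np

def phi(U, V):
    return np.block([[U, V.conj()], [V, U.conj()]])

def crandn(rng, m, p):
    return rng.standard_normal((m, p)) + 1j * rng.standard_normal((m, p))

rng = np.random.default_rng(10)
n, k = 6, 2

# Random definite BSE data (test data; the shift enforces Omega > 0).
A, B = crandn(rng, n, n), crandn(rng, n, n)
A, B = (A + A.conj().T) / 2, (B + B.T) / 2
Omega = phi(A, B.conj())
Omega = Omega + (1.0 - np.linalg.eigvalsh(Omega)[0]) * np.eye(2 * n)
C = np.diag(np.r_[np.ones(n), -np.ones(n)])
H = C @ Omega

# Step 1: structured trial basis, Omega-orthonormalized by G^{-1/2}.
P0 = phi(crandn(rng, n, k), 0.1 * crandn(rng, n, k))
g, Vg = np.linalg.eigh(P0.conj().T @ Omega @ P0)
P1 = P0 @ ((Vg / np.sqrt(g)) @ Vg.conj().T)

# Step 2: rotate so that Phi^* C Phi = diag(Theta, -Theta)^{-1}.
d, W = np.linalg.eigh(P1.conj().T @ C @ P1)
theta = 1.0 / d[k:]                      # uses the k positive eigenvalues
Z = P1 @ W[:, k:]
Phi = phi(Z[:n], Z[n:])
D = np.diag(np.r_[theta, -theta])
hyp1 = np.abs(Phi.conj().T @ Omega @ Phi - np.eye(2 * k)).max()
hyp2 = np.abs(Phi.conj().T @ C @ Phi - np.linalg.inv(D)).max()

# Residual and structured backward error.
R = H @ Phi - Phi @ D
dOmega = -(C @ R @ Phi.conj().T @ Omega + Omega @ Phi @ R.conj().T @ C)
dH = C @ dOmega

err19 = np.abs((H + dH) @ Phi - Phi @ D).max()
J = np.kron(np.array([[0.0, 1.0], [1.0, 0.0]]), np.eye(n))
structure_error = np.abs(J @ dOmega.conj() @ J - dOmega).max()
lhs20 = np.linalg.norm(dH, 2)
rhs20 = 2.0 * np.sqrt(np.linalg.norm(H, 2)) * np.linalg.norm(R, 2)
print(hyp1, hyp2, err19, structure_error, lhs20, rhs20)
```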

Roughly speaking, (20) implies that for definite BSE, Rayleigh–Ritz based algorithms that produce small residual norms are backward stable. When \(\kappa _{2}(H)\) is of modest size, backward stability implies forward stability according to Theorem 7. The following theorem provides a slightly better estimate compared to simply combining Theorems 7 and 10.

Theorem 11

Under the same assumptions as in Theorem  10 , there exist \(k\) positive eigenvalues of \(H\) , \(\lambda _{j_{1}} \leq \cdots \leq \lambda _{j_{k}}\) , such that

$$\displaystyle{\vert \theta _{i} -\lambda _{j_{i}}\vert \leq \| H\|_{2}^{1/2}\|R\|_{ 2},\qquad (1 \leq i \leq k).}$$

Proof

Notice that \(U\mathop{:=}\varOmega ^{1/2}\phi (X,Y )\) has orthonormal columns (in the standard inner product), and

$$\displaystyle{\varOmega ^{1/2}R =\varOmega ^{1/2}C_{ n}\varOmega ^{1/2}U-U\left [\begin{array}{*{10}c} \varTheta & 0 \\ 0&-\varTheta \end{array} \right ].}$$

By the residual bound for standard Hermitian eigenvalue problems (see [14, Theorem 11.5.1] or [17, Sect. IV.4.4]), we obtain that there are \(2k\) eigenvalues of \(\varOmega ^{1/2}C_{n}\varOmega ^{1/2}\), \(\tilde{\lambda }_{-j_{k}} \leq \cdots \leq \tilde{\lambda }_{-j_{1}} \leq \tilde{\lambda }_{j_{1}} \leq \cdots \leq \tilde{\lambda }_{j_{k}}\), such that

$$\displaystyle{\max \left \{\vert \theta _{i} +\tilde{\lambda } _{-j_{i}}\vert,\vert \theta _{i} -\tilde{\lambda }_{j_{i}}\vert \right \}\leq \|\varOmega \|_{2}^{1/2}\|R\|_{ 2} =\| H\|_{2}^{1/2}\|R\|_{ 2},\qquad (1 \leq i \leq k).}$$

Note that at least one of the inequalities \(\tilde{\lambda }_{j_{1}}> 0\) and \(\tilde{\lambda }_{-j_{1}} <0\) holds. As the eigenvalues of \(\varOmega ^{1/2}C_{n}\varOmega ^{1/2}\) are identical to those of \(H\), the conclusion follows immediately by choosing

$$\displaystyle{\lambda _{j_{i}} = \left \{\begin{array}{@{}l@{\quad }l@{}} \tilde{\lambda }_{j_{i}}, \quad &\text{if }\tilde{\lambda }_{j_{1}}> 0, \\ -\tilde{\lambda }_{-j_{i}},\quad &\text{otherwise,} \end{array} \right.\qquad (1 \leq i \leq k).}$$

Finally, we end this section with a Temple–Kato type quadratic residual bound, stated in Theorem 12. The quadratic residual bound explains the fact that the accuracy of a computed eigenvalue is in general much higher than that of the corresponding eigenvector. Such behavior has been reported for the real Bethe–Salpeter eigenvalue problem in [9]. It is certainly possible to extend Theorem 12 to the subspace setting using the techniques in [11, 13].

Theorem 12

Let \((\theta,\hat{z})\) be an approximate eigenpair of a definite BSE Hamiltonian \(H = C_{n}\varOmega\) satisfying

$$\displaystyle{ \frac{\hat{z}^{{\ast}}\varOmega \hat{z}} {\hat{z}^{{\ast}}C_{n}\hat{z}} =\theta.}$$

Then the eigenvalue of \(H\) closest to \(\theta\) , denoted by \(\lambda\) , satisfies

$$\displaystyle{\big\vert \theta ^{-1} -\lambda ^{-1}\big\vert \leq \frac{\|H^{-1}\|_{ 2}\|H\hat{z} -\theta \hat{ z}\|_{2}^{2}} {\mathrm{gap}(\theta )\hat{z}^{{\ast}}\varOmega \hat{z}},}$$

where

$$\displaystyle{\mathrm{gap}(\theta )\mathop{:=}\min _{\lambda _{i}(H)\neq \theta }\big\vert \theta ^{-1} -\lambda _{ i}(H)^{-1}\big\vert.}$$

Proof

The theorem is a direct consequence of [14, Theorem 11.7.1] on the equivalent Hermitian eigenvalue problem \(\big(\varOmega ^{-1/2}C_{n}\varOmega ^{-1/2}\big)\big(\varOmega ^{1/2}z\big) =\lambda \big (\varOmega ^{1/2}z\big)\). □

5 Summary

The Bethe–Salpeter eigenvalue problem is an important class of structured eigenvalue problems arising from several physics and chemistry applications. The most important case, the definite Bethe–Salpeter eigenvalue problem, has a number of interesting properties. We identified two types of orthogonality conditions on the eigenvectors, and discussed several properties of the corresponding structure preserving projections. Although most of our theoretical results can be derived by extending similar results for general symmetric eigenvalue problems to this class of problems, they play an important role in developing and analyzing structure preserving algorithms for solving this type of problem. Numerical algorithms will be discussed in a separate publication.