1 Introduction

We consider a generalized eigenvalue problem \(A\boldsymbol{x} =\lambda B\boldsymbol{x}\), where \(A,B \in \mathbb{C}^{n\times n}\), \(\lambda \in \mathbb{C}\) is an eigenvalue, and \(\boldsymbol{x} \in \mathbb{C}^{n}\setminus \{\boldsymbol{0}\}\) is an eigenvector. Eigenvalue problems arise in many scientific applications, for example in quantum transport models, where the self-energy is required to describe the charge injection and extraction effect of the contact. Computing the self-energy exactly requires all of the eigenpairs; for practical applications, however, it is enough to compute only some of them. In [9], it is necessary to obtain the eigenvalues \(\lambda = \mathrm{e}^{\mathrm{i}k\varDelta }\) near the circumference of the circle \(\vert \lambda \vert = 1\) on the complex plane, together with the corresponding eigenvectors, where Δ is the lattice period length and k is the wave number.

The shift-invert Arnoldi method is a widely used method for obtaining interior eigenpairs[12]. This method computes eigenvalues close to a shift point and the corresponding eigenvectors. However, it is hard to obtain the eigenpairs near the circumference of a circle with the shift-invert Arnoldi method. The Sakurai-Sugiura projection method (SSPM)[7, 8, 13] has been proposed for computing the eigenvalues in a given region, and the corresponding eigenvectors, with contour integration. The SSPM finds the eigenvalues in a domain surrounded by an integration path by approximating the contour integral with numerical quadrature, which requires solving linear systems of equations at the quadrature points. An extension of the SSPM that calculates eigenvalues in an arc-shaped region, by dividing the circumference of a circle into several arcs and computing the eigenpairs for each arc, was proposed in [10]. This extension allows effective parallel computing of the eigenpairs in each arc. However, the quadrature points are set on the arc, and when an eigenvalue is located at a quadrature point, division by zero arises in the calculations.

In this paper, we present an alternative extension of the SSPM by setting two arcs, which avoids a decrease in the computational accuracy of the eigenpairs resulting from locating the quadrature points near the eigenvalues, and allows parallel computation.

We test the proposed method in SLEPc (the Scalable Library for Eigenvalue Problem Computations) [5].

This paper is organized as follows. In Sect. 2, we review the SSPM and an extension of the method for arcs. In Sect. 3, we propose an extension of the SSPM for the partial ring region and implement it in SLEPc. In Sect. 4, we discuss the results of the numerical experiments, and our conclusions are presented in Sect. 5.

2 An Extension of the SSPM for Arcs

In this section, we introduce the SSPM for generalized eigenvalue problems[13] and show an extension of the SSPM for the ring region on the complex plane[10]. The extension divides the ring region into several arcs, and calculates the eigenpairs near each arc. In the extension, we construct a subspace that contains the eigenvectors associated with the eigenvalues near the arc.

First, we introduce the SSPM. Let Γ be a positively oriented closed Jordan curve on the complex plane. The SSPM approximates eigenvalues inside of the closed Jordan curve Γ and corresponding eigenvectors, using a two-step procedure. The first step is to construct the subspace with a filtering for eigenvectors, and the second step is to extract the eigenpairs inside the closed Jordan curve.

We now introduce the procedure for constructing the subspace. Suppose that m eigenvalues are located inside Γ, let V be an n × L matrix whose column vectors are linearly independent, and let \(S = [S_{0},S_{1},\ldots,S_{M-1}]\) be an n × LM matrix whose blocks \(S_{k}\) are n × L matrices determined through contour integration,

$$\displaystyle{ S_{k} = \frac{1} {2\pi \mathrm{i}}\oint _{\varGamma }z^{k}\left (zB - A\right )^{-1}BV \mathrm{d}z,\:\:\mathrm{for}\:\:k = 0,1,\ldots,M - 1, }$$
(1)

where \(zB - A\) is a regular matrix pencil for \(z \in \varGamma\), and M is chosen such that LM > m.

We assume that the matrix pencil \(\mu B - A\) is diagonalizable for any μ; that is, regular matrices \(X = (\boldsymbol{x}_{1},\boldsymbol{x}_{2},\ldots,\boldsymbol{x}_{n})\) and \(Y = (\,\boldsymbol{y}_{1},\boldsymbol{y}_{2},\ldots,\boldsymbol{y}_{n})\) that satisfy \(Y ^{\mathrm{H}}(\mu B - A)X =\mu I-\varLambda\) exist, where Λ is the diagonal matrix with elements \(\lambda _{1},\lambda _{2},\ldots,\lambda _{n}\) on the diagonal. From the residue theorem,

$$\displaystyle\begin{array}{rcl} S_{k} =\sum _{ i=1}^{n}f_{ k}(\lambda _{i})\boldsymbol{x}_{i}\boldsymbol{y}_{i}^{\mathrm{H}}BV,& & {}\\ \end{array}$$

where \(\boldsymbol{y}_{i}\) and \(\boldsymbol{x}_{i}\) are the left and right eigenvectors of \(\mu B - A\), respectively, and \(f_{k}(\lambda _{i})\) is a filter function that satisfies

$$\displaystyle\begin{array}{rcl} f_{k}(x) = \frac{1} {2\pi \mathrm{i}}\oint _{\varGamma } \frac{z^{k}} {z - x}\mathrm{d}z = \left \{\begin{array}{l} x^{k},\quad x \in G, \\ 0,\quad \mathrm{otherwise}, \end{array} \right.& & {}\\ \end{array}$$

where G is the interior region of Γ. Eigenvalues outside Γ are filtered out by the filter function \(f_{k}(\lambda _{i})\). Thus the components of S in the direction of eigenvectors with eigenvalues outside Γ vanish.
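The step from (1) to the filtered sum can be made explicit. Under the diagonalizability assumption above, \(Y ^{\mathrm{H}}(zB - A)X = zI-\varLambda\) gives an expansion of the resolvent in terms of the eigenvectors; substituting it into (1) and exchanging the finite sum with the integral yields

$$\displaystyle\begin{array}{rcl} \left (zB - A\right )^{-1}& =& X\left (zI-\varLambda \right )^{-1}Y ^{\mathrm{H}} =\sum _{ i=1}^{n} \frac{\boldsymbol{x}_{i}\boldsymbol{y}_{i}^{\mathrm{H}}} {z -\lambda _{i}}, \\ S_{k}& =& \sum _{i=1}^{n}\left ( \frac{1} {2\pi \mathrm{i}}\oint _{\varGamma } \frac{z^{k}} {z -\lambda _{i}}\mathrm{d}z\right )\boldsymbol{x}_{i}\boldsymbol{y}_{i}^{\mathrm{H}}BV =\sum _{ i=1}^{n}f_{ k}(\lambda _{i})\boldsymbol{x}_{i}\boldsymbol{y}_{i}^{\mathrm{H}}BV.\end{array}$$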

In the case that the Jordan curve Γ is a circle with a center γ and a radius ρ, an N-point trapezoidal rule can be applied to compute (1) numerically, that is

$$\displaystyle\begin{array}{rcl} S_{k} \approx \hat{ S}_{k} =\sum _{ j=0}^{N-1}w_{ j}\zeta _{j}^{k}X_{ j},& &{}\end{array}$$
(2)

where

$$\displaystyle\begin{array}{rcl} z_{j} =\gamma +\rho \mathrm{e}^{\frac{2\pi \mathrm{i}} {N}(\,j+\frac{1} {2} )},\;\;w_{j} = \frac{z_{j}-\gamma } {\rho N},\;\;\zeta _{j} = \frac{z_{j}-\gamma } {\rho },\quad j = 0,1,\ldots,N - 1,& & {}\\ \end{array}$$

are the quadrature points, the corresponding weights and the normalized quadrature points, respectively, and \(X_{j},\;j = 0,1,\ldots,N - 1\) are the solutions of linear systems with multiple right-hand side vectors,

$$\displaystyle\begin{array}{rcl} \left (z_{j}B - A\right )X_{j} = BV,\quad j = 0,1,\ldots,N - 1.& &{}\end{array}$$
(3)

The filter function \(f_{k}(x)\) is approximated by the N-point trapezoidal rule as

$$\displaystyle\begin{array}{rcl} f_{k}(x) \approx \hat{ f}(x)x^{k} =\sum _{ j=0}^{N-1} \frac{w_{j}} {z_{j} - x}x^{k},\quad 0 \leq k \leq N - 1,& &{}\end{array}$$
(4)

where \(\hat{f}(x)\) is a rational function. The rational function \(\hat{f}(x)\) and the eigenvector components in \(\hat{S}_{k}\) depend on \(z_{j}\), \(w_{j}\), \(\zeta _{j}\) and N. In this case, the rational function \(\hat{f}(x)\) decays outside the circle[7, 11]. Thus the components of \(\hat{S}_{k}\) in the direction of eigenvectors with eigenvalues outside Γ are small.
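The decay of \(\hat{f}(x)\) outside the circle can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names `circle_rule` and `circle_filter` are hypothetical, and the filter is evaluated in the normalized variable \(\xi = (x-\gamma )/\rho\) with weights \(w_{j} =\zeta _{j}/N\), which is an assumption made here so that the value is independent of ρ.

```python
import cmath


def circle_rule(n):
    """Normalized trapezoidal nodes zeta_j on the unit circle and weights zeta_j / n."""
    zetas = [cmath.exp(2j * cmath.pi * (idx + 0.5) / n) for idx in range(n)]
    weights = [zeta / n for zeta in zetas]
    return zetas, weights


def circle_filter(x, gamma, rho, n):
    """Evaluate the rational filter at x, using xi = (x - gamma) / rho (assumed scaling)."""
    xi = (x - gamma) / rho
    zetas, weights = circle_rule(n)
    return sum(w / (zeta - xi) for zeta, w in zip(zetas, weights))


# One point inside and one point outside the unit circle, N = 32.
inside = circle_filter(0.3 + 0.2j, 0.0, 1.0, 32)
outside = circle_filter(2.0, 0.0, 1.0, 32)
print(abs(inside), abs(outside))
```

For these shifted trapezoidal nodes the sum has the closed form \(1/(1 +\xi ^{N})\), which is close to 1 for \(\vert \xi \vert <1\) and decays like \(\vert \xi \vert ^{-N}\) for \(\vert \xi \vert> 1\).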

Next, we introduce the procedure for the approximation of eigenpairs using the Rayleigh-Ritz approach for the SSPM[7]. Let the singular value decomposition (SVD) of \(\hat{S} = [\hat{S}_{0},\hat{S}_{1},\ldots,\hat{S}_{M-1}] \in \mathbb{C}^{n\times (LM)}\) be \(\hat{S} = Q\varSigma W^{\mathrm{H}}\), where \(Q = [\boldsymbol{q}_{1},\boldsymbol{q}_{2},\ldots,\boldsymbol{q}_{LM}] \in \mathbb{C}^{n\times LM},\;\varSigma = \mathrm{diag}(\sigma _{1},\sigma _{2},\ldots,\sigma _{LM}),\;\sigma _{1} \geq \sigma _{2} \geq \ldots \geq \sigma _{LM}\) and \(W \in \mathbb{C}^{LM\times LM}\). We omit singular values less than δ and construct \(\hat{Q} = [\boldsymbol{q}_{1},\boldsymbol{q}_{2},\ldots,\boldsymbol{q}_{K}] \in \mathbb{C}^{n\times K}\), where K > m and \(\sigma _{K} \geq \delta \geq \sigma _{K+1}\). We solve the small eigenvalue problem

$$\displaystyle\begin{array}{rcl} (\alpha _{i}\hat{Q}^{\mathrm{H}}B\hat{Q} -\hat{ Q}^{\mathrm{H}}A\hat{Q})\boldsymbol{u}_{ i} = \boldsymbol{0},\quad \hat{Q}^{\mathrm{H}}A\hat{Q},\hat{Q}^{\mathrm{H}}B\hat{Q} \in \mathbb{C}^{K\times K},& & {}\\ \end{array}$$

where \(\alpha _{i}\) is an eigenvalue of the matrix pencil \(\alpha _{i}\hat{Q}^{\mathrm{H}}B\hat{Q} -\hat{ Q}^{\mathrm{H}}A\hat{Q}\) and \(\boldsymbol{u}_{i}\) is the eigenvector corresponding to \(\alpha _{i}\). Then the eigenvalues of the matrix pencil \(A -\lambda B\) are approximated by \(\lambda _{i} \approx \alpha _{i}\), and the corresponding approximate eigenvectors are given by \(\boldsymbol{x}_{i} \approx \hat{ Q}\boldsymbol{u}_{i}\) for \(i = 1,2,\ldots,K\). Some approximated eigenvalues may appear outside Γ. We keep the eigenvalues \(\lambda _{i}\) inside Γ for \(i = 1,2,\ldots,\tilde{m}\), where \(\tilde{m}\) is the number of approximated eigenvalues inside Γ, and discard the rest.

We can compute the eigenpairs in a specific circle by using the SSPM. When many eigenvalues exist in the circle, we have to set a large value for LM, and thus the computational cost for computing the eigenpairs is high. In some applications, the eigenpairs near the circumference of the circle are also required. When computing these eigenpairs, the computational cost can be reduced with an extension of the SSPM for the arc as follows[10]. In the extension, the procedure for constructing the subspace is different, but the procedure for extracting eigenpairs remains the same.

Let \(\mathbb{L}\) be the arc with center γ, radius ρ, starting angle \(\theta _{\mathrm{a}}\) and ending angle \(\theta _{\mathrm{b}}\),

$$\displaystyle\begin{array}{rcl} \mathbb{L}: z =\gamma +\rho \mathrm{e}^{\mathrm{i}\theta },\quad \theta _{\mathrm{ a}} \leq \theta \leq \theta _{\mathrm{b}},& & {}\\ \end{array}$$

where \(0 \leq \theta _{\mathrm{a}} <\theta _{\mathrm{b}} \leq 2\pi\). Quadrature points \(z_{j}\), normalized quadrature points \(\zeta _{j}\) and corresponding weights \(w_{j}\) are given by

$$\displaystyle\begin{array}{rcl} z_{j} =\gamma +\rho \mathrm{e}^{\mathrm{i}\theta _{j} },\;\zeta _{j} =\cos \left (\frac{2j + 1} {2N} \pi \right ),\;w_{j} = \frac{T_{N-1}(\zeta _{j})} {N},\quad j = 0,1,\ldots,N - 1,& &{}\end{array}$$
(5)

where \(T_{k}(x)\) is the Chebyshev polynomial of the first kind of degree k, and \(\theta _{j} =\theta _{\mathrm{a}} + (\theta _{\mathrm{b}} -\theta _{\mathrm{a}})\frac{\zeta _{j}+1} {2}\). The \(\zeta _{j}\) are N Chebyshev points in the interval [−1, 1], and the \(z_{j}\) are points on \(\mathbb{L}\). The matrices \(\hat{S}_{k}\) in (2) are computed with \(z_{j}\), \(\zeta _{j}\), \(w_{j}\) in (5). According to [10], the rational function \(\hat{f}(x)\) in (4) with \(z_{j}\), \(\zeta _{j}\), \(w_{j}\) in (5) decays outside the arc \(\mathbb{L}\). Thus the eigenvectors associated with the eigenvalues outside \(\mathbb{L}\) are filtered out. Then we extract the eigenpairs using the Rayleigh-Ritz approach, and we obtain the eigenpairs near \(\mathbb{L}\).
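The quadrature rule (5) is straightforward to generate. The sketch below is an illustration under stated assumptions rather than the implementation of [10]: the function name `arc_rule` is hypothetical, and the weights use the identity \(T_{N-1}(\cos \varphi ) =\cos ((N - 1)\varphi )\) instead of a polynomial recurrence.

```python
import cmath
import math


def arc_rule(gamma, rho, theta_a, theta_b, n):
    """Return (z_j, zeta_j, w_j) of rule (5): Chebyshev points mapped onto an arc."""
    points, zetas, weights = [], [], []
    for idx in range(n):
        phi = (2 * idx + 1) * math.pi / (2 * n)
        zeta = math.cos(phi)  # Chebyshev point in [-1, 1]
        theta = theta_a + (theta_b - theta_a) * (zeta + 1) / 2
        points.append(gamma + rho * cmath.exp(1j * theta))
        zetas.append(zeta)
        weights.append(math.cos((n - 1) * phi) / n)  # T_{N-1}(zeta_j) / N
    return points, zetas, weights


# Upper half of the unit circle with N = 8 points.
points, zetas, weights = arc_rule(0.0, 1.0, 0.0, math.pi, 8)
for point, zeta, weight in zip(points, zetas, weights):
    print(point, zeta, weight)
```

By the discrete orthogonality of Chebyshev polynomials at these points, the weights satisfy \(\sum _{j}w_{j}\zeta _{j}^{k} = 0\) for \(k = 0,1,\ldots,N - 2\), which is the property that makes the low-order moments of the rule vanish.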

For computing the eigenpairs in a ring region, we divide the circumference of the ring into D arcs \(\mathbb{L}_{d},\;d = 1,2,\ldots,D\) with \(\theta _{\mathrm{a}}^{(d)},\;\theta _{\mathrm{b}}^{(d)},\;d = 1,2,\ldots,D\). Then we compute the eigenpairs on each arc.

3 Extension of the SSPM for the Partial Ring Region

In the extension of the SSPM for arcs, the quadrature points lie on the arc. Division by zero arises when an eigenvalue is located at a quadrature point. Therefore, to avoid division by zero, the quadrature points should be located sufficiently far from the arc. However, the filter function, which is approximated by the rational function, depends on the quadrature points and decays outside of the arc. Thus the components of eigenvectors in \(\hat{S}\) decrease as the eigenvalues move farther from the quadrature points, and when the eigenvalues are located far from the quadrature points, the accuracy of the approximated eigenpairs is reduced. We propose an alternative extension of the SSPM that avoids the decrease in the computational accuracy of the eigenpairs caused by quadrature points lying near the eigenvalues. The proposed method uses alternative formulations for \(z_{j}\), \(\zeta _{j}\), \(w_{j}\), and derives a filter function that decays outside of the partial ring region.

Let \(\mathbb{L}^{\pm }\) be two arcs such that

$$\displaystyle\begin{array}{rcl} \mathbb{L}^{\pm }: z =\gamma +\rho ^{\pm }\mathrm{e}^{\mathrm{i}\theta },\quad \theta _{\mathrm{ a}} \leq \theta \leq \theta _{\mathrm{b}},& & {}\\ \end{array}$$

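where γ is the center, \(\rho ^{+}\) and \(\rho ^{-}\) are the outer and inner radii of the arcs, which satisfy \(\rho ^{+}>\rho ^{-}\), and \(\theta _{\mathrm{a}}\), \(\theta _{\mathrm{b}}\) are the starting and ending angles, which satisfy \(0 \leq \theta _{\mathrm{a}} <\theta _{\mathrm{b}} \leq 2\pi\).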

Quadrature points \(z_{j},\;j = 0,1,\ldots,N - 1\) are Chebyshev points on \(\mathbb{L}^{+}\) and \(\mathbb{L}^{-}\),

$$\displaystyle\begin{array}{rcl} z_{j} = \left \{\begin{array}{l} z_{j}^{+},\;(0 \leq j <N^{+}) \\ z_{j-N^{+}}^{-},\;(N^{+} \leq j <N^{+} + N^{-}) \end{array} \right.,& & {}\\ \end{array}$$

where \(z_{j}^{+}\) are \(N^{+}\) Chebyshev points on \(\mathbb{L}^{+}\) and \(z_{j}^{-}\) are \(N^{-}\) Chebyshev points on \(\mathbb{L}^{-}\), both defined by (5), and \(N = N^{+} + N^{-}\). In the SSPM, the quadrature weights \(\{w_{0},w_{1},\ldots,w_{N-1}\}\) are set to satisfy the following equation for computing the eigenpairs inside Γ [14],

$$\displaystyle\begin{array}{rcl} \sum _{j=0}^{N-1}w_{ j}\zeta _{j}^{k} = \left \{\begin{array}{l} 1,\;(k = -1) \\ 0,\;(k = 0,1,\ldots,N - 2) \end{array} \right..& &{}\end{array}$$
(6)

In the proposed method, we compute the eigenpairs between two arcs. The quadrature weights \(w_{j}\) are defined by the barycentric weights [2],

$$\displaystyle\begin{array}{rcl} w_{j} = (-1)^{N+1} \frac{\prod _{k=0}^{N-1}\zeta _{ k}} {\prod _{k=0,k\neq j}^{N-1}(\zeta _{j} -\zeta _{k})},& & {}\\ \end{array}$$

where

$$\displaystyle\begin{array}{rcl} \zeta _{j} = \frac{2(z_{j}-\gamma )} {\rho ^{+} +\rho ^{-}},\quad j = 0,1,\ldots,N - 1.& & {}\\ \end{array}$$

The barycentric weights are used as the quadrature weights because they satisfy (6). Then, we construct the matrix \(\hat{S}_{k}\) by (2). The procedure after constructing \(\hat{S}_{k}\) is the same as for the SSPM in Sect. 2.
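Condition (6) can be verified numerically for a small two-arc rule. The following is a minimal sketch, assuming Chebyshev angles as in (5) on both arcs and the normalization \(\zeta _{j} = 2(z_{j}-\gamma )/(\rho ^{+} +\rho ^{-})\); the function names are hypothetical and the point counts are chosen small only to keep rounding errors negligible.

```python
import cmath
import math


def chebyshev_arc_points(gamma, rho, theta_a, theta_b, n):
    """Chebyshev points mapped onto the arc gamma + rho * exp(i * theta)."""
    points = []
    for idx in range(n):
        t = math.cos((2 * idx + 1) * math.pi / (2 * n))
        theta = theta_a + (theta_b - theta_a) * (t + 1) / 2
        points.append(gamma + rho * cmath.exp(1j * theta))
    return points


def partial_ring_rule(gamma, rho_plus, rho_minus, theta_a, theta_b, n_plus, n_minus):
    """Return (z_j, zeta_j, w_j) with barycentric weights on the outer and inner arcs."""
    z = (chebyshev_arc_points(gamma, rho_plus, theta_a, theta_b, n_plus)
         + chebyshev_arc_points(gamma, rho_minus, theta_a, theta_b, n_minus))
    zeta = [2 * (zj - gamma) / (rho_plus + rho_minus) for zj in z]
    n = len(zeta)
    prod_all = 1
    for value in zeta:
        prod_all *= value
    w = []
    for row, zeta_row in enumerate(zeta):
        denom = 1
        for col, zeta_col in enumerate(zeta):
            if col != row:
                denom *= zeta_row - zeta_col
        w.append((-1) ** (n + 1) * prod_all / denom)
    return z, zeta, w


def moment(zeta, w, k):
    """Discrete moment sum_j w_j * zeta_j**k appearing in condition (6)."""
    return sum(wj * zj ** k for zj, wj in zip(zeta, w))


z, zeta, w = partial_ring_rule(0.0, 1.01, 0.99, 0.0, math.pi, 6, 4)
print(abs(moment(zeta, w, -1) - 1))
print(max(abs(moment(zeta, w, k)) for k in range(len(zeta) - 1)))
```

Both printed quantities should be at the level of rounding error. The reason is that \(w_{j}/\zeta _{j}\) equals the Lagrange basis polynomial of node \(\zeta _{j}\) evaluated at 0, so the sum in (6) reproduces the value at 0 of the monomial \(\zeta ^{k+1}\).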

Figures 1 and 2 show schematics of the quadrature points in the SSPM for an arc and a partial ring region, and Figs. 3 and 4 show rational functions \(\hat{f}(x)\) defined by (4) in each extension for N = 32 quadrature points. In Fig. 3, we set \(\gamma = 0\), \(\rho = 1\), \(\theta _{\mathrm{a}} = 0\) and \(\theta _{\mathrm{b}} =\pi\). In Fig. 4, we set \(\gamma = 0\), \(\rho ^{+} = 1.01\), \(\rho ^{-} = 0.99\), \(\theta _{\mathrm{a}} = 0\), \(\theta _{\mathrm{b}} =\pi\), \(N^{+} = 24\) and \(N^{-} = 8\). In the extension for arcs as well as for the partial ring region, the rational function \(\hat{f}(x)\) decays outside of the two arcs. Thus the components of \(\hat{S}_{k}\) in the direction of eigenvectors associated with eigenvalues outside Γ are small.

Fig. 1 Quadrature points for the SSPM for the arc

Fig. 2 Quadrature points for the proposed method

Fig. 3 Filter function for the SSPM for the arc

Fig. 4 Filter function for the proposed method

Figures 6 and 7 show the rational functions \(\hat{f}(x)\) on the two lines in Fig. 5. We compute the absolute value of the rational function for the SSPM for the arc and for the proposed method with \(N^{+} = 24,\;N^{-} = 8\) and with \(N^{\pm } = 16\). The parameters γ, ρ, \(\rho ^{\pm }\), \(\theta _{\mathrm{a}}\) and \(\theta _{\mathrm{b}}\) are the same as in Figs. 3 and 4. In Figs. 6 and 7, the horizontal axis indicates the angle on line1 and the position on the imaginary axis, respectively. In Fig. 6, the gap between the maximum and minimum values of the rational function is smaller for the proposed method than for the SSPM for the arc, and the rational function for the proposed method with \(N^{+} = 24,\;N^{-} = 8\) is similar to that with \(N^{\pm } = 16\). Thus the components of \(\hat{S}_{k}\) in the direction of eigenvectors associated with eigenvalues near the arc are more uniform for the proposed method than for the SSPM for the arc. In Fig. 7, the rational function for the proposed method decays outside of the circle more rapidly than that for the SSPM for the arc, but decays inside of the circle more slowly. Again, the rational function for the proposed method with \(N^{+} = 24,\;N^{-} = 8\) is similar to that with \(N^{\pm } = 16\). Thus the components of \(\hat{S}_{k}\) in the direction of eigenvectors associated with eigenvalues outside the circle are smaller for the proposed method than for the SSPM for the arc, and the components in the direction of eigenvectors associated with eigenvalues inside the circle are larger.

Fig. 5 Two lines for filter function

Fig. 6 Absolute value of the filter function on the line1

Fig. 7 Absolute value of the filter function on the line2

In the proposed method, we compute the eigenpairs in a partial ring region. To cover a full ring region, we divide it into D partial ring regions by \(\theta _{\mathrm{a}}^{(d)},\;\theta _{\mathrm{b}}^{(d)},\;d = 1,2,\ldots,D\). Then we compute the eigenpairs using the proposed method in each partial ring region.
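As an illustration of the division into partial ring regions, the sketch below generates starting and ending angles for D regions. It assumes an equal-angle split of the full ring, which is only one possible choice; the function name `split_ring` is hypothetical, and the angles actually used in the experiments are those listed in Table 1.

```python
import math


def split_ring(num_regions):
    """Return [(theta_a, theta_b), ...] for an equal-angle split of [0, 2*pi]."""
    width = 2 * math.pi / num_regions
    return [(d * width, (d + 1) * width) for d in range(num_regions)]


# D = 4 partial ring regions, each of which can be processed independently.
for theta_a, theta_b in split_ring(4):
    print(theta_a, theta_b)
```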

The SSPM has potential for hierarchical parallelism: (I) each region can be computed independently, (II) linear systems at each quadrature point can be solved independently, (III) multiple right-hand sides of the linear systems can be solved simultaneously. Therefore we can assign different tasks for solving the linear systems to each parallel processor. Parallel implementations of the SSPM have been developed, such as Bloss[6], z-Pares[4] and CISS (Contour Integral Spectral Slicing). In CISS, the parallelism of (I) and (II) is implemented in SLEPc, along with an extension for the arc.

4 Numerical Example

In this section, we present numerical examples of the proposed method. We implement the proposed method in SLEPc, and compare the performance of the proposed method with that of the extension for the arc.

Experiments are performed on the supercomputer cluster of many-core architecture COMA (PACS-IX) at the Center for Computational Sciences, the University of Tsukuba. COMA has a total of 393 nodes providing 1.001 PFLOPS at optimum performance. Each node has dual CPU (Intel Xeon E5-2670v2), dual MIC (Intel Xeon Phi 7110P), and 64 GB memory, and the CPU has 10 cores and the MIC has 61 cores. The linear systems are solved by a direct method, in particular PCLU in PETSc library[1].

First, we compare the accuracy of the two extensions. Four pairs of test matrices A, B were used in the numerical experiments: (I) 1000 × 1000 diagonal matrix A and identity matrix B. The diagonal elements of A are 950 complex values whose real and imaginary parts are random in [0, 0.4], and 50 complex values whose real and imaginary parts are random in [0.5, 1.1] (SAMPLE). (II) 501 × 501 diagonal matrix A and identity matrix B. The diagonal elements of A are 500 complex values that are spaced equiangularly on the circle with center 0 and radius 1 on the complex plane, and 1 complex value that is close to a quadrature point of the SSPM for the arc (\(z_{1} + 10^{-10}\)) (SAMPLE2). (III) 5000 × 5000 matrix A taken from the Matrix Market[3] and identity matrix B (OLM5000). (IV) 11,520 × 11,520 matrices A, B derived from computation of the self-energy of a silicon nanowire with a 6 × 6 nm² cross section[9] (SI11520). In both extensions, we divide the ring region into D = 4 partial ring regions, and we set \(\rho ^{\pm } =\rho \pm \beta\). Parameters for the ring region, starting angles and ending angles are given in Table 1. The remaining parameters for both extensions were \(N^{+} = 24\), \(N^{-} = 8\), N = 32, L = 32, M = 8 and \(\delta = 10^{-12}\).

Table 1 Parameters for the proposed method and the extension for the arc

Table 2 shows the accuracy of the two extensions. max(res) and min(res) are the maximum and minimum values of the residuals \(\|A\boldsymbol{x}_{i} -\lambda _{i}B\boldsymbol{x}_{i}\|_{2}/(\|A\|_{\mathrm{F}} + \vert \lambda _{i}\vert \|B\|_{\mathrm{F}})\), respectively. The proposed method shows similar accuracy to the SSPM for the arc in cases (I), (III) and (IV). In case (II), the maximum value of the residuals of the proposed method is smaller than that of the SSPM for the arc. Because one eigenvalue is very close to a quadrature point of the SSPM for the arc, the component of \(\hat{S}_{k}\) in the direction of the eigenvector associated with that eigenvalue becomes large, and the other components become relatively small. These results indicate that the accuracy deteriorates when eigenvalues are close to the quadrature points, and that the proposed method improves the accuracy by locating the quadrature points sufficiently far from the arc.
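The residual measure used in Table 2 can be written out for small dense matrices as follows. This is a minimal sketch with hypothetical function names, using plain Python lists instead of a linear algebra library; since the formula does not divide by \(\|\boldsymbol{x}_{i}\|_{2}\), the sketch assumes that the approximate eigenvector has unit norm.

```python
import math


def matvec(mat, vec):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * v for a, v in zip(row, vec)) for row in mat]


def frobenius_norm(mat):
    """Frobenius norm of a list-of-rows matrix with real or complex entries."""
    return math.sqrt(sum(abs(a) ** 2 for row in mat for a in row))


def relative_residual(a_mat, b_mat, lam, x):
    """||A x - lam B x||_2 / (||A||_F + |lam| ||B||_F), x assumed to have unit norm."""
    ax = matvec(a_mat, x)
    bx = matvec(b_mat, x)
    numerator = math.sqrt(sum(abs(p - lam * q) ** 2 for p, q in zip(ax, bx)))
    return numerator / (frobenius_norm(a_mat) + abs(lam) * frobenius_norm(b_mat))


# Toy 2 x 2 example (illustrative only): an exact and a perturbed eigenvalue.
a_mat = [[2.0, 0.0], [0.0, 3.0]]
b_mat = [[1.0, 0.0], [0.0, 1.0]]
print(relative_residual(a_mat, b_mat, 2.0, [1.0, 0.0]))
print(relative_residual(a_mat, b_mat, 2.5, [1.0, 0.0]))
```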

Table 2 Accuracy of the two extensions

Next, we evaluate the parallel performance of the proposed method. We investigate how the computational time varies as the number of processes increases. We use test matrix SI11520[9] in this experiment. We implement the proposed method in SLEPc. Here, the number of processes is set to 1, 2, 4, 8, 16, 32, 64 and 128, and other parameter values are the same as in the above experiment.

Table 3 shows the computational time for the proposed method with different numbers of processes. Total is the computational time for the proposed method, and Ideal is the ideal time (Total of 1 process)∕(# processes). Figure 8 shows the details of the computational time. In Table 3 and Fig. 8, we can see that the proposed method has a good scaling for solving linear systems in (3), but is saturated for constructing \(\hat{S}\) in (2) and computing the SVD due to the increase in communication time for each process. Thus the computational time for the proposed method is close to the ideal time for up to 8 processes but increases for 16 or more processes.
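The Ideal column is simple arithmetic and can be reproduced as sketched below. The function names are hypothetical, the timing value is a placeholder and not a measurement from Table 3, and the parallel efficiency is a standard derived quantity added here for illustration; it is not reported in the paper.

```python
def ideal_times(total_one_process, process_counts):
    """Ideal time for each process count: (Total of 1 process) / (# processes)."""
    return {p: total_one_process / p for p in process_counts}


def parallel_efficiency(total_one_process, total_p, p):
    """Ratio of ideal time to measured time on p processes (1.0 means ideal scaling)."""
    return total_one_process / (p * total_p)


# Placeholder single-process time in seconds (NOT taken from Table 3).
placeholder_t1 = 100.0
print(ideal_times(placeholder_t1, [1, 2, 4, 8, 16, 32, 64, 128]))
```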

Fig. 8 Details of the computational time for the proposed method with different numbers of processes (SI11520[9])

Table 3 Computational time for the proposed method for different numbers of processes

5 Conclusion

In the present paper, we presented an extension of the SSPM for a partial ring region. The filter function for the extension is similar to that for the existing SSPM for the arc. We implemented the SSPM for a partial ring region using SLEPc, and demonstrated that the method can be parallelized. The performance of the method was examined on a supercomputer cluster with many-core architecture. The results showed that the accuracy of the proposed method was similar to that of the SSPM for an arc, and that the proposed method improved the accuracy when an eigenvalue was close to a quadrature point on the arc, by locating the quadrature points sufficiently far from the arc. We also demonstrated that our implementation in SLEPc has efficient parallelism. As an area for future work, we intend to develop the SSPM to avoid the loss in efficiency due to the communication time between computers.