1 Introduction

We solve those eigenpairs of a real symmetric-definite generalized eigenproblem with size-N matrices A and B:

$$\displaystyle{ A\,\mathbf{v} =\lambda B\,\mathbf{v} }$$
(1)

whose eigenvalues are in a specified interval [a, b], by the filter diagonalization method [16]. For this kind of eigenproblem, we define the resolvent with a complex-valued shift ρ as:

$$\displaystyle{ R(\rho ) = (A -\rho B)^{-1}\,B\,. }$$
(2)

In our previous papers and reports [5–9] and in papers by others [1–3], the filter studied or used was the real part of a complex linear combination of resolvents with complex shifts:

$$\displaystyle{ F = c_{\infty }I + \mathrm{Re}\,\sum _{k=1}^{n}\gamma _{ k}\,R(\rho _{k})\,. }$$
(3)

Here, \(c_{\infty }\) is a real coefficient, γ k , k = 1, 2, …, n, are complex coefficients, and I is the identity matrix. (We take the real part in order to halve the cost of calculation by using the complex-conjugate symmetry.) Typically, from n = 6 to n = 16 (or more) resolvents were used.

For a given set X of column vectors of size N, the action of the resolvent Y ← R(ρ)X reduces to solving an equation CY = BX for a set of column vectors Y, where the coefficient matrix is C = A − ρB. When C is banded, the equation may be solved by a direct method using matrix factorization; when C is general sparse, it is solved by an iterative method, for example one preconditioned with an incomplete factorization. In the application of the filter, the matrix factorization accounts for a large portion of the computation, and the memory required to store the factors is a very severe constraint in calculations for large problems. When many resolvents are applied concurrently, the total memory requirement is proportional to the number of resolvents.
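This reduction can be sketched in a few lines (a dense NumPy illustration with matrices of our own choosing; the actual computation uses a banded factorization that is computed once and reused):

```python
import numpy as np

def resolvent_apply(A, B, rho, X):
    """Compute Y = R(rho) X = (A - rho*B)^{-1} B X by solving C Y = B X."""
    C = A - rho * B                  # complex symmetric coefficient matrix
    return np.linalg.solve(C, B @ X)

# Small illustrative symmetric-definite pair (not from the experiments).
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
A = (M + M.T) / 2                    # real symmetric
B = np.eye(8) + 0.1 * (M @ M.T)      # symmetric positive definite
X = rng.standard_normal((8, 3))
Y = resolvent_apply(A, B, 1.0 + 1.0j, X)
```

In a banded production code, the factorization of C would be computed once and reused for every right-hand side and every repeated application.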

There are also different but related and successful approaches based on contour integrals and moment calculations [4, 13–15], which also use many resolvents, whose shifts correspond to the integration points.

In this report (and in several of our previous reports [10–12]), we use only a single resolvent with a complex shift and construct the filter as the real part of a polynomial of the resolvent:

$$\displaystyle{ F = c_{\infty }I + \mathrm{Re}\,\sum _{k=1}^{n}\gamma _{ k}\,\left (R(\rho )\right )^{k}\,. }$$
(4)

We carried out numerical experiments on a test problem to check whether this approach is really applicable.

2 Present Approach: Filter is Real-Part of Polynomial of Resolvent

For large eigenproblems, we assume the severest constraint is the amount of memory required. Thus, in our present study of the filter diagonalization method, we use a single resolvent in the filter rather than many. The filter we use is the real part of a polynomial of the resolvent. In the filter operation, the same resolvent is applied as many times as the degree of the polynomial, and each application of the resolvent to a set of vectors reduces to solving a set of simultaneous linear equations with the same coefficient matrix. To solve them, the coefficient matrix is factored once and the factor is stored; the stored factor is then reused every time the resolvent is applied. With a single resolvent rather than many, the transfer function of the filter cannot be shaped as well, but in exchange we obtain the advantages of a lower memory requirement and a reduced number of matrix factorizations.

2.1 Filter as a Polynomial of a Resolvent and Its Transfer Function

We consider a real symmetric-definite generalized eigenproblem with size-N matrices A and B:

$$\displaystyle{ A\mathbf{v} =\lambda B\mathbf{v},\;\mathrm{where\;}B> 0\,. }$$
(5)

The resolvent with a non-real complex shift ρ is:

$$\displaystyle{ R(\rho ) = (A -\rho B)^{-1}B\,. }$$
(6)

For any eigenpair (λ, v) of the eigenproblem, we have:

$$\displaystyle{ R(\rho )\,\mathbf{v} = \frac{1} {\lambda -\rho }\mathbf{v}\,. }$$
(7)

The filter F is the real part of a degree-n polynomial of the resolvent:

$$\displaystyle{ F = c_{\infty }I + \mathrm{Re}\,\sum _{k=1}^{n}\gamma _{ k}\left \{R(\rho )\right \}^{k}\,. }$$
(8)

Here, \(c_{\infty }\) is a real number, and the γ k are complex numbers. This filter is a real linear operator. For any eigenpair (λ, v), we have:

$$\displaystyle{ F\,\mathbf{v} = f(\lambda )\,\mathbf{v}\,. }$$
(9)

Here, f(λ) is the transfer function of the filter F, which is a real rational function of λ of degree 2n:

$$\displaystyle{ f(\lambda ) = c_{\infty } + \mathrm{Re}\,\sum _{k=1}^{n} \frac{\gamma _{k}} {(\lambda -\rho )^{k}} }$$
(10)

whose only poles are located at the non-real complex number ρ and its complex conjugate (both poles are of order n).
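Relation (9), F v = f(λ)v, can be checked numerically on a small dense pair (a sketch; the matrices, shift and coefficients below are illustrative, not from the experiments):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 4, 8                                    # polynomial degree, matrix size
M = rng.standard_normal((N, N))
A = (M + M.T) / 2
B = np.eye(N) + 0.1 * (M @ M.T)                # B > 0
rho = 0.3 + 0.7j                               # non-real shift
gamma = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Build the filter matrix F = Re sum_k gamma_k R(rho)^k  (c_inf = 0).
R = np.linalg.solve(A - rho * B, B)
F = np.zeros((N, N), dtype=complex)
Rk = np.eye(N, dtype=complex)
for k in range(n):
    Rk = Rk @ R                                # R(rho)^(k+1)
    F += gamma[k] * Rk
F = F.real

# Eigenpairs of A v = lam B v via the Cholesky transformation of B.
L = np.linalg.cholesky(B)
S = np.linalg.solve(L, np.linalg.solve(L, A).T)   # L^{-1} A L^{-T}
lam, U = np.linalg.eigh(S)
V = np.linalg.solve(L.T, U)                       # B-orthonormal eigenvectors

# Transfer function f(lam) = Re sum_k gamma_k / (lam - rho)^k.
f = sum(gamma[k] / (lam - rho) ** (k + 1) for k in range(n)).real
```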

2.1.1 Transfer Function g(t) in Normalized Coordinate t

We are to solve for those pairs whose eigenvalues are in the specified real interval [a, b]. By the linear transformation mapping between λ ∈ [a, b] and t ∈ [−1, 1], the normalized coordinate t of λ is defined by \(\lambda = \frac{a+b} {2} + \left (\frac{b-a} {2} \right )\,t\,\). We call the interval t ∈ [−1, 1] the passband, the intervals μ ≤ | t | the stopbands, and the intervals 1 < | t | < μ between the passband and the stopbands the transition bands.

The transfer function g(t) in the normalized coordinate t is defined by:

$$\displaystyle{ g(t) = f(\lambda )\,. }$$
(11)

We require the transfer function g(t) to satisfy the following conditions:

  1. | g(t) | ≤ g stop when t is in the stopbands. Here, g stop is a very small positive number.

  2. g pass ≤ g(t) when and only when t is in the passband. Here, g pass is a number much larger than g stop. (The upper bound of g(t) is about unity; by re-normalizing the filter, i.e., multiplying it by a constant, we may later set the upper bound to unity.)

For convenience, we also restrict g(t) to be an even function. Then the poles are purely imaginary (Fig. 1).

Fig. 1 Specified interval of eigenvalue and location of poles (shifts)

Fig. 2 Shapes of transfer functions (ideal and conceptual)

We place the poles of g(t) at the purely imaginary points \(t = \pm \sqrt{-1}\), so that g(t) is written as:

$$\displaystyle{ g(t) = c_{\infty }^{{\prime}} + \mathrm{Re}\,\sum _{ k=1}^{n} \frac{\alpha _{k}} {(1 + t\sqrt{-1}\,)^{k}}\,. }$$
(12)

For this expression, to make g(t) an even function, we restrict the coefficients α k , k = 1, 2, …, n, to real numbers. The real coefficients are tuned so that the shape of g(t) satisfies the following two conditions: (1) in the passband | t | < 1, the value of g(t) is close to 1; (2) in the stopbands μ < | t |, the magnitude of g(t) is very small (see Fig. 2). In our present study, the coefficients are optimized by a method similar to the least-squares method.
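That real α k make g(t) even follows from \((1 - t\sqrt{-1})^{k} = \overline{(1 + t\sqrt{-1})^{k}}\). A quick numerical check (a sketch with illustrative coefficients of our own):

```python
import numpy as np

def g(t, alpha, c_inf=0.0):
    """g(t) = c_inf + Re sum_k alpha_k / (1 + i t)^k, alpha real."""
    t = np.asarray(t, dtype=float)
    z = 1.0 / (1.0 + 1j * t)
    acc = np.zeros_like(t, dtype=complex)
    zk = np.ones_like(acc)
    for a in alpha:
        zk = zk * z          # (1 + i t)^{-k}
        acc = acc + a * zk
    return c_inf + acc.real

alpha = np.array([0.5, -0.3, 0.2, 0.1])   # real coefficients -> even g
ts = np.linspace(0.0, 3.0, 50)
```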

2.2 Construction of Filter from Transfer Function

We construct the filter operator F from the transfer function g(t).

Since

$$\displaystyle\begin{array}{rcl} g(t)& =& c_{\infty }^{{\prime}} + \mathrm{Re}\,\sum _{ k=1}^{n} \frac{\alpha _{k}} {\left (1 + t\sqrt{-1}\,\right )^{k}}\,,{}\end{array}$$
(13)
$$\displaystyle\begin{array}{rcl} f(\lambda )& =& c_{\infty } + \mathrm{Re}\,\sum _{k=1}^{n} \frac{\gamma _{k}} {(\lambda -\rho )^{k}}\,,{}\end{array}$$
(14)

and from the two relations \(f(\lambda ) = g(t)\) and \(\lambda = \frac{a + b} {2} + \left (\frac{b - a} {2} \right )\,t\), we obtain the relations between the coefficients and the value of the shift:

$$\displaystyle\begin{array}{rcl} c_{\infty }^{{\prime}}& =& c_{ \infty },{}\end{array}$$
(15)
$$\displaystyle\begin{array}{rcl} \gamma _{k}& =& \alpha _{k}\left (-\sqrt{-1}\right )^{k}\left (\frac{b - a} {2} \right )^{k}\,,k=1,2,\ldots,n\,\,,{}\end{array}$$
(16)
$$\displaystyle\begin{array}{rcl} \rho & =& \frac{a + b} {2} + \frac{b - a} {2} \sqrt{-1}\,.{}\end{array}$$
(17)

Hereafter, we simply set the transfer rate at infinity \(c_{\infty }\) to zero; thus our filter operator is:

$$\displaystyle{ F = \mathrm{Re}\,\sum _{k=1}^{n}\gamma _{ k}\,\left \{R(\rho )\right \}^{k}\,. }$$
(18)

When the half-width of the interval \(\frac{b-a} {2}\) is very large or very small, the coefficients \(\gamma _{k} = \left (-\,\frac{b-a} {2} \sqrt{-1}\right )^{k}\,\alpha _{ k}\) for higher-order terms may suffer unnecessary floating-point overflows or underflows, which can be avoided by rewriting the filter as:

$$\displaystyle{ F = \mathrm{Re}\,\sum _{k=1}^{n}\alpha _{ k}\,\left \{\left (-\sqrt{-1}\right )\,\frac{b - a} {2} \,R(\rho )\right \}^{k}\,. }$$
(19)

2.3 Procedure of Filter Operation

Here, we show the procedure of the filter operation, i.e., the action of the filter on a given set of vectors.

Our filter F is specified by the degree n, the complex shift \(\rho = \frac{a+b} {2} + \frac{b-a} {2} \sqrt{-1}\), and the real coefficients α k , k = 1, 2, …, n, as in expression (19) above.

Let X and Y be sets of m real column vectors of size N, represented as real N×m matrices. Then the filter operation Y ← FX can be calculated by the procedure shown in Fig. 3. (In the procedure, W and Z are complex N×m work matrices.)

Fig. 3 Procedure of filter operation Y ← FX
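Figure 3 itself is not reproduced here. One plausible realization of the procedure, following expression (19) directly, is the following dense sketch (names are ours; in the real code the factorization of A − ρB is computed once and reused for the banded solves):

```python
import numpy as np

def filter_apply(A, B, alpha, a, b, X):
    """Y <- F X following expression (19): the scaled resolvent
    S = (-i)((b-a)/2) R(rho) is applied as many times as the degree,
    so no large powers of (b-a)/2 appear explicitly."""
    rho = (a + b) / 2 + 1j * (b - a) / 2       # shift, relation (17)
    scale = -1j * (b - a) / 2
    C = A - rho * B          # in a real code: factored once, factor reused
    Z = X.astype(complex)    # work matrix (the paper's W, Z)
    Y = np.zeros_like(Z)
    for a_k in alpha:                          # alpha_1 ... alpha_n
        Z = scale * np.linalg.solve(C, B @ Z)  # Z <- S Z
        Y = Y + a_k * Z
    return Y.real

# Illustrative data: small random symmetric-definite pair.
rng = np.random.default_rng(2)
M = rng.standard_normal((8, 8))
A = (M + M.T) / 2
B = np.eye(8) + 0.1 * (M @ M.T)
a, b = -1.0, 1.0
alpha = np.array([0.4, -0.2, 0.3])
X = rng.standard_normal((8, 2))
Y = filter_apply(A, B, alpha, a, b, X)
```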

2.4 Implementation of Resolvent

To calculate the action of a resolvent Z ← R(ρ)W, we first calculate BW from W and then solve the equation CZ = BW for Z, where the coefficient matrix is C = A − ρB. Since both matrices A and B are real symmetric, C is complex symmetric (C T = C); when both A and B are banded, C is also banded. In our present experiments, the complex modified Cholesky method without pivoting for banded systems is used for the banded complex symmetric matrix C, even though there is a potential risk of numerical instability.
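A dense sketch of such a modified Cholesky (LDL^T) factorization for a complex symmetric matrix, without pivoting (names ours; the banded production version differs mainly in the loop bounds). Note that no complex conjugation appears, since C is symmetric rather than Hermitian:

```python
import numpy as np

def ldlt_complex_symmetric(C):
    """Factor C = L D L^T with unit lower-triangular L and diagonal D.
    No conjugation (C is complex symmetric, not Hermitian);
    no pivoting, so a tiny d[j] can make this unstable."""
    n = C.shape[0]
    L = np.eye(n, dtype=complex)
    d = np.zeros(n, dtype=complex)
    for j in range(n):
        d[j] = C[j, j] - np.sum(L[j, :j] ** 2 * d[:j])
        for i in range(j + 1, n):
            L[i, j] = (C[i, j] - np.sum(L[i, :j] * L[j, :j] * d[:j])) / d[j]
    return L, d

# Example: C = A - rho*B for a small random symmetric-definite pair.
rng = np.random.default_rng(3)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2
B = np.eye(6) + 0.1 * (M @ M.T)
C = A - (0.5 + 1.0j) * B
L, d = ldlt_complex_symmetric(C)
```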

In the Rayleigh quotient inverse-iteration that refines the approximated eigenpairs, the shifted matrix is real symmetric but indefinite and very close to singular; therefore, the simultaneous linear equations are solved carefully by banded LU decomposition with partial pivoting, without using the symmetry.

3 Filter Design

Our filter is the real part of a polynomial of a resolvent. The coefficients of the polynomial are determined by a kind of least-squares method, i.e., the minimization of a weighted sum of definite integrals over the passband and the stopbands. The definite integrals are weighted integrals of the squared error of the transfer function relative to the ideal one. The minimization condition gives a system of linear equations with a symmetric positive definite matrix. However, the system is numerically highly ill-conditioned. Therefore, to determine accurate coefficients in double precision, we have to perform quadruple-precision calculations, both in generating the system of linear equations and in solving the generated system with regularization.

3.1 Design by LSQ Method

We show a method to tune α k , the coefficients of g(t), by a least-squares-like method. First, we change variables from t to θ by t ≡ tan θ, and let h(θ) ≡ g(t). Then:

$$\displaystyle{ h(\theta ) =\sum _{ k=1}^{n}\alpha _{ k}\,\cos (k\theta )\,(\cos \theta )^{k}\,. }$$
(20)

Since h(θ) is an even function, it is sufficient to consider only θ ∈ [0, π∕2). The passband condition is then considered on \(\theta \in [\,0,\, \frac{\pi } {4}\,]\).
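Form (20) follows from \(1/(1 + \sqrt{-1}\tan \theta ) = \cos \theta \,e^{-\sqrt{-1}\theta }\), so that \(\mathrm{Re}\,(1 + \sqrt{-1}\tan \theta )^{-k} = \cos (k\theta )\,(\cos \theta )^{k}\). A numerical check of this identity (our sketch, with illustrative coefficients):

```python
import numpy as np

theta = np.linspace(0.01, np.pi / 2 - 0.01, 200)
alpha = np.array([0.5, -0.3, 0.2, 0.1])

# g(tan(theta)) via the resolvent-style form (12) with c'_inf = 0 ...
t = np.tan(theta)
g = sum(alpha[k] / (1 + 1j * t) ** (k + 1) for k in range(4)).real

# ... and h(theta) via the trigonometric form (20).
h = sum(alpha[k] * np.cos((k + 1) * theta) * np.cos(theta) ** (k + 1)
        for k in range(4))
```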

3.1.1 Method-I

J stop and J pass are the integrals, over the stopband and the passband respectively (with weight 1), of the squared difference of the transfer function h(θ) from the ideal one.

We choose a small positive number η and minimize J ≡ J stop + ηJ pass.

The intervals [0, 1] and [μ, ∞) of t correspond to the intervals [0, π∕4] and [tan−1 μ, π∕2) of θ, respectively (these give the endpoints of the definite integrals for (half of) the passband and a stopband). Then we have:

$$\displaystyle\begin{array}{rcl} J_{\mathrm{stop}}& \equiv & \int _{\tan ^{-1}\mu }^{\pi /2}\{h(\theta )\}^{2}d\theta =\sum _{ p,\,q=1}^{n}\alpha _{ p}\,\mathscr{A}_{p,q}\,\alpha _{q}\,,{}\end{array}$$
(21)
$$\displaystyle\begin{array}{rcl} J_{\mathrm{pass}}& \equiv & \int _{0}^{\pi /4}\{1 - h(\theta )\}^{2}d\theta =\sum _{ p,\,q=1}^{n}\alpha _{ p}\,\mathscr{B}_{p,q}\,\alpha _{q} - 2\sum _{p=1}^{n}\alpha _{ p}\,\mathscr{B}_{p,0}\; +\; \mathrm{const}\,.\qquad {}\end{array}$$
(22)

Here,

$$\displaystyle\begin{array}{rcl} \mathscr{A}_{p,q}& \equiv & \int _{\tan ^{-1}\mu }^{\pi /2}\cos (\,p\theta )\,\cos (q\theta )\,(\cos \theta )^{p+q}\,d\theta \,,{}\end{array}$$
(23)
$$\displaystyle\begin{array}{rcl} \mathscr{B}_{p,q}& \equiv & \int _{0}^{\pi /4}\cos (\,p\theta )\,\cos (q\theta )\,(\cos \theta )^{p+q}\,d\theta \,.{}\end{array}$$
(24)

We calculated numerical values of these definite integrals by analytic closed formulae.

We can easily show that \(\cos (\,p\theta )\,\cos (q\theta )\,(\cos \theta )^{p+q}\)

$$\displaystyle\begin{array}{rcl} & =& 2^{-(\,p+q+2)}(1 + e^{-2ip\theta })(1 + e^{-2iq\theta })(1 + e^{2i\theta })^{p+q} {}\\ & =& 2^{-(\,p+q+2)}(1 + e^{-2ip\theta })(1 + e^{-2iq\theta })\sum _{ k=0}^{p+q}\binom{p + q}{k}e^{2ik\theta } {}\\ & =& 2^{-(\,p+q+2)}\sum _{ k=0}^{p+q}\binom{p + q}{k}\{e^{2ik\theta } + e^{2i(k-p)\theta } + e^{2i(k-q)\theta } + e^{2i(k-p-q)\theta }\}, {}\\ \end{array}$$

where i denotes the imaginary unit \(\sqrt{-1}\).

We define, for an integer ℓ and real numbers a and b:

$$\displaystyle\begin{array}{rcl} T_{\ell}& \equiv & \int _{a}^{b}\cos 2\ell\theta \,d\theta {}\\ & =& \left \{\begin{array}{@{}l@{\quad }l@{}} b - a \quad &(\ell= 0)\\ (\sin 2\ell b -\sin 2\ell a )/(2\ell) =\{\sin \ell(b - a) \cdot \cos \ell(b + a) \}/\ell\quad &(otherwise) \end{array} \right.{}\\ \end{array}$$

Then for integers p and q, we have:

$$\displaystyle{\int _{a}^{b}\cos (\,p\theta )\cos (q\theta )(\cos \theta )^{p+q}\,d\theta = \frac{1} {2^{p+q+2}}\sum _{k=0}^{p+q}\binom{p + q}{k}\{T_{ k} + T_{k-p} + T_{q-k} + T_{p+q-k}\},}$$

which is symmetric in p ↔ q. We can use the further symmetry that T k + T k−p and T q−k + T p+q−k are interchanged when k ↔ p + q − k. Since | T ℓ | ≤ 1∕|ℓ| for ℓ ≠ 0, we calculate the sum with the terms added in ascending order of the magnitudes of the binomial coefficients, so as to reduce rounding errors. Let w k ≡ T k + T k−p, m ≡ p + q and \(c_{k} \equiv \binom{m}{k}\); the value v of the integral is calculated as in Fig. 4.

Fig. 4 Procedure to calculate the definite integral
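Figure 4 is not reproduced here; the closed-form evaluation can be sketched as follows (the careful summation ordering is omitted; `integral_trap` is our quadrature reference, not part of the paper's method):

```python
import numpy as np
from math import comb

def T(l, a, b):
    """T_l = integral of cos(2*l*theta) over [a, b]; even in l."""
    if l == 0:
        return b - a
    return (np.sin(2 * l * b) - np.sin(2 * l * a)) / (2 * l)

def integral_closed(p, q, a, b):
    """Closed form of int_a^b cos(p th) cos(q th) cos(th)^(p+q) d th."""
    m = p + q
    s = sum(comb(m, k) * (T(k, a, b) + T(k - p, a, b)
                          + T(q - k, a, b) + T(m - k, a, b))
            for k in range(m + 1))
    return s / 2 ** (m + 2)

def integral_trap(p, q, a, b, n=20001):
    """Composite trapezoidal-rule reference value."""
    th = np.linspace(a, b, n)
    f = np.cos(p * th) * np.cos(q * th) * np.cos(th) ** (p + q)
    h = (b - a) / (n - 1)
    return h * (f.sum() - 0.5 * (f[0] + f[-1]))
```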

The minimization condition of J ≡ J stop + ηJ pass, if we set \(b_{p} \equiv \mathscr{B}_{p,0}\), reduces to a system of simultaneous linear equations whose coefficient matrix is real symmetric positive definite:

$$\displaystyle{ (\mathscr{A} +\eta \mathscr{B})\,\boldsymbol{\alpha } =\eta \, \mathbf{b}\,. }$$
(25)

For this linear equation, \(\mathscr{A}\) and \(\mathscr{B}\) are size-n matrices whose elements are \(\mathscr{A}_{p,q}\) and \(\mathscr{B}_{p,q}\), p, q = 1, 2, …, n, respectively, and \(\boldsymbol{\alpha }\) and b are the column vectors \(\boldsymbol{\alpha }\equiv [\alpha _{1},\alpha _{2},\ldots,\alpha _{n}]^{T}\) and b ≡ [b 1, b 2, …, b n ]T. We solve this linear equation to obtain the coefficients α k , k = 1, 2, …, n.
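A small end-to-end sketch of Method-I, with the matrices \(\mathscr{A}\), \(\mathscr{B}\) and the vector b built by simple quadrature instead of the closed formulae (n, μ and η are illustrative; the paper's computation uses larger n and quadruple precision):

```python
import numpy as np

n, mu, eta = 6, 2.0, 1e-4
th_s = np.linspace(np.arctan(mu), np.pi / 2, 4001)  # stopband in theta
th_p = np.linspace(0.0, np.pi / 4, 4001)            # (half) passband

def basis(th, k):
    """k-th basis function cos(k*theta) * cos(theta)**k of h(theta)."""
    return np.cos(k * th) * np.cos(th) ** k

def trap(f, th):
    """Trapezoidal rule (stand-in for the analytic closed formulae)."""
    h = th[1] - th[0]
    return h * (f.sum() - 0.5 * (f[0] + f[-1]))

ks = range(1, n + 1)
Phi_s = np.array([basis(th_s, k) for k in ks])
Phi_p = np.array([basis(th_p, k) for k in ks])
Amat = np.array([[trap(Phi_s[p] * Phi_s[q], th_s) for q in range(n)] for p in range(n)])
Bmat = np.array([[trap(Phi_p[p] * Phi_p[q], th_p) for q in range(n)] for p in range(n)])
bvec = np.array([trap(Phi_p[p], th_p) for p in range(n)])

# Minimization condition (25): (A + eta*B) alpha = eta * b.
alpha = np.linalg.solve(Amat + eta * Bmat, eta * bvec)
```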

3.1.2 Method-II

We regard \(\boldsymbol{\alpha }\) as a vector whose 2-norm is constant, and we first minimize the definite integral in the stopband:

$$\displaystyle{ J_{\mathrm{stop}} =\sum _{ p,\,q=1}^{n}\alpha _{ p}\,\mathscr{A}_{p,q}\,\alpha _{q} =\boldsymbol{\alpha } ^{T}\mathscr{A}\,\boldsymbol{\alpha }\,. }$$
(26)

If we chose \(\boldsymbol{\alpha }\) as the eigenvector of the smallest eigenvalue of the matrix \(\mathscr{A}\), then J stop would be minimal, but there would be no freedom left to tune the approximation in the passband. Thus, we introduce the following modification. We choose a suitable small positive number ε. If there are ℓ eigenvectors whose eigenvalues are below the threshold ε, let S (ℓ) be the subspace spanned by those eigenvectors. Then \(J_{\mathrm{stop}} \leq \epsilon \,\vert \vert \boldsymbol{\alpha }\vert \vert _{2}^{2}\) holds whenever \(\boldsymbol{\alpha }\in S^{(\ell)}\).

The minimization condition of J pass under the constraint \(\boldsymbol{\alpha }\in S^{(\ell)}\) reduces to a system of simultaneous linear equations whose coefficient matrix is of size ℓ and symmetric positive definite.

When we extend the subspace (by increasing ℓ), J pass decreases and the approximation in the passband becomes better, but J stop increases and the approximation in the stopbands becomes worse. Conversely, when we shrink the subspace (by decreasing ℓ), J stop decreases and the approximation in the stopbands becomes better, but J pass increases and the approximation in the passband becomes worse.

We have to find a good choice of the threshold ε (or of ℓ), balancing the two conflicting requirements of approximation in the passband and in the stopbands.
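A sketch of Method-II on the same quadrature-built matrices; here the rank ℓ is fixed directly instead of via the threshold ε (illustrative parameters; the real computation is done in quadruple precision):

```python
import numpy as np

n, mu, l = 6, 2.0, 2                      # subspace rank l chosen directly
th_s = np.linspace(np.arctan(mu), np.pi / 2, 4001)
th_p = np.linspace(0.0, np.pi / 4, 4001)

def basis(th, k):
    return np.cos(k * th) * np.cos(th) ** k

def trap(f, th):
    h = th[1] - th[0]
    return h * (f.sum() - 0.5 * (f[0] + f[-1]))

ks = range(1, n + 1)
Phi_s = np.array([basis(th_s, k) for k in ks])
Phi_p = np.array([basis(th_p, k) for k in ks])
Amat = np.array([[trap(Phi_s[p] * Phi_s[q], th_s) for q in range(n)] for p in range(n)])
Bmat = np.array([[trap(Phi_p[p] * Phi_p[q], th_p) for q in range(n)] for p in range(n)])
bvec = np.array([trap(Phi_p[p], th_p) for p in range(n)])

w, V = np.linalg.eigh(Amat)               # eigenvalues in ascending order
S = V[:, :l]                              # basis of the subspace S^(l)
# Minimize J_pass over alpha = S c:  (S^T B S) c = S^T b.
c = np.linalg.solve(S.T @ Bmat @ S, S.T @ bvec)
alpha = S @ c
J_stop = alpha @ Amat @ alpha             # guaranteed <= w[l-1] * ||alpha||^2
```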

3.2 Examples of Designed Filters

Table 1 Filters used in experiments
Table 2 Filter (No.1): coefficients α k
Table 3 Filter (No.2): coefficients α k
Table 4 Filter (No.3): coefficients α k
Fig. 5 Filter (No.1): magnitude of transfer function | g(t) |

Fig. 6 Filter (No.2): magnitude of transfer function | g(t) |

Fig. 7 Filter (No.3): magnitude of transfer function | g(t) |

We show in Table 1 (see Tables 2, 3, 4 and Figs. 5, 6, 7) three filters (No.1), (No.2) and (No.3) determined by the least-squares-type method (Method-II). Good thresholds in the method were found by trial. For filter (No.1), ε = 10−30 is used, which gives ℓ = 2. For filter (No.2), ε = 10−25 is used, which gives ℓ = 2. For filter (No.3), ε = 10−30 is used, which gives ℓ = 5. When n and μ are given, the result depends only on the rank ℓ of the subspace; the value of the threshold ε merely serves to obtain an appropriate ℓ. If a reduction ratio of about 15 digits in the stopbands (g stop ≈ 10−15) is desired, we need about 30 digits of accuracy in the least-squares-type calculation; therefore, we used quadruple precision only in this step, to obtain the coefficients α k , k = 1, 2, …, n, in double precision. The coefficients α k themselves appear numerically very sensitive even when calculated in quadruple precision; however, this does not matter as long as the shape of the obtained transfer function is good. For filter (No.2), the value of μ is set smaller than that of filter (No.1), but in exchange the value of g pass becomes smaller and the value of g stop becomes larger. For filter (No.3), the value of g pass is closer to 1 than that of filter (No.1), which is attained with the larger degree n = 20.

About the shape parameters of the transfer function μ, g pass and g stop:

  • If μ is increased, the transition bands become wider, and the number of eigenvalues lying in the transition bands is likely to increase. The more eigenvectors there are whose eigenvalues lie in the transition bands, the more vectors must be filtered to construct an approximation of the basis of the invariant subspace.

  • When the max-min ratio of the transfer rate in the passband (related to the reciprocal of g pass) is large, the accuracies of the approximated pairs tend to be nonuniform.

  • When g stop, the upper bound of the magnitude of the transfer rate in the stopbands, is not very small, the approximation of the invariant subspace will not be good and the approximated pairs will be poor.

4 Experiments of Filter Diagonalization

4.1 Test Problem: 3D-Laplacian Discretized by FEM

Our test problem for a real symmetric-definite generalized eigenproblem:

$$\displaystyle{ A\,\mathbf{v} =\lambda B\,\mathbf{v} }$$
(27)

is originated from a discretization by the finite element method approximation of an eigenproblem of the Laplace operator in three dimensions:

$$\displaystyle{ (-\nabla ^{2})\,\varPsi (x,y,z) =\lambda \,\varPsi (x,y,z)\,. }$$
(28)

The region of the partial differential operator is a cube [0, π]3, and the boundary condition is zero-Dirichlet.

For the discretization by the finite element method, each edge of the cube is equi-divided into N 1 + 1, N 2 + 1, and N 3 + 1 sub-intervals in the three directions, respectively. In each finite element, the basis functions are tri-linear functions, i.e., products of piecewise-linear functions in each direction. The discretization gives a real symmetric-definite generalized matrix eigenproblem. (In this case, both matrices A and B are positive definite, and all eigenvalues are positive real numbers.)

The size of both matrices A and B is N = N 1 N 2 N 3. The lower bandwidth of the matrices is 1 + N 1 + N 1 N 2 under a good numbering of the basis functions. (Although A and B are quite sparse inside their bands, in our calculation they are treated as if dense inside their bands.)

We solve only for those eigenpairs (λ, v) whose eigenvalues are in a specified interval [a, b]. The exact eigenvalues can be calculated by a simple formula. When the numbers of sub-intervals in the three directions are all different, all eigenvalues are distinct.
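The "simple formula" is not quoted in this excerpt. For P1 (piecewise-linear) elements in one dimension on [0, π] with N 1 + 1 sub-intervals of width h = π∕(N 1 + 1), the generalized eigenvalues of the stiffness/mass pair are κ j = (6∕h²)(1 − cos jh)∕(2 + cos jh), j = 1, …, N 1, and by the standard tensor-product argument the 3D tri-linear eigenvalues are sums of three such 1D values, one per direction. The sketch below (our reconstruction) checks the 1D formula against a dense eigensolver:

```python
import numpy as np

def fem1d_exact(N1):
    """Exact generalized eigenvalues of the 1D P1-FEM pair on [0, pi]."""
    h = np.pi / (N1 + 1)
    j = np.arange(1, N1 + 1)
    return (6.0 / h**2) * (1 - np.cos(j * h)) / (2 + np.cos(j * h))

def fem1d_matrices(N1):
    """Tridiagonal stiffness K and mass M for P1 elements, zero Dirichlet."""
    h = np.pi / (N1 + 1)
    K = (np.diag(np.full(N1, 2.0)) + np.diag(np.full(N1 - 1, -1.0), 1)
         + np.diag(np.full(N1 - 1, -1.0), -1)) / h
    M = (np.diag(np.full(N1, 4.0)) + np.diag(np.full(N1 - 1, 1.0), 1)
         + np.diag(np.full(N1 - 1, 1.0), -1)) * h / 6.0
    return K, M

# Dense generalized eigensolve K v = lam M v via Cholesky of M.
N1 = 12
K, M = fem1d_matrices(N1)
L = np.linalg.cholesky(M)
S = np.linalg.solve(L, np.linalg.solve(L, K).T)   # L^{-1} K L^{-T}
lam = np.linalg.eigvalsh(S)
```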

Computer System Environment

Our calculation is made on a high-end PC system. The CPU is an Intel Core i7-5960X (3.0 GHz, 8 cores, 20 MB L3 cache). Both the turbo mode and the hyper-threading mode of the CPU are disabled from the BIOS menu. The theoretical peak performance of the CPU is 384 GFLOPS in double precision. The memory bus is quad-channel, and the total main memory size is 128 GB (8 modules of DDR4-2133 MHz (PC4-17000) 16 GB memory). The operating system is CentOS 7 (64-bit). We used the Intel Fortran compiler ver. 15.0.0 for Linux x86_64 with compile options -fast and -openmp.

4.2 Experiment Results

We solve a large eigenproblem whose discretization is (N 1, N 2, N 3) = (50, 60, 70). In this case, the size of the matrices is N = 50 × 60 × 70 = 210,000, and the lower bandwidth of the matrices is w L = 1 + 50 + 50 × 60 = 3051.

Table 5 Elapsed times (in s) (matrix size N = 210,000)

We solved for those pairs whose eigenvalues are in the interval [200, 210] (the true count of such pairs is 91). We chose m = 200 for the number of vectors to be filtered. In the calculation of the action of the resolvent, the modified Cholesky factorization for the complex symmetric banded matrix is used. In the experiments, the three filters (No.1), (No.2) and (No.3) are tested, and elapsed times are measured in seconds (Table 5). For an approximated eigenpair (λ, v) of the generalized eigenproblem, the residual of the pair is the vector r = (A − λB)v, where v is assumed to be already normalized in the B-norm such that v T B v = 1. We use the B −1-norm for the residual of an approximated pair; therefore, the residual norm is \(\varDelta = \sqrt{\mathbf{r} ^{T } B^{-1 } \mathbf{r}}\). The errors of the eigenvalues are calculated by comparison with the exact values given by the formula for this special test problem, the FEM discretization of the Laplace operator on a cube with zero-Dirichlet boundary condition.

  • Case of filter (No.1): The graph in Fig. 8 plots the norm of the residual of each approximated pair. In the middle of the interval of eigenvalues the residual norm is about 10−10, and near both ends of the interval it is about 10−6. Their ratio, about 104, corresponds to the nonuniformity of the transfer rate of filter (No.1) in the passband.

    Fig. 8 Filter (No.1): norm of residual (matrix size N = 210,000)

    The graph in Fig. 9 plots the absolute error of eigenvalue of each approximated pair. The errors of approximated eigenvalues are less than 10−12, and approximated eigenvalues are accurate to about 14 digits or more.

  • Case of filter (No.2): The graph in Fig. 10 plots the norm of the residual of each approximated pair. In the middle of the interval of eigenvalues the residual norm is about 10−10, and near both ends of the interval it is about 10−6. Their ratio, about 104, corresponds to the nonuniformity of the transfer rate of filter (No.2) in the passband.

    The graph in Fig. 11 plots the absolute error of eigenvalue of each approximated pair. The absolute errors of approximated eigenvalues are less than 10−12, and approximated eigenvalues are accurate to about 14 digits or more.

  • Case of filter (No.3): The graph in Fig. 12 plots the norm of the residual of each approximated pair. In the middle of the interval of eigenvalues the residual norm is about 10−10, and near both ends of the interval it is about 10−8. Their ratio, about 102, corresponds to the nonuniformity of the transfer rate of filter (No.3) in the passband.

    The graph in Fig. 13 plots the absolute error of eigenvalue of each approximated pair. The absolute errors of approximated eigenvalues are less than 10−12, and approximated eigenvalues are accurate to about 14 digits or more.

Fig. 9 Filter (No.1): error of eigenvalue (matrix size N = 210,000)

Fig. 10 Filter (No.2): norm of residual (matrix size N = 210,000)

Fig. 11 Filter (No.2): error of eigenvalue (matrix size N = 210,000)

Fig. 12 Filter (No.3): norm of residual (matrix size N = 210,000)

Fig. 13 Filter (No.3): error of eigenvalue (matrix size N = 210,000)

5 Timing Comparisons with Elliptic Filters

We compared our present filter with the elliptic filter. The present filter is the real part of a polynomial of a resolvent; the elliptic filter is a typical filter that is the real part of a linear combination of resolvents.

5.1 Filter Which is a Real-Part of a Linear Combination of Resolvents

The filter \(\mathscr{F}\) which is the real part of a linear combination of resolvents is written as:

$$\displaystyle{ \mathscr{F} = c_{\infty }I + \mathrm{Re}\sum _{i=1}^{k}\gamma _{ i}R(\rho _{i})\,. }$$
(29)

The coefficient \(c_{\infty }\) is real; the coefficients γ i and the shifts ρ i , i = 1, 2, …, k, are complex numbers. We assume the shifts are not real and that their imaginary parts are positive. An application of \(\mathscr{F}\) to a real vector gives a real vector. For any eigenpair (λ, v) of the original eigenproblem, we have \(\mathscr{F}\mathbf{v} = f(\lambda )\mathbf{v}\). Here, f(λ) is the transfer function of \(\mathscr{F}\), which is the following real rational function of λ of degree 2k:

$$\displaystyle{ f(\lambda ) = c_{\infty } + \mathrm{Re}\sum _{i=1}^{k} \frac{\gamma _{i}} {\lambda -\rho _{i}}\,. }$$
(30)

The normalized coordinate t of λ is introduced by \(\lambda = \mathscr{L}(t) \equiv (a + b)/2 + (b - a)/2 \cdot t\), which is a linear map between t ∈ [−1, 1] and λ ∈ [a, b]. We define g(t), the transfer function in the normalized coordinate t, by the relation f(λ) ≡ g(t). Then the form of the transfer function g(t) is:

$$\displaystyle{ g(t) = c_{\infty } + \mathrm{Re}\sum _{i=1}^{k} \frac{c_{i}} {t -\tau _{i}}\,, }$$
(31)

which is a real rational function of degree 2k. Here, \(\rho _{i} = \mathscr{L}(\tau _{i})\) and \(\gamma _{i} = \mathscr{L}^{{\prime}}\cdot c_{i}\), where \(\mathscr{L}^{{\prime}} \equiv (b - a)/2\) is a constant. Conversely, when a real rational function g(t) is given that can be represented in the form of expression (31), then from the real coefficient \(c_{\infty }\), the complex coefficients c i , i = 1, 2, …, k, and the complex poles τ i , i = 1, 2, …, k, the function f(λ) is determined; thus the filter \(\mathscr{F}\) which is the real part of a linear combination of resolvents is also determined.

We impose conditions for the shape of g(t) on the real axis:

  • | g(t) | ≤ g stop when μ ≤ | t |,

  • g pass ≤ g(t) when | t | ≤ 1, and max g(t) = 1,

  • g stop < g(t) < g pass when 1 < | t | < μ.

We give μ, g pass and g stop (μ > 1 and 0 < g stop ≪ g pass < 1); then the (half) degree k, the coefficient \(c_{\infty }\), and the coefficients and poles c i , τ i , i = 1, 2, …, k, are determined so as to satisfy the shape conditions.

5.2 Elliptic Filters for Comparisons

For the comparisons, we choose the elliptic filter (also called the Cauer filter or the Zolotarev filter) as the filter which is the real part of a linear combination of resolvents; it comes from the theory of best approximation by rational functions. The elliptic filter is so powerful that it can make μ arbitrarily close to unity, g pass arbitrarily close to unity, and g stop arbitrarily small if the (half) degree k is raised. For our comparisons, we chose three elliptic filters (No.E1), (No.E2) and (No.E3) whose shape parameters (μ, g pass, g stop) are similar to those of the present filters (No.1), (No.2) and (No.3), respectively (Table 6). We set the same values of μ and g pass for the present filters and the corresponding elliptic filters, but since the (half) degree k of an elliptic filter must be an integer, the true values of g stop for the elliptic filters are not the same but smaller (better in shape). For each elliptic filter used, the complex poles in the upper half complex plane and their complex coefficients of g(t) are tabulated in Tables 7, 8 and 9, respectively. Figures 14, 15 and 16 plot the transfer functions g(t) of the elliptic filters for t ≥ 0 only, since they are even functions.

Table 6 Elliptic filters used in comparisons
Table 7 Elliptic filter (No.E1) for μ = 2.0, g pass = 2.3797×10−4, g stop = 1.1×10−15 (the half degree k = 8 and the true g stop = 4.1457×10−17)
Table 8 Elliptic filter (No.E2) for μ = 1.5, g pass = 5.4647×10−5, g stop = 5.8×10−13 (the half degree k = 7 and the true g stop = 7.7975×10−14)
Table 9 Elliptic filter (No.E3) for μ = 2.0, g pass = 1.2727×10−2, g stop = 2.6×10−15 (the half degree k = 8 and the true g stop = 2.2452×10−15)

5.3 Elapsed Times of Filtering

For an elliptic filter, which is the real part of a linear combination of resolvents, the applications of the resolvents to a set of vectors can be made in parallel; however, parallel application requires more memory, especially when the resolvents are applied by solving the linear equations with a direct method (matrix factorization). Therefore, in this experiment, the resolvents are applied one by one to keep the memory requirement low.

In both the present filters and the elliptic filters, in an application of a resolvent R(ρ i ) ≡ (A − ρ i B)−1 B to a set of vectors, the linear equation with the complex symmetric matrix C = A − ρ i B is solved by the complex version of the modified Cholesky method C = LDL T for banded matrices, calculated by the same program code. For the elliptic filters, in the calculation of R(ρ i )X for the set of m vectors X, we form the matrix product BX only once, since it is the common right-hand side of the sets of linear equations for every ρ i .

We solve the same eigenproblem from the FEM discretization of the Laplacian as before, whose discretization is (N 1, N 2, N 3) = (50, 60, 70). The size of the matrices of the eigenproblem is N = 50 × 60 × 70 = 210,000, and the lower bandwidth of the matrices is w L = 1 + 50 + 50 × 60 = 3051.

Fig. 14 Filter (No.E1): magnitude of transfer function | g(t) |

Fig. 15 Filter (No.E2): magnitude of transfer function | g(t) |

Fig. 16 Filter (No.E3): magnitude of transfer function | g(t) |

The only difference from the previous experiment is the kind of filters used; therefore, the elapsed times are compared for the filtering procedure.

For the present filters (No.1), (No.2) and (No.3), the matrix is factored only once; however, n, the number of repetitions of a matrix multiplication by B followed by the solution of a set of simultaneous linear equations using the stored factors, is 15, 15 and 20, respectively. For the elliptic filters (No.E1), (No.E2) and (No.E3), the number k of matrix factorizations is 8, 7 and 8, respectively.

We measured the elapsed times to filter a set of m vectors for the cases m = 30 and m = 200, using the present filters (No.1), (No.2) and (No.3) and the elliptic filters (No.E1), (No.E2) and (No.E3); the results are shown in Table 10. In the case of m = 30, the elapsed times of the present filters are about one third of those of the elliptic ones; thus, for m = 30 the present filter reduces the filtering time compared with the elliptic filter. In the case of m = 200, however, the elapsed times of the elliptic and present filters did not differ much.

Table 10 Elapsed times (in s) for filtering m vectors

When the size of the matrices A and B of the eigenproblem is N and their bandwidth is w, the amount of computation to factor a symmetric banded matrix C is O(Nw^2), and it is O(Nwm) to solve the set of simultaneous linear equations with m right-hand sides after the matrix is factored. The amount of computation to multiply a set of m vectors by the symmetric banded matrix B is also O(Nwm).

Thus, the elapsed time to factor a symmetric banded matrix C of size N with bandwidth w is T_decompose ≈ t_1 N w^2, the elapsed time to solve a set of m simultaneous linear equations using the matrix factors is T_solve ≈ t_2 N w m, and the elapsed time to multiply a set of m vectors by a symmetric banded matrix B of size N with bandwidth w is T_mulB ≈ t_3 N w m. Here, t_1, t_2, and t_3 are some coefficients. Then, the elapsed time of the present filter can be written as:

$$\displaystyle{ T_{\mathrm{present}} = T_{\mathrm{decompose}} + n \times (T_{\mathrm{mulB}} + T_{\mathrm{solve}} + O(Nm))\,, }$$
(32)

and the elapsed time of the elliptic filter using k resolvents is written as:

$$\displaystyle{ T_{\mathrm{elliptic}} = T_{\mathrm{mulB}} + k \times (T_{\mathrm{decompose}} + T_{\mathrm{solve}} + O(Nm))\,. }$$
(33)

We have

$$\displaystyle{ \begin{array}{l} T_{\mathrm{present}}/N \approx t_{1}w^{2} + (t_{2} + t_{3})nwm, \\ T_{\mathrm{elliptic}}/N \approx t_{1}kw^{2} + (t_{2}k + t_{3})wm\,. \end{array} }$$
(34)
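The two cost expressions in (34) can be compared numerically. The coefficients t_1, t_2, t_3 below are purely hypothetical (the text only argues t_1 is smaller than t_2 and t_3); the bandwidth, degree n, and resolvent count k are taken from the experiment above.

```python
# Rough per-row cost model of Eq. (34), with hypothetical coefficients
# t1 < t2 ~ t3.  Only the trend, not the absolute numbers, is meaningful.
def t_present(w, m, n, t1=1.0, t2=4.0, t3=4.0):
    # T_present / N ~ t1*w^2 + (t2 + t3)*n*w*m
    return t1 * w**2 + (t2 + t3) * n * w * m

def t_elliptic(w, m, k, t1=1.0, t2=4.0, t3=4.0):
    # T_elliptic / N ~ t1*k*w^2 + (t2*k + t3)*w*m
    return t1 * k * w**2 + (t2 * k + t3) * w * m

w = 3051                      # lower bandwidth from the experiment above
n, k = 15, 8                  # polynomial degree / number of resolvents
for m in (30, 200):
    ratio = t_elliptic(w, m, k) / t_present(w, m, n)
    print(f"m = {m:3d}: elliptic/present time ratio ~ {ratio:.2f}")
```

With these toy coefficients the ratio comes out near 3.8 for m = 30 and close to 1.2 for m = 200, in qualitative agreement with the measured trend: the single-decomposition advantage dominates for small m and fades as m grows.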

The coefficients t_1, t_2, and t_3 depend on the system and the manner of calculation. Since the matrix decomposition has much higher locality of data references than the matrix-vector multiplication or the solution of linear equations using the matrix factors, the coefficient t_1 should be smaller than t_2 or t_3. From expression (34), when m is small enough to be ignored, the present filter could be nearly k times faster than the elliptic filter (with no parallel resolvent calculation), since it makes just one matrix decomposition. However, as m increases, the advantage of the present approach diminishes.

6 Conclusion

For a real symmetric-definite generalized eigenproblem, the filter diagonalization method solves for those eigenpairs whose eigenvalues are in a specified interval. In the present study, the filter we used is the real part of a polynomial of a single resolvent, rather than the real part of a linear combination of many resolvents, in order to reduce both the memory requirement and the amount of computation. In numerical experiments, we obtained good results.

When the filter is the real part of a linear combination of many (8 to 16) resolvents, the applications of the resolvents to a set of vectors can be made in parallel. For our present method, where the filter is the real part of a polynomial of a single resolvent, the applications of the resolvent, as many times as the degree of the polynomial, can only be made sequentially. However, even though the potential parallelism is reduced, the present method has the advantage that it requires only a single resolvent; therefore the storage requirement is low, and the total amount of computation can also be reduced. Once the single matrix corresponding to the resolvent is decomposed and its factors are stored, each application of the resolvent can be calculated easily and quickly. Another difficulty of the present type of filter is that the shape of its transfer function is not as good as that of a filter which is the real part of a linear combination of many resolvents, such as an elliptic filter.

In this paper, three filters are constructed and presented with their polynomial coefficients and shape parameters. Using these three filters, we performed experiments with filter diagonalization. For a generalized eigenproblem derived from an FEM discretization of the Laplace operator over a cubic region with a zero Dirichlet boundary condition, we solved for internal eigenpairs whose eigenvalues lie in a specified interval. We compared the eigenvalues of the approximate pairs with the exact ones and found good agreement, which shows that our approach works as expected.