1 Introduction

In this paper, we consider complex moment-based eigensolvers for computing all eigenvalues located in a certain region and their corresponding eigenvectors for a generalized eigenvalue problem of the following form

$$\displaystyle{ A\boldsymbol{x}_{i} =\lambda _{i}B\boldsymbol{x}_{i},\quad \boldsymbol{x}_{i} \in \mathbb{C}^{n}\setminus \{\boldsymbol{0}\},\quad \lambda _{ i} \in \varOmega \subset \mathbb{C}, }$$
(1)

where \(A,B \in \mathbb{C}^{n\times n}\), and the matrix pencil \(zB - A\) is assumed to be diagonalizable and nonsingular for any z on the boundary of Ω. Let m be the number of target eigenpairs and \(X_{\varOmega }\) be an n × m matrix whose columns are the target eigenvectors, i.e., \(X_{\varOmega }:= [\boldsymbol{x}_{i}\vert \lambda _{i} \in \varOmega ]\).

For solving the generalized eigenvalue problem (1), Sakurai and Sugiura proposed, in 2003, a projection-type method that uses certain complex moment matrices constructed by a contour integral [13]. Thereafter, several researchers have actively studied improvements of and eigensolvers related to this complex moment-based approach [5,6,7,8, 12, 14, 17]. The concepts of Sakurai and Sugiura have also been extended to solve nonlinear eigenvalue problems [1,2,3, 18].

Recently, we analyzed error bounds of the Rayleigh–Ritz type complex moment-based eigensolver called the block SS–RR method [9]. In this paper, we apply the results of those analyses to the case in which soft errors such as bit flips occur. Using the error bound, we provide an error resilience strategy that does not require standard checkpointing or replication techniques in the most time-consuming part of the eigensolver.

The remainder of this paper is organized as follows. In Sect. 2, we briefly describe the basic concepts of the complex moment-based eigensolvers. In Sect. 3, we introduce the algorithm of the block SS–RR method, the results of its error bounds, and its parallel implementation. In Sect. 4, we propose an error resilience strategy for the block SS–RR method. In Sect. 5, we show some numerical results, and we present our conclusions in Sect. 6.

Throughout, the following notation is used. Let \(V = [\boldsymbol{v}_{1},\boldsymbol{v}_{2},\ldots,\boldsymbol{v}_{L}] \in \mathbb{C}^{n\times L}\); then \(\mathscr{R}(V )\) denotes the range space of the matrix V, defined by \(\mathscr{R}(V ):= \mathrm{span}\{\boldsymbol{v}_{1},\boldsymbol{v}_{2}\), \(\ldots,\boldsymbol{v}_{L}\}\). In addition, for \(A \in \mathbb{C}^{n\times n}\), \(\mathscr{K}_{k}^{\square }(A,V )\) denotes the block Krylov subspace \(\mathscr{K}_{k}^{\square }(A,V ) =\mathscr{ R}([V,AV,\ldots,A^{k-1}V ])\).

2 Complex Moment-Based Eigensolvers

As a powerful algorithm for solving the generalized eigenvalue problem (1), Sakurai and Sugiura proposed a complex moment-based eigensolver in 2003 [13], which is called the SS–Hankel method. To solve (1), they introduced the rational function

$$\displaystyle{ r(z):=\widetilde{\,\boldsymbol{ v}}^{\mathrm{H}}(zB - A)^{-1}B\boldsymbol{v},\quad \boldsymbol{v},\widetilde{\boldsymbol{v}} \in \mathbb{C}^{n}\setminus \{\boldsymbol{0}\}, }$$
(2)

whose poles are the eigenvalues of the matrix pencil \(zB - A\). They then considered computing all poles located in Ω.

All poles of a meromorphic function located in a certain region can be computed by the algorithm in [11], which is based on Cauchy’s integral formula,

$$\displaystyle{ r(a) = \frac{1} {2\pi \mathrm{i}}\oint _{\varGamma } \frac{r(z)} {z - a}\mathrm{d}z, }$$

where Γ is the positively oriented Jordan curve (i.e., the boundary of Ω). By applying the algorithm in [11] to the rational function (2), the target eigenpairs \((\lambda _{i},\boldsymbol{x}_{i}),\lambda _{i} \in \varOmega\) of the generalized eigenvalue problem (1) are obtained by solving the generalized eigenvalue problem:

$$\displaystyle{ H_{M}^{<}\boldsymbol{u}_{ i} =\theta _{i}H_{M}\boldsymbol{u}_{i}. }$$

Here, \(H_{M}\) and \(H_{M}^{<}\) are small M × M Hankel matrices of the form

$$\displaystyle{ H_{M}:= \left (\begin{array}{cccc} \mu _{0} & \mu _{1} & \cdots & \mu _{M-1} \\ \mu _{1} & \mu _{2} & \cdots & \mu _{M}\\ \vdots & \vdots & \ddots & \vdots \\ \mu _{M-1} & \mu _{M}&\cdots &\mu _{2M-2} \end{array} \right ),\quad H_{M}^{<}:= \left (\begin{array}{cccc} \mu _{1} & \mu _{2} & \cdots & \mu _{M}\\ \mu _{2 } & \mu _{3 } & \cdots & \mu _{M+1}\\ \vdots & \vdots & \ddots & \vdots \\ \mu _{M}&\mu _{M+1} & \cdots &\mu _{2M-1} \end{array} \right ), }$$

whose entries consist of the following complex moments

$$\displaystyle{ \mu _{k}:= \frac{1} {2\pi \mathrm{i}}\oint _{\varGamma }z^{k}r(z)\mathrm{d}z. }$$

For details, refer to [13].
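The construction above is easy to prototype. The following is a hedged sketch, not the implementation from [13]: it approximates the moments \(\mu_k\) by the N-point trapezoidal rule on the unit circle and solves the Hankel pencil with SciPy. The diagonal test matrix, the parameter choices, and all names are assumptions for illustration.

```python
# Illustrative SS-Hankel sketch: moments mu_k by quadrature, then the Hankel
# pencil H_M^< u = theta H_M u. Test problem and sizes are assumptions.
import numpy as np
import scipy.linalg as sla

n, M, N = 100, 10, 64
rng = np.random.default_rng(0)
A = np.diag(np.arange(0.01, 9.92, 0.1))   # eigenvalues 0.01, 0.11, ..., 9.91
B = np.eye(n)
v, vt = rng.standard_normal(n), rng.standard_normal(n)

theta = 2 * np.pi * (np.arange(N) + 0.5) / N
z = np.exp(1j * theta)                    # Gamma: unit circle (Omega: unit disc)
w = np.exp(1j * theta) / N                # trapezoidal weights, 1/(2*pi*i) absorbed
r = np.array([vt @ np.linalg.solve(zj * B - A, B @ v) for zj in z])  # r(z_j)
mu = np.array([np.sum(w * z**k * r) for k in range(2 * M)])          # moments

# M is chosen no smaller than the number of eigenvalues inside Gamma (10 here)
H = sla.hankel(mu[:M], mu[M - 1:2 * M - 1])    # H_M
Hs = sla.hankel(mu[1:M + 1], mu[M:2 * M])      # H_M^<
theta_i = sla.eig(Hs, H, right=False)          # approximates lambda_i in Omega
```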

To obtain more accurate eigenpairs, an improvement of the SS–Hankel method has been proposed [14]. This improvement is based on the Rayleigh–Ritz procedure and is called the SS–RR method. Block variants of the SS–Hankel method and the SS–RR method have also been proposed [5, 6] to increase the stability of the algorithms, specifically when multiple eigenvalues exist in Ω. These are called the block SS–Hankel method and the block SS–RR method, respectively. An Arnoldi-based interpretation of the complex moment-based eigensolvers and the resulting algorithm have also been proposed [8]; the algorithm is named the block SS–Arnoldi method.

As another complex moment-based approach, Polizzi proposed the FEAST eigensolver for Hermitian generalized eigenvalue problems in 2009 [12], and it has since been developed further [17]. The FEAST eigensolver is an accelerated subspace iteration-type method, and a single iteration of it is closely connected to a special case of the block SS–RR method with M = 1.

The relationship among these complex moment-based eigensolvers was analyzed in [10].

3 The Block SS–RR Method

In this section, we introduce the algorithm of the block SS–RR method and the results of its error bounds.

3.1 Algorithm of the Block SS–RR Method

Let \(L,M \in \mathbb{N}\) be input parameters. Also let \(V \in \mathbb{C}^{n\times L}\) be an input matrix, e.g., a random matrix. Then, we define an n × LM matrix

$$\displaystyle{ S:= [S_{0},S_{1},\ldots,S_{M-1}], }$$

where

$$\displaystyle{ S_{k}:= \frac{1} {2\pi \mathrm{i}}\oint _{\varGamma }z^{k}(zB - A)^{-1}BV \mathrm{d}z. }$$
(3)

Then, we have the following theorem; see e.g., [9].

Theorem 1

Let m be the number of eigenvalues of  (1) located in Ω, and let rank(S) = m. Then, we have

$$\displaystyle{ \mathscr{R}(S) =\mathscr{ R}(X_{\varOmega }) = \mathrm{span}\{\boldsymbol{x}_{i}\vert \lambda _{i} \in \varOmega \}. }$$

Theorem 1 indicates that the target eigenpairs \((\lambda _{i},\boldsymbol{x}_{i}),\lambda _{i} \in \varOmega\), can be obtained by the Rayleigh–Ritz procedure with \(\mathscr{R}(S)\). This observation forms the basis of the block SS–RR method [5]. The contour integral (3) is approximated by some numerical integration rule such as the N-point trapezoidal rule with N > M − 1. The approximated matrix \(\widehat{S}_{k}\) is expressed as

$$\displaystyle{ S_{k} \approx \widehat{S}_{k} := \sum_{j=1}^{N}\omega_{j}z_{j}^{k}(z_{j}B - A)^{-1}BV, }$$
(4)

where \(z_{j}\) are the quadrature points and \(\omega_{j}\) are the corresponding weights. We also set

$$\displaystyle{ S \approx \widehat{ S}:= [\widehat{S}_{0},\widehat{S}_{1},\ldots,\widehat{S}_{M-1}]. }$$
(5)

Here, \((z_{j},\omega_{j})\) are required to satisfy

$$\displaystyle{ \sum _{j=1}^{N}\omega _{ j}z_{j}^{k}\left \{\begin{array}{ll} \neq 0, &(k = -1) \\ = 0,&(k = 0,1,\ldots,N - 2) \end{array} \right.. }$$
(6)
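As a quick sanity check (illustrative only), the midpoint trapezoidal points and weights on the unit circle, the same quadrature used later in Sect. 5.1, can be verified numerically to satisfy (6):

```python
# Verify condition (6) for the N-point trapezoidal rule on the unit circle.
import numpy as np

N = 32
theta = 2 * np.pi * (np.arange(N) + 0.5) / N
z = np.exp(1j * theta)                 # quadrature points z_j
w = np.exp(1j * theta) / N             # weights omega_j
print(np.sum(w / z))                                       # k = -1: equals 1
print(max(abs(np.sum(w * z**k)) for k in range(N - 1)))    # k = 0..N-2: ~1e-16
```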

The algorithm of the block SS–RR method with numerical integration consists of the following three steps:

  1. Step 1.

    Solve N linear systems with L right-hand sides of the form:

    $$\displaystyle{ (z_{j}B - A)W_{j} = BV,\quad j = 1,2,\ldots,N. }$$
    (7)
  2. Step 2.

    Construct the matrix \(\widehat{S}\) by (4) and (5), where \(\widehat{S}_{k}\) can be rewritten using \(W_{j}\) as follows:

    $$\displaystyle{ \widehat{S}_{k} =\sum _{ j=1}^{N}\omega _{ j}z_{j}^{k}W_{ j},\quad k = 0,1,\ldots,M - 1. }$$
    (8)
  3. Step 3.

    Compute approximate eigenpairs by the Rayleigh–Ritz procedure as follows. Solve

    $$\displaystyle{ Q^{\mathrm{H}}AQ\boldsymbol{u}_{ i} =\theta _{i}Q^{\mathrm{H}}BQ\boldsymbol{u}_{ i}, }$$

    and \((\widehat{\lambda }_{i},\widehat{\boldsymbol{x}}_{i}) = (\theta _{i},Q\boldsymbol{u}_{i})\), where \(Q = \mathrm{orth}(\widehat{S})\).

The algorithm of the block SS–RR method is summarized as Algorithm 1.

Algorithm 1 The block SS–RR method
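Since the algorithm listing itself is not reproduced here, the following is a minimal NumPy/SciPy sketch of Algorithm 1 under stated assumptions: dense A and B, Γ a circle with center gamma and radius rho discretized by the N-point trapezoidal rule, and the SVD-based low-rank truncation mentioned below. The function names are illustrative and are not the z-pares API.

```python
# A minimal sketch of the block SS-RR method (Algorithm 1), assuming dense
# matrices and a circular contour; not the authors' implementation.
import numpy as np
import scipy.linalg as sla

def trapezoidal_rule(gamma, rho, N):
    """Quadrature points z_j and weights w_j on |z - gamma| = rho."""
    theta = 2 * np.pi * (np.arange(N) + 0.5) / N
    z = gamma + rho * np.exp(1j * theta)
    w = rho * np.exp(1j * theta) / N      # absorbs the 1/(2*pi*i) factor
    return z, w

def solve_systems(A, B, z, V):
    """Step 1: solve (z_j B - A) W_j = B V at each quadrature point."""
    return [np.linalg.solve(zj * B - A, B @ V) for zj in z]

def rayleigh_ritz(A, B, W, z, w, M, tol=1e-12):
    """Steps 2-3: form S_hat = [S_0, ..., S_{M-1}] and extract Ritz pairs."""
    S = np.hstack([sum(wj * zj**k * Wj for zj, wj, Wj in zip(z, w, W))
                   for k in range(M)])
    Q, s, _ = np.linalg.svd(S, full_matrices=False)   # low-rank truncation
    Q = Q[:, s > tol * s[0]]
    theta_i, U = sla.eig(Q.conj().T @ A @ Q, Q.conj().T @ B @ Q)
    return theta_i, Q @ U

def block_ss_rr(A, B, gamma, rho, L=10, M=4, N=32, rng=None):
    rng = rng or np.random.default_rng(0)
    V = rng.standard_normal((A.shape[0], L))          # random input matrix
    z, w = trapezoidal_rule(gamma, rho, N)
    return rayleigh_ritz(A, B, solve_systems(A, B, z, V), z, w, M)
```

Separating Steps 1 and 2–3 into solve_systems and rayleigh_ritz is convenient for the fault-injection sketch in Sect. 4.1.1.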

In practice, in order to reduce the computational costs and to improve accuracy, the matrix \(\widehat{S}\) is replaced with a low-rank approximation obtained from its singular value decomposition. Moreover, \(z_{j}^{k}\) is scaled to improve numerical stability. For details, refer to [5, 15].

The block SS–RR method has several parameters, such as L, M, and N, and these parameters strongly affect the performance of the method. In the current version of the block SS–RR software, z-pares ver. 0.9.6a [19], N = 32 and M = 16 are used as the default parameters. The parameter L is usually set such that LM = 2m, where m is the number of target eigenvalues in Ω. The optimal parameters depend on the eigenvalue distribution, the required accuracy, the computational environment, and so on. For details on how to set the parameters to achieve good performance, refer to [15].

3.2 Error Bounds of the Block SS–RR Method

In [9], error bounds of the block SS–RR method were analyzed. Here, we briefly introduce the results.

Let the matrix pencil \(zB - A\) be diagonalizable, i.e.,

$$\displaystyle{ Y ^{-1}(zB-A)X = z\left [\begin{array}{cc} I_{r}& \\ &O_{n-r} \end{array} \right ]-\left [\begin{array}{cc} \varLambda _{r}& \\ &I_{n-r} \end{array} \right ], }$$

where \(\varLambda _{r}:= \mathrm{diag}(\lambda _{1},\lambda _{2},\ldots,\lambda _{r})\) is a diagonal matrix, and \(Y ^{-1}:= [\,\widetilde{\boldsymbol{y}}_{1},\widetilde{\boldsymbol{y}}_{2},\ldots,\widetilde{\boldsymbol{y}}_{n}]^{\mathrm{H}}\) and \(X:= [\boldsymbol{x}_{1},\boldsymbol{x}_{2},\ldots,\boldsymbol{x}_{n}]\) are nonsingular matrices. The generalized eigenvalue problem \(A\boldsymbol{x}_{i} =\lambda _{i}B\boldsymbol{x}_{i}\) has \(r:= \mathrm{rank}(B)\) finite eigenvalues \(\lambda _{1},\lambda _{2},\ldots,\lambda _{r}\) and \(n - r\) infinite eigenvalues. The vectors \(\widetilde{\boldsymbol{y}}_{i}\) and \(\boldsymbol{x}_{i}\) are the corresponding left and right eigenvectors, respectively. The filter function

$$\displaystyle{ f(\lambda _{i}):=\sum _{ j=1}^{N} \frac{\omega _{j}} {z_{j} -\lambda _{i}}, }$$
(9)

is commonly used for analysis of the complex moment-based eigensolvers [6, 16, 17]. Using this filter function, the matrix \(\widehat{S}\) can be written as

$$\displaystyle{ \widehat{S} = \left (X_{r}f(\varLambda _{r})\widetilde{X}_{r}^{\mathrm{H}}\right )[V,CV,\ldots,C^{M-1}V ],\quad C:= X_{ r}\varLambda _{r}\widetilde{X}_{r}^{\mathrm{H}}, }$$

where \(\varLambda _{r}:= \mathrm{diag}(\lambda _{1},\lambda _{2},\ldots,\lambda _{r}),X_{r}:= [\boldsymbol{x}_{1},\boldsymbol{x}_{2},\ldots,\boldsymbol{x}_{r}],\widetilde{X}_{r}:= [\,\widetilde{\boldsymbol{x}}_{1},\widetilde{\boldsymbol{x}}_{2},\ldots,\widetilde{\boldsymbol{x}}_{r}]\) and \(X^{-1} =\widetilde{ X}^{\mathrm{H}} = [\,\widetilde{\boldsymbol{x}}_{1},\widetilde{\boldsymbol{x}}_{2},\ldots,\widetilde{\boldsymbol{x}}_{n}]^{\mathrm{H}}\). The error bound of the block SS–RR method in [9] can be simplified under some assumption on V as follows.

Theorem 2

Let \((\lambda _{i},\boldsymbol{x}_{i})\) be the exact eigenpairs of the matrix pencil \(zB - A\). Assume that the \(f(\lambda _{i})\) are ordered in decreasing order of magnitude, \(\vert f(\lambda _{i})\vert \geq \vert f(\lambda _{i+1})\vert\). Define \(\mathscr{P}\) as the orthogonal projector onto the subspace \(\mathscr{R}(\widehat{S})\) . Then, we have

$$\displaystyle{ \|(I -\mathscr{P})\boldsymbol{x}_{i}\|_{2} \leq \alpha \beta _{i}\left \vert \frac{f(\lambda _{LM+1})} {f(\lambda _{i})} \right \vert, }$$

where α = ∥X2X −12 , β i depends on the angle between the subspace \(\mathscr{K}_{M}^{\square }(C,V )\) and each eigenvector \(\boldsymbol{x}_{i}\) .

Moreover, in [9], an error bound has also been derived for the case in which the solution of the linear system at the \(j'\)-th quadrature point is contaminated as follows:

$$\displaystyle{ (z_{j^{{\prime}}}B - A)^{-1}BV + E, }$$
(10)

where \(E \in \mathbb{C}^{n\times L}\) is an error matrix with \(\mathrm{rank}(E) = L' \leq L\). Because of the contaminated solution (10), the matrix \(\widehat{S}\) is also contaminated; we denote the contaminated matrix by \(\widehat{S}^{{\prime}}\). The error bound of the block SS–RR method with the contaminated matrix in [9] can also be simplified under some assumption on V as follows.

Theorem 3

Let \((\lambda _{i},\boldsymbol{x}_{i})\) be the exact eigenpairs of the matrix pencil (A, B). Assume that the \(f(\lambda _{i})\) are ordered in decreasing order of magnitude, \(\vert f(\lambda _{i})\vert \geq \vert f(\lambda _{i+1})\vert\). Define \(\mathscr{P}^{{\prime}}\) as the orthogonal projector onto the subspace \(\mathscr{R}(\widehat{S}^{{\prime}})\) . Then, we have

$$\displaystyle{ \|(I -\mathscr{P}^{{\prime}})\boldsymbol{x}_{ i}\|_{2} \leq \alpha \beta _{i}^{{\prime}}\left \vert \frac{f(\lambda _{LM-L^{{\prime}}+1})} {f(\lambda _{i})} \right \vert, }$$

where α = ∥X2X −12 , β i depends on the error matrix E and the angle between the subspace \(\mathscr{K}_{M}^{\square }(C,V )\) and each eigenvector \(\boldsymbol{x}_{i}\) .

Here, we note that the value \(\beta _{i}^{{\prime}}\) is not equal to \(\beta _{i}\), since \(\beta _{i}^{{\prime}}\) depends on the error matrix E and the contaminated quadrature point \(j'\). \(\beta _{i}^{{\prime}}\) may become larger for \(\lambda _{i}\) near \(z_{j^{{\prime}}}\) than for other eigenvalues, specifically in the case where \(L' = L\). For more details on these theorems, refer to [9].

3.3 Parallel Implementation of the Block SS–RR Method

The most time-consuming part of the block SS–RR method is solving the N linear systems with L right-hand sides (7) in Step 1. For solving these linear systems, the block SS–RR method has hierarchical parallelism; see Fig. 1.

Fig. 1 Hierarchical structure of the block SS–RR method

  1. Layer 1.

    Computations for different contour paths can be performed independently.

  2. Layer 2.

    The linear systems can be solved independently.

  3. Layer 3.

    Each linear system can be solved in parallel.

By mapping the hierarchical structure of the algorithm onto the hierarchical structure of the architecture, the block SS–RR method is expected to achieve high scalability.

Because Layer 1 can be implemented completely without communication, here we describe a basic parallel implementation of the block SS–RR method for one contour path. Let P be the number of MPI processes used for one contour path. Here, we assume mod(P, N) = 0 for simplicity and consider a two-dimensional process grid, i.e., \(p_{i,j}\), i = 1, 2, …, P/N, j = 1, 2, …, N. Then, we define MPI sub-communicators over the N MPI processes \(p_{i,j}\), j = 1, 2, …, N, as mpi_comm_row(i) (i = 1, 2, …, P/N), and over the P/N MPI processes \(p_{i,j}\), i = 1, 2, …, P/N, as mpi_comm_col(j) (j = 1, 2, …, N); see Fig. 2.

Fig. 2 Processes grid and MPI sub-communicators
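A hedged mpi4py sketch of this process grid and the two families of sub-communicators is given below; the value of N and the variable names are assumptions for illustration.

```python
# Sketch of the (P/N) x N process grid and sub-communicators; assumes P % N == 0.
from mpi4py import MPI

comm = MPI.COMM_WORLD
P, rank = comm.Get_size(), comm.Get_rank()
N = 4                          # number of quadrature-point groups (assumption)
assert P % N == 0              # mod(P, N) = 0
i, j = rank // N, rank % N     # this process is p_{i,j}
comm_row = comm.Split(color=i, key=j)   # mpi_comm_row(i): N processes
comm_col = comm.Split(color=j, key=i)   # mpi_comm_col(j): P/N processes
```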

3.3.1 Parallel Implementation for Step 1

In Step 1, we need to solve the N linear systems (7). Because these systems are mutually independent across the quadrature points j, we can solve all N of them simultaneously. Each linear system \((z_{j}B - A)W_{j} = BV\) is solved in parallel by some parallel linear solver on the MPI sub-communicator mpi_comm_col(j).

In this implementation, the coefficient matrices A, B and the input matrix V need to be distributed over the P/N MPI processes of each MPI sub-communicator mpi_comm_col(j), i.e., with N-fold redundancy. As a result, each solution W_j of the linear system is also distributed over the P/N MPI processes of the MPI sub-communicator mpi_comm_col(j).

3.3.2 Parallel Implementation for Step 2

Let \(W_{j}^{(i)}\) (i = 1, 2, …, P/N, j = 1, 2, …, N) be the distributed sub-matrices of \(W_{j}\), where \(W_{j}^{(i)}\) is stored by the MPI process \(p_{i,j}\). Then, to construct the matrix \(\widehat{S}_{k}\) in (8), we independently compute

$$\displaystyle{ W_{j,k}^{(i)} =\omega _{ j}z_{j}^{k}W_{ j}^{(i)} }$$

in each MPI process without communication. Then, we perform mpi_allreduce on the MPI sub-communicator mpi_comm_row(i) with N MPI processes, in P/N-way parallel, as follows:

$$\displaystyle{ \widehat{S}_{k}^{\,(i)} =\sum _{ j=1}^{N}W_{ j,k}^{(i)},\quad k = 0,1,\ldots,M - 1. }$$

We also set

$$\displaystyle{ \widehat{S}^{(i)} = [\widehat{S}_{ 0}^{(i)},\widehat{S}_{ 1}^{(i)},\ldots,\widehat{S}_{ M-1}^{(i)}], }$$

where \(\widehat{S}^{(i)}\) is the sub-matrix of \(\widehat{S}\) that is redundantly stored by the MPI processes \(p_{i,j}\), j = 1, 2, …, N. In this implementation, the matrix \(\widehat{S}\) is distributed over the P/N sub-communicators mpi_comm_row(i), with N-fold redundancy.
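The reduction over the quadrature index j can be sketched with mpi4py as follows; the local block, quadrature data, and sizes are placeholders standing in for the distributed solves of (7).

```python
# Sketch of the Step-2 reduction: each process scales its local block W_j^{(i)}
# and an Allreduce over mpi_comm_row(i) sums over the quadrature index j.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
N = 4                                         # as in the grid sketch above
i, j = comm.Get_rank() // N, comm.Get_rank() % N
comm_row = comm.Split(color=i, key=j)

n_local, L, M = 100, 10, 4                    # local rows, block size, moments
theta = 2 * np.pi * (np.arange(N) + 0.5) / N
z = np.exp(1j * theta)                        # unit-circle quadrature points
w = np.exp(1j * theta) / N                    # trapezoidal weights
W_local = np.ones((n_local, L), dtype=complex)  # placeholder for W_j^{(i)}

blocks = []
for k in range(M):
    Wk = (w[j] * z[j]**k) * W_local           # W_{j,k}^{(i)}: no communication
    Sk = np.empty_like(Wk)
    comm_row.Allreduce(Wk, Sk, op=MPI.SUM)    # S_k^{(i)} = sum_j W_{j,k}^{(i)}
    blocks.append(Sk)
S_local = np.hstack(blocks)                   # S_hat^{(i)}, redundant over j
```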

3.3.3 Parallel Implementation for Step 3

We have two choices for the parallel implementation of Step 3. The first choice is that all P MPI processes perform the orthogonalization of \(\widehat{S}\) and the Rayleigh–Ritz procedure. This choice puts all available MPI processes to work; however, it requires redistribution of the matrices A, B, and \(\widehat{S}\).

The second choice is that only the P/N MPI processes in one MPI sub-communicator mpi_comm_col(j) perform this calculation. In this case, only P/N MPI processes work and the others are redundant; however, no redistribution of the matrices A, B, and \(\widehat{S}\) is required.

4 An Error Resilience Strategy of the Block SS–RR Method

With the recent development of high-performance computers, system scales are drastically increasing. In this situation, fault management is considered to play an important role in large-scale applications. Faults can be classified into hardware faults and software faults. Here, we focus on software faults such as bit flips.

The most standard software fault tolerance techniques are checkpointing techniques. Checkpointing saves all correct data at some interval; if a fault is detected, the computation restarts from the last correct data. These techniques are efficient when the data to be saved is small and the interval between checkpoints is short. On the other hand, a large data size causes large I/O costs, and a long interval causes large recomputation costs when a fault occurs.

Replication techniques are also very basic software fault tolerance techniques. The basic idea is as follows. Let P be the number of available MPI processes and K be the number of redundancies. First, we split the MPI communicator into groups of P/K MPI processes. Replication restricts the parallelism to P/K, i.e., the calculation is performed independently by the P/K MPI processes in each MPI sub-communicator. Then, the correct solution is selected from the K solutions by, e.g., a majority vote. These techniques are efficient when the number of MPI processes is so large that the target calculation no longer scales well. However, if the target calculation scales well, replication largely increases the execution time even if no fault occurs.

In this section, we consider an error resilience strategy for the block SS–RR method that can use all the MPI processes in the most time-consuming part, i.e., solving the N linear systems (7) in Step 1, and that avoids re-solving them even if a fault occurs. Here, we assume the following software fault model:

  • Let \(a \in \mathbb{F}\) be the correct value, where \(\mathbb{F}\) is the set of floating-point numbers. The fault appears as a numerical error as follows:

    $$\displaystyle{ a^{{\prime}}\leftarrow a + e,\quad e \in \mathbb{F}, }$$
    (11)

    where \(a^{{\prime}}\in \mathbb{F}\) is the contaminated value. Here, a, \(a^{{\prime}}\), and e are not “Inf” or “NaN”.

  • Unlike hardware faults, the remaining calculations are performed correctly, albeit with the contaminated values.
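For concreteness, a single bit flip in the significand of an IEEE 754 double realizes the model (11); the following snippet, with the hypothetical helper flip_bit, shows one way to emulate such a fault in software.

```python
# Emulating the fault model (11) by flipping one bit of a double's
# significand; the result stays finite (no Inf/NaN) for significand bits.
import struct

def flip_bit(a: float, bit: int) -> float:
    """Return a with the given bit of its IEEE-754 representation flipped."""
    (u,) = struct.unpack("<Q", struct.pack("<d", a))
    (b,) = struct.unpack("<d", struct.pack("<Q", u ^ (1 << bit)))
    return b

a = 0.131
a_prime = flip_bit(a, 40)   # flip one significand bit (bits 0-51)
e = a_prime - a             # the numerical error e in (11)
```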

4.1 Error Resilience Strategy

As shown in Sect. 3, the algorithm of the block SS–RR method and its parallel implementation can be divided into three steps: solving the linear systems, the numerical integration and the Rayleigh–Ritz procedure. Here, we consider error resilience of each step.

4.1.1 Error Resilience Strategy for Step 1

Step 1 is the most time-consuming and also the most scalable part of the block SS–RR method. Therefore, standard checkpointing and replication techniques may not be efficient in terms of computational cost. Hence, we introduce an alternative strategy.

When a fault occurs in Step 1, some value(s) in the calculation are replaced as in (11). The contamination then propagates to all MPI processes in the same MPI sub-communicator via communication. As a result, the solution of the linear system is replaced as

$$\displaystyle{ W_{j^{{\prime}}}^{{\prime}}\leftarrow W_{ j^{{\prime}}} + E,\quad E \in \mathbb{F}^{n\times L},\quad \mathrm{rank}(E) = L, }$$
(12)

when the fault occurs in an MPI process \(p_{i,j^{{\prime}}}\) associated with the \(j'\)-th linear system.

Here, we reconsider Theorems 2 and 3. Theorem 2 implies that the error bound of the block SS–RR method is governed by the ratio of the magnitude of the filter function \(\vert f(\lambda _{i})\vert\) to the (LM + 1)-th largest value \(\vert f(\lambda _{LM+1})\vert\). The magnitude of the filter function \(\vert f(\lambda )\vert\) of the N-point trapezoidal rule with N = 16, 32, 64 for the unit circle region Ω is shown in Fig. 3. The filter function satisfies \(\vert f(\lambda )\vert \approx 1\) inside the region Ω, \(\vert f(\lambda )\vert \approx 0\) far from the region, and \(0 < \vert f(\lambda )\vert < 1\) outside but near the region. Because of Theorem 2 and this behavior of the filter function, we usually set the subspace size LM such that \(\vert f(\lambda _{LM+1})\vert \approx 0\) to compute the target eigenpairs \((\lambda _{i},\boldsymbol{x}_{i}),\lambda _{i} \in \varOmega\), with high accuracy.

Fig. 3 Magnitude of the filter function \(\vert f(\lambda )\vert\) of the N-point trapezoidal rule with N = 16, 32, 64 for the unit circle region Ω. (a) On the real axis for N = 16, 32, 64. (b) On the complex plane for N = 32
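The behavior shown in Fig. 3 is easy to reproduce: for these midpoint trapezoidal points and weights on the unit circle, the filter (9) reduces to \(f(\lambda ) = 1/(1 +\lambda ^{N})\), as the following illustrative snippet confirms.

```python
# Evaluate the filter function (9) for the N-point trapezoidal rule on the
# unit circle: ~1 inside Omega, <1 just outside, ~0 far away (cf. Fig. 3).
import numpy as np

N = 32
theta = 2 * np.pi * (np.arange(N) + 0.5) / N
z = np.exp(1j * theta)            # quadrature points z_j
w = np.exp(1j * theta) / N        # trapezoidal weights

def f(lam):
    return np.sum(w / (z - lam))  # filter function (9)

for lam in [0.0, 0.9, 1.1, 2.0]:
    print(lam, abs(f(lam)), abs(1 / (1 + lam**N)))  # ~1, ~0.97, ~0.045, ~2e-10
```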

Analogously, Theorem 3 implies that the accuracy of the block SS–RR method with the contaminated solution is governed by the ratio of \(\vert f(\lambda _{i})\vert\) to the (LM − L′ + 1)-th largest value \(\vert f(\lambda _{LM-L'+1})\vert\); in particular, Theorem 3 covers the case in which a fault occurs in Step 1 as in (12), where L′ = L. Therefore, to prepare for a fault in Step 1, we simply set the subspace size LM such that \(\vert f(\lambda _{LM-L+1})\vert \approx 0\) in order to obtain the eigenpairs with high accuracy.

Here, we note that, when multiple faults occur at different quadrature points, i.e.,

$$\displaystyle{ W_{j_{1}^{{\prime}}}^{{\prime}}\leftarrow W_{ j_{1}^{{\prime}}}+E_{1},\quad W_{j_{2}^{{\prime}}}^{{\prime}}\leftarrow W_{ j_{2}^{{\prime}}}+E_{2},\quad E_{1},E_{2} \in \mathbb{F}^{n\times L},\quad \mathrm{rank}(E_{ 1}) = \mathrm{rank}(E_{2}) = L, }$$

then we can handle the faults in Step 1 by setting a larger subspace size LM such that \(\vert f(\lambda _{LM-2L+1})\vert \approx 0\).

This is our error resilience strategy for Step 1; it makes it possible to use all MPI processes for solving the N linear systems (7) and avoids re-solving them even if a fault occurs. A minimal sketch is given below.
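The sketch reuses trapezoidal_rule, solve_systems, and rayleigh_ritz from the Sect. 3.1 sketch; the random symmetric pencil, the faulty index, and the parameter values are assumptions for demonstration only.

```python
# Fault-injection sketch: contaminate one solve as in (12), then compensate
# with a larger M. Requires the helper functions from the Sect. 3.1 sketch.
import numpy as np

rng = np.random.default_rng(1)
n, L, N = 200, 8, 32
G = rng.standard_normal((n, n))
A = (G + G.T) / 2                           # random symmetric test matrix
B = np.eye(n)
V = rng.standard_normal((n, L))

z, w = trapezoidal_rule(0.0, 1.0, N)        # Gamma: unit circle
W = solve_systems(A, B, z, V)               # Step 1
W[3] = W[3] + rng.standard_normal((n, L))   # rank-L contamination (12), j' = 4
# Without fault, M is chosen with |f(lambda_{LM+1})| ~ 0; with one fault we
# enlarge M so that |f(lambda_{LM-L+1})| ~ 0 instead.
vals, vecs = rayleigh_ritz(A, B, W, z, w, M=5)
```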

4.1.2 Error Resilience Strategy for Step 2

The computational cost of Step 2 is very small, and the data size is not exorbitantly large. Therefore, we can apply a checkpointing technique to Step 2 at small additional cost.

4.1.3 Error Resilience Strategy for Step 3

As noted in Sect. 3.3, we have two choices for the implementation of Step 3: use all processes with redistribution, or replicate without redistribution. If the number of processes P is small enough that this part still scales well, the first choice is better in terms of computational cost. Otherwise, the second choice is better because it avoids the cost of redistribution. In practice, we want to increase the number of processes P as long as solving the N linear systems, the most time-consuming part, scales well, and the computation of the N independent linear systems is expected to scale better than the orthogonalization and the Rayleigh–Ritz procedure. Hence, we usually employ the second choice.

Therefore, we can apply a replication technique to Step 3 with no additional cost.

4.2 A Possible Extension to Other Complex Moment-Based Eigensolvers

In Sect. 4.1, we proposed an error resilience strategy for the block SS–RR method based on the error analysis in [9]. Here, we consider the possibility of extending our strategy to other complex moment-based eigensolvers.

The proposed error resilience strategy is mainly based on Theorem 3 for the block SS–RR method. Theorems similar to Theorem 3 could be derived for other complex moment-based eigensolvers. One of the most important aspects of Theorem 3 is that the subspace size LM should be larger than the rank L′ of the error matrix, i.e., LM > L′. When one linear solution is contaminated in the block SS–RR method with M ≥ 2, the condition LM > L ≥ L′ is always satisfied, which makes it possible to derive the proposed error resilience strategy.

We can expect that the proposed strategy also applies to other complex moment-based eigensolvers that use higher-order complex moments, such as the (block) SS–Hankel method and the block SS–Arnoldi method, because these methods with M ≥ 2 always satisfy the condition LM > L ≥ L′, just as the block SS–RR method does. However, more detailed analyses and numerical experiments are required.

On the other hand, the proposed strategy may have difficulty recovering errors for complex moment-based eigensolvers that use only low-order complex moments, such as the FEAST eigensolver [12, 17] and the Beyn method [3]. The subspace size of these methods is L, which equals the number of right-hand sides of the linear systems. This means that the rank of the error matrix can reach the subspace size in the worst case, in which case our strategy cannot recover the error.

5 Numerical Experiments

In this section, we experimentally evaluate the proposed error resilience strategy, specifically for Step 1.

5.1 Example I

For the first example, we apply the block SS–RR method, with and without a soft error in Step 1, to the following model problem

$$\displaystyle{ \begin{array}{c} A\boldsymbol{x}_{i} =\lambda \boldsymbol{x}_{i}, \\ A = \mathrm{diag}(0.01,0.11,0.21,\ldots,9.91) \in \mathbb{R}^{100\times 100}, \\ \lambda _{i} \in \varOmega = [-1,1],\end{array} }$$

and evaluate its accuracy.
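For illustration, this model problem can be run directly with the block_ss_rr sketch from Sect. 3.1 (which uses its own random V rather than the MATLAB setup described below):

```python
# Applying the block_ss_rr sketch to the model problem with Gamma the unit
# circle; illustrative only, not the experiment reported in this section.
import numpy as np

A = np.diag(np.arange(0.01, 9.92, 0.1))   # the 100 x 100 model matrix
B = np.eye(100)
vals, vecs = block_ss_rr(A, B, gamma=0.0, rho=1.0, L=10, M=4, N=32)
inside = np.abs(vals) < 1                  # keep Ritz values in Omega
print(np.sort(vals[inside].real))          # ~0.01, 0.11, ..., 0.91
```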

We evaluate the relation between the accuracy and the subspace size LM. To this end, we fixed the parameters L = 10 and N = 32, and tested four cases, M = 1, 2, 3, 4 (LM = 10, 20, 30, 40). For this example, we set Γ as the unit circle and the quadrature points as

$$\displaystyle{ z_{j} =\cos (\theta _{j}) + \mathrm{i}\sin (\theta _{j}),\quad \theta _{j} = \frac{2\pi } {N}\left (j -\frac{1} {2}\right ) }$$

for j = 1, 2, …, N. We let a fault occur at one of the following quadrature points:

$$\displaystyle{ z_{j^{{\prime}}} = \left \{\begin{array}{l} z_{1} =\cos \left ( \frac{\pi }{32}\right ) + \mathrm{i}\sin \left ( \frac{\pi }{32}\right )\\ \\ z_{8} =\cos \left (\frac{15\pi } {32}\right ) + \mathrm{i}\sin \left (\frac{15\pi } {32}\right )\\ \\ z_{16} =\cos \left (\frac{31\pi } {32}\right ) + \mathrm{i}\sin \left (\frac{31\pi } {32}\right ) \end{array} \right. }$$

The algorithm was implemented in MATLAB R2014a. The input matrix V and the error matrix E were set as different random matrices generated by the Mersenne Twister in MATLAB, and each linear system was solved by the MATLAB command “∖”.

We show in Table 1 the relation between the subspace size LM and the minimum and maximum values of \(\|\boldsymbol{r}_{i}\|_{2}\) over \(\lambda _{i} \in \varOmega\). Table 1(a) is for the case without fault, and Table 1(b)–(d) are for the cases in which a fault occurs in Step 1. We also show in Fig. 4 the residual 2-norm \(\|\boldsymbol{r}_{i}\|_{2}:=\| A\boldsymbol{x}_{i} -\lambda _{i}B\boldsymbol{x}_{i}\|_{2}/\|\boldsymbol{x}_{i}\|_{2}\) for the block SS–RR method with and without fault.

Fig. 4 Accuracy of the block SS–RR method with L = 10, M = 4, N = 32 when a fault occurs in Step 1. (a) Fault occurs at \(z_{1}\). (b) Fault occurs at \(z_{8}\). (c) Fault occurs at \(z_{16}\)

Table 1 Relation of accuracy of the block SS–RR method with LM when fault occurs in Step 1

Table 1 shows that \(\min _{\lambda _{i}\in \varOmega }\|\boldsymbol{r}_{i}\|_{2}\) has approximately the same order as \(\vert f(\lambda _{LM+1})\vert\) in the case without fault and as \(\vert f(\lambda _{LM-L+1})\vert\) when a fault occurs in Step 1. Moreover, Fig. 4 shows that a sufficiently large subspace size (LM = 40 in this example) provides equally high accuracy regardless of the fault in Step 1.

5.2 Example II

For the second example, we apply the block SS–RR method, with and without a soft error in Step 1, to the generalized eigenvalue problem AUNW9180 from the ELSES matrix library [4]. The coefficient matrices A and B are 9180-dimensional real sparse symmetric matrices, and B is also positive definite. We consider finding all eigenpairs \((\lambda _{i},\boldsymbol{x}_{i}),\lambda _{i} \in \varOmega = [0.119,0.153]\). In this region, there exist 99 eigenvalues.

We set Γ as the ellipse (center: 0.131, semi-major axis: 0.012 and semi-minor axis: 0.0012), and the quadrature points as

$$\displaystyle\begin{array}{rcl} & & z_{j} = 0.131 + 0.012\left (\cos (\theta _{j}) + 0.1\mathrm{i}\sin (\theta _{j})\right ), {}\\ & & \theta _{j} = \frac{2\pi } {N}\left (j -\frac{1} {2}\right ) {}\\ \end{array}$$

for j = 1, 2, …, N. We also set the parameters as L = 25, M = 8, N = 32 for the case without fault and as L = 25, M = 10, N = 32 for the case in which a fault occurs in Step 1.

The input matrix V and the error matrix E were set as different random matrices generated by the Mersenne Twister, and each linear system was solved by “cluster_sparse_solver” in Intel MKL. Here, we note that, in this numerical experiment, we solved only the N∕2 linear systems with multiple right-hand sides for j = 1, 2, …, N∕2, because the solution \(W_{N+1-j}\) can be constructed from \(W_{j}\) using a symmetry property of the problem.
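The symmetry exploited here is easy to verify: for real A, B, and V, the solution at the conjugate quadrature point is the complex conjugate of the solution at \(z_{j}\). A small standalone check under these assumptions:

```python
# For real A, B, V: (conj(z) B - A)^{-1} B V = conj((z B - A)^{-1} B V),
# so only half the quadrature points need an actual solve.
import numpy as np

rng = np.random.default_rng(2)
n, L = 50, 3
G = rng.standard_normal((n, n))
A = G + G.T                        # real symmetric test matrix (assumption)
B = np.eye(n)
V = rng.standard_normal((n, L))
z = 0.131 + 0.012 * np.exp(0.3j)   # a quadrature point off the real axis

W = np.linalg.solve(z * B - A, B @ V)
W_conj = np.linalg.solve(np.conj(z) * B - A, B @ V)
assert np.allclose(W_conj, np.conj(W))
```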

The numerical experiments were carried out in double-precision arithmetic on 8 nodes of COMA at the University of Tsukuba. Each COMA node has two Intel Xeon E5-2670v2 (2.5 GHz) CPUs and two Intel Xeon Phi 7110P (61 cores) coprocessors. In this numerical experiment, we used only the CPU part. The algorithm was implemented in Fortran 90 and MPI, and was executed with 8 [nodes] × 2 [processes/node] × 8 [threads/process].

We show in Fig. 5 the residual 2-norm \(\|\boldsymbol{r}_{i}\|_{2}:=\| A\boldsymbol{x}_{i} -\lambda _{i}B\boldsymbol{x}_{i}\|_{2}/\|\boldsymbol{x}_{i}\|_{2}\) for the block SS–RR method with and without fault. This shows that, by increasing the subspace size LM, the block SS–RR method with fault can achieve approximately the same accuracy as in the case without fault.

Fig. 5 Accuracy of the block SS–RR method with and without fault in Step 1 for AUNW9180

Table 2 shows the computation time of the block SS–RR method without fault using 1–16 processes and the computation time of the block SS–RR method with fault using 16 processes. This result indicates that Step 1 of the block SS–RR method is the most time-consuming part. We can also observe that the proposed strategy recovers from software faults at very small additional computational cost.

Table 2 Computation time of the block SS–RR method with and without fault

6 Conclusion

In this paper, we investigated an error resilience strategy for the Rayleigh–Ritz type complex moment-based parallel eigensolver (the block SS–RR method) for solving generalized eigenvalue problems. Based on the analysis of the error bounds of the method, we provided an error resilience strategy that does not require standard checkpointing or replication techniques in the most time-consuming and most scalable part. Our numerical experiments show that the strategy recovers from software faults such as bit flips at small additional cost.