1 Introduction

In [3], Erlangga and Nabben proposed a multilevel method for solving the linear system

$$\begin{array}{rcl} Ax = b,\quad A \in {\mathbb{R}}^{n\times n},& &\end{array}$$
(1)

which mimics a multigrid process but uses a Krylov method at each level instead of a smoother. Since an (optimal) Krylov method reduces, rather than smooths, errors in some norm, the underlying concept of this Krylov-based multilevel method is different from that of multigrid. Algorithm 4 shows the main body of the two-level version of a multilevel Krylov (MK) method, based on (flexible) GMRES [7].

Notice that a multigrid component is in play: the coarse-grid solve, which involves a coarse-grid matrix A c . In multigrid, this solve is associated with the reduction of slowly varying error components. The fine and coarse subspaces are connected by two transfer operators \(\mathit{I_{H }^{h}}\) and \(\mathit{I_{h}^{H }}\), which prolongate and restrict, respectively, certain quantities in the iteration process. In multigrid, these quantities are related to the errors of an approximate solution, obtained after a few smoothing steps or coarse-grid solves. In MK, these quantities are the vectors that build the approximation subspace.

From an abstract point of view, if the second-level problem is solved exactly, Algorithm 4 amounts to applying GMRES to the (right-)preconditioned linear system

$$\begin{array}{rcl} \mathit{AQ}_{N}\tilde{x} = b,\quad x = Q_{N}\tilde{x},& &\end{array}$$
(2)

with \(Q_{N} = I -\mathit{I_{H }^{h}}A_{c}^{-1}\mathit{I_{h}^{H }}A + \lambda _{n}\mathit{I_{H }^{h}}A_{c}^{-1}\mathit{I_{h}^{H }}\) and \(\lambda _{n} =\max \{ \vert \lambda _{i}\vert \}_{1\leq i\leq n}\), where the λ i are the eigenvalues of A. Q N is called the shift operator, due to the following spectral property [3].

Theorem 1.

Let A be symmetric positive definite and \(\sigma (A) =\{ \lambda _{1},\ldots,\lambda _{n}\},\) \(0 < \lambda _{i} \leq \lambda _{j}\) , for i < j. If \(\mathit{I_{H }^{h}} = [z_{1}\ldots z_{m}]\) , with \(Az_{i} = \lambda _{i}z_{i}\) , m < n, \(\mathit{I_{h}^{H }} = {(\mathit{I_{H }^{h}})}^{T}\) , and \(A_{c} = {(\mathit{I_{H }^{h}})}^{T}\mathit{AI_{H }^{h}}\) (Galerkin coarse grid), then \(\sigma (\mathit{AQ}_{N}) =\{ \lambda _{m+1},\ldots,\lambda _{n}\}\) , with λ n having multiplicity of at least m + 1.

Thus, under the assumptions of Theorem 1, Q N shifts the m smallest eigenvalues of A to λ n , without changing the rest, making the spectrum more clustered. The system (2) is therefore more favorable for a Krylov method than the original system (1).

The first two terms in Q N form the (right) deflation operator [6], denoted by Q D . The effects of Q N and Q D on A are spectrally equivalent in the following sense [3].

Theorem 2.

Let A be symmetric positive definite. Let \(Q_{D} = I -\mathit{I_{H }^{h}}A_{c}^{-1}\mathit{I_{h}^{H }}A.\)

$$\sigma (\mathit{AQ}_{D}) :=\{ 0,\mu _{m+1},\ldots,\mu _{n}\}\;\Longleftrightarrow\;\{\mu _{m+1},\ldots,\mu _{n},\lambda _{n}\} =: \sigma (\mathit{AQ}_{N}).$$

Furthermore, if \(P_{D} = Q_{D}^{T}\) and \(P_{N} = I -\mathit{AI_{H }^{h}}A_{c}^{-1}\mathit{I_{h}^{H }} + \lambda _{n}\mathit{I_{H }^{h}}A_{c}^{-1}\mathit{I_{h}^{H }}\) , then \(\sigma (\mathit{P_{D}A}) = \sigma (\mathit{AQ}_{D})\) and \(\sigma (\mathit{P_{N }A}) = \sigma (\mathit{AQ}_{N})\) , and the same equivalence holds.

Theorem 2 can be easily extended to a more general class of matrices A.
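As an illustration of Theorems 1 and 2, the following sketch (not part of the original experiments; a small random SPD matrix, NumPy, and an eigenvector-based \(\mathit{I_{H}^{h}}\) are assumed) builds Q D and Q N and checks the stated spectra numerically.

```python
# Sketch only: verify Theorems 1 and 2 numerically for a small SPD matrix,
# using the m smallest eigenvectors of A as the columns of I_Hh.
import numpy as np

rng = np.random.default_rng(0)
n, m = 12, 4

B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)            # SPD test matrix (an assumption)
lam, V = np.linalg.eigh(A)             # eigenvalues in ascending order

I_Hh = V[:, :m]                        # prolongation: m smallest eigenvectors
I_hH = I_Hh.T                          # restriction = transpose
A_c = I_hH @ A @ I_Hh                  # Galerkin coarse-grid matrix
lam_n = lam[-1]                        # largest eigenvalue of A

C = I_Hh @ np.linalg.solve(A_c, I_hH)  # I_Hh A_c^{-1} I_hH
Q_D = np.eye(n) - C @ A                # deflation operator
Q_N = Q_D + lam_n * C                  # shift operator

sig_N = np.sort(np.linalg.eigvals(A @ Q_N).real)
sig_D = np.sort(np.linalg.eigvals(A @ Q_D).real)

# Theorem 1: sigma(A Q_N) = {lam_{m+1}, ..., lam_n}, lam_n with multiplicity m+1.
print(np.allclose(sig_N, np.r_[lam[m:-1], [lam_n] * (m + 1)]))
# Theorem 2: sigma(A Q_D) has m zeros plus the same nonzero part (mu_i = lam_i here).
print(np.allclose(sig_D[:m], 0, atol=1e-8), np.allclose(sig_D[m:], lam[m:]))
```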

In this short note, we present observed convergence behaviors of this method for some relatively simple problems. We base the presentation on the two-level Krylov method (Algorithm 4), which represents the best performance attainable in terms of the number of iterations for a given multilevel Krylov setup.

Throughout this note, σ( ⋅), λ( ⋅), κ( ⋅), \(\mathcal{R}(\cdot )\), and \(\mathcal{N}(\cdot )\) denote respectively the spectrum, eigenvalue, condition number, range, and null space of the argument.

2 Spectral Properties and Observed Convergence

In this section, we shall consider a hypothetical problem: a diagonal matrix \(A = \mathit{diag}(1,2,\ldots,n)\). In this case, \(\sigma (A) =\{ 1,2,\ldots,n\}\), the eigenvectors are \(v_{i} = \alpha _{i}\mathbf{e}_{i}\) with \(\alpha _{i}\neq 0\), and for any b = [b i ], the linear system (1) has the solution \(x = [x_{i}]_{1\leq i\leq n} = [b_{i}/a_{ii}]\).

2.1 Eigenvectors

Let \(\mathit{I_{H }^{h}} = [\mathbf{e}_{i}]_{1\leq i\leq m}\), \(\mathit{I_{h}^{H }} = {(\mathit{I_{H }^{h}})}^{T}\), and set \(A_{c} = I_{h}^{H}AI_{H}^{h}\) (Galerkin-type coarse-grid matrix). In this case, according to Theorems 1 and 2, \(\sigma (P_{N}A) =\{ m + 1,\ldots,n\}\), with λ n  = n having multiplicity m + 1. Furthermore, \(\sigma (\mathit{P_{D}A}) =\{ 0,m + 1,\ldots,n\}\). As \(\mathit{P_{D}A}\) is symmetric positive semi-definite, \(\kappa _{\text{ eff}}(\mathit{P_{D}A}) = n/(m + 1) < n = \kappa (A)\).
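For this diagonal example, these statements are easy to check numerically; a minimal sketch (NumPy assumed, with an arbitrary small n and m) is:

```python
# Sketch only: the hypothetical problem A = diag(1,...,n) with I_Hh = [e_1 ... e_m].
import numpy as np

n, m = 10, 3
A = np.diag(np.arange(1.0, n + 1))
I_Hh = np.eye(n)[:, :m]                 # columns e_1, ..., e_m
I_hH = I_Hh.T
A_c = I_hH @ A @ I_Hh                   # Galerkin-type coarse-grid matrix
lam_n = float(n)

C = I_Hh @ np.linalg.solve(A_c, I_hH)   # I_Hh A_c^{-1} I_hH
P_N = np.eye(n) - A @ C + lam_n * C     # left shift-type preconditioner
P_D = (np.eye(n) - C @ A).T             # P_D = Q_D^T

print(np.sort(np.linalg.eigvals(P_N @ A).real))   # {m+1, ..., n}, n with multiplicity m+1
ev = np.sort(np.linalg.eigvals(P_D @ A).real)
print(ev[-1] / ev[m])                              # kappa_eff(P_D A) = n/(m+1)
```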

Let x 0 = 0, and consider the left-preconditioned version of the two-level Krylov method: \(\mathit{P_{N }A}x = P_{N}b\). The Krylov subspace after the k-th iteration is

$$\mathcal{K}_{k}(P_{N}A,r_{0}) = \text{ span}\{r_{0},P_{N}Ar_{0},\ldots,{(\mathit{P_{N }A})}^{k-1}r_{ 0}\}.$$

Straightforward computation leads to the following:

$$r_{0} = \lambda _{n}D_{m}^{-1}b + D_{n-m}^{0}b,\quad {(P_{N}A)}^{i}r_{0} = \lambda _{n}^{i+1}D_{m}^{-1}b + D_{n-m}^{i}b,\quad i = 1,\ldots,k-1,$$

with \(D_{m}^{-1} = \mathit{diag}(A_{m}^{-1}\,\,0)\) and \(D_{n-m} = \mathit{diag}(0\,\,A_{n-m})\), where \(A_{m}\) and \(A_{n-m}\) denote the leading \(m\times m\) and trailing \((n-m)\times (n-m)\) diagonal blocks of A, and \(D_{n-m}^{0} := \mathit{diag}(0\,\,I_{n-m})\). Thus,

$$\begin{array}{rcl} x_{k,N}& \in & \text{ span}\{\lambda _{n}D_{m}^{-1}b + D_{ n-m}^{0}b,\ \lambda _{n}^{2}D_{m}^{-1}b + D_{ n-m}b,\ldots,\lambda _{n}^{k}D_{ m}^{-1}b + D_{ n-m}^{k-1}b\} \\ & \subseteq & \text{ span}\{D_{m}^{-1}b,D_{ n-m}^{0}b,D_{ n-m}b,\ldots,D_{n-m}^{k-1}b\}.\end{array}$$

Consider the (deflated) CG iteration applied to \(P_{D}Ax = P_{D}b\). In this case, CG generates a sequence of approximate solutions, \(\{\tilde{x}_{k,D}\}\), to this singular linear system such that

$$\tilde{x}_{k,D} \in \text{ span}\{D_{n-m}^{0}b,D_{ n-m}b,\ldots,D_{n-m}^{k-1}b\}.$$

The solution of Ax = b is then constructed as follows: \(x = (I - P_{D}^{T})x + P_{D}^{T}x = \mathit{I_{H }^{h}}A_{c}^{-1}\mathit{I_{h}^{H }}b + P_{D}^{T}x\), implying the sequence \(x_{k,D} = I_{H}^{h}A_{c}^{-1}I_{h}^{H}b + P_{D}^{T}\tilde{x}_{k,D}\), and

$$\begin{array}{rcl} x_{k,D}& \in & I_{H}^{h}A_{ c}^{-1}I_{ h}^{H}b + \text{ span}\{P_{ D}^{T}D_{ n-m}^{0}b,P_{ D}^{T}D_{ n-m}b,\ldots,P_{D}^{T}D_{ n-m}^{k-1}b\} \\ & =& D_{m}^{-1}b + \text{ span}\{D_{ n-m}^{0}b,D_{ n-m}b,\ldots,D_{n-m}^{k-1}b\}.\end{array}$$

So, \(x_{k,N}\) and \(x_{k,D}\) are members of the same subspace. From [5], for every \(x_{k} \in \text{ span}\{D_{m}^{-1}b,D_{n-m}^{0}b,D_{n-m}b,\ldots,D_{n-m}^{k-1}b\} \equiv \mathcal{L}\),

$$\|x - x_{k,D}\|_{A}^{2} \leq \| x - x_{ k}\|_{A}^{2}.$$

Since \(x_{k,N} \in \mathcal{L}\),

$$\|x - x_{k,D}\|_{A}^{2} \leq \| x - x_{ k,N}\|_{A}^{2}.$$

This result suggests the superiority of the deflation technique over the shift operator, as shown in Figs. 1 and 2 for the hypothetical problem.
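The comparison behind Figs. 1 and 2 can be reproduced qualitatively with the sketch below. It is not the original experiment: the values of n, m, b, the number of iterations, and the use of a plain CG loop on the explicitly preconditioned systems are assumptions, exploiting that P N A is symmetric positive definite for this diagonal example.

```python
# Sketch only: deflated CG versus CG on P_N A x = P_N b for A = diag(1,...,n).
import numpy as np

n, m, iters = 200, 20, 25
a = np.arange(1.0, n + 1)               # A = diag(1, ..., n)
A = np.diag(a)
b = np.ones(n)                          # any right-hand side
x_exact = b / a
lam_n = a[-1]

def cg(M, rhs, iters):
    """Plain CG for a symmetric (semi-)definite, consistent system; returns the iterates."""
    x = np.zeros_like(rhs)
    r = rhs.copy()
    p = r.copy()
    xs = []
    for _ in range(iters):
        Mp = M @ p
        alpha = (r @ r) / (p @ Mp)
        x = x + alpha * p
        r_new = r - alpha * Mp
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
        xs.append(x.copy())
    return xs

# Deflation: for this example P_D = diag(0, I_{n-m}) in closed form;
# the deflated part of the solution is recovered by the correction term.
P_D = np.diag(np.r_[np.zeros(m), np.ones(n - m)])
corr = np.r_[b[:m] / a[:m], np.zeros(n - m)]        # I_Hh A_c^{-1} I_hH b
xs_D = [corr + P_D.T @ xt for xt in cg(P_D @ A, P_D @ b, iters)]

# Shift: P_N = diag(lam_n A_m^{-1}, I) in closed form; P_N A is SPD here, so plain CG applies.
P_N = np.diag(np.r_[lam_n / a[:m], np.ones(n - m)])
xs_N = cg(P_N @ A, P_N @ b, iters)

err = lambda x: np.sqrt((x_exact - x) @ (a * (x_exact - x)))   # A-norm of the error
# The differences below correspond to Fig. 2 and, by the bound above,
# should be nonnegative up to round-off.
print([round(err(xN) - err(xD), 6) for xD, xN in zip(xs_D, xs_N)])
```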

Fig. 1 Convergence history of CG with deflation P D (solid) and shift P N (dotted)

Fig. 2 Difference in the A-norm: \(\|x - x_{k,N}\|_{A} -\| x - x_{k,D}\|_{A}\)

A similar result also holds for (right-preconditioned) GMRES:

$$\|b - Ax_{k,D}\|_{2} \leq \| b - Ax_{k,N}\|_{2}.$$

The proof is omitted here and will be presented in another paper.

Fig. 3 One-dimensional (a) aggregation and (b) linear interpolation, with ∘ and ∙ indicating the fine and coarse nodes, respectively

2.2 Interpolation Matrices

Consider any sparse, full-rank interpolation matrix \(\mathit{I_{H }^{h}} \in {\mathbb{R}}^{n\times m_{1}}\), and let \(\mathcal{I}_{H}^{h} \in {\mathbb{R}}^{n\times m_{2}}\) be another full-rank matrix. The following theorem is proved in [5]:

Theorem 3.

Let \(A_{c,1}\,=\,{(\mathit{I_{H }^{h}})}^{T}\mathit{AI_{H }^{h}}\) and \(A_{c,2}\,=\,{(\mathcal{I}_{H}^{h})}^{T}A\mathcal{I}_{H}^{h}\) . Let \(P_{D,1}\,=\,I\,-\,\mathit{AI_{H }^{h}}A_{c,1}^{-1}{(\mathit{I_{H }^{h}})}^{T}\) , and similarly for P D,2 . If \(\mathcal{R}(\mathit{I_{H }^{h}}) \subseteq \mathcal{R}(\mathcal{I}_{H}^{h})\) , then

1. \(\lambda _{n}(P_{D,1}A) \geq \lambda _{n}(P_{D,2}A)\), and

2. \(\lambda _{m_{1}+1}(P_{D,1}A) \leq \lambda _{m_{2}+1}(P_{D,2}A)\).
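The two inequalities can be checked numerically; in the sketch below (a random SPD matrix and nested random subspaces are assumed, rather than the aggregation matrices used later), the inclusion \(\mathcal{R}(Z_{1}) \subseteq \mathcal{R}(Z_{2})\) is enforced by construction.

```python
# Sketch only: numerical check of the two inequalities of Theorem 3.
import numpy as np

rng = np.random.default_rng(1)
n, m1, m2 = 30, 4, 7

B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)             # SPD test matrix (an assumption)

Z2 = rng.standard_normal((n, m2))       # full-rank "interpolation" matrix
Z1 = Z2 @ rng.standard_normal((m2, m1)) # columns in R(Z2), hence R(Z1) inside R(Z2)

def deflated_spectrum(Z):
    A_c = Z.T @ A @ Z
    P_D = np.eye(n) - A @ Z @ np.linalg.solve(A_c, Z.T)
    return np.sort(np.linalg.eigvals(P_D @ A).real)   # P_D A is symmetric here

ev1, ev2 = deflated_spectrum(Z1), deflated_spectrum(Z2)
print(ev1[-1] >= ev2[-1] - 1e-8)        # lambda_n(P_{D,1} A) >= lambda_n(P_{D,2} A)
print(ev1[m1] <= ev2[m2] + 1e-8)        # smallest nonzero: lambda_{m1+1} <= lambda_{m2+1}
```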

The classical examples of interpolation matrices are those associated with aggregation and linear interpolation in multigrid; they are illustrated for 1D in Fig. 3. Let \(\mathit{I_{H }^{h}}\) and \(\mathcal{I}_{H}^{h}\) be the matrices associated with the “standard” aggregation and the augmented aggregation, respectively, where for the latter we augment \(\mathit{I_{H }^{h}}\) by the column vector \({(1\,0\,\ldots \,0\,1)}^{T}\). Then \(m_{2} = m_{1} + 1\), both \(\mathit{I_{H }^{h}}\) and \(\mathcal{I}_{H}^{h}\) are of full rank, and \(\mathcal{R}(\mathit{I_{H }^{h}}) \subseteq \mathcal{R}(\mathcal{I}_{H}^{h})\), so Theorem 3 applies. But P N A (or \(\mathit{AQ}_{N}\)) is no longer symmetric, even if A is a diagonal matrix; CG then breaks down, and GMRES has to be employed. The convergence of the two-level Krylov method is shown in Fig. 4. The figure suggests faster convergence of the method with augmented aggregation. For deflation, this performance is predicted by Theorem 3 and the well-known convergence bound of CG (due to \(\kappa _{\text{ eff}}(P_{D,\mathcal{I}}A) \leq \kappa _{\text{ eff}}(P_{D,I}A))\). For P N , the GMRES convergence bound of [7] is, however, not useful for extracting detailed information about the behavior of the method (see the residuals at the initial stage of the iterations); nevertheless, the better clustering affects the overall convergence.

Fig. 4 GMRES convergence history, based on “standard” aggregation, augmented aggregation, and linear interpolation

Figure 4 also shows the convergence based on linear interpolation. The associated interpolation matrix (denoted by \(\mathcal{I}_{H}^{h}\)) is chosen to have the same rank as the aggregation matrix (denoted by \(\mathit{I_{H }^{h}}\)); in this case, however, the inclusion condition of Theorem 3 does not hold. The use of linear interpolation clearly leads to better convergence. Unfortunately, this behavior is not the rule in general, as we shall see in some examples in the subsequent sections.
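For concreteness, the sketch below constructs the three 1D transfer operators compared in Fig. 4. The augmenting column \({(1\,0\,\ldots \,0\,1)}^{T}\) is taken from the text above; pairwise aggregation and the particular placement of the coarse nodes for linear interpolation are assumptions, since Fig. 3 does not fix them here.

```python
# Sketch only: 1D transfer operators (pairwise aggregation assumed, n even).
import numpy as np

def aggregation(n):
    """Pairwise aggregation: each coarse node owns two consecutive fine nodes."""
    return np.kron(np.eye(n // 2), np.ones((2, 1)))

def augmented_aggregation(n):
    """Aggregation augmented with the extra column (1, 0, ..., 0, 1)^T."""
    extra = np.zeros((n, 1))
    extra[0, 0] = extra[-1, 0] = 1.0
    return np.hstack([aggregation(n), extra])

def linear_interpolation(n):
    """Linear interpolation from coarse nodes placed at every second fine node (assumed)."""
    m = n // 2
    P = np.zeros((n, m))
    for j in range(m):
        i = 2 * j + 1                   # assumed coarse-node position
        P[i, j] = 1.0
        P[i - 1, j] = 0.5
        if i + 1 < n:
            P[i + 1, j] = 0.5
    return P

n = 8
for name, P in [("aggregation", aggregation(n)),
                ("augmented aggregation", augmented_aggregation(n)),
                ("linear interpolation", linear_interpolation(n))]:
    print(name, P.shape, "rank", np.linalg.matrix_rank(P))
```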

3 Does an Eigenvector-Based Transfer Operator Lead to Better Convergence?

It is almost general wisdom in deflation to use eigenvectors, or some (accurate) approximations of them, to accelerate the convergence. In practice, the only reason to avoid using them is the cost of computing even a few of them. We shall address this issue in this section.

We consider a 1D Poisson problem in [0, 1]:

$$\begin{array}{rcl} -u \prime \prime(x) = f(x),\quad u(0) = u(1) = 0,& &\end{array}$$
(3)

discretized by the second-order central difference. The eigenvalues and eigenvectors of the associated finite difference matrix A are

$$\lambda _{k} = {4\sin }^{2}\left (\frac{\theta _{k}} {2} \right ),\quad v_{k} = {(\sin \theta _{k},\sin (2\theta _{k}),\ldots,\sin (n\theta _{k}))}^{T},\quad k = 1,\ldots,n,$$

with \(\theta _{k} = k\pi /(n + 1)\), where n is the number of interior grid points (and hence the size of A). In Fig. 5, we compare the performance of the two-level MK method based on eigenvectors, (augmented) aggregation, and linear interpolation, for n = 200 and \(m = n/2 = 100\). Interestingly, the aggregation-based methods perform best; they converge to machine accuracy in two iterations. The spectrum of AQ N in this case consists of just two distinct eigenvalues, i.e., \(\sigma (\mathit{AQ}_{N}) =\{ 2,3.999\}\), whereas with eigenvectors \(\sigma (\mathit{AQ}_{N}) =\{ \lambda _{101},\ldots,\lambda _{200}\}\).
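The setup of this experiment can be reproduced along the following lines (a sketch only; pairwise aggregation is an assumed reading of Fig. 3(a)). By Theorem 2, \(\sigma (\mathit{AQ}_{N})\) consists of the nonzero eigenvalues of \(\mathit{AQ}_{D}\) together with λ n , so the clustering quoted above can be inspected from the symmetric matrix \(\mathit{AQ}_{D}\).

```python
# Sketch only: 1D Poisson matrix, its eigenvalues, and the aggregation-based deflated spectrum.
import numpy as np

n = 200
m = n // 2
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)     # tridiag(-1, 2, -1)

theta = np.arange(1, n + 1) * np.pi / (n + 1)
lam = 4 * np.sin(theta / 2) ** 2
print(np.allclose(np.linalg.eigvalsh(A), lam))           # eigenvalue formula above
lam_n = lam[-1]                                          # approx. 3.999

I_Hh = np.kron(np.eye(m), np.ones((2, 1)))               # pairwise aggregation (assumed)
A_c = I_Hh.T @ A @ I_Hh
C = I_Hh @ np.linalg.solve(A_c, I_Hh.T)
Q_D = np.eye(n) - C @ A

sig_D = np.linalg.eigvalsh(A @ Q_D)      # A Q_D is symmetric: m zeros plus the nonzero part
nonzero = sig_D[m:]
# sigma(A Q_N) consists of these nonzero values together with lam_n (Theorem 2).
print(nonzero.min(), nonzero.max(), round(lam_n, 3))
```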

Fig. 5 GMRES convergence history for a 1D Poisson problem, based on eigenvectors, aggregation, and linear interpolation

Figure 6 shows a similar comparison, based on the 2D version of (3), discretized on a uniform finite difference mesh. In this case, the eigenvector-based two-level method converges faster than the other variants, the aggregation-based techniques remain promising, and (bi)linear interpolation performs the worst. This is not the kind of behavior one typically expects from multigrid, where bilinear interpolation works well and aggregation does not lead to convergence.

Fig. 6 GMRES convergence history for a 2D Poisson problem, based on eigenvectors, aggregation, and bilinear interpolation

4 Singular but Consistent Systems

The last example is based on the 2D diffusion equation with Neumann conditions at the boundaries, set up such that the resulting linear system is consistent. The matrix A is now singular, of rank n − 1. If A c is nonsingular, it can be shown that \(\mathcal{N}(P_{N}A) = \mathcal{N}(\mathit{AQ}_{N}) = \mathcal{N}({(P_{N}A)}^{T})\) [2]. In this case, according to Theorem 2.4 of [1], GMRES is guaranteed to converge to the least-squares solution of (2). An invertible coarse-grid matrix A c can be obtained by modifying \(a_{nn}\) such that the sum of the last row is nonzero.
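This construction can be sketched as follows. The sketch makes several assumptions for illustration: uniform density, a Kronecker-sum Neumann Laplacian, pairwise 2 × 2 aggregation, and the reading that \(a_{nn}\) is perturbed only when forming A c .

```python
# Sketch only: singular but consistent 2D Neumann Laplacian and the a_nn fix for A_c.
import numpy as np

N = 16                                   # grid points per direction, n = N * N
T = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
T[0, 0] = T[-1, -1] = 1.0                # Neumann closure: all row sums are zero
A = np.kron(np.eye(N), T) + np.kron(T, np.eye(N))
n = N * N
print(np.linalg.matrix_rank(A), n - 1)   # A is singular of rank n - 1

P1 = np.kron(np.eye(N // 2), np.ones((2, 1)))
I_Hh = np.kron(P1, P1)                   # 2x2-block aggregation (assumed)
m = I_Hh.shape[1]

A_c = I_Hh.T @ A @ I_Hh                  # singular: the constant vector lies in R(I_Hh)
print(np.linalg.matrix_rank(A_c), m)

A_mod = A.copy()
A_mod[-1, -1] += 0.1                     # make the sum of the last row nonzero
A_c = I_Hh.T @ A_mod @ I_Hh              # now invertible
print(np.linalg.matrix_rank(A_c), m)
```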

Figure 7 shows the convergence history for different choices of \(\mathit{I_{H }^{h}}\), for constant density. In this case, the (augmented) aggregation strategy outperforms both the eigenvector-based and the bilinear interpolation-based approaches. The convergence for the problem with one bubble is shown in Fig. 8. Here, the use of eigenvectors leads to the fastest convergence, although it is not as fast as the convergence seen in the previous examples.

Fig. 7 GMRES convergence history for a 2D diffusion problem with uniform density

Fig. 8 GMRES convergence history for a 2D diffusion problem with one bubble and density ratio \(10^{3}\)

5 Final Remarks

The potential of multilevel Krylov methods has been demonstrated in a number of papers; see also, e.g., [4]. This short note offers some additional glimpses of this potential. Some examples suggest that an eigenvector-based approach is not necessarily the best way of setting up a multilevel Krylov method. Consequently, one can also argue that approximations to eigenvectors may likewise lead to a non-optimal method. On the other hand, some examples show that it may be possible to obtain an extremely fast converging method with a rather simple transfer operator. What is crucial in this respect appears to be the choice of the transfer operator \(\mathit{I_{H }^{h}}\). Unfortunately, at this point, no guideline is available in this direction.