1 Introduction

In [3], Erlangga and Nabben proposed a multilevel method for solving the linear system

$$\begin{array}{rcl} Ax = b,\quad A \in {\mathbb{R}}^{n\times n},& &\end{array}$$
(1)

which mimics a multigrid process but uses a Krylov method at each level instead of a smoother. Since an (optimal) Krylov method reduces, rather than smooths, errors in some norm, the underlying concept of this Krylov-based multilevel method is different from that of multigrid. Algorithm 4 shows the main body of the two-level version of a multilevel Krylov (MK) method, based on (flexible) GMRES [7].

Notice that a multigrid component is in play: the coarse-grid solve, which involves a coarse-grid matrix A c . In multigrid, this solve is associated with the reduction of slowly varying error components. The fine and coarse subspaces are connected by two transfer operators \(\mathit{I_{H }^{h}}\) and \(\mathit{I_{h}^{H }}\), which prolongate and restrict, respectively, certain quantities in the iteration process. In multigrid, these quantities are related to the errors of an approximate solution, obtained after a few smoothing steps or coarse-grid solves. In MK, these quantities are the vectors that build the approximation subspace.

From an abstract point of view, if the second-level problem is solved exactly, Algorithm 4 amounts to applying GMRES to the (right-)preconditioned linear system

$$\begin{array}{rcl} \mathit{AQ}_{N}\tilde{x} = b,\quad x = Q_{N}\tilde{x},& &\end{array}$$
(2)

with \(Q_{N} = I -\mathit{I_{H }^{h}}A_{c}^{-1}\mathit{I_{h}^{H }}A + \lambda _{n}\mathit{I_{H }^{h}}A_{c}^{-1}\mathit{I_{h}^{H }}\) and \(\lambda _{n} =\max \{ \vert \lambda _{i}\vert \}_{1\leq i\leq n}\), where the λ i are the eigenvalues of A. Q N is called the shift operator, due to the following spectral property [3].

Theorem 1.

Let A be symmetric positive definite and \(\sigma (A) =\{ \lambda _{1},\ldots,\lambda _{n}\},\) \(0 < \lambda _{i} \leq \lambda _{j}\) , for i < j. If \(\mathit{I_{H }^{h}} = [z_{1}\ldots z_{m}]\) , with \(Az_{i} = \lambda _{i}z_{i}\) , m < n, \(\mathit{I_{h}^{H }} = {(\mathit{I_{H }^{h}})}^{T}\) , and \(A_{c} = {(\mathit{I_{H }^{h}})}^{T}\mathit{AI_{H }^{h}}\) (Galerkin coarse grid), then \(\sigma (\mathit{AQ}_{N}) =\{ \lambda _{m+1},\ldots,\lambda _{n}\}\) , with λ n having multiplicity of at least m + 1.

Thus, under the assumptions of Theorem 1, Q N shifts the m smallest eigenvalues of A to λ n , without changing the rest, making the spectrum more clustered. The system (2) is therefore more favorable for a Krylov method than the original system (1).

The first two terms in Q N form the (right) deflation operator [6], denoted by Q D . The effects of Q N and Q D on A are spectrally equivalent in the following sense [3].

Theorem 2.

Let A be symmetric positive definite. Let \(Q_{D} = I -\mathit{I_{H }^{h}}A_{c}^{-1}\mathit{I_{h}^{H }}A.\)

$$\sigma (\mathit{AQ}_{D}) :=\{ 0,\mu _{m+1},\ldots,\mu _{n}\}\;\Longleftrightarrow\;\{\mu _{m+1},\ldots,\mu _{n},\lambda _{n}\} =: \sigma (\mathit{AQ}_{N}).$$

Furthermore, if \(P_{D} = Q_{D}^{T}\) and \(P_{N} = I -\mathit{AI_{H }^{h}}A_{c}^{-1}\mathit{I_{h}^{H }} + \lambda _{n}\mathit{I_{H }^{h}}A_{c}^{-1}\mathit{I_{h}^{H }}\) , then \(\sigma (\mathit{P_{D}A}) = \sigma (\mathit{AQ}_{D})\) and \(\sigma (\mathit{P_{N }A}) = \sigma (\mathit{AQ}_{N})\) , and the same equivalence holds.

Theorem 2 can be easily extended to a more general class of matrices A.
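As an illustration of Theorems 1 and 2, the following sketch (not part of the original experiments; a small random SPD matrix, NumPy, and an eigenvector-based \(\mathit{I_{H}^{h}}\) are assumed) builds Q D and Q N and checks the stated spectra numerically.

```python
# Sketch only: verify Theorems 1 and 2 numerically for a small SPD matrix,
# using the m smallest eigenvectors of A as the columns of I_Hh.
import numpy as np

rng = np.random.default_rng(0)
n, m = 12, 4

B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)            # SPD test matrix (an assumption)
lam, V = np.linalg.eigh(A)             # eigenvalues in ascending order

I_Hh = V[:, :m]                        # prolongation: m smallest eigenvectors
I_hH = I_Hh.T                          # restriction = transpose
A_c = I_hH @ A @ I_Hh                  # Galerkin coarse-grid matrix
lam_n = lam[-1]                        # largest eigenvalue of A

C = I_Hh @ np.linalg.solve(A_c, I_hH)  # I_Hh A_c^{-1} I_hH
Q_D = np.eye(n) - C @ A                # deflation operator
Q_N = Q_D + lam_n * C                  # shift operator

sig_N = np.sort(np.linalg.eigvals(A @ Q_N).real)
sig_D = np.sort(np.linalg.eigvals(A @ Q_D).real)

# Theorem 1: sigma(A Q_N) = {lam_{m+1}, ..., lam_n}, lam_n with multiplicity m+1.
print(np.allclose(sig_N, np.r_[lam[m:-1], [lam_n] * (m + 1)]))
# Theorem 2: sigma(A Q_D) has m zeros plus the same nonzero part (mu_i = lam_i here).
print(np.allclose(sig_D[:m], 0, atol=1e-8), np.allclose(sig_D[m:], lam[m:]))
```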

In this short note, we present observed convergence behaviors of this method for some relatively simple problems. We base the presentation on the two-level Krylov method (Algorithm 4), which represents the best performance attainable in terms of the number of iterations for a given multilevel Krylov setup.

Throughout this note, σ( ⋅), λ( ⋅), κ( ⋅), \(\mathcal{R}(\cdot )\), and \(\mathcal{N}(\cdot )\) denote respectively the spectrum, eigenvalue, condition number, range, and null space of the argument.

2 Spectral Properties and Observed Convergence

In this section, we shall consider a hypothetical problem: a diagonal matrix \(A = \mathit{diag}(1,2,\ldots,n)\). In this case, \(\sigma (A) =\{ 1,2,\ldots,n\}\), the eigenvectors are \(v_{i} = \alpha _{i}\mathbf{e}_{i}\) with \(\alpha _{i}\neq 0\), and for any b = [b i ], the linear system (1) has the solution \(x = [x_{i}]_{1\leq i\leq n} = [b_{i}/a_{ii}]\).

2.1 Eigenvectors

Let \(\mathit{I_{H }^{h}} = [\mathbf{e}_{i}]_{1\leq i\leq m}\), \(\mathit{I_{h}^{H }} = {(\mathit{I_{H }^{h}})}^{T}\), and set \(A_{c} = I_{h}^{H}AI_{H}^{h}\) (Galerkin-type coarse-grid matrix). In this case, according to Theorems 1 and 2, \(\sigma (P_{N}A) =\{ m + 1,\ldots,n\}\), with λ n  = n having multiplicity m + 1. Furthermore, \(\sigma (\mathit{P_{D}A}) =\{ 0,m + 1,\ldots,n\}\). As \(\mathit{P_{D}A}\) is symmetric positive semi-definite, \(\kappa _{\text{ eff}}(\mathit{P_{D}A}) = n/(m + 1) < n = \kappa (A)\).
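For this diagonal example, these statements are easy to check numerically; a minimal sketch (NumPy assumed, with an arbitrary small n and m) is:

```python
# Sketch only: the hypothetical problem A = diag(1,...,n) with I_Hh = [e_1 ... e_m].
import numpy as np

n, m = 10, 3
A = np.diag(np.arange(1.0, n + 1))
I_Hh = np.eye(n)[:, :m]                 # columns e_1, ..., e_m
I_hH = I_Hh.T
A_c = I_hH @ A @ I_Hh                   # Galerkin-type coarse-grid matrix
lam_n = float(n)

C = I_Hh @ np.linalg.solve(A_c, I_hH)   # I_Hh A_c^{-1} I_hH
P_N = np.eye(n) - A @ C + lam_n * C     # left shift-type preconditioner
P_D = (np.eye(n) - C @ A).T             # P_D = Q_D^T

print(np.sort(np.linalg.eigvals(P_N @ A).real))   # {m+1, ..., n}, n with multiplicity m+1
ev = np.sort(np.linalg.eigvals(P_D @ A).real)
print(ev[-1] / ev[m])                              # kappa_eff(P_D A) = n/(m+1)
```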

Let x 0 = 0, and consider the left-preconditioned version of the two-level Krylov method: \(\mathit{P_{N }A}x = P_{N}b\). The Krylov subspace after the k-th iteration is

$$\mathcal{K}_{k}(P_{N}A,r_{0}) = \text{ span}\{r_{0},P_{N}Ar_{0},\ldots,{(\mathit{P_{N }A})}^{k-1}r_{ 0}\}.$$

Straightforward computation leads to the following:

$$r_{0} = \lambda _{n}D_{m}^{-1}b + D_{n-m}^{0}b,\quad {(P_{N}A)}^{i}r_{0} = \lambda _{n}^{i+1}D_{m}^{-1}b + D_{n-m}^{i}b,\quad i = 1,\ldots,k-1,$$

with \(D_{m}^{-1} = \mathit{diag}(A_{m}^{-1}\,\,0)\) and \(D_{n-m} = \mathit{diag}(0\,\,A_{n-m})\), where \(A_{m}\) and \(A_{n-m}\) denote the leading \(m\times m\) and trailing \((n-m)\times (n-m)\) diagonal blocks of A, and \(D_{n-m}^{0} := \mathit{diag}(0\,\,I_{n-m})\). Thus,

$$\begin{array}{rcl} x_{k,N}& \in & \text{ span}\{\lambda _{n}D_{m}^{-1}b + D_{ n-m}^{0}b,\ \lambda _{n}^{2}D_{m}^{-1}b + D_{ n-m}b,\ldots,\lambda _{n}^{k}D_{ m}^{-1}b + D_{ n-m}^{k-1}b\} \\ & \subseteq & \text{ span}\{D_{m}^{-1}b,D_{ n-m}^{0}b,D_{ n-m}b,\ldots,D_{n-m}^{k-1}b\}.\end{array}$$

Consider the (deflated) CG iteration applied to \(P_{D}Ax = P_{D}b\). In this case, CG generates a sequence of approximate solutions, \(\{\tilde{x}_{k,D}\}\), to this singular linear system such that

$$\tilde{x}_{k,D} \in \text{ span}\{D_{n-m}^{0}b,D_{ n-m}b,\ldots,D_{n-m}^{k-1}b\}.$$

The solution of Ax = b is then constructed as follows: \(x = (I - P_{D}^{T})x + P_{D}^{T}x = \mathit{I_{H }^{h}}A_{c}^{-1}\mathit{I_{h}^{H }}b + P_{D}^{T}x\), implying the sequence \(x_{k,D} = I_{H}^{h}A_{c}^{-1}I_{h}^{H}b + P_{D}^{T}\tilde{x}_{k,D}\), and

$$\begin{array}{rcl} x_{k,D}& \in & I_{H}^{h}A_{ c}^{-1}I_{ h}^{H}b + \text{ span}\{P_{ D}^{T}D_{ n-m}^{0}b,P_{ D}^{T}D_{ n-m}b,\ldots,P_{D}^{T}D_{ n-m}^{k-1}b\} \\ & =& D_{m}^{-1}b + \text{ span}\{D_{ n-m}^{0}b,D_{ n-m}b,\ldots,D_{n-m}^{k-1}b\}.\end{array}$$

So, \(x_{k,N}\) and \(x_{k,D}\) are members of the same subspace. From [5], for every \(x_{k} \in \text{ span}\{D_{m}^{-1}b,D_{n-m}^{0}b,D_{n-m}b,\ldots,D_{n-m}^{k-1}b\} \equiv \mathcal{L}\),

$$\|x - x_{k,D}\|_{A}^{2} \leq \| x - x_{ k}\|_{A}^{2}.$$

Since \(x_{k,N} \in \mathcal{L}\),

$$\|x - x_{k,D}\|_{A}^{2} \leq \| x - x_{ k,N}\|_{A}^{2}.$$

This result suggests the superiority of the deflation technique over the shift operator, as shown in Figs. 1 and 2 for the hypothetical problem.
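The comparison behind Figs. 1 and 2 can be reproduced qualitatively with the sketch below. It is not the original experiment: the values of n, m, b, the number of iterations, and the use of a plain CG loop on the explicitly preconditioned systems are assumptions, exploiting that P N A is symmetric positive definite for this diagonal example.

```python
# Sketch only: deflated CG versus CG on P_N A x = P_N b for A = diag(1,...,n).
import numpy as np

n, m, iters = 200, 20, 25
a = np.arange(1.0, n + 1)               # A = diag(1, ..., n)
A = np.diag(a)
b = np.ones(n)                          # any right-hand side
x_exact = b / a
lam_n = a[-1]

def cg(M, rhs, iters):
    """Plain CG for a symmetric (semi-)definite, consistent system; returns the iterates."""
    x = np.zeros_like(rhs)
    r = rhs.copy()
    p = r.copy()
    xs = []
    for _ in range(iters):
        Mp = M @ p
        alpha = (r @ r) / (p @ Mp)
        x = x + alpha * p
        r_new = r - alpha * Mp
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
        xs.append(x.copy())
    return xs

# Deflation: for this example P_D = diag(0, I_{n-m}) in closed form;
# the deflated part of the solution is recovered by the correction term.
P_D = np.diag(np.r_[np.zeros(m), np.ones(n - m)])
corr = np.r_[b[:m] / a[:m], np.zeros(n - m)]        # I_Hh A_c^{-1} I_hH b
xs_D = [corr + P_D.T @ xt for xt in cg(P_D @ A, P_D @ b, iters)]

# Shift: P_N = diag(lam_n A_m^{-1}, I) in closed form; P_N A is SPD here, so plain CG applies.
P_N = np.diag(np.r_[lam_n / a[:m], np.ones(n - m)])
xs_N = cg(P_N @ A, P_N @ b, iters)

err = lambda x: np.sqrt((x_exact - x) @ (a * (x_exact - x)))   # A-norm of the error
# The differences below correspond to Fig. 2 and, by the bound above,
# should be nonnegative up to round-off.
print([round(err(xN) - err(xD), 6) for xD, xN in zip(xs_D, xs_N)])
```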

Fig. 1 Convergence history of CG with deflation P D (solid) and shift P N (dotted)

Fig. 2 Difference in the A-norm: \(\|x - x_{k,N}\|_{A} -\| x - x_{k,D}\|_{A}\)

A similar result also holds for (right-preconditioned) GMRES:

$$\|b - Ax_{k,D}\|_{2} \leq \| b - Ax_{k,N}\|_{2}.$$

The proof is omitted here and will be presented in another paper.

Fig. 3 One-dimensional (a) aggregation and (b) linear interpolation, with ∘ and ∙ indicating the fine and coarse nodes, respectively

2.2 Interpolation Matrices

Consider any sparse, full-rank interpolation matrix \(\mathit{I_{H }^{h}} \in {\mathbb{R}}^{n\times m_{1}}\), and let \(\mathcal{I}_{H}^{h} \in {\mathbb{R}}^{n\times m_{2}}\) be another full-rank matrix. The following theorem is proved in [5]:

Theorem 3.

Let \(A_{c,1}\,=\,{(\mathit{I_{H }^{h}})}^{T}\mathit{AI_{H }^{h}}\) and \(A_{c,2}\,=\,{(\mathcal{I}_{H}^{h})}^{T}A\mathcal{I}_{H}^{h}\) . Let \(P_{D,1}\,=\,I\,-\,\mathit{AI_{H }^{h}}A_{c,1}^{-1}{(\mathit{I_{H }^{h}})}^{T}\) , and similarly for P D,2 . If \(\mathcal{R}(\mathit{I_{H }^{h}}) \subseteq \mathcal{R}(\mathcal{I}_{H}^{h})\) , then

1. \(\lambda _{n}(P_{D,1}A) \geq \lambda _{n}(P_{D,2}A)\), and

2. \(\lambda _{m_{1}+1}(P_{D,1}A) \leq \lambda _{m_{2}+1}(P_{D,2}A)\).
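The two inequalities can be checked numerically; in the sketch below (a random SPD matrix and nested random subspaces are assumed, rather than the aggregation matrices used later), the inclusion \(\mathcal{R}(Z_{1}) \subseteq \mathcal{R}(Z_{2})\) is enforced by construction.

```python
# Sketch only: numerical check of the two inequalities of Theorem 3.
import numpy as np

rng = np.random.default_rng(1)
n, m1, m2 = 30, 4, 7

B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)             # SPD test matrix (an assumption)

Z2 = rng.standard_normal((n, m2))       # full-rank "interpolation" matrix
Z1 = Z2 @ rng.standard_normal((m2, m1)) # columns in R(Z2), hence R(Z1) inside R(Z2)

def deflated_spectrum(Z):
    A_c = Z.T @ A @ Z
    P_D = np.eye(n) - A @ Z @ np.linalg.solve(A_c, Z.T)
    return np.sort(np.linalg.eigvals(P_D @ A).real)   # P_D A is symmetric here

ev1, ev2 = deflated_spectrum(Z1), deflated_spectrum(Z2)
print(ev1[-1] >= ev2[-1] - 1e-8)        # lambda_n(P_{D,1} A) >= lambda_n(P_{D,2} A)
print(ev1[m1] <= ev2[m2] + 1e-8)        # smallest nonzero: lambda_{m1+1} <= lambda_{m2+1}
```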

The classical examples of interpolation matrices are those associated with aggregation and linear interpolation in multigrid; they are illustrated for 1D in Fig. 3. Let \(\mathit{I_{H }^{h}}\) and \(\mathcal{I}_{H}^{h}\) be the matrices associated with the “standard” aggregation and the augmented aggregation, respectively, where for the latter we augment \(\mathit{I_{H }^{h}}\) by the column vector \({(1\,0\,\ldots \,0\,1)}^{T}\). Then \(m_{2} = m_{1} + 1\), both \(\mathit{I_{H }^{h}}\) and \(\mathcal{I}_{H}^{h}\) are of full rank, and \(\mathcal{R}(\mathit{I_{H }^{h}}) \subseteq \mathcal{R}(\mathcal{I}_{H}^{h})\), so Theorem 3 applies. But P N A (or \(\mathit{AQ}_{N}\)) is no longer symmetric, even if A is a diagonal matrix; CG then breaks down, and GMRES has to be employed. The convergence of the two-level Krylov method is shown in Fig. 4. The figure suggests faster convergence of the method with augmented aggregation. For deflation, this performance is predicted by Theorem 3 and the well-known convergence bound of CG (due to \(\kappa _{\text{ eff}}(P_{D,\mathcal{I}}A) \leq \kappa _{\text{ eff}}(P_{D,I}A))\). For P N , the GMRES convergence bound of [7] is, however, not useful for extracting detailed information about the behavior of the method (see the residuals at the initial stage of the iterations); nevertheless, the better clustering affects the overall convergence.

Fig. 4 GMRES convergence history, based on “standard” aggregation, augmented aggregation, and linear interpolation

Figure 4 also shows the convergence based on linear interpolation. The associated interpolation matrix (denoted by \(\mathcal{I}_{H}^{h}\)) is chosen to have the same rank as the aggregation matrix (denoted by \(\mathit{I_{H }^{h}}\)); in this case, however, the inclusion condition of Theorem 3 does not hold. The use of linear interpolation clearly leads to better convergence. Unfortunately, this behavior is not the rule in general, as we shall see in some examples in the subsequent sections.
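For concreteness, the sketch below constructs the three 1D transfer operators compared in Fig. 4. The augmenting column \({(1\,0\,\ldots \,0\,1)}^{T}\) is taken from the text above; pairwise aggregation and the particular placement of the coarse nodes for linear interpolation are assumptions, since Fig. 3 does not fix them here.

```python
# Sketch only: 1D transfer operators (pairwise aggregation assumed, n even).
import numpy as np

def aggregation(n):
    """Pairwise aggregation: each coarse node owns two consecutive fine nodes."""
    return np.kron(np.eye(n // 2), np.ones((2, 1)))

def augmented_aggregation(n):
    """Aggregation augmented with the extra column (1, 0, ..., 0, 1)^T."""
    extra = np.zeros((n, 1))
    extra[0, 0] = extra[-1, 0] = 1.0
    return np.hstack([aggregation(n), extra])

def linear_interpolation(n):
    """Linear interpolation from coarse nodes placed at every second fine node (assumed)."""
    m = n // 2
    P = np.zeros((n, m))
    for j in range(m):
        i = 2 * j + 1                   # assumed coarse-node position
        P[i, j] = 1.0
        P[i - 1, j] = 0.5
        if i + 1 < n:
            P[i + 1, j] = 0.5
    return P

n = 8
for name, P in [("aggregation", aggregation(n)),
                ("augmented aggregation", augmented_aggregation(n)),
                ("linear interpolation", linear_interpolation(n))]:
    print(name, P.shape, "rank", np.linalg.matrix_rank(P))
```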

3 Does an Eigenvector-Based Transfer Operator Lead to Better Convergence?

It is almost general wisdom in deflation to use eigenvectors, or some (accurate) approximations of them, to accelerate the convergence. In practice, the only reason to avoid using them is the cost of computing even a few of them. We shall address this issue in this section.

We consider a 1D Poisson problem in [0, 1]:

$$\begin{array}{rcl} -u \prime \prime(x) = f(x),\quad u(0) = u(1) = 0,& &\end{array}$$
(3)

discretized by the second-order central difference. The eigenvalues and eigenvectors of the associated finite difference matrix A are

$$\lambda _{k} = {4\sin }^{2}\left (\frac{\theta _{k}} {2} \right ),\quad v_{k} = {(\sin \theta _{k},\sin (2\theta _{k}),\ldots,\sin (n\theta _{k}))}^{T},\quad k = 1,\ldots,n,$$

with \(\theta _{k} = k\pi /(n + 1)\), where n is the number of interior grid points (and hence the size of A). In Fig. 5, we compare the performance of the two-level MK method based on eigenvectors, (augmented) aggregation, and linear interpolation, for n = 200 and \(m = n/2 = 100\). Interestingly, the aggregation-based methods perform best; they converge to machine accuracy in two iterations. The spectrum of AQ N in this case consists of just two distinct eigenvalues, i.e., \(\sigma (\mathit{AQ}_{N}) =\{ 2,3.999\}\), whereas with eigenvectors \(\sigma (\mathit{AQ}_{N}) =\{ \lambda _{101},\ldots,\lambda _{200}\}\).
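The setup of this experiment can be reproduced along the following lines (a sketch only; pairwise aggregation is an assumed reading of Fig. 3(a)). By Theorem 2, \(\sigma (\mathit{AQ}_{N})\) consists of the nonzero eigenvalues of \(\mathit{AQ}_{D}\) together with λ n , so the clustering quoted above can be inspected from the symmetric matrix \(\mathit{AQ}_{D}\).

```python
# Sketch only: 1D Poisson matrix, its eigenvalues, and the aggregation-based deflated spectrum.
import numpy as np

n = 200
m = n // 2
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)     # tridiag(-1, 2, -1)

theta = np.arange(1, n + 1) * np.pi / (n + 1)
lam = 4 * np.sin(theta / 2) ** 2
print(np.allclose(np.linalg.eigvalsh(A), lam))           # eigenvalue formula above
lam_n = lam[-1]                                          # approx. 3.999

I_Hh = np.kron(np.eye(m), np.ones((2, 1)))               # pairwise aggregation (assumed)
A_c = I_Hh.T @ A @ I_Hh
C = I_Hh @ np.linalg.solve(A_c, I_Hh.T)
Q_D = np.eye(n) - C @ A

sig_D = np.linalg.eigvalsh(A @ Q_D)      # A Q_D is symmetric: m zeros plus the nonzero part
nonzero = sig_D[m:]
# sigma(A Q_N) consists of these nonzero values together with lam_n (Theorem 2).
print(nonzero.min(), nonzero.max(), round(lam_n, 3))
```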

Fig. 5 GMRES convergence history for a 1D Poisson problem, based on eigenvectors, aggregation, and linear interpolation

Figure 6 shows a similar comparison, based on the 2D version of (3), discretized on a uniform finite difference mesh. In this case, the eigenvector-based two-level method converges faster than the other variants, the aggregation-based techniques remain promising, and (bi)linear interpolation performs the worst. This is not the kind of behavior one typically expects from multigrid, where bilinear interpolation works well and aggregation does not lead to convergence.

Fig. 6 GMRES convergence history for a 2D Poisson problem, based on eigenvectors, aggregation, and bilinear interpolation

4 Singular but Consistent Systems

The last example is based on the 2D diffusion equation with Neumann conditions at the boundaries, set up such that the resulting linear system is consistent. The matrix A is now singular, of rank n − 1. If A c is nonsingular, it can be shown that \(\mathcal{N}(P_{N}A) = \mathcal{N}(\mathit{AQ}_{N}) = \mathcal{N}({(P_{N}A)}^{T})\) [2]. In this case, according to Theorem 2.4 of [1], GMRES is guaranteed to converge to the least-squares solution of (2). An invertible coarse-grid matrix A c can be obtained by modifying \(a_{nn}\) such that the sum of the last row is nonzero.
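This construction can be sketched as follows. The sketch makes several assumptions for illustration: uniform density, a Kronecker-sum Neumann Laplacian, pairwise 2 × 2 aggregation, and the reading that \(a_{nn}\) is perturbed only when forming A c .

```python
# Sketch only: singular but consistent 2D Neumann Laplacian and the a_nn fix for A_c.
import numpy as np

N = 16                                   # grid points per direction, n = N * N
T = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
T[0, 0] = T[-1, -1] = 1.0                # Neumann closure: all row sums are zero
A = np.kron(np.eye(N), T) + np.kron(T, np.eye(N))
n = N * N
print(np.linalg.matrix_rank(A), n - 1)   # A is singular of rank n - 1

P1 = np.kron(np.eye(N // 2), np.ones((2, 1)))
I_Hh = np.kron(P1, P1)                   # 2x2-block aggregation (assumed)
m = I_Hh.shape[1]

A_c = I_Hh.T @ A @ I_Hh                  # singular: the constant vector lies in R(I_Hh)
print(np.linalg.matrix_rank(A_c), m)

A_mod = A.copy()
A_mod[-1, -1] += 0.1                     # make the sum of the last row nonzero
A_c = I_Hh.T @ A_mod @ I_Hh              # now invertible
print(np.linalg.matrix_rank(A_c), m)
```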

Figure 7 shows the convergence history for different choices of \(\mathit{I_{H }^{h}}\), for constant density. In this case, the (augmented) aggregation strategy outperforms both the eigenvector-based and the bilinear interpolation-based approaches. The convergence for the problem with one bubble is shown in Fig. 8. Here, the use of eigenvectors leads to the fastest convergence, although it is not as fast as the convergence seen in the previous examples.

Fig. 7 GMRES convergence history for a 2D diffusion problem with uniform density

Fig. 8 GMRES convergence history for a 2D diffusion problem with one bubble and density ratio \(10^{3}\)

5 Final Remarks

The potential of multilevel Krylov methods has been demonstrated in a number of papers; see also, e.g., [4]. This short note offers some additional glimpses of this potential. Some examples suggest that an eigenvector-based approach is not necessarily the best way of setting up a multilevel Krylov method. Consequently, one can also argue that approximations to eigenvectors may likewise lead to a non-optimal method. On the other hand, some examples show that it may be possible to obtain an extremely fast converging method with a rather simple transfer operator. What is crucial in this respect appears to be the choice of the transfer operator \(\mathit{I_{H }^{h}}\). Unfortunately, at this point, no guideline is available in this direction.