Abstract
This note discusses convergence behaviors of multilevel Krylov methods for some simple problems, mainly focusing on the possible choice of transfer operators. This study is part of the search for an optimal multilevel Krylov method.
1 Introduction
In [3], Erlangga and Nabben proposed a multilevel method for solving the linear system

\(Ax = b,\quad A \in {\mathbb{R}}^{n\times n},\quad x,b \in {\mathbb{R}}^{n}, \qquad (1)\)

which mimics a multigrid process, but with a Krylov method, instead of a smoother, used at each level. Since an (optimal) Krylov method reduces, rather than smoothes, errors in some norm, the underlying concept of this Krylov-based multilevel method is different from that of multigrid. Algorithm 4 shows the main body of the two-level version of a multilevel Krylov (MK) method, based on (flexible) GMRES [7].
Notice that a multigrid component is in play: the coarse-grid solve, which involves a coarse-grid matrix \(A_{c}\). In multigrid, this solve is associated with the reduction of slowly varying error components. The fine and coarse subspaces are connected by two transfer operators \(\mathit{I_{H}^{h}}\) and \(\mathit{I_{h}^{H}}\), which prolongate and restrict, respectively, certain quantities in the iteration process. In multigrid, these quantities are related to the errors of an approximate solution, obtained after a few smoothing steps or coarse-grid solves. In MK, they are associated with the vectors that build the approximation subspace.
From an abstract point of view, if the second-level problem is solved exactly, Algorithm 4 is the consequence of applying GMRES to the (right) preconditioned linear system

\(\mathit{AQ}_{N}\hat{x} = b,\quad x = Q_{N}\hat{x}, \qquad (2)\)

with \(Q_{N} = I -\mathit{I_{H}^{h}}A_{c}^{-1}\mathit{I_{h}^{H}}A + \lambda _{n}\mathit{I_{H}^{h}}A_{c}^{-1}\mathit{I_{h}^{H}}\) and \(\lambda _{n} =\max \{ \vert \lambda _{i}\vert \}_{1\leq i\leq n}\), where the \(\lambda_{i}\) are the eigenvalues of A. \(Q_{N}\) is called the shift operator, due to the following spectral property [3].
Theorem 1.
Let A be symmetric positive definite and \(\sigma (A) =\{ \lambda _{1},\ldots,\lambda _{n}\},\) \(0 < \lambda _{i} \leq \lambda _{j}\) , for i < j. If \(\mathit{I_{H }^{h}} = [z_{1}\ldots z_{m}]\) , with \(Az_{i} = \lambda _{i}z_{i}\) , m < n, \(\mathit{I_{h}^{H }} = {(\mathit{I_{H }^{h}})}^{T}\) , and \(A_{c} = {(\mathit{I_{H }^{h}})}^{T}\mathit{AI_{H }^{h}}\) (Galerkin coarse grid), then \(\sigma (\mathit{AQ}_{N}) =\{ \lambda _{m+1},\ldots,\lambda _{n}\}\) , with λ n having multiplicity of at least m + 1.
Thus, under the assumptions of Theorem 1, \(Q_{N}\) shifts the m smallest eigenvalues of A to \(\lambda_{n}\), without changing the rest, making the spectrum more clustered. The system (2) is therefore more favorable for a Krylov method than the original system (1).
The first two terms in \(Q_{N}\) form the (right) deflation operator [6], denoted by \(Q_{D}\). The effects of \(Q_{N}\) and \(Q_{D}\) on A are spectrally equivalent in the following sense [3].
Theorem 2.
Let A be symmetric positive definite, and let \(Q_{D} = I -\mathit{I_{H}^{h}}A_{c}^{-1}\mathit{I_{h}^{H}}A\), with \(\mathit{I_{H}^{h}}\), \(\mathit{I_{h}^{H}}\), and \(A_{c}\) as in Theorem 1. Then \(\sigma (\mathit{AQ}_{D}) =\{ 0,\ldots,0,\lambda _{m+1},\ldots,\lambda _{n}\}\), with zero an eigenvalue of multiplicity m.

Furthermore, if \(P_{D} = Q_{D}^{T}\) and \(P_{N} = I -\mathit{AI_{H}^{h}}A_{c}^{-1}\mathit{I_{h}^{H}} + \lambda _{n}\mathit{I_{H}^{h}}A_{c}^{-1}\mathit{I_{h}^{H}}\), then \(\sigma (\mathit{P_{D}A}) = \sigma (\mathit{AQ}_{D})\) and \(\sigma (\mathit{P_{N}A}) = \sigma (\mathit{AQ}_{N})\), i.e., the same spectral properties hold for the left-preconditioned operators.
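The spectral equivalences of Theorem 2 are easy to check numerically. The following sketch uses a random SPD matrix and a random full-rank \(\mathit{I_{H}^{h}}\) (both illustrative choices, not tied to any particular discretization):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 12, 4
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)               # symmetric positive definite
Z = rng.standard_normal((n, m))           # full-rank I_H^h; I_h^H = Z^T
Ac_inv = np.linalg.inv(Z.T @ A @ Z)       # Galerkin coarse matrix, inverted
lam_n = np.linalg.eigvalsh(A).max()       # largest eigenvalue of A

Q_D = np.eye(n) - Z @ Ac_inv @ Z.T @ A    # right deflation operator
P_D = Q_D.T
Q_N = Q_D + lam_n * Z @ Ac_inv @ Z.T      # shift operator
P_N = np.eye(n) - A @ Z @ Ac_inv @ Z.T + lam_n * Z @ Ac_inv @ Z.T

spec = lambda M: np.sort(np.linalg.eigvals(M).real)
print(np.allclose(spec(P_D @ A), spec(A @ Q_D)))  # True
print(np.allclose(spec(P_N @ A), spec(A @ Q_N)))  # True
```

Since A and \(A_{c}\) are symmetric, \(P_{D}A = {(\mathit{AQ}_{D})}^{T}\) and \(P_{N}A = {(\mathit{AQ}_{N})}^{T}\), which is why the spectra coincide.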
Theorem 2 can be easily extended to a more general class of matrices A.
In this short note, we present observed convergence behaviors of this method for some relatively simple problems. The presentation is based on the two-level Krylov method (Algorithm 4), which represents the best performance attainable in terms of the number of iterations for a given multilevel Krylov setup.
Throughout this note, σ( ⋅), λ( ⋅), κ( ⋅), \(\mathcal{R}(\cdot )\), and \(\mathcal{N}(\cdot )\) denote respectively the spectrum, eigenvalue, condition number, range, and null space of the argument.
2 Spectral Properties and Observed Convergence
In this section, we shall consider a hypothetical problem: a diagonal matrix \(A = \mathit{diag}(1,2,\ldots,n)\). In this case, \(\sigma (A) =\{ 1,2,\ldots,n\}\), the eigenvectors are \(v_{i} = \alpha _{i}\mathbf{e}_{i}\), \(\alpha _{i}\neq 0\), and for any \(b = [b_{i}]\), the linear system (1) has the solution \(x = [x_{i}]_{1\leq i\leq n} = [b_{i}/a_{ii}]\).
2.1 Eigenvectors
Let \(\mathit{I_{H }^{h}} = [\mathbf{e}_{i}]_{1\leq i\leq m}\), \(\mathit{I_{h}^{H }} = {(\mathit{I_{H }^{h}})}^{T}\), and set \(A_{c} = I_{h}^{H}AI_{H}^{h}\) (Galerkin-type coarse-grid matrix). In this case, according to Theorem 2, \(\sigma (P_{N}A) =\{ m + 1,\ldots,n\}\), with λ n = n having a multiplicity m + 1. Furthermore, \(\sigma (\mathit{P_{D}A}) =\{ 0,m + 1,\ldots,n\}\). As \(\mathit{P_{D}A}\) is symmetric positive semi-definite, \(\kappa _{\text{ eff}}(\mathit{P_{D}A}) = n/(m + 1) < n = \kappa (A)\).
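For this diagonal model problem, the spectra predicted by Theorems 1 and 2 can be reproduced directly. A minimal sketch (n = 10, m = 4 are illustrative sizes):

```python
import numpy as np

n, m = 10, 4
A = np.diag(np.arange(1.0, n + 1))        # A = diag(1, 2, ..., n)
Z = np.eye(n)[:, :m]                      # I_H^h = [e_1 ... e_m]
Ac_inv = np.linalg.inv(Z.T @ A @ Z)       # A_c^{-1} = diag(1, ..., m)^{-1}
lam_n = float(n)                          # largest eigenvalue of A

Q_D = np.eye(n) - Z @ Ac_inv @ Z.T @ A    # deflation operator
Q_N = Q_D + lam_n * Z @ Ac_inv @ Z.T      # shift operator

# sigma(A Q_N) = {m+1, ..., n}, with lam_n = n of multiplicity m + 1:
# here the values 5, 6, 7, 8, 9, and 10 (the latter five times)
print(np.sort(np.linalg.eigvals(A @ Q_N).real).round(8))

# sigma(A Q_D) = {0 (m times), m+1, ..., n}:
# here 0 (four times), 5, 6, 7, 8, 9, 10
print(np.sort(np.linalg.eigvals(A @ Q_D).real).round(8))
```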
Let \(x_{0} = 0\), and consider the left-preconditioned version of the two-level Krylov method: \(\mathit{P_{N}A}x = P_{N}b\). The Krylov subspace after the k-th iteration is \(\mathcal{K}_{k}(P_{N}A,P_{N}b) = \text{span}\{P_{N}b,(P_{N}A)P_{N}b,\ldots,{(P_{N}A)}^{k-1}P_{N}b\}\).
Straightforward computations lead to

\(P_{N} = \lambda_{n}D_{m}^{-1} + D_{n-m}A^{-1},\)

with \(D_{m}^{-1} = \mathit{diag}(A_{m}^{-1}\,\,0)\) and \(D_{n-m} = \mathit{diag}(0\,\,A_{n-m})\), where \(A_{m} = \mathit{diag}(1,\ldots,m)\) and \(A_{n-m} = \mathit{diag}(m+1,\ldots,n)\) denote the leading and trailing diagonal blocks of A. Thus, the iterates \(x_{k,N}\) are built from \(D_{m}^{-1}b\) and powers of \(D_{n-m}\) applied to b.
Consider the (deflated) CG iteration applied to \(P_{D}Ax = P_{D}b\). In this case, CG generates a sequence of approximate solutions \(\{\tilde{x}_{k,D}\}\) to this singular linear system, with \(\tilde{x}_{k,D} \in \mathcal{K}_{k}(P_{D}A,P_{D}b)\).
The solution of Ax = b is then constructed as follows: \(x = (I - P_{D}^{T})x + P_{D}^{T}x = \mathit{I_{H}^{h}}A_{c}^{-1}\mathit{I_{h}^{H}}b + P_{D}^{T}x\), implying the sequence \(x_{k,D} = \mathit{I_{H}^{h}}A_{c}^{-1}\mathit{I_{h}^{H}}b + P_{D}^{T}\tilde{x}_{k,D}\).
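The splitting \(x = (I - P_{D}^{T})x + P_{D}^{T}x\) relies on the first part being computable without solving with A. A minimal numerical check for the diagonal test problem (n = 10, m = 3 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 10, 3
A = np.diag(np.arange(1.0, n + 1))          # hypothetical diagonal problem
Z = np.eye(n)[:, :m]                        # I_H^h = [e_1 ... e_m]
Ac_inv = np.linalg.inv(Z.T @ A @ Z)         # A_c^{-1}, Galerkin coarse matrix
P_D = (np.eye(n) - Z @ Ac_inv @ Z.T @ A).T  # P_D = Q_D^T

b = rng.standard_normal(n)
x = np.linalg.solve(A, b)

# The "deflated-away" part of x needs only a coarse solve:
# (I - P_D^T) x = I_H^h A_c^{-1} I_h^H b
print(np.allclose((np.eye(n) - P_D.T) @ x, Z @ Ac_inv @ Z.T @ b))  # True
```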
So, \(x_{k,N}\) and \(x_{k,D}\) are members of the same subspace. From [5], for every \(x_{k} \in \text{span}\{D_{n-m}^{-1}b,D_{n-m}^{0}b,D_{n-m}b,\ldots,D_{n-m}^{k-1}b\} \equiv \mathcal{L}\),

\(\Vert x - x_{k,D}\Vert_{A} \leq \Vert x - x_{k}\Vert_{A}.\)

Since \(x_{k,N} \in \mathcal{L}\),

\(\Vert x - x_{k,D}\Vert_{A} \leq \Vert x - x_{k,N}\Vert_{A}.\)
This result suggests the superiority of the deflation technique over the shift operator, as shown in Figs. 1 and 2 for the hypothetical problem.
A similar result holds also for (right-preconditioned) GMRES, with the A-norm of the error replaced by the 2-norm of the residual. The proof is omitted here and will be presented in another paper.
2.2 Interpolation Matrices
Consider any sparse, full-rank interpolation matrix \(\mathit{I_{H}^{h}} \in {\mathbb{R}}^{n\times m_{1}}\), and let \(\mathcal{I}_{H}^{h} \in {\mathbb{R}}^{n\times m_{2}}\) be another full-rank matrix. The following theorem is proved in [5]:
Theorem 3.
Let \(A_{c,1} = {(\mathit{I_{H}^{h}})}^{T}\mathit{AI_{H}^{h}}\) and \(A_{c,2} = {(\mathcal{I}_{H}^{h})}^{T}A\mathcal{I}_{H}^{h}\). Let \(P_{D,1} = I - \mathit{AI_{H}^{h}}A_{c,1}^{-1}{(\mathit{I_{H}^{h}})}^{T}\), and similarly for \(P_{D,2}\). If \(\mathcal{R}(\mathit{I_{H}^{h}}) \subseteq \mathcal{R}(\mathcal{I}_{H}^{h})\), then
1. \(\lambda _{n}(P_{D,1}A) \geq \lambda _{n}(P_{D,2}A)\), and
2. \(\lambda _{m_{1}+1}(P_{D,1}A) \leq \lambda _{m_{2}+1}(P_{D,2}A)\).
The classical examples of interpolation matrices are those associated with aggregation and with linear interpolation in multigrid; they are illustrated for 1D in Fig. 3. Let \(\mathit{I_{H}^{h}}\) and \(\mathcal{I}_{H}^{h}\) be the matrices associated with the "standard" aggregation and the augmented aggregation, respectively. For the latter, we augment \(\mathit{I_{H}^{h}}\) by the column vector \({(1\,0\,\ldots \,0\,1)}^{T}\). Then \(m_{2} = m_{1} + 1\), both \(\mathit{I_{H}^{h}}\) and \(\mathcal{I}_{H}^{h}\) are full rank, and \(\mathcal{R}(\mathit{I_{H}^{h}}) \subseteq \mathcal{R}(\mathcal{I}_{H}^{h})\), so Theorem 3 applies. However, \(\mathit{P_{N}A}\) (or \(\mathit{AQ}_{N}\)) is no longer symmetric, even if A is a diagonal matrix; CG certainly breaks down in this case, and GMRES has to be employed. Convergence of the two-level Krylov method is shown in Fig. 4. The figure shows faster convergence for the method with augmented aggregation. For deflation, this performance is predicted by Theorem 3 and the well-known convergence bound of CG (since \(\kappa_{\text{eff}}(P_{D,\mathcal{I}}A) \leq \kappa_{\text{eff}}(P_{D,I}A)\)). For \(P_{N}\), however, the GMRES convergence bound of [7] is not sharp enough to explain the detailed behavior of the method (see the residuals at the initial stage of the iterations). Nevertheless, the better clustering improves the overall convergence.
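To make the construction concrete, the following sketch (1D, with the illustrative size n = 8) builds the standard aggregation matrix and its augmented variant and checks the assumptions of Theorem 3:

```python
import numpy as np

n = 8                                     # fine-grid size (even, illustrative)
m1 = n // 2

# Standard aggregation: coarse variable j aggregates the fine pair (2j-1, 2j).
I_agg = np.zeros((n, m1))
for j in range(m1):
    I_agg[2 * j, j] = 1.0
    I_agg[2 * j + 1, j] = 1.0

# Augmented aggregation: append the column (1 0 ... 0 1)^T.
extra = np.zeros((n, 1))
extra[0, 0] = extra[-1, 0] = 1.0
I_aug = np.hstack([I_agg, extra])         # m2 = m1 + 1 columns

# Both are full rank, and R(I_agg) is contained in R(I_aug),
# since every column of I_agg is literally a column of I_aug.
print(np.linalg.matrix_rank(I_agg))       # m1
print(np.linalg.matrix_rank(I_aug))       # m1 + 1
```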
In Fig. 4, we also show the convergence based on linear interpolation. The associated interpolation matrix (denoted by \(\mathcal{I}_{H}^{h}\)) is chosen to have the same rank as the aggregation matrix (denoted by \(\mathit{I_{H}^{h}}\)). In this case, however, the inclusion condition of Theorem 3 does not hold. The use of linear interpolation clearly leads to better convergence. Unfortunately, this behavior is not the rule in general, as we shall see in some examples in the subsequent sections.
3 Does Eigenvector-Based Transfer Operator Lead to a Better Convergence?
A common piece of wisdom in deflation is to use eigenvectors, or some (accurate) approximations of them, to accelerate convergence. In practice, the only reason to avoid them is the computational cost of computing even a few. We address this issue in this section.
We consider a 1D Poisson problem on [0, 1]:

\(-\frac{{d}^{2}u}{{dx}^{2}} = f\quad \text{in}\ (0,1),\qquad u(0) = u(1) = 0, \qquad (3)\)
discretized by second-order central differences. The eigenvalues and eigenvectors of the associated finite difference matrix A are

\(\lambda_{k} = 4\sin^{2}(\theta_{k}/2),\qquad v_{k} = [\sin (j\theta_{k})]_{1\leq j\leq n},\)

with \(\theta _{k} = k\pi /(n + 1)\) and n the number of interior grid points (and hence the size of A). In Fig. 5, we compare the performance of the two-level MK method based on eigenvectors, (augmented) aggregation, and linear interpolation for n = 200 and \(m = n/2 = 100\). Interestingly, the aggregation-based methods perform best; they converge to machine accuracy in two iterations. The spectrum of \(\mathit{AQ}_{N}\) in this case consists of just two eigenvalues, i.e., \(\sigma (\mathit{AQ}_{N}) =\{ 2,3.999\}\), while with eigenvectors, \(\sigma (\mathit{AQ}_{N}) =\{ \lambda_{101},\lambda_{102},\ldots,\lambda_{200}\}\).
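For the unscaled central-difference matrix tridiag(-1, 2, -1), the eigenvalue formula \(\lambda_{k} = 4\sin^{2}(\theta_{k}/2)\) is easy to confirm numerically (a quick illustrative check, not part of the original experiments):

```python
import numpy as np

n = 200
# Second-order central-difference matrix (scaled by h^2): tridiag(-1, 2, -1)
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

theta = np.arange(1, n + 1) * np.pi / (n + 1)
lam = 4 * np.sin(theta / 2) ** 2          # analytic eigenvalues, ascending

print(np.allclose(np.linalg.eigvalsh(A), lam))  # True
# The largest eigenvalue lam[-1] is approximately 3.9998 for n = 200,
# consistent with the shift value 3.999 quoted above.
```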
Figure 6 shows a similar comparison, based on the 2D version of (3), discretized on a uniform finite difference mesh. In this case, the eigenvector-based two-level method converges faster than the other scenarios, but the aggregation-based techniques remain promising, and (bi)linear interpolation performs worst. This is not the kind of performance one typically expects from multigrid, where bilinear interpolation works well and aggregation does not converge.
4 Singular but Consistent Systems
The last example is based on the 2D diffusion equation with Neumann conditions on the boundaries, set up such that the resulting linear system is consistent. The matrix A is now singular, of rank n − 1. If \(A_{c}\) is nonsingular, it can be shown that \(\mathcal{N}(P_{N}A) = \mathcal{N}(\mathit{AQ}_{N}) = \mathcal{N}({(P_{N}A)}^{T})\) [2]. In this case, according to Theorem 2.4 of [1], GMRES is guaranteed to converge to the least-squares solution of (2). An invertible coarse-grid matrix \(A_{c}\) can be obtained by modifying \(a_{nn}\) such that the sum of the last row is nonzero.
Figure 7 shows the convergence history for different choices of \(\mathit{I_{H}^{h}}\), for constant density. In this case, the (augmented) aggregation strategy outperforms both the eigenvector-based and the bilinear-interpolation-based approaches. Convergence for the problem with one bubble is shown in Fig. 8. The use of eigenvectors leads to the fastest convergence, even though it is not as fast as the convergence seen in the previous examples.
5 Final Remarks
The potential of multilevel Krylov methods has been demonstrated in a number of papers; see also, e.g., [4]. This short note offers some additional glimpses of this potential. Some examples suggest that an eigenvector-based approach is not necessarily the best way of setting up a multilevel Krylov method. Consequently, one can also argue that approximations to eigenvectors may likewise lead to a non-optimal method. On the other hand, some examples show that an extremely fast converging method may be obtained with a rather simple transfer operator. What is crucial in this respect appears to be the choice of the transfer operator \(\mathit{I_{H}^{h}}\). Unfortunately, no guideline is available in this direction at this point.
References
P.N. Brown and H.F. Walker. GMRES on (nearly) singular systems. SIAM J. Matrix Anal. Appl., 18:37–51, 1997.
Y.A. Erlangga. Multilevel Krylov for singular systems. Manuscript.
Y.A. Erlangga and R. Nabben. Multilevel projection-based nested Krylov iteration for boundary value problems. SIAM J. Sci. Comput., 30:1572–1595, 2008.
Y.A. Erlangga and R. Nabben. Algebraic multilevel Krylov methods. SIAM J. Sci. Comput., 31:3417–3437, 2009.
R. Nabben and C. Vuik. A comparison of deflation and coarse grid correction applied to porous media flow. SIAM J. Numer. Anal., 42:1631–1647, 2004.
R.A. Nicolaides. Deflation of conjugate gradients with applications to boundary value problems. SIAM J. Numer. Anal., 24:355–365, 1987.
Y. Saad and M.H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 7:856–869, 1986.
© 2013 Springer-Verlag Berlin Heidelberg
Erlangga, Y.A. (2013). Some Experiences with Multilevel Krylov Methods. In: Cangiani, A., Davidchack, R., Georgoulis, E., Gorban, A., Levesley, J., Tretyakov, M. (eds) Numerical Mathematics and Advanced Applications 2011. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33134-3_79