1 Introduction

We are interested in solving large and sparse linear systems of equations

$$\begin{aligned} A \mathbf{x}= \mathbf{b}, \end{aligned}$$

where \(A \in \mathbb {R}^{n \times n}\) is assumed symmetric positive definite (s.p.d.), by algebraic multigrid (AMG), and more specifically by aggregation-based AMG. AMG methods, which originated in [5], together with smoothed aggregation AMG (SA AMG) [29], have become a powerful tool for solving linear algebraic systems that typically arise from the discretization of elliptic PDEs. In recent years substantial progress has been made in extending the applicability of AMG to more general sparse linear systems by developing methods that use appropriate adaptive strategies (cf. [3, 4, 8, 9, 20, 23]) aimed at capturing the near-null components of the error (sometimes referred to as algebraically smooth components) that the current solver cannot handle efficiently; these components are then used to improve the solver by modifying its hierarchy of coarse spaces.

The approach that we utilize builds upon these adaptive AMG ideas but presents several new features. It is fairly general in the sense that we do not assume any specific knowledge of the near-nullspace of \(A\) (or of a preconditioned version of \(A\), such as \(B^{-1}A\)). The main philosophy is the same as in the original adaptive AMG papers cited above: we test the current method (represented by an operator \(B\)) on the trivial system \(A {\mathbf x}= 0\), starting with a nonzero random initial iterate \({\mathbf x}\) and computing \({\mathbf x}:= (I-B^{-1}A) {\mathbf x}\), which effectively provides an approximation to the eigenvector of \(B^{-1}A\) corresponding to its minimal eigenvalue. If slow convergence is encountered during this process, we use the most recent iterate to build a new coarse hierarchy. This is the first main difference from the previously studied adaptive AMG methods. As a result, we end up with a composite AMG solver \(B\), given by the product formula

$$\begin{aligned} I-B^{-1}A = \prod _j (I-B^{-1}_j A), \end{aligned}$$

where each \(B_j\) corresponds to a separate hierarchy whose construction is driven by a particular algebraically smooth vector.

Another difference in our approach is the coarsening process employed to obtain the multilevel hierarchy. We consider coarsening by pairwise aggregation based on a weighted matching (for definitions, see Sect. 2) applied to the matrix adjacency graph. At each level of the hierarchy, starting from a maximum product matching of the graph associated with the current matrix, we generate two complementary coarser vector spaces by simple piecewise constant interpolation of a given algebraically smooth vector. We select the coarse space based on the principles of compatible relaxation (originated in [2]): we test the convergence of a pointwise smoother on the homogeneous systems associated with the two available coarser matrices and choose as the new coarse matrix and new algebraically smooth vector those for which slower convergence is observed. In fact, if the matching is chosen so that the aggregates gather pairs of fine degrees of freedom (dofs) that are “strongly connected”, the complementary space gives rise to a hierarchical complement matrix that is well-conditioned (when preconditioned by the smoother). In general, the procedure builds a binary tree of multiple coarse spaces by matching-based aggregation where, at each level, the selection of the coarsening branch is based on compatible relaxation of a given vector. We use both an optimal algorithm for maximum product matching and an approximation algorithm, and we demonstrate the performance of our adaptive AMG on the (for multigrid) difficult s.p.d. linear systems arising from the discretization of anisotropic PDEs on structured and unstructured meshes. In particular, we demonstrate that our coarsening strategy clearly detects the direction of anisotropy in both the structured and unstructured mesh cases. We also include some preliminary tests of the method on (2D and 3D) elasticity problems, as well as on some matrices from the University of Florida Sparse Matrix Collection.

The remainder of the paper is organized as follows. In Sect. 2, we recall the notion of the graph associated with a sparse matrix and the relation between maximum product bipartite matching and linear algebra applications; we then describe the algorithm for pairwise aggregation based on weighted matching. In Sect. 3, we introduce two algebraic coarsening processes based on pairwise aggregation, which differ in the weights used for matching; the actual coarse vector space is chosen based on compatible relaxation principles. In Sect. 4, we outline the bootstrap strategy employed to build a composite \(\alpha \)AMG with a prescribed convergence rate, whereas in Sect. 5 we present an extensive set of numerical results illustrating our approach. Finally, some remarks and future work are included in Sect. 6.

2 Pairwise aggregation based on weighted matching

Finding a matching in a graph is a classical problem in combinatorial optimization with a wide range of applications in sparse linear algebra [13]. The starting point is the representation of sparse matrices in terms of graphs [27]. Let \(A=(a_{ij})_{i,j=1, \ldots , n}\) be a sparse matrix; the graph associated with \(A\) is the pair \(G_U=(V, E)\), where the vertex set \(V\) corresponds to the row/column indices of \(A\) and the edge set \(E\) corresponds to the set of nonzeros of \(A\), so that \((i,j) \in E\) iff \(a_{ij} \ne 0\). For matrices with symmetric sparsity pattern, the edges \((i,j)\) are undirected pairs of vertices, i.e., \((i,j)=(j,i) \in E\) iff \(a_{ij} \ne 0\) and \(a_{ji} \ne 0\), and \(G_U\) is called an undirected graph. When the vertex set is partitioned into two subsets \(V_r\) and \(V_c\) (for example, the rows and the columns of \(A\)) such that every \((i,j) \in E\) connects some \(i \in V_r\) to some \(j \in V_c\), the graph \(G_P=\{ V_r \cup V_c, E \}\) is called bipartite [10]. A matching \({\mathcal M}\subseteq E\) in a graph (\(G_U\) or \(G_P\)) is a set of edges such that no two edges share a vertex. The number of edges in \({\mathcal M}\) is called the cardinality of the matching, and a matching of \(G_U\) or \(G_P\) is referred to as perfect if its edges touch all vertices. We refer to [13] and the references therein for conditions which guarantee the existence of a perfect matching. A perfect matching \({\mathcal M}\) of \(G_U\) or \(G_P\) corresponds to \(n\) nonzeros, no two of which are in the same row or column, and can be represented in terms of a column permutation

$$\begin{aligned} \pi _{ji}= \left\{ \begin{array}{ll} 1, &{} \quad \text{ if } (i,j) \in \mathcal {M}\\ 0, &{} \quad \text{ otherwise } \end{array} \right. \end{aligned}$$

such that the matrix \(A\pi \) has a zero-free diagonal. Generally, in linear algebra applications, we are interested in finding a matching that controls the size of the diagonal elements of \(A\pi \). Such a requirement is formulated as a maximum weighted matching problem, i.e., finding a matching \({\mathcal M}\subseteq E\) such that \(C({\mathcal M})= \sum _{ (i,j) \in {\mathcal M}} c_{ij} = \max _{{\mathcal M}^{'}} C(\mathcal {M}^{'})\), where \({\mathcal M}^{'}\) ranges over the matchings of \(G_U\) (or \(G_P\)) and the \(c_{ij} \ge 0\) are edge weights. In particular, matrices with larger entries on the diagonal can be obtained by solving the following optimization problem [12, 13].

  • Maximum Product Bipartite Matching Problem: Given the bipartite graph \(G_P\) corresponding to a sparse matrix \(A\), find a matching \({\mathcal M}\) that maximizes the product of the matched entries, i.e., find a permutation matrix \(\pi \) such that \(\prod _{i=1}^{n} |(A\pi )_{ii}|\) is maximal over all permutations.
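To make the optimization concrete: maximizing \(\prod _{i=1}^{n} |(A\pi )_{ii}|\) is equivalent to minimizing \(\sum _i -\log |(A\pi )_{ii}|\), so the problem reduces to a standard assignment problem with edge costs \(-\log |a_{ij}|\). Below is a minimal illustrative sketch of this reduction (not the paper's implementation, which uses HSL-MC64); it assumes SciPy 1.6 or later for `min_weight_full_bipartite_matching`.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import min_weight_full_bipartite_matching

def max_product_matching(A):
    """Permutation maximizing prod_i |(A pi)_{ii}| over all permutations."""
    C = sp.csr_matrix(abs(A))
    C.eliminate_zeros()                     # edges of G_P = stored nonzeros
    C.data = -np.log(C.data)                # product -> sum of -log|a_ij|
    # shift all edge costs by a constant so they stay positive; every full
    # matching has exactly n edges, so the optimal matching is unchanged
    C.data += 1.0 - C.data.min()
    rows, cols = min_weight_full_bipartite_matching(C)
    return rows, cols                       # row i is matched to column cols[i]

A = sp.csr_matrix(np.array([[0.1, 2.0, 0.0],
                            [3.0, 0.2, 0.5],
                            [0.0, 1.0, 4.0]]))
print(max_product_matching(A))              # -> (array([0, 1, 2]), array([1, 0, 2]))
```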

Therefore, if row \(i\) is matched to column \(j\) in a maximum product bipartite matching, we can reasonably assume that \(|a_{ij}| \approx \max _{k \ne i} |a_{ik}|\), which, in terms of the classical AMG characterization of the strength of matrix connections, is equivalent to saying that index \(i\) is strongly connected to index \(j\). The difference is that the maximum product bipartite matching problem optimizes a global measure, whereas in classical AMG the strength of connection is a local notion. We demonstrate in the present paper that this global matching is able to capture very accurately the direction of strong anisotropy for difficult AMG test problems in which the anisotropy is not grid-aligned. We note, however, that solving the maximum product bipartite matching problem exactly can become too costly. On the other hand, a similar matching problem can be posed for undirected graphs, so in practice we use an approximation of the maximum product matching problem on the undirected graph; this yields a setup cost of order \({\mathcal O}(n)\) while still capturing the direction of strong anisotropy as well as the more expensive exact solution of the maximum product bipartite matching problem.

Starting from the above considerations, we propose a coarsening process based on the pairwise aggregation described in Algorithm 1. It builds a partition \(\mathfrak {a}_k, \; k=1, \ldots , n_c\), of the index set \(\{1, \ldots , n\}\), where each aggregate \(\mathfrak {a}_k\) is generally a pair of matched indices. In the general case of possibly unmatched indices, i.e., in the case of non-perfect matchings (structurally rank-deficient matrices) or sub-optimal solutions, the partition may contain singletons.

[Algorithm 1: pairwise aggregation based on weighted matching]
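As a rough illustration of the aggregation step, the following sketch assumes the matching is given as a list of index pairs (e.g., from the routine above); the actual Algorithm 1 is as listed in the paper.

```python
def pairwise_aggregates(matching, n):
    """Partition {0, ..., n-1} into matched pairs plus singletons."""
    aggregates, used = [], [False] * n
    for i, j in matching:
        # skip self-matches and indices already taken by an earlier pair
        # (unsymmetric matchings may match row i to j but row j to k != i)
        if i != j and not (used[i] or used[j]):
            aggregates.append((i, j))
            used[i] = used[j] = True
    aggregates += [(k,) for k in range(n) if not used[k]]   # singletons
    return aggregates            # n_c sets: n_p pairs + n_s singletons

# e.g. a matching {(0,1), (2,3)} on 5 indices leaves index 4 a singleton:
print(pairwise_aggregates([(0, 1), (2, 3)], 5))   # [(0, 1), (2, 3), (4,)]
```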

We observe that Algorithm 1 is an automatic aggregation procedure once the matching has been constructed; it only uses information on the matrix entries and no additional information is needed. We note that using pairwise aggregation for coarsening is not a new concept; it has been used previously, e.g., in the widely used partitioner METIS [18], and it seems to be common practice nowadays, cf., e.g., [7] and the references therein. Our pairwise aggregation does not depend on any user-defined threshold for strong/weak connections or on a coarse-grid quality measure, as in the case of the pairwise aggregations proposed in [24, 25]. A main novelty in our procedure is the connection we recognized between aggregation based on weighted matching and the algorithms and software developed by the sparse direct solver community, which utilize matchings to reorder a sparse matrix with the goal of improving its diagonal dominance [13]. In the following Sect. 3, we employ the aggregation procedure within an adaptive method exploiting the relation between aggregation based on maximum product matching and the compatible relaxation methods investigated previously [2, 6, 21].

Computation of a maximum product matching in a graph is a challenging problem in terms of computational complexity; indeed, classical algorithms for general graphs require a running time of \({\mathcal O}(n^3)\) [11]. On the other hand, the problem can be solved for bipartite graphs with the widely used algorithm described in [12] and implemented in the HSL-MC64 subroutine [16], whose computational complexity is \({\mathcal O}(n\,(nnz+n)\log n)\), where \(nnz\) is the number of nonzeros of the matrix. The latter cost is a worst-case estimate. At any rate, from an AMG perspective this cost is still unacceptable, since our ultimate goal is an \({\mathcal O}(n)\) algorithm. For that reason, we also use an approximate version of a maximum product matching algorithm on an undirected graph that uses \({\mathcal O}(n)\) operations. We demonstrate that, although in the case of approximate matching the coarsening ratio of our approach is reduced with respect to the factor-of-two coarsening of exact matching, the overall performance of the adaptive process does not deteriorate substantially.

3 Coarsening based on compatible weighted matching

3.1 Main ingredients for coarsening

Given a set of aggregates \(\mathfrak {a}_1, \ldots , \mathfrak {a}_{n_c}\) built by Algorithm 1 and a starting (arbitrary) vector \({\mathbf w}\), for each pair \(\mathfrak {a}_l=\{i,j\}, \; l=1, \ldots , n_p\), let

$$\begin{aligned} {\mathbf w}_{\mathfrak {a}_l}=\frac{1}{\sqrt{w^2_i + w^2_j}}\left[ \begin{array}{c} w_i\\ w_j \end{array} \right] , \; \; {\mathbf w}^\perp _{\mathfrak {a}_l}=\frac{1}{\sqrt{w^2_i + w^2_j}}\left[ \begin{array}{c} -w_j\\ w_i \end{array} \right] \end{aligned}$$

be the normalized restrictions of \({\mathbf w}\) to the set \(\mathfrak {a}_l\) and its orthonormal complement. We then define the following matrices:

$$\begin{aligned} \tilde{P}_c&= \text {blockdiag}( {\mathbf w}_{\mathfrak {a}_1}, \ldots , {\mathbf w}_{\mathfrak {a}_{n_p}} ) \in \mathbb {R}^{2n_p \times n_p},\\ \tilde{P}_f&= \text {blockdiag}({\mathbf w}^\perp _{\mathfrak {a}_1}, \ldots , {\mathbf w}^\perp _{\mathfrak {a}_{n_p}}) \in \mathbb {R}^{2n_p \times n_p}. \end{aligned}$$

For the singletons \(\mathfrak {a}_l=\{k\}, \; l=1, \ldots , n_s\), (\(n_c = n_p + n_s\), \(n = 2n_p + n_s\)), we introduce the diagonal matrix:

$$\begin{aligned} W=diag(w_k/|w_k|) \in \mathbb {R}^{n_s \times n_s}. \end{aligned}$$

From the above matrices, we obtain two prolongation matrices corresponding to two complementary coarse index sets:

$$\begin{aligned} P_c = \left( \begin{array}{cc} \tilde{P}_c &{} 0\\ 0 &{} W \end{array} \right) \in \mathbb {R}^{n \times n_c}, \; \; P_f= \left( \begin{array}{c} \tilde{P}_f \\ 0 \end{array} \right) \in \mathbb {R}^{n \times n_p}. \end{aligned}$$
(3.1)

The \(n \times n_c\) matrix \(P_c\), referred to as the tentative prolongator, maps vectors associated with the coarse index set \(\{1,\;2,\; \ldots , n_c\}\) onto the original fine-grid index set \(\{1,\;2,\;\ldots ,\;n\}\), whereas \(P_f\), referred to as the complementary tentative prolongator, is an \(n \times n_p\) matrix which transfers vectors associated with the complementary coarse index set \(\{1, \;2,\;\ldots , n_p\}\) onto the fine-grid index set as well. We recall that \(n_c = n_p + n_s\) and \(n = 2n_p + n_s\), where \(n_p\) is the number of pairwise aggregates and \(n_s\) is the number of singletons. Note that \(\mathbb {R}^n = \mathrm {Range}(P_c) \oplus ^{\perp } \mathrm {Range}(P_f)\), where \(\mathrm {Range}(P_c)\ni \mathbf{w}\) and \(\mathrm {Range}(P_f) \ni \mathbf{w}^\perp \) form an orthogonal decomposition of \(\mathbb {R}^n\). In other words, the matrix \(P = \left[ P_f,\; P_c \right] \) has orthogonal columns.
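The construction of (3.1) from the aggregates and the vector \({\mathbf w}\) can be sketched as follows (illustrative code, assuming pairs are ordered before singletons as in the reordering above):

```python
import numpy as np
import scipy.sparse as sp

def tentative_prolongators(pairs, singletons, w, n):
    """Build P_c (n x n_c) and P_f (n x n_p) of (3.1) from aggregates."""
    rc, cc, vc = [], [], []                 # triplets for P_c
    rf, cf, vf = [], [], []                 # triplets for P_f
    for l, (i, j) in enumerate(pairs):
        s = np.hypot(w[i], w[j])            # sqrt(w_i^2 + w_j^2)
        rc += [i, j]; cc += [l, l]; vc += [w[i] / s, w[j] / s]
        rf += [i, j]; cf += [l, l]; vf += [-w[j] / s, w[i] / s]
    for m, (k,) in enumerate(singletons):   # W = diag(w_k / |w_k|)
        rc.append(k); cc.append(len(pairs) + m); vc.append(np.sign(w[k]))
    n_c, n_p = len(pairs) + len(singletons), len(pairs)
    P_c = sp.csr_matrix((vc, (rc, cc)), shape=(n, n_c))
    P_f = sp.csr_matrix((vf, (rf, cf)), shape=(n, n_p))
    return P_c, P_f                         # [P_f, P_c] has orthonormal columns

# Galerkin coarse matrices of (3.2):
#   A_c = P_c.T @ A @ P_c,   A_f = P_f.T @ A @ P_f
```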

After proper reordering of \(A\), the following two coarser matrices can be formed via the Galerkin triple matrix product:

$$\begin{aligned} A_c&= P_c^TAP_c \in \mathbb {R}^{n_c \times n_c}, \nonumber \\ A_f&= P_f^TAP_f \in \mathbb {R}^{n_p \times n_p}. \end{aligned}$$
(3.2)

These are the diagonal blocks of the transformed fine-grid matrix \(P^TAP\) under the orthogonal transformation \(P\), i.e., we have

$$\begin{aligned} P^TAP= \left[ \begin{array}{ll} A_f &{} A_{fc} \\ A_{cf} &{} A_c \end{array} \right] . \end{aligned}$$

The off-diagonal blocks read: \(A_{fc}=P_f^TAP_c\) and \(A_{cf}=P_c^TAP_f\).

The choice of the best coarse matrix \(A_c\) for a multilevel hierarchy can be driven by the basic principle of compatible relaxation, first introduced by Brandt in [2] and extended in [14] (see also [30]). Compatible relaxation is defined as a relaxation scheme which keeps the coarse-level variables invariant. It gives a practical way to measure the quality of a set of coarse variables: since in an efficient multigrid method the relaxation scheme has to be effective on the fine variables, the convergence rate of a compatible relaxation scheme can be used as a measure of the quality of a set of coarse variables. This basic idea has been used in different approaches to select coarse grids [6, 21]. Here, we apply the principle of compatible relaxation to choose the best coarse matrix from the two matrices available in (3.2), and the corresponding coarse index set, by applying a simple pointwise relaxation scheme to the homogeneous systems associated with each of the matrices, starting from a random initial guess and relaxing on the two complementary vector spaces separately. If the vector \({\mathbf w}\) is chosen based on a relaxation scheme applied to the original matrix \(A\), so that it is in the near-null space of \(A\), it is natural to expect that \(A_f\) will be better conditioned than \(A_c\). For a more general iterative process, we allow the option to choose between \(A_f\) and \(A_c\) when selecting the coarse-level variables.
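A sketch of this selection step follows, with weighted Jacobi as the pointwise smoother and the convergence factor estimated as the \(A\)-norm ratio of successive iterates given in Sect. 3.2; names and default values are illustrative choices, not prescriptions from the paper.

```python
import numpy as np

def jacobi_factor(A, nu=20, omega=1/3, seed=0):
    """Estimated convergence factor of weighted Jacobi on A x = 0."""
    x = np.random.default_rng(seed).standard_normal(A.shape[0])
    Dinv = 1.0 / A.diagonal()
    anorm = lambda v: np.sqrt(v @ (A @ v))     # ||v||_A, A s.p.d.
    for _ in range(nu):
        x_new = x - omega * (Dinv * (A @ x))   # x := (I - omega D^{-1} A) x
        rho, x = anorm(x_new) / anorm(x), x_new
    return rho, x

def choose_branch(A_c, A_f):
    """Keep the branch on which the smoother converges more slowly."""
    rho_c, w_c = jacobi_factor(A_c)
    rho_f, w_f = jacobi_factor(A_f)
    # slower convergence = remaining algebraically smooth error: that
    # matrix becomes the next-level coarse matrix and smooth vector
    return (A_c, w_c) if rho_c >= rho_f else (A_f, w_f)
```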

3.2 The multilevel adaptive coarsening schemes

Our overall adaptive multilevel coarsening strategy can be described as follows. We propose two versions. The first one, referred to as coarsening based on compatible matching (version 1), is sketched in Algorithm 2. We start with the given system matrix and a given smooth vector, for example the vector of all ones. Then, we apply Algorithm 1 to build the two complementary coarse matrices in (3.2). After that, we test the convergence of a simple smoother on the homogeneous systems associated with the two available matrices and choose as the new coarse matrix and new algebraically smooth vector those for which slower convergence is observed. The process can be applied recursively until a desired small size of the coarse matrix is reached. Thus, our procedure builds a binary tree of multiple coarse spaces by matching-based aggregation, where at each level the selection of the new coarsening branch is based on compatible relaxation of a given vector.

[Algorithm 2: coarsening based on compatible matching, version 1]

Note that, as shown in [26], in the case of strongly diagonally dominant or s.p.d. matrices, maximum product (perfect) matching produces a permutation matrix equal to the identity, i.e., it produces a set of \(n\) self-aggregated indices. Therefore, in order to obtain an effective pairwise aggregation in Algorithm 2, we apply the maximum product matching to the matrix \(A^k-\text {diag}(A^k)\), where \(\text {diag}(A^k)\) is the diagonal matrix formed by the diagonal elements of \(A^k\). We also observe that in Algorithm 2, when we build the two complementary coarse matrices \(A_c\) and \(A_f\), we need to compute the normalized restriction of the smooth vector \({\mathbf w}\) on each set of the partition computed by Algorithm 1. It may happen during the coarsening process that the smooth vector components corresponding to some set of the partition are very small, i.e., the corresponding error components are already sufficiently damped by the smoother. In these cases we associate the corresponding unknowns with the vector space \(\mathrm {Range}(P_f)\). More specifically, if \(\mathfrak {a}_l=\{ i, j\}\) is a pair of matched indices such that \(\sqrt{w^2_i + w^2_j} <{ TOL}\), we consider the corresponding indices as unpaired. Furthermore, for each index \(i\) such that \(|w_i|<{ TOL}\), we treat \(i\) as an only-fine-grid index and modify the operators in (3.1) by including a zero row in the diagonal matrix \(W\) for \(P_c\), while the complementary tentative prolongator takes the form:

$$\begin{aligned} P_f = \left( \begin{array}{cc} \tilde{P}_f &{} 0\\ 0 &{} I \end{array} \right) \in \mathbb {R}^{n \times (n_p+n_f)}, \end{aligned}$$

where \(I \in \mathbb {R}^{n_f \times n_f}\) is the identity matrix and \(n_f\) is the number of only-fine-grid indices. In our experiments we choose \({ TOL}\) equal to the machine epsilon.

Convergence rates in Algorithm 2 are estimated as the ratio of the \(A\)-norms of two successive iterates, that is, \(\rho _{c/f}=\Vert {\mathbf w}^k_{c/f}\Vert _{A_{c/f}}/\Vert {\mathbf w}^{k-1}_{c/f}\Vert _{A_{c/f}}\).

There is an alternative to Algorithm 2 that we consider, still using both the orthogonal decomposition of \(\mathbb {R}^n\) defined by the matrices in (3.1) and the principles of compatible relaxation to build an effective coarsening process. Namely, after we have built the matrices in (3.2), we accept \(A_c\) as the coarse matrix if the corresponding complementary matrix \(A_f\) is as diagonally dominant as possible, i.e., if the compatible relaxation on \(A_f\) is fast to converge. We observe that, given the original matrix \(A\), its associated graph \(G_U\) or \(G_P\), and a vector \({\mathbf w}\), the diagonal entries of the resulting \(A_f\) are a subset of the following values:

$$\begin{aligned} {\widehat{a}}_{i,j} =\frac{1}{w^2_j+w^2_i}\; \left[ \begin{array}{c} -w_j \\ w_i \end{array} \right] ^T \left( \begin{array}{cc} a_{i,i} &{} a_{i,j}\\ a_{j,i} &{} a_{j,j} \end{array} \right) \left[ \begin{array}{c} -w_j \\ w_i \end{array} \right] , \nonumber \\ (i,j) \in E. \end{aligned}$$
(3.3)

Consider the thus modified symmetric matrix \({\widehat{A}}=\left( {\widehat{a}}_{i,j}\right) \), having a null diagonal and the same sparsity pattern as \(A\). Note that building \({\widehat{A}}\) has a computational cost of \({\mathcal O}(nnz)\). Therefore, if we compute a maximum product weighted matching \({\mathcal M}\subseteq E\) from \({\widehat{A}}\) and build the corresponding aggregates, the complementary tentative prolongator \(P_f\) in (3.1) produces a matrix \(A_f\) which has on its diagonal the entries \({\widehat{a}}_{i,j},\; (i,j) \in {\mathcal M}\), with maximal product. The latter can be seen as an approximation of the notion of diagonal dominance that gives rise to a fast-converging compatible relaxation. The process can be applied recursively to define a new adaptive coarsening algorithm, which we refer to as coarsening based on compatible matching (version 2); it is sketched in Algorithm 3. Note that in this algorithm too, at each level, possibly small smooth vector entries are associated with only-fine-grid indices.

[Algorithm 3: coarsening based on compatible matching, version 2]
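A sketch of the weight computation (3.3) that drives version 2 is given below (illustrative code; it assumes \(A\) symmetric, so \(a_{j,i}=a_{i,j}\), which reduces the quadratic form to a closed expression).

```python
import numpy as np
import scipy.sparse as sp

def modified_weights(A, w):
    """\\hat{A} of (3.3): per edge (i,j), the diagonal entry A_f would get
    if {i, j} were aggregated; cost is O(nnz) as noted in the text."""
    A = sp.coo_matrix(A)
    d = A.tocsr().diagonal()
    i, j, a = A.row, A.col, A.data
    # [-w_j, w_i] [[a_ii, a_ij], [a_ji, a_jj]] [-w_j, w_i]^T / (w_i^2 + w_j^2)
    vals = (w[j]**2 * d[i] - 2.0 * w[i] * w[j] * a + w[i]**2 * d[j]) \
           / (w[i]**2 + w[j]**2)
    vals[i == j] = 0.0                      # \hat{A} has a null diagonal
    return sp.csr_matrix((vals, (i, j)), shape=A.shape)

# version 2 then runs the max-product matching on |modified_weights(A, w)|
```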

The above two compatible matching-based coarsening algorithms can be used to define a hierarchy of coarse vector spaces and matrices from which a multilevel method \(B\) can be designed. In the following we describe an adaptive strategy to improve the efficiency of an initial multilevel method, obtained with compatible matching-based coarsening, by successively building a composite method with a prescribed convergence rate.

4 Composite AMG with prescribed convergence rate

Following the \(\alpha \)AMG principle, once an algebraic multilevel solver \(B\) has been constructed, we test its performance by solving the homogeneous problem \(A {\mathbf x}= {\mathbf 0}\), i.e. by performing the following iterations:

$$\begin{aligned} {\mathbf x}_k = (I - B^{-1} A) {\mathbf x}_{k-1}, \quad \quad k=1, 2, \ldots , \end{aligned}$$

starting with a random initial iterate \({\mathbf x}_0\) and monitoring convergence through two successive values of the \(A\)-norm of the error (which equals the respective iterate, since the exact solution is zero). The above iteration provides an approximation to the lowest eigenmode of \(B^{-1}A\), commonly referred to as an algebraically smooth vector with respect to the current AMG method. If the convergence factor of the method is close to one, we select \({\mathbf w}= {\mathbf x}_k/\Vert {\mathbf x}_k \Vert _A\) and apply one of the coarsening algorithms described in the preceding section to generate a new method \(B_1\) based on this new vector \({\mathbf w}\). Assuming that we have constructed two (or more) methods \(B_r\), \(r =0,1,\ldots ,\;m\), via the above bootstrap scheme aimed at improving the initial AMG, we consider the homogeneous system and monitor the convergence of the following composite method, starting with a random initial guess \({\mathbf x}_0\),

$$\begin{aligned} \mathbf{x}_k= \prod _{r=1}^{m} (I-B_r^{-1}A)\mathbf{x}_{k-1}, \quad \quad k=1, 2, \ldots , \end{aligned}$$
(4.1)

or of its symmetrized version:

$$\begin{aligned} {\mathbf x}_k = \prod ^{2m+1}_{r=0}(I- B^{-1}_r A){\mathbf x}_{k-1}, \quad \quad k=1, 2, \ldots , \end{aligned}$$
(4.2)

where \(B_{m+r} = B_{m+1-r}, \; r =1,\;\dots ,\;m+1\). The process may be repeated, computing a new multilevel method at each stage, until the convergence rate of the composite AMG is acceptable. The final adaptive procedure is sketched in Algorithm 4.

[Algorithm 4: bootstrap construction of the composite \(\alpha \)AMG with prescribed convergence rate]
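The following sketch condenses the bootstrap loop; the helper `build_amg` is a hypothetical stand-in for a hierarchy built by Algorithm 2 or 3 that returns a callable applying \(B_r^{-1}\). The multiplicative composition shown is (4.1); the symmetrized variant (4.2) would traverse the components forward and then backward.

```python
import numpy as np

def bootstrap_composite(A, build_amg, rho_desired=0.7, nu2=15, max_stages=10):
    """Add AMG components until the composite factor is <= rho_desired."""
    rng = np.random.default_rng(0)
    n = A.shape[0]
    anorm = lambda v: np.sqrt(v @ (A @ v))
    solvers = [build_amg(A, np.ones(n))]     # first component: w = all ones
    for _ in range(max_stages):
        x = rng.standard_normal(n)
        for _ in range(nu2):                 # x := prod_r (I - B_r^{-1} A) x
            x_prev = x
            for Binv in solvers:
                x = x - Binv(A @ x)
        rho = anorm(x) / anorm(x_prev)       # per-sweep A-norm reduction
        if rho <= rho_desired:
            break
        w = x / anorm(x)                     # new algebraically smooth vector
        solvers.append(build_amg(A, w))
    return solvers, rho
```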

5 Results

In this section we illustrate the performance of our composite \(\alpha \)AMG in terms of the cost of the setup phase described in Algorithm 4 and the ability of the coarsening procedures based on maximum product matching to obtain effective coarse grids.

We considered the following anisotropic PDE posed on the unit square with homogeneous Dirichlet boundary conditions:

$$\begin{aligned} - \text {div}(K\; \nabla u)=f, \end{aligned}$$

where \(K\) is the coefficient matrix

$$\begin{aligned} K = \left[ \begin{array}{ll} a &{} c\\ c &{} b \end{array} \right] , \quad \text { with } \quad \left\{ \begin{array}{l} a= \epsilon + \cos ^2(\theta )\\ b= \epsilon + \sin ^2(\theta )\\ c= \cos (\theta )\sin (\theta ) \end{array} \right. \end{aligned}$$

The parameter \(0 < \epsilon \le 1\) defines the strength of anisotropy in the problem, while the parameter \(\theta \) specifies the direction of anisotropy. In the following we discuss results for \(\epsilon =0.001\) and \(\theta = 0\), \(\pi /8\), \(\pi /4\), \(\pi /3\), \(\pi /2\), for a total of \(5\) test cases, which we refer to as Test Cases 1 to 5, respectively. The above problem was discretized by the Matlab PDE toolbox, using bilinear finite elements on triangular and rectangular meshes.

We measure the setup cost in terms of the number of AMG components (nstages) built by the adaptive process in Algorithm 4, both for the coarsening described in Algorithm 2 and for that of Algorithm 3. In addition to the number of components, we also report, for each test case and each mesh, the convergence factor (\(\rho \)) of the composite solver, the average number of levels (nlev) over all built solver components, and the average of their operator complexity (cmpx). This last parameter is commonly defined as the ratio between the sum of the nonzero entries of the matrices at all levels and the number of nonzero entries of the fine matrix; it estimates the cost of applying one cycle. Many algorithmic and parameter choices are possible for testing our method; here we discuss results for the following particular choices. The desired convergence factor for the composite AMG was set to \(\rho _{desired}=0.7\), and the symmetrized multiplicative composition of the AMG components in (4.2) was applied. The number of iterations used to estimate solver convergence rates at each stage was set to \(\nu _2=15\). Weighted Jacobi (with weight \(\omega =1/3\) for triangular meshes and \(\omega =1/4\) for rectangular meshes) was applied as the relaxation scheme in Algorithms 2 and 3, with the number of iterations fixed at \(\nu _1=20\). We stop the coarsening process when the size of the coarsest matrix is at most \(\text {maxsize}=100\). Note that we performed various experiments with increased values of \(\nu _1\) and \(\nu _2\), but the estimated convergence rates did not differ significantly.

We developed a Matlab implementation of the composite \(\alpha \)AMG, and we analyze its behavior when the coarsening algorithm is based on the HSL-MC64 routine (Sect. 5.1) or on a Matlab implementation of the half-approximate maximum weighted matching algorithm for undirected graphs described in [28] (Sect. 5.2).

5.1 Composite AMG based on exact matching

Here we discuss results obtained using the HSL-MC64 routine which, for non-singular matrices, computes a perfect weighted matching of the bipartite graph of a sparse matrix. In this case, Algorithm 1 has a coarsening factor less than, but close to, two, since it can produce a (small) number of singletons (unaggregated dofs), essentially due to possibly unsymmetric matching (e.g., row \(i\) is matched to column \(j\) while row \(j\) is matched to column \(k\), with \(k \ne i\)). Since the cost of exact matching is about \({\mathcal O}(n\,(nnz+n)\log n)\), i.e., super-linear, in the following we analyze the setup cost of our bootstrap strategy for building a composite multigrid of type (4.2) when each AMG component is a W-cycle, which has super-linear complexity for coarsening factors less than two, as in our case. Later on, we relax this cycle to a hybrid V–W one (cf., e.g., [30]) in order to ensure an \({\mathcal O}(n)\) cost of the cycle. One sweep of symmetric Gauss–Seidel was used both as pre/post smoother and as the coarsest-level solver.

5.1.1 Unstructured mesh

In this section we present results for matrices corresponding to discretizations of our test cases on unstructured triangular meshes with total numbers of nodes \(n=2705\), \(10657\), \(42305\), corresponding to three different mesh sizes. We report in Tables 1 and 2 all parameters describing the setup cost of the composite AMG that achieves a convergence rate no larger than the prescribed one, \(\rho _{desired}=0.7\).

Table 1 Setup cost for different mesh sizes when exact bipartite matching is used for aggregation
Table 2 Setup cost for different mesh sizes when exact bipartite matching is used for aggregation

We can see that in all cases our method, for both coarsening algorithms (Algorithms 2 and 3), shows very similar results and is able to achieve a convergence factor below the desired one with an acceptable number of components (denoted nstages in the tables). This demonstrates the feasibility and robustness of our approach. Looking more closely at the convergence behavior in the different test cases, we observe that the method shows very good efficiency and scalability on Test Case 2, where a convergence rate much lower than the required one is obtained, for all mesh sizes, by building only \(1\) AMG component. An increase in the number of coarsening levels corresponding to increased mesh size produces only a slight degradation in the convergence rate of the solver. In all other test cases, the convergence behavior appears mesh dependent, showing an increase in the number of solver components as the mesh is refined. Indeed, in all cases except Test Case 2, we need 5 or 6 components to reach the desired convergence rate on the finest mesh, versus 1 or 2 components on the smallest mesh. In all cases the average operator complexity over all constructed solver components is about two, with a slight increase as the matrix dimension grows (in most cases about 2 %, up to 15 % in the 3D elasticity test presented later).

Concerning the performance of the coarsening process, we observe that both versions of compatible weighted bipartite matching generate similar coarsening trees. More specifically, Algorithm 2, whose adaptive choice of the coarsening tree branch depends on the convergence behavior of the relaxation scheme applied to the two orthogonal vector spaces, always chooses (at each level) the tree branch associated with the matrix \(A_c\). This shows that the pairwise aggregation algorithm based on maximum product matching of the original system matrix (that is, \(A\), not the modified \({\widehat{A}}\)) is able to detect strong matrix connections for our test cases (since then \(A_f\) has the faster-converging compatible relaxation). In Figs. 1 and 2, we can see that the estimated convergence factors \(\rho _c\) and \(\rho _f\) of the compatible relaxation applied to the matrices \(A_c\) and \(A_f\), respectively, produced by our two coarsening schemes (Algorithms 2 and 3) follow very similar patterns. The coarsening trees depicted in Figs. 1 and 2 are representative of the behavior of the coarsening process for each component of the composite \(\alpha \)AMG solvers built for all considered test cases and mesh sizes. More specifically, the estimated convergence factor of the compatible relaxation, that is, of the weighted Jacobi applied to the homogeneous system associated with the \(A_f\) built at each coarsening level, decreases moderately as the number of levels increases and stays within the range \([0.71, 0.85]\) for all tested mesh sizes. Such bounded convergence rates of the compatible relaxation as the number of levels and the problem size increase are a good indication that our two coarsening schemes are capable of producing scalable AMG. In Figs. 3 and 4, we illustrate the pattern of the aggregates (i.e., the sparsity of the interpolation matrices) built by our two coarsening algorithms for two different test cases at the smallest problem size. Points of the fine grid are represented by black \(+\) symbols, while orange lines and boxes represent aggregates built at the coarsest level. The number of aggregates at the coarsest level is \(n_c=93\) for both pictures in Fig. 3, while in Fig. 4 we have \(n_c=92\) for the top picture, corresponding to Algorithm 2, and \(n_c=91\) and \(n_c=92\) for the middle and bottom pictures, respectively, corresponding to the 2-stage AMG built when Algorithm 3 is applied. Figures 3 and 4 clearly show that both coarsening algorithms produce a semi-coarsening which detects the direction of anisotropy, building aggregates aligned with the \(x\)-direction for Test Case 1 and with the main diagonal for Test Case 3.

Fig. 1 Test Case 5 (\(\theta =\pi /2\)), \(n=2705\). Coarsening tree based on Algorithm 2 and exact bipartite matching

Fig. 2 Test Case 5 (\(\theta =\pi /2\)), \(n=2705\). Coarsening tree based on Algorithm 3 and exact bipartite matching

Fig. 3 Test Case 1 (\(\theta =0\)), \(n=2705\). Coarsest interpolation matrices pattern built by Algorithm 2 (top) and Algorithm 3 (bottom) with exact bipartite matching

Fig. 4 Test Case 3 (\(\theta =\pi /4\)), \(n=2705\). Coarsest interpolation matrices pattern built by Algorithm 2 (top) and Algorithm 3 (center and bottom) with exact bipartite matching

5.1.2 Structured mesh

In this subsection we report results for linear systems arising from the test cases presented in the previous subsection, corresponding to \(\theta =0\), \(\pi /8\), \(\pi /4\) and \(\pi /3\), now using rectangular meshes with an increasing number of nodes in the discretization. The goal is to demonstrate that our coarsening algorithms easily detect grid-aligned anisotropy (\(\pi /4\)) and that, after some additional work, the adaptive procedure produces semi-coarsening also in the non-grid-aligned anisotropic case (\(\pi /3\)). This is indeed the case, as illustrated in Figs. 7 and 8 for a mesh with \(40\) internal nodes per direction, where the number of aggregates at the coarsest level is \(n_c=63\) and \(n_c=60\) for the top and bottom pictures of Fig. 7, respectively, and \(n_c=56\) and \(n_c=58\) for the top and bottom pictures of Fig. 8, respectively. Note that in Fig. 7, at the top left and bottom right, black bullets correspond to nodes not aggregated due to near-zero smooth error at those points after relaxation.

The parameter settings for constructing the solver, the smoother, and the algorithmic choices are the same as in the previous unstructured mesh case (Table 3).

Table 3 Setup cost for different mesh sizes when exact bipartite matching is used for aggregation

We first note that, as in the case of unstructured meshes, the two coarsening processes give similar results for all test cases. In terms of setup cost, we observe that in the easy cases of grid-aligned anisotropy, Test Case 1 and Test Case 3, only 1 or at most 2 components are needed to achieve a convergence factor no greater than the desired one, showing very good scalability of the method. On the other hand, as in the unstructured grid case, for Test Case 2 and Test Case 4, where the anisotropy is not grid-aligned, we observe a degradation of scalability, i.e., the number of components needed to reach the desired convergence factor increases as the mesh is refined. Also for these test cases the average operator complexity is about two for each mesh size, similarly to the unstructured mesh case. As in the unstructured mesh case described in the previous section, the behavior of the coarsening process is very similar for each AMG component of the composite solver. It also appears comparable to that obtained for the same test cases on unstructured grids, although here we observed an almost constant convergence rate of the compatible relaxation (\(\approx 0.8\)) at each level of the coarsening tree, for all test cases and each mesh. As representatives of the general behavior, we draw in Figs. 5 and 6 the coarsening trees built by the two versions of our matching-based coarsening for the first component of the 2-stage composite AMG for Test Case 4 on the smallest structured mesh (Table 4).

Fig. 5 Test Case 4 (\(\theta =\pi /3\)), \(n=64 \times 64\). Coarsening tree based on Algorithm 2 and exact bipartite matching

Fig. 6 Test Case 4 (\(\theta =\pi /3\)), \(n=64 \times 64\). Coarsening tree based on Algorithm 3 and exact bipartite matching

Fig. 7 Test Case 3 (\(\theta =\pi /4\)), \(n=40 \times 40\). Coarsest interpolation matrices pattern built by Algorithm 2 (top) and Algorithm 3 (bottom) with exact bipartite matching

Fig. 8 Test Case 4 (\(\theta =\pi /3\)), \(n=40 \times 40\). Coarsest interpolation matrices pattern built by Algorithm 2 (top) and Algorithm 3 (bottom) with exact bipartite matching

Table 4 Setup cost for different mesh sizes when exact bipartite matching is used for aggregation

5.2 Composite AMG based on approximate matching

As remarked at the end of Sect. 2, the HSL-MC64 subroutine used for computing a maximum product matching in a bipartite graph has non-optimal computational complexity. This is not desirable in a multigrid context, where we aim for an optimal \({\mathcal O}(n)\) method. In order to overcome the super-linear complexity of the algorithms for exact weighted matching, we tested an algorithm which obtains a matching in a general graph with weight at least \(1/2\) of the maximum, known as half-approximate matching, with \({\mathcal O}(n)\) computational complexity [28].

Motivated by the wide range of applications, obtaining linear-time approximate algorithms with increasing performance ratio, as well as effective parallel implementations of such algorithms, is currently an active area of research (see, for example, [11, 15, 22]). Our aim here is to assess the impact of using half-approximate matching in Algorithm 1 during the coarsening process described in Sect. 3, as well as its impact on the convergence behavior and setup cost of our composite \(\alpha \)AMG. All results discussed in what follows are obtained using our Matlab implementation of the adaptive AMG, where the HSL-MC64 subroutine was replaced by a Matlab function implementing the matching algorithm described in [28].
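For illustration, the sketch below shows a simple greedy variant that also achieves the \(1/2\) guarantee: it always picks the heaviest remaining edge between two free vertices. Note that the sorting makes it \({\mathcal O}(nnz \log nnz)\) rather than truly linear; the path-growing algorithm of [28] avoids the sort.

```python
import numpy as np
import scipy.sparse as sp

def greedy_half_approx_matching(A):
    """Greedy half-approximate matching on the undirected graph of symmetric A."""
    A = sp.coo_matrix(A)
    keep = A.row < A.col                     # each edge once, skip diagonal
    edges = sorted(zip(np.abs(A.data[keep]), A.row[keep], A.col[keep]),
                   reverse=True)             # heaviest |a_ij| first
    free, pairs = np.ones(A.shape[0], dtype=bool), []
    for _, i, j in edges:
        if free[i] and free[j]:
            pairs.append((int(i), int(j)))
            free[i] = free[j] = False
    return pairs                             # feed to the pairwise aggregation
```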

We report tables that contain the parameters describing the setup cost of the composite AMG for the same test cases introduced in Sect. 5 for both unstructured and structured meshes. Algorithmic choices and parameter settings are the same as before, however now instead of using W cycle, in order to have optimal multigrid components coupled with linear complexity matching, we apply a hybrid V–W cycle which allows to obtain a \({\mathcal O}(n)\) linear complexity cycle. Again, we use one sweep of symmetric Gauss-Seidel both as pre/post smoother and coarsest level solver.

5.2.1 Unstructured mesh

As in the case of exact bipartite matching, we first discuss results obtained on triangular meshes with total numbers of nodes \(n=2705\), \(10657\), \(42305\), corresponding to three different mesh sizes, for all five test cases (with angle of anisotropy \(\theta =0\), \(\pi /8\), \(\pi /4\), \(\pi /3\), \(\pi /2\)). We report in Tables 5 and 6 the characteristics of the setup cost of the composite AMG needed to achieve the pre-selected convergence factor \(\rho _{desired}=0.7\), for both coarsening algorithms.

Table 5 Setup cost for different mesh sizes when approximate graph matching is used for aggregation
Table 6 Setup cost for different mesh sizes when approximate graph matching is used for aggregation

A general observation is that, as in the case of exact bipartite matching, the two coarsening algorithms give similar convergence behavior with approximate matching as well. Indeed, looking at the coarsening trees obtained for each AMG component (see Figs. 9, 10), we observe the same behavior shown in Sect. 5.1. On the other hand, half-approximate matching produces a coarsening factor less than two due to a larger number of singletons than in aggregation based on exact matching. This happens because, while the exact weighted matching implemented in HSL-MC64 computes a weighted matching of maximum cardinality (a perfect matching for non-singular matrices), the approximate algorithm of [28] computes a maximal weighted matching, not necessarily of maximum cardinality. The reduced coarsening factor increases the number of coarsening levels (on average by one) and, as a result, leads to a slight increase in the average operator complexity. We recall that we stop the coarsening process when the size of the coarse problem reaches a certain threshold. On the other hand, as expected, the use of the hybrid V–W cycle generally affects the scalability of the composite AMG. Indeed, in all test cases we observe a slight increase in the number of components (1 or 2 additional components, except for Test Case 3 with the largest mesh and Algorithm 3, which requires 3 additional components) needed to reach the desired convergence factor. In Figs. 11 and 12 we show the pattern of the aggregates built by the two coarsening algorithms for the first component of the composite AMG solvers built for Test Case 1 and Test Case 3 on the smallest mesh. In these cases the number of aggregates at the coarsest level is \(n_c=72\) (top) and \(n_c=69\) (bottom) in Fig. 11, and \(n_c=74\) (top) and \(n_c=67\) (bottom) in Fig. 12. The figures show that both coarsening algorithms are able to detect the direction of anisotropy fairly well, although not as accurately as the aggregates obtained using exact matching (see Figs. 3, 4), which appear better aligned with the direction of anisotropy.

Fig. 9 Test Case 5 (\(\theta =\pi /2\)), \(n=2705\). Coarsening tree based on Algorithm 2 and approximate graph matching

Fig. 10 Test Case 5 (\(\theta =\pi /2\)), \(n=2705\). Coarsening tree based on Algorithm 3 and approximate graph matching

Fig. 11 Test Case 1 (\(\theta =0\)), \(n=2705\). Coarsest interpolation matrices pattern built by Algorithm 2 (top) and Algorithm 3 (bottom) with approximate graph matching

Fig. 12 Test Case 3 (\(\theta =\pi /4\)), \(n=2705\). Coarsest interpolation matrices pattern built by Algorithm 2 (top) and Algorithm 3 (bottom) with approximate graph matching

5.2.2 Structured mesh

In the following we report results analogous to those of the previous subsection for the construction of composite AMG with a desired convergence rate, now on structured meshes, using half-approximate matching for aggregation. Again, in order to have an optimal \({\mathcal O}(n)\) complexity for each solver component, we apply the hybrid V–W cycle (to compensate for the coarsening factor being less than two). All other algorithmic choices are the same as in Sect. 5.1.2.

We observe from Tables 7 and 8 that using half-approximate matching coupled with the hybrid V–W cycle does not significantly affect the convergence behavior of the constructed composite \(\alpha \)AMG. We generally, though not in all cases, see an increase of one solver component versus the counterpart results discussed in Sect. 5.1.2; the exception is Test Case 2, where for the largest mesh Algorithm 2 requires two more components than its counterpart in Table 3.

Table 7 Setup cost for different mesh sizes when approximate graph matching is used for aggregation
Table 8 Setup cost for different mesh sizes when approximate graph matching is used for aggregation

In Figs. 13 and 14 we show the pattern of the interpolation matrices built by our two coarsening processes for \(40 \times 40\) rectangular fine mesh.

Fig. 13 Test Case 3 (\(\theta =\pi /4\)), \(n=40 \times 40\). Coarsest interpolation matrices pattern built by Algorithm 2 (top) and Algorithm 3 (bottom) with approximate graph matching

Fig. 14 Test Case 4 (\(\theta =\pi /3\)), \(n=40 \times 40\). Coarsest interpolation matrices pattern built by Algorithm 2 (top) and Algorithm 3 (bottom) with approximate graph matching

These last figures also show that half-approximate matching produces aggregates of fairly similar quality to those obtained by exact matching, displayed in Figs. 7 and 8.

5.3 Further results

In order to assess the influence of the strength of anisotropy, we ran additional tests varying \(\epsilon =1,\;0.1,\;0.01\) for the most difficult anisotropy angle, \(\theta = \pi /3\) (Test Case 4), on the unstructured mesh. In Tables 9 and 10 we show results obtained using exact matching coupled with the W-cycle, while Tables 11 and 12 present results obtained using half-approximate matching coupled with the hybrid V–W cycle. All parameters and algorithmic choices are the same as in the sections above. We observe that, as expected, for decreasing values of \(\epsilon \), i.e., increasing strength of anisotropy, there is a moderate increase in the setup cost. On the other hand, for \(\epsilon =1\) only 1 component is needed to reach the desired convergence rate, both with exact and with approximate matching, for each of the considered mesh sizes.

Table 9 Test Case 4 varying \(\epsilon \): Setup cost for different mesh sizes when exact bipartite matching is used for aggregation
Table 10 Test Case 4 varying \(\epsilon \): Setup cost for different mesh sizes when exact bipartite matching is used for aggregation
Table 11 Test Case 4 varying \(\epsilon \): Setup cost for different mesh sizes when approximate graph matching is used for aggregation
Table 12 Test Case 4 varying \(\epsilon \): Setup cost for different mesh sizes when approximate graph matching is used for aggregation

5.4 Results with Algorithms 2 and 3 with the random initial guess replaced by the restricted smooth vector from the previous level

In the following we summarize results obtained using Algorithm 3 when we transport the current smooth vector used in the definition of the interpolation operator from the fine to the coarse grid through restriction. This new version of the algorithm is sketched in Algorithm 5.

[Algorithm 5: coarsening based on compatible matching with the smooth vector restricted from the previous level]

Results in Table 13 refer to linear systems arising from the discretization of linear elasticity problems describing a multi-material cantilever beam in 2D and 3D. The problems were discretized by linear finite elements; triangular meshes of three different sizes (\(4386\), \(16962\) and \(66690\)) were employed for the 2D problems, while tetrahedral meshes of different sizes (\(2475\), \(15795\), \(111843\)) were used in 3D. We refer to these problems as LE2D and LE3D, respectively. We obtained the system matrices and right-hand sides using the software MFEM available at http://mfem.googlecode.com.

The desired convergence factor for the composite AMG was set to \(\rho _{desired}=0.7\) and a symmetrized multiplicative composition of the AMG components was applied. The number of iterations used to estimate solver convergence rates at each stage was set to \(\nu _2=15\). Half-approximate matching was used for aggregation, and each component is a hybrid V–W cycle where symmetric Gauss–Seidel was applied as pre/post smoother (one sweep); \(\nu _1\) sweeps of symmetric Gauss–Seidel are also applied to the coarse smooth vector on the homogeneous coarse system at each level. At the coarsest level an \(LU\) factorization is applied. We stopped the coarsening process when the size of the coarsest matrix was at most \(\text {maxsize}=100\). Note that the vector of all ones was taken as the first smooth vector and no information about rigid body modes is used in the method.

In Table 14, we also report results obtained on linear elasticity by a modification of Algorithm 2 in which the restriction of the smooth vector is considered at each level. We use the same parameters as in the previous experiments with Algorithm 5. Also in this case we use the hybrid V–W cycle with 1 sweep of symmetric Gauss–Seidel for pre/post smoothing and an LU factorization on the coarsest system. Each component is built using half-approximate matching, and the \(\ell _1\)-smoother is employed in the compatible relaxation to choose the coarsening branch at each level. We recall that for a matrix \(A=(a_{ij})\), the \(\ell _1\)-smoother is defined as the diagonal matrix \(\text {diag}(d_k)\) with \(d_k = \sum \limits _j |a_{kj}| \frac{w_j}{w_k}\) for any given positive weights \(\{w_i\}\). Common choices are \(w_i = 1\) or \(w_i = \sqrt{a_{ii}}\); we used the latter in our experiments. Variants of the \(\ell _1\)-smoother are default choices in the parallel solver library [17]. It was first used in [19]; see also [1]. Its high level of intrinsic parallelism and guaranteed convergence properties make it a viable alternative to scaled Jacobi.
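A sketch of the weighted \(\ell _1\) diagonal with the choice \(w_i=\sqrt{a_{ii}}\) follows (illustrative code; \(D-A\) is then positive semidefinite by a generalized diagonal-dominance argument, which is the source of the guaranteed convergence mentioned above):

```python
import numpy as np
import scipy.sparse as sp

def l1_diagonal(A):
    """d_k = sum_j |a_kj| * w_j / w_k with w_i = sqrt(a_ii)."""
    A = sp.csr_matrix(A)
    w = np.sqrt(A.diagonal())
    rowsum = np.asarray(abs(A).multiply(w).sum(axis=1)).ravel()
    return rowsum / w       # smoother step: x += r / d  (r the residual)
```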

Table 13 Setup cost for different mesh sizes when approximate graph matching is used for aggregation
Table 14 Setup cost for different mesh sizes when approximate graph matching is used for aggregation

We see a minor difference in the performance of the two versions, with Algorithm 5 being somewhat superior.

5.5 Results on S.P.D. Matrices arising from UF Sparse Matrix Collection

To assess the potential of our composite \(\alpha \)AMG strategy, we performed some preliminary tests on s.p.d. matrices coming from different application fields (including non-PDE modeling) from the University of Florida (UF) Sparse Matrix Collection, available at http://cise.ufl.edu/research/sparse/matrices.

We summarize in Table 15 the main features of the selected matrices.

Table 15 Main Features of Selected Matrices

For the sake of brevity, we report only results obtained applying Algorithm 5 with aggregation based on half-approximate matching, each AMG component being a hybrid V–W cycle. The desired convergence factor for the composite AMG was set to \(\rho _{desired}=0.7\) and a symmetrized multiplicative composition of the AMG components was applied. A maximum of \(10\) components is allowed to reach the desired convergence rate (i.e., if the desired convergence rate is not reached with 10 AMG components, the bootstrap process is stopped). The number of iterations used to estimate solver convergence rates at each stage was set to \(\nu _2=15\). Symmetric Gauss–Seidel was applied both for relaxation of the restricted coarse vector on the coarse homogeneous system, with the number of iterations fixed at \(\nu _1=20\), and in each solver as pre/post smoother (1 sweep). An LU factorization is applied on the coarsest system. We stop the coarsening process when the size of the coarsest matrix is at most \(\text {maxsize}=100\) or when, during the aggregation process, all indices turn out to be only-fine indices. The results are summarized in Table 16.

Table 16 Matrices from UF Sparse Matrix Collection. Setup cost when Algorithm (5) coupled with half-approximate matching is used

We observe that for a large portion of the considered matrices, both for PDE and non-PDE problems, our method reaches the desired convergence rate with only 1 component. On the other hand, in three cases related to structural engineering problems, more components (3 for sts4098 and \(crankseg\_1\); 8 for ldoor) are needed to reach the desired convergence rate. In three further cases, one coming from a PDE problem (structural engineering) and two coming from non-PDE problems (optimization), we observe slow convergence; indeed, our method was not able to reach the desired convergence rate with 10 or fewer components.

6 Concluding remarks

In this paper we have performed a preliminary study of a new composite adaptive AMG method. It relies on coarsening algorithms based on the principle of compatible relaxation combined with exact or approximate maximum product matching in graphs; the latter is a strategy successfully exploited in reordering algorithms for sparse direct methods to enhance diagonal dominance. By performing a large set of experiments on finite element discretizations of 2nd-order anisotropic elliptic equations with non-grid-aligned anisotropy, which are difficult for AMG, we demonstrated that our approach can lead to semi-coarsening and to an overall composite \(\alpha \)AMG solver with a desired pre-set convergence factor. The composite solver can become expensive, since the number of components built generally increases when the mesh is refined. This is perhaps to be expected for AMG solvers applied to such non-grid-aligned anisotropic problems when standard (pointwise) smoothers are employed. Note also that we use very simple interpolation matrices (block-diagonal, with \(\ell _2\)-orthogonal columns) which are not energy stable; the reason for this choice is to minimize the overall setup cost of the adaptive AMG method. Other ways to alleviate the setup cost could be to combine several components in one cycle by using larger aggregates and several algebraically smooth vectors to build one tentative interpolation matrix. The setup cost of all proposed adaptive/bootstrap AMG methods tends to be high, since they typically use several cycles to compute the final AMG hierarchy. Among them, the method in [4] exploits one or two setup cycles applied to multiple test vectors and hence has the potential for a cost comparable to that of more traditional non-adaptive AMG methods. A definite conclusion about this is not easy to draw, since the experiments reported in [4] are only two-level; they are applied to the simple Laplace equation and to a more difficult, non-standard gauge Laplacian arising in quantum chromodynamics (QCD).

We also presented preliminary tests assessing the potential of the method on systems of PDEs (2D/3D elasticity) and on more general sparse matrices not necessarily coming from PDEs; the method was successful for most of the examples, with very few exceptions. The latter cases can most likely be handled if a more powerful smoother (such as block Gauss–Seidel or overlapping Schwarz) is employed.

Finally, parallel versions of (approximate) matching algorithms can be exploited to construct AMG solvers suitable for large-scale computations. One way to exploit parallelism that we envision is, after the (multiplicative) setup used to build the components \(B_j\), to run the composite solver in an additive form (i.e., to use \(B^{-1}_\text {additive} =\sum _jB^{-1}_j\) as a preconditioner in CG). In this way the cycles corresponding to the components \(B_j\) can be run in parallel.