1 Introduction

The resolution of large sparse linear systems \(Ax = b\) is fundamental in many science and engineering applications and is generally the most computationally expensive part of a simulation. The principal origin of large-scale matrices is the discretization of elliptic or parabolic partial differential equations (PDEs) (Benzi 2002). The finite element, finite difference, and finite volume methods are among the most common numerical methods for solving problems modeled by PDEs, and their application generates large sparse linear systems. In addition, large sparse linear systems also originate from problems that are not modeled by PDEs, such as chemical engineering processes, design and analysis of integrated circuits, and power system networks (Benzi 2002). Storing and solving these large-scale linear systems requires a considerable amount of memory and a high processing cost.

Modern hierarchical memory architectures and paging policies favor programs that take locality of reference into account. Thus, cache coherence (that is, a sequence of recent memory references clustered locally rather than randomly in the memory address space) should be an important consideration when designing a new algorithm. To solve large sparse linear systems at low cost, and to reduce the memory space required, an adequate nodal renumbering is desirable so that the corresponding coefficient matrix A has a narrow bandwidth and a small profile. Thus, one way of designing an algorithm that returns a sequence of graph vertices with cache coherence is to use heuristics for bandwidth reduction. Therefore, heuristics for bandwidth and profile reduction are used to achieve low computational and storage costs when solving large sparse linear systems (Gonzaga de Oliveira and Chagas 2015). In particular, profile reduction is also important for reducing the storage cost of applications that use the skyline data structure (Felippa 1975) to represent large-scale matrices.

Let \(A = [a_{ij}]\) be an \(n \times n\) symmetric matrix (corresponding to an undirected graph \(G=(V,E)\), composed of a set of vertices V and a set of edges E). The bandwidth of row i is \(\beta_i(A) = i - \min\{j : 1 \le j < i,\ a_{ij} \ne 0\}\), with \(\beta_i(A) = 0\) when row i has no nonzero coefficient to the left of the main diagonal. The bandwidth \(\beta(A)\) is the largest distance between a nonzero coefficient of the lower triangular part and the main diagonal, considering all rows of the matrix, that is, \(\beta(A) = \max_{1 \le i \le n} \beta_i(A)\). Equivalently, the bandwidth of G for a labeling \(S=\{s(v_1), s(v_2),\ldots,s(v_{|V|})\}\) (i.e., a bijective mapping from V to the set \(\{1,2,\ldots,|V|\}\)) is \(\beta(G)=\max\{|s(v_i)-s(v_j)| : (v_i,v_j)\in E\}\). The profile of A can be defined as \(\mathrm{profile}(A) = \sum_{i=1}^n \beta_i(A)\).
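These definitions translate directly into code. The sketch below (in Python, with 0-based indices, a dense list-of-lists matrix, and the convention that \(\beta_i(A)=0\) when row i has no nonzero to the left of the diagonal) is an illustration only, not the implementation used in this work:

```python
def bandwidth_and_profile(A):
    """Return (beta(A), profile(A)) for a symmetric matrix A,
    given as a dense list of lists with 0-based indices."""
    n = len(A)
    betas = []
    for i in range(n):
        # column of the leftmost nonzero in row i, up to the diagonal;
        # falls back to i (i.e., beta_i = 0) when no earlier nonzero exists
        j_min = next((j for j in range(i + 1) if A[i][j] != 0), i)
        betas.append(i - j_min)
    return max(betas), sum(betas)

A = [[4, 1, 0, 0],
     [1, 4, 0, 1],
     [0, 0, 4, 1],
     [0, 1, 1, 4]]
beta, profile = bandwidth_and_profile(A)  # beta = 2, profile = 0 + 1 + 0 + 2 = 3
```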

Analogously, let \(A_u = [a_{ij}]\) be an \(n \times n\) asymmetric matrix. The bandwidth of row i is \(\beta_i(A_u) = \max({\beta_l}_i(A_u), {\beta_u}_i(A_u))\), where \({\beta_l}_i(A_u) = i - \min\{j : 1 \le j < i,\ a_{ij} \ne 0\}\) and \({\beta_u}_i(A_u) = \max\{j : i < j \le n,\ a_{ij} \ne 0\} - i\), each taken as 0 when the corresponding set is empty. The bandwidth \(\beta(A_u)\) is the largest distance between a nonzero coefficient of the lower or the upper triangular part and the main diagonal, considering all rows of the matrix, that is, \(\beta(A_u) = \max_{1 \le i \le n} \max({\beta_l}_i(A_u), {\beta_u}_i(A_u))\). The profile of \(A_u\) can be defined as \(\mathrm{profile}(A_u) = \sum_{i=1}^n({\beta_l}_i(A_u) + {\beta_u}_i(A_u))\).
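For the asymmetric case, a minimal sketch under the same assumptions (dense 0-based matrix; \({\beta_l}_i\) and \({\beta_u}_i\) taken as 0 when the corresponding set is empty) could be:

```python
def bandwidth_and_profile_asym(A):
    """Return (beta(A_u), profile(A_u)) for an asymmetric matrix A_u,
    given as a dense list of lists with 0-based indices."""
    n = len(A)
    band, profile = 0, 0
    for i in range(n):
        # beta_l: distance from the diagonal to the leftmost nonzero (j < i)
        jl = next((j for j in range(i) if A[i][j] != 0), i)
        # beta_u: distance from the diagonal to the rightmost nonzero (j > i)
        ju = next((j for j in range(n - 1, i, -1) if A[i][j] != 0), i)
        bl, bu = i - jl, ju - i
        profile += bl + bu
        band = max(band, bl, bu)
    return band, profile

A_u = [[1, 0, 2],
       [0, 1, 0],
       [0, 3, 1]]
# row 0 contributes beta_u = 2 and row 2 contributes beta_l = 1,
# so beta(A_u) = 2 and profile(A_u) = 3
```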

The problems of bandwidth and profile minimization are NP-hard (Papadimitriou 1976; Lin and Yuan 1994). Thus, several heuristics have been proposed to solve the bandwidth and/or profile reduction problems since the mid-1960s. The large number of methods available means that the user faces an arduous task in determining which method to employ. Various comparisons among methods are available in the literature, but each covers only a few of them. Additionally, few reviews have been published in this field. Cuthill (1972) performed a comparative study of the results of the heuristics known up to 1971. Gibbs et al. (1976) compared the results of six heuristics. Benzi (2002) reviewed preconditioning techniques for the iterative solution of large sparse linear systems; this review focused on techniques to improve the performance and reliability of Krylov subspace methods, however, and was not strictly a review of heuristics for bandwidth or profile reduction (Gonzaga de Oliveira and Chagas 2015). Thus, the literature lacks a comparison of a large variety of heuristics proposed for bandwidth and profile reductions.

Disregarding computational costs, the Variable Neighborhood Search for bandwidth reduction (VNS-band) heuristic (Mladenovic et al. 2010) may be the method that represents the state of the art with respect to the problem of bandwidth reduction. Mladenovic et al. (2010) established a VNS-band timeout of 500 s to solve the problem in 113 instances of the Harwell-Boeing sparse-matrix collection (http://math.nist.gov/MatrixMarket/data/Harwell-Boeing) (Duff et al. 1989a). Even though the heuristic may find a solution quickly, a 500-s timeout for searching for better solutions can be considered high, since this is the time the user waits for the results. On the other hand, the heuristic based on dual-representation simulated annealing (DRSA-band) of Torres-Jimenez et al. (2015) obtained, in tests conducted by these authors, slightly better bandwidth results than the VNS-band heuristic (Mladenovic et al. 2010). In general, however, the DRSA-band is slower than the VNS-band in the results presented by Torres-Jimenez et al. (2015). Thus, the DRSA-band heuristic was not considered potentially the best low-cost heuristic with significant bandwidth reduction because its computational cost is higher than that of the VNS-band heuristic; apart from significantly reducing the bandwidth, a heuristic must also have a low computational cost, i.e., it cannot be slow compared with other heuristics. Many papers in this field (these are only two examples) evaluate their heuristics on 113 instances of the Harwell-Boeing dataset, in which the number of rows/columns of the matrices varies from 30 to 1104. Although the Harwell-Boeing sparse-matrix collection has been widely used for testing heuristics for bandwidth reduction, the matrices in this dataset are too small by today's standards. The results of such papers, with the largest case having only 1104 rows, offer little insight into how the tested heuristics compare on the large instances that are of more practical interest. Hence, additional datasets including much larger matrices are covered in our evaluation.

Thus, one of the main objectives of this study is to verify whether, with a low timeout, the VNS-band heuristic still achieves better results than the possible best low-cost heuristics for bandwidth or profile reduction identified in systematic reviews. Therefore, the main contribution of this work is the comparison of results obtained using 14 heuristics for bandwidth and profile reductions in symmetric and asymmetric matrices (up to 100,196 vertices). These 14 heuristics were selected in systematic reviews as the possible best low-cost methods to solve the problems (Chagas and Gonzaga de Oliveira 2015; Bernardes et al. 2015; Gonzaga de Oliveira and Chagas 2015).

Only potentially low-cost heuristics for bandwidth and profile reductions were selected in the systematic reviews. The reason for this decision is that a local reordering of the vertices of the graph associated with the matrix of the linear system may contribute to reducing the computational cost of an iterative solver (Duff and Meurant 1989b). Although linear systems of reduced order can be solved efficiently with a multifrontal direct method, a prominent method for solving large-scale sparse linear systems is the conjugate gradient method (CGM) (Hestenes and Stiefel 1952; Lanczos 1952). A local ordering of the vertices of the graph corresponding to A (Duff and Meurant 1989b), which can be obtained by applying a heuristic for bandwidth reduction, reduces the computational cost of this method by improving cache hit rates (Das et al. 1992; Burgess and Giles 1997). On the other hand, bandwidth and profile reductions are not directly proportional to the reduction in computational cost obtained when linear systems are solved using an iterative method. An important issue is to have an ordering that leads to a small number of iterations when a preconditioner is applied (and this depends on the structure of the instance). Moreover, what is to be minimized is the total computing time, including the reordering time (at least when only a single linear system is to be solved); thus, the reordering of vertices must be performed at low cost. Additionally, Benzi et al. (1999) showed that heuristics for bandwidth reduction can have a positive effect on the computational cost of the generalized minimal residual (GMRES) method (Saad and Schultz 1986).
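To make the role of such reorderings concrete, the following pure-Python sketch implements the reverse Cuthill–McKee idea: a breadth-first numbering that visits neighbors in increasing-degree order and then reverses the result. Starting from a minimum-degree vertex is used here as a simple stand-in for the George–Liu pseudo-peripheral vertex finder of RCM–GL, so this is an illustration rather than the implementation evaluated in this paper:

```python
from collections import deque

def rcm(adj):
    """Reverse Cuthill-McKee sketch: adj maps each vertex to the set of
    its neighbors; returns order, where order[k] is the old label of the
    vertex that receives new label k."""
    order, visited = [], set()
    # start each connected component from a minimum-degree vertex
    for start in sorted(adj, key=lambda v: len(adj[v])):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # enqueue unvisited neighbors in increasing-degree order
            for w in sorted(adj[v] - visited, key=lambda u: len(adj[u])):
                visited.add(w)
                queue.append(w)
    order.reverse()  # the "reverse" step, which improves the profile
    return order

# A path graph 0-3-1-4-2 with scrambled labels: original bandwidth 3
adj = {0: {3}, 3: {0, 1}, 1: {3, 4}, 4: {1, 2}, 2: {4}}
pos = {v: k for k, v in enumerate(rcm(adj))}
new_bandwidth = max(abs(pos[u] - pos[w]) for u in adj for w in adj[u])
# new_bandwidth == 1
```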

The remainder of this paper is organized as follows. Section 2 explains the systematic reviews performed to identify the potential best low-cost heuristics for bandwidth and profile reductions. Section 3 presents how the simulations were conducted in this study. Section 4 shows the results. Finally, Sect. 5 addresses the conclusions.

2 Systematic reviews

Systematic reviews (Chagas and Gonzaga de Oliveira 2015; Bernardes et al. 2015; Gonzaga de Oliveira and Chagas 2015) report 73 and 74 heuristics for bandwidth and profile reductions, respectively, that had been published in the period of time spanning the 1960s to the present. Most of the heuristics were considered surpassed by other heuristics in these systematic reviews. Consequently, eight heuristics in each case were selected as potentially being the best low-cost heuristics for bandwidth [RCM–GL (George and Liu 1981), Burgess–Lai (BL) (Burgess and Lai 1986), WBRA (Esposito et al. 1998), FNCHC (Lim et al. 2003, 2004, 2007), GGPS (Wang et al. 2009), VNS-band (Mladenovic et al. 2010), hGPHH (Koohestani and Poli 2011), CSS-band (Kaveh and Sharafi 2012)] or profile [Snay (1976), RCM–GL (George and Liu 1981), RCM–GL–FL (Fenves and Law 1983), Sloan (1989), MPG (Medeiros et al. 1993), NSloan (Kumfert and Pothen 1997), Sloan–MGPS (Reid and Scott 1999), Hu and Scott (2001)] reduction.

Among the heuristics identified in the systematic reviews, 17 heuristics had been applied to both bandwidth and profile reductions. In addition, the Reverse Cuthill–McKee method with the pseudo-peripheral vertex given by the George–Liu algorithm (RCM–GL) (George and Liu 1981) was selected in both systematic reviews. Thus, 130 heuristics for bandwidth and profile reductions were identified across the two systematic reviews, and 15 heuristics were selected because no simulation or comparison in the articles analyzed showed that they could be superseded by any other heuristic in terms of bandwidth or profile reduction when computational costs are also considered.

The Gibbs–Poole–Stockmeyer (GPS) algorithm (Gibbs et al. 1976), although outperformed by metaheuristic-based heuristics, is one of the classic low-cost heuristics in the field for both bandwidth and profile reductions. Hence, the GPS algorithm was also implemented and its results were compared with those of the other heuristics implemented in this computational experiment.

On the other hand, Fenves and Law's RCM–GL (RCM–GL–FL) method (Fenves and Law 1983), despite being selected in the systematic review of heuristics for profile reduction, was not implemented in this work because it is a specific application of the RCM–GL method to finite element discretizations. Additionally, although its authors presented no computational costs, Hu and Scott's heuristic (Hu and Scott 2001) was selected for its profile-reduction results when compared with the heuristics tested by Hu and Scott (2001). However, the Hu–Scott heuristic was not implemented here because it was considered a high-cost heuristic, mainly because it performs matrix multiplications.

Another version of the VNS-band was proposed by Wang et al. (2014). This heuristic is outperformed by the fast node centroid hill climbing (FNCHC) heuristic (Lim et al. 2007), both in solution quality and in the computational cost required to produce an approximate solution. One can verify this by examining the results of the original VNS-band heuristic (Mladenovic et al. 2010) and of the VNS-band proposed by Wang et al. (2014) and comparing them with the results presented below. It should be noted that the dataset used in the tests shown by Wang et al. (2014) is a subset of a dataset presented below.

Thus, Table 1 shows 14 heuristics that can be considered as the most promising low-cost heuristics to solve the problems. These 14 heuristics were implemented and tested in this work.

Table 1 Low-cost heuristics for bandwidth and profile reductions that were selected from systematic reviews

3 Description of the tests

Appendix A (Appendices A–D are available at http://www.dcc.ufla.br/~sanderson/app_coam16) shows the testing and calibration performed to compare our implementations with the codes used by the original proposers of the 14 heuristics, to ensure that our codes were comparable to the algorithms originally proposed. To evaluate the bandwidth and profile reductions provided by the 14 selected heuristics, two datasets commonly used in this field were employed: 113 (50 symmetric and 63 asymmetric) instances of the Harwell-Boeing sparse-matrix collection (http://math.nist.gov/MatrixMarket/data/Harwell-Boeing) (Duff et al. 1989a) and 22 (17 symmetric and 5 asymmetric) instances of the University of Florida sparse-matrix collection (http://www.cise.ufl.edu/research/sparse/matrices/index.html) (Davis and Hu 2011). These instances represent a wide spectrum of scientific and engineering application matrices (i.e., standard test matrices arising from problems in linear systems, least squares, and eigenvalue calculations) and were employed here because many other researchers have used them for comparisons and evaluations. Specifically, the 113 instances of the Harwell-Boeing sparse-matrix collection were divided into two subsets: (i) 33 instances, ranging from 30 to 237 vertices according to the collection, each of which (used as counter-examples to hypotheses in sparse-matrix research) has fewer than 200 vertices when vertices without adjacencies are disregarded; and (ii) 80 instances, ranging from 207 to 1104 vertices. This subdivision of the 113 instances of the Harwell-Boeing sparse-matrix collection into two sets is common in the field; see, for example, Martí et al. (2001), Lim et al. (2004, 2007), Piñana et al. (2004), Rodriguez-Tello et al. (2008), Mladenovic et al. (2010), and Torres-Jimenez et al. (2015). Additionally, each set of instances was divided into symmetric and asymmetric matrices, as shown in the tables below.

To apply the heuristics implemented in this work to asymmetric matrices, we used a strategy proposed by Reid and Scott (2006). In this strategy, the asymmetric matrix \(A_u\) is added to its transpose \(A_u^\text{T}\), i.e., \(A_u + A_u^\text{T}\), resulting in a symmetric matrix. Subsequently, the heuristic is applied to the graph associated with the resulting matrix, and the reordering attained by the heuristic is used to reorder the rows of the original asymmetric matrix. According to the results presented by Reid and Scott (2006), this strategy produced the poorest bandwidth reductions for asymmetric matrices using the RCM heuristic (George 1971) when compared with the two other strategies proposed by Reid and Scott (2006). Nevertheless, the strategy of computing \(A_u + A_u^\text{T}\) was applied in our approach because it is the one used along with the RCM–GL method (George and Liu 1981) in the MATLAB software (MATLAB 2016). In addition, it is the simplest strategy and the one that presents the lowest storage and computational costs among those proposed by Reid and Scott (2006).
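Only the nonzero pattern of \(A_u + A_u^\text{T}\) matters for the reordering, so it suffices to build the adjacency structure in which vertices i and j are connected whenever \(a_{ij} \ne 0\) or \(a_{ji} \ne 0\). A minimal sketch, assuming a dense 0-based matrix (not the data structures actually used in this work):

```python
def symmetrized_adjacency(A):
    """Adjacency structure of the graph of A_u + A_u^T: vertices i, j
    are neighbors whenever A[i][j] != 0 or A[j][i] != 0 (i != j)."""
    n = len(A)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(n):
            if i != j and (A[i][j] != 0 or A[j][i] != 0):
                adj[i].add(j)
    return adj

A_u = [[1, 0, 5],
       [0, 1, 0],
       [0, 2, 1]]
adj = symmetrized_adjacency(A_u)
# adj == {0: {2}, 1: {2}, 2: {0, 1}}: each edge now appears in both
# directions; the reordering found on this graph is then used to
# permute the rows of the original asymmetric matrix A_u
```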

The workstations used in the execution of the simulations with the instances of the Harwell-Boeing and University of Florida sparse-matrix collections contained an Intel® Core™ i3-2120 (3 MB cache, CPU 3.30 GHz\(\times \)4, 8 GB of DDR3 1333 MHz main memory) and an Intel® Xeon™ E5620 (12 MB cache, CPU 2.40 GHz\(\times \)8, 24 GB of DDR3 1333 MHz main memory) (Intel; Santa Clara, CA, United States), respectively. The Ubuntu 14.04 LTS 64-bit operating system with Linux kernel version 3.13.0-39-generic was used. It should be noted that we followed the recommendations given by Johnson (2002) for this experimental analysis of the 14 heuristics for bandwidth and profile reductions.

A metric used by some authors, for example Kumfert and Pothen (1997), Mladenovic et al. (2010), and Koohestani and Poli (2011), is to sum, for each heuristic, the bandwidths or profiles obtained. A similar metric, used for example by Martí et al. (2001), Piñana et al. (2004), Lim et al. (2007), and Rodriguez-Tello et al. (2008), is to calculate the average bandwidth over the instances in each set. However, these metrics are not reasonable for a set of instances with very different sizes, as is the case in the tests presented in this paper, because, for example, a heuristic with very bad results on a small instance would not be penalized appropriately.

In many works, including Burgess and Lai (1986), Medeiros et al. (1993), Esposito et al. (1998), Mladenovic et al. (2010), and Koohestani and Poli (2011), the authors compared heuristics by counting the number of times each heuristic obtained the lowest bandwidth or profile on the instances of the Harwell-Boeing sparse-matrix collection. This kind of metric is also shown in the following tables. However, counting wins can also mislead: a heuristic may reasonably be considered the best of an entire set even when it attains only the second-best result in many instances, while different heuristics are the best for different individual instances. Thus, to evaluate the quality of the results obtained using the heuristics tested, for each heuristic h in each set of instances, \(\rho _p=\frac{\mathrm{profile}_h-\mathrm{profile}_{\mathrm{min}}}{\mathrm{profile}_{\mathrm{min}}}\) was calculated for each instance, where \(\mathrm{profile}_h\) is the profile obtained using heuristic h and \(\mathrm{profile}_{\mathrm{min}}\) is the lowest profile obtained for that instance by any heuristic tested. The same is carried out in relation to both bandwidth reduction (\(\rho _{\beta }\)) and execution times (\(\rho _t\)). To the best of our knowledge, this paper is the first published instance of this approach being used in the field.
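This normalization can be sketched as follows; the heuristic names and values here are illustrative only, not taken from the tables below:

```python
def rho(results):
    """Normalized gap for one instance: maps each heuristic h to
    (value_h - value_min) / value_min, so the best heuristic scores 0
    regardless of the size of the instance."""
    best = min(results.values())
    return {h: (v - best) / best for h, v in results.items()}

# hypothetical profiles obtained by three heuristics on one instance
gaps = rho({'RCM-GL': 120, 'NSloan': 100, 'MPG': 110})
# gaps == {'RCM-GL': 0.2, 'NSloan': 0.0, 'MPG': 0.1}
```

Summing these gaps over all instances of a set yields the \(\sum \rho \) values reported in the figures.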

4 Results and analysis

This section presents the results obtained by the heuristics in relation to bandwidth and profile reductions in 113 (Sect. 4.1) and 22 (Sect. 4.2) instances of the Harwell-Boeing and University of Florida sparse-matrix collections, respectively. The tables in this section show the instance name and size (n), the initial bandwidth (\(\beta _0\)) or profile (\(\mathrm{profile}_0\)) of the instance, and the average values of bandwidth, profile, and runtime obtained using each heuristic over 10 executions on each instance. Section 4.3 analyzes the results obtained.

4.1 Results of 14 heuristics applied to instances of the Harwell-Boeing sparse-matrix collection

This section presents the bandwidth and profile results obtained using 14 heuristics applied to 113 instances of the Harwell-Boeing sparse-matrix collection. Sections 4.1.1–4.1.4 present the results of the 14 heuristics for bandwidth and profile reductions applied to the sets composed of 15 and 35 symmetric matrices and of 18 and 45 asymmetric matrices of the Harwell-Boeing sparse-matrix collection, respectively.

4.1.1 Results of 14 heuristics applied to the set composed of 15 symmetric instances of the Harwell-Boeing sparse-matrix collection

In this set composed of 15 symmetric instances, the FNCHC (Lim et al. 2003, 2004, 2007) and NSloan (Kumfert and Pothen 1997) heuristics were the best for reducing bandwidth and profile when considering the \(\rho _{\beta }\) and \(\rho _p\) metrics, respectively [see Fig. 1a, b, and Table 9 (Tables 9–21 are available in Appendix B)]. These two heuristics also obtained the largest number of best results in bandwidth (in 12 instances) and profile (in 8 instances) in this dataset (see Fig. 1c, d).

Fig. 1
figure 1

Results (a \(\sum \rho _b\), b \(\sum \rho _p\)) of 14 heuristics applied to reduce a bandwidth and b profile, and number of best results of several heuristics applied to reduce c bandwidth and d profile of 15 symmetric instances of the Harwell-Boeing sparse-matrix collection

The VNS-band heuristic was tested with a timeout of 500 s because this was the time set by its authors (Mladenovic et al. 2010). It should be noted that 500 s on the machine used in this study corresponds to more computation than 500 s on the machine used by Mladenovic et al. (2010). In spite of this, the VNS-band heuristic was dominated by the FNCHC and NSloan heuristics in relation to bandwidth and profile reductions, respectively, for this set of instances.

The hGPHH-GL and RCM–GL methods were the fastest among the heuristics tested according to the \(\rho _t\) metric, with the RCM–GL method obtaining the lowest computational cost in the largest number of instances (six) (see Fig. 2; Table 10). In particular, the WBRA (Esposito et al. 1998), FNCHC (Lim et al. 2003, 2004, 2007), VNS-band (Mladenovic et al. 2010), and CSS-band (Kaveh and Sharafi 2012) heuristics showed higher execution times than the 10 other heuristics tested. Moreover, the hGPHH-GL method is as fast as the RCM–GL method because the only difference between them is the order in which the vertices adjacent to the current vertex are numbered.

Fig. 2
figure 2

Results (\(\sum \rho _t\)) of 10 heuristics applied to reduce bandwidth of 15 symmetric instances of the Harwell-Boeing sparse-matrix collection

4.1.2 Results of 14 heuristics applied to the set composed of 35 symmetric instances of the Harwell-Boeing sparse-matrix collection

In this set composed of 35 symmetric instances, the VNS-band (8 s) (Mladenovic et al. 2010) and MPG (Medeiros et al. 1993) heuristics were the best ones for reducing bandwidth and profile when considering the \(\rho _b\) and \(\rho _p\) metrics, respectively (see Fig. 3a, b; Tables 11 and 12). These two heuristics also obtained the largest number of best results of bandwidth (in 24 instances) and profile (in 11 instances) in this set of instances, respectively (see Fig. 3c, d). In addition, Sloan’s (1989) algorithm was the fastest method among the heuristics tested (see Fig. 4; Table 13).

Fig. 3
figure 3

Results (a \(\sum \rho _b\), b \(\sum \rho _p\)) of 14 heuristics applied to reduce a bandwidth and b profile, and number of best results of several heuristics applied to reduce c bandwidth and d profile of 35 symmetric instances of the Harwell-Boeing sparse-matrix collection

Fig. 4
figure 4

Results (\(\sum \rho _t\)) of 10 heuristics applied to reduce bandwidth of 35 symmetric instances of the Harwell-Boeing sparse-matrix collection

4.1.3 Results of 14 heuristics applied to the set composed of 18 asymmetric instances of the Harwell-Boeing sparse-matrix collection

In this set composed of 18 asymmetric instances, the FNCHC and MPG heuristics were the best ones for reducing bandwidth and profile when considering the \(\rho _b\) and \(\rho _p\) metrics, respectively (see Fig. 5a, b; Table 14). The RCM–GL method was the fastest method among the heuristics tested (see Fig. 6; Table 15).

Fig. 5
figure 5

Results (a \(\sum \rho _b\), b \(\sum \rho _p\)) of 14 heuristics applied to reduce a bandwidth and b profile of 18 asymmetric instances of the Harwell-Boeing sparse-matrix collection

Fig. 6
figure 6

Results (\(\sum \rho _t\)) of 10 heuristics applied to reduce bandwidth of 18 asymmetric instances of the Harwell-Boeing sparse-matrix collection

4.1.4 Results of 14 heuristics applied to the set composed of 45 asymmetric instances of the Harwell-Boeing sparse-matrix collection

In this set composed of 45 asymmetric instances, the VNS-band (12 s) (Mladenovic et al. 2010) and MPG (Medeiros et al. 1993) heuristics were the best ones for reducing bandwidth and profile when considering the \(\rho _b\) and \(\rho _p\) metrics, respectively (see Fig. 7a, b; Tables 16 and 17). These two heuristics also obtained the largest number of best results of bandwidth (in 34 instances) and profile (in 17 instances) in this set of instances, respectively (see Fig. 7c, d). The hGPHH-GL heuristic was the fastest method among the heuristics tested (see Fig. 8; Table 18).

Fig. 7
figure 7

Results (a \(\sum \rho _b\), b \(\sum \rho _p\)) of 14 heuristics applied to reduce a bandwidth and b profile, and number of best results of several heuristics applied to reduce c bandwidth and d profile of 45 asymmetric instances of the Harwell-Boeing sparse-matrix collection

Fig. 8
figure 8

Results (\(\sum \rho _t\)) of 10 heuristics applied to reduce bandwidth of 45 asymmetric instances of the Harwell-Boeing sparse-matrix collection

4.2 Results of 10 heuristics applied to instances of the University of Florida sparse-matrix collection

The results of Snay's (1976), Burgess and Lai's (1986), WBRA (Esposito et al. 1998), and CSS-band (Kaveh and Sharafi 2012) heuristics were dominated by those of the 10 other heuristics when applied to the 113 instances of the Harwell-Boeing sparse-matrix collection (see Sect. 4.1), so these four heuristics were not applied to the 22 instances of the University of Florida sparse-matrix collection. Hence, Sects. 4.2.1 and 4.2.2 present the results of 10 heuristics for bandwidth and profile reductions applied to the sets composed of 17 symmetric and 5 asymmetric matrices of the University of Florida sparse-matrix collection, respectively.

4.2.1 Results of 10 heuristics applied to the set composed of 17 symmetric instances of the University of Florida sparse-matrix collection

In this set composed of 17 symmetric instances, the FNCHC and NSloan heuristics were the best ones for reducing bandwidth and profile when considering the \(\rho _b\) and \(\rho _p\) metrics, respectively (see Fig. 9a, b; Table 19). These two heuristics also obtained the largest number of best results of bandwidth (in eight instances) and profile (in ten instances) in this set of instances, respectively (see Fig. 9c, d). The RCM–GL method was the fastest method among the heuristics tested (see Fig. 10; Table 20).

Fig. 9
figure 9

Results (a \(\sum \rho _b\), b \(\sum \rho _p\)) of 10 heuristics applied to reduce a bandwidth and b profile, and number of best results of several heuristics applied to reduce c bandwidth and d profile of 17 symmetric instances of the University of Florida sparse-matrix collection

Fig. 10
figure 10

Results (\(\sum \rho _t\)) of 10 heuristics applied to reduce bandwidth of 17 symmetric instances of the University of Florida sparse-matrix collection

We established the VNS-band timeout at 500 s to solve the problem in these 17 symmetric instances of the University of Florida sparse-matrix collection. The VNS-band heuristic would probably achieve better results with a higher timeout, but our intention is to investigate low timeouts for the heuristics. In the tests shown in Table 20, one can note that the VNS-band heuristic achieved its results at computational costs one and five orders of magnitude higher than those of the FNCHC and RCM–GL heuristics, respectively.

4.2.2 Results of 10 heuristics applied to the set composed of five asymmetric instances of the University of Florida sparse-matrix collection

In this set composed of five asymmetric instances, one can verify that the GPS and MPG heuristics were the best ones for reducing bandwidth and profile when considering the \(\rho _b\) and \(\rho _p\) metrics, respectively (see Fig. 11a, b; Table 21). These two heuristics also obtained the largest number of best results of bandwidth (in two instances) and profile (in three instances) in this set of instances, respectively. The RCM–GL method was the fastest method among the heuristics tested (see Fig. 12; Table 21).

Fig. 11
figure 11

Results (a \(\sum \rho _b\), b \(\sum \rho _p\)) of 10 heuristics applied to reduce a bandwidth and b profile of five asymmetric instances of the University of Florida sparse-matrix collection

Fig. 12
figure 12

Results (\(\sum \rho _t\)) of 6 heuristics applied to reduce bandwidth of five asymmetric instances of the University of Florida sparse-matrix collection

Similar to the executions performed with the 17 symmetric instances of the University of Florida sparse-matrix collection, we established the VNS-band timeout at 500 s to solve the problem in these five asymmetric instances of this dataset. Likewise, the VNS-band heuristic would probably achieve better results with a higher timeout, but our intention is to investigate low timeouts for the heuristics. In the tests performed on these five asymmetric matrices, the VNS-band heuristic achieved its results at computational costs one and five orders of magnitude higher than those of the GPS and RCM–GL algorithms, respectively (see Table 21).

Table 2 The most promising heuristics (for ten application areas) for reducing bandwidth and profile of symmetric matrices
Table 3 The most promising heuristics (for 14 application areas) for reducing bandwidth and profile of asymmetric matrices
Table 4 Heuristics that showed the best overall performance on the six sets of test matrices

4.3 Best heuristics applied to four sets of instances of the Harwell-Boeing and two sets of instances of the University of Florida sparse-matrix collections

The sets composed of 15 symmetric and 18 asymmetric instances contain much smaller matrices than the sets composed of 35 symmetric and 45 asymmetric instances, respectively. Moreover, the instances of the University of Florida sparse-matrix collection are larger than the instances of the Harwell-Boeing sparse-matrix collection. It may not be appropriate to give a small instance the same weight as a much larger instance. Thus, the sets of instances of the University of Florida sparse-matrix collection were the main ones considered to identify the best low-cost heuristics for bandwidth and profile reductions. On the other hand, the sets composed of instances of the Harwell-Boeing sparse-matrix collection are also considered, as described below.

We divided the sets of test problems by application area and then identified the most promising heuristics for each specific application. Tables 2 (symmetric matrices) and 3 (asymmetric matrices) show the most promising heuristics per application area when applied to the four sets of instances of the Harwell-Boeing (http://math.nist.gov/MatrixMarket/data/Harwell-Boeing (Duff et al. 1989a)) and the two sets of instances of the University of Florida sparse-matrix (http://www.cise.ufl.edu/research/sparse/matrices/index.html (Davis and Hu 2011)) collections with respect to bandwidth and profile reductions (see Tables 22 and 23 in Appendix C). Appendix D analyzes the heuristics that showed the best bandwidth and profile reductions for symmetric and asymmetric matrices. Table 4 shows the heuristics with the best overall performance on these six sets of test matrices.

Figure 13 [WBRA (Esposito et al. 1998), FNCHC (Lim et al. 2003, 2004, 2007), VNS-band (Mladenovic et al. 2010), and CSS-band (Kaveh and Sharafi 2012) heuristics], Figure 14 [GPS (Gibbs et al. 1976), Snay’s (1976), Burgess and Lai (1986), and GGPS (Wang et al. 2009) heuristics], and Figure 15 [GPS (Gibbs et al. 1976), FNCHC (Lim et al. 2003, 2004, 2007), GGPS (Wang et al. 2009), and VNS-band (Mladenovic et al. 2010) heuristics] show the execution times of the most time-consuming heuristics tested, applied to 113 and 22 instances of the Harwell-Boeing and University of Florida sparse-matrix collections, respectively.

As expected, the FNCHC (Lim et al. 2003, 2004, 2007), VNS-band (Mladenovic et al. 2010), and CSS-band (Kaveh and Sharafi 2012) heuristics showed higher computational costs than most of the methods tested here because these are metaheuristic-based heuristics.

Fig. 13

Average execution times of the WBRA (Esposito et al. 1998), FNCHC (Lim et al. 2003, 2004, 2007), VNS-band (8 s) (Mladenovic et al. 2010), and CSS-band (Kaveh and Sharafi 2012) heuristics applied to 103 instances (ranging from 100 to 1104 vertices) of the Harwell-Boeing sparse-matrix collection

Fig. 14

Average execution times of the GPS (Gibbs et al. 1976), Snay’s (1976), Burgess and Lai (1986), and GGPS (Wang et al. 2009) heuristics applied to 113 instances of the Harwell-Boeing sparse-matrix collection

Fig. 15

Average execution times of the GPS (Gibbs et al. 1976), FNCHC (Lim et al. 2003, 2004, 2007), GGPS (Wang et al. 2009), and VNS-band (Mladenovic et al. 2010) heuristics applied to 17 symmetric and 5 asymmetric instances of the University of Florida sparse-matrix collection

The WBRA heuristic (Esposito et al. 1998) shows a high computational cost because it builds a rooted level structure (RLS) for each vertex of the graph, and these structures are renumbered based on a bottleneck linear assignment. This heuristic for bandwidth reduction of symmetric matrices did not show competitive results when compared with other heuristics, such as the VNS-band, FNCHC, and GPS heuristics. The WBRA heuristic employs a function that moves vertices from the level of maximum width to an adjacent level (nearer level 0) in the level structure (which may no longer be rooted at a vertex). This may result in a worse distribution of the vertices than in the RLS created by the GPS algorithm, for example.
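To make the discussion concrete: a rooted level structure is simply the partition of the vertices into breadth-first levels around a root, and its width is the size of the largest level. A minimal sketch follows; the adjacency-list representation and function names are our own and are not taken from the WBRA paper:

```python
def rooted_level_structure(adj, root):
    """Partition the vertices reachable from `root` into BFS levels."""
    levels, seen = [[root]], {root}
    while True:
        # gather unvisited neighbors of the last level, preserving order
        frontier = [w for v in levels[-1] for w in adj[v] if w not in seen]
        frontier = list(dict.fromkeys(frontier))  # drop duplicates
        if not frontier:
            return levels
        seen.update(frontier)
        levels.append(frontier)

def width(levels):
    """Width of a level structure: the size of its largest level."""
    return max(len(level) for level in levels)
```

WBRA builds one such structure per vertex of the graph, which already explains its cost on large instances.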

The results of the CSS-band heuristic may be related to the random initial solution suggested by its authors (we strictly followed the proposed algorithms when implementing these heuristics). By contrast, the VNS-band and FNCHC heuristics employ a variation of the breadth-first search procedure to provide an initial solution. Moreover, the CSS-band heuristic converged slowly.

Although based on the RCM and GPS (Gibbs et al. 1976) algorithms, Burgess and Lai’s heuristic (Burgess and Lai 1986) contains a stage in which three RLSs are modified, and it showed higher execution times than the GPS algorithm. More precisely, Burgess and Lai’s heuristic (Burgess and Lai 1986) attempts to reduce the width of each level of the three RLSs built in its first step. The heuristic moves vertices one level up or down until a minimum width is reached. Thus, the algorithm may process a level for a long time before reaching this minimum width.
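The width-reduction stage can be illustrated by the following simplified greedy sketch. It is not the exact Burgess–Lai procedure, but it captures the move that procedure iterates: take the widest level and shift a vertex to an adjacent level whenever the move shrinks the width and keeps every edge between vertices of the same or adjacent levels (representation and names are illustrative):

```python
def reduce_level_width(adj, levels):
    """Greedily move vertices out of the widest level of a level
    structure into an adjacent, strictly smaller level."""
    level_of = {v: i for i, level in enumerate(levels) for v in level}
    moved = True
    while moved:
        moved = False
        widest = max(range(len(levels)), key=lambda i: len(levels[i]))
        for v in list(levels[widest]):
            neighbor_levels = [level_of[w] for w in adj[v]]
            for target in (widest - 1, widest + 1):
                # move only if the target stays smaller than the widest
                # level and no edge would span more than one level
                if (0 <= target < len(levels)
                        and len(levels[target]) + 1 < len(levels[widest])
                        and all(abs(l - target) <= 1 for l in neighbor_levels)):
                    levels[widest].remove(v)
                    levels[target].append(v)
                    level_of[v] = target
                    moved = True
                    break
    return levels
```

Each accepted move strictly decreases the sum of squared level sizes, so this sketch terminates; the original heuristic instead iterates until a minimum width is reached, which is why it may spend a long time on a single level.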

Fig. 16

Average execution times of the RCM–GL (George and Liu 1981), Sloan’s (1989), MPG (Medeiros et al. 1993), NSloan (Kumfert and Pothen 1997), Sloan–MGPS (Reid and Scott 1999), and hGPHH-GL (Koohestani and Poli 2011) heuristics applied to 113 asymmetric instances of the Harwell-Boeing sparse-matrix collection

Snay’s heuristic (Snay 1976) is similar to the RCM method in that both are based on a breadth-first search procedure. The RCM method labels the vertices adjacent to the current vertex (sorting them in ascending order of degree), whereas Snay’s heuristic (Snay 1976) also takes into account vertices in the second level of adjacency of the vertices already labeled (sorting them in ascending order of degree and considering only adjacencies to vertices that are neither labeled nor candidate vertices). Snay’s heuristic (Snay 1976) performs this task for 10 pseudo-peripheral vertices, which explains its high computational cost. Although designed for profile reduction (by allowing vertices of small degree to be labeled before other vertices), Snay’s heuristic did not produce results competitive with those of the other heuristics for profile reduction, such as Sloan’s algorithm (and its variations).
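The underlying RCM procedure that both heuristics build on can be sketched in a few lines: a breadth-first search that visits the unlabeled neighbors of each vertex in ascending order of degree, reversing the final sequence. The representation is illustrative, and production implementations also select a pseudo-peripheral starting vertex rather than taking one as a parameter:

```python
from collections import deque

def reverse_cuthill_mckee(adj, start):
    """Cuthill-McKee ordering from `start`, reversed (RCM)."""
    order, labeled = [start], {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        # label the unlabeled neighbors in ascending order of degree
        for w in sorted(adj[v], key=lambda u: len(adj[u])):
            if w not in labeled:
                labeled.add(w)
                order.append(w)
                queue.append(w)
    return order[::-1]
```

Snay’s heuristic differs in that it also looks one adjacency level further ahead before committing to a label, multiplying the work per labeled vertex; repeated over 10 pseudo-peripheral starting vertices, this accounts for the cost observed above.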

In addition to the GPS (Gibbs et al. 1976), MPG (Medeiros et al. 1993), NSloan (Kumfert and Pothen 1997), and FNCHC (Lim et al. 2003, 2004, 2007) heuristics for bandwidth and/or profile reductions, it should also be noted that the RCM–GL (George and Liu 1981) and hGPHH-GL (Koohestani and Poli 2011) methods showed very low computational costs on the six datasets tested. Figure 16 shows the execution times of the RCM–GL (George and Liu 1981), Sloan’s (1989), MPG (Medeiros et al. 1993), NSloan (Kumfert and Pothen 1997), Sloan–MGPS (Reid and Scott 1999), and hGPHH-GL (Koohestani and Poli 2011) heuristics, the lowest-cost heuristics tested, applied to 113 asymmetric instances of the Harwell-Boeing sparse-matrix collection. In particular, the 14 heuristics showed very high computational costs on the MBEACXC, MBEAFLW, and MBEAUSE instances (from economic modeling). These instances have many connected components and isolated vertices, and the heuristics are applied to each connected component of a disconnected graph.
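Handling instances such as MBEACXC therefore begins by decomposing the graph; a minimal sketch of that decomposition follows (names are illustrative), with each resulting component then reordered independently:

```python
def connected_components(adj):
    """Connected components of an undirected graph (adjacency list).

    Isolated vertices yield singleton components; a reordering heuristic
    is then run once per component, which is why instances with many
    components and isolated vertices are comparatively expensive.
    """
    seen, components = set(), []
    for v in adj:
        if v in seen:
            continue
        component, stack = [], [v]
        seen.add(v)
        while stack:  # iterative depth-first traversal of one component
            u = stack.pop()
            component.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components
```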

Figure 17 shows the average execution times of the RCM–GL (George and Liu 1981), Sloan’s (1989), MPG (Medeiros et al. 1993), NSloan (Kumfert and Pothen 1997), Sloan–MGPS (Reid and Scott 1999), and hGPHH-GL (Koohestani and Poli 2011) heuristics applied to 22 instances of the University of Florida sparse-matrix collection. In general, the MPG heuristic (Medeiros et al. 1993) showed a lower computational cost than the Sloan (1989), NSloan (Kumfert and Pothen 1997), and Sloan–MGPS (Reid and Scott 1999) heuristics. In particular, the MPG heuristic (Medeiros et al. 1993) limits the number of candidate vertices examined in each iteration.

Fig. 17

Average execution times of the RCM–GL (George and Liu 1981), Sloan’s (1989), MPG (Medeiros et al. 1993), NSloan (Kumfert and Pothen 1997), Sloan–MGPS (Reid and Scott 1999), and hGPHH-GL (Koohestani and Poli 2011) heuristics applied to 22 instances of the University of Florida sparse-matrix collection

5 Conclusions

Among 132 heuristics for bandwidth and/or profile reduction identified in the literature, 14 were selected as the most promising low-cost heuristics for these problems. In experiments with 113 and 22 instances (with up to 100,196 vertices) of the Harwell-Boeing and University of Florida sparse-matrix collections, respectively, the FNCHC (Lim et al. 2003, 2004, 2007) and GPS (Gibbs et al. 1976) heuristics showed the best results with respect to bandwidth reduction of symmetric and asymmetric matrices, respectively. Additionally, the NSloan (Kumfert and Pothen 1997) and MPG (Medeiros et al. 1993) heuristics may be seen as the most promising low-cost algorithms for profile reduction of symmetric and asymmetric matrices, respectively.

Heuristics for reordering vertices help provide adequate memory locality and, hence, improve cache hit rates (Das et al. 1992; Burgess and Giles 1997). Moreover, as the computational experiments described in this paper show, the choice of heuristic is highly dependent on the specific instance. Thus, in addition to the GPS (Gibbs et al. 1976), MPG (Medeiros et al. 1993), NSloan (Kumfert and Pothen 1997), and FNCHC (Lim et al. 2003, 2004, 2007) heuristics, the RCM–GL (George and Liu 1981) and hGPHH-GL (Koohestani and Poli 2011) algorithms showed very low computational costs for bandwidth and/or profile reductions on the six sets of instances tested. These six heuristics were identified as the most promising low-cost heuristics for bandwidth and profile reductions, so that the computational cost of solving large-scale linear systems by iterative methods can be reduced (depending on the preconditioner used to reduce the number of iterations of the solver). In addition, Sloan’s, GGPS, and Sloan–MGPS heuristics can be applied to instances from specific application areas (see Tables 2, 3).

Additional datasets containing much larger matrices will be included in a future evaluation, to offer insight into how the tested heuristics compare on very large instances of more practical interest. Thus, we intend to apply the 14 heuristics tested in this work, in particular the six heuristics selected, to reorder vertices and reduce the computational cost of iterative methods for solving large-scale linear systems, and thereby to verify the best method(s) in this context.