Abstract
Hundreds of heuristics have been proposed since the 1960s to solve the bandwidth and profile reduction problems. In reviews of the literature, we found 132 heuristics that have been applied to these problems. Among them, 14 were selected because no simulation or comparison in the articles analyzed showed that they could be superseded by any other algorithm with respect to bandwidth or profile reduction; the computational costs of the heuristics were also considered in this selection. These 14 heuristics were therefore selected as potentially being the best low-cost methods for the bandwidth and/or profile reduction problems, and their results are evaluated in this work. For the evaluation on the set of test problems, a metric based on the relative percentage distance to the best bandwidth or profile obtained is proposed. The most promising heuristics for several application areas are identified. Moreover, among the evaluated heuristics, the FNCHC and GPS heuristics showed the best overall results in reducing the bandwidth of symmetric and asymmetric matrices, respectively, and the NSloan and MPG heuristics showed the best overall results in reducing the profile of symmetric and asymmetric matrices, respectively.
1 Introduction
The resolution of large sparse linear systems \(Ax= b\), in which A is a sparse matrix, is fundamental in several science and engineering applications and is generally the part of the simulation that requires the highest computational cost. The principal origin of the problems with large-scale matrices arises from the discretization of elliptic or parabolic partial differential equations (PDEs) (Benzi 2002). The methods of finite elements, finite differences, and finite volumes are some of the most common numerical problem-solving methods related to physical phenomena that are modeled by PDEs. Large sparse linear systems are generated when these methods are applied. In addition, large sparse linear systems are also originated from problems that are not modeled by PDEs, such as chemical engineering processes, design and analysis of integrated circuits, and power system networks (Benzi 2002). A considerable amount of memory and a high processing cost are necessary to store and to solve these large-scale linear systems.
Modern hierarchical memory architectures and paging policies favor programs that take locality of reference into account. Thus, cache coherence (that is, that a sequence of recent memory references is clustered locally rather than randomly in the memory address space) should be an important consideration when designing a new algorithm. For the low-cost solution of large sparse linear systems, and to reduce the memory space required, an adequate nodal renumbering is desirable so that the corresponding coefficient matrix A has narrow bandwidth and small profile. Thus, one way to design an algorithm that returns a sequence of graph vertices with cache coherence is to use heuristics for bandwidth reduction. Therefore, heuristics for bandwidth and profile reduction are used to achieve low computational and storage costs when solving large sparse linear systems (Gonzaga de Oliveira and Chagas 2015). In particular, profile reduction is also important for reducing the storage cost of applications that use the skyline data structure (Felippa 1975) to represent large-scale matrices.
Let \(A = [a_{ij}]\) be an \(n \times n\) symmetric matrix (corresponding to an undirected graph \(G=(V,E)\) with vertex set V and edge set E). The bandwidth of row i is \(\beta _i(A) = i - \mathrm{min}(j : 1 \le j < i,\ a_{ij} \ne 0)\). The bandwidth \(\beta (A)\) is the largest distance between a non-null coefficient of the lower triangular part and the main diagonal, taken over all rows of the matrix, that is, \(\beta (A) = \mathrm{max}(\beta _i(A) : 1 \le i \le n)\). Equivalently, the bandwidth of G for a labeling \(S=\{s(v_1), s(v_2),\ldots ,s(v_{|V|})\}\) (i.e., a bijective mapping from V to the set \(\{1,2,\ldots ,|V|\}\)) is \(\beta (G)={\mathrm{max}}(|s(v_i)-s(v_j)|:(v_i,v_j)\in E)\). The profile of A is defined as \(\mathrm{profile}(A) = \sum _{i=1}^n \beta _i(A)\).
On the other hand, let \(A_u = [a_{ij}]\) be an \(n \times n\) asymmetric matrix. The bandwidth of row i is \(\beta _i(A_u) = {\mathrm{max}}({\beta _l}_i(A_u),{\beta _u}_i(A_u))\), where \({\beta _l}_i(A_u) = i - {\mathrm{min}}(j : 1 \le j < i,\ a_{ij} \ne 0)\) and \({\beta _u}_i(A_u) = {\mathrm{max}}(j: i < j \le n,\ a_{ij} \ne 0) - i\). The bandwidth \(\beta (A_u)\) is the largest distance between a non-null coefficient of the lower or upper triangular part and the main diagonal, taken over all rows of the matrix, that is, \(\beta (A_u) = \mathrm{max}({\beta _l}_i(A_u), {\beta _u}_i(A_u) : 1 \le i \le n)\). The profile of \(A_u\) is defined as \(\mathrm{profile}(A_u) = \sum _{i=1}^n({\beta _l}_i(A_u) + {\beta _u}_i(A_u))\).
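The definitions above translate directly into code. The following Python functions are an illustrative sketch only (0-based indexing, dense lists of lists; not the data structures used in the experiments reported here):

```python
def bandwidth_profile_sym(A):
    """Bandwidth and profile of a symmetric matrix, computed from the
    lower triangle only: beta_i = i - min{j < i : a_ij != 0}."""
    betas = []
    for i, row in enumerate(A):
        nz = [j for j in range(i) if row[j] != 0]
        betas.append(i - min(nz) if nz else 0)  # beta_i = 0 if no nonzero left of the diagonal
    return max(betas), sum(betas)


def bandwidth_profile_asym(A):
    """Bandwidth and profile of an asymmetric matrix, using the per-row
    lower and upper distances beta_l and beta_u from the text."""
    n = len(A)
    bl, bu = [], []
    for i, row in enumerate(A):
        lo = [j for j in range(i) if row[j] != 0]
        hi = [j for j in range(i + 1, n) if row[j] != 0]
        bl.append(i - min(lo) if lo else 0)
        bu.append(max(hi) - i if hi else 0)
    beta = max(max(l, u) for l, u in zip(bl, bu))
    profile = sum(l + u for l, u in zip(bl, bu))
    return beta, profile
```

For a tridiagonal symmetric matrix, for instance, every \(\beta _i(A)\) with \(i > 1\) equals 1, so \(\beta (A) = 1\) and \(\mathrm{profile}(A) = n - 1\).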
The problems of bandwidth and profile minimization are NP-hard (Papadimitriou 1976; Lin and Yuan 1994). Thus, several heuristics have been proposed to solve the bandwidth and/or profile reduction problems since the mid-1960s. The large number of available methods leaves the user with the arduous task of determining which one to employ. Comparisons among methods are available in the literature, but each covers only a few of them. Additionally, few reviews have been published in this field. Cuthill (1972) performed a comparative study of the heuristics known up to 1971. Gibbs et al. (1976) compared the results of six heuristics. Benzi (2002) reviewed preconditioning techniques for the iterative solution of large sparse linear systems; that review focused on techniques to improve the performance and reliability of Krylov subspace methods, however, and not strictly on heuristics for bandwidth or profile reduction (Gonzaga de Oliveira and Chagas 2015). Thus, a comparison of a large variety of heuristics proposed for bandwidth and profile reductions is still needed in the literature.
Disregarding computational costs, the Variable Neighborhood Search for bandwidth reduction (VNS-band) heuristic (Mladenovic et al. 2010) may represent the state of the art with respect to the bandwidth reduction problem. Mladenovic et al. (2010) set a timeout of 500 s for VNS-band to solve the problem on 113 instances of the Harwell-Boeing sparse-matrix collection (http://math.nist.gov/MatrixMarket/data/Harwell-Boeing) (Duff et al. 1989a). A 500-s timeout for searching for better solutions can be considered high, since this is the time the user waits for the results. On the other hand, the heuristic based on dual-representation simulated annealing (DRSA-band) of Torres-Jimenez et al. (2015) obtained, in tests conducted by those authors, results slightly better than those of the VNS-band heuristic (Mladenovic et al. 2010) with respect to bandwidth reduction. In general, however, DRSA-band is slower than VNS-band in the results presented by Torres-Jimenez et al. (2015). Thus, the DRSA-band heuristic was not considered as potentially the best low-cost heuristic with significant bandwidth reduction because its computational cost is higher than that of the VNS-band heuristic; apart from significantly reducing the bandwidth, a heuristic must also have a low computational cost, i.e., it cannot be slow compared with other heuristics. Many papers in this field, of which these are only two examples, evaluate their heuristics on 113 instances of the Harwell-Boeing collection, in which the number of rows/columns of the matrices varies from 30 to 1104. Although the Harwell-Boeing sparse-matrix collection has been widely used for testing heuristics for bandwidth reduction, its matrices are too small by today's standards. The results of such papers, with the largest instance having only 1104 rows, offer little insight into how the tested heuristics compare on the larger instances that are of more practical interest. Hence, additional datasets including much larger matrices are covered in our evaluation.
Thus, one of the main objectives of this study is to verify whether, with a low timeout, the VNS-band heuristic still achieves better results than the possible best low-cost heuristics for bandwidth or profile reduction identified in systematic reviews. Therefore, the main contribution of this work is the comparison of results obtained using 14 heuristics for bandwidth and profile reductions in symmetric and asymmetric matrices (up to 100,196 vertices). These 14 heuristics were selected in systematic reviews as the possible best low-cost methods to solve the problems (Chagas and Gonzaga de Oliveira 2015; Bernardes et al. 2015; Gonzaga de Oliveira and Chagas 2015).
Only potential low-cost heuristics for bandwidth and profile reductions were selected in the systematic reviews. The reason for this decision is that a local reordering of the vertices of the graph associated with the matrix of the linear system may contribute to reducing the computational cost of an iterative solver, such as the conjugate gradient method (CGM) (Hestenes and Stiefel 1952; Lanczos 1952; Duff and Meurant 1989b). Such a reduction in execution cost is achieved mainly by improving the number of cache hits (Das et al. 1992; Burgess and Giles 1997), and this local ordering can be obtained by applying a heuristic for bandwidth reduction. It should be noted, however, that bandwidth and profile reductions are not directly proportional to the reduction in computational cost obtained when linear systems are solved by an iterative method; it is also important to have an ordering that does not increase the number of iterations when a preconditioner is applied (and this depends on the structure of the instance). Clearly, what is to be minimized is the total computing time, including the reordering time (at least when only a single linear system is to be solved); thus, the reordering of vertices must be performed at low cost. Although linear systems of reduced order can be solved efficiently with a multifrontal direct method, a prominent method for solving large-scale sparse linear systems is the conjugate gradient method. Additionally, Benzi et al. (1999) showed that heuristics for bandwidth reduction can have a positive effect on the computational cost of the generalized minimal residual (GMRES) method (Saad and Schultz 1986).
The remainder of this paper is organized as follows. Section 2 explains the systematic reviews performed to identify the potential best low-cost heuristics for bandwidth and profile reductions. Section 3 presents how the simulations were conducted in this study. Section 4 shows the results. Finally, Sect. 5 addresses the conclusions.
2 Systematic reviews
Systematic reviews (Chagas and Gonzaga de Oliveira 2015; Bernardes et al. 2015; Gonzaga de Oliveira and Chagas 2015) report 73 and 74 heuristics for bandwidth and profile reductions, respectively, that had been published in the period of time spanning the 1960s to the present. Most of the heuristics were considered surpassed by other heuristics in these systematic reviews. Consequently, eight heuristics in each case were selected as potentially being the best low-cost heuristics for bandwidth [RCM–GL (George and Liu 1981), Burgess–Lai (BL) (Burgess and Lai 1986), WBRA (Esposito et al. 1998), FNCHC (Lim et al. 2003, 2004, 2007), GGPS (Wang et al. 2009), VNS-band (Mladenovic et al. 2010), hGPHH (Koohestani and Poli 2011), CSS-band (Kaveh and Sharafi 2012)] or profile [Snay (1976), RCM–GL (George and Liu 1981), RCM–GL–FL (Fenves and Law 1983), Sloan (1989), MPG (Medeiros et al. 1993), NSloan (Kumfert and Pothen 1997), Sloan–MGPS (Reid and Scott 1999), Hu and Scott (2001)] reduction.
From the heuristics identified in the systematic reviews, 17 were applied to both bandwidth and profile reductions. In addition, the Reverse Cuthill–McKee method with pseudo-peripheral vertex given by the George–Liu algorithm (RCM–GL) (George and Liu 1981) was selected in both systematic reviews. Thus, 130 distinct heuristics for bandwidth and profile reductions were identified across the two systematic reviews, and 15 were selected because no simulation or comparison in the articles analyzed showed that they could be superseded by any other heuristic in terms of bandwidth or profile reduction when the computational costs of the heuristics are also considered.
The Gibbs–Poole–Stockmeyer (GPS) algorithm (Gibbs et al. 1976), although outperformed by metaheuristic-based heuristics, is one of the classic low-cost heuristics most widely tested in the field for both bandwidth and profile reductions. Hence, the GPS algorithm was also implemented, and its results were compared with those of the other heuristics implemented in this computational experiment.
On the other hand, Fenves and Law's RCM–GL (RCM–GL–FL) method (Fenves and Law 1983), despite being selected in the systematic review of heuristics for profile reduction, was not implemented in this work because it is a specific application of the RCM–GL method to finite element discretizations. Additionally, although its authors presented no computational costs, the Hu–Scott heuristic (Hu and Scott 2001) was selected for its profile reduction results when compared with those of the heuristics tested by Hu and Scott (2001). However, the Hu–Scott heuristic was not implemented here because it was considered a high-cost heuristic, especially when performing matrix multiplications.
Another version of the VNS-band heuristic was proposed by Wang et al. (2014). This heuristic is outperformed by the fast node centroid hill climbing (FNCHC) heuristic (Lim et al. 2007), both in the quality of the approximate solutions and in the computational cost of obtaining them. This can be verified by examining the results of the original VNS-band heuristic (Mladenovic et al. 2010) and of the version proposed by Wang et al. (2014) and comparing them with the results presented below. It should be noted that the dataset used in the tests of Wang et al. (2014) is a subset of a dataset presented below.
Thus, Table 1 shows 14 heuristics that can be considered as the most promising low-cost heuristics to solve the problems. These 14 heuristics were implemented and tested in this work.
3 Description of the tests
Appendix A (Appendices A–D are available at http://www.dcc.ufla.br/~sanderson/app_coam16) shows the testing and calibration performed to ensure that our implementations of the 14 heuristics are comparable to the algorithms originally proposed and to the codes used by their original proposers. To evaluate the bandwidth and profile reductions provided by the 14 selected heuristics, two datasets commonly used in this field were employed: 113 (50 symmetric and 63 asymmetric) instances of the Harwell-Boeing sparse-matrix collection (http://math.nist.gov/MatrixMarket/data/Harwell-Boeing) (Duff et al. 1989a) and 22 (17 symmetric and 5 asymmetric) instances of the University of Florida sparse-matrix collection (http://www.cise.ufl.edu/research/sparse/matrices/index.html) (Davis and Hu 2011). These instances represent a wide spectrum of scientific and engineering applications (i.e., standard test matrices arising from problems in linear systems, least squares, and eigenvalue calculations) and were employed here because many other researchers have used them for comparisons and evaluations. Specifically, the 113 instances of the Harwell-Boeing sparse-matrix collection were divided into two subsets: (i) 33 instances, ranging from 30 to 237 vertices as stated in the collection (used as counter-examples to hypotheses in sparse-matrix research), each having fewer than 200 vertices when vertices without adjacencies are disregarded; and (ii) 80 instances, ranging from 207 to 1104 vertices. This subdivision of the 113 Harwell-Boeing instances into two sets is common in the field; see, for example, Martí et al. (2001), Lim et al. (2004, 2007), Piñana et al. (2004), Rodriguez-Tello et al. (2008), Mladenovic et al. (2010), and Torres-Jimenez et al. (2015).
Additionally, each set of instances was divided into symmetric and asymmetric matrices, as shown in the tables below.
To apply the heuristics implemented in this work to asymmetric matrices, we used one of the strategies proposed by Reid and Scott (2006). In this strategy, the asymmetric matrix \(A_u\) is added to its transpose \(A_u^\text {T}\), i.e., \(A_u + A_u^\text {T}\), resulting in a symmetric matrix. The heuristic is then applied to the graph associated with the resulting matrix, and the ordering obtained is used to reorder the rows of the original asymmetric matrix. According to the results presented by Reid and Scott (2006), this strategy gave the poorest bandwidth reductions on asymmetric matrices with the RCM heuristic (George 1971) when compared with the two other strategies they proposed. Nevertheless, the strategy of computing \(A_u + A_u^\text {T}\) was applied in our approach because it is the one used along with the RCM–GL method (George and Liu 1981) in the MATLAB software (MATLAB 2016). In addition, it is the simplest strategy and the one with the lowest storage and computational costs among those proposed by Reid and Scott (2006).
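Since only the nonzero pattern of \(A_u + A_u^\text {T}\) matters to a reordering heuristic, this strategy amounts to building the undirected graph in which i and j are adjacent whenever \(a_{ij} \ne 0\) or \(a_{ji} \ne 0\). A minimal sketch (the function name and dense input format are our own):

```python
def symmetrized_adjacency(A):
    """Adjacency lists of the graph of A + A^T: vertices i and j (i != j)
    are adjacent when a_ij != 0 or a_ji != 0. A symmetric-matrix heuristic
    is run on this graph, and the permutation it returns is applied to the
    rows of the original asymmetric matrix A."""
    n = len(A)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if A[i][j] != 0 or A[j][i] != 0:
                adj[i].add(j)
                adj[j].add(i)
    return adj
```

Note that cancellation (\(a_{ij} = -a_{ji}\)) cannot remove an edge here, since the pattern is taken as the union of the two triangles rather than from an actual numerical sum.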
The workstations used to execute the simulations with the instances of the Harwell-Boeing and University of Florida sparse-matrix collections contained an Intel® Core™ i3-2120 (3 MB cache, CPU 3.30 GHz\(\times \)4, 8 GB of DDR3 1333 MHz main memory) and an Intel® Xeon™ E5620 (12 MB cache, CPU 2.40 GHz\(\times \)8, 24 GB of DDR3 1333 MHz main memory) (Intel; Santa Clara, CA, United States), respectively. The Ubuntu 14.04 LTS 64-bit operating system with Linux kernel version 3.13.0-39-generic was used. It should be noted that we followed the recommendations of Johnson (2002) for this experimental analysis of the 14 heuristics for bandwidth and profile reductions.
A metric used by some authors (e.g., Kumfert and Pothen 1997; Mladenovic et al. 2010; Koohestani and Poli 2011) is to add, for each heuristic, the bandwidths or profiles obtained. A similar metric used by other authors (e.g., Martí et al. 2001; Piñana et al. 2004; Lim et al. 2007; Rodriguez-Tello et al. 2008) is to calculate the average bandwidth over the instances in each set. However, these metrics are not reasonable for a set of instances of very different sizes, as is the case in the tests presented in this paper, because, for example, a heuristic with very bad results on a small instance would not be penalized appropriately.
In many works, including Burgess and Lai (1986), Medeiros et al. (1993), Esposito et al. (1998), Mladenovic et al. (2010), and Koohestani and Poli (2011), the authors compared heuristics by counting the number of times each one obtained the lowest bandwidth or profile on the instances of the Harwell-Boeing sparse-matrix collection. This kind of metric is also shown in the tables below. However, a heuristic may reasonably be judged the best on a set even without attaining the best bandwidth or profile reduction on every instance: it could be the best of the entire set by attaining the second-best result on many instances, while different heuristics are the best on different individual instances. Thus, to evaluate the quality of the results obtained by the heuristics tested, for each heuristic h in each set of instances, \(\rho _p=\frac{\mathrm{profile}_h-\mathrm{profile}_{\mathrm{min}}}{\mathrm{profile}_{\mathrm{min}}}\) was calculated for each instance, where \(\mathrm{profile}_h\) is the profile obtained by heuristic h and \(\mathrm{profile}_{\mathrm{min}}\) is the lowest profile obtained on that instance by any heuristic tested. The same was done for bandwidth reduction (\(\rho _{\beta }\)) and for execution times (\(\rho _t\)). To the best of our knowledge, this paper is the first published instance of this approach being used in the field.
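As an illustration, the per-instance metric can be computed as follows (a sketch; the function name and input format are our own):

```python
def relative_distances(results):
    """Given {heuristic: value} for one instance (value = bandwidth,
    profile, or time), return rho_h = (value_h - value_min) / value_min
    for each heuristic h; the best heuristic on the instance gets 0."""
    best = min(results.values())
    return {h: (v - best) / best for h, v in results.items()}
```

For example, a heuristic that obtains a profile of 120 on an instance whose best profile is 100 receives \(\rho _p = 0.2\); averaging these relative distances over a set of instances penalizes bad results on small instances as much as on large ones.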
4 Results and analysis
This section presents the results obtained by the heuristics with respect to bandwidth and profile reductions on 113 (Sect. 4.1) and 22 (Sect. 4.2) instances of the Harwell-Boeing and University of Florida sparse-matrix collections, respectively. The tables in this section show the instance name and size (n), the initial bandwidth (\(\beta _0\)) or profile (\(\mathrm{profile}_0\)) of the instance, and the average bandwidth, profile, and runtime obtained by each heuristic over 10 executions on each instance. Section 4.3 analyzes the results obtained.
4.1 Results of 14 heuristics applied to instances of the Harwell-Boeing sparse-matrix collection
This section presents the bandwidth and profile results obtained using 14 heuristics applied to 113 instances of the Harwell-Boeing sparse-matrix collection. Sections 4.1.1–4.1.4 present the results of the 14 heuristics for bandwidth and profile reductions applied to the sets composed of 15 and 35 symmetric matrices and 18 and 45 asymmetric matrices of the Harwell-Boeing sparse-matrix collection, respectively.
4.1.1 Results of 14 heuristics applied to the set composed of 15 symmetric instances of the Harwell-Boeing sparse-matrix collection
In this set composed of 15 symmetric instances, the FNCHC (Lim et al. 2003, 2004, 2007) and NSloan (Kumfert and Pothen 1997) heuristics were the best for reducing bandwidth and profile when considering the \(\rho _{\beta }\) and \(\rho _p\) metrics, respectively [see Fig. 1a, b and Table 9 (Tables 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 are available in Appendix B)]. These two heuristics also obtained the largest number of best bandwidth results (12 instances) and best profile results (8 instances) in this dataset, respectively (see Fig. 1c, d).
The VNS-band heuristic was tested with a timeout of 500 s because this was the time set by its authors (Mladenovic et al. 2010). It should be noted that 500 s on the machine used in this study accomplishes more computation than 500 s on the machine on which Mladenovic et al. (2010) conducted their tests. In spite of this, the VNS-band heuristic was dominated by the FNCHC and NSloan heuristics with respect to bandwidth and profile reductions, respectively, on this set of instances.
The hGPHH-GL and RCM–GL methods were the fastest among the heuristics tested according to the \(\rho _t\) metric, with the RCM–GL method also obtaining the lowest computational cost on the largest number of instances (six) (see Fig. 2; Table 10). In particular, the WBRA (Esposito et al. 1998), FNCHC (Lim et al. 2003, 2004, 2007), VNS-band (Mladenovic et al. 2010), and CSS-band (Kaveh and Sharafi 2012) heuristics showed higher execution times than the 10 other heuristics tested. Moreover, the hGPHH-GL method is as fast as the RCM–GL method because the only difference between them is the order in which the vertices adjacent to the current vertex are numbered.
4.1.2 Results of 14 heuristics applied to the set composed of 35 symmetric instances of the Harwell-Boeing sparse-matrix collection
In this set composed of 35 symmetric instances, the VNS-band (8 s) (Mladenovic et al. 2010) and MPG (Medeiros et al. 1993) heuristics were the best for reducing bandwidth and profile when considering the \(\rho _{\beta }\) and \(\rho _p\) metrics, respectively (see Fig. 3a, b; Tables 11 and 12). These two heuristics also obtained the largest number of best bandwidth results (24 instances) and best profile results (11 instances) in this set, respectively (see Fig. 3c, d). In addition, Sloan's (1989) algorithm was the fastest method among the heuristics tested (see Fig. 4; Table 13).
4.1.3 Results of 14 heuristics applied to the set composed of 18 asymmetric instances of the Harwell-Boeing sparse-matrix collection
In this set composed of 18 asymmetric instances, the FNCHC and MPG heuristics were the best for reducing bandwidth and profile when considering the \(\rho _{\beta }\) and \(\rho _p\) metrics, respectively (see Fig. 5a, b; Table 14). The RCM–GL method was the fastest among the heuristics tested (see Fig. 6; Table 15).
4.1.4 Results of 14 heuristics applied to the set composed of 45 asymmetric instances of the Harwell-Boeing sparse-matrix collection
In this set composed of 45 asymmetric instances, the VNS-band (12 s) (Mladenovic et al. 2010) and MPG (Medeiros et al. 1993) heuristics were the best for reducing bandwidth and profile when considering the \(\rho _{\beta }\) and \(\rho _p\) metrics, respectively (see Fig. 7a, b; Tables 16 and 17). These two heuristics also obtained the largest number of best bandwidth results (34 instances) and best profile results (17 instances) in this set, respectively (see Fig. 7c, d). The hGPHH-GL heuristic was the fastest method among the heuristics tested (see Fig. 8; Table 18).
4.2 Results of 10 heuristics applied to instances of the University of Florida sparse-matrix collection
The results of Snay's (1976), Burgess and Lai's (1986), the WBRA (Esposito et al. 1998), and the CSS-band (Kaveh and Sharafi 2012) heuristics were dominated by those of the 10 other heuristics when applied to the 113 instances of the Harwell-Boeing sparse-matrix collection (see Sect. 4.1), so these four heuristics were not applied to the 22 instances of the University of Florida sparse-matrix collection. Hence, Sects. 4.2.1 and 4.2.2 present the results of 10 heuristics for bandwidth and profile reductions applied to the sets composed of 17 symmetric and 5 asymmetric matrices of the University of Florida sparse-matrix collection, respectively.
4.2.1 Results of 10 heuristics applied to the set composed of 17 symmetric instances of the University of Florida sparse-matrix collection
In this set composed of 17 symmetric instances, the FNCHC and NSloan heuristics were the best for reducing bandwidth and profile when considering the \(\rho _{\beta }\) and \(\rho _p\) metrics, respectively (see Fig. 9a, b; Table 19). These two heuristics also obtained the largest number of best bandwidth results (eight instances) and best profile results (ten instances) in this set, respectively (see Fig. 9c, d). The RCM–GL method was the fastest among the heuristics tested (see Fig. 10; Table 20).
We set the VNS-band timeout at 500 s for these 17 symmetric instances of the University of Florida sparse-matrix collection. The VNS-band heuristic would probably achieve better results with a higher timeout, but our intention is to investigate low timeouts for the heuristics. In the tests shown in Table 20, one can note that the VNS-band heuristic achieved its results at a computational cost one and five orders of magnitude higher than that of the FNCHC and RCM–GL heuristics, respectively.
4.2.2 Results of 10 heuristics applied to the set composed of five asymmetric instances of the University of Florida sparse-matrix collection
In this set composed of five asymmetric instances, the GPS and MPG heuristics were the best for reducing bandwidth and profile when considering the \(\rho _{\beta }\) and \(\rho _p\) metrics, respectively (see Fig. 11a, b; Table 21). These two heuristics also obtained the largest number of best bandwidth results (two instances) and best profile results (three instances) in this set, respectively. The RCM–GL method was the fastest among the heuristics tested (see Fig. 12; Table 21).
As in the executions performed with the 17 symmetric instances of the University of Florida sparse-matrix collection, we set the VNS-band timeout at 500 s for these five asymmetric instances. Likewise, the VNS-band heuristic would probably achieve better results with a higher timeout, but our intention is to investigate low timeouts for the heuristics. In the tests performed on these five asymmetric matrices, the VNS-band heuristic achieved its results at a computational cost one and five orders of magnitude higher than that of the GPS and RCM–GL algorithms, respectively (see Table 21).
4.3 Best heuristics applied to four sets of instances of the Harwell-Boeing and two sets of instances of the University of Florida sparse-matrix collections
The sets composed of 15 symmetric and 18 asymmetric instances contain much smaller matrices than the sets composed of 35 symmetric and 45 asymmetric instances, respectively. Moreover, the instances of the University of Florida sparse-matrix collection are larger than those of the Harwell-Boeing sparse-matrix collection. It may not be appropriate to give a small instance the same weight as a much larger one. Thus, the sets of instances of the University of Florida sparse-matrix collection were mainly considered to identify the best low-cost heuristics for bandwidth and profile reductions. The sets composed of instances of the Harwell-Boeing sparse-matrix collection are nonetheless also considered, as described below.
We divided the sets of test problems by application area and then identified the most promising heuristics for each specific application. Tables 2 (symmetric matrices) and 3 (asymmetric matrices) show the most promising heuristics for the application areas represented in the four sets of instances of the Harwell-Boeing (http://math.nist.gov/MatrixMarket/data/Harwell-Boeing) (Duff et al. 1989a) and the two sets of instances of the University of Florida (http://www.cise.ufl.edu/research/sparse/matrices/index.html) (Davis and Hu 2011) sparse-matrix collections in relation to bandwidth and profile reductions (see Tables 22 and 23 in Appendix C). Appendix D analyzes the heuristics that showed the best bandwidth and profile reductions for symmetric and asymmetric matrices. Table 4 shows the heuristics with the best overall performance on these six sets of test matrices.
Figure 13 [WBRA (Esposito et al. 1998), FNCHC (Lim et al. 2003, 2004, 2007), VNS-band (Mladenovic et al. 2010), and CSS-band (Kaveh and Sharafi 2012) heuristics], Fig. 14 [GPS (Gibbs et al. 1976), Snay's (1976), Burgess and Lai's (1986), and GGPS (Wang et al. 2009) heuristics], and Fig. 15 [GPS (Gibbs et al. 1976), FNCHC (Lim et al. 2003, 2004, 2007), GGPS (Wang et al. 2009), and VNS-band (Mladenovic et al. 2010) heuristics] show the execution times of the most time-consuming heuristics tested, applied to the 113 and 22 instances of the Harwell-Boeing and University of Florida sparse-matrix collections, respectively.
As expected, the FNCHC (Lim et al. 2003, 2004, 2007), VNS-band (Mladenovic et al. 2010), and CSS-band (Kaveh and Sharafi 2012) heuristics showed higher computational costs than most of the methods tested here because they are metaheuristic-based.
The WBRA heuristic (Esposito et al. 1998) shows a high computational cost because it builds a rooted level structure (RLS) for each vertex of the graph, and these structures are renumbered based on a bottleneck linear assignment. This heuristic for bandwidth reduction of symmetric matrices did not show competitive results when compared with other heuristics, such as the VNS-band, FNCHC, and GPS heuristics. The WBRA heuristic employs a function to move vertices from the level with maximum width to an adjacent level (nearer level 0) in the level structure (which may not be rooted at a vertex). This may result in a worse distribution of the vertices in relation to the RLS created by the GPS algorithm, for example.
The results of the CSS-band heuristic may be related to the random initial solution suggested by its authors (we strictly followed the proposed algorithms when implementing these heuristics). On the other hand, the VNS-band and FNCHC heuristics employ a variation of the breadth-first search procedure to provide an initial solution. Moreover, slow convergence was observed in the CSS-band heuristic.
Although based on the RCM and GPS (Gibbs et al. 1976) algorithms, Burgess and Lai’s heuristic (Burgess and Lai 1986) contains a stage in which three RLSs are modified, and it showed higher execution times than the GPS algorithm. To be more precise, Burgess and Lai’s heuristic attempts to reduce the width of each level of the three RLSs built in the first step. This heuristic moves vertices one level up or down until a minimum width is reached. Thus, this algorithm may process a level for a long time before reaching this minimum width.
Snay’s heuristic (Snay 1976) is similar to the RCM method, in that both are based on a breadth-first search procedure. The RCM method labels the vertices adjacent to the current vertex (sorting them in ascending degree order), whereas Snay’s heuristic (Snay 1976) also takes into account vertices in the second level of the vertices already labeled (sorting them in ascending degree order and considering only adjacencies to vertices that are neither labeled nor candidate vertices). Snay’s heuristic (Snay 1976) performs this task for 10 pseudo-peripheral vertices, which explains its high computational cost. Even though it was designed for profile reduction (by allowing vertices with small degrees to be labeled before other vertices), the results of Snay’s heuristic were not competitive with the results of the other heuristics for profile reduction, such as Sloan’s algorithm (and its variations).
In addition to the GPS (Gibbs et al. 1976), MPG (Medeiros et al. 1993), NSloan (Kumfert and Pothen 1997), and FNCHC (Lim et al. 2003, 2004, 2007) heuristics for bandwidth and/or profile reductions, it should also be noted that the RCM–GL (George and Liu 1981) and hGPHH-GL (Koohestani and Poli 2011) methods demonstrated very low computational costs in the six datasets tested. Figure 16 shows the execution times of the RCM–GL (George and Liu 1981), Sloan’s (1989), MPG (Medeiros et al. 1993), NSloan (Kumfert and Pothen 1997), Sloan–MGPS (Reid and Scott 1999), and hGPHH-GL (Koohestani and Poli 2011) heuristics, which are the lowest-cost heuristics tested, applied to 113 asymmetric instances of the Harwell-Boeing sparse-matrix collection. In particular, the 14 heuristics showed very high computational costs in the MBEACXC, MBEAFLW, and MBEAUSE instances (from economic modeling). These instances have many connected components and isolated vertices; the heuristics are applied to each connected component of a disconnected graph.
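Since disconnected instances such as the ones mentioned above are handled by running a heuristic on each connected component, that strategy can be sketched as follows. This is an illustrative Python sketch (not the C++ code used in the experiments): it collects each component by breadth-first search, runs a given reordering heuristic on the component's induced subgraph, and concatenates the per-component labelings. The function names are ours, chosen for illustration.

```python
from collections import deque

def reorder_by_component(adj, reorder):
    # adj: adjacency lists of a possibly disconnected graph.
    # reorder: any heuristic mapping an adjacency list to a vertex ordering.
    n = len(adj)
    seen = [False] * n
    order = []
    for s in range(n):
        if seen[s]:
            continue
        # Collect the connected component of s by breadth-first search.
        comp = [s]
        seen[s] = True
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    comp.append(w)
                    queue.append(w)
        # Relabel the component locally, run the heuristic on the
        # induced subgraph, and map the ordering back to global indices.
        local = {v: i for i, v in enumerate(comp)}
        sub = [[local[w] for w in adj[v]] for v in comp]
        order.extend(comp[i] for i in reorder(sub))
    return order
```

Isolated vertices simply form singleton components, so they are labeled without any special case.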
Figure 17 shows the average execution times of the RCM–GL (George and Liu 1981), Sloan’s (1989), MPG (Medeiros et al. 1993), NSloan (Kumfert and Pothen 1997), Sloan–MGPS (Reid and Scott 1999), and hGPHH-GL (Koohestani and Poli 2011) heuristics applied to 22 instances of the University of Florida sparse-matrix collection. In general, the MPG heuristic (Medeiros et al. 1993) showed a lower computational cost than Sloan’s (1989), NSloan (Kumfert and Pothen 1997), and Sloan–MGPS (Reid and Scott 1999) heuristics. In particular, the MPG heuristic (Medeiros et al. 1993) limits the number of candidate vertices to be examined in each iteration.
5 Conclusions
Among 132 heuristics identified in the literature that were applied to bandwidth and/or profile reductions, 14 heuristics were selected as the most promising low-cost heuristics to solve these problems. In experiments with 113 and 22 instances (up to 100,196 vertices) of the Harwell-Boeing and University of Florida sparse-matrix collections, respectively, the FNCHC (Lim et al. 2003, 2004, 2007) and GPS (Gibbs et al. 1976) heuristics showed the best results with respect to bandwidth reduction of symmetric and asymmetric matrices, respectively. Additionally, the NSloan (Kumfert and Pothen 1997) and MPG (Medeiros et al. 1993) heuristics may be seen as the most promising low-cost algorithms for profile reduction of symmetric and asymmetric matrices, respectively.
Heuristics for reordering vertices contribute to providing adequate memory locality and, hence, to improving cache hit rates (Das et al. 1992; Burgess and Giles 1997). Moreover, as the computational experiments described in this paper show, the choice of a heuristic is highly dependent on the specific instance. Thus, in addition to the GPS (Gibbs et al. 1976), MPG (Medeiros et al. 1993), NSloan (Kumfert and Pothen 1997), and FNCHC (Lim et al. 2003, 2004, 2007) heuristics, the RCM–GL (George and Liu 1981) and hGPHH-GL (Koohestani and Poli 2011) algorithms showed very low computational costs for bandwidth and/or profile reductions in the six sets of instances tested. These six heuristics were identified as the most promising low-cost heuristics for bandwidth and profile reductions, so that the computational cost of solving large-scale linear systems by iterative methods can be reduced (depending on the preconditioner used to reduce the number of iterations of the solver applied). In addition, Sloan’s, GGPS, and Sloan–MGPS heuristics can be applied to instances originating from specific application areas (see Tables 2, 3).
Additional datasets containing much larger matrices will be included in a future evaluation to offer insight into how the tested heuristics compare on very large examples of more practical interest. Thus, we intend to apply the 14 heuristics tested in this work, and in particular the six selected heuristics, to reorder vertices and reduce the computational cost of iterative methods for solving large-scale linear systems, in order to verify the best method(s) in this context.
References
Benzi M (2002) Preconditioning techniques for large linear systems: a survey. J Comput Phys 182:418–477
Benzi M, Szyld DB, Van Duin A (1999) Orderings for incomplete factorization preconditioning of nonsymmetric problems. SIAM J Sci Comput 20(5):1652–1670
Bernardes JAB, Gonzaga de Oliveira SL (2015) A systematic review of heuristics for profile reduction of symmetric matrices. In: Koziel S, Leifsson L, Lees M, Krzhizhanovskaya VV, Dongarra J, Sloot PM (eds) International conference on computational science, ICCS 2015, vol 51, pp 221–230. Elsevier, Reykjavik, Iceland
Burgess DA, Giles M (1997) Renumbering unstructured grids to improve the performance of codes on hierarchial memory machines. Adv Eng Softw 28(3):189–201
Burgess IW, Lai PKF (1986) A new node renumbering algorithm for bandwidth reduction. Int J Numer Methods Eng 23:1693–1704
Chagas GO, Gonzaga de Oliveira SL (2015) Metaheuristic-based heuristics for symmetric-matrix bandwidth reduction: a systematic review. In: Koziel S, Leifsson L, Lees M, Krzhizhanovskaya VV, Dongarra J, Sloot PM (eds) International conference on computational science, ICCS 2015, vol 51, pp 211–220. Elsevier, Reykjavik, Iceland
Cuthill EH (1972) Several strategies for reducing the bandwidth of matrices. In: Proceedings of a symposium on sparse matrices and their applications, pp 157–166. The IBM Research Symposia Series, New York, USA
Das R, Mavriplis DJ, Saltz JH, Gupta SK, Ponnusamy R (1992) The design and implementation of a parallel unstructured euler solver using software primitives. Tech. Rep. AD-A249 437, Institute for Computer Applications in Science and Engineering—NASA, Virgina, USA
Davis TA, Hu Y (2011) The University of Florida sparse matrix collection. ACM Trans Math Softw 38(1):1–25
Duff IS, Grimes RG, Lewis JG (1989) Sparse matrix test problems. ACM Trans Math Softw 15(1):1–14
Duff IS, Meurant GA (1989) The effect of ordering on preconditioned conjugate gradients. BIT Numer Math 29(4):635–657
Esposito A, Catalano MSF, Malucelli F, Tarricone L (1998) A new matrix bandwidth reduction algorithm. Oper Res Lett 23:99–107
Everstine GC (1979) A comparison of three resequencing algorithms for the reduction of matrix profile and wavefront. Int J Numer Methods Eng 14:837–853
Felippa CA (1975) Solution of linear equations with skyline-stored symmetric matrix. Comput Struct 5(1):13–29
Fenves SJ, Law KH (1983) A two-step approach to finite element ordering. Int J Numer Methods Eng 19(6):891–911
George A (1971) Computer implementation of the finite element method. Ph.D. thesis, Stanford University, Stanford, USA
George A, Liu JW (1981) Computer solution of large sparse positive definite systems. Prentice-Hall, Englewood Cliffs
George A, Liu JWH (1979) An implementation of a pseudoperipheral node finder. ACM Trans Math Softw 5(3):284–295
Gibbs NE, Poole WG, Stockmeyer PK (1976) An algorithm for reducing the bandwidth and profile of a sparse matrix. SIAM J Numer Anal 13(2):236–250
Gibbs NE, Poole WG, Stockmeyer PK (1976) A comparison of several bandwidth and profile reduction algorithms. ACM Trans Math Softw 2(4):322–330
Gonzaga de Oliveira SL, Chagas GO (2015) A systematic review of heuristics for symmetric-matrix bandwidth reduction: methods not based on metaheuristics. In: Proceedings of the Brazilian symposium on operations research (SBPO 2015). Sobrapo, Pernambuco, Brazil
Hestenes MR, Stiefel E (1952) Methods of conjugate gradients for solving linear systems. J Res Natl Bur Stand 49(36):409–436
Hu Y, Scott JA (2001) A multilevel algorithm for wavefront reduction. SIAM J Sci Comput 23(4):1352–1375
Johnson D (2002) A Theoretician’s guide to the experimental analysis of algorithms. In: Goldwasser M, Johnson DS, McGeoch CC (eds) Proceedings of the 5th and 6th DIMACS implementation challenges. American Mathematical Society, Providence
Kaveh A, Sharafi P (2012) Ordering for bandwidth and profile minimization problems via charged system search algorithm. Iran J Sci Technol Trans Civil Eng 36(2):39–52
King IP (1970) An automatic reordering scheme for simultaneous equations derived from network systems. Int J Numer Methods Eng 2(4):523–533
Koohestani B, Poli R (2011) A hyper-heuristic approach to evolving algorithms for bandwidth reduction based on genetic programming. In: Research and development in intelligent systems XXVIII. Springer, London, pp 93–106
Kumfert G, Pothen A (1997) Two improved algorithms for envelope and wavefront reduction. BIT Numer Math 37(3):559–590
Lanczos C (1952) Solutions of systems of linear equations by minimized iterations. J Res Natl Bur Stand 49(1):33–53
Lewis JG (1982) Implementations of the Gibbs–Poole–Stockmeyer algorithms and Gibbs–King algorithms. ACM Trans Math Softw 8:180–189
Lim A, Rodrigues B, Xiao F (2003) A new node centroid algorithm for bandwidth minimization. In: Proceedings of the 18th international joint conference on artificial intelligence, IJCAI’03. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 1544–1545
Lim A, Rodrigues B, Xiao F (2004) A centroid-based approach to solve the bandwidth minimization problem. In: Proceedings of the Hawaii international conference on system sciences. Institute of electrical and electronics engineers Inc., Big Island, HI, United States, vol 37, pp 1203–1208
Lim A, Rodrigues B, Xiao F (2007) A fast algorithm for bandwidth minimization. Int J Artif Intell Tools 3:537–544
Lin YX, Yuan JJ (1994) Profile minimization problem for matrices and graphs. Acta Math Appl Sin 10(1):107–122
Martí R, Laguna M, Glover F, Campos V (2001) Reducing the bandwidth of a sparse matrix with tabu search. Eur J Oper Res 135(2):450–459
Medeiros SRP, Pimenta PM, Goldenberg P (1993) Algorithm for profile and wavefront reduction of sparse matrices with a symmetric structure. Eng Comput 10(3):257–266
Mladenovic N, Urosevic D, Pérez-Brito D, García-González CG (2010) Variable neighbourhood search for bandwidth reduction. Eur J Oper Res 200:14–27
Papadimitriou CH (1976) The NP-completeness of the bandwidth minimization problem. Computing 16:263–270
Piñana E, Plana I, Campos V, Martí R (2004) GRASP and path relinking for the matrix bandwidth minimization. Eur J Oper Res 153(1):200–210
Reid JK, Scott JA (1999) Ordering symmetric sparse matrices for small profile and wavefront. Int J Numer Methods Eng 45(12):1737–1755
Reid JK, Scott JA (2006) Reducing the total bandwidth of a sparse unsymmetric matrix. SIAM J Matrix Anal Appl 28(3):805–821
Rodriguez-Tello E, Kao HJ, Torres-Jimenez J (2008) An improved simulated annealing algorithm for bandwidth minimization. Eur J Oper Res 185:1319–1335
Saad Y, Schultz MH (1986) GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J Sci Stat Comput 7:856–869
Sloan SW (1989) A Fortran program for profile and wavefront reduction. Int J Numer Methods Eng 28(11):2651–2679
Snay RA (1976) Reducing the profile of sparse symmetric matrices. Bull Geod 50(4):341–352
STFC (2015) The Science and Technology Facilities Council. HSL: a collection of Fortran codes for large scale scientific computation. http://www.hsl.rl.ac.uk. Accessed Dec 2015
The MathWorks Inc (1994–2016) MATLAB. http://www.mathworks.com/products/matlab/index.html. Accessed Mar 2016
Torres-Jimenez J, Izquierdo-Marquez I, Garcia-Robledo A, Gonzalez-Gomez A, Bernal J, Kacker RN (2015) A dual representation simulated annealing algorithm for the bandwidth minimization problem on graphs. Inf Sci 303:33–49
Wang C, Xu C, Lisser A (2014) Bandwidth minimization problem. In: 10th international conference on modeling, optimization and simulation (MOSIM 2014), Nancy, France
Wang Q, Guo YC, Shi XW (2009) A generalized GPS algorithm for reducing the bandwidth and profile of a sparse matrix. Prog Electromagn Res 90:121–136
Acknowledgements
This work was undertaken with the support of FAPEMIG—Fundação de Amparo à Pesquisa do Estado de Minas Gerais (Minas Gerais Research Support Foundation, Brazil). We would like to thank Prof. Dr. Dragan Urosevic, from the Mathematical Institute SANU, for sending us the VNS-band executable programs. We would also like to thank Prof. Dr. Fei Xiao, from Beepi, for providing us with the C programming language source code of the FNCHC heuristic. In addition, we would like to thank the reviewers for their valuable comments and suggestions.
Communicated by Jose Alberto Cuminato.
Appendices
Appendix A: Implementation of the heuristics, testing, and calibration
This appendix describes the testing and calibration performed to compare our implementations with the codes used by the original proposers of the 14 heuristics, to ensure that the codes we implemented were comparable to the algorithms as originally proposed. Regarding the VNS-band heuristic, 32-bit and 64-bit executable programs of this heuristic (kindly provided by one of the heuristic’s authors) were used. Mladenovic et al. (2010) developed a heuristic based on the variable neighborhood search meta-heuristic to solve the bandwidth minimization problem. This meta-heuristic investigates increasingly distant neighbors of a solution. An initial solution is generated by applying a random breadth-first search procedure. Then, a main loop carries out three steps: shaking, local search, and neighborhood change. First, the neighborhood index k is initialized to \(k_{{\mathrm{min}}}\), the minimum neighborhood size. Second, in the shaking step, the meta-heuristic produces a solution S within the current neighborhood. Using the solution S, a local-search method produces a local-optimum solution \(O_l\). Third, \(O_l\) replaces the current solution if the bandwidth of \(O_l\) is narrower than the bandwidth of the current solution, and k is reset to \(k_{{\mathrm{min}}}\); otherwise, the neighborhood is expanded by incrementing k while \(k < k_{\mathrm{max}}\). The VNS-band heuristic terminates when the execution time exceeds a previously established timeout.
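The main loop just described can be sketched as a generic variable neighborhood search skeleton. The following Python sketch is illustrative only (it is not the VNS-band implementation); `cost`, `shake`, and `local_search` are placeholder arguments for problem-specific procedures, and an iteration budget stands in for the VNS-band timeout.

```python
import random

def vns(initial, cost, shake, local_search, k_min, k_max, max_iters=200):
    # Generic VNS skeleton: shake inside the k-th neighborhood, run a
    # local search, then either accept the improvement (resetting k to
    # k_min) or expand the neighborhood by incrementing k.
    best, k = initial, k_min
    for _ in range(max_iters):
        s = shake(best, k)            # random solution in neighborhood k
        local_opt = local_search(s)   # descend to a local optimum
        if cost(local_opt) < cost(best):
            best, k = local_opt, k_min
        elif k < k_max:
            k += 1
        else:
            k = k_min
    return best
```

For bandwidth minimization, `cost` would compute the bandwidth of a labeling, `shake` would perturb a bounded number of labels at random, and `local_search` would apply the heuristic's hill-climbing procedure.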
The FNCHC-heuristic source code was also kindly provided by one of the heuristic’s authors. In the present work, this source code was converted to the C++ programming language. The FNCHC heuristic (Lim et al. 2003, 2004, 2007) is a variation of the NCHC heuristic (Lim et al. 2003, 2004, 2007) and is based on a direct node adjustment method with hill climbing for solving the bandwidth minimization problem.
We asked all the other heuristics’ authors for the sources and/or executables of their algorithms. However, some authors responded that they no longer had the code, some did not answer, and some explained that the programs could not be provided. The remaining 12 heuristics were therefore also implemented in the C++ programming language so that the computational costs of the heuristics could be compared appropriately. Specifically, the g++ version 4.8.2 compiler was used.
It should be noted that these heuristics are simple to implement:
- the RCM–GL method (George and Liu 1981) (based on breadth-first search),
- Snay’s heuristic (Snay 1976) (based on the RCM method),
- Sloan’s algorithm (Sloan 1989),
- the Medeiros–Pimenta–Goldenberg (MPG) heuristic (Medeiros et al. 1993) (based on Sloan’s algorithm),
- the Normalized Sloan (NSloan) heuristic (Kumfert and Pothen 1997) (based on Sloan’s algorithm),
- Sloan’s algorithm with the pseudo-peripheral vertex given by the modified GPS algorithm (Sloan–MGPS) (Reid and Scott 1999) (based on Sloan’s algorithm),
- the heuristic based on a genetic programming hyper-heuristic (hGPHH) (Koohestani and Poli 2011) (based on the RCM method).
Details about the heuristics implemented are given below. Sections A.1 and A.2 show the testing and calibration performed with regard to the RCM–GL and hGPHH-GL methods, and the GPS algorithm, respectively.
Sloan’s (1989), NSloan (Kumfert and Pothen 1997), Sloan–MGPS (Reid and Scott 1999), and Charged System Search for bandwidth reduction (CSS-band) (Kaveh and Sharafi 2012) heuristics have important parameters that may change the results. Sections A.3 and A.4 show the testing and calibration performed with regard to Sloan’s, NSloan, and Sloan–MGPS heuristics, and the MPG heuristic, respectively. Section A.5 shows the testing and calibration performed with regard to the CSS-band heuristic. The other heuristics do not have parameters that affect the results.
To our knowledge, results of Snay’s (1976), Burgess and Lai’s (1986), the wonder bandwidth reduction algorithm (WBRA) (Esposito et al. 1998), generic GPS (GGPS) (Wang et al. 2009), and CSS-band (Kaveh and Sharafi 2012) heuristics have only been published in their original papers and, unfortunately, we did not find the instances to which these five heuristics were applied. Since an efficient implementation comes at a cost in programming effort, we strictly observed the descriptions of the algorithms provided by their authors to obtain reasonably efficient implementations of (all) these heuristics.
It should be noted that our intention was not for the results of the C++ programming language versions of the heuristics to outperform the results of the original implementations. Our intention was to provide reasonably efficient implementations of the heuristics tested, so as to make an appropriate comparison of the results of the 14 heuristics possible.
1.1 A.1: The RCM–GL and hGPHH-GL methods
In short, the RCM is a breadth-first search-based procedure in which the vertices on each level are labeled in ascending order of degree; then, the final labeling is reversed. Similarly, the hGPHH is a breadth-first search-based procedure in which the vertices on each level are labeled using a specific formula provided by the genetic programming hyper-heuristic (GPHH) (Koohestani and Poli 2011).
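The RCM procedure just described can be sketched as follows. This is an illustrative Python version (the experiments in this paper used C++ implementations); the hGPHH differs only in the key used to order the neighbors of each vertex.

```python
from collections import deque

def rcm(adj, start):
    # Reverse Cuthill-McKee sketch: breadth-first search from `start`,
    # visiting the unvisited neighbors of each vertex in ascending order
    # of degree, then reversing the resulting labeling.
    visited = [False] * len(adj)
    visited[start] = True
    queue = deque([start])
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in sorted(adj[v], key=lambda u: len(adj[u])):
            if not visited[u]:
                visited[u] = True
                queue.append(u)
    order.reverse()  # the "reverse" step of RCM
    return order  # order[i] is the vertex that receives label i
```

For example, on the path graph `[[1], [0, 2], [1, 3], [2]]` with starting vertex 0, the BFS visits 0, 1, 2, 3, and reversing gives the labeling `[3, 2, 1, 0]`.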
These two heuristics are highly dependent on the starting vertex. Since Koohestani and Poli (2011) provided no description of the pseudo-peripheral vertex finder used, we employed the George–Liu algorithm (George and Liu 1979) to find a pseudo-peripheral vertex; hence, we named this method hGPHH-GL, i.e., it is an RCM-based heuristic developed through a genetic programming hyper-heuristic. Thus, in both algorithms, the starting vertex is given here by the George–Liu algorithm.
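The George–Liu pseudo-peripheral vertex finder can be sketched as below. This is an illustrative Python version, assuming a minimum-degree starting vertex (the original description allows an arbitrary start); it reroots the level structure at a minimum-degree vertex of the deepest level while the eccentricity keeps growing.

```python
from collections import deque

def level_structure(adj, root):
    # Breadth-first levels of the structure rooted at `root`.
    dist = [-1] * len(adj)
    dist[root] = 0
    queue, levels = deque([root]), [[root]]
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] == -1:
                dist[w] = dist[u] + 1
                if dist[w] == len(levels):
                    levels.append([])
                levels[dist[w]].append(w)
                queue.append(w)
    return levels

def george_liu(adj):
    # Start from a vertex of minimum degree; repeatedly root a level
    # structure at a minimum-degree vertex of the deepest level while
    # the number of levels (eccentricity + 1) keeps increasing.
    v = min(range(len(adj)), key=lambda u: len(adj[u]))
    levels = level_structure(adj, v)
    while True:
        u = min(levels[-1], key=lambda w: len(adj[w]))
        candidate = level_structure(adj, u)
        if len(candidate) > len(levels):
            v, levels = u, candidate
        else:
            return v
```

On the path graph `[[1], [0, 2], [1, 3], [2]]`, the sketch returns the endpoint 0, which is a peripheral vertex.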
Table 5 exhibits the results of our implementation of the RCM–GL and hGPHH-GL methods applied to 11 instances of the Harwell-Boeing sparse-matrix collection. These results are compared with the results of the hGPHH (Koohestani and Poli 2011) and of the RCM method contained in the MATLAB library (RCMM) (MATLAB 2016), which is a SPARSPAK-based, highly enhanced version of the RCM method described by George and Liu (1981). Specifically, Koohestani and Poli (2011) published these results. In general, one can consider that the results are similar; the differences can be explained by the use of the George–Liu algorithm (George and Liu 1979) to find pseudo-peripheral vertices.
1.2 A.2: The GPS algorithm
In short, the GPS algorithm (Gibbs et al. 1976) creates a level structure rooted at a pseudo-peripheral vertex. It also considers an end pseudo-peripheral vertex when building this structure. The final labeling is given starting from the end vertex of the rooted level structure (RLS), and the vertices on each level are labeled in decreasing order of degree. Let \(G = (V, E)\) be a connected and simple graph. Given a vertex \(v\in V\), the level structure rooted at v, with depth (or the eccentricity of v) \(\ell (v)\), is a partitioning \(\mathscr {L}(v)\) \(=\) \(\{L_0(v), L_1(v),\ \dots , L_{\ell (v)}(v) \}\), where \(L_0(v) = \{v\}\), \(L_i(v) = \mathrm{Adj}(L_{i-1}(v)) - \bigcup _{j=0}^{i-1}L_j(v)\), for \(\ i = 1, 2,\ 3,\ \dots , \ell (v)\), and \(\mathrm{Adj}(.)\) returns the set of vertices adjacent to the argument. The width \(b(\mathscr {L}(v))\) of \(\mathscr {L}(v)\) is defined as \(b(\mathscr {L}(v)) = \max \nolimits _{0 \le i \le \ell (v)}|L_i(v)|\).
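The definition above can be computed directly by a breadth-first search; a minimal Python sketch (illustrative only, not the paper's C++ code):

```python
from collections import deque

def rooted_level_structure(adj, v):
    # Builds L(v) = {L_0(v), ..., L_l(v)}: L_0 = {v}, and L_i is the set
    # of vertices adjacent to L_{i-1} that lie in no earlier level.
    level = [-1] * len(adj)
    level[v] = 0
    queue, levels = deque([v]), [[v]]
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if level[w] == -1:
                level[w] = level[u] + 1
                if level[w] == len(levels):
                    levels.append([])
                levels[level[w]].append(w)
                queue.append(w)
    return levels

def width(levels):
    # b(L(v)) = max_i |L_i(v)|; the depth (eccentricity) is len(levels) - 1.
    return max(len(L) for L in levels)
```

For the path graph `[[1], [0, 2], [1, 3], [2]]` rooted at vertex 1, the levels are `[[1], [0, 2], [3]]`, with depth 2 and width 2.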
Results of the GPS algorithm applied to the Harwell-Boeing sparse-matrix collection were provided by Martí et al. (2001), Piñana et al. (2004), Lim et al. (2007), and Rodriguez-Tello et al. (2008). These researchers compared the results of their heuristics with the results of the GPS algorithm applied to instances of this sparse-matrix collection. Their papers show the average bandwidth over the instances in each dataset. Nevertheless, a comparison of the average bandwidth (over the instances in each set) of the results obtained using the (C++ programming language version of the) GPS algorithm to the results shown in these publications is not adequate because the datasets used here contain instances with very different sizes, i.e., a heuristic that returns a very large bandwidth in a small instance would not be penalized appropriately.
We closely followed the recommendations given by Lewis (1982) in a Fortran programming language code to implement the GPS algorithm (Gibbs et al. 1976) in the C++ programming language. One of the main results of Lewis (1982) was to present a low-cost implementation of the GPS algorithm (Gibbs et al. 1976). Lewis (1982) did not explicitly present results of the GPS algorithm with respect to bandwidth and profile reductions. However, this author showed results of the Gibbs–King heuristic with regard to bandwidth and profile reductions and stated that the results of the GPS algorithm were as good as the results of the Gibbs–King algorithm. On the other hand, Lewis (1982) presented the computational cost of the GPS algorithm and showed that the Gibbs–King heuristic is much slower than the GPS algorithm. This author presented these results because the bandwidth and profile reductions obtained using the GPS algorithm were the same as those obtained by Everstine (1979). Everstine (1979), however, showed only results concerning wavefront and profile reductions. Thus, Table 6 shows results with respect to bandwidth reductions obtained using the Gibbs–King heuristic (the same results obtained using the GPS algorithm) (Lewis 1982), results with regard to profile reductions attained by the GPS algorithm (Everstine 1979; Lewis 1982), and results concerning bandwidth and profile reductions achieved by the C++ programming language version of the GPS algorithm applied to 13 instances of the Harwell-Boeing sparse-matrix collection.
In general, the C++ programming language version of the GPS algorithm obtained better results than the Fortran programming language version of this algorithm (Lewis 1982) in relation to bandwidth reductions. On the other hand, although the C++ programming language version of the GPS algorithm obtained the largest number of best results (in eight instances) in this set of instances, the Fortran version of this algorithm (Lewis 1982) was the best one for reducing the profile when considering the \(\rho _p\) metric. Nevertheless, the difference in the \(\rho _p\) metric is small and can be explained by the choices of the pseudo-peripheral vertices in these runs. Therefore, one can consider that the C++ programming language version used here is a reasonably efficient implementation of the GPS algorithm (Gibbs et al. 1976).
1.3 A.3: The Sloan, NSloan, and Sloan–MGPS heuristics
Similar to the GPS algorithm (Gibbs et al. 1976), the first step of Sloan’s algorithm (Sloan 1989) returns a starting (s) and an end (e) pseudo-peripheral vertex. Then, Sloan’s algorithm (Sloan 1989) creates a level structure rooted at s. The final labeling is given starting from the vertex s, and the vertices are labeled using a max-priority queue according to the priority \(p(v)=w_1d(v,e)-w_2(\mathrm{deg}(v)+1)\), where \(w_1\) and \(w_2\) are integer weights, d(v, e) is the distance from vertex v to the vertex e, and \(\mathrm{deg}(v)\) is the degree of vertex v. Sloan (1989) suggested assigning these two weights as \(w_1=1\) and \(w_2=2\). Setting \(w_1 \gg w_2 \ge 1\) places more emphasis on the global criterion of distance from the target end vertex and forces the vertices to be labeled level by level, in which case the vertex-labeling algorithm is the method of King (King 1970). When setting \(w_1=0\) and \(w_2=1\), the vertex-labeling algorithm is similar to Snay’s heuristic (Snay 1976).
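A simplified sketch of this labeling scheme in Python is shown below. It uses only the static priority \(p(v)=w_1d(v,e)-w_2(\mathrm{deg}(v)+1)\) on a connected graph; Sloan's published algorithm additionally updates priorities dynamically as vertices change status (preactive, active), so this is an illustration rather than the full heuristic.

```python
import heapq
from collections import deque

def bfs_distances(adj, src):
    # d(., src) for every vertex, by breadth-first search.
    dist = [-1] * len(adj)
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] == -1:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def sloan_simplified(adj, s, e, w1=1, w2=2):
    # Labels a connected graph starting from s, always expanding the
    # queued candidate of maximum static priority
    # p(v) = w1*d(v, e) - w2*(deg(v) + 1).
    d = bfs_distances(adj, e)
    p = [w1 * d[v] - w2 * (len(adj[v]) + 1) for v in range(len(adj))]
    labeled = [False] * len(adj)
    queued = [False] * len(adj)
    heap = [(-p[s], s)]  # heapq is a min-heap, so priorities are negated
    queued[s] = True
    order = []
    while heap:
        _, v = heapq.heappop(heap)
        labeled[v] = True
        order.append(v)
        for u in adj[v]:
            if not labeled[u] and not queued[u]:
                queued[u] = True
                heapq.heappush(heap, (-p[u], u))
    return order  # order[i] is the vertex that receives label i
```

With the suggested weights \(w_1=1\) and \(w_2=2\), on the path graph `[[1], [0, 2], [1, 3], [2]]` with s = 0 and e = 3, the sketch labels the vertices in the order 0, 1, 2, 3.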
One can see that the RCM method (George 1971) obtains better bandwidth results than Sloan’s algorithm (Sloan 1989) (and its variations) because, in the RCM method, if \((u,v)\in E\), then \(|s(u)-s(v)|\) is small, since the vertices are labeled level by level in relation to the level structure rooted at the starting vertex. In turn, Sloan’s algorithm (Sloan 1989) may provide a labeling with small \(|s(u)-s(v)|\) but large d(u, v) when placing more emphasis on the current degree (i.e., \(w_2>w_1\)), which is a local criterion, as the principal measure of priority. On the other hand, Sloan’s algorithm (Sloan 1989) (and its variations) obtains better profile results than the RCM method (George 1971) because the RCM method may produce \(\beta _i\) (\(1\le i\le |V|\)) similar to \(\beta (A)\) and, consequently, produce a labeling with a large profile. In turn, Sloan’s algorithm (Sloan 1989) (and its variations) produces small \(\beta _i\) (\(1\le i\le |V|\)) by choosing to label vertices with smaller degrees first, so that it produces a labeling with a small profile [although, in each iteration, Sloan’s algorithm (Sloan 1989) (and its variations) forces vertices to be labeled level by level (similarly to the RCM method) by adding \(w_2\) to the priority of the candidate vertices to be labeled].
We strictly observed the recommendations given by Sloan (1989), as well as those described in http://www.hsl.rl.ac.uk/archive/specs/mc40 (STFC 2015), in a Fortran programming language code to implement this heuristic in the C++ programming language. Likewise, we studied the Fortran programming language code of the Sloan–MGPS heuristic available at http://www.hsl.rl.ac.uk/catalogue/mc60.html (STFC 2015) to implement this heuristic in the C++ programming language.
Sloan’s algorithm (Sloan 1989) for finding pseudo-peripheral vertices follows three main steps. First, it selects a vertex v of minimum degree in the graph. Second, it scans \(L_{\ell (v)}(v)\) and then considers a set containing only one vertex of each degree. Finally, considering the eccentricity and width of the RLSs, the algorithm returns a starting (s) and an end (e) pseudo-peripheral vertex. This algorithm is a modified version of the algorithm of Gibbs et al. (1976) (which considers only the eccentricity of vertices) and is also applied together with the MPG (Medeiros et al. 1993) (see Sect. A.4) and NSloan (Kumfert and Pothen 1997) heuristics.
Reid and Scott (1999) proposed two modifications to Sloan’s algorithm (Sloan 1989) for finding pseudo-peripheral vertices: it scans \(L_{\ell (v)}(v)\) and considers a set containing only five vertices that are not adjacent to each other (instead of a set containing only one vertex of each degree) and, before returning the pseudo-peripheral vertices, it replaces s with e if \(\ell (e)>\ell (s)\). Therefore, the Sloan–MGPS heuristic (Reid and Scott 1999) is Sloan’s vertex-labeling algorithm with pseudo-peripheral vertices given by this modified GPS algorithm.
Sloan (1989) proposed employing two weights to label the vertices of the instance: \(w_1\), associated with the distance d(v, e) of the vertex v to the target end vertex e in \(\mathscr {L}(s)\), and \(w_2\), associated with the degree of each vertex. To be more specific, the priority \(p(v)=w_1d(v,e)-w_2(\mathrm{deg}(v)+1)\) employed in Sloan’s algorithm (Sloan 1989) uses different scales for the two criteria. The value of \(\mathrm{deg}(v)+1\) ranges from 1 to \(m+1\) (where m is the maximum degree of the graph \(G=(V,E)\)), and d(v, e) ranges from 0 (when \(v=e\); \(d(v,e)>0\) for \(v\ne e\)) to the eccentricity \(\ell (e)\) of the target end vertex e. Regarding Sloan’s (1989), NSloan (Kumfert and Pothen 1997), and Sloan–MGPS (Reid and Scott 1999) heuristics, we set the two weights as described in the original papers. When more than one pair of values was suggested in the original papers, we performed exploratory investigations to determine the pair of values that obtains the best profile results. Accordingly, the two weights are assigned as \(w_1=1\) and \(w_2=2\) for the Sloan–MGPS heuristic (Reid and Scott 1999), and as \(w_1=2\) and \(w_2=1\) for the NSloan heuristic (Kumfert and Pothen 1997). To give more specific detail, although assigning these two weights as \(w_1=1\) and \(w_2=2\), Sloan’s (1989), MPG (Medeiros et al. 1993) (see Sect. A.4), and Sloan–MGPS (Reid and Scott 1999) heuristics place more emphasis on the distance d(v, e) (related to \(\mathscr {L}(s)\)) when applied to sparse graphs because the scale of this criterion is larger than the scale of the other criterion. In sparse graphs, \(\ell (e)\) is typically larger than m, which is why Kumfert and Pothen (1997) normalized these two criteria to propose the Normalized Sloan (NSloan) heuristic.
Results of Sloan’s (Sloan 1989) and Sloan–MGPS (Reid and Scott 1999) heuristics applied to 17 symmetric instances of the University of Florida sparse-matrix collection were provided by Reid and Scott (1999). Table 7 exhibits results of the C++ programming language versions of Sloan’s algorithm and the Sloan–MGPS heuristic, along with the results of these heuristics provided by Reid and Scott (1999), applied to 17 symmetric instances of the University of Florida sparse-matrix collection. In this dataset composed of 17 symmetric instances, one can verify that the C++ programming language versions of the two heuristics obtained better profile results when considering the \(\rho _p\) metric. Disregarding the skirt instance (for which Reid and Scott (1999) reported profiles two orders of magnitude larger than those of the C++ programming language versions), one finds \(\sum \rho _p=0.84\) for Sloan’s algorithm (Sloan 1989), against \(\sum \rho _p=0.58\) obtained by the C++ programming language version of this heuristic; the \(\rho _p\) metric for both versions of the Sloan–MGPS heuristic is also similar when this instance is disregarded.
Kumfert and Pothen (1997) provided results of the NSloan heuristic applied to instances of the University of Florida sparse-matrix collection. Kumfert and Pothen (1997) normalized the results of the NSloan heuristic in relation to the RCM method (George 1971) with pseudo-peripheral vertex given by the method of Duff et al. (1989a), which is similar to Sloan’s algorithm for finding pseudo-peripheral vertices (Sloan 1989). Table 7 replicates these results. In addition, Table 7 shows results of the C++ programming language version of the NSloan heuristic normalized in relation to the results obtained using the C++ programming language version of the RCM–GL method (George and Liu 1981) applied to 17 symmetric instances of the University of Florida sparse-matrix collection. In this dataset composed of 17 symmetric instances, one finds \(\sum \rho _p=0.13\) for the original NSloan heuristic (Kumfert and Pothen 1997), against \(\sum \rho _p=1.64\) obtained by the C++ programming language version of this heuristic. The results of the two implementations are comparable, and the differences can be explained by the use of different algorithms for finding pseudo-peripheral vertices.
1.4 A.4: The MPG heuristic
The MPG heuristic (Medeiros et al. 1993) employs two max-priority queues: t contains candidate vertices to be labeled, and q contains the vertices belonging to t as well as vertices that can be inserted into t. As in Sloan’s algorithm (Sloan 1989) (and its variations), the current degree of a vertex is the number of its adjacencies to vertices that neither have been labeled nor belong to q. A main loop performs three steps. First, a vertex v that maximizes a specific priority function is inserted into q. Second, the current degree \(\mathrm{currdeg}_v\) of each vertex \(v\in t\) is examined: the algorithm labels a vertex v if \(\mathrm{currdeg}_v = 0\), and removes from t a vertex v (i.e., \(t\leftarrow t - \{v\}\)) if \(\mathrm{currdeg}_v > 1\). Third, if t is empty, the algorithm inserts into t each vertex \(u\in q\) with priority \(p_u \ge p_{\mathrm{max}}(q)-1\), where \(p_{\mathrm{max}}(q)\) returns the maximum priority among the vertices in q.
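The third step of the loop can be sketched as follows. This is a minimal illustration of the refill rule only, with hypothetical integer priorities; in the actual heuristic, priorities are derived from the distance and current-degree criteria, and q is maintained incrementally.

```python
# Sketch of MPG's refill step: when t becomes empty, t receives every
# vertex of q whose priority is within 1 of the maximum priority in q.
# Priorities below are hypothetical; q maps vertex -> priority.

def refill_candidates(q):
    """Return the subset of q with priority p_u >= p_max(q) - 1,
    i.e., the new contents of t when t is empty."""
    if not q:
        return set()
    p_max = max(q.values())
    return {u for u, p in q.items() if p >= p_max - 1}

q = {'a': 7, 'b': 6, 'c': 4, 'd': 7}
print(sorted(refill_candidates(q)))  # ['a', 'b', 'd']
```

Keeping t restricted to near-maximum-priority vertices is what limits the number of candidate vertices examined per iteration.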
Table 8 exhibits results of the C++ programming language version of the MPG heuristic and the results of this heuristic provided by Medeiros et al. (1993) applied to 30 symmetric instances of the Harwell-Boeing sparse-matrix collection. In this dataset composed of 30 instances, one can verify that the C++ programming language version of the MPG heuristic obtained better profile results than the original version of this heuristic.
1.5 A.5: The CSS-band heuristic
Regarding the charged system search for bandwidth reduction (CSS-band) heuristic (Kaveh and Sharafi 2012), 500 iterations were used as a termination criterion, and 1 charged particle per 100 vertices was used in the tests performed. These are the same values used by Kaveh and Sharafi (2012). This heuristic was also tested with fewer iterations and fewer charged particles. This considerably decreased the computational cost of the heuristic; however, the bandwidth and profile reductions also decreased substantially. In addition, tests with more iterations and more charged particles were performed. Although this slightly improved the bandwidth and profile reductions, the computational cost increased considerably.
Appendix B: Results
The first and second parts of Table 9 contain the results of 14 heuristics applied to reduce the bandwidth and profile of 15 symmetric instances of the Harwell-Boeing sparse-matrix collection, respectively. Numbers in bold face are the best results. Table 10 shows the average execution times of the 14 heuristics applied to 15 symmetric instances of the Harwell-Boeing sparse-matrix collection. Similar to the other tables related to average execution times shown below, Table 10 also shows the number of best results, the rank order according to the \(\sum \rho _t\) metric, and the mode of the order of magnitude of these 14 heuristics. In particular, to achieve the maximal precision of the metrics, we used several decimal places to avoid ties.
Tables 11 and 12 contain the results of 14 heuristics applied to reduce the bandwidth and profile of 35 symmetric instances of the Harwell-Boeing sparse-matrix collection, respectively. In particular, Table 11 contains the results of the VNS-band heuristic set at 7 and 8 s. This was done to show that 8 s was the lowest timeout required for the VNS-band heuristic to obtain better results than the FNCHC heuristic, which was the second best in reducing bandwidth, according to the \(\rho _b\) metric. Table 12 also contains the results of the VNS-band heuristic set at 500 s. Even so, when considering profile reductions, the VNS-band heuristic was dominated by the MPG (Medeiros et al. 1993), Sloan–MGPS (Reid and Scott 1999), Sloan’s (1989), and NSloan (Kumfert and Pothen 1997) heuristics in this set of 35 symmetric instances. Table 13 shows the average execution times of the 14 heuristics applied to 35 symmetric instances of the Harwell-Boeing sparse-matrix collection.
Table 14 contains the results of 14 heuristics applied to reduce the bandwidth and profile of 18 asymmetric instances of the Harwell-Boeing sparse-matrix collection. In particular, in the GENT113 instance, it was only possible to apply the 64-bit executable program of the VNS-band heuristic with a limit of 6 s. Therefore, the bandwidth and profile values presented for this instance are obtained using the VNS-band (6 s) heuristic. Table 15 shows the average execution times of the 14 heuristics applied to 18 asymmetric instances of the Harwell-Boeing sparse-matrix collection.
Tables 16 and 17 contain the results of 14 heuristics applied to reduce the bandwidth and profile of 45 asymmetric instances of the Harwell-Boeing sparse-matrix collection, respectively. In particular, it was not possible to apply the 64-bit executable program of the VNS-band heuristic to the SHERMAN4 instance. For this instance, the values shown were obtained using the 32-bit executable program of the VNS-band heuristic with the same runtime limit. Table 18 shows the average execution times of the 14 heuristics applied to 45 asymmetric instances of the Harwell-Boeing sparse-matrix collection.
Table 19 contains the results of 10 heuristics applied to reduce the bandwidth and profile of 17 symmetric instances of the University of Florida sparse-matrix collection. Table 20 shows the average execution times of the 10 heuristics applied to 17 symmetric instances of the University of Florida sparse-matrix collection.
Table 21 contains the results of 10 heuristics applied to reduce the bandwidth and profile of five asymmetric instances of the University of Florida sparse-matrix collection. Table 21 also shows the average execution times of the 10 heuristics applied to five asymmetric instances of the University of Florida sparse-matrix collection.
Appendix C: Application areas
Tables 22 and 23 show several instances of the University of Florida and Harwell-Boeing sparse-matrix collections divided by application area, respectively.
Appendix D: The most promising low-cost heuristics for bandwidth and profile reductions of symmetric and asymmetric matrices
Table 24 summarizes the results of the heuristics that showed the best overall performance on the six sets of test matrices. Additionally, Table 24 shows the lowest-cost heuristics tested on these sets of instances. Furthermore, despite the small number of executions of each heuristic on each instance, Table 24 shows the largest standard deviation \(\sigma \) and coefficient of variation attained for the execution times of the 14 heuristics on these six datasets.
It should be noted that when execution times are crucial for the application, a reasonable bandwidth or profile reduction achieved by a very fast heuristic may be preferable to a much larger reduction provided by a heuristic with high computational cost. Accordingly, Table 24 shows the rank order of the heuristics and their order of magnitude regarding computational cost.
Unlike heuristics such as RCM–GL, hGPHH-GL, GPS, and GGPS, the heuristics designed for profile reduction tested here do not label vertices exclusively level by level in the level structure rooted at a starting vertex. This favors profile reduction by allowing a vertex of small degree to be labeled before other vertices, as carried out in Snay’s (1976), Sloan’s (1989), MPG (Medeiros et al. 1993), NSloan (Kumfert and Pothen 1997), and Sloan–MGPS (Reid and Scott 1999) heuristics.
Sloan’s algorithm (Sloan 1989) showed lower computational cost than the Sloan–MGPS heuristic (Reid and Scott 1999) in all tests performed [e.g., Sloan’s algorithm (Sloan 1989) achieved its results with a computational cost one order of magnitude lower than that of the Sloan–MGPS heuristic (Reid and Scott 1999) when applied to the 17 symmetric instances of the University of Florida sparse-matrix collection (see Table 20)]. It should be noted that Sloan’s (1989) and MPG (Medeiros et al. 1993) heuristics were implemented with a linked list, and the NSloan (Kumfert and Pothen 1997) and Sloan–MGPS (Reid and Scott 1999) heuristics were implemented with a binary heap (as proposed in the original algorithms). Specifically, a version of Sloan’s algorithm (Sloan 1989) implemented here with a binary heap was more expensive than the version implemented with a linked list. The number of vertices in the max-priority queue in each iteration may be small, so that the operations on the linked list are less expensive than the operations on the max-heap (which would show a different asymptotic behavior for a sufficiently large number of vertices). On the other hand, Sloan’s algorithm (Sloan 1989) was dominated by the Sloan–MGPS heuristic (Reid and Scott 1999) in relation to bandwidth and profile reductions in all tests performed: the MGPS method (Reid and Scott 1999) may have selected pseudo-peripheral vertices with larger eccentricity (or have generated a corresponding RLS of smaller width) than the pseudo-peripheral vertices given by Sloan’s algorithm (Sloan 1989) for this task.
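The linked-list-versus-heap trade-off can be sketched as follows. This is an illustrative comparison of the two extract-max disciplines on a tiny queue (hypothetical values), not the C++ implementations evaluated in the paper; for small queue sizes, the linear scan's lower constant factors can outweigh the heap's \(O(\log n)\) asymptotics.

```python
# Sketch of the design trade-off discussed above: extracting the maximum
# from a small max-priority queue via a plain list (linear scan, O(n))
# versus a binary heap (O(log n)); the contents below are hypothetical.
import heapq

def list_extract_max(items):
    """Linear scan: find and pop the maximum element of a list."""
    i = max(range(len(items)), key=items.__getitem__)
    return items.pop(i)

def heap_extract_max(heap):
    """Binary max-heap via heapq's min-heap with negated keys."""
    return -heapq.heappop(heap)

items = [3, 1, 4, 1, 5]
heap = [-x for x in items]
heapq.heapify(heap)
# Both disciplines return the same maximum:
print(list_extract_max(list(items)), heap_extract_max(heap))  # 5 5
```

For queues holding only a handful of candidate vertices per iteration, as observed for Sloan-type heuristics, the list variant avoids the heap's sift-down overhead entirely.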
Sections D.1 and D.2 describe the best heuristics for bandwidth reduction of symmetric and asymmetric matrices, respectively. Finally, Sect. D.3 addresses the best low-cost heuristic for profile reduction of symmetric and asymmetric matrices.
1.1 D.1: Best heuristic for bandwidth reduction of symmetric matrices
When the VNS-band heuristic was set with low timeouts and its results were compared with those of the low-cost heuristics, the VNS-band (8 s) heuristic provided better bandwidth results in the set composed of 35 symmetric instances of the Harwell-Boeing sparse-matrix collection (see Fig. 3c). The FNCHC heuristic was the second best in reducing bandwidth in this dataset, according to the \(\rho _b\) metric (see Fig. 3a and Table 11). The VNS-band heuristic achieved these results with a computational cost two orders of magnitude higher than that of the FNCHC heuristic (see Table 13).
Additionally, in the tests presented in this paper, the FNCHC heuristic obtained the best results in 12 instances in the set composed of 15 symmetric instances (see Fig. 1c). Moreover, the FNCHC heuristic obtained the best results in the 17 symmetric instances of the University of Florida sparse-matrix collection (see Figs. 9a, c in Sect. 4.2.1 and the first part of Table 19). Therefore, the FNCHC (Lim et al. 2003, 2004, 2007) heuristic was considered the algorithm that achieved the most promising (overall) results for reducing bandwidth of symmetric matrices.
1.2 D.2: Best heuristic for bandwidth reduction of asymmetric matrices
Although the FNCHC and VNS-band heuristics obtained the best results in relation to bandwidth reduction in the sets composed of 18 and 45 asymmetric instances of the Harwell-Boeing sparse-matrix collection (see Figs. 5a, 7a, c, and the first part of Tables 14 and 16), respectively, when applied to the dataset composed of five asymmetric instances of the University of Florida sparse-matrix collection, the GPS algorithm obtained the best results according to the \(\rho _{\beta }\) metric (see Fig. 11a, the first part of Table 21, and Sect. 4.2.2). Moreover, in general, the GPS algorithm achieved the third (see Sects. 4.1.1 and 4.1.2) and fourth (see Sects. 4.1.3 and 4.1.4) best results according to the \(\rho _{\beta }\) metric when applied to the four datasets composed of symmetric and asymmetric instances of the Harwell-Boeing sparse-matrix collection. The GPS algorithm was outperformed by the FNCHC and VNS-band (with its timeout set at 8, 12, and 500 s) heuristics when applied to the 113 instances of the Harwell-Boeing sparse-matrix collection (see Figs. 1a–c, 3a–c, 5a, and 7a–c, and Tables 11, 14, and 16). On the other hand, the GPS algorithm outperformed the VNS-band (500 s) heuristic when applied to the 22 instances of the University of Florida sparse-matrix collection (see Figs. 9a and 11a; Tables 19, 21).
It should be noted that the FNCHC and VNS-band heuristics generate several solutions and employ local-search procedures to provide local-optimum solutions. The GPS algorithm generates only one solution. Moreover, a local-search procedure is not employed within the GPS algorithm.
The FNCHC and VNS-band heuristics achieved their results with a computational cost one order of magnitude higher than that of the GPS heuristic in the set composed of five asymmetric instances of the University of Florida sparse-matrix collection (see Table 21). Thus, the GPS algorithm was considered the most promising low-cost heuristic for bandwidth reduction of asymmetric matrices.
1.3 D.3: Best low-cost heuristic for profile reduction of symmetric and asymmetric matrices
The MPG (Medeiros et al. 1993) and NSloan (Kumfert and Pothen 1997) heuristics obtained the best profile results in 11 and 9 instances, respectively, in the set composed of 35 symmetric instances of the Harwell-Boeing sparse-matrix collection (see Fig. 3d; Table 12). Moreover, the NSloan heuristic (Kumfert and Pothen 1997) obtained the best profile results in the set composed of 15 symmetric instances of the Harwell-Boeing sparse-matrix collection (see Fig. 1b, d and the second part of Table 9) and in the set composed of 17 symmetric instances of the University of Florida sparse-matrix collection (see Fig. 9b, d and Table 19). Therefore, the NSloan heuristic (Kumfert and Pothen 1997) was considered the best low-cost heuristic for profile reduction of symmetric matrices.
In relation to asymmetric instances, the MPG heuristic (Medeiros et al. 1993) obtained the best results in the sets composed of 18 and 45 asymmetric instances of the Harwell-Boeing sparse-matrix collection (see Figs. 5b, 7b, d; Tables 14, 17) and in the set composed of five asymmetric instances of the University of Florida sparse-matrix collection (see Fig. 11b and the second part of Table 21). Therefore, the MPG heuristic (Medeiros et al. 1993) was considered the best heuristic for profile reduction of asymmetric matrices.
Usually, in sparse graphs, \(\ell (e)\) is larger than m. Thus, Sloan’s (1989) and Sloan–MGPS (Reid and Scott 1999) heuristics place more emphasis on the distance of a vertex v to the target end vertex (a global criterion) than on the degree of v (a local criterion). Consequently, these two heuristics tend to label vertices level by level in a level structure rooted at a starting vertex. By normalizing these two criteria in the priority function, the NSloan heuristic (Kumfert and Pothen 1997) places similar emphasis on both. On the other hand, a vertex-labeling algorithm that labels vertices in ascending order of degree produces poor bandwidth reduction at a high computational cost because it considers many candidate vertices at each iteration (we verified this in exploratory investigations). Thus, the MPG heuristic (Medeiros et al. 1993) controls the number of candidate vertices and (although assigning the two weights as \(w_1=1\) and \(w_2=2\)) offers a trade-off between the two criteria mentioned. These characteristics of the NSloan (Kumfert and Pothen 1997) and MPG (Medeiros et al. 1993) heuristics can explain their best results in profile reduction of symmetric and asymmetric instances, respectively, when compared to the other heuristics for this task.
Gonzaga de Oliveira, S.L., Bernardes, J.A.B. & Chagas, G.O. An evaluation of low-cost heuristics for matrix bandwidth and profile reductions. Comp. Appl. Math. 37, 1412–1471 (2018). https://doi.org/10.1007/s40314-016-0394-9
Keywords
- Bandwidth reduction
- Profile reduction
- Combinatorial optimization
- Envelope reduction problem
- Heuristics
- Metaheuristics
- Reordering algorithms
- Sparse matrices
- Renumbering
- Ordering
- Graph labeling
- Bandwidth minimization