
1 Introduction

The solution of linear systems with large-scale sparse coefficient matrices is one of the most relevant computational kernels in scientific computing. Many real-world problems in science and engineering (e.g., thermal and model reduction problems) reduce to solving such systems, and this step is usually the part of a numerical simulation that demands the highest computational cost in execution time and memory. Reducing the cost of solving linear systems composed of large-scale sparse matrices is therefore a significant topic in numerical mathematics because of its importance in many numerical simulations.

Iterative linear system solvers that handle large-scale sparse matrices tend to suffer from poor memory performance because cache is used inefficiently when the order in which matrix rows and columns are processed is ignored. Simulations can generally gain substantial improvements in execution cost by accessing data in an appropriate order (e.g., [2, 4, 6, 9, 10]). Thus, a simulation should label the vertices of the graph corresponding to a sparse matrix so that data associated with adjacent vertices tend to be stored in nearby memory locations, improving cache hit rates. Spatial locality (a cache block brings in variables that the computation will use in the near future) should therefore be considered an essential aspect when designing a new algorithm.

An appropriate vertex numbering is desirable to guarantee that the associated coefficient matrix has a small profile, which lowers the cost of solving large sparse linear systems and, depending on the data structure used, reduces their memory requirements. Heuristics for profile reduction are thus a way of producing a sequence of graph vertices with spatial locality. Profile reduction also favors direct methods for solving linear systems. Moreover, reordering the rows and columns of sparse matrices improves the arithmetic intensity of the sparse matrix-vector multiplication [21], the most critical kernel of the conjugate gradient method [11, 14]. As a consequence, the reordered linear system is much cheaper to solve than the original one, even when the system has multiple right-hand side vectors [9]. Thus, heuristics for profile reduction are used to obtain low processing and storage costs when solving large sparse linear systems.

The profile reduction is also crucial for increasing the effectiveness of data structures to represent large-scale matrices, such as applications that use the skyline data structure [7]. Another area where reordering rows and columns of sparse matrices is of fundamental importance is in serial and parallel adaptive mesh refinement. As the mesh is adapted, sparsity changes dynamically and reordering should be employed [3].

Heuristics for profile reduction belong to a family of renumbering algorithms that place the nonzero coefficients of a sparse matrix close to the main diagonal. Let \(A=[a_{ij}]\) be an \(n \times n\) symmetric matrix associated with a connected undirected graph \(G=(V,E)\), where V and E are the sets of vertices and edges, respectively. The profile of a matrix A is defined as \(profile(A) = \sum _{i=1}^n \beta _i\), where \(\beta _i=i-\min \limits _{1 \le j \le i}[j\ |\ a_{ij} \ne 0]\) and \(a_{ii}\ne 0\).
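To make the definition concrete, the profile can be computed directly from the sparsity pattern. The following sketch (illustrative, not the paper's implementation) takes a dense 0/1 pattern with 0-based indices, so each row contributes \(i-\min\,[j \le i \mid a_{ij} \ne 0]\):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Profile of a symmetric matrix with nonzero diagonal:
// profile(A) = sum over rows i of (i - index of the first nonzero in row i).
// A is a dense 0/1 sparsity pattern; indices are 0-based here.
long long profile(const std::vector<std::vector<int>>& A) {
    long long p = 0;
    for (std::size_t i = 0; i < A.size(); ++i) {
        std::size_t jmin = i;               // a_ii != 0 by assumption
        for (std::size_t j = 0; j < i; ++j)
            if (A[i][j] != 0) { jmin = j; break; }
        p += static_cast<long long>(i - jmin);
    }
    return p;
}
```

For instance, a tridiagonal pattern contributes 1 per row after the first, while a nonzero in the first column of the last row contributes \(n-1\) on its own, which is why moving nonzeros toward the diagonal reduces the profile.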

The profile minimization problem is a well-known \(\mathcal {NP}\)-hard [15] computational search problem that has been studied for over half a century because it relates to a vast range of scientific and engineering application areas. Practitioners have therefore proposed a wide variety of heuristic methods for reordering the rows and columns of a sparse matrix to reduce its profile (see [1, 10] and references therein). Since this is an active field of research, practitioners continue to devote efforts to developing heuristics capable of reducing the profile of instances to a considerable extent (e.g., see [17]).

Metaheuristic approaches are common alternatives to graph-theoretical optimization techniques in several fields, and metaheuristic-based heuristics for profile optimization have been proposed recently (see [17] and references therein). Palubeckis [17] provides a list of applications that require solving the profile reduction problem where runtime is not a critical issue; these applications may apply a metaheuristic algorithm for profile optimization. When a heuristic for profile reduction is applied as a preprocessing step in solving linear systems, however, such techniques have seen limited use, mainly because heuristics for this purpose must have low computational costs. In general, the computational costs (time and space) of metaheuristic-based heuristics for profile optimization are impractical when the objective is to accelerate a linear system solver [1, 8, 9, 10].

This paper focuses on the group of applications based on sparse matrix factorization and related problems, so the experiments conducted here evaluate low-cost (in time and space) heuristics for profile reduction against the state-of-the-art metaheuristic-based algorithm for this problem [17]. Apart from substantially reducing the profile, a heuristic must also achieve low computational costs: it can neither be slow nor have large memory requirements. Specifically, an adequate vertex labeling of the graph corresponding to the matrix of a linear system may reduce the computational time of a (possibly preconditioned) linear system solver, such as the conjugate gradient method [6], by improving cache hit rates [2, 4], as previously mentioned. Additionally, the profile reductions obtained by a reordering algorithm are not directly proportional to the reduction in the time required to solve linear systems. Moreover, the objective is to minimize the total computing time of the simulation, including the preprocessing time required by the reordering heuristic, at least when only a single linear system is to be solved [10]. Therefore, a vertex labeling algorithm must run at low cost [9].

This paper evaluates a modified heuristic against Sloan’s [19], the MPG [16], NSloan [13], Sloan-MGPS [18], and Hu-Scott [12] heuristics. The new heuristic for profile reduction is a hybrid of the MPG [16] and NSloan [13] heuristics. This paper also compares the profile results obtained by these low-cost heuristics with those of the state-of-the-art metaheuristic-based algorithm for profile reduction [17].

Section 2 describes the heuristics evaluated in this computational experiment and states a modified heuristic for profile reduction. Section 3 describes how the tests were conducted. Section 4 shows the experimental results. Finally, Sect. 5 addresses the conclusions.

2 Related Work

Sloan [19] proposed one of the most important heuristics in this field. His heuristic is still one of the most widely used reordering algorithms for reducing the profile of sparse matrices (e.g., [1, 9, 10, 13, 16, 18]) because it is inexpensive (in terms of execution time and storage costs) and generates quality solutions. Sloan’s heuristic [19] employs two weights in its priority scheme to label the vertices of the instance: \(w_1\), associated with the distance \(d(v,e)\) from the vertex v to a pseudo-peripheral (target end) vertex e that belongs to the last level of the level structure rooted at a starting vertex s, and \(w_2\), associated with the degree of each vertex. The priority function \(p(v)=w_1\cdot d(v,e)-w_2\cdot (deg(v)+1)\) employed in Sloan’s heuristic [19] presents different scales for the two criteria: the value of \(deg(v)+1\) ranges from 1 to \(m+1\) (where \(m=\max \limits _{v\in V}[deg(v)] \) is the maximum degree in the graph \(G=(V,E)\)), whereas \(d(v,e)\) ranges from 0 (when \(v=e\)) to the eccentricity \(\ell (e)\) of the target end vertex e.
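Sloan's priority function can be sketched as follows; the function name is illustrative, and the sketch assumes the BFS distance \(d(v,e)\) to the target end vertex and the vertex degree have already been computed (it is not taken from Sloan's implementation):

```cpp
#include <cassert>

// Sketch of Sloan's priority p(v) = w1*d(v,e) - w2*(deg(v)+1).
// A higher priority means the vertex is labeled sooner: vertices far from
// the end vertex and of low degree are favored. Defaults w1=1, w2=2 are the
// weights used in this paper's experiments.
int sloan_priority(int dist_to_end, int degree, int w1 = 1, int w2 = 2) {
    return w1 * dist_to_end - w2 * (degree + 1);
}
```

Because \(deg(v)+1\) and \(d(v,e)\) live on different scales, the same weight pair can behave very differently across graphs, which motivates the normalization discussed below.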

The MPG [16], NSloan [13], and Sloan-MGPS [18] heuristics are based on Sloan’s heuristic [19]. Specifically, the Sloan-MGPS heuristic [18] is essentially Sloan’s heuristic [19] with the starting and target end vertices given by an algorithm named modified GPS [18], instead of Sloan’s original algorithm for finding these two pseudo-peripheral vertices.

The MPG heuristic [16] employs two max-priority queues: t contains candidate vertices for labeling, and q contains the vertices of t along with vertices that may be inserted into t. As in Sloan’s heuristic [19] (and its variants), the current degree of a vertex is the number of its adjacent vertices that have neither been labeled nor belong to q. A main loop performs three steps. First, a vertex v that maximizes a specific priority function is inserted into q. Second, the current degree cdeg(v) of each vertex \(v\in t\) is examined: the algorithm labels a vertex v if \(cdeg(v) = 0\), and removes from t a vertex v (i.e., \(t\leftarrow t - \{v\}\)) if \(cdeg(v) > 1\). Third, if t is empty, the algorithm inserts into t each vertex \(u\in q\) with priority \(p_u \ge p_{max}(q)-1\), where \(p_{max}(q)\) returns the maximum priority among the vertices in q. The priority function in the MPG heuristic is \(p(v)=d(v,e)-2\cdot cdeg(v)\).
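The third step above (refilling t from q) can be sketched as follows. This is an illustrative fragment, not the MPG authors' code; vertices are modeled as (id, priority) pairs and the function name is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Sketch of MPG's refill step: when t runs empty, insert into t every
// vertex u of q whose priority satisfies p_u >= p_max(q) - 1.
// q is modeled here as a flat list of (vertex id, priority) pairs.
std::vector<int> refill_t(const std::vector<std::pair<int, int>>& q) {
    std::vector<int> t;
    if (q.empty()) return t;
    const int pmax = std::max_element(q.begin(), q.end(),
        [](const auto& a, const auto& b) { return a.second < b.second; })->second;
    for (const auto& [u, p] : q)
        if (p >= pmax - 1) t.push_back(u);   // keep near-maximum vertices
    return t;
}
```

Admitting all vertices within one unit of the maximum priority, rather than only the single best vertex, is what distinguishes this step from a plain greedy selection.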

Kumfert and Pothen [13] normalized the two criteria used in Sloan’s algorithm, proposing the Normalized Sloan (NSloan) heuristic. These authors used the priority function \(p(v) = w_1\cdot d(v,e) - w_2\cdot \left\lfloor d(s,e)/m\right\rfloor \cdot (deg(v)+1)\).
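As a sketch (function name illustrative, distances and the maximum degree m assumed precomputed), the normalized priority may look like this, with integer division supplying the floor for nonnegative operands:

```cpp
#include <cassert>

// Sketch of the NSloan priority
// p(v) = w1*d(v,e) - w2*floor(d(s,e)/m)*(deg(v)+1),
// where m is the maximum degree in the graph. The factor floor(d(s,e)/m)
// rescales the degree term to the scale of the distance term.
// Defaults w1=2, w2=1 are the weights used in this paper's experiments.
int nsloan_priority(int d_v_e, int d_s_e, int max_degree, int degree,
                    int w1 = 2, int w2 = 1) {
    return w1 * d_v_e - w2 * (d_s_e / max_degree) * (degree + 1);
}
```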

Regarding the Sloan [19], NSloan [13], and Sloan-MGPS [18] heuristics, this paper set the two weights as described in the original papers. When the authors suggested more than one pair of values, exploratory investigations were performed to determine the pair that obtains the best profile results [10]. Thus, the weights are assigned as \(w_1=1\) and \(w_2=2\) for the Sloan and Sloan-MGPS [18] heuristics, and as \(w_1=2\) and \(w_2=1\) for the NSloan heuristic [13].

In addition to the four low-cost heuristics for profile reduction selected from literature reviews [9, 10], this paper evaluates a modified MPG heuristic. The new heuristic for profile reduction is essentially a hybrid of the MPG [16] and NSloan [13] heuristics: it is based on the MPG heuristic [16] in conjunction with the normalization scheme proposed by Kumfert and Pothen [13]. It uses the priority function \(p(v)=d(v,e)-2\cdot d(s,e)/\max \limits _{v\in V}[deg(v)]\cdot cdeg(v)\). This paper refers to this new heuristic as the NMPG heuristic.
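A sketch of the NMPG priority (function name illustrative; distances, the maximum degree, and the current degree assumed precomputed) is given below. Floating-point arithmetic is used here so the normalization factor \(d(s,e)/m\) is not truncated; whether the actual implementation floors this factor, as NSloan does, is not specified in the text:

```cpp
#include <cassert>

// Sketch of the NMPG priority p(v) = d(v,e) - 2*(d(s,e)/m)*cdeg(v),
// combining MPG's current-degree term with NSloan's normalization,
// where m is the maximum degree in the graph.
double nmpg_priority(int d_v_e, int d_s_e, int max_degree, int cdeg_v) {
    return d_v_e - 2.0 * d_s_e / max_degree * cdeg_v;
}
```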

The Hu-Scott heuristic [12] is a multilevel algorithm that uses a maximal independent vertex set to coarsen the adjacency graph of the matrix and applies the Sloan-MGPS heuristic [18] on the coarsest graph. This paper also analyzes the results yielded by the state-of-the-art metaheuristic algorithm for profile reduction [17], the MSA-VNS heuristic, against the five low-cost heuristics. The MSA-VNS heuristic is a hybrid algorithm based on a multi-start simulated annealing (MSA) algorithm combined with the Variable Neighborhood Search (VNS) metaheuristic [17].

3 Description of the Tests

This paper divides the experiments into two main parts. The first part compares the results of five low-cost heuristics for profile reduction with those obtained by the MSA-VNS heuristic [17]. Palubeckis [17] compared his heuristic with the previous state-of-the-art metaheuristic algorithms for profile reduction on two groups of matrices: 38 matrices ranging from 24 to 292 vertices, and 39 matrices ranging from 307 to 2,680 vertices, all contained in the SuiteSparse matrix collection [5]. Since the matrices of the first group are too small by today’s standards, this paper uses the 39 matrices of the second group to compare the six heuristics evaluated in this computational experiment.

Palubeckis [17] implemented his heuristic in the C++ programming language and performed his experiments on a workstation with an Intel® Core™ 2 Duo CPU running at 3.0 GHz. To provide a reasonable comparison of running times, we executed the five low-cost heuristics for profile reduction evaluated here on a workstation with an Intel® Core™ 2 Duo CPU running at 2.3 GHz (Intel; Santa Clara, CA, United States). Although the profile reduction heuristics evaluated here are deterministic algorithms, 10 serial runs were carried out for each matrix to obtain average results and mitigate possible interference in the execution costs.

This appraisal also employs the C++ programming language to implement the five low-cost heuristics for profile reduction (the Sloan, MPG, NSloan, Sloan-MGPS, and NMPG heuristics). The implementations of these five heuristics employ binary heaps to implement the priority queues (although Sloan’s original algorithm [19] used a linked list). A previous publication [10] shows the testing and calibration performed to compare these implementations with the ones used by the original proposers of the four low-cost heuristics (Sloan’s [19], MPG [16], NSloan [13], and Sloan-MGPS [18]), ensuring that the codes employed here are comparable to the originally proposed algorithms.
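In C++, a binary-heap-backed max-priority queue is available off the shelf as `std::priority_queue`; the following fragment is an illustrative sketch of that choice (not the authors' code), keyed on (priority, vertex) pairs so the lexicographic pair comparison orders by priority first:

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// Illustrative use of a binary heap for a vertex max-priority queue.
// std::priority_queue is a binary heap over std::vector by default.
// Pop the vertex with the highest priority in O(log n); note that
// in-place priority updates are not supported, so implementations
// typically reinsert a vertex and discard stale entries lazily.
int pop_highest(std::priority_queue<std::pair<int, int>>& pq) {
    const int v = pq.top().second;   // vertex id of the maximum-priority pair
    pq.pop();
    return v;
}
```

Compared with the linked list of Sloan's original code, the heap trades the O(n) scan for the maximum against O(log n) insertions and removals, which pays off on larger instances.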

The second part of the experiments uses 15 real symmetric matrices contained in the SuiteSparse matrix collection [5]. We also used the Hu-Scott heuristic [12], namely the MC73 routine contained in the HSL [20], in this experiment; we employed the Fortran programming language to use this routine. The workstation used in the simulations with these 15 matrices contained an Intel® Core™ i7-4770 CPU (3.40 GHz, 8 MB cache, 8 GB of DDR3-1333 main memory) (Intel; Santa Clara, CA, United States). We performed three sequential runs for each large-scale matrix. The profile reduction depends on the choice of the initial ordering; this paper considers the original ordering given in the instance contained in the SuiteSparse matrix collection [5].

4 Results and Analysis

The first part of the experiments (in Sect. 4.1) compares the profile results of five low-cost heuristics (Sloan, MPG, NSloan, Sloan-MGPS, and NMPG) with those of the state-of-the-art metaheuristic algorithm for profile reduction (i.e., the MSA-VNS heuristic [17]) when applied to instances ranging from 307 to 2,680 vertices. These experiments show that the MSA-VNS heuristic yields better profile results than the five other heuristics evaluated, but at much higher execution costs. Thus, the second part of the experiments (in Sect. 4.2) shows the results of six low-cost heuristics for profile reduction (i.e., including the Hu-Scott heuristic) applied to 15 instances ranging from 19,994 to 1,228,045 vertices [up to 47,851,783 edges (or nonzero coefficients)].

4.1 Comparison of the Results Obtained Using State-of-the-Art Metaheuristic Algorithm Against Five Low-Cost Heuristics

Tables 1 and 2 show the characteristics of each instance (name, size, and original profile (\(profile_0\))) and the average profile values obtained when the six heuristics are applied to reduce the profile of the 39 small matrices. Figure 1 shows that the MSA-VNS heuristic [17] obtains better profile reductions than the five other heuristics evaluated in this computational experiment. For some instances (e.g., can445, bcsstk20, can634, dwt869), the MSA-VNS heuristic achieves a much higher profile reduction rate than the five other heuristics included in this appraisal. Additionally, the MSA-VNS heuristic [17] reduced the profiles of the gr_30_30 and nos3 instances, whereas, in general, the five other low-cost heuristics increased the profiles of these two instances. On the other hand, Fig. 1 shows that, in general, the profile reduction rates obtained by the six heuristics evaluated are similar.

Table 1. Results of six heuristics applied to reduce the profile of 26 instances contained in the SuiteSparse matrix collection [5]. Palubeckis [17] obtained the results of the MSA-VNS heuristic (times in seconds) in simulations performed on an Intel® Core™ 2 Duo running at 3.0 GHz [17]. The five other heuristics (times in milliseconds) were executed on a machine containing an Intel® Core™ 2 Duo running at 2.3 GHz. (Continued on Table 2.)
Table 2. Results of six heuristics applied to reduce the profile of 13 instances contained in the SuiteSparse matrix collection. Palubeckis [17] obtained the results of the MSA-VNS heuristic (times in seconds) in simulations performed on an Intel® Core™ 2 Duo running at 3.0 GHz [17]. The five other heuristics (times in milliseconds) were executed on a machine containing an Intel® Core™ 2 Duo running at 2.3 GHz. (Continued from Table 1.)
Fig. 1.

Results of six heuristics applied to reduce the profile of 39 instances contained in the SuiteSparse matrix collection [5].

Figure 2 (shown as line charts for clarity) shows that the execution costs of the MSA-VNS heuristic are much higher than those of the five other profile reduction heuristics evaluated here. The experiments reveal that the MSA-VNS heuristic [17] achieved its results at a processing cost at least seven orders of magnitude higher than the five other heuristics evaluated. For example, the Sloan [19], NSloan [13], Sloan-MGPS [18], MPG [16], and NMPG heuristics computed the instance dwt2680 (the largest instance in this dataset) in 0.04, 0.16, 0.23, 0.39, and 0.41 ms, respectively (in simulations performed on an Intel® Core™ 2 Duo running at 2.3 GHz), whereas the MSA-VNS heuristic took more than 3,221 s (on an Intel® Core™ 2 Duo running at 3.0 GHz). For further comparison, Sloan’s heuristic [19] computes an instance composed of 1,228,045 vertices in two seconds.

Fig. 2.

Execution times of six heuristics applied to reduce the profile of 39 instances contained in the SuiteSparse matrix collection. Palubeckis [17] obtained the results of the MSA-VNS heuristic (times in seconds) in simulations performed on an Intel® Core™ 2 Duo running at 3.0 GHz [17]. The simulations with the five other heuristics (times in milliseconds) were performed on an Intel® Core™ 2 Duo running at 2.3 GHz.

On 34 of the 39 matrices in the dataset used here, the MSA-VNS heuristic [17] delivers smaller profiles than any other previously known profile reduction algorithm. While this result is impressive, the MSA-VNS heuristic is very slow, taking approximately \(2n^2\) milliseconds for a matrix of order n, which makes it impractical for larger instances.

The solution quality and runtime of the MSA-VNS algorithm may depend on the setting of its parameters; however, we considered the results reported in its original publication [17], which presumably used the best parameter settings. Since low-cost heuristics for profile reduction [10] generally yield reasonable profile results at much lower costs than the state-of-the-art metaheuristic algorithm for profile reduction [17], Sect. 4.2 concentrates on evaluating six low-cost heuristics for profile reduction applied to larger matrices (up to 1,228,045 vertices).

4.2 Results of Six Low-Cost Heuristics for Profile Reductions

This section describes the results of six low-cost heuristics for profile reduction applied to a dataset comprising 15 symmetric matrices contained in the SuiteSparse matrix collection. Table 3 shows each matrix’s name and size, its initial profile (profile\(_0\)), and the average profile values obtained with each heuristic. The same table also highlights the best profile results.

Table 3. Results of six heuristics applied to reduce the profile of 15 symmetric matrices contained in the SuiteSparse matrix collection [5] originating from thermal (four matrices) and model reduction (11 matrices) problems.

Table 3 shows that the Hu-Scott heuristic yielded the largest number of best profile results when applied to symmetric instances originating from thermal (three instances) and model reduction (six instances) problems. On the other hand, the same table shows that the MPG heuristic delivered the best profile result when applied to the largest matrix (thermal2) used in this study.

Figure 3 (shown as line charts for clarity) presents the execution times of the six low-cost heuristics for profile reduction evaluated. The Sloan and NSloan heuristics obtained lower execution times than the four other heuristics. On average, the Hu-Scott, NMPG, and MPG heuristics obtained similar execution times.

Fig. 3.

Execution times (in seconds) of six heuristics for profile reductions in simulations using 15 symmetric instances contained in the SuiteSparse matrix collection.

5 Conclusions

This computational experiment evaluated five existing low-cost heuristics for profile reduction (variants of Sloan’s algorithm [19]) along with a new hybrid heuristic based on the MPG [16] and NSloan [13] heuristics, termed the NMPG heuristic. It also compared the results of a metaheuristic algorithm based on the simulated annealing and variable neighborhood search metaheuristics [17] with those of the low-cost heuristics, on matrices arising from different application domains. As expected, the simulations conducted in this paper show that the state-of-the-art metaheuristic algorithm for profile reduction [17] reaches better profile results than the low-cost heuristics do, but at much higher execution costs. Currently, no metaheuristic-based heuristic for profile reduction in the literature can reduce the profile of large-scale matrices in a reasonable amount of time. Hence, scientific and engineering applications apply low-cost heuristics for profile reduction when computational time is critical: for instances of rather large dimensions, a practical option is to use low-cost heuristics to obtain satisfactory-quality solutions.

This paper also applied six low-cost heuristics to 15 matrices arising from thermal and model reduction problems. The numerical experiments show that the NMPG heuristic provides, in general, worthwhile gains compared with the classical heuristics in this field (Sloan’s [19], MPG [16], NSloan [13], and Sloan-MGPS [18]). However, the NMPG heuristic did not perform better than the Hu-Scott heuristic when applied to matrices arising from the application domains used here.

The next step in this work is to evaluate low-cost heuristics for profile reduction implemented with parallel libraries (e.g., OpenMP, Galois, and Message Passing Interface systems) and in GPU-accelerated computing. Similarly, regarding massively parallel computing, an evaluation of these heuristics implemented with the Intel® Math Kernel Library running on Intel® Xeon® Many Integrated Core and Scalable processors is another future step of this investigation.