
1 Introduction

In several scientific and engineering fields, such as finite element analysis, computational fluid mechanics, and structural engineering, a fundamental task is the solution of large sparse linear systems of the form \(Ax = b\), where A is an \(n \times n\) sparse, symmetric, positive-definite matrix, b is a vector of length n, and x is the unknown vector of length n. The solution of these linear systems generally dominates the computational cost of the simulation, and storing and solving them demands substantial memory and processing resources. To lower these costs, a heuristic for bandwidth or profile reduction is often applied so that the coefficient matrix A attains a narrow bandwidth and a small profile [14, 17]. In particular, profile reduction is employed to reduce the storage costs of applications that use the skyline data structure [9] to represent large-scale matrices.
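
To make the storage connection concrete, the following is a minimal sketch of skyline (envelope) storage for the lower triangle of a symmetric matrix; the struct layout and accessor are illustrative assumptions, not the representation used by the codes evaluated here. Row i stores the contiguous entries from its leftmost nonzero column up to the diagonal, so the value array holds exactly \(profile(A) + n\) entries; reducing the profile directly shrinks this array.

```cpp
// Minimal sketch of skyline (envelope) storage; layout is an assumption.
#include <vector>

struct Skyline {
    std::vector<double> val;        // rows of the envelope stored back to back
    std::vector<int>    row_start;  // offset of row i's first stored entry in val
    std::vector<int>    first_col;  // column index of row i's first stored entry
};

// Returns a_ij for i >= j (lower triangle); zero outside the envelope.
double get(const Skyline& s, int i, int j) {
    if (j < s.first_col[i]) return 0.0;
    return s.val[s.row_start[i] + (j - s.first_col[i])];
}
```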

Let \(A = [a_{ij}]\) be a symmetric sparse \(n \times n\) matrix. The bandwidth of row i is \(\beta_i(A) = i - \min\{j : 1 \le j < i,\ a_{ij} \ne 0\}\). The bandwidth of A is defined as \(\beta(A) = \max\{\beta_i(A) : 1 \le i \le n\}\). The profile of A is defined as \(profile(A) = \sum_{i=1}^n \beta_i(A)\). The bandwidth and profile minimization problems are NP-hard [28, 31]. Since these problems are connected to an extensive variety of other problems in scientific and engineering disciplines, several heuristics have been proposed for reordering the rows and columns of sparse matrices so as to reduce their bandwidth and profile.
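
These definitions translate into a single linear-time scan over the nonzero pattern. The following is a minimal sketch (the per-row adjacency layout and 0-based indices are assumptions for illustration; a row with no nonzero to the left of the diagonal contributes \(\beta_i = 0\)):

```cpp
// Minimal sketch: computing beta(A) and profile(A) from the nonzero pattern.
#include <algorithm>
#include <cstddef>
#include <vector>

// rows[i] holds the column indices j of the nonzeros a_ij of row i (0-based).
void bandwidth_profile(const std::vector<std::vector<int> >& rows,
                       long long& bandwidth, long long& profile) {
    bandwidth = 0;
    profile = 0;
    for (int i = 0; i < static_cast<int>(rows.size()); ++i) {
        int min_j = i;  // yields beta_i = 0 when no nonzero has j < i
        for (std::size_t k = 0; k < rows[i].size(); ++k)
            if (rows[i][k] < i) min_j = std::min(min_j, rows[i][k]);
        const long long beta_i = i - min_j;       // row bandwidth beta_i(A)
        bandwidth = std::max(bandwidth, beta_i);  // beta(A) = max_i beta_i(A)
        profile += beta_i;                        // profile(A) = sum_i beta_i(A)
    }
}
```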

A prominent algorithm for solving large-scale sparse linear systems is the Conjugate Gradient Method (CGM) [21, 26]. Duff and Meurant [8] showed that a local ordering of the vertices of the graph associated with A can improve cache hit rates, thereby reducing the computational cost of the CGM. Burgess and Giles [3] and Das et al. [6] showed that such a local ordering can be attained by using a heuristic for bandwidth reduction. Moreover, one should employ an ordering that does not increase the number of iterations of the CGM when a preconditioner is applied [15].
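
For reference, the following is a minimal double-precision sketch of the Jacobi-preconditioned CG iteration studied in this paper (the experiments themselves ran in 256- and 512-bit MPFR arithmetic, as described in Sect. 3; the CSR layout, tolerance, and iteration cap are illustrative assumptions):

```cpp
// Minimal sketch of the Jacobi-preconditioned Conjugate Gradient Method.
#include <cmath>
#include <cstddef>
#include <vector>

struct CSRMatrix {                 // compressed sparse row storage (assumed)
    int n;
    std::vector<int> row_ptr, col;
    std::vector<double> val;
};

static std::vector<double> spmv(const CSRMatrix& A, const std::vector<double>& x) {
    std::vector<double> y(A.n, 0.0);
    for (int i = 0; i < A.n; ++i)
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            y[i] += A.val[k] * x[A.col[k]];
    return y;
}

static double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Solves Ax = b with M = diag(A); returns the number of iterations performed.
int jpcg(const CSRMatrix& A, const std::vector<double>& b,
         std::vector<double>& x, double tol = 1e-12, int max_it = 10000) {
    std::vector<double> inv_diag(A.n, 0.0);   // Jacobi preconditioner M^{-1}
    for (int i = 0; i < A.n; ++i)
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            if (A.col[k] == i) inv_diag[i] = 1.0 / A.val[k];

    std::vector<double> r = b, z(A.n), Ax = spmv(A, x);
    for (int i = 0; i < A.n; ++i) r[i] -= Ax[i];             // r = b - Ax
    for (int i = 0; i < A.n; ++i) z[i] = inv_diag[i] * r[i]; // z = M^{-1} r
    std::vector<double> p = z;
    double rz = dot(r, z);

    for (int it = 0; it < max_it; ++it) {
        if (std::sqrt(dot(r, r)) < tol) return it;           // stopping criterion
        std::vector<double> Ap = spmv(A, p);
        const double alpha = rz / dot(p, Ap);
        for (int i = 0; i < A.n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        for (int i = 0; i < A.n; ++i) z[i] = inv_diag[i] * r[i];
        const double rz_new = dot(r, z);
        const double beta = rz_new / rz;                     // next search direction
        rz = rz_new;
        for (int i = 0; i < A.n; ++i) p[i] = z[i] + beta * p[i];
    }
    return max_it;
}
```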

In this work, we analyze cases where selected heuristics for bandwidth or profile reduction may not reduce the computational times of the Jacobi-preconditioned CGM (JPCGM). In previous publications [14, 16], we presented preliminary results; based on this experience and on systematic reviews [2, 5, 15, 17], 13 heuristics were selected as the most promising methods in this field. Thus, the main objective of this work is to analyze the results of these 13 potentially state-of-the-art low-cost heuristics for bandwidth and profile reductions when executed to reduce the computational cost of the JPCGM using high-precision floating-point arithmetic.

Section 2 describes the systematic reviews conducted to identify the potentially best low-cost heuristics for bandwidth and profile reductions. Section 3 describes how the numerical experiments were conducted in this study. Section 4 shows the results. Finally, Sect. 5 presents the conclusions.

2 Systematic Reviews

As described, since the bandwidth and profile reduction problems are connected to a wide range of other problems in scientific and engineering disciplines, a large number of heuristics for bandwidth and profile reductions have been proposed. In systematic reviews [2, 5, 15, 17], 133 heuristics for bandwidth and/or profile reduction were identified, published between the 1960s and the present day, including a recently proposed heuristic for bandwidth and profile reductions [13]. From the analysis performed, seven heuristics for bandwidth reduction and six for profile reduction were selected as potentially the best low-cost methods to be evaluated in this computational experiment: Burgess-Lai [4], FNCHC [27], GGPS [38], VNS-band [30], hGPHH [24], and CSS-band [23] for bandwidth reduction, and Snay [35], Sloan [34], Medeiros-Pimenta-Goldenberg (MPG) [29], NSloan [25], and Sloan-MGPS [32] for profile reduction. The Reverse Cuthill-McKee method with the starting pseudo-peripheral vertex given by the George-Liu algorithm (RCM-GL) [10] completes both lists, as it was selected in both systematic reviews; in particular, the RCM-GL method [10] is included in the Matlab software [36]. Therefore, of the 133 identified heuristics, 12 were selected for evaluation because, in the papers analyzed, no simulation or comparison showed that these 12 heuristics could be superseded by any other heuristic with respect to bandwidth or profile reduction when the computational costs of the heuristics were also considered. Thus, these 12 heuristics can be deemed the most promising low-cost heuristics for these problems.
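
To give the flavor of these low-cost methods, the following is a minimal sketch of the Cuthill-McKee breadth-first step with the final reversal that characterizes RCM (an adjacency-list graph with a single connected component is assumed; the starting vertex would be a pseudo-peripheral vertex, e.g. one provided by the George-Liu algorithm sketched in Sect. 3):

```cpp
// Minimal sketch of Reverse Cuthill-McKee for a connected graph (assumed).
#include <algorithm>
#include <cstddef>
#include <queue>
#include <vector>

std::vector<int> rcm(const std::vector<std::vector<int> >& adj, int start) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> order;
    order.reserve(n);
    std::vector<bool> visited(n, false);
    std::queue<int> q;
    visited[start] = true;
    q.push(start);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        order.push_back(u);
        std::vector<int> nbrs;  // unvisited neighbors, to be sorted by degree
        for (std::size_t k = 0; k < adj[u].size(); ++k)
            if (!visited[adj[u][k]]) nbrs.push_back(adj[u][k]);
        std::sort(nbrs.begin(), nbrs.end(),
                  [&adj](int a, int b) { return adj[a].size() < adj[b].size(); });
        for (std::size_t k = 0; k < nbrs.size(); ++k) {
            visited[nbrs[k]] = true;
            q.push(nbrs[k]);
        }
    }
    std::reverse(order.begin(), order.end());  // the "reverse" in RCM
    return order;  // order[k] = old label of the vertex newly labeled k
}
```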

The GPS heuristic [12] was not selected in these systematic reviews. Nevertheless, it was also implemented, and its results were compared with those of the 12 selected heuristics in this computational experiment, because it is one of the classic low-cost heuristics evaluated in the field for both bandwidth and profile reductions. Thus, 13 heuristics were implemented and/or evaluated in this work.

3 Description of the Tests, Implementation of the Heuristics, Testing, and Calibration

A 64-bit executable program of the VNS-band heuristic (kindly provided by one of the heuristic's authors) was used. This executable only runs on instances with up to 500,000 vertices.

The FNCHC-heuristic source code was also kindly provided by one of the heuristic's authors. This code was then adapted and reimplemented in the C++ programming language for the present work.

We asked the authors of the other 11 heuristics for the source code and/or executables of their algorithms. Some authors informed us that they no longer had the source code or executable, some did not answer, and others explained that the programs could not be provided. Hence, the other 11 heuristics were also implemented in the C++ programming language so that the computational costs of the heuristics could be properly compared [15]. Specifically, the g++ compiler, version 4.8.2, was used.

The IEEE 754 double-precision binary floating-point format is composed of 11 bits of exponent (covering magnitudes between roughly \(10^{-307}\) and \(10^{307}\)) and a significand of 53 bits, which provides approximately 16 decimal digits. Nowadays, this double-precision floating-point arithmetic is sufficiently accurate for most scientific computing applications. Nonetheless, for some scientific applications, 64-bit IEEE arithmetic is no longer adequate for today's large-scale numerical simulations, and high-precision floating-point computations are required. High-precision floating-point arithmetic is used in applications where the execution time of the arithmetic is not a limiting factor, or where accurate results with many digits in the significand are needed. Some of these applications demand a significand of 64 bits or more to reach numerically useful results. They arise in numerous fields, such as climate modeling, computational fluid dynamics (CFD) problems (e.g., vortex roll-up simulations), computational geometry, mesh generation, computational number theory, Coulomb N-body atomic system simulations, experimental mathematics, large-scale physical simulations performed on highly parallel supercomputers (e.g., studies of the fine-structure constant of physics), and quantum theory [1]. In particular, mesh generation, contour mapping, and other computational geometry applications rely heavily on highly precise arithmetic, mostly when the domain is the unit cube, because small numerical errors can induce geometrically questionable results. Such problems are latent in the mathematics of the formulas commonly used in these computations and cannot be repaired without considerable effort [1]. Specifically, in the applications mentioned, portions of the code normally contain numerically sensitive computations. When using double-precision floating-point arithmetic, these applications may return results with questionable precision, depending on the stopping criteria used, and these imprecise results may in turn cause larger errors. It is normally cheaper and more reliable to use high-precision floating-point arithmetic to overcome these problems [1].

In this computational experiment, we used instances derived from meshes generated in discretizations of partial differential equations (that govern CFD problems) by finite volumes [19, 20]. Hence, our numerical experiments focus on high-precision floating-point arithmetic. We used the GNU MPFR library (Multiple-Precision Floating-point computations with correct Rounding) with 256-bit precision for the instances originating from discretizations of the Laplace equation and 512-bit precision for the instances contained in the University of Florida sparse matrix collection.
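
As a concrete illustration, the following is a minimal sketch of a 256-bit dot product written against the MPFR C API (the kernel from which an MPFR-based JPCGM is built); the variable names, the printed digit count, and the link line are illustrative assumptions:

```cpp
// Minimal sketch: a dot product in 256-bit MPFR arithmetic.
// Build (assumed): g++ dot.cpp -lmpfr -lgmp
#include <cstdio>   // must precede mpfr.h so mpfr_printf is declared
#include <mpfr.h>

int main() {
    const mpfr_prec_t prec = 256;  // bits in the significand
    mpfr_t sum, term;
    mpfr_init2(sum, prec);
    mpfr_init2(term, prec);
    mpfr_set_d(sum, 0.0, MPFR_RNDN);

    const double x[] = {1.0, 2.0, 3.0};
    const double y[] = {4.0, 5.0, 6.0};
    for (int i = 0; i < 3; ++i) {
        mpfr_set_d(term, x[i], MPFR_RNDN);
        mpfr_mul_d(term, term, y[i], MPFR_RNDN);  // term = x[i] * y[i]
        mpfr_add(sum, sum, term, MPFR_RNDN);      // sum += term
    }
    mpfr_printf("dot = %.40Rf\n", sum);           // 32.0000... to 40 digits

    mpfr_clear(sum);
    mpfr_clear(term);
    return 0;
}
```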

Many of the heuristics evaluated here depend strongly on the starting vertex. Since Koohestani and Poli [24] did not specify which pseudo-peripheral vertex finder was used, the George-Liu algorithm [11] for computing a pseudo-peripheral vertex was employed in this computational experiment. Hence, we refer to this heuristic as hGPHH-GL.
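
For reference, the following is a minimal sketch of the George-Liu pseudo-peripheral vertex search (an adjacency-list graph with a single connected component is assumed): build a breadth-first level structure, restart from a smallest-degree vertex of the deepest level, and stop when the depth (the eccentricity estimate) no longer increases.

```cpp
// Minimal sketch of the George-Liu pseudo-peripheral vertex algorithm.
#include <cstddef>
#include <queue>
#include <vector>

// Fills level[v] with the BFS level of v and returns the deepest level.
static int bfs_levels(const std::vector<std::vector<int> >& adj, int root,
                      std::vector<int>& level) {
    level.assign(adj.size(), -1);
    std::queue<int> q;
    level[root] = 0;
    q.push(root);
    int depth = 0;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        if (level[u] > depth) depth = level[u];
        for (std::size_t k = 0; k < adj[u].size(); ++k) {
            int w = adj[u][k];
            if (level[w] < 0) { level[w] = level[u] + 1; q.push(w); }
        }
    }
    return depth;
}

int george_liu(const std::vector<std::vector<int> >& adj, int start) {
    std::vector<int> level;
    int v = start;
    int depth = bfs_levels(adj, v, level);
    for (;;) {
        int next = -1;  // minimum-degree vertex in the deepest level
        for (int u = 0; u < static_cast<int>(adj.size()); ++u)
            if (level[u] == depth &&
                (next < 0 || adj[u].size() < adj[next].size()))
                next = u;
        std::vector<int> level2;
        int depth2 = bfs_levels(adj, next, level2);
        if (depth2 <= depth) return v;  // eccentricity stopped increasing
        v = next;
        level.swap(level2);
        depth = depth2;
    }
}
```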

It was not our objective that the C++ versions of the heuristics supersede all the results of the original implementations. Our objective was to code reasonably efficient implementations of the heuristics evaluated so as to make an adequate comparison of the results of the 13 heuristics possible. Nevertheless, we tested and calibrated the C++ versions of the heuristics against the codes used by their original proposers, to ensure that our implementations were comparable to the algorithms as originally proposed, and compared their results with the results presented in the original publications. A previous publication [15] details how the heuristics were implemented, tested, and calibrated; the C++ implementations obtained bandwidth and profile reductions similar to the results presented in the original publications (see [15]).

Table 1 shows the characteristics of the five workstations used to perform the simulations. In particular, the Ubuntu 14.04 LTS 64-bit operating system was used.

Table 1. Characteristics of the machines used to perform the simulations.

Three sequential runs, each comprising a reordering algorithm followed by the JPCGM, were carried out with each instance. In addition, for this experimental analysis of 13 low-cost heuristics for bandwidth and profile reductions, we followed the suggestions given by Johnson [22], with the aim of reducing the computational cost of the JPCGM.

4 Numerical Experiments and Analysis

This section shows the results obtained in simulations using the JPCGM, executed after applying heuristics for bandwidth and profile reductions. Section 4.1 shows the results of the resolutions of linear systems arising from the discretization of the Laplace equation by finite volumes [19]. Section 4.2 shows the results of the resolutions of linear systems contained in the University of Florida sparse matrix collection [7].

The tables in this section show the dimension n of the coefficient matrix of each linear system (or, equivalently, the number of vertices of the associated graph, or the name of the instance in the University of Florida sparse matrix collection), the reordering algorithms applied, the resulting bandwidth (\(\beta\)) and profile values, the average computational cost of the heuristics, in seconds (s), and their memory requirements, in mebibytes (MiB). In addition, these tables show the number of iterations and the total computational cost, in seconds, of the JPCGM. Despite the small number of executions of each heuristic on each instance, the tables also report the standard deviation (\(\sigma\)) and the coefficient of variation (\(C_v\)) of the total computational cost of the JPCGM. A "–" in the first row of the set of simulations performed with each instance means that no reordering algorithm was used; against this baseline, one can read the speed-down of the JPCGM incurred when using a heuristic for bandwidth or profile reduction, shown in the last columns of these tables. In the tables below, numbers in bold face are the best results (up to two occurrences) in the \(\beta\), profile, t(s), and m. (MiB) columns. Figures in this section are presented as line charts for clarity.
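
For clarity, the dispersion statistics can be written as follows, with \(k = 3\) runs per instance and \(t_r\) the total JPCGM time of run r; the speed-down shown in the last columns is read here as the ratio of the mean JPCGM time after reordering to the mean JPCGM time under the original ordering (an assumption about the exact timing entering the numerator):

\[
\bar{t} = \frac{1}{k}\sum_{r=1}^{k} t_r, \qquad
\sigma = \sqrt{\frac{1}{k}\sum_{r=1}^{k}\left(t_r - \bar{t}\right)^2}, \qquad
C_v = \frac{\sigma}{\bar{t}}, \qquad
\text{speed-down} = \frac{\bar{t}_{\text{reordered}}}{\bar{t}_{\text{original}}}.
\]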

4.1 Instances Originating from the Discretization of the Laplace Equation by Finite Volumes

This section shows the results of the resolutions of linear systems arising from the discretization of the Laplace equation by finite volumes [19]. These linear systems are divided into two datasets: seven linear systems ranging from 7,322 to 277,118 unknowns, composed of matrices in random order [see Fig. 1 and Tables 2 and 3 (executions performed on the M1 machine)], and eight linear systems ranging from 16,922 to 1,115,004 unknowns, composed of matrices originally ordered using a sequence given by the Sierpiński-like curve [18, 37] [see Fig. 2 and Tables 4 and 5 (executions performed on the M2 machine)].

Fig. 1.

Speed-downs of the JPCGM obtained using several heuristics for bandwidth and profile reductions applied to linear systems originating from the discretization of the Laplace equation by finite volumes and composed of matrices with random order (see Tables 2 and 3).

Fig. 2.

Speed-downs of the JPCGM obtained using several heuristics for bandwidth and profile reductions applied to linear systems originating from the discretization of the Laplace equation by finite volumes and composed of matrices ordered using a sequence given by the Sierpiński-like curve (see Tables 4 and 5).

Table 2. Resolution of three linear systems (derived from the discretization of the Laplace equation by finite volumes and composed of matrices with random order) using the JPCGM and vertices labeled by heuristics for bandwidth and profile reductions.
Table 3. Resolution of four linear systems (derived from the discretization of the Laplace equation by finite volumes and composed of matrices with random order) using the JPCGM and vertices labeled by heuristics for bandwidth and profile reductions.
Table 4. Resolution of linear systems (ranging from 16,922 to 105,764 unknowns, derived from the discretization of the Laplace equation by finite volumes and composed of matrices ordered using a sequence determined by the Sierpiński-like curve) using the JPCGM and vertices labeled by heuristics for bandwidth and profile reductions.
Table 5. Resolution of four linear systems (derived from the discretization of the Laplace equation by finite volumes and composed of matrices originally ordered using a sequence determined by the Sierpiński-like curve) using the JPCGM and vertices labeled by heuristics for bandwidth and profile reductions.

Tables 2, 3, 4 and 5 show that Sloan's heuristic almost always obtained the best profile results in these datasets. These tables also show that the FNCHC heuristic achieved, in general, the best bandwidth results, closely followed by the RCM-GL and hGPHH-GL heuristics, which presented much lower computational costs. Nevertheless, no speed-up of the JPCGM was attained when using these heuristics. In particular, the FNCHC heuristic presented a much higher computational cost than the RCM-GL, Sloan's, MPG, NSloan, Sloan-MGPS, and hGPHH-GL heuristics.

A slight speed-up of the JPCGM was reached when applying the hGPHH-GL heuristic to the linear system composed of 16,922 unknowns (see Table 4), but this gain is marginal. Moreover, a speed-down of the JPCGM was obtained when using the other heuristics for bandwidth and profile reductions applied to the other linear systems.

Tables 4 and 5 do not show results of Snay's heuristic [35] for linear systems larger than 68,414 unknowns. Snay's heuristic obtained better results (in terms of reducing the JPCGM computational cost) than the CSS-band [23] and NSloan [25] heuristics when applied to the linear systems composed of 39,716 and 68,414 unknowns. However, it performed less favorably than the other heuristics when applied to the linear system comprised of 16,922 unknowns (see Table 4).

The GPS [12], Burgess-Lai [4], GGPS [38], and CSS-band [23] heuristics presented higher computational costs than the other heuristics [see the t(s) (Heuristic) column in Tables 3 and 5]. Consequently, Table 5 does not show the results of these four heuristics applied to the linear systems composed of 750,446 and 1,015,004 unknowns; recall also that the VNS-band executable only runs on instances with up to 500,000 unknowns. Furthermore, Table 5 does not show the results of the NSloan [25] and Sloan-MGPS [32] heuristics applied to the linear system composed of 1,015,004 unknowns because these two heuristics performed less favorably than the five other heuristics when applied to linear systems in this dataset.

Table 6. Eleven linear systems (composed of symmetric and positive-definite matrices) contained in the University of Florida sparse matrix collection.

4.2 Instances Contained in the University of Florida Sparse Matrix Collection

Table 6 provides the characteristics of 11 linear systems (composed of symmetric and positive-definite matrices) contained in the University of Florida sparse matrix collection [7]. Tables 2, 3, 4 and 5 show that the RCM-GL [10], Sloan's [34], MPG [29], NSloan [25], Sloan-MGPS [32], and hGPHH-GL [24] heuristics presented much lower computational costs than the other heuristics evaluated in this computational experiment. Hence, these six low-cost heuristics for bandwidth or profile reduction were applied to the dataset presented in Table 6.

Tables 7 and 8 and Fig. 3 show the results of the resolutions of 11 linear systems contained in the University of Florida sparse matrix collection using the JPCGM, with vertices labeled by heuristics for bandwidth and profile reductions. The hGPHH-GL heuristic obtained the best speed-up of the JPCGM when applied to the Pres_Poisson instance (see Table 7). On the other hand, speed-downs of the JPCGM were obtained when applying these six heuristics for bandwidth and profile reductions to the 10 other linear systems from the University of Florida sparse matrix collection used here.

Among the heuristics evaluated, the Sloan-MGPS and Sloan's heuristics almost always obtained the best profile results, and the RCM-GL heuristic the best bandwidth results, when applied to the instances in this dataset. Nevertheless, speed-downs of the JPCGM were obtained when using these heuristics (except in the simulation using the Pres_Poisson instance).

Table 7. Resolution of seven linear systems contained in the University of Florida sparse matrix collection using the JPCGM and vertices labeled using heuristics for bandwidth and profile reductions.
Table 8. Resolution of four linear systems contained in the University of Florida sparse matrix collection using the JPCGM and vertices labeled using several heuristics for bandwidth and profile reductions.

5 Conclusions

This paper described the results of applying 13 heuristics for bandwidth and profile reductions to reduce the computational cost of solving three datasets of linear systems with the Jacobi-preconditioned Conjugate Gradient Method in high-precision floating-point arithmetic. These heuristics were selected from systematic reviews [2, 5, 15, 17].

In experiments using three datasets composed of large-scale linear systems, the hGPHH-GL heuristic performed best at reducing the computational cost of the JPCGM when applied to a single linear system (see Table 7). On the other hand, speed-downs of the JPCGM were obtained when applying these 13 heuristics for bandwidth and profile reductions to the other linear systems used in this computational experiment. Thus, the results show that, in certain cases, no heuristic for bandwidth or profile reduction can reduce the computational cost of the Jacobi-preconditioned Conjugate Gradient Method when using high-precision numerical computations.

Concerning the set of linear systems arising from the discretization of the Laplace equation by finite volumes and comprised of matrices with random order, each vertex of the associated graph has exactly three adjacencies [19]. Probably because of this, relabeling the vertices did not improve cache hit rates.

Regarding the set of linear systems originating from the discretization of the Laplace equation by finite volumes and comprised of matrices originally ordered using a sequence given by the Sierpiński-like curve [19], a large number of cache misses may have occurred after applying heuristics for bandwidth and profile reductions. The probable reason is that a space-filling curve already provides adequate memory-data locality, so that a reordering algorithm is not useful in such cases. We applied these 13 heuristics to large-scale linear systems, and cache memory is a relevant factor in the execution times of these simulations. The evidence from the experiments described in this paper supports the assertion that a linear system should be studied carefully before using a heuristic for bandwidth or profile reduction with the aim of reducing the computational cost of the JPCGM (and probably of any preconditioned CGM or other iterative linear system solver).

Fig. 3.

Speed-downs of the JPCGM obtained using six heuristics for bandwidth and profile reductions applied to 11 linear systems contained in the University of Florida sparse matrix collection (see Tables 7 and 8).

As a continuation of this work, we intend to implement and evaluate the following preconditioners: algebraic multigrid, incomplete Cholesky factorization, threshold-based incomplete LU (ILUT), Successive Over-Relaxation (SOR), Symmetric SOR, and Gauss-Seidel. More specifically, we intend to study the effectiveness of these strategies when using incomplete or approximate factorization-based preconditioners as well as approximate inverse preconditioners. These techniques shall be used as preconditioners of the Conjugate Gradient Method and of the Generalized Minimal Residual (GMRES) method [33] to evaluate their computational performance in conjunction with heuristics for bandwidth and profile reductions. Parallel versions of the above algorithms will also be studied.

Extended (256-bit and 512-bit) precision was employed in this work. This reduces rounding errors, but it increases the execution times by a large factor and may be impractical when solving certain real-world problems. We intend to examine what occurs in double-precision arithmetic in future studies.