Abstract
The ESSEX project investigates computational issues arising at exascale for large-scale sparse eigenvalue problems and develops programming concepts and numerical methods for their solution. The project pursues a coherent co-design of all software layers, in which a holistic performance engineering process guides code development across the classic boundaries of application, numerical method, and basic kernel library. Within ESSEX, the numerical methods cover widely applicable solvers such as classic Krylov, Jacobi-Davidson, or the recent FEAST methods, as well as domain-specific iterative schemes relevant for the ESSEX quantum physics application. This report introduces the project structure and presents selected results that demonstrate the potential impact of ESSEX for efficient sparse solvers on highly scalable heterogeneous supercomputers.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Alvermann, A. et al. (2014). ESSEX: Equipping Sparse Solvers for Exascale. In: Lopes, L., et al. Euro-Par 2014: Parallel Processing Workshops. Euro-Par 2014. Lecture Notes in Computer Science, vol 8806. Springer, Cham. https://doi.org/10.1007/978-3-319-14313-2_49
DOI: https://doi.org/10.1007/978-3-319-14313-2_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14312-5
Online ISBN: 978-3-319-14313-2
eBook Packages: Computer Science, Computer Science (R0)