Achieving Efficient Strong Scaling with PETSc Using Hybrid MPI/OpenMP Optimisation

Lange, Michael; Gorman, Gerard; Weiland, Michèle; Mitchell, Lawrence; Southern, James

doi:10.1007/978-3-642-38750-0_8

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7905)

Included in the following conference series:

International Supercomputing Conference

Abstract

The increasing number of processing elements and the decreasing memory-to-core ratio in modern high-performance platforms make efficient strong scaling a key requirement for numerical algorithms. To achieve efficient scalability on massively parallel systems, scientific software must evolve across the entire stack to exploit the multiple levels of parallelism exposed in modern architectures. In this paper we demonstrate the use of hybrid MPI/OpenMP parallelisation to optimise parallel sparse matrix-vector multiplication in PETSc, a widely used scientific library for the scalable solution of partial differential equations. Using large matrices generated by Fluidity, an open-source CFD application code which uses PETSc as its linear solver engine, we evaluate the effect of explicit communication overlap using task-based parallelism and show how to further improve performance by explicitly load balancing threads within MPI processes. We demonstrate a significant speedup over the pure-MPI mode and efficient strong scaling of sparse matrix-vector multiplication on Fujitsu PRIMEHPC FX10 and Cray XE6 systems.
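
The idea of explicitly load balancing threads within an MPI process can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so the following is only one plausible reading: it assumes the local matrix is stored in CSR format and assigns each thread a contiguous block of rows holding roughly the same number of nonzeros, rather than the same number of rows. The function names `balance_rows_by_nnz` and `spmv_csr_balanced` are hypothetical and are not part of the PETSc API; communication overlap is not shown.

```c
#include <assert.h>

/* Hypothetical helper (not PETSc API): split rows [0, nrows) into nthreads
 * contiguous chunks holding roughly equal numbers of nonzeros.
 * row_ptr is the CSR row-pointer array of length nrows + 1;
 * bounds must have room for nthreads + 1 entries.
 * Thread t then owns rows bounds[t] .. bounds[t + 1] - 1. */
void balance_rows_by_nnz(const int *row_ptr, int nrows,
                         int nthreads, int *bounds)
{
    assert(nthreads > 0);
    long nnz = row_ptr[nrows];
    int row = 0;
    bounds[0] = 0;
    for (int t = 1; t < nthreads; ++t) {
        /* Cumulative nonzero count at which chunk t should start. */
        long target = (nnz * t) / nthreads;
        while (row < nrows && row_ptr[row] < target)
            ++row;
        bounds[t] = row;
    }
    bounds[nthreads] = nrows;
}

/* Hypothetical kernel: y = A * x for a CSR matrix, with one chunk of rows
 * per thread as computed above. Without OpenMP the loop runs serially
 * and produces the same result. */
void spmv_csr_balanced(const int *row_ptr, const int *col_idx,
                       const double *val, const double *x, double *y,
                       int nthreads, const int *bounds)
{
#ifdef _OPENMP
#pragma omp parallel for schedule(static, 1) num_threads(nthreads)
#endif
    for (int t = 0; t < nthreads; ++t) {
        for (int i = bounds[t]; i < bounds[t + 1]; ++i) {
            double sum = 0.0;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
                sum += val[k] * x[col_idx[k]];
            y[i] = sum;
        }
    }
}
```

For a matrix whose first row is dense and whose remaining rows are sparse, a row-count split would overload the first thread, whereas the nonzero-count split above gives the first thread fewer rows. The partition depends only on the sparsity pattern, so it can be computed once and reused for every multiplication with the same matrix.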

Author information

Authors and Affiliations

Applied Modelling and Computation Group, Imperial College London, London, UK
Michael Lange & Gerard Gorman
EPCC, The University of Edinburgh, Edinburgh, UK
Michèle Weiland & Lawrence Mitchell
Fujitsu Laboratories of Europe Ltd., Hayes, Middlesex, UK
James Southern

Editor information

Editors and Affiliations

University of Hamburg, Department of Informatics, Bundestraße 45a, 20146, Hamburg, Germany
Julian Martin Kunkel
Deutsches Klimarechenzentrum, Bundestraße 45a, 20146, Hamburg, Germany
Thomas Ludwig
University of Mannheim, Germany and Prometeus GmbH, Fliederstraße 2, 74915, Waibstadt, Germany
Hans Werner Meuer

About this paper

Cite this paper

Lange, M., Gorman, G., Weiland, M., Mitchell, L., Southern, J. (2013). Achieving Efficient Strong Scaling with PETSc Using Hybrid MPI/OpenMP Optimisation. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2013. Lecture Notes in Computer Science, vol 7905. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38750-0_8

DOI: https://doi.org/10.1007/978-3-642-38750-0_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38749-4
Online ISBN: 978-3-642-38750-0
eBook Packages: Computer Science, Computer Science (R0)
