Dynamic Data Migration for Structured AMR Solvers

Nordén, Markus; Löf, Henrik; Rantakokko, Jarmo; Holmgren, Sverker

doi:10.1007/s10766-007-0056-z

Published: 06 September 2007

Volume 35, pages 477–491 (2007)

International Journal of Parallel Programming

Abstract

On cc-NUMA multiprocessors, the non-uniformity of main memory latencies motivates co-locating threads and the data they access. We call this special form of data locality geographical locality. In this article, we study the performance of a parallel PDE solver with adaptive mesh refinement (AMR). The solver is parallelized using OpenMP, and the adaptive mesh refinement makes dynamic load balancing necessary. Because the runtime adaptation causes the memory access pattern to change dynamically, achieving a high degree of geographical locality is challenging. The main conclusions of the study are: (1) that geographical locality is very important for the performance of the solver, (2) that the performance can be improved significantly using dynamic page migration of misplaced data, (3) that a migrate-on-next-touch directive works well, whereas the first-touch strategy is less advantageous for programs exhibiting dynamically changing memory access patterns, and (4) that the overhead for such migration is low compared to the total execution time.
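
The contrast between first-touch placement and a migrate-on-next-touch directive can be illustrated with a toy model. The sketch below is not the solver or the measurement setup from the article; it only counts how often a page is accessed from a node other than the one holding it. The function name `simulate`, the page and node numbering, and the two-phase workload are assumptions made for illustration, and the model ignores the cost of a migration and the actual latency difference between local and remote memory.

```python
def simulate(phases, sweeps, migrate_on_next_touch):
    """Count remote page accesses and page migrations in a toy cc-NUMA model.

    phases: list of dicts, one per load-balancing epoch, mapping a page id
            to the node whose thread accesses that page during the epoch.
    sweeps: number of times every page is accessed within each epoch.
    """
    home = {}                 # page id -> node that currently holds the page
    remote = migrations = 0
    for accessor in phases:
        # Assumption: the next-touch directive is issued once after each
        # repartitioning and marks every page that is already placed.
        marked = set(home) if migrate_on_next_touch else set()
        for _ in range(sweeps):
            for page, node in accessor.items():
                if page not in home:
                    home[page] = node          # first touch places the page
                elif page in marked:
                    marked.discard(page)       # the mark is consumed once
                    if home[page] != node:
                        home[page] = node      # migrate to the toucher
                        migrations += 1
                if home[page] != node:
                    remote += 1                # access to a misplaced page
    return remote, migrations


# Hypothetical workload: 8 pages on 2 nodes. After refinement, the load
# balancer moves the work on pages 4 and 5 from node 1 to node 0.
before = {p: (0 if p < 4 else 1) for p in range(8)}
after = {p: (0 if p < 6 else 1) for p in range(8)}

first_touch = simulate([before, after], sweeps=10, migrate_on_next_touch=False)
next_touch = simulate([before, after], sweeps=10, migrate_on_next_touch=True)
print("first-touch (remote, migrations):", first_touch)
print("next-touch  (remote, migrations):", next_touch)
```

In this toy workload, first-touch leaves the two reassigned pages on their original node, so every later sweep pays a remote access for them, while next-touch pays two one-time migrations and then accesses all pages locally.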

Author information

Authors and Affiliations

Department of Information Technology, Uppsala University, Box 337, Uppsala, S-751 05, Sweden
Markus Nordén, Henrik Löf, Jarmo Rantakokko & Sverker Holmgren

Corresponding author

Correspondence to Sverker Holmgren.

About this article

Cite this article

Nordén, M., Löf, H., Rantakokko, J. et al. Dynamic Data Migration for Structured AMR Solvers. Int J Parallel Prog 35, 477–491 (2007). https://doi.org/10.1007/s10766-007-0056-z

Received: 06 November 2006
Accepted: 26 January 2007
Published: 06 September 2007
Issue Date: October 2007
DOI: https://doi.org/10.1007/s10766-007-0056-z
