Abstract
In astrophysical N-body simulations, Dehnen’s algorithm, implemented in the serial falcON code and based on a dual tree traversal, is faster than serial Barnes-Hut tree-codes, but outperformed by parallel CPU and GPU tree-codes. In this paper, we present a parallel dual tree traversal, implemented in the pfalcON code, targeting multicore CPUs and many-core architectures (Xeon Phi). We focus here on both performance and portability, while preserving Dehnen’s original algorithm. We first use task parallelism, with either OpenMP or Intel TBB, for the dual tree traversal. We then rely on the SPMD (single-program, multiple-data) model for the SIMD vectorization of the near field part thanks to the Intel SPMD Program Compiler. We compare the pfalcON performance to related work, and finally obtain performance results that match one of the best current tree-code implementations on GPU.
Chapter PDF
Similar content being viewed by others
References
Arora, N., Shringarpure, A., Vuduc, R.: Direct n-body kernels for multicore platforms. In: Proc. of the Int. Conf. on Parallel Processing (ICPP), pp. 379–387 (2009)
Barnes, J.E., Hut, P.: A hierarchical O(N log N) force-calculation algorithm. Nature 324(4), 446–449 (1986)
Bédorf, J., Gaburov, E., Zwart, S.P.: A sparse octree gravitational N-body code that runs entirely on the GPU processor. J. Comp. Phys. 231(7), 2825–2839 (2012)
Burtscher, M., Pingali, K.: An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm. GPU computing Gems Emerald edition, p. 75 (2011)
Cheng, H., Greengard, L., Rokhlin, V.: A Fast Adaptive Multipole Algorithm in Three Dimensions. Journal of Computational Physics 155, 468–498 (1999)
Dehnen, W.: A Hierarchical O(N) Force Calculation Algorithm. J. Comp. Phys. 179, 27–42 (2002)
Fortin, P., Lamotte, J.L.: Fast Multipole Method on the Cell B.E.: the Near Field Part. In: Int. Parallel Computing Conf. (ParCo), vol. 19, pp. 323–330 (2009)
Fortin, P., Athanassoula, E., Lambert, J.-C.: Comparisons of different codes for galactic N-body simulations. Astronomy & Astrophysics 531, A120 (2011)
Lange, B., Fortin, P.: Parallel dual tree traversal on multi-core and many-core architectures for astrophysical N-body simulations, http://hal.upmc.fr/hal-00947130
Londrillo, P., Nipoti, C., Ciotti, L.: A parallel implementation of a new fast algorithm for N-body simulations. In: Comp. Astro. in Italy: Methods and Tools (2002)
Pharr, M., Mark, W.R.: ispc: A SPMD compiler for high-performance CPU programming. In: Innovative Parallel Computing (InPar 2012), pp. 1–13. IEEE (2012)
Springel, V.: The cosmological simulation code GADGET-2. Monthly Notices of the Royal Astronomical Society 364(4), 1105–1134 (2005)
Taura, K., Nakashima, J., Yokota, R., Maruyama, N.: A Task Parallel Implementation of Fast Multipole Methods. In: SC Companion, pp. 617–625 (2012)
Yokota, R.: An FMM Based on Dual Tree Traversal for Many-core Architectures. Journal of Algorithms and Computational Technology 7(3), 301–324 (2013)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Lange, B., Fortin, P. (2014). Parallel Dual Tree Traversal on Multi-core and Many-core Architectures for Astrophysical N-body Simulations. In: Silva, F., Dutra, I., Santos Costa, V. (eds) Euro-Par 2014 Parallel Processing. Euro-Par 2014. Lecture Notes in Computer Science, vol 8632. Springer, Cham. https://doi.org/10.1007/978-3-319-09873-9_60
Download citation
DOI: https://doi.org/10.1007/978-3-319-09873-9_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09872-2
Online ISBN: 978-3-319-09873-9
eBook Packages: Computer ScienceComputer Science (R0)