Abstract
Approaching the theoretical performance of hierarchical multicore machines requires a very careful distribution of threads and data across the underlying non-uniform architecture in order to minimize cache misses and NUMA penalties. It is acknowledged that OpenMP can enhance the quality of thread scheduling on such architectures in a portable way, by transmitting valuable information about the affinities between threads and data to the underlying runtime system. Nevertheless, most OpenMP runtime systems are unable to efficiently support highly irregular, massively parallel applications on NUMA machines.
In this paper, we present a thread scheduling policy suited to the execution of OpenMP programs featuring irregular and massive nested parallelism over hierarchical architectures. Our policy enforces a distribution of threads that maximizes the proximity of threads belonging to the same parallel region, and uses a NUMA-aware work stealing strategy when load balancing is needed. It has been developed as a plug-in to the forestGOMP OpenMP platform [TBG+07]. We demonstrate the efficiency of our approach with a highly irregular recursive OpenMP program resulting from the generic parallelization of a surface reconstruction application. We achieve a speedup of 14 on a 16-core machine with no application-level optimization.
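To make the idea of NUMA-aware work stealing concrete, here is a minimal sketch in C of one possible victim-selection rule: an idle core looks for work on the closest cores first and only crosses to a remote NUMA node when nothing nearer is available. This is an illustration under stated assumptions, not the policy implemented in the plug-in: the 4 x 4 machine layout, the names `topo_distance` and `pick_victim`, and the tie-breaking rule (prefer the most loaded candidate) are all hypothetical.

```c
#include <assert.h>

/* Hypothetical machine: 4 NUMA nodes x 4 cores = 16 cores.
 * Cores are numbered so that cores 4k..4k+3 share NUMA node k. */
enum {
    CORES_PER_NODE = 4,
    NUM_NODES      = 4,
    NUM_CORES      = CORES_PER_NODE * NUM_NODES
};

/* Hierarchical distance between two cores:
 * 0 = same core, 1 = same NUMA node, 2 = different NUMA nodes. */
static int topo_distance(int a, int b)
{
    if (a == b)
        return 0;
    if (a / CORES_PER_NODE == b / CORES_PER_NODE)
        return 1;
    return 2;
}

/* Choose a core to steal from on behalf of an idle `thief`.
 * queued[c] is the number of runnable threads waiting on core c.
 * The closest non-empty core wins; among equally close candidates,
 * the most loaded one wins (an assumed tie-break, for illustration).
 * Returns -1 when no other core has queued work. */
static int pick_victim(int thief, const int queued[NUM_CORES])
{
    assert(thief >= 0 && thief < NUM_CORES);

    int best = -1;
    for (int c = 0; c < NUM_CORES; c++) {
        if (c == thief || queued[c] <= 0)
            continue;               /* nothing to steal from this core */
        if (best < 0) {
            best = c;               /* first candidate found */
            continue;
        }
        int dc = topo_distance(thief, c);
        int db = topo_distance(thief, best);
        if (dc < db || (dc == db && queued[c] > queued[best]))
            best = c;
    }
    return best;
}
```

With this rule, core 0 would take work from a lightly loaded core on its own NUMA node in preference to a heavily loaded core on a remote node, trading some load imbalance for memory locality. A real runtime would also have to decide which threads to move, for example keeping threads of the same parallel region together.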
References
Ayguade, E., Copty, N., Duran, A., Hoeflinger, J., Lin, Y., Massaioli, F., Su, E., Unnikrishnan, P., Zhang, G.: A proposal for task parallelism in OpenMP. In: Third International Workshop on OpenMP (IWOMP 2007), Beijing, China (2007)
Ayguade, E., Gonzalez, M., Martorell, X., Jost, G.: Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications. In: 18th International Parallel and Distributed Processing Symposium (IPDPS) (2004)
an Mey, D., Sarholz, S., Terboven, C.: Nested Parallelization with OpenMP. Parallel Computing 35(5), 459–476 (2007)
Balart, J., Duran, A., Gonzàlez, M., Martorell, X., Ayguadé, E., Labarta, J.: Nanos Mercurium: A research compiler for OpenMP. In: European Workshop on OpenMP (EWOMP) (October 2004)
Blikberg, R., Sørevik, T.: Load balancing and OpenMP implementation of nested parallelism. Parallel Computing 31(10-12), 984–998 (2005)
Chapman, B.M., Huang, L., Jin, H., Jost, G., de Supinski, B.R.: Extending OpenMP worksharing directives for multithreading. In: Euro-Par 2006 Parallel Processing (2006)
Duran, A., Gonzàlez, M., Corbalán, J.: Automatic Thread Distribution for Nested Parallelism in OpenMP. In: 19th ACM International Conference on Supercomputing, Cambridge, MA, USA, June 2005, pp. 121–130 (2005)
Duran, A., Silvera, R., Corbalán, J., Labarta, J.: Runtime adjustment of parallel nested loops. In: Chapman, B.M. (ed.) WOMPAT 2004. LNCS, vol. 3349, Springer, Heidelberg (2005)
Frigo, M., Leiserson, C.E., Randall, K.H.: The Implementation of the Cilk-5 Multithreaded Language. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Montreal, Canada (June 1998)
GOMP – An OpenMP implementation for GCC, http://gcc.gnu.org/projects/gomp/
Gonzalez, M., Oliver, J., Martorell, X., Ayguade, E., Labarta, J., Navarro, N.: OpenMP Extensions for Thread Groups and Their Run-Time Support. In: Languages and Compilers for Parallel Computing, Springer, Heidelberg (2001)
Gao, G.R., Sterling, T., Stevens, R., Hereld, M., Zhu, W.: Hierarchical multithreading: programming model and system software. In: 20th International Parallel and Distributed Processing Symposium (IPDPS) (April 2006)
Gerndt, A., Sarholz, S., Wolter, M., an Mey, D., Bischof, C., Kuhlen, T.: Nested OpenMP for Efficient Computation of 3D Critical Points in Multi-Block CFD Datasets. In: Super Computing (November 2006)
Hadjidoukas, P.E., Dimakopoulos, V.V.: Nested Parallelism in the OMPi OpenMP/C compiler. In: Euro-Par, Rennes, France, July 2007, ACM, New York (2007)
Karlsson, S.: An Introduction to Balder - An OpenMP Run-time Library for Clusters of SMPs. In: International Workshop on OpenMP (IWOMP) (June 2005)
Martorell, X., Ayguadé, E., Navarro, N., Corbalán, J., González, M., Labarta, J.: Thread Fork/Join Techniques for Multi-Level Parallelism Exploitation in NUMA Multiprocessors. In: International Conference on SuperComputing, pp. 294–301. ACM Press, New York (1999)
Ohtake, Y., Belyaev, A., Alexa, M., Turk, G., Seidel, H.-P.: Multi-level partition of unity implicits. ACM Trans. Graph. 22(3), 463–470 (2003)
Su, E., Tian, X., Haab, M.G.G., Shah, S., Petersen, P.: Compiler Support of the Workqueuing Execution Model for Intel SMP Architectures. In: European Workshop on OpenMP (EWOMP) (October 2004)
Thibault, S., Broquedis, F., Goglin, B., Namyst, R., Wacrenier, P.-A.: An Efficient OpenMP Runtime System for Hierarchical Architectures. In: International Workshop on OpenMP (IWOMP), Beijing, China, June 2007, pp. 148–159 (2007)
Tian, X., Girkar, M., Bik, A., Saito, H.: Practical Compiler Techniques on Efficient Multithreaded Code Generation for OpenMP Programs. Comput. J. 48(5), 588–601 (2005)
Tian, X., Girkar, M., Shah, S., Armstrong, D., Su, E., Petersen, P.: Compiler and Runtime Support for Running OpenMP Programs on Pentium- and Itanium-Architectures. In: Eighth International Workshop on High-Level Parallel Programming Models and Supportive Environments, April 2003, pp. 47–55 (2003)
Tian, X., Hoeflinger, J.P., Haab, G., Chen, Y.-K., Girkar, M., Shah, S.: A compiler for exploiting nested parallelism in OpenMP programs. Parallel Comput. 31(10-12), 960–983 (2005)
Tanaka, Y., Taura, K., Sato, M., Yonezawa, A.: Performance evaluation of OpenMP applications with nested parallelism. In: Languages, Compilers, and Run-Time Systems for Scalable Computers, pp. 100–112 (2000)
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Broquedis, F., Diakhaté, F., Thibault, S., Aumage, O., Namyst, R., Wacrenier, PA. (2008). Scheduling Dynamic OpenMP Applications over Multicore Architectures. In: Eigenmann, R., de Supinski, B.R. (eds) OpenMP in a New Era of Parallelism. IWOMP 2008. Lecture Notes in Computer Science, vol 5004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79561-2_15
DOI: https://doi.org/10.1007/978-3-540-79561-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-79560-5
Online ISBN: 978-3-540-79561-2
eBook Packages: Computer Science, Computer Science (R0)