Abstract
Despite its ease of use, OpenMP has failed to gain widespread adoption on large-scale systems, largely because it has failed to deliver sufficient performance. Our experience indicates that the cost of initiating OpenMP regions is simply too high for the desired OpenMP usage scenario of many applications. In this paper, we introduce CLOMP, a new benchmark that accurately characterizes this aspect of OpenMP implementations. CLOMP complements the existing EPCC benchmark suite by providing simple, easy-to-understand measurements of OpenMP overheads in the context of application usage scenarios. Our results for several OpenMP implementations demonstrate that CLOMP identifies the amount of work required to compensate for the overheads observed with EPCC. We also show that CLOMP captures limitations of OpenMP parallelization on SMT and NUMA systems. Finally, CLOMPI, our MPI extension of CLOMP, demonstrates which aspects of OpenMP interact poorly with MPI when MPI helper threads cannot run on the NIC.
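The region-startup overhead discussed above can be illustrated with a small timing sketch. The C program below is not part of CLOMP or EPCC; it is a minimal, illustrative approximation of the kind of measurement such benchmarks perform: a fixed amount of work is timed with and without entering an OpenMP parallel region, and the difference approximates the per-region startup cost. The ITERATIONS and WORK constants are arbitrary assumptions chosen only for illustration.

/* Minimal sketch (not the actual CLOMP source): compares a plain loop
 * against the same work run inside an OpenMP parallel region, so the
 * difference exposes the per-region startup/teardown overhead. */
#include <stdio.h>
#include <omp.h>

#define ITERATIONS 1000   /* number of timed repetitions (assumed value) */
#define WORK       10000  /* loop trip count per repetition (assumed value) */

static volatile double sink;  /* keeps the compiler from removing the work */

static void do_work(void)
{
    double sum = 0.0;
    for (int i = 0; i < WORK; i++)
        sum += i * 0.5;
    sink = sum;
}

int main(void)
{
    double t0, serial_time, region_time;

    /* Time the work with no OpenMP region at all. */
    t0 = omp_get_wtime();
    for (int r = 0; r < ITERATIONS; r++)
        do_work();
    serial_time = omp_get_wtime() - t0;

    /* Time the same work, but enter a parallel region on every repetition;
     * the extra time approximates the region initiation overhead. */
    t0 = omp_get_wtime();
    for (int r = 0; r < ITERATIONS; r++) {
        #pragma omp parallel
        {
            #pragma omp single
            do_work();
        }
    }
    region_time = omp_get_wtime() - t0;

    printf("no region:   %g us per repetition\n",
           1e6 * serial_time / ITERATIONS);
    printf("with region: %g us per repetition\n",
           1e6 * region_time / ITERATIONS);
    printf("approx. region overhead: %g us\n",
           1e6 * (region_time - serial_time) / ITERATIONS);
    return 0;
}

If the overhead reported by such a measurement is large relative to the work done per region, the application must enclose proportionally more work in each region before OpenMP parallelization pays off, which is the trade-off CLOMP is designed to quantify.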
Acknowledgements
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-JRNL-408738).
Rights and permissions
Open Access
This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Bronevetsky, G., Gyllenhaal, J. & de Supinski, B.R. CLOMP: Accurately Characterizing OpenMP Application Overheads. Int J Parallel Prog 37, 250–265 (2009). https://doi.org/10.1007/s10766-009-0096-7