Abstract
The core count of modern processors is steadily increasing, forcing programmers to use more concurrent threads or tasks to exploit the available hardware. This in turn makes correct and efficient thread synchronization increasingly challenging. To support programmers in this task, IBM introduced hardware transactional memory (TM) and speculative execution (SE) in its Blue Gene/Q system, whose 16-core processor can run 64 simultaneous hardware threads in SMT mode. TM and SE allow code to be parallelized even when race conditions may occur; when a conflict is detected, however, the affected parts of the execution are rolled back and re-executed serially. This incurs overhead, so the use of TM and SE must be well justified. In this paper, we describe extensions to the community instrumentation and measurement infrastructure Score-P that allow developers to instrument, measure, and analyze applications that use TM and SE. To our knowledge, this is the first integrated performance tool framework that supports the analysis of TM/SE programs. We demonstrate its usefulness and effectiveness in experiments with benchmarks and a real-world application.
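The rollback-and-serialize behavior described in the abstract can be illustrated with a small software emulation. The sketch below is not Blue Gene/Q hardware TM and not part of Score-P: the class `ToyTransactionalCounter`, its `stats` dictionary, and the `max_speculative_attempts` parameter are hypothetical names introduced only for illustration. Each update is first attempted optimistically; if another thread committed in the meantime, the speculative result is discarded (a rollback) and the update is repeated under a lock (serial re-execution). Counting the outcomes hints at the kind of metric a performance tool for TM/SE programs might report.

```python
import threading


class ToyTransactionalCounter:
    """Toy emulation of optimistic execution with a serial fallback.

    Assumption: this only mimics the commit/rollback/serialize idea in
    software; real hardware TM detects conflicts at the memory level.
    """

    def __init__(self, max_speculative_attempts=1):
        self.value = 0
        self.version = 0  # incremented on every successful write
        self.max_speculative_attempts = max_speculative_attempts
        self.stats = {"committed": 0, "aborted": 0, "serialized": 0}
        self._lock = threading.Lock()

    def _try_commit(self, seen_version, new_value):
        # Commit only if nobody else has written since our snapshot.
        with self._lock:
            if self.version != seen_version:
                self.stats["aborted"] += 1  # conflict: discard the result
                return False
            self.value = new_value
            self.version += 1
            self.stats["committed"] += 1
            return True

    def update(self, fn):
        for _ in range(self.max_speculative_attempts):
            with self._lock:
                seen_value, seen_version = self.value, self.version
            new_value = fn(seen_value)  # speculative work outside the lock
            if self._try_commit(seen_version, new_value):
                return
        # Fallback: re-execute the update serially while holding the lock.
        with self._lock:
            self.value = fn(self.value)
            self.version += 1
            self.stats["serialized"] += 1


def run(num_threads=8, updates_per_thread=1000):
    counter = ToyTransactionalCounter()

    def worker():
        for _ in range(updates_per_thread):
            counter.update(lambda v: v + 1)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    result = run()
    print(result.value, result.stats)
```

The final value is always the total number of updates, while the split between committed, aborted, and serialized attempts varies from run to run. A high share of serialized updates would indicate that speculation is costing more than it gains, which is the overhead trade-off the abstract refers to.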
This work is partially supported by the National Basic Research 973 Program of China under Grant No. 61312701001 and the National High Technology Research and Development Program of China under Grant No. 2012AA01A309.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Jiang, J., Philippen, P., Knobloch, M., Mohr, B. (2014). Performance Measurement and Analysis of Transactional Memory and Speculative Execution on IBM Blue Gene/Q. In: Silva, F., Dutra, I., Santos Costa, V. (eds) Euro-Par 2014 Parallel Processing. Euro-Par 2014. Lecture Notes in Computer Science, vol 8632. Springer, Cham. https://doi.org/10.1007/978-3-319-09873-9_3
DOI: https://doi.org/10.1007/978-3-319-09873-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09872-2
Online ISBN: 978-3-319-09873-9
eBook Packages: Computer Science, Computer Science (R0)