Abstract
Preparing performance measurements of HPC applications is usually a tradeoff between accuracy and granularity of the measured data. When using direct instrumentation, that is, the insertion of extra code around performance-relevant functions, the measurement overhead increases with the rate at which these functions are visited. If applied indiscriminately, the measurement dilation can even be prohibitive. In this paper, we show how static code analysis in combination with binary re-writing can help eliminate unnecessary instrumentation points based on configurable filter rules. In contrast to earlier approaches, our technique does not rely on dynamic information, making extra runs prior to the actual measurement dispensable. Moreover, the rules can be applied and modified without re-compilation. We evaluate filter rules designed for the analysis of computation and communication performance and show that in most cases the measurement dilation can be reduced to a few percent while still retaining significant detail.
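The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' tool or rule syntax. It only assumes that a static analysis pass has already produced per-function facts, and that a filter rule is a predicate over those facts. All names here (`FunctionInfo`, `min_complexity`, `on_communication_path`, `any_of`, `select_for_instrumentation`) and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class FunctionInfo:
    """Static facts about one function (hypothetical fields for illustration)."""
    name: str
    cyclomatic_complexity: int  # McCabe number computed from the control-flow graph
    call_sites: int             # number of call sites found in the function body
    calls_mpi: bool             # True if the body contains a call to an MPI routine


# A filter rule is a predicate: True means "keep this instrumentation point".
Rule = Callable[[FunctionInfo], bool]


def min_complexity(threshold: int) -> Rule:
    """Keep functions whose cyclomatic complexity reaches the threshold."""
    return lambda f: f.cyclomatic_complexity >= threshold


def on_communication_path() -> Rule:
    """Keep functions that call MPI, which matter for communication analysis."""
    return lambda f: f.calls_mpi


def any_of(*rules: Rule) -> Rule:
    """Combine rules: keep a function if at least one rule accepts it."""
    return lambda f: any(rule(f) for rule in rules)


def select_for_instrumentation(functions: Iterable[FunctionInfo], rule: Rule) -> List[str]:
    """Return the names of functions that should remain instrumented."""
    return [f.name for f in functions if rule(f)]


# Made-up sample data standing in for the output of a static analysis pass.
functions = [
    FunctionInfo("get_index", cyclomatic_complexity=1, call_sites=0, calls_mpi=False),
    FunctionInfo("solve_timestep", cyclomatic_complexity=12, call_sites=9, calls_mpi=False),
    FunctionInfo("exchange_halo", cyclomatic_complexity=2, call_sites=4, calls_mpi=True),
]

rule = any_of(min_complexity(5), on_communication_path())
selected = select_for_instrumentation(functions, rule)
print(selected)
```

In this sketch the small, frequently visited helper `get_index` is filtered out, while the complex compute routine and the MPI-calling routine stay instrumented. Because the rule is evaluated on static facts only, changing the threshold requires neither a profiling run nor re-compilation of the application, which mirrors the property claimed in the abstract.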
This material is based upon work supported by the US Department of Energy under Award Number DE-SC0001621.
Keywords
- Code Snippet
- Call Site
- Cyclomatic Complexity
- Instrumentation Tool
- High Performance Computing Application
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mußler, J., Lorenz, D., Wolf, F. (2011). Reducing the Overhead of Direct Application Instrumentation Using Prior Static Analysis. In: Jeannot, E., Namyst, R., Roman, J. (eds) Euro-Par 2011 Parallel Processing. Euro-Par 2011. Lecture Notes in Computer Science, vol 6852. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23400-2_7
DOI: https://doi.org/10.1007/978-3-642-23400-2_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23399-9
Online ISBN: 978-3-642-23400-2
eBook Packages: Computer Science, Computer Science (R0)