Abstract
The ability of system software to detect run-time errors and issue messages that help programmers quickly fix them, in both serial and parallel code, is an important productivity criterion for developing and maintaining application programs. Over ten thousand run-time error tests and a run-time error detection (RTED) evaluation tool have been developed for the automatic evaluation of run-time error detection capabilities for serial errors and for parallel errors in MPI, OpenMP and UPC programs. Evaluation results, tests and the RTED evaluation tool are freely available at http://rted.public.iastate.edu. Many compilers, tools and run-time systems scored poorly on these tests. The authors make recommendations for providing better RTED in the future.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Luecke, G.R. et al. (2010). The Importance of Run-Time Error Detection. In: Müller, M., Resch, M., Schulz, A., Nagel, W. (eds) Tools for High Performance Computing 2009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11261-4_10
DOI: https://doi.org/10.1007/978-3-642-11261-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11260-7
Online ISBN: 978-3-642-11261-4
eBook Packages: Computer Science, Computer Science (R0)