Abstract
Enterprise middleware systems typically consist of a large cluster of machines with stringent performance requirements. Hence, when a performance problem occurs in such environments, it is critical that the health monitoring software identifies the root cause with minimal delay. A technique commonly used for isolating root causes is rule definition, which involves specifying combinations of events that cause particular problems. However, such predefined rules (or problem signatures) tend to be inflexible, and crucially depend on domain experts for their definition. We present in this paper a method that automatically generates change point based problem signatures using administrator feedback, thereby removing the dependence on domain experts. The problem signatures generated by our method are flexible, in that they do not require exact matches for triggering, and adapt as more information becomes available. Unlike traditional data mining techniques, where one requires a large number of problem instances to extract meaningful patterns, our method requires few fault instances to learn problem signatures. We demonstrate the efficacy of our approach by learning problem signatures for five common problems that occur in enterprise systems and reliably recognizing these problems with a small number of learning instances.
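As a rough illustration of what a flexible, adaptive signature could look like, the sketch below stores, for each problem, how often each metric showed a change point across administrator-confirmed fault instances, and scores a new observation by weighted overlap rather than exact match. This is not the paper's algorithm: the `Signature` class, the metric names, the weighting scheme, and the 0.6 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Signature:
    """Hypothetical signature: per-metric counts of observed change points."""
    problem: str
    instances: int = 0
    counts: dict = field(default_factory=dict)

    def learn(self, changed: set) -> None:
        """Fold in one administrator-confirmed fault instance."""
        self.instances += 1
        for metric in changed:
            self.counts[metric] = self.counts.get(metric, 0) + 1

    def score(self, changed: set) -> float:
        """Weighted overlap in [0, 1]; an exact match is not required."""
        if self.instances == 0:
            return 0.0
        # Weight each metric by the fraction of instances in which it changed.
        weights = {m: c / self.instances for m, c in self.counts.items()}
        total = sum(weights.values())
        hit = sum(w for m, w in weights.items() if m in changed)
        return hit / total if total else 0.0


def best_match(signatures, changed, threshold=0.6):
    """Return the best-scoring problem name, or None if below the threshold."""
    scored = [(s.score(changed), s.problem) for s in signatures]
    score, problem = max(scored, default=(0.0, None))
    return problem if score >= threshold else None


# Two confirmed instances of an assumed "db_lock" problem.
sig = Signature("db_lock")
sig.learn({"db.wait_time", "app.response_time"})
sig.learn({"db.wait_time", "app.response_time", "app.threads"})

# A partial observation still matches because the strongest metrics changed.
print(best_match([sig], {"db.wait_time", "app.response_time"}))
```

Because the weights are recomputed from the counts on every call, each additional confirmed instance shifts the signature, which is one simple way to obtain the adaptive behaviour described in the abstract.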
Copyright information
© 2006 IFIP International Federation for Information Processing
About this paper
Cite this paper
Agarwal, M.K., Sachindran, N., Gupta, M., Mann, V. (2006). Fast Extraction of Adaptive Change Point Based Patterns for Problem Resolution in Enterprise Systems. In: State, R., van der Meer, S., O’Sullivan, D., Pfeifer, T. (eds) Large Scale Management of Distributed Systems. DSOM 2006. Lecture Notes in Computer Science, vol 4269. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11907466_14
DOI: https://doi.org/10.1007/11907466_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-47659-7
Online ISBN: 978-3-540-47662-7
eBook Packages: Computer Science (R0)