Abstract
Most current intrusion detection systems employ signature-based methods or data mining-based methods, both of which rely on labeled training data. This training data is typically expensive to produce. We present a new geometric framework for unsupervised anomaly detection, that is, for algorithms designed to process unlabeled data. In our framework, data elements are mapped to a feature space, which is typically a vector space ℝ^d. Anomalies are detected by determining which points lie in sparse regions of the feature space. We present two feature maps for mapping data elements to a feature space. Our first map is a data-dependent normalization feature map, which we apply to network connections. Our second feature map is a spectrum kernel, which we apply to system call traces. We present three algorithms for detecting which points lie in sparse regions of the feature space. We evaluate our methods by performing experiments over network records from the KDD CUP 1999 data set and system call traces from the 1999 Lincoln Labs DARPA evaluation.
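The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation. The function names, the per-feature mean and standard deviation rescaling, and the k-nearest-neighbor sparsity score are all assumptions chosen for illustration; the abstract does not specify the three detection algorithms, so the scoring rule here is only one plausible way to find points in sparse regions.

```python
from collections import Counter
from math import sqrt


def normalize(records):
    """Data-dependent normalization (assumed form): rescale each feature
    using the mean and standard deviation estimated from the data itself."""
    n, dims = len(records), len(records[0])
    means = [sum(r[j] for r in records) / n for j in range(dims)]
    # A constant feature has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [sqrt(sum((r[j] - means[j]) ** 2 for r in records) / n) or 1.0
            for j in range(dims)]
    return [[(r[j] - means[j]) / stds[j] for j in range(dims)] for r in records]


def spectrum_features(trace, k=3):
    """Spectrum feature map: counts of every contiguous length-k subsequence."""
    return Counter(tuple(trace[i:i + k]) for i in range(len(trace) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the spectrum feature vectors of two sequences."""
    fa, fb = spectrum_features(a, k), spectrum_features(b, k)
    return sum(count * fb[gram] for gram, count in fa.items())


def knn_scores(points, k=2):
    """Hypothetical sparsity score: sum of Euclidean distances to the
    k nearest neighbors. Larger scores indicate sparser regions."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))
            for j, q in enumerate(points) if j != i
        )
        scores.append(sum(dists[:k]))
    return scores


# Toy, made-up connection records: four similar rows and one unusual row.
records = [[1.0, 200.0], [1.1, 210.0], [0.9, 190.0], [1.0, 205.0], [9.0, 5000.0]]
scores = knn_scores(normalize(records), k=2)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)

# Toy, made-up system call traces compared with a 2-spectrum kernel.
shared = spectrum_kernel(["open", "read", "write", "close"],
                         ["open", "read", "write", "read"], k=2)
```

The normalization step keeps features with large numeric ranges from dominating distances, and the spectrum map turns variable-length traces into vectors on which the same distance-based scoring could in principle be applied.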
Copyright information
© 2002 Springer Science+Business Media New York
About this chapter
Cite this chapter
Eskin, E., Arnold, A., Prerau, M., Portnoy, L., Stolfo, S. (2002). A Geometric Framework for Unsupervised Anomaly Detection. In: Barbará, D., Jajodia, S. (eds) Applications of Data Mining in Computer Security. Advances in Information Security, vol 6. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0953-0_4
DOI: https://doi.org/10.1007/978-1-4615-0953-0_4
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5321-8
Online ISBN: 978-1-4615-0953-0
eBook Packages: Springer Book Archive