A Minimum Spanning Tree-Inspired Clustering-Based Outlier Detection Technique

Wang, Xiaochun; Wang, Xia Li; Wilkes, D. Mitch

doi:10.1007/978-3-642-31488-9_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7377)

Included in the following conference series:

Industrial Conference on Data Mining

Abstract

Because outlier detection has important applications in data mining, many techniques have been developed for it. In this paper, an efficient three-phase outlier detection technique is proposed. First, we modify the well-known k-means algorithm to efficiently construct a spanning tree that is very close to a minimum spanning tree of the data set. Second, the longest edges in the resulting spanning tree are removed to form clusters. Based on the intuition that the data points in small clusters are the most likely to be outliers, these points are selected as outlier candidates. Finally, density-based outlying factors (LOF) are calculated for the outlier candidates and assessed to pinpoint the local outliers. Extensive experiments on real and synthetic data sets show that, compared with state-of-the-art methods, the proposed approach can efficiently identify global as well as local outliers in large-scale data sets.
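The three phases in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe the k-means-based tree construction, so an exact O(n²) Prim's algorithm stands in for it, and the parameters `n_cuts`, `max_small`, `k`, and `lof_threshold` are hypothetical names introduced here. Treating "LOF above a threshold" as the final decision rule is also an assumption.

```python
import math
from collections import defaultdict

def prim_mst(points):
    """Phase 1 stand-in: exact MST by Prim's algorithm (the paper instead
    builds an approximate tree with a modified k-means)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def clusters_after_cut(n, edges, n_cuts):
    """Phase 2: drop the n_cuts longest edges; return connected components."""
    kept = sorted(edges)[:max(0, len(edges) - n_cuts)]
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]   # path halving
            x = root[x]
        return x

    for _, a, b in kept:
        root[find(a)] = find(b)
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return list(groups.values())

def lof(points, i, k):
    """Phase 3: local outlier factor of point i (distance ties ignored;
    assumes no point has k exact duplicates, which would divide by zero)."""
    def knn(p):
        others = (j for j in range(len(points)) if j != p)
        return sorted(others, key=lambda j: math.dist(points[p], points[j]))[:k]

    def k_dist(p):
        return math.dist(points[p], points[knn(p)[-1]])

    def lrd(p):
        # local reachability density: inverse of the mean reachability distance
        reach = [max(k_dist(o), math.dist(points[p], points[o])) for o in knn(p)]
        return len(reach) / sum(reach)

    neighbours = knn(i)
    return sum(lrd(o) for o in neighbours) / (len(neighbours) * lrd(i))

def detect_outliers(points, n_cuts, max_small, k, lof_threshold):
    """Run the three phases; only points in small clusters are scored by LOF."""
    edges = prim_mst(points)
    clusters = clusters_after_cut(len(points), edges, n_cuts)
    candidates = [i for c in clusters if len(c) <= max_small for i in c]
    return sorted(i for i in candidates if lof(points, i, k) > lof_threshold)

# Usage: a 5x5 grid of points plus one distant point at index 25.
pts = [(float(x), float(y)) for x in range(5) for y in range(5)] + [(20.0, 20.0)]
print(detect_outliers(pts, n_cuts=1, max_small=2, k=3, lof_threshold=1.5))
```

In this toy data set, cutting the single longest tree edge separates the distant point from the grid, so it is the only candidate that LOF has to score. Restricting the LOF computation to candidates in this way is where the efficiency gain described in the abstract would come from.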

Author information

Authors and Affiliations

School of Electronics and Information, Xi’an Jiaotong University, Xi’an, 710049, China
Xiaochun Wang
Department of Computer Science, Chang’an University, Xi’an, 710061, China
Xia Li Wang
School of Engineering, Vanderbilt University, Nashville, TN, 37235, USA
D. Mitch Wilkes

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, IBaI, Kohlenstraße 2, 04107, Leipzig, Germany
Petra Perner

About this paper

Cite this paper

Wang, X., Wang, X.L., Wilkes, D.M. (2012). A Minimum Spanning Tree-Inspired Clustering-Based Outlier Detection Technique. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2012. Lecture Notes in Computer Science, vol 7377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31488-9_17

DOI: https://doi.org/10.1007/978-3-642-31488-9_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31487-2
Online ISBN: 978-3-642-31488-9
eBook Packages: Computer Science, Computer Science (R0)
