Cluster-based outlier detection

Duan, Lian; Xu, Lida; Liu, Ying; Lee, Jun

doi:10.1007/s10479-008-0371-9

Published: 12 June 2008

Volume 168, pages 151–168, (2009)

Annals of Operations Research

Lian Duan¹, Lida Xu²,³, Ying Liu⁴ & Jun Lee⁵

Abstract

Outlier detection has important applications in data mining, such as fraud detection, customer behavior analysis, and intrusion detection. It is the process of detecting data objects that are grossly different from, or inconsistent with, the rest of the data set. Outliers are traditionally treated as single points; however, many abnormal events have both temporal and spatial locality and may form small clusters that should also be deemed outliers. In other words, not only a single point but also a small cluster can be an outlier. In this paper, we present a new definition of outliers, the cluster-based outlier, which is meaningful and gives importance to local data behavior, and we show how to detect such outliers with the clustering algorithm LDBSCAN (Duan et al. in Inf. Syst. 32(7):978–986, 2007), which is capable of finding clusters and assigning LOF (Breunig et al. in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, ACM Press, pp. 93–104, 2000) to single points.
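
To make the motivation concrete, the following is a minimal sketch of the point-wise LOF score as defined by Breunig et al. (2000), written in plain Python. It is not LDBSCAN and not the cluster-based outlier definition of this paper, neither of which is spelled out in the abstract. The function names (`k_nearest`, `lof_scores`), the toy data, and the choice k = 2 are assumptions made for illustration only, and the sketch assumes the data contain no duplicate points.

```python
import math


def k_nearest(points, i, k):
    """Return (k-distance, neighbor indices) for point i, keeping ties."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    k_dist = dists[k - 1][0]
    return k_dist, [j for d, j in dists if d <= k_dist]


def lof_scores(points, k):
    """Local outlier factor of every point (assumes no duplicate points)."""
    n = len(points)
    knn = [k_nearest(points, i, k) for i in range(n)]

    def reach_dist(i, j):
        # Reachability distance of point i with respect to point j.
        return max(knn[j][0], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        neighbors = knn[i][1]
        mean_reach = sum(reach_dist(i, j) for j in neighbors) / len(neighbors)
        lrd.append(1.0 / mean_reach)

    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for j in knn[i][1]) / (len(knn[i][1]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical toy data: a dense 5 x 5 grid, one isolated point,
# and a tight group of three points far away from the grid.
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
isolated = [(20.0, 20.0)]
small_group = [(40.0, 0.0), (40.5, 0.0), (40.0, 0.5)]
points = grid + isolated + small_group

scores = lof_scores(points, k=2)
print("isolated point:", round(scores[25], 2))
print("small group:", [round(s, 2) for s in scores[26:]])
```

With this small neighborhood size, the isolated point receives a LOF far above 1, while the three points of the small group score close to 1 because they are each other's nearest neighbors. A point-wise score alone therefore does not flag the group, which is the situation the abstract describes when it argues that a small cluster may itself need to be treated as an outlier.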

References

Agrawal, R., Gehrke, J., Gunopulos, D., & Raghavan, P. (1998). Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD Record, 27(2), 94–105. doi:10.1145/276305.276314.
Ankerst, M., Breunig, M. M., Kriegel, H., & Sander, J. (1999). OPTICS: ordering points to identify the clustering structure. In Proceedings of the 1999 ACM SIGMOD international conference on management of data (pp. 49–60). SIGMOD’99, Philadelphia, Pennsylvania, United States, May 31–June 03, 1999. New York: ACM Press.
Barnett, V., & Lewis, T. (1994). Outliers in statistical data. New York: Wiley.
Beyer, K. S., Goldstein, J., Ramakrishnan, R., & Shaft, U. (1999). When is “nearest neighbor” meaningful? In C. Beeri & P. Buneman (Eds.), Lecture notes in computer science: Vol. 1540. Proceedings of the 7th international conference on database theory (pp. 217–235). January 10–12, 1999. London: Springer.
Breunig, M. M., Kriegel, H., Ng, R. T., & Sander, J. (2000). LOF: identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD international conference on management of data (pp. 93–104). SIGMOD’00, Dallas, Texas, United States, May 15–18, 2000. New York: ACM Press.
Carvalho, R., & Costa, H. (2007). Application of an integrated decision support process for supplier selection. Enterprise Information Systems, 1(2), 197–216. doi:10.1080/17517570701356208.
Crovella, M. E., & Bestavros, A. (1997). Self-similarity in World Wide Web traffic: evidence and possible causes. IEEE/ACM Transactions on Networking, 5(6), 835–846.
Duan, L., Xu, L., Guo, F., Lee, J., & Yan, B. (2007). A local-density based spatial clustering algorithm with noise. Information Systems, 32(7), 978–986. doi:10.1016/j.is.2006.10.006.
Ester, M., Kriegel, H., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. 2nd int. conf. on knowledge discovery and data mining (pp. 226–231). Portland: AAAI Press.
Guha, S., Rastogi, R., & Shim, K. (1998). CURE: an efficient clustering algorithm for large databases. In A. Tiwary & M. Franklin (Eds.), Proceedings of the 1998 ACM SIGMOD international conference on management of data (pp. 73–84). SIGMOD’98 Seattle, Washington, United States, June 01–04, 1998. New York: ACM Press.
Han, J., & Kamber, M. (2006). Data mining: concepts and techniques. Amsterdam: Elsevier.
Hawkins, D. (1980). Identification of outliers. London: Chapman and Hall.
He, Z., Xu, X., & Deng, S. (2003). Discovering cluster-based local outliers. Pattern Recognition Letters, 24(9–10), 1641–1650. doi:10.1016/S0167-8655(02)00160-5.
Hinneburg, A., & Keim, D. (1998). An efficient approach to clustering in large multimedia databases with noise. In Proc. 4th int. conf. on knowledge discovery and data mining (pp. 58–65). New York.
Hinneburg, A., Aggarwal, C. C., & Keim, D. A. (2000). What is the nearest neighbor in high dimensional spaces? In A. E. Abbadi, M. L. Brodie, S. Chakravarthy, U. Dayal, N. Kamel, G. Schlageter, & K. Whang (Eds.), Proceedings of the 26th international conference on very large data bases (pp. 506–515). Very large data bases, September 10–14, 2000. San Francisco: Morgan Kaufmann Publishers.
Hsu, C., & Wallace, W. A. (2007). An industrial network flow information integration model for supply chain management and intelligent transportation. Enterprise Information Systems, 1(3), 327–351. doi:10.1080/17517570701504633.
Jiang, M. F., Tseng, S. S., & Su, C. M. (2001). Two-phase clustering process for outliers detection. Pattern Recognition Letters, 22(6–7), 691–700.
Johnson, T., Kwok, I., & Ng, R. (1998). Fast computation of 2-dimensional depth contours. In Proc. 4th int. conf. on knowledge discovery and data mining (pp. 224–228). New York: AAAI Press.
Knorr, E. M., & Ng, R. T. (1998). Algorithms for mining distance-based outliers in large datasets. In A. Gupta, O. Shmueli, & J. Widom (Eds.), Proceedings of the 24th international conference on very large data bases (pp. 392–403). Very large data bases, August 24–27, 1998. San Francisco: Morgan Kaufmann Publishers.
Knorr, E. M., & Ng, R. T. (1999). Finding intensional knowledge of distance-based outliers. In M. P. Atkinson, M. E. Orlowska, P. Valduriez, S. B. Zdonik, & M. L. Brodie (Eds.), Proceedings of the 25th international conference on very large data bases (pp. 211–222). Very large data bases, September 07–10, 1999. San Francisco: Morgan Kaufmann Publishers.
Li, H., & Xu, L. (2001). Feature space theory—a mathematical foundation for data mining. Knowledge-Based Systems, 14(5–6), 253–257. doi:10.1016/S0950-7051(01)00103-4.
Li, H., Xu, L., Wang, J., & Mo, Z. (2003). Feature space theory in data mining: transformations between extensions and intensions in knowledge representation. Expert Systems, 20(2), 60–71. doi:10.1111/1468-0394.00226.
Luo, J., Xu, L., Jamont, J., Zeng, L., & Shi, Z. (2007). Flood decision support system on agent grid: method and implementation. Enterprise Information Systems, 1(1), 49–68. doi:10.1080/17517570601092184.
Ng, R., & Han, J. (2002). CLARANS: a method for clustering objects for spatial data mining. IEEE Transactions on Knowledge and Data Engineering, 14(5), 1003–1016.
Preparata, F., & Shamos, M. (1988). Computational geometry: an introduction. Berlin: Springer.
Qiu, G., Li, H., Xu, L., & Zhang, W. (2003). A knowledge processing method for intelligent systems based on inclusion degree. Expert Systems, 20(4), 187–195. doi:10.1111/1468-0394.00243.
Ramaswamy, S., Rastogi, R., & Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM SIGMOD international conference on management of data (pp. 427–438). SIGMOD’00, Dallas, Texas, United States, May 15–18, 2000. New York: ACM Press.
Sheikholeslami, G., Chatterjee, S., & Zhang, A. (1998). WaveCluster: a multi-resolution clustering approach for very large spatial databases. In A. Gupta, O. Shmueli, & J. Widom (Eds.), Proceedings of the 24th international conference on very large data bases (pp. 428–439). Very large data bases, August 24–27, 1998. San Francisco: Morgan Kaufmann Publishers.
Shi, Z., Huang, Y., He, Q., Xu, L., Liu, S., Qin, L., Jia, Z., Li, J., Huang, H., & Zhao, L. (2007). MSMiner-a developing platform for OLAP. Decision Support Systems, 42(4), 2016–2028. doi:10.1016/j.dss.2004.11.006.
Tukey, J. W. (1977). Exploratory data analysis. Reading: Addison–Wesley.
Wang, W., Yang, J., & Muntz, R. R. (1997). STING: a statistical information grid approach to spatial data mining. In M. Jarke, M. J. Carey, K. R. Dittrich, F. H. Lochovsky, P. Loucopoulos, & M. A. Jeusfeld (Eds.), Proceedings of the 23rd international conference on very large data bases (pp. 186–195). Very large data bases, August 25–29, 1997. San Francisco: Morgan Kaufmann Publishers.
Xu, L. (2006). Advances in intelligent information processing. Expert Systems, 23(5), 249–250. doi:10.1111/j.1468-0394.2006.00405.x.
Xu, L., Liang, N., & Gao, Q. (2008). An integrated approach for agricultural ecosystem management. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 38(3).
Zhang, M., Xu, L., Zhang, W., & Li, H. (2003). A rough set approach to knowledge reduction based on inclusion degree and evidence reasoning theory. Expert Systems, 20(5), 298–304. doi:10.1111/1468-0394.00254.
Zhang, T., Ramakrishnan, R., & Livny, M. (1996). BIRCH: an efficient data clustering method for very large databases. In J. Widom (Ed.), Proceedings of the 1996 ACM SIGMOD international conference on management of data (pp. 103–114). SIGMOD’96 Montreal, Quebec, Canada, June 04–06, 1996. New York: ACM Press.

Author information

Authors and Affiliations

Management Sciences Department, University of Iowa, Iowa City, IA, USA
Lian Duan
College of Economics and Management, Beijing Jiaotong University, Beijing, 100044, China
Lida Xu
Department of Information Technology & Decision Science, Old Dominion University, Norfolk, VA, 23529, USA
Lida Xu
Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing, China
Ying Liu
China Science and Technology Network, Chinese Academy of Sciences, Beijing, China
Jun Lee

Corresponding author

Correspondence to Lian Duan.

About this article

Cite this article

Duan, L., Xu, L., Liu, Y. et al. Cluster-based outlier detection. Ann Oper Res 168, 151–168 (2009). https://doi.org/10.1007/s10479-008-0371-9

Published: 12 June 2008
Issue Date: April 2009
DOI: https://doi.org/10.1007/s10479-008-0371-9
