Abstract
Density and delta-distance clustering (DDC) is an effective clustering method that computes the density and delta distance of each data point. Points for which both indicators are large can be defined as cluster centers. DDC performs well compared with some other clustering algorithms. However, DDC has a high time complexity and requires manual identification of cluster centers. To address these limitations, an efficient and intelligent DDC (EIDDC) algorithm is proposed in this study. EIDDC first uses a sampling method based on locality-sensitive hashing (LSH) to obtain a small-scale dataset. The density and delta distance of each data point are then calculated from this dataset to reduce time complexity. Cluster centers are recognized automatically by an outlier detection technique based on density-based spatial clustering of applications with noise (DBSCAN). Experimental results show that LSH can obtain good representatives of the original dataset and that the proposed outlier detection method can recognize the cluster centers of a given dataset. The results also demonstrate the efficiency of EIDDC.
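To make the two indicators concrete, the following is a minimal Python sketch of how density and delta distance might be computed. It assumes the cutoff-kernel definitions commonly used in density-peak clustering, since the abstract does not state the exact formulas. The function name `density_and_delta`, the `cutoff` parameter, and the toy data are illustrative assumptions, and the sketch covers neither the LSH sampling step nor the DBSCAN-based center recognition.

```python
import numpy as np


def density_and_delta(points, cutoff):
    """Return (rho, delta) for each row of `points`.

    Assumed definitions (not taken from the paper):
      rho[i]   = number of other points closer than `cutoff` to point i
      delta[i] = distance from point i to the nearest point with higher rho,
                 or the largest distance from point i if none is denser
    """
    # Pairwise Euclidean distances; O(n^2) time and memory, which is the
    # cost that motivates working on a smaller sample.
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))

    # Subtract 1 so that a point does not count itself as a neighbor.
    rho = (dist < cutoff).sum(axis=1) - 1

    delta = np.empty(len(points))
    for i in range(len(points)):
        denser = rho > rho[i]
        if denser.any():
            delta[i] = dist[i, denser].min()
        else:
            delta[i] = dist[i].max()
    return rho, delta


# Toy data: two small groups, each with one point surrounded by the others.
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [-0.1, 0.0], [0.0, 0.1], [0.0, -0.1],
    [5.0, 5.0], [5.1, 5.0], [4.9, 5.0], [5.0, 5.1],
])
rho, delta = density_and_delta(points, cutoff=0.12)

# Points where both indicators are large are candidate cluster centers.
score = rho * delta
candidate_centers = sorted(int(i) for i in np.argsort(score)[-2:])
```

In this toy example the two surrounded points (indices 0 and 5) have the largest product of density and delta distance, which is the property a manual or automated center-selection step would look for. Ranking by the product is only one possible heuristic and is not claimed to be the paper's selection rule.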
Acknowledgements
I thank my co-authors for their valuable help and support in completing the manuscript. This research work was supported by the National Natural Science Foundation of China (No. 61571226).
Cite this article
Liu, X., Yuan, J. & Zhao, H. Efficient and Intelligent Density and Delta-Distance Clustering Algorithm. Arab J Sci Eng 43, 7177–7187 (2018). https://doi.org/10.1007/s13369-017-3060-7
DOI: https://doi.org/10.1007/s13369-017-3060-7