Abstract
Clustering is a key step in data mining for organizing data into meaningful patterns. It plays a crucial role in the overall knowledge discovery in databases (KDD) process, since categorizing data is one of the most basic steps in knowledge discovery. Clustering partitions data into groups, or clusters, of similar objects. It is an unsupervised learning task used in exploratory data analysis to find hidden patterns that are present in the data but cannot be categorized clearly in advance. Data objects are grouped on the basis of common characteristics, and the steps of cluster analysis all serve one primary goal: objects within a cluster should be closer to one another than to objects in other clusters. Different types of clustering algorithms exist, depending on the data and the expected cluster characteristics. Recently, many new algorithms have emerged that aim to bridge the different approaches to clustering and to merge different clustering algorithms, driven by the need to handle sequential, high-dimensional data with multiple relationships in a broad range of applications. This paper surveys and analyzes a few clustering algorithms and provides a comprehensive comparison of their efficiency on common grounds. It also relates some important characteristics of an efficient clustering algorithm.
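The goal stated in the abstract, that objects within a cluster lie closer together than objects in different clusters, can be illustrated with a minimal Python sketch. The toy points, the cluster assignment, the function names, and the use of Euclidean distance are all assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations, product
from math import dist  # Euclidean distance, available in Python 3.8+


def mean_intra_cluster_distance(clusters):
    """Average distance over all pairs of points that share a cluster."""
    pairs = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(pairs) / len(pairs)


def mean_inter_cluster_distance(clusters):
    """Average distance over all pairs of points from different clusters."""
    pairs = [
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p, q in product(a, b)
    ]
    return sum(pairs) / len(pairs)


# Hypothetical toy data: two well-separated groups of 2-D points.
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
cluster_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
clusters = [cluster_a, cluster_b]

intra = mean_intra_cluster_distance(clusters)
inter = mean_inter_cluster_distance(clusters)

# For a sensible grouping, the within-cluster average should be the smaller one.
print(f"mean intra-cluster distance: {intra:.3f}")
print(f"mean inter-cluster distance: {inter:.3f}")
print("well separated:", intra < inter)
```

Comparing these two averages is only one simple way to express the criterion; other distance measures or validity indices could be substituted depending on the data.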
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bindra, K., Mishra, A., Suryakant (2019). Effective Data Clustering Algorithms. In: Ray, K., Sharma, T., Rawat, S., Saini, R., Bandyopadhyay, A. (eds) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol 742. Springer, Singapore. https://doi.org/10.1007/978-981-13-0589-4_39
DOI: https://doi.org/10.1007/978-981-13-0589-4_39
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0588-7
Online ISBN: 978-981-13-0589-4
eBook Packages: Intelligent Technologies and Robotics (R0)