Abstract
Mining information and knowledge from large databases has recently become a demanding task. Finding similarity between different attributes in a synthetic dataset is a challenging problem in data retrieval applications. For this purpose, existing works have proposed clustering techniques such as k-means, fuzzy c-means, and fuzzy k-means. However, these techniques have drawbacks that include high overhead, less effective results, computational complexity, high time consumption, and high memory utilization. To overcome these drawbacks, a similarity-based categorical data clustering technique is proposed. Here, the inter- and intra-attribute similarities are calculated simultaneously and integrated to improve performance. The dataset is loaded as input, and preprocessing is performed to remove noise. Once the data are noise free, the similarity between the elements is computed; then, the most relevant attributes are selected and the insignificant attributes are discarded. The support and confidence measures are estimated by applying association rule mining for resource planning. The similarity-based K-medoids clustering technique is used to cluster the attributes based on the Euclidean distance to reduce the overhead. Finally, the bee colony (BC) optimization technique is used to select the optimal features for further use. In the experiments, the results of the proposed clustering system are evaluated and analyzed with respect to clustering accuracy, execution time (s), error rate, convergence time (s), and adjusted Rand index (ARI). The results show that the proposed technique performs better than the other techniques.
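The K-medoids step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the similarity measure in detail, so the sketch assumes a simple attribute-mismatch count as the dissimilarity between categorical records (a stand-in for the Euclidean distance mentioned above), and it omits preprocessing, attribute selection, association rule mining, and the bee colony step. The function names, the toy records, and the initial medoids are all hypothetical.

```python
from typing import List, Sequence, Tuple

Record = Sequence[str]


def mismatch_distance(a: Record, b: Record) -> int:
    """Assumed dissimilarity: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(dist: List[List[int]], medoids: List[int]) -> List[int]:
    """Assign every record to the cluster of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(
    records: Sequence[Record], initial_medoids: Sequence[int], max_iter: int = 100
) -> Tuple[List[int], List[int]]:
    """Alternate assignment and medoid update until the medoids stop changing."""
    n = len(records)
    # Precompute the full pairwise dissimilarity matrix once.
    dist = [
        [mismatch_distance(records[i], records[j]) for j in range(n)]
        for i in range(n)
    ]
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # An empty cluster keeps its previous medoid.
                new_medoids.append(old)
                continue
            # The new medoid is the member with the smallest total
            # dissimilarity to the other members of its cluster.
            best = min(members, key=lambda m: sum(dist[m][j] for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Toy categorical records (hypothetical, for illustration only).
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("green", "large", "square"),
]
medoids, labels = k_medoids(records, initial_medoids=[0, 3])
print("medoid indices:", medoids)
print("cluster labels:", labels)
```

Unlike k-means, the cluster representative here is always an actual record, so the procedure needs only pairwise dissimilarities and works directly on categorical attributes. Any other dissimilarity, including one built from inter- and intra-attribute similarities, could replace `mismatch_distance` without changing the rest of the loop.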
Cite this article
Surya Narayana, G., Vasumathi, D. An Attributes Similarity-Based K-Medoids Clustering Technique in Data Mining. Arab J Sci Eng 43, 3979–3992 (2018). https://doi.org/10.1007/s13369-017-2761-2
DOI: https://doi.org/10.1007/s13369-017-2761-2