An Attributes Similarity-Based K-Medoids Clustering Technique in Data Mining

Surya Narayana, G.; Vasumathi, D.

doi:10.1007/s13369-017-2761-2

Research Article - Special Issue - Computer Engineering and Computer Science
Published: 08 August 2017

Volume 43, pages 3979–3992, (2018)

Arabian Journal for Science and Engineering

Abstract

Mining information and knowledge from large databases has become a demanding task. Finding the similarity between different attributes in a synthetic dataset is a challenging problem in data retrieval applications. Existing works propose clustering techniques such as k-means, fuzzy c-means, and fuzzy k-means for this purpose, but these techniques suffer from high overhead, less effective results, computational complexity, high time consumption, and high memory utilization. To overcome these drawbacks, a similarity-based categorical data clustering technique is proposed. Here, the inter-attribute and intra-attribute similarities are calculated simultaneously and integrated to improve performance. The dataset is loaded as input and preprocessed to remove noise. Once the data are noise free, the similarity between the elements is computed; then, the most relevant attributes are selected and the insignificant attributes are discarded. The support and confidence measures are estimated by applying association rule mining for resource planning. The similarity-based K-medoids clustering technique then clusters the attributes based on the Euclidean distance to reduce the overhead. Finally, the bee colony (BC) optimization technique is used to select the optimal features for further use. In the experiments, the proposed clustering system is evaluated with respect to clustering accuracy, execution time (s), error rate, convergence time (s), and adjusted Rand index (ARI). The results show that the proposed technique performs better than the other techniques.
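For readers unfamiliar with the clustering step, the following is a minimal sketch of plain K-medoids with Euclidean distance. It is not the authors' method: it omits the attribute-similarity computation, the association rule mining, and the bee colony optimization described in the abstract. The function names (`euclidean`, `k_medoids`) and the `init`, `max_iter`, and `seed` parameters are illustrative assumptions, not taken from the article.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_medoids(points, k, init=None, max_iter=100, seed=0):
    """Basic alternating K-medoids (illustrative, not the paper's variant).

    Returns (medoid_indices, labels), where labels[i] is the position in
    medoid_indices of the medoid nearest to points[i].
    """
    n = len(points)
    # Precompute all pairwise distances once.
    dist = [[euclidean(p, q) for q in points] for p in points]

    # Hypothetical convenience: caller may fix the starting medoids.
    if init is not None:
        medoids = list(init)
    else:
        medoids = random.Random(seed).sample(range(n), k)

    def assign(meds):
        # Each point joins its nearest medoid (ties go to the first one).
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    labels = assign(medoids)
    for _ in range(max_iter):
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if its cluster became empty.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to all other members of the same cluster.
            best = min(members, key=lambda m: sum(dist[m][j] for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break  # converged: medoids no longer change
        medoids = new_medoids
        labels = assign(medoids)
    return medoids, labels
```

Calling `k_medoids(points, 2)` on a list of numeric tuples returns the indices of the chosen medoids and one cluster label per point. Because a medoid is always an actual data point, the same loop works with any precomputed dissimilarity matrix, which is what makes the approach adaptable to categorical attribute similarities.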

Author information

Authors and Affiliations

JNTUH, Hyderabad, Telangana State, India
G. Surya Narayana
CSE Department, JNTUCEH, Hyderabad, Telangana State, India
D. Vasumathi

Corresponding author

Correspondence to G. Surya Narayana.

About this article

Cite this article

Surya Narayana, G., Vasumathi, D. An Attributes Similarity-Based K-Medoids Clustering Technique in Data Mining. Arab J Sci Eng 43, 3979–3992 (2018). https://doi.org/10.1007/s13369-017-2761-2

Received: 31 March 2017
Accepted: 17 July 2017
Published: 08 August 2017
Issue Date: August 2018
DOI: https://doi.org/10.1007/s13369-017-2761-2
