Clustering Mixed Datasets by Using Similarity Features

Ahmad, Amir; Ray, Santosh Kumar; Aswani Kumar, Ch.

doi:10.1007/978-3-030-34515-0_50

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 39)

Included in the following conference series:

International Conference on Sustainable Communication Networks and Application

Abstract

Clustering datasets that contain both numeric and nominal features is challenging because numeric and nominal features call for different similarity measures. In this paper, we propose a method to transform a mixed dataset into a numeric dataset. The method uses a similarity measure for mixed datasets together with a randomly selected set of data objects from the given mixed dataset to generate numeric similarity features. A clustering algorithm for purely numeric datasets is then applied to the newly generated numeric dataset to produce clusters. A comparative study with other clustering algorithms demonstrated the superior performance of the proposed clustering approach.
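
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the similarity measure, so a simple Gower-style similarity is assumed here as a stand-in, and the function names (`mixed_similarity`, `similarity_features`, `kmeans`), the toy data, and the farthest-point initialization of k-means are all hypothetical choices made for illustration.

```python
import random


def mixed_similarity(a, b, numeric_idx, ranges):
    """Gower-style similarity in [0, 1] (an assumed stand-in measure)."""
    score = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if j in numeric_idx:
            span = ranges[j]
            # Numeric feature: 1 minus the range-normalized absolute difference.
            score += (1.0 - abs(x - y) / span) if span > 0 else 1.0
        else:
            # Nominal feature: simple matching.
            score += 1.0 if x == y else 0.0
    return score / len(a)


def similarity_features(data, numeric_idx, n_refs, seed=0):
    """Map each object to its similarities to randomly selected reference objects."""
    rng = random.Random(seed)
    refs = rng.sample(data, n_refs)
    ranges = {j: max(r[j] for r in data) - min(r[j] for r in data)
              for j in numeric_idx}
    return [[mixed_similarity(row, ref, numeric_idx, ranges) for ref in refs]
            for row in data]


def sq_dist(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def kmeans(points, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy mixed dataset: one numeric feature (index 0) and one nominal feature.
data = [(1.0, "red"), (1.2, "red"), (0.9, "red"),
        (5.0, "blue"), (5.3, "blue"), (4.8, "blue")]

features = similarity_features(data, numeric_idx={0}, n_refs=3)
labels = kmeans(features, k=2)
```

Each object becomes a vector with one coordinate per reference object, so any numeric clustering algorithm can be applied to `features`. The number of reference objects (`n_refs`) is a free parameter in this sketch.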

Author information

Authors and Affiliations

College of Information Technology, United Arab Emirates University, Al Ain, UAE
Amir Ahmad
Department of Information Technology, Khawarizmi International College, Al Ain, UAE
Santosh Kumar Ray
School of Information Technology and Engineering, VIT University, Vellore, India
Ch. Aswani Kumar

Corresponding author

Correspondence to Amir Ahmad.

Editor information

Editors and Affiliations

Surya Engineering College, Kathirampatti, Tamil Nadu, India
P. Karrupusamy
Department of Electrical Engineering, Da-Yeh University, Changhua, Taiwan
Joy Chen
Department of Computer Science, Kennesaw State University, Kennesaw, GA, USA
Yong Shi

Cite this paper

Ahmad, A., Ray, S.K., Aswani Kumar, C. (2020). Clustering Mixed Datasets by Using Similarity Features. In: Karrupusamy, P., Chen, J., Shi, Y. (eds) Sustainable Communication Networks and Application. ICSCN 2019. Lecture Notes on Data Engineering and Communications Technologies, vol 39. Springer, Cham. https://doi.org/10.1007/978-3-030-34515-0_50

DOI: https://doi.org/10.1007/978-3-030-34515-0_50
Published: 07 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34514-3
Online ISBN: 978-3-030-34515-0
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
