Abstract
K-nearest neighbor (KNN) is a simple classifier used to classify medical data. Its performance depends on the data being classified and on the number of neighbors considered (K). Data preprocessing is an important step in data mining because it improves the quality of the data; it includes data cleaning (removing duplicates and noise), data normalization, feature selection, and similar steps. In this paper, the data are preprocessed by removing irrelevant attributes from the dataset using a correlation matrix, and a suitable value of K is chosen for the KNN algorithm. Both steps help improve the performance of the KNN model.
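The two steps named in the abstract, correlation-based attribute removal and selection of K, can be illustrated with a minimal Python sketch. The abstract does not say how the correlation matrix is applied, which threshold is used, or how candidate values of K are scored, so everything below is an assumption made for illustration: attributes are dropped when their absolute Pearson correlation with the class label falls below a hypothetical threshold of 0.3, distances are Euclidean, and each K is scored by leave-one-out accuracy. The function names and the toy dataset are invented and are not taken from the paper.

```python
import math
from collections import Counter


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def relevant_features(X, y, threshold):
    """Indices of columns whose |correlation with the label| reaches the threshold."""
    columns = list(zip(*X))
    return [j for j, col in enumerate(columns) if abs(pearson(col, y)) >= threshold]


def knn_predict(train_X, train_y, query, k):
    """Majority label among the k training rows closest to the query (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy of KNN for a given k."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


# Invented toy data: column 0 separates the classes, column 1 is unrelated to the label.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 5.0], [1.1, 1.0],
     [3.0, 5.0], [3.2, 1.0], [2.9, 5.0], [3.1, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Step 1: drop attributes weakly correlated with the label (threshold is an assumption).
kept = relevant_features(X, y, threshold=0.3)
X_reduced = [[row[j] for j in kept] for row in X]

# Step 2: score each candidate K and keep the first one with the highest accuracy.
scores = {k: loo_accuracy(X_reduced, y, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)

print("kept attributes:", kept)
print("accuracy per K:", scores)
print("chosen K:", best_k)
```

On this toy data the unrelated column dominates the distance computation, so comparing `loo_accuracy(X, y, 3)` with `loo_accuracy(X_reduced, y, 3)` shows the kind of gain the abstract attributes to removing irrelevant attributes. A real study would more likely rely on an established library and a held-out test set than on this hand-rolled version.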
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shetty, R., Geetha, M., Acharya, D.U., Shyamala, G. (2022). Data Preprocessing and Finding Optimal Value of K for KNN Model. In: Reddy, V.S., Prasad, V.K., Wang, J., Reddy, K. (eds) Soft Computing and Signal Processing. ICSCSP 2021. Advances in Intelligent Systems and Computing, vol 1413. Springer, Singapore. https://doi.org/10.1007/978-981-16-7088-6_1
DOI: https://doi.org/10.1007/978-981-16-7088-6_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-7087-9
Online ISBN: 978-981-16-7088-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)