Data Preprocessing and Finding Optimal Value of K for KNN Model

Shetty, Roopashri; Geetha, M.; Acharya, Dinesh U.; Shyamala, G.

doi:10.1007/978-981-16-7088-6_1

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 1413))

Included in the following conference series:

International Conference on Soft Computing and Signal Processing

Abstract

K-nearest neighbor (KNN) is a simple classifier used to classify medical data. Its performance depends on the data used for classification and on the number of neighbors considered (K). Data preprocessing is an important step in data mining that improves the quality of the data; it includes data cleaning (removing duplicates and noise), data normalization, and feature selection. In this paper, the data are preprocessed by using a correlation matrix to remove irrelevant attributes from the dataset, and a suitable value of K is chosen for the KNN algorithm. Both steps help improve the performance of the KNN model.
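The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the preview does not give their selection criterion or their procedure for choosing K. The sketch assumes that an attribute is "irrelevant" when its absolute Pearson correlation with the class label falls below a hypothetical threshold of 0.3, and that K is chosen by leave-one-out accuracy over a list of odd candidates. The function names and the toy data are illustrative only.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(X, y, threshold=0.3):
    """Indices of columns whose |correlation with the label| >= threshold.

    Assumption: the threshold and the label-correlation criterion are
    illustrative choices, not taken from the paper.
    """
    columns = list(zip(*X))
    return [j for j, col in enumerate(columns)
            if abs(pearson(col, y)) >= threshold]


def project(X, keep):
    """Keep only the selected columns of every row."""
    return [[row[j] for j in keep] for row in X]


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points nearest to query."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy of KNN for a given k."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


def best_k(X, y, candidates):
    """Candidate with the highest accuracy; ties go to the smaller k."""
    return max(candidates, key=lambda k: (loo_accuracy(X, y, k), -k))


# Toy data: column 0 separates the classes, column 1 is unrelated noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.8, 3.0], [1.1, 4.0],
     [3.0, 5.0], [3.2, 1.0], [2.9, 3.0], [3.1, 4.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

keep = select_features(X, y)          # column 1 is dropped
X_reduced = project(X, keep)
k = best_k(X_reduced, y, [1, 3, 5, 7])
print("kept columns:", keep, "chosen K:", k)
```

Odd candidate values of K avoid tied votes in a two-class problem. On real data, a held-out test set or k-fold cross-validation would usually replace the leave-one-out loop, which scans the whole training set for every prediction.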

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
Roopashri Shetty, M. Geetha & Dinesh U. Acharya
Department of Obstetrics and Gynaecology, Kasturba Medical College, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
G. Shyamala

Corresponding author

Correspondence to Roopashri Shetty.

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering, Malla Reddy College of Engineering and Technology, Hyderabad, Telangana, India
V. Sivakumar Reddy
Department of Computer Science and Engineering, Jawaharlal Nehru Technological University Hyderabad, Hyderabad, Telangana, India
V. Kamakshi Prasad
Department of Computer Science and Software Engineering, Monmouth University, New Jersey, NJ, USA
Jiacun Wang
Department of Electronics and Communication Engineering, Sir Visvesvaraya Institute of Technology, Nashik, Maharashtra, India
K.T.V. Reddy

About this paper

Cite this paper

Shetty, R., Geetha, M., Acharya, D.U., Shyamala, G. (2022). Data Preprocessing and Finding Optimal Value of K for KNN Model. In: Reddy, V.S., Prasad, V.K., Wang, J., Reddy, K. (eds) Soft Computing and Signal Processing. ICSCSP 2021. Advances in Intelligent Systems and Computing, vol 1413. Springer, Singapore. https://doi.org/10.1007/978-981-16-7088-6_1

DOI: https://doi.org/10.1007/978-981-16-7088-6_1
Published: 15 February 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-7087-9
Online ISBN: 978-981-16-7088-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
