Towards Efficient Imputation by Nearest-Neighbors: A Clustering-Based Approach

Hruschka, Eduardo R.; Hruschka, Estevam R.; Ebecken, Nelson F. F.

doi:10.1007/978-3-540-30549-1_45

Eduardo R. Hruschka,
Estevam R. Hruschka Jr. &
Nelson F. F. Ebecken

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3339)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

This paper proposes and evaluates a nearest-neighbor method to substitute missing values in ordinal/continuous datasets. In a nutshell, the K-Means clustering algorithm is applied to the complete dataset (without missing values) before the nearest-neighbor imputation process takes place. The resulting cluster centroids are then employed as training instances for the nearest-neighbor method. The proposed method is more efficient than the traditional nearest-neighbor method, and simulations performed on three benchmark datasets also indicate that it provides suitable imputations, in terms of both prediction and classification tasks.
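
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the distance measure, the number of clusters, or the number of neighbors, so the sketch assumes squared Euclidean distance over the observed attributes, a single nearest centroid, and a user-chosen `k`. The function names `kmeans` and `impute_with_centroids` are hypothetical. The efficiency gain comes from searching `k` centroids per incomplete instance instead of every complete instance.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-Means; returns the (k, d) centroid matrix."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise centroids from k distinct complete rows.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids


def impute_with_centroids(X, k=3, seed=0):
    """Fill NaNs using the nearest K-Means centroid of the complete rows."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Cluster only the rows that have no missing values.
    complete = X[~missing.any(axis=1)]
    centroids = kmeans(complete, k, seed=seed)
    out = X.copy()
    for i in np.flatnonzero(missing.any(axis=1)):
        obs = ~missing[i]
        # Compare on observed attributes only (an assumption of this sketch).
        d2 = ((centroids[:, obs] - X[i, obs]) ** 2).sum(axis=1)
        nearest = centroids[d2.argmin()]
        out[i, missing[i]] = nearest[missing[i]]
    return out


# Toy data (made up for illustration): six complete rows, two incomplete.
data = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.8, 1.2],
    [9.0, 9.0], [9.2, 8.8], [8.8, 9.2],
    [1.1, np.nan], [np.nan, 9.1],
])
filled = impute_with_centroids(data, k=2)
print(filled)
```

Each imputed value is a coordinate of a centroid, so it always lies within the range of the corresponding attribute in the complete rows. With `k=1` the sketch reduces to mean imputation, and as `k` approaches the number of complete rows it approaches ordinary 1-nearest-neighbor imputation.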

Author information

Authors and Affiliations

Universidade Católica de Santos (UniSantos), Brasil
Eduardo R. Hruschka
Universidade Federal de São Carlos (UFSCAR), Brasil
Estevam R. Hruschka Jr.
COPPE / Universidade Federal do Rio de Janeiro, Brasil
Nelson F. F. Ebecken

Editor information

Editors and Affiliations

Faculty of Information Technology, Monash University, VIC 3800, Australia
Geoffrey I. Webb
Science, Engineering and Technology Portfolio, Royal Melbourne Institute of Technology, VIC 3001, Melbourne, Australia
Xinghuo Yu

About this paper

Cite this paper

Hruschka, E.R., Hruschka, E.R., Ebecken, N.F.F. (2004). Towards Efficient Imputation by Nearest-Neighbors: A Clustering-Based Approach. In: Webb, G.I., Yu, X. (eds) AI 2004: Advances in Artificial Intelligence. AI 2004. Lecture Notes in Computer Science, vol 3339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30549-1_45

DOI: https://doi.org/10.1007/978-3-540-30549-1_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24059-4
Online ISBN: 978-3-540-30549-1
eBook Packages: Computer Science, Computer Science (R0)
