Abstract
This paper proposes and evaluates a nearest-neighbor method to substitute missing values in ordinal/continuous datasets. In a nutshell, the K-Means clustering algorithm is applied to the complete dataset (without missing values) before the nearest-neighbor imputation process takes place. The resulting cluster centroids are then employed as training instances for the nearest-neighbor method. The proposed method is more efficient than the traditional nearest-neighbor method, and simulations performed on three benchmark datasets also indicate that it provides suitable imputations for both prediction and classification tasks.
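The two stages described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`kmeans_centroids`, `impute_with_centroids`), the farthest-point initialization, the Euclidean distance restricted to observed attributes, and the unweighted average over the nearest centroids are all assumptions, since the abstract does not specify these details.

```python
def sq_dist(a, b, idx):
    """Squared Euclidean distance between a and b over the attribute indices idx."""
    return sum((a[i] - b[i]) ** 2 for i in idx)


def kmeans_centroids(rows, k, max_iters=100):
    """Plain Lloyd-style K-Means on complete rows; returns k centroids.

    Initialization is deterministic farthest-point seeding (an assumption,
    chosen here only to keep the sketch reproducible).
    """
    dims = range(len(rows[0]))
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c, dims) for c in centroids))
        centroids.append(list(far))
    for _ in range(max_iters):
        # Assignment step: each row joins its closest centroid.
        clusters = [[] for _ in range(k)]
        for r in rows:
            j = min(range(k), key=lambda c: sq_dist(r, centroids[c], dims))
            clusters[j].append(r)
        # Update step: centroid = attribute-wise mean (unchanged if cluster is empty).
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def impute_with_centroids(dataset, k, n_neighbors=1):
    """Fill None entries using the nearest centroids of the complete rows.

    The centroids play the role of training instances for the
    nearest-neighbor step, so each incomplete row is compared with
    k centroids instead of with every complete row.
    """
    complete = [r for r in dataset if None not in r]
    centroids = kmeans_centroids(complete, k)
    result = []
    for row in dataset:
        observed = [i for i, v in enumerate(row) if v is not None]
        missing = [i for i, v in enumerate(row) if v is None]
        filled = list(row)
        if missing:
            # Distances use only the attributes observed in this row (assumption).
            nearest = sorted(centroids, key=lambda c: sq_dist(row, c, observed))
            nearest = nearest[:n_neighbors]
            for i in missing:
                filled[i] = sum(c[i] for c in nearest) / len(nearest)
        result.append(filled)
    return result
```

The efficiency argument is visible in the sketch: the clustering cost is paid once on the complete rows, after which each incomplete row needs only `k` distance computations rather than one per complete instance.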
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hruschka, E.R., Hruschka, E.R., Ebecken, N.F.F. (2004). Towards Efficient Imputation by Nearest-Neighbors: A Clustering-Based Approach. In: Webb, G.I., Yu, X. (eds) AI 2004: Advances in Artificial Intelligence. AI 2004. Lecture Notes in Computer Science, vol 3339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30549-1_45
DOI: https://doi.org/10.1007/978-3-540-30549-1_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24059-4
Online ISBN: 978-3-540-30549-1
eBook Packages: Computer Science, Computer Science (R0)