Abstract
In the KDD process, filling in missing data typically requires a large investment of time and effort: often 80% to 90% of a data analysis project is spent making the data reliable enough for the results to be trusted. In this paper, we propose an SVM regression based algorithm for filling in missing data. The decision attribute (output attribute) is used as a condition attribute (input attribute), the condition attribute with missing values is used as the decision attribute, and SVM regression is then applied to predict the missing condition attribute values. Experimental results on a SARS data set show that the SVM regression method has the highest precision. The nearest-neighbor method, which fills a missing value with the corresponding value of the example closest to the incomplete example, takes second place, while the mean and median methods have lower precision.
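A minimal Python sketch of the four fill-in strategies compared above may help make them concrete. It assumes a toy table (not the SARS data) in which only one attribute has missing values, stored as `None`; the function names, the toy rows, and the training parameters (`C`, `eps`, `lr`, `epochs`) are all hypothetical. Following the attribute swap described in the abstract, the incomplete attribute becomes the regression target and the remaining attributes, including the original decision attribute, become the inputs (also using the other condition attributes as inputs is an assumption). The abstract does not state the kernel or solver, so this sketch substitutes a linear ε-insensitive SVR trained by subgradient descent on the primal objective, rather than a kernel SVR solved in the dual with Lagrange multipliers.

```python
"""Hypothetical sketch: four ways to fill missing values in one attribute."""
from statistics import mean, median


def _complete_rows(rows, col):
    # Rows whose value in column `col` is known.
    return [r for r in rows if r[col] is not None]


def _features(row, col):
    # Every attribute except the incomplete one. The decision attribute is
    # included, i.e. it now plays the role of an input (condition) attribute.
    return [v for j, v in enumerate(row) if j != col]


def fill_mean(rows, col):
    value = mean(r[col] for r in _complete_rows(rows, col))
    return [value if r[col] is None else r[col] for r in rows]


def fill_median(rows, col):
    value = median(r[col] for r in _complete_rows(rows, col))
    return [value if r[col] is None else r[col] for r in rows]


def fill_nearest(rows, col):
    known = _complete_rows(rows, col)

    def sq_dist(a, b):
        # Squared Euclidean distance over the attributes other than `col`.
        return sum((p - q) ** 2 for p, q in zip(_features(a, col), _features(b, col)))

    # Copy the value from the complete row closest to each incomplete row.
    return [min(known, key=lambda k: sq_dist(k, r))[col] if r[col] is None else r[col]
            for r in rows]


def train_linear_svr(X, y, C=10.0, eps=0.1, lr=0.0005, epochs=10000):
    # Primal linear epsilon-SVR (assumed stand-in for a kernel SVR library):
    # minimise 0.5*|w|^2 + C * sum(max(0, |y_i - (w.x_i + b)| - eps))
    # by full-batch subgradient descent with a fixed step size.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0  # gradient of the 0.5*|w|^2 term is w itself
        for xi, yi in zip(X, y):
            r = yi - (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if abs(r) > eps:
                # Subgradient of C*(|r| - eps) w.r.t. w is -C*sign(r)*x_i.
                s = -C if r > 0 else C
                gw = [g + s * xj for g, xj in zip(gw, xi)]
                gb += s
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def fill_svr(rows, col, **svr_params):
    # Swap roles: the incomplete attribute is the regression target.
    known = _complete_rows(rows, col)
    w, b = train_linear_svr([_features(r, col) for r in known],
                            [r[col] for r in known], **svr_params)

    def predict(r):
        return sum(wj * xj for wj, xj in zip(w, _features(r, col))) + b

    return [predict(r) if r[col] is None else r[col] for r in rows]


# Hypothetical toy table: [a (incomplete attribute), x (condition), d (decision)].
# The complete rows satisfy a = x + 2*d exactly.
rows = [
    [1.0, 1.0, 0], [2.0, 2.0, 0], [3.0, 3.0, 0],
    [5.0, 3.0, 1], [4.0, 2.0, 1], [3.0, 1.0, 1],
    [None, 2.4, 0], [None, 1.6, 1],
]
print(fill_mean(rows, 0)[6:])     # [3.0, 3.0]
print(fill_median(rows, 0)[6:])   # [3.0, 3.0]
print(fill_nearest(rows, 0)[6:])  # [2.0, 4.0]
print(fill_svr(rows, 0)[6:])
```

On this toy table the mean and median both fill 3.0, the nearest-neighbor method copies 2.0 and 4.0 from the closest complete rows, and the linear SVR should land near the underlying values 2.4 and 3.6. This only illustrates the mechanics; it says nothing about the precision figures reported for the SARS data.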
Keywords
- Support Vector Regression
- Decision Attribute
- Output Attribute
- Input Attribute
- Lagrange Multiplier Technique
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Honghai, F., Guoshun, C., Cheng, Y., Bingru, Y., Yumei, C. (2005). A SVM Regression Based Approach to Filling in Missing Values. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science, vol 3683. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11553939_83
DOI: https://doi.org/10.1007/11553939_83
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28896-1
Online ISBN: 978-3-540-31990-0
eBook Packages: Computer Science, Computer Science (R0)