Abstract
Classification is a central and widely studied task in data mining. Most existing studies, however, assume that the data to be classified are complete, whereas in practice many data sets contain missing values. Deleting the incomplete instances reduces the available information, and filling in the missing values may introduce bias and errors. To avoid both problems, it is important to study how to classify incomplete data directly. This paper proposes ITCI, an information-theory-based classification algorithm for incomplete data. In the training stage, ITCI calculates the initial uncertainty of each class and the contribution of each attribute to decreasing class uncertainty. In the testing stage, an instance is assigned to the class whose uncertainty is minimal after all of the attributes have been taken into consideration. Extensive experiments demonstrate the effectiveness and feasibility of the proposed method.
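The abstract does not give the formulas ITCI uses, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' definitions. It assumes that the initial uncertainty of a class c is its self-information, -log2 P(c), and that the contribution of an observed attribute value v is the pointwise mutual information log2(P(v | c) / P(v)), estimated with Laplace smoothing. Missing values (marked here with a hypothetical `MISSING = None`) are skipped in both training and testing, so no instance is deleted and no value is imputed. The names `train` and `classify` are illustrative.

```python
import math
from collections import Counter, defaultdict

MISSING = None  # assumed marker for a missing attribute value


def train(rows, labels):
    """Count classes and observed attribute values, skipping missing entries."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attr index, value) -> class -> count
    observed = defaultdict(Counter)      # attr index -> class -> non-missing count
    domains = defaultdict(set)           # attr index -> set of values seen
    for row, cls in zip(rows, labels):
        for j, v in enumerate(row):
            if v is MISSING:
                continue
            value_counts[(j, v)][cls] += 1
            observed[j][cls] += 1
            domains[j].add(v)
    return class_counts, value_counts, observed, domains


def classify(row, model):
    """Return the class with minimum remaining uncertainty (assumed measure)."""
    class_counts, value_counts, observed, domains = model
    total = sum(class_counts.values())
    best_cls, best_u = None, math.inf
    for cls, n in class_counts.items():
        u = -math.log2(n / total)  # assumed initial uncertainty of the class
        for j, v in enumerate(row):
            if v is MISSING:
                continue  # a missing attribute contributes nothing
            k = len(domains.get(j, set()) | {v})  # smoothing denominator term
            counts = value_counts.get((j, v), Counter())
            obs = observed.get(j, Counter())
            p_v_given_c = (counts[cls] + 1) / (obs[cls] + k)
            p_v = (sum(counts.values()) + 1) / (sum(obs.values()) + k)
            u -= math.log2(p_v_given_c / p_v)  # assumed attribute contribution
        if u < best_u:
            best_cls, best_u = cls, u
    return best_cls


# Toy data with missing entries, invented purely for illustration.
rows = [("sunny", "hot"), ("sunny", None), ("rainy", "cool"),
        (None, "cool"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train(rows, labels)
print(classify(("sunny", None), model))
```

Under these assumptions the decision rule coincides with a naive Bayes classifier that ignores missing attributes; the uncertainty measure and contribution terms defined in the paper itself may differ.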
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Chen, Y., Li, J., Luo, J. (2014). ITCI: An Information Theory Based Classification Algorithm for Incomplete Data. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_19
DOI: https://doi.org/10.1007/978-3-319-08010-9_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08009-3
Online ISBN: 978-3-319-08010-9
eBook Packages: Computer Science (R0)