Abstract
This chapter describes methods of handling missing attribute values in data mining. The methods fall into two categories: sequential and parallel. In sequential methods, missing attribute values are first replaced by known values in a preprocessing step, and knowledge is then acquired from a data set in which all attribute values are known. In parallel methods there is no preprocessing; knowledge is acquired directly from the original data set. The main emphasis of the chapter is on rule induction. Methods of handling missing attribute values for decision tree generation are only briefly summarized.
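As an illustration of the sequential category, the sketch below fills each missing attribute value with the most frequent known value of the same attribute before any knowledge acquisition takes place. The abstract does not name a specific replacement strategy, so the most-common-value rule, the `?` marker for a missing value, the function name `fill_most_common`, and the toy data set are all assumptions chosen for illustration.

```python
from collections import Counter

MISSING = "?"  # assumed marker for a missing attribute value


def fill_most_common(cases, attributes):
    """Return a copy of `cases` in which every missing value is replaced
    by the most frequent known value of the same attribute."""
    filled = [dict(case) for case in cases]  # leave the input untouched
    for attr in attributes:
        known = [c[attr] for c in cases if c[attr] != MISSING]
        if not known:
            continue  # no known value to impute from; leave the column as-is
        most_common = Counter(known).most_common(1)[0][0]
        for c in filled:
            if c[attr] == MISSING:
                c[attr] = most_common
    return filled


# Hypothetical toy data set: two attributes and a decision ("flu").
cases = [
    {"temperature": "high", "headache": "?", "flu": "yes"},
    {"temperature": "?", "headache": "yes", "flu": "yes"},
    {"temperature": "high", "headache": "no", "flu": "no"},
    {"temperature": "normal", "headache": "yes", "flu": "no"},
]

# Preprocessing step: the result has no missing attribute values,
# so any standard rule induction algorithm could be run on it next.
completed = fill_most_common(cases, ["temperature", "headache"])
```

A parallel method would skip this step and pass `cases`, with its `?` entries intact, directly to an induction algorithm designed to interpret missing values itself.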
© 2005 Springer Science+Business Media, Inc.
About this chapter
Cite this chapter
Grzymala-Busse, J.W., Grzymala-Busse, W.J. (2005). Handling Missing Attribute Values. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/0-387-25465-X_3
DOI: https://doi.org/10.1007/0-387-25465-X_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-24435-8
Online ISBN: 978-0-387-25465-4
eBook Packages: Computer Science (R0)