Abstract
This paper presents and summarizes several criteria for selecting the best data abstraction for relations in relational databases. A data abstraction can be understood as a grouping of attribute values: the individual aspects of the grouped values are forgotten, and the values are replaced together by a single, more abstract value. A relation after abstraction is therefore more compact, and data miners work on it more efficiently. A major problem, however, is that when the abstraction neglects an important aspect of the data values, the quality of the extracted knowledge degrades. The central issue is thus to give a criterion under which only an adequate data abstraction is selected, one that keeps the important information while reducing the size of the relation. From this viewpoint, we present three criteria and test them on the task of classifying tuples in a relation into several given target classes. All three criteria are derived from a notion of similarity among class distributions and are formalized in terms of standard information theory. We also summarize our experimental results for the classification task and discuss future work.
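The idea of judging an abstraction by how well it preserves class distributions can be illustrated with a minimal Python sketch. It is not one of the paper's three criteria, which the abstract does not spell out. It assumes a single attribute, represents an abstraction as a plain dictionary from concrete values to abstract values, and scores it by the increase in the conditional entropy of the class given the attribute. The function names `class_entropy_given`, `abstract`, and `information_loss`, and the toy data, are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def class_entropy_given(values, classes):
    """Conditional entropy H(C | A) in bits for paired attribute values and class labels."""
    n = len(values)
    by_value = defaultdict(Counter)
    for v, c in zip(values, classes):
        by_value[v][c] += 1
    h = 0.0
    for counts in by_value.values():
        total = sum(counts.values())
        for k in counts.values():
            # Each term is p(v, c) * log2 p(c | v), accumulated with a minus sign.
            h -= (k / n) * log2(k / total)
    return h


def abstract(values, grouping):
    """Replace each attribute value by its abstract value; unmapped values stay as they are."""
    return [grouping.get(v, v) for v in values]


def information_loss(values, classes, grouping):
    """Increase in H(C | A) caused by the abstraction (0 means no class information is lost)."""
    return class_entropy_given(abstract(values, grouping), classes) - class_entropy_given(values, classes)


# Toy relation (hypothetical): one attribute and a binary target class.
values = ["apple", "apple", "pear", "pear", "beef", "beef", "pork", "pork"]
classes = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

# Groups values whose class distributions are identical.
good = {"apple": "fruit", "pear": "fruit", "beef": "meat", "pork": "meat"}
# Groups values whose class distributions are opposite.
bad = {"apple": "a", "beef": "a", "pear": "b", "pork": "b"}

print(information_loss(values, classes, good))
print(information_loss(values, classes, bad))
```

In this toy relation the first grouping merges only values with the same class distribution, so it shrinks the attribute domain from four values to two without losing class information. The second grouping merges values with opposite class distributions and loses a full bit.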
© 2002 Springer-Verlag Berlin Heidelberg
About this chapter
Haraguchi, M., Kudoh, Y. (2002). Some Criterions for Selecting the Best Data Abstractions. In: Arikawa, S., Shinohara, A. (eds) Progress in Discovery Science. Lecture Notes in Computer Science, vol 2281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45884-0_8
DOI: https://doi.org/10.1007/3-540-45884-0_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43338-5
Online ISBN: 978-3-540-45884-5
eBook Packages: Springer Book Archive