Abstract
A database may be viewed as a statistical population, and an attribute as a statistical variable taking values from its domain. One can therefore carry out statistical and information-theoretic analysis of a database. Based on attribute values, a database can be partitioned into smaller populations. An attribute is deemed important if it partitions the database in such a way that previously unknown regularities and patterns become observable. Many information-theoretic measures have been proposed and applied to quantify the importance of attributes and the relationships between attributes in various fields. In the context of knowledge discovery and data mining (KDD), we present a critical review and analysis of information-theoretic measures of attribute importance and attribute association, with emphasis on their interpretations and connections.
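As an illustrative sketch (not taken from the chapter itself), the view of an attribute as a statistical variable can be made concrete by estimating Shannon entropy and mutual information from a table's columns: entropy measures how finely an attribute partitions the records, and mutual information measures the association between two attributes. The toy attributes `colour` and `size` below are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of a column."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from co-occurrence counts."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# A toy "database" of four records with two attributes.
colour = ["red", "red", "blue", "blue"]
size   = ["S",   "S",   "L",    "L"]

print(entropy(colour))                   # 1.0 bit: the attribute splits the table in half
print(mutual_information(colour, size))  # 1.0 bit: size is fully determined by colour
```

Here `colour` partitions the table into two equal blocks (entropy 1 bit), and since `size` induces the same partition, their mutual information equals the full entropy, i.e. the attributes are completely associated.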
© 2003 Springer-Verlag Berlin Heidelberg
Cite this chapter
Yao, Y.Y. (2003). Information-Theoretic Measures for Knowledge Discovery and Data Mining. In: Karmeshu (eds) Entropy Measures, Maximum Entropy Principle and Emerging Applications. Studies in Fuzziness and Soft Computing, vol 119. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-36212-8_6
Print ISBN: 978-3-642-05531-7
Online ISBN: 978-3-540-36212-8