Abstract
In most databases, it is possible to identify small partitions of the data where the observed distribution is notably different from that of the database as a whole. In classical subgroup discovery, one considers the distribution of a single nominal attribute, and exceptional subgroups show a surprising increase in the occurrence of one of its values. In this paper, we introduce Exceptional Model Mining (EMM), a framework that allows for more complicated target concepts. Rather than finding subgroups based on the distribution of a single target attribute, EMM finds subgroups where a model fitted to that subgroup is somehow exceptional. We discuss regression as well as classification models, and define quality measures that determine how exceptional a given model on a subgroup is. Our framework is general enough to be applied to many types of models, even from other paradigms such as association analysis and graphical modeling.
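The central idea, scoring a subgroup by how different a model fitted on it is, can be illustrated with a minimal sketch. It assumes the simplest possible "model" (the Pearson correlation between two numeric attributes) and a hypothetical quality measure, `correlation_gap`, defined as the absolute difference between the correlation inside the subgroup and in its complement. The function names, the toy records, and the choice of measure are illustrative assumptions and are not taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_gap(rows, in_subgroup, x_key, y_key):
    """Hypothetical quality measure: |r(subgroup) - r(complement)|.

    `in_subgroup` is a predicate over a record; it plays the role of the
    subgroup description (a condition on the descriptive attributes).
    """
    inside = [r for r in rows if in_subgroup(r)]
    outside = [r for r in rows if not in_subgroup(r)]
    r_in = pearson([r[x_key] for r in inside], [r[y_key] for r in inside])
    r_out = pearson([r[x_key] for r in outside], [r[y_key] for r in outside])
    return abs(r_in - r_out)


# Toy records (invented for illustration): in segment "a" the two target
# attributes rise together, in segment "b" they move in opposite directions.
rows = [
    {"segment": "a", "x": 1, "y": 2},
    {"segment": "a", "x": 2, "y": 4},
    {"segment": "a", "x": 3, "y": 6},
    {"segment": "a", "x": 4, "y": 8},
    {"segment": "b", "x": 1, "y": 8},
    {"segment": "b", "x": 2, "y": 6},
    {"segment": "b", "x": 3, "y": 4},
    {"segment": "b", "x": 4, "y": 2},
]

gap = correlation_gap(rows, lambda r: r["segment"] == "a", "x", "y")
print(gap)
```

A search procedure would evaluate such a measure over many candidate subgroup descriptions and report the highest-scoring ones. A practical measure would typically also account for subgroup size, which this sketch ignores.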
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Leman, D., Feelders, A., Knobbe, A. (2008). Exceptional Model Mining. In: Daelemans, W., Goethals, B., Morik, K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2008. Lecture Notes in Computer Science, vol 5212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87481-2_1
DOI: https://doi.org/10.1007/978-3-540-87481-2_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87480-5
Online ISBN: 978-3-540-87481-2
eBook Packages: Computer Science, Computer Science (R0)