Abstract
Classification rules play an important role in prediction tasks. Their popularity is mainly due to their simple and interpretable form. Classification methods that combine interesting classification rules (with respect to a defined interestingness measure) generally yield good predictions. However, the performance of rule-based classifiers depends strongly on two choices. The first is the interestingness measure used (e.g., confidence or growth rate). The second is the threshold that separates interesting from non-interesting rules, and setting this threshold is a non-trivial problem. Furthermore, it is easy to show that mined rules are individually non-robust: an interesting (e.g., frequent and confident) rule mined from the training set may no longer be confident in the test phase. In this paper, we propose a new criterion for evaluating the robustness of classification rules in binary labeled data sets. Our criterion arises from a Bayesian approach: we propose an expression for the probability of a rule given the data, and the most probable rules are then the robust ones. The Bayesian criterion derived from this expression lets us single out the robust rules from a given set of rules without any parameter tuning.
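The abstract makes two claims about threshold-based rule selection. First, the measure and the threshold decide which rules are kept. Second, a rule that looks interesting on training data may not stay confident on test data. A small sketch can illustrate both. The Python below is an illustration under stated assumptions, not the authors' method. It computes the standard confidence of a rule X → c and the usual emerging-pattern growth rate (the relative support of X in class c divided by its relative support in the other class) on made-up binary data. The names `TRAIN`, `TEST`, `MIN_CONF`, `covers`, `confidence` and `growth_rate`, the toy records and the 0.8 threshold are all hypothetical.

```python
from math import inf

# Hypothetical toy data, invented for illustration only.
# Each record is (set of binary attributes equal to 1, class label).
TRAIN = [
    (frozenset({"a", "b"}), "pos"),
    (frozenset({"a", "b", "c"}), "pos"),
    (frozenset({"a", "b"}), "pos"),
    (frozenset({"a", "b", "d"}), "pos"),
    (frozenset({"a", "b"}), "neg"),
    (frozenset({"a"}), "neg"),
    (frozenset({"c"}), "neg"),
    (frozenset({"b", "d"}), "neg"),
]
TEST = [
    (frozenset({"a", "b"}), "pos"),
    (frozenset({"a", "b", "c"}), "neg"),
    (frozenset({"a", "b"}), "pos"),
    (frozenset({"a", "b", "d"}), "neg"),
    (frozenset({"c"}), "pos"),
    (frozenset({"d"}), "neg"),
]
MIN_CONF = 0.8  # hypothetical user-set confidence threshold


def covers(body, items):
    """A rule body X covers a record when every attribute of X is present."""
    return body <= items


def confidence(body, label, data):
    """conf(X -> c): fraction of records covered by X that have class c."""
    covered = [c for items, c in data if covers(body, items)]
    if not covered:
        return 0.0
    return sum(1 for c in covered if c == label) / len(covered)


def growth_rate(body, label, data):
    """GR(X, c): relative support of X in class c over that in the other class."""
    in_c = [items for items, c in data if c == label]
    out_c = [items for items, c in data if c != label]
    if not in_c or not out_c:
        raise ValueError("both classes must be present")
    supp_in = sum(covers(body, items) for items in in_c) / len(in_c)
    supp_out = sum(covers(body, items) for items in out_c) / len(out_c)
    if supp_out == 0:
        # X is absent from the other class: infinite GR if X occurs in class c
        # (a "jumping" pattern), otherwise X occurs nowhere and GR is taken as 0.
        return inf if supp_in > 0 else 0.0
    return supp_in / supp_out


if __name__ == "__main__":
    body, label = frozenset({"a", "b"}), "pos"
    for name, data in (("train", TRAIN), ("test", TEST)):
        conf = confidence(body, label, data)
        gr = growth_rate(body, label, data)
        print(f"{name}: conf={conf:.2f} GR={gr:.2f} kept={conf >= MIN_CONF}")
    # train: conf=0.80 GR=4.00 kept=True
    # test: conf=0.50 GR=1.00 kept=False
```

On this toy split, the rule {a, b} → pos passes the confidence threshold on the training data, but its confidence drops to 0.5 on the test data. The Bayesian criterion described in the abstract is meant to remove the need for such a threshold. The sketch does not implement that criterion.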
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Gay, D., Boullé, M. (2013). A Bayesian Criterion for Evaluating the Robustness of Classification Rules in Binary Data Sets. In: Guillet, F., Pinaud, B., Venturini, G., Zighed, D. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 471. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35855-5_1
DOI: https://doi.org/10.1007/978-3-642-35855-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35854-8
Online ISBN: 978-3-642-35855-5
eBook Packages: Engineering, Engineering (R0)