Abstract
Semi-naive Bayesian techniques seek to improve the accuracy of naive Bayes (NB) by relaxing the attribute independence assumption. We present a new type of semi-naive Bayesian operation, Subsumption Resolution (SR), which efficiently identifies occurrences of the specialization-generalization relationship and eliminates generalizations at classification time. We extend SR to Near-Subsumption Resolution (NSR), which deletes near-generalizations in addition to generalizations. We develop two versions of SR: eager SR (ESR), which performs SR during training, and lazy SR (LSR), which performs SR during testing. We investigate the effect of ESR, LSR, NSR and Backwards Sequential Elimination (BSE), a conventional attribute elimination technique, on NB and on Averaged One-Dependence Estimators (AODE), a powerful alternative to NB. BSE imposes very high training time overheads on NB and AODE, accompanied by varying decreases in classification time overheads. ESR, LSR and NSR impose high training time and test time overheads on NB. However, LSR imposes no extra training time overheads and only modest test time overheads on AODE, while ESR and NSR impose modest training and test time overheads on AODE. Our extensive experimental comparison on sixty UCI data sets shows that applying BSE, LSR or NSR to NB significantly improves both zero-one loss and RMSE; that applying BSE, ESR or NSR to AODE significantly improves both zero-one loss and RMSE; and that applying LSR to AODE significantly improves zero-one loss. The Friedman and Nemenyi tests show that, on categorical data, AODE with ESR or NSR has a significant zero-one loss and RMSE advantage over Logistic Regression, and a zero-one loss advantage over Weka’s LibSVM implementation with a grid parameter search. AODE with LSR has a zero-one loss advantage over Logistic Regression and zero-one loss comparable to that of LibSVM. Finally, we examine the circumstances under which the elimination of near-generalizations proves beneficial.
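To make the mechanism concrete, the following is a minimal Python sketch of the lazy form of Subsumption Resolution applied before standard naive Bayes or AODE estimation. It is an illustration under stated assumptions, not the authors' implementation: instances are assumed to be discretized into (attribute, value) pairs, and MIN_COUNT and NSR_RATIO are illustrative parameters rather than the paper's exact settings.

```python
# Minimal sketch of lazy Subsumption Resolution (LSR); assumptions as noted above.
from collections import Counter
from itertools import combinations

MIN_COUNT = 30    # minimum support before trusting the subsumption test (assumed value)
NSR_RATIO = 1.0   # 1.0 gives strict SR; values just below 1.0 give Near-SR (NSR)

def fit_counts(train):
    """Tally single attribute-value frequencies and pairwise co-occurrence
    frequencies over the training data. These are the only statistics SR needs,
    and AODE already maintains such pairwise tables, which is why LSR adds no
    extra training-time overhead to AODE."""
    single, pair = Counter(), Counter()
    for inst in train:
        single.update(inst)
        for a, b in combinations(inst, 2):
            pair[(a, b)] += 1
            pair[(b, a)] += 1
    return single, pair

def subsumes(gen, spec, single, pair):
    """True if `gen` appears in (nearly) every training instance that contains
    `spec`, i.e. P(gen | spec) is (approximately) 1, so `gen` is a
    (near-)generalization of `spec`."""
    return single[spec] >= MIN_COUNT and pair[(gen, spec)] >= NSR_RATIO * single[spec]

def resolve(instance, single, pair):
    """At classification time, delete every value that is a
    (near-)generalization of another value in the same instance; if two values
    are mutually subsuming (equivalent), delete only one of them."""
    drop = set()
    for xi, xj in combinations(instance, 2):
        if subsumes(xi, xj, single, pair):
            drop.add(xi)   # xi generalizes xj: keep the specialization xj
        elif subsumes(xj, xi, single, pair):
            drop.add(xj)   # xj generalizes xi: keep the specialization xi
    return [v for v in instance if v not in drop]
```

The resolved instance is then classified by the unmodified NB or AODE estimator. With NSR_RATIO = 1.0 the test holds exactly when every training instance containing the specialization also contains the candidate generalization; lowering the ratio (e.g. to 0.95) additionally deletes near-generalizations.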
Additional information
Editors: Mark Craven and Johannes Fürnkranz.
Cite this article
Zheng, F., Webb, G.I., Suraweera, P. et al. Subsumption resolution: an efficient and effective technique for semi-naive Bayesian learning. Mach Learn 87, 93–125 (2012). https://doi.org/10.1007/s10994-011-5275-2