Abstract
Semi-naive Bayesian techniques seek to improve the accuracy of naive Bayes (NB) by relaxing the attribute independence assumption. We present a new type of semi-naive Bayesian operation, Subsumption Resolution (SR), which efficiently identifies occurrences of the specialization-generalization relationship and eliminates generalizations at classification time. We extend SR to Near-Subsumption Resolution (NSR), which deletes near-generalizations in addition to generalizations. We develop two versions of SR: eager SR (ESR), which performs SR during training, and lazy SR (LSR), which performs SR during testing. We investigate the effect of ESR, LSR, NSR and Backwards Sequential Elimination (BSE), a conventional attribute elimination technique, on NB and on Averaged One-Dependence Estimators (AODE), a powerful alternative to NB. BSE imposes very high training time overheads on NB and AODE, accompanied by varying decreases in classification time overheads. ESR, LSR and NSR impose high training time and test time overheads on NB. However, LSR imposes no extra training time overheads and only modest test time overheads on AODE, while ESR and NSR impose modest training and test time overheads on AODE. Our extensive experimental comparison on sixty UCI data sets shows that applying BSE, LSR or NSR to NB significantly improves both zero-one loss and RMSE; that applying BSE, ESR or NSR to AODE significantly improves both zero-one loss and RMSE; and that applying LSR to AODE significantly improves zero-one loss. The Friedman and Nemenyi tests show that, on categorical data, AODE with ESR or NSR has a significant zero-one loss and RMSE advantage over Logistic Regression, and a zero-one loss advantage over Weka’s LibSVM implementation with a grid parameter search. AODE with LSR has a zero-one loss advantage over Logistic Regression and zero-one loss comparable to that of LibSVM. Finally, we examine the circumstances under which the elimination of near-generalizations proves beneficial.
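To make the mechanism concrete, the following is a minimal Python sketch of the lazy form of Subsumption Resolution applied before standard naive Bayes or AODE estimation. It is an illustration under stated assumptions, not the authors' implementation: instances are assumed to be discretized into (attribute, value) pairs, and MIN_COUNT and NSR_RATIO are illustrative parameters rather than the paper's exact settings.

```python
# Minimal sketch of lazy Subsumption Resolution (LSR); assumptions as noted above.
from collections import Counter
from itertools import combinations

MIN_COUNT = 30    # minimum support before trusting the subsumption test (assumed value)
NSR_RATIO = 1.0   # 1.0 gives strict SR; values just below 1.0 give Near-SR (NSR)

def fit_counts(train):
    """Tally single attribute-value frequencies and pairwise co-occurrence
    frequencies over the training data. These are the only statistics SR needs,
    and AODE already maintains such pairwise tables, which is why LSR adds no
    extra training-time overhead to AODE."""
    single, pair = Counter(), Counter()
    for inst in train:
        single.update(inst)
        for a, b in combinations(inst, 2):
            pair[(a, b)] += 1
            pair[(b, a)] += 1
    return single, pair

def subsumes(gen, spec, single, pair):
    """True if `gen` appears in (nearly) every training instance that contains
    `spec`, i.e. P(gen | spec) is (approximately) 1, so `gen` is a
    (near-)generalization of `spec`."""
    return single[spec] >= MIN_COUNT and pair[(gen, spec)] >= NSR_RATIO * single[spec]

def resolve(instance, single, pair):
    """At classification time, delete every value that is a
    (near-)generalization of another value in the same instance; if two values
    are mutually subsuming (equivalent), delete only one of them."""
    drop = set()
    for xi, xj in combinations(instance, 2):
        if subsumes(xi, xj, single, pair):
            drop.add(xi)   # xi generalizes xj: keep the specialization xj
        elif subsumes(xj, xi, single, pair):
            drop.add(xj)   # xj generalizes xi: keep the specialization xi
    return [v for v in instance if v not in drop]
```

The resolved instance is then classified by the unmodified NB or AODE estimator. With NSR_RATIO = 1.0 the test holds exactly when every training instance containing the specialization also contains the candidate generalization; lowering the ratio (e.g. to 0.95) additionally deletes near-generalizations.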
Additional information
Editors: Mark Craven and Johannes Fürnkranz.
Cite this article
Zheng, F., Webb, G.I., Suraweera, P. et al. Subsumption resolution: an efficient and effective technique for semi-naive Bayesian learning. Mach Learn 87, 93–125 (2012). https://doi.org/10.1007/s10994-011-5275-2