Abstract
Averaged n-Dependence Estimators (AnDE) is an approach to probabilistic classification learning that learns by extrapolation from marginal to full-multivariate probability distributions. It utilizes a single parameter that transforms the approach between a low-variance high-bias learner (Naive Bayes) and a high-variance low-bias learner with Bayes-optimal asymptotic error. It extends the underlying strategy of Averaged One-Dependence Estimators (AODE), which relaxes the Naive Bayes independence assumption while retaining many of Naive Bayes' desirable computational and theoretical properties. AnDE further relaxes the independence assumption by generalizing AODE to higher levels of dependence. Extensive experimental evaluation shows that the bias-variance trade-off for Averaged 2-Dependence Estimators results in strong predictive accuracy over a wide range of data sets. Its training time is linear in the number of training examples; it learns in a single pass through the training data, supports incremental learning, directly handles missing values, and is robust in the face of noise. Beyond the practical utility of its lower-dimensional variants, AnDE is of interest in that it demonstrates that it is possible to create low-bias high-variance generative learners, and it suggests strategies for developing even more powerful classifiers.
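The averaging scheme the abstract describes can be sketched in a few dozen lines. The sketch below is a minimal, assumption-laden illustration (not the authors' implementation): it estimates P(y, x) by averaging, over every size-n subset s of attributes, the product P(y, x_s) · Π_j P(x_j | y, x_s), with Laplace smoothing. Setting n=0 recovers Naive Bayes and n=1 recovers AODE; the frequency threshold AODE applies to candidate parents is omitted here for brevity. The class name `AnDE` and the smoothing parameter `alpha` are illustrative choices.

```python
from itertools import combinations
from collections import Counter

class AnDE:
    """Minimal Averaged n-Dependence Estimators sketch.

    n=0 reduces to Naive Bayes; n=1 to AODE. Assumes discrete
    attributes and uses Laplace smoothing (a simplification of the
    smoothing and frequency thresholds used in the paper).
    """

    def __init__(self, n=1, alpha=1.0):
        self.n = n          # size of the parent attribute subsets
        self.alpha = alpha  # Laplace smoothing pseudo-count

    def fit(self, X, y):
        self.num_attrs = len(X[0])
        self.classes = sorted(set(y))
        self.attr_vals = [sorted({row[a] for row in X})
                          for a in range(self.num_attrs)]
        self.N = len(y)
        # Joint counts for (class, parent subset, parent values) and,
        # conditioned on those, counts for each child attribute value.
        self.parent_counts = Counter()
        self.child_counts = Counter()
        for row, c in zip(X, y):
            for s in combinations(range(self.num_attrs), self.n):
                vs = tuple(row[a] for a in s)
                self.parent_counts[(c, s, vs)] += 1
                for j in range(self.num_attrs):
                    if j not in s:
                        self.child_counts[(c, s, vs, j, row[j])] += 1
        return self

    def predict(self, x):
        best_c, best_p = None, -1.0
        for c in self.classes:
            total = 0.0
            for s in combinations(range(self.num_attrs), self.n):
                vs = tuple(x[a] for a in s)
                np_ = self.parent_counts[(c, s, vs)]
                # Smoothed estimate of P(c, x_s).
                p = (np_ + self.alpha) / (self.N + self.alpha * len(self.classes))
                for j in range(self.num_attrs):
                    if j in s:
                        continue
                    nj = self.child_counts[(c, s, vs, j, x[j])]
                    # Smoothed estimate of P(x_j | c, x_s).
                    p *= (nj + self.alpha) / (np_ + self.alpha * len(self.attr_vals[j]))
                total += p  # average over subsets (constant factor dropped)
            if total > best_p:
                best_c, best_p = c, total
        return best_c
```

Note the single averaging loop over `combinations(...)`: this is the one-parameter dial the abstract refers to — increasing n trades the bias of strong independence assumptions for the variance of higher-dimensional probability estimates.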
Editor: Peter Flach.
Cite this article
Webb, G.I., Boughton, J.R., Zheng, F. et al. Learning by extrapolation from marginal to full-multivariate probability distributions: decreasingly naive Bayesian classification. Mach Learn 86, 233–272 (2012). https://doi.org/10.1007/s10994-011-5263-6