Abstract
In many online applications of machine learning, the computational resources available for classification vary over time. Most techniques are designed to operate within the constraints of the minimum expected resources and fail to exploit further resources when they are available. We propose a novel anytime classification algorithm, anytime averaged probabilistic estimators (AAPE), which delivers strong prediction accuracy with little CPU time and uses any additional CPU time to increase classification accuracy. The idea is to run an ordered sequence of very efficient Bayesian probabilistic estimators (single improvement steps) until classification time runs out. Theoretical studies and empirical validations reveal that by properly identifying, ordering, invoking and ensembling single improvement steps, AAPE is able to deliver accurate classification whenever it is interrupted. It can also output class probability estimates beyond simple 0/1-loss classifications, and it handles incremental learning gracefully.
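To make the anytime procedure concrete, the sketch below illustrates the interrupt-driven loop the abstract describes: a fast baseline estimator guarantees an immediate answer, further "single improvement steps" run in a fixed order until a deadline passes, and the class-probability estimates of all completed steps are averaged. This is a minimal illustration under assumptions of ours, not the authors' implementation; the names (`aape_classify`, `prior_estimator`) and the modeling of each step as a callable returning a normalized probability vector are hypothetical.

```python
import time
import numpy as np

def aape_classify(x, improvement_steps, deadline, prior_estimator):
    """Anytime classification sketch: run an ordered sequence of
    probabilistic estimators until time runs out, then average the
    class-probability estimates of the steps that completed.

    `improvement_steps` is assumed to be pre-ordered by expected
    benefit; each element is a hypothetical callable returning a
    normalized class-probability vector for instance `x`.
    """
    # Always compute a fast baseline first (e.g. naive Bayes), so an
    # answer exists even if we are interrupted immediately.
    probs = [prior_estimator(x)]
    for step in improvement_steps:
        if time.monotonic() >= deadline:
            break                      # interrupted: stop adding steps
        probs.append(step(x))          # one "single improvement step"
    avg = np.mean(probs, axis=0)       # ensemble by averaging estimates
    return avg, int(np.argmax(avg))    # probability estimates + prediction
```

Averaging only the completed steps keeps the output a valid probability distribution at every interrupt point, which is what allows the method to return class probability estimates rather than only 0/1 predictions.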
Editor: David Page.
Cite this article
Yang, Y., Webb, G., Korb, K. et al. Classifying under computational resource constraints: anytime classification using probabilistic estimators. Mach Learn 69, 35–53 (2007). https://doi.org/10.1007/s10994-007-5020-z