Abstract
Classifying large datasets without any a priori information is a problem in numerous tasks. Especially in industrial environments, we often encounter diverse measurement devices and sensors that produce huge amounts of data, but we still rely on a human expert to give the data a meaningful interpretation. Because the amount of data that must be classified manually is critical, the number of learning episodes involving human interaction must be reduced as much as possible. In addition, for real-world applications it is fundamental to converge in a stable manner to a solution that is close to the optimal one. We present a new self-controlled exploration/exploitation strategy for selecting the data points to be labeled by a domain expert, where the potential of each data point is computed from a combination of its representativeness and the uncertainty of the classifier. We introduce a new Prototype Based Active Learning (PBAC) algorithm for classification and compare its results to other active learning approaches on several benchmark datasets.
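The idea of scoring each candidate by a combination of representativeness and classifier uncertainty can be illustrated with a minimal sketch. This is not the PBAC algorithm itself, whose details the abstract does not give. The Gaussian-kernel density used as a representativeness proxy, the margin-based uncertainty, the linear weighting, and all names (`representativeness`, `uncertainty`, `select_query`, `explore_weight`, `bandwidth`) are assumptions made for illustration. In particular, the weight is a fixed parameter here, whereas the abstract describes a self-controlled strategy.

```python
import math


def representativeness(points, bandwidth=1.0):
    """Assumed proxy: Gaussian-kernel density of each point w.r.t. all others,
    scaled so that the densest point scores 1.0."""
    scores = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i != j:
                dist_sq = sum((a - b) ** 2 for a, b in zip(p, q))
                total += math.exp(-dist_sq / (2 * bandwidth ** 2))
        scores.append(total)
    top = max(scores) or 1.0  # avoid dividing by zero for a single point
    return [s / top for s in scores]


def uncertainty(class_probs):
    """Assumed proxy: 1 minus the margin between the two most probable classes.
    Each entry of class_probs must hold at least two class probabilities."""
    result = []
    for probs in class_probs:
        ranked = sorted(probs, reverse=True)
        result.append(1.0 - (ranked[0] - ranked[1]))
    return result


def select_query(points, class_probs, explore_weight):
    """Return the index of the point with the highest combined potential.

    explore_weight = 1.0 uses representativeness only (exploration);
    explore_weight = 0.0 uses classifier uncertainty only (exploitation).
    """
    rep = representativeness(points)
    unc = uncertainty(class_probs)
    potential = [
        explore_weight * r + (1.0 - explore_weight) * u
        for r, u in zip(rep, unc)
    ]
    return max(range(len(points)), key=potential.__getitem__)


# Hypothetical data: a tight cluster of confidently classified points
# plus one isolated point on which the classifier is undecided.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
class_probs = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.5, 0.5]]

explore_choice = select_query(points, class_probs, explore_weight=1.0)
exploit_choice = select_query(points, class_probs, explore_weight=0.0)
```

With pure exploration the sketch picks a point from the dense cluster, and with pure exploitation it picks the isolated point the classifier is least sure about. Intermediate weights trade the two criteria off against each other.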
Additional information
Responsible editor: Pierre Baldi.
Cite this article
Cebron, N., Berthold, M.R. Active learning for object classification: from exploration to exploitation. Data Min Knowl Disc 18, 283–299 (2009). https://doi.org/10.1007/s10618-008-0115-0
DOI: https://doi.org/10.1007/s10618-008-0115-0