Comparison of Active Learning Strategies and Proposal of a Multiclass Hypothesis Space Search

dos Santos, Davi P.; de Carvalho, André C. P. L. F.

doi:10.1007/978-3-319-07617-1_54

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8480)

Included in the following conference series:

International Conference on Hybrid Artificial Intelligence Systems

Abstract

Induction of predictive models is one of the most frequent data mining tasks. However, for several domains, the available data is unlabeled and the generation of a class label for each instance may have a high cost. An alternative to reduce this cost is the use of active learning, which selects instances according to a criterion of relevance. Diverse sampling strategies for active learning, following different paradigms, can be found in the literature. However, there is no detailed comparison between these strategies and they are usually evaluated for only one classification technique. In this paper, strategies from different paradigms are experimentally compared using different learning algorithms and datasets. Additionally, a multiclass hypothesis space search called SG-multi is proposed and empirically shown to be feasible. Experimental results show the effectiveness of active learning and which classification techniques are more suitable to which sampling strategies.
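To make "selects instances according to a criterion of relevance" concrete, here is a minimal sketch of one common criterion from the active learning literature: entropy-based uncertainty sampling over a pool of unlabeled instances. It is an illustration only. It is not the SG-multi method proposed in the paper, and the abstract does not say whether this exact criterion is among the strategies compared. The function names and the probability values are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_query(pool_probs):
    """Return the index of the pool instance with the highest predictive entropy.

    pool_probs holds one class-probability vector per unlabeled instance,
    as estimated by whatever classifier is currently trained.
    """
    return max(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]))


# Hypothetical probability estimates for four unlabeled instances, three classes.
pool_probs = [
    [0.90, 0.05, 0.05],  # confident prediction, low entropy
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
    [0.34, 0.33, 0.33],  # nearly uniform, highest entropy
]

query_index = select_query(pool_probs)
print(query_index)  # index of the instance to send to the labeler
```

In a full active learning loop, the selected instance would be labeled, moved to the training set, and the classifier retrained before the next query is chosen.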


Author information

Authors and Affiliations

Computer Science Department, Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, SP, Brazil
Davi P. dos Santos & André C. P. L. F. de Carvalho

Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, University of Cyprus, 75 Kallipoleos Avenue, 1678, Nicosia, Cyprus
Marios Polycarpou
Department of Computer Science, University of Sao Paulo at Sao Carlos, Sao Carlos, SP, Brazil
André C. P. L. F. de Carvalho
Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, No. 415, Chien Kung Road, 80778, Kaohsiung, Taiwan
Jeng-Shyang Pan
Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland
Michał Woźniak
University of Salamanca, Plaza de la Merced S/N, 37008 Salamanca, Spain and University of A Coruna, Escuela Universitaria Politecnica, Departamento de Enxeñeria Industrial, A Coruna, Spain
Héctor Quintian
University of Salamanca, Plaza de la Merced S/N, 37008, Salamanca, Spain
Emilio Corchado

About this paper

Cite this paper

dos Santos, D.P., de Carvalho, A.C.P.L.F. (2014). Comparison of Active Learning Strategies and Proposal of a Multiclass Hypothesis Space Search. In: Polycarpou, M., de Carvalho, A.C.P.L.F., Pan, JS., Woźniak, M., Quintian, H., Corchado, E. (eds) Hybrid Artificial Intelligence Systems. HAIS 2014. Lecture Notes in Computer Science, vol 8480. Springer, Cham. https://doi.org/10.1007/978-3-319-07617-1_54

DOI: https://doi.org/10.1007/978-3-319-07617-1_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07616-4
Online ISBN: 978-3-319-07617-1
eBook Packages: Computer Science, Computer Science (R0)
