Abstract
There has been growing interest in practice in using unlabeled data together with labeled data in machine learning, and a number of different approaches have been developed. However, the assumptions these methods are based on are often quite distinct and not captured by standard theoretical models. In this paper we describe a PAC-style framework that can be used to model many of these assumptions, and analyze sample-complexity issues in this setting: that is, how much of each type of data one should expect to need in order to learn well, and what the basic quantities are that these numbers depend on. Our model can be viewed as an extension of the standard PAC model, where in addition to a concept class C, one also proposes a type of compatibility that one believes the target concept should have with the underlying distribution. In this view, unlabeled data can be helpful because it allows one to estimate compatibility over the space of hypotheses, and to reduce the search space to those hypotheses that, according to one's assumptions, are a priori reasonable with respect to the distribution. We discuss a number of technical issues that arise in this context, and provide sample-complexity bounds both for uniform convergence and for ε-cover-based algorithms. We also consider algorithmic issues, and give an efficient algorithm for a special case of co-training.
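To make the two-step idea in the abstract concrete, here is a minimal sketch in Python of the scheme it describes, not the paper's actual algorithm: estimate each hypothesis's compatibility with the distribution from unlabeled data, prune the hypothesis space to the a priori reasonable candidates, and then pick the surviving hypothesis with lowest empirical error on the (small) labeled sample. The threshold classifiers, the margin-style compatibility function margin_compat, and the cutoff tau below are all illustrative assumptions; the paper treats compatibility abstractly.

```python
import random

def semi_supervised_erm(hypotheses, predict, compat, unlabeled, labeled, tau=0.9):
    """Sketch: compatibility-based pruning followed by ERM.

    hypotheses: finite list of candidate hypotheses
    predict:    predict(h, x) -> predicted label
    compat:     compat(h, x) -> score in [0, 1]; an assumed, concrete
                per-example compatibility notion (illustrative only)
    unlabeled:  list of unlabeled points x
    labeled:    list of (x, y) pairs
    tau:        compatibility threshold (illustrative choice)
    """
    # Step 1: use the unlabeled sample to estimate each hypothesis's
    # compatibility with the underlying distribution.
    def est_compat(h):
        return sum(compat(h, x) for x in unlabeled) / len(unlabeled)

    # Step 2: shrink the search space to the hypotheses that look
    # a priori reasonable under the assumed compatibility notion.
    survivors = [h for h in hypotheses if est_compat(h) >= tau]
    if not survivors:  # fall back if the assumption prunes everything
        survivors = hypotheses

    # Step 3: ordinary empirical risk minimization over the smaller set,
    # using the labeled sample.
    def emp_err(h):
        return sum(predict(h, x) != y for x, y in labeled) / len(labeled)

    return min(survivors, key=emp_err)


# Toy instantiation: threshold classifiers on [0, 1]. The assumed
# compatibility notion says an example is compatible with h when it
# lies far from h's decision boundary (a low-density-separation bias).
def predict(t, x):
    return x >= t

def margin_compat(t, x, gap=0.1):
    return 1.0 if abs(x - t) > gap else 0.0

if __name__ == "__main__":
    random.seed(0)
    def draw():  # two well-separated clusters; true boundary at 0.5
        center = 0.25 if random.random() < 0.5 else 0.75
        return random.gauss(center, 0.05)

    unlabeled = [draw() for _ in range(1000)]                 # plentiful
    labeled = [(x, x >= 0.5) for x in (draw() for _ in range(5))]  # scarce
    hypotheses = [t / 100 for t in range(101)]
    best = semi_supervised_erm(hypotheses, predict, margin_compat,
                               unlabeled, labeled)
    print("chosen threshold:", best)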
Cite this paper
Balcan, M.-F., Blum, A.: A PAC-Style Model for Learning from Labeled and Unlabeled Data. In: Auer, P., Meir, R. (eds.) Learning Theory. COLT 2005. Lecture Notes in Computer Science, vol. 3559. Springer, Berlin, Heidelberg (2005). https://doi.org/10.1007/11503415_8
DOI: https://doi.org/10.1007/11503415_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26556-6
Online ISBN: 978-3-540-31892-7