Abstract
We consider the general problem of utilizing both labeled and unlabeled data to improve classification accuracy. Under the assumption that the data lie on a submanifold of a high-dimensional space, we develop an algorithmic framework for classifying a partially labeled data set in a principled manner. The central idea of our approach is that classification functions are naturally defined only on the submanifold in question, rather than on the total ambient space. Using the Laplace-Beltrami operator, one produces a basis (the Laplacian Eigenmaps) for a Hilbert space of square-integrable functions on the submanifold. Recovering such a basis requires only unlabeled examples; once it is obtained, training can be performed using the labeled data set.
Our algorithm models the manifold using the adjacency graph for the data and approximates the Laplace-Beltrami operator by the graph Laplacian. We provide details of the algorithm, its theoretical justification, and several practical applications for image, speech, and text classification.
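To make the procedure concrete, here is a minimal sketch of the framework in Python (not the authors' reference implementation): a k-nearest-neighbor adjacency graph stands in for the manifold, the unnormalized graph Laplacian approximates the Laplace-Beltrami operator, and a least-squares fit in the span of the smoothest eigenvectors plays the role of training. The parameter names (`n_neighbors`, `n_eigs`) and the binary ±1 label encoding are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh, lstsq

def laplacian_eigenmap_classify(X, y_labeled, labeled_idx,
                                n_neighbors=8, n_eigs=10):
    """Classify a partially labeled data set.

    X           : (n, d) array of all points, labeled and unlabeled.
    y_labeled   : +/-1 labels for the rows indexed by labeled_idx.
    labeled_idx : indices of the labeled rows of X.
    """
    n = X.shape[0]
    # 1. Adjacency graph: connect each point to its k nearest neighbors.
    D = cdist(X, X)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[1:n_neighbors + 1]  # skip the point itself
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                          # symmetrize: undirected graph
    # 2. Graph Laplacian L = D - W approximates the Laplace-Beltrami operator.
    L = np.diag(W.sum(axis=1)) - W
    # 3. Eigenvectors with the smallest eigenvalues form the discrete
    #    Laplacian Eigenmaps basis (eigh returns eigenvalues in ascending order).
    _, E = eigh(L)
    E = E[:, :n_eigs]                               # smoothest n_eigs basis functions
    # 4. Fit the labels in this basis by least squares, using labeled rows only.
    a, *_ = lstsq(E[labeled_idx], y_labeled)
    # 5. Extend the fitted function to every point and classify by sign.
    return np.sign(E @ a)
```

For multiclass problems one would instead fit one indicator function per class in the same eigenbasis and assign each point to the class with the largest fitted value; the number of retained eigenvectors trades off the smoothness of the classifier against its expressiveness.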
Cite this article
Belkin, M., & Niyogi, P. (2004). Semi-supervised learning on Riemannian manifolds. Machine Learning, 56, 209–239. https://doi.org/10.1023/B:MACH.0000033120.25363.1e