Abstract
We propose a novel regularizer for training an auto-encoder for unsupervised feature extraction. We explicitly encourage the latent representation to contract the input space by regularizing the norm of the Jacobian (analytically) and the Hessian (stochastically) of the encoder's output with respect to its input, at the training points. While the penalty on the Jacobian's norm ensures robustness to small corruptions of the input, constraining the norm of the Hessian extends this robustness further away from the training samples. From a manifold learning perspective, balancing this regularization against the auto-encoder's reconstruction objective yields a representation that varies most when moving along the data manifold in input space and is least sensitive in directions orthogonal to the manifold. The second-order regularization, via the Hessian, penalizes curvature and thus favors smooth manifolds. We show that the proposed technique, while remaining computationally efficient, yields representations significantly better suited for initializing deep architectures than previously proposed approaches, beating state-of-the-art performance on a number of datasets.
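For concreteness, the combined objective described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the single-layer tied-weight sigmoid encoder/decoder, the hyperparameter names (lam, gamma, sigma) and the number of corrupted copies are assumptions, and jax.jacfwd stands in for the analytic Jacobian the paper computes in closed form for a sigmoid layer.

import jax
import jax.numpy as jnp

def encoder(params, x):
    W, b, _ = params
    return jax.nn.sigmoid(W @ x + b)

def decoder(params, h):
    W, _, c = params
    return jax.nn.sigmoid(W.T @ h + c)  # tied weights

def cae_h_loss(params, x, key, lam=0.1, gamma=0.1, sigma=0.3, n_corrupt=4):
    h = encoder(params, x)
    r = decoder(params, h)
    recon = jnp.sum((x - r) ** 2)                    # reconstruction error

    J = jax.jacfwd(lambda v: encoder(params, v))(x)  # analytic Jacobian penalty
    contract = jnp.sum(J ** 2)                       # ||J_f(x)||_F^2

    # Stochastic Hessian penalty: E_eps ||J_f(x + eps) - J_f(x)||_F^2 with
    # eps ~ N(0, sigma^2 I), estimated from a few corrupted copies of x.
    eps = sigma * jax.random.normal(key, (n_corrupt,) + x.shape)
    J_near = jax.vmap(
        lambda e: jax.jacfwd(lambda v: encoder(params, v))(x + e)
    )(eps)
    hessian_pen = jnp.mean(jnp.sum((J_near - J) ** 2, axis=(1, 2)))

    return recon + lam * contract + gamma * hessian_pen

The stochastic estimate avoids forming the Hessian of the encoder explicitly; as the corruption scale sigma shrinks, the finite difference of Jacobians at x and x + eps approximates a directional second derivative, so penalizing it penalizes curvature of the learned mapping.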
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Rifai, S. et al. (2011). Higher Order Contractive Auto-Encoder. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science, vol. 6912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23783-6_41
DOI: https://doi.org/10.1007/978-3-642-23783-6_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23782-9
Online ISBN: 978-3-642-23783-6
eBook Packages: Computer Science (R0)