Abstract
We investigate new matrix penalties for jointly learning linear models under orthogonality constraints, generalizing the work of Xiao et al. [24], who proposed a strictly convex matrix norm for orthogonal transfer. We show that this norm converges to a particular atomic norm as its convexity parameter decreases, which leads to new algorithms for minimizing it. We also investigate concave formulations of this norm, which correspond to more aggressive strategies for inducing orthogonality, and show how these penalties can also be used to learn sparse models with disjoint supports.
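To make the idea of an orthogonality-inducing matrix penalty concrete, here is a minimal sketch. It assumes a penalty of the form sum_i ||w_i||^2 + sum_{i != j} K_ij |w_i^T w_j| over the columns w_i of a weight matrix W, which is in the spirit of the orthogonal transfer regularizer but is not taken from this page. The function name `orthogonality_penalty` and the weight matrix `K` are illustrative assumptions.

```python
import numpy as np

def orthogonality_penalty(W, K):
    """Illustrative penalty (assumed form, not quoted from the paper):
    sum_i ||w_i||^2 + sum_{i != j} K[i, j] * |w_i . w_j|,
    where w_i are the columns of W and K holds nonnegative weights."""
    W = np.asarray(W, dtype=float)
    K = np.asarray(K, dtype=float)
    G = W.T @ W                    # Gram matrix of the columns of W
    sq_norms = np.trace(G)         # sum of squared column norms
    cross = np.abs(G) * K          # weighted absolute inner products
    np.fill_diagonal(cross, 0.0)   # keep only the i != j terms
    return float(sq_norms + cross.sum())

# Two orthogonal columns incur no cross-term penalty.
W_orth = np.array([[1.0, 0.0], [0.0, 1.0]])
# Two identical columns are penalized for their overlap.
W_same = np.array([[1.0, 1.0], [0.0, 0.0]])
K = np.ones((2, 2))

p_orth = orthogonality_penalty(W_orth, K)
p_same = orthogonality_penalty(W_same, K)
```

Under this assumed form, both matrices have the same sum of squared column norms, so the difference between `p_same` and `p_orth` comes entirely from the cross terms that penalize non-orthogonal columns.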
References
Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Optimization with sparsity-inducing penalties. Foundations and Trends® in Machine Learning 4(1), 1–106 (2011)
Bakker, B., Heskes, T.: Task clustering and gating for Bayesian multitask learning. J. Mach. Learn. Res. 4, 83–99 (2003)
Barvinok, A.: A Course in Convexity. American Mathematical Society (2002)
Baxter, J.: A model of inductive bias learning. Journal of Artificial Intelligence Research 12, 149–198 (2000)
Borwein, J.M., Lewis, A.S.: Convex Analysis and Nonlinear Optimization: Theory and Examples. CMS Books in Mathematics. Springer (2000)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press (2004)
Brickman, L.: On the field of values of a matrix. Proceedings of the American Mathematical Society, 61–66 (1961)
Cai, L., Hofmann, T.: Hierarchical document categorization with support vector machines. In: Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, pp. 78–87. ACM, New York (2004)
Calder, A.J., Burton, A.M., Miller, P., Young, A.W., Akamatsu, S.: A principal component analysis of facial expressions. Vision Res. 41(9), 1179–1208 (2001)
Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997)
Chandrasekaran, V., Recht, B., Parrilo, P.A., Willsky, A.S.: The convex geometry of linear inverse problems. Found. Comput. Math. 12(6), 805–849 (2012)
Evgeniou, T., Micchelli, C., Pontil, M.: Learning multiple tasks with kernel methods. J. Mach. Learn. Res. 6, 615–637 (2005)
Hwang, S.J.J., Grauman, K., Sha, F.: Learning a tree of metrics with disjoint visual features. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P., Pereira, F.C.N., Weinberger, K.Q. (eds.) Adv. Neural. Inform. Process Syst. 24, pp. 621–629 (2011)
Jacob, L., Vert, J.-P.: Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics 24(19), 2149–2156 (2008)
Lovász, L., Schrijver, A.: Cones of matrices and set-functions and 0-1 optimization. SIAM Journal on Optimization 1(2), 166–190 (1991)
McCallum, A., Rosenfeld, R., Mitchell, T.M., Ng, A.Y.: Improving text classification by shrinkage in a hierarchy of classes. In: Proceedings of the Fifteenth International Conference on Machine Learning, pp. 359–367. Morgan Kaufmann Publishers Inc., San Francisco (1998)
Obozinski, G., Taskar, B., Jordan, M.I.: Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing 20(2), 231–252 (2010)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Romera-Paredes, B., Argyriou, A., Berthouze, N., Pontil, M.: Exploiting unrelated tasks in multi-task learning. J. Mach. Learn. Res. - Proceedings Track 22, 951–959 (2012)
Shor, N.Z.: Quadratic optimization problems. Soviet Journal of Computer and Systems Sciences 25, 1–11 (1987)
Srebro, N., Rennie, J.D.M., Jaakkola, T.S.: Maximum-margin matrix factorization. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.) Adv. Neural. Inform. Process Syst. 17, pp. 1329–1336. MIT Press, Cambridge (2005)
Thrun, S., Pratt, L. (eds.): Learning to learn. Kluwer Academic Publishers, Norwell (1998)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58(1), 267–288 (1996)
Xiao, L., Zhou, D., Wu, M.: Hierarchical classification via orthogonal transfer. In: Getoor, L., Scheffer, T. (eds.) Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28-July 2, pp. 801–808. Omnipress (2011)
Xiao, L.: Dual averaging methods for regularized stochastic learning and online optimization. J. Mach. Learn. Res. 11, 2543–2596 (2010)
© 2014 Springer-Verlag Berlin Heidelberg
Cite this paper
Vervier, K., Mahé, P., D’Aspremont, A., Veyrieras, J.-B., Vert, J.-P. (2014). On Learning Matrices with Orthogonal Columns or Disjoint Supports. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science, vol 8726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44845-8_18
DOI: https://doi.org/10.1007/978-3-662-44845-8_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44844-1
Online ISBN: 978-3-662-44845-8