Abstract
We consider the problem of hierarchical or multitask modeling, in which we simultaneously learn the regression function and the underlying geometry and dependence between variables. We show how the gradients of the multiple related regression functions allow for joint dimension reduction and inference of variable dependencies, both across tasks and within each task individually. We provide Tikhonov regularization algorithms for both classification and regression that are efficient and robust for high-dimensional data, and a mechanism for incorporating a priori knowledge of task (dis)similarity into this framework. The utility of this method is illustrated on simulated and real data.
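To make the abstract's central idea concrete, here is a toy sketch, not the paper's algorithm: per-task Tikhonov-regularized (ridge) linear fits stand in for the kernelized gradient estimates, the pooled gradient outer product matrix yields a shared predictive subspace, and the cosine between task gradients serves as a simple dependence measure. All names and the linear setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two related tasks in 10 dimensions: both depend only on x[0] and x[1],
# with opposite signs on x[1], so their predictive subspaces overlap.
d, n = 10, 200
X1, X2 = rng.normal(size=(n, d)), rng.normal(size=(n, d))
y1 = X1[:, 0] + 0.5 * X1[:, 1] + 0.1 * rng.normal(size=n)
y2 = X2[:, 0] - 0.5 * X2[:, 1] + 0.1 * rng.normal(size=n)

def ridge_gradient(X, y, lam=1.0):
    """Tikhonov-regularized linear fit; for a linear model the coefficient
    vector is the (constant) gradient of the regression function."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

g1, g2 = ridge_gradient(X1, y1), ridge_gradient(X2, y2)

# Pooled gradient outer product matrix: its top eigenvectors span the
# predictive subspace shared across tasks (dimension reduction).
G = np.outer(g1, g1) + np.outer(g2, g2)
eigvals, eigvecs = np.linalg.eigh(G)
shared_dim = int((eigvals > 1e-2 * eigvals.max()).sum())

# Cosine similarity between task gradients as a crude dependence score.
sim = g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2))
```

With the simulated tasks above, two eigenvalues of `G` dominate (the two active coordinates), and the gradient cosine is positive but well below one, reflecting partial task relatedness. The paper's actual estimator is nonparametric and kernel-based; this linear version only illustrates the role the gradients play.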
Additional information
Editor: Tony Jebara.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Guinney, J., Wu, Q. & Mukherjee, S. Estimating variable structure and dependence in multitask learning via gradients. Mach Learn 83, 265–287 (2011). https://doi.org/10.1007/s10994-010-5217-4