Abstract
We consider the problem of hierarchical or multitask modeling, in which we simultaneously learn the regression function and the underlying geometry and dependence between variables. We show how the gradients of the multiple related regression functions allow for joint dimension reduction and inference of variable dependencies, both across tasks and within each task individually. We provide Tikhonov regularization algorithms for both classification and regression that are efficient and robust for high-dimensional data, and a mechanism for incorporating a priori knowledge of task (dis)similarity into this framework. The utility of this method is illustrated on simulated and real data.
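To make the abstract's central idea concrete, here is a toy sketch, not the paper's algorithm: per-task Tikhonov-regularized (ridge) linear fits stand in for the kernelized gradient estimates, the pooled gradient outer product matrix yields a shared predictive subspace, and the cosine between task gradients serves as a simple dependence measure. All names and the linear setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two related tasks in 10 dimensions: both depend only on x[0] and x[1],
# with opposite signs on x[1], so their predictive subspaces overlap.
d, n = 10, 200
X1, X2 = rng.normal(size=(n, d)), rng.normal(size=(n, d))
y1 = X1[:, 0] + 0.5 * X1[:, 1] + 0.1 * rng.normal(size=n)
y2 = X2[:, 0] - 0.5 * X2[:, 1] + 0.1 * rng.normal(size=n)

def ridge_gradient(X, y, lam=1.0):
    """Tikhonov-regularized linear fit; for a linear model the coefficient
    vector is the (constant) gradient of the regression function."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

g1, g2 = ridge_gradient(X1, y1), ridge_gradient(X2, y2)

# Pooled gradient outer product matrix: its top eigenvectors span the
# predictive subspace shared across tasks (dimension reduction).
G = np.outer(g1, g1) + np.outer(g2, g2)
eigvals, eigvecs = np.linalg.eigh(G)
shared_dim = int((eigvals > 1e-2 * eigvals.max()).sum())

# Cosine similarity between task gradients as a crude dependence score.
sim = g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2))
```

With the simulated tasks above, two eigenvalues of `G` dominate (the two active coordinates), and the gradient cosine is positive but well below one, reflecting partial task relatedness. The paper's actual estimator is nonparametric and kernel-based; this linear version only illustrates the role the gradients play.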
Additional information
Editor: Tony Jebara.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Guinney, J., Wu, Q. & Mukherjee, S. Estimating variable structure and dependence in multitask learning via gradients. Mach Learn 83, 265–287 (2011). https://doi.org/10.1007/s10994-010-5217-4