Abstract
In this paper we propose a novel algorithm for multi-task learning with boosted decision trees. We learn several different learning tasks with a joint model, explicitly addressing their commonalities through shared parameters and their differences with task-specific ones. This enables implicit data sharing and regularization. Our algorithm is derived using the relationship between ℓ1-regularization and boosting. We evaluate our learning method on web-search ranking data sets from several countries. Here, multi-task learning is particularly helpful, as data sets from different countries vary greatly in size owing to the cost of editorial judgments. Further, the proposed method obtains state-of-the-art results on a publicly available multi-task dataset. Our experiments validate that learning various tasks jointly can lead to significant improvements in performance with surprising reliability.
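The idea of a joint boosted model with shared and task-specific parameters can be sketched in code. The sketch below is an illustrative simplification, not the paper's actual algorithm: it uses hand-rolled regression stumps, and at each boosting round it greedily adds either one shared stump (fit on the pooled residuals of all tasks) or one stump per task, whichever reduces the pooled squared error more. All names (`fit_stump`, `BoostedMultiTask`, etc.) are hypothetical.

```python
import numpy as np

def fit_stump(X, r):
    # Exhaustively search all (feature, threshold) splits; return the stump
    # (feature, threshold, left_value, right_value, sse) minimizing the
    # squared error against the residual vector r.
    best = None
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs, rs = X[order, j], r[order]
        for i in range(1, len(xs)):
            if xs[i] == xs[i - 1]:
                continue
            thresh = 0.5 * (xs[i - 1] + xs[i])
            left, right = rs[:i].mean(), rs[i:].mean()
            sse = ((rs[:i] - left) ** 2).sum() + ((rs[i:] - right) ** 2).sum()
            if best is None or sse < best[4]:
                best = (j, thresh, left, right, sse)
    return best

def stump_predict(stump, X):
    j, thresh, left, right, _ = stump
    return np.where(X[:, j] <= thresh, left, right)

class BoostedMultiTask:
    """Greedy multi-task boosting sketch: shared stumps are applied to every
    task's prediction, task-specific stumps to one task only."""

    def __init__(self, n_rounds=30, lr=0.5):
        self.n_rounds, self.lr = n_rounds, lr

    def fit(self, tasks):
        # tasks: list of (X, y) pairs, one per task.
        self.shared = []                      # stumps shared by all tasks
        self.specific = [[] for _ in tasks]   # stumps private to each task
        preds = [np.zeros(len(y)) for _, y in tasks]
        for _ in range(self.n_rounds):
            residuals = [y - p for (X, y), p in zip(tasks, preds)]
            # Shared candidate: one stump fit on all tasks' pooled residuals.
            X_all = np.vstack([X for X, _ in tasks])
            shared_stump = fit_stump(X_all, np.concatenate(residuals))
            # Task-specific candidates: one stump per task.
            task_stumps = [fit_stump(X, r)
                           for (X, _), r in zip(tasks, residuals)]
            # Keep whichever option reduces the pooled squared error more.
            if shared_stump[4] <= sum(s[4] for s in task_stumps):
                self.shared.append(shared_stump)
                for k, (X, _) in enumerate(tasks):
                    preds[k] += self.lr * stump_predict(shared_stump, X)
            else:
                for k, ((X, _), s) in enumerate(zip(tasks, task_stumps)):
                    self.specific[k].append(s)
                    preds[k] += self.lr * stump_predict(s, X)
        return self

    def predict(self, task, X):
        # Prediction for a task = shared ensemble + that task's own ensemble.
        out = np.zeros(len(X))
        for s in self.shared + self.specific[task]:
            out += self.lr * stump_predict(s, X)
        return out
```

A task with little data benefits because shared stumps are fit on the pooled residuals of all tasks, which is the implicit data sharing the abstract refers to; the task-specific stumps absorb what the tasks do not have in common.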
Editors: Süreyya Özöğür-Akyüz, Devrim Ünay, and Alex Smola.
Cite this article
Chapelle, O., Shivaswamy, P., Vadrevu, S. et al. Boosted multi-task learning. Mach Learn 85, 149–173 (2011). https://doi.org/10.1007/s10994-010-5231-6