Abstract
In this paper we propose a novel algorithm for multi-task learning with boosted decision trees. We learn several different learning tasks with a joint model, explicitly addressing their commonalities through shared parameters and their differences with task-specific ones. This enables implicit data sharing and regularization. Our algorithm is derived using the relationship between ℓ1-regularization and boosting. We evaluate our learning method on web-search ranking data sets from several countries. Here, multi-task learning is particularly helpful, as data sets from different countries vary greatly in size owing to the cost of editorial judgments. Further, the proposed method obtains state-of-the-art results on a publicly available multi-task dataset. Our experiments validate that learning various tasks jointly can lead to significant improvements in performance with surprising reliability.
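The idea of a joint boosted model with shared and task-specific parameters can be sketched in code. The sketch below is an illustrative simplification, not the paper's actual algorithm: it uses hand-rolled regression stumps, and at each boosting round it greedily adds either one shared stump (fit on the pooled residuals of all tasks) or one stump per task, whichever reduces the pooled squared error more. All names (`fit_stump`, `BoostedMultiTask`, etc.) are hypothetical.

```python
import numpy as np

def fit_stump(X, r):
    # Exhaustively search all (feature, threshold) splits; return the stump
    # (feature, threshold, left_value, right_value, sse) minimizing the
    # squared error against the residual vector r.
    best = None
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs, rs = X[order, j], r[order]
        for i in range(1, len(xs)):
            if xs[i] == xs[i - 1]:
                continue
            thresh = 0.5 * (xs[i - 1] + xs[i])
            left, right = rs[:i].mean(), rs[i:].mean()
            sse = ((rs[:i] - left) ** 2).sum() + ((rs[i:] - right) ** 2).sum()
            if best is None or sse < best[4]:
                best = (j, thresh, left, right, sse)
    return best

def stump_predict(stump, X):
    j, thresh, left, right, _ = stump
    return np.where(X[:, j] <= thresh, left, right)

class BoostedMultiTask:
    """Greedy multi-task boosting sketch: shared stumps are applied to every
    task's prediction, task-specific stumps to one task only."""

    def __init__(self, n_rounds=30, lr=0.5):
        self.n_rounds, self.lr = n_rounds, lr

    def fit(self, tasks):
        # tasks: list of (X, y) pairs, one per task.
        self.shared = []                      # stumps shared by all tasks
        self.specific = [[] for _ in tasks]   # stumps private to each task
        preds = [np.zeros(len(y)) for _, y in tasks]
        for _ in range(self.n_rounds):
            residuals = [y - p for (X, y), p in zip(tasks, preds)]
            # Shared candidate: one stump fit on all tasks' pooled residuals.
            X_all = np.vstack([X for X, _ in tasks])
            shared_stump = fit_stump(X_all, np.concatenate(residuals))
            # Task-specific candidates: one stump per task.
            task_stumps = [fit_stump(X, r)
                           for (X, _), r in zip(tasks, residuals)]
            # Keep whichever option reduces the pooled squared error more.
            if shared_stump[4] <= sum(s[4] for s in task_stumps):
                self.shared.append(shared_stump)
                for k, (X, _) in enumerate(tasks):
                    preds[k] += self.lr * stump_predict(shared_stump, X)
            else:
                for k, ((X, _), s) in enumerate(zip(tasks, task_stumps)):
                    self.specific[k].append(s)
                    preds[k] += self.lr * stump_predict(s, X)
        return self

    def predict(self, task, X):
        # Prediction for a task = shared ensemble + that task's own ensemble.
        out = np.zeros(len(X))
        for s in self.shared + self.specific[task]:
            out += self.lr * stump_predict(s, X)
        return out
```

A task with little data benefits because shared stumps are fit on the pooled residuals of all tasks, which is the implicit data sharing the abstract refers to; the task-specific stumps absorb what the tasks do not have in common.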
Editors: Süreyya Özöğür-Akyüz, Devrim Ünay, and Alex Smola.
Cite this article
Chapelle, O., Shivaswamy, P., Vadrevu, S. et al. Boosted multi-task learning. Mach Learn 85, 149–173 (2011). https://doi.org/10.1007/s10994-010-5231-6