Abstract
Recursive partitioning methods are among the most popular techniques in machine learning. The purpose of this paper is to investigate how to adapt this methodology to the bipartite ranking problem. Following in the footsteps of the TreeRank approach developed in Clémençon and Vayatis (Proceedings of the 2008 Conference on Algorithmic Learning Theory, 2008 and IEEE Trans. Inf. Theory 55(9):4316–4336, 2009), we present tree-structured algorithms designed for learning to rank instances based on classification data. The main contributions of the present work are the following: the practical implementation of the TreeRank algorithm, and well-founded solutions to the crucial issues related to the splitting rule and the choice of the "right" size for the ranking tree. From the angle embraced in this paper, splitting is viewed as a cost-sensitive classification task with data-dependent cost. Hence, up to straightforward modifications, any classification algorithm may serve as a splitting rule. We also propose to implement a cost-complexity pruning method after the growing stage in order to produce a "right-sized" ranking sub-tree with large AUC. In particular, performance bounds are established for pruning schemes inspired by recent work on nonparametric model selection. Finally, we propose indicators for variable importance and variable dependence, together with various simulation studies illustrating the potential of our method.
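To make the idea concrete, the following is a minimal, illustrative sketch of a TreeRank-style recursion, not the authors' implementation: `best_stump` stands in for an arbitrary cost-sensitive classifier used as the splitting rule, the `1/n_pos` and `1/n_neg` data-dependent costs are an assumed stand-in for the local AUC-increment criterion, and `tree_rank_scores` ranks the training sample by the left-to-right order of the leaves.

```python
import numpy as np

def best_stump(X, y, w_pos, w_neg):
    """Cost-sensitive decision stump: pick the (feature, threshold) cut
    minimising weighted misclassification cost, where a misclassified
    positive costs w_pos and a misclassified negative costs w_neg."""
    n = X.shape[0]
    cost = np.where(y == 1, w_pos, w_neg)
    tot_pos = cost[y == 1].sum()
    tot_neg = cost[y == 0].sum()
    best_cost = min(tot_pos, tot_neg)   # baseline: do not split at all
    best = (None, None, False)          # (feature, threshold, left_is_positive)
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j], kind="stable")
        xs, ys, cs = X[order, j], y[order], cost[order]
        cum_pos = np.cumsum(cs * (ys == 1))
        cum_neg = np.cumsum(cs * (ys == 0))
        for i in range(n - 1):
            if xs[i] == xs[i + 1]:
                continue                                  # cannot cut between ties
            c_lp = cum_neg[i] + (tot_pos - cum_pos[i])    # left block labelled positive
            c_rp = cum_pos[i] + (tot_neg - cum_neg[i])    # right block labelled positive
            if min(c_lp, c_rp) < best_cost:
                best_cost = min(c_lp, c_rp)
                best = (j, 0.5 * (xs[i] + xs[i + 1]), c_lp <= c_rp)
    return best

def _grow(X, y, idx, depth, max_depth, score, lo, hi):
    """Recursively partition the cell `idx`; the child predicted positive
    inherits the upper half (mid, hi) of the score interval."""
    if depth == max_depth or y[idx].min() == y[idx].max():
        score[idx] = 0.5 * (lo + hi)    # leaf: scored by its position in the tree
        return
    n_pos = int((y[idx] == 1).sum())
    n_neg = len(idx) - n_pos
    # data-dependent costs: 1/n_pos per positive, 1/n_neg per negative,
    # so the weighted error reflects the node's local class balance
    j, t, left_pos = best_stump(X[idx], y[idx], 1.0 / n_pos, 1.0 / n_neg)
    if j is None:                       # no split improves on the baseline
        score[idx] = 0.5 * (lo + hi)
        return
    mask = X[idx, j] <= t
    pos_idx = idx[mask] if left_pos else idx[~mask]
    neg_idx = idx[~mask] if left_pos else idx[mask]
    mid = 0.5 * (lo + hi)
    _grow(X, y, pos_idx, depth + 1, max_depth, score, mid, hi)
    _grow(X, y, neg_idx, depth + 1, max_depth, score, lo, mid)

def tree_rank_scores(X, y, max_depth=3):
    """Score the training sample with a TreeRank-style ranking tree."""
    score = np.empty(len(y), dtype=float)
    _grow(X, y, np.arange(len(y)), 0, max_depth, score, 0.0, 1.0)
    return score

# Toy usage: positives sit at larger x, so they should receive higher scores.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
scores = tree_rank_scores(X, y, max_depth=2)
```

Because any cost-sensitive classifier could replace `best_stump` at each node, the sketch mirrors the paper's point that the splitting rule is a pluggable component; growing to `max_depth` and then pruning back by cost-complexity is not shown here.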
References
Arlot, S. (2009). Model selection by resampling techniques. Electronic Journal of Statistics, 3, 557–624.
Boucheron, S., Bousquet, O., & Lugosi, G. (2005). Theory of classification: a survey of some recent advances. ESAIM: Probability and Statistics, 9, 323–375.
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Belmont, CA: Wadsworth & Brooks/Cole.
Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., & Hullender, G. (2005). Learning to rank using gradient descent. In ACM international conference proceeding series : Vol. 119. Proceedings of the 22nd international conference on machine learning (pp. 89–96). New York: ACM.
Clémençon, S., & Vayatis, N. (2008). Tree-structured ranking rules and approximation of the optimal ROC curve. In ALT ’08: Proceedings of the 2008 conference on algorithmic learning theory.
Clémençon, S., & Vayatis, N. (2008). Overlaying classifiers: a practical approach for optimal ranking. In NIPS ’08: Proceedings of the 2008 conference on advances in neural information processing systems.
Clémençon, S., & Vayatis, N. (2009). On partitioning rules for bipartite ranking. Journal of Machine Learning Research: Proceedings of AISTATS '09.
Clémençon, S., & Vayatis, N. (2009). Tree-based ranking methods. IEEE Transactions on Information Theory, 55(9), 4316–4336.
Clémençon, S., & Vayatis, N. (2010). Overlaying classifiers: a practical approach for optimal scoring. Constructive Approximation. doi:10.1007/s00365-010-9084-9
Clémençon, S., Lugosi, G., & Vayatis, N. (2005). Ranking and scoring using empirical risk minimization. In P. Auer, & R. Meir (Eds.), Lecture notes in computer science : Vol. 3559. Proceedings of COLT 2005 (pp. 1–15). Berlin: Springer.
Clémençon, S., Lugosi, G., & Vayatis, N. (2008). Ranking and empirical risk minimization of U-statistics. The Annals of Statistics, 36(2), 844–874.
Devroye, L., Györfi, L., & Lugosi, G. (1996). A probabilistic theory of pattern recognition. Berlin: Springer.
Ferri, C., Flach, P., & Hernández-Orallo, J. (2003). Improving the AUC of probabilistic estimation trees. In N. Lavrac, D. Gamberger, L. Todorovski, & H. Blockeel (Eds.), Proceedings of the 14th European conference on machine learning (ECML 2003), Cavtat-Dubrovnik, Croatia (pp. 121–132). Berlin: Springer.
Flach, P., & Matsubara, E. T. (2007). A simple lexicographic ranker and probability estimator. In J. N. Kok, J. Koronacki, R. L. de Mantaras, S. Matwin, D. Mladenic, & A. Skowron (Eds.), Proceedings of the 18th European conference on machine learning (pp. 575–582). Berlin: Springer.
Freund, Y., Iyer, R. D., Schapire, R. E., & Singer, Y. (2003). An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4, 933–969.
Friedman, J. (1996). Local learning based on recursive covering. Tech. Report, Dept. of Statistics, Stanford University, Stanford, CA 94305.
Friedman, J. (2001). Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5), 1189–1232. IMS Reitz Lecture, 1999.
Hastie, T., & Tibshirani, R. (1990). Generalized additive models. New York: Chapman & Hall/CRC.
Hüllermeier, E., & Vanderlooy, S. (2008). An empirical and formal analysis of decision trees for ranking. Technical Report Computer Science Series 56. Philipps Universität Marburg.
Hüllermeier, E., & Vanderlooy, S. (2009). Why fuzzy decision trees are good rankers. IEEE Transactions on Fuzzy Systems, 17(6), 1233–1244.
Joachims, T. (2002). Optimizing search engines using clickthrough data. In KDD’02: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 133–142). New York: ACM.
Lugosi, G., & Zeger, K. (1996). Concept learning using complexity regularization. IEEE Transactions on Information Theory, 42(1), 48–54.
Mallat, S. (1998). A wavelet tour of signal processing. San Diego: Academic Press.
Massart, P. (2006). Concentration inequalities and model selection. Lecture notes in mathematics. Berlin: Springer.
Nobel, A. (2002). Analysis of a complexity-based pruning scheme for classification trees. IEEE Transactions on Information Theory, 48(8), 2362–2368.
Pahikkala, T., Tsivtsivadze, E., Airola, A., Boberg, J., & Salakoski, T. (2007). Learning to rank with pairwise regularized least-squares. In Proceedings of SIGIR 2007 workshop on learning to rank for information retrieval (pp. 27–33).
Provost, F., & Domingos, P. (2003). Tree induction for probability-based ranking. Machine Learning, 52(3), 199–215.
Ripley, B. (1996). Pattern recognition and neural networks. Cambridge: Cambridge University Press.
Serfling, R. (1980). Approximation theorems of mathematical statistics. New York: Wiley.
Tsybakov, A. (2004). Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32(1), 135–166.
Yu, P., Wan, W., & Lee, P. (2008). Analyzing ranking data using decision tree. In Proceedings of the EMCL/PKDD’08 workshop on preference learning.
Editor: Hendrik Blockeel.
Clémençon, S., Depecker, M. & Vayatis, N. Adaptive partitioning schemes for bipartite ranking. Mach Learn 83, 31–69 (2011). https://doi.org/10.1007/s10994-010-5190-y