Informally, preferences are indications of the value of a given item in a certain context. Such valuations can occur in the form of a preference function, which assigns a utility value to the object, or in the form of a preference relation, which compares two different objects.
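In symbols, and in notation of our own choosing rather than that of any particular paper in this issue, the two representations can be written as

```latex
u \colon \mathcal{X} \to \mathbb{R},
\qquad\qquad
x \succ y \;\Longleftrightarrow\; u(x) > u(y),
```

where the relation on the right is the one induced by the utility function u. Every preference function induces a total preference relation in this way, but not conversely: a cyclic relation with x ≻ y ≻ z ≻ x, for instance, cannot be represented by any utility function.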

Methods for learning and predicting preferences in an automatic way are among the most active recent research topics in machine learning and related areas, such as recommender systems (Ricci et al. 2011; Jannach et al. 2010) and information retrieval (Liu 2011). In fact, it is fair to say that preference learning and ranking has established itself as a new subfield of machine learning in recent years, a development witnessed, among other things, by a continuously growing number of publications on the topic as well as the organization of dedicated events. Recent workshops in this area include three workshops on Preference Learning at the ECML/PKDD conferences 2008–2010, the Yahoo! Learning to Rank Challenge at ICML 2010, a workshop on Choice Models and Preference Learning at NIPS 2011, and a workshop on Preference Learning: Problems and Applications in Artificial Intelligence at ECAI 2012. Tutorials on the subject have been held at conferences such as SIGIR 2008, WWW 2009, ACML 2010, ECML/PKDD 2010, Discovery Science 2011, and ECAI 2012.

Despite being grounded in machine learning, preference learning is an interdisciplinary field with close connections to other disciplines, including statistics, social choice theory, multiple criteria decision making, decision under risk and uncertainty, and operations research (cf. Fig. 1). In all these areas, considerable progress has been made on preference representation and the automated learning of preference models; recent overviews include Fürnkranz and Hüllermeier (2011) and Kaci (2011). In March 2014, a full-week international workshop at Schloß Dagstuhl in Germany will be dedicated to an exchange of ideas and a homogenization of the work on this topic across these disciplines.

Fig. 1 Preference learning and related research areas within machine learning, information retrieval, and the decision sciences

Roughly speaking, preference learning deals with the induction of (predictive) preference models from empirical data, such as explicit preference information or implicit feedback about preferences. A classical problem setting in this regard is the approximation of the utility function of a single agent on the basis of some question-answer process. This area is often also referred to as preference elicitation (Haddawy et al. 2003; Bonilla et al. 2010). An alternative line of research investigates special types of preference models such as outranking relations (Greco et al. 2011), rough set-based decision rules (Greco et al. 2001), lexicographic orders (Dombi et al. 2007; Schmitt and Martignon 2006; Flach and Matsubara 2007; Yaman et al. 2011) and CP-nets (Boutilier et al. 2004). Popular application areas include collaborative filtering, where a customer’s preferences are estimated from the preferences of other customers (Breese et al. 1998; Sarwar et al. 2001), and personalized ranking techniques for information retrieval (Radlinski and Joachims 2005; Radlinski et al. 2011; Rudin 2009).
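To give a flavor of preference elicitation as a question-answer process, the following toy sketch, which is our own illustration and not a method from any of the cited papers, fits a linear utility from answers to pairwise comparison queries using Bradley-Terry-style logistic updates; the oracle `answer` and all parameter choices are hypothetical.

```python
import numpy as np

def elicit_utility(items, answer, n_queries=200, lr=0.1, seed=0):
    """Estimate a linear utility u(x) = w.x from pairwise comparisons.

    `items` is an (n, d) array of feature vectors; `answer(i, j)` returns
    True if the user prefers item i to item j (this oracle stands in for
    an interactive question-answer process).  Logistic (Bradley-Terry)
    gradient updates on the score difference."""
    rng = np.random.default_rng(seed)
    n, d = items.shape
    w = np.zeros(d)
    for _ in range(n_queries):
        i, j = rng.choice(n, size=2, replace=False)
        y = 1.0 if answer(i, j) else 0.0                  # observed answer
        p = 1.0 / (1.0 + np.exp(-(items[i] - items[j]) @ w))
        w += lr * (y - p) * (items[i] - items[j])         # gradient step
    return w

# A noiseless simulated user whose true utility weights are (1, 2).
items = np.random.default_rng(1).normal(size=(20, 2))
true_w = np.array([1.0, 2.0])
w = elicit_utility(items, lambda i, j: items[i] @ true_w > items[j] @ true_w)
print(np.argsort(-(items @ w)))   # estimated ranking of the 20 items
```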

Our goal for this special issue was to illustrate the many facets of this quickly developing field of machine learning, to provide an update on mature, high-quality work in the area, and to connect to related research fields. In our opinion, the following seven papers, selected from twelve submissions, meet this goal. Each paper was reviewed by three experts in the area; candidates for acceptance had to undergo at least one (and often more than one) thorough revision and were then re-reviewed by the same reviewers to ensure the high quality of the final selection.

In “Supervised clustering of label ranking data using label preference information”, Grbovic, Djuric, Guo and Vucetic study a generalization of supervised clustering, in which instances are associated, not with a single class label, but with a ranking over a fixed set of labels—in preference learning, this kind of structure is also known as a label ranking. The goal is to find a clustering structure such that instances grouped together are homogeneous both in terms of their feature representation and in terms of their associated label rankings. For example, a cluster of that kind may correspond to a group of customers with similar properties and similar preferences over a given set of items. The authors propose different clustering algorithms and compare them experimentally on synthetic and real label ranking data.
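As a rough illustration of the problem setting, and emphatically not the authors' algorithms, the following k-means-style sketch clusters instances by a weighted combination of squared feature distance and a Kendall-tau-style rank distance; the trade-off parameter `alpha` and the Borda-style centroid update are our own choices.

```python
import numpy as np

def rank_distance(pi, sigma):
    """Number of discordant label pairs (an unnormalized Kendall tau);
    pi[k] and sigma[k] are the rank positions of label k."""
    n = len(pi)
    return sum((pi[a] - pi[b]) * (sigma[a] - sigma[b]) < 0
               for a in range(n) for b in range(a + 1, n))

def cluster_label_rankings(X, R, k=3, alpha=0.5, n_iter=20, seed=0):
    """Toy k-means-style clustering of (feature vector, label ranking) pairs.

    Point-to-center distance mixes squared feature distance with the rank
    distance; each center consists of a feature mean plus a Borda-style
    central ranking (labels reordered by their average rank position)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=k, replace=False)
    cX, cR = X[idx].astype(float), R[idx].copy()
    assign = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d = np.array([[alpha * np.sum((x - cX[c]) ** 2)
                       + (1 - alpha) * rank_distance(r, cR[c])
                       for c in range(k)] for x, r in zip(X, R)])
        assign = d.argmin(axis=1)
        for c in range(k):
            members = assign == c
            if members.any():
                cX[c] = X[members].mean(axis=0)
                cR[c] = np.argsort(np.argsort(R[members].mean(axis=0)))
    return assign
```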

In contrast to label ranking, the problem of object ranking consists of learning a ranking function that allows one to order an arbitrary subset of objects, typically characterized in terms of a feature representation, from an underlying reference set. Calauzènes, Usunier, and Gallinari address an important theoretical topic in the scoring-based approach to object ranking, in which objects are evaluated in terms of a score first and sorted according to these scores afterward. In this case, the learning problem essentially reduces to finding a suitable scoring function. Since typical ranking losses are hard to optimize, many approaches make use of more convenient surrogate losses for training scoring functions. In their contribution “Calibration and regret bounds for order-preserving surrogate losses in learning to rank”, the authors analyze the property of “calibration”, which guarantees that minimizing the surrogate risk will yield models that are also optimal with regard to the true target loss. To this end, they consider the family of order-preserving (OP) losses which includes popular surrogate losses for ranking such as the squared error and pairwise losses. While these losses turn out to be calibrated with respect to performance measures like Discounted Cumulative Gain (DCG), which is frequently used to evaluate learning-to-rank approaches in information retrieval, they are not calibrated for other widely used measures, such as Mean Average Precision and Expected Reciprocal Rank. For several widely used OP losses, the authors furthermore derive quantitative surrogate regret bounds with respect to DCG-like evaluation measures.
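For reference, the standard definition of DCG and two typical order-preserving surrogate losses read, in our notation,

```latex
\mathrm{DCG}(\pi, r) = \sum_{i=1}^{n} \frac{2^{\,r_{\pi(i)}} - 1}{\log_2(i + 1)},
\qquad
\phi_{\mathrm{sq}}(s, y) = \sum_{i=1}^{n} (s_i - y_i)^2,
\qquad
\phi_{\mathrm{pw}}(s, y) = \sum_{y_i > y_j} \ell(s_i - s_j),
```

where π(i) denotes the object placed at position i, r the relevance grades, s the predicted scores, and ℓ a convex non-increasing function such as the logistic or hinge loss.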

In “Tune and mix: learning to rank using ensembles of calibrated multi-class classifiers”, Busa-Fekete, Kégl, Éltető, and Szarvas also address the learning-to-rank problem in a setting that is typical for information retrieval applications, in which objects (such as documents) are associated with degrees of relevance to a given query. The authors’ goal is to produce Bayes-optimal predictions with respect to the Normalized Discounted Cumulative Gain (NDCG) as a loss function. To this end, they devise a three-step procedure. First, an ensemble of conventional multi-class classifiers is trained to predict the relevance level of an object. The scores produced by these classifiers are then calibrated using probabilistic and regression losses. Finally, the calibrated models are combined into a single ranker.
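The pipeline below is a rough scikit-learn re-creation of this three-step scheme, not the authors' exact procedure; the choice of base classifiers, the sigmoid (Platt) calibration, and the expected-grade mixing rule are our own simplifications.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV

# X: query-document feature vectors, y: integer relevance grades (0..2).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(600, 10)), rng.integers(0, 3, size=600)

# Step 1: train an ensemble of multi-class classifiers on the grades.
# Step 2: calibrate their scores into probabilities (sigmoid/Platt here).
models = [CalibratedClassifierCV(m, cv=3).fit(X, y)
          for m in (RandomForestClassifier(n_estimators=50, random_state=0),
                    GradientBoostingClassifier(random_state=0))]

def score(X):
    """Step 3: mix the calibrated models into one scorer via the expected
    relevance grade, sum_k k * P(grade = k), averaged over the ensemble."""
    grades = np.arange(3)
    return np.mean([m.predict_proba(X) @ grades for m in models], axis=0)

# Rank the documents of one (toy) query by the combined score.
print(np.argsort(-score(X[:10])))
```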

Ensembles are also investigated in the next contribution, this time in the context of boosting. In “BoostingTree: parallel selection of weak learners in boosting, with application to ranking”, Kocsis, György, and Bán propose a new technique of that kind, in which several sequences of weak hypotheses are learned in parallel. More specifically, the sequences are arranged in a tree-like structure called boosting tree, and new weak hypotheses are added to those nodes that appear to be the most promising ones. While this approach is guaranteed to asymptotically achieve the performance of the base boosting algorithm, it is shown experimentally to have advantages especially in the case where the length of the sequence is limited.
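The following is a loose sketch of the parallel-selection idea, greatly simplified relative to the paper: each node of the tree stores a sequence of weak learners, and at every step the leaf with the smallest training loss is expanded with one child per candidate learning rate. Squared loss, regression stumps, and the two candidate rates are our own arbitrary choices, not those of Kocsis et al.

```python
import heapq
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosting_tree(X, y, n_expansions=30, rates=(0.3, 1.0)):
    """Toy 'boosting tree': every node holds a sequence of weak learners;
    at each step the leaf with the smallest squared training loss is
    popped and expanded with one regression stump per candidate rate."""
    root = (np.mean(y ** 2), 0, [])          # (training loss, id, sequence)
    heap, uid, best = [root], 1, root
    for _ in range(n_expansions):
        loss, _, seq = heapq.heappop(heap)
        pred = sum(nu * t.predict(X) for nu, t in seq) if seq else np.zeros(len(y))
        stump = DecisionTreeRegressor(max_depth=1).fit(X, y - pred)
        for nu in rates:                     # children: same stump, new rate
            child_pred = pred + nu * stump.predict(X)
            child = (np.mean((y - child_pred) ** 2), uid, seq + [(nu, stump)])
            uid += 1
            heapq.heappush(heap, child)
            best = min(best, child)
    return best[2]                           # best weak-learner sequence found

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)
print(len(boosting_tree(X, y)))              # length of the selected sequence
```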

In their paper “Efficient regularized least-squares algorithms for conditional ranking on relational data”, Pahikkala, Airola, Stock, De Baets, and Waegeman continue their recent work on a problem called “conditional ranking”. Operating in a relational setting, the goal is to predict a ranking of objects conditioned on a specific target object. This setting may be viewed as a generalization of label ranking (where a ranking is conditioned on an instance, but the objects to be ranked are only described as labels) and object ranking (where the ranking is unconditional, but the objects to be ranked are described in terms of features). The authors develop a general kernel framework for this task and propose efficient algorithms based on the optimization of squared regression and ranking loss functions. The performance of their approach is illustrated by means of experiments on synthetic and real-world data.
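A bare-bones version of pairwise-kernel regularized least-squares can be sketched as follows; forming the Kronecker-product kernel explicitly, as done here, is exactly the inefficiency the paper's algorithms are designed to avoid, so this illustrates the model rather than their methods, and all names and parameters are our own.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_conditional_ranker(U, V, Y, lam=1.0):
    """Naive pairwise-kernel RLS for conditional ranking: learn f(u, v)
    from relation values Y[i, j] on all (target u_i, object v_j) pairs,
    using the Kronecker-product kernel K_u * K_v.  Building K explicitly
    costs O((mn)^2) memory, which is fine for a toy example only."""
    Ku, Kv = rbf(U, U), rbf(V, V)
    K = np.kron(Ku, Kv)
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), Y.ravel())
    A = alpha.reshape(len(U), len(V))

    def rank_objects(u):
        """Rank all objects in V conditioned on the target object u."""
        scores = (rbf(u[None, :], U) @ A @ Kv).ravel()
        return np.argsort(-scores)
    return rank_objects

rng = np.random.default_rng(0)
U, V = rng.normal(size=(5, 3)), rng.normal(size=(8, 3))
Y = rng.normal(size=(5, 8))                    # toy relation values
print(fit_conditional_ranker(U, V, Y)(U[0]))   # object ranking for target 0
```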

Letham, Rudin, and Madigan tackle a novel type of problem called “sequential event prediction”. Although their formalization of this problem is inspired by related problems from the field of learning-to-rank, the goal is quite different, namely to extrapolate a current sequence of events by predicting the next one. For example, given the shopping history of a customer, which product will be purchased next? In contrast to other methods for sequential data, such as time series analysis or Markov models, the authors assume that the information contained in the current event sequence is properly reflected by the set of past events, while the order of these events can be neglected. They develop sequential event prediction algorithms that learn from a “sequence database”, and present applications of these algorithms in the domains of recommender systems and health event prediction.
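A deliberately simple, count-based illustration of this set-based assumption (ours, not the authors' learning algorithms): candidate next events are scored by how often they followed the already-observed events in a sequence database, disregarding the order of the past.

```python
from collections import Counter

def next_event_scores(sequences, observed):
    """Score candidate next events given the *set* of events seen so far,
    ignoring their order -- the key modeling assumption described above.
    Each candidate is scored by co-occurrence counts in the database."""
    observed = set(observed)
    scores = Counter()
    for seq in sequences:
        for t, event in enumerate(seq):
            if event not in observed:
                # +1 for every already-observed event that preceded `event`
                scores[event] += len(observed & set(seq[:t]))
    return scores.most_common()

db = [["milk", "bread", "butter"],
      ["bread", "butter", "jam"],
      ["milk", "bread", "jam"]]
print(next_event_scores(db, ["milk", "bread"]))   # butter and jam ranked next
```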

Corrente, Greco, Kadziński, and Słowiński aim to familiarize the machine learning community with the problem of multi-criteria decision aiding, an important research topic in the operations research and management science communities. In their contribution “Robust ordinal regression for preference learning and ranking”, they contrast their work with common learning-to-rank approaches in machine learning. Since their method also learns to rank a set of alternatives according to their (latent) utilities in a given context, it appears quite similar to machine learning algorithms developed for the same purpose. While this is indeed true to some extent, there are notable differences. In particular, whereas machine learning methods are mostly purely data-driven and strongly focused on accurate predictions, robust ordinal regression puts much more emphasis on user interaction and model interpretability. Moreover, instead of picking a single model (whose generalization performance is supposed to be the best), this method seeks to systematically explore the space of all ranking models that are consistent with the given preference information. The paper describes the method in detail, contrasts it with similar machine learning approaches, and illustrates it in a case study on evaluating innovation and economic benefit in 28 European countries.
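To convey the flavor of the “space of all compatible models” idea, the following sketch checks a necessary preference for the special case of a linear additive value function, using a small LP with SciPy; the normalization w ≥ 0, Σw = 1 and the margin `eps` are our own simplifications of the robust ordinal regression machinery, not the authors' formulation.

```python
import numpy as np
from scipy.optimize import linprog

def necessarily_preferred(X, stated, p, q, eps=1e-3):
    """Check whether alternative p is *necessarily* preferred to q, i.e.
    u(p) >= u(q) holds for every additive linear value function
    u(x) = w.x (w >= 0, sum w = 1) compatible with the stated preferences.
    `stated` is a list of index pairs (a, b) meaning 'a preferred to b'."""
    A_ub = np.array([-(X[a] - X[b]) for a, b in stated])  # w.(xa-xb) >= eps
    b_ub = np.full(len(stated), -eps)
    res = linprog(c=X[p] - X[q],                 # minimize w.(xp - xq)
                  A_ub=A_ub, b_ub=b_ub,
                  A_eq=np.ones((1, X.shape[1])), b_eq=[1.0],
                  bounds=(0, None))
    # Necessary preference: the worst compatible model still favors p.
    return res.success and res.fun >= 0.0

X = np.array([[0.9, 0.2], [0.5, 0.5], [0.1, 0.8]])  # 3 alternatives, 2 criteria
print(necessarily_preferred(X, stated=[(0, 1)], p=0, q=2))
```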