1 Introduction

Preferences are fundamental to decision making and have been researched in areas such as knowledge representation, decision theory, social choice, and constraint satisfaction. Preferences amount to a total order or preorder on a set of outcomes (alternatives). In some settings, for instance in voting theory, the number of outcomes is small enough to allow an explicit enumeration as a method to represent preference relations. However, in other settings outcomes are specified in terms of attributes, each with its own domain, where an outcome is a tuple of values, one for each attribute. Such outcome spaces are called combinatorial domains. If attribute domains have at least two values, the cardinality of a combinatorial domain is exponential in the number of attributes. Consequently, explicit enumeration of preference orders, even for combinatorial domains over as few as ten attributes, is infeasible.

To represent preferences over combinatorial domains, we use languages that concisely express agent’s criteria for preferring one outcome over another, thus determining preference orders on outcomes. Languages exploiting lexicographic orders have been especially extensively studied. They include lexicographic strategies [17], lexicographic preference trees [2], partial lexicographic preference trees [13] (our focus in this paper), and preference trees [6, 14]. These models naturally support preference reasoning [18, 19]. Most recently, Bräuning et al. [3] studied learning of preference lists, a model orthogonal to the model of PLP-trees. On the one hand, preference lists can capture preferences that cannot be captured by PLP-trees. On the other hand, preference lists cannot capture conditional importances that can naturally be modeled by PLP-trees.

Lexicographic preference models have structure that factors the agent’s preference order into the importance, sometimes conditional, of attributes, and preference orders, also sometimes conditional, on values of individual attribute domains. This structure can be exploited for preference elicitation. It also provides useful insights into what is important for an agent when choosing among available outcomes. In particular, it makes it easy to compare outcomes (dominance testing) and to identify outcomes that are most preferred.

In this paper, we focus on lexicographic models given by partial lexicographic preference trees, or PLP-trees for short [13]. PLP-trees that impose strong restrictions on the structure, for instance, those with unconditional importance of attributes and unconditional preference orders on values of attribute domains, can be elicited effectively from the agents. However, in general, PLP-trees are difficult to elicit directly and have to be learned, that is, built from examples of pairwise comparisons or other observed expressions of the agent’s preference [11]. Unrestricted PLP-trees may have size of the order of the size of the underlying combinatorial domain. Such large trees offer no advantages over explicit enumerations of preference orders. However, PLP-trees learned from a set E of examples have size O(|E|). This gives us control over the size of learned trees but the predictive power of trees learned from small sets of examples may be limited. Learning forests of small trees and using some voting aggregation method was proposed as a way to circumvent the problem. Following ideas proposed by Breiman [4], Liu and Truszczynski [11] studied learning forests of PLP-trees and used the Pairwise Majority rule (PMR) to obtain a new type of a lexicographic preference model [11].

There are two main problems with this last approach. First, the PMR does not (in general) yield an order. Second, it does not lead to any obvious algorithms for reasoning tasks other than dominance testing. For instance, it does not seem to lead to natural approaches to preference optimization, that is, computing optimal or near-optimal outcomes. In this paper, we extend the results by Liu and Truszczynski [11] by replacing the PMR with several common voting rules. Using voting rules to aggregate preference orders defined by lexicographic models has drawn significant attention lately. Lang and Xia [10] studied sequential voting protocols. Lang et al. [9] established computational properties of voting-based methods to aggregate LP-trees, and Liu and Truszczynski [12] conducted an experimental study of aggregating LP-trees by voting using SAT-based tools.

Using voting rules to aggregate forests of PLP-trees turns out to yield preference models where dominance testing is as direct as with the PMR. However, preference optimization becomes feasible, too. As there are many voting rules that could be used, and they pose different computational challenges, it is important to study whether some rules are better than others. Earlier work in the standard voting setting showed significant robustness of the aggregated order to the choice of a voting rule. Comparing several common voting rules, researchers found that, except for Plurality, these voting methods show a high consensus on the resulting aggregated preference orderings [5, 15]. Our results on rank correlation in the setting when individual preferences are represented by PLP-trees over possibly large combinatorial domains also show high consensus among orders determined by the PLP-forest models, at levels consistent with those reported for the voting setting.

As long as we are interested in dominance testing only, one can build predictive models by learning decision trees.Footnote 1 We compare the quality of learned PLP-trees and forests with those of learned decision trees. Decision trees turn out to be more accurate for dominance testing. However, they have drawbacks. Decision trees do not in general represent order nor partial order relations. They do not provide any explicit information about underlying orders and so, do not provide insights into how agents whose preferences they aim to model make decisions. Lastly, they do not lend themselves easily to tasks involving preference optimization.

To summarize, our contributions are as follows. (1) We propose to model preferences by forests of PLP-trees, aggregated by voting rules. We study computational complexity of key reasoning tasks for the resulting models. (2) We demonstrate that the models we studied had higher predictive accuracy than the models given by a single PLP-tree, and by a PLP-forest with the PMR. (3) We show that for several voting rules the orders obtained by aggregating PLP-forests are quite close to each other. This alleviates the issue of selecting the “right” rule. (4) For the proposed PLP-forest preference models, we develop methods to compute optimal and near-optimal outcomes, the reasoning task that has no natural solutions under models based on the PMR. (5) We compare our models with those based on decision trees. We show that the latter are more accurate but, as noted above, have shortcomings in other aspects.

The higher accuracy of models based on decision trees on the dominance testing task does not invalidate PLP-tree based approaches, as they have important advantages noted above. Rather, they suggest an intriguing question of whether PLP-trees (forests) could be combined with decision trees (forests) retaining the best features of each approach. One possibility might be to use PLP-trees to some top-level partitioning of outcomes, with decision trees used for low-level details.

2 Partial Lexicographic Preference Trees and Forests

Let \(\mathcal {A}=\{X_1,\ldots ,X_p\}\) be a set of attributes, each attribute \(X_i\) having a finite domain \(D_i\). The corresponding combinatorial domain over \(\mathcal {A}\) is the Cartesian product \( CD (\mathcal {A})=D_1 \times \ldots \times D_p\). We call elements of combinatorial domains outcomes.

A PLP-tree over \( CD (\mathcal {A})\) is an ordered labeled tree, where: (1) every non-leaf node is labeled by some attribute from \(\mathcal {A}\), say \(X_i\), and by a local preference \(>_i\), a total strict order on the corresponding domain \(D_i\); (2) every non-leaf node labeled by an attribute \(X_i\) has \(|D_i|\) outgoing edges; (3) every leaf node is denoted by \(\Box \); and (4) on every path from the root to a leaf each attribute appears at most once as a label.

Each outcome \(\alpha \in CD (\mathcal {A})\) determines in a PLP-tree T its outcome path, \(H(\alpha ,T)\). It starts at the root of T and proceeds downward. When at a node d labeled with an attribute X, the path descends to the next level based on the value \(\alpha (X)\) of the attribute X in the outcome \(\alpha \) and on the local preference order associated with d. Namely, if \(\alpha (X)\) is the i-th most preferred value in this order, the path descends to the i-th child of d. We denote by \(\ell ^T(\alpha )\) the index of the leaf in which the outcome path \(H(\alpha ,T)\) ends (the leaves are indexed from left to right with integers \(0,1,\ldots \)).

We say that an outcome \(\alpha \) is at least as good as an outcome \(\beta \) (\(\alpha \succeq _T \beta \)) if \(\ell ^T(\alpha ) \le \ell ^T(\beta )\). The associated equivalence and strict order relations \(\approx _T\) and \(\succ _T\) are specified by the conditions \(\ell ^T(\alpha ) = \ell ^T(\beta )\) and \(\ell ^T(\alpha ) < \ell ^T(\beta )\), respectively. Preference relations modeled by PLP-trees are total preorders.

The leaves of a PLP-tree can be indexed in time O(s(T)), where s(T) is the number of nodes in T, by adapting the inorder traversal to the task. After that, the value \(\ell ^T(\alpha )\) can be computed in time O(h(T)), where h(T) is the height of tree T. Thus, assuming the indices were precomputed, all three relations can be decided in time O(h(T)).

To illustrate, let us consider the domain of cars described by four multi-valued attributes. The attribute BodyType (B) has three values: minivan (v), sedan (s), and sport (r). The attribute Make (M) can either have value Honda (h) or Ford (f). The Price (P) can be low (l), medium (d), or high (g). Finally, Transmission (T) can be automatic (a) or manual (m). An agent’s preference order on cars from this space could be expressed by a PLP-tree T in Fig. 1.

Fig. 1.
figure 1

A PLP-tree T over the car domain

The tree tells us that BodyType is the most important attribute to the agent and that she prefers minivans, followed by sedans and by sport cars. Her next most important attribute is contingent upon what type of cars the agent is considering. For minivans, her most important attribute is Make, where she likes Honda more than Ford. Among sedans, her most important attribute is Price, where she prefers medium-priced cars over low-priced ones, and those over high-priced ones. She does not differentiate between sport cars; they are least preferred.

To compare a Ford sedan with a middle-range price and an automatic transmission (\( \langle s, f, d, a \rangle \), in our notation) and a Honda sedan with a high-range price and a manual transmission (that is, \( \langle s, h, g, m \rangle \)), we traverse the tree T. We see that the cars diverge on the node labeled by attribute P, and that the Ford car falls to leaf 2 and the Honda car leaf 4. Thus, the Ford car is preferred to the Honda car.

A PLP-forest is a finite set of PLP-trees. When extended with a voting rule to aggregate orders given by its constituent PLP-trees, a PLP-forest specifies a single preference order on the space of outcomes. In this way, PLP-forests with voting rules can be viewed as models of preference relations.

3 Voting in Partial Lexicographic Preference Forests

To aggregate PLP-forests we consider the voting rules Top-k Clusters, Plurality, Borda, Copeland, and Maximin. In our experiments, we also consider the earlier model of PLP-forests combined with the PMR. In general, the PMR does not yield a sensible preference relation as it suffers from the Condorcet paradox [7]. Nevertheless, it performs well in dominance testing [4, 11]. We consider it here as the baseline for the voting rules.

The five voting rules are scoring rules in the sense that, given a PLP-forest P, they assign to each outcome o the score \(S_r(o,P)\) (where r refers to a voting rule). The scores define the preference relation \(\succeq \) as follows: for outcomes \(o,o'\), we have \(o \succeq o'\) if and only if \(S_r(o,P) \ge S_r(o',P)\). Clearly, the relation defined in this way is a total preorderFootnote 2.

The first three rules we discuss are versions of the well-known positional scoring rules used with total preference orders. They are adjusted here to the case of total preorders. Each tree T in a PLP-forest P determines the score \(S_r(o,T)\) of an outcome o in the preference preorder given by T. This score depends on the position of the preorder “cluster” containing o, its size, and on the number of outcomes in the clusters that are more preferred than the one containing o. In each case we consider, namely, Top-k Clusters, Plurality and Borda the specific formula for \(S_r(o,T)\) is a natural generalization of the corresponding formula for the standard case of total orders to total preorders. In each case, the sum of scores with respect to all trees in the forest P yields the score \(S_r(o,P)\).

Below we introduce the five voting rules adjusted to the setting of total preorders (they are commonly defined for strict total orders), as well as the PMR.

Top-k Clusters (where k is a positive integer): For an outcome o, we define \(S_{tkc}(o,T)=\max \{k-\ell ^T(o),0\}\) and set

$$ S_{tkc}(o,P) = \sum \limits _{T\in P} S_{tkc}(o,T). $$

Assuming that we precomputed indices of leaves in all trees, which can be accomplished in time O(s(P)), where s(P) denotes the number of nodes in all trees in P, we can compute \(S_{tkc}(o,P)\), for any outcome o, in time \(O(t(P)\cdot \max \{h(T): T\in P\})\), where t(P) is the number of trees in P. We note that Top Cluster (\(k=1\)) is a rule similar to approval, where each tree approves all outcomes in the leftmost cluster (and only those outcomes); and Top-k Cluster rules with \(k>1\) are its natural generalizations.

Plurality: Let \(\ell ^T_0\) be the set of most preferred outcomes in a PLP-tree T (the set of all outcomes o with \(\ell ^T(o)=0\)). Next, let \(\varDelta ^T(o)=1\) if outcome o is a most preferred one in T, and \(\varDelta ^T(o)=0\), otherwise. We define the Plurality score \(S_{\textit{pl}}(o,P)\) by setting

$$ S_{\textit{pl}}(o,P) = \sum \limits _{T\in P} \frac{\varDelta ^T(o)}{|\ell ^T_0|}. $$

We can compute \(\varDelta ^T(o)\) and \(|\ell ^T_0|\) in time O(h(T)). Thus, \(S_{\textit{pl}}(o,P)\) can be computed in time \(O(t(P) \cdot \max \{h(T): T\in P\})\).

Borda: Let T be a PLP-tree. We define \(\ell _i^T\) to be the set of all outcomes o with \(\ell ^T(o)=i\) (the \(i^{th}\) cluster in the order defined by T). Let c(o) be the cluster containing o (in our notation, \(c(o) = \ell ^T_{\ell ^T(o)}\)). We define

$$ S_b(o,T) = \frac{\sum \limits _{1\le j\le |c(o)|} (n-j-\sum \limits _{0\le i<\ell ^T(o)} |\ell ^T_i|)}{|c(o)|}, $$

where n is the size of the combinatorial domain,Footnote 3 and set \(S_b(o,P)\) as follows:

$$ S_b(o,P) = \sum \limits _{T\in P} S_b(o,T). $$

Assuming that the sizes \(|\ell ^T_i|\) of clusters and the quantities \(\sum \limits _{0\le i<\ell } |\ell ^T_i|)\) are precomputed, which can be done in time O(s(P)), we can compute \(S_b(o,T)\) in time O(h(T)). Consequently, \(S_b(o,P)\) can be computed in time \(O(t(P)\cdot \max \{h(T): T\in P\})\).

Copeland: Let us define \(N_P(o,o')\) to be the number of trees \(T\in P\) such that \(o\succ _T o'\). Informally, \(N_P(o,o')\) is the number of trees that declare o more preferred to \(o'\). If \(N_P(o,o')>N_P(o',o)\), then o wins with \(o'\) in P. If \(N_P(o,o')<N_P(o',o)\), then o loses to \(o'\) in P. The Copeland score \(S_{\textit{cp}}(o,P)\) is given by the difference between the number of pairwise wins and the number of pairwise losses of o:

$$\begin{aligned} S_{\textit{cp}}(o,P) =&|\{o'\in C\setminus \{o\}: N_P(o,o')>N_P(o',o)\}| \\&- |\{o'\in C\setminus \{o\}: N_P(o,o')<N_P(o',o)\}|. \end{aligned}$$

Maximin: This method (also known as the Simpson-Kramer method) is considered in several variants in which the definition of the Maximin scoring function \(S_{\textit{xn}}(o,P)\) may include winning votes, margins, and pairwise oppositions. In this paper, we will define it in terms of the margin for an outcome, that is, the smallest difference between the numbers of pairwise wins and pairwise losses against all opponents.

$$ S_{\textit{xn}}(o,P) = \min \limits _{o'\in C\setminus \{o\}} (N_P(o,o') - N_P(o',o)). $$

Both the Copeland score and the Maximin score can be computed in time \(O(n\cdot t(P)\cdot \max \{h(T): T\in P\})\), where n is the size of the combinatorial domain.

Pairwise Majority Rule (PMR): The PMR is not a scoring rule. We use it to decide preferences between outcomes. Specifically, given two outcomes o and \(o'\), \(o \succ _{pm} o'\) if \(N_P(o,o')>N_P(o',o)\). Thus, deciding pairwise preferences takes time \(O(t(P)\cdot \max \{h(T): T\in P\})\).

4 Computational Complexity

In the previous sections we listed estimates of the running time of algorithms that could be used to compute scores of the five scoring rules we consider. Here we complete the discussion by considering the complexity of the problems SCORE, QUALITY, and OPTIMIZATION.

SCORE (for a scoring rule r): Given a PLP-forest P, an outcome o, and a positive rational number s, decide whether \(S_r(o,P) \ge s\).

QUALITY (for a scoring rule r): Given a PLP-forest P and a positive rational number \(\ell \), decide whether there is an outcome o such that \(S_r(o,P) \ge \ell \).

OPTIMIZATION (for a scoring rule r): Given a PLP-forest P, compute an outcome with the highest score (an optimal outcome).

The picture for the rules Top-k Clusters, Plurality and Borda is complete. As we noted above, the SCORE problem for Borda is in the class P, and Lang et al. [9] proved that the QUALITY and OPTIMIZATION problems for Borda are NP-complete and NP-hard, respectively. The SCORE problem for Top-k Clusters and Plurality is in P (cf. our comments in the previous section) and the following two results show that in each case, the problems QUALITY and OPTIMIZATION are NP-complete and NP-hard, respectively.

Theorem 1

The QUALITY problem for Top Cluster is NP-complete.

Proof

(Sketch). Membership is obvious, as one can guess an outcome o in O(p) time, and verify that \(S_{tc}(o,P)\ge l\) in polynomial time in the size of P. Hardness is proved by reduction from MIN2SAT: Given a set \(\varPhi \) of n 2-clauses \(\{ C_1,\ldots ,C_n \}\) over a set of propositional variables \(\{ X_1,\ldots ,X_p \}\), and a positive integer g (\(g \le n\)), decide whether there is a truth assignment that satisfies at most g clauses in \(\varPhi \).

Specifically, let \(\varPhi \) be a collection of n 2-clauses. For each clause in \(\varPhi \), say \(C=X_i \vee \lnot X_j\), we create a PLP-tree \(T_C\) in Fig. 2, treating propositional variables \(X_i\) as attributes with the domains \(\{0_i,1_i\}\) (the form of the tree for other types of 2-clauses is evident from the example we selected for illustration). We also set \(l=n-g\).

A truth assignment v falsifies a clause C in \(\varPhi \) if and only if, when viewed as an outcome, it belongs to the top cluster of the corresponding tree \(T_C\). Clearly, there is an assignment satisfying at most g clauses of \(\Pi \) if and only if there is an assignment that falsifies at least l clauses in \(\varPhi \). The latter is equivalent to the existence of an outcome that belongs to the top cluster of at least l trees \(T_C\), \(C\in \varPhi \), that is, an outcome v such that \(S_{tc}(v,P) \ge l\).    \(\square \)

Fig. 2.
figure 2

PLP-tree

The case of the Top-k Cluster rule with \(k>1\) is dealt with in the next theorem.

Theorem 2

The QUALITY problem for Top-k Clusters, where \(k>1\), is NP-complete.

Proof

(Sketch). The membership in NP is evident. To prove NP-hardness for when \(k>1\), we again reduce from the MIN2SAT problem, but the construction is different. For every clause \(C\in \varPhi \), say \(C = X_i \vee \lnot X_j \in \varPhi \), we construct a set \(P_C\) of three PLP-trees shown in Fig. 3 (the construction is evident from the example we selected here for illustration).

We note that if an assignment v satisfies a clause \(C\in \varPhi \), then we have \(S_{tkc}(v,P_C) = 3\cdot k - 4\); otherwise, we have \(S_{tkc}(v,P_C) = 3\cdot k - 3\). We now set \(P=\bigcup _{C\in \varPhi }P_C\) and \({l=3n(k-1)-g}\). We need to show that assignment v satisfies at most g clauses in \(\varPhi \) if and only if v scores at least l in P according to the Top-k Clusters rule.

Let v be an assignment satisfying \(g'\) clauses in \(\varPhi \). Then, \(S_{tkc}(v,P)=g'(3\cdot k - 4)+(n-g')(3\cdot k - 3)={3n(k-1)-g'}\). Thus, as required, there is an assignment v that satisfies at most g clauses if and only if there is an outcome with the score at least l.    \(\square \)

Fig. 3.
figure 3

Set \(P_C\) of PLP-trees for clause \(C = X_i \vee \lnot X_j\)

Theorem 3

The QUALITY problem for Plurality is NP-complete.

Proof

(Sketch). The membership in NP is clear. The NP-hardness can be proved by reduction from MIN2SAT. For each clause in \(\varPhi \), we create a PLP-tree \(T_C\) as in the proof of Theorem 1, and we set \(l=(n-g)/2^{p-2}\). One can show that there is a truth assignment satisfying at most g clauses in \(\varPhi \) if and only if there exists an outcome whose Plurality score is at least l.    \(\square \)

Theorems 2 and 3 show that the corresponding OPTIMIZATION problems for Top-k Cluster, \(k\ge 1\), and for Plurality are NP-hard.

The SCORE, QUALITY and OPTIMIZATION problems for Copeland and Maximin were studied by Lang et al. [9]. These results are partial and not tight. We studied the complexity of these problems for the scoring rules Top-k Clusters, Plurality and Borda. Our results are complete and tight. We summarize all these results in Table 1. Completing the complexity picture for Copeland and Maximin remains a challenging open problem.

Table 1. Computational complexity results

5 Experiments and Results

PLP-trees and forests are difficult to elicit from users directly. In practical settings they have to be learned from examples, that is, pairs \((o,o')\) of outcomes, where o is strictly preferred to \(o'\) in the preference order we are trying to elicit (model). A method to learn PLP-trees was proposed by Liu and Truszczynski [11]. They also applied it learn PLP-forests and aggregate them with the PMR. In this paper, we extend this work to the case when learned PLP-forests (forests of learned PLP-trees) are aggregated by means of voting rules.

In our main results, we evaluate the ability of PLP-forests extended with voting rules to approximate preference orders arising in practical settings. Further, we compare in this respect PLP-forest models with models based on decision trees, develop for PLP-forests effective techniques to compute optimal or near optimal outcomes, and study the effect of the choice of a specific voting rule on the quality of the preference model.

5.1 Datasets and Experimental Set-up

We implemented the scoring rules discussed above as order aggregators for PLP-forests and experimented with them on the twelve preferential datasets used before by Liu and Truszczynski [11].Footnote 4 Their key characteristics are given in Table 2. The third column gives the number of pairs of outcomes from the corresponding domain with the first outcome being strictly better than the second one. As mentioned earlier, we refer to such pairs as examples.

Table 2. Preference datasets in the preference learning library

The PLP-forest learning procedure works as follows. For each of the datasets, we randomly partition the set of examples \(\mathcal {E}\), generating a training set of 70% of \(\mathcal {E}\) and use the rest 30% as the testing set. In the training phase, we use the greedy learning heuristic [11] to learn a PLP-forest of a given number of PLP-trees, each of which is learned from M (a parameter) examples selected with replacement and uniformly at random from the training set. In the testing phase, the trees in the learned PLP-forest are aggregated using the seven voting methods, Top Cluster, Top-2 Clusters, Top-3 Clusters, Plurality, Borda, Copeland and Maximin, to predict testing examples and to compute the social welfare rankings. We repeat this procedure 20 times for each dataset.

We recall that the greedy heuristic algorithm to learn a PLP-tree [11] takes as input the set \(\mathcal {E}\) of examples, the set \(\mathcal {A}\) of attributes, and a node n. The algorithm labels n with an attribute X and picks the preference order of elements in the domain of X so that to maximize the number of examples in \(\mathcal {E}\) correctly decided by X and this domain order. For each value in the domain of X, the algorithm generates a child node of n for which the algorithm recursively repeats with updated inputs: \(\mathcal {E}'\), \(\mathcal {A}'\) and \(n'\), where \(\mathcal {E}'\) is obtained from \(\mathcal {E}\) excluding the examples decided at node n, \(\mathcal {A}'=\mathcal {A}\)\\(\{X\}\), and \(n'\) is a child node of n. The algorithm stops and returns at a node where either \(\mathcal {E}\) or \(\mathcal {A}\) becomes empty.

The following three subsections present our experimental results. The first one concerns the task of predicting new preferences. For this task, we compute and report average accuracy results, where the accuracy is defined as the number of examples in the testing set that are in agreement with the learned model divided by the size of the testing set.

In the next subsection, we discuss computing optimal outcomes for PLP-forests using the Top-k Clusters rules. We show that the problem can be reduced to the weighted partial MAXSAT problem [1]. This allows one to use the MAXSAT solver toulbar2 [8] to solve them.

Finally, we consider the effect of the choice of a scoring rule on the preference order. To this end, we calculate the Spearman’s rho [15] for Top Cluster, Top-2 Clusters, Top-3 Clusters, Plurality, Borda and Maximin, all against Copeland. This allows us to quantify the similarity between orders generated by different rules.

Fig. 4.
figure 4

Preference learning and optimization results for PLP-forests

5.2 Preference Prediction Results

We focus on PLP-forests of trees learned from small sets of examples. This supports fast learning and leads to small constituent PLP-trees. In our experiments we learned PLP-trees from samples of 50, 100 and 200 examples. The results, averaged over all datasets for the Top-2 Clusters rule, are shown in Fig. 4a. They show that the testing accuracy is better when smaller PLP-trees are learned. We saw similar behavior for other scoring rules and so omitted the results from Fig. 4a. Based on these experiments, we now restrict our discussion to PLP-forests with trees learned from samples of size 50.

In Fig. 4b, we present the mean learning curves over all datasets for all 8 rules, where each curve shows how testing results (accuracy percentages) change with the PLP-forest size (the number of trees in the forest). We also show there the results for learning a single PLP-tree and a single decision tree using the whole training set (70% of \(\mathcal {E}\)). Decision trees in our experiments are classification trees trained using labeled instances, where an instance consists of two outcomes and has a binary label, 1 (0) indicating the first outcome is (is not, resp.) strictly preferred to the second. Given a decision tree D and two outcomes o, \(o'\), the dominance testing query asks if it is true that o is strictly preferred to \(o'\) in D. In testing, to answer such a query for two outcomes, they are input into a decision tree. If the tree predicts 1, we answer yes to the query; otherwise, no.

First, we observe that independently of the rule used the PLP-forest models across all datasets outperform the single PLP-tree model. This is most notable for the Borda rule, with a 4% improvement from 87% for single PLP-trees to 91% for PLP-forests. Pairwise Majority used by Liu and Truszczynski [11] turns out to be the worst aggregating method overall.

Moreover, looking at the results for 1000-tree forests, we see that, these forests have high accuracy of about 89–91%, on the testing datasets, depending on the voting rule. This provides strong evidence for the adequacy of the PLP-forest model to represent user preferences over practical combinatorial domains. This also demonstrates that the differences between these voting rules in predicting new preferences are not significant. In particular, the Top-3 Clusters rule finishes 90%, only a percentile point difference from Borda.

Our results also show that decision trees perform better on our datasets with accuracy of about 99%. We attribute the near-perfect performance of the decision-tree model to their large size (cf. Table 3) that enables classifying with high-granularity whether one outcome is preferred to another. However, the decision-tree model has drawbacks. It does not guarantee that the relation it determines is an order or a partial order, it does not offer clear explanations what factors affect comparisons, and it does not support computing optimal and near-optimal outcomes. In each of these aspects PLP-forest models have an advantage.

Table 3. Size comparison of PLP-trees and decision trees learned from the training data (70% of \(\mathcal {E}\))

5.3 Preference Optimization Results

PLP-forests with scoring rules allow for effective optimal outcome computation. We show it here for orders obtained by using Top, Top-2 and Top-3 Clusters rule to aggregate orders defined by individual PLP-trees in a PLP-forest.

For every dataset, we learn PLP-forests of up to 1000 PLP-trees. To compute the optimal outcome in each forest (under Top, Top-2 or Top-3 Clusters rule), we encode the problem as an instance of the weighted partial MAXSAT problem [1] and use toulbar2 [8] to solve it. A weighted partial MAXSAT instance \(\varPhi \) consists of two parts \(\varPhi _h\) and \(\varPhi _s\), where \(\varPhi _h\), called hard constraints, is a collection of clauses, and \(\varPhi _s\), called soft constraints, is a collection of weighted clauses. If \(\varPhi _h\) is over-constrained and thus unsatisfiable, there is no solution to \(\varPhi \). Otherwise, the solution to \(\varPhi \) is a truth assignment v that satisfies \(\varPhi _h\) and maximizes the sum of the weights of the clauses satisfied by v. We now briefly discuss the encoding for the case of binary attributes, which extends in a straightforward way to the general case of multi-valued attributes.

The encoding consists of two main steps and assumes that we are using the Top-k Clusters rule for aggregation. First, given a PLP-forest \(P=\{T_1, \ldots , T_m\}\), we build a collection \(\varPsi =\{(B^1_1,k),\ldots ,(B^1_{k},1), \ldots ,(B^m_1,k), \ldots , (B^m_{k},1)\}\) of term-weight pairs. Each term \(B^i_j\) is the conjunction of literals x or \(\lnot x\), where x’s are the names of the attributes labeling the nodes on the path in the tree \(T_i\) from its root to the leaf \(\ell _j\). If the path follows to the left child of the node labeled by x, the term \(B^i_j\) includes the literal x, otherwise, it includes the literal \(\lnot x\). This collection of terms can be built in time that is linear in the size of the input. One can show that the winning outcome for P with respect to the Top-k Clusters rule is precisely the truth assignment with the maximum sum of weights of those terms in \(\varPsi \) that it satisfies (and conversely).

Second, we translate \(\varPsi \) to an equivalent weighted partial MAXSAT instance \(\varPhi \) of two parts \(\varPhi _s\) and \(\varPhi _h\). Given \(\varPsi =\{(B_1,w_1),\ldots ,(B_N,w_N)\}\), we build in linear time \(\varPhi _s=\{(c_1,w_1),\ldots ,(c_N,w_N)\}\) where every \(c_i\) is a new atom, and \(\varPhi _h=\{{ CNF}(B_i \leftrightarrow c_i): (B_i,w_i) \in \varPsi \}\) where \({ CNF}\) denotes the set of clauses of a given formula. One can show that the truth assignment with the maximum sum of weights of satisfied terms in \(\varPsi \) is precisely the truth assignment that satisfies all clauses in \(\varPhi _h\) and for which the sum of weights of clauses in \(\varPhi _s\) that it satisfies is maximum (and the converse holds, too).

Average computational time spent on searching for optimal outcomes for all datasets is shown in Fig. 4c. We see that, for any dataset and for any forest size up to 1000, a weighted partial MAXSAT instance encoding preference optimization can be solved within 0.2 s.

The reductions are straightforward for the Top-k Clusters rules. It is not clear how to extend them to other scoring rules. Instead, we show that optimal outcomes computed for the three of the Top-k Clusters rules are close to optimal for orders obtained for other rules. Specifically, for every optimal outcome computed based on Top-k Clusters rule (\(k=1,2,3\)) and every dataset, we randomly select 1000 outcomes and check how well the optimal outcome compares to them, when other voting rules (Plurality, Borda, Copeland and Maximin) are used. Average percentiles of the number of outcomes “beaten” by the optimal one are shown in Fig. 4d. We note that, when forests are big, the optimal outcomes based on the three Top-k Clusters rules are either very likely optimal (when the percentiles are exactly 100%) or very likely near-optimal (when the percentiles are not 100% but very close to), for all other voting rules. This is desirable because it shows that computing optimal outcomes for orders determined by Top-k Clusters rules, which we demonstrated to be computationally feasible, are likely optimal or near-optimal for rules where methods to optimize preferences are not straightforward. For decision trees, the results show that outcomes optimal for Top-k Rules are further from optimal but still within the top 20% of outcomes according to the decision-tree model.

5.4 Rank Correlation Results

In the standard voting setting, the rankings generated by different voting rules are quite close to each other [5, 15]. For the setting of combinatorial domain setting, when preference orders are given as PLP-forests (with scoring rules as aggregators), the results we discussed in the previous sections suggest that here, too, the choice of a voting rule does not affect the order significantly (all rules result in models of similar accuracy and outcomes highly preferred for one rule are highly preferred for other).

Specifically, we empirically studied the correlation to orders determined by the Copeland rule of orders determined by the other scoring rules we studied. As suggested in previous work on measuring rank correlation [5, 15, 16], we used the Spearman’s rho (denoted by \(\rho \)) as the rank correlation coefficient. Given two total orders \(L_1\) and \(L_2\) of outcomes in C, we define

$$ \rho (L_1,L_2)=1-\frac{6 \cdot \sum \limits _{1\le i \le n} (i-D_2(L_1(i)))^2}{n\cdot (n^2-1)}, $$

where i is the rank value between 1 and n, and \(D_2(o)\) is the rank of outcome o in \(L_2\). The value of \(\rho (L_1,L_2)\) is in between \(-1\) and 1, both inclusive. When \(L_1\) and \(L_2\) order C exactly the same, we have \(\rho (L_1,L_2)=1\). If \(\rho (L_1,L_2)=-1\), it means \(L_1\) and \(L_2\) reversely order C. Furthermore, the closer the value to 0, the weaker the correlation between \(L_1\) and \(L_2\).

Our results (cf. Table 4) suggest that Borda-generated orders have a very high degree of consensus with those generated by Copeland, and that Plurality and Top Cluster lead to orders with the lowest degrees of agreement. Nevertheless, in all cases the Spearman’s rho has high values, similar to those obtained for strict preference orders over non-combinatorial domains with few outcomes [5, 15, 16].

Table 4. Mean and standard deviation of the Spearman’s rho results for voting rules against Copeland across all datasets in learning PLP-forests of size 1000

6 Conclusions and Future Work

We proposed to use PLP-forests extended with a voting rule as a model of preference relations. We considered five voting rules, Top-k Clusters, Borda, Plurality, Copeland and Maximin, all adjusted to the case of total preorders. We studied the complexity of three key preference reasoning problems arising in this setting: SCORE, QUALITY and OPTIMIZATION. For Top-k Clusters, Borda and Plurality, our results, together with those obtained earlier in the literature, provide a complete picture. In all cases, the SCORE problem is in P, the QUALITY problem is NP-complete and the OPTIMIZATION problem is NP-hard. For the Copeland and Maximin rules, investigated by Lang et al. [9], only some results are known. However, they suggest the two rules may be more demanding computationally.

We studied our PLP-forest models experimentally. Our results showed that using these voting rules for preferential datasets generated from real-world classification datasets yields models reflecting underlying preference relations with high accuracy, exceeding that of PLP-forest models utilizing the Pairwise Majority rule.

We also studied the correlation among the orders given by different PLP-forest models, extending to the setting of “votes” over combinatorial domains several earlier studies in the standard voting setting with a small number of alternatives. We found that when compared to the model given by PLP-forests with Copeland as an aggregator, all models showed high levels of correlation, similar to those reported in the literature for the standard voting setting. Our results suggest that using rules such as Borda or Top-3 clusters (the two closest to Copeland) produces orders representative for all those that can be obtained by combining a PLP-forest with a scoring rule.

For the Top-k Clusters rule, we developed methods to compute optimal outcomes for orders they determine given a PLP-forest. Our experiments for when \(k=1,2,3\) showed that the methods are computationally feasible. They also show that optimal outcomes computed for the Top-k Clusters rules are near optimal for orders determined by all other scoring rules.

Our results suggest that PLP-forest preference models with scoring rules as aggregators, especially Top-k Clusters and Borda, have many attractive features. They can be learned so that to reflect underlying true preference relation with high accuracy. They represent well orders that result from using other scoring rules. Lastly, they support fast methods for computing optimal outcomes and these outcomes are likely to be near optimal for orders given by other scoring rules.

We also compare our PLP-forest models with the decision tree approach. The decision trees learned from examples can approximate underlying orders with higher accuracy (as high as 99% in our experiments). However, they do have drawbacks not present in PLP-forest models. First, unlike in the case of the PLP-forests, the relation defined by decision trees is not guaranteed to be a total order (not even a partial order). Second, decision trees do not provide any clear insights into key factors determining the underlying preference relations. In contrast, the PLP-tree and PLP-forest models yield information about attributes most significant for determining the preference order. For the PLP-tree model, it is the attribute that labels the root, for the PLP-forest model, the attributes appearing most frequently as the labels of the roots of its trees. Lastly they do not offer effective ways to solve optimization tasks (finding optimal or near optimal outcomes) while, as we show, PLP-tree and forest models do. These drawbacks of decision trees make PLP-forests, despite their lower accuracy, an attractive preference model for use in applications.

Our results provide evidence of low effect of the choice of a voting rule when aggregating preference orders determined by trees in a PLP-forest on the final preference order. Clearly, the strength of this observation is has to be quantified by the range of the data sets we considered. Expanding the scope of experiments to other domains implied by practice, as well as to randomly generated ones is a goal for future research. It will provide a more detailed understanding of sensitivity of the model to the choice of a voting rule.

Improving the accuracy of the PLP-forest model is the main challenge for future work. There seem to be two natural directions. First, one can explore a possibility of combining the PLP-forest and decision-tree models, for instance, by using decision trees at leaf nodes of PLP-trees for comparison tests of outcomes in the corresponding clusters. Second, one can investigate other PLP-tree learning algorithms, possibly developing methods to find trees that best fit with given sets of examples, rather than to use heuristics, as we do now. Another promising direction is to extend the work of Bräuning et al. [3]. First, one can generalize the concept of a preference list to the tree of preference lists. In this way one can expand the ability of the preference list model to handle conditional preferences. Second, similarly as we do in this work considering PLP-forests, that is, collections of PLP-trees, one can study collections of preference list models.