Keywords

1 Introduction

There are diverse situations where the preferences of different people have to be aggregated, in order to decide upon a given set of alternatives. In such a situation a voting rule is used. Such voting rules may also be employed in systems of artificial intelligence, where different agents have to make a joint decision on some set of alternatives. The study of voting rules from an axiomatic and algorithmic point of view is actively pursued in the field of computational social choice (see e.g., the bookchapters by Zwicker [36], Caragiannis et al. [11], Conitzer and Walsh [15], Faliszewski and Rothe [16], Lang and Xia [22], Boutilier and Rosenschein [6]). Most of this work studies the case where the winner of the election is a single candidate, but voting is also used to elect a set of candidates, like in the election of a council or a committee. Consider for example the situation where three friends give a joint party and want to decide what there will be on the buffet. They know that there is only place for a certain number of things on the table, and the three friends will probably have different preferences over the possible choices. In this case a committee election rule may be used. Another application is the design of recommender systems (see [25]), where the task is to choose a number of products for recommendation to a buyer on an online platform.

A widely used rule for committee elections is the minisum approach. Here, the voters decide for each candidate whether they approve or disapprove of her, and the winning committee has a minimum sum of distances to the individual votes, where the distance is measured by the Hamming distance. This corresponds to a utilitarian approach. Another variant of minimizing the voters’ dissatisfaction is the minimax approach suggested by Brams et al. [79, 21]. Here, the maximum distance to an individual vote is minimized using the Hamming distance. Baumeister and Dennisen [2] proposed minisum and minimax committee election rules for trichotomous votes, incomplete linear orders, and complete linear orders. We will study winner determination and manipulation for these newly defined committee election rules.

Closely related to committee elections that minimize the voters’ dissatisfaction are systems of proportional representation (see the work of Chamberlin and Courant [13] and Monroe [27]). In these systems the dissatisfaction of a voter is not computed for the whole committee, but only for that candidate from the committee that represents this voter. Computational aspects for problems of proportional representation have been studied in [4, 32, 34].

2 Definitions and Notations

In this section we introduce the definitions and notations for the voting rules that we analyze in the following sections. A committee election is a triple (CVk), where \(C=\{c_1,\dots ,c_m\}\) is the set of candidates, \(V=(v_1,\dots , v_n)\) is a list of voters represented by their votes, and k is the size of the committee. We will consider four different types of votes. An approval vote is a \(\{0,1\}^m\) vector, for a fixed order of the candidates. A 1 in the vector stands for an approval of the corresponding candidate, whereas a 0 stands for a disapproval. The second type of votes are trichotomous votes (see Felsenthal [17]). Here, the votes can equally express an approval or disapproval for each candidate, but the voters can also decide to abstain for a candidate. This is realized through \(\{-1,0,1\}^m\) vectors for a fixed order of the candidates, where again a 1 stands for approval, but now a \(-1\) stands for disapproval, and a 0 stands for an abstention. The last two types of votes we consider are complete and incomplete linear orders. A complete linear order is a total, transitive, and asymmetric binary relation over the set of candidates. We will denote the vote of voter v by \(>_v\), where \(c >_v d\) means that candidate c is preferred to candidate d by voter v, and \(c \not >_v d\) means that voter v does not prefer candidate c to candidate d. Incomplete linear orders are also transitive and asymmetric, but not necessarily a total binary relation over the set of candidates. Note that this allows for general incomplete linear orders and not only linear orders with indifferences. To distinguish between complete and incomplete linear orders, we will denote an incomplete linear order for voter v by \(\succ _v\). We will write \(a \sim _v b\), if the relation between two candidates a and b is unknown in v, i.e., neither \(a \succ _v b\) nor \(b \succ _v a\) holds.

To define minisum and minimax voting rules, we need a measure of distance for a single vote with a potential committee. For the sake of readability, we will denote an approval committee as a \(\{0,1\}^m\) vector having exactly k ones, or as a set \(K \subseteq C\) of candidates. In case of trichotomous votes, we will also denote a committee as a \(\{-1,1\}^m\) vector having exactly k ones. And we will say that weight(v) denotes the number of ones in a vector v from \(\{0,1\}^m\) or \(\{-1,0,1\}^m\). For approval votes we adopt the obvious approach of using the Hamming distance, as proposed by Brams, Kilgour and Sanver [7]. The distance between a vote \(v \in \{0,1\}^m\) and a committee \(w \in \{0,1\}^m\) is defined through \(HD(w,v)=\sum _{1 \le i \le m} |w(i)-v(i)|\).

In the case of trichotomous votes, we slightly adapt the Hamming distance, such that a complete disagreement between vote and committee adds two points, and an abstention in the vote and an approval or disapproval in the committee adds only one point. The distance between a vote \(v \in \{-1,0,1\}^m\) and a committee \(w \in \{-1,1\}^m\) is defined through \(\delta (w,v)=\sum _{1 \le i \le m} |v(i)-w(i)|\). This distance can be similarly defined for two vectors \(v,w \in \{-1,0,1\}^m\).

For complete linear orders, Baumeister and Dennisen [2] use the sum of the ranks of the committee members in a vote to measure the dissatisfaction. This goes back to the Wilcoxon rank-sum test [35]. The normalized ranksum between a vote v and a committee \(K \subseteq C\) is \(RS(K,v)=\sum _{c \in K } pos(c,v) - \), where pos(cv) denotes the position of candidate c in vote v.

The last type of votes are incomplete linear orders, for which we will use the modified Kemeny distance, as suggested by Baumeister and Dennisen [2]. The distance between a vote v and a committee \(K \subseteq C\), is defined as \(Dist(K,v) = \sum _{a,b\in C} d_{K,v}(a,b)\), where the distance between two candidates a and b regarding a committee K and a vote v is defined through

$$\begin{aligned} d_{K,v}(a,b)={\left\{ \begin{array}{ll} 1 &{} \text {if } (a \in K, b \not \in K \wedge a \sim _v b) \text { or } (a \not \in K, b \in K \wedge a \sim _v b), \\ 2 &{} \text {if } (a \in K, b \not \in K \wedge b \succ _v a) \text { or } (a \not \in K, b \in K \wedge a \succ _v b),\\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Note that for an increase of dissatisfaction in the definition of \(d_{K,v}(a,b)\) it is important to require that one of the candidates a or b is not in the committee. Otherwise the dissatisfaction would already increase if for two candidates from the committee (or outside the committee) the vote specifies an order over them, which may result in different winning committees. Transferring the minisum and minimax approach, that will be defined formally in Sect. 3, to trichotomous votes leads to minisum-CAV and minimax-CAV, for complete linear orders to minisum-ranksum and minimax-ranksum, and to minisum-Kemeny and minimax-Kemeny for incomplete linear orders. In the following sections we will study winner determination and manipulation for these voting rules from a computational point of view. We assume that the reader is familiar with the basic concepts of computational complexity, such as many-one reducibility and the complexity classes \(\mathrm P\) and \(\mathrm{NP}\); details and definitions can be found in the textbook by Papadimitriou [30]. Intuitively, the tractable problems are those, which can be solved in polynomial time, whereas problems which are \(\mathrm{NP}\)-complete can be seen as intractable. Table 1 summarizes the above introduced voting rules and the results concerning winner determination that will be obtained in the following section.

Table 1. Winner determination in minisum and minimax voting rules for different forms of votes.

3 Winner Determination

In a single winner election, the winner is a single winning candidate, whereas in a committee election, the winner is one committee (or several committees) consisting of a fixed number of candidates. Let \(F_k(C)\) denote all committees of size k from the set C, where the representation (i.e., approval vote, trichotomous vote, or a set of candidates) will be clear from the context. The number of possible winning committees \(|F_k(C)|\) is exponential in the number of participating candidates. Since committee elections may have a huge number of voters and/or candidates, it is desirable that the winning committees may still be determined in a reasonable amount of time. In the following we will present the minisum and minimax approach that can be combined with the above defined measures of disagreement to define committee election rules. These rules will not always return a single winning committee, hence some tie-breaking rule has to be used in order to obtain a single winning committee.

In a minisum rule, the sum of the disagreement values for all individual votes regarding the winning committee is to be minimized. Formally, the set of winning committees in the minisum rule is \({\text {arg}\,\text {min}}_{K \in F_k(C)}\,\,\sum _{v \in V}\triangle (K,v)\), where \(\triangle \in \{HD,\delta ,RS,Dist\}\) for approval votes, trichotomous votes, complete linear orders, or incomplete linear orders. In contrast in a minimax rule, the maximum disagreement value for an individual vote is to be minimized. Hence, the set of winning committees in the minimax rule is \({\text {arg}\,\text {min}}_{K \in F_k(C)}\,\,\max _{v \in V}\triangle (K,v)\), where \(\triangle \in \{HD,\delta ,RS,Dist\}\) for approval votes, trichotomous votes, complete linear orders, or incomplete linear orders. We will make use of the following theorem from Baumeister and Dennisen [2], which shows that for the case of complete linear orders, the winning committees in a minisum/minimax-Kemeny election and in a minisum/minimax-ranksum election are always equal.

Theorem 1

(Baumeister and Dennisen [2]). The set of winning committees in a minisum/minimax-Kemeny election with complete linear orders and in the corresponding minisum/minimax-ranksum election are always equal, i.e.,

$$\begin{aligned} {\displaystyle \mathop {{\mathrm {arg\,min}}}_{K \in F_k(C)}}\,\,\sum _{v \in V}Dist(K,v)&= {\displaystyle \mathop {{\mathrm {arg\,min}}}_{K \in F_k(C)}}\,\,\sum _{v \in V}RS(K,v), \text { and} \\ {\displaystyle \mathop {{\mathrm {arg\,min}}}_{K \in F_k(C)}}\,\,\max _{v \in V}Dist(K,v)&= {\displaystyle \mathop {{\mathrm {arg\,min}}}_{K \in F_k(C)}}\,\,\max _{v \in V}RS(K,v). \end{aligned}$$

We want to study the complexity of determining a winning committee independently of any tie-breaking rule. Since the number of winners may be exponential, determining all winning committees is obviously not possible in polynomial time, hence we focus on the question whether it is possible to obtain one winning committee. We will see that in the case of minisum elections, this is always possible in polynomial time. For the minimax elections, we show \(\mathrm{NP}\)-hardness of the corresponding decision problem for trichotomous votes and incomplete linear orders, and present an approximation algorithm for the case of trichotomous votes.

3.1 Minisum Elections

In minisum-approval the total dissatisfaction is minimum if the committee contains k candidates having the highest number of approvals in all votes. Hence, as shown by Brams, Kilgour, and Sanver [8], it is possible to determine a winner in polynomial time. Similar to minisum-approval in minisum-CAV, the total dissatisfaction is minimum if the committee consists of k candidates that have the highest combined approval scores, i.e., the sum of the corresponding values from the votes. Again a winner can obviously be determined in polynomial time.

Theorem 2

A minisum-ranksum winner can be determined in polynomial time.

This holds, since a minisum-ranksum winner consists of k candidates with the highest Borda scoreFootnote 1. The detailed proof is omitted due to space.

The corresponding single-winner problem for Kemeny elections, i.e., whether there exists a linear order for which the sum of the distances to the votes does not exceed a given bound, was shown to be \(\mathrm{NP}\)-hard by Bartholdi et al. [20]. In contrast, we show that a winning committee for minisum-Kemeny can be determined in polynomial time, by reducing the determination of a winner committee for an election with incomplete linear orders to an election in which all votes are complete linear orders.

Lemma 1

For each minisum-Kemeny election (CVk) with incomplete linear orders, there is a minisum-Kemeny election \((C,V',k)\) with complete linear orders, so that \(\sum _{v \in V'}{Dist(K,v)} = 2 \sum _{v \in V}{Dist(K,v)}\) holds for all committees \(K \in F_k(C)\) and \(V'\) can be constructed in polynomial time.

Proof

Given a committee election (CVk), we will construct a set of voters \(V'\) with the following properties: For each vote \(v \in V\) there will be two votes \(v'_1\) and \(v'_2\) in \(V'\), such that \(\forall c,d \in C\)

  1. 1.

    if \(c \succ _v d\), then \(c >_{v'_1} d\) and \(c >_{v'_2} d\),

  2. 2.

    if \(c \sim _v d\), either \(c >_{v'_1} d\) and \(d >_{v'_2} c\), or \(c >_{v'_2} d\) and \(d >_{v'_1} c\) holds.

In general, for a committee K and a voter list U, we have: \(\sum _{v \in U}{Dist(K,v)} =\sum _{c \in K} \sum _{d \notin K} {(|\{v \in U | c \sim _v d \}| + 2 \cdot |\{v \in U | d \succ _v c \}|)}\). With the above defined properties, it holds \(|\{v \in V'| d > c \text { in } v\}| = |\{v \in V| c \sim _v d \}| + 2 \cdot |\{v \in V | d \succ _v c\}|\) for all \(c,d \in C\), and thus

$$\begin{aligned} \sum \limits _{v \in V'}&{Dist(K,v)} = \sum \limits _{c \in K} \sum \limits _{d \notin K} (|\{v \in V'| c \sim _v d\}|+ 2 \cdot |\{v \in V'| d >_v c \}|)\\&= 2\cdot \sum \limits _{c \in K} \sum \limits _{d \notin K} (|\{v \in V|c \sim _v d\}| + 2 \cdot |\{v \in V | d \succ _v c\}|)= 2 \cdot \sum \limits _{v \in V}{Dist(K,v)}. \end{aligned}$$

We now describe how \(V'\) can be constructed: For each vote \(w \in V\) create two directed graphs \(G_1 = (V_1,E_1)\) and \(G_2 = (V_2,E_2)\).

  1. 1.

    \(G_1\) and \(G_2\) are constructed as follows: There is a node for each candidate \(c \in C\), and an edge (cd) if \(c \succ _w d\) holds.

  2. 2.

    Find two nodes \(u,v \in V_1\), so that neither \((u,v) \in E_1\) nor \((v,u) \in E_1\) holds. Add (uv) to \(E_1\). If there is no such pair, go to step 7.

  3. 3.

    Build the transitive closure of \(G_1\).

  4. 4.

    Add for each edge (uv) which was added in steps 2 and 3 to \(E_1\), the edge (vu) to \(E_2\).

  5. 5.

    Build the transitive closure of \(G_2\).

  6. 6.

    If in step 5 new edges were added to \(E_2\), add for each edge (uv) which was added to \(E_2\), the edge (vu) to \(E_1\) and go to step 3. Otherwise, go to step 2.

  7. 7.

    Determine the complete orders on basis of the indegree of the nodes of \(G_1\) and \(G_2\) and halt. If a node in \(G_i\) has indegree j, the corresponding candidate is at position \((j + 1)\) in \(v'_i\).

According to the definition of incomplete linear orders, the original graph in step 1 is acyclic and transitively closed. If we add an edge (uv) to \(E_1\) in step 2, this cannot create a cycle since there was no path from v to u: If there had been a path from v to u, the edge (vu) would have been in \(E_1\) because of the transitive closure. This holds analogously for adding an edge (vu) to \(E_2\) in step 4. If we compute the transitive closure, no cycle can be created if the original graph was acyclic.

If we “mirror” an edge (uv) from \(G_1\) to \(G_2\), it holds that neither edge (uv) nor (vu) existed in \(G_2\) before. Analogously, it holds that neither edge (uv) nor edge (vu) existed in \(G_1\) if we “mirror” an edge (uv) from \(G_2\) to \(G_1\).

A directed graph \(G=(V',E')\), which contains for each node pair \(u, v \in V'\) either edge (uv) or edge (vu), contains exactly \(((|V'|-1)\cdot |V'|)\cdot \frac{1}{2}\) edges. The algorithm runs until both graphs contain \(((|V_1|-1)\cdot |V_1|)\cdot \frac{1}{2}\) \(= ((|V_2|-1)\cdot |V_2|)\cdot \frac{1}{2}\) edges. Step 2 guarantees that this edge number is attained in \(G_1\). As for each edge (uv) which is added to \(G_1\) in step 2, edge (vu) is added in step 4 to \(G_2\), this edge number is attained in \(G_2\), too. So, the algorithm terminates, and the properties stated at the beginning of this proof hold.

The initialization of the graphs and the translation of the resulting graphs into two complete linear orders \(v'_1\), \(v'_2\) for a vote \(v \in V\) is possible in polynomial time. The transitive closure of a graph G can be computed in \(O(|V'|^3)\). For a vote v, the transitive closure has be to computed maximally as often as edges are added to \(G_1\) or \(G_2\). The number of edges is smaller than \(|V_1|^2\), i.e., the runtime for the computation of transitive closures for a vote v is in \(O(|V_1|^3 \cdot |V_1|^2) =\) \( O(m^5)\). So, \(V'\) can be constructed in time \(O(m^5 \cdot n)\). This completes the proof.    \(\square \)

Now, we are ready to state our result for winner determination in minisum-Kemeny elections. The proof follows directly from Lemma 1, Theorems 2 and 1.

Theorem 3

A minisum-Kemeny winning committee can be determined in polynomial time.

3.2 Minimax Elections

In contrast to the minisum elections, in minimax-approval, minimax-CAV, and minimax Kemeny elections, it is not always possible to determine a winning committee in polynomial time, unless \(\mathrm P= \mathrm{NP}\). This holds, since we show \(\mathrm{NP}\)-hardness of the following decision problem.

figure a

where \(\triangle \in \{HD,\delta ,RS, Dist\}\) for approval votes, trichotomous votes, complete or incomplete linear orders. Since there is a natural upper bound for \(\triangle (K,v)\) for every \(\triangle \in \{HD, \delta ,RS, Dist\}\), it is possible to compute a winning committee in polynomial time, if this decision problem is solvable in polynomial time. LeGrand [23] showed that HD -Minimax-Score is \(\mathrm{NP}\)-complete. However, LeGrand et al. [24] showed that the search variant of this problem, where we actually seek a winning committee, can be approximated with a factor of 3, and Caragiannis et al. [12] proposed an LP-based algorithm for this problem where the distance to the optimal solution is at most 2. Recently, Byrka and Sornat [10] provided a PTAS for minimax approval voting.

The above defined score problem for trichotomous votes is also \(\mathrm{NP}\)-complete.

Theorem 4

\(\delta \)-Minimax-Score is \(\mathrm{NP}\)-complete.

Proof

Membership in \(\mathrm{NP}\) is obvious, and to see that \(\delta \)-Minimax-Score is \(\mathrm{NP}\)-hard, it suffices to transform every approval vote v into a trichotomous vote \(v'\) by replacing every 0 in v by a \(-1\) in \(v'\). To obtain a committee \(K'\) for trichotomous votes, we need to replace every 0 by a \(-1\) in the resulting approval committee K. Then we have \(HD(K,v) \le d \Leftrightarrow \delta (K',v') \le 2d\).    \(\square \)

Similarly to the approximation algorithm for the search variant of HD-minimax-score by LeGrand et al. [24], one can give an approximation algorithm with a factor 3 for the search variant of \(\delta \)-minimax-score.

The search variant of will be denoted by Minimax-CAV, where the input is also a committee election (CVk) with votes represented as \(\{-1, 0, 1\}^m\) vectors, and the aim is to find a committee \(K^* \in {\text {arg}\,\text {min}}_{K \in F_k(C)} \max _{u \in V} \delta (K,u)\).

Theorem 5

There is an approximation algorithm which finds a solution \(K'\) in polynomial time, so that for each optimal solution K of the search problem Minimax-CAV it holds \(\delta (K',u) \le 3 \cdot Max\delta (V, K)\) for all votes \(u \in V\), where \(Max\delta (V, K) = \max _{u \in V}\delta (K,u)\) denotes the maximum distance between K and the votes in V.

Proof

Denote for a vector \(v \in \{-1, 0, 1\}^m\) by z(v) the number of zeros in v, and by n(v) the number of \(-1\) in v. A k-completion of v is a vector \(v'\) in \(\{-1,1\}^m\) constructed as follows.

  1. 1.

    If \(weight(v) \le k\) and \(z(v) + weight(v) < k\), transform all zeros into ones and transform \(-1\) into ones until \(weight(v') = k\).

  2. 2.

    If \(weight(v) \le k\) and \(z(v) + weight(v) \ge k\), transform zeros into ones until \(weight(v') = k\), and transform the remaining zeros into \(-1\).

  3. 3.

    If \(weight(v) > k\), transform ones into \(-1\) until \(weight(v') = k\), and transform all zeros into \(-1\).

Obviously, each k-completion \(v'\) for v contains exactly k ones. We will first show, that for an optimal solution K, an initial vector v and a k-completion \(K'\) of v, it holds \(\delta (K',v) \le \delta (K,v)\). Fix an optimal solution K, an initial vector v and a k-completion \(K'\) of v. Consider the possibilities for the i-th position in the vectors, where we can have a \(-1\), a 0, or a 1. For the first case (\(weight(v) \le k\) and \(z(v) + weight(v) < k\)), we have the following 8 possibilities. For \(1 \le i \le 8\), \(k_i\) denotes the number of occurrences of the corresponding possibility (Table 2).

Table 2. Possibilities for the first case

We have \(weight(K') = k_1 + k_2 + k_3 + k_4 + k_5 + k_6 = k\), and \(weight(K) = k_1 + k_3 + k_5 + k_7 = k\). Hence \(k_2 + k_4 + k_6 = k_7\), and we have \(\delta (K,v) = 2k_2 + k_3 + k_4 + 2k_5 + 2k_7\), and \(\delta (K',v) = k_3 + k_4 + 2k_5 + 2k_6\). Now it is easy to verify, that \(\delta (K,v) \ge \delta (K',v)\). The other two cases can be handled analogously.

The approximation algorithm Approx-Minimax-CAV proceeds as follows:

  1. 1.

    Select a vote \(v \in V\) arbitrarily.

  2. 2.

    Compute a k-completion \(v'\) of v.

  3. 3.

    Return \(K' = v'\) as solution.

Obviously, this algorithm runs in polynomial time. To estimate the approximation rate of the algorithm, consider the vote \(v \in V\) which was chosen, the k-completion \(K'\) of v which is returned as solution, and an optimal solution K for Minimax-CAV, i.e., a vector K in \(\{-1,1\}^m\) containing exactly k ones so that \(Max\delta (V, K)\) is minimum for all vectors in \(\{-1,1\}^m\) with weight k.

We need to show that \(\delta (K',u) \le 3 \cdot Max\delta (V, K)\) holds for all \(u \in V\). Since the triangle inequality applies to \(\delta \), we have for all \(u \in V\): \(\delta (K',u) \le \delta (K',v) + \delta (v,u)\). Repeated application of the triangle inequality leads to:

$$\begin{aligned} \delta (K',u) \le \delta (K',v) + \delta (K,v) + \delta (K,u). \end{aligned}$$
(1)

Since K is an optimal solution, we have \(\delta (K,u) \le Max\delta (V, K)\) for all \(u \in V\). Similarly, we have \(\delta (K,v) \le Max\delta (V, K)\). Since \(K'\) is a k-completion of v, it also holds that \(\delta (K',v) \le \delta (K,v)\). The three terms on the right hand of inequality (1) are \(\le Max\delta (V, K)\), and we get the desired property: \(\delta (K',u) \le 3 \cdot Max\delta (V, K)\text { for all }u \in V.\)    \(\square \)

We now turn to the study of the Dist -Minimax-Score for incomplete linear votes. To show that this problem is also \(\mathrm{NP}\)-hard, we first need to show \(\mathrm{NP}\)-hardness of Restricted- HD -Minimax-Score, which corresponds to HD -Minimax-Score for the case where the number m of candidates is even and the size of the committee is exactly .

Theorem 6

Restricted-HD -Minimax-Score is \(\mathrm{NP}\)-complete.

Proof

The \(\mathrm{NP}\)-hardness of Restricted- HD -Minimax-Score can be shown via a reduction from HD -Minimax-Score. Consider a HD-Minimax-Score instance (CVkd), and construct a Restricted- HD -Minimax-Score instance \((C',V',k',d)\). The set of candidates is \(C' = C \cup D\) with the set of dummy candidates \(D = \{d_1,...,d_m\}\), and the list \(V' = \{v'_1,...,v'_n\}\) of votes is constructed as follows: \(v'_i\) has the form \((v_i,w)\) where w is a vector that has a 1 at the first \({m-k}\) positions, and a 0 on the remaining k positions. Denote the set of the \(m-k\) candidates from D who receive a 1 in all votes by W. Let the size of the committee be \(k' = m\). Now, we show that \((C,V,k,d) \in HD {\textsc {-Minimax-Score}} \Leftrightarrow (C',V',k',d) \in {\textsc {Restricted-}} HD {\textsc {-Minimax-Score}}\).

(\(\Rightarrow \)) Assume that (CVkd) is a yes-instance for HD -Minimax-Score, i.e., there is a committee \(K \in F_k(C)\) so that the Hamming distance to all votes in V is at most d. Then, there also exists a committee \(K' \in F_{k'}(C') = F_m(C')\) so that the Hamming distance to all votes in \(V'\) is at most d, namely the committee \(K \cup W\), since for all \(i \in \{1,...,n\}\), it holds that \(HD(K',v'_i)= HD(K \cup W, v'_i) = HD(K,v_i) + HD(W,w) = HD(K,v_i) \le d\).

(\(\Leftarrow \)) Assume that \((C',V',k',d)\) is a yes-instance for Restricted- HD -Mini-max-Score. Then, there is a committee \(K^* \in F_{k'}(C') = F_m(C')\), so that the Hamming distance to all votes in \(V'\) is at most d. We will now consider all possible choices of \(K^*\) and show that there is always a winning committee with exactly k candidates from C and \(m-k\) candidates from D.

Case 1: \(K^* = K \cup W\) for a committee \(K \in F_k(C)\) . If \(K^*\) has the form \(K \cup W\) for a committee \(K \in F_k(C)\), we have with \(K = K^*-D\) a committee in \(F_k(C)\) so that the Hamming distance to all votes in V is at most d: for all \(i \in \{1,...,n\}\), it holds that \(HD(K,v_i) = HD(K^*,v'_i) \le d\).

Case 2: \(K^* \ne K \cup W\) for a committee \(K \in F_k(C)\) . If \(K^*\) does not have the form \(K \cup W\) for a committee \(K \in F_k(C)\), we can transform \(K^*\) into a committee \(K' = K \cup W\) so that \(K \in F_k(C)\), as follows. If a candidate in W is not elected, “shift” in the vector representation of the committee, if possible, a 1 from a candidate in \(D - W\) to this candidate in W. This action reduces the Hamming distance of the committee regarding all votes. Since in all votes the candidates in W are accepted and the candidates in \(D-W\) are rejected, the Hamming distance regarding D decreases by 2 via such a shift. Suppose that q such shifts are required and let \(K^+\) denote the resulting committee. Then it holds that \(HD(K^+,v'_i) = HD(K^*,v'_i) - 2q \le HD(K^*,v'_i)\).

Case 2.1: In \(K^+\) more than \(m-k\) candidates from D are elected. If in \(K^+\) more than \(m-k\) candidates from D are elected, “shift” the surplus ones from candidates in \(D - W\) to the candidates in C. Overall, exactly m candidates are elected, i.e., if exactly \(m-k\) candidates from D are elected, exactly k candidates from C are elected. The resulting committee \(K'\) is a committee of the form \(K \cup W\) with \(K \in F_k(C)\). If we shift a 1 from \(D-W\) to C, the Hamming distance regarding the candidates in D decreases by value 1 since in all votes the candidates in \(D-W\) are rejected. The Hamming distance regarding the candidates in C increases at most by 1 in a shift. So, the Hamming distance either decreases or stays the same.

Case 2.2: In \(K^+\) less than \(m-k\) candidates from D are elected. We can analogously “shift” the missing ones from C to the candidates in W. With \(K = K' - D\), we have a committee from \(F_k(C)\), where the Hamming distance regarding all votes in V is at most d. So, we have for all \(i \in \{1,...,n\}\) \(HD(K,v_i) = HD(K' - D,v_i) = HD(K',v_i) \le HD(K^+,v'_i) \le HD(K^*,v'_i) \le d\).    \(\square \)

With the \(\mathrm{NP}\)-hardness of Restricted- HD -Minimax-Score, we can now show \(\mathrm{NP}\)-hardness of Dist -Minimax-Score for incomplete linear orders.

Theorem 7

Dist -Minimax-Score is \(\mathrm{NP}\)-complete.

Proof

The \(\mathrm{NP}\)-hardness of Dist -Minimax-Score can be shown via a reduction from Restricted- HD -Minimax-Score. Consider a Restricted- HD -Minimax-Score instance (CVkd) and construct a Dist -Minimax-Score instance \((C,V',k,d')\), The set of votes \(V' = \{v'_1,...,v'_n\}\) is constructed as follows: Let \(W_i\) denote the set of candidates who received a 1 in \(v_i\), and \(L_i\) the set of candidates who received a 0 in \(v_i\). The candidates from \(W_i\) and \(L_i\) are ordered in \(v'_i\) so that \(w \succ _{v'_i} l\) holds for all \(w \in W_i\) and all \(l \in L_i\). The bound on the distance is \(d' = d\cdot k\). We now show that due to the property \(k = \frac{m}{2}\), \(Dist(v'_i, K) = HD(K,v_i)\cdot k\) holds for the given construction for all \(i \in \{1,...,n\}\) and for all committees \(K \in F_k(C)\). For a vote \(v'_i \in V'\) and a committee K define the following sets of candidates:

  • \(K_i^W = K \cap W_i\),

  • \(K_i^L = K \cap L_i\),

  • \(\bar{K}_i^W = (C \setminus K) \cap W_i\), and

  • \(\bar{K}_i^L = (C \setminus K) \cap L_i\).

The computation of the modified Kemeny distance can then be carried out based on the quantities \(|K_i^W|\), \(|K_i^L|\), \(|\bar{K}_i^W|\), and \(|\bar{K}_i^L|\). It suffices to compare all candidates in the committee with the candidates outside of the committee, i.e., we only need to consider the candidate pairs (ab) with \(a \in K\) and \(b \in (C \setminus K)\). We have:

  • For each pair of candidates \((a,b) \in K_i^W \times \bar{K}_i^W\), the relation between a and b is unknown in \(v_i\), since both of them are contained in \(W_i\). So, we have a distance of 1.

  • For each pair of candidates \((a,b) \in K_i^W \times \bar{K}_i^L\), it holds also \(a \succ _{v_i} b\), since a is contained in \(W_i\) and b in \(L_i\). So, we have a distance of 0.

  • For each pair of candidates \((a,b) \in K_i^L \times \bar{K}_i^W\), it holds \(b \succ _{v_i} a\) since a is contained in \(L_i\) and b in \(W_i\). So, we have a distance of 2.

  • For each pair of candidates \((a,b) \in K_i^L \times \bar{K}_i^L\), the relation between a and b is unknown in \(v_i\), since both of them are contained in \(L_i\). So, we have a distance of 1.

Thus, we have:

$$\begin{aligned} Dist(v'_i, K) = |K_i^W|\cdot |\bar{K}_i^W| + 2\cdot |K_i^L|\cdot |\bar{K}_i^W| + |K_i^L| \cdot |\bar{K}_i^L| \end{aligned}$$
(2)

If we determine the Hamming distance between a committee K and a vote \(v_i\), we count the positions where a candidate who is elected in the committee is not accepted in \(v_i\) and the positions where a candidate not elected in the committee is accepted in \(v_i\), i.e., we have:

$$\begin{aligned} HD(K,v_i) = |K_i^L| + |\bar{K}_i^W| \end{aligned}$$
(3)

For all \(i \in \{1,...,n\}\) and for all committees \(K \in F_k(C)\), we have:

$$\begin{aligned} Dist(v'_i,K)&\overset{(2)}{=} |K_i^W|\cdot |\bar{K}_i^W| + 2\cdot |K_i^L|\cdot |\bar{K}_i^W| + |K_i^L|\cdot |\bar{K}_i^L|\\&= |\bar{K}_i^W|\cdot (|K_i^W| + |K_i^L|) + |K_i^L|\cdot (|\bar{K}_i^W| + |\bar{K}_i^L|)\\&= |\bar{K}_i^W|\cdot k + |K_i^L|\cdot (m-k) = (|\bar{K}_i^W| + |K_i^L|)\cdot k + |K_i^L|\cdot (m - 2k)\\&\overset{(3)}{=} HD(K,v_i)\cdot k + |K_i^L|\cdot (m - 2k). \end{aligned}$$

With \(k = \frac{m}{2}\) it holds for all \(i \in \{1,...,n\}\) and for all committees \(K \in F_k(C)\) \(Dist(v'_i,K) = HD(K,v_i)\cdot k + |K_i^L|\cdot 0 = HD(K,v_i)\cdot k\). Hence, we have that there is a committee for which the Hamming distance to all votes in V is at most d, if and only if there is a committee for which the modified Kemeny distance to all votes in \(V'\) is at most dk. This completes the proof.    \(\square \)

The complexity of the corresponding problem for complete linear orders, RS-Minimax-Score, remains open.

4 Manipulation

The famous Gibbard and Satterthwaite Theorem [18, 33] says that, in principle, every preference-based voting rule is manipulable. Bartholdi et al. [1] introduced a decision problem that captures manipulation in elections. They ask whether for a given election and some distinguished candidate, there is a vote of the manipulator that makes this candidate win. Based on manipulation for single winner elections, Meir et al. [26] propose manipulation problems for committee elections. Their definition includes a utility function and its most general form is as follows.

figure b

They consider an adversarial tie-breaking (see [14]), where from several equally performing candidates those with the lower utility for the manipulator win the election. Procaccia et al. [31] state that Utility-Committee-Manipulation is in \(\mathrm P\) for committee elections under Approval voting. The proof is given by Meir et al. [26], who also show that if the utility function is mapping to \(\{0,1\}\) rather than \(\mathbb {Z}\), Utility-Committee-Manipulation is in \(\mathrm P\) for all committee elections held under scoring rules. Since in a minisum-ranksum election the winning committee contains the k candidates with the highest Borda scores (see the remark after Theorem 2), the fact that Utility-Committee-Manipulation for minisum-ranksum is in \(\mathrm P\) follows immediately.

Obraztsova et al. [29] follow the approach of Meir et al. [26] by defining manipulation in committee elections through a utility function of the manipulator. They study the complexity of manipulation in committee elections, with a particular focus on the role of tie-breaking. They focus on the case where ties are broken according to a fixed predefined order or by a natural randomized rule.

We will also follow the approach of Meir et al. [26], but here we consider non-unique winners and therefore have to change the definition by asking whether the condition holds for at least one winning committee. Furthermore, we will focus only on the case where the utility function is boolean-valued, and state two special variants of it, that we find most natural. In the first variant, we simply ask whether it is possible for the manipulator to vote such that a given subset of the candidates is in at least one winning committee. In analogy to the manipulation problems in single winner elections, we call this problem \(\mathcal {E}\) -Committee-Manipulation.

figure c

While CM asks whether all candidates in L can become part of the winning committee, one can also ask whether it is possible to make at least t candidates in L part of a winning committee.

figure d

Note that in the case of a committee election rule that always returns a single winner, CM is a special case of Utility-Committee-Manipulation with \(t=|L|\) and in which the utility function maps all candidates in L to the value 1 and all candidates in \(C \setminus L\) to 0. Furthermore, CM is the special case of TCM with \(t=|L|\le k\)

Even though we consider a different model regarding tie-breaking, the above mentioned results by Meir et al. [26] can be adapted to CM and TCM.

Theorem 8

CM and TCM for minisum-approval are in \(\mathrm P\).

Following Meir et al. [26] it holds that if the manipulators’ strategy to approve all candidates in L and to disapprove all candidates in \(C \setminus L\) does not succeed, no other does. Thus, both CM and TCM for minisum-approval can be decided in polynomial time. A similar result holds for minisum-CAV.

Theorem 9

CM and TCM for minisum-CAV are in \(\mathrm P\).

Similar to minisum-approval we have that if the strategy to give all candidates in L a 1 and all candidates in \(C \setminus L\) a \(-1\) does not succeed, no other one does. In the case of complete linear orders, we show, that both manipulation problems can also be solved efficiently.

Theorem 10

CM and TCM for minisum-ranksum are in \(\mathrm P\).

Proof

As CM is the special case of TCM where t equals the number of candidates in L, it suffices to give the proof for TCM. Similar to the proofs by Meir et al. [26] and Bartholdi et al. [20] (see also Obraztsova et al.  [28]) we show how to decide in polynomial time whether it is possible that at least t members in L belong to a winning committee. We denote the number of candidates in L by l. We define four lists in which the candidates are given in ascending order regarding their Borda scores. List \(T=(t_{1},\dots ,t_{t})\) contains the t candidates in L with the highest Borda scores, list \(F=(f_{1},\dots ,f_{l-t})\) all other candidates in L, list \(G=(g_{1},\dots ,g_{\min (k-t,m-l)})\) the \(\min (k-t,m-l)\) candidates in \(C \setminus L\) with the highest Borda scores and \(\hat{C}=(\hat{c}_{1},\dots , \hat{c}_{m-l-\min (k-t,m-l)})\) the remaining candidates. The manipulator’s vote is given by \(t_{1} > \dots > t_{t} > f_{1} > \dots > f_{l-t} > g_{1} > \dots > g_{\min (k-t,m-l)} > \hat{c}_{1} > \hat{c}_{m-l-\min (k-t,m-l)}\). If at least t members in L are part of a winning committee in the election \((C,(v_{1},\dots ,v_{n},s),k)\) the manipulation was successful, otherwise it is not possible.    \(\square \)

The next theorem shows that for the case of incomplete linear orders, CM and TCM are solvable in polynomial time as well.

Theorem 11

CM and TCM for minisum-Kemeny are in \(\mathrm P\).

Proof

It again suffices to give the proof for TCM. Given a committee election (CVk) with \(V=(v_1,\dots , v_n)\), we construct a set of voters \(V'=(v'_{1},\dots , v'_{2n})\) according to Lemma 1.

We introduce a problem \(\textsc {2-TCM}\) which equals \(\textsc {TCM}\) with the exception that we ask for the existence of a single vote s over C, such that at least t candidates in L belong to a winning committee K in the committee election \((C,(v'_{1},\dots ,v'_{n},s,s),k).\) As \(V'\) is a list of complete linear orders and according to Theorem 1, the winning committees in the corresponding minisum-Kemeny election with complete linear orders equal the winning committees in a minisum-ranksum-election. The proof that \(\textsc {2-TCM}\) for minisum-ranksum is in P is analogous to the proof of Theorem 10 Footnote 2. We now show that it is possible to manipulate \((C,V',k)\) with two manipulators with identical votes if and only if it is possible to manipulate (CVk) with a single manipulator. It is possible to manipulate the election (CVk) according to TCM, i. e. there exists an s such that \(L \subseteq K^{*}\) for a

$$\begin{aligned} K^{*} \in&{\displaystyle \mathop {{\mathrm {arg\,min}}}_{K \in F_k(C)}}(\sum _{v \in V} Dist(K,v) + Dist(K,s))\\&\quad \,\, {=} \quad {\displaystyle \mathop {{\mathrm {arg\,min}}}_{K \in F_k(C)}} (2 (\sum _{v \in V} Dist(K,v) + Dist(K,s)))\\&\overset{\text {Lemma 1}}{=} {\displaystyle \mathop {{\mathrm {arg\,min}}}_{K \in F_k(C)}}(\sum _{v \in V'} Dist(K,v) + 2 \cdot Dist(K,s)). \end{aligned}$$

Therefore, \(K^{*}\) is a winning committee in \((C,(v'_{1},\dots ,v'_{2n},s,s),k)\). The other direction can be shown similarly.    \(\square \)

5 Conclusions

We showed that a winning committee under all minisum rules can be determined in polynomial time, whereas the corresponding decision problem for minimax rules is \(\mathrm{NP}\)-hard for trichotomous votes and incomplete linear orders. In the case of approval votes, \(\mathrm{NP}\)-hardness was already shown by [23], and the complexity of winner determination for minimax-ranksum remains open. Our analysis focuses on the case where the size of the committee is known in advance. For the case where the size of the committee is not fixed in advance, we can still measure the disagreement of a voter with several committees of different sizes with the Hamming distance in the case of approval or trichotomous votes. Hence for minisum-approval and minisum-CAV it is still possible to determine a winner in polynomial time, since it is enough to compare the disagreement for all possible sizes of the committee. LeGrand [23] argued that the corresponding decision problem for minimax-approval also remains \(\mathrm{NP}\)-complete. However in the case of complete or incomplete linear orders it is not directly clear how the disagreement of committees of different sizes can be compared with each other. Note that in particular the disagreement of a voter measured by the ranksum or the modified Kemeny distance is always zero for the committee that consists of all candidates and for the empty committee. One task for future research is to compare these rules in terms of their axiomatic properties. Besides winner determination we also studied manipulation in minisum elections, and obtained polynomial-time solvability results in all cases. Another interesting question for future research is the problem of coalitional manipulation, where not a single manipulator, but several voters try to take influence on the outcome of the election.