Introduction

Voting is a very common way of resolving disagreements, determining common opinions, choosing public policies, electing office-holders, finding winners in contests and solving other problems of amalgamating a set of (typically individual) opinions. Indeed, group decision making most often involves bargaining (see chapters by Druckman and Albin, and Kibris, this volume) or voting, or both. Voting can be precisely regulated, as in legislatures, or informal, as when a group of people decides where and how to spend a Sunday afternoon together. The outcome of voting is then deemed the collective choice made by the group.

The decision to take a vote is no doubt important, but so are the questions related to the way in which the vote is taken. In other words, the voting procedure to be applied plays an important role as well. In fact, voting rules are as important determinants of the voting outcomes as the individual opinions expressed in voting. An extreme example is one where – for a fixed set of expressed opinions of the voters – the outcome can be any one of the available alternatives depending on the procedure applied. Consider the following example of the election of department chair (Nurmi, 2006, 123–124). There are five candidates for the post. They are identified as A, B, C, D and E. Altogether nine electors can participate in the election. Four of them emphasize the scholarly merits of candidates and find that A is most qualified, E next best, followed by D, then C and finally B. Three electors deem the teaching merits as most important and give the preference order BCEDA. The remaining two electors focus on administrative qualifications and suggest the order CDEBA. These views are summarized in Table 1.

Table 1 Five candidates, five winners

4 voters    3 voters    2 voters
A           B           C
E           C           D
D           E           E
C           D           B
B           A           A

Suppose now that the voting method is the one-person-one-vote system where every voter can vote for one candidate and the winner is the recipient of the largest number of votes. This system is also known as the plurality method. Assuming that the voters vote according to their preferences expressed in Table 1, the winner is A with four votes.

The plurality system is a very common voting rule, but in many single-winner elections the aim is to elect a candidate supported by at least half of the electorate. Since there often is no such candidate, a method known as plurality runoff eliminates all but two candidates and applies the plurality rule to this restricted set of candidates. Barring a tie, this is bound to result in a winner supported by more than half of the electorate. But what is the criterion used in excluding all but two candidates? It is the number of plurality votes received. If one candidate gets more than 50% of the votes, he/she (hereafter he) is elected. Otherwise the two candidates with the largest numbers of votes face off in the second round of voting. The winner of this round is then declared the winner. In the Table 1 example, since no candidate is supported by five or more voters, the second round candidates are A and B. In the second round B presumably gets the votes of the two voters whose favorite is not present in the second round. So, B wins by the plurality runoff method.
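The two computations above can be retraced with a short script. This is only a minimal sketch: the profile is written down from the verbal description of Table 1, the function names are illustrative, ties are not handled in any principled way, and at least two candidates are assumed to receive first-place votes.

```python
from collections import Counter

# Table 1 as described in the text: (number of voters, ranking from best to worst).
PROFILE = [(4, "AEDCB"), (3, "BCEDA"), (2, "CDEBA")]

def plurality_scores(profile):
    """Count the first-place votes of each candidate."""
    scores = Counter()
    for count, ranking in profile:
        scores[ranking[0]] += count
    return scores

def plurality_runoff(profile):
    """Elect a first-round majority winner, else run off the top two candidates."""
    total = sum(count for count, _ in profile)
    (first, first_votes), (second, _) = plurality_scores(profile).most_common(2)
    if 2 * first_votes > total:
        return first
    # Second round: each voter backs whichever finalist he ranks higher.
    runoff = Counter()
    for count, ranking in profile:
        runoff[min((first, second), key=ranking.index)] += count
    return runoff.most_common(1)[0][0]

plurality_winner = plurality_scores(PROFILE).most_common(1)[0][0]
runoff_winner = plurality_runoff(PROFILE)
print(plurality_winner, runoff_winner)  # A B
```

Under sincere voting the sketch reproduces the outcomes stated in the text: A under plurality and B under plurality runoff.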

Suppose that instead of voting once as in plurality, or at most twice as in plurality runoff, the voters can vote for their candidate in every pair that can be formed. That is, they can vote for either A or B, for either B or C, etc. There are several voting methods that are based on such pairwise comparisons of decision alternatives. They differ in how the winner is determined once the pairwise votes have been taken. Most of these methods, however, agree on electing the candidate that beats all other contestants in pairwise votes, should there be such a candidate. In Table 1 there is one: it is C. C would defeat all other candidates by a majority in pairwise comparisons. It is then, by definition, the Condorcet winner.
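A minimal sketch of the pairwise test, again using the profile as described for Table 1; the helper names are illustrative and sincere voting is assumed in every pairwise vote.

```python
# Table 1 as described in the text: (number of voters, ranking from best to worst).
PROFILE = [(4, "AEDCB"), (3, "BCEDA"), (2, "CDEBA")]

def pairwise_votes(profile, x, y):
    """Number of voters who rank x above y."""
    return sum(count for count, ranking in profile
               if ranking.index(x) < ranking.index(y))

def condorcet_winner(profile):
    """Return the candidate that beats every other one by a majority, or None."""
    candidates = sorted(profile[0][1])
    for x in candidates:
        if all(pairwise_votes(profile, x, y) > pairwise_votes(profile, y, x)
               for y in candidates if y != x):
            return x
    return None

winner = condorcet_winner(PROFILE)
print(winner)  # C
```

C defeats A and D and E by five votes to four and B by six votes to three, which is why it passes the test.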

Now we have three different winners depending on which rule is adopted in the example of Table 1. However, even E can be the winner. This happens if the Borda count is used. This is a method that is based on points assigned to alternatives in accordance with the rank they occupy in individual preference orderings. The lowest rank gives 0 points, the next to lowest 1 point, the next higher 2 points, ..., the highest rank k-1 points, if the number of alternatives is k. Summing the points given to candidates by voters gives the Borda score of each candidate. In Table 1 the scores are 16 for A, 14 for B, 21 for C, 17 for D and 22 for E. The winner by the Borda count is the candidate with the largest Borda score, i.e. E.
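The point counting can be checked mechanically. The following sketch assumes the Table 1 profile as described in the text; the function name is illustrative.

```python
# Table 1 as described in the text: (number of voters, ranking from best to worst).
PROFILE = [(4, "AEDCB"), (3, "BCEDA"), (2, "CDEBA")]

def borda_scores(profile):
    """Give k-1 points for a top rank, k-2 for the next, ..., 0 for the bottom rank."""
    k = len(profile[0][1])
    scores = {candidate: 0 for candidate in sorted(profile[0][1])}
    for count, ranking in profile:
        for position, candidate in enumerate(ranking):
            scores[candidate] += count * (k - 1 - position)
    return scores

scores = borda_scores(PROFILE)
borda_winner = max(scores, key=scores.get)
print(scores, borda_winner)
```

For example, A collects 4 × 4 = 16 points from the four voters who rank it first and nothing from the remaining five, who rank it last.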

Fig. 1 The successive agenda

Even D can be the winner. Suppose that the approval voting method is adopted. This method allows each voter to vote for as many candidates as he wishes with the restriction that each candidate can be given either 1 or 0 votes. The winner is the candidate with the largest number of votes. By making the additional assumption that the group of four voters votes for their three most preferred candidates (i.e. for A, E and D), while the others vote for only their two highest ranked ones, D turns out to be the approval voting winner.
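The approval count depends on where each voter draws the line between approved and non-approved candidates. The sketch below encodes the cut-offs assumed in the text (three candidates for the group of four, two for the others); the data layout and names are illustrative.

```python
from collections import Counter

# (number of voters, ranking from best to worst, number of top candidates approved)
BALLOTS = [(4, "AEDCB", 3), (3, "BCEDA", 2), (2, "CDEBA", 2)]

def approval_scores(ballots):
    """Give one vote to every candidate above each voter's cut-off."""
    scores = Counter()
    for count, ranking, cutoff in ballots:
        for candidate in ranking[:cutoff]:
            scores[candidate] += count
    return scores

scores = approval_scores(BALLOTS)
approval_winner = scores.most_common(1)[0][0]
print(dict(scores), approval_winner)
```

With these cut-offs D receives six votes, C five, A and E four each and B three. Other cut-offs would in general give other winners.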

So, by varying the rule any candidate can be elected the department chair if the expressed voter opinions are the ones presented in Table 1. Why do we have so many rules which seemingly all aim at the same goal, viz. to single out the choice that is best from the collective point of view? All rules have intuitive justification which presumably has played a central role in their introduction. The plurality and plurality runoff rules look for the candidate that is best in the opinion of more voters than other candidates. In the case of plurality runoff there is the added constraint that the winner has to be regarded best by at least a half of the electorate. The systems based on pairwise comparisons are typically used in legislatures and other bodies dealing with choices of policy alternatives rather than candidates for offices. The motivation behind the Borda count is to elect the alternative which on the average is positioned higher in the individual rankings than any other alternative. The approval voting, in turn, looks for the alternative that is approved of by more voters than any other candidate.

Table 1 depicts a preference profile, i.e. a set of preference relations of voters over decision alternatives. In analyzing the outcomes ensuing from this profile when various methods are used, we have made assumptions regarding the voting strategy of the voters. To wit, we have assumed that they vote according to their expressed opinions. This is called sincere voting strategy. Very often the voters deviate from their true opinions in voting, e.g. when they think that their true favorite has no chance of being elected. In these situations the voters may vote for their best realistic candidate and act as if their true favorite is ranked low in their preference order. This is an example of strategic voting.

Although voting as such is a very important method for group decisions, the study of voting rules can be given another justification, viz. by substituting criteria of performance for voters in settings like Table 1, we can analyze multiple criterion decision making (MCDM). So, many results of the theory of voting systems are immediately applicable in MCDM settings (see the chapter by Salo and Hämäläinen, this volume).

A Look at the Classics

The theory underlying voting systems is known as social choice theory. It has a long but discontinuous history, documented and analyzed by McLean and Urken (1995, 1–63). While occasional discussions undoubtedly took place in medieval times, the first systematic works on voting and social choice were presented in the late 18th century. The first controversy regarding choice rules also stems from those times. It arose in the French Royal Academy of Sciences and has survived till modern times. It is therefore appropriate to give a brief account of the contributions of Jean-Charles de Borda and Marquis de Condorcet, the main parties of the controversy. While both were dealing with social choice, the specific institutions focused upon differ somewhat. Borda’s attention was on the election of persons, while Condorcet discussed the jury decision making setting. Borda was interested in the choices that would best express “the will of the electors”, while Condorcet wanted to maximize the probability that the chosen policy alternative (verdict) is “right”. Condorcet’s probability calculus, however, turned out to be defective and was soon forgotten. Today he is much better known for his paradox and a solution concept. Borda’s contribution, too, can best be outlined in terms of a paradox. Since it antedates Condorcet’s writing, we consider it first.

Borda’s paradox is a by-product of the criticism that its author directs against the plurality voting system. An instance of Borda’s paradox is presented in Table 2.

Table 2 Borda’s paradox

4 voters    3 voters    2 voters
A           B           C
B           C           B
C           A           A

The voters are identified with their preferences over three candidates: A, B and C. Thus, four voters prefer A to B and B to C. Three voters have the preference ranking BCA and two voters the ranking CBA. Assuming that each voter votes according to his preferences, A will get four, B three and C two votes. Hence, A wins by a plurality of votes.

Upon a moment’s reflection it turns out that a pretty strong case can be built for arguing that A is not a plausible winner. While it receives the plurality of votes, it is not supported by an absolute majority of voters. More importantly, its performance in pairwise comparisons with other candidates is poor: it is defeated by both B and C with a majority of votes in paired comparisons. A is, in modern terminology, the Condorcet loser. Surely, a candidate defeated by every other candidate in pairwise contests cannot be a plausible winner. This was Borda’s contention.

As a solution to the problem exhibited by the paradox Borda proposed a point counting system or method of marks. This system, described in the preceding section, is today known as the Borda count. One of its advantages is, indeed, the fact that it eliminates the Borda paradox, i.e. the Borda count never results in a Condorcet loser. The fact that it does not always result in a Condorcet winner has been viewed as one of its main shortcomings. In the above setting B is the Condorcet winner. It is also the Borda winner, but – as was just pointed out – it is possible that the Condorcet winner is not elected by the Borda count.
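The claims about Table 2 can be verified in a few lines. The sketch below takes the profile from the verbal description (four voters with ABC, three with BCA, two with CBA); all names are illustrative.

```python
from collections import Counter

# Table 2 as described in the text: (number of voters, ranking from best to worst).
PROFILE = [(4, "ABC"), (3, "BCA"), (2, "CBA")]
CANDIDATES = "ABC"

def support(x, y):
    """Number of voters who rank x above y."""
    return sum(n for n, r in PROFILE if r.index(x) < r.index(y))

# Plurality looks at first ranks only.
first_ranks = Counter()
for n, ranking in PROFILE:
    first_ranks[ranking[0]] += n
plurality_winner = max(CANDIDATES, key=lambda c: first_ranks[c])

# A Condorcet loser is defeated by every other candidate in pairwise votes.
condorcet_losers = [x for x in CANDIDATES
                    if all(support(y, x) > support(x, y)
                           for y in CANDIDATES if y != x)]

# Borda scores: k-1 points for the top rank down to 0 for the bottom rank.
borda = {c: sum(n * (len(r) - 1 - r.index(c)) for n, r in PROFILE)
         for c in CANDIDATES}

print(plurality_winner, condorcet_losers, borda)
```

The plurality winner A is at the same time the Condorcet loser, whereas the Borda scores (8 for A, 12 for B, 7 for C) single out B.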

The lessons from Borda’s paradox are the following:

  • There are degrees of detail in expressing individual opinions and using this information for making social choices. These are important determinants of choices.

  • There are several intuitive concepts of winning, e.g. pairwise and positional.

  • These concepts are not necessarily compatible. Even within these categories, i.e. pairwise and positional concept, there are incompatible views of winning.

  • If an absolute majority agrees on a highest-ranked alternative, both pairwise and plurality winners coincide.

  • The Borda count is profoundly different in not necessarily choosing the alternative ranked first by an absolute majority.

The first lesson pertains to the fact that while plurality voting requires only a minimal amount of information on voter opinions, there are methods, notably the Borda count, that are able to utilize richer forms of expressing opinions. This observation thus poses the question of the “right” form of expressing opinions.

The second lesson points to the central observation in Borda’s paradox, viz. “winning” may mean different things to different observers. The view underlying the plurality voting according to which the most frequently first-ranked candidate is the winner is clearly a positional view, but a very limited one: it looks only at the distribution of first preferences over candidates. The Borda count is also based on a positional view of winning: to win one has to occupy higher positions, on the average, than the other candidates.

The fourth lesson suggests that some methods of both the pairwise and the plurality variety agree, i.e. come up with an identical choice, when more than 50% of the voters have the same candidate ranked first. This may explain the absolute majority requirement often imposed on winners in presidential elections.

The fifth lesson says that Borda’s proposal differs from many other voting systems in not necessarily electing a candidate that is first-ranked by an absolute majority of voters. Indeed, when the number of candidates is larger than the number of voters, the Borda count may not elect a candidate that is first-ranked by all but one voter. Depending on one’s view on the importance of protecting minority interests, this feature can be regarded as a virtue or vice (see Baharad and Nitzan, 2002).

Condorcet’s paradox is better known than Borda’s. In the literature it is sometimes called the voting paradox, simpliciter. Given the large number of various kinds of paradoxes related to voting, it is, however, preferable to call it Condorcet’s paradox. In its purest version it takes the following form:

Table 3

Suppose that we compare the candidates in pairs according to an exogenously determined list (agenda) so that the winner of each comparison survives while the loser is eliminated. Hence, we need to conduct two paired comparisons. Suppose that the agenda is: (i) A vs. B, and (ii) the winner of (i) vs. C. The winner of (ii) is the overall winner. Notice that just two out of all three possible pairwise comparisons are performed. The method is based on the (erroneous) assumption that whichever alternative defeats the winner of an earlier pairwise comparison, also defeats the loser of it.

If the voters vote sincerely, A will win in (i) and C in (ii). C thus becomes the overall winner. Suppose, however, that C were confronted with the loser of (i), i.e. B. The winner of this hypothetical comparison would be B. Prima facie, it could be argued that since it (B) would defeat the former winner C, it is the “real” winner. However, this argument overlooks the fact that there is a candidate that defeats B, viz. A. But not even A can be regarded as the true winner as it is beaten by C. So, no matter which candidate is picked as the winner, there is another candidate that defeats it.

The lessons of Condorcet’s paradox are the following:

  • The winner of the pairwise comparison sequence depends on the agenda. More precisely, any candidate can be rendered the winner of the procedure if one has full control over the agenda.

  • The paradox implicitly assumes complete voter myopia. In other words, in each pairwise comparison every voter is assumed to vote for whichever candidate he prefers to the other one.

  • Splitting rankings into pairwise components entails losing important information about preferences.

The first lesson pertains to the importance of agenda-setting power in certain types of preference profiles. When the preferences of voters form a Condorcet paradox, any alternative can be made the winner with suitable adjustment of the agenda of pairwise votes.
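The agenda dependence can be illustrated with a small simulation. The sketch assumes a three-voter cyclic profile with the rankings ABC, BCA and CAB, which is consistent with the pairwise results stated above but is an assumption here, and it assumes sincere (myopic) voting; the function names are illustrative.

```python
# Assumed cyclic profile: one voter per ranking, best to worst.
PROFILE = ["ABC", "BCA", "CAB"]

def majority_winner(x, y):
    """Winner of the sincere pairwise vote between x and y (odd electorate)."""
    votes_x = sum(1 for ranking in PROFILE if ranking.index(x) < ranking.index(y))
    return x if 2 * votes_x > len(PROFILE) else y

def run_agenda(agenda):
    """The first two alternatives meet; the survivor meets the next one, and so on."""
    survivor = agenda[0]
    for challenger in agenda[1:]:
        survivor = majority_winner(survivor, challenger)
    return survivor

outcomes = {agenda: run_agenda(agenda) for agenda in ("ABC", "BCA", "CAB")}
print(outcomes)  # {'ABC': 'C', 'BCA': 'A', 'CAB': 'B'}
```

In this profile the alternative that enters the agenda last always wins, so an agenda-setter who knows the profile can obtain any outcome he wants as long as the voters are sincere.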

The second lesson points out an important underlying assumption, viz. the voters are assumed to vote at each stage of procedure for the candidate that is preferable. For example, one assumes that the voter with preference ranking ABC will vote for A in the first pairwise vote between A and B because he prefers A to B. Yet, it might make sense for him to vote for B if he knows the entire preference profile as well as the agenda. For then he also knows that whichever candidate wins the first ballot will confront C in the second one. If this voter wishes to avoid C (his last-ranked candidate) being elected, he should vote for B in the first ballot since B will definitely be supported by the second voter in the ballot against C. So, complete agenda-control is possible only if the voters are myopic. In other words, strategic voting may be an antidote against agenda-manipulation.

The third lesson has been emphasized by Saari (1995, pp. 87–88). If the voters are assumed to possess rankings over candidates, it makes no sense to split these rankings into pairs ignoring all the rest of the preference information. Given what we know about the preference profile, a tie of all three alternatives is the only reasonable outcome (assuming that we do not wish to discriminate for or against any candidate or voter). The Condorcet paradox emerges not only in cases where the voters submit consistent (i.e. complete and transitive) preference rankings, but it can also pop up in settings where none of the voters has a consistent ranking. In the latter case, the word “paradox” is hardly warranted since no one expects collective preferences to be consistent if all individual preferences are inconsistent.

The two classic voting paradoxes have some joint lessons as well. Firstly, they tell us what can happen, not what will necessarily, often or very rarely happen. Secondly, there are limits of what one can expect from voting institutions in terms of performance. More specifically, the fact that one resorts to a neutral and anonymous procedure – such as plurality voting or the Borda count – does not guarantee that the voting outcomes would always reflect the voter opinions in a natural way. Thirdly, the fact that strategic voting may avoid some disastrous voting outcomes, poses the question of whether the voters are instrumentally rational or wish to convey their opinions in voting.

All these issues have been dealt with in the extensive social choice literature of our time. Probability models and computer simulations have been resorted to in order to find out the likelihood of various types of paradoxes (see e.g. Gehrlein, 1997; Gehrlein and Fishburn, 1976a, b; Gehrlein and Lepelley, 1999). The performance criteria for voting procedures have also been dealt with (see e.g. Nurmi, 1987; Riker, 1982; Straffin, 1980). The issue of strategic vs. sincere voting has been in the focus ever since the path-breaking monograph of Farquharson (1969). So, the classic voting paradoxes have been instrumental in the development of the modern social choice theory.

Single-Winner Voting Systems

The bulk of voting theory deals with systems resulting in the choice of one candidate or alternative. These are called single-winner voting systems. A large number of such systems exists today. They can be classified in many ways, but perhaps the most straight-forward one is to distinguish between binary and positional systems. The former are based on pairwise comparisons of alternatives, whereas the latter aim at choosing the candidate that is better – in some specific sense – positioned in the voters’ preferences than other candidates. These two classes do not, however, exhaust all systems. Many systems contain both binary and positional elements. We shall call them hybrid ones.

Examples of binary systems are Dodgson’s method, Copeland’s rule and the max–min method. Dodgson’s method aims at electing a Condorcet winner when one exists. Since this is not always the case, the method looks for the candidate which is closest to a Condorcet winner in the sense that the number of binary preference changes needed for the candidate to become a Condorcet winner is smaller than the number of changes needed to make any other candidate one.

Copeland’s rule is based on all k(k–1)/2 majority comparisons of alternatives. For each comparison, the winning candidate receives 1 point and the non-winning one 0 points. The Copeland score of a candidate is the sum of his points in all pairwise comparisons. The winner is the candidate with the largest Copeland score.
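As an illustration, the Copeland scores for the profile described for Table 1 can be computed as follows. This is a sketch with illustrative names; a tied comparison gives 0 points to both candidates, in line with the scoring just described.

```python
from itertools import combinations

# Table 1 as described in the Introduction: (number of voters, ranking best to worst).
PROFILE = [(4, "AEDCB"), (3, "BCEDA"), (2, "CDEBA")]
CANDIDATES = "ABCDE"

def support(x, y):
    """Number of voters who rank x above y."""
    return sum(n for n, r in PROFILE if r.index(x) < r.index(y))

def copeland_scores():
    """One point to the majority winner of every pair of candidates."""
    scores = {c: 0 for c in CANDIDATES}
    for x, y in combinations(CANDIDATES, 2):
        if support(x, y) > support(y, x):
            scores[x] += 1
        elif support(y, x) > support(x, y):
            scores[y] += 1
    return scores

scores = copeland_scores()
copeland_winner = max(scores, key=scores.get)
print(scores, copeland_winner)
```

C wins all four of its comparisons and is therefore the Copeland winner, as any Condorcet winner must be.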

The max–min method determines the minimum support of a candidate in all pairwise comparisons, i.e. the number of votes he receives when confronted with his toughest competitor. The candidate with the largest minimum support is the max–min winner.
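A corresponding sketch for the max–min method, again on the profile described for Table 1 and with illustrative names:

```python
# Table 1 as described in the Introduction: (number of voters, ranking best to worst).
PROFILE = [(4, "AEDCB"), (3, "BCEDA"), (2, "CDEBA")]
CANDIDATES = "ABCDE"

def support(x, y):
    """Number of voters who rank x above y."""
    return sum(n for n, r in PROFILE if r.index(x) < r.index(y))

# Minimum support: the votes a candidate gets against his toughest competitor.
min_support = {x: min(support(x, y) for y in CANDIDATES if y != x)
               for x in CANDIDATES}
maxmin_winner = max(min_support, key=min_support.get)
print(min_support, maxmin_winner)
```

C never gets fewer than five of the nine votes, while every other candidate drops to four or fewer against some competitor.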

Of positional systems we have already discussed two, viz. the plurality system and the Borda count. The former determines the winner on the basis of the number of first ranks occupied by each candidate in the voters’ preference rankings. The latter takes a more “holistic” view of the preferences in assigning different points to different ranks. Also approval voting can be deemed a positional system. So can anti-plurality voting, where the voters vote for all except their lowest-ranked candidate and the winner is the candidate with more votes than other candidates.

Of hybrid systems the best-known is undoubtedly the plurality runoff. It is a mixture of plurality voting and binary comparison. The way it is implemented in e.g. presidential elections in France, there are either one or two ballots. If one of the candidates receives more than half of the total number of votes, he is elected. Otherwise, there will be a second ballot between those two candidates who received more votes than the others in the first ballot. The winner is then the one who gets more votes in the second ballot. Obviously, this system can be implemented in one round of balloting if the voters give their full preference rankings.

Another known hybrid system is the single transferable vote. Its single-winner variant is called Hare’s system. It is based on principles similar to those of the plurality runoff system. The winner is the candidate ranked first by more than half of the electorate. If no such candidate exists, Hare’s system eliminates the candidate with the smallest number of first ranks and considers those candidates ranked second in the ballots with the eliminated candidate ranked first as first ranked. If a candidate now has more than half of the first ranks, he is elected. Otherwise, the elimination continues until a winner is found.
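The elimination process can be sketched as a loop. The description above does not say how ties for the smallest number of first ranks are broken; the sketch eliminates the alphabetically first of the tied candidates, which is purely an assumption made for illustration. The profile is the one described for Table 1.

```python
# Table 1 as described in the Introduction: (number of voters, ranking best to worst).
PROFILE = [(4, "AEDCB"), (3, "BCEDA"), (2, "CDEBA")]

def hare_winner(profile):
    """Eliminate the candidate with the fewest first ranks until someone has a majority."""
    total = sum(count for count, _ in profile)
    remaining = sorted(profile[0][1])
    while True:
        tallies = {c: 0 for c in remaining}
        for count, ranking in profile:
            # A ballot counts for its highest-ranked candidate not yet eliminated.
            top = next(c for c in ranking if c in remaining)
            tallies[top] += count
        leader = max(remaining, key=tallies.get)
        if 2 * tallies[leader] > total or len(remaining) == 1:
            return leader
        # Assumed tie-break: the alphabetically first candidate with the fewest votes goes.
        remaining.remove(min(remaining, key=tallies.get))

winner = hare_winner(PROFILE)
print(winner)  # B
```

Here D and E have no first ranks and are removed first; once C is eliminated, its two ballots transfer to B, which then has five of the nine votes.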

These are but a sample of the voting systems considered in the literature (for more extensive listing, see e.g. Nurmi, 1987; Richelson, 1979; Straffin, 1980). They can all be implemented once the preference profile is given (in the case of approval voting one also needs the cut-off point indicating which alternatives in the ranking are above the acceptance level). In a way, one may assume that all alternatives or candidates are being considered simultaneously. There are other systems in which this is not the case, but only a proper subset of alternatives is being considered at any given stage of the procedure.

Agenda-Based Systems

It can be argued that all balloting is preceded by an agenda-formation process. In political elections, it is often the task of the political parties to suggest candidates. In committee decisions the agenda-building is typically preceded by a discussion in the course of which various parties make proposals for the policy to be adopted or candidates for offices. By agenda-based procedures one usually refers to committee procedures where the agenda is explicitly decided upon after the decision alternatives are known. Typical settings of agenda-based procedures are parliaments and committees.

Two procedures stand out among the agenda-based systems: (i) the amendment and (ii) the successive procedure. Both are widely used in contemporary parliaments. Rasch (1995) reports that the latter is the most common parliamentary voting procedure in the world. Like the amendment procedure, it is based on pairwise comparisons, but in such a way that at each stage of the procedure an alternative is confronted with all the remaining alternatives. If it is supported by a majority, it is elected and the process is terminated. Otherwise this alternative is set aside and the next one is confronted with all the remaining alternatives. Again the majority decides whether this alternative is elected and the process terminated or whether the next alternative is picked up for the next vote. Eventually one alternative gets the majority support and is elected.

Figure 1 shows an example of a successive agenda where the order of alternatives to be voted upon is A, C, B and D. Whether this sequence will be followed through depends on the outcomes of the ballots. In general, the maximum number of ballots taken of k alternatives is k–1.

The amendment procedure confronts alternatives with each other in pairs so that in each ballot two separate alternatives are compared. Whichever gets the majority of votes proceeds to the next ballot, while the loser is set aside. Figure 2 shows an example of an amendment agenda over 3 alternatives: x, y and z. According to the agenda, alternatives x and y are first compared and the winner is faced with z on the second ballot.

Both the amendment and successive procedure are very agenda-sensitive systems. In other words, two agendas may produce different outcomes even though the underlying preference ranking of voters and their voting behavior remain the same. Under sincere voting – whereby for all alternatives x and y the voter always votes for x if he prefers x to y and vice versa – Condorcet’s paradox provides an example: of the three alternatives any one can be rendered the winner depending on the agenda. To determine the outcomes – even under sincere voting – of successive procedure requires assumptions regarding voter preferences over subsets of alternatives. Under the assumption that the voters always vote for the subset of alternatives that contains their first-ranked alternative, the successive procedure is also vulnerable to agenda-manipulation.
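The agenda-sensitivity of the successive procedure can be sketched under the assumption just mentioned, i.e. that a voter supports the alternative on the floor only if it is his first-ranked alternative among those still under consideration. The three-voter cyclic profile below is an assumed illustration, and the function name is hypothetical.

```python
# Assumed cyclic profile: one voter per ranking, best to worst.
PROFILE = ["ABC", "BCA", "CAB"]

def successive_winner(agenda, profile):
    """Vote on the alternatives one at a time, in agenda order, against all the rest."""
    remaining = list(agenda)
    for alternative in agenda[:-1]:
        # A voter backs the alternative on the floor only if it is his top
        # choice among the alternatives not yet set aside.
        votes_for = sum(1 for ranking in profile
                        if min(remaining, key=ranking.index) == alternative)
        if 2 * votes_for > len(profile):
            return alternative
        remaining.remove(alternative)
    # After k-1 unsuccessful ballots only one alternative is left.
    return remaining[0]

outcomes = {agenda: successive_winner(agenda, PROFILE)
            for agenda in ("ABC", "BCA", "CAB")}
print(outcomes)  # {'ABC': 'B', 'BCA': 'C', 'CAB': 'A'}
```

Each of the three agendas leads to a different winner, although the preferences and the voting behavior are held fixed.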

Evaluating Voting Systems

The existence of a large number of voting systems suggests that people in different times and places have had somewhat different intuitive notions of how the collective choices should be made. Or they may have wanted to put emphasis on somewhat different aspects of the choice process. The binary systems have, overall, tended to emphasize that the eventual Condorcet winners be elected. An exception to this is the successive procedure which can be regarded as a binary system, albeit one where an alternative is compared with a set of alternatives. Assuming that the voters vote for the set which contains their highest ranked alternative, it may happen that the Condorcet winner is voted down in the early phases of the process. Also positional voting systems, e.g. plurality voting and the Borda count, may fail to elect a Condorcet winner.

A strong version of the Condorcet winner criterion requires that an eventual strong Condorcet winner is elected. A strong Condorcet winner is an alternative that is ranked first by more than half of the electorate. A large majority of the systems considered here satisfies this criterion. The only exceptions are the Borda count and approval voting. This is shown by Table 3. B’s Borda score is largest. B is also elected by approval voting if the seven-voter group approves of both A and B.

Table 3 Borda count and approval voting vs. strong Condorcet winner

Electing the Condorcet winner has generally been deemed a desirable property of voting systems. Profile component analysis results by Saari (1995) as well as a counterexample of Fishburn have, however, cast doubt on the plausibility of this criterion. Fishburn’s (1973) example is reproduced in Table 4. Here the Borda winner E seems a more plausible choice than the Condorcet winner D since E has as many first ranks as D, strictly more second and third ranks and no voter ranks it worse than third, whereas D is ranked next to last by one voter and last by one voter.

Table 4 Fishburn’s example

Another criterion associated with Condorcet’s name is the Condorcet loser one. It requires that an eventual Condorcet loser be excluded from the choice set. This criterion is generally accepted as plausible constraint on social choices.

These two are but examples of the several criteria to be found in the literature. One of the most compelling ones is monotonicity. It says that additional support should never harm a candidate’s chances of getting elected. To state this requirement more precisely, consider a preference profile P consisting of rankings of n voters over the set X of k candidates. Suppose that voting rule f is applied to this profile and that candidate x is the winner. That is,

$$f(P, X) = x.$$

Suppose now that another profile \(P^\prime\) is formed so that x’s position is improved in at least one individual ranking, but no other changes are made to P. The method f is monotonic if

$$f(P^\prime, X) = x.$$

While many voting systems – e.g. plurality voting and Borda count – are monotonic, there are commonly used procedures that are non-monotonic, e.g. plurality runoff and single transferable vote. Their failure on monotonicity is exhibited in Table 5.

Table 5 Non-monotonicity of plurality runoff and STV

Here A and B will face each other in the second round, whereupon A wins. Suppose now that A had somewhat more support to start with so that the two right-most voters had the preference ranking ABC instead of BAC. In this new profile, A confronts C in the second round, where the latter wins. The same result is obtained using Hare’s system since with three alternatives it is equivalent to plurality runoff.
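The mechanism can be reproduced with a small script. The 17-voter profile used below is an assumed illustration that matches the verbal description (A and B reach the second round; two voters move from BAC to ABC); it is not claimed to be identical to the entries of Table 5, and the function name is illustrative.

```python
from collections import Counter

def plurality_runoff(profile):
    """Majority winner in the first round, else a runoff between the top two."""
    total = sum(n for n, _ in profile)
    first_round = Counter()
    for n, ranking in profile:
        first_round[ranking[0]] += n
    (x, x_votes), (y, _) = first_round.most_common(2)
    if 2 * x_votes > total:
        return x
    # Second round: count the voters who rank x above y (odd electorate, no ties).
    x_support = sum(n for n, r in profile if r.index(x) < r.index(y))
    return x if 2 * x_support > total else y

# Assumed profile: (number of voters, ranking from best to worst).
BEFORE = [(6, "ABC"), (5, "CAB"), (4, "BCA"), (2, "BAC")]
# The two right-most voters now rank A first; nothing else changes.
AFTER = [(6, "ABC"), (5, "CAB"), (4, "BCA"), (2, "ABC")]

winner_before = plurality_runoff(BEFORE)
winner_after = plurality_runoff(AFTER)
print(winner_before, winner_after)  # A C
```

The additional first ranks for A push B out of the second round, and A then loses to C by nine votes to eight.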

The Pareto criterion is quite commonplace in economics, but it has an important place in the theory of voting as well. In this context it is phrased as follows: if every voter strictly prefers alternative x to alternative y, then y is not the social choice. Most voting systems satisfy this plausible requirement, but notably the agenda-based ones do not. Pareto violations of the amendment procedure and approval voting have been shown e.g. in Nurmi (1987), and that of the successive procedure can be seen by applying the successive agenda of Fig. 1 to the profile of Table 6, where B will be elected even though everyone prefers A to B.

Table 6 Pareto violation of successive procedure under agenda of Fig. 1

Another criterion of considerable intuitive appeal is consistency. It concerns choices made by subsets of voters. Let the voter set N and profile P be partitioned into \(N_1\) and \(N_2\), with preference profiles \(P_1\) and \(P_2\), respectively. Let \(F(X, P_i)\) denote the choice set of \(N_i\) with \(i = 1, 2\). Suppose now that some of the winning alternatives in \(N_1\) are also winning in \(N_2\), that is, \(F(X, P_1) \cap F(X, P_2) \neq \emptyset\). Consistency now requires that \(F(X, P_1) \cap F(X, P_2) = F(X, P)\). In words, if the subgroups elect the same alternatives, these should also be chosen by the group at large. Despite its intuitive plausibility, consistency is not common among voting systems. Of the systems discussed here, only plurality, Borda count and approval voting are consistent.

Even more rare is the property called Chernoff (a.k.a. property α or heritage). It states that, given a profile and a set X of alternatives, if an alternative, say x, is the winner in X, it should be the winner in every proper subset of X it belongs to. This property characterizes only approval voting and even in this case an additional assumption is needed, viz. that the voters’ approved alternatives do not change when the alternative set is diminished. A summary evaluation of the voting systems introduced above is presented in Table 7. (In the evaluation of the agenda based systems, amendment and successive procedure, the additional assumption of fixed agenda has been made).

Table 7 Summary evaluation of some voting systems a = Condorcet winner, b = Condorcet loser, c = majority winning, d = monotonicity, e = Pareto, f = consistency and g = Chernoff

Profile Analysis Techniques

The standard starting point in social choice theory is the preference profile, i.e. a set of complete and transitive preference relations – one for each voter – over a set of alternatives. Under certain behavioral assumptions, these profiles together with the voting rule determine the set of chosen alternatives. In the preceding the behavioral assumption has been that the voters vote according to their preferences at each stage of the process. This assumption is not always plausible, but can be justified as benchmark for voting system evaluations. Moreover, it is useful in extending the results to multi-criterion decision making (MCDM) and/or in applying the MCDM results. To translate the voting results into MCDM, one simply substitutes “criteria” for “voters”. The assumption that voting takes place according to preferences (or performance rankings in MCDM) is then most natural.

Several descriptive techniques have been devised for the analysis of preference profiles. The outranking matrix is one of them. Given a profile of preferences over k alternatives, the outranking matrix is a k × k matrix, where the entry on the ith row and jth column equals the number of voters preferring the ith alternative to the jth one. Ignoring the diagonal entries, the Borda scores of alternatives can now be obtained as row sums, so that the sum of all non-diagonal entries on the ith row is the Borda score of the ith alternative.
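As a minimal sketch of this construction, the following Python fragment builds the outranking matrix for the Table 1 profile and reads the Borda scores off the row sums. The function names and the encoding of rankings as strings are illustrative assumptions, not part of the original exposition.

```python
# Table 1 profile: (number of voters, ranking from best to worst).
profile = [(4, "AEDCB"), (3, "BCEDA"), (2, "CDEBA")]
alternatives = "ABCDE"

def outranking_matrix(profile, alternatives):
    """Entry [i][j] = number of voters ranking alternatives[i] above alternatives[j]."""
    k = len(alternatives)
    matrix = [[0] * k for _ in range(k)]
    for count, ranking in profile:
        position = {alt: ranking.index(alt) for alt in alternatives}
        for i, a in enumerate(alternatives):
            for j, b in enumerate(alternatives):
                if a != b and position[a] < position[b]:
                    matrix[i][j] += count
    return matrix

def borda_scores(matrix, alternatives):
    """Borda score of each alternative = sum of its row (the diagonal is zero)."""
    return {alt: sum(row) for alt, row in zip(alternatives, matrix)}

M = outranking_matrix(profile, alternatives)
scores = borda_scores(M, alternatives)
print(scores)
```

For the Table 1 profile the row sums are 16, 14, 21, 17 and 22 for A to E, so E has the highest Borda score.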

From the outranking matrix one can form the tournament (a.k.a. dominance) matrix by placing 1 in the ith row and jth column if the ith alternative beats the jth one, i.e. if more voters prefer the ith alternative to the jth one than vice versa. Otherwise, the entry equals zero. From the tournament matrix one can directly spot an eventual Condorcet winner: it is the alternative that corresponds to the row where all non-diagonal entries are 1’s. Similarly, the Condorcet loser is the alternative represented by a row in the tournament matrix that has just zero entries.
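The step from outranking matrix to tournament matrix can be sketched as follows. The matrix below is the outranking matrix of the Table 1 profile, worked out by hand; the function names are hypothetical, and the Condorcet loser test follows the text in assuming that no pairwise comparison ends in a tie (which holds here because the number of voters is odd).

```python
# Outranking matrix of the Table 1 profile (rows and columns in the order A..E),
# computed by hand from the rankings AEDCB (4 voters), BCEDA (3) and CDEBA (2).
ALTERNATIVES = "ABCDE"
OUTRANKING = [
    [0, 4, 4, 4, 4],
    [5, 0, 3, 3, 3],
    [5, 6, 0, 5, 5],
    [5, 6, 4, 0, 2],
    [5, 6, 4, 7, 0],
]

def tournament_matrix(outranking):
    """Entry [i][j] is 1 if more voters prefer i to j than j to i, else 0."""
    k = len(outranking)
    return [[1 if i != j and outranking[i][j] > outranking[j][i] else 0
             for j in range(k)] for i in range(k)]

def condorcet_winner(tournament, alternatives):
    """Alternative whose row has 1 in every non-diagonal entry, or None."""
    for i, row in enumerate(tournament):
        if all(v == 1 for j, v in enumerate(row) if j != i):
            return alternatives[i]
    return None

def condorcet_loser(tournament, alternatives):
    """Alternative whose row consists of zeros only, or None (assumes no ties)."""
    for i, row in enumerate(tournament):
        if all(v == 0 for v in row):
            return alternatives[i]
    return None

U = tournament_matrix(OUTRANKING)
winner = condorcet_winner(U, ALTERNATIVES)
loser = condorcet_loser(U, ALTERNATIVES)
print(winner, loser)
```

In this profile C beats every other candidate pairwise, while the plurality winner A loses every pairwise comparison.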

In the preceding we have assumed that the voters vote sincerely at each stage of the process. There are, however, contexts in which it is plausible to expect that voters vote strategically in the sense of trying to achieve as good an end result as possible even though that would imply voting in a way that differs from the voter’s preferences. This often happens in plurality or plurality runoff systems if the voters have some information about the distribution of the support of various candidates. Voting for a “lesser evil” rather than for one’s favorite may be quite plausible for the supporters of candidates with very slim chances of getting elected. The analysis of strategic or sophisticated voting based on the elimination of dominated voting strategies in binary agendas was started by Dummett and Farquharson (1961; see also the chapter by Chatterjee, this volume). The goal was to predict the voting outcomes starting from a preference profile and voting rule under the assumption of strategic voting (see also Dummett, 1984; Farquharson, 1969).

The method of eliminating dominated strategies is somewhat cumbersome. For binary voting systems McKelvey and Niemi (1978) have suggested a backwards induction procedure whereby the sophisticated voting strategies can be easily determined, if the preference profile is known to all voters (see also Shepsle and Weingast, 1984). Given an agenda of pairwise votes, the procedure starts from the final nodes of the voting tree and replaces them with their strategic equivalents. These are the alternatives that win the last pairwise comparisons. In Fig. 2 we have two final nodes: one that represents the x vs. z comparison and the other representing the y vs. z comparison. Since the profile is known, we can predict the outcome of these final votes, as at this stage the voters have no reason not to vote sincerely. We can thus replace the left-hand (right-hand, respectively) final node with x or z (y or z) depending on which one wins this comparison under sincere voting. What we have left, then, is the initial node followed by two possible outcomes. By the same argument as we just presented, we now predict that the voters vote according to their preferences between these two outcomes in the initial node, whereupon we know the sophisticated voting strategy of each voter. The same backwards induction method can be used for the successive procedure, i.e. in settings where the agenda (e.g. Fig. 1) and the preference profile are known.

Fig. 2 The amendment agenda
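The backwards induction argument can be sketched in Python for an amendment agenda in which the first two alternatives are voted on and the survivor meets each remaining alternative in turn. The three-voter cyclic profile is a hypothetical illustration, not taken from the tables of this chapter, and ties are resolved in favor of the surviving alternative, which is an assumption of this sketch.

```python
def majority_prefers(profile, a, b):
    """True if strictly more voters rank a above b than b above a."""
    votes_a = sum(n for n, r in profile if r.index(a) < r.index(b))
    votes_b = sum(n for n, r in profile if r.index(b) < r.index(a))
    return votes_a > votes_b

def sincere_outcome(profile, agenda):
    """Voters compare the two alternatives actually on the ballot at each stage."""
    survivor = agenda[0]
    for challenger in agenda[1:]:
        if majority_prefers(profile, challenger, survivor):
            survivor = challenger
    return survivor

def sophisticated_outcome(profile, agenda):
    """Replace each node by its strategic equivalent, starting from the final nodes."""
    def solve(survivor, rest):
        if not rest:
            return survivor
        # Strategic equivalents of the two branches of the current vote.
        if_survivor_wins = solve(survivor, rest[1:])
        if_challenger_wins = solve(rest[0], rest[1:])
        if majority_prefers(profile, if_challenger_wins, if_survivor_wins):
            return if_challenger_wins
        return if_survivor_wins
    return solve(agenda[0], agenda[1:])

# Hypothetical Condorcet-paradox profile: x beats y, y beats z, z beats x.
profile = [(1, "xyz"), (1, "yzx"), (1, "zxy")]
agenda = "xyz"  # x vs. y first, the winner against z
print(sincere_outcome(profile, agenda), sophisticated_outcome(profile, agenda))
```

With this profile sincere voting leads to z, whereas the sophisticated outcome is y: the final nodes reduce to z (from x vs. z) and y (from y vs. z), and a majority prefers y to z.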

The McKelvey–Niemi algorithm is agenda-based. A more general approach to determining the outcomes resulting from strategic voting is to look for the uncovered alternatives (Miller, 1980; 1995). Given a preference profile, we define the relation of covering as follows: alternative x covers alternative y if the former defeats the latter in a pairwise contest and, moreover, x defeats all those alternatives that y defeats. It is clear that a covered alternative cannot be the sophisticated voting winner since no matter what alternative it is confronted with in the final comparison, it will be defeated. Hence, the set of uncovered alternatives includes the set of sophisticated voting winners.

Miller (1980) has shown that for any alternative x in X, any alternative y in the uncovered set either defeats x or there is an alternative z which (i) is defeated by y, and (ii) defeats x. This suggests the use of the tournament matrix and its square to identify the uncovered set (Banks, 1985):

$$T = U + U^2,$$

where U is the tournament matrix. The alternatives represented by rows in T where all non-diagonal entries are non-zero form the uncovered set.
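A minimal sketch of this matrix test, assuming a tournament without pairwise ties, is given below. The four-alternative tournament is hypothetical: a beats b and d, b beats c and d, c beats a, and d beats c.

```python
def uncovered_set(U, alternatives):
    """Rows of T = U + U^2 whose non-diagonal entries are all non-zero."""
    k = len(U)
    U2 = [[sum(U[i][m] * U[m][j] for m in range(k)) for j in range(k)]
          for i in range(k)]
    T = [[U[i][j] + U2[i][j] for j in range(k)] for i in range(k)]
    return [alternatives[i] for i in range(k)
            if all(T[i][j] != 0 for j in range(k) if j != i)]

# Hypothetical tournament matrix, rows and columns in the order a, b, c, d.
U = [
    [0, 1, 0, 1],  # a beats b and d
    [0, 0, 1, 1],  # b beats c and d
    [1, 0, 0, 0],  # c beats a
    [0, 0, 1, 0],  # d beats c
]
print(uncovered_set(U, "abcd"))
```

Here d is covered by b: b beats d and also beats c, the only alternative that d beats. Accordingly the row of d in T has a zero in the column of b, and the uncovered set is {a, b, c}.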

The uncovered set contains all sophisticated voting outcomes, but is too inclusive. In other words, there may be uncovered alternatives that are not sophisticated voting outcomes under any conceivable agenda. A precise characterization of the sophisticated voting outcomes has been given by Banks (1985; see also Miller, 1995). It is based on Banks chains. Given any alternative x and preference profile, the Banks chain is formed by first finding another alternative, say \(x_1\), that defeats x. If no such \(x_1\) exists, we are done and the end point of the Banks chain is x. If it does exist, one looks for a third alternative, say \(x_2\), that defeats both x and \(x_1\). Continuing in this manner we eventually reach a stage where no alternative can be found that defeats all its predecessors. The last alternative found is called a Banks alternative, i.e. it is the end point of a Banks chain beginning from x. The Banks set consists of all Banks alternatives. In other words, the set of all sophisticated voting outcomes can be found by forming all possible Banks chains and considering their end points. In contrast to the uncovered set, there are no efficient algorithms for computing the Banks set.
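For small alternative sets the Banks set can nevertheless be found by exhaustive enumeration of chains. The following sketch does exactly that and therefore takes exponential time in the worst case; the tournament is a hypothetical four-alternative example (a beats b and d, b beats c and d, c beats a, d beats c), and the function name is an assumption of this illustration.

```python
def banks_set(U, alternatives):
    """End points of all maximal Banks chains, found by brute-force enumeration."""
    k = len(U)
    end_points = set()

    def extend(chain):
        # Alternatives that defeat every member of the chain built so far.
        candidates = [c for c in range(k)
                      if c not in chain and all(U[c][p] == 1 for p in chain)]
        if not candidates:
            end_points.add(alternatives[chain[-1]])  # chain cannot be extended
            return
        for c in candidates:
            extend(chain + [c])

    for start in range(k):
        extend([start])
    return sorted(end_points)

# Hypothetical tournament matrix, rows and columns in the order a, b, c, d.
U = [
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
print(banks_set(U, "abcd"))
```

In this example the chains starting from d end in a (via d, a or d, b, a), and no chain ends in d, so the Banks set is {a, b, c}.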

More recently, Saari (1995) has presented a new, geometric approach to voting systems. His representational triangles (a.k.a. Saari triangles) are very illuminating in analyzing three-alternative profiles. They are also useful in illustrating the effects of various profile components. Consider the profile of Table 3. There almost everything points to the election of A: it is the plurality winner, plurality runoff winner and strong Condorcet winner. Yet, it is not the Borda winner.

The preference profile over three alternatives can be translated into an equilateral triangle with vertices standing for alternatives. Drawing all median lines within the triangle results in six small triangles. Each one of them represents a preference ranking so that the distance from the vertices determines the ranking. So, the area labelled 7 represents the ABC ranking since it is closest to vertex A, and closer to B than to C. Similarly, the triangle marked with 4 is closest to the B vertex and C is the next closest one, so that it represents the BCA ranking.

The plurality, Borda and Condorcet winners can be determined from the representational triangle as follows. The sum of the two entries in the triangles closest to each vertex gives the plurality votes of the candidate represented by the vertex. Thus, for instance, 7 + 0 is the plurality vote sum of A. The Borda score of A, in turn, can be computed by summing the entries on the left side of the line segment connecting C and the mid-point of the AB line, and the entries on the lower side of the line segment connecting B and the mid-point of the AC line, that is, 7 + 7 = 14. Similarly, B’s Borda score is 11 + 4 = 15 and C’s 4 + 0 = 4. That A is the Condorcet winner can be inferred from the fact that both of its summands are greater than 5.5, the number of voters divided by two. The fact that C is the Condorcet loser can be inferred from its summands as well: they are both less than half the number of voters.

Despite the fact that much speaks in favor of the election of A in the Table 3 profile, it can be argued that the Borda winner B is a more robust winner than A with respect to certain changes in the size of the voter group (Saari, 1995, 2001a, b). To wit, suppose that we remove from the group a set of voters whose preferences imply a tie among all alternatives. In other words, this group – acting alone – could not decide which alternative is better than the others. Its preference profile constitutes an instance of the Condorcet paradox. Intuitively, then, the removal of this group should not make a difference in the choice of the collectively best alternative. Yet, if our choice criterion dictates that an eventual Condorcet winner should be chosen whenever it exists, the removal of this kind of sub-profile can make a difference. Similarly, adding such a group can change the Condorcet winner.

To illustrate, suppose that we add to the electorate of Table 3 a group of 12 voters with a preference profile that constitutes a Condorcet paradox: A defeats C, C defeats B and B defeats A, with equal vote margins, viz. 8 vs. 4. The resulting representational triangle is shown in Fig. 4.

Fig. 3 Representational triangle of Table 3

Fig. 4 Adding a Condorcet portion

Making the same computations for Fig. 4 as for Fig. 3 above shows that A remains the plurality winner, but the Condorcet winner is now B. So, adding a voter group with a perfect tie profile changes the Condorcet winner. The Borda winner, in contrast, remains the same. So, it seems that while the Borda count is vulnerable to changes in the alternative set (adding or removing alternatives), the systems that always elect the Condorcet winner are vulnerable to changes in the size of the electorate.
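The effect of the added Condorcet portion can be checked numerically. The sketch below assumes that the Table 3 profile consists of 7 voters with ranking ABC and 4 voters with ranking BCA, as read off the triangle entries described above, and that the added group consists of 4 voters each with rankings ACB, CBA and BAC, which is the profile consistent with the stated 8 vs. 4 margins. Function names are illustrative.

```python
def tally(profile, alternatives):
    """Return plurality votes, pairwise counts and Borda scores for a profile."""
    plurality = {a: 0 for a in alternatives}
    pairwise = {(a, b): 0 for a in alternatives for b in alternatives if a != b}
    for count, ranking in profile:
        plurality[ranking[0]] += count
        for i, a in enumerate(ranking):
            for b in ranking[i + 1:]:
                pairwise[(a, b)] += count
    # Borda score = number of pairwise comparisons won, summed over opponents.
    borda = {a: sum(pairwise[(a, b)] for b in alternatives if b != a)
             for a in alternatives}
    return plurality, pairwise, borda

def condorcet_winner(pairwise, alternatives):
    """Alternative that wins every pairwise comparison, or None."""
    for a in alternatives:
        if all(pairwise[(a, b)] > pairwise[(b, a)] for b in alternatives if b != a):
            return a
    return None

def top(scores):
    return max(scores, key=scores.get)

table3 = [(7, "ABC"), (4, "BCA")]               # assumed reading of Table 3
cycle = [(4, "ACB"), (4, "CBA"), (4, "BAC")]    # assumed Condorcet portion
for label, profile in (("Table 3", table3), ("with cycle", table3 + cycle)):
    plurality, pairwise, borda = tally(profile, "ABC")
    print(label, top(plurality), top(borda), condorcet_winner(pairwise, "ABC"))
```

Under these assumptions the enlarged profile gives A 11 and B 12 votes in their pairwise comparison, so B becomes the Condorcet winner, while the Borda scores change from 14, 15, 4 to 26, 27, 16 and leave B on top.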

Some Fundamental Results

No account of voting procedures can ignore the many – mostly negative – results achieved in social choice theory over the last five decades. Voting procedures are, in fact, specific implementation devices of abstract social choice functions. The notoriously negative nature of some of the main theorems stems from the incompatibility of various desiderata demonstrated by them. The results stated in the following are but a small and biased sample.

The best-known incompatibility result is Arrow’s impossibility theorem (Arrow 1963). It deals with social welfare functions. These are rules defined for preference profiles over alternatives. For each profile, the rules specify the social preference relation over the alternatives. In other words, a social welfare function \(f: \textbf{R}_{1} \times \ldots \times \textbf{R}_{n} \rightarrow \textbf{R}\), where the \(\textbf{R}_i\) denotes the set of all possible complete and transitive preference relations of individual i, while \(\textbf{R}\) is the set of all complete and transitive social preference relations. The most common version of the theorem is:

Theorem 1.

(Arrow 1963). The following conditions imposed on f are incompatible:

  • Universal domain: f is defined for all n-tuples of individual preferences.

  • Pareto: if all individuals prefer alternative x to alternative y, so does the collectivity, i.e. x will be ranked at least as high as y in the social preference relation.

  • Independence of irrelevant alternatives: the social preference between x and y depends on the individual preferences between x and y only.

  • Non-dictatorship: there is no individual whose preference determines the social preference between all pairs of alternatives.

This result has given rise to a voluminous literature and can be regarded as the starting point of the axiomatic social choice theory (see Austen-Smith and Banks, 1999; Kelly, 1978; Plott, 1976; Sen, 1970). Yet, its relevance for voting procedures is limited. One of its conditions is violated by all of them, viz. the independence of irrelevant alternatives. So, in practice this condition has not been deemed indispensable. There are systems that violate Pareto as well, e.g. the amendment and successive procedures.

Another prima facie dramatic incompatibility result is due to Gibbard (1973) and Satterthwaite (1975). It deals with a special class of social choice functions called social decision functions. While the social choice rules specify a choice set for any profile and set of alternatives, the social decision functions impose the additional requirement that the choice set be singleton valued. In other words, a single winner is determined for each profile and alternative set. The property focused upon by the Gibbard–Satterthwaite theorem is called manipulability. To define this concept we need the concept of a situation. It is a pair \((X, P)\) where X is the set of alternatives and P is a preference profile. The social choice function F is manipulable by individual i in situation \((X, P)\) if \(F(X, P')\) is preferred to \(F(X, P)\) by individual i and the only difference between P and \(P'\) is i’s preference relation. Intuitively, if i’s true preference ranking is the one included in P, he can improve the outcome by acting as if his preference were the one included in \(P'\). A case in point is plurality voting where voters whose favorites have no chance of winning act as if their favorite were one of the “realistic” contestants.

The theorem says the following:

Theorem 2.

(Gibbard, 1973; Satterthwaite, 1975). All universal and non-trivial social decision functions are either manipulable or dictatorial.

A non-trivial choice function is such that for any alternative, a profile can be constructed so that this alternative will be chosen by the function. In other words, no alternative is so strongly discriminated against that it will not be elected under any profile. Universal decision functions are defined for all possible preference profiles.

This theorem sounds more dramatic than it is mainly because it pertains to rules that are not common. After all, nearly all voting procedures may result in a tie between two or more alternatives. That means that these procedures are not social decision functions. Nonetheless, all voting procedures discussed in the preceding can be shown to be manipulable.
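The definition of manipulability can be illustrated by a brute-force search for a profitable misrepresentation. In the sketch below plurality voting is turned into a social decision function by breaking ties alphabetically; this tie-breaking rule, the five-voter profile and the function names are assumptions made for the illustration only.

```python
from itertools import permutations

def plurality_winner(rankings):
    """Single winner: most first places, ties broken alphabetically (assumed)."""
    counts = {}
    for r in rankings:
        counts[r[0]] = counts.get(r[0], 0) + 1
    best = max(counts.values())
    return min(a for a, c in counts.items() if c == best)

def find_manipulation(rankings, rule):
    """Return (voter index, reported ranking, new winner), or None if none exists."""
    sincere = rule(rankings)
    for i, true_ranking in enumerate(rankings):
        for reported in permutations(true_ranking):
            reported = "".join(reported)
            changed = rankings[:i] + [reported] + rankings[i + 1:]
            outcome = rule(changed)
            # The voter gains if the new winner is higher in the TRUE ranking.
            if true_ranking.index(outcome) < true_ranking.index(sincere):
                return i, reported, outcome
    return None

# Hypothetical profile: one ranking per voter, best alternative first.
rankings = ["ABC", "ABC", "BCA", "BCA", "CBA"]
print(plurality_winner(rankings), find_manipulation(rankings, plurality_winner))
```

Under sincere voting A and B tie with two votes each and A wins the tie-break. The last voter, whose favorite C has no chance, obtains B instead of A by reporting B as first-ranked, which is the “lesser evil” behavior described above.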

Somewhat less known is the theorem that shows the incompatibility of two commonly mentioned desiderata. One of them is the Condorcet winning criterion discussed above. The other is defined in terms of the no-show paradox (Fishburn and Brams, 1983). This paradox occurs whenever a voter or a group of voters would receive a better outcome by not voting at all than by voting according to their preferences.

Theorem 3.

(Moulin, 1988). All procedures that satisfy the Condorcet winning criterion are vulnerable to the no-show paradox.

These three theorems are representatives of a wide class of incompatibility results that have been proven about various desiderata on voting and, more generally, choice methods.

Methods for Reaching Consensus

The existence of a multitude of voting methods for reaching an apparently identical result – singling out the collective preference relation – is puzzling, given the fact that the methods are non-equivalent. The reasons for their invention and adoption are difficult if not impossible to ascertain. It can be argued, however, that there is a common ground underlying the methods, viz. an idea of a consensus state accompanied with a measure that indicates how far any given situation is from the consensus state. Moreover, it is arguable that each method is based on the idea of minimizing the distance – measured in some specific way – between the prevailing preference profile and the postulated consensus state. If this idea of the common ground is accepted, it becomes possible to understand the multitude of the methods by referring to differences of opinions concerning the consensus states as well as measures used in the distance minimization process.

Indeed, there is a method which is explicitly based on the above idea of distance minimization: Kemeny’s rule (Kemeny, 1959). Given an observed preference profile, it determines the preference ranking over all alternatives that is closest to the observed profile in the sense of requiring the minimum number of pairwise changes in individual opinions to reach that ranking. Thus, the postulated consensus state from which the distance to the observed profile in Kemeny’s system is measured is one of unanimity regarding all positions in the ranking of alternatives, i.e. the voters are in agreement about which alternative is placed first, which second etc. throughout all positions. The metric used in measuring the distance from the consensus is the inversion metric (Baigent, 1987a, b; Meskanen and Nurmi, 2006). Let R and \(R'\) be two rankings. Then their distance is:

$$d_{K}(R,R') = \left| \left\{ (x,y)\in X^2 \mid R(x)>R(y),\ R'(y)>R'(x) \right\}\right|.$$

Here we denote by R(x) the number of alternatives worse than x in a ranking R. The distance thus counts the pairs of alternatives that the two rankings order in opposite ways.

Let U(R) denote a unanimous profile where every voter’s ranking is R. Kemeny’s rule results in the ranking \(\bar{R}\) so that

$$ d_{K}(P,U(\bar R))\leq d_{K}(P,U(R))\ \forall R\in \mathcal{R}\setminus \{\bar R\},$$

where P is the observed profile and \(\mathcal{R}\) denotes the set of all possible rankings. If all the inequalities above are strict then \(\bar{R}\) is the only winner.
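Kemeny’s rule can be sketched by brute force for a small number of alternatives. The sketch assumes that the distance between a profile and a unanimous profile is the sum of the inversion distances over the voters, and it uses the three-alternative profile with 7 voters ranking ABC and 4 voters ranking BCA; the function names are hypothetical. The enumeration covers all rankings and is feasible only for few alternatives.

```python
from itertools import permutations

def inversion_distance(r1, r2):
    """Number of pairs of alternatives ranked in opposite order by r1 and r2."""
    return sum(
        1
        for i, x in enumerate(r1)
        for y in r1[i + 1:]          # x is above y in r1 ...
        if r2.index(y) < r2.index(x)  # ... but below y in r2
    )

def kemeny_rankings(profile, alternatives):
    """All rankings minimising the summed inversion distance to the profile."""
    totals = {}
    for candidate in permutations(alternatives):
        ranking = "".join(candidate)
        totals[ranking] = sum(n * inversion_distance(r, ranking)
                              for n, r in profile)
    best = min(totals.values())
    return sorted(r for r, t in totals.items() if t == best), best

profile = [(7, "ABC"), (4, "BCA")]
rankings, distance = kemeny_rankings(profile, "ABC")
print(rankings, distance)
```

For this profile the unique Kemeny ranking is ABC: only the four BCA voters have to change their opinions, each on two pairs, which gives the minimum total of 8 pairwise changes.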

We focus now on the Borda count and consider an observed profile P. For a candidate x we denote by \(\textbf{W}(x)\) the set of all profiles where x is first-ranked in every voter’s ranking. Clearly in all these profiles x gets the maximum points. We consider these as the consensus states for the Borda count (Nitzan, 1981).

For a candidate x, the number of alternatives above it in any ranking of P equals the number of points deducted from the maximum points. This is also the number of inversions needed to get x in the winning position in every ranking. Thus, using the metric above, \(w_B\) is the Borda winner if

$$d_{K}(P, \textbf{W}(w_B))\leq d_{K}(P,\textbf{W}(x))\ \forall x\in X\setminus \{w_B\}.$$
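The equivalence between the Borda score and this distance can be checked on the Table 1 profile. In the sketch below the distance from the profile to the consensus states of x is taken to be the total number of alternatives ranked above x, summed over voters, as described above; the function name is an assumption.

```python
# Table 1 profile: (number of voters, ranking from best to worst).
profile = [(4, "AEDCB"), (3, "BCEDA"), (2, "CDEBA")]
alternatives = "ABCDE"

def distance_to_unanimous_top(profile, x):
    """Inversions needed to lift x to first place in every voter's ranking."""
    return sum(n * ranking.index(x) for n, ranking in profile)

n_voters = sum(n for n, _ in profile)
max_points = n_voters * (len(alternatives) - 1)

distances = {x: distance_to_unanimous_top(profile, x) for x in alternatives}
borda = {x: max_points - d for x, d in distances.items()}
winner = min(distances, key=distances.get)
print(distances, borda, winner)
```

The distances are 20, 22, 15, 19 and 14 for A to E, i.e. the maximum of 36 points minus the Borda scores 16, 14, 21, 17 and 22, so the distance-minimizing alternative E is also the Borda winner.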

The plurality system is also directed at the same consensus state as the Borda count, but its metric is different. Rather than counting the number of pairwise preference changes needed to make a given alternative unanimously first ranked, it minimizes the number of individuals having different alternatives ranked first.

To represent the plurality system as distance-minimizing we define a metric \(d_d\):

$$d_{d}(R,R') = \begin{cases} 0, & \textrm{if}\ R(1)=R'(1) \\ 1, & \textrm{otherwise} \end{cases}$$

Here R(1) and R'(1) denote the first ranked alternative in preference rankings R and R', respectively.

The unanimous consensus state in plurality voting is one where all voters have the same alternative ranked first. With the metric \(d_d\) we tally, for each alternative, how many voters in the observed profile P do not have this alternative as their first ranked one. The alternative for which this number is smallest is the plurality winner. The plurality ranking coincides with the order of these numbers.

Using this metric we have for the plurality winner \(w_p\),

$$d_{d}(P, \textbf{W}(w_p))\leq d_{d}(P,\textbf{W}(x))\ \forall x\in X\setminus \{w_p\}.$$

The only difference from the characterization of the Borda winner is the metric used.
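The same check can be made for plurality with the metric that only looks at first-ranked alternatives. The sketch again uses the Table 1 profile; the function name is an assumption.

```python
# Table 1 profile: (number of voters, ranking from best to worst).
profile = [(4, "AEDCB"), (3, "BCEDA"), (2, "CDEBA")]
alternatives = "ABCDE"

def discrete_distance(profile, x):
    """Number of voters whose first-ranked alternative is not x."""
    return sum(n for n, ranking in profile if ranking[0] != x)

distances = {x: discrete_distance(profile, x) for x in alternatives}
winner = min(distances, key=distances.get)
print(distances, winner)
```

Five of the nine voters would have to change their first-ranked alternative to make A unanimously first, fewer than for any other candidate, so A is the plurality winner, whereas the inversion metric applied to the same consensus states singles out E.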

Many other systems can be represented as distance-minimizing ones (Meskanen and Nurmi, 2006). It seems, then, that the differences between voting procedures can be explained by the differences in the underlying consensus states sought for and the measures used in minimizing the distances between rankings.

Multi-winner Contexts

Voting procedures are often applied in composing a multi-member body, e.g. parliament, committee, working group, task force etc. Methods used in single-winner elections are, of course, applicable in these contexts, but usually additional considerations have to be taken into account. Of particular importance are issues related to the representativeness of the body. Under which conditions can we say that a multi-member body – say, a committee – represents a wider electorate?

If a k-member committee is composed on the basis of plurality voting so that each voter can vote for one representative and the committee consists of the k candidates with the largest numbers of votes, the outcome may be highly unsatisfactory. To wit, consider the profile of Table 8.

Table 8 Electing a two-member committee

The plurality committee would now consist of A and B and yet A is the Condorcet loser and B is defeated by both C and D, i.e. the candidates which did not make it to the committee. Indeed, one could argue that the AB committee is the least representative of the voter opinions. In any event, the notion of representative committee seems to be ambiguous: representative in the plurality sense may be unrepresentative in the Condorcet sense.

Let us look at the representativeness issue from the viewpoint of a voter. When can we say that a committee represents his opinion? One way of answering this is to determine whether the voter’s favorite representative is in the committee. If he is, then it seems natural to say that the voter’s opinions are represented in the committee. In the profile of Table 8, 70 voters out of 90 are represented in this sense. This way of measuring representativeness underlies plurality rule committees. Even though having one’s favorite candidate in the committee is certainly important for the voter, he can be expected to be interested in the overall composition of the committee as well. For example, in Table 8 the 40 voters seeing A as their favorite would probably prefer committee AD to AB since D is their second-ranked, while B is their lowest-ranked candidate. A reasonable way to extend this idea of preference is to compose the committee of the k candidates with the highest Borda scores. This is suggested by Chamberlin and Courant (1983). In the Table 8 profile this leads to committee CD.

In a Borda type committee, the notion of constituency is difficult to apply. Yet, in some contexts a desideratum is to elect a committee so that each member represents a constituency of equal size. This idea underlies Monroe’s (1995) method of constructing optimal committees. The basic concept is the amount of misrepresentation. This concept is applied to pairs consisting of committee members and voters. Consider a committee C and electorate N. For each pair j, l where \(j \in C\) and \(l \in N\), let \(\mu_{jl}\) be the amount of misrepresentation related to l being represented by j. It is reasonable to set \(\mu_{jl} = 0\) if j is top-ranked in l’s preferences. In searching for the pure fully proportional representation Monroe embarks upon finding a set of k representatives, each representing an equally-sized group of voters (constituency), so that the total misrepresentation – the sum over voters of the misrepresentation of each voter by his assigned representative – is minimal. He suggests a procedure which firstly generates all possible \(\binom{m}{k}\) committees of k members that can be formed of m candidates. For each committee one then assigns each voter to the representative that represents him best. Since this typically leads to committees consisting of members with constituencies of different size, one proceeds by moving voters from one constituency to another so that eventually each constituency has equally many voters. The criterion in moving voters is the difference between their misrepresentation in the source and target constituencies: the smaller the difference, the more likely is the voter to be transferred.

For large m and k the procedure is extremely tedious. Potthoff and Brams (1998) suggest a simplification that essentially turns the committee formation problem into an integer programming one (see also Brams 2008). Let \(\mu_{ij}\) be the misrepresentation value of candidate i to voter j. Define \(x_i\) for \(i= 1,\dots, m\) so that it is 1 if candidate i is present in the committee and 0 otherwise. Furthermore, we define \(x_{ij} = 1\) if candidate i is assigned to voter j, that is, if i represents j in the committee. Otherwise \(x_{ij} = 0\). The objective function we aim at minimizing now becomes:

$$z = \sum_i \sum_j \mu_{ij} x_{ij}.$$

In other words, we minimize the sum of misrepresentations associated with the committee members. In the spirit of Monroe, Potthoff and Brams impose the following constraints:

$$\sum_i x_i = k \tag{1}$$
$$\sum_i x_{ij} = 1, \quad \forall j \tag{2}$$
$$- \frac{n}{k} x_i + \sum_j x_{ij} = 0, \quad \forall i. \tag{3}$$

Equation (1) states that the committee consists of k candidates, (2) says that each voter is represented by exactly one candidate, and (3) amounts to the requirement that each committee member represents an equal number of voters. In Monroe’s system, \(\mu_{ij} = m - 1 - b_{ij}\), where \(b_{ij}\) is the number of Borda points given by voter j to candidate i and m is the number of candidates, so that a voter’s top-ranked candidate has zero misrepresentation.
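For very small instances the optimization can be illustrated without an integer programming solver by enumerating every committee and every assignment of voters to committee members. The sketch below is such a brute-force search, not the Potthoff–Brams formulation itself. It assumes that the number of voters is divisible by the committee size and measures misrepresentation by the number of candidates a voter ranks above the assigned representative (zero for the top-ranked candidate). The four-voter profile and the function name are hypothetical.

```python
from itertools import combinations, product

def monroe_committee(rankings, candidates, k):
    """Brute-force search for a committee with equal-sized constituencies
    and minimal total misrepresentation. Returns (total, committee, assignment)."""
    n = len(rankings)
    quota = n // k  # assumes that k divides n
    best = None
    for committee in combinations(candidates, k):
        # assignment[j] is the committee member representing voter j
        for assignment in product(committee, repeat=n):
            if any(assignment.count(c) != quota for c in committee):
                continue  # constituencies must have equal size
            total = sum(r.index(c) for r, c in zip(rankings, assignment))
            if best is None or total < best[0]:
                best = (total, committee, assignment)
    return best

# Hypothetical profile: one ranking per voter, best candidate first.
rankings = ["ABC", "ABC", "ABC", "BCA"]
total, committee, assignment = monroe_committee(rankings, "ABC", 2)
print(total, committee, assignment)
```

In this example three of the four voters rank A first, but each of the two committee members must represent two voters. The optimal committee is AB with total misrepresentation 1: one of the voters favoring A is assigned to B, his second-ranked candidate.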

In proportional representation systems the devices used to achieve similarity of opinion distributions in the electorate and the representative body are usually based on the one-person-one-vote principle. A wide variety of these systems are analyzed in the magnum opus of Balinski and Young (2001).

The Best Voting System?

The multitude of voting systems as well as the large number of criteria used in their assessment suggests that the voting system designers have had different views regarding the choice desiderata. Since no system satisfies all criteria, one is well-advised to fix one’s ideas as to what a system should be able to accomplish. An even more profound issue pertains to voting system inputs: are the voters assumed to be endowed with preference rankings over candidates or something more or less demanding? An example of more demanding input is the individual utility function or “cash value” of candidates. Another input type is assumed by the majority judgment system elaborated by Balinski and Laraki (2007, 2009). In this system the voters assign a grade to each candidate. Of systems requiring less than preference rankings one could mention approval voting, where the voters simply indicate those candidates that they approve of (Brams and Fishburn, 1983). The evaluation criteria for these systems are much less developed than those of systems based on rankings (see, however, Aizerman and Aleskerov, 1995, for systems aggregating individual choice functions).

With regard to systems based on individual preference rankings the scholarly community is still roughly divided into those emphasizing success in pairwise comparisons and those of more positional persuasion. This was essentially the dividing line some 200 years ago when Borda and Condorcet debated the voting schemes of their time. Until the mid-1990s it appeared that the social choice scholars were leaning largely to the side of Condorcet, but with the advent of Saari’s geometrical approach many (including the present writer) began to hesitate. The Borda count had proven to be easily vulnerable to strategic maneuvering and undesirably unstable under changes in the number of alternatives. However, as was discussed above, Saari pointed out that the Condorcet winners are not stable, either. To make the Borda count more immune to strategic voting, one could suggest Nanson’s method which takes advantage of the weak relationship between Borda and Condorcet winners: the latter always receives a higher than average Borda score. As we saw in the preceding, this “synthesis” of two winner intuitions comes with a price: Nanson’s method is non-monotonic. Thus, one of the fundamental advantages of positional systems, monotonicity, is sacrificed when striving for less vulnerability to strategic preference misrepresentation and compatibility with the Condorcet winning criterion. For many, this is too high a price.

For those who stress positional information in group decisions, the Borda count is undoubtedly still one of the best bets. Its several variations have all proven inferior (see Nurmi and Salonen, 2008). For those inspired by the Condorcet criteria – especially the winning one – Copeland’s method would seem most plausible in the light of the criteria discussed above. A caveat is, however, in order: we have discussed but a small subset of existing voting systems and evaluation criteria. With a different criterion set one might end up with different conclusions.