1 Introduction

As a theory of social justice, utilitarianism is quite appealing. But as a practical criterion for collective choice, it has serious shortcomings. To apply the utilitarian criterion, we not only need accurate information about the cardinal utility functions of all individuals in society; we must also know how to make cardinal interpersonal comparisons between these utility functions. Furthermore, an individual might inadvertently misperceive her own utility function, either through lack of self-knowledge, or because she does not fully understand the long-term consequences of the policies under consideration. For these reasons (among others), collective decisions are almost never made by trying to explicitly ascertain the utility functions of the members of society. Instead, collective decisions are usually made by voting.

A scoring rule is a particular kind of voting rule where each voter assigns a “score” to each alternative (with some constraints), and the alternative with the highest aggregate score wins. Well-known scoring rules include the Borda rule, the plurality and anti-plurality rules, evaluative (or “range”) voting, and approval voting. Since they involve maximizing a sum, scoring rules seem like a sort of “ersatz utilitarianism”. We will show that this is more than just a superficial formal resemblance. If the probability distribution of utility functions in a large society satisfies certain conditions, then we will show that a well-chosen scoring rule has a very high probability of selecting the alternative which maximizes the utilitarian social welfare function. For a sufficiently large population, this probability can be made arbitrarily close to certainty. Thus, with the right scoring rule, we can realize the utilitarian ideal, despite the informational problems described above.

The remainder of this paper is organized as follows. Section 2 introduces notation and assumptions which will be maintained throughout the paper. Section 3 considers evaluative voting. Section 4 considers approval voting. Section 5 considers rank scoring rules, such as the Borda rule or the plurality rule. Each of these three sections introduces one or more scenarios (described by hypotheses concerning the probability distribution of utility functions), and then, for each scenario, gives an asymptotic probability result. Appendix A reviews some results from Pivato (2016c) that are used in the other proofs. Appendix B contains the proofs of all the results in the paper.

Related literature The results in this paper are complementary to those in Pivato (2015, 2016a, 2016c). Like the present paper, Pivato (2015) considers conditions under which ordinal voting rules maximize the utilitarian social welfare function (SWF) in a large population. But whereas this paper focuses on scoring rules, Pivato (2015) focuses on Condorcet consistent rules such as the Copeland rule or an agenda of pairwise majority votes. Meanwhile, Pivato (2016c) considers a broader problem: how can we compute (and maximize) the utilitarian SWF when we have only very imprecise information about people’s utility functions and the correct system of interpersonal utility comparisons? Under plausible conditions, Pivato (2016c) shows that, in a large population, we can accurately estimate the utilitarian SWF despite these difficulties. Indeed, Pivato (2016a) shows that this can be done in a strategy-proof way, using a modified version of the Groves–Clarke pivotal mechanism.

We will evaluate ordinal voting rules from a utilitarian perspective. This approach was pioneered by Laplace in 1795,Footnote 1 but then apparently neglected for one hundred seventy years. It was rediscovered by Rae (1969), Taylor (1969) and Weber (1978). Rae and Taylor assumed that all voters had equal preference intensities over a dichotomous choice, given by independent, identically distributed (i.i.d.) random \(\{0,1\}\)-valued utility functions. In this setting, they showed that simple majority vote maximized the expected value of the utilitarian SWF, amongst all anonymous voting rules.Footnote 2 Weber (1978) considered a setting with many alternatives, and variable preference intensities. Assuming that the voters’ utilities for the different alternatives were independent, uniformly distributed (i.u.d.) random variables, he sought the voting rule which maximized the expected value of the utilitarian SWF in a large population. He showed that the Borda rule was the optimal rule in the class of rank scoring rules. The results in Sect. 5 can be seen as a major generalization of this early insight. In the case of exactly three alternatives, Weber (1978) also showed that the approval voting rule slightly outperforms the Borda rule; we will study the utilitarian efficiency of approval voting in Sect. 4.

Shortly after Weber’s foundational work, Bordley (1983, 1985a) and Merrill (1984) used computer simulations to estimate the expected value of the utilitarian SWF for various voting rules. Bordley (1985b, 1986) computed utilitarian-optimal weighted majority voting schemes for dichotomous decisions with correlated voters. But there was no further utilitarian analysis of voting rules for the next 20 years.

Starting in 2005, a literature emerged on the utilitarian analysis of federal representative assemblies. Most of these papers focussed on dichotomous decisions, and assumed that the utility functions of the citizens were i.i.d. random variables. They asked: which voting rule will maximize the expected value of the utilitarian SWF? First, Beisbart et al. (2005) computationally compared the performance of seven benchmark rules, while Barberà and Jackson (2006) gave an exact formula for the utilitarian-optimal weighted majority rule in terms of the distribution of utility functions found within each region. Next, Beisbart and Bovens (2007) and Bovens and Hartmann (2007) investigated the consequences of different population-based weighting formulas with a mixture of theoretical analysis and computational results. Laruelle and Valenciano (2008, Ch.3; 2010, §7) provided a utilitarian rationale for the classic Penrose “square root” weighting formula. Macé and Treibich (2012) and Koriyama et al. (2013) derived analytical results in scenarios where voters have non-separable preferences over a series of dichotomies. (This is represented by making the utility of each region a concave function of its frequency of victory in a long series of decisions.) In contrast to all the previously mentioned papers, Fleurbaey (2009) and Beisbart and Hartmann (2010) considered models with correlated voters. In Beisbart and Hartmann’s model, the profile of utility functions is drawn from a multivariate normal distribution, whereas Fleurbaey’s model is extremely general; the correlation structure between voters is completely arbitrary. Finally, Maaser and Napel (2014) used computer simulations to find utilitarian-optimal voting weights in a setting with three or more alternatives arranged on a line (with single-peaked preferences). The overall message of these papers is that, with i.i.d. voters, the expected value of utilitarian social welfare is generally maximized by a “degressive” weighted majority rule, where the weight of each region is a sub-linear function (e.g. square root) of its population. With non-independent and/or non-identical voters, the optimal weights can depend on the correlations and preference intensities.

More recently, a series of papers have considered direct democracies deciding amongst three or more alternatives. Lehtinen (2007, 2008) used computer simulations to show that strategic voting often improves the utilitarian efficiency of the Borda rule and approval voting. Caragiannis and Procaccia (2011) estimated the “distortion” of the plurality, approval, and anti-plurality voting rules—that is, the worst-case ratio between the utilitarian social welfare of the optimal alternative, and the utilitarian social welfare of the alternative which actually wins, where the worst case is computed over all possible profiles of “normalized” utility functions. (A utility function is “normalized” if it is positive and the utilities sum to one.) Caragiannis and Procaccia were particularly interested in the asymptotic growth rate of this distortion ratio as the number of voters and/or alternatives becomes large. They showed that, if voters randomly convert their cardinal utility functions into voting behaviours in a plausible way, then the expected distortion ratio grows surprisingly slowly. Their intended application was preference aggregation in a cooperating group of artificially intelligent agents, but their results are also applicable to more traditional social welfare problems. This approach has recently been extended by Boutilier et al. (2012), who study the worst-case and average-case performance of randomized social choice rules.

Given a social welfare function W, and the probability distribution of the voters’ cardinal utility functions, Apesteguia et al. (2011) asked: what ordinal voting rule maximizes the expected value of W? In the case when W is the utilitarian SWF, and the voters’ utilities are i.i.d. random variables, they showed that the W-optimal rule is a rank scoring rule of the kind we consider in Sect. 5. In particular, if the voters’ utilities are i.u.d. random variables, then the W-optimal rule is the Borda rule. The results in Sect. 5 of this paper can be seen as complementary to those of Apesteguia et al. (2011); they showed that a certain scoring rule was better, on average, than any alternative voting rule, whereas we show that, in a large population, it approaches perfect agreement with the utilitarian social choice.

Giles and Postl (2014) conducted a similar investigation for (A, B)-voting rules, a two-parameter family of rules introduced by Myerson (2002), which includes approval voting as well as rank scoring rules. Giles and Postl suppose there are three alternatives, whose utilities for each voter are privately known i.i.d. random variables on the interval [0, 1]. Unlike Apesteguia et al. (2011), they allow strategic voting. Giles and Postl first characterize the symmetric Bayesian Nash equilibrium (BNE) for the N-player strategic voting game for any \(N\ge 2\). Then they numerically compute the expected value of the utilitarian SWF at the three-player BNE for various \((A,B)\in [0,1]^2\) (where the three players’ utilities are i.i.d. random variables drawn from either a uniform distribution or a beta distribution on [0, 1]). The results on approval voting in Sect. 4 of this paper can be seen as complementary to the findings of Giles and Postl (2014), but extended to an arbitrary number of alternatives and a large number of voters.

Kim (2014) pushes this investigation further. In a setting with three or more alternatives, and voters with independent (but not identically distributed) random utilities, he characterizes the rules which are ex ante Pareto efficient in the class of ordinal voting rules: they are “non-anonymous” rank scoring rules (where each voter has perhaps a different score vector). He further shows that, in a “neutral” environment (i.e. all alternatives are ex ante interchangeable), such rules are incentive compatible (i.e. truth-revealing in BNE). Special cases of Kim’s analysis are the rank scoring rules which maximize ex ante utilitarian social welfare over all ordinal rules. In particular, Kim observes that the rank scoring rules of Apesteguia et al. (2011) are incentive-compatible in the i.i.d. environment of their paper.Footnote 3 He then constructs incentive-compatible voting rules which, in terms of the utilitarian SWF, are superior to any ordinal rule (in particular, any scoring rule), but which utilize only a limited amount of cardinal utility information from the voters.

Azrieli and Kim (2014) perform a similar analysis for a dichotomous choice in which voters have independent (but not identically distributed) random utilities. They show that the rule which maximizes ex ante utilitarian social welfare over the class of all incentive compatible rules is a weighted majoritarian rule (where the weight of each voter is determined by the expected value of her utility function). They also obtain a similar characterization of the ex ante and ex interim Pareto-optimal rules in the class of incentive-compatible rules. Meanwhile, Krishna and Morgan (2012) have considered a model where participation itself is a strategic choice. Assuming that voting is costly and participation is voluntary, they showed that simple majority vote will maximize utilitarian social welfare, because voters with weaker preferences will abstain from voting.

The majority of the aforementioned papers deal only with dichotomous decisions, whereas we allow an arbitrary number of alternatives.Footnote 4 Also, except for Fleurbaey (2009), all of the aforementioned papers assumed that cardinal interpersonal utility comparisons are unproblematic. In contrast, we suppose that these interpersonal comparisons themselves are ambiguous in practice (but still meaningful in principle). Finally, except for Weber (1978) and Caragiannis and Procaccia (2011), all of the aforementioned papers deal with “small” populations of voters, whereas we are interested in asymptotic probabilistic results for very large populations.Footnote 5 It is not possible here to adequately summarise the vast and growing literature on the large-population asymptotic probabilistic analysis of voting rules. Instead, we will only briefly touch on two strands of this literature. The first strand is the Condorcet Jury Theorem (CJT) and its many generalizations.Footnote 6 Like the CJT literature, the results of the present paper say that, under certain probabilistic assumptions, a large population using a certain voting rule is likely to arrive at the “correct” decision. But the goal for the CJT literature is to find the correct answer to some objective factual question, whereas the goal in the present paper is to maximize social welfare.

The second strand is the literature on strategic voting and/or strategic candidacy in large populations with some kind of randomness or uncertainty in voters’ preferences. This literature is mainly concerned with characterizing the Nash equilibria of certain large election games. These equilibria occasionally have surprising social welfare properties. For example, Ledyard (1984), Lindbeck and Weibull (1987, 1993), Coughlin (1992; Theorem 3.7 and Corollary 4.4), Banks and Duggan (2004; §4) and McKelvey and Patty (2006) have all shown that, in certain election games, there is a unique Nash equilibrium (sometimes called a “political equilibrium”) where all the candidates select the policy which maximizes a utilitarian SWF. But these utilitarian SWFs are based on somewhat peculiar systems of interpersonal utility comparisons. In these models, voter behaviour is described by a stochastic device: the probability that voter i votes for candidate C (or in some cases, the probability that i votes at all) is a function of the difference between the cardinal utility which i assigns to C and the cardinal utility she assigns to other candidates. Although the different models use different stochastic devices and seek to capture different phenomena (e.g. random private costs for voting, or random private shocks to the utility functions, or random individual errors due to bounded rationality, or other exogenous perturbations), each model assumes that utility functions are translated into voting probabilities in the same way for every voter. In this way, each model smuggles in a system of “implicit” interpersonal utility comparisons via the stochastic device. As observed by Banks and Duggan (2004, p. 29), this means that the normative significance of the “utilitarianism” emerging from these political equilibria is somewhat unclear.

In contrast, this paper assumes that there is a pre-existing, normatively meaningful system of cardinal interpersonal utility comparisons, explicitly described by a set of “calibration constants” which exist independently of the voting rule and any other random factors in the model. The social planner does not know the exact values of these calibration constants, so she regards them as random variables. Our results suggest that it is still possible to closely approximate the utilitarian social choice, even with this kind of uncertainty. However, unlike the political equilibrium literature, we treat the social alternatives as exogenous, rather than endogenizing them as the result of political candidates competing for popularity.

We also differ from the political equilibrium literature in that we do not grapple with strategic voting. However, in a companion paper, Núñez and Pivato (2016) show that all of the voting rules considered in this paper can be “approximately implemented” in large populations. To be precise, given any voting rule F, there is a stochastic voting rule \({\widetilde{F}}\) which has two properties, for any sufficiently large population of voters. First, \({\widetilde{F}}\) selects the same alternative as F, with very high probability. Second, each voter will find it strategically optimal to vote honestly in \({\widetilde{F}}\), with very high probability. This result holds under weak and plausible assumptions about the voters’ beliefs about other voters. Thus, the problem of strategic voting essentially vanishes in a sufficiently large population.

2 Assumptions and notation

We will now fix some notation and assumptions which will be maintained throughout the paper. Let \({\mathbb {R}}\) denote the set of real numbers. For any set \({\mathcal { A}}\) and function \(f:{\mathcal { A}}{{\longrightarrow }}{\mathbb {R}}\), let \({{\mathrm{\mathrm{argmax}}}}_{\mathcal { A}}(f)\) denote the set of elements in \({\mathcal { A}}\) which maximize f. For any random variable X, let \({\mathbb {E}}[X]\) denote its expected value, and let \(\mathrm{var}[X]\) denote its variance.

Let \({\mathcal { A}}\) be a finite set of social alternatives, let \({\mathcal { I}}\) be a set of voters, and let \(I:=|{\mathcal { I}}|\). (We will typically suppose that I is very large; indeed we will be interested in asymptotic phenomena as \(I{\rightarrow }{\infty }\).) For every i in \({\mathcal { I}}\), let \(u_i:{\mathcal { A}}{{\longrightarrow }}{\mathbb {R}}\) be voter i’s cardinal utility function, and let \(c_i>0\) be a “calibration constant”, which we will use to make cardinal interpersonal utility comparisons. We suppose that the functions \(c_i\, u_i\) and \(c_j\, u_j\) are interpersonally comparable for all voters i and j in \({\mathcal { I}}\). In other words, for any alternatives a, b, c, and d in \({\mathcal { A}}\), if \(c_i\,u_i(b)-c_i\,u_i(a)=c_j\,u_j(d)-c_j\,u_j(c)\), then the welfare that voter i gains in moving from alternative a to alternative b is exactly the same as the welfare that voter j gains in moving from c to d. We would therefore like to maximize the utilitarian social welfare function \(U_{\mathcal { I}}:{\mathcal { A}}{{\longrightarrow }}{\mathbb {R}}\) defined by

$$\begin{aligned} U_{\mathcal { I}}(a):=\frac{1}{I} \sum _{i\in {\mathcal { I}}} c_i\, u_i(a), \quad \text {for every alternative }\; a \; \text {in }\; {\mathcal { A}}. \end{aligned}$$
(1)

The models which appear below differ in their assumptions about the functions \(\{u_i\}_{i\in {\mathcal { I}}}\). In Sects. 4 and 5, we will suppose that \(\{u_i\}_{i\in {\mathcal { I}}}\) are unknown to the social planner; from her point of view, they are random functions. However, she knows certain features of their distributions, and can exploit these features in the design of the voting rule. She can then obtain partial and indirect information about \(\{u_i\}_{i\in {\mathcal { I}}}\) from the voters’ responses to this voting rule. In contrast, in Sect. 3 we will suppose that \(\{u_i\}_{i\in {\mathcal { I}}}\) are known functions, ranging over [0, 1]; the only uncertainty lies in the calibration constants \(\{c_i\}_{i\in {\mathcal { I}}}\). Throughout Sects. 3, 4 and 5, we will assume that \(\{c_i\}_{i\in {\mathcal { I}}}\) are unknown to the social planner. We formalize this with the following assumption:

  1. (C)

    \(\{c_i\}_{i\in {\mathcal { I}}}\) are real-valued random variables, which are independent, but not necessarily identically distributed. There is some constant \(\sigma _c^2\ge 0\) (independent of I) such that \(\mathrm{var}[c_i]\le \sigma _c^2\) and \({\mathbb {E}}[c_i]=1\) for all \(i\in {\mathcal { I}}\).
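
For concreteness, here is a minimal Python sketch of formula (1) under assumption (C). The utility profile and the log-normal distribution of the calibration constants are purely illustrative assumptions (assumption (C) only pins down the mean and a variance bound); the point is simply to show how the unknown constants \(c_i\) enter the computation of \(U_{\mathcal { I}}\).

```python
import numpy as np

rng = np.random.default_rng(0)
I, A = 10_000, 4                      # population size and |A| (illustrative values)
u = rng.random((I, A))                # hypothetical utility profile: u[i, a] = u_i(a) in [0, 1]

# Calibration constants satisfying (C): independent, E[c_i] = 1, var[c_i] <= sigma_c^2.
# A log-normal with mean 1 and variance sigma_c^2 is just one possible choice.
sigma_c2 = 0.25
log_var = np.log(1 + sigma_c2)
c = rng.lognormal(mean=-0.5 * log_var, sigma=np.sqrt(log_var), size=I)

# Utilitarian SWF of formula (1): per capita average of the calibrated utilities.
U = (c[:, None] * u).mean(axis=0)     # U[a] = (1/I) * sum_i c_i * u_i(a)
print("U_I by alternative:", np.round(U, 4))
print("utilitarian optimum:", int(np.argmax(U)))
```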

For some of our results, it will also be convenient to assume the utility profile \(\{u_i\}_{i\in {\mathcal { I}}}\) satisfies the following technical property.

  • \((\Delta )\)    There is a constant \(\Delta >0\) (independent of I) such that \(\max _{{\mathcal { A}}} (U_{\mathcal { I}})-U_{\mathcal { I}}(a)>\Delta \) for every \(a\not \in {{\mathrm{\mathrm{argmax}}}}_{{\mathcal { A}}} (U_{\mathcal { I}})\).

Here, \(\Delta \) is the minimum social welfare gap between the optimal policy and the next-best policy. If a voting rule acts as an “estimator” of the utilitarian SWF, then we need the error of this estimate to be smaller than \(\Delta \), in order for the rule to select the optimal policy, and not the next-best policy. (Of course, if \(\Delta \) was very small, then selecting the next-best policy would not be a catastrophe; thus, we will relax condition \((\Delta )\) in Propositions 3.2, 4.3, and 5.3 below.)

It might seem strange that \(U_{\mathcal { I}}\) is defined in formula (1) as the per capita average utility, rather than the total utility. The extra \(1/I\) factor makes no difference for maximization purposes. So why is it there?

The reason is that we are concerned with assessing how far a particular policy falls short of the theoretical maximum social welfare, and in a large (and growing) population, it makes much more sense to measure such a welfare shortfall in terms of per capita average utility rather than total utility. To see the problem, suppose our voting rule selects some policy P which falls short of the optimal policy Q. If we measure this shortfall as the per capita average utility disparity between P and Q, then our assessment will be independent of the size of the population (ceteris paribus). But if we measured the shortfall as the total utility disparity between P and Q, then an increase in the size of the population would mechanically cause a proportionate increase in our assessment of the “suboptimality” of P, even though nothing else has changed. If we judge the quality of voting rules by the social suboptimality of the policies they produce, then we would spuriously conclude that the larger society had a worse voting rule.

Indeed, policy evaluators usually normalize statistics by population size, precisely to avoid this kind of spurious conclusion. For instance, suppose that a certain traffic law X leads to 100 accidental traffic deaths per year in country A, while an alternative traffic law Y leads to 300 accidental traffic deaths per year in another very similar country B (and otherwise X and Y have identical effects on traffic flow, pollution, etc.). We might naïvely conclude that the traffic law X is safer. But if the population of country B is ten times bigger than country A, then in fact the law Y is safer, in per capita terms.

3 Evaluative voting

The most natural “utilitarian” voting rule simply asks each voter to assign a numerical score to each alternative, presumably reflecting her cardinal utility function. The obvious problem with this approach is that voters could strategically exaggerate these scores. One partial solution is to rescale every voter’s utility function to range over the interval \({\left[ 0,1 \right]}\). The resulting social choice rule has been called evaluative voting (Núñez and Laslier 2014; Baujard et al. 2014), utilitarian voting (Hillinger 2005), or range voting (Smith 2000; Macé 2013).

Formally, in evaluative voting (EV), the vote of each voter i in \({\mathcal { I}}\) takes the form of a function \(v_i:{\mathcal { A}}{{\longrightarrow }}[0,1]\). The EV rule then chooses the alternative(s) in \({\mathcal { A}}\) that maximize the function \(V_{\mathcal { I}}:{\mathcal { A}}{{\longrightarrow }}{\mathbb {R}}\) defined by

$$\begin{aligned} V_{\mathcal { I}}(a):= \sum _{i\in {\mathcal { I}}} v_i(a), \quad \text {for every alternative }\; a \; \text {in }\; {\mathcal { A}}. \end{aligned}$$
(2)

Meanwhile, for every voter i in \({\mathcal { I}}\) let \(w_i:{\mathcal { A}}\,{{\longrightarrow }}\,{\mathbb {R}}\) be her “true” utility function. We suppose these utility functions admit one-for-one cardinal interpersonal comparisons. In other words, for any alternatives a, b, c, and d in \({\mathcal { A}}\), if \(w_i(b)-w_i(a)=w_j(d)-w_j(c)\), then the welfare that voter i gains in moving from a to b is exactly the same as the welfare that voter j gains in moving from c to d. We therefore want to maximize the utilitarian SWF \(U_{\mathcal { I}}\) defined by

$$\begin{aligned} U_{\mathcal { I}}(a):= \frac{1}{I}\sum _{i\in {\mathcal { I}}} w_i(a), \quad \text {for every alternative } \; a \; \text {in } \; {\mathcal { A}}. \end{aligned}$$
(3)

Let \({\underline{w}}_i:=\min \{w_i(a)\); \(a\in {\mathcal { A}}\}\). By replacing \(w_i\) with the function \({\widetilde{w}}_i:=w_i-{\underline{w}}_i\) if necessary, we can suppose that \(\min \{w_i(a)\); \(a\in {\mathcal { A}}\}=0\), for every voter i in \({\mathcal { I}}\). Clearly this does not affect the maximizer of (3). Now let \(c_i:=\max \{w_i(a)\); \(a\in {\mathcal { A}}\}\), and then define \(u_i(a):= w_i (a)/c_i\), for every voter i in \({\mathcal { I}}\) and every alternative a in \({\mathcal { A}}\). Then formula (3) is clearly equivalent to formula (1). Roughly speaking, \(c_i\) can be seen as a crude measure of the “intensity” of voter i’s preferences regarding the social decision. Of course, the social planner cannot accurately observe these preference intensities. So we will assume that the planner regards \(\{c_i\}_{i\in {\mathcal { I}}}\) as independent random variables, as described by assumption (C) from Sect. 2.
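
The reduction of formula (3) to formula (1) amounts to a translation and rescaling of each voter's utility function. A minimal sketch (the three-voter profile is hypothetical, and it is assumed that no voter is completely indifferent, so that \(c_i>0\)):

```python
import numpy as np

def normalize(w):
    """Reduce a raw utility profile w (shape I x |A|) to the form of formula (1):
    u has rows ranging over [0, 1], and c holds the "intensities" c_i."""
    w_shifted = w - w.min(axis=1, keepdims=True)   # set each voter's worst alternative to 0
    c = w_shifted.max(axis=1)                      # c_i = max_a of the shifted utilities
    u = w_shifted / c[:, None]                     # u_i(a) = shifted w_i(a) / c_i, now in [0, 1]
    return u, c

# Hypothetical 3-voter, 3-alternative profile: shifting and rescaling changes the
# utilitarian SWF only by an additive constant, so its maximizer is unchanged.
w = np.array([[2.0, 5.0, 3.0],
              [0.0, 1.0, 4.0],
              [7.0, 7.5, 6.0]])
u, c = normalize(w)
assert np.argmax(w.mean(axis=0)) == np.argmax((c[:, None] * u).mean(axis=0))
```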

Note that each \(u_i\) ranges over the interval \({\left[ 0,1 \right]}\). If each voter i uses the full range [0, 1] to express her utilities, but is otherwise accurate, then she will set \(v_i=u_i\). (In this case, EV is equivalent to the relative utilitarian social welfare function axiomatized by Dhillon (1998) and Dhillon and Mertens (1999).) However, voter i may misperceive her own utility function. Thus, in general, \(v_i = u_i+\epsilon _i\), where \(\epsilon _i:{\mathcal { A}}\,{{\longrightarrow }}\,{\mathbb {R}}\) is a random “error” function. Suppose \({\underline{a}}_i\) and \({\overline{a}}_i\) are the minimizer and maximizer of \(u_i\) (thus, \(u_i({\underline{a}}_i)=0\) and \(u_i({\overline{a}}_i)=1\)). It is reasonable to suppose that \(v_i({\underline{a}}_i)=0\) and \(v_i({\overline{a}}_i)=1\)—that is, voter i reliably assigns a score of 0 to her worst alternative and a score of 1 to her best alternative. Thus, \(\epsilon _i({\underline{a}}_i)=\epsilon _i({\overline{a}}_i)=0\). However, for the other alternatives in \({\mathcal { A}}\), the errors may be nonzero. We assume they satisfy the following condition:

  1. (E)

    For each alternative a in \({\mathcal { A}}\), the random errors \(\{\epsilon _i(a)\}_{i\in {\mathcal { I}}}\) are independent, but not necessarily identically distributed. There is some constant \(\sigma _\epsilon ^2>0\) (independent of I) such that \(\mathrm{var}[\epsilon _i(a)]\le \sigma _\epsilon ^2\) and \({\mathbb {E}}[\epsilon _i(a)]=0\) for all \(a\in {\mathcal { A}}\) and \(i\in {\mathcal { I}}\). The random variables \(\{c_i\}_{i\in {\mathcal { I}}}\) are independent of the random functions \(\{\epsilon _i\}_{i\in {\mathcal { I}}}\).

(Note that we do not assume that, for a fixed voter i in \({\mathcal { I}}\), the random errors \(\epsilon _i(a)\) and \(\epsilon _i(b)\) are independent for different alternatives a and b in \({\mathcal { A}}\).) Our first result says that, despite the uncertainties surrounding \(\{c_i\}_{i\in {\mathcal { I}}}\) and \(\{\epsilon _i\}_{i\in {\mathcal { I}}}\), evaluative voting has a very good chance of maximizing the utilitarian social welfare function \(U_{\mathcal { I}}\) in formula (3), when the population is large.

Theorem 3.1

For every voter i in \({\mathcal { I}}\), let \(u_i:{\mathcal { A}}\,{{\longrightarrow }}\, [0,1]\) be a utility function. Suppose that the profile \(\{u_i\}_{i\in {\mathcal { I}}}\) satisfies \((\Delta )\), and suppose \(\{c_i\}_{i\in {\mathcal { I}}}\), \(\{\epsilon _i\}_{i\in {\mathcal { I}}}\) and \(\{v_i\}_{i\in {\mathcal { I}}}\) are random variables satisfying (C) and (E). Then

$$\begin{aligned} \lim _{I{\rightarrow }{\infty }} \ \mathrm{Prob}[ {{\mathrm{\mathrm{argmax}}}}_{{\mathcal { A}}} (V_{\mathcal { I}})\subseteq {{\mathrm{\mathrm{argmax}}}}_{\mathcal { A}}(U_{\mathcal { I}})]=1. \end{aligned}$$
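
The following rough Monte Carlo sketch illustrates Theorem 3.1. The two voter types, the gamma distribution for the calibration constants, and the normal perception errors are all illustrative assumptions (and, for simplicity, the reported scores \(v_i=u_i+\epsilon _i\) are not clipped to [0, 1]); the sketch just estimates how often the EV winner maximizes \(U_{\mathcal { I}}\) as the population grows.

```python
import numpy as np

rng = np.random.default_rng(1)
A = 4                                        # |A| (illustrative)
u_types = np.array([[1.0, 0.60, 0.55, 0.0],  # two hypothetical voter types; alternative 1 has
                    [0.0, 0.65, 0.60, 1.0]]) # the highest average utility, so (Delta) holds

def trial(I, sigma_e=0.3):
    u = u_types[rng.integers(0, 2, size=I)]        # known utility profile, values in [0, 1]
    c = rng.gamma(shape=4.0, scale=0.25, size=I)   # calibration constants: mean 1, var 0.25, as in (C)
    eps = rng.normal(0.0, sigma_e, size=(I, A))    # perception errors with mean 0, as in (E)
    eps[u == 0.0] = 0.0                            # no error at each voter's worst alternative...
    eps[u == 1.0] = 0.0                            # ...or at her best alternative
    V = (u + eps).sum(axis=0)                      # evaluative-voting scores, formula (2)
    U = (c[:, None] * u).mean(axis=0)              # utilitarian SWF, formulas (1)/(3)
    return np.argmax(V) == np.argmax(U)

for I in (100, 1_000, 10_000):
    hits = sum(trial(I) for _ in range(200))
    print(f"I = {I:>6}: EV winner maximized U_I in {hits / 200:.0%} of 200 trials")
```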

We can refine this result in three ways. First, we can drop condition \((\Delta )\). Second, and relatedly, instead of demanding that the outcome of evaluative voting exactly maximizes \(U_{\mathcal { I}}\), we can allow the possibility that it only almost maximizes \(U_{\mathcal { I}}\)—something which would be almost as good, for practical purposes. Third, we can estimate how large the population needs to be in order to achieve such “almost-maximization” with a certain probability. To achieve these refinements, we need some more notation. For any utility profile \(\{u_i\}_{i\in {\mathcal { I}}}\), if \(U_{\mathcal { I}}\) is as in Eq. (3), then let

$$\begin{aligned} U^*_{\mathcal { I}}:= \max {\left\{ U_{\mathcal { I}}(a) \; ; \; a\in {\mathcal { A}} \right\} } \end{aligned}$$
(4)

This is the theoretical maximum social welfare, which would be obtained from the optimal social alternative. Let \(\delta >0\) represent a “social suboptimality tolerance”, and let \(p>0\) represent the probability that this tolerance will be exceeded (the social planner wants both of these to be small). For any values of \(\delta \) and p, we define

$$\begin{aligned} {\overline{I}}(\delta ,p):= 4\,|{\mathcal { A}}|\,\frac{\sigma _c^2+ \sigma _\epsilon ^2}{p\,\delta ^2}. \end{aligned}$$
(5)

Our next result says that, for any population larger than \({\overline{I}}(\delta ,p)\), any \(V_{\mathcal { I}}\)-maximizing social alternative will produce a utilitarian social welfare within \(\delta \) of the theoretical optimum, with probability at least \(1-p\).

Proposition 3.2

For every voter i in \({\mathcal { I}}\), let \(u_i:{\mathcal { A}}\,{{\longrightarrow }}\, [0,1]\) be a utility function. Suppose \(\{c_i\}_{i\in {\mathcal { I}}}\), \(\{\epsilon _i\}_{i\in {\mathcal { I}}}\) and \(\{v_i\}_{i\in {\mathcal { I}}}\) satisfy (C) and (E). For any \(\delta >0\) and \(p\in (0,1)\), if \(I\ge {\overline{I}}(\delta ,p)\), then \(\mathrm{Prob}\left[ U_{\mathcal { I}}(a)< U^*_{\mathcal { I}}-\delta \right]\ < \ p\), for every a in \({{\mathrm{\mathrm{argmax}}}}_{{\mathcal { A}}} (V_{\mathcal { I}})\).
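
As a quick numerical illustration of the bound (5) used in Proposition 3.2 (all parameter values below are arbitrary):

```python
def population_bound(n_alternatives, sigma_c2, sigma_eps2, delta, p):
    """Formula (5): a population size beyond which Proposition 3.2 applies."""
    return 4 * n_alternatives * (sigma_c2 + sigma_eps2) / (p * delta ** 2)

# With 5 alternatives, sigma_c^2 = 0.25, sigma_eps^2 = 0.05, welfare tolerance
# delta = 0.05 and failure probability p = 0.01 (all illustrative values):
print(population_bound(5, 0.25, 0.05, 0.05, 0.01))   # roughly 240000 voters
```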

Remark on strategic voting Theorem 3.1 assumes that everyone tries to vote honestly—i.e. that \(v_i=u_i\) (plus perhaps some error) for all \(i\in {\mathcal { I}}\). But people may vote strategically. Indeed, in a Myerson–Weber model of evaluative voting in a large population, each voter’s best strategy is to assign a score of either 0 or 1 to each alternative in \({\mathcal { A}}\) (Núñez and Laslier 2014). In this case, evaluative voting would reduce to approval voting. However, Núñez and Pivato (2016, Theorem 4) show that in a sufficiently large population, there is a stochastic voting rule which will produce the same outcome as evaluative voting, with very high probability, and where almost all voters will vote honestly. This result, combined with Theorem 3.1 above, suggests that in a sufficiently large population satisfying hypotheses (C) and (E), this stochastic evaluative voting rule will select the utilitarian-optimum outcome with very high probability, even when voters are strategically sophisticated.

4 Approval voting

Approval voting was originally proposed by Ottewell (1977), Kellett and Mott (1977), and Weber (1978), but the first sustained formal analysis was by Brams and Fishburn (1983), which is now usually regarded as the locus classicus. Laslier and Sanver (2010) provide a recent and comprehensive reference. Informally, the approval voting rule works as follows:

  1. 1.

    Each voter i identifies a subset of alternatives in \({\mathcal { A}}\) which she “approves”.

  2. 2.

    For each social alternative a in \({\mathcal { A}}\), count how many voters approve a.

  3. 3.

    Choose the alternative which is approved by the most voters.

Formally, for every voter i in \({\mathcal { I}}\), let \({\mathcal { G}}_i\subseteq {\mathcal { A}}\) be the set of alternatives which i approves; we refer to \({\mathcal { G}}_i\) as her approval set. Let \({\varvec{{\mathcal { G}}}}:=\{{\mathcal { G}}_i\}_{i\in {\mathcal { I}}}\) be the profile of the voters’ approval sets. For any social alternative a in \({\mathcal { A}}\), we define its approval score by: \(V_{\varvec{{\mathcal { G}}}}(a):=\#\{i\in {\mathcal { I}}\); \(a\in {\mathcal { G}}_i\}\). We then define \(\mathrm{Appr}({\varvec{{\mathcal { G}}}}) := {{\mathrm{\mathrm{argmax}}}}_{\mathcal { A}}(V_{\varvec{{\mathcal { G}}}})\).Footnote 7
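
A minimal sketch of the approval-voting computation just described (the three-voter profile over alternatives a, b, c is hypothetical):

```python
from collections import Counter

def approval_winners(approval_sets, alternatives):
    """Compute Appr(G): the alternative(s) with the highest approval score V_G(a)."""
    scores = Counter({a: 0 for a in alternatives})
    for approval_set in approval_sets:      # each element is one voter's approval set G_i
        scores.update(approval_set)         # adds 1 to the score of each approved alternative
    top = max(scores.values())
    return {a for a, s in scores.items() if s == top}

# Three hypothetical voters: the scores are a:1, b:3, c:1, so Appr(G) = {'b'}.
print(approval_winners([{"a", "b"}, {"b"}, {"b", "c"}], {"a", "b", "c"}))
```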

Recall that \(u_i:{\mathcal { A}}\,{{\longrightarrow }}\,{\mathbb {R}}\) denotes the cardinal utility function of voter i, and that we wish to maximize the utilitarian social welfare function \(U_{\mathcal { I}}\) defined in Formula (1). If the approval set \({\mathcal { G}}_i\) is a “noisy signal” of \(u_i\) (for every voter i in \({\mathcal { I}}\)), then the aggregate approval score \(V_{\varvec{{\mathcal { G}}}}\) could be seen as a “noisy signal” of the social welfare function \(U_{\mathcal { I}}\). Thus, under the right conditions, approval voting should maximize utilitarian social welfare. Indeed, Weber (1978) showed this was true when there are three alternatives, for which the utilities of the voters are independent random variables uniformly distributed over an interval. The goal of this section is to make this intuition precise in a much more general setting.

Each voter’s true utility function is unknown to the social planner. So is the process by which each voter converts her utility function into an approval set. We capture this lack of knowledge by treating these as random variables, described by some probabilistic model. We will consider two different models: the Threshold Model and the Selection Model. The Threshold Model (Sect. 4.1) first assigns each voter a random utility for each social alternative, and then selects her approval set from these alternatives by means of a randomly determined threshold. The Selection Model (Sect. 4.2) first selects an approval set for each voter (this process may be random or deterministic), and then randomly assigns utilities to each social alternative, according to a probability distribution which depends on whether or not it is in the approval set. Both models yield the same conclusion: in a large population, approval voting maximizes the utilitarian social welfare function, with high probability.

4.1 The threshold model

In this section, we will suppose that each voter i in \({\mathcal { I}}\) identifies an approval threshold \(\theta _i\). She then defines her approval set \({\mathcal { G}}_i\) to be all social alternatives whose utility is at least \(\theta _i\). That is:

$$\begin{aligned} {\mathcal { G}}_i:= {\left\{ a\in {\mathcal { A}} \; ; \; u_i(a)\ge \theta _i \right\} }. \end{aligned}$$
(6)

In addition to the assumptions (C) and \((\Delta )\) from Sect. 2, we will now make the following two assumptions:

\((\Theta 1)\) :

The utilities \(\{u_i(a)\); \(a\in {\mathcal { A}}\) and \(i\in {\mathcal { I}}\}\) are i.i.d. random variables with finite variance. The variables \(\{c_i\}_{i\in {\mathcal { I}}}\) and \(\{u_i\}_{i\in {\mathcal { I}}}\) are all jointly independent.

\((\Theta 2)\) :

The thresholds \(\{\theta _i\}_{i\in {\mathcal { I}}}\) are independent random variables (not necessarily identically distributed). For any \(i\in {\mathcal { I}}\), the random threshold \(\theta _i\) may depend on \(\{u_i(a)\}_{a\in {\mathcal { A}}}\). But \(\{\theta _i\}_{i\in {\mathcal { I}}}\) and \(\{c_i\}_{i\in {\mathcal { I}}}\) are independent. Finally, \(0<\mathrm{Prob}[u_i(a)\ge \theta _i]<1\) for all \(a\in {\mathcal { A}}\).

Assumption \((\Theta 1)\) describes the planner’s ignorance about people’s true utility functions, while \((\Theta 2)\) describes her ignorance about how they convert these utility functions into approval sets, except for the fact that they follow rule (6). The condition “\(0<\mathrm{Prob}[u_i(a)\ge \theta _i]<1\)” simply guarantees that the output of rule (6) is almost-surely nondegenerate. The next result says that, if a large population of voters satisfies these hypotheses, then with very high probability, approval voting will maximize the utilitarian social welfare function \(U_{\mathcal { I}}\) in Eq. (1).

Theorem 4.1

Suppose \(\{u_i\}_{i\in {\mathcal { I}}}\), \(\{\theta _i\}_{i\in {\mathcal { I}}}\) and \(\{c_i\}_{i\in {\mathcal { I}}}\) satisfy (C), \((\Delta )\), \((\Theta 1)\), and \((\Theta 2)\), and the approval profile \({\varvec{{\mathcal { G}}}}=\{{\mathcal { G}}_i\}_{i\in {\mathcal { I}}}\) is defined by rule (6). Then

$$\begin{aligned} \lim _{I{\rightarrow }{\infty }} \ \mathrm{Prob}\left[ \mathrm{Appr}({\varvec{{\mathcal { G}}}})\subseteq {{\mathrm{\mathrm{argmax}}}}_{\mathcal { A}}(U_{\mathcal { I}})\right]=1. \end{aligned}$$

One notable difference between Theorems 3.1 and 4.1 is that the former places essentially no conditions on the utility functions \(\{u_i\}_{i\in {\mathcal { I}}}\), whereas the latter requires these utility functions to be i.i.d. random variables. The reason is that approval voting provides us with less information about individual utility functions than evaluative voting. A voter’s response in evaluative voting tells us her entire utility function, up to a scalar multiple; the only uncertainty is the magnitude of this scalar. But her approval set only tells us whether she assigns a ‘high’ or ‘low’ utility to each alternative. Thus, without further information about the conditional probability distributions of these ‘high’ and ‘low’ utilities, it is not possible for us to estimate the utilitarian social welfare function.

The total ignorance described by \((\Theta 1)\) implies both a sort of a priori anonymity (i.e. all voters are indistinguishable, a priori) and a priori neutrality (i.e. all social alternatives are indistinguishable, a priori). Thus, it does not allow us to incorporate the knowledge that certain alternatives (e.g. low taxes) tend to be favoured by certain classes of voters (e.g. business owners). Nor does it allow us to incorporate the knowledge that people’s preferences may be correlated (e.g. voters who favour low taxes tend to also favour less regulation, because they are often business owners). Likewise, \((\Theta 2)\) does not allow us to incorporate knowledge that certain types of voters tend to set higher thresholds than others for certain types of policy problems. Thus, when applied to a specific policy problem, these assumptions may be less than optimal; a voting rule more closely optimized to the specific probability distribution of utility functions in a society might yield a higher expected social welfare.Footnote 8 However, in many cases, we might not have such specific information about the distribution of utility functions.

It is important to note that we do not assume that the random threshold \(\theta _i\) is independent of \(u_i\). For example, one very natural threshold for a voter would be the average utility of all alternatives; that is:

$$\begin{aligned} \theta _i=\frac{1}{|{\mathcal { A}}|} \sum _{a\in {\mathcal { A}}} u_i(a). \end{aligned}$$
(7)

Note that a random threshold determined in this way is compatible with hypothesis \((\Theta 2)\).
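
To make rule (6) and threshold (7) concrete, here is a small sketch of how the Threshold Model generates an approval profile; the uniform utility distribution stands in for the unspecified i.i.d. distribution in \((\Theta 1)\) and is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
I, A = 1_000, 5                          # population size and |A| (illustrative)

u = rng.random((I, A))                   # i.i.d. utilities, as in (Theta 1) (uniform, for illustration)
theta = u.mean(axis=1, keepdims=True)    # each voter's threshold: her mean utility, formula (7)
approvals = (u >= theta)                 # row i is the indicator vector of the approval set G_i, rule (6)

V = approvals.sum(axis=0)                # approval scores V_G(a)
print("approval scores:", V)
print("Appr(G) (indices):", np.flatnonzero(V == V.max()))
```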

4.2 The selection model

In our second model, the approval set of each voter is exogenous and arbitrary. Her approval set might be fixed in advance, or it might be generated by some other random process. (If the voters’ approval sets are random variables, then we do not need to assume that they are either independent, or identically distributed.) Each voter assigns random utilities to each alternative, conditional on whether or not it is in her approval set.

Formally, for each i in \({\mathcal { I}}\), let \({\mathcal { G}}_i\) be the (exogenous) approval set of voter i, and let \({\mathcal { B}}_i:={\mathcal { A}}{\setminus }{\mathcal { G}}_i\). Let \(\gamma \) and \(\beta \) be two finite-variance probability measures on \({\mathbb {R}}\), such that the mean value of \(\gamma \) is strictly larger than that of \(\beta \).Footnote 9 In addition to assumptions (C) and \((\Delta )\) from Sect. 2, we now make the following assumption:

(S) :

For all g in \({\mathcal { G}}_i\), \(u_i(g)\) is a \(\gamma \)-random variable. For all b in \({\mathcal { B}}_i\), \(u_i(b)\) is a \(\beta \)-random variable. The random variables \(\{c_i\); \(i\in {\mathcal { I}}\}\) and \(\{u_i(a)\); \(i\in {\mathcal { I}}\) and \(a\in {\mathcal { A}}\}\) are all jointly independent.

The interpretation of (S) is the same as \((\Theta 1)\); it implies both a sort of a priori anonymity and a priori neutrality. The difference is that now we can distinguish between those alternatives which are in a voter’s approval set and those which aren’t, and give them different statistical treatments. Since the approval sets are exogenous, this model is able to cope with situations where a voter’s approval choices are highly correlated, both with other voters and with other known facts about that voter (e.g. the fact that business owners tend to approve of tax reduction and also tend to approve of deregulation). The model in effect makes no assumptions about the statistical distribution of approval sets (it doesn’t even treat them as random). However, the model still does not allow correlations of utility within each voter’s approval set (e.g. a correlation between the utility that a voter assigns to a tax reduction and the utility she assigns to a deregulation policy, given that she has approved of both). Once again, if a large population of voters satisfies these hypotheses, then with very high probability, approval voting will maximize the utilitarian social welfare function \(U_{\mathcal { I}}\) in Eq. (1).

Theorem 4.2

Let \({\varvec{{\mathcal { G}}}}=\{{\mathcal { G}}_i\}_{i\in {\mathcal { I}}}\) be an arbitrary approval profile. If \(\{u_i\}_{i\in {\mathcal { I}}}\) and \(\{c_i\}_{i\in {\mathcal { I}}}\) satisfy hypotheses (C), \((\Delta )\), and (S), then

$$\begin{aligned} \lim \nolimits _{I{\rightarrow }{\infty }} \ \mathrm{Prob}\left[ \mathrm{Appr}({\varvec{{\mathcal { G}}}})\subseteq {{\mathrm{\mathrm{argmax}}}}_{\mathcal { A}}(U_{\mathcal { I}})\right] = 1. \end{aligned}$$
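
A rough Monte Carlo sketch of Theorem 4.2. The exogenous approval frequencies, the choice of normal distributions for \(\gamma \) and \(\beta \), and the gamma distribution for the calibration constants are all illustrative assumptions; the sketch just estimates how often the approval winner maximizes \(U_{\mathcal { I}}\) as the population grows.

```python
import numpy as np

rng = np.random.default_rng(3)
A = 4
approve_prob = np.array([0.6, 0.45, 0.3, 0.2])    # exogenous approval frequencies (illustrative)

def trial(I):
    # Exogenous approval profile: voter i approves alternative a with probability approve_prob[a].
    G = rng.random((I, A)) < approve_prob
    # Hypothesis (S): utilities ~ the measure gamma on approved alternatives, ~ beta elsewhere;
    # here both are normal (illustrative), with mean(gamma) = 0.8 > 0.3 = mean(beta).
    u = np.where(G, rng.normal(0.8, 0.2, (I, A)), rng.normal(0.3, 0.2, (I, A)))
    c = rng.gamma(shape=4.0, scale=0.25, size=I)  # calibration constants: mean 1, var 0.25, as in (C)
    U = (c[:, None] * u).mean(axis=0)             # utilitarian SWF, formula (1)
    return np.argmax(G.sum(axis=0)) == np.argmax(U)

for I in (50, 500, 5_000):
    hits = sum(trial(I) for _ in range(200))
    print(f"I = {I:>5}: approval winner maximized U_I in {hits / 200:.0%} of 200 trials")
```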

As in Sect. 3, we would like to refine Theorems 4.1 and 4.2 by dropping condition \((\Delta )\). We would also like to estimate how large the population must be in order for approval voting to “almost-maximize” \(U_{\mathcal { I}}\) with a certain probability, by analogy with Proposition 3.2. For brevity, we will present such a result only for the Selection Model, but a similar result can be proved for the Threshold Model. Let \(U^*_{\mathcal { I}}:=\max \{U_{\mathcal { I}}(a)\); \(a\in {\mathcal { A}}\}\).

Proposition 4.3

Let \({\varvec{{\mathcal { G}}}}=\{{\mathcal { G}}_i\}_{i\in {\mathcal { I}}}\) be an arbitrary approval profile. If \(\{u_i\}_{i\in {\mathcal { I}}}\) and \(\{c_i\}_{i\in {\mathcal { I}}}\) satisfy (C) and (S), then for any \(\delta >0\), we have

$$\begin{aligned} \lim _{I{\rightarrow }{\infty }} \mathrm{Prob}\left( U_{\mathcal { I}}(a) \ge U^*_{\mathcal { I}}-\delta \; \text {for all } \; a \in \mathrm{Appr}({\varvec{{\mathcal { G}}}})\right) = 1. \end{aligned}$$
(8)

Furthermore, if the fourth moments of \(\gamma \) and \(\beta \) are finite,Footnote 10 then there are constants \(C_1,C_2>0\) (determined by \(\gamma \), \(\beta \), and \(\sigma _c^2\)) such that, for any \(p>0\), if \(I\ge C_1/p\) and \(I\ge C_2/p\,\delta ^2\), then \(\mathrm{Prob}\left[ U_{\mathcal { I}}(a)< U^*_{\mathcal { I}}-\delta \right] < p\) for all \(a\in \mathrm{Appr}({\varvec{{\mathcal { G}}}})\).

Remark on strategic voting Theorems 4.1 and 4.2 assumed that each person votes sincerely, meaning that she votes for all and only those alternatives whose utility exceeds some threshold. Such sincerity should not be taken for granted; De Sinopoli et al. (2006) and Núñez (2014) have constructed examples of mixed strategy Nash equilibria and Poisson equilibria with insincere approval voting. However, Núñez and Pivato (2016, Theorem 3) show that in a sufficiently large population, there is a stochastic voting rule which will produce the same outcome as approval voting, with very high probability, and where almost all voters will vote sincerely, using the threshold defined by formula (7). This result, combined with Theorem 4.1, suggests that in a large enough population satisfying hypotheses (C), \((\Delta )\), \((\Theta 1)\), and \((\Theta 2)\), this stochastic approval voting rule will select the utilitarian-optimum outcome with very high probability, even when voters have the option of being insincere.

5 Rank scoring rules

One concern with approval voting is that it gives each voter very little ability to express the intensity of her preference for or against each alternative. For example, if a voter does not include a certain alternative in her approval set, this may be because she actively dislikes this alternative, or it may simply be because she has no strong feelings either way, or perhaps inadequate knowledge of this alternative, and for this reason she “abstains” from endorsing it. Approval voting is unable to distinguish between voting against and abstention. For this reason, Alcantud and Laruelle (2014) propose “dis&approval” voting, which allows a voter three choices for each alternative: approve, disapprove, or abstain. However, “dis&approval” voting still does not distinguish between “strong” (dis)approval and “weak” (dis)approval. For this purpose, we turn to rank scoring rules.

Let \(N:=|{\mathcal { A}}|\). Let \(s_1\le s_2\le \cdots \le s_N\) be real numbers, and define \({\mathbf { s}}:=(s_1,s_2,\ldots ,s_N)\). The \({\mathbf { s}}\)-rank scoring rule on \({\mathcal { A}}\) is defined as follows:

  1. 1.

    For every voter i in \({\mathcal { I}}\), let \(\succ _i\) denote her (strict) ordinal preferences on \({\mathcal { A}}\).

  2. 2.

    For every alternative a in \({\mathcal { A}}\), if a is ranked kth from the bottom with respect to \(\succ _i\), then voter i gives a the score \(s_k\). (In particular, i gives the score \(s_1\) to her least-preferred alternative, and the score \(s_N\) to her most preferred alternative.)

  3. 3.

    For each alternative in \({\mathcal { A}}\), add up the scores it gets from all voters.

  4. 4.

    The \({\mathbf { s}}\)-rank scoring rule chooses the alternative(s) with the highest total score.

For example, the Borda rule is the rank scoring rule with \({\mathbf { s}}=(1,2,3,\ldots ,N)\). The standard plurality rule is the rank scoring rule with \({\mathbf { s}}=(0,0,\ldots ,0,1)\).

Formally, for every voter i in \({\mathcal { I}}\), if \({\mathcal { A}}=\{a_1,a_2,\ldots ,a_N\}\) and \(a_1\prec _i a_2\prec _i\cdots \prec _i a_N\), then define \(v_i:{\mathcal { A}}{{\longrightarrow }}{\mathbb {R}}\) by setting \(v_i(a_k) := s_k\) for all \(k\in [1\ldots N]\). Let \({\mathcal { P}}_{\mathcal { I}}:=\{\succ _i\}_{i\in {\mathcal { I}}}\) be the profile of ordinal preferences of the voters. For every social alternative a in \({\mathcal { A}}\), define \(V^{\mathbf { s}}_{{\mathcal { P}}_{\mathcal { I}}}(a):=\sum _{i\in {\mathcal { I}}} v_i(a)\). Then define \(\mathrm{Score}_{\mathbf { s}}({\mathcal { P}}_{\mathcal { I}}):={{\mathrm{\mathrm{argmax}}}}_{\mathcal { A}}(V^{\mathbf { s}}_{{\mathcal { P}}_{\mathcal { I}}})\).
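
A minimal sketch of the \({\mathbf { s}}\)-rank scoring computation just defined. The four-voter profile is hypothetical; it also illustrates that different score vectors (here Borda versus plurality) can select different winners.

```python
def rank_score_winners(rankings, s):
    """Compute Score_s(P). Each element of `rankings` lists one voter's alternatives
    from worst to best, so the alternative in position k receives the score s[k]."""
    totals = {}
    for ranking in rankings:
        for k, a in enumerate(ranking):
            totals[a] = totals.get(a, 0.0) + s[k]
    top = max(totals.values())
    return {a for a, t in totals.items() if t == top}

# Hypothetical profile over {'a', 'b', 'c'}; each tuple is ordered from worst to best.
profile = [("c", "b", "a"), ("c", "b", "a"), ("a", "b", "c"), ("a", "c", "b")]
print(rank_score_winners(profile, s=(1, 2, 3)))   # Borda scores a:8, b:9, c:7 -> {'b'}
print(rank_score_winners(profile, s=(0, 0, 1)))   # plurality scores a:2, b:1, c:1 -> {'a'}
```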

Apesteguia et al. (2011; Theorem 3.1) have shown that, amongst all voting rules, rank scoring rules are the ones which maximize the expected value of the utilitarian social welfare function (under certain conditions). Furthermore, they characterized the optimal rank scoring rule in terms of the probability distribution of the voters’ utility functions—to be precise, in terms of the expected order statistics of this distribution. Our results in this section are complementary. We work with a much broader class of probability distributions than Apesteguia et al. (2011). We will show that, if the profile \(\{u_i\}_{i\in {\mathcal { I}}}\) arises from this class, then there exists a rank scoring rule which will come arbitrarily close to selecting a utilitarian optimum, with very high probability as \(I{\rightarrow }{\infty }\). Thus, while Apesteguia et al. (2011) show that the optimal rank scoring rule is “better on average” than any other voting rule, we show that it is, in fact, “almost perfect” in the limit of a large population.Footnote 11

As in Sect. 4, we will present two stochastic models of voter preference formation. In the Endogenous Preference model (Sect. 5.1), the voters’ utility functions are i.i.d. random variables, and their ordinal preferences are determined by these utility functions. In the Exogenous Preference model (Sect. 5.2), the voters’ ordinal preferences are exogenous and arbitrary, and their utility functions are random variables conditional on these preferences.

5.1 Endogenous preferences

For each voter i in \({\mathcal { I}}\), we will represent her utility function over \({\mathcal { A}}\) as an N-dimensional vector \({\mathbf { u}}^i:=(u^i_a)_{a\in {\mathcal { A}}}\in {\mathbb {R}}^{\mathcal { A}}\) (where \(u^i_a:=u_i(a)\) for all \(a\in {\mathcal { A}}\)). Let \(\mu \) be a probability measure on \({\mathbb {R}}^{\mathcal { A}}\) which has finite variance (i.e. the variance of each one-dimensional marginal of \(\mu \) is finite). We will use \(\mu \) to randomly generate the utility functions of the voters. For any \(a,b\in {\mathcal { A}}\), we assume that \(\mu \{{\mathbf { u}}\in {\mathbb {R}}^{\mathcal { A}}\); \(u_a=u_b\}=0\) (i.e. almost surely, no two alternatives yield the same utility). In addition to hypotheses (C) and \((\Delta )\) from Sect. 2, we will now make the following two assumptions.

(N1) :

The utility functions \(\{{\mathbf { u}}^i\}_{i\in {\mathcal { I}}}\) are independent, \(\mu \)-random vectors. Also, \(\{{\mathbf { u}}^i\}_{i\in {\mathcal { I}}}\) are independent of \(\{c_i\}_{i\in {\mathcal { I}}}\).

(N2) :

\(\mu \) is symmetric under all coordinate permutations.

Note that we do not assume, for any particular voter i, that the utilities she assigns to different alternatives a and b are independent random variables: the probability measure \(\mu \) may allow for arbitrary correlations between \(u^i_a\) and \(u^i_b\). However, assumption (N2) acts as a form of a priori neutrality. It implies that the expected values of the utilities \(u^i_a\) and \(u^i_b\) are the same. Furthermore, the expected value of \(u^i_a\), conditional on some information about the rank of b (say, that b is i’s favourite alternative) is the same as the expected value of \(u^i_b\) conditional on the same information about the rank of a. Hypothesis (N1) is comparable to (S) or \((\Theta 1)\): it is a sort of a priori anonymity, saying that all voters are indistinguishable, a priori. However, (N1) supposes a richer level of knowledge than (S) or \((\Theta 1)\); for example, we might know that, on average, each voter’s second-best and third-best alternatives obtain, respectively 90 and 75 % of the utility of her favourite alternative (assuming her worst alternative has utility 0). We will now see how to incorporate such knowledge into the optimal rank scoring rule.

Let \({\mathbf { u}}\in {\mathbb {R}}^{\mathcal { A}}\) be a \(\mu \)-random vector. Rearrange the coordinates of \({\mathbf { u}}\) in increasing order, to get a vector \({}^{\scriptscriptstyle \uparrow }\!{\mathbf { u}}:=({}^{\scriptscriptstyle \uparrow }\!u_1,\ldots ,{}^{\scriptscriptstyle \uparrow }\!u_N)\), where \({}^{\scriptscriptstyle \uparrow }\!u_1\le {}^{\scriptscriptstyle \uparrow }\!u_2\le \cdots \le {}^{\scriptscriptstyle \uparrow }\!u_N\). For all \(n\in [1\ldots N]\), let \(s_n\) be the expected value of \({}^{\scriptscriptstyle \uparrow }\!u_n\); this yields a vector \({\mathbf { s}}=(s_1,\ldots ,s_N)\). The next result says that, if a large population of voters satisfies our hypotheses, then with very high probability, the \({\mathbf { s}}\)-rank scoring rule will maximize the utilitarian social welfare function \(U_{\mathcal { I}}\) in Eq. (1).
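
A small sketch of how the score vector \({\mathbf { s}}\) can be estimated by Monte Carlo, for one illustrative choice of \(\mu \) (i.i.d. uniform coordinates plus a common shock, which is exchangeable as required by (N2) but has correlated coordinates). For this particular \(\mu \), the expected order statistics turn out to be approximately equally spaced, so the resulting rule is equivalent to the Borda rule; other choices of \(\mu \) yield genuinely different score vectors.

```python
import numpy as np

rng = np.random.default_rng(4)
N, draws = 4, 200_000

# One illustrative mu: each coordinate is 0.5*Uniform[0,1] plus a common shock, so
# the coordinates are exchangeable (hypothesis (N2)) but positively correlated.
common = rng.random((draws, 1))
u = 0.5 * rng.random((draws, N)) + 0.5 * common

s = np.sort(u, axis=1).mean(axis=0)        # s_n = estimated E[ n-th increasing order statistic ]
print("score vector s:", np.round(s, 3))   # roughly (0.35, 0.45, 0.55, 0.65)
```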

Theorem 5.1

Suppose \(\{{\mathbf { u}}^i\}_{i\in {\mathcal { I}}}\) and \(\{c_i\}_{i\in {\mathcal { I}}}\) satisfy (C), \((\Delta )\), (N1) and (N2), and let \({\mathcal { P}}_{\mathcal { I}}=\{\succ _i\}_{i\in {\mathcal { I}}}\) be the ordinal preference profile defined by \(\{u_i\}_{i\in {\mathcal { I}}}\). Then

$$\begin{aligned} \lim _{I{\rightarrow }{\infty }} \ \mathrm{Prob}\left[ \mathrm{Score}_{\mathbf { s}}({\mathcal { P}}_{\mathcal { I}})\subseteq {{\mathrm{\mathrm{argmax}}}}_{\mathcal { A}}(U_{\mathcal { I}})\right] \ = \ 1. \end{aligned}$$
(9)

Impartial culture As we have emphasized, hypothesis (N1) does not require the utilities \(\{u^i_a\}_{a\in {\mathcal { A}}}\) to be i.i.d. random variables, for any particular voter i. However, it is certainly compatible with this additional assumption. This corresponds to the special case of the model where \(\mu \) is a Cartesian product of N copies of some underlying finite-variance, nonatomic probability measure \(\rho \) on \({\mathbb {R}}\). In this case, the utilities \(\{u^i_a\); \(i\in {\mathcal { I}}\) and \(a\in {\mathcal { A}}\}\) are all independent, \(\rho \)-random variables; this is a version of the so-called “Impartial Culture” model. If we make the further assumption that \(\rho \) is absolutely continuous and has compact support, and assume that \(c_i:=1\) for every voter i in \({\mathcal { I}}\), then we obtain the models considered by Weber (1978) and Apesteguia et al. (2011). But the Endogenous Preference model is much more general than the Impartial Culture model. For example, in a large population Impartial Culture model, all elements of \({\mathcal { A}}\) end up with roughly the same average utility (due to the Law of Large Numbers), so that utilitarianism is effectively indifferent between them, and the use of any voting rule is somewhat superfluous. The Endogenous Preference model avoids this unrealistically trivial outcome.

5.2 Exogenous preferences

In our second model, the preference orders of the voters are exogenous and arbitrary. These preference orders may themselves be random variables, or they may be determined in some other way. (If they are random variables, then we do not assume that they are independent or identically distributed, unlike the Impartial Culture model.) The model then assigns each voter a random utility for each alternative, conditional on her preference order.

Formally, let \({}^{^\uparrow }\!{\mathbb {U}}:=\{{}^{\scriptscriptstyle \uparrow }\!{\mathbf { u}}\in {\mathbb {R}}^N\); \({}^{\scriptscriptstyle \uparrow }\!u_1< {}^{\scriptscriptstyle \uparrow }\!u_2< \cdots < {}^{\scriptscriptstyle \uparrow }\!u_N\}\), and let \(\lambda \) be a finite-variance probability measure on \({}^{^\uparrow }\!{\mathbb {U}}\). For every voter i in \({\mathcal { I}}\), let \(\succ _i\) denote i’s (exogenous) preference order over \({\mathcal { A}}\). In addition to assumptions (C) and \((\Delta )\) from Sect. 2, we make the following assumption:

  1. (X)

    The utility vectors \(\{{\mathbf { u}}^i\}_{i\in {\mathcal { I}}}\) are independent random vectors, generated as follows. For each \(i\in {\mathcal { I}}\), we obtain \({\mathbf { u}}^i\) by taking a \(\lambda \)-random variable \({}^{\scriptscriptstyle \uparrow }\!{\mathbf { u}}^i\) in \({}^{^\uparrow }\!{\mathbb {U}}\), and rearranging the coordinates to agree with the preference order \(\succ _i\). The random variables \(\{{\mathbf { u}}^i\}_{i\in {\mathcal { I}}}\) and \(\{c_i\}_{i\in {\mathcal { I}}}\) are independent.

Once again, we do not assume, for any particular voter i, that the utilities she assigns to different alternatives a and b are independent random variables, even after we condition on \(\succ _i\); the probability measure \(\lambda \) may allow for arbitrary correlations between \(u^i_a\) and \(u^i_b\). However, (X) has a consequence similar to (N2): given two voters i and j, and two alternatives a and b, if a has the same rank with respect to \(\succ _i\) as b does with respect to \(\succ _j\), then the expected value of \(u^i_a\) is the same as that of \(u^j_b\). Hypothesis (X) is also comparable to (N1): it is a sort of a priori anonymity, saying that all voters are indistinguishable, a priori, except for their exogenous ordinal preferences.

For all \(k\in [1\ldots N]\), let \(s_k\) be the expected value of \({}^{\scriptscriptstyle \uparrow }\!u_k\), where \(({}^{\scriptscriptstyle \uparrow }\!u_1,\ldots ,{}^{\scriptscriptstyle \uparrow }\!u_N)\in {}^{^\uparrow }\!{\mathbb {U}}\) is a \(\lambda \)-random variable. Let \({\mathbf { s}}:=(s_1,s_2,\ldots ,s_N)\). The next result says that, if a large population of voters satisfies our hypotheses, then with very high probability, the \({\mathbf { s}}\)-rank scoring rule will select an alternative that maximizes the utilitarian social welfare function \(U_{\mathcal { I}}\) defined in equation (1).

Theorem 5.2

Let \({\mathcal { P}}_{\mathcal { I}}=\{\succ _i\}_{i\in {\mathcal { I}}}\) be an arbitrary ordinal preference profile. Suppose \(\{{\mathbf { u}}_i\}_{i\in {\mathcal { I}}}\) and \(\{c_i\}_{i\in {\mathcal { I}}}\) satisfy (C), \((\Delta )\), and (X). Then the limit (9) holds as \(I{\rightarrow }{\infty }\).
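To see the mechanism behind Theorem 5.2, here is a small simulation sketch (ours, with illustrative assumptions throughout: \(\lambda \) is taken to be sorted standard normal draws, all calibration weights \(c_i\) are set to 1, and the preference profile is a mildly skewed random one). The totals of the \({\mathbf { s}}\)-rank scoring rule track the population utility totals, so the score winner (almost) maximizes \(U_{\mathcal { I}}\):

```python
import numpy as np

rng = np.random.default_rng(2)
I, N = 20_000, 5                # voters, alternatives

# Scoring vector: s_k = E[k-th coordinate of a lambda-draw], estimated by Monte Carlo.
s = np.sort(rng.normal(size=(200_000, N)), axis=1).mean(axis=0)

# A mildly skewed exogenous profile: 60% of voters share one preference order,
# the rest are uniformly random permutations (least preferred alternative listed first).
pref = np.array([np.arange(N) if rng.random() < 0.6 else rng.permutation(N)
                 for _ in range(I)])

# Utilities under (X): rearrange each lambda-draw to agree with the voter's order.
up = np.sort(rng.normal(size=(I, N)), axis=1)
u = np.empty((I, N))
u[np.arange(I)[:, None], pref] = up

# s-rank scoring rule: a voter who ranks alternative a k-th from the bottom
# contributes s_k to a's total.
score = np.zeros(N)
for p in pref:
    score[p] += s

U = u.sum(axis=0)               # utilitarian totals, with all c_i = 1
print("score winner:", score.argmax(), "  utilitarian winner:", U.argmax())
print("per-voter shortfall of the score winner:", (U.max() - U[score.argmax()]) / I)
```

In this sketch, with all \(c_i=1\), the score total of each alternative equals the expected value of its utilitarian total, so the Law of Large Numbers does the rest.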

As in Sects. 3 and 4, we would like to refine Theorems 5.1 and 5.2 by dropping condition \((\Delta )\). We would also like to estimate how large the population must be in order for the rank scoring rule to “almost-maximize” \(U_{\mathcal { I}}\) with a certain probability, by analogy with Propositions 3.2 and 4.3. For brevity, we will present such a result only for the Exogenous Preference model, but a similar result can be proved for the Endogenous Preference model. As usual, let \(U^*_{\mathcal { I}}:=\max \{U_{\mathcal { I}}(a)\); \(a\in {\mathcal { A}}\}\).

Proposition 5.3

Let \({\mathcal { P}}_{\mathcal { I}}=\{\succ _i\}_{i\in {\mathcal { I}}}\) be an arbitrary ordinal preference profile. If \(\{u_i\}_{i\in {\mathcal { I}}}\) and \(\{c_i\}_{i\in {\mathcal { I}}}\) satisfy (C) and (X), then for any \(\delta >0\), we have

$$\begin{aligned} \lim _{I{\rightarrow }{\infty }}\ \mathrm{Prob}\left[ \, U_{\mathcal { I}}(a) \ \ge \ U^*_{\mathcal { I}}-\delta , \quad \text {for all }\; a \in \mathrm{Score}_{\mathbf { s}}({\mathcal { P}}_{\mathcal { I}}) \, \right] \ = \ 1. \end{aligned}$$
(10)

Furthermore, if the fourth moment of \(\lambda \) is finite,Footnote 12 then there are constants \(C_1,C_2>0\) (determined by \(\lambda \) and \(\sigma _c^2\)) such that, for any \(p>0\), if \(I\ge C_1/p\) and \(I\ge C_2/p\,\delta ^2\), then \(\mathrm{Prob}\left[ U_{\mathcal { I}}(a) < U^*_{\mathcal { I}}-\delta \right] < p\), for all \(a\in \mathrm{Score}_{\mathbf { s}}({\mathcal { P}}_{\mathcal { I}})\).

Conditional impartial culture Theorem 5.1 is actually a consequence of Theorem 5.2; in effect, the Exogenous Preference model can be interpreted as the Endogenous Preference model, conditional on a particular realization of the (random) ordinal preference profile \(\{\succ _i\}_{i\in {\mathcal { I}}}\). Theorem 5.2 says that limit (9) holds for any particular realization of \(\{\succ _i\}_{i\in {\mathcal { I}}}\). Theorem 5.1 follows from this fact by integrating over all possible realizations of \(\{\succ _i\}_{i\in {\mathcal { I}}}\). (See Appendix B for details.)

At the end of Sect. 5.1, we explained how the “Impartial Culture” model is a special case of the Endogenous Preference model. If we condition on a particular realization of \(\{\succ _i\}_{i\in {\mathcal { I}}}\), we obtain the Conditional Impartial Culture model. To be precise, let \(\rho \) be a probability measure on \({\mathbb {R}}\) with finite variance and no atoms. For every voter i in \({\mathcal { I}}\), let \(\succ _i\) be voter i’s (exogenous) ordinal preference relation on \({\mathcal { A}}\). In this case, hypothesis (X) takes the following form:

(X\('\)):

Let \(\{r^i_1,r^i_2,\ldots ,r^i_N\}\) be a sample of N independent, \(\rho \)-random variables. Rearrange this sample in increasing order, to obtain an N-tuple \(({}^{\scriptscriptstyle \uparrow }\!r^i_1,{}^{\scriptscriptstyle \uparrow }\!r^i_2,\ldots ,{}^{\scriptscriptstyle \uparrow }\!r^i_N)\) with \({}^{\scriptscriptstyle \uparrow }\!r^i_{1}< {}^{\scriptscriptstyle \uparrow }\!r^i_{2}< \cdots < {}^{\scriptscriptstyle \uparrow }\!r^i_{N}\).Footnote 13 If \({\mathcal { A}}=\{a_1,a_2,\ldots ,a_N\}\) and \(a_1 \prec _i a_2 \prec _i\cdots \prec _i a_N\), then set \(u_i(a_1):={}^{\scriptscriptstyle \uparrow }\!r^i_{1}\), \(u_i(a_2):={}^{\scriptscriptstyle \uparrow }\!r^i_{2}\), \(\ldots \), and \(u_i(a_N):={}^{\scriptscriptstyle \uparrow }\!r^i_{N}\).

Under hypothesis (X\('\)), the rank scoring rule described prior to Theorem 5.2 is constructed as follows. Take a sample of N independent \(\rho \)-random variables, and compute the order statistics of this sample; this yields N new random variables (which are neither independent nor identically distributed). Let \(s^{N}_{1}< s^{N}_{2}< \cdots < s^{N}_{N}\) be the expected values of these order statistics. Then set \({\mathbf { s}}:=(s^{N}_{1},s^{N}_{2},\ldots ,s^{N}_{N})\).
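The expected order statistics \(s^{N}_{1},\ldots ,s^{N}_{N}\) are easy to estimate by Monte Carlo simulation; the sketch below (ours; the exponential choice of \(\rho \) and the sample sizes are arbitrary) does exactly this:

```python
import numpy as np

rng = np.random.default_rng(3)
N, draws = 7, 500_000

# Illustrative rho: an exponential distribution.  Each row below is a sample of
# N independent rho-draws sorted into increasing order, so column k collects
# realizations of the k-th order statistic; its column mean estimates s^N_k.
order_stats = np.sort(rng.exponential(size=(draws, N)), axis=1)
s = order_stats.mean(axis=0)

print(np.round(s, 4))   # an increasing vector: the scoring vector s = (s^N_1, ..., s^N_N)
```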

It is convenient to “renormalize” \(s^{N}_{1},s^{N}_{2},\ldots ,s^{N}_{N}\) to range over the interval \({\left[ -1,1 \right]}\), by defining

$$\begin{aligned} {\widetilde{s}}^{N}_{n} :=\frac{2\,s^{N}_{n} - s^{N}_{N}-s^{N}_{1}}{s^{N}_{N} - s^{N}_{1}}, \quad \text {for all }\; n \in {\left[ 1\ldots N \right]}. \end{aligned}$$

This ensures that \({\widetilde{s}}^{N}_{N}=1\) and \({\widetilde{s}}^{N}_{1}=-1\). (For example, if \(N=3\), then \({\widetilde{s}}^{3}_{3}=1\) and \({\widetilde{s}}^{3}_{1}=-1\), and only the value of \({\widetilde{s}}^{3}_{2}\) remains to be determined.) If \(\rho \) is symmetric about some point on the real line, then the values \({\widetilde{s}}^{N}_{1},{\widetilde{s}}^{N}_{2},\ldots ,{\widetilde{s}}^{N}_{N}\) will be symmetrically distributed around zero—that is, \({\widetilde{s}}^{N}_{k} = - {\widetilde{s}}^{N}_{N+1-k}\) for all k in \({\left[ 1\ldots N \right]}\). Thus, if N is odd and \(k=(N+1)/2\), then \({\widetilde{s}}^{N}_{k}=0\). In particular, if \(N=3\), then \({\widetilde{s}}^{3}_{2}=0\), while \({\widetilde{s}}^3_3=1\) and \({\widetilde{s}}^3_1=-1\); this is the rank scoring rule defined by the scoring vector \((-1,0,1)\), which is just the Borda rule. Theorem 5.2 therefore implies the next result, which says that, with exactly three alternatives, the Borda rule is asymptotically utilitarian-optimal for any symmetric measure \(\rho \).

Corollary 5.4

Suppose \(|{\mathcal { A}}|=3\), and let \({\mathcal { P}}_{\mathcal { I}}=\{\succ _i\}_{i\in {\mathcal { I}}}\) be any profile of preference orders on \({\mathcal { A}}\). Let \(\rho \) be any symmetric, finite-variance probability distribution on \({\mathbb {R}}\). If \(\{u_i\}_{i\in {\mathcal { I}}}\) and \(\{c_i\}_{i\in {\mathcal { I}}}\) satisfy (C), \((\Delta )\), and (X\('\)), then

$$\begin{aligned} \lim \nolimits _{I{\rightarrow }{\infty }} \ \mathrm{Prob}[ \mathrm{Borda}({\mathcal { P}}_{\mathcal { I}})\subseteq {{\mathrm{\mathrm{argmax}}}}_{\mathcal { A}}(U_{\mathcal { I}})] = 1. \end{aligned}$$
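Here is a quick numerical check of Corollary 5.4 (our sketch; the standard normal and exponential choices of \(\rho \) are illustrative): for a symmetric \(\rho \) with three alternatives, the renormalized expected order statistics come out close to the Borda vector \((-1,0,1)\), whereas for an asymmetric \(\rho \) they need not.

```python
import numpy as np

rng = np.random.default_rng(4)

def renormalized_scores(sampler, N, draws=500_000):
    """Renormalized expected order statistics of N i.i.d. rho-draws (Monte Carlo)."""
    s = np.sort(sampler(size=(draws, N)), axis=1).mean(axis=0)
    return (2 * s - s[-1] - s[0]) / (s[-1] - s[0])

# Symmetric rho (standard normal), N = 3: approximately the Borda vector (-1, 0, 1).
print(np.round(renormalized_scores(rng.normal, 3), 3))

# Asymmetric rho (exponential), N = 3: the middle score is about -1/3 rather than 0,
# so the scoring vector prescribed by Theorem 5.2 is no longer the Borda vector.
print(np.round(renormalized_scores(rng.exponential, 3), 3))
```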

If \(|{\mathcal { A}}|\ge 4\), then the Borda rule is no longer guaranteed to be asymptotically optimal; the optimal rule depends on the expected values of the order statistics of \(\rho \), which in turn depend on the shape of \(\rho \) itself. For example, suppose \(\rho \) is the standard normal distribution and \(|{\mathcal { A}}|=7\). Then we obtain the following expected order statistics (to five significant digits).

$$\begin{aligned} \begin{array}{rcl} s^{7}_{7} &\approx & 1.35218, \\ s^{7}_{6} &\approx & 0.75737, \\ s^{7}_{5} &\approx & 0.35271, \\ s^{7}_{4} &= & 0, \\ s^{7}_{3} &\approx & -0.35271, \\ s^{7}_{2} &\approx & -0.75737, \\ \text {and}\quad s^{7}_{1} &\approx & -1.35218, \\ \end{array} \quad \text {which renormalize to}\quad \begin{array}{rcl} {\widetilde{s}}^{7}_{7} &= & 1, \\ {\widetilde{s}}^{7}_{6} &\approx & 0.56011, \\ {\widetilde{s}}^{7}_{5} &\approx & 0.26085, \\ {\widetilde{s}}^{7}_{4} &= & 0, \\ {\widetilde{s}}^{7}_{3} &\approx & -0.26085, \\ {\widetilde{s}}^{7}_{2} &\approx & -0.56011, \\ \text {and}\quad {\widetilde{s}}^{7}_{1} &= & -1. \\ \end{array} \end{aligned}$$

By comparison, the Borda rule (renormalized to the interval \({\left[ -1,1 \right]}\) in the same way) uses the scoring vector \((-1,\; -\frac{2}{3},\; -\frac{1}{3},\; 0,\; \frac{1}{3},\; \frac{2}{3},\; 1)\).
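These values can be reproduced by one-dimensional numerical integration, using the standard density of the k-th order statistic of an i.i.d. sample; the following sketch (ours) does so with scipy and then applies the renormalization defined above:

```python
import numpy as np
from math import comb
from scipy.integrate import quad
from scipy.stats import norm

N = 7

def expected_order_stat(k, n):
    """E[X_(k)] for n i.i.d. standard normal variables (k = 1 is the minimum)."""
    c = k * comb(n, k)   # = n! / ((k-1)! (n-k)!)
    f = lambda x: x * c * norm.cdf(x)**(k - 1) * norm.sf(x)**(n - k) * norm.pdf(x)
    return quad(f, -np.inf, np.inf)[0]

s = np.array([expected_order_stat(k, N) for k in range(1, N + 1)])
s_tilde = (2 * s - s[-1] - s[0]) / (s[-1] - s[0])

print(np.round(s, 5))        # approx. [-1.35218 -0.75737 -0.35271 0. 0.35271 0.75737 1.35218]
print(np.round(s_tilde, 5))  # approx. [-1. -0.56011 -0.26085 0. 0.26085 0.56011 1.]
print(np.round(np.linspace(-1, 1, N), 5))  # Borda: [-1. -0.66667 -0.33333 0. 0.33333 0.66667 1.]
```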

Unfortunately, for many probability distributions, the expected values of order statistics are difficult to compute in closed form. Harter and Balakrishnan (1996) provide tables of these expected values for most of the common distributions (normal, exponential, Weibull, etc.); from these tables, it is easy to design the appropriate rank scoring rule.

Remark on strategic voting Scoring rules are highly susceptible to strategic voting. Theorems 5.1 and 5.2 do not address this issue. However, given any scoring rule F, Núñez and Pivato (2016, Theorem 1) show that, in a sufficiently large population, there is a stochastic voting rule which produces the same outcome as F with very high probability, and under which almost all voters vote honestly. This result, combined with Theorems 5.1 and 5.2, suggests that, in a sufficiently large population satisfying hypotheses (C), \((\Delta )\), and either (X) or hypotheses (N1) and (N2), this stochastic scoring rule will select the utilitarian-optimal outcome with very high probability, even when we allow for strategic voting.

6 Conclusion

For a large enough population, our results suggest that simple and popular voting rules such as the Borda rule or approval voting have a very high probability of selecting an alternative which maximizes or almost-maximizes the utilitarian social welfare function. However, our results depend on some substantive assumptions. We will now briefly discuss some of these assumptions.

First, we have often assumed that the random variables associated with different voters are independent. This assumption is shared by virtually all the literature reviewed in Sect. 1.Footnote 14 But it excludes the possibility that voters belonging to a particular community or social group might exhibit correlations in their utility functions, preference intensities, errors, and/or approval thresholds. Empirical data suggests that these independence assumptions are false (Gelman et al. 2004). But they were made for expositional simplicity rather than out of technical necessity. The results in this paper are derived using results from Section 2 of Pivato (2016c), which assumes stochastically independent voters. Section 3 of Pivato (2016c) extends these results to correlated voters, provided the correlations are not too strong.Footnote 15 Using this extension, it would be straightforward to extend the results of the present paper to correlated voters.

A major obstacle to any implementation of utilitarian social choice is strategic voting. As we have already noted, Núñez and Pivato (2016) offers a possible solution, which is particularly suitable for the large-population, probabilistic approach taken in this paper. The mechanisms in Núñez and Pivato (2016) make truthful voting an optimal strategy for almost all voters, but at a price: they introduce a tiny probability that the rule will choose a socially suboptimal outcome. (In effect: they induce each voter to reveal her true preferences by offering her a small chance to be a “random dictator”.) This is very similar to the concept of virtual implementation introduced by Matsushima (1988) and Abreu and Sen (1991). Virtual implementation is an extremely powerful and versatile implementation technology; the main idea is that it is sufficient to obtain a very high probability of selecting a socially optimal outcome, rather than certainty. Thus, virtual implementation is also well-suited to the probabilistic approach of the present paper. Another partial solution to strategic voting is suggested by Kim (2014), who shows that, with stochastically independent voters, the rank scoring rules considered in Sect. 5 are truth-revealing in Bayesian Nash equilibrium.

Finally, each of our results depends on specific assumptions about the statistical distribution of utility functions in the population. Thus, to select the best rule, we must know something about this distribution. The statistical distribution of utility functions probably depends on both the society and the particular policy problem. Thus, different voting rules may be optimal in different situations. In some situations, none of the voting rules considered here may be optimal, from a utilitarian perspective. This suggests a two-stage approach. In the first stage, estimate the utility functions of some statistically representative sample of the population (e.g. using a survey). Use this data to estimate the statistical distribution of utilities, and then determine which voting rule (if any) is optimal, given this distribution. If the statistical distribution of utility functions satisfies the hypotheses of one of the results in this paper, then in the second stage, we can use the corresponding voting rule to make the collective decision. Otherwise, we must resort to some other method.
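As a rough sketch of the first stage only (ours, and heavily simplified: it presumes that the surveyed utilities are already interpersonally calibrated and that an (X\('\))-style hypothesis holds), one could estimate a rank scoring vector directly from the sampled utility vectors:

```python
import numpy as np

def scoring_vector_from_survey(survey_utilities):
    """Estimate a renormalized rank scoring vector from surveyed utility vectors.

    survey_utilities: array of shape (respondents, N), one (calibrated) utility
    vector per respondent.  Under an (X')-style hypothesis, averaging the sorted
    vectors estimates the expected order statistics, which then serve as s.
    """
    sorted_u = np.sort(np.asarray(survey_utilities, dtype=float), axis=1)
    s = sorted_u.mean(axis=0)
    return (2 * s - s[-1] - s[0]) / (s[-1] - s[0])

# Hypothetical survey: 400 respondents, 5 alternatives (synthetic data here).
rng = np.random.default_rng(5)
print(np.round(scoring_vector_from_survey(rng.normal(size=(400, 5))), 3))
```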