1 Introduction

This paper studies the aggregation of infinitely many probability measures. Depending on one’s disciplinary outlook, the question may either be of interest in its own right or stand in need of further motivation. The rest of the introduction and the following Sect. 2 are meant to provide such motivation; these initial parts can be ignored by readers who find the problem of sufficient interest in itself.

Formal epistemologists are now increasingly following the decision-theoretic paradigm of recent decades by weakening the strict, classical Bayesian assumption of rational agents being endowed with a single subjective probability measure. (Subjective probability measures are generally referred to as priors in the decision-theoretic literature, even though learning by conditioning is only rarely studied there, so that there are, strictly speaking, neither prior nor posterior probability measures in the statistical sense; we submit to this terminological convention.) Bradley (2012), for example, entertains the possibility of rational agents having belief systems that are compatible with several subjective probability measures—as opposed to a single unique one. This appears, at first sight, to be a step in the direction of the recent decision-theoretic literature on multiple priors.

However, this is not the case—due to the different perspectives of epistemologists on the one hand and decision theorists on the other. There are at least two ways in which the aggregation of probability measures per se—as opposed to their induced preferences—is of great importance from an epistemological point of view and perhaps also of some interest from a decision-theoretic vantage point.

The first comes from social epistemology. Consider a group whose members hold different belief systems, each encoded by a subjective probability measure, and face a collective decision, i.e. they need to choose one of several social alternatives. Suppose this group wants to ensure that their decision is rationally defensible given some belief system which in a reasonable sense aggregates their individual belief systems. Let us assume, for simplicity, that the individuals differ only with respect to their beliefs while sharing a common utility function over final outcomes. A good solution to their problem would be to look for a way of rationally aggregating the probability measures describing their individual belief systems and afterwards choosing a social alternative which maximises expected utility with respect to the aggregate probability measure. [Cases similar to this, though with serious additional complications, feature in a forthcoming paper by Bradley et al. (2012).]

Another area in which the aggregation of probability measures per se becomes important is a new, epistemologically and psychologically informed account of individual decisions. This account is motivated by theories from contemporary psychology according to which the human mind is to be understood as a composition of more elementary mental agents, viz. in terms of a ‘society of mind’ (Minsky 1986), a ‘multimind’ (Ornstein 1986), or an ‘internal family system’ (Schwartz 1997); it has been formalised and introduced into the epistemological discussion by Bradley (2012).

Technically speaking, this new account of human decision making attempts to give a precise description of the decision process of an individual whose belief system is compatible with several probability measures: Bradley (2012) suggests that every decision of an agent with multiple subjective probability measures is preceded by an ‘aggregation’ of those priors—which takes her, temporarily, to a new, aggregate prior, i.e. a new probabilistically consistent belief system; in order to make a decision, she will then evaluate the options available to her using the classical expected utility criterion with respect to the temporary, aggregate prior.Footnote 1 After the decision has been made, she returns to her previous epistemic state encoded by a whole set of subjective probability measures. Note that this set of subjective probability measures will in general not be finite—e.g. when it is the set of all probability measures that are consistent with certain conditional probability assignments or that satisfy certain upper or lower bounds when evaluated at certain events (‘interval probabilities’). Therefore, epistemologists who wish to follow Bradley (2012) in his proposal to integrate insights from contemporary psychology into epistemology should seriously consider the problem of aggregating infinite profiles of probability measures.

Of course, the classical decision theory for multiple subjective probability measures (multiple priors), as in Gilboa and Schmeidler (1989), is not—at least not without substantial, quite probably philosophically questionable, detours—applicable in these situations: whilst one could associate each probability measure in the set of priors with an individual and thereby view a maxmin expected utility preference ordering as an aggregate preference ordering, the corresponding ‘aggregator’ would aggregate preferences derived from priors, not priors themselves. Therefore, this theory is—notwithstanding its mathematical elegance and also in spite of its manifold practical use, e.g. in mathematical finance [cf. e.g. Riedel (2009)] —not satisfactory for the epistemologically motivated purposes of the present paper.

The paper is structured as follows. Section 2 motivates the problem of aggregating subjective probability measures from a revised Bayesian perspective. This section can be skipped by readers who are only interested in the technical aggregation problem itself, which will be studied in Sect. 3. We will see that there is indeed a relatively natural and decision-theoretically defensible way of aggregating probability measures—via a generalised aggregation theory for infinite profiles of probability measures along the lines of McConway (1981) and Arrow (1963). Moreover, this existence question is not trivial, as certain natural candidates like oligarchic aggregation or aggregation via integrals on the electorate are not feasible. The formal details and proofs are given in Sects. 4 and 5, respectively. Section 6 concludes.

2 Motivation: Bayesianism with multiple priors

2.1 Single- versus multiple-prior Bayesianism and decision theory

In classical Bayesian contributions to epistemology and decision theory it is often assumed that the degrees of belief of a rational individual can be given by a single probability measure: a rational individual, it is tacitly or explicitly assumed in this literature, makes judgments about all propositions in some algebra and does so by assigning a precise real number, its degree of belief, to each of them. Let us call this thesis—for reasons which will shortly become clear—single-prior Bayesianism. For philosophical simplicity, let us also assume that all credences are reported explicitly by the individual, rather than implicitly (through betting or the like).

Of course, this is a very strong assumption. To be sure, formal epistemologists have offered numerous arguments why the degrees of belief of a rational agent should satisfy the axioms of probability theory (at least with finite additivity), if they are precise [e.g. Joyce (2009); Easwaran and Fitelson (2012); Fitelson and McCarthy (2012); Leitgeb and Pettigrew (2010a, b); Wedgwood (2012)]. These arguments, however, do not establish that the system of propositions to which a rational agent assigns degrees of belief must be an algebra, nor that a rational agent must always assign precise degrees of belief.

Indeed, rational agents may subscribe to a set \({\mathrm{\mathcal {S}}}\) of precise assignments of conditional degrees of belief (i.e. assignments of real values to conditional events \(\langle A|B\rangle \), where \(A,B\) belong to some algebra \({\mathcal {A}}\) of propositions of which the agent is aware) which is in general too small to derive precise degrees of belief even for all propositions which occur in the conditional degree of belief assignments in \({\mathrm{\mathcal {S}}}\). For instance, a rational agent may (for symmetry reasons, say) assign a conditional degree of belief of 1/2 to proposition A given B, but at the same time may not be able to assign a precise degree of belief to either A or B. Instead, she would deem a whole (in general infinite) set of probability measures compatible with her beliefs. Let us denote the thesis that the belief system of a rational agent corresponds to a set of probability measures by multiple-prior Bayesianism.

Additional reasons for relaxing the assumption of single-prior Bayesianism, i.e. that the beliefs of a rational agent can always be encoded by a single probability measure, are provided by decision theory. More than 90 years ago, Knight (1921) already distinguished two kinds of uncertainties to which economic agents may be subject: sometimes, economic agents are merely uncertain about the exact value of a certain economic variable while being quite certain about the probability distribution of that variable. This kind of uncertainty is called risk or first-order uncertainty. But there are also situations in which the economic agents do not even know the exact probability distribution of some non-deterministic economic variable, but consider several, perhaps rather different, probability distributions possible. This kind of uncertainty is referred to as second-order risk or second-order uncertainty or Knightian uncertainty and is empirically observable [cf. e.g. the seminal contribution by Ellsberg (1961)]. Adopting the statistical terminology of prior and posterior probability distributions, the phenomenon of second-order uncertainty is sometimes also expressed by the term multiple priors or ambiguity (of the prior).

For single-prior Bayesianism, there is an obvious decision criterion: an act is preferable to another act if and only if the expected utility of the outcome of the first act under the single prior is higher than the expected utility of the outcome of the second act. This is not only an intuitively appealing, but also a particularly rational decision criterion in a rigorous theoretical sense: the expected utility theorems of de Finetti (1937, 1970, 1974, 1975), von Neumann and Morgenstern (1944) and, as a synthesis, Savage (1954) show that any preference relation among acts (i.e. maps from a certain set, whose elements are called states of the world, to another set, whose elements are called outcomes) which satisfies certain rationality postulates can be derived from an expected utility criterion for some probability distribution on the set of states of the world and some utility function defined on the set of outcomes.

2.2 Options for a Bayesian decision theory with multiple priors

Among the strengths of single-prior Bayesianism—the thesis that the belief system of a rational agent can always be adequately captured by a single unique probability measure—is its association with a decision criterion that is very appealing both theoretically and practically. But what would be appropriate decision criteria under the hypothesis of multiple-prior Bayesianism—i.e. the thesis that rational agents may have belief systems compatible with several subjective probability measures (‘uncertainty’ in the sense of Knight (1921))? This depends, unsurprisingly, crucially on what we mean by ‘appropriate’ in this context. Two approaches have to be distinguished.

We might either conceive of multiple-prior Bayesian agents as rational decision makers, whence ‘appropriate’ reduces to a notion of decision-theoretic rationality. If we were to choose that approach, we would look for decision criteria that have some intuitive appeal and at the same time can be rationalised in the same manner as the expected utility criterion has been rationalised by de Finetti (1937, 1970, 1974, 1975), von Neumann and Morgenstern (1944) and ultimately Savage (1954). (An additional criterion might perhaps be the practical implementability of a decision criterion.)

Among the most famous results in this direction are the rationalisation of the maxmin expected utility decision criterion by Gilboa and Schmeidler (1989) and its generalisation to the decision criterion of maxmin expected utility with additive penalty (encoding the relative unlikelihood of certain priors) by Maccheroni et al. (2006).

However, we might also look at our multiple-prior Bayesian agents (agents who are subject to Knightian uncertainty) from a more epistemologically motivated perspective. This is what Bradley (2012) suggested in a recent colloquium talk. In line with some of the recent psychological literature [for instance, Minsky (1986), Ornstein (1986), or Schwartz (1997)], Bradley conceives of multiple-prior Bayesian agents as complex—in extreme cases schizophrenic—personalities, composed of simpler sub-personalities (called ‘opinionated avatars’ by Bradley) corresponding to each prior that is compatible with the agent’s belief system. When the time is ripe for a decision, the complex personality aggregates (Bradley) her different probability measures into a single, temporary subjective probability measure and then chooses the optimal act based on expected utility; after the decision has been made, she continues to be the complex personality that she was before.

In other words, on the account of Bradley (2012), the preferences of a multiple-prior Bayesian agent (on which the decision is based) are not deduced directly from her set of priors—as would be the case if the agent were to use the maxmin expected utility decision criterion. Instead, her preferences are derived, at least for this specific decision, via classical expected utility theory, from a single, temporary subjective probability measure which has been constructed using some aggregation rule from her set of priors—even though the aggregation rule might vary, thus allowing for time-inconsistency. Hence, at least from an epistemological perspective, the question now remains how one can merge (‘aggregate’) probability measures; behaviourally, only the aggregate prior is revealed.

3 The aggregation of probability measures

In principle, there are multiple possibilities for the ‘aggregation’ of several priors into a single one. One’s choice will have to depend on how closely one wants to follow some established notion of orthodoxy for aggregation.

If one were to take the requirement of an orthodox aggregation theory very literally, one might recall that the aggregation theory of Arrow (1963) is the classical microeconomic theory of aggregation. Since its domain is the aggregation of preferences, one would have to convert the priors first of all into preference orderings; the canonical way to achieve this is via expected utility theory. This will then prompt the question under which circumstances Arrovian aggregation of expected utility preferences is possible.

One might even go one step further [than Bradley (2012)] in the direction of a society of mind (Minsky 1986) psychology of decisions: what if the decision maker is composed of agents that are themselves conglomerates of even more elementary agents who face some second-order uncertainty, yet have very simple utility functions? Then, under some rationality constraints, one may assume that each of the agents that constitute the decision maker has maxmin expected utility preferences or at least preferences from one of the larger classes of ambiguous preferences, such as the variational preferences of Maccheroni et al. (2006) or even the MBA preferences introduced by Cerreia-Vioglio et al. (2011). The above question then becomes whether Arrovian aggregation of such generalisations of expected utility preferences is possible.

As it turns out, this is in general not the case: whenever one wants to aggregate a profile of variational preferences from a sufficiently rich class (e.g. the set of all expected utility preferences for some set of states of the world or the set of all multiple-prior preferences) into a single variational preference ordering (e.g. an expected utility preference ordering on that set of states of the world), there will be no aggregation rule satisfying the analogues of Arrow’s responsiveness axioms, as was shown in Herzberg (2013a, b). Note that these impossibility statements can be established both for the case of profiles of a given finite length [the analogue of Arrow’s impossibility theorem, cf. Arrow (1963)] and for the case of profiles of any given infinite length [the analogue of Campbell’s impossibility theorem, cf. Campbell (1990)], using a model-theoretic approach to aggregation theory inspired by Lauwers and Liedekerke (1995) and systematically elaborated by Herzberg and Eckert (2012a, b). For the special case of the Arrovian finite expected utility profiles of a given length, this impossibility theorem was proved by Le Breton (1986); an impossibility theorem for the non-dictatorial aggregation of expected utility preferences in a slightly different, yet still very natural setting was established by Hylland and Zeckhauser (1979).

Thus, one will have to somewhat relax the requirement of following a well-established orthodoxy regarding aggregation. Another approach becomes available as soon as one recalls that the problem of merging several priors into a single prior has actually already been studied in the statistical literature, under the heading of probabilistic opinion pooling; in particular, McConway (1981) characterised the aggregation (pooling) rules satisfying certain responsiveness axioms as the linear averaging rules. [The findings of McConway (1981) have also entered the literature on aggregating expert judgments, e.g. Cooke (1991); the aggregation of probability distributions without the imposition of Arrovian responsiveness axioms has been studied by Lindley (1983).]
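In the finite case, the rules characterised by McConway (1981) take the familiar form of linear opinion pools: for an electorate of size \(n\) and weights \(w_1,\ldots ,w_n\ge 0\) with \(\sum _{i=1}^n w_i=1\),

$$\begin{aligned} F\left( P_1,\ldots ,P_n\right) (A)=\sum _{i=1}^n w_iP_i(A) \end{aligned}$$

for every event \(A\) (cf. Lemma 11 below).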

Even though McConway proved a possibility result and Arrow an impossibility result, the responsiveness axioms for pooling rules imposed by McConway (1981) are remarkably similar to those of the social choice literature in the tradition of Arrow (1963). In fact, based on these similarities, Dietrich and List (2008, 2010) have formulated a unified framework for preference aggregation, probabilistic opinion pooling and judgment aggregation. Herzberg (2014) has proposed a unified mathematical methodology for approaching this novel general aggregation theory of Dietrich and List (2008, 2010); in this setting, one can derive both McConway’s theorem (McConway 1981) and a judgment aggregation analogue of Arrow’s theorem (Arrow 1963) from a single characterisation theorem. The basic idea is that in a binary setting there are fewer aggregation rules anyhow, whence the responsiveness axioms can only be satisfied by aggregation rules that are projections, while in the probabilistic setting the space of possible aggregation rules is much larger, and analogous responsiveness axioms can be satisfied by linear aggregation rules.

Understandably, the probabilistic opinion pooling literature is only concerned with finite profiles of (\(\sigma \)-additive) priors. Since the set of priors of a multiple-prior Bayesian agent (an agent facing Knightian uncertainty) will in general be an infinite set of finitely additive priors, we are looking for an appropriate generalisation of McConway’s theorem. Our desideratum is a theorem which proves the existence of aggregation rules for infinite profiles of priors that satisfy Arrow’s responsiveness axioms.

In fact, it is not too difficult to give such an existence proof, at least if one does not insist on the \(\sigma \)-additivity of the aggregate prior (which also makes it possible to study the aggregation of profiles of finitely additive probability measures).

We adopt essentially the same setup as in McConway (1981), with three modifications. First, no \(\sigma \)-additivity is required of the probability measures. Secondly, consensus functions with the Weak Setwise Function Property (WSFP), the Strong Setwise Function Property (SSFP) or the Zero Probability Property (ZPP) are now called independent, systematic or unanimity-preserving aggregation functionals, respectively, in order to connect the result to general aggregation theory. Thirdly and most importantly, we allow for an infinite electorate N.

In the remainder of the paper, we shall discuss the framework just described (including the assumptions on the aggregation operators) and prove the following results, assuming that the underlying set of states of nature has at least three elements:

  1. An aggregation functional is systematic if and only if it is both independent and unanimity-preserving.

  2. If the electorate is finite, then an aggregation functional is systematic if and only if it can be reduced to weighted averaging of probabilities.

  3. If the electorate is infinite, then there again exist aggregation functionals that are systematic but cannot be derived from the probability assignments of any finite subset of the electorate.

  4. The existence of such aggregation functionals is non-trivial, because an obvious class of candidates for such aggregation functionals, viz. integrals with respect to a \(\sigma \)-additive measure on the (power-set of the) electorate, is generically empty if N is uncountable.Footnote 2

Thus, non-trivial systematic aggregation of priors is possible even for infinite electorates. Parts 1 and 2, of course, are merely a straightforward adaptation of McConway’s famous results (McConway 1981, Theorems 3.2 and 3.3) to a setting in which \(\sigma \)-additivity of priors is not assumed and which also admits profiles of infinite length.

For part 3 of the Theorem, one has to find systematic aggregation functionals for infinite profiles of priors, which have hitherto not been constructed. Our construction uses Robinsonian non-standard analysis (Robinson 1961, 1966) through an ultrapower of the reals with respect to some non-principal ultrafilter on the cardinality of the set of priors (i.e. the profile length). Very roughly speaking, the ultrafilter can be seen as a device for picking an accumulation point of a bounded sequence (such as the profile of all probabilities assigned to a particular event by the individuals in the electorate), and the use of non-standard analysis permits (a) calculating with this accumulation point as if it were an ordinary sequence element (i.e. just an ordinary real between 0 and 1) and (b) extracting accumulation points in a uniform manner for all sequences (i.e. all profiles of probabilities). In particular, this construction satisfies only a weak anonymity concept, viz. finite anonymity, but not bounded or even strong anonymity, as was shown by Fey (2004).
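To illustrate the accumulation-point idea with a toy example (not part of the formal development below): let \(N=\mathbb {N}\) and suppose the individuals assign the alternating probabilities \((1,0,1,0,\ldots )\) to some event \(A\). This bounded sequence has exactly two accumulation points, \(0\) and \(1\), and the ultrafilter selects one of them: the aggregate probability of \(A\) is \(1\) if the set of indices at which the entry \(1\) occurs belongs to the ultrafilter, and \(0\) otherwise. Permuting finitely many individuals changes that index set only by a finite set and therefore, since a non-principal ultrafilter contains all co-finite sets, does not affect membership in the ultrafilter; this is, in essence, why the construction is finitely anonymous.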

Part 4 merely assembles various results from axiomatic set theory on the so-called measure problem. The problem with integral-based aggregation functionals is that the function \(i\mapsto P_i(A)\), which assigns to each individual \(i\) the subjective probability of \(i\) for a given event \(A\), given some profile \(\underline{P}\), need not be measurable with respect to a fixed \(\sigma \)-algebra on an uncountable electorate N. In general, it can only be assumed to be measurable with respect to the power-set \(\sigma \)-algebra. However, the existence of \(\sigma \)-additive probability measures on the power-set of an uncountable set N is an intricate set-theoretic problem; the existence of such a measure cannot be proved in classical mathematics and can even be refuted for some cardinalities of N.

4 A generalised aggregation theory for probability measures à la McConway and Arrow

The possibility of aggregating finitely many probability measures while satisfying responsiveness axioms similar to those in Arrow (1963) has been known for more than three decades: any rule which takes the weighted average of the individual probability measures already has those desirable properties, and there are no other rules satisfying those desiderata (McConway 1981).

What has not been treated in the existing literature—at least to the present author’s knowledge—is the aggregation of infinitely many probability measures. We will now show that one can aggregate an infinite number of subjective finitely additive probability measures in a manner that satisfies the Arrow-type responsiveness axioms imposed by McConway (1981). It is enough to form an ultrapower of the sequence of the subjective finitely additive probability measures, with respect to an ultrafilter over the infinite set of individuals, and then push it down to a standard finitely additive probability measure by composing it event-wise with the standard part operator [as in the Loeb (1975) measure construction, but without requiring \(\sigma \)-additivity and thus without invoking a saturation principle].
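In symbols: writing \(P_i\) for the prior of individual \(i\), \(\left[ \cdot \right] _{\sim _{{\fancyscript{U}}}}\) for the equivalence class in the ultrapower with respect to a non-principal ultrafilter \({\fancyscript{U}}\) on the electorate, and \(^\circ \) for the standard part operator (all made precise in Sect. 5), the aggregate prior will be given event-wise by

$$\begin{aligned} F\left( \underline{P}\right) (A)={^\circ \left[ \left( P_i(A)\right) _{i\in N}\right] _{\sim _{{\fancyscript{U}}}}} \end{aligned}$$

for every event \(A\).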

4.1 Formal framework

In the following, we present a framework for probabilistic opinion pooling which is very similar to that of the classical paper by McConway (1981). Our framework is more general in that it allows for infinite electorates as well (and is concerned with the aggregation of finitely additive probability measures). Our terminology reflects the formal similarities between judgment aggregation and probabilistic opinion pooling which have already been observed by Dietrich and List (2010) [for a more formal treatment, see Herzberg (2014)].

Let \(\varOmega \) be a set of possible worlds and let \(\Sigma \) be the set of all algebras on \(\varOmega \).Footnote 3 For all \({\mathcal {A}}\in \Sigma \), let \(\Delta ({\mathcal {A}})\) be the set of finitely additive probability measures defined on \({\mathcal {A}}\).

Let N be a finite or infinite set, called the electorate. An aggregation functional is a map F with domain \(\Sigma \) such that \(F({\mathcal {A}}):\Delta ({\mathcal {A}})^N\rightarrow \Delta ({\mathcal {A}})\) for all \({\mathcal {A}}\in \Sigma \). [McConway (1981) calls our aggregation functionals “classes of consensus functions”, denoted by \({\mathcal {C}}\).] Elements of \(\Delta ({\mathcal {A}})^N\) (for any \({\mathcal {A}}\in \Sigma \)) are called profiles and are typically denoted \(\underline{P}\). Given some profile \(\underline{P}=(P_i)_{i\in N}\in \Delta ({\mathcal {A}})^N\) (for any \({\mathcal {A}}\in \Sigma \)), we denote by \(\underline{P}(A)\), for any \(A\in {\mathcal {A}}\), the sequence \(\left( P_i(A)\right) _{i\in N}\in [0,1]^N\). Furthermore, we denote by \(\underline{0}\) and \(\underline{1}\) the \(N\)-sequences consisting only of \(0\)’s and \(1\)’s, respectively. Note that whenever \({\mathcal {A}}\ne {\mathcal {B}}\), \(\Delta ({\mathcal {A}})\cap \Delta ({\mathcal {B}})=\varnothing \), and therefore the domains of \(F({\mathcal {A}})\) and \(F({\mathcal {B}})\) are disjoint. We shall therefore usually drop the first argument, writing \(F\left( \underline{P}\right) \) instead of \(F({\mathcal {A}})\left( \underline{P}\right) \) for all \({\mathcal {A}}\in \Sigma \).
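To make the framework concrete, here is a minimal Python sketch for a finite electorate and a three-world state space; all identifiers (measure, pool, OMEGA) are ours and purely illustrative, and the weighted-averaging rule anticipates Lemma 11 below.

```python
from fractions import Fraction

OMEGA = frozenset({'w1', 'w2', 'w3'})          # the set of possible worlds
A = frozenset({'w1'})                          # an event
ALGEBRA = [frozenset(), A, OMEGA - A, OMEGA]   # the algebra generated by A

def measure(p_A):
    """A finitely additive probability measure on ALGEBRA, determined by its value at A."""
    return {frozenset(): Fraction(0), A: p_A, OMEGA - A: 1 - p_A, OMEGA: Fraction(1)}

# A profile (P_i)_{i in N} for the electorate N = {0, 1, 2}:
profile = [measure(Fraction(1, 2)), measure(Fraction(1, 4)), measure(Fraction(3, 4))]

def pool(profile, weights):
    """Weighted-averaging aggregation functional: F(P)(E) = sum_i alpha_i * P_i(E)."""
    assert sum(weights) == 1
    return {E: sum(a * P[E] for a, P in zip(weights, profile)) for E in ALGEBRA}

F = pool(profile, [Fraction(1, 3)] * 3)

# The aggregate is again a finitely additive probability measure ...
assert F[OMEGA] == 1 and F[A] + F[OMEGA - A] == F[OMEGA]
# ... and the aggregate probability is zero whenever all voters assign zero (ZPP):
assert all(F[E] == 0 for E in ALGEBRA if all(P[E] == 0 for P in profile))

print(F[A])  # aggregate probability of A: 1/2
```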

4.2 Properties of aggregation functionals

A very natural condition on any aggregation functional (‘class of consensus functions’) is the unanimity-preservation principle [in the terminology of McConway (1981): ‘Zero Probability Property’], which demands that the aggregate probability of some event should be zero whenever all voters assign probability zero to it.Footnote 4

Definition 1

An aggregation functional F is unanimity-preserving/ZPP if and only if for all \({\mathcal {A}}\in \Sigma \), \(\underline{P}\in \Delta ({\mathcal {A}})^N\) and \(A\in {\mathcal {A}}\), one has \(F\left( \underline{P}\right) (A)=0\) whenever \(P_i(A)=0\) for all \(i\in N\).

Another natural, but perhaps somewhat less compelling condition is to demand that the aggregate probability of some event should depend on nothing but that event and the sequence of probabilities assigned to that event by the voters [in the terminology of McConway (1981): ‘Weak Setwise Function Property’].

Definition 2

An aggregation functional F is independent/WSFP if and only if there exists a function \(G:\left( 2^\varOmega {\setminus }\{\varnothing ,\varOmega \}\right) \times [0,1]^N \cup \left\{ \left( \varnothing ,\underline{0}\right) \right\} \cup \left\{ \left( \varOmega ,\underline{1}\right) \right\} \rightarrow [0,1]\) such that for all \({\mathcal {A}}\in \Sigma \), \(\underline{P}\in \Delta ({\mathcal {A}})^N\) and \(A\in {\mathcal {A}}\),

$$\begin{aligned} F\left( \underline{P}\right) (A)=G\left( A,\underline{P}(A)\right) \!. \end{aligned}$$
(1)

The independence property can be traced back to the Arrovian literature where it appears as the requirement of ‘independence of irrelevant alternatives’. In the judgment aggregation literature, it is commonly known as independence tout court. The idea is that the social judgment with respect to some proposition (encoding, e.g., a preference of one alternative over another) should only depend on the individuals’ attitudes towards that particular proposition. Of course, such a requirement is only plausible if the agenda admits some kind of separability, so that the propositions about which social judgments are formed enjoy some degree of mutual independence themselves.

It is clear that \(G(\varOmega ,\underline{1})=1\) and \(G(\varnothing ,\underline{0})=0\) whenever G satisfies Eq. (1) for some aggregation functional F (because \(F\left( \underline{P}\right) \) is, by assumption on F, always a finitely additive probability measure). Moreover, the extension of the notion of independence/WSFP would not change if one replaced the domain of \(G\) by \(2^\varOmega \times [0,1]^N\). This, however, would introduce additional notational difficulties in the proof.

Finally, an even stronger notion than independence would be to require that the aggregate probability of some event should depend on nothing else but the sequence of probabilities assigned to that event by the voters [in the terminology of McConway (1981): ‘Strong Setwise Function Property’]:

Definition 3

An aggregation functional is systematic/SSFP if and only if there exists a function \(f:[0,1]^N\rightarrow [0,1]\) such that for all \({\mathcal {A}}\in \Sigma \), \(\underline{P}\in \Delta ({\mathcal {A}})^N\) and \(A\in {\mathcal {A}}\),

$$\begin{aligned} F\left( \underline{P}\right) (A)=f\left( \underline{P}(A)\right) . \end{aligned}$$
(2)

The notion of systematicity is known in the preference aggregation literature as ‘neutrality’. It can be seen as a stronger version of independence, demanding that aggregation procedures be blind to the content of the proposition about which individual judgments are aggregated.

From the judgment aggregation literature, it is well known that for sufficiently complex agendas, systematicity and independence are actually equivalent [for a concise proof, cf. e.g. Klamler and Eckert (2009)]. A similar finding holds in the probabilistic opinion pooling setting of McConway (1981). This result (McConway 1981, Theorem 3.2) can easily be generalised to infinite electorates, as follows. (Because the original proof is only for finite electorates and contains several misprints, we shall provide a full proof, even though it is very close to McConway’s.)

Theorem 4

(Unanimity-preservation and independence = systematicity) Suppose \(\mathrm {card} (\varOmega )\ge 3\). Then, an aggregation functional is both unanimity-preserving/ZPP and independent/WSFP if and only if it is systematic/SSFP.

4.3 Existence and characterisation of aggregation functionals

For the formal statement of our main theorem, we need the notion of an oligarchy: an aggregation functional that is completely determined by the probability assignments of some finite proper subset of the electorate.

Definition 5

An aggregation functional is an oligarchy if and only if there exists a finite proper subset \(M\subset N\) and some function \(h:[0,1]^M\rightarrow [0,1]\) such that \(F(\underline{P})(A)=h\left( \left( P_i(A)\right) _{i\in M}\right) \) for all \({\mathcal {A}}\in \Sigma \), \(\underline{P}\in \Delta ({\mathcal {A}})^N\) and \(A\in {\mathcal {A}}\).
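For instance, a dictatorship, i.e. an aggregation functional with \(F(\underline{P})=P_{i_0}\) for some fixed individual \(i_0\in N\), is the special case of an oligarchy in which \(M=\{i_0\}\) and \(h\) is the identity on \([0,1]\).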

Beyond non-oligarchy, the stronger notion of finite anonymity is a desirable property of aggregation functionals for infinite electorates:Footnote 5

Definition 6

An aggregation functional is finitely anonymous if and only if for each permutation \(\pi :N\rightarrow N\) that equals the identity on a co-finite subset of \(N\) (i.e. which permutes only finitely many individuals), one has \(F(\underline{P})(A)=F\left( \left( P_{\pi (i)}\right) _{i\in N}\right) (A)\) for all \({\mathcal {A}}\in \Sigma \), \(\underline{P}\in \Delta ({\mathcal {A}})^N\) and \(A\in {\mathcal {A}}\).

If there exists some \(\sigma \)-additive probability measure \(\mu \) on the whole power-set of \(N\), then \(i\mapsto P_i(A)\) will be a bounded measurable function for all \({\mathcal {A}}\in \Sigma \), \(\underline{P}\in \Delta ({\mathcal {A}})^N\) and \(A\in {\mathcal {A}}\). Hence, in such a setting, one may define a systematic/SSFP aggregation functional by \(F(\underline{P})(A)= \int _N P_i(A)\mu (\mathrm{d}i)\) for all \({\mathcal {A}}\in \Sigma \), \(\underline{P}\in \Delta ({\mathcal {A}})^N\) and \(A\in {\mathcal {A}}\). The existence of such a \(\mu \), however, becomes a profound set-theoretic problem if \(N\) is uncountable. The consequences of this will be seen in the subsequent characterisation theorem for aggregation functionals. Let us, for simplicity, call aggregation functionals continuous linear if and only if they admit an integral representation of the above form.

Definition 7

An aggregation functional is continuous linear if and only if there exists a \(\sigma \)-additive measure \(\mu :2^N\rightarrow [0,1]\) such that for all \({\mathcal {A}}\in \Sigma \), \(\underline{P}\in \Delta ({\mathcal {A}})^N\) and \(A\in {\mathcal {A}}\),

$$\begin{aligned} F(\underline{P})(A)=\int _N P_j(A) \mu (\mathrm{d}j). \end{aligned}$$

Remark 8

Any continuous linear aggregation functional is systematic/SSFP.
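For instance, if \(N=\mathbb {N}\) and \(\mu \{i\}=2^{-(i+1)}\) for every \(i\in N\), the resulting continuous linear aggregation functional is \(F(\underline{P})(A)=\sum _{i=0}^\infty 2^{-(i+1)}P_i(A)\), which is systematic/SSFP with \(f:\underline{x}\mapsto \sum _{i=0}^\infty 2^{-(i+1)}x_i\). (By Lemma 9 below, however, no such functional is finitely anonymous.)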

Lemma 9

If \(N\) is countably infinite, then no continuous linear aggregation functional is finitely anonymous.

Theorem 10

(Existence of systematic aggregation functionals) Suppose \(\mathrm {card} (\varOmega )\ge 3\).

  • If N is finite, then an aggregation functional F is systematic/SSFP if and only if it is continuous linear, and there exist non-oligarchic systematic/SSFP aggregation functionals.

  • If \(N\) is countably infinite, there exist non-oligarchic aggregation functionals that are continuous linear, but also non-oligarchic systematic/SSFP aggregation functionals that are finitely anonymous (and hence, by Lemma 9, not continuous linear).

  • If \(N\) is uncountably infinite, then there exist aggregation functionals that are systematic/SSFP, finitely anonymous and non-oligarchic.

  • If \(N\) is uncountably infinite, the existence of continuous linear aggregation functionals cannot be proved from Zermelo–Fraenkel set theory with the Axiom of Choice (ZFC).

  • If the cardinality of \(N\) is the least uncountable cardinal (or any other successor cardinal), there cannot be a continuous linear aggregation functional.

The proof follows from the following two lemmas in combination with known results from axiomatic set theory. The first lemma is just a slight variation of McConway (1981, Theorem 3.3).

Lemma 11

If \(N\) is finite, then an aggregation functional \(F\) is systematic/SSFP if and only if there exists some \(\underline{\alpha }\in [0,1]^N\) such that \(F(\underline{P})(A)=\sum _{i\in N}\alpha _iP_i(A)\) for all \({\mathcal {A}}\in \Sigma \), \(\underline{P}\in \Delta ({\mathcal {A}})^N\) and \(A\in {\mathcal {A}}\). (Necessarily \(\sum _{i\in N}\alpha _i=1\), as one sees by evaluating \(F(\underline{P})\) at \(A=\varOmega \).)

Lemma 12

If \(N\) is infinite, then there exist aggregation functionals that are systematic/SSFP and finitely anonymous, but neither oligarchic nor continuous linear.

5 Proofs

Proof of Theorem 4

First consider a systematic/SSFP aggregation functional \(F\), and let \(f:[0,1]^N\rightarrow [0,1]\) be such that Eq. (2) holds for all \({\mathcal {A}}\in \Sigma \), \(\underline{P}\in \Delta ({\mathcal {A}})^N\) and every \(A\in {\mathcal {A}}\). Trivially, \(F\) is then independent/WSFP. In order to see that it is also unanimity-preserving/ZPP, it is enough to prove that \(f\left( \underline{0}\right) =0\). However, again by Eq. (2), for any \({\mathcal {A}}\in \Sigma \) and \(\underline{P}\in \Delta ({\mathcal {A}})^N\),

$$\begin{aligned} f\left( \underline{0}\right) = f\left( \underline{P}(\varnothing )\right) =F\left( \underline{P}\right) (\varnothing )=0, \end{aligned}$$

wherein the first equality holds because each \(P_i\) is a finitely additive probability measure and hence \(P_i(\varnothing )=0\) for all \(i\in N\), and the last equality holds because \(F\left( \underline{P}\right) \) is a finitely additive probability measure by definition of \(F\).

Now consider an aggregation functional \(F\) that is both unanimity-preserving/ZPP and independent/WSFP. Then there exists a function \(G:\left( 2^\varOmega {\setminus }\{\varnothing ,\varOmega \}\right) \times [0,1]^N \cup \left\{ \left( \varnothing ,\underline{0}\right) \right\} \cup \left\{ \left( \varOmega ,\underline{1}\right) \right\} \rightarrow [0,1]\) such that Eq. (1) holds for all \({\mathcal {A}}\in \Sigma \), \(\underline{P}\in \Delta ({\mathcal {A}})^N\) and every \(A\in {\mathcal {A}}\). Because \(F\) is unanimity-preserving/ZPP, it is easy to see that we must have

$$\begin{aligned} G(A,\underline{0})=0 \end{aligned}$$
(3)

for all \(A\subsetneq \varOmega \). Now, for any such \(A\), let \(\underline{P}\) be a profile with \(\underline{P}(A)=\underline{0}\), so that \(\underline{P}(\complement A)=\underline{1}\). Then, \(G(\complement A,\underline{1})=F(\underline{P})(\complement A)=1-G(A,\underline{0})=1\). Therefore,

$$\begin{aligned} G(B,\underline{1})=1 \end{aligned}$$
(4)

for all non-empty \(B\subseteq \varOmega \).

Consider now two sequences \((a_i)_{i\in N},(b_i)_{i\in N}\in [0,1]^N\) such that \(a_i+b_i\le 1\) for all \(i\in N\). Since \(\mathrm {card} (\varOmega )\ge 3\), there will be two disjoint non-empty sets \(A,B\subsetneq \varOmega \) such that \(A\cup B\ne \varOmega \). Then, there will be an algebra \({\mathcal {A}}\) such that \(A,B\in {\mathcal {A}}\) and a profile \(\underline{P}\in \Delta ({\mathcal {A}})^N\) such that for all \( i\in N\),

$$\begin{aligned} P_i(A)=a_i,\qquad P_i(B)=b_i, \qquad P_i(A\cup B)=a_i+b_i. \end{aligned}$$

Since \(F\left( \underline{P}\right) \) is a finitely additive probability measure on \({\mathcal {A}}\) by definition of \(F\), we will have \(F\left( \underline{P}\right) (A\cup B)=F\left( \underline{P}\right) (A)+ F\left( \underline{P}\right) (B)\), so

$$\begin{aligned} G\left( A\cup B,(a_i+b_i)_{i\in N}\right) =G\left( A,(a_i)_{i\in N}\right) +G\left( B,(b_i)_{i\in N}\right) . \end{aligned}$$

In the special case \((b_i)_{i\in N}=\underline{0}\), we obtain, in light of Eq. (3),

$$\begin{aligned} G\left( A\cup B,(a_i)_{i\in N}\right) =G\left( A,(a_i)_{i\in N}\right) . \end{aligned}$$

Therefore (putting \(C=A\cup B\)), whenever \(A\subseteq C\subsetneq \varOmega \),

$$\begin{aligned} G(C,\cdot )=G(A,\cdot ). \end{aligned}$$
(5)

(This equation, and structurally similar identities in the remainder of this paper, should be understood as an equality on the intersection of the domains of the left-hand side and the right-hand side.)

It is now easy to prove \(G(A_1,\cdot )=G(A_2,\cdot )\) for all \(A_1,A_2\subsetneq \varOmega \). If \(A_1\) and \(A_2\) have a non-empty intersection, then \(G(A_2,\cdot )=G(A_1\cap A_2,\cdot )=G(A_1,\cdot )\) after applying Eq. (5) twice (each time with \(A=A_1\cap A_2\), once with \(C=A_2\) and once with \(C=A_1\)). If \(A_1\) and \(A_2\) are disjoint with \(A_1\cup A_2\subsetneq \varOmega \), then \(G(A_2,\cdot )=G(A_1\cup A_2,\cdot )=G(A_1,\cdot )\) after applying Eq. (5) twice (each time with \(C=A_1\cup A_2\), once with \(A=A_2\) and once with \(A=A_1\)). If \(A_1\) and \(A_2\) are disjoint with \(A_1\cup A_2 =\varOmega \), then there must be a proper non-empty subset \(D\) of either \(A_1\) or \(A_2\). Without loss of generality, assume \(\varnothing \ne D\subsetneq A_1\). Then, on the one hand, \(G(A_1,\cdot )=G(D,\cdot )\) by Eq. (5), and on the other hand \(G(A_2,\cdot )=G(A_2\cup D,\cdot )=G(D,\cdot )\) after noting that \(A_2\cup D\subsetneq A_1\cup A_2=\varOmega \) and applying Eq. (5) twice.

Therefore, if we fix any non-empty \(A_1\subsetneq \varOmega \), we will get \(G(A,\cdot )=G(A_1,\cdot )\) for all \(A\subseteq \varOmega \), and we may define \(f=G(A_1,\cdot )\); with this choice of \(f\), Eq. (2) holds, so \(F\) is indeed systematic/SSFP.

Proof of Lemma 9

If \(F\) is continuous linear, say represented by some \(\sigma \)-additive probability measure \(\mu \) on \(2^N\), and \(N\) is countably infinite, then there must be two individuals \(i,j\) such that \(\mu \{i\}\ne \mu \{j\}\): if all singletons had the same measure, \(\sigma \)-additivity would force the total mass of \(N\) to be either \(0\) or infinite, rather than \(1\). Any permutation \(\pi \) that equals the identity on a co-finite subset of \(N\) and satisfies \(\pi (i)=j\) and \(\pi (j)=i\) then yields a counterexample to finite anonymity (consider a profile in which individual \(i\) assigns probability \(1\) to some fixed event and all other individuals assign probability \(0\) to it).

Proof of Lemma 11

One can literally copy the proof of McConway’s main result (McConway 1981, Theorem 3.3), because that proof does not require the \(\sigma \)-additivity of the probability measures in the profile.

Proof of Lemma 12

Our proof employs non-standard analysis in the sense of Robinson (1961, 1966).Footnote 6 Fix a non-principal ultrafilter \({\fancyscript{U}}\) on \(N\).Footnote 7 Then, the ultrapower \(\mathbf {R}^N/{\fancyscript{U}}\) will be a non-standard model of the real numbers.

Let \(^\circ \) denote the standard part operator on this model of the hyperreals.

Let \({\mathcal {A}}\in \Sigma \) and \(\underline{P}\in \Delta ({\mathcal {A}})^N\), and define a real-valued set function \(F(\underline{P})\) by

$$\begin{aligned} F(\underline{P}):{\mathcal {A}}\rightarrow [0,1],\quad A\mapsto {^\circ \left[ \underline{P}(A)\right] _{\sim _{{\fancyscript{U}}}}}. \end{aligned}$$

(Since each element of the sequence \(\underline{P}(A)\) is between 0 and 1, the ultrapower element \(\left[ \underline{P}(A)\right] _{\sim _{{\fancyscript{U}}}}\) is a hyperreal between 0 and 1 by Łoś’s theorem (Łoś 1955), whence its standard part \({^\circ \left[ \underline{P}(A)\right] _{\sim _{{\fancyscript{U}}}}}\) is well-defined and a standard real number between 0 and 1.)
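As a concrete illustration (not needed for the proof): if \(N=\mathbb {N}\) and \(P_i(A)=\frac{1}{2}+\frac{1}{i+2}\) for every \(i\in N\), then \(\left[ \underline{P}(A)\right] _{\sim _{{\fancyscript{U}}}}=\frac{1}{2}+\varepsilon \) for some positive infinitesimal \(\varepsilon \), since for every standard \(\delta >0\) the set \(\left\{ i\in N \ : \ 1/(i+2)<\delta \right\} \) is co-finite and hence belongs to \({\fancyscript{U}}\); therefore \(F(\underline{P})(A)={^\circ \left( \frac{1}{2}+\varepsilon \right) }=\frac{1}{2}\).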

Now, each \(P_i\) is finitely additive, addition on the ultrapower \(\mathbf {R}^N/{\fancyscript{U}}\) is defined representative-wise and \(^\circ \) commutes with addition of limited hyperreals. Therefore, for all disjoint \(A,B\in {\mathcal {A}}\),

$$\begin{aligned} {^\circ \left[ \underline{P}(A\cup B)\right] _{\sim _{{\fancyscript{U}}}}}&= {^\circ \left[ \underline{P}(A)+\underline{P}(B)\right] _{\sim _{{\fancyscript{U}}}}}= {^\circ \left( \left[ \underline{P}(A)\right] _{\sim _{{\fancyscript{U}}}}+ \left[ \underline{P}(B)\right] _{\sim _{{\fancyscript{U}}}}\right) } \\&= {^\circ \left[ \underline{P}(A)\right] _{\sim _{{\fancyscript{U}}}}}+ {^\circ \left[ \underline{P}(B)\right] _{\sim _{{\fancyscript{U}}}}}, \end{aligned}$$

whence the set function \(F(\underline{P})\) defined above is finitely additive. Similarly,

$$\begin{aligned} {^\circ \left[ \underline{P}(\varOmega )\right] _{\sim _{{\fancyscript{U}}}}}= {^\circ \left[ \underline{1}\right] _{\sim _{{\fancyscript{U}}}}}={^\circ 1}=1, \end{aligned}$$

whence \(F(\underline{P})(\varOmega )=1\). Hence, \(F(\underline{P})\) is a finitely additive probability measure on \({\mathcal {A}}\).

We also need to show that \(F\) is not an oligarchy. Suppose otherwise, so that there is some finite proper subset \(M\subset N\) and some function \(h:[0,1]^M\rightarrow [0,1]\) such that \(F(\underline{P})(A)=h\left( \left( P_i(A)\right) _{i\in M}\right) \) for all \({\mathcal {A}}\in \Sigma \), \(\underline{P}\in \Delta ({\mathcal {A}})^N\) and \(A\in {\mathcal {A}}\). On the one hand, one can show that \(h\) must be a linear weighted averaging operation, simply by applying Lemma 11 (with electorate \(M\)) to the aggregation functional \(F^{(M)}\) defined through

$$\begin{aligned} F^{(M)}({\mathcal {A}}):\Delta ({\mathcal {A}})^M\rightarrow \Delta ({\mathcal {A}}),\qquad \underline{P}\mapsto h\left( \left( P_i(\cdot )\right) _{i\in M}\right) . \end{aligned}$$

On the other hand, by our above choice of \(F\), we have

$$\begin{aligned} {^\circ \left[ \underline{\alpha }\right] _{\sim _{{\fancyscript{U}}}}}=h\left( \left( \alpha _i\right) _{i\in M}\right) \end{aligned}$$
(6)

for all \(\underline{\alpha }\in [0,1]^N\), in particular for all sequences that converge to zero in a strictly decreasing manner. Insert such a sequence \(\underline{\alpha }\). Since \(h\) is a linear weighted averaging operation (with weights summing to \(1\)) and all entries of \(\underline{\alpha }\) are positive, \(h\left( \left( \alpha _i\right) _{i\in M}\right) >0\). However, since \(\underline{\alpha }\) is a null sequence, \(\left[ \underline{\alpha }\right] _{\sim _{{\fancyscript{U}}}}\) is an infinitesimal on account of Łoś’s theorem, hence \({^\circ \left[ \underline{\alpha }\right] _{\sim _{{\fancyscript{U}}}}}=0\), contradicting Eq. (6).

It remains to prove that \(F\) is not continuous linear, at least for a suitable choice of \({\fancyscript{U}}\). Note first that any \(\sigma \)-additive measure representing \(F\) would have to coincide with \(\chi _{\fancyscript{U}}\), the indicator set function of \({\fancyscript{U}}\) on \(2^N\), as one sees by considering profiles of \(\{0,1\}\)-valued probability assignments. Now, if \(N\) is countable, then \({\fancyscript{U}}\) cannot be \(\sigma \)-complete, and therefore the set function \(\chi _{\fancyscript{U}}\) defined on \(2^N\) is not a \(\sigma \)-additive measure, whence the aggregation functional \(F\) does not admit a classical integral representation (and hence is not continuous linear).

If \(N\) is uncountable, then \({\fancyscript{U}}\) can at the very least be chosen in such a manner that \({\fancyscript{U}}\) is not \(\sigma \)-complete and hence \(F\) is not continuous linear: for example, suppose that either there are no weakly inaccessible cardinals or the cardinality of \(N\) is less than the least such cardinal. If \({\fancyscript{U}}\) were \(\sigma \)-complete on \(N\), then it would induce a \(\sigma \)-additive measure \(\chi _{\fancyscript{U}}\) on \(2^N\), which contradicts our cardinality assumption [cf. Ulam (1930), Jech (2000, p. 126, Theorem 10.1)]. Hence, unless the cardinality of \(N\) is extremely large, no non-principal ultrafilter \({\fancyscript{U}}\) can be \(\sigma \)-complete, whence \(F\) will not be continuous linear. [Note that the existence of weakly inaccessible cardinals cannot be proved from ZFC, cf. Jech (2000, p. 33).] In general, regardless of the cardinality of \(N\), there will at least exist non-principal ultrafilters \({\fancyscript{U}}\) which are not \(\sigma \)-complete [and satisfy additional properties, cf. e.g. Kunen (1972, Theorem 3.2)].

Finally, we have to verify that \(F\) is finitely anonymous. To see this, note that if \(\pi :N\rightarrow N \) is a permutation which equals the identity on a co-finite subset of \(N\), then \(I_\pi =\left\{ i\in N \ : \ \pi (i)=i\right\} \) is co-finite, hence \(I_\pi \in {\fancyscript{U}}\) by our choice of \({\fancyscript{U}}\) (a non-principal ultrafilter contains every co-finite set).

As \({\fancyscript{U}}\) is closed under supersets, this means that for all \(\underline{\beta }\in [0,1]^N\), \( \underline{\beta }\sim _{{\fancyscript{U}}} \left( \beta _{\pi (i)}\right) _{i\in N},\) i.e.,

$$\begin{aligned} \left[ \underline{\beta }\right] _{\sim _{{\fancyscript{U}}}} = \left[ \left( \beta _{\pi (i)}\right) _{i\in N}\right] _{\sim _{{\fancyscript{U}}}}\!\!. \end{aligned}$$
(7)

Now let \({\mathcal {A}}\in \Sigma \), \(\underline{P}\in \Delta ({\mathcal {A}})^N\) and \(A\in {\mathcal {A}}\), and put \(\underline{P}'=\left( P_{\pi (i)}\right) _{i\in N}\). Then,

$$\begin{aligned} \left[ \underline{P}(A)\right] _{\sim _{{\fancyscript{U}}}} = \left[ \underline{P}'(A)\right] _{\sim _{{\fancyscript{U}}}} \end{aligned}$$

by Eq. (7), hence \(F(\underline{P})(A)= F(\underline{P}')(A)\) by definition of \(F\).

Proof of Theorem 10

  • The equivalence follows from Lemma 11. If the vector \(\underline{\alpha }\) in that Lemma is chosen to contain only non-zero entries, the resulting aggregation functional is clearly non-oligarchic.

  • If \(N\) is countably infinite, there are of course continuous linear aggregation functionals, and whenever the measure that represents such a functional has full support, the resulting aggregation functional will be non-oligarchic: there will then exist a sequence \(\underline{\alpha }\in (0,1)^N\) such that \(F(\underline{P})(A)= \sum _{i\in N}\alpha _i P_i(A)\) (a convergent series) for all \({\mathcal {A}}\in \Sigma \), \(\underline{P}\in \Delta ({\mathcal {A}})^N\) and \(A\in {\mathcal {A}}\). (In particular, \(\sum _{i\in N}\alpha _i=1\).) However, Lemma 12 teaches that there are also other aggregation functionals which are non-oligarchic and systematic/SSFP.

  • This follows from Lemma 12.

  • The existence of a continuous linear functional entails the existence of a \(\sigma \)-additive measure on the power-set of N. This in turn implies the existence of a weakly inaccessible—in fact, even of a measurable—cardinal [cf. Ulam (1930), Jech (2000, p. 126, Theorem 10.1)], which, however, is not provable from ZFC [cf. Jech (2000, p. 33)].Footnote 8

  • Since no successor cardinal is weakly inaccessible, the above argument shows that there cannot be a continuous linear aggregation functional if \(N\) has successor cardinality.

6 Discussion and conclusion

Formal epistemologists who accept Knightian uncertainty (or ambiguity)—i.e. the thesis that rational agents may have belief systems that are compatible with several subjective probability measures (multiple priors)—need to come up with a new account of how such agents will make their decisions, as classical expected utility theory requires, of course, a unique prior. The standard decision-theoretic literature (e.g. Gilboa and Schmeidler 1989) treats Knightian uncertainty through axiomatic rationalisation of generalisations of expected utility preferences, but this is clearly not satisfactory from an epistemological point of view, since there are situations where decisions have to be rationalised with respect to an aggregate belief system [as, for instance, in Bradley’s psychologically informed account of individual decisions (Bradley 2012), or in social epistemology, cf. Bradley et al. (2012)]. What is needed is a natural aggregation theory for probability measures—ideally one that is applicable to infinite profiles of priors.

While a number of obvious approaches (such as Arrovian aggregation of expected utility preferences or their generalisations) turn out to be barren, there does exist a candidate for such an aggregation theory of (possibly infinitely many) probability measures: we have proved an extension of McConway’s results on probabilistic opinion pooling (McConway 1981) which can be regarded as Arrovian in spirit, as it relates to the social choice theory in the tradition of Arrow (1963) through the unified general aggregation theory of Dietrich and List (2008, 2010). The existence of well-behaved aggregation functionals for uncountably infinite profiles is non-trivial, since obvious candidates such as integral-based aggregators quickly lead into (set-theoretic) measurability problems. Moreover, the aggregation functionals that we construct for the case of infinite electorates also satisfy a weak anonymity condition.

Now some readers might be concerned about the use of the ultrafilter existence theorem and non-standard analysis in the construction of the aggregation functionals for infinite electorates. It may be reassuring to recall (i) that any proof invoking non-standard analysis can always be transformed into a long standard proof; (ii) that the ultrafilter existence theorem is strictly weaker than the Axiom of Choice (Halpern and Levy 1971); and (iii) that there are non-standard models of the reals—even non-standard universes—which are definable over Zermelo–Fraenkel set theory plus Choice (Kanovei and Shelah 2004; Herzberg 2008a, b).

To conclude: It is possible to directly aggregate—finite and infinite—profiles of finitely additive probability measures in a way that (i) respects Arrovian-spirited responsiveness axioms, (ii) reduces, in the case of finite profiles, to the intuitive rule of linear averaging of probabilities, (iii) is—for several decision-theoretic reasons—the most natural viable approach to the aggregation of priors. Philosophically, this means that there is a rigorous sense in which one can refer (1) to the ‘aggregate belief system’ of a group of individuals who hold probabilistic beliefs [as in Bradley et al. (2012)] and (2) to the ‘aggregate prior’ of an agent whose belief system is compatible with several subjective probability measures [as recently suggested by Bradley (2012)].