1 Introduction

Most political representative bodies in the world are chosen through multi-district elections, where seats are apportioned among n parties within each of c districts independently, i.e., solely on the basis of the district vote. In such elections, the jurisdiction-wide distribution of seats (the seat distribution) depends heavily not only on the overall voting result (i.e., a vector of party vote shares), but also on the geographical distribution of each party’s support over the set of electoral districts (the vote distribution). Anomalous vote distributions can lead to skewed electoral results, such as the well-known referendum paradox [11, 49, 59]. While such anomalous distributions can arise through natural causes, such as voter self-segregation and other population clustering effects [18, 38, 42, 43, 75] (the U.S. electoral college, where two of the five most recent elections involved instances of the referendum paradox favoring the Republican Party, affords a prominent example), they can also be facilitated through deliberate manipulation of electoral district boundaries. Such manipulation, especially when undertaken for the purpose of obtaining an advantage for the party or bloc of parties controlling the redistricting process, is known as gerrymandering.

Gerrymandering is possible under all kinds of voting rules [6], but is most common under the combination of single-member electoral districts and the plurality rule (known in political science as the first-past-the-post, or FPTP, system). A classic gerrymander under FPTP combines two strategies: packing, i.e., concentrating as many opposition voters as possible in a small number of districts (which obviously needs to be smaller than c/2), and cracking, i.e., spreading the remainder roughly equally across the other districts in such a manner that they do not constitute a majority in any of them [5, 27]. When done correctly, this results in a substantial number of opposition votes in the “packed” districts being wasted, while the opposition supporters in the other districts are so diluted that they are incapable of securing a plurality in any of them. If there are more than two parties, other strategies also become possible, such as stacking, i.e., balancing the numbers of supporters of different opposition parties in a manner that enables the preferred candidate to win with less than a majority, but they tend to require more detailed knowledge about voter preferences and their distribution.

Ultimately, however, even though both the strategies and the objectives of gerrymandering are well understood, the concept itself, as we will see below, remains difficult to formalize. Even apart from the difficulties necessarily involved in discerning intent (and hence in distinguishing manipulation from unintentional bias), there is no accepted standard by which a specific vote distribution can be judged “fair” or “natural” [16, 36, 39]. Without such a standard, the concepts of distributional “unfairness” or “anomalousness” are fuzzy at best and meaningless at worst. This obviously makes it more difficult to detect and identify gerrymandering, as one must resort to circumstantial or otherwise indirect evidence.

2 Methodological Approaches to Detecting Gerrymandering

Altman et al. [2] distinguish six basic methodological approaches to detecting gerrymandering: method of stated intent, which relies on public statements of the authors of the districting plan; method of totality of the circumstances, which focuses on the political circumstances (well-known geographical rivalries, past practice, etc.); method of evaluation of process, which analyzes the districting process; methods of inspection, where gerrymandering is inferred from some qualitative or quantitative characteristics of the districting plan; method of post-hoc comparisons, where the districting plan is compared against a random sample of alternative plans; and method of revealed preferences, where the districting plan is compared against alternatives rejected during the districting process. Of those, the first three are purely qualitative and only rarely will suffice to prove gerrymandering, or even systemic bias. In addition, they require extensive extrinsic knowledge about the districting process that cannot be obtained from election results and districting plans alone. The method of revealed preferences, while advocated by [2], also requires such extrinsic knowledge (namely the set of plans that were known to the districting authority but have been rejected).

That leaves us with only two classes of quantitative methods for detecting gerrymandering that can be applied when extrinsic knowledge is unavailable: the methods of inspection and the method of post-hoc comparisons. As noted, in the former we focus on some observable characteristics of the districting plan and compare them against a well-known standard. Most such methods focus on one of two basic types of plan characteristics: district geometry or the relation between seats and votes. Geometric methods involve tests of district contiguity and of various measures of district compactness [1, 28, 62, 77], trying to formalize the intuition that gerrymandered electoral districts are oddly shaped. Yet the evidence of manipulation provided by such methods is circumstantial at best, as irregularity of shape is neither a necessary nor a sufficient condition for gerrymandering.

Methods focusing on the seats-votes relation instead start from some assumptions about the desired characteristics of such a relation. Such characteristics may include proportionality [8], responsiveness to shifts in voter support (measured by the swing ratio, i.e., the derivative of seats with respect to votes) [56, 73], partisan symmetry (a requirement that seats-votes curves be identical for all competing parties [32, 33, 35, 37, 47, 57]), or the efficiency principle, requiring that the number of wasted votes be equal for all parties [53, 69]. Then each party’s seats-votes function (i.e., a function assigning to total vote share v the total seat share s) is tested for deviation from the chosen characteristics. Those methods generally share three principal limitations. The first one is of a fundamental nature: most of the methods described above (except the partisan symmetry method) involve a priori assumptions that a certain form of the seats-votes function is a natural one, but no attempt is made to justify those assumptions, for instance by showing that they arise from some general or accepted distributional assumptions. Without such justification it may well be that those methods generate a large number of false negatives by holding districting plans to a more restrictive standard than mere absence of distributional anomalies. The second problem with methods focusing on the seats-votes relation is more technical: they usually require that the full seats-votes function be known for each party, yet all that is empirically known is a single data point per election. Extrapolation from those data points involves questionable assumptions about how changes in one party’s vote share translate into changes in its vote distribution and in other parties’ vote shares (like the uniform partisan swing assumption, see [14, 15, 29, 35, 58]). Finally, virtually all methods focusing on the seats-votes relation have been developed with two-party elections in mind and usually lack a natural generalization to multiparty elections.

The method of post-hoc comparisons instead compares districting plans with an ensemble of alternative districting plans [18,19,20, 24, 51, 60]. The problem is that the full set of correct solutions to the districting problem is in all but simplest cases too numerous to be used for such comparison, so we are reduced to testing the empirical plan against some sample of algorithmically generated random plans. But for proper inference to be drawn from such sample, we need the sample to be drawn from the set of all possible districting plans with some known probability measure, and we are unaware of any algorithm for generating districting plans for which such measure has been analytically determined [2, 3].

Finally, all of the methods described above fail in partially-contested elections, i.e., those where only some parties (or even none) field candidates in all electoral districts, and other parties field candidates in fewer than all districts (including one-candidate parties that only run in a single district). In such cases, the number of candidates can vary across districts, affecting both vote distributions and seats-votes relationships. In addition, we are no longer free to generate alternative seat allocations by rearranging districts, since we no longer have data about each party’s support beyond the districts it contested. It has already been noted in [45] that traditional statistical methods for dealing with missing data cannot be applied to partially-contested elections, since failure to contest an election in a district is usually not a random event, but a function of the party’s forecasted electoral strength in that district. Yet the methods for dealing with partially-contested elections proposed by, inter alia, [45, 50, 72, 76], are also insufficient when the patterns of electoral contestation are very chaotic, and particularly if the election cannot be described as a mixture of relatively few patterns, each covering multiple districts.

We encountered exactly those problems when analyzing gerrymandering in the Polish local elections of 2014, which were held under the plurality rule. Due to the highly personalized nature of local politics in Poland (especially in the smallest but most numerous class of municipalities, the townships), in 2386 out of 2412 municipalities the election was partially-contested. The chaotic character of electoral contestation patterns is best described by the following selection of facts:

  • only 2218 out of 16,971 parties contested the election in all districts within their respective municipalities,

  • if parties were ordered according to the fraction of districts contested within their respective municipalities, a median party would have contested less than half of all districts,

  • 4733 parties have contested only a single district,

  • there are, on average, 8.26 different district contestation patterns per municipality.

To address the problems described above, we propose a new method for detecting gerrymandering in partially-contested multiparty elections that are conducted under identical rules in multiple jurisdictions with separate districting plans (examples include regional and local elections, but also national elections in which redistricting is done by subnational jurisdictions, as in the case of the U.S. House of Representatives). We proceed on two general assumptions: that voting in each district can be modeled by a stochastic process that is identical (modulo choice of parameters) for all jurisdictions of interest, and that gerrymandering is ultimately an exception rather than a rule, so the parameters of the stochastic model estimated from the set of all jurisdictions are free from the taint of manipulation. We first formulate a general model of vote distribution, then propose a procedure for estimating that model’s parameters, and finally use that model to derive a sampling distribution of seat shares against which party seat shares can be compared.

3 Modeling District-Level Vote Distribution

3.1 Definitions and Notation

  1. An electoral jurisdiction consists of a finite set of electoral districts D, whose cardinality we denote as \(c:=\left| D\right| \), a finite set of parties \(P:=\left\{ 1,\dots ,n\right\} \), and a left- and right-total relation \(R\subseteq P\times D\) such that \(\left( i,k\right) \in R\) if the i-th party fields a candidate in the k-th district. It is assumed here that in each district there is exactly one seat to be allocated using the plurality rule and hence each party is able to field only a single candidate.

  2. Let \(D_{\varvec{i}}\subseteq D\) be the set of indices of the electoral districts where the i-th party, \(i\in P\), fields candidates, i.e., the set of such \(k\in D\) that \(\left( i,k\right) \in R\). Let \( c_{i}:=\left| D_{i}\right| \).

  3. Let \(P_{k}\subseteq P\) be the set of indices of the parties contesting the k-th district, \(k\in D\), i.e., the set of such \(i\in P\) that \( \left( i,k\right) \in R\). Let \(n_{k}:=\left| P_{k}\right| \).

  4. Let \(\sim \) be an equivalence relation on D identifying districts contested by the same set of parties, i.e., such that \(k\sim l\) if and only if \(P_{k}=P_{l}\). By \(\left[ k\right] _{\sim }\) we denote the equivalence class of k in D with respect to \(\sim \). We call it a contestation pattern.

  5. The voting result in the k-th district is a vector \(\mathbf {v} _{\varvec{k}}:=\left( v_{i_{1}}^{k},\dots ,v_{i_{n_{k}}}^{k}\right) \in \delta _{k}\), where \(\delta _{k}\) is the \(\left( n_{k}-1\right) \)-dimensional face of the standard \(\left( n-1\right) \)-dimensional simplex \(\varDelta _{n}\) spanned by vertices \(i_{1}\) to \(i_{n_{k}}\), \(v_{i}^{k}\) is the i-th party’s vote share in the k-th district, and \(i_{1}<\dots <i_{n_{k}}\) are the elements of \(P_{k}\). Note that \(\delta _{k}\) can be identified with the standard \(\left( n_{k}-1\right) \)-dimensional simplex \(\varDelta _{n_{k}}\).

  6. Let \(v_{i}:=\left( \sum _{k\in D_{\varvec{i}}}v_{i}^{k}w_{k}\right) /\left( \sum _{k\in D_{\varvec{i}}}w_{k}\right) \), where \(w_{k}\) is the number of voters in the k-th district, be the i-th party’s total vote share.

  7. Let \(\varvec{D}_{m}\) be the set of all such districts \(k\in \bigcup D\), where the union is over all electoral jurisdictions of interest, that \(n_{k}=m\).

  8. By a quantile mixture of absolutely continuous probability distributions \(\mathcal {M}_{1},\dots ,\mathcal {M}_{m}\) supported on some compact interval I we understand a probability distribution characterized uniquely by the inverse cumulative distribution function \(\varLambda ^{-1}: \left[ 0,1\right] \rightarrow I\) given by \(\varLambda ^{-1}\left( x\right) := \frac{1}{m}\sum _{i=1}^{m}F_{i}^{-1}\left( x\right) \), where \(F_{i}^{-1}\) is the inverse cumulative distribution function of \(\mathcal {M}_{i}\) [44].

  9. Where single-district models are discussed (in Sects. 3.2 and 3.3), the index k is omitted.
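To make the notation concrete, the following minimal Python sketch groups the districts of a toy jurisdiction into contestation patterns and computes a party's total vote share as the voter-weighted mean of its district shares. The function names, the representation of R as a set of (party, district) pairs, and all toy numbers are assumptions introduced here for illustration only.

```python
from collections import defaultdict


def contestation_patterns(R, districts):
    """Group districts by the set of parties contesting them (classes of ~)."""
    parties_in = defaultdict(set)
    for party, district in R:
        parties_in[district].add(party)
    patterns = defaultdict(list)
    for k in districts:
        patterns[frozenset(parties_in[k])].append(k)
    return dict(patterns)


def total_vote_share(i, R, shares, weights):
    """Weighted mean of party i's district vote shares over D_i."""
    D_i = [k for (party, k) in R if party == i]
    numerator = sum(shares[(i, k)] * weights[k] for k in D_i)
    denominator = sum(weights[k] for k in D_i)
    return numerator / denominator


# Hypothetical toy jurisdiction: three districts, three parties.
R = {(1, "a"), (2, "a"), (1, "b"), (2, "b"), (1, "c"), (3, "c")}
districts = ["a", "b", "c"]
shares = {(1, "a"): 0.6, (2, "a"): 0.4,
          (1, "b"): 0.5, (2, "b"): 0.5,
          (1, "c"): 0.7, (3, "c"): 0.3}
weights = {"a": 100, "b": 300, "c": 100}  # number of voters w_k

patterns = contestation_patterns(R, districts)
v1 = total_vote_share(1, R, shares, weights)
print(patterns)
print(v1)
```

In this toy case districts a and b share one contestation pattern, while district c forms a pattern of its own.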

3.2 Overview of Available Models

The problem of modeling voter choice in single-choice electoral systems can be thought of as a special case of the problem of modeling preference orderings, which is well known in social choice theory (see, e.g., [65] and [71]). A number of models have been employed for that purpose, but, since we are only interested in the first choice, we omit the discussion of those that differ only in their treatment of the second and subsequent preferences.

  1. Under the Impartial Culture (IC) model, each preference ordering (and, therefore, also each choice of the first candidate) is equiprobable and each voter decides independently with fixed probabilities [17]. The voting result \(\mathbf {v}\) follows a multinomial distribution centered at the barycenter of \(\varDelta _{n}\), with the variance of the square distance from the barycenter behaving as \(O\left( w^{-1}\right) \). There is extensive evidence for the claim that both the equiprobability and independence assumptions are not satisfied in empirical elections (recounted by, inter alia, [65]).

  2. The multinomial model is a generalization of the IC model which assigns unequal probabilities to the candidates, but still assumes that each voter makes an independent choice with fixed probabilities described by vector \(\mathbf {p}\). The voting result \(\mathbf {v}\) follows a multinomial distribution centered at \(\mathbf {p}\) with the variance of the square distance from \(\mathbf {p}\) behaving as \(O\left( w^{-1}\right) \). As first noted in [46], this model significantly underestimates the variance of the vote distribution. To avoid that problem, Penrose and others [61, 67, 74] have proposed a clustered multinomial model, according to which each district’s population consists of \(\kappa \) equally sized clusters of voters who have identical characteristics, and instead of randomizing individual voters’ choices, we randomize each cluster’s choice. Under that model, \(\mathbf {v}\) still follows a multinomial distribution centered at \(\mathbf {p}\), but its variance increases to \(O\left( \kappa ^{-1}\right) \) (as \(\kappa \ll w\); Penrose’s original estimate for Great Britain was \(\kappa \approx 14\)).

  3. The Impartial Anonymous Culture (IAC) model treats each preference profile (and, therefore, each voting result) as equiprobable [31, 48]. Accordingly, the voting result \( \mathbf {v}\) follows the uniform distribution on a discrete grid of points within \(\varDelta _{n_{k}}\), which, as w approaches \(\infty \), weakly converges to the uniform distribution on \(\varDelta _{n_{k}}\).

  4. The Pólya urn model, first introduced by Eggenberger and Pólya in 1923 [22], has been applied in the field of social choice theory by, inter alia, [12, 21, 40, 66]. Voting is treated as a discrete stochastic process where a ball is drawn from an urn that initially contains \(\alpha _{i}\) balls of the i-th color (where \(i=1,\dots ,n\)), and after each draw \(\lambda \) balls of the same color as the one drawn are returned to the urn. The voting result \(\mathbf {v}\) follows the multivariate Pólya distribution and, as w approaches \(\infty \), converges almost surely to a random variable having the Dirichlet distribution parametrized by vector \((\alpha _{1},\dots ,\alpha _{n})/\lambda \) [7, 41]. Both IC and IAC are special cases of the urn model, with \(\alpha _{1}=\dots =\alpha _{n}=1\) and with \(\lambda =1\) for IAC and \(\lambda =0\) for IC.

  5. Spatial models assume that voter policy preferences are distributed (usually normally) over a multidimensional policy space, that party policy positions are either specified or randomly distributed over the same space, and that voters always choose the candidate of the closest party according to some fixed metric [4, 23, 55]. Again, some clustering of voters has to be assumed to avoid overestimating homogeneity.

There is considerable evidence that equiprobability and independence assumptions fail to match empirical data, and accordingly both IC and IAC fail as empirical models of electoral behavior [65, 71]. In [71], spatial models are found to be most effective in modeling preference profiles, but in single-choice elections such models involve too many degrees of freedom for estimation unless highly simplifying assumptions are made (for instance, about reduction of the number of dimensions). That leaves only the urn model for our intended applications.

The sociological theory of electoral behavior also provides sound reasons for adopting the urn model. The contagion mechanisms that the urn model describes correspond to the observation that most voters are initially undecided and that their political views are shaped through social interactions with others, including already-committed supporters of the parties and candidates (cf. [13]). Indeed, political parties recognize that direct mobilization of voters through personal interaction is one of the most important tools of electoral campaigning [26]. Even mass media influence on political views, which would seem rather to support fixed-probability models, is indirect and effective primarily when the information communicated by the media is later verified through direct interaction with other members of the community [54]. It is also recognized that such political contagion processes are essentially stochastic, being dependent on the fine structure of social networks [52], which cannot be predicted deterministically.

3.3 Urn Model of Electoral Behavior

A Pólya urn model is usually characterized by two parameters: a vector of initial ball numbers (\(\mathbf {\alpha }\in \mathbb {R}_{+}^{n}\)) and the number of additional balls returned after each draw (\(\lambda \in \mathbb {R} _{+}\cup \left\{ 0\right\} \)), but note that by rescaling vector \(\mathbf { \alpha }\) we can always obtain \(\lambda =1\), thereby reducing our parameter space to \(\mathbb {R}_{+}^{n}\). In addition, it is often convenient to express \(\mathbf {\alpha }\) as a product of an n-element vector \(\mathbf {p} \in \varDelta _{n}\) and of the concentration parameter \(\alpha \in \mathbb {R} _{+}\).

Definition 1

Pólya-Eggenberger Urn Model [22, 63].

Let us consider a countably infinite set of potential voters. Let \(X_{j}\in P \) be the choice of the j-th voter (\(j\in \mathbb {N}\), \(\mathbb {N}=\left\{ 1,2,3,\dots \right\} \)).

Voting is a discrete stochastic process where the probability of the \(\left( j+1\right) \)-th voter choosing the i-th party’s candidate is defined by induction as

$$\begin{aligned} \Pr \left( X_{j+1}=i\right) =\frac{\alpha p_{i}+\left| \left\{ k=1,\dots ,j:X_{k}=i\right\} \right| }{\alpha +j}, \end{aligned}$$
(1)

for \(i=1,\dots ,n\).

Intuitively, the attractiveness of the i-th party to the \(\left( j+1\right) \)-th voter is proportional to the sum of the number of voters that already have decided to support it and its initial strength \(\alpha p_{i}\). In [66] the authors propose that \(\alpha p_{i}\) be interpreted as the number of voters who are committed at the outset to support the i-th party’s candidate, but this interpretation raises some issues as \(\alpha p_{i}\) need not be an integer.
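The process defined by (1) is straightforward to simulate. The following Python sketch is only an illustration: the function name `polya_urn_shares` and the toy parameter values are assumptions made here, not part of the model specification. Each voter picks party i with probability proportional to \(\alpha p_{i}\) plus the number of earlier voters who chose i; since these weights sum to \(\alpha +j\), normalizing them reproduces (1).

```python
import random


def polya_urn_shares(p, alpha, voters, rng):
    """Simulate the urn process of Eq. (1) and return the final vote shares."""
    counts = [0] * len(p)
    for _ in range(voters):
        # Unnormalized probabilities: initial strength plus supporters so far.
        weights = [alpha * p_i + c for p_i, c in zip(p, counts)]
        chosen = rng.choices(range(len(p)), weights=weights)[0]
        counts[chosen] += 1
    return [c / voters for c in counts]


# Hypothetical toy parameters: three parties, concentration alpha = 5.
shares = polya_urn_shares((0.5, 0.3, 0.2), alpha=5.0, voters=1000,
                          rng=random.Random(42))
print(shares)
```

Repeating the simulation with different seeds shows how strongly the outcome varies from run to run even for a large number of voters, in contrast to the fixed-probability multinomial model.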

Proposition 1

In the above situation, there exists a random variable \(\mathbf {V}\sim \mathrm {Dir}\left( \alpha _{1},\dots ,\alpha _{n}\right) \), where \(\alpha _{i}:=\alpha p_{i}\) for \(i=1,\dots ,n\), such that \(\left( \Pr \left( X_{j}=1\right) ,\dots ,\Pr \left( X_{j}=n\right) \right) \overset{\mathrm {a.s.}}{\rightarrow }\mathbf {V}\) as \( j\rightarrow \infty \). Here \(\mathrm {Dir}\left( \alpha _{1},\dots ,\alpha _{n}\right) \) (the Dirichlet distribution) is a continuous multivariate probability distribution supported on \(\varDelta _{n}\) that has a probability density f with respect to the Lebesgue measure on \(\varDelta _{n}\) given by:

$$\begin{aligned} f\left( v_{1},\dots ,v_{n}\right) :=\frac{1}{\mathrm {B}\left( \alpha _{1},\dots ,\alpha _{n}\right) }\prod \limits _{i=1}^{n}v_{i}^{\alpha _{i}-1}, \end{aligned}$$
(2)

where \(\mathbf {v}\in \varDelta _{n}\) and \(\mathrm {B}\left( \alpha _{1},\dots ,\alpha _{n}\right) \) is the multivariate beta function:

$$\begin{aligned} \mathrm {B}\left( \alpha _{1},\dots ,\alpha _{n}\right) :=\frac{ \prod \nolimits _{i=1}^{n}\varGamma \left( \alpha _{i}\right) }{\varGamma \left( \alpha \right) }. \end{aligned}$$
(3)

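In the above situation we have for \(i=1,\dots ,n\), with \(V_{i}\) denoting the i-th coordinate of \(\mathbf {V}\):

$$\begin{aligned} V_{i}\sim \mathrm {Beta}\left( \alpha p_{i},\alpha -\alpha p_{i}\right) , \end{aligned}$$
(4)

$$\begin{aligned} \mathbb {E}\left( V_{i}\right) =p_{i}, \end{aligned}$$
(5)

$$\begin{aligned} \mathrm {Var}\left( V_{i}\right) =\frac{p_{i}\left( 1-p_{i}\right) }{\alpha +1}. \end{aligned}$$
(6)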

For proof of the above proposition see, inter alia, [7] and [41] (Fig. 1).

Fig. 1. Density plot of a symmetric Dirichlet distribution on \(\varDelta _3\) with \(\mathbf {p}=(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})\) and \(\alpha =9\) (left) and of an asymmetric Dirichlet distribution on \(\varDelta _3\) with \(\mathbf {p}=(\frac{4}{9}, \frac{1}{3}, \frac{2}{9})\) and \(\alpha =8\) (right).
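As a numerical sanity check of the limiting distribution, the sketch below draws from the symmetric Dirichlet distribution of Fig. 1 (left) by normalizing independent gamma variates, which is a standard sampling construction, and compares the empirical mean and variance of one coordinate with the standard Dirichlet values \(p_{i}\) and \(p_{i}\left( 1-p_{i}\right) /\left( \alpha +1\right) \). The helper name `sample_dirichlet`, the seed, and the sample size are arbitrary choices made for this illustration.

```python
import random


def sample_dirichlet(p, alpha, rng):
    """Draw one point from Dir(alpha * p) by normalizing gamma variates."""
    gammas = [rng.gammavariate(alpha * p_i, 1.0) for p_i in p]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(0)
p, alpha = (1 / 3, 1 / 3, 1 / 3), 9.0

# Empirical moments of the first coordinate over many draws.
draws = [sample_dirichlet(p, alpha, rng)[0] for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)

print("mean:", mean, "theory:", p[0])
print("var: ", var, "theory:", p[0] * (1 - p[0]) / (alpha + 1))
```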

3.4 Parameter Fitting – The Expectation Vector

Literature on electoral studies recognizes that district-level vote shares depend on two principal factors: overall party popularity, measured by the total vote share vector \(\mathbf {v}\), and political geography, i.e., district-specific effects, which are more difficult to model formally. However, as we consider an idealized distribution of vote shares in a non-biased election, in essence approximating an average distribution of district vote shares over the population of non-biased districting plans, we abstract from the effects of political geography altogether.

It would thus appear from (5) that the vector of party total vote shares \(\mathbf {v}\) would be the most natural estimate of parameter \(\mathbf {p}\). This, however, is only the case if the voting results in all districts in D come from a single distribution, which in turn is equivalent to a condition that the election be fully contested, i.e., that every party j field a candidate in every district k. Otherwise, there must be a different distribution for each equivalence class \(\left[ k\right] _{\sim }\), as each such class is characterized by the presence of a different set of parties. It follows that in partially-contested elections, which are of primary interest to us, voting results in D will be distributed according to a direct product of Dirichlet distributions \(\mathrm {Dir}\left( \alpha _{k}\mathbf {p}_{k}\right) \), \(k\in D\), with \(n_{k}\), \(\mathbf {p}_{k}\) and \(\alpha _{k}\) constant for each equivalence class \(\left[ k\right] _{\sim }\), and \(\mathbf {p}_{k}\) and \( \alpha _{k}\) being unknown.

We cannot simply assume that \(p_{i}^{k}=\sum _{j\in \left[ k\right] _{\sim }}v_{i}^{j}/\left| \left[ k\right] _{\sim }\right| \) for each \(k\in D \), since the empirical vote distribution over equivalence classes \(\left[ k \right] _{\sim }\) may be tainted by gerrymandering. Instead, we need a theoretical model that is based solely on district contestation patterns (described by relation R) and party total vote shares vector \(\mathbf {v}\).

In fitting \(\mathbf {p}_{k}\) to each equivalence class in D with respect to \(\sim \), we seek to satisfy the following four natural requirements:

  R1. For each district k, \(\mathbf {p}_{k}\in \varDelta _{n_{k}}\), i.e., \( \sum _{i\in P_{k}}p_{i}^{k}=1\).

  R2. For any two districts \(k,l\in D\), if \(k\sim l\), then \(\mathbf {p} _{k}=\mathbf {p}_{l}\).

  R3. For any two parties \(i,j\in P\), the order on \(\left\{ p_{i}^{k},p_{j}^{k}\right\} \) is identical in every district \(k\in D_{ \varvec{i}}\cap D_{\varvec{j}}\).

  R4. For any two parties \(i,j\in P\) such that \(D_{i}=D_{j}\), the order on \(\left\{ p_{i}^{k},p_{j}^{k}\right\} \) is identical with the order on \( \left\{ v_{i},v_{j}\right\} \) for each district \(k\in D_{i}\).

In addition, there are three postulates that we seek to satisfy approximately (i.e., to minimize deviation from them):

  P5. For each party \(i\in P\) its mean expected vote share over districts should be close to its party vote share, i.e., \(\sum _{k\in D_{ \varvec{i}}}p_{i}^{k}\approx c_{i}v_{i}\).

  P6. For any two districts \(k,l\in D\) if \(n_{k}=n_{l}\), then \( p_{i}^{k}\approx p_{i}^{l}\) for each party \(i\in P_{k}\cap P_{l}\).

  P7. For each party \(j\in P\) and for any two districts \(k,l\in D_{ \varvec{j}}\) we have \(\varphi _{n_{k}}\left( p_{j}^{k}\right) =\varphi _{n_{l}}\left( p_{j}^{l}\right) \), where \(\varphi _{m}:\left[ 0,1\right] \rightarrow \left[ 0,1\right] \), \(m\in \mathbb {N}\), is a function mapping a party vote share in a district with m contenders to a standardized value independent of m.

Of those postulates, P7 clearly requires some additional discussion. The underlying problem consists of comparing vote shares across districts with different number of candidates. Clearly, obtaining 40% of the vote in a district with two candidates is not equivalent to obtaining an identical vote share in a district with ten candidates. In formal terms, this intuition can be expressed as follows: let \(X_{m}\), \(m\in \mathbb {N}\), be a random variable given by \(X_{m}\left( i,k\right) :=v_{i}^{k}\), where k is drawn from a uniform discrete distribution on \(\varvec{D}_{m}\) and i is later drawn from a uniform discrete distribution on \(P_{k}\). The distribution of \(X_{m}\) necessarily depends on m, while for vote shares from different districts to be comparable, we need to transform \(X_{m}\) into another random variable with a distribution that is constant with respect to m.

The probability integral transform of \(X_{m}\) is one natural choice of such transformation. Let us consider the cumulative distribution function of \( X_{m}\). As it is not injective, \(X_{m}\) being discrete, we formally define \( \varphi _{m}:\left[ 0,1\right] \rightarrow \left[ 0,1\right] \) as its continuous approximation obtained by integrating the probit-transformed kernel density estimator \(\psi _{m}\) [30] of the distribution of \(X_{m}\), i.e., \(\varphi _{m}\left( p\right) =\int _{0}^{p}\psi _{m}\left( x\right) \,dx\) for \(p\in \left[ 0,1\right] \). This ensures that \(\varphi _{m}\) is invertible, and that \(\varphi _{m}^{-1}\) is continuous, strictly increasing, and the images of the bounds of its domain are, respectively, 0 and 1. It follows that every linear combination of functions \(\varphi _{k}^{-1}\), where \(k\in \mathbb {N}\), with positive coefficients summing up to \(c>0\), is also continuous and strictly increasing, and the images of the bounds of its domain are 0 and c. Let \( i\in P\). By the intermediate value theorem and strict monotonicity there exists a unique \(q_{i}\in \left[ 0,1\right] \) such that \(\sum _{k\in D_{i}}\varphi _{n_{k}}^{-1}\left( q_{i}\right) =c_{i}v_{i}\le c_{i}\). Hence the definition \( p_{i}^{k}:=\varphi _{n_{k}}^{-1}\left( q_{i}\right) \) would naturally imply P5. Parameter \(q_{i}\) has no natural interpretation; however, if we assume the distribution of the i-th party’s district vote shares to be a quantile mixture of the distributions \(\mathcal {D}_{k}\), where \(k\in D_{i}\), \(q_{i}\) will correspond to the value of such mixture’s cumulative distribution function \(\varLambda _{i}\) for the empirical value of \(v_{i}\).

Note that model assuming \(p_{i}^{k}=\varphi _{n_{k}}^{-1}\left( q_{i}\right) \) satisfies most of the requirements and postulates specified above:

  • P5 and P7 are satisfied by definition of \(q_{i}\) and \(\varphi _{m}\).

  • If \(n_{k}=n_{l}\), \(p_{i}^{k}=\varphi _{n_{k}}^{-1}\left( q_{i}\right) =\varphi _{n_{l}}^{-1}\left( q_{i}\right) =p_{i}^{l}\) for any party i and any two districts \(k,l\in D_{i}\), so P6 is satisfied exactly, which in turn implies R2 (since \(k\sim l\) implies \(n_{k}=n_{l}\)).

  • R3 results from the monotonicity of \(\varphi _{m}^{-1}\).

  • From the monotonicity of \(\varphi _{m}^{-1}\) we know that the order on \(\left\{ p_{i}^{k},p_{j}^{k}\right\} \) is identical with the order on \( \left\{ q_{i},q_{j}\right\} =\left\{ \varLambda _{i}\left( v_{i}\right) ,\varLambda _{j}\left( v_{j}\right) \right\} \). From \(D_{i}=D_{j}\) it follows that \(\varLambda _{i}=\varLambda _{j}\). As \(\varLambda _{i}\) is strictly increasing, the order on \(\left\{ q_{i},q_{j}\right\} \) is identical to that on \(\left\{ v_{i},v_{j}\right\} \), as desired under R4.

Unfortunately, there is no guarantee that the above model satisfies R1. We therefore modify it by renormalizing vector \(\mathbf {p}_{k}\) for each district k. This renormalization ensures that R1 is satisfied, R3 is preserved (as renormalization preserves the ordering of \(p_{1}^{k},\dots ,p_{n}^{k}\)), and so are R2 and R4 (as the renormalization constant does not vary within \(\left[ k \right] _{\sim }\)). In turn, such renormalization may introduce violations of P5, P6, and P7, but we do not need those postulates to be satisfied exactly.
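The fitting procedure described above can be sketched in a few lines of Python. The sketch is illustrative only and rests on several assumptions: the inverse maps \(\varphi _{m}^{-1}\) are replaced by the hypothetical stand-in \(q\mapsto q^{m-1}\) (continuous, strictly increasing, fixing 0 and 1 for \(m\ge 2\)) instead of being estimated from a kernel density, \(q_{i}\) is found by plain bisection, and all names and toy numbers are invented for the example.

```python
def solve_q(inverse_maps, target, tol=1e-12):
    """Bisection for q in [0, 1] with sum(f(q) for f in inverse_maps) = target."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(f(mid) for f in inverse_maps) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def toy_phi_inv(m):
    """Hypothetical stand-in for phi_m^{-1}; NOT the KDE-based map of the text."""
    return lambda q: q ** (m - 1)


def fit_expectations(parties_in, v, phi_inv=toy_phi_inv):
    """Return (q, p): standardized values q_i and renormalized vectors p_k."""
    districts_of = {}
    for k, parties in parties_in.items():
        for i in parties:
            districts_of.setdefault(i, []).append(k)
    # Solve sum_{k in D_i} phi_{n_k}^{-1}(q_i) = c_i * v_i for each party i.
    q = {i: solve_q([phi_inv(len(parties_in[k])) for k in ks], len(ks) * v[i])
         for i, ks in districts_of.items()}
    p = {}
    for k, parties in parties_in.items():
        raw = {i: phi_inv(len(parties))(q[i]) for i in parties}
        total = sum(raw.values())  # renormalization constant enforcing R1
        p[k] = {i: x / total for i, x in raw.items()}
    return q, p


# Hypothetical toy jurisdiction: party 3 contests district "c" only.
parties_in = {"a": {1, 2}, "b": {1, 2}, "c": {1, 2, 3}}
v = {1: 0.5, 2: 0.4, 3: 0.2}
q, p = fit_expectations(parties_in, v)
print(q)
print(p)
```

In the toy output the fitted vectors for districts a and b coincide, as R2 requires, and each vector sums to one after renormalization.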

Note that this method is loosely analogous to the biproportionality method by [9, 10, 64].

The distribution of \(p_{i}^{k}\) in all districts \(k\in \varvec{D}_{m}\), \( m\in \mathbb {N}\), will be of further interest in the following section (recall that \(\varvec{D}_{m}\) is the set of all districts with m candidates). We denote its cumulative distribution function by \(\varPsi _{m}\).

3.5 Parameter Fitting – The Concentration Parameter

The last parameter of our electoral model is the concentration parameter \( \alpha _{k}\). Unlike the expected vote shares of the contending parties, \( \alpha _{k}\) is never observable directly, and in most cases we do not have enough data to fit it to empirical voting results using some distribution fitting method that produces a reasonable confidence interval (since such fitting would require a large equivalence class \(\left[ k\right] _{\sim }\)). Intuitively, the concentration parameter should depend on at least two further parameters: the number of candidates and the political homogeneity of the jurisdiction under consideration. The latter, in turn, is likely to depend in a complex manner on a large number of factors, such as the population and area of the jurisdiction, historical cleavages, settlement structure, socioeconomic diversity, etc. We do not have a good theoretical model of those relationships that would enable us to estimate \(\alpha _{k}\), and the formulation of such a model would go far beyond the scope of this paper.

To circumvent this issue, we treat the concentration parameter as another random variable that, for each class of districts \(\varvec{D}_{m}\), follows a gamma distribution with parameters \(\kappa _{m}\) and \(\theta _{m}\). To apply our model to a particular class of elections, we still need to estimate those two parameters. We proceed as follows: let \(Y_{m}\), \(m\in \mathbb {N}\), be a random variable given by \(Y_{m}\left( i,k\right) :=V_{i}^{k}\), where \( V_{i}^{k}\) is the i-th barycentric coordinate of \(\mathbf {V}_{k}\), k is drawn from a uniform discrete distribution on \(\varvec{D}_{m}\), and i is then drawn from a uniform discrete distribution on \(P_{k}\). Intuitively, it is the theoretical vote share of a random party in a random district in an ideal unbiased election. Under our model, the distribution of \(Y_{m}\) is a compound beta distribution with parameters \(\left( \alpha p,\alpha -\alpha p\right) \) (see (4)), where \(p\sim \varPsi _{m}\) and \(\alpha \sim \mathrm {Gamma}\left( \kappa _{m},\theta _{m}\right) \). Accordingly, the density of that distribution is given by:

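$$\begin{aligned} f_{Y_{m}}\left( y\right) =\int _{0}^{1}\int _{0}^{\infty }\frac{y^{\alpha p-1}\left( 1-y\right) ^{\alpha -\alpha p-1}}{\mathrm {B}\left( \alpha p,\alpha -\alpha p\right) }\,g\left( \alpha ;\kappa _{m},\theta _{m}\right) \,d\alpha \,d\varPsi _{m}\left( p\right) , \end{aligned}$$
(7)

where \(\mathrm {B}\) is the beta function and \(g\left( \cdot \,;\kappa _{m},\theta _{m}\right) \) is the density of the gamma distribution with parameters \(\kappa _{m}\) and \(\theta _{m}\).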

The function \(\varPsi _{m}\) is known at this stage (having been estimated in the preceding section), so the only two unknowns in this formula are the gamma distribution parameters \(\kappa _{m}\) and \(\theta _{m}\). Note, however, that under our general assumption that gerrymandering is not ubiquitous, the distribution of \(Y_{m}\) should closely approximate the distribution of \(X_{m} \), provided the model is correct. We can therefore use that property to obtain \(\kappa _{m}\) and \(\theta _{m}\): for each \(m\in \mathbb {N}\), we numerically minimize the total variation distance [34, 68] between the distributions of \(X_{m}\) and \(Y_{m}\).
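The fitting idea can be sketched in a few lines of Python. This is only an illustration under several assumptions that are not taken from the paper: the gamma distribution is parameterized by shape \(\kappa _{m}\) and scale \(\theta _{m}\), the distribution \(\varPsi _{m}\) is represented by a list of sampled values `p_values`, the total variation distance is approximated on a histogram with equal-width bins, and a coarse grid search stands in for whatever numerical minimizer was actually used. All function names are hypothetical.

```python
import random


def sample_y(kappa, theta, p_values, size, rng):
    """Draw theoretical vote shares: alpha ~ Gamma, p ~ Psi_m, Y ~ Beta(alpha*p, alpha*(1-p))."""
    draws = []
    for _ in range(size):
        alpha = rng.gammavariate(kappa, theta)  # shape kappa, scale theta (assumed)
        p = rng.choice(p_values)                # empirical stand-in for Psi_m
        draws.append(rng.betavariate(alpha * p, alpha * (1.0 - p)))
    return draws


def histogram(sample, bins):
    """Relative frequencies of a sample from [0, 1] over equal-width bins."""
    counts = [0] * bins
    for x in sample:
        counts[min(int(x * bins), bins - 1)] += 1
    return [c / len(sample) for c in counts]


def tv_distance(sample_a, sample_b, bins=20):
    """Total variation distance between the binned versions of two samples."""
    ha, hb = histogram(sample_a, bins), histogram(sample_b, bins)
    return 0.5 * sum(abs(a - b) for a, b in zip(ha, hb))


def fit_gamma(x_sample, p_values, kappas, thetas, size=5000, seed=0):
    """Grid search for the (kappa, theta) pair minimizing the binned TV distance."""
    best = None
    for kappa in kappas:
        for theta in thetas:
            rng = random.Random(seed)  # same random stream at every grid point
            y_sample = sample_y(kappa, theta, p_values, size, rng)
            d = tv_distance(x_sample, y_sample)
            if best is None or d < best[0]:
                best = (d, kappa, theta)
    return best


# Synthetic example: the "empirical" shares are themselves simulated.
p_values = [0.2, 0.3, 0.5]
x_sample = sample_y(4.0, 5.0, p_values, 5000, random.Random(1))
distance, kappa_hat, theta_hat = fit_gamma(
    x_sample, p_values, [2.0, 4.0, 8.0], [2.5, 5.0, 10.0]
)
```

Reusing the same random stream at every grid point keeps the comparison between candidate parameter pairs from being dominated by simulation noise. A finer grid or a derivative-free optimizer could replace the grid search.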

4 Modeling the Sampling Distribution of Seats

By this point, we have estimated all the parameters necessary to model the ideal unbiased distribution of votes in each electoral district in every jurisdiction, namely, \(n_{k}\) and \(\mathbf {p}_{k}\) for each district k, and \(\kappa _{m}\) and \(\theta _{m}\) for each class of districts \(\varvec{D}_{m}\), \(m\in \mathbb {N}\). Of course, not all anomalies in the vote distribution of a party are of interest to us when seeking to detect gerrymanders, but only those that translate into biases in the allocation of seats. To detect such biases, we run a Monte Carlo simulation for each jurisdiction of interest, generating a large number of simulated election results. We proceed as follows:

  1. For each district \(k\in D\), we generate a single realization of the random variable \(\alpha _{k}\sim \mathrm {Gamma}\left( \kappa _{n_{k}},\theta _{n_{k}}\right) \), which we will denote as \(\widehat{\alpha } _{k}\).

  2. For each district \(k\in D\), we then generate a single realization of the random vector \(\mathbf {V}_{k}\sim \mathrm {Dir}\left( \widehat{\alpha }_{k}\mathbf {p}_{k}\right) \), which we will denote as \(\widehat{ \mathbf {V}}_{k}\).

  3. We distribute seats within each district \(k\in D\) according to the plurality rule, awarding a single seat to the party with the greatest vote share, i.e., to the one corresponding to the greatest barycentric coordinate of \(\widehat{\mathbf {V}}_{k}\).

  4. For each party \(i\in P\), we sum seats over districts.

This procedure is repeated \(2^{20}\) times for each electoral jurisdiction. Through this process, we obtain a joint discrete sampling distribution of party seat vectors \(\mathcal {S}\) on \(\prod \nolimits _{i=1}^{n}\left\{ 0,\dots ,c_{i}\right\} \), and for each party \(i\in P\) we denote the marginal sampling distribution of seats by \(\mathcal {S}_{i}\). In estimating the above distributions, we do not rely on the empirical distribution of voters among districts, and therefore they are untainted by possible gerrymandering.
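The four simulation steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the vote vector in each district is Dirichlet-distributed with parameter \(\widehat{\alpha }_{k}\mathbf {p}_{k}\), that the gamma distribution uses a shape and scale parameterization, and that districts are given as dictionaries mapping hypothetical party labels to expected vote shares. The number of runs is kept far below \(2^{20}\) for brevity.

```python
import random
from collections import Counter


def sample_dirichlet(params, rng):
    """Draw a Dirichlet vector by normalizing independent gamma variates."""
    while True:
        g = [rng.gammavariate(a, 1.0) for a in params]
        total = sum(g)
        if total > 0.0:  # redraw in the rare case of numerical underflow
            return [x / total for x in g]


def simulate_seats(districts, gamma_params, runs, seed=0):
    """Return the party order and a Counter of simulated seat vectors.

    districts    -- list of dicts {party: expected vote share} (hypothetical layout)
    gamma_params -- dict {number of candidates m: (kappa_m, theta_m)}
    """
    rng = random.Random(seed)
    parties = sorted({party for shares in districts for party in shares})
    outcomes = Counter()
    for _ in range(runs):
        seats = dict.fromkeys(parties, 0)
        for shares in districts:
            names = list(shares)
            kappa, theta = gamma_params[len(names)]
            alpha = rng.gammavariate(kappa, theta)                     # step 1
            votes = sample_dirichlet(
                [alpha * shares[name] for name in names], rng)         # step 2
            winner = names[max(range(len(names)), key=votes.__getitem__)]  # step 3
            seats[winner] += 1                                         # step 4
        outcomes[tuple(seats[party] for party in parties)] += 1
    return parties, outcomes


def marginal(outcomes, index):
    """Marginal seat-count frequencies for the party at position `index`."""
    counts = Counter()
    for seat_vector, weight in outcomes.items():
        counts[seat_vector[index]] += weight
    return counts


# Hypothetical three-district jurisdiction; party C contests only two districts.
districts = [
    {"A": 0.50, "B": 0.30, "C": 0.20},
    {"A": 0.45, "B": 0.55},
    {"A": 0.40, "B": 0.35, "C": 0.25},
]
gamma_params = {2: (4.0, 5.0), 3: (4.0, 5.0)}
parties, outcomes = simulate_seats(districts, gamma_params, runs=2000)
seats_a = marginal(outcomes, parties.index("A"))
```

The `Counter` of seat vectors plays the role of the joint distribution \(\mathcal {S}\), and `marginal` extracts the counterpart of \(\mathcal {S}_{i}\) for a single party.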

To measure the distance between the actual seat count of the i-th party, \(s_{i} \), and the distribution \(\mathcal {S}_{i}\) obtained above, we introduce a simple measure analogous to the well-known p-value used in statistical hypothesis testing:

$$\begin{aligned} \pi _{i}:=\min \left( \mathcal {S}_{i}\left( \left[ 0,s_{i}\right] \right) , \mathcal {S}_{i}\left( \left[ s_{i},c_{i}\right] \right) \right) . \end{aligned}$$
(8)

In other words, \(\pi _{i}\) is the probability of a party obtaining a number of seats that is equal to or more extreme than its actual number of seats, i.e., the smaller of the two tail probabilities of \(\mathcal {S}_{i}\) at \(s_{i}\). Low values of \(\pi _{i}\) are indicative not only of anomalies in the vote distribution, but also of the fact that they translate into a rather improbable deviation from the expected number of seats.
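As a worked illustration of (8), the following sketch computes \(\pi _{i}\) from a simulated marginal seat distribution. The function name `pi_value` and the toy frequencies are hypothetical; the distribution is represented as a dictionary mapping seat counts to simulation frequencies.

```python
def pi_value(seat_counts, actual):
    """Smaller of the lower and upper tail probabilities at the actual seat count.

    seat_counts -- dict {number of seats: frequency or probability}
    actual      -- observed number of seats s_i
    """
    total = sum(seat_counts.values())
    lower = sum(w for s, w in seat_counts.items() if s <= actual) / total
    upper = sum(w for s, w in seat_counts.items() if s >= actual) / total
    return min(lower, upper)


# Toy marginal distribution over 0..3 seats from 10 hypothetical simulation runs.
toy = {0: 1, 1: 3, 2: 4, 3: 2}
pi_three_seats = pi_value(toy, 3)  # upper tail 2/10 is the smaller one
pi_one_seat = pi_value(toy, 1)     # lower tail 4/10 is the smaller one
```

Both tails include the observed seat count itself, as in (8), so the two tail probabilities sum to more than one and \(\pi _{i}\) is never zero for an outcome that occurred in the simulation.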

To obtain a jurisdiction-level index, we could simply average the values of \( \pi _{i}\) over \(i\in P\). However, to account for the fact that we are primarily interested in cases of gerrymandering affecting parties contesting most districts, we weight \(\pi _{i}\) by the number of districts \(c_{i}\) contested by the i-th party. The resulting index,

$$\begin{aligned} \pi :=\frac{\sum _{i=1}^{n}\pi _{i}c_{i}}{\sum _{i=1}^{n}c_{i}}, \end{aligned}$$
(9)

is our final measure of electoral bias. While it is not conclusive evidence of gerrymandering (we still lack proof of intent, as electoral bias can be unintentional and arise from peculiarities of the spatial distribution of party voters), it allows us to identify outlier jurisdictions, which can then be analyzed using other, possibly more qualitative methods.
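The weighted average in (9) is simple enough to verify by hand. The sketch below uses hypothetical party labels and values; `bias_index` is an illustrative name, not one used in the paper.

```python
def bias_index(pi_values, contested):
    """Average of party-level pi_i weighted by the number of contested districts c_i."""
    numerator = sum(pi_values[party] * contested[party] for party in pi_values)
    denominator = sum(contested[party] for party in pi_values)
    return numerator / denominator


# Hypothetical jurisdiction: parties A and B contest 10 districts each, C only 5.
pi_values = {"A": 0.2, "B": 0.4, "C": 0.5}
contested = {"A": 10, "B": 10, "C": 5}
index = bias_index(pi_values, contested)  # (2 + 4 + 2.5) / 25 = 0.34
```

In this toy case the unweighted mean would be about 0.367, so weighting by \(c_{i}\) pulls the index towards the values of the parties that contest more districts.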

Remark 1

Note that for our primary data set of interest, the Polish local elections of 2014, the distribution of \(\pi \) across jurisdictions is quite well approximated by a normal distribution, see Fig. 2. This indicates an absence of pervasive gerrymandering, which agrees with the intuition that gerrymandering (or at least successful gerrymandering) is more difficult in less orderly party systems.

Fig. 2. A kernel density estimate of the empirical density of \(\pi \) for the Polish local elections of 2014 (black) and a normal density curve with \(\mu \approx 0.388\) and \(\sigma \approx 0.0945\) (red). (Color figure online)

To conclude, we have seen that classic methods for detecting gerrymandering fail when applied to multiparty partially-contested elections. We propose an alternative method based on a probabilistic model of voting behavior, together with a procedure for estimating the parameters of such a model in a manner insulated from the possible taint of gerrymandering. We admit that the method is complex and involves simplifying assumptions and ad-hoc solutions, but these are necessitated by the complexity of the problem and the limitations of the available data. Ultimately, we are unable to secure any conclusive evidence of gerrymandering, but we do obtain a single index that can be used to identify suspect jurisdictions for further analysis.