Introduction

Electoral institutions are core institutions of political systems. Sartori (1968, p. 273) calls the electoral system ‘the most specific manipulative instrument of politics’. Yet there is little agreement about what causes cross-national variation in electoral laws, despite a recent surge in the literature on electoral system choice in advanced democracies before World War II with contributions by several prominent scholars. Among other things, these contributions disagree on whether the introduction of proportional representation (PR) was a response of the political right to stop the rise of the political left (Boix, 1999), whether PR was forced upon the political right by a radicalized political left (Alesina and Glaeser, 2004), or whether the move to PR was in fact largely consensual (Blais et al, 2005; Cusack et al, 2007).

In a discussion of Boix (1999) and Cusack et al (2007), Kreuzer (2010) identifies several reasons for this lack of agreement. In particular, he argues that ‘political scientists commonly draw on history but often do not read actual historians carefully’ and that ‘it would be beneficial to first do the more nuts-and-bolts work of using historical knowledge to improve the quantitative study of institutional origins’ (Kreuzer, 2010, pp. 369, 385). Kreuzer makes three empirical contributions: he revisits Boix’s (1999) and Cusack et al’s (2007) data collection; he replicates their statistical analyses (using the new data) to test the robustness of the findings; and he tests more observable implications of the causal mechanisms that Boix (1999) and Cusack et al (2007) put forward. Kreuzer (2010, p. 383) concludes with a rather negative assessment of Cusack et al (2007), but argues that ‘Boix’s closer dialogue with historical knowledge is vindicated by the greater robustness of his findings’.

While we agree with Kreuzer’s (2010) critique, we argue that the contributions of Alesina and Glaeser (2004), Blais et al (2005), Boix (1999, 2010) and Cusack et al (2007, 2010) also suffer from a more fundamental methodological problem that is typical of much comparative research that is solely based on the analysis of cross-sectional variation. In particular, we submit that methodological approaches that rely exclusively on medium- to large-N cross-sectional correlations among variables as the source of causal inference are generally not suitable for analyzing comparative research questions in which the main acting agents are not individuals but collective actors, such as political parties, social movements or governments. Qualitative research has repeatedly documented that such collective actors are likely to be characterized by internal factions, personal and ideological rivalry, and charismatic leaders, and thus often change positions for both ideological and strategic reasons. The political behavior of collective actors is therefore highly context-dependent and volatile. As a result, methodological approaches that rely exclusively on cross-sectional correlations and that treat these collective actors as unitary actors are prone to create non-robust and assumption-dependent findings that are not internally valid. Without complementary analyses, in particular of within-case variation, these methodological approaches are inadequate for the causal analysis of institutional change.

To develop our argument, we analytically refine Kreuzer’s (2010) recommendation to take history more seriously by translating it into a methodological argument that consists of two steps. First, we stress the context-dependence of reforms. In particular, we focus on the presence of multiple, non-independent issues on the political agenda of advanced democracies, which allows agents to engage in strategic bargaining, both within and between collective actors. Second, we argue that given strategic behavior on the part of political actors and in the presence of multiple, non-independent issues on the agenda, researchers cannot treat the behavior of collective actors as if it were merely aggregate individual behavior. Instead, researchers need to acknowledge that problems with social choice make it difficult to anticipate collective decisions (Kittel, 2006).

We subsequently discuss the two main strategies to overcome these limitations of medium- to large-N cross-sectional analyses. On the one hand, researchers can include further cross-sectional data in the analysis to increase the robustness of their findings (King et al, 1994). For instance, they can test their hypotheses against data from a new set of cases or focus on recurrent events. This strategy does not solve the aforementioned problems but it makes them less likely to bias the causal inference. On the other hand, researchers can use careful within-case analyses instead of or in combination with cross-sectional analyses (Brady and Collier, 2004). For instance, researchers can rely on process-tracing designs to improve the internal validity of their findings. This strategy does not suffer from problems with social choice but its potential for contingent generalizations is more limited. Hence, high internal validity might come at the cost of low external validity. Alternatively, researchers can rely on designs that seek to combine medium- to large-N cross-sectional analyses with the analysis of within-case variation in the framework of multi-method designs. While such research designs are often challenging in practice, they have the great potential to allow researchers to overcome the limitations of medium- to large-N cross-sectional analyses without abandoning the goal of drawing meaningful inferences that can be generalized to a comparatively large population of cases.

At this point we need to clarify what we are not arguing: First, our argument applies only to cases in which the dependent variable is the result of collective decision making in the presence of strategic interactions within and between these collectivities. In contrast, our argument does not apply to individual-level behavior. Second, statistics offers a powerful way of organizing and analyzing data, and we certainly do not reject statistics as a method. Our argument is simply that research designs solely based on cross-sectional correlations are often inadequate for causal analysis in comparative research if the acting agents are collective actors, such as political parties, interest groups or governments. Third, we are not arguing that historical events are essentially unique and thus not comparable. Instead, we argue that while researchers must take context-dependence into account, context-dependence does not make comparison impossible.

This article is organized as follows. We first present four recent contributions about electoral system choice in advanced democracies before World War II and demonstrate that these four contributions develop fundamentally different theoretical propositions while looking at the same cases and trying to explain the same phenomenon. We subsequently argue that these differences are the result of the methodological approaches that the authors of the four discussed contributions have chosen. Following our methodological critique, which we illustrate using examples from the recent debate on electoral system choice, we discuss possible alternative research strategies that allow researchers to overcome some of these limitations. The final section concludes.

Four Arguments about Electoral System Choice

This section introduces four arguments on electoral system choice during the period 1890–1939. The start and end points of the period under investigation were chosen by the authors of these arguments because PR was introduced at the national level for the first time in 1899 (Belgium) and because the onset of World War II – with the accompanying collapse of several democracies using PR – challenged the previous consensus on the democratic virtues of PR. After Belgium, several other countries adopted PR, while others either adopted or continued using some version of majoritarian representation (MR). PR systems allocate seats in parliament roughly in proportion to votes in elections; MR systems allocate a disproportionate share of parliamentary seats to the electorally strongest party. MR systems often produce a single party with an absolute majority, while PR systems force parties to form coalition governments to secure a majority of seats in parliament.

The debate on the determinants of electoral system choice started with Boix’s (1999) extension of the classic Rokkan (1970) argument, which identifies a ‘socialist threat’ as the main determinant for a country’s adoption of PR. Given that MR systems reward strong parties and punish weak ones, the emergence of a new (strong) party could endanger the position of the established ones. More concretely, electorally strong socialist parties could endanger the established parties if the established parties were fragmented (and thus weakened) and if the extension of suffrage to the lower social classes was likely to boost the vote share of the Socialists. In such a situation, the endangered established parties would benefit from a move to PR and, given their still powerful position, would be able to enact PR unilaterally.

Later, Boix (2010) refines the original argument in his response to Kreuzer (2010). He now distinguishes between segmented electoral arenas (the support of a particular party is highly concentrated in a particular geographic area or social sector) and competitive electoral arenas (several parties contend for the vote of at least some fraction of the electorate). In segmented electoral arenas, the established parties favor PR only if the new entrant threatens their electoral hegemony in a certain segment. In more competitive electoral arenas, the position of the established parties is shaped by the extent to which they are dominant in the electoral arena vis-à-vis the other established parties conditional on the entry of third parties. The party that expects to become the focal point around which non-socialist voters will eventually rally has little incentive to support PR. Conversely, established parties that do not expect to become the dominant non-socialist party prefer PR.

The second argument, developed by Alesina and Glaeser (2004), emphasizes the interest of the political left in PR. Given that MR stymied its electoral chances, the political left would benefit from PR. According to Alesina and Glaeser (2004), the political left used strikes, street protests, the threat of violence or simply their increasing political power to force reform against the will of the established parties, especially those representing the old elite. In some countries, the left took advantage of the weakness of the right during national crises, such as the aftermath of World War I. Thus PR was implemented either by the left after it came to power or by the right after it was forced to by a mobilized revolutionary left. In countries where the left lacked the necessary power resources, the right successfully defended MR electoral systems.

Third, Blais et al (2005) highlight two factors that facilitated the shift to PR. First, the spread of democratic ideas, along with the general perception that PR was the fairest electoral system, increased pressure on Western countries to democratize and adopt PR. Second, the presence of a majority run-off electoral system (in contrast to a plurality electoral system) led to a higher number of parties already before the introduction of PR and therefore a more fragmented party system. In these fragmented party systems, the regular occurrence of coalition governments and greater uncertainty about the optimal strategies for winning elections weakened opposition to the introduction of PR. In addition, the presence of smaller, electorally disadvantaged parties meant that some parties strongly favored the adoption of PR. Consequently, many countries adopted PR without much debate.

Fourth, Cusack et al (2007, 2010) argue that PR was adopted in countries with traditions of cooperation and negotiated decision making. These traditions of cooperation encouraged the production of co-specific assets, that is, investments by both companies and workers, where return on investment was possible only in the presence of cooperation between these diverse actors. By the end of the nineteenth century, industrialization turned local workers’ organizations into a national movement and increased the role of the national level in regulatory policymaking. These changes created a collective action problem, because MR did not allow for the proportional representation of all relevant social and economic interests at the national level. Consequently, to restore a negotiation-based political system in which national parties represented all relevant social and economic interests, all major parties supported the adoption of PR.

The four arguments for electoral system choice in advanced democracies before World War II differ widely. While Blais et al (2005) and Cusack et al (2007, 2010) emphasize widespread societal consensus, Boix (1999, 2010) and Alesina and Glaeser (2004) stress conflict among the main social groups. At the same time, however, Alesina and Glaeser (2004) and Blais et al (2005) identify forces of democratization as the drivers of electoral reform, while Boix (1999, 2010) and Cusack et al (2007, 2010) highlight the role of the old political elite, which attempted to protect its position in the established political system through electoral reform. Hence, the arguments are in conflict on these two fundamental dimensions (see Table 1). For Boix (1999, 2010) it is the political right that introduces PR to contain the political left, for Alesina and Glaeser (2004) the political left forces PR upon the political right, while for Blais et al (2005) and Cusack et al (2007, 2010) the political left and right typically agreed on whether to introduce PR. Overall, the four arguments are difficult to reconcile with each other.

Table 1 Recent contributions on electoral system choice

The scholars behind these four arguments provide empirical evidence from roughly 20 Western countries, which supports their argument, either through regression analysis or through short (one paragraph) discussions of each case. In addition, the contributors to this debate generally dismiss competing hypotheses. For instance, Blais et al (2005) and Cusack et al (2007) reject Boix’s (1999) hypothesis of a socialist threat, while Boix (2010) finds no support for Cusack et al (2007).

As this short discussion shows, these authors formulate very different theoretical propositions while looking at the same cases and trying to explain the same outcome. In the following, we submit that these differences are the result of the methodological approach that the authors of the four discussed contributions have chosen. We consider this methodological approach to be inadequate for the causal analysis of institutional change because it does not acknowledge that in the presence of multiple, non-independent issues on the political agenda, the behavior of collective actors cannot be interpreted as if it were merely aggregate individual behavior. Put differently, we submit that taking history seriously implies that collective actors cannot be conceptualized as unitary actors or, in the words of Wendt (2004), likened to ‘persons’. Rather, researchers have to consider the possibly complex decision-making processes within collective actors in their analyses.

Taking History Seriously in Comparative Research

In this section we argue that comparative research suffers from a familiar, but largely ignored problem, namely unpredictability in the case of decision making by collective actors (Capoccia and Ziblatt, 2010). We develop this argument in two steps. First, we emphasize the context-dependence of institutional change. Second, we explain why social choice problems make the methodological approach chosen by Alesina and Glaeser (2004), Blais et al (2005), Boix (1999, 2010) and Cusack et al (2007, 2010) inadequate for explaining electoral system choice in advanced democracies in the period 1890–1939.

The context-dependence of reforms

In democracies, the government is determined by free and competitive elections, while in non-democracies elections do not play the same role. In a similar vein, it is a completely different achievement to introduce PR during a process of democratization than in a country that has been democratic for almost 100 years at the time of the debate. Generally, the roughly 20 Western countries studied in the articles on electoral system choice in the period 1890–1939 differ widely with regard to democratic traditions and institutions. For instance, Italy (1919), the Netherlands (1918) and Switzerland (1918) adopted PR within about a year of one another, but they made the transition to PR in completely different contexts. Switzerland was a stable democracy with universal male suffrage since 1848 and not involved in World War I. The Netherlands made the transition to PR during a process of rapid democratization, which also led to the introduction of universal male suffrage. Unlike the Netherlands and Switzerland, Italy was involved in World War I, suffering many casualties. In addition, democracy in Italy was, despite universal male suffrage, clearly deficient.

Despite these differences in terms of historical context, recent contributions to the debate on electoral system choice treat their cases as if they belong to the same class of units, thus assuming unit homogeneity. Boix (1999) and Cusack et al (2007) compare events in non-democratic Italy in the period 1919–1923 (in 1922, Mussolini became Italian Prime Minister) to events in the democratic United States in the period 1919–1939. In a similar vein, Blais et al (2005) treat the 1925 reform in non-democratic Japan the same way as they treat the 1918 reform in democratic Switzerland. However, these differences in context matter a great deal. Not only are some of these contextual variables in fact part of the theoretical arguments (for example, universal male suffrage in the case of Boix, democracy in the case of Cusack et al), these contexts are also likely to influence actors’ preferences. For instance, Penadés (2008) shows that, depending on the political context and contra Alesina and Glaeser, the political left was not always in favor of PR.

There is also a strong temporal pattern in the data, again pointing to the importance of historical context and the non-independence of different events. As mentioned above, Italy, the Netherlands and Switzerland adopted PR around the same time, at the end of World War I. This is in fact a general pattern: After the pioneer country Belgium, PR was introduced in Finland (after becoming independent) in 1907, followed by neighboring Sweden (1909), followed by neighboring Denmark (1915) and neighboring Norway (1919). Immediately after the end of World War I, PR was introduced in the Continental European countries Austria (1919), France (1919), Germany (1918), Italy (1919), the Netherlands (1918) and Switzerland (1918). Ireland (after becoming independent), Japan and Greece followed in 1922, 1925 and 1932, respectively (Colomer, 2005). This pattern points to the possible role of diffusion processes (Scandinavia), the catalytic role of World War I (Austria, France, Germany, Italy, the Netherlands) and the importance of national independence (Finland, Ireland, Norway). Importantly, these factors are not independent of the variables highlighted in the contributions on electoral system choice discussed above. For instance, it is well-known that World War I led to the strengthening of the (radical) political left and the nationalization of regulatory policymaking (Sassoon, 2010).

Finally, many recent contributions to the debate on electoral system choice fail to take sufficiently into account the fact that electoral system choice was but one among many issues on the political agenda at that time. For instance, Blais et al (2005) argue that the spread of democratic ideas increased the pressure on European governments to adopt PR, Boix (1999) correctly underlines the importance of considering the question of universal male suffrage (even though it is not part of his empirical analysis), Alesina and Glaeser (2004) highlight the increasing strength of the political left (often as a result of World War I), while Cusack et al (2007) emphasize the nationalization of politics. One could also add protection of political minorities, secret ballots, boundaries of constituencies, republicanism, female suffrage, one- or two-chamber systems and ticket methods. Generally, the adoption of PR took place in the context of intensive democratization, in the process of independence or during periods of political turmoil (for example, in the aftermath of World War I).

The decision to adopt PR was often part of comprehensive reforms that included multiple issues (Dodd, 1910, 1911). Given the complexity of these reform packages, the effect of proposed reforms was often impossible to predict by proponents and opponents. For instance, Carstairs (1980) and Alexander (2004) document that France changed its electoral system multiple times between 1870 and 1940, with French political parties adapting their position on electoral system choice frequently while repeatedly failing to correctly predict the outcomes of electoral system reforms. In addition, PR was often not the main concern of party leaders. In the intense French discussions around 1900, PR was second to the choice of ticket methods (Garner, 1913) and for socialist parties the expansion of male suffrage was generally the most important democratization issue (Boix, 1999). In Sweden, PR was introduced as a result of a political compromise that traded universal male suffrage (main demand of the Socialists and Liberals) for the introduction of PR (main demand of the Conservatives) (Särlvik, 2002). In Switzerland, the introduction of PR was just one among a series of demands of the labor movement in the 1918 general strike that ultimately led to the introduction of PR. Other core demands included universal female suffrage, the introduction of a public old-age pension scheme and shorter working hours (Lutz, 2004). In addition, Switzerland did not introduce PR at the federal level by means of a parliamentary decision, but by means of a constitutional amendment following a popular initiative.

Hence, PR has been adopted in very diverse political contexts (autocracy, democratization, independence, direct democracy, in the aftermath of World War I) and in periods in which electoral system choice was only one of several issues on the political agenda. Put differently, political actors had plenty of opportunities for strategic maneuvering and for combining non-independent political issues in unexpected ways. As we argue in the next section, social choice problems are very likely in the presence of multiple political issues and strategic interactions. If collective actors such as political parties allow for some sort of collective decision making, the presence of multiple non-independent political issues makes the anticipation of these decisions very difficult without relying on in-depth historical case studies.

Social choice problems

Comparative politics research has been subjected to increased criticism in recent decades (Shalev, 2007). We add to this literature by arguing that ‘taking history seriously’ often undermines medium- to large-N cross-sectional analyses in comparative politics. In a nutshell, we argue that medium- to large-N cross-sectional approaches in comparative politics often make theoretical assumptions that are most certainly oversimplifications.

How do macro-phenomena like industrialization or socialist threats lead to a political reaction? In comparative research, these questions are typically answered using Coleman’s (1990) ‘bathtub’, which relies on micro-level behavior to account for macro-level phenomena. For instance, using an example from Boix (1999), it could be argued that a socialist threat made members of the Belgian Liberal Party worry about the future of their party. These members then reacted to the threat by demanding a change of strategy. Finally, the new preferences of party members were aggregated through an intra-party decision-making process into the party’s position on electoral system change.

Unlike historically oriented empirical research, in medium- to large-N cross-sectional analyses in comparative politics these micro-foundations are hardly ever analyzed. Rather, researchers simply compare the variable claimed to influence individual behavior (for example, the level of the socialist threat) with the collective decision (for example, the party’s position on electoral system change). As a consequence, researchers empirically compare macro-phenomena, while they make theoretical assumptions about the micro-foundations that produce the observed macro-relationships. Historians are rather critical of such macro-macro comparisons. As argued by Roberts (1996, p. 16), ‘historians do not explain the occurrence of complex historical events by subsuming them under covering laws’. Rather, ‘they explain the occurrence by tracing the sequence of events that brought them about’. Of course, this methodological stance implies that historians emphasize internal validity at the expense of external validity.

Macro-macro comparisons can give rise to ecological fallacies. Consider the example of the introduction of PR in Germany. On the basis of their analysis of cross-national data, Cusack et al (2007, 2010) conclude that Germany introduced PR in 1918 to protect investments in co-specific assets after industrialization had increased the role of the national level in regulatory policymaking. To substantiate this argument, they demonstrate that an indicator of ‘pre-industrial coordination’ is highly correlated with the effective electoral threshold in a sample of approximately 20 countries. They thus correlate two macro-phenomena.

Leemann and Mares (2014) provide an alternative test of the theoretical argument using the individual voting behavior of legislators at the introduction of PR. They examine whether variation in co-specific assets at the district level can explain the voting behavior of the legislators representing these districts in the German parliament. Hence, they focus on the individual level. In their careful empirical analysis, they find no such effect and thus conclude that investments in co-specific assets cannot explain the introduction of PR in Germany despite the observed correlation between pre-industrial coordination and PR at the macro-level.
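
The gap between a macro-level correlation and a micro-level null result can be illustrated with a small synthetic example. The following Python sketch uses invented numbers for three hypothetical countries with four districts each; it does not reproduce data from Cusack et al (2007, 2010) or Leemann and Mares (2014), and the variable names (asset score, PR support score) are assumptions made for illustration only. Country averages are perfectly correlated, while within every country the two variables are uncorrelated.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when the covariance is zero."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if cov == 0:
        return 0.0
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented data: (asset score, PR support score) for four districts per country.
COUNTRIES = {
    "A": [(0, 2), (1, 0), (2, 0), (3, 2)],
    "B": [(10, 7), (11, 5), (12, 5), (13, 7)],
    "C": [(20, 12), (21, 10), (22, 10), (23, 12)],
}

# Macro level: correlate the country averages of both variables.
mean_assets = [sum(x for x, _ in d) / len(d) for d in COUNTRIES.values()]
mean_support = [sum(y for _, y in d) / len(d) for d in COUNTRIES.values()]
macro_r = pearson(mean_assets, mean_support)

# Micro level: correlate the two variables across districts within each country.
within_r = {
    name: pearson([x for x, _ in d], [y for _, y in d])
    for name, d in COUNTRIES.items()
}

print("correlation of country averages:", macro_r)
print("within-country correlations:", within_r)
```

In this constructed case, an analyst who only observes the three country averages would see a strong association, although no district-level relationship exists in any of the three countries.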

Ecological fallacies typically occur when the theoretical argument and the empirical analysis do not focus on the same level of analysis. For instance, an argument about individual voting behavior is analyzed using district-level data. However, in comparative politics, theoretical arguments often focus on collective actors such as parties that are supposed to do the ‘acting’. Put differently, comparative researchers often care not so much about individual voting behavior but rather focus on the preferences and activities of collective actors such as political parties. In the example discussed above, the level of socialist threat determines whether a given party supports the introduction of PR. The issue is thus not one of individual voting behavior but rather how collective actors determine their position on a given issue. But here, too, we run into serious problems if we do not take history seriously.

It is well-known from social choice theory that the outcomes of collective decision-making processes are difficult to predict in the presence of strategic behavior by multiple groups and several issues on the agenda (Nurmi, 1999). It is difficult because ‘most of the general situations either lack an analytical equilibrium solution, or face an infinite variety of them’ (Kittel, 2006, p. 660). Simple alterations of the decision-making process suffice to change the outcome, and it is quite possible that a minority prevails over the majority in a collective decision-making process. However, despite its fundamental importance, the relationship linking micro-behavior and macro-outcomes is rarely discussed.

Consider the example of the Belgian Catholic Party, which used its absolute majority in parliament to make Belgium the first country to introduce PR in 1899 (Ahmed, 2010). Sixty-five Catholics and 5 Liberals supported the 1899 law introducing PR, 35 Catholics, 7 Liberals and 21 Socialists opposed it (Mahaim, 1900, p. 397). This result is remarkable for several reasons, among them the fact that it was mainly the Liberals who were facing the socialist threat and not the Catholics, or the fact that the opposing parties benefitted from the reform, while the Catholics went on to suffer heavy losses in terms of parliamentary representation. Boix (2010, p. 411) ignores these anomalies and considers Belgium a case that confirms his theory because it was mostly urban Catholics (facing some socialist threat) who supported PR, while rural Catholics (facing no socialist threat) opposed PR. However, Boix (2010) fails to mention that these vulnerable urban Catholics were a clear minority in the Catholic Party (Ahmed, 2010, p. 1082). How can a minority beat the majority?

What seems puzzling at first sight can be resolved once it is accepted that social choice problems are often unavoidable. For instance, the Ostrogorski paradox, a classic social choice problem, demonstrates how a minority can prevail in a decision-making process based on voting if several, non-independent issues are considered at the same time instead of one issue after another. Table 2 shows a stylized example of the paradox using four groups of different sizes (rural and urban Catholics from the South and the North of Belgium, respectively). Together, the rural Catholics form a majority (60 per cent) within the Catholic Party. The four groups have to decide whether to support a law that would introduce PR. However, this reform is not exclusively about the introduction of PR. Rather, it has important additional effects. A minority within the Catholic Party believes that it would help restore social peace, while others believe that it would be wise to save the Liberal Party, so it could play the role as a buffer against the Socialists. In the example displayed in Table 2, the draft law is rejected in all three cases (with a majority of 60 per cent) if the four groups vote on each of the three issues separately (Columns 3–5). However, if the four groups first form a decision on all three issues before they vote (last column), the draft law introducing PR is adopted. Hence, by considering all three issues at the same time, the outcome of the collective decision-making process changes.
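
The logic of the paradox can be checked with a few lines of Python. The sketch below is a minimal illustration under stated assumptions: the group shares and the positions of each group on the three issues are hypothetical values chosen to be consistent with the description above (a rural majority of 60 per cent, rejection by 60 per cent on every single issue); they are not taken from Table 2, and the issue labels are shorthand introduced here.

```python
# Hypothetical shares (per cent of the party) and positions on three issues.
# True means the group favours the draft law with regard to that issue.
ISSUES = ("introduction of PR", "social peace", "Liberal buffer")

GROUPS = {
    "rural North": (40, (False, False, False)),
    "rural South": (20, (True, True, False)),
    "urban North": (20, (True, False, True)),
    "urban South": (20, (False, True, True)),
}


def support_by_issue(groups):
    """Share in favour when the party votes on each issue separately."""
    return [
        sum(share for share, stance in groups.values() if stance[i])
        for i in range(len(ISSUES))
    ]


def support_for_package(groups):
    """Share in favour when each group first forms an overall position.

    A group backs the law if it favours it on a majority of the issues.
    """
    return sum(
        share
        for share, stance in groups.values()
        if sum(stance) > len(stance) / 2
    )


issue_support = support_by_issue(GROUPS)
package_support = support_for_package(GROUPS)

for name, share in zip(ISSUES, issue_support):
    print(f"{name}: {share} per cent in favour")
print(f"all issues considered together: {package_support} per cent in favour")
```

With these assumed values, each issue taken separately is supported by only 40 per cent, yet 60 per cent of the party backs the law once every group weighs the three issues together. Changing a single position in `GROUPS` is enough to make the paradox disappear, which illustrates how sensitive the aggregate outcome is to the details of the decision-making process.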

Table 2 The Ostrogorski paradox and the introduction of PR

While appearing complex at first sight, Table 2 in fact strongly simplifies the situation the Catholic Party encountered in late nineteenth-century Belgium. The reality was considerably more complicated with interventions by King Leopold, violent street protests that led the Catholic Party to withdraw its first proposal (which was more beneficial for them), the role of the previous electoral system, the plan of the Catholic Party to use the Liberals as a buffer against the Socialists and the role of the geographical distribution of votes, which gave the Catholics an incentive to support PR (Mahaim, 1900; Ahmed, 2010). Given this large number of non-independent political issues, there is an almost unlimited potential for strategic behavior by political actors. Owing to the resulting social choice problems, the outcome of the aggregation process within the Catholic Party becomes basically impossible to predict outside its historical context. The Ostrogorski paradox could thus explain how a minority can prevail over a majority in the presence of multiple issues on the political agenda, but the example also shows how dangerous it is to make simplifying theoretical assumptions about intra-party collective decision making. If Boix (1999, 2010) were really correct, the Catholic Party, led by its rural majority, would have let the Liberal Party perish instead of deliberately saving it (Ahmed, 2010, p. 1079).

In sum, comparative research that relies exclusively on medium- to large-N cross-sectional correlations among variables as the source of causal inference quickly runs into problems. Occasionally, comparative arguments can be reformulated in a way that allows for an examination of individual voting behavior (cf. Leemann and Mares, 2014). However, this strategy has three significant drawbacks: it abandons the comparative perspective, it raises issues of data availability and it is limited to cases in which the comparative argument can be translated to the individual level. But what if the acting agent is an interest group or if there is no formal vote in parliament?

In comparative research, theoretical arguments often focus on the behavior of collective actors, such as political parties, interest groups or governments. If a theoretical argument expects the introduction of PR to be a function of, say, the strength of the union movement, this argument could be examined by correlating union movement strength with indicators capturing the proportionality of the electoral system. However, unions are collective actors and thus prone to social choice problems, in particular in the presence of multiple non-independent issues on the political agenda (as was certainly the case at the time of the introduction of PR). As a result, union behavior becomes increasingly difficult to anticipate and, by itself, a medium- to large-N cross-sectional analysis cannot give a conclusive answer to the question of whether union movement strength is a cause of the introduction of PR.

Alternative Research Designs

The critique presented in the preceding pages does not imply that researchers should abandon the analysis of electoral system choice. Rather, researchers have to find better research designs. In the following, we briefly discuss possible strategies to overcome the limitations of medium- to large-N cross-sectional analyses. We illustrate each strategy, if possible, using an example from the recent debate on electoral system choice.

Continue to rely on covariation but include further data

Following the suggestions by King et al (1994), the first set of strategies denotes research designs that continue to rely on covariation but try to overcome problems of social choice by including further data. For instance, researchers can test their hypotheses against data from a new set of cases or focus on recurrent events. These strategies do not fundamentally solve the aforementioned problems but they make them less likely to bias the causal inference.

Testing the observable implications of a theory against data from new cases is standard advice in methodology textbooks. However, this strategy is often difficult to follow in comparative politics because the sample of countries analyzed is typically identical to the population of cases. Put differently, there are no new cases against which hypotheses could be tested. This is also the case for the recent contributions to the debate on electoral system choice. Although there are disagreements with regard to the scope conditions that define the population of cases (Boix, 1999; Cusack et al, 2007; Kreuzer, 2010), they all test their hypotheses against data from a sample of countries that is identical to their population of cases.

Hence, we are not familiar with an analysis using this strategy in the literature on electoral system choice. However, Geddes (2003, pp. 106–114) provides a classic example from a different debate. Using data from Latin American countries, Geddes rejects Skocpol’s (1979) famous arguments about international warfare, state breakdown and social revolutions. However, as Mahoney and Goertz (2004, p. 665) show, Geddes’ rejection is not valid. Skocpol limited her argument to politically ambitious agrarian states that, unlike Latin American countries, have not experienced colonial domination (Skocpol, 1979, pp. 33–42). Hence, Geddes’ analysis violates Skocpol’s scope conditions. The same problem is likely to apply to tests of the theories presented above because the sample of cases analyzed corresponds to the entire population of cases.

Alternatively, researchers can test hypotheses against new data from the same set of countries. A good example is Boix’s (1999) theoretical argument about the determinants of electoral system choice. According to Kreuzer (2010, p. 381), Boix’s argument has at least five observable implications, although Boix (1999) only tests two of them against empirical data. The other three implications are the concomitance of electoral system reform and suffrage extension, the initiation of the reform by the ruling parties and consensus among the ruling (that is, established) parties. As Kreuzer (2010, p. 380) notes, only 9 out of 24 cases match all 5 of the causal links in the overall argument. For instance, in Austria and Denmark the ruling parties did not initiate the reform, while in Germany and Switzerland universal male suffrage was introduced several decades before the switch to PR.

An alternative strategy to arrive at new data is to change the level of analysis. Federal political systems such as Germany or Switzerland allow for the analysis of PR adoption at the sub-state level (Wuarin, 1895; Barber, 1995). The Swiss case is particularly interesting. Not only was the Swiss canton of Ticino the first political unit to introduce PR in 1890, there was also great variation among the (then) 25 Swiss cantons. Some Swiss cantons still use MR for cantonal parliamentary elections, and cantonal rules for the election of cantonal governments and of representatives in the two federal parliamentary chambers vary considerably. Given the considerable political autonomy Swiss cantons enjoyed at the turn of the century as well as their pioneering role with regard to the introduction of PR, this might be a promising avenue for research.

Changing the level of analysis has the advantage of providing new cases without violating the scope conditions, while keeping numerous factors constant. However, it is questionable to what extent the results are translatable to the level of nation-states. For instance, Lutz and Zila (2009) in their analysis of the introduction of PR in Swiss cantons find little evidence that the threat from socialist parties played much of a role. Instead, they highlight the pattern of competition among right parties as well as the coordination capacities and needs within cantonal political systems. They argue that in Swiss cantons PR was typically introduced in situations where political coordination was already a part of the electoral game because of majoritarian elections in multi-member districts. However, these findings are translatable to the level of nation-states only if there is good reason to assume that there is no ‘nation effect’ (Lieberson, 1985, pp. 110–115). Put differently, these findings are relevant only if it does not matter whether the findings from the sub-state level analysis are from the United States, Switzerland or Germany.

Alternatively, given the difficulties with the behavior of collective actors, researchers might be best served by turning to individual-level data, thereby circumventing the collective decision-making problems that bedevil comparative research (Ziblatt, 2008). In the debate on electoral system choice, Leemann and Mares (2014) correlate district-level data such as the vulnerability of a given legislator’s seat to the rise of social democratic candidates and the skill profile of a given district with the legislator’s voting behavior in parliament. They show that the skill structure of a district has no effect on legislators’ voting behavior, while the vulnerability to the rise of social democratic competitors has considerable explanatory power. There are, however, also notable disadvantages of this strategy. Most importantly, it allows researchers to examine only parts of certain causal mechanisms (those that are linked to individual-level behavior such as voting in parliament), presupposes the availability of data and, finally, does not examine whether generalizations hold across cases to which they are supposed to apply.

Instead of finding new cases or changing the level of analysis, researchers can also try to overcome collective decision-making problems by having multiple observations for each case. The logic behind this strategy is straightforward: In the case of recurrent collective decision making, the average outcome is likely to approximate the expected value if there is no systematic bias that affects the outcome over the long run. Hence, although some error (because of social choice problems) is often unavoidable, estimates are unbiased if this error is not correlated with the error term (see Footnote 4).
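The averaging logic can be illustrated with a small simulation, offered here as a minimal sketch rather than as part of the original analysis. The function name, the parameters and the assumption of normally distributed, independent noise are all hypothetical. Each simulated decision equals an expected value plus random noise (standing in for unpredictable social choice effects) and, optionally, a systematic bias.

```python
import random
import statistics


def simulate_mean_outcome(expected, noise_sd, n_decisions, bias=0.0, seed=0):
    """Average outcome of repeated collective decisions (illustrative only).

    Assumptions (not taken from the article): each decision equals the
    expected value plus an optional systematic bias plus independent
    normal noise with mean zero and standard deviation `noise_sd`.
    """
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    outcomes = [
        expected + bias + rng.gauss(0.0, noise_sd)
        for _ in range(n_decisions)
    ]
    return statistics.fmean(outcomes)


if __name__ == "__main__":
    # With few decisions the average can be far from the expected value.
    print("5 decisions:     ", simulate_mean_outcome(1.0, 2.0, 5))
    # With many decisions and no bias the average approaches it.
    print("10,000 decisions:", simulate_mean_outcome(1.0, 2.0, 10_000))
    # A systematic bias does not average out, however many decisions occur.
    print("10,000 with bias:", simulate_mean_outcome(1.0, 2.0, 10_000, bias=0.5))
```

In this sketch, idiosyncratic noise washes out as the number of recurrent decisions grows, whereas a systematic bias shifts the long-run average regardless of how many decisions are observed. This is why the strategy requires both recurrence and the absence of systematic bias.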

Unfortunately, this strategy is difficult to apply to research on electoral system choice because researchers are dealing with a rare event. Only very few countries have switched between MR and PR systems multiple times (and these reforms are likely to be non-independent). However, this strategy is rather common in areas such as welfare state research. For instance, researchers use pooled time-series cross-sectional data to correlate the partisan composition of governments with changes in social expenditure in subsequent years (Huber and Stephens, 2001). In this literature, the assumption is not that left-wing governments always increase social expenditure but rather that they on average do so more often than, say, right-wing governments. Hence, it is quite conceivable that in a given case right-wing governments increase social expenditure more than left-wing governments. However, on average, it is the other way around.

Turn to within-case analysis to replace or complement the cross-case analysis

Following Brady and Collier (2004), the second set of strategies denotes measures to replace or complement cross-case analyses with within-case analyses. For instance, researchers can rely on process-tracing or congruence analysis designs (George and Bennett, 2005; Beach and Pedersen, 2013) to test the micro-level implications of theories. These designs prioritize internal validity, although in the framework of multi-method designs, this focus on internal validity need not necessarily come at the cost of lower external validity.

For instance, historical case studies or small-N comparisons that are more sensitive to historical context and micro-settings are a promising strategy to avoid problematic assumptions about (the absence of) social choice problems (Capoccia and Ziblatt, 2010). Such designs that primarily focus on within-case variation have a series of obvious advantages. Most importantly, they allow researchers to trace political processes over time, thereby avoiding difficult counterfactuals. However, there are also obvious disadvantages. In particular, researchers gain intension at the cost of extension, hence limiting the extent to which cross-case regularities can be identified and their findings can be generalized to a larger set of cases. At the most extreme, single-outcome studies (Gerring, 2007) even abandon the goal of much comparative research, narrowly understood, namely that comparisons are used to control ‘whether generalizations hold across cases to which they apply’ (Sartori, 1991, p. 244).

In the literature on electoral system choice, Ahmed (2010) has used comparative case studies of Belgium and the United Kingdom to demonstrate that at the time of enactment both PR (adopted by Belgium in 1899) and single-member plurality (adopted by the United Kingdom in 1884 and today considered an MR system) ‘were understood as two functionally equivalent alternative safeguards of the position of right parties against the consequences of suffrage expansion’ (Ahmed, 2010, p. 1059). According to Ahmed, Boix (1999) is thus right in emphasizing suffrage expansion and socialist threats, but she argues that Boix misunderstands how the effects of single-member plurality were perceived at the time of enactment. Hence, although Ahmed (2010) partly supports Boix’s theoretical argument, she also modifies it in important ways. However, again, the extent to which Ahmed’s (2010) results are relevant for other cases remains to be answered.

Hence, in recent years, social science methodologists have begun to push for research designs that seek to combine medium- to large-N cross-sectional analyses with the analysis of within-case variation in the framework of a multi-method design (Brady and Collier, 2004). While such research designs are often challenging in practice and numerous methodological questions remain, they have the great potential to allow researchers to overcome the limitations of medium- to large-N cross-sectional analyses without abandoning the goal of drawing meaningful inferences that can be generalized to the population of cases.

Research designs that emphasize within-case analysis have the drawback that their findings cannot be easily generalized to a larger set of cases. Hence, researchers may decide to combine medium- to large-N cross-sectional correlations among variables with historical case studies based on the analysis of within-case variation (Lieberman, 2005; Collier et al, 2010). Cusack et al (2010) have adopted this strategy in response to Kreuzer (2010). They have complemented their original cross-sectional analysis with a within-case analysis of three German states (Länder). However, their example also illustrates the dangers of multi-method designs: Cusack et al (2010) fail to justify their case selection for within-case analysis, their case studies lack depth and it remains unclear to what extent their findings can be translated to the level of nation-states.

In general, multi-method research designs face the formidable challenge of case selection: How are researchers to select cases for within-case analysis if they do not know whether the cross-sectional analysis on which the case selection is based is sound (Rohlfing, 2008)? Although scholars have developed sophisticated case selection techniques in recent years (Seawright and Gerring, 2008), the quality of the case selection is ultimately a function of the quality of the preceding cross-sectional analysis. This problem becomes apparent once the complex relationship of dependency is taken into account: The cross-sectional analysis determines the case selection for the within-case analysis, while the within-case analysis is supposed to evaluate the validity of the cross-sectional analysis.

Alternatively, researchers can start with the analysis of within-case variation before they analyze the extent to which their findings can be generalized by means of a cross-sectional analysis. This strategy allows researchers to bypass some of the problems listed above. For instance, Ahmed (2013) uses four detailed historical case studies to develop her arguments about the relationship between the impact of democratization, the existential threat posed by socialist parties and the reform of electoral systems. She then creates a typology to show how the processes she observes in the four cases are representative of developments in another 14 Western countries. Thus, by carefully combining historical case studies with typological analysis, Ahmed (2013) succeeds in drawing meaningful inferences that can be generalized to the population of cases.

Overall, it is our firm conviction that successful multi-method research in comparative politics stands and falls with the researchers’ ability to repeatedly switch between within-case and cross-sectional analysis, thereby engaging in a dialogue between theory and evidence – in their own research as well as together with other researchers. The literature on electoral system choice is exemplary in this regard. The exchange between Ahmed (2010, 2013), Boix (1999, 2010), Kreuzer (2010) and Leemann and Mares (2014) relied on a multitude of research designs. Yet, together they have advanced the literature on the politics of electoral system choice in important ways. In these exchanges, the contributions that were particularly attentive to history played important roles in analyzing, among other things, the actors’ preferences, how actors perceived their choices and how the causal mechanisms unfolded.

Conclusion

The debate on electoral system choice perfectly illustrates our general methodological criticism of comparative research that is solely based on the analysis of cross-sectional variation and in which the main acting agents are collective actors, such as political parties, social movements or governments. First, the number of cases covered and the data analysis techniques used are quite typical for this kind of research. Second, the theoretical arguments put forward by these researchers are, although interesting, clearly contradictory. Third, all contributors to the debate on electoral system choice claim to provide conclusive empirical evidence that seems to support their arguments. How is this possible?

In this article, we have argued that the problem is the methodological approach these authors have chosen. In a nutshell, we have argued that reforms of the electoral system typically take place in a historical context that is characterized by the presence of multiple, non-independent issues on the political agenda, which allows political actors to engage in strategic behavior. In such situations, researchers cannot treat collective behavior (of collective actors such as parties or governments) as if it were merely aggregate individual behavior. Instead, researchers need to acknowledge that problems with social choice make it difficult to anticipate collective decisions. As a consequence, methodological approaches that rely exclusively on medium- to large-N cross-sectional correlations among variables as the source of causal inference and that treat these collective actors as unitary actors are prone to create non-robust and assumptions-dependent findings.

In the last section, we have presented several alternative research designs that can overcome some of the limitations of such comparative research. However, as we have noted, all of them have their own weaknesses. Given the research interests in comparative research and the typical data limitations, multi-method designs combining the analysis of both cross-sectional and within-case variation seem most promising. Hence, we believe that the future of comparative research stands and falls with the ability of researchers, as research teams or through constructive exchanges between research teams and possibly across disciplines (Elman and Elman, 2001), to find ways to combine cross-sectional and within-case analysis in a fruitful way.