1 Introduction

A major development in contemporary international affairs has been the creation of international dispute settlement institutions – international courts, tribunals, and arbitration bodies – to address all types of global disputes. Even prominent international law skeptics have acknowledged that “(i)n the past few years, international dispute resolution has assumed an unprecedented prominence in international politics” (Posner and Yoo 2005, 3). In the economic realm alone, for example, we can identify many active dispute settlement bodies, both prominent and less prominent. The dispute settlement mechanism (DSM) of the World Trade Organization (WTO) receives the most attention due to its central place in the multilateral trade regime and its unique features (e.g., Bernauer et al. 2012). Yet numerous regional and bilateral dispute settlement institutions also exist, many of which have compiled an impressive record. Most notable is the European Court of Justice (ECJ), whose massive caseload and landmark rulings have been a major part of European integration. Similar courts have arisen in other regions: 11 “operational copies” of the ECJ now exist worldwide, and collectively they have issued more than 2,100 rulings (Alter 2012, 136, 140). Likewise, the United Nations-sponsored Integrated Database of Trade Disputes for Latin America and the Caribbean (IDATD) indicates that more than 1,100 disputes have gone before various dispute settlement bodies hemisphere-wide since 1995.Footnote 1 The record is impressive elsewhere too, such as in eastern and southern Africa, even if it is less well known.Footnote 2

The creation of these dispute settlement institutions, and the rules by which they can be used, are spelled out in accompanying international treaties. Most notable international agreements contain dispute settlement provisions, but these provisions vary tremendously. Not all of them create a powerful, heavily utilized court. Instead, some provide for ad hoc arbitration, specify mediation by a third party, encourage bilateral negotiations, or combine these options. Moreover, the precise rules governing each settlement option vary much more than is commonly acknowledged. Even within arbitration, the most common option, one observes widespread variation in the specified timelines, the rules governing choice of forum, the selection of arbitrators, and the types of post-award sanctions allowed, among other features.

These design features can be consequential. They may affect, for example, whether states want to pursue formal dispute settlement at all or whether some venues are more likely to be used than others. Furthermore, the mere presence of a powerful judicial body might encourage “out of court” settlements or deter objectionable behavior in the first place. However, the rules also can have unintended consequences if they are used more frequently than envisioned or entail greater costs than anticipated.Footnote 3 Thus, how future disputes are handled, and whether they are resolved to the satisfaction of aggrieved parties, depends heavily on what is contained in an agreement’s dispute settlement provisions.

To understand why states sometimes allow for strong, legal dispute settlement, we investigate variation across the dispute settlement rules of nearly 600 international agreements signed during the past 60 years. Following others, we focus on preferential trade agreements (PTAs), which contain the full landscape of dispute settlement design choices. Additionally, they are numerous, politically relevant, and signed by nearly all countries – which makes findings about them generalizable to other issue areas and types of agreements. Our efforts to explain such wide variation in dispute settlement design are unique in several ways. First, we examine all PTAs signed since World War II – not just regional trade agreements or those currently in force – and thus we incorporate hundreds more agreements than other studies do. Second, we depart from simpler conceptualizations of DSMs and instead re-orient our theorizing around the notion that DSMs are procedures that help to enforce the terms of agreements. Relatedly, whereas other studies simply examine whether or not selected agreements have a DSM or whether it is “legal” versus “political,” we develop a richer, multi-faceted empirical measure of DSM strength that draws upon newly compiled data. Finally, we engage with and reconcile seemingly disparate explanations for institutional design (rational design, state power, and regionalism), which typically are portrayed as hostile to, or even incompatible with, one another.

Our unifying claim is that stronger DSMs are primarily a rational response by members to the heightened demands of “deeper” agreements with “wider” memberships, but that considerable room exists for signatories to craft agreements to fit the unique preferences of agreement members. Drawing on well-developed, but often untested, arguments about international cooperation, we expect that all states should incorporate stronger dispute settlement rules within agreements that have more members and entail significant obligations. Indeed, the strongest finding from our multivariate empirical tests is that the deepest agreements – those that require the greatest policy change (in our case trade liberalization, market access, and harmonization) – also incorporate the strongest dispute settlement provisions. A similarly robust finding is that DSM strength increases with the number of members. Yet we uncover clear evidence that agreements also reflect power-based considerations. As compared to more balanced agreements, we find that North–South agreements are much more likely to contain strong DSM language, as are agreements involving the United States. As for regional patterns, we discover that PTAs signed in the Americas contain stronger dispute settlement rules, as do agreements among Asian partners, contradicting widespread perceptions of a litigation-averse “Asian way.” In sum, treaty features such as depth and membership size play a major role in pushing all states toward stronger dispute settlement rules, but power constellations and regional differences also shape dispute settlement design.

2 Theoretical foundations

2.1 A wider lens on international dispute settlement

Much scholarship on international dispute settlement focuses quite logically on the record of prominent courts and tribunals. Numerous studies have investigated the workings of bodies like the International Court of Justice (e.g., Paulson 2004), the ECJ (e.g., Stone Sweet and Brunell 1999), and the DSM of the WTO (e.g., Bown and Pauwelyn 2010), as well as the Andean Tribunal of Justice (e.g., Alter and Helfer 2010) and NAFTA dispute settlement panels (e.g., De Mestral 2006). These institutions deserve such attention: they are active, address important disputes, and can impose costs on non-compliant states. But they represent the tip of the iceberg among international dispute settlement institutions – the most visible and active of such bodies in a global ocean of dispute settlement venues. It is important to consider these bodies in a broader context to avoid drawing conclusions that are either biased or perhaps overly optimistic.

One way to broaden our view of international dispute settlement is to consider all of the other, sometimes lesser-known, dispute settlement institutions to which states may turn. Many such options exist, generating what Eric Posner (2009, 150) has called a “…bewildering jungle of judicial, quasi-judicial, and advisory bodies, some global and some regional or bilateral, with overlapping jurisdiction and no hierarchical structure to ensure uniformity in the law.” On the surface, this expanding network of legalized institutions may be cause for celebration, since it facilitates the rule-based resolution of international disputes according to legal principles. But this potential legal fragmentation could produce competing and perhaps contradictory legal regimes (e.g., Raustiala and Victor 2004; Alter and Meunier 2009) and allow states to forum-shop and to exploit differences in obligations and legal interpretation across venues.Footnote 4 Thus the concern is that disputes might be resolved not strictly according to some objective rule of law, but by the power politics that determine the rules governing dispute settlement and who can select among competing rules (see also Drezner 2013).

This also suggests the need to understand the origins of these various dispute settlement venues and how the rules for their use are written. Thus a second way of broadening our approach to international dispute settlement is to focus not just on the caseload of dispute settlement bodies, but also on the earlier stage of their creation, as reflected in the treaty provisions specifying the conditions under which they may be used. For our purposes, we cannot fully understand the efficacy of a dispute settlement mechanism without taking into account its design features (Mitchell 1994). For instance, international agreements designed with a (stronger) dispute settlement mechanism may (Peinhardt and Allee 2012; Berger et al. 2010) or may not (Gray and Slapin 2013; Kono 2007) be more effective, or have different effects, than those without such mechanisms (Haftel 2012).

2.2 International institutional design

There has been a recent proliferation of scholarship on international institutional design, that is, the particular rules and characteristics that member-states include within formal international organizations, international treaties, and more informal arrangements. These studies are largely motivated by the observation that “major institutions are organized in radically different ways” (Koremenos et al. 2001: 761). Early work lays out theoretical claims and establishes important concepts (e.g., Aggarwal 1998; Haftel and Thompson 2006; Koremenos et al. 2001). A few empirical studies explore design variation across a wide range of international institutions (e.g., Hooghe et al. 2014; Koremenos 2007; Koremenos and Betz 2013), but the dominant trend has been toward empirical work that examines institutional design within particular issue areas or types of institutions. Among the more vibrant domains for empirical institutional design scholarship are international environmental agreements (e.g., Bernauer et al. 2013; Marcoux 2009), regional organizations (e.g., Acharya and Johnston 2007; Haftel 2012, 2013; Smith 2000), and economic institutions ranging from the WTO (Pelc 2009) to PTAs (Dür et al. 2014) to bilateral investment treaties, or BITs (Allee and Peinhardt 2010, 2014).

Perhaps the most studied design feature of international institutions is dispute settlement provisions, the literature on which is diverse both methodologically and substantively (e.g., Arnold and Rittberger 2013; Rosendorff 2005; Yarbrough and Yarbrough 1997). The recent trend is toward quantitative work on single issue areas, ranging from maritime boundary agreements (Ásgeirsdóttir and Steinwand 2014) to bilateral investment treaties (Allee and Peinhardt 2010, 2014). Regional and preferential trade institutions are pre-eminent within the dispute settlement design literature, with DSM design being the primary (Jo and Namgung 2012; Smith 2000; also Chase et al. 2013; Porges 2011) or secondary (Mansfield and Milner 2012; Haftel 2012, 2013) dependent variable in many studies. Among the few studies that look across all types of agreements, a common finding is that trade and/or regional institutions are more likely than security or environmental organizations to have a dispute settlement provision (Koremenos 2007) or to allow for international arbitration or adjudication (Hooghe et al. 2014; Koremenos and Betz 2013). All of this suggests that much of the variation and innovation in dispute settlement design can be found in economic agreements.

2.3 Moving beyond legalization

The literature on international legalization (e.g., Goldstein et al. 2000; Keohane et al. 2000) is a logical starting point for thinking about dispute settlement, particularly the legalization sub-component of “delegation” (Abbott et al. 2000). One might claim that “legal delegation” entails allowing disputes to be resolved by third parties according to legal principles, rather than through political or diplomatic interactions between the parties themselves.

Correspondingly, empirical studies have leaned heavily on legalization as the singular defining feature of DSMs (e.g., Jo and Namgung 2012; Kono 2007; Smith 2000; also Chase et al. 2013; Morgan 2008; Porges 2011). These studies ask whether an international treaty or agreement allows for “legal” as opposed to “political” (or “diplomatic”) settlement of disputes, and thus the resulting empirical measure is typically a binary or scaled indicator of whether or not a DSM is “legalistic.” In what is perhaps the most thoughtful treatment in the literature, Smith (2000) measures the DSMs of 62 regional organizations on a scale ranging from “diplomacy” to “legalism.” He incorporates five variables: whether the DSM allows for legal dispute settlement (third-party review), whether it has a permanent or ad hoc body, whether rulings have direct effect, whether they are binding, and whether private actors have standing. Subsequent work draws heavily on Smith in measuring DSM legalism, whether used as an independent (Kono 2007) or dependent (Jo and Namgung 2012) variable.Footnote 5
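To make the structure of such indices concrete, the sketch below codes a DSM profile on the five components just listed. It is a hypothetical simplification: we assume each component is binary and that components are summed with equal weights, whereas Smith's actual coding may be ordinal rather than binary and need not aggregate the components this way. The class name, field names, and example profiles are illustrative only and are not drawn from any existing dataset.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class DSMLegalism:
    """Hypothetical binary coding of the five legalism components in Smith (2000).

    Assumption: each component is coded 0/1 and weighted equally.
    """
    third_party_review: bool
    permanent_body: bool
    direct_effect: bool
    binding_rulings: bool
    private_standing: bool

    def score(self) -> int:
        # Unweighted additive index: 0 = purely "diplomatic", 5 = fully "legalistic".
        return sum(int(flag) for flag in astuple(self))


# Illustrative profiles, not coded from any actual agreement.
diplomatic = DSMLegalism(False, False, False, False, False)
ad_hoc_panel = DSMLegalism(True, False, False, True, False)
court = DSMLegalism(True, True, True, True, True)

for name, dsm in [("diplomatic", diplomatic), ("ad hoc panel", ad_hoc_panel), ("court", court)]:
    print(f"{name}: {dsm.score()}")
```

Even in this toy version, a permanent court in practice also switches on third-party review and binding rulings, so several components tend to move together; this is the overlap problem raised in the next paragraph.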

The above schemes capture DSM legalism well, but they have several limitations. First, there is little variation across some legalism components. For example, explicit mentions that awards are not binding are extremely rare; we find only two PTAs (out of 589) in which the parties explicitly say that awards are not legally binding. Second, the few components that comprise legalization overlap considerably. For instance, a 1) permanent dispute settlement body is by definition one that engages in 2) third-party review and that will issue 3) binding judgments.Footnote 6 Third, the legalization concept suggests, intentionally or not, a dichotomy that does not exist in practice. When legal dispute settlement (arbitration or adjudication) is provided for as an option within an international institutional arrangement, it almost always is specified in tandem with consultations or other “out of court” efforts – as a follow-up option in the event diplomatic efforts fail to produce a solution. Thus the “political” (“diplomatic”) and “legal” options are not mutually exclusive, but work as complements. A fourth and related issue is that contemporary international institutions increasingly provide for at least some type of legal dispute settlement.Footnote 7 This “move to law” (Goldstein et al. 2000: 385), in fact, is a major part of what sparked the legalization literature. So the conceptual task is not to think about whether a DSM allows for legal dispute settlement, but instead to consider the various elements of such legalism. Fortunately, many empirical legal studies chronicle important variation across legal dispute settlement mechanisms, such as supporting infrastructure, requirements for panelists, decision-making, appeals, and implementation (e.g., Allee and Elsig 2015; Chase et al. 2013; Porges 2011; also Bartels and Ortino 2006; Donaldson and Lester 2009).
Therefore, we build on the idea of legalization, but use it primarily as a launch pad for creating a richer conceptualization of dispute settlement mechanisms.

More fundamentally, we argue that DSMs should be thought of as potential enforcement devices, for which the “strength” of the DSM is central. Legalization is a useful typology, but the main purpose of designing a DSM is not to achieve legalization, but to achieve compliance with the obligations enshrined in agreements. Indeed, trade agreements are commonly depicted as prisoners’ dilemmas in which preventing defection is paramount (Yarbrough and Yarbrough 1992, 1997). Many persuasive accounts of WTO dispute settlement portray the DSM as an enforcement body (Maggi 1999; Schropp 2009; also Sattler and Bernauer 2011). Similar dynamics characterize human rights and environmental agreements, for which the issue of compliance (with the obligations enshrined in a treaty) is crucial (von Stein 2012). Therefore, a major purpose of a DSM is to encourage adherence to an agreement. This, and effective dispute settlement more generally, will be more likely when the DSM is “strong,” that is, when it provides aggrieved parties with tools to challenge successfully the potential non-compliance of other members.

3 Theory and hypotheses

Our explanation for dispute settlement design is informed by existing scholarship, yet we put forward a framework that both draws from and reconciles tensions among existing perspectives. We see a split in the current institutional design literature between rationalist approaches, which treat states as mostly undifferentiated actors that will react similarly to characteristics of the treaty environment, and various other approaches that assume institutional design primarily reflects the individual or collective attributes of the actors – whether they are derived from power, regional identities, or domestic factors. At present the various perspectives ignore, de-emphasize, or criticize one another. We find merit in each as an explanation for DSM design, and believe they can be complementary and mutually reinforcing.

Nevertheless, we posit a hierarchy among institutional design influences. Our overall claim is that DSM design is first and foremost a rational response to treaty characteristics. Thus we expect factors such as agreement depth or membership size to strongly predict dispute settlement design. Yet against this rationalist baseline, powerful states should be able to shape design features to meet their preferences, particularly in asymmetric agreements. Finally, room should exist for regional forces to shape agreement design. In the following sections, then, we articulate the logic of rationalist, power-based, and regional explanations for dispute settlement design and identify the hypotheses that emerge from each approach.

3.1 Rational response to agreement features

One prominent perspective toward international institutional design portrays agreement-design outcomes as a rational response to the nature of the agreement. Some agreements have more members or entail greater obligations, and thus member-states in these settings will design a strong DSM to aid in enforcement. We agree with the basic thrust of this claim, and expect it to explain much of the variation in DSM design. One literature that espouses this type of logic is scholarship on the “rational design of international institutions,” in which institutional design outcomes reflect various “cooperation problems” to which all states will react similarly (Koremenos et al. 2001). Even more directly relevant is game-theoretic work on treaty compliance from the “enforcement” school (e.g., Downs et al. 1996). It maintains that strong DSMs are needed to achieve compliance with agreements that require major policy change, and for which the probability of defection is high.

These arguments are not uncontroversial. First, endogeneity is an ever-present concern in studies that posit that one agreement characteristic should affect another – although this is less of a concern in our case.Footnote 8 Moreover, the “rational design” approach has been criticized on many grounds (e.g., Duffield 2003; Wendt 2001), including its ambiguous and unmeasured concepts, neglect of state power, and lack of quantitative empirical tests (but see Koremenos and Betz 2013; also Copelovitch and Putnam 2014). Also notable are critics from the “managerialist” school of treaty compliance, who reject “enforcement” scholars’ claims that states willfully choose to violate agreements and instead argue that non-compliance results from lack of transparency and limited state capacity (e.g., Chayes and Chayes 1993; Victor et al. 1998). In light of these final two criticisms, we make a notable empirical contribution by carefully testing controversial and unresolved arguments about the relationship between key agreement characteristics (depth, membership size) and dispute settlement design.

The first feature of agreements that should lead to stronger dispute settlement is agreement depth. Most closely linked with the work of Downs et al. (1996), the logic is that “deeper” international agreements promise greater gains from cooperation but also a greater potential for defection. Therefore, those agreements that include the most meaningful commitments, and require the greatest policy changes, are the ones that require some type of strong dispute settlement mechanism. Such mechanisms can compel states to uphold their commitments for fear of adverse rulings by neutral third parties, which can entail reputational costs as well as punishment through various sanctions. Careful tests of this depth-enforcement connection have been elusive, and thus our findings should help to arbitrate the long-standing managerial-versus-enforcement debate. The former would predict no discernible relationship between how “deep” an agreement is and the strength of its DSM, while the latter predicts a strong, positive relationship between the two.

  1. H1a

    International agreements that are deeper, that is, that entail greater obligations, should contain stronger dispute settlement mechanisms.
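The depth-enforcement logic behind H1a can be illustrated with a deliberately stylized compliance condition. This is our own toy sketch in the spirit of the enforcement school, not the formal model of Downs et al. (1996) and not our empirical specification. We assume, hypothetically, that a member's one-shot gain from reneging grows linearly with agreement depth, and that defection is deterred only if the probability of being successfully challenged, multiplied by the resulting sanction, at least matches that gain. The function names and parameter values are illustrative assumptions.

```python
def defection_gain(depth: float, base_temptation: float = 1.0) -> float:
    """Hypothetical one-shot gain from reneging, assumed linear in agreement depth."""
    return base_temptation * depth


def complies(depth: float, p_challenge: float, sanction: float) -> bool:
    """A member complies if the expected cost of an adverse ruling at least matches the gain from defecting."""
    return p_challenge * sanction >= defection_gain(depth)


def minimum_sanction(depth: float, p_challenge: float) -> float:
    """Smallest sanction that makes compliance a best response at a given depth."""
    if not 0 < p_challenge <= 1:
        raise ValueError("p_challenge must lie in (0, 1]")
    return defection_gain(depth) / p_challenge


# Holding the chance of a successful challenge fixed, deeper agreements need stronger sanctions.
for depth in (1.0, 2.0, 3.0):
    print(f"depth={depth}: minimum sanction={minimum_sanction(depth, p_challenge=0.5)}")
```

With the probability of a successful challenge held at 0.5, the minimum deterrent sanction rises from 2.0 at depth 1 to 6.0 at depth 3, so in this toy setting deeper commitments call for a stronger DSM. The managerialist view, which denies that non-compliance reflects a deliberate calculation of this kind, would expect no such relationship.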

Beyond depth, a second feature of international institutions that might shape dispute settlement design is the membership size of the agreement. A frequent claim is that international agreements with more members are more likely to include strong, legal dispute settlement (Yarbrough and Yarbrough 1992; Koremenos 2007). We flesh out the causal link between membership size and design outcomes, and see several reasons why agreements with more members should have stronger DSMs. First, greater numbers make monitoring more difficult, and stronger DSMs have devices that help to monitor members’ compliance with an agreement. Second, disputes that arise in a bilateral arrangement should be easier to deal with directly and informally. By contrast, in agreements with multiple members, disputes may be more complex and involve irreconcilable demands, and thus be more difficult to resolve informally. Stronger DSMs can facilitate successful dispute resolution in these scenarios. Likewise, compliance with any dispute settlement outcome (formal or informal) may be more difficult to secure in multi-member agreements, and more advanced DSMs have devices (post-award sanctions, timelines, etc.) that can aid implementation. In addition, in multi-party agreements the need for uniform interpretation increases, providing an incentive to opt for a more institutionalized dispute settlement process that can provide greater consistency. Finally, setting up more elaborate and formalized arbitration systems can be a financial burden for agreement partners; because this cost is shared, each member’s contribution decreases as membership grows, making stronger DSMs more affordable and thus more likely.

  1. H1b

    International agreements should contain stronger dispute settlement mechanisms as the number of members increases.
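Two of the mechanisms behind H1b lend themselves to simple arithmetic. As a rough, illustrative proxy for monitoring complexity, the number of distinct member pairs grows quadratically with membership, n(n − 1)/2; and if a standing dispute settlement body carries a fixed cost split equally among members, each member's share falls as 1/n. The fixed-cost and equal-split assumptions, and the function names, are hypothetical simplifications rather than features of any actual agreement.

```python
def member_pairs(n_members: int) -> int:
    """Distinct pairs of members whose mutual compliance may need monitoring: n(n - 1)/2."""
    return n_members * (n_members - 1) // 2


def per_member_cost(total_cost: float, n_members: int) -> float:
    """Each member's share of a fixed institutional cost, assuming an equal split."""
    return total_cost / n_members


# Illustrative membership sizes; the 100-unit cost is an arbitrary placeholder.
for n in (2, 5, 10, 27):
    print(f"{n:>2} members: {member_pairs(n):>3} pairs, cost share {per_member_cost(100.0, n):.1f}")
```

A bilateral PTA involves a single pair, whereas a 27-member agreement involves 351 pairs; over the same range, each member's share of a fixed 100-unit cost falls from 50 to under 4. Both quantities push in the direction H1b predicts.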

3.2 Power and asymmetry

The previous arguments emphasize features of the agreement as the primary determinant of institutional design: what does the agreement require members to do, and how many actors will be doing it? An important implication is that the states that design the DSMs are undifferentiated. By contrast, all other perspectives emphasize, in one form or another, who is part of the agreement and what their specific design preferences might be. Although we believe depth and number of members should be strong, primary determinants, we also expect various characteristics of the members to play a notable role in institutional design. The particular argument in this section differentiates states according to power and emphasizes power asymmetries among members, following previous scholarship on the role of power in shaping international institutions (e.g., Gruber 2000; Drezner 2007; Stone 2011).

We believe that powerful states should prefer stronger dispute settlement mechanisms within international institutions – contrary to much of the prevailing conventional wisdom. Indeed, a long line of scholarship suggests that powerful states will be averse to legal trade dispute settlement because they should fare well in disagreements in the absence of legal institutions (e.g., Gomez-Mera and Molinari 2014; Schneider 1999). Although the above logic is sensible, we find several reasons to believe that powerful states might actually support strong, legal dispute settlement – most of which stem from the general point that powerful states can use international institutions to their advantage (e.g., Brewster 2006; Martin 1992; Gruber 2000; Drezner 2007). First, many DSMs allow for both “diplomatic” and “legal” dispute settlement, so powerful states can benefit from having a menu of dispute settlement options that allows them to forum-shop even within treaties. Second, powerful states can select with whom they sign agreements and what is (and is not) contained in the agreement. So the obligations that are subject to strong dispute settlement are likely to be desirable elements that powerful states want to have enforced.Footnote 9 Third, evidence from other studies of international agreements suggests that powerful states have disproportionate influence on the design of dispute settlement mechanisms.Footnote 10 Fourth, power still plays a role, even within legal dispute settlement, since any legal dispute settlement ruling must still be implemented, which may depend on a state’s ability to sanction or impose costs on a non-compliant party (Bown 2004). For all of the above reasons, then, we expect powerful states to be more likely to include strong dispute settlement in their international agreements.

  1. H2a

    International agreements that include a powerful state should contain stronger dispute settlement mechanisms.

The preceding discussion of powerful states raises the issue of how broader power dynamics among agreement members might affect institutional design outcomes. Agreements can involve very similar groups of countries, or countries that are quite different. We argue that countries in more asymmetric agreements should prefer stronger DSMs, since such mechanisms appeal to states at both ends of the power imbalance. For powerful states, all of the preceding arguments are accentuated in asymmetric agreements: they will have even greater control over the terms of the agreement, the ability to forum-shop, and any post-dispute negotiations over implementation. That said, weaker states enter these arrangements freely as sovereign states. For the weaker actors in these situations, then, the key question is how to ensure that powerful partners will abide by the agreement. We argue that stronger DSMs will be attractive in these situations of greater asymmetry. In the absence of a strong, legally based DSM, weaker actors have little chance of successfully confronting a more powerful partner. But timely, effective, and legalized dispute settlement can “level the playing field,” giving weaker actors the ability to contest policies of the wealthier state that they find objectionable and providing a general check on the abuse of power. In sum, both stronger and weaker members of agreements should perceive strong DSMs to be desirable. By contrast, in symmetric agreements members will discuss issues on equal terms and should find it easier to arrive at negotiated settlements to potential disputes.

  1. H2b

    International agreements among members with significant power asymmetries should contain stronger dispute settlement mechanisms.

3.3 Regional patterns

Differences across regions, or what Acharya and Johnston (2007: 245) have labeled “regional exceptionalism” in international institutional design, also should be evident. Once again, this approach posits that different types of members – this time those from particular regions – should have identifiable orientations toward DSM design (Acharya and Johnston 2007). We believe these regional effects also should predict institutional design beyond what is explained by agreement depth or membership size. There might be different regional attitudes toward sovereignty or institutionalization based on varying colonial pasts, collective identity, or other regional characteristics. Drawing upon rich literatures on regionalization, we expect three regions – Asia, Africa, and the Americas – to be somewhat unique in terms of how they approach dispute settlement in their agreements.

Perhaps the strongest candidate for an exceptional region is Asia, where ex ante we expect international agreements to be less likely to include strong dispute settlement. A diverse body of scholarship maintains that within Asia there seems to be a long-standing hesitation to adopt formal institutions and legalization (e.g., Acharya 1997; Kahler 2000; Khong and Nesadurai 2007; Luo 2006). Thus international agreements within Asia should have weaker DSMs due to the general preference among states in the region for more informal, and less adversarial, methods of resolving disagreements. The explanation for this expected outcome typically revolves around a discussion of Asian culture or values, or perhaps political heterogeneity (see Kahler 2000).

Another region in which we expect weaker dispute settlement provisions is Africa. In general, sovereignty-based concerns should lead states to adopt weaker, more diplomatically based solutions to dispute settlement (Schneider 1999). These concerns with allowing strong, legal dispute settlement should be most pronounced across the African continent, where the overwhelming majority of countries obtained independence only in the past half-century or so. Indeed, some argue that within Africa greater emphasis is placed on flexibility than on enforcing obligations (Gathii 2010b). Herbst (2007, 130) likewise claims that there is a “clear style” of African cooperation and questions whether “rational design” predictions can fully explain Africa, noting that many African leaders are more concerned with domestic politics than with international appearances.

The story for Latin America is quite different, and thus we expect agreements within the Americas to be more likely to contain strong dispute settlement provisions. One reason is that, unlike Africa and to a lesser extent Asia, Latin America is largely free of post-colonial, sovereignty-based fears of transferring power to supranational institutions. In fact, the history of cross-border cooperation in Latin America goes back nearly two centuries (Biukovic 2008; Dominguez 2007). Dominguez notes that regional institutions in the western hemisphere date back to the 1820s, and that across the region “(a) relatively thick array of international institutional rules had emerged by the 1930s” (2007, 83). Moreover, the broadly defined region contains a general political organization (the Organization of American States) as well as a human rights convention, among other institutions, illustrating the importance of international law in the region. For all of these reasons, we expect states across the Americas to be more comfortable with institutionalized, transnational dispute settlement.

  1. H3a

    International agreements involving members from Asia are less likely to have strong DSMs.

  2. H3b

    International agreements involving members from Africa are less likely to have strong DSMs.

  3. H3c

    International agreements involving members from the Americas are more likely to have strong DSMs.

4 Data and measurement

4.1 PTAs as a laboratory for understanding dispute settlement design

For a set of cases to test the above hypotheses, we turn to the universe of post-war PTAs, which we define as agreements between states or regional organizations that provide reciprocal preferential market access for members’ goods and services. Within this broad definition, they can range from trade agreements between two parties to agreements consisting of multiple members. Some are regionally concentrated (e.g., NAFTA), others are characterized by membership that is geographically dispersed (e.g., the Chile-Singapore PTA). Some are partial trade agreements that liberalize selected industry sectors (e.g., the automobile sector), others are attempts to provide duty-free access across the board (free trade agreements). Finally, some trade agreements develop within a broader economic or political integration process, such as in the case of customs unions (e.g., the Southern African Customs Union) or as important elements of economic and monetary unions (e.g., the European Union). These varying features provide useful variation on the dimensions we believe should affect dispute settlement design, such as size of membership, depth, and region. Moreover, by focusing on a consistent type of agreement, albeit one with considerable variation, we hold constant confounding variables associated with issue area, since too much heterogeneity across cases can lead to simplistic explanations or garbage-can-type regression models.

We select PTAs not only because they vary on important dimensions, but also because they are widespread and policy-relevant, significantly affect trade relations, and have generated thousands of real-world disputes. In the face of continued stagnation at the multilateral level, many now view PTAs as the pre-eminent institutions for governing global trade for the foreseeable future. All WTO members have officially concluded a PTA, and most governments participate in several, if not dozens of, agreements. Furthermore, PTAs continue to evolve, as mega-PTAs such as the Trans-Pacific Partnership (TPP) and the US-EU Transatlantic Trade and Investment Partnership (TTIP) currently dominate the trade agenda. All told, we code the treaty texts of 589 PTAs signed between 1945 and 2009. This list of treaties is drawn from Dür et al. (2014); it combines several lists from international organizations (WTO, Organization of American States) and is complemented by searches of trade and economic ministries’ websites.Footnote 11

It is important to note that DSMs are widely used once they are in place. Overall dispute totals are difficult to obtain, but recent tallies of PTA-related disputes in the Western Hemisphere number in the thousands.Footnote 12 Among individual agreements, NAFTA has three different dispute settlement mechanisms – chapter 11 (investment), chapter 19 (antidumping and countervailing duties), and chapter 20 (general) – and 205 disputes had been brought before these mechanisms through 2014.Footnote 13 Likewise, disputes have been initiated against Costa Rica, the Dominican Republic, El Salvador, Guatemala, and the United States under DR-CAFTA’s investor-state dispute settlement provisions.Footnote 14 Disputes have also been brought before lesser-known bodies, such as the court of the Economic and Monetary Community of Central Africa (CEMAC) and the dispute settlement procedures of the Latin American Integration Association (ALADI).Footnote 15 Furthermore, the total number of PTA disputes goes well beyond the identifiable examples above, because many disputes are handled through informal procedures specified in DSMs; such disputes remain hidden from public view and are never formally registered.Footnote 16

4.2 Conceptualization and measurement of strong dispute settlement mechanisms

We argue that stronger DSMs are those containing legal procedures that provide for speedy litigation, allow complainants to drive the legal process and make important choices, and facilitate implementation of legal awards.Footnote 17 Drawing on established work on international arbitration, adjudication, and courts, we compile precise data on six components that reflect how “strong” or “weak” a particular PTA’s DSM is. The first of these follows directly from existing scholarship on DSM design, but the next five are newly created and represent an important original contribution. We then combine these six components into a primary indicator and two alternate indicators of DSM strength, which we use in our empirical tests.

The first of the six components captures the extent to which dispute settlement authority is delegated to a third-party legal body, consistent with existing scholarship on DSM “legalization” in international agreements (e.g., Jo and Namgung 2012; Porges 2011; Smith 2000). The lowest category in the three-category “legal delegation” scale (coded 0) covers agreements that contain no dispute settlement provisions, as well as agreements whose provisions specify only consultations and/or mediation. A value of 1 is assigned to PTAs that provide only ad hoc arbitration as a legal dispute settlement option, whereas the highest value of 2 is reserved for PTAs that create a standing body for dispute settlement. The rationale for this last distinction is that by creating a standing body, PTA members delegate substantial authority to a more autonomous, established actor with greater resources, which facilitates the conduct of proceedings and the enforcement of awards. Table 1a contains the complete distribution of this legal delegation variable, which shows that ad hoc arbitration is the most commonly specified legal dispute settlement option.
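The three-category coding rule for legal delegation can be written as a small function. This is a minimal sketch: the `DSProvisions` record and its field names are hypothetical stand-ins for information a coder would extract from a treaty text, not part of the actual coding protocol.

```python
from dataclasses import dataclass


@dataclass
class DSProvisions:
    """Hypothetical summary of one PTA's dispute settlement text."""
    ad_hoc_arbitration: bool = False  # ad hoc arbitration is provided for
    standing_body: bool = False       # a standing dispute settlement body is created


def legal_delegation(p: DSProvisions) -> int:
    """Return the legal delegation score (0, 1, or 2)."""
    if p.standing_body:
        return 2  # standing body: the highest degree of delegation
    if p.ad_hoc_arbitration:
        return 1  # ad hoc arbitration is the only legal option
    return 0      # no provisions, or consultations/mediation only


print(legal_delegation(DSProvisions()))                         # 0
print(legal_delegation(DSProvisions(ad_hoc_arbitration=True)))  # 1
print(legal_delegation(DSProvisions(ad_hoc_arbitration=True, standing_body=True)))  # 2
```

Checking the standing body first means an agreement that offers both arbitration and a standing body receives the higher score.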

Table 1a Variation in legal delegation across PTAs

The second component captures the ability of a complainant state to choose the dispute settlement venue. PTAs vary greatly in their language on forum choice (Donaldson and Lester 2009, 378–381). The ability to choose the most desirable venue, or in rare cases to pursue a claim in multiple venues, can be quite advantageous to complainants, particularly powerful ones. PTAs receive a 0 on this complainant forum choice variable when they specify nothing about multiple fora or forum choice. Approximately 80 % of PTAs fall into this first category (see Table 1b). The next category (coded 1) covers scenarios in which the complainant may choose the forum but can pursue settlement in only one forum, thereby excluding ex post the use of an alternative forum. Most of the remaining cases (104 PTAs, or 18 % of the total) follow this “fork in the road” logic. Finally, the highest value (coded 2), which is rare, applies when the complainant chooses the venue and there are no restrictions on the use of multiple fora.
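The forum choice scale can be sketched the same way. The sketch assumes, for simplicity, that any agreement which does not let the complainant choose the forum scores 0; the parameter names are hypothetical. The last lines check the reported share of “fork in the road” agreements.

```python
def complainant_forum_choice(complainant_may_choose: bool,
                             fork_in_the_road: bool) -> int:
    """Return the complainant forum choice score (0, 1, or 2).

    complainant_may_choose: the text lets the complainant pick the forum
        (assumption: otherwise the agreement is treated as silent, score 0).
    fork_in_the_road: once one forum is chosen, the others are excluded.
    """
    if not complainant_may_choose:
        return 0
    if fork_in_the_road:
        return 1
    return 2  # complainant chooses and multiple fora are unrestricted


# Share of the 589 PTAs in the "fork in the road" category, in percent.
fork_share = round(100 * 104 / 589)
print(fork_share)  # 18
```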

A third element is the composition of the judicial body, particularly the selection of the chair, since judicial panels take decisions by majority and the chair often plays a pivotal role (e.g., Chase et al. 2013; Donaldson and Lester 2009; Porges 2011; Posner 2009; Posner and Yoo 2005). Effective dispute settlement is more likely when there is an unbiased chair or “umpire” (Posner 2009) who is not beholden to the interests of the state parties. Disagreements about panel composition, particularly the selection of the chair, can paralyze proceedings and hinder effective dispute settlement (Chase et al. 2013; Donaldson and Lester 2009; Porges 2011). Respondents in these scenarios have incentives to block or delay, since slowing down proceedings works to their advantage. By contrast, dispute settlement will proceed more efficiently when there are clearly specified, apolitical rules for selecting a panel chair. Four options for selecting the chair exist: the two parties consult and decide, the party-appointed arbitrators choose, an outside actor (an international organization or its secretary-general) selects, or the chair is chosen “by lot.” The latter two options enhance dispute settlement the most, since the selection process is faster and less subject to pressure from the respondent state. For coding purposes, we are interested in whether either of these more effective options (third-party selection or selection “by lot”) is specified at all, since this provides a route for swifter appointment of an effective chair. One of these two options is included in 145 PTAs, as depicted by the values for “2” in Table 1c. Next in line are scenarios in which the party-appointed arbitrators select the chair (coded 1). All remaining PTAs, coded 0, specify only bilateral consultations as the method of selection or have no legal dispute settlement.
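Because the chair score depends on whether a fast option is specified at all, it is naturally expressed as a set intersection. In this minimal sketch the `ChairMethod` enum is a hypothetical encoding of the four options listed above; an empty set stands for an agreement with no legal dispute settlement.

```python
from enum import Enum, auto


class ChairMethod(Enum):
    """The four chair-selection options described in the text."""
    PARTIES_CONSULT = auto()    # the two parties consult and decide
    PARTY_ARBITRATORS = auto()  # the party-appointed arbitrators choose
    OUTSIDE_ACTOR = auto()      # an international organization/secretary-general selects
    BY_LOT = auto()             # the chair is chosen "by lot"


FAST_METHODS = {ChairMethod.OUTSIDE_ACTOR, ChairMethod.BY_LOT}


def panel_chair(methods: set) -> int:
    """Score 2 if any fast option is specified, 1 if party-appointed
    arbitrators choose, and 0 otherwise (consultations only, or no legal DS)."""
    if methods & FAST_METHODS:
        return 2
    if ChairMethod.PARTY_ARBITRATORS in methods:
        return 1
    return 0


print(panel_chair({ChairMethod.PARTY_ARBITRATORS, ChairMethod.BY_LOT}))  # 2
```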

A fourth component captures whether the dispute settlement provisions in a given treaty set time limits for the dispute settlement process, whether overall and/or for particular stages (pre- and post-award). The logic is similar to that for chair selection, since both concern the speed of proceedings, and earlier studies omit both of these temporal dimensions. Specified time frames encourage a faster dispute settlement process and thus should enhance compliance with obligations. A total of 221 agreements specify time limits (see Table 1d).

The fifth ingredient of a strong DSM is the extent to which post-award sanctions can be used to enforce awards effectively. Post-award enforcement mechanisms such as retaliatory measures have attracted considerable attention in the WTO dispute settlement context (e.g., Bown and Pauwelyn 2010; Zangl 2008) but are absent from existing empirical studies of PTA DSMs. Even after an arbitration or adjudication ruling, a state found to have violated its obligations might be able to delay or even avoid changing its non-compliant behavior. By allowing an aggrieved complainant to punish non-implementation, institution-sponsored sanctions serve as negative inducements that make compliance more likely. Our empirical indicator for this post-award sanctions variable is a four-point additive scale that combines four binary (0/1) indicators. The first indicator captures whether a PTA contains a sanctions provision, which is the case in 162 agreements.Footnote 18 The second indicator measures whether the complainant can choose the level of retaliation, which is allowed in 151 PTAs – a large majority of those that allow for some type of sanctions. The third and fourth indicators capture, respectively, whether same-sector or cross-retaliation is allowed and whether monetary compensation is envisaged. The former is present in 88 PTAs, while the latter occurs in 20 agreements. The distribution of the resulting additive indicator of sanctioning power is shown in Table 1e.
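The sanctions scale simply counts how many of the four binary indicators are present. In the sketch below the parameter names are hypothetical labels for those indicators; the function takes no position on how each one is coded from treaty text.

```python
def sanctions_score(sanctions_provision: bool,
                    complainant_sets_level: bool,
                    retaliation_allowed: bool,
                    monetary_compensation: bool) -> int:
    """Sum four binary indicators into the additive post-award sanctions scale."""
    indicators = (sanctions_provision, complainant_sets_level,
                  retaliation_allowed, monetary_compensation)
    return sum(int(flag) for flag in indicators)


# A hypothetical PTA that allows sanctions and lets the complainant choose
# the level and type of retaliation, but envisages no monetary compensation.
print(sanctions_score(True, True, True, False))  # 3
```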

The final component captures the comprehensiveness of dispute settlement provisions; that is, do the provisions apply to all areas covered by the agreement? Across PTAs we observe various exceptions in which some areas are explicitly exempted from dispute settlement rules. The areas most commonly excluded are trade remedies, safeguards, some forms of services, temporary entry of workers, sanitary and phytosanitary (SPS) measures, technical barriers to trade (TBT), competition policy, and investment. These exceptions weaken the enforceability of treaty commitments, particularly since they are likely inserted by a party that may be hesitant or unwilling to carry out particular obligations. The most comprehensive DSMs, then, are those that lack any such exemptions. This occurs in 169 PTAs, as depicted by the 1s in Table 1f.

We sum these six components to create a simple additive index of dispute settlement strength for all 589 PTAs in our data set. We use this as our primary measure because it is the most straightforward, transparent, and intuitive of the three slightly different measures we create. The resulting 0–9 index serves as the primary dependent variable in our empirical tests, and Table 1g shows its distribution across the universe of post-war PTAs.Footnote 19
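The primary index is an unweighted sum of the six component scores. The sketch below assumes each PTA's scores are stored in a dictionary keyed by hypothetical component names; the example agreement is invented for illustration.

```python
COMPONENTS = ("legal_delegation", "forum_choice", "panel_chair",
              "time_limits", "sanctions", "comprehensiveness")


def dsm_strength(scores: dict) -> int:
    """Unweighted sum of the six component scores for one PTA."""
    missing = [name for name in COMPONENTS if name not in scores]
    if missing:
        raise KeyError(f"missing components: {missing}")
    return sum(scores[name] for name in COMPONENTS)


# Hypothetical PTA: standing body, fork-in-the-road forum rule, chair chosen
# by lot, time limits, two sanctions indicators, and some exempted areas.
example = {"legal_delegation": 2, "forum_choice": 1, "panel_chair": 2,
           "time_limits": 1, "sanctions": 2, "comprehensiveness": 0}
print(dsm_strength(example))  # 8
```

Summing raw values lets components with wider ranges (such as sanctions) carry more weight, which is what the standardized alternative is designed to correct.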

Figure 1 shows the evolution of DSM strength over the post-war period using this same primary indicator. Overall there is a clear upward trend, which reflects the general trend toward increased international judicialization over time. We therefore estimate all of our multiple regression models with controls for 5-year time periods. Nevertheless, we stress the considerable variation within each decade, each of which contains both weak and strong DSMs. Moreover, many early PTAs also had quite strong DSMs.
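Time-period controls of this kind are usually built by binning signing years and creating one indicator per bin. The sketch assumes, hypothetically, that the bins start in 1945 (the first year of the sample) and run through 2009; the exact period boundaries used in the models may differ.

```python
FIRST_YEAR = 1945   # assumption: first period starts with the sample
PERIOD_LENGTH = 5


def period_start(signing_year: int) -> int:
    """Map a signing year to the first year of its 5-year period."""
    if signing_year < FIRST_YEAR:
        raise ValueError("year precedes the sample")
    offset = (signing_year - FIRST_YEAR) // PERIOD_LENGTH
    return FIRST_YEAR + PERIOD_LENGTH * offset


def period_dummies(signing_year: int, last_year: int = 2009) -> dict:
    """One 0/1 indicator per period; a regression would drop one as the baseline."""
    own = period_start(signing_year)
    starts = range(FIRST_YEAR, last_year + 1, PERIOD_LENGTH)
    return {f"{s}-{s + PERIOD_LENGTH - 1}": int(s == own) for s in starts}


print(period_start(1994))  # 1990
```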

Fig. 1 Average strength of dispute settlement mechanisms in PTAs over time

As the first of two alternate outcome variables used to check the robustness of our findings, we create a standardized six-component index that gives all subcomponents equal weight. In this case we take the six components depicted in Table 1(a to f), but instead of adding their raw values we standardize each on a 0–1 scale, resulting in a more fine-tuned indicator that ranges from 0 to 6.Footnote 20 As a final method of measuring the overall strength of dispute settlement, we combine all of the relevant variables using principal components analysis.Footnote 21
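One simple way to put every component on a 0–1 scale is to divide each score by the maximum value it can take according to the coding rules above (2, 2, 2, 1, 4, and 1). That rescaling rule is an assumption made for this sketch; the component names and the example PTA are hypothetical, and the principal components version is not shown.

```python
# Maximum value of each component as described in the coding rules (assumed divisors).
MAX_SCORE = {"legal_delegation": 2, "forum_choice": 2, "panel_chair": 2,
             "time_limits": 1, "sanctions": 4, "comprehensiveness": 1}


def standardized_strength(scores: dict) -> float:
    """Rescale each component to 0-1 by its maximum and sum (range 0-6)."""
    total = 0.0
    for name, max_score in MAX_SCORE.items():
        value = scores[name]
        if not 0 <= value <= max_score:
            raise ValueError(f"{name}={value} outside 0..{max_score}")
        total += value / max_score
    return total


example = {"legal_delegation": 2, "forum_choice": 1, "panel_chair": 2,
           "time_limits": 1, "sanctions": 2, "comprehensiveness": 0}
print(standardized_strength(example))  # 4.0
```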

4.3 Conceptualization and measurement of independent variables

In this section we discuss the measurement of the explanatory variables, including those for our primary hypotheses and the control variables. We begin by specifying indicators for the easier-to-measure hypotheses from Section 3. We then discuss in detail our measures for more complex concepts such as agreement depth and power before introducing some additional control variables. Measuring membership size (H1b) is straightforward, as we include a simple count of the number of members of each agreement.Footnote 22 Likewise, the measures for region are simple dummy variables that capture whether all PTA members are located in each of the three geographic regions we have singled out: Asia (H3a), Africa (H3b), and the Americas (H3c).Footnote 23 This leaves as the comparison group those agreements that span multiple regions and non-EU PTAs in Europe (see below), along with a few agreements in Oceania.
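The membership count and the three region dummies follow directly from a list of members' regions. In this sketch each PTA is represented, hypothetically, as one region label per member; a dummy equals 1 only when every member lies in that region, so cross-regional agreements fall into the comparison group.

```python
REGIONS = ("Asia", "Africa", "Americas")


def membership_measures(member_regions: list) -> dict:
    """Member count plus one dummy per region (1 only if all members are in it)."""
    present = set(member_regions)
    measures = {"n_members": len(member_regions)}
    for region in REGIONS:
        measures[region] = int(present == {region})
    return measures


print(membership_measures(["Americas", "Americas", "Americas"]))
# {'n_members': 3, 'Asia': 0, 'Africa': 0, 'Americas': 1}
print(membership_measures(["Americas", "Asia"]))
# {'n_members': 2, 'Asia': 0, 'Africa': 0, 'Americas': 0}
```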

We devote substantial effort to collecting data on and measuring agreement depth (H1a). Downs et al. (1996, 383) define depth as “the extent to which (an agreement) requires states to depart from what they would have done in its absence.” In the PTA context, “depth” should capture the degree to which commitments are made that can potentially lead to market opening and an increase in the exchange of goods, services, and investments. Our operationalization of depth reflects that its effects can be direct (in areas such as tariff liberalization for goods, trade in services, and rules allowing foreign firms to bid for public procurement tenders) as well as indirect (promoting regulatory convergence or harmonizing standards). Given the centrality of this concept and the different possible approaches to measurement, we use multiple indicators to capture PTA depth, and our findings prove robust to the choice among them. Our primary measure is a simple additive index that potentially ranges from 0 to 7. It tallies (1) whether the PTA reduces substantially all tariffs to zero and (2–7) whether it has a substantive provision on each of the following areas: competition, intellectual property rights, investment, public procurement, services, and technical barriers to trade (TBT) and/or sanitary and phytosanitary (SPS) measures.Footnote 24 A second indicator is a more fine-tuned count, ranging from 0 to 48, that captures whether the agreement contains certain sub-provisions within each of the above areas. A third indicator uses exploratory factor analysis to combine information across these 48 components. Finally, we add a measure of depth that prioritizes the 18 components pertaining to domestic regulation, on the logic that commitments to change regulatory rules are particularly difficult to observe, monitor, and enforce, and thus are particularly likely to require stronger dispute settlement.Footnote 25
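The primary 0–7 depth index can be sketched as one point for tariff elimination plus one point per covered area. The area labels below are hypothetical identifiers; TBT and SPS share a single label because the index awards one point for either.

```python
DEPTH_AREAS = ("competition", "intellectual_property", "investment",
               "public_procurement", "services", "tbt_or_sps")


def depth_index(zero_tariffs: bool, substantive_areas: set) -> int:
    """One point for eliminating substantially all tariffs, plus one point
    per area with a substantive provision (range 0-7)."""
    unknown = set(substantive_areas) - set(DEPTH_AREAS)
    if unknown:
        raise ValueError(f"unrecognized areas: {sorted(unknown)}")
    return int(zero_tariffs) + sum(area in substantive_areas for area in DEPTH_AREAS)


print(depth_index(True, {"services", "investment"}))  # 3
```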

For the measurement of H2a, which singles out powerful states as being more likely to design strong DSMs, we create and include in the model three distinct indicators for the presence of a powerful actor in a given PTA. Most obvious are agreements that include the United States. The argument for the U.S. is particularly compelling. It is a highly legalized society with a litigious culture; it was the main actor behind the creation of the WTO DSM in the Uruguay Round (Elsig and Eckhardt 2015); and it has advocated strong dispute settlement in other economic agreements, such as bilateral investment treaties (BITs). Similarly, we include a dummy variable for all agreements involving the European Union, the other post-war economic giant, although our expectation for the E.U. is less definitive. One reason is the peculiarity of E.U. trade policy, in particular the multitude of non-economic interests the E.U. pursues through a substantial number of its PTAs. Finally, we include a dummy variable for any agreement that includes a member of the modern-day Group of 20, or “G20.” This serves as a much broader measure of the design effects of powerful states, and allows us to compare agreements involving rising powers such as China and Brazil with those involving the more traditional powers (US, EU). All three indicators are included in our multiple regression models.

To test the related hypothesis about power asymmetry (H2b), we use as our primary measure a dummy variable indicating agreements that involve a country from the global “North” and a country from the global “South.”Footnote 26 This North–South indicator has the benefit of being intuitive and easy to interpret. It also facilitates comparison with other membership combinations, such as North-North and South-South agreements, which we expect to have weaker dispute settlement than North–South pairings.Footnote 27 Nevertheless, as a robustness check we also create a more continuous measure of power asymmetry: the difference in GDP between the largest member of the PTA (in terms of economic size) and the smallest member, expressed in billions of dollars.
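Both asymmetry measures are simple functions of member attributes. In this sketch the North/South labels are taken as given (the classification itself is not reproduced here), and the GDP figures in the example are invented.

```python
def north_south(member_status: list) -> int:
    """1 if the PTA has at least one 'North' and one 'South' member, else 0."""
    statuses = set(member_status)
    return int("North" in statuses and "South" in statuses)


def gdp_gap_billions(member_gdp_usd: list) -> float:
    """Largest minus smallest member GDP, converted from dollars to billions."""
    return (max(member_gdp_usd) - min(member_gdp_usd)) / 1e9


print(north_south(["North", "South", "South"]))  # 1
print(gdp_gap_billions([2.0e12, 5.0e10]))        # 1950.0
```

The dummy only records whether the two groups meet in one agreement, whereas the GDP gap also grows with the size of the disparity.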

Finally, we include a set of control variables for other possible influences on DSM design. The first is regime type, since studies have found that democratic states are more likely to use legal mechanisms to settle disputes (e.g., Allee and Huth 2006; Gomez-Mera and Molinari 2014; Sattler and Bernauer 2011). By extension, one would expect democratic states to be more receptive to the inclusion of strong, legal dispute settlement language in their agreements. We use the average Polity score among all members of the PTA (Jaggers and Gurr 1995). We also control for whether the signatories are members of the GATT/WTO. WTO members are likely to have greater familiarity and comfort with powerful legal dispute settlement bodies through their experience with the WTO DSM, and thus to be more willing to include strong dispute settlement in their PTAs as well. Lastly, we control for PTA trade volume, since PTAs involving larger amounts of trade might necessitate stronger DSMs (Haftel 2012; Hooghe et al. 2014; Jo and Namgung 2012).Footnote 28

5 Primary findings

To evaluate the hypotheses discussed above, we first estimate and interpret a primary model, presented in Table 2. We then estimate a series of follow-up models to check the sensitivity of results to alternate independent (Tables 3 and 6) and dependent (Appendices 2 and 3) variable specifications. We also explore the substantive effects of the variables found to be statistically significant, and present these in Tables 4, 5, 7, and 8. Because our primary dependent variable is an ordinal scale, we estimate ordered probit models in many cases, but we also use ordinary least squares (OLS) regression where appropriate.Footnote 29 Finally, all models include time-period controls.Footnote 30

Table 2 Ordered probit results for the strength of dispute settlement in PTAs
Table 3 Ordered probit results for the strength of dispute settlement in PTAs
Table 4 Predicted probabilities for strength of dispute settlement in PTAs at varying levels of agreement depth

The findings from our primary model, shown in Table 2, strongly support our overall expectation of a hierarchy of design influences. The dominant pattern is that features of the agreement are powerful, consistent predictors of strong dispute settlement regardless of the identity or location of the signatories. Yet power dynamics also play an important role: asymmetric, or “North–South,” agreements are far more likely to have strong dispute settlement than all types of more balanced agreements, and U.S. PTAs have stronger DSMs, too. Finally, regional patterns are also evident, with agreements in the Americas and in Asia exhibiting a greater likelihood of employing strong dispute settlement mechanisms.

As the core findings in Table 2 show, agreement depth is the strongest and most consistent predictor of DSM design. Across the roughly two dozen models we estimate, the agreement depth variable is positively associated with strong dispute settlement, almost always at the 99 % level of confidence. The finding is notable given the ongoing debate, noted above, about whether compliance with agreements should be “managed” or “enforced.” Our empirical findings support the enforcement claim: governments evidently believe that their deeper agreements require stronger dispute settlement, and they design them accordingly.

This strong finding holds regardless of how we conceptualize and measure agreement depth, as Table 3 shows. For instance, similar findings are returned when we substitute the much broader indicator of depth based on the full universe of 48 depth-related indicators (column 1). The results are also nearly identical when we substitute the measure of depth derived from factor analysis (column 2). Finally, our theoretically driven measure that examines depth in terms of domestic regulations is also a strong predictor of dispute settlement strength (column 3), which means that PTAs requiring stronger regulatory convergence also tend to have strong dispute settlement.

To explore the substantive impact of agreement depth, we examine the predicted probability of each outcome on our primary dependent variable, which ranges in increasing order from 0 to 9.Footnote 31 Table 4 indicates that the shallowest PTAs (“very low depth”) have less than a 1 % chance of having a “strong” DSM, defined as a mechanism that lies between 7 and 9 on our 0–9 scale. “Low depth” agreements have only an 8 % chance of falling into this range, and are still most likely (61 %) to have the weakest possible DSMs (0 or 1 on the DSM scale). “Moderate depth” agreements are likely to have somewhat stronger dispute settlement procedures, with only a 25 % likelihood of having the weakest DSMs and a likelihood of more than 60 % of having a moderately strong DSM, defined as lying between 3 and 7 on our 0–9 scale. Finally, Table 4 shows that “high depth” agreements are very likely to have strong DSMs; in fact, they have a predicted likelihood of more than 80 % of falling into the highest (7–9) range on our scale. By contrast, these deep agreements have only a 2 % predicted chance of falling into one of the weaker DSM categories (0–2).
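In an ordered probit model, predicted probabilities of the kind reported in Table 4 follow from the estimated cutpoints and the linear predictor: P(Y = k) = Φ(τ_k − xβ) − Φ(τ_{k−1} − xβ). The sketch below shows only these mechanics. The cutpoints and the depth coefficient are invented placeholder values, not estimates from Table 2, and this is not the code used to produce Table 4.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(xb: float, cutpoints: list) -> list:
    """P(Y = k) for k = 0..len(cutpoints), given the linear predictor xb.

    P(Y = k) = Phi(tau_k - xb) - Phi(tau_{k-1} - xb), with tau_{-1} = -inf
    and tau_K = +inf, where K = len(cutpoints).
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    return [normal_cdf(upper - xb) - normal_cdf(lower - xb)
            for lower, upper in zip(bounds, bounds[1:])]


# Hypothetical values for a 0-9 outcome (nine cutpoints); NOT estimates.
CUTPOINTS = [-0.5, 0.0, 0.4, 0.8, 1.2, 1.6, 2.0, 2.4, 2.8]
BETA_DEPTH = 0.45  # hypothetical depth coefficient; other covariates set to zero

for depth in (0, 3, 7):
    probs = ordered_probit_probs(BETA_DEPTH * depth, CUTPOINTS)
    print(depth, round(sum(probs[7:]), 3))  # P(DSM strength in the 7-9 range)
```

Grouping categories, as in the 7–9 “strong” range, amounts to summing the corresponding category probabilities.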

The number of states in a PTA is another agreement feature that consistently predicts DSM strength. In Table 2 the number of members variable is positively associated with strong dispute settlement at the highest (99 %) level of confidence. Taken together, then, agreement depth and membership size are important forces behind strong dispute settlement. Substantively, however, the effects of membership size are notable but modest. Table 5 shows that, compared to bilateral agreements, agreements with 7 and 14 members are about 10 % and 20 % more likely, respectively, to have very strong dispute settlement (8 or 9 on the DSM strength scale) and similarly less likely to have weak dispute settlement (0 or 1). Agreements with particularly large memberships (50 members) are about twice as likely as bilateral agreements to have strong DSMs (7–9 on the scale). In follow-up analyses, we find that the distinction between bilateral (two-member) and plurilateral (more than two members) agreements is consistently important, but that adding further members has weak and diminishing effects.Footnote 32 Thus, although both agreement features establish a clear baseline for DSM strength, depth is a stronger driving force than membership size.

Table 5 Predicted probabilities for strength of dispute settlement in PTAs at varying levels of membership size

Although membership size and, in particular, depth are strong determinants of dispute settlement design for all states, power dynamics also play an important role. Notable and robust is the finding that agreements with power asymmetries are more likely to contain strong DSMs. The coefficient estimate for asymmetric agreements in Table 2, reflected by the North–South dummy variable, is positive and significant at the 99 % level of confidence. In fact, this relationship is positive across all of the models we estimate, almost always at the same 99 % level. Moreover, this finding also holds when we substitute the alternate measure of asymmetry based on the difference in GDP between the largest and smallest economies in the agreement (see Table 6, column 1). Likewise, asymmetric agreements are more disposed to strong DSMs than all types of more symmetric agreements. In the second column of Table 6 we substitute dummy variables for North-North and South-South agreements, and both return negative coefficient estimates relative to that model’s baseline category of North–South agreements. In sum, among all power constellations, asymmetric agreements are much more likely to provide for strong dispute settlement. Substantively, Table 7 shows that such agreements are about twice as likely as other pairings to have strong dispute settlement (8 or 9 for DSM strength) and about 50 % less likely to have weak DSMs (0–2 for DSM strength).

Table 6 Ordered probit results for the strength of dispute settlement in PTAs
Table 7 Predicted probabilities for strength of dispute settlement in PTAs for asymmetric vs. symmetric agreements

There is some, albeit weaker, support for the related claim that powerful states will prefer strong DSMs regardless of their agreement partners (H2a). The most noteworthy finding is that PTAs involving the United States are more likely to have strong dispute settlement, ceteris paribus. Earlier we argued that the U.S. should be amenable to strong enforcement due to its selectivity in choosing the partners with which it signs PTAs and the obligations to which it agrees, as well as its overall comfort with litigation. This appears to be the case, as evidenced by the finding in Table 2 that U.S. agreements are positively associated with strong dispute settlement provisions at the 99 % level of confidence. This relationship is generally strong, although with some qualifiers, as we discuss below. The predicted probabilities displayed in Table 8 show that U.S. PTAs are nearly twice as likely as PTAs involving other countries to have strong dispute settlement (DSM strength of 7, 8, or 9).

Table 8 Predicted probabilities for strength of dispute settlement in PTAs for various countries and regions

By contrast, we find no evidence that other powerful actors include stronger DSMs in their agreements. The coefficient estimate on the European Union variable in Table 2 is negative, as it is across most models we estimate, but not statistically significant. Upon further exploration, it seems that, at least until recently, the E.U. approach to PTA dispute settlement has conformed to the conventional argument about powerful actors’ aversion to legally based dispute settlement.Footnote 33 Finally, the G20 variable exhibits no relationship with DSM design, indicating that powerful states other than the U.S. exhibit no clear propensity toward strong or weak dispute settlement.

Several regional patterns also emerge in Table 2, some expected and others unexpected. The strongest and most consistent finding is that agreements in the Americas are more likely to have strong dispute settlement procedures. The Americas variable is positive and statistically significant at the 99 % level in Table 2 and in nearly all of the sensitivity checks we conduct. Agreements in the Americas have a 65 % likelihood of having at least moderately strong dispute settlement, defined as 5 or higher on the 0–9 scale (see Table 8). At the low end of the dispute settlement strength spectrum, PTAs in the Americas are less than half as likely (19 % versus 43 %) as agreements from other regions to have one of the weakest levels of dispute settlement (0 or 1).

Perhaps the most striking finding from our analyses is that agreements in Asia are actually more, not less, likely to have strong dispute settlement mechanisms. The Asia variable in our primary model (Table 2) is positive and significant at the 90 % level of confidence. Substantively, PTAs in Asia are about 60 % more likely than agreements outside Asia to fall into the upper three categories (7–9) of dispute settlement strength.Footnote 34 This finding calls for reassessing the prevailing conventional wisdom in two respects. First, it undermines the claim that legalization is weak in Asia due to Asian culture or some inherent regional characteristic. We find merit in Kahler’s (2000) conjecture that any past hesitation by governments in Asia to embrace legalization is due not so much to culture or “Asian values” as to conscious choices made by governments. Second, it underscores the distinction between DSM design and DSM use, which can interact in mutually reinforcing ways. Governments in Asia may have been receptive to the inclusion of legal dispute settlement provisions, even if they have not litigated many disputes.Footnote 35

Findings for other variables also deserve brief mention. Most notably, the evidence suggests that WTO members are more likely to include strong dispute settlement in their PTAs (see Table 2). This finding is quite robust, and likely reflects these states’ familiarity and comfort with legal dispute settlement. In contrast, democratic signatories are neither more nor less likely to include strong DSMs.Footnote 36 Moreover, we explore several related variables, such as those for transitioning democracies and strong law-and-order states, and consistently find little evidence that these features affect dispute settlement design. Finally, there is no support for the idea that PTAs with greater trade volume have stronger dispute settlement once other, more salient, features are taken into account.

We also conduct a series of sensitivity checks using alternate estimators and alternate independent and dependent variables, to assess the stability of findings and to explore further patterns suggested by the findings thus far. We begin with different estimators and estimation choices, many of which we touched upon earlier, and present them in online Appendix 2.Footnote 37 First, we estimate the primary model in Table 2 using OLS and then binary probit, both of which return results very similar to those of the original ordered probit model.Footnote 38 Next, we use OLS to examine the robustness of findings to two of the alternate dependent variables discussed earlier: the standardized version of our six-component indicator and the indicator created using principal components analysis. Appendix 2 also displays the findings from these two models. In general, the original findings (from Table 2) are overwhelmingly upheld in both models. The findings for most significant variables, including Depth, Number of Members, Asymmetry (North–South), Americas, Asia, and WTO Membership, are virtually unchanged. The only noteworthy change is that in these two models PTAs involving the U.S. are no longer associated with stronger dispute settlement. This lack of support for a previously strong finding is puzzling, and motivates us to delve further into the nuances of our dependent variable.

Although we believe strongly in using a composite measure of the overall strength of dispute settlement, as a final step we unpack our dependent variable and run separate regression models for each of its six subcomponents. This allows us to evaluate micro-level relationships between each explanatory variable and each aspect of strong dispute settlement. Appendix 3 presents the results in a six-column table, which generally supports our previous findings, reveals some important and logical nuances, and clarifies the anomalous findings above about U.S. PTAs. A major takeaway from Appendix 3 is that many of the relationships are robust across all components of DSM strength. For instance, the two dominant agreement-feature variables, depth and to a lesser extent number of members, consistently exhibit a positive and statistically significant relationship with all subcomponents of DSM strength. The same is true for the Americas and, to a slightly lesser extent, asymmetric (North–South) agreements. These findings strengthen our already firm belief in the positive relationship between these characteristics and strong dispute settlement in PTAs.

Other relationships in Appendix 3 largely bolster our earlier conclusions while also revealing some interesting patterns. Asian agreements are positively associated with all six subcomponents, although not all of these relationships reach conventional levels of statistical significance. E.U. PTAs are negatively associated, at statistically significant levels, with three of the subcomponents (forum choice, panel chair, and time limits); this somewhat mixed pattern is consistent with our earlier interpretation of E.U. PTAs. Finally, the patterns for U.S. PTAs are similarly mixed. U.S. PTAs are positively associated with three of the six subcomponents (forum choice, panel chair, and sanctions) at the 90 % confidence level or higher, yet in the other three cases there is no discernible relationship. Nevertheless, the particular features of strong dispute settlement that the U.S. tends to employ make logical sense. For instance, U.S. agreements typically allow for sanctions in cases of non-implementation, which is consistent with U.S. attitudes toward the WTO DSM and reflects an area of dispute settlement that is particularly amenable to the interests of major powers. Although we strongly advocate thinking about DSMs holistically, these finer-grained relationships are revealing and add to the richness of our study.

6 Conclusion

The goal of this research has been to use PTAs as a vehicle for explaining why some international institutions have stronger dispute settlement provisions than others. Our study breaks new ground in several ways. First, we examine a much greater number of agreements than previous studies, capturing considerable variation across nearly 600 PTAs over space and time. Second, we collect original data on dozens of features of dispute settlement provisions in PTAs, which we employ for the first time in these empirical tests. These expanded data allow us to investigate a new, richer way of conceptualizing dispute settlement design. We move beyond the conventional notion of legalization and instead theorize DSMs as a comprehensive tool designed to help uphold deep trade obligations. Our core finding is that agreement depth is the strongest predictor of dispute settlement design. We offer the most systematic empirical test to date of the depth-enforcement link that underlies much of the theoretical debate about international institutions. This finding also indicates that governments view strong dispute settlement as necessary for achieving compliance with international obligations, and that compliance is not “managed” but is instead incentivized through deterrence and punishment (Downs et al. 1996). We also reconcile divergent perspectives on institutional design, showing that international agreements can reflect both rational design and the unique needs and preferences of state and regional actors.

Several of our other findings also have implications beyond the study of PTAs and dispute settlement. We cast doubt on assertions that Asian culture is not amenable to formal dispute settlement or strong legal regimes. We also find that PTAs involving the U.S. are more likely to contain strong DSMs, which challenges the view that the U.S. is skeptical of legalization. Yet this finding also raises the possibility that state power might creep into the realm of legal dispute settlement, which is supposed to neutralize such considerations. Might powerful actors design DSMs to suit their expected needs and then choose from the menu they have created when future disputes arise? Although this is a concern, we emphasize that asymmetric PTAs have the strongest dispute settlement, suggesting that both powerful and weak states can benefit mutually from legalistic DSMs. Furthermore, even if some powerful states desire strong dispute settlement, we suspect most other states would be happy to see leading actors bound more closely to the rule of law.

Although our findings clearly demonstrate that post-war DSMs are uniquely designed, we observe some emerging convergence that is worth exploring in future work. One pattern is that DSMs have generally grown stronger over time, with many now including time limits and the ability to retaliate in the event of non-compliance. Perhaps governments are beginning to mimic one another when designing agreements? Some have suggested that institutional design features are likely to diffuse (Jetschke and Lenz 2013). Indeed, recent empirical work shows that treaty language is often replicated from one treaty to the next (Allee and Elsig 2014) and that a handful of models may exist (Baccini et al. 2015a, b).

Hundreds of international agreements worldwide now contain legal dispute settlement provisions, including various features associated with timely resolution, selection of panelists, forum choice, and sanctioning, among others. These design features are likely to matter more in the coming decades as more disputes arise and the above features are invoked. Since the obligations in PTAs often take many years to fully “kick in,” some DSMs that are dormant in their early years are likely to become active later, as was the case with the ECJ, which was hardly used in the 1950s and 1960s (Alter 2012). We therefore anticipate that DSMs will be used with increasing frequency in the near future, with disputes resolved through varied diplomatic and legal channels, further revealing the importance of past design choices and the influences exerted upon them.