1 Introduction

What made states leave out a formal punishment provision in the Nuclear Nonproliferation Treaty (NPT) yet articulate very detailed formal punishment provisions in the series of International Coffee Agreements that governed that commodity from 1962 to 1989? The NPT is important, has the support of major powers, and is arguably not just for show. In fact, the absence of a formal punishment provision in the NPT has not led to widespread noncooperation among members, despite the fact that incentives to defect do exist. Perhaps one could even jump to the conclusion that most states, most of the time, understand that defection will be noticed and potentially addressed—that is, punishment provisions exist but are left informal. In other words, within formal law, a consequential provision was left deliberately informal.

The study of formal international institutions, including their details and consequences, has dominated the political science and law literatures on international institutions over the past decades.Footnote 1 This is a tremendously important step in deepening our understanding of why and how international cooperation occurs. Scholars have shown that even the small details of formal international law, like the final clauses, matter.Footnote 2

Scholars and policymakers also know that informalism not only exists; it often is key to international cooperation across many issue areas. Yet, despite this recognition of the critical role played by informalism, scholars have struggled to articulate in any refined or testable way just when and how informalism is important and what role it plays either alone or in conjunction with formal cooperation. A key challenge in the study of informalism is identifying and quantifying what is indeed informal across more than a few, well-known cases.

In this article, I offer a theory and method to analyze the role of informalism across a large set of cases encompassing all kinds of international cooperation. Specifically, I examine the role of informal punishment provisions (that is, potentially implicit or unwritten provisions) within formal international law. The main research question is as follows: When states face an international cooperation problem requiring enforcement, why do they decide to make that enforcement formal or informal in any institution they create?

In what follows, I briefly review a few very important past scholarly contributions on informalism that are most relevant to this study as well as some important literature on compliance and the need for punishment provisions. I then present an overall theory of punishment provisions and a set of conjectures about whether punishments will be formalized or not. This theory gives rise to a two-part empirical analysis conducted on a large-n dataset featuring diverse issue areas as well as parties. The first part is the presentation of a simple model that predicts the presence of enforcement mechanisms in agreements. The second is an analysis of those cases that are “misclassified”—ones in which the model predicts the presence of such mechanisms, but the agreements lack them. The two-stage approach is necessary because informal punishment provisions are generally unobservable. The misclassified agreements are thus candidates for informal enforcement. I provide case study evidence that punishment can and does occur in cases in which the punishment provision was left informal. Overall, the results on punishment provisions presented here provide further evidence that the details of international law are chosen systematically, including the details of what’s left out!

The analysis shows that, in cases of great regime heterogeneity and large differences in the military capabilities of agreement members, informality provides states with flexibility to tailor their responses to defection based on the realized specifics. As elaborated below, these results indicate that both efficiency concerns and power influence the design of international law.

2 Things left out—useless or hidden uses?

Informalism has been defined in various ways. Schachter (1977) lists characteristics of legally non-binding agreements and considers such agreements as a whole informal (see also Aust 1986). Schachter mentions that imprecise, overly general wording in international agreements is often taken as indicative of non-binding intention and a low level of legal obligation. This account squares well with Abbott and Snidal’s (2000) soft law, characterized by weak legal obligations, vague wording, and weak or no delegation.

More recently, the literature has shifted emphasis to the relationship between formal and informal agreements. Cogan (2009: 212), who examines informal agreements in the selection of international bureaucrats, provides a very clear rationale for why informal agreements enjoy such a prevalence in the international system: “informal agreements largely take account of, and reallocate authority to match, the differences in power and interests that pervade the international system when those differences cannot be acknowledged formally.” Downs and Rocke (1990) highlight the role of tacit bargaining in fostering international cooperation, with tacit bargaining being more about action that signals intention than about formalized negotiation. They give many examples of tacit bargaining and importantly point out that most cooperation combines tacit and formalized communication. Lipson (1991) identifies a broad range of domestic factors that may motivate informal agreements. Compared to formally ratified treaties, informal agreements are easier for governments to negotiate, faster to implement, more flexible, and lack the public visibility of treaties. Lipson particularly emphasizes the ability of informal agreements to address uncertain, changing environments since such agreements can be adjusted and renegotiated more easily than formal agreements.

While these authors have noted that some cooperation is optimally left informal, none of their insights have been systematically tested or theoretically refined, perhaps because of the obstacles to quantifying what is informal. An exception is Stone’s (2011) analysis of informal governance in three important formal international institutions: the IMF, the WTO, and the EU.Footnote 3 Stone explicitly considers the relationship between formal and informal governance and concludes that scholars “generally failed to connect the dots, because they have not appreciated that informal governance mechanisms exist primarily to serve the interests of powerful states, while formal rules are generally designed to protect the weak” (2011, 207). Unlike most extant studies, Stone explicitly acknowledges that informal elements co-exist with formal rules in many international agreements, and indeed in many cases modify and even overrule formal procedures. He contends that the balance between informal and formal elements within an institution is an equilibrium outcome, derived from the member states’ power and interests.

This paper is connected to Stone’s work in that it examines (the absence of) inducements to compliance in international agreements as an example of informalism in formal international institutions. The agreement itself can still be worded very precisely and hence create strong legal obligations; nonetheless, at the time of negotiation, some subset of the agreement was left out to be regulated informally in the future.

Why are specific design provisions sometimes left out of international agreements? There are three potential explanations. First, a specific provision might not serve any purpose because the situation does not call for it. For instance, consider dispute resolution procedures. As Koremenos (2007) shows, including dispute resolution provisions in agreements is a deliberate choice in response to certain cooperation problems (see also Koremenos and Betz 2013). When those particular cooperation problems are absent, dispute resolution provisions are rare. Put differently, the efficient design of international law implies that unnecessary design elements will be left out. When states face incentives to defect attributable to a Prisoner’s Dilemma-like trade game, dispute resolution provisions make sense; when states find themselves in a simple coordination game to prevent frontier fires, dispute resolution mechanisms are unnecessary, and rational states should not pay the transaction and sovereignty costs of delegating such authority.

Second, other design features may fill their place; hence one design element is substituted for another. Extremely precise wording, for instance, may render dispute resolution mechanisms for agreement interpretation obsolete (Koremenos 2012). If precise wording allows parties to identify defection and hence predict with great certainty what would happen should dispute resolution be triggered, parties can save on delegation costs entirely. We rarely if ever see speeding violations identified by radar go to court.

Third, and what this article addresses, despite being useful for the underlying problem structure, states may deliberately leave out parts of an agreement. In such cases, the potential for the unwritten design mechanism to be triggered is implicitly understood. As mentioned, the Nuclear Non-Proliferation Treaty (NPT) is a striking example of this third category.

Of course, the literature provides alternative explanations as to why agreements like the NPT lack punishment provisions. The work of Chayes and Chayes (1993), for instance, implies that successfully negotiated agreements rarely need explicit punishment provisions. As they put it, “if the agreement is well-designed—sensible, comprehensible, and with a practical eye to probable patterns of conduct and interaction—compliance problems and enforcement issues are likely to be manageable,” and therefore strong enforcement mechanisms are unnecessary (Chayes and Chayes 1993, 183). Yet, punishment provisions could be one of the critical parts of such a well-designed agreement. The very existence of a punishment provision may tilt the calculus of states towards compliance, especially so when states have incentives to defect.

It should also be emphasized that punishment provisions need not involve military action. Military involvement is confined to few agreements and issue areas. For example, Chayes et al. (1995) note that they are not aware of any environmental agreement that allows for such punishments; nor do punishments have to involve economic sanctions, which are also rarely authorized in environmental agreements (Chayes et al. 1995, 79). However, these authors do find that environmental agreements often authorize “membership sanctions,” depriving violators of rights and privileges. The authors note that such membership sanctions are rarely used, thus dismissing them as ineffective. This is too quick a conclusion: the rare use of these punishments may indicate their very effectiveness in that they deter violations and make actual punishment unnecessary.Footnote 4 Hence, punishment provisions appear to be neither uncommon nor unnecessary for the functioning of international agreements.

The finding that punishment provisions such as membership sanctions are incorporated into international law poses another question, motivated by the Realist literature: In response to a violation, states could revert to the status quo in place before the agreement was concluded; thus, why incorporate punishment provisions, whether formal or implicit, in the first place? That is, a lack of punishment provisions may indicate that the participating states already have an understanding of the potential response to rule violations: retaliation that ends cooperation.

Punishment provisions provide an important advantage over threats to revert to the status quo. Reversion to the status quo (realized via “grim trigger strategies”) is often “over-punishing” and results in the breakdown of cooperation entirely. Punishment provisions that do not end all cooperation avoid these pathologies and allow for more robust cooperation over time.
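
The contrast between a grim trigger and a limited punishment can be illustrated with a small numerical sketch. The payoffs, the number of rounds, and the punishment length below are hypothetical values chosen for illustration, not figures from the article, and the sketch makes the simplifying assumption that the status quo payoff applies from the round of the violation onward.

```python
# Hypothetical per-round payoffs for one state (assumptions, not from the article):
# mutual cooperation pays 3, reversion to the status quo pays 1.
COOPERATE_PAYOFF = 3
STATUS_QUO_PAYOFF = 1


def total_payoff(rounds, violation_round, punishment_length=None):
    """Sum one state's payoffs when a single violation occurs.

    punishment_length=None models a grim trigger: cooperation never resumes.
    An integer models a limited punishment lasting that many rounds,
    after which cooperation resumes.
    """
    total = 0
    for t in range(rounds):
        if t < violation_round:
            total += COOPERATE_PAYOFF
        elif punishment_length is None or t < violation_round + punishment_length:
            total += STATUS_QUO_PAYOFF
        else:
            total += COOPERATE_PAYOFF
    return total


grim = total_payoff(rounds=20, violation_round=5)
limited = total_payoff(rounds=20, violation_round=5, punishment_length=3)
print(grim, limited)
```

Under these assumed numbers the limited punishment leaves the state with a larger cumulative payoff, because cooperation resumes once the punishment phase ends.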

3 A theory of punishment provisions in international agreements

In this section, I present first an overall theory of punishment provisions and second a set of hypotheses about whether any needed punishments will be formalized. This overall theory assumes that the driving force that brings states to the negotiation table is the desire to solve particular substantive problems, which are characterized by different cooperation problems. Particular design provisions are then chosen to solve these problems.Footnote 5 It is worth mentioning that the theory articulated here applies to inducements to compliance more broadly, that is, anything that changes the payoffs an actor receives from either cooperating or defecting. Thus, a reward is one kind of inducement in that it increases the payoff of cooperating; the flip side is a punishment that decreases the payoff from defecting. Because, as will be noted below, rewards are rarely formally incorporated into international agreements (an interesting future research issue by itself), I use the term punishment to capture either kind of inducement. But as the case study on the NPT below will show, rewards as well as punishments are indeed employed in cases in which the inducement to compliance is left informal.

3.1 When are punishment provisions needed in international agreements?

When would we expect to see punishment provisions incorporated into an international agreement? The Rational Design (Koremenos et al. 2001) and the Legalization (Goldstein et al. 2000) literatures offer conjectures. Rational Design starts from the premise that states and other international actors design institutions through purposeful, rational interactions and from the observation that international institutions display dramatic design variation. To understand the variation, Rational Design relies on game-theoretic insights to relate cooperation problems to specific design provisions. Cooperation problems, in turn, reflect the constellation of preferences and constraints in any given situation; Koremenos et al. (2001) as well as Koremenos (n.d.) provide a more extensive treatment.

A prime condition calling for punishment provisions is the existence of an enforcement problem: when the incentives to defect are large, states want to insure themselves against being the ‘sucker’ by being able to punish the defector (Koremenos et al. 2001:786).Footnote 6 Enforcement problems correspond to the celebrated Prisoners’ Dilemma, and one way to address such problems is to impose severe and credible sanctions on defectors. Because such sanctions lower the payoff from noncooperation in the original game, defection becomes less attractive for each party, and mutual defection is less likely to occur. Expecting cooperation to be maintained, states are thus willing to sign onto agreements that would be infeasible in the absence of punishment provisions. Hence, a first conjecture is that, other things equal, the presence of enforcement problems results in the inclusion of punishment provisions.
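
A minimal sketch of this logic, using hypothetical Prisoners’ Dilemma payoffs (the numbers and the function name are assumptions for illustration): a sanction deters a unilateral defection once it exceeds the gain from defecting.

```python
def defection_pays(temptation, reward, sanction=0.0):
    """Return True if a unilateral defection beats continued cooperation.

    temptation: payoff from defecting while the partner cooperates
    reward: payoff from mutual cooperation
    sanction: cost imposed on a defector by the punishment provision
    """
    return temptation - sanction > reward


# Hypothetical payoffs: temptation 5, reward 3, so the gain from defecting is 2.
print(defection_pays(5, 3))              # no sanction
print(defection_pays(5, 3, sanction=3))  # sanction larger than the gain
```

The comparison is deliberately one-shot; it leaves out discounting and the credibility of the sanction, both of which matter in the fuller argument.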

Domestic commitment problems constitute another factor that should, other things equal, result in the inclusion of punishment provisions in international agreements. A commitment problem arises if an actor’s current optimal plan for the future is no longer optimal once this future arrives; in other words, the current plan is inconsistent over time. The most prominent examples of such problems pertain to domestic monetary policy-making and promises not to nationalize foreign investments. For instance, a state may promise not to nationalize investments to attract foreign companies. Nevertheless, once foreign businesses invest in a state, the host government has an incentive to break its commitment and reap the benefits. Commitment problems also often arise out of domestic politics. In states that have not solved the credible commitment problem through domestic institutions, changes in the political support base of a government, for instance, may make it harder to comply with international commitments and thus trigger violations. Punishment provisions, whose negative consequences offset such political pressures, may then deter violations. That states intentionally try to solve such commitment problems through ‘tougher’ agreements is also recognized in the Legalization literature. As Goldstein et al. (2000: 393) put it, “Governments and domestic groups may also deliberately employ international legalization as a means to bind themselves or their successors in the future. In other words, international legalization may have the aim of imposing constraints on domestic political behavior.” Therefore, a second conjecture is that, other things equal, the presence of commitment problems results in the inclusion of punishment provisions.

A third cooperation problem potentially associated with the inclusion of punishment provisions is uncertainty about behavior, which captures situations where the actions of other states are difficult or impossible to observe, perhaps because they take place at the domestic level, to which a third party or other states have no access. If actions by other states are hard to observe, this creates incentives to defect since defections might go unnoticed. Hence, a third conjecture is that, other things equal, the presence of uncertainty about behavior results in the inclusion of punishment provisions.Footnote 7

Punishment provisions also become more attractive in multilateral agreements. In bilateral agreements, no coordination is necessary to punish a defector; Axelrod’s (1984) celebrated insights on Tit-for-Tat as a strategy in two-actor games impressively underscore this point. As Oye (1986, 19) points out, as the number of actors increases, the likelihood of including a state “too weak (domestically) to detect, react, or implement a strategy of reciprocity, that cannot distinguish reliably between cooperation and defection by other states, or that departs from even minimal standards of rationality” increases dramatically as well. Moreover, if punishments cannot be targeted to single defectors, but apply equally to all participants of the agreement, strategies of reciprocity become impossible to implement if states do not want to risk the breakdown of cooperation. Hence, the fourth conjecture is that, other things equal, bilateral agreements are less likely to include punishment provisions than multilateral agreements.

3.2 The design of punishment provisions: Formal or informal

What might explain why states leave out explicit punishment provisions in situations in which such provisions are deemed necessary? Informality is a form of flexibility in that the particulars can be decided both at a later time depending on the circumstances and on a case-by-case as opposed to uniform basis. Informality as defined in this analysis, the deliberate omission of a provision, is akin to an incomplete contract. Incomplete contracts arise because ex ante it is difficult to get a particular group to agree on specific provisions and because ex post parties may prefer discretion in how they react to particular events, like noncompliance.

Below, I first explain how heterogeneity among the participating states is related both to the difficulty of getting things agreed upon ex ante and to the usefulness of discretion ex post, and how heterogeneity might therefore lead to informalism.Footnote 8 I then explain how great differences in power might lead powerful states to prefer informal to formal punishment as a way of exerting more control over outcomes.Footnote 9

3.2.1 Heterogeneity among participants

Heterogeneity among parties to an agreement makes compromise harder to achieve. Decentralized (that is, not formally and explicitly defined) punishment provisions thus become more attractive. Oates (1999) makes a similar point in the context of fiscal federalism: when preferences are diverse, rather than agreeing on a centralized, uniform mechanism, it becomes more attractive to delegate decision-making authority and fiscal autonomy to constituent units.Footnote 10 Ehrlich and Posner (1974) connect this mechanism to law-making: if compromise is hard to achieve, it becomes more attractive to leave out specifics. This argument is in line with Koremenos (2012) who, building on Ehrlich and Posner, argues that international lawmakers design law efficiently when choosing the precision or vagueness of the substantive terms. In particular, a greater number of participants (which tends to increase heterogeneity) makes it harder to agree on precise rules. Such rules could be substantive ones, as in Koremenos (2012), or they could be procedural, such as the punishment provisions examined here.Footnote 11

The following Department of State quote illustrates the difficulty of a uniform strategy and is from the period when the Convention on the Elimination of All Forms of Discrimination Against Women (CEDAW) (coded below as having informal punishment) was negotiated:

But what measures or combination of approaches will be most effective: pressure of public opinion, denial of trade or aid, or quiet diplomacy behind the scenes? Are there ways of providing incentives (as well as pressures) for improved human rights practices? …. And finally, what are the relative advantages and disadvantages of bilateral and/or multilateral approaches; how can they most effectively be used in combination?

Because of the complexities involved in the above factors, the Department has been unable to find any single formula for categorizing the human rights situations that require attention and action, and has tentatively concluded that decisions have to be made case by case from an analysis of all the circumstances involved.

Thus, a first prediction is, other things equal, agreements with informal as opposed to formal punishment provisions are characterized by greater heterogeneity among participants.

Of course, an extremely high level of regime heterogeneity might lead to no agreement at all. Some scholars worry, and rightly so, about the conceptual and evidentiary consequences of focusing on the selected sample of realized agreements rather than on the latent population of all potential agreements. Koremenos (book manuscript) argues that in many substantive contexts, it is the effect on realized agreements that is of primary interest. Even in cases where we care about the effect in the latent population, that analysis clarifies when, for example, the sign of the effect will be the same in the two populations. Without question this selection problem merits rigorous scholarship in and of itself, especially when the latent population is difficult if not impossible to enumerate. Nonetheless, within the sample of actually realized agreements, the theoretical argument implies that the more heterogeneous the participants are, the more likely the agreement will have informal as opposed to formal enforcement provisions.

3.2.2 Power

Informal punishment provisions are also useful for accommodating power differences within international agreements. Tierney (2008: 284) argues that powerful actors are very careful about preserving their freedom of action. This view squares well with Stone’s (2011) insight that formal rules protect the weak and informal rules serve the powerful. Informal punishment provisions give powerful states more freedom with respect to the application of enforcement mechanisms, and thus give strong states more options in how they can exercise their power. Stone states: the formal rules “embody a broad consensus of the membership, while the informal rules allow exceptional access for powerful states” (Stone 2011: 13).Footnote 12 Such logic is consistent with the Rational Design conjecture, “Asymmetry of Control increases with Asymmetry Among Contributors,” since the flexibility afforded by informality allows strong states to exercise more control over the institution (Koremenos et al. 2001: 791). For these reasons, a second prediction is that, other things equal, agreements are more likely to have informal as opposed to formal punishment provisions the larger the power differentials among the parties.

4 Data and descriptive statistics

The data used in this article stem from the Continent of International Law (COIL) project, which in addition to theoretical development, features data collection on a random sample of international agreements. COIL includes 234 agreements drawn from the United Nations Treaty Series (UNTS), a database that comprises all agreements registered or filed with the UN Secretariat since 1946 as well as many agreements registered with the League of Nations. Importantly, such registration is a prerequisite to invoking an agreement before any organ of the United Nations, which creates incentives for states to register agreements.

COIL, following UNTS definitions, focuses on four major issue areas: economics, environment, human rights, and security. Conditional on these issue areas, 234 agreements were drawn randomly. This choice was in part motivated by the extant literature, which typically compares agreements within specific issue areas, as in Mitchell’s (2002–2011) database of International Environmental Agreements or the Alliance Treaty Obligations and Provisions data set of Leeds et al. (2002). More significantly, the choice was motivated by COIL’s theoretical premise that issue areas are comparable once one looks at the set of underlying cooperation problems that brought states to the negotiation. The data set contains 103 economics agreements, 43 environmental agreements, 41 human rights agreements, and 47 security-related agreements.

For the purposes of COIL, every UNTS agreement is considered an international agreement unless it is excluded by one of five criteria. For example, an agreement must involve at least two states; thus, agreements between one state and an international organization were excluded. Among the variables covered are the underlying cooperation problem(s); the main prescriptions, proscriptions, and authorizations; the existence of preambles and appendices; membership criteria, including any mention of nonstate actors; and flexibility provisions like escape clauses and reservations. For this article, the most important variable pertains to whether an agreement has inducements to compliance. (An agreement may, of course, have more than one type of inducement to compliance.) Table 1 displays descriptive statistics on this question.

Table 1 Inducements to compliance

Interestingly, only one agreement in the sample stipulates formal rewards for compliant behavior (and this agreement includes a punishment provision as well).Footnote 13,Footnote 14 In contrast, formal punishment provisions occur frequently in international agreements. In most cases, punishments are conducted by agreement members themselves or by intergovernmental bodies created by the agreement. However, punishments may also be delegated to already existing international institutions. In the context of punishment provisions, the most relevant such institution is the United Nations Security Council (UNSC). The UNSC is a source of punishments in multiple agreements in the sample, via states having the right to complain formally to the UNSC; consequently, these agreements are coded as having formal punishment provisions.

Table 1 also reveals that the incidence of punishment provisions varies vastly across issue areas. Almost half of agreements in the issue areas of economics and human rights contain punishment provisions; the share is much lower for environmental and security agreements.

Another dimension of COIL is the definition and coding of cooperation problems. More than a dozen different cooperation problems are identified, one or more of which may underlie any international agreement. COIL mandated a separation of coding for the cooperation problems (the independent variables in many analyses) and the design elements (the dependent variables), using two independent sets of coders. In coding the cooperation problems, relevant background information was analyzed. Sometimes, negotiators revealed the problems they were attempting to solve, and this is documented. More often, research needed to be done more broadly on the relationship among the relevant states (for example, in a bilateral agreement, the relationship of the dyad in the decade or two before the agreement is signed) and into the general problems of the sub-issue at the time. Only the substantive goals of the agreement were examined when trying to infer the underlying cooperation problem(s). Once background research was completed, a decision was made whether any particular problem met the threshold for inclusion. For example, bilateral agreements between Canada and the US were never coded as meeting the uncertainty about preferences threshold, whereas the Security Treaty between the United States (US) and Japan, signed in 1951 (UNTS 1835), is coded as exceeding the threshold.

5 Empirical testing

Based on the theoretical conjectures, we can build an empirical model of punishment provisions; all of the data come from the COIL dataset. Table 2 shows coefficient estimates and heteroskedasticity-consistent standard errors from a probit regression, with the presence of formal punishment provisions as the dependent variable; the regressors are based on the conjectures regarding when a punishment provision is needed, and all are coded as dichotomous variables. Issue area dummies address the stratified sampling in COIL, which is necessary because in some issue areas, like human rights, the population of agreements is quite small.Footnote 15
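
One way to write down a probit specification of this kind is sketched below. The regressor names are assumptions inferred from the four conjectures and the issue-area dummies described in the text; the exact set of regressors is the one reported in Table 2.

```latex
\Pr(\mathit{FormalPunishment}_i = 1)
  = \Phi\Big( \beta_1\,\mathit{Enforcement}_i
            + \beta_2\,\mathit{Commitment}_i
            + \beta_3\,\mathit{UncertaintyBehavior}_i
            + \beta_4\,\mathit{Multilateral}_i
            + \sum_{k} \gamma_k\,\mathit{IssueArea}_{ik} \Big)
```

Here $\Phi$ is the standard normal cumulative distribution function, $i$ indexes agreements, and $k$ indexes the four issue areas; all regressors are 0/1 indicators.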

Table 2 Probit results

The conjectures perform quite well. Except for the coefficient on uncertainty about behavior, all coefficients of interest are large in substantive terms and statistically significant at the 1 % level.Footnote 16 Table 3 displays marginal effects: by how many percentage points the presence of any particular cooperation problem increases the probability that an agreement contains a punishment provision. The first four columns show the marginal effects for an agreement in each of the four issue areas, where all variables but the respective issue area dummy are zero. The fifth column averages these marginal effects over issue areas, using the relative frequency of each issue area to weight the issue-area specific marginal effects. For instance, an agreement with an underlying enforcement problem is on average 28.7 percentage points more likely to have a formal punishment provision than an agreement without an enforcement problem.

Table 3 Marginal effects

The marginal effects of each cooperation problem are large in substantive terms: on average, the presence of a commitment problem, for instance, raises the probability that punishment provisions are included by 43 percentage points.Footnote 17 Overall, judging from the significance and size of these marginal effects, the conjectures regarding the need for punishment provisions outlined above perform extremely well.
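
The calculation behind marginal effects of this kind can be sketched as follows. The issue-area counts are those reported for the sample; the issue-area intercepts and the coefficient are hypothetical placeholders, not the Table 2 estimates, so the printed numbers are illustrative only.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def marginal_effect(issue_intercept, coefficient):
    """Change in predicted probability when one dummy regressor switches
    from 0 to 1, with all other regressors held at zero."""
    return normal_cdf(issue_intercept + coefficient) - normal_cdf(issue_intercept)


# Counts per issue area as reported in the article.
issue_counts = {"economics": 103, "environment": 43, "human rights": 41, "security": 47}
# HYPOTHETICAL probit intercepts and coefficient (placeholders only).
issue_intercepts = {"economics": -0.8, "environment": -1.5, "human rights": -0.7, "security": -1.6}
enforcement_coefficient = 1.0

total = sum(issue_counts.values())
effects = {
    area: marginal_effect(issue_intercepts[area], enforcement_coefficient)
    for area in issue_counts
}
# Weight each issue-area effect by the relative frequency of that issue area.
weighted_average = sum(effects[area] * issue_counts[area] / total for area in issue_counts)

for area, effect in effects.items():
    print(f"{area}: {100 * effect:.1f} percentage points")
print(f"weighted average: {100 * weighted_average:.1f} percentage points")
```

Because the probit link is nonlinear, the same coefficient yields a different marginal effect in each issue area, which is why the issue-area columns and the weighted average are reported separately.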

Relying on the Receiver Operating Characteristics (ROC), a heuristic used to evaluate fit of models with binary dependent variables (Greenhill et al. 2011, 992), the model seems to perform well for predictive purposes. In short, the ROC compares the fraction of agreements with punishment provisions that are (correctly) predicted to have punishment provisions with the fraction of agreements without punishment provisions that are (incorrectly) predicted to have punishment provisions. The area under the ROC curve is 0.904, indicating an astonishingly good fit; a model without predictive power would yield a score of 0.5 and a perfect fit a score of 1. Consequently, agreements without punishment provisions that were predicted to have them can be viewed as outliers, that is, candidates for informal punishment provisions.
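
The area under the ROC curve can be computed directly from outcomes and predicted probabilities. The sketch below uses the pairwise-comparison form of the statistic (the share of positive-negative pairs in which the positive case receives the higher score); the data are hypothetical and the function name is an assumption for illustration.

```python
def roc_auc(outcomes, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the share of (positive, negative) pairs in which the positive
    case receives the higher score; ties count one half.
    """
    positives = [s for y, s in zip(outcomes, scores) if y == 1]
    negatives = [s for y, s in zip(outcomes, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = agreement has a formal punishment provision.
outcomes = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
auc = roc_auc(outcomes, scores)
print(round(auc, 3))
```

A model that assigns every agreement the same score yields 0.5 under this definition, and a model that ranks every agreement with a provision above every agreement without one yields 1.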

Specifically, based on the coefficient estimates, we can predict the probability that each agreement in the sample should, theoretically, include punishment provisions. The predicted probabilities range from 0.0 % to 97.3 %, with a mean of 32.8 % and a standard deviation of 32.9 percentage points. Using 0.5 as the cut-off, 84.6 % of the agreements were predicted correctly; that is, they were predicted to have punishment provisions and in fact had them formally incorporated, or were predicted not to have any punishment provisions and in fact did not have any. Table 4 shows the 13 agreements that were predicted to have punishment provisions with a probability of at least 50 % yet do not have any, that is, the candidates for informal punishment provisions. This set of agreements, which I refer to below as misclassified agreements, will be subjected to further inquiry in the following sections, in particular as to whether the misclassified agreements do indeed have the characteristics they are predicted to have (greater heterogeneity and greater power differentials).
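The classification step can be sketched as follows. The `Agreement` record, the `evaluate` function, and the five example agreements are hypothetical names and values introduced only to illustrate the logic of applying a 0.5 cut-off and flagging candidates for informal punishment provisions.

```python
from dataclasses import dataclass

# Hypothetical illustration: agreement names and probabilities are invented.

@dataclass
class Agreement:
    name: str
    predicted_prob: float       # predicted probability of a punishment provision
    has_formal_provision: bool  # whether the text formally includes one

def evaluate(agreements, cutoff=0.5):
    """Return the share of correctly predicted agreements and the names of
    agreements predicted to have a provision that formally lack one."""
    correct = 0
    candidates = []
    for a in agreements:
        predicted = a.predicted_prob >= cutoff
        if predicted == a.has_formal_provision:
            correct += 1
        elif predicted:
            # Predicted to need punishment, yet no formal provision exists.
            candidates.append(a.name)
    return correct / len(agreements), candidates

sample = [
    Agreement("Agreement A", 0.9, True),
    Agreement("Agreement B", 0.7, False),
    Agreement("Agreement C", 0.2, False),
    Agreement("Agreement D", 0.4, True),
    Agreement("Agreement E", 0.1, False),
]

share_correct, candidates = evaluate(sample)
print(share_correct, candidates)  # 0.6 ['Agreement B']
```

In this made-up sample, Agreement B plays the role of a misclassified agreement, while Agreement D has a formal provision although the model predicts none.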

Table 4 Misclassified agreements

A web appendix provides a number of robustness checks with respect to methodological alternatives to identifying the set of misclassified agreements. The set of misclassified agreements using these alternative strategies is very similar to the one displayed in Table 4. Thus the main results of the paper hold under these alternative model specifications.

5.1 An analysis of misclassified agreements and the choice of formal versus informal punishment

The research design introduced in this paper will now be exploited to shed light on the choice of formal versus informal punishment provisions by comparing the set of agreements that are correctly predicted to have punishment provisions with those that need punishment but lack formal provisions, namely the misclassified agreements. According to the theory, the set of misclassified agreements should be characterized by informal punishment provisions, but the set could also include cases in which punishment is needed but not provided informally. The next section begins to address this conflation of categories: research thus far indicates that five misclassified agreements have some form of enforcement and only one seems to be a clear case of failed cooperation. This research, therefore, is supportive of the thesis: What’s left out is often really there. The statistics are especially promising given that the inability to find examples of informal punishment could also be taken as a sign that implicit punishment is having a deterrent effect.

5.1.1 Heterogeneity among participants

I expect misclassified agreements to be comprised of more heterogeneous sets of states than agreements correctly classified.Footnote 18 To assess the relationship between heterogeneity and potentially informal punishment provisions, I use regime type as an indicator of heterogeneity. The literature provides several measures; three are considered in the following: Polity scores (on a scale from −10 to 10), the Vanhanen democracy index (standardized to a 100 point scale), and the Freedom House democracy index (standardized to a 100 point scale).Footnote 19 Table 5 displays the results from the respective measures. To measure heterogeneity, for bilateral agreements the absolute difference in the respective democracy scores between the two participants is used. For multilateral agreements, first a data set with all dyads in the multilateral agreement was created; the absolute difference in the democracy indices for the most dissimilar dyad then determines the heterogeneity measure for the agreement (the “weakest link” assumption). All three measures of heterogeneity are substantively larger for misclassified agreements; using one-sided t-tests, the difference between misclassified agreements and the remaining agreements is in all cases statistically significant at the 5 % level. These results support the conjecture that informal punishment provisions are more likely in agreements composed of heterogeneous states.
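The heterogeneity measure described above can be sketched in a few lines. The function name and the example scores are hypothetical; the scores are written on the Polity scale (−10 to 10) purely for illustration.

```python
from itertools import combinations

# Hypothetical illustration: the democracy scores below are invented.

def heterogeneity(scores):
    """Largest absolute difference in democracy scores across all dyads.

    For a bilateral agreement this is the absolute difference between the
    two participants; for a multilateral agreement it is the difference
    for the most dissimilar dyad."""
    if len(scores) < 2:
        raise ValueError("an agreement needs at least two participants")
    return max(abs(a - b) for a, b in combinations(scores, 2))

bilateral = heterogeneity([10, -7])
multilateral = heterogeneity([10, 8, -3, 6])
print(bilateral, multilateral)  # 17 13
```

Because only the most dissimilar dyad matters, the measure equals the gap between the highest and lowest score among the participants.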

Table 5 Heterogeneity

Agreements with great homogeneity among participants are typically agreements composed of democracies; there are few homogeneous agreements in the COIL sample comprised largely of non-democratic states. Figure 1 plots average polity scores against the maximal difference in polity scores among the members to an agreement, the variable used in Table 5. The graph indicates a negative relationship between average polity scores and heterogeneity. Given heterogeneity in this context implies agreements that include non-democracies, the flexibility attained with informal provisions makes even more sense.

Fig. 1 Average polity scores and heterogeneity among participants

5.1.2 Power

In economic agreements, GDP is likely to be the most relevant measure of power, while in security agreements, military capabilities are the most relevant factor. Absent better measures, I will rely on GDP also to measure power differences for environmental and human rights agreements. Table 6 displays results from a probit regression with the standard deviation in GDP among the participants and the standard deviation in military capabilities among the participants as explanatory variables. As before, the relevant comparison is between misclassified agreements and those agreements that are predicted to have punishment provisions and indeed have them. It should be noted that data availability severely limits the sample. The results are interesting. Informal punishment provisions are more likely to be in place when participants have different military capabilities, but they are less likely if participants differ in terms of wealth. Thus, it appears that while differentials in military power tend to result in informalism, as both Stone and I predict, differentials in economic power have the opposite effect.

Table 6 Probit results–power differences

Given this is the first systematic treatment of informal provisions in formal law, formal and informal punishment provisions are treated as substitutes. In reality, informal efforts can supplement formal ones at any time.Footnote 20 The complementarities might be like those analyzed by Verdier (2008) in the nonproliferation regime, where the US supplements formal provisions with informal ones to target inducements to the particularities of the noncompliant state.

6 Does punishment ever occur in “misclassified” agreements?

The data overwhelmingly point to regime-type heterogeneity and differences in military capabilities as promising explanations of when and why states leave punishment provisions informal. Still, an important issue must be addressed for the argument in this paper to be compelling: What happens when a state defects and a punishment provision was left out? If nothing ever happens, one could argue punishment is not informal; rather, it is nonexistent.

Of course, it would be a fallacy to infer from the nonuse of punishment provisions, whether formal or informal, that they are inconsequential. Making the leap from unused to ineffective punishment provisions overlooks the anticipatory behavior of states. In fact, infrequent recourse to actual punishment may indicate the effectiveness of this institutional design choice. If the threat of punishment, whether formalized or left informal, is credible, states that prefer defection do not even join the agreement, given that their preferred strategy is likely to be too costly to be beneficial once the threat of punishment is incorporated into their payoffs. For those that prefer long-term cooperation with their partner(s) to long-term defection but face incentives to defect, the threat of punishment may keep them from defecting. In other words, when the threat is credible, actual punishment is off the equilibrium path behavior.

Still, the potential for punishment does not perfectly deter noncompliance in the domestic context let alone the international one. Some states do indeed defect from international law. While there are examples of formal punishment occurring as, for example, in the case of Bilateral Investment Agreements (and, of course, there exist examples of it failing to be employed as well),Footnote 21 there are also cases of informal punishment occurring.

Consider one of the misclassified agreements: The International Convention for the Suppression of the Financing of Terrorism. Given the underlying cooperation problems, one would predict the treaty would incorporate verification and enforcement. Yet, the treaty establishes no such mechanisms. As Rosand’s (2003:333) paper on the convention puts it, “some states still want their friends to be able to use terrorism to advance their favorite causes. The oft repeated phrase ‘one man’s terrorist is another man’s freedom fighter’ unfortunately remains relevant.” Rosand thus indirectly calls attention to regime heterogeneity as a possible explanation of the lack of formal punishment. There is good evidence this agreement is now being enforced, despite the lack of a formal enforcement provision. Specifically, when it was needed, a UNSC Resolution (1373) was adopted that was based on the agreement and effectively enforces it—even though the UNSC is never mentioned in the agreement text.Footnote 22 Thus it is not a large leap to say punishment was implicit despite being left out formally. This example also suggests the UNSC may play an even more important role for informal enforcement than what one would expect based on an evaluation of the treaty texts. The subsections below reinforce this point about the role of the UNSC—an organization controlled by powerful states.

6.1 The nuclear nonproliferation treaty

The NPT provides strong case study evidence in favor of the theoretical arguments presented here.Footnote 23 First, the UNSC has indeed acted to sanction states that are developing or threatening to develop nuclear weapons but who are not authorized by the NPT to do so. Furthermore, powerful states have acted alone to sanction violations, with varying degrees of punishment depending on the particulars of the noncompliant state. Second, and importantly, reactions to such threats or instances of nuclear proliferation (real or presumed) are quite different depending on whether the state in question is a member of the NPT or not. In other words, the NPT itself seems to contain informal enforcement provisions, and the counterfactual that the UNSC or particular powerful states would have acted correspondingly without the independent effect of the NPT is not supported.Footnote 24 I elaborate below.

The NPT permits five states, the US, France, Russia, the United Kingdom (UK), and China, to be in possession of nuclear weapons. However, over the course of history, other states have developed, or are believed to be developing, nuclear capabilities as well. Kazakhstan, Belarus, and Ukraine had nuclear weapons following the collapse of the Soviet Union, but promptly disavowed their weapons, signing the NPT and returning their nuclear weapons to Russia. Similarly, South Africa possessed nuclear weapons before signing the NPT but dismantled its weapons and signed the treaty. Four states—Iran, North Korea,Footnote 25 Iraq, and LibyaFootnote 26—are signatory countries that the International Atomic Energy Agency (IAEA) believes (or has recently believed) to be in violation of the NPT. Three other states possessing (or believed to possess) nuclear weapons— Israel, India, and Pakistan—are not party to the NPT. These cases provide insight into when and how “informal” punishment occurs.

The case of Iran is, perhaps, the most straightforward example. IAEA reports have long expressed concern about Iran’s potential capability to develop nuclear weapons. As a result of these reports, UNSC Resolutions imposed sanctions on the state of Iran (primarily with regard to the sale of weapons or weapons materials to Iran).

The attempt to halt North Korea’s program utilized carrots and sticks with a combination of threats, security guarantees, and aid packages, thereby illustrating the theoretical point that rewards and punishments are two sides of the same coin. In exchange for freezing its nuclear program, the US offered North Korea two nuclear reactors, 500,000 t of oil per year, and normalized political and economic relations. Despite this, a deal failed to materialize, and North Korea was later threatened with further isolation and sanctions.Footnote 27

With respect to Iraq, in 1998, President Saddam Hussein expelled UN weapons inspectors.Footnote 28 Subsequently, only two IAEA inspections occurred.Footnote 29 In June 2001, the Nuclear Control Institute called attention to “troubling indications over the last 2 years that Saddam’s nuclear-weapons program has not only survived, but been reinvigorated.”Footnote 30 The US and the UK invaded Iraq in 2003, and, although the invasion is hotly debated, many argued Iraq’s rejection of IAEA weapons inspection was in direct violation of UNSC Resolution 687Footnote 31 as well as other UNSC resolutions and thus the invasion was warranted. While perhaps not the most significant factor, the NPT was used legally by the US as one justification of a levied punishment.

These examples highlight how informal punishment can be more or less tailored to the specific regime in question. Both the UNSC and the US have played roles in enforcing the NPT, despite the treaty’s lack of formal provisions. Yet, not all violations of the NPT are dealt with by the UNSC or by other states. For example, although the IAEA has called attention to likely Syrian violations of the NPT,Footnote 32 no sanctions have been adopted either by the UNSC or by any major power. This is due to the continued support of Syria by UNSC members Russia and China, even in the wake of current bloodshed. The Syrian example illustrates the discretion enjoyed by powerful states to withhold punishment if their interests dictate such a course of action.

The US’ response to a West German deal with Brazil is illustrative in this regard as well. In 1975, after West Germany had signed the NPT, its government signed a secret deal that would aid Brazil in the completion of eight nuclear power plants.Footnote 33 However, since Brazil was not party to the NPT, this deal violated an important part of the NPT, which gives preferential treatment in gaining assistance with peaceful uses of nuclear technology to member states. The US, and other powers, including the Soviet Union, disapproved of this deal.Footnote 34 Under President Ford, strong sanctions were considered, but “when it was suggested that U.S. troops stationed in Europe and joint initiatives with the Soviets be used to put pressure on the Germans, Kissinger felt obliged to argue that this was the wrong way to treat a close ally” (Kaiser 1978:89).

The Syrian and (albeit less serious) German cases could be used as counter-arguments to the thesis that the NPT is enforced informally even when no formal punishment mechanism exists within the treaty’s text. Yet, in both cases, sanctions were entertained. Additionally, it is important to remember when examining informal enforcement of international law that no international, or, for that matter, domestic law, with formal punishment provisions is perfectly and consistently enforced either. Given that even under formal domestic law, violations are not always punished, consistent enforcement is a flawed benchmark for assessing whether or not failure to respond in these cases represents a deviation from the norm of informal punishment. And because military power asymmetries characterize the NPT, we would expect powerful states to exploit their discretion from time to time. Finally, the Syrian case is far from closed.Footnote 35

It is instructive to compare cases of member violations with non-member “violations.” If the treaty is key to the punishment of treaty violators, violators not party to the treaty should not be punished (or threatened with punishment) with the same frequency.

The facts do seem to support the hypothesis that violations of the norms of non-proliferation by non-members have, on average, received a weaker response from the UNSC or the US. Both Israel and India have become nuclear weapon states without ratifying the NPT and have received merely ineffectual condemnations. As Charnysh bluntly states: “The US government pursued a policy of silence towards the Israeli nuclear weapons program.”Footnote 36 And ending a moratorium on nuclear trade with India, the 2008 US-India Civil Nuclear Agreement, which necessitated an official waiver from the Nuclear Suppliers Group, allows US companies to work in partnership with India on the development of nuclear reactors.Footnote 37 The agreement forces concessions on India (e.g., India agreed to open itself to inspections by the IAEA); still, such concessions are not commensurate to the punishments directed towards violating member states.

Brazil and Argentina provide another case in point. Both states had aggressive nuclear weapons development policies during the 1970s and 1980s, a time when both were nonsignatories to the NPT. Obviously, states like the US were not in support of these policies. Still, little, if any, coercive pressure was used against these states by either the US or the UNSC. The US held back on transferring certain technologies (which is actually in the spirit of its NPT obligations) but did little else. It is not farfetched to argue that, had the US wanted to coerce these states into dismantling their programs, it had the economic and military power to do so. For example, “when Carter took office in 1977, he stepped up pressure on the Argentines to halt what his administration saw as gross human rights abuses. The U.S. cut back on military and economic aid and began collecting information on incidents of kidnapping, torture and killing.”Footnote 38 As noted above, harsher measures were considered against Germany, an NPT member state, for its dealings with Brazil than against Brazil itself. Redick, a specialist in Latin American nuclear energy programs, recounts the roots and manifestation of the rivalry and its evolution to today’s situation in which both states have acceded to the NPT and the Treaty of Tlatelolco. He argues US pressure was quite low with respect to getting these states to change their policies, stating:

Now, I have not included direct external pressure as one of the main reasons for the change. I think the external pressure, in the case of Argentina and Brazil, was far more effective in the form of incentives rather than penalties. Certainly, the restrictive foreign export policies that the U.S., the Germans, the Canadians and others used contributed to the expense and difficulty of the Argentine and Brazilian nuclear programs. But this was not the reason why the two countries reversed their policies…Footnote 39

Yet, consistent with the thesis, the German export policies referred to above were put in place due to the pressure of the US under the Carter administration.Footnote 40

6.2 Nonstate enforcement mechanisms

Human rights agreements dominate the set of misclassified agreements in Table 4: They often lack formal punishment provisions even though there are incentives to defect. Simmons (2009) offers an explanation based on domestic political factors. She argues that human rights treaties become meaningful, and thereby exert compliance pressures, by empowering domestic (or transnational) individuals or groups; these actors use human rights commitments to pressure governments into obeying higher standards. Moreover, international agreements may help local actors define their agendas more clearly, agree on a common set of priorities, and obtain additional options for litigation, thereby gaining a more effective bargaining position vis-à-vis the government. The presence of such mechanisms renders punishment provisions in some agreements obsolete and, as Simmons convincingly shows, especially so in human rights treaties. Two of the misclassified agreements in Table 4, CEDAW and the International Covenant on Civil and Political Rights, are studied in depth in Simmons’ book. It will be illuminating to study the remaining agreements in Table 4 from Simmons’ perspective.

6.3 UNSC action

One might ask whether the UNSC ever enforces agreements in issue areas that, unlike that of the NPT, are not related to security. In fact, informal punishments are levied through many UNSC Resolutions that are aimed at human rights abuses. The UNSC has acted to impose punishments on Sudan (from shaming to travel ban and asset freeze)Footnote 41 and Sierra Leone (petrol sanctions, travel ban, and arms embargo),Footnote 42 to name a few of the states targeted.

The case of Libya illustrates the references to international law quite well. UNSC Resolution 1970, from February 26, 2011, states, “Considering that the widespread and systematic attacks currently taking place in the Libyan Arab Jamahiriya against the civilian population may amount to crimes against humanity … Urges the Libyan authorities to Act with the utmost restraint, respect human rights and international humanitarian law, and allow immediate access for international human rights monitors.” Footnote 43 These references to international humanitarian law suggest that the punishments contained in the resolution are directed at these violations of international law. The resolution itself contains the following punishments: an arms embargo, a travel ban, an asset freeze, and the possibility of sanctions against Libya, as well as referring the situation to the International Criminal Court. The most recent resolution, UNSC Resolution 2009, adds additional references to international law to condemn “violence against civilians, or arbitrary arrests and detentions, in particular of African migrants,” which violates the International Convention on the Protection of the Rights of All Migrant Workers and Members of their Families, as well as “sexual violence, particularly against women and girls,”Footnote 44 which violates the CEDAW. Both agreements are listed as misclassified and hence candidates for informal punishment. Libya had acceded to both of these agreements at the time.Footnote 45

6.4 The possibility of failed cooperation

Another reason why formal punishment provisions might be left out of agreements is that governments may have only weak preferences for cooperation. Thus the set of misclassified agreements, for instance, conflates two categories: Agreements with implicitly informal punishment provisions and agreements without such informal punishments even though they are needed. Agreements in this second category might be considered “failed cooperation.”

Such failed cooperation may arise because interest groups pressure governments into pursuing international agreements, especially in the issue areas of the environment and human rights. Instead of refusing to cooperate entirely, states may draw up an agreement that responds to interest group demands yet does not include any enforcement mechanism, explicit or implicit. Many human rights agreements and environmental agreements are subject to exactly this criticism: they state grand goals but fail to include any mechanisms to enforce those goals.

The “Convention on Nature Protection and Wild Life Preservation in the Western Hemisphere,” categorized as misclassified, arguably falls under this category. There is some evidence that participating states were reluctant to agree to the loss of sovereignty that would come with formal enforcement provisions; the Convention accordingly came under criticism.Footnote 46

Yet, and from this perspective unexpectedly, even this agreement opened up opportunities for informal enforcement. In 1944, the US considered pressuring Ecuador on the issue of preserving the Galapagos archipelago. It was only the prevalence of military considerations that led the US to turn a blind eye. As the Secretary of State, Cordell Hull, explained in a memorandum to President Roosevelt, the issue of the Galapagos archipelago was dropped eventually “in order to avoid possible jeopardy to negotiations recently authorized relating to the use of [a military] base during the war,” promising to return to the status of the archipelago “at the earliest possible juncture.”Footnote 47 Even though the Convention arguably falls under the category of agreements that intentionally lacked any enforcement capacities, it nevertheless provided an opportunity for informal enforcement.

It should also be noted that “weak preferences” for cooperation are very different from the design of “weak enforcement” mechanisms. Weak enforcement, as considered by Downs and Rocke (1995), refers to the existence of punishments costless enough to allow defection from agreements from time to time (yet costly enough to prevent violations most of the time). Downs and Rocke point out that the GATT’s weak enforcement norm allowed governments temporarily to suspend cooperation in response to pressures by domestic interest groups. The absence of strong enforcement mechanisms was a rational response to the prevalent uncertainty about interest group demands, which in turn arose from fluctuations in technology or the world market.

This discussion underscores that weak preferences and weak enforcement arise for opposite reasons. Weak preferences for cooperation arise when a government is not interested in cooperation and the outcome of an international agreement, but instead uses the agreement to placate interest groups. By contrast, weak enforcement arises when a government wants to form an international agreement, but fears that it will have to suspend cooperation temporarily to placate interest groups in the future (Downs and Rocke 1995: 88). In order to allow such room to move, the government prefers only limited enforcement mechanisms.

The reader should also note that weak enforcement is not equivalent to informal enforcement. In fact, as explained earlier, informal punishments can be very specific, very targeted, and very severe. Weak enforcement, by contrast, is first and foremost characterized by being limited in size and scope. One may go even as far as saying that informal enforcement, by being in principle unlimited in size and scope, is located at the opposite end of the spectrum.

7 Conclusion

To this day, it remains fashionable to dole out arguments about the weakness of international law. Even scholars sympathetic to international law, like Guzman (2008), underestimate the potential power of formal punishment provisions let alone informal ones. The analysis in this paper presents another blow to such arguments. Adding to over a decade of theory and empirical analyses showing the rationality and effectiveness of the design of international law,Footnote 48 I show here that even what’s left out is rationally designed and effective. The point made in the introduction, that the absence of formal punishment provisions does not imply the absence of punishment, is valid. And importantly, unlike extant studies on the weakness of international law, this study marshals strong empirical evidence in its favor by combining a novel research design with a scientific dataset on international law.

I present a theory of punishment provisions based primarily on Rational Design logic. The comparative static predictions flowing from this theory perform strikingly well when tested against a data set that indicates whether an international agreement contains a formal punishment provision. I then present theoretically-motivated hypotheses based on both power and efficiency about whether any necessary punishment provision will be formal or informal. While informal punishment cannot be observed in the data set, the research design I develop does allow me to test whether the predicted systematic difference exists between agreements that incorporate formal punishment and those that need it but lack it, and for which punishment may therefore be implicit. I am therefore able to test an implication of my argument. Given that a different theory of formal-versus-informal punishment would not have the same implications, the overall theoretical argument is buttressed. Finally, case study evidence suggests informal punishment indeed occurs. The analysis thus shows not only the compatibility of power and efficiency considerations; the results indicate that most of the time when punishments are needed they are indeed formalized. Thus while informal law is systematic, it does not dominate formal law.

Two important points are worth reemphasizing. First, not every misclassified treaty has been found to have informal enforcement (although future work that relies on interviews and archives may uncover additional evidence), but neither is punishment forthcoming in every treaty that has formal enforcement and in which defection occurs. Second, and relatedly, when we do not see any informal punishment occurring in a misclassified treaty, we cannot jump to the conclusion that the informal punishment is not implicit any more than we can jump to the conclusion that treaties with unexploited formal enforcement are weak.

Given the results in this article, a research program is suggested. One potentially fruitful avenue would be to find the negotiating record of “misclassified” agreements. Was the need for punishment provisions mentioned? If so, how did actors arrive at the decision to leave them out formally? Did states have different bargaining positions on the issue? Additionally, future research should examine false positives, that is, those agreements that have punishment provisions but for which such provisions are deemed unnecessary. Three of the twenty false positives in this analysis are International Labor Organization (ILO) Conventions. While these three conventions address issues that do not warrant punishment provisions, they were negotiated under the auspices of the ILO, which automatically provides enforcement power. One testable implication then is that this enforcement power has never been used.

Finally, other design provisions can be scrutinized with this same research design, including monitoring provisions and the informal role played by NGOs. All in all, there is every reason to believe that a new wave of research on informalism in international law will yield a level of insight into international cooperation commensurate with that yielded by the research over the past decades on formal law. It is an exciting frontier.