1 Introduction

A discussion of the merits and limits of self-governance necessarily takes place along two dimensions. First, it must bear on the possible emergence of cooperation norms. The question is whether norms can emerge spontaneously from repeated interactions between individuals or if norms must be created explicitly by the state.Footnote 1 Second, one must also consider which type of enforcement—private or governmentalFootnote 2—is required for individuals to comply with rules of conduct. Here, the specific question is whether private parties are able to devise mechanisms and institutions to ensure compliance with rules without the need of “the backing of state authority” (Benson 1991a).

With regard to these two sets of questions, there is a vast literatureFootnote 3 showing that self-governance works, and that it has been pervasive throughout human history. Admittedly, rules of conduct did not emerge only from private interactions, but compelling evidence suggests that they were also implemented successfully through private enforcement mechanisms. Examples range from primitive societies (Benson 1989, 1991a, b), ancient Greece (Karayiannis and Hatzis 2012), medieval Ireland (Peden 1977), and Iceland (Friedman 1978; Solvasson 1993) to diverse groups such as Maghribi traders (Greif 1989, 1993), the American West in the 19th century (Anderson and Hill 1977), pirates (Leeson 2007a, 2009a, b, 2014a), and prison gangsFootnote 4 (Sobel and Osoba 2009; Leeson and Skarbek 2010). Contemporary societies have also been included in this literature (Benson 1990; Ellickson 1991; Bernstein 1992; Leeson 2013). Private enforcement mechanisms in these instances often take the form of expressive punishment, such as social disapproval and gossip, among those with whom the individual interacts (Guala 2012).

If private enforcement thus appears to work sufficiently well, one may wonder: what is the usefulness of government enforcement? This is precisely the question that we address in this paper. Given evidence that private enforcement mechanisms are widespread, we seek to understand the benefits and costs of a government enforcement mechanism in the presence of private enforcement that is already in place. Specifically, we study the effects of adding and removing a government-like enforcement mechanism—in the form of centralized monetary punishment—on individual compliance with a cooperation norm that is created and enforced via peer disapproval.Footnote 5 Our analysis begins by establishing a private enforcement mechanism, namely peer control via peer disapproval, and proceeds by examining how the addition and removal of government enforcement impact cooperation levels and the effectiveness of private enforcement.

To do this, we build on a firmly established paradigm from experimental economics, the public good game that we present in Sect. 2. Our two main results are presented in Sect. 4. First, we confirm that implementing government enforcement that is aligned with a pre-existing cooperation norm stabilizes contributions to the public good over time. Our second finding, however, is less favorable regarding the impact of a government enforcement mechanism on the effectiveness of pre-existing privately enforced norms. We show that there exists a crowding-out effect or a sort of double “stickiness” regarding the post-intervention effects of government enforcement.Footnote 6 Our results suggest that once individuals have been exposed to government enforcement, this enforcement cannot be removed without reducing cooperation levels and damaging the effectiveness of the private enforcement mechanism in the process.

The existing literature, presented in Sect. 3, struggles to explain how the removal of government enforcement may negatively impact contributions to a public good. We complement the existing literature by showing that removing government enforcement, in the form of a centralized monetary punishment, makes low contributors unresponsive to the private enforcement mechanism of peer disapproval that remains in place.

2 Experimental design

2.1 Experimental game

We use a public good game that represents a stylized model of a community in which each person’s well-being depends on his or her own contribution and on the contributions of others. Individually, each member is best off if he or she contributes nothing and relies on others’ efforts to create social benefits by behaving cooperatively. Thus, the public good game provides the closest scenario to real-life settings in which some kind of coercion by the state may, a priori, increase social welfare.

The basic structure of our experimental game follows the well-established design of a repeated linear public good game employing standard parameters. Ledyard (1995) and more recently Chaudhuri (2011) provide elaborate descriptions of how public good games are implemented. In our experiment, groups are composed of \(n = 4\) subjects. Each subject is endowed with \(E_{i} = 20\) tokens at the beginning of each period, which must be allocated to a public account (\(g_{i}\)) or left on the subject’s private account (\(c_{i}\)). Each participant i must make a contribution decision \(g_{i}\) (\(0 \le g_{i} \le 20\)). Contributions are made in whole tokens, simultaneously and without any communication. In the Baseline, each token left on the private account generates a benefit equal to 1 Ecu. In addition to the tokens kept on the private account, each participant receives a fixed benefit of \(\alpha = 0.4\) Ecus for each token in the total group contribution to the public account, where \(0< \alpha <1 < n \alpha\). Thus, the individual payoff function (\(\pi _{i}\)) is the following:

$$\begin{aligned} \pi _{i} = 20 - g_{i} + 0.4 \sum \limits _{j=1}^4 g_{j} \end{aligned}$$

The value provided to individuals by the public good is a linear function of how much of the public good is provided. From \(1 < n \alpha\) it follows that the Utilitarian optimum and the efficient symmetric outcome is for all group members to contribute their entire endowments to the public account. This would, in effect, maximize the gains at the group level (each contributed token returns \(n \alpha\) Ecus to the group as a whole). However, at the individual level (assuming pure self-interest), each subject is better off contributing zero to the public account.
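As a brief worked check of these two claims, using only the parameters already defined (\(n = 4\), \(\alpha = 0.4\)), one can compare the change in a subject’s own payoff per additional token contributed with the change in total group earnings. The index k over group members is introduced here purely for illustration:

$$\begin{aligned} \frac{\partial \pi _{i}}{\partial g_{i}} = -1 + \alpha = -0.6< 0, \qquad \frac{\partial }{\partial g_{i}} \sum \limits _{k=1}^n \pi _{k} = -1 + n \alpha = 0.6 > 0 \end{aligned}$$

Each token moved to the public account thus costs the contributor 0.6 Ecus while raising total group earnings by 0.6 Ecus, which is the tension that makes the game a social dilemma.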

In our experiment, each session is composed of three segments of ten periods each. In the first segment of each session (periods 1–10), which corresponds to our Baseline, we implement these standard parameters. In the second and third segments, our experiment differs, however, from standard public good games along two main dimensions.

The first dimension we manipulate is the strategic environment within which subjects make decisions. In our Government enforcement treatment (hereinafter GE), subjects are informed that 0.3 Ecus will be subtracted from the private account for every token not allocated to the public account. Writing \(s_{i} = 0.3\) for this per-token subtraction, the payoff function in the GE treatment is given by

$$\begin{aligned} \pi _{i} = E_{i} - g_{i} + \alpha \sum \limits _{j=1}^n g_{j} - s_{i}(E_{i} - g_{i}) \end{aligned}$$

This setup implies that while in the Baseline the return from each token left on the private account was 1 Ecu, this return is reduced to 0.7 Ecus in the GE treatment. The intensity, framing, and implementation of the subtraction rule were chosen so as to replicate three specific characteristics of punishments meted out by state authorities. First, the monetary punishment of offenders is typically mild (Engel 2014). Ostrom et al. (1992) also note that many successful communities frequently had recourse to mild monetary punishment. In order to implement a mild punishment, we set the subtraction rule so as to ensure that contributing zero remains the dominant strategy of money-maximizing individuals, which preserves the nature of the decision as a social dilemma, i.e., one that pits an individual’s interest against the interest of the group. To see why this is the case, consider the individual payoff function under our subtraction rule: \(\pi _{i} = 0.7 (E_{i} - g_{i}) + 0.4 \sum \nolimits _{j=1}^n g_{j}\). While full contributions from every subject in the group yield \(\pi _{i} = 32\) Ecus, contributing zero while the other three group members contribute fully, and paying \(s_{i} = 0.3\) Ecus for every token kept on the private account, yields \(\pi _{i} = 38\) Ecus for the free-rider. More generally, each token kept returns \(1 - s_{i}\) Ecus to the subject, whereas each token contributed returns only \(\alpha\) Ecus. Thus, a money-maximizing individual does not contribute to the public account so long as \(s_{i} < 1 - \alpha\).

Second, the punitive nature of the punishments meted out by state authorities is typically clear in real-world settings. Cooter (1984, p. 1523) argues that costly governmental punishment is a payment “imposed for doing what is forbidden” (emphasis added) rather than “the price of doing what is permitted”. To emphasize the punitive nature of our subtraction rule in the GE treatment, we highlight the fact that Ecus are subtracted when individuals deviate from the action that benefits the group, i.e., when they keep tokens on their private account and therefore do not place them in the public account. Specifically, the instructions read that 0.3 Ecus are subtracted for tokens not allocated to the public account.Footnote 7 In public good experiments, it is generally assumed that group members share the understanding that the desirable action of each individual is one that favors the interest of the group and that deviations from this action are undesirable [see, e.g., Andreoni and Gee (2012)]. Our GE treatment makes this contribution norm salient by emphasizing that keeping tokens on the private account constitutes a deviation from the outcome that is desirable for the group and that such deviations have monetary consequences. We avoid using words such as tax, punishment, or sanction in order to minimize experimenter demand effects (Zizzo 2010) and avoid the possibly varied connotations that participants may attach to this vocabulary.

Third, to mimic centralized government enforcement, we make it clear to participants that the subtraction rule is applied by the central computer. Because the legitimacy of enforcement figures has been shown to play an important role in public goods experiments with punishment opportunities (Baldassarri and Grossman 2011), we wish to minimize the possibility that the punisher is seen as illegitimate. Thus, while some recent experiments [e.g., Engel (2014)] used a randomly selected subject to act as the punishing authority, we elect to deliver punishment in this treatment through the central computer as the experimenter is most likely to be seen as a legitimate authority (Milgram 1963; Karakostas and Zizzo 2015).
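The payoff arithmetic behind the subtraction rule described above can be checked with a short sketch. The parameter values come from the text; the function and variable names are illustrative choices and not part of the experimental software, and the second rate (0.7) is a hypothetical value used only to show what happens once the condition \(s_{i} < 1 - \alpha\) fails.

```python
# Parameters taken from the text; all names are illustrative.
ENDOWMENT = 20   # tokens per subject per period (E_i)
ALPHA = 0.4      # return per token in the public account (alpha)
GROUP_SIZE = 4   # subjects per group (n)


def ge_payoff(own, others_total, s=0.3):
    """Payoff in Ecus under a subtraction rule with per-token rate s.

    own          -- tokens the subject places in the public account
    others_total -- tokens the other group members place there in total
    s            -- Ecus subtracted per token kept on the private account
    """
    kept = ENDOWMENT - own
    return kept - s * kept + ALPHA * (own + others_total)


def zero_is_dominant(s):
    """True if contributing 0 is strictly best against every total of the others."""
    max_others = ENDOWMENT * (GROUP_SIZE - 1)
    return all(
        ge_payoff(0, others, s) > ge_payoff(own, others, s)
        for others in range(max_others + 1)
        for own in range(1, ENDOWMENT + 1)
    )


full_cooperation = ge_payoff(20, 60)  # everyone contributes 20 tokens
lone_free_rider = ge_payoff(0, 60)    # subject keeps 20, others contribute 20 each
print(round(full_cooperation, 2), round(lone_free_rider, 2))
print(zero_is_dominant(0.3), zero_is_dominant(0.7))
```

The first line printed reproduces the 32 and 38 Ecus given in the text. The second shows that zero contribution is dominant at the experimental rate of 0.3 but not at the hypothetical rate of 0.7, which exceeds \(1 - \alpha = 0.6\).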

Our second treatment, the Peer enforcement treatment (hereinafter PE), was implemented in the following manner. This treatment adds a second stage to the standard linear public good game. In this stage, we introduce the option for group members to send disapproval points to each other. Following Masclet et al. (2003), each participant is given the opportunity to assign between zero and ten disapproval points to any other member in the same group if he or she is dissatisfied with the other’s contribution in a given period. Thus, the number of points sent represents the level of disapproval of a subject’s contribution. As such, they are not costly, in monetary terms, to send or receive—similarly to gossip or to facial expressions that indicate disapproval of someone’s conduct. An assignment of zero points indicates the absence of disapproval, while ten points indicate the highest level of disapproval. The functioning of the sending of disapproval points was explained to subjects by the following message:

In this stage you have the opportunity to register your approval or disapproval of each other group member’s decision by distributing points. You can award a large number of points to any member of your group if you disapprove of his or her decision: 10 points for the strongest disapproval, 0 points for the absence of disapproval. You may distribute any whole number of points between 0 and 10.

Finally, we combine the two dimensions—i.e., GE and PE treatments—in a third treatment, which we conceptualize as the Combined enforcement treatment (hereinafter CE).

2.2 Procedures

The experiment consists of eight sessions conducted at the Laboratoire Montpellierain d’Economie Theorique et Appliquee (LAMETA) in Montpellier, France, between March and September 2015. Twenty subjects participated in each session (with 16 in one session), for a total of 156 participants (54.49 % were female), invited via the ORSEE software (Greiner 2004). Seventy-nine percent of the subjects were students at one of the universities in Montpellier and 22.3 % of them were studying economics. Nine out of ten subjects had previously participated in a laboratory experiment. We ensured, however, that none had previously participated in an experiment with similar parameters. Terminals were separated by lateral partitions to ensure complete anonymity. Payments were made privately at the end of the session. The exchange rate was 15 Ecus \(=\) 1 euro. Subjects earned an average of 22 euros. Sessions lasted between 1.5 and 2 h, including initial instruction and payment of subjects.

At the outset of each session, subjects were informed that the central server would allocate them randomly to groups of four people. We employ partner-matching, so group assignments remain the same for the entire session. Each session consists of 30 periods, divided into three segments of ten periods. The total number of segments in the session was common knowledge, as was the fact that at the end of the experiment only one segment out of the three was to be chosen at random for payment. This procedure has been used by others in public good games, e.g., Andreoni and Miller (2002) and Goeree et al. (2002). Goeree et al. (2002) argue that paying for all decisions may provide stronger incentives, but paying only for some decisions may induce subjects to think more clearly about the payoff consequences of each decision rather than focus on the relative earnings aggregated over all decisions. This design allows us to avoid the problem of whether subjects are influenced by their cumulative earnings and thus care less about the consequences of their decisions in the last periods of the game.

In each session, subjects first played ten periods of a standard public good game, which corresponds to our Baseline. The same subjects then played another ten periods in either the GE, the PE, or the CE treatment. All subjects played the first ten periods under Baseline conditions in order to provide groups the opportunity to become accustomed to significant degrees of free-riding. This creates a challenging environment for each of our treatments employed in the second segment of the session.

The addition of a third segment allows us to examine the sequential effect of adding the GE treatment to groups that previously had been accustomed to peer disapproval under the PE treatment and vice versa. The third segment also allows us to compare two scenarios: one in which PE is the only mechanism that subjects experienced in both the second and third segments, and one in which PE had been supported by GE in the second segment but the latter was dropped in the third segment, leaving PE as the only way of sustaining cooperation norms.

Table 1 provides detailed information about the described treatments, the number of sessions for each treatment, and the segment in which each treatment was implemented.

Table 1 Experimental treatments

3 Theoretical background and predictions

3.1 Standard predictions

The traditional game theoretic model tested in most laboratory experiments assumes away the impact that social or internalized norms may have on behavior.Footnote 8 Also, monetary punishments are considered to change behavior only when these are optimal—that is, when option X is made more attractive relative to option Y in monetary terms. Thus, this model predicts that in our PE, GE, and CE treatments, none of the subjects will contribute to the public good.

Why is this? In our Baseline, it can easily be seen that the dominant strategy Nash equilibrium is for all subjects to keep all 20 tokens in their private account and contribute nothing to the public account. In equilibrium, this yields a gain of 80 Ecus at the group level. Alternatively, if all group members contributed their entire endowments to the public account, the individual gain would be 32 Ecus and total group earnings 128 Ecus. This corresponds to the Pareto optimal outcome, that is, the outcome that maximizes total group earnings. However, as is well-known, a rational money-maximizing agent would benefit from deviating from this strategy in favor of complete free-riding, hoping for a private gain of 44 Ecus (20 Ecus from the tokens kept plus 0.4 × 60 = 24 Ecus from the other three members’ contributions). Since the game is symmetric, this is the strategy adopted by everyone and each subject ends up with 20 Ecus, the value of the initial endowment.
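The Baseline figures quoted above can be reproduced with a minimal sketch that evaluates the payoff function of Sect. 2.1 on three contribution profiles. The function names and the representation of a profile as a list of four contributions are illustrative assumptions.

```python
ENDOWMENT = 20  # tokens per subject per period
ALPHA = 0.4     # return per token in the public account


def baseline_payoff(profile, i):
    """Payoff in Ecus of subject i given the list of all group contributions."""
    return ENDOWMENT - profile[i] + ALPHA * sum(profile)


def group_earnings(profile):
    """Sum of individual payoffs over all group members."""
    return sum(baseline_payoff(profile, i) for i in range(len(profile)))


nash_profile = [0, 0, 0, 0]          # everyone free-rides
full_profile = [20, 20, 20, 20]      # everyone contributes the full endowment
deviation_profile = [0, 20, 20, 20]  # subject 0 free-rides on full contributors

print(round(group_earnings(nash_profile), 2))         # group total in equilibrium
print(round(baseline_payoff(full_profile, 0), 2))     # individual payoff, full cooperation
print(round(group_earnings(full_profile), 2))         # group total, full cooperation
print(round(baseline_payoff(deviation_profile, 0), 2))  # lone free-rider's payoff
```

Under these assumptions the four printed values correspond, in order, to the 80, 32, 128, and 44 Ecus discussed in the text.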

In our PE, GE, and CE treatments, the subgame perfect equilibrium is the same as in the Baseline treatment. This is because peer disapproval is payoff-irrelevant and the monetary punishment is non-deterrent in terms of the benefit-cost ratio. Contributions should not be affected by the introduction of either of these treatments. This is also what we should expect under the combination of the two.

In contrast to these predictions, however, repeated public good experiments employing parameters similar to our Baseline have shown that subjects contribute about 40 % of their endowment in the first period and then reduce their contributions to reach virtually the game theoretic prediction in the last periods of the game [see Gaechter (2014)]. We should therefore expect subjects in our Baseline to behave differently than the game theoretic predictions presented above. Further, any impact of our key treatments—PE and CE—relative to the Baseline must be attributed to behavioral factors such as the desire to avoid peer disapproval, and its reinforcement or reduction by a government-like enforcement mechanism.

3.2 Peer disapproval as a private norm-enforcement strategy

Private enforcement can take many forms. In this paper, we study a decentralized mechanism based on the use of peer disapproval.

A great deal of evidence exists regarding how peers enforce cooperation norms in field settings. Field evidence, however, is often difficult to interpret, and limited control over experimental interventions in field settings means that measuring the interplay between peer enforcement and other types of enforcement can be problematic. It is for this reason, for example, that the crowding-out of social disapproval in Gneezy and Rustichini’s (2000) seminal paper is proposed as only one among other possible explanations for their results. For instance, the willingness to express disapproval in real-life settings may be influenced by factors such as the belief that those receiving disapprobation could retaliate (Nikiforakis 2008).

Laboratory experiments are therefore needed to control the expression of peer disapproval so as to establish its causal impact on those receiving disapprobation. Laboratory experiments have indeed helped to clarify the extent to which peers voluntarily enforce cooperation norms in groups. One robust experimental finding is that individuals are willing to enforce norms that support cooperation even when doing so is costly [for a survey of the literature, see Gaechter and Herrmann (2009)]. In most field settings, however, peer enforcement manifests in low or zero-cost behaviors such as ridicule, social disapproval, and gossipFootnote 9 (Boehm 1999; Feinberg et al. 2012). Ostrom et al. (1992) made the first attempts to design a laboratory experiment to study norm enforcement by peers. In the context of a common-pool-resource game, they show that people use “shaming” as a strategy to try to induce others to comply with what they consider to be appropriate conduct. The shaming strategy led to substantial improvement in cooperation levels. Gaechter and Fehr (1999) sought support for this outcome in the context of a public good game. They found, however, no impact of social approval and disapproval. Their design nonetheless neglects an important feature of real-life human interactions: in their experiment, approval or disapproval was not communicated directly from peer to peer. An experiment that did allow subjects to communicate their disapproval directly is Masclet et al. (2003). The authors found that simply giving subjects this opportunity increases compliance with cooperation norms. They explain this result by the fact that social disapproval instills shame and increases the psychological cost of selfishness. Subhasish (2013) and Nelissen and Mulder (2013) followed and confirmed the seminal result from Masclet, Noussair, Tucker, and Villeval. Since our PE treatment is based on Masclet et al. (2003), we expect to find similar results. Thus,

Hypothesis 1

Providing subjects the opportunity to express disapproval of others’ decisions is an effective strategy for creating and enforcing cooperation norms.

3.3 The interplay of private enforcement and government enforcement

Once cooperation norms have emerged and are privately enforced, government enforcement mechanisms may improve, reduce or leave unaffected cooperation levels. To formulate our hypotheses about how these two forms of enforcement might interact in our experimental setting, we refer to the existing literature on the complementarity or substitutability of added government enforcement, as well as the lasting effects associated with the government enforcement once it is removed.

3.3.1 Adding government enforcement

Analyses concerning the interplay between government and private enforcement address the efficacy of government enforcement with and without the private support of members of the society. Spontaneous order theorists urge public authorities to be cautious when implementing new rules (Boettke et al. 2008; Williamson 2009). As Boettke et al. (2008) emphasize, a sort of institutional stickiness exists whereby the effectiveness of government-enforced rules is improved if rule-makers take into account the importance of the temporal context in which these rules are implemented, specifically the importance of privately enforced norms as precedents for government enforcement. For instance, the French Civil Code, the Code Napoleon, worked because it was based on and wrote down the customs that preceded it (Josselin and Marciano 2002). The role of private enforcement is also emphasized by the law-and-economics of norms literature (McAdams 1997; Feldman 2009), which argues that punishment meted out by government authorities may strengthen the private enforcement of cooperation norms because government enforcement mechanisms indicate and legitimize these norms (Sunstein 1996; Cooter 1998). Enacting a law prohibiting littering, for instance, may induce people to expect to have to pay for non-compliance. In addition, it could induce people to expect to be a target of ostracism from others in their community to a greater extent than before if they are caught littering. In the absence of any legal statement about littering, however, the expectation of ostracism may be reduced, suggesting that government enforcement legitimizes private enforcement, serving as a signal that peer punishment of non-compliance is “backed” by the government.

The experimental literature on the effectiveness of government-like enforcement mechanisms, however, does not provide clear results. While some studies find evidence that punishment meted out by central government-like authorities can “reinforce” peer control, others find that the former crowds out the latter, or has no effect. The reinforcement hypothesis is supported, for instance, by Andreoni and Gee (2012) and Xiao and Houser (2011), who found that government-like enforcement reinforced cooperation norms, especially when the punishment was applied publicly. This finding is supported by Baldassarri and Grossman (2011), who analyze how centralized punishment affects cooperation in the context of a public good game. They conducted a lab-in-the-field experiment with small groups of farmers in order to study the effectiveness of a monetary punishment when the punishing authority is seen as legitimate. In field settings such as this, it is reasonable to assume that behavior is inherently driven by social norms [see Ellickson’s (1991) study of norms governing farmers and ranchers in Shasta County, California]. In contrast, Gneezy and Rustichini (2000), in their famous field work, found that imposing a fine on parents who are late to pick up their children from day care may in turn reduce parents’ expectations of receiving non-monetary punishment, such as disapproval, from daycare center employees. The situation studied by Gneezy and Rustichini (2000), however, more closely parallels a principal-agent relationship than a social dilemma, which leads us to base our Hypothesis 2 on results found by Baldassarri and Grossman (2011), who used a public good experiment, as we do in this paper.

Hypothesis 2

Since the experimenter is most likely seen as a legitimate authority, the introduction of punishment delivered through the experimenter should enhance the legitimacy of peer disapproval and result in increased cooperation.

3.3.2 Removing government enforcement

The lasting impacts associated with removing government enforcement have received only limited attention within economics. Nelissen and Mulder (2013) performed an experiment that is of particular interest to our work. They compare contributions after the removal of monetary sanctions to the situation when peer disapproval is abandoned and find that levels of cooperation decline more after the removal of monetary sanctions than after the removal of peer disapproval. Notwithstanding the interesting implications of their work (in particular, that monetary sanctions improve cooperation when they are in place but have a rather negative impact once removed), their experiment does not tell us how removing monetary sanctions may impact the effectiveness of a peer system of disapproval that may remain in place.Footnote 10 Stagnaro et al.’s (2016) experimental study may provide new insights into this question. They find that a centralized punishment institution positively affects cooperation even after it is removed, but does not affect subjects’ willingness to enforce a cooperation norm privately. Conversely, Peysakhovich and Rand (2015) find that prior exposure to centralized punishment positively affects private enforcement levels. It is worth noting that both Stagnaro et al. (2016) and Peysakhovich and Rand (2015) investigate subjects’ willingness to enforce norms of cooperation, which is different from studying the reaction of those who are punished by their fellow group members. Although our experiment sheds light on both dimensions, we focus more on the latter aspect.

The hypothesis that removing government-enforced sanctions might cause a significant change in people’s behavior has also been tested empirically by Funk (2007). Using observational data from a natural experiment, Funk demonstrated that the removal of the legal obligation to vote, which had been enforced by symbolic (less than $1) fines, had a significant negative impact on voter turnout in some Swiss Cantons. However, Funk (2007) does not investigate the link between the removal of government-enforced sanctions and the working of private enforcement.

Our experiment allows us to test whether the removal of government-enforced sanctions negatively impacts the effectiveness of the private enforcement that remains—namely, social disapproval. Hence, our Hypothesis 3 is as follows.

Hypothesis 3

We hypothesize that the removal of a government-like mechanism of punishment should signal that those who express high levels of disapproval are no longer backed by the authority that previously implemented the punishment. The reduced effectiveness of private enforcement, in turn, should negatively impact cooperation levels.

4 Results

The presentation of the results is divided into three parts. First, we answer the question whether peer disapproval (our PE treatment) is an effective mechanism for creating and enforcing private cooperation norms. Second, we investigate whether adding government-like enforcement in the form of a monetary punishment (our GE treatment) to a pre-existing system of peer disapproval improves cooperation. Third, we address whether removing government enforcement undermines the power of peer disapproval to change the behavior of low contributors, which corresponds to the transition from our CE treatment to our PE treatment.

4.1 Does peer disapproval create and maintain cooperation norms?

Table 2 provides detailed information about average group contributions by treatment and by segment.

Table 2 Average contributions to the public good by treatment and by segment

The first question we investigate is whether, in the absence of any explicit change in the benefit-cost ratio, higher contribution rates can be sustained by the expression of peer disapproval alone. To do this, we examine how subjects behave under the PE treatment compared to the Baseline. It is important to note that the same subjects transition from a state in which there is no punishment (i.e., our Baseline in segment 1) to a state in which each group member has the possibility of sending disapproval points to any other group member if he or she is dissatisfied with the other’s contribution decision. Such a design makes it more difficult to obtain a significant impact of peer disapproval than in the original study by Masclet et al. (2003) because we do not compare treatments in the same periods or segments. In other words, the Baseline always precedes the PE treatment.Footnote 11

Table 2 indicates that average contributions are substantially lower in periods 1–10 under the Baseline than in periods 11–20 under the PE treatment. If we pool the data from the sessions in which the Baseline is in the first ten periods and the PE treatment is in effect in periods 11–20 (this is the case in four sessions), then we obtain an average contribution of 5.36 in the Baseline and of 8.45 under the PE treatmentFootnote 12. A Wilcoxon signed-rank test shows that this difference is significant at conventional levelsFootnote 13 (z \(=\) 2.156, p \(<\) 0.05). Thus,

Result 1

The expression of peer disapproval has a positive and significant effect on average contribution levels. Peer disapproval is therefore an effective strategy to privately create and enforce norms of cooperation even when retaliation between agents is impossible.Footnote 14
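The paired comparison behind Result 1 can be sketched as follows. This is a minimal pure-Python version of the Wilcoxon signed-rank test with a normal approximation, written for illustration only. The input numbers are made-up placeholders and not the study’s data, the choice of one paired observation per unit is an assumption, and no tie or continuity correction is applied, so the statistic may differ slightly from what statistical packages report.

```python
import math


def wilcoxon_signed_rank(before, after):
    """Return (W_plus, z) for paired samples using the normal approximation.

    Zero differences are dropped and tied absolute differences receive
    average ranks. No tie or continuity correction is applied.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        # Extend the block while the next absolute difference is tied.
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, (w_plus - mean) / sd


# Hypothetical placeholder averages (tokens), one pair per unit of observation.
baseline_avg = [4.0, 6.5, 5.0, 7.0, 3.5, 6.0, 5.5, 4.5]
pe_avg = [7.0, 8.0, 9.5, 7.5, 6.0, 10.0, 8.5, 5.5]

w_plus, z = wilcoxon_signed_rank(baseline_avg, pe_avg)
print(w_plus, round(z, 3))
```

With these placeholder numbers every difference is positive, so the positive rank sum takes its maximum value of n(n + 1)/2 = 36 for n = 8 pairs.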

4.2 How does the addition of government enforcement affect cooperation?

The evolution in average contributions when the GE treatment was introduced after peer disapproval is shown in Fig. 1.

Fig. 1 Evolution of average contributions by group and by period

To answer the question whether the addition of GE to PE in the third segment improves cooperation, it was necessary to design a treatment in which PE is the only mechanism in place across both segments two and three in order to compare it to a scenario in which GE is added in segment three. It is evident that the combination of PE with GE (referred to as the CE treatment in Fig. 1) results in higher average contributions than when peer disapproval remains the only enforcement mechanism in the third segment. This outcome is shown in Table 2. When the PE treatment is the only enforcement mechanism in periods 21–30, subjects contribute on average 9.31 tokens over these periods, compared to 11.56 tokens when GE is added to PE over the same periods. However, a Mann-Whitney rank-sum test indicates that this difference is not significant (z \(=\) 0.817, p \(=\) 0.41).
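A between-subjects comparison of this kind can be sketched with a minimal pure-Python version of the Mann-Whitney rank-sum test. The sketch uses a normal approximation without tie or continuity correction, and the two samples are made-up placeholders rather than the study’s data. The unit of observation and all names are illustrative assumptions.

```python
import math


def rank_sum_test(x, y):
    """Return (U_x, z) for two independent samples (normal approximation).

    Tied values receive average ranks. No tie or continuity correction
    is applied, so results may differ slightly from statistical packages.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    rank_sum_x = 0.0
    pos = 0
    while pos < n:
        end = pos
        # Extend the block while the next pooled value is tied.
        while end + 1 < n and pooled[end + 1][0] == pooled[pos][0]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        members_of_x = sum(1 for k in range(pos, end + 1) if pooled[k][1] == 0)
        rank_sum_x += avg_rank * members_of_x
        pos = end + 1
    n1, n2 = len(x), len(y)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u_x, (u_x - mean) / sd


# Hypothetical placeholder averages (tokens) for periods 21-30.
combined_avg = [11.0, 13.5, 9.5, 12.5, 10.0]  # GE added to PE
pe_only_avg = [8.0, 10.5, 7.0, 12.0, 9.0]     # PE alone

u_stat, z_stat = rank_sum_test(combined_avg, pe_only_avg)
print(u_stat, round(z_stat, 3))
```

A positive z here indicates that the first sample tends to rank higher than the second. Whether such a difference is significant depends on the number of independent observations, which is small in group-level comparisons such as this one.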

Since from a statistical point of view we cannot reject the hypothesis that similar levels of cooperation are achieved with and without GE, does this imply that government enforcement is not worthwhile? To fully address this question, we explore the effects of government enforcement beyond its impact on aggregate contribution levels. In real-world enforcement scenarios, the stability of cooperation rates over time is of considerable importance.Footnote 15 For instance, Fig. 1 suggests that our PE treatment in segment three fails to maintain high contribution rates over time. However, we do not find any evidence of a similar decay in segment three of the treatment in which GE is added to PE. We explore this outcome statistically by comparing average contributions in the first five periods of segment three to average contributions in the last five periods of the same segment when GE is added to PE.Footnote 16 We find no evidence of a significant decay in cooperation in segment three when GE is added to PE. A Wilcoxon signed-rank test indicates no significant difference in average contributions between periods 21–25 and 26–30 (z \(=\) 0.408, p \(=\) 0.68).

This naturally prompts the question of whether the combination of private and government enforcement stabilizes contribution rates regardless of the environment in which this combination is employed. In other words, would we still observe such a stable evolution in contributions if the combination of the two enforcement mechanisms were preceded by some treatment other than PE in periods 11–20? To answer this question, we again implemented the CE treatment in periods 21–30 but, this time, preceded by the GE treatment in periods 11–20. It turns out that the environment within which the CE treatment is implemented does matter. In effect, when the combination of PE and GE is used after subjects have become accustomed to GE alone, average contributions are significantly higher in periods 21–25 than in periods 26–30 (z \(=\) 2.701, p \(<\) 0.01). Thus,

Result 2

We find that adding government enforcement to private enforcement prevents contribution rates from decaying over time. However, our results do not support the hypothesis that the combination of the two types of enforcement increases average cooperation rates.

4.3 Do post-government enforcement effects exist?

We now come to the core question that this experiment raises: does the removal of government enforcement impact the functioning of the private enforcement that remains in place? We hypothesized (see Hypothesis 3 above) that the removal of our GE treatment will not simply restore the power of peer disapproval to its initial level, but rather will undermine its ability to change the behavior of those who receive points of disapproval. To test this hypothesis, we compare cooperation rates under the PE treatment in segment three to cooperation rates under the same mechanism and in the same segment, with the only difference between the two being that one was preceded by PE in segment two, whereas the other was preceded by the combination of PE and GE in segment two. In this latter case, the government enforcement introduced at period 11 was removed after period 20, leaving only PE in periods 21–30. Consequently, the only difference between the two is that in one case subjects had been exposed to government-like enforcement in periods 11–20, but not in the other case.

Figure 1 (see above) shows that removing GE after period 20 results in a reduction in contributions in periods 21–30. Table 2 shows that subjects contribute an average of 3.81 tokens in periods 21–30 under the PE treatment once government enforcement was removed. We compare this to the 9.31 tokens contributed on average by subjects under the same private enforcement in the same segment three, but who previously had not been exposed to government enforcement. The Mann-Whitney ranksum test indicates that the difference between the two is significant at conventional levels (z \(=\) 2.287, p \(<\) 0.05).

One explanation for this finding could be that subjects simply stop expressing disapproval once the government enforcement is removed. However, we are able to reject this explanation and provide support for our Hypothesis 3, i.e., that disapproval points in fact lose their power to change people’s behavior. Figure 2 below compares the evolution of disapproval points sent in the PE treatment in segment three when it was preceded by PE in segment two with that in the PE treatment in segment three when it was preceded by the combination of PE and GE in segment two. It is evident that, if anything, subjects express more disapproval in periods 21–30 after having been exposed to GE than when GE had never been introduced. The Mann-Whitney ranksum test indicates that the difference between the average number of disapproval points sent in segment three between these two treatments is statistically significant (z \(=\) 1.993, p \(<\) 0.05).

Fig. 2 Evolution of disapproval points sent by treatment and by segment

This outcome seems to lend support to our Hypothesis 3, namely that previous exposure to government enforcement undermines the power of peer disapproval to change the behavior of those who receive points of disapproval. To test this hypothesis further, we distinguish between two types of contributors, low and high, as disapproval points may have different effects on those who contribute less or more than the average contribution in their group. Accordingly, we define a low contributor as someone who contributed less than the mean contribution level of their group in period t.Footnote 17 High contributors are those who contributed more than the mean contribution level of their group in period t. Figure 3 shows that previous exposure to government enforcement does indeed affect the behavioral responses of low and high contributors to receiving disapproval points in different ways.
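The classification rule described above can be sketched in a few lines of Python. The group composition and contribution values are hypothetical, and the separate label for subjects who contribute exactly the group mean is an assumption made here, since the definition in the text covers only contributions strictly below or above the mean.

```python
def classify_contributors(contributions):
    """Label each subject relative to the group's mean contribution in period t.

    contributions: dict mapping a subject id to that subject's contribution.
    Returns a dict mapping subject id to "low", "high", or "at_mean".
    The "at_mean" label is an assumption of this sketch.
    """
    group_mean = sum(contributions.values()) / len(contributions)
    labels = {}
    for subject, contribution in contributions.items():
        if contribution < group_mean:
            labels[subject] = "low"
        elif contribution > group_mean:
            labels[subject] = "high"
        else:
            labels[subject] = "at_mean"
    return labels

# Hypothetical group in one period (NOT the paper's data); the mean is 6 tokens.
period_t = {"s1": 2, "s2": 10, "s3": 6, "s4": 6}
labels = classify_contributors(period_t)
print(labels)
```

Because the classification is recomputed in every period, the same subject can be a low contributor in one period and a high contributor in the next.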

Fig. 3 The relationship between disapproval received in period t and the change in contribution from t to t + 1

Figure 3(a) shows the effect of receiving disapproval points on the change in low and high types’ contributions from period t to t+1 when subjects had not experienced any government-like enforcement. It clearly indicates that high contributors do not change their contribution from one period to the next as a result of disapproval received from their peers. However, low contributors who had not been exposed to government enforcement do increase their contribution from one period to the next as a result of disapproval points received. Figure 3(b), on the other hand, shows the effect of disapproval points on the change in low and high types’ contributions when subjects previously had been exposed to government-like enforcement. It indicates that, when previously exposed to our GE treatment, low contributors are not responsive to disapproval from their peers. However, the effect for high contributors is more nuanced. They contribute substantially lower amounts in t+1 when they realize that they contributed more than the group average in period t. Yet, receiving more disapproval points reduces the extent to which they lower their own contributions.

The estimates from the following OLS regression, shown in Table 3 (Panel A and Panel B), confirm that previous exposure to government enforcement undermines the effectiveness of disapproval points for low contributors:

$$\begin{aligned} c_{i}^{t+1} - c_{i}^{t} = \beta _{0} + \beta _{1}\left(\sum D^{t}\right) + \epsilon _{i} \end{aligned}$$

The coefficient \(\beta _{1}\) measures the effect of receiving disapproval points on subject i’s change in contribution from one period to the next. The model is estimated separately for subjects who contributed less (Low contributors) and more (High contributors) than the mean group contribution in period t.
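As a rough illustration of this specification, the Python sketch below computes the OLS point estimates of \(\beta _{0}\) and \(\beta _{1}\) separately for low and high contributors using the closed-form formulas for a single regressor. The observations are hypothetical and are not the experimental data; the function names are introduced here for illustration, and the sketch reports no standard errors, so it says nothing about how inference was carried out in Table 3.

```python
def ols_simple(x, y):
    """Return (beta0, beta1) from regressing y on a constant and x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("regressor has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

def estimate_by_type(observations):
    """Estimate the model separately for low and high contributors.

    observations: list of tuples
        (contributor_type, disapproval_received_in_t, c_t, c_t_plus_1)
    The dependent variable is the change c_t_plus_1 - c_t.
    """
    estimates = {}
    for kind in ("low", "high"):
        rows = [obs for obs in observations if obs[0] == kind]
        disapproval = [obs[1] for obs in rows]
        change = [obs[3] - obs[2] for obs in rows]
        estimates[kind] = ols_simple(disapproval, change)
    return estimates

# Hypothetical observations (NOT the paper's data).
observations = [
    ("low", 0, 4, 3), ("low", 2, 3, 4), ("low", 4, 2, 4), ("low", 6, 1, 5),
    ("high", 0, 14, 12), ("high", 1, 15, 13), ("high", 2, 16, 14), ("high", 3, 12, 10),
]
estimates = estimate_by_type(observations)
for kind, (beta0, beta1) in estimates.items():
    print(kind, round(beta0, 3), round(beta1, 3))
```

In this made-up sample the slope is positive for low contributors and zero for high contributors, which mirrors the qualitative pattern described for subjects without prior exposure to government enforcement.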

Table 3 Determinants of changes in contribution

The \(\beta _{1}\) estimates from Panel A of Table 3 suggest that disapproval points raise contributions for individuals who contributed less than the average only when they had not been exposed to government enforcement, while Panel B suggests that, when previously exposed to government enforcement, receiving more disapproval points reduces the extent to which high contributors lower their own contributions. Thus, the effect of exposure to government enforcement is twofold: it undermines the effectiveness of disapproval points for low contributors, and it reduces the extent to which high contributors react negatively to the information that they contributed more than the average contribution level in their group.Footnote 18

Result 3

The removal of government enforcement results in a significant reduction in cooperation rates under private enforcement. This effect is not explained by a reduction in subjects’ willingness to express disapproval toward the other group members. On the contrary, the number of disapproval points sent is significantly higher with previous exposure to government enforcement than without this exposure. We find that low contributors are responsive to the receipt of disapproval points when they had not previously been exposed to government enforcement, but that once they have experienced it, the receipt of disapproval no longer changes their behavior. For low contributors, previous exposure to government enforcement crowds out the effectiveness of peer disapproval in enforcing cooperation norms.

5 Discussion

As we observed in our study, there are benefits but also possible costs from using government enforcement. On the one hand, government enforcement may improve cooperation norms when private enforcement already is in place. On the other hand, removing government enforcement makes low contributors unresponsive to private enforcement. What might explain these two effects?

The most plausible interpretation of our findings is based on a specific approach within the expressive theory of government enforcement. Cooter (2000) distinguishes between the expressive, deterrent, and internalization effects of government enforcement. The expressive effect occurs when government enforcement causes people to adjust their expectations about what constitutes appropriate behavior in a given group, community, or society at large. When this occurs, people interpret governmental measures as an expression of what constitutes one’s duty. For instance, Feldman and Taylor (2011) find that employees’ adherence to workplace rules is stronger when those procedures are mandated by the government. This argument is also suggested by Benabou and Tirole (2011), who show that government enforcement may affect what people believe they should do in a particular context, and suggest that governmental rules may thus have a normative function.

The literature on the expressive theory of government enforcement further rejects the argument that the removal of government-enforced sanctions will not cause any change in people’s behavior when sanctions are mild. Funk’s (2007) study, for instance, suggests that just as adding a government-enforced sanction reinforces cooperation norms, its removal alters people’s perceptions of the significance of contributing to a public good.

In the context of our own experiment, we showed that the addition of government enforcement to private enforcement caused subjects to contribute on average more compared to when private enforcement was the only mechanism in place.Footnote 19 However, contributions went down to levels reaching nearly complete free-riding when government enforcement was removed, leaving only private enforcement. This reduction in contributions is unlikely to be caused by the removal of the monetary sanction because our subtraction rule was purely symbolic, i.e., the sanction preserved the nature of the decision as a social dilemma. Rather, we conjecture that the abolition of government enforcement altered people’s perceptions of the significance of contributing. It may be the case, as the expressive theories of government enforcement suggest, that before the government enforcement mechanism was removed, contributing was considered the appropriate behavior to be followed, whereas once it was removed, this association was removed along with it.Footnote 20

Why could this make low contributors unresponsive to the receipt of disapproval points? We hypothesize that the removal of the normative significance attached to contributing may have been interpreted by low contributors as a signal that those who express high levels of disapproval are no longer “backed” by the government (in our case, the automatic mechanism that metes out the sanction). Consequently, low contributors may no longer have felt compelled to react positively to high levels of disapproval. This is also how Gneezy and Rustichini (2000) interpret some of their results. Specifically, they contend that parents who were late to pick up their children from the day-care center may not have felt shame or guilt once exposed to a symbolic monetary fine for being late. However, there is an important difference between their findings and ours. They observe such perverse effects even when the fine is in place. From this perspective, our results rather support the argument developed by Funk (2007), namely that government enforcement may change people’s perceptions of the significance of contributing, making this the right thing to do, when government enforcement is in place. However, once it is removed, this association is also removed along with it, making low contributors unresponsive to social disapproval.

6 Conclusion

Private enforcement of cooperation norms, using mechanisms such as shaming, ostracism, and gossip, not only precedes the rules enforced by governmental authorities but presumably also remains in place even after government enforcement measures are lifted. This might be the case when, for example, government enforcement becomes too costly to implement, or proves to enhance compliance with desired behaviors only marginally, causing policymakers to remove such measures and rely instead on the previous, emergent mechanisms to maintain cooperation.

As our results suggest, the addition of government enforcement can be effective in stabilizing contributions to a public good. To ensure this outcome, spontaneous order theories emphasize that government enforcement should align with pre-existing norms and be implemented once private enforcement is in place. Our results support the hypothesis that government enforcement stabilizes cooperation when implemented after pre-existing privately enforced norms. It seems, then, that government enforcement may complement private enforcement, or, as Posner and Rasmusen (1999, p. 369) put it, “the two kinds [of enforcement] reinforce each other”. Our second finding is, however, less favorable regarding the impact of government enforcement on cooperation and the effectiveness of private enforcement. We find evidence of a crowding-out effect resulting from the post-intervention impact of government enforcement. Once individuals have been exposed to government enforcement, its removal leads low contributors to become unresponsive to private mechanisms of enforcement such as peer disapproval. As the French economist Frédéric Bastiat argued, policymakers must be careful, because there is that which is seen and that which is not seen. We have demonstrated that that which is not seen, i.e., peer disapproval in the field, matters a great deal for the effectiveness of government enforcement mechanisms.