1 Introduction

Studies of animal behaviour have found many practices which create collective benefits at some apparent cost or risk to individual participants. Examples include alarm calls, food-sharing, grooming, and participation in inter-group warfare. One of the most fundamental problems in evolutionary biology since Darwin (1859) has been to explain how such forms of cooperation evolve by natural selection. An analogous problem in economics has been to look for explanations of cooperative human practices, such as the fulfilment of market obligations, the provision of public goods through voluntary contributions, and the management of common property resources, that are consistent with the traditional assumption of individual self-interest. Many different theories have been proposed by biologists and economists as possible solutions. Among the mechanisms that have been modelled are direct and indirect reciprocity, kin selection, group selection, and the ‘green beard’ mechanism. (For an overview of these mechanisms, see Nowak (2006). Tomasello (2014) provides a comprehensive account of why and how cooperation may have evolved among early humans. His hypothesis is that early humans were forced by ecological circumstances into more cooperative modes of life, and this led to the evolution of ways of thinking that were directed towards coordination with others to achieve joint goals.) Some economists have combined biological and economic modes of explanation, hypothesising that human cooperation in the modern world is a product of genetically hard-wired traits that evolved by natural selection to equip Homo sapiens for life in hunter-gatherer societies. In some versions of this hypothesis, those traits act as equilibrium selection devices in the modern ‘game of life’ (e.g. Binmore 1994, 1998); in others, they can generate non-selfish behaviour in modern societies (e.g. Boyd et al. 2005; Bowles and Gintis 2011).

However, a recent trend in biology has been to question whether such sophisticated explanations are always necessary. Many forms of apparently cooperative behaviour have been found to be forms of mutualism: the ‘cooperating’ individual derives sufficient direct fitness benefit to make the behaviour worthwhile, and any effect on the fitness of others is incidental (e.g. Clutton-Brock 2002, 2009; Sachs et al. 2004). The Snowdrift game (Sugden 1986), in which equilibrium involves cooperation by one player and free-riding by the other, is increasingly used in biology as a model of such behaviour. In this paper, we present a new model of the evolution of cooperation which fits with this trend of thought.

Our methodological approach treats the biological and economic problems of cooperation as isomorphic to one another. That is, we hypothesise that the emergence and reproduction of human cooperative practices are governed by evolutionary mechanisms that are distinct from, but structurally similar to, those of natural selection. Candidate mechanisms include trial-and-error learning by individuals, imitation of successful neighbours, and cultural selection through inter-group competition. Analyses which use this approach may be both informed by and informative to theoretical biology. For example, Sugden’s (1986) analysis of the emergence of social norms was inspired by the earlier work of theoretical biologists, but it developed new models (in particular, the Snowdrift and Mutual Aid games) which have since been widely used in biology (e.g. Leimar and Hammerstein 2001; Nowak and Sigmund 2005). The model that we present in this paper can be interpreted as a representation either of natural selection or of trial-and-error human learning.

Our modelling strategy is distinctive in that it uses three assumptions which in combination rule out most of the mechanisms that feature in existing theories of cooperation. Specifically, we assume that interactions are anonymous, that evolution takes place in a large, well-mixed population, and that the evolutionary process selects strategies according to their material payoffs. The assumption of anonymity excludes mechanisms based on reputation, reciprocity or third-party punishment. The assumption of well mixedness excludes mechanisms of group or kin selection. The assumption that selection is for material payoffs excludes mechanisms which postulate non-selfish preferences as an explanatory primitive. Working within the constraints imposed by these assumptions, we are able to generate a simple and robust model of cooperation.

Our model adapts the familiar framework of a Prisoner’s Dilemma that is played recurrently in a large population. We introduce two additional features, which we suggest can be found in many real-world cases of potentially cooperative interaction, both for humans and for other animals.

The first additional feature is that participation in the game is voluntary. One of the restrictive properties of the Prisoner’s Dilemma is that, in any given interaction, an individual must act either pro-socially (the strategy of cooperation) or anti-socially (the strategy of defection or cheating, which allows a cheater to benefit at the expense of a cooperator). There is no opportunity to be simply asocial. We add an asocial strategy, that of opting out of the interaction altogether. Of course, if the only difference between anti-social and asocial behaviour were that asocial individuals did not benefit when their co-players chose to cooperate, asociality would be a dominated strategy. It is an essential part of our model that if both players cheat, both are worse off than if they had opted out of the interaction.

The second additional feature is that the payoff that each player receives if they both cooperate is subject to random variation. Before choosing his (or her, or its) strategy, each player knows his own cooperative payoff, but not the other player’s. With non-zero probability, the payoff from mutual cooperation is greater than that from cheating against a cooperator. Thus, there are circumstances in which it would be profitable for a player to cooperate if he were sufficiently confident that the other player would cooperate too. Crucially, however, it is never common knowledge that the payoffs are such that mutual cooperation is a Nash equilibrium. In our model, players receive no information at all about the realisation of the random component of their co-players’ cooperative payoff. This is obviously an extreme assumption; we use it only as a modelling simplification. In real interactions, players often have some such information. [For example, explanations of animal behaviour in asymmetric contests often depend on the assumption that both contestants recognise some feature of the game which signals which of them is more likely to attach the higher value to the disputed resource (Maynard Smith and Parker 1976).] But the main qualitative results of our model require only that each player has some private information about his own payoffs such that, with non-zero probability, a player may know that cooperation is a non-dominated strategy for him without knowing whether the same is true for the other player.

As an illustration of the kind of interaction that our model represents, we offer the following variant of Rousseau’s (1755/1998, p. 36) story of hunting in a state of nature. Two individuals jointly have the opportunity to invest time and energy to hunt a deer. The hunters can succeed only by acting on a concerted plan out of sight of one another. A hunt begins only if both individuals agree to take part. Each can then cheat by unilaterally pursuing a smaller prey, which the other’s deer hunting tends to flush out and make easier to catch. The anticipated benefit of deer hunting to an individual, conditional on the other’s not cheating, can be different for different individuals and on different occasions. Sometimes, but not always, this benefit is sufficiently low that unilateral cheating pays off.

As a more modern illustration, consider two individuals who make contact through the internet. One of them is offering to sell some good which has to be customised to meet the specific requirements of the buyer; the other is looking to buy such a good. If they agree to trade, each individual invests resources in the transaction (exchanging information, producing and dispatching the good, sending payment). Each may have opportunities to gain by deviating from the terms of the agreement. Sometimes, but not always, the benefit of completing the transaction is sufficiently low that unilateral cheating pays off.

We will show how the interaction of voluntary participation and stochastic payoffs can induce cooperation. Of course, it is well known that voluntary participation can facilitate cooperation when players can distinguish between more and less cooperative opponents. If such distinctions are possible, voluntary participation can allow cooperators to avoid interacting with cheats. This can sustain cooperation without the need for informationally and cognitively more demanding strategies of reciprocity or punishment—an idea that can be traced back to Smith’s (1763/1978, pp. 538–539) analysis of trustworthiness among traders in commercial societies. But such mechanisms are ruled out by our anonymity assumption.

In our model, voluntary participation facilitates cooperation by a different route. Because would-be cheats have the alternative option of non-participation, and because non-participation is the best response to cheating, the equilibrium frequency of cheating is subject to an upper limit. If cheating occurs at all, the expected payoff from cheating cannot be less than that from non-participation. Thus, for any given frequency of cooperation, the frequency of cheating is self-limiting. The underlying mechanism is similar to that of the Lotka–Volterra model of interaction between predators and prey: the size of the predator population (the frequency of cheating) is limited by the size of the prey population (the frequency of cooperation).

Clearly, however, this mechanism can support cooperation only if, when the frequency of cheating is sufficiently low, some players choose to cooperate. This could not be the case if, as in the Prisoner’s Dilemma, cooperation was always a weakly dominated strategy. In our model, random variation in the payoff from mutual cooperation ensures that players sometimes find it worthwhile to cooperate, despite the risk of meeting a cheat. The players who cooperate are those for whom the benefit of mutual cooperation is sufficient to compensate for this risk. Because cooperators are self-selecting in this way, the average payoff in the game is greater than the payoff to non-participation. In other words, despite the presence of cheats, beneficial cooperation occurs.

As a first step in developing an evolutionary model, we begin (in Sect. 2) by presenting our variant of the Prisoner’s Dilemma as a one-shot game and identifying its symmetric Bayesian Nash equilibria. We show that, provided the upper bound of the distribution of cooperative benefit is not too low, the game has at least one such equilibrium in which beneficial cooperation occurs. In Sect. 3, we investigate some comparative-static properties of equilibria in this game. We show that as the distribution of cooperative benefit becomes more favourable, the maximum frequency of cooperation that is sustainable in equilibrium increases. In Sect. 4, we examine the dynamics of the model, using simple analytical methods. In Sect. 5, we supplement this analysis by computer simulations based on replicator dynamics. Our analysis shows that, in the neighbourhood of ‘interior’ equilibria in which some but not all players choose non-participation, the dynamics are similar to those of predator–prey models. Depending on the payoffs of the game, interior equilibria may be locally stable (with evolutionary paths spiralling in from a large zone of attraction) or unstable (with evolutionary paths spiralling out and ending at an equilibrium of non-participation). In Sect. 6, we discuss the contribution that our model can make to the explanation of cooperative behaviour. We show that, despite sharing some features of existing biological models of mutualism and voluntary participation, our model isolates a distinct causal mechanism.

2 The model: equilibrium properties

In this section and in Sect. 3, we present our game in one-shot form and analyse its equilibrium properties. This is a game for two players 1 and 2. For each player \(i\in \{1, 2\}\), the benefit \(x_{i}\) that he gains if both players cooperate is an independent realisation of a random variable X whose density f(.) is continuous with support \([x_{\mathrm {min}}, x_{\mathrm {max}}]\). Each player knows his own benefit but not that of the other player. Given this knowledge, he plays a game with three pure strategies: to cooperate (C), to cheat (D), or not to participate (N). The payoff matrix is shown in Table 1.

Table 1 Payoff matrix for the game (entries are the row player’s payoffs; the column gives the opponent’s strategy)

          C            D          N
C     \(x_{i}\)    \(-b\)      0
D     \(a\)        \(-c\)      0
N     0            0           0

The parameters satisfy

$$\begin{aligned} x_{\mathrm {max}}> a> x_{\mathrm {min}}\ge 0; \quad b> c > 0. \end{aligned}$$

The essential features of the game are contained in the structure of best responses. The condition \(x_{\mathrm {max}}> a > x_{\mathrm {min}}\) implies that either C or D (but not N) may be the better response to C, depending on the relevant player’s realisation of X. The condition \(b > c\) implies that, as in the Prisoner’s Dilemma, D is better than C as a response to D. Given that the payoff to N is normalised to zero, \(a > 0\) implies that cheating gives a higher payoff than non-participation if the opponent cooperates; \(c > 0\) implies that the opposite is the case if the opponent cheats. No assumption is made about whether \(a-b\) (i.e. the net benefit of an interaction in which one player cooperates and the other cheats) is positive, zero or negative. It is easy to imagine real-world applications (such as our example of internet trading) in which any of these possibilities would be plausible. The condition \(x_{\mathrm {min}}\ge 0\) (which is not essential for our main results) implies that players are never worse off from mutual cooperation than from non-participation.

We now consider symmetric Bayesian Nash equilibria (SBNE) of the game. Although our formal analysis in this section and in Sect. 3 treats the game as one-shot, our ultimate concern is with SBNEs as possible stationary points in an evolutionary process in a well-mixed population.

We will say that a pure strategy (N, C or D) is played in a given equilibrium if and only if the unconditional probability with which it is played is non-zero. Some significant properties of SBNE hold for all parameter values. First, there is a non-participation equilibrium in which only N is played; in this equilibrium, players’ payoffs are zero and unilateral deviations lead to neither gain nor loss. Second, there is no SBNE in which C is played but D is not. (Against an opponent who might play C but will not play D, i’s best response plays D when \(x_{i}< a\).) Third, there is no SBNE in which D is played but not C. (Against an opponent who might play D but will not play C, N is the unique best response.) Thus, only two types of equilibrium participation are possible. Depending on the parameter values, there may be an interior equilibrium in which N, D and C are all played; and there may be a boundary equilibrium in which D and C are played but not N.

Since the only information on which a player can condition his strategy choice is private to that player, and given the assumed properties of f(.), any SBNE can be described by the values of two variables. The variable \(\beta \in [x_{\mathrm {min}}, x_{\mathrm {max}}]\) is defined such that each player i chooses C if and only if \(x_{i}\ge \beta \). The variable \(\pi \in [0, 1]\) is the probability that each player i chooses D, conditional on \(x_{\mathrm {min}}< x_{i} < \beta \). In the non-participation equilibrium, \(\beta =x_{\mathrm {max}}\) and \(\pi = 0\). In an interior equilibrium, \(x_{\mathrm {min}}< \beta <x_{\mathrm {max}}\) and \(0< \pi <1\). In a boundary equilibrium, \(x_{\mathrm {min}}<\beta < x_{\mathrm {max}}\) and \(\pi = 1\).

We now analyse these equilibria. Consider any player i facing an opponent whose strategy is described by \((\beta , \pi )\), in an interaction in which \(x_{i}=\beta \). Let \(V_{{N}}, V_{D}, V_{C}\) and \(V_{M}\) be the expected payoffs to player i from playing N, D, C and M, respectively, where M is the mix of D with probability \(\pi \) and N with probability \((1-\pi )\). Let \(g(x)\equiv F(x)/[1-F(x)]\), where F(.) is the cumulative distribution function corresponding to f(.). It is straightforward to derive the following expressions:

$$\begin{aligned} V_{{N}}= & {} 0 \end{aligned}$$
(1)
$$\begin{aligned} V_{{D}}= & {} [1-F(\beta )]a-F(\beta )\pi c \end{aligned}$$
(2)
$$\begin{aligned} V_{{C}}= & {} [1-F(\beta )]\beta -F(\beta )\pi b \end{aligned}$$
(3)
$$\begin{aligned} V_{{M}}= & {} \pi V_{D}. \end{aligned}$$
(4)
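Expressions (1)–(4) are easily checked numerically. The following sketch (with X uniform on [0, 9] and \(a=3\), \(b=4\) as in the simulations of Sect. 5; the value \(c=2\) is purely illustrative) computes the four expected payoffs for a player with \(x_{i}=\beta \):

```python
# Expected payoffs (1)-(4) for a player with benefit x_i = beta, facing an
# opponent described by (beta, pi). F is the cumulative distribution of X.
# Illustrative sketch: a=3, b=4 are the Sect. 5 values; c=2 is illustrative.

def payoffs(beta, pi, F, a=3.0, b=4.0, c=2.0):
    """Return (V_N, V_D, V_C, V_M) from Eqs. (1)-(4)."""
    p = F(beta)                        # probability the opponent's benefit is below beta
    V_N = 0.0                          # (1): non-participation is normalised to zero
    V_D = (1 - p) * a - p * pi * c     # (2): gain a against C, lose c against D
    V_C = (1 - p) * beta - p * pi * b  # (3): gain beta against C, lose b against D
    V_M = pi * V_D                     # (4): mix of D (prob pi) and N (prob 1 - pi)
    return V_N, V_D, V_C, V_M

# Uniform distribution of X on [0, 9], as in the simulations of Sect. 5.
F_uniform = lambda x: min(max(x / 9.0, 0.0), 1.0)
```

At \((\beta , \pi ) = (x_{\mathrm {max}}, 0)\) all four payoffs vanish, consistent with the non-participation equilibrium.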

In analysing equilibrium, it is convenient to work in a \((\beta , \pi )\) space defined by \(x_{\mathrm {min}}\le \beta \le x_{\mathrm {max}}\) and \(\pi \ge 0\). Notice that this space includes points at which \(\pi > 1\). Although such points have no interpretation within our model, Eqs. (1)–(4) above define \(V_{N}, V_{D}, V_{C}\), and \(V_{M}\) for all values of \(\pi \). This allows us to define the loci of points in this \((\beta , \pi )\) space at which the mathematical equations \(V_{N}=V_{D}\) and \(V_{C}=V_{M}\) are satisfied, and then to characterise equilibria in terms of these loci, imposing the inequality \(\pi \le 1\) as an additional constraint. This method of analysis is useful in simplifying the proofs of our results.

First, consider the locus of points in the \((\beta , \pi )\) space at which \(V_{N}=V_{D}\). Any interior equilibrium must be a point on this ND locus, with \(0<\pi <1\); any boundary equilibrium must be a point at which \(V_{N} \le V_{D}\) and \(\pi = 1\). By (1) and (2), this locus is determined by:

$$\begin{aligned} V_{{D}} \ge (\hbox {or}<)V_{{N}}\;\Leftrightarrow \; a/(\pi c)\ge (\hbox {or}<)g(\beta ). \end{aligned}$$
(5)

This is a continuous and downward-sloping curve which includes the point (\(x_{\mathrm {max}}\), 0) and is asymptotic to \(\beta =x_{\mathrm {min}}\). It divides the (\(\beta \), \(\pi )\) space into three regions: the set of points on the locus, at which \(V_{N}=V_{D}\); the set of points inside the locus (that is, below and to the left), at which \(V_{N}<V_{D}\); and the set of points outside the locus (that is, above and to the right), at which \(V_{N}>V_{D}\).
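The equality case of (5) pins the ND locus down in closed form: \(V_{N}=V_{D}\) where \(\pi = a/[c\,g(\beta )]\). A minimal sketch, using the uniform distribution on [0, 9] and the Sect. 5 values \(a=3\), \(b=4\) (the value \(c=2\) is illustrative):

```python
# The ND locus of Eq. (5): V_D = V_N holds where pi = a / (c * g(beta)),
# with g(x) = F(x) / (1 - F(x)). Sketch with X uniform on [0, 9];
# a=3, b=4 are the Sect. 5 values, c=2 is illustrative.

def pi_ND(beta, a=3.0, c=2.0, x_max=9.0):
    F = beta / x_max                  # uniform cdf on [0, x_max]
    return a * (1 - F) / (c * F)      # solves a / (pi * c) = F / (1 - F)
```

As described above, this curve is downward-sloping, passes through \((x_{\mathrm {max}}, 0)\), and grows without bound as \(\beta \) approaches \(x_{\mathrm {min}} = 0\).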

Now consider the locus of points at which \(V_{C}=V_{M}\). Every equilibrium must be a point on this CM locus, with either \(\pi = 0\) (the non-participation equilibrium), \(0 < \pi < 1\) (an interior equilibrium), or \(\pi = 1\) (a boundary equilibrium). Combining equations (2)–(4), this locus is determined by:

$$\begin{aligned} V_{{C}}\ge (\hbox {or}<)V_{{M}} \;\Leftrightarrow \; (\beta -\pi a)/[\pi (b-\pi c)]\ge (\hbox {or}<)g(\beta ). \end{aligned}$$
(6)

This is a continuous curve which includes the points \((x_{\mathrm {min}}, x_{\mathrm {min}}/a)\) and \((x_{\mathrm {max}}, 0)\) and has the property that \(\pi < 1\) when \(\beta \le a\). It divides the \((\beta , \pi )\) space into three regions: the set of points on the locus, at which \(V_{C}=V_{M}\); the set of points inside the locus, at which \(V_{C} > V_{M}\); and the set of points outside the locus, at which \(V_{M}>V_{C}\).
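Substituting (2)–(4) into \(V_{C}=V_{M}\) gives a quadratic in \(\pi \): \(Fc\pi ^{2}-[(1-F)a+Fb]\pi +(1-F)\beta =0\), whose smaller root is the branch of the CM locus through \((x_{\mathrm {min}}, x_{\mathrm {min}}/a)\) and \((x_{\mathrm {max}}, 0)\). A sketch under the uniform distribution on [0, 9] (\(a=3\), \(b=4\) from Sect. 5; \(c=2\) illustrative):

```python
import math

# The CM locus of Eq. (6): V_C = V_M. Substituting (2)-(4) gives a quadratic
# in pi:  F*c*pi**2 - [(1-F)*a + F*b]*pi + (1-F)*beta = 0.
# The smaller root is the branch through (x_min, x_min/a) and (x_max, 0).
# Sketch: X uniform on [0, 9]; a=3, b=4 from Sect. 5, c=2 illustrative.

def pi_CM(beta, a=3.0, b=4.0, c=2.0, x_max=9.0):
    F = beta / x_max
    if F == 0.0:                      # endpoint: -a*pi + beta = 0
        return beta / a
    A, B, C0 = F * c, (1 - F) * a + F * b, (1 - F) * beta
    disc = B * B - 4 * A * C0
    return (B - math.sqrt(disc)) / (2 * A)   # smaller root
```

The endpoint checks confirm the properties stated above; in particular, with \(x_{\mathrm {min}}=0\) the locus starts at \(\pi = 0\) and returns to \(\pi = 0\) at \(\beta = x_{\mathrm {max}}\), and \(\pi < 1\) when \(\beta \le a\).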

Propositions (5) and (6) together imply the following result about the relative positions of the two loci:

$$\begin{aligned} \hbox {if } V_{{D}}=V_{{N}} \hbox { and } \beta <x_{\mathrm{max}} \hbox { and } \pi >0, \hbox { then } V_{{C}} \ge (\hbox {or}<)V_{{M}} \;\Leftrightarrow \; \beta \ge (\hbox {or}<)ab/c. \end{aligned}$$
(7)

The loci intersect at the non-participation equilibrium \((x_{\mathrm {max}}, 0)\). If \(x_{\mathrm {max}} \le ab/c\), there is no other intersection and hence no interior equilibrium. This case is illustrated in Fig. 1a. (The loci are shown by the curves ND and CM; N is the non-participation equilibrium. Sections of these loci which pass above \(\pi = 1\) are drawn in grey dots to signify that these points have no interpretation within the model. The arrows refer to the dynamic analysis, which will be presented in Sect. 4.) If instead \(x_{\mathrm {max}} >ab/c\), there is exactly one other intersection, at \(\beta =ab/c\). There are now three alternative cases.

Fig. 1 Equilibria and dynamics. a Non-participation the only equilibrium. b An interior equilibrium. c A boundary equilibrium

In the first case, illustrated in Fig. 1b, this intersection is at \(\pi < 1\). This intersection, denoted I, is an interior equilibrium, defined by \(\beta = \textit{ab}/c\), \(\pi = a/[c\,g(\textit{ab}/c)]\). These values of \(\beta \) and \(\pi \) imply that the probability with which C is played, conditional on participation in the game (i.e. conditional on N not being played), is \(c/(a+c)\), ensuring that \(V_{D}= 0\). (Equivalently, the frequencies with which C and D are played are in the ratio c:a.) There may also be boundary equilibria; these occur if the CM locus intersects the line \(\pi = 1\) to the left of the ND locus.
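These equilibrium values are easy to verify numerically. Setting \(V_{D}=0\) in (2) gives \(\pi = a/[c\,g(\beta )]\); the sketch below (uniform X on [0, 9], \(a=3\), \(b=4\) as in Sect. 5, and the illustrative value \(c=2.35\), for which the intersection is at \(\pi < 1\)) checks that cheating earns exactly the non-participation payoff and that C and D are played in the ratio c:a:

```python
# Interior equilibrium: beta* = a*b/c and, from V_D = 0 in Eq. (2),
# pi* = a / (c * g(beta*)) with g(x) = F(x) / (1 - F(x)).
# Sketch: X uniform on [0, 9], a=3, b=4 from Sect. 5, c=2.35 illustrative.

a, b, c, x_max = 3.0, 4.0, 2.35, 9.0
beta_star = a * b / c
F = beta_star / x_max                 # uniform cdf at beta*
pi_star = a * (1 - F) / (c * F)       # a / (c * g(beta*))

# At this point cheating earns exactly the non-participation payoff ...
V_D = (1 - F) * a - F * pi_star * c   # Eq. (2); equals 0 at equilibrium

# ... and the frequencies of C and D stand in the ratio c : a.
freq_C, freq_D = 1 - F, F * pi_star
```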

In the second case, the loci intersect at \(\pi >1\). Because the CM locus is continuous, and because \(x_{\mathrm {min}}/a < 1\), there must be at least one value of \(\beta \) in the interval \(a < \beta < \textit{ab}/c\) at which the CM locus intersects the line \(\pi =1\). Any such point is a boundary equilibrium. This case is illustrated in Fig. 1c; B is a boundary equilibrium. In the third case (not illustrated), the loci intersect exactly at \(\pi = 1\). Then this intersection is a boundary equilibrium. In this case, there may be other boundary equilibria.

The foregoing argument establishes:

Result 1

If \(x_{\mathrm {max}} >\textit{ab}/c\), there is at least one (interior or boundary) equilibrium with \(0<\pi \le 1\) and \(x_{\mathrm {min}}<\beta <x_{\mathrm {max}}\).

In other words, provided the upper tail of the distribution of cooperative benefit is not too short, there is at least one equilibrium in which both C and D are played.

We now consider players’ payoffs in such equilibria. Let \(V^{*}(\beta , \pi )\) be the ex ante expected payoff to any player i, prior to the realisations of the random variable X, given that i and his opponent play according to \(\beta \) and \(\pi \). We will call \(V^{*}(\beta , \pi )\) the value of the game conditional on \((\beta , \pi )\).

The following results are derived in the Mathematical Appendix:

Result 2

In every interior and boundary equilibrium, the value of the game is strictly positive.

Result 3

Suppose there are two equilibria, \((\beta , \pi )\), \((\beta ^{\prime }, \pi ^{\prime })\), such that \(\beta <\beta ^{\prime }\). Then \(V^{*}(\beta ,\pi )> V^{*}(\beta ^{\prime },\pi ^{\prime })\).

Result 2 establishes that in every interior and boundary equilibrium, cooperative activity creates positive net benefits relative to the benchmark of non-participation, despite the non-zero probability of cheating. If there are multiple equilibria, one of these is distinguished by its having the lowest value of \(\beta \). (Since there can be no more than one interior equilibrium, no two equilibria have the same value of \(\beta \).) Result 3 establishes that this is the equilibrium at which the value of the game is greatest. We will call this the highest-value equilibrium.
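Result 2 can be illustrated numerically. At an interior equilibrium \(V_{D}=0\), so non-cooperators contribute nothing to the value of the game, and \(V^{*}\) reduces to the integral of \(p_{C}x - p_{D}b\) over \(x \ge \beta \), where \(p_{C}\) and \(p_{D}\) are the unconditional frequencies of C and D. A sketch (uniform X on [0, 9], \(a=3\), \(b=4\) from Sect. 5, illustrative \(c=2.35\)):

```python
# Result 2, checked numerically: the ex ante value of the game is strictly
# positive at the interior equilibrium. A player with x_i >= beta plays C and
# earns p_C * x_i - p_D * b; all other players earn pi * V_D = 0 there.
# Sketch: X uniform on [0, 9], a=3, b=4 from Sect. 5, c=2.35 illustrative.

a, b, c, x_max = 3.0, 4.0, 2.35, 9.0
beta = a * b / c                      # interior-equilibrium beta
F = beta / x_max
pi = a * (1 - F) / (c * F)            # interior-equilibrium pi (from V_D = 0)
p_C, p_D = 1 - F, F * pi              # unconditional strategy frequencies

# Integrate the cooperators' payoff over x in [beta, x_max] (uniform density
# 1/x_max); non-cooperators contribute F * pi * V_D = 0 in equilibrium.
value = (p_C * (x_max**2 - beta**2) / 2 - p_D * b * (x_max - beta)) / x_max
# value comes out strictly positive (about 0.36 for these parameters)
```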

3 The model: comparative statics

The frequency of cooperative behaviour that can be sustained in equilibrium depends on the distribution of cooperative benefit X. To keep the exposition simple, we analyse the effect of a rightward shift from one distribution F to an unambiguously superior distribution G when there is no change in the support \([x_{\mathrm {min}}, x_{\mathrm {max}}]\). That is, for all \(x_{\mathrm {min}}<z < x_{\mathrm {max}}\), \(G(z) < F(z)\). The values of all other parameters are held constant.

Fig. 2 Effects of a shift in the distribution of cooperative benefit

Using (5) it can be shown that if some point \((\beta , \pi )\) is on the ND locus for the distribution F, it is inside the corresponding locus for G. Similarly, using (6), if some point \((\beta , \pi )\) is on the CM locus for the distribution F, it is inside the corresponding locus for G. Thus, an improvement in the distribution of cooperative benefit moves both loci outwards. Figure 2 illustrates the effects of a shift in the distribution from F [inducing the loci ND(F) and CM(F)] to G [inducing the loci ND(G) and CM(G)].

As this diagram shows, if the game has interior equilibria for both distributions, those equilibria have the same value of \(\beta \), namely ab/c, but the G equilibrium has a higher value of \(\pi \). Since \(G(\textit{ab}/c) < F(\textit{ab}/c)\) and the frequencies with which C and D are played are in the fixed ratio c:a, both C and D are played with higher frequency in the G equilibrium than in the F equilibrium. More intuitively, the relationship between cooperation and cheating is analogous to that between prey and predator. If the distribution of cooperative benefit becomes more favourable, a higher frequency of cooperation is induced; but the more cooperation there is, the more cheating can be sustained.
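The comparative statics at the interior equilibrium can be checked directly. The sketch below uses the uniform F on [0, 9] and, as an illustrative rightward shift on the same support, \(G(x)=(x/9)^{1.2}\) (so that \(G(z)<F(z)\) for \(0<z<9\)); \(a=3\), \(b=4\) as in Sect. 5, with \(c=2\) chosen so that both interior equilibria have \(\pi < 1\):

```python
# Comparative statics at the interior equilibrium: beta = a*b/c is unchanged,
# but pi rises when the distribution shifts rightward. F is uniform on [0, 9];
# G(x) = (x/9)**1.2 is an illustrative rightward shift on the same support.
# Parameters a=3, b=4 from Sect. 5; c=2 is illustrative.

a, b, c, x_max = 3.0, 4.0, 2.0, 9.0
beta = a * b / c                           # same beta under both distributions

def pi_interior(F_beta, a=a, c=c):
    # From V_D = 0: pi = a * (1 - F(beta)) / (c * F(beta))
    return a * (1 - F_beta) / (c * F_beta)

pi_F = pi_interior(beta / x_max)           # uniform F
pi_G = pi_interior((beta / x_max) ** 1.2)  # rightward-shifted G

# Unconditional frequencies of C and D under each distribution.
freq_C_F, freq_D_F = 1 - beta / x_max, (beta / x_max) * pi_F
freq_C_G, freq_D_G = 1 - (beta / x_max) ** 1.2, (beta / x_max) ** 1.2 * pi_G
```

Both C and D are played with higher frequency under G, and in both cases their frequencies stand in the fixed ratio c:a, as stated above.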

If the game has boundary equilibria for both distributions, the highest-value G equilibrium must be to the left of the highest-value F equilibrium. (This can be seen by considering the effect of an outward shift of the CM locus in Fig. 1c.) Thus, the former equilibrium induces a higher frequency of cooperation than the latter.

The following general result is proved in the Appendix:

Result 4

Suppose \(x_{\mathrm {max}} >\textit{ab}/c\) and let F, G be two distributions of X such that G is rightward of F. Then in the highest-value G equilibrium, the frequency of cooperation and the value of the game are both strictly greater than in the highest-value F equilibrium.

Thus, as the distribution of cooperative benefit becomes progressively more favourable, the maximum sustainable frequency of cooperation increases. Increases in cooperation are associated with increases in cheating until the frequency of non-participation falls to zero.

4 The model: dynamics

We now embed our game in an evolutionary process. We consider a finite population of potential players, sufficiently large to legitimate the use of the law of large numbers. In each of a long series of periods, individuals from this population are randomly and anonymously matched to play the game. Since we are presenting an evolutionary analysis, we do not define any concept of ‘lifetime’ utility that players maximise. Instead, we assume that, at the population level, behaviour gravitates towards whatever pattern of play is currently payoff-maximising for individuals, given the behaviour of the population as a whole.

In this section, we present a simple dynamic analysis that can be represented in the \((\beta , \pi )\) space of Fig. 1. For the purposes of this analysis, \(\beta \) and \(\pi \) are interpreted as descriptions of the mix of strategies played in the population at any given time: \(\beta \in [x_{\mathrm {min}}, x_{\mathrm {max}}]\) is the critical value of X such that C is chosen by any player i if and only if \(x_{i}\ge \beta \), and \(\pi \in [0, 1]\) is the relative frequency of D choices among players for whom \(x_{i} < \beta \). We assume that \(\beta \) and \(\pi \) evolve independently. (In a biological application, this is equivalent to assuming that \(\beta \) and \(\pi \) are determined by distinct sets of genes.) Thus, the direction of change of \(\beta \) depends on the relative values of \(V_{C}\) and \(V_{M}\): \(\beta \) tends to increase (respectively: decrease) if \(V_{M}>V_{C}\) (respectively \(V_{M}<V_{C}\)). The direction of change of \(\pi \) depends on the relative values of \(V_{D}\) and \(V_{N}\): \(\pi \) tends to increase (decrease) if \(V_{D}> V_{N}\) (\(V_{D}<V_{N}\)). This gives the dynamics shown in phase-diagram form in Fig. 1.
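In sketch form, these selection dynamics amount to a two-dimensional vector field whose components have the signs of \(V_{M}-V_{C}\) and \(V_{D}-V_{N}\), and which vanishes at an interior equilibrium. (Uniform X on [0, 9], \(a=3\), \(b=4\) from Sect. 5, illustrative \(c=2.35\); the equilibrium values follow from \(V_{D}=0\) in Eq. (2).)

```python
# Selection dynamics in sketch form: beta moves with the sign of V_M - V_C,
# pi with the sign of V_D - V_N. Both differences vanish at an interior
# equilibrium. X uniform on [0, 9]; a=3, b=4 from Sect. 5, c=2.35 illustrative.

a, b, c, x_max = 3.0, 4.0, 2.35, 9.0

def field(beta, pi):
    """Return (d_beta, d_pi) up to a positive speed factor."""
    F = beta / x_max
    V_D = (1 - F) * a - F * pi * c    # Eq. (2)
    V_C = (1 - F) * beta - F * pi * b # Eq. (3)
    V_M = pi * V_D                    # Eq. (4)
    return V_M - V_C, V_D - 0.0       # V_N = 0 by Eq. (1)

# Interior equilibrium of Sect. 2: beta* = a*b/c and pi* from V_D = 0.
beta_star = a * b / c
pi_star = a * (1 - beta_star / x_max) / (c * (beta_star / x_max))
```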

In the case shown in Fig. 1a, the dynamics in the neighbourhood of the non-participation equilibrium (N) are cyclical. At the level of generality at which we are working, it is not possible to determine whether this equilibrium is locally stable, but it has a non-empty zone of attraction, including at least all points at which \(\beta = x_{\mathrm {max}}\). In the cases shown in Fig. 1b, c, the non-participation equilibrium is locally unstable. (In these cases, all paths from points close to N but below the ND locus lead away from N, and must eventually pass through or to the left of the interior equilibrium I. As in the case shown in Fig. 1a, N has a non-empty zone of attraction.) However, in states in which almost all players choose non-participation, selection pressure is weak, and so the dynamics shown in the diagrams might work very slowly in the region close to N.

It is clear from Fig. 1b that, in the neighbourhood of an interior equilibrium, the dynamics exhibit cyclical or spiralling paths. Described in terms of the evolution of the frequencies of the three strategies N, C and D, these paths are similar to those of the Rock–Paper–Scissors game. (The frequency of cooperation is greatest towards the left of the diagram, where the value of \(\beta \) is low. From there, evolutionary paths lead towards the top right, where the values of \(\beta \) and \(\pi \) are both high, and the frequency of cheating is greatest. From there, paths lead towards the bottom right, where \(\beta \) is high and \(\pi \) is low, and the frequency of non-participation is greatest. And from there, paths lead back towards the left. If paths spiral outwards, they may lead into the zone of attraction of the non-participation equilibrium.) These paths resemble predator–prey interactions, cheats acting as predators and cooperators as prey.

If the CM locus cuts the line \(\pi = 1\) at a point where \(\beta < \textit{ab}/c\), this point is a boundary equilibrium. If, as in the case shown in Fig. 1c, points to the left of this equilibrium are outside the locus, the equilibrium is locally stable. Not all boundary equilibria have this property, but whenever the ND and CM loci intersect at \(\pi > 1\), there must be at least one locally stable boundary equilibrium.

5 Simulations

In this section, we briefly illustrate some basic features of our theoretical model by means of computer simulations. Using a simple deterministic replicator dynamics, we analyse the evolution of \(\beta \) and \(\pi \) over time, and the associated relative frequencies \(p_{N}, p_{C}\) and \( p_{D}\) with which strategies N, C and D respectively are played. Further details are provided in the “Mathematical Appendix”.

In applying replicator dynamics to our game, we cannot treat N-players, C-players and D-players as distinct sub-populations which replicate separately. This is because, in our model, players’ decisions about whether or not to cooperate in any given interaction are conditioned on the relevant realisation of the random variable X. Our method is to assume that at any given time t, all players are characterised by the same \((\beta , \pi )\) pair. In replicator dynamics, the growth rate of the population fraction using a given strategy is proportional to the difference between the current payoff of that strategy and the weighted average of the current payoffs of all strategies, each strategy being weighted by the relative frequency with which it is played (Taylor and Jonker 1978). For the purposes of our analysis, we define the current payoff to each strategy as the expected payoff to any player i from choosing that strategy, conditional on the current values of \(p_{N}, p_{C}\) and \(p_{D}\) and conditional on \(x_{i}\) being equal to the current value of \(\beta \). At each time t, the rates of change of \(p_{N}, p_{C}\) and \(p_{D}\) are determined by the replicator equations; these changes are then implemented through changes in \(\beta \) and \(\pi \). This method allows mathematically tractable simulations while conserving the essential features of the dynamics described theoretically in Sect. 4.
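A minimal version of this procedure can be sketched as follows (uniform X on [0, 9], \(a=3\), \(b=4\) as in Sect. 5, illustrative \(c=2.35\); an Euler discretisation with a hypothetical step size dt, not the authors’ exact implementation):

```python
# Replicator dynamics as described above, in sketch form: all players share a
# (beta, pi) pair; current payoffs are evaluated at x_i = beta; the replicator
# equations update (p_N, p_C, p_D); the new frequencies are mapped back into
# (beta, pi). X uniform on [0, 9]; a=3, b=4 from Sect. 5, c=2.35 illustrative.

a, b, c, x_max = 3.0, 4.0, 2.35, 9.0

def step(beta, pi, dt=0.01):
    F = beta / x_max
    p_C, p_D = 1 - F, F * pi
    p_N = 1 - p_C - p_D
    # current payoffs, conditional on x_i = beta
    W = {'N': 0.0, 'C': p_C * beta - p_D * b, 'D': p_C * a - p_D * c}
    W_bar = p_N * W['N'] + p_C * W['C'] + p_D * W['D']
    # Euler step of the replicator equations (this preserves the simplex)
    p_N += dt * p_N * (W['N'] - W_bar)
    p_C += dt * p_C * (W['C'] - W_bar)
    p_D += dt * p_D * (W['D'] - W_bar)
    # map the updated frequencies back into (beta, pi)
    beta_new = x_max * (1 - p_C)           # inverts p_C = 1 - F(beta)
    pi_new = p_D / (p_N + p_D) if p_N + p_D > 0 else 0.0
    return beta_new, pi_new
```

By construction, the interior equilibrium \((\beta ^{*}, \pi ^{*})\) is a fixed point of this map, and each step preserves \(p_{N}+p_{C}+p_{D}=1\).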

Table 2 Type and stability properties of interior equilibria

Our simulations use the parameter values \(a=3, b= 4, x_{\mathrm {min}}= 0\) and \(x_{\mathrm {max}} = 9\); the distribution of X over the interval [\(x_{\mathrm {min}}, x_{\mathrm {max}}\)] is assumed to be uniform. We investigate the dynamics at different values of c, with particular emphasis on values in the interval [1.33, 2.4] in which interior equilibria occur. (If \(c \le 1.33\), then \(x_{\mathrm {max}}\le \textit{ab}/c\), and so the CM and ND loci intersect only at the non-participation equilibrium; if \(c \ge 2.4\), the loci intersect at \(\pi \ge 1\).) Intuitively, increases in c (that is, increases in the cost incurred by each player when both cheat) reduce the rewards from cheating and so favour cooperation.
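The lower end of this interval follows directly from the stated parameter values: the loci fail to intersect in the interior exactly when \(x_{\mathrm {max}} \le \textit{ab}/c\), i.e. when \(c \le \textit{ab}/x_{\mathrm {max}}\). A one-line check, using the paper's values:

```python
a, b, x_max = 3, 4, 9
# Interior intersections require x_max > ab/c, i.e. c > ab/x_max.
c_lower = a * b / x_max   # = 4/3, the 1.33 quoted in the text
```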

Irrespective of the value of c, the non-participation equilibrium is unstable under replicator dynamics. (Since C is the best response to every non-degenerate mix of C and N, any path starting at a point at which \(\pi = 0\) and \(x_{\mathrm {max}} > \beta >x_{\mathrm {min}}\) must move away from the non-participation equilibrium along the line \(\pi = 0\).) However, there are also paths that converge to the non-participation equilibrium. (Since N is the best response to every non-degenerate mix of N and D, any path starting at a point at which \(\beta =x_{\mathrm {min}}\) and \(1 > \pi > 0\) must have this property.) The stability properties of interior equilibria are less immediately obvious. It turns out that, as c varies, there are qualitative changes in the dynamics associated with interior equilibria.

Table 2 reports the type and stability properties of the interior equilibrium at different values of c. This equilibrium is unstable for \(1.33 < c < 2.3094\) but asymptotically stable for \(2.3094 < c < 2.4\). At \(c= 2.3094\) there is a Hopf bifurcation (see the “Mathematical Appendix” for details).

Figure 3 shows the dynamics at different values of c. In Fig. 3a, \(c=1\) and the CM and ND loci intersect only at the non-participation equilibrium. Paths that start close to the non-participation equilibrium but below the CM locus initially move away from that equilibrium, but then approach it from above the ND locus; all paths converge asymptotically to the \(\beta =x_{\mathrm {max}}\) line. As c increases (Fig. 3b, c) the general picture does not change until the Hopf bifurcation occurs: paths that start close to the interior equilibrium eventually approach the non-participation equilibrium. But after this bifurcation, paths that start below the ND locus spiral in towards the interior equilibrium (Fig. 3d). The implication is that there is a range of parameter values for which a stable interior equilibrium exists and has a large zone of attraction.

Fig. 3 Replicator dynamics for different values of c

6 Discussion

We do not intend to claim that our model represents the mechanism that underlies human and animal cooperation. There is no good reason to suppose that cooperation is a single phenomenon with a unified causal explanation. We find it more plausible to view cooperation as a family of loosely related phenomena which may have multiple causes. We offer our model as a stylised representation of one mechanism by which cooperation might emerge and persist.

Our model is unusually robust in that it assumes only materially self-interested motivations and applies to anonymous, well-mixed populations. In claiming this as a merit of the model, we do not deny that individuals sometimes act on pro-social motivations. It has long been known that experimental subjects often cooperate in non-repeated and anonymous Prisoner’s Dilemmas (Sally 1995).

Nor do we deny that many recurrent cooperative interactions are between individuals who are known to one another, or that populations of potential cooperators are often structured into clusters of individuals who interact mainly with their neighbours. Each of these factors can contribute to the explanation of cooperation in particular environments. Nevertheless, theories that depend on non-anonymity, or on population structures taking particular forms, have restricted domains of application. And since self-interest is a very common and reliable motivation, models which assume only self-interest can be expected to be particularly robust.

As an illustration of how theories with less robust assumptions can be restricted in their application, we consider the currently much-discussed hypothesis of altruistic punishment (Fehr and Gächter 2000; Gintis et al. 2005). The key insight is that multilateral cooperation can be sustained in equilibrium if individuals have low-cost options of punishing one another, and if even a relatively small proportion of individuals have relatively weak preferences for punishing non-cooperators. However, the general effectiveness of this mechanism depends on the cost of punishing being low relative to the harm inflicted, and on the absence of opportunities for punishees to retaliate (Herrmann et al. 2008; Nikiforakis 2008); and it requires that at least some individuals have non-selfish preferences for punishing. Such preferences might be sustained by cultural group selection in hunter-gatherer economies, where groups are small and inter-group warfare is frequent, but these conditions are not typical of the modern world; even among hunter-gatherers, biological group selection of altruistic punishment would be frustrated by inter-group gene flow (Boyd et al. 2005). Altruistic punishment should be understood as a mechanism that can sustain cooperation in specific types of environment, not as the solution to the problem of explaining cooperation. We claim no more than this for our own model.

We have said that our model is in the same spirit as some recent work by biologists, which finds apparently cooperative behaviour to be directly beneficial to the individual cooperator (see Sect. 1 above). But, as we now explain, the explanatory principles used by these biologists are not the same as those exhibited in our model.

One of the fundamental features of our model is that the cooperative behaviour it describes is reciprocally beneficial. By this, we mean the following. Such cooperation is not simply a unilateral action by one individual which, intentionally or unintentionally, confers benefits on another; it is the composition of cooperative actions by two or more individuals, the combined effect of which is to benefit each of them. In other words, each cooperator benefits from his action only if this action is reciprocated by one or more other individuals. In the absence of enforceable promises, reciprocally beneficial cooperation requires at least one individual to choose a cooperative action without assurance that others will reciprocate. In our model, any player who chooses to cooperate incurs a risk of loss, which is realised if his opponent cheats. One might think (as we are inclined to do) that reciprocal benefit is a hallmark of genuine, as opposed to apparent, cooperation (see also Sachs et al. 2004; West et al. 2007). In biological models of mutualism, cooperation is not reciprocally beneficial, in the sense we have defined.

In the Snowdrift game, which is often used to model apparently cooperative animal behaviour, cooperation and cheating are best responses to one another. In the original story, two drivers are stuck in the same snowdrift. Both drivers have shovels, and so each can choose whether or not to dig. If either driver digs a way out for his own car, the other can drive out too. Each would rather be the only one to dig than remain stuck. This defines a game with Chicken payoffs; in a pure-strategy Nash equilibrium, one driver digs and the other free-rides (Sugden 1986). Such an equilibrium is not a case of reciprocally beneficial behaviour.
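The Chicken payoff structure can be made concrete with an assumed parametrisation (ours, not the paper's): escaping the drift is worth \(v\), digging alone costs \(k\), and the digging cost is shared if both dig, with \(v > k > 0\):

```python
v, k = 4.0, 2.0   # assumed values satisfying v > k > 0

def snowdrift_payoff(i_dig, other_digs):
    """Payoff to one driver in the Snowdrift game (assumed parametrisation)."""
    if i_dig and other_digs:
        return v - k / 2    # both escape, digging cost shared
    if i_dig:
        return v - k        # dig alone; the free-rider escapes too
    if other_digs:
        return v            # free-ride on the other driver's digging
    return 0.0              # both stay stuck

# Digging and free-riding are mutual best responses: against a digger,
# free-riding pays more (v > v - k/2); against a free-rider, digging
# pays more (v - k > 0).
```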

Clutton-Brock (2009) offers the Soldier’s Dilemma as a model of mutualism in biology. In this game, a patrol of soldiers is ambushed by the enemy. Soldiers who fire back attract incoming fire and increase their chance of being killed. By firing back, however, each individual reduces the probability that the patrol will be overrun. The gain from this may be such that, from an individual’s perspective, there is no dilemma at all: firing back may give the best chance of individual survival, irrespective of what the others do. A biological equivalent to this game (or perhaps to Snowdrift) can be found in the behaviour of certain birds and mammals, such as Arabian babblers and meerkats, which feed in predator-rich environments. Individuals of these species go on sentinel duty once they have fed for long enough to be close to satiation (Clutton-Brock et al. 1999). In these games, cooperation is chosen either as a dominant strategy or as a best response to other players’ non-cooperation; it is not reciprocally beneficial.

In the story of the Soldier’s Dilemma, it would be natural to assume that cooperation would be a dominant strategy only if the number of soldiers in the patrol was relatively small, so that each of them received a significant share of the total benefit created by his own cooperative action. Hauert et al. (2002) present a model which can be understood as a version of the Soldier’s Dilemma in which the size of the patrol is endogenous. This is an n-player model of voluntary contributions to a public good, but with an outside option of non-participation. A player who takes the outside option receives a small positive payoff \(\sigma \) with certainty, but forgoes any share in the benefits of the public good. Players who participate can either cooperate (contribute to the public good) or cheat (not contribute). Each cooperator incurs a cost of 1 and creates a benefit of r (where \(1 < r <n\) and \(r >\sigma + 1\)), which is divided equally between all participants. This game has no pure-strategy Nash equilibrium. (If all of one’s opponents take the outside option, the best response is to cooperate; if they all cooperate, the best response is to cheat; if they all cheat, the best response is the outside option.) There is a unique symmetrical mixed-strategy Nash equilibrium in which the expected payoff to all three strategies is \(\sigma \). More intuitively, in equilibrium the expected number of participants in each game is sufficiently small that cooperation and cheating are equally profitable. Replicator dynamics have the Rock–Paper–Scissors cyclical pattern.
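The best-response cycle in Hauert et al.'s game can be verified numerically, on the simplifying assumption that all \(n-1\) opponents play the same move. The parameter values below are our own illustrative choices satisfying the stated constraints \(1< r < n\) and \(r > \sigma + 1\):

```python
n, r, sigma = 5, 3.0, 0.5   # assumed values with 1 < r < n and r > sigma + 1

def payoff(my_move, others):
    """Payoff in Hauert et al.'s optional public goods game when all n-1
    opponents play the same move. Moves: 'N' (outside option),
    'C' (contribute), 'D' (participate without contributing)."""
    if my_move == 'N':
        return sigma
    participants = 1 + (n - 1) * (others != 'N')
    contributors = (my_move == 'C') + (n - 1) * (others == 'C')
    # each contributor creates a benefit r, split among all participants;
    # contributing costs 1
    return r * contributors / participants - (1.0 if my_move == 'C' else 0.0)

def best_response(others):
    return max('NCD', key=lambda m: payoff(m, others))
```

With these values the best response to universal non-participation is to cooperate; to universal cooperation, to cheat; and to universal cheating, to take the outside option: the Rock–Paper–Scissors structure noted in the text.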

There are some similarities between Hauert et al.’s model and ours: both models include a non-participation option, and both induce mixed-strategy equilibria with predator–prey characteristics. However, Hauert et al.’s model differs from ours in two significant ways. First, the mechanism that induces cooperation works through variation in the number of participants in the cooperative activity. For this reason, the model cannot represent cooperative activities which require a fixed number of participants. In particular, it cannot represent activities which inherently involve just two individuals—as, for example, most forms of market exchange do. Second, because the costs and benefits of contributing to the public good are non-stochastic, the expected payoffs to cooperation, cheating and non-participation are equal in equilibrium. Thus, although some cooperative activity takes place in equilibrium, this activity generates no net benefit relative to non-participation: it is not reciprocally beneficial.

We suggest that our analysis provides a stylised but essentially realistic account of a mechanism by which reciprocally beneficial cooperation can emerge and persist in anonymous, well-mixed populations in which strategies are selected according to their material payoffs. Using two simple components—voluntary participation and stochastic payoffs—that have not previously been put together, we have constructed a robust general-purpose model of cooperation.

We are conscious that, for some theoretically oriented economists, the mechanism we have described may seem rather prosaic. For decades, the Prisoner’s Dilemma has been used as the paradigm model of cooperation problems, and the problem of explaining cooperation in that game has been treated as a supreme theoretical challenge. Viewed from that perspective, a modelling strategy which relaxes the assumption that cooperation is always a dominated strategy may seem too easy. But we share the view of Worden and Levin (2007) that many real-world cooperation problems are less intractable than the Prisoner’s Dilemma. Neglecting these cases results in an incomplete body of theory and fosters unwarranted pessimism about the possibility of spontaneous cooperation.