1 Introduction

The paper by Axelrod and Hamilton [1] has inspired much theoretical and empirical work on the problem of cooperation. Their famous model of cooperation lets individuals interact repeatedly over time, where each member of a pair has the opportunity to provide a benefit to the other, at a cost to himself, by cooperating. Now consider a population of Tit-for-Tatters, who cooperate on the first interaction and keep on cooperating only as long as their partner cooperates. Axelrod and Hamilton [1] showed that Tit-for-Tatters can resist invasion by defectors, who never cooperate, as long as the long-run benefit of mutual cooperation is greater than the short-run benefit that a defector gets by exploiting a cooperator. However, as Axelrod and Hamilton [1] also showed, a population of Tit-for-Tatters is not the only one that is evolutionarily stable. A population consisting entirely of defectors is evolutionarily stable as well: if (almost) all players in a population are defectors, a cooperator has no one to cooperate with, so no player can do better than defecting. The long-run benefit associated with sustained cooperation becomes irrelevant. This raises the problem of how cooperation is initiated from a previous asocial state: how could an evolutionary trend towards cooperative behavior have started in the first place?

To study this question more closely, Axelrod and Hamilton introduce the concept of segregation. Segregated interaction means that the probability for a Tit-for-Tatter to meet another Tit-for-Tatter is higher than the proportion of Tit-for-Tatters in the population. Axelrod and Hamilton then show that if there are few Tit-for-Tatters in the population, and if the long-run benefit of cooperation is large, only a small amount of segregation is needed to secure Tit-for-Tatters a higher expected payoff than defectors. An evolutionary trend towards universal cooperation can then start. The results established by Axelrod and Hamilton are generated within a setup where pairs of individuals interact repeatedly over time, and where everybody is able to remember the actions taken by each member of the population in previous interactions. In many human social environments, however, these conditions favoring cooperation can be questioned. Individuals do not always interact repeatedly over long periods of time, and in large groups it can be difficult to remember the action taken by a potential exchange partner in previous interactions. This leads us to the main question of this paper: since segregation is a powerful mechanism for the promotion of cooperation in a repeated Prisoner's Dilemma game, can segregation also promote the evolution of cooperation in a non-repeated version of the game? If so, how much segregation is needed, and how does cooperative behavior evolve over time depending on the degree of segregation?

2 The Problem of Cooperation

Consider a large population of players who interact in pairs with available actions and payoffs describing a Prisoner’s Dilemma game. We have the following payoff matrix, where \(a>b>c>d\).

$$\begin{aligned} \begin{array}{c|cc} & \text {Cooperate} & \text {Defect} \\ \hline \text {Cooperate} & b,\ b & d,\ a \\ \text {Defect} & a,\ d & c,\ c \end{array} \end{aligned}$$

If both players cooperate, they both receive a payoff of \(b\). If both defect, they both receive a payoff of \(c\). If one cooperates and the other defects, the cooperator receives a payoff of \(d\), while the defector does very well with a payoff of \(a\). Assume further that individuals in the larger population are either (perhaps due to cultural experiences, perhaps due to genes) cooperators \(\left( C\right) \) or defectors \(\left( D\right) \) in a single-period Prisoner's Dilemma. Let \(p\) denote the proportion of the population that are cooperators and \((1-p)\) the proportion of defectors. If the members of the population are randomly paired, the expected payoffs are given by

$$\begin{aligned} V(C)&= pb+(1-p)d \end{aligned}$$
(1)
$$\begin{aligned} V(D)&= pa+(1-p)c \end{aligned}$$
(2)

where \(V(C)\) and \(V(D)\) are the expected payoffs for a cooperator and a defector, respectively. Equation (1) says that with probability \(p\) a cooperator is paired with another cooperator, producing a payoff \(b\), and with probability \(\left( 1-p\right) \) with a defector, producing a payoff \(d\). Equation (2) has a similar interpretation: with probability \(p\) a defector is paired with a cooperator, producing a payoff \(a\), and with probability \(\left( 1-p\right) \) with another defector, producing a payoff \(c\).
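As a concrete illustration, the following minimal Python sketch (ours, not part of the original analysis) evaluates (1) and (2) using the payoff values \(a=5\), \(b=3\), \(c=1\), \(d=0\) that Axelrod and Hamilton use and that we adopt in the simulation of Sect. 4:

```python
# Expected payoffs under random matching, Eqs. (1)-(2).
# Payoff values are Axelrod and Hamilton's (a=5, b=3, c=1, d=0),
# used here purely for illustration.
a, b, c, d = 5, 3, 1, 0

def V_C(p):
    """Expected payoff of a cooperator when a fraction p cooperates."""
    return p * b + (1 - p) * d

def V_D(p):
    """Expected payoff of a defector when a fraction p cooperates."""
    return p * a + (1 - p) * c

# Since a > b and c > d, defectors do better at every interior p:
for p in (0.1, 0.5, 0.9):
    print(f"p={p}: V(C)={V_C(p):.2f}  V(D)={V_D(p):.2f}")
```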

Assume now the following simple evolutionary dynamics: at any time, the growth rate of the proportion of cooperators \(p\) is positive or negative, depending on whether the expected payoff for cooperators is higher or lower than the expected payoff for defectors. The population distribution \(p\) will be unchanging, producing an equilibrium, if

$$\begin{aligned} V(C) = V(D) \end{aligned}$$
(3)

It is easy to see from (1) and (2) that the only evolutionarily stable equilibrium in this game is \(p=0\), where all members of the population defect.

This result follows from the fact that \(a>b\) and \(c>d\), which gives \(V(C)<V(D)\) for all \(p\in \left( 0,1\right) \). Cooperators cooperate irrespective of the type of player they meet. Defectors take advantage of such indiscriminate cooperative behavior and obtain a higher expected payoff than cooperators. Defectors therefore increase in numbers and, in the long run, take over the whole population. This result motivated Axelrod and Hamilton to examine more closely conditions, not captured in the situation just studied, that can lead to the evolution of cooperation when cooperators and defectors meet to play the Prisoner's Dilemma game.
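To make these dynamics concrete, here is a minimal discrete-time sketch of the growth process just described; the step size, starting point, and payoff values are illustrative choices of ours:

```python
# Discrete-time sketch of the evolutionary dynamics: p grows or shrinks
# with the sign of V(C) - V(D). Step size 0.1 is an arbitrary choice.
a, b, c, d = 5, 3, 1, 0
p = 0.99                                 # start with 99 % cooperators
for t in range(200):
    vc = p * b + (1 - p) * d             # Eq. (1)
    vd = p * a + (1 - p) * c             # Eq. (2)
    p += 0.1 * p * (1 - p) * (vc - vd)   # replicator-style update
print(round(p, 4))                       # tends to 0: defectors take over
```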

3 Evolution of Cooperative Behavior

3.1 Repeated Interaction

Assume that the Prisoner's Dilemma game introduced above is repeated an unknown number of rounds. After each round there is a probability \(\beta \) that another round will be played. Hence, the expected number of rounds is \( 1/\left( 1-\beta \right) \). Assume also that the population consists of two types of players: unconditional defectors and conditional cooperators. The unconditional defectors always defect, while the conditional cooperators are endowed with the Tit-for-Tat strategy, which dictates that they cooperate on the first round and, on all subsequent rounds, do what the partner did on the previous round. The fraction of the population adopting Tit-for-Tat is \(p\), while the remainder adopts unconditional defection. The expected payoffs for Tit-for-Tatters and defectors, respectively, are then

$$\begin{aligned} V(C)&=p\left( \frac{b}{1-\beta }\right) +\left( 1-p\right) \left( d+\frac{c\beta }{1-\beta }\right) \end{aligned}$$
(4)
$$\begin{aligned} V(D)&=p\left( a+\frac{c\beta }{1-\beta }\right) +\left( 1-p\right) \left( \frac{c}{1-\beta }\right) \end{aligned}$$
(5)

Equation (4) says that when two Tit-for-Tatters meet, they will both cooperate on the first interaction and then continue to do so until the interaction terminates, giving an expected payoff of \(b/\left( 1-\beta \right) \). When a Tit-for-Tatter meets a defector, the former gets \(d\) on the first interaction while the defector gets \(a\). Then both will defect until the game terminates, the expected number of iterations after the first round being \((1/(1-\beta ))-1=\beta /(1-\beta )\). Equation (5) has a similar interpretation.
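As an illustration, the following sketch evaluates (4) and (5) with the same illustrative payoff values as before; for example, with \(\beta = 0.9\) and \(p = 0.5\) it gives \(V(C) = 19.5 > V(D) = 12\):

```python
# Expected payoffs in the repeated game, Eqs. (4)-(5).
# Payoffs (a=5, b=3, c=1, d=0) are illustrative.
a, b, c, d = 5, 3, 1, 0

def V_C(p, beta):
    """TFT: b every round vs. another TFT; d once, then c, vs. a defector."""
    return p * b / (1 - beta) + (1 - p) * (d + c * beta / (1 - beta))

def V_D(p, beta):
    """Defector: a once, then c, vs. TFT; c every round vs. a defector."""
    return p * (a + c * beta / (1 - beta)) + (1 - p) * c / (1 - beta)

print(V_C(0.5, 0.9), V_D(0.5, 0.9))      # 19.5 vs. 12.0
```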

According to (3), the condition for equilibrium is that the expected payoffs of the two types are equal, giving

$$\begin{aligned} p^{*}=\frac{c-d}{\frac{b-c\beta }{1-\beta }+c-d-a} \end{aligned}$$
(6)

Since the numerator is positive, the denominator of (6) must also be positive. In addition, for \(p^{*}\in \left( 0,1\right) \), the denominator must be greater than the numerator. Both conditions are satisfied if

$$\begin{aligned} \frac{b-c\beta }{1-\beta }-a>0 \end{aligned}$$
(7)

which gives

$$\begin{aligned} \beta >\frac{a-b}{a-c} \end{aligned}$$
(8)

When (8) holds, \(p^{*}\) is an interior equilibrium. This situation can be explained as follows: suppose that the initial frequency of cooperators is lower than \(p^{*}\). When there are many defectors, rare cooperators are likely to be paired with defectors, producing a low payoff for cooperators, so \(V(C) < V(D)\) and cooperators decline further. If, however, the initial frequency of cooperators is higher than \(p^{*}\), then \(V(C) > V(D)\): cooperators often meet other cooperators with whom to associate, the expected payoff for Tit-for-Tatters exceeds that of the defectors, and cooperating behavior spreads. Hence, \(p^{*}\) is an interior unstable equilibrium (a tipping point) which marks the boundary between the basins of attraction of the two stable equilibria, \(p=0\) and \(p=1\).
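The tipping point can be computed directly. A small sketch, again with the illustrative payoff values \(a=5\), \(b=3\), \(c=1\), \(d=0\):

```python
# Tipping point p* of Eq. (6) and threshold of Eq. (8).
a, b, c, d = 5, 3, 1, 0

def p_star(beta):
    """Interior unstable equilibrium, Eq. (6)."""
    return (c - d) / ((b - c * beta) / (1 - beta) + c - d - a)

print((a - b) / (a - c))                 # Eq. (8): beta must exceed 0.5
for beta in (0.6, 0.8, 0.9):
    print(f"beta={beta}: p* = {p_star(beta):.3f}")
# p* falls from 0.500 to 0.143 to 0.059: the likelier the game is to
# continue, the fewer Tit-for-Tatters are needed (cf. Eq. (9)).
```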

We can then draw the following conclusion from the model: In a population where defecting behavior is not too common, the cooperating Tit-for-Tat strategy leads to universal cooperation if pairs of individuals are likely to interact many times. From (6) we get

$$\begin{aligned} \frac{\mathrm{{dp}}^{*}}{\mathrm{{d}}\beta }=\frac{p^{*}\frac{c-b}{\left( 1-\beta \right) ^{2}}}{\frac{b-c\beta }{1-\beta }+c-d-a}<0 \end{aligned}$$
(9)

saying that an increase in the probability that the game continues moves \(p^{*}\) to the left. A smaller fraction of Tit-for-Tatters is then needed in order to secure an evolutionarily stable survival of cooperative behavior.

However, even if \(\beta \) is high, we still need a certain fraction of Tit-for-Tatters in order to start a process where Tit-for-Tatters increase in numbers. This illustrates that the model fails to answer what many consider the most fundamental problem related to the evolution of cooperation: how could cooperation ever have started from a previous asocial state where (almost) all are defectors? To solve this puzzle Axelrod and Hamilton introduce the concept of segregation (or clustering, as they name it). When there is some segregated interaction, Tit-for-Tatters are more likely to be paired with each other than chance alone would dictate. If the long-run benefit of cooperation is large, even a small amount of segregation can cause the expected payoff of Tit-for-Tatters to exceed the expected payoff of defectors. An evolutionary trend towards universal cooperation can then get started.

3.2 Segregation

A main result in the work by Axelrod and Hamilton is that segregation can be very effective for the evolution of cooperation in a repeated Prisoner's Dilemma game. But what about the non-repeated version of the game? Can segregation also promote the evolution of cooperation when the players meet to play the one-shot Prisoner's Dilemma game, that is, when \(\beta = 0\)? It is immediately clear that complete segregation of cooperators and defectors within a large population secures cooperation. Complete segregation means that cooperators always meet cooperators, and defectors always meet defectors. Cooperators get a payoff of \(b\), while defectors get \(c\). Since \(b>c\), cooperating behavior will spread and in the long run take over the whole population.

The case where cooperators and defectors are only partly segregated can be modeled using the following formulation, adopted from Boyd and Richerson [3]. Let \(r\in \left[ 0,1\right] \) be a measure of the degree of segregation. When \(p\) is the fraction of cooperators in the population, the probability that a cooperator meets another cooperator is no longer \(p\) but \(r+\left( 1-r\right) p\). Correspondingly, the probability that a defector meets another defector is \(r+\left( 1-r\right) \left( 1-p\right) \). If \(r=1\), we have complete segregation, implying that cooperators never interact with defectors. If \(r=0\), we are back to the situation with random matching. Adopting this formulation, the expected payoffs for cooperators and defectors, respectively, are

$$\begin{aligned} V(C)&=[r+(1-r)p]b+[(1-r)(1-p)]d \end{aligned}$$
(10)
$$\begin{aligned} V(D)&=[(1-r)p]a+[r+(1-r)(1-p)]c \end{aligned}$$
(11)

From (10) and (11) we see that with random matching \(\left( r=0\right) \), we are back to the situation analyzed in Sect. 2: defectors do better than cooperators for every \(p\in \left( 0,1\right) \), giving \(p=0\) as an evolutionarily stable equilibrium. With complete segregation \(\left( r=1\right) \), we reach the opposite conclusion, as noted above: cooperators do better than defectors for every \(p\in \left( 0,1\right) \), giving \(p=1\) as an evolutionarily stable equilibrium. In the simulation we are therefore interested in analyzing the situation where the segregation parameter \(r\) lies between these two extreme cases. In particular, we are interested in finding out how small \(r\) can be while still supporting an evolutionarily stable proportion of cooperators. However, as shown in earlier work, the expected payoffs are influenced not only by \(r\) but also by the proportions of cooperators \(\left( p\right) \) and defectors \(\left( 1-p\right) \) in the population. In the simulation we therefore vary both the segregation parameter and the initial proportions of cooperators and defectors. This makes it possible to study how different combinations of \(r\) and \(p\) affect the evolution of cooperators and defectors.
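Before turning to the agent-based simulation, a small numerical sketch of (10) and (11), again with the illustrative payoffs \(a=5\), \(b=3\), \(c=1\), \(d=0\), locates the smallest degree of segregation at which cooperators at least break even:

```python
# One-shot payoffs under partial segregation, Eqs. (10)-(11), and a grid
# search for the smallest r with V(C) >= V(D). Payoffs are illustrative.
a, b, c, d = 5, 3, 1, 0

def V_C(p, r):
    return (r + (1 - r) * p) * b + (1 - r) * (1 - p) * d

def V_D(p, r):
    return (1 - r) * p * a + (r + (1 - r) * (1 - p)) * c

def break_even_r(p, steps=10_000):
    """Smallest grid value of r at which cooperators break even."""
    for i in range(steps + 1):
        r = i / steps
        if V_C(p, r) >= V_D(p, r):
            return r

for p in (0.1, 0.5, 0.9):
    print(f"p={p}: cooperators break even at roughly r = {break_even_r(p):.3f}")
```

With these payoffs the break-even \(r\) rises from about 0.35 at \(p=0.1\) to about 0.49 at \(p=0.9\), in line with the simulation finding in Sect. 5.3 that \(r \ge 0.5\) secures convergence to cooperation regardless of the starting percentage of cooperators.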

4 The Simulation

There has been a lot of research on the simulation of the Prisoner's Dilemma [24]. As a simulation model, we use an agent-based approach in which an agent represents a player with a predefined strategy. The basic activity of the agent is to play the iterated Prisoner's Dilemma. Each agent is identified by a unique label: the label \(C\) identifies agents choosing the cooperative strategy, while the label \(D\) identifies those choosing the defective strategy. Each agent's label can be viewed as a mapping from one state of the game to a new state in the next round, and the simulation experiments examine the ability of an agent to survive the evolutionary process.

The simulation of the iterated Prisoner's Dilemma proceeds as follows. Initially, a population of agents is generated; a user-defined parameter determines the percentage of agents playing the cooperative strategy versus the defective strategy, and the payoff of all agents is set to 0. The algorithm then pairs off agents to play one game of the Prisoner's Dilemma. This step can be viewed as a matching process. First, a random number \(\mathbf{random }\) is drawn uniformly on the interval (0,1). Thereafter, an \(\mathrm{{agent}}_k\) is drawn randomly from the set of unmatched agents. If \(\mathrm{{agent}}_k\) has the label \(C\), the matching scheme selects a randomly chosen unmatched agent with the label \(C\) provided the inequality \(\mathbf{random } < r+ (1-r ) p_c\) holds; otherwise the matching mate of \(\mathrm{{agent}}_k\) is a randomly chosen unmatched agent with the label \(D\). Here \(p_c\) denotes the proportion of agents playing the cooperative strategy. If, on the other hand, \(\mathrm{{agent}}_k\) has the label \(D\), its matching mate is chosen with the label \(D\) provided the inequality \(\mathbf{random } < r+ (1-r ) ( 1- p_c)\) holds; otherwise the matching mate is selected with the label \(C\). If by chance the matching scheme is unable to locate a matching mate with the required label, \(\mathrm{{agent}}_k\) is left unmatched. At the end of each tournament, the current population \(P_t\) is transformed into a new population \(P_{t+1}\) that engages in a new round of the Prisoner's Dilemma, based on each agent's payoff. In the simulation we use the same payoff parameters as Axelrod and Hamilton [1], shown in the payoff matrix below.

$$\begin{aligned} \begin{array}{c|cc} & \text {Cooperate} & \text {Defect} \\ \hline \text {Cooperate} & 3,\ 3 & 0,\ 5 \\ \text {Defect} & 5,\ 0 & 1,\ 1 \end{array} \end{aligned}$$

The payoff received determines whether an agent is removed from the game or allowed to continue; the size of the population stays fixed during the whole simulation. All unmatched agents from \(P_t\) automatically become part of the new population \(P_{t+1}\). The agents that were engaged in the one-shot Prisoner's Dilemma game are ranked from best to worst (i.e., sorted in decreasing order of payoff), and those with the highest payoffs proceed to the next round and multiply by cloning a duplicate agent with the same strategy. Each agent resets its payoff to \(0\) before starting a new round. The simulation process is considered to have converged when all agents share the same strategy.
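A minimal sketch of one generation of this process is given below. The code is ours and simplifies some details (e.g., tie-breaking in the ranking), but it follows the matching and cloning scheme just described:

```python
import random

# One generation of the agent-based simulation: segregated matching,
# one-shot play, then selection by payoff. A simplified sketch.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def match_and_play(agents, r):
    """Pair agents with the segregated matching scheme; play one-shot PD."""
    p_c = agents.count('C') / len(agents)
    unmatched = list(range(len(agents)))
    random.shuffle(unmatched)
    scored, leftover = [], []
    while unmatched:
        k = unmatched.pop()
        # Probability of seeking a same-label mate: r + (1-r)p_c for C,
        # r + (1-r)(1-p_c) for D, as in the matching scheme above.
        same = r + (1 - r) * (p_c if agents[k] == 'C' else 1 - p_c)
        want = agents[k] if random.random() < same else \
               ('D' if agents[k] == 'C' else 'C')
        mates = [j for j in unmatched if agents[j] == want]
        if not mates:                    # no mate with the required label:
            leftover.append(k)           # agent k stays unmatched this round
            continue
        j = random.choice(mates)
        unmatched.remove(j)
        u, v = PAYOFF[(agents[k], agents[j])]
        scored += [(u, agents[k]), (v, agents[j])]
    return scored, leftover

def next_generation(agents, r):
    """Unmatched agents carry over; the better half of matched agents clones."""
    scored, leftover = match_and_play(agents, r)
    scored.sort(key=lambda s: s[0], reverse=True)   # rank by payoff, best first
    survivors = [label for _, label in scored[:len(scored) // 2]]
    return [agents[k] for k in leftover] + survivors + survivors
```

Iterating next_generation until all agents share one label, or until a generation cap is reached, yields the simulation loop used in the experiments below.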

5 Experiments

5.1 Experimental Setup

The simulation model has a number of user-defined parameters, such as the segregation parameter and the initial conditions (i.e., the percentages of cooperators and defectors). We perform several simulations using instances defined by the 4-tuple \(\langle n, p_c, p_d, r\rangle \), where \(n\) denotes the number of agents, \(p_c\) the percentage of cooperators, \(p_d\) the percentage of defectors, and \(r\) the segregation parameter. We set the number of agents to \(1{,}000\). In order to obtain a fair picture of the simulation process, we vary \(r\) from \(0.1\) to \(0.9\) with a step size of \(0.1\), and \(p_c\) from 10 to 90 % with a step size of 10 percentage points, producing \(81\) different pairs of \(r\) and \(p_c\). Because of the stochastic nature of the simulation process, each simulation consists of \(100\) independent runs, each with a different random seed; every result we present is therefore averaged over \(100\) runs. The simulation ends when the population of agents converges to either 100 % C's or 100 % D's, or when a maximum of \(10^6\) generations has been performed.
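The resulting experimental grid can be summarized by the following sketch, where run_simulation is assumed to wrap the generation loop of Sect. 4 and to return the final fraction of cooperators; the function name and signature are ours:

```python
# Sketch of the experimental grid: 81 (r, p_c) pairs, 100 runs each.
# run_simulation is a hypothetical wrapper around the generation loop.
N, RUNS, MAX_GEN = 1000, 100, 10**6

def experiment(run_simulation):
    """Average outcome over 100 runs for each of the 81 (r, p_c) pairs."""
    results = {}
    for r10 in range(1, 10):            # r = 0.1, 0.2, ..., 0.9
        for pc in range(10, 100, 10):   # p_c = 10 %, 20 %, ..., 90 %
            r = r10 / 10
            outcomes = [run_simulation(N, pc, 100 - pc, r, MAX_GEN)
                        for _ in range(RUNS)]
            results[(r, pc)] = sum(outcomes) / RUNS
    return results
```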

5.2 The Benchmark Case

In this section, we conduct an experiment using \(p_c = 90\) %, \(p_d = 10\) % and setting the segregation parameter \(r\) to \(0\). Figure 1 shows one typical run of the simulation experiment. The course of the percentage curve reveals two phases. The first phase starts with a steady decline in the share of agents with the cooperative strategy over the first generations, before it flattens off as the curve mounts a plateau, marking the start of the second phase. The plateau spans a region where the percentages of C's and D's fluctuate around 50 %. The plateau is rather short and becomes less pronounced as the number of generations increases. The percentage of C's then starts to decrease before finally dropping to 0 %. This example illustrates how agents tend to evolve strategies that increasingly defect in the absence of segregation. The explanation is a simple one: the agents evolve in a random environment, and therefore the agents that manage to survive the simulation process are those willing to always defect.

Fig. 1 Evolution process of cooperators and defectors with \(r = 0\)

5.3 Phase Transition

Table 1 gives the results of the simulations for different values of \(r\) and \(p_c\). A quick look at this table reveals the existence of three different regions. The first region, lying in the upper left corner where the values of \(p_c\) and \(r\) are low, is where the simulation converges globally to D's with a success ratio equal to 1. This region comprises the classes where \(p_c = 10\) and \(r \le 0.4\), \(p_c = 20\) and \(r \le 0.3\), and finally \(p_c = 30\) and \(r \le 0.2\).

Table 1 Convergence ratios for cooperators and defectors

The second region lies in the right corner, where \(r \ge 0.5\). In this region, the simulation converges to C's with a success ratio equal to 1, regardless of the starting percentage of cooperators. Finally, a third region lies between the other two, where for every pair of \(p_c\) and \(r\) the result of the simulation process is a mixture of C's and D's. This region comprises the classes where \(p_c = 20\) and \(r = 0.4\), \(p_c = 30\) and \( 0.3 \le r \le 0.4 \), \(p_c = 40\) and \( 0.2 \le r \le 0.4\), and finally \(50 \le p_c \le 90\) and \(0.1 \le r \le 0.4\). In this region the success ratio is equal to \(0\), and for each pair of \(p_c\) and \(r\) lying in this region, the table shows the final percentages of C's and D's, which may differ depending on the maximum number of generations allowed.

These experiments reveal a phase transition: the probability that the simulation process converges to a homogeneous population of C's or D's drops from 1 to 0 when the parameters \(r\) and \(p_c\) take values within a given interval.

The next three figures show the course of the simulation process with respect to the percentages of C's and D's in the three regions. Figure 2 shows a run from the region where convergence is always in favor of the agents choosing the defect strategy. The result shows a rapid rise of D's, reaching 100 % at about the sixteenth generation. Choosing values of \(r\) and \(p_c\) in this region prevents agents with the cooperative strategy from developing, leading to a random working environment in which the agents with the defect strategy proliferate. Figure 3 shows a run from the phase-transition region with \(r = 0.3\), \(p_c = 50\) %, and \(p_d = 50\) %, where convergence always results in a mixed population of C's and D's. Notice the rapid increase in the percentage of C's and the rapid decline in the percentage of D's during the first generations. Both strategies reach a peak value at about \(400\) generations, then fluctuate periodically between a low and a high percentage range and remain there indefinitely. Finally, Fig. 4 shows a run from the third region, characterized by convergence always in favor of the agents choosing the cooperative strategy. The plot shows an upward trend in the percentage of C's, which get the chance to develop thanks to the right choice of segregation parameter value. Accordingly, in subsequent generations the population of agents becomes increasingly dominated by C's.

Fig. 2 Evolution process of cooperators and defectors with \(r = 0.3\), \(p_c = 80\) %, \(p_d = 20\) %

Fig. 3 Evolution process of cooperators and defectors with \(r = 0.3\), \(p_c = 50\) %, \(p_d = 50\) %

Fig. 4 Evolution process of cooperators and defectors with \(r = 0.5\), \(p_c = 30\) %, \(p_d = 70\) %

6 Conclusion

Most game-theoretic treatments of the problem of cooperation adopt the assumption of random pairing. This is somewhat strange, since social interaction is hardly ever random. As discussed in previous research, non-random interaction constitutes an important aspect of our social architecture. In most societies there is a strong tendency for members to be structured in more or less homogeneous groups. A "group" can, for example, be a village, a neighborhood, a class, an occupation, an ethnic group or a religious community. Members of these groups interact more frequently with each other than with members of the society at large. Hence, since non-random pairing plays an important role in most social interaction, it should be taken into consideration when the evolution of behavior and norms is analyzed. This paper has shown that segregated interaction is a powerful mechanism for the evolution of cooperation in a Prisoner's Dilemma game where cooperators interact with defectors. The conclusion holds even if we drop the possibility of repeated interaction and reciprocity, which was essential for the results generated in Axelrod and Hamilton's influential paper.