The game nicknamed ‘prisoner’s dilemma’ by A.W. Tucker has attracted wide attention, doubtless because it has raised doubts about the universal applicability of the so-called Sure-thing Principle as a principle of rational decision.

The game is illustrated by the following anecdote. Two men, caught with stolen goods, are suspected of burglary, but there is not enough evidence to convict them of that crime, unless one or both confess. They could, however, be convicted of possession of stolen goods, a lesser offence.

The prisoners are not permitted to communicate. The situation is explained to each separately. If both confess, both will be convicted of burglary and sentenced to two years in prison. If neither confesses, they will be convicted of possession of stolen goods and given a six-month prison sentence. If only one confesses, he will go scot-free, while the other, convicted on the strength of his partner’s testimony, will get the maximum sentence of five years.

It is in the interest of each prisoner to confess. For if the other confesses, confession results in a two-year sentence, while holding out results in a five-year sentence. If the other does not confess, holding out results in a six-month sentence, while confession leads to freedom. Thus, ‘to confess’ is a dominating strategy, one that results in a preferred outcome regardless of the strategy used by the partner. A dominating strategy can be said to be dictated by the Sure-thing Principle. Nevertheless, if both, guided by the Sure-thing Principle, confess, both are worse off (with a two-year sentence) than if they had not confessed and had got a six-month sentence.
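The dominance argument can be checked mechanically. The sketch below encodes the sentences from the anecdote (in years of prison, so smaller is better) and verifies that ‘confess’ yields the shorter sentence whatever the partner does, yet leaves both worse off when both follow it:

```python
# Sentences from the anecdote, in years of prison (smaller is better).
# The dict maps (my choice, partner's choice) -> my sentence.
sentence = {
    ("confess", "confess"): 2.0,    # both convicted of burglary
    ("confess", "hold out"): 0.0,   # the confessor goes scot-free
    ("hold out", "confess"): 5.0,   # maximum sentence
    ("hold out", "hold out"): 0.5,  # six months for the lesser offence
}

# Whatever the partner does, confessing gives the shorter sentence:
for partner in ("confess", "hold out"):
    assert sentence[("confess", partner)] < sentence[("hold out", partner)]

# Yet mutual confession leaves both worse off than mutual holding out:
assert sentence[("confess", "confess")] > sentence[("hold out", "hold out")]
```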

In this way, Prisoner’s Dilemma is seen as an illustration of the divergence between individual and collective rationality. Decisions that are rational from the point of view of each individual may be defective from the point of view of both or, more generally, all individuals in decision situations where each participant’s decision affects all participants.

Generalized to more than two participants (players), Prisoner’s Dilemma becomes a version of the so-called Tragedy of the Commons (Hardin 1968). It is in each farmer’s interest to add a cow to his herd grazing on a communal pasture. But if each farmer follows his individual interest, the land may be overgrazed to everyone’s disadvantage. Over-harvesting in pursuit of profit by each nation engaged in commercial fishing is essentially the Tragedy of the Commons in modern garb.
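The commons logic can be made concrete with a toy model (all numbers hypothetical): each cow earns its owner the pasture’s per-cow yield, and that yield falls with the total number of cows grazing.

```python
def payoff(my_cows, total_cows, value=10, damage=1):
    """A farmer's return: cows owned times the pasture's per-cow yield,
    which declines with the total herd (all numbers illustrative)."""
    return my_cows * (value - damage * total_cows)

# Five farmers with one cow each:
assert payoff(my_cows=1, total_cows=5) == 5
# One farmer adds a cow: he gains while the others lose...
assert payoff(my_cows=2, total_cows=6) == 8
assert payoff(my_cows=1, total_cows=6) == 4
# ...but if every farmer adds a cow, all end up with nothing:
assert payoff(my_cows=2, total_cows=10) == 0
```

Adding a cow is advantageous to the individual farmer whatever the others do, yet universal adoption of that dominant choice ruins the pasture for all.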

Many social situations are characterized by a similar bifurcation between decisions prescribed by individual and collective rationality. Price wars and arms races are conspicuous examples. In the context of Prisoner’s Dilemma, holding out is regarded as an act of cooperation (with the partner, of course, not with the authorities); confession, as non-cooperation or defection.

Because the prescriptions of individual and collective rationality are contradictory, a normative theory of decision in situations of this sort becomes ambivalent. Attention naturally turns to the problem of developing a descriptive theory, one which would purport to describe (or to predict, if possible) how people, faced with dilemmas of this sort, actually decide under a variety of conditions.

As experimental social psychology was going through a rapid development in the 1950s, Prisoner’s Dilemma became a favourite experimental tool. It enabled investigators to gather large masses of data with relatively little effort. Moreover, the data were all ‘hard’, since the dichotomy between a cooperative choice in a Prisoner’s Dilemma game (C) and a defecting one (D) is unambiguous. Frequencies of these choices became the principal dependent variables in experiments on decision-making involving choices between acting in individual or collective interest. As for the independent variables, these ranged over the personal characteristics of the players (sex, occupation, nationality, personality profile), conditions under which the decisions were made (previous experience, opportunities for communication), characteristics or behaviour of partner, the payoffs associated with the outcomes of the game, etc. (cf. Rapoport et al. 1976, chs 9, 15, 18, 19).

Prisoner’s Dilemma is usually presented to experimental subjects in the form of a 2 × 2 matrix, whose rows, C1 and D1, represent one player’s choices, while the columns, C2 and D2, represent the choices of the other. The choices are usually made independently. Thus, the four cells of the matrix correspond to the four possible outcomes of the game: C1C2, C1D2, D1C2 and D1D2. Each cell displays two numbers, the first being the payoff to Row, the player choosing between C1 and D1, the second the payoff to Column, who chooses between C2 and D2. The magnitudes of the payoffs are such that strategy (choice) D of each player dominates strategy C. The decision problem is seen as a dilemma, because both players prefer outcome C1C2 to D1D2; yet choosing C means forgoing the chance to take advantage of the other player, should he choose C, or receiving the worst of the four payoffs, should he choose D.
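Such a matrix is easy to write down. The numbers below are illustrative, not taken from any particular experiment; each cell holds the pair (payoff to Row, payoff to Column), and the ordering makes D dominate C for both players while both prefer C1C2 to D1D2:

```python
# Illustrative payoff matrix (numbers hypothetical); each cell is
# (payoff to Row, payoff to Column), larger payoffs being better.
matrix = {
    ("C", "C"): (3, 3),  # reciprocated cooperation
    ("C", "D"): (0, 5),  # Row's worst payoff; Column exploits
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # double defection
}

# D dominates C for Row (and, by the symmetry of the matrix, for Column):
for other in ("C", "D"):
    assert matrix[("D", other)][0] > matrix[("C", other)][0]

# Yet both players prefer the C1C2 outcome to D1D2:
assert matrix[("C", "C")][0] > matrix[("D", "D")][0]
assert matrix[("C", "C")][1] > matrix[("D", "D")][1]
```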

The experiments are usually conducted in one of three formats: (1) single play, where each player makes only one decision; (2) iterated play, in which a pair of players makes a sequence of simultaneous decisions; (3) iterated play against a programmed player, where the subject’s co-player’s choices are determined in a prescribed way, usually dependent on the subject’s choices.

The purpose of a single play is to see how different subjects will choose when there is no opportunity of interacting with the other player. The purpose of iterated play with two bona fide subjects is to study the effects of interaction between the successive choices. The purpose of play against a programmed player is to see how different (controlled) strategies of iterated plays influence the behaviour of the subject: whether, for example, cooperation is reciprocated or exploited, whether punishing defections has a ‘deterrent’ effect, etc. For an extensive review of experiments with a programmed player, see Oskamp (1971).

The findings generated by experiments with Prisoner’s Dilemma are of various degrees of interest. Some are little more than confirmations of common sense expectations. For example, frequencies of cooperative choices in iterated plays vary as expected with the payoffs associated with the outcomes. The larger the rewards associated with reciprocated cooperation or the larger the punishments associated with double defection, the more frequent are the cooperative choices. The larger the punishment associated with unreciprocated cooperation, the more frequent are the defecting choices, and so on. As expected, opportunities to communicate with the partner enhance cooperation; inducing a competitive orientation in the subjects inhibits it.

Of greater interest are the dynamics of iterated play. Typically, the frequency of cooperative choices averaged over large numbers of subjects at first decreases, suggesting disappointment with unsuccessful attempts to establish cooperation. If the play continues long enough, average frequency of cooperation eventually increases, suggesting establishment of a tacit agreement between the players. The asymptotically approached frequency of cooperation represents only the mean and not the mode. Typically, the players ‘lock in’ either on the C1C2 or on the D1D2 outcome (Rapoport and Chammah 1965).

Bimodality is observed also in iterated plays against a programmed player who cooperates unconditionally. Roughly one half of the subjects have been observed to reciprocate this cooperation fully, while one half have been observed to exploit it throughout, obtaining the largest payoff.

Comparison of the effects of various programmed strategies in iterated play showed that the so called Tit-for-tat strategy was the most effective in eliciting cooperation from the subjects. This strategy starts with C and thereafter duplicates the co-player’s choice on the previous play. Of some psychological interest is the finding that the subjects are almost never aware that they are actually playing against their own mirror image one play removed. In a way, this finding is a demonstration of the difficulty of recognizing that others’ behaviour towards one may be largely a reflection of one’s behaviour towards them. Escalation of mutual hostility in various situations may well be a consequence of this deficiency.
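The strategy itself amounts to a few lines of code. A minimal sketch (with C and D as one-character strings) also makes the ‘mirror image one play removed’ observation concrete:

```python
def tit_for_tat(opponent_history):
    """Cooperate on the first play; thereafter copy the co-player's
    choice on the previous play."""
    return opponent_history[-1] if opponent_history else "C"

# Against any sequence of co-player moves, Tit-for-tat's responses are
# that same sequence shifted one play back, preceded by an initial C --
# the subject is playing against his own mirror image one play removed.
opponent_moves = ["D", "D", "C", "C", "D"]
responses = [tit_for_tat(opponent_moves[:i]) for i in range(len(opponent_moves))]
assert responses == ["C", "D", "D", "C", "C"]
```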

Perhaps the most interesting result of Prisoner’s Dilemma experiments with iterated play is that even if the number of iterations to be played is known to both subjects, nevertheless a tacit agreement to cooperate is often achieved. This finding is interesting because it illustrates dramatically the deficiency of prescriptions based on fully rigorous strategic reasoning.

At first thought, it seems that a tacit agreement to cooperate is rational in iterated play, because a defection can be expected to be followed by a retaliatory defection in ‘self-defence’, so to say, by the other player with the view of avoiding the worst payoff associated with unreciprocated cooperation. However, this argument does not apply to the play known to be the last, because no retaliation can follow it. Thus, D dominates C on the last play, and according to the Sure-thing Principle, D1D2 is a foregone conclusion. This turns attention to the next-to-last play, which now is, in effect, the ‘last’ play, to which the same reasoning applies. And so on. Thus, rigorous strategic analysis shows that the strategy consisting of D’s throughout the iterated play is the only ‘rational’ one, regardless of the length of the series.
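The unravelling argument can be sketched as a recursion. This is a schematic illustration of the reasoning, not a full game-theoretic solver: on the last play D dominates outright, and once the continuation is settled as mutual defection, cooperating now can purchase nothing.

```python
def rational_choice(plays_remaining):
    """Schematic backward induction for a known finite horizon."""
    if plays_remaining == 1:
        return "D"  # last play: no retaliation can follow, so D dominates
    # The rest of the game is already settled as mutual defection,
    # so cooperating now cannot buy future cooperation:
    assert rational_choice(plays_remaining - 1) == "D"
    return "D"

# Whatever the (known) length of the series, the 'rational' strategy is all-D:
assert all(rational_choice(n) == "D" for n in range(1, 50))
```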

The backward induction cannot be performed if the number of iterations is infinite, unknown, or determined probabilistically. In those cases, provided the probability of termination is not too large, the 100 per cent D strategy is not necessarily dictated by individual rationality. The question naturally arises about the relative merit of various strategies in iterated play of Prisoner’s Dilemma. This question was approached empirically by Axelrod (1984).

Persons interested in this problem were invited to submit programmes for playing iterated Prisoner’s Dilemma 200 times. Each programme was to be matched with every other programme submitted, including itself. The programme with the largest cumulated payoff was to be declared the winner of the contest.
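The tournament format is easy to reproduce in miniature. The sketch below uses illustrative payoffs and two toy entries alongside Tit-for-tat (Axelrod’s actual entries were more varied); every programme is matched against every other, itself included, and payoffs are cumulated:

```python
import itertools

# Illustrative payoffs: (payoff to player 1, payoff to player 2).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(mine, theirs):
    return theirs[-1] if theirs else "C"

def always_defect(mine, theirs):
    return "D"

def grudger(mine, theirs):
    # Cooperate until the co-player has ever defected, then defect forever.
    return "D" if "D" in theirs else "C"

def play_match(s1, s2, rounds=200):
    h1, h2, total1, total2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)   # simultaneous choices
        p1, p2 = PAYOFF[(m1, m2)]
        h1.append(m1); h2.append(m2)
        total1 += p1; total2 += p2
    return total1, total2

def tournament(entries, rounds=200):
    scores = {name: 0 for name in entries}
    for (n1, f1), (n2, f2) in itertools.combinations_with_replacement(
            list(entries.items()), 2):
        s1, s2 = play_match(f1, f2, rounds)
        if n1 == n2:
            scores[n1] += s1            # self-match counted once
        else:
            scores[n1] += s1
            scores[n2] += s2
    return scores

scores = tournament({"tit_for_tat": tit_for_tat,
                     "always_defect": always_defect,
                     "grudger": grudger})
```

In this toy field, Tit-for-tat wins no individual match yet ties for the top cumulated score; how any entry fares depends heavily on the composition of the field, which is itself part of the lesson of the contests.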

Fifteen programmes were submitted, Tit-for-tat among them. It obtained the highest score. A second contest was announced, this time with probabilistic termination, 200 iterations being the expected number. The results of the first contest together with complete descriptions of the programmes submitted were publicized with the invitation to the second contest. This time 63 programmes were submitted from six countries. Tit-for-tat was again among them (submitted by the same contestant and by no other) and again obtained the highest score.

The interesting feature of this result was the fact that Tit-for-tat did not ‘beat’ a single programme against which it was pitted. In fact, it cannot beat any programme, since the only way to get a higher score than the co-player is to play more D’s than he, and this, by definition, Tit-for-tat cannot do. It can only either tie or lose, to be sure by no more than one play. It follows that Tit-for-tat obtained the highest score, because other programmes, presumably designed to beat their opponents, reduced each other’s scores when pitted against each other, including themselves. The results of these contests can be interpreted as further evidence of the deficiency of strategies based on attempts to maximize one’s individual gains in situations where both cooperative and competitive strategies are possible. Moreover, the superiority of cooperative strategies does not necessarily depend on opportunities for explicit agreements.

Support for the latter conjecture came from a somewhat unexpected source, namely, applications of game-theoretic concepts in the theory of evolution (Maynard Smith 1982; Rapoport 1985). Until recently, game-theoretic models used in theoretical biology were so-called games against nature (e.g. Lewontin 1961). A ‘choice of strategy’ was represented by the appearance of a particular genotype in a population immersed in a stochastic environment. Degree of adaptation to the environment was reflected in relative reproductive success of the genotype, i.e. statistically expected numbers of progeny surviving to reproductive age. In this way, the population evolved towards the best adapted genotype.

In this model, adaptation depends only on the probability distribution of the states of nature occurring in the environment (e.g. wet or dry seasons) but not on the fraction of the population that has adopted a given strategy. When this dependence is introduced, the model becomes a genuine game-theoretic model with more than one bona fide player.

The model suggested by Prisoner’s Dilemma appeared in theoretical biology in connection with combats between members of the same species, for example over mates or territories. Assuming for simplicity two modes of fighting, fierce and mild, we can see the connection to Prisoner’s Dilemma by examining the likely result of evolution. In an encounter between a fierce and a mild fighter, the former wins, the latter loses. However, an encounter between two fierce fighters may impose more severe losses on both than an encounter between two mild fighters. With proper rank ordering of payoffs (relative reproductive success), the model becomes a Prisoner’s Dilemma. Development of non-lethal weapons, such as backward-curved horns, or of behavioural inhibitions may have been a result of natural selection which made lethal combats between members of the same species rare.

Iterated combats suggest comparison of the effectiveness of strategies in iterated play. Maynard Smith and Price (1973) observed a computer-simulated population of iterated Prisoner’s Dilemma players, using different strategies, whereby the payoffs were translated into differential reproduction rates of the players using the respective strategies. In this way, the ‘evolution’ of the population could be observed. Eventually, the ‘Retaliators’, essentially Tit-for-tat players, replaced all others.

A central concept in game-theoretic models of evolution is that of the evolutionarily stable strategy (ESS). It is stable in the sense that a population consisting of genotypes representing that strategy cannot be ‘invaded’ by isolated mutants or immigrants, since such invaders will be disadvantaged with respect to their reproductive success. It has been shown by computer simulation that a population represented by programmes submitted to the above-mentioned contests evolved towards Tit-for-tat as an evolutionarily stable strategy. It was, however, shown subsequently that it is not the only such strategy.
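The non-invadability condition can be illustrated with toy payoffs (hypothetical numbers, in the spirit of the experimental matrices): in a population of Tit-for-tat residents, pairings are overwhelmingly resident-versus-resident, and a rare All-D mutant, meeting residents, cumulates less than the residents do.

```python
# Illustrative payoffs: (payoff to player 1, payoff to player 2).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(mine, theirs):
    return theirs[-1] if theirs else "C"

def always_defect(mine, theirs):
    return "D"

def play_match(s1, s2, rounds=200):
    h1, h2, total1, total2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFF[(m1, m2)]
        h1.append(m1); h2.append(m2)
        total1 += p1; total2 += p2
    return total1, total2

# A resident meeting a fellow resident, versus a rare mutant meeting a resident:
resident_score, _ = play_match(tit_for_tat, tit_for_tat)   # mutual cooperation
_, mutant_score = play_match(tit_for_tat, always_defect)   # one exploit, then mutual D
assert resident_score > mutant_score  # the invader is disadvantaged
```

Note that with very short interactions the inequality reverses (on a single play the defector gets 5 against the resident’s 3), which is why sufficiently long, or probabilistically continued, interaction is essential to the stability result.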

In sum, the lively interest among behavioural scientists and lately many biologists in Prisoner’s Dilemma can be attributed to the new ideas generated by the analysis of that game and by results of experiments with it. The different prescriptions of decisions based on individual and collective rationality in some conflict situations cast doubt on the very meaningfulness of the facile definition of ‘rationality’ as effective maximization of one’s own expected gains, a definition implicit in all manner of strategic thinking, specifically in economic, political, and military milieus. Models derived from Prisoner’s Dilemma point to a clear refutation of a basic assumption of classical economics, according to which pursuit of self-interest under free competition results in collectively optimal equilibria. These models also expose the fallacies inherent in assuming the ‘worst case’ in conflict situations. The assumption is fully justified in the context of two-person zero-sum games but not in more general forms of conflict, where interests of participants partly conflict and partly coincide. Most conflicts outside the purely military sphere are of this sort.

Finally, Prisoner’s Dilemma and its generalization, the Tragedy of the Commons, provide a rigorous rationale for Kant’s Categorical Imperative: act in the way you wish others to act. Acting on this principle reflects more than altruism. It reflects a form of rationality, which takes into account the circumstance that the effectiveness of a strategy may depend crucially on how many others adopt it and the fact that a strategy initially successful may become self-defeating because its success leads others to imitate it. Thus, defectors in Prisoner’s Dilemma may be initially successful in a population of cooperators. But if this success leads to an increase of defectors and a decrease of cooperators, success turns to failure. Insights of this sort are of obvious relevance to many forms of human conflict.

See Also