The theory of learning and evolution in games provides models of disequilibrium behaviour in strategic settings. Much of the theory focuses on whether and when disequilibrium behaviour will resolve into equilibrium play and, if it does, on predicting which equilibrium will be played. But the theory also offers techniques for characterizing perpetual disequilibrium play.

A Taxonomy

Models from evolutionary game theory consider the behaviour of large populations in strategic environments. In the biological strand of the theory, agents are genetically programmed to play fixed actions, and changes in the population’s composition are the result of natural selection and random mutations. In economic approaches to the theory, agents actively choose which actions to play using simple myopic rules, so that changes in aggregate behaviour are the end result of many individual decisions. Deterministic evolutionary dynamics, usually taking the form of ordinary differential equations, are used to describe behaviour over moderate time spans, while stochastic evolutionary dynamics, modelled using Markov processes, are more commonly employed to study behaviour over very long time spans.

Models of learning in games focus on the behaviour of small groups of players, one of whom fills each role in a repeated game. These models too can be partitioned into two categories. Models of heuristic learning (or adaptive learning) resemble evolutionary models, in that their players base their decisions on simple myopic rules. One can sometimes distinguish the two sorts of models by the inputs to the agents’ decision rules. In both the stochastic evolutionary model of Kandori, Mailath and Rob (1993) and the heuristic learning model of Young (1993), agents’ decisions take the form of noisy best responses. But in the former model agents evaluate each action by its performance against the population’s current behaviour, while in the latter they consider performance against the time averages of opponents’ past play.

In models of coordinated Bayesian learning (or rational learning), each player forms explicit beliefs about the repeated game strategies employed by other players, and plays a best response to those beliefs in each period. These models assume a degree of coordination of players’ prior beliefs that is sufficient to ensure that play converges to Nash equilibrium. By dropping this coordination assumption, one obtains the more general class of Bayesian learning (or belief learning) models. Since such models can entail quite naive beliefs, belief learning models overlap with heuristic learning models – see the section “Learning in Games” below.

Evolutionary Game Theory

The roots of evolutionary game theory lie in mathematical biology. Maynard Smith and Price (1973) introduced the equilibrium notion of an evolutionarily stable strategy (or ESS) to capture the possible stable outcomes of a dynamic evolutionary process by way of a static definition. Later, Taylor and Jonker (1978) offered the replicator dynamic as an explicitly dynamic model of the natural selection process. The decade that followed saw an explosion of research on the replicator dynamic and related models of animal behaviour, population ecology, and population genetics: see Hofbauer and Sigmund (1988).
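
In its standard single-population form (stated here for reference, with notation that is not used elsewhere in this entry), the replicator dynamic for a symmetric game with payoff matrix A is

\[
\dot{x}_i = x_i \bigl[ (Ax)_i - x^{\top} A x \bigr], \qquad i = 1, \dots, n,
\]

where $x_i$ is the share of the population playing strategy i, $(Ax)_i$ is that strategy's expected payoff at population state x, and $x^{\top} A x$ is the population-average payoff; strategies grow or decline in proportion to their payoff advantage or deficit relative to the average.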

In economics, evolutionary game theory studies the behaviour of populations of strategically interacting agents who actively choose among the actions available to them. Agents decide when to switch actions and which action to choose next using simple myopic rules known as revision protocols (see Sandholm 2006). A population of agents, a game, and a revision protocol together define a stochastic process – in particular, a Markov process – on the set of population states.

Deterministic Evolutionary Dynamics

How the analysis proceeds depends on the time horizon of interest. Suppose that for the application in question, our interest is in moderate time spans. Then if the population size is large enough, the idiosyncratic noise in agents’ choices is averaged away, so that the evolution of aggregate behaviour follows an almost deterministic path (Benaïm and Weibull 2003). This path is described by a solution to an ordinary differential equation. For example, Björnerstedt and Weibull (1996) and Schlag (1998) show that if agents use certain revision protocols based on imitation of successful opponents, then the population’s aggregate behaviour follows a solution to Taylor and Jonker’s (1978) replicator dynamic. This argument provides an alternative, economic interpretation of this fundamental evolutionary model.
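
As a purely illustrative sketch, the code below integrates the replicator dynamic displayed above for a two-strategy game; the payoff matrix, initial state and step size are assumptions made for this example, not taken from the cited papers.

```python
import numpy as np

# Euler integration of the replicator dynamic x_i' = x_i * ((A x)_i - x.A.x)
# for a symmetric game with payoff matrix A (illustrative numbers only).
A = np.array([[3.0, 0.0],
              [2.0, 2.0]])        # a 2x2 coordination game

x = np.array([0.4, 0.6])          # initial population shares
dt = 0.01                         # step size
for _ in range(10_000):
    payoffs = A @ x                        # expected payoff of each pure strategy
    average = x @ payoffs                  # population-average payoff
    x = x + dt * x * (payoffs - average)   # replicator update
    x = x / x.sum()                        # guard against rounding drift

print(x)  # from this initial state, the population converges to the second strategy
```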

Much of the literature on deterministic evolutionary dynamics focuses on connections with traditional game theoretic solution concepts. For instance, under a wide range of deterministic dynamics, all Nash equilibria of the underlying game are rest points. While some dynamics (including the replicator dynamic) have additional non-Nash rest points, there are others under which rest points and Nash equilibria are identical (Brown and von Neumann 1950; Smith 1984; Sandholm 2006).

A more important question, though, is whether Nash equilibrium will be approached from arbitrary disequilibrium states. For certain specific classes of games, general convergence results can be established (Hofbauer 2000; Sandholm 2007). But beyond these classes, convergence cannot be guaranteed. One can construct games for which no reasonable deterministic evolutionary dynamic converges to equilibrium – instead, the population cycles through a range of disequilibrium states forever (Hofbauer and Swinkels 1996; Hart and Mas-Colell 2003). More surprisingly, one can construct games in which nearly all deterministic evolutionary dynamics not only cycle forever, but also fail to eliminate strictly dominated strategies (Hofbauer and Sandholm 2006). If we truly are interested in modelling the dynamics of behaviour, these results reveal that our predictions cannot always be confined to equilibria; rather, more complicated limit phenomena such as cycles and chaotic attractors must also be permitted as predictions of play.

Stochastic Evolutionary Dynamics

If we are interested in behaviour over very long time horizons, deterministic approximations are no longer valid, and we must study our original Markov process directly. Under certain non-degeneracy assumptions, the long-run behaviour of this process is captured by its unique stationary distribution, which describes the proportion of time the process spends in each population state.
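
For a finite, irreducible chain the stationary distribution can be computed directly. The following minimal sketch solves $\pi P = \pi$ together with $\sum_i \pi_i = 1$; the three-state transition matrix is a toy example, not a model drawn from the literature above.

```python
import numpy as np

# Stationary distribution of a finite Markov chain: solve pi P = pi with
# the normalization sum(pi) = 1. P below is an arbitrary illustrative matrix
# whose rows sum to one.
P = np.array([[0.90, 0.08, 0.02],
              [0.05, 0.90, 0.05],
              [0.02, 0.08, 0.90]])

n = P.shape[0]
A = np.vstack([P.T - np.eye(n), np.ones(n)])   # stack the balance and normalization equations
b = np.append(np.zeros(n), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(pi)   # long-run proportion of time the process spends in each state
```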

While stochastic evolutionary processes can be more difficult to analyse than their deterministic counterparts, they also permit us to make surprisingly tight predictions. By making the amount of noise in agents’ choice rules vanishingly small, one can often ensure that all mass in the limiting stationary distribution is placed on a single population state. This stochastically stable state provides a unique prediction of play even in games with multiple strict equilibria (Foster and Young 1990; Kandori, Mailath and Rob 1993).

The most thoroughly studied model of stochastic evolution considers agents who usually play a best response to the current population state, but who occasionally choose a strategy at random. Kandori, Mailath and Rob (1993) show that if the agents are randomly matched to play a symmetric 2 × 2 coordination game, then taking the probability of ‘mutations’ to zero generates a unique stochastically stable state. In this state, called the risk dominant equilibrium, all agents play the action that is optimal against an opponent who is equally likely to choose each action.
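
A simple illustration, with payoffs invented for this example: in the symmetric coordination game

\[
\begin{array}{c|cc}
 & A & B \\ \hline
A & 3,\,3 & 0,\,2 \\
B & 2,\,0 & 2,\,2
\end{array}
\]

action B earns $\tfrac{1}{2}(2) + \tfrac{1}{2}(2) = 2$ against an opponent who is equally likely to play either action, while A earns only $\tfrac{1}{2}(3) + \tfrac{1}{2}(0) = 1.5$. The all-B state is therefore risk dominant and stochastically stable in the Kandori, Mailath and Rob (1993) model, even though the all-A equilibrium is payoff dominant.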

Selection results of this sort have since been extended to cases in which the underlying game has an arbitrary number of strategies, as well as to settings in which agents are positioned on a fixed network, interacting only with neighbours (see Kandori and Rob 1995; Blume 2003; Ellison 1993; 2000). Stochastic stability has also been employed in contexts where the underlying game has a nontrivial extensive form; these analyses have provided support for notions of backward induction (for example, subgame perfection) and forward induction (for example, signalling game equilibrium refinements): see Nöldeke and Samuelson (1993) and Hart (2002).

Still, these selection results must be interpreted with care. When the number of agents is large or the rate of ‘mutation’ is small, states that fail to be stochastically stable can be coordinated upon for great lengths of time (Binmore, Samuelson and Vaughan 1995). Consequently, if the relevant time span for the application at hand is not long enough, the stochastically stable state may not be the only reasonable prediction of behaviour.

Learning in Games

Heuristic Learning

Learning models study disequilibrium adjustment processes in repeated games. Like evolutionary models, heuristic learning models assume that players employ simple myopic rules in deciding how to act. In the simplest of these models, each player decides how to act by considering the payoffs he has earned in the past. For instance, under reinforcement learning (Börgers and Sarin 1997; Erev and Roth 1998), agents choose each strategy with probability proportional to the total payoff that the strategy has earned in past periods.
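
The rule is simple enough to state in a few lines of code. The sketch below is illustrative only: the number of actions, the payoff process and the initial propensities are assumptions, and payoffs are kept nonnegative so that choice probabilities proportional to cumulative payoffs are well defined.

```python
import numpy as np

# Cumulative-payoff reinforcement learning for a single player: each action
# is chosen with probability proportional to the total payoff it has earned.
rng = np.random.default_rng(0)

n_actions = 3
propensity = np.ones(n_actions)       # small positive initial propensities

def stage_payoff(action):
    # placeholder environment: noisy payoffs with action-dependent means (assumption)
    means = np.array([1.0, 1.5, 0.5])
    return max(rng.normal(means[action], 0.5), 0.0)

for t in range(5_000):
    probs = propensity / propensity.sum()
    a = rng.choice(n_actions, p=probs)
    propensity[a] += stage_payoff(a)  # reinforce the chosen action by its realized payoff

print(propensity / propensity.sum())  # choice probabilities tilt toward better-performing actions
```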

By considering rules that look not only at payoffs earned, but also at payoffs foregone, one can obtain surprisingly strong convergence results. Define a player’s regret for (not having played) action a to be the difference between the average payoff he would have earned had he always played a in the past, and the average payoff he actually received. Under regret matching, each action whose regret is positive is chosen with probability proportional to its regret. Hart and Mas-Colell (2000) show that regret matching is a consistent repeated game strategy: it forces a player’s regret for each action to become nonpositive. If used by all players, regret matching ensures that their time-averaged behaviour converges to the set of coarse correlated equilibria of the underlying game. (Coarse correlated equilibrium is a generalization of correlated equilibrium under which players’ incentive constraints must be satisfied at the ex ante stage rather than at the interim stage: see Young 2004.)
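
The sketch below illustrates the rule for a single player; the payoff matrix and the opponent's behaviour are placeholder assumptions, whereas in the result just described every player would use such a strategy.

```python
import numpy as np

# Unconditional regret matching for the row player of a two-player game.
# U[a, b] is the row player's payoff when he plays a and the opponent plays b.
rng = np.random.default_rng(1)

U = np.array([[1.0, 0.0],
              [0.0, 1.0]])            # illustrative payoffs
n = U.shape[0]

cum_payoff = 0.0                      # total payoff actually received
cum_if_always = np.zeros(n)           # total payoff had each action been played every period

T = 10_000
for t in range(1, T + 1):
    regret = (cum_if_always - cum_payoff) / t      # average regret for each action
    positive = np.maximum(regret, 0.0)
    if positive.sum() > 0:
        probs = positive / positive.sum()          # proportional to positive regret
    else:
        probs = np.full(n, 1.0 / n)                # arbitrary choice when no regret is positive
    a = rng.choice(n, p=probs)
    b = rng.integers(n)                            # opponent's action (placeholder process)
    cum_payoff += U[a, b]
    cum_if_always += U[:, b]                       # what each action would have earned this period

print((cum_if_always - cum_payoff) / T)  # the rule drives these regrets towards nonpositive values
```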

Some of the most striking convergence results in the evolution and learning literature establish a stronger conclusion: namely, convergence of time-averaged behaviour to the set of correlated equilibria, regardless of the game at hand. The original result of this sort is due to Foster and Vohra (1997; 1998), who prove the result by constructing a calibrated procedure for forecasting opponents’ play. A forecasting procedure produces probabilistic forecasts of how opponents will act. The procedure is calibrated if in those periods in which the forecast is given by the probability vector p, the empirical distribution of opponents’ play is approximately p. It is not difficult to show that if players always choose myopic best responses to calibrated forecasts, then their time-averaged behaviour converges to the set of correlated equilibria.

Hart and Mas-Colell (2000) construct simpler procedures – in particular, procedures that define conditionally consistent repeated game strategies – that also ensure convergence to correlated equilibrium. A repeated game strategy is conditionally consistent if, for each frequently played action a, the agent would not have been better off had he always played an alternative action a′ in place of a. It follows directly from this definition that the use of conditionally consistent strategies by all players leads time-averaged behaviour to converge to the set of correlated equilibria.
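
A minimal sketch in the spirit of Hart and Mas-Colell's (2000) conditional procedure appears below; the payoff matrix, the normalizing constant and the opponent's behaviour are illustrative assumptions.

```python
import numpy as np

# Conditional regret matching: after playing j, each alternative k is chosen
# with probability proportional to the conditional regret from not having
# switched from j to k; with the remaining probability the player repeats j.
rng = np.random.default_rng(2)

U = np.array([[1.0, 0.0],
              [0.0, 1.0]])          # row player's payoffs (illustrative)
n = U.shape[0]
mu = 2 * n * U.max()                # a constant large enough to keep the probabilities valid

D = np.zeros((n, n))                # D[j, k]: cumulative gain from having switched j -> k
a = 0                               # arbitrary initial action

T = 10_000
for t in range(1, T + 1):
    b = rng.integers(n)             # opponent's action (placeholder process)
    D[a, :] += U[:, b] - U[a, b]    # update differences for the action just played

    regret = np.maximum(D[a, :] / t, 0.0)   # conditional regrets given the current action
    probs = regret / mu
    probs[a] = 0.0
    probs[a] = 1.0 - probs.sum()    # stay with the current action with the leftover probability
    a = rng.choice(n, p=probs)

print(np.maximum(D / T, 0.0))       # conditional regrets, driven towards zero
```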

Another variety of heuristic learning models, based on random search and independent verification, ensures a stochastic form of convergence to Nash equilibrium regardless of the game being played (Foster and Young 2003). However, in these models the time required before equilibrium is first reached is quite long, making them most relevant to applications with especially long time horizons.

In some heuristic learning models, players use simple rules to predict how opponents will behave, and then respond optimally to those predictions. The leading examples of such models are fictitious play and its stochastic variants (Brown 1951; Fudenberg and Kreps 1993): in these models, the prediction about an opponent’s next-period play is given by the empirical frequencies of his past plays. Beginning with Robinson (1951), many authors have proved convergence results for standard and stochastic fictitious play in specific classes of games (see Hofbauer and Sandholm (2002) for an overview). But as Shapley (1964) and others have shown, these models do not lead to equilibrium play in all games.
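
A minimal sketch of two-player fictitious play is given below; the symmetric payoff matrix is an invented coordination game, used purely for illustration.

```python
import numpy as np

# Fictitious play: each period, each player best responds to the empirical
# frequencies of the opponent's past actions. A[i, j] is the payoff to
# playing i against an opponent playing j (illustrative numbers).
A = np.array([[3.0, 0.0],
              [2.0, 2.0]])

counts_row = np.ones(2)   # counts of the row player's past actions (with prior weight 1)
counts_col = np.ones(2)   # counts of the column player's past actions

for t in range(1_000):
    a = np.argmax(A @ (counts_col / counts_col.sum()))  # row best responds to column's empirics
    b = np.argmax(A @ (counts_row / counts_row.sum()))  # column best responds to row's empirics
    counts_row[a] += 1
    counts_col[b] += 1

print(counts_row / counts_row.sum(), counts_col / counts_col.sum())
# here the empirical frequencies converge; Shapley (1964) gives games where they do not
```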

Coordinated Bayesian Learning

The prediction rule underlying two-player fictitious play can be described by a belief about the opponent’s repeated game strategy that is updated using Bayes’s rule in the face of observed play. This belief specifies that the opponent chooses his stage game actions in an i.i.d. fashion, conditional on the value of an unknown parameter. (In fact, the player’s beliefs about this parameter must come from the family of Dirichlet distributions, the conjugate family of distributions for multinomial trials.) Evidently, each player’s beliefs about his opponent are wrong: player 1 believes that player 2 chooses actions in an i.i.d. fashion, whereas player 2 actually plays optimally in response to his own (i.i.d.) predictions about player 1’s behaviour. It is therefore not surprising that fictitious play processes do not converge in all games.
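
To see the connection concretely (the notation here is introduced for illustration), suppose the player's belief about the opponent's unknown i.i.d. mixing probabilities over m actions is a Dirichlet distribution with parameters $\alpha_1, \dots, \alpha_m$. After the opponent's past actions have been observed with counts $n_1, \dots, n_m$, Bayesian updating implies that the predicted probability that action i is played next period is

\[
\Pr(a_{t+1} = i \mid n_1, \dots, n_m) = \frac{\alpha_i + n_i}{\sum_{j=1}^{m} (\alpha_j + n_j)},
\]

which, up to the prior weights, is just the empirical frequency of the opponent's past play: exactly the fictitious play prediction.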

In models of coordinated Bayesian learning (or rational learning), players are not only supposed to form beliefs about the opponent’s repeated game strategy and to respond optimally to those beliefs; it is also assumed that the players’ initial beliefs are coordinated in some way. The most studied case is one in which prior beliefs satisfy an absolute continuity condition: if the distribution over play paths generated by the players’ actual strategies assigns positive probability to some set of play paths, then so must the distribution generated by each player’s prior. A strong sufficient condition for absolute continuity is that each player’s prior assigns a positive probability to his opponent’s actual strategy.

The fundamental result in this literature, due to Kalai and Lehrer (1993), shows that under absolute continuity, each player’s forecast along the path of play is asymptotically correct, and the path of play is asymptotically consistent with Nash equilibrium play in the repeated game. Related convergence results have been proved for more complicated environments in which each player’s stage game payoffs are private information (Jordan 1995; Nyarko 1998). If the distributions of players’ types are continuous, then the sense in which play converges to equilibrium can involve a form of purification: while actual play is pure, it appears random to an outside observer.

How much coordination of prior beliefs is needed to prove convergence to equilibrium play? Nachbar (2005) proves that for a large class of repeated games, for any belief learning model, there are no prior beliefs that satisfy three criteria: learnability, consistency with optimal play, and diversity. Thus, if players can learn to predict one another’s behaviour, and are capable of responding optimally to their updated beliefs, then each player’s beliefs about his opponents must rule out some seemingly natural strategies a priori. In this sense, the assumption of coordinated prior beliefs that ensures convergence to equilibrium in rational learning models does not seem dramatically weaker than a direct assumption of equilibrium play.

For additional details about the theory of learning and evolution in games, we refer the reader to the entries on specific topics listed in the cross-references below.

See Also