
1 Introduction and Motivation

Strategies are the basic objects of study in a game-theoretic model. The standard interpretation is that a strategy represents a player’s general plan of action. That is, player i’s strategy describes the action that player i will choose whenever she is required to make a decision according to the rules of the game.

Traditionally, game theorists have focused on identifying profiles of strategies that constitute an “equilibrium” (e.g., the Nash equilibrium and its refinements). A typical game-theoretic analysis runs as follows: Given a game G, there is an associated solution space \(S_G\) describing all the possible outcomes of G. In a one-shot game (called a strategic game; see Sect. 2.1 for details), this is the set of all tuples of strategies (a tuple of strategies, one for each player, is called a strategy profile)Footnote 1. Abstractly, a solution for a game G is a subset of the solution space \(S_G\). The subset of \(S_G\) identified by a solution concept is intended to represent the “rational outcomes” of the game G.

Suppose that \(S\subseteq S_G\) is a solution for the game G. The elements of S are privileged outcomes of G, but what, exactly, distinguishes them from the other outcomes in \(S_G\)? The standard approach is to require that for each profile in S, players should not have an incentive to deviate from their prescribed strategy, given that the other players follow their own prescribed strategies. This is an internal constraint on the elements of a solution set since it requires that the strategies in a profile are related to each other in a particular way. This chapter takes a different perspective on the above question by imposing a different constraint on the profiles in S: Each player’s prescribed strategy should be “optimal” given her beliefs about what the other players are going to do. This constraint is external since it refers to the players’ “beliefs”, which are typically not part of the mathematical representation of the game.

It is not hard to think of situations in which the internal and external constraints on solution concepts discussed above are not jointly satisfied. The point is that players may have very good reasons to believe that the other players are choosing certain strategies, and so, they choose an optimal strategy based on these beliefs. There is no reason to expect that the resulting choices will satisfy the above internal constraint unless one makes strong assumptions about how the players’ beliefs are relatedFootnote 2. The external constraint on solution concepts can be made more precise by taking a “Bayesian” perspective on game theory [46]: In a game-theoretic situation, as in any situation of choice, the rational choice for a player is the one that maximizes expected utility with respect to a (subjective) probability measure over the other players’ strategy choices. A sophisticated literature has developed around this simple idea: it focuses on characterizing solution concepts in terms of what the players know and believe about the other players’ strategy choices and beliefs (see, for example, [7, 22, 26, 60] and [63] for a textbook presentation).

In this chapter, I shift the focus from beliefs about the other players’ choices to the underlying processes that lead (rational) players to adopt certain strategies. An early formulation of this idea can be found in John C. Harsanyi’s seminal paper [42], in which he introduced the tracing procedure to select an equilibrium in any finite game:

The n players will find the solution s of a given game G through an intellectual process of convergent expectations, to be called the solution process.... During this process, they will continually and systematically modify [their] expectations—until, at the end of this process, their expectations will come to converge on one particular equilibrium point s in the game G. (original italics) [42, pg. 71]

The goal of the tracing procedure is to identify a unique Nash equilibrium in any finite strategic game. The idea is to define a continuum of games in such a way that each of the games has a unique Nash equilibrium. The tracing procedure identifies a path through this space of games ending at a unique Nash equilibrium in the original game. Harsanyi thought of this procedure as “being a mathematical formalization of the process by which rational players coordinate their choices of strategies.”

Harsanyi, in collaboration with Reinhard Selten [43], turned these basic ideas into a beautiful theory of equilibrium selection. This theory is now part of the standard education for any game theorist. Nonetheless, it is not at all clear that this theory of equilibrium selection is best interpreted as a formalization of the players’ processes of “rational deliberation” in game situations (see [72, pgs. 154–158] for a discussion of this point). In this chapter, I will critically discuss three recent frameworks in which the players’ process of “rational deliberation” takes center stage:

  1. Brian Skyrms’ model of “dynamic deliberation,” in which players deliberate by calculating their expected utilities and then using this new information to recalculate their probabilities about the states of the world and their expected utilities [72].

  2. Robin Cubitt and Robert Sugden’s recent contribution, which develops a “reasoning-based expected utility” procedure for solving games (building on David Lewis’ “common modes of reasoning”) [31, 33].

  3. Johan van Benthem and colleagues’ analysis of solution concepts as fixed-points of iterated “(virtual) rationality announcements” [3, 15, 18, 20, 21].

Although the details of these frameworks are quite different, they share a common line of thought: In contrast to classical game theory, solution concepts are no longer the basic object of study. Instead, the “rational solutions” of a game are arrived at through a process of “rational deliberation”. My goal in this chapter is to provide a (biased) overview of some key technical and conceptual issues that arise when developing mathematical models of players deliberating about what to do in a game situation.

2 Background

I assume that the reader is familiar with the basics of game theory (see [52] and [2] for concise discussions of the key concepts, definitions and theorems) and formal models of knowledge and belief (see [19, 57] for details). In this section, I introduce some key definitions in order to fix notation.

2.1 Strategic Games

A strategic game is a tuple \(\langle N, \{S_i\}_{i\in N}, \{u_i\}_{i\in N}\rangle \) where N is a (finite) set of players; for each \(i\in N\), \(S_i\) is a finite set (elements of which are called actions or strategies); and for each \(i\in N\), \(u_i:\varPi _{i\in N} S_i\rightarrow \mathbb {R}\) is a utility function assigning real numbers to each outcome of the game (i.e., tuples consisting of the choices for each player). Strategic games represent situations in which each player makes a single decision, and all the players make their decisions simultaneously. If \(s\in \varPi _{i\in N} S_i\) is a strategy profile, then write \(s_i\) for the ith component of s and \(s_{-i}\) for the sequence consisting of all components of s except for \(s_i\) (let \(S_{-i}\) denote all such sequences of strategies).

Recall from the introduction that the solution space \(S_G\) for a game G is the set of all outcomes of G. Since we identify the outcomes of a game with the set of strategy profiles, we have \(S_G=\varPi _{i\in N} S_i\). This means that a “solution” to a strategic game is a distinguished set of strategy profiles. In the remainder of this section, I will define some standard game- and decision-theoretic notions that will be used throughout this chapter.

Mixed Strategies. Let \(\varDelta (X)\) denote the set of probability measures over the finiteFootnote 3 set X. A mixed strategy for player i is an element \(m_i\in \varDelta (S_i)\). If \(m_i\in \varDelta (S_i)\) assigns probability 1 to an element \(s_i\in S_i\), then \(m_i\) is called a pure strategy (in such a case, I write \(s_i\) for \(m_i\)). Mixed strategies are incorporated into a game-theoretic analysis as follows. Suppose that \(G=\langle N, \{S_i\}_{i\in N}, \{u_i\}_{i\in N}\rangle \) is a finite strategic game. The mixed extension of G is the strategic game in which the strategies for player i are the mixed strategies in G (i.e., \(\varDelta (S_i)\)), and the utility for player i (denoted \(U_i\)) of the joint mixed strategy \(m\in \varPi _{i\in N} \varDelta (S_i)\) is calculated in the obvious way (let \(m(s)=m_1(s_1)\cdot m_2(s_2)\cdots m_n(s_n)\) for \(s\in \varPi _{i\in N} S_i\)):

$$U_i(m)=\sum _{s\in \varPi _{i\in N}S_i} m(s)\cdot u_i(s).$$

Thus, the solution space of a mixed extension of the game G is the set \(\varPi _{i\in N} \varDelta (S_i)\).
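For readers who prefer code to notation, the following Python sketch computes \(U_i(m)\) for a small two-player game. The game, its payoffs, and the dictionary-based representation are illustrative assumptions of mine, not anything taken from the text.

```python
from itertools import product

# A hypothetical 2x2 game: the strategies and payoffs are illustrative only.
S = {"a": ["u", "d"], "b": ["l", "r"]}
u = {"a": {("u", "l"): 2, ("u", "r"): 0, ("d", "l"): 0, ("d", "r"): 1},
     "b": {("u", "l"): 1, ("u", "r"): 0, ("d", "l"): 0, ("d", "r"): 2}}

def mixed_utility(i, m):
    """U_i(m): player i's expected utility of the joint mixed strategy m,
    where m maps each player to a probability dictionary over strategies."""
    total = 0.0
    for s_a, s_b in product(S["a"], S["b"]):
        prob = m["a"][s_a] * m["b"][s_b]          # the product measure m(s)
        total += prob * u[i][(s_a, s_b)]
    return total

m = {"a": {"u": 0.5, "d": 0.5}, "b": {"l": 0.25, "r": 0.75}}
print(mixed_utility("a", m))                      # 0.625
```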

Mixed strategies play an important role in many game-theoretic analyses. However, the interpretation of mixed strategies is controversial, as Ariel Rubinstein notes: “We are reluctant to believe that our decisions are made at random. We prefer to be able to point to a reason for each action we take. Outside of Las Vegas we do not spin roulettes” [69, pg. 913]. For the purposes of this chapter, I will assume that players choose only pure strategies. Mixed strategies do play a role in Sect. 3, where they describe each player’s beliefs about what she will do (at the end of deliberation).

Nash Equilibrium. The most well-known and extensively studied solution concept is the Nash equilibrium. Let \(G=\langle N, \{S_i\}_{i\in N}, \{u_i\}_{i\in N}\rangle \) be a finite strategic game. A mixed strategy profile \(m=(m_1,\ldots ,m_n)\in \varPi _{i\in N} \varDelta (S_i)\) is a Nash equilibrium provided for all \(i\in N\),

$$U_i(m_1,\ldots ,m_i,\ldots ,m_n)\ge U_i(m_1,\ldots ,m_i',\ldots ,m_n),\qquad \text {for all } m_i'\in \varDelta (S_i).$$

This definition is an example of the internal constraint on solutions discussed in the introduction. Despite its prominence in the game theory literature, the Nash equilibrium faces many foundational problems [68]. For example, there are theoretical concerns about what the players need to know in order to play their component of a Nash equilibrium [10, 62]; questions about how players choose among multiple Nash equilibria; and many experiments purporting to demonstrate game-theoretic situations in which the players’ choices do not form a Nash equilibrium. Nash equilibrium does not play an important role in this chapter. I focus, instead, on the outcomes of a game that can be reached through a process of “rational deliberation”.

Iteratively Removing Strategies. A strategy \(s\in S_i\) strictly dominates strategy \(s'\in S_i\) provided that

$$\forall s_{-i}\in S_{-i}\ \ u_i(s,s_{-i}) > u_i(s',s_{-i}).$$

A strategy \(s\in S_i\) weakly dominates strategy \(s'\in S_i\) provided that

$$\forall s_{-i}\in S_{-i}\ \ u_i(s,s_{-i}) \ge u_i(s',s_{-i})\ \ \text { and } \ \ \exists s_{-i}\in S_{-i}\ \ u_i(s,s_{-i}) > u_i(s', s_{-i}).$$

More generally, the strategy s strictly/weakly dominates \(s'\) with respect to a set \(X\subseteq S_{-i}\) if \(S_{-i}\) is replaced with X in the above definitionsFootnote 4. Suppose that \(G=\langle N, \{S_i\}_{i\in N}, \{u_i\}_{i\in N}\rangle \) and \(G'=\langle N, \{S_i'\}_{i\in N}, \{u'_i\}_{i\in N}\rangle \) are strategic games. The game \(G'\) is a restriction of G provided that for each \(i\in N\), \(S_i'\subseteq S_i\) and \(u_i'\) is the restriction of \(u_i\) to \(\varPi _{i\in N}S_i'\). To simplify notation, write \(G_i\) for the set of strategies for player i in game G. Strict and weak dominance can be used to reduce a strategic game. Write \(H\longrightarrow _{SD} H'\) whenever \(H\ne H'\), \(H'\) is a restriction of H and

$$\forall i\in N,\ \forall s_i\in H_i \setminus H_i'\ \ \exists s'_i\in H_i \text { such that } s_i \text { is strictly dominated in } H \text { by }s'_i.$$

So, if \(H\longrightarrow _{SD} H'\), then \(H'\) is the result of removing some of the strictly dominated strategies from H. We can iterate this process of removing strictly dominated strategies. Formally, H is the result of iteratively removing strictly dominated strategies (IESDS) provided that \(G\longrightarrow ^*_{SD} H\), where \(\longrightarrow ^*\) is the reflexive transitive closureFootnote 5 of a relation \(\longrightarrow \).

The above definition can be easily adapted to other choice rules, such as weak dominance. Let \(\longrightarrow _{WD}\) denote the relation between games defined as above using weak dominance instead of strict dominanceFootnote 6. Furthermore, the above definition of iterated removal of strictly/weakly dominated strategies can be readily adapted to the mixed extensions of a strategic game.
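To make the reduction \(\longrightarrow _{SD}\) and its iteration concrete, here is a minimal Python sketch of IESDS for two-player games using pure-strategy strict dominance. The representation of games (dictionaries of strategy lists and profile utilities) is my own choice, and the code illustrates the definition above rather than any prescribed implementation.

```python
def util(u, i, s_i, s_j, players):
    """Look up u_i at a profile, respecting the (player-1, player-2) order."""
    profile = (s_i, s_j) if i == players[0] else (s_j, s_i)
    return u[i][profile]

def iesds(S, u, players=("a", "b")):
    """Iterated elimination of strictly dominated strategies (pure-strategy
    dominance) in a two-player game. S maps players to strategy lists and
    u[i][(s_1, s_2)] is player i's utility. Returns the surviving strategies."""
    S = {i: list(strats) for i, strats in S.items()}
    changed = True
    while changed:
        changed = False
        for i, j in [(players[0], players[1]), (players[1], players[0])]:
            for s in list(S[i]):
                # remove s if some remaining s2 does strictly better than s
                # against every remaining strategy of the opponent
                if any(all(util(u, i, s2, t, players) > util(u, i, s, t, players)
                           for t in S[j])
                       for s2 in S[i] if s2 != s):
                    S[i].remove(s)
                    changed = True
    return S

# A prisoner's dilemma: d strictly dominates c for both players.
S = {"a": ["c", "d"], "b": ["c", "d"]}
u = {"a": {("c", "c"): 3, ("c", "d"): 0, ("d", "c"): 4, ("d", "d"): 1},
     "b": {("c", "c"): 3, ("c", "d"): 4, ("d", "c"): 0, ("d", "d"): 1}}
print(iesds(S, u))                                # {'a': ['d'], 'b': ['d']}
```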

There are a number of ways to interpret the iterative process of removing strategies, defined above. The first is that it is an algorithm that a game theorist can use to find an equilibrium in a game. The second interpretation views the successive steps of the removal process as corresponding to the players’ higher-order beliefs (i.e., player i believes that player j believes that player i believes that...that player i will not play such-and-such strategy). Finally, the third interpretation is that the iterative process of removing strategies tracks the “back-and-forth reasoning” players engage in as they decide what to do in a game situation (i.e., if player i does not play such-and-such a strategy, then player j will not play such-and-such a strategy, and so on).

Bayesian Rationality. In this chapter, I am interested not only in solutions to a game, but also in what the players believe about the outcomes of a game. Let \(G=\langle N, \{S_i\}_{i\in N}, \{u_i\}_{i\in N}\rangle \) be a strategic game. A probability measure \(\pi \in \varDelta (S_{-i})\) is called a conjecture for player i. The expected utility of \(s\in S_i\) for player i with respect to \(\pi \in \varDelta (S_{-i})\) is:

$$EU_\pi (s)=\sum _{\sigma _{-i}\in S_{-i}}\pi (\sigma _{-i})\cdot u_i(s,\sigma _{-i}).$$

We say that \(s\in S_i\) maximizes expected utility with respect to \(\pi \in \varDelta (S_{-i})\), denoted \(MEU(s,\pi )\), if for all \(s'\in S_i\), \(EU_\pi (s)\ge EU_\pi (s')\).
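These two notions translate directly into code. The sketch below, reusing the dictionary representation from the earlier snippets, computes \(EU_\pi (s)\) and checks \(MEU(s,\pi )\); it is an illustration under those representational assumptions, not a prescribed implementation.

```python
def expected_utility(u_i, s, conjecture):
    """EU_pi(s): the expected utility of s against a conjecture, a dictionary
    mapping each opponent strategy to its probability."""
    return sum(p * u_i[(s, t)] for t, p in conjecture.items())

def meu(u_i, S_i, s, conjecture):
    """MEU(s, pi): s maximizes expected utility with respect to the conjecture."""
    eu_s = expected_utility(u_i, s, conjecture)
    return all(eu_s >= expected_utility(u_i, s2, conjecture) for s2 in S_i)

# With the prisoner's dilemma payoffs above, d is optimal for any conjecture.
u_a = {("c", "c"): 3, ("c", "d"): 0, ("d", "c"): 4, ("d", "d"): 1}
print(meu(u_a, ["c", "d"], "d", {"c": 0.7, "d": 0.3}))   # True
```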

* * *

One conclusion to draw from the discussion in this section is that much can be said about the issues raised in this chapter using standard game-theoretic notions. Indeed, it is standard for a game theorist to distinguish between the ex ante and ex interim stages of decision makingFootnote 7. In the former, the players have not yet decided what strategy they will choose, while, in the latter, the players know their own choices but not their opponents’. However, the process by which the players form their beliefs in the ex interim stage is typically not discussed. The frameworks discussed in the remainder of this chapter are focused on making this process explicit.

2.2 Game Models

A game model describes a particular play of the game and what the players think about the other players. That is, a game model represents an “informational context” of a given play of the game. This includes the “knowledge” the players have about the game situation and what they think about the other players’ choices and beliefs. Researchers interested in the foundations of decision theory, epistemic and doxastic logic and formal epistemology have developed many different formal models to describe the variety of informational attitudes important for assessing a decision maker’s choices in a decision- or game-theoretic situation. See [19] for an overview and pointers to the relevant literature. In this section, I present the details of a logical framework that can be used to reason about the informational context of a game.

Syntactic issues do not play an important role in this chapter. Nonetheless, I will give the definition of truth for a relevant formal language, as it makes for a smoother transition from the game theory literature to the literature on dynamic epistemic logic and iterated belief change discussed in Sect. 5.1. Consult [19, 57, 58] for a discussion of the standard logical questions about axiomatics, definability, decidability of the satisfiability problem, and so on.

Epistemic-Plausibility Models. Variants of the models presented in this section have been studied extensively by logicians [13, 17, 19], game theorists [23], philosophers [51, 74] and computer scientists [25, 48]. The models are intended to describe what the players know and believe about an outcome of the game.

The first component of an epistemic-plausibility model is a nonempty set W of states (also called worlds). Each state in a game model will be associated with an outcome of a game G via a function \(\sigma \), called the outcome map. So, for a state w, \(\sigma (w)\) is the element of \(S_G\) realized at state w. Let \(\sigma _i(w)\) denote the ith component of \(\sigma (w)\) (so, \(\sigma _i(w)\) is the strategy played by i at state w). The atomic propositions are intended to describe different aspects of the outcomes of a game. For example, they could describe the specific action chosen by a player or the utility assigned to the outcome by a given player. There are a number of ways to make this precise. Perhaps the simplest is to introduce, for each player i and strategy \(a\in S_i\), an atomic proposition \(play_i(a)\) intended to mean “player i is playing strategy a.” For a game \(G=\langle N, \{S_i\}_{i\in N}, \{u_i\}_{i\in N}\rangle \), let \(\mathsf {At}(G)=\{play_i(a) \ |\ i\in N\text { and } a\in S_i\}\) be the set of atomic propositions for the game G.

There are two additional components to an epistemic-plausibility model. The first is a set of equivalence relations \(\sim _i\), one for each player. The intended reading of \(w\sim _i v\) is that “everything that i knows at w is true at v”. Alternatively, I will say that “player i does not have enough information to distinguish state w from state v.”

The second component is a plausibility ordering for each player: a pre-order (reflexive and transitive) \(w\preceq _i v\) that says “agent i considers world w at least as plausible as v.” As a convenient notation, for \(X\subseteq W\), set \(Min_{\preceq _i}(X)=\{v\in X\ |\ v\preceq _i w \text { for all }w\in X\}\), the set of minimal elements of X according to \(\preceq _i\). This is the subset of X that agent i considers the “most plausible”. Thus, while \(\sim _i\) partitions the set of possible worlds according to i’s “hard information”, the plausibility ordering \(\preceq _i\) represents which of the possible worlds agent i considers more likely (i.e., it represents i’s “soft information”).

Putting everything together, the definition of an epistemic-plausibility model is as follows:

Definition 1

Suppose that \(G=\langle N, \{S_i\}_{i\in N}, \{u_i\}_{i\in N}\rangle \) is a strategic game. An epistemic-plausibility model for G is a tuple \(\mathcal {M}= \langle W,\{\sim _i\}_{i\in \mathcal {A}},\{\preceq _i\}_{i\in \mathcal {A}},\sigma \rangle \) where \(W\ne \emptyset \); for each \(i\in \mathcal {A}\), \(\sim _i\subseteq W\times W\) is an equivalence relation (each \(\sim _i\) is reflexive: for each \(w\in W\), \(w\sim _i w\); transitive: for each \(w,v,u\in W\), if \(w\sim _i v\) and \(v\sim _i u\) then \(w\sim _i u\); and Euclidean: for each \(w,v,u\in W\), if \(w\sim _i v\) and \(w\sim _i u\), then \(v\sim _i u\)); for each \(i\in \mathcal {A}\), \(\preceq _i\) is a well-founded (every non-empty set of states has a minimal element)Footnote 8 reflexive and transitive relation on W; and \(\sigma \) is an outcome map. In addition, the following two conditions are imposed for all \(w,v\in W\):

  1. if \(w\preceq _i v\) then \(w\sim _i v\) (plausibility implies possibility), and

  2. if \(w\sim _i v\) then either \(w\preceq _i v\) or \(v\preceq _i w\) (locally-connected). \(\triangleleft \)

Models without plausibility relations are called epistemic models.

Remark 1

Note that if \(w\not \sim _i v\), then, since \(\sim _i\) is symmetric, I also have \(v\not \sim _i w\), and so by property 1, \(w\not \preceq _i v\) and \(v\not \preceq _i w\). Thus, I have the following equivalence: \(w\sim _i v\) iff \(w\preceq _i v\) or \(v\preceq _i w\). In what follows, unless otherwise stated, I will assume that \(\sim _i\) is defined as follows: \(w\sim _i v\) iff \(w\preceq _i v\) or \(v\preceq _i w\).

For each strategic game G, let \(\mathcal {L}_{KB}(G)\) be the set of sentences generated by the following grammarFootnote 9:

$$\varphi \ :=\ play_i(a)\ |\ \lnot \varphi \ |\ \varphi \wedge \psi \ |\ B_i^\varphi \psi \ |\ K_i\varphi $$

where \(i\in N\) and \(play_i(a)\in \mathsf {At}(G)\). The additional propositional connectives (\(\rightarrow ,\leftrightarrow ,\vee \)) are defined as usual and the dual of \(K_i\), denoted \(L_i\), is defined as follows: \(L_i\varphi \ := \lnot K_i\lnot \varphi \). The intended interpretation of \(K_i\varphi \) is “agent i knows that \(\varphi \)”Footnote 10. The intended interpretation of \(B_i^\varphi \psi \) is “agent i believes \(\psi \) under the supposition that \(\varphi \) is true”.

Truth for formulas in \(\mathcal {L}_{KB}(G)\) is defined as usual. Let \([w]_i\) be the equivalence class of w under \(\sim _i\). Then, local connectedness implies that \(\preceq _i\) totally orders \([w]_i\), and well-foundedness implies that \(Min_{\preceq _i}([w]_i\cap X)\) is nonempty if \([w]_i\cap X\ne \emptyset \).

Definition 2

(Truth for \(\mathcal {L}_{KB}(G)\) ). Given an epistemic-plausibility model \(\mathcal {M}=\langle W, \{\sim _i\}_{i\in \mathcal {A}}, \{\preceq _i\}_{i\in \mathcal {A}},\sigma \rangle \), truth for formulas from \(\mathcal {L}_{KB}(G)\) is defined recursively:

  • \(\mathcal {M},w\models play_i(a)\) iff \(\sigma _i(w)=a\)

  • \(\mathcal {M},w\models \lnot \varphi \) iff \(\mathcal {M},w\not \models \varphi \)

  • \(\mathcal {M},w\models \varphi \wedge \psi \) iff \(\mathcal {M},w\models \varphi \) and \(\mathcal {M},w\models \psi \)

  • \(\mathcal {M},w\models K_i\varphi \) iff for all \(v\in W\), if \(w\sim _i v\) then \(\mathcal {M},v\models \varphi \)

  • \(\mathcal {M},w\models B^\varphi _i\psi \) iff for all \(v\in Min_{\preceq _i}([w]_i\cap [\![{\varphi }]\!]_{\mathcal {M}})\), \(\mathcal {M},v\models \psi \)

Thus, i believes \(\psi \) conditional on \(\varphi \), \(B_i^\varphi \psi \), if i’s most plausible \(\varphi \)-worlds (i.e., the states satisfying \(\varphi \) that i has not ruled out and considers most plausible) all satisfy \(\psi \). Full belief is defined as follows: \(B_i\varphi \ :=\ B_i^\top \varphi \). Then, the definition of plain belief is:

$$\mathcal {M},w\models B_i\varphi \text { iff for each }v\in Min_{\preceq _i}([w]_i), \mathcal {M},v\models \varphi .$$
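Before turning to an example, here is a minimal sketch of how these truth clauses can be evaluated on a finite model. The representation (worlds, relations as sets of pairs, formulas as predicates on worlds) is an assumption of mine, and the small model at the end is hypothetical.

```python
def most_plausible(X, leq):
    """Min_{<=_i}(X): the minimal elements of X under the plausibility
    relation leq, given as a set of (w, v) pairs meaning w <= v."""
    return {w for w in X if all((w, v) in leq for v in X)}

def conditional_belief(model, i, w, phi, psi):
    """M, w |= B_i^phi psi: every most plausible phi-world in i's
    information cell [w]_i satisfies psi (phi, psi: predicates on worlds)."""
    cell = {v for v in model["W"] if (w, v) in model["sim"][i]}
    best = most_plausible({v for v in cell if phi(v)}, model["leq"][i])
    return all(psi(v) for v in best)

# A hypothetical one-agent model: worlds 1, 2, 3, one information cell,
# with 1 the most plausible world and 3 the least.
model = {"W": {1, 2, 3},
         "sim": {"a": {(w, v) for w in (1, 2, 3) for v in (1, 2, 3)}},
         "leq": {"a": {(w, v) for w in (1, 2, 3) for v in (1, 2, 3) if w <= v}}}
print(conditional_belief(model, "a", 1, lambda v: v != 1, lambda v: v == 2))  # True
```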

I illustrate the above definition with the following coordination game:

[Figure a: a two-player coordination game between Ann and Bob, with strategies u, d for Ann and l, r for Bob]

The epistemic-plausibility model below describes a possible configuration of ex ante beliefs of the players (i.e., before the players have settled on a strategy): I draw an i-labeled arrow from v to w if \(w\preceq _i v\) (to minimize clutter, I do not include all arrows; the remaining arrows can be inferred by reflexivity and transitivity).

[Figure b: an epistemic-plausibility model with states \(w_1,\ldots ,w_4\) and the players’ plausibility arrows]

Following the convention discussed in Remark 1, we have \([w_1]_a=[w_1]_b=\{w_1,w_2,w_3,w_4\}\), and so, neither Ann nor Bob knows how the game will end. Furthermore, both Ann and Bob believe that they will coordinate with Ann choosing u and Bob choosing l:

$$B_a(play_a(u)\wedge play_b(l))\wedge B_b(play_a(u)\wedge play_b(l))$$

is true at all states. However, Ann and Bob do have different conditional beliefs. On the one hand, Ann believes that their choices are independent; thus, she believes that \(play_b(l)\) is true even under the supposition that \(play_a(d)\) is true (i.e., she continues to believe that Bob will play l even if she decides to play d). On the other hand, Bob believes that their choices are somehow correlated; thus, under the supposition that \(play_b(r)\) is true, Bob believes that Ann will choose d. Conditional beliefs describe an agent’s disposition to change her beliefs in the presence of (perhaps surprising) evidence (cf. [49]).

Common Knowledge and Belief. States in an epistemic-plausibility model not only represent the players’ beliefs about what their opponents will do, but also their higher-order beliefs about what their opponents are thinking. Both game theorists and logicians have extensively discussed different notions of knowledge and belief for a group, such as common knowledge and belief. These notions have played a fundamental role in the analysis of distributed algorithms [40] and social interactions [28]. In this section, I briefly recount the standard definition of common knowledgeFootnote 11.

Consider the statement “Everyone in group X knows that \(\varphi \).” With finitely many agents, this can be easily defined in the epistemic language \(\mathcal {L}_{KB}\):

$$K_X\varphi \ \ :=\ \ \bigwedge _{i\in X}K_i\varphi ,$$

where \(X\subseteq N\) is a finite set. The first nontrivial informational attitude for a group that I study is common knowledge. If \(\varphi \) is common knowledge for the group X, then not only does everyone in the group know that \(\varphi \) is true, but this fact is completely transparent to all members of the group. Following [6], the idea is to define common knowledge of \(\varphi \) as the following iteration of everyone knows operators:

$$\varphi \wedge K_N\varphi \wedge K_NK_N\varphi \wedge K_NK_NK_N\varphi \wedge \cdots $$

The above formula is an infinite conjunction and, so, is not a formula in our epistemic language \(\mathcal {L}_{KB}\) (by definition, there can be, at most, finitely many conjunctions in any formula). In order to express this, a modal operator \(C_X\varphi \) with the intended meaning “\(\varphi \) is common knowledge among the group X” must be added to our modal language. Formally:

Definition 3

(Interpretation of \(C_G\) ). Let \(\mathcal {M}=\langle W,\{\sim _i\}_{i\in \mathcal {A}},V\rangle \) be an epistemic modelFootnote 12 and \(w\in W\). The truth of formulas of the form \(C_X\varphi \) is:

$$\mathcal {M},w\models C_X\varphi \text {~ iff ~ for all } v\in W, \text { if } wR_X^Cv \text { then }\mathcal {M},v\models \varphi $$

where \(R_X^C:=(\bigcup _{i\in X} \sim _i)^*\) is the reflexive transitive closure of \(\bigcup _{i\in X} \sim _i\).

It is well-known that for any relation R on W, if \(wR^*v\) then there is a finite R-path starting at w ending in v. Thus, \(\mathcal {M},w\models C_X\varphi \) iff every finite path for X from w ends with a state satisfying \(\varphi \).

This approach to defining common knowledge can be viewed as a recipe for defining common (robust) belief. For example, suppose that \(wR_i^B v\) iff \(v\in Min_{\preceq _i}([w]_i)\), and define \(R_X^B\) to be the transitive closureFootnote 13 of \(\bigcup _{i\in X} R_i^B\). Then, common belief of \(\varphi \), denoted \(C_X^B\varphi \), is defined in the usual way:

$$\mathcal {M},w\models C_X^B\varphi \,\,\text {iff for each}\,\, v\in W, \,\,\text {if}\,\, wR_X^B v \,\,\text {then}\,\, \mathcal {M},v\models \varphi .$$

A probabilistic variant of common belief was introduced in [55].
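Computationally, both \(C_X\varphi \) and \(C_X^B\varphi \) reduce to a reachability check along the union of the relevant relations. The sketch below computes the closure by graph search; the representation of relations as sets of pairs is my own convention, matching the earlier snippet.

```python
def reachable(w, relations):
    """Worlds reachable from w by a finite path along the union of the given
    relations (each a set of (w, v) pairs): the reflexive transitive closure."""
    seen, frontier = {w}, [w]
    while frontier:
        x = frontier.pop()
        for rel in relations:
            for (y, z) in rel:
                if y == x and z not in seen:
                    seen.add(z)
                    frontier.append(z)
    return seen

def common_knowledge(w, relations, phi):
    """M, w |= C_X phi iff phi holds at every world reachable from w."""
    return all(phi(v) for v in reachable(w, relations))

# Two agents whose relations link w1-w2 and w2-w3: anything that is common
# knowledge at w1 must therefore hold at all three worlds.
r1 = {("w1", "w1"), ("w2", "w2"), ("w3", "w3"), ("w1", "w2"), ("w2", "w1")}
r2 = {("w1", "w1"), ("w2", "w2"), ("w3", "w3"), ("w2", "w3"), ("w3", "w2")}
print(reachable("w1", [r1, r2]))                  # {'w1', 'w2', 'w3'}
```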

3 Reasoning to an Equilibrium

Brian Skyrms presents a model of the players’ process of deliberation in a game in his important book The Dynamics of Rational Deliberation [72]. In this section, I introduce and discuss this model of deliberation, though the reader is referred to [72] for a full discussion (see, also, [1, 45] for analyses of this model).

To simplify the exposition, I restrict attention to a two-person finite strategic game. Everything discussed below can be extended to situations with more than two playersFootnote 14 and to extensive gamesFootnote 15. Suppose that \(G=\langle \{a, b\}, \{S_a, S_b\}, \{u_a, u_b\}\rangle \) is a strategic game in which \(S_a=\{s_1,\ldots , s_n\}\) and \(S_b=\{t_1,\ldots , t_m\}\) are the players’ strategies, and \(u_a\) and \(u_b\) are utility functions. In the simplest case, deliberation is trivial: Each player calculates the expected utility given her belief about what her opponent is going to do and then chooses the action that maximizes these expected utilities. One of Skyrms’ key insights is that this calculation may be informative to the players, and if a player believes that there is any possibility that the process of deliberation may ultimately lead her to a different decision, then she will not act until her deliberation process has reached a stable stateFootnote 16.

Deliberation is understood as an iterative process that modifies the players’ opinions about the strategies that they will choose (at the end of the deliberation). For each player, a state of indecision is a probability measure on that player’s set of strategies—i.e., an element of \(\varDelta (S_i)\) for \(i=a, b\). Note that each state of indecision is a mixed strategy. However, the interpretation of the mixed strategies differs from the one discussed in Sect. 2.1. In this model, the interpretation is that the state of indecision for a player i at any given stage of the deliberation process is the mixed strategy that player i would choose if the player stopped deliberating. It is the players’ states of indecision that evolve during the deliberation process.

Let \(p_a\in \varDelta (S_a)\) and \(p_b\in \varDelta (S_b)\) be states of indecision for a and b, respectively, and assume that the states of indecision are common knowledge. One consequence of this assumption is that the players can calculate the expected utilities of their strategies (using their opponent’s state of indecision). For example, for \(s_j\in S_a\), we have

$$EU_a(s_j)=\sum _{t_k\in S_b} p_b(t_k)\cdot u_a(s_j,t_k), $$

and similarly for b. The status quo is the expected utility of the current state of indecision:

$$SQ_a=\sum _{s_j\in S_a} p_a(s_j)\cdot EU_a(s_j)\qquad SQ_b=\sum _{t_k\in S_b} p_b(t_k)\cdot EU_b(t_k).$$

Once the expected utilities are calculated, the players modify their states of indecision so that they believe more strongly that they will choose strategies with higher expected utility than the status quo. Players can use various rules to update their states of indecision accordingly. In general, any dynamical rule can be used so long as the rule seeks the good in the following sense:

  1. The rule raises the probability of a strategy only if that strategy has expected utility greater than the status quo.

  2. The rule raises the sum of the probabilities of all strategies with expected utility greater than the status quo (if any).

Deliberation reaches a fixed-point when the dynamical rule no longer changes the state of indecision. It is not hard to see that all dynamical rules that seek the good have, as fixed-points, states of indecision in which the expected utility of the status quo is maximal. To illustrate Skyrms’ model of deliberation with an example, I give the details of one of the rules discussed in [72]:

Nash dynamics. The covetability of a strategy s for player i is calculated as follows: \(cov_i(s)=\max (EU_i(s) - SQ_i, 0)\). Then, Nash dynamics transform a probability \(p\in \varDelta (S_i)\) into a new probability \(p'\in \varDelta (S_i)\) as follows. For each \(s\in S_i\):

$$p'(s)=\frac{k\cdot p(s) + cov_i(s)}{k + \sum _{s'\in S_i} cov_i(s')},$$

where \(k>0\) is the “index of caution” (the higher the k, the more slowly the decision maker raises the probability of strategies that have higher expected utility than the status quo).

In addition to assuming that the initial states of indecision are common knowledge, it is assumed that each player can emulate the other’s calculations, and that each player is, in fact, using the same dynamical rule to modify her state of indecision. Given that all of this is common knowledge, the states of indecision resulting from one round of the deliberation process will, again, be common knowledge and the process can continue until a fixed-point is reached.

A simple example will make this more concrete. Consider the following game between two players, Ann (a) and Bob (b)Footnote 17.

[Figure c: a 2×2 strategic game between Ann (rows u, d) and Bob (columns l, r)]

There are two pure Nash equilibria ((u, l) and (d, r)) and one mixed-strategy Nash equilibrium where Ann plays u with probability 2/3 and Bob plays l with probability 1/3. Suppose that the initial state of indecision is:

$$p_a(u)= 0.2,\ p_a(d)= 0.8 \qquad \text {and}\qquad p_b(l)= 0.9,\ p_b(r)=0.1. $$

Since both players have access to each other’s state of indecision, they can calculate the expected utilities of each of their strategies:

[Figure d: the expected utility of each strategy for Ann and Bob]

If the players simply choose the strategy that maximizes their expected utilities, then the outcome of the interaction will be the off-equilibrium profile (u, r). However, the process of deliberation will pull the players towards an equilibrium. The status quo for each player is:

[Figure e: the status quo \(SQ_a\) and \(SQ_b\)]

The covetabilities for each of the strategies are:

[Figure f: the covetability of each strategy]

Now, the new states of indecision \(p_a'\) and \(p_b'\) are calculated using Nash dynamics (for simplicity, I assume that the index of caution is \(k=1\)):

[Figure g: the new states of indecision \(p_a'\) and \(p_b'\)]

The new states of indecision are now \(p_a'\) and \(p_b'\), and we can continue this process. One can visualize this process by the following graph, in which the x-axis is the probability that Bob will choose r and the y-axis is the probability that Ann will choose u Footnote 18.

[Figure h: the trajectory of the deliberation process in the plane of the players’ probabilities]

The deliberation reaches a fixed-point with Ann and Bob deciding to play their part of the Nash equilibrium (u, l). In fact, Skyrms shows that under the strong assumptions of common knowledge noted above and assuming that all players use dynamical rules that seek the good, when the process of deliberation reaches a fixed-point, the states of indecision will form a Nash equilibriumFootnote 19.
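The deliberational dynamics described above are easy to simulate. The sketch below implements Nash dynamics and replays the example. Since the payoff matrix itself is not reproduced here, I use a Battle-of-the-Sexes payoff assignment that is consistent with the stated equilibria ((u, l), (d, r), and the 2/3–1/3 mixed equilibrium); those payoffs should be treated as an assumption.

```python
def nash_dynamics_step(p_i, eu, k=1.0):
    """One Nash-dynamics update of a state of indecision p_i (a dict from
    strategies to probabilities), given expected utilities eu."""
    sq = sum(p_i[s] * eu[s] for s in p_i)            # the status quo
    cov = {s: max(eu[s] - sq, 0.0) for s in p_i}     # covetabilities
    denom = k + sum(cov.values())
    return {s: (k * p_i[s] + cov[s]) / denom for s in p_i}

# Assumed payoffs: a Battle of the Sexes whose equilibria match the text
# ((u, l) and (d, r) pure, and the mixed one with P(u)=2/3, P(l)=1/3).
u_a = {("u", "l"): 2, ("u", "r"): 0, ("d", "l"): 0, ("d", "r"): 1}
u_b = {("u", "l"): 1, ("u", "r"): 0, ("d", "l"): 0, ("d", "r"): 2}

p_a = {"u": 0.2, "d": 0.8}                           # initial states of
p_b = {"l": 0.9, "r": 0.1}                           # indecision
for _ in range(200):
    eu_a = {s: sum(p_b[t] * u_a[(s, t)] for t in p_b) for s in p_a}
    eu_b = {t: sum(p_a[s] * u_b[(s, t)] for s in p_a) for t in p_b}
    p_a, p_b = nash_dynamics_step(p_a, eu_a), nash_dynamics_step(p_b, eu_b)

print(p_a, p_b)   # in this run the process approaches the (u, l) equilibrium
```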

4 Strategic Reasoning as a Solution Concept

A key aspect of the iterative removal of dominated strategies is that at each stage of the process, strategies are identified as either “good” or “bad”. The “good” strategies are those that are not strictly/weakly dominated, while the “bad” ones are strictly/weakly dominated. If the intended interpretation of the iterative procedure that removes strictly/weakly dominated strategies is to represent the players’ “deliberation” about what they are going to do, then this is a significant assumption. The point is that while a player is deliberating about what to do in a game situation, there may be strategies that cannot yet be classified as “good” or “bad”. These are the strategies that the player needs to think about more before deciding how to classify them. Building on this intuition, the reasoning-based expected utility procedure of [32] is intended to model the reasoning procedure that a Bayesian rational player would follow as she decides what to do in a game.

At each stage of the procedure, strategies are categorized. A categorization is a ternary partition of each player’s strategy set \(S_i\), rather than the usual binary partition in terms of which strategies are strictly/weakly dominated and which are not. The key idea is that during the reasoning process, strategies are accumulated, deleted or neither. Formally, for each player i, let \(S_i^+\subseteq S_i\) denote the set of strategies that have been accumulated and \(S_i^-\subseteq S_i\) the set of strategies that have been deleted. The innovative aspect of this procedure is that \(S_i^+\cup S_i^-\) need not equal \(S_i\). So, strategies in \(S_i\) but not in \(S_i^+\cup S_i^-\) are classified as “neither accumulated nor deleted”. The reasoning-based expected utility procedure is defined by induction. Initially, let \(D_{i,0}=\varDelta (S_{-i})\), the set of all probability measures over the strategies of i’s opponents, and let \(S_{i,0}^+=S_{i,0}^-=\emptyset \). Then, for \(n\ge 0\), we have:

  • Accumulate all strategies for player i that maximize expected utility with respect to every probability measure in \(D_{i,n}\). Formally,

    $$S_{i,n+1}^+=\{s_i\in S_i\ |\ MEU(s_i,\pi )\text { for all }\pi \in D_{i,n}\}.$$

  • Delete all strategies for player i that do not maximize expected utility with respect to any probability measure in \(D_{i,n}\). Formally,

    $$S_{i,n+1}^-=\{s_i\in S_i\ |\ \text {there is no }\pi \in D_{i,n} \text { such that }MEU(s_i,\pi )\}.$$

  • Keep all probability measures that assign positive probability to opponents playing accumulated strategies and zero probability to deleted strategies. Formally, let \(D_{i,n+1}\) be the set of all probability measures in \(D_{i,n}\) that assign positive probability to every strategy profile in \(\varPi _{j\ne i} S_{j,n+1}^+\) and probability 0 to every strategy profile in \(\varPi _{j\ne i} S_{j,n+1}^-\).
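Implementing this procedure exactly requires reasoning about all probability measures in \(D_{i,n}\) (which can be done with linear programming). The sketch below instead approximates the universal and existential quantifiers by sampling full-support beliefs over the opponent’s not-yet-deleted strategies. Both the sampling shortcut and the two-player, dictionary-based representation are my own assumptions, so misclassification of boundary cases is possible.

```python
import random

def sample_belief(support, rng):
    """A full-support probability measure over the opponent's remaining
    strategies (a uniform Dirichlet sample via normalized exponentials)."""
    w = {t: rng.expovariate(1.0) for t in support}
    z = sum(w.values())
    return {t: x / z for t, x in w.items()}

def reu_round(S_i, u_i, opp_keep, n=5000, seed=1):
    """One (approximate) round of the categorization for player i: accumulate
    the strategies best against every sampled belief, delete those best
    against none. opp_keep holds the opponent strategies not yet deleted;
    full-support samples automatically give accumulated ones positive weight."""
    rng = random.Random(seed)
    wins = {s: 0 for s in S_i}
    for _ in range(n):
        pi = sample_belief(opp_keep, rng)
        eu = {s: sum(pi[t] * u_i[(s, t)] for t in opp_keep) for s in S_i}
        top = max(eu.values())
        for s in S_i:
            if eu[s] >= top - 1e-12:
                wins[s] += 1
    accumulated = {s for s in S_i if wins[s] == n}
    deleted = {s for s in S_i if wins[s] == 0}
    return accumulated, deleted
```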

The following example from [32] illustrates this procedure:

[Figure i: a strategic game in which Ann chooses among u, \(m_1\), \(m_2\), d and Bob chooses between l and r]

For Bob, strategy l is accumulated since it maximizes expected utility with respect to every probability measure on Ann’s strategies (note that l weakly dominates r). For Ann, d is deleted, as it does not maximize expected utility with respect to any probability measure on Bob’s strategies (note that d is strictly dominated by u). Thus, we have

[Figure j: the categorization after the first round]

In the next round, Ann must consider only probability measures that assign positive probability to Bob playing l, and Bob must consider only probability measures assigning probability 0 to Ann playing d. This means that r is accumulated for BobFootnote 20 and \(m_1\) is deletedFootnote 21 for Ann:

[Figure k: the categorization after the second round]

At this point, the procedure reaches a fixed-point with Bob accumulating l and r and Ann deleting d and \(m_1\). The interpretation is that Bob has good reason to play either l or r and, thus, must pick one of them. All that Ann was able to conclude is that d and \(m_1\) are not good choices.

The general message is that players may not be able to identify a unique rational strategy by strategic reasoning alone (as represented by the iterative procedure given above). There are two reasons why this may happen. First, a player may accumulate more than one strategy, and so the player must “pick”Footnote 22 one of them. This is what happened with Bob. Given the observation that Ann will not choose d, both of the choices l and r give the same payoff, and so Bob must pick one of themFootnote 23. Second, players may not have enough information to identify the “rational” choices. Without any information about which of l or r Bob will pick, Ann cannot come to a conclusion about which of u or \(m_2\) she should choose. Thus, neither of these strategies can be accumulated. Ann and Bob face very different decision problems. No matter which choice Bob ends up picking, his choice will be rational (given his belief that Ann will not choose irrationally). However, since Ann lacks a probability over how Bob will pick, she cannot identify a rational choice.

5 Reasoning to a Game Model

The game models introduced in Sect. 2.2 can be used to describe the informational context of a game. A natural question from the perspective of this chapter is: How do the players arrive at a particular informational context? In this section, I introduce different operations that transform epistemic-plausibility models. These operations are intended to represent different ways a rational agent’s knowledge and beliefs can change over time. Then, I show how to use these operations to describe how the players’ knowledge and beliefs change as they each deliberate about what they are going to do in a game situation.

5.1 Modeling Information Changes

The simplest type of informational change treats the source of the information as infallible. The effect of finding out from an infallible source that \(\varphi \) is true should be clear: Remove all states that do not satisfy \(\varphi \). In the epistemic logic literature, this operation is called a public announcement [37, 65]. However, calling this an “announcement” is misleading since, in this chapter, I am not modeling any form of “pre-play” communication. The “announcements” are formulas that the players incorporate into the current epistemic state.

Definition 4

(Public Announcement). Suppose that \(\mathcal {M}=\langle W, \{\sim _i\}_{i\in \mathcal {A}}, V\rangle \) is an epistemic model and \(\varphi \) is a formula (in the language \(\mathcal {L}_{K}\)). After all the agents find out that \(\varphi \) is true (i.e., \(\varphi \) is publicly announced), the resulting model is \(\mathcal {M}^{!\varphi }=\langle W^{!\varphi },\{\sim _i^{!\varphi }\},V^{!\varphi }\rangle \), where \(W^{!\varphi }=\{w\in W\ |\ \mathcal {M},w\models \varphi \}\); \(\sim _i^{!\varphi }=\sim _i\cap \ (W^{!\varphi }\times W^{!\varphi })\) for all \(i\in \mathcal {A}\); and \(V^{!\varphi }\) is the restriction of V to \(W^{!\varphi }\).

The models \(\mathcal {M}\) and \(\mathcal {M}^{!\varphi }\) describe two different moments in time, with \(\mathcal {M}\) describing the current or initial information state of the agents and \(\mathcal {M}^{!\varphi }\) the information state after all the agents find out that \(\varphi \) is true. This temporal dimension can also be represented in the logical language with modalities of the form \([!\varphi ]\psi \). The intended interpretation of \([!\varphi ]\psi \) is “\(\psi \) is true after all the agents find out that \(\varphi \) is true”, and truth is defined as

  • \(\mathcal {M},w\models [!\varphi ]\psi \) iff [if \(\mathcal {M},w\models \varphi \) then \(\mathcal {M}^{!\varphi },w\models \psi \)].
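As an operation on finite models, Definition 4 is just a restriction. A minimal sketch, assuming the same dictionary-and-pairs representation used in the earlier snippets:

```python
def public_announcement(model, phi):
    """Restrict a finite epistemic model to the worlds where phi is true
    (phi is a predicate on worlds), following Definition 4."""
    W = {w for w in model["W"] if phi(w)}
    sim = {i: {(w, v) for (w, v) in R if w in W and v in W}
           for i, R in model["sim"].items()}
    val = {w: model["val"][w] for w in W}     # the valuation, restricted to W
    return {"W": W, "sim": sim, "val": val}
```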

A public announcement is only one type of informative action. For the other transformations discussed in this chapter, while the agents do trust the source of \(\varphi \), they do not treat it as infallible. Perhaps the most ubiquitous policy is conservative upgrade (\(\uparrow \!\!\varphi \)), which lets the agent only tentatively accept the incoming information \(\varphi \) by making the best \(\varphi \)-worlds the new minimal set and keeping the old plausibility ordering the same on all other worlds. A second operation is radical upgrade (\(\Uparrow \!\!\varphi \)), which moves all the \(\varphi \) worlds before all the \(\lnot \varphi \) worlds and otherwise keeps the plausibility ordering the same. Before giving the formal definition, we need some notation: Given an epistemic-plausibility model \(\mathcal {M}\), let \([\![\varphi ]\!]_{i}^w=\{x\ |\ \mathcal {M},x\models \varphi \}\cap [w]_i\) denote the set of all \(\varphi \)-worlds that i considers possible at state w and \(best_i(\varphi ,w)=Min_{\preceq _i}([\![\varphi ]\!]_{i}^w)\) be the best \(\varphi \)-worlds at state w, according to agent i.

Definition 5

(Conservative and Radical Upgrade). Given an epistemic-plausibility model \(\mathcal {M}=\langle W, \{\sim _i\}_{i\in \mathcal {A}}, \{\preceq _i\}_{i\in \mathcal {A}},\sigma \rangle \) and a formula \(\varphi \in \mathcal {L}_{KB}\), the conservative/radical upgrade of \(\mathcal {M}\) with \(\varphi \) is the model \(\mathcal {M}^{*\varphi }=\langle W^{*\varphi }, \{\sim _i^{*\varphi }\}_{i\in \mathcal {A}}, \{\preceq _i^{*\varphi }\}_{i\in \mathcal {A}},\sigma ^{*\varphi }\rangle \) with \(W^{*\varphi }=W\), for each i, \(\sim _i^{*\varphi }=\sim _i\), \(\sigma ^{*\varphi }=\sigma \), where \(*=\uparrow , \Uparrow \). The relations \(\preceq _i^{\uparrow \varphi }\) and \(\preceq _i^{\Uparrow \varphi }\) are the smallest relations satisfying:

Conservative Upgrade

  1. If \(v\in best_i(\varphi ,w)\) then \(v\prec _i^{\uparrow \varphi } x\) for all \(x\in [w]_i\); and

  2. for all \(x,y\in [w]_i - best_i(\varphi ,w)\), \(x\preceq _i^{\uparrow \varphi } y\) iff \(x\preceq _i y\).

Radical Upgrade

  1. for all \(x\in [\![\varphi ]\!]_i^w\) and \(y\in [\![\lnot \varphi ]\!]_i^w\), set \(x\prec _i^{\Uparrow \varphi } y\);

  2. for all \(x,y\in [\![\varphi ]\!]_i^w\), set \(x \preceq _i^{\Uparrow \varphi } y\) iff \(x\preceq _i y\); and

  3. for all \(x,y\in [\![\lnot \varphi ]\!]_i^w\), set \(x \preceq _i^{\Uparrow \varphi } y\) iff \(x\preceq _i y\). \(\triangleleft \)

As the reader is invited to check, a conservative upgrade is a special case of a radical upgrade: the conservative upgrade of \(\varphi \) at w is the radical upgrade of \(best_i(\varphi ,w)\). A logical analysis of these operations includes formulas of the form \([\uparrow \!\!\varphi ]\psi \) intended to mean “after everyone conservatively upgrades with \(\varphi \), \(\psi \) is true” and \([\Uparrow \!\!\varphi ]\psi \) intended to mean “after everyone radically upgrades with \(\varphi \), \(\psi \) is true”. The definition of truth for these formulas is as expected:

  • \(\mathcal {M},w\models [\uparrow \!\!\varphi ]\psi \) iff \(\mathcal {M}^{\uparrow \varphi },w\models \psi \)

  • \(\mathcal {M},w\models [\Uparrow \!\!\varphi ]\psi \) iff \(\mathcal {M}^{\Uparrow \varphi },w\models \psi \)

The main issue of interest in this chapter is the limit behavior of iterated sequences of announcements. That is, what happens to the epistemic-plausibility models in the limit? Do the players’ knowledge and beliefs stabilize or keep changing in response to the new information?

An initial observation is that iterated public announcement of any formula \(\varphi \) in an epistemic-plausibility model must stop at a limit model where either \(\varphi \) or its negation is true at all states (see [14] for a discussion and proof). In addition to the limit dynamics of knowledge under public announcements, there is the limit behavior of beliefs under soft announcements (radical/conservative upgrades). See [14] and [21, Sect. 4] for general discussions. I conclude this brief introduction to dynamic logics of knowledge and beliefs with an example of the type of dynamics that can arise.

Let \(\mathcal {M}_1\) be an initial epistemic-plausibility model (for a single agent) with three states \(w_1\), \(w_2\) and \(w_3\) satisfying r, q and p, respectively. Suppose that the agent’s plausibility ordering is \(w_1\prec w_2 \prec w_3\). Then, the agent believes that r. Consider the formula

$$\varphi :=\ (r\vee (B^{\lnot r} q\wedge p) \vee (B^{\lnot r} p \wedge q)).$$

This is true at \(w_1\) in the initial model. Since \([\![{\varphi }]\!]_{\mathcal {M}_1}=\{w_3,w_1\}\), we have \(\mathcal {M}_1^{\Uparrow \varphi }=\mathcal {M}_2\). Furthermore, \([\![{\varphi }]\!]_{\mathcal {M}_2}=\{w_2,w_1\}\), so \(\mathcal {M}_2^{\Uparrow \varphi }=\mathcal {M}_3\). Since \(\mathcal {M}_3\) is the same model as \(\mathcal {M}_1\), we have a cycle:

[Figure l: the cycle \(\mathcal {M}_1\Rightarrow \mathcal {M}_2\Rightarrow \mathcal {M}_3=\mathcal {M}_1\) generated by radically upgrading with \(\varphi \)]

In the above example, the player’s conditional beliefs keep changing during the update process. However, the player’s non-conditional beliefs remain fixed throughout the process. In fact, Baltag and Smets have shown that every iterated sequence of truthful radical upgrades stabilizes all non-conditional beliefs in the limit [14]. See [12, 38, 58] for generalizations and broader discussions about the issues raised in this section.

5.2 Rational Belief Change During Deliberation

This section looks at the operations that transform the informational context of a game as the players deliberate about what they should do in a game situation. The main idea is that in each informational context (viewed as describing one stage of the deliberation process), the players determine which options are “optimal” and which options the players ought to avoid (guided by some choice rule). This leads to a transformation of the informational context as the players adopt the relevant beliefs about the outcome of their practical reasoning. The different types of transformation mentioned above then represent how confident the players (or the modeler) are in their assessment of which outcomes are rational. In this new informational context, the players again think about what they should do, leading to another transformation. The main question is: Does this process stabilize?

The answer to this question will depend on a number of factors. The general picture is

$$\mathcal {M}_0\mathop {\Longrightarrow }\limits ^{\tau (D_0)}\mathcal {M}_1 \mathop {\Longrightarrow }\limits ^{\tau (D_1)}\mathcal {M}_2 \mathop {\Longrightarrow }\limits ^{\tau (D_2)}\cdots \mathop {\Longrightarrow }\limits ^{\tau (D_n)}\mathcal {M}_{n+1}{\Longrightarrow } \cdots $$

where each \(D_i\) is some proposition describing the “rational” options and \(\tau \) is a model transformer (e.g., public announcement, radical or conservative upgrade). Two questions are important for the analysis of this process. First, what type of transformations are the players using? Second, where do the propositions \(D_i\) come from?

Here is a baseline result from [18]. Consider a propositional formula \(\mathsf {Rat}_i\) that is intended to mean “i’s current action is not strictly dominated in the set of actions that the agent currently considers possible”. This is a propositional formula whose valuation changes as the model changes (i.e., as the agent removes possible outcomes that are strictly dominated). An epistemic model is full for a game G provided the map \(\sigma \) from states to profiles is onto. That is, all outcomes are initially possible.

Theorem 1

(van Benthem [18]). The following are equivalent for all states w in an epistemic model that is full for a finite game G:

  1. The outcome \(\sigma (w)\) survives iterated removal of strictly dominated strategies.

  2. Repeated successive public announcements of \(\bigwedge _i \mathsf {Rat}_i\) for the players stabilize at a submodel whose domain contains w.

This theorem gives a precise sense of how the process of iteratively removing strictly dominated strategies can be viewed as a process of deliberation (cf. the discussion in Sect. 2.1). See [4] for a generalization of this theorem focusing on arbitrary “optimality” propositions satisfying a monotonicity property and arbitrary games. A related analysis can be found in [59], which provides an in-depth study of the upgrade mechanisms that match game-theoretic analyses.
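A rough sketch of the announcement dynamics in Theorem 1 follows, reusing util and the prisoner’s dilemma game from the IESDS snippet in Sect. 2.1. It works with sets of strategy profiles rather than full epistemic models, which glosses over the epistemic details but is enough to see the correspondence with iterated strict dominance; the representation is an assumption of mine.

```python
from itertools import product

def announce_rationality(profiles, u, players=("a", "b")):
    """One public announcement of 'everyone is rational': keep the profiles
    in which no player's action is strictly dominated relative to the
    actions still present in the current model."""
    keep = set()
    for prof in profiles:
        ok = True
        for idx, i in enumerate(players):
            opp = {q[1 - idx] for q in profiles}   # opponent's live actions
            mine = prof[idx]
            alts = {q[idx] for q in profiles}
            if any(all(util(u, i, s2, t, players) > util(u, i, mine, t, players)
                       for t in opp)
                   for s2 in alts if s2 != mine):
                ok = False
        if ok:
            keep.add(prof)
    return keep

model = set(product(S["a"], S["b"]))       # full model: all outcomes possible
while True:
    new = announce_rationality(model, u)
    if new == model:
        break
    model = new
print(model)                               # {('d', 'd')}, the IESDS outcome
```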

5.3 Rational Belief Change During Game Play

The importance of explicitly modeling belief change over time becomes even more evident when considering extensive games. An extensive game makes explicit the sequential structure of the choices in a game. Formally, an extensive game is a tuple \(\langle N, T, \tau , \{ u_i \}_{i \in N}\rangle \), where

  • N is a finite set of players;

  • T is a tree describing the temporal structure of the game situation: Formally, T consists of a set of nodes and an immediate successor relation \(\rightarrowtail \). Let O denote the set of leaves (nodes without any successors) and V the remaining nodes. The edges at a decision node \(v\in V\) are each labeled with an action. Let A(v) denote the set of actions available at v. Let \(\rightsquigarrow \) be the transitive closure of \(\rightarrowtail \).

  • \(\tau \) is a turn function assigning a player to each node \(v\in V\) (let \(V_i=\{v\in V\ |\ \tau (v)=i\}\)).

  • \(u_i: O\rightarrow \mathbb {R}\) is the utility function for player i assigning real numbers to outcome nodes.

The following is an example of an extensive game:

[Figure m: an extensive game with decision nodes \(v_1, v_2, v_3\) and outcomes \(o_1, o_2, o_3, o_4\)]

This is an extensive game with \(V=\{v_1, v_2, v_3\}\), \(O=\{o_1, o_2, o_3, o_4\}\), \(\tau (v_1)=\tau (v_3)=1\) and \(\tau (v_2)=2\), and, for example, \(u_1(o_2)=0\) and \(u_2(o_2)=3\). Furthermore, we have, for example, \(v_1\rightarrowtail o_1\), \(v_1\rightsquigarrow o_4\), and \(A(v_1)=\{I_1, O_1\}\).

A strategy for player i in an extensive game is a function \(\sigma \) assigning to each node \(v\in V_i\) an action \(\sigma (v)\in A(v)\). Thus, a strategy prescribes a move for player i at every possible node where i moves. For example, the function \(\sigma \) with \(\sigma (v_1)=O_1\) and \(\sigma (v_3)=I_3\) is a strategy for player 1, even though, by following the strategy, player 1 knows that \(v_3\) will not be reached. The main solution concept for extensive games is the subgame perfect equilibrium [71], which is calculated using the “backward induction (BI) algorithm”:

BI Algorithm: At terminal nodes, players already have the nodes marked with their utilities. At a non-terminal node n, once all daughters are marked, the node is marked as follows: determine whose turn it is to move at n and find the daughter d that has the highest utility for that player. Copy the utilities from d onto n.

In the extensive game given above, the BI algorithm leads to the following markings:

[Figure n: the extensive game with the BI markings copied onto each node]

The BI strategy for player 1 is \(\sigma (v_1)=O_1\), \(\sigma (v_3)=O_3\) and for player 2 it is \(\sigma (v_2)=O_2\). If both players follow their BI strategy, then the resulting outcome is \(o_1\) (\(v_1\mapsto o_1\) is called the BI path).
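The BI algorithm is a one-pass recursion over the tree. In the sketch below, only \(u_1(o_2)=0\) and \(u_2(o_2)=3\) are taken from the text; the remaining payoffs are illustrative values chosen to be consistent with the BI outcome \(o_1\) and the preferences discussed in the next subsection, and should be treated as assumptions.

```python
# Tree of the example game. Only u_1(o_2)=0 and u_2(o_2)=3 are given in the
# text; the other payoffs are assumed, consistent with the BI outcome o_1.
children = {"v1": [("O1", "o1"), ("I1", "v2")],
            "v2": [("O2", "o2"), ("I2", "v3")],
            "v3": [("O3", "o3"), ("I3", "o4")]}
turn = {"v1": 1, "v2": 2, "v3": 1}
payoff = {"o1": (1, 0), "o2": (0, 3), "o3": (3, 2), "o4": (2, 4)}

def backward_induction(node):
    """Mark `node` with the utilities of its BI outcome and record the
    chosen action at each decision node (the BI strategy profile)."""
    if node in payoff:                      # terminal nodes keep their marks
        return payoff[node], {}
    i = turn[node]
    best_action, best_mark, strategy = None, None, {}
    for action, child in children[node]:
        mark, sub = backward_induction(child)
        strategy.update(sub)
        if best_mark is None or mark[i - 1] > best_mark[i - 1]:
            best_action, best_mark = action, mark
    strategy[node] = best_action
    return best_mark, strategy

mark, strategy = backward_induction("v1")
print(mark, strategy)   # (1, 0) {'v3': 'O3', 'v2': 'O2', 'v1': 'O1'}
```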

Much has been written about backward induction and whether it follows from the assumption that there is common knowledge (or common belief) that all players are rational Footnote 24. In the remainder of this section, I explain how epistemic-plausibility models and the model transformations defined above can make this more precise. The first step is to describe what the players believe about the strategies followed in an extensive game and how these beliefs may change during the play of the game. I sidestep a number of delicate issues in the discussion below (see [24] for a clear exposition). My focus is on the players’ beliefs about which outcome of the game (i.e., the terminal nodes) will be realized.

Suppose that \(a,a'\in A(v)\) for some \(v\in V_i\). We say move a strictly dominates move \(a'\) in beliefs (given some epistemic-plausibility model), provided that all of the most plausible outcomes reachable by playing a at v are preferred to all the most plausible outcomes reachable by playing \(a'\). Consider an initial epistemic-plausibility model in which the states are the four outcomes \(\{o_1,o_2,o_3,o_4\}\), and both players consider all outcomes equally plausible (I write \(w\approx v\) if w and v are equally plausible, i.e., \(w\succeq v\) and \(v\succeq w\)). Then, at \(v_2\), \(O_2\) is not strictly dominated by \(I_2\) in beliefs since the outcomes reachable by \(I_2\) are \(\{o_3, o_4\}\), both equally plausible, and player 2 prefers \(o_4\) over \(o_2\), but \(o_2\) over \(o_3\). However, since player 1 prefers \(o_3\) to \(o_4\), \(O_3\) strictly dominates \(I_3\) in beliefs. Suppose that R is interpreted as “no player chooses an action that is strictly dominated in beliefs”. Thus, in the initial model, in which all four outcomes are equally plausible, the interpretation of R is \(\{o_1,o_2,o_3\}\). We can now ask what happens to the initial model if this formula R is iteratively updated (for example, using radical upgrade).

[Figure o: the sequence of epistemic-plausibility models produced by iterated radical upgrades with R]

This sequence of radical upgrades is intended to represent the “pre-play” deliberation leading to a model in which there is common belief that the outcome of the game will be \(o_1\). But, what justifies both players deliberating in this way to a common epistemic-plausibility model?

The correctness of the deliberation sequence is derived from the assumption that there is common knowledge that the players are “rational” (in the sense, that players will not knowingly choose an option that will give them lower payoffs). But there is a potential problem: Under common knowledge that the players are rational (i.e., make the optimal choice when given the chance), player 1 must choose \(O_1\) at node \(v_1\). The backward induction argument for this is based on what the players would do if player 1 chose \(I_1\). But, if player 1 did, in fact, choose \(I_1\), then common knowledge of rationality is violated (player 1’s choice would be “irrational”). Thus, it seems that common knowledge of rationality, alone, cannot be used to show that the players will make choices consistent with the backward induction path. An additional assumption about how the players’ beliefs may change during the course of the game is needed. The underlying assumption is that the players are assumed to be unwaveringly optimistic: No matter what is observed, players maintain the belief that everyone is rational at future nodes.

There are many ways to formalize the above intuition that players are “unwaveringly optimistic”. I briefly discuss the approach from [15] since it touches on a number of issues raised in this chapter. The key idea is to encode the players’ strategies as conditional beliefs in an epistemic-plausibility model. For example, consider the following epistemic-plausibility model on the four outcomes of the above extensive game:

[Figure p: an epistemic-plausibility model over the outcomes \(o_1,\ldots ,o_4\) encoding the players’ conditional beliefs]

It is assumed that there are atomic propositions for each possible outcome. Formally, suppose that there is an atomic proposition \(\mathsf {o_i}\) for each outcome \(o_i\) (assume that \(\mathsf {o_i}\) is true only at state \(o_i\)). The non-terminal nodes \(v\in V\) are then identified with the set of outcomes reachable from that node:

$$\mathsf {v}:=\bigvee _{v\rightsquigarrow o} \mathsf {o}.$$

In the above model, both players believe that \(o_1\) is the outcome that will be realized, and neither player initially rules out any of the possible outcomes. That is, the model satisfies the “open future” assumption of [15] (no player has “hard” information that an outcome is ruled out). The fact that player 1 is committed to the BI strategy is encoded in her conditional beliefs: both \(B_1^{\mathsf {v_1}}\mathsf {o_1}\) and \(B_1^{\mathsf {v_3}}\mathsf {o_3}\) are true in the model. For player 2, \(B_2^{\mathsf {v_2}}(\mathsf {o_3}\vee \mathsf {o_4})\) is true, which implies that player 2 plans to choose action \(I_2\) at node \(v_2\).
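The truth conditions for these conditional beliefs are straightforward to operationalize. Below is a minimal sketch in which a conditional belief \(B_i^{\psi }\varphi \) holds when \(\varphi \) is true at all of player i’s most plausible \(\psi \)-states. The rank functions are assumptions standing in for the model pictured above (which is not reproduced here); only their qualitative shape matters for the three formulas just discussed.

```python
# A sketch of conditional belief on a plausibility model given as a rank
# function (lower rank = more plausible). Propositions are sets of states.

def conditional_belief(rank, psi, phi):
    """B^psi(phi): phi holds at all most plausible psi-states."""
    if not psi:
        return True  # conditioning on the impossible: vacuously true
    best = min(rank[s] for s in psi)
    return all(s in phi for s in psi if rank[s] == best)

# Hypothetical rank functions consistent with the text: player 1 follows
# the BI strategy, player 2 plans to choose I2 at v2.
rank1 = {"o1": 0, "o2": 1, "o3": 2, "o4": 3}
rank2 = {"o1": 0, "o2": 2, "o3": 1, "o4": 3}

v1, v2, v3 = {"o1", "o2", "o3", "o4"}, {"o2", "o3", "o4"}, {"o3", "o4"}

print(conditional_belief(rank1, v1, {"o1"}))        # B_1^{v1} o1 -> True
print(conditional_belief(rank1, v3, {"o3"}))        # B_1^{v3} o3 -> True
print(conditional_belief(rank2, v2, {"o3", "o4"}))  # B_2^{v2}(o3 v o4) -> True
```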

The dynamics of actual play are then modeled as a sequence of public announcements (cf. Definition 4). The players’ beliefs change as they learn (irrevocably) which nodes in the game have been reached. This process produces a sequence of epistemic-plausibility models. For example, a possible sequence of updates for the above game, starting with the initial model \(\mathcal {M}\) given above, is:

$$\mathcal {M}=\mathcal {M}^{!\mathsf {v_1}}; \mathcal {M}^{!\mathsf {v_2}};\mathcal {M}^{!\mathsf {v_3}};\mathcal {M}^{!\mathsf {o_4}}$$
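As a quick illustration (again with assumed ranks), a public announcement simply restricts the model to the states at which the announced formula is true, so the play sequence above becomes a chain of ever-smaller models:

```python
# A sketch of public announcement: learning psi (hard information) removes
# all non-psi states; beliefs are read off the surviving ranks.

def announce(rank, psi):
    return {s: r for s, r in rank.items() if s in psi}

model = {"o1": 0, "o2": 1, "o3": 2, "o4": 3}  # assumed initial ranks
for psi in [{"o1", "o2", "o3", "o4"},  # !v1 (trivial: every outcome remains)
            {"o2", "o3", "o4"},        # !v2
            {"o3", "o4"},              # !v3
            {"o4"}]:                   # !o4
    model = announce(model, psi)
    print(sorted(model))
```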

The assumption that the players are “unwaveringly optimistic” is represented as follows: No matter what true formula is publicly announced (i.e., no matter how the game proceeds), there is common belief that the players will make a rational choice (when it is their turn to move). Formally, this requires introducing an arbitrary public announcement operator [11]: \(\mathcal {M},w\models [\ !\ ]\varphi \) provided that, for all formulasFootnote 25 \(\psi \), if \(\mathcal {M},w\models \psi \) then \(\mathcal {M},w\models [!\psi ]\varphi \). Then, there is common stable belief in \(\varphi \) provided that \([\ !\ ]C^B\varphi \) is true, where \(C^B\varphi \) means that there is common belief in \(\varphi \) (cf. Sect. 2.2). The key result is:

Theorem 2

(Baltag, Smets and Zvesper [15]). Common knowledge of the game structure and of open future, together with common stable belief in dynamic rationality, imply common belief in the backward induction outcome.

6 Concluding Remarks

This chapter has not focused on strategies per se, but, rather, on the process of “rational deliberation” that leads players to adopt a particular “plan of action”. Developing formal models of this process is an important and rich area of research for anyone interested in the foundations of game theory.

The economist’s predilection for equilibria frequently arises from the belief that some underlying dynamic process (often suppressed in formal models) moves a system to a point from which it moves no further.

[22, pg. 1008]

Many readers may have been expecting a formal account of the players’ practical reasoning in game-theoretic situations. Instead, this chapter presented three different frameworks in which the “underlying dynamic process” mentioned in the above quote is made explicit. None of the frameworks discussed in this chapter are intended to model the players’ practical reasoning. Rather, they describe deliberation in terms of a sequence of belief changes about what the players are doing or what their opponents may be thinking. This raises an important question: In what sense do the frameworks introduced in this chapter describe the players’ strategic reasoning? I will not attempt a complete answer to this question here. Instead, I conclude with brief discussions of two related questions.

6.1 What Are the Differences and Similarities Between the Different Models of Strategic Reasoning?

The three frameworks presented in this chapter offer different perspectives on the standard game-theoretic analysis of strategic situations. To compare and contrast these formal frameworks, I illustrate each perspective using the following game from [70, Example 8, pg. 305]:

figure q

In the above game, d is weakly dominated by u for Ann. If Bob knows that Ann is rational (in the sense that she will not choose a weakly dominated strategy), then he can rule out option d. In the smaller game, action r is now strictly dominated by l for Bob. If Ann knows that Bob is rational and that Bob knows that she is rational (and so, rules out option d), then she can rule out option r. Assuming that the above reasoning is transparent to both Ann and Bob, it is common knowledge that Ann will play u and Bob will play l. But now, what is the reason for Bob to rule out the possibility that Ann will play d? He knows that Ann knows that he is going to play l, and both u and d maximize Ann’s expected utility with respect to the belief that Bob will play l.
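For concreteness, here is a minimal sketch of the iterated removal procedure just described. The payoff numbers are hypothetical, chosen only to realize the qualitative features mentioned in the text (d is weakly dominated by u, and once d is removed, r is strictly dominated by l):

```python
# A sketch of iterated removal of weakly dominated strategies for the
# Ann/Bob game. The payoff numbers below are hypothetical.

PAYOFFS = {  # (Ann's payoff, Bob's payoff)
    ("u", "l"): (1, 1), ("u", "r"): (1, 0),
    ("d", "l"): (1, 0), ("d", "r"): (0, 1),
}

def weakly_dominated(s, t, opp, util):
    """s is weakly dominated by t: never better, sometimes strictly worse."""
    diffs = [util(t, o) - util(s, o) for o in opp]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

def iterated_removal(ann, bob):
    u_ann = lambda s, o: PAYOFFS[(s, o)][0]
    u_bob = lambda s, o: PAYOFFS[(o, s)][1]
    while True:
        ann2 = [s for s in ann
                if not any(weakly_dominated(s, t, bob, u_ann) for t in ann)]
        bob2 = [s for s in bob
                if not any(weakly_dominated(s, t, ann, u_bob) for t in bob)]
        if (ann2, bob2) == (ann, bob):
            return ann, bob
        ann, bob = ann2, bob2

print(iterated_removal(["u", "d"], ["l", "r"]))  # -> (['u'], ['l'])
```

Note that the dominance test is re-run on the surviving strategies at each round; the puzzle raised above is precisely about which epistemic assumptions license each successive round.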

Many authors have pointed out puzzles surrounding an epistemic analysis of iterated removal of weakly dominated strategies [5, 27, 70]. The central issue is that the assumption of common knowledge of rationality seems to conflict with the logic of iteratively removing weakly dominated strategies. The models introduced in this chapter each provide a distinctive perspective on this issue. Note that the idea is not to provide a new “epistemic foundation” for iterated removal of weakly dominated strategies; both [27] and [41] already offer convincing results on that question. Rather, the goal is to offer a different perspective on the existing epistemic analyses.

I start with Skyrms’ model of rational deliberation from Sect. 3. The above game has two Nash equilibria: the pure-strategy Nash equilibrium (u, l) and the mixed Nash equilibrium in which Ann plays u and d each with probability 0.5 and Bob plays strategy l. Rational deliberation with any dynamical rule that “seeks the good” (such as the Nash dynamics) is guaranteed to lead the players to one of the two equilibria. However, there is an important difference between the two Nash equilibria from the point of view of rational deliberators: through deliberation, the players will almost always end up at the pure-strategy equilibrium. That is, unless the players start deliberating at the mixed-strategy Nash equilibrium, deliberation will lead them to the pure-strategy equilibrium. This makes sense since playing u gives Ann a strictly greater expected utility than any strategy assigning positive probability to d, as long as there is a chance (no matter how small) that Bob will play r. I can illustrate this point by showing the deliberational paths that are generated if the players start from the following states of indecision: (1) Ann is playing d with probability 1 and Bob is playing l with probability 1; (2) Ann is playing u and d each with probability 0.5 and Bob is playing l with probability 0.95; and (3) Ann is playing u with probability 0.5 and Bob is playing r with probability 0.5Footnote 26.

figure r
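The following sketch implements one “seek the good” rule, a Nash-map style dynamics, for this game: each player shifts probability toward strategies whose expected utility exceeds that of her current state of indecision. The payoff numbers are the same hypothetical ones used above, and this particular update rule is only one of the dynamical rules Skyrms considers.

```python
# A sketch of deliberation with a Nash-map dynamics in the 2x2 game above.
# p = probability Ann assigns to u, q = probability Bob assigns to l.
# With the hypothetical payoffs assumed earlier: Ann's EU(u) = 1 and
# EU(d) = q; Bob's EU(l) = p and EU(r) = 1 - p.

def nash_step(p, q):
    # Ann's "covetabilities": expected gain over the status quo, if positive
    eu_u, eu_d = 1.0, q
    status_quo = p * eu_u + (1 - p) * eu_d
    cov_u = max(0.0, eu_u - status_quo)
    cov_d = max(0.0, eu_d - status_quo)
    p = (p + cov_u) / (1 + cov_u + cov_d)
    # Bob's covetabilities (computed against Ann's updated indecision)
    eu_l, eu_r = p, 1 - p
    status_quo = q * eu_l + (1 - q) * eu_r
    cov_l = max(0.0, eu_l - status_quo)
    cov_r = max(0.0, eu_r - status_quo)
    q = (q + cov_l) / (1 + cov_l + cov_r)
    return p, q

# Starting exactly at the mixed equilibrium (p = 0.5, q = 1.0) is a fixed
# point; from the three starting points in the text, deliberation drifts
# toward the pure equilibrium (u, l), i.e., (p, q) -> (1, 1) in the limit.
for p, q in [(0.0, 1.0), (0.5, 0.95), (0.5, 0.5)]:
    for _ in range(500):
        p, q = nash_step(p, q)
    print(round(p, 3), round(q, 3))
```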

The second perspective comes from the reasoning-based expected utility procedure discussed in Sect. 4. For Ann, u is accumulated in the first round since it maximizes expected utility with respect to all probability measures on Bob’s strategies. No other strategies are deleted or accumulated. Thus, the procedure stabilizes in the second round without categorizing any of Bob’s strategies or Ann’s strategy d. So, u is identified as a “good” strategy, but d is not classified as a “bad” strategy. Furthermore, neither of Bob’s strategies can be classified as “good” or “bad”.
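A small sketch of the classification step may be helpful. In a 2×2 game, expected utility is linear in the opponent’s mixture, so a strategy is optimal against every belief exactly when it is optimal against both pure beliefs, and optimal against no belief exactly when it is optimal against neither; checking the two pure beliefs therefore suffices. The payoff numbers are hypothetical, as above.

```python
# A sketch of one round of the reasoning-based procedure for the 2x2 game.
# A strategy is "accumulated" if optimal against every belief about the
# opponent, "deleted" if optimal against none; by linearity, the two pure
# beliefs decide both questions here. Payoffs are hypothetical.

U_ANN = {("u", "l"): 1, ("u", "r"): 1, ("d", "l"): 1, ("d", "r"): 0}
U_BOB = {("l", "u"): 1, ("l", "d"): 0, ("r", "u"): 0, ("r", "d"): 1}

def classify(strategies, opp_strategies, util):
    result = {}
    for s in strategies:
        optimal = [util[(s, o)] >= max(util[(t, o)] for t in strategies)
                   for o in opp_strategies]  # one verdict per pure belief
        result[s] = ("accumulate" if all(optimal) else
                     "delete" if not any(optimal) else "uncategorized")
    return result

print(classify(["u", "d"], ["l", "r"], U_ANN))
# -> {'u': 'accumulate', 'd': 'uncategorized'}
print(classify(["l", "r"], ["u", "d"], U_BOB))
# -> {'l': 'uncategorized', 'r': 'uncategorized'}
```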

Finally, I turn to the approach outlined in Sect. 5. An analysis of this game can be found in [59], where it is shown that certain deliberational sequences for the above game do not stabilize. Of course, whether a deliberational sequence stabilizes depends crucially on which model transformations are used; indeed, [59] introduces a new model transformation, “suspend judgement”, precisely to construct a deliberational sequence that does not stabilize. The general conclusion is that the players may not be able to deliberate their way to an informational context in which there is common knowledge of rationality (where rationality includes the requirement that players do not play weakly dominated strategies).

Each of the different frameworks offers a different perspective on strategic reasoning in games. The perspectives are not competing; rather, they highlight different aspects of what it means to reason strategically. However, more work is needed to precisely characterize the similarities and differences between these different models of rational deliberation in games. Such a comprehensive comparison will be left for another paper.

6.2 What Role Do Higher-Order Beliefs Play in a General Theory of Rational Decision Making in Game Situations?

Each model of deliberation discussed in this chapter either implicitly or explicitly makes assumptions about the players’ higher-order beliefs (see Sect. 2.2). In the end, I am interested only in what (rational) players are going to do. This, in turn, depends only on what the players believe the other players are going to do. A player’s beliefs about what her opponents are thinking are relevant only insofar as they shape her first-order beliefs about what her opponents are going to do. Kadane and Larkey explain the issue nicely:

It is true that a subjective Bayesian will have an opinion not only on his opponent’s behavior, but also on his opponent’s belief about his own behavior, his opponent’s belief about his belief about his opponent’s behavior, etc. (He also has opinions about the phase of the moon, tomorrow’s weather and the winner of the next Superbowl). However, in a single-play game, all aspects of his opinion except his opinion about his opponent’s behavior are irrelevant, and can be ignored in the analysis by integrating them out of the joint opinion.

[46, pg. 239, my emphasis]

A theory of rational decision making in game situations need not require that a player consider all of her higher-order beliefs in her decision-making process. The assumption is only that the players recognize that their opponents are “actively reasoning” agents. Precisely “how much” higher-order information should be taken into account in such situations is a very interesting open question (cf. [47, 64]).

There is quite a lot of experimental work on whether humans take even second-order beliefs (i.e., beliefs about their opponents’ beliefs) into account in game situations (see, for example, [30, 44, 73]). It is beyond the scope of this chapter to survey this literature (see [29] for an excellent overview). Of course, this is a descriptive question, and it is very much open how such observations should be incorporated into a general theory of rational deliberation in games (cf. [53, 54, 77]).

* * *

A general theory of rational deliberation for game and decision theory is a broad topic. It is beyond the scope of this chapter to discuss the many different aspects of, and competing perspectives on, such a theoryFootnote 27. A fully developed theory will have both a normative component (What are the normative principles that guide the players’ thinking about what they should do?) and a descriptive component (Which psychological phenomena best explain discrepancies between predicted and observed behavior in game situations?). The main challenge is to find the right balance between descriptive accuracy and normative relevance. While this is true for all theories of individual decision making and reasoning, focusing on game situations raises a number of compelling issues. Robert Aumann and Jacques Drèze [2, pg. 81] adeptly summarize one of the most pressing of these when they write: “[T]he fundamental insight of game theory [is] that a rational player must take into account that the players reason about each other in deciding how to play”. Exactly how the players (should) incorporate the fact that they are interacting with other (actively reasoning) agents into their own decision-making process is the subject of much debate.