Abstract
This chapter describes a number of results obtained in the last 60 years on the theory of nonzero-sum discrete-time stochastic games. We provide an overview of almost all basic streams of research in this area such as the existence of stationary Nash and correlated equilibria in models on countable and general state spaces, the existence of subgame-perfect equilibria, algorithms, stopping games, and the existence of uniform equilibria. Our survey incorporates several examples of games studied in operations research and economics. In particular, separate sections are devoted to intergenerational games, dynamic Cournot competition and game models of resource extraction. The provided reference list includes not only seminal papers that initiated research in various directions but also highlights recent advances in this field.
Keywords
- Nonzero-sum game
- Stochastic game
- Discounted payoff
- Limit-average payoff
- Markov perfect equilibrium
- Subgame-perfect equilibrium
- Intergenerational altruism
- Uniform equilibrium
- Stopping game
1 Introduction
The fundamentals of the modern theory of non-cooperative dynamic games were established in the 1950s at Princeton University. First, Nash (1950) introduced the notion of equilibrium for finite n-person static games and proved its existence using the fixed point theorem of Kakutani (1941). Next, Shapley (1953) presented a model of infinite-horizon zero-sum stochastic games with a positive stop probability. Fink (1964) and Takahashi (1964) extended his model to nonzero-sum discounted stochastic games with finite state spaces and proved the existence of equilibrium in stationary Markov strategies. Later on, Rogers (1969) and Sobel (1971) obtained similar results for irreducible stochastic games with expected limit-average payoffs. Afterwards, the theory of discrete-time nonzero-sum stochastic games was extended in various directions, inspiring a number of interesting applications. An overview of selected basic topics in stochastic dynamic games, with instructive examples, can be found in the books of Başar and Olsder (1995) and Haurie et al. (2012) and the survey paper by Dutta and Sundaram (1998). Theoretically more advanced material is included in the book edited by Neyman and Sorin (2003) and the monograph of Mertens et al. (2015).
In this chapter, we provide an overview of almost all basic streams of research in the area of nonzero-sum discrete-time stochastic games, such as the existence of stationary equilibria in models on both countable and general state spaces, the existence of subgame-perfect equilibria, algorithms, stopping games, and correlated and uniform equilibria. Our survey incorporates several examples of games studied in operations research and economics. In particular, separate sections are devoted to intergenerational games, dynamic Cournot competition and game models of resource extraction. The provided reference list not only includes seminal papers that initiated research in various directions but also highlights recent advances in this field.
The paper is organized as follows. In Sect. 2 we describe some basic material needed for an examination of nonzero-sum stochastic games with general state spaces. To make the presentation less technical, we restrict our attention to Borel space models. Many of the results described in this chapter are stated in the literature in a more general framework. However, the Borel state space models are sufficiently rich for most applications. The first part of Sect. 2 includes auxiliary results on set-valued mappings arising in the study of the existence of stationary Nash/correlated equilibria and certain results known in the literature, such as a random version of the Carathéodory theorem and the Mertens measurable selection theorem. The second part of Sect. 2 is devoted to supermodular games. Sect. 3 deals with the concept of subgame-perfect equilibrium in games on a Borel state space and introduces different classes of strategies in which subgame-perfect equilibria may be obtained. Sect. 4 includes results on correlated equilibria with public signals proved for games on Borel state spaces, whereas Sect. 5 presents state-of-the-art results on the existence of stationary equilibria (further called “stationary Markov perfect equilibria”) in discounted stochastic games. This theory has found applications in several examples examined in operations research and economics. Namely, in Sect. 6 we provide models with special but natural transition structures for which there exist stationary equilibria. Sect. 7 recalls papers in which the existence of equilibria was proved for stochastic games on denumerable state spaces. This section embraces discounted models as well as models with the limit-average payoff criterion. Moreover, it is also shown that a discounted game with a Borel state space can be approximated, under some assumptions, by simpler games with countable state spaces. Sect. 8 reviews algorithms applied to nonzero-sum stochastic games.
In particular, it is shown how a formulation of a linear complementarity problem can be helpful in solving games with discounted and limit-average payoff criteria and a special transition structure. In addition, we also mention the homotopy methods applied to this issue. Sect. 9 presents material on games with finite state and action spaces, while Sect. 10 deals with games with product state spaces. In Sect. 11 we formulate results proved for various intergenerational games. Our models incorporate paternalistic and non-paternalistic altruistic economic growth models: games with one, finitely many, or infinitely many descendants as well as games on one- and multidimensional commodity spaces. Finally, Sect. 12 provides a short overview of stopping games, beginning with the Dynkin extension of Neveu’s stopping problem.
2 Preliminaries
In this section, we recall essential notation and several facts that are crucial for studying Nash and correlated equilibria in nonzero-sum stochastic games with uncountable state spaces. Here we follow the preliminaries in Jaśkiewicz and Nowak (2018a). Let N = {1, …, n} be the set of players. Let X, A1, …, An be Borel spaces. Assume that for each i ∈ N, x → Ai(x) ⊂ Ai is a lower measurable compact-valued action correspondence for player i. This is equivalent to saying that the graph of this correspondence is a Borel subset of X × Ai. Let A := A1 × ⋯ × An. We consider a nonzero-sum n-person game parameterized by a state variable x ∈ X. The payoff or utility function for player i ∈ N is \(r_i:X\times A \to \mathbb {R}\) and it is assumed that ri is bounded, ri(⋅, a) is Borel measurable for each a ∈ A, and ri(x, ⋅) is continuous on A for each x ∈ X. Assuming that each player i ∈ N chooses a mixed strategy \(\nu _i\in \Pr (A_i(x))\) and writing ν := (ν1, …, νn), we denote the expected payoff to player i by
$$\displaystyle \begin{aligned} r_i(x,\nu):=\int_{A_1(x)}\cdots\int_{A_n(x)} r_i(x,a_1,\ldots,a_n)\,\nu_n(da_n)\cdots\nu_1(da_1). \end{aligned}$$
A strategy profile \(\nu ^*=(\nu _1^*,\ldots ,\nu _n^*)\) is a Nash equilibrium in the game for a given state x ∈ X if
$$\displaystyle \begin{aligned} r_i(x,\nu^*)\ge r_i(x,(\nu_i,\nu^*_{-i})) \end{aligned}$$
for every i ∈ N and \(\nu _i\in \Pr (A_i(x)).\) As usual \((\nu _i,\nu ^*_{-i})\) denotes ν∗ with \(\nu ^*_i\) replaced by νi. For any x ∈ X, by \( \mathcal {N} (x)\), we denote the set of all Nash equilibria in the considered game. By Glicksberg (1952), \(\mathcal {N }(x)\neq \emptyset .\) It is easy to see that \(\mathcal {N}( x)\) is compact. Let \(\mathcal {NP}(x)\subset \mathbb {R} ^n\) be the set of payoff vectors corresponding to all Nash equilibria in \( \mathcal {N}( x).\) By co, we denote the convex hull operator in \(\mathbb {R} ^n.\)
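For intuition, the set \(\mathcal {N}(x)\) can be computed explicitly for very small games. The following sketch (with invented payoff matrices, not an example from the chapter) enumerates the pure Nash equilibria of a 2 × 2 bimatrix game and the completely mixed equilibrium obtained from the players’ indifference conditions:

```python
import numpy as np

def nash_2x2(A, B):
    """All Nash equilibria of a 2x2 bimatrix game, returned as pairs (p, q):
    p and q are the probabilities players 1 and 2 put on their first action."""
    eqs = []
    for i in range(2):                        # pure profiles: mutual best responses
        for j in range(2):
            if A[i, j] >= A[1 - i, j] and B[i, j] >= B[i, 1 - j]:
                eqs.append((1.0 - i, 1.0 - j))
    dA = A[0, 0] - A[0, 1] - A[1, 0] + A[1, 1]
    dB = B[0, 0] - B[0, 1] - B[1, 0] + B[1, 1]
    if dA != 0 and dB != 0:                   # completely mixed candidate
        q = (A[1, 1] - A[0, 1]) / dA          # makes player 1 indifferent
        p = (B[1, 1] - B[1, 0]) / dB          # makes player 2 indifferent
        if 0 < p < 1 and 0 < q < 1:
            eqs.append((p, q))
    return eqs

# matching pennies: no pure equilibrium, one completely mixed equilibrium
eqs = nash_2x2(np.array([[1.0, -1.0], [-1.0, 1.0]]),
               np.array([[-1.0, 1.0], [1.0, -1.0]]))
```

For the matching-pennies data the procedure returns only the mixed profile ((1∕2, 1∕2), (1∕2, 1∕2)); letting the matrices vary measurably with x traces out the correspondence \(x\to \mathcal {N}(x)\).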
Proposition 1
The mappings \(x\to {\mathcal {N}}( x)\) , \(x\to {\mathcal {NP}}( x)\) and \(x\to co{\mathcal {NP}}( x)\) are compact-valued and lower measurable.
For a detailed discussion of these statements, consult Nowak and Raghavan (1992), Himmelberg (1975), and Klein and Thompson (1984). By Kuratowski and Ryll-Nardzewski (1965), every set-valued mapping in Proposition 1 has a Borel measurable selection. Making use of standard results on measurable selections (see Castaing and Valadier 1977) and Carathéodory’s theorem, we obtain the following result.
Proposition 2
Let \(b:X\to \mathbb {R}^n\) be a Borel measurable selection of the mapping \(x\to co{\mathcal {NP}}( x).\) Then, there exist Borel measurable selections \(b^i:X\to \mathbb {R}^n\) of the mapping \(x\to {\mathcal {NP}}( x)\) and Borel measurable functions λi : X → [0, 1] (i = 1, …, n + 1) such that for each x ∈ X, we have
$$\displaystyle \begin{aligned} \sum_{i=1}^{n+1}\lambda^i(x)=1\quad\mbox{and}\quad b(x)=\sum_{i=1}^{n+1}\lambda^i(x)b^i(x). \end{aligned}$$
Similarly as in Nowak and Raghavan (1992), from Filippov’s measurable implicit function theorem, we conclude the following facts.
Proposition 3
Let \(p:X\to \mathbb {R}^n\) be a Borel measurable selection of the mapping \(x\to {\mathcal {NP}}( x).\) Then, there exists a Borel measurable selection ψ of the mapping \(x\to {\mathcal {N}}( x)\) such that
$$\displaystyle \begin{aligned} p(x)=\left(r_1(x,\psi(x)),\ldots,r_n(x,\psi(x))\right)\quad\mbox{for each}\ x\in X. \end{aligned}$$
Proposition 4
If \(b:X\to \mathbb {R}^n\) is a Borel measurable selection of the mapping \(x\to co{\mathcal {NP}}( x)\), then there exist Borel measurable selections ψi of the mapping \(x\to {\mathcal {N}}( x)\) and Borel measurable functions λi : X → [0, 1] (i = 1, …, n + 1) such that for each x ∈ X, we have \(\sum _{i=1}^{n+1}\lambda ^i(x)=1\) and
$$\displaystyle \begin{aligned} b(x)=\sum_{i=1}^{n+1}\lambda^i(x)\left(r_1(x,\psi^i(x)),\ldots,r_n(x,\psi^i(x))\right). \end{aligned}$$
The following result plays an important role in studying Nash equilibria in stochastic games with Borel state spaces and can be deduced from Theorem 2 in Mertens (2003) and Filippov’s measurable implicit function theorem. It is related to Lyapunov’s theorem on the range of nonatomic measures and also has some predecessors in control theory; see Artstein (1989).
Proposition 5
Let μ be a nonatomic Borel probability measure on X. Assume that qj (j = 1, …, l) are Borel measurable transition probabilities from X to X and for every j and x ∈ X, qj(⋅|x) ≪ μ, i.e., qj(⋅|x) is dominated by μ. Let \(w^0:X\to \mathbb {R}^n\) be a Borel measurable mapping such that \(w^0(x)\in co{\mathcal {NP}}(x)\) for each x ∈ X. Then there exists a Borel measurable mapping \(v^0:X\times X\to \mathbb {R}^n\) such that \(v^0(x,y)\in {\mathcal {NP}}(y)\) for all x, y ∈ X and
$$\displaystyle \begin{aligned} \int_X w^0(y)\,q_j(dy|x)=\int_X v^0(x,y)\,q_j(dy|x)\quad\mbox{for all}\ x\in X\ \mbox{and}\ j=1,\ldots,l. \end{aligned}$$
Moreover, there exists a Borel measurable mapping \(\phi :X\times X \to \Pr (A)\) such that \(\phi (x,y)\in {\mathcal {N}}(y)\) for all x, y ∈ X and v0(x, y) is the vector of equilibrium payoffs corresponding to ϕ(x, y).
Let L be a lattice contained in Euclidean space \(\mathbb {R}^k\) equipped with the component-wise order ≥. For any x, y ∈ L, x ∨ y (x ∧ y) denotes the join (meet) of x and y. A function \(\phi :L\to \mathbb {R}\) is supermodular if for any x, y ∈ L, it holds that
$$\displaystyle \begin{aligned} \phi(x\vee y)+\phi(x\wedge y)\ge \phi(x)+\phi(y). \end{aligned}$$
Clearly, if k = 1, then any function ϕ is supermodular. Let \(L_1\subset \mathbb {R}^k\), \(L_2\subset \mathbb {R}^l\) be lattices. A function \( \psi :L_1\times L_2\to \mathbb {R}\) has increasing differences in (x, y) if for every x′≥ x in L1, ψ(x′, y) − ψ(x, y) is non-decreasing in y. Let the set Ai of pure strategies of player i ∈ N be a compact convex subset of Euclidean space \(\mathbb {R}^{m_i}.\) An element ai of Ai is denoted by \(a_i =(a_{i1}, a_{i2},\ldots ,a_{im_i}).\) We consider an n-person game G0 in which \(R_i:A \to \mathbb {R}\) is the payoff function of player i ∈ N and A := A1 ×⋯ × An. As usual, any strategy profile a = (a1, a2, …, an) can also be denoted as (ai, a−i) for i ∈ N.
Assume that every Ai is a lattice. The game G0 is called supermodular if for every player i ∈ N and a−i, the function ai → Ri(ai, a−i) is supermodular and Ri has increasing differences in (ai, a−i).
It is well known that any supermodular game with continuous utility functions and compact strategy sets Ai has a pure Nash equilibrium; see Topkis (1998) or Theorems 4 and 5 in Milgrom and Roberts (1990).
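A minimal numerical illustration of this fact (the quadratic payoffs below are a toy choice of ours, not an example from the text): starting best-response iteration from the smallest profile of the strategy lattice produces a monotone non-decreasing sequence that converges to the least pure Nash equilibrium.

```python
def best_response(a_other):
    # Toy payoff R_i(a_i, a_other) = a_i * (1 + 0.5 * a_other) - a_i**2
    # on the lattice [0, 1]^2; d^2 R_i / (da_i da_other) = 0.5 > 0, so the
    # game has increasing differences (one-dimensional actions, hence
    # supermodularity of a_i -> R_i is automatic).
    # First-order condition: 1 + 0.5 * a_other - 2 * a_i = 0.
    return min(1.0, max(0.0, (1.0 + 0.5 * a_other) / 2.0))

a = (0.0, 0.0)                   # smallest point of the strategy lattice
for _ in range(100):             # monotone best-response (Topkis) iteration
    a = (best_response(a[1]), best_response(a[0]))
# the iterates increase to the least equilibrium a_1 = a_2 = 2/3
```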
The game G0 is called smooth if every Ri can be extended from A to an open superset Ao in such a way that its second-order partial derivatives exist and are continuous on Ao.
A game G0 is called a smooth supermodular game if for every player i ∈ N,
-
(a)
Ai is a compact interval in \(\mathbb {R}^{m_i}\),
-
(b)
\( \frac {\partial ^2 R_i}{\partial a_{ij} \partial a_{ik}}\ge 0\) on A for all 1 ≤ j < k ≤ mi,
-
(c)
\(\frac {\partial ^2 R_i}{\partial a_{ij} \partial a_{kl}}\ge 0\) on A for each k ≠ i and all 1 ≤ j ≤ mi, 1 ≤ l ≤ mk.
It is well known that any game satisfying conditions (a)–(c) is supermodular. Conditions (a) and (b) imply that Ri is a supermodular function with respect to ai for fixed a−i, while conditions (a) and (c) imply that Ri has increasing differences in (ai, a−i). For a detailed discussion of these issues, see Topkis (1978, 1998) or Theorem 4 in Milgrom and Roberts (1990).
In order to obtain uniqueness of equilibrium in a smooth supermodular game G0, one needs an additional assumption, often called a strict diagonal dominance condition; see page 1271 in Milgrom and Roberts (1990) or Rosen (1965). As noted by Curtat (1996), this condition can be stated for smooth supermodular games as follows. Let Mi := {1, 2, …, mi}.
-
(C1)
For every i ∈ N and j ∈ Mi,
$$\displaystyle \begin{aligned} \frac{\partial^2 R_i}{\partial a^2_{ij}} +\sum_{l\in M_i\setminus \{j\}} \frac{\partial^2 R_i}{\partial a_{ij} \partial a_{il}}+\sum_{k\in N\setminus\{i\}} \sum_{l\in M_k} \frac{\partial^2 R_i}{\partial a_{ij} \partial a_{kl}}<0. \end{aligned}$$
From Milgrom and Roberts (1990) and page 187 in Curtat (1996), we obtain the following auxiliary result.
Proposition 6
Any smooth supermodular game G0 satisfying condition (C1) has a unique pure Nash equilibrium.
Assume now that the payoff functions Ri are parameterized by τ in some partially ordered set T, i.e., \(R_i: A\times T\to \mathbb {R}.\) We introduce a new condition.
-
(C2)
\(\frac {\partial ^2 R_i}{\partial a_{ij} \partial \tau }\ge 0\) for all 1 ≤ j ≤ mi, and i ∈ N.
It is known that the set of Nash equilibria in any supermodular game G0 is a lattice and has the smallest and the largest elements. The following result follows from Theorem 7 in Milgrom and Roberts (1990).
Proposition 7
Suppose that a smooth supermodular game satisfies (C2). Then, the largest and smallest pure Nash equilibria are non-decreasing functions of τ.
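This comparative-statics statement can be checked on the same kind of toy quadratic game as before (payoffs of our own choosing, not from the text), now parameterized by τ; condition (C2) holds because the cross-derivative in (ai, τ) equals 1.

```python
def equilibrium(tau):
    """Unique (symmetric) equilibrium of the toy parameterized game
    R_i(a, tau) = a_i * (tau + 0.5 * a_{-i}) - a_i**2 on [0, 1]^2,
    computed by monotone best-response iteration from the smallest profile.
    (C2) holds here: d^2 R_i / (da_i dtau) = 1 >= 0."""
    a = 0.0
    for _ in range(200):
        # best response to the symmetric profile a: (tau + 0.5 * a) / 2
        a = min(1.0, max(0.0, (tau + 0.5 * a) / 2.0))
    return a

eqs = [equilibrium(t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
# the equilibria (equal to 2*tau/3) are non-decreasing in tau,
# exactly as Proposition 7 predicts
```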
3 Subgame-Perfect Equilibria in Stochastic Games with General State Space
We consider an n-person nonzero-sum discounted stochastic game G as defined below.
-
\((X,\mathcal {B}(X))\) is a nonempty Borel state space with its Borel σ-algebra \(\mathcal {B}(X).\)
-
Ai is a Borel space of actions for player i ∈ N := {1, …, n}.
-
Ai(x) ⊂ Ai is a set of actions available to player i ∈ N in state x ∈ X. The correspondence x → Ai(x) is lower measurable and compact-valued. Define
$$\displaystyle \begin{aligned} A:=A_1\times \ldots\times A_n\quad\mbox{and}\quad A(x)=A_1(x)\times\ldots\times A_n(x),\quad\;x\in X. \end{aligned}$$
-
\(u_i:X\times A \to \mathbb {R}\) is a Borel measurable bounded utility (or payoff) function for player i ∈ N. It is assumed that ui(x, ⋅) is continuous on A for every x ∈ X.
-
\(q:X\times A\times \mathcal {B}(X)\to [0,1]\) is a transition probability. We assume that q(D|x, ⋅) is continuous on A for each x ∈ X and \(D\in \mathcal {B}(X).\)
-
β ∈ (0, 1) is a discount factor.
Every stage of the game begins with a state x ∈ X, and after observing x, the players simultaneously choose their actions ai ∈ Ai(x) (i ∈ N) and obtain payoffs ui(x, a), where a = (a1, …, an). A new state x′ is realized from the distribution q(⋅|x, a), and a new period begins with payoffs discounted by β. Let H1 = X and let Ht be the set of all plays ht = (x1, a1, …, xt−1, at−1, xt), where \( a^k=(a_1^k,\ldots ,a_n^k)\in A(x_k)\), k = 1, …, t − 1. A strategy for player i ∈ N is a sequence \(\pi _i=(\pi _{it})_{t\in \mathbb {N}}\) of Borel measurable transition probabilities from Ht to Ai such that πit(Ai(xt)|ht) = 1 for each ht ∈ Ht. The set of strategies for player i ∈ N is denoted by Πi. We let Π := Π1 ×… × Πn. Let Fi (\(F^0_i\)) be the set of all Borel measurable mappings \(f_i: X\times X \to \Pr (A_i)\) (\(\phi _i: X \to \Pr (A_i)\)) such that \(f_i(x^-,x) \in \Pr (A_i(x))\) (\(\phi _i(x) \in \Pr (A_i(x))\)) for each x−, x ∈ X. A stationary almost Markov strategy for player i ∈ N is a constant sequence \((\pi _{it})_{t\in \mathbb {N}}\), where πit = fi for some fi ∈ Fi and for all \(t\in \mathbb {N}.\) If xt is the state of the game at its t-th stage with t ≥ 2, then player i chooses an action using the mixed strategy fi(xt−1, xt). The mixed strategy used at an initial state x1 is fi(x1, x1). The set of all stationary almost Markov strategies for player i ∈ N is identified with the set Fi. A stationary Markov strategy for player i ∈ N is identified with a Borel measurable mapping \(f_i\in F_i^0.\) We say that πi = (πi1, πi2, …) ∈ Πi is a Markov strategy for player i if \(\pi _{it}\in F^0_i\) for all \(t\in \mathbb {N}.\)
Any strategy profile π = (π1, …, πn) ∈ Π together with an initial state x = x1 ∈ X determines a unique probability measure \(P^\pi _x\) on the space H∞ of all plays h∞ = (x1, a1, x2, a2, …) endowed with the product σ-algebra. The expected discounted payoff or utility function for player i ∈ N is
$$\displaystyle \begin{aligned} J_{\beta}^{i,T}(x,\pi):=E^{\pi}_x\left(\sum_{t=1}^{T}\beta^{t-1}u_i(x_t,a^t)\right),\quad T\le\infty. \end{aligned}$$
We shall write \(J_{\beta }^i(x,\pi )\), if T = ∞.
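For a fixed stationary Markov profile f, the vector \(x\to J_{\beta }^i(x,f)\) is the unique solution of a linear “policy evaluation” system, a fact used repeatedly later via the Bellman equations. A self-contained numerical sketch with a two-state example of our own making (the stage payoffs and transition matrix are invented):

```python
import numpy as np

beta = 0.9                        # discount factor
# Stage payoffs u_f(x) for one player and transition matrix Q_f(x, y)
# induced by some fixed stationary Markov profile f (illustrative data).
u_f = np.array([1.0, 0.0])
Q_f = np.array([[0.7, 0.3],
                [0.4, 0.6]])
# J = u_f + beta * Q_f J, i.e. (I - beta * Q_f) J = u_f
J = np.linalg.solve(np.eye(2) - beta * Q_f, u_f)
# J ~= [6.30, 4.93]: state 0, which yields the higher stage payoff,
# also yields the higher discounted payoff
```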
A profile of strategies π∗∈ Π is called a Nash equilibrium if
$$\displaystyle \begin{aligned} J_{\beta}^{i}(x,\pi^*)\ge J_{\beta}^{i}(x,(\pi_i,\pi^*_{-i}))\quad\mbox{for every}\ x\in X,\ i\in N\ \mbox{and}\ \pi_i\in\Pi_i. \end{aligned}$$
A stationary almost Markov (stationary Markov) perfect equilibrium is a Nash equilibrium that belongs to the class of strategy profiles F := F1 ×… × Fn (\( F^0:=F^0_1\times \ldots \times F^0_n\)). A Markov perfect equilibrium, on the other hand, is a Nash equilibrium π∗, in which \(\pi ^*_{it}=f_{it} \) and \(f_{it}\in F_i^0\) for every \(t\in \mathbb {N}\) and every player i ∈ N. The strategies involved in such an equilibrium are called “markovian,” “state-contingent,” or “payoff-relevant”; see Maskin and Tirole (2001). Clearly, every stationary Markov perfect equilibrium is also a Markov perfect equilibrium.
Let π = (π1, …, πn) ∈ Π and ht ∈ Ht. By πi[ht] we denote the conditional strategy for player i that can be applied from stage t onward. Let π[ht] = (π1[ht], …, πn[ht]). Using this notation, one can say that π∗ is a subgame-perfect equilibrium in the stochastic game if for any \(t\in \mathbb {N}\) and every partial history ht ∈ Ht, π∗[ht] is a Nash equilibrium in the subgame starting at xt, where xt is the last coordinate in ht. This definition refers to the classical idea of Selten (1975).
Let B(X) be the space of all bounded Borel measurable real-valued functions on X and Bn(X) := B(X) ×⋯ × B(X) (n times). Similarly define B(X × X) and Bn(X × X). With any x ∈ X and v = (v1, …, vn) ∈ Bn(X), we associate the one-shot game Γv(x) in which the payoff function to player i ∈ N is
$$\displaystyle \begin{aligned} U_{\beta}^i(v_i,x,a):= u_i(x,a)+\beta\int_X v_i(y)\,q(dy|x,a),\quad a\in A(x). \end{aligned}$$
If \(\nu =(\nu _1,\ldots ,\nu _n)\in \Pr (A(x))\), then
$$\displaystyle \begin{aligned} U_{\beta}^i(v_i,x,\nu):=\int_{A_1(x)}\cdots\int_{A_n(x)} U_{\beta}^i(v_i,x,a_1,\ldots,a_n)\,\nu_n(da_n)\cdots\nu_1(da_1), \end{aligned}$$
and if f = (f1, …, fn) ∈ F0, then \(U_{\beta }^i(v_i,x,f)= U_{\beta }^i(v_i,x,\nu )\) with ν = (f1(x), …, fn(x)). Further, \( U_{\beta }^i(v_i,x,(\mu _i,f_{-i}))=U_{\beta }^i(v_i,x,\nu )\) with νi = μi, νj = fj(x) for j ≠ i. Under our assumptions, \(a\to U_{\beta }^i(v_i,x,a)\) is continuous on A(x) for every vi ∈ B(X), x ∈ X, i ∈ N. Let \(\mathcal {N}_v(x)\) be the set of all Nash equilibria in the game Γv(x). By \(\mathcal {NP}_v(x)\) we denote the set of payoff vectors corresponding to all Nash equilibria in \(\mathcal {N}_v(x).\) Let \( \mathcal {M}_v\) be the set of all Borel measurable selections of the set-valued mapping \(x\to \mathcal {N}_v(x).\) We know from Proposition 1 that \(\mathcal {M}_v\neq \emptyset .\)
Consider a T-stage game (2 ≤ T < ∞). Assume that the (T − 1)-stage subgame starting at any state x2 ∈ X has a Markov perfect equilibrium, say \(\pi ^*_{T-1}.\) Let \(v_{T-1}^*\) be the vector payoff function in Bn(X) determined by \(\pi ^*_{T-1}.\) Then we can get some \(f^*\in \mathcal {M} _{v_{T-1}^*}\) and define \(\pi ^*_T:=(f^*,\pi ^*_{T-1}).\) It is obvious that \( \pi ^*_T\) is a Markov perfect equilibrium in the T-stage game. This fact was proved by Rieder (1979) and we state it below.
Theorem 1
Every finite-stage nonzero-sum discounted stochastic game satisfying the above conditions has a subgame-perfect equilibrium. For any ε > 0, there exists an ε-equilibrium πε in Markov strategies, i.e.,
$$\displaystyle \begin{aligned} J_{\beta}^{i}(x,(\pi_i,\pi^\varepsilon_{-i}))\le J_{\beta}^{i}(x,\pi^\varepsilon)+\varepsilon\quad\mbox{for every}\ x\in X,\ i\in N\ \mbox{and}\ \pi_i\in\Pi_i. \end{aligned}$$
Note that the ε-equilibrium in the second part of this theorem need not have the subgame-perfection property.
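The backward induction described above can be sketched numerically. To guarantee that every auxiliary one-shot game Γv(x) has a pure Nash equilibrium, we take a common-interest game (u1 = u2), in which the argmax of the common payoff is always an equilibrium; all numerical data below are invented for illustration.

```python
import itertools
import numpy as np

beta = 0.9
X, Acts = [0, 1], [0, 1]                        # states and per-player actions
# Common stage payoff u_1 = u_2: reward for matching actions, bonus in state 1.
u = {(x, a): float((a[0] == a[1]) + x)
     for x in X for a in itertools.product(Acts, Acts)}
# Transitions: matched actions keep state-dependent dynamics, mismatches mix.
q = {(x, a): np.array([0.5, 0.5]) if a[0] != a[1]
     else np.array([1.0 - 0.5 * x, 0.5 * x])
     for x in X for a in itertools.product(Acts, Acts)}

def pure_nash(U):
    """Pure equilibria of a one-shot game with common payoff dict U[(a1, a2)]."""
    return [a for a in U
            if U[a] >= max(U[(b, a[1])] for b in Acts)
            and U[a] >= max(U[(a[0], b)] for b in Acts)]

v = np.zeros(len(X))                            # terminal values
policy = []
for _ in range(5):                              # 5-stage game, backward in time
    f, v_new = {}, np.zeros(len(X))
    for x in X:
        # auxiliary game Gamma_v(x): U(a) = u(x, a) + beta * E_q v
        U = {a: u[(x, a)] + beta * q[(x, a)] @ v
             for a in itertools.product(Acts, Acts)}
        a_star = max(pure_nash(U), key=U.get)   # select a stage equilibrium
        f[x], v_new[x] = a_star, U[a_star]
    policy.insert(0, f)                         # Markov perfect equilibrium
    v = v_new
```

The resulting `policy` is a Markov perfect equilibrium of the 5-stage game in the sense of the recursion above; with identical interests the selection step never fails, which is exactly why a general selection argument (Proposition 1) is needed in the non-common-interest case.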
We now make an additional assumption.
-
(A1)
The transition probability q is norm continuous in actions, i.e., for each x ∈ X and any sequence ak → a0 in A(x) as k →∞, it follows that
$$\displaystyle \begin{aligned} \sup_{D\in \mathcal{B}(X)}|q(D|x,a^k)-q(D|x,a^0)|\to 0. \end{aligned}$$
Condition (A1) is quite restrictive, but it is satisfied if q has a conditional density, with respect to some probability measure on X, that is continuous in actions.
Theorem 2
Every discounted nonzero-sum stochastic game G satisfying (A1) has a subgame-perfect equilibrium.
Theorem 2 was proved in a more general form by Mertens and Parthasarathy (2003), where the payoffs and discount factors may depend on time and the state space is a general measurable space. A special case was considered by Mertens and Parthasarathy (1991), who assumed that the action sets are finite and state independent and transitions are dominated by some probability measure on X. The proofs given in Mertens and Parthasarathy (1991, 2003) are based upon studying a specified fixed point property of an operator defined on the class of measurable selections of compact set-valued mappings from the state space to the payoff space. The fixed point obtained in that class is used to define, in a recursive way, a subgame-perfect equilibrium that consists of history-dependent strategies (unbounded memory is assumed). For further comments on possible extensions of Theorem 2, the reader is referred to Mertens (2002) and Mertens et al. (2015). A modified proof of their results was provided by Solan (1998), who analyzed accumulation points of ε-equilibria (as ε → 0) obtained in Theorem 1.
Assume that Ai(x) = Ai for each x ∈ X and i ∈ N and that every space Ai is compact. Let X and A1, …, An be given the discrete topology. According to Maitra and Sudderth (2007), a function \(g:H_{\infty } \to \mathbb {R} \) is DS-continuous on H∞ if it is continuous on H∞ endowed with the product topology. It is easy to see that g is DS-continuous on H∞ if and only if, for any ε > 0 and y = (y1, y2, …) ∈ H∞, there exists m such that |g(y) − g(y′)| < ε for each \(y^{\prime }=(y_1^{\prime },y_2^{\prime },\ldots )\in H_{\infty }\) such that \(y_l=y^{\prime }_l\) for 1 ≤ l ≤ m. Suppose that \(g_i:H_{\infty }\to \mathbb {R}\) is a bounded Borel measurable payoff function for player i ∈ N. For any strategy profile π ∈ Π and every initial state x = x1, the expected payoff to player i is \(E^\pi _x(g_i).\) The subgame-perfect equilibrium can be defined for this game in the usual way. Maitra and Sudderth (2007) (see Theorem 1.2) obtained a general theorem on the existence of subgame-perfect equilibria for stochastic games.
Theorem 3
Let the payoff functions gi, i ∈ N be bounded, Borel measurable, and DS-continuous on H∞ and let the action spaces Ai, i ∈ N be finite. Then the game has a subgame-perfect equilibrium.
The proof of Theorem 3 applies some techniques from gambling theory described in Dubins and Savage (2014), i.e., approximations of DS-continuous functions by “finitary functions”. Theorem 3 extends a result due to Fudenberg and Levine (1983). An example given in Harris et al. (1995) shows that Theorem 3 is false, if the action spaces are compact metric and the transition probability q is weakly continuous.
The next result was proved by Maitra and Sudderth (2007) (see Theorem 1.3) for “additive games” and reads as follows.
Theorem 4
Assume that every action space is compact and the transition probability satisfies (A1). Assume that \(g_i(h_{\infty })=\sum _{t=1}^\infty r_{it}(x_t,a^t)\) and this series converges uniformly on H∞. If, in addition, every function rit is bounded, rit(⋅, a) is Borel measurable on X for each a ∈ A := A1 ×⋯ × An and rit(x, ⋅) is continuous on A for each x ∈ X, then the game has a subgame-perfect equilibrium.
It is worth emphasizing that stationary Markov perfect equilibria may not exist in the games considered in this section. Namely, Levy (2013) gave an example of a discounted game with uncountable state space, finite action sets, and deterministic transitions that has no stationary Markov perfect equilibrium. Then, Levy and McLennan (2015) showed that stationary Markov perfect equilibria may not exist even if the action spaces are finite, X = [0, 1] and the transition probability has a density function with respect to some measure \(\mu \in \Pr (X).\) A simple modification of the example given in Levy and McLennan (2015) shows that a new game (with X = [0, 2]) need not have a stationary Markov perfect equilibrium, when the measure μ (dominating the transition probability q) is nonatomic.
4 Correlated Equilibria with Public Signals in Games with Borel State Spaces
Correlated equilibria for normal form games were first studied by Aumann (1974, 1987). In this section we describe an extensive-form correlated equilibrium with public randomization inspired by the work of Forges (1986). A further discussion of correlated equilibria and communication in games can be found in Forges (2009). The sets of all equilibrium payoffs in extensive-form games that include a general communication device are characterized by Solan (2001).
We now extend the sets of strategies available to the players in the sense that we allow them to correlate their choices in some natural way. Suppose that \((\xi _t)_{t\in \mathbb {N}}\) is a sequence of so-called signals, drawn independently from [0, 1] according to the uniform distribution. Suppose that at the beginning of each period t of the game the players are informed not only of the outcome of the preceding period and the current state xt, but also of ξt. Then, the information available to them is a vector ht = (x1, ξ1, a1, …, xt−1, ξt−1, at−1, xt, ξt), where xτ ∈ X, ξτ ∈ [0, 1], aτ ∈ A(xτ), 1 ≤ τ ≤ t − 1. We denote the set of such vectors by Ht. An extended strategy for player i is a sequence \(\pi _i=(\pi _{it})_{t\in \mathbb {N}}\), where πit is a Borel measurable transition probability from Ht to Ai such that πit(Ai(xt)|ht) = 1 for each ht ∈ Ht. An extended stationary strategy for player i ∈ N can be identified with a Borel measurable mapping \(f:X \times [0,1]\to \Pr (A_i)\) such that f(Ai(x)|x, ξ) = 1 for all (x, ξ) ∈ X × [0, 1]. Assuming that the players use extended strategies we actually assume that they play the stochastic game with the extended state space X × [0, 1]. The law of motion, say p, in the extended state space model is obviously the product of the original law of motion q and the uniform distribution η on [0, 1]. More precisely, for any x ∈ X, ξ ∈ [0, 1], a ∈ A(x), Borel sets C ⊂ X and D ⊂ [0, 1], p(C × D|x, ξ, a) = q(C|x, a)η(D). For any profile of extended strategies π = (π1, …, πn) of the players, the expected discounted payoff to player i ∈ N is a function of the initial state x1 = x and the first signal ξ1 = ξ and is denoted by \(J_{\beta }^i(x,\xi ,\pi ).\) We say that \(f^*=(f^*_1,\ldots ,f^*_n)\) is a Nash equilibrium in the β-discounted stochastic game in the class of extended strategies if for each initial state x1 = x, i ∈ N and every extended strategy πi of player i, we have
$$\displaystyle \begin{aligned} J_{\beta}^i(x,\xi,f^*)\ge J_{\beta}^i(x,\xi,(\pi_i,f^*_{-i}))\quad\mbox{for all}\ \xi\in[0,1]. \end{aligned}$$
The Nash equilibrium in extended strategies is also called a correlated equilibrium with public signals. The reason is that after the outcome of any period of the game, the players can coordinate their next choices by exploiting the next (known to all of them, i.e., public) signal and using some coordination mechanism telling which (pure or mixed) action is to be played by everyone. In many applications, we are particularly interested in stationary equilibria. In such a case the coordination mechanism can be represented by a family of n + 1 Borel measurable functions λj : X → [0, 1] such that \(\sum _{j=1}^{n+1}\lambda ^j(x)=1\) for each x ∈ X. A stationary correlated equilibrium can be constructed then by using a family of n + 1 stationary strategies \(f^1_i,\ldots ,f^{n+1}_i\) given for every player i, and the following coordination rule. If the game is in state xt = x on stage t and a random number ξt = ξ is selected, then player i ∈ N is suggested to use \(f_i^k(\cdot |x)\) where k is the least index for which \( \sum _{j=1}^k\lambda ^j(x)\ge \xi .\) The functions λj and \(f^j_i\) induce an extended stationary strategy \(f^*_i\) for every player i as follows
$$\displaystyle \begin{aligned} f^*_i(x,\xi):=f^1_i(\cdot|x)\quad\mbox{if}\quad 0\le\xi\le\lambda^1(x), \end{aligned}$$
and
$$\displaystyle \begin{aligned} f^*_i(x,\xi):=f^k_i(\cdot|x)\quad\mbox{if}\quad \sum_{j=1}^{k-1}\lambda^j(x)<\xi\le\sum_{j=1}^{k}\lambda^j(x), \end{aligned}$$
for x ∈ X, 2 ≤ k ≤ n + 1. Because the signals are independent and uniformly distributed in [0, 1], it follows that at any period of the game and for any current state x, the number λj(x) can be interpreted as the probability that player i is suggested to use \(f^j_i(\cdot |x)\) as a mixed action.
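The coordination rule “choose the least index k with λ1(x) + ⋯ + λk(x) ≥ ξ” is easy to simulate; the weights below are hypothetical values at one fixed state x. The empirical frequency of each suggestion indeed approximates λj(x):

```python
import random

lam = [0.2, 0.5, 0.3]            # hypothetical weights lambda^j(x), summing to 1

def suggested_index(xi):
    """Least k (1-based) with lambda^1(x) + ... + lambda^k(x) >= xi."""
    total = 0.0
    for k, w in enumerate(lam, start=1):
        total += w
        if total >= xi:
            return k
    return len(lam)              # guard against rounding when xi is close to 1

random.seed(0)
n = 100_000
freq = [0.0] * len(lam)
for _ in range(n):               # public signals drawn uniformly from [0, 1]
    freq[suggested_index(random.random()) - 1] += 1.0 / n
# freq approximates lam, so lambda^j(x) is indeed the probability that
# the j-th stationary strategy is suggested to every player
```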
-
(A2)
Let \(\mu \in \Pr (X).\) There exists a conditional density function ρ for q with respect to μ such that if ak → a0 in A(x), x ∈ X, as k →∞, then
$$\displaystyle \begin{aligned} \lim_{k\to\infty}\int_X|\rho(x,a^k,y)-\rho(x,a^0,y)|\mu(dy) =0. \end{aligned}$$
Theorem 5
Any discounted stochastic game G satisfying (A2) has a stationary correlated equilibrium with public signals.
Theorem 5 was proved by Nowak and Raghavan (1992). First it is shown by making use of a theorem in Glicksberg (1952) that the correspondence \(v\to \mathcal {M}_v\) has a fixed point, i.e., there exists w∗∈ Bn(X) such that \(w^*(x)\in co\mathcal {NP}_{w^*}(x)\) for all x ∈ X. Then, applying Propositions 2 and 4, one can prove the existence of a stationary correlated equilibrium with public signals for the game with the payoff functions \(U_{\beta }^i(w^*_i,x,a)\) defined in (6.1). A verification that f∗ obtained in this way is indeed a Nash equilibrium in the game with the extended state space X × [0, 1] relies on using standard Bellman equations for discounted dynamic programming; see Blackwell (1965) or Puterman (1994). Observe also that the set of all atoms Da for μ is countable. A refinement of the above result is Theorem 2 in Jaśkiewicz and Nowak (2016), where it is shown that public signals are important only in states belonging to the set X ∖ Da. A similar result on correlated equilibria was given in Nowak and Jaśkiewicz (2005) for semi-Markov games with Borel state spaces and the expected average payoffs. This result, in turn, was proved under geometric drift conditions (GE1)–(GE3) formulated in Sect. 5 in Jaśkiewicz and Nowak (2018a).
Condition (A2) can be replaced in the proof (with minor changes) by assumption (A1) on norm continuity of q with respect to actions. A result similar to Theorem 5 was given by Duffie et al. (1994), where it was assumed that for any x, x′∈ X, a ∈ A(x), a′∈ A(x′), the measures q(⋅|x, a) and q(⋅|x′, a′) are mutually absolutely continuous.
In addition, Duffie et al. (1994) required the continuity of the payoffs and transitions with respect to actions. Thus, the result in Duffie et al. (1994) is weaker than Theorem 5. Moreover, they also established the ergodicity of the Markov chain induced by a stationary correlated equilibrium. Their proof is different from that of Nowak and Raghavan (1992). Subgame-perfect correlated equilibria were also studied by Harris et al. (1995) for games with weakly continuous transitions and general continuous payoff functions on the space of infinite plays endowed with the product topology. Harris et al. (1995) gave an example showing that public signals play an important role. They proved that the subgame-perfect equilibrium path correspondence is upper hemicontinuous. Later, Reny and Robson (2002) provided a shorter and simpler proof of existence that focuses on considerations of equilibrium payoffs rather than paths. Some comments on correlated equilibria for games with finitely many states or different payoff evaluation will be given in the sequel.
5 Stationary Equilibria in Discounted Stochastic Games with Borel State Spaces
In this section, we introduce the following condition.
-
(A3)
There exist l Carathéodory functions αj : X × A → [0, 1] such that \(\sum _{j=1}^l\alpha _j(x,a)=1\) for every (x, a) ∈ X × A and Borel measurable transition probabilities \(q_j:X\times \mathcal {B}(X)\to [0,1]\) such that
$$\displaystyle \begin{aligned} q(\cdot|x,a) = \sum_{j=1}^l \alpha_j(x,a)q_j(\cdot|x), \quad (x,a)\in X\times A.\end{aligned} $$
Additionally, every qj(⋅|x) is dominated by some \(\mu \in \Pr (X).\)
We can now state a result due to Jaśkiewicz and Nowak (2016).
Theorem 6
Assume that the game G satisfies (A3). Then, G has a stationary almost Markov perfect equilibrium.
We outline the proof of Theorem 6 for a nonatomic measure μ. The general case requires additional notation. First, we show that there exists a Borel measurable mapping w∗∈ Bn(X) such that \(w^*(x) \in co\mathcal {NP}_{w^*}(x)\) for all x ∈ X. This result is obtained by applying a generalization of the Kakutani fixed point theorem due to Glicksberg (1952). (Note that closed balls in Bn(X) are compact in the weak-star topology by the Banach–Alaoglu theorem.) Second, applying Proposition 5, we conclude that there exists some v∗∈ Bn(X × X) such that
Hence, by (A3) we infer that
Moreover, we know that \(v^*(x,y)\in \mathcal {NP}_{v^*}(y)\) for all states x and y. Furthermore, making use of Filippov’s measurable implicit function theorem (as in Proposition 5), we claim that v∗(x, y) is the vector of equilibrium payoffs corresponding to some stationary almost Markov strategy profile. Finally, we utilize the system of n Bellman equations to provide a characterization of stationary equilibrium and to deduce that this profile is indeed a stationary almost Markov perfect equilibrium. For details the reader is referred to Jaśkiewicz and Nowak (2016).
Corollary 1
Consider a game where the set A is finite and the transition probability q is Borel measurable. Then, the game has a stationary almost Markov perfect equilibrium.
Proof
We show that the game meets (A3). Let \(m\in \mathbb {N}\) be such that A = {a1, …, am}. Now, for j = 1, …, m, define
$$\displaystyle \begin{aligned} g_j(x,a):=\begin{cases} 1, & a=a_j,\\ 0, & a\neq a_j,\end{cases} \qquad q_j(\cdot|x):=q(\cdot|x,a_j). \end{aligned}$$
Then, \(q(\cdot |x,a)=\sum _{j=1}^l g_j(x,a)q_j(\cdot |x)\) for l = m and the conclusion follows from Theorem 6.
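The decomposition used in this proof is easy to check numerically. The following Python sketch (our illustration; the three-state kernel below is a hypothetical example, not taken from the survey) verifies that a finite-action transition kernel satisfies (A3) with indicator weights:

```python
# (A3) decomposition for a finite joint action set A = {a_1, ..., a_m}:
# q(.|x, a) = sum_j g_j(x, a) q_j(.|x), where g_j(x, a) = 1 iff a = a_j.

STATES = ["s0", "s1", "s2"]
ACTIONS = ["a1", "a2"]

# A hypothetical transition kernel q(y|x, a); each row sums to 1.
Q = {("s0", "a1"): {"s0": 0.2, "s1": 0.8, "s2": 0.0},
     ("s0", "a2"): {"s0": 0.5, "s1": 0.0, "s2": 0.5},
     ("s1", "a1"): {"s0": 1.0, "s1": 0.0, "s2": 0.0},
     ("s1", "a2"): {"s0": 0.0, "s1": 0.3, "s2": 0.7},
     ("s2", "a1"): {"s0": 0.0, "s1": 0.0, "s2": 1.0},
     ("s2", "a2"): {"s0": 0.1, "s1": 0.9, "s2": 0.0}}

def g(j, a):
    """Caratheodory weight: the indicator that a equals the j-th action."""
    return 1.0 if a == ACTIONS[j] else 0.0

def q_j(j, x):
    """The action-free kernels q_j(.|x) := q(.|x, a_j)."""
    return Q[(x, ACTIONS[j])]

# Verify q(.|x, a) = sum_j g(j, a) q_j(.|x) for all (x, a).
for x in STATES:
    for a in ACTIONS:
        for y in STATES:
            mix = sum(g(j, a) * q_j(j, x)[y] for j in range(len(ACTIONS)))
            assert abs(mix - Q[(x, a)][y]) < 1e-12
```

The identity holds by construction, which is exactly the point of the proof: with finitely many actions, the weights can always be taken to be indicators.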
Remark 1
Corollary 1 extends the result of Mertens and Parthasarathy (1991), where it is additionally assumed that Ai(x) = Ai for all x ∈ X, i ∈ N and that μ is nonatomic; see Comment on page 147 in Mertens and Parthasarathy (1991) or Theorem VII.1.8 on page 398 in Mertens et al. (2015). If μ admits some atoms, then they proved the existence of a subgame-perfect equilibrium in which the strategy of player i ∈ N is of the form (fi1, fi2, …) with \(f_{it}\in F_i^0\) for each \(t\in \mathbb {N}.\) Thus, the equilibrium strategy of player i ∈ N is stage-dependent.
Remark 2
It is worth emphasizing that the equilibria established in Theorem 6 are subgame-perfect. A result related to Theorem 6 is given in Barelli and Duggan (2014). The assumption imposed on the transition probability in their paper is weaker, but an equilibrium is shown to exist in the class of stationary semi-Markov strategies, where the players take into account the current state, the previous state and the actions chosen by the players in the previous state.
Remark 3
As already mentioned in Sect. 3, Levy and McLennan (2015) constructed a stochastic game that does not have a stationary Markov perfect equilibrium. In their model, each set Ai is finite, Ai(x) = Ai for every i ∈ N, x ∈ X, and the transition law is a convex combination of a probability measure (depending on the current state) and the Dirac measure concentrated at some state. Such a model satisfies the absolute continuity condition. Hence, their example confirms that one cannot expect to obtain an equilibrium in stationary Markov strategies even for games with finite action spaces. Therefore, Corollary 1 is meaningful.
Remark 4
By Urysohn’s metrization theorem (see Theorem 3.40 in Aliprantis and Border 2006), every action space Ai can be embedded homeomorphically in the Hilbert cube. The action correspondences remain measurable and compact-valued after the embedding. Therefore, one can assume without loss of generality as in Jaśkiewicz and Nowak (2016) that the action spaces are compact metric.
A stochastic game with additive reward and additive transitions (ARAT for short) satisfies a separability condition on the actions of the players. To simplify the presentation, we assume that N = {1, 2}. The payoff function for player i ∈ N is of the form
$$\displaystyle \begin{aligned} u_i(x,a_1,a_2)= u_i^1(x,a_1)+u_i^2(x,a_2), \end{aligned}$$
where x ∈ X, a1 ∈ A1(x), a2 ∈ A2(x), and similarly
$$\displaystyle \begin{aligned} q(\cdot|x,a_1,a_2)= q_1(\cdot|x,a_1)+q_2(\cdot|x,a_2), \end{aligned}$$
where q1 and q2 are Borel measurable subtransition probabilities dominated by some \(\mu \in \Pr (X).\)
The following result was proved in Jaśkiewicz and Nowak (2015a).
Theorem 7
If μ is a nonatomic probability measure and the action sets A1 and A2 are finite, then the ARAT stochastic game has a Nash equilibrium in pure stationary almost Markov strategies.
The separability condition in ARAT games can easily be generalized to the n-person case. Assumptions of a similar type are often used in differential games; see Başar and Olsder (1995). ARAT stochastic games with Borel state and finite action spaces were first studied by Himmelberg et al. (1976), who showed the existence of stationary Markov equilibria for μ-almost all initial states with \(\mu \in \Pr (X).\) Their result was strengthened by Nowak (1987), who considered compact metric action spaces and obtained stationary equilibria for all initial states. Pure stationary Markov perfect equilibria may not exist in ARAT stochastic games if μ has atoms; see Example 3.1 (a game with 4 states) in Raghavan et al. (1985) or the counterexample (a game with 2 states) in Jaśkiewicz and Nowak (2015a). Küenle (1999) studied ARAT stochastic games with a Borel state space and compact metric action spaces and established the existence of non-stationary history-dependent pure Nash equilibria. In order to construct subgame-perfect equilibria, he used the well-known idea of threats (frequently used in repeated games). The result of Küenle (1999) is stated for two-person games only. Theorem 7 can also be proved for n-person games under a similar additivity assumption. An almost Markov equilibrium is obviously subgame-perfect.
Stationary Markov perfect equilibria exist in discounted stochastic games with state-independent transitions (SIT games) studied by Parthasarathy and Sinha (1989). They assumed that Ai(x) = Ai for all x ∈ X and i ∈ N, the action sets Ai are finite, and q(⋅|x, a) = q(⋅|a) are nonatomic for all a ∈ A. A more general class of games with additive transitions satisfying (A3) but with all qj independent of state x ∈ X (AT games) was examined by Nowak (2003b). A stationary Markov perfect equilibrium f∗∈ F0 was shown to exist in that class of stochastic games. Additional special classes of discounted stochastic games with uncountable state space having stationary Markov perfect equilibrium are described in Krishnamurthy et al. (2012). Some of them are related to AT games studied by Nowak (2003b).
Let X = Y × Z where Y and Z are Borel spaces. In a noisy stochastic game considered by Duggan (2012), the states are of the form x = (y, z) ∈ X, where z is called a noise variable. The payoffs depend measurably on x = (y, z) and are continuous in the actions of the players. The transition probability q is defined as follows:
$$\displaystyle \begin{aligned} q(B_1\times B_2|x,a)=\int_{B_1} q_2(B_2|y^{\prime})\,q_1(dy^{\prime}|x,a), \quad B_1\in\mathcal{B}(Y),\ B_2\in\mathcal{B}(Z). \end{aligned}$$
Moreover, it is assumed that q1 is dominated by some \(\mu _1\in \Pr (Y)\) and q2 is absolutely continuous with respect to some nonatomic measure \(\mu _2\in \Pr (Z).\) Additionally, q1(⋅|x, a) is norm continuous in actions a ∈ A, for each x ∈ X. This form of q implies that, conditional on y′, the next shock z′ is independent of the current state and actions. In applications, (y, z) may represent a pair: the price of some good and the realization of random demand. By choosing actions, the players can determine (stochastically) the next period price y′, which, in turn, has some influence on the next demand shock. Other applications are discussed by Duggan (2012), who obtained the following result.
Theorem 8
Every noisy stochastic game has a stationary Markov perfect equilibrium.
Let X be a Borel space, \(\mu \in \Pr (X)\) and let \(\mathcal {G}\subset \mathcal {B}(X)\) be a sub-σ-algebra. A set \(D\in \mathcal {B}(X)\) is said to be a (conditional) \(\mathcal {G}\)-atom if μ(D) > 0 and for any Borel set B ⊂ D, there exists some \(D_0\in \mathcal {G}\) such that μ(B△(D ∩ D0)) = 0. Assume that the transition probability q is dominated by some probability measure μ, and let ρ denote the corresponding conditional density function. Following He and Sun (2017), we say that a discounted stochastic game has a decomposable coarser transition kernel if there exists a sub-σ-algebra \(\mathcal {G}\subset \mathcal {B}(X)\) such that \(\mathcal {B}(X)\) has no \(\mathcal {G}\)-atom and there exist Borel measurable nonnegative functions ρj and dj (j = 1, …, l) such that, for every x ∈ X, a ∈ A, each function ρj(⋅, x, a) is \(\mathcal {G}\)-measurable and the transition probability density ρ is of the form
$$\displaystyle \begin{aligned} \rho(y,x,a)=\sum_{j=1}^l \rho_j(y,x,a)d_j(y), \quad y\in X. \end{aligned}$$
Using a theorem of Dynkin and Evstigneev (1977) on conditional expectations of measurable correspondences and a fixed point property proved in Nowak and Raghavan (1992), He and Sun (2017) established the following result.
Theorem 9
Every discounted stochastic game having a decomposable coarser transition kernel with respect to a nonatomic probability measure μ on X has a stationary Markov perfect equilibrium.
A slight extension of the above theorem, given in He and Sun (2017), contains as special cases the results proved by Parthasarathy and Sinha (1989), Nowak (2003b), and Nowak and Raghavan (1992). However, the form of the equilibrium strategy obtained by Nowak and Raghavan (1992) does not follow from He and Sun (2017). The result of He and Sun (2017) also covers the class of noisy stochastic games examined in Duggan (2012). In this case, it suffices to take the sub-σ-algebra \(\mathcal {G}\subset \mathcal {B}(Y\times Z)\) consisting of the sets D × Z, \(D\in \mathcal {B}(Y).\) Finally, we wish to point out that ARAT discounted stochastic games as well as the games considered in Jaśkiewicz and Nowak (2016) (see Theorem 6) are not included in the class of models mentioned in Theorem 9.
Remark 5
A key tool in proving the existence of a stationary Markov perfect equilibrium in a discounted stochastic game that has no ARAT structure is Lyapunov’s theorem on the range of a nonatomic vector measure. Since the Lyapunov theorem fails for infinitely many measures, the counterexample given by Levy’s eight-person game is of some importance (see Levy and McLennan 2015). There is another reason why the existence of an equilibrium in the class of stationary Markov strategies F0 is difficult to obtain. One can regard strategies from the sets \(F_i^0\) as “Young measures” and consider the natural weak-star topology on that class; see Valadier (1994). Young measures are often called relaxed controls in control theory. With the help of Example 3.16 from Elliott et al. (1973), one can easily construct a stochastic game with X = [0, 1], finite action spaces and trivial transition probability q equal to the Lebesgue measure on X, for which the expected discounted payoffs \(J^i_{\beta }(x,f)\) are discontinuous on F0 endowed with the product topology. The continuity of \(f\to J^i_{\beta }(x,f)\) (for a fixed initial state) can only be proved for ARAT games. Generally, it is difficult to obtain compact families of continuous strategies: this requires very strong conditions, for instance, an equicontinuous family of functions (see Sect. 6).
6 Special Classes of Stochastic Games with Uncountable State Space and Their Applications in Economics
In a number of applications of discrete-time dynamic games in economics, the state space is an interval in Euclidean space. An illustrative example is the “fish war game” studied by Levhari and Mirman (1980), where the state space X = [0, 1] and Ai(x) = [0, x∕n] for each i ∈ N. Usually, X is interpreted as the set of common property renewable resources. If xt is the resource stock at the beginning of period \(t\in \mathbb {N}\) and player i ∈ N extracts ait ∈ Ai(xt) for consumption, then the new state is \(x_{t+1}= \left (x_t-\sum _{j=1}^na_{jt}\right )^\alpha \) with α ∈ (0, 1). The game is symmetric in the sense that the utility function of player i ∈ N is \( u_i(x,a):= \ln a_i\), with a = (a1, …, an) being a pure strategy profile chosen by the players in state x ∈ X. Levhari and Mirman (1980) constructed a symmetric stationary Markov perfect equilibrium for the two-player β-discounted game that consists of linear strategies. For the arbitrary n-player case, the equilibrium strategy profile is \(f^\beta =(f_1^\beta ,\ldots ,f^ \beta _n)\), where \(f_i^\beta (x)= \frac {(1-\alpha \beta )x}{n+(1-n)\alpha \beta }\), x ∈ X, i ∈ N; see Nowak (2006c). Levhari and Mirman (1980) concluded that, in equilibrium, the fish population will be smaller than the population that would have resulted if the players had cooperated and maximized their joint utility. The phenomenon of overexploitation of a common property resource is known in economics as the “tragedy of the commons.” Dutta and Sundaram (1993) showed that there may exist equilibria (consisting of discontinuous consumption functions) in which the common resource is underexploited, so that the tragedy of the commons need not occur. A characterization of the set of equilibria in this model was given by Chiarella et al. (1984). If β → 1, then \(f^\beta \to f^*=(f_1^*,\ldots ,f_n^*)\), where \(f_i^*(x)=\frac {(1-\alpha )x }{n+(1-n)\alpha }\), x ∈ X, i ∈ N.
As shown in Nowak (2008), f∗ is a Nash equilibrium in the class of all strategies of the players in the fish war game under the overtaking optimality criterion. Such a criterion was examined in economics by Ramsey (1928), von Weizsäcker (1965), and Gale (1967), and its application to repeated games was pointed out by Rubinstein (1979). Generally, finding an equilibrium under the overtaking optimality criterion in the class of all strategies is a difficult task; see Carlson and Haurie (1996).
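Since the fish-war equilibrium above is available in closed form, it can be explored numerically. The Python sketch below (our illustration; the parameter values are arbitrary choices) evaluates \(f_i^\beta\) and \(f_i^*\), simulates the induced state dynamics, and checks that \(f^\beta \to f^*\) as β → 1:

```python
# Symmetric fish-war equilibrium (Levhari-Mirman 1980; n-player form as in Nowak 2006c).

def f_beta(x, n, alpha, beta):
    """Equilibrium consumption of each player in state x in the beta-discounted game."""
    return (1 - alpha * beta) * x / (n + (1 - n) * alpha * beta)

def f_limit(x, n, alpha):
    """Limit strategy f* obtained as beta -> 1."""
    return (1 - alpha) * x / (n + (1 - n) * alpha)

def trajectory(x0, n, alpha, beta, periods):
    """State path when all players use f_beta: x_{t+1} = (x_t - sum of catches)^alpha."""
    xs = [x0]
    for _ in range(periods):
        x = xs[-1]
        xs.append((x - n * f_beta(x, n, alpha, beta)) ** alpha)
    return xs

n, alpha = 2, 0.5
# The gap between f_beta and f_limit shrinks as the discount factor tends to 1.
for beta in (0.9, 0.99, 0.999):
    print(beta, abs(f_beta(1.0, n, alpha, beta) - f_limit(1.0, n, alpha)))
```

Note that the remaining stock after extraction equals xαβ∕(n + (1 − n)αβ) > 0, so the simulated path always stays in (0, 1] for x0 ∈ (0, 1].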
Dutta and Sundaram (1992) considered a stochastic game of resource extraction with state space X = [0, ∞), Ai(x) = [0, x∕n] for each i ∈ N, x ∈ X and the same nonnegative utility function u for each player. Their model includes both the dynamic game with deterministic transitions studied by Sundaram (1989a,b) and the stochastic game with nonatomic transition probabilities considered by Majumdar and Sundaram (1991). Now we list the assumptions used by Dutta and Sundaram (1992). For any y, z ∈ X, let Q(y|z) := q([0, y]|z) and for any y > 0 set \(Q(y^-|z):=\lim _{y^{\prime }\uparrow y}Q(y^{\prime }|z).\)
-
(D1)
For any x ∈ X, a = (a1, …, an) ∈ A(x) and i ∈ N, ui(x, a) = u(ai) ≥ 0. The utility function u is strictly concave, increasing and continuously differentiable. Moreover, lima↓0u′(a) = ∞.
-
(D2)
Q(0|0) = 1 and for each z > 0, there exists a compact interval I(z) ⊂ (0, ∞) such that q(I(z)|z) = 1.
-
(D3)
There exists z1 > 0 such that if 0 < z < z1, then Q(z−|z) = 0, i.e., q([z, ∞)|z) = 1.
-
(D4)
There exists \(\hat {z}>0\) such that for each \(z\ge \hat {z}\), Q(z|z) = 1, i.e., q([0, z]|z) = 1.
-
(D5)
If zm → z as m →∞, then q(⋅|zm) → q(⋅|z) in the weak topology on \(\Pr (X).\)
-
(D6)
If z < z′, then for each y > 0, Q(y−|z) ≥ Q(y|z′).
Assumption (D6) is a “strong stochastic dominance” condition that requires larger investments to obtain probabilistically higher stock levels. This assumption and the fact that the players have identical utility functions play a crucial role in the proof of Theorem 1 in Dutta and Sundaram (1992) that can be stated as follows.
Theorem 10
Every discounted stochastic game satisfying conditions (D1)–(D6) has a pure stationary Markov perfect equilibrium.
Remark 6
The equilibrium strategies obtained by Dutta and Sundaram (1992) are identical for all the players, and the corresponding equilibrium functions are nondecreasing and upper semicontinuous on X. One can observe that the assumptions on the transition probability functions include the usual deterministic case with an increasing continuous production function. A slightly more general model was recently studied by Jaśkiewicz and Nowak (2018b), who dealt with unbounded utilities.
Transition probabilities considered in other papers on equilibria in stochastic games are assumed to satisfy much stronger continuity conditions, e.g., the norm continuity in actions.
The problem of proving the existence of a Nash equilibrium in a stochastic game of resource extraction with different utility functions for the players seems to be difficult. Partial results were reported by Amir (1996a), Nowak (2003b), Balbus and Nowak (2008) and Jaśkiewicz and Nowak (2015b), where specific transition structures were assumed. Below we give an example where the assumptions are relatively simple to state.
-
(S1)
X = [0, ∞) and Ai(x) = [0, bi(x)] with \(\sum _{j=1}^n b_j(x) \le x\) for all x ∈ X, where each bj is a continuous increasing function.
-
(S2)
\(u_i:[0,\infty ) \to \mathbb {R}\) is a nonnegative increasing twice differentiable utility function for player i ∈ N such that ui(0) = 0.
-
(S3)
If a = (a1, …, an) ∈ A(x) and \(s(a)= \sum _{i=1}^na_i\), then
$$\displaystyle \begin{aligned} q(\cdot|x,a) = h(x-s(a))q_0(\cdot|x) +(1-h(x-s(a)))\delta_0(\cdot), \end{aligned}$$where h : X → [0, 1] is an increasing twice differentiable function such that h′′ < 0 and h(0) = 0, δ0 is the Dirac measure concentrated at the point 0 ∈ X. Moreover, q0((0, ∞)|x) = 1 for each x > 0, q0({0}|0) = 1 and q0(⋅|x) has a density function ρ(x, ⋅) with respect to a σ-finite measure μ defined on X. The function x → ρ(x, y) is continuous for each y ∈ X.
The following result is a special case of Theorem 2 in Jaśkiewicz and Nowak (2015b).
Theorem 11
Every discounted stochastic game satisfying assumptions (S1)–(S3) has a pure stationary Markov perfect equilibrium.
The proof of Theorem 11 uses the fact that the auxiliary game Γv(x) has a unique Nash equilibrium for any vector v = (v1, …, vn) of nonnegative continuation payoffs vi such that vi(0) = 0. The uniqueness follows from page 1476 in Balbus and Nowak (2008) or can be deduced from the classical theorem of Rosen (1965) (see also Theorem 3.6 in Haurie et al. 2012). The game Γv(x) is not supermodular since for increasing continuation payoffs vi such that vi(0) = 0, we have \( \frac {\partial ^2 {U_{\beta }^i(v_i,x,a)}}{\partial {a_i}\partial {a_j}}<0\), for i ≠ j. A stronger version of Theorem 11 and related results can be found in Jaśkiewicz and Nowak (2015b).
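The transition structure in (S3) is easy to simulate. In the Python sketch below (our illustration; the concrete choices h(i) = 1 − e^(−i) and a lognormal kernel q0 are hypothetical but satisfy the conditions in (S3)), the investment left after extraction only affects the survival probability, while growth is governed by q0(⋅|x):

```python
import math
import random

def next_state(x, total_extraction, rng):
    """Sample x' from q(.|x,a) = h(i) q0(.|x) + (1 - h(i)) delta_0, i = x - s(a).

    Assumes total_extraction = s(a) <= x, as required by (S1).
    """
    i = x - total_extraction            # investment left in the resource
    h = 1.0 - math.exp(-i)              # increasing, h'' < 0, h(0) = 0
    if rng.random() > h:
        return 0.0                      # extinction: the Dirac part delta_0
    # q0(.|x): hypothetical lognormal growth kernel; q0((0, inf)|x) = 1 for x > 0
    return x * math.exp(rng.gauss(0.1, 0.2)) if x > 0 else 0.0

rng = random.Random(0)
path = [5.0]
for _ in range(20):
    x = path[-1]
    path.append(next_state(x, 0.2 * x, rng))  # players jointly extract 20% of the stock
```

Note that state 0 is absorbing (h(0) = 0 and q0({0}|0) = 1), and that more aggressive extraction lowers i and hence raises the extinction probability, which is the economic content of (S3).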
Transition probabilities presented in (S3) were first used in Balbus and Nowak (2004). They dealt with symmetric discounted stochastic games of resource extraction and proved that the sequence of Nash equilibrium payoffs in the n-stage games converges monotonically as n →∞. Stochastic games of resource extraction without the symmetry condition were first examined by Amir (1996a), who considered so-called convex transitions. More precisely, he assumed that the conditional cumulative distribution function Q(y|z) is strictly convex with respect to z ∈ X for every fixed y > 0. He proved the existence of pure stationary Markov perfect equilibria, which are Lipschitz continuous functions in the state variable. Although the result obtained is strong, a careful analysis of various examples suggests that the convexity assumption imposed by Amir (1996a) is satisfied very rarely. Usually, the cumulative distribution Q(y|z) is neither convex nor concave with respect to z. A further discussion of this condition is provided in Remarks 7–8 in Jaśkiewicz and Nowak (2015b). The function Q(y|z) induced by the transition probability q of the form considered in (S3) is strictly concave in z = x − s(a) only when q0 is independent of x ∈ X. Such transition probabilities that are “mixtures” of finitely many probability measures on X were considered in Nowak (2003b) and Balbus and Nowak (2008). A survey of various game theoretic approaches to resource extraction models can be found in Van Long (2011).
In many other examples, the one-shot game Γv(x) also has a nonempty compact set of pure Nash equilibria. Therefore, a counterpart of Theorem 6 can be formulated for the class of pure strategies of the players. We now describe some examples presented in Jaśkiewicz and Nowak (2016).
Example 1 (Dynamic Cournot oligopoly)
Let \(X=[0,\bar {x}]\) and x ∈ X represent a realization of a random demand shock that is modified at each period of the game. Player i ∈ N (oligopolist) sets a production quantity ai ∈ Ai(x) = [0, 1]. If \(P\left (x,\sum _{j=1}^n a_j\right )\) is the inverse demand function, and ci(x, ai) is the cost function for player i, then
$$\displaystyle \begin{aligned} u_i(x,a)= a_iP\left(x,\sum_{j=1}^n a_j\right)-c_i(x,a_i). \end{aligned}$$
A simple example of the inverse demand function is
The function \(a_i \to a_iP\left (x,\sum _{j=1}^n a_j\right )\) is usually concave and ai → ci(x, ai) is often convex. Assume that
$$\displaystyle \begin{aligned} q(\cdot|x,a) = (1-\overline{a})q_1(\cdot|x) + \overline{a}q_2(\cdot|x), \quad \overline{a}:=\frac{1}{n}\sum_{j=1}^n a_j, \end{aligned} \quad (6.2)$$
where q1(⋅|x) and q2(⋅|x) are dominated by some probability measure μ on X for all x ∈ X. In order to provide an interpretation of q, we look at the mean values of the next demand shock.
Let
$$\displaystyle \begin{aligned} E_q(x,a):=\int_X y\, q(dy|x,a), \qquad E_{q_j}(x):=\int_X y\, q_j(dy|x), \ \ j=1,2, \end{aligned}$$
be the mean values of the distributions q(⋅|x, a) and qj(⋅|x), respectively. By (6.2), we have \(E_q(x,a)= E_{q_{1}}(x) + \overline {a}( E_{q_{2}}(x)- E_{q_{1}}(x)).\) Assume that \(E_{q_{1}}(x) \ge x\ge E_{q_{2}}(x).\) This condition implies that \(E_q(x,a)\) is nonincreasing in \(\overline{a}.\)
Thus, the expectation of the next demand shock Eq(x, a) decreases if the total sale \(n\overline {a}\) in the current state x ∈ X increases. Observe that the game Γv(x) is concave. From Nash (1950), it follows that the game Γv(x) has a pure equilibrium point. However, the set of Nash equilibria in Γv(x) may contain many points. A modification of the proof of Theorem 1 given in Jaśkiewicz and Nowak (2016) implies that this game has a pure stationary almost Markov perfect equilibrium.
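The monotonicity of the mean demand shock just described can be checked with a few lines of Python (our illustration; the numbers below are arbitrary values satisfying \(E_{q_1}(x)\ge x\ge E_{q_2}(x)\)):

```python
def mixture_mean(e1, e2, abar):
    """Mean of the mixture q(.|x,a) = (1 - abar) q1(.|x) + abar q2(.|x) in (6.2)."""
    return (1.0 - abar) * e1 + abar * e2

x = 1.0
e1, e2 = 1.2, 0.7                 # E_q1(x) >= x >= E_q2(x)
means = [mixture_mean(e1, e2, k / 10.0) for k in range(11)]
# The expected next demand shock falls as the average production rises.
assert all(m2 <= m1 for m1, m2 in zip(means, means[1:]))
```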
Example 2 (Cournot competition with substituting goods in differentiated markets)
This model is inspired by a dynamic game with complementary goods studied by Curtat (1996). Related static games were already discussed in Spence (1976) and Vives (1990). There are n firms on the market and firm i ∈ N produces a quantity ai ∈ Ai(x) = [0, 1] of a differentiated product. The inverse demand function is given by a twice differentiable function Pi(a), where a = (a1, …, an). The goods are substitutes, i.e., \(\frac {\partial P_i(a)}{\partial a_j}<0\) for all i, j ∈ N; see Spence (1976). In other words, consumption of one good will decrease consumption of the others. We assume that X = [0, 1]n, where the i-th coordinate xi ∈ [0, 1] is a measure of the cumulative experience of firm i ∈ N. If ci(xi) is the marginal cost for firm i ∈ N, then
$$\displaystyle \begin{aligned} u_i(x,a)= a_i\left(P_i(a)-c_i(x_i)\right). \end{aligned} \quad (6.3)$$
The transition probability of the next state (experience vector) is of the form:
where
and q1(⋅|x), q2(⋅|x) are dominated by some probability measure μ on X for all x ∈ X. In Curtat (1996), it is assumed that q1 and q2 are independent of x ∈ X and also that q2 stochastically dominates q1. Then, the underlying Markov process governed by q captures the ideas of learning-by-doing and spillover (see page 197 in Curtat 1996). Here, this stochastic dominance condition can be dropped, although it is quite natural. It is easy to see that the game Γv(x) is concave if ui(x, (⋅, a−i)) is concave on [0, 1]. Clearly, this is satisfied if, for each i ∈ N, we have
$$\displaystyle \begin{aligned} \frac{\partial^2 u_i(x,a)}{\partial a_i^2}= 2\frac{\partial P_i(a)}{\partial a_i}+a_i\frac{\partial^2 P_i(a)}{\partial a_i^2}\le 0. \end{aligned}$$
If the goods are substitutes, this condition holds, when \(\frac {\partial ^2 P_i(a)}{\partial a_i^2}\le 0\) for all i ∈ N. The game Γv(x) may have multiple pure Nash equilibria. Using the methods from Jaśkiewicz and Nowak (2016), one can show that any concave game discussed here has a pure stationary almost Markov perfect equilibrium.
Supermodular static games were extensively studied by Milgrom and Roberts (1990) and Topkis (1998). This class of games finds applications in dynamic economic models with complementarities. Our next illustration refers to Example 2, but with products that are complements. The state space and action spaces for the firms are the same as in Example 2. We endow both X and A = [0, 1]n with the usual component-wise ordering. Then, X and A are complete lattices. We assume that the transition probability is defined as in (6.4) and that q1(⋅|x) and q2(⋅|x) are, for all x ∈ X, dominated by some probability measure μ on X. The payoff function for every firm is given in (6.3).
Example 3 (Cournot oligopoly with complementary goods in differentiated markets)
Let h be defined as in (6.5). Suppose that the payoff function in the game Γv(x) satisfies the following condition:
Then, by Theorem 4 in Milgrom and Roberts (1990), the game Γv(x) is supermodular. Note that within our framework, it is sufficient to prove that for ui(x, a), defined in (6.3), it holds that \(\frac {\partial ^2 u_i(x,a)}{\partial a_i\partial a_j}\ge 0\) for j ≠ i. But
$$\displaystyle \begin{aligned} \frac{\partial^2 u_i(x,a)}{\partial a_i\partial a_j}= \frac{\partial P_i(a)}{\partial a_j}+a_i\frac{\partial^2 P_i(a)}{\partial a_i\partial a_j}, \end{aligned}$$
and both terms on the right-hand side are likely to be nonnegative if the goods are complements, i.e., \(\frac {\partial P_i(a)}{\partial a_j}\ge 0\) for j ≠ i; see Vives (1990). From Theorem 5 in Milgrom and Roberts (1990), it follows that the game Γv(x) has a pure Nash equilibrium. Therefore, the arguments used in Jaśkiewicz and Nowak (2016) imply that the stochastic game has a pure stationary almost Markov perfect equilibrium.
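Supermodularity is not only an existence tool; it also suggests a computation: iterating best responses from the smallest action profile increases monotonically to the least pure Nash equilibrium (Topkis-style iteration). The Python sketch below applies this to a hypothetical two-firm stage game with complementary goods (the demand and cost specifications are our own illustrative choices, not from the survey):

```python
# Hypothetical two-firm stage game with complementary goods on a finite grid:
# u_i(a) = a_i * P_i(a) - c(a_i) with dP_i/da_j > 0, so d^2 u_i / da_i da_j > 0.
GRID = [k / 10 for k in range(11)]       # action grid in [0, 1]

def payoff(i, a):
    price = 1.0 + 0.5 * a[1 - i]         # price rises with the rival's output
    return a[i] * price - 0.8 * a[i] ** 2

def best_response(i, a):
    if i == 0:
        return max(GRID, key=lambda ai: payoff(0, (ai, a[1])))
    return max(GRID, key=lambda ai: payoff(1, (a[0], ai)))

# Best-response iteration from the smallest profile; in a supermodular game
# this sequence is increasing and converges to the least pure Nash equilibrium.
a = (0.0, 0.0)
for _ in range(100):
    new = (best_response(0, a), best_response(1, a))
    if new == a:
        break
    a = new
```

Here the iterates climb (0, 0) → (0.6, 0.6) → (0.8, 0.8) → (0.9, 0.9) and stop, illustrating the monotone convergence that complementarities guarantee.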
Remark 7
The game described in Example 3 is also studied in Curtat (1996), but with the additional restrictive assumption that q1 and q2 are independent of x ∈ X. Then, the transition probability q has so-called increasing differences in (x, a). This fact implies that the functions \(U^i_{\beta }(v_i,\cdot ,\cdot )\) satisfy the assumptions in Proposition 7. Another condition imposed by Curtat (1996) states that the payoff functions ui(x, a) are increasing in a−i and, more importantly, satisfy the so-called strict diagonal dominance condition for each x ∈ X. For details the reader is referred to Curtat (1996) and Rosen (1965). This additional condition entails the uniqueness of a pure Nash equilibrium in every auxiliary game Γv(x) under consideration; see Proposition 6. The advantage is that Curtat (1996) can directly work with Lipschitz continuous strategies for the players and find a stationary Markov perfect equilibrium in that class using Schauder’s fixed point theorem. Without the strict diagonal dominance condition, Γv(x) may have many pure Nash equilibria and Curtat’s approach cannot be applied. The coefficients of the convex combination in (6.4) are affine functions of a ∈ A. This requirement can be slightly generalized; see, for instance, Example 4 in Jaśkiewicz and Nowak (2016). If q1 or q2 depends on x ∈ X, then the increasing differences property of q does not hold and the method of Curtat (1996) does not work. Additional comments on supermodular stochastic games can be found in Amir (2003).
The result in Curtat (1996) on the existence of stationary Markov perfect equilibria for supermodular discounted stochastic games is based upon lattice-theoretic arguments and on complementarity and monotonicity assumptions. The state and action spaces are assumed to be compact intervals in Euclidean space, and the transition probability is assumed to be norm continuous in the state and action variables. Moreover, the strict diagonal dominance condition (see (C1) in Sect. 2), applied to the auxiliary one-shot games Γv(x) for any increasing Lipschitz continuous continuation vector payoff v, plays a crucial role. Namely, this assumption together with others implies that \(\mathcal {NP}_v(x)\) is a singleton. In addition, the function \(x\to \mathcal {NP}_v(x)\) is increasing and Lipschitz continuous. Thus, his proof consists of two steps. First, he shows that there exists an increasing Lipschitz continuous vector payoff function v∗ such that \(v^*(x)= \mathcal {NP}_{v^*}(x)\) for all x ∈ X. Second, he makes use of a theorem on the Lipschitz property of the unique equilibrium in \( \varGamma _{v^*}.\)
Horst (2005) provided a different and more unified approach to stationary Markov perfect equilibria that can be applied beyond the setting of supermodular games. Instead of imposing monotonicity conditions on the players’ utility functions, he considered stochastic games in which the interaction between different players is sufficiently weak. For instance, certain “production games” satisfy this property. The method of his proof is based on a selection theorem of Montrucchio (1987) and the Schauder fixed point theorem applied to the space of Lipschitz continuous strategy profiles of the players. The assumptions imposed by Horst (2005) are rather complicated: for example, they may restrict the number of players in the game or impose an upper bound on the discount factor. Such limitations do not occur in the approach of Curtat (1996).
Balbus et al. (2014a) considered supermodular stochastic games with an absorbing state and transition probabilities of the form q(⋅|x, a) = g(x, a)q0(⋅|x) + (1 − g(x, a))δ0(⋅). Under some strong monotonicity conditions on the utility functions and transitions, they showed that the Nash equilibrium payoffs in the n-stage games converge monotonically as n →∞. This fact yields the existence of a pure stationary Markov perfect equilibrium. A related result is given in Balbus et al. (2013b) for a similar class of dynamic games. The state space X in Balbus et al. (2014a) is one-dimensional, and their results do not apply to the games of resource extraction discussed earlier. If, on the other hand, the transition probability is a “mixture” of finitely many probability measures, then a stationary Markov perfect equilibrium can be obtained, in certain models, by solving a system of non-linear equations. This method was discussed in Sect. 5 of Nowak (2007). The next example is not a supermodular game in the sense of Balbus et al. (2014a), but it belongs to the class of production games examined by Horst (2005). Generally, there are only a few examples of games with a continuum of states for which Nash equilibria can be given in closed form.
Example 4
Let X = [0, 1], Ai(x) = [0, 1] for all x ∈ X and i ∈ N = {1, 2}. We consider the symmetric game where the stage utility of player i is
The state variable x in (6.6) is a complementarity coefficient of the players’ actions. The transition probabilities are of the form
We assume that μ1 has the density ρ1(y) = 2y and μ2 has the density ρ2(y) = 2 − 2y, y ∈ X. Note that μ1 stochastically dominates μ2. From the definition of q, it follows that higher states x ∈ X or high actions a1, a2 (efforts) of the players induce a distribution of the next state with a higher mean value. Assume that \(v^*=(v_1^*,v_2^*)\) is an equilibrium payoff vector in the β-discounted stochastic game. As shown in Example 1 of Nowak (2007), it is possible to construct a system of non-linear equations with unknowns z1 and z2, whose solution \(z_1^*\), \(z_2^*\) satisfies \(z_i^*=\int _Xv^*_i(y)\mu _i(dy)\). This fact, in turn, makes it possible to find a symmetric stationary Markov perfect equilibrium \((f_1^*,f_2^*)\) with \(v_1^*=v_2^*\). It is of the form \(f^*_i(x)= \frac {1+z^*}{4-2x}\), x ∈ X, i ∈ N, where
Moreover, we have
Ericson and Pakes (1995) provided a model of firm and industry dynamics that allows for entry, exit and uncertainty generating variability in the fortunes of firms. They considered the ergodicity of the stochastic process resulting from a Markov perfect industry equilibrium. A dynamic competition in an oligopolistic industry with investment, entry and exit was also extensively studied by Doraszelski and Satterthwaite (2010). Computational methods for the class of games studied by Ericson and Pakes (1995) are presented in Doraszelski and Pakes (2007). Further applications of discounted stochastic games with countably many states to models in industrial organization including models of industry dynamics are given in Escobar (2013).
Shubik and Whitt (1973) considered a non-stochastic model of a sequential strategic market game, where the state includes the current stocks of capital. At each period of the game, one unit of a consumer good is put up for sale, and the players bid amounts of fiat money for it. A stochastic counterpart of this game was first presented in Secchi and Sudderth (2005). Więcek (2009), on the other hand, obtained a general structure of equilibrium policies in two-person games, where bids gradually decrease as the discount factor increases. Moreover, Więcek (2012) proved that a Nash equilibrium in which all the players use “aggressive strategies” emerges in the game for any value of the discount factor, provided the number of players is sufficiently large. This fact corresponds to a similar result for a deterministic economy given in Shubik and Whitt (1973) and is consistent with existing results about economies with a continuum of players. Other applications of nonzero-sum stochastic games to economic models can be found in Duggan (2012) and He and Sun (2017). Although the concept of mean field equilibrium in dynamic games is not directly inspired by Nash, the influence of the theory of non-cooperative stochastic games on this area of research is obvious. Also the notion of supermodularity is used in studying mean field equilibria in dynamic games. The reader is referred to Adlakha and Johari (2013), where some applications to computer science and operations research are given.
7 Special Classes of Stochastic Games with Countably Many States
Assume that the state space X is countable. Then every \(F^0_i\) can be recognized as a compact convex subset of a linear topological space. A sequence \((f_i^k)_{k\in \mathbb {N}}\) converges to \(f_i\in F^0_i\) if for every \(x\in X\ f_i^k(\cdot |x) \to f_i(\cdot |x)\) in the weak-star topology on the space of probability measures on Ai(x). The weak or weak-star convergence of probability measures on metric spaces is fully described in Aliprantis and Border (2006) or Billingsley (1968). Since X is countable, every space \(F^0_i \) is sequentially compact (it suffices to use the standard diagonal method for selecting convergent subsequences) and, therefore, F0 is sequentially compact when endowed with the product topology. If X is finite and the sets of actions are finite, then F0 can actually be viewed as a convex compact subset of Euclidean space. In the finite state space case, it is easy to prove that the discounted payoffs \(J_{\beta }^i(x,f)\) are continuous on F0. If X is countable and the payoff functions are uniformly bounded, and q(y|x, a) is continuous in a ∈ A(x) for all x, y ∈ X, then showing the continuity of \(J_{\beta }^i(x,f)\) on F0 requires a little more work; see Federgruen (1978). From the Bellman equation in discounted dynamic programming (see Puterman 1994), it follows that \(f^*=(f^*_1,\ldots ,f^*_n)\) is a stationary Markov perfect equilibrium in the discounted stochastic game if and only if there exist bounded functions \(v_i^*:X\to \mathbb {R}\) such that for each x ∈ X and i ∈ N we have
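The displayed equilibrium condition (6.7) is omitted above; it has the following standard form (a reconstruction assuming the notation of the preceding sections, where \(\Pr (A_i(x))\) denotes the set of probability measures on \(A_i(x)\) and \((\nu ,f^*_{-i})\) the profile in which player i uses the mixed action ν at state x while the remaining players follow f∗):

$$\displaystyle \begin{aligned} v_i^*(x) &= \max_{\nu\in \Pr(A_i(x))}\Big[u_i\big(x,(\nu,f^*_{-i})\big) +\beta\sum_{y\in X} v_i^*(y)\,q\big(y|x,(\nu,f^*_{-i})\big)\Big]\\ &= u_i(x,f^*) +\beta\sum_{y\in X} v_i^*(y)\,q(y|x,f^*). \end{aligned} \qquad (6.7)$$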
From (6.7), it follows that \(v^*_i(x)=J_{\beta }^i(x,f^*)\). Using the continuity of the expected discounted payoffs in f ∈ F0 and (6.7), one can define the best response correspondence in the space of strategies, show its upper semicontinuity, and conclude from the fixed point theorem due to Glicksberg (1952) (or due to Kakutani (1941) in the case of finite state and action spaces) that the game has a stationary Markov perfect equilibrium f∗∈ F0. This fact was proved for finite state space discounted stochastic games by Fink (1964) and Takahashi (1964). An extension to games with countable state spaces was reported in Parthasarathy (1973) and Federgruen (1978). Some results for a class of discounted games with discontinuous payoff functions can be found in Nowak and Wiȩcek (2007).
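For a fixed stationary profile f, the discounted payoffs \(J^i_{\beta }(x,f)\) form the unique bounded solution of a linear evaluation system. Writing \(u_i(f)\) for the vector of stage payoffs and \(Q(f)\) for the transition matrix under f (shorthand introduced only for this illustration), a minimal pure-Python sketch on a toy two-state example follows; all numerical data are illustrative, not taken from the chapter.

```python
# Evaluating the discounted payoff J_beta^i(x, f) of a FIXED stationary
# profile f: the vector v_i solves v_i = u_i(f) + beta * Q(f) v_i.
# Toy two-state example with illustrative numbers.

def solve_2x2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [x0, x1]

def discounted_value(u, Q, beta):
    """Return v with v(x) = u(x) + beta * sum_y Q[x][y] * v(y)."""
    # Rearranged as (I - beta * Q) v = u.
    a = [[1 - beta * Q[0][0], -beta * Q[0][1]],
         [-beta * Q[1][0], 1 - beta * Q[1][1]]]
    return solve_2x2(a, u)

u = [1.0, 0.0]                  # stage payoffs u_i(x, f) at states 0, 1
Q = [[0.5, 0.5], [0.2, 0.8]]    # transition matrix q(y | x, f)
beta = 0.9
v = discounted_value(u, Q, beta)

# Sanity check: the evaluation equation holds at each state.
for x in range(2):
    assert abs(v[x] - (u[x] + beta * sum(Q[x][y] * v[y] for y in range(2)))) < 1e-9
```

Solving such linear systems for each unilateral deviation is what lies behind the continuity argument: the map \(f\mapsto J^i_{\beta }(\cdot ,f)\) inherits continuity from the coefficients of the system.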
The fundamental results in the theory of regular Nash equilibria in normal form games concerning genericity (see Harsanyi 1973a) and purification (see Harsanyi 1973b) were extended to dynamic games by Doraszelski and Escobar (2010). A discounted stochastic game whose equilibria are all regular in the sense of Doraszelski and Escobar (2010) has a compact equilibrium set consisting of isolated points; hence, the equilibrium set is finite. They proved that the set of discounted stochastic games (with finite sets of states and actions) whose Markov perfect equilibria are all regular is open and has full Lebesgue measure. Related results were given by Haller and Lagunoff (2000), but their definition of regular equilibrium is different, and regular equilibria in their sense may not be purifiable.
The payoff function for player i ∈ N in the limit-average stochastic game can be defined as follows:
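The omitted display presumably takes the standard form below (a reconstruction; conventions differ across papers, some using lim sup or the expectation of the lim inf of averages):

$$\displaystyle \begin{aligned} J^i(x,\pi)=\liminf_{T\to\infty}\frac{1}{T}\,E^{\pi}_x\Big[\sum_{t=1}^{T} u_i(x_t,a^t)\Big]. \end{aligned}$$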
The equilibrium solutions for this class of games are defined similarly to the discounted case. The existence of stationary Markov perfect equilibria for games with finite state and action spaces and the limit-average payoffs was proved independently by Rogers (1969) and Sobel (1971). They assumed that the Markov chain induced by any stationary strategy profile and the transition probability q is irreducible. It is shown under the irreducibility condition that the equilibrium payoffs \(w^*_i\) of the players are independent of the initial state. Moreover, it is proved that there exists a sequence of equilibria \((f^k)_{k\in \mathbb {N}}\) in βk-discounted games (with βk → 1 as k →∞) such that \(w_i^*= \lim _{k\to \infty }(1-\beta _k)J^i_{\beta _k}(x,f^k)\). Later, Federgruen (1978) extended these results to limit-average stochastic games with countably many states satisfying some uniform ergodicity conditions. Other cases of similar type were mentioned by Nowak (2003a). Below we provide a result due to Altman et al. (1997), which has some potential for applications in queueing models. The stage payoffs in their approach may be unbounded. We start with a formulation of their assumptions.
Let m : X → [1, ∞) be a function for which the following conditions hold.
-
(A4)
For each x, y ∈ X, i ∈ N, the functions ui(x, ⋅) and q(y|x, ⋅) are continuous on A(x). Moreover,
$$\displaystyle \begin{aligned} &\sup_{x\in X}\max_{a\in A(x)}|u_i(x,a)|/m(x) <\infty \quad\mbox{and}\\ &\quad \quad \lim_{k\to\infty}\sum_{y\in X}|q(y|x,a^k)-q(y|x,a)|m(y)=0 \end{aligned} $$
for any ak → a ∈ A(x).
-
(A5)
There exist a finite set Y ⊂ X and γ ∈ (0, 1) such that
$$\displaystyle \begin{aligned} \sum_{y\in X\setminus Y}q(y|x,a)m(y) \le \gamma m(x)\quad \mbox{for all} \quad x\in X,\ a\in A(x). \end{aligned}$$
-
(A6)
The function f → n(f) is continuous with respect to stationary strategy profiles f ∈ F0, where n(f) denotes the number of closed classes in the Markov chain induced by the transition probability q(y|x, f), x, y ∈ X.
Property (A5) is called m-uniform geometric recurrence; see Altman et al. (1997). Condition (A6) is quite restrictive and implies that the number of positive recurrent classes is a constant function of the stationary strategies. If the Markov chains resulting from the stationary policies are all unichain, the limit-average payoff functions are constant, i.e., independent of the initial state. For a detailed discussion, we refer the reader to Altman et al. (1997) and the references cited therein.
Theorem 12
If conditions (A4)–(A6) are satisfied, then the limit-average payoff n-person stochastic game has a stationary Markov perfect equilibrium.
The above result follows from Theorem 2.6 in Altman et al. (1997), where it is also shown that under (A4)–(A6) any limit of stationary Markov equilibria in β-discounted games (as β → 1) is an equilibrium in the limit-average game. A related result was established by Borkar and Ghosh (1993) under a stochastic stability condition. More precisely, they assumed that the Markov chain induced by any stationary strategy profile is unichain and the transition probability from any fixed state has a finite support.
Stochastic games with countably many states are usually studied under some recurrence or ergodicity conditions. Without these conditions n-person nonzero-sum limit-average payoff stochastic games with countable state spaces are very difficult to deal with. Nevertheless, the results obtained in the literature have some interesting applications, especially to queueing systems; see, for example, Altman (1996) and Altman et al. (1997).
Now assume that X is a Borel space and μ is a probability measure on X. Consider an n-person discounted stochastic game G, where Ai(x) = Ai for all i ∈ N and x ∈ X, and the payoff functions are uniformly bounded and continuous in actions.
-
(A7)
The transition probability q has a conditional density function ρ, which is continuous in actions and such that
$$\displaystyle \begin{aligned} \int_X\max_{a\in A}\rho(x,a,y)\mu(dy) <\infty. \end{aligned}$$
Let C(A) be the Banach space of all real-valued continuous functions on the compact space A endowed with the supremum norm ∥⋅∥∞. By L1(X, C(A)) we denote the Banach space of all C(A)-valued measurable functions ϕ on X such that ∥ϕ∥1 :=∫X∥ϕ(y)∥∞μ(dy) < ∞. Let \(\{X_k\}_{k\in \mathbb {N}_0}\) be a measurable partition of the state space (\(\mathbb {N}_0\subset \mathbb {N}\)), \(\{u_{i,k}\}_{k\in \mathbb {N}_0}\) be a family of functions ui,k ∈ C(A) and \( \{\rho _k\}_{k\in \mathbb {N}_0}\) be a family of functions ρk ∈ L1(X, C(A)) such that ρk(x)(a, y) ≥ 0 and ∫Xρk(x)(a, y)μ(dy) = 1 for each \(k\in \mathbb {N}_0,\ a\in A\). Consider a game \(\tilde {G}\) where the payoff function for player i is \(\tilde {u}_i(x,a)=u_{i,k}(a)\) if x ∈ Xk. The transition density is \(\tilde {\rho }(x,a,y)=\rho _k(x)(a,y)\) if x ∈ Xk. Let \(\tilde {F}^0_i\) be the set of all \(f_i\in F^0_i\) that are constant on every set Xk. Let \(\tilde {F}^0:= \tilde {F}_1^0\times \cdots \times \tilde {F}_n^0\). The game \(\tilde {G}\) resembles a game with countably many states, and if the payoff functions \(\tilde {u}_i\) are uniformly bounded, then \(\tilde G\) with the discounted payoff criterion has an equilibrium in \(\tilde {F}^0\). Denote by \(\tilde {J}^i_{\beta }(x,\pi )\) the discounted expected payoff to player i ∈ N in the game \(\tilde G\). It is well known that C(A) is separable. The Banach space L1(X, C(A)) is also separable. Note that x → ui(x, ⋅) is a measurable mapping from X to C(A). By (A7), the mapping x → ρ(x, ⋅, ⋅) from X to L1(X, C(A)) is also measurable. Using these facts, Nowak (1985) proved the following result.
Theorem 13
Assume that G satisfies (A7). For any 𝜖 > 0, there exists a game \(\tilde G\) such that \(|J^i_{\beta }(x,\pi )-\tilde {J}^i_{\beta }(x,\pi )|< \epsilon /2\) for all x ∈ X, i ∈ N and π ∈ Π. Moreover, the game G has a stationary Markov 𝜖-equilibrium.
A related result on the approximation of discounted nonzero-sum games and the existence of 𝜖-equilibria was given by Whitt (1980), who used stronger uniform continuity conditions and a different technique. Approximations of discounted and also limit-average stochastic games with general state spaces and unbounded stage payoffs were studied in Nowak and Altman (2002). They used the weighted norm approach and imposed some geometric ergodicity conditions while examining the limit-average case. An extension with a simpler and more transparent proof for semi-Markov games satisfying a geometric drift condition and a majorization property, similar to (GE1)–(GE3) in Sect. 5 in Jaśkiewicz and Nowak (2018a), was given in Jaśkiewicz and Nowak (2006).
8 Algorithms for Nonzero-Sum Stochastic Games
In this section, we assume that the state space X and the sets of actions Ai are finite. In the 2-player case, we let for notational convenience A1(x) = A1, A2(x) = A2 and a = a1 ∈ A1, b = a2 ∈ A2. Further, for any \(f_i \in F^0_i\), i = 1, 2, we set
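The omitted display defines the expected payoffs and transitions under a mixed action in the obvious way (a reconstruction, consistent with the symmetric definitions mentioned in the text):

$$\displaystyle \begin{aligned} u_i(x,f_1,b)=\sum_{a\in A_1}u_i(x,a,b)f_1(a|x),\qquad q(y|x,f_1,b)=\sum_{a\in A_1}q(y|x,a,b)f_1(a|x). \end{aligned}$$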
In a similar way, we define q(y|x, a, f2) and ui(x, a, f2). Note that every \(F^0_i\) can be viewed as a compact convex subset of a Euclidean space. Also every function \(\phi :X\to \mathbb {R}\) can be viewed as a vector in a Euclidean space. Below we describe two results of Filar et al. (1991) on the characterization of stationary equilibria in stochastic games by constrained nonlinear programming. However, due to the fact that the constraint sets are not convex, the results are not straightforward to implement numerically. Although it is common in mathematical programming to use matrix notation, we follow the notation introduced in previous sections.
Let c = (v1, v2, f1, f2). Consider the following problem (OPβ):
subject to \((f_1,f_2)\in F^0_1\times F^0_2\) and
and
Theorem 14
Consider a feasible point \(c^*=(v_1^*,v_2^*,f_1^*,f_2^*)\) in (OPβ). Then \((f_1^*,f_2^*)\in F_1^0\times F^0_2\) is a stationary Nash equilibrium in the discounted stochastic game if and only if c∗ is a solution to problem (OPβ) with O1(c∗) = 0.
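The objective and constraints of (OPβ) are omitted above; up to notational details they have the following shape (a sketch following Theorem 3.8.2 in Filar and Vrieze (1997); signs and normalizations may differ from the original):

$$\displaystyle \begin{aligned} \min\ O_1(c)=\sum_{i=1}^{2}\sum_{x\in X}\Big[v_i(x)-u_i(x,f_1,f_2)-\beta\sum_{y\in X}v_i(y)q(y|x,f_1,f_2)\Big] \end{aligned}$$

subject to, for all x ∈ X, a ∈ A1, b ∈ A2,

$$\displaystyle \begin{aligned} u_1(x,a,f_2)+\beta\sum_{y\in X}v_1(y)q(y|x,a,f_2)\le v_1(x),\qquad u_2(x,f_1,b)+\beta\sum_{y\in X}v_2(y)q(y|x,f_1,b)\le v_2(x). \end{aligned}$$

Feasibility forces O1(c) ≥ 0, which explains why optimality together with O1(c∗) = 0 characterizes equilibria in Theorem 14.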
Let c = (z1, v1, w1, f2, z2, v2, w2, f1). Now consider the following problem (OPa):
subject to \((f_1,f_2)\in F^0_1\times F^0_2\) and
for all x ∈ X, a ∈ A1 and
for all x ∈ X, b ∈ A2 and
for all x ∈ X and i = 1, 2.
Theorem 15
Consider a feasible point \(c^*=(z^*_1,v_1^*,w_1^*,f_2^*,z^*_2,v_2^*,w_2^*,f_1^*)\) in (OPa). Then \((f_1^*,f_2^*)\in F_1^0\times F^0_2\) is a stationary Nash equilibrium in the limit-average payoff stochastic game if and only if c∗ is a solution to problem (OPa) with O2(c∗) = 0.
Theorems 14 and 15 were stated and proved in Filar et al. (1991); see also Theorems 3.8.2 and 3.8.4 in Filar and Vrieze (1997).
As in the zero-sum case, when one player controls the transitions, it is possible to construct finite-step algorithms to compute Nash equilibria. The linear complementarity problem (LCP) is defined as follows. Given a square matrix \(\mathbb {M}\) of order m and a (column) vector \( \overline {Q}\in \mathbb {R}^m\), the problem is to find two vectors \(\overline {Z} =[z_1,\ldots ,z_m]^T \in \mathbb {R}^m\) and \(\overline {W}=[w_1,\ldots ,w_m]^T\in \mathbb {R}^m\) such that
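The omitted display states the standard complementarity conditions:

$$\displaystyle \begin{aligned} \overline{W}=\mathbb{M}\,\overline{Z}+\overline{Q},\qquad \overline{Z}\ge 0,\qquad \overline{W}\ge 0,\qquad \overline{Z}^T\overline{W}=0. \end{aligned}$$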
Lemke (1965) proposed some pivoting finite-step algorithms to solve the LCP for a large class of matrices \(\mathbb {M}\) and vectors \(\overline {Q}\). Further research on the LCP can be found in Cottle et al. (1992).
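To make the definition concrete, here is a naive support-enumeration LCP solver in Python; it is a toy sketch, exponential in m, and is not Lemke's pivoting method. The test instance (M, Q) is illustrative, not from the text.

```python
# Brute-force LCP solver by support enumeration: find Z, W >= 0 with
# W = M Z + Q and Z^T W = 0, for tiny instances only.
from itertools import combinations

def gauss_solve(a, b):
    """Solve a square linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    a = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            return None          # singular subsystem: skip this support guess
        a[col], a[piv] = a[piv], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][n] / a[i][i] for i in range(n)]

def solve_lcp(M, Q):
    m = len(Q)
    for size in range(m + 1):
        for S in combinations(range(m), size):
            # Guess the support: z_i may be positive only for i in S,
            # and complementarity forces w_i = 0 on S.
            if size == 0:
                zS = []
            else:
                sub = [[M[i][j] for j in S] for i in S]
                zS = gauss_solve(sub, [-Q[i] for i in S])
                if zS is None or any(z < -1e-9 for z in zS):
                    continue
            z = [0.0] * m
            for k, i in enumerate(S):
                z[i] = zS[k]
            w = [sum(M[i][j] * z[j] for j in range(m)) + Q[i] for i in range(m)]
            if all(wi >= -1e-9 for wi in w):
                return z, w
    return None

z, w = solve_lcp([[2.0, 1.0], [1.0, 2.0]], [-1.0, -1.0])
```

On this instance the solver returns Z = (1/3, 1/3) with W = (0, 0), and complementarity holds exactly.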
Finding a Nash equilibrium in any bimatrix game \((\mathbb {A},\mathbb {B})\) is equivalent to solving the LCP with
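The omitted display specifies \(\mathbb {M}\) and \(\overline {Q}\); one standard choice (conventions vary, and it assumes, after adding a constant to all payoffs, that the entries of \(\mathbb {A}\) and \(\mathbb {B}\) are positive) is

$$\displaystyle \begin{aligned} \mathbb{M}=\begin{pmatrix} 0 & -\mathbb{A}\\ -\mathbb{B}^T & 0\end{pmatrix},\qquad \overline{Q}=(1,\ldots,1)^T, \end{aligned}$$

so that \(\overline {W}=\overline {Q}+\mathbb {M}\,\overline {Z}\ge 0\) encodes the best-response inequalities \(\mathbb {A}\overline {Z}_2\le \mathbf {1}\) and \(\mathbb {B}^T\overline {Z}_1\le \mathbf {1}\), with equality on the supports by complementarity. The zero vector is a trivial (artificial) solution; the Lemke-Howson pivoting scheme starts from it and terminates at a nontrivial one.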
A finite-step algorithm for this LCP was given by Lemke and Howson (1964). If \( \overline {Z}^*= [\overline {Z}_1^*,\overline {Z}^*_2]\) is a part of the solution of the above LCP, then the normalization of \(\overline {Z}^*_i\) is an equilibrium strategy for player i.
Suppose that only player 2 controls the transitions in a discounted stochastic game, i.e., q(y|x, a, b) is independent of a ∈ A1. Let \(\{f_1,\ldots ,f_{m_1}\}\) and \(\{g_1,\ldots ,g_{m_2}\}\) be the families of all pure stationary strategies for players 1 and 2, respectively. Consider the bimatrix game \((\mathbb {A},\mathbb {B})\), where the entries aij of \(\mathbb {A}\) and bij of \(\mathbb {B}\) are
Then, making use of the Lemke-Howson algorithm, Nowak and Raghavan (1993) proved the following result.
Theorem 16
Let \(\xi ^* =(\xi ^*_1,\ldots ,\xi ^*_{m_1})\) and \(\zeta ^*=(\zeta ^*_1,\ldots ,\zeta ^*_{m_2})\) and assume that (ξ∗, ζ∗) is a Nash equilibrium in the bimatrix game \((\mathbb {A},\mathbb {B})\) defined above. Then the stationary strategies
form a Nash equilibrium in the discounted stochastic game.
It should be noted that a similar result does not hold for stochastic games with the limit-average payoffs. Note that the entries of the matrix \( \mathbb {B}\) can be computed in finitely many steps, but the order of the associated LCP is very high. Therefore, a natural question arises as to whether the single-controller stochastic game can be solved with the help of an LCP formulation with an appropriately defined matrix \(\mathbb {M}\) (of lower dimension) and vector \(\overline {Q}\). Since the payoffs and transitions depend on states, and stationary equilibria are characterized by systems of Bellman equations, the dimension of the LCP must be high. However, it should be essentially smaller than in the case of Theorem 16. Such an LCP formulation for discounted single-controller stochastic games was given by Mohan et al. (1997) and further developed in Mohan et al. (2001). In the case of the limit-average payoff and single-controller stochastic game, Raghavan and Syed (2002) provided an analogous algorithm. Further studies on specific classes of stochastic games (acyclic 3-person switching control games, polystochastic games) can be found in Krishnamurthy et al. (2012).
Let us recall that a Nash equilibrium in an n-person game is a fixed point of some mapping. A fixed point theorem for certain deformations of continuous mappings, proved by Browder (1960), turned out to be basic for developing so-called homotopy methods for computing equilibria in nonzero-sum games. It reads as follows.
Theorem 17
Assume that \(C\subset \mathbb {R}^d\) is a nonempty compact convex set. Let Ψ : [0, 1] × C → C be a continuous mapping and F(Ψ) := {(t, c) ∈ [0, 1] × C : c = Ψ(t, c)}. Then F(Ψ) contains a connected subset Fc(Ψ) such that Fc(Ψ) ∩ ({0}× C) ≠ ∅ and Fc(Ψ) ∩ ({1}× C) ≠ ∅.
This result was extended to upper semicontinuous correspondences by Mas-Colell (1974). Consider an n-person game and assume that Ψ1 is a continuous mapping whose fixed points in the set C of strategy profiles correspond to Nash equilibria in this game. The basic idea of the homotopy methods is to define a “deformation” Ψ of Ψ1 such that Ψ(1, c) = Ψ1(c) for all c ∈ C and such that Ψ(0, ⋅) has a unique fixed point, say \(c_0^*\), that is relatively simple to find. By Theorem 17, Fc(Ψ) is a connected set, so \(c_0^*\) is connected via Fc(Ψ) with a fixed point \(c_1^*\) of Ψ1. Since the dimension of the domain of Ψ is one higher than the dimension of its range, one can formulate regularity conditions under which the approximation path is a compact, piecewise differentiable one-dimensional manifold, i.e., a finite collection of arcs and loops. In the case of bimatrix games, a nondegeneracy condition is sufficient to guarantee that the aforementioned properties are satisfied. A comprehensive discussion of the homotopy algorithms applied to n-person games is provided by Herings and Peeters (2010) and the references cited therein. According to the authors, “advantages of homotopy algorithms include their numerical stability, their ability to locate multiple solutions, and the insight they provide in the properties of solutions”. Various examples show that the implementation of homotopy methods is rather straightforward with the aid of available professional software. It is worth recalling that the Lemke-Howson algorithm can be applied to bimatrix games only. Finding Nash equilibria in concave n-person games leads to a nonlinear complementarity problem; therefore, one can only expect to obtain approximate equilibria by various numerical methods.
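A toy one-dimensional Python sketch can illustrate the homotopy idea. It is purely illustrative: the map g(c) = cos c, the starting point, and the step counts are assumptions, and real homotopy algorithms trace the path with predictor-corrector steps on the manifold rather than naive fixed-point iteration.

```python
# One-dimensional sketch of the homotopy idea behind Theorem 17:
# deform an easy map (constant, with an obvious fixed point) into the
# map of interest and follow the fixed point as t goes from 0 to 1.
# Psi(t, c) = (1 - t) * 0.3 + t * cos(c) maps [0, 1] x [0, 1] into [0, 1].
import math

def psi(t, c):
    return (1.0 - t) * 0.3 + t * math.cos(c)

def follow_path(steps=100, iters=200):
    c = 0.3                      # unique fixed point of Psi(0, .)
    for k in range(1, steps + 1):
        t = k / steps
        for _ in range(iters):   # warm-started fixed-point iteration
            c = psi(t, c)
    return c

c1 = follow_path()
```

Here the path connects the trivial fixed point at t = 0 to the fixed point of cos at t = 1 (approximately 0.739).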
The homotopy methods, as noted by Herings and Peeters (2004), are also useful in the study of stationary equilibria, their structure and computation in nonzero-sum stochastic games. Their results can be applied to n-person discounted stochastic games with finite state and action spaces.
Govindan and Wilson (2003) proposed a new algorithm to compute Nash equilibria in finite games. Their algorithm combines the global Newton method (see Smale 1976) and a homotopy method for finding fixed points of continuous mappings developed by Eaves (1972, 1984). In the construction of a Nash equilibrium, a fundamental topological property of the graph of the Nash equilibrium correspondence discovered by Kohlberg and Mertens (1986) plays an important role. More precisely, the authors show that, making use of the global Newton method, it is possible to trace the path of the homotopy by a dynamical system. The same method can be applied to the construction of an algorithm for n-person discounted stochastic games with finite action and state sets; see Govindan and Wilson (2009). Strategic n-person games with a potential function having pure Nash equilibria were considered by Monderer and Shapley (1996). Potters et al. (2009), on the other hand, examined certain classes of discounted stochastic games via the potential function approach and constructed pure stationary Nash equilibria by solving a finite number of finite strategic games.
Solan and Vieille (2010) pointed out that the methods based on formal logic, successfully applied to zero-sum games, are also useful in the examination of certain classes of nonzero-sum stochastic games with the limit-average payoff criterion.
9 Uniform Equilibrium, Subgame Perfection, and Correlation in Stochastic Games with Finite State and Action Spaces
In this section, we consider stochastic games with finite state space X = {1, …, s} and finite sets of actions. We deal with “normalized discounted payoffs” and use notation which is more consistent with the surveyed literature. We let β = 1 − λ and multiply all current payoffs by λ ∈ (0, 1). Thus, we consider
For any \(T\in \mathbb {N}\) and x = x1 ∈ X, π ∈ Π, the T-stage average payoff for player i ∈ N is
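The two omitted displays read, in the standard notation of this normalization (a reconstruction):

$$\displaystyle \begin{aligned} J^i_{\lambda}(x,\pi)=E^{\pi}_x\Big[\sum_{t=1}^{\infty}\lambda(1-\lambda)^{t-1}u_i(x_t,a^t)\Big],\qquad J^i_T(x,\pi)=\frac{1}{T}\,E^{\pi}_x\Big[\sum_{t=1}^{T}u_i(x_t,a^t)\Big]. \end{aligned}$$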
A vector \(\bar {g}\in \mathbb {R}^{n}\) is called a uniform equilibrium payoff at the initial state x ∈ X if for every 𝜖 > 0 there exist λ0 ∈ (0, 1], \(T^0\in \mathbb {N}\) and a strategy profile π0 ∈ Π such that for every player i ∈ N and every strategy πi ∈ Πi, we have
and
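The two omitted inequalities state, in standard form, that π0 is 𝜖-optimal and attains \(\bar {g}\) both in all λ-discounted games with small λ and in all sufficiently long finite games (a reconstruction):

$$\displaystyle \begin{aligned} J^i_{\lambda}(x,(\pi_i,\pi^0_{-i}))\le \bar{g}_i+\epsilon \quad\mbox{and}\quad |J^i_{\lambda}(x,\pi^0)-\bar{g}_i|\le \epsilon\quad \mbox{for all}\ \lambda\in(0,\lambda^0], \end{aligned}$$

$$\displaystyle \begin{aligned} J^i_T(x,(\pi_i,\pi^0_{-i}))\le \bar{g}_i+\epsilon \quad\mbox{and}\quad |J^i_T(x,\pi^0)-\bar{g}_i|\le \epsilon\quad \mbox{for all}\ T\ge T^0. \end{aligned}$$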
Any profile π0 that has the above two properties is called a uniform 𝜖-equilibrium. In other words, the game has a uniform equilibrium payoff if for every 𝜖 > 0 there is a strategy profile π0 which is an 𝜖-equilibrium in every discounted game with a sufficiently small discount factor λ and in every finite-stage game with a sufficiently long time horizon.
A stochastic game is called absorbing if all states but one are absorbing. Assume that X = {1, 2, 3} and only state x = 1 is nonabsorbing. Let E0 denote the set of all uniform equilibrium payoffs. Since the payoffs are determined in states 2 and 3, in a two-person game, the set E0 can be viewed as a subset of \(\mathbb {R}^2\). Let λk → 0 as k →∞, and let \( f_k^*\) be a stationary Markov perfect equilibrium in the λk-discounted two-person game. A question arises as to whether the sequence \( (J^1_{\lambda _k}(x,f^*_k),J^2_{\lambda _k}(x,f^*_k))_{k\in \mathbb {N}}\) with x = 1 has an accumulation point \( \bar {g}\in E^0\). This is the case in zero-sum games (see Mertens and Neyman 1981). Sorin (1986) provided a nonzero-sum modification of the “Big Match” in which \(\lim _{k\to \infty } (J^1_{\lambda _k}(1,f^*_k),J^2_{\lambda _k}(1,f^*_k))\not \in E^0\). A similar phenomenon occurs for the limit of T-stage equilibrium payoffs. Sorin (1986) gave a full description of the set E0 in his example. His observations were generalized by Vrieze and Thuijsman (1989) to all 2-person absorbing games. They proved the following result.
Theorem 18
Any two-person absorbing stochastic game has a uniform equilibrium payoff.
We now state the fundamental result of Vieille (2000a,b).
Theorem 19
Every two-person stochastic game has a uniform equilibrium payoff.
The proof of Vrieze and Thuijsman (1989) is based on the “vanishing discount factor approach” combined with the idea of “punishment” successfully used in repeated games. The assumption that there are only two players is important in the proof. The 𝜖-equilibrium strategies that they construct need unbounded memory. The proof of Vieille (2000a,b) is involved. One of the reasons is that the ergodic classes do not depend continuously on strategy profiles. Following Vieille (2002), one can briefly say that “the basic idea is to devise an 𝜖-equilibrium profile that takes the form of a stationary-like strategy vector, supplemented by threats of indefinite punishment”. The construction of a uniform equilibrium payoff consists of two independent steps. First, a class of solvable states is recognized and some controlled sets are considered. Second, the problem is reduced to the existence of equilibria in a class of recursive games. The punishment component is crucial in the construction, and therefore the fact that the game is 2-person is important. Neither of the two parts of the proof can be extended to games with more than two players. The 𝜖-equilibrium profiles have no subgame-perfection property and require unbounded memory for the players. For a heuristic description of the proof, the reader is referred to Vieille (2002). In a recent paper, Solan (2017) proposed a new solution concept for multiplayer stochastic games called acceptable strategy profiles. It is simpler than uniform equilibrium and has some interesting properties. A suitable adaptation of the notion of uniform equilibrium is studied by Neyman (2017) in the class of continuous-time stochastic games with a small imprecision in the specification of players’ evaluations of streams of payoffs.
Flesch et al. (1997) proposed a three-person game with absorbing states in which only a cyclic Markov equilibrium exists. No examples of this type were found in the 2-person case. This example inspired Solan (1999), who, making use of certain arguments from Vrieze and Thuijsman (1989), proved the following result.
Theorem 20
Every 3 -person absorbing stochastic game has a uniform equilibrium payoff.
In a quitting game, every player has only two actions, c for continue and q for quit. As soon as one or more of the players at any stage chooses q, the game stops and the players receive their payoffs, which are determined by the subset of players, say S, that simultaneously choose the action q. If nobody chooses q throughout all stages of play, then all players receive zero. The payoffs are defined as follows. For every nonempty subset S ⊂ N of players, there is a payoff vector \(v(S) \in \mathbb {R}^n\). At the first stage at which the set of players choosing q equals S, every player i ∈ N receives the payoff v(S)i. A quitting game is a special absorbing stochastic game with limit-average payoffs. The example of Flesch et al. (1997) belongs to this class. We now state the result due to Solan and Vieille (2001).
Theorem 21
Consider a quitting game satisfying the following assumptions: if player i alone quits, then i receives 1, and if player i quits with some other players, then i receives at most 1. Then the game has a subgame-perfect 𝜖-equilibrium. Moreover, there is a cyclic 𝜖-equilibrium strategy profile.
Quitting games are special cases of “escape games” studied by Simon (2007). As shown by Simon (2012), a study of quitting games can be based on some methods of topological dynamics and homotopy theory. More comments on this issue can be found in Simon (2016).
Thuijsman and Raghavan (1997) studied n-person perfect information stochastic games and n-person ARAT stochastic games and showed the existence of pure equilibria in the limit-average payoff case. They also derived the existence of 𝜖-equilibria for 2-person switching control stochastic games with the same payoff criterion. A class of n-person stochastic games with the limit-average payoff criterion and additive transitions as in the ARAT case (see Sect. 5) was studied by Flesch et al. (2007). The payoff functions do not satisfy any separability-in-actions assumption. They established the existence of history-dependent Nash equilibria. For 2-person absorbing games, they showed the existence of stationary 𝜖-equilibria. In Flesch et al. (2008, 2009), the authors studied stochastic games with the limit-average payoffs where the state space X is the Cartesian product of some finite sets Xi, i ∈ N. For any state x = (x1, …, xn) ∈ X and any profile of actions a = (a1, …, an), the transition probability is of the form q(y|x, a) = q1(y1|x1, a1)⋯qn(yn|xn, an) where y = (y1, …, yn) ∈ X. In both aperiodic and periodic cases, they established the existence of Nash equilibria for n-person games. In the two-person zero-sum case, there exists a stationary Markov perfect equilibrium.
A stochastic game is recursive if the payoffs at all nonabsorbing states are zero. The class of recursive stochastic games is important. The payoffs in any absorbing state can be interpreted as limit averages of stage payoffs as soon as the absorbing state is reached. If no absorbing state is reached, then the average payoff is zero. Moreover, as noted by Simon (2016), “by expanding the state space of any normal stochastic game so that there is a one-to-one relationship between the finite histories of play and the states, any state corresponds to a clopen (open and closed) subset of the infinite histories of play and every open subset of the infinite histories of play will correspond to some collection of states. A stochastic game where all non-zero payoffs are determined by membership in an open set of the infinite histories of play becomes in this way equivalent to a recursive game. Notice that if all absorbing payoffs are positive then the payoffs are lower semicontinuous, and if all absorbing payoffs are negative then the payoffs are upper semicontinuous (as functions on the infinite histories of play).” Flesch et al. (2010b) considered a class of n-person stochastic perfect information games assuming that in every state, the transitions are controlled by one player. The payoffs are equal to zero in every nonabsorbing state and are nonnegative in every absorbing state. They proposed a new iterative method to analyse these games under the expected limit-average payoff criterion and proved the existence of a subgame-perfect 𝜖-equilibrium in pure strategies. They also showed the existence of uniform equilibrium payoffs. Recursive n-person perfect information games, where each player controls one nonabsorbing state and the transitions are deterministic, were studied in Kuipers et al. (2016). Allowing also for negative payoffs in absorbing states (in contrast to Flesch et al. 2010b), the authors showed the existence of a subgame-perfect 𝜖-equilibrium by a combinatorial method.
Correlated equilibria were introduced by Aumann (1974, 1987) for games in normal form. Correlation devices may be of different types; see Forges (2009). In Sect. 4 we considered a correlation device using public randomization. Such devices are also called stationary, because at every stage a signal is generated according to the same probability distribution, independently of any data. There are also devices based on past signals that were sent to the players, but not on the past play. They are called “autonomous correlation devices” (see Forges 2009). An 𝜖-equilibrium in an extended game that includes an autonomous correlation device is also called an extensive-form correlated 𝜖-equilibrium in a multistage game. Solan (2001) characterized the set of extensive-form correlated 𝜖-equilibria in stochastic games. He showed that every feasible and individually rational payoff in a stochastic game is an extensive-form correlated equilibrium payoff constructed with the help of an appropriately chosen device.
The following two results are due to Solan and Vieille (2002).
Theorem 22
Every n-person stochastic game with finite state and action spaces has a uniform correlated equilibrium payoff using an autonomous correlation device.
The construction of an equilibrium profile is based on the method of Mertens and Neyman (1981) applied to zero-sum games. The equilibrium path is sustained by the use of threat strategies. However, punishment occurs only if a player disobeys the recommendation of the correlation device. The second result is stronger in some sense but concerns positive recursive games, where the payoffs in absorbing states are nonnegative for all players.
Theorem 23
Every positive recursive stochastic game with finite sets of states and actions has a uniform correlated equilibrium payoff and the correlation device can be taken to be stationary.
The proof of the above result makes use of a variant of the method of Vieille (2000b).
In a recent paper, Mashiah-Yaakovi (2015) considered stochastic games with countable state spaces, finite sets of actions and Borel measurable bounded payoffs, defined on the space H∞ of all plays. This class includes the Gδ-games of Blackwell (1969). The concept of uniform 𝜖-equilibrium does not apply to this class of games, because the payoffs are not additive. She proved that these games have extensive-form correlated 𝜖-equilibria.
Secchi and Sudderth (2002a) considered a special class of n-person stochastic “stay-in-a-set games” defined as follows. Let Gi be a fixed subset of X for each i ∈ N. Define \(G_i^{\infty }:=\{(x_1,a^1,x_2,a^2,\ldots )\}\), where xt ∈ Gi for every t. The payoff function for player i ∈ N is the characteristic function of the set \(G_i^\infty \). They proved the existence of an 𝜖-equilibrium (equilibrium) assuming that the state space is countable (finite) and the sets of actions are finite. Maitra and Sudderth (2003) generalized this result to the Borel state stay-in-a-set games with compact action sets using a standard continuity assumption on the transition probability with respect to actions. Secchi and Sudderth (2002b) proved that every n-person stochastic game with countably many states, finite action sets and bounded upper semicontinuous payoff functions on H∞ has an 𝜖-equilibrium. All proofs in the aforementioned papers are partially based on methods from gambling theory; see Dubins and Savage (2014).
Nonzero-sum infinite horizon games with perfect information are special cases of stochastic games. Flesch et al. (2010a) established the existence of subgame-perfect 𝜖-equilibria in pure strategies in perfect information games with lower semicontinuous payoff functions on the space H∞ of all plays. A similar result for games with upper semicontinuous payoffs was proved by Purves and Sudderth (2011). It is worth mentioning that the aforementioned results also hold for games with arbitrary nonempty action spaces and deterministic transitions. Solan and Vieille (2003) provided an example of a two-person game with perfect information that has no subgame-perfect 𝜖-equilibrium in pure strategies, but does have a subgame-perfect 𝜖-equilibrium in behavior strategies. Their game belongs to the class of deterministic stopping games. Recently, Flesch et al. (2014) showed that a subgame-perfect 𝜖-equilibrium (in behavioral strategies) may not exist in perfect information games if the payoff functions are bounded and Borel measurable.
Additional general results on subgame-perfect equilibria in games of perfect information can be found in Alós-Ferrer and Ritzberger (2015, 2016). Two refinements of subgame-perfect 𝜖-equilibrium concept were introduced and studied in continuous games of perfect information by Flesch and Predtetchinski (2015).
Two-person discounted stochastic games of perfect information with finite state and action spaces were treated in Küenle (1994). Making use of threat strategies, he constructed a history-dependent pure Nash equilibrium. However, it is worth pointing out that pure stationary Nash equilibria need not exist in this class of games. A similar remark applies to irreducible stochastic games of perfect information with the limiting average payoff criterion. Counterexamples are described in Federgruen (1978) and Küenle (1994).
We close this section with a remark on “folk theorems” for stochastic games. It is worth mentioning that the techniques based on threat strategies, utilized very often in repeated games, cannot be immediately adapted to stochastic games, where the players use randomized (behavioral) strategies. Deviations are difficult to detect when the actions are selected at random. However, some folk theorems for various classes of stochastic games were proved in Dutta (1995), Fudenberg and Yamamoto (2011), Hörner et al. (2011, 2014), and Peski and Wiseman (2015). Further comments can be found in Solan and Ziliotto (2016).
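For readers less familiar with the threat-strategy logic underlying folk theorems, a minimal numerical sketch may help. The example below (all payoff values are hypothetical and not taken from the cited papers) checks when a grim-trigger threat sustains cooperation in an ordinary repeated prisoner's dilemma, the baseline case that the stochastic-game results above generalize:

```python
# Illustrative sketch (hypothetical payoffs): the threat-strategy logic behind
# folk theorems in the simplest setting, a repeated prisoner's dilemma with
# payoffs R (mutual cooperation), T (temptation), P (mutual punishment),
# S_ (sucker) and discount factor delta.
R, T, P, S_ = 3.0, 5.0, 1.0, 0.0

def grim_trigger_sustains(delta):
    """Cooperation backed by a grim-trigger threat is an equilibrium iff a
    one-shot deviation does not pay: (1 - delta)*T + delta*P <= R."""
    return (1 - delta) * T + delta * P <= R

# The deviation constraint binds at delta* = (T - R)/(T - P).
delta_star = (T - R) / (T - P)
print(delta_star)                      # 0.5 for these payoffs
print(grim_trigger_sustains(0.6))      # True
print(grim_trigger_sustains(0.4))      # False
```

The point of the counterexamples mentioned above is precisely that this simple one-shot deviation check does not transfer directly to stochastic games with randomized strategies, where deviations may be statistically undetectable.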
Abreu et al. (1986, 1990) applied a method for analysing subgame-perfect equilibria in discounted repeated games that resembles the dynamic programming technique. The set of equilibrium payoffs is a fixed point of a naturally defined set-valued operator. A similar idea was used in stochastic games by Mertens and Parthasarathy (1991). The fixed point property for subgame-perfect equilibrium payoffs can be used to develop algorithms. Berg (2016) and Kitti (2016) considered some modifications of the aforementioned methods for discounted stochastic games with finite state spaces. They also demonstrated some techniques for computing (non-stationary) subgame-perfect equilibria in pure strategies provided that they exist. Sleet and Yeltekin (2016) applied the methods of Abreu et al. (1986, 1990) to some classes of dynamic games and provided a new approach for computing equilibrium value correspondences. Their idea is based on outer and inner approximations of the equilibrium value correspondence via step set-valued functions.
10 Nonzero-Sum Stochastic Games with Imperfect Monitoring
There are only a few papers on nonzero-sum stochastic games with imperfect monitoring (or incomplete information). Although in many models an equilibrium does not exist, some positive results were obtained for repeated games; see Forges (1992), Chap. IX in Mertens et al. (2015) and references cited therein. Altman et al. (2005, 2008) studied stochastic games, in which every player can only observe and control his “private state” and the state of the world is composed of the vector of private states. Moreover, the players do not observe the actions of their partners in the game. Such models of games are motivated by certain examples in wireless communications.
In the model of Altman et al. (2008), the state space \(X= \prod _{i=1}^n X_i\), where Xi is a finite set of private states of player i ∈ N. The action space Ai(xi) of player i ∈ N depends on xi ∈ Xi and is finite. It is assumed that player i ∈ N has no information about the payoffs, which are called costs in this model. Hence, player i only knows the history of his private state process and the actions chosen by himself in the past. Thus, a strategy πi of player i ∈ N is independent of realizations of the state processes of other players and of their actions. If x = (x1, …, xn) ∈ X is a state at some period of the game and a = (a1, …, an) is the action profile selected independently by the players at that state, then the probability of going to state y = (y1, …, yn) is q(y|x, a) = q1(y1|x1, a1)⋯qn(yn|xn, an), where \(q_i(\cdot |x_i,a_i)\in \Pr (X_i)\). Thus the coordinate (or private) state processes are independent. It is assumed that every player i has a probability distribution νi of the initial state xi ∈ Xi and that the initial private states are independent. The initial distribution ν of the state x ∈ X is determined by ν1, …, νn in an obvious way and is known by the players. Further, it is supposed that player i ∈ N is given some stage cost functions \( c_i^j(x,a)\) (j = 0, 1, …, ni) depending on x ∈ X and action profiles a available in that state. The cost function \(c^0_i\) is to be minimized by player i in the long run, and the \(c^j_i\) (for j > 0) are the costs that must satisfy some constraints described below.
Any strategy profile π together with the initial distribution ν and the transition probability q induces a unique probability measure on the space of all infinite plays. The expectation operator with respect to this measure is denoted by \(E^\pi _{\nu }\). The expected limit-average cost \( C_i^j(\pi )\) is defined as follows:
$$\displaystyle \begin{aligned} C^j_i(\pi):=\limsup_{T\to\infty}\frac{1}{T}\, E^\pi_\nu\left[\sum_{t=1}^T c^j_i(x_t,a_t)\right]. \end{aligned}$$
Note that \(x_t\in X\) and \(a_t\) is an action profile of all the players.
Let \(b_i^j>0\) (j = 1, …, ni) be bounds used to define constraints below. A strategy profile π is i-feasible if
$$\displaystyle \begin{aligned} C^j_i(\pi)\le b^j_i \quad \mbox{for all}\quad j=1,\ldots,n_i. \end{aligned}$$
A strategy profile π is feasible if it is i-feasible for every player i ∈ N.
A strategy profile π∗ is called a constrained Nash equilibrium, if π∗ is feasible and for every player i ∈ N and his strategy πi such that the profile \( (\pi _i,\pi ^*_{-i})\) is i-feasible, we have
$$\displaystyle \begin{aligned} C^0_i(\pi^*)\le C^0_i(\pi_i,\pi^*_{-i}). \end{aligned}$$
Note that a unilateral deviation of player i may increase his cost or it may violate his constraints. The aforementioned fact is illustrated in Altman et al. (2008) by an example in wireless communications.
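To make the definitions concrete, the following sketch (with entirely hypothetical numbers, not the wireless-communication example of the cited paper) estimates the limit-average costs \(C_i^0\) and \(C_i^1\) of a single player by simulating his private state process under a fixed stationary strategy and then checks the i-feasibility constraint:

```python
import random

# Hypothetical single-player illustration: Monte Carlo estimates of the
# limit-average costs C^0 (to be minimized) and C^1 (constrained by b1)
# under a fixed stationary strategy on private states {0, 1}.
random.seed(0)

# q[x][a] = probability of moving to private state 1 from state x under a.
q = {0: {0: 0.2, 1: 0.7}, 1: {0: 0.4, 1: 0.9}}
c0 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 3.0, (1, 1): 0.5}  # cost to minimize
c1 = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 1.0}  # constrained cost
b1 = 0.8                                                   # constraint bound

def strategy(x):        # stationary strategy: probability of action 1 at x
    return 0.5 if x == 0 else 0.3

def estimate_costs(T=200_000):
    x, tot0, tot1 = 0, 0.0, 0.0
    for _ in range(T):
        a = 1 if random.random() < strategy(x) else 0
        tot0 += c0[(x, a)]
        tot1 += c1[(x, a)]
        x = 1 if random.random() < q[x][a] else 0
    return tot0 / T, tot1 / T   # time-average approximations of C^0, C^1

C0, C1 = estimate_costs()
print(C0, C1, C1 <= b1)          # the strategy is i-feasible if C^1 <= b1
```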
Altman et al. (2008) made the following assumptions.
- (I1) (Ergodicity) For every player i ∈ N and any stationary strategy, the state process on Xi is a Markov chain with one ergodic class and possibly some transient states.
- (I2) (Strong Slater condition) There exists some η > 0 such that every player i ∈ N has a strategy \(\pi _i^\eta \) with the property that for any strategy profile π−i of the other players
$$\displaystyle \begin{aligned} C^j_i(\pi^\eta_i,\pi_{-i})\le b^j_i-\eta \quad \mbox{for all}\quad j=1,\ldots,n_i. \end{aligned}$$
- (I3) (Information) The players do not observe their costs.
Theorem 24
Consider the game model that satisfies conditions (I1)–(I3). Then there exists a stationary constrained Nash equilibrium.
Stochastic games with finite sets of states and actions and imperfect public monitoring were studied in Fudenberg and Yamamoto (2011) and Hörner et al. (2011). The players, in their models, observe states and receive only public signals on the actions chosen by the other players. Fudenberg and Yamamoto (2011) and Hörner et al. (2011) established “folk theorems” for stochastic games under assumptions that relate to “irreducibility” conditions on the transition probability function. Moreover, Hörner et al. (2011, 2014) also studied algorithms for both computing the sets of all equilibrium payoffs in the normalized discounted games and for finding their limit as the discount factor tends to one. As shown by counterexamples in Flesch et al. (2003), an n-person stochastic game with non-observable actions of the players (and no public signals), observable payoffs and the expected limit-average payoff criterion may fail to possess an 𝜖-equilibrium. Cole and Kocherlakota (2001) studied discounted stochastic games with hidden states and actions. They provided an algorithm for finding a sequential equilibrium, where strategies depend on private information only through the privately observed state. Imperfect monitoring is also assumed in the model of the supermodular stochastic game studied by Balbus et al. (2013b), where the monotone convergence of Nash equilibrium payoffs in finite-stage games is proved.
11 Intergenerational Stochastic Games
This section develops a concept of equilibrium behavior and establishes its existence in various intergenerational games. Both the paternalistic and the non-paternalistic altruism cases are discussed. Consider an infinite sequence of generations labelled by \(t\in \mathbb {N}\). There is a single good (also called a renewable resource) that can be used for consumption or productive investment. The set of all resource stocks S is an interval in \( \mathbb {R}\). It is assumed that 0 ∈ S. Every generation lives one period and derives utility from its own consumption and the consumptions of some or all of its descendants. Generation t observes the current stock st ∈ S and chooses at ∈ A(st) := [0, st] for consumption. The remaining part yt = st − at is left as an investment for its descendants. The next generation’s inheritance or endowment is determined by a weakly continuous transition probability q from S to S (a stochastic production function), which depends on yt ∈ A(st) ⊂ S. Recall that the weak continuity of q means that q(⋅|ym) ⇒ q(⋅|y0) whenever ym → y0 in S (as m →∞). Usually, it is assumed that state 0 is absorbing, i.e., q({0}|0) = 1. Let Φ be the set of all Borel functions ϕ : S → S such that ϕ(s) ∈ A(s) for each s ∈ S. A strategy for generation t is a function ϕt ∈ Φ. If ϕt = ϕ for all \(t\in \mathbb {N}\) and some ϕ ∈ Φ, then we say that the generations employ a stationary strategy.
Suppose that all generations from t + 1 onward use a consumption strategy c ∈ Φ. Then, in the paternalistic model, generation t’s utility when it consumes at ∈ A(st) equals H(at, c)(st), where H is some real-valued function used to measure the generation’s satisfaction level. Thus, in models with paternalistic altruism each generation derives its utility from its own consumption and the consumptions of its successor or successors.
Such a game model reveals a time inconsistency. Strotz (1956) and Pollak (1968) were among the first to note this fact in the model of an economic agent whose preferences change over time. In related works, Phelps and Pollak (1968) and Peleg and Yaari (1973) observed that this situation is formally equivalent to one, in which decisions are made by a sequence of heterogeneous planners. They investigated the existence of consistent plans, which we shall call (stationary) Markov perfect equilibria. The solution concept is in fact a symmetric Nash equilibrium (c∗, c∗, …) in a game played by countably many short-lived players having the same utility functions. Therefore, we can say that a stationary Markov perfect equilibrium (c∗, c∗, …) corresponds to a strategy c∗∈ Φ such that
$$\displaystyle \begin{aligned} H(c^*(s),c^*)(s)=\sup_{a\in A(s)} H(a,c^*)(s) \end{aligned}$$
for every s ∈ S. We identify this equilibrium with c∗.
In other words, c∗∈ Φ is a stationary Markov perfect equilibrium if
$$\displaystyle \begin{aligned} c^*(s)\in \mathop{\mathrm{arg\,max}}_{a\in A(s)} H(a,c^*)(s) \quad \mbox{for every}\quad s\in S. \end{aligned}$$
There is now a substantial body of work on paternalistic models, see for instance, Alj and Haurie (1983), Harris and Laibson (2001), and Nowak (2010) and the results presented below in this section. At the beginning we consider three types of games, in which the existence of a stationary Markov perfect equilibrium was proved in a sequence of papers: Balbus et al. (2015a,b,c). Game (G1) describes a purely deterministic case, while games (G2) and (G3) deal with a stochastic production function. However, (G2) concerns a model with one descendant, whereas (G3) examines a model with infinitely many descendants. Let us mention that by an intergenerational game with k (k is finite or infinite) descendants (successors or followers), we mean a game in which each generation derives its utility from its own consumption and consumptions of its k descendants.
- (G1) Let S := [0, +∞). Assume that \(q(\cdot |y_t)=\delta _{p(y_t)}( \cdot )\), where p : S → S is a continuous and increasing production function such that p(0) = 0. We also assume that
$$\displaystyle \begin{aligned} H(a,c)(s)= \hat{u}(a,c(p(s-a))) \end{aligned}$$for some function \(\hat {u}: \mathbb {R}^2_+ \to \mathbb {R}\cup \{-\infty \}\) that is continuous and increasing in each variable. Moreover, we allow \(\hat {u}\) to be unbounded from below: \(\hat {u}(0,y)\) may equal −∞ for y ≥ 0, but \(\hat {u}(x,0) >-\infty \) for all x > 0. Furthermore, for any y1 > y2 in S and h > 0, we assume that the function \(\varDelta _h\hat {u} (x):= \hat {u}(x,y_1)-\hat {u}(x+h,y_2)\) has the strict single crossing property on (0, +∞), i.e., \(\varDelta _h\hat {u}(x) \ge 0\) implies that \(\varDelta _h\hat {u}(x^{\prime })> 0\) for each x′ > x (see Milgrom and Shannon 1994).
- (G2) Let S := [0, +∞). We study a model with a utility that reflects a generation’s attitude toward risk, expressed by a risk coefficient r ≥ 0. In this setup, H takes the following form:
$$\displaystyle \begin{aligned} H(a,c)(s) =\left\{ \begin{array}{l@{\quad \quad}l} u(a) + \beta\int_Sv(c(s^{\prime }))q(ds^{\prime }|s-a), & \mbox{for}\quad r=0 \\ u(a) - \frac{\beta}{r}\ln\int_Se^{-rv(c(s^{\prime }))}q(ds^{\prime }|s-a), & \mbox{for}\quad r>0, \end{array} \right. \end{aligned}$$where \(u:S\to \mathbb {R}\cup \{-\infty \}\) is increasing, strictly concave, continuous on (0, +∞) and u(0) ≥−∞. In addition, the function \(v:S\to \mathbb {R}\) is bounded, continuous and increasing. Further assumptions are as follows: for every s ∈ S, the set Zs := {y ∈ S : q({s}|y) > 0} is countable and the transition law q is stochastically increasing. The latter fact means that, if z → Q(z|y) is the cumulative distribution function for q(⋅|y), then for all y1 < y2 and z ∈ S, we have Q(z|y1) ≥ Q(z|y2).
- (G3) Let \(S:=[0,\bar {s}]\) for some \(\bar {s}>0\). In this case, we assume that the utility function of the current generation t is as follows:
$$\displaystyle \begin{aligned} H(a,c)(s) = \tilde{u}(a) + E^c_{s}[w(a_{t+1},a_{t+2},\ldots)], \end{aligned}$$where \(w: S^\infty \to \mathbb {R}\) is continuous and \(\tilde {u}: S\to \mathbb {R}\) is continuous, strictly concave and increasing. Here, \(E_s^c\) is an expectation operator with respect to the unique probability measure on the space of all feasible future histories (starting from the endowment s of generation t) of the consumption-investment process induced by the stationary strategy c ∈ Φ used by each generation τ (τ > t) and the transition probability q. Defining
$$\displaystyle \begin{aligned} \tilde{J}(c)(s)= E^c_{s}[w(a_k, a_{k+1},a_{k+2},\ldots)] \end{aligned}$$for every \(k\in \mathbb {N}\), we obtain that
$$\displaystyle \begin{aligned} H(a,c)(s) =\tilde{u}(a)+\int_S \tilde{J}(c)(s^{\prime })q(ds^{\prime }|s-a). \end{aligned}$$In addition, q(⋅|y) is assumed to be nonatomic for y > 0.
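The risk-sensitive aggregator appearing in (G2) can be illustrated numerically. The sketch below (with hypothetical continuation utilities and probabilities) computes the entropic certainty equivalent \(-\frac{1}{r}\ln E[e^{-rV}]\), which converges to the risk-neutral expectation as r → 0 and penalizes risk for r > 0:

```python
import math

# Hypothetical numerical illustration of the risk-sensitive term in (G2):
# the entropic certainty equivalent -(1/r)*ln E[exp(-r*V)].
values = [1.0, 2.0, 3.0]        # possible continuation utilities v(c(s'))
probs = [0.2, 0.5, 0.3]         # distribution induced by q(ds'|s - a)

def entropic(r):
    """Risk-sensitive evaluation; reduces to E[V] when r = 0."""
    if r == 0.0:
        return sum(p * v for p, v in zip(probs, values))
    ev = sum(p * math.exp(-r * v) for p, v in zip(probs, values))
    return -math.log(ev) / r

print(entropic(0.0))     # risk-neutral expectation: 2.1
print(entropic(1e-4))    # close to 2.1 (small-risk limit)
print(entropic(1.0))     # strictly smaller: the evaluation is risk averse
```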
Let I denote the set of nondecreasing lower semicontinuous functions \(i: S\to \mathbb {R}\) such that i(s) ∈ A(s) for each s ∈ S. Note that every i ∈ I is continuous from the left and has at most a countable number of discontinuity points. Put
$$\displaystyle \begin{aligned} F:=\{c:\ c(s)=s-i(s),\ s\in S,\ i\in I\}. \end{aligned}$$
Clearly, any c ∈ F is upper semicontinuous and continuous from the left. The idea of using the class F of strategies for analysing equilibria in deterministic bequest games comes from Bernheim and Ray (1983). Further, it was successfully applied to the study of other classes of dynamic games with simultaneous moves; see Sundaram (1989a) and Majumdar and Sundaram (1991).
Theorem 25
Every intergenerational game (G1), (G2), and (G3) possesses a stationary Markov perfect equilibrium c∗∈ F.
The main idea of the proof is based upon the consideration of an operator L defined as follows: to each consumption strategy c ∈ F used by the descendant (or descendants) the operator L assigns the maximal element c0 of the set of best responses to c. It is shown that c0 ∈ F. Moreover, F can be viewed as a convex subset of the vector space Y of real-valued, continuous from the left functions \(\eta :S\to \mathbb {R}\) of bounded variation on every interval Sn := [0, n], \(n\in \mathbb {N}\), thus in particular on \([0,\bar {s}]\). We further equip Y with the topology of weak convergence: (ηm) converges weakly to η0 ∈ Y if \(\lim _{m\to \infty }\eta ^m(s)=\eta ^0(s)\) for every continuity point s of η0. Then, due to Lemma 2 in Balbus et al. (2015c), F is compact and metrizable. Finally, the equilibrium point is obtained via the Schauder-Tychonoff fixed point theorem applied to the operator L.
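A crude numerical analogue of this fixed-point construction can be sketched as follows. The code below uses a hypothetical specification, not the construction of the cited papers: it iterates a discretized best-response operator L for a deterministic bequest game with utility \(\sqrt{a}+\beta\sqrt{c(s-a)}\) and linear production, a case where the linear fixed point can be computed by hand and used to check the iteration:

```python
import math

# Hypothetical sketch: best-response iteration for a deterministic bequest
# game with one descendant, utility H(a, c)(s) = sqrt(a) + BETA*sqrt(c(s - a))
# and production p(y) = y. The operator L maps a consumption strategy c to
# the best response against c; a stationary Markov perfect equilibrium is a
# fixed point of L.
BETA = 0.9
N = 200                        # grid resolution on S = [0, 1]
STEP = 1.0 / N
S_GRID = [i * STEP for i in range(N + 1)]

def interp(c, y):
    """Piecewise-linear interpolation of the grid strategy c at y."""
    t = min(max(y, 0.0) / STEP, float(N))
    i = int(t)
    if i >= N:
        return c[N]
    frac = t - i
    return (1.0 - frac) * c[i] + frac * c[i + 1]

def best_response(c):
    """One application of L: pointwise maximization over consumption a."""
    new_c = [0.0]                           # nothing to consume at s = 0
    for s in S_GRID[1:]:
        candidates = (s * j / N for j in range(1, N + 1))
        new_c.append(max(candidates,
                         key=lambda a: math.sqrt(a)
                         + BETA * math.sqrt(interp(c, s - a))))
    return new_c

c = list(S_GRID)                            # start from "consume everything"
for _ in range(25):
    c = best_response(c)

# For this utility the linear fixed point c*(s) = theta*s solves
# BETA**2 * theta**2 + theta - 1 = 0.
theta = (-1.0 + math.sqrt(1.0 + 4.0 * BETA ** 2)) / (2.0 * BETA ** 2)
print(c[N], theta)                          # the two should be close
```

Of course the actual proof works in function space with the weak topology described above; the grid iteration only illustrates what the operator L does.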
Theorem 25 for game (G1) was proved in Balbus et al. (2015c). Related results for the purely deterministic case were obtained by Bernheim and Ray (1983) and Leininger (1986). For instance, Leininger (1986) studied a class \( \mathcal {U}\) of utility functions bounded from below for which every selector of the best response correspondence is nondecreasing. In particular, he noticed that this class is nonempty and includes, for instance, the separable case, i.e., u(x, y) = v(x) + bv(y), where v is strictly increasing and concave and b > 0. Bernheim and Ray (1983), on the other hand, showed that the functions u that are strictly concave in their first argument and satisfy the so-called increasing differences property (see Sect. 2) also belong to \(\mathcal {U}\). Other functions u that meet the conditions imposed by Bernheim and Ray (1983) and Leininger (1986) are of the form u(x, y) = v1(x)v2(y), where v1 is strictly concave and v2 ≥ 0 is continuous and increasing. The class \(\mathcal {U}\) is not fully characterized. The class (G1) of games includes all above-mentioned examples and some new ones. Our result is also valid for a larger class of utilities that can be unbounded from below. Therefore, Theorem 25 is a generalization of Theorem 4.2 in Bernheim and Ray (1983) and Theorem 3 in Leininger (1986). The proofs given by Bernheim and Ray (1983) and Leininger (1986) do not work for unbounded utility functions. Indeed, Leininger (1986) uses a transformation of upper semicontinuous consumption strategies into the set of Lipschitz functions with constant 1. This clever “levelling” operation enables him to equip the space of continuous functions on the interval \([0, \bar {y}]\) with the topology of uniform convergence and to apply the Schauder fixed point theorem. His proof makes strong use of the uniform continuity of u. This is the case when the production function crosses the 45∘ line.
If the production function does not cross the 45∘ line, a stationary equilibrium is obtained as a limit of equilibria corresponding to truncations of the production function. However, this part of the proof is descriptive and sketchy. Bernheim and Ray (1983), on the other hand, identify the maximal best-response consumption strategy, which is upper semicontinuous, with a convex-valued upper hemicontinuous correspondence. The space of such upper hemicontinuous correspondences is then equipped with the Hausdorff topology. This implies that the strategy space is compact if endowments have an upper bound, i.e., when the production function p crosses the 45∘ line. If this is not satisfied, then an approximation technique similar to that of Leininger (1986) is employed. Our proof does not follow the above-mentioned approximation methods. The weak topology introduced in the space Y implies that F is compact and allows us to use an elementary but non-trivial analysis. For examples of deterministic bequest games with stationary Markov perfect equilibria given in closed form, the reader is referred to Fudenberg and Tirole (1991) and Nowak (2006b, 2010).
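As a hypothetical closed-form illustration in the spirit of (G1) (not one of the cited examples), take \(\hat{u}(x,y)=\ln x+\beta\ln y\) and \(p(y)=y^\alpha\). Guessing a linear strategy and solving the first-order condition gives \(c^*(s)=s/(1+\alpha\beta)\), whose best response is again c∗; the sketch below verifies this fixed-point property numerically:

```python
import math

# Hypothetical closed-form example: u_hat(x, y) = ln x + BETA*ln y and
# production p(y) = y**ALPHA. The claimed stationary Markov perfect
# equilibrium is c*(s) = s/(1 + ALPHA*BETA); we check that the best
# response to c* is c* itself on a fine grid.
ALPHA, BETA = 0.5, 0.9

def c_star(s):
    return s / (1.0 + ALPHA * BETA)

def H(a, s):
    """Utility of consuming a at stock s when the descendant plays c_star."""
    y = s - a                               # investment
    return math.log(a) + BETA * math.log(c_star(y ** ALPHA))

def numeric_best_response(s, n=50_000):
    return max((s * j / n for j in range(1, n)), key=lambda a: H(a, s))

results = {s: numeric_best_response(s) for s in (0.5, 1.0, 2.0)}
for s, a in results.items():
    print(s, a, c_star(s))                  # numerical argmax vs c_star(s)
```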
Theorem 25 for game (G2) was proved by Balbus et al. (2015b), whereas for game (G3) by Balbus et al. (2015a). Within the stochastic framework, Theorem 25 can be viewed as an attempt to salvage the result reported by Bernheim and Ray (1989) on the existence of stationary Markov perfect equilibria in games with very general utility functions and nonatomic shocks. If q is allowed to possess atoms, then a stationary Markov perfect equilibrium exists in the bequest games with one follower (see Theorems 1–2 in Balbus et al. 2015b). The latter result also embraces the purely deterministic case; see Example 1 in Balbus et al. (2015b), where the nature and role of the assumptions are discussed. However, as shown in Example 3 in Balbus et al. (2015a), the existence of stationary Markov perfect equilibria in the class F cannot be proved in intergenerational games where q has atoms and there is more than one descendant.
The result in Bernheim and Ray (1986) concerns “consistent plans” in models with finite time horizon. The problem is then simpler. The results of Bernheim and Ray (1986) were considerably extended by Harris (1985) in his paper on perfect equilibria in some classes of games of perfect information. It should be noted that there are other papers that contain certain results for bequest games with a stochastic production function. Amir (1996b) studied games with one descendant for every generation and a transition probability such that the induced cumulative distribution function Q(z|y) is convex in y ∈ S. This condition is rather restrictive. Nowak (2006a) considered similar games in which the transition probability is a convex combination of the Dirac measure at state s = 0 and some transition probability from S to S with coefficients depending on investments. Similar models were considered by Balbus et al. (2012, 2013a). The latter paper also studies some computational issues for stationary Markov perfect equilibria. One should note, however, that the transition probabilities in the aforementioned works are rather specific. By contrast, the transition structure in Balbus et al. (2015a,b) is consistent with the transitions used in the theory of economic growth; see Bhattacharya and Majumdar (2007) and Stokey et al. (1989).
An interesting issue studied in the economics literature concerns the limiting behavior of the state process induced by a stationary Markov perfect equilibrium. Below we formulate a steady-state result for a stationary Markov perfect equilibrium obtained for the game (G1). Under slightly more restrictive conditions, it was shown by Bernheim and Ray (1987) that the equilibrium capital stock never exceeds the optimal planning stock in any period. Namely, it is assumed that
- (B1) p is strictly concave, continuously differentiable and \( \lim _{y\to 0^+}p^{\prime }(y)>1\), \(\lim _{y\to \infty }p^{\prime }(y) < 1/\beta \), where β ∈ (0, 1] is a discount factor;
- (B2) \( \hat {u}(a_t,a_{t+1})=\hat {v}(a_t)+\beta \hat {v}(a_{t+1})\), where \(\hat {v} :S\to \mathbb {R}\) is increasing, continuously differentiable, strictly concave and \(\hat {v}(a)\to \infty \) as a →∞.
An optimal consumption program \(\hat {a}:=(\hat {a}_t)_{t\in \mathbb {N} }\) is one that maximizes \(\sum _{t=1}^\infty \beta ^{t-1} \hat {v}(\hat {a} _t)\) subject to all feasibility constraints described in the model. The following result is stated as Theorems 3.2 and 3.3 in Bernheim and Ray (1987).
Theorem 26
Assume (B1)–(B2) and consider game (G1). If c∗ is a stationary Markov perfect equilibrium, then \(i^*(s)=s-c^*(s)\le \hat {y}\), where \(\hat {y}\in [0,\infty )\) is the limit of the sequence \((s_t-\hat {a}_t)_{t\in \mathbb {N}}\). If \(\hat {y}>0\), then it solves βp′(y) = 1. If \(\lim _{y\to 0^+}p'(y)>1/\beta \), then \(\hat {y}>0\).
For further properties of stationary Markov perfect equilibria, such as efficiency and Pareto optimality, the reader is referred to Sect. 4 in Bernheim and Ray (1987).
For stochastic models it is of some interest to know whether a stationary Markov perfect equilibrium induces a Markov process having an invariant distribution. It turns out that the answer is positive if an additional stochastic monotonicity requirement is met:
- (B3) If y1 < y2, then for any nondecreasing Borel measurable function \(h:S\to \mathbb {R}\),
$$\displaystyle \begin{aligned} \int_S h(s)q(ds|y_1)\le \int_S h(s)q(ds|y_2). \end{aligned}$$
By Theorem 25 for game (G3), there exists c∗∈ F. Then s → i∗(s) = s − c∗(s) is nondecreasing on S. Put q∗(B|s) := q(B|i∗(s)), where B is a Borel subset of S and s ∈ S. From (B3), it follows that s → q∗(⋅|s) is nondecreasing. Define the mapping \(\varPsi :\Pr (S)\to \Pr (S)\) by
$$\displaystyle \begin{aligned} \varPsi(\sigma)(B):=\int_S q^*(B|s)\sigma(ds), \end{aligned}$$
where \(B\in \mathcal {B}(S)\). An invariant distribution for the Markov process induced by the transition probability q∗ determined by i∗ (and thus by c∗) is any fixed point of Ψ. Let Δ(q∗) be the set of invariant distributions for the process induced by q∗. In Sect. 4 in Balbus et al. (2015a), the following result was proved.
Theorem 27
Assume (B3)and consider game (G3) . Then, the set of invariant distributions Δ(q∗) is compact in the weak topology on \(\Pr (S)\).
For each σ ∈ Δ(q∗), \(M(\sigma ):=\int _S s\,\sigma (ds)\) is the mean of the distribution σ. By Theorem 27, there exists σ∗∗ with the highest mean over the set Δ(q∗).
One can ask for the uniqueness of the invariant distribution. Theorem 4 in Balbus et al. (2015a) yields a positive answer to this question. However, this result concerns the model with multiplicative shocks, i.e., q is induced by the equation
$$\displaystyle \begin{aligned} s_{t+1}=f(y_t)\xi_t, \end{aligned}$$
where f : S → S is a continuous increasing function such that f(0) > 0. In addition, there is a state \( \hat {s}\in (0,\infty )\) such that f(y) > y for \(y\in (0,\hat {s})\) and f(y) < y for \(y\in (\hat {s}, \infty )\). Here \((\xi _t)_{t\in \mathbb {N}}\) is an i.i.d. sequence with the nonatomic distribution π. Assuming additionally the monotone mixing condition, we conclude from Theorem 4 in Balbus et al. (2015a) the uniqueness of the invariant distribution. Further discussion on these issues can be found in Stokey et al. (1989), Hopenhayn and Prescott (1992), Stachurski (2009), Balbus et al. (2015a) and the references cited therein.
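A simulation sketch (with hypothetical primitives f and ξ and an assumed equilibrium investment strategy, none of them taken from the cited papers) illustrates what uniqueness of the invariant distribution means in practice: long-run averages of the state process should not depend on the initial endowment:

```python
import random
import math

# Hypothetical multiplicative-shock model s_{t+1} = f(y_t) * xi_t. Under
# uniqueness of the invariant distribution, long-run averages started from
# different initial endowments should agree.
random.seed(42)

def f(y):                     # increasing, f(0) > 0, crosses the 45-degree line
    return 0.2 + 2.0 * math.sqrt(y)

def i_star(s):                # assumed nondecreasing equilibrium investment
    return 0.5 * s

def long_run_mean(s0, T=100_000, burn=1_000):
    s, total = s0, 0.0
    for t in range(T + burn):
        s = f(i_star(s)) * random.uniform(0.5, 1.5)   # nonatomic shock xi_t
        if t >= burn:
            total += s
    return total / T

m_low, m_high = long_run_mean(0.01), long_run_mean(10.0)
print(m_low, m_high)          # the two estimates should be close
```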
In contrast to the paternalistic model, one can also think of non-paternalistic altruism. This notion is concerned with a model in which each generation’s utility is derived from its own consumption and the utilities of all of its successors. The most general model with non-paternalistic altruism was formulated by Ray (1987). His work is of some importance, because it provides a proper definition of an equilibrium for the non-paternalistic case. According to Ray (1987), a stationary equilibrium consists of a pair of two functions: a saving policy (or strategy) and an indirect utility function. Such a pair constitutes an equilibrium if it is optimal for the current generation, provided its descendants use the same saving strategy and the same indirect utility function.
Assume that the generations from t onward use a consumption strategy c ∈ Φ. Then, the expected utility of generation t, that inherits an endowment \(s_t=s\in S:=[0, \bar {s}]\), is of the form
where \(\tilde {u}:S \to K\) and w : K∞→ K are continuous functions and \(K:=[0,\bar {k}]\) with some \(\bar {k}\ge \bar {s}\). The function v : S → K is called an indirect utility and is assumed to be Borel measurable. Similarly, for any c ∈ Φ and s = st+1 ∈ S, we can define
which yields
Let us define
where s ∈ S, a ∈ A(s) and c ∈ Φ. If st = s, then P(a, c, v)(s) is the utility for generation t choosing the consumption level a ∈ A(st) in this state under the assumption that all future generations will employ a stationary strategy c ∈ Φ and the indirect utility is v.
A stationary equilibrium in the sense of Ray (1987) is a pair (c∗, v∗), with c∗∈ Φ, and v∗ : S → K being a bounded Borel measurable function such that for every s ∈ S, we have that
Note that equality (6.9) says that there exist an indirect utility function v∗ and a consumption strategy c∗, both depending on the current endowment, such that each generation finds it optimal to adopt this consumption strategy provided its descendants use the same strategy and exhibit the given indirect utility.
Let V be the set of all nondecreasing upper semicontinuous functions v : S → K. Note that every v ∈ V is continuous from the right and has at most a countable number of discontinuity points. By I we denote the subset of all functions φ ∈ V such that φ(s) ∈ A(s) for each s ∈ S. Let F = {c : c(s) = s − i(s), s ∈ S, i ∈ I}. We impose similar conditions to those imposed on model (G3). Namely, we shall assume that \( \tilde {u}\) is strictly concave and increasing. Then, the following result holds.
Theorem 28
In a non-paternalistic game as described above with nonatomic transitions, there exists a stationary equilibrium (c∗, v∗) ∈ F × V .
Theorem 28 was established as Theorem 1 in Balbus et al. (2016). Ray (1987) analysed games with non-paternalistic altruism and deterministic production functions. Unfortunately, his proof contains a mistake. The above result is strongly based on the assumption that the transitions are nonatomic and weakly continuous. The problem in the deterministic model of Ray (1987) remains open. However, Theorem 28 implies that an equilibrium exists if a “small nonatomic noise” is added to the deterministic transition function.
There is a great deal of work devoted to so-called hyperbolic decision makers, in which the function w in (G3) has a specific form. Namely,
$$\displaystyle \begin{aligned} w(a_{t+1},a_{t+2},\ldots)=\alpha\sum_{\tau=1}^\infty \beta^{\tau}\tilde{u}(a_{t+\tau}), \end{aligned}$$
where α > 0 is interpreted as a short-run discount factor and β < 1 is known as a long-run discount coefficient. This model was studied by Harris and Laibson (2001) with the transition function defined via the difference equation
The random variables \((\xi _t)_{t\in \mathbb {N}}\) are nonnegative, independent, and identically distributed with respect to a nonatomic probability measure. The function \(\tilde {u}\) satisfies some restrictive condition concerning the risk aversion of the decision maker, but it may be unbounded from above. Working in the class of strategies with locally bounded variation, Harris and Laibson (2001) showed the existence of a stationary Markov perfect equilibrium in their model with concave utility function \( \tilde {u}\). They also derived a strong hyperbolic Euler relation. The model considered by Harris and Laibson (2001) can also be viewed as a game between generations; see Balbus and Nowak (2008), Nowak (2010), and Jaśkiewicz and Nowak (2014a), where related versions are studied. However, its main interpretation in the economics literature is that of a decision problem in which the utility of an economic agent changes over time. Thus, the agent is represented by a sequence of selves, and the problem is to find a time-consistent solution. This solution is actually a stationary Markov perfect equilibrium obtained by thinking about selves as players in an intergenerational game. For further details and references, the reader is referred to Harris and Laibson (2001) and Jaśkiewicz and Nowak (2014a). A decision model with time-inconsistent preferences involving selves who can stop the process at any stage was recently studied by Cingiz et al. (2016).
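The time inconsistency that motivates this literature can be reproduced in a few lines. The sketch below uses hypothetical rewards, with α and β playing the short-run and long-run roles described above, and exhibits the classic preference reversal between a smaller-sooner and a larger-later reward:

```python
# Hypothetical illustration of quasi-hyperbolic ("beta-delta") time
# inconsistency: ALPHA is the short-run and BETA the long-run factor.
ALPHA, BETA = 0.6, 0.95

def value(rewards):
    """Utility of a reward stream (r_t, r_{t+1}, ...) for the current self:
    r_t + ALPHA * sum_{k>=1} BETA**k * r_{t+k}."""
    head, *tail = rewards
    return head + ALPHA * sum(BETA ** (k + 1) * r for k, r in enumerate(tail))

# Choice between a smaller-sooner reward (10 at period 4) and a
# larger-later reward (12 at period 5), evaluated at different dates.
soon_from_0 = value([0, 0, 0, 0, 10, 0])
later_from_0 = value([0, 0, 0, 0, 0, 12])
soon_from_4 = value([10, 0])
later_from_4 = value([0, 12])

print(later_from_0 > soon_from_0)   # True: from afar, waiting for 12 wins
print(later_from_4 > soon_from_4)   # False: at period 4, take 10 now
```

The reversal is exactly why a time-consistent solution must be defined as an equilibrium between selves rather than as a single optimization.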
The model with the function w defined in (6.10) can be extended by adding to the transition probabilities an unknown parameter θ. Then, the natural solution for such a model is a robust Markov perfect equilibrium. Roughly speaking, this solution is based on the assumption that the generations involved in the game are risk-sensitive and accept a maxmin utility. More precisely, let Θ be a nonempty Borel subset of Euclidean space \(\mathbb {R}^m\) (m ≥ 1). Then, the endowment st+1 for generation t + 1 is determined by the transition q from S × Θ to S that depends on the investment yt ∈ A(st) and a parameter θt ∈ Θ. This parameter is chosen according to a certain probability measure \(\gamma _t\in \mathcal {P}\), where \(\mathcal {P}\) denotes the action set of nature and it is assumed to be a Borel subset of \( \Pr (\varTheta )\).
Let Γ be the set of all sequences \((\gamma _t)_{t\in \mathbb {N}}\) of Borel measurable mappings \(\gamma _t: D\to \mathcal {P}\), where D = {(s, a) : s ∈ S, a ∈ A(s)}. For any \(t\in \mathbb {N}\) and \(\gamma =(\gamma _t)_{t\in \mathbb {N}}\in \varGamma \), we set \(\gamma ^t := (\gamma _\tau )_{\tau \ge t}\). Clearly, \(\gamma ^t\in \varGamma \). A Markov strategy for nature is a sequence \(\gamma =(\gamma _t)_{t\in \mathbb {N}}\in \varGamma \). Note that \(\gamma ^t\) can be regarded as a Markov strategy used by nature from period t onward.
For any \(t\in \mathbb {N}\), define Ht as the set of all sequences
Ht is the set of all feasible future histories of the process from period t onward. Endow Ht with the product σ-algebra. Assume in addition that \( \tilde {u}\le 0\), that the generations employ a stationary strategy c ∈ Φ, and that nature chooses some γ ∈ Γ. Then the choice of nature is a probability measure depending on (st, c(st)). Let \(E^{c,\gamma ^t}_{s_t}\) denote as usual the expectation operator corresponding to the unique probability measure on Ht induced by the stationary strategy c ∈ Φ used by each generation τ ≥ t, the Markov strategy \(\gamma ^t\) of nature, and the transition probability q. Assume that all generations from t onward use c ∈ Φ and nature applies a strategy \(\gamma ^t\in \varGamma \). Then, generation t's expected utility is of the following form:
This definition of utility in an intergenerational game provides an intuitive notion of ambiguity aversion, which can be regarded as the generations' distrust of any imprecisely specified uncertainty, something that leaves room for the malevolent influence of nature. Letting
one can show that
For any s ∈ S, a ∈ A(s), and c ∈ Φ, we set
If s = st, then \(\hat {P}(a,c)(s)\) is the utility for generation t choosing a ∈ A(st) in this state when all future generations employ a stationary strategy c ∈ Φ.
A robust Markov perfect equilibrium is a function c∗∈ Φ such that for every s ∈ S we have
The existence of a robust Markov perfect equilibrium in the aforementioned model was proved by Balbus et al. (2014) under the assumption that the transition probability is a convex combination of probability measures μ1, …, μl on S with coefficients depending on the investment y = s − a. A robust Markov perfect equilibrium was obtained in the class of functions F under the condition that all measures μ1, …, μl are nonatomic. If μ1, …, μl have atoms, some stochastic dominance conditions are imposed, and the equilibrium is obtained in the class of Lipschitz continuous functions with constant 1. A different approach was presented in the work of Jaśkiewicz and Nowak (2014b), where the set of endowments S and the set of consumptions are Borel and the parameter set Θ is finite. Assuming again that the transition probability is a finite convex combination of probability measures μ1, …, μl on S depending on the parameter θ, with coefficients depending on the inheritance s and the consumption level a, they established a twofold result. First, they proved the existence of a robust Markov perfect equilibrium in the class of randomized strategies. Second, assuming that μ1, …, μl are nonatomic and making use of the Dvoretzky-Wald-Wolfowitz purification theorem, they replaced the randomized equilibrium by a pure one.
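The maxmin structure of a robust best response can be made concrete numerically. In the toy fragment below, the endowment grid, the utility u, the continuation value w, and nature's measures are all hypothetical stand-ins; the transition mixes a "good" and a "bad" measure with coefficients depending on the investment y = s − a, loosely in the spirit of the convex-combination assumption of Balbus et al. (2014).

```python
states = [0.0, 1.0, 2.0]                            # endowment grid (illustrative)
thetas = [0, 1]                                     # nature's finite parameter set
good = {0: [0.1, 0.3, 0.6], 1: [0.2, 0.4, 0.4]}     # favorable measures, per theta
bad = {0: [0.6, 0.3, 0.1], 1: [0.5, 0.4, 0.1]}      # unfavorable measures, per theta

def transition(s, a, th):
    # Convex combination with coefficients depending on investment y = s - a:
    # more investment tilts probability mass toward the favorable measure.
    y = max(s - a, 0.0)
    lam = min(y / s, 1.0) if s > 0 else 0.0
    return [lam * g + (1 - lam) * b for g, b in zip(good[th], bad[th])]

def robust_best_response(s, actions, u, w):
    """max over consumption a of min over theta of u(a) + E_{q(.|s,a,theta)} w."""
    def worst_case(a):
        return min(
            u(a) + sum(p * w(t) for p, t in zip(transition(s, a, th), states))
            for th in thetas
        )
    return max(actions, key=worst_case)

a_star = robust_best_response(2.0, [0.5, 1.0, 1.5],
                              lambda a: a ** 0.5, lambda t: 0.5 * t)
```

In the actual model the maximization runs over all of A(s), and the equilibrium condition additionally requires c∗(s) to solve this problem when the continuation value w is itself generated by c∗.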
The models of intergenerational games with general spaces of consumptions and endowments were also examined by Jaśkiewicz and Nowak (2014a). A novel feature in this approach is the fact that generation t can employ the entropic risk measure to calculate its utilities. More precisely, if Z is a random variable with the distribution π, then its entropic risk measure is \(\mathcal {E}(Z)=\frac {1}{r} \ln \int _{\varOmega } e^{rZ(\omega )}\pi (d\omega )\), where r < 0 is a risk coefficient. If r is sufficiently close to zero, then making use of the Taylor expansion one can see that
This means that a generation which uses the entropic risk measure to calculate its utility is risk averse and takes into account not only the expected value of the random future successors' utilities derived from consumptions but also their variance. Assuming that each generation cares only about its m descendants and that the transition probability is a convex combination of finitely many nonatomic measures on the endowment space with coefficients that may depend on s and a, Jaśkiewicz and Nowak (2014a) proved the existence of a stationary Markov perfect equilibrium in pure strategies. The same result was shown for games with infinitely many descendants in the case of hyperbolic preferences. In both cases the proof consists of two parts. First, a randomized stationary Markov perfect equilibrium was shown to exist. Second, making use of the specific structure of the transition probability and applying the Dvoretzky-Wald-Wolfowitz theorem, the desired pure stationary Markov perfect equilibrium was obtained.
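The mean-variance reading of the entropic risk measure can be checked numerically: the sketch below compares \(\mathcal {E}(Z)\) with the second-order Taylor approximation E[Z] + (r∕2)Var(Z) for a simple two-point distribution (an illustrative choice of Z).

```python
import math

def entropic_risk(values, probs, r):
    """Entropic risk measure (1/r) * ln E[exp(r * Z)] of a discrete Z; r < 0."""
    return math.log(sum(p * math.exp(r * v) for v, p in zip(values, probs))) / r

values, probs = [0.0, 1.0], [0.5, 0.5]   # fair coin: mean 0.5, variance 0.25
mean, var = 0.5, 0.25
r = -0.01                                # risk coefficient close to zero
exact = entropic_risk(values, probs, r)
approx = mean + 0.5 * r * var            # second-order Taylor expansion at r = 0
```

For r < 0 the exact value lies below the mean (risk aversion), and as r approaches zero the two expressions agree to high order, which is the content of the expansion used above.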
A game related to the above-mentioned models is the one with quasi-geometric discounting arising in dynamic consumer theory; see Chatterjee and Eyigungor (2016). In particular, the authors showed that in natural cases such a game does not possess a Markov perfect equilibrium in the class of continuous strategies. However, a continuous Markov perfect equilibrium exists if the model is reformulated to involve lotteries. These two models were then compared numerically. It is known that the numerical analysis of equilibria in models with hyperbolic (quasi-geometric) discounting has difficulties achieving convergence even in a simple deterministic optimal growth problem with a smooth closed-form solution. Maliar and Maliar (2016) defined some restrictions on the equilibrium strategies under which the numerical methods studied deliver a unique smooth solution for many deterministic and stochastic models.
Finally, we wish to point out that Markov perfect equilibria for stochastic bequest games with transition probabilities and utilities depending on time were shown to exist in Balbus et al. (2017, 2018).
12 Stopping Games
Stopping games were introduced by Dynkin (1969) as a generalization of optimal stopping problems. They were used in several models in economics and operations research, for example, in equipment replacement, job search, and consumer purchase behavior; see Heller (2012).
Dynkin (1969) dealt with the following problem. Two players observe a bivariate sequence of adapted random variables \((X(k),Y(k))_{k\in \mathbb {N}_0}\), where \(\mathbb {N}_0= \mathbb {N}\cup \{0\}\). Player 1 chooses a stopping time τ1 such that {τ1 = k}⊂{X(k) ≥ 0}, whereas player 2 selects τ2 such that {τ2 = k}⊂{X(k) < 0}. If τ := τ1 ∧ τ2 is finite, then player 2 pays Y(τ) to player 1 and the game terminates. Hence, the objective of player 1 (respectively, player 2) is to maximize (minimize) R(τ1, τ2) = E[Y(τ1 ∧ τ2)]. Dynkin (1969) characterized 𝜖-optimal stopping times and proved that the game has a value provided that \(\sup _{k\in \mathbb {N}_0}|Y(k)|\) is integrable. This model was later extended by Kiefer (1971) and Neveu (1975). In particular, Neveu (1975) showed the existence of a game value in a slightly modified model. Namely, he dealt with the following expected payoff function:
where \((X(k))_{k\in \mathbb {N}_0}\) and \((Y(k))_{k\in \mathbb {N}_0}\) are \( \mathbb {R}\)-valued adapted stochastic processes such that \(\sup _{k\in \mathbb { N}_0}(X^+(k)+Y^-(k))\) is integrable and X(k) ≤ Y (k) for all \(k\in \mathbb { N}_0\). The game considered by Neveu (1975) was generalized by Yasuda (1985), who dropped the latter ordering assumption. In his model, the expected payoff function takes the following form:
where as usual \((X(k))_{k\in \mathbb {N}_0}\), \((Y(k))_{k\in \mathbb {N}_0}\) and \( (Z(k))_{k\in \mathbb {N}_0}\) are adapted integrable random variables. Yasuda (1985) considered randomized strategies instead of pure ones. According to Yasuda (1985), a strategy for a player is an adapted random sequence \( p=(p_k)_{k\in \mathbb {N}_0}\) (or \(q=(q_k)_{k\in \mathbb {N}_0}\)) such that 0 ≤ pk, qk ≤ 1 with probability one. Here, pk (or qk) stands for the probability that the player stops the game at time k conditional on the event that the game was not stopped before. In computing the payoff induced by a pair of strategies (p, q), one assumes that the randomizations performed by the players in various stages are mutually independent and independent of the payoff processes. Thus, a strategy that corresponds to a stopping time σ is pk = 0 on the event [σ > k] and pk = 1 on the event [σ ≤ k]. Yasuda (1985) proved the existence of the value in the set of randomized strategies in finite and discounted infinite time horizon problems.
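For a finite horizon, the zero-sum game just described can be solved by backward induction: at each stage the players face a 2 × 2 matrix game whose "both continue" entry is the value of the remaining game, and the mixed value of that matrix game is the value of the stage. The sketch below is a minimal illustration under this reading; the arrangement of the stage-game entries is our own convention, not taken from the paper.

```python
def matrix_game_value(a, b, c, d):
    """Value of the 2x2 zero-sum game [[a, b], [c, d]] for the row player
    (the maximizer), allowing mixed strategies."""
    lower = max(min(a, b), min(c, d))   # maxmin over pure row strategies
    upper = min(max(a, c), max(b, d))   # minmax over pure column strategies
    if lower == upper:                  # pure saddle point exists
        return lower
    # fully mixed value of a 2x2 game without a saddle point
    return (a * d - b * c) / (a + d - b - c)

def stopping_game_value(X, Y, Z, terminal=0.0):
    """Finite-horizon value: at stage k simultaneous stopping yields Z[k],
    player 1 stopping alone X[k], player 2 alone Y[k]; otherwise continue."""
    v = terminal
    for k in reversed(range(len(X))):
        # rows: player 1 stop/continue; columns: player 2 stop/continue
        v = matrix_game_value(Z[k], X[k], Y[k], v)
    return v

v = stopping_game_value([1.0, 2.0], [0.0, 0.0], [0.5, 1.0])
```

Randomization matters here exactly as in Yasuda's result: when the 2 × 2 stage game has no saddle point, the stage value is attained only in mixed (i.e., randomized) stopping decisions.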
Before formulating the next result, we define the stopping stages for players 1 and 2 by \(\theta _1:=\inf \{k\in \mathbb {N}_0 :\ P(k)\le p_k\}\), and \(\theta _2:=\inf \{k\in \mathbb {N}_0 :\ Q(k)\le q_k\}, \) where \((P(k), Q(k))_{k\in \mathbb {N}_0}\) is a double sequence of i.i.d. random variables uniformly distributed over [0, 1] satisfying certain independence assumptions imposed in Rosenberg et al. (2001). Set θ = θ1 ∧ θ2. Clearly, θ is the stage at which the game stops. Let us define
and its β-discounted evaluation
The following result was proved by Rosenberg et al. (2001).
Theorem 29
Assume that \(E[\sup _{k\in \mathbb {N}_0} (|X(k)|+|Y(k)|+|Z(k)|)]<+\infty \). Then the stopping games with the payoffs R(p, q) and Rβ(p, q) have values, say v and vβ, respectively. Moreover, limβ→1vβ = v.
Let us now turn to nonzero-sum Dynkin games. They were considered in several papers; see, for instance, Ferenstein (2007), Krasnosielska-Kobos (2016), Morimoto (1986), Nowak and Szajowski (1999), Ohtsubo (1987), Ohtsubo (1991), Solan and Vieille (2001), and Szajowski (1994). Obviously, the list of references is by no means exhaustive. We start with presenting a result for two-player nonzero-sum stopping games. Assume that the aforementioned sequences \((X(k))_{k\in \mathbb {N}_0}\), \((Y(k))_{k\in \mathbb {N}_0}\) and \( (Z(k))_{k\in \mathbb {N}_0}\) take values in \(\mathbb {R}^2\) and are bounded, and let ρ be a uniform bound on the payoffs. The payoff of the game is R(p, q), which is now \(\mathbb {R}^2\)-valued. Shmaya and Solan (2004) proved the following result.
Theorem 30
For each 𝜖 > 0, the stopping game has an 𝜖-equilibrium \((p^*_{\epsilon },q^*_{\epsilon })\).
Theorem 30 does not hold if the payoffs are not uniformly bounded. Its proof is based upon a stochastic version of the Ramsey theorem that was also proved by Shmaya and Solan (2004). It states that for every colouring of the edges of a complete infinite graph by finitely many colours, there is a complete infinite monochromatic subgraph. Shmaya and Solan (2004) applied a variation of this result to reduce the problem of the existence of an 𝜖-equilibrium in a general stopping game to that of studying properties of 𝜖-equilibria in a simple class of stochastic games with finite state space. A similar result for deterministic 2-player nonzero-sum stopping games was reported by Shmaya et al. (2003).
All the aforementioned works deal with the two-player case and/or assume some special structure of the payoffs. Recently, Hamadène and Hassani (2014) studied n -person nonzero-sum Dynkin games. Such a game is terminated at τ := τ1 ∧… ∧ τn, where τi is a stopping time chosen by player i. Then, the corresponding payoff for player i is given by
where Is denotes the set of players who make the decision to stop, that is, Is = {m ∈{1, …, n} : τ = τm}, and \(W^{i,I_s}\) is the payoff stochastic process of player i. The main assumption says that a player's payoff is lower when he belongs to the group that decides to stop than when he does not. Hamadène and Hassani (2014) showed that the game has a Nash equilibrium in pure strategies. The proof is based on an approximation scheme whose limit provides a Nash equilibrium.
Krasnosielska-Kobos and Ferenstein (2013) is another paper that is concerned with multi-person stopping games. More precisely, they consider a game in which players sequentially observe the offers X(1), X(2), … at jump times T1, T2, … of a Poisson process. It is assumed that the random variables X(1), X(2), … form an i.i.d. sequence. Each accepted offer results in a reward R(k) = X(k)r(Tk), where r is a non-increasing discount function. If more than one player accepts the offer, then the player with the highest priority gets the reward. By making use of the solution to the multiple optimal stopping time problem with the above reward structure, Krasnosielska-Kobos and Ferenstein (2013) constructed a Nash equilibrium which is Pareto efficient.
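The reward structure R(k) = X(k)r(Tk) is easy to simulate. In the sketch below the offers are uniform on (0, 1), the discount function is exponential, and the Poisson rate is fixed; all three are illustrative assumptions, not choices made in the paper.

```python
import math
import random

def simulate_rewards(n, lam=1.0, r=lambda t: math.exp(-0.1 * t), seed=0):
    """Offers X(1), ..., X(n) arrive at the jump times T_1 < T_2 < ... of a
    rate-lam Poisson process; accepting offer k yields R(k) = X(k) * r(T_k)."""
    rng = random.Random(seed)
    t, rewards = 0.0, []
    for _ in range(n):
        t += rng.expovariate(lam)          # exponential inter-arrival time
        rewards.append(rng.random() * r(t))  # i.i.d. offer discounted at T_k
    return rewards

rewards = simulate_rewards(5)
```

Since r is non-increasing, an offer of a given size is worth less the later it arrives, which is what creates the competition among players (resolved by priority) for early good offers.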
Mashiah-Yaakovi (2014), on the other hand, studied subgame-perfect equilibria in stopping games. It is assumed that at every stage one of the players is chosen according to a stochastic process, and that player decides whether to continue the interaction or to stop it. The terminal payoff vector is obtained by another stochastic process. Mashiah-Yaakovi (2014) defined a weaker concept of subgame-perfect equilibrium, namely, a δ-approximate subgame-perfect 𝜖-equilibrium. A strategy profile is a δ-approximate subgame-perfect 𝜖-equilibrium if it induces an 𝜖-equilibrium in every subgame, except perhaps a set of subgames that occur with probability at most δ. A 0-approximate subgame-perfect 𝜖-equilibrium is actually a subgame-perfect 𝜖-equilibrium. The concept of approximate subgame-perfect equilibrium relates to the concept of "trembling-hand perfect equilibrium" introduced by Selten (1975). A stopping game in which, at every stage, the player who decides whether to stop or continue the game is chosen according to a (in some sense periodic) stochastic process is also studied in Mashiah-Yaakovi (2009). This assumption extends the random priority in stopping games considered, for example, in Szajowski (1994, 1995). Once the chosen player decides to stop, the players receive terminal payoffs that are determined by a second stochastic process. Periodic subgame-perfect 𝜖-equilibria in pure strategies are studied under some quite general conditions. Some bases for stopping n-person games with fixed priorities were provided by Enns and Ferenstein (1987).
Finally, it is worth pointing out that there are three notions of random stopping times. The above-mentioned randomized strategies used by Yasuda (1985) and Rosenberg et al. (2001) are also called behavioral stopping times. A randomized stopping time, on the other hand, is a nonnegative adapted real-valued process \(\rho =(\rho _k)_{k\in \mathbb {N}\cup \{\infty \}}\) that satisfies \(\sum _{k\in \mathbb {N}\cup \{\infty \}}\rho _k=1\). The third concept allows one to define mixed stopping times ν. Roughly speaking, they are product measurable functions whose first coordinate r is chosen according to the uniform distribution over the interval [0, 1] at the outset. Then, the stopping time is ν(r, ⋅). For more details, the reader is referred to Rosenberg et al. (2001). As communicated to us by Eilon Solan, the classes of mixed and randomized stopping times are equivalent by a proper generalization of Kuhn's theorem.
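One direction of this equivalence is elementary and can be illustrated directly: a randomized stopping time ρ induces a behavioral stopping time via the conditional stopping probabilities \(p_k = \rho _k / (1 - \sum _{j<k}\rho _j)\). A minimal sketch (finite horizon for simplicity):

```python
def behavioral_from_randomized(rho):
    """Convert a randomized stopping time (rho[k] = P(stop at k)) into the
    equivalent behavioral strategy p[k] = P(stop at k | not stopped before)."""
    p, survival = [], 1.0
    for r in rho:
        p.append(r / survival if survival > 0 else 0.0)
        survival -= r          # probability of still being active after stage k
    return p

# Stop at stages 0, 1, 2 with probabilities 1/2, 1/4, 1/4:
p = behavioral_from_randomized([0.5, 0.25, 0.25])
```

Reversing the construction (multiplying each pk by the survival probability it conditions on) recovers ρ, which is the finite-horizon core of the Kuhn-type argument mentioned above.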
References
Abreu D, Pearce D, Stacchetti E (1986) Optimal cartel equilibria with imperfect monitoring. J Econ Theory 39:251–269
Abreu D, Pearce D, Stacchetti E (1990) Toward a theory of discounted repeated games with imperfect monitoring. Econometrica 58:1041–1063
Adlakha S, Johari R (2013) Mean field equilibrium in dynamic games with strategic complementarities. Oper Res 61:971–989
Aliprantis C, Border K (2006) Infinite dimensional analysis: a Hitchhiker’s guide. Springer, New York
Alj A, Haurie A (1983) Dynamic equilibria in multigenerational stochastic games. IEEE Trans Autom Control 28:193–203
Alós-Ferrer C, Ritzberger K (2015) Characterizing existence of equilibrium for large extensive form games: a necessity result. Econ Theory 63:407–430
Alós-Ferrer C, Ritzberger K (2016) Equilibrium existence for large perfect information games. J Math Econ 62:5–18
Altman E (1996) Non-zero-sum stochastic games in admission, service and routing control in queueing systems. Queueing Syst Theory Appl 23:259–279
Altman E, Avrachenkov K, Bonneau N, Debbah M, El-Azouzi R, Sadoc Menasche D (2008) Constrained cost-coupled stochastic games with independent state processes. Oper Res Lett 36:160–164
Altman E, Avrachenkov K, Marquez R, Miller G (2005) Zero-sum constrained stochastic games with independent state processes. Math Methods Oper Res 62:375–386
Altman E, Hordijk A, Spieksma FM (1997) Contraction conditions for average and α-discount optimality in countable state Markov games with unbounded rewards. Math Oper Res 22: 588–618
Amir R (1996a) Continuous stochastic games of capital accumulation with convex transitions. Games Econ Behavior 15:132–148
Amir R (1996b) Strategic intergenerational bequests with stochastic convex production. Econ Theory 8:367–376
Amir R (2003) Stochastic games in economics: the lattice-theoretic approach. In: Neyman A, Sorin S (eds) Stochastic games and applications. Kluwer, Dordrecht, pp 443–453
Artstein Z (1989) Parametrized integration of multifunctions with applications to control and optimization. SIAM J Control Optim 27:1369–1380
Aumann RJ (1974) Subjectivity and correlation in randomized strategies. J Math Econ 1:67–96
Aumann RJ (1987) Correlated equilibrium as an expression of Bayesian rationality. Econometrica 55:1–18
Balbus Ł, Jaśkiewicz A, Nowak AS (2014) Robust Markov perfect equilibria in a dynamic choice model with quasi-hyperbolic discounting. In: Haunschmied J et al (eds) Dynamic games in economics, dynamic modeling and econometrics in economics and finance 16. Springer, Berlin/Heidelberg, pp 1–22
Balbus Ł, Jaśkiewicz A, Nowak AS (2015a) Existence of stationary Markov perfect equilibria in stochastic altruistic growth economies. J Optim Theory Appl 165:295–315
Balbus Ł, Jaśkiewicz A, Nowak AS (2015b) Stochastic bequest games. Games Econ Behavior 90:247–256
Balbus Ł, Jaśkiewicz A, Nowak AS (2015c) Bequest games with unbounded utility functions. J Math Anal Appl 427:515–524
Balbus Ł, Jaśkiewicz A, Nowak AS (2016) Non-paternalistic intergenerational altruism revisited. J Math Econ 63:27–33
Balbus Ł, Nowak AS (2004) Construction of Nash equilibria in symmetric stochastic games of capital accumulation. Math Methods Oper Res 60:267–277
Balbus Ł, Nowak AS (2008) Existence of perfect equilibria in a class of multigenerational stochastic games of capital accumulation. Automatica 44:1471–1479
Balbus Ł, Reffett K, Woźny Ł (2012) Stationary Markovian equilibria in altruistic stochastic OLG models with limited commitment. J Math Econ 48:115–132
Balbus Ł, Reffett K, Woźny Ł (2013a) A constructive geometrical approach to the uniqueness of Markov stationary equilibrium in stochastic games of intergenerational altruism. J Econ Dyn Control 37:1019–1039
Balbus Ł, Reffett K, Woźny Ł (2013b) Markov stationary equilibria in stochastic supermodular games with imperfect private and public information. Dyn Games Appl 3:187–206
Balbus Ł, Reffett K, Woźny Ł (2014a) Constructive study of Markov equilibria in stochastic games with complementarities. J Econ Theory 150:815–840
Balbus Ł, Jaśkiewicz A, Nowak AS, Woźny L (2017) A note on Markov perfect equilibria in a class of non-stationary bequest games. J Math Anal Appl 456:394–401
Balbus Ł, Jaśkiewicz A, Nowak AS (2018, in press) Markov perfect equilibria in a dynamic decision model with quasi-hyperbolic discounting. Ann Oper Res
Barelli P, Duggan J (2014) A note on semi-Markov perfect equilibria in discounted stochastic games. J Econ Theory 15:596–604
Başar T, Olsder GJ (1995) Dynamic noncooperative game theory. Academic, New York
Berg K (2016) Elementary subpaths in discounted stochastic games. Dyn Games Appl 6(3): 304–323
Bernheim D, Ray D (1983) Altruistic growth economies I. Existence of bequest equilibria. Technical report no 419, Institute for Mathematical Studies in the Social Sciences, Stanford University
Bernheim D, Ray D (1986) On the existence of Markov-consistent plans under production uncertainty. Rev Econ Stud 53:877–882
Bernheim D, Ray D (1987) Economic growth with intergenerational altruism. Rev Econ Stud 54:227–242
Bernheim D, Ray D (1989) Markov perfect equilibria in altruistic growth economies with production uncertainty. J Econ Theory 47:195–202
Bhattacharya R, Majumdar M (2007) Random dynamical systems: theory and applications. Cambridge University Press, Cambridge
Billingsley P (1968) Convergence of probability measures. Wiley, New York
Blackwell D (1965) Discounted dynamic programming. Ann Math Stat 36:226–235
Blackwell D (1969) Infinite Gδ-games with imperfect information. Zastosowania Matematyki (Appl Math) 10:99–101
Borkar VS, Ghosh MK (1993) Denumerable state stochastic games with limiting average payoff. J Optim Theory Appl 76:539–560
Browder FE (1960) On continuity of fixed points under deformations of continuous mappings. Summa Brasiliensis Math 4:183–191
Carlson D, Haurie A (1996) A turnpike theory for infinite-horizon open-loop competitive processes. SIAM J Control Optim 34:1405–1419
Castaing C, Valadier M (1977) Convex analysis and measurable multifunctions. Lecture notes in mathematics, vol 580. Springer, New York
Chatterjee S, Eyigungor B (2016) Continuous Markov equilibria with quasi-geometric discounting. J Econ Theory 163:467–494
Chiarella C, Kemp MC, Van Long N, Okuguchi K (1984) On the economics of international fisheries. Int Econ Rev 25:85–92
Cingiz K, Flesch J, Herings JJP, Predtetchinski A (2016) Doing it now, later, or never. Games Econ Behavior 97:174–185
Cole HL, Kocherlakota N (2001) Dynamic games with hidden actions and hidden states. J Econ Theory 98:114–126
Cottle RW, Pang JS, Stone RE (1992) The linear complementarity problem. Academic, New York
Curtat LO (1996) Markov equilibria of stochastic games with complementarities. Games Econ Behavior 17:177–199
Doraszelski U, Escobar JF (2010) A theory of regular Markov perfect equilibria in dynamic stochastic games: genericity, stability, and purification. Theor Econ 5:369–402
Doraszelski U, Pakes A (2007) A framework for applied dynamic analysis in IO. In: Amstrong M, Porter RH (eds) Handbook of industrial organization, vol 3. North Holland, Amsterdam/ London, pp 1887–1966
Doraszelski U, Satterthwaite M (2010) Computable Markov-perfect industry dynamics. Rand J Econ 41:215–243
Dubins LE, Savage LJ (2014) Inequalities for stochastic processes. Dover, New York
Duffie D, Geanakoplos J, Mas-Colell A, McLennan A (1994) Stationary Markov equilibria. Econometrica 62:745–781
Duggan J (2012) Noisy stochastic games. Econometrica 80:2017–2046
Dutta PK (1995) A folk theorem for stochastic games. J Econ Theory 66:1–32
Dutta PK, Sundaram RK (1992) Markovian equilibrium in class of stochastic games: existence theorems for discounted and undiscounted models. Econ Theory 2:197–214
Dutta PK, Sundaram RK (1993) The tragedy of the commons? Econ Theory 3:413–426
Dutta PK, Sundaram RK (1998) The equilibrium existence problem in general Markovian games. In: Majumdar M (ed) Organizations with incomplete information. Cambridge University Press, Cambridge, pp 159–207
Dynkin EB (1969) The game variant of a problem on optimal stopping. Sov Math Dokl 10:270–274
Dynkin EB, Evstigneev IV (1977) Regular conditional expectations of correspondences. Theory Probab Appl 21:325–338
Eaves B (1972) Homotopies for computation of fixed points. Math Program 3:1–22
Eaves B (1984) A course in triangulations for solving equations with deformations. Springer, Berlin
Elliott RJ, Kalton NJ, Markus L (1973) Saddle-points for linear differential games. SIAM J Control Optim 11:100–112
Enns EG, Ferenstein EZ (1987) On a multi-person time-sequential game with priorities. Seq Anal 6:239–256
Ericson R, Pakes A (1995) Markov-perfect industry dynamics: a framework for empirical work. Rev Econ Stud 62:53–82
Escobar JF (2013) Equilibrium analysis of dynamic models of imperfect competition. Int J Ind Organ 31:92–101
Federgruen A (1978) On N-person stochastic games with denumerable state space. Adv Appl Probab 10:452–471
Ferenstein E (2007) Randomized stopping games and Markov market games. Math Methods Oper Res 66:531–544
Filar JA, Schultz T, Thuijsman F, Vrieze OJ (1991) Nonlinear programming and stationary equilibria in stochastic games. Math Program 50:227–237
Filar JA, Vrieze K (1997) Competitive Markov decision processes. Springer, New York
Fink AM (1964) Equilibrium in a stochastic n-person game. J Sci Hiroshima Univ Ser A-I Math 28:89–93
Flesch J, Schoenmakers G, Vrieze K (2008) Stochastic games on a product state space. Math Oper Res 33:403–420
Flesch J, Schoenmakers G, Vrieze K (2009) Stochastic games on a product state space: the periodic case. Int J Game Theory 38:263–289
Flesch J, Thuijsman F, Vrieze OJ (1997) Cyclic Markov equilibrium in stochastic games. Int J Game Theory 26:303–314
Flesch J, Thuijsman F, Vrieze OJ (2003) Stochastic games with non-observable actions. Math Methods Oper Res 58:459–475
Flesch J, Kuipers J, Mashiah-Yaakovi A, Schoenmakers G, Solan E, Vrieze K (2010a) Perfect-information games with lower-semicontinuous payoffs. Math Oper Res 35:742–755
Flesch J, Kuipers J, Schoenmakers G, Vrieze K (2010b) Subgame-perfection in positive recursive games with perfect information. Math Oper Res 35:193–207
Flesch J, Kuipers J, Mashiah-Yaakovi A, Schoenmakers G, Shmaya E, Solan E, Vrieze K (2014) Non-existence of subgame-perfect 𝜖-equilibrium in perfect-information games with infinite horizon. Int J Game Theory 43:945–951
Flesch J, Thuijsman F, Vrieze OJ (2007) Stochastic games with additive transitions. Eur J Oper Res 179:483–497
Flesch J, Predtetchinski A (2015) On refinements of subgame perfect 𝜖 -equilibrium. Int J Game Theory 45:523–542
Forges E (1986) An approach to communication equilibria. Econometrica 54:1375–1385
Forges F (1992) Repeated games of incomplete information: nonzero-sum. In: Aumann RJ, Hart S (eds) Handbook of game theory, vol 1. North Holland, Amsterdam, pp 155–177
Forges F (2009) Correlated equilibria and communication in games. In: Meyers RA (ed) Encyclopedia of complexity and systems science. Springer, New York, pp 1587–1596
Fudenberg D, Levine D (1983) Subgame-perfect equilibria of finite and infinite horizon games. J Econ Theory 31:251–268
Fudenberg D, Tirole J (1991) Game theory. MIT Press, Cambridge
Fudenberg D, Yamamoto Y (2011) The folk theorem for irreducible stochastic games with imperfect public monitoring. J Econ Theory 146:1664–1683
Gale D (1967) On optimal development in a multisector economy. Rev Econ Stud 34:1–19
Glicksberg IL (1952) A further generalization of the Kakutani fixed point theorem with application to Nash equilibrium points. Proc Am Math Soc 3:170–174
Govindan S, Wilson R (2003) A global Newton method to compute Nash equilibria. J Econ Theory 110:65–86
Govindan S, Wilson R (2009) Global Newton method for stochastic games. J Econ Theory 144:414–421
Haller H, Lagunoff R (2000) Genericity and Markovian behavior in stochastic games. Econometrica 68:1231–1248
Hamadène S, Hassani M (2014) The multi-player nonzero-sum Dynkin game in discrete time. Math Methods Oper Res 79:179–194
Harris C (1985) Existence and characterization of perfect equilibrium in games of perfect information. Econometrica 53:613–628
Harris C, Laibson D (2001) Dynamic choices of hyperbolic consumers. Econometrica 69:935–957
Harris C, Reny PJ, Robson A (1995) The existence of subgame-perfect equilibrium in continuous games with almost perfect information: a case for public randomization. Econometrica 63: 507–544
Harsanyi JC (1973a) Oddness of the number of equilibrium points: a new proof. Int J Game Theory 2:235–250
Harsanyi JC (1973b) Games with randomly disturbed payoffs: a new rationale for mixed-strategy equilibrium points. Int J Game Theory 2:1–23
Haurie A, Krawczyk JB, Zaccour G (2012) Games and dynamic games. World Scientific, Singapore
He W, Sun Y (2017) Stationary Markov perfect equilibria in discounted stochastic games. J Econ Theory 169:35–61
Heller Y (2012) Sequential correlated equilibria in stopping games. Oper Res 60:209–224
Herings JJP, Peeters RJAP (2004) Stationary equilibria in stochastic games: structure, selection, and computation. J Econ Theory 118:32–60
Herings JJP, Peeters RJAP (2010) Homotopy methods to compute equilibria in game theory. Econ Theory 42:119–156
Himmelberg CJ (1975) Measurable relations. Fundam Math 87:53–72
Himmelberg CJ, Parthasarathy T, Raghavan TES, Van Vleck FS (1976) Existence of p-equilibrium and optimal stationary strategies in stochastic games. Proc Am Math Soc 60:245–251
Hopenhayn H, Prescott E (1992) Stochastic monotonicity and stationary distributions for dynamic economies. Econometrica 60:1387–1406
Hörner J, Sugaya T, Takahashi S, Vieille N (2011) Recursive methods in discounted stochastic games: an algorithm for δ → 1 and a folk theorem. Econometrica 79:1277–1318
Hörner J, Takahashi S, Vieille N (2014) On the limit perfect public equilibrium payoff set in repeated and stochastic game. Games Econ Behavior 85:70–83
Horst U (2005) Stationary equilibria in discounted stochastic games with weakly interacting players. Games Econ Behavior 51:83–108
Jaśkiewicz A, Nowak AS (2006) Approximation of noncooperative semi-Markov games. J Optim Theory Appl 131:115–134
Jaśkiewicz A, Nowak AS (2014a) Stationary Markov perfect equilibria in risk sensitive stochastic overlapping generations models. J Econ Theory 151:411–447
Jaśkiewicz A, Nowak AS (2014b) Robust Markov perfect equilibria. J Math Anal Appl 419: 1322–1332
Jaśkiewicz A, Nowak AS (2015a) On pure stationary almost Markov Nash equilibria in nonzero-sum ARAT stochastic games. Math Methods Oper Res 81:169–179
Jaśkiewicz A, Nowak AS (2015b) Stochastic games of resource extraction. Automatica 54: 310–316
Jaśkiewicz A, Nowak AS (2016) Stationary almost Markov perfect equilibria in discounted stochastic games. Math Oper Res 41:430–441
Jaśkiewicz A, Nowak AS (2018a) Zero-sum stochastic games. In: Basar T, Zaccour G (eds) Handbook of dynamic game theory. Birkhäuser, Basel
Jaśkiewicz A, Nowak AS (2018b, in press) On symmetric stochastic games of resource extraction with weakly continuous transitions. TOP
Kakutani S (1941) A generalization of Brouwer’s fixed point theorem. Duke Math J 8:457–459
Kiefer YI (1971) Optimal stopped games. Theory Probab Appl 16:185–189
Kitti M (2016) Subgame-perfect equilibria in discounted stochastic games. J Math Anal Appl 435:253–266
Klein E, Thompson AC (1984) Theory of correspondences. Wiley, New York
Kohlberg E, Mertens JF (1986) On the strategic stability of equilibria. Econometrica 54:1003–1037
Krasnosielska-Kobos A (2016) Construction of Nash equilibrium based on multiple stopping problem in multi-person game. Math Methods Oper Res 83:53–70
Krasnosielska-Kobos A, Ferenstein E (2013) Construction of Nash equilibrium in a game version of Elfving’s multiple stopping problem. Dyn Games Appl 3:220–235
Krishnamurthy N, Parthasarathy T, Babu S (2012) Existence of stationary equilibrium for mixtures of discounted stochastic games. Curr Sci 103:1003–1013
Krishnamurthy N, Parthasarathy T, Ravindran G (2012) Solving subclasses of multi-player stochastic games via linear complementarity problem formulations – a survey and some new results. Optim Eng 13:435–457
Kuipers J, Flesch J, Schoenmakers G, Vrieze K (2016) Subgame-perfection in recursive perfect information games, where each player controls one state. Int J Game Theory 45:205–237
Kuratowski K, Ryll-Nardzewski C (1965) A general theorem on selectors. Bull Polish Acad Sci (Ser Math) 13:397–403
Küenle HU (1994) On Nash equilibrium solutions in nonzero-sum stochastic games with complete information. Int J Game Theory 23:303–324
Küenle HU (1999) Equilibrium strategies in stochastic games with additive cost and transition structure and Borel state and action spaces. Int Game Theory Rev 1:131–147
Leininger W (1986) The existence of perfect equilibria in model of growth with altruism between generations. Rev Econ Stud 53:349–368
Lemke CE (1965) Bimatrix equilibrium points and mathematical programming. Manag Sci 11:681–689
Lemke CE, Howson JT Jr (1964) Equilibrium points of bimatrix games. SIAM J Appl Math 12:413–423
Levhari D, Mirman L (1980) The great fish war: an example using a dynamic Cournot-Nash solution. Bell J Econ 11:322–334
Levy YJ (2013) Discounted stochastic games with no stationary Nash equilibrium: two examples. Econometrica 81:1973–2007
Levy YJ, McLennan A (2015) Corrigendum to: discounted stochastic games with no stationary Nash equilibrium: two examples. Econometrica 83:1237–1252
Maitra A, Sudderth W (2003) Borel stay-in-a-set games. Int J Game Theory 32:97–108
Maitra A, Sudderth WD (2007) Subgame-perfect equilibria for stochastic games. Math Oper Res 32:711–722
Majumdar MK, Sundaram R (1991) Symmetric stochastic games of resource extraction: the existence of non-randomized stationary equilibrium. In: Raghavan et al (eds) Stochastic games and related topics. Kluwer, Dordrecht, pp 175–190
Maliar L, Maliar S (2016) Ruling out multiplicity of smooth equilibria in dynamic games: a hyperbolic discounting example. Dyn Games Appl 6:243–261
Mas-Colell A (1974) A note on a theorem of F. Browder. Math Program 6:229–233
Mashiah-Yaakovi A (2009) Periodic stopping games. Int J Game Theory 38:169–181
Mashiah-Yaakovi A (2014) Subgame-perfect equilibria in stopping games. Int J Game Theory 43:89–135
Mashiah-Yaakovi A (2015) Correlated equilibria in stochastic games with Borel measurable payoffs. Dyn Games Appl 5:120–135
Maskin E, Tirole J (2001) Markov perfect equilibrium: I. Observable actions. J Econ Theory 100:191–219
Mertens JF (2002) Stochastic games. In: Aumann RJ, Hart S (eds) Handbook of game theory with economic applications, vol 3. North Holland, Amsterdam/London, pp 1809–1832
Mertens JF (2003) A measurable “measurable choice” theorem. In: Neyman A, Sorin S (eds) Stochastic games and applications. Kluwer, Dordrecht, pp 107–130
Mertens JF, Neyman A (1981) Stochastic games. Int J Game Theory 10:53–56
Mertens JF, Parthasarathy T (1991) Nonzero-sum stochastic games. In: Raghavan et al (eds) Stochastic games and related topics. Kluwer, Dordrecht, pp 145–148
Mertens JF, Parthasarathy T (2003) Equilibria for discounted stochastic games. In: Neyman A, Sorin S (eds) Stochastic games and applications. Kluwer, Dordrecht, pp 131–172
Mertens JF, Sorin S, Zamir S (2015) Repeated games. Cambridge University Press, Cambridge
Milgrom P, Roberts J (1990) Rationalizability, learning and equilibrium in games with strategic complementarities. Econometrica 58:1255–1277
Milgrom P, Shannon C (1994) Monotone comparative statics. Econometrica 62:157–180
Mohan SR, Neogy SK, Parthasarathy T (1997) Linear complementarity and discounted polystochastic game when one player controls transitions. In: Ferris MC, Pang JS (eds) Proceedings of the international conference on complementarity problems. SIAM, Philadelphia, pp 284–294
Mohan SR, Neogy SK, Parthasarathy T (2001) Pivoting algorithms for some classes of stochastic games: a survey. Int Game Theory Rev 3:253–281
Monderer D, Shapley LS (1996) Potential games. Games Econ Behavior 14:124–143
Montrucchio L (1987) Lipschitz continuous policy functions for strongly concave optimization problems. J Math Econ 16:259–273
Morimoto H (1986) Nonzero-sum discrete parameter stochastic games with stopping times. Probab Theory Relat Fields 72:155–160
Nash JF (1950) Equilibrium points in n-person games. Proc Nat Acad Sci USA 36:48–49
Neveu J (1975) Discrete-parameter martingales. North-Holland, Amsterdam
Neyman A (2017) Continuous-time stochastic games. Games Econ Behavior 104:92–130
Neyman A, Sorin S (eds) (2003) Stochastic games and applications. Kluwer, Dordrecht
Nowak AS (1985) Existence of equilibrium stationary strategies in discounted noncooperative stochastic games with uncountable state space. J Optim Theory Appl 45:591–602
Nowak AS (1987) Nonrandomized strategy equilibria in noncooperative stochastic games with additive transition and reward structure. J Optim Theory Appl 52:429–441
Nowak AS (2003a) N-person stochastic games: extensions of the finite state space case and correlation. In: Neyman A, Sorin S (eds) Stochastic games and applications. Kluwer, Dordrecht, pp 93–106
Nowak AS (2003b) On a new class of nonzero-sum discounted stochastic games having stationary Nash equilibrium point. Int J Game Theory 32:121–132
Nowak AS (2006a) On perfect equilibria in stochastic models of growth with intergenerational altruism. Econ Theory 28:73–83
Nowak AS (2006b) A multigenerational dynamic game of resource extraction. Math Social Sci 51:327–336
Nowak AS (2006c) A note on equilibrium in the great fish war game. Econ Bull 17(2):1–10
Nowak AS (2007) On stochastic games in economics. Math Methods Oper Res 66:513–530
Nowak AS (2008) Equilibrium in a dynamic game of capital accumulation with the overtaking criterion. Econ Lett 99:233–237
Nowak AS (2010) On a noncooperative stochastic game played by internally cooperating generations. J Optim Theory Appl 144:88–106
Nowak AS, Altman E (2002) ε-Equilibria for stochastic games with uncountable state space and unbounded costs. SIAM J Control Optim 40:1821–1839
Nowak AS, Jaśkiewicz A (2005) Nonzero-sum semi-Markov games with the expected average payoffs. Math Methods Oper Res 62:23–40
Nowak AS, Raghavan TES (1992) Existence of stationary correlated equilibria with symmetric information for discounted stochastic games. Math Oper Res 17:519–526
Nowak AS, Raghavan TES (1993) A finite step algorithm via a bimatrix game to a single controller non-zero sum stochastic game. Math Program Ser A 59:249–259
Nowak AS, Szajowski K (1999) Nonzero-sum stochastic games. In: Annals of the international society of dynamic games, vol 4. Birkhäuser, Boston, pp 297–343
Nowak AS, Więcek P (2007) On Nikaido-Isoda type theorems for discounted stochastic games. J Math Anal Appl 332:1109–1118
Ohtsubo Y (1987) A nonzero-sum extension of Dynkin’s stopping problem. Math Oper Res 12:277–296
Ohtsubo Y (1991) On a discrete-time nonzero-sum Dynkin problem with monotonicity. J Appl Probab 28:466–472
Parthasarathy T (1973) Discounted, positive, and noncooperative stochastic games. Int J Game Theory 2:25–37
Parthasarathy T, Sinha S (1989) Existence of stationary equilibrium strategies in nonzero-sum discounted stochastic games with uncountable state space and state-independent transitions. Int J Game Theory 18:189–194
Peleg B, Yaari M (1973) On the existence of consistent course of action when tastes are changing. Rev Econ Stud 40:391–401
Peski M, Wiseman T (2015) A folk theorem for stochastic games with infrequent state changes. Theoret Econ 10:131–173
Phelps E, Pollak R (1968) On second best national savings and game equilibrium growth. Rev Econ Stud 35:195–199
Pollak R (1968) Consistent planning. Rev Econ Stud 35:201–208
Potters JAM, Raghavan TES, Tijs SH (2009) Pure equilibrium strategies for stochastic games via potential functions. In: Advances in dynamic games and their applications. Annals of the international society of dynamic games, vol 10. Birkhäuser, Boston, pp 433–444
Purves RA, Sudderth WD (2011) Perfect information games with upper semicontinuous payoffs. Math Oper Res 36:468–473
Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, Hoboken
Raghavan TES, Syed Z (2002) Computing stationary Nash equilibria of undiscounted single-controller stochastic games. Math Oper Res 27:384–400
Raghavan TES, Tijs SH, Vrieze OJ (1985) On stochastic games with additive reward and transition structure. J Optim Theory Appl 47:451–464
Ramsey FP (1928) A mathematical theory of savings. Econ J 38:543–559
Ray D (1987) Nonpaternalistic intergenerational altruism. J Econ Theory 40:112–132
Reny PJ, Robson A (2002) Existence of subgame-perfect equilibrium with public randomization: a short proof. Econ Bull 3(24):1–8
Rieder U (1979) Equilibrium plans for non-zero sum Markov games. In: Moeschlin O, Pallaschke D (eds) Game theory and related topics. North-Holland, Amsterdam, pp 91–102
Rogers PD (1969) Non-zero-sum stochastic games. Ph.D. dissertation, report 69–8, Univ of California
Rosen JB (1965) Existence and uniqueness of equilibrium points for concave n-person games. Econometrica 33:520–534
Rosenberg D, Solan E, Vieille N (2001) Stopping games with randomized strategies. Probab Theory Relat Fields 119:433–451
Rubinstein A (1979) Equilibrium in supergames with the overtaking criterion. J Econ Theory 21:1–9
Secchi P, Sudderth WD (2002a) Stay-in-a-set games. Int J Game Theory 30:479–490
Secchi P, Sudderth WD (2002b) N-person stochastic games with upper semi-continuous payoffs. Int J Game Theory 30:491–502
Secchi P, Sudderth WD (2005) A simple two-person stochastic game with money. In: Annals of the international society of dynamic games, vol 7. Birkhäuser, Boston, pp 39–66
Selten R (1975) Re-examination of the perfectness concept for equilibrium points in extensive games. Int J Game Theory 4:25–55
Shapley LS (1953) Stochastic games. Proc Nat Acad Sci USA 39:1095–1100
Shubik M, Whitt W (1973) Fiat money in an economy with one non-durable good and no credit: a non-cooperative sequential game. In: Blaquière A (ed) Topics in differential games. North-Holland, Amsterdam, pp 401–448
Shmaya E, Solan E (2004) Two player non-zero sum stopping games in discrete time. Ann Probab 32:2733–2764
Shmaya E, Solan E, Vieille N (2003) An application of Ramsey theorem to stopping games. Games Econ Behavior 42:300–306
Simon RS (2007) The structure of nonzero-sum stochastic games. Adv Appl Math 38:1–26
Simon R (2012) A topological approach to quitting games. Math Oper Res 37:180–195
Simon RS (2016) The challenge of nonzero-sum stochastic games. Int J Game Theory 45:191–204
Sleet C, Yeltekin S (2016) On the computation of value correspondences for dynamic games. Dyn Games Appl 6:174–186
Smale S (1976) A convergent process of price adjustment and global Newton methods. J Math Econ 3:107–120
Sobel MJ (1971) Non-cooperative stochastic games. Ann Math Stat 42:1930–1935
Solan E (1998) Discounted stochastic games. Math Oper Res 23:1010–1021
Solan E (1999) Three-person absorbing games. Math Oper Res 24:669–698
Solan E (2001) Characterization of correlated equilibria in stochastic games. Int J Game Theory 30:259–277
Solan E (2017, in press) Acceptable strategy profiles in stochastic games. Games Econ Behavior
Solan E, Vieille N (2001) Quitting games. Math Oper Res 26:265–285
Solan E, Vieille N (2002) Correlated equilibrium in stochastic games. Games Econ Behavior 38:362–399
Solan E, Vieille N (2003) Deterministic multi-player Dynkin games. J Math Econ 39:911–929
Solan E, Vieille N (2010) Computing uniformly optimal strategies in two-player stochastic games. Econ Theory 42:237–253
Solan E, Ziliotto B (2016) Stochastic games with signals. In: Annals of the international society of dynamic games, vol 14. Birkhäuser, Boston, pp 77–94
Sorin S (1986) Asymptotic properties of a nonzero-sum stochastic game. Int J Game Theory 15:101–107
Spence M (1976) Product selection, fixed costs, and monopolistic competition. Rev Econ Stud 43:217–235
Stachurski J (2009) Economic dynamics: theory and computation. MIT Press, Cambridge, MA
Stokey NL, Lucas RE, Prescott E (1989) Recursive methods in economic dynamics. Harvard University Press, Cambridge
Strotz RH (1956) Myopia and inconsistency in dynamic utility maximization. Rev Econ Stud 23:165–180
Sundaram RK (1989a) Perfect equilibrium in a class of symmetric dynamic games. J Econ Theory 47:153–177
Sundaram RK (1989b) Perfect equilibrium in a class of symmetric dynamic games. Corrigendum. J Econ Theory 49:385–387
Szajowski K (1994) Markov stopping game with random priority. Z Oper Res 39:69–84
Szajowski K (1995) Optimal stopping of a discrete Markov process by two decision makers. SIAM J Control Optim 33:1392–1410
Takahashi M (1964) Equilibrium points of stochastic non-cooperative n-person games. J Sci Hiroshima Univ Ser A-I Math 28:95–99
Thuijsman F, Raghavan TES (1997) Perfect information stochastic games and related classes. Int J Game Theory 26:403–408
Topkis D (1978) Minimizing a submodular function on a lattice. Oper Res 26:305–321
Topkis D (1998) Supermodularity and complementarity. Princeton University Press, Princeton
Valadier M (1994) Young measures, weak and strong convergence and the Visintin-Balder theorem. Set-Valued Anal 2:357–367
Van Long N (2011) Dynamic games in the economics of natural resources: a survey. Dyn Games Appl 1:115–148
Vieille N (2000a) Two-player stochastic games I: a reduction. Isr J Math 119:55–92
Vieille N (2000b) Two-player stochastic games II: the case of recursive games. Isr J Math 119:93–126
Vieille N (2002) Stochastic games: recent results. In: Aumann RJ, Hart S (eds) Handbook of game theory with economic applications, vol 3. North Holland, Amsterdam/London, pp 1833–1850
Vives X (1990) Nash equilibrium with strategic complementarities. J Math Econ 19:305–321
von Weizsäcker CC (1965) Existence of optimal programs of accumulation for an infinite horizon. Rev Econ Stud 32:85–104
Vrieze OJ, Thuijsman F (1989) On equilibria in repeated games with absorbing states. Int J Game Theory 18:293–310
Whitt W (1980) Representation and approximation of noncooperative sequential games. SIAM J Control Optim 18:33–48
Więcek P (2009) Pure equilibria in a simple dynamic model of strategic market game. Math Methods Oper Res 69:59–79
Więcek P (2012) N-person dynamic strategic market games. Appl Math Optim 65:147–173
Yasuda M (1985) On a randomised strategy in Neveu’s stopping problem. Stoch Process Appl 21:159–166
Acknowledgements
We thank Tamer Başar and Georges Zaccour for inviting us to write this chapter and their help. We also thank Elżbieta Ferenstein, János Flesch, Eilon Solan, Yeneng Sun, Krzysztof Szajowski and two reviewers for their comments on an earlier version of this survey.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this entry
Jaśkiewicz, A., Nowak, A.S. (2018). Nonzero-Sum Stochastic Games. In: Başar, T., Zaccour, G. (eds) Handbook of Dynamic Game Theory. Springer, Cham. https://doi.org/10.1007/978-3-319-44374-4_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44373-7
Online ISBN: 978-3-319-44374-4