1 Introduction

The fundamentals of the modern theory of non-cooperative dynamic games were established in the 1950s at Princeton University. First, Nash (1950) introduced the notion of equilibrium for finite n-person static games and proved its existence using the fixed point theorem of Kakutani (1941). Next, Shapley (1953) presented a model of infinite time horizon zero-sum stochastic games with positive stop probability. Fink (1964) and Takahashi (1964) extended his model to nonzero-sum discounted stochastic games with finite state spaces and proved the existence of equilibrium in stationary Markov strategies. Later on, Rogers (1969) and Sobel (1971) obtained similar results for irreducible stochastic games with expected limit-average payoffs. Afterwards, the theory of discrete-time nonzero-sum stochastic games was extended in various directions, inspiring many interesting applications. An overview of selected basic topics in stochastic dynamic games with instructive examples can be found in the books of Başar and Olsder (1995) and Haurie et al. (2012) and the survey paper by Dutta and Sundaram (1998). Theoretically more advanced material is included in the book edited by Neyman and Sorin (2003) and the monograph of Mertens et al. (2015).

In this chapter, we provide an overview of almost all basic streams of research in the area of nonzero-sum discrete-time stochastic games such as: the existence of stationary equilibria in models on both countable and general state spaces, the existence of subgame-perfect equilibria, algorithms, stopping games, correlated and uniform equilibria. Our survey incorporates several examples of games studied in operations research and economics. In particular, separate sections are devoted to intergenerational games, dynamic Cournot competition and game models of resource extraction. The provided reference list not only includes seminal papers that initiated research in various directions but also exposes recent advances in this field.

The paper is organized as follows. In Sect. 2 we describe some basic material needed for an examination of nonzero-sum stochastic games with general state spaces. To make the presentation less technical, we restrict our attention to Borel space models. Many of the results described in this chapter are stated in the literature in a more general framework; however, the Borel state space models are sufficiently rich for most applications. The first part of Sect. 2 includes auxiliary results on set-valued mappings arising in the study of the existence of stationary Nash/correlated equilibria, together with certain known results such as a random version of the Carathéodory theorem and the Mertens measurable selection theorem. The second part is devoted to supermodular games. Sect. 3 deals with the concept of subgame-perfect equilibrium in games on a Borel state space and introduces different classes of strategies in which subgame-perfect equilibria may be obtained. Sect. 4 includes results on correlated equilibria with public signals proved for games on Borel state spaces, whereas Sect. 5 presents the state-of-the-art results on the existence of stationary equilibria (further called “stationary Markov perfect equilibria”) in discounted stochastic games. This theory has found applications in several examples examined in operations research and economics. Namely, in Sect. 6 we provide models with special but natural transition structures for which there exist stationary equilibria. Sect. 7 recalls papers in which the existence of an equilibrium was proved for stochastic games on denumerable state spaces; it covers discounted models as well as models with the limit-average payoff criterion. Moreover, it is also shown that a discounted game with a Borel state space can be approximated, under some assumptions, by simpler games with countable state spaces. Sect. 8 reviews algorithms applied to nonzero-sum stochastic games. In particular, it is shown how a formulation of a linear complementarity problem can be helpful in solving games with discounted and limit-average payoff criteria and special transition structure. In addition, we also mention the homotopy methods applied to this issue. Sect. 9 presents material on games with finite state and action spaces, while Sect. 10 deals with games with product state spaces. In Sect. 11 we formulate results proved for various intergenerational games. Our models incorporate paternalistic and non-paternalistic altruistic economic growth models: games with one, finitely many, or infinitely many descendants as well as games on one- and multidimensional commodity spaces. Finally, Sect. 12 provides a short overview of stopping games, beginning from the Dynkin extension of Neveu’s stopping problem.

2 Preliminaries

In this section, we recall essential notation and several facts, which are crucial for studying Nash and correlated equilibria in nonzero-sum stochastic games with uncountable state spaces. Here we follow the preliminaries in Jaśkiewicz and Nowak (2018a). Let N = {1, …, n} be the set of players. Let X, A1, …, An be Borel spaces. Assume that, for each i ∈ N, x → Ai(x) ⊂ Ai is a lower measurable compact-valued action correspondence for player i. This is equivalent to saying that the graph of this correspondence is a Borel subset of X × Ai. Let A := A1 × ⋯ × An. We consider a nonzero-sum n-person game parameterized by a state variable x ∈ X. The payoff or utility function for player i ∈ N is \(r_i:X\times A \to \mathbb {R}\), and it is assumed that ri is bounded, ri(⋅, a) is Borel measurable for each a ∈ A, and ri(x, ⋅) is continuous on A for each x ∈ X. Assuming that each player i ∈ N chooses a mixed strategy \(\nu _i\in \Pr (A_i(x))\) and writing ν := (ν1, …, νn), we denote the expected payoff to player i by

$$\displaystyle \begin{aligned} P^i(x,\nu):= \int_{A_n(x)}\cdots \int_{A_1(x)} r_i(x,a_1,\ldots,a_n)\nu_1(da_1)\times\cdots \times\nu_n(da_n). \end{aligned}$$

A strategy profile \(\nu ^*=(\nu _1^*,\ldots ,\nu _n^*)\) is a Nash equilibrium in the game for a given state x ∈ X if

$$\displaystyle \begin{aligned} P^i(x,\nu^*) \ge P^i(x,(\nu_i,\nu^*_{-i})) \end{aligned}$$

for every i ∈ N and \(\nu _i\in \Pr (A_i(x)).\) As usual, \((\nu _i,\nu ^*_{-i})\) denotes ν∗ with \(\nu ^*_i\) replaced by νi. For any x ∈ X, by \( \mathcal {N} (x)\), we denote the set of all Nash equilibria in the considered game. By Glicksberg (1952), \(\mathcal {N}(x)\neq \emptyset .\) It is easy to see that \(\mathcal {N}( x)\) is compact. Let \(\mathcal {NP}(x)\subset \mathbb {R} ^n\) be the set of payoff vectors corresponding to all Nash equilibria in \( \mathcal {N}( x).\) By co, we denote the convex hull operator in \(\mathbb {R} ^n.\)
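To make the objects \(\mathcal {N}(x)\) and \(\mathcal {NP}(x)\) concrete, the following minimal Python sketch enumerates the pure Nash equilibria of a finite approximation of the one-shot game at a fixed state. All primitives (the grid standing in for Ai(x), the payoff r, and the state value) are illustrative, and only pure profiles are checked, whereas \(\mathcal {N}(x)\) in general consists of mixed profiles.

```python
from itertools import product

def pure_nash(action_sets, payoff):
    """Enumerate pure Nash equilibria of a finite n-person game.

    action_sets[i]: finite list standing in for A_i(x);
    payoff(i, a): player i's payoff at the joint action a.
    """
    n = len(action_sets)
    return [a for a in product(*action_sets)
            if all(payoff(i, a) >= payoff(i, a[:i] + (b,) + a[i + 1:])
                   for i in range(n) for b in action_sets[i])]

# Illustrative state-parameterized payoffs (hypothetical r_i):
x = 0.5
grid = (0.0, 0.25, 0.5)                # finite grid standing in for A_i(x)
r = lambda i, a: x * a[i] * (2.0 - a[0] - a[1])
print(pure_nash([grid, grid], r))      # -> [(0.5, 0.5)]
```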

Proposition 1

The mappings \(x\to {\mathcal {N}}( x)\) , \(x\to {\mathcal {NP}}( x)\) and \(x\to co{\mathcal {NP}}( x)\) are compact-valued and lower measurable.

For a detailed discussion of these statements, consult Nowak and Raghavan (1992), Himmelberg (1975), and Klein and Thompson (1984). By Kuratowski and Ryll-Nardzewski (1965), every set-valued mapping in Proposition 1 has a Borel measurable selection. Making use of standard results on measurable selections (see Castaing and Valadier 1977) and Carathéodory’s theorem, we obtain the following result.

Proposition 2

Let \(b:X\to \mathbb {R}^n\) be a Borel measurable selection of the mapping \(x\to co{\mathcal {NP}}( x).\) Then, there exist Borel measurable selections \(b^i:X\to \mathbb {R}^n\) and Borel measurable functions λi : X → [0, 1] (i = 1, …, n + 1) such that for each x ∈ X, we have

$$\displaystyle \begin{aligned} \sum_{i=1}^{n+1}\lambda^i(x)=1\quad\mathit{\mbox{ and}}\quad b(x)=\sum_{i=1}^{n+1}\lambda^i(x)b^i(x). \end{aligned}$$
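Numerically, the decomposition in Proposition 2 can be mimicked for a finite set of candidate payoff vectors by a feasibility linear program: any basic feasible solution puts positive weight on at most n + 1 points, which is exactly Carathéodory's theorem in \(\mathbb {R}^n\). The sketch below is plain numerics (the measurable dependence on x, which is the substance of the proposition, is not modeled) and assumes SciPy is available.

```python
import numpy as np
from scipy.optimize import linprog

def caratheodory_weights(points, b):
    """Find lam >= 0 with sum(lam) = 1 and lam @ points = b.

    points: (m, n) array, e.g., a finite subset of NP(x);
    b: target vector in the convex hull of the rows of points.
    A basic solution of this LP has at most n + 1 positive weights.
    """
    m, _ = points.shape
    A_eq = np.vstack([points.T, np.ones((1, m))])  # n moments + normalization
    b_eq = np.append(b, 1.0)
    res = linprog(np.zeros(m), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, None)] * m, method="highs")
    return res.x if res.success else None

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(caratheodory_weights(pts, np.array([0.5, 0.5])))
```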

As in Nowak and Raghavan (1992), Filippov's measurable implicit function theorem yields the following facts.

Proposition 3

Let \(p:X\to \mathbb {R}^n\) be a Borel measurable selection of the mapping \(x\to {\mathcal {NP}}( x).\) Then, there exists a Borel measurable selection ψ of the mapping \(x\to {\mathcal {N}}( x)\) such that

$$\displaystyle \begin{aligned} p(x) = (P^1(x,\psi(x)),\ldots,P^n(x,\psi(x)))\quad\mathit{\mbox{for all}}\quad x\in X. \end{aligned}$$

Proposition 4

If \(b:X\to \mathbb {R}^n\) is a Borel measurable selection of the mapping \(x\to co{\mathcal {NP}}( x)\), then there exist Borel measurable selections ψi of the mapping \(x\to {\mathcal {N}}( x)\) and Borel measurable functions λi : X → [0, 1] (i = 1, …, n + 1) such that for each x ∈ X, we have \(\sum _{i=1}^{n+1}\lambda ^i(x)=1\) and

$$\displaystyle \begin{aligned} b(x) =\sum_{i=1}^{n+1} \lambda^i(x)(P^1(x,\psi^i(x)),\ldots,P^n(x,\psi^i(x))). \end{aligned}$$

The following result plays an important role in studying Nash equilibria in stochastic games with Borel state spaces and can be deduced from Theorem 2 in Mertens (2003) and Filippov’s measurable implicit function theorem. It is related to Lyapunov’s theorem on the range of nonatomic measures and also has some predecessors in control theory; see Artstein (1989).

Proposition 5

Let μ be a nonatomic Borel probability measure on X. Assume that qj (j = 1, …, l) are Borel measurable transition probabilities from X to X and, for every j and x ∈ X, qj(⋅|x) ≪ μ, i.e., qj(⋅|x) is dominated by μ. Let \(w^0:X\to \mathbb {R}^n\) be a Borel measurable mapping such that \(w^0(x)\in co{\mathcal {NP}}(x)\) for each x ∈ X. Then there exists a Borel measurable mapping \(v^0:X\times X\to \mathbb {R}^n\) such that \(v^0(x,y)\in {\mathcal {NP}}(x)\) for all x, y ∈ X and

$$\displaystyle \begin{aligned} \int_Xw^0(y)q_j(dy|x)=\int_Xv^0(x,y)q_j(dy|x),\quad j=1,\ldots,l. \end{aligned}$$

Moreover, there exists a Borel measurable mapping \(\phi :X\times X \to \Pr (A)\) such that \(\phi (x,y)\in {\mathcal {N}}(x)\) for all x, y ∈ X.

Let L be a lattice contained in Euclidean space \(\mathbb {R}^k\) equipped with the component-wise order ≥. For any x, y ∈ L, x ∨ y (x ∧ y) denotes the join (meet) of x and y. A function \(\phi :L\to \mathbb {R}\) is supermodular if for any x, y ∈ L, it holds

$$\displaystyle \begin{aligned} \phi(x\vee y)+\phi(x\wedge y)\ge \phi(x)+\phi(y). \end{aligned}$$

Clearly, if k = 1, then any function ϕ is supermodular. Let \(L_1\subset \mathbb {R}^k\), \(L_2\subset \mathbb {R}^l\) be lattices. A function \( \psi :L_1\times L_2\to \mathbb {R}\) has increasing differences in (x, y) if for every x′ ≥ x in L1, ψ(x′, y) − ψ(x, y) is non-decreasing in y. Let the set Ai of pure strategies of player i ∈ N be a compact convex subset of Euclidean space \(\mathbb {R}^{m_i}.\) An element ai of Ai is denoted by \(a_i =(a_{i1}, a_{i2},\ldots ,a_{im_i}).\) We consider an n-person game G0 in which \(R_i:A \to \mathbb {R}\) is the payoff function of player i ∈ N and A := A1 ×⋯ × An. As usual, any strategy profile a = (a1, a2, …, an) can also be denoted as (ai, a−i) for i ∈ N.

Assume that every Ai is a lattice. The game G0 is called supermodular if for every player i ∈ N and every a−i, the function ai → Ri(ai, a−i) is supermodular and Ri has increasing differences in (ai, a−i).

It is well known that any supermodular game with continuous utility functions and compact strategy sets Ai has a pure Nash equilibrium; see Topkis (1998) or Theorems 4 and 5 in Milgrom and Roberts (1990).
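This existence result has a constructive flavor: starting from the smallest joint action and iterating smallest best responses produces a monotone sequence terminating at the smallest pure equilibrium. A minimal sketch on a finite grid, with a hypothetical payoff exhibiting increasing differences:

```python
import numpy as np

def smallest_equilibrium(grids, payoff, max_iter=10_000):
    """Round-robin smallest-best-response iteration (cf. Topkis 1998)."""
    a = [g[0] for g in grids]                    # start at the smallest profile
    for _ in range(max_iter):
        prev = list(a)
        for i in range(len(grids)):
            vals = [payoff(i, a[:i] + [b] + a[i + 1:]) for b in grids[i]]
            a[i] = grids[i][int(np.argmax(vals))]  # argmax = smallest maximizer
        if a == prev:
            return a                             # no player wants to deviate
    raise RuntimeError("no convergence")

grid = list(np.linspace(0.0, 1.0, 11))
# R_i(a) = -(a_i - 0.3 - 0.4 a_j)^2 has increasing differences in (a_i, a_j).
R = lambda i, a: -(a[i] - 0.3 - 0.4 * a[1 - i]) ** 2
print(smallest_equilibrium([grid, grid], R))     # -> [0.5, 0.5]
```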

The game G0 is called smooth if every Ri can be extended from A to an open superset \(A^o\) in such a way that its second-order partial derivatives exist and are continuous on \(A^o\).

A game G0 is called a smooth supermodular game if for every player i ∈ N,

  1. (a)

    Ai is a compact interval in \(\mathbb {R}^{m_i}\),

  2. (b)

    \( \frac {\partial ^2 R_i}{\partial a_{ij} \partial a_{ik}}\ge 0\) on A for all 1 ≤ j < k ≤ mi,

  3. (c)

    \(\frac {\partial ^2 R_i}{\partial a_{ij} \partial a_{kl}}\ge 0\) on A for each k ≠ i and all 1 ≤ j ≤ mi, 1 ≤ l ≤ mk.

It is well known that any game satisfying conditions (a)–(c) is supermodular. Conditions (a) and (b) imply that Ri is a supermodular function with respect to ai for fixed a−i, while conditions (a) and (c) imply that Ri has increasing differences in (ai, a−i). For a detailed discussion of these issues, see Topkis (1978, 1998) or Theorem 4 in Milgrom and Roberts (1990).

In order to obtain uniqueness of equilibrium in a smooth supermodular game G0, one needs an additional assumption, often called a strict diagonal dominance condition; see page 1271 in Milgrom and Roberts (1990) or Rosen (1965). As noted by Curtat (1996), this condition can be described for smooth supermodular games as follows. Let Mi := {1, 2, …, mi}.

  1. (C1)

    For every i ∈ N and j ∈ Mi,

    $$\displaystyle \begin{aligned} \frac{\partial^2 R_i}{\partial a^2_{ij}} +\sum_{l\in M_i\setminus \{j\}} \frac{\partial^2 R_i}{\partial a_{ij} \partial a_{il}}+\sum_{k\in N\setminus\{i\}} \sum_{l\in M_k} \frac{\partial^2 R_i}{\partial a_{ij} \partial a_{kl}}<0. \end{aligned}$$

From Milgrom and Roberts (1990) and page 187 in Curtat (1996), we obtain the following auxiliary result.

Proposition 6

Any smooth supermodular game G0 satisfying condition (C1) has a unique pure Nash equilibrium.

Assume now that the payoff functions Ri are parameterized by τ in some partially ordered set T, i.e., \(R_i: A\times T\to \mathbb {R}.\) We introduce a new condition.

  1. (C2)

    \(\frac {\partial ^2 R_i}{\partial a_{ij} \partial \tau }\ge 0\) for all 1 ≤ j ≤ mi, and i ∈ N.

It is known that the set of Nash equilibria in any supermodular game G0 is a lattice and has smallest and largest elements. The following result is a consequence of Theorem 7 in Milgrom and Roberts (1990).

Proposition 7

Suppose that a smooth supermodular game satisfies (C2). Then, the largest and smallest pure Nash equilibria are non-decreasing functions of τ.
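A quick numerical illustration of Proposition 7, using the hypothetical payoffs \(R_i(a,\tau )=-(a_i-\tau -0.4a_j)^2\), which satisfy (C2); the (here unique) equilibrium moves up with τ:

```python
import numpy as np

def equilibrium(grid, target, max_iter=1000):
    """Two-player best-response iteration on a common action grid;
    target(i, a) is player i's unconstrained best response to a."""
    a = [grid[0], grid[0]]
    for _ in range(max_iter):
        prev = list(a)
        for i in (0, 1):
            a[i] = min(grid, key=lambda b: (b - target(i, a)) ** 2)
        if a == prev:
            return a
    raise RuntimeError("no convergence")

grid = list(np.linspace(0.0, 1.0, 51))
for tau in (0.1, 0.2, 0.3):
    # R_i(a, tau) = -(a_i - tau - 0.4 a_j)^2 satisfies (C2):
    # d^2 R_i / (da_i dtau) = 2 > 0, so the equilibrium rises with tau.
    print(tau, equilibrium(grid, lambda i, a, t=tau: t + 0.4 * a[1 - i]))
```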

3 Subgame-Perfect Equilibria in Stochastic Games with General State Space

We consider an n-person nonzero-sum discounted stochastic game G as defined below.

  • \((X,\mathcal {B}(X))\) is a nonempty Borel state space with its Borel σ-algebra \(\mathcal {B}(X).\)

  • Ai is a Borel space of actions for player i ∈ N := {1, …, n}.

  • Ai(x) ⊂ Ai is a set of actions available to player i ∈ N in state x ∈ X. The correspondence x → Ai(x) is lower measurable and compact-valued. Define

    $$\displaystyle \begin{aligned} A:=A_1\times \ldots\times A_n\quad\mbox{and}\quad A(x)=A_1(x)\times\ldots\times A_n(x),\quad\;x\in X. \end{aligned}$$
  • \(u_i:X\times A \to \mathbb {R}\) is a Borel measurable bounded utility (or payoff) function for player i ∈ N. It is assumed that ui(x, ⋅) is continuous on A for every x ∈ X.

  • \(q:X\times A\times \mathcal {B}(X)\to [0,1]\) is a transition probability. We assume that q(D|x, ⋅) is continuous on A for each x ∈ X and \(D\in \mathcal {B}(X).\)

  • β ∈ (0, 1) is a discount factor.

Every stage of the game begins with a state x ∈ X, and after observing x, the players simultaneously choose their actions ai ∈ Ai(x) (i ∈ N) and obtain payoffs ui(x, a), where a = (a1, …, an). A new state x′ is realized from the distribution q(⋅|x, a), and a new period begins with payoffs discounted by β. Let H1 = X and let Ht be the set of all plays ht = (x1, a1, …, xt−1, at−1, xt), where \( a^k=(a_1^k,\ldots ,a_n^k)\in A(x_k)\), k = 1, …, t − 1. A strategy for player i ∈ N is a sequence \(\pi _i=(\pi _{it})_{t\in \mathbb {N}}\) of Borel measurable transition probabilities from Ht to Ai such that πit(Ai(xt)|ht) = 1 for each ht ∈ Ht. The set of strategies for player i ∈ N is denoted by Πi. We let Π := Π1 ×… × Πn. Let Fi (\(F^0_i\)) be the set of all Borel measurable mappings \(f_i: X\times X \to \Pr (A_i)\) (\(\phi _i: X \to \Pr (A_i)\)) such that \(f_i(x^-,x) \in \Pr (A_i(x))\) (\(\phi _i(x) \in \Pr (A_i(x))\)) for each x−, x ∈ X. A stationary almost Markov strategy for player i ∈ N is a constant sequence \((\pi _{it})_{t\in \mathbb {N}}\), where πit = fi for some fi ∈ Fi and for all \(t\in \mathbb {N}.\) If xt is the state of the game at its t-th stage with t ≥ 2, then player i chooses an action using the mixed strategy fi(xt−1, xt). The mixed strategy used at an initial state x1 is fi(x1, x1). The set of all stationary almost Markov strategies for player i ∈ N is identified with the set Fi. A stationary Markov strategy for player i ∈ N is identified with a Borel measurable mapping \(f_i\in F_i^0.\) We say that πi = (πi1, πi2, …) ∈ Πi is a Markov strategy for player i if \(\pi _{it}\in F^0_i\) for all \(t\in \mathbb {N}.\)

Any strategy profile π = (π1, …, πn) ∈ Π together with an initial state x = x1 ∈ X determines a unique probability measure \(P^\pi _x\) on the space H of all plays h = (x1, a1, x2, a2, …) endowed with the product σ-algebra. The expected discounted payoff or utility function for player i ∈ N is

$$\displaystyle \begin{aligned} J_{\beta}^{i,T}(x,\pi)= E^\pi_x\left(\sum_{t=1}^T \beta^{t-1}u_i(x_t,a^t)\right)\quad\mbox{where}\quad T\le\infty. \end{aligned}$$

We shall write \(J_{\beta }^i(x,\pi )\) if T = ∞.
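When q can be sampled from, \(J_{\beta }^{i}(x,\pi )\) is straightforward to estimate by Monte Carlo simulation. The sketch below does this for pure stationary Markov profiles; all model primitives in the usage example are hypothetical, and the horizon truncation introduces an error of order βT.

```python
import numpy as np

def discounted_payoff_mc(x0, policies, u, step, beta=0.95, T=300,
                         runs=2000, seed=0):
    """Monte Carlo estimate of J_beta^i(x, pi) for pure stationary profiles.

    policies[i](x): action of player i in state x; u(i, x, a): stage
    payoff; step(x, a, rng): draws the next state from q(.|x, a).
    Truncating at T leaves an error of order beta**T for bounded payoffs.
    """
    rng = np.random.default_rng(seed)
    n = len(policies)
    totals = np.zeros(n)
    for _ in range(runs):
        x, disc = x0, 1.0
        for _ in range(T):
            a = [f(x) for f in policies]
            for i in range(n):
                totals[i] += disc * u(i, x, a)
            disc *= beta
            x = step(x, a, rng)
    return totals / runs

# Toy two-player growth model (all primitives hypothetical):
policies = [lambda x: 0.2 * x, lambda x: 0.2 * x]
u = lambda i, x, a: np.log(a[i] + 1e-12)
step = lambda x, a, rng: rng.uniform(0.0, max(x - a[0] - a[1], 1e-12))
print(discounted_payoff_mc(1.0, policies, u, step))
```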

A profile of strategies π∗ ∈ Π is called a Nash equilibrium if

$$\displaystyle \begin{aligned} J_{\beta}^{i,T}(x,\pi^*)\ge J_{\beta}^{i,T}(x,(\pi^*_{-i},\pi_i)) \mbox{ for all }x\in X, \,\,\pi_i\in \varPi_i \mbox{ and } i\in N. \end{aligned}$$

A stationary almost Markov (stationary Markov) perfect equilibrium is a Nash equilibrium that belongs to the class of strategy profiles F := F1 ×… × Fn (\( F^0:=F^0_1\times \ldots \times F^0_n\)). A Markov perfect equilibrium, on the other hand, is a Nash equilibrium π∗ in which \(\pi ^*_{it}=f_{it} \) and \(f_{it}\in F_i^0\) for every \(t\in \mathbb {N}\) and every player i ∈ N. The strategies involved in such an equilibrium are called “markovian,” “state-contingent,” or “payoff-relevant”; see Maskin and Tirole (2001). Clearly, every stationary Markov perfect equilibrium is also a Markov perfect equilibrium.

Let π = (π1, …, πn) ∈ Π and ht ∈ Ht. By πi[ht] we denote the conditional strategy for player i that can be applied from stage t onward. Let π[ht] = (π1[ht], …, πn[ht]). Using this notation, one can say that π is a subgame-perfect equilibrium in the stochastic game if for any \(t\in \mathbb {N}\) and every partial history ht ∈ Ht, π[ht] is a Nash equilibrium in the subgame starting at xt, where xt is the last coordinate in ht. This definition refers to the classical idea of Selten (1975).

Let B(X) be the space of all bounded Borel measurable real-valued functions on X and Bn(X) := B(X) ×⋯ × B(X) (n times). Similarly define B(X × X) and Bn(X × X). With any x ∈ X and v = (v1, …, vn) ∈ Bn(X), we associate the one-shot game Γv(x) in which the payoff function to player i ∈ N is

$$\displaystyle \begin{aligned} U_{\beta}^i(v_i,x,a) := u_i(x,a)+\beta \int_X v_i(y)q(dy|x,a), \quad a\in A(x). \end{aligned} $$
(6.1)

If \(\nu =(\nu _1,\ldots ,\nu _n)\in \Pr (A(x))\), then

$$\displaystyle \begin{aligned} U_{\beta}^i(v_i,x,\nu)=\int_{A_n(x)}\ldots\int_{A_1(x)}U_{\beta}^i(v_i,x,a_1,\ldots,a_n)\nu_1(da_1)\times\cdots\times \nu_n(da_n) \end{aligned}$$

and if f = (f1, …, fn) ∈ F0, then \(U_{\beta }^i(v_i,x,f)= U_{\beta }^i(v_i,x,\nu )\) with ν = (f1(x), …, fn(x)). Further, \( U_{\beta }^i(v_i,x,(\mu _i,f_{-i}))=U_{\beta }^i(v_i,x,\nu )\) with νi = μi, νj = fj(x) for j ≠ i. Under our assumptions, \(a\to U_{\beta }^i(v_i,x,a)\) is continuous on A(x) for every vi ∈ B(X), x ∈ X, i ∈ N. Let \(\mathcal {N}_v(x)\) be the set of all Nash equilibria in the game Γv(x). By \(\mathcal {NP}_v(x)\) we denote the set of payoff vectors corresponding to all Nash equilibria in \(\mathcal {N}_v(x).\) Let \( \mathcal {M}_v\) be the set of all Borel measurable selections of the set-valued mapping \(x\to \mathcal {N}_v(x).\) We know from Proposition 1 that \(\mathcal {M}_v\neq \emptyset .\)

Consider a T-stage game (2 ≤ T < ∞). Assume that the (T − 1)-stage subgame starting at any state x2 ∈ X has a Markov perfect equilibrium, say \(\pi ^*_{T-1}.\) Let \(v_{T-1}^*\) be the vector payoff function in Bn(X) determined by \(\pi ^*_{T-1}.\) Then we can take some \(f^*\in \mathcal {M} _{v_{T-1}^*}\) and define \(\pi ^*_T:=(f^*,\pi ^*_{T-1}).\) It is obvious that \( \pi ^*_T\) is a Markov perfect equilibrium in the T-stage game. This fact was proved by Rieder (1979), and we state it below.

Theorem 1

Every finite-stage nonzero-sum discounted stochastic game satisfying the above conditions has a subgame-perfect equilibrium. For any ε > 0, there exists an ε-equilibrium πε in Markov strategies, i.e.,

$$\displaystyle \begin{aligned} J_{\beta}^i(x,\pi^\varepsilon) +\varepsilon \ge J_{\beta}^i(x,(\pi_i,\pi^\varepsilon_{-i})) \quad\mathit{\mbox{for all}}\quad x\in X, \ \pi_i\in \varPi_i \mathit{\mbox{ and }}i\in N. \end{aligned}$$

Note that the ε-equilibrium in the second part of this theorem need not be subgame-perfect.
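The recursive construction preceding Theorem 1 is easy to implement when X and the action sets are finite. The following sketch performs the backward induction, selecting at every stage a Nash equilibrium of the one-shot game with payoffs \(u_i(x,a)+\beta \sum _y v_i(y)q(y|x,a)\); for brevity it searches for pure one-shot equilibria only (and fails if none exists), whereas the theorem in general requires mixed selections.

```python
from itertools import product

def markov_perfect_equilibrium(X, A, u, q, beta, T):
    """Backward induction for a T-stage finite stochastic game.

    X: list of states; A[i]: list of actions of player i;
    u(i, x, a): stage payoff; q(y, x, a): transition probability.
    Returns (policy, value): policy[t][x] is the joint action played
    at stage t + 1, and value[x] the equilibrium payoff vector at x.
    """
    n = len(A)
    v = {x: [0.0] * n for x in X}                 # continuation values
    policy = []
    for _ in range(T):
        def U(i, x, a, v=v):                      # one-shot payoffs, v frozen
            return u(i, x, a) + beta * sum(q(y, x, a) * v[y][i] for y in X)
        f, w = {}, {}
        for x in X:
            for a in product(*A):                 # look for a pure equilibrium
                if all(U(i, x, a) >= U(i, x, a[:i] + (b,) + a[i + 1:])
                       for i in range(n) for b in A[i]):
                    f[x], w[x] = a, [U(i, x, a) for i in range(n)]
                    break
            else:
                raise ValueError(f"no pure one-shot equilibrium at {x!r}")
        policy.insert(0, f)
        v = w
    return policy, v
```

The profile returned is Markov but in general not stationary, which is consistent with Theorem 1.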

We now make an additional assumption.

  1. (A1)

The transition probability q is norm continuous in actions, i.e., for each x ∈ X and any sequence ak → a0 in A(x) as k → ∞, it follows that

    $$\displaystyle \begin{aligned} \sup_{D\in \mathcal{B}(X)}|q(D|x,a^k)-q(D|x,a^0)|\to 0. \end{aligned}$$

Condition (A1) is quite restrictive, but it is satisfied if q has a conditional density with respect to some probability measure on X that is continuous in actions.

Theorem 2

Every discounted nonzero-sum stochastic game G satisfying (A1) has a subgame-perfect equilibrium.

Theorem 2 was proved in a more general form by Mertens and Parthasarathy (2003), where the payoffs and discount factors may depend on time and the state space is a general measurable space. A special case was considered by Mertens and Parthasarathy (1991), who assumed that the action sets are finite and state independent and that the transitions are dominated by some probability measure on X. The proofs given in Mertens and Parthasarathy (1991, 2003) are based upon a fixed point property of an operator defined on the class of measurable selections of compact set-valued mappings from the state space to the payoff space. The fixed point obtained in that class is used to define, in a recursive way, a subgame-perfect equilibrium that consists of history-dependent strategies (unbounded memory is assumed). For further comments on possible extensions of Theorem 2, the reader is referred to Mertens (2002) and Mertens et al. (2015). A modified proof of their results was provided by Solan (1998), who analyzed accumulation points of 𝜖-equilibria (as 𝜖 → 0) obtained in Theorem 1.

Assume that Ai(x) = Ai for each x ∈ X and i ∈ N and that every space Ai is compact. Let X and A1, …, An be given the discrete topology. According to Maitra and Sudderth (2007), a function \(g:H_{\infty } \to \mathbb {R} \) is DS-continuous on H∞ if it is continuous on H∞ endowed with the product topology. It is easy to see that g is DS-continuous on H∞ if and only if, for any 𝜖 > 0 and y = (y1, y2, …) ∈ H∞, there exists m such that |g(y) − g(y′)| < 𝜖 for each \(y^{\prime }=(y_1^{\prime },y_2^{\prime },\ldots )\in H_{\infty }\) such that \(y_l=y^{\prime }_l\) for 1 ≤ l ≤ m. Suppose that \(g_i:H_{\infty }\to \mathbb {R}\) is a bounded Borel measurable payoff function for player i ∈ N. For any strategy profile π ∈ Π and every initial state x = x1, the expected payoff to player i is \(E^\pi _x(g_i).\) The subgame-perfect equilibrium can be defined for this game in the usual way. Maitra and Sudderth (2007) (see Theorem 1.2) obtained a general theorem on the existence of subgame-perfect equilibria for stochastic games.

Theorem 3

Let the payoff functions gi, i ∈ N, be bounded, Borel measurable, and DS-continuous on H∞, and let the action spaces Ai, i ∈ N, be finite. Then the game has a subgame-perfect equilibrium.

The proof of Theorem 3 applies some techniques from gambling theory described in Dubins and Savage (2014), i.e., approximations of DS-continuous functions by “finitary functions.” Theorem 3 extends a result due to Fudenberg and Levine (1983). An example given in Harris et al. (1995) shows that Theorem 3 fails if the action spaces are compact metric and the transition probability q is weakly continuous.

The next result was proved by Maitra and Sudderth (2007) (see Theorem 1.3) for “additive games” and reads as follows.

Theorem 4

Assume that every action space is compact and the transition probability satisfies (A1). Assume that \(g_i(h_{\infty })=\sum _{t=1}^\infty r_{it}(x_t,a^t)\) and this series converges uniformly on H∞. If, in addition, every function rit is bounded, rit(⋅, a) is Borel measurable on X for each a ∈ A := A1 ×⋯ × An and rit(x, ⋅) is continuous on A for each x ∈ X, then the game has a subgame-perfect equilibrium.

It is worth emphasizing that stationary Markov perfect equilibria may fail to exist in the games considered in this section. Namely, Levy (2013) gave a counterexample: a discounted game with an uncountable state space, finite action sets, and deterministic transitions. Then, Levy and McLennan (2015) showed that stationary Markov perfect equilibria may not exist even if the action spaces are finite, X = [0, 1], and the transition probability has a density function with respect to some measure \(\mu \in \Pr (X).\) A simple modification of the example given in Levy and McLennan (2015) shows that a new game (with X = [0, 2]) need not have a stationary Markov perfect equilibrium when the measure μ (dominating the transition probability q) is nonatomic.

4 Correlated Equilibria with Public Signals in Games with Borel State Spaces

Correlated equilibria for normal form games were first studied by Aumann (1974, 1987). In this section we describe an extensive-form correlated equilibrium with public randomization inspired by the work of Forges (1986). A further discussion of correlated equilibria and communication in games can be found in Forges (2009). The sets of all equilibrium payoffs in extensive-form games that include a general communication device are characterized by Solan (2001).

We now extend the sets of strategies available to the players in the sense that we allow them to correlate their choices in some natural way. Suppose that \((\xi _t)_{t\in \mathbb {N}}\) is a sequence of so-called signals, drawn independently from [0, 1] according to the uniform distribution. Suppose that at the beginning of each period t of the game the players are informed not only of the outcome of the preceding period and the current state xt, but also of ξt. Then, the information available to them is a vector ht = (x1, ξ1, a1, …, xt−1, ξt−1, at−1, xt, ξt), where xτ ∈ X, ξτ ∈ [0, 1], aτ ∈ A(xτ), 1 ≤ τ ≤ t − 1. We denote the set of such vectors by Ht. An extended strategy for player i is a sequence \(\pi _i=(\pi _{it})_{t\in \mathbb {N}}\), where πit is a Borel measurable transition probability from Ht to Ai such that πit(Ai(xt)|ht) = 1 for each ht ∈ Ht. An extended stationary strategy for player i ∈ N can be identified with a Borel measurable mapping \(f:X \times [0,1]\to \Pr (A_i)\) such that f(Ai(x)|x, ξ) = 1 for all (x, ξ) ∈ X × [0, 1]. If the players use extended strategies, they effectively play the stochastic game with the extended state space X × [0, 1]. The law of motion, say p, in the extended state space model is the product of the original law of motion q and the uniform distribution η on [0, 1]. More precisely, for any x ∈ X, ξ ∈ [0, 1], a ∈ A(x), and Borel sets C ⊂ X and D ⊂ [0, 1], p(C × D|x, ξ, a) = q(C|x, a)η(D). For any profile of extended strategies π = (π1, …, πn) of the players, the expected discounted payoff to player i ∈ N is a function of the initial state x1 = x and the first signal ξ1 = ξ and is denoted by \(J_{\beta }^i(x,\xi ,\pi ).\) We say that \(f^*=(f^*_1,\ldots ,f^*_n)\) is a Nash equilibrium in the β-discounted stochastic game in the class of extended strategies if for each initial state x1 = x, i ∈ N, and every extended strategy πi of player i, we have

$$\displaystyle \begin{aligned} \int_0^1J_{\beta}^i(x,\xi,f^*)d\xi \ge \int_0^1J_{\beta}^i(x,\xi,(\pi_i,f^*_{-i}))d\xi. \end{aligned}$$

The Nash equilibrium in extended strategies is also called a correlated equilibrium with public signals. The reason is that after the outcome of any period of the game, the players can coordinate their next choices by exploiting the next (known to all of them, i.e., public) signal and using some coordination mechanism telling which (pure or mixed) action is to be played by everyone. In many applications, we are particularly interested in stationary equilibria. In such a case the coordination mechanism can be represented by a family of n + 1 Borel measurable functions λj : X → [0, 1] such that \(\sum _{j=1}^{n+1}\lambda ^j(x)=1\) for each x ∈ X. A stationary correlated equilibrium can then be constructed using a family of n + 1 stationary strategies \(f^1_i,\ldots ,f^{n+1}_i\) given for every player i and the following coordination rule. If the game is in state xt = x at stage t and a random number ξt = ξ is selected, then player i ∈ N is suggested to use \(f_i^k(\cdot |x)\), where k is the least index for which \( \sum _{j=1}^k\lambda ^j(x)\ge \xi .\) The functions λj and \(f^j_i\) induce an extended stationary strategy \(f^*_i\) for every player i as follows:

$$\displaystyle \begin{aligned} f_i^*(\cdot|x,\xi):= f_i^1(\cdot|x)\quad\mbox{if}\quad \xi\le\lambda^1(x), \ \ x\in X, \end{aligned}$$

and

$$\displaystyle \begin{aligned} f_i^*(\cdot|x,\xi):= f_i^k(\cdot|x)\quad\mbox{if}\quad \sum_{j=1}^{k-1}\lambda^j(x)<\xi\le \sum_{j=1}^k\lambda^j(x) \end{aligned}$$

for x ∈ X, 2 ≤ k ≤ n + 1. Because the signals are independent and uniformly distributed in [0, 1], it follows that at any period of the game and for any current state x, the number λj(x) can be interpreted as the probability that player i is suggested to use \(f^j_i(\cdot |x)\) as a mixed action.
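The coordination rule just described is a one-liner in simulation terms; in the sketch below, the profiles and weights are illustrative placeholders.

```python
import numpy as np

def coordinated_action(lambdas, profiles, x, xi):
    """Play f^k(x) for the least k with lambda^1(x)+...+lambda^k(x) >= xi."""
    cum = 0.0
    for lam, f in zip(lambdas, profiles):
        cum += lam(x)
        if xi <= cum:
            return f(x)
    return profiles[-1](x)     # guards against rounding in the cumulative sum

# Two illustrative stationary profiles mixed with equal weights:
lambdas = [lambda x: 0.5, lambda x: 0.5]
profiles = [lambda x: ("low", x / 4), lambda x: ("high", x / 2)]
rng = np.random.default_rng(1)
print(coordinated_action(lambdas, profiles, x=0.3, xi=rng.uniform()))
```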

  1. (A2)

Let \(\mu \in \Pr (X).\) There exists a conditional density function ρ for q with respect to μ such that if ak → a0 in A(x), x ∈ X, as k → ∞, then

    $$\displaystyle \begin{aligned} \lim_{k\to\infty}\int_X|\rho(x,a^k,y)-\rho(x,a^0,y)|\mu(dy) =0. \end{aligned}$$

Theorem 5

Any discounted stochastic game G satisfying (A2) has a stationary correlated equilibrium with public signals.

Theorem 5 was proved by Nowak and Raghavan (1992). First, it is shown, by making use of a theorem in Glicksberg (1952), that the correspondence \(v\to \mathcal {M}_v\) has a fixed point, i.e., there exists w∗ ∈ Bn(X) such that \(w^*(x)\in co\mathcal {NP}_{w^*}(x)\) for all x ∈ X. Then, applying Propositions 2 and 4, one can prove the existence of a stationary correlated equilibrium with public signals for the game with the payoff functions \(U_{\beta }^i(w^*_i,x,a)\) defined in (6.1). A verification that the profile f∗ obtained in this way is indeed a Nash equilibrium in the game with the extended state space X × [0, 1] relies on standard Bellman equations for discounted dynamic programming; see Blackwell (1965) or Puterman (1994). Observe also that the set Da of all atoms of μ is countable. A refinement of the above result is Theorem 2 in Jaśkiewicz and Nowak (2016), where it is shown that public signals are important only in states belonging to the set X ∖ Da. A similar result on correlated equilibria was given in Nowak and Jaśkiewicz (2005) for semi-Markov games with Borel state spaces and the expected average payoffs. This result, in turn, was proved under geometric drift conditions (GE1)–(GE3) formulated in Sect. 5 in Jaśkiewicz and Nowak (2018a).

Condition (A2) can be replaced in the proof (with minor changes) by assumption (A1) on norm continuity of q with respect to actions. A result similar to Theorem 5 was given by Duffie et al. (1994), where it was assumed that for any x, x′ ∈ X, a ∈ A(x), a′ ∈ A(x′), we have

$$\displaystyle \begin{aligned} q(\cdot|x,a)\ll q(\cdot|x^{\prime },a^{\prime })\quad\mbox{ and}\quad q(\cdot|x^{\prime },a^{\prime })\ll q(\cdot|x,a). \end{aligned}$$

In addition, Duffie et al. (1994) required the continuity of the payoffs and transitions with respect to actions. Thus, the result in Duffie et al. (1994) is weaker than Theorem 5. Moreover, they also established the ergodicity of the Markov chain induced by a stationary correlated equilibrium. Their proof is different from that of Nowak and Raghavan (1992). Subgame-perfect correlated equilibria were also studied by Harris et al. (1995) for games with weakly continuous transitions and general continuous payoff functions on the space of infinite plays endowed with the product topology. Harris et al. (1995) gave an example showing that public signals play an important role. They proved that the subgame-perfect equilibrium path correspondence is upper hemicontinuous. Later, Reny and Robson (2002) provided a shorter and simpler proof of existence that focuses on considerations of equilibrium payoffs rather than paths. Some comments on correlated equilibria for games with finitely many states or different payoff evaluation will be given in the sequel.

5 Stationary Equilibria in Discounted Stochastic Games with Borel State Spaces

In this section, we introduce the following condition.

  1. (A3)

    There exist l Carathéodory functions αj : X × A → [0, 1] such that \(\sum _{j=1}^l\alpha _j(x,a)=1\) for every (x, a) ∈ X × A and Borel measurable transition probabilities \(q_j:X\times \mathcal {B}(X)\to [0,1]\) such that

    $$\displaystyle \begin{aligned} q(\cdot|x,a) = \sum_{j=1}^l \alpha_j(x,a)q_j(\cdot|x), \quad (x,a)\in X\times A.\end{aligned} $$

    Additionally, every qj(⋅|x) is dominated by some \(\mu \in \Pr (X).\)

We can now state a result due to Jaśkiewicz and Nowak (2016).

Theorem 6

Assume that game G satisfies (A3). Then, G has a stationary almost Markov perfect equilibrium.

We outline the proof of Theorem 6 for a nonatomic measure μ. The general case needs additional notation. First, we show that there exists a Borel measurable mapping w∗ ∈ Bn(X) such that \(w^*(x) \in co\mathcal {NP}_{w^*}(x)\) for all x ∈ X. This result is obtained by applying a generalization of the Kakutani fixed point theorem due to Glicksberg (1952). (Note that closed balls in Bn(X) are compact in the weak-star topology by the Banach-Alaoglu theorem.) Second, applying Proposition 5, we conclude that there exists some v∗ ∈ Bn(X × X) such that

$$\displaystyle \begin{aligned} \int_Xw^*(y)q_j(dy|x)=\int_Xv^*(x,y)q_j(dy|x),\quad j=1,\ldots,l. \end{aligned}$$

Hence, by (A3) we infer that

$$\displaystyle \begin{aligned} \int_Xw^*(y)q(dy|x,a)=\int_Xv^*(x,y)q(dy|x,a),\quad (x,a)\in X\times A. \end{aligned}$$

Moreover, we know that \(v^*(x,y)\in \mathcal {NP}_{v^*}(y)\) for all states x and y. Furthermore, making use of Filippov's measurable implicit function theorem (as in Proposition 5), we infer that v∗(x, y) is the vector of equilibrium payoffs corresponding to some stationary almost Markov strategy profile. Finally, we utilize the system of n Bellman equations to provide a characterization of stationary equilibrium and to deduce that this profile is indeed a stationary almost Markov perfect equilibrium. For details the reader is referred to Jaśkiewicz and Nowak (2016).

Corollary 1

Consider a game where the set A is finite and the transition probability q is Borel measurable. Then, the game has a stationary almost Markov perfect equilibrium.

Proof

We show that the game meets (A3). Let \(m\in \mathbb {N}\) be such that A = {a1, …, am}. Now, for j = 1, …, m, define

$$\displaystyle \begin{aligned} \alpha_j(x,a):= \left\{ \begin{array}{ll} 1, & \mbox{if}\ \ a\in A(x),\; a=a^j \\ 0, & \mbox{otherwise,} \end{array} \right.\quad\mbox{and}\\ q_j(\cdot|x):= \left\{ \begin{array}{ll} q(\cdot|x,a^j), & \mbox{if}\ \ a^j\in A(x) \\ \mu(\cdot), & \mbox{otherwise.} \end{array} \right. \end{aligned} $$

Then, \(q(\cdot |x,a)=\sum _{j=1}^l \alpha_j(x,a)q_j(\cdot |x)\) with l = m, and the conclusion follows from Theorem 6.

Remark 1

Corollary 1 extends the result of Mertens and Parthasarathy (1991), where it is additionally assumed that Ai(x) = Ai for all x ∈ X, i ∈ N and that μ is nonatomic; see Comment on page 147 in Mertens and Parthasarathy (1991) or Theorem VII.1.8 on page 398 in Mertens et al. (2015). If μ admits some atoms, then they proved the existence of a subgame-perfect equilibrium in which the strategy of player i ∈ N is of the form (fi1, fi2, …) with \(f_{it}\in F_i^0\) for each \(t\in \mathbb {N}.\) Thus, the equilibrium strategy of player i ∈ N is stage-dependent.

Remark 2

It is worth emphasizing that the equilibria established in Theorem 6 are subgame-perfect. A result related to Theorem 6 is given in Barelli and Duggan (2014). The assumption imposed on the transition probability in their paper is weaker, but an equilibrium is shown to exist in the class of stationary semi-Markov strategies, where the players take into account the current state, the previous state, and the actions chosen by the players in the previous state.

Remark 3

As already mentioned in Sect. 3, Levy and McLennan (2015) constructed a stochastic game that does not have a stationary Markov perfect equilibrium. In their model, each set Ai is finite, Ai(x) = Ai for every i ∈ N, x ∈ X, and the transition law is a convex combination of a probability measure (depending on the current state) and the Dirac measure concentrated at some state. Such a model satisfies the absolute continuity condition. Hence, their example confirms that one cannot expect to obtain an equilibrium in stationary Markov strategies even for games with finite action spaces. Therefore, Corollary 1 is meaningful.

Remark 4

By Urysohn’s metrization theorem (see Theorem 3.40 in Aliprantis and Border 2006), every action space Ai can be embedded homeomorphically in the Hilbert cube. The action correspondences remain measurable and compact-valued after the embedding. Therefore, one can assume without loss of generality as in Jaśkiewicz and Nowak (2016) that the action spaces are compact metric.

A stochastic game with additive rewards and additive transitions (ARAT for short) satisfies a separability condition for the actions of the players. To simplify the presentation, we assume that N = {1, 2}. The payoff function for player i ∈ N is of the form

$$\displaystyle \begin{aligned} u_i(x,a_1,a_2) =u_{i1}(x,a_1) +u_{i2}(x,a_2), \end{aligned}$$

where x ∈ X, a1 ∈ A1(x), a2 ∈ A2(x) and similarly

$$\displaystyle \begin{aligned} q(\cdot|x,a_1,a_2) = q_1(\cdot|x,a_1) + q_2(\cdot|x,a_2), \end{aligned}$$

where q1 and q2 are Borel measurable subtransition probabilities dominated by some \(\mu \in \Pr (X).\)

The following result was proved in Jaśkiewicz and Nowak (2015a).

Theorem 7

If μ is a nonatomic probability measure and the action sets A1 and A2 are finite, then the ARAT stochastic game has a Nash equilibrium in pure stationary almost Markov strategies.

The separability condition in ARAT games can easily be generalized to the n-person case. Assumptions of a similar type are often used in differential games; see Başar and Olsder (1995). ARAT stochastic games with Borel state and finite action spaces were first studied by Himmelberg et al. (1976), who showed the existence of stationary Markov equilibria for μ-almost all initial states with \(\mu \in \Pr (X).\) Their result was strengthened by Nowak (1987), who considered compact metric action spaces and obtained stationary equilibria for all initial states. Pure stationary Markov perfect equilibria may not exist in ARAT stochastic games if μ has atoms; see Example 3.1 (a game with four states) in Raghavan et al. (1985) or the counterexample (a game with two states) in Jaśkiewicz and Nowak (2015a). Küenle (1999) studied ARAT stochastic games with a Borel state space and compact metric action spaces and established the existence of non-stationary history-dependent pure Nash equilibria. In order to construct subgame-perfect equilibria, he used the well-known idea of threats (frequently used in repeated games). The result of Küenle (1999) is stated for two-person games only. Theorem 7 can also be proved for n-person games under a similar additivity assumption. An almost Markov equilibrium is obviously subgame-perfect.

Stationary Markov perfect equilibria exist in discounted stochastic games with state-independent transitions (SIT games) studied by Parthasarathy and Sinha (1989). They assumed that Ai(x) = Ai for all x ∈ X and i ∈ N, the action sets Ai are finite, and q(⋅|x, a) = q(⋅|a) are nonatomic for all a ∈ A. A more general class of games with additive transitions satisfying (A3) but with all qj independent of the state x ∈ X (AT games) was examined by Nowak (2003b). A stationary Markov perfect equilibrium f∗ ∈ F0 was shown to exist in that class of stochastic games. Additional special classes of discounted stochastic games with uncountable state space having stationary Markov perfect equilibrium are described in Krishnamurthy et al. (2012). Some of them are related to AT games studied by Nowak (2003b).

Let X = Y × Z, where Y and Z are Borel spaces. In a noisy stochastic game considered by Duggan (2012), the states are of the form x = (y, z) ∈ X, where z is called a noise variable. The payoffs depend measurably on x = (y, z) and are continuous in the actions of the players. The transition probability q is defined as follows:

$$\displaystyle \begin{aligned} q(D|x,a)= \int_Y\int_Z1_D(y^{\prime },z^{\prime })q_2(dz^{\prime }|y^{\prime })q_1(dy^{\prime }|x,a), \quad a\in A(x),\ D\in \mathcal{B}(Y\times Z). \end{aligned}$$

Moreover, it is assumed that q1 is dominated by some \(\mu _1\in \Pr (Y)\) and q2 is absolutely continuous with respect to some nonatomic measure \(\mu _2\in \Pr (Z).\) Additionally, q1(⋅|x, a) is norm continuous in actions a ∈ A for each x ∈ X. This form of q implies that, conditional on y′, the next shock z′ is independent of the current state and actions. In applications, (y, z) may represent a pair: the price of some good and the realization of random demand. By choosing actions, the players can determine (stochastically) the next period price y′, which in turn has some influence on the next demand shock. Other applications are discussed by Duggan (2012), who obtained the following result.

Theorem 8

Every noisy stochastic game has a stationary Markov perfect equilibrium.

Let X be a Borel space, \(\mu \in \Pr (X)\), and let \(\mathcal {G}\subset \mathcal {B}(X)\) be a sub-σ-algebra. A set \(D\in \mathcal {B}(X)\) is said to be a (conditional) \(\mathcal {G}\)-atom if μ(D) > 0 and for any Borel set B ⊂ D, there exists some \(D_0\in \mathcal {G}\) such that μ(B△(D ∩ D0)) = 0. Assume that the transition probability q is dominated by some probability measure μ and that ρ denotes a conditional density function. Following He and Sun (2017), we say that a discounted stochastic game has a decomposable coarser transition kernel if there exists a sub-σ-algebra \(\mathcal {G}\subset \mathcal {B}(X)\) such that \(\mathcal {B}(X)\) has no \(\mathcal {G}\)-atom and there exist Borel measurable nonnegative functions ρj and dj (j = 1, …, l) such that, for every x ∈ X, a ∈ A, each function ρj(⋅, x, a) is \(\mathcal {G}\)-measurable and the transition probability density ρ is of the form

$$\displaystyle \begin{aligned} \rho(y,x,a)=\sum_{j=1}^l\rho_j(y,x,a)d_j(y),\quad x,\ y\in X,\ a\in A. \end{aligned}$$

Using a theorem of Dynkin and Evstigneev (1977) on conditional expectations of measurable correspondences and a fixed point property proved in Nowak and Raghavan (1992), He and Sun (2017) established the following result.

Theorem 9

Every discounted stochastic game having decomposable coarser transition kernel with respect to a nonatomic probability measure μ on X has a stationary Markov perfect equilibrium.

A slight extension of the above theorem, given in He and Sun (2017), contains as special cases the results proved by Parthasarathy and Sinha (1989), Nowak (2003b), and Nowak and Raghavan (1992). However, the form of the equilibrium strategy obtained by Nowak and Raghavan (1992) does not follow from He and Sun (2017). The result of He and Sun (2017) also covers the class of noisy stochastic games examined in Duggan (2012); in this case, it suffices to take for \(\mathcal {G}\) the sub-σ-algebra of \(\mathcal {B} (Y\times Z)\) that consists of the sets D × Z, \(D\in \mathcal {B}(Y).\) Finally, we wish to point out that ARAT discounted stochastic games as well as the games considered in Jaśkiewicz and Nowak (2016) (see Theorem 6) are not included in the class of models mentioned in Theorem 9.

Remark 5

A key tool in proving the existence of a stationary Markov perfect equilibrium in a discounted stochastic game that has no ARAT structure is Lyapunov's theorem on the range of a nonatomic vector measure. Since the Lyapunov theorem is false for infinitely many measures, the counterexample of Levy's eight-person game is of some importance (see Levy and McLennan 2015). There is another reason for which the existence of an equilibrium in the class of stationary Markov strategies F0 is difficult to obtain. One can regard strategies from the sets \(F_i^0\) as "Young measures" and consider the natural weak-star topology on that class; see Valadier (1994). Young measures are often called relaxed controls in control theory. With the help of Example 3.16 from Elliott et al. (1973), one can easily construct a stochastic game with X = [0, 1], finite action spaces, and trivial transition probability q being the Lebesgue measure on X, in which the expected discounted payoffs \(J^i_{\beta }(x,f)\) are discontinuous on F0 endowed with the product topology. The continuity of \(f\to J^i_{\beta }(x,f)\) (for a fixed initial state) can only be proved for ARAT games. Generally, it is difficult to obtain compact families of continuous strategies; this property requires very strong conditions, for instance, an equicontinuous family of functions (see Sect. 6).

6 Special Classes of Stochastic Games with Uncountable State Space and Their Applications in Economics

In a number of applications of discrete-time dynamic games in economics, the state space is an interval in Euclidean space. An illustrative example is the "fish war game" studied by Levhari and Mirman (1980), where the state space is X = [0, 1] and Ai(x) = [0, x/n] for each i ∈ N. Usually, X is interpreted as the set of common property renewable resources. If xt is the resource stock at the beginning of period \(t\in \mathbb {N}\) and player i ∈ N extracts ait ∈ Ai(xt) for consumption, then the new state is \(x_{t+1}= \left (x_t-\sum _{j=1}^na_{jt}\right )^\alpha \) with α ∈ (0, 1). The game is symmetric in the sense that the utility function of player i ∈ N is \( u_i(x,a):= \ln a_i\), with a = (a1, …, an) being a pure strategy profile chosen by the players in state x ∈ X. Levhari and Mirman (1980) constructed a symmetric stationary Markov perfect equilibrium for the two-player β-discounted game that consists of linear strategies. For the arbitrary n-player case, the equilibrium strategy profile is \(f^\beta =(f_1^\beta ,\ldots ,f_n^\beta )\), where \(f_i^\beta (x)= \frac {(1-\alpha \beta )x}{n+(1-n)\alpha \beta }\), x ∈ X, i ∈ N; see Nowak (2006c). Levhari and Mirman (1980) concluded that, in equilibrium, the fish population will be smaller than the population that would have resulted if the players had cooperated and had maximized their joint utility. The phenomenon of overexploitation of a common property resource is known in economics as the "tragedy of the commons." Dutta and Sundaram (1993) showed that there may exist equilibria (consisting of discontinuous consumption functions) in which the common resource is underexploited, so that the tragedy of the commons need not occur. A characterization of the set of equilibria in this model has been given by Chiarella et al. (1984). If β → 1, then \(f^\beta \to f^*=(f_1^*,\ldots ,f_n^*)\), where \(f_i^*(x)=\frac {(1-\alpha )x }{n+(1-n)\alpha }\), x ∈ X, i ∈ N. As shown in Nowak (2008), f∗ is a Nash equilibrium in the class of all strategies of the players in the fish war game under the overtaking optimality criterion. Such a criterion was examined in economics by Ramsey (1928), von Weizsäcker (1965), and Gale (1967), and its application to repeated games was pointed out by Rubinstein (1979). Generally, finding an equilibrium under the overtaking optimality criterion in the class of all strategies is a difficult task; see Carlson and Haurie (1996).
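A short numerical check of this conclusion using the closed-form strategies above: in equilibrium the players jointly consume the fraction \(n(1-\alpha \beta )/(n+(1-n)\alpha \beta )\) of the stock, while joint maximization of utility (the standard single-agent log-utility solution) consumes only 1 − αβ, so the long-run equilibrium stock is lower. The parameter values below are illustrative.

```python
def long_run_stock(share, alpha, x0=1.0, T=100):
    """Iterate x' = ((1 - share) * x)**alpha, the fish war dynamics when
    the players jointly consume the fraction `share` of the stock."""
    x = x0
    for _ in range(T):
        x = ((1.0 - share) * x) ** alpha
    return x

n, alpha, beta = 2, 0.5, 0.95
share_eq = n * (1 - alpha * beta) / (n + (1 - n) * alpha * beta)  # n*f_i^beta(x)/x
share_coop = 1 - alpha * beta          # joint log-utility maximization

print("equilibrium stock :", long_run_stock(share_eq, alpha))    # ~ 0.311
print("cooperative stock :", long_run_stock(share_coop, alpha))  # ~ 0.475
```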

Dutta and Sundaram (1992) considered a stochastic game of resource extraction with state space X = [0, ∞), Ai(x) = [0, x/n] for each i ∈ N, x ∈ X, and the same nonnegative utility function u for each player. Their model includes both the dynamic game with deterministic transitions studied by Sundaram (1989a,b) and the stochastic game with nonatomic transition probabilities considered by Majumdar and Sundaram (1991). Now we list the assumptions used by Dutta and Sundaram (1992). For any y, z ∈ X, let Q(y|z) := q([0, y]|z), and for any y > 0 set \(Q(y^-|z):=\lim _{y^{\prime }\uparrow y}Q(y^{\prime }|z).\)

  1. (D1)

For any x ∈ X, a = (a1, …, an) ∈ A(x) and i ∈ N, ui(x, a) = u(ai) ≥ 0. The utility function u is strictly concave, increasing, and continuously differentiable. Moreover, lima↓0u′(a) = ∞.

  2. (D2)

Q(0|0) = 1 and for each z > 0, there exists a compact interval I(z) ⊂ (0, ∞) such that q(I(z)|z) = 1.

  3. (D3)

There exists z1 > 0 such that if 0 < z < z1, then Q(z−|z) = 0, i.e., q([z, ∞)|z) = 1.

  4. (D4)

    There exists \(\hat {z}>0\) such that for each \(z\ge \hat {z}\), Q(z|z) = 1, i.e., q([0, z]|z) = 1.

  5. (D5)

If zm → z as m → ∞, then q(⋅|zm) → q(⋅|z) in the weak topology on \(\Pr (X).\)

  6. (D6)

If z < z′, then for each y > 0, Q(y|z) ≥ Q(y|z′).

Assumption (D6) is a "strong stochastic dominance" condition requiring that larger investments yield probabilistically higher stock levels. This assumption and the fact that the players have identical utility functions play a crucial role in the proof of Theorem 1 in Dutta and Sundaram (1992), which can be stated as follows.

Theorem 10

Every discounted stochastic game satisfying conditions (D1)–(D6) has a pure stationary Markov perfect equilibrium.

Remark 6

The equilibrium strategies obtained by Dutta and Sundaram (1992) are identical for all the players, and the corresponding equilibrium functions are non-decreasing and upper semicontinuous on X. One can observe that the assumptions on the transition probability functions include the usual deterministic case with an increasing continuous production function. A slightly more general model was recently studied by Jaśkiewicz and Nowak (2018b), who dealt with unbounded utilities.

Transition probabilities considered in other papers on equilibria in stochastic games are assumed to satisfy much stronger continuity conditions, e.g., the norm continuity in actions.

The problem of proving the existence of a Nash equilibrium in a stochastic game of resource extraction with different utility functions for the players seems to be difficult. Partial results were reported by Amir (1996a), Nowak (2003b), Balbus and Nowak (2008), and Jaśkiewicz and Nowak (2015b), where specific transition structures were assumed. Below we give an example whose assumptions are relatively simple to state.

  1. (S1)

X = [0, ∞) and Ai(x) = [0, bi(x)] with \(\sum _{j=1}^n b_j(x) \le x\) for all x ∈ X, where each bj is a continuous increasing function.

  2. (S2)

    \(u_i:[0,\infty ) \to \mathbb {R}\) is a nonnegative increasing twice differentiable utility function for player i ∈ N such that ui(0) = 0.

  3. (S3)

    If a = (a1, …, an) ∈ A(x) and \(s(a)= \sum _{i=1}^na_i\), then

    $$\displaystyle \begin{aligned} q(\cdot|x,a) = h(x-s(a))q_0(\cdot|x) +(1-h(x-s(a)))\delta_0(\cdot), \end{aligned}$$

    where h : X → [0, 1] is an increasing twice differentiable function such that h′′ < 0 and h(0) = 0, and δ0 is the Dirac measure concentrated at the point 0 ∈ X. Moreover, q0((0, ∞)|x) = 1 for each x > 0, q0({0}|0) = 1, and q0(⋅|x) has a density function ρ(x, ⋅) with respect to a σ-finite measure μ defined on X. The function x → ρ(x, y) is continuous for each y ∈ X.
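Condition (S3) has a transparent simulation interpretation: the joint investment x − s(a) determines the probability that the resource survives, and the state 0 is absorbing. A minimal sampler, with illustrative choices of h and q0:

```python
import numpy as np

def next_state(x, a, h, draw_q0, rng):
    """Sample from the (S3) transition: with probability h(x - s(a)) the
    next state is drawn from q0(.|x); otherwise the process jumps to 0."""
    y = x - sum(a)                    # investment left after consumption
    return draw_q0(x, rng) if rng.uniform() < h(y) else 0.0

rng = np.random.default_rng(3)
h = lambda y: 1.0 - np.exp(-y)        # increasing, h'' < 0, h(0) = 0
draw_q0 = lambda x, rng: rng.gamma(2.0, 0.25 + 0.5 * x)   # illustrative q0
print(next_state(1.0, [0.2, 0.3], h, draw_q0, rng))
```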

The following result is a special case of Theorem 2 in Jaśkiewicz and Nowak (2015b).

Theorem 11

Every discounted stochastic game satisfying assumptions (S1)–(S3) has a pure stationary Markov perfect equilibrium.

The proof of Theorem 11 uses the fact that the auxiliary game Γv(x) has a unique Nash equilibrium for any vector v = (v1, …, vn) of nonnegative continuation payoffs vi such that vi(0) = 0. The uniqueness follows from page 1476 in Balbus and Nowak (2008) or can be deduced from the classical theorem of Rosen (1965) (see also Theorem 3.6 in Haurie et al. 2012). The game Γv(x) is not supermodular since for increasing continuation payoffs vi such that vi(0) = 0, we have \( \frac {\partial ^2 {U_{\beta }^i(v_i,x,a)}}{\partial {a_i}\partial {a_j}}<0\), for i ≠ j. A stronger version of Theorem 11 and related results can be found in Jaśkiewicz and Nowak (2015b).

Transition probabilities of the form presented in (S3) were first used in Balbus and Nowak (2004), who dealt with symmetric discounted stochastic games of resource extraction and proved that the sequence of Nash equilibrium payoffs in the n-stage games converges monotonically as n → ∞. Stochastic games of resource extraction without the symmetry condition were first examined by Amir (1996a), who considered so-called convex transitions. More precisely, he assumed that the conditional cumulative distribution function Q(y|z) is strictly convex with respect to z ∈ X for every fixed y > 0. He proved the existence of pure stationary Markov perfect equilibria that are Lipschitz continuous functions of the state variable. Although this result is strong, a careful analysis of various examples suggests that the convexity assumption imposed by Amir (1996a) is satisfied very rarely. Usually, the cumulative distribution Q(y|z) is neither convex nor concave with respect to z. A further discussion of this condition is provided in Remarks 7–8 in Jaśkiewicz and Nowak (2015b). The function Q(y|z) induced by the transition probability q of the form considered in (S3) is strictly concave in z = x − s(a) only when q0 is independent of x ∈ X. Such transition probabilities, which are "mixtures" of finitely many probability measures on X, were considered in Nowak (2003b) and Balbus and Nowak (2008). A survey of various game-theoretic approaches to resource extraction models can be found in Van Long (2011).

In many other examples, the one-shot game Γv(x) also has a nonempty compact set of pure Nash equilibria. Therefore, a counterpart of Theorem 6 can be formulated for the class of pure strategies of the players. We now describe some examples presented in Jaśkiewicz and Nowak (2016).

Example 1 (Dynamic Cournot oligopoly)

Let \(X=[0,\bar {x}]\), and let x ∈ X represent a realization of a random demand shock that is modified at each period of the game. Player i ∈ N (an oligopolist) sets a production quantity ai ∈ Ai(x) = [0, 1]. If \(P\left (x,\sum _{j=1}^n a_j\right )\) is the inverse demand function and ci(x, ai) is the cost function for player i, then

$$\displaystyle \begin{aligned} u_i(x,a):= a_i P\left(x,\sum_{j=1}^n a_j\right)- c_i(x,a_i), \quad a=(a_1,\ldots,a_n). \end{aligned}$$

A simple example of the inverse demand function is

$$\displaystyle \begin{aligned} P\left(x,\sum_{j=1}^n a_j\right)=x\left(n-\sum_{j=1}^n a_j\right). \end{aligned}$$

The function \(a_i \to a_iP\left (x,\sum _{j=1}^n a_j\right )\) is usually concave and ai → ci(x, ai) is often convex. Assume that

$$\displaystyle \begin{aligned} q(\cdot|x,a) = (1-\overline{a})q_1(\cdot|x) +\overline{a}q_2(\cdot|x), \quad \overline{a} :=\frac{1}{n}\sum_{j=1}^n a_j, \end{aligned}$$

where q1(⋅|x) and q2(⋅|x) are dominated by some probability measure μ on X for all x ∈ X. In order to provide an interpretation of q, we observe that

$$\displaystyle \begin{aligned} q(\cdot|x,a)= q_1(\cdot|x) + \overline{a}(q_2(\cdot|x)-q_1(\cdot|x)). \end{aligned} $$
(6.2)

Let

$$\displaystyle \begin{aligned} E_q(x,a):=\int_X yq(dy|x,a) \quad\mbox{and}\quad E_{q_{j}}(x):=\int_X yq_j(dy|x) \end{aligned}$$

be the mean values of the distributions q(⋅|x, a) and qj(⋅|x), respectively. By (6.2), we have \(E_q(x,a)= E_{q_{1}}(x) + \overline {a}( E_{q_{2}}(x)- E_{q_{1}}(x)).\) Assume that \(E_{q_{1}}(x) \ge x\ge E_{q_{2}}(x).\) This condition implies that

$$\displaystyle \begin{aligned} E_{q_{2}}(x)- E_{q_{1}}(x)\le 0.\end{aligned} $$

Thus, the expectation Eq(x, a) of the next demand shock decreases as the total sales \(n\overline {a}\) in the current state x ∈ X increase. Observe that the game Γv(x) is concave. From Nash (1950), it follows that the game Γv(x) has a pure equilibrium point. However, the set of Nash equilibria in Γv(x) may contain many points. A modification of the proof of Theorem 1 given in Jaśkiewicz and Nowak (2016) implies that this game has a pure stationary almost Markov perfect equilibrium.
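For concreteness, the following sketch computes a Nash equilibrium of the one-shot game Γv(x) of this example by best-response iteration, with \(P\left (x,\sum _{j=1}^n a_j\right )=x\left (n-\sum _{j=1}^n a_j\right )\) as above. The cost function and the continuation term (which, by (6.2), enters the payoff only through \(\overline {a}\)) are illustrative stand-ins, and best-response iteration need not converge in general, though it does for these primitives.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def cournot_br_iteration(x, n, cost, beta, cont, rounds=200):
    """Best-response iteration for Gamma_v(x) with payoffs
    a_i * x * (n - sum(a)) - cost(x, a_i) + beta * cont(x, mean(a))."""
    a = np.full(n, 0.5)
    for _ in range(rounds):
        for i in range(n):
            def neg(ai, i=i):
                s = a.sum() - a[i] + ai
                return -(ai * x * (n - s) - cost(x, ai)
                         + beta * cont(x, s / n))
            a[i] = minimize_scalar(neg, bounds=(0.0, 1.0),
                                   method="bounded").x
    return a

cost = lambda x, ai: 0.1 * ai ** 2     # convex cost (illustrative)
cont = lambda x, abar: -0.2 * abar     # continuation value falls with sales
print(cournot_br_iteration(x=0.8, n=3, cost=cost, beta=0.95, cont=cont))
```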

Example 2 (Cournot competition with substituting goods in differentiated markets)

This model is inspired by a dynamic game with complementary goods studied by Curtat (1996). Related static games were already discussed in Spence (1976) and Vives (1990). There are n firms on the market and firm i ∈ N produces a quantity ai ∈ Ai(x) = [0, 1] of a differentiated product. The inverse demand function is given by a twice differentiable function Pi(a), where a = (a1, …, an). The goods are substitutes, i.e., \(\frac {\partial P_i(a)}{\partial a_j}<0\) for all i, j ∈ N, see Spence (1976). In other words, increased consumption of one good reduces the willingness to pay for the others. We assume that X = [0, 1]n, where the i-th coordinate xi ∈ [0, 1] is a measure of the cumulative experience of firm i ∈ N. If ci(xi) is the marginal cost for firm i ∈ N, then

$$\displaystyle \begin{aligned} u_i(x,a):= a_i\left[ P_i\left( a\right)- c_i(x_i)\right], \quad a=(a_1,\ldots,a_n),\;\; x=(x_1,\ldots,x_n)\in X.\end{aligned} $$
(6.3)

The transition probability of the next state (experience vector) is of the form:

$$\displaystyle \begin{aligned} q(\cdot|x,a) = h \left(\sum_{j=1}^n (x_j+ a_j)\right)q_2(\cdot|x) + \left(1- h\left(\sum_{j=1}^n (x_j+ a_j)\right)\right)q_1(\cdot|x),\end{aligned} $$
(6.4)

where

$$\displaystyle \begin{aligned} h\left(\sum_{j=1}^n (x_j+a_j)\right)=\frac{\sum_{j=1}^n x_j+\sum_{j=1}^n a_j}{2n}\end{aligned} $$
(6.5)

and q1(⋅|x), q2(⋅|x) are dominated by some probability measure μ on X for all x ∈ X. In Curtat (1996) it is assumed that q1 and q2 are independent of x ∈ X and that q2 stochastically dominates q1. Then, the underlying Markov process governed by q captures the ideas of learning-by-doing and spillover (see page 197 in Curtat 1996). Here, this stochastic dominance condition can be dropped, although it is quite natural. It is easy to see that the game Γv(x) is concave if \(a_i\to u_i(x,(a_i,a_{-i}))\) is concave on [0, 1]. Clearly, this is satisfied, if for each i ∈ N, we have

$$\displaystyle \begin{aligned} 2\frac{\partial P_i(a)}{\partial a_i}+\frac{\partial^2 P_i(a)}{\partial a_i^2}a_i<0. \end{aligned}$$

If the goods are substitutes, this condition holds whenever \(\frac {\partial ^2 P_i(a)}{\partial a_i^2}\le 0\) for all i ∈ N. The game Γv(x) may have multiple pure Nash equilibria. Using the methods from Jaśkiewicz and Nowak (2016), one can show that any concave game discussed here has a pure stationary almost Markov perfect equilibrium.
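A simple linear specification satisfying all of the above requirements (our illustration) is

$$\displaystyle \begin{aligned} P_i(a)=2-a_i-\frac{1}{2}\sum_{j\neq i} a_j, \end{aligned}$$

for which \(\frac{\partial P_i(a)}{\partial a_j}=-\frac{1}{2}<0\) for j ≠ i (substitutes), \(\frac{\partial^2 P_i(a)}{\partial a_i^2}=0\) and \(2\frac{\partial P_i(a)}{\partial a_i}+\frac{\partial^2 P_i(a)}{\partial a_i^2}a_i=-2<0\).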

Supermodular static games were extensively studied by Milgrom and Roberts (1990) and Topkis (1998). This class of games finds applications in dynamic economic models with complementarities. Our next illustration refers to Example 2, but with products that are complements. The state space and action spaces for firms are the same as in Example 2. We endow both X and A = [0, 1]n with the usual component-wise ordering. Then, X and A are complete lattices. We assume that the transition probability is defined as in (6.4) and that q1(⋅|x) and q2(⋅|x) are dominated, for all x ∈ X, by some probability measure μ on X. The payoff function for every firm is given in (6.3).

Example 3 (Cournot oligopoly with complementary goods in differentiated markets)

Let h be defined as in (6.5). Suppose that the payoff function in the game Γv(x) satisfies the following condition:

$$\displaystyle \begin{aligned} \frac{\partial^2 U_{\beta}^i(v_i,x,a)}{\partial a_i\partial a_j}\ge 0\quad\mbox{for}\quad j\neq i. \end{aligned}$$

Then, by Theorem 4 in Milgrom and Roberts (1990), the game Γv(x) is supermodular. Note that within our framework, it is sufficient to prove that ui(x, a), defined in (6.3), satisfies \(\frac {\partial ^2 u_i(x,a)}{\partial a_i\partial a_j}\ge 0\) for j ≠ i. But

$$\displaystyle \begin{aligned} \frac{\partial^2 u_i(x,a)}{\partial a_i\partial a_j}=a_i\frac{\partial^2 P_i(a)}{\partial a_i\partial a_j}+ \frac{\partial P_i(a)}{\partial a_j},\quad j\neq i \end{aligned}$$

and \(\frac {\partial ^2 u_i(x,a)}{\partial a_i\partial a_j}\) is nonnegative if the goods are complements, i.e., \(\frac {\partial P_i(a)}{\partial a_j}\ge 0\) for j ≠ i (see Vives 1990), and the cross partials \(\frac{\partial^2 P_i(a)}{\partial a_i\partial a_j}\) are nonnegative. From Theorem 5 in Milgrom and Roberts (1990), it follows that the game Γv(x) has a pure Nash equilibrium. Therefore, the arguments used in Jaśkiewicz and Nowak (2016) imply that the stochastic game has a pure stationary almost Markov perfect equilibrium.
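For instance (again our illustration), the linear inverse demand

$$\displaystyle \begin{aligned} P_i(a)=1-a_i+\frac{1}{2}\sum_{j\neq i} a_j \end{aligned}$$

yields \(\frac{\partial P_i(a)}{\partial a_j}=\frac{1}{2}\ge 0\) and \(\frac{\partial^2 u_i(x,a)}{\partial a_i\partial a_j}=\frac{1}{2}\ge 0\) for j ≠ i, so the corresponding game Γv(x) is supermodular.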

Remark 7

The game described in Example 3 is also studied in Curtat (1996), but under the additional restrictive assumption that q1 and q2 are independent of x ∈ X. Then, the transition probability q has so-called increasing differences in (x, a). This fact implies that the functions \(U^i_{\beta }(v_i,\cdot ,\cdot )\) satisfy the assumptions in Proposition 7. Another condition imposed by Curtat (1996) states that the payoff functions ui(x, a) are increasing in ai and, more importantly, satisfy the so-called strict diagonal dominance condition for each x ∈ X. For details the reader is referred to Curtat (1996) and Rosen (1965). This additional condition entails the uniqueness of a pure Nash equilibrium in every auxiliary game Γv(x) under consideration; see Proposition 6. The advantage is that Curtat (1996) can directly work with Lipschitz continuous strategies for the players and find a stationary Markov perfect equilibrium in that class using Schauder’s fixed point theorem. Without the strict diagonal dominance condition, Γv(x) may have many pure Nash equilibria and Curtat’s approach cannot be applied. The coefficients of the convex combination in (6.4) are affine functions of a ∈ A. This requirement can be slightly generalized; see, for instance, Example 4 in Jaśkiewicz and Nowak (2016). If q1 or q2 depends on x ∈ X, then the increasing differences property of q does not hold and the method of Curtat (1996) does not work. Additional comments on supermodular stochastic games can be found in Amir (2003).

The result in Curtat (1996) on the existence of stationary Markov perfect equilibria for supermodular discounted stochastic games is based upon lattice theoretic arguments and on complementarity and monotonicity assumptions. The state and action spaces are assumed to be compact intervals in Euclidean space, and the transition probability is assumed to be norm continuous in the state and action variables. Moreover, the strict diagonal dominance condition (see (C1) in Sect. 2) applied to the auxiliary one-shot games Γv(x) for any increasing Lipschitz continuous continuation vector payoff v plays a crucial role. Namely, this assumption together with others implies that \(\mathcal {NP}_v(x)\) is a singleton. In addition, the function \(x\to \mathcal {NP}_v(x)\) is increasing and Lipschitz continuous. Thus, his proof consists of two steps. First, he shows that there exists an increasing Lipschitz continuous vector payoff function v∗ such that \(v^*(x)= \mathcal {NP}_{v^*}(x)\) for all x ∈ X. Second, he makes use of a theorem on the Lipschitz property of the unique equilibrium in \( \varGamma _{v^*}.\)

Horst (2005) provided a different and more unified approach to stationary Markov perfect equilibria that can be applied beyond the setting of supermodular games. Instead of imposing monotonicity conditions on the players’ utility functions, he considered stochastic games in which the interaction between different players is sufficiently weak. For instance, certain “production games” satisfy this property. The method of his proof is based on a selection theorem of Montrucchio (1987) and the Schauder fixed point theorem applied to the space of Lipschitz continuous strategy profiles of the players. The assumptions imposed by Horst (2005) are rather involved. For example, they may restrict the number of players in the game or impose an upper bound on the discount factor. Such limitations do not occur in the approach of Curtat (1996).

Balbus et al. (2014a) considered supermodular stochastic games with an absorbing state and the transition probabilities of the form q(⋅|x, a) = g(x, a)q0(⋅|x) + (1 − g(x, a))δ0(⋅). Under some strong monotonicity conditions on the utility functions and transitions, they showed that the Nash equilibrium payoffs in the n-stage games converge monotonically as n → ∞. This fact yields the existence of a pure stationary Markov perfect equilibrium. A related result is given in Balbus et al. (2013b) for a similar class of dynamic games. The state space X in Balbus et al. (2014a) is one-dimensional, and their results do not apply to the games of resource extraction discussed earlier. If, on the other hand, the transition probability is a “mixture” of finitely many probability measures, then a stationary Markov perfect equilibrium can be obtained, in certain models, by solving a system of non-linear equations. This method was discussed in Sect. 5 of Nowak (2007). The next example is not a supermodular game in the sense of Balbus et al. (2014a), but it belongs to the class of production games examined by Horst (2005). Generally, there are only a few examples of games with a continuum of states for which Nash equilibria can be given in closed form.

Example 4

Let X = [0, 1], Ai(x) = [0, 1] for all x ∈ X and i ∈ N = {1, 2}. We consider the symmetric game where the stage utility of player i is

$$\displaystyle \begin{aligned} u_i(x,a_1,a_2)= a_1+a_2+ 2xa_1a_2- a_i^2. \end{aligned} $$
(6.6)

The state variable x in (6.6) is a complementarity coefficient of the players’ actions. The transition probabilities are of the form

$$\displaystyle \begin{aligned} q(\cdot|x,a_1,a_2):= \frac{x+a_1+a_2}{3}\mu_1(\cdot) + \frac{3-x-a_1-a_2}{3}\mu_2(\cdot). \end{aligned}$$

We assume that μ1 has the density ρ1(y) = 2y and μ2 has the density ρ2(y) = 2 − 2y, y ∈ X. Note that μ1 stochastically dominates μ2. From the definition of q, it follows that higher states x ∈ X or higher actions a1, a2 (efforts) of the players induce a distribution of the next state with a higher mean value. Assume that \(v^*=(v_1^*,v_2^*)\) is an equilibrium payoff vector in the β-discounted stochastic game. As shown in Example 1 of Nowak (2007), it is possible to construct a system of non-linear equations with unknowns z1 and z2, whose solution \(z_1^*\), \(z_2^*\) satisfies \(z_i^*=\int _Xv^*_i(y)\mu _i(dy)\). This fact, in turn, makes it possible to find a symmetric stationary Markov perfect equilibrium \((f_1^*,f_2^*)\) with \(v_1^*=v_2^*\). It is of the form \(f^*_i(x)= \frac {1+z^*}{4-2x}\), \(x\in X\), \(i\in N\), where

$$\displaystyle \begin{aligned} z^* = \frac{-8-6p(\beta-1)-\sqrt{(8+6p(\beta-1))^2-36}}{6} \ \ \mbox{and}\ \ p=\frac{9+2\beta\ln 2 -2\beta}{\beta(1-\beta)(6\ln 2-3 )}. \end{aligned}$$

Moreover, we have

$$\displaystyle \begin{aligned} v_i^*(x)= (p\beta+x)z^* +\frac{(1+z^*)^2(3-x)}{2(2-x)^2}. \end{aligned}$$
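These closed-form expressions are straightforward to evaluate numerically. The following minimal sketch (in Python; the discount factor β = 0.9 is an arbitrary choice of ours) tabulates the equilibrium strategy and payoff at a few states:

```python
import numpy as np

beta = 0.9                                   # sample discount factor
p = (9 + 2 * beta * np.log(2) - 2 * beta) / (
    beta * (1 - beta) * (6 * np.log(2) - 3))
z = (-8 - 6 * p * (beta - 1)
     - np.sqrt((8 + 6 * p * (beta - 1)) ** 2 - 36)) / 6

def f_star(x):
    """Equilibrium effort f_i*(x) = (1 + z*) / (4 - 2x)."""
    return (1 + z) / (4 - 2 * x)

def v_star(x):
    """Equilibrium payoff v_i*(x) from the displayed formula."""
    return (p * beta + x) * z + (1 + z) ** 2 * (3 - x) / (2 * (2 - x) ** 2)

for x in (0.0, 0.5, 1.0):
    # For this beta, f_star(x) remains inside A_i(x) = [0, 1].
    print(x, f_star(x), v_star(x))
```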

Ericson and Pakes (1995) provided a model of firm and industry dynamics that allows for entry, exit and uncertainty generating variability in the fortunes of firms. They considered the ergodicity of the stochastic process resulting from a Markov perfect industry equilibrium. A dynamic competition in an oligopolistic industry with investment, entry and exit was also extensively studied by Doraszelski and Satterthwaite (2010). Computational methods for the class of games studied by Ericson and Pakes (1995) are presented in Doraszelski and Pakes (2007). Further applications of discounted stochastic games with countably many states to models in industrial organization including models of industry dynamics are given in Escobar (2013).

Shubik and Whitt (1973) considered a non-stochastic model of a sequential strategic market game, where the state includes the current stocks of capital. At each period of the game, one unit of a consumer good is put up for sale, and the players bid some amounts of fiat money for it. A stochastic counterpart of this game was first presented in Secchi and Sudderth (2005). Wiecek (2009), on the other hand, obtained a general structure of equilibrium policies in two-person games, where bids gradually decrease as the discount factor increases. Moreover, Wiecek (2012) proved that a Nash equilibrium in which all the players use “aggressive strategies” emerges in the game for any value of the discount factor, provided that the number of players is sufficiently large. This fact corresponds to a similar result for a deterministic economy given in Shubik and Whitt (1973) and is consistent with existing results on economies with a continuum of players. Other applications of nonzero-sum stochastic games to economic models can be found in Duggan (2012) and He and Sun (2017). Although the concept of mean field equilibrium in dynamic games is not directly inspired by Nash, the influence of the theory of non-cooperative stochastic games on this area of research is obvious. Also the notion of supermodularity is used in studying mean field equilibria in dynamic games. The reader is referred to Adlakha and Johari (2013), where some applications to computer science and operations research are given.

7 Special Classes of Stochastic Games with Countably Many States

Assume that the state space X is countable. Then every \(F^0_i\) can be recognized as a compact convex subset of a linear topological space. A sequence \((f_i^k)_{k\in \mathbb {N}}\) converges to \(f_i\in F^0_i\) if, for every \(x\in X\), \(f_i^k(\cdot |x) \to f_i(\cdot |x)\) in the weak-star topology on the space of probability measures on Ai(x). The weak or weak-star convergence of probability measures on metric spaces is fully described in Aliprantis and Border (2006) or Billingsley (1968). Since X is countable, every space \(F^0_i \) is sequentially compact (it suffices to use the standard diagonal method for selecting convergent subsequences) and, therefore, F0 is sequentially compact when endowed with the product topology. If X and the sets of actions are finite, then F0 can actually be viewed as a convex compact subset of Euclidean space. In the finite state space case, it is easy to prove that the discounted payoffs \(J_{\beta }^i(x,f)\) are continuous on F0. If X is countable, the payoff functions are uniformly bounded, and q(y|x, a) is continuous in a ∈ A(x) for all x, y ∈ X, then showing the continuity of \(J_{\beta }^i(x,f)\) on F0 requires a little more work; see Federgruen (1978). From the Bellman equation in discounted dynamic programming (see Puterman 1994), it follows that \(f^*=(f^*_1,\ldots ,f^*_n)\) is a stationary Markov perfect equilibrium in the discounted stochastic game if and only if there exist bounded functions \(v_i^*:X\to \mathbb {R}\) such that for each x ∈ X and i ∈ N we have

$$\displaystyle \begin{aligned} v^*_i(x)= \max_{\nu_i \in \Pr (A_i(x))} U^i_{\beta}(v^*_i,x,(\nu_i,f^*_{-i}))=U^i_{\beta}(v^*_i,x,f^*). \end{aligned} $$
(6.7)

From (6.7), it follows that \(v^*_i(x)=J_{\beta }^i(x,f^*)\). Using the continuity of the expected discounted payoffs in f ∈ F0 and (6.7), one can define the best response correspondence in the space of strategies, show its upper semicontinuity and conclude from the fixed point theorem due to Glicksberg (1952) (or due to Kakutani (1941) in the case of finite state and action spaces) that the game has a stationary Markov perfect equilibrium f∗ ∈ F0. This fact was proved for finite state space discounted stochastic games by Fink (1964) and Takahashi (1964). An extension to games with countable state spaces was reported in Parthasarathy (1973) and Federgruen (1978). Some results for a class of discounted games with discontinuous payoff functions can be found in Nowak and Wiȩcek (2007).
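Condition (6.7) also provides a simple numerical test. The sketch below (in Python, specialized to a two-player game with finitely many states and actions; the array encoding of u, q and the stationary strategies is our own) computes \(J^i_{\beta}(\cdot,f)\) by solving a linear system and then checks that no one-stage deviation is profitable:

```python
import numpy as np

def check_stationary_equilibrium(u, q, f1, f2, beta, tol=1e-8):
    """Check condition (6.7) for a two-player discounted game.

    u[i][x, a, b] -- stage payoff of player i,
    q[x, a, b, y] -- transition probability,
    f1[x, :], f2[x, :] -- mixed actions of the stationary strategies.
    """
    S = q.shape[0]
    # Stage payoffs and transitions under the profile f = (f1, f2).
    u_f = [np.einsum('xab,xa,xb->x', u[i], f1, f2) for i in range(2)]
    q_f = np.einsum('xaby,xa,xb->xy', q, f1, f2)
    # v_i solves v_i = u_i(f) + beta * Q_f v_i, i.e. v_i(x) = J_beta^i(x, f).
    v = [np.linalg.solve(np.eye(S) - beta * q_f, u_f[i]) for i in range(2)]
    # Values of all one-stage deviations to pure actions; (6.7) holds
    # iff no pure (hence no mixed) deviation is profitable.
    d1 = np.einsum('xab,xb->xa', u[0], f2) \
        + beta * np.einsum('xaby,xb,y->xa', q, f2, v[0])
    d2 = np.einsum('xab,xa->xb', u[1], f1) \
        + beta * np.einsum('xaby,xa,y->xb', q, f1, v[1])
    return bool(np.all(d1.max(axis=1) <= v[0] + tol) and
                np.all(d2.max(axis=1) <= v[1] + tol))
```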

The fundamental results in the theory of regular Nash equilibria in normal form games concerning genericity (see Harsanyi 1973a) and purification (see Harsanyi 1973b) were extended to dynamic games by Doraszelski and Escobar (2010). A discounted stochastic game possessing equilibria that are all regular in the sense of Doraszelski and Escobar (2010) has a compact equilibrium set that consists of isolated points. Hence, it follows that the equilibrium set is finite. They proved that the set of discounted stochastic games (with finite sets of states and actions) having Markov perfect equilibria that are all regular is open and has full Lebesgue measure. Related results were given by Haller and Lagunoff (2000), but their definition of regular equilibrium is different, and regular equilibria in their sense may not be purifiable.

The payoff function for player i ∈ N in the limit-average stochastic game can be defined as follows:

$$\displaystyle \begin{aligned} \bar{J}^i(x,\pi):= \liminf_{T\to\infty} E_x^\pi\left(\frac{1}{T}\sum_{t=1}^T u_i(x_t,a^t)\right),\quad x\in X,\ \pi\in \varPi.\end{aligned} $$

The equilibrium solutions for this class of games are defined similarly as in the discounted case. The existence of stationary Markov perfect equilibria for games with finite state and action spaces and the limit-average payoffs was proved independently by Rogers (1969) and Sobel (1971). They assumed that the Markov chain induced by any stationary strategy profile and the transition probability q is irreducible. It is shown under the irreducibility condition that the equilibrium payoffs \(w^*_i\) of the players are independent of the initial state. Moreover, it is proved that there exists a sequence of equilibria \((f^k)_{k\in \mathbb {N}}\) in βk-discounted games (with βk → 1 as k → ∞) such that \(w_i^*= \lim _{k\to \infty }(1-\beta _k)J^i_{\beta _k}(x,f^k)\). Later, Federgruen (1978) extended these results to limit-average stochastic games with countably many states satisfying some uniform ergodicity conditions. Other cases of similar type were mentioned by Nowak (2003a). Below we provide a result due to Altman et al. (1997), which has some potential for applications in queueing models. The stage payoffs in their approach may be unbounded. We start with a formulation of their assumptions.

Let m : X → [1, ∞) be a function for which the following conditions hold.

  1. (A4)

    For each x, y ∈ X, i ∈ N, the functions ui(x, ⋅) and q(y|x, ⋅) are continuous on A(x). Moreover,

    $$\displaystyle \begin{aligned} &\sup_{x\in X}\max_{a\in A(x)}|u_i(x,a)|/m(x) <\infty \quad\mbox{and}\\ &\quad \quad \lim_{k\to\infty}\sum_{y\in X}|q(y|x,a^k)-q(y|x,a)|m(y)=0 \end{aligned} $$

    for any ak → a ∈ A(x).

  2. (A5)

    There exist a finite set Y ⊂ X and γ ∈ (0, 1) such that

    $$\displaystyle \begin{aligned} \sum_{y\in X\setminus Y}q(y|x,a)m(y) \le \gamma m(x)\quad \mbox{for all} \quad x\in X,\ a\in A(x). \end{aligned}$$
  3. (A6)

    The function f → n(f) is continuous with respect to stationary strategy profiles f ∈ F0, where n(f) denotes the number of closed classes in the Markov chain induced by the transition probability q(y|x, f), x, y ∈ X.

Property (A5) is called m-uniform geometric recurrence; see Altman et al. (1997). Condition (A6) is quite restrictive and implies that the number of positive recurrent classes is a constant function of the stationary strategies. If the Markov chains resulting from the stationary policies are all unichain, the limit-average payoff functions are constant, i.e., independent of the initial state. For a detailed discussion, we refer the reader to Altman et al. (1997) and the references cited therein.

Theorem 12

If conditions (A4)–(A6) are satisfied, then the limit-average payoff n-person stochastic game has a stationary Markov perfect equilibrium.

The above result follows from Theorem 2.6 in Altman et al. (1997), where it is also shown that under (A4)–(A6) any limit of stationary Markov equilibria in β-discounted games (as β → 1) is an equilibrium in the limit-average game. A related result was established by Borkar and Ghosh (1993) under a stochastic stability condition. More precisely, they assumed that the Markov chain induced by any stationary strategy profile is unichain and the transition probability from any fixed state has a finite support.

Stochastic games with countably many states are usually studied under some recurrence or ergodicity conditions. Without these conditions n-person nonzero-sum limit-average payoff stochastic games with countable state spaces are very difficult to deal with. Nevertheless, the results obtained in the literature have some interesting applications, especially to queueing systems; see, for example, Altman (1996) and Altman et al. (1997).

Now assume that X is a Borel space and μ is a probability measure on X. Consider an n-person discounted stochastic game G, where Ai(x) = Ai for all i ∈ N and x ∈ X, and the payoff functions are uniformly bounded and continuous in actions.

  1. (A7)

    The transition probability q has a conditional density function ρ, which is continuous in actions and such that

    $$\displaystyle \begin{aligned} \int_X\max_{a\in A}\rho(x,a,y)\mu(dy) <\infty. \end{aligned}$$

Let C(A) be the Banach space of all real-valued continuous functions on the compact space A endowed with the supremum norm ∥⋅∥. By L1(X, C(A)) we denote the Banach space of all C(A)-valued measurable functions ϕ on X such that \(\|\phi\|_1 :=\int_X\|\phi(y)\|\mu(dy) < \infty\). Let \(\{X_k\}_{k\in \mathbb {N}_0}\) be a measurable partition of the state space (\(\mathbb {N}_0\subset \mathbb {N}\)), \(\{u_{i,k}\}_{k\in \mathbb {N}_0}\) be a family of functions ui,k ∈ C(A) and \( \{\rho _k\}_{k\in \mathbb {N}_0}\) be a family of functions ρk ∈ L1(X, C(A)) such that ρk(x)(a, y) ≥ 0 and ∫Xρk(x)(a, y)μ(dy) = 1 for each \(k\in \mathbb {N}_0,\ a\in A\). Consider a game \( \tilde {G}\) where the payoff function for player i is \(\tilde {u} _i(x,a)=u_{i,k}(a)\) if x ∈ Xk. The transition density is \(\tilde {\rho } (x,a,y)=\rho _k(x)(a,y)\) if x ∈ Xk. Let \(\tilde {F}^0_i\) be the set of all \(f_i\in F^0_i\) that are constant on every set Xk. Let \(\tilde {F}^0:= \tilde {F}_1^0\times \cdots \times \tilde {F}_n^0\). The game \(\tilde {G}\) resembles a game with countably many states, and if the payoff functions \( \tilde {u}_i\) are uniformly bounded, then \(\tilde G\) with the discounted payoff criterion has an equilibrium in \(\tilde {F}^0\). Denote by \(\tilde {J} ^i_{\beta }(x,\pi )\) the discounted expected payoff to player i ∈ N in the game \(\tilde G\). It is well known that C(A) is separable. The Banach space L1(X, C(A)) is also separable. Note that x → ui(x, ⋅) is a measurable mapping from X to C(A). By (A7), the mapping x → ρ(x, ⋅, ⋅) from X to L1(X, C(A)) is also measurable. Using these facts, Nowak (1985) proved the following result.

Theorem 13

Assume that G satisfies (A7). For any 𝜖 > 0, there exists a game \(\tilde G\) such that \(|J^i_{\beta }(x,\pi )-\tilde {J}^i_{\beta }(x,\pi )|< \epsilon /2\) for all x ∈ X, i ∈ N and π ∈ Π. Moreover, the game G has a stationary Markov 𝜖-equilibrium.

A related result on the approximation of discounted nonzero-sum games and the existence of 𝜖-equilibria was given by Whitt (1980), who used stronger uniform continuity conditions and a different technique. Approximations of discounted and also limit-average stochastic games with general state spaces and unbounded stage payoffs were studied in Nowak and Altman (2002). They used the weighted norm approach and imposed some geometric ergodicity conditions while examining the limit-average case. An extension with a simpler and more transparent proof for semi-Markov games satisfying a geometric drift condition and a majorization property, similar to (GE1)–(GE3) in Sect. 5 in Jaśkiewicz and Nowak (2018a), was given in Jaśkiewicz and Nowak (2006).

8 Algorithms for Nonzero-Sum Stochastic Games

In this section, we assume that the state space X and the sets of actions Ai are finite. In the 2-player case, we let for notational convenience A1(x) = A1, A2(x) = A2 and a = a1 ∈ A1, b = a2 ∈ A2. Further, for any \(f_i \in F^0_i\), i = 1, 2, we set

$$\displaystyle \begin{aligned} q(y|x,f_1,f_2)&:= \sum_{a\in A_1}\sum_{b\in A_2} q(y|x,a,b)f_1(a|x)f_2(b|x),\\ q(y|x,f_1,b)&:= \sum_{a\in A_1} q(y|x,a,b)f_1(a|x),\\ u_i(x,f_1,f_2)&:= \sum_{a\in A_1}\sum_{ b\in A_2} u_i(x,a,b)f_1(a|x)f_2(b|x),\\ u_i(x,f_1,b)&:= \sum_{a\in A_1} u_i(x,a,b)f_1(a|x). \end{aligned} $$

In a similar way, we define q(y|x, a, f2) and ui(x, a, f2). Note that every set \(F^0_i\) can be recognized as a compact convex subset of Euclidean space. Also, every function \(\phi :X\to \mathbb {R}\) can be viewed as a vector in Euclidean space. Below we describe two results of Filar et al. (1991) on the characterization of stationary equilibria in stochastic games by constrained nonlinear programming. However, due to the fact that the constraint sets are not convex, the results are not straightforward to implement numerically. Although it is common in mathematical programming to use matrix notation, we follow the notation introduced in previous sections.

Let c = (v1, v2, f1, f2). Consider the following problem (OPβ):

$$\displaystyle \begin{aligned} \min \ O_1(c):= \sum_{i=1}^2\sum_{x\in X} \left( v_i(x)-u_i(x,f_1,f_2) -\beta\sum_{y\in X}v_i(y)q(y|x,f_1,f_2)\right) \end{aligned}$$

subject to \((f_1,f_2)\in F^0_1\times F^0_2\) and

$$\displaystyle \begin{aligned} u_1(x,a,f_2)+\beta\sum_{y\in X}v_1(y)q(y|x,a,f_2)\le v_1(x),\ \mbox{for all} \ x\in X,\ a\in A_1, \end{aligned}$$

and

$$\displaystyle \begin{aligned} u_2(x,f_1,b)+\beta\sum_{y\in X}v_2(y)q(y|x,f_1,b)\le v_2(x),\ \mbox{for all} \ x\in X,\ b\in A_2. \end{aligned}$$

Theorem 14

Consider a feasible point \(c^*=(v_1^*,v_2^*,f_1^*,f_2^*)\) in (OPβ). Then \((f_1^*,f_2^*)\in F_1^0\times F^0_2\) is a stationary Nash equilibrium in the discounted stochastic game if and only if \(c^*\) is a solution to problem (OPβ) with \(O_1(c^*) = 0\).
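Note that the constraints force every summand in O1(c) to be nonnegative (the pure-action inequalities extend to mixed actions by averaging), so O1 ≥ 0 on the feasible set, and stationary equilibria are exactly the feasible points with value 0. The following toy illustration (in Python, with randomly generated data and our own array encoding) hands (OPβ) to a generic nonlinear programming routine; since the constraint set is not convex, a local solver may stall at a positive value, and restarts may be needed:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
S, A1, A2, beta = 2, 2, 2, 0.6
u = rng.uniform(size=(2, S, A1, A2))      # u[i][x, a, b], values in [0, 1]
q = rng.uniform(size=(S, A1, A2, S))      # q[x, a, b, y]
q /= q.sum(axis=3, keepdims=True)

def unpack(c):
    v1, v2 = c[:S], c[S:2 * S]
    f1 = c[2 * S:2 * S + S * A1].reshape(S, A1)
    f2 = c[2 * S + S * A1:].reshape(S, A2)
    return v1, v2, f1, f2

def O1(c):                                # objective of (OP_beta)
    v1, v2, f1, f2 = unpack(c)
    qf = np.einsum('xaby,xa,xb->xy', q, f1, f2)
    t1 = v1 - np.einsum('xab,xa,xb->x', u[0], f1, f2) - beta * qf @ v1
    t2 = v2 - np.einsum('xab,xa,xb->x', u[1], f1, f2) - beta * qf @ v2
    return t1.sum() + t2.sum()

def bellman(c):                           # the constraints, as g(c) >= 0
    v1, v2, f1, f2 = unpack(c)
    g1 = v1[:, None] - np.einsum('xab,xb->xa', u[0], f2) \
        - beta * np.einsum('xaby,xb,y->xa', q, f2, v1)
    g2 = v2[:, None] - np.einsum('xab,xa->xb', u[1], f1) \
        - beta * np.einsum('xaby,xa,y->xb', q, f1, v2)
    return np.concatenate([g1.ravel(), g2.ravel()])

def simplex(c):                           # each f_i(.|x) sums to one
    _, _, f1, f2 = unpack(c)
    return np.concatenate([f1.sum(axis=1) - 1.0, f2.sum(axis=1) - 1.0])

c0 = np.concatenate([np.full(2 * S, 1.0 / (1 - beta)),   # feasible start
                     np.full(S * A1, 1.0 / A1), np.full(S * A2, 1.0 / A2)])
bounds = [(None, None)] * (2 * S) + [(0.0, 1.0)] * (S * A1 + S * A2)
res = minimize(O1, c0, method='SLSQP', bounds=bounds,
               constraints=[{'type': 'ineq', 'fun': bellman},
                            {'type': 'eq', 'fun': simplex}])
print(res.fun)    # close to 0 at a stationary equilibrium (Theorem 14)
```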

Let c = (z1, v1, w1, f2, z2, v2, w2, f1). Now consider the following problem (OPa):

$$\displaystyle \begin{aligned} \min\ O_2(c):= \sum_{i=1}^2\sum_{x\in X} \left( v_i(x)-\sum_{y\in X}v_i(y)q(y|x,f_1,f_2)\right) \end{aligned}$$

subject to \((f_1,f_2)\in F^0_1\times F^0_2\) and

$$\displaystyle \begin{aligned} &\sum_{y\in X} v_1(y)q(y|x,a,f_2)\le v_1(x),\\ & u_1(x,a,f_2)+\sum_{y\in X} z_1(y)q(y|x,a,f_2)\le v_1(x)+z_1(x) \end{aligned} $$

for all x ∈ X, a ∈ A1 and

$$\displaystyle \begin{aligned} &\sum_{y\in X} v_2(y)q(y|x,f_1,b)\le v_2(x), \\ &u_2(x,f_1,b)+\sum_{y\in X} z_2(y)q(y|x,f_1,b)\le v_2(x)+z_2(x) \end{aligned} $$

for all x ∈ X, b ∈ A2 and

$$\displaystyle \begin{aligned} u_i(x,f_1,f_2)+\sum_{y\in X}w_i(y)q(y|x,f_1,f_2)= v_i(x)+w_i(x) \end{aligned}$$

for all x ∈ X and i = 1, 2.

Theorem 15

Consider a feasible point \(c^*=(z^*_1,v_1^*,w_1^*,f_2^*,z^*_2,v_2^*,w_2^*,f_1^*)\) in (OPa). Then \((f_1^*,f_2^*)\in F_1^0\times F^0_2\) is a stationary Nash equilibrium in the limit-average payoff stochastic game if and only if \(c^*\) is a solution to problem (OPa) with \(O_2(c^*) = 0\).

Theorems 14 and 15 were stated and proved in Filar et al. (1991); see also Theorems 3.8.2 and 3.8.4 in Filar and Vrieze (1997).

As in the zero-sum case, when one player controls the transitions, it is possible to construct finite-step algorithms for computing Nash equilibria. The linear complementarity problem (LCP) is defined as follows. Given a square matrix \(\mathbb {M}\) of order m and a (column) vector \( \overline {Q}\in \mathbb {R}^m\), find two vectors \(\overline {Z} =[z_1,\ldots ,z_m]^T \in \mathbb {R}^m\) and \(\overline {W}=[w_1,\ldots ,w_m]^T\in \mathbb {R}^m\) such that

$$\displaystyle \begin{aligned} \mathbb{M}\overline{Z}+\overline{Q}=\overline{W}\quad\mbox{and}\quad w_j\ge0,\ z_j\ge 0, \ z_jw_j=0\quad\mbox{for all}\quad j=1,\ldots,m. \end{aligned}$$

Lemke (1965) proposed some pivoting finite-step algorithms to solve the LCP for a large class of matrices \(\mathbb {M}\) and vectors \(\overline {Q}\). Further research on the LCP can be found in Cottle et al. (1992).
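For small m, solutions can also be found by brute force: guess the support of \(\overline{Z}\), solve the resulting linear system and verify the sign conditions. The naive sketch below (in Python; an enumeration over all 2^m supports, not Lemke's pivoting method) merely illustrates the definition:

```python
from itertools import combinations
import numpy as np

def solve_lcp(M, Q, tol=1e-10):
    """Naive LCP solver enumerating the possible supports of Z.

    On the guessed support we force w_j = 0, off it z_j = 0, so the
    complementarity z_j * w_j = 0 holds by construction; only the
    sign conditions Z >= 0, W >= 0 remain to be checked.
    """
    M, Q = np.asarray(M, float), np.asarray(Q, float)
    m = len(Q)
    for k in range(m + 1):
        for supp in map(list, combinations(range(m), k)):
            Z = np.zeros(m)
            if supp:
                try:
                    Z[supp] = np.linalg.solve(M[np.ix_(supp, supp)], -Q[supp])
                except np.linalg.LinAlgError:
                    continue        # singular subsystem: no such solution
            W = M @ Z + Q
            if (Z >= -tol).all() and (W >= -tol).all():
                return Z, W
    return None                     # no solution found
```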

Finding a Nash equilibrium in any bimatrix game \((\mathbb {A},\mathbb {B})\) (where, without loss of generality, all entries of \(\mathbb {A}\) and \(\mathbb {B}\) may be assumed positive) is equivalent to solving the LCP with

$$\displaystyle \begin{aligned} \mathbb{M} =\left[ \begin{array}{cc} \mathbb{O} & \mathbb{A} \\ \mathbb{B}^T & \mathbb{O} \end{array} \right] \ \mbox{where}\ \mathbb{O} \ \mbox{ is the matrix with zero entries,}\ \overline{Q}=[-1,\ldots,-1]^T. \end{aligned}$$

A finite-step algorithm for this LCP was given by Lemke and Howson (1964). If \( \overline {Z}^*= [\overline {Z}_1^*,\overline {Z}^*_2]\) is a part of the solution of the above LCP, then the normalization of \(\overline {Z}^*_i\) is an equilibrium strategy for player i.
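As an illustration, the snippet below (reusing the hypothetical solve_lcp from the preceding sketch) recovers the unique, fully mixed equilibrium of a small matching-pennies-like bimatrix game with positive payoffs:

```python
import numpy as np

A = np.array([[3., 1.], [1., 3.]])       # payoff matrix of player 1
B = np.array([[1., 3.], [3., 1.]])       # payoff matrix of player 2
m1, m2 = A.shape
M = np.block([[np.zeros((m1, m1)), A],
              [B.T, np.zeros((m2, m2))]])
Q = -np.ones(m1 + m2)
Z, W = solve_lcp(M, Q)
x = Z[:m1] / Z[:m1].sum()                # normalize the two parts of Z
y = Z[m1:] / Z[m1:].sum()
print(x, y)                              # -> [0.5 0.5] [0.5 0.5]
```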

Suppose that only player 2 controls the transitions in a discounted stochastic game, i.e., q(y|x, a, b) is independent of a ∈ A1. Let \(\{f_1,\ldots ,f_{m_1}\}\) and \(\{g_1,\ldots ,g_{m_2}\}\) be the families of all pure stationary strategies for players 1 and 2, respectively. Consider the bimatrix game \((\mathbb {A},\mathbb {B})\), where the entries aij of \(\mathbb {A}\) and bij of \(\mathbb {B}\) are

$$\displaystyle \begin{aligned} a_{ij}:= \sum_{x\in X} u_1(x,f_i(x),g_j(x)) \quad\mbox{and}\quad b_{ij}:= \sum_{x\in X} J_{\beta}^2(x,f_i,g_j). \end{aligned}$$

Then, making use of the Lemke-Howson algorithm, Nowak and Raghavan (1993) proved the following result.

Theorem 16

Let \(\xi ^* =(\xi ^*_1,\ldots ,\xi ^*_{m_1})\) and \(\zeta ^*=(\zeta ^*_1,\ldots ,\zeta ^*_{m_2})\) and assume that (ξ∗, ζ∗) is a Nash equilibrium in the bimatrix game \((\mathbb {A},\mathbb {B})\) defined above. Then the stationary strategies

$$\displaystyle \begin{aligned} f^*(x)=\sum_{j=1}^{m_1}\xi^*_j\delta_{f_j(x)}\quad\mathit{\mbox{and}}\quad g^*(x)=\sum_{j=1}^{m_2}\zeta^*_j\delta_{g_j(x)} \end{aligned}$$

form a Nash equilibrium in the discounted stochastic game.

It should be noted that a similar result does not hold for stochastic games with the limit-average payoffs. Note that the entries of the matrix \( \mathbb {B}\) can be computed in finitely many steps, but the order of the associated LCP is very high. Therefore, a natural question arises as to whether the single-controller stochastic game can be solved with the help of an LCP formulation with an appropriately defined matrix \(\mathbb {M}\) (of lower dimension) and vector \(\overline {Q}\). Since the payoffs and transitions depend on the states, and stationary equilibria are characterized by systems of Bellman equations, the dimension of the LCP must remain fairly high. However, it should be essentially smaller than in the case of Theorem 16. Such an LCP formulation for discounted single-controller stochastic games was given by Mohan et al. (1997) and further developed in Mohan et al. (2001). In the case of the limit-average payoff and single-controller stochastic game, Raghavan and Syed (2002) provided an analogous algorithm. Further studies on specific classes of stochastic games (acyclic 3-person switching control games, polystochastic games) can be found in Krishnamurthy et al. (2012).

Let us recall that a Nash equilibrium in an n-person game is a fixed point of some mapping. A fixed point theorem on certain deformations of continuous mappings proved by Browder (1960) turned out to be basic for developing the so-called homotopy methods for computing equilibria in nonzero-sum games. It reads as follows.

Theorem 17

Assume that \(C\subset \mathbb {R}^d\) is a nonempty compact convex set. Let Ψ : [0, 1] × C → C be a continuous mapping and F(Ψ) := {(t, c) ∈ [0, 1] × C : c = Ψ(t, c)}. Then F(Ψ) contains a connected subset Fc(Ψ) such that Fc(Ψ) ∩ ({0}× C) ≠ ∅ and Fc(Ψ) ∩ ({1}× C) ≠ ∅.

This result was extended to upper semicontinuous correspondences by Mas-Colell (1974). Consider an n-person game and assume that Ψ1 is a continuous mapping whose fixed points in the set C of strategy profiles correspond to Nash equilibria of this game. The basic idea of the homotopy methods is to define a “deformation” Ψ of Ψ1 such that Ψ(1, c) = Ψ1(c) for all c ∈ C and such that Ψ(0, ⋅) has a unique fixed point, say \(c_0^*\), that is relatively simple to find. By Theorem 17, Fc(Ψ) is a connected set. Thus, \(c_0^*\) is connected via Fc(Ψ) with a fixed point \(c_1^*\) of Ψ1. Hence, the idea is to follow the connected set Fc(Ψ). Since the dimension of the domain of Ψ is one higher than the dimension of its range, one can formulate regularity conditions under which the approximation path is a compact, piecewise differentiable one-dimensional manifold, i.e., a finite collection of arcs and loops. In the case of bimatrix games, a nondegeneracy condition is sufficient to guarantee that the aforementioned properties are satisfied. A comprehensive discussion of homotopy algorithms applied to n-person games is provided by Herings and Peeters (2010) and the references cited therein. According to the authors, “advantages of homotopy algorithms include their numerical stability, their ability to locate multiple solutions, and the insight they provide in the properties of solutions”. Various examples show that the implementation of homotopy methods is rather straightforward with the aid of available professional software. It is worth recalling the known fact that the Lemke-Howson algorithm can be applied to bimatrix games only. The problem of finding Nash equilibria in concave n-person games leads to a nonlinear complementarity problem. Therefore, one can only expect to obtain approximate equilibria by different numerical methods.
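To give the flavor of such a deformation, the toy sketch below (in Python; a logit-smoothing continuation of our own devising, not one of the professional algorithms discussed above) starts from the unique fixed point of the fully smoothed map and follows it as the smoothing is gradually removed:

```python
import numpy as np

A = np.array([[2., 0.], [0., 1.]])       # a symmetric 2x2 coordination game
x, y = np.full(2, 0.5), np.full(2, 0.5)  # unique fixed point at full smoothing
for lam in np.linspace(0.0, 30.0, 301):  # lam = 0: constant map; large lam:
    for _ in range(100):                 # approximate best responses
        bx = np.exp(lam * (A @ y)); by = np.exp(lam * (A.T @ x))
        x = 0.9 * x + 0.1 * bx / bx.sum()    # damped fixed-point step
        y = 0.9 * y + 0.1 * by / by.sum()
print(np.round(x, 3), np.round(y, 3))    # -> [1. 0.] [1. 0.], a pure equilibrium
```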

The homotopy methods, as noted by Herings and Peeters (2004), are also useful in the study of stationary equilibria, their structure and computation in nonzero-sum stochastic games. Their results can be applied to n-person discounted stochastic games with finite state and action spaces.

Recently, Govindan and Wilson (2003) proposed a new algorithm to compute Nash equilibria in finite games. Their algorithm combines the global Newton method (see Smale 1976) with a homotopy method for finding fixed points of continuous mappings developed by Eaves (1972, 1984). In the construction of a Nash equilibrium, a fundamental topological property of the graph of the Nash equilibrium correspondence discovered by Kohlberg and Mertens (1986) plays an important role. More precisely, the authors show that, making use of the global Newton method, it is possible to trace the path of the homotopy by a dynamical system. The same method can be applied to construct an algorithm for n-person discounted stochastic games with finite action and state sets; see Govindan and Wilson (2009). Strategic n-person games with a potential function having pure Nash equilibria were considered by Monderer and Shapley (1996). Potters et al. (2009), on the other hand, examined certain classes of discounted stochastic games via the potential function approach and constructed pure stationary Nash equilibria by solving a finite number of finite strategic games.

Solan and Vieille (2010) pointed out that the methods based on formal logic, successfully applied to zero-sum games, are also useful in the examination of certain classes of nonzero-sum stochastic games with the limit-average payoff criterion.

9 Uniform Equilibrium, Subgame Perfection, and Correlation in Stochastic Games with Finite State and Action Spaces

In this section, we consider stochastic games with finite state space X = {1, …, s} and finite sets of actions. We deal with “normalized discounted payoffs” and use notation which is more consistent with the surveyed literature. We let β = 1 − λ and multiply all current payoffs by λ ∈ (0, 1). Thus, we consider

$$\displaystyle \begin{aligned} J^i_{\lambda}(x,\pi):= E_x^\pi\left(\sum_{t=1}^\infty \lambda(1-\lambda)^{t-1}u_i(x_t,a^t)\right),\quad x=x_1\in X,\ \pi\in\varPi,\ i\in N. \end{aligned}$$

For any \(T\in \mathbb {N}\) and x = x1 ∈ X, π ∈ Π, the T-stage average payoff for player i ∈ N is

$$\displaystyle \begin{aligned} J^i_T(x,\pi):= E_x^\pi\left( \frac{1}{T}\sum_{t=1}^T u_i(x_t,a^t)\right). \end{aligned}$$

A vector \(\bar {g}\in \mathbb {R}^{n}\) is called a uniform equilibrium payoff at the initial state x ∈ X if for every 𝜖 > 0 there exist λ0 ∈ (0, 1], \(T^0\in \mathbb {N}\) and a strategy profile π0 ∈ Π such that for every player i ∈ N and every strategy πi ∈ Πi, we have

$$\displaystyle \begin{aligned} \bar{g}^i +\epsilon \ge J^i_{\lambda}(x,\pi^0)\ge \bar{g}^i-\epsilon \ge J^i_{\lambda}(x,(\pi_i,\pi^0_{-i}))-2\epsilon \quad\mbox{for}\quad \lambda\in (0,\lambda^0] \end{aligned}$$

and

$$\displaystyle \begin{aligned} \bar{g}^i +\epsilon \ge J^i_T(x,\pi^0)\ge \bar{g}^i-\epsilon \ge J^i_T(x,(\pi_i,\pi^0_{-i}))-2\epsilon \quad\mbox{for}\quad T\ge T^0. \end{aligned}$$

Any profile π0 that has the above two properties is called a uniform 𝜖-equilibrium. In other words, the game has a uniform equilibrium payoff if for every 𝜖 > 0 there is a strategy profile π0 which is an 𝜖-equilibrium in every discounted game with a sufficiently small discount factor λ and in every finite-stage game with a sufficiently long time horizon.

A stochastic game is called absorbing if all states but one are absorbing. Assume that X = {1, 2, 3} and only state x = 1 is nonabsorbing. Let E0 denote the set of all uniform equilibrium payoffs. Since the payoffs are determined in states 2 and 3, in a two-person game the set E0 can be viewed as a subset of \(\mathbb {R}^2\). Let λk → 0 as k → ∞, and let \( f_k^*\) be a stationary Markov perfect equilibrium in the λk-discounted two-person game. A question arises as to whether the sequence \( (J^1_{\lambda _k}(x,f^*_k),J^2_{\lambda _k}(x,f^*_k))_{k\in \mathbb {N}}\) with x = 1 has an accumulation point \( \bar {g}\in E^0\). This is the case for zero-sum games (see Mertens and Neyman 1981). However, Sorin (1986) provided a nonzero-sum modification of the “Big Match,” in which only state x = 1 is nonabsorbing and \(\lim _{k\to \infty } (J^1_{\lambda _k}(1,f^*_k),J^2_{\lambda _k}(1,f^*_k))\not \in E^0\). A similar phenomenon occurs for the limit of T-stage equilibrium payoffs. Sorin (1986) gave a full description of the set E0 in his example. His observations were generalized by Vrieze and Thuijsman (1989) to all 2-person absorbing games. They proved the following result.

Theorem 18

Any two-person absorbing stochastic game has a uniform equilibrium payoff.

We now state the fundamental result of Vieille (2000a,b).

Theorem 19

Every two-person stochastic game has a uniform equilibrium payoff.

The proof of Vrieze and Thuijsman (1989) is based on the “vanishing discount factor approach” combined with the idea of “punishment” successfully used in repeated games. The assumption that there are only two players is important in the proof. The 𝜖-equilibrium strategies that they construct need unbounded memory. The proof of Vieille (2000a,b) is involved. One of the reasons is that the ergodic classes do not depend continuously on strategy profiles. Following Vieille (2002), one can briefly say that “the basic idea is to devise an 𝜖-equilibrium profile that takes the form of a stationary-like strategy vector, supplemented by threats of indefinite punishment”. The construction of a uniform equilibrium payoff consists of two independent steps. First, a class of solvable states is recognized and some controlled sets are considered. Second, the problem is reduced to the existence of equilibria in a class of recursive games. The punishment component is crucial in the construction, and therefore the fact that the game is 2-person is important. Neither of the two parts of the proof can be extended to games with more than two players. The 𝜖-equilibrium profiles have no subgame-perfection property and require unbounded memory for the players. For a heuristic description of the proof, the reader is referred to Vieille (2002). In a recent paper, Solan (2017) proposed a new solution concept for multiplayer stochastic games called acceptable strategy profiles. It is simpler than uniform equilibrium and has some interesting properties. A suitable adaptation of the notion of uniform equilibrium is studied by Neyman (2017) in the class of continuous-time stochastic games with a small imprecision in the specification of players’ evaluations of streams of payoffs.

Flesch et al. (1997) proposed a three-person game with absorbing states in which only a cyclic Markov equilibrium exists. No examples of this type were found in the 2-person case. This example inspired Solan (1999), who, making use of certain arguments from Vrieze and Thuijsman (1989), proved the following result.

Theorem 20

Every 3-person absorbing stochastic game has a uniform equilibrium payoff.

In a quitting game, every player has only two actions, c for continue and q for quit. As soon as one or more of the players at any stage chooses q, the game stops and the players receive their payoffs, which are determined by the subset of players, say S, that simultaneously choose the action q. If nobody chooses q throughout all stages of the play, then all players receive zero. The payoffs are defined as follows. For every nonempty subset S ⊂ N of players, there is a payoff vector \(v(S) \in \mathbb {R}^n\). If S is the subset of players choosing q at the first stage on which anyone quits, then every player i ∈ N receives the payoff v(S)i. A quitting game is a special absorbing stochastic game with limit-average payoffs. The example of Flesch et al. (1997) belongs to this class. We now state the result due to Solan and Vieille (2001).

Theorem 21

Consider a quitting game satisfying the following assumptions: if player i alone quits, then i receives 1, and if player i quits with some other players, then i receives at most 1. Then the game has a subgame-perfect 𝜖-equilibrium. Moreover, there is a cyclic 𝜖-equilibrium strategy profile.

Quitting games are special cases of “escape games” studied by Simon (2007). As shown by Simon (2012), a study of quitting games can be based on some methods of topological dynamics and homotopy theory. More comments on this issue can be found in Simon (2016).

Thuijsman and Raghavan (1997) studied n-person perfect information stochastic games and n-person ARAT stochastic games and showed the existence of pure equilibria in the limit-average payoff case. They also derived the existence of 𝜖-equilibria for 2-person switching control stochastic games with the same payoff criterion. A class of n-person stochastic games with the limit-average payoff criterion and additive transitions as in the ARAT case (see Sect. 5) was studied by Flesch et al. (2007). The payoff functions do not satisfy any separability-in-actions assumption. They established the existence of Nash equilibria that are history dependent. For 2-person absorbing games, they showed the existence of stationary 𝜖-equilibria. In Flesch et al. (2008, 2009), the authors studied stochastic games with the limit-average payoffs where the state space X is the Cartesian product of some finite sets Xi, i ∈ N. For any state x = (x1, …, xn) ∈ X and any profile of actions a = (a1, …, an), the transition probability is of the form q(y|x, a) = q1(y1|x1, a1)⋯qn(yn|xn, an), where y = (y1, …, yn) ∈ X. In both the aperiodic and the periodic case, they established the existence of Nash equilibria for n-person games. In the two-person zero-sum case, there exists a stationary Markov perfect equilibrium.

A stochastic game is recursive if the payoffs at all nonabsorbing states are zero. The class of recursive stochastic games is important. The payoffs in any absorbing state can be interpreted as limit averages of stage payoffs as soon as the absorbing state is reached. If no absorbing state is reached, then the average payoff is zero. Moreover, as noted by Simon (2016), “by expanding the state space of any normal stochastic game so that there is a one-to-one relationship between the finite histories of play and the states, any state corresponds to a clopen (open and closed) subset of the infinite histories of play and every open subset of the infinite histories of play will correspond to some collection of states. A stochastic game where all non-zero payoffs are determined by membership in an open set of the infinite histories of play becomes in this way equivalent to a recursive game. Notice that if all absorbing payoffs are positive then the payoffs are lower semicontinuous, and if all absorbing payoffs are negative then the payoffs are upper semicontinuous (as functions on the infinite histories of play).” Flesch et al. (2010b) considered a class of n-person stochastic perfect information games assuming that in every state, the transitions are controlled by one player. The payoffs are equal to zero in every nonabsorbing state and are nonnegative in every absorbing state. They proposed a new iterative method to analyse these games under the expected limit-average payoff criterion and proved the existence of a subgame-perfect 𝜖-equilibrium in pure strategies. They also showed the existence of the uniform equilibrium payoffs. Recursive n-person perfect information games, where each player controls one nonabsorbing state and the transitions are deterministic, were studied in Kuipers et al. (2016). Allowing also for negative payoffs in absorbing states (in contrast to Flesch et al. 2010b), the authors showed the existence of a subgame-perfect 𝜖-equilibrium by a combinatorial method.

Correlated equilibria were introduced by Aumann (1974, 1987) for games in normal form. Correlation devices may be of different types; see Forges (2009). In Sect. 4 we considered correlation devices using public randomization. Such devices are also called stationary, because at every stage a signal is generated according to the same probability distribution, independently of any data. There are also devices based on past signals that were sent to the players, but not on the past play. They are called “autonomous correlation devices” (see Forges 2009). An 𝜖-equilibrium in an extended game that includes an autonomous correlation device is also called an extensive-form correlated 𝜖-equilibrium in a multistage game. Solan (2001) characterized the set of extensive-form correlated 𝜖-equilibria in stochastic games. He showed that every feasible and individually rational payoff in a stochastic game is an extensive-form correlated equilibrium payoff constructed with the help of an appropriately chosen device.

The following two results are due to Solan and Vieille (2002).

Theorem 22

Every n-person stochastic game with finite state and action spaces has a uniform correlated equilibrium payoff using an autonomous correlation device.

The construction of an equilibrium profile is based on the method of Mertens and Neyman (1981) applied to zero-sum games. The equilibrium path is sustained by the use of threat strategies. However, punishment occurs only if a player disobeys the recommendation of the correlation device. The second result is stronger in some sense but concerns positive recursive games, where the payoffs in absorbing states are nonnegative for all players.

Theorem 23

Every positive recursive stochastic game with finite sets of states and actions has a uniform correlated equilibrium payoff and the correlation device can be taken to be stationary.

The proof of the above result makes use of a variant of the method of Vieille (2000b).

In a recent paper, Mashiah-Yaakovi (2015) considered stochastic games with countable state spaces, finite sets of actions and Borel measurable bounded payoffs defined on the space H of all plays. This class includes the Gδ-games of Blackwell (1969). The concept of uniform 𝜖-equilibrium does not apply to this class of games, because the payoffs are not additive. She proved that these games have extensive-form correlated 𝜖-equilibria.

Secchi and Sudderth (2002a) considered a special class of n-person stochastic “stay-in-a-set games” defined as follows. Let Gi be a fixed subset of X for each i ∈ N. Define \(G^i_{\infty }:=\{(x_1,a^1,x_2,a^2,\ldots ): x_t\in G_i \mbox{ for every } t\}\). The payoff function for player i ∈ N is the characteristic function of the set \(G^i_\infty \). They proved the existence of an 𝜖-equilibrium (equilibrium) assuming that the state space is countable (finite) and the sets of actions are finite. Maitra and Sudderth (2003) generalized this result to Borel state stay-in-a-set games with compact action sets using a standard continuity assumption on the transition probability with respect to actions. Secchi and Sudderth (2002b) proved that every n-person stochastic game with countably many states, finite action sets and bounded upper semicontinuous payoff functions on H has an 𝜖-equilibrium. All proofs in the aforementioned papers are partially based on methods from gambling theory; see Dubins and Savage (2014).

Nonzero-sum infinite horizon games with perfect information are special cases of stochastic games. Flesch et al. (2010a) established the existence of subgame-perfect 𝜖-equilibria in pure strategies in perfect information games with lower semicontinuous payoff functions on the space H of all plays. A similar result for games with upper semicontinuous payoffs was proved by Purves and Sudderth (2011). It is worth mentioning that the aforementioned results also hold for games with arbitrary nonempty action spaces and deterministic transitions. Solan and Vieille (2003) provided an example of a two-person game with perfect information that has no subgame-perfect 𝜖-equilibrium in pure strategies, but does have a subgame-perfect 𝜖-equilibrium in behavior strategies. Their game belongs to the class of deterministic stopping games. Recently, Flesch et al. (2014) showed that a subgame-perfect 𝜖-equilibrium (in behavioral strategies) may not exist in perfect information games if the payoff functions are bounded and Borel measurable.

Additional general results on subgame-perfect equilibria in games of perfect information can be found in Alós-Ferrer and Ritzberger (2015, 2016). Two refinements of subgame-perfect 𝜖-equilibrium concept were introduced and studied in continuous games of perfect information by Flesch and Predtetchinski (2015).

Two-person discounted stochastic games of perfect information with finite state and action spaces were treated in Küenle (1994). Making use of threat strategies, he constructed a history-dependent pure Nash equilibrium. However, it is worth pointing out that pure stationary Nash equilibria need not exist in this class of games. A similar remark applies to irreducible stochastic games of perfect information with the limit-average payoff criterion. Counterexamples are described in Federgruen (1978) and Küenle (1994).

We close this section with a remark on “folk theorems” for stochastic games. It is worth mentioning that the techniques based on threat strategies, utilized very often in repeated games, cannot be immediately adapted to stochastic games where the players use randomized (behavioral) strategies. Deviations are difficult to detect when the actions are selected at random. However, some folk theorems for various classes of stochastic games were proved in Dutta (1995), Fudenberg and Yamamoto (2011), Hörner et al. (2011, 2014), and Peski and Wiseman (2015). Further comments can be found in Solan and Zillotto (2016).

Abreu et al. (1986, 1990) applied a method for analysing subgame-perfect equilibria in discounted repeated games that resembles the dynamic programming technique. The set of equilibrium payoffs is a fixed point of some naturally defined set-valued operator. A similar idea was used in stochastic games by Mertens and Parthasarathy (1991). The fixed point property of subgame-perfect equilibrium payoffs can be used to develop algorithms. Berg (2016) and Kitti (2016) considered some modifications of the aforementioned methods for discounted stochastic games with finite state spaces. They also demonstrated some techniques for computing (non-stationary) subgame-perfect equilibria in pure strategies provided that they exist. Sleet and Yeltekin (2016) applied the methods of Abreu et al. (1986, 1990) to some classes of dynamic games and provided a new approach to computing equilibrium value correspondences. Their idea is based on outer and inner approximations of the equilibrium value correspondence via step set-valued functions.

10 Nonzero-Sum Stochastic Games with Imperfect Monitoring

There are only a few papers on nonzero-sum stochastic games with imperfect monitoring (or incomplete information). Although in many models an equilibrium does not exist, some positive results were obtained for repeated games; see Forges (1992), Chap. IX in Mertens et al. (2015) and the references cited therein. Altman et al. (2005, 2008) studied stochastic games in which every player can only observe and control his “private state”, and the state of the world is composed of the vector of private states. Moreover, the players do not observe the actions of their partners in the game. Such game models are motivated by certain examples in wireless communications.

In the model of Altman et al. (2008), the state space is \(X= \prod _{i=1}^n X_i\), where Xi is a finite set of private states of player i ∈ N. The action space Ai(xi) of player i ∈ N depends on xi ∈ Xi and is finite. It is assumed that player i ∈ N has no information about the payoffs (called costs here). Hence, player i only knows the history of his private state process and the actions chosen by himself in the past. Thus, a strategy πi of player i ∈ N is independent of the realizations of the state processes of the other players and of their actions. If x = (x1, …, xn) ∈ X is a state at some period of the game and a = (a1, …, an) is the action profile selected independently by the players at that state, then the probability of going to state y = (y1, …, yn) is q(y|x, a) = q1(y1|x1, a1)⋯qn(yn|xn, an), where \(q_i(\cdot |x_i,a_i)\in \Pr (X_i)\). Thus the coordinate (or private) state processes are independent. It is assumed that every player i has a probability distribution νi of the initial state xi ∈ Xi and that the initial private states are independent. The initial distribution ν of the state x ∈ X is determined by ν1, …, νn in an obvious way and is known to the players. Further, it is supposed that player i ∈ N is given some stage cost functions \( c_i^j(x,a)\) (j = 0, 1, …, ni) depending on x ∈ X and the action profiles a available in that state. The cost function \(c^0_i\) is to be minimized by player i in the long run, whereas the \(c^j_i\) (for j > 0) must satisfy the constraints described below.

Any strategy profile π together with the initial distribution ν and the transition probability q induces a unique probability measure on the space of all infinite plays. The expectation operator with respect to this measure is denoted by \(E^\pi _{\nu }\). The expected limit-average cost \( C_i^j(\pi )\) is defined as follows:

$$\displaystyle \begin{aligned} C^j_i(\pi):= \limsup_{T\to\infty} \frac{1}{T}E^\pi_{\nu}\left(\sum_{t=1}^T c^j_i(x^t,a^t)\right). \end{aligned}$$

Note that xt ∈ X and at is an action profile of all the players.

Let \(b_i^j>0\) (j = 1, …, ni) be bounds used to define constraints below. A strategy profile π is i-feasible if

$$\displaystyle \begin{aligned} C^j_i(\pi)\le b_i^j\quad \mbox{ for each}\quad j=1,\ldots,n_i. \end{aligned}$$

Thus, π is feasible if it is i-feasible for every player i ∈ N.

A strategy profile π∗ is called a constrained Nash equilibrium if π∗ is feasible and if, for every player i ∈ N and every strategy πi of player i such that the profile \( (\pi _i,\pi ^*_{-i})\) is i-feasible, we have

$$\displaystyle \begin{aligned} C^0_i(\pi^*)\le C^0_i(\pi_i,\pi^*_{-i}). \end{aligned}$$

Note that a unilateral deviation of player i may increase his cost or it may violate his constraints. The aforementioned fact is illustrated in Altman et al. (2008) by an example in wireless communications.

Altman et al. (2008) made the following assumptions.

  1. (I1)

    (Ergodicity) For every player i ∈ N and any stationary strategy, the state process on Xi is a unichain Markov process, i.e., it has one ergodic class and possibly some transient states.

  2. (I2)

    (Strong Slater condition) There exists some η > 0 such that every player i ∈ N has a strategy \(\pi _i^\eta \) with the property that for any strategy profile \(\pi_{-i}\) of the other players

    $$\displaystyle \begin{aligned} C^j_i(\pi^\eta_i,\pi_{-i})\le b^j_i-\eta \quad \mbox{for all}\quad j=1,\ldots,n_i. \end{aligned}$$
  3. (I3)

    (Information) The players do not observe their costs.

Theorem 24

Consider the game model that satisfies conditions (I1)–(I3). Then there exists a stationary constrained Nash equilibrium.

Stochastic games with finite sets of states and actions and imperfect public monitoring were studied in Fudenberg and Yamamoto (2011) and Hörner et al. (2011). In their models, the players observe states and receive only public signals about the actions chosen by their partners in the game. Fudenberg and Yamamoto (2011) and Hörner et al. (2011) established “folk theorems” for stochastic games under assumptions that relate to “irreducibility” conditions on the transition probability function. Moreover, Hörner et al. (2011, 2014) also studied algorithms both for computing the sets of all equilibrium payoffs in the normalized discounted games and for finding their limit as the discount factor tends to one. As shown by counterexamples in Flesch et al. (2003), an n-person stochastic game with non-observable actions of the players (and no public signals), observable payoffs and the expected limit-average payoff criterion need not possess an 𝜖-equilibrium. Cole and Kocherlakota (2001) studied discounted stochastic games with hidden states and actions. They provided an algorithm for finding a sequential equilibrium in which strategies depend on private information only through the privately observed state. Imperfect monitoring is also assumed in the model of the supermodular stochastic game studied by Balbus et al. (2013b), where the monotone convergence of Nash equilibrium payoffs in finite-stage games is proved.

11 Intergenerational Stochastic Games

This section develops a concept of equilibrium behavior and establishes its existence in various intergenerational games. Both paternalistic and non-paternalistic altruism cases are discussed. Consider an infinite sequence of generations labelled by \(t\in \mathbb {N}\). There is a single good (also called a renewable resource) that can be used for consumption or productive investment. The set of all resource stocks S is an interval in \( \mathbb {R}\). It is assumed that 0 ∈ S. Every generation lives one period and derives utility from its own consumption and the consumptions of some or all of its descendants. Generation t observes the current stock st ∈ S and chooses at ∈ A(st) := [0, st] for consumption. The remaining part yt = st − at is left as an investment for its descendants. The next generation’s inheritance or endowment is determined by a weakly continuous transition probability q from S to S (stochastic production function), which depends on yt ∈ A(st) ⊂ S. Recall that the weak continuity of q means that q(⋅|ym) ⇒ q(⋅|y0) if ym → y0 in S (as m →∞). Usually, it is assumed that state 0 is absorbing, i.e., q({0}|0) = 1. Let Φ be the set of all Borel functions ϕ : S → S such that ϕ(s) ∈ A(s) for each s ∈ S. A strategy for generation t is a function ϕt ∈ Φ. If ϕt = ϕ for all \(t\in \mathbb {N}\) and some ϕ ∈ Φ, then we say that the generations employ a stationary strategy.

Suppose that all generations from t + 1 onward use a consumption strategy c ∈ Φ. Then, in the paternalistic model, generation t’s utility when it consumes at ∈ A(st) equals H(at, c)(st), where H is some real-valued function that measures the generation’s level of satisfaction. Thus, in models with paternalistic altruism each generation derives its utility from its own consumption and the consumptions of its successor or successors.

Such a game model reveals a time inconsistency. Strotz (1956) and Pollak (1968) were among the first to note this fact in the model of an economic agent whose preferences change over time. In related works, Phelps and Pollak (1968) and Peleg and Yaari (1973) observed that this situation is formally equivalent to one in which decisions are made by a sequence of heterogeneous planners. They investigated the existence of consistent plans, which we shall call (stationary) Markov perfect equilibria. The solution concept is in fact a symmetric Nash equilibrium (c, c, …) in a game played by countably many short-lived players having the same utility functions. Therefore, we can say that a stationary Markov perfect equilibrium (c, c, …) corresponds to a strategy c* ∈ Φ such that

$$\displaystyle \begin{aligned} H(c^*(s),c^*)(s) =\sup_{a\in A(s)}H(a,c^*)(s) \end{aligned}$$

for every s ∈ S. We identify this equilibrium with c*.

In other words, c* ∈ Φ is a stationary Markov perfect equilibrium if

$$\displaystyle \begin{aligned} c^*(s) \in \mathrm{arg} \max_{a\in A(s)}H(a,c^*)(s)\ \ \mbox{for each }\ s\in S. \end{aligned}$$
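The argmax condition above suggests a simple numerical illustration: discretize the stock space and iterate the best-response map until the consumption strategy stabilizes. The sketch below uses hypothetical functional forms and the deterministic transition of game (G1) introduced below; it is merely illustrative and is not the fixed-point construction used in the existence proofs.

```python
import numpy as np

# A minimal grid sketch of c*(s) in argmax_a H(a, c*)(s) for a deterministic
# bequest game; production and utility are hypothetical choices.
S = np.linspace(0.0, 10.0, 201)                      # discretized stock space
p = lambda y: 3.0 * np.sqrt(y)                       # production, increasing, p(0)=0
u_hat = lambda x, y: np.sqrt(x) + 0.9 * np.sqrt(y)   # separable utility u(x,y)

c = 0.5 * S                                          # initial consumption guess
for _ in range(200):
    c_new = np.empty_like(c)
    for k, s in enumerate(S):
        a = S[S <= s]                                # feasible consumptions A(s)=[0,s]
        succ = np.interp(p(s - a), S, c)             # descendant's consumption c(p(s-a))
        c_new[k] = a[np.argmax(u_hat(a, succ))]      # best response against c
    if np.max(np.abs(c_new - c)) < 1e-8:
        break
    c = c_new
print("approximate equilibrium consumption at s=5:", np.interp(5.0, S, c))
```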

There is now a substantial body of work on paternalistic models, see for instance, Alj and Haurie (1983), Harris and Laibson (2001), and Nowak (2010) and the results presented below in this section. At the beginning we consider three types of games, in which the existence of a stationary Markov perfect equilibrium was proved in a sequence of papers: Balbus et al. (2015a,b,c). Game (G1) describes a purely deterministic case, while games (G2) and (G3) deal with a stochastic production function. However, (G2) concerns a model with one descendant, whereas (G3) examines a model with infinitely many descendants. Let us mention that by an intergenerational game with k (k is finite or infinite) descendants (successors or followers), we mean a game in which each generation derives its utility from its own consumption and consumptions of its k descendants.

  1. (G1)

Let S := [0, +∞). Assume that \(q(\cdot |y_t)=\delta _{p(y_t)}( \cdot )\), where p : S → S is a continuous and increasing production function such that p(0) = 0. We also assume that

    $$\displaystyle \begin{aligned} H(a,c)(s)= \hat{u}(a,c(p(s-a))) \end{aligned}$$

for some function \(\hat {u}: \mathbb {R}^2_+ \to \mathbb {R}\cup \{-\infty \}\) that is continuous and increasing in each variable. Moreover, we allow \(\hat {u}\) to be unbounded from below: \(\hat {u}(0,y)\) may equal −∞ for y ≥ 0, but \(\hat {u}(x,0) >-\infty \) for all x > 0. Furthermore, for any y1 > y2 in S and h > 0, we assume that the function \(\varDelta _h\hat {u} (x):= \hat {u}(x,y_1)-\hat {u}(x+h,y_2)\) has the strict single crossing property on (0, +∞), i.e., \(\varDelta _h\hat {u}(x) \ge 0\) implies that \(\varDelta _h\hat {u}(x^{\prime })> 0\) for each x′ > x (see Milgrom and Shannon 1994).

  2. (G2)

Let S := [0, +∞). We study a model with a utility that reflects a generation’s attitude toward risk, captured by a risk coefficient r ≥ 0. In this setup, H takes the following form:

    $$\displaystyle \begin{aligned} H(a,c)(s) =\left\{ \begin{array}{l@{\quad \quad}l} u(a) + \beta\int_Sv(c(s^{\prime }))q(ds^{\prime }|s-a), & \mbox{for}\quad r=0 \\ u(a) - \frac{\beta}{r}\ln\int_Se^{-rv(c(s^{\prime }))}q(ds^{\prime }|s-a), & \mbox{for}\quad r>0, \end{array} \right. \end{aligned}$$

where \(u:S\to \mathbb {R}\cup \{-\infty \}\) is increasing, strictly concave and continuous on (0, +∞), and u(0) may equal −∞. In addition, the function \(v:S\to \mathbb {R}\) is bounded, continuous and increasing. Further assumptions are as follows: for every s ∈ S, the set Zs := {y ∈ S :  q({s}|y) > 0} is countable and the transition law q is stochastically increasing. The latter fact means that, if z → Q(z|y) is the cumulative distribution function for q(⋅|y), then for all y1 < y2 and z ∈ S, we have Q(z|y1) ≥ Q(z|y2). (A numerical sketch of the utility H in this setting is given after this list.)

  3. (G3)

Let \(S:=[0,\bar {s}]\) for some \(\bar {s}>0\). In this case, we assume that the utility function of the current generation t is as follows:

    $$\displaystyle \begin{aligned} H(a,c)(s) = \tilde{u}(a) + E^c_{s}[w(a_{t+1},a_{t+2},\ldots)], \end{aligned}$$

where \(w: S^\infty \to \mathbb {R}\) is continuous and \(\tilde {u}: S\mapsto \mathbb {R}\) is continuous, strictly concave and increasing. Here, \(E_s^c\) is the expectation operator with respect to the unique probability measure on the space of all feasible future histories (starting from the endowment s of generation t) of the consumption-investment process induced by the stationary strategy c ∈ Φ used by each generation τ (τ > t) and the transition probability q. Defining

    $$\displaystyle \begin{aligned} \tilde{J}(c)(s)= E^c_{s}[w(a_k, a_{k+1},a_{k+2},\ldots)] \end{aligned}$$

for every \(k\in \mathbb {N}\) (by stationarity of c, the right-hand side does not depend on k), we obtain that

    $$\displaystyle \begin{aligned} H(a,c)(s) =\tilde{u}(a)+\int_S \tilde{J}(c)(s^{\prime })q(ds^{\prime }|s-a). \end{aligned}$$

    In addition, q(⋅|y) is assumed to be nonatomic for y > 0.
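The numerical sketch announced in (G2) evaluates the utility H(a, c)(s) by Monte Carlo for the risk-neutral case r = 0 and a risk-sensitive case r > 0. The primitives u, v, the strategy c, and the (lognormal, hence nonatomic) transition law are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo evaluation of the (G2) utility H(a, c)(s); all primitives
# below are hypothetical stand-ins for the model data.
u = lambda a: np.log(a + 1e-12)          # increasing, strictly concave
v = lambda x: x / (1.0 + x)              # bounded, continuous, increasing
c = lambda s: 0.5 * s                    # a stationary consumption strategy
beta = 0.95

def H(a, s, r, n=100_000):
    y = s - a                                    # investment
    s_next = y * rng.lognormal(0.0, 0.3, n)      # draws from q(.|y)
    z = v(c(s_next))
    if r == 0:
        return u(a) + beta * z.mean()
    return u(a) - (beta / r) * np.log(np.mean(np.exp(-r * z)))

for r in (0.0, 2.0):
    print(f"r={r}: H(1.0, c)(4.0) ~ {H(1.0, 4.0, r):.4f}")
```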

Let I denote the set of nondecreasing lower semicontinuous functions \(i: S\to \mathbb {R}\) such that i(s) ∈ A(s) for each s ∈ S. Note that every i ∈ I is continuous from the left and has at most a countable number of discontinuity points. Put

$$\displaystyle \begin{aligned} F:=\{c\in \varPhi:\; c(s)=s-i(s)\ \mbox{for all}\ s\in S\ \mbox{and some}\ i\in I\}. \end{aligned}$$

Clearly, any c ∈ F is upper semicontinuous and continuous from the left. The idea of using the class F of strategies for analysing equilibria in deterministic bequest games comes from Bernheim and Ray (1983). Further, it was successfully applied to the study of other classes of dynamic games with simultaneous moves; see Sundaram (1989a) and Majumdar and Sundaram (1991).
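As a small illustration of this class of strategies, the snippet below builds an element of F from a hypothetical nondecreasing, left-continuous step investment function i ∈ I; the induced consumption strategy c(s) = s − i(s) is then upper semicontinuous and continuous from the left, as noted above.

```python
# The strategies in F are s -> s - i(s) with i nondecreasing and lower
# semicontinuous; a hypothetical step investment policy illustrates the class.
def i(s, thresholds=(1.0, 3.0), levels=(0.0, 0.5, 2.0)):
    """Nondecreasing, left-continuous step investment (an element of I)."""
    idx = sum(s > t for t in thresholds)     # strict inequality => left-continuous
    return min(levels[idx], s)               # keep i(s) in A(s) = [0, s]

c = lambda s: s - i(s)                       # the induced consumption strategy in F
print([round(c(s), 2) for s in (0.5, 1.0, 1.5, 3.0, 4.0)])
```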

Theorem 25

Every intergenerational game (G1), (G2), and (G3) possesses a stationary Markov perfect equilibrium c* ∈ F.

The main idea of the proof is based upon the consideration of an operator L defined as follows: to each consumption strategy c ∈ F used by the descendant (or descendants), L assigns the maximal element c0 of the set of best responses to c. It is shown that c0 ∈ F. Moreover, F can be viewed as a convex subset of the vector space Y  of real-valued, continuous from the left functions \(\eta :S\mapsto \mathbb {R}\) of bounded variation on every interval Sn := [0, n], \(n\in \mathbb {N}\), thus in particular on \([0,\bar {s}]\). We further equip Y  with the topology of weak convergence. We say that (ηm) converges weakly to some η0 ∈ Y  if \(\lim\limits _{m\to \infty }\eta ^m(s)=\eta ^0(s)\) for every continuity point s of η0. Then, due to Lemma 2 in Balbus et al. (2015c), F is compact and metrizable. Finally, the equilibrium point is obtained via the Schauder-Tychonoff fixed point theorem applied to the operator L.

Theorem 25 for game (G1) was proved in Balbus et al. (2015c). Related results for the purely deterministic case were obtained by Bernheim and Ray (1983) and Leininger (1986). For instance, Leininger (1986) studied a class \( \mathcal {U}\) of bounded from below utility functions for which every selector of the best response correspondence is nondecreasing. In particular, he noticed that this class is nonempty and includes, for instance, the separable case, i.e., u(x, y) = v(x) + bv(y), where v is strictly increasing and concave and b > 0. Bernheim and Ray (1983), on the other hand, showed that the functions u that are strictly concave in their first argument and satisfy the so-called increasing differences property (see Sect. 2) also belong to \(\mathcal {U}\). Other functions u that meet the conditions imposed by Bernheim and Ray (1983) and Leininger (1986) are of the form u(x, y) = v1(x)v2(y), where v1 is strictly concave and v2 ≥ 0 is continuous and increasing. The class \(\mathcal {U}\) has not been fully characterized. The class (G1) of games includes all the above-mentioned examples and some new ones. Our result is also valid for a larger class of utilities that can be unbounded from below. Therefore, Theorem 25 is a generalization of Theorem 4.2 in Bernheim and Ray (1983) and Theorem 3 in Leininger (1986). The proofs given by Bernheim and Ray (1983) and Leininger (1986) do not work for unbounded utility functions. Indeed, Leininger (1986) uses a transformation of upper semicontinuous consumption strategies into the set of Lipschitz functions with constant 1. This clever “levelling” operation enables him to equip the space of continuous functions on the interval \([0, \bar {y}]\) with the topology of uniform convergence and to apply the Schauder fixed point theorem. His proof relies strongly on the uniform continuity of u. This is the case when the production function crosses the 45° line. If the production function does not cross the 45° line, a stationary equilibrium is obtained as a limit of equilibria corresponding to truncations of the production function. However, this part of the proof is descriptive and sketchy. Bernheim and Ray (1983), on the other hand, identify the maximal best-response consumption strategy, which is upper semicontinuous, with a convex-valued upper hemicontinuous correspondence. The space of such upper hemicontinuous correspondences is then equipped with the Hausdorff topology. This implies that the strategy space is compact if endowments have an upper bound, i.e., when the production function p crosses the 45° line. If this is not satisfied, then an approximation technique similar to that in Leininger (1986) is employed. Our proof does not follow the above-mentioned approximation methods. The weak topology introduced in the space Y  renders F compact and allows for an elementary but non-trivial analysis. For examples of deterministic bequest games with stationary Markov perfect equilibria given in closed form, the reader is referred to Fudenberg and Tirole (1991) and Nowak (2006b, 2010).

Theorem 25 for game (G2) was proved by Balbus et al. (2015b), and for game (G3) by Balbus et al. (2015a). Within the stochastic framework, Theorem 25 can be viewed as an attempt to save the result reported by Bernheim and Ray (1989) on the existence of stationary Markov perfect equilibria in games with very general utility functions and nonatomic shocks. If q is allowed to possess atoms, then a stationary Markov perfect equilibrium exists in bequest games with one follower (see Theorems 1–2 in Balbus et al. 2015b). The latter result also embraces the purely deterministic case; see Example 1 in Balbus et al. (2015b), where the nature and role of the assumptions are discussed. However, as shown in Example 3 in Balbus et al. (2015a), the existence of stationary Markov perfect equilibria in the class F cannot be proved in intergenerational games where q has atoms and there is more than one descendant.

The result in Bernheim and Ray (1986) concerns “consistent plans” in models with a finite time horizon. The problem is then simpler. The results of Bernheim and Ray (1986) were considerably extended by Harris (1985) in his paper on perfect equilibria in some classes of games of perfect information. It should be noted that there are other papers containing results for bequest games with a stochastic production function. Amir (1996b) studied games with one descendant for every generation and a transition probability such that the induced cumulative distribution function Q(z|y) is convex in y ∈ S. This condition is rather restrictive. Nowak (2006a) considered similar games in which the transition probability is a convex combination of the Dirac measure at state s = 0 and some transition probability from S to S, with coefficients depending on investments. Similar models were considered by Balbus et al. (2012, 2013a). The latter paper also studies some computational issues for stationary Markov perfect equilibria. One should note, however, that the transition probabilities in the aforementioned works are specific. In contrast, the transition structure in Balbus et al. (2015a,b) is consistent with the transitions used in the theory of economic growth; see Bhattacharya and Majumdar (2007) and Stokey et al. (1989).

An interesting issue studied in the economics literature concerns the limiting behavior of the state process induced by a stationary Markov perfect equilibrium. Below we formulate a steady-state result for a stationary Markov perfect equilibrium obtained for the game (G1). Under slightly more restrictive conditions, it was shown by Bernheim and Ray (1987) that the equilibrium capital stock never exceeds the optimal planning stock in any period. Namely, it is assumed that

  1. (B1)

p is strictly concave, continuously differentiable, \( \lim _{y\to 0^+}p^{\prime }(y)>1\) and \(\lim _{y\to \infty }p^{\prime }(y)<1/\beta \), where β ∈ (0, 1] is a discount factor;

  2. (B2)

\( \hat {u}(a_t,a_{t+1})=\hat {v}(a_t)+\beta \hat {v}(a_{t+1})\), where \(\hat {v} :S\to \mathbb {R}\) is increasing, continuously differentiable, strictly concave and \(\hat {v}(a)\to \infty \) as a →∞.

An optimal consumption program \(\hat {a}:=(\hat {a}_t)_{t\in \mathbb {N} }\) is the one which maximizes \(\sum _{t=1}^\infty \beta ^{t-1} \hat {v}(\hat {a} _t)\) subject to all feasibility constraints described in the model. The following result is stated as Theorems 3.2 and 3.3 in Bernheim and Ray (1987).

Theorem 26

Assume (B1)–(B2) and consider game (G1). If c* is a stationary Markov perfect equilibrium, then \(i^*(s)=s-c^*(s)\le \hat {y}\), where \(\hat {y}\in [0,\infty )\) is the limit of the sequence \((s_t-\hat {a}_t)_{t\in \mathbb {N}}\). If \(\hat {y}>0\), it solves βp′(y) = 1. If \(\lim _{y\to 0^+}p'(y)>1/\beta \), then \(\hat {y}>0\).

For further properties of stationary Markov perfect equilibria, such as efficiency and Pareto optimality, the reader is referred to Sect. 4 in Bernheim and Ray (1987).

For stochastic models it is of some interest to know whether a stationary Markov perfect equilibrium induces a Markov process having an invariant distribution. It turns out that the answer is positive if an additional stochastic monotonicity requirement is met:

  1. (B3)

    If y1 < y2, then for any nondecreasing Borel measurable function \(h:S\to \mathbb {R}\),

    $$\displaystyle \begin{aligned} \int_S h(s)q(ds|y_1)\le \int_S h(s)q(ds|y_2). \end{aligned}$$

By Theorem 25 for game (G3), there exists c* ∈ F. Then s → i*(s) = s − c*(s) is nondecreasing on S. Put q*(B|s) := q(B|i*(s)), where B is a Borel subset of S and s ∈ S. From (B3), it follows that s → q*(⋅|s) is nondecreasing. Define the mapping \(\varPsi :\Pr (S)\to \Pr (S)\) by

$$\displaystyle \begin{aligned} \varPsi\sigma(B):= \int_S q^*(B|s)\sigma(ds) \end{aligned}$$

where \(B\in \mathcal {B}(S)\). An invariant distribution for the Markov process induced by the transition probability q* determined by i* (and thus by c*) is any fixed point of Ψ. Let Δ(q*) be the set of invariant distributions for the process induced by q*. In Sect. 4 in Balbus et al. (2015a), the following result was proved.

Theorem 27

Assume (B3) and consider game (G3). Then, the set of invariant distributions Δ(q*) is compact in the weak topology on \(\Pr (S)\).

For each σ ∈ Δ(q*), let \(M(\sigma ):=\int _S s\,\sigma (ds)\) be the mean of the distribution σ. By Theorem 27, there exists a distribution σ** with the highest mean over the set Δ(q*).
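On a finite grid, the operator Ψ reduces to left multiplication of σ by a stochastic matrix, so an invariant distribution and its mean M(σ) can be approximated by straightforward iteration. In the sketch below, the investment policy i* and the kernel q* are hypothetical stand-ins.

```python
import numpy as np

# Fixed point of Psi on a discretized state space: with a finite grid, q*
# becomes a stochastic matrix Q and Psi(sigma) = sigma @ Q.
S = np.linspace(0.0, 5.0, 51)
i_star = lambda s: 0.4 * s                    # a nondecreasing investment policy

def row(s):
    """Discretized q*(.|s): a triangular kernel centred at 1.5 * i*(s)."""
    w = np.maximum(0.0, 1.0 - np.abs(S - 1.5 * i_star(s)))
    return w / w.sum()

Q = np.array([row(s) for s in S])             # Q[k] = q*(.|S[k])
sigma = np.full(len(S), 1.0 / len(S))         # initial distribution
for _ in range(2000):
    sigma = sigma @ Q                         # one application of Psi
print("mean M(sigma) of the invariant distribution ~", float(S @ sigma))
```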

One can ask about the uniqueness of the invariant distribution. Theorem 4 in Balbus et al. (2015a) gives a positive answer to this question. However, this result concerns the model with multiplicative shocks, i.e., q is induced by the equation

$$\displaystyle \begin{aligned} s_{t+1}= f(y_t)\xi_t, \quad t\in\mathbb{N}, \end{aligned}$$

where f : S → S is a continuous increasing function such that f(0) > 0. In addition, there is a state \( \hat {s}\in (0,\infty )\) such that f(y) > y for \(y\in (0,\hat {s})\) and f(y) < y for \(y\in (\hat {s}, \infty )\). Here \((\xi _t)_{t\in \mathbb {N}}\) is an i.i.d. sequence with the nonatomic distribution π. Assuming additionally the monotone mixing condition, we conclude from Theorem 4 in Balbus et al. (2015a) the uniqueness of the invariant distribution. Further discussion on these issues can be found in Stokey et al. (1989), Hopenhayn and Prescott (1992), Stachurski (2009), Balbus et al. (2015a) and the references cited therein.
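A short simulation illustrates this uniqueness phenomenon: with hypothetical f and i* satisfying f(0) > 0 and the crossing condition, and lognormal (hence nonatomic) shocks, long-run averages computed from very different initial endowments practically coincide.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulating s_{t+1} = f(y_t) xi_t under a hypothetical equilibrium investment
# y_t = i*(s_t); with a unique invariant distribution, long-run averages from
# different initial states agree.
f = lambda y: 1.0 + 2.0 * np.sqrt(y)          # continuous, increasing, f(0) > 0
i_star = lambda s: 0.5 * s

def long_run_mean(s0, T=100_000, burn=10_000):
    s, acc = s0, 0.0
    for t in range(T):
        s = f(i_star(s)) * rng.lognormal(-0.05, 0.3)   # nonatomic shock
        if t >= burn:
            acc += s
    return acc / (T - burn)

print(long_run_mean(0.1), long_run_mean(20.0))
```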

In contrast to the paternalistic model, one can also consider non-paternalistic altruism. This notion refers to a model in which each generation derives its utility from its own consumption and the utilities of all its successors. The most general model with non-paternalistic altruism was formulated by Ray (1987). His work is of some importance, because it provides a proper definition of an equilibrium for the non-paternalistic case. According to Ray (1987), a stationary equilibrium consists of a pair of two functions: a saving policy (or strategy) and an indirect utility function. Such a pair constitutes an equilibrium if it is optimal for the current generation, provided its descendants use the same saving strategy and the same indirect utility function.

Assume that the generations from t onward use a consumption strategy c ∈ Φ. Then, the expected utility of generation t, which inherits an endowment \(s_t=s\in S:=[0, \bar {s}]\), is of the form

$$\displaystyle \begin{aligned} W_t(c,v)(s):= (1-\beta)\tilde{u}(c(s)) +\beta E^c_{s}[w(v(s_{t+1}),v(s_{t+2}),\ldots)], \end{aligned} $$
(6.8)

where \(\tilde {u}:S \to K\) and \(w : K^\infty \to K\) are continuous functions and \(K:=[0,\bar {k}]\) with some \(\bar {k}\ge \bar {s}\). The function v : S → K is called an indirect utility and is assumed to be Borel measurable. Similarly, for any c ∈ Φ and s = st+1 ∈ S, we can define

$$\displaystyle \begin{aligned} J(c,v)(s):= E^c_{s}[w(v(s_{t+2}),v(s_{t+3}),\ldots)], \end{aligned}$$

which yields

$$\displaystyle \begin{aligned} \begin{array}{rcl} W(c,v)(s):=W_t(c,v)(s) &\displaystyle =&\displaystyle (1-\beta) \tilde{u}(c(s)) +\beta\int_S J(c,v)(s^{\prime })q(ds^{\prime }|s-c(s)). \end{array} \end{aligned} $$

Let us define

$$\displaystyle \begin{aligned} P(a,c,v)(s):=(1-\beta)\tilde{u}(a)+\beta\int_S J(c,v)(s^{\prime })q(ds^{\prime }|s-a), \end{aligned}$$

where s ∈ S, a ∈ A(s) and c ∈ Φ. If st = s, then P(a, c, v)(s) is the utility for generation t choosing the consumption level a ∈ A(st) in this state under the assumption that all future generations will employ a stationary strategy c ∈ Φ and the indirect utility is v.

A stationary equilibrium in the sense of Ray (1987) is a pair (c*, v*), with c* ∈ Φ and v* : S → K being a bounded Borel measurable function, such that for every s ∈ S, we have

$$\displaystyle \begin{aligned} v^*(s)=\sup_{a\in A(s)}P(a,c^*,v^*)(s)=P(c^*(s),c^*,v^*)(s)=W(c^*,v^*)(s). \end{aligned} $$
(6.9)

Note that equality (6.9) says that there exist an indirect utility function v* and a consumption strategy c*, both depending on the current endowment, such that each generation finds it optimal to adopt this consumption strategy provided its descendants use the same strategy and exhibit the given indirect utility.

Let V  be the set of all nondecreasing upper semicontinuous functions v : S → K. Note that every v ∈ V  is continuous from the right and has at most a countable number of discontinuity points. By I we denote the subset of all functions φ ∈ V  such that φ(s) ∈ A(s) for each s ∈ S. Let F = {c :  c(s) = s − i(s) for all s ∈ S and some i ∈ I}. We impose conditions similar to those imposed in model (G3). Namely, we shall assume that \( \tilde {u}\) is strictly concave and increasing. Then, the following result holds.

Theorem 28

In a non-paternalistic game as described above with nonatomic transitions, there exists a stationary equilibrium (c*, v*) ∈ F × V .

Theorem 28 was established as Theorem 1 in Balbus et al. (2016). Ray (1987) analysed games with non-paternalistic altruism and deterministic production functions. Unfortunately, his proof contains a mistake. The above result is strongly based on the assumption that the transitions are nonatomic and weakly continuous. The problem in the deterministic model of Ray (1987) remains open. However, Theorem 28 implies that an equilibrium exists if a “small nonatomic noise” is added to the deterministic transition function.

There is a great deal of work devoted to so-called hyperbolic decision makers, for whom the function w in (G3) has a specific form. Namely,

$$\displaystyle \begin{aligned} w(a_k,a_{k+1},a_{k+2},\ldots)=\alpha\beta\sum_{m=k}^\infty \beta^{m-k} \tilde{u}(a_m), \end{aligned} $$
(6.10)

where α > 0 is interpreted as a short-run discount factor and β < 1 is known as a long-run discount coefficient. This model was studied by Harris and Laibson (2001) with the transition function defined via the difference equation

$$\displaystyle \begin{aligned} s_{t+1}=R(s_t-a_t)+\xi_t,\quad \quad R\ge 0\quad\mbox{and} \quad t\in\mathbb{ N}. \end{aligned}$$

The random variables \((\xi _t)_{t\in \mathbb {N}}\) are nonnegative, independent, and identically distributed with respect to a nonatomic probability measure. The function \(\tilde {u}\) satisfies a rather restrictive condition concerning the risk aversion of the decision maker, but it may be unbounded from above. Working in the class of strategies with locally bounded variation, Harris and Laibson (2001) showed the existence of a stationary Markov perfect equilibrium in their model with a concave utility function \( \tilde {u}\). They also derived a strong hyperbolic Euler relation. The model considered by Harris and Laibson (2001) can also be viewed as a game between generations; see Balbus and Nowak (2008), Nowak (2010), and Jaśkiewicz and Nowak (2014a), where related versions are studied. However, its main interpretation in the economics literature is that of a decision problem in which the utility of an economic agent changes over time. Thus, the agent is represented by a sequence of selves, and the problem is to find a time-consistent solution. This solution is actually a stationary Markov perfect equilibrium obtained by treating the selves as players in an intergenerational game. For further details and references, the reader is referred to Harris and Laibson (2001) and Jaśkiewicz and Nowak (2014a). A decision model with time-inconsistent preferences involving selves who can stop the process at any stage was recently studied by Cingiz et al. (2016).
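Under (6.10), the current self weights its own utility by 1 and the utility m ≥ 1 periods ahead by αβ^m. The following lines (with a hypothetical utility \(\tilde{u}(a)=\sqrt{a}\)) compute this quasi-hyperbolic value of a consumption path and, for comparison, the standard geometric value obtained with α = 1.

```python
import numpy as np

# The quasi-hyperbolic ("beta-delta") evaluation implied by (6.10): weight 1 on
# the current utility and alpha * beta**m on the utility m >= 1 periods ahead.
def hyperbolic_value(path, alpha, beta, u=np.sqrt):
    flows = u(np.asarray(path, dtype=float))
    weights = np.concatenate(([1.0], alpha * beta ** np.arange(1, len(flows))))
    return float(weights @ flows)

path = [1.0, 1.0, 1.0, 1.0, 1.0]
print(hyperbolic_value(path, alpha=0.7, beta=0.95))   # present-biased self
print(hyperbolic_value(path, alpha=1.0, beta=0.95))   # standard geometric discounting
```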

The model with the function w defined in (6.10) can be extended by adding to the transition probabilities an unknown parameter θ. Then, the natural solution for such a model is a robust Markov perfect equilibrium. Roughly speaking, this solution is based on the assumption that the generations involved in the game are risk-sensitive and accept a maxmin utility. More precisely, let Θ be a nonempty Borel subset of Euclidean space \(\mathbb {R}^m\) (m ≥ 1). Then, the endowment st+1 for generation t + 1 is determined by the transition q from S × Θ to S that depends on the investment yt ∈ A(st) and a parameter θt ∈ Θ. This parameter is chosen according to a certain probability measure \(\gamma _t\in \mathcal {P}\), where \(\mathcal {P}\) denotes the action set of nature and it is assumed to be a Borel subset of \( \Pr (\varTheta )\).

Let Γ be the set of all sequences \((\gamma _t)_{t\in \mathbb {N}}\) of Borel measurable mappings \(\gamma _t: D\to \mathcal {P}\), where D = {(s, a) :  s ∈ S, a ∈ A(s)}. For any \(t\in \mathbb {N}\) and \(\gamma =(\gamma _t)_{t\in \mathbb {N}}\in \varGamma \), we set \(\gamma ^t := (\gamma _\tau )_{\tau \ge t}\). Clearly, \(\gamma ^t \in \varGamma \). A Markov strategy for nature is a sequence \(\gamma =(\gamma _t)_{t\in \mathbb {N}}\in \varGamma \). Note that \(\gamma ^t\) can be called a Markov strategy used by nature from period t onward.

For any \(t\in \mathbb {N}\), define Ht as the set of all sequences

$$\displaystyle \begin{aligned} h^t= (a_t,\theta_t,s_{t+1},a_{t+1},\theta_{t+1},\ldots), \quad\mbox{where } (s_k,a_k)\in D,\ \theta_k\in\varTheta \mbox{ and }k\ge t. \end{aligned}$$

Ht is the set of all feasible future histories of the process from period t onward. Endow Ht with the product σ-algebra. Assume in addition that \( \tilde {u}\le 0\), the generations employ a stationary strategy c ∈ Φ and nature chooses some γ ∈ Γ. Then the choice of nature is a probability measure depending on (st, c(st)). Let \(E^{c,\gamma ^t}_{s_t}\) denote as usual the expectation operator corresponding to the unique probability measure on Ht induced by a stationary strategy c ∈ Φ used by each generation τ (τ ≥ t), a Markov strategy of nature \(\gamma ^t\in \varGamma \) and the transition probability q. Assume that all generations from t onward use c ∈ Φ and nature applies a strategy \(\gamma ^t\in \varGamma \). Then, generation t’s expected utility is of the following form:

$$\displaystyle \begin{aligned} \hat{W}(c)(s_t):=\inf_{\gamma^t\in \varGamma} E_{s_t}^{c,\gamma^t}\left(\tilde{u}(c(s_t))+ \alpha\beta\sum_{m=t+1}^\infty \beta^{m-t-1}\tilde{u} (c(s_{m}))\right). \end{aligned}$$

This definition of utility in an intergenerational game provides an intuitive notion of ambiguity aversion, which can be regarded as the generations’ distrust of any imprecisely specified uncertainty, something that leaves room for the malevolent influence of nature. Letting

$$\displaystyle \begin{aligned} \hat{J}(c)(s_j)= \inf_{\gamma^j\in\varGamma}E_{s_{j}}^{c,\gamma^j}\left(\sum_{m=j}^\infty \beta^{m-j}\tilde{u}(c(s_{m}))\right) \end{aligned}$$

one can show that

$$\displaystyle \begin{aligned} \hat{W}(c)(s_t) = \tilde{u}(c(s_t))+\inf_{\xi\in\mathcal{P}}\alpha\beta\int_S \hat{J}(c)(s_{t+1})q(ds_{t+1}|s_t-c(s_t),\xi). \end{aligned}$$

For any s ∈ S, a ∈ A(s), and c ∈ Φ, we set

$$\displaystyle \begin{aligned} \hat{P}(a,c)(s)=\tilde{u}(a)+\inf_{\xi\in\mathcal{P}}\alpha\beta\int_S \hat{J }(c)(s^{\prime })q(ds^{\prime }|s-a,\xi). \end{aligned}$$

If s = st, then \(\hat {P}(a,c)(s)\) is the utility for generation t choosing a ∈ A(st) in this state when all future generations employ a stationary strategy c ∈ Φ.

A robust Markov perfect equilibrium is a function c∈ Φ such that for every s ∈ S we have

$$\displaystyle \begin{aligned} \sup_{a\in A(s)} \hat{P}(a,c^*)(s)=\hat{P}(c^*(s),c^*)(s)=\hat{W}(c^*)(s). \end{aligned}$$
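When the transition probability is a convex combination of finitely many measures and nature may mix freely over them, the integral in \(\hat{P}\) is linear in ξ, so the infimum is attained at one of the candidate measures. The sketch below exploits this observation on a finite grid; all primitives, including the values of Ĵ(c), are hypothetical stand-ins.

```python
import numpy as np

# Worst-case ("robust") evaluation of P_hat(a, c)(s): with finitely many
# candidate transition measures and nature free to mix over them, minimizing
# over the candidate measures themselves suffices.
S = np.linspace(0.0, 4.0, 41)
mu = [np.exp(-0.5 * (S - m) ** 2) for m in (1.0, 3.0)]   # two candidate laws
mu = [w / w.sum() for w in mu]
u_t = lambda a: -np.exp(-a)                  # u-tilde <= 0, increasing, concave
alpha, beta = 0.7, 0.95

def P_hat(a, J_vals):
    """u(a) + inf over nature of alpha*beta * integral of J_hat under q(.|s-a, xi)."""
    cont = min(float(w @ J_vals) for w in mu)    # worst-case continuation value
    return u_t(a) + alpha * beta * cont

J_vals = u_t(0.5 * S)                        # hypothetical values of J_hat(c) on the grid
print(P_hat(1.0, J_vals))
```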

The existence of a robust Markov perfect equilibrium in the aforementioned model was proved by Balbus et al. (2014) under the assumption that the transition probability is a convex combination of probability measures μ1, …, μl on S with coefficients depending on the investment y = s − a. A robust Markov perfect equilibrium was obtained in the class of functions F under the condition that all measures μ1, …, μl are nonatomic. If μ1, …, μl have atoms, then some stochastic dominance conditions are imposed, and the equilibrium is obtained in the class of Lipschitz continuous functions with constant 1. A different approach was presented in the work of Jaśkiewicz and Nowak (2014b), where the set of endowments S and the set of consumptions are Borel and the parameter set Θ is finite. Assuming again that the transition probability is a finite convex combination of probability measures μ1, …, μl on S depending on the parameter θ, with coefficients depending on the inheritance s and the consumption level a, they established a twofold result. First, they proved the existence of a robust Markov perfect equilibrium in the class of randomized strategies. Second, assuming that μ1, …, μl are nonatomic and making use of the Dvoretzky-Wald-Wolfowitz purification theorem, they replaced a randomized equilibrium by a pure one.

The models of intergenerational games with general spaces of consumptions and endowments were also examined by Jaśkiewicz and Nowak (2014a). A novel feature in this approach is the fact that generation t can employ the entropic risk measure to calculate its utilities. More precisely, if Z is a random variable with the distribution π, then its entropic risk measure is \(\mathcal {E}(Z)=\frac {1}{r} \ln \int _{\varOmega } e^{rZ(\omega )}\pi (d\omega )\), where r < 0 is a risk coefficient. If r is sufficiently close to zero, then making use of the Taylor expansion one can see that

$$\displaystyle \begin{aligned} \mathcal{E}(Z)\approx EZ + \frac{r}{2}\mathrm{Var}(Z). \end{aligned}$$

This means that a generation which uses the entropic risk measure to calculate its utility is risk averse and takes into account not only the expected value of its successors’ random utilities derived from consumption but also their variance. Assuming that each generation cares only about its m descendants and that the transition probability is a convex combination of finitely many nonatomic measures on the endowment space with coefficients that may depend on s and a, Jaśkiewicz and Nowak (2014a) proved the existence of a stationary Markov perfect equilibrium in pure strategies. The same result was shown for games with infinitely many descendants in the case of hyperbolic preferences. In both cases the proof consists of two parts. First, a randomized stationary Markov perfect equilibrium is shown to exist. Second, making use of the specific structure of the transition probability and applying the Dvoretzky-Wald-Wolfowitz theorem, the desired pure stationary Markov perfect equilibrium is obtained.
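The quality of this Taylor approximation can be checked numerically. In the sketch below, Z is exponentially distributed (a hypothetical choice), so the entropic risk measure has no exact mean–variance form: the two quantities agree for r close to zero and drift apart as |r| grows.

```python
import numpy as np

rng = np.random.default_rng(3)

# Comparing the entropic risk measure (1/r) ln E[e^{rZ}] with the mean-variance
# approximation EZ + (r/2) Var(Z) for risk coefficients r < 0.
Z = rng.exponential(1.0, 1_000_000)
for r in (-0.01, -0.1, -1.0):
    exact = np.log(np.mean(np.exp(r * Z))) / r
    approx = Z.mean() + 0.5 * r * Z.var()
    print(f"r={r:6}: entropic={exact:.5f}, Taylor approx={approx:.5f}")
```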

A game related to the above-mentioned models is one with quasi-geometric discounting from dynamic consumer theory; see Chatterjee and Eyigungor (2016). In particular, the authors showed that in natural cases such a game does not possess a Markov perfect equilibrium in the class of continuous strategies. However, a continuous Markov perfect equilibrium exists if the model is reformulated to involve lotteries. These two models were then compared numerically. It is known that the numerical analysis of equilibria in models with hyperbolic (quasi-geometric) discounting encounters difficulties in achieving convergence even in a simple, deterministic optimal growth problem that has a smooth closed-form solution. Maliar and Maliar (2016) defined some restrictions on the equilibrium strategies under which the numerical methods they studied deliver a unique smooth solution for many deterministic and stochastic models.

Finally, we wish to point out that Markov perfect equilibria for stochastic bequest games with transition probabilities and utilities depending on time were shown to exist in Balbus et al. (2017, 2018).

12 Stopping Games

Stopping games were introduced by Dynkin (1969) as a generalization of optimal stopping problems. They were used in several models in economics and operations research, for example, in equipment replacement, job search, and consumer purchase behavior; see Heller (2012).

Dynkin (1969) dealt with the following problem. Two players observe a bivariate sequence of adapted random variables \((X(k),Y(k))_{k\in \mathbb {N}_0}\), where \(\mathbb {N}_0= \mathbb {N}\cup \{0\}\). Player 1 chooses a stopping time τ1 such that {τ1 = k}⊂{X(k) ≥ 0}, whereas player 2 selects τ2 such that {τ2 = k}⊂{X(k) < 0}. If τ1 ∧ τ2 is finite, then player 2 pays Y (τ1 ∧ τ2) to player 1 and the game terminates. Hence, the objective of player 1 (respectively 2) is to maximize (minimize) R(τ1, τ2) = E[Y (τ1 ∧ τ2)]. Dynkin (1969) characterized 𝜖-optimal stopping times and proved that the game has a value provided that \(\sup _{k\in \mathbb {N}_0}|Y(k)|\) is integrable. This model was later extended by Kiefer (1971) and Neveu (1975). In particular, Neveu (1975) showed the existence of a game value in a slightly modified model. Namely, he dealt with the following expected payoff function:

$$\displaystyle \begin{aligned} R(\tau_1,\tau_2)= E[X(\tau_1)1[\tau_1<\tau_2]+Y(\tau_2)1[\tau_2\le \tau_1]], \end{aligned}$$

where \((X(k))_{k\in \mathbb {N}_0}\) and \((Y(k))_{k\in \mathbb {N}_0}\) are \( \mathbb {R}\)-valued adapted stochastic processes such that \(\sup _{k\in \mathbb { N}_0}(X^+(k)+Y^-(k))\) is integrable and X(k) ≤ Y (k) for all \(k\in \mathbb { N}_0\). The game considered by Neveu (1975) was generalized by Yasuda (1985), who dropped the latter ordering assumption. In his model, the expected payoff function takes the following form:

$$\displaystyle \begin{aligned} R(\tau_1,\tau_2)= E[X(\tau_1)1[\tau_1<\tau_2]+Y(\tau_2)1[\tau_2< \tau_1]+Z(\tau_1)1[\tau_1=\tau_2]], \end{aligned}$$

where as usual \((X(k))_{k\in \mathbb {N}_0}\), \((Y(k))_{k\in \mathbb {N}_0}\) and \( (Z(k))_{k\in \mathbb {N}_0}\) are adapted integrable random variables. Yasuda (1985) considered randomized strategies instead of pure ones. According to Yasuda (1985) a strategy for a player is an adapted random sequence \( p=(p_k)_{k\in \mathbb {N}_0}\) (or \(q=(q_k)_{k\in \mathbb {N}_0}\)) such that 0 ≤ pk, qk ≤ 1 with probability one. Here, pk (or qk) stands for the probability that the player stops the game at time k conditional on the event that the game was not stopped before. In computing the payoff induced by a pair of strategies (p, q), one assumes that the randomizations performed by the players in various stages are mutually independent and independent of the payoff processes. Thus, a strategy that corresponds to a stopping time σ is pk = 0 on the event [σ > k] and pk = 1 on the event [σ ≤ k]. Yasuda (1985) proved the existence of the value in the set of randomized strategies in finite and discounted infinite time horizon problems.

Before formulating the next result, we define the stopping stages for players 1 and 2 by \(\theta _1:=\inf \{k\in \mathbb {N}_0 :\ P(k)\le p_k\}\), and \(\theta _2:=\inf \{k\in \mathbb {N}_0 :\ Q(k)\le q_k\}, \) where \((P(k), Q(k))_{k\in \mathbb {N}_0}\) is a double sequence of i.i.d. random variables uniformly distributed over [0, 1] satisfying certain independence assumptions imposed in Rosenberg et al. (2001). Set θ = θ1 ∧ θ2. Clearly, θ is the stage at which the game stops. Let us define

$$\displaystyle \begin{aligned} R(p,q)=E[X(\theta_1)1[\theta_1<\theta_2]+Y(\theta_2)1[\theta_2< \theta_1]+Z(\theta_1)1[\theta_1=\theta_2<+\infty]] \end{aligned}$$

and its β-discounted evaluation

$$\displaystyle \begin{aligned} R_{\beta}(p,q)&=(1-\beta)E\big[\beta^{\theta+1}(X(\theta_1)1[\theta_1< \theta_2]+ Y(\theta_2)1[\theta_2< \theta_1]\\ &\quad +Z(\theta_1)1[\theta_1=\theta_2<+\infty])\big]. \end{aligned} $$
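Both payoffs can be estimated by direct simulation of the stopping stages θ1 and θ2. The sketch below does this for the undiscounted payoff R(p, q), using constant stop probabilities and i.i.d. payoff draws at the stopping stage, both hypothetical simplifications, and assigning payoff 0 when nobody stops within the simulated horizon.

```python
import numpy as np

rng = np.random.default_rng(4)

# Monte Carlo estimate of R(p, q) for behavioural stopping strategies: at each
# stage k, player 1 stops when an independent uniform falls below p, player 2
# when another uniform falls below q.
def estimate_R(p=0.1, q=0.05, runs=100_000, horizon=1_000):
    total = 0.0
    for _ in range(runs):
        for k in range(horizon):
            stop1 = rng.random() <= p        # theta_1 = k on this event
            stop2 = rng.random() <= q        # theta_2 = k on this event
            if stop1 or stop2:
                x, y, z = rng.normal(1.0), rng.normal(-1.0), rng.normal(0.0)
                total += x if (stop1 and not stop2) else (y if (stop2 and not stop1) else z)
                break
        # if nobody stopped within the horizon, the run contributes payoff 0
    return total / runs

print("R(p, q) ~", estimate_R())
```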

The following result was proved by Rosenberg et al. (2001).

Theorem 29

Assume that \(E[\sup _{k\in \mathbb {N}_0} (|X(k)|+|Y(k)|+|Z(k)|)]<+\infty \). Then the stopping games with the payoffs R(p, q) and Rβ(p, q) have values, say v and vβ, respectively. Moreover, limβ→1vβ = v.

Let us now turn to nonzero-sum Dynkin games. They were considered in several papers; see, for instance, Ferenstein (2007), Krasnosielska-Kobos (2016), Morimoto (1986), Nowak and Szajowski (1999), Ohtsubo (1987), Ohtsubo (1991), Solan and Vieille (2001), and Szajowski (1994). Obviously, the list of references is by no means exhaustive. We start by presenting a result for two-player nonzero-sum stopping games. Assume that the aforementioned processes \((X(k))_{k\in \mathbb {N}_0}\), \((Y(k))_{k\in \mathbb {N}_0}\) and \( (Z(k))_{k\in \mathbb {N}_0}\) are now \(\mathbb {R}^2\)-valued and uniformly bounded, and let ρ be a uniform bound on the payoffs. The payoff of the game is R(p, q) as above, except that \(R(p,q)\in \mathbb {R}^2\). Shmaya and Solan (2004) proved the following result.

Theorem 30

For each 𝜖 > 0, the stopping game has an 𝜖-equilibrium \((p^*_{\epsilon },q^*_{\epsilon })\).

Theorem 30 does not hold if the payoffs are not uniformly bounded. Its proof is based upon a stochastic version of the Ramsey theorem that was also proved by Shmaya and Solan (2004). It states that for every colouring of a complete infinite graph by finitely many colours, there is a complete infinite monochromatic subgraph. Shmaya and Solan (2004) applied a variation of this result to reduce the problem of the existence of an 𝜖-equilibrium in a general stopping game to the study of the properties of 𝜖-equilibria in a simple class of stochastic games with a finite state space. A similar result for deterministic 2-player nonzero-sum stopping games was reported by Shmaya et al. (2003).

All the aforementioned works deal with the two-player case and/or assume some special structure of the payoffs. Recently, Hamadène and Hassani (2014) studied n-person nonzero-sum Dynkin games. Such a game is terminated at τ := τ1 ∧… ∧ τn, where τi is a stopping time chosen by player i. Then, the corresponding payoff for player i is given by

$$\displaystyle \begin{aligned} R_i(\tau_1,\ldots,\tau_n)=W_{\tau}^{i,I_\tau}, \end{aligned}$$

where Iτ denotes the set of players who make the decision to stop, that is, Iτ = {m ∈{1, …, n} :  τ = τm}, and \(W^{i,I_\tau }\) is the payoff stochastic process of player i. The main assumption says that the payoff of a player is lower when he belongs to the group involved in the decision to stop than when he does not. Hamadène and Hassani (2014) showed that the game has a Nash equilibrium in pure strategies. The proof is based on an approximation scheme whose limit provides a Nash equilibrium.

Krasnosielska-Kobos and Ferenstein (2013) is another paper concerned with multi-person stopping games. More precisely, they consider a game in which players sequentially observe the offers X(1), X(2), … at the jump times T1, T2, … of a Poisson process. It is assumed that the random variables X(1), X(2), … form an i.i.d. sequence. Each accepted offer results in a reward R(k) = X(k)r(Tk), where r is a non-increasing discount function. If more than one player accepts the offer, then the player with the highest priority gets the reward. By making use of the solution to the multiple optimal stopping time problem with the above reward structure, Krasnosielska-Kobos and Ferenstein (2013) constructed a Nash equilibrium which is Pareto efficient.

Mashiah-Yaakovi (2014), on the other hand, studied subgame-perfect equilibria in stopping games. It is assumed that at every stage one of the players is chosen according to a stochastic process, and that player decides whether to continue the interaction or to stop it. The terminal payoff vector is determined by another stochastic process. Mashiah-Yaakovi (2014) defines a weaker concept of subgame-perfect equilibrium, namely, a δ-approximate subgame-perfect 𝜖-equilibrium. A strategy profile is a δ-approximate subgame-perfect 𝜖-equilibrium if it induces an 𝜖-equilibrium in every subgame, except perhaps a set of subgames that occur with probability at most δ. A 0-approximate subgame-perfect 𝜖-equilibrium is actually a subgame-perfect 𝜖-equilibrium. The concept of approximate subgame-perfect equilibrium relates to the concept of “trembling-hand perfect equilibrium” introduced by Selten (1975). A stopping game in which, at every stage, the player who decides whether to stop or continue the game is chosen according to a (in some sense periodic) stochastic process is also studied in Mashiah-Yaakovi (2009). This assumption extends the random priority in stopping games considered, for example, in Szajowski (1994, 1995). Once the chosen player decides to stop, the players receive terminal payoffs that are determined by a second stochastic process. Periodic subgame-perfect 𝜖-equilibria in pure strategies are studied under quite general conditions. Some foundations for stopping n-person games with fixed priorities were provided by Enns and Ferenstein (1987).

Finally, it is worth pointing out that there are three notions of random stopping times. The above-mentioned randomized strategies used by Yasuda (1985) and Rosenberg et al. (2001) are also called behavioral stopping times. A randomized stopping time, on the other hand, is a nonnegative adapted real-valued process \(\rho =(\rho _k)_{k\in \mathbb {N}\cup \{\infty \}}\) that satisfies \(\sum _{k\in \mathbb {N}\cup \{\infty \}}\rho _k=1\). The third concept is that of mixed stopping times ν. Roughly speaking, these are product measurable functions in which the first coordinate r is chosen according to the uniform distribution over the interval [0, 1] at the outset; the stopping time is then ν(r, ⋅). For more details, the reader is referred to Rosenberg et al. (2001). As communicated to us by Eilon Solan, the classes of mixed and randomized stopping times are equivalent by a proper generalization of Kuhn’s theorem.