1 Introduction

The fundamentals of the modern theory of non-cooperative dynamic games were established in the 1950s at Princeton University. First, Nash (1950) introduced the notion of equilibrium for finite n-person static games and proved its existence using the fixed point theorem of Kakutani (1941). Next, Shapley (1953) presented a model of infinite time horizon zero-sum stochastic games with positive stop probability. Fink (1964) and Takahashi (1964) extended his model to nonzero-sum discounted stochastic games with finite state spaces and proved the existence of equilibrium in stationary Markov strategies. Later on, Rogers (1969) and Sobel (1971) obtained similar results for irreducible stochastic games with expected limit-average payoffs. Afterwards, the theory of discrete-time nonzero-sum stochastic games was extended in various directions, inspiring many interesting applications. An overview of selected basic topics in stochastic dynamic games with instructive examples can be found in the books of Başar and Olsder (1995) and Haurie et al. (2012) and the survey paper by Dutta and Sundaram (1998). Theoretically more advanced material is included in the book edited by Neyman and Sorin (2003) and the monograph of Mertens et al. (2015).

In this chapter, we provide an overview of almost all basic streams of research in the area of nonzero-sum discrete-time stochastic games such as: the existence of stationary equilibria in models on both countable and general state spaces, the existence of subgame-perfect equilibria, algorithms, stopping games, correlated and uniform equilibria. Our survey incorporates several examples of games studied in operations research and economics. In particular, separate sections are devoted to intergenerational games, dynamic Cournot competition and game models of resource extraction. The provided reference list not only includes seminal papers that initiated research in various directions but also exposes recent advances in this field.

The paper is organized as follows. In Sect. 2 we describe some basic material needed for an examination of nonzero-sum stochastic games with general state spaces. To make the presentation less technical, we restrict our attention to Borel space models. Many of the results described in this chapter are stated in the literature in a more general framework; however, the Borel state space models are sufficiently rich for most applications. The first part of Sect. 2 includes auxiliary results on set-valued mappings arising in the study of the existence of stationary Nash/correlated equilibria, together with certain known results such as a random version of the Carathéodory theorem and the Mertens measurable selection theorem. The second part is devoted to supermodular games. Sect. 3 deals with the concept of subgame-perfect equilibrium in games on a Borel state space and introduces different classes of strategies in which subgame-perfect equilibria may be obtained. Sect. 4 includes results on correlated equilibria with public signals proved for games on Borel state spaces, whereas Sect. 5 presents the state-of-the-art results on the existence of stationary equilibria (further called “stationary Markov perfect equilibria”) in discounted stochastic games. This theory has found applications in several examples examined in operations research and economics. Namely, in Sect. 6 we provide models with special but natural transition structures for which there exist stationary equilibria. Sect. 7 recalls papers in which the existence of an equilibrium was proved for stochastic games on denumerable state spaces; it covers discounted models as well as models with the limit-average payoff criterion. Moreover, it is also shown that a discounted game with a Borel state space can be approximated, under some assumptions, by simpler games with countable state spaces. Sect. 8 reviews algorithms applied to nonzero-sum stochastic games. In particular, it is shown how a formulation of a linear complementarity problem can be helpful in solving games with discounted and limit-average payoff criteria and special transition structure. In addition, we also mention the homotopy methods applied to this issue. Sect. 9 presents material on games with finite state and action spaces, while Sect. 10 deals with games with product state spaces. In Sect. 11 we formulate results proved for various intergenerational games. Our models incorporate paternalistic and non-paternalistic altruistic economic growth models: games with one, finitely many, or infinitely many descendants as well as games on one- and multidimensional commodity spaces. Finally, Sect. 12 provides a short overview of stopping games, beginning from the Dynkin extension of Neveu’s stopping problem.

2 Preliminaries

In this section, we recall essential notation and several facts, which are crucial for studying Nash and correlated equilibria in nonzero-sum stochastic games with uncountable state spaces. Here we follow the preliminaries in Jaśkiewicz and Nowak (2018a). Let N = {1, …, n} be the set of players. Let X, A1, …, An be Borel spaces. Assume that, for each i ∈ N, x → Ai(x) ⊂ Ai is a lower measurable compact-valued action correspondence for player i. This is equivalent to saying that the graph of this correspondence is a Borel subset of X × Ai. Let A := A1 × ⋯ × An. We consider a nonzero-sum n-person game parameterized by a state variable x ∈ X. The payoff or utility function for player i ∈ N is \(r_i:X\times A \to \mathbb {R}\), and it is assumed that ri is bounded, ri(⋅, a) is Borel measurable for each a ∈ A, and ri(x, ⋅) is continuous on A for each x ∈ X. Assuming that each player i ∈ N chooses a mixed strategy \(\nu _i\in \Pr (A_i(x))\) and writing ν := (ν1, …, νn), we denote the expected payoff to player i by

$$\displaystyle \begin{aligned} P^i(x,\nu):= \int_{A_n(x)}\cdots \int_{A_1(x)} r_i(x,a_1,\ldots,a_n)\nu_1(da_1)\times\cdots \times\nu_n(da_n). \end{aligned}$$

A strategy profile \(\nu ^*=(\nu _1^*,\ldots ,\nu _n^*)\) is a Nash equilibrium in the game for a given state x ∈ X if

$$\displaystyle \begin{aligned} P^i(x,\nu^*) \ge P^i(x,(\nu_i,\nu^*_{-i})) \end{aligned}$$

for every i ∈ N and \(\nu _i\in \Pr (A_i(x)).\) As usual, \((\nu _i,\nu ^*_{-i})\) denotes ν∗ with \(\nu ^*_i\) replaced by νi. For any x ∈ X, by \( \mathcal {N} (x)\), we denote the set of all Nash equilibria in the considered game. By Glicksberg (1952), \(\mathcal {N}(x)\neq \emptyset .\) It is easy to see that \(\mathcal {N}( x)\) is compact. Let \(\mathcal {NP}(x)\subset \mathbb {R} ^n\) be the set of payoff vectors corresponding to all Nash equilibria in \( \mathcal {N}( x).\) By co, we denote the convex hull operator in \(\mathbb {R} ^n.\)
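To make the objects \(\mathcal {N}(x)\) and \(\mathcal {NP}(x)\) concrete, the following minimal Python sketch enumerates the pure Nash equilibria of a finite approximation of the one-shot game at a fixed state. All primitives (the grid standing in for Ai(x), the payoff r, and the state value) are illustrative, and only pure profiles are checked, whereas \(\mathcal {N}(x)\) in general consists of mixed profiles.

```python
from itertools import product

def pure_nash(action_sets, payoff):
    """Enumerate pure Nash equilibria of a finite n-person game.

    action_sets[i]: finite list standing in for A_i(x);
    payoff(i, a): player i's payoff at the joint action a.
    """
    n = len(action_sets)
    return [a for a in product(*action_sets)
            if all(payoff(i, a) >= payoff(i, a[:i] + (b,) + a[i + 1:])
                   for i in range(n) for b in action_sets[i])]

# Illustrative state-parameterized payoffs (hypothetical r_i):
x = 0.5
grid = (0.0, 0.25, 0.5)                # finite grid standing in for A_i(x)
r = lambda i, a: x * a[i] * (2.0 - a[0] - a[1])
print(pure_nash([grid, grid], r))      # -> [(0.5, 0.5)]
```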

Proposition 1

The mappings \(x\to {\mathcal {N}}( x)\) , \(x\to {\mathcal {NP}}( x)\) and \(x\to co{\mathcal {NP}}( x)\) are compact-valued and lower measurable.

For a detailed discussion of these statements, consult Nowak and Raghavan (1992), Himmelberg (1975), and Klein and Thompson (1984). By Kuratowski and Ryll-Nardzewski (1965), every set-valued mapping in Proposition 1 has a Borel measurable selection. Making use of standard results on measurable selections (see Castaing and Valadier 1977) and Carathéodory’s theorem, we obtain the following result.

Proposition 2

Let \(b:X\to \mathbb {R}^n\) be a Borel measurable selection of the mapping \(x\to co{\mathcal {NP}}( x).\) Then, there exist Borel measurable selections \(b^i:X\to \mathbb {R}^n\) and Borel measurable functions λi : X → [0, 1] (i = 1, …, n + 1) such that for each x ∈ X, we have

$$\displaystyle \begin{aligned} \sum_{i=1}^{n+1}\lambda^i(x)=1\quad\mathit{\mbox{ and}}\quad b(x)=\sum_{i=1}^{n+1}\lambda^i(x)b^i(x). \end{aligned}$$
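Numerically, the decomposition in Proposition 2 can be mimicked for a finite set of candidate payoff vectors by a feasibility linear program: any basic feasible solution puts positive weight on at most n + 1 points, which is exactly Carathéodory's theorem in \(\mathbb {R}^n\). The sketch below is plain numerics (the measurable dependence on x, which is the substance of the proposition, is not modeled) and assumes SciPy is available.

```python
import numpy as np
from scipy.optimize import linprog

def caratheodory_weights(points, b):
    """Find lam >= 0 with sum(lam) = 1 and lam @ points = b.

    points: (m, n) array, e.g., a finite subset of NP(x);
    b: target vector in the convex hull of the rows of points.
    A basic solution of this LP has at most n + 1 positive weights.
    """
    m, _ = points.shape
    A_eq = np.vstack([points.T, np.ones((1, m))])  # n moments + normalization
    b_eq = np.append(b, 1.0)
    res = linprog(np.zeros(m), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, None)] * m, method="highs")
    return res.x if res.success else None

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(caratheodory_weights(pts, np.array([0.5, 0.5])))
```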

As in Nowak and Raghavan (1992), Filippov's measurable implicit function theorem yields the following facts.

Proposition 3

Let \(p:X\to \mathbb {R}^n\) be a Borel measurable selection of the mapping \(x\to {\mathcal {NP}}( x).\) Then, there exists a Borel measurable selection ψ of the mapping \(x\to {\mathcal {N}}( x)\) such that

$$\displaystyle \begin{aligned} p(x) = (P^1(x,\psi(x)),\ldots,P^n(x,\psi(x)))\quad\mathit{\mbox{for all}}\quad x\in X. \end{aligned}$$

Proposition 4

If \(b:X\to \mathbb {R}^n\) is a Borel measurable selection of the mapping \(x\to co{\mathcal {NP}}( x)\), then there exist Borel measurable selections ψi of the mapping \(x\to {\mathcal {N}}( x)\) and Borel measurable functions λi : X → [0, 1] (i = 1, …, n + 1) such that for each x ∈ X, we have \(\sum _{i=1}^{n+1}\lambda ^i(x)=1\) and

$$\displaystyle \begin{aligned} b(x) =\sum_{i=1}^{n+1} \lambda^i(x)(P^1(x,\psi^i(x)),\ldots,P^n(x,\psi^i(x))). \end{aligned}$$

The following result plays an important role in studying Nash equilibria in stochastic games with Borel state spaces and can be deduced from Theorem 2 in Mertens (2003) and Filippov’s measurable implicit function theorem. It is related to Lyapunov’s theorem on the range of nonatomic measures and also has some predecessors in control theory; see Artstein (1989).

Proposition 5

Let μ be a nonatomic Borel probability measure on X. Assume that qj (j = 1, …, l) are Borel measurable transition probabilities from X to X and, for every j and x ∈ X, qj(⋅|x) ≪ μ, i.e., qj(⋅|x) is dominated by μ. Let \(w^0:X\to \mathbb {R}^n\) be a Borel measurable mapping such that \(w^0(x)\in co{\mathcal {NP}}(x)\) for each x ∈ X. Then there exists a Borel measurable mapping \(v^0:X\times X\to \mathbb {R}^n\) such that \(v^0(x,y)\in {\mathcal {NP}}(x)\) for all x, y ∈ X and

$$\displaystyle \begin{aligned} \int_Xw^0(y)q_j(dy|x)=\int_Xv^0(x,y)q_j(dy|x),\quad j=1,\ldots,l. \end{aligned}$$

Moreover, there exists a Borel measurable mapping \(\phi :X\times X \to \Pr (A)\) such that \(\phi (x,y)\in {\mathcal {N}}(x)\) for all x, y ∈ X.

Let L be a lattice contained in Euclidean space \(\mathbb {R}^k\) equipped with the component-wise order ≥. For any x, y ∈ L, x ∨ y (x ∧ y) denotes the join (meet) of x and y. A function \(\phi :L\to \mathbb {R}\) is supermodular if for any x, y ∈ L, it holds

$$\displaystyle \begin{aligned} \phi(x\vee y)+\phi(x\wedge y)\ge \phi(x)+\phi(y). \end{aligned}$$

Clearly, if k = 1, then any function ϕ is supermodular. Let \(L_1\subset \mathbb {R}^k\), \(L_2\subset \mathbb {R}^l\) be lattices. A function \( \psi :L_1\times L_2\to \mathbb {R}\) has increasing differences in (x, y) if for every x′ ≥ x in L1, ψ(x′, y) − ψ(x, y) is non-decreasing in y. Let the set Ai of pure strategies of player i ∈ N be a compact convex subset of Euclidean space \(\mathbb {R}^{m_i}.\) An element ai of Ai is denoted by \(a_i =(a_{i1}, a_{i2},\ldots ,a_{im_i}).\) We consider an n-person game G0 in which \(R_i:A \to \mathbb {R}\) is the payoff function of player i ∈ N and A := A1 ×⋯ × An. As usual, any strategy profile a = (a1, a2, …, an) can also be denoted as (ai, a−i) for i ∈ N.

Assume that every Ai is a lattice. The game G0 is called supermodular if for every player i ∈ N and every a−i, the function ai → Ri(ai, a−i) is supermodular and Ri has increasing differences in (ai, a−i).

It is well known that any supermodular game with continuous utility functions and compact strategy sets Ai has a pure Nash equilibrium; see Topkis (1998) or Theorems 4 and 5 in Milgrom and Roberts (1990).
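This existence result has a constructive flavor: starting from the smallest joint action and iterating smallest best responses produces a monotone sequence terminating at the smallest pure equilibrium. A minimal sketch on a finite grid, with a hypothetical payoff exhibiting increasing differences:

```python
import numpy as np

def smallest_equilibrium(grids, payoff, max_iter=10_000):
    """Round-robin smallest-best-response iteration (cf. Topkis 1998)."""
    a = [g[0] for g in grids]                    # start at the smallest profile
    for _ in range(max_iter):
        prev = list(a)
        for i in range(len(grids)):
            vals = [payoff(i, a[:i] + [b] + a[i + 1:]) for b in grids[i]]
            a[i] = grids[i][int(np.argmax(vals))]  # argmax = smallest maximizer
        if a == prev:
            return a                             # no player wants to deviate
    raise RuntimeError("no convergence")

grid = list(np.linspace(0.0, 1.0, 11))
# R_i(a) = -(a_i - 0.3 - 0.4 a_j)^2 has increasing differences in (a_i, a_j).
R = lambda i, a: -(a[i] - 0.3 - 0.4 * a[1 - i]) ** 2
print(smallest_equilibrium([grid, grid], R))     # -> [0.5, 0.5]
```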

The game G0 is called smooth if every Ri can be extended from A to an open superset \(A^o\) in such a way that its second-order partial derivatives exist and are continuous on \(A^o\).

A game G0 is called a smooth supermodular game if for every player i ∈ N,

  1. (a)

    Ai is a compact interval in \(\mathbb {R}^{m_i}\),

  2. (b)

    \( \frac {\partial ^2 R_i}{\partial a_{ij} \partial a_{ik}}\ge 0\) on A for all 1 ≤ j < k ≤ mi,

  3. (c)

    \(\frac {\partial ^2 R_i}{\partial a_{ij} \partial a_{kl}}\ge 0\) on A for each k ≠ i and all 1 ≤ j ≤ mi, 1 ≤ l ≤ mk.

It is well known that any game satisfying conditions (a)–(c) is supermodular. Conditions (a) and (b) imply that Ri is a supermodular function with respect to ai for fixed a−i, while conditions (a) and (c) imply that Ri has increasing differences in (ai, a−i). For a detailed discussion of these issues, see Topkis (1978, 1998) or Theorem 4 in Milgrom and Roberts (1990).

In order to obtain uniqueness of equilibrium in a smooth supermodular game G0, one needs an additional assumption, often called a strict diagonal dominance condition; see page 1271 in Milgrom and Roberts (1990) or Rosen (1965). As noted by Curtat (1996), this condition can be described for smooth supermodular games as follows. Let Mi := {1, 2, …, mi}.

  1. (C1)

    For every i ∈ N and j ∈ Mi,

    $$\displaystyle \begin{aligned} \frac{\partial^2 R_i}{\partial a^2_{ij}} +\sum_{l\in M_i\setminus \{j\}} \frac{\partial^2 R_i}{\partial a_{ij} \partial a_{il}}+\sum_{k\in N\setminus\{i\}} \sum_{l\in M_k} \frac{\partial^2 R_i}{\partial a_{ij} \partial a_{kl}}<0. \end{aligned}$$

From Milgrom and Roberts (1990) and page 187 in Curtat (1996), we obtain the following auxiliary result.

Proposition 6

Any smooth supermodular game G0 satisfying condition (C1) has a unique pure Nash equilibrium.

Assume now that the payoff functions Ri are parameterized by τ in some partially ordered set T, i.e., \(R_i: A\times T\to \mathbb {R}.\) We introduce a new condition.

  1. (C2)

    \(\frac {\partial ^2 R_i}{\partial a_{ij} \partial \tau }\ge 0\) for all 1 ≤ j ≤ mi, and i ∈ N.

It is known that the set of Nash equilibria in any supermodular game G0 is a lattice and has smallest and largest elements. The following result is a consequence of Theorem 7 in Milgrom and Roberts (1990).

Proposition 7

Suppose that a smooth supermodular game satisfies (C2). Then, the largest and smallest pure Nash equilibria are non-decreasing functions of τ.
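A quick numerical illustration of Proposition 7, using the hypothetical payoffs \(R_i(a,\tau )=-(a_i-\tau -0.4a_j)^2\), which satisfy (C2); the (here unique) equilibrium moves up with τ:

```python
import numpy as np

def equilibrium(grid, target, max_iter=1000):
    """Two-player best-response iteration on a common action grid;
    target(i, a) is player i's unconstrained best response to a."""
    a = [grid[0], grid[0]]
    for _ in range(max_iter):
        prev = list(a)
        for i in (0, 1):
            a[i] = min(grid, key=lambda b: (b - target(i, a)) ** 2)
        if a == prev:
            return a
    raise RuntimeError("no convergence")

grid = list(np.linspace(0.0, 1.0, 51))
for tau in (0.1, 0.2, 0.3):
    # R_i(a, tau) = -(a_i - tau - 0.4 a_j)^2 satisfies (C2):
    # d^2 R_i / (da_i dtau) = 2 > 0, so the equilibrium rises with tau.
    print(tau, equilibrium(grid, lambda i, a, t=tau: t + 0.4 * a[1 - i]))
```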

3 Subgame-Perfect Equilibria in Stochastic Games with General State Space

We consider an n-person nonzero-sum discounted stochastic game G as defined below.

  • \((X,\mathcal {B}(X))\) is a nonempty Borel state space with its Borel σ-algebra \(\mathcal {B}(X).\)

  • Ai is a Borel space of actions for player i ∈ N := {1, …, n}.

  • Ai(x) ⊂ Ai is a set of actions available to player i ∈ N in state x ∈ X. The correspondence x → Ai(x) is lower measurable and compact-valued. Define

    $$\displaystyle \begin{aligned} A:=A_1\times \ldots\times A_n\quad\mbox{and}\quad A(x)=A_1(x)\times\ldots\times A_n(x),\quad\;x\in X. \end{aligned}$$
  • \(u_i:X\times A \to \mathbb {R}\) is a Borel measurable bounded utility (or payoff) function for player i ∈ N. It is assumed that ui(x, ⋅) is continuous on A for every x ∈ X.

  • \(q:X\times A\times \mathcal {B}(X)\to [0,1]\) is a transition probability. We assume that q(D|x, ⋅) is continuous on A for each x ∈ X and \(D\in \mathcal {B}(X).\)

  • β ∈ (0, 1) is a discount factor.

Every stage of the game begins with a state x ∈ X, and after observing x, the players simultaneously choose their actions ai ∈ Ai(x) (i ∈ N) and obtain payoffs ui(x, a), where a = (a1, …, an). A new state x′ is realized from the distribution q(⋅|x, a), and a new period begins with payoffs discounted by β. Let H1 = X and let Ht be the set of all plays ht = (x1, a1, …, xt−1, at−1, xt), where \( a^k=(a_1^k,\ldots ,a_n^k)\in A(x_k)\), k = 1, …, t − 1. A strategy for player i ∈ N is a sequence \(\pi _i=(\pi _{it})_{t\in \mathbb {N}}\) of Borel measurable transition probabilities from Ht to Ai such that πit(Ai(xt)|ht) = 1 for each ht ∈ Ht. The set of strategies for player i ∈ N is denoted by Πi. We let Π := Π1 ×… × Πn. Let Fi (\(F^0_i\)) be the set of all Borel measurable mappings \(f_i: X\times X \to \Pr (A_i)\) (\(\phi _i: X \to \Pr (A_i)\)) such that \(f_i(x^-,x) \in \Pr (A_i(x))\) (\(\phi _i(x) \in \Pr (A_i(x))\)) for each x−, x ∈ X. A stationary almost Markov strategy for player i ∈ N is a constant sequence \((\pi _{it})_{t\in \mathbb {N}}\), where πit = fi for some fi ∈ Fi and for all \(t\in \mathbb {N}.\) If xt is the state of the game at its t-th stage with t ≥ 2, then player i chooses an action using the mixed strategy fi(xt−1, xt). The mixed strategy used at an initial state x1 is fi(x1, x1). The set of all stationary almost Markov strategies for player i ∈ N is identified with the set Fi. A stationary Markov strategy for player i ∈ N is identified with a Borel measurable mapping \(f_i\in F_i^0.\) We say that πi = (πi1, πi2, …) ∈ Πi is a Markov strategy for player i if \(\pi _{it}\in F^0_i\) for all \(t\in \mathbb {N}.\)

Any strategy profile π = (π1, …, πn) ∈ Π together with an initial state x = x1 ∈ X determines a unique probability measure \(P^\pi _x\) on the space H of all plays h = (x1, a1, x2, a2, …) endowed with the product σ-algebra. The expected discounted payoff or utility function for player i ∈ N is

$$\displaystyle \begin{aligned} J_{\beta}^{i,T}(x,\pi)= E^\pi_x\left(\sum_{t=1}^T \beta^{t-1}u_i(x_t,a^t)\right)\quad\mbox{where}\quad T\le\infty. \end{aligned}$$

We shall write \(J_{\beta }^i(x,\pi )\) if T = ∞.
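When q can be sampled from, \(J_{\beta }^{i}(x,\pi )\) is straightforward to estimate by Monte Carlo simulation. The sketch below does this for pure stationary Markov profiles; all model primitives in the usage example are hypothetical, and the horizon truncation introduces an error of order βT.

```python
import numpy as np

def discounted_payoff_mc(x0, policies, u, step, beta=0.95, T=300,
                         runs=2000, seed=0):
    """Monte Carlo estimate of J_beta^i(x, pi) for pure stationary profiles.

    policies[i](x): action of player i in state x; u(i, x, a): stage
    payoff; step(x, a, rng): draws the next state from q(.|x, a).
    Truncating at T leaves an error of order beta**T for bounded payoffs.
    """
    rng = np.random.default_rng(seed)
    n = len(policies)
    totals = np.zeros(n)
    for _ in range(runs):
        x, disc = x0, 1.0
        for _ in range(T):
            a = [f(x) for f in policies]
            for i in range(n):
                totals[i] += disc * u(i, x, a)
            disc *= beta
            x = step(x, a, rng)
    return totals / runs

# Toy two-player growth model (all primitives hypothetical):
policies = [lambda x: 0.2 * x, lambda x: 0.2 * x]
u = lambda i, x, a: np.log(a[i] + 1e-12)
step = lambda x, a, rng: rng.uniform(0.0, max(x - a[0] - a[1], 1e-12))
print(discounted_payoff_mc(1.0, policies, u, step))
```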

A profile of strategies π∗ ∈ Π is called a Nash equilibrium if

$$\displaystyle \begin{aligned} J_{\beta}^{i,T}(x,\pi^*)\ge J_{\beta}^{i,T}(x,(\pi^*_{-i},\pi_i)) \mbox{ for all }x\in X, \,\,\pi_i\in \varPi_i \mbox{ and } i\in N. \end{aligned}$$

A stationary almost Markov (stationary Markov) perfect equilibrium is a Nash equilibrium that belongs to the class of strategy profiles F := F1 ×… × Fn (\( F^0:=F^0_1\times \ldots \times F^0_n\)). A Markov perfect equilibrium, on the other hand, is a Nash equilibrium π∗ in which \(\pi ^*_{it}=f_{it} \) and \(f_{it}\in F_i^0\) for every \(t\in \mathbb {N}\) and every player i ∈ N. The strategies involved in such an equilibrium are called “markovian,” “state-contingent,” or “payoff-relevant”; see Maskin and Tirole (2001). Clearly, every stationary Markov perfect equilibrium is also a Markov perfect equilibrium.

Let π = (π1, …, πn) ∈ Π and ht ∈ Ht. By πi[ht] we denote the conditional strategy for player i that can be applied from stage t onward. Let π[ht] = (π1[ht], …, πn[ht]). Using this notation, one can say that π is a subgame-perfect equilibrium in the stochastic game if for any \(t\in \mathbb {N}\) and every partial history ht ∈ Ht, π[ht] is a Nash equilibrium in the subgame starting at xt, where xt is the last coordinate in ht. This definition refers to the classical idea of Selten (1975).

Let B(X) be the space of all bounded Borel measurable real-valued functions on X and Bn(X) := B(X) ×⋯ × B(X) (n times). Similarly define B(X × X) and Bn(X × X). With any x ∈ X and v = (v1, …, vn) ∈ Bn(X), we associate the one-shot game Γv(x) in which the payoff function to player i ∈ N is

$$\displaystyle \begin{aligned} U_{\beta}^i(v_i,x,a) := u_i(x,a)+\beta \int_X v_i(y)q(dy|x,a), \quad a\in A(x). \end{aligned} $$
(6.1)

If \(\nu =(\nu _1,\ldots ,\nu _n)\in \Pr (A(x))\), then

$$\displaystyle \begin{aligned} U_{\beta}^i(v_i,x,\nu)=\int_{A_n(x)}\ldots\int_{A_1(x)}U_{\beta}^i(v_i,x,a_1,\ldots,a_n)\nu_1(da_1)\times\cdots\times \nu_n(da_n) \end{aligned}$$

and if f = (f1, …, fn) ∈ F0, then \(U_{\beta }^i(v_i,x,f)= U_{\beta }^i(v_i,x,\nu )\) with ν = (f1(x), …, fn(x)). Further, \( U_{\beta }^i(v_i,x,(\mu _i,f_{-i}))=U_{\beta }^i(v_i,x,\nu )\) with νi = μi, νj = fj(x) for j ≠ i. Under our assumptions, \(a\to U_{\beta }^i(v_i,x,a)\) is continuous on A(x) for every vi ∈ B(X), x ∈ X, i ∈ N. Let \(\mathcal {N}_v(x)\) be the set of all Nash equilibria in the game Γv(x). By \(\mathcal {NP}_v(x)\) we denote the set of payoff vectors corresponding to all Nash equilibria in \(\mathcal {N}_v(x).\) Let \( \mathcal {M}_v\) be the set of all Borel measurable selections of the set-valued mapping \(x\to \mathcal {N}_v(x).\) We know from Proposition 1 that \(\mathcal {M}_v\neq \emptyset .\)

Consider a T-stage game (2 ≤ T < ∞). Assume that the (T − 1)-stage subgame starting at any state x2 ∈ X has a Markov perfect equilibrium, say \(\pi ^*_{T-1}.\) Let \(v_{T-1}^*\) be the vector payoff function in Bn(X) determined by \(\pi ^*_{T-1}.\) Then we can take some \(f^*\in \mathcal {M} _{v_{T-1}^*}\) and define \(\pi ^*_T:=(f^*,\pi ^*_{T-1}).\) It is obvious that \( \pi ^*_T\) is a Markov perfect equilibrium in the T-stage game. This fact was proved by Rieder (1979), and we state it below.

Theorem 1

Every finite-stage nonzero-sum discounted stochastic game satisfying the above conditions has a subgame-perfect equilibrium. For any ε > 0, there exists an ε-equilibrium πε in Markov strategies, i.e.,

$$\displaystyle \begin{aligned} J_{\beta}^i(x,\pi^\varepsilon) +\varepsilon \ge J_{\beta}^i(x,(\pi_i,\pi^\varepsilon_{-i})) \quad\mathit{\mbox{for all}}\quad x\in X, \ \pi_i\in \varPi_i \mathit{\mbox{ and }}i\in N. \end{aligned}$$

Note that the ε-equilibrium in the second part of this theorem need not be subgame-perfect.
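The recursive construction preceding Theorem 1 is easy to implement when X and the action sets are finite. The following sketch performs the backward induction, selecting at every stage a Nash equilibrium of the one-shot game with payoffs \(u_i(x,a)+\beta \sum _y v_i(y)q(y|x,a)\); for brevity it searches for pure one-shot equilibria only (and fails if none exists), whereas the theorem in general requires mixed selections.

```python
from itertools import product

def markov_perfect_equilibrium(X, A, u, q, beta, T):
    """Backward induction for a T-stage finite stochastic game.

    X: list of states; A[i]: list of actions of player i;
    u(i, x, a): stage payoff; q(y, x, a): transition probability.
    Returns (policy, value): policy[t][x] is the joint action played
    at stage t + 1, and value[x] the equilibrium payoff vector at x.
    """
    n = len(A)
    v = {x: [0.0] * n for x in X}                 # continuation values
    policy = []
    for _ in range(T):
        def U(i, x, a, v=v):                      # one-shot payoffs, v frozen
            return u(i, x, a) + beta * sum(q(y, x, a) * v[y][i] for y in X)
        f, w = {}, {}
        for x in X:
            for a in product(*A):                 # look for a pure equilibrium
                if all(U(i, x, a) >= U(i, x, a[:i] + (b,) + a[i + 1:])
                       for i in range(n) for b in A[i]):
                    f[x], w[x] = a, [U(i, x, a) for i in range(n)]
                    break
            else:
                raise ValueError(f"no pure one-shot equilibrium at {x!r}")
        policy.insert(0, f)
        v = w
    return policy, v
```

The profile returned is Markov but in general not stationary, which is consistent with Theorem 1.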

We now make an additional assumption.

  1. (A1)

The transition probability q is norm continuous in actions, i.e., for each x ∈ X and any sequence ak → a0 in A(x) as k → ∞, it follows that

    $$\displaystyle \begin{aligned} \sup_{D\in \mathcal{B}(X)}|q(D|x,a^k)-q(D|x,a^0)|\to 0. \end{aligned}$$

Condition (A1) is quite restrictive, but it is satisfied if q has a conditional density with respect to some probability measure on X that is continuous in actions.

Theorem 2

Every discounted nonzero-sum stochastic game G satisfying (A1) has a subgame-perfect equilibrium.

Theorem 2 was proved in a more general form by Mertens and Parthasarathy (2003), where the payoffs and discount factors may depend on time and the state space is a general measurable space. A special case was considered by Mertens and Parthasarathy (1991), who assumed that the action sets are finite and state independent and that the transitions are dominated by some probability measure on X. The proofs given in Mertens and Parthasarathy (1991, 2003) are based upon a fixed point property of an operator defined on the class of measurable selections of compact set-valued mappings from the state space to the payoff space. The fixed point obtained in that class is used to define, in a recursive way, a subgame-perfect equilibrium that consists of history-dependent strategies (unbounded memory is assumed). For further comments on possible extensions of Theorem 2, the reader is referred to Mertens (2002) and Mertens et al. (2015). A modified proof of their results was provided by Solan (1998), who analyzed accumulation points of 𝜖-equilibria (as 𝜖 → 0) obtained in Theorem 1.

Assume that Ai(x) = Ai for each x ∈ X and i ∈ N and that every space Ai is compact. Let X and A1, …, An be given the discrete topology. According to Maitra and Sudderth (2007), a function \(g:H_{\infty } \to \mathbb {R} \) is DS-continuous on H∞ if it is continuous on H∞ endowed with the product topology. It is easy to see that g is DS-continuous on H∞ if and only if, for any 𝜖 > 0 and y = (y1, y2, …) ∈ H∞, there exists m such that |g(y) − g(y′)| < 𝜖 for each \(y^{\prime }=(y_1^{\prime },y_2^{\prime },\ldots )\in H_{\infty }\) such that \(y_l=y^{\prime }_l\) for 1 ≤ l ≤ m. Suppose that \(g_i:H_{\infty }\to \mathbb {R}\) is a bounded Borel measurable payoff function for player i ∈ N. For any strategy profile π ∈ Π and every initial state x = x1, the expected payoff to player i is \(E^\pi _x(g_i).\) The subgame-perfect equilibrium can be defined for this game in the usual way. Maitra and Sudderth (2007) (see Theorem 1.2) obtained a general theorem on the existence of subgame-perfect equilibria for stochastic games.

Theorem 3

Let the payoff functions gi, i ∈ N, be bounded, Borel measurable, and DS-continuous on H∞, and let the action spaces Ai, i ∈ N, be finite. Then the game has a subgame-perfect equilibrium.

The proof of Theorem 3 applies some techniques from gambling theory described in Dubins and Savage (2014), i.e., approximations of DS-continuous functions by “finitary functions.” Theorem 3 extends a result due to Fudenberg and Levine (1983). An example given in Harris et al. (1995) shows that Theorem 3 fails if the action spaces are compact metric and the transition probability q is weakly continuous.

The next result was proved by Maitra and Sudderth (2007) (see Theorem 1.3) for “additive games” and reads as follows.

Theorem 4

Assume that every action space is compact and the transition probability satisfies (A1). Assume that \(g_i(h_{\infty })=\sum _{t=1}^\infty r_{it}(x_t,a^t)\) and this series converges uniformly on H∞. If, in addition, every function rit is bounded, rit(⋅, a) is Borel measurable on X for each a ∈ A := A1 ×⋯ × An and rit(x, ⋅) is continuous on A for each x ∈ X, then the game has a subgame-perfect equilibrium.

It is worth emphasizing that stationary Markov perfect equilibria may fail to exist in the games considered in this section. Namely, Levy (2013) gave a counterexample: a discounted game with an uncountable state space, finite action sets, and deterministic transitions. Then, Levy and McLennan (2015) showed that stationary Markov perfect equilibria may not exist even if the action spaces are finite, X = [0, 1], and the transition probability has a density function with respect to some measure \(\mu \in \Pr (X).\) A simple modification of the example given in Levy and McLennan (2015) shows that a new game (with X = [0, 2]) need not have a stationary Markov perfect equilibrium when the measure μ (dominating the transition probability q) is nonatomic.

4 Correlated Equilibria with Public Signals in Games with Borel State Spaces

Correlated equilibria for normal form games were first studied by Aumann (1974, 1987). In this section we describe an extensive-form correlated equilibrium with public randomization inspired by the work of Forges (1986). A further discussion of correlated equilibria and communication in games can be found in Forges (2009). The sets of all equilibrium payoffs in extensive-form games that include a general communication device are characterized by Solan (2001).

We now extend the sets of strategies available to the players in the sense that we allow them to correlate their choices in some natural way. Suppose that \((\xi _t)_{t\in \mathbb {N}}\) is a sequence of so-called signals, drawn independently from [0, 1] according to the uniform distribution. Suppose that at the beginning of each period t of the game the players are informed not only of the outcome of the preceding period and the current state xt, but also of ξt. Then, the information available to them is a vector ht = (x1, ξ1, a1, …, xt−1, ξt−1, at−1, xt, ξt), where xτ ∈ X, ξτ ∈ [0, 1], aτ ∈ A(xτ), 1 ≤ τ ≤ t − 1. We denote the set of such vectors by Ht. An extended strategy for player i is a sequence \(\pi _i=(\pi _{it})_{t\in \mathbb {N}}\), where πit is a Borel measurable transition probability from Ht to Ai such that πit(Ai(xt)|ht) = 1 for each ht ∈ Ht. An extended stationary strategy for player i ∈ N can be identified with a Borel measurable mapping \(f:X \times [0,1]\to \Pr (A_i)\) such that f(Ai(x)|x, ξ) = 1 for all (x, ξ) ∈ X × [0, 1]. If the players use extended strategies, they effectively play the stochastic game with the extended state space X × [0, 1]. The law of motion, say p, in the extended state space model is the product of the original law of motion q and the uniform distribution η on [0, 1]. More precisely, for any x ∈ X, ξ ∈ [0, 1], a ∈ A(x), and Borel sets C ⊂ X and D ⊂ [0, 1], p(C × D|x, ξ, a) = q(C|x, a)η(D). For any profile of extended strategies π = (π1, …, πn) of the players, the expected discounted payoff to player i ∈ N is a function of the initial state x1 = x and the first signal ξ1 = ξ and is denoted by \(J_{\beta }^i(x,\xi ,\pi ).\) We say that \(f^*=(f^*_1,\ldots ,f^*_n)\) is a Nash equilibrium in the β-discounted stochastic game in the class of extended strategies if for each initial state x1 = x, i ∈ N, and every extended strategy πi of player i, we have

$$\displaystyle \begin{aligned} \int_0^1J_{\beta}^i(x,\xi,f^*)d\xi \ge \int_0^1J_{\beta}^i(x,\xi,(\pi_i,f^*_{-i}))d\xi. \end{aligned}$$

The Nash equilibrium in extended strategies is also called a correlated equilibrium with public signals. The reason is that after the outcome of any period of the game, the players can coordinate their next choices by exploiting the next (known to all of them, i.e., public) signal and using some coordination mechanism telling which (pure or mixed) action is to be played by everyone. In many applications, we are particularly interested in stationary equilibria. In such a case the coordination mechanism can be represented by a family of n + 1 Borel measurable functions λj : X → [0, 1] such that \(\sum _{j=1}^{n+1}\lambda ^j(x)=1\) for each x ∈ X. A stationary correlated equilibrium can then be constructed using a family of n + 1 stationary strategies \(f^1_i,\ldots ,f^{n+1}_i\) given for every player i and the following coordination rule. If the game is in state xt = x at stage t and a random number ξt = ξ is selected, then player i ∈ N is suggested to use \(f_i^k(\cdot |x)\), where k is the least index for which \( \sum _{j=1}^k\lambda ^j(x)\ge \xi .\) The functions λj and \(f^j_i\) induce an extended stationary strategy \(f^*_i\) for every player i as follows:

$$\displaystyle \begin{aligned} f_i^*(\cdot|x,\xi):= f_i^1(\cdot|x)\quad\mbox{if}\quad \xi\le\lambda^1(x), \ \ x\in X, \end{aligned}$$

and

$$\displaystyle \begin{aligned} f_i^*(\cdot|x,\xi):= f_i^k(\cdot|x)\quad\mbox{if}\quad \sum_{j=1}^{k-1}\lambda^j(x)<\xi\le \sum_{j=1}^k\lambda^j(x) \end{aligned}$$

for x ∈ X, 2 ≤ k ≤ n + 1. Because the signals are independent and uniformly distributed in [0, 1], it follows that at any period of the game and for any current state x, the number λj(x) can be interpreted as the probability that player i is suggested to use \(f^j_i(\cdot |x)\) as a mixed action.
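The coordination rule just described is a one-liner in simulation terms; in the sketch below, the profiles and weights are illustrative placeholders.

```python
import numpy as np

def coordinated_action(lambdas, profiles, x, xi):
    """Play f^k(x) for the least k with lambda^1(x)+...+lambda^k(x) >= xi."""
    cum = 0.0
    for lam, f in zip(lambdas, profiles):
        cum += lam(x)
        if xi <= cum:
            return f(x)
    return profiles[-1](x)     # guards against rounding in the cumulative sum

# Two illustrative stationary profiles mixed with equal weights:
lambdas = [lambda x: 0.5, lambda x: 0.5]
profiles = [lambda x: ("low", x / 4), lambda x: ("high", x / 2)]
rng = np.random.default_rng(1)
print(coordinated_action(lambdas, profiles, x=0.3, xi=rng.uniform()))
```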

  1. (A2)

Let \(\mu \in \Pr (X).\) There exists a conditional density function ρ for q with respect to μ such that if ak → a0 in A(x), x ∈ X, as k → ∞, then

    $$\displaystyle \begin{aligned} \lim_{k\to\infty}\int_X|\rho(x,a^k,y)-\rho(x,a^0,y)|\mu(dy) =0. \end{aligned}$$

Theorem 5

Any discounted stochastic game G satisfying (A2) has a stationary correlated equilibrium with public signals.

Theorem 5 was proved by Nowak and Raghavan (1992). First, it is shown, by making use of a theorem in Glicksberg (1952), that the correspondence \(v\to \mathcal {M}_v\) has a fixed point, i.e., there exists w∗ ∈ Bn(X) such that \(w^*(x)\in co\mathcal {NP}_{w^*}(x)\) for all x ∈ X. Then, applying Propositions 2 and 4, one can prove the existence of a stationary correlated equilibrium with public signals for the game with the payoff functions \(U_{\beta }^i(w^*_i,x,a)\) defined in (6.1). A verification that the profile f∗ obtained in this way is indeed a Nash equilibrium in the game with the extended state space X × [0, 1] relies on standard Bellman equations for discounted dynamic programming; see Blackwell (1965) or Puterman (1994). Observe also that the set Da of all atoms of μ is countable. A refinement of the above result is Theorem 2 in Jaśkiewicz and Nowak (2016), where it is shown that public signals are important only in states belonging to the set X ∖ Da. A similar result on correlated equilibria was given in Nowak and Jaśkiewicz (2005) for semi-Markov games with Borel state spaces and the expected average payoffs. This result, in turn, was proved under geometric drift conditions (GE1)–(GE3) formulated in Sect. 5 in Jaśkiewicz and Nowak (2018a).

Condition (A2) can be replaced in the proof (with minor changes) by assumption (A1) on norm continuity of q with respect to actions. A result similar to Theorem 5 was given by Duffie et al. (1994), where it was assumed that for any x, x′ ∈ X, a ∈ A(x), a′ ∈ A(x′), we have

$$\displaystyle \begin{aligned} q(\cdot|x,a)\ll q(\cdot|x^{\prime },a^{\prime })\quad\mbox{ and}\quad q(\cdot|x^{\prime },a^{\prime })\ll q(\cdot|x,a). \end{aligned}$$

In addition, Duffie et al. (1994) required the continuity of the payoffs and transitions with respect to actions. Thus, the result in Duffie et al. (1994) is weaker than Theorem 5. Moreover, they also established the ergodicity of the Markov chain induced by a stationary correlated equilibrium. Their proof is different from that of Nowak and Raghavan (1992). Subgame-perfect correlated equilibria were also studied by Harris et al. (1995) for games with weakly continuous transitions and general continuous payoff functions on the space of infinite plays endowed with the product topology. Harris et al. (1995) gave an example showing that public signals play an important role. They proved that the subgame-perfect equilibrium path correspondence is upper hemicontinuous. Later, Reny and Robson (2002) provided a shorter and simpler proof of existence that focuses on considerations of equilibrium payoffs rather than paths. Some comments on correlated equilibria for games with finitely many states or different payoff evaluation will be given in the sequel.

5 Stationary Equilibria in Discounted Stochastic Games with Borel State Spaces

In this section, we introduce the following condition.

  1. (A3)

    There exist l Carathéodory functions αj : X × A → [0, 1] such that \(\sum _{j=1}^l\alpha _j(x,a)=1\) for every (x, a) ∈ X × A and Borel measurable transition probabilities \(q_j:X\times \mathcal {B}(X)\to [0,1]\) such that

    $$\displaystyle \begin{aligned} q(\cdot|x,a) = \sum_{j=1}^l \alpha_j(x,a)q_j(\cdot|x), \quad (x,a)\in X\times A.\end{aligned} $$

    Additionally, every qj(⋅|x) is dominated by some \(\mu \in \Pr (X).\)

We can now state a result due to Jaśkiewicz and Nowak (2016).

Theorem 6

Assume that game G satisfies (A3). Then, G has a stationary almost Markov perfect equilibrium.

We outline the proof of Theorem 6 for a nonatomic measure μ. The general case needs additional notation. First, we show that there exists a Borel measurable mapping w∗ ∈ Bn(X) such that \(w^*(x) \in co\mathcal {NP}_{w^*}(x)\) for all x ∈ X. This result is obtained by applying a generalization of the Kakutani fixed point theorem due to Glicksberg (1952). (Note that closed balls in Bn(X) are compact in the weak-star topology by the Banach-Alaoglu theorem.) Second, applying Proposition 5, we conclude that there exists some v∗ ∈ Bn(X × X) such that

$$\displaystyle \begin{aligned} \int_Xw^*(y)q_j(dy|x)=\int_Xv^*(x,y)q_j(dy|x),\quad j=1,\ldots,l. \end{aligned}$$

Hence, by (A3) we infer that

$$\displaystyle \begin{aligned} \int_Xw^*(y)q(dy|x,a)=\int_Xv^*(x,y)q(dy|x,a),\quad (x,a)\in X\times A. \end{aligned}$$

Moreover, we know that \(v^*(x,y)\in \mathcal {NP}_{v^*}(y)\) for all states x and y. Furthermore, making use of Filippov's measurable implicit function theorem (as in Proposition 5), we infer that v∗(x, y) is the vector of equilibrium payoffs corresponding to some stationary almost Markov strategy profile. Finally, we utilize the system of n Bellman equations to provide a characterization of stationary equilibrium and to deduce that this profile is indeed a stationary almost Markov perfect equilibrium. For details the reader is referred to Jaśkiewicz and Nowak (2016).

Corollary 1

Consider a game where the set A is finite and the transition probability q is Borel measurable. Then, the game has a stationary almost Markov perfect equilibrium.

Proof

We show that the game meets (A3). Let \(m\in \mathbb {N}\) be such that A = {a1, …, am}. Now, for j = 1, …, m, define

$$\displaystyle \begin{aligned} \alpha_j(x,a):= \left\{ \begin{array}{ll} 1, & \mbox{if}\ \ a\in A(x),\; a=a^j \\ 0, & \mbox{otherwise,} \end{array} \right.\quad\mbox{and}\\ q_j(\cdot|x):= \left\{ \begin{array}{ll} q(\cdot|x,a^j), & \mbox{if}\ \ a^j\in A(x) \\ \mu(\cdot), & \mbox{otherwise.} \end{array} \right. \end{aligned} $$

Then, \(q(\cdot |x,a)=\sum _{j=1}^l \alpha_j(x,a)q_j(\cdot |x)\) with l = m, and the conclusion follows from Theorem 6.

Remark 1

Corollary 1 extends the result of Mertens and Parthasarathy (1991), where it is additionally assumed that Ai(x) = Ai for all x ∈ X, i ∈ N and that μ is nonatomic; see Comment on page 147 in Mertens and Parthasarathy (1991) or Theorem VII.1.8 on page 398 in Mertens et al. (2015). If μ admits some atoms, then they proved the existence of a subgame-perfect equilibrium in which the strategy of player i ∈ N is of the form (fi1, fi2, …) with \(f_{it}\in F_i^0\) for each \(t\in \mathbb {N}.\) Thus, the equilibrium strategy of player i ∈ N is stage-dependent.

Remark 2

It is worth emphasizing that the equilibria established in Theorem 6 are subgame-perfect. A result related to Theorem 6 is given in Barelli and Duggan (2014). The assumption imposed on the transition probability in their paper is weaker, but an equilibrium is shown to exist in the class of stationary semi-Markov strategies, where the players take into account the current state, the previous state, and the actions chosen by the players in the previous state.

Remark 3

As already mentioned in Sect. 3, Levy and McLennan (2015) constructed a stochastic game that does not have a stationary Markov perfect equilibrium. In their model, each set Ai is finite, Ai(x) = Ai for every i ∈ N, x ∈ X, and the transition law is a convex combination of a probability measure (depending on the current state) and the Dirac measure concentrated at some state. Such a model satisfies the absolute continuity condition. Hence, their example confirms that one cannot expect to obtain an equilibrium in stationary Markov strategies even for games with finite action spaces. Therefore, Corollary 1 is meaningful.

Remark 4

By Urysohn’s metrization theorem (see Theorem 3.40 in Aliprantis and Border 2006), every action space Ai can be embedded homeomorphically in the Hilbert cube. The action correspondences remain measurable and compact-valued after the embedding. Therefore, one can assume without loss of generality as in Jaśkiewicz and Nowak (2016) that the action spaces are compact metric.

A stochastic game with additive rewards and additive transitions (ARAT for short) satisfies a separability condition for the actions of the players. To simplify the presentation, we assume that N = {1, 2}. The payoff function for player i ∈ N is of the form

$$\displaystyle \begin{aligned} u_i(x,a_1,a_2) =u_{i1}(x,a_1) +u_{i2}(x,a_2), \end{aligned}$$

where x ∈ X, a1 ∈ A1(x), a2 ∈ A2(x) and similarly

$$\displaystyle \begin{aligned} q(\cdot|x,a_1,a_2) = q_1(\cdot|x,a_1) + q_2(\cdot|x,a_2), \end{aligned}$$

where q1 and q2 are Borel measurable subtransition probabilities dominated by some \(\mu \in \Pr (X).\)

The following result was proved in Jaśkiewicz and Nowak (2015a).

Theorem 7

If μ is a nonatomic probability measure and the action sets A1 and A2 are finite, then the ARAT stochastic game has a Nash equilibrium in pure stationary almost Markov strategies.

The separability condition in ARAT games can easily be generalized to the n-person case. Assumptions of a similar type are often used in differential games; see Başar and Olsder (1995). ARAT stochastic games with Borel state and finite action spaces were first studied by Himmelberg et al. (1976), who showed the existence of stationary Markov equilibria for μ-almost all initial states with \(\mu \in \Pr (X).\) Their result was strengthened by Nowak (1987), who considered compact metric action spaces and obtained stationary equilibria for all initial states. Pure stationary Markov perfect equilibria may not exist in ARAT stochastic games if μ has atoms; see Example 3.1 (a game with four states) in Raghavan et al. (1985) or the counterexample (a game with two states) in Jaśkiewicz and Nowak (2015a). Küenle (1999) studied ARAT stochastic games with a Borel state space and compact metric action spaces and established the existence of non-stationary history-dependent pure Nash equilibria. In order to construct subgame-perfect equilibria, he used the well-known idea of threats (frequently used in repeated games). The result of Küenle (1999) is stated for two-person games only. Theorem 7 can also be proved for n-person games under a similar additivity assumption. An almost Markov equilibrium is obviously subgame-perfect.

Stationary Markov perfect equilibria exist in discounted stochastic games with state-independent transitions (SIT games) studied by Parthasarathy and Sinha (1989). They assumed that Ai(x) = Ai for all x ∈ X and i ∈ N, the action sets Ai are finite, and q(⋅|x, a) = q(⋅|a) are nonatomic for all a ∈ A. A more general class of games with additive transitions satisfying (A3) but with all qj independent of the state x ∈ X (AT games) was examined by Nowak (2003b). A stationary Markov perfect equilibrium f∗ ∈ F0 was shown to exist in that class of stochastic games. Additional special classes of discounted stochastic games with uncountable state space having stationary Markov perfect equilibrium are described in Krishnamurthy et al. (2012). Some of them are related to AT games studied by Nowak (2003b).

Let X = Y × Z, where Y and Z are Borel spaces. In a noisy stochastic game considered by Duggan (2012), the states are of the form x = (y, z) ∈ X, where z is called a noise variable. The payoffs depend measurably on x = (y, z) and are continuous in the actions of the players. The transition probability q is defined as follows:

$$\displaystyle \begin{aligned} q(D|x,a)= \int_Y\int_Z1_D(y^{\prime },z^{\prime })q_2(dz^{\prime }|y^{\prime })q_1(dy^{\prime }|x,a), \quad a\in A(x),\ D\in \mathcal{B}(Y\times Z). \end{aligned}$$

Moreover, it is assumed that q1 is dominated by some \(\mu _1\in \Pr (Y)\) and q2 is absolutely continuous with respect to some nonatomic measure \(\mu _2\in \Pr (Z).\) Additionally, q1(⋅|x, a) is norm continuous in actions a ∈ A for each x ∈ X. This form of q implies that, conditional on y′, the next shock z′ is independent of the current state and actions. In applications, (y, z) may represent a pair: the price of some good and the realization of random demand. By choosing actions, the players can determine (stochastically) the next period price y′, which in turn has some influence on the next demand shock. Other applications are discussed by Duggan (2012), who obtained the following result.

Theorem 8

Every noisy stochastic game has a stationary Markov perfect equilibrium.

Let X be a Borel space, \(\mu \in \Pr (X)\), and let \(\mathcal {G}\subset \mathcal {B}(X)\) be a sub-σ-algebra. A set \(D\in \mathcal {B}(X)\) is said to be a (conditional) \(\mathcal {G}\)-atom if μ(D) > 0 and for any Borel set B ⊂ D, there exists some \(D_0\in \mathcal {G}\) such that μ(B△(D ∩ D0)) = 0. Assume that the transition probability q is dominated by some probability measure μ and that ρ denotes a conditional density function. Following He and Sun (2017), we say that a discounted stochastic game has a decomposable coarser transition kernel if there exists a sub-σ-algebra \(\mathcal {G}\subset \mathcal {B}(X)\) such that \(\mathcal {B}(X)\) has no \(\mathcal {G}\)-atom and there exist Borel measurable nonnegative functions ρj and dj (j = 1, …, l) such that, for every x ∈ X, a ∈ A, each function ρj(⋅, x, a) is \(\mathcal {G}\)-measurable and the transition probability density ρ is of the form

$$\displaystyle \begin{aligned} \rho(y,x,a)=\sum_{j=1}^l\rho_j(y,x,a)d_j(y),\quad x,\ y\in X,\ a\in A. \end{aligned}$$

Using a theorem of Dynkin and Evstigneev (1977) on conditional expectations of measurable correspondences and a fixed point property proved in Nowak and Raghavan (1992), He and Sun (2017) established the following result.

Theorem 9

Every discounted stochastic game having decomposable coarser transition kernel with respect to a nonatomic probability measure μ on X has a stationary Markov perfect equilibrium.

A slight extension of the above theorem, given in He and Sun (2017), contains as special cases the results proved by Parthasarathy and Sinha (1989), Nowak (2003b), and Nowak and Raghavan (1992). However, the form of the equilibrium strategy obtained by Nowak and Raghavan (1992) does not follow from He and Sun (2017). The result of He and Sun (2017) also covers the class of noisy stochastic games examined in Duggan (2012); in this case, it suffices to take for \(\mathcal {G}\) the sub-σ-algebra of \(\mathcal {B} (Y\times Z)\) that consists of the sets D × Z, \(D\in \mathcal {B}(Y).\) Finally, we wish to point out that ARAT discounted stochastic games as well as the games considered in Jaśkiewicz and Nowak (2016) (see Theorem 6) are not included in the class of models mentioned in Theorem 9.

Remark 5

A key tool in proving the existence of a stationary Markov perfect equilibrium in a discounted stochastic game that has no ARAT structure is Lyapunov's theorem on the range of a nonatomic vector measure. Since the Lyapunov theorem is false for infinitely many measures, the counterexample of Levy's eight-person game is of some importance (see Levy and McLennan 2015). There is another reason for which the existence of an equilibrium in the class of stationary Markov strategies F0 is difficult to obtain. One can regard strategies from the sets \(F_i^0\) as "Young measures" and consider the natural weak-star topology on that class; see Valadier (1994). Young measures are often called relaxed controls in control theory. With the help of Example 3.16 from Elliott et al. (1973), one can easily construct a stochastic game with X = [0, 1], finite action spaces, and trivial transition probability q being the Lebesgue measure on X, in which the expected discounted payoffs \(J^i_{\beta }(x,f)\) are discontinuous on F0 endowed with the product topology. The continuity of \(f\to J^i_{\beta }(x,f)\) (for a fixed initial state) can only be proved for ARAT games. Generally, it is difficult to obtain compact families of continuous strategies; this property requires very strong conditions, for instance, an equicontinuous family of functions (see Sect. 6).

6 Special Classes of Stochastic Games with Uncountable State Space and Their Applications in Economics

In a number of applications of discrete-time dynamic games in economics, the state space is an interval in Euclidean space. An illustrative example is the "fish war game" studied by Levhari and Mirman (1980), where the state space is X = [0, 1] and Ai(x) = [0, x/n] for each i ∈ N. Usually, X is interpreted as the set of common property renewable resources. If xt is the resource stock at the beginning of period \(t\in \mathbb {N}\) and player i ∈ N extracts ait ∈ Ai(xt) for consumption, then the new state is \(x_{t+1}= \left (x_t-\sum _{j=1}^na_{jt}\right )^\alpha \) with α ∈ (0, 1). The game is symmetric in the sense that the utility function of player i ∈ N is \( u_i(x,a):= \ln a_i\), with a = (a1, …, an) being a pure strategy profile chosen by the players in state x ∈ X. Levhari and Mirman (1980) constructed a symmetric stationary Markov perfect equilibrium for the two-player β-discounted game that consists of linear strategies. For the arbitrary n-player case, the equilibrium strategy profile is \(f^\beta =(f_1^\beta ,\ldots ,f_n^\beta )\), where \(f_i^\beta (x)= \frac {(1-\alpha \beta )x}{n+(1-n)\alpha \beta }\), x ∈ X, i ∈ N; see Nowak (2006c). Levhari and Mirman (1980) concluded that, in equilibrium, the fish population will be smaller than the population that would have resulted if the players had cooperated and had maximized their joint utility. The phenomenon of overexploitation of a common property resource is known in economics as the "tragedy of the commons." Dutta and Sundaram (1993) showed that there may exist equilibria (consisting of discontinuous consumption functions) in which the common resource is underexploited, so that the tragedy of the commons need not occur. A characterization of the set of equilibria in this model has been given by Chiarella et al. (1984). If β → 1, then \(f^\beta \to f^*=(f_1^*,\ldots ,f_n^*)\), where \(f_i^*(x)=\frac {(1-\alpha )x }{n+(1-n)\alpha }\), x ∈ X, i ∈ N. As shown in Nowak (2008), f∗ is a Nash equilibrium in the class of all strategies of the players in the fish war game under the overtaking optimality criterion. Such a criterion was examined in economics by Ramsey (1928), von Weizsäcker (1965), and Gale (1967), and its application to repeated games was pointed out by Rubinstein (1979). Generally, finding an equilibrium under the overtaking optimality criterion in the class of all strategies is a difficult task; see Carlson and Haurie (1996).
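A short numerical check of this conclusion using the closed-form strategies above: in equilibrium the players jointly consume the fraction \(n(1-\alpha \beta )/(n+(1-n)\alpha \beta )\) of the stock, while joint maximization of utility (the standard single-agent log-utility solution) consumes only 1 − αβ, so the long-run equilibrium stock is lower. The parameter values below are illustrative.

```python
def long_run_stock(share, alpha, x0=1.0, T=100):
    """Iterate x' = ((1 - share) * x)**alpha, the fish war dynamics when
    the players jointly consume the fraction `share` of the stock."""
    x = x0
    for _ in range(T):
        x = ((1.0 - share) * x) ** alpha
    return x

n, alpha, beta = 2, 0.5, 0.95
share_eq = n * (1 - alpha * beta) / (n + (1 - n) * alpha * beta)  # n*f_i^beta(x)/x
share_coop = 1 - alpha * beta          # joint log-utility maximization

print("equilibrium stock :", long_run_stock(share_eq, alpha))    # ~ 0.311
print("cooperative stock :", long_run_stock(share_coop, alpha))  # ~ 0.475
```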

Dutta and Sundaram (1992) considered a stochastic game of resource extraction with state space X = [0, ∞), Ai(x) = [0, x/n] for each i ∈ N, x ∈ X, and the same nonnegative utility function u for each player. Their model includes both the dynamic game with deterministic transitions studied by Sundaram (1989a,b) and the stochastic game with nonatomic transition probabilities considered by Majumdar and Sundaram (1991). Now we list the assumptions used by Dutta and Sundaram (1992). For any y, z ∈ X, let Q(y|z) := q([0, y]|z), and for any y > 0 set \(Q(y^-|z):=\lim _{y^{\prime }\uparrow y}Q(y^{\prime }|z).\)

  1. (D1)

For any x ∈ X, a = (a1, …, an) ∈ A(x) and i ∈ N, ui(x, a) = u(ai) ≥ 0. The utility function u is strictly concave, increasing, and continuously differentiable. Moreover, lima↓0u′(a) = ∞.

  2. (D2)

Q(0|0) = 1 and for each z > 0, there exists a compact interval I(z) ⊂ (0, ∞) such that q(I(z)|z) = 1.

  3. (D3)

There exists z1 > 0 such that if 0 < z < z1, then Q(z−|z) = 0, i.e., q([z, ∞)|z) = 1.

  4. (D4)

    There exists \(\hat {z}>0\) such that for each \(z\ge \hat {z}\), Q(z|z) = 1, i.e., q([0, z]|z) = 1.

  5. (D5)

If zm → z as m → ∞, then q(⋅|zm) → q(⋅|z) in the weak topology on \(\Pr (X).\)

  6. (D6)

If z < z′, then for each y > 0, Q(y|z) ≥ Q(y|z′).

Assumption (D6) is a "strong stochastic dominance" condition requiring that larger investments yield probabilistically higher stock levels. This assumption and the fact that the players have identical utility functions play a crucial role in the proof of Theorem 1 in Dutta and Sundaram (1992), which can be stated as follows.

Theorem 10

Every discounted stochastic game satisfying conditions (D1)–(D6) has a pure stationary Markov perfect equilibrium.

Remark 6

The equilibrium strategies obtained by Dutta and Sundaram (1992) are identical for all the players, and the corresponding equilibrium functions are non-decreasing and upper semicontinuous on X. One can observe that the assumptions on the transition probability functions include the usual deterministic case with an increasing continuous production function. A slightly more general model was recently studied by Jaśkiewicz and Nowak (2018b), who dealt with unbounded utilities.

Transition probabilities considered in other papers on equilibria in stochastic games are assumed to satisfy much stronger continuity conditions, e.g., the norm continuity in actions.

The problem of proving the existence of a Nash equilibrium in a stochastic game of resource extraction with different utility functions for the players seems to be difficult. Partial results were reported by Amir (1996a), Nowak (2003b), Balbus and Nowak (2008), and Jaśkiewicz and Nowak (2015b), where specific transition structures were assumed. Below we give an example whose assumptions are relatively simple to state.

  1. (S1)

X = [0, ∞) and Ai(x) = [0, bi(x)] with \(\sum _{j=1}^n b_j(x) \le x\) for all x ∈ X, where each bj is a continuous increasing function.

  2. (S2)

    \(u_i:[0,\infty ) \to \mathbb {R}\) is a nonnegative increasing twice differentiable utility function for player i ∈ N such that ui(0) = 0.

  3. (S3)

    If a = (a1, …, an) ∈ A(x) and \(s(a)= \sum _{i=1}^na_i\), then

    $$\displaystyle \begin{aligned} q(\cdot|x,a) = h(x-s(a))q_0(\cdot|x) +(1-h(x-s(a)))\delta_0(\cdot), \end{aligned}$$

    where h : X → [0, 1] is an increasing twice differentiable function such that h′′ < 0 and h(0) = 0, and δ0 is the Dirac measure concentrated at the point 0 ∈ X. Moreover, q0((0, ∞)|x) = 1 for each x > 0, q0({0}|0) = 1, and q0(⋅|x) has a density function ρ(x, ⋅) with respect to a σ-finite measure μ defined on X. The function x → ρ(x, y) is continuous for each y ∈ X.
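Condition (S3) has a transparent simulation interpretation: the joint investment x − s(a) determines the probability that the resource survives, and the state 0 is absorbing. A minimal sampler, with illustrative choices of h and q0:

```python
import numpy as np

def next_state(x, a, h, draw_q0, rng):
    """Sample from the (S3) transition: with probability h(x - s(a)) the
    next state is drawn from q0(.|x); otherwise the process jumps to 0."""
    y = x - sum(a)                    # investment left after consumption
    return draw_q0(x, rng) if rng.uniform() < h(y) else 0.0

rng = np.random.default_rng(3)
h = lambda y: 1.0 - np.exp(-y)        # increasing, h'' < 0, h(0) = 0
draw_q0 = lambda x, rng: rng.gamma(2.0, 0.25 + 0.5 * x)   # illustrative q0
print(next_state(1.0, [0.2, 0.3], h, draw_q0, rng))
```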

The following result is a special case of Theorem 2 in Jaśkiewicz and Nowak (2015b).

Theorem 11

Every discounted stochastic game satisfying assumptions (S1)–(S3) has a pure stationary Markov perfect equilibrium.

The proof of Theorem 11 uses the fact that the auxiliary game Γv(x) has a unique Nash equilibrium for any vector v = (v1, …, vn) of nonnegative continuation payoffs vi such that vi(0) = 0. The uniqueness follows from page 1476 in Balbus and Nowak (2008) or can be deduced from the classical theorem of Rosen (1965) (see also Theorem 3.6 in Haurie et al. 2012). The game Γv(x) is not supermodular since for increasing continuation payoffs vi such that vi(0) = 0, we have \( \frac {\partial ^2 {U_{\beta }^i(v_i,x,a)}}{\partial {a_i}\partial {a_j}}<0\), for i ≠ j. A stronger version of Theorem 11 and related results can be found in Jaśkiewicz and Nowak (2015b).

Transition probabilities of the form presented in (S3) were first used in Balbus and Nowak (2004), who dealt with symmetric discounted stochastic games of resource extraction and proved that the sequence of Nash equilibrium payoffs in the n-stage games converges monotonically as n → ∞. Stochastic games of resource extraction without the symmetry condition were first examined by Amir (1996a), who considered so-called convex transitions. More precisely, he assumed that the conditional cumulative distribution function Q(y|z) is strictly convex with respect to z ∈ X for every fixed y > 0. He proved the existence of pure stationary Markov perfect equilibria that are Lipschitz continuous functions of the state variable. Although this result is strong, a careful analysis of various examples suggests that the convexity assumption imposed by Amir (1996a) is satisfied very rarely. Usually, the cumulative distribution Q(y|z) is neither convex nor concave with respect to z. A further discussion of this condition is provided in Remarks 7–8 in Jaśkiewicz and Nowak (2015b). The function Q(y|z) induced by the transition probability q of the form considered in (S3) is strictly concave in z = x − s(a) only when q0 is independent of x ∈ X. Such transition probabilities, which are "mixtures" of finitely many probability measures on X, were considered in Nowak (2003b) and Balbus and Nowak (2008). A survey of various game-theoretic approaches to resource extraction models can be found in Van Long (2011).

In many other examples, the one-shot game Γv(x) also has a nonempty compact set of pure Nash equilibria. Therefore, a counterpart of Theorem 6 can be formulated for the class of pure strategies of the players. We now describe some examples presented in Jaśkiewicz and Nowak (2016).

Example 1 (Dynamic Cournot oligopoly)

Let \(X=[0,\bar {x}]\), and let x ∈ X represent a realization of a random demand shock that is modified at each period of the game. Player i ∈ N (an oligopolist) sets a production quantity ai ∈ Ai(x) = [0, 1]. If \(P\left (x,\sum _{j=1}^n a_j\right )\) is the inverse demand function and ci(x, ai) is the cost function for player i, then

$$\displaystyle \begin{aligned} u_i(x,a):= a_i P\left(x,\sum_{j=1}^n a_j\right)- c_i(x,a_i), \quad a=(a_1,\ldots,a_n). \end{aligned}$$

A simple example of the inverse demand function is

$$\displaystyle \begin{aligned} P\left(x,\sum_{j=1}^n a_j\right)=x\left(n-\sum_{j=1}^n a_j\right). \end{aligned}$$

The function \(a_i \to a_iP\left (x,\sum _{j=1}^n a_j\right )\) is usually concave and ai → ci(x, ai) is often convex. Assume that

$$\displaystyle \begin{aligned} q(\cdot|x,a) = (1-\overline{a})q_1(\cdot|x) +\overline{a}q_2(\cdot|x), \quad \overline{a} :=\frac{1}{n}\sum_{j=1}^n a_j, \end{aligned}$$

where q1(⋅|x) and q2(⋅|x) are dominated by some probability measure μ on X for all x ∈ X. In order to provide an interpretation of q, we observe that

$$\displaystyle \begin{aligned} q(\cdot|x,a)= q_1(\cdot|x) + \overline{a}(q_2(\cdot|x)-q_1(\cdot|x)). \end{aligned} $$
(6.2)

Let

$$\displaystyle \begin{aligned} E_q(x,a):=\int_X yq(dy|x,a) \quad\mbox{and}\quad E_{q_{j}}(x):=\int_X yq_j(dy|x) \end{aligned}$$

be the mean values of the distributions q(⋅|x, a) and qj(⋅|x), respectively. By (6.2), we have \(E_q(x,a)= E_{q_{1}}(x) + \overline {a}( E_{q_{2}}(x)- E_{q_{1}}(x)).\) Assume that \(E_{q_{1}}(x) \ge x\ge E_{q_{2}}(x).\) This condition implies that

$$\displaystyle \begin{aligned} E_{q_{2}}(x)- E_{q_{1}}(x)\le 0.\end{aligned} $$

Thus, the expectation Eq(x, a) of the next demand shock decreases as the total sales \(n\overline {a}\) in the current state x ∈ X increase. Observe that the game Γv(x) is concave. From Nash (1950), it follows that the game Γv(x) has a pure equilibrium point. However, the set of Nash equilibria in Γv(x) may contain many points. A modification of the proof of Theorem 1 given in Jaśkiewicz and Nowak (2016) implies that this game has a pure stationary almost Markov perfect equilibrium.
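For concreteness, the following sketch computes a Nash equilibrium of the one-shot game Γv(x) of this example by best-response iteration, with \(P\left (x,\sum _{j=1}^n a_j\right )=x\left (n-\sum _{j=1}^n a_j\right )\) as above. The cost function and the continuation term (which, by (6.2), enters the payoff only through \(\overline {a}\)) are illustrative stand-ins, and best-response iteration need not converge in general, though it does for these primitives.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def cournot_br_iteration(x, n, cost, beta, cont, rounds=200):
    """Best-response iteration for Gamma_v(x) with payoffs
    a_i * x * (n - sum(a)) - cost(x, a_i) + beta * cont(x, mean(a))."""
    a = np.full(n, 0.5)
    for _ in range(rounds):
        for i in range(n):
            def neg(ai, i=i):
                s = a.sum() - a[i] + ai
                return -(ai * x * (n - s) - cost(x, ai)
                         + beta * cont(x, s / n))
            a[i] = minimize_scalar(neg, bounds=(0.0, 1.0),
                                   method="bounded").x
    return a

cost = lambda x, ai: 0.1 * ai ** 2     # convex cost (illustrative)
cont = lambda x, abar: -0.2 * abar     # continuation value falls with sales
print(cournot_br_iteration(x=0.8, n=3, cost=cost, beta=0.95, cont=cont))
```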

Example 2 (Cournot competition with substituting goods in differentiated markets)

This model is inspired by a dynamic game with complementary goods studied by Curtat (1996). Related static games were already discussed in Spence (1976) and Vives (1990). There are n firms on the market and firm i ∈ N produces a quantity ai ∈ Ai(x) = [0, 1] of a differentiated product. The inverse demand function is given by a twice differentiable function Pi(a), where a = (a1, …, an). The goods are substitutes, i.e., \(\frac {\partial P_i(a)}{\partial a_j}<0\) for all i, j ∈ N, see Spence (1976). In other words, increased consumption of one good reduces the willingness to pay for the others. We assume that X = [0, 1]n, where the i-th coordinate xi ∈ [0, 1] is a measure of the cumulative experience of firm i ∈ N. If ci(xi) is the marginal cost for firm i ∈ N, then

$$\displaystyle \begin{aligned} u_i(x,a):= a_i\left[ P_i\left( a\right)- c_i(x_i)\right], \quad a=(a_1,\ldots,a_n),\;\; x=(x_1,\ldots,x_n)\in X.\end{aligned} $$
(6.3)

The transition probability of the next state (experience vector) is of the form:

$$\displaystyle \begin{aligned} q(\cdot|x,a) = h \left(\sum_{j=1}^n (x_j+ a_j)\right)q_2(\cdot|x) + \left(1- h\left(\sum_{j=1}^n (x_j+ a_j)\right)\right)q_1(\cdot|x),\end{aligned} $$
(6.4)

where

$$\displaystyle \begin{aligned} h\left(\sum_{j=1}^n (x_j+a_j)\right)=\frac{\sum_{j=1}^n x_j+\sum_{j=1}^n a_j}{2n}\end{aligned} $$
(6.5)

and q1(⋅|x), q2(⋅|x) are dominated by some probability measure μ on X for all x ∈ X. In Curtat (1996) it is assumed that q1 and q2 are independent of x ∈ X and that q2 stochastically dominates q1. Then, the underlying Markov process governed by q captures the ideas of learning-by-doing and spillover (see page 197 in Curtat 1996). Here, this stochastic dominance condition can be dropped, although it is quite natural. It is easy to see that the game Γv(x) is concave if \(a_i\to u_i(x,(a_i,a_{-i}))\) is concave on [0, 1]. Clearly, this is satisfied, if for each i ∈ N, we have

$$\displaystyle \begin{aligned} 2\frac{\partial P_i(a)}{\partial a_i}+\frac{\partial^2 P_i(a)}{\partial a_i^2}a_i<0. \end{aligned}$$

If the goods are substitutes, this condition holds whenever \(\frac {\partial ^2 P_i(a)}{\partial a_i^2}\le 0\) for all i ∈ N. The game Γv(x) may have multiple pure Nash equilibria. Using the methods from Jaśkiewicz and Nowak (2016), one can show that any concave game discussed here has a pure stationary almost Markov perfect equilibrium.
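A simple linear specification satisfying all of the above requirements (our illustration) is

$$\displaystyle \begin{aligned} P_i(a)=2-a_i-\frac{1}{2}\sum_{j\neq i} a_j, \end{aligned}$$

for which \(\frac{\partial P_i(a)}{\partial a_j}=-\frac{1}{2}<0\) for j ≠ i (substitutes), \(\frac{\partial^2 P_i(a)}{\partial a_i^2}=0\) and \(2\frac{\partial P_i(a)}{\partial a_i}+\frac{\partial^2 P_i(a)}{\partial a_i^2}a_i=-2<0\).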

Supermodular static games were extensively studied by Milgrom and Roberts (1990) and Topkis (1998). This class of games finds applications in dynamic economic models with complementarities. Our next illustration refers to Example 2, but with products that are complements. The state space and action spaces for firms are the same as in Example 2. We endow both X and A = [0, 1]n with the usual component-wise ordering. Then, X and A are complete lattices. We assume that the transition probability is defined as in (6.4) and that q1(⋅|x) and q2(⋅|x) are dominated, for all x ∈ X, by some probability measure μ on X. The payoff function for every firm is given in (6.3).

Example 3 (Cournot oligopoly with complementary goods in differentiated markets)

Let h be defined as in (6.5). Suppose that the payoff function in the game Γv(x) satisfies the following condition:

$$\displaystyle \begin{aligned} \frac{\partial^2 U_{\beta}^i(v_i,x,a)}{\partial a_i\partial a_j}\ge 0\quad\mbox{for}\quad j\neq i. \end{aligned}$$

Then, by Theorem 4 in Milgrom and Roberts (1990), the game Γv(x) is supermodular. Note that within our framework, it is sufficient to prove that ui(x, a), defined in (6.3), satisfies \(\frac {\partial ^2 u_i(x,a)}{\partial a_i\partial a_j}\ge 0\) for j ≠ i. But

$$\displaystyle \begin{aligned} \frac{\partial^2 u_i(x,a)}{\partial a_i\partial a_j}=a_i\frac{\partial^2 P_i(a)}{\partial a_i\partial a_j}+ \frac{\partial P_i(a)}{\partial a_j},\quad j\neq i \end{aligned}$$

and \(\frac {\partial ^2 u_i(x,a)}{\partial a_i\partial a_j}\) is nonnegative if the goods are complements, i.e., \(\frac {\partial P_i(a)}{\partial a_j}\ge 0\) for j ≠ i (see Vives 1990), and the cross partials \(\frac{\partial^2 P_i(a)}{\partial a_i\partial a_j}\) are nonnegative. From Theorem 5 in Milgrom and Roberts (1990), it follows that the game Γv(x) has a pure Nash equilibrium. Therefore, the arguments used in Jaśkiewicz and Nowak (2016) imply that the stochastic game has a pure stationary almost Markov perfect equilibrium.
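For instance (again our illustration), the linear inverse demand

$$\displaystyle \begin{aligned} P_i(a)=1-a_i+\frac{1}{2}\sum_{j\neq i} a_j \end{aligned}$$

yields \(\frac{\partial P_i(a)}{\partial a_j}=\frac{1}{2}\ge 0\) and \(\frac{\partial^2 u_i(x,a)}{\partial a_i\partial a_j}=\frac{1}{2}\ge 0\) for j ≠ i, so the corresponding game Γv(x) is supermodular.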

Remark 7

The game described in Example 3 is also studied in Curtat (1996), but under the additional restrictive assumption that q1 and q2 are independent of x ∈ X. Then, the transition probability q has so-called increasing differences in (x, a). This fact implies that the functions \(U^i_{\beta }(v_i,\cdot ,\cdot )\) satisfy the assumptions in Proposition 7. Another condition imposed by Curtat (1996) states that the payoff functions ui(x, a) are increasing in ai and, more importantly, satisfy the so-called strict diagonal dominance condition for each x ∈ X. For details the reader is referred to Curtat (1996) and Rosen (1965). This additional condition entails the uniqueness of a pure Nash equilibrium in every auxiliary game Γv(x) under consideration; see Proposition 6. The advantage is that Curtat (1996) can directly work with Lipschitz continuous strategies for the players and find a stationary Markov perfect equilibrium in that class using Schauder’s fixed point theorem. Without the strict diagonal dominance condition, Γv(x) may have many pure Nash equilibria and Curtat’s approach cannot be applied. The coefficients of the convex combination in (6.4) are affine functions of a ∈ A. This requirement can be slightly generalized; see, for instance, Example 4 in Jaśkiewicz and Nowak (2016). If q1 or q2 depends on x ∈ X, then the increasing differences property of q does not hold and the method of Curtat (1996) does not work. Additional comments on supermodular stochastic games can be found in Amir (2003).

The result in Curtat (1996) on the existence of stationary Markov perfect equilibria for supermodular discounted stochastic games is based upon lattice theoretic arguments and on complementarity and monotonicity assumptions. The state and action spaces are assumed to be compact intervals in Euclidean space, and the transition probability is assumed to be norm continuous in the state and action variables. Moreover, the strict diagonal dominance condition (see (C1) in Sect. 2) applied to the auxiliary one-shot games Γv(x) for any increasing Lipschitz continuous continuation vector payoff v plays a crucial role. Namely, this assumption together with others implies that \(\mathcal {NP}_v(x)\) is a singleton. In addition, the function \(x\to \mathcal {NP}_v(x)\) is increasing and Lipschitz continuous. Thus, his proof consists of two steps. First, he shows that there exists an increasing Lipschitz continuous vector payoff function v∗ such that \(v^*(x)= \mathcal {NP}_{v^*}(x)\) for all x ∈ X. Second, he makes use of a theorem on the Lipschitz property of the unique equilibrium in \( \varGamma _{v^*}.\)

Horst (2005) provided a different and more unified approach to stationary Markov perfect equilibria that can be applied beyond the setting of supermodular games. Instead of imposing monotonicity conditions on the players’ utility functions, he considered stochastic games in which the interaction between different players is sufficiently weak. For instance, certain “production games” satisfy this property. The method of his proof is based on a selection theorem of Montrucchio (1987) and the Schauder fixed point theorem applied to the space of Lipschitz continuous strategy profiles of the players. The assumptions imposed by Horst (2005) are rather involved. For example, they may restrict the number of players in the game or impose an upper bound on the discount factor. Such limitations do not occur in the approach of Curtat (1996).

Balbus et al. (2014a) considered supermodular stochastic games with an absorbing state and the transition probabilities of the form q(⋅|x, a) = g(x, a)q0(⋅|x) + (1 − g(x, a))δ0(⋅). Under some strong monotonicity conditions on the utility functions and transitions, they showed that the Nash equilibrium payoffs in the n-stage games converge monotonically as n → ∞. This fact yields the existence of a pure stationary Markov perfect equilibrium. A related result is given in Balbus et al. (2013b) for a similar class of dynamic games. The state space X in Balbus et al. (2014a) is one-dimensional, and their results do not apply to the games of resource extraction discussed earlier. If, on the other hand, the transition probability is a “mixture” of finitely many probability measures, then a stationary Markov perfect equilibrium can be obtained, in certain models, by solving a system of non-linear equations. This method was discussed in Sect. 5 of Nowak (2007). The next example is not a supermodular game in the sense of Balbus et al. (2014a), but it belongs to the class of production games examined by Horst (2005). Generally, there are only a few examples of games with a continuum of states for which Nash equilibria can be given in closed form.

Example 4

Let X = [0, 1], Ai(x) = [0, 1] for all x ∈ X and i ∈ N = {1, 2}. We consider the symmetric game where the stage utility of player i is

$$\displaystyle \begin{aligned} u_i(x,a_1,a_2)= a_1+a_2+ 2xa_1a_2- a_i^2. \end{aligned} $$
(6.6)

The state variable x in (6.6) is a complementarity coefficient of the players’ actions. The transition probabilities are of the form

$$\displaystyle \begin{aligned} q(\cdot|x,a_1,a_2):= \frac{x+a_1+a_2}{3}\mu_1(\cdot) + \frac{3-x-a_1-a_2}{3}\mu_2(\cdot). \end{aligned}$$

We assume that μ1 has the density ρ1(y) = 2y and μ2 has the density ρ2(y) = 2 − 2y, y ∈ X. Note that μ1 stochastically dominates μ2. From the definition of q, it follows that higher states x ∈ X or higher actions a1, a2 (efforts) of the players induce a distribution of the next state with a higher mean value. Assume that \(v^*=(v_1^*,v_2^*)\) is an equilibrium payoff vector in the β-discounted stochastic game. As shown in Example 1 of Nowak (2007), it is possible to construct a system of non-linear equations with unknowns z1 and z2, whose solution \(z_1^*\), \(z_2^*\) satisfies \(z_i^*=\int _Xv^*_i(y)\mu _i(dy)\). This fact, in turn, makes it possible to find a symmetric stationary Markov perfect equilibrium \((f_1^*,f_2^*)\) with \(v_1^*=v_2^*\). It is of the form \(f^*_i(x)= \frac {1+z^*}{4-2x}\), \(x\in X\), \(i\in N\), where

$$\displaystyle \begin{aligned} z^* = \frac{-8-6p(\beta-1)-\sqrt{(8+6p(\beta-1))^2-36}}{6} \ \ \mbox{and}\ \ p=\frac{9+2\beta\ln 2 -2\beta}{\beta(1-\beta)(6\ln 2-3 )}. \end{aligned}$$

Moreover, we have

$$\displaystyle \begin{aligned} v_i^*(x)= (p\beta+x)z^* +\frac{(1+z^*)^2(3-x)}{2(2-x)^2}. \end{aligned}$$
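These closed-form expressions are straightforward to evaluate numerically. The following minimal sketch (in Python; the discount factor β = 0.9 is an arbitrary choice of ours) tabulates the equilibrium strategy and payoff at a few states:

```python
import numpy as np

beta = 0.9                                   # sample discount factor
p = (9 + 2 * beta * np.log(2) - 2 * beta) / (
    beta * (1 - beta) * (6 * np.log(2) - 3))
z = (-8 - 6 * p * (beta - 1)
     - np.sqrt((8 + 6 * p * (beta - 1)) ** 2 - 36)) / 6

def f_star(x):
    """Equilibrium effort f_i*(x) = (1 + z*) / (4 - 2x)."""
    return (1 + z) / (4 - 2 * x)

def v_star(x):
    """Equilibrium payoff v_i*(x) from the displayed formula."""
    return (p * beta + x) * z + (1 + z) ** 2 * (3 - x) / (2 * (2 - x) ** 2)

for x in (0.0, 0.5, 1.0):
    # For this beta, f_star(x) remains inside A_i(x) = [0, 1].
    print(x, f_star(x), v_star(x))
```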

Ericson and Pakes (1995) provided a model of firm and industry dynamics that allows for entry, exit and uncertainty generating variability in the fortunes of firms. They considered the ergodicity of the stochastic process resulting from a Markov perfect industry equilibrium. A dynamic competition in an oligopolistic industry with investment, entry and exit was also extensively studied by Doraszelski and Satterthwaite (2010). Computational methods for the class of games studied by Ericson and Pakes (1995) are presented in Doraszelski and Pakes (2007). Further applications of discounted stochastic games with countably many states to models in industrial organization including models of industry dynamics are given in Escobar (2013).

Shubik and Whitt (1973) considered a non-stochastic model of a sequential strategic market game, where the state includes the current stocks of capital. At each period of the game, one unit of a consumer good is put up for sale, and the players bid some amounts of fiat money for it. A stochastic counterpart of this game was first presented in Secchi and Sudderth (2005). Wiecek (2009), on the other hand, obtained a general structure of equilibrium policies in two-person games, where bids gradually decrease as the discount factor increases. Moreover, Wiecek (2012) proved that a Nash equilibrium in which all the players use “aggressive strategies” emerges in the game for any value of the discount factor, provided that the number of players is sufficiently large. This fact corresponds to a similar result for a deterministic economy given in Shubik and Whitt (1973) and is consistent with existing results on economies with a continuum of players. Other applications of nonzero-sum stochastic games to economic models can be found in Duggan (2012) and He and Sun (2017). Although the concept of mean field equilibrium in dynamic games is not directly inspired by Nash, the influence of the theory of non-cooperative stochastic games on this area of research is obvious. Also the notion of supermodularity is used in studying mean field equilibria in dynamic games. The reader is referred to Adlakha and Johari (2013), where some applications to computer science and operations research are given.

7 Special Classes of Stochastic Games with Countably Many States

Assume that the state space X is countable. Then every \(F^0_i\) can be recognized as a compact convex subset of a linear topological space. A sequence \((f_i^k)_{k\in \mathbb {N}}\) converges to \(f_i\in F^0_i\) if, for every \(x\in X\), \(f_i^k(\cdot |x) \to f_i(\cdot |x)\) in the weak-star topology on the space of probability measures on Ai(x). The weak or weak-star convergence of probability measures on metric spaces is fully described in Aliprantis and Border (2006) or Billingsley (1968). Since X is countable, every space \(F^0_i \) is sequentially compact (it suffices to use the standard diagonal method for selecting convergent subsequences) and, therefore, F0 is sequentially compact when endowed with the product topology. If X and the sets of actions are finite, then F0 can actually be viewed as a convex compact subset of Euclidean space. In the finite state space case, it is easy to prove that the discounted payoffs \(J_{\beta }^i(x,f)\) are continuous on F0. If X is countable, the payoff functions are uniformly bounded, and q(y|x, a) is continuous in a ∈ A(x) for all x, y ∈ X, then showing the continuity of \(J_{\beta }^i(x,f)\) on F0 requires a little more work; see Federgruen (1978). From the Bellman equation in discounted dynamic programming (see Puterman 1994), it follows that \(f^*=(f^*_1,\ldots ,f^*_n)\) is a stationary Markov perfect equilibrium in the discounted stochastic game if and only if there exist bounded functions \(v_i^*:X\to \mathbb {R}\) such that for each x ∈ X and i ∈ N we have

$$\displaystyle \begin{aligned} v^*_i(x)= \max_{\nu_i \in \Pr (A_i(x))} U^i_{\beta}(v^*_i,x,(\nu_i,f^*_{-i}))=U^i_{\beta}(v^*_i,x,f^*). \end{aligned} $$
(6.7)

From (6.7), it follows that \(v^*_i(x)=J_{\beta }^i(x,f^*)\). Using the continuity of the expected discounted payoffs in f ∈ F0 and (6.7), one can define the best response correspondence in the space of strategies, show its upper semicontinuity and conclude from the fixed point theorem due to Glicksberg (1952) (or due to Kakutani (1941) in the case of finite state and action spaces) that the game has a stationary Markov perfect equilibrium f∗ ∈ F0. This fact was proved for finite state space discounted stochastic games by Fink (1964) and Takahashi (1964). An extension to games with countable state spaces was reported in Parthasarathy (1973) and Federgruen (1978). Some results for a class of discounted games with discontinuous payoff functions can be found in Nowak and Wiȩcek (2007).
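Condition (6.7) also provides a simple numerical test. The sketch below (in Python, specialized to a two-player game with finitely many states and actions; the array encoding of u, q and the stationary strategies is our own) computes \(J^i_{\beta}(\cdot,f)\) by solving a linear system and then checks that no one-stage deviation is profitable:

```python
import numpy as np

def check_stationary_equilibrium(u, q, f1, f2, beta, tol=1e-8):
    """Check condition (6.7) for a two-player discounted game.

    u[i][x, a, b] -- stage payoff of player i,
    q[x, a, b, y] -- transition probability,
    f1[x, :], f2[x, :] -- mixed actions of the stationary strategies.
    """
    S = q.shape[0]
    # Stage payoffs and transitions under the profile f = (f1, f2).
    u_f = [np.einsum('xab,xa,xb->x', u[i], f1, f2) for i in range(2)]
    q_f = np.einsum('xaby,xa,xb->xy', q, f1, f2)
    # v_i solves v_i = u_i(f) + beta * Q_f v_i, i.e. v_i(x) = J_beta^i(x, f).
    v = [np.linalg.solve(np.eye(S) - beta * q_f, u_f[i]) for i in range(2)]
    # Values of all one-stage deviations to pure actions; (6.7) holds
    # iff no pure (hence no mixed) deviation is profitable.
    d1 = np.einsum('xab,xb->xa', u[0], f2) \
        + beta * np.einsum('xaby,xb,y->xa', q, f2, v[0])
    d2 = np.einsum('xab,xa->xb', u[1], f1) \
        + beta * np.einsum('xaby,xa,y->xb', q, f1, v[1])
    return bool(np.all(d1.max(axis=1) <= v[0] + tol) and
                np.all(d2.max(axis=1) <= v[1] + tol))
```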

The fundamental results in the theory of regular Nash equilibria in normal form games concerning genericity (see Harsanyi 1973a) and purification (see Harsanyi 1973b) were extended to dynamic games by Doraszelski and Escobar (2010). A discounted stochastic game possessing equilibria that are all regular in the sense of Doraszelski and Escobar (2010) has a compact equilibrium set that consists of isolated points. Hence, it follows that the equilibrium set is finite. They proved that the set of discounted stochastic games (with finite sets of states and actions) having Markov perfect equilibria that are all regular is open and has full Lebesgue measure. Related results were given by Haller and Lagunoff (2000), but their definition of regular equilibrium is different, and regular equilibria in their sense may not be purifiable.

The payoff function for player i ∈ N in the limit-average stochastic game can be defined as follows:

$$\displaystyle \begin{aligned} \bar{J}^i(x,\pi):= \liminf_{T\to\infty} E_x^\pi\left(\frac{1}{T}\sum_{t=1}^T u_i(x_t,a^t)\right),\quad x\in X,\ \pi\in \varPi.\end{aligned} $$

The equilibrium solutions for this class of games are defined similarly as in the discounted case. The existence of stationary Markov perfect equilibria for games with finite state and action spaces and the limit-average payoffs was proved independently by Rogers (1969) and Sobel (1971). They assumed that the Markov chain induced by any stationary strategy profile and the transition probability q is irreducible. It is shown under the irreducibility condition that the equilibrium payoffs \(w^*_i\) of the players are independent of the initial state. Moreover, it is proved that there exists a sequence of equilibria \((f^k)_{k\in \mathbb {N}}\) in βk-discounted games (with βk → 1 as k → ∞) such that \(w_i^*= \lim _{k\to \infty }(1-\beta _k)J^i_{\beta _k}(x,f^k)\). Later, Federgruen (1978) extended these results to limit-average stochastic games with countably many states satisfying some uniform ergodicity conditions. Other cases of similar type were mentioned by Nowak (2003a). Below we provide a result due to Altman et al. (1997), which has some potential for applications in queueing models. The stage payoffs in their approach may be unbounded. We start with a formulation of their assumptions.

Let m : X → [1, ∞) be a function for which the following conditions hold.

  1. (A4)

    For each x, y ∈ X, i ∈ N, the functions ui(x, ⋅) and q(y|x, ⋅) are continuous on A(x). Moreover,

    $$\displaystyle \begin{aligned} &\sup_{x\in X}\max_{a\in A(x)}|u_i(x,a)|/m(x) <\infty \quad\mbox{and}\\ &\quad \quad \lim_{k\to\infty}\sum_{y\in X}|q(y|x,a^k)-q(y|x,a)|m(y)=0 \end{aligned} $$

    for any ak → a ∈ A(x).

  2. (A5)

    There exist a finite set Y ⊂ X and γ ∈ (0, 1) such that

    $$\displaystyle \begin{aligned} \sum_{y\in X\setminus Y}q(y|x,a)m(y) \le \gamma m(x)\quad \mbox{for all} \quad x\in X,\ a\in A(x). \end{aligned}$$
  3. (A6)

    The function f → n(f) is continuous with respect to stationary strategy profiles f ∈ F0, where n(f) denotes the number of closed classes in the Markov chain induced by the transition probability q(y|x, f), x, y ∈ X.

Property (A5) is called m-uniform geometric recurrence; see Altman et al. (1997). Condition (A6) is quite restrictive and implies that the number of positive recurrent classes is a constant function of the stationary strategies. If the Markov chains resulting from the stationary policies are all unichain, the limit-average payoff functions are constant, i.e., independent of the initial state. For a detailed discussion, we refer the reader to Altman et al. (1997) and the references cited therein.

Theorem 12

If conditions (A4)–(A6) are satisfied, then the limit-average payoff n-person stochastic game has a stationary Markov perfect equilibrium.

The above result follows from Theorem 2.6 in Altman et al. (1997), where it is also shown that under (A4)–(A6) any limit of stationary Markov equilibria in β-discounted games (as β → 1) is an equilibrium in the limit-average game. A related result was established by Borkar and Ghosh (1993) under a stochastic stability condition. More precisely, they assumed that the Markov chain induced by any stationary strategy profile is unichain and the transition probability from any fixed state has a finite support.

Stochastic games with countably many states are usually studied under some recurrence or ergodicity conditions. Without these conditions n-person nonzero-sum limit-average payoff stochastic games with countable state spaces are very difficult to deal with. Nevertheless, the results obtained in the literature have some interesting applications, especially to queueing systems; see, for example, Altman (1996) and Altman et al. (1997).

Now assume that X is a Borel space and μ is a probability measure on X. Consider an n-person discounted stochastic game G, where Ai(x) = Ai for all i ∈ N and x ∈ X, and the payoff functions are uniformly bounded and continuous in actions.

  1. (A7)

    The transition probability q has a conditional density function ρ, which is continuous in actions and such that

    $$\displaystyle \begin{aligned} \int_X\max_{a\in A}\rho(x,a,y)\mu(dy) <\infty. \end{aligned}$$

Let C(A) be the Banach space of all real-valued continuous functions on the compact space A endowed with the supremum norm ∥⋅∥. By L1(X, C(A)) we denote the Banach space of all C(A)-valued measurable functions ϕ on X such that \(\|\phi\|_1 :=\int_X\|\phi(y)\|\mu(dy) < \infty\). Let \(\{X_k\}_{k\in \mathbb {N}_0}\) be a measurable partition of the state space (\(\mathbb {N}_0\subset \mathbb {N}\)), \(\{u_{i,k}\}_{k\in \mathbb {N}_0}\) be a family of functions ui,k ∈ C(A) and \( \{\rho _k\}_{k\in \mathbb {N}_0}\) be a family of functions ρk ∈ L1(X, C(A)) such that ρk(x)(a, y) ≥ 0 and ∫Xρk(x)(a, y)μ(dy) = 1 for each \(k\in \mathbb {N}_0,\ a\in A\). Consider a game \( \tilde {G}\) where the payoff function for player i is \(\tilde {u} _i(x,a)=u_{i,k}(a)\) if x ∈ Xk. The transition density is \(\tilde {\rho } (x,a,y)=\rho _k(x)(a,y)\) if x ∈ Xk. Let \(\tilde {F}^0_i\) be the set of all \(f_i\in F^0_i\) that are constant on every set Xk. Let \(\tilde {F}^0:= \tilde {F}_1^0\times \cdots \times \tilde {F}_n^0\). The game \(\tilde {G}\) resembles a game with countably many states, and if the payoff functions \( \tilde {u}_i\) are uniformly bounded, then \(\tilde G\) with the discounted payoff criterion has an equilibrium in \(\tilde {F}^0\). Denote by \(\tilde {J} ^i_{\beta }(x,\pi )\) the discounted expected payoff to player i ∈ N in the game \(\tilde G\). It is well known that C(A) is separable. The Banach space L1(X, C(A)) is also separable. Note that x → ui(x, ⋅) is a measurable mapping from X to C(A). By (A7), the mapping x → ρ(x, ⋅, ⋅) from X to L1(X, C(A)) is also measurable. Using these facts, Nowak (1985) proved the following result.

Theorem 13

Assume that G satisfies (A7). For any 𝜖 > 0, there exists a game \(\tilde G\) such that \(|J^i_{\beta }(x,\pi )-\tilde {J}^i_{\beta }(x,\pi )|< \epsilon /2\) for all x ∈ X, i ∈ N and π ∈ Π. Moreover, the game G has a stationary Markov 𝜖-equilibrium.

A related result on the approximation of discounted nonzero-sum games and the existence of 𝜖-equilibria was given by Whitt (1980), who used stronger uniform continuity conditions and a different technique. Approximations of discounted and also limit-average stochastic games with general state spaces and unbounded stage payoffs were studied in Nowak and Altman (2002). They used the weighted norm approach and imposed some geometric ergodicity conditions while examining the limit-average case. An extension with a simpler and more transparent proof for semi-Markov games satisfying a geometric drift condition and a majorization property, similar to (GE1)–(GE3) in Sect. 5 in Jaśkiewicz and Nowak (2018a), was given in Jaśkiewicz and Nowak (2006).

8 Algorithms for Nonzero-Sum Stochastic Games

In this section, we assume that the state space X and the sets of actions Ai are finite. In the 2-player case, we let for notational convenience A1(x) = A1, A2(x) = A2 and a = a1 ∈ A1, b = a2 ∈ A2. Further, for any \(f_i \in F^0_i\), i = 1, 2, we set

$$\displaystyle \begin{aligned} q(y|x,f_1,f_2)&:= \sum_{a\in A_1}\sum_{b\in A_2} q(y|x,a,b)f_1(a|x)f_2(b|x),\\ q(y|x,f_1,b)&:= \sum_{a\in A_1} q(y|x,a,b)f_1(a|x),\\ u_i(x,f_1,f_2)&:= \sum_{a\in A_1}\sum_{ b\in A_2} u_i(x,a,b)f_1(a|x)f_2(b|x),\\ u_i(x,f_1,b)&:= \sum_{a\in A_1} u_i(x,a,b)f_1(a|x). \end{aligned} $$

In a similar way, we define q(y|x, a, f2) and ui(x, a, f2). Note that every set \(F^0_i\) can be recognized as a compact convex subset of Euclidean space. Also, every function \(\phi :X\to \mathbb {R}\) can be viewed as a vector in Euclidean space. Below we describe two results of Filar et al. (1991) on the characterization of stationary equilibria in stochastic games by constrained nonlinear programming. However, due to the fact that the constraint sets are not convex, the results are not straightforward to implement numerically. Although it is common in mathematical programming to use matrix notation, we follow the notation introduced in previous sections.

Let c = (v1, v2, f1, f2). Consider the following problem (OPβ):

$$\displaystyle \begin{aligned} \min \ O_1(c):= \sum_{i=1}^2\sum_{x\in X} \left( v_i(x)-u_i(x,f_1,f_2) -\beta\sum_{y\in X}v_i(y)q(y|x,f_1,f_2)\right) \end{aligned}$$

subject to \((f_1,f_2)\in F^0_1\times F^0_2\) and

$$\displaystyle \begin{aligned} u_1(x,a,f_2)+\beta\sum_{y\in X}v_1(y)q(y|x,a,f_2)\le v_1(x),\ \mbox{for all} \ x\in X,\ a\in A_1, \end{aligned}$$

and

$$\displaystyle \begin{aligned} u_2(x,f_1,b)+\beta\sum_{y\in X}v_2(y)q(y|x,f_1,b)\le v_2(x),\ \mbox{for all} \ x\in X,\ b\in A_2. \end{aligned}$$

Theorem 14

Consider a feasible point \(c^*=(v_1^*,v_2^*,f_1^*,f_2^*)\) in (OPβ). Then \((f_1^*,f_2^*)\in F_1^0\times F^0_2\) is a stationary Nash equilibrium in the discounted stochastic game if and only if \(c^*\) is a solution to problem (OPβ) with \(O_1(c^*) = 0\).
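Note that the constraints force every summand in O1(c) to be nonnegative (the pure-action inequalities extend to mixed actions by averaging), so O1 ≥ 0 on the feasible set, and stationary equilibria are exactly the feasible points with value 0. The following toy illustration (in Python, with randomly generated data and our own array encoding) hands (OPβ) to a generic nonlinear programming routine; since the constraint set is not convex, a local solver may stall at a positive value, and restarts may be needed:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
S, A1, A2, beta = 2, 2, 2, 0.6
u = rng.uniform(size=(2, S, A1, A2))      # u[i][x, a, b], values in [0, 1]
q = rng.uniform(size=(S, A1, A2, S))      # q[x, a, b, y]
q /= q.sum(axis=3, keepdims=True)

def unpack(c):
    v1, v2 = c[:S], c[S:2 * S]
    f1 = c[2 * S:2 * S + S * A1].reshape(S, A1)
    f2 = c[2 * S + S * A1:].reshape(S, A2)
    return v1, v2, f1, f2

def O1(c):                                # objective of (OP_beta)
    v1, v2, f1, f2 = unpack(c)
    qf = np.einsum('xaby,xa,xb->xy', q, f1, f2)
    t1 = v1 - np.einsum('xab,xa,xb->x', u[0], f1, f2) - beta * qf @ v1
    t2 = v2 - np.einsum('xab,xa,xb->x', u[1], f1, f2) - beta * qf @ v2
    return t1.sum() + t2.sum()

def bellman(c):                           # the constraints, as g(c) >= 0
    v1, v2, f1, f2 = unpack(c)
    g1 = v1[:, None] - np.einsum('xab,xb->xa', u[0], f2) \
        - beta * np.einsum('xaby,xb,y->xa', q, f2, v1)
    g2 = v2[:, None] - np.einsum('xab,xa->xb', u[1], f1) \
        - beta * np.einsum('xaby,xa,y->xb', q, f1, v2)
    return np.concatenate([g1.ravel(), g2.ravel()])

def simplex(c):                           # each f_i(.|x) sums to one
    _, _, f1, f2 = unpack(c)
    return np.concatenate([f1.sum(axis=1) - 1.0, f2.sum(axis=1) - 1.0])

c0 = np.concatenate([np.full(2 * S, 1.0 / (1 - beta)),   # feasible start
                     np.full(S * A1, 1.0 / A1), np.full(S * A2, 1.0 / A2)])
bounds = [(None, None)] * (2 * S) + [(0.0, 1.0)] * (S * A1 + S * A2)
res = minimize(O1, c0, method='SLSQP', bounds=bounds,
               constraints=[{'type': 'ineq', 'fun': bellman},
                            {'type': 'eq', 'fun': simplex}])
print(res.fun)    # close to 0 at a stationary equilibrium (Theorem 14)
```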

Let c = (z1, v1, w1, f2, z2, v2, w2, f1). Now consider the following problem (OPa):

$$\displaystyle \begin{aligned} \min\ O_2(c):= \sum_{i=1}^2\sum_{x\in X} \left( v_i(x)-\sum_{y\in X}v_i(y)q(y|x,f_1,f_2)\right) \end{aligned}$$

subject to \((f_1,f_2)\in F^0_1\times F^0_2\) and

$$\displaystyle \begin{aligned} &\sum_{y\in X} v_1(y)q(y|x,a,f_2)\le v_1(x),\\ & u_1(x,a,f_2)+\sum_{y\in X} z_1(y)q(y|x,a,f_2)\le v_1(x)+z_1(x) \end{aligned} $$

for all x ∈ X, a ∈ A1 and

$$\displaystyle \begin{aligned} &\sum_{y\in X} v_2(y)q(y|x,f_1,b)\le v_2(x), \\ &u_2(x,f_1,b)+\sum_{y\in X} z_2(y)q(y|x,f_1,b)\le v_2(x)+z_2(x) \end{aligned} $$

for all x ∈ X, b ∈ A2 and

$$\displaystyle \begin{aligned} u_i(x,f_1,f_2)+\sum_{y\in X}w_i(y)q(y|x,f_1,f_2)= v_i(x)+w_i(x) \end{aligned}$$

for all x ∈ X and i = 1, 2.

Theorem 15

Consider a feasible point \(c^*=(z^*_1,v_1^*,w_1^*,f_2^*,z^*_2,v_2^*,w_2^*,f_1^*)\) in (OPa). Then \((f_1^*,f_2^*)\in F_1^0\times F^0_2\) is a stationary Nash equilibrium in the limit-average payoff stochastic game if and only if \(c^*\) is a solution to problem (OPa) with \(O_2(c^*) = 0\).

Theorems 14 and 15 were stated and proved in Filar et al. (1991); see also Theorems 3.8.2 and 3.8.4 in Filar and Vrieze (1997).

As in the zero-sum case, when one player controls the transitions, it is possible to construct finite-step algorithms for computing Nash equilibria. The linear complementarity problem (LCP) is defined as follows. Given a square matrix \(\mathbb {M}\) of order m and a (column) vector \( \overline {Q}\in \mathbb {R}^m\), find two vectors \(\overline {Z} =[z_1,\ldots ,z_m]^T \in \mathbb {R}^m\) and \(\overline {W}=[w_1,\ldots ,w_m]^T\in \mathbb {R}^m\) such that

$$\displaystyle \begin{aligned} \mathbb{M}\overline{Z}+\overline{Q}=\overline{W}\quad\mbox{and}\quad w_j\ge0,\ z_j\ge 0, \ z_jw_j=0\quad\mbox{for all}\quad j=1,\ldots,m. \end{aligned}$$

Lemke (1965) proposed some pivoting finite-step algorithms to solve the LCP for a large class of matrices \(\mathbb {M}\) and vectors \(\overline {Q}\). Further research on the LCP can be found in Cottle et al. (1992).
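For small m, solutions can also be found by brute force: guess the support of \(\overline{Z}\), solve the resulting linear system and verify the sign conditions. The naive sketch below (in Python; an enumeration over all 2^m supports, not Lemke's pivoting method) merely illustrates the definition:

```python
from itertools import combinations
import numpy as np

def solve_lcp(M, Q, tol=1e-10):
    """Naive LCP solver enumerating the possible supports of Z.

    On the guessed support we force w_j = 0, off it z_j = 0, so the
    complementarity z_j * w_j = 0 holds by construction; only the
    sign conditions Z >= 0, W >= 0 remain to be checked.
    """
    M, Q = np.asarray(M, float), np.asarray(Q, float)
    m = len(Q)
    for k in range(m + 1):
        for supp in map(list, combinations(range(m), k)):
            Z = np.zeros(m)
            if supp:
                try:
                    Z[supp] = np.linalg.solve(M[np.ix_(supp, supp)], -Q[supp])
                except np.linalg.LinAlgError:
                    continue        # singular subsystem: no such solution
            W = M @ Z + Q
            if (Z >= -tol).all() and (W >= -tol).all():
                return Z, W
    return None                     # no solution found
```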

Finding a Nash equilibrium in any bimatrix game \((\mathbb {A},\mathbb {B})\) (where, without loss of generality, all entries of \(\mathbb {A}\) and \(\mathbb {B}\) may be assumed positive) is equivalent to solving the LCP with

$$\displaystyle \begin{aligned} \mathbb{M} =\left[ \begin{array}{cc} \mathbb{O} & \mathbb{A} \\ \mathbb{B}^T & \mathbb{O} \end{array} \right] \ \mbox{where}\ \mathbb{O} \ \mbox{ is the matrix with zero entries,}\ \overline{Q}=[-1,\ldots,-1]^T. \end{aligned}$$

A finite-step algorithm for this LCP was given by Lemke and Howson (1964). If \( \overline {Z}^*= [\overline {Z}_1^*,\overline {Z}^*_2]\) is a part of the solution of the above LCP, then the normalization of \(\overline {Z}^*_i\) is an equilibrium strategy for player i.
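As an illustration, the snippet below (reusing the hypothetical solve_lcp from the preceding sketch) recovers the unique, fully mixed equilibrium of a small matching-pennies-like bimatrix game with positive payoffs:

```python
import numpy as np

A = np.array([[3., 1.], [1., 3.]])       # payoff matrix of player 1
B = np.array([[1., 3.], [3., 1.]])       # payoff matrix of player 2
m1, m2 = A.shape
M = np.block([[np.zeros((m1, m1)), A],
              [B.T, np.zeros((m2, m2))]])
Q = -np.ones(m1 + m2)
Z, W = solve_lcp(M, Q)
x = Z[:m1] / Z[:m1].sum()                # normalize the two parts of Z
y = Z[m1:] / Z[m1:].sum()
print(x, y)                              # -> [0.5 0.5] [0.5 0.5]
```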

Suppose that only player 2 controls the transitions in a discounted stochastic game, i.e., q(y|x, a, b) is independent of a ∈ A1. Let \(\{f_1,\ldots ,f_{m_1}\}\) and \(\{g_1,\ldots ,g_{m_2}\}\) be the families of all pure stationary strategies for players 1 and 2, respectively. Consider the bimatrix game \((\mathbb {A},\mathbb {B})\), where the entries aij of \(\mathbb {A}\) and bij of \(\mathbb {B}\) are

$$\displaystyle \begin{aligned} a_{ij}:= \sum_{x\in X} u_1(x,f_i(x),g_j(x)) \quad\mbox{and}\quad b_{ij}:= \sum_{x\in X} J_{\beta}^2(x,f_i,g_j). \end{aligned}$$

Then, making use of the Lemke-Howson algorithm, Nowak and Raghavan (1993) proved the following result.

Theorem 16

Let \(\xi ^* =(\xi ^*_1,\ldots ,\xi ^*_{m_1})\) and \(\zeta ^*=(\zeta ^*_1,\ldots ,\zeta ^*_{m_2})\) and assume that (ξ∗, ζ∗) is a Nash equilibrium in the bimatrix game \((\mathbb {A},\mathbb {B})\) defined above. Then the stationary strategies

$$\displaystyle \begin{aligned} f^*(x)=\sum_{j=1}^{m_1}\xi^*_j\delta_{f_j(x)}\quad\mathit{\mbox{and}}\quad g^*(x)=\sum_{j=1}^{m_2}\zeta^*_j\delta_{g_j(x)} \end{aligned}$$

form a Nash equilibrium in the discounted stochastic game.

It should be noted that a similar result does not hold for stochastic games with the limit-average payoffs. Note that the entries of the matrix \( \mathbb {B}\) can be computed in finitely many steps, but the order of the associated LCP is very high. Therefore, a natural question arises as to whether the single-controller stochastic game can be solved with the help of an LCP formulation with an appropriately defined matrix \(\mathbb {M}\) (of lower dimension) and vector \(\overline {Q}\). Since the payoffs and transitions depend on the states, and stationary equilibria are characterized by systems of Bellman equations, the dimension of the LCP must remain fairly high. However, it should be essentially smaller than in the case of Theorem 16. Such an LCP formulation for discounted single-controller stochastic games was given by Mohan et al. (1997) and further developed in Mohan et al. (2001). In the case of the limit-average payoff and single-controller stochastic game, Raghavan and Syed (2002) provided an analogous algorithm. Further studies on specific classes of stochastic games (acyclic 3-person switching control games, polystochastic games) can be found in Krishnamurthy et al. (2012).

Let us recall that a Nash equilibrium in an n-person game is a fixed point of some mapping. A fixed point theorem on certain deformations of continuous mappings proved by Browder (1960) turned out to be basic for developing the so-called homotopy methods for computing equilibria in nonzero-sum games. It reads as follows.

Theorem 17

Assume that \(C\subset \mathbb {R}^d\) is a nonempty compact convex set. Let Ψ : [0, 1] × C → C be a continuous mapping and F(Ψ) := {(t, c) ∈ [0, 1] × C : c = Ψ(t, c)}. Then F(Ψ) contains a connected subset Fc(Ψ) such that Fc(Ψ) ∩ ({0}× C) ≠ ∅ and Fc(Ψ) ∩ ({1}× C) ≠ ∅.

This result was extended to upper semicontinuous correspondences by Mas-Colell (1974). Consider an n-person game and assume that Ψ1 is a continuous mapping whose fixed points in the set C of strategy profiles correspond to Nash equilibria of this game. The basic idea of the homotopy methods is to define a “deformation” Ψ of Ψ1 such that Ψ(1, c) = Ψ1(c) for all c ∈ C and such that Ψ(0, ⋅) has a unique fixed point, say \(c_0^*\), that is relatively simple to find. By Theorem 17, Fc(Ψ) is a connected set. Thus, \(c_0^*\) is connected via Fc(Ψ) with a fixed point \(c_1^*\) of Ψ1. Hence, the idea is to follow the connected set Fc(Ψ). Since the dimension of the domain of Ψ is one higher than the dimension of its range, one can formulate regularity conditions under which the approximation path is a compact, piecewise differentiable one-dimensional manifold, i.e., a finite collection of arcs and loops. In the case of bimatrix games, a nondegeneracy condition is sufficient to guarantee that the aforementioned properties are satisfied. A comprehensive discussion of homotopy algorithms applied to n-person games is provided by Herings and Peeters (2010) and the references cited therein. According to the authors, “advantages of homotopy algorithms include their numerical stability, their ability to locate multiple solutions, and the insight they provide in the properties of solutions”. Various examples show that the implementation of homotopy methods is rather straightforward with the aid of available professional software. It is worth recalling the known fact that the Lemke-Howson algorithm can be applied to bimatrix games only. The problem of finding Nash equilibria in concave n-person games leads to a nonlinear complementarity problem. Therefore, one can only expect to obtain approximate equilibria by different numerical methods.
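To give the flavor of such a deformation, the toy sketch below (in Python; a logit-smoothing continuation of our own devising, not one of the professional algorithms discussed above) starts from the unique fixed point of the fully smoothed map and follows it as the smoothing is gradually removed:

```python
import numpy as np

A = np.array([[2., 0.], [0., 1.]])       # a symmetric 2x2 coordination game
x, y = np.full(2, 0.5), np.full(2, 0.5)  # unique fixed point at full smoothing
for lam in np.linspace(0.0, 30.0, 301):  # lam = 0: constant map; large lam:
    for _ in range(100):                 # approximate best responses
        bx = np.exp(lam * (A @ y)); by = np.exp(lam * (A.T @ x))
        x = 0.9 * x + 0.1 * bx / bx.sum()    # damped fixed-point step
        y = 0.9 * y + 0.1 * by / by.sum()
print(np.round(x, 3), np.round(y, 3))    # -> [1. 0.] [1. 0.], a pure equilibrium
```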

The homotopy methods, as noted by Herings and Peeters (2004), are also useful in the study of stationary equilibria, their structure and computation in nonzero-sum stochastic games. Their results can be applied to n-person discounted stochastic games with finite state and action spaces.

Recently, Govindan and Wilson (2003) proposed a new algorithm to compute Nash equilibria in finite games. Their algorithm combines the global Newton method (see Smale 1976) with a homotopy method for finding fixed points of continuous mappings developed by Eaves (1972, 1984). In the construction of a Nash equilibrium, a fundamental topological property of the graph of the Nash equilibrium correspondence discovered by Kohlberg and Mertens (1986) plays an important role. More precisely, the authors show that, making use of the global Newton method, it is possible to trace the path of the homotopy by a dynamical system. The same method can be applied to construct an algorithm for n-person discounted stochastic games with finite action and state sets; see Govindan and Wilson (2009). Strategic n-person games with a potential function having pure Nash equilibria were considered by Monderer and Shapley (1996). Potters et al. (2009), on the other hand, examined certain classes of discounted stochastic games via the potential function approach and constructed pure stationary Nash equilibria by solving a finite number of finite strategic games.

Solan and Vieille (2010) pointed out that the methods based on formal logic, successfully applied to zero-sum games, are also useful in the examination of certain classes of nonzero-sum stochastic games with the limit-average payoff criterion.

9 Uniform Equilibrium, Subgame Perfection, and Correlation in Stochastic Games with Finite State and Action Spaces

In this section, we consider stochastic games with finite state space X = {1, …, s} and finite sets of actions. We deal with “normalized discounted payoffs” and use notation which is more consistent with the surveyed literature. We let β = 1 − λ and multiply all current payoffs by λ ∈ (0, 1). Thus, we consider

$$\displaystyle \begin{aligned} J^i_{\lambda}(x,\pi):= E_x^\pi\left(\sum_{t=1}^\infty \lambda(1-\lambda)^{t-1}u_i(x_t,a^t)\right),\quad x=x_1\in X,\ \pi\in\varPi,\ i\in N. \end{aligned}$$

For any \(T\in \mathbb {N}\) and x = x1 ∈ X, π ∈ Π, the T-stage average payoff for player i ∈ N is

$$\displaystyle \begin{aligned} J^i_T(x,\pi):= E_x^\pi\left( \frac{1}{T}\sum_{t=1}^T u_i(x_t,a^t)\right). \end{aligned}$$

A vector \(\bar {g}\in \mathbb {R}^{n}\) is called a uniform equilibrium payoff at the initial state x ∈ X if for every 𝜖 > 0 there exist λ0 ∈ (0, 1], \(T^0\in \mathbb {N}\) and a strategy profile π0 ∈ Π such that for every player i ∈ N and every strategy πi ∈ Πi, we have

$$\displaystyle \begin{aligned} \bar{g}^i +\epsilon \ge J^i_{\lambda}(x,\pi^0)\ge \bar{g}^i-\epsilon \ge J^i_{\lambda}(x,(\pi_i,\pi^0_{-i}))-2\epsilon \quad\mbox{for}\quad \lambda\in (0,\lambda^0] \end{aligned}$$

and

$$\displaystyle \begin{aligned} \bar{g}^i +\epsilon \ge J^i_T(x,\pi^0)\ge \bar{g}^i-\epsilon \ge J^i_T(x,(\pi_i,\pi^0_{-i}))-2\epsilon \quad\mbox{for}\quad T\ge T^0. \end{aligned}$$

Any profile π0 that has the above two properties is called a uniform 𝜖-equilibrium. In other words, the game has a uniform equilibrium payoff if for every 𝜖 > 0 there is a strategy profile π0 which is an 𝜖-equilibrium in every discounted game with a sufficiently small discount factor λ and in every finite-stage game with a sufficiently long time horizon.

A stochastic game is called absorbing if all states but one are absorbing. Assume that X = {1, 2, 3} and only state x = 1 is nonabsorbing. Let E0 denote the set of all uniform equilibrium payoffs. Since the payoffs are determined in states 2 and 3, in a two-person game the set E0 can be viewed as a subset of \(\mathbb {R}^2\). Let λk → 0 as k → ∞, and let \( f_k^*\) be a stationary Markov perfect equilibrium in the λk-discounted two-person game. A question arises as to whether the sequence \( (J^1_{\lambda _k}(x,f^*_k),J^2_{\lambda _k}(x,f^*_k))_{k\in \mathbb {N}}\) with x = 1 has an accumulation point \( \bar {g}\in E^0\). This is the case for zero-sum games (see Mertens and Neyman 1981). However, Sorin (1986) provided a nonzero-sum modification of the “Big Match,” in which only state x = 1 is nonabsorbing and \(\lim _{k\to \infty } (J^1_{\lambda _k}(1,f^*_k),J^2_{\lambda _k}(1,f^*_k))\not \in E^0\). A similar phenomenon occurs for the limit of T-stage equilibrium payoffs. Sorin (1986) gave a full description of the set E0 in his example. His observations were generalized by Vrieze and Thuijsman (1989) to all 2-person absorbing games. They proved the following result.

Theorem 18

Any two-person absorbing stochastic game has a uniform equilibrium payoff.

We now state the fundamental result of Vieille (2000a,b).

Theorem 19

Every two-person stochastic game has a uniform equilibrium payoff.

The proof of Vrieze and Thuijsman (1989) is based on the “vanishing discount factor approach” combined with the idea of “punishment” successfully used in repeated games. The assumption that there are only two players is important in the proof. The 𝜖-equilibrium strategies that they construct need unbounded memory. The proof of Vieille (2000a,b) is involved. One of the reasons is that the ergodic classes do not depend continuously on strategy profiles. Following Vieille (2002), one can briefly say that “the basic idea is to devise an 𝜖-equilibrium profile that takes the form of a stationary-like strategy vector, supplemented by threats of indefinite punishment”. The construction of a uniform equilibrium payoff consists of two independent steps. First, a class of solvable states is recognized and some controlled sets are considered. Second, the problem is reduced to the existence of equilibria in a class of recursive games. The punishment component is crucial in the construction, and therefore the fact that the game is 2-person is important. Neither of the two parts of the proof can be extended to games with more than two players. The 𝜖-equilibrium profiles have no subgame-perfection property and require unbounded memory for the players. For a heuristic description of the proof, the reader is referred to Vieille (2002). In a recent paper, Solan (2017) proposed a new solution concept for multiplayer stochastic games called acceptable strategy profiles. It is simpler than uniform equilibrium and has some interesting properties. A suitable adaptation of the notion of uniform equilibrium is studied by Neyman (2017) in the class of continuous-time stochastic games with a small imprecision in the specification of players’ evaluations of streams of payoffs.

Flesch et al. (1997) proposed a three-person game with absorbing states in which only a cyclic Markov equilibrium exists. No examples of this type were found in the 2-person case. This example inspired Solan (1999), who, making use of certain arguments from Vrieze and Thuijsman (1989), proved the following result.

Theorem 20

Every 3-person absorbing stochastic game has a uniform equilibrium payoff.

In a quitting game, every player has only two actions, c for continue and q for quit. As soon as one or more of the players at any stage chooses q, the game stops and the players receive their payoffs, which are determined by the subset of players, say S, that simultaneously choose the action q. If nobody chooses q throughout all stages of the play, then all players receive zero. The payoffs are defined as follows. For every nonempty subset S ⊂ N of players, there is a payoff vector \(v(S) \in \mathbb {R}^n\). If S is the subset of players choosing q at the first stage on which anyone quits, then every player i ∈ N receives the payoff v(S)i. A quitting game is a special absorbing stochastic game with limit-average payoffs. The example of Flesch et al. (1997) belongs to this class. We now state the result due to Solan and Vieille (2001).

Theorem 21

Consider a quitting game satisfying the following assumptions: if player i alone quits, then i receives 1, and if player i quits with some other players, then i receives at most 1. Then the game has a subgame-perfect 𝜖-equilibrium. Moreover, there is a cyclic 𝜖-equilibrium strategy profile.

Quitting games are special cases of “escape games” studied by Simon (2007). As shown by Simon (2012), a study of quitting games can be based on some methods of topological dynamics and homotopy theory. More comments on this issue can be found in Simon (2016).

Thuijsman and Raghavan (1997) studied n-person perfect information stochastic games and n-person ARAT stochastic games and showed the existence of pure equilibria in the limit-average payoff case. They also derived the existence of 𝜖-equilibria for 2-person switching control stochastic games with the same payoff criterion. A class of n-person stochastic games with the limit-average payoff criterion and additive transitions as in the ARAT case (see Sect. 5) was studied by Flesch et al. (2007). The payoff functions do not satisfy any separability-in-actions assumption. They established the existence of Nash equilibria that are history dependent. For 2-person absorbing games, they showed the existence of stationary 𝜖-equilibria. In Flesch et al. (2008, 2009), the authors studied stochastic games with the limit-average payoffs where the state space X is the Cartesian product of some finite sets Xi, i ∈ N. For any state x = (x1, …, xn) ∈ X and any profile of actions a = (a1, …, an), the transition probability is of the form q(y|x, a) = q1(y1|x1, a1)⋯qn(yn|xn, an), where y = (y1, …, yn) ∈ X. In both the aperiodic and the periodic case, they established the existence of Nash equilibria for n-person games. In the two-person zero-sum case, there exists a stationary Markov perfect equilibrium.

A stochastic game is recursive if the payoffs at all nonabsorbing states are zero. The class of recursive stochastic games is important. The payoffs in any absorbing state can be interpreted as limit averages of stage payoffs as soon as the absorbing state is reached. If no absorbing state is reached, then the average payoff is zero. Moreover, as noted by Simon (2016), “by expanding the state space of any normal stochastic game so that there is a one-to-one relationship between the finite histories of play and the states, any state corresponds to a clopen (open and closed) subset of the infinite histories of play and every open subset of the infinite histories of play will correspond to some collection of states. A stochastic game where all non-zero payoffs are determined by membership in an open set of the infinite histories of play becomes in this way equivalent to a recursive game. Notice that if all absorbing payoffs are positive then the payoffs are lower semicontinuous, and if all absorbing payoffs are negative then the payoffs are upper semicontinuous (as functions on the infinite histories of play).” Flesch et al. (2010b) considered a class of n-person stochastic perfect information games assuming that in every state, the transitions are controlled by one player. The payoffs are equal to zero in every nonabsorbing state and are nonnegative in every absorbing state. They proposed a new iterative method to analyse these games under the expected limit-average payoff criterion and proved the existence of a subgame-perfect 𝜖-equilibrium in pure strategies. They also showed the existence of the uniform equilibrium payoffs. Recursive n-person perfect information games, where each player controls one nonabsorbing state and the transitions are deterministic, were studied in Kuipers et al. (2016). Allowing also for negative payoffs in absorbing states (in contrast to Flesch et al. 2010b), the authors showed the existence of a subgame-perfect 𝜖-equilibrium by a combinatorial method.

Correlated equilibria were introduced by Aumann (1974, 1987) for games in normal form. Correlation devices may be of different types; see Forges (2009). In Sect. 4 we considered correlation devices using public randomization. Such devices are also called stationary, because at every stage a signal is generated according to the same probability distribution, independently of any data. There are also devices based on past signals that were sent to the players, but not on the past play. They are called “autonomous correlation devices” (see Forges 2009). An 𝜖-equilibrium in an extended game that includes an autonomous correlation device is also called an extensive-form correlated 𝜖-equilibrium in a multistage game. Solan (2001) characterized the set of extensive-form correlated 𝜖-equilibria in stochastic games. He showed that every feasible and individually rational payoff in a stochastic game is an extensive-form correlated equilibrium payoff constructed with the help of an appropriately chosen device.

The following two results are due to Solan and Vieille (2002).

Theorem 22

Every n-person stochastic game with finite state and action spaces has a uniform correlated equilibrium payoff using an autonomous correlation device.

The construction of an equilibrium profile is based on the method of Mertens and Neyman (1981) applied to zero-sum games. The equilibrium path is sustained by the use of threat strategies. However, punishment occurs only if a player disobeys the recommendation of the correlation device. The second result is stronger in some sense but concerns positive recursive games, where the payoffs in absorbing states are nonnegative for all players.

Theorem 23

Every positive recursive stochastic game with finite sets of states and actions has a uniform correlated equilibrium payoff and the correlation device can be taken to be stationary.

The proof of the above result makes use of a variant of the method of Vieille (2000b).

In a recent paper, Mashiah-Yaakovi (2015) considered stochastic games with countable state spaces, finite sets of actions and Borel measurable bounded payoffs defined on the space H of all plays. This class includes the Gδ-games of Blackwell (1969). The concept of uniform 𝜖-equilibrium does not apply to this class of games, because the payoffs are not additive. She proved that these games have extensive-form correlated 𝜖-equilibria.

Secchi and Sudderth (2002a) considered a special class of n-person stochastic “stay-in-a-set games” defined as follows. Let Gi be a fixed subset of X for each i ∈ N. Define \(G^i_{\infty }:=\{(x_1,a^1,x_2,a^2,\ldots ): x_t\in G_i \mbox{ for every } t\}\). The payoff function for player i ∈ N is the characteristic function of the set \(G^i_\infty \). They proved the existence of an 𝜖-equilibrium (equilibrium) assuming that the state space is countable (finite) and the sets of actions are finite. Maitra and Sudderth (2003) generalized this result to Borel state stay-in-a-set games with compact action sets using a standard continuity assumption on the transition probability with respect to actions. Secchi and Sudderth (2002b) proved that every n-person stochastic game with countably many states, finite action sets and bounded upper semicontinuous payoff functions on H has an 𝜖-equilibrium. All proofs in the aforementioned papers are partially based on methods from gambling theory; see Dubins and Savage (2014).

Nonzero-sum infinite horizon games with perfect information are special cases of stochastic games. Flesch et al. (2010a) established the existence of subgame-perfect 𝜖-equilibria in pure strategies in perfect information games with lower semicontinuous payoff functions on the space H of all plays. A similar result for games with upper semicontinuous payoffs was proved by Purves and Sudderth (2011). It is worth mentioning that the aforementioned results also hold for games with arbitrary nonempty action spaces and deterministic transitions. Solan and Vieille (2003) provided an example of a two-person game with perfect information that has no subgame-perfect 𝜖-equilibrium in pure strategies, but does have a subgame-perfect 𝜖-equilibrium in behavior strategies. Their game belongs to the class of deterministic stopping games. Recently, Flesch et al. (2014) showed that a subgame-perfect 𝜖-equilibrium (in behavioral strategies) may not exist in perfect information games if the payoff functions are bounded and Borel measurable.

Additional general results on subgame-perfect equilibria in games of perfect information can be found in Alós-Ferrer and Ritzberger (2015, 2016). Two refinements of subgame-perfect 𝜖-equilibrium concept were introduced and studied in continuous games of perfect information by Flesch and Predtetchinski (2015).

Two-person discounted stochastic games of perfect information with finite state and action spaces were treated in Küenle (1994). Making use of threat strategies, he constructed a history-dependent pure Nash equilibrium. However, it is worth pointing out that pure stationary Nash equilibria need not exist in this class of games. A similar remark applies to irreducible stochastic games of perfect information with the limit-average payoff criterion. Counterexamples are described in Federgruen (1978) and Küenle (1994).

We close this section with a remark on “folk theorems” for stochastic games. It is worth mentioning that the techniques based on threat strategies, utilized very often in repeated games, cannot be immediately adapted to stochastic games where the players use randomized (behavioral) strategies. Deviations are difficult to detect when the actions are selected at random. However, some folk theorems for various classes of stochastic games were proved in Dutta (1995), Fudenberg and Yamamoto (2011), Hörner et al. (2011, 2014), and Peski and Wiseman (2015). Further comments can be found in Solan and Zillotto (2016).

Abreu et al. (1986, 1990) applied a method for analysing subgame-perfect equilibria in discounted repeated games that resembles the dynamic programming technique. The set of equilibrium payoffs is a fixed point of some naturally defined set-valued operator. A similar idea was used in stochastic games by Mertens and Parthasarathy (1991). The fixed point property of subgame-perfect equilibrium payoffs can be used to develop algorithms. Berg (2016) and Kitti (2016) considered some modifications of the aforementioned methods for discounted stochastic games with finite state spaces. They also demonstrated some techniques for computing (non-stationary) subgame-perfect equilibria in pure strategies provided that they exist. Sleet and Yeltekin (2016) applied the methods of Abreu et al. (1986, 1990) to some classes of dynamic games and provided a new approach to computing equilibrium value correspondences. Their idea is based on outer and inner approximations of the equilibrium value correspondence via step set-valued functions.

10 Nonzero-Sum Stochastic Games with Imperfect Monitoring

There are only a few papers on nonzero-sum stochastic games with imperfect monitoring (or incomplete information). Although in many models an equilibrium does not exist, some positive results were obtained for repeated games; see Forges (1992), Chap. IX in Mertens et al. (2015) and the references cited therein. Altman et al. (2005, 2008) studied stochastic games in which every player can only observe and control his “private state”, and the state of the world is composed of the vector of private states. Moreover, the players do not observe the actions of their partners in the game. Such game models are motivated by certain examples in wireless communications.

In the model of Altman et al. (2008), the state space is \(X= \prod _{i=1}^n X_i\), where Xi is a finite set of private states of player i ∈ N. The action space Ai(xi) of player i ∈ N depends on xi ∈ Xi and is finite. It is assumed that player i ∈ N has no information about the payoffs (called costs here). Hence, player i only knows the history of his private state process and the actions chosen by himself in the past. Thus, a strategy πi of player i ∈ N is independent of the realizations of the state processes of the other players and of their actions. If x = (x1, …, xn) ∈ X is a state at some period of the game and a = (a1, …, an) is the action profile selected independently by the players at that state, then the probability of going to state y = (y1, …, yn) is q(y|x, a) = q1(y1|x1, a1)⋯qn(yn|xn, an), where \(q_i(\cdot |x_i,a_i)\in \Pr (X_i)\). Thus the coordinate (or private) state processes are independent. It is assumed that every player i has a probability distribution νi of the initial state xi ∈ Xi and that the initial private states are independent. The initial distribution ν of the state x ∈ X is determined by ν1, …, νn in an obvious way and is known to the players. Further, it is supposed that player i ∈ N is given some stage cost functions \( c_i^j(x,a)\) (j = 0, 1, …, ni) depending on x ∈ X and the action profiles a available in that state. The cost function \(c^0_i\) is to be minimized by player i in the long run, whereas the \(c^j_i\) (for j > 0) must satisfy the constraints described below.

Any strategy profile π together with the initial distribution ν and the transition probability q induces a unique probability measure on the space of all infinite plays. The expectation operator with respect to this measure is denoted by \(E^\pi _{\nu }\). The expected limit-average cost \( C_i^j(\pi )\) is defined as follows:

$$\displaystyle \begin{aligned} C^j_i(\pi):= \limsup_{T\to\infty} \frac{1}{T}E^\pi_{\nu}\left(\sum_{t=1}^T c^j_i(x^t,a^t)\right). \end{aligned}$$

Note that xt ∈ X and at is an action profile of all the players.

Let \(b_i^j>0\) (j = 1, …, ni) be bounds used to define constraints below. A strategy profile π is i-feasible if

$$\displaystyle \begin{aligned} C^j_i(\pi)\le b_i^j\quad \mbox{ for each}\quad j=1,\ldots,n_i. \end{aligned}$$

Thus, π is feasible if it is i-feasible for every player i ∈ N.

A strategy profile π∗ is called a constrained Nash equilibrium if π∗ is feasible and if, for every player i ∈ N and every strategy πi of player i such that the profile \( (\pi _i,\pi ^*_{-i})\) is i-feasible, we have

$$\displaystyle \begin{aligned} C^0_i(\pi^*)\le C^0_i(\pi_i,\pi^*_{-i}). \end{aligned}$$

Note that a unilateral deviation of player i may increase his cost or it may violate his constraints. The aforementioned fact is illustrated in Altman et al. (2008) by an example in wireless communications.

Altman et al. (2008) made the following assumptions.

  1. (I1)

    (Ergodicity) For every player i ∈ N and any stationary strategy, the state process on Xi is a unichain Markov process, i.e., it has one ergodic class and possibly some transient states.

  2. (I2)

    (Strong Slater condition) There exists some η > 0 such that every player i ∈ N has a strategy \(\pi _i^\eta \) with the property that for any strategy profile \(\pi_{-i}\) of the other players

    $$\displaystyle \begin{aligned} C^j_i(\pi^\eta_i,\pi_{-i})\le b^j_i-\eta \quad \mbox{for all}\quad j=1,\ldots,n_i. \end{aligned}$$
  3. (I3)

    (Information) The players do not observe their costs.

Theorem 24

Consider the game model that satisfies conditions (I1)–(I3). Then there exists a stationary constrained Nash equilibrium.

Stochastic games with finite sets of states and actions and imperfect public monitoring were studied in Fudenberg and Yamamoto (2011) and Hörner et al. (2011). In their models, the players observe states and receive only public signals about the actions chosen by their partners in the game. Fudenberg and Yamamoto (2011) and Hörner et al. (2011) established “folk theorems” for stochastic games under assumptions that relate to “irreducibility” conditions on the transition probability function. Moreover, Hörner et al. (2011, 2014) also studied algorithms both for computing the sets of all equilibrium payoffs in the normalized discounted games and for finding their limit as the discount factor tends to one. As shown by counterexamples in Flesch et al. (2003), an n-person stochastic game with non-observable actions of the players (and no public signals), observable payoffs and the expected limit-average payoff criterion need not possess an 𝜖-equilibrium. Cole and Kocherlakota (2001) studied discounted stochastic games with hidden states and actions. They provided an algorithm for finding a sequential equilibrium in which strategies depend on private information only through the privately observed state. Imperfect monitoring is also assumed in the model of the supermodular stochastic game studied by Balbus et al. (2013b), where the monotone convergence of Nash equilibrium payoffs in finite-stage games is proved.

11 Intergenerational Stochastic Games

This section develops a concept of equilibrium behavior and establishes its existence in various intergenerational games. Both paternalistic and non-paternalistic altruism cases are discussed. Consider an infinite sequence of generations labelled by \(t\in \mathbb {N}\). There is a single good (also called a renewable resource) that can be used for consumption or productive investment. The set of all resource stocks S is an interval in \( \mathbb {R}\). It is assumed that 0 ∈ S. Every generation lives one period and derives utility from its own consumption and the consumptions of some or all of its descendants. Generation t observes the current stock st ∈ S and chooses at ∈ A(st) := [0, st] for consumption. The remaining part yt = st − at is left as an investment for its descendants. The next generation’s inheritance or endowment is determined by a weakly continuous transition probability q from S to S (stochastic production function), which depends on yt ∈ A(st) ⊂ S. Recall that the weak continuity of q means that q(⋅|ym) ⇒ q(⋅|y0) if ym → y0 in S (as m →∞). Usually, it is assumed that state 0 is absorbing, i.e., q({0}|0) = 1. Let Φ be the set of all Borel functions ϕ : S → S such that ϕ(s) ∈ A(s) for each s ∈ S. A strategy for generation t is a function ϕt ∈ Φ. If ϕt = ϕ for all \(t\in \mathbb {N}\) and some ϕ ∈ Φ, then we say that the generations employ a stationary strategy.

Suppose that all generations from t + 1 onward use a consumption strategy c ∈ Φ. Then, in the paternalistic model, generation t’s utility when it consumes at ∈ A(st) equals H(at, c)(st), where H is some real-valued function that measures the generation’s level of satisfaction. Thus, in models with paternalistic altruism each generation derives its utility from its own consumption and the consumptions of its successor or successors.

Such a game model reveals a time inconsistency. Strotz (1956) and Pollak (1968) were among the first to note this fact in the model of an economic agent whose preferences change over time. In related works, Phelps and Pollak (1968) and Peleg and Yaari (1973) observed that this situation is formally equivalent to one in which decisions are made by a sequence of heterogeneous planners. They investigated the existence of consistent plans, which we shall call (stationary) Markov perfect equilibria. The solution concept is in fact a symmetric Nash equilibrium (c, c, …) in a game played by countably many short-lived players having the same utility functions. Therefore, we can say that a stationary Markov perfect equilibrium (c, c, …) corresponds to a strategy c* ∈ Φ such that

$$\displaystyle \begin{aligned} H(c^*(s),c^*)(s) =\sup_{a\in A(s)}H(a,c^*)(s) \end{aligned}$$

for every s ∈ S. We identify this equilibrium with c*.

In other words, c* ∈ Φ is a stationary Markov perfect equilibrium if

$$\displaystyle \begin{aligned} c^*(s) \in \mathrm{arg} \max_{a\in A(s)}H(a,c^*)(s)\ \ \mbox{for each }\ s\in S. \end{aligned}$$
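The argmax condition above suggests a simple numerical illustration: discretize the stock space and iterate the best-response map until the consumption strategy stabilizes. The sketch below uses hypothetical functional forms and the deterministic transition of game (G1) introduced below; it is merely illustrative and is not the fixed-point construction used in the existence proofs.

```python
import numpy as np

# A minimal grid sketch of c*(s) in argmax_a H(a, c*)(s) for a deterministic
# bequest game; production and utility are hypothetical choices.
S = np.linspace(0.0, 10.0, 201)                      # discretized stock space
p = lambda y: 3.0 * np.sqrt(y)                       # production, increasing, p(0)=0
u_hat = lambda x, y: np.sqrt(x) + 0.9 * np.sqrt(y)   # separable utility u(x,y)

c = 0.5 * S                                          # initial consumption guess
for _ in range(200):
    c_new = np.empty_like(c)
    for k, s in enumerate(S):
        a = S[S <= s]                                # feasible consumptions A(s)=[0,s]
        succ = np.interp(p(s - a), S, c)             # descendant's consumption c(p(s-a))
        c_new[k] = a[np.argmax(u_hat(a, succ))]      # best response against c
    if np.max(np.abs(c_new - c)) < 1e-8:
        break
    c = c_new
print("approximate equilibrium consumption at s=5:", np.interp(5.0, S, c))
```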

There is now a substantial body of work on paternalistic models, see for instance, Alj and Haurie (1983), Harris and Laibson (2001), and Nowak (2010) and the results presented below in this section. At the beginning we consider three types of games, in which the existence of a stationary Markov perfect equilibrium was proved in a sequence of papers: Balbus et al. (2015a,b,c). Game (G1) describes a purely deterministic case, while games (G2) and (G3) deal with a stochastic production function. However, (G2) concerns a model with one descendant, whereas (G3) examines a model with infinitely many descendants. Let us mention that by an intergenerational game with k (k is finite or infinite) descendants (successors or followers), we mean a game in which each generation derives its utility from its own consumption and consumptions of its k descendants.

  1. (G1)

Let S := [0, +∞). Assume that \(q(\cdot |y_t)=\delta _{p(y_t)}( \cdot )\), where p : S → S is a continuous and increasing production function such that p(0) = 0. We also assume that

    $$\displaystyle \begin{aligned} H(a,c)(s)= \hat{u}(a,c(p(s-a))) \end{aligned}$$

for some function \(\hat {u}: \mathbb {R}^2_+ \to \mathbb {R}\cup \{-\infty \}\) that is continuous and increasing in each variable. Moreover, we allow \(\hat {u}\) to be unbounded from below: \(\hat {u}(0,y)\) may equal −∞ for y ≥ 0, but \(\hat {u}(x,0) >-\infty \) for all x > 0. Furthermore, for any y1 > y2 in S and h > 0, we assume that the function \(\varDelta _h\hat {u} (x):= \hat {u}(x,y_1)-\hat {u}(x+h,y_2)\) has the strict single crossing property on (0, +∞), i.e., \(\varDelta _h\hat {u}(x) \ge 0\) implies that \(\varDelta _h\hat {u}(x^{\prime })> 0\) for each x′ > x (see Milgrom and Shannon 1994).

  2. (G2)

Let S := [0, +∞). We study a model with a utility that reflects a generation’s attitude toward risk, captured by a risk coefficient r ≥ 0. In this setup, H takes the following form:

    $$\displaystyle \begin{aligned} H(a,c)(s) =\left\{ \begin{array}{l@{\quad \quad}l} u(a) + \beta\int_Sv(c(s^{\prime }))q(ds^{\prime }|s-a), & \mbox{for}\quad r=0 \\ u(a) - \frac{\beta}{r}\ln\int_Se^{-rv(c(s^{\prime }))}q(ds^{\prime }|s-a), & \mbox{for}\quad r>0, \end{array} \right. \end{aligned}$$

where \(u:S\to \mathbb {R}\cup \{-\infty \}\) is increasing, strictly concave and continuous on (0, +∞), and u(0) may equal −∞. In addition, the function \(v:S\to \mathbb {R}\) is bounded, continuous and increasing. Further assumptions are as follows: for every s ∈ S, the set Zs := {y ∈ S :  q({s}|y) > 0} is countable and the transition law q is stochastically increasing. The latter fact means that, if z → Q(z|y) is the cumulative distribution function for q(⋅|y), then for all y1 < y2 and z ∈ S, we have Q(z|y1) ≥ Q(z|y2). (A numerical sketch of the utility H in this setting is given after this list.)

  3. (G3)

Let \(S:=[0,\bar {s}]\) for some \(\bar {s}>0\). In this case, we assume that the utility function of the current generation t is as follows:

    $$\displaystyle \begin{aligned} H(a,c)(s) = \tilde{u}(a) + E^c_{s}[w(a_{t+1},a_{t+2},\ldots)], \end{aligned}$$

where \(w: S^\infty \to \mathbb {R}\) is continuous and \(\tilde {u}: S\mapsto \mathbb {R}\) is continuous, strictly concave and increasing. Here, \(E_s^c\) is the expectation operator with respect to the unique probability measure on the space of all feasible future histories (starting from the endowment s of generation t) of the consumption-investment process induced by the stationary strategy c ∈ Φ used by each generation τ (τ > t) and the transition probability q. Defining

    $$\displaystyle \begin{aligned} \tilde{J}(c)(s)= E^c_{s}[w(a_k, a_{k+1},a_{k+2},\ldots)] \end{aligned}$$

for every \(k\in \mathbb {N}\) (by stationarity of c, the right-hand side does not depend on k), we obtain that

    $$\displaystyle \begin{aligned} H(a,c)(s) =\tilde{u}(a)+\int_S \tilde{J}(c)(s^{\prime })q(ds^{\prime }|s-a). \end{aligned}$$

    In addition, q(⋅|y) is assumed to be nonatomic for y > 0.
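The numerical sketch announced in (G2) evaluates the utility H(a, c)(s) by Monte Carlo for the risk-neutral case r = 0 and a risk-sensitive case r > 0. The primitives u, v, the strategy c, and the (lognormal, hence nonatomic) transition law are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo evaluation of the (G2) utility H(a, c)(s); all primitives
# below are hypothetical stand-ins for the model data.
u = lambda a: np.log(a + 1e-12)          # increasing, strictly concave
v = lambda x: x / (1.0 + x)              # bounded, continuous, increasing
c = lambda s: 0.5 * s                    # a stationary consumption strategy
beta = 0.95

def H(a, s, r, n=100_000):
    y = s - a                                    # investment
    s_next = y * rng.lognormal(0.0, 0.3, n)      # draws from q(.|y)
    z = v(c(s_next))
    if r == 0:
        return u(a) + beta * z.mean()
    return u(a) - (beta / r) * np.log(np.mean(np.exp(-r * z)))

for r in (0.0, 2.0):
    print(f"r={r}: H(1.0, c)(4.0) ~ {H(1.0, 4.0, r):.4f}")
```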

Let I denote the set of nondecreasing lower semicontinuous functions \(i: S\to \mathbb {R}\) such that i(s) ∈ A(s) for each s ∈ S. Note that every i ∈ I is continuous from the left and has at most a countable number of discontinuity points. Put

$$\displaystyle \begin{aligned} F:=\{c\in \varPhi:\; c(s)=s-i(s)\ \mbox{for all}\ s\in S\ \mbox{and some}\ i\in I\}. \end{aligned}$$

Clearly, any c ∈ F is upper semicontinuous and continuous from the left. The idea of using the class F of strategies for analysing equilibria in deterministic bequest games comes from Bernheim and Ray (1983). Further, it was successfully applied to the study of other classes of dynamic games with simultaneous moves; see Sundaram (1989a) and Majumdar and Sundaram (1991).
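As a small illustration of this class of strategies, the snippet below builds an element of F from a hypothetical nondecreasing, left-continuous step investment function i ∈ I; the induced consumption strategy c(s) = s − i(s) is then upper semicontinuous and continuous from the left, as noted above.

```python
# The strategies in F are s -> s - i(s) with i nondecreasing and lower
# semicontinuous; a hypothetical step investment policy illustrates the class.
def i(s, thresholds=(1.0, 3.0), levels=(0.0, 0.5, 2.0)):
    """Nondecreasing, left-continuous step investment (an element of I)."""
    idx = sum(s > t for t in thresholds)     # strict inequality => left-continuous
    return min(levels[idx], s)               # keep i(s) in A(s) = [0, s]

c = lambda s: s - i(s)                       # the induced consumption strategy in F
print([round(c(s), 2) for s in (0.5, 1.0, 1.5, 3.0, 4.0)])
```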

Theorem 25

Every intergenerational game (G1), (G2), and (G3) possesses a stationary Markov perfect equilibrium c* ∈ F.

The main idea of the proof is based upon the consideration of an operator L defined as follows: to each consumption strategy c ∈ F used by the descendant (or descendants), L assigns the maximal element c0 of the set of best responses to c. It is shown that c0 ∈ F. Moreover, F can be viewed as a convex subset of the vector space Y  of real-valued, continuous from the left functions \(\eta :S\mapsto \mathbb {R}\) of bounded variation on every interval Sn := [0, n], \(n\in \mathbb {N}\), thus in particular on \([0,\bar {s}]\). We further equip Y  with the topology of weak convergence. We say that (ηm) converges weakly to some η0 ∈ Y  if \(\lim\limits _{m\to \infty }\eta ^m(s)=\eta ^0(s)\) for every continuity point s of η0. Then, due to Lemma 2 in Balbus et al. (2015c), F is compact and metrizable. Finally, the equilibrium point is obtained via the Schauder-Tychonoff fixed point theorem applied to the operator L.

Theorem 25 for game (G1) was proved in Balbus et al. (2015c). Related results for the purely deterministic case were obtained by Bernheim and Ray (1983) and Leininger (1986). For instance, Leininger (1986) studied a class \( \mathcal {U}\) of bounded from below utility functions for which every selector of the best response correspondence is nondecreasing. In particular, he noticed that this class is nonempty and includes, for instance, the separable case, i.e., u(x, y) = v(x) + bv(y), where v is strictly increasing and concave and b > 0. Bernheim and Ray (1983), on the other hand, showed that the functions u that are strictly concave in their first argument and satisfy the so-called increasing differences property (see Sect. 2) also belong to \(\mathcal {U}\). Other functions u that meet the conditions imposed by Bernheim and Ray (1983) and Leininger (1986) are of the form u(x, y) = v1(x)v2(y), where v1 is strictly concave and v2 ≥ 0 is continuous and increasing. The class \(\mathcal {U}\) has not been fully characterized. The class (G1) of games includes all the above-mentioned examples and some new ones. Our result is also valid for a larger class of utilities that can be unbounded from below. Therefore, Theorem 25 is a generalization of Theorem 4.2 in Bernheim and Ray (1983) and Theorem 3 in Leininger (1986). The proofs given by Bernheim and Ray (1983) and Leininger (1986) do not work for unbounded utility functions. Indeed, Leininger (1986) uses a transformation of upper semicontinuous consumption strategies into the set of Lipschitz functions with constant 1. This clever “levelling” operation enables him to equip the space of continuous functions on the interval \([0, \bar {y}]\) with the topology of uniform convergence and to apply the Schauder fixed point theorem. His proof relies strongly on the uniform continuity of u. This is the case when the production function crosses the 45° line. If the production function does not cross the 45° line, a stationary equilibrium is obtained as a limit of equilibria corresponding to truncations of the production function. However, this part of the proof is descriptive and sketchy. Bernheim and Ray (1983), on the other hand, identify the maximal best-response consumption strategy, which is upper semicontinuous, with a convex-valued upper hemicontinuous correspondence. The space of such upper hemicontinuous correspondences is then equipped with the Hausdorff topology. This implies that the strategy space is compact if endowments have an upper bound, i.e., when the production function p crosses the 45° line. If this is not satisfied, then an approximation technique similar to that in Leininger (1986) is employed. Our proof does not follow the above-mentioned approximation methods. The weak topology introduced in the space Y  renders F compact and allows for an elementary but non-trivial analysis. For examples of deterministic bequest games with stationary Markov perfect equilibria given in closed form, the reader is referred to Fudenberg and Tirole (1991) and Nowak (2006b, 2010).

Theorem 25 for game (G2) was proved by Balbus et al. (2015b), and for game (G3) by Balbus et al. (2015a). Within the stochastic framework, Theorem 25 can be viewed as an attempt to save the result reported by Bernheim and Ray (1989) on the existence of stationary Markov perfect equilibria in games with very general utility functions and nonatomic shocks. If q is allowed to possess atoms, then a stationary Markov perfect equilibrium exists in bequest games with one follower (see Theorems 1–2 in Balbus et al. 2015b). The latter result also embraces the purely deterministic case; see Example 1 in Balbus et al. (2015b), where the nature and role of the assumptions are discussed. However, as shown in Example 3 in Balbus et al. (2015a), the existence of stationary Markov perfect equilibria in the class F cannot be proved in intergenerational games where q has atoms and there is more than one descendant.

The result in Bernheim and Ray (1986) concerns “consistent plans” in models with a finite time horizon. The problem is then simpler. The results of Bernheim and Ray (1986) were considerably extended by Harris (1985) in his paper on perfect equilibria in some classes of games of perfect information. It should be noted that there are other papers containing results for bequest games with a stochastic production function. Amir (1996b) studied games with one descendant for every generation and a transition probability such that the induced cumulative distribution function Q(z|y) is convex in y ∈ S. This condition is rather restrictive. Nowak (2006a) considered similar games in which the transition probability is a convex combination of the Dirac measure at state s = 0 and some transition probability from S to S, with coefficients depending on investments. Similar models were considered by Balbus et al. (2012, 2013a). The latter paper also studies some computational issues for stationary Markov perfect equilibria. One should note, however, that the transition probabilities in the aforementioned works are specific. In contrast, the transition structure in Balbus et al. (2015a,b) is consistent with the transitions used in the theory of economic growth; see Bhattacharya and Majumdar (2007) and Stokey et al. (1989).

An interesting issue studied in the economics literature concerns the limiting behavior of the state process induced by a stationary Markov perfect equilibrium. Below we formulate a steady-state result for a stationary Markov perfect equilibrium obtained for the game (G1). Under slightly more restrictive conditions, it was shown by Bernheim and Ray (1987) that the equilibrium capital stock never exceeds the optimal planning stock in any period. Namely, it is assumed that

  1. (B1)

p is strictly concave, continuously differentiable, \( \lim _{y\to 0^+}p^{\prime }(y)>1\) and \(\lim _{y\to \infty }p^{\prime }(y)<1/\beta \), where β ∈ (0, 1] is a discount factor;

  2. (B2)

\( \hat {u}(a_t,a_{t+1})=\hat {v}(a_t)+\beta \hat {v}(a_{t+1})\), where \(\hat {v} :S\to \mathbb {R}\) is increasing, continuously differentiable, strictly concave and \(\hat {v}(a)\to \infty \) as a →∞.

An optimal consumption program \(\hat {a}:=(\hat {a}_t)_{t\in \mathbb {N} }\) is the one which maximizes \(\sum _{t=1}^\infty \beta ^{t-1} \hat {v}(\hat {a} _t)\) subject to all feasibility constraints described in the model. The following result is stated as Theorems 3.2 and 3.3 in Bernheim and Ray (1987).

Theorem 26

Assume (B1)–(B2) and consider game (G1). If c* is a stationary Markov perfect equilibrium, then \(i^*(s)=s-c^*(s)\le \hat {y}\), where \(\hat {y}\in [0,\infty )\) is the limit of the sequence \((s_t-\hat {a}_t)_{t\in \mathbb {N}}\). If \(\hat {y}>0\), it solves βp′(y) = 1. If \(\lim _{y\to 0^+}p'(y)>1/\beta \), then \(\hat {y}>0\).

For further properties of stationary Markov perfect equilibria, such as efficiency and Pareto optimality, the reader is referred to Sect. 4 in Bernheim and Ray (1987).

For stochastic models it is of some interest to know whether a stationary Markov perfect equilibrium induces a Markov process having an invariant distribution. It turns out that the answer is positive if an additional stochastic monotonicity requirement is met:

  1. (B3)

    If y1 < y2, then for any nondecreasing Borel measurable function \(h:S\to \mathbb {R}\),

    $$\displaystyle \begin{aligned} \int_S h(s)q(ds|y_1)\le \int_S h(s)q(ds|y_2). \end{aligned}$$

By Theorem 25 for game (G3), there exists c* ∈ F. Then s → i*(s) = s − c*(s) is nondecreasing on S. Put q*(B|s) := q(B|i*(s)), where B is a Borel subset of S and s ∈ S. From (B3), it follows that s → q*(⋅|s) is nondecreasing. Define the mapping \(\varPsi :\Pr (S)\to \Pr (S)\) by

$$\displaystyle \begin{aligned} \varPsi\sigma(B):= \int_S q^*(B|s)\sigma(ds) \end{aligned}$$

where \(B\in \mathcal {B}(S)\). An invariant distribution for the Markov process induced by the transition probability q* determined by i* (and thus by c*) is any fixed point of Ψ. Let Δ(q*) be the set of invariant distributions for the process induced by q*. In Sect. 4 in Balbus et al. (2015a), the following result was proved.

Theorem 27

Assume (B3) and consider game (G3). Then, the set of invariant distributions Δ(q*) is compact in the weak topology on \(\Pr (S)\).

For each σ ∈ Δ(q*), let \(M(\sigma ):=\int _S s\,\sigma (ds)\) be the mean of the distribution σ. By Theorem 27, there exists a distribution σ** with the highest mean over the set Δ(q*).
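On a finite grid, the operator Ψ reduces to left multiplication of σ by a stochastic matrix, so an invariant distribution and its mean M(σ) can be approximated by straightforward iteration. In the sketch below, the investment policy i* and the kernel q* are hypothetical stand-ins.

```python
import numpy as np

# Fixed point of Psi on a discretized state space: with a finite grid, q*
# becomes a stochastic matrix Q and Psi(sigma) = sigma @ Q.
S = np.linspace(0.0, 5.0, 51)
i_star = lambda s: 0.4 * s                    # a nondecreasing investment policy

def row(s):
    """Discretized q*(.|s): a triangular kernel centred at 1.5 * i*(s)."""
    w = np.maximum(0.0, 1.0 - np.abs(S - 1.5 * i_star(s)))
    return w / w.sum()

Q = np.array([row(s) for s in S])             # Q[k] = q*(.|S[k])
sigma = np.full(len(S), 1.0 / len(S))         # initial distribution
for _ in range(2000):
    sigma = sigma @ Q                         # one application of Psi
print("mean M(sigma) of the invariant distribution ~", float(S @ sigma))
```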

One can ask about the uniqueness of the invariant distribution. Theorem 4 in Balbus et al. (2015a) gives a positive answer to this question. However, this result concerns the model with multiplicative shocks, i.e., q is induced by the equation

$$\displaystyle \begin{aligned} s_{t+1}= f(y_t)\xi_t, \quad t\in\mathbb{N}, \end{aligned}$$

where f : S → S is a continuous increasing function such that f(0) > 0. In addition, there is a state \( \hat {s}\in (0,\infty )\) such that f(y) > y for \(y\in (0,\hat {s})\) and f(y) < y for \(y\in (\hat {s}, \infty )\). Here \((\xi _t)_{t\in \mathbb {N}}\) is an i.i.d. sequence with the nonatomic distribution π. Assuming additionally the monotone mixing condition, we conclude from Theorem 4 in Balbus et al. (2015a) the uniqueness of the invariant distribution. Further discussion on these issues can be found in Stokey et al. (1989), Hopenhayn and Prescott (1992), Stachurski (2009), Balbus et al. (2015a) and the references cited therein.
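A short simulation illustrates this uniqueness phenomenon: with hypothetical f and i* satisfying f(0) > 0 and the crossing condition, and lognormal (hence nonatomic) shocks, long-run averages computed from very different initial endowments practically coincide.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulating s_{t+1} = f(y_t) xi_t under a hypothetical equilibrium investment
# y_t = i*(s_t); with a unique invariant distribution, long-run averages from
# different initial states agree.
f = lambda y: 1.0 + 2.0 * np.sqrt(y)          # continuous, increasing, f(0) > 0
i_star = lambda s: 0.5 * s

def long_run_mean(s0, T=100_000, burn=10_000):
    s, acc = s0, 0.0
    for t in range(T):
        s = f(i_star(s)) * rng.lognormal(-0.05, 0.3)   # nonatomic shock
        if t >= burn:
            acc += s
    return acc / (T - burn)

print(long_run_mean(0.1), long_run_mean(20.0))
```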

In contrast to the paternalistic model, one can also consider non-paternalistic altruism. This notion refers to a model in which each generation derives its utility from its own consumption and the utilities of all its successors. The most general model with non-paternalistic altruism was formulated by Ray (1987). His work is of some importance, because it provides a proper definition of an equilibrium for the non-paternalistic case. According to Ray (1987), a stationary equilibrium consists of a pair of two functions: a saving policy (or strategy) and an indirect utility function. Such a pair constitutes an equilibrium if it is optimal for the current generation, provided its descendants use the same saving strategy and the same indirect utility function.

Assume that the generations from t onward use a consumption strategy c ∈ Φ. Then, the expected utility of generation t, which inherits an endowment \(s_t=s\in S:=[0, \bar {s}]\), is of the form

$$\displaystyle \begin{aligned} W_t(c,v)(s):= (1-\beta)\tilde{u}(c(s)) +\beta E^c_{s}[w(v(s_{t+1}),v(s_{t+2}),\ldots)], \end{aligned} $$
(6.8)

where \(\tilde {u}:S \to K\) and \(w : K^\infty \to K\) are continuous functions and \(K:=[0,\bar {k}]\) with some \(\bar {k}\ge \bar {s}\). The function v : S → K is called an indirect utility and is assumed to be Borel measurable. Similarly, for any c ∈ Φ and s = st+1 ∈ S, we can define

$$\displaystyle \begin{aligned} J(c,v)(s):= E^c_{s}[w(v(s_{t+2}),v(s_{t+3}),\ldots)], \end{aligned}$$

which yields

$$\displaystyle \begin{aligned} \begin{array}{rcl} W(c,v)(s):=W_t(c,v)(s) &\displaystyle =&\displaystyle (1-\beta) \tilde{u}(c(s)) +\beta\int_S J(c,v)(s^{\prime })q(ds^{\prime }|s-c(s)). \end{array} \end{aligned} $$

Let us define

$$\displaystyle \begin{aligned} P(a,c,v)(s):=(1-\beta)\tilde{u}(a)+\beta\int_S J(c,v)(s^{\prime })q(ds^{\prime }|s-a), \end{aligned}$$

where s ∈ S, a ∈ A(s) and c ∈ Φ. If st = s, then P(a, c, v)(s) is the utility for generation t choosing the consumption level a ∈ A(st) in this state under the assumption that all future generations will employ a stationary strategy c ∈ Φ and the indirect utility is v.

A stationary equilibrium in the sense of Ray (1987) is a pair (c*, v*), with c* ∈ Φ and v* : S → K being a bounded Borel measurable function, such that for every s ∈ S, we have

$$\displaystyle \begin{aligned} v^*(s)=\sup_{a\in A(s)}P(a,c^*,v^*)(s)=P(c^*(s),c^*,v^*)(s)=W(c^*,v^*)(s). \end{aligned} $$
(6.9)

Note that equality (6.9) says that there exist an indirect utility function v* and a consumption strategy c*, both depending on the current endowment, such that each generation finds it optimal to adopt this consumption strategy provided its descendants use the same strategy and exhibit the given indirect utility.

Let V  be the set of all nondecreasing upper semicontinuous functions v : S → K. Note that every v ∈ V  is continuous from the right and has at most a countable number of discontinuity points. By I we denote the subset of all functions φ ∈ V  such that φ(s) ∈ A(s) for each s ∈ S. Let F = {c :  c(s) = s − i(s) for all s ∈ S and some i ∈ I}. We impose conditions similar to those imposed in model (G3). Namely, we shall assume that \( \tilde {u}\) is strictly concave and increasing. Then, the following result holds.

Theorem 28

In a non-paternalistic game as described above with nonatomic transitions, there exists a stationary equilibrium (c*, v*) ∈ F × V .

Theorem 28 was established as Theorem 1 in Balbus et al. (2016). Ray (1987) analysed games with non-paternalistic altruism and deterministic production functions. Unfortunately, his proof contains a mistake. The above result is strongly based on the assumption that the transitions are nonatomic and weakly continuous. The problem in the deterministic model of Ray (1987) remains open. However, Theorem 28 implies that an equilibrium exists if a “small nonatomic noise” is added to the deterministic transition function.

There is a great deal of work devoted to so-called hyperbolic decision makers, for whom the function w in (G3) has a specific form. Namely,

$$\displaystyle \begin{aligned} w(a_k,a_{k+1},a_{k+2},\ldots)=\alpha\beta\sum_{m=k}^\infty \beta^{m-k} \tilde{u}(a_m), \end{aligned} $$
(6.10)

where α > 0 is interpreted as a short-run discount factor and β < 1 is known as a long-run discount coefficient. This model was studied by Harris and Laibson (2001) with the transition function defined via the difference equation

$$\displaystyle \begin{aligned} s_{t+1}=R(s_t-a_t)+\xi_t,\quad \quad R\ge 0\quad\mbox{and} \quad t\in\mathbb{ N}. \end{aligned}$$

The random variables \((\xi _t)_{t\in \mathbb {N}}\) are nonnegative, independent, and identically distributed with respect to a nonatomic probability measure. The function \(\tilde {u}\) satisfies a rather restrictive condition concerning the risk aversion of the decision maker, but it may be unbounded from above. Working in the class of strategies with locally bounded variation, Harris and Laibson (2001) showed the existence of a stationary Markov perfect equilibrium in their model with a concave utility function \( \tilde {u}\). They also derived a strong hyperbolic Euler relation. The model considered by Harris and Laibson (2001) can also be viewed as a game between generations; see Balbus and Nowak (2008), Nowak (2010), and Jaśkiewicz and Nowak (2014a), where related versions are studied. However, its main interpretation in the economics literature is that of a decision problem in which the utility of an economic agent changes over time. Thus, the agent is represented by a sequence of selves, and the problem is to find a time-consistent solution. This solution is actually a stationary Markov perfect equilibrium obtained by treating the selves as players in an intergenerational game. For further details and references, the reader is referred to Harris and Laibson (2001) and Jaśkiewicz and Nowak (2014a). A decision model with time-inconsistent preferences involving selves who can stop the process at any stage was recently studied by Cingiz et al. (2016).
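Under (6.10), the current self weights its own utility by 1 and the utility m ≥ 1 periods ahead by αβ^m. The following lines (with a hypothetical utility \(\tilde{u}(a)=\sqrt{a}\)) compute this quasi-hyperbolic value of a consumption path and, for comparison, the standard geometric value obtained with α = 1.

```python
import numpy as np

# The quasi-hyperbolic ("beta-delta") evaluation implied by (6.10): weight 1 on
# the current utility and alpha * beta**m on the utility m >= 1 periods ahead.
def hyperbolic_value(path, alpha, beta, u=np.sqrt):
    flows = u(np.asarray(path, dtype=float))
    weights = np.concatenate(([1.0], alpha * beta ** np.arange(1, len(flows))))
    return float(weights @ flows)

path = [1.0, 1.0, 1.0, 1.0, 1.0]
print(hyperbolic_value(path, alpha=0.7, beta=0.95))   # present-biased self
print(hyperbolic_value(path, alpha=1.0, beta=0.95))   # standard geometric discounting
```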

The model with the function w defined in (6.10) can be extended by adding to the transition probabilities an unknown parameter θ. Then, the natural solution for such a model is a robust Markov perfect equilibrium. Roughly speaking, this solution is based on the assumption that the generations involved in the game are risk-sensitive and accept a maxmin utility. More precisely, let Θ be a nonempty Borel subset of Euclidean space \(\mathbb {R}^m\) (m ≥ 1). Then, the endowment st+1 for generation t + 1 is determined by the transition q from S × Θ to S that depends on the investment yt ∈ A(st) and a parameter θt ∈ Θ. This parameter is chosen according to a certain probability measure \(\gamma _t\in \mathcal {P}\), where \(\mathcal {P}\) denotes the action set of nature and it is assumed to be a Borel subset of \( \Pr (\varTheta )\).

Let Γ be the set of all sequences \((\gamma _t)_{t\in \mathbb {N}}\) of Borel measurable mappings \(\gamma _t: D\to \mathcal {P}\), where D = {(s, a) :  s ∈ S, a ∈ A(s)}. For any \(t\in \mathbb {N}\) and \(\gamma =(\gamma _t)_{t\in \mathbb {N}}\in \varGamma \), we set \(\gamma ^t := (\gamma _\tau )_{\tau \ge t}\). Clearly, \(\gamma ^t \in \varGamma \). A Markov strategy for nature is a sequence \(\gamma =(\gamma _t)_{t\in \mathbb {N}}\in \varGamma \). Note that \(\gamma ^t\) can be called a Markov strategy used by nature from period t onward.

For any \(t\in \mathbb {N}\), define Ht as the set of all sequences

$$\displaystyle \begin{aligned} h^t= (a_t,\theta_t,s_{t+1},a_{t+1},\theta_{t+1},\ldots), \quad\mbox{where } (s_k,a_k)\in D,\ \theta_k\in\varTheta \mbox{ and }k\ge t. \end{aligned}$$

Ht is the set of all feasible future histories of the process from period t onward. Endow Ht with the product σ-algebra. Assume in addition that \( \tilde {u}\le 0\), the generations employ a stationary strategy c ∈ Φ and nature chooses some γ ∈ Γ. Then the choice of nature is a probability measure depending on (st, c(st)). Let \(E^{c,\gamma ^t}_{s_t}\) denote as usual the expectation operator corresponding to the unique probability measure on Ht induced by a stationary strategy c ∈ Φ used by each generation τ (τ ≥ t), a Markov strategy of nature \(\gamma ^t\in \varGamma \) and the transition probability q. Assume that all generations from t onward use c ∈ Φ and nature applies a strategy \(\gamma ^t\in \varGamma \). Then, generation t’s expected utility is of the following form:

$$\displaystyle \begin{aligned} \hat{W}(c)(s_t):=\inf_{\gamma^t\in \varGamma} E_{s_t}^{c,\gamma^t}\left(\tilde{u}(c(s_t))+ \alpha\beta\sum_{m=t+1}^\infty \beta^{m-t-1}\tilde{u} (c(s_{m}))\right). \end{aligned}$$

This definition of utility in an intergenerational game provides an intuitive notion of ambiguity aversion, which can be regarded as the generations’ distrust of any imprecisely specified uncertainty, something that leaves room for the malevolent influence of nature. Letting

$$\displaystyle \begin{aligned} \hat{J}(c)(s_j)= \inf_{\gamma^j\in\varGamma}E_{s_{j}}^{c,\gamma^j}\left(\sum_{m=j}^\infty \beta^{m-j}\tilde{u}(c(s_{m}))\right) \end{aligned}$$

one can show that

$$\displaystyle \begin{aligned} \hat{W}(c)(s_t) = \tilde{u}(c(s_t))+\inf_{\xi\in\mathcal{P}}\alpha\beta\int_S \hat{J}(c)(s_{t+1})q(ds_{t+1}|s_t-c(s_t),\xi). \end{aligned}$$

For any s ∈ S, a ∈ A(s), and c ∈ Φ, we set

$$\displaystyle \begin{aligned} \hat{P}(a,c)(s)=\tilde{u}(a)+\inf_{\xi\in\mathcal{P}}\alpha\beta\int_S \hat{J }(c)(s^{\prime })q(ds^{\prime }|s-a,\xi). \end{aligned}$$

If s = st, then \(\hat {P}(a,c)(s)\) is the utility for generation t choosing a ∈ A(st) in this state when all future generations employ a stationary strategy c ∈ Φ.

A robust Markov perfect equilibrium is a function c∈ Φ such that for every s ∈ S we have

$$\displaystyle \begin{aligned} \sup_{a\in A(s)} \hat{P}(a,c^*)(s)=\hat{P}(c^*(s),c^*)(s)=\hat{W}(c^*)(s). \end{aligned}$$
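When the transition probability is a convex combination of finitely many measures and nature may mix freely over them, the integral in \(\hat{P}\) is linear in ξ, so the infimum is attained at one of the candidate measures. The sketch below exploits this observation on a finite grid; all primitives, including the values of Ĵ(c), are hypothetical stand-ins.

```python
import numpy as np

# Worst-case ("robust") evaluation of P_hat(a, c)(s): with finitely many
# candidate transition measures and nature free to mix over them, minimizing
# over the candidate measures themselves suffices.
S = np.linspace(0.0, 4.0, 41)
mu = [np.exp(-0.5 * (S - m) ** 2) for m in (1.0, 3.0)]   # two candidate laws
mu = [w / w.sum() for w in mu]
u_t = lambda a: -np.exp(-a)                  # u-tilde <= 0, increasing, concave
alpha, beta = 0.7, 0.95

def P_hat(a, J_vals):
    """u(a) + inf over nature of alpha*beta * integral of J_hat under q(.|s-a, xi)."""
    cont = min(float(w @ J_vals) for w in mu)    # worst-case continuation value
    return u_t(a) + alpha * beta * cont

J_vals = u_t(0.5 * S)                        # hypothetical values of J_hat(c) on the grid
print(P_hat(1.0, J_vals))
```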

The existence of a robust Markov perfect equilibrium in the aforementioned model was proved by Balbus et al. (2014) under the assumption that the transition probability is a convex combination of probability measures μ1, …, μl on S with coefficients depending on the investment y = s − a. A robust Markov perfect equilibrium was obtained in the class of functions F under the condition that all measures μ1, …, μl are nonatomic. If μ1, …, μl have atoms, then some stochastic dominance conditions are imposed, and the equilibrium is obtained in the class of Lipschitz continuous functions with constant 1. A different approach was presented in the work of Jaśkiewicz and Nowak (2014b), where the set of endowments S and the set of consumptions are Borel and the parameter set Θ is finite. Assuming again that the transition probability is a finite convex combination of probability measures μ1, …, μl on S depending on the parameter θ, with coefficients depending on the inheritance s and the consumption level a, they established a twofold result. First, they proved the existence of a robust Markov perfect equilibrium in the class of randomized strategies. Second, assuming that μ1, …, μl are nonatomic and making use of the Dvoretzky-Wald-Wolfowitz purification theorem, they replaced a randomized equilibrium by a pure one.

The models of intergenerational games with general spaces of consumptions and endowments were also examined by Jaśkiewicz and Nowak (2014a). A novel feature in this approach is the fact that generation t can employ the entropic risk measure to calculate its utilities. More precisely, if Z is a random variable with the distribution π, then its entropic risk measure is \(\mathcal {E}(Z)=\frac {1}{r} \ln \int _{\varOmega } e^{rZ(\omega )}\pi (d\omega )\), where r < 0 is a risk coefficient. If r is sufficiently close to zero, then making use of the Taylor expansion one can see that

$$\displaystyle \begin{aligned} \mathcal{E}(Z)\approx EZ + \frac{r}{2}\mathrm{Var}(Z). \end{aligned}$$

This means that a generation which uses the entropic risk measure to calculate its utility is risk averse and takes into account not only the expected value of its successors’ random utilities derived from consumption but also their variance. Assuming that each generation cares only about its m descendants and that the transition probability is a convex combination of finitely many nonatomic measures on the endowment space with coefficients that may depend on s and a, Jaśkiewicz and Nowak (2014a) proved the existence of a stationary Markov perfect equilibrium in pure strategies. The same result was shown for games with infinitely many descendants in the case of hyperbolic preferences. In both cases the proof consists of two parts. First, a randomized stationary Markov perfect equilibrium is shown to exist. Second, making use of the specific structure of the transition probability and applying the Dvoretzky-Wald-Wolfowitz theorem, the desired pure stationary Markov perfect equilibrium is obtained.
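The quality of this Taylor approximation can be checked numerically. In the sketch below, Z is exponentially distributed (a hypothetical choice), so the entropic risk measure has no exact mean–variance form: the two quantities agree for r close to zero and drift apart as |r| grows.

```python
import numpy as np

rng = np.random.default_rng(3)

# Comparing the entropic risk measure (1/r) ln E[e^{rZ}] with the mean-variance
# approximation EZ + (r/2) Var(Z) for risk coefficients r < 0.
Z = rng.exponential(1.0, 1_000_000)
for r in (-0.01, -0.1, -1.0):
    exact = np.log(np.mean(np.exp(r * Z))) / r
    approx = Z.mean() + 0.5 * r * Z.var()
    print(f"r={r:6}: entropic={exact:.5f}, Taylor approx={approx:.5f}")
```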

A game related to the above-mentioned models is one with quasi-geometric discounting from dynamic consumer theory; see Chatterjee and Eyigungor (2016). In particular, the authors showed that in natural cases such a game does not possess a Markov perfect equilibrium in the class of continuous strategies. However, a continuous Markov perfect equilibrium exists if the model is reformulated to involve lotteries. These two models were then compared numerically. It is known that the numerical analysis of equilibria in models with hyperbolic (quasi-geometric) discounting encounters difficulties in achieving convergence even in a simple, deterministic optimal growth problem that has a smooth closed-form solution. Maliar and Maliar (2016) defined some restrictions on the equilibrium strategies under which the numerical methods they studied deliver a unique smooth solution for many deterministic and stochastic models.

Finally, we wish to point out that Markov perfect equilibria for stochastic bequest games with transition probabilities and utilities depending on time were shown to exist in Balbus et al. (2017, 2018).

12 Stopping Games

Stopping games were introduced by Dynkin (1969) as a generalization of optimal stopping problems. They were used in several models in economics and operations research, for example, in equipment replacement, job search, and consumer purchase behavior; see Heller (2012).

Dynkin (1969) dealt with the following problem. Two players observe a bivariate sequence of adapted random variables \((X(k),Y(k))_{k\in \mathbb {N}_0}\), where \(\mathbb {N}_0= \mathbb {N}\cup \{0\}\). Player 1 chooses a stopping time τ1 such that {τ1 = k}⊂{X(k) ≥ 0}, whereas player 2 selects τ2 such that {τ2 = k}⊂{X(k) < 0}. If τ1 ∧ τ2 is finite, then player 2 pays Y (τ1 ∧ τ2) to player 1 and the game terminates. Hence, the objective of player 1 (respectively 2) is to maximize (minimize) R(τ1, τ2) = E[Y (τ1 ∧ τ2)]. Dynkin (1969) characterized 𝜖-optimal stopping times and proved that the game has a value provided that \(\sup _{k\in \mathbb {N}_0}|Y(k)|\) is integrable. This model was later extended by Kiefer (1971) and Neveu (1975). In particular, Neveu (1975) showed the existence of a game value in a slightly modified model. Namely, he dealt with the following expected payoff function:

$$\displaystyle \begin{aligned} R(\tau_1,\tau_2)= E[X(\tau_1)1[\tau_1<\tau_2]+Y(\tau_2)1[\tau_2\le \tau_1]], \end{aligned}$$

where \((X(k))_{k\in \mathbb {N}_0}\) and \((Y(k))_{k\in \mathbb {N}_0}\) are \( \mathbb {R}\)-valued adapted stochastic processes such that \(\sup _{k\in \mathbb { N}_0}(X^+(k)+Y^-(k))\) is integrable and X(k) ≤ Y (k) for all \(k\in \mathbb { N}_0\). The game considered by Neveu (1975) was generalized by Yasuda (1985), who dropped the latter ordering assumption. In his model, the expected payoff function takes the following form:

$$\displaystyle \begin{aligned} R(\tau_1,\tau_2)= E[X(\tau_1)1[\tau_1<\tau_2]+Y(\tau_2)1[\tau_2< \tau_1]+Z(\tau_1)1[\tau_1=\tau_2]], \end{aligned}$$

where as usual \((X(k))_{k\in \mathbb {N}_0}\), \((Y(k))_{k\in \mathbb {N}_0}\) and \( (Z(k))_{k\in \mathbb {N}_0}\) are adapted integrable random variables. Yasuda (1985) considered randomized strategies instead of pure ones. According to Yasuda (1985) a strategy for a player is an adapted random sequence \( p=(p_k)_{k\in \mathbb {N}_0}\) (or \(q=(q_k)_{k\in \mathbb {N}_0}\)) such that 0 ≤ pk, qk ≤ 1 with probability one. Here, pk (or qk) stands for the probability that the player stops the game at time k conditional on the event that the game was not stopped before. In computing the payoff induced by a pair of strategies (p, q), one assumes that the randomizations performed by the players in various stages are mutually independent and independent of the payoff processes. Thus, a strategy that corresponds to a stopping time σ is pk = 0 on the event [σ > k] and pk = 1 on the event [σ ≤ k]. Yasuda (1985) proved the existence of the value in the set of randomized strategies in finite and discounted infinite time horizon problems.

Before formulating the next result, we define the stopping stages for players 1 and 2 by \(\theta _1:=\inf \{k\in \mathbb {N}_0 :\ P(k)\le p_k\}\), and \(\theta _2:=\inf \{k\in \mathbb {N}_0 :\ Q(k)\le q_k\}, \) where \((P(k), Q(k))_{k\in \mathbb {N}_0}\) is a double sequence of i.i.d. random variables uniformly distributed over [0, 1] satisfying certain independence assumptions imposed in Rosenberg et al. (2001). Set θ = θ1 ∧ θ2. Clearly, θ is the stage at which the game stops. Let us define

$$\displaystyle \begin{aligned} R(p,q)=E[X(\theta_1)1[\theta_1<\theta_2]+Y(\theta_2)1[\theta_2< \theta_1]+Z(\theta_1)1[\theta_1=\theta_2<+\infty]] \end{aligned}$$

and its β-discounted evaluation

$$\displaystyle \begin{aligned} R_{\beta}(p,q)&=(1-\beta)E\big[\beta^{\theta+1}(X(\theta_1)1[\theta_1< \theta_2]+ Y(\theta_2)1[\theta_2< \theta_1]\\ &\quad +Z(\theta_1)1[\theta_1=\theta_2<+\infty])\big]. \end{aligned} $$
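Both payoffs can be estimated by direct simulation of the stopping stages θ1 and θ2. The sketch below does this for the undiscounted payoff R(p, q), using constant stop probabilities and i.i.d. payoff draws at the stopping stage, both hypothetical simplifications, and assigning payoff 0 when nobody stops within the simulated horizon.

```python
import numpy as np

rng = np.random.default_rng(4)

# Monte Carlo estimate of R(p, q) for behavioural stopping strategies: at each
# stage k, player 1 stops when an independent uniform falls below p, player 2
# when another uniform falls below q.
def estimate_R(p=0.1, q=0.05, runs=100_000, horizon=1_000):
    total = 0.0
    for _ in range(runs):
        for k in range(horizon):
            stop1 = rng.random() <= p        # theta_1 = k on this event
            stop2 = rng.random() <= q        # theta_2 = k on this event
            if stop1 or stop2:
                x, y, z = rng.normal(1.0), rng.normal(-1.0), rng.normal(0.0)
                total += x if (stop1 and not stop2) else (y if (stop2 and not stop1) else z)
                break
        # if nobody stopped within the horizon, the run contributes payoff 0
    return total / runs

print("R(p, q) ~", estimate_R())
```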

The following result was proved by Rosenberg et al. (2001).

Theorem 29

Assume that \(E[\sup _{k\in \mathbb {N}_0} (|X(k)|+|Y(k)|+|Z(k)|)]<+\infty \). Then the stopping games with the payoffs R(p, q) and Rβ(p, q) have values, say v and vβ, respectively. Moreover, limβ→1vβ = v.

Let us now turn to nonzero-sum Dynkin games. They were considered in several papers; see, for instance, Ferenstein (2007), Krasnosielska-Kobos (2016), Morimoto (1986), Nowak and Szajowski (1999), Ohtsubo (1987), Ohtsubo (1991), Solan and Vieille (2001), and Szajowski (1994). Obviously, the list of references is by no means exhaustive. We start by presenting a result for two-player nonzero-sum stopping games. Assume that the aforementioned processes \((X(k))_{k\in \mathbb {N}_0}\), \((Y(k))_{k\in \mathbb {N}_0}\) and \( (Z(k))_{k\in \mathbb {N}_0}\) are now \(\mathbb {R}^2\)-valued and uniformly bounded, and let ρ be a uniform bound on the payoffs. The payoff of the game is R(p, q) as above, except that \(R(p,q)\in \mathbb {R}^2\). Shmaya and Solan (2004) proved the following result.

Theorem 30

For each 𝜖 > 0, the stopping game has an 𝜖-equilibrium \((p^*_{\epsilon },q^*_{\epsilon })\).

Theorem 30 does not hold if the payoffs are not uniformly bounded. Its proof is based upon a stochastic version of the Ramsey theorem that was also proved by Shmaya and Solan (2004). It states that for every colouring of a complete infinite graph by finitely many colours, there is a complete infinite monochromatic subgraph. Shmaya and Solan (2004) applied a variation of this result to reduce the problem of the existence of an 𝜖-equilibrium in a general stopping game to the study of the properties of 𝜖-equilibria in a simple class of stochastic games with a finite state space. A similar result for deterministic 2-player nonzero-sum stopping games was reported by Shmaya et al. (2003).

All the aforementioned works deal with the two-player case and/or assume some special structure of the payoffs. Recently, Hamadène and Hassani (2014) studied n-person nonzero-sum Dynkin games. Such a game is terminated at τ := τ1 ∧… ∧ τn, where τi is a stopping time chosen by player i. Then, the corresponding payoff for player i is given by

$$\displaystyle \begin{aligned} R_i(\tau_1,\ldots,\tau_n)=W_{\tau}^{i,I_\tau}, \end{aligned}$$

where Iτ denotes the set of players who make the decision to stop, that is, Iτ = {m ∈{1, …, n} :  τ = τm}, and \(W^{i,I_\tau }\) is the payoff stochastic process of player i. The main assumption says that the payoff of a player is lower when he belongs to the group involved in the decision to stop than when he does not. Hamadène and Hassani (2014) showed that the game has a Nash equilibrium in pure strategies. The proof is based on an approximation scheme whose limit provides a Nash equilibrium.

Krasnosielska-Kobos and Ferenstein (2013) is another paper concerned with multi-person stopping games. More precisely, they consider a game in which players sequentially observe the offers X(1), X(2), … at the jump times T1, T2, … of a Poisson process. It is assumed that the random variables X(1), X(2), … form an i.i.d. sequence. Each accepted offer results in a reward R(k) = X(k)r(Tk), where r is a non-increasing discount function. If more than one player accepts the offer, then the player with the highest priority gets the reward. By making use of the solution to the multiple optimal stopping time problem with the above reward structure, Krasnosielska-Kobos and Ferenstein (2013) constructed a Nash equilibrium which is Pareto efficient.

Mashiah-Yaakovi (2014), on the other hand, studied subgame-perfect equilibria in stopping games. It is assumed that at every stage one of the players is chosen according to a stochastic process, and that player decides whether to continue the interaction or to stop it. The terminal payoff vector is determined by another stochastic process. Mashiah-Yaakovi (2014) defines a weaker concept of subgame-perfect equilibrium, namely, a δ-approximate subgame-perfect 𝜖-equilibrium. A strategy profile is a δ-approximate subgame-perfect 𝜖-equilibrium if it induces an 𝜖-equilibrium in every subgame, except perhaps a set of subgames that occur with probability at most δ. A 0-approximate subgame-perfect 𝜖-equilibrium is actually a subgame-perfect 𝜖-equilibrium. The concept of approximate subgame-perfect equilibrium relates to the concept of “trembling-hand perfect equilibrium” introduced by Selten (1975). A stopping game in which, at every stage, the player who decides whether to stop or continue the game is chosen according to a (in some sense periodic) stochastic process is also studied in Mashiah-Yaakovi (2009). This assumption extends the random priority in stopping games considered, for example, in Szajowski (1994, 1995). Once the chosen player decides to stop, the players receive terminal payoffs that are determined by a second stochastic process. Periodic subgame-perfect 𝜖-equilibria in pure strategies are studied under quite general conditions. Some foundations for stopping n-person games with fixed priorities were provided by Enns and Ferenstein (1987).

Finally, it is worth pointing out that there are three notions of random stopping times. The above-mentioned randomized strategies used by Yasuda (1985) and Rosenberg et al. (2001) are also called behavioral stopping times. A randomized stopping time, on the other hand, is a nonnegative adapted real-valued process \(\rho =(\rho _k)_{k\in \mathbb {N}\cup \{\infty \}}\) that satisfies \(\sum _{k\in \mathbb {N}\cup \{\infty \}}\rho _k=1\). The third concept is that of mixed stopping times ν. Roughly speaking, these are product measurable functions in which the first coordinate r is chosen according to the uniform distribution over the interval [0, 1] at the outset; the stopping time is then ν(r, ⋅). For more details, the reader is referred to Rosenberg et al. (2001). As communicated to us by Eilon Solan, the classes of mixed and randomized stopping times are equivalent by a proper generalization of Kuhn’s theorem.