Abstract
In this paper, we consider constrained discounted stochastic games with a countably generated state space and norm continuous transition probability having a density function. We prove existence of approximate stationary equilibria and stationary weak correlated equilibria. Our results imply the existence of stationary Nash equilibrium in ARAT stochastic games.
1 Introduction
Constrained Markov decision processes and stochastic games have numerous applications in operations research, economics and computer science; see [2, 3, 28, 37] and the references cited therein. They arise in situations in which a controller or player has several objectives, for example, when she or he wants to minimise one type of cost while keeping other costs below some given bounds. Constrained stochastic games with finite state and action spaces were first studied by Altman and Shwartz [3]. Their work was extended to some classes of games with countable state spaces in [4, 42] by finite state approximations. A more direct approach, based on properties of measures induced by strategies and occupation measures, was presented in [28].
In this paper, we study discounted constrained stochastic games with a general state space and a transition probability having a density function. Such two-person games with additive rewards and additive transitions (ARAT games) were recently studied by Dufour and Prieto-Rumeau [13]. They established the existence of stationary Nash equilibria, generalising the result of Himmelberg et al. [25] proved for unconstrained games. Moreover, their theorem also holds for N-person ARAT games satisfying the standard Slater condition. As shown in a highly non-trivial example by Levy and McLennan [29], the games under consideration in this paper may have no stationary Nash equilibrium in the unconstrained case. It can be seen that this example applies to the constrained case as well. Thus, results on approximate equilibria as in [34, 41] become all the more valuable. They are stated for the unconstrained case, and in this paper we extend the main result from [34] to a class of constrained games. In this way, we establish the existence of approximate stationary equilibria for discounted stochastic games with constraints and general state spaces. It should be noted that the existence of stationary equilibria in discounted unconstrained games was proved only in some special cases, for instance, for ARAT games [25] or games with transitions having no conditional atoms [23]. For a survey of results on stationary and non-stationary Nash equilibria the reader is referred to [26].
The other group of papers comprises the ones on stationary equilibria with public signals; see [11, 22, 36]. Such solutions can be viewed as special communication or correlated equilibria widely discussed in dynamic frameworks (repeated, stochastic or extensive form games) in [20, 21, 31, 38, 39]. They were inspired by the seminal papers of Aumann [5, 6]. A weaker version of correlated equilibrium was proposed by Moulin and Vial [32]. According to their approach, a correlated strategy in a finite (bimatrix) game is a probability distribution \(\nu \) on the set of pure strategy pairs. Every player has to decide whether to accept \(\nu \) or to use his or her individual strategy. If player i uses an individual strategy and player \(j\not =i\) obeys \(\nu \), then a pure action for player j is selected by the marginal distribution of \(\nu \) on his/her pure actions. Then \(\nu \) is an equilibrium if no unilateral deviation from it is profitable. This solution is called a weak correlated equilibrium or a correlated equilibrium with no exchange of information [32]. In contrast to Aumann’s approach, the players who accepted \(\nu \) cannot change their actions after using the lottery \(\nu \). The solution proposed by Moulin and Vial [32] has an interesting property. Namely, the authors constructed a bimatrix game in which the equilibrium payoffs in their equilibrium concept strictly dominate, in the Pareto sense, the payoffs in Aumann’s equilibrium; see [30, 32].
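The Moulin–Vial condition is straightforward to check computationally. The following sketch (with illustrative cost matrices, not taken from the paper, and players minimising costs as in our model) verifies that a lottery \(\nu \) over pure action pairs is a weak correlated equilibrium: a deviating player best-responds to the marginal of \(\nu \) on the opponent's actions.

```python
import numpy as np

# Toy 2x2 bimatrix game with COST matrices (players minimise).
# nu is a weak correlated equilibrium (Moulin-Vial) if no player can lower
# his or her expected cost by rejecting nu and best-responding to the
# MARGINAL of nu on the opponent's actions.  Numbers are illustrative only.
C1 = np.array([[0.0, 4.0],
               [4.0, 1.0]])        # cost of player 1 at (a1, a2)
C2 = C1.T                          # symmetric example: cost of player 2

nu = np.array([[0.5, 0.0],
               [0.0, 0.5]])        # lottery: (1,1) or (2,2), each w.p. 1/2

cost1 = float((nu * C1).sum())     # expected cost of player 1 under nu
cost2 = float((nu * C2).sum())

marg2 = nu.sum(axis=0)             # marginal of nu on player 2's actions
marg1 = nu.sum(axis=1)             # marginal of nu on player 1's actions

dev1 = float((C1 @ marg2).min())   # best deviation cost of player 1
dev2 = float((marg1 @ C2).min())   # best deviation cost of player 2

is_weak_ce = cost1 <= dev1 + 1e-12 and cost2 <= dev2 + 1e-12
```

Here both players incur an expected cost of 0.5 under \(\nu \), while a unilateral deviation against the uniform marginal costs at least 2, so no deviation is profitable.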
In [35] the concept of Moulin and Vial is applied to an unconstrained discounted stochastic game with a general state space. However, as shown by Solan and Vieille [39], the notion of a weak correlated equilibrium can also be regarded as a special case of a general correlation scheme.
In this paper, we extend the result from [35] to a large class of discounted stochastic games with so-called integral constraints. We apply our recent result from [28] for games with discrete state spaces and use an approximation technique. A stationary weak correlated equilibrium is obtained as a limit (in the weak* sense) of approximate equilibria. Our result generalises the main theorem of Dufour and Prieto-Rumeau [13], given for ARAT games, provided the action sets of the players do not depend on the state. We wish to emphasise that the consideration of other classes of correlated equilibria in constrained stochastic games (such as equilibria with public signals) seems to be very challenging for several reasons. Firstly, the integral constraints are difficult to handle. Secondly, the usual methods of dynamic programming (Bellman's principle) or the backward and forward induction used in unconstrained cases are not applicable. Perhaps further results can be obtained for other correlated equilibria, but under a different type of constraints.
The paper is organised as follows. The model and main results on equilibria are contained in Sect. 2. Section 3 presents the approximation technique and the proofs of two main theorems. Section 4 is devoted to the proof of the existence of a weak correlated equilibrium and to a discussion of our assumptions. In Sect. 5, we show that the example given in [29] can be used to show that discounted constrained stochastic games studied in this paper may not have stationary Nash equilibria. Section 6 discusses a useful transformation that shows how to easily extend our results formulated for bounded cost functions to unbounded ones. In the Appendix (Sect. 7) we give a crucial lemma on the replacement of a general strategy by a piecewise constant one. It is used in the proofs of our main theorems on equilibria in constrained stochastic games.
2 The Game Model and Main Results
In this section, we describe constrained discounted stochastic games with general state space and our basic assumptions. We provide our main results in three cases. Firstly, we give a theorem on the existence of a stationary approximate equilibrium assuming that the players play the game independently. Secondly, we drop the constraints and give a theorem on the existence of a stationary \(\varepsilon \)-equilibrium for every initial state, extending the main result in [34]. Finally, we show that the constrained stochastic games under consideration possess stationary weak correlated equilibria introduced in the static (bimatrix) case by Moulin and Vial [32].
2.1 Approximate Nash Equilibria in Constrained Discounted Stochastic Games
The non-zero-sum constrained stochastic game (CSG) is described by the following objects:
-
\({{{\mathcal {N}}}}=\{1,2,...,N\}\) is the set of players.
-
X is a state space endowed with a countably generated \(\sigma \)-algebra \({{\mathcal {F}}}.\)
-
\(A_i\) is a compact metric action space for player \(i\in {{{\mathcal {N}}}} \) endowed with the Borel \(\sigma \)-algebra. We put
$$\begin{aligned} A:=\prod _{j\in {{{\mathcal {N}}}}} A_j\quad \text{ and }\quad A_{-i}:=\prod _{j\in {{{\mathcal {N}}}}\setminus \{ i\}} A_j, \\ {\mathbb {K}}_i:= \{(x,a_i): x\in X,\ a_i\in A_i \}, \quad {\mathbb {K}}:= \{(x,\pmb {a}): x\in X,\ \pmb {a}=(a_1,...,a_N)\in A \}. \end{aligned}$$ -
The real-valued functions \(c_i^\ell :{\mathbb {K}}\rightarrow {\mathbb {R}},\) where \(i\in {{{\mathcal {N}}}},\) \(\ell \in {{{\mathcal {L}}}}_0={{{\mathcal {L}}}}\cup \{0\} \) with \({{{\mathcal {L}}}}=\{1,...,L\},\) are product measurable. Here, \(c_i^0\) is the cost-per-stage function for player \(i\in {{{\mathcal {N}}}},\) and for each \(\ell \in \mathcal{L},\) \(c_i^\ell \) is a function used in the definition of the \(\ell \)-th constraint for this player. It is assumed that there exists \(b>0\) such that
$$\begin{aligned} |c^\ell _i(x,\pmb {a})|\le b,\quad \text{ for } \text{ all }\quad i\in {{{\mathcal {N}}}},\ \ell \in {{{\mathcal {L}}}}_0,\ (x,\pmb {a})\in {\mathbb {K}}. \end{aligned}$$ -
\(p(dy|x,\pmb {a})\) is the transition probability from x to \(y\in X,\) when the players choose a profile \(\pmb { a}=(a_1,a_2,...,a_N)\) of actions in A.
-
\(\eta \) is the initial state distribution.
-
\(\alpha \in (0,1)\) is the discount factor.
-
\(\kappa _i^\ell \) are constraint constants, \(i\in {{{\mathcal {N}}}},\) \(\ell \in {{{\mathcal {L}}}}.\)
Let \({\mathbb {N}}=\{1,2,...\} .\) Define \(H^1=X\) and \(H^{t+1}= {\mathbb {K}}\times H^{t}\) for \(t\in {\mathbb {N}}.\) An element \(h^t=(x^1,\pmb { a}^1,\ldots ,x^t)\) of \(H^t\) represents a history of the game up to the t-th period, where \(\pmb { a}^k=(a^k_1,\ldots ,a^k_N)\) is the profile of actions chosen by the players in the state \(x^k\) on the k-th stage of the game, \(h^1=x^1.\)
Strategies for the players are defined in the usual way. A strategy for player \(i\in {{{\mathcal {N}}}}\) is a sequence \(\pi _i=(\pi _{i}^t)_{t\in {\mathbb {N}}},\) where each \(\pi _{i}^t\) is a transition probability from \(H^t\) to \(A_i.\) By \(\Pi _i\) we denote the set of all strategies for player i. Let \(\Phi _i\) be the set of transition probabilities from X to \(A_i.\) A stationary strategy for player i is a constant sequence \( (\varphi _{i}^t)_{t\in {\mathbb {N}}},\) where \(\varphi _i^t=\varphi _i\) for all \(t\in {\mathbb {N}}\) and some \(\varphi _i\in \Phi _i.\) Furthermore, we shall identify a stationary strategy for player i with the constant element \(\varphi _i\) of the sequence. Thus, the set of all stationary strategies of player i is also denoted by \(\Phi _i.\) We define
$$\begin{aligned} \Pi :=\prod _{i\in {{{\mathcal {N}}}}} \Pi _i\quad \text{ and }\quad \Phi :=\prod _{i\in {{{\mathcal {N}}}}} \Phi _i. \end{aligned}$$
Hence, \(\Pi \) (\(\Phi \)) is the set of all (stationary) multi-strategies of the players.
Let \(H^\infty = {\mathbb {K}}\times {\mathbb {K}}\times \cdots \) be the space of all infinite histories of the game endowed with the product \(\sigma \)-algebra. For any multi-strategy \(\pmb { \pi }\in \Pi \), a unique probability measure \({\mathbb {P}}_\eta ^{\pmb {\pi }}\) and a stochastic process \((x^t,\pmb {a}^t)_{t\in {\mathbb {N}}}\) are defined on \(H^\infty \) in a canonical way, see the Ionescu-Tulcea theorem, e.g., Proposition V.1.1 in [33]. The measure \({\mathbb {P}}_\eta ^{\pmb {\pi }}\) is induced by \(\pmb {\pi },\) the transition probability p and the initial distribution \(\eta .\) The expectation operator with respect to \({\mathbb {P}}_\eta ^{\pmb {\pi }}\) is denoted by \({\mathbb {E}}_\eta ^{\pmb {\pi }}.\)
Let \(\pmb {\pi }\in \Pi \) be any multi-strategy. For each \(i\in \mathcal{N}\) and \(\ell \in {{{\mathcal {L}}}}_0\), the discounted cost functionals are defined as
$$\begin{aligned} J^\ell _i(\pmb {\pi }):= {\mathbb {E}}_\eta ^{\pmb {\pi }}\left[ \sum _{t=1}^\infty \alpha ^{t-1} c^\ell _i(x^t,\pmb {a}^t)\right] . \end{aligned}$$
We assume that \(J^0_i(\pmb {\pi })\) is the expected discounted cost of player \(i\in {{{\mathcal {N}}}}\), who wishes to minimise it over \(\pi _i \in \Pi _i\) in such a way that the following constraints are satisfied:
$$\begin{aligned} J^\ell _i(\pmb {\pi })\le \kappa ^\ell _i \quad \text{ for } \text{ all }\ \ell \in {{{\mathcal {L}}}}. \end{aligned}$$
A multi-strategy \(\pmb {\pi }\) is feasible, if the above inequality holds for each \(i\in {{{\mathcal {N}}}},\) \(\ell \in {{{\mathcal {L}}}}.\) We denote by \(\Delta \) the set of all feasible multi-strategies in the CSG.
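In a finite toy instance (illustrative numbers, not from the paper), the cost functionals of a fixed stationary multi-strategy can be computed exactly and feasibility checked directly. The sketch below assumes the unnormalised form \(J=\mathbb {E}_\eta ^{\pmb {\pi }}\sum _{t\ge 1}\alpha ^{t-1}c(x^t,\pmb {a}^t)\) (a normalising factor would only rescale the constraint constants) and the standard linear-system characterisation of discounted costs.

```python
import numpy as np

# Two states, two players, two actions each.  For a stationary
# multi-strategy (phi1, phi2), the discounted cost solves
#   v = c_phi + alpha * P_phi v,   J = eta . v,
# where c_phi and P_phi are the cost and transition kernel averaged over
# the players' randomisations.  Illustrative data, not from the paper.
alpha = 0.9
eta = np.array([0.5, 0.5])                 # initial distribution

c = np.array([[[1.0, 2.0], [0.0, 3.0]],    # c[x, a1, a2]
              [[2.0, 0.0], [1.0, 1.0]]])
p = np.array([[[[0.7, 0.3], [0.4, 0.6]],   # p[x, a1, a2, y]
               [[0.5, 0.5], [0.2, 0.8]]],
              [[[0.6, 0.4], [0.1, 0.9]],
               [[0.3, 0.7], [0.5, 0.5]]]])

phi1 = np.array([[0.5, 0.5], [1.0, 0.0]])  # player 1: state -> dist on A1
phi2 = np.array([[0.2, 0.8], [0.5, 0.5]])  # player 2: state -> dist on A2

c_phi = np.einsum('xab,xa,xb->x', c, phi1, phi2)
P_phi = np.einsum('xaby,xa,xb->xy', p, phi1, phi2)

v = np.linalg.solve(np.eye(2) - alpha * P_phi, c_phi)
J = float(eta @ v)                         # discounted cost functional
kappa = 15.0                               # a hypothetical constraint bound
feasible = J <= kappa
```

The same computation with the functions \(c^\ell _i\) in place of c yields every functional \(J^\ell _i(\pmb {\varphi }),\) so checking feasibility of a stationary multi-strategy reduces to finitely many linear solves.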
As usual, for any \(\pmb {\pi }\in \Pi \), we denote by \(\pmb {\pi _{-i}}\) the multi-strategy of all players but player i, that is, \( \pmb {\pi _{-1}} =(\pi _2,...,\pi _N),\) \( \pmb {\pi _{-N}} =(\pi _1,...,\pi _{N-1}),\) and for \(i\in {{{\mathcal {N}}}}\setminus \{1,N\},\)
$$\begin{aligned} \pmb {\pi _{-i}} =(\pi _1,...,\pi _{i-1},\pi _{i+1},...,\pi _N). \end{aligned}$$
We identify \([\pmb {\pi _{-i}},\pi _i]\) with \(\pmb {\pi }.\) For each \(\pmb {\pi }\in \Pi \), we define the set of feasible strategies for player i with \(\pmb {\pi _{-i}}\) as
$$\begin{aligned} \Delta _i(\pmb {\pi _{-i}}):=\{\sigma _i\in \Pi _i:\ J^\ell _i([\pmb {\pi _{-i}},\sigma _i])\le \kappa ^\ell _i \ \text{ for } \text{ all }\ \ell \in {{{\mathcal {L}}}}\}. \end{aligned}$$
Let \(\pmb {\pi }=(\pi _1,\pi _2,...,\pi _N) \in \Pi \) and \(\sigma _i\in \Pi _i.\) By \([\pmb {\pi _{-i}},\sigma _i]\) we denote the multi-strategy, where player i uses \(\sigma _i\) and every player \(j\not = i\) uses \(\pi _j.\)
Definition 2.1
A multi-strategy \(\pmb {\pi }^*\in \Pi \) is an approximate equilibrium in the CSG (for given \(\varepsilon >0\)), if for every \(i\in {{{\mathcal {N}}}}\) and \(\ell \in {{{\mathcal {L}}}},\)
$$\begin{aligned} J^\ell _i(\pmb {\pi }^*) \le \kappa ^\ell _i + \varepsilon \qquad \mathrm {(2.1)} \end{aligned}$$
and for every \(i\in {{{\mathcal {N}}}},\)
$$\begin{aligned} J^0_i(\pmb {\pi }^*) \le J^0_i([\pmb {\pi ^*_{-i}},\pi _i]) + \varepsilon \quad \text{ for } \text{ every }\ \pi _i\in \Delta _i(\pmb {\pi ^*_{-i}}). \qquad \mathrm {(2.2)} \end{aligned}$$
A multi-strategy \(\pmb {\pi }^*\in \Pi \) is an \(\varepsilon \)-equilibrium in the CSG (for given \(\varepsilon \ge 0\)), if (2.2) holds and \(J^\ell _i(\pmb {\pi }^*) \le \kappa ^\ell _i \) for every \(i\in {{{\mathcal {N}}}}\) and \(\ell \in {{{\mathcal {L}}}}.\) A 0-equilibrium is called a Nash equilibrium in the CSG.
Note that every \(\varepsilon \)-equilibrium is an approximate equilibrium, but not vice versa. For small \(\varepsilon >0,\) condition (2.1) allows for a slight violation of the feasibility of \(\pmb {\pi }^*\). The reader will find further comments on this condition in Remark 2.4.
We now formulate our basic assumptions.
Assumption A1
The functions \(c^\ell _i(x,\cdot )\) are continuous on A for all \(x\in X,\) \(i\in {{{\mathcal {N}}}}\) and \(\ell \in {{{\mathcal {L}}}}_0.\)
Assumption A2
The transition probability p is of the form
$$\begin{aligned} p(B|x,\pmb {a})=\int _B \delta (x,y,\pmb {a})\,\mu (dy),\quad B\in {{{\mathcal {F}}}},\ (x,\pmb {a})\in {\mathbb {K}}, \end{aligned}$$
where \(\mu \) is a probability measure on \({{\mathcal {F}}}\) and \(\delta \) is a product measurable non-negative (density) function such that, if \(\pmb {a}^n \rightarrow \pmb {a}\) as \(n\rightarrow \infty ,\) then
$$\begin{aligned} \int _X |\delta (x,y,\pmb {a}^n)-\delta (x,y,\pmb {a})|\,\mu (dy) \rightarrow 0\quad \text{ for } \text{ all }\ x\in X. \end{aligned}$$
This assumption means the norm continuity of p with respect to action profiles.
Assumption A3
For each stationary multi-strategy \(\pmb {\varphi }\in \Phi \) and for each player \(i\in {{{\mathcal {N}}}},\) there exists \(\pi _i\in \Pi _i\) such that
$$\begin{aligned} J^\ell _i([\pmb {\varphi _{-i}},\pi _i])\le \kappa ^\ell _i\quad \text{ for } \text{ all }\ \ell \in {{{\mathcal {L}}}}. \end{aligned}$$
Assumption A3 is standard in the theory of constrained decision processes and stochastic games [2, 3, 13, 28].
Remark 2.2
From Assumption A3, Lemma 2.3 in [13] and Lemma 24 in [37], it follows that the strategy \(\pi _i\in \Pi _i\) can be replaced by a stationary strategy \(\sigma _i\in \Phi _i\) such that
$$\begin{aligned} J^\ell _i([\pmb {\varphi _{-i}},\sigma _i])= J^\ell _i([\pmb {\varphi _{-i}},\pi _i])\quad \text{ for } \text{ all }\ \ell \in {{{\mathcal {L}}}}_0. \end{aligned}$$
The proof of Lemma 24 in [37] on the equivalence of these strategies is formulated for models with Borel state spaces. However, it is also valid in our framework (see pp. 307–309 in [37]), with the exception that we need an appropriate disintegration result. In this matter, consult Lemma 2.3 in [13] or Theorem 3.2 in [19].
We are ready to state our first main result.
Theorem 2.3
Assume A1, A2 and A3. Then, for each \(\varepsilon >0,\) the CSG possesses a stationary approximate equilibrium.
Remark 2.4
The proof of this result is given in Sect. 3. We prove that a stationary approximate equilibrium for given \(\varepsilon >0\) consists of strategies that are piecewise constant functions of the state variable. We observe that, under assumptions of Theorem 2.3, condition (2.1) with \(\varepsilon =0\) need not be satisfied by piecewise constant stationary multi-strategies. Therefore, the existence of an \(\varepsilon \)-equilibrium in the CSG is an open issue. We would like to emphasise that Theorem 2.3 is crucial in our proof of Theorem 2.13 on weak correlated equilibria, where we apply an asymptotic approach when \(\varepsilon \rightarrow 0.\)
Remark 2.5
The only result in the literature on the existence of stationary Nash equilibria in CSGs with general state space was given by Dufour and Prieto-Rumeau [13]. It concerns so-called discounted additive rewards and additive transition (ARAT) stochastic games. In the two-person case the ARAT assumption means that \(c_i^\ell (x,a_1,a_2)= c_{1i}^\ell (x,a_1) + c_{2i}^\ell (x, a_2)\) and \(p(\cdot |x,a_1,a_2)= p_1(\cdot |x,a_1 )+ p_2(\cdot |x, a_2), \) where \(p_1\) and \(p_2\) are transition subprobabilities. The results in [13] are given for two-person games satisfying the standard Slater condition (Assumption A3 with strict inequalities). However, they can be easily extended by the same methods to N-person ARAT stochastic games. A simple adaptation of the counterexample by Levy and McLennan [29] given for unconstrained discounted stochastic games implies that stationary Nash equilibria may not exist in the constrained stochastic games studied in this paper. For more details see Sect. 5.
Remark 2.6
We wish to emphasise that the Slater condition is not needed for establishing the existence of an approximate equilibrium in CSGs.
2.2 An Update on Stationary Equilibria in Unconstrained Discounted Stochastic Games
In this subsection, we drop the constraints. By the Ionescu–Tulcea theorem [33], any multi-strategy \(\pmb {\pi }\in \Pi \) and any initial state \(x\in X,\) induce a unique probability measure \({\mathbb {P}}_x^{\pmb {\pi }}\) on \(H^\infty .\) The expectation operator with respect to \({\mathbb {P}}_x^{\pmb {\pi }}\) is denoted by \({\mathbb {E}}_x^{\pmb {\pi }}.\)
The discounted cost for player \(i\in {{{\mathcal {N}}}}\) is defined as
Definition 2.7
Let \(\varepsilon \ge 0\) be fixed. A multi-strategy \(\pmb {\pi }^*\in \Pi \) is an \(\varepsilon \)-equilibrium in the unconstrained discounted stochastic game, if
for every player \(i\in {{{\mathcal {N}}}}\) and for all initial states \(x\in X.\) A 0-equilibrium is called a Nash equilibrium.
Theorem 2.8
Under Assumptions A1 and A2, for any \(\varepsilon >0,\) the unconstrained discounted stochastic game has a stationary \(\varepsilon \)-equilibrium.
The proof is given in Sect. 3.
Remark 2.9
Stationary Nash equilibria exist only in some special cases of stochastic games satisfying Assumptions A1 and A2, see [25] (ARAT games), [23] (other classes of games) and [26] (a survey). As shown by Levy and McLennan [29], stationary Nash equilibria need not exist in general under the assumptions of Theorem 2.8.
Remark 2.10
Theorem 2.8 is an extension of Theorem 3.1 in [34], where additionally it is assumed that
2.3 Weak Correlated Equilibria in Constrained Discounted Stochastic Games
Let \(\Psi \) be the set of all transition probabilities from X to A, that is, \(\psi \in \Psi \) if \(\psi (\cdot |x)\in \Pr (A)\) for every \(x\in X \) and \(\psi (D|\cdot )\) is \({{\mathcal {F}}}\)-measurable for any Borel set \(D\subset A.\) A stationary correlated strategy for the players in the CSG is a constant sequence \((\psi ,\psi ,\ldots ),\) where \(\psi \in \Psi .\) As in the case of stationary strategies, we shall identify a correlated strategy with the element \(\psi \) of this sequence.
By the Ionescu-Tulcea theorem [33], any correlated strategy \(\psi \in \Psi \) and the initial distribution \(\eta ,\) induce a unique probability measure \({\mathbb {P}}_\eta ^{\psi }\) on \(H^\infty .\) The expectation operator with respect to \({\mathbb {P}}_\eta ^{\psi }\) is denoted by \({\mathbb {E}}_\eta ^{\psi }.\) Then the discounted cost functionals for player \(i\in {{{\mathcal {N}}}}\) are defined as
for all \(\ell \in {{{\mathcal {L}}}}_0.\) Obviously, here at stage t the vector of actions \(\pmb {a}^t\) is chosen according to a probability measure \(\psi (\cdot |x^t).\)
Furthermore, for any \(x\in X,\) let \(\psi _{-i}\) and \(\psi _{i}\) denote the projections of \(\psi (\cdot |x)\) on \(A_{-i}\) and \(A_{i}\), respectively. For any player \(i\in {{{\mathcal {N}}}}\) and a strategy \(\pi _i\in \Pi _i\), we denote by \([\psi _{-i},\pi _i]\) a multi-strategy, where player i uses the strategy \(\pi _i\) and the other players act as one player applying \(\psi _{-i}.\) In this case, \(J^0_i([\psi _{-i},\pi _i])\) denotes the expected discounted cost for player i. Set
Definition 2.11
A strategy \(\psi ^*\in \Psi \) is called a weak correlated equilibrium in the CSG, if for every \(i\in {{{\mathcal {N}}}} \) and \(\ell \in {{{\mathcal {L}}}},\) \(J^\ell _i (\psi ^*) \le \kappa ^\ell _i\) and for every \(i\in {{{\mathcal {N}}}}, \)
If all players but \(i\in {{{\mathcal {N}}}}\) accept to use \(\psi ^*\) to select an action profile in any state x and player \(i\in {{{\mathcal {N}}}}\) decides to play independently of all of them by choosing a feasible strategy \(\pi _i\), then the action profile for all players in \(\mathcal{N}\setminus \{i\}\) is selected with respect to the marginal probability distribution \(\psi _{-i}^*(\cdot |x)\) on \(A_{-i}.\) When \(\psi ^*\) is a weak correlated equilibrium, then inequality (2.4) says that unilateral deviations from \(\psi ^*\) are not profitable. This is an adaptation of the equilibrium concept, formulated by Moulin and Vial [32] for static games, to our dynamic game model.
In order to state our third main result, we define \(\Phi _{-i}:= \prod _{j\in {{{\mathcal {N}}}}\setminus \{i\}}\Phi _j \) and impose the following condition.
Assumption A4
For each player \(i\in {{{\mathcal {N}}}},\)
This assumption implies the standard Slater condition (see Assumption A5 below) widely used in the literature [2, 3, 13, 28].
Assumption A5
For each player \(i\in {{{\mathcal {N}}}}\) and any \(\pmb {\varphi _{-i}}\in \Phi _{-i}\), there exists \(\sigma _i\in \Phi _i\) such that
$$\begin{aligned} J^\ell _i([\pmb {\varphi _{-i}},\sigma _i]) < \kappa ^\ell _i\quad \text{ for } \text{ all }\ \ell \in {{{\mathcal {L}}}}. \end{aligned}$$
Assumptions A4 and A5 admit seemingly more general formulations. Namely, we can state them for \(\pi _i\in \Pi _i\) instead of \(\sigma _i\in \Phi _i\) and replace the set \(\Phi _i\) by \(\Pi _i.\) However, Remark 2.2 implies that these formulations are in fact equivalent.
Remark 2.12
From Assumption A4, it follows that there exists \(\zeta >0\) such that for every player \(i\in {{{\mathcal {N}}}},\)
and consequently that for each player \(i\in {{{\mathcal {N}}}}\) and any \(\pmb {\varphi _{-i}}\in \Phi _{-i}\), there exists \(\sigma _i\in \Phi _i\) such that
Theorem 2.13
Assume A1, A2 and A4. Then, the CSG possesses a stationary weak correlated equilibrium.
The proof is given in Sect. 4.
Remark 2.14
The existence of a weak correlated equilibrium in the unconstrained case was proved by Nowak [35] under the additional integrability condition (2.3).
Remark 2.15
If \(\psi ^*\) is a stationary weak correlated equilibrium in an ARAT game, then the profile \((\psi ^*_{1},\psi ^*_{2},...,\psi ^*_N)\) of its marginals is a stationary Nash equilibrium in this game. Thus, Theorem 2.13 implies the main result of Dufour and Prieto-Rumeau [13], if the action sets are independent of the state. However, their proof is more direct in the sense that it is not based on an approximation by games with discrete state spaces. Instead, they directly apply a fixed point theorem. An extension to the case of action spaces depending on the state variable raises some additional technical issues.
3 Approximating Games with Countable State Spaces and Proofs of Theorems 2.3 and 2.8
In this section, we define a class of games that resemble stochastic games with a countable state space. Using them, we can approximate the original game and apply the results on the existence of stationary equilibria in discounted games with countably many states proved by Federgruen [15] (unconstrained case) and Jaśkiewicz and Nowak [28] (constrained case).
Let \({{{\mathcal {C}}}}(A)\) be the Banach space of all real-valued continuous functions on A endowed with the maximum norm \(\Vert \cdot \Vert .\) Let \({{{\mathcal {C}}}}_b = \{w_1,w_2,...\}\) denote a countable dense subset of the ball \(\{w\in {{{\mathcal {C}}}}(A): \Vert w\Vert \le b\}\) in \({{{\mathcal {C}}}}(A),\) where \(b\ge |c^\ell _i(x,\pmb {a})|\) for all \(i\in {{{\mathcal {N}}}}\), \(\ell \in \mathcal{L}_0,\) \((x,\pmb {a})\in {\mathbb {K}}.\)
We write \({{{\mathcal {L}}}}^1\) to denote the Banach space \(\mathcal{L}^1(X,{{{\mathcal {F}}}},\mu )\) of all absolutely integrable real-valued measurable functions on X with the norm
Let \({{{\mathcal {C}}}}(A,{{{\mathcal {L}}}}^1)\) be the space of all \({{{\mathcal {L}}}}^1\)-valued continuous functions on A with the norm
Here an element of \({{{\mathcal {C}}}}(A,{{{\mathcal {L}}}}^1)\) is written as a product measurable function \(\lambda : X\times A \rightarrow {\mathbb {R}}\) such that \(\lambda (\cdot ,\pmb {a}) \in {{{\mathcal {L}}}}^1\) for each \(\pmb {a}\in A\) and
By Lemma 3.99 in [1], the space \({{{\mathcal {C}}}}(A,{{{\mathcal {L}}}}^1)\) is separable. Assumption A2 implies that \({{{\mathcal {D}}}}:= \{ \delta (x,\cdot ,\cdot ): x \in X \} \subset {{{\mathcal {C}}}}(A,{{{\mathcal {L}}}}^1)\) is also a separable space when endowed with the relative topology. Therefore, there exists a subset \(\{x_k: k \in {\mathbb {N}}\}\) of the state space X such that the set \(\{\delta (x_k,\cdot ,\cdot ): k \in {\mathbb {N}} \}\) is dense in \({{{\mathcal {D}}}}.\)
For any player \(i\in {{{\mathcal {N}}}},\) and positive integers \(m_{i\ell },\) \(\ell \in {{{\mathcal {L}}}}_0,\) we put \({\overline{m}}_i =(m_{i0},m_{i1},...,m_{iL}).\) Then, given any \(\gamma >0 \), we define \(B^\gamma (i,{\overline{m}}_i)\) as the set of all states \(x\in X\) such that
$$\begin{aligned} \Vert c^\ell _i(x,\cdot ) - w_{m_{i\ell }}\Vert \le \gamma \quad \text{ for } \text{ all }\ \ell \in {{{\mathcal {L}}}}_0. \end{aligned}$$
For any \(k \in {\mathbb {N}},\) let
$$\begin{aligned} B_k^\gamma := \{x\in X:\ \Vert \delta (x,\cdot ,\cdot )-\delta (x_k,\cdot ,\cdot )\Vert \le \gamma \}, \end{aligned}$$
where \(\Vert \cdot \Vert \) denotes the norm in \({{{\mathcal {C}}}}(A,{{{\mathcal {L}}}}^1).\)
It is obvious that the sets \(B_k^\gamma \) and \(B^\gamma (i,{\overline{m}}_i)\) belong to \({{\mathcal {F}}}\) and the union of all sets
is the whole state space X. Indeed, if \(x\in X,\) then there exists \(k\in {\mathbb {N}}\) such that \(x\in B_k^\gamma \) and, for any player \(i\in {{{\mathcal {N}}}},\) there exist functions \(w_{m_{i\ell }}\in {{{\mathcal {C}}}}_b, \) and thus \({\overline{m}}_i \) such that (3.1) holds.
Let \(\xi \) be a fixed one-to-one correspondence between the sets \({\mathbb {N}}\) and \({\mathbb {N}}\times {\mathbb {N}}^{N(L+1)}.\) Assuming that \(j\in {\mathbb {N}}\) and \(\xi (j)= (k,{\overline{m}}_1,...,{\overline{m}}_N),\) we put
We can assume without loss of generality that \(Y^\gamma _1 \not = \emptyset .\) Next, we set \(X^\gamma _1= Y_1^\gamma \) and
Omitting empty sets \(X_\tau ^\gamma \) we obtain a subset \({\mathbb {N}}_0 \subset {\mathbb {N}}\) such that
is a measurable partition of the state space X. Choose any \(n \in {\mathbb {N}}_0.\) Then, \(\xi (n)\) is a unique sequence in \({{\mathbb {N}}}\times {\mathbb {N}}^{N(L+1)}\) that depends on n and, therefore, we can write \(\xi (n)= (k^n,{\overline{m}}_1^n,...,{\overline{m}}_N^n)\) where \({\overline{m}}_i^n = (m_{i0}^n, m_{i1}^n,...,m_{iL}^n),\) \(i\in {{{\mathcal {N}}}}.\) Next, for each \(x\in X_n^\gamma ,\) we define
From (3.1), (3.2) and (3.3), it follows that for each \(n\in {\mathbb {N}}_0\) and \(x\in X_n^\gamma ,\) we have
and
The original game defined in Sect. 2 is now denoted by \({{\mathcal {G}}}\). We use \({{{\mathcal {G}}}}^\gamma \) to denote the game, where the cost functions are \( c^{\ell ,\gamma }_i,\) \(\ell \in {{{\mathcal {L}}}}_0\) and \(i\in {{{\mathcal {N}}}}\), and the transition probability is
Note that \( c^{\ell ,\gamma }_i(x,\pmb {a})\) and \( p^\gamma (B|x,\pmb {a})\) are constant functions of x on every set \(X_n^\gamma .\)
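The mechanism behind the partition can be illustrated on a toy one-dimensional model (a sketch of the idea only, not the paper's exact construction): states whose cost functions on the action set round to the same member of a countable family of grid vectors are lumped into one cell, and replacing the cost on each cell by that of a fixed representative perturbs it by at most \(\gamma \) in the sup norm.

```python
import numpy as np

# States sampled from [0,1], three actions.  States are grouped by
# rounding their cost vectors c(x, .) to a gamma-grid; within a cell all
# cost vectors agree to within gamma coordinatewise, so the piecewise
# constant cost built from cell representatives is gamma-close in sup norm.
gamma = 0.1
xs = np.linspace(0.0, 1.0, 501)
acts = np.arange(3)

def cost(x, a):                    # some continuous (Caratheodory) cost
    return np.sin(3 * x + a) * np.cos(a * x)

C = np.array([[cost(x, a) for a in acts] for x in xs])   # C[state, action]

keys = np.round(C / gamma).astype(int)      # cell label of each state
cells = {}
for i, key in enumerate(map(tuple, keys)):
    cells.setdefault(key, []).append(i)

C_gamma = np.empty_like(C)                  # piecewise constant cost
for members in cells.values():
    C_gamma[members] = C[members[0]]        # first state = representative

err = float(np.abs(C - C_gamma).max())      # <= gamma by construction
```

In the paper the same lumping is applied simultaneously to the cost functions of all players and to the densities \(\delta (x,\cdot ,\cdot ),\) which yields the countable measurable partition \(\{X_n^\gamma \}\) and the piecewise constant data \(c^{\ell ,\gamma }_i\) and \(p^\gamma .\)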
The discounted expected costs in the game \({{{\mathcal {G}}}}^\gamma \) under a multi-strategy \(\pmb {\pi }\in \Pi \) are denoted by
Let
From (3.4), (3.5) and Lemma 4.4 in [34], we conclude the following auxiliary result.
Lemma 3.1
For each \(i\in {{{\mathcal {N}}}}\) and \(\ell \in {{{\mathcal {L}}}}_0,\) we have
With \({{{\mathcal {G}}}}^\gamma \) we associate a stochastic game \(\mathcal{G}^\gamma _c\) with the countable state space \({\mathbb {N}}_0 \subset {\mathbb {N}}\), the costs given by
and transitions defined as
Note that the right-hand sides in (3.7) and (3.8) are independent of x in \(X_n^\gamma \) and thus the costs and transitions above are well-defined. A stationary strategy for player \(i\in {{{\mathcal {N}}}}\) in the game \({{{\mathcal {G}}}}^\gamma _c\) is a transition probability \(f_i\) from \({\mathbb {N}}_0\) to \(A_i.\) The set of all stationary strategies for player \(i\in {{{\mathcal {N}}}}\) in this game is denoted by \(F_i.\) We put \(F:= \prod _{i\in {{{\mathcal {N}}}}}F_i.\)
The expected discounted costs in the game \({{{\mathcal {G}}}}^\gamma _c\) under stationary multi-strategy \(\pmb {\pi }\) are denoted by
Let \( \Phi ^\gamma _i\) be the set of all piecewise constant stationary strategies of player \(i\in {{{\mathcal {N}}}} \) in the game \(\mathcal{G}^\gamma .\) A strategy \(\varphi _i\) belongs to \(\Phi ^\gamma _i\) if, for each \(n\in {\mathbb {N}}_0,\) there exists a probability measure \(\nu _n\) on \(A_i\) such that \(\varphi _i (da_i|x)= \nu _n(da_i)\) for all \(x\in X_n^\gamma .\) We put \(\Phi ^\gamma = \prod _{i\in {{{\mathcal {N}}}}} \Phi ^\gamma _i.\)
Let \(\pmb {f}= (f_1,...,f_N) \in F\) and \(\pmb {\varphi }= (\varphi _1,...,\varphi _N)\in \Phi ^\gamma \) be such that
Then, for each \(i\in {{{\mathcal {N}}}}\), \(\ell \in {{{\mathcal {L}}}}_0,\) \(n\in {\mathbb {N}}_0\) and \(x\in X_n^\gamma ,\)
and
Equations (3.10) and (3.11) show that \({{{\mathcal {G}}}}^\gamma \) with the strategy sets \(\Phi ^\gamma _i\) can be regarded as a game with a countable state space. This observation plays an important role in the proof, because it allows us to apply a result for games with countable state spaces.
Proof of Theorem 2.3
Let \(\varepsilon >0\) and \(i \in {{{\mathcal {N}}}}\). Choose \(\gamma >0\) in (3.6) such that \(\epsilon (\gamma ) < \varepsilon /2.\) From Assumption A3 and Remark 2.2, we infer that for any multi-strategy \(\pmb {\varphi }\in \Phi ^\gamma \) there exists \(\sigma _i\in \Phi _i\) such that
By Lemma 7.1 in Appendix, there exists a piecewise constant Markov strategy \({\overline{\pi }}_i\) such that
for all \(\ell \in {{{\mathcal {L}}}}_0.\) By Lemma 3.1 and (3.12) we conclude that
This means that the approximating game \({{{\mathcal {G}}}}^\gamma \) satisfies the Slater condition with the constants \(\kappa ^\ell _i +\frac{\varepsilon }{2},\) \(\ell \in {{{\mathcal {L}}}}.\) Note that the constraint constants in \({{{\mathcal {G}}}}^\gamma \) are also equal to \(\kappa ^\ell _i +\frac{\varepsilon }{2},\) \(\ell \in {{{\mathcal {L}}}}.\) Therefore, the associated game \({{{\mathcal {G}}}}_c^\gamma \) also satisfies the Slater condition with the same constants \( \kappa ^\ell _i +\frac{\varepsilon }{2},\) \(\ell \in {{{\mathcal {L}}}}.\) Making use of Corollary 2 in [28], we infer that the game \({{{\mathcal {G}}}}_c^\gamma \) possesses a stationary Nash equilibrium \(\pmb {f}^*= (f_1^*,...,f_N^*).\) Define \(\pmb {\varphi }^* = (\varphi _1^*,...,\varphi _N^*)\in \Phi ^\gamma \) as in (3.9) with \( \pmb {\varphi }=\pmb {\varphi }^*\) and \(\pmb {f}=\pmb {f}^*.\) Then,
for any piecewise constant strategy \({\hat{\pi }}_i\) such that
We now show that \(\pmb {\varphi }^*\) is an approximate equilibrium in the original game. Note that for every player \(i\in {{{\mathcal {N}}}}\)
Hence, for every player \(i\in {{{\mathcal {N}}}}\)
i.e., condition (2.1) holds. Consider any feasible strategy \(\pi _i\in \Delta _i(\pmb {\varphi _{-i}^*}),\) i.e.,
Applying Remark 2.2, we deduce that there exists a strategy \(\sigma _i\in \Phi _i\) such that
Then, by Lemma 7.1 in Appendix, there exists a piecewise constant Markov strategy \({\overline{\pi }}_i\) such that
Moreover, by (3.15), Lemma 3.1, (3.14) and (3.13), for every \(\ell \in {{{\mathcal {L}}}},\) we have
In other words, \({\overline{\pi }}_i\) is a feasible strategy in \(\mathcal{G}^\gamma \). Therefore, by Lemma 3.1, (3.15) and (3.14), we infer
This fact together with (3.13) implies that (2.2) holds. \(\square \)
Proof of Theorem 2.8
Let \(\varepsilon >0\) be fixed. Choose \(\gamma >0\) in (3.6) such that \(\epsilon (\gamma ) < \varepsilon /2.\) By Theorem 2.3 in [15], the game \({{{\mathcal {G}}}}^\gamma _c\) has a stationary equilibrium \(\pmb {f}^*= (f_1^*,...,f_N^*).\) Define \(\pmb {\varphi }^* = (\varphi _1^*,...,\varphi _N^*)\in \Phi ^\gamma \) as in the proof of Theorem 2.3. Then we have
As in Lemma 4.1 in [34], we can prove that
This equality and Lemma 3.1 imply that
By standard methods in discounted dynamic programming [8, 34], we have
This fact and (3.18) imply that
which completes the proof. \(\square \)
Remark 3.2
The proof of Theorem 2.8 is similar to that of Theorem 3.1 in [34], but it contains one important modification, which allows the restrictive condition (2.3) to be dropped.
4 Young Measures and the Proof of Theorem 2.13
Let \(\vartheta :=(\eta +\mu )/2.\) A function \(c:{\mathbb {K}}\rightarrow {\mathbb {R}}\) is Carathéodory, if it is product measurable on \({\mathbb {K}}\), \(c(x,\cdot )\) is continuous on A for each \(x\in X\) and
$$\begin{aligned} \int _X \max _{\pmb {a}\in A}|c(x,\pmb {a})|\,\vartheta (dx)<\infty . \end{aligned}$$
Let \(\Psi ^\vartheta \) be the space of all \(\vartheta \)-equivalence classes of functions in \(\Psi .\) The elements of \(\Psi ^\vartheta \) are called Young measures. Note that the expected discounted cost functionals are well-defined for all elements of \(\Psi ^\vartheta .\) More precisely, if \(\psi ^\vartheta \in \Psi ^\vartheta ,\) then \(J^\ell _i(\psi )\) is the same for all representatives \(\psi \) of \(\psi ^\vartheta \) in \(\Psi \) and we can understand \(J^\ell _i(\psi ^\vartheta )\) as \(J^\ell _i(\psi ).\) In our notation we shall identify \(\psi ^\vartheta \) with its representative \(\psi \) and omit the superscript \(\vartheta .\)
We assume that the space \(\Psi ^\vartheta \) is endowed with the weak* topology. Since \({{{\mathcal {F}}}}\) is countably generated, \(\Psi ^\vartheta \) is metrisable. Moreover, since the set A is compact, \(\Psi ^\vartheta \) is a compact convex subset of a locally convex linear topological space. For a detailed discussion of these issues, consult [7] or Chapter 3 in [19]. Here, we recall that \(\psi ^n \rightarrow ^* \psi ^0\) in \(\Psi ^\vartheta \) as \(n\rightarrow \infty \) if and only if for every Carathéodory function \(c:{\mathbb {K}}\rightarrow {\mathbb {R}}\), we have
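For reference, this convergence criterion can be written out as follows; this is the standard characterisation of weak* convergence of Young measures (see, e.g., [7]) stated in the notation above, not a reproduction of the paper's displayed formula:

```latex
\psi^n \rightarrow^{*} \psi^0
\quad\Longleftrightarrow\quad
\int_X \int_A c(x,a)\,\psi^n(da\,|\,x)\,\vartheta(dx)
\;\longrightarrow\;
\int_X \int_A c(x,a)\,\psi^0(da\,|\,x)\,\vartheta(dx)
\quad\text{for every Carath\'eodory function } c.
```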
We now choose \(\varepsilon _n>0\) such that \(\varepsilon _n \searrow 0\) as \(n\rightarrow \infty \) and define
In other words, \(\epsilon (\gamma _n)=\varepsilon _n\) or \(\gamma _n=\epsilon ^{-1}(\varepsilon _n).\) From Theorem 2.3, it follows that there exists a profile of stationary piecewise constant strategies
which comprises an approximate equilibrium in the CSG for \(\varepsilon _n\) and at the same time an equilibrium in the corresponding constrained game \({{{\mathcal {G}}}}^{\gamma _n}\) with \(\gamma _n\) as in (4.1) and the constraint constants \(\kappa ^\ell _i+\frac{\varepsilon _n}{2}.\)
Define the product measure on A, for every \(x\in X\) and \(n\in {\mathbb {N}}\), as
We use \(\psi ^n\) to denote the class in \(\Psi ^\vartheta \) whose representative is this transition probability. Without loss of generality, we may assume that \(\psi ^n\) converges in the weak* topology to some \(\psi ^* \in \Psi ^\vartheta \) as \(n\rightarrow \infty .\)
We shall need the following results. The first one is a consequence of Lemma 3.1 and the fact that \(J_i^{\ell ,\gamma _n}(\pmb {\psi ^n})= J_i^{\ell ,\gamma _n}(\psi ^n)\) and \(J_i^{\ell }(\pmb {\psi ^n})= J_i^{\ell }(\psi ^n).\)
Lemma 4.1
For each \(i\in {{{\mathcal {N}}}}\) and \(\ell \in {{{\mathcal {L}}}}_0,\) we have
where \(\gamma _n\) is as in (4.1).
Lemma 4.2
As \(n\rightarrow \infty ,\) for any \(\ell \in {{{\mathcal {L}}}}_0:\)
(a) \(J_i^{\ell ,\gamma _n}(\psi ^n) \rightarrow J_i^{\ell }(\psi ^*)\) ,
(b) \(J_i^{\ell ,\gamma _n}([\psi _{-i}^n,\phi _i]) \rightarrow J_i^{\ell }([\psi _{-i}^*,\phi _i])\) for any \(\phi _i\in \Phi _i.\)
Proof
For part (a) we first use the triangle inequality
The first term on the right-hand side converges to 0 by Lemma 4.1 and the definition of \(\psi ^n,\) whereas the convergence to 0 of the second term follows from Lemma 4.1 in [27] and the fact that \(|J_i^{\ell }(\cdot )|\le b\) for every \(i\in {{{\mathcal {N}}}}\) and \(\ell \in {{{\mathcal {L}}}}_0.\) Part (b) is proved in the same way as part (a), using the Fubini theorem and noting that the elements in \(\Psi ^\vartheta \) induced by \(\psi _{-i}^n\) in (4.2) and \(\phi _i\) converge in the weak* sense to the element of \(\Psi ^\vartheta \) induced by \(\psi _{-i}^*\) and \(\phi _i.\) \(\square \)
Let \(i\in {{{\mathcal {N}}}}.\) Consider a Markov decision process with player i as a decision maker and the transition probability
Let \(1_D\) be the indicator of the set \(D\subset X\times A.\) The associated occupation measure, when player i uses a stationary strategy \(\varphi _i\in \Phi _i\) is defined as follows
for any \(B\in {{{\mathcal {F}}}}\) and a Borel set C in \(A_i.\) We use the symbol \({{{\mathcal {E}}}}^{\varphi _i}_\eta \) to denote the expectation operator corresponding to the unique probability measure induced by \(\varphi _i\in \Phi _i\), the initial distribution \(\eta \) and the transition probability \(q^{\gamma _n}.\) For \(\ell \in {{{\mathcal {L}}}}_0,\) \(x\in X\) and \(a_i\in A_i,\) set
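To make the occupation-measure construction concrete, here is a minimal numerical sketch on a hypothetical finite-state chain with a fixed stationary policy. It checks the standard identity that integrating the one-stage cost against the (here normalised) discounted occupation measure recovers the expected discounted cost; the normalisation by \(1-\alpha \) is one common convention and need not match the display referenced above exactly.

```python
# Hypothetical 3-state chain under a fixed stationary policy: the discounted
# occupation measure theta(y) = (1-alpha) * sum_t alpha^{t-1} P_eta(x_t = y)
# satisfies  sum_y theta(y) c(y) = (1-alpha) * J,  J the expected discounted cost.
alpha = 0.9
eta = [0.5, 0.3, 0.2]                      # initial distribution
P = [[0.2, 0.5, 0.3],                      # transition matrix induced by the policy
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]
c = [1.0, 2.0, 0.5]                        # one-stage cost

def step(mu, P):
    """Distribution of the next state: (mu P)(y) = sum_x mu(x) P(x, y)."""
    return [sum(mu[x] * P[x][y] for x in range(len(mu))) for y in range(len(mu))]

theta = [0.0, 0.0, 0.0]                    # occupation measure, built term by term
J = 0.0                                    # expected discounted cost
mu, w = eta[:], 1.0
for t in range(2000):                      # truncate the series; alpha^2000 is negligible
    for y in range(3):
        theta[y] += (1 - alpha) * w * mu[y]
    J += w * sum(mu[y] * c[y] for y in range(3))
    mu, w = step(mu, P), w * alpha

lhs = sum(theta[y] * c[y] for y in range(3))   # integral of c against theta
assert abs(lhs - (1 - alpha) * J) < 1e-8
assert abs(sum(theta) - 1.0) < 1e-8            # theta is a probability measure
```

The same bookkeeping extends to state-action pairs \((x_t,a_t^i)\), which is what the occupation measure in the text records.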
Proof of Theorem 2.13
Observe that Assumption A4 implies A3. We consider the weak* limit \(\psi ^* \in \Psi ^\vartheta \) mentioned above and denote its representative in \(\Psi \) by the same letter.
We shall show that \(\psi ^*\) is a weak correlated equilibrium. By Theorem 2.3, \(J^\ell _i(\pmb {\psi ^n}) =J^\ell _i(\psi ^n) \le \kappa ^\ell _i +\varepsilon _n\) for all \(i\in {{{\mathcal {N}}}}\) and \(\ell \in {{{\mathcal {L}}}}.\) Using Lemma 4.2(a), we conclude that
i.e., \(\psi ^*\) is feasible.
Take (if possible) any feasible strategy in the CSG for player \(i\in {{{\mathcal {N}}}}\), i.e., \( \pi _i\in \Pi _i\) such that
By Remark 2.2 there exists a strategy \(\phi _i\in \Phi _i\) such that
\(1^\circ \) Assume first that
From this inequality and Lemma 4.2(b), we infer that there exists \(N_1\in {\mathbb {N}}\) such that
For every \(n\ge N_1\), Lemma 7.1 in the Appendix yields a piecewise constant Markov strategy \({\overline{\pi }}_i\) (which may depend on \(n\)) such that
Hence, it must hold
In other words, for every \(n\ge N_1\) we have
Letting \(n\rightarrow \infty \) and making use of Lemma 4.2, we infer
for any feasible strategy \(\pi _i\in \Pi _i \) such that (4.4) holds.
\(2^\circ \) Assume now that there exist a player \(i\in {{{\mathcal {N}}}}\) and an index \(\ell _0\in {{{\mathcal {L}}}}\) such that
From the proof of Lemma 4.2(b), it follows that there exists a sequence \(e_n \rightarrow 0\) as \(n\rightarrow \infty ,\) with \(e_n>0,\) such that
By Remark 2.12, we can find \(\zeta >0\) such that for every \(n\in {\mathbb {N}}\) there exists a strategy \(\sigma ^n_i\in \Phi _i\) such that
Hence, by Lemma 4.1, we conclude
and
Let \(N_2\in {\mathbb {N}}\) be such that \(\varepsilon _{N_2}<\zeta .\) For all \(n\ge N_2,\) set
and observe that \(\xi _n\rightarrow 0\) as \(n\rightarrow \infty \) and \(\xi _n\in (0,1) \) for all \(n > N_2\). Let \(\theta _{\phi _i}^{\gamma _n}\) and \( \theta _{\sigma ^n_i}^{\gamma _n}\) be two occupation measures defined as in (4.3). By Proposition 3.9 in [13], we define a sequence of occupation measures as follows
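The mixture in question is, under the stated assumptions, of the standard Slater-perturbation form sketched below; the precise definition of \(\theta ^n\) is the one displayed in the paper, and this is only a plausible reconstruction:

```latex
\theta^{\,n} \;:=\; (1-\xi_n)\,\theta^{\gamma_n}_{\phi_i} \;+\; \xi_n\,\theta^{\gamma_n}_{\sigma^n_i},
```

i.e., the deviation \(\phi _i\) is mixed with the strictly feasible Slater strategy \(\sigma ^n_i\) with a vanishing weight \(\xi _n,\) which restores feasibility of the constraints while perturbing the costs only slightly.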
Then, for all \(\ell \in {{{\mathcal {L}}}}_0\) it holds
Hence, for \(n\ge N_2\) and all \(\ell \in {{{\mathcal {L}}}}\), from (4.6), we have
By Lemma 2.3 in [13] or Theorem 3.2 in [19] for every \(n\ge N_2,\) there exists a stationary strategy \(\chi ^n_i\in \Phi _i\) such that \(\theta ^n\) can be written as in (4.3) with \(\mathcal{E}_\eta ^{\varphi _i} \) replaced by \({{{\mathcal {E}}}}_\eta ^{\chi ^n_i}.\) In other words \( \theta ^n= \theta ^{\gamma _n}_{\chi ^n_i}.\) Therefore, for all \(\ell \in {{{\mathcal {L}}}}_0,\) we obtain
By Lemma 7.1 in the Appendix, for every \(n\in {\mathbb {N}}\), there exists a piecewise constant Markov strategy \({\overline{\pi }}^n_i\) such that
Hence, it must hold
We know that
Therefore, by Lemma 4.2(b) and (4.8), we get
for all \(\ell \in {{{\mathcal {L}}}}_0.\) This fact, (4.9) and Lemma 4.2(a) yield that
for any feasible strategy \(\pi _i\in \Pi _i\) for which (4.5) holds. \(\square \)
Let \(\Psi ^\vartheta _i\) be the space of \(\vartheta \)-equivalence classes of strategies in \(\Phi _i\) endowed with the weak* topology. Clearly, \(\Psi ^\vartheta _i\) is a compact metric space. The cost functionals \(J^\ell _i(\pmb {\varphi }),\) \(\ell \in {{{\mathcal {L}}}}_0\) and \(i\in {{{\mathcal {N}}}}\), are well defined for any profile \(\pmb {\varphi }= (\varphi _1,...,\varphi _N)\in {\widehat{\Psi }}^\vartheta = \prod _{j\in {{{\mathcal {N}}}}} \Psi ^\vartheta _j.\)
Remark 4.3
From Example 3.16 in [14] based on Rademacher’s functions, it follows that the weak* limit of the sequence of approximate equilibria in Theorem 2.13 need not be a stationary Nash equilibrium. The same example can be used to see that the cost functionals \(J^\ell _i,\) \(\ell \in {{{\mathcal {L}}}}_0\) and \(i\in {{{\mathcal {N}}}}\), may be discontinuous on \({\widehat{\Psi }}^\vartheta .\)
Consider the two-person game. It follows from Lemma 4.2 that \(J^\ell _i(\varphi _1,\varphi _2)\) is separately continuous in \(\varphi _1\) and \(\varphi _2\). Therefore, the functions
are upper semicontinuous on \(\Psi ^\vartheta _1\) and \(\Psi ^\vartheta _2\), respectively.
Remark 4.4
Consider a two-person game satisfying the standard Slater condition A5. Then, it follows
for all \(\varphi _1\in \Psi _1^\vartheta \) and \(\varphi _2\in \Psi _2^\vartheta .\) Since \(R_1\) and \(R_2\) are upper semicontinuous on the compact spaces \(\Psi _1^\vartheta \) and \(\Psi _2^\vartheta \), respectively, we conclude that
Obviously, \(\varphi _1\) and \(\varphi _2\) in inequalities (4.10) can be understood as representatives of (denoted by the same letters) classes in \(\Psi ^\vartheta _1\) and \(\Psi ^\vartheta _2,\) respectively. Then, it is apparent that A5 implies A4 for the considered two-person game.
Since in the N-person ARAT game the cost functionals are continuous on \({\widehat{\Psi }}^\vartheta \) with the product topology [13], A5 implies A4 in this case.
Finally, we note that in the countable state space case, the weak* topology on \(\Psi ^\vartheta _i\) is actually the topology of pointwise convergence and all cost functionals \(J^\ell _i\) are continuous on the compact space \({\widehat{\Psi }}^\vartheta \) with the product topology. Therefore, the standard Slater condition A4 made in the literature for these games, see [3, 4, 28, 42], is equivalent to A5.
5 Non-existence of Stationary Equilibria in Discounted Constrained Games
In this section, we consider discounted stochastic games with a given initial state distribution \(\eta .\) If \(c_i^\ell =0\) and \(\kappa _i^\ell =1\) for all \(i\in {{{\mathcal {N}}}}\) and \(\ell \in {{{\mathcal {L}}}}\), then the game in this class is trivially constrained and Assumption A3 automatically holds. Our aim is to conclude from [29] that such a game may have no stationary Nash equilibrium. For this purpose, we need the following fact.
Proposition 5.1
Let A1 and A2 be satisfied and in addition let \(p(\cdot |x,\pmb {a}) \ll \eta \) for all \((x,\pmb {a})\in {\mathbb {K}}.\) If \(\pmb {\varphi }= (\varphi _1,\ldots ,\varphi _N)\in \Phi \) is a stationary Nash equilibrium in the discounted stochastic game with the initial state distribution \(\eta ,\) i.e.,
for all \(i\in {{{\mathcal {N}}}}\) and \(\pi _i\in \Pi _i,\) then there exists a stationary Nash equilibrium \(\pmb {\psi }=(\psi _1,...,\psi _N) \) in the unconstrained stochastic game for all initial states, i.e.,
for all \(i\in {{{\mathcal {N}}}},\) \(\pi _i\in \Pi _i \) and \(x\in X.\) Moreover, \(\varphi _i(da_i|x)=\psi _i(da_i|x)\) for \(\eta \)-a.e. \(x\in X \) and for all \(i\in {{{\mathcal {N}}}}.\)
We begin with the necessary notation. Let \(\pmb {\phi }= (\phi _1,...,\phi _N) \in \Phi .\) Then
is the product measure on A determined by \(\phi _i(da_i|x),\) \(i=1,2,...,N.\) Recall that by \(\phi _{-i}(d\pmb {a_{-i}}|x)\) we denote the projection of \(\phi (d\pmb {a}|x)\) on \(A_{-i}.\) We put
If \(\sigma _i \in \Phi _i,\) then
If \(\nu _i \in \Pr (A_i),\) then
with \(\sigma _i(da_i|x)= \nu _i(da_i)\) for all \(x\in X.\)
Let \(v_i\), \(i=1,2,...,N\), be bounded measurable functions on X. For each \(x\in X,\) by \(\Gamma _x(v_1,...,v_N)\) we denote the one-step N-person game, where the payoff (cost) function for player \(i\in {{{\mathcal {N}}}} \) is
Proof of Proposition 5.1
From (5.1), it follows that for each set \(S\in {{{\mathcal {F}}}},\) we have
Hence, for each \(S\in {{{\mathcal {F}}}},\)
Thus, for every \(i\in {{{\mathcal {N}}}},\) there exists \(S_i\in {{{\mathcal {F}}}}\) such that \(\eta (S_i)=1\) and for all \(x\in S_i,\) we have
Let \({\widehat{S}}:= S_1\cap S_2\cap \cdots \cap S_N.\) Now consider the game \(\Gamma _x(v_1,...,v_N),\) where \(v_i(y)= J_i^0(\pmb {\varphi })(y),\) \(y\in X.\) By Lemma 5 in [36], there exists \(\pmb {\phi } \in \Phi \) such that \(\pmb {\phi } (d\pmb {a}|x) = (\phi _1(da_1|x),...,\phi _N(da_N|x))\) is a Nash equilibrium in the game \(\Gamma _x(v_1,...,v_N)\) for all \(x\in X\setminus {\widehat{S}}.\) For every \(i\in {{{\mathcal {N}}}},\) define \(\psi _i(da_i|x):= \varphi _i(da_i|x),\) if \(x\in {\widehat{S}},\) and \(\psi _i(da_i|x):= \phi _i(da_i|x),\) if \(x\in X\setminus {\widehat{S}}.\) Then, using (5.3), we conclude that \(\pmb {\psi }(d\pmb {a}|x) = (\psi _1(da_1|x),...,\psi _N(da_N|x))\) is a Nash equilibrium in the game \(\Gamma _x(v_1,...,v_N)\) for all \(x\in X.\) Define \(v_i^0(y):= v_i(y)= J_i^0(\pmb {\varphi })(y)\) for each \(y\in {\widehat{S}}\) and
for each \(y\in X\setminus {\widehat{S}}.\) Then, \(\eta (X\setminus {\widehat{S}})=0\) and our assumption \(p(\cdot |x,\pmb {a})\ll \eta (\cdot ),\) \((x,\pmb {a})\in {\mathbb {K}},\) imply that \(\Gamma _x(v_1^0,...,v_N^0)= \Gamma _x(v_1,...,v_N)\) for all \(x\in X.\) Therefore, for all \(x\in X,\) \(\psi (d\pmb {a}|x)\) is a Nash equilibrium in the game \(\Gamma _x(v_1^0,...,v_N^0)\) and
Using these facts and the Bellman equations for discounted dynamic programming [8, 24], we conclude that (5.2) holds. \(\square \)
Remark 5.2
Levy and McLennan [29] gave an example of an unconstrained discounted stochastic game that has no stationary Nash equilibrium. It is an 8-person stochastic game with finite action sets for the players and \(X=[0,1]\) as the state space. The definitions of the payoff functions and transition probabilities in their game are rather involved and are not reproduced here. We only mention that the transition probabilities are absolutely continuous with respect to the probability measure \(\eta _1=(\lambda _1 +\delta _1)/2,\) where \(\lambda _1\) is the Lebesgue measure on [0, 1] and \(\delta _1\) is the Dirac measure concentrated at the point 1. Assume that \(\eta _1\) is the initial state distribution in this game. If this game had a stationary Nash equilibrium, then by Proposition 5.1, it would have a stationary Nash equilibrium for all initial states. From [29], it follows that this is impossible (Footnote 1).
6 Remarks on Games with Unbounded Costs
Our results can be extended to a class of games with unbounded cost functions \(c_i^{\ell }\) under a uniform integrability condition introduced in [16]. The method relies on truncating the costs and approximating by bounded games; this was done in our paper [28] in the countable state space case. In a special situation, described below and inspired by the work of Wessels [40] on dynamic programming, a reduction to the bounded case can be obtained by the well-known data transformation described in Remark 2.5 in [12] or Sect. 10 in [17]. Following Wessels [40], we make the following assumptions.
Assumption W
(i) There exist a measurable function \(\omega :X\rightarrow [1,\infty )\) and \(c_0>0\) such that \(|c^\ell _i(x,\pmb {a})|\le c_0\omega (x)\) for all \(x\in X,\) \(\pmb {a}\in A,\) \(i\in {{{\mathcal {N}}}}\) and \(\ell \in {{{\mathcal {L}}}}_0.\)
(ii) There exists \(\beta >1\) such that \(\alpha \beta <1\) and
$$\begin{aligned} \int _X\omega (y)p(dy|x,\pmb {a}) \le \beta \omega (x) \end{aligned}$$
for all \(x\in X,\) \(\pmb {a}\in A.\)
(iii) If \(\pmb {a}^n \rightarrow \pmb {a}\) as \(n\rightarrow \infty ,\) then
To describe the equivalent model with bounded costs we extend the state space X by adding an isolated absorbing state \(0^*.\) All the costs at this absorbing state are zero. Let \(c_i^{\ell ,\omega }(x,\pmb {a}):= \frac{c_i^\ell (x,\pmb {a})}{\omega (x)},\) and
Now define the new initial state distribution as
Here, we assume that \(\eta \omega <\infty .\) Then, we obtain primitive data for a bounded constrained stochastic game, in which the discount factor is \(\alpha \beta .\) We denote the expected discounted costs in the bounded game under consideration by \({{{\mathcal {J}}}}^\ell _i(\pmb {\pi }).\) It is easy to see that
Theorems 2.3 and 2.13 can be established for the bounded game described above with minor modifications. For example, one has to define new constraint constants as \(\kappa ^\ell _i/\eta \omega ,\) \(i\in {{{\mathcal {N}}}}, \ell \in {{{\mathcal {L}}}}.\) Using the above transformation, we can immediately deduce similar results for games with unbounded cost functions satisfying Assumption W.
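The transformation can be checked numerically on a toy model. The sketch below uses the standard Wessels-type construction — transformed kernel \(\omega (y)p(dy|x,\pmb {a})/(\beta \omega (x))\) with the deficit mass sent to the absorbing state \(0^*,\) costs \(c/\omega ,\) discount \(\alpha \beta \) and initial distribution \(\eta \omega /\eta \omega \)-normalised — which we assume matches the one intended above; all numbers and the fixed policy are hypothetical.

```python
# Sketch of the Wessels-type data transformation on a toy 3-state model with a
# fixed policy; verifies that the bounded game reproduces the original cost
# after rescaling by eta*omega.
alpha = 0.4
eta = [0.6, 0.3, 0.1]
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.3, 0.4]]
omega = [1.0, 2.0, 4.0]                    # weight function, omega >= 1
c = [0.5, -3.0, 6.0]                       # |c(x)| <= c0 * omega(x) with c0 = 1.5
beta = max(sum(P[x][y] * omega[y] for y in range(3)) / omega[x] for x in range(3))
assert alpha * beta < 1                    # Assumption W(ii)

def solve_value(P, c, disc):
    """Fixed point of v = c + disc * P v, by iterating the Bellman operator."""
    v = [0.0] * len(c)
    for _ in range(5000):
        v = [c[x] + disc * sum(P[x][y] * v[y] for y in range(len(c)))
             for x in range(len(c))]
    return v

# original game: J = sum_x eta(x) v(x)
v = solve_value(P, c, alpha)
J = sum(eta[x] * v[x] for x in range(3))

# transformed game: the 4th state is the absorbing zero-cost state 0*
Pw = [[P[x][y] * omega[y] / (beta * omega[x]) for y in range(3)] for x in range(3)]
for x in range(3):
    Pw[x].append(1.0 - sum(Pw[x]))         # deficit mass sent to 0*
Pw.append([0.0, 0.0, 0.0, 1.0])            # 0* is absorbing
cw = [c[x] / omega[x] for x in range(3)] + [0.0]
eta_omega = sum(eta[x] * omega[x] for x in range(3))
etaw = [eta[x] * omega[x] / eta_omega for x in range(3)] + [0.0]

u = solve_value(Pw, cw, alpha * beta)
Jw = sum(etaw[x] * u[x] for x in range(4))
assert abs(Jw - J / eta_omega) < 1e-9      # costs match after rescaling
```

The final assertion is the rescaling behind the new constraint constants \(\kappa ^\ell _i/\eta \omega \) mentioned above.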
Notes
We thank John Yehuda Levy for pointing out this fact.
References
Aliprantis, C., Border, K.: Infinite Dimensional Analysis: A Hitchhiker’s Guide. Springer, New York (2006)
Altman, E.: Constrained Markov Decision Processes. Chapman Hall & CRC, Florida (1999)
Altman, E., Shwartz, A.: Constrained Markov games: Nash equilibria. Ann. Int. Soc. Dyn. Games 5, 213–221 (2000)
Alvarez-Mena, J., Hernández-Lerma, O.: Existence of Nash equilibria for constrained stochastic games. Math. Meth. Oper. Res. 63, 261–285 (2006)
Aumann, R.J.: Subjectivity and correlation in randomized strategies. J. Math. Econ. 1, 67–96 (1974)
Aumann, R.J.: Correlated equilibrium as an expression of Bayesian rationality. Econometrica 55, 1–18 (1987)
Balder, E.J.: Lectures on Young measure theory and its applications in economics. Rend. Istit. Mat. Univ. Trieste 31, 1–69 (2000)
Bertsekas, D.P., Shreve, S.E.: Stochastic Optimal Control: the Discrete-Time Case. Academic Press, New York (1978)
Billingsley, P.: Probability and Measure. Wiley, New York (2012)
Debreu, G.: A social equilibrium existence theorem. Proc. Natl. Acad. Sci. USA 38, 931–938 (1954)
Duffie, D., Geanakoplos, J., Mas-Colell, A., McLennan, A.: Stationary Markov equilibria. Econometrica 62, 745–781 (1994)
Dufour, F., Prieto-Rumeau, T.: Conditions for the solvability of the linear programming formulation for constrained discounted Markov decision processes. Appl. Math. Optim. 74, 27–51 (2016)
Dufour, F., Prieto-Rumeau, T.: Stationary Markov Nash equilibria for nonzero-sum constrained ARAT Markov games. SIAM J. Control Optim. 60, 945–967 (2022)
Elliott, R.J., Kalton, N.J., Markus, L.: Saddle-points for linear differential games. SIAM J. Control Optim. 11, 100–112 (1973)
Federgruen, A.: On \(N\)-person stochastic games with denumerable state space. Adv. Appl. Prob. 10, 452–471 (1978)
Feinberg, E.A., Jaśkiewicz, A., Nowak, A.S.: Constrained discounted Markov decision processes with Borel state spaces. Automatica 111, 108582 (2020)
Feinberg, E.A., Piunovskiy, A.B.: Sufficiency of deterministic policies for atomless discounted and uniformly absorbing MDPs with multiple criteria. SIAM J. Control Optim. 57, 163–191 (2019)
Ferguson, T.S.: Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York (1967)
Florescu, L.C., Godet-Thobie, C.: Young Measures and Compactness in Measure Spaces. De Gruyter, Berlin (2012)
Forges, F.: An approach to communication equilibria. Econometrica 54, 1375–1385 (1986)
Forges, F.: Communication equilibria in repeated games with incomplete information. Math. Oper. Res. 13, 77–117 (1988)
Harris, C., Reny, P.J., Robson, A.: The existence of subgame-perfect equilibrium in continuous games with almost perfect information: a case for public randomization. Econometrica 63, 507–544 (1995)
He, W., Sun, Y.: Stationary Markov perfect equilibria in discounted stochastic games. J. Econ. Theory 169, 35–61 (2017)
Hernández-Lerma, O., Lasserre, J.B.: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer, New York (1996)
Himmelberg, C.J., Parthasarathy, T., Raghavan, T.E.S., Van Vleck, F.S.: Existence of \(p\)-equilibrium and optimal stationary strategies in stochastic games. Proc. Am. Math. Soc. 60, 245–251 (1976)
Jaśkiewicz, A., Nowak, A.S.: Non-zero-sum stochastic games. In: Handbook of Dynamic Games, vol. I (Theory), (T. Başar and G. Zaccour, Eds.) pp. 281–344. Springer, Cham (2018)
Jaśkiewicz, A., Nowak, A.S.: Constrained Markov decision processes with expected total reward criteria. SIAM J. Control Optim. 57, 3118–3136 (2019)
Jaśkiewicz, A., Nowak, A.S.: Constrained discounted stochastic games. Appl. Math. Optim. 85(2), 6 (2022). https://doi.org/10.1007/s00245-022-09865-0
Levy, Y.J., McLennan, A.: Corrigendum to: discounted stochastic games with no stationary Nash equilibrium: two examples. Econometrica 83, 1237–1252 (2015)
Mertens, J.F.: Correlated and communication equilibria. In: Mertens, F., Sorin, S. (eds.) Game Theoretic Methods in General Equilibrium Analysis, pp. 243–248. Kluwer Academic, Dordrecht (1994)
Myerson, R.B.: Multistage games with communication. Econometrica 54, 323–358 (1986)
Moulin, H., Vial, J.P.: Strategically zero-sum games: the class of games whose completely mixed equilibria cannot be improved upon. Int. J. Game Theory 7, 201–221 (1978)
Neveu, J.: Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco (1965)
Nowak, A.S.: Existence of equilibrium stationary strategies in discounted noncooperative stochastic games with uncountable state space. J. Optim. Theory Appl. 45, 591–602 (1985)
Nowak, A.S.: Existence of correlated weak equilibria in discounted stochastic games with general state space. In: Stochastic Games and Related Topics (T.E.S. Raghavan, et al., Eds.), pp. 135–143. Kluwer Academic, Dordrecht (1991)
Nowak, A.S., Raghavan, T.E.S.: Existence of stationary correlated equilibria with symmetric information for discounted stochastic games. Math. Oper. Res. 17, 519–526 (1992)
Piunovskiy, A.B.: Optimal Control of Random Sequences in Problems with Constraints. Kluwer Academic Publishers (1997)
Solan, E.: Characterization of correlated equilibria in stochastic games. Int. J. Game Theory 30, 259–277 (2001)
Solan, E., Vieille, N.: Correlated equilibrium in stochastic games. Games Econ. Behav. 38, 362–399 (2002)
Wessels, J.: Markov programming by successive approximations with respect to weighted supremum norms. J. Math. Anal. Appl. 58, 326–335 (1977)
Whitt, W.: Representation and approximation of noncooperative sequential games. SIAM J. Control Optim. 18, 33–48 (1980)
Zhang, W., Huang, Y., Guo, X.: Nonzero-sum constrained discrete-time Markov games: the case of unbounded costs. TOP 22, 1074–1102 (2014)
Acknowledgements
We thank two reviewers for very helpful reports.
Funding
We acknowledge the financial support from the National Science Centre, Poland: Grant 2016/23/B/ST/00425.
Ethics declarations
Conflict of interest
The authors have not disclosed any competing interests.
Appendix
Appendix
In this section, we prove a lemma which plays an important role in the proofs of our theorems.
Let player \(i \in {{{\mathcal {N}}}}\) be fixed. We also fix \(\gamma >0\), the partition \({{{\mathcal {P}}}}^\gamma =\{X^\gamma _n: n\in {\mathbb {N}}_0 \}\) of the state space X, the cost functions \(c^{\ell ,\gamma }_i\) and the transition function \(p^\gamma \) in the game \({{{\mathcal {G}}}}^\gamma .\) We fix \(\pmb {\varphi _{-i}}\in \Phi _{-i}^\gamma =\prod _{j\in \mathcal{N}\setminus \{i\}}\Phi _j^\gamma .\)
A piecewise constant Markov strategy for player i is a sequence \(\pi _i= (f^t)_{t\in {\mathbb {N}}},\) where \(f^t\in \Phi _i^\gamma \) for all \(t\in {\mathbb {N}}.\)
Lemma 7.1
For fixed \(\pmb {\varphi }\in \Phi ^\gamma \) and each \(\phi _i\in \Phi _i\) there exists a piecewise constant Markov strategy \(\pi _i= (f^t)_{t\in {\mathbb {N}}}\) for player i such that
For a proof we need some auxiliary results. Let \(d\in {\mathbb {N}}.\)
Lemma 7.2
Assume that \(Y\in {{{\mathcal {F}}}}\) and \(\rho _0\) is a probability measure on X such that \(\rho _0(Y)=1.\) Let \(v=(v_0,...,v_{d-1})\), where every \(v_j:X\rightarrow {\mathbb {R}}\) is a bounded measurable function. Then, there exist points \(y_0,...,y_{d} \in Y\) and non-negative numbers \(\beta _0,...,\beta _{d}\) such that \(\sum _{j=0}^{d} \beta _j =1\) and
Proof
Consider the distribution (image measure) of \(v\) defined by \(\zeta _v(B) :=\rho _0(v^{-1}(B)),\) where \(B\) is any Borel set in \({\mathbb {R}}^d.\) Using Theorem 16.13 on page 229 in [9] and Lemma 3 on page 74 in [18], we obtain
Applying Carathéodory’s theorem, we find points \(y_0,...,y_{d}\in Y\) and numbers \(\beta _0,...,\beta _{d} \ge 0\) such that \(\sum _{j=0}^{d} \beta _j =1\) and (7.1) holds. \(\square \)
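For intuition, here is the Carathéodory step of Lemma 7.2 in the simplest case \(d=1\) on a hypothetical four-point space: the barycenter of a single bounded function is matched exactly by a convex combination of \(d+1=2\) of its values.

```python
# Toy check of Lemma 7.2 for d = 1: the barycenter of v0 under rho0 is matched
# by a convex combination of d + 1 = 2 values of v0 (Caratheodory's theorem in R^1).
Y = ['y1', 'y2', 'y3', 'y4']
rho0 = {'y1': 0.1, 'y2': 0.4, 'y3': 0.3, 'y4': 0.2}   # probability measure on Y
v0 = {'y1': -1.0, 'y2': 0.5, 'y3': 2.0, 'y4': 3.0}    # bounded measurable function

m = sum(rho0[y] * v0[y] for y in Y)        # barycenter: integral of v0 d(rho0)

# pick the two neighbouring values of v0 that bracket m
lo = max((y for y in Y if v0[y] <= m), key=lambda y: v0[y])
hi = min((y for y in Y if v0[y] >= m), key=lambda y: v0[y])
beta_hi = 0.0 if v0[hi] == v0[lo] else (m - v0[lo]) / (v0[hi] - v0[lo])
beta_lo = 1.0 - beta_hi

assert 0.0 <= beta_hi <= 1.0
assert abs(beta_lo * v0[lo] + beta_hi * v0[hi] - m) < 1e-12
```

For general \(d\), the same argument runs componentwise on the vector \(v=(v_0,...,v_{d-1})\) and needs \(d+1\) points, which is exactly what the lemma asserts.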
We use \({{{\mathcal {C}}}}(A_i)\) to denote the space of all real-valued continuous functions on \(A_i\) and \(\Pr (A_i)\) to denote the space of all probability measures on \(A_i.\)
Lemma 7.3
Let \(\rho \) be a probability measure on X. For each \(\ell \in \mathcal{L}_0\) assume that \(u^\ell :X\times A_i \rightarrow {\mathbb {R}}\) is a bounded function such that \(u^\ell (x,a_i) = u_n^\ell (a_i)\) for all \(x\in X^\gamma _n,\) \(a_i\in A_i,\) where \(u^\ell _n \in {{{\mathcal {C}}}}(A_i)\), \(n\in {\mathbb {N}}_0.\) Then, for any \(\phi _i\in \Phi _i\) there exists \(f\in \Phi ^\gamma _i\) such that
Proof
Assume first that \(\rho (X_n^\gamma )>0\) and define \(\rho _0(B) =\frac{\rho (B\cap X_n^\gamma )}{\rho (X_n^\gamma ) },\) \(B \in \mathcal{F}.\) Applying Lemma 7.2 with \(d=L+1\) and \(v=(u^0,...,u^{L}),\) we infer that there exist points \(y_0(n),...,y_{L+1}(n)\) in \(X_n^\gamma \) and \(\beta _0(n),...,\beta _{L+1}(n) \ge 0\) such that \(\sum _{j=0}^{L+1}\beta _j(n)=1\) and
For each \(x\in X_n^\gamma \), define \(f(da_i|x) :=\nu _n(da_i),\) where \(\nu _n \in \Pr (A_i)\) is given as
If \(\rho (X_n^\gamma )=0,\) then \(f(da_i|x)\) is defined for all \(x \in X_n^\gamma \) by \(f(da_i|x)=\nu _n(da_i),\) where \(\nu _n\) is any fixed measure in \(\Pr (A_i).\) Note that we have
for all \(\ell \in {{{\mathcal {L}}}}_0,\ n\in {\mathbb {N}}_0.\) Hence,
for all \(\ell \in {{{\mathcal {L}}}}_0,\) which implies (7.2). \(\square \)
Since \(i\in {{{\mathcal {N}}}},\) \(\gamma >0\), \(\pmb {\varphi _{-i}}\in \Phi _{-i}^\gamma \) and \(\phi _i \in \Phi _i\) are fixed, the notation for the proof of Lemma 7.1 can be simplified.
Let \(\varphi _{-i}(d\pmb {a_{-i}}|x)\) be the product measure on \(A_{-i}\) induced by \(\varphi _j(da_j|x)\) with \(j\not = i.\) For \(\ell \in {{{\mathcal {L}}}}_0,\) \(x\in X\) and \(a_i\in A_i,\) we put
Next, we put
and, for any bounded measurable function \(w:X\rightarrow {\mathbb {R}},\)
Similarly, we define \(c^\ell _{g}(x)\) and \(Q_{g}w(x)\) for any \(g\in \Phi _i^\gamma .\) Next, if \(g^1,g^2,...,g^T \in \Phi _i^\gamma ,\) then
and
Note that \(\eta Q_{g^1}Q_{g^2}\cdots Q_{g^T}\) is the probability distribution of the state \(x_{T+1}\) of the process, when player i uses a Markov strategy \((g^t)_{t\in {\mathbb {N}}}.\)
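A minimal numerical sketch of this composition of kernels on a hypothetical 2-state example: \(\eta Q_{g^1}Q_{g^2}\cdots Q_{g^T}\) is simply the pushforward of the initial distribution through \(T\) one-step kernels, each induced by the stationary rule used at that stage.

```python
# Hypothetical 2-state kernels; Q1, Q2 stand in for the kernels induced by the
# stationary rules g^1, g^2 (numbers are illustrative only).
eta = [0.7, 0.3]
Q1 = [[0.9, 0.1], [0.4, 0.6]]
Q2 = [[0.5, 0.5], [0.2, 0.8]]

def push(mu, Q):
    """Distribution of the next state: (mu Q)(y) = sum_x mu(x) Q(x, y)."""
    return [sum(mu[x] * Q[x][y] for x in range(len(mu))) for y in range(len(mu))]

dist_x3 = push(push(eta, Q1), Q2)          # law of the state x_{T+1} with T = 2
assert abs(sum(dist_x3) - 1.0) < 1e-12     # still a probability distribution
assert abs(dist_x3[0] - 0.425) < 1e-12
```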
We now introduce new notation for expected costs. Recalling that \(\phi _i\in \Phi _i,\) we put
If \(\pi _i=(g^t)_{t\in {\mathbb {N}}}\) is a piecewise constant strategy for player i, then \(I^{\ell ,\eta }_T(\pi _i)= I^{\ell ,\eta }_T(g^1,...,g^T) \) denotes the expected discounted cost in the T-step game \({{{\mathcal {G}}}}^\gamma \) under the assumption that the other players use \(\pmb {\varphi _{-i}}.\) Then, the cost over the infinite time horizon is
Proof of Lemma 7.1
We show by induction that for given \(\phi _i\in \Phi _i\) there exists \(\pi _i=(f^t)_{t\in {\mathbb {N}}}\) with \(f^t\in \Phi _i^\gamma \) for all \(t\in {\mathbb {N}}\) such that for all \(T\in {\mathbb {N}},\) we have
We shall use the following equation
Assume that \(T=1.\) Then,
Applying Lemma 7.3 with \(\rho =\eta \) and
we obtain \(f^1\in \Phi _i^\gamma \) such that
Then, we get
We have obtained (7.3) for \(T=1.\) Assume now that (7.3) holds for \(T=m\) with some \(m\ge 1.\) Then we have for some \(f^1,...,f^m \in \Phi ^\gamma _i\) that
for all \(\ell \in {{{\mathcal {L}}}}_0.\) Applying Lemma 7.3 with \(u^\ell (x,a_i)\) given by (7.4) and \(\rho = \eta Q_{f^1}\cdots Q_{f^m},\) we obtain \(f^{m+1} \in \Phi _i^\gamma \) such that
Thus for all \(\ell \in {{{\mathcal {L}}}}_0\) we get
This finishes the induction step. Taking the limit in (7.3) as \(T\rightarrow \infty \), we obtain
for all \( \ell \in {{{\mathcal {L}}}}_0.\) Going back to our original notation, we deduce that this is the assertion of Lemma 7.1. \(\square \)
Cite this article
Jaśkiewicz, A., Nowak, A.S. On Approximate and Weak Correlated Equilibria in Constrained Discounted Stochastic Games. Appl Math Optim 87, 23 (2023). https://doi.org/10.1007/s00245-022-09930-8