On Approximate and Weak Correlated Equilibria in Constrained Discounted Stochastic Games

Jaśkiewicz, Anna; Nowak, Andrzej S.

doi:10.1007/s00245-022-09930-8

On Approximate and Weak Correlated Equilibria in Constrained Discounted Stochastic Games

Open access
Published: 13 January 2023

Volume 87, article number 23, (2023)
Cite this article

Download PDF

You have full access to this open access article

Applied Mathematics & Optimization Aims and scope Submit manuscript

On Approximate and Weak Correlated Equilibria in Constrained Discounted Stochastic Games

Download PDF

1485 Accesses
3 Citations
Explore all metrics

Abstract

In this paper, we consider constrained discounted stochastic games with a countably generated state space and norm continuous transition probability having a density function. We prove existence of approximate stationary equilibria and stationary weak correlated equilibria. Our results imply the existence of stationary Nash equilibrium in ARAT stochastic games.

On pure stationary almost Markov Nash equilibria in nonzero-sum ARAT stochastic games

Article Open access 04 January 2015

Stationary Almost Markov ε-Equilibria for Discounted Stochastic Games with Borel Spaces and Unbounded Payoffs

Article 11 June 2024

Stationary Equilibria in Discounted Stochastic Games

Article 05 March 2023

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Constrained Markov decision processes and stochastic games have numerous applications in operations research, economics, computer sciences, consult with [2, 3, 28, 37] and references cited therein. They arise in situations, in which a controller or player has many objectives. For example, when she or he wants to minimise one type of cost while keeping other costs lower than some given bounds. Constrained stochastic games with finite state and action spaces were first studied by Altman and Shwartz [3]. Their work was extended to some classes of games with countable state spaces in [4, 42] by finite state approximations. A more direct approach based on properties of measures induced by strategies and occupation measures was presented in [28].

In this paper, we study discounted constrained stochastic games with a general state space and the transition probability having a density function. Such two-person games with additive rewards and additive transition structure (ARAT games) were recently studied by Dufour and Prieto-Rumeau [13]. They established the existence of stationary Nash equilibria generalising the result of Himmelberg et al. [25] proved for unconstrained games. Moreover, their theorem also holds for N-person ARAT games satisfying the standard Slater condition. As shown in a highly non-trivial example by Levy and McLennan [29], the games under consideration in this paper may have no stationary Nash equilibrium in the unconstrained case. It can be seen that this example applies to the constrained case as well. Thus, results on approximate equilibria as in [34, 41] became more valuable. They are stated for the unconstrained case, and in this paper we extend the main result from [34] to a class of constrained games. In this way, we establish the existence of approximate stationary equilbria for discounted stochastic games with constraints and general state spaces. It should be noted that the existence of stationary equilibria in discounted unconstrained games was proved only in some special cases, for instance, for ARAT games [25] or games with transitions having no conditional atoms [23]. For a survey of results on stationary and non-stationary Nash equilibria the reader is referred to [26].

The other group of papers comprise the ones on stationary equilibria with public signals, see [11, 22, 36]. Such solutions can be viewed as special communication or correlated equilibria widely discussed in dynamic frameworks (repeated, stochastic or extensive form games) in [20, 21, 31, 38, 39]. They were inspired by the seminal papers of Aumann [5, 6]. A weaker version of correlated equilibrium was proposed by Moulin and Vial [32]. According to their approach a correlated strategy in a finite (bimatrix) game is a probability distribution $\nu $ on the set of pure strategy pairs. Every player has to decide whether to accept $\nu $ or to use his or her individual strategy. If player i uses an individual strategy and player $j\not =i$ obeys $\nu $, then a pure action for player j is selected by the marginal distribution of $\nu $ on his/her pure actions. Then $\nu $ is an equilibrium, if no unilateral deviations from it are profitable. This solution is called a weak correlated equilibrium or a correlated equilibrium with no exchange of information [32]. In contrast to Aumann’s approach, the players who accepted $\nu $ cannot change actions after using the lottery $\nu $. The solution proposed by Moulin and Vial [32] has an interesting property. Namely, the authors constructed a bimatrix game, in which the equilibrium payoffs in their equilibrium concept strictly dominate in the Pareto sense the payoffs in Aumann’s equilibrium, see [30, 32].

In [35] the concept of Moulin and Vial is used to an unconstrained discounted stochastic game with a general state space. However, as shown by Solan and Vieille, [39], the notion of a weak correlated equilibrium can be also regarded as a special case of a general correlation scheme.

In this paper, we extend the result from [35] to a large class of discounted stochastic games with so-called integral constraints. We apply our recent result from [28] for games with discrete state spaces and use an approximation technique. A stationary weak correlated equilibrium is obtained as a limit (in the weak* sense) of approximate equilibria. Our result generalises the main theorem of Dufour and Prieto-Rumeau [13] given for ARAT games, if the action sets for players do not depend on the state. We wish to emphasise that the considerations of other classes of correlated equilibria in constrained stochastic games (like equilibria with public signals) seem to be very challenging for many reasons. Firstly, the integral constraints are difficult to apply. Secondly, the usual methods from dynamic programming (Bellman’s principle) or backward and forward induction used in unconstrained cases are not applicable. Perhaps further possible results can be obtained for other correlated equilibria but under different type of constraints.

The paper is organised as follows. The model and main results on equilibria are contained in Sect. 2. Section 3 presents the approximation technique and the proofs of two main theorems. Section 4 is devoted to the proof on the existence of a weak correlated equilibrium and a discussion on our assumptions. In Sect. 5, we show that the example given in [29] can be used to show that discounted constrained stochastic games studied in this paper may not have stationary Nash equilibria. Section 6 discusses a useful transformation that shows how to easily extend our results formulated for bounded cost functions to unbounded ones. In Appendix (Sect. 7) we give a crucial lemma on a replacement of a general strategy by a piecewise constant strategy. It is used in the proofs of our main theorems on equilibria in constrained stochastic games.

2 The Game Model and Main Results

In this section, we describe constrained discounted stochastic games with general state space and our basic assumptions. We provide our main results in three cases. Firstly, we give a theorem on the existence of a stationary approximate equilibrium assuming that the players play the game independently. Secondly, we drop the constraints and give a theorem on the existence of a stationary $\varepsilon $-equilibrium for every initial state, extending the main result in [34]. Finally, we show that the constrained stochastic games under consideration possess stationary weak correlated equilibria introduced in the static (bimatrix) case by Moulin and Vial [32].

2.1 Approximate Nash Equilibria in Constrained Discounted Stochastic Games

The non-zero-sum constrained stochastic game (CSG) is described by the following objects:

${{{\mathcal {N}}}}=\{1,2,...,N\}$ is the set of players.
X is a state space endowed with a countably generated $\sigma $-algebra ${{\mathcal {F}}}.$
$A_i$ is a compact metric action space for player $i\in {{{\mathcal {N}}}} $ endowed with the Borel $\sigma $-algebra. We put
$$\begin{aligned}{} & {} A:=\prod _{j\in {{{\mathcal {N}}}}} A_j\quad \text{ and }\quad A_{-i}:=\prod _{j\in {{{\mathcal {N}}}}\setminus \{ i\}} A_j, \\{} & {} {\mathbb {K}}_i:= \{(x,a_i): x\in X,\ a_i\in A_i \}, \quad {\mathbb {K}}:= \{(x,\pmb {a}): x\in X,\ \pmb { a}=(a_1,...,a_n)\in A \}. \end{aligned}$$
The real-valued functions $c_i^\ell :{\mathbb {K}}\rightarrow {\mathbb {R}},$ where $i\in {{{\mathcal {N}}}},$ $\ell \in {{{\mathcal {L}}}}_0={{{\mathcal {L}}}}\cup \{0\} $ with ${{{\mathcal {L}}}}=\{1,...,L\},$ are product measurable. Here, $c_i^0$ is the cost-per-stage function for player $i\in {{{\mathcal {N}}}},$ and for each $\ell \in \mathcal{L},$ $c_i^\ell $ is a function used in the definition of the $\ell $-th constraint for this player. It is assumed that there exists $b>0$ such that
$$\begin{aligned} |c^\ell _i(x,\pmb {a})|\le b,\quad \text{ for } \text{ all }\quad i\in {{{\mathcal {N}}}},\ \ell \in {{{\mathcal {L}}}}_0,\ (x,\pmb {a})\in {\mathbb {K}}. \end{aligned}$$
$p(dy|x,\pmb {a})$ is the transition probability from x to $y\in X,$ when the players choose a profile $\pmb { a}=(a_1,a_2,...,a_N)$ of actions in A.
$\eta $ is the initial state distribution.
$\alpha \in (0,1)$ is the discount factor.
$\kappa _i^\ell $ are constraint constants, $i\in {{{\mathcal {N}}}},$ $\ell \in {{{\mathcal {L}}}}.$

Let ${\mathbb {N}}=\{1,2,...\} .$ Define $H^1=X$ and $H^{t+1}= {\mathbb {K}}\times H^{t}$ for $t\in {\mathbb {N}}.$ An element $h^t=(x^1,\pmb { a}^1,\ldots ,x^t)$ of $H^t$ represents a history of the game up to the t-th period, where $\pmb { a}^k=(a^k_1,\ldots ,a^k_N)$ is the profile of actions chosen by the players in the state $x^k$ on the k-th stage of the game, $h^1=x^1.$

Strategies for the players are defined in the usual way. A strategy for player $i\in {{{\mathcal {N}}}}$ is a sequence $\pi _i=(\pi _{i}^t)_{t\in {\mathbb {N}}},$ where each $\pi _{i}^t$ is a transition probability from $H^t$ to $A_i.$ By $\Pi _i$ we denote the set of all strategies for player i. Let $\Phi _i$ be the set of transition probabilities from X to $A_i.$ A stationary strategy for player i is a constant sequence $ (\varphi _{i}^t)_{t\in {\mathbb {N}}},$ where $\varphi _i^t=\varphi _i$ for all $t\in {\mathbb {N}}$ and some $\varphi _i\in \Phi _i.$ Furthermore, we shall identify a stationary strategy for player i with the constant element $\varphi _i$ of the sequence. Thus, the set of all stationary strategies of player i is also denoted by $\Phi _i.$ We define

$$\begin{aligned} \Pi = \prod _{i=1}^N \Pi _i \quad \text{ and }\quad \Phi = \prod _{i=1}^N \Phi _i. \end{aligned}$$

Hence, $\Pi $ ($\Phi $) is the set of all (stationary) multi-strategies of the players.

Let $H^\infty = {\mathbb {K}}\times {\mathbb {K}}\times \cdots $ be the space of all infinite histories of the game endowed with the product $\sigma $-algebra. For any multi-strategy $\pmb { \pi }\in \Pi $, a unique probability measure ${\mathbb {P}}_\eta ^{\pmb {\pi }}$ and a stochastic process $(x^t,\pmb {a}^t)_{t\in {\mathbb {N}}}$ are defined on $H^\infty $ in a canonical way, see the Ionescu-Tulcea theorem, e.g., Proposition V.1.1 in [33]. The measure ${\mathbb {P}}_\eta ^{\pmb {\pi }}$ is induced by $\pmb {\pi },$ the transition probability p and the initial distribution $\eta .$ The expectation operator with respect to ${\mathbb {P}}_\eta ^{\pmb {\pi }}$ is denoted by ${\mathbb {E}}_\eta ^{\pmb {\pi }}.$

Let $\pmb {\pi }\in \Pi $ be any multi-strategy. For each $i\in \mathcal{N}$ and $\ell \in {{{\mathcal {L}}}}_0$, the discounted cost functionals are defined as

$$\begin{aligned} J_i^\ell (\pmb {\pi }) = (1-\alpha ){\mathbb {E}}^{\pmb {\pi }}_\eta \left[ \sum _{t=1}^\infty \alpha ^{t-1} c_i^\ell (x^t,\pmb {a}^t) \right] . \end{aligned}$$

We assume that $J^0_i(\pmb {\pi })$ is the expected discounted cost of player $i\in {{{\mathcal {N}}}}$, who wishes to minimise it over $\pi _i \in \Pi _i$ in such a way that the following constraints are satisfied

$$\begin{aligned} J^\ell _i(\pmb {\pi }) \le \kappa ^\ell _i \quad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}. \end{aligned}$$

A multi-strategy $\pmb {\pi }$ is feasible, if the above inequality holds for each $i\in {{{\mathcal {N}}}},$ $\ell \in {{{\mathcal {L}}}}.$ We denote by $\Delta $ the set of all feasible multi-strategies in the CSG.

As usual, for any $\pmb {\pi }\in \Pi $, we denote by $\pmb {\pi _{-i}}$ the multi-strategy of all players but player i, that is, $ \pmb {\pi _{-1}} =(\pi _2,...,\pi _N),$ $ \pmb {\pi _{-N}} =(\pi _1,...,\pi _{N-1}),$ and for $i\in {{{\mathcal {N}}}}\setminus \{1,N\},$

$$\begin{aligned} \pmb {\pi _{-i}} =(\pi _1,\ldots ,\pi _{i-1},\pi _{i+1},\ldots ,\pi _N). \end{aligned}$$

We identify $[\pmb {\pi _{-i}},\pi _i]$ with $\pmb {\pi }.$ For each $\pmb {\pi }\in \Pi $, we define the set of feasible strategies for player i with $\pmb {\pi _{-i}}$ as

$$\begin{aligned} \Delta _i(\pmb {\pi _{-i}})=\{\pi _i\in \Pi _i: \ J^\ell _i(\pmb {\pi }) = J^\ell _i([\pmb {\pi _{-i}},\pi _i]) \le \kappa ^\ell _i \quad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}\}. \end{aligned}$$

Let $\pmb {\pi }=(\pi _1,\pi _2,...,\pi _N) \in \Pi $ and $\sigma _i\in \Pi _i.$ By $[\pmb {\pi _{-i}},\sigma _i]$ we denote the multi-strategy, where player i uses $\sigma _i$ and every player $j\not = i$ uses $\pi _j.$

Definition 2.1

A multi-strategy $\pmb {\pi }^*\in \Pi $ is an approximate equilibrium in the CSG (for given $\varepsilon >0$), if for every $i\in {{{\mathcal {N}}}}$ and $\ell \in {{{\mathcal {L}}}},$

$$\begin{aligned} J^\ell _i(\pmb {\pi }^*) \le \kappa ^\ell _i +\varepsilon , \end{aligned}$$

(2.1)

and for every $i\in {{{\mathcal {N}}}},$

$$\begin{aligned} J^0_i(\pmb {\pi }^*)-\varepsilon \le \inf _{\sigma _i \in \Delta _i(\pmb {\pi _{-i}^*})} J^0_i([\pmb {\pi _{-i}^*},\sigma _i]). \end{aligned}$$

(2.2)

A multi-strategy $\pmb {\pi }^*\in \Pi $ is an $\varepsilon $-equilibrium in the CSG (for given $\varepsilon \ge 0$), if (2.2) holds and $J^\ell _i(\pmb {\pi }^*) \le \kappa ^\ell _i $ for every $i\in {{{\mathcal {N}}}}$ and $\ell \in {{{\mathcal {L}}}}.$ A 0-equilibrium is called a Nash equilibrium in the CSG.

Note that, every $\varepsilon $-equilibrium is approximate, but not vice versa. For small $\varepsilon >0,$ condition (2.1) allows for a slight violation of the feasibility of $\pmb {\pi }^*$. Further comments on this condition the reader will find in Remark 2.4.

We now formulate our basic assumptions.

Assumption A1

The functions $c^\ell _i(x,\cdot )$ are continuous on A for all $x\in X,$ $i\in {{{\mathcal {N}}}}$ and $\ell \in {{{\mathcal {L}}}}_0.$

Assumption A2

The transition probability p is of the form

$$\begin{aligned} p(B|x,\pmb {a})=\int _B \delta (x,y,\pmb {a})\mu (dy), \quad B\in {{{\mathcal {F}}}}, \end{aligned}$$

where $\mu $ is a probability measure on ${{\mathcal {F}}}$ and $\delta $ is a product measurable non-negative (density) function such that, if $\pmb {a}^n \rightarrow \pmb {a}$ as $n\rightarrow \infty ,$ then

$$\begin{aligned} \int _X|\delta (x,y,\pmb {a}^n) - \delta (x,y,\pmb {a})|\mu (dy) \rightarrow 0. \end{aligned}$$

This assumption means the norm continuity of p with respect to action profiles.

Assumption A3

For each stationary multi-strategy $\pmb {\varphi }\in \Phi $ and for each player $i\in {{{\mathcal {N}}}},$ there exists $\pi _i\in \Pi _i$ such that

$$\begin{aligned} J^\ell _i([\pmb {\varphi _{-i}},\pi _i]) \le \kappa ^\ell _i \quad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}. \end{aligned}$$

Assumption A3 is standard in the theory of constrained decision processes and stochastic games [2, 3, 13, 28].

Remark 2.2

From Assumption A3, Lemma 2.3 in [13] and Lemma 24 in [37], it follows that the strategy $\pi _i\in \Pi _i$ can be replaced a stationary strategy $\sigma _i\in \Phi _i$ such that

$$\begin{aligned} J^\ell _i([\pmb {\varphi _{-i}},\pi _i])= J^\ell _i([\pmb {\varphi _{-i}},\sigma _i]) \quad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}. \end{aligned}$$

The proof of Lemma 24 in [37] on the equivalence of these strategies is formulated for models with Borel state spaces. However, it is also valid in our framework (see pp. 307–309 in [37]) with the exception that we need an appropriate disintegration result. In this matter, consult with Lemma 2.3 in [13] or Theorem 3.2 in [19].

We are ready to state our first main result.

Theorem 2.3

Assume A1, A2 and A3. Then, for each $\varepsilon >0,$ the CSG possesses a stationary approximate equilibrium.

Remark 2.4

The proof of this result is given in Sect. 3. We prove that a stationary approximate equilibrium for given $\varepsilon >0$ consists of strategies that are piecewise constant functions of the state variable. We observe that, under assumptions of Theorem 2.3, condition (2.1) with $\varepsilon =0$ need not be satisfied by piecewise constant stationary multi-strategies. Therefore, the existence of an $\varepsilon $-equilibrium in the CSG is an open issue. We would like to emphasise that Theorem 2.3 is crucial in our proof of Theorem 2.13 on weak correlated equilibria, where we apply an asymptotic approach when $\varepsilon \rightarrow 0.$

Remark 2.5

The only result in the literature on the existence of stationary Nash equilibria in CSGs with general state space was given by Dufour and Prieto-Rumeau [13]. It concerns so-called discounted additive rewards and additive transition (ARAT) stochastic games. In the two-person case the ARAT assumption means that $c_i^\ell (x,a_1,a_2)= c_{1i}^\ell (x,a_1) + c_{2i}^\ell (x, a_2)$ and $p(\cdot |x,a_1,a_2)= p_1(\cdot |x,a_1 )+ p_2(\cdot |x, a_2), $ where $p_1$ and $p_2$ are transition subprobabilities. The results in [13] are given for two-person games satisfying the standard Slater condition (Assumption A3 with strict inequalities). However, they can be easily extended by the same methods to N-person ARAT stochastic games. A simple adaptation of the counterexample by Levy and McLennan [29] given for unconstrained discounted stochastic games implies that stationary Nash equilibria may not exist in the constrained stochastic games studied in this paper. For more details see Sect. 5.

Remark 2.6

We wish to emphasise that the Slater condition is not needed for the establishing an approximate equilibrium in CSGs.

2.2 An Update on Stationary Equilibria in Unconstrained Discounted Stochastic Games

In this subsection, we drop the constraints. By the Ionescu–Tulcea theorem [33], any multi-strategy $\pmb {\pi }\in \Pi $ and any initial state $x\in X,$ induce a unique probability measure ${\mathbb {P}}_x^{\pmb {\pi }}$ on $H^\infty .$ The expectation operator with respect to ${\mathbb {P}}_x^{\pmb {\pi }}$ is denoted by ${\mathbb {E}}_x^{\pmb {\pi }}.$

The discounted cost for player $i\in {{{\mathcal {N}}}}$ is defined as

$$\begin{aligned} J_i^0(\pmb {\pi })(x) = (1-\alpha ){\mathbb {E}}^{\pmb {\pi }}_x\left[ \sum _{t=1}^\infty \alpha ^{t-1} c_i^0(x^t,\pmb {a}^t) \right] . \end{aligned}$$

Definition 2.7

Let $\varepsilon \ge 0$ be fixed. A multi-strategy $\pmb {\pi }^*\in \Pi $ is an $\varepsilon $-equilibrium in the unconstrained discounted stochastic game, if

$$\begin{aligned} J^0_i(\pmb {\pi }^*)-\varepsilon \le \inf _{\sigma _i \in \Pi _i} J^0_i([\pmb {\pi }^*,\sigma _i]) \end{aligned}$$

for every player $i\in {{{\mathcal {N}}}}$ and for all initial states $x\in X.$ A 0-equilibrium is called a Nash equilibrium.

Theorem 2.8

Under assumptions A1and A2, for any $\varepsilon >0,$ the unconstrained discounted stochastic game has a stationary $\varepsilon $-equilibrium.

The proof is given in Sect. 3.

Remark 2.9

Stationary Nash equilibria exist only in some special cases of stochastic games satisfying Assumptions A1 and A2, see [25] (ARAT games), [23] (other classes of games) and [26] (a survey). As shown by Levy and McLennan [29] stationary Nash equilibria need not exist in general under assumptions of Theorem 2.8.

Remark 2.10

Theorem 2.8 is an extension of Theorem 3.1 in [34], where additionally it is assumed that

$$\begin{aligned} \int _X \sup _{\pmb {a}\in A} \delta (x,y,\pmb {a})\mu (dy) <\infty \quad \text{ for } \text{ each }\quad x\in X. \end{aligned}$$

(2.3)

2.3 Weak Correlated Equilibria in Constrained Discounted Stochastic Games

Let $\Psi $ be the set of all transition probabilities from X to A, that is, $\psi \in \Psi $ if $\psi (\cdot |x)\in \Pr (A)$ for every $x\in X $ and $\psi (D|\cdot )$ is ${{\mathcal {F}}}$-measurable for any Borel set $D\subset A.$ A stationary correlated strategy for the players in the CSG is a constant sequence $(\psi ,\psi ,\ldots ),$ where $\psi \in \Psi .$ As in the case of stationary strategies, we shall identify a correlated strategy with the element $\psi $ of this sequence.

By the Ionescu-Tulcea theorem [33], any correlated strategy $\psi \in \Psi $ and the initial distribution $\eta ,$ induce a unique probability measure ${\mathbb {P}}_\eta ^{\psi }$ on $H^\infty .$ The expectation operator with respect to ${\mathbb {P}}_\eta ^{\psi }$ is denoted by ${\mathbb {E}}_\eta ^{\psi }.$ Then the discounted cost functionals for player $i\in {{{\mathcal {N}}}}$ are defined as

$$\begin{aligned} J_i^\ell (\psi ) = (1-\alpha ){\mathbb {E}}^{\psi }_\eta \left[ \sum _{t=1}^\infty \alpha ^{t-1} c_i^\ell (x^t,\pmb {a}^t) \right] \end{aligned}$$

for all $\ell \in {{{\mathcal {L}}}}_0.$ Obviously, here at stage t the vector of actions $\pmb {a}^t$ is chosen according to a probability measure $\psi (\cdot |x^t).$

Furthermore, let $\psi _{i}$ and i denote the projections for any x 2 X of (jx) on $A_{\psi i}$ and $A_{i}$, respectively. For any player $i\in {{{\mathcal {N}}}}$ and a strategy $\pi _i\in \Pi _i$ we denote by $[\psi _{-i},\pi _i]$ a multi-strategy, where player i uses a strategy $\pi _i$ and the other players act as one player applying $\psi _{-i}.$ In this case, $J^0_i([\psi _{-i},\pi _i])$ denotes the expected discounted cost for player i. Set

$$\begin{aligned} \Delta _i(\psi _{-i})=\{\pi _i\in \Pi _i:\ J_i^\ell ([\psi _{-i},\pi _i])\le \kappa _i^\ell \ \text{ for } \text{ all } \ \ell \in {{{\mathcal {L}}}}\}. \end{aligned}$$

Definition 2.11

A strategy $\psi ^*\in \Psi $ is called a weak correlated equilibrium in the CSG, if for every $i\in {{{\mathcal {N}}}} $ and $\ell \in {{{\mathcal {L}}}},$ $J^\ell _i (\psi ^*) \le \kappa ^\ell _i$ and for every $i\in {{{\mathcal {N}}}}, $

$$\begin{aligned} J^0_i(\psi ^*) \le \inf _{\pi _i \in \Delta _i(\psi _{-i}^*)} J^0_i([\psi _{-i}^*,\pi _i]). \end{aligned}$$

(2.4)

If all players but $i\in {{{\mathcal {N}}}}$ accept to use $\psi ^*$ to select an action profile in any state x and player $i\in {{{\mathcal {N}}}}$ decides to play independently of all of them by choosing a feasible strategy $\pi _i$, then the action profile for all players in $\mathcal{N}\setminus \{i\}$ is selected with respect to the marginal probability distribution $\psi _{-i}^*(\cdot |x)$ on $A_{-i}.$ When $\psi ^*$ is a weak correlated equilibrium, then inequality (2.4) says that unilateral deviations from $\psi ^*$ are not profitable. This is an adaptation of the equilibrium concept, formulated by Moulin and Vial [32] for static games, to our dynamic game model.

In order to state our third main result, we define $\Phi _{-i}:= \prod _{j\in {{{\mathcal {N}}}}\setminus \{i\}}\Phi _j $ and impose the following condition.

Assumption A4

For each player $i\in {{{\mathcal {N}}}},$

$$\begin{aligned} \sup _{\pmb {\varphi _{-i}} \in \Phi _{-i}} \min _ {\sigma _i\in \Phi _i} \max _{\ell \in {{{\mathcal {L}}}}}\ (J^\ell _i([\pmb {\varphi _{-i}},\sigma _i]) -\kappa ^\ell _i) <0. \end{aligned}$$

This assumption implies the standard Slater condition (see Assumption A5 below) widely used in the literature [2, 3, 13, 28].

Assumption A5

For each player $i\in {{{\mathcal {N}}}}$ and any $\pmb {\varphi _{-i}}\in \Phi _{-i}$, there exists $\sigma _i\in \Phi _i$ such that

$$\begin{aligned} J^\ell _i([\pmb {\varphi _{-i}},\sigma _i]) <\kappa ^\ell _i \quad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}. \end{aligned}$$

Assumptions A4 and A5 may seemingly be more general. Namely, we can formulate them for $\pi _i\in \Pi _i$ instead of $\sigma _i\in \Phi _i$ and replace the set $\Phi _i$ by $\Pi _i.$ However, Remark 2.2 implies that these formulations are in fact equivalent.

Remark 2.12

From Assumption A4, it follows that there exists $\zeta >0$ such that for every player $i\in {{{\mathcal {N}}}},$

$$\begin{aligned} \sup _{\pmb {\varphi _{-i}} \in \Phi _{-i}} \min _ {\sigma _i\in \Phi _i} \max _{\ell \in {{{\mathcal {L}}}}}\ (J^\ell _i([\pmb {\varphi _{-i}},\sigma _i]) -\kappa ^\ell _i) <- \zeta , \end{aligned}$$

and consequently that for each player $i\in {{{\mathcal {N}}}}$ and any $\pmb {\varphi _{-i}}\in \Phi _{-i}$, there exists $\sigma _i\in \Phi _i$ such that

$$\begin{aligned} J^\ell _i([\pmb {\varphi _{-i}},\sigma _i]) <\kappa ^\ell _i -\zeta \quad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}. \end{aligned}$$

Theorem 2.13

Assume A1, A2 and A4. Then, the CSG possesses a stationary weak correlated equilibrium.

The proof is given in Sect. 4.

Remark 2.14

The existence of a weak correlated equilibrium in an unconstrained case was proved by Nowak [35] under additional integrability condition (2.3).

Remark 2.15

If $\psi ^*$ is a stationary weak correlated equilibrium in an ARAT game, then $(\psi _{1},\psi _{2},...,\psi _N)$ is a stationary Nash equilibrium in this game. Thus, Theorem 2.13 implies the main result of Dufour and Prieto-Rumeau [13], if the action sets are independent of the state. However, their proof is more direct in the sense that it is not based on an approximation by games with discrete state spaces. Instead, they directly apply a fixed point theorem. An extension to the case of action spaces depending on the state variable raises some additional technical issues.

3 Approximating Games with Countable State Spaces and Proofs of Theorems 2.3 and 2.8

In this section, we define a class of games that resemble stochastic games with a countable state space. Using them we can approximate the original game and apply the results on existence of stationary equilibria in discounted games with countably many states proved by Federgruen [15] (unconstrained case) and Jaśkiewicz and Nowak [28] (constrained case).

Let ${{{\mathcal {C}}}}(A)$ be the Banach space of all real-valued continuous functions on A endowed with the maximum norm $\Vert \cdot \Vert .$ Let ${{{\mathcal {C}}}}_b = \{w_1,w_2,...\}$ denote the countable dense subset in the ball $\{w\in {{{\mathcal {C}}}}(A): \Vert w\Vert \le b\}$ in ${{{\mathcal {C}}}}(A),$ where $b\ge |c^\ell _i(x,\pmb {a})|$ for all $i\in {{{\mathcal {N}}}}$, $\ell \in \mathcal{L}_0,$ $(x,\pmb {a})\in {\mathbb {K}}.$

We write ${{{\mathcal {L}}}}^1$ to denote the Banach space $\mathcal{L}^1(X,{{{\mathcal {F}}}},\mu )$ of all absolutely integrable real-valued measurable functions on X with the norm

$$\begin{aligned} \Vert v \Vert _1= \int _X|v (y)|\mu (dy), \quad v \in {{{\mathcal {L}}}}^1. \end{aligned}$$

Let ${{{\mathcal {C}}}}(A,{{{\mathcal {L}}}}^1)$ be the space of all ${{{\mathcal {L}}}}^1$-valued continuous functions on A with the norm

$$\begin{aligned} \Vert \lambda \Vert _c = \max _{\pmb {a}\in A}\int _X|\lambda (y,\pmb {a})|\mu (dy). \end{aligned}$$

Here an element of ${{{\mathcal {C}}}}(A,{{{\mathcal {L}}}}^1)$ is written as a product measurable function $\lambda : X\times A \rightarrow {\mathbb {R}}$ such that $\lambda (\cdot ,\pmb {a}) \in {{{\mathcal {L}}}}^1$ for each $\pmb {a}\in A$ and

$$\begin{aligned} \Vert \lambda (\cdot ,\pmb {a}^n) - \lambda (\cdot ,\pmb {a})\Vert _1 = \int _X|\lambda (y,\pmb {a}^n) - \lambda (y,\pmb {a})|\mu (dy) \rightarrow 0 \quad \text{ as }\quad \pmb {a}^n \rightarrow \pmb {a}, \ n\rightarrow \infty . \end{aligned}$$

By Lemma 3.99 in [1], the space ${{{\mathcal {C}}}}(A,{{{\mathcal {L}}}}^1)$ is separable. Assumption A2 implies that ${{{\mathcal {D}}}}:= \{ \delta (x,\cdot ,\cdot ): x \in X \} \subset {{{\mathcal {C}}}}(A,{{{\mathcal {L}}}}^1)$ is also a separable space when endowed with the relative topology. Therefore, there exists a subset $\{x_k: k \in {\mathbb {N}}\}$ of the state space X such that the set $\{\delta (x_k,\cdot ,\cdot ): k \in {\mathbb {N}} \}$ is dense in ${{{\mathcal {D}}}}.$

For any player $i\in {{{\mathcal {N}}}},$ and positive integers $m_{i\ell },$ $\ell \in {{{\mathcal {L}}}}_0,$ we put ${\overline{m}}_i =(m_{i0},m_{i1},...,m_{iL}).$ Then, given any $\gamma >0 $, we define $B^\gamma (i,{\overline{m}}_i)$ as the set of all states $x\in X$ such that

$$\begin{aligned} \sum _{\ell =0}^L \Vert c_i^\ell (x,\cdot )- w_{m_{i\ell }}\Vert < \gamma . \end{aligned}$$

(3.1)

For any $k \in {\mathbb {N}},$ let

$$\begin{aligned} B_k^\gamma:= & {} \{x\in X: \Vert \delta (x,\cdot ,\cdot )-\delta (x_k,\cdot ,\cdot ) \Vert _c \nonumber \\= & {} \max _{\pmb {a}\in A}\int _X|\delta (x,y,\pmb {a})-\delta (x_k,y,\pmb {a})|\mu (dy) <\gamma \}. \end{aligned}$$

(3.2)

It is obvious that the sets $B_k^\gamma $ and $B^\gamma (i,{\overline{m}}_i)$ belong to ${{\mathcal {F}}}$ and the union of all sets

$$\begin{aligned} B_k^\gamma \cap B^\gamma (1,{\overline{m}}_1) \cap \ldots \cap B^\gamma (N,{\overline{m}}_N) \end{aligned}$$

is the whole state space X. Indeed, if $x\in X,$ then there exists $k\in {\mathbb {N}}$ such that $x\in B_k^\gamma $ and, for any player $i\in {{{\mathcal {N}}}},$ there exist functions $w_{m_{i\ell }}\in {{{\mathcal {C}}}}_b, $ and thus ${\overline{m}}_i $ such that (3.1) holds.

Let $\xi $ be a fixed one-to-one correspondence between the sets ${\mathbb {N}}$ and ${\mathbb {N}}\times {\mathbb {N}}^{N(L+1)}.$ Assuming that $j\in {\mathbb {N}}$ and $\xi (j)= (k,{\overline{m}}_1,...,{\overline{m}}_N),$ we put

$$\begin{aligned} Y_j^\gamma := B_k^\gamma \cap B^\gamma (1,{\overline{m}}_1) \cap \ldots \cap B^\gamma (N,{\overline{m}}_N). \end{aligned}$$

We can assume without loss of generality that $Y^\gamma _1 \not = \emptyset .$ Next, we set $X^\gamma _1= Y_1^\gamma $ and

$$\begin{aligned} X_\tau ^\gamma = Y_\tau ^\gamma - \bigcup _{t<\tau } X^\gamma _t, \quad \text{ for }\quad \tau \in {\mathbb {N}}\setminus \{1\}. \end{aligned}$$

Omitting empty sets $X_\tau ^\gamma $ we obtain a subset ${\mathbb {N}}_0 \subset {\mathbb {N}}$ such that

$$\begin{aligned} {{{\mathcal {P}}}}^\gamma = \{ X_j^\gamma : j\in {\mathbb {N}}_0 \} \end{aligned}$$

is a measurable partition of the state space X. Choose any $n \in {\mathbb {N}}_0.$ Then, $\xi (n)$ is a unique sequence in ${{\mathbb {N}}}\times {\mathbb {N}}^{N(L+1)}$ that depends on n and, therefore, we can write $\xi (n)= (k^n,{\overline{m}}_1^n,...,{\overline{m}}_N^n)$ where ${\overline{m}}_i^n = (m_{i0}^n, m_{i1}^n,...,m_{iL}^n),$ $i\in {{{\mathcal {N}}}}.$ Next, for each $x\in X_n^\gamma ,$ we define

$$\begin{aligned}{} & {} \delta ^\gamma (x, y,\pmb {a}) := \delta (x_{k^n},y,\pmb {a}) \quad \text{ for } y\in X\quad \text{ and }\quad c^{\ell ,\gamma }_i (x,\pmb {a}):= w_{m_{i,\ell }^n}(\pmb {a}) \nonumber \\{} & {} \quad \text{ for } \text{ all } \ell \in {{{\mathcal {L}}}}_0,\ i\in {{{\mathcal {N}}}}. \end{aligned}$$

(3.3)

From (3.1), (3.2) and (3.3), it follows that for each $n\in {\mathbb {N}}_0$ and $x\in X_n^\gamma ,$ we have

$$\begin{aligned} \Vert c_i^\ell (x,\cdot )- c^{\ell ,\gamma }_i (x, \cdot )\Vert <\gamma \quad \text{ for } \text{ all } \ell \in {{{\mathcal {L}}}}_0 \end{aligned}$$

(3.4)

and

$$\begin{aligned} \Vert \delta (x,\cdot ,\cdot )- \delta ^\gamma (x,\cdot ,\cdot )\Vert _c= \max _{\pmb {a}\in A} \int _X|\delta (x, y,\pmb {a})- \delta ^\gamma (x, y,\pmb {a})|\mu (dy) <\gamma . \end{aligned}$$

(3.5)

The original game defined in Sect. 2 is now denoted by ${{\mathcal {G}}}$. We use ${{{\mathcal {G}}}}^\gamma $ to denote the game, where the cost functions are $ c^{\ell ,\gamma }_i,$ $\ell \in {{{\mathcal {L}}}}_0$ and $i\in {{{\mathcal {N}}}}$, and the transition probability is

$$\begin{aligned} p^\gamma (B|x,\pmb {a}) = \int _X \delta ^\gamma (x, y,\pmb {a})\mu (dy), \quad B \in {{{\mathcal {F}}}}. \end{aligned}$$

Note that $ c^{\ell ,\gamma }_i(x,\pmb {a})$ and $ p^\gamma (B|x,\pmb {a})$ are constant functions of x on every set $X_n^\gamma .$

The discounted expected costs in the game ${{{\mathcal {G}}}}^\gamma $ under a multi-strategy $\pmb {\pi }\in \Pi $ are denoted by

$$\begin{aligned} J_i^{\ell ,\gamma }(\pmb {\pi })(x)\quad \text{ and }\quad J_i^{\ell ,\gamma }(\pmb {\pi }) = \int _X J_i^{\ell ,\gamma }(\pmb {\pi })(x)\eta (dx). \end{aligned}$$

Let

$$\begin{aligned} \epsilon (\gamma ) := \frac{\gamma (1-\alpha +b\alpha )}{1-\alpha }. \end{aligned}$$

(3.6)

From (3.4), (3.5) and Lemma 4.4 in [34], we conclude the following auxiliary result.

Lemma 3.1

For each $i\in {{{\mathcal {N}}}}$ and $\ell \in {{{\mathcal {L}}}}_0,$ we have

$$\begin{aligned} \sup _{x\in X} \sup _{\pmb {\pi } \in \Pi }|J_i^\ell (\pmb {\pi })(x)- J_i^{\ell ,\gamma }(\pmb {\pi })(x)|\le \epsilon (\gamma ). \end{aligned}$$

With ${{{\mathcal {G}}}}^\gamma $ we associate a stochastic game $\mathcal{G}^\gamma _c$ with the countable state space ${\mathbb {N}}_0 \subset {\mathbb {N}}$, the costs given by

$$\begin{aligned} {\widehat{c}}^{\ell ,\gamma }_i( n,\pmb {a}):=c^{\ell ,\gamma }_i( x,\pmb {a}), \quad x\in X^\gamma _n,\quad n \in {\mathbb {N}}_0, \quad \pmb {a}\in A, \end{aligned}$$

(3.7)

and transitions defined as

$$\begin{aligned} {\widehat{p}}^\gamma (\tau |n,\pmb {a}):=\delta ^\gamma (X_\tau ^\gamma |x,\pmb {a}), \quad x\in X^\gamma _n,\quad n, \tau \in {\mathbb {N}}_0, \quad \pmb {a}\in A. \end{aligned}$$

(3.8)

Note that the right-hand sides in (3.7) and (3.8) are independent of x in $X_n^\gamma $ and thus the costs and transitions above are well-defined. A stationary strategy for player $i\in {{{\mathcal {N}}}}$ in the game ${{{\mathcal {G}}}}^\gamma _c$ is a transition probability $f_i$ from ${\mathbb {N}}_0$ to $A_i.$ The set of all stationary strategies for player $i\in {{{\mathcal {N}}}}$ in this game is denoted by $F_i.$ We put $F:= \prod _{i\in {{{\mathcal {N}}}}}F_i.$

The expected discounted costs in the game ${{{\mathcal {G}}}}^\gamma _c$ under stationary multi-strategy $\pmb {\pi }$ are denoted by

$$\begin{aligned} {\widehat{J}}^{\ell ,\gamma }_i(\pmb {\pi })(n), \ n\in {\mathbb {N}}_0,\quad \text{ and }\quad {\widehat{J}}^{\ell ,\gamma }_i(\pmb {\pi }) = \sum _{n\in {\mathbb {N}}_0}{\widehat{J}}^{\ell ,\gamma }_i(\pmb {\pi })(n)\eta (X_n^\gamma ). \end{aligned}$$

Let $ \Phi ^\gamma _i$ be the set of all piecewise constant stationary strategies of player $i\in {{{\mathcal {N}}}} $ in the game $\mathcal{G}^\gamma .$ A strategy $\varphi _i\in \Phi ^\gamma _i$, if, for each $n\in {\mathbb {N}}_0,$ there exists a probability measure $\nu _n$ on $A_i$ such that $\varphi _i (da_i|x)= \nu _n(da_i)$ for all $x\in X_n^\gamma .$ We put $\Phi ^\gamma = \prod _{i\in {{{\mathcal {N}}}}} \Phi ^\gamma _i.$

Let $\pmb {f}= (f_1,...,f_N) \in F$ and $\pmb {\varphi }= (\varphi _1,...,\varphi _N)\in \Phi ^\gamma $ be such that

$$\begin{aligned} \varphi _i(da_i|x)= f_i(da_i|n)\quad \text{ for } \text{ all }\quad i\in \mathcal{N},\ n\in {\mathbb {N}}_0,\ x\in X_n^\gamma . \end{aligned}$$

(3.9)

Then, for each $i\in {{{\mathcal {N}}}}$, $\ell \in {{{\mathcal {L}}}}_0,$ $n\in {\mathbb {N}}$ and $x\in X_n^\gamma ,$

$$\begin{aligned} J^{\ell ,\gamma }_i(\pmb {\varphi })(x)= {\widehat{J}}^{\ell ,\gamma }_i(\pmb {f})(n) \end{aligned}$$

(3.10)

and

$$\begin{aligned} J^{\ell ,\gamma }_i(\pmb {\varphi }) = {\widehat{J}}^{\ell ,\gamma }_i(\pmb {f}). \end{aligned}$$

(3.11)

Equations (3.10) and (3.11) show that ${{{\mathcal {G}}}}^\gamma $ with the strategy sets $\Phi ^\gamma _i$ can be recognised as a game with a countable state space. This observation plays an important role in the proof, because we can apply a result for games on countable state spaces.

Proof of Theorem 2.3

Let $\varepsilon >0$ and $i \in {{{\mathcal {N}}}}$. Choose $\gamma >0$ in (3.6) such that $\epsilon (\gamma ) < \varepsilon /2.$ By Assumption A3 and Remark 2.2 we imply that for any multi-strategy $\pmb {\varphi }\in \Phi ^\gamma $ there exists $\sigma _i\in \Phi _i$ such that

$$\begin{aligned} J^\ell _i([\pmb {\varphi _{-i}},\sigma _i]) \le \kappa ^\ell _i \quad \text{ for } \text{ all }\ \ell \in {{{\mathcal {L}}}}. \end{aligned}$$

(3.12)

By Lemma 7.1 in Appendix, there exists a piecewise constant Markov strategy ${\overline{\pi }}_i$ such that

$$\begin{aligned} J_i^{\ell ,\gamma }([\pmb {\varphi _{-i}},\sigma _i])=J_i^{\ell ,\gamma }([\pmb {\varphi _{-i}},{\overline{\pi }}_i]) \end{aligned}$$

for all $\ell \in {{{\mathcal {L}}}}_0.$ By Lemma 3.1 and (3.12) we conclude that

$$\begin{aligned} J_i^{\ell ,\gamma }([\pmb {\varphi _{-i}},{\overline{\pi }}_i])<\kappa ^\ell _i +\frac{\varepsilon }{2} \quad \text{ for } \text{ all }\ \ell \in {{{\mathcal {L}}}}. \end{aligned}$$

This means that the approximating game ${{{\mathcal {G}}}}^\gamma $ satisfies the Slater condition with the constants $\kappa ^\ell _i +\frac{\varepsilon }{2},$ $\ell \in {{{\mathcal {L}}}}.$ Note that the constraint constants in ${{{\mathcal {G}}}}^\gamma $ are also equal $\kappa ^\ell _i +\frac{\varepsilon }{2},$ $\ell \in {{{\mathcal {L}}}}.$ Therefore, the associated game ${{{\mathcal {G}}}}_c^\gamma $ also satisfies the Slater condition with the same constants $ \kappa ^\ell _i +\frac{\varepsilon }{2},$ $\ell \in {{{\mathcal {L}}}}.$ Making use of Corollary 2 in [28], we infer that the game ${{{\mathcal {G}}}}_c^\gamma $ possesses a stationary Nash equilibrium $\pmb {f}^*= (f_1^*,...,f_N^*).$ Define $\pmb {\varphi }^* = (\varphi _1^*,...,\varphi _N^*)\in \Phi ^\gamma $ as in (3.9) with $ \pmb {\varphi }=\pmb {\varphi }^*$ and $f=f^*.$ Then,

$$\begin{aligned} J_i^{0,\gamma }(\pmb {\varphi }^*)\le J_i^{0,\gamma }([\pmb {\varphi _{-i}^*},{\hat{\pi }}_i]) \end{aligned}$$

for any piecewise constant strategy ${\hat{\pi }}_i$ such that

$$\begin{aligned} J_i^{\ell ,\gamma }([\pmb {\varphi _{-i}^*},{\hat{\pi }}_i])\le \kappa ^\ell _i +\frac{\varepsilon }{2}\quad \text{ for } \text{ all }\ \ell \in {{{\mathcal {L}}}}. \end{aligned}$$

We now show that $\pmb {\varphi }^*$ is an approximate equilibrium in the original game. Note that for every player $i\in {{{\mathcal {N}}}}$

$$\begin{aligned} J_i^{\ell ,\gamma }( \pmb {\varphi ^*} )= J_i^{\ell ,\gamma }([\pmb {\varphi _{-i}^*},\varphi _i^*])\le \kappa ^\ell _i +\frac{\varepsilon }{2}\quad \text{ for } \text{ all }\ \ell \in {{{\mathcal {L}}}}. \end{aligned}$$

Hence, for every player $i\in {{{\mathcal {N}}}}$

$$\begin{aligned} J_i^{\ell }( \pmb {\varphi ^*} ) \le \kappa ^\ell _i + \varepsilon \quad \text{ for } \text{ all }\ \ell \in {{{\mathcal {L}}}}, \end{aligned}$$

i.e., condition (2.1) holds. Consider any feasible strategy $\pi _i\in \Delta _i(\pmb {\varphi _{-i}^*}),$ i.e.,

$$\begin{aligned} J^\ell _i([\pmb {\varphi _{-i}^*},\pi _i]) \le \kappa ^\ell _i, \quad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}. \end{aligned}$$

(3.13)

Applying Remark 2.2, we deduce that there exists a strategy $\sigma _i\in \Phi _i$ such that

$$\begin{aligned} J^\ell _i([\pmb {\varphi _{-i}^*},\pi _i])=J^\ell _i([\pmb {\varphi _{-i}^*},\sigma _i])\quad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}_0. \end{aligned}$$

(3.14)

Then, by Lemma 7.1 in Appendix, there exists a piecewise constant Markov strategy ${\overline{\pi }}_i$ such that

$$\begin{aligned} J_i^{\ell ,\gamma }([\pmb {\varphi _{-i}^*},\sigma _i])=J_i^{\ell ,\gamma }([\pmb {\varphi _{-i}^*},{\overline{\pi }}_i])\quad \hbox { for all}\ \ell \in {{{\mathcal {L}}}}_0. \end{aligned}$$

(3.15)

Moreover, by (3.15), Lemma 3.1, (3.14) and (3.13), for every $\ell \in {{{\mathcal {L}}}},$ we have

$$\begin{aligned} J_i^{\ell ,\gamma }([\pmb {\varphi _{-i}^*},{\overline{\pi }}_i])\le J^\ell _i([\pmb {\varphi _{-i}^*},\sigma _i])+\frac{\varepsilon }{2} = J^\ell _i([\pmb {\varphi _{-i}^*},\pi _i])+\frac{\varepsilon }{2} \le \kappa ^\ell _i +\frac{\varepsilon }{2}. \end{aligned}$$

In other words, ${\overline{\pi }}_i$ is a feasible strategy in $\mathcal{G}^\gamma $. Therefore, by Lemma 3.1, (3.15) and (3.14), we infer

$$\begin{aligned} J_i^{0}(\pmb {\varphi }^*)\le & {} J_i^{0,\gamma }(\pmb {\varphi }^*)+\frac{\varepsilon }{2}\le J_i^{0,\gamma }([\pmb {\varphi _{-i}^*},{\overline{\pi }}_i])+\frac{\varepsilon }{2}= J_i^{0,\gamma }([\pmb {\varphi _{-i}^*},\sigma _i])+\frac{\varepsilon }{2}\\< & {} J_i^{0}([\pmb {\varphi _{-i}^*},\sigma _i]) +\varepsilon = J_i^{0}([\pmb {\varphi _{-i}^*},\pi _i])+\varepsilon . \end{aligned}$$

This fact together with (3.13) implies that (2.2) holds. $\square $

Proof of Theorem 2.8

Let $\varepsilon >0$ be fixed. Choose $\gamma >0$ in (3.6) such that $\epsilon (\gamma ) < \varepsilon /2.$ By Theorem 2.3 in [15], the game ${{{\mathcal {G}}}}^\gamma _c$ has a stationary equilibrium $\pmb {f}^*= (f_1^*,...,f_N^*).$ Define $\pmb {\varphi }^* = (\varphi _1^*,...,\varphi _N^*)\in \Phi ^\gamma $ as in the proof of Theorem 2.3. Then we have

$$\begin{aligned} J_i^{0,\gamma }(\pmb {\varphi }^*)(x) = \inf _{\phi _i\in \Phi _i^\gamma } J_i^{0,\gamma }([\pmb {\varphi _{-i}^*},\phi _i])(x), \quad i\in \mathcal{N},\ x\in X. \end{aligned}$$

(3.16)

As in Lemma 4.1 in [34], we can prove that

$$\begin{aligned} \inf _{\phi _i\in \Phi ^\gamma _i} J_i^{0,\gamma }([\pmb {\varphi _{-i}^*},\phi _i])(x)= \inf _{\phi _i\in \Phi _i} J_i^{0,\gamma }([\pmb {\varphi _{-i}^*},\phi _i])(x), \quad i\in {{{\mathcal {N}}}},\ x\in X. \end{aligned}$$

(3.17)

By (3.16) and (3.17), we get

$$\begin{aligned} J_i^{0,\gamma }(\pmb {\varphi }^*)(x) = \inf _{\phi _i\in \Phi _i} J_i^{0,\gamma }([\pmb {\varphi _{-i}^*},\phi _i])(x), \quad i\in \mathcal{N},\ x\in X. \end{aligned}$$

This equality and Lemma 3.1 imply that

$$\begin{aligned} J_i^{0 }(\pmb {\varphi }^*)(x)-\varepsilon \le \inf _{\phi _i\in \Phi _i} J_i^{0 }([\pmb {\varphi _{-i}^*},\phi _i])(x), \quad i\in \mathcal{N},\ x\in X. \end{aligned}$$

(3.18)

By standard methods in discounted dynamic programming [8, 34], we have

$$\begin{aligned} \inf _{\phi _i\in \Phi _i} J_i^{0}([\pmb {\varphi _{-i}^*},\phi _i])(x)= \inf _{\sigma _i\in \Pi _i} J_i^{0}([\pmb {\varphi _{-i}^*},\sigma _i])(x), \quad i\in {{{\mathcal {N}}}},\ x\in X. \end{aligned}$$

This fact and (3.18) imply that

$$\begin{aligned} J_i^{0 }(\pmb {\varphi }^*)(x)-\varepsilon \le \inf _{\sigma _i\in \Pi _i} J_i^{0}([\pmb {\varphi _{-i}^*},\sigma _i])(x), \quad i\in \mathcal{N},\ x\in X, \end{aligned}$$

which completes the proof. $\square $

Remark 3.2

The proof of Theorem 2.8 is similar to that of Theorem 3.1 in [34], but it has one important change implying that the restrictive condition (2.3) can be dropped.

4 Young Measures and the Proof of Theorem 2.13

Let $\vartheta :=(\eta +\mu )/2.$ A function $c:{\mathbb {K}}\rightarrow {\mathbb {R}}$ is Carathéodory, if it is product measurable on ${\mathbb {K}}$, $c(x,\cdot )$ is continuous on A for each $x\in X$ and

$$\begin{aligned} \int _X\max \limits _{\pmb {a}\in A}|c(x,\pmb {a})|\vartheta (dx)<\infty . \end{aligned}$$

Let $\Psi ^\vartheta $ be the space of all $\vartheta $-equivalence classes of functions in $\Psi .$ The elements of $\Psi ^\vartheta $ are called Young measures. Note that the expected discounted cost functionals are well-defined for all elements of $\Psi ^\vartheta .$ More precisely, if $\psi ^\vartheta \in \Psi ^\vartheta ,$ then $J^\ell _i(\psi )$ is the same for all representatives $\psi $ of $\psi ^\vartheta $ in $\Psi $ and we can understand $J^\ell _i(\psi ^\vartheta )$ as $J^\ell _i(\psi ).$ We shall identify in notation $\psi ^\vartheta $ with its representative $\psi $ and omit the superscript $\vartheta .$

We assume that the space $\Psi ^\vartheta $ is endowed with the weak* topology. Since ${{{\mathcal {F}}}}$ is countably generated, $\Psi ^\vartheta $ is metrisable. Moreover, since the set A is compact, $\Psi ^\vartheta $ is a compact convex subset of a locally convex linear topological space. For a detailed discussion of these issues consult with [7] or Chapter 3 in [19]. Here, we recall that $\psi ^n \rightarrow ^* \psi ^0$ in $\Psi ^\vartheta $ as $n\rightarrow \infty $ if and only if for every Carathéodory function $c:{\mathbb {K}}\rightarrow {\mathbb {R}}$, we have

$$\begin{aligned} \lim \limits _{n\rightarrow \infty }\int _X\int _{A}c(x,\pmb {a})\psi ^n(d\pmb {a}|x)\vartheta (dx)= \int _X\int _{A}c(x,\pmb {a})\psi ^0(d\pmb {a}|x)\vartheta (dx). \end{aligned}$$

We now choose $\varepsilon _n>0$ such that $\varepsilon _n \searrow 0$ as $n\rightarrow \infty $ and define

$$\begin{aligned} \gamma _n:=\frac{\varepsilon _n(1-\alpha )}{(1-\alpha +b\alpha )}. \end{aligned}$$

(4.1)

In other words, $\epsilon (\gamma _n)=\varepsilon _n$ or $\gamma _n=\epsilon ^{-1}(\varepsilon _n).$ From Theorem 2.3, it follows that there exists a profile of stationary piecewise constant strategies

$$\begin{aligned} \pmb {\psi ^n}=(\psi _{1}^n,\ldots ,\psi _{N}^n)\in \Phi ^{\gamma _n}, \end{aligned}$$

which comprises an approximate equilibrium in the CSG for $\varepsilon _n$ and at the same time an equilibrium in the corresponding constrained game ${{{\mathcal {G}}}}^{\gamma _n}$ with $\gamma _n$ as in (4.1) and the constraint constants $\kappa ^\ell _i+\frac{\varepsilon _n}{2}.$

Define the product measure on A, for every $x\in X$ and $n\in {\mathbb {N}}$, as

$$\begin{aligned} \psi ^n(\cdot |x):=\psi _{1}^n(\cdot |x)\otimes \ldots \otimes \psi _{N}^n(\cdot |x). \end{aligned}$$

(4.2)

We use $\psi ^n$ to denote the class in $\Psi ^\vartheta $ whose representative is this transition probability. Without loss of generality, we may assume that $\psi ^n$ converges in the weak* topology to some $\psi ^* \in \Psi ^\vartheta $ as $n\rightarrow \infty .$

We shall need the following results. The first one is a consequence of Lemma 3.1 and the fact that $J_i^{\ell ,\gamma _n}(\pmb {\psi ^n})= J_i^{\ell ,\gamma _n}(\psi ^n)$ and $J_i^{\ell }(\pmb {\psi ^n})= J_i^{\ell }(\psi ^n).$

Lemma 4.1

For each $i\in {{{\mathcal {N}}}}$ and $\ell \in {{{\mathcal {L}}}}_0,$ we have

$$\begin{aligned}{} & {} \sup _{\psi \in \Psi }|J_i^\ell (\psi )- J_i^{\ell ,\gamma _n}(\psi )|\le \varepsilon _n, \\{} & {} \sup _{\psi _{-i} \in \Psi _{-i}}\sup _{\pi _i\in \Pi _i}|J_i^\ell ([\psi _{-i},\pi _i])- J_i^{\ell ,\gamma _n}([\psi _{-i},\pi _i])|\le \varepsilon _n, \end{aligned}$$

where $\gamma _n$ is as in (4.1).

Lemma 4.2

If $n\rightarrow \infty ,$ then for any $\ell \in {{{\mathcal {L}}}}_0$

(a) $J_i^{\ell ,\gamma _n}(\psi ^n) \rightarrow J_i^{\ell }(\psi ^*)$ ,

(b) $J_i^{\ell ,\gamma _n}([\psi _{-i}^n,\phi _i]) \rightarrow J_i^{\ell }([\psi _{-i}^*,\phi _i])$ for any $\phi _i\in \Phi _i.$

Proof

For part (a) we first use the triangle inequality

$$\begin{aligned} |J_i^{\ell ,\gamma _n}(\psi ^n) - J_i^{\ell }(\psi ^*)|\le |J_i^{\ell ,\gamma _n}(\psi ^n) - J_i^{\ell }(\psi ^n)|+ |J_i^{\ell }(\psi ^n) - J_i^{\ell }(\psi ^*)|. \end{aligned}$$

The first term on the right-hand side converges to 0 by Lemma 4.1 and the definition of $\psi ^n,$ whereas the convergence to 0 of the the second term follows from Lemma 4.1 in [27] and the fact that $|J_i^{\ell }(\cdot )|\le b$ for every $i\in {{{\mathcal {N}}}}$ and $\ell \in {{{\mathcal {L}}}}_0.$ Part (b) is proved as point (a) by using the Fubini theorem and noting that the elements in $\Psi ^\vartheta $ induced by $\psi _{-i}^n$ in (4.2) and $\phi _i$ converge in the weak* sense to the element of $\Psi ^\vartheta $ induced by $\psi _{-i}^*$ and $\phi _i.$ $\Box $ $\square $

Let $i\in {{{\mathcal {N}}}}.$ Consider a Markov decision process with player i as a decision maker and the transition probability

$$\begin{aligned} q^{\gamma _n}(dy|x,a_i)= \int _{A_{-i}}p^{ \gamma _n}(dy|x,[\pmb {a_{-i}},a_i]) \psi _{-i}^n(d\pmb {a_{-i}}|x), \quad (x,a_i)\in {\mathbb {K}}_i. \end{aligned}$$

Let $1_D$ be the indicator of the set $D\subset X\times A.$ The associated occupation measure, when player i uses a stationary strategy $\varphi _i\in \Phi _i$ is defined as follows

$$\begin{aligned} \theta ^{\gamma _n}_{\varphi _i}(B\times C)= (1-\alpha )\sum _{t=1}^\infty \alpha ^{t-1} {{{\mathcal {E}}}}_\eta ^{\varphi _i} 1_{B\times C}(x^t,a_i^t) \end{aligned}$$

(4.3)

for any $B\in {{{\mathcal {F}}}}$ and a Borel set C in $A_i.$ We use the symbol ${{{\mathcal {E}}}}^{\varphi _i}_\eta $ to denote the expectation operator corresponding to the unique probability measure induced by $\varphi _i\in \Phi _i$, the initial distribution $\eta $ and the transition probability $q^{\gamma _n}.$ For $\ell \in {{{\mathcal {L}}}}_0,$ $x\in X$ and $a_i\in A_i,$ set

$$\begin{aligned} c_i^{\ell , \gamma _n }(x,a_i):= \int _{A_{-i}}c_i^{\ell ,\gamma _n}(x,[\pmb {a_{-i}},a_i]) \psi _{-i}^n(d\pmb {a_{-i}}|x). \end{aligned}$$

Proof of Theorem 2.13

Observe that Assumption A4 implies A3. We consider the weak* limit $\psi ^* \in \Psi ^\vartheta $ mentioned above and denote its representative in $\Psi $ by the same letter.

We shall show that $\psi ^*$ is a weak correlated equilibrium. By Theorem 2.3, $J^\ell _i(\pmb {\psi ^n}) =J^\ell _i(\psi ^n) \le \kappa ^\ell _i +\varepsilon _n$ for all $i\in {{{\mathcal {N}}}}$ and $\ell \in {{{\mathcal {L}}}}.$ Using Lemma 4.2(a), we conclude that

$$\begin{aligned} J^\ell _i(\psi ^*)=\lim _{n\rightarrow \infty }J^\ell _i(\psi ^n) \le \kappa ^\ell _i,\quad i\in {{{\mathcal {N}}}},\ \ell \in {{{\mathcal {L}}}}, \end{aligned}$$

i.e., $\psi ^*$ is feasible.

Take (if possible) any feasible strategy in the CSG for player $i\in {{{\mathcal {N}}}}$, i.e., $ \pi _i\in \Pi _i$ such that

$$\begin{aligned} J_i^{\ell } ([\psi _{-i}^*,\pi _i])\le \kappa _i^\ell \quad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}. \end{aligned}$$

By Remark 2.2 there exists a strategy $\phi _i\in \Phi _i$ such that

$$\begin{aligned} J^\ell _i([\psi _{-i}^*,\pi _i])=J^\ell _i([\psi _{-i}^*,\phi _i])\quad \text{ for } \text{ all } \ell \in {{{\mathcal {L}}}}_0. \end{aligned}$$

$1^\circ $ Assume first that

$$\begin{aligned} J^\ell _i([\psi _{-i}^*,\pi _i])=J_i^{\ell } ([\psi _{-i}^*,\phi _i])< \kappa _i^\ell \quad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}. \end{aligned}$$

(4.4)

From this inequality and Lemma 4.2(b), we infer that there exists $N_1\in {\mathbb {N}}$ such that

$$\begin{aligned} J_i^{\ell ,\gamma _n} ([\psi _{-i}^n,\phi _i])< \kappa _i^\ell \quad \text{ for } \text{ all } \quad \ell \in {{{\mathcal {L}}}}\quad \text{ and }\quad n\ge N_1. \end{aligned}$$

For every $n\ge N_1$ and Lemma 7.1 in Appendix we conclude the existence of a piecewise constant Markov strategy ${\overline{\pi }}_i$ (that may depend on n) such that

$$\begin{aligned} J_i^{\ell ,\gamma _n} ([\psi _{-i}^n,\phi _i])=J_i^{\ell ,\gamma _n} ([\psi _{-i}^n,{\overline{\pi }}_i]) \quad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}_0. \end{aligned}$$

Hence, it must hold

$$\begin{aligned} J_i^{0,\gamma _n}(\psi ^n)\le J_i^{0,\gamma _n}([\psi ^n_{-i},{\overline{\pi }}_i])= J_i^{0,\gamma _n} ([\psi ^n_{-i},\phi _i]). \end{aligned}$$

In other words, for every $n\ge N_1$ we have

$$\begin{aligned} J_i^{0,\gamma _n}(\psi ^n)\le J_i^{0,\gamma _n} ([\psi _{-i}^n,\phi _i]). \end{aligned}$$

Letting $n\rightarrow \infty $ and making use of Lemma 4.2, we infer

$$\begin{aligned} J_i^{0}(\psi ^*)\le J_i^{0} ([\psi _{-i}^*,\phi _i])=J^0_i([\psi _{-i}^*,\pi _i]) \end{aligned}$$

for any feasible strategy $\pi _i\in \Pi _i $ such that (4.4) holds.

$2^\circ $ Assume now that there is player $i\in {{{\mathcal {N}}}}$ and an index $\ell _0\in {{{\mathcal {L}}}}$ such that

$$\begin{aligned} J^{\ell _0}_i([\psi _{-i}^*,\pi _i])=J_i^{\ell _0} ([\psi _{-i}^*,\phi _i])= \kappa _i^{\ell _0}. \end{aligned}$$

(4.5)

From the proof of Lemma 4.2(b), it follows that there exists a sequence $e_n\ \rightarrow 0$ as $n\rightarrow \infty ,$ $e_n>0,$ such that

$$\begin{aligned} J_i^{\ell }([\psi _{-i}^n,\phi _i])\le J_i^{\ell } ([\psi _{-i}^*,\phi _i])+e_n\le \kappa _i^\ell +e_n \qquad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}. \end{aligned}$$

By Remark 2.12, we can find $\zeta >0$ such that for every $n\in {\mathbb {N}}$ there exists a strategy $\sigma ^n_i\in \Phi _i$ such that

$$\begin{aligned} J_i^{\ell } ([\psi _{-i}^n,\sigma ^n_i])< \kappa _i^\ell -\zeta \quad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}. \end{aligned}$$

Hence, by Lemma 4.1, we conclude

$$\begin{aligned} J_i^{\ell ,\gamma _n} ([\psi _{-i}^n,\phi _i])-\varepsilon _n\le J_i^{\ell } ([\psi _{-i}^n,\phi _i])\le \kappa _i^\ell +e_n \quad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}} \end{aligned}$$

and

$$\begin{aligned} J_i^{\ell ,\gamma _n} ([\psi _{-i}^n,\sigma ^n_i])-\varepsilon _n\le J_i^{\ell } ([\psi _{-i}^n,\sigma ^n_i])< \kappa _i^\ell -\zeta \quad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}. \end{aligned}$$

Let $N_2\in {\mathbb {N}}$ be such that $\varepsilon _{N_2}<\zeta $ for all $n\ge N_2$ set

$$\begin{aligned} \xi _n:=\frac{\varepsilon _n+e_n}{ \zeta +e_n} \end{aligned}$$

and observe that $\xi _n\rightarrow 0$ as $n\rightarrow \infty $ and $\xi _n\in (0,1) $ for all $n > N_2$. Let $\theta _{\phi _i}^{\gamma _n}$ and $ \theta _{\sigma ^n_i}^{\gamma _n}$ be two occupation measures defined as in (4.3). By Proposition 3.9 in [13], we define a sequence of occupation measures as follows

$$\begin{aligned} \theta ^n:=\xi _n\theta _{\sigma ^n_i}^{\gamma _n}+(1-\xi _n)\theta _{\phi _i}^{\gamma _n}. \end{aligned}$$

Then, for all $\ell \in {{{\mathcal {L}}}}_0$ it holds

$$\begin{aligned} \int _{X\times A_i} c^{\ell ,\gamma _n}_i(x,a_i)\theta ^n(dx\times da_i) = \xi _nJ_i^{\ell ,\gamma _n} ([\psi _{-i}^n,\sigma ^n_i]) + (1-\xi _{n}) J_i^{\ell ,\gamma _n} ([\psi _{-i}^n,\phi _i]).\nonumber \\ \end{aligned}$$

(4.6)

Hence, for $n\ge N_2$ and all $\ell \in {{{\mathcal {L}}}}$, from (4.6), we have

$$\begin{aligned} \int _{X\times A_i} c^{\ell ,\gamma _n}_i(x,a_i)\theta ^n(dx\times da_i)\le & {} \xi _n (\kappa _i^\ell +\varepsilon _n -\zeta )+(1-\xi _n)( \kappa _i^\ell +\varepsilon _n+e_n)\nonumber \\= & {} -\xi _n(e_n+\zeta )+\kappa _i^\ell +\varepsilon _n +e_n \le \kappa _i^\ell <\kappa _i^\ell +\frac{\varepsilon _n}{2}.\nonumber \\ \end{aligned}$$

(4.7)

By Lemma 2.3 in [13] or Theorem 3.2 in [19] for every $n\ge N_2,$ there exists a stationary strategy $\chi ^n_i\in \Phi _i$ such that $\theta ^n$ can be written as in (4.3) with $\mathcal{E}_\eta ^{\varphi _i} $ replaced by ${{{\mathcal {E}}}}_\eta ^{\chi ^n_i}.$ In other words $ \theta ^n= \theta ^{\gamma _n}_{\chi ^n_i}.$ Therefore, for all $\ell \in {{{\mathcal {L}}}}_0,$ we obtain

$$\begin{aligned} \int _{X\times A_i} c^{\ell ,\gamma _n}_i(x,a_i)\theta ^n(dx\times da_i)= J_i^{\ell ,\gamma _n} ([\psi _{-i}^n,\chi ^n_i]). \end{aligned}$$

(4.8)

By Lemma 7.1 in Appendix for every $n\in {\mathbb {N}}$ there exists a piecewise constant Markov strategy ${\overline{\pi }}^n_i$ such that

$$\begin{aligned} J_i^{\ell ,\gamma _n} ([\psi _{-i}^n,\chi ^n_i])=J_i^{\ell ,\gamma _n} ([\psi _{-i}^n,{\overline{\pi }}^n_i]) \quad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}_0. \end{aligned}$$

By (4.7) and (4.8)

$$\begin{aligned} J_i^{\ell ,\gamma _n} ([\psi _{-i}^n,\chi ^n_i])\le \kappa _i^\ell <\kappa _i^\ell +\frac{\varepsilon _n}{2} \qquad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}. \end{aligned}$$

Hence, it must hold

$$\begin{aligned} J_i^{0,\gamma _n}(\psi ^n)\le J_i^{0,\gamma _n}([\psi ^n_{-i},{\overline{\pi }}^n_i]) =J_i^{0,\gamma _n} ([\psi ^n_{-i},\chi ^n_i]).\end{aligned}$$

(4.9)

We know that

$$\begin{aligned} J_i^{\ell ,\gamma _n} ([\psi _{-i}^n,\chi ^n_i])= \xi _nJ_i^{\ell ,\gamma _n} ([\psi _{-i}^n,\sigma ^n_i])+ (1-\xi _n) J_i^{\ell ,\gamma _n} ([\psi _{-i}^n, \phi _i]). \end{aligned}$$

Therefore, by Lemma 4.2(b) and (4.8), we get

$$\begin{aligned} \lim _{n\rightarrow \infty } J_i^{\ell ,\gamma _n} ([\psi _{-i}^n,\chi ^n_i])= J_i^{\ell } ([\psi _{-i}^*,\phi _i]) \end{aligned}$$

for all $\ell \in {{{\mathcal {L}}}}_0.$ This fact, (4.9) and Lemma 4.2(a) yield that

$$\begin{aligned} J_i^{0}(\psi ^*)\le J_i^{0} ([\psi _{-i}^*,\phi _i])=J^0_i([\psi _{-i}^*,\pi _i]) \end{aligned}$$

for any feasible strategy $\pi _i\in \Pi _i$ for which (4.5) holds. $\square $

Let $\Psi ^\vartheta _i$ be the space of $\vartheta $-equivalence classes of strategies in $\Phi _i$ endowed with the weak* topology. Clearly, $\Psi ^\vartheta _i$ is a compact metric space. The cost functionals $J^\ell _i(\pmb {\varphi }),$ $\ell \in {{{\mathcal {L}}}}_0$ and $i\in {{{\mathcal {N}}}}$, are well defined for any profile $\pmb {\varphi }= (\varphi _1,...,\varphi _N)\in {\widehat{\Psi }}^\vartheta = \prod _{j\in {{{\mathcal {N}}}}} \Psi ^\vartheta _j.$

Remark 4.3

From Example 3.16 in [14] based on Rademacher’s functions, it follows that the weak* limit of the sequence of approximate equilibria in Theorem 2.13 need not be a stationary Nash equilibrium. The same example can be used to see that the cost functionals $J^\ell _i,$ $\ell \in {{{\mathcal {L}}}}_0$ and $i\in {{{\mathcal {N}}}}$, may be discontinuous on ${\widehat{\Psi }}^\vartheta .$

Consider the two-person game. It follows from Lemma 4.2 that $J^\ell _i(\varphi _1,\varphi _2)$ is separately continuous in $\varphi _1$ and $\varphi _2$. Therefore, the functions

$$\begin{aligned} R_1(\varphi _1):= \min _{\varphi _2\in \Psi _2^\vartheta } \max _{\ell \in {{{\mathcal {L}}}}}\ ( J^\ell _1(\varphi _1,\varphi _2)-\kappa ^\ell _1) \quad \text{ and }\quad R_2(\varphi _2):= \min _{\varphi _1\in \Psi _1^\vartheta } \max _{\ell \in {{{\mathcal {L}}}}}\ (J^\ell _2(\varphi _1,\varphi _2)-\kappa ^\ell _2) \end{aligned}$$

are upper semicontinuous on $\Psi ^\vartheta _1$ and $\Psi ^\vartheta _2$, respectively.

Remark 4.4

Consider a two-person game satisfying the standard Slater condition A5. Then, it follows

$$\begin{aligned} R_1(\varphi _1)<0 \quad \text{ and }\quad R_2(\varphi _2)<0 \end{aligned}$$

for all $\varphi _1\in \Psi _1^\vartheta $ and $\varphi _2\in \Psi _2^\vartheta .$ Since $R_1$ and $R_2$ are upper semicontinuous on the compact spaces $\Psi _1^\vartheta $ and $\Psi _2^\vartheta $, respectively, we conclude that

$$\begin{aligned} \max _{\varphi _1\in \Psi _1^\vartheta } R_1(\varphi _1)<0 \quad \text{ and }\quad \max _{\varphi _2\in \Psi _2^\vartheta } R_2(\varphi _2)<0. \end{aligned}$$

(4.10)

Obviously, $\varphi _1$ and $\varphi _2$ in inequalities (4.10) can be understood as representatives of (denoted by the same letters) classes in $\Psi ^\vartheta _1$ and $\Psi ^\vartheta _2,$ respectively. Then, it is apparent that A5 implies A4 for the considered two-person game.

Since in the N-person ARAT game the cost functionals are continuous on ${\widehat{\Psi }}^\vartheta $ with the product topology [13], A5 implies A4 in this case.

Finally,we note that in the countable state space case, the weak* topology on $\Psi ^\vartheta _i$ is actually the topology of point-wise convergence and all cost functionals $J^\ell _i$ are continuous on the compact space ${\widehat{\Psi }}^\theta $ with the product topology. Therefore, the standard Slater condition A4 made in the literature for these games, see [3, 4, 28, 42], is equivalent to A5.

5 Non-existence of Stationary Equilibria in Discounted Constrained Games

In this section, we consider discounted stochastic games with the given initial state distribution $\eta .$ If $c_i^\ell =0$ and $\kappa _i^\ell =1$ for all $i\in {{{\mathcal {N}}}} $ and $\ell \in {{{\mathcal {L}}}}$, then the game in this class is trivially constrained and Assumption A3 automatically holds. Our aim is to conclude from [29] that such a game may have no stationary Nash equilibrium. For this, we need the following fact.

Proposition 5.1

Let A1 and A2 be satisfied and in addition let $p(\cdot |x,\pmb {a}) \ll \eta $ for all $(x,\pmb {a})\in {\mathbb {K}}.$ If $\pmb {\varphi }= (\varphi _1,\ldots ,\varphi _N)\in \Phi $ is a stationary Nash equilibrium in the discounted stochastic game with the initial state distribution $\eta ,$ i.e.,

$$\begin{aligned} J_i^0(\pmb {\varphi }) \le J^0_i([\pmb {\varphi _{-i}},\pi _i]) \end{aligned}$$

(5.1)

for all $i\in {{{\mathcal {N}}}}$ and $\pi _i\in \Pi _i,$ then there exists a stationary Nash equilibrium $\pmb {\psi }=(\psi _1,...,\psi _N) $ in the unconstrained stochastic game for all initial states, i.e.,

$$\begin{aligned} J_i^0(\pmb {\psi })(x) \le J^0_i([\pmb {\psi _{-i}},\pi _i])(x) \end{aligned}$$

(5.2)

for all $i\in {{{\mathcal {N}}}},$ $\pi _i\in \Pi _i $ and $x\in X.$ Moreover, $\varphi _i(da_i|x)=\psi _i(da_i|x)$ for $\eta $-a.e. $x\in X $ and for all $i\in {{{\mathcal {N}}}}.$

We start with necessary notation. Let $\pmb {\phi }= (\phi _1,...,\phi _N) \in \Phi .$ Then

$$\begin{aligned} \phi (d\pmb {a}|x):=\phi _1(da_1|x)\otimes \phi _2(da_2|x)\otimes \cdots \otimes \phi _N(da_N|x) \end{aligned}$$

is the product measure on A determined by $\phi _i(da_i|x),$ $i=1,2,...,N.$ Recall that by $\phi _{-i}(d\pmb {a_{-i}}|x)$ we denote the projection of $\phi (d\pmb {a}|x)$ on $A_{-i}.$ We put

$$\begin{aligned} c_i^0(x,\pmb {\phi }):= \int _A c_i^0(x,\pmb {a})\phi (d\pmb {a}|x)\quad \text{ and }\quad p(dy|x,\pmb {\phi }):= \int _A p(dy|x,\pmb {a})\phi (d\pmb {a}|x). \end{aligned}$$

If $\sigma _i \in \Phi _i,$ then

$$\begin{aligned}{} & {} c_i^0(x,[\pmb {\phi _{-i}},\sigma _i]):= \int _{A_i}\int _{A_{-i}} c_i^0(x,[\pmb {a_{-i}},a_i]) \phi _{-i}(d\pmb {a_{-i}}|x)\sigma _i(da_i|x), \\{} & {} p(dy|x,[\pmb {\phi _{-i}},\sigma _i]):= \int _{A_i}\int _{A_{-i}}p(dy|x, [\pmb {a_{-i}},a_i])\phi _{-i}(d\pmb {a_{-i}}|x)\sigma _i(da_i|x). \end{aligned}$$

If $\nu _i \in \Pr (A_i),$ then

$$\begin{aligned} c_i^0(x,[\pmb {\phi _{-i}},\nu _i]) := c_i^0(x,[\pmb {\phi _{-i}},\sigma _i]) \quad \text{ and }\quad p(dy|x,[\pmb {\phi _{-i}},\nu _i]) := p(dy|x,[\pmb {\phi _{-i}},\sigma _i]) \end{aligned}$$

with $\sigma _i(da_i|x)= \nu _i(da_i)$ for all $x\in X.$

Let $v_i$, $i=1,2,...,N$, be bounded measurable functions on X. For each $x\in X,$ by $\Gamma _x(v_1,...,v_N)$ we denote the one-step N-person game, where the payoff (cost) function for player $i\in {{{\mathcal {N}}}} $ is

$$\begin{aligned} (1-\alpha )c_i^0(x,\pmb {a}) + \alpha \int _X v_i( y)p(dy|x,\pmb {a}), \quad \text{ where }\quad \pmb {a}=(a_1,...,a_N)\in A. \end{aligned}$$

Proof of Proposition 5.1

From (5.1), it follows that for each set $S\in {{{\mathcal {F}}}},$ we have

$$\begin{aligned} J_i^0(\pmb {\varphi })= & {} \int _{ X} \left( (1-\alpha )c^0_i(x,\pmb {\varphi }) +\alpha \int _XJ_i^0(\pmb {\varphi })(y)p(dy|x,\pmb {\varphi })\right) \eta (dx)\\{} & {} \le \int _S\min _{\nu _i\in \Pr (A_i)}\left( (1-\alpha )c^0_i(x,[\pmb {\varphi _{-i}},\nu _i]) +\alpha \int _XJ_i^0(\pmb {\varphi })(y)p(dy|x,[\pmb {\varphi _{-i}},\nu _i])\right) \eta (dx)\\{} & {} \quad +\int _{X\setminus S} \left( (1-\alpha )c^0_i(x,\pmb {\varphi }) +\alpha \int _XJ_i^0(\pmb {\varphi })(y)p(dy|x,\pmb {\varphi })\right) \eta (dx) \end{aligned}$$

Hence, for each $S\in {{{\mathcal {F}}}},$

$$\begin{aligned}{} & {} \int _{ S} \left( (1-\alpha )c^0_i(x,\pmb {\varphi }) +\alpha \int _XJ_i^0(\pmb {\varphi })(y)p(dy|x,\pmb {\varphi })\right) \eta (dx) \le \\{} & {} \quad \int _S\min _{\nu _i\in \Pr (A_i)}\left( (1-\alpha )c^0_i(x,[\pmb {\varphi _{-i}},\nu _i]) +\alpha \int _XJ_i^0(\pmb {\varphi })(y)p(dy|x,[\pmb {\varphi _{-i}},\nu _i])\right) \eta (dx). \end{aligned}$$

Thus, for every $i\in {{{\mathcal {N}}}},$ there exists $S_i\in {{{\mathcal {F}}}}$ such that $\eta (S_i)=1$ and for all $x\in S_i,$ we have

$$\begin{aligned}{} & {} (1-\alpha )c^0_i(x,\pmb {\varphi }) +\alpha \int _XJ_i^0(\pmb {\varphi })(y)p(dy|x,\pmb {\varphi }) \le \nonumber \\{} & {} \min _{\nu _i\in \Pr (A_i)}\left( (1-\alpha )c^0_i(x,[\pmb {\varphi _{-i}},\nu _i]) +\alpha \int _XJ_i^0(\pmb {\varphi })(y)p(dy|x,[\pmb {\varphi _{-i}},\nu _i])\right) . \end{aligned}$$

(5.3)

Let ${\widehat{S}}:= S_1\cap S_2\cdots \cap S_N.$ Now consider the game $\Gamma _x(v_1,...,v_N),$ where $v_i(y)= J_i^0(\pmb {\varphi })(y),$ $y\in X.$ By Lemma 5 in [36], there exists $\pmb {\phi } \in \Phi $ such that $\pmb {\phi } (d\pmb {a}|x) = (\phi _1(da_1|x),...,\phi _N(da_N|x))$ is a Nash equilibrium in the game $\Gamma _x(v_1,...,v_N)$ for all $x\in X\setminus {\widehat{S}}.$ For every $i\in {{{\mathcal {N}}}},$ define $\psi _i(da_i|x):= \varphi _i(da_i|x),$ if $x\in {\widehat{S}},$ and $\psi _i(da_i|x):= \phi _i(da_i|x),$ if $x\in X\setminus {\widehat{S}}.$ Then, using (5.3), we conclude that $\pmb {\psi }(d\pmb {a}|x) = (\psi _1(da_1|x),...,\psi _N(da_N|x))$ is a Nash equilibrium in the game $\Gamma _x(v_1,...,v_N)$ for all $x\in X.$ Define $v_i^0(y):= v_i(y)= J_i^0(\pmb {\varphi })(y)$ for each $y\in {\widehat{S}}$ and

$$\begin{aligned} v_i^0(y):= (1-\alpha )c^0_i(y,\pmb {\psi }) +\alpha \int _XJ_i^0(\pmb {\varphi })(z)p(dz|y,\pmb {\psi }) \end{aligned}$$

for each $y\in X\setminus {\widehat{S}}.$ Then, $\eta (X\setminus {\widehat{S}})=0$ and our assumption $p(\cdot |x,\pmb {a})\ll \eta (\cdot ),$ $(x,\pmb {a})\in {\mathbb {K}},$ imply that $\Gamma _x(v_1^0,...,v_N^0)= \Gamma _x(v_1,...,v_N)$ for all $x\in X.$ Therefore, for all $x\in X,$ $\psi (d\pmb {a}|x)$ is a Nash equilibrium in the game $\Gamma _x(v_1^0,...,v_N^0)$ and

$$\begin{aligned} v^0_i(x)=(1-\alpha )c^0_i(x,\pmb {\psi }) +\alpha \int _Xv_i^0(y)(y)p(dy|x,\pmb {\psi }). \end{aligned}$$

Using these facts and the Bellman equations for discounted dynamic programming [8, 24], we conclude that (5.2) holds. $\square $

Remark 5.2

Levy and McLennan [29] gave an example of a discounted stochastic game with no constraints having no stationary Nash equilibrium. This is an 8-person stochastic game with finite action sets for the players and $X=[0,1]$ as the state space. The definitions of payoff functions and transition probabilities in their game are rather complicated and are not given here. We only mention that the transition probabilities are absolutely continuous with respect to the probability measure $\eta _1=(\lambda _1 +\delta _1)/2,$ where $\lambda _1$ is the Lebesgue measure on [0, 1] and $\delta _1$ is the Dirac measure concentrated at the point 1. Assume that $\eta _1$ is the initial state distribution in this game. If this game had a stationary Nash equilibrium, then by Proposition 5.1, it would have a stationary Nash equilibrium for all initial states. From [29], it follows that it is impossible.^{Footnote 1}

6 Remarks on Games with Unbounded Costs

Our results can be extended to a class of games with unbounded cost functions $c_i^{\ell }$ under some uniform integrability condition introduced in [16]. The method for doing this relies on truncations of the costs and using an approximation by bounded games. This was done in our paper [28] in the countable state space case. In a special situation, described below and inspired by the work of Wessels [40] on dynamic programming, a reduction to the bounded case can be obtained by the well-known data transformation as described in Remark 2.5 in [12] or Sect. 10 in [17]. Following Wessels [40], we make the following assumptions.

Assumption W

(i) There exist a measurable function $\omega :X\rightarrow [1,\infty )$ and $c_0>0$ such that $|c^\ell _i(x,\pmb {a})|\le c_0\omega (x)$ for all $x\in X,$ $\pmb {a}\in A,$ $i\in {{{\mathcal {N}}}}$ and $\ell \in {{{\mathcal {L}}}}_0.$
(ii) There exists $\beta >1$ such that $\alpha \beta <1$ and
$$\begin{aligned} \int _X\omega (y)p(dy|x,\pmb {a}) \le \beta \omega (x) \end{aligned}$$
for all $x\in X,$ $\pmb {a}\in A.$
(iii) If $\pmb {a}^n \rightarrow \pmb {a}$ as $n\rightarrow \infty ,$ then

$$\begin{aligned} \int _X|\delta (x,y,\pmb {a}^n) - \delta (x,y,\pmb {a})|\omega (y)\mu (dy) \rightarrow 0. \end{aligned}$$

To describe the equivalent model with bounded costs we extend the state space X by adding an isolated absorbing state $0^*.$ All the costs at this absorbing state are zero. Let $c_i^{\ell ,\omega }(x,\pmb {a}):= \frac{c_i^\ell (x,\pmb {a})}{\omega (x)},$ and

$$\begin{aligned}{} & {} p^\omega (B|x,\pmb {a}):= \frac{\int _B \omega (y)p(dy|x,\pmb {a})}{\beta \omega (x)}, \quad B \in {{{\mathcal {F}}}},\ x\in X,\ \pmb {a}\in A, \\{} & {} p^\omega (0^*|x,\pmb {a}): = 1 -\frac{\int _X \omega (y)p(dy|x,\pmb {a})}{\beta \omega (x)},\quad x\in X,\ \pmb {a}\in A. \end{aligned}$$

Now define the new initial state distribution as

$$\begin{aligned} \eta _0(B):= \frac{\int _B\omega (x)\eta (dx)}{\eta \omega },\quad \text{ where }\quad \eta \omega = \int _X\omega (x)\eta (dx). \end{aligned}$$

Here, we assume that $\eta \omega <\infty .$ Then, we obtain primitive data for a bounded constrained stochastic game, in which the discount factor is $\alpha \beta .$ We denote the expected discounted costs in the bounded game under consideration by ${{{\mathcal {J}}}}^\ell _i(\pmb {\pi }).$ It is easy to see that

$$\begin{aligned} {{{\mathcal {J}}}}^\ell _i(\pmb {\pi })= \frac{J^\ell _i(\pmb {\pi })}{\eta \omega },\quad \text{ for } \text{ all }\quad i\in {{{\mathcal {N}}}}, \ell \in {{{\mathcal {L}}}}_0,\ \pmb {\pi }\in \Pi . \end{aligned}$$

Theorems 2.3 and 2.13 can be established for the bounded game described above with minor modifications. For example, one has to define new constraint constants as $\kappa ^\ell _i/\eta \omega ,$ $i\in {{{\mathcal {N}}}}, \ell \in {{{\mathcal {L}}}}.$ Using the above transformation, we can immediately deduce similar results for games with unbounded cost functions satisfying Assumption W.

Notes

We thank John Yehuda Levy for pointing out this fact.

References

Aliprantis, C., Border, K.: Infinite Dimensional Analysis: A Hitchhiker’s Guide. Springer, New York (2006)
MATH Google Scholar
Altman, E.: Constrained Markov Decision Processes. Chapman Hall & CRC, Florida (1999)
MATH Google Scholar
Altman, E., Shwartz, A.: Constrained Markov games: Nash equilibria. Ann. Int. Soc. Dyn. Games 5, 213–221 (2000)
MathSciNet MATH Google Scholar
Alvarez-Mena, J., Hernández-Lerma, O.: Existence of Nash equilibria for constrained stochastic games. Math. Meth. Oper. Res. 63, 261–285 (2006)
Article MathSciNet MATH Google Scholar
Aumann, R.J.: Subjectivity and correlation in randomized strategies. J. Math. Econ. 1, 67–96 (1974)
Article MathSciNet MATH Google Scholar
Aumann, R.J.: Correlated equilibrium as an expression of Bayesian rationality. Econometrica 55, 1–18 (1987)
Article MathSciNet MATH Google Scholar
Balder, E.J.: Lectures on Young measure theory and its applications in economics. Rend. Istit. Mat. Univ. Trieste 31, 1–69 (2000)
MathSciNet MATH Google Scholar
Bertsekas, D.P., Shreve, S.E.: Stochastic Optimal Control: the Discrete-Time Case. Academic Press, New York (1978)
MATH Google Scholar
Billingsley, P.: Probability and Measure. Wiley, New York (2012)
MATH Google Scholar
Debreu, G.: A social equilibrium existence theorem. Proc. Natl. Acad. Sci. USA 38, 931–938 (1954)
Google Scholar
Duffie, D., Geanakoplos, J., Mas-Colell, A., McLennan, A.: Stationary Markov equilibria. Econometrica 62, 745–781 (1994)
Article MathSciNet MATH Google Scholar
Dufour, F., Prieto-Rumeau, T.: Conditions for the solvability of the linear programming formulation for constrained discounted Markov decision processes. Appl. Math. Optim. 74, 27–51 (2016)
Article MathSciNet MATH Google Scholar
Dufour, F., Prieto-Rumeau, T.: Stationary Markov Nash equilibria for nonzero-sum constrained ARAT Markov games. SIAM J. Control Optim. 60, 945–967 (2022)
Article MathSciNet MATH Google Scholar
Elliott, R.J., Kalton, N.J., Markus, L.: Saddle-points for linear differential games. SIAM J. Control Optim. 11, 100–112 (1973)
Article MathSciNet MATH Google Scholar
Federgruen, A.: On $N$-person stochastic games with denumerable state space. Adv. Appl. Prob. 10, 452–471 (1978)
Article MathSciNet MATH Google Scholar
Feinberg, E.A., Jaśkiewicz, A., Nowak, A.S.: Constrained discounted Markov decision processes with Borel state spaces. Automatica 111, 108582 (2020)
Article MathSciNet MATH Google Scholar
Feinberg, E.A., Piunovskiy, A.B.: Sufficiency of deterministic policies for atomless discounted and uniformly absorbing MDPs with multiple criteria. SIAM J. Control Optim. 57, 163–191 (2019)
Article MathSciNet MATH Google Scholar
Ferguson, T.S.: Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York (1967)
MATH Google Scholar
Florescu, L.C., Godet-Thobie, C.: Young Measures and Compactness in Measure Spaces. De Gruyter, Berlin (2012)
Book MATH Google Scholar
Forges, F.: An approach to communication equilibria. Econometrica 54, 1375–1385 (1986)
Article MathSciNet MATH Google Scholar
Forges, F.: Communication equilibria in repeated games with incomplete information. Math. Oper. Res. 13, 77–117 (1988)
Article MathSciNet MATH Google Scholar
Harris, C., Reny, P.J., Robson, A.: The existence of subgame-perfect equilibrium in continuous games with almost perfect information: a case for public randomization. Econometrica 63, 507–544 (1995)
Article MathSciNet MATH Google Scholar
He, W., Sun, Y.: Stationary Markov perfect equilibria in discounted stochastic games. J. Econ. Theory 169, 35–61 (2017)
Article MathSciNet MATH Google Scholar
Hernández-Lerma, O., Lasserre, J.B.: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer, New York (1996)
Book MATH Google Scholar
Himmelberg, C.J., Parthasarathy, T., Raghavan, T.E.S., Van Vleck, F.S.: Existence of $p$-equilibrium and optimal stationary strategies in stochastic games. Proc. Am. Math. Soc. 60, 245–251 (1976)
MathSciNet MATH Google Scholar
Jaśkiewicz, A., Nowak, A.S.: Non-zero-sum stochastic games. In: Handbook of Dynamic Games, vol. I (Theory), (T. Başar and G. Zaccour, Eds.) pp. 281–344. Springer, Cham (2018)
Jaśkiewicz, A., Nowak, A.S.: Constrained Markov decision processes with expected total reward criteria. SIAM J. Control Optim. 57, 3118–3136 (2019)
Article MathSciNet MATH Google Scholar
Jaśkiewicz, A., Nowak, A.S.: Constrained discounted stochastic games. Appl. Math. Optim. 85(2), 6 (2022). https://doi.org/10.1007/s00245-022-09865-0
Article MathSciNet MATH Google Scholar
Levy, Y.J., McLennan, A.: Corrigendum to: discounted stochastic games with no stationary Nash equilibrium: two examples. Econometrica 83, 1237–1252 (2015)
Article MathSciNet MATH Google Scholar
Mertens, J.F.: Correlated and communication equilibria. In: Mertens, F., Sorin, S. (eds.) Game Theoretic Methods in General Equilibrium Analysis, pp. 243–248. Kluwer Academic, Dordrecht (1994)
Chapter Google Scholar
Myerson, R.B.: Multistage games with communication. Econometrica 54, 323–358 (1986)
Article MathSciNet MATH Google Scholar
Moulin, H., Vial, J.P.: Strategically zero-sum games: the class of games whose completely mixed equilibria cannot be improved upon. Int. J. Game Theory 7, 201–221 (1978)
Article MathSciNet MATH Google Scholar
Neveu, J.: Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco (1965)
MATH Google Scholar
Nowak, A.S.: Existence of equilibrium stationary strategies in discounted noncooperative stochastic games with uncountable state space. J. Optim. Theory Appl. 45, 591–602 (1985)
Article MathSciNet MATH Google Scholar
Nowak, A.S.: Existence of correlated weak equilibria in discounted stochastic games with general state space. In: Stochastic Games and Related Topics (T.E.S. Raghavan, et al., Eds.), pp. 135–143. Kluwer Academic, Dordrecht (1991)
Nowak, A.S., Raghavan, T.E.S.: Existence of stationary correlated equilibria with symmetric information for discounted stochastic games. Math. Oper. Res. 17, 519–526 (1992)
Article MathSciNet MATH Google Scholar
Piunovskiy, A.B.: Optimal Control of Random Sequences in Problems with Constraints. Kluwer Academic Publishers (1997)
Solan, E.: Characterization of correlated equilibria in stochastic games. Int. J. Game Theory 30, 259–277 (2001)
Article MathSciNet MATH Google Scholar
Solan, E., Vieille, N.: Correlated equilibrium in stochastic games. Games Econ. Behav. 38, 362–399 (2002)
Article MathSciNet MATH Google Scholar
Wessels, J.: Markov programming by successive approximations with respect to weighted supremum norms. J. Math. Anal. Appl. 58, 326–335 (1977)
Article MathSciNet MATH Google Scholar
Whitt, W.: Representation and approximation of noncooperative sequential games. SIAM J. Control Optim. 18, 33–48 (1980)
Article MathSciNet MATH Google Scholar
Zhang, W., Huang, Y., Guo, X.: Nonzero-sum constrained discrete-time Markov games: the case of unbounded costs. TOP 22, 1074–1102 (2014)
Article MathSciNet MATH Google Scholar

Download references

Acknowledgements

We thank two reviewers for very helpful reports.

Funding

We acknowledge the financial support from the National Science Centre, Poland: Grant 2016/23/B/ST/00425.

Author information

Authors and Affiliations

Faculty of Pure and Applied Mathematics, Wrocław University of Science and Technology, Wrocław, Poland
Anna Jaśkiewicz
Faculty of Mathematics, Computer Science, and Econometrics, University of Zielona Góra, Zielona Góra, Poland
Andrzej S. Nowak

Authors

Anna Jaśkiewicz
View author publications
You can also search for this author in PubMed Google Scholar
Andrzej S. Nowak
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Anna Jaśkiewicz.

Ethics declarations

Conflict of interest

The authors have not disclosed any competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

In this section, we prove a lemma which plays an important role in the proofs of our theorems.

Let player $i \in {{{\mathcal {N}}}}$ be fixed. We also fix $\gamma >0$, the partition ${{{\mathcal {P}}}}^\gamma =\{X^\gamma _n: n\in {\mathbb {N}}_0 \}$ of the state space X, the cost functions $c^{\ell ,\gamma }_i$ and the transition function $p^\gamma $ in the game ${{{\mathcal {G}}}}^\gamma .$ We fix $\pmb {\varphi _{-i}}\in \Phi _{-i}^\gamma =\prod _{j\in \mathcal{N}\setminus \{i\}}\Phi _j^\gamma .$

A piecewise constant Markov strategy for player i is a sequence $\pi _i= (f^t)_{t\in {\mathbb {N}}},$ where $f^t\in \Phi _i^\gamma $ for all $t\in {\mathbb {N}}.$

Lemma 7.1

For fixed $\pmb {\varphi }\in \Phi ^\gamma $ and each $\phi _i\in \Phi _i$ there exists a piecewise constant Markov strategy $\pi _i= (f^t)_{t\in {\mathbb {N}}}$ for player i such that

$$\begin{aligned} J_i^{\ell ,\gamma }([\pmb {\varphi _{-i}},\phi _i])= J_i^{\ell ,\gamma }([\pmb {\varphi _{-i}},\pi _i])\quad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}_0. \end{aligned}$$

For a proof we need some auxiliary results. Let $d\in {\mathbb {N}}.$

Lemma 7.2

Assume that $Y\in {{{\mathcal {F}}}}$ and $\rho _0$ is a probability measure on X such that $\rho _0(Y)=1.$ Let $v=(v_0,...,v_{d-1})$, where every $v_j:X\rightarrow {\mathbb {R}}$ is a bounded measurable function. Then, there exist points $y_0,...,y_{d} \in Y$ and non-negative numbers $\beta _0,...,\beta _{d}$ such that $\sum _{j=0}^{d} \beta _j =1$ and

$$\begin{aligned} \int _Y v(x)\rho _0(dx)= \sum _{j=0}^{d}\beta _j v(y_j). \end{aligned}$$

(7.1)

Proof

Consider the distribution function of v defined by: $\zeta _v(B) :=\rho _0(v^{-1}(B))$, where B is any Borel set in ${\mathbb {R}}^d.$ Using Theorem 16.13 on page 229 in [9] and Lemma 3 on page 74 in [18], we obtain

$$\begin{aligned} \int _Y v(x)\rho _0(dx)= \int _{{\mathbb {R}}^d}z\zeta _v(z)dz \in co\{v(y): y\in X\}. \end{aligned}$$

Applying Carathéodory’s theorem, we find points $y_0,...,y_{d}\in Y$ and numbers $\beta _0,...,\beta _{d} \ge 0$ such that $\sum _{j=0}^{d} \beta _j =1$ and (7.1) holds. $\square $

We use ${{{\mathcal {C}}}}(A_i)$ to denote the space of all real-valued continuous functions on $A_i $ and $\Pr (A_i)$ for the space of all probability measures on $A_i.$

Lemma 7.3

Let $\rho $ be a probability measure on X. For each $\ell \in \mathcal{L}_0$ assume that $u^\ell :X\times A_i \rightarrow {\mathbb {R}}$ is a bounded function such that $u^\ell (x,a_i) = u_n^\ell (a_i)$ for all $x\in X^\gamma _n,$ $a_i\in A_i,$ where $u^\ell _n \in {{{\mathcal {C}}}}(A_i)$, $n\in {\mathbb {N}}_0.$ Then, for any $\phi _i\in \Phi _i$ there exists $f\in \Phi ^\gamma _i$ such that

$$\begin{aligned} \int _X\int _{A_i} u^\ell (x,a_i)\phi _i(da_i|x)\rho (dx)= \int _X\int _{A_i} u^\ell (x,a_i)f(da_i|x)\rho (dx) \quad \text{ for } \text{ all } \ell \in {{{\mathcal {L}}}}_0.\nonumber \\ \end{aligned}$$

(7.2)

Proof

Assume first that $\rho (X_n^\gamma )>0$ and define $\rho _0(B) =\frac{\rho (B\cap X_n^\gamma )}{\rho (X_n^\gamma ) },$ $B \in \mathcal{F}.$ Applying Lemma 7.2 with $d=L+1$ and $v=(u^0,...,u^{L}),$ we infer that there exist points $y_0(n),...,y_{L+1}(n)$ in $X_n^\gamma $ and $\beta _0(n),...,\beta _{L+1}(n) \ge 0$ such that $\sum _{j=0}^{L+1}\beta _j(n)=1$ and

$$\begin{aligned}{} & {} \frac{1}{\rho (X_n^\gamma )}\int _{X_n^\gamma }\int _{A_i}u^\ell (x,a_i)\phi _i(da_i|x)\rho (dx)= \frac{1}{\rho (X_n^\gamma )}\int _{X_n^\gamma }\int _{A_i}u_n^\ell (a_i)\phi _i(da_i|x)\rho (dx)\\{} & {} \quad = \sum _{j=0}^{L+1}\beta _j(n) \int _{A_i}u^\ell _n(a_i)\phi _i(da_i|y_j(n))\ \text{ for } \text{ all } \ell \in {{{\mathcal {L}}}}_0. \end{aligned}$$

For each $x\in X_n^\gamma $, define $f(da_i|x) :=\nu _n(da_i),$ where $\nu _n \in \Pr (A_i)$ is given as

$$\begin{aligned} \nu _n(da_i):= \sum _{j=0}^{L+1}\beta _j(n) \phi _i(da_i|y_j(n)). \end{aligned}$$

If $\rho (X_n^\gamma )=0,$ then $f(da_i|x)$ is defined for all $x \in X_n^\gamma $ by $f(da_i|x)=\nu _n(da_i)$ where $\nu _n$ is any fixed measure in $\Pr (A_i).$ Note that, we have

$$\begin{aligned}{} & {} \int _{X_n^\gamma }\int _{A_i}u^\ell (x,a_i)\phi _i(da_i|x)\rho (dx)= \int _{A_i}u_n^\ell (a_i)\nu _n(da_i )\rho (X_n^\gamma )\\{} & {} \quad = \int _{X_n^\gamma }\int _{A_i}u^\ell (x,a_i)f(da_i|x)\rho (dx), \end{aligned}$$

for all $\ell \in {{{\mathcal {L}}}}_0,\ n\in {\mathbb {N}}_0.$ Hence,

$$\begin{aligned} \sum _{n\in {\mathbb {N}}_0} \int _{X_n^\gamma }\int _{A_i}u^\ell (x,a_i)\phi _i(da_i|x)\rho (dx) = \sum _{n\in {\mathbb {N}}_0} \int _{X_n^\gamma }\int _{A_i}u^\ell (x,a_i)f(da_i|x)\rho (dx), \end{aligned}$$

for all $\ell \in {{{\mathcal {L}}}}_0,$ which implies (7.2). $\square $

Since $i\in {{{\mathcal {N}}}},$ $\gamma >0$, $\pmb {\varphi _{-i}}\in \Phi _{-i}^\gamma $ and $\phi _i \in \Phi _i$ are fixed, the notation for the proof of Lemma 7.1 can be simplified.

Let $\varphi _{-i}(d\pmb {a_{-i}}|x)$ be the product measure on $A_{-i}$ induced by $\varphi _j(da_j|x)$ with $j\not = i.$ For $\ell \in {{{\mathcal {L}}}}_0,$ $x\in X$ and $a_i\in A_i,$ we put

$$\begin{aligned}{} & {} c^\ell (x,a_i):= \int _{A_{-i}}c_i^{\ell ,\gamma }(x,[\pmb {a_{-i}},a_i]) \varphi _{-i}(d\pmb {a_{-i}}|x), \\{} & {} q(dy|x,a_i):= \int _{A_{-i}}p^{ \gamma }(dy|x,[\pmb {a_{-i}},a_i]) \varphi _{-i}(d\pmb {a_{-i}}|x). \end{aligned}$$

Next, we put

$$\begin{aligned} c^\ell _{\phi _i}(x) := \int _{A_i}c^\ell (x,a_i)\phi _i(da_i|x), \end{aligned}$$

and, for any bounded measurable function $w:X\rightarrow {\mathbb {R}},$

$$\begin{aligned} Q_{\phi _i}w(x):= \int _{A_i}w(y)q(dy|x,a_i)\phi _i(da_i|x). \end{aligned}$$

Similarly, we define $c^\ell _{g}(x)$ and $Q_{g}w(x)$ for any $g\in \Phi _i^\gamma .$ Next, if $g^1,g^2,...,g^T \in \Phi _i^\gamma ,$ then

$$\begin{aligned} \eta w =\int _Xw(x)\eta (dx)\quad \text{ and }\quad Q_{g^1}Q_{g^2}\cdots Q_{g^T}w(x) = Q_{g^1}(Q_{g^2}\cdots Q_{g^T}w)(x) \end{aligned}$$

and

$$\begin{aligned} \eta Q_{g^1}Q_{g^2}\cdots Q_{g^T}w := \int _X Q_{g^1}Q_{g^2}\cdots Q_{g^T}w(x)\eta (dx). \end{aligned}$$

Note that $\eta Q_{g^1}Q_{g^2}\cdots Q_{g^T}$ is the probability distribution of the state $x_{T+1}$ of the process, when player i uses a Markov strategy $(g^t)_{t\in {\mathbb {N}}}.$

We now introduce new notation for expected costs. Recalling that $\phi _i\in \Phi _i,$ we put

$$\begin{aligned} I^\ell (\phi _i)(x) := J_i^{\ell ,\gamma }([\pmb {\varphi _{-i}},\phi _i])(x)\quad \text{ and }\quad I^{\ell ,\eta }(\phi _i) := \int _X I^\ell (\phi _i)(x)\eta (dx),\quad \ell \in {{{\mathcal {L}}}}_0. \end{aligned}$$

If $\pi _i=(g^t)_{t\in {\mathbb {N}}}$ is a piecewise constant strategy for player i, then $I^{\ell ,\eta }_T(\pi _i)= I^{\ell ,\eta }_T(g^1,...,g^T) $ denotes the expected discounted cost in the T-step game ${{{\mathcal {G}}}}^\gamma $ under assumption that the other players use $\pmb {\varphi _{-i}}.$ Then, the cost over the infinite time horizon is

$$\begin{aligned} I^{\ell ,\eta }(\pi _i)= \lim _{T\rightarrow \infty } I^{\ell ,\eta }_T(\pi _i). \end{aligned}$$

Proof of Lemma 7.1

We show by induction that for given $\phi _i\in \Phi _i$ there exists $\pi _i=(f^t)_{t\in {\mathbb {N}}}$ with $f^t\in \Phi _i^\gamma $ for all $t\in {\mathbb {N}}$ such that for all $T\in {\mathbb {N}},$ we have

$$\begin{aligned} I^{\ell ,\eta }(\phi _i)= I^{\ell ,\eta }(f^1,...,f^T) + \alpha ^T \eta Q_{f^1}\cdots Q_{f^T}((1-\alpha )c^\ell _{\phi _i} +\alpha Q_{\phi _i}I^{\ell }(\phi _i)). \end{aligned}$$

(7.3)

We shall use the following equation

$$\begin{aligned} I^\ell (\phi _i)(x) = (1-\alpha )c^\ell _{ \phi _i}(x) + \alpha Q_{\phi _i}I^\ell (\phi _i)(x),\quad \text{ for } \text{ each }\quad x\in X. \end{aligned}$$

Assume that $T=1.$ Then,

$$\begin{aligned} I^{\ell ,\eta }(\phi _i)= & {} \eta ((1-\alpha ) c^\ell _{\phi _i} +\alpha Q_{\phi _i}I^{\ell }(\phi _i))\\ {}= & {} \int _X\int _{A_i}\left( (1-\alpha )c^\ell (x,a_i) + \alpha \int _X I^{\ell }(\phi _i)(y)q(dy|x,a_i)\right) \phi _i(da_i|x)\eta (dx). \end{aligned}$$

Applying Lemma 7.3 with $\rho =\eta $ and

$$\begin{aligned} u^\ell (x,a_i)=(1-\alpha ) c^\ell (x,a_i) + \alpha \int _X I^{\ell }(\phi _i)(y)q(dy|x,a_i) \end{aligned}$$

(7.4)

we obtain $f^1\in \Phi _i^\gamma $ such that

$$\begin{aligned} \int _X\int _{A_i}u^\ell (x,a_i)\phi _i(da_i|x)\eta (dx)= \int _X\int _{A_i}u^\ell (x,a_i) f^1(da_i|x)\eta (dx) \quad \text{ for } \text{ all }\quad \ell \in {{{\mathcal {L}}}}_0. \end{aligned}$$

Then, we get

$$\begin{aligned} I^{\ell ,\eta }(\phi _i)= & {} \eta I^\ell (\phi _i) = \eta ((1-\alpha ) c^\ell _{\phi _i} +\alpha Q_{\phi _i}I^{\ell }(\phi _i))\\ {}= & {} \eta ((1-\alpha ) c^\ell _{f^1} +\alpha Q_{f^1}I^{\ell }(\phi _i)) = \eta (1-\alpha ) c^\ell _{f^1} + \alpha \eta Q_{f^1}I^{\ell }(\phi _i) \\ {}= & {} I_1^{\ell ,\eta }(f^1) + \alpha \eta Q_{f^1}((1-\alpha )c^\ell _{\phi _i}+\alpha Q_{\phi _i} I^{\ell }(\phi _i)) \quad \text{ fro } \text{ all }\quad \ell \in {{{\mathcal {L}}}}_0. \end{aligned}$$

We have obtained (7.3) for $T=1.$ Assume now that (7.3) holds for $T=m$ with some $m\ge 1.$ Then we have for some $f^1,...,f^m \in \Phi ^\gamma _i$ that

$$\begin{aligned} I^{\ell ,\eta }(\phi _i)= I^{\ell ,\eta }(f^1,...,f^m) + \alpha ^m \eta Q_{f^1}\cdots Q_{f^m}((1-\alpha )c^\ell _{\phi _i} +\alpha Q_{\phi _i}I^{\ell }(\phi _i)) \end{aligned}$$

for all $\ell \in {{{\mathcal {L}}}}_0.$ Applying Lemma 7.3 with $u^\ell (x,a_i)$ given by (7.4) and $\rho = \eta Q_{f^1}\cdots Q_{f^m},$ we obtain $f^{m+1} \in \Phi _i^\gamma $ such that

$$\begin{aligned}{} & {} \eta Q_{f^1}\cdots Q_{f^m}((1-\alpha )c^\ell _{\phi _i} +\alpha Q_{\phi _i}I^{\ell }(\phi _i)) \\{} & {} \quad = \eta Q_{f^1}\cdots Q_{f^m}((1-\alpha )c^\ell _{f^{m+1}} +\alpha Q_{f^{m+1}}I^{\ell }(\phi _i))\\{} & {} \quad = \eta Q_{f^1}\cdots Q_{f^m} (1-\alpha )c^\ell _{f^{m+1}}+ \alpha \eta Q_{f^1}\cdots Q_{f^m}Q_{f^{m+1}}((1-\alpha ) c^\ell _{\phi _i} + \alpha Q_{\phi _i} I^{\ell }(\phi _i)). \end{aligned}$$

Thus for all $\ell \in {{{\mathcal {L}}}}_0$ we get

$$\begin{aligned} I^{\ell ,\eta }(\phi _i)= & {} I^{\ell ,\eta }(f^1,...,f^m) + \alpha ^m \eta Q_{f^1}\cdots Q_{f^m} (1-\alpha )c^\ell _{f^{m+1}} \\ {}{} & {} + \alpha ^{m+1} \eta Q_{f^1}\cdots Q_{f^m}Q_{f^{m+1}}((1-\alpha ) c^\ell _{\phi _i} + \alpha Q_{\phi _i} I^{\ell }(\phi _i)) \\ {}= & {} I^{\ell ,\eta }(f^1,...,f^{m+1}) + \alpha ^{m+1} \eta Q_{f^1}\cdots Q_{f^m}Q_{f^{m+1}}((1-\alpha ) c^\ell _{\phi _i} + \alpha Q_{\phi _i} I^{\ell }(\phi _i)). \end{aligned}$$

This finishes the induction step. Taking the limit in (7.3) as $T\rightarrow \infty $, we obtain

$$\begin{aligned} I^{\ell ,\eta }(\phi _i) = I^{\ell ,\eta }(\pi _i) \quad \text{ with }\quad \pi _i= (f^1,f^2,...) \end{aligned}$$

for all $ \ell \in {{{\mathcal {L}}}}_0.$ Going back to our original notation, we deduce that this is the assertion of Lemma 7.1. $\square $

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Jaśkiewicz, A., Nowak, A.S. On Approximate and Weak Correlated Equilibria in Constrained Discounted Stochastic Games. Appl Math Optim 87, 23 (2023). https://doi.org/10.1007/s00245-022-09930-8

Download citation

Accepted: 10 October 2022
Published: 13 January 2023
DOI: https://doi.org/10.1007/s00245-022-09930-8

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

On Approximate and Weak Correlated Equilibria in Constrained Discounted Stochastic Games

Abstract

Similar content being viewed by others

On pure stationary almost Markov Nash equilibria in nonzero-sum ARAT stochastic games

Stationary Almost Markov ε-Equilibria for Discounted Stochastic Games with Borel Spaces and Unbounded Payoffs

Stationary Equilibria in Discounted Stochastic Games

1 Introduction

2 The Game Model and Main Results

2.1 Approximate Nash Equilibria in Constrained Discounted Stochastic Games

Definition 2.1

Assumption A1

Assumption A2

Assumption A3

Remark 2.2

Theorem 2.3

Remark 2.4

Remark 2.5

Remark 2.6

2.2 An Update on Stationary Equilibria in Unconstrained Discounted Stochastic Games

Definition 2.7

Theorem 2.8

Remark 2.9

Remark 2.10

2.3 Weak Correlated Equilibria in Constrained Discounted Stochastic Games

Definition 2.11

Assumption A4

Assumption A5

Remark 2.12

Theorem 2.13

Remark 2.14

Remark 2.15

3 Approximating Games with Countable State Spaces and Proofs of Theorems 2.3 and 2.8

Lemma 3.1

Proof of Theorem 2.3

Proof of Theorem 2.8

Remark 3.2

4 Young Measures and the Proof of Theorem 2.13

Lemma 4.1

Lemma 4.2

Proof

Proof of Theorem 2.13

Remark 4.3

Remark 4.4

5 Non-existence of Stationary Equilibria in Discounted Constrained Games

Proposition 5.1

Proof of Proposition 5.1

Remark 5.2

6 Remarks on Games with Unbounded Costs

Notes

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Appendix

Appendix

Lemma 7.1

Lemma 7.2

Proof

Lemma 7.3

Proof

Proof of Lemma 7.1

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation