Abstract
We deal with a two-person zero-sum continuous-time Markov game \(\mathcal {G}\) with denumerable state space, general action spaces, and unbounded payoff and transition rates. We consider noncooperative equilibria for the discounted payoff criterion. We are interested in approximating numerically the value and the optimal strategies of \(\mathcal {G}\). To this end, we propose a definition of a sequence of game models \(\mathcal {G}_n\) converging to \(\mathcal {G}\), which ensures that the value and the optimal strategies of \(\mathcal {G}_n\) converge to those of \(\mathcal {G}\). For numerical purposes, we construct finite state and actions game models \(\mathcal {G}_n\) that can be explicitly solved, and we study the convergence rate of the value of the games. A game model based on a population system illustrates our results.
1 Introduction
This paper deals with a two-person zero-sum continuous-time Markov game with denumerable state space, general action spaces, and possibly unbounded payoff and transition rates. The optimality criterion consists in finding a Nash equilibrium for the total expected discounted payoff of the players. The existence of such a Nash equilibrium, as well as the existence of optimal strategies for the players, has been established in Guo and Hernández-Lerma (2005). In that reference, it is shown that the value of the game is the solution of an optimality equation (also referred to as the Shapley equation).
Now we explain, somewhat loosely, the form of this Shapley equation. Let \(i\in S\) be the state of the system, and denote by \(a\in A\) and \(b\in B\) the actions of the players that take values in some Borel spaces \(A\) and \(B\). There is some operator \(\mathbf {H}\) that maps, for each fixed \(a\in A\) and \(b\in B\), a function \(\{u(i)\}_{i\in S}\) into the function \(\{(\mathbf {H}u)(i,a,b)\}_{i\in S}\) such that the value of the game \(\{V(i)\}_{i\in S}\) is the unique solution of the equations
for all \(i\in S\), where \(\mathcal {P}(A)\) and \(\mathcal {P}(B)\) denote the family of probability measures on \(A\) and \(B\), respectively.
It should be clear that one cannot expect to solve, in general, the Eqs. (1.1)–(1.2) explicitly. For computational purposes, therefore, one should use some kind of discretization technique to, at least, approximate the value of the game and the optimal strategies of the players. This is precisely the purpose of this paper.
Let \(\mathcal {G}\) be the “original” game model, with denumerable state space and Borel action spaces, and let \(\{\mathcal {G}_n\}_{n\ge 1}\) be a sequence of game models. In this paper, we propose a definition of the convergence \(\mathcal {G}_n\rightarrow \mathcal {G}\) which, under adequate conditions, implies that the value of the games \(\mathcal {G}_n\) and the corresponding optimal strategies converge to the value and the optimal strategies of the game \(\mathcal {G}\). Then, for computational purposes, we show how we can construct, starting from the game model \(\mathcal {G}\), a sequence of game models \(\{\mathcal {G}_n\}_{n\ge 1}\) with finite state and action spaces that converge to \(\mathcal {G}\). Such finite models can be solved explicitly and, hence, we can provide computable approximations of the value of the game model \(\mathcal {G}\). Our proofs make use of separability of spaces of probability measures in the Wasserstein metric.
As far as we know, this is the first attempt to provide such computable approximations for continuous-time Markov games with denumerable state space and general action spaces. The reader interested in related works can consult (Jaśkiewicz and Nowak 2006; Nowak and Altman 2002), in which the idea of approximating a game model \(\mathcal {G}\) with “simpler” models has been studied. The reference (Chang et al. 2010) also considers computational issues for a continuous-time game with general state space and finite action spaces. Approximation techniques similar to those developed in the present paper, but for continuous-time controlled (say, with a single player) Markov chains, have been studied in Prieto-Rumeau and Hernández-Lerma (2012) for the discounted reward criterion and in Prieto-Rumeau and Lorenzo (2010) for the average cost criterion; see also Prieto-Rumeau and Hernández-Lerma (2012). The reference Guo and Zhang (2014) proposes approximation techniques for discounted cost Markov decision processes with constraints (the previous references are concerned with unconstrained problems). Their setting is similar to ours, in the sense that they propose a definition of convergence for control models. The proof techniques in Guo and Zhang (2014) mainly rely on linear programming, while (Prieto-Rumeau and Hernández-Lerma 2012; Prieto-Rumeau and Lorenzo 2010) use dynamic programming arguments.
At this point, it is interesting to make a comparison between the approximation approaches for control and game models. Approximating a game model by means of finite state and actions game models is, from a technical point of view, more complicated than such approximations for control models. The analogue of (1.1)–(1.2) for a control model, in which the state space is \(S\) and the action space of the controller is \(A\), is the optimality (or dynamic programming) equation
When making a finite approximation, one roughly considers an optimality equation as in (1.3) with finite \(S\) and \(A\). Then, one can use, for instance, the policy iteration algorithm that solves this optimality equation in a finite number of steps. For a game model, however, the Eqs. (1.1)–(1.2) are, even in the case of finite \(S\), \(A\), and \(B\), of a continuous nature because we are optimizing on a set of probability measures (say, a simplex). This makes the computational problems less straightforward. Here, we combine linear programming with a “value iteration” algorithm to solve such problems. Moreover, from a computational perspective, the maximum of a function (as in (1.3)) is easier to approximate than the saddle point of a function (as in (1.1)–(1.2)).
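As an illustration of the linear-programming step mentioned above, the value and a maximin mixed strategy of a finite zero-sum matrix game (the inner saddle-point problem once \(S\), \(A\), and \(B\) are finite) can be computed as follows. This is a minimal sketch using `scipy.optimize.linprog`; the payoff matrix `R` is a hypothetical placeholder, not the game of Sect. 5.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(R):
    """Value and a maximin mixed strategy of the zero-sum matrix game with
    payoff matrix R (rows: player 1's actions, columns: player 2's actions)."""
    m, k = R.shape
    # variables z = (x_1, ..., x_m, v); maximize v  <=>  minimize -v
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # constraints: v - (R^T x)_b <= 0 for every column b of R
    A_ub = np.hstack([-R.T, np.ones((k, 1))])
    b_ub = np.zeros(k)
    # x must be a probability vector
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]  # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]
```

For instance, for the payoff matrix \([[2,0],[1,3]]\) the maximin strategy mixes both rows equally and the value is \(3/2\).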
The rest of the paper is organized as follows. In Sect. 2, we define the game models we will be dealing with, state our main assumptions, and recall some previously known results. The main convergence results are presented in Sect. 3. In Sect. 4, these results are specialized to the case of finite state and actions approximations. Moreover, under some additional conditions, we can obtain the convergence rate of the value of the games. Finally, in Sect. 5, we make an application to a population system for which we provide some explicit approximations.
2 Definitions and assumptions
The definition of the game model in this section and the corresponding results are mainly based on Guo et al. (2003); Guo and Hernández-Lerma (2005) and (Prieto-Rumeau and Hernández-Lerma 2012, Chapter 10).
2.1 The game model \(\mathcal {G}\)
Definition of the game model. We consider a two-player zero-sum continuous-time Markov game model \(\mathcal {G}=\{S,A,B,\mathbb {K},Q,r\}\), where the elements of \(\mathcal {G}\) are the following.
-
\(S=\{0,1,2,\ldots \}\) is the state space.
-
\(A\) and \(B\) are the action spaces for players 1 and 2, respectively. We assume that \(A\) and \(B\) are Borel spaces (i.e., measurable subsets of complete and separable metric spaces). The corresponding metrics are denoted by \(d_A\) and \(d_B\), respectively.
-
For each \(i\in S\), the measurable sets \(A(i)\subseteq A\) and \(B(i)\subseteq B\) stand for the actions available for players 1 and 2 in state \(i\in S\), respectively. (Throughout this paper, measurability is always understood with respect to the Borel \(\sigma \)-algebra). The family of feasible triplets is defined as:
$$\begin{aligned} \mathbb {K}=\{(i,a,b)\in S\times A\times B: a\in A(i), b\in B(i)\}. \end{aligned}$$ -
The transition rate matrix of the system is \(Q=[q_{ij}(a,b)]\). It is assumed that:
-
1.
The function \((a,b)\mapsto q_{ij}(a,b)\) is measurable on \(A(i)\times B(i)\) for all \(i,j\in S\);
-
2.
The transition rates are conservative, that is,
$$\begin{aligned} \sum _{j\in S}q_{ij}(a,b)=0\quad \hbox {for all}\,\, (i,a,b)\in \mathbb {K}, \end{aligned}$$with \(q_{ij}(a,b)\ge 0\) whenever \(i\ne j\);
-
3.
The transition rates are stable, i.e., \(q(i):=\sup _{a\in A(i),b\in B(i)} \{-q_{ii}(a,b)\}<\infty \).
-
1.
-
Finally, the measurable function \(r:\mathbb {K}\rightarrow \mathbb {R}\) is interpreted as the reward rate function for player 1 and the cost rate function for player 2.
The game \(\mathcal {G}\) is played as follows. At each time \(t\ge 0\), both players observe the state of the system \(x(t)=i\in S\) and then, independently and simultaneously, they choose actions \(a(t)=a\in A(i)\) and \(b(t)=b\in B(i)\). In a small time interval \([t,t+dt]\):
-
player 1 receives a reward \(r(i,a,b)dt\),
-
player 2 incurs a cost \(r(i,a,b)dt\),
-
the system remains in state \(i\in S\) with probability \(1+q_{ii}(a,b)dt\) or makes a transition to the state \(j\ne i\) with probability \(q_{ij}(a,b)dt\).
This procedure is carried out over the whole time horizon \(t\in [0,\infty )\).
We suppose that the reward/cost of the players is depreciated at a constant discount rate \(\alpha >0\), and so the reward/cost \(r(i,a,b)\) at time \(t\ge 0\) is brought to its present value, that is, \(e^{-\alpha t}r(i,a,b)\). The goal of player 1 is to maximize his total expected discounted reward, loosely,
while player 2 wants to minimize his total expected discounted cost. A formal definition will be given below.
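For intuition, a single sample path of this discounted payoff can be simulated directly from the description above: the chain holds in state \(i\) for an exponential time with rate \(q(i)=-q_{ii}\), jumps to \(j\ne i\) with probability \(q_{ij}/q(i)\), and the reward rate is discounted at rate \(\alpha\). The sketch below assumes the players' stationary strategies are already fixed, so rates and rewards depend on the state alone; `q_row` (off-diagonal rates out of a state) and `r` are hypothetical inputs, and the integral is truncated at a horizon `T`.

```python
import math
import random

def discounted_payoff(q_row, r, i0, alpha, T, rng):
    """One sample path of the discounted payoff up to horizon T.
    q_row(i) -> dict {j: q_ij} of off-diagonal rates out of state i,
    r(i)     -> reward rate in state i (actions fixed by the strategies)."""
    t, i, total = 0.0, i0, 0.0
    while t < T:
        rates = q_row(i)
        qi = sum(rates.values())
        # exponential holding time (infinite if state i is absorbing)
        hold = rng.expovariate(qi) if qi > 0 else math.inf
        t_next = min(t + hold, T)
        # \int_t^{t_next} e^{-alpha s} r(i) ds
        total += r(i) * (math.exp(-alpha * t) - math.exp(-alpha * t_next)) / alpha
        if t_next >= T:
            break
        # jump to j with probability q_ij / q(i)
        u, acc = rng.random() * qi, 0.0
        for j, qij in rates.items():
            acc += qij
            if u <= acc:
                i = j
                break
        t = t_next
    return total
```

As a sanity check, for a single absorbing state with reward rate \(1\) the path payoff over \([0,T]\) is exactly \((1-e^{-\alpha T})/\alpha\).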
Assumptions on the game model. Next, we state all the assumptions we will make on the game model \(\mathcal {G}\). We will use the following terminology.
Definition 2.1
We say that \(w:S\rightarrow [1,\infty )\) is a Lyapunov function on \(S\) when \(w\) is monotone nondecreasing and, in addition, \(\lim _{i\rightarrow \infty }w(i)=+\infty \).
Assumption 2.2
The game model \(\mathcal {G}\) satisfies the following conditions.
-
(i)
There exists a Lyapunov function \(w\) on \(S\), and constants \(c_1<\alpha \) and \(d_1\ge 0\) with
$$\begin{aligned} \sum _{j\in S}q_{ij}(a,b)w(j)\le c_1w(i)+d_1\quad \hbox {for all}\,\, (i,a,b)\in \mathbb {K}. \end{aligned}$$For each \(i\in S\), we have \(q(i)\le w(i)\).
-
(ii)
There exists a constant \(M>0\) such that \(|r(i,a,b)|\le Mw(i)\) for all \((i,a,b)\in \mathbb {K}\).
-
(iii)
For each \(i\in S\), the sets \(A(i)\) and \(B(i)\) are compact. For all \(i,j\in S\), the functions \(r(i,a,b)\) and \(q_{ij}(a,b)\) are continuous on \(A(i)\times B(i)\).
-
(iv)
There exist constants \(c_2\in \mathbb {R}\) and \(d_2\ge 0\) with
$$\begin{aligned} \sum _{j\in S}q_{ij}(a,b)w^2(j)\le c_2w^2(i)+d_2\quad \hbox {for all}\,\, (i,a,b)\in \mathbb {K}. \end{aligned}$$
In the sequel, we will explain the use of the above hypotheses.
Strategies of the players. We introduce some notation. Let \(\overline{A}(i)\) and \(\overline{B}(i)\) be the families of probability measures on \(A(i)\) and \(B(i)\), when endowed with their Borel \(\sigma \)-algebras \(\mathbb {B}(A(i))\) and \(\mathbb {B}(B(i))\), respectively. Let
be such that \(\pi ^1_t(\cdot |i)\) is in \(\overline{A}(i)\) for all \(t\ge 0\) and \(i\in S\), and such that \(t\mapsto \pi ^1_t(C|i)\) is a measurable function on \([0,\infty )\) for all \(C\in \mathbb {B}(A(i))\) and \(i\in S\). We say that \(\pi ^1\) is a randomized Markov strategy for player 1, and we denote by \(\Pi ^1\) the set of all such strategies. The family \(\Pi ^2\) of randomized Markov strategies
for player 2 is defined similarly.
We say that \(\pi ^1\in \Pi ^1\) is a randomized stationary strategy for player 1 when \(\pi ^1_t(C|i)\) does not depend on \(t\ge 0\). Thus, the class \(\Pi ^{1,s}\) of stationary strategies for player 1 can be identified with
Similarly, the class of randomized stationary strategies for player 2 is \(\Pi ^{2,s}=\prod _{i\in S} \overline{B}(i)\).
Given a pair of strategies \((\pi ^1,\pi ^2)\in \Pi ^1\times \Pi ^2\), define
The above integral is well defined and finite because the system’s transition rates are conservative and stable. In particular, they satisfy
We will also use the following notation. Given \(i,j\in S\), \(\phi \in \overline{A}(i)\), and \(\psi \in \overline{B}(i)\), let
and for stationary strategies \((\pi ^1,\pi ^2)\in \Pi ^{1,s}\times \Pi ^{2,s}\), we write
Our next result summarizes the main results on the existence of the state and actions processes. See, e.g., Proposition 3.1 in Guo et al. (2003), Assumption A in Guo and Hernández-Lerma (2005), or Proposition 10.3 in Prieto-Rumeau and Hernández-Lerma (2012).
Theorem 2.3
Suppose that Assumption 2.2(i) is satisfied.
-
(i)
For every \((\pi ^1,\pi ^2)\in \Pi ^1\times \Pi ^2\), there exists a regular (nonhomogeneous) transition function
$$\begin{aligned} \{P^{\pi ^1,\pi ^2}_{ij}(s,t)\}_{i,j\in S,0\le s\le t} \end{aligned}$$with transition rates \(q_{ij}(t,\pi ^1,\pi ^2)\), that is,
$$\begin{aligned} \lim _{h\downarrow 0} \frac{P^{\pi ^1,\pi ^2}_{ij}(t,t+h)-\delta _{ij}}{h}=q_{ij}(t,\pi ^1,\pi ^2)\quad \hbox {for all } i,\! j\in S\,\, \hbox {and}\,\, t\ge 0. \end{aligned}$$Let \(\Omega =\mathbb {K}^{[0,\infty )}=\{(x(t),a(t),b(t))\}_{t\ge 0}\) be endowed with the product \(\sigma \)-algebra \(\mathcal {F}\).
-
(ii)
Given an initial state \(i\in S\) at time \(0\) and \((\pi ^1,\pi ^2)\in \Pi ^1\times \Pi ^2\), there exists a unique probability measure \(P^{i,\pi ^1,\pi ^2}\) on \((\Omega ,\mathcal {F})\) such that:
-
For each \(A_0\in \mathbb {B}(A(i))\) and \(B_0\in \mathbb {B}(B(i))\), we have
$$\begin{aligned} P^{i,\pi ^1,\pi ^2}\{x(0)=i,a(0)\in A_0,b(0)\in B_0\}=\pi ^1_0(A_0|i)\cdot \pi ^2_0(B_0|i). \end{aligned}$$ -
Given arbitrary \(n\ge 1\), times \(0\le s_1\le s_2\le \ldots \le s_n\), states \(i_k\in S\), and sets \(A_k\in \mathbb {B}(A(i_k))\) and \(B_k\in \mathbb {B}(B(i_k))\), for \(k=1,\ldots ,n\), we have
$$\begin{aligned} P^{i,\pi ^1,\pi ^2}\{x(s_1)&=i_1,a(s_1)\in A_1,b(s_1)\in B_1,\ldots ,\\ x(s_n)&=i_n,a(s_n)\in A_n,b(s_n)\in B_n\} \\&=\prod _{k=1}^{n} P^{\pi ^1,\pi ^2}_{i_{k-1} i_{k}}(s_{k-1},s_{k})\pi ^1_{s_{k}}(A_{k}|i_{k})\pi ^2_{s_{k}}(B_{k}|i_{k}), \end{aligned}$$with the convention that \(i_0=i\) and \(s_0=0\).
We will refer to \(\{x(t)\}_{t\ge 0}\) as the state process, while \(\{a(t)\}_{t\ge 0}\) and \(\{b(t)\}_{t\ge 0}\) are the action processes for players 1 and 2. The expectation operator associated with \(P^{i,\pi ^1,\pi ^2}\) will be denoted by \(E^{i,\pi ^1,\pi ^2}\).
Let \(\mathcal {B}_w(S)\) be the family of functions \(u:S\rightarrow \mathbb {R}\) such that
We have that \(||\cdot ||_w\) is a norm on \(\mathcal {B}_w(S)\), under which it is a Banach space.
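Computationally, the \(w\)-norm is immediate on any finite truncation of \(S\); a minimal sketch (the dictionary `u` of function values and the weight `w` are placeholders):

```python
def w_norm(u, w):
    """||u||_w = sup_i |u(i)| / w(i) over the states where u is defined."""
    return max(abs(ui) / w(i) for i, ui in u.items())

# example with w(i) = i + 1: ||u||_w = max(2/1, 6/2) = 3
assert w_norm({0: 2.0, 1: 6.0}, lambda i: i + 1.0) == 3.0
```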
The discounted reward/cost optimality criterion. For the discount rate \(\alpha >0\), given an initial state \(i\in S\) and a pair of strategies \((\pi ^1,\pi ^2)\in \Pi ^1\times \Pi ^2\), we define the total expected discounted payoff as
Thus, \(V^\alpha (i,\pi ^1,\pi ^2)\) is the total expected discounted reward for player 1, and it is the total expected discounted cost for player 2. Using (Guo and Hernández-Lerma 2003, Lemma 3.2) or (Guo and Hernández-Lerma 2009, Lemma 6.3), under Assumption 2.2(i), we have that
(if \(c_1=0\), then the right-hand term is \(w(i)+d_1t\); to see this, just let \(c_1\downarrow 0\) in (2.4)). As a consequence of Assumption 2.2(ii), given \(i\in S\) and \((\pi ^1,\pi ^2)\in \Pi ^1\times \Pi ^2\), we have
Hence, letting \(\mathfrak {M}=\frac{M(\alpha +d_1)}{\alpha (\alpha -c_1)}\), it follows that
Remark 2.4
If \((\pi ^1,\pi ^2)\) is a pair of stationary strategies, by Theorem 2.3(ii) we have (cf. (2.3))
Given the initial state \(i\in S\), the lower value and upper value of \(\mathcal {G}\) are defined as:
respectively. We note that, as a consequence of (2.5), we have
The lower value of the game is the maximal discounted reward for player 1 when using a “maximin” strategy. Indeed, for every fixed strategy \(\pi ^1\in \Pi ^1\), the worst scenario for player 1 is when player 2 chooses the strategy \(\pi ^2\in \Pi ^2\) attaining the infimum
Then, player 1 chooses the strategy yielding the maximal reward, that is, the one achieving the supremum in the definition of \(L^\alpha (i)\). Similarly, the upper value of the game corresponds to the optimal payoff of player 2 when using a “minimax” strategy. It is easy to see that \(L^\alpha (i)\le U^\alpha (i)\) for every \(i\in S\).
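The inequality \(L^\alpha (i)\le U^\alpha (i)\) already has a static counterpart: in the following toy matrix game (hypothetical payoffs), the pure-strategy maximin lies strictly below the pure-strategy minimax, and it is randomization that closes such gaps in Theorem 2.7 below.

```python
import numpy as np

R = np.array([[3.0, 1.0], [0.0, 2.0]])   # rows: player 1, columns: player 2
lower = max(min(row) for row in R)       # "maximin" over pure strategies: 1
upper = min(max(col) for col in R.T)     # "minimax" over pure strategies: 2
assert lower <= upper                    # lower value never exceeds upper value
```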
Definition 2.5
The game \(\mathcal {G}\) has a value when \(L^\alpha (i)=U^\alpha (i)\) for all \(i\in S\). The function \(V^\alpha (i):=L^\alpha (i)=U^\alpha (i)\) is called the value function of the game \(\mathcal {G}\).
In this case, we say that \((\pi ^{*1},\pi ^{*2})\in \Pi ^1\times \Pi ^2\) is a pair of discount optimal strategies when
for all \(i\in S\) and \((\pi ^1,\pi ^2)\in \Pi ^1\times \Pi ^2\).
A direct calculation shows that if \(V^\alpha \) is the value function of the game and \((\pi ^{*1},\pi ^{*2})\in \Pi ^1\times \Pi ^2\) is a pair of discount optimal strategies, then \(V^\alpha (i)=V^\alpha (i,\pi ^{*1},\pi ^{*2})\) for each \(i\in S\). A pair of optimal strategies is usually referred to as a noncooperative or Nash equilibrium of the game.
Existence of the value function of \(\mathcal {G}\) and the Shapley equations. The following consequence of our assumptions will be useful in what follows.
Corollary 2.6
Suppose that Assumption 2.2 holds.
-
(i)
Given \(i\in S\) and \(k>i\), we have
$$\begin{aligned} \sum _{j\ge k} q_{ij}(\phi ,\psi )w(j) \le \frac{1}{w(k)} \Big ( c_2w^2(i)+d_2+q(i)w^2(i) \Big ) \end{aligned}$$for all \(\phi \in \overline{A}(i)\) and \(\psi \in \overline{B}(i)\).
-
(ii)
Given arbitrary \(u\in \mathcal {B}_w(S)\), the function \( (a,b)\mapsto \sum _{j\in S}q_{ij}(a,b)u(j)\) is continuous on \(A(i)\times B(i)\) for every \(i\in S\).
Proof
-
(i)
Given \(i\in S\), \(k>i\), and \((a,b)\in A(i)\times B(i)\) we have
$$\begin{aligned} \sum _{j\ge k} q_{ij}(a,b)w(j)&\le \frac{1}{w(k)} \sum _{j\ge k} q_{ij}(a,b)w^2(j) \end{aligned}$$(2.6)$$\begin{aligned}&\le \frac{1}{w(k)} \Big ( \sum _{j\in S} q_{ij}(a,b)w^2(j) -q_{ii}(a,b)w^2(i)\Big ) \nonumber \\&\le \frac{1}{w(k)} \Big ( c_2w^2(i)+d_2+q(i)w^2(i) \Big ), \end{aligned}$$(2.7)where we have used monotonicity of \(w\) in (2.6) and Assumptions 2.2(i) and (iv) in (2.7). By integration with respect to \(\phi (da)\) and \(\psi (db)\), statement (i) follows.
-
(ii)
From (2.7) we have that
$$\begin{aligned} \lim _{k\rightarrow \infty } \sup _{(a,b)\in A(i)\times B(i)} \sum _{j\ge k} q_{ij}(a,b)w(j)=0 \end{aligned}$$and, in particular,
$$\begin{aligned} \lim _{k\rightarrow \infty } \sup _{(a,b)\in A(i)\times B(i)} \Big |\sum _{j\ge k} q_{ij}(a,b)u(j)\Big |=0. \end{aligned}$$Consequently, the series \(\sum q_{ij}(a,b)u(j)\) of continuous functions converges uniformly on \(A(i)\times B(i)\). It is, therefore, a continuous function. \(\square \)
The continuity of \(\sum _{j\in S}q_{ij}(a,b)w(j)\) is a usual requirement in Markov game models; see (Guo and Hernández-Lerma 2005, Assumption C.3) or (Prieto-Rumeau and Hernández-Lerma 2012, Assumption 10.7.b). As seen in Corollary 2.6(ii) above, this condition is in fact implied by our hypotheses.
The main result on the discounted game \(\mathcal {G}\) is the following. It is borrowed from (Guo and Hernández-Lerma 2005; Prieto-Rumeau and Hernández-Lerma 2012).
Theorem 2.7
Suppose that the game model \(\mathcal {G}\) satisfies Assumption 2.2.
-
(i)
The game has a value \(V^\alpha \in \mathcal {B}_w(S)\) with \(\Vert V^\alpha \Vert _w\le \mathfrak {M}\).
-
(ii)
The value function \(V^\alpha \) is the unique solution \(u\) in \(\mathcal {B}_w(S)\) of the equations
$$\begin{aligned} \alpha u(i)&= \sup _{\phi \in \overline{A}(i)} \inf _{\psi \in \overline{B}(i)} \Big \{ r(i,\phi ,\psi )+\sum _{j\in S} q_{ij}(\phi ,\psi ) u(j) \Big \}\end{aligned}$$(2.8)$$\begin{aligned}&= \inf _{\psi \in \overline{B}(i)} \sup _{\phi \in \overline{A}(i)} \Big \{ r(i,\phi ,\psi )+\sum _{j\in S} q_{ij}(\phi ,\psi ) u(j) \Big \} \end{aligned}$$(2.9)for all \(i\in S\).
-
(iii)
There exists a pair of optimal randomized stationary strategies. Moreover, \((\pi ^1,\pi ^2)\in \Pi ^{1,s}\times \Pi ^{2,s}\) is a pair of optimal randomized stationary strategies if and only if \(\pi ^1(\cdot |i)\) and \(\pi ^2(\cdot |i)\) attain the supremum and the infimum in (2.8) and (2.9), respectively, for every \(i\in S\).
Remark 2.8
The fact that there exists a pair of optimal stationary strategies implies that the value of the game satisfies
for all \(i\in S\). Note that we are taking the infimum and the supremum over the family of stationary strategies (cf. the definition of the lower and upper value of the game).
2.2 The game models \(\{\mathcal {G}_n\}_{n\ge 1}\)
In what follows, we consider a sequence of game models
The elements of these game models satisfy the following conditions.
-
The state space \(S_n\) is a (finite or infinite) subset of \(S\).
-
The action spaces are \(A\) and \(B\), as for the game model \(\mathcal {G}\).
-
The sets of available actions in state \(i\in S_n\) are the measurable sets \(A_n(i)\subseteq A(i)\) and \(B_n(i)\subseteq B(i)\) for players 1 and 2, respectively. Let
$$\begin{aligned} \mathbb {K}_n=\{(i,a,b)\in S_n\times A\times B: a\in A_n(i), b\in B_n(i)\}\subseteq \mathbb {K}. \end{aligned}$$ -
The transition rate matrix is given by \(Q_n=[q^n_{ij}(a,b)]\) for \(i,j\in S_n\) and \((a,b)\in A_n(i)\times B_n(i)\). We assume that \((a,b)\mapsto q^n_{ij}(a,b)\) is measurable on \(A_n(i)\times B_n(i)\) for all \(i,j\in S_n\). The transition rates are assumed to be conservative and stable, that is,
$$\begin{aligned} \sum _{j\in S_n} q^n_{ij}(a,b)=0\quad \hbox {and}\quad \sup _{a\in A_n(i),b\in B_n(i)}\{-q^n_{ii}(a,b)\}=: q_n(i)<\infty \end{aligned}$$for \((i,a,b)\in \mathbb {K}_n\), with the condition that \(q^n_{ij}(a,b)\ge 0\) for \(i\ne j\).
-
The reward/cost rate function is \(r_n:\mathbb {K}_n\rightarrow \mathbb {R}\), assumed to be measurable.
The discount rate \(\alpha >0\) is the same for all the game models \(\mathcal {G}_n\) and \(\mathcal {G}\). The dynamics of the game \(\mathcal {G}_n\) is similar to that of the game \(\mathcal {G}\) studied previously. Next, we state our hypotheses on the sequence \(\{\mathcal {G}_n\}_{n\ge 1}\).
Assumption 2.9
The following statements hold for every \(n\ge 1\).
-
(i)
For all \((i,a,b)\in \mathbb {K}_n\)
$$\begin{aligned} \sum _{j\in S_n}q^n_{ij}(a,b)w(j)\le c_1w(i)+d_1, \end{aligned}$$where the Lyapunov function \(w\) and the constants \(c_1<\alpha \) and \(d_1\ge 0\) come from Assumption 2.2(i). For each \(i\in S_n\), we have \(q_n(i)\le w(i)\).
-
(ii)
For the constant \(M>0\) in Assumption 2.2(ii), \(|r_n(i,a,b)|\le Mw(i)\) for all \((i,a,b)\in \mathbb {K}_n\).
-
(iii)
For each \(i\in S_n\), the sets \(A_n(i)\subseteq A(i)\) and \(B_n(i)\subseteq B(i)\) are compact, while for all \(i,j\in S_n\), the functions \(r_n(i,a,b)\) and \(q^n_{ij}(a,b)\) are continuous on \(A_n(i)\times B_n(i)\).
-
(iv)
With \(c_2\in \mathbb {R}\) and \(d_2\ge 0\) as in Assumption 2.2(iv), we have
$$\begin{aligned} \sum _{j\in S_n}q^n_{ij}(a,b)w^2(j)\le c_2w^2(i)+d_2\quad \hbox {for all}\,\, (i,a,b)\in \mathbb {K}_n. \end{aligned}$$
Roughly, Assumption 2.9 consists in supposing that Assumption 2.2 holds “uniformly” in \(n\ge 1\). Next, we introduce some notation. For the game model \(\mathcal {G}_n\), the families of randomized Markov strategies for players 1 and 2 are denoted by \(\Pi ^1_n\) and \(\Pi ^2_n\), respectively, while the corresponding stationary strategies are
with \(\overline{A}_n(i)\) and \(\overline{B}_n(i)\) the families of probability measures on \(A_n(i)\) and \(B_n(i)\), respectively. Let \(\mathcal {B}_w(S_n)\) be the Banach space of functions \(u:S_n\rightarrow \mathbb {R}\) with finite \(w\)-norm
(We note that we use the same notation \(||u||_w\) for \(u:S\rightarrow \mathbb {R}\) and \(u:S_n\rightarrow \mathbb {R}\)). Notations such as
for \(i,j\in S_n\), \(\phi \in \overline{A}_n(i)\), and \(\psi \in \overline{B}_n(i)\) are given the obvious definitions; see (2.1)–(2.2).
We can apply Theorem 2.3 to \(\mathcal {G}_n\) and, therefore, there indeed exists a stochastic process \(\{(x(t),a(t),b(t))\}_{t\ge 0}\) taking values in \(\mathbb {K}_n\) that models the state and actions processes for the game model \(\mathcal {G}_n\). In particular, the corresponding expectation operator will be denoted by \(E^{i,\pi ^1,\pi ^2}_n\). Given \(i\in S_n\) and \((\pi ^1,\pi ^2)\in \Pi ^1_n\times \Pi ^2_n\), define the total expected discounted payoff for the game model \(\mathcal {G}_n\) as
We also have (cf. (2.5)),
The lower and upper value functions of the game \(L^\alpha _n,U^\alpha _n\in \mathcal {B}_w(S_n)\), and the value function \(V^\alpha _n\) (provided it exists) are given the usual definitions. We have a result similar to Corollary 2.6, which is stated without proof.
Corollary 2.10
Suppose that Assumption 2.9 holds and fix \(n\ge 1\).
-
(i)
Given \(i\in S_n\) and \(k>i\), we have
$$\begin{aligned} \sum _{j\ge k,j\in S_n} q^n_{ij}(\phi ,\psi )w(j) \le \frac{1}{w(k)} \Big ( c_2w^2(i)+d_2+q(i)w^2(i) \Big ) \end{aligned}$$for all \(\phi \in \overline{A}_n(i)\) and \(\psi \in \overline{B}_n(i)\).
-
(ii)
Given arbitrary \(u\in \mathcal {B}_w(S_n)\), the function \((a,b)\mapsto \sum _{j\in S_n}q^n_{ij}(a,b)u(j)\) is continuous on \(A_n(i)\times B_n(i)\) for every \(i\in S_n\).
Each discounted game model \(\mathcal {G}_n\) has a value function that can be characterized by the corresponding Shapley equations; cf. Theorem 2.7.
Theorem 2.11
Suppose that Assumption 2.9 is satisfied. Then, the following statements hold for each \(n\ge 1\):
-
(i)
The game \(\mathcal {G}_n\) has a value \(V_n^\alpha \in \mathcal {B}_w(S_n)\) with \(\Vert V_n^\alpha \Vert _w\le \mathfrak {M}\).
-
(ii)
The value function \(V_n^\alpha \) is the unique solution \(u\) in \(\mathcal {B}_w(S_n)\) of the equations
$$\begin{aligned} \alpha u(i)&= \sup _{\phi \in \overline{A}_n(i)} \inf _{\psi \in \overline{B}_n(i)} \Big \{ r_n(i,\phi ,\psi )+\sum _{j\in S_n} q^n_{ij}(\phi ,\psi ) u(j) \Big \} \end{aligned}$$(2.11)$$\begin{aligned}&= \inf _{\psi \in \overline{B}_n(i)} \sup _{\phi \in \overline{A}_n(i)} \Big \{ r_n(i,\phi ,\psi )+\sum _{j\in S_n} q^n_{ij}(\phi ,\psi ) u(j) \Big \} \end{aligned}$$(2.12)for each \(i\in S_n\).
-
(iii)
There exists a pair of optimal randomized stationary strategies for the game model \(\mathcal {G}_n\). Moreover, \((\pi ^1,\pi ^2)\in \Pi ^{1,s}_n\times \Pi ^{2,s}_n\) is a pair of optimal randomized stationary strategies if and only if \(\pi ^1(\cdot |i)\) and \(\pi ^2(\cdot |i)\) attain the supremum and the infimum in (2.11) and (2.12), respectively, for every \(i\in S_n\).
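When \(S_n\), \(A_n(i)\), and \(B_n(i)\) are finite, a standard way to solve (2.11)–(2.12) numerically is uniformization plus value iteration: for \(\Lambda \ge \max_i q_n(i)\), the map \((Tu)(i)=(\alpha+\Lambda)^{-1}\,\mathrm{val}\big[r_n(i,a,b)+\sum_j q^n_{ij}(a,b)u(j)+\Lambda u(i)\big]\) is a contraction with modulus \(\Lambda/(\alpha+\Lambda)\) whose fixed point is \(V^\alpha_n\), and each \(\mathrm{val}\) is a finite matrix game solvable by linear programming. A self-contained sketch under these assumptions (the data layout is hypothetical; this combines the LP and "value iteration" steps mentioned in the Introduction):

```python
import numpy as np
from scipy.optimize import linprog

def game_value(R):
    """Value of the zero-sum matrix game R (rows: player 1) via an LP."""
    m, k = R.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                 # maximize v
    A_ub = np.hstack([-R.T, np.ones((k, 1))])    # v <= (R^T x)_b for each column b
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(k), A_eq=A_eq, b_eq=np.array([1.0]),
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1]

def shapley_value_iteration(states, r, q, alpha, tol=1e-8):
    """Iterate u <- T u for the uniformized Shapley equations.
    r[i] and q[i][j] are |A_n(i)| x |B_n(i)| reward/rate matrices."""
    Lam = max(float((-q[i][i]).max()) for i in states)   # Lam >= q_n(i) for all i
    u = {i: 0.0 for i in states}
    while True:
        new = {i: game_value(r[i] + sum(q[i][j] * u[j] for j in states)
                             + Lam * u[i]) / (alpha + Lam)
               for i in states}
        if max(abs(new[i] - u[i]) for i in states) < tol:
            return new
        u = new
```

With a single action per player the scheme reduces to solving \(\alpha u = r + Qu\) for a Markov chain, which gives a simple correctness check.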
3 Convergence results
In what follows we will suppose that the game models \(\mathcal {G}\) and \(\{\mathcal {G}_n\}_{n\ge 1}\) satisfy Assumptions 2.2 and 2.9.
Convergence of game models. We propose a definition of the game models \(\mathcal {G}_n\) converging to the original game model \(\mathcal {G}\). In this definition, we make use of the Hausdorff metric. We recall that the Hausdorff distance between two closed subsets \(C\) and \(D\) of a metric space \((X,d_X)\) is defined as:
(here, \(\vee \) stands for “maximum”). The Hausdorff metric is indeed a metric on the family of closed subsets of \(X\) except that it might not be finite. We say that \(\{C_n\}_{n\ge 1}\) converges to \(C\) in the Hausdorff metric when \(d_H(C_n,C)\rightarrow 0\) as \(n\rightarrow \infty \).
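For finite subsets of a metric space, the supremum and infimum in \(d_H\) are maxima and minima, so the Hausdorff distance is directly computable; a minimal sketch, with a hypothetical metric `d`:

```python
def hausdorff(C, D, d):
    """d_H(C, D) = sup_inf(C, D) v sup_inf(D, C) for finite nonempty sets."""
    def sup_inf(X, Y):
        return max(min(d(x, y) for y in Y) for x in X)
    return max(sup_inf(C, D), sup_inf(D, C))

# a coarse grid versus a refinement of it on [0, 1]: every coarse point lies
# in the fine set, so d_H is the fine set's worst distance to the coarse set
d_abs = lambda x, y: abs(x - y)
assert hausdorff({0.0, 1.0}, {0.0, 0.5, 1.0}, d_abs) == 0.5
```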
Definition 3.1
We say that \(\mathcal {G}_n\rightarrow \mathcal {G}\) as \(n\rightarrow \infty \) when the following conditions hold:
-
(a)
The sequence of states \(S_n\uparrow S\).
We define \(n(i)=\min \{n\ge 1: i\in S_n\}\) for each \(i\in S\), and so \(i\in S_n\) if and only if \(n\ge n(i)\).
-
(b)
For each \(i\in S\), the sequences of action sets \(A_n(i)\) and \(B_n(i)\) converge to \(A(i)\) and \(B(i)\) as \(n\rightarrow \infty \), respectively, in the Hausdorff metric.
For every \(i\in S\), given sequences \(\{a_n\}_{n\ge n(i)}\) and \(\{b_n\}_{n\ge n(i)}\), with \(a_n\in A_n(i)\) and \(b_n\in B_n(i)\), such that \(a_n\rightarrow a\) and \(b_n\rightarrow b\) for some \(a\in A(i)\) and \(b\in B(i)\), we have:
-
(c)
\(\lim _{n\rightarrow \infty } q^n_{ij}(a_n,b_n)= q_{ij}(a,b)\) for all \(j\in S\), and
-
(d)
\(\lim _{n\rightarrow \infty } r_n(i,a_n,b_n)=r(i,a,b)\).
Observe that expressions such as \(d_H(A(i),A_n(i))\) or \(q_{ij}^n(a_n,b_n)\) are defined only for large enough \(n\) (namely, \(n\ge n(i)\) in the former case, and \(n\ge n(i)\vee n(j)\) in the latter). This is not made explicit in the notation since we are dealing with the limit as \(n\rightarrow \infty \).
Our next lemma gives some equivalent statements of Definition 3.1.
Lemma 3.2
-
(i)
The condition in Definition 3.1(c) can be replaced with the following statement. Given \(i,j\in S\) and \(\epsilon >0\) there exists \(n_0\ge n(i)\vee n(j)\) such that for all \(n\ge n_0\)
$$\begin{aligned} \sup _{(a,b)\in A_n(i)\times B_n(i)}|q^n_{ij}(a,b)-q_{ij}(a,b)|\le \epsilon . \end{aligned}$$ -
(ii)
The condition in Definition 3.1(d) can be replaced with the following statement. Given \(i\in S\) and \(\epsilon >0\) there exists \(n_0\ge n(i)\) such that for all \(n\ge n_0\)
$$\begin{aligned} \sup _{(a,b)\in A_n(i)\times B_n(i)}|r_n(i,a,b)-r(i,a,b)|\le \epsilon . \end{aligned}$$
Proof
(i) First we prove that if Definition 3.1 holds, then (i) also holds. We proceed by contradiction. If (i) does not hold, then there exist \(i,j\in S\) and \(\epsilon >0\) such that, for infinitely many \(n\ge n(i)\vee n(j)\), there exist \(a_n\in A_n(i)\) and \(b_n\in B_n(i)\) with
For such \(n\), since \(a_n\in A_n(i)\subseteq A(i)\), there exists a subsequence \(\{n'\}\) and \(a\in A(i)\) such that \(a_{n'}\rightarrow a\). Similarly, for some subsequence, still denoted by \(\{n'\}\), we have \(b_{n'}\rightarrow b\) for some \(b\in B(i)\). Next, define \(\tilde{a}_n\in A_n(i)\) and \(\tilde{b}_n\in B_n(i)\), for \(n\ge n(i)\), as follows.
-
If \(n\) belongs to the subsequence \(\{n'\}\) then let \(\tilde{a}_n=a_n\) and \(\tilde{b}_n=b_n\);
-
Otherwise, let \(\tilde{a}_n\) and \(\tilde{b}_n\) be such that
$$\begin{aligned} d_A(\tilde{a}_n,a)=\inf _{x\in A_n(i)} d_A(x,a)\le d_H(A_n(i), A(i)) \\ d_B(\tilde{b}_n,b)=\inf _{y\in B_n(i)} d_B(y,b)\le d_H(B_n(i), B(i)). \end{aligned}$$
We have thus constructed sequences \(\tilde{a}_n\in A_n(i)\) and \(\tilde{b}_n\in B_n(i)\), for \(n\ge n(i)\), such that \(\tilde{a}_n\rightarrow a\) and \(\tilde{b}_n\rightarrow b\). Consequently, by (c), for \(n\) large enough we have
In particular, recalling (3.1), along the subsequence \(\{n'\}\) we have
This contradicts the continuity of the transition rate function.
Conversely, let us now prove that Definition 3.1(a), (b), and (d), together with (i), imply (c). Fix \(i,j\in S\), and let \(a_n\in A_n(i)\) and \(b_n\in B_n(i)\) be such that \(a_n\rightarrow a\in A(i)\) and \(b_n\rightarrow b\in B(i)\). By the condition (i), given \(\epsilon >0\), for \(n\) large enough we have
But now continuity of the function \((a,b)\mapsto q_{ij}(a,b)\) implies that for \(n\) large enough we also have
This yields
and so \(\lim _{n} q_{ij}^n(a_n,b_n)=q_{ij}(a,b)\). This completes the proof that (c) \(\Leftrightarrow \) (i).
To prove statement (ii) we can proceed similarly. \(\square \)
Given a sequence of functions \(u_n:S_n\rightarrow \mathbb {R}\), for \(n\ge 1\), we say that \(\{u_n\}\) converges pointwise to some function \(u:S\rightarrow \mathbb {R}\) when
Note that, for fixed \(i\in S\), \(u_n(i)\) is well defined only when \(n\ge n(i)\). Since the above definition is concerned with the limit as \(n\rightarrow \infty \), the requirement \(n\ge n(i)\) will not be explicit in the notation.
Convergence of probability measures. Now we recall some facts on convergence of probability measures; see, e.g., (Billingsley 1968, Chapter 1) or (Bogachev 2007, Chapter 8). Given a metric space \((X,d_X)\), let \(\mathcal {P}\) be the family of Borel probability measures on \(X\). We say that the sequence \(\{\mu _n\}\subseteq \mathcal {P}\) converges weakly to \(\mu \in \mathcal {P}\), and we will write \(\mu _n\mathop {\longrightarrow }\limits ^{d}\mu \), if
for all bounded and continuous functions \(f:X\rightarrow \mathbb {R}\). As a consequence of the Portmanteau theorem (see Theorems 1.2 and 2.1 in Billingsley 1968), to have weak convergence it suffices that (3.2) holds for all bounded and Lipschitz continuous functions \(f:X\rightarrow \mathbb {R}\). (We recall that \(f:X\rightarrow \mathbb {R}\) is Lipschitz continuous if there exists a constant \(L>0\) such that \(|f(x)-f(y)|\le Ld_X(x,y)\) for all \(x,y\in X\). In this case, we say that \(f\) is \(L\)-Lipschitz continuous).
When \(X\) is a compact metric space, weak convergence is metrizable by the Wasserstein distance
for \(\mu ,\nu \in \mathcal {P}\), where the supremum ranges over the set \(\mathrm {Lip}_1(X)\) of all \(1\)-Lipschitz continuous functions on \(X\), and where the infimum ranges over the set of all probability measures \(\lambda \) on \(X\times X\) with marginals \(\mu \) and \(\nu \) [see Theorems 8.3.2 and 8.10.45, and Section 8.10(viii) in Bogachev 2007]. With this metric, we have that \((\mathcal {P},d_W)\) is a compact metric space (Bogachev 2007, Theorem 8.9.3(i)). In addition, if \(\{x_1,x_2,\ldots \}\) is a countable dense subset of \(X\), then the countable family of probability measures
for all \(k\ge 1\) and rational \(\beta _1,\ldots ,\beta _k\ge 0\) with \(\sum \beta _j=1\) is dense in \((\mathcal {P},d_W)\), where \(\delta _x\) denotes the Dirac probability measure supported on \(x\); see (Bogachev 2007, Theorem 8.9.4(ii)) or Bolley (2008).
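When \(X\subseteq \mathbb {R}\), the Wasserstein distance between finitely supported measures admits the classical one-dimensional formula \(d_W(\mu ,\nu )=\int |F_\mu -F_\nu |\), where \(F_\mu \) and \(F_\nu \) are the corresponding distribution functions. The following sketch (the function name is ours, not from the paper) computes \(d_W\) for Dirac mixtures of the form displayed above:

```python
def wasserstein_1d(xs, ps, ys, qs):
    """W1 distance between two finitely supported probability measures on the
    real line: sum_k ps[k]*delta(xs[k]) versus sum_k qs[k]*delta(ys[k]).
    Uses the one-dimensional formula: the integral of |F_mu - F_nu|."""
    pts = sorted(set(xs) | set(ys))

    def cdf(zs, ws, t):  # distribution function evaluated at t
        return sum(w for z, w in zip(zs, ws) if z <= t)

    total = 0.0
    for left, right in zip(pts, pts[1:]):
        # both distribution functions are constant on [left, right)
        total += abs(cdf(xs, ps, left) - cdf(ys, qs, left)) * (right - left)
    return total
```

For instance, \(d_W(\delta _{0.2},\delta _{0.7})=0.5\) and \(d_W(\tfrac12 \delta _0+\tfrac12 \delta _1,\delta _{1/2})=0.5\).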
Main results. We need one preliminary lemma before proving our main convergence result.
Lemma 3.3
Suppose that the game models \(\mathcal {G}\) and \(\{\mathcal {G}_n\}_{n\ge 1}\) satisfy Assumptions 2.2 and 2.9, and also that \(\mathcal {G}_n\rightarrow \mathcal {G}\). Suppose that the sequence \(v_n\in \mathcal {B}_w(S_n)\), for \(n\ge 1\), converges pointwise to \(v\in \mathcal {B}_w(S)\), and that
For fixed \(i\in S\), assume also that \(\phi _n\in \overline{A}_n(i)\) and \(\psi _n\in \overline{B}_n(i)\), for \(n\ge n(i)\), are such that
for some \(\phi \in \overline{A}(i)\) and \(\psi \in \overline{B}(i)\). Under these conditions,
Proof
Let us first analyze the term \(r_n(i,\phi _n,\psi _n)\). By Lemma 3.2(ii), given \(\epsilon >0\) there exists \(n_0\ge n(i)\) such that \(n\ge n_0\) implies
In particular, we have
for \(n\ge n_0\). On the other hand, since the function \(r(i,a,b)\) is bounded and continuous on \(A(i)\times B(i)\), by Theorem 3.2 in Billingsley (1968) we have that \(r(i,\phi _n,\psi _n)\) converges to \(r(i,\phi ,\psi )\) as \(n\rightarrow \infty \). Consequently, there is some \(n_1\ge n(i)\) such that \(n\ge n_1\) gives
From (3.4) and (3.5), we have that \(|r_n(i,\phi _n,\psi _n)-r(i,\phi ,\psi )|\le \epsilon \) for \(n\ge n_0\vee n_1\). Therefore,
We proceed with the proof. As a consequence of Corollaries 2.6(i) and 2.10(i) we deduce that, given \(\epsilon >0\), there exists some \(k>i\) such that \( \sum _{j\ge k}q_{ij}(\phi ,\psi ) w(j)\le \epsilon \) and such that, for all \(n\ge n(i)\),
Therefore, since \(\Vert v_n\Vert _w\le \mathfrak {m}\) implies \(\Vert v\Vert _w\le \mathfrak {m}\), we have
and, for all \(n\ge n(i)\),
Consequently, if \(n\ge n(i)\) is such that, in addition, \(\{0,1,\ldots ,k-1\}\subseteq S_n\), we have
The left-hand term of the last expression can be made arbitrarily small by choosing \(n\) large enough. Indeed, arguing as at the beginning of this proof, we can show that
which, together with the fact that \(v_n(j)\rightarrow v(j)\) for all \(j\in S\), yields the stated result because, once \(i\in S\) and \(\epsilon >0\) are given, the state \(k\) remains fixed and does not depend on \(n\). The proof is now complete. \(\square \)
Before stating our next result, we introduce some terminology. Given randomized stationary strategies \((\pi ^1_n,\pi ^2_n)\in \Pi ^{1,s}_n\times \Pi ^{2,s}_n\) for the game model \(\mathcal {G}_n\), for \(n\ge 1\), we say that the randomized stationary strategies \((\pi ^1,\pi ^2)\in \Pi ^1_s\times \Pi ^2_s\) are a limit strategy of \(\{(\pi ^1_n,\pi ^2_n)\}_{n\ge 1}\) if there exists a subsequence \(\{n'\}\) such that
for all \(i\in S\). Every such sequence \(\{(\pi ^1_n,\pi ^2_n)\}\) indeed has a limit strategy because \(\pi ^1_n(\cdot |i)\in \overline{A}(i)\) and \(\pi ^2_n(\cdot |i)\in \overline{B}(i)\), which are compact metric spaces under the Wasserstein metric; a diagonal argument over the countable state space \(S\) then yields a single subsequence that works for all \(i\in S\). Next, we state our main convergence result.
Theorem 3.4
Suppose that the game models \(\mathcal {G}\) and \(\{\mathcal {G}_n\}_{n\ge 1}\) satisfy Assumptions 2.2 and 2.9. If \(\mathcal {G}_n\rightarrow \mathcal {G}\) then the following statements are satisfied.
-
(i)
For all \(i\in S\), \(\lim _{n\rightarrow \infty } V^\alpha _n(i)=V^\alpha (i)\).
-
(ii)
If \((\pi ^1_n,\pi ^2_n)\) is a pair of optimal randomized stationary strategies for \(\mathcal {G}_n\), then any limit strategy \((\pi ^1,\pi ^2)\in \Pi ^1_s\times \Pi ^2_s\) is a pair of optimal randomized stationary strategies for \(\mathcal {G}\).
Proof
-
(i)
Recall that the sequence \(\{V^\alpha _n\}\) of the values of the games \(\mathcal {G}_n\) verifies
$$\begin{aligned} |V^\alpha _n(i)|\le \mathfrak {M}w(i)\quad \hbox {for all}\,\, n\ge 1\,\, \hbox {and}\,\, i\in S_n; \end{aligned}$$see (2.10). Therefore, using a diagonal argument, we deduce the existence of \(u\in \mathcal {B}_w(S)\) and a subsequence \(\{k_n\}\) such that
$$\begin{aligned} \lim _{n\rightarrow \infty } V^\alpha _{k_n}(i)=u(i)\quad \hbox {for all}\,\, i\in S. \end{aligned}$$Fix \(i\in S\) and, for \(n\) such that \(k_n\ge n(i)\), consider the function on \(\overline{A}_{k_n}(i)\times \overline{B}_{k_n}(i)\)
$$\begin{aligned} (\phi ,\psi )\mapsto r_{k_n}(i,\phi ,\psi )+\sum _{j\in S_{k_n}} q^{k_n}_{ij}(\phi ,\psi ) V^\alpha _{k_n}(j). \end{aligned}$$This function is continuous as a consequence of Corollary 2.10 and Theorem 3.2 in Billingsley (1968). Therefore,
$$\begin{aligned} \phi \mapsto \inf _{\psi \in \overline{B}_{k_n}(i)}\Big \{ r_{k_n}(i,\phi ,\psi )+\sum _{j\in S_{k_n}} q^{k_n}_{ij}(\phi ,\psi ) V^\alpha _{k_n}(j)\Big \} \end{aligned}$$is upper semi-continuous on the compact set \(\overline{A}_{k_n}(i)\) and, hence, attains its maximum at some \(\phi _n\in \overline{A}_{k_n}(i)\). There exists a further subsequence \(\{k_{n'}\}\) such that \(\phi _{n'}\mathop {\longrightarrow }\limits ^{d} \phi _0\) for some \(\phi _0\in \overline{A}(i)\). Without loss of generality, and to simplify the notation, we will suppose that the whole sequence \(\{\phi _n\}\) converges to \(\phi _0\). Fix now arbitrary \(\psi \in \overline{B}(i)\). For each \(n\) there exist
$$\begin{aligned} x_1,\ldots , x_t \in B(i)\quad \hbox {and}\quad \beta _1,\ldots ,\beta _{t}\in [0,1] \end{aligned}$$with \(\sum \beta _j=1\) such that \(d_W(\psi ,\hat{\psi }_n)\le 1/n\), with \(\hat{\psi }_n=\sum \beta _j\delta _{x_j}\). Let \(y_j\in B_{k_n}(i)\) be such that \(d_B(y_j,x_j)=\min _{y\in B_{k_n}(i)} d_B(y,x_j)\) for each \(j=1,\ldots ,t\), and define
$$\begin{aligned} \tilde{\psi }_n=\sum _{j=1}^t \beta _j \delta _{y_j}\in \overline{B}_{k_n}(i). \end{aligned}$$If \(f\) is a bounded \(L\)-Lipschitz continuous function on \(B(i)\) then we have
$$\begin{aligned} \Big |\int fd\tilde{\psi }_n-\int fd\psi \Big | \le \Big |\int fd\tilde{\psi }_n-\int fd\hat{\psi }_n\Big | + \Big |\int fd\hat{\psi }_n-\int fd\psi \Big |. \end{aligned}$$We note that
$$\begin{aligned} \Big |\int fd\tilde{\psi }_n-\int fd\hat{\psi }_n\Big |&= \Big | \sum _{j=1}^t \beta _j[ f(y_j)-f(x_j)] \Big | \ \le \ L\sum _{j=1}^t \beta _j d_B(y_j,x_j) \\&\le L d_H(B_{k_n}(i),B(i)), \end{aligned}$$which converges to \(0\) as \(n\rightarrow \infty \). On the other hand, since \(\hat{\psi }_n\mathop {\longrightarrow }\limits ^{d} \psi \) we have \(\int fd\hat{\psi }_n\rightarrow \int fd\psi \). So, we have shown that
$$\begin{aligned} \lim _{n\rightarrow \infty }\int fd\tilde{\psi }_n=\int fd\psi \end{aligned}$$for all bounded and Lipschitz-continuous functions on \(B(i)\). This implies that \(\tilde{\psi }_n\mathop {\longrightarrow }\limits ^{d} \psi \). Summarizing, given arbitrary \(\psi \in \overline{B}(i)\) we have constructed \(\tilde{\psi }_n\in \overline{B}_{k_n}(i)\) such that \(\{\tilde{\psi }_n\}\) converges weakly to \(\psi \). By Theorem 2.11(ii), the value \(V^\alpha _{k_n}\) of the game \(\mathcal {G}_{k_n}\) verifies
$$\begin{aligned} \alpha V^\alpha _{k_n}(i)&= \sup _{\phi \in \overline{A}_{k_n}(i)} \inf _{\psi \in \overline{B}_{k_n}(i)} \Big \{r_{k_n}(i,\phi ,\psi )+\sum _{j\in S_{k_n}} q^{k_n}_{ij}(\phi ,\psi )V^\alpha _{k_n}(j) \Big \} \\&= \inf _{\psi \in \overline{B}_{k_n}(i)} \Big \{r_{k_n}(i,\phi _n,\psi )+\sum _{j\in S_{k_n}} q^{k_n}_{ij}(\phi _n,\psi )V^\alpha _{k_n}(j) \Big \}\\&\le r_{k_n}(i,\phi _n,\tilde{\psi }_n)+\sum _{j\in S_{k_n}} q^{k_n}_{ij}(\phi _n,\tilde{\psi }_n)V^\alpha _{k_n}(j). \end{aligned}$$Taking the limit as \(n\rightarrow \infty \) and recalling Lemma 3.3, we obtain
$$\begin{aligned} \alpha u(i) \le r(i,\phi _0,\psi )+\sum _{j\in S} q_{ij}(\phi _0,\psi )u(j). \end{aligned}$$But \(\psi \in \overline{B}(i)\) being arbitrary, we conclude that
$$\begin{aligned} \alpha u(i) \le \inf _{\psi \in \overline{B}(i)} \Big \{ r(i,\phi _0,\psi )+\sum _{j\in S} q_{ij}(\phi _0,\psi )u(j) \Big \}, \end{aligned}$$and so,
$$\begin{aligned} \alpha u(i) \le \sup _{\phi \in \overline{A}(i)}\inf _{\psi \in \overline{B}(i)} \Big \{ r(i,\phi ,\psi )+\sum _{j\in S} q_{ij}(\phi ,\psi )u(j) \Big \}. \end{aligned}$$Arguing similarly, we can show that
$$\begin{aligned} \alpha u(i) \ge \inf _{\psi \in \overline{B}(i)}\sup _{\phi \in \overline{A}(i)} \Big \{ r(i,\phi ,\psi )+\sum _{j\in S} q_{ij}(\phi ,\psi )u(j) \Big \}. \end{aligned}$$Combining these two inequalities, we conclude that
$$\begin{aligned} \alpha u(i)&= \sup _{\phi \in \overline{A}(i)}\inf _{\psi \in \overline{B}(i)} \Big \{ r(i,\phi ,\psi )+\sum _{j\in S} q_{ij}(\phi ,\psi )u(j) \Big \}\\&= \inf _{\psi \in \overline{B}(i)}\sup _{\phi \in \overline{A}(i)} \Big \{ r(i,\phi ,\psi )+\sum _{j\in S} q_{ij}(\phi ,\psi )u(j) \Big \} \end{aligned}$$for each \(i\in S\). By Theorem 2.7(ii), this implies that \(u\) equals \(V^\alpha \), the value of the game \(\mathcal {G}\). So far, we have shown that if \(u\) is any limit point of \(\{V^\alpha _n\}\) then, necessarily, \(u=V^\alpha \). But this implies that \(\lim _{n\rightarrow \infty } V^\alpha _n(i)=V^\alpha (i)\) for all \(i\in S\). The proof of (i) is now complete.
-
(ii)
Suppose that \((\pi ^1,\pi ^2)\in \Pi ^1_s\times \Pi ^2_s\) is a limit strategy through the subsequence \(\{n'\}\). Fix \(i\in S\) and write
$$\begin{aligned} \phi ^*_n=\pi ^1_n(\cdot |i),\quad \psi ^*_n=\pi ^2_n(\cdot |i),\quad \phi ^*=\pi ^1(\cdot |i),\quad \psi ^*=\pi ^2(\cdot |i) \end{aligned}$$for \(n\ge n(i)\). Then, we have
$$\begin{aligned} \phi ^*_{n'}\mathop {\longrightarrow }\limits ^{d} \phi ^*\quad \hbox {and}\quad \psi ^*_{n'}\mathop {\longrightarrow }\limits ^{d} \psi ^*. \end{aligned}$$For \(n'\ge n(i)\), we know that \(\phi ^*_{n'}\) and \(\psi ^*_{n'}\) attain the supremum and the infimum in the Shapley equation for \(\mathcal {G}_{n'}\) for the state \(i\); recall Theorem 2.11(iii). Therefore,
$$\begin{aligned} \alpha V^\alpha _{n'}(i)&= \sup _{\phi \in \overline{A}_{n'}(i)} \inf _{\psi \in \overline{B}_{n'}(i)} \Big \{r_{n'}(i,\phi ,\psi )+\sum _{j\in S_{n'}} q^{n'}_{ij}(\phi ,\psi )V^\alpha _{n'}(j) \Big \} \nonumber \\&= \inf _{\psi \in \overline{B}_{n'}(i)} \Big \{r_{n'}(i,\phi ^*_{n'},\psi )+\sum _{j\in S_{n'}} q^{n'}_{ij}(\phi ^*_{n'},\psi )V^\alpha _{n'}(j) \Big \}. \end{aligned}$$(3.6)
Proceeding as in the proof of part (i), we can show that for every \(\psi \in \overline{B}(i)\) there exists a sequence \(\psi _{n'}\in \overline{B}_{n'}(i)\) such that \(\psi _{n'}\mathop {\longrightarrow }\limits ^{d}\psi \), and so, by (3.6),
$$\begin{aligned} \alpha V^\alpha _{n'}(i)&\le r_{n'}(i,\phi ^*_{n'},\psi _{n'})+\sum _{j\in S_{n'}} q^{n'}_{ij}(\phi ^*_{n'},\psi _{n'})V^\alpha _{n'}(j). \end{aligned}$$Taking the limit as \(n'\rightarrow \infty \) and recalling that \(V^\alpha _n\) converges pointwise to \(V^\alpha \) (part (i) of this theorem) give
$$\begin{aligned} \alpha V^\alpha (i) \le r(i,\phi ^*,\psi )+\sum _{j\in S} q_{ij}(\phi ^*,\psi )V^\alpha (j). \end{aligned}$$Since \(\psi \in \overline{B}(i)\) is arbitrary, we have
$$\begin{aligned} \alpha V^\alpha (i) \le \inf _{\psi \in \overline{B}(i)}\Big \{r(i,\phi ^*,\psi )+\sum _{j\in S} q_{ij}(\phi ^*,\psi )V^\alpha (j)\Big \}. \end{aligned}$$But from the Shapley equation for \(\mathcal {G}\) we know that
$$\begin{aligned} \alpha V^\alpha (i) =\sup _{\phi \in \overline{A}(i)} \inf _{\psi \in \overline{B}(i)}\Big \{r(i,\phi ,\psi )+\sum _{j\in S} q_{ij}(\phi ,\psi )V^\alpha (j)\Big \}. \end{aligned}$$Hence, \(\phi ^*\) attains the supremum in the Shapley equation for \(i\in S\). Similarly, \(\psi ^*\) attains the infimum in the Shapley equation for \(\mathcal {G}\) and \(i\in S\) and, by Theorem 2.7(iii), this implies that \((\pi ^1,\pi ^2)\in \Pi ^1_s\times \Pi ^2_s\) is indeed a pair of optimal strategies for \(\mathcal {G}\).
\(\square \)
4 Finite state and actions approximations
Given a game model \(\mathcal {G}\) satisfying Assumption 2.2, we show how to construct a sequence of game models \(\{\mathcal {G}_n\}_{n\ge 1}\) for which Assumption 2.9 holds. For each \(n\ge 1\), the elements of the game model \(\mathcal {G}_n\) are the following.
-
The state space is \(S_n=\{0,1,\ldots ,n\}\).
-
For \(i\in S_n\), let \(A_n(i)\) and \(B_n(i)\) be finite subsets of \(A(i)\) and \(B(i)\), respectively, such that
$$\begin{aligned} d_H(A_n(i),A(i))\rightarrow 0\quad \hbox {and}\quad d_H(B_n(i),B(i))\rightarrow 0 \end{aligned}$$as \(n\rightarrow \infty \).
-
Given \(i\in S_n\) and \(0\le j<n\), define \(q^n_{ij}(a,b)=q_{ij}(a,b)\), and let
$$\begin{aligned} q_{in}^n(a,b)=\sum _{j\ge n}q_{ij}(a,b) \end{aligned}$$for \((a,b)\in A_n(i)\times B_n(i)\).
-
The reward/cost rate is \(r_n(i,a,b)=r(i,a,b)\) for \((a,b)\in A_n(i)\times B_n(i)\).
We note that the construction of \(A_n(i)\) and \(B_n(i)\) with convergence in the Hausdorff metric is indeed possible. For instance, for each \(n\ge 1\), consider the open cover of \(A(i)\) given by the open balls centered at \(a\in A(i)\) with radius \(1/n\), and let \(A_n(i)\) be the set of centers of a finite subcover. Then, \(d_H(A(i),A_n(i))\le 1/n\).
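For a concrete illustration of this construction, take \(A(i)=[0,1]\) with the Euclidean metric: the uniform grid with \(n+1\) points is a set of centers of a finite subcover by balls of radius \(1/n\), and it satisfies \(d_H(A_n(i),A(i))\le 1/(2n)\le 1/n\). A minimal sketch (the function names are ours):

```python
def finite_net(n):
    """Centers of a finite subcover of A(i) = [0, 1] by open balls of radius
    1/n: here, simply the uniform grid with n + 1 points."""
    return [k / n for k in range(n + 1)]

def hausdorff_to_interval(net, samples=10_000):
    """Approximate d_H(A_n(i), A(i)) for A(i) = [0, 1]: since the net is a
    subset of [0, 1], this is the largest distance from a point of [0, 1]
    to its nearest net point (evaluated on a fine sampling grid)."""
    return max(min(abs(t - a) for a in net)
               for t in (j / samples for j in range(samples + 1)))
```

For \(n=4\) the grid is \(\{0,1/4,1/2,3/4,1\}\) and the Hausdorff distance to \([0,1]\) is \(1/8\).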
Theorem 4.1
If the game model \(\mathcal {G}\) satisfies Assumption 2.2, then the sequence \(\{\mathcal {G}_n\}_{n\ge 1}\) defined above satisfies Assumption 2.9 and, moreover, \(\mathcal {G}_n\rightarrow \mathcal {G}\).
Proof
First of all, we observe that the transition rates of \(\mathcal {G}_n\) are conservative:
and stable. Indeed, \(-q^n_{ii}(a,b)=-q_{ii}(a,b)\le w(i)\) for \(i<n\) and
Concerning Assumption 2.9(i), observe that for all \((i,a,b)\in \mathbb {K}_n\)
where we make use of the monotonicity of \(w\). The fact that \(q_n(i)\le w(i)\) has been established along with the stability of the transition rates of \(\mathcal {G}_n\). So, Assumption 2.9(i) indeed holds.
Clearly, Assumptions 2.9(ii)–(iii) are also satisfied, while Assumption 2.9(iv) is proved similarly to Assumption 2.9(i).
It remains to check that \(\mathcal {G}_n\rightarrow \mathcal {G}\). Items (a) and (b) in Definition 3.1 hold by construction of \(\mathcal {G}_n\). Finally, given \(i,j\in S\), if \((a_n,b_n)\in A_n(i)\times B_n(i)\) are such that \(a_n\rightarrow a\in A(i)\) and \(b_n\rightarrow b\in B(i)\) then
for \(n> i\vee j\), and so Definitions 3.1(c)–(d) hold by continuity of the transition and reward/cost rates of \(\mathcal {G}\). \(\square \)
As a consequence of Theorem 3.4, the value functions of the finite state and actions games \(\mathcal {G}_n\) converge to the value function of \(\mathcal {G}\), and any limit strategy of optimal stationary strategies for \(\mathcal {G}_n\) is optimal for \(\mathcal {G}\). Next, we address the issue of the rate of convergence of \(V_n^\alpha (i)\) to \(V^\alpha (i)\). To establish such convergence rates, we need to strengthen our hypotheses.
Assumption 4.2
-
(i)
For each \(i,j\in S\), the functions \((a,b)\mapsto r(i,a,b)\) and \((a,b)\mapsto q_{ij}(a,b)\) are \(L_{r_i}\)- and \(L_{q_{ij}}\)-Lipschitz continuous on \(A(i)\times B(i)\), i.e.,
$$\begin{aligned} |r(i,a,b)-r(i,a',b')|&\le L_{r_i}\big ( d_A(a,a')+d_B(b,b')\big )\\ |q_{ij}(a,b)-q_{ij}(a',b')|&\le L_{q_{ij}}\big ( d_A(a,a')+d_B(b,b')\big ) \end{aligned}$$for all \(a,a'\in A(i)\) and \(b,b'\in B(i)\), and some \(L_{r_i}>0\) and \(L_{q_{ij}}>0\).
-
(ii)
With \(w\) the Lyapunov function in Assumption 2.2, there exist constants \(\delta >2\), \(c_\delta <\alpha \), and \(d_\delta \ge 0\) with
$$\begin{aligned} \sum _{j\in S}q_{ij}(a,b)w^{\delta }(j)\le c_\delta w^{\delta }(i)+d_\delta \quad \hbox {for all}\,\, (i,a,b)\in \mathbb {K}. \end{aligned}$$(4.1)
Before establishing our main result, we need two preliminary lemmas.
Lemma 4.3
Suppose that the function \(h:S\rightarrow [0,\infty )\) satisfies \(q(i)\le h(i)\) for all \(i\in S\). If there exists a power \(\gamma >0\) and a constant \(c_\gamma \ge 0\) such that
then for every power \(0<\gamma '<\gamma \)
Proof
Fix \((i,a,b)\in \mathbb {K}\) and \(\eta >0\). Rewrite (4.2) as
Let
These coefficients are nonnegative and \(\sum _{j\in S}p_j=1\). Therefore, (4.3) is equivalent to
Using Jensen’s inequality for the concave function \(x\mapsto x^{\gamma '/\gamma }\) yields
or, equivalently,
Since \(0<\gamma '/\gamma <1\), we have
and so
which completes the proof. \(\square \)
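To make the argument concrete, assume that (4.2) reads \(\sum _{j\in S}q_{ij}(a,b)w^{\gamma }(j)\le c_\gamma w^{\gamma }(i)\) and take the natural choice of coefficients \(p_i=1+q_{ii}(a,b)/(h(i)+\eta )\) and \(p_j=q_{ij}(a,b)/(h(i)+\eta )\) for \(j\ne i\); under this reading, the chain of estimates can be summarized as

```latex
\begin{aligned}
\sum_{j\in S} p_j\, w^{\gamma}(j)
 &\le \Big(1+\frac{c_\gamma}{h(i)+\eta}\Big)\, w^{\gamma}(i),\\
\sum_{j\in S} p_j\, w^{\gamma'}(j)
 &\le \Big(\sum_{j\in S} p_j\, w^{\gamma}(j)\Big)^{\gamma'/\gamma}
  \le \Big(1+\frac{c_\gamma}{h(i)+\eta}\Big)^{\gamma'/\gamma} w^{\gamma'}(i),\\
\sum_{j\in S} q_{ij}(a,b)\, w^{\gamma'}(j)
 &= (h(i)+\eta)\Big(\sum_{j\in S} p_j\, w^{\gamma'}(j)-w^{\gamma'}(i)\Big)
  \le \frac{\gamma'}{\gamma}\, c_\gamma\, w^{\gamma'}(i),
\end{aligned}
```

where the second line is Jensen's inequality for the concave function \(x\mapsto x^{\gamma '/\gamma }\), and the last inequality uses \((1+x)^{\gamma '/\gamma }\le 1+(\gamma '/\gamma )x\) for \(x\ge 0\). The final bound does not depend on \(\eta \), so letting \(\eta \downarrow 0\) is harmless.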
Consequently, if there exists a power \(\gamma >0\) such that the Lyapunov function \(w\) verifies
for some constants \(c_\gamma \in \mathbb {R}\) and \(d_\gamma \ge 0\), then for every \(0<\gamma '<\gamma \)
(indeed, just note that \(\sum q_{ij}(a,b)w^\gamma (j)\le (|c_\gamma |+d_\gamma )w^\gamma (i)\) and use Lemma 4.3). In particular, if Assumption 4.2(ii) holds then necessarily Assumption 2.2(iv) is satisfied and, moreover,
Lemma 4.4
Consider a fixed \(n\ge 1\) and suppose that the game model \(\mathcal {G}_n\) satisfies Assumption 2.9. Suppose that there exists a function \(u\in \mathcal {B}_w(S_n)\) such that, for all \(i\in S_n\),
for some \(h(i)\ge 0\). Assume, in addition, that there exist constants \(c_h<\alpha \) and \(d_h\ge 0\) such that
Under these conditions,
Proof
First of all, we note that for every \((\pi ^1,\pi ^2)\in \Pi ^1_n\times \Pi ^2_n\), \(t\ge 0\), and \(i\in S_n\) we have
or \(E^{i,\pi ^1,\pi ^2}_n [h(x(t))] \le h(i)+d_ht\) when \(c_h=0\) (the proof of these inequalities is similar to that of (2.4)). Therefore, in either case,
For every \(i\in S_n\), there exists some \(\phi \in \overline{A}_n(i)\) such that for all \(\psi \in \overline{B}_n(i)\)
Consequently, there exists a stationary policy \(\pi ^1\in \Pi ^{1,s}_{n}\) such that for every stationary \(\pi ^2\in \Pi ^{2,s}_n\)
Using Dynkin’s formula gives, for every \(i\in S_n\) and \(t\ge 0\),
Now we let \(t\rightarrow \infty \) in this inequality. Recalling (2.4) and Remark 2.4 for the game model \(\mathcal {G}_n\), and using dominated and monotone convergence, we obtain
where we have used (4.5). Since this inequality holds for some \(\pi ^1\) and all \(\pi ^2\), we obtain
for all \(i\in S_n\) (use Remark 2.8 for the game model \(\mathcal {G}_n\)).
Observe now that, since the sets \(A_n(i)\) and \(B_n(i)\) are finite, we have
see Theorem 1 in Frenk et al. (2004). So, using a symmetric argument with the inequality
gives the existence of \(\pi ^2\in \Pi ^{2,s}_n\) such that for all \(\pi ^1\in \Pi ^{1,s}_n\)
and, therefore,
Together with (4.6), this proves the stated result. \(\square \)
Finally, we state our main result on the convergence rates to the value of the game.
Theorem 4.5
Suppose that the game model \(\mathcal {G}\) satisfies Assumptions 2.2 and 4.2.
Let \(\{\mathcal {G}_n\}_{n\ge 1}\) be the sequence of finite state and actions truncations of \(\mathcal {G}\) constructed at the beginning of Sect. 4, and suppose that the action sets for \(\mathcal {G}_n\) are chosen in such a way that, for all \(n\ge 1\) and \(i\in S_n\), and for some constant \(D>0\),
Under these conditions, there exists a constant \(\mathfrak {c}>0\) such that, for every \(n\ge 1\) and \(i\in S_n\),
Proof
Fix \(n\ge 1\) and \(i\in S_n\). We have
Note that for every \((\phi ,\psi )\in \overline{A}(i)\times \overline{B}(i)\)
and recalling that \(||V^\alpha ||_w\le \mathfrak {M}\),
Observe now that proceeding as in the proof of Corollary 2.6(i) and recalling (4.4), we can show that
Therefore, combining (4.7) and (4.8), we obtain
with
By upper semicontinuity, the above supremum is indeed attained. Consequently, there exists \(\phi \in \overline{A}(i)\) such that, for every \(\psi \in \overline{B}_n(i)\), we have
Given arbitrary \(\epsilon >0\), there exists a finite set \(\{x_1,\ldots ,x_k\}\subseteq A(i)\) and \(\beta _1,\ldots ,\beta _k\ge 0\), with \(\beta _1+\ldots +\beta _k=1\), such that
For every \(x_j\), let \(\hat{x}_j\in A_n(i)\) be such that
It is easy to see (recall (3.3)) that
and so, letting \(\hat{\phi }=\sum _{j=1}^k \beta _j \delta _{\hat{x}_j}\in \overline{A}_n(i)\),
Summarizing, for \(\phi \in \overline{A}(i)\) we have found a probability measure \(\hat{\phi }\in \overline{A}_n(i)\) which is “close” to \(\phi \) in the Wasserstein metric. By the Lipschitz continuity in Assumption 4.2, observe that the function on \(A(i)\times B(i)\) given by
is Lipschitz continuous, with Lipschitz constant \(L_{r_i}+2\mathfrak {M}w(n)\sum _{j=0}^{n-1}L_{q_{ij}}\). Consequently, the same applies to
Use now (3.3) to derive that
Therefore, recalling (4.9), this yields that \(\alpha V^\alpha (i)\) is less than or equal to
Since this holds for all \(\psi \in \overline{B}_n(i)\) and the particular \(\hat{\phi }\in \overline{A}_n(i)\) constructed above, we deduce that
Since \(\epsilon >0\) is arbitrary, recalling our hypothesis on \(d_H(A(i),A_n(i))\) we derive that
where the last equality is derived from the definition of the reward and transition rates of \(\mathcal {G}_n\).
Using a symmetric argument, we can show that
As in the proof of Theorem 4.1, we can show that the inequality in Assumption 4.2(ii) is satisfied by the transition rates of \(\mathcal {G}_n\) with the same constants \(c_\delta \) and \(d_\delta \). Therefore, by Lemma 4.4, we conclude that, for every \(i\in S_n\),
Recalling the definition of the constants \(\overline{C}\) and \(\mathfrak {M}\), and letting
we have
for all \(n\ge 1\) and \(i\in S_n\). \(\square \)
The above theorem shows that, if a Lyapunov condition holds for the function \(w^{\delta }\) (with \(\delta >2\)) then, by making a suitable choice of the finite sets of actions \(A_n(i)\) and \(B_n(i)\), the error when approximating \(V^\alpha (i)\) with \(V^\alpha _n(i)\) is of order \(1/w^{\delta -2}(n+1)\). Moreover, we have an explicit expression for the multiplicative constant \(\mathfrak {c}\) that depends on the initial data (and related constants) of the game model \(\mathcal {G}\).
Solving a finite state and actions game. Consider the finite state and actions game \(\mathcal {G}_n\) defined in the beginning of Sect. 4. Let \(\mathbf {q}_n>0\) be such that
(it suffices to let \(\mathbf {q}_n>w(n)\)). For \(u=\{u(i)\}_{i\in S_n}\in \mathbb {R}^{n+1}\) define the operator \(T_n u\in \mathbb {R}^{n+1}\) as
for \(i\in S_n\). Define also \(\widetilde{T}_n u\in \mathbb {R}^{n+1}\) as
for \(i\in S_n\) (cf. Section 8 in Guo and Hernández-Lerma 2005). It is easily seen that the equation \(\alpha u=T_nu\) is equivalent to the fixed point equation \(u=\widetilde{T}_nu\). Therefore, as a consequence of Theorem 2.11, the value \(V_n^\alpha \) of the game \(\mathcal {G}_n\) is the unique fixed point of the operator \(\widetilde{T}_n\). Moreover, by a standard calculation, it follows that \(\widetilde{T}_n\) is a contraction operator on \(\mathbb {R}^{n+1}\) with modulus \(\mathbf {q}_n/(\alpha +\mathbf {q}_n)<1\) when considering the supremum norm; that is,
Hence, the iterative procedure (which is a sort of value iteration algorithm):
-
1.
Fix arbitrary \(u_0\in \mathbb {R}^{n+1}\),
-
2.
For \(k\ge 1\), let \(u_k=\widetilde{T}_nu_{k-1}\),
converges geometrically to \(V_n^\alpha \) in the supremum norm. Concerning the computation of \(\widetilde{T}_nu\) for a given \(u\in \mathbb {R}^{n+1}\), we can apply our next lemma, which uses the following notation. Given a positive integer \(N\), define \(\Delta _N\) as the set of nonnegative \(\lambda _1,\ldots ,\lambda _N\) such that \(\lambda _1+\ldots +\lambda _N=1\).
Lemma 4.6
Given the real-valued matrix \(C=\{C_{s,t}\}_{1\le s\le I,1\le t\le J}\), define
Let \(\mathbf {c}\ge 0\) be such that all the elements of the matrix \(D\), with \(D_{s,t}=C_{s,t}+\mathbf {c}\), are strictly positive. Consider the linear programming problem
and let \(\mathbf {x}^*\in \mathbb {R}^I\) be an optimal solution. Then, \( V^*=\frac{1}{\mathbf {1}'\mathbf {x}^*}-\mathbf {c}\).
Proof
We have
and suppose that \(D_{s,t}\ge \epsilon >0\) for all \(s\) and \(t\). Observe that \(\tilde{V}\) equals the optimum of the linear programming problem: maximize \(v\) subject to
with \(v\ge \epsilon \) and \(\lambda \in \Delta _I\). Letting \(x_s=\lambda _s/v\) for \(s=1,\ldots ,I\), it follows that \(1/\tilde{V}\) equals the optimum of:
where the last constraint is redundant. \(\square \)
Therefore, once \(u_{k-1}\) is known, we can effectively compute \(u_k\) by solving the linear programming problem described in Lemma 4.6. Namely, given \(i\in S_n\) and for all \(a_s\in A_n(i)\) with \(1\le s\le I\), and all \(b_t\in B_n(i)\) with \(1\le t\le J\), define
and then use Lemma 4.6 to determine \(u_k\).
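The reduction in Lemma 4.6 is immediate to implement with an off-the-shelf LP solver. The sketch below is ours and relies on `scipy.optimize.linprog` (not mentioned in the paper); it returns the value \(V^*\) and an optimal mixed strategy \(\lambda ^*\) for the maximizer:

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(C):
    """Value max_lambda min_t sum_s lambda_s C[s, t] of the matrix game C,
    computed through the linear program of Lemma 4.6."""
    C = np.asarray(C, dtype=float)
    shift = max(0.0, -C.min()) + 1.0            # a constant c making D > 0
    D = C + shift
    I, J = C.shape
    # minimize 1'x  subject to  D'x >= 1  and  x >= 0
    res = linprog(np.ones(I), A_ub=-D.T, b_ub=-np.ones(J),
                  bounds=[(0, None)] * I, method="highs")
    x = res.x
    return 1.0 / x.sum() - shift, x / x.sum()   # V* and optimal lambda
```

For matching pennies, \(C_{s,t}=\pm 1\) with \(C_{1,1}=C_{2,2}=1\), this returns the value \(0\) and \(\lambda ^*=(1/2,1/2)\).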
Regarding a stopping criterion for the above algorithm, we have the following result. In the next lemma, the norm \(||\cdot ||\) refers to the supremum norm on \(\mathbb {R}^{n+1}\).
Lemma 4.7
Given the finite state and actions game model \(\mathcal {G}_n\), consider the sequence of iterates \(\{u_k\}_{k\ge 0}\), where \(u_0\in \mathbb {R}^{n+1}\) is arbitrary and, for \(k\ge 1\), \(u_k=\widetilde{T}_nu_{k-1}\). Fix \(\epsilon >0\) and let \(k\ge 1\) be such that \(||u_{k-1}-u_{k}||\le \epsilon \alpha /\mathbf {q}_n\). The following statements hold.
-
(i)
\(||u_k-V_n^\alpha ||\le \epsilon \).
-
(ii)
The strategy \(\pi _*^1\in \Pi ^{1,s}_n\) such that, for all \(i\in S_n\), \(\pi ^1_*(\cdot |i)\) attains the maximum in (4.11) for the iteration \(u_{k+1}=\widetilde{T}_nu_k\) is \(2\epsilon \)-optimal for player 1, meaning that
$$\begin{aligned} V^\alpha _n(i)-2\epsilon \le \inf _{\pi ^2\in \Pi ^{2}_n} V_n^\alpha (i,\pi _*^1,\pi ^2)\quad \hbox {for all}\,\, i\in S_n. \end{aligned}$$ -
(iii)
The strategy \(\pi _*^2\in \Pi ^{2,s}_n\) such that, for all \(i\in S_n\), \(\pi ^2_*(\cdot |i)\) attains the minimum in (4.12) for the iteration \(u_{k+1}=\widetilde{T}_nu_k\) is \(2\epsilon \)-optimal for player 2, meaning that
$$\begin{aligned} V^\alpha _n(i)+2\epsilon \ge \sup _{\pi ^1\in \Pi ^{1}_n} V_n^\alpha (i,\pi ^1,\pi ^2_*)\quad \hbox {for all}\,\, i\in S_n. \end{aligned}$$
Proof
(i) We have
because \(V^\alpha _n\) is the fixed point of \(\widetilde{T}_n\), and so
(ii) For \(u:S_n\rightarrow \mathbb {R}\), consider the operator
which is a contraction on \(\mathbb {R}^{n+1}\) with modulus \(\frac{\mathbf {q}_n}{\alpha +\mathbf {q}_n}\), and let \(W\) be its unique fixed point. The fixed point equation
corresponds to the discounted cost optimality equation of a continuous-time control problem (for player 2) when the strategy of player 1 is \(\pi ^1_*\); see (Prieto-Rumeau and Hernández-Lerma 2012, Section 3.3). Therefore, \(W(i)=\inf _{\pi ^2\in \Pi ^2_n}V^\alpha _n(i,\pi ^1_*,\pi ^2)\) for all \(i\in S_n\).
Observe now that
Now, on the one hand,
because \(\widetilde{T}_nu_{k}=\widetilde{U}u_k\), and so
On the other hand, as established in part (i), \(||u_{k}-V^\alpha _n||\le \frac{\mathbf {q}_n}{\alpha }||u_{k-1}-u_{k}||\). From (4.13), we obtain
and the result follows. The proof of (iii) is similar. \(\square \)
As a consequence of this lemma, we can explicitly obtain an approximation of the value and nearly optimal strategies for both players for the game model \(\mathcal {G}_n\).
5 Numerical application
In this section, we describe a Markov game \(\mathcal {G}\) on a population system and apply our approximation techniques to it numerically.
A population system is managed by players 1 and 2. The natural birth and death rates per individual are \(\lambda >0\) and \(\mu >0\), respectively. Player 1 is interested in the system having a large population and, to this end, player 1 can decrease the mortality rate (for instance, using a suitable medical policy). On the other hand, the goal of player 2 is to have a small number of individuals; player 2 can choose policies that decrease the birth rate of the system (e.g., discouraging immigration).
We consider the following game model.
-
The state space, standing for the number of individuals in the population, is \(S=\{0,1,2,\ldots \}\).
-
The action sets of the players are \(A=B=[-1,1]\), while \(A(i)=B(i)=[-1,1]\) for all \(i\in S\).
-
The system’s transition rates \(q_{ij}(a,b)\) satisfy \(q_{ij}(a,b)=0\) when \(|i-j|>1\). When \(|i-j|\le 1\) we let
$$\begin{aligned} q_{01}(a,b)=-q_{00}(a,b)=\lambda -C_b|b|, \end{aligned}$$and, for \(i\ge 1\),
$$\begin{aligned} q_{i,i-1}(a,b)=\mu i-C_a|a|\sqrt{i},\quad q_{i,i+1}(a,b)=\lambda i -C_b|b|i, \end{aligned}$$with \(q_{ii}(a,b)=-(q_{i,i-1}(a,b)+q_{i,i+1}(a,b))\), for some constants \(0<C_a<\mu \) and \(0<C_b<\lambda \), and all \((a,b)\in A\times B\).
-
The payoff rate (interpreted as a reward for player 1 and a cost for player 2) is given by
$$\begin{aligned} r(i,a,b)=p\,i+C_r |ab|\sqrt{i}\quad \hbox {for}\,\, i\in S\,\, \hbox {and}\,\, -\!1\le a,b\le 1, \end{aligned}$$for some constants \(p>0\) and \(C_r>0\).
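To fix ideas, the transition rates above, together with the truncation of Sect. 4, can be written out explicitly. The sketch below is ours and uses hypothetical parameter values (the constants below are not those of the numerical experiments):

```python
import math

# Hypothetical parameters satisfying 0 < C_a < mu and 0 < C_b < lambda.
LAM, MU, C_A, C_B = 1.0, 2.0, 0.5, 0.5

def q(i, j, a, b):
    """Transition rates of the population game (birth-death structure)."""
    if i == 0:
        up = LAM - C_B * abs(b)
        return {1: up, 0: -up}.get(j, 0.0)
    down = MU * i - C_A * abs(a) * math.sqrt(i)
    up = LAM * i - C_B * abs(b) * i
    if j == i - 1:
        return down
    if j == i + 1:
        return up
    if j == i:
        return -(down + up)
    return 0.0

def q_truncated(n, i, j, a, b):
    """Rates of the finite model G_n of Sect. 4 on {0,...,n}: rates to states
    j < n are unchanged, and the rates to all states >= n are lumped into n."""
    if j < n:
        return q(i, j, a, b)
    # only the neighbors of i carry positive rates (birth-death structure)
    return sum(q(i, k, a, b) for k in range(i - 1, i + 2) if k >= n)
```

Each row of the truncated generator sums to zero, i.e., the rates of \(\mathcal {G}_n\) are conservative, as checked in the proof of Theorem 4.1.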
In the above definitions, the term \(\sqrt{i}\) models the fact that the payoff behaves concavely with respect to the population size, while the term \(|ab|\) in the payoff rate captures the “saddle-point interplay” between the actions of the players. Note that when the players choose the actions \(a=0\) and \(b=0\), they do not act on the dynamic system. In this case, the corresponding Markov process (referred to as the natural population system) is recurrent when \(\lambda \le \mu \) and transient when \(\lambda >\mu \).
We consider the Lyapunov function \(w\) given by \(w(i)=(\lambda +\mu +1)\cdot (i+1)\) for \(i\in S\).
Proposition 5.1
Consider the population model \(\mathcal {G}\) defined above. If the discount rate \(\alpha >0\) satisfies \(\lambda -\mu <\alpha \) then Assumption 2.2 holds. If, in addition, we have \(2(\lambda -\mu )<\alpha \) then Assumption 4.2 is satisfied.
Proof
Fix an integer \(k\ge 1\) and consider the Lyapunov function \(i\mapsto w^k(i)\). Given a state \(i\ge 1\), we have
Noting that
some elementary calculations give
Therefore, given an integer \(k\ge 1\) and a constant \(c_k>k(\lambda -\mu )\), there exists \(d_k\ge 0\) such that
If \(\lambda -\mu <\alpha \) then choose \(\lambda -\mu <c_1<\alpha \), and so Assumption 2.2(i) holds. In particular, note also that \(-q_{ii}(a,b)\le w(i)\) for all \((i,a,b)\in \mathbb {K}\). Regarding the other statements of Assumption 2.2, item (ii) holds by letting \(M=p+C_r\), part (iii) is straightforward, and (iv) holds as a consequence of (5.1).
It should be clear that Assumption 4.2(i) is satisfied. If \(2(\lambda -\mu )<\alpha \), then choose \(\delta >2\) and \(c_{\delta }\) such that
and so Assumption 4.2(ii) holds. \(\square \)
For each \(n\ge 1\), consider now the finite state and actions game model \(\mathcal {G}_n\) as described in Sect. 4. As a consequence of Theorems 3.4, 4.1, and 4.5, we obtain the following results.
-
(i)
Case \(\lambda \le \mu \) (the natural population system is recurrent). Given an arbitrary discount rate \(\alpha >0\) we have
$$\begin{aligned} \lim _{n\rightarrow \infty } V^\alpha _n(i)=V^\alpha (i)\quad \hbox {for all } i\in S. \end{aligned}$$Given arbitrary \(k>0\), by suitably choosing the action sets \(A_n(i)\) and \(B_n(i)\) we have
$$\begin{aligned} |V^\alpha _n(i)-V^\alpha (i)|=\mathrm {O}(n^{-k})\quad \hbox {for each}\,\, i\in S. \end{aligned}$$ -
(ii)
Case \(\lambda >\mu \) (the natural population system is transient). Given a discount rate \(\lambda -\mu <\alpha \) we have
$$\begin{aligned} \lim _{n\rightarrow \infty } V^\alpha _n(i)=V^\alpha (i)\quad \hbox {for all}\,\, i\in S. \end{aligned}$$If the discount rate is such that \(2(\lambda -\mu )<\alpha \) then for each \(0<k<\frac{\alpha }{\lambda -\mu }-2\) we can choose the finite sets \(A_n(i)\) and \(B_n(i)\) such that
$$\begin{aligned} |V^\alpha _n(i)-V^\alpha (i)|=\mathrm {O}(n^{-k})\quad \hbox {for each}\,\, i\in S. \end{aligned}$$
Numerical experimentation. We choose the following values of the parameters:
For each \(n\ge 1\), we consider the truncated game model \(\mathcal {G}_n\) with state space \(\{0,1,\ldots ,n\}\). The action sets \(A_n(i)\equiv A_n\) and \(B_n(i)\equiv B_n\) consist of the \(n+1\) points \(\frac{2k}{n}-1\) for \(k=0,1,\ldots ,n\).
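As a small illustration (our own sketch, not code from the paper; the function name `truncated_model` is ours), the truncated state space and the discretized action grid \(A_n=B_n\) just described can be generated as follows.

```python
def truncated_model(n):
    """Truncated game G_n: state space {0, 1, ..., n} and the common
    action grid A_n = B_n of the n+1 points 2k/n - 1, k = 0, ..., n."""
    states = list(range(n + 1))
    grid = [2 * k / n - 1 for k in range(n + 1)]   # uniform grid on [-1, 1]
    return states, grid
```

For instance, `truncated_model(4)` yields the grid \(\{-1,-0.5,0,0.5,1\}\); the grid always contains the endpoints \(-1\) and \(1\) and has mesh \(2/n\).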
For \(n=1,\ldots ,30\), we solve the finite game model \(\mathcal {G}_n\) using the value iteration procedure described in the previous section: we start from the initial value \(u_0=\mathbf {0}\) and we let \(\mathbf {q}_n=\max _{(i,a,b)\in \mathbb {K}_n}\{-q^n_{ii}(a,b)\}+0.1\); recall (4.10). As a stopping criterion for the value iteration algorithm, we let \(\varepsilon =5\times 10^{-5}\) and we stop at the iterate \(k\) when
which ensures that \(\Vert u_k-V^\alpha _n\Vert \le \varepsilon \) (here, \(\Vert \cdot \Vert \) denotes the supremum norm in \(\mathbb {R}^{n+1}\)).
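To make the procedure concrete, here is a minimal Python sketch of such a value iteration, under our own assumptions on how the data of \(\mathcal {G}_n\) are stored: `r[i, a, b]` holds the payoff rates and `q[i, j, a, b]` the transition rates of the finite model. The inner matrix games are solved by linear programming, and the stopping rule below is the standard contraction bound \(\Vert u_k-V^\alpha _n\Vert \le (\mathbf {q}_n/\alpha )\,\Vert u_k-u_{k-1}\Vert \); it need not coincide with the paper's exact criterion.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game M (row player maximizes), via the LP:
    maximize v subject to sum_a x_a M[a, b] >= v for all b, x a distribution."""
    m, k = M.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                   # minimize -v
    A_ub = np.hstack([-M.T, np.ones((k, 1))])      # v - (x^T M)_b <= 0
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)   # sum_a x_a = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(k), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1]

def shapley_value_iteration(r, q, alpha, eps=5e-5):
    """Value iteration for the uniformized finite game: iterate
    (T u)(i) = val[ (r(i,a,b) + sum_j q_ij(a,b) u(j) + q_n u(i)) / (alpha + q_n) ],
    whose fixed point solves the Shapley equation alpha V(i) = val[r + qV]."""
    n = r.shape[0]
    qn = max(-q[i, i].min() for i in range(n)) + 0.1   # q_n = max(-q_ii) + 0.1
    u = np.zeros(n)                                    # u_0 = 0
    while True:
        Tu = np.array([
            matrix_game_value((r[i] + np.einsum('jab,j->ab', q[i], u)
                               + qn * u[i]) / (alpha + qn))
            for i in range(n)])
        # contraction bound: ||Tu - V|| <= (q_n / alpha) * ||Tu - u||
        if np.max(np.abs(Tu - u)) <= eps * alpha / qn:
            return Tu
        u = Tu
```

As a sanity check, with a single state, zero transition rates, and constant payoff rate \(c\), the iteration converges to \(c/\alpha \), as expected under the discounted criterion.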
In Fig. 1, we display the values \(V^\alpha _n(i)\) for \(i=0,1,2,3\) and \(1\le n\le 30\). We observe that the values of the games \(\mathcal {G}_n\) become stable for relatively small values of the truncation size \(n\), say for \(n\ge 20\). We obtain the approximations
By Lemma 4.7, the approximation error (with respect to the value \(V^\alpha _{30}\) of the game model \(\mathcal {G}_{30}\)) is less than \(5\times 10^{-5}\). Empirically, convergence seems to occur faster than the rate guaranteed by Theorem 4.5; this is because the bounds used to derive that rate are quite conservative.
Concerning the approximation of optimal strategies, for \(n=30\) we show in Table 1 the randomized strategies \(\pi ^1_*(\cdot |i)\) and \(\pi ^2_*(\cdot |i)\) for \(i=0\) and \(i=1\), as described in Lemma 4.7. Table 1 displays the corresponding probability distributions on the discretized action sets \(A_{30}=B_{30}\); these are \(10^{-4}\)-optimal strategies. Empirically, this suggests that an optimal strategy for player 1 in the game model \(\mathcal {G}\) is to choose his actions uniformly on \([-1,1]\) in state \(i=0\), and to randomize between the actions \(-1\) and \(1\), each with probability \(1/2\), in state \(i=1\). For player 2, the estimated optimal strategy is to randomize between the actions \(-1\) and \(1\), each with probability \(1/2\), in both states \(i=0\) and \(i=1\).
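Such randomized strategies are the optimal mixtures of the matrix games evaluated at the (approximate) value. As a self-contained sketch (our own code, not the authors' implementation), both players' optimal mixtures of a single matrix game can be extracted by linear programming; player 2's mixture is obtained by solving the game with matrix \(-M^{\top }\) from the maximizer's viewpoint.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_strategies(M):
    """Optimal mixed strategies (x for the maximizer, y for the minimizer)
    of the zero-sum matrix game with payoff matrix M."""
    def maximizer_mix(P):
        # LP: maximize v s.t. sum_a x_a P[a, b] >= v for all b, x a distribution
        m, k = P.shape
        c = np.zeros(m + 1); c[-1] = -1.0          # minimize -v
        res = linprog(c,
                      A_ub=np.hstack([-P.T, np.ones((k, 1))]),
                      b_ub=np.zeros(k),
                      A_eq=np.append(np.ones(m), 0.0).reshape(1, -1),
                      b_eq=[1.0],
                      bounds=[(0, None)] * m + [(None, None)])
        return res.x[:m]
    x = maximizer_mix(M)        # player 1 maximizes M
    y = maximizer_mix(-M.T)     # player 2 minimizes M, i.e. maximizes -M^T
    return x, y
```

For the nondegenerate matrix \(\begin{pmatrix}3&1\\0&2\end{pmatrix}\), whose value is \(3/2\), this returns \(x=(1/2,1/2)\) and \(y=(1/4,3/4)\).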
Notes
This is not the usual statement of the Portmanteau theorem. Observe, however, that the function constructed in (Billingsley 1968, Theorem 1.2) is bounded and Lipschitz continuous, and then proceed as in the proof of (Billingsley 1968, Theorem 2.1). Another reference for this result is (Bogachev 2007, Remark 8.3.1).
References
Billingsley P (1968) Convergence of probability measures. Wiley, New York
Bogachev VI (2007) Measure theory, vol II. Springer, New York
Bolley F (2008) Separability and completeness for the Wasserstein distance. In: Séminaire de probabilités XLI. Lecture Notes in Math. 1934, Springer, Berlin, pp 371–377
Chang HS, Hu JQ, Fu MC, Marcus SI (2010) Adaptive adversarial multi-armed bandit approach to two-person zero-sum Markov games. IEEE Trans Automat Control 55:463–468
Frenk JBG, Kassay G, Kolumbán J (2004) On equivalent results in minimax theory. Eur J Oper Res 157:46–58
Guo XP, Hernández-Lerma O (2003) Zero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates. J Appl Probab 40:327–345
Guo XP, Hernández-Lerma O (2003) Drift and monotonicity conditions for continuous-time controlled Markov chains with an average criterion. IEEE Trans Automat Control 48:236–244
Guo XP, Hernández-Lerma O (2005) Zero-sum continuous-time Markov games with unbounded transition and discounted payoff rates. Bernoulli 11:1009–1029
Guo XP, Hernández-Lerma O (2009) Continuous-time Markov decision processes: theory and applications. Springer, New York
Guo XP, Zhang WZ (2014) Convergence of controlled models and finite-state approximation for discounted continuous-time Markov decision processes with constraints. Eur J Oper Res 238:486–496
Jaśkiewicz A, Nowak AS (2006) Approximation of noncooperative semi-Markov games. J Optim Theory Appl 131:115–134
Nowak AS, Altman E (2002) \(\epsilon \)-equilibria for stochastic games with uncountable state space and unbounded costs. SIAM J Control Optim 40:1821–1839
Prieto-Rumeau T, Hernández-Lerma O (2012) Discounted continuous-time controlled Markov chains: convergence of control models. J Appl Probab 49:1072–1090
Prieto-Rumeau T, Hernández-Lerma O (2012) Selected topics on continuous-time controlled Markov chains and Markov games. Imperial College Press, London
Prieto-Rumeau T, Lorenzo JM (2010) Approximating ergodic average reward continuous-time controlled Markov chains. IEEE Trans Automat Control 55:201–207
Acknowledgments
Research supported by Grant MTM2012-31393 from the Spanish Ministerio de Economía y Competitividad.
Prieto-Rumeau, T., Lorenzo, J.M. Approximation of zero-sum continuous-time Markov games under the discounted payoff criterion. TOP 23, 799–836 (2015). https://doi.org/10.1007/s11750-014-0354-8