1 Introduction

Game theory is a mathematical framework for analyzing social situations among competing players and producing satisfactory decision-making for them. It has a wide range of applications in areas such as psychology, evolutionary biology, politics, the social sciences, economics, and business. Since John Nash received the Nobel Prize in 1994, twenty game theorists have been awarded the Nobel Prize in Economic Sciences. Nowadays, game theory has seen many advances across its numerous branches; see, for instance, the monographs (Barron 2013; Haurie et al. 2012).

Markov games are a class of stochastic dynamic games in which the state dynamics are driven by Markov processes. To date, Markov games have received increasing attention and have been widely investigated; see, for instance, Minjárez-Sosa (2020) for discrete-time Markov games (DTMGs) with the discounted criterion; Gensbittel and Renault (2015), Minjárez-Sosa (2020) for DTMGs with the long-run average criterion; Guo and Zhang (2017) for continuous-time (pure jump) Markov games (CTMGs) with the finite-horizon payoff criterion; Guo and Hernández-Lerma (2007), Prieto-Rumeau and Lorenzo (2015) for CTMGs with the discounted payoff criterion; Guo and Hernández-Lerma (2003), Lorenzo et al. (2015) for CTMGs with the average payoff criterion; Jaśkiewicz (2009), Mondal (2017) for semi-Markov games (SMGs) with the average criterion; Ghosh and Goswami (2006) for SMGs with the discounted criterion; and Costa and Dufour (2018) for piecewise deterministic Markov games (PDMGs) with the discounted payoff criterion.

This paper is concerned with zero-sum PDMGs, a class of Markov games whose state dynamics are driven by piecewise deterministic Markov processes (PDMPs). PDMPs evolve through random jumps at random time points, while the motion between jumps follows a deterministic flow. In particular, if the flow remains unchanged over time, PDMPs reduce to continuous-time (pure jump) Markov processes (CTMPs). These features give PDMPs wide applications in areas such as management science, operations research, and engineering. There is a vast literature on piecewise deterministic Markov decision processes (PDMDPs), where only one decision maker is considered (Bäuerle and Rieder 2011; Costa et al. 2016; Huang and Guo 2019). However, as far as we can tell, only one paper (Costa and Dufour 2018) is devoted to PDMGs. Costa and Dufour (2018) deal with zero-sum PDMGs under the infinite-horizon total expected discounted reward criterion, where the transition rate and reward functions are allowed to be unbounded. The authors exploit the special features of PDMPs to reformulate the problem as a discrete-stage zero-sum game, and derive conditions for the existence of min-max strategies.

In this paper, our problem and the assumptions on the model data are similar to those in Costa and Dufour (2018). We also consider the expected infinite-horizon discounted payoff criterion. The state space is assumed to be a Borel space, and both the transition rate and the payoff function are allowed to be unbounded. The policies of the two players are history-dependent, and the controls act continuously on the transition rate and the payoff rate. However, there are some differences between the work in Costa and Dufour (2018) and ours. First, in Costa and Dufour (2018) the two players choose actions only at jump times, and these actions affect the transition rate, reward rate and boundary reward until the next jump occurs, through a predefined mapping l(x,a,b,t), where x is the state at the beginning of the current inter-jump interval, (a,b) is the action pair chosen by the players, and t is the time elapsed since the last jump. In our work, the two players choose actions continuously over time, and these actions act continuously on the transition rate and the payoff rate. Second, our approach differs from the reduction technique in Costa and Dufour (2018). Instead, we use a so-called infinitesimal approach, which characterizes the value function as a solution to a differential equation; see Costa et al. (2016) for infinite-horizon discounted PDMDPs with bounded jump rates and Huang and Guo (2019) for finite-horizon PDMDPs with unbounded jump rates. In more detail, we develop Dynkin's formula and a comparison theorem in our setup, and then show that the game has a value function that is the unique solution to the Shapley equation, which takes the form of a differential equation. As far as we know, this paper is the first attempt to apply the infinitesimal approach to the study of PDMGs.
Third, because our approach differs from that of Costa and Dufour (2018), some of our assumptions also differ from those in Costa and Dufour (2018); see Remarks 2 and 3 for details. We also provide a simple example verifying our assumptions. Finally, using the Shapley equation derived in this paper, we establish the existence of saddle points and propose a potential algorithm for computing one. We remark that, since the Shapley equation in this paper takes the form of a differential equation, the saddle point we obtain has a very simple form: it depends only on the current state and can be applied at any time.

The rest of the paper is organized as follows. Section 2 describes the model of PDMGs and the problem formulation. Section 3 provides some preliminaries such as Dynkin’s formula and the comparison theorem. The main results on the Shapley equation and the existence of saddle points are given in Sect. 4. Section 5 provides an example that verifies all the assumptions in this paper. An appendix about the proof of Proposition 1 is included in Sect. 6.

2 Problem formulation

Notation. Let \(R=(-\infty ,\infty )\), and \(R_+=[0, \infty )\). For a Borel space X, we denote by \(D^c\) the complement of a set \(D\subseteq X\), by \(\delta _{\{x\}}(\cdot )\) the Dirac measure concentrated on \(x \in X\), by \(\mathbbm {1}_D(x)\) the indicator function on a set \(D \subseteq X\), by \({\mathscr {P}}(X)\) the family of all non-empty subsets of X, by \({\mathscr {B}}(X)\) the Borel \(\sigma \)-algebra on X, and by P(X) the space of all probability measures on (\(X,{\mathscr {B}}(X)\)).

The model of a zero-sum PDMG is a tuple as below:

$$\begin{aligned} \{E, A, B, \{A(x), B(x), x\in E\}, q(\cdot \vert x,a,b), \phi (x,t), r(x,a,b) \}. \end{aligned}$$
(1)

Here, E is the state space of the PDMP, while A and B are the action spaces for player 1 and player 2, respectively. These spaces are all assumed to be Borel spaces endowed with their Borel \(\sigma \)-algebras. \(A(\cdot )\) and \(B(\cdot )\) are measurable compact-valued multi-functions from E to \({\mathscr {P}}(A)\) and \({\mathscr {P}}(B)\), respectively. For each \(x \in E\), A(x) and B(x) denote the sets of actions available to player 1 and player 2, respectively, when the system is in state \(x \in E\). For convenience, we introduce the set

$$\begin{aligned} K:=\{(x,a,b): x\in E, a\in A(x), b \in B(x)\}. \end{aligned}$$

That is, K is the graph of the multi-function \(A(\cdot )\times B(\cdot )\). Since \(A(\cdot )\) and \(B(\cdot )\) are measurable compact-valued multi-functions, by Lemma 1.7 in Nowak (1984), K is a measurable subset of \(E \times A \times B\). Moreover, \(q(\cdot \vert x,a,b)\) is referred to as the transition rate, which is a measurable signed kernel on E given K such that, for all \((x,a,b)\in K\), (i) \(0\le q(D{\vert }x,a,b)<+\infty \) for \(x \notin D \in {\mathscr {B}}(E)\), (ii) \(q(E{\vert }x,a,b)\equiv 0\), and (iii) \(q^*(x):= \sup _{a\in A(x),b \in B(x)}q(x,a,b)<\infty \), where \(q(x,a,b):=-q(\{x\}{\vert }x,a,b)\ge 0\). The motion between jumps of the PDMP is determined by \(\phi (x,t)\) in (1), called a flow, which is a measurable function from \(E \times R\) to E. We assume that \(\phi (x,s+t)=\phi (\phi (x,s),t)\) for all \(x \in E\) and \((s,t)\in R^2\). Finally, the measurable function r(x,a,b) on K denotes the payoff rate for player 1. In a zero-sum game, one player's gain is the other's loss, so r(x,a,b) is also the loss rate for player 2.

Remark 1

Unlike the models of PDMPs in Costa et al. (2016), Costa and Dufour (2018), we do not consider jumps when hitting the boundary, nor the related impulsive control, in our model. In some cases, there are actually no boundary jumps since the boundary will never be reached, as indicated by the example in Sect. 5 below.

To construct the PDMP based on the data above, let \(E_\Delta :=E \cup \{x_{\infty }\}\), where \(x_\infty \) is an isolated artificial point corresponding to the case when no jump occurs in the future. For \(n\ge 0\), we put \(\Omega _n=E \times ((0,\infty )\times E)^n \times (\{\infty \}\times \{x_\infty \})^\infty \). The sample space is \(\Omega =\cup _{n=0}^\infty \Omega _n \cup (E \times ((0,\infty )\times E)^\infty )\), and let \({\mathscr {F}}\) be the Borel \(\sigma \)-algebra on \(\Omega \). Then we obtain a measurable space \((\Omega ,{\mathscr {F}})\). For a trajectory \(\omega =(x_{0},\theta _{1},x_{1},\ldots ,\theta _{n},x_{n},\ldots ) \in \Omega \), \(x_0\) denotes the initial state of the process, and for \(n \ge 1\), \(\theta _n >0\) and \(x_n\) correspond to the time interval between two consecutive jumps and the state of the process immediately after the jump. In case \(\theta _n<\infty \) and \(\theta _{n+1} =\infty \), the trajectory has only n jumps, and we put \(\theta _{m}=\infty \) and \(x_m=x_\infty \) for all \(m\ge n+1\). On the measurable space \((\Omega , {{\mathscr {F}}})\), we define a sequence of random variables \(\{\Theta _n, X_n, n \ge 0\}\) by \(\Theta _0(\omega ):=0, \Theta _{n+1}(\omega ):=\theta _{n+1}, \ X_n(\omega ):=x_n\), for each \(n \ge 0\) and any trajectory \(\omega =(x_{0},\theta _{1},x_{1},\ldots , \theta _{k}, x_{k},\ldots ) \in \Omega \). Further, we define \(T_{0}(\omega ):=0\), \(T_n(\omega ):=\sum _{i=1}^{n} \Theta _i (\omega )\) for every \(n \ge 1\), and \(T_{\infty }(\omega ):=\lim _{n\rightarrow \infty }T_{n}(\omega )\). Then, the PDMP \(\{\xi _t, t \in R_+\}\) is defined by

$$\begin{aligned} \xi _t(\omega ):= {\left\{ \begin{array}{ll} \phi (X_n(\omega ),t-T_n(\omega )), &{} ~\text{ if } \, T_n(\omega ) \le t<T_{n+1}(\omega ), \\ x_\infty , &{} ~\text{ if } \,t\ge T_{\infty }(\omega ), \end{array}\right. } \end{aligned}$$

for each \(t \in R_+\). For completeness, when the PDMP occupies the state \(x_\infty \), we introduce artificial actions \(a_\infty \) and \(b_\infty \) for player 1 and player 2, respectively. Moreover, let \(q(\cdot \vert x_\infty ,a_\infty , b_\infty )\equiv 0\), \(A(x_\infty )=\{a_\infty \}\), \(B(x_\infty )=\{b_\infty \}\), and \(A_\infty =A\cup \{a_\infty \}\), \(B_\infty =B\cup \{b_\infty \}\).
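The construction above lends itself to simulation by thinning. In the sketch below, the flow \(\phi (x,t)=x e^{-t}\), the jump rate q(x)=x and the post-jump kernel (the current point plus an Exp(1) increment) are hypothetical choices made only for illustration; they are not part of the model (1).

```python
import math
import random

def flow(x, t):
    # hypothetical flow; satisfies the semigroup property phi(x, s+t) = phi(phi(x, s), t)
    return x * math.exp(-t)

def jump_rate(x):
    # hypothetical state-dependent jump rate q(x) = x
    return x

def sample_path(x0, horizon, rng=random.Random(0)):
    """Return the list [(T_n, X_n)] up to the horizon, sampled by thinning:
    along this flow the rate decreases, so its value at the start of each
    inter-jump interval is a valid dominating rate."""
    t, x = 0.0, x0
    jumps = [(0.0, x0)]
    while t < horizon:
        lam_bar = jump_rate(x)             # dominating rate on this interval
        if lam_bar <= 0.0:
            break
        s = 0.0
        while True:
            s += rng.expovariate(lam_bar)  # candidate jump epoch
            if t + s > horizon:
                return jumps
            if rng.random() <= jump_rate(flow(x, s)) / lam_bar:
                break                      # accept: Theta_{n+1} = s
        t += s
        x = flow(x, s) + rng.expovariate(1.0)  # post-jump state X_{n+1}
        jumps.append((t, x))
    return jumps
```

Between the sampled jump epochs, the state is recovered deterministically as \(\xi _t=\phi (X_n, t-T_n)\) for \(T_n \le t < T_{n+1}\), matching the definition above.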

The probabilistic properties of the PDMP \(\{\xi _t, t \in R_+ \}\) are determined by the transition rate and policies of players. First of all, we need to define policies. To this end, we introduce the random measure \(\mu \) associated with \(\{\Theta _n,X_n, n \ge 0\}\) on \((R_+ \times E, {\mathscr {B}}(R_+ \times E))\) by

$$\begin{aligned} \mu (\omega ;dt,dx):=\sum _{n\ge 1} \mathbbm {1}_{\{T_n(\omega )<\infty \}} \delta _{\{T_n(\omega ),X_n(\omega )\}}(dt,dx). \end{aligned}$$

Moreover, we take the right-continuous family of \(\sigma \)-algebras \(\{{\mathscr {F}}_{t}\}_{t\ge 0}\) with \({\mathscr {F}}_{t}:=\sigma (\mu ([0,s]\times D): 0\le s \le t, D \in {\mathscr {B}}(E))\), and let \({\mathcal {P}}:=\sigma (\{\Gamma \times \{0\}, \Gamma \in {\mathscr {F}}_0\} \cup \{\Gamma \times (s,\infty ), \Gamma \in {\mathscr {F}}_{s-}, s>0\})\) be the \(\sigma \)-algebra of predictable sets on \(\Omega \times R_+\) related to \(\{{\mathscr {F}}_{t}\}_{t\ge 0}\), where \({\mathscr {F}}_{s-}:=\vee _{t<s} {\mathscr {F}}_{t}\).

Definition 1

A randomized history-dependent policy, or simply, a policy for player 1 is a transition probability \(\pi ^1(da{\vert }\omega ,t)\) from \((\Omega \times R_+,{\mathcal {P}})\) onto \((A_\infty , {\mathscr {B}}(A_\infty ))\) such that \(\pi ^1( A(\xi _{t-}(\omega )){\vert }\omega , t)=1\). In particular, a policy \(\pi ^1( da {\vert }\omega ,t)\) is called randomized Markov for player 1 if the action selection depends on the history only through the current state, i.e., the policy has the form \(\pi ^1( da {\vert } \xi _{t-}(\omega ))\). Policies for player 2 can be defined similarly.

For each \(i=1,2\), we denote by \(\Pi ^i\) and \(\Pi ^i_{RS}\) the families of all policies and of all randomized Markov policies for player i, respectively.

Now consider a state \(x \in E\) and a pair of policies \((\pi ^1,\pi ^2) \in \Pi ^1 \times \Pi ^2\). By Theorem 3.6 in Jacod (1975), there exists a probability measure \(\mathbbm {P}_{x}^{\pi ^1,\pi ^2}\) on \((\Omega ,{\mathscr {F}})\) such that the restriction of \(\mathbbm {P}_{x}^{\pi ^1,\pi ^2}\) on \((\Omega ,{\mathscr {F}}_0)\) is given by

$$\begin{aligned} \mathbbm {P}_{x}^{\pi ^1,\pi ^2}(X_0=x)=1, \end{aligned}$$

and the random measure \(\nu ^{\pi ^1,\pi ^2}\) defined on \((0,\infty ) \times E\) by

$$\begin{aligned}&\nu ^{\pi ^1,\pi ^2}(\omega ;d t,dx)\\&\quad :=\int _{B}\int _{A}q(dx \setminus \{\xi _{t-}(\omega )\} {\vert }\xi _{t-}(\omega ),a,b)\pi ^1(da{\vert }\omega ,t)\pi ^2(d b{\vert }\omega ,t)dt \end{aligned}$$

is the predictable projection of \(\mu \) with respect to \(\mathbbm {P}_{x}^{\pi ^1,\pi ^2}\); see Costa et al. (2016) for further details.

Let \({\mathbb {E}}_x^{\pi ^1,\pi ^2}\) be the corresponding expectation operator with respect to \(\mathbbm {P}_{x}^{\pi ^1,\pi ^2}\). For each pair of policies \((\pi ^1,\pi ^2) \in \Pi ^1 \times \Pi ^2\), we define the expected infinite-horizon discounted payoff criterion by

$$\begin{aligned}&V^{\pi ^1,\pi ^2}(x)\\&\quad :={\mathbb {E}}_x^{\pi ^1,\pi ^2} \left[ \int _{0}^\infty e^{-\alpha t} \int _B \int _A r(\xi _t,a,b)\pi ^1(da \vert \omega ,t)\pi ^2(d b \vert \omega ,t)dt\right] \ \forall x\in E, \end{aligned}$$

provided that the integral is well defined. Here, \(\alpha >0\) is the discount factor.

As is well known, the functions \({\underline{V}}^*(x)\) and \({\overline{V}}^*(x)\) on E defined by

$$\begin{aligned}{} & {} {\underline{V}}^*(x):=\sup _{\pi ^1\in \Pi ^1} \inf _{\pi ^2\in \Pi ^2} V^{\pi ^1,\pi ^2}(x), \\ \textrm{and} \quad{} & {} {\overline{V}}^*(x):=\inf _{\pi ^2\in \Pi ^2} \sup _{\pi ^1\in \Pi ^1} V^{\pi ^1,\pi ^2}(x), \ x\in E, \end{aligned}$$

are called the lower value and the upper value of the game, respectively. It is clear that \({\underline{V}}^*(x) \le {\overline{V}}^*(x)\) for all \(x\in E\).

Definition 2

If \({\underline{V}}^*(x) = {\overline{V}}^*(x)\) for all \(x\in E\), the common function is called the value function of the game and denoted by \(V^*(x)\).

Definition 3

A pair of policies \(({\hat{\pi }}^1,{\hat{\pi }}^2)\) is called a saddle-point if

$$\begin{aligned} V^{\pi ^1,{\hat{\pi }}^2}(x) \le V^{{\hat{\pi }}^1, {\hat{\pi }}^2}(x) \le V^{{\hat{\pi }}^1,\pi ^2}(x) \ \forall \pi ^1 \in \Pi ^1, \pi ^2 \in \Pi ^2, x \in E. \end{aligned}$$

At a saddle point, no player can improve (reduce) his/her payoff (loss) by unilaterally deviating from the saddle-point policy. Note that if a saddle point \(({\hat{\pi }}^1,{\hat{\pi }}^2)\) exists, we must have

$$\begin{aligned} {\overline{V}}^*(x) \le V^{{\hat{\pi }}^1, {\hat{\pi }}^2}(x) \le {\underline{V}}^*(x) \ \forall x \in E, \end{aligned}$$

which implies that \(V^{{\hat{\pi }}^1, {\hat{\pi }}^2}(x)=V^*(x)\) for all \(x \in E\), meaning that the game has a value. In the following sections, we show when a saddle point exists and how to find a saddle point for the game.
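Definition 3 can be checked directly in the static special case of a one-shot zero-sum matrix game. The standard example "matching pennies" below has no pure saddle point, but the mixed pair \(({\hat{\eta }},{\hat{\gamma }})=((1/2,1/2),(1/2,1/2))\) is a saddle point with value 0; this is an illustration only, not part of our dynamic model.

```python
# Payoff matrix to player 1 in matching pennies (player 2 loses what player 1 gains).
R = [[1.0, -1.0], [-1.0, 1.0]]

def payoff(eta, gamma):
    # expected payoff under mixed actions eta (player 1) and gamma (player 2)
    return sum(eta[i] * gamma[j] * R[i][j] for i in range(2) for j in range(2))

eta_hat = gamma_hat = (0.5, 0.5)
v = payoff(eta_hat, gamma_hat)                 # the value of the game: 0
grid = [(p, 1.0 - p) for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
# no unilateral deviation (checked on a grid of mixed actions) is profitable
assert all(payoff(eta, gamma_hat) <= v + 1e-12 for eta in grid)
assert all(payoff(eta_hat, gamma) >= v - 1e-12 for gamma in grid)
```

The two assertions mirror the chain of inequalities in Definition 3, with mixed actions playing the role of policies.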

3 Preliminaries

Since the transition rate is unbounded, we must first rule out explosion of the PDMP \(\{\xi _t \}\); to this end, we impose the following assumption.

Assumption 1

(a) There exist a measurable function \(w_0 \ge 1\) on E, and constants \(c_0 > 0\), \(d_0\ge 0\) such that \(\int _E w_0(y) q(dy \vert x,a,b)\le c_0 w_0(x)+ d_0,\) for all \((x,a,b)\in K\);

(b) There exists a sequence \(\{E_m, m\ge 1\}\) of Borel subsets of E such that \(E_m \uparrow E\), \(\sup _{x\in E_m} q^*(x)<\infty \), and \(\lim _{m\rightarrow \infty }\inf _{x \notin E_m}w_0(x)=\infty \);

(c) \(w_0(\phi (x,t))\le w_0(x)\) for all \((x,t) \in E \times R_+\).

Remark 2

(a) Assumption 1 is inspired by Assumption A in Guo and Song (2011) for continuous-time Markov decision processes (CTMDPs) and Assumption 3.1 in Guo and Hernández-Lerma (2007) for CTMGs. However, due to the presence of the flow \(\phi \) in PDMPs, we additionally impose Assumption 1(c) to ensure the non-explosion of PDMPs. Moreover, Assumption 1 here is similar to Assumption 3.1 in Huang and Guo (2019) for finite-horizon PDMDPs, although the two assumptions differ slightly.

(b) The assumption in Costa and Dufour (2018) for PDMGs parallel to Assumption 1 here is Assumption B therein, but it differs from ours owing to the different techniques applied, as mentioned in the introduction above.

Proposition 1

Let Assumption 1 be fulfilled. For each \((\pi ^1,\pi ^2) \in \Pi ^1 \times \Pi ^2\), \(x\in E\), and \(t \in R_+\), the following assertions hold.

  1. (a)

    \(\mathbbm {P}_{x}^{\pi ^1,\pi ^2} (T_{\infty }=+\infty )=1\).

  2. (b)

    \(\mathbbm {E}_{x}^{\pi ^1,\pi ^2} \big [w_0(\xi _t) \big ] \le e^{c_0 t}w_0(x) + \dfrac{d_0}{c_0}(e^{c_0 t} -1)\).

Proof

The proof is postponed to Sect. 6. \(\square \)

To ensure the finiteness of \(V^{\pi ^1,\pi ^2}\), we also propose some growth conditions on the payoff rate and the discount factor.

Assumption 2

  1. (a)

    The constant \(c_0\) in Assumption 1 satisfies that \( c_0 < \alpha \).

  2. (b)

    There exists a constant \(M_0 >0\) such that

    $$\begin{aligned} |r(x,a,b)|\le M_0 w_0(x) \ \ \forall (x,a,b) \in K. \end{aligned}$$

We can now show that \(V^{\pi ^1,\pi ^2}\) is finite for each pair of policies \((\pi ^1,\pi ^2)\).

Lemma 1

Under Assumptions 1 and 2, we have

$$\begin{aligned} |V^{\pi ^1,\pi ^2}(x)|\le \frac{M_0}{\alpha -c_0} w_0(x) + \frac{d_0 M_0}{\alpha (\alpha -c_0)} \ \forall (\pi ^1,\pi ^2) \in \Pi ^1 \times \Pi ^2, x\in E. \end{aligned}$$

Proof

Under Assumptions 1 and 2, it follows from Proposition 1(b) that

$$\begin{aligned} |V^{\pi ^1,\pi ^2}(x)|\le M_0 \int _{0}^\infty e^{-\alpha t} {\mathbb {E}}_x^{\pi ^1,\pi ^2} \big [ w_0(\xi _t) \big ] dt \le \frac{M_0}{\alpha -c_0} w_0(x) + \frac{d_0 M_0}{\alpha (\alpha -c_0)}. \end{aligned}$$

The proof is complete. \(\square \)
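As a sanity check on the constants in Lemma 1, one can verify numerically that integrating the estimate of Proposition 1(b) against \(e^{-\alpha t}\) yields exactly the stated closed form. The constants below are arbitrary sample values satisfying \(0<c_0<\alpha \).

```python
import math

# sample constants with 0 < c0 < alpha (Assumption 2(a)); w0x stands for w0(x)
alpha, c0, d0, M0, w0x = 1.0, 0.3, 0.5, 2.0, 1.5

def integrand(t):
    # M0 e^{-alpha t} times the bound on E[w0(xi_t)] from Proposition 1(b)
    bound = math.exp(c0 * t) * w0x + (d0 / c0) * (math.exp(c0 * t) - 1.0)
    return M0 * math.exp(-alpha * t) * bound

# trapezoidal rule on [0, 60]; the tail beyond is negligible since c0 < alpha
n, T = 60000, 60.0
h = T / n
numeric = h * (0.5 * (integrand(0.0) + integrand(T))
               + sum(integrand(i * h) for i in range(1, n)))
closed = M0 * w0x / (alpha - c0) + d0 * M0 / (alpha * (alpha - c0))
assert abs(numeric - closed) < 1e-3
```

The agreement reflects the two elementary integrals \(\int _0^\infty e^{-(\alpha -c_0)t}dt=1/(\alpha -c_0)\) and \(\int _0^\infty e^{-\alpha t}(e^{c_0 t}-1)dt=c_0/(\alpha (\alpha -c_0))\) used in the proof.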

We are going to derive Dynkin’s formula for discounted PDMGs with unbounded transition rates. To this end, we introduce a framework. Let \(w\ge 1\) be a real-valued measurable function on E, called a weight function. For every measurable function \(\varphi \) on E, we introduce its w-norm \(\Vert \varphi \Vert _{w}\) by

$$\begin{aligned} \Vert \varphi \Vert _{w} :=\sup _{x \in E} |\varphi (x) |/w(x). \end{aligned}$$

Let \(\mathbbm {B}_{w}(E)\) be the Banach space of all measurable functions \(\varphi \) on E such that \(\Vert \varphi \Vert _{w} < \infty \), while let \(\mathbbm {B}_w^{ac}(E)\) be the collection of all measurable functions \(\varphi \) in \(\mathbbm {B}_{w}(E)\) such that \(\varphi (\phi (x,t))\) is absolutely continuous in \(t \in R_+\) for all \(x \in E\). For a function \(\varphi \in \mathbbm {B}_w^{ac}(E)\), by Lemma A.1 in Piunovskiy and Zhang (2021), there is some measurable function \(L^\phi \varphi \) on E satisfying

$$\begin{aligned} \varphi (\phi (x,t))-\varphi (x)=\int _{0}^t L^\phi \varphi (\phi (x,v)) dv ~~\forall \ t \in R_+, x \in E. \end{aligned}$$

Here, for \(x \in E\), the function \(L^\phi \varphi (\phi (x,v))\) on \(R_+\) coincides with the partial derivative, \(\partial \varphi (\phi (x,v))/ \partial v\), of the function \(\varphi (\phi (x,v))\) in \(v \in R_+\), except on a null set \(Z_\varphi (x)\subset R_+\) with respect to the Lebesgue measure. For such a function \(\varphi \), let

$$\begin{aligned} {\mathcal {D}}^\varphi :=\{ \phi (x,t) \in E: t \in Z_\varphi ^c(x), x \in E\}. \end{aligned}$$

Then, the function \(L^\phi \varphi \) on E can be defined as below:

$$\begin{aligned} L^\phi \varphi (x) := \left\{ \begin{array}{lll} &{} \lim \limits _{ \Delta s \rightarrow 0} \dfrac{\varphi (\phi (x, \Delta s))-\varphi (x)}{\Delta s}, &{} x \in {\mathcal {D}}^\varphi , \\ &{} \text { arbitrary}, &{} \text {otherwise}. \end{array} \right. \end{aligned}$$

In particular, if \(\phi (x,t)\equiv x\), in which case the PDMP reduces to a CTMP, we have \(L^\phi \varphi (x)= 0\) for all \(x\in E\). Moreover, for \(\varphi \in \mathbbm {B}_{w}^{ac}(E)\) and a weight function \({\bar{w}}\) on E, we let

$$\begin{aligned} \Vert L^\phi \varphi \Vert _{{\bar{w}}}^{es}:= \sup _{x \in {\mathcal {D}}^\varphi } |L^\phi \varphi (x) |/{\bar{w}}(x), \end{aligned}$$

and \(\mathbbm {B}_{w,{\bar{w}}}^{ac}(E):=\{\varphi \in \mathbbm {B}_{w}^{ac}(E): \Vert L^\phi \varphi \Vert _{{\bar{w}}}^{es} <\infty \}\).
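For concreteness, here is a small worked computation of \(L^\phi \) under an assumed flow (an illustration only, not the general model): take \(E=(0,\infty )\) and \(\phi (x,t)=x e^{-t}\), which satisfies the semigroup property. For continuously differentiable \(\varphi \), the chain rule gives

```latex
\[
L^\phi \varphi(x)
  = \lim_{\Delta s \to 0} \frac{\varphi(\phi(x,\Delta s)) - \varphi(x)}{\Delta s}
  = \frac{\partial}{\partial t}\,\varphi\!\left(x e^{-t}\right)\Big|_{t=0}
  = -\,x\,\varphi'(x),
\]
```

so that here \({\mathcal {D}}^\varphi =E\) and \(L^\phi \) acts as the drift operator \(-x\,\partial /\partial x\) along the flow; for the constant flow \(\phi (x,t)\equiv x\) the same computation returns 0, consistent with the CTMP case noted above.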

Some more assumptions are required to ensure that Dynkin's formula holds.

Assumption 3

There exist a measurable function \(w_1 \ge 1\) on E, and constants \(M_1>0\), \(c_1 < \alpha \) and \(d_1\ge 0\) such that

  1. (a)

    \(\big (1+q(x,a,b) \big ) w_0(x)\le M_1 w_1(x)\), for all \((x,a,b)\in K\);

  2. (b)

    \( \int _E w_1(y)q(dy \vert x,a,b)\le c_1 w_1(x) + d_1\) for all \((x,a,b)\in {K}\);

  3. (c)

    \(w_1(\phi (x,t))\le w_1(x)\) for all \((x,t) \in E \times R_+\).

Remark 3

The assumption in Costa and Dufour (2018) for PDMGs parallel to Assumption 3 here is Assumption D therein, but it differs from ours. Assumption D, together with Assumptions B and C therein, is used to ensure the convergence of the expected infinite-horizon discounted payoffs, while Assumption 3, together with Assumption 1 here, is used to justify Dynkin's formula in our setup.

When Assumptions 3(b) and 3(c) are further imposed, Proposition 1(b) holds with \(w_0\) replaced by \(w_1\), so that we have

$$\begin{aligned} \mathbbm {E}_{x}^{\pi ^1,\pi ^2} \big [w_1(\xi _t) \big ] \le e^{c_1 t}w_1(x) + \dfrac{d_1}{c_1}(e^{c_1 t} -1). \end{aligned}$$
(2)

We are ready to state Dynkin’s formula for discounted PDMGs.

Theorem 2

(Dynkin’s formula) Suppose Assumptions 1, 2(a) and 3 are satisfied. For each \((\pi ^1,\pi ^2) \in \Pi ^1 \times \Pi ^2\), \(\varphi \in \mathbbm {B}_{w_0, w_1}^{ac}(E)\), \(T \in R_+\) and \(x\in E\),

$$\begin{aligned}{} & {} {\mathbb {E}}_x^{\pi ^1,\pi ^2} \Big [\int _{0}^{T}\Big (L^\phi \big [ e^{-\alpha t} \varphi (\xi _t)\big ]\\{} & {} \qquad + e^{-\alpha t} \int _B \int _A \int _E \varphi (y)q(dy \vert \xi _t,a,b)\pi ^1(da \vert \omega ,t)\pi ^2(d b \vert \omega ,t)\Big )dt \Big ] \\{} & {} \quad = {\mathbb {E}}_x^{\pi ^1,\pi ^2} [e^{-\alpha T} \varphi (\xi _T)]-\varphi (x). \end{aligned}$$

Proof

It follows from Proposition 1(a) that, for almost all (a.a.) \(\omega \in \Omega \) with respect to \(\mathbbm {P}_x^{\pi ^1,\pi ^2}\), the PDMP \(\{\xi _t(\omega )\}\) has only a finite number of jumps up to any time T. Therefore, by the construction of \(\xi _t\) and the definition of \({\mathcal {D}}^\varphi \), we conclude that, for a.a. \(t \in R_+\), \(\xi _t(\omega ) \in {\mathcal {D}}^\varphi \). Hence, by the definition of the operator \(L^\phi \), we have

$$\begin{aligned} L^\phi \big [ e^{-\alpha t} \varphi (\xi _t(\omega ))\big ] =\frac{\partial [ e^{-\alpha t} \varphi (\xi _t(\omega ))\big ]}{\partial t}, \ \ \text {a.a.} \ \omega \in \Omega , t \in R_+. \end{aligned}$$

Thus, using the equality (8) in Avrachenkov et al. (2015) yields that

$$\begin{aligned}{} & {} e^{-\alpha T} \varphi (\xi _T)- \varphi (\xi _0) \nonumber \\{} & {} \quad = \int _{0}^T L^\phi \big [ e^{-\alpha t} \varphi (\xi _t)\big ] d t \nonumber \\{} & {} \qquad +\int _{(0,T] \times E } \Big [e^{-\alpha t} \varphi (y)- e^{-\alpha t} \varphi (\xi _{t-})\Big ] \mu (\omega ; d t, dy), \quad \text {a.a.} \omega -\mathbbm {P}_x^{\pi ^1,\pi ^2}. \end{aligned}$$
(3)

We are going to take expectations on both sides of (3) to derive Dynkin's formula, but we must first verify that the expectations are well defined, as shown below.

Since \(\varphi \in \mathbbm {B}_{w_0, w_1}^{ac}(E)\), it is clear that \( |\varphi (x) |\le \Vert \varphi \Vert _{w_0} w_0(x)\) for all \(x \in E\), and \( |L^\phi \varphi (x)|\le \Vert L^\phi \varphi \Vert _{w_1}^{es} w_1(x)\) for any \(x \in {\mathcal {D}}^\varphi \). On the one hand, observe that \(\xi _t(\omega ) \in {\mathcal {D}}^\varphi \) for a.a. \(\omega \in \Omega \) and \(t \in R_+\), and so we have

$$\begin{aligned} L^\phi \big [ e^{-\alpha t} \varphi (\xi _t(\omega ))\big ]=-\alpha e^{-\alpha t} \varphi (\xi _t(\omega ))+ e^{-\alpha t} L^\phi \big [\varphi (\xi _t(\omega ))\big ] , \ \text {a.a.} \ \omega \in \Omega , t\in R_+, \end{aligned}$$

which together with Proposition 1(b) and (2) gives

$$\begin{aligned}{} & {} {\mathbb {E}}_x^{\pi ^1,\pi ^2} \left[ \int _0^T \Big \vert L^\phi \big [ e^{-\alpha t} \varphi (\xi _t)\big ] \Big \vert d t \right] \nonumber \\{} & {} \quad \le \int _0^T {\mathbb {E}}_x^{\pi ^1,\pi ^2} \left[ \alpha e^{-\alpha t} \Vert \varphi \Vert _{w_0} w_0 (\xi _t) + e^{-\alpha t} \Vert L^\phi \varphi \Vert _{w_1}^{es} w_1(\xi _t) \right] d t \nonumber \\{} & {} \quad \le \frac{\alpha \Vert \varphi \Vert _{w_0} }{\alpha -c_0}(w_0(x)+\frac{d_0}{c_0}) + \frac{\Vert L^\phi \varphi \Vert _{w_1}^{es}}{\alpha -c_1}(w_1(x)+\frac{d_1}{c_1}) \nonumber \\{} & {} \quad < \infty . \end{aligned}$$
(4)

On the other hand, under Assumptions 1 and 3, we have

$$\begin{aligned}{} & {} \Big \vert \int _B \int _A \int _E \varphi (y)q(dy\vert \xi _t,a,b)\pi ^1(da \vert \omega ,t)\pi ^2(d b\vert \omega ,t) \Big \vert \nonumber \\{} & {} \quad \le \Vert \varphi \Vert _{w_0} \int _B \int _{A} \Big [\int _E w_0(y)q(d y \vert \xi _t,a,b)+ 2q(\xi _t,a,b)w_0(\xi _t)\Big ]\nonumber \\ {}{} & {} \pi ^1(da \vert \omega ,t)\pi ^2(d b \vert \omega ,t) \nonumber \\{} & {} \quad \le \Vert \varphi \Vert _{w_0} \Big [c_0 w_0(\xi _t)+ d_0 +2 M_1 w_1(\xi _t) \Big ]. \end{aligned}$$
(5)

Thus, by (5), Proposition 1(b) and (2), we obtain

$$\begin{aligned}{} & {} {\mathbb {E}}_x^{\pi ^1,\pi ^2} \left[ \int _{0}^{T} e^{-\alpha t} \Big \vert \int _B \int _A \int _E \varphi (y)q(dy\vert \xi _t,a,b)\pi ^1(da\vert \omega ,t)\pi ^2(d b \vert \omega ,t) \Big \vert dt \right] \nonumber \\{} & {} \quad \le \Vert \varphi \Vert _{w_0} \int _{0}^{T} e^{-\alpha t} {\mathbb {E}}_x^{\pi ^1,\pi ^2} \Big [c_0 w_0(\xi _t)+ d_0 +2 M_1 w_1(\xi _t) \Big ] d t \nonumber \\{} & {} \quad \le \Vert \varphi \Vert _{w_0} \Big [\frac{c_0}{\alpha -c_0}(w_0(x)+\frac{d_0}{c_0}) + \frac{d_0}{\alpha }+ \frac{2 M_1}{\alpha - c_1} (w_1(x)+\frac{d_1}{c_1}) \Big ] \nonumber \\{} & {} \quad < \infty . \end{aligned}$$
(6)

Now, taking expectation in both sides of (3), using (4) and (6) yields that

$$\begin{aligned}{} & {} {\mathbb {E}}_x^{\pi ^1,\pi ^2} \left[ e^{-\alpha T} \varphi (\xi _T(\omega )) \right] -\varphi (x) \\{} & {} \quad = {\mathbb {E}}_x^{\pi ^1,\pi ^2} \left[ \int _{0}^T L^\phi \big [ e^{-\alpha t} \varphi (\xi _t)\big ] d t \right] \\{} & {} \qquad + {\mathbb {E}}_x^{\pi ^1,\pi ^2} \left[ \int _E \int _{(0,T]}\Big [e^{-\alpha t} \varphi (y)- e^{-\alpha t} \varphi (\xi _{t-})\Big ] \mu (d t,dy)\right] \\{} & {} \quad = {\mathbb {E}}_x^{\pi ^1,\pi ^2} \left[ \int _{0}^T L^\phi \big [ e^{-\alpha t} \varphi (\xi _t)\big ] d t \right] \\{} & {} \qquad + {\mathbb {E}}_x^{\pi ^1,\pi ^2} \left[ \int _E \int _{(0,T]}e^{-\alpha t} (\varphi (y)-\varphi (\xi _{t-})) \nu ^{\pi ^1,\pi ^2}(d t,dy)\right] \\{} & {} \quad = {\mathbb {E}}_x^{\pi ^1,\pi ^2} \left[ \int _{0}^T L^\phi \big [ e^{-\alpha t} \varphi (\xi _t)\big ] d t \right] \\{} & {} \qquad + {\mathbb {E}}_x^{\pi ^1,\pi ^2} \left[ \int _{0}^{T} e^{-\alpha t} \int _B \int _A \int _E \varphi (y)q(dy{\vert }\xi _{t-},a,b)\pi ^1(da{\vert }\omega ,t)\pi ^2(d b{\vert }\omega ,t) dt \right] , \end{aligned}$$

where the second equality follows from the fact that the random measure \(\nu ^{\pi ^1,\pi ^2}\) is the dual predictable projection of the random measure \(\mu \) under \(\mathbbm {P}_x^{\pi ^1,\pi ^2} \). Note that, for every \(\omega \in \Omega \), \(\xi _{t-}(\omega )=\xi _{t}(\omega )\) on (0, T] except at countably many time points. Hence,

$$\begin{aligned}{} & {} {\mathbb {E}}_x^{\pi ^1,\pi ^2} \left[ \int _{0}^{T} e^{-\alpha t} \int _B \int _A \int _E \varphi (y)q(dy{\vert }\xi _{t-},a,b)\pi ^1(da{\vert }\omega ,t)\pi ^2(d b{\vert }\omega ,t) dt \right] \\{} & {} \quad = {\mathbb {E}}_x^{\pi ^1,\pi ^2} \left[ \int _{0}^{T} e^{-\alpha t} \int _B \int _A \int _E \varphi (y)q(dy{\vert }\xi _{t},a,b)\pi ^1(da \vert \omega ,t)\pi ^2(d b\vert \omega ,t) dt \right] . \end{aligned}$$

The formula then follows. \(\square \)

Remark 4

Costa et al. (2016) derive Dynkin’s formula for infinite-horizon discounted PDMDPs, where bounded transition rate and boundary jumps are considered.
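To illustrate Theorem 2 in the simplest special case \(\phi (x,t)\equiv x\), where \(L^\phi [e^{-\alpha t}\varphi (\xi _t)]=-\alpha e^{-\alpha t}\varphi (\xi _t)\), both sides of Dynkin's formula can be evaluated numerically for a two-state CTMP via the Kolmogorov forward equation. The rates, test function and discount factor below are hypothetical sample values.

```python
import math

# two-state CTMP with generator q(1|0)=1, q(0|1)=2; start in state 0
alpha, T, dt = 0.5, 1.0, 2e-5
Q = [[-1.0, 1.0], [2.0, -2.0]]
f = [1.0, 3.0]                   # a bounded test function phi
p = [1.0, 0.0]                   # distribution of xi_t at time t
lhs, t = 0.0, 0.0
while t < T:
    # integrand of the left-hand side of Dynkin's formula:
    #   E[-alpha e^{-alpha t} f(xi_t) + e^{-alpha t} sum_y f(y) q(y|xi_t)]
    drift = sum(p[x] * (-alpha * f[x] + sum(f[y] * Q[x][y] for y in range(2)))
                for x in range(2))
    lhs += math.exp(-alpha * t) * drift * dt
    # Euler step of the Kolmogorov forward equation p' = p Q
    p = [p[x] + dt * sum(p[z] * Q[z][x] for z in range(2)) for x in range(2)]
    t += dt
# right-hand side: E[e^{-alpha T} f(xi_T)] - f(x0)
rhs = math.exp(-alpha * T) * sum(p[x] * f[x] for x in range(2)) - f[0]
assert abs(lhs - rhs) < 1e-3
```

The two sides agree up to the discretization error, as the theorem predicts.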

Now, using Dynkin’s formula, we can compare \(V^{\pi ^1,\pi ^2}\) with solutions to certain differential inequalities or equations. To begin with, we introduce the following notation: for each \(x \in E\), \((\eta ,\gamma ) \in P(A(x))\times P(B(x))\) and \(\varphi \in \mathbbm {B}_{w_0}(E)\),

$$\begin{aligned}{} & {} r(x,\eta ,\gamma ):=\int _{B(x)} \int _{A(x)} r(x,a,b)\eta (d a)\gamma (d b), \end{aligned}$$
(7)
$$\begin{aligned}{} & {} \int _ E \varphi (y) q(dy \vert x,\eta ,\gamma ):= \int _{B(x)} \int _{A(x)} \int _ E \varphi (y) q(dy \vert x,a,b) \eta (d a)\gamma (d b). \end{aligned}$$
(8)

Theorem 3

(Comparison Theorem) Suppose that Assumptions 1–3 hold.

  1. (a)

    If there exists a function \(\varphi \in \mathbbm {B}_{w_0,w_1}^{ac}(E)\) such that

    $$\begin{aligned}{} & {} L^\phi \varphi (x) - \alpha \varphi (x) + r(x,\eta ,\gamma ) \\{} & {} \quad + \int _E \varphi (y) q(dy \vert x,\eta ,\gamma ) \le 0 \quad \forall (\eta ,\gamma ) \in P(A(x))\times P(B(x)), x \in {\mathcal {D}}^\varphi , \end{aligned}$$

    we have \(V^{\pi ^1,\pi ^2}(x) \le \varphi (x)\), for all \((\pi ^1,\pi ^2) \in \Pi ^1 \times \Pi ^2\) and \(x \in E\).

  2. (b)

    If there exists a function \(\varphi \in \mathbbm {B}_{w_0,w_1}^{ac}(E)\) such that

    $$\begin{aligned}{} & {} L^\phi \varphi (x) - \alpha \varphi (x) + r(x, \eta ,\gamma ) \\{} & {} \quad + \int _E \varphi (y) q(dy \vert x,\eta ,\gamma ) \ge 0 \quad \forall (\eta ,\gamma ) \in P(A(x))\times P(B(x)), x \in {\mathcal {D}}^\varphi , \end{aligned}$$

    we have \(V^{\pi ^1,\pi ^2}(x) \ge \varphi (x)\), for all \((\pi ^1,\pi ^2) \in \Pi ^1 \times \Pi ^2\) and \(x \in E\).

Proof

  1. (a)

    As in the proof of Theorem 2, we conclude that, for a.a. \(\omega \in \Omega \) with respect to \(\mathbbm {P}_x^{\pi ^1,\pi ^2}\) and a.a. \(t \in R_+\) with respect to Lebesgue measure, \(\xi _t(\omega ) \in {\mathcal {D}}^\varphi \), which together with the condition in (a) implies that, for a.a. \(\omega \in \Omega \ \text {and} \ t \in R_+\),

    $$\begin{aligned}{} & {} L^\phi \varphi (\xi _t(\omega )) - \alpha \varphi (\xi _t(\omega )) + \int _B \int _A r(\xi _t(\omega ),a,b)\pi ^1(da \vert \omega ,t)\pi ^2(d b \vert \omega ,t) \\{} & {} \quad + \int _B \int _A \int _E \varphi (y)q(dy \vert \xi _t,a,b)\pi ^1(da \vert \omega ,t)\pi ^2(d b \vert \omega ,t) \le 0, \end{aligned}$$

    which yields that

    $$\begin{aligned}{} & {} L^\phi \Big [ e^{-\alpha t} \varphi (\xi _t(\omega )) \Big ] + e^{-\alpha t} \int _B \int _A r(\xi _t(\omega ),a,b)\pi ^1(da \vert \omega ,t)\pi ^2(d b \vert \omega ,t) \\{} & {} \quad + e^{-\alpha t} \int _B \int _A \int _E \varphi (y)q(dy \vert \xi _t,a,b)\pi ^1(da \vert \omega ,t)\pi ^2(d b \vert \omega ,t) \le 0. \end{aligned}$$

    Thus, by Theorem 2, for all \((\pi ^1,\pi ^2) \in \Pi ^1 \times \Pi ^2\) and \(x \in E\), we have

    $$\begin{aligned}{} & {} {\mathbb {E}}_x^{\pi ^1,\pi ^2} [e^{-\alpha T} \varphi (\xi _T)]-\varphi (x) \\{} & {} \quad = {\mathbb {E}}_x^{\pi ^1,\pi ^2} \Big [\int _{0}^{T}\Big (L^\phi [ e^{-\alpha t} \varphi (\xi _t)] \\{} & {} \qquad + e^{-\alpha t} \int _B \int _A \int _E \varphi (y)q(dy \vert \xi _t,a,b)\pi ^1(da \vert \omega ,t)\pi ^2(d b \vert \omega ,t)\Big )dt \Big ] \\{} & {} \quad \le - {\mathbb {E}}_x^{\pi ^1,\pi ^2} \Big [\int _{0}^T e^{-\alpha t} \int _B \int _A r(\xi _t,a,b)\pi ^1(da \vert \omega ,t)\pi ^2(d b \vert \omega ,t)dt\Big ], \end{aligned}$$

    which indicates that

    $$\begin{aligned}{} & {} {\mathbb {E}}_x^{\pi ^1,\pi ^2} \Big [\int _{0}^T e^{-\alpha t} \int _B \int _A r(\xi _t,a,b)\pi ^1(da \vert \omega ,t)\pi ^2(d b \vert \omega ,t)dt\Big ]\\{} & {} \quad \le \varphi (x) -{\mathbb {E}}_x^{\pi ^1,\pi ^2} [e^{-\alpha T} \varphi (\xi _T)]. \end{aligned}$$

    Letting \(T\rightarrow \infty \) on both sides of the above inequality, the dominated convergence theorem yields that \(V^{\pi ^1,\pi ^2}(x) \le \varphi (x)\).

  2. (b)

    The proof is similar to that of part (a). \(\square \)

Theorem 4

Suppose Assumptions 1–3 hold. For every \((\pi ^1,\pi ^2) \in \Pi ^1_{RS} \times \Pi ^2_{RS}\), \(V^{\pi ^1,\pi ^2}\) is the unique solution in \(\mathbbm {B}_{w_0,w_1}^{ac}(E)\) to the differential equation

$$\begin{aligned}{} & {} L^\phi \varphi (x) - \alpha \varphi (x) + r(x,\pi ^1(\cdot \vert x),\pi ^2(\cdot \vert x)) \nonumber \\{} & {} \quad + \int _E \varphi (y) q(dy \vert x,\pi ^1(\cdot \vert x),\pi ^2(\cdot \vert x)) =0 \qquad \forall x \in {\mathcal {D}}^{\varphi }. \end{aligned}$$
(9)

Proof

From Lemma 1, we see that \(V^{\pi ^1,\pi ^2} \in \mathbbm {B}_{w_0}(E)\). Now, for each \(x \in E\), conditioning on the first-jump time and the post-jump state, we have

$$\begin{aligned}{} & {} V^{\pi ^1,\pi ^2}(x) \nonumber \\{} & {} \quad = {\mathbb {E}}_x^{\pi ^1,\pi ^2} \bigg [\int _{0}^{T_1} e^{-\alpha t} \int _B \int _A r(\xi _t,a,b)\pi ^1(da \vert \xi _{t-} )\pi ^2(d b \vert \xi _{t-})dt\bigg ] \nonumber \\{} & {} \qquad + {\mathbb {E}}_x^{\pi ^1,\pi ^2} \bigg [e^{-\alpha T_1} \int _{T_1}^\infty e^{-\alpha (t-T_1)} \int _B \int _A r(\xi _t,a,b)\pi ^1(da \vert \xi _{t-} )\pi ^2(d b \vert \xi _{t-})dt\bigg ] \nonumber \\{} & {} \quad = \int _{0}^\infty e^{-\alpha t} {\mathbb {P}}_x^{\pi ^1,\pi ^2}(T_1> t) r(\phi (x,t),\pi ^1(\cdot \vert \phi (x,t)),\pi ^2(\cdot \vert \phi (x,t))) dt \nonumber \\{} & {} \qquad + {\mathbb {E}}_x^{\pi ^1,\pi ^2} \bigg [ e^{-\alpha T_1} {\mathbb {E}}_x^{\pi ^1,\pi ^2} \bigg [ \int _{T_1}^\infty e^{-\alpha (t-T_1)} \nonumber \\{} & {} \qquad \times \Big [\int _B \int _A r(\xi _t,a,b) \pi ^1(da \vert \xi _{t-} )\pi ^2(d b \vert \xi _{t-})\Big ]dt \Big \vert T_1, X_1\bigg ] \bigg ] \nonumber \\{} & {} \quad = \int _{0}^\infty e^{-\alpha t-\int _0^t q(\phi (x,v),\pi ^1(\cdot \vert \phi (x,v)),\pi ^2(\cdot \vert \phi (x,v)))d v} \nonumber \\{} & {} \qquad \times \Big [ r(\phi (x,t),\pi ^1(\cdot \vert \phi (x,t)),\pi ^2(\cdot \vert \phi (x,t))) \nonumber \\{} & {} \qquad + \int _{E \setminus \{\phi (x,t) \}} V^{\pi ^1,\pi ^2}(y) q(dy \vert \phi (x,t),\pi ^1(\cdot \vert \phi (x,t)),\pi ^2(\cdot \vert \phi (x,t))) \Big ] dt. \end{aligned}$$
(10)

For all \(s\in R_+\), replacing x with \(\phi (x,s)\) in (10), we obtain

$$\begin{aligned}{} & {} V^{\pi ^1,\pi ^2}(\phi (x,s)) \nonumber \\{} & {} \quad = \int _{0}^\infty e^{-\alpha t} e^{-\int _0^t q(\phi (x,v+s),\pi ^1(\cdot \vert \phi (x,v+s)),\pi ^2(\cdot \vert \phi (x,v+s)))d v} \nonumber \\{} & {} \qquad \times \big [ r(\phi (x,t+s),\pi ^1(\cdot \vert \phi (x,t+s)),\pi ^2(\cdot \vert \phi (x,t+s))) \nonumber \\{} & {} \qquad + \int _{E \setminus \{\phi (x,t+s) \}} V^{\pi ^1,\pi ^2}(y) \nonumber \\{} & {} \qquad \cdot q(dy \vert \phi (x,t+s),\pi ^1(\cdot \vert \phi (x,t+s)),\pi ^2(\cdot \vert \phi (x,t+s))) \big ] dt. \end{aligned}$$
(11)

Now, changing the integration variable from t to u with \(u=t+s\) on the R.H.S. of (11), and then multiplying both sides of (11) by \(e^{-\int _0^s (\alpha + q(\phi (x,v),\pi ^1(\cdot \vert \phi (x,v)),\pi ^2(\cdot \vert \phi (x,v))))d v}\), we find that

$$\begin{aligned}{} & {} e^{-\int _0^s (\alpha + q(\phi (x,v),\pi ^1(\cdot \vert \phi (x,v)),\pi ^2(\cdot \vert \phi (x,v))))d v} V^{\pi ^1,\pi ^2}(\phi (x,s)) \\{} & {} \quad = \int _{s}^\infty e^{-\int _0^u (\alpha + q(\phi (x,v),\pi ^1(\cdot \vert \phi (x,v)),\pi ^2(\cdot \vert \phi (x,v))))d v} \\{} & {} \qquad \times \Big [ r(\phi (x,u),\pi ^1(\cdot \vert \phi (x,u)),\pi ^2(\cdot \vert \phi (x,u))) \\{} & {} \qquad + \int _{E \setminus \{\phi (x,u) \}} V^{\pi ^1,\pi ^2}(y) q(dy \vert \phi (x,u),\pi ^1(\cdot \vert \phi (x,u)),\pi ^2(\cdot \vert \phi (x,u))) \Big ] d u, \end{aligned}$$

which shows that \(V^{\pi ^1,\pi ^2}(\phi (x,s))\) is absolutely continuous in \(s \in R_+\), and thus differentiable almost everywhere on \(R_+\). Therefore, differentiating both sides of the above equality with respect to s, and then dividing both sides of the resulting equality by \(e^{-\int _0^s (\alpha + q(\phi (x,v),\pi ^1(\cdot \vert \phi (x,v)),\pi ^2(\cdot \vert \phi (x,v))))dv}\) yields

$$\begin{aligned}{} & {} L^\phi V^{\pi ^1,\pi ^2}(\phi (x,s)) - \alpha V^{\pi ^1,\pi ^2}(\phi (x,s)) + r(\phi (x,s),\pi ^1(\cdot \vert \phi (x,s)),\pi ^2(\cdot \vert \phi (x,s))) \\{} & {} \quad + \int _E V^{\pi ^1,\pi ^2}(y) q(dy \vert \phi (x,s),\pi ^1(\cdot \vert \phi (x,s)),\pi ^2(\cdot \vert \phi (x,s))) =0 \quad \forall s \in Z_{V^{\pi ^1,\pi ^2}}^c(x). \end{aligned}$$

This implies that

$$\begin{aligned}{} & {} L^\phi V^{\pi ^1,\pi ^2}(x) - \alpha V^{\pi ^1,\pi ^2}(x) + r(x,\pi ^1(\cdot \vert x),\pi ^2(\cdot \vert x)) \\{} & {} \quad + \int _E V^{\pi ^1,\pi ^2}(y) q(dy \vert x,\pi ^1(\cdot \vert x),\pi ^2(\cdot \vert x)) =0, \end{aligned}$$

for all \(x \in {\mathcal {D}}^{V^{\pi ^1,\pi ^2}}\). That is, \(V^{\pi ^1,\pi ^2}\) satisfies (9).

To show that \(V^{\pi ^1,\pi ^2}\) is in \(\mathbbm {B}_{w_0,w_1}^{ac}(E)\), it remains to verify that \(\Vert L^\phi V^{\pi ^1,\pi ^2} \Vert _{w_1}^{es} < \infty \). Indeed, by (9) and Assumptions 1-3, a simple calculation gives that

$$\begin{aligned} \vert L^\phi V^{\pi ^1,\pi ^2} (x) \vert \le \big [ \frac{M_0}{\alpha -c_0} (1+ \frac{d_0}{\alpha }) (\alpha + c_0 + d_0 +2 M_1) + M_0 \big ] w_1(x) \quad \forall x \in {\mathcal {D}}^{V^{\pi ^1,\pi ^2}}, \end{aligned}$$

which indicates that \(\Vert L^\phi V^{\pi ^1,\pi ^2} \Vert _{w_1}^{es} < \infty \). Now, if \(\varphi \) is another solution in \(\mathbbm {B}_{w_0,w_1}^{ac}(E)\) to equation (9), by Theorem 3, we must have \(\varphi = V^{\pi ^1,\pi ^2}\). This completes the proof. \(\square \)

4 Main results

4.1 Shapley equation and saddle points

In this subsection, we prove that the game has a value function \(V^*\) satisfying the associated Shapley equation, and that a saddle point exists.

To proceed, we introduce a set of continuity assumptions.

Assumption 4

Let \(x \in E\) be arbitrary.

  1. (a)

The payoff rate \(r(x,a,b)\) is continuous in \((a,b) \in A(x) \times B(x)\);

  2. (b)

    The function \(q(D{\vert }x,a,b)\) is continuous in \((a,b) \in A(x) \times B(x)\) for every \(D \in {{{\mathcal {B}}}}(E)\);

  3. (c)

    The function \(\int _ E w_0(y) q(dy {\vert } x,a,b)\) is continuous in \((a,b) \in A(x) \times B(x)\).

Moreover, let m(x) be a measurable function on E such that \(m(x) \ge q^*(x)+1\), and

$$\begin{aligned} Q(D{\vert } x,a,b):=\frac{q(D{\vert } x,a,b)}{m(x)}+\delta _{\{x\}}(D) \ \ \ \forall (x,a,b)\in K \ \textrm{and} \ D\in {{{\mathcal {B}}}}(E). \end{aligned}$$

Clearly, \(Q(\cdot {\vert } x,a,b)\) is a stochastic kernel on E given K.
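For a finite state space this construction is just \(Q = q/m + I\). The following minimal sketch (with a hypothetical three-state generator and the actions \((a,b)\) held fixed; the numbers are our own, not from the paper) confirms the stochastic-kernel property when \(m(x) \ge q^*(x)+1\):

```python
# Hypothetical conservative generator q: off-diagonal rates nonnegative,
# rows sum to zero; q*(x) = -q(x, {x}) is the total jump rate out of x.
q = [
    [-2.0,  1.5,  0.5],
    [ 1.0, -3.0,  2.0],
    [ 0.5,  0.5, -1.0],
]
# Take m(x) = q*(x) + 1, as in the text.
m = [-q[i][i] + 1.0 for i in range(3)]

# Uniformized kernel: Q(y | x) = q(y | x)/m(x) + delta_x(y).
Q = [[q[i][j] / m[i] + (1.0 if i == j else 0.0) for j in range(3)]
     for i in range(3)]

for row in Q:
    assert all(p >= 0.0 for p in row)     # entries are nonnegative
    assert abs(sum(row) - 1.0) < 1e-12    # each row sums to one
```

Dividing by \(m(x)\) shrinks the (possibly large) exit rate below one, and adding the Dirac term at \(x\) restores total mass one, which is exactly why \(m(x) \ge q^*(x)+1\) is required.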

Lemma 2

  1. (a)

    \(P(A(\cdot ))\) and \(P(B(\cdot ))\) are measurable compact-valued multi-functions from E to \({\mathscr {P}}(P(A))\) and \({\mathscr {P}}(P(B))\), respectively;

  2. (b)

    Under Assumptions 2(b) and 4(a), \(r(x,\eta ,\gamma )\) defined by (7) is continuous in \((\eta ,\gamma ) \in P(A(x)) \times P(B(x))\) with respect to the weak topology for each \(x \in E\);

  3. (c)

    Suppose Assumptions 1(a), 3(a), 4(b) and 4(c) hold. For every function \(\varphi \) in \(\mathbbm {B}_{w_0}(E)\) and \(x \in E \), \(\int _ E \varphi (y) Q(dy {\vert } x,\eta ,\gamma )\) and \(\int _ E \varphi (y) q(dy {\vert } x,\eta ,\gamma )\) defined by (8) are continuous in \((\eta ,\gamma ) \in P(A(x)) \times P(B(x))\) with respect to the weak topology.

Proof

  1. (a)

When the PDMG model was introduced, we assumed that \(A(\cdot )\) and \(B(\cdot )\) are measurable compact-valued multi-functions from E to \({\mathscr {P}}(A)\) and \({\mathscr {P}}(B)\), respectively. Then, the result follows from Lemma 1.11 in Nowak (1984).

  2. (b)

For each fixed \(x \in E\), Assumptions 2(b) and 4(a) imply that \(r(x,a,b)\) is a bounded continuous function on \(A(x) \times B(x)\). Hence, for any sequence \((\eta _n,\gamma _n)\) weakly converging to \((\eta ,\gamma )\) in \(P(A(x)) \times P(B(x))\), we have that \(r(x,\eta _n,\gamma _n)\) converges to \(r(x,\eta ,\gamma )\), which means that \(r(x,\eta ,\gamma )\) is continuous in \((\eta ,\gamma ) \in P(A(x)) \times P(B(x))\).

  3. (c)

For each \(x \in E\), Assumption 4(b) implies that \(\int _ E \varphi (y) Q(dy {\vert } x,a,b)\) is continuous in \((a,b) \in A(x) \times B(x)\) for every bounded measurable function \(\varphi \) on E, while Assumption 4(c) implies that \(\int _ E w_0(y) Q(dy {\vert } x,a,b)\) is continuous in \((a,b) \in A(x) \times B(x)\). Using these facts and a similar argument as in the proof of Lemma 8.3.7 in Hernández-Lerma and Lasserre (1999), we can prove that \(\int _ E \varphi (y) Q(dy {\vert } x,a,b)\) is continuous in \((a,b) \in A(x) \times B(x)\) for every \(\varphi \) in \(\mathbbm {B}_{w_0}(E)\), and so is \(\int _ E \varphi (y) q(dy {\vert } x,a,b)\).

For each fixed \(x \in E\), Assumptions 1(a) and 3(a) as well as what we have proved above imply that \( \int _E \varphi (y) Q(dy {\vert } x,a,b)\) and \(\int _E \varphi (y) q(dy {\vert } x,a,b)\) are bounded continuous functions on \(A(x) \times B(x)\). Hence, for any sequence \((\eta _n,\gamma _n)\) weakly converging to \((\eta ,\gamma )\) in \(P(A(x)) \times P(B(x))\), we have \(\int _E \varphi (y) Q(d y {\vert } x,\eta _n,\gamma _n) \rightarrow \int _E \varphi (y) Q(d y {\vert } x,\eta ,\gamma )\) and \(\int _E \varphi (y) q(d y{\vert } x,\eta _n,\gamma _n) \rightarrow \int _E \varphi (y) q(d y {\vert } x,\eta ,\gamma )\), completing the proof. \(\square \)

For a function \(\varphi \) in \(\mathbbm {B}_{w_0}(E)\), we define a dynamic programming operator H as below:

$$\begin{aligned} H \varphi (x):= & {} \int _0^\infty e^{- \alpha s} e^{-\int _0^s m(\phi (x,v))dv} \sup \limits _{\eta \in P(A(\phi (x,s)))} \inf \limits _{\gamma \in P(B(\phi (x,s)))} \big [ r(\phi (x,s),\eta , \gamma ) \\{} & {} + m(\phi (x,s)) \int _{E} \varphi (y) Q(dy{\vert } \phi (x,s),\eta ,\gamma )\big ] d s, \ x \in E. \end{aligned}$$

In general, the function

$$\begin{aligned} x \mapsto \sup \limits _{\eta \in P(A(x))} \inf \limits _{\gamma \in P(B(x))} \Big [ r(x,\eta , \gamma ) + m(x) \int _{E} \varphi (y) Q(dy{\vert } x,\eta ,\gamma )\Big ] \end{aligned}$$

need not be measurable. However, under the conditions in Lemma 2, by Fan’s minimax theorem in Fan (1953) and Lemma 4.1 in Nowak (1984), this function is indeed measurable, and thus the operator H is well defined.

Now, for each \(n=0,1,\ldots \) and \(x \in E\), we recursively define a sequence of functions as follows: \(\psi _{n+1}(x):=H \psi _n(x)\), where the initial function \(\psi _0\) is defined by

$$\begin{aligned} \psi _0(x):= -\frac{M_0}{\alpha -c_0} w_0(x) - \frac{d_0 M_0}{\alpha (\alpha -c_0)}. \end{aligned}$$

The choice of \(\psi _0\) is inspired by Lemma 1, i.e., \(\psi _0\) is a lower bound of \(V^{\pi ^1,\pi ^2}\). Note that H need not be a contraction operator due to the unbounded transition rate, and so the convergence of \(\{\psi _n\}\) may be sensitive to the choice of \(\psi _0\).

Theorem 5

Suppose that Assumptions 1, 2, 3(a) and 4 are satisfied.

  1. (a)

    The sequence \(\{\psi _n\}\) is increasing in n, and the limit \(\psi _\infty :=\lim \limits _{n\rightarrow \infty }\psi _n\) is in \(\mathbbm {B}_{w_0}(E)\).

  2. (b)

    The function \(\psi _\infty \) in (a) satisfies the equation \(\psi _\infty =H \psi _\infty \).

  3. (c)

    The function \(\psi _\infty \) in (a) is in \(\mathbbm {B}_{w_0,w_1}^{ac}(E)\) and verifies the following Shapley equation:

    $$\begin{aligned}{} & {} L^\phi \varphi (x) - \alpha \varphi (x) \nonumber \\{} & {} \quad +\sup \limits _{\eta \in P(A(x))} \inf \limits _{\gamma \in P(B(x))} \Big [r(x,\eta ,\gamma ) + \int _E \varphi (y) q(dy {\vert } x,\eta ,\gamma ) \Big ] =0 \ \forall x \in {\mathcal {D}}^{\varphi }. \end{aligned}$$
    (12)

Proof

  1. (a)

To prove the monotonicity of the sequence \(\{\psi _n\}\), we first show that \(\psi _1 \ge \psi _0\). Indeed, under Assumptions 1 and 2, for every \(x \in E\), a direct calculation gives

    $$\begin{aligned}{} & {} \psi _1(x)\\{} & {} \quad \ge - \int _0^\infty e^{- \alpha s} e^{-\int _0^s m(\phi (x,v))dv} \sup \limits _{\eta \in P(A(\phi (x,s)))} \inf \limits _{\gamma \in P(B(\phi (x,s)))} \Big [ M_0 w_0 (\phi (x,s)) \\{} & {} \qquad + m(\phi (x,s)) \int _{E} \Big ( \frac{M_0}{\alpha -c_0} w_0(y) + \frac{d_0 M_0}{\alpha (\alpha -c_0)}\Big ) Q(dy{\vert } \phi (x,s),\eta ,\gamma ) \Big ] d s \\{} & {} \quad \ge - \int _0^\infty e^{- \alpha s} e^{-\int _0^s m(\phi (x,v))dv} \Big [ M_0 w_0 (\phi (x,s)) \\{} & {} \qquad + \frac{M_0}{\alpha -c_0} \Big ( c_0 w_0(\phi (x,s))+ d_0 + m(\phi (x,s))w_0(\phi (x,s)) \Big )\\{} & {} \qquad + m(\phi (x,s))\frac{d_0 M_0}{\alpha (\alpha -c_0)} \Big ] d s \\{} & {} \quad \ge - \int _0^\infty e^{- \alpha s} e^{-\int _0^s m(\phi (x,v))dv} \Big [ M_0 w_0 (x) \\{} & {} \qquad + \frac{M_0}{\alpha -c_0} \Big ( c_0 w_0(x)+ d_0 + m(\phi (x,s))w_0(x) \Big ) + m(\phi (x,s))\frac{d_0 M_0}{\alpha (\alpha -c_0)} \Big ] d s \\{} & {} \quad = - \int _0^\infty e^{- \alpha s} e^{-\int _0^s m(\phi (x,v))dv} \Big [ \frac{\alpha M_0 w_0(x)}{\alpha -c_0} + \frac{M_0 d_0}{\alpha -c_0} \\{} & {} \qquad + \Big ( \frac{M_0 w_0(x)}{\alpha -c_0} + \frac{d_0 M_0}{\alpha (\alpha -c_0)} \Big ) m(\phi (x,s)) \Big ] d s \\{} & {} \quad = - \Big ( \frac{M_0 w_0(x)}{\alpha -c_0} + \frac{d_0 M_0}{\alpha (\alpha -c_0)} \Big ) \int _0^\infty e^{-\int _0^s (\alpha + m(\phi (x,v)))dv} \big ( \alpha + m(\phi (x,s)) \big ) d s \\{} & {} \quad = - \Big ( \frac{M_0 w_0(x)}{\alpha -c_0} + \frac{d_0 M_0}{\alpha (\alpha -c_0)} \Big ) \\{} & {} \quad = \psi _0(x). \end{aligned}$$

    Thus, the monotonicity of the operator H yields

    $$\begin{aligned} \psi _{n+1}= H^n\psi _1 \ge H^n\psi _0=\psi _n \ \forall n\ge 1, \end{aligned}$$

    which implies the monotonicity of the sequence \(\{\psi _n\}\), and thus the existence of the point-wise limit \(\psi _\infty \). Moreover, by a similar calculation as in the proof of \(\psi _1 \ge \psi _0\) and an induction argument, one can show that

    $$\begin{aligned} {\vert } \psi _n(x){\vert } \le \frac{M_0}{\alpha -c_0} w_0(x) + \frac{d_0 M_0}{\alpha (\alpha -c_0)} \ \forall x \in E, n\ge 0, \end{aligned}$$

and the same bound holds for \( \psi _\infty \), which indicates that \( \psi _\infty \in \mathbbm {B}_{w_0}(E)\).

  2. (b)

    On the one hand, by the monotonicity of H, we have that \( \psi _{n+1}=H \psi _n \le H \psi _\infty \) for all \(n \ge 0\). Hence, \(\psi _\infty \le H \psi _\infty \). On the other hand, for every fixed \(x \in E\) and any \(\eta \in P(A(x))\), it is clear that

    $$\begin{aligned}{} & {} \sup \limits _{\eta \in P(A(x))} \inf \limits _{\gamma \in P(B(x))} \Big [ r(x,\eta , \gamma ) + m(x) \int _{E} \psi _n(y) Q(dy{\vert } x,\eta ,\gamma )\Big ] \nonumber \\{} & {} \quad \ge \inf \limits _{\gamma \in P(B(x))} \Big [ r(x,\eta , \gamma ) + m(x) \int _{E} \psi _n(y) Q(dy{\vert } x,\eta ,\gamma )\Big ]. \end{aligned}$$
    (13)

    By Lemma 2, for each \(n\ge 0\), there exists \(\gamma _n \in P(B(x))\) such that

    $$\begin{aligned}{} & {} \inf \limits _{\gamma \in P(B(x))} \Big [ r(x,\eta , \gamma ) + m(x) \int _{E} \psi _n(y) Q(dy{\vert } x,\eta ,\gamma )\Big ] \nonumber \\{} & {} \quad = r(x,\eta , \gamma _n) + m(x) \int _{E} \psi _n(y) Q(dy{\vert } x,\eta ,\gamma _n). \end{aligned}$$
    (14)

Since \(P(B(x))\) is compact with respect to the weak topology, there is a subsequence \(\{\gamma _{n_l}, l \ge 0\}\) of \(\{\gamma _n, n\ge 0\}\) such that \(\gamma _{n_l}\) weakly converges to some \(\gamma ^* \in P(B(x))\). Hence, it follows from Lemma 2(b) that

    $$\begin{aligned} \lim _{l \rightarrow \infty } r(x,\eta , \gamma _{n_l}) = r(x,\eta , \gamma ^*). \end{aligned}$$
    (15)

    Furthermore,

    $$\begin{aligned}{} & {} \Big \vert \int _{E} \psi _{n_l}(y) Q(dy{\vert } x,\eta ,\gamma _{n_l}) - \int _{E} \psi _\infty (y) Q(dy{\vert } x,\eta ,\gamma ^*) \Big \vert \nonumber \\{} & {} \quad \le \Big \vert \int _{E} \psi _{n_l}(y) Q(dy{\vert } x,\eta ,\gamma _{n_l}) - \int _{E} \psi _\infty (y) Q(dy{\vert }x,\eta ,\gamma _{n_l}) \Big \vert \nonumber \\{} & {} \qquad + \Big \vert \int _{E} \psi _\infty (y) Q(dy{\vert }x,\eta ,\gamma _{n_l}) - \int _{E} \psi _\infty (y) Q(dy{\vert }x,\eta ,\gamma ^*) \Big \vert \nonumber \\{} & {} \quad \le \int _{E} \Big ( \psi _\infty (y) - \psi _{n_l}(y) \Big ) Q(dy{\vert }x,\eta ,\gamma _{n_l}) \nonumber \\{} & {} \qquad + \Big \vert \int _{E} \psi _\infty (y) Q(dy{\vert }x,\eta ,\gamma _{n_l}) - \int _{E} \psi _\infty (y) Q(dy{\vert }x,\eta ,\gamma ^*) \Big \vert . \end{aligned}$$
    (16)

Note that \(( \psi _\infty - \psi _{n_l})\) is non-increasing in \(l\). Under Assumption 4, using Theorem A.1.5 in Bäuerle and Rieder (2011) and the dominated convergence theorem, we conclude that

    $$\begin{aligned}{} & {} \lim _{l \rightarrow \infty } \int _{E} \Big ( \psi _\infty (y) - \psi _{n_l}(y) \Big ) Q(dy{\vert }x,\eta ,\gamma _{n_l}) \nonumber \\{} & {} \quad \le \lim _{l \rightarrow \infty } \sup _{b \in B(x)} \Big [ \int _{E} \Big ( \psi _\infty (y) - \psi _{n_l}(y) \Big ) Q(dy{\vert }x,\eta ,b) \Big ] \nonumber \\{} & {} \quad = \sup _{b \in B(x)} \lim _{l \rightarrow \infty } \Big [ \int _{E} \Big (\psi _\infty (y) - \psi _{n_l}(y) \Big ) Q(dy{\vert }x,\eta ,b) \Big ] \nonumber \\{} & {} \quad = \sup _{b \in B(x)} \Big [\int _{E} \lim _{l \rightarrow \infty } \Big (\psi _\infty (y) - \psi _{n_l}(y) \Big ) Q(dy{\vert }x,\eta ,b) \Big ] \nonumber \\{} & {} \quad = 0. \end{aligned}$$
    (17)

    Moreover, since \(\psi _\infty \) is in \(\mathbbm {B}_{w_0}(E)\), it follows from Lemma 2(c) that

    $$\begin{aligned} \lim _{l \rightarrow \infty } \Big \vert \int _{E} \psi _\infty (y) Q(dy{\vert }x,\eta ,\gamma _{n_l}) - \int _{E} \psi _\infty (y) Q(dy{\vert }x,\eta ,\gamma ^*) \Big \vert =0. \end{aligned}$$
    (18)

    Now, using (16)-(18), we have

    $$\begin{aligned} \lim _{l \rightarrow \infty } \int _{E} \psi _{n_l}(y) Q(dy{\vert }x,\eta ,\gamma _{n_l}) = \int _{E} \psi _\infty (y) Q(dy{\vert }x,\eta ,\gamma ^*). \end{aligned}$$
    (19)

    Hence, it follows from (14), (15) and (19) that

    $$\begin{aligned}{} & {} \lim _{l \rightarrow \infty } \inf \limits _{\gamma \in P(B(x))} \Big [ r(x,\eta , \gamma ) + m(x) \int _{E} \psi _{n_l}(y) Q(dy{\vert }x,\eta ,\gamma )\Big ] \nonumber \\{} & {} \quad = \lim _{l \rightarrow \infty } \Big [ r(x,\eta , \gamma _{n_l}) + m(x) \int _{E} \psi _{n_l}(y) Q(dy{\vert }x,\eta ,\gamma _{n_l})\Big ] \nonumber \\{} & {} \quad = r(x,\eta , \gamma ^*) + m(x) \int _{E} \psi _\infty (y) Q(dy{\vert }x,\eta ,\gamma ^*), \end{aligned}$$
    (20)

    which together with (13) gives that

    $$\begin{aligned}{} & {} \lim _{l \rightarrow \infty } \sup \limits _{\eta \in P(A(x))} \inf \limits _{\gamma \in P(B(x))} \Big [ r(x,\eta , \gamma ) + m(x) \int _{E} \psi _{n_l}(y) Q(dy{\vert }x,\eta ,\gamma )\Big ] \nonumber \\{} & {} \quad \ge \lim _{l \rightarrow \infty } \inf \limits _{\gamma \in P(B(x))} \Big [ r(x,\eta , \gamma ) + m(x) \int _{E} \psi _{n_l}(y) Q(dy{\vert }x,\eta ,\gamma )\Big ] \nonumber \\{} & {} \quad = r(x,\eta , \gamma ^*) + m(x) \int _{E} \psi _\infty (y) Q(dy{\vert }x,\eta ,\gamma ^*) \nonumber \\{} & {} \quad \ge \inf \limits _{\gamma \in P(B(x))} \Big [ r(x,\eta , \gamma ) + m(x) \int _{E} \psi _\infty (y) Q(dy{\vert }x,\eta ,\gamma )\Big ]. \nonumber \\ \end{aligned}$$
    (21)

    Hence, using the arbitrariness of \(\eta \) and (21), we obtain

    $$\begin{aligned}{} & {} \lim _{l \rightarrow \infty } \sup \limits _{\eta \in P(A(x))} \inf \limits _{\gamma \in P(B(x))} \Big [ r(x,\eta , \gamma ) + m(x) \int _{E} \psi _{n_l}(y) Q(dy{\vert }x,\eta ,\gamma )\Big ] \nonumber \\{} & {} \quad \ge \sup \limits _{\eta \in P(A(x))} \inf \limits _{\gamma \in P(B(x))} \Big [ r(x,\eta , \gamma ) + m(x) \int _{E} \psi _\infty (y) Q(dy{\vert }x,\eta ,\gamma )\Big ] \ \forall x \in E. \end{aligned}$$
    (22)

    Thus, using the dominated convergence theorem and (22), we have

    $$\begin{aligned}{} & {} \psi _\infty (x)\\{} & {} \quad = \lim _{l\rightarrow \infty } H \psi _{n_l}(x) \\{} & {} \quad = \lim _{l\rightarrow \infty } \int _0^\infty e^{- \alpha s} e^{-\int _0^s m(\phi (x,v))dv} \sup \limits _{\eta \in P(A(\phi (x,s)))} \inf \limits _{\gamma \in P(B(\phi (x,s)))} \Big [ r(\phi (x,s),\eta , \gamma ) \\{} & {} \qquad + m(\phi (x,s)) \int _{E} \psi _{n_l}(y) Q(dy{\vert }\phi (x,s),\eta ,\gamma )\Big ] d s \\{} & {} \quad = \int _0^\infty e^{- \alpha s} e^{-\int _0^s m(\phi (x,v))dv} \lim _{l\rightarrow \infty } \sup \limits _{\eta \in P(A(\phi (x,s)))} \inf \limits _{\gamma \in P(B(\phi (x,s)))} \Big [ r(\phi (x,s),\eta , \gamma ) \\{} & {} \qquad + m(\phi (x,s)) \int _{E} \psi _{n_l}(y) Q(dy{\vert }\phi (x,s),\eta ,\gamma )\Big ] d s \\{} & {} \quad \ge \int _0^\infty e^{- \alpha s} e^{-\int _0^s m(\phi (x,v))dv} \sup \limits _{\eta \in P(A(\phi (x,s)))} \inf \limits _{\gamma \in P(B(\phi (x,s)))} \Big [ r(\phi (x,s),\eta , \gamma ) \\{} & {} \qquad + m(\phi (x,s)) \int _{E} \psi _\infty (y) Q(dy{\vert }\phi (x,s),\eta ,\gamma )\Big ] d s \ \forall x \in E, \end{aligned}$$

which gives the reverse inequality \(\psi _\infty \ge H \psi _\infty \). Therefore, \(\psi _\infty = H \psi _\infty \).

  3. (c)

    Clearly, for every \(x \in E\) and \(t\in R_+\), we see that

    $$\begin{aligned}{} & {} \psi _\infty (\phi (x,t)) \\{} & {} \quad = \int _0^\infty e^{- \alpha s} e^{-\int _0^s m(\phi (x,v+t))dv} \\{} & {} \qquad \times \sup \limits _{\eta \in P(A(\phi (x,s+t)))} \inf \limits _{\gamma \in P(B(\phi (x,s+t)))} \Big [ r(\phi (x,s+t),\eta , \gamma ) \\{} & {} \qquad + m(\phi (x,s+t)) \int _{E} \psi _\infty (y) Q(dy{\vert }\phi (x,s+t),\eta ,\gamma )\Big ] d s, \ \ x \in E, \end{aligned}$$

which is equivalent to

    $$\begin{aligned}{} & {} e^{-\int _0^t (\alpha + m(\phi (x,v)))dv} \psi _\infty (\phi (x,t)) \nonumber \\{} & {} \quad = \int _t^\infty e^{-\int _0^s (\alpha + m(\phi (x,v)))dv} \sup \limits _{\eta \in P(A(\phi (x,s)))} \inf \limits _{\gamma \in P(B(\phi (x,s)))} \Big [ r(\phi (x,s),\eta , \gamma ) \nonumber \\{} & {} \qquad + m(\phi (x,s)) \int _{E} \psi _\infty (y) Q(dy{\vert }\phi (x,s),\eta ,\gamma )\Big ] d s , \ \ x \in E. \end{aligned}$$
    (23)

    This equality shows that \(\psi _\infty (\phi (x,t))\) is absolutely continuous in \(t \in R_+\), and thus, is differentiable almost everywhere on \(R_+\), which indicates that \(\psi _\infty \) is in \(\mathbbm {B}_{w_0}^{ac}(E)\). For \(x \in E\) and \(t\in R_+\), differentiating both sides of (23) with respect to t eventually leads to

    $$\begin{aligned}{} & {} L^\phi \psi _\infty (x) - \alpha \psi _\infty (x) \\{} & {} \quad + \sup \limits _{\eta \in P(A(x))} \inf \limits _{\gamma \in P(B(x))} \Big [r(x,\eta ,\gamma ) + \int _E \psi _\infty (y) q(dy{\vert }x,\eta ,\gamma ) \Big ] =0 \ \forall x \in {\mathcal {D}}^{\psi _\infty }, \end{aligned}$$

    which in turn implies that \(\Vert L^\phi \psi _\infty \Vert _{w_1}^{es} <\infty \). The proof is complete. \(\square \)

We are now ready to state our main results.

Theorem 6

Under Assumptions 1–3 and 4, the following assertions hold.

  1. (a)

    The game has the value function \(V^*(x)\) as the unique solution in \(\mathbbm {B}_{w_0,w_1}^{ac}(E)\) to the Shapley equation

    $$\begin{aligned}{} & {} L^\phi V^*(x) - \alpha V^*(x) \\{} & {} \quad + \sup \limits _{\eta \in P(A(x))} \inf \limits _{\gamma \in P(B(x))} \Big [r(x,\eta ,\gamma ) + \int _E V^*(y) q(dy{\vert } x,\eta ,\gamma ) \Big ] =0 \ \ \forall x \in {\mathcal {D}}^{V^*}. \end{aligned}$$
  2. (b)

    There exists a pair of policies \((\hat{\pi }^1,\hat{\pi }^2) \in \Pi ^1_{RS} \times \Pi ^2_{RS}\) such that

    $$\begin{aligned}{} & {} \sup \limits _{\eta \in P(A(x))} \inf \limits _{\gamma \in P(B(x))} \Big [r(x,\eta ,\gamma ) + \int _E V^*(y) q(dy{\vert } x,\eta ,\gamma ) \Big ] \nonumber \\{} & {} \quad = r(x, \hat{\pi }^1(\cdot {\vert } x),\hat{\pi }^2(\cdot {\vert } x)) + \int _E V^*(y) q(dy{\vert } x, \hat{\pi }^1(\cdot {\vert } x),\hat{\pi }^2(\cdot {\vert } x)) \\{} & {} \quad = \inf \limits _{\gamma \in P(B(x))} \Big [r(x,\hat{\pi }^1(\cdot {\vert } x),\gamma ) + \int _E V^*(y) q(dy{\vert } x,\hat{\pi }^1(\cdot {\vert } x),\gamma ) \Big ] \\{} & {} \quad = \sup \limits _{\eta \in P(A(x))} \Big [r(x,\eta ,\hat{\pi }^2(\cdot {\vert } x)) + \int _E V^*(y) q(dy{\vert } x,\eta ,\hat{\pi }^2(\cdot {\vert } x)) \Big ] \end{aligned}$$

    for all \(x \in E\), and such a pair of policies \((\hat{\pi }^1,\hat{\pi }^2) \in \Pi ^1_{RS} \times \Pi ^2_{RS}\) is a saddle point.

Proof

For each \(x \in E\), by Lemma 2, \(P(A(x))\) and \(P(B(x))\) are compact, and the function

$$\begin{aligned} (\eta ,\gamma ) \mapsto r(x,\eta ,\gamma ) + \int _E \psi _\infty (y) q(dy{\vert } x,\eta ,\gamma ) \end{aligned}$$
(24)

is continuous on \( P(A(x))\times P(B(x))\) with respect to the weak topology. Moreover, (24) is linear in \(\eta \) for each fixed \(\gamma \), and linear in \(\gamma \) for each fixed \(\eta \). Thus, using Fan’s minimax theorem in Fan (1953), we obtain

$$\begin{aligned}{} & {} \sup \limits _{\eta \in P(A(x))} \inf \limits _{\gamma \in P(B(x))} \Big [r(x,\eta ,\gamma ) + \int _E \psi _\infty (y) q(dy{\vert } x,\eta ,\gamma ) \Big ] \\{} & {} \quad = \inf \limits _{\gamma \in P(B(x))} \sup \limits _{\eta \in P(A(x))} \Big [r(x,\eta ,\gamma ) + \int _E \psi _\infty (y) q(dy{\vert } x,\eta ,\gamma ) \Big ] \ \forall x \in E. \end{aligned}$$

Now, by Lemma 2 above and Lemma 4.1 in Nowak (1984), there exists a pair of policies \((\hat{\pi }^1,\hat{\pi }^2) \in \Pi ^1_{RS} \times \Pi ^2_{RS}\) such that

$$\begin{aligned}{} & {} \sup \limits _{\eta \in P(A(x))} \inf \limits _{\gamma \in P(B(x))} \Big [r(x,\eta ,\gamma ) + \int _E \psi _\infty (y) q(dy{\vert } x,\eta ,\gamma ) \Big ] \\{} & {} \quad = \inf \limits _{\gamma \in P(B(x))} \Big [r(x,\hat{\pi }^1(\cdot {\vert } x),\gamma ) + \int _E \psi _\infty (y) q(dy{\vert } x,\hat{\pi }^1(\cdot {\vert } x),\gamma ) \Big ] \ \forall x \in E, \end{aligned}$$

and

$$\begin{aligned}{} & {} \inf \limits _{\gamma \in P(B(x))} \sup \limits _{\eta \in P(A(x))} \Big [r(x,\eta ,\gamma ) + \int _E \psi _\infty (y) q(dy{\vert } x,\eta ,\gamma ) \Big ] \\{} & {} \quad = \sup \limits _{\eta \in P(A(x))} \Big [r(x,\eta ,\hat{\pi }^2(\cdot {\vert } x)) + \int _E \psi _\infty (y) q(dy{\vert } x,\eta ,\hat{\pi }^2(\cdot {\vert } x)) \Big ] \ \forall x \in E, \end{aligned}$$

which implies that

$$\begin{aligned}{} & {} \sup \limits _{\eta \in P(A(x))} \inf \limits _{\gamma \in P(B(x))} \Big [r(x,\eta ,\gamma ) + \int _E \psi _\infty (y) q(dy{\vert } x,\eta ,\gamma ) \Big ] \\{} & {} \quad = r(x, \hat{\pi }^1(\cdot {\vert } x),\hat{\pi }^2(\cdot {\vert } x)) + \int _E \psi _\infty (y) q(dy{\vert } x, \hat{\pi }^1(\cdot {\vert } x),\hat{\pi }^2(\cdot {\vert } x)) \ \forall x \in E. \end{aligned}$$

Therefore, since the function \(\psi _\infty \) satisfies Shapley equation (12) by Theorem 5, we have

$$\begin{aligned}{} & {} L^\phi \psi _\infty (x) - \alpha \psi _\infty (x) + r(x, \hat{\pi }^1(\cdot {\vert } x),\hat{\pi }^2(\cdot {\vert } x)) \nonumber \\{} & {} \quad + \int _E \psi _\infty (y) q(dy {\vert }x, \hat{\pi }^1(\cdot {\vert } x),\hat{\pi }^2(\cdot {\vert } x))=0, \end{aligned}$$
(25)
$$\begin{aligned}{} & {} L^\phi \psi _\infty (x) - \alpha \psi _\infty (x) + \inf \limits _{\gamma \in P(B(x))} \Big [r(x,\hat{\pi }^1(\cdot {\vert } x),\gamma ) \nonumber \\{} & {} \quad + \int _E \psi _\infty (y) q(dy{\vert } x,\hat{\pi }^1(\cdot {\vert } x),\gamma ) \Big ] =0, \end{aligned}$$
(26)
$$\begin{aligned}{} & {} L^\phi \psi _\infty (x) - \alpha \psi _\infty (x) + \sup \limits _{\eta \in P(A(x))} \Big [r(x,\eta ,\hat{\pi }^2(\cdot {\vert } x)) \nonumber \\{} & {} \quad + \int _E \psi _\infty (y) q(dy{\vert } x,\eta ,\hat{\pi }^2(\cdot {\vert } x)) \Big ] =0, \end{aligned}$$
(27)

for every \(x \in {\mathcal {D}}^{\psi _\infty }\). Using Theorem 4 and (25), we have \(\psi _\infty =V^{\hat{\pi }^1,\hat{\pi }^2}\). Using Theorem 3(b) and (26), we have \(\psi _\infty \le V^{\hat{\pi }^1,\pi ^2}\) for every \(\pi ^2 \in \Pi ^2\). Using Theorem 3(a) and (27), we have \( V^{\pi ^1,\hat{\pi }^2} \le \psi _\infty \) for every \(\pi ^1 \in \Pi ^1\). These three facts show that

$$\begin{aligned} V^{\pi ^1,{\hat{\pi }}^2}(x) \le V^{{\hat{\pi }}^1, {\hat{\pi }}^2}(x) \le V^{{\hat{\pi }}^1,\pi ^2}(x) \ \forall \pi ^1 \in \Pi ^1, \pi ^2 \in \Pi ^2, x \in E, \end{aligned}$$
(28)

which implies that the pair of policies \((\hat{\pi }^1,\hat{\pi }^2) \in \Pi ^1_{RS} \times \Pi ^2_{RS}\) is a saddle point, and thus \(V^*=\psi _\infty \) is the value of the game. Hence, part (a) and part (b) follow. \(\square \)

Remark 5

From Theorem 6, one can see that the max-min points of the mapping

$$\begin{aligned} (\eta ,\gamma ) \mapsto r(x,\eta ,\gamma ) + \int _E V^*(y) q(dy{\vert } x,\eta ,\gamma ) \end{aligned}$$

over \(P(A(x)) \times P(B(x))\) for all \(x \in E\), denoted by \((\hat{\pi }^1(\cdot {\vert } x),\hat{\pi }^2(\cdot {\vert } x))\), constitute a saddle point. Such a saddle point has a very simple form: it depends only on the current state and can be applied at any time.

4.2 How to compute a saddle point

Although Theorem 6 shows the existence of a saddle point, computing one in practice remains a difficult task. Two problems arise. The first is how to compute the value function \(V^*(x)\). The second is how to solve the static game

$$\begin{aligned} \sup \limits _{\eta \in P(A(x))} \inf \limits _{\gamma \in P(B(x))} \Big [r(x,\eta ,\gamma ) + \int _E V^*(y) q(dy \vert x,\eta ,\gamma ) \Big ]. \end{aligned}$$

Fortunately, Theorem 5 provides a value iteration scheme to compute the value function \(V^*(x)\), and Sect. 2.4.2 in Barron (2013) provides a linear program formulation for solving the static game. Below, we propose a potential algorithm to compute a saddle point.
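For finite action sets, the static game appearing in the algorithm is an ordinary matrix game, and each step can be carried out with any LP solver. As a hedged illustration (not part of the paper; the function name and matrices below are our own), when player 1 has only two pure actions the game can even be solved exactly without an LP, by maximizing the concave, piecewise-linear lower envelope over its finitely many candidate kinks:

```python
def solve_2xN(R):
    """Value and an optimal mixed strategy (p, 1 - p) for the row player of a
    zero-sum matrix game with two rows, given the payoff matrix R (2 x N).
    The lower envelope p -> min_b [p*R[0][b] + (1-p)*R[1][b]] is concave and
    piecewise linear, so its maximum is attained at p = 0, p = 1, or at an
    intersection of two column lines."""
    n = len(R[0])
    candidates = [0.0, 1.0]
    for i in range(n):
        for j in range(i + 1, n):
            # intersection of column lines i and j, if it lies in [0, 1]
            d = (R[0][i] - R[1][i]) - (R[0][j] - R[1][j])
            if abs(d) > 1e-12:
                p = (R[1][j] - R[1][i]) / d
                if 0.0 <= p <= 1.0:
                    candidates.append(p)

    def envelope(p):
        return min(p * R[0][b] + (1.0 - p) * R[1][b] for b in range(n))

    p_star = max(candidates, key=envelope)
    return envelope(p_star), p_star

# Matching pennies: value 0, optimal mixed strategy (1/2, 1/2).
value, p = solve_2xN([[1.0, -1.0], [-1.0, 1.0]])
```

By the minimax theorem for matrix games (LP duality), the column player's side gives the same value, which is what the comparison of LP2 and LP3 in Step 5 exploits; general \(M \times N\) games require the full LP formulation of Sect. 2.4.2 in Barron (2013).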

Algorithm for saddle point:

Step 1.:

Specify an accuracy \(\epsilon > 0\), and set \(n=0\). Let

$$\begin{aligned} \psi _0(x):= -\frac{M_0}{\alpha -c_0} w_0(x) - \frac{d_0 M_0}{\alpha (\alpha -c_0)} \ \ \ \ \ \forall x \in E. \end{aligned}$$
Step 2.:

For each \(x \in E\), carry out the linear program:

$$\begin{aligned} \begin{array}{lll} {\textbf {LP1}}: &{} \max \limits _{\eta , V(x)} &{} V(x), \\ &{} \text {s.t.} &{} r(x,\eta ,b) + m(x) \int _E \psi _n (y) Q(dy\vert x,\eta ,b)\ge V(x) \ \ \ \forall b \in B(x), \\ &{} &{} \eta \in P(A(x)). \end{array} \end{aligned}$$

One of the outputs of LP1 is the value \({{\hat{V}}}(x)\) of the static game

$$\begin{aligned} \sup \limits _{\eta \in P(A(x))} \inf \limits _{\gamma \in P(B(x))} \Big [ r(x,\eta , \gamma ) + m(x) \int _{E} \psi _n(y) Q(dy\vert x,\eta ,\gamma )\Big ]. \end{aligned}$$
Step 3.:

For each \(x \in E\), compute

$$\begin{aligned} \psi _{n+1}(x):=\int _0^\infty e^{-\int _0^s (\alpha + m(\phi (x,v)))dv} {{\hat{V}}}(\phi (x,s)) d s. \end{aligned}$$
Step 4.:

If \(\vert \psi _{n+1}(x)-\psi _n(x)\vert <\epsilon \) for every \(x \in E\), go to Step 5. Otherwise, increment n by 1 and return to Step 2.

Step 5.:

For each \(x \in E\), carry out the two linear programs:

$$\begin{aligned} \begin{array}{lll} {\textbf {LP2}}: &{} \max \limits _{\eta , V(x)} &{} V(x), \\ &{} \text {s.t.} &{} r(x,\eta ,b) + \int _E \psi _{n+1}(y) q(dy\vert x,\eta ,b) \ge V(x) \ \ \ \forall b \in B(x), \\[3mm] &{} &{} \eta \in P(A(x)). \end{array} \end{aligned}$$

and

$$\begin{aligned} \begin{array}{lll} {\textbf {LP3}}: &{} \min \limits _{\gamma , W(x)} &{} W(x), \\ &{} \text {s.t.} &{} r(x,a,\gamma ) + \int _E \psi _{n+1}(y) q(dy\vert x,a,\gamma ) \le W(x) \ \ \ \forall a \in A(x), \\[3mm] &{} &{} \gamma \in P(B(x)). \end{array} \end{aligned}$$

Denote by \(({{\hat{V}}}(x), {\hat{\pi }}^1(\cdot \vert x))\) an optimal solution of LP2 and by \(({{\hat{W}}}(x), {\hat{\pi }}^2(\cdot \vert x))\) an optimal solution of LP3. Then, \({{\hat{V}}}(x)={{\hat{W}}}(x)\) and \(({\hat{\pi }}^1(\cdot \vert x), {\hat{\pi }}^2(\cdot \vert x))\) are the value and a saddle point of the static game

$$\begin{aligned} \sup \limits _{\eta \in P(A(x))} \inf \limits _{\gamma \in P(B(x))} \Big [r(x,\eta ,\gamma ) + \int _E \psi _{n+1}(y) q(dy \vert x,\eta ,\gamma ) \Big ], x \in E, \end{aligned}$$

respectively.

Since \(\psi _{n+1} \approx V^*\), Theorem 6 suggests that \(({\hat{\pi }}^1, {\hat{\pi }}^2)\) obtained in Step 5 above is an approximate saddle point for the original PDMG.

It should be noted that implementing the algorithm exactly is difficult, since the state space and the action spaces are all uncountable; in practice, they need to be discretized.
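On a finite grid, however, Steps 2 and 5 reduce to tractable matrix-game linear programs. As a toy illustration of Step 5, the sketch below solves LP2 and LP3 for a hypothetical \(2\times 2\) payoff matrix \(H\), standing in for the tabulated values \(r(x,a_i,b_j)+\int _E \psi _{n+1}(y)q(dy\vert x,a_i,b_j)\), and confirms that the two optimal values coincide.

```python
# Sketch: for finite action sets, LP2 (maximising player) and LP3 (minimising
# player) form a primal-dual pair, so their optimal values coincide.
# H[i, j] is a hypothetical stand-in for
#   r(x, a_i, b_j) + integral of psi_{n+1}(y) q(dy | x, a_i, b_j).
import numpy as np
from scipy.optimize import linprog

def lp_max(H):  # LP2: player 1 guarantees at least V against every b_j
    n_a, n_b = H.shape
    c = np.zeros(n_a + 1); c[-1] = -1.0
    A_ub = np.hstack([-H.T, np.ones((n_b, 1))])
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n_b), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n_a + [(None, None)])
    return res.x[-1], res.x[:-1]

def lp_min(H):  # LP3: player 2 pays at most W against every a_i
    n_a, n_b = H.shape
    c = np.zeros(n_b + 1); c[-1] = 1.0
    A_ub = np.hstack([H, -np.ones((n_a, 1))])
    A_eq = np.hstack([np.ones((1, n_b)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n_a), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n_b + [(None, None)])
    return res.x[-1], res.x[:-1]

# A toy 2x2 game with no pure saddle point; its value is 1/7.
H = np.array([[3.0, -1.0], [-2.0, 1.0]])
V_hat, pi1 = lp_max(H)
W_hat, pi2 = lp_min(H)
```

The equality \({{\hat{V}}}(x)={{\hat{W}}}(x)\) asserted in Step 5 is, in this finite setting, exactly strong LP duality for this primal–dual pair.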

5 Example

In this section, we provide an example wherein all the assumptions in this paper are fulfilled.

Example 1

Let \(E=(0,\infty )\), \(A(x)=[-x-1,-x]\), and \(B(x)=[0,\frac{\sqrt{2}}{2}x]\) for each \(x\in E\). Suppose that the flow is \(\phi (x,t)=xe^{-t}\) for each \(t\in R_+\), and that the payoff rate is \(r(x,a,b)=\dfrac{x}{2}+a+b\) for each \((x,a,b) \in K\). The transition rates are given by

$$\begin{aligned} q(D\vert x,a,b)&= (x+a+b+1)\Big [\int _{D\setminus \{x\}} \frac{1}{x+a+b+1}e^{- \frac{y}{x+a+b+1}}dy - \delta _{\{x\}}(D)\Big ],\\ &\qquad \forall (x,a,b) \in K,\ D \in {\mathscr {B}}(E). \end{aligned}$$

Note that, starting from any state \(x \in E\), the flow never touches the boundary of E, namely \(\{0\}\). Moreover, it is obvious that the multi-functions \(A(\cdot )\) and \(B(\cdot )\) are compact-valued. Also, one can verify that \(A^{-1}(O):=\{x \in E: A(x) \cap O \ne \emptyset \} \in {\mathscr {B}}(E)\) and \(B^{-1}(O) \in {\mathscr {B}}(E)\) for every open interval O in R. Since every open set in R is a countable union of open intervals, it follows that \(A^{-1}(O) \in {\mathscr {B}}(E)\) and \(B^{-1}(O) \in {\mathscr {B}}(E)\) for every open set O in R. This means that the multi-functions \(A(\cdot )\) and \(B(\cdot )\) are measurable.

Now, we verify that all the assumptions in this paper are satisfied. To do so, take \(w_0(x)=x+1\) and \(w_1(x)=x^2+1\) for all \(x \in E\), and \(E_m=(0,m]\) for all \(m \ge 1\). Then, it is clear that \(w_0(\phi (x,t))\le w_0(x)\), \(w_1(\phi (x,t))\le w_1(x)\), \(E_m \uparrow E\), \(\sup _{x\in E_m} q^*(x)=\frac{\sqrt{2}}{2}m+1<\infty \), \(\lim _{m\rightarrow \infty }\inf _{x \notin E_m}w_0(x)=\infty \), \(\vert r(x,a,b) \vert \le \dfrac{x}{2}+1 \le w_0(x)\), and

$$\begin{aligned} \big (1+q(x,a,b) \big ) w_0(x)\le (\frac{\sqrt{2}}{2}x+2)(x+1) \le 3 w_1(x) \ \forall (x,a,b) \in K. \end{aligned}$$

This indicates that Assumptions 1(b), 1(c), 2(b), 3(a) and 3(c) are fulfilled.

Regarding Assumption 1(a), for all \((x,a,b) \in K\),

$$\begin{aligned} \int _E w_0(y)q(dy \vert x,a,b) \le (x+a+b+1)\Big [\Big (\frac{\sqrt{2}}{2}-1\Big )x+1\Big ]. \end{aligned}$$

If \(x\le 2+\sqrt{2}\), then \(\big (\frac{\sqrt{2}}{2}-1\big )x+1\ge 0\), and thus \((x+a+b+1)\Big [\big (\frac{\sqrt{2}}{2}-1\big )x+1\Big ] \le \big (\frac{\sqrt{2}}{2}x+1\big )\Big [\big (\frac{\sqrt{2}}{2}-1\big )x+1\Big ] \le w_0(x)\). If \(x > 2+\sqrt{2}\), then \(\big (\frac{\sqrt{2}}{2}-1\big )x+1 < 0\), and thus \((x+a+b+1)\Big [\big (\frac{\sqrt{2}}{2}-1\big )x+1\Big ] \le 0< w_0(x)\). This means that, in either case, Assumption 1(a) is fulfilled with \(c_0=1\) and \(d_0=0\).

Regarding Assumption 3(b), for all \((x,a,b) \in K\), we have

$$\begin{aligned} \int _E w_1 (y)q(dy\vert x,a,b)&= (x+a+b+1) \Big [2 (x+a+b+1)^2-x^2 \Big ]\\ &\le \Big (\frac{\sqrt{2}}{2}x+1\Big )\Big [2\Big (\frac{\sqrt{2}}{2}x+1\Big )^2 -x^2 \Big ]\\ &\le \Big (\frac{\sqrt{2}}{2}x+1\Big )\big (2\sqrt{2}x+ 2\big )\\ &\le 4 w_1(x)+2, \end{aligned}$$

which implies that Assumption 3(b) is satisfied with \(c_1=4\) and \(d_1=2\). Hence, if we take \(\alpha > 4\), the requirements on the discount factor in Assumptions 2(a) and 3 are both fulfilled.
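The two drift bounds can also be spot-checked numerically. Under the kernel of this example, the post-jump distribution is exponential with mean \(\lambda =x+a+b+1\), so \(\int _E w_0(y)q(dy\vert x,a,b)=\lambda (\lambda -x)\) and \(\int _E w_1(y)q(dy\vert x,a,b)=\lambda (2\lambda ^2-x^2)\) in closed form. The following sketch samples \((x,a,b)\in K\) at random and verifies Assumptions 1(a) and 3(b) pointwise.

```python
# Numerical spot-check of the drift conditions in Example 1.
# With lam = x + a + b + 1, the post-jump distribution is exponential with
# mean lam, giving the closed forms
#   ∫ w0(y) q(dy|x,a,b) = lam * (lam - x),
#   ∫ w1(y) q(dy|x,a,b) = lam * (2*lam**2 - x**2).
import random

random.seed(0)
SQRT2 = 2 ** 0.5

for _ in range(1000):
    x = random.uniform(0.0, 50.0)
    a = random.uniform(-x - 1.0, -x)            # a in A(x) = [-x-1, -x]
    b = random.uniform(0.0, SQRT2 / 2 * x)      # b in B(x) = [0, (sqrt2/2) x]
    lam = x + a + b + 1.0
    w0, w1 = x + 1.0, x ** 2 + 1.0
    drift0 = lam * (lam - x)                    # ∫ w0 dq
    drift1 = lam * (2.0 * lam ** 2 - x ** 2)    # ∫ w1 dq
    assert drift0 <= 1.0 * w0 + 0.0 + 1e-9      # Assumption 1(a): c0=1, d0=0
    assert drift1 <= 4.0 * w1 + 2.0 + 1e-9      # Assumption 3(b): c1=4, d1=2
```

Such randomized checks do not replace the proofs above, but they are a cheap safeguard against algebraic slips in the constants \(c_0, d_0, c_1, d_1\).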

Finally, Assumption 4 is obviously true for the data in this example.

Therefore, by Theorem 6, there is a saddle point for this game model.