Abstract
This paper deals with zero-sum games with a discounted reward criterion for piecewise deterministic Markov processes (PDMPs) in general Borel spaces. The two players can act on the jump rate and transition measure of the process, with the decisions being taken just after a jump of the process. The goal of this paper is to derive conditions for the existence of min–max strategies for the infinite horizon total expected discounted reward function, which is composed of running and boundary parts. The basic idea is, by using the special features of PDMPs, to rewrite the problem via an embedded discrete-time Markov chain associated to the PDMP and reformulate it as a discrete-stage zero-sum game problem.
1 Introduction
Piecewise deterministic Markov processes (PDMPs) were introduced in [2] and [3] as a general family of continuous-time non-diffusion stochastic models, suitable for formulating many optimization problems in queuing and inventory systems, maintenance-replacement models, and many other areas of engineering and operations research. PDMPs are determined by three local characteristics: the flow \(\phi \), the jump rate \(\lambda \), and the transition measure Q. Starting from x, the motion of the process follows the flow \(\phi (x,t)\) until the first jump time \(T_1\), which occurs either spontaneously in a Poisson-like fashion with rate \(\lambda \) or when the flow \(\phi (x,t)\) hits the boundary of the state space. In either case the location of the process at the jump time \(T_1\) is selected by the transition measure \(Q(\phi (x,T_1),.)\) and the motion restarts from this new point as before. As shown in [3], a suitable choice of the state space and the local characteristics \(\phi \), \(\lambda \), and Q provides stochastic models covering a great number of problems of engineering and operations research (see, for instance, [3, 4]).
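The jump mechanism just described can be illustrated with a minimal simulation sketch. All concrete ingredients here (the linear flow, the constant jump rate, and the uniform restart measure) are hypothetical choices made for illustration, not taken from the paper:

```python
import random

def simulate_pdmp(x0, lam, horizon, rng):
    """Simulate a toy PDMP on X = (0, 1): deterministic drift phi(x, t) = x + t,
    spontaneous jumps at constant rate lam, and a forced jump when the flow
    hits the boundary x = 1; after any jump the process restarts uniformly
    in (0, 1).  Returns the (jump_time, post_jump_location) pairs up to
    `horizon`."""
    t, x, jumps = 0.0, x0, []
    while t < horizon:
        t_star = 1.0 - x  # time for the flow to reach the boundary from x
        # The next jump is the earlier of the exponential clock and the
        # boundary-hitting time, exactly as in the description of T_1 above.
        sojourn = min(rng.expovariate(lam), t_star)
        t += sojourn
        if t >= horizon:
            break
        x = rng.uniform(0.0, 1.0)  # transition measure Q: uniform restart
        jumps.append((t, x))
    return jumps
```

The motion restarts from each post-jump location as before, which is the piecewise deterministic structure exploited throughout the paper.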
Zero-sum stochastic dynamic games have recently been widely studied in the literature, in discrete as well as continuous time. Regarding the first case the evolution of the process follows a discrete-time Markov process, and we can mention the book [6] and the papers [7, 10, 15,16,17] as a sample of works dealing with the discrete-time case. For these problems the admissible strategies of the players may depend on the past, that is, on the previous states and actions, and the optimal equilibrium solution is usually obtained from stochastic measurable selectors (thus depending only on the present state value) that satisfy a min–max optimality equation. For the continuous-time case the definition of the admissible strategies depends on the model that is considered. In the so-called semi-Markov case (see, for instance, [13, 14, 22]) the controlled process is defined in terms of a sequence of random decision (jump) epochs and post-jump locations, and the decisions are taken immediately after a jump. The admissible strategies for the players are transition probabilities that may depend on the whole past history of the process and actions up to the present state value. Notice that in this semi-Markov case there is no motion of the process between the jumps. Another approach for the problem is to consider that the state process evolves according to a continuous-time jump Markov process (see, for instance, [8, 9]). In this formulation both players 1 and 2 observe continuously the current state of the system and whenever it is at some state x(t) they choose independently their actions a(t) and b(t) according to stochastic kernels \(\pi ^1_t(.|x(t))\) and \(\pi ^2_t(.|x(t))\). Notice that in this case there is no dependence on previous actions and state values, and the strategies depend only on the present state value x(t).
In this paper we consider zero-sum games with an infinite horizon discounted reward criterion for PDMPs in general Borel spaces. The two players can act on the jump rate and transition measure of the process, with the decisions being taken just after a jump of the process. We assume that the players’ decisions may depend on the previous actions and the post-jump locations up to the present location. When compared with [13, 14, 22], the process considered in this paper is more general since it includes a possible flow motion between jumps and also jumps whenever the process touches the frontier. Note that a semi-Markov process can be written as a PDMP when it is “markovianized” as shown, for instance, in [21]. Indeed, as presented in [21], pages 71–72, to markovianize a semi-Markov model taking values in a state space \(\widetilde{\mathbf {X}}\) and with probability distribution function \(\widetilde{F}(x,t)\) for the sojourn time at the state x, the state space has to be enlarged to \(\mathbf {X} = \widetilde{\mathbf {X}}\times [0,\infty )\) so that for \((x,t)\in \mathbf {X}\) we have that x represents the location of the process and t the elapsed sojourn time in state x. Writing this model as a PDMP, the flow would be \(\phi ((x,t),s)= (x,t+s)\), and \(\lambda (x,t)\) the failure rate of \(\widetilde{F}(x,t)\) so that, in this sense, the PDMP can be seen as more general than the semi-Markov case. On the other hand, in order to get our closed expressions for the min–max optimality equation we exclude from the admissible strategies the dependence on the previous inter-arrival jump times. Indeed, the basic idea of our approach is to rewrite the min–max continuous-time problem in a discrete-time way, and derive the optimality equations by iterations of a kernel G(.|x, a, b), to be defined in (17), where x will represent the post-jump location and a and b the post-jump actions from players 1 and 2 respectively.
Thus, to get our iterative procedure through the kernel G, we had to exclude the dependence on the sojourn times. When compared with [8, 9], our approach has the advantage of allowing the dependence on the previous actions and post-jump locations of the process, which is more natural in a game context.
This paper is organized as follows. In Sect. 2 we present the notation and problem formulation. Section 3 presents the main operators that will be required in the paper. In Sect. 4 we present some auxiliary results. In Sect. 5 we derive conditions for the existence and characterization of min–max strategies for the infinite horizon total expected discounted payoff function, which is the main result of the paper. In the Appendix we present the proof of an auxiliary result.
2 Notation and Problem Formulation
In this section we start by introducing in Sect. 2.1 the main notation that will be used along the paper. Section 2.2 aims at presenting the spaces and parameters related to the problem. In Sect. 2.3 we introduce the construction of the process while in Sect. 2.4 we define the set of admissible strategies and the associated conditional distribution of the controlled process.
2.1 Notation
The following notation will be used in this paper: \(\mathbb {N}\) is the set of natural numbers including 0, \(\mathbb {N}^{*}=\mathbb {N}-\{0\}\), \(\mathbb {R}\) denotes the set of real numbers, \(\mathbb {R}_{+}\) the set of non-negative real numbers, \(\mathbb {R}_{+}^{*}=\mathbb {R}_{+}-\{0\}\), \(\overline{\mathbb {R}}_{+}=\mathbb {R}_{+}\mathop {\cup }\{+\infty \}\) and \(\overline{{\mathbb {R}}}_{+}^{*}=\mathbb {R}_{+}^*\mathop {\cup }\{+\infty \}\). For X a Borel space (i.e. a Borel-measurable subset of a complete and separable metric space), we denote by \(\mathcal {B}(X)\) its associated Borel \(\sigma \)-algebra. For X, Y Borel spaces, we write \(\mathbb {M}(X,Y)\) for the space of Borel-measurable functions from X to Y. The sets of Borel-measurable and of bounded Borel-measurable real-valued functions defined on the Borel space X are denoted respectively by \(\mathbb {M}(X)\) and \(\mathbb {B}(X)\). By \(\mathbb {M}(X)_+\) we mean the set of non-negative Borel-measurable real-valued functions, and similarly for \(\mathbb {B}(X)_+\). For \(g\in \mathbb {M}(X)\) with \(g(x)>0\) for all \(x\in X\), \(\mathbb {B}_{g}(X)\) is the set of functions \(v\in \mathbb {M}(X)\) such that \(\displaystyle \sup _{x\in X} \frac{|v(x)|}{g(x)}< +\infty \). For any set A, \(I_{A}\) denotes the indicator function of the set A. \(\mathcal {P}(X)\) is the set of probability measures defined on \((X,\mathcal {B}(X))\), and \(\mathcal {P}(X|Y)\) is the set of stochastic kernels on X given Y, where Y denotes a Borel space. For any point \(x\in X\), \(\delta _{x}\) denotes the Dirac measure defined by \(\delta _{x}(\Gamma )=I_{\Gamma }(x)\) for any \(\Gamma \in \mathcal {B}(X)\). If R is a kernel on Y given X and \(f\in \mathbb {M}(Y)\), then for any \(x\in X\), Rf(x) denotes \(\int _{Y}f(y)R(dy|x)\) provided the integral exists. Finally, the infimum over an empty set is understood to be equal to \(+\infty \), and we set \(e^{-\infty }=0\).
2.2 Preliminaries
For the definition of the state space of the PDMP we will consider for notational simplicity that \(\mathbf {X}\) is an open subset of \(\mathbb {R}^n\) (\(n\in \mathbb {N}^{*}\)) with \(\partial \mathbf {X}\) denoting the boundary of \(\mathbf {X}\), and \(\bar{\mathbf {X}}\) its closure. This definition could be easily generalized to include some boundary points and countable union of sets as in [3, Sect. 24]. In what follows the sets \(\mathbf {A}\) and \(\mathbf {B}\) are the action spaces for players 1 and 2, respectively, and assumed to be Borel spaces. For each \(x\in \mathbf {X}\), we define the subsets \(\mathbf {A}(x)\) of \(\mathbf {A}\) and \(\mathbf {B}(x)\) of \(\mathbf {B}\) as the set of feasible control actions for players 1 and 2, respectively, that can be taken when the state process is in \(x\in \mathbf {X}\). Let \(\mathbf {U}\) be another Borel space associated to the control process.
We introduce next some data that will be used to define the controlled PDMP.
-
The flow \(\phi (x,t)\) is a function \(\phi : \mathbb {R}^{n}\times \mathbb {R}_{+} \longrightarrow \mathbb {R}^{n}\) continuous in (x, t) and such that \(\phi (x,t+s) = \phi (\phi (x,t),s)\).
-
For each \(x\in \mathbf {X}\), the time the flow takes to reach the boundary starting from x is defined as
$$\begin{aligned} t_{*}(x)\doteq \inf \{t>0:\phi (x,t)\in \partial \mathbf {X} \}. \end{aligned}$$It is assumed that \(t_{*} \in \mathbb {M}(\mathbf {X},\bar{\mathbb {R}}_+)\) (see [3, Lemma 27.1] for conditions that assure that \(t_{*}\) is Borel measurable). For \(x\in \mathbf {X}\) such that \(t_{*}(x)=\infty \) (that is, the flow starting from x never touches the boundary), we set \(\phi (x,t_{*}(x))=\Delta \), where \(\Delta \) is a fixed point in \(\partial \mathbf {X}\).
-
The jump rate \(\lambda \in \mathbb {M}(\mathbf {X}\times \mathbf {U})_{+}\).
-
The transition measure Q which is a stochastic kernel in \(\mathcal {P} (\mathbf {X}|\bar{\mathbf {X}}\times \mathbf {U})\). To avoid jumps to the same point, we assume that \(Q(\{x\}|x,u)=0\) for any \(x\in \mathbf {X}\), \(u\in \mathbf {U}\).
-
The pre-defined control function \(\ell \in \mathbb {M}(\mathbf {X}\times \mathbf {A}\times \mathbf {B}\times \mathbb {R}_{+},\mathbf {U})\).
Remark 2.1
The idea behind the definition above is that after a jump from a point \(x\in \mathbf {X}\) an action \(a\in \mathbf {A}(x)\) will be chosen for player 1, and similarly an action \(b\in \mathbf {B}(x)\) will be chosen for player 2. Actions a and b will parametrize the function \(\ell (x,a,b,t)\), with \(0 \le t \le t_*(x)\), which will regulate the jump rate and transition measure of the PDMP until the next jump time. Therefore in the model considered in this paper the decisions for players 1 and 2 are taken only after a jump time, and the behavior of \(\lambda \) and Q will depend on the pre-defined function \(\ell (x,a,b,t)\) for \(0\le t \le t_*(x)\).
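As a toy illustration of Remark 2.1, the following sketch shows how a pair of actions (a, b), chosen at a jump time, enters the jump rate through a pre-defined function \(\ell \) evaluated along the flow. The concrete functions `phi`, `ell` and `jump_rate` below are hypothetical choices, not taken from the paper:

```python
def phi(x, t):
    """Hypothetical linear flow on the real line."""
    return x + t

def ell(x, a, b, t):
    """Hypothetical pre-defined control function: X x A x B x R+ -> U."""
    return a + b * t

def jump_rate(y, u):
    """Hypothetical jump rate lambda(y, u) >= 0."""
    return 1.0 + u

def controlled_rate(x, a, b, t):
    """Jump rate felt by the process t units of time after a jump landing
    at x, when players 1 and 2 chose actions a and b at that jump time:
    lambda(phi(x, t), ell(x, a, b, t))."""
    return jump_rate(phi(x, t), ell(x, a, b, t))
```

The actions are frozen between jumps; only the deterministic function \(\ell \) varies the control value along the flow, exactly as described in the remark.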
We define \(\mathbf {\Xi }=\{x\in \partial \mathbf {X}: x=\phi (y,t) \text { for some } y\in \mathbf{X} \text { and } t\in \mathbb {R}^*_+\}\), the so-called active boundary. As usual we will assume that the set \(\mathbf {K}=\bigl \{ (x,a,b): x\in \mathbf {X}, a\in {\mathbf {A}}(x),\,b\in {\mathbf {B}}(x) \bigr \}\) is a Borel subset of \(\bar{\mathbf {X}} \times {\mathbf {A}}\times {\mathbf {B}}\).
2.3 Construction of the Process
Let \(\mathbf {X}_\infty =\mathbf {X}\cup \{x_{\infty }\}\), where \(x_\infty \) is an isolated artificial point corresponding to the case when no jumps occur in the future. Similarly \(\mathbf {A}_\infty =\mathbf {A}\cup \{a_{\infty }\}\), \(\mathbf {B}_\infty =\mathbf {B}\cup \{b_{\infty }\}\), \(\mathbf {A}(x_{\infty })=\{a_{\infty }\}\), and \(\mathbf {B}(x_{\infty })=\{b_{\infty }\} \) where \(a_\infty \), \(b_\infty \) are isolated artificial actions for players 1 and 2 corresponding to the case when no jumps occur in the future. For notational convenience, we introduce \(\mathbf {K}_{\infty }=\bigl \{ (x,a,b): x\in \mathbf {X}_{\infty }, a\in {\mathbf {A}}(x),\,b\in {\mathbf {B}}(x) \bigr \}.\)
We put \(\Omega _{n}=\mathbf {X}\times (\mathbf {A}\times \mathbf {B}\times \mathbb {R}_{+}^{*}\times \mathbf {X})^n \times (\{a_{\infty }\}\times \{b_{\infty }\}\times \{\infty \}\times \{x_{\infty }\})^\infty \). The canonical space denoted by \(\Omega \) is defined as \(\Omega =\bigcup _{n=1}^\infty \Omega _{n}\bigcup \big ( (\mathbf {X} \times \mathbf {A}\times \mathbf {B}\times \mathbb {R}_{+}^{*})^\infty \big )\) and is endowed with its Borel \(\sigma \)-algebra denoted by \(\mathcal {F}\). For notational convenience, \(\omega \in \Omega \) will be represented as
Here, \(x_0\in \mathbf{X}\) is the initial state of the controlled point process \(\xi \) with values in \(\mathbf {X}\), defined below. For \(n\in \mathbb {N}^{*}\), the components \(\theta _{n}>0\) and \(x_{n}\) correspond to the intervals between two consecutive jumps and the values of the process immediately after jumps, and \(a_n\), \(b_n\) the actions taken by players 1 and 2 respectively, also immediately after jumps. In case \(\theta _{n}<\infty \) and \(\theta _{n+1}=\infty \), the trajectory has only n jumps, and we put \(\theta _{m}=\infty \) and \(x_m=x_{\infty }\), \(a_m=a_{\infty }\), \(b_m=b_{\infty }\) (artificial points) for all \(m\ge n+1\). Between jumps, the state of the process \(\xi \) moves according to the flow \(\phi \).
The path up to \(n\in \mathbb {N}\) is denoted by \(h_{n}=(x_0,a_0,b_0,\theta _1,x_1,a_1,b_1,\theta _2,\ldots , x_{n-1},a_{n-1},b_{n-1},\theta _{n},x_{n})\) (thus excluding the decisions at n), and the collection of all such paths is denoted by \(\mathbf {H}_{n}\). For \(n\in \mathbb {N}\), introduce the mappings \(X_{n}:~\Omega \rightarrow \mathbf {X}_\infty \) by \(X_{n}(\omega )=x_{n}\), \(A_{n}:~\Omega \rightarrow \mathbf {A}_\infty \) by \(A_{n}(\omega )=a_{n}\), \(B_{n}:~\Omega \rightarrow \mathbf {B}_\infty \) by \(B_{n}(\omega )=b_{n}\) and, for \(n\ge 1\), the mappings \(\Theta _{n}: \Omega \rightarrow \overline{\mathbb {R}}_{+}^{*}\) by \(\Theta _{n}(\omega )=\theta _{n}\); \(\Theta _{0}(\omega )=0\). The sequence \((T_{n})_{n\in \mathbb {N}^{*}}\) of \(\overline{\mathbb {R}}_{+}^{*}\)-valued mappings is defined on \(\Omega \) by \(T_{n}(\omega )=\sum _{i=1}^n\Theta _i(\omega )=\sum _{i=1}^n\theta _i\) and \(T_\infty (\omega )=\lim _{n\rightarrow \infty }T_{n}(\omega )\). We denote by \(H_{n}=(X_0,A_0,B_0,\Theta _1,X_1,A_1,B_1,\ldots ,A_{n-1},B_{n-1},\Theta _{n},X_{n})\) the n-term random history process taking values in \(\mathbf {H}_{n}\) for \(n\in \mathbb {N}\).
The random measure \(\mu \) associated with \((\Theta _{n},X_{n},A_{n},B_{n})_{n\in \mathbb {N}}\) is a measure defined on \(\mathbb {R}^{*}_{+}\times \mathbf {X} \times \mathbf {A} \times \mathbf {B}\) by
Roughly speaking, for any \(\Gamma \in \mathcal {B}(\mathbb {R}^{*}_{+}\times \mathbf {X} \times \mathbf {A} \times \mathbf {B})\), \(\mu (\Gamma )\) gives the number of elements of the sequence \((T_{n},X_{n},A_{n},B_{n})_{n\in \mathbb {N}}\) that are in \(\Gamma \). For notational convenience the dependence on \(\omega \) will be suppressed and, instead of \(\mu (\omega ;dt,dx,da,db)\), it will be written \(\mu (dt,dx,da,db)\). Moreover, we will denote the marginal of the measure \(\mu \) on \(\mathbb {R}^{*}_{+}\) by \(\mu (dt,\mathbf {X}\times \mathbf {A}\times \mathbf {B})\), that is
Define \(\mathcal {F}_t=\sigma \{H_0\}\vee \sigma \{\mu (]0,s]\times B):~s\le t,B\in \mathcal {B}(\mathbf {X}\times \mathbf {A}\times \mathbf {B})\}\) for \(t\in \mathbb {R}_{+}\). Finally, we define the controlled process \(\big \{\xi (t)\big \}_{t\in \mathbb {R}_{+}}\) and the action processes \(\big \{a(t)\big \}_{t\in \mathbb {R}_{+}}\), \(\big \{b(t)\big \}_{t\in \mathbb {R}_{+}}\) as follows:
Obviously, the process \((\xi (t),a(t),b(t))_{t\in \mathbb {R}_{+}}\) can be equivalently described by the sequence of random variables \((\Theta _{n},X_{n},A_{n},B_{n})_{n\in \mathbb {N}}\). We define the random process \(\{u(t)\}_{t\in \mathbb {R}_{+}}\) taking values in \(\mathbf {U}\) as follows:
for \(t\in \mathbb {R}_{+}^{*}\). The process \(\{u(t)\}_{t\in \mathbb {R}_{+}}\) is \(\{\mathcal {F}_{t}\}_{t\in \mathbb {R}_{+}}\)-predictable with values in \(\mathbf {U}\).
2.4 Admissible Strategies and Conditional Distribution of the Controlled Process
In what follows we will consider strategies that depend, just after the \(n^{th}\) jump, on the past values of the post-jump locations \(X_k\), \(k=0,\ldots ,n\), and the previous actions \(A_k\), \(B_k\), \(k=0,\ldots ,n-1\). For this we define, following the definition presented before, \(\widetilde{h}_n = (x_0,a_0,b_0,x_1,a_1,b_1,\ldots , x_{n-1},a_{n-1},b_{n-1},x_{n})\) (thus excluding the decisions at n, and all the inter-jump times \(\theta _k\)) and by \(\widetilde{\mathbf {H}}_{n}\) the collection of all such paths. An admissible strategy for players 1 and 2 respectively is a sequence \(\pi =(\pi _{n})_{n\in \mathbb {N}}\) and \(\gamma =(\gamma _{n})_{n\in \mathbb {N}}\) such that, for any \(n\in \mathbb {N}\),
-
\(\pi _{n}\) is a stochastic kernel on \(\mathbf {A}\) given \(\widetilde{\mathbf {H}}_{n}\). For \(\widetilde{h}_{n}=(x_0,a_0,b_0,x_1,\ldots ,x_{n})\in \widetilde{\mathbf {H}}_{n}\), it satisfies \(\pi _{n}(\mathbf {A}(x_{n})|\widetilde{h}_{n})=1\).
-
\(\gamma _{n}\) is a stochastic kernel on \(\mathbf {B}\) given \(\widetilde{\mathbf {H}}_{n}\). For \(\widetilde{h}_{n}=(x_0,a_0,b_0,x_1,\ldots ,x_{n})\in \widetilde{\mathbf {H}}_{n}\), it satisfies \(\gamma _{n}(\mathbf {B}(x_{n})|\widetilde{h}_{n})=1\).
For simplicity we denote the set of admissible strategies for player 1 by \(\Pi \) and the set of admissible strategies for player 2 by \(\Gamma \).
Definition 2.2
A randomized Markov strategy for player 1 is of the form \(p=(p_0,p_1,\ldots )\) where \(p_k\) is a stochastic kernel on \(\mathbf {A}\) given \(\mathbf {X}\) satisfying \(p_k(\mathbf {A}(x)|x)=1\) and similarly for a randomized Markov strategy for player 2, denoted by \(q=(q_0,q_1,\ldots )\). We denote the set of Markov randomized strategies for player 1 by \(\Pi ^M\) and the set of randomized Markov strategies for player 2 by \(\Gamma ^M\). The case in which \(p_k\) and \(q_k\) are measurable functions is referred to as the set of deterministic Markov strategies for players 1 and 2, and denoted respectively by \(\Pi ^D\) and \(\Gamma ^D\). The stationary case corresponds to \(p_k=p\) and \(q_k=q\) for all k, where p and q are stochastic kernels on \(\mathbf {A}\) given \(\mathbf {X}\) and \(\mathbf {B}\) given \(\mathbf {X}\) respectively, satisfying \(p(\mathbf {A}(x)|x)=1\) and \(q(\mathbf {B}(x)|x)=1\). This case will be denoted by \(\mathbf {P}\) and \(\mathbf {Q}\) respectively.
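The strategy classes of Definition 2.2 can be sketched as follows, assuming for illustration a finite feasible action set \(\{0,1\}\) at every state and states lying in (0, 1); all concrete mixing rules below are hypothetical:

```python
import random

def stationary_p(x):
    """A stationary randomized strategy p(.|x): a probability distribution
    over the feasible actions that depends only on the current state x."""
    return {0: 1.0 - x, 1: x}

def markov_p(k, x):
    """A randomized Markov strategy p_k(.|x): the mixing rule may also
    depend on the stage index k, but not on the earlier history."""
    w = 1.0 / (k + 1)
    return {0: 1.0 - w * x, 1: w * x}

def sample_action(dist, rng):
    """Draw one action from a {action: probability} distribution by
    inverse-transform sampling."""
    u, acc = rng.random(), 0.0
    for action, prob in dist.items():
        acc += prob
        if u <= acc:
            return action
    return action  # guard against floating-point round-off
```

A deterministic Markov strategy corresponds to each `dist` putting mass one on a single action, and an admissible (history-dependent) strategy would replace the argument x by the whole path \(\widetilde{h}_n\).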
The cumulative jump rate \(\Lambda ^{a,b}(x,t)\) is given by
$$\begin{aligned} \Lambda ^{a,b}(x,t) = \int _{0}^{t} \lambda (\phi (x,s),\ell (x,a,b,s))\, ds \end{aligned}$$
for \((x,a,b)\in \mathbf {K}\), and \(t\in [0,t_{*}(x)]\). With a slight abuse of notation, we denote
$$\begin{aligned} \lambda Q(A|\phi (x,t),\ell (x,a,b,t)) = \lambda (\phi (x,t),\ell (x,a,b,t))\, Q(A|\phi (x,t),\ell (x,a,b,t)) \end{aligned}$$
for \((x,a,b)\in \mathbf {K}\), \(t\in [0,t_{*}(x)]\), and \(A\in \mathcal {B}(\mathbf {X})\). Now, let us introduce the stochastic kernel D on \(\overline{\mathbb {R}}_{+}^{*}\times \mathbf {X}_{\infty }\) given \(\mathbf {K}_\infty \) describing the joint distribution of the next sojourn time and state of the process:
for \(\Gamma \in \mathcal {B}(\overline{\mathbb {R}}_{+}^{*})\), \(S \in \mathcal {B}(\mathbf {X}_{\infty })\).
Roughly speaking, given that x is the last post-jump location of the process, a the action for player 1, and b the action for player 2, the first line in the previous equation gives the probability that the next sojourn time and the state of the process are equal to \((+\infty ,x_{\infty })\), that is,
The second line gives the probability of the next sojourn time to be equal to \(t_{*}(x)\) (corresponding to a jump at the boundary) and the state of the process to be in S, that is, for \(x\in \mathbf {X}\) such that \(t_*(x)<\infty \), we have that
The third line gives the probability of the next sojourn time to be less than \(t_{*}(x)\) (corresponding to a natural jump) and the state of the process to be in S, that is, for \(x\in \mathbf {X}\) and \(\tau <t_*(x)\), we have that
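The last two cases just described (forced jump at the boundary versus natural jump) can be mimicked by a small sampling sketch for a constant controlled jump rate, so that the survival probability up to time t is \(e^{-\lambda t}\). When \(t_{*}(x)<\infty \) the no-jump case of the first line has probability zero and is omitted; the names `Q_natural` and `Q_boundary` are hypothetical stand-ins for the transition measure at interior and boundary points:

```python
import random

def sample_from_D(x, lam_ab, t_star, Q_natural, Q_boundary, rng):
    """Draw the next (sojourn time, post-jump state): an exponential clock
    with constant controlled rate lam_ab competes with the deterministic
    boundary-hitting time t_star(x); whichever fires first decides between
    a natural jump and a forced jump at the active boundary."""
    clock = rng.expovariate(lam_ab)  # candidate natural-jump time
    ts = t_star(x)
    if clock < ts:
        return clock, Q_natural(x, clock, rng)  # natural jump in the interior
    return ts, Q_boundary(x, rng)               # forced jump at the boundary
```

For instance, with \(\lambda =1\) and \(t_{*}(x)=0.5\) the boundary jump occurs with probability \(e^{-0.5}\approx 0.61\), matching the second line of the kernel D.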
Consider the strategies \((\pi ,\gamma )\) for players 1 and 2 and an initial state \(x_{0}\in \mathbf {X}\). From Remark 3.43 in [12], there exists a probability \(\mathbb {P}^{\pi ,\gamma }_{x_{0}}\) on \((\Omega ,\mathcal {F})\) and a sequence of random variables \((\Theta _{n},X_{n},A_{n},B_{n})_{n\in \mathbb {N}}\) (or equivalently a stochastic process \((\xi (t),a(t),b(t))_{t\in \mathbb {R}_{+}}\), see Eqs. (2), (3), (4)) such that the conditional distribution of \((\Theta _{n+1},X_{n+1},A_{n+1},B_{n+1})\) given \(\mathcal {F}_{T_{n}}\) under \(\mathbb {P}^{\pi ,\gamma }_{x_{0}}\) is determined by the stochastic kernel \(D^{\pi ,\gamma }_{n}\) on \(\overline{\mathbb {R}}_{+}^{*}\times \mathbf {K}_\infty \) given \(\mathbf {H}_{n}\) given by
We write \(\mathbb {E}^{\pi ,\gamma }_{x_{0}}(.)\) to denote the expectation under the probability \(\mathbb {P}^{\pi ,\gamma }_{x_{0}}\). We can summarize the dynamics of the stochastic process \((\xi (t),a(t),b(t))_{t\in \mathbb {R}_{+}}\) as follows. At time \(t=0\) the first actions for players 1 and 2, denoted by \(A_0\) and \(B_0\), are obtained randomly from the probability measures \(\pi _0(.|x_0)\) and \(\gamma _0(.|x_0)\) respectively. The first jump time \(T_1\) is a random variable with distribution given by
If \(T_1\) is equal to infinity, then \(\xi (t)= \phi (x_0,t)\), \(a(t)=A_0\), \(b(t)=B_0\) for \(t\in \mathbb {R}_+\), and \(X_n=x_\infty \), \(A_n=a_\infty \), \(B_n=b_\infty \) for \(n\in \mathbb {N}^*\). Otherwise select independently an \(\mathbf {X}\)-valued random variable \(X_1\) having distribution, for \(S \in \mathcal {B}(\mathbf {X})\), given by
The trajectory of \(\{\xi (t)\}\) starting from \(x_0\) and for \(t\le T_1\) is given by (2), and for \(\{a(t)\}\) and \(\{b(t)\}\) for \(t <T_1\) is as given in (3) and (4). In general, at time \(T_n\) and starting from \(X_{n}\), we select the actions for players 1 and 2, denoted by \(A_n\) and \(B_n\), randomly from the probability measures \(\pi _n(.|\widetilde{H}_n)\) and \(\gamma _n(.|\widetilde{H}_n)\) respectively, and the next inter-jump time \(T_{n+1}-T_n\) and post-jump location \(X_{n+1}\) as in (11) and (12) respectively.
The value function for the min–max problem will contain two terms, a running reward function f associated to the gradual actions of players 1 and 2, and a boundary reward function r, associated with the impulsive actions on the boundary \(\mathbf {\Xi }\) of players 1 and 2. In accordance with Assumption C below, we assume that \(f\in \mathbb {M}(\mathbf {X}\times \mathbf {U})\) and \(r\in \mathbb {M}(\mathbf {\Xi }\times \mathbf {U})\).
The associated \(T_n\)-horizon and infinite-horizon discounted payoff criterion corresponding to strategies \((\pi ,\gamma )\) for players 1 and 2 are defined by
and
where the measure \(\mu (dt,\mathbf {X}\times \mathbf {A}\times \mathbf {B})\) has been defined in (1). In the previous expression, \(\alpha >0\) is the discount factor, \(\mathcal {D}(n,\pi ,\gamma ,x_{0})\) and \(\mathcal {D}(\pi ,\gamma ,x_{0})\) are understood to be equal to \(+\infty \) if the integrals of both the positive and negative parts of the integrand are infinite. Note that, for any strategy \(\pi \in \Pi \), \(\gamma \in \Gamma \), the functions \(\mathcal {D}(n,\pi ,\gamma ,\cdot )\) and \(\mathcal {D}(\pi ,\gamma ,\cdot )\) are measurable. The \(T_n\)-horizon and infinite horizon lower value (denoted by the superscript l) and upper value (denoted by the superscript u) problems for the discounted payoff games are defined respectively as:
Clearly we have that \(\mathcal {J}^{l}(n,x_0)\le \mathcal {J}^{u}(n,x_0)\) and \(\mathcal {J}^{l}(x_0)\le \mathcal {J}^{u}(x_0)\). If \(\mathcal {J}^{l}(n,x_0) = \mathcal {J}^{u}(n,x_0)\) (\(\mathcal {J}^{l}(x_0) = \mathcal {J}^{u}(x_0)\)) then the common value is called the value of the game and denoted by \(\mathcal {V}(n,x_0)\) (\(\mathcal {V}(x_0)\) respectively). If the infinite horizon game has a value \(\mathcal {V}\) then a strategy \(\pi ^*\in \Pi \) is said to be optimal for player 1 if \({\inf }_{\gamma \in \Gamma } \mathcal {D}(\pi ^*,\gamma ,x_0) = \mathcal {V}(x_0)\) and similarly \(\gamma ^*\in \Gamma \) is said to be optimal for player 2 if \({\sup }_{\pi \in \Pi } \mathcal {D}(\pi ,\gamma ^*,x_0) = \mathcal {V}(x_0)\). The pair \((\pi ^*,\gamma ^*)\) is said to be a pair of optimal strategies if \(\pi ^*\) is optimal for player 1 and \(\gamma ^*\) is optimal for player 2. Similar definitions hold for the finite horizon case.
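The discounted running-reward integral inside these criteria can be approximated numerically along one realized trajectory. The following crude sketch uses a left Riemann sum; the constant-reward choice in the usage note is a hypothetical test case, not part of the model:

```python
import math

def discounted_payoff(path_reward, alpha, horizon, dt=1e-3):
    """Crude estimate of the discounted running-reward integral
    int_0^horizon exp(-alpha * s) * f(s) ds by a left Riemann sum,
    where path_reward(s) stands for the reward f(xi(s), u(s)) observed
    along one realized path of the process."""
    total, s = 0.0, 0.0
    while s < horizon:
        total += math.exp(-alpha * s) * path_reward(s) * dt
        s += dt
    return total
```

With `path_reward` identically 1 and \(\alpha = 2\), the estimate is close to \(\int _0^\infty e^{-2s}\,ds = 1/2\) regardless of the realized path, which isolates the role of the discount factor \(\alpha \).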
3 Main Operators
In this section we present some important operators associated to the \(T_n\)-horizon and infinite horizon min–max problems posed in (15) and (16) respectively. Let us introduce the kernel G on \(\mathbf {X}_{\infty }\) given \(\mathbf {K}_{\infty }\) as follows:
for \((x,a,b)\in \mathbf {K}_{\infty }\), \(A\in \mathcal {B}(\mathbf {X}_{\infty })\) and the kernel L (respectively, H) defined on \(\mathbf {X}\times \mathbf {U}\) (respectively, \(\mathbf {\Xi }\times \mathbf {U}\)) given \( \mathbf {K}_{\infty }\) as follows:
for \((x,a,b)\in \mathbf {K}_{\infty }\), \(A\in \mathcal {B}(\mathbf {X}\times \mathbf {U})\), \(B\in \mathcal {B}(\mathbf {\Xi }\times \mathbf {U})\).
Remark 3.1
When \(t_{*}(x)=\infty \) for \(x\in \mathbf {X}\) we have that \(e^{-\alpha t_{*}(x)}=0\) and thus the kernels G and H have a special form. Indeed in this case \(\displaystyle G(A|x,a,b) = \int _0^{t_{*}(x)}e^{-\alpha s - \Lambda ^{a,b}(x,s)}\lambda Q(A|\phi (x,s),\ell (x,a,b,s)) ds\) (see the notation in (6)) and \(H(B|x,a,b)=0\), for \((x,a,b)\in \mathbf {K}\), \(A\in \mathcal {B}(\mathbf {X})\), \(B\in \mathcal {B}(\mathbf {\Xi }\times \mathbf {U})\).
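The special form of G in Remark 3.1 can be checked numerically in the simplest setting: a constant jump rate \(\lambda \), \(t_{*}(x)=\infty \), and \(A=\mathbf {X}\) so that \(Q(\mathbf {X}|\cdot )=1\). Then \(\Lambda ^{a,b}(x,s)=\lambda s\) and \(G(\mathbf {X}|x,a,b)=\int _0^\infty e^{-(\alpha +\lambda )s}\lambda \,ds=\lambda /(\alpha +\lambda )\), the probability that the first jump occurs before an independent exponential killing time with rate \(\alpha \). A quadrature sketch (the truncation and step count are numerical choices):

```python
import math

def G_total_mass(alpha, lam, upper=60.0, n=200000):
    """Midpoint-rule quadrature of G(X | x, a, b) =
    int_0^infty exp(-(alpha + lam) * s) * lam ds for a constant jump rate
    lam, t_*(x) = infinity and Q(X|.) = 1, so the integrand no longer
    depends on (x, a, b).  The integral is truncated at `upper`, where the
    exponential tail is negligible."""
    h = upper / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        total += math.exp(-(alpha + lam) * s) * lam * h
    return total
```

For \(\alpha = 0.5\) and \(\lambda = 1.5\) the quadrature reproduces \(\lambda /(\alpha +\lambda ) = 0.75\) to high accuracy.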
We conclude this section introducing the following notation. For \(\varrho \in \mathcal {P}(\mathbf {A}(x))\), \(\chi \in \mathcal {P}(\mathbf {B}(x))\) and a function \(h \in \mathbb {M}(\mathbf {K})\), we write
For \(\bar{\pi }\in \mathcal {P}(\mathbf {A}_{\infty }|\mathbf {X}_{\infty })\) and \(\bar{\gamma }\in \mathcal {P}(\mathbf {B}_{\infty }|\mathbf {X}_{\infty })\) respectively, satisfying \(\bar{\pi }(\mathbf {A}(x)|x)=1\) and \(\bar{\gamma }(\mathbf {B}(x)|x)=1\), we write
and for admissible strategies \(\pi =(\pi _{n})_{n\in \mathbb {N}}\in \Pi \) and \(\gamma =(\gamma _{n})_{n\in \mathbb {N}}\in \Gamma \) for players 1 and 2 respectively, we write
where we recall that \(\widetilde{h}_k = (x_0,a_0,b_0,x_1,a_1,b_1,\ldots , x_{k-1},a_{k-1},b_{k-1},x_{k})\). Following (21) and (22) we set, for \(\bar{\pi }\in \mathcal {P}(\mathbf {A}_{\infty }|\mathbf {X}_{\infty })\) and \(\gamma =(\gamma _{n})_{n\in \mathbb {N}}\in \Gamma \) an admissible strategy for player 2,
and similarly for \(h(x_k,\pi (.|\widetilde{h}_k),\bar{\gamma })\), for the case in which \(\pi =(\pi _{n})_{n\in \mathbb {N}}\in \Pi \) is an admissible strategy for player 1 and \(\bar{\gamma }\in \mathcal {P}(\mathbf {B}_{\infty }|\mathbf {X}_{\infty })\).
4 Assumptions and Auxiliary Results
The purpose of this section is to introduce the main assumptions and present some auxiliary results that will be needed for deriving our main results. The first assumption is related to an upper bound for the jump rate \(\lambda \).
Assumption A
There exists \(\bar{\lambda }\in \mathbb {M}(\mathbf {X})\) satisfying \(\displaystyle \int _{0}^{t} \bar{\lambda }(\phi (x,s)) ds < \infty \) for \(t\in [0,t_{*}(x))\) such that, for any \((x,r)\in \mathbf {X}\times \mathbf {U}\), \(\lambda (x,r)\le \bar{\lambda }(x)\).
The next proposition will be used in the sequel to establish an iterative procedure to get upper and lower bounds for the payoff functions (15) and (16), using the operator G defined in (17).
Proposition 4.1
Suppose that Assumption A holds and that there exist Borel-measurable functions \(\mathcal {W}: \mathbf {X}_{\infty } \mapsto \mathbb {R}_{+}\), \(\mathcal {S}: \mathbf {X}_{\infty } \mapsto \mathbb {R}\) and \(\mathcal {C}: \mathbf {K}_{\infty } \mapsto \mathbb {R}\) and a constant M satisfying \(G\mathcal {W}(x,a,b)\le M \mathcal {W}(x)\), \(|\mathcal {S}(x)|\le M \mathcal {W}(x)\) and \(|\mathcal {C}(x,a,b)|\le M \mathcal {W}(x)\) for any \((x,a,b)\in \mathbf {K}_{\infty }\). Consider \(x_0\in \mathbf {X}\), \(\pi =(\pi _{k})_{k\in \mathbb {N}}\in \Pi \) and \(\gamma =(\gamma _{k})_{k\in \mathbb {N}}\in \Gamma \) such that
for any \(k\in \mathbb {N}\) and \(\widetilde{h}_{k}\in \widetilde{\mathbf {H}}_{k}\). We have that
Proof
See Appendix. \(\square \)
Condition (24) is usually hard to check since it is written in terms of the operator G, which involves an integral with respect to the primitive data Q and \(\lambda \) of the process, as well as some boundary conditions. The next assumption presents an infinitesimal condition written directly in terms of the primitive data Q and \(\lambda \), and the boundary conditions, that will be used to verify (24). But first we need to introduce the set \(\mathbb {M}^{ac}(\mathbf {X}_{\infty })\) of real-valued measurable functions defined on \(\mathbf {X}_{\infty }\) which are absolutely continuous with respect to the flow \(\phi \), that is, the set of functions \(g\in \mathbb {M}(\mathbf {X}_{\infty })\) such that for any \(x\in \mathbf {X}\), the function \(g(\phi (x,\cdot ))\) is absolutely continuous on \([0,t_{*}(x)[\) and \(\lim _{t\rightarrow t_{*}(x)} g(\phi (x,t))\) exists whenever \(t_{*}(x)<\infty \). From Lemma 2.2 in [1], if \(g\in \mathbb {M}^{ac}(\mathbf {X}_{\infty })\) then there exists a real-valued measurable function \(\mathcal {X}g\) defined on \(\mathbf {X}\) satisfying
for any \(x\in \mathbf {X}\), and \(t\in [0,t_{*}(x)[\). Notice that the domain of definition of a mapping \(g\in \mathbb {M}^{ac}(\mathbf {X}_{\infty })\) can be extended to \(\mathbf {X}_{\infty } \cup \mathbf {\Xi }\) by setting \(g(z)=\lim _{t\rightarrow t_{*}(x)} g(\phi (x,t))\) where \(z=\phi (x,t_{*}(x))\in \mathbf {\Xi }\).
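For a concrete instance of the derivative along the flow, take the hypothetical linear flow \(\phi (x,t)=x+vt\) and \(g(x)=x^2\); then \(g(\phi (x,t))=(x+vt)^2\) is smooth in t and \(\mathcal {X}g(x)=\frac{d}{dt}g(\phi (x,t))\big |_{t=0}=2vx\). A finite-difference sketch confirms this:

```python
def phi(x, t, v):
    """Hypothetical linear flow with constant velocity v."""
    return x + v * t

def Xg_numeric(g, x, v, h=1e-6):
    """Forward finite-difference approximation of the derivative of
    g(phi(x, t)) with respect to t at t = 0, i.e. of Xg(x)."""
    return (g(phi(x, h, v)) - g(x)) / h
```

For \(g(x)=x^2\), \(x=2\) and \(v=3\) the numerical value is close to \(2vx = 12\).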
Assumption B
There exist constants \(0<c_1<\alpha \), \(d_{1}\ge 0\) and a function \(W\in \mathbb {M}^{ac}(\mathbf {X}_{\infty })\) satisfying \(W\ge 1\) such that for all \((x,a,b)\in \mathbf {K}\), and \(0\le t < t_{*}(x)\),
and
whenever \(t_{*}(x)<\infty \).
Remark 4.2
Similarly as in Remark 3.2 of [8], Assumption B can be seen as an extension of the “drift condition” presented in (2.4) of [20] for PDMPs, and it is also known as a Lyapunov or Foster-Lyapunov condition. This condition is usually used to obtain growth conditions as in Proposition 4.3 below, and also for some forms of ergodicity, see, for instance, [13]. In Remark 4.5 we show that, for the continuous-time jump Markov process in Polish spaces case as considered in [8], condition (27) becomes condition (a) in Assumption 3.1 of [8].
Combining Assumptions A and B and using the operators L and H as defined in (18) and (19) we obtain a condition similar to (24) for fixed actions a and b.
Proposition 4.3
Suppose that Assumptions A and B hold. For any \(\bar{M}_1>0\) define
and
where W, \(c_1\) and \(d_1\) are as in Assumption B. Then, for any \((x,a,b)\in \mathbf {K}_{\infty }\),
Proof
From the definition of the kernels L and H in (18) and (19) we have that \(LW(x_\infty ,a_\infty ,b_\infty )=0\) and \(HW(x_\infty ,a_\infty ,b_\infty )=0\) since \(x_\infty \notin \mathbf {X}\). Thus from (29) we get that \(GS(x_{\infty },a_{\infty },b_{\infty })=\bar{C}(x_{\infty },a_{\infty },b_{\infty })=0\) and so (31) is trivially satisfied. Now, consider \((x,a,b)\in \mathbf {K}\). After some algebraic manipulations we get from (27) and (28) in Assumption B and S as defined in (30) that
Now, multiplying the inequality (32) by \(e^{-\alpha t-\Lambda ^{a,b}(x,t)}\) and integrating over [0, s] for \(s<t_{*}(x)\), we get
where we have used Assumption A to claim that
Consider first the case where \(t_{*}(x)<\infty \). Recalling that \(S(\phi (x,\cdot ))\) is absolutely continuous on \([0,t_{*}(x)]\) we obtain
by taking the limit in (34) as s tends to \(t_{*}(x)\). Now, by using (33) we easily get the result.
Now let us assume that \(t_{*}(x)=\infty \). Recalling that S is positive we get from (34) that
which is precisely the claim since \(t_{*}(x)=\infty \) (see Remark ). \(\square \)
In the next assumption we impose an upper bound on |f(x, u)| and |r(z, u)| in terms of the function W(x), which is the same function as in Assumption B.
Assumption C
There exists \(M_1 >0\) such that \(| f(x,u)| \le M_1 W(x)\) for all \(x\in \mathbf {X}\), \(u\in \mathbf {U}\), and \(|r(z,u)| \le M_1 W(z)\) for all \(z\in \mathbf {\Xi }\), \(u\in \mathbf {U}\).
Lemma 4.4
Suppose that Assumptions A, B, and C hold, and consider \(\bar{C}\) as in (29). Then the real-valued function C defined on \(\mathbf {K}_{\infty }\) by
satisfies \(\displaystyle \sup _{(x,a,b) \in \mathbf {K}_{\infty }} \frac{|C(x,a,b)|}{W(x)} <\infty \). Moreover, \(\displaystyle \sup _{(x,a,b) \in \mathbf {K}_{\infty }} \frac{\bar{C}(x,a,b)+GW(x,a,b)}{W(x)} <\infty \).
Proof
It is a straightforward application of Proposition 4.3 and Assumption C. \(\square \)
The next assumption is needed to guarantee the convergence of the sum of some discounted payoff functions related to the infinite horizon problem (14).
Assumption D
There exist constants \(0<c_2<\alpha \), \(d_{2}\ge 0\) and a function \(W_{2}\in \mathbb {M}^{ac}(\mathbf {X}_{\infty })\) such that for all \((x,a,b)\in \mathbf {K}\) with \(x\in \mathbf {X}\), and \(0\le t < t_{*}(x)\),
and
whenever \(t_{*}(x)<\infty \). Moreover,
for some \(M_2>0\) and for the function W introduced in Assumption B.
Remark 4.5
We show in this remark that, for the case in which there is no flow, that is, \(\phi (x,t)=x\) for all t (and thus there is no boundary), Assumptions B and D are similar to Assumptions 3.1(a) and 5.2(d) in [8], obtained for a continuous-time jump Markov process in Polish spaces. Indeed, for the case in which there is no motion we would have \(\phi (x,t)=x\), \(\ell (x,a,b,t)= (a,b)\), \(t_*(x) = \infty \) for all \(x\in \mathbf {X}\), \(t\in \mathbb {R}_+\). For each \(x\in \mathbf {X}\), \(a\in \mathbf {A}(x)\), \(b\in \mathbf {B}(x)\), define the signed measure q(.|x, a, b) on \(\mathbb {B}(\mathbf {X})\) as
Then the function q(.|x, a, b), referred to as the function of transition rates in [8], satisfies the conditions \((T_1)\), \((T_2)\), \((T_3)\) in [8], and \(-q(\{x\}|x,a,b) = \lambda (x,a,b)\), so that
Note that \(W(\phi (x,t))=W(x)\), \(W_2(\phi (x,t))=W_2(x)\) for all \(t\in \mathbb {R}_+\), so that the derivative with respect to t is zero, that is, \(\mathcal {X}W(\phi (x,t))=0\) and \(\mathcal {X}W_2(\phi (x,t))=0\). Using the notation in (39) we have that (27) and (36) can be written respectively as \(qW(x,a,b) \le c_1W(x) + d_{1}\) and \(qW_2(x,a,b) \le c_2W_2(x) + d_{2}\), which corresponds to Assumption 3.1(a) and the second part of Assumption 5.2(d) in [8]. Notice now that \(HW_2(x,a,b)=0\) and
so that (38) can be re-written as \((\alpha + \lambda (x,a,b)) W(x) \le M_2 W_2(x)\), which is equivalent, using the notation in (40), to \((\alpha + q(x)) W(x) \le M_2 W_2(x)\) for some \(M_2>0\). We show next that this is equivalent to the first part of Assumption 5.2(d) in [8], that is, \(q(x)W(x)\le \widetilde{M}_2 W_2(x)\) for some \(\widetilde{M}_2>0\), assuming that \(0<q_{min}\le q(x)\) for all \(x\in \mathbf {X}\). In fact, from \((\alpha + q(x)) W(x) \le M_2 W_2(x)\) it is immediate that \(q(x) W(x) \le (\alpha + q(x)) W(x) \le M_2 W_2(x)\). On the other hand, if \(q(x)W(x)\le \widetilde{M}_2 W_2(x)\) then \(W(x) \le \frac{\widetilde{M}_2}{q_{min}}W_2(x)\) and \((\alpha + q(x)) W(x) \le \widetilde{M}_2(1+\frac{\alpha }{q_{min}})W_2(x) = M_2 W_2(x)\) with \(M_2 = (1+\frac{\alpha }{q_{min}})\widetilde{M}_2\), showing the equivalence.
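The two implications established above can be summarized in a single display (this merely restates the computation in the remark):

```latex
(\alpha + q(x))\,W(x) \le M_2 W_2(x)
  \;\Longrightarrow\;
  q(x)\,W(x) \le (\alpha + q(x))\,W(x) \le M_2 W_2(x),
\\
q(x)\,W(x) \le \widetilde{M}_2 W_2(x)
  \;\text{ and }\; q(x) \ge q_{min} > 0
  \;\Longrightarrow\;
  (\alpha + q(x))\,W(x)
  \le \widetilde{M}_2 \Big( 1 + \frac{\alpha}{q_{min}} \Big) W_2(x).
```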
The next result shows the convergence of the expected discounted sum of the function W, that will be used for the infinite horizon problem (14).
Proposition 4.6
Consider Assumptions A, B and D. For \(x_0\in \mathbf {X}\), \(\pi =(\pi _{k})_{k\in \mathbb {N}}\in \Pi \) and \(\gamma =(\gamma _{k})_{k\in \mathbb {N}}\in \Gamma \), \(n\in \mathbb {N}\) we have that
where \(S_2(x) = \frac{M_2}{\alpha -c_2}W_2(x) + \frac{d_{2}M_2}{\alpha (\alpha -c_2)}\). In particular, \(\mathbb {P}^{\pi ,\gamma }_{x_0}\big ( \{T_{\infty } <\infty \} \big ) =0\) for any \(\pi =(\pi _{k})_{k\in \mathbb {N}}\in \Pi \) and \(\gamma =(\gamma _{k})_{k\in \mathbb {N}}\in \Gamma \).
Proof
Using the same arguments as in Proposition 4.3, we get that for any \((x,a,b)\in \mathbf {K}_{\infty }\), \(GS_2(x,a,b) + M_{2} \bar{C}_2(x,a,b) \le S_2(x)\), with \(\bar{C}_2(x,a,b) = LW_2(x,a,b) + HW_2(x,a,b)\). Now, we can apply Proposition 4.1 to the functions \(\mathcal {W}(x)=\mathcal {S}(x)=S_{2}(x)\) and \(\mathcal {C}(x,a,b)=M_{2} \bar{C}_2(x,a,b)\) to get
Recalling that \(S_2\) is positive we obtain that
by using inequality (38). Since \(\mathbb {P}^{\pi ,\gamma }_{x_0}\Big ( \{X_{k}=x_{\infty }\}\cap \{T_{k}=\infty \}\Big )=1\) we get that
and thus we have that
From (42) and (43) we get (41). Now, since \(W\ge 1\), we get from (41) and the Monotone Convergence Theorem that \(\mathbb {E}^{\pi ,\gamma }_{x_0}\Big [ \sum _{k=0}^{\infty } e^{-\alpha T_k } \Big ] \le S_2(x_0)\), implying the last part of the result. \(\square \)
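The final step of the proof can be spelled out: since \(W \ge 1\), (41) yields \(\mathbb{E}^{\pi,\gamma}_{x_0}\big[ \sum_{k=0}^{\infty} e^{-\alpha T_k} \big] \le S_2(x_0) < \infty\), and this finiteness is incompatible with an accumulation of jumps:

```latex
% On the event {T_infty < infty} we have T_k <= T_infty for every k, hence
\text{on } \{T_{\infty} < \infty\}: \qquad
\sum_{k=0}^{\infty} e^{-\alpha T_k}
  \;\ge\; \sum_{k=0}^{\infty} e^{-\alpha T_{\infty}}
  \;=\; \infty ,
```

so this event must have \(\mathbb{P}^{\pi,\gamma}_{x_0}\)-probability zero.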
The following auxiliary results will be useful in the sequel, in order to re-write our min–max continuous-time problem in a discrete-time framework, in which the stages are defined by the jump times \(T_{k}\) of the PDMP. The first result gives an interpretation of (18), (19), in terms of the jump time \(T_1\).
Lemma 4.7
Suppose that Assumptions A, B, and C hold. For \(x_0\in \mathbf {X}\), \(\pi =(\pi _{n})_{n\in \mathbb {N}}\in \Pi \) and \(\gamma =(\gamma _{n})_{n\in \mathbb {N}}\in \Gamma \), \(k\in \mathbb {N}\)
Proof
Moreover, denoting \(S_{k+1} = T_{k+1} - T_{k}\), we have from (44), and by reasoning similar to that in (43), that
Similarly, from (45) we get that
completing the proof. \(\square \)
The next result re-writes the payoff functions in a discrete-time fashion, using the operators L and H defined in (18) and (19) respectively.
Proposition 4.8
Consider Assumptions A, B, C and D. For \(x_0\in \mathbf {X}\), \(\pi =(\pi _{k})_{k\in \mathbb {N}}\in \Pi \) and \(\gamma =(\gamma _{k})_{k\in \mathbb {N}}\in \Gamma \), \(n\in \mathbb {N}\), we have that
and \(\mathcal {D}(\pi ,\gamma ,x_0) = \lim _{n\rightarrow \infty } \mathcal {D}(n,\pi ,\gamma ,x_{0})\).
Proof
Recalling the definition of \(\mathcal {D}(n+1,\pi ,\gamma ,x_{0})\) (see Eq. (13)) and C (see Eq. (35)), we easily obtain the first part of the claim by using Lemma 4.7. Moreover, we have that
from the Monotone Convergence Theorem and since \(\lim _{k\rightarrow \infty } T_{k}=\infty \), \(\mathbb {P}^{\pi ,\gamma }_{x_{0}}\)-a.s. (see Lemma 4.7). Recalling the definition of \(\bar{C}\) in equation (29) we have that
for some positive constant M, where we have used the fact that \(\bar{C}(x,a,b)\le M W(x)\) (see Eqs. (30) and (31)) to obtain the last inequality. Proposition 4.6 gives that
Consequently, by using Assumption C and the Bounded Convergence Theorem, we get the last part of the result. \(\square \)
5 Main Results
In this section we present the main results of this paper. We start by introducing some continuity and compactness conditions on the parameters of the problem. Proposition 5.1 establishes the existence of optimal Markov strategies for the finite horizon problem (13). The infinite horizon case (14) is considered in Propositions 5.2 and 5.3: Proposition 5.2 establishes the existence of a solution to the optimality equation associated with the zero-sum game, while Proposition 5.3 establishes the uniqueness of this solution and the existence of optimal stationary Markov strategies.
Consider the following assumptions:
Assumption E
(E1) The set-valued mappings \(x\mapsto \mathbf {A}(x)\) and \(x\mapsto \mathbf {B}(x)\), defined on \(\mathbf {X}_{\infty }\), are Borel-measurable and compact-valued.
(E2) For each \(x\in \mathbf {X}_{\infty }\), C(x, a, b) is continuous in \((a,b)\in \mathbf {A}(x)\times \mathbf {B}(x)\).
(E3) For each \(x\in \mathbf {X}_{\infty }\) and \(u\in \mathbb {B}(\mathbf {X}_{\infty })\), Gu(x, a, b) is continuous in \((a,b)\in \mathbf {A}(x)\times \mathbf {B}(x)\).
(E4) For each \(x\in \mathbf {X}_{\infty }\), GW(x, a, b) is continuous in \((a,b)\in \mathbf {A}(x)\times \mathbf {B}(x)\).
From Lemma 4.4, it follows that \(C(x,\varrho ,\chi ) + Gh(x,\varrho ,\chi )\) is well defined for any \(x\in \mathbf {X}_{\infty }\), \(\varrho \in \mathcal {P}(\mathbf {A}(x))\) and \(\chi \in \mathcal {P}(\mathbf {B}(x))\) whenever \(h\in \mathbb {B}_{W}(\mathbf {X}_{\infty })\). Consequently, proceeding as in the proof of Theorem 5.1 (c) in [8], it easily follows from Assumption E that the functions T and R defined on \(\mathbb {B}_{W}(\mathbf {X}_{\infty })\) by
are well defined and map \(\mathbb {B}_{W}(\mathbf {X}_{\infty })\) into \(\mathbb {B}_{W}(\mathbf {X}_{\infty })\). Moreover, from Fan’s min–max theorem in [5] and the min–max measurable selection theorems in [18] and [19], there exist \(p\in \mathbf {P}\) and \(q\in \mathbf {Q}\) such that
Set recursively
Proposition 5.1
Consider Assumptions A, B, C, D and E. Then there exist \(p=(p_{n})_{n\in \mathbb {N}}\in \Pi ^M\) and \(q=(q_{n})_{n\in \mathbb {N}}\in \Gamma ^M\) such that
for \(k\in \mathbb {N}\). Moreover, the finite horizon game has a value \(\mathcal {V}(n,x)\) satisfying
and \(|\mathcal {V}(n,x)| \le S(x)\) for any \(x\in \mathbf {X}\).
Proof
The first statements can be obtained by arguments similar to those of Theorem 4.1 in [15], together with Eq. (49). To show that \(|\mathcal {V}(n,x)| \le S(x)\), we first notice that \(GS(x,a,b) + C(x,a,b) \le S(x)\) for any \((x,a,b)\in \mathbf {K}_{\infty }\), by combining (31) and Assumption C. From Proposition 4.1, considering the functions \(\mathcal {W}(x)=W(x)\), \(\mathcal {S}(x)=S(x)\) and \(\mathcal {C}(x,a,b)=C(x,a,b)\), we get the desired result. \(\square \)
Define now the sequence \(U_{k+1}(x) = TU_k(x)\), with \(U_0(x)=-S(x)\).
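The recursion \(U_{k+1} = TU_k\) is a Shapley-style value iteration. The following toy sketch is purely illustrative and is not the paper's construction: the PDMP operator G is replaced by a hypothetical finite transition matrix with a discount factor, all model data (C, P, BETA) are invented, and the pure-strategy lower value \(\max_a \min_b\) is used in place of the mixed-strategy minimax defining T. It only exhibits the fixed-point mechanism behind Proposition 5.2:

```python
# Illustrative value iteration U_{k+1} = T U_k on a hypothetical two-state,
# two-action zero-sum model.  All data below are invented; the pure-strategy
# lower value max_a min_b stands in for the mixed-strategy minimax of the paper.

BETA = 0.9                       # discount factor (role of the e^{-alpha t} terms)
STATES, ACTIONS = range(2), range(2)

# Hypothetical one-stage payoffs C[x][a][b] and transitions P[x][a][b] = (p0, p1).
C = [[[1.0, 2.0], [0.0, 3.0]],
     [[2.0, 4.0], [1.0, 5.0]]]
P = [[[(0.5, 0.5), (0.2, 0.8)], [(0.9, 0.1), (0.4, 0.6)]],
     [[(0.3, 0.7), (0.6, 0.4)], [(0.1, 0.9), (0.8, 0.2)]]]

def T(U):
    """One application of the (lower-value) dynamic-programming operator."""
    new = []
    for x in STATES:
        payoff = [[C[x][a][b] + BETA * sum(P[x][a][b][y] * U[y] for y in STATES)
                   for b in ACTIONS] for a in ACTIONS]
        new.append(max(min(row) for row in payoff))   # player 1 max, player 2 min
    return new

U = [0.0, 0.0]
for _ in range(500):             # T is a BETA-contraction in the sup norm
    U = T(U)

residual = max(abs(T(U)[x] - U[x]) for x in STATES)
print(U, residual)               # residual is essentially zero at the fixed point
```

Since the toy operator contracts with modulus BETA, the sup-norm distance to the fixed point decays geometrically; in the paper the analogous fixed point of T is obtained instead via monotonicity and the bound \(|U_k| \le S\), without a contraction argument.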
Proposition 5.2
Suppose that Assumptions A, B, C and D and E hold. Then there exists a function \(U^* \in \mathbb {B}_{W}(\mathbf {X}_{\infty })\) such that \(U^* = TU^*=RU^*\) and \(\lim _{k\rightarrow \infty } U_k(x) = U^* (x)\) for each \(x\in \mathbf {X}_{\infty }\).
Proof
The proof follows arguments similar to those of Theorem 5.1 in [8]. First we show by induction on k that \( |U_k(x)| \le S(x)\). For \(k=0\) this is immediate by definition. If it holds for k then, from Proposition 4.3, \(| C(x,a,b) + GU_k(x,a,b) | \le \bar{C}(x,a,b) + GS(x,a,b) \le S(x)\), showing that \( |U_{k+1}(x)| \le S(x)\). We also have from Proposition 4.3 that \((U_k(x))_{k\in \mathbb {N}}\) is a pointwise non-decreasing sequence of functions, since \(C(x,a,b) + GU_0(x,a,b) \ge - \bar{C}(x,a,b) - GS(x,a,b) \ge -S(x)\), and thus \(U_1(x) \ge U_0(x)\), and the operator T is monotone. From this it follows that there exists \(U^*(x) = \lim _{k\rightarrow \infty } U_k(x)\le S(x)\), and so \(U^* \in \mathbb {B}_{W}(\mathbf {X}_{\infty })\). Since T is monotone and \(U^* \ge U_k\), it follows that \(TU^* \ge TU_k = U_{k+1}\), which shows that \(TU^* \ge U^*\). From (49) there exists \(q_n\in \mathbf {Q}\) such that for any \(\varrho ^\prime \in \mathcal {P}(\mathbf {A}(x))\),
Since \(\mathcal {P}(\mathbf {B}(x))\) is compact it can be assumed without loss of generality that \(q_n(.|x) \rightarrow \chi ^\prime \) as \(n \rightarrow \infty \) for some \(\chi ^\prime \in \mathcal {P}(\mathbf {B}(x))\). From the extended Fatou’s lemma (see Lemma 8.3.7 in [11]) and the continuity assumptions made we get from (52) that
From (53) it follows that \(U^*(x) \ge RU^*(x) = TU^*(x)\) completing the proof. \(\square \)
Proposition 5.3
Suppose that Assumptions A, B, C, D and E hold, and consider \(U^*\) as in Proposition 5.2. Then \(U^*\) is the unique solution for \(V = TV\) with \(V \in \mathbb {B}_{W}(\mathbf {X}_{\infty })\). Moreover the discounted infinite horizon game has a value \(\mathcal {V}\) satisfying
where the pair of optimal strategies \((p^*,q^*)\in \mathbf {P}\times \mathbf {Q}\) is such that \(U^*(x) = C(x,p^*,q^*) + GU^*(x,p^*,q^*)\).
Proof
Let \(V \in \mathbb {B}_{W}(\mathbf {X}_{\infty })\) satisfy \(V=TV\). The idea of the proof is to show that the discounted infinite horizon game has a value \(\mathcal {V}\), and that \(\mathcal {V}=V\), so that \(\mathcal {V}\) is the unique solution of \(V=TV\) with \(V \in \mathbb {B}_{W}(\mathbf {X}_{\infty })\), and from Proposition 5.2 we have that \(\mathcal {V}=U ^*\). Consider any strategy \(\pi \in \Pi \) for player 1 and set \(q^*\in \mathbf {Q}\) such that (see (49))
for any \(\varrho \in \mathcal {P}(\mathbf {A}(x))\). From Proposition 4.1 considering the functions \(\mathcal {W}(x)=W(x)\), \(\mathcal {S}(x)=V(x)\), \(\mathcal {C}(x,a,b)=C(x,a,b)\), and using the inequality \(V(x) \ge C(x,\varrho ,q^*) + GV(x,\varrho ,q^*)\) for any \(\varrho \in \mathcal {P}(\mathbf {A}(x))\) obtained from (54), we get that
Taking the limit in (55) as \(n\rightarrow \infty \) we obtain from Proposition 4.6 that
Similarly consider any strategy \(\gamma \in \Gamma \) for player 2 and set \(p^*\in \mathbf {P}\) such that (see (49))
for any \(\chi \in \mathcal {P}(\mathbf {B}(x))\). From Proposition 4.1 considering the functions \(\mathcal {W}(x)=W(x)\), \(\mathcal {S}(x)=-V(x)\), \(\mathcal {C}(x,a,b)=-C(x,a,b)\), and using the inequality \(V(x) \le {C(x,p^*,\chi ) + GV(x,p^*,\chi )}\) for any \(\chi \in \mathcal {P}(\mathbf {B}(x))\) obtained from (57), we get that
Taking the limit in (58) as \(n\rightarrow \infty \) and from Proposition 4.6 we get that
From (56) and (59) we get that
and thus the discounted infinite horizon game has a value \(\mathcal {V}\), and \(\mathcal {V}=V\). Therefore for any \(V \in \mathbb {B}_{W}(\mathbf {X}_{\infty })\) satisfying \(V=TV\) we have that \(V=\mathcal {V}\), which shows that \(\mathcal {V}\) is the unique fixed point solution of \(V=TV\) with \(V \in \mathbb {B}_{W}(\mathbf {X}_{\infty })\). From Proposition 5.2 we get that \(\mathcal {V}(x)=U ^*(x)\). Moreover by taking \(\pi = p^*\) in (59) and \(\gamma = q^*\) in (56) we get that \(\mathcal {V}(x_0)= \mathcal {D}(p^*,q^*,x_0)\) completing the proof. \(\square \)
References
Costa, O.L.V., Dufour, F.: Continuous average control of piecewise deterministic Markov processes. Springer, New York (2013)
Davis, M.H.A.: Piecewise-deterministic Markov processes: a general class of non-diffusion stochastic models. J. R. Stat. Soc. (B) 46(3), 353–388 (1984)
Davis, M.H.A.: Markov Models and Optimization. Chapman and Hall, London (1993)
Davis, M.H.A., Dempster, M.A.H., Sethi, S.P., Vermes, D.: Optimal capacity expansion under uncertainty. Adv. Appl. Probab. 19(1), 156–176 (1987)
Fan, K.: Minimax theorems. Proc. Natl. Acad. Sci. USA 39, 42–47 (1953)
Filar, J.A., Vrieze, K.: Competitive Markov decision processes. Springer, New York (1997)
González-Trejo, J.I., Hernández-Lerma, O., Hoyos-Reyes, L.F.: Minimax control of discrete-time stochastic systems. SIAM J. Control Optim. 41, 1626–1659 (2003)
Guo, X., Hernández-Lerma, O.: Zero-sum games for continuous-time jump Markov processes in Polish spaces: discounted payoffs. Adv. Appl. Probab. 39, 646–668 (2007)
Guo, X.P., Hernández-Lerma, O.: New optimality conditions for average-payoff continuous-time Markov games in Polish spaces. Sci. China Math. 54, 793–816 (2011)
Hernández-Lerma, O., Lasserre, J.B.: Zero-sum stochastic games in Borel spaces: average payoff criterion. SIAM J. Control Optim. 39, 1520–1539 (2001)
Hernández-Lerma, O., Lasserre, J.B.: Further Topics on Discrete-Time Markov Control Processes. Applications of Mathematics, vol. 42. Springer, New York (1999)
Jacod, J.: Calcul stochastique et problèmes de martingales. Lecture Notes in Mathematics, vol. 714. Springer, Berlin (1979)
Jaśkiewicz, A.: Zero-sum semi-Markov games. SIAM J. Control Optim. 41, 723–739 (2002)
Jaśkiewicz, A.: Zero-sum ergodic semi-Markov games with weakly continuous transition probabilities. J. Optim. Theory Appl. 141, 321–347 (2009)
Jaśkiewicz, A., Nowak, A.S.: Zero-sum ergodic stochastic games with Feller transition probabilities. SIAM J. Control Optim. 45(3), 773–789 (2006)
Jaśkiewicz, A., Nowak, A.S.: Stochastic games with unbounded payoffs: applications to robust control in economics. Dyn. Games Appl. 1, 253–279 (2011)
Kuenle, H.-U.: On Markov games with average reward criterion and weakly continuous transition probabilities. SIAM J. Control Optim. 45, 2156–2168 (2007)
Nowak, A.S.: Measurable selection theorems for minimax stochastic optimization problems. SIAM J. Control Optim. 23, 466–476 (1985)
Rieder, U.: On semi-continuous dynamic games. Technical report, University of Karlsruhe, Karlsruhe (1978)
Tweedie, R.L., Lund, R.B., Meyn, S.P.: Computable exponential convergence rates for stochastically ordered Markov processes. Ann. Appl. Probab. 6(1), 218–237 (1996)
Van der Duyn Schouten, F.A.: Markov decision drift processes. In: Janssen, J. (ed.) Semi-Markov Models: Theory and Applications, Chapter 2, pp. 63–78. Springer, New York (1984)
Vega-Amaya, O.: Zero-sum average semi-Markov games: fixed-point solutions of the Shapley equation. SIAM J. Control Optim. 42, 1876–1894 (2003)
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the paper. This work was partially supported by FAPESP (Research Council of the State of São Paulo) Grant 2013/50759-3. O.L.V. Costa received financial support from CNPq (Brazilian National Research Council), Grant 304091/2014-6, project INCT under the Grant CNPq 465755/2014-3, FAPESP 2014/50851-0, and FAPESP/BG Brasil through the Research Centre for Gas Innovation, FAPESP Grant 2014/50279-4.
Appendix: Proof of Proposition 4.1
For the proof of this proposition, we need first to derive some auxiliary technical results. In what follows we write for notational convenience \(\widetilde{h}_k = (\widetilde{h}_{k-1},a_{k-1},b_{k-1},x_k)\) and we introduce
and
for \(\pi =(\pi _{k})_{k\in \mathbb {N}}\in \Pi \), \(\gamma =(\gamma _{k})_{k\in \mathbb {N}}\in \Gamma \) and \(n\in \mathbb {N}\), \(k\in \mathbb {N}^{*}_{n}\) (for simplicity we omit the dependence on \((\pi ,\gamma )\) for the functions \(v_k^n\), \(s_k^n\) defined below). Observe that for any \(n\in \mathbb {N}\), \(k\in \mathbb {N}_{n}\) and \(\widetilde{h}_k\in \widetilde{\mathbf {H}}_{k}\), \( v_{k}^n (\widetilde{h}_{k})\) and \(s_{k}^n (\widetilde{h}_{k})\) are well defined by using the hypotheses on \(\mathcal {W}\), \(\mathcal {S}\) and \(\mathcal {C}\).
Proposition 6.1
For \(x_0\in \mathbf {X}\), \(\pi =(\pi _{k})_{k\in \mathbb {N}}\in \Pi \) and \(\gamma =(\gamma _{k})_{k\in \mathbb {N}}\in \Gamma \), \(m\in \mathbb {N}\), we have that
Proof
It is an immediate application of the construction of the process. \(\square \)
As a consequence of the previous proposition, we have the following result.
Proposition 6.2
For \(x_0\in \mathbf {X}\), \(\pi =(\pi _{k})_{k\in \mathbb {N}}\in \Pi \) and \(\gamma =(\gamma _{k})_{k\in \mathbb {N}}\in \Gamma \), \(m\in \mathbb {N}\) we have that
Proof
By definition, we have
and so from Proposition 6.1,
Repeating this procedure we get (64), and similarly we get (65). \(\square \)
For notational convenience, let us define
Proposition 6.3
For \(k\in \mathbb {N}_{n}\), we have that
Proof
Let us prove (67) by induction on k. For \(k=n\) we have from (60), (62) and (66) that \(s_n^n(\widetilde{h}_{n})= G\mathcal {S}(x_n,\pi ( . | \widetilde{h}_n),\gamma _{n}( . | \widetilde{h}_n))\), \(g_n^n(\widetilde{h}_{n})= v_n^n(\widetilde{h}_{n})= \mathcal {C}(x_n,\pi _{n}( . | \widetilde{h}_n),\gamma _{n}( . | \widetilde{h}_n))\) and thus from (24),
This proves the result for n. Suppose (67) holds for k. Let us show that it also holds for \(k-1\). We have from (60), (61), (63), (66), the induction hypothesis (67), and (24), that
where the last inequality follows from (24). Thus re-arranging the terms we get that \(s_{k-1}^n(\widetilde{h}_{k-1}) + \sum _{i=k-1}^n v_{k-1}^i(\widetilde{h}_{k-1})\le \mathcal {S}(x_{k-1})\) showing (67) for \(k-1\), completing the proof. \(\square \)
Now the proof of Proposition 4.1 is a straightforward consequence of Propositions 6.2 and 6.3. From (67), we have \(s_0^n(\widetilde{h}_{0}) + g_0^n(\widetilde{h}_{0})\le \mathcal {S}(x_0)\). Moreover, combining (64) and (66) we get \(g_0^n(\widetilde{h}_{0})=\sum _{k=0}^{n} \mathbb {E}^{\pi ,\gamma }_{x_0}\Big [e^{-\alpha T_k } \mathcal {C}(X_k,\pi _k(.|{\widetilde{H}_k}),\gamma _k(.|\widetilde{H}_k)) \Big ] \) and from (65), \(s_0^n(x_0,\pi _0,\gamma _0)= \mathbb {E}^{\pi ,\gamma }_{x_0}\Big [e^{-\alpha T_{n+1}} \mathcal {S}(X_{n+1})\Big ] \) giving the result.
Costa, O.L.V., Dufour, F. Zero-Sum Discounted Reward Criterion Games for Piecewise Deterministic Markov Processes. Appl Math Optim 78, 587–611 (2018). https://doi.org/10.1007/s00245-017-9416-2