1 Introduction

The objective of this work is to study continuous-time Markov decision processes (CTMDPs) with constraints for the infinite-horizon discounted cost, with both impulsive and continuous controls. CTMDPs form a general family of controlled stochastic processes suitable for modeling sequential decision-making problems. They appear in many fields such as engineering, computer science, economics and operational research, among others. Our goal is to study the two types of control for CTMDPs as described by Davis in his book [6]: continuous control, acting at all times on the process through the transition rate, and impulsive control, used to describe control actions that move the process to a new point of the state space at some specific times. Continuous control for CTMDPs has been extensively studied in the literature; see for example the recent books [9, 10, 27] and the references therein. Meanwhile, impulsive control for CTMDPs has received less attention and was studied in [6, 8, 15, 16, 25, 29–32]; see also the recent work [4], where a general Markov model was considered with application to financial mathematics. We do not attempt to present here an exhaustive panorama of this topic, but refer the interested reader to [8] for a brief survey of CTMDPs with impulsive control.

It is important to emphasize that in the framework of impulsive control for CTMDPs, there exist two rather distinct families of problems. The first class is related to models allowing only one impulsive action at a time. The second family is more general and studies models with possibly multiple impulses at the same time moment. This latter set of problems is much more delicate to analyze. Indeed, if the process may take different values at the same time moment, this leads to nonstandard paths for the controlled process. Most of the works in the literature are concerned with the first class of problems. The second family of problems has been addressed mainly by Yushkevich in [29–32]. Yushkevich introduced a new class of stochastic models, the so-called T-processes, where, roughly speaking, the processes are indexed by a parameter representing the natural current time and the number of the impulsive actions at that time moment. In [8], another approach has been developed by the authors in order to use the standard theory of marked point processes [17, 23]. Roughly speaking, the model discussed in [8] is defined by the following components: the state space \(\mathbf {X}\), the set of continuous actions \(\mathbf {A}^{g}\), the space of impulsive actions \(\mathbf {A}^{i}\), a transition rate q on \(\mathbf {X}\) given \(\mathbf {X} \times \mathbf {A}^{g}\) and a stochastic kernel Q on \(\mathbf {X}\) given \(\mathbf {X}\times \mathbf {A}^{i}\). The model is given by a marked point process \((\Theta _{n},Y_{n})_{n\in \mathbb {N}}\), where \(\Theta _{n}\) represents the sojourn time between two consecutive epochs induced either by a natural jump or by an intervention of the decision-maker. The natural jumps are generated by the transition rate q. The state vector \(Y_{n}\) represents the successive jumps of the process and the associated impulsive actions at the n-th epoch.
More precisely, \(Y_{n}\) is of the form \((x_0,a_0,x_1,a_1,\ldots ,x_k,a_k,x_{k+1},\Delta ,\Delta ,\ldots )\), where \(x_{0}\) corresponds to a possible natural jump or to the value of the process just before the intervention. The triple \((x_{j},a_{j},x_{j+1})\) indicates that the impulsive action \(a_{j}\) has been applied to the system at state \(x_{j}\), leading to a new state \(x_{j+1}\) with distribution \(Q(\cdot |x_{j},a_{j})\). The special impulsive action \(\Delta \) means that the impulses are over, and the artificial state \(\Delta \) means the same.

The necessity of an immediate sequence of impulses appears naturally in many mathematical models describing real-life problems. For example, in heavy traffic control problems, the approximation of the real physical system may lead to control problems with simultaneous multiple impulses in the limit model. For the original physical system, the impulses are separated in time, but for the limit model, impulses can occur at the same time moment due to the fact that time is rescaled and compressed. For a detailed exposition of such a phenomenon, the reader is referred, for example, to Sect. 8 of the book [21] (and the references therein), where an example of a production system in heavy traffic with impulsive control is discussed.

From a theoretical point of view, constrained CTMDPs are substantially different and more difficult to study than unconstrained CTMDPs. The linear programming technique has proved to be a very efficient method for solving such problems. The key idea is to reformulate the original sequential decision-making problem as an infinite-dimensional static optimization problem over a space of measures, where the admissible solutions are the so-called occupation measures of the controlled process. This technique has been extensively studied in recent decades for continuous-time processes in the context of continuous (gradual) control. We do not pretend to present here an exhaustive panorama of this approach, but the interested reader may consult the following works and the references therein: [11, 21, 24, 26] for CTMDPs, [1, 3, 22] for diffusion processes and [2, 12, 20, 28] for controlled martingale problems. However, the linear programming approach has been considerably less studied for impulsive control, whether for unconstrained or constrained problems. To the best of our knowledge, this paper is the first attempt to tackle such problems. Using the Lagrange multiplier approach, the impulsive control problem with constraints was addressed in [25] for a model with no continuous control, finite state and action spaces, and where a single impulsive action is allowed at a time.

In this paper, we investigate a constrained optimization problem for a CTMDP with general state and action spaces, and with both impulsive and continuous controls, where the performance and constraint criteria are given in terms of infinite-horizon discounted functionals. Our model allows multiple impulses at the same time moment. To the best of our knowledge, this paper can be seen as the first attempt to solve such general CTMDPs. A distinguishing feature of this work with respect to [8] is that in the present paper we consider the constrained control problem, while in [8] the unconstrained case was studied by using the dynamic programming technique. The main objective of the present work is to provide sufficient conditions ensuring the existence of an optimal control. First, we study the properties of the occupation measures. It is shown in Theorem 4.5 that for any admissible control strategy, the corresponding occupation measure satisfies a specific linear equation. It is then proved that this linear equation characterizes the optimal control problem under consideration, in the sense that from any measure \(\eta \) satisfying such a linear equation one can construct a control strategy u such that the corresponding occupation measure \(\eta _{u}\) is smaller than \(\eta \) (for a precise mathematical statement, see Theorem 4.9). Based on these properties, one can introduce a linear program, labeled \(\mathbb {PLP}\), and prove that the solvability of the constrained optimization problem is equivalent to the solvability of the \(\mathbb {PLP}\), and that these two optimization problems give the same value. By introducing a set of weak hypotheses, the solvability of the \(\mathbb {PLP}\) is proved in Theorem 5.5. Finally, Theorem 5.6 establishes the existence of an optimal randomized control strategy and shows that the class of such strategies is a sufficient set for the constrained optimization problem under consideration.

The rest of the paper is organized as follows. In Sect. 2, the CTMDP under consideration is discussed. Section 3 is devoted to the presentation of the performance criteria and the introduction of the main assumptions. The properties of the occupation measures are derived in Sect. 4. Finally, the linear program is studied in Sect. 5 where the existence of an optimal control strategy is shown. Several auxiliary results are presented in the Appendices 1 and 2 to streamline the presentation.

2 The Continuous-Time Markov Control Process

The main goal of this section is to introduce the notations, as well as the parameters defining the model, and to present the construction of the controlled process. In particular, having defined the class of admissible strategies, we introduce a probability measure \(\mathbb {P}_{x_{0}}^{u}\) with respect to which the controlled process \((\Theta _{n},Y_{n})_{n\in \mathbb {N}}\) has the required conditional distributions.

The following basic notation will be used in this paper: \(\mathbb {N}\) is the set of natural numbers including 0, \(\mathbb {N}^{*}=\mathbb {N}\setminus \{0\}\), \(\mathbb {R}\) denotes the set of real numbers, \(\mathbb {R}_{+}\) the set of non-negative real numbers, \(\mathbb {R}_{+}^{*}=\mathbb {R}_{+}\setminus \{0\}\), \(\bar{\mathbb {R}}_{+}=\mathbb {R}_{+}\mathop {\cup }\{+\infty \}\) and \(\bar{\mathbb {R}}_{+}^*=\mathbb {R}_{+}^*\mathop {\cup }\{+\infty \}\). For any \(p\in \mathbb {N}\), \(\mathbb {N}_{p}\) is the set \(\{0,1,\ldots ,p\}\) and for any \(p\in \mathbb {N}^{*}\), \(\mathbb {N}_{p}^{*}\) is the set \(\{1,\ldots ,p\}\). The term measure will always refer to a countably additive, \(\bar{\mathbb {R}}_{+}\)-valued set function. A finite (respectively, signed) measure is a countably additive, \(\mathbb {R}_{+}\)-valued (respectively, \(\mathbb {R}\)-valued) set function. Let \(\mathbf {X}\) be a Borel space and denote by \(\mathcal {B}(\mathbf {X})\) its associated Borel \(\sigma \)-algebra. For any set A, \(I_{A}\) denotes the indicator function of the set A. The set of measures (respectively, signed measures) defined on \((\mathbf {X},\mathcal {B}(\mathbf {X}))\) is denoted by \(\mathcal {M}(\mathbf {X})\) (respectively, \(\mathcal {M}_{s}(\mathbf {X})\)). The set of finite measures on \((\mathbf {X},\mathcal {B}(\mathbf {X}))\) is denoted by \(\mathcal {M}_{f}(\mathbf {X})\) and \(\mathcal {P}(\mathbf {X})\) is the set of probability measures defined on \((\mathbf {X},\mathcal {B}(\mathbf {X}))\). For any point \(x\in \mathbf {X}\), \(\delta _{x}\) denotes the Dirac measure defined by \(\delta _{x}(\Gamma )=I_{\Gamma }(x)\) for any \(\Gamma \in \mathcal {B}(\mathbf {X})\). 
The set of bounded real-valued measurable functions defined on \(\mathbf {X}\) is denoted by \(\mathbb {B}(\mathbf {X})\) and the set of \(\mathbb {R}\)-valued (respectively, \(\bar{\mathbb {R}}_{+}\)-valued, and \(\bar{\mathbb {R}}^{*}_{+}\)-valued) measurable functions defined on \(\mathbf {X}\) is denoted by \(\mathbb {M}(\mathbf {X})\) (respectively, \(\bar{\mathbb {M}}_{+}(\mathbf {X})\), and \(\bar{\mathbb {M}}^{*}_{+}(\mathbf {X})\)). The set of continuous functions in \(\mathbb {B}(\mathbf {X})\) is denoted by \(\mathbb {C}_{b}(\mathbf {X})\).

Let \(\mathbf {X}\) and \(\mathbf {Z}\) be two Borel spaces. A kernel (respectively, signed kernel) \(T(\cdot |\cdot )\) on \(\mathbf {Z}\) given \(\mathbf {X}\) is an \(\bar{\mathbb {R}}_{+}\)-valued (respectively, \(\mathbb {R}\)-valued) mapping defined on \(\mathcal {B}(\mathbf {Z})\times \mathbf {X}\) such that for any \(A\in \mathcal {B}(\mathbf {Z})\), \(T(A|\cdot )\in \bar{\mathbb {M}}_{+}(\mathbf {X})\) (respectively, \(T(A|\cdot )\in \mathbb {M}(\mathbf {X})\)) and for any \(x\in \mathbf {X}\), \(T(\cdot |x)\in \mathcal {M}(\mathbf {Z})\) (respectively, \(T(\cdot |x)\in \mathcal {M}_{s}(\mathbf {Z})\)). A kernel \(T(\cdot |\cdot )\) on \(\mathbf {Z}\) given \(\mathbf {X}\) is called stochastic (respectively, finite) if for any \(x\in \mathbf {X}\), \(T(\cdot |x)\in \mathcal {P}(\mathbf {Z})\) (respectively, \(T(\cdot |x)\in \mathcal {M}_{f}(\mathbf {Z})\)). \(\mathcal {P}(\mathbf {Z}|\mathbf {X})\) denotes the set of stochastic kernels on \(\mathbf {Z}\) given \(\mathbf {X}\). A transition rate q on \(\mathbf {X}\) given \(\mathbf {X}\times \mathbf {Z}\) is a signed kernel on \(\mathbf {X}\) given \(\mathbf {X}\times \mathbf {Z}\) satisfying \(q(\mathbf {X} |x,z)= 0\) and \(q(\Gamma \setminus \{x\} |x,z)\ge 0\) for any \(\Gamma \in \mathcal {B}(\mathbf {X})\) and any \((x,z)\in \mathbf {X}\times \mathbf {Z}\). To any transition rate q on \(\mathbf {X}\) given \(\mathbf {X}\times \mathbf {Z}\), we associate a kernel \(\bar{q}\) on \(\mathbf {X}\) given \(\mathbf {X}\times \mathbf {Z}\) defined by \(\bar{q}(\Gamma |x,z)=q(\Gamma \setminus \{x\} |x,z)\) for any \(\Gamma \in \mathcal {B}(\mathbf {X})\) and any \((x,z)\in \mathbf {X}\times \mathbf {Z}\).

Let \(\eta \in \mathcal {M}(\mathbf {X})\), \(f\in \bar{\mathbb {M}}_{+}(\mathbf {Z})\) and let T be a kernel on \(\mathbf {Z}\) given \(\mathbf {X}\); then \(\eta T\) denotes the measure on \(\mathbf {Z}\) defined by \(\displaystyle \eta T (\Gamma )=\int _{\mathbf {X}} T(\Gamma |x)\eta (dx)\) for any \(\Gamma \in \mathcal {B}(\mathbf {Z})\), and Tf denotes the function defined on \(\mathbf {X}\) by \(\displaystyle Tf(x)=\int _{\mathbf {Z}} f(z) T(dz|x)\) for any \(x\in \mathbf {X}\). Moreover, if R is a kernel on \(\mathbf {Y}\) given \(\mathbf {Z}\), then TR is the kernel on \(\mathbf {Y}\) given \(\mathbf {X}\) defined by \(\displaystyle TR(\Gamma |x)=\int _{\mathbf {Z}} R(\Gamma |z)T(dz|x)\) for any \(\Gamma \in \mathcal {B}(\mathbf {Y})\) and \(x\in \mathbf {X}\).
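On finite spaces, these operations reduce to matrix and vector products: measures on \(\mathbf {X}\) become row vectors, kernels become matrices whose rows are the measures \(T(\cdot |x)\), and \(\eta T\), \(Tf\), \(TR\) become `eta @ T`, `T @ f` and `T @ R`. The following NumPy sketch is purely illustrative; the spaces and all numbers are hypothetical.

```python
import numpy as np

# Finite illustration: X = {0, 1}, Z = {0, 1, 2}.
# A kernel T on Z given X is stored row-wise: T[x, :] = T(.|x).
T = np.array([[0.2, 0.5, 0.3],
              [0.6, 0.1, 0.3]])

eta = np.array([0.4, 0.6])      # a (probability) measure on X
f = np.array([1.0, 2.0, 3.0])   # a non-negative function on Z

# eta T (Gamma) = sum_x T(Gamma|x) eta({x}): a measure on Z
etaT = eta @ T
# Tf(x) = sum_z f(z) T({z}|x): a function on X
Tf = T @ f

# If R is a kernel on Y given Z (here Y = {0, 1}), then TR is the
# kernel on Y given X obtained by composing the two kernels.
R = np.array([[0.5, 0.5],
              [0.9, 0.1],
              [0.0, 1.0]])
TR = T @ R                      # rows of TR remain probability vectors
```

Note that when T and R are stochastic, `TR` is again row-stochastic, matching the fact that the composition of stochastic kernels is stochastic.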

Finally, the infimum over an empty set is understood to be equal to \(+\infty \).

2.1 Parameters of the Model

We deal with a control model defined through the following elements:

  • \(\mathbf {X}\) is the state space, assumed to be a Borel space (i.e., a measurable subset of a complete and separable metric space).

  • \(\mathbf {A}\) is the action space, assumed to be also a Borel space. \(\mathbf {A}^{i}\in \mathcal {B}(\mathbf {A})\) (respectively \(\mathbf {A}^{g}\in \mathcal {B}(\mathbf {A})\)) is the set of impulsive (respectively continuous) actions satisfying \(\mathbf {A}=\mathbf {A}^i\cup \mathbf {A}^g\) with \(\mathbf {A}^i\cap \mathbf {A}^g=\emptyset \).

  • The set of feasible actions in state \(x\in \mathbf {X}\) is \(\mathbf {A}(x)\), which is a nonempty measurable subset of \(\mathbf {A}\). Admissible impulsive and continuous actions in the state \(x\in \mathbf {X}\) are denoted by \(\mathbf {A}^i(x)=\mathbf {A}(x)\cap \mathbf {A}^i\) and \(\mathbf {A}^g(x)=\mathbf {A}(x)\cap \mathbf {A}^g\). It is supposed that \(\mathbb {K}^g=\{(x,a)\in \mathbf {X}\times \mathbf {A}:a\in \mathbf {A}^{g}(x)\}\in \mathcal {B}(\mathbf {X}\times \mathbf {A}^g)\) and that this set contains the graph of a measurable function from \(\mathbf {X}\) to \(\mathbf {A}^g\) (so that, necessarily, \(\mathbf {A}^g(x)\ne \emptyset \) for all \(x\in \mathbf {X}\)); similarly, it is supposed that \(\mathbb {K}^i=\{(x,a)\in \mathbf {X}\times \mathbf {A}^{i}:a\in \mathbf {A}^i(x)\}\in \mathcal {B}(\mathbb {X}^{i}\times \mathbf {A}^i)\), where \(\mathbb {X}^{i}=\{x\in \mathbf {X}: \mathbf {A}^i(x)\ne \emptyset \}\in \mathcal {B}(\mathbf {X})\), and that \(\mathbb {K}^i\) contains the graph of a measurable function from \(\mathbb {X}^{i}\) to \(\mathbf {A}\).

  • The stochastic kernel Q on \(\mathbf {X}\) given \(\mathbb {K}^{i}\) describes the result of an impulsive action. In other words, if \(x\in \mathbb {X}^{i}\) and an impulsive action \(a\in \mathbf {A}^i(x)\) is applied, then the state of the process changes instantaneously according to \(Q(\cdot |x,a)\).

  • The signed kernel q on \(\mathbf {X}\) given \(\mathbb {K}^g\) is the intensity of jumps governing the dynamics of the process between interventions. For notational convenience, let us denote \(q(\Gamma \setminus \{x\}|x,a)\) by \(\bar{q}(\Gamma |x,a)\) for \(\Gamma \in \mathcal {B}(\mathbf {X})\) and \((x,a)\in \mathbb {K}^{g}\). It satisfies \(q(\mathbf {X}|x,a)=0\) and \(\bar{q}(\mathbf {X}|x,a)\ge 0\) for any \((x,a)\in \mathbb {K}^g\), and \(\sup _{a\in \mathbf {A}^{g}(x)} \bar{q}(\mathbf {X}|x,a)< \infty \) for any \(x\in \mathbf {X}\).
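For intuition, on a finite state space and for a fixed continuous action, a transition rate \(q(\cdot |x,a)\) is a signed matrix whose rows sum to zero and whose off-diagonal entries are non-negative, and the associated kernel \(\bar{q}\) is obtained by zeroing out the diagonal. A small sketch of these two objects (state space and numbers arbitrary):

```python
import numpy as np

# Finite illustration: X = {0, 1, 2}, one fixed continuous action a.
# q[x, :] = q(.|x, a): each row sums to 0, off-diagonal entries >= 0.
q = np.array([[-1.0,  0.7,  0.3],
              [ 0.5, -0.5,  0.0],
              [ 0.2,  0.8, -1.0]])

# qbar(Gamma|x, a) = q(Gamma \ {x} | x, a): zero out the diagonal.
qbar = q - np.diag(np.diag(q))
```

By construction, \(\bar{q}(\mathbf {X}|x,a)\) (the row sums of `qbar`) equals \(-q(\{x\}|x,a)\), the total rate of leaving state x.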

In our model, an intervention consists of a finite sequence of pairs, each formed by an impulsive action and the associated jump. This finite sequence can equivalently be described by an infinite sequence of state-action pairs in which, after finitely many steps, the pairs are set to the fictitious state and action. As a result, an intervention is an element of the set

$$\begin{aligned} \mathbf {Y}=\bigcup _{k\in \mathbb {N}}\mathbf {Y}_k \text { with } \mathbf {Y}_k=(\mathbf {X}\times \mathbf {A}^i)^k\times (\mathbf {X}\times \{\Delta \})\times (\{\Delta \}\times \{\Delta \})^\infty , \end{aligned}$$

where \(\Delta \) will play the role of the fictitious state and action. The dynamics of such sequences are governed by the Markov Decision Process (MDP) \(\mathcal {M}^{i}\) defined by \(\mathcal {M}^{i}=\big (\mathbf {X}_\Delta ,\mathbf {A}^i_\Delta ,(\mathbf {A}^i_\Delta (x))_{x\in \mathbf {X}_\Delta },Q_{\Delta }\big )\), where \(\mathbf {X}_{\Delta }\), \(\mathbf {A}^{i}_{\Delta }\) and \(\big (\mathbf {A}^{i}_{\Delta }(x)\big )_{x\in \mathbf {X}_{\Delta }}\) are the state and action spaces augmented by the fictitious state and action \(\Delta \): \(\mathbf {X}_{\Delta }=\mathbf {X}\mathop {\cup }\{\Delta \}\), \(\mathbf {A}^{i}_{\Delta }=\mathbf {A}^{i}\mathop {\cup }\{\Delta \}\), \(\mathbf {A}^{i}_{\Delta }(x)=\mathbf {A}^{i}(x)\mathop {\cup }\{\Delta \}\) for \(x\in \mathbf {X}\) and \(\mathbf {A}^{i}_{\Delta }(\Delta )=\{\Delta \}\). The dynamics are given by \(Q_{\Delta }(\cdot |x,a)=Q(\cdot |x,a)\) for any \((x,a)\in \mathbb {K}^{i}\) and \(Q_{\Delta }(\{\Delta \}|x,a)=1\) otherwise. For the model \(\mathcal {M}^{i}\), according to the Ionescu-Tulcea Theorem (see Proposition C.10 in [13]), there exists a unique strategic measure \(P^{\beta }(\cdot |x)\) on \((\mathbf {X}_\Delta \times \mathbf {A}^i_\Delta )^\infty \) associated with the policy \(\beta \) and the initial distribution \(\delta _x\). Here and below, we use the standard terminology for MDPs: see for example [13]. A policy is a sequence of past-dependent distributions on the action space. A randomized (respectively, non-randomized) policy consists in choosing the actions randomly (respectively, deterministically) over time, according to a probability law depending on the past history of the state and action processes. A Markov non-randomized policy is a sequence \((\varphi _j^i)_{j\in \mathbb {N}}\) of \(\mathbf {A}^i_\Delta \)-valued mappings on \(\mathbf {X}_\Delta \), and so on.
Observe that \(P^{\beta }\) is in fact a stochastic kernel on \((\mathbf {X}_\Delta \times \mathbf {A}^i_\Delta )^\infty \) given \(\mathbf {X}\); see Proposition C.10 in [13]. Since we only consider interventions as elements of \(\mathbf {Y}\), we introduce \(\Xi \) as the set of policies \(\beta \) satisfying \(P^{\beta }(\mathbf {Y}|x)=1\) for any \(x\in \mathbf {X}\). We consider randomized interventions, and consequently an intervention is an element of

$$\begin{aligned} \mathcal {P}^\mathbf {Y}=\{\gamma \in \mathcal {P}(\mathbf {Y}|\mathbf {X}): \gamma (\cdot |\cdot )=P^{\beta }(\cdot |\cdot ) \text { for some } \beta \in \Xi \}, \end{aligned}$$

and

$$\begin{aligned} \mathcal {P}^\mathbf {Y}(x)=\{\rho \in \mathcal {P}(\mathbf {Y}): \rho (\cdot )=P^{\beta }(\cdot |x) \text { for some } \beta \in \Xi \} \end{aligned}$$

is the set of feasible interventions in state \(x\in \mathbf {X}\). Observe that if an intervention is chosen in \(\mathbf {Y}_{0}\), this actually means that the controller has not intervened on the process through impulsive actions. For technical reasons, it appears necessary to introduce the set \(\mathbf {Y}^{*}\) of real interventions given by \(\mathbf {Y}^{*}=\bigcup _{k=1}^{\infty }\mathbf {Y}_k \). The associated sets of real randomized interventions are defined by

$$\begin{aligned} \mathcal {P}^{\mathbf {Y}^{*}}= & {} \{\gamma \in \mathcal {P}(\mathbf {Y}|\mathbf {X}): \gamma (\cdot |\cdot )=P^{\beta }(\cdot |\cdot ) \text { for some } \beta \in \Xi \\&\text { and }P^{\beta }(\mathbf {Y}^{*}|x)=1, \text{ for } \text{ any } x\in \mathbb {X}^{i}\} \end{aligned}$$

and

$$\begin{aligned} \mathcal {P}^{\mathbf {Y}^{*}}(x)=\{ \rho \in \mathcal {P}(\mathbf {Y}): \rho (\cdot )=P^{\beta }(\cdot |x) \text { for some } \beta \in \Xi \text { and }P^{\beta }(\mathbf {Y}^{*}|x)=1\} \end{aligned}$$

for \(x\in \mathbf {X}\). Note that \(\mathcal {P}^{\mathbf {Y}^{*}}(x)=\emptyset \) if \(x\notin \mathbb {X}^{i}\).
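To fix ideas, an element of \(\mathbf {Y}\) can be generated by running the MDP \(\mathcal {M}^{i}\) until the fictitious action \(\Delta \) is chosen. The sketch below stores the outcome as a flat tuple \((x_0,a_0,\ldots ,x_{k+1},\Delta )\) with the trailing \((\Delta ,\Delta )\) pairs truncated; the two-point state space, the kernel `Q_delta` and the stationary policy `policy` are all hypothetical, chosen only for illustration.

```python
import random

DELTA = "Delta"  # fictitious state / action

def Q_delta(x, a):
    # Hypothetical kernel Q_Delta: the impulsive action "shift" swaps
    # states 0 and 1; the fictitious action absorbs at DELTA.
    if a == DELTA:
        return DELTA
    return 1 - x

def policy(x, rng):
    # Hypothetical stationary policy: from a real state, apply the
    # impulse "shift" with probability 1/2, otherwise stop (action DELTA).
    if x == DELTA:
        return DELTA
    return "shift" if rng.random() < 0.5 else DELTA

def sample_intervention(x0, rng, max_steps=50):
    """Run M^i from x0 and return the truncated element of Y:
    (x0, a0, x1, a1, ..., x_{k+1}, DELTA)."""
    traj, x = [], x0
    for _ in range(max_steps):
        a = policy(x, rng)
        traj += [x, a]
        if a == DELTA:            # the impulses are over
            return tuple(traj)
        x = Q_delta(x, a)
    raise RuntimeError("no absorption within max_steps")
```

Under this policy the number of impulses is geometric, so the trajectory lands in some \(\mathbf {Y}_k\) with probability one, i.e. the policy belongs to \(\Xi \).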

Let us denote by \(\mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\) the set of stochastic kernels \(\pi \in \mathcal {P}(\mathbf {A}^{g}|\mathbf {X})\) such that \(\pi (\mathbf {A}^{g}(x)|x)=1\) for any \(x\in \mathbf {X}\), and by \(\mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\) the set of stochastic kernels \(\varphi \in \mathcal {P}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\) such that \(\varphi (\mathbf {A}^{i}_{\Delta }(x)|x)=1\) for any \(x\in \mathbf {X}_{\Delta }\). For \(\pi \in \mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\), \(\bar{q}^{\pi }\) denotes the kernel on \(\mathbf {X}\) given \(\mathbf {X}\) defined by \(\displaystyle \bar{q}^{\pi }(\Gamma |x)=\int _{\mathbf {A}^{g}} \bar{q}(\Gamma |x,a) \pi (da|x)\) and \(q^{\pi }\) denotes the transition rate on \(\mathbf {X}\) given \(\mathbf {X}\) defined by \(\displaystyle q^{\pi }(\Gamma |x)=\int _{\mathbf {A}^{g}} q(\Gamma |x,a) \pi (da|x)\) for any \(\Gamma \in \mathcal {B}(\mathbf {X})\) and \(x\in \mathbf {X}\). Similarly, for \(\varphi \in \mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\), \(Q_{\Delta }^{\varphi }\) denotes the stochastic kernel on \(\mathbf {X}_{\Delta }\) given \(\mathbf {X}_{\Delta }\) defined by \(\displaystyle Q_{\Delta }^{\varphi }(\Gamma |x)=\int _{\mathbf {A}^{i}_{\Delta }} Q_{\Delta }(\Gamma |x,a) \varphi (da|x)\) for any \(\Gamma \in \mathcal {B}(\mathbf {X}_{\Delta })\) and \(x\in \mathbf {X}_{\Delta }\).

At some point, we need to consider strategic measures for the model \(\mathcal {M}^{i}\) generated by arbitrary randomized stationary policies not necessarily belonging to \(\Xi \). Consequently, let us introduce the space

$$\begin{aligned} \mathbb {K}^{i}_{\Delta }=\{(x,a)\in \mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta }: a\in \mathbf {A}^{i}_{\Delta }(x)\} \in \mathcal {B}(\mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta }). \end{aligned}$$

Let \(\varphi \in \mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\). By a slight abuse of notation, we denote by \(\varphi \) the randomized stationary policy induced by the stochastic kernel \(\varphi \), and by \(P^{\varphi }\) the strategic measure for the model \(\mathcal {M}^{i}\) generated by the policy \(\varphi \). Clearly, \(P^{\varphi } \in \mathcal {P}((\mathbb {K}^{i}_{\Delta })^{\infty }|\mathbf {X})\).

Finally, we end this subsection by introducing a projection mapping that will be used repeatedly in the paper. If \(y\in \mathbf {Y}\), then there exists a unique \(k\in \mathbb {N}\) such that \(y=(x_0,a_0,\ldots ,x_k,a_k,x_{k+1},\Delta ,\Delta ,\ldots )\in \mathbf {Y}_k\). The \(\mathbf {X}\)-valued mapping \(\bar{x}\) on \(\mathbf {Y}\) is defined by

$$\begin{aligned} \bar{x}(y)=x_{k+1}. \end{aligned}$$
(1)
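If an element \(y\in \mathbf {Y}_k\) is stored as a flat tuple \((x_0,a_0,\ldots ,x_k,a_k,x_{k+1},\Delta ,\ldots )\), with states at even positions, then \(\bar{x}(y)\) is simply the last component preceding the fictitious symbols. A hypothetical helper (the tuple encoding is an assumption of this illustration, not part of the model):

```python
DELTA = "Delta"  # fictitious state / action

def x_bar(y):
    """Return x_{k+1}, the last genuine state in
    y = (x0, a0, ..., xk, ak, x_{k+1}, Delta, Delta, ...)."""
    last = None
    for i in range(0, len(y), 2):   # states sit at even positions
        if y[i] == DELTA:
            break
        last = y[i]
    return last
```

For instance, an intervention with no impulse, \(y=(x_0,\Delta ,\Delta ,\ldots )\), gives \(\bar{x}(y)=x_0\), consistent with \(k=0\) in (1) when no impulsive action is applied.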

2.2 Construction of the Process

Having introduced the parameters of the model, we are now in a position to construct the controlled Markov process. Let \(\mathbf {Y}_\infty =\mathbf {Y}\cup \{y_{\infty }\}\), where \(y_{\infty }\) is an artificial (isolated) point, and \(\Omega _{n}=\mathbf {Y}\times (\mathbb {R}_{+}^{*}\times \mathbf {Y})^n\times (\{\infty \}\times \{y_{\infty }\})^\infty \) for \(n\in \mathbb {N}\). The canonical space \(\Omega \) is defined as \(\Omega =\Big (\bigcup _{n=1}^\infty \Omega _{n}\Big )\cup \big ( \mathbf {Y}\times (\mathbb {R}_{+}^{*}\times \mathbf {Y})^\infty \big )\) and is endowed with its Borel \(\sigma \)-algebra, denoted by \(\mathcal {F}\). For notational convenience, \(\omega \in \Omega \) will be represented as

$$\begin{aligned} \omega =(y_0,\theta _1,y_1,\theta _2,y_2,\ldots ). \end{aligned}$$

Here, \(y_0=(x_0,\Delta ,\Delta ,\ldots )\) is the initial state of the controlled point process \(\xi \) with values in \(\mathbf {Y}\), defined below; \(\theta _1=0\), and \(y_1\in \mathbf {Y}\) is the result of the initial intervention. The components \(\theta _{n}>0\) for \(n\ge 2\) are the sojourn times; \(y_{n}\) denotes the result of an intervention (if \(y_{n}\in \mathbf {Y}^{*}\)) or corresponds to a natural jump (if \(y_{n}\in \mathbf {Y}\setminus \mathbf {Y}^{*}\)). In case \(\theta _{n}<\infty \) and \(\theta _{n+1}=\infty \), the trajectory has only n jumps, and we put \(y_m=y_{\infty }\) for all \(m\ge n+1\).

The path up to \(n\in \mathbb {N}\) is denoted by \(h_{n}=(y_0,\theta _1,y_1,\theta _2,y_2,\ldots ,\theta _{n},y_{n})\) and the collection of all such paths is denoted by \(\mathbf {H}_{n}\). For \(n\in \mathbb {N}\), introduce the mappings \(Y_{n}:~\Omega \rightarrow \mathbf {Y}_\infty \) by \(Y_{n}(\omega )=y_{n}\) and, for \(n\ge 2\), the mappings \(\Theta _{n}:~\Omega \rightarrow \overline{\mathbb {R}}_{+}^{*}\) by \(\Theta _{n}(\omega )=\theta _{n}\), with \(\Theta _{1}(\omega )=0\). The sequence \((T_{n})_{n\in \mathbb {N}^{*}}\) of \(\overline{\mathbb {R}}_{+}^{*}\)-valued mappings is defined on \(\Omega \) by \(T_{n}(\omega )=\sum _{i=1}^n\Theta _i(\omega )=\sum _{i=1}^n\theta _i\) and \(T_\infty (\omega )=\lim _{n\rightarrow \infty }T_{n}(\omega )\). For notational convenience, we denote by \(H_{n}=(Y_0,\Theta _1,Y_1,\ldots ,\Theta _{n},Y_{n})\) the n-term history process taking values in \(\mathbf {H}_{n}\) for \(n\in \mathbb {N}\).

The random measure \(\mu \) associated with \((\Theta _{n},Y_{n})_{n\in \mathbb {N}}\) is a measure defined on \(\mathbb {R}^{*}_{+}\times \mathbf {Y}\) by

$$\begin{aligned} \mu (\omega ;dt,dy)=\sum _{n\ge 2}I_{\{T_{n}(\omega )<\infty \}}\delta _{(T_{n}(\omega ),Y_{n}(\omega ))}(dt,dy). \end{aligned}$$

For notational convenience, the dependence on \(\omega \) will be omitted, and we write \(\mu (dt,dy)\) instead of \(\mu (\omega ;dt,dy)\). For \(t\in \mathbb {R}_{+}\), define \(\mathcal {F}_t=\sigma \{H_1\}\vee \sigma \{\mu (]0,s]\times B):~s\le t,B\in \mathcal {B}(\mathbf {Y})\}\). Finally, we define the controlled process \(\big \{\xi _t\big \}_{t\in \mathbb {R}_{+}}\):

$$\begin{aligned} \xi _t(\omega )=\left\{ \begin{array}{ll} Y_{n}(\omega ), &{} \quad {\text {if}}\,\,T_{n}\le t<T_{n+1} \quad {\text {for}}\,\,n\in \mathbb {N}^{*}; \\ y_{\infty }, &{} \quad {\text {if}}\,\,T_\infty \le t, \end{array}\right. \end{aligned}$$

and \(\xi _{0-}(\omega )=Y_0=y_0\) with \(y_0=(x_0,\Delta ,\Delta ,\ldots )\). Obviously, the controlled process \(\{\xi _t\}_{t\in \mathbb {R}_{+}}\) can be equivalently described by the sequence \((\Theta _{n},Y_{n})_{n\in \mathbb {N}}\). The sequence \((T_{n})_{n\in \mathbb {N}^{*}}\) describes the jump times of \(\{\xi _t\}_{t\in \mathbb {R}_{+}}\): \(T_{n}\) is the n-th jump moment. The state \(\xi _{t}\) is constant between the jump times \(T_{n}\) and \(T_{n+1}\) and represents the successive jumps of the process and the associated impulsive actions at time \(T_{n}\). This choice for the process \(\{\xi _t\}_{t\in \mathbb {R}_{+}}\) is motivated by the fact that we consider models with possibly multiple impulses at the same time moment. In such a framework, we extend the state space from \(\mathbf {X}\) to \(\mathbf {Y}\) in order to include the sequence of successive instantaneous jumps and corresponding impulsive actions. The extended state is of the form \((x_0,a_0,x_1,a_1,\ldots ,x_k,a_k,x_{k+1},\Delta ,\Delta ,\ldots )\), where \(x_{0}\) corresponds to a possible natural jump or to the value of the process just before the intervention. The triple \((x_{j},a_{j},x_{j+1})\) indicates that the impulsive action \(a_{j}\) has been applied to the system at state \(x_{j}\), leading to a new state \(x_{j+1}\) with distribution \(Q(\cdot |x_{j},a_{j})\). The special impulsive action \(\Delta \) means that the impulses are over, and the artificial state \(\Delta \) means the same. Observe that the last component different from \(\Delta \) in \(\xi _{t}\) corresponds to the last position of the process after a sequence of successive instantaneous jumps and impulses; it is given by \(\overline{x}(\xi _{t})\). It may appear odd to have included the impulsive actions (associated with the successive jumps) in the state process \(\xi _{t}\); however, this choice was made to simplify the model describing the dynamics of the process. Other approaches would have been possible: for example, a model where \(\{\xi _t\}_{t\in \mathbb {R}_{+}}\) consisted only of the successive jumps. This, however, would have led to a much more complicated mathematical description of the dynamics, with heavy notation.
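The piecewise-constant evaluation of \(\xi _t\) from the sequence \((\theta _{n},y_{n})\) can be sketched as follows; the helper is illustrative only and works with truncated finite lists, a trailing sojourn time of infinity encoding a trajectory with finitely many jumps.

```python
import itertools
import math

def xi(t, thetas, ys):
    """Piecewise-constant evaluation: xi_t = y_n on [T_n, T_{n+1}),
    where T_n = theta_1 + ... + theta_n and theta_1 = 0.
    A trailing theta = inf encodes a trajectory with only n jumps."""
    T = list(itertools.accumulate(thetas))  # jump times T_1, T_2, ...
    state = None                            # never returned: T_1 = 0 <= t
    for Tn, yn in zip(T, ys):
        if math.isinf(Tn) or Tn > t:
            break
        state = yn
    return state
```

Since \(\theta _1=0\), the first mark \(y_1\) is always picked up at \(t=0\), matching the convention that the initial intervention occurs at time zero.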

2.3 Admissible Strategies and Conditional Distribution of the Controlled Process

An admissible (randomized) control strategy is a sequence \(u=(u_{n})_{n\in \mathbb {N}}\) such that \(u_{0}\in \mathcal {P}^{\mathbf {Y}}(x_0)\) and, for any \(n\in \mathbb {N}^{*}\), \(u_{n}\) is given by

$$\begin{aligned} u_{n}=\big ( \psi _{n},\pi _{n},\gamma ^0_{n},\gamma ^1_{n} \big ), \end{aligned}$$

where:

  • \(\psi _{n}\) is a stochastic kernel on \(\overline{\mathbb {R}}^{*}_{+}\) given \(\mathbf {H}_{n}\) satisfying \(\psi _{n}(\cdot |h_{n})=\delta _{+\infty }(\cdot )\) for any \(h_{n}=(y_0,\theta _1,\ldots ,\theta _{n},y_{n})\in \mathbf {H}_{n}\) with \(\overline{x}(y_{n})\notin \mathbb {X}^{i}\);

  • \(\pi _{n}\) is a stochastic kernel on \(\mathbf {A}^{g}\) given \( \mathbf {H}_{n}\times \mathbb {R}_{+}^{*}\) satisfying \(\pi _{n}(\mathbf {A}^{g}(\overline{x}(y_{n}))|h_{n},t)=1\) for any \(t\in \mathbb {R}_{+}^{*}\) and \(h_{n}=(y_0,\theta _1,\ldots ,\theta _{n},y_{n})\in \mathbf {H}_{n}\);

  • \(\gamma ^0_{n}\) is a stochastic kernel on \(\mathbf {Y}\) given \( \mathbf {H}_{n}\times \mathbb {R}_{+}^{*}\times \mathbf {X}\) satisfying \(\gamma ^0_{n}(\cdot |h_{n},t,x) \in \mathcal {P}^{\mathbf {Y}}(x)\) for any \(h_{n}\in \mathbf {H}_{n}\), \(t\in \mathbb {R}_{+}^{*}\) and \(x\in \mathbf {X}\);

  • \(\gamma ^1_{n}\) is a stochastic kernel on \(\mathbf {Y}\) given \( \mathbf {H}_{n}\) satisfying \(\gamma ^1_{n}(\cdot |h_{n})\in \mathcal {P}^{\mathbf {Y}^{*}}(\overline{x}(y_{n}))\) for any \(h_{n}=(y_0,\theta _1,\ldots ,\theta _{n},y_{n})\in \mathbf {H}_{n}\) with \(\overline{x}(y_{n})\in \mathbb {X}^{i}\); if \(\overline{x}(y_n)\notin \mathbb {X}^i\), then \(\gamma ^1_n(\cdot |h_n)=\delta _{(\overline{x}(y_n),\Delta ,\Delta ,\ldots )}(\cdot )\).

The above conditions apply when \(y_n\ne y_\infty \); otherwise, all the values of \(\psi _n(\cdot | h_n)\), \(\pi _n(\cdot |h_n,t)\), \(\gamma ^0_n(\cdot |h_n,t,\cdot )\) and \(\gamma ^1_n(\cdot |h_n)\) may be arbitrary.

The set of admissible control strategies is denoted by \(\mathcal U\). In what follows, we use the notation \(\gamma _{n}=(\gamma ^0_{n},\gamma ^1_{n})\). An admissible strategy u with \(u_{n}=\big ( \psi _{n},\pi _{n},\gamma ^0_{n},\gamma ^1_{n} \big )\) for \(n\in \mathbb {N}^{*}\) is called randomized stationary if there exist \(\psi \in \bar{\mathbb {M}}_{+}^{*}(\mathbf {X})\), \(\pi \in \mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\), a stochastic kernel \(\gamma ^{0}\) on \(\mathbf {Y}\) given \(\mathbf {X}\) with \(\gamma ^0(\cdot |x) \in \mathcal {P}^{\mathbf {Y}}(x)\) for any \(x\in \mathbf {X}\), and a stochastic kernel \(\gamma ^{1}\) on \(\mathbf {Y}\) given \(\mathbf {X}\) with \(\gamma ^1(\cdot |x) \in \mathcal {P}^{\mathbf {Y}^{*}}(x)\) for any \(x\in \mathbb {X}^{i}\), such that \(u_{0}(\cdot )=\gamma ^{0}(\cdot |x_{0})\), \(\psi _n(\cdot |h_n)=\delta _{\psi (\overline{x}(y_n))}(\cdot )\), \(\pi _n(\cdot |h_n,t)=\pi (\cdot |\overline{x}(y_n))\), \(\gamma ^{0}_{n}(\cdot |h_n,t,x)=\gamma ^{0}(\cdot |x)\), and \(\gamma ^{1}_{n}(\cdot |h_n)=\gamma ^{1}(\cdot |\overline{x}(y_n))\) when \(\overline{x}(y_{n})\in \mathbb {X}^{i}\).

Roughly speaking, \(\psi _{n}\) represents the conditional distribution of the time of the next possible intervention after \(T_{n}\); \(\pi _{n}\) is the usual continuous control, influencing the jump intensity q between \(T_{n}\) and \(T_{n+1}\); \(\gamma ^{0}_{n}\) is the distribution of the next intervention if it is decided to intervene immediately after a natural jump; and \(\gamma ^{1}_{n}\) is the distribution of the next intervention if it is decided to intervene before a natural jump.

Suppose a strategy \(u=(u_{n})_{n\in \mathbb {N}}\in \mathcal {U}\) is fixed with \(u_{n}=\big ( \psi _{n},\pi _{n},\gamma ^0_{n},\gamma ^1_{n} \big )\) for \(n\in \mathbb {N}^{*}\). We introduce the intensity of the natural jumps

$$\begin{aligned} \lambda _{n}(\Gamma _x,h_{n},t)= & {} \int _{\mathbf {A}^{g}} \bar{q}(\Gamma _x | \overline{x}(y_{n}),a) \pi _{n}(da | h_{n},t), \end{aligned}$$

and the rate of the natural jumps

$$\begin{aligned} \Lambda _{n}(\Gamma _{x},h_{n},t)= & {} \int _{]0,t[} \lambda _{n}(\Gamma _{x},h_{n},s) ds \end{aligned}$$

for any \(n\in \mathbb {N}^{*}\), \(\Gamma _{x}\in \mathcal {B}(\mathbf {X})\), \(h_{n}=(y_0,\theta _1,y_1,\ldots ,\theta _{n},y_{n})\in \mathbf {H}_{n}\) and \(t\in \bar{\mathbb {R}}_{+}^{*}\). Now, for any \(n\in \mathbb {N}^{*}\), the stochastic kernel \(G_{n}\) on \(\mathbf {Y}_{\infty }\times \overline{\mathbb {R}}_{+}^{*}\) given \(\mathbf {H}_{n}\) is defined by

$$\begin{aligned} G_{n}(\{+\infty \}\times \{y_{\infty }\} | h_{n})= & {} \delta _{y_{n}} (\{y_{\infty }\}) + \delta _{y_{n}} (\mathbf {Y}) e^{-\Lambda _{n}(\mathbf {X},h_{n},+\infty )}\psi _{n}(\{+\infty \}|h_n) \end{aligned}$$
(2)

and

$$\begin{aligned} G_{n}(\Gamma _{\Theta } \times \Gamma _{y}| h_{n})&= \delta _{y_{n}} (\mathbf {Y}) \Big [ \gamma _{n}^{1}(\Gamma _{y}| h_{n}) \int _{\Gamma _{\Theta }} e^{-\Lambda _{n}(\mathbf {X},h_{n},t)} \psi _{n}(dt | h_{n}) \nonumber \\&\,\quad + \int _{\Gamma _{\Theta }} \int _{\mathbf {X}} \psi _{n}([t,\infty ] | h_{n}) \gamma _{n}^{0}(\Gamma _{y}| h_{n},t,x) \lambda _{n}(dx,h_{n},t) e^{-\Lambda _{n}(\mathbf {X},h_{n},t)} dt \Big ], \end{aligned}$$
(3)

and

$$\begin{aligned} G_{n}(\{+\infty \} \times \Gamma _{y}| h_{n}) = 0, \end{aligned}$$
(4)

where \(\Gamma _{y}\in \mathcal {B}(\mathbf {Y})\), \(\Gamma _{\Theta }\in \mathcal {B}(\mathbb {R}_{+}^{*})\) and \(h_{n}=(y_0,\theta _1,y_1,\ldots ,\theta _{n},y_{n})\in \mathbf {H}_{n}\). Note that the kernel \(\gamma ^1_{n}\) does not appear in the formula for \(G_{n}\) if \(\overline{x}(y_{n})\notin \mathbb {X}^{i}\).

Consider an admissible strategy \(u\in \mathcal {U}\) and an initial state \(x_{0}\in \mathbf {X}\). Recalling that \(\Omega =\bigcup _{n=1}^\infty \Omega _{n}\bigcup \big ( \mathbf {Y}\times (\mathbb {R}_{+}^{*}\times \mathbf {Y})^\infty \big )\) and that \(\mathcal {F}\) denotes its associated Borel \(\sigma \)-algebra, Theorem 3.6 in [17] (or Remark 3.43 in [18] or [19]) implies the existence of a probability \(\mathbb {P}^{u}_{x_{0}}\) on \((\Omega ,\mathcal {F})\) such that the restriction of \(\mathbb {P}^{u}_{x_{0}}\) to \((\Omega ,\mathcal {F}_{0})\) is given by

$$\begin{aligned} \mathbb {P}^{u}_{x_{0}} \big ( \{Y_{0}\}\times \{0\} \times \Gamma _y \times (\bar{\mathbb {R}}_{+}^{*}\times \mathbf {Y}_{\infty })^{\infty } \big )= & {} u_{0}(\Gamma _y|x_{0}) \end{aligned}$$
(5)

for any \(\Gamma _y\in \mathcal {B}(\mathbf {Y})\) and the positive random measure \(\nu \) defined on \(\mathbb {R}_{+}^{*}\times \mathbf {Y}\) by

$$\begin{aligned} \nu (dt,dy)= \sum _{n\in \mathbb {N}^{*}} \frac{G_{n}(dt-T_{n}, dy | H_{n})}{G_{n}([t-T_{n},+\infty ]\times \mathbf {Y}_{\infty } | H_{n})} I_{\{T_{n}< t \le T_{n+1}\}} \end{aligned}$$
(6)

is the predictable projection of \(\mu \) with respect to \(\mathbb {P}^{u}_{x_{0}}\).

Remark 2.1

Observe that \(\mathcal {F}_{T_{n}}\) is the \(\sigma \)-algebra generated by the random variable \(H_{n}\) for \(n\in \mathbb {N}^{*}\). The conditional distribution of \((Y_{n+1},\Theta _{n+1})\) given \(\mathcal {F}_{T_{n}}\) under \(\mathbb {P}^{u}_{x_{0}}\) is determined by \(G_{n}(\cdot | H_{n})\) and the conditional survival function of \(\Theta _{n+1}\) given \(\mathcal {F}_{T_{n}}\) under \(\mathbb {P}^{u}_{x_{0}}\) is given by \(G_{n}([t,+\infty ]\times \mathbf {Y}_{\infty }| H_{n})\).

3 Optimization Problem and Assumptions

The objective of this section is to introduce the infinite-horizon performance criteria we are concerned with. Next, we state our assumptions on the parameters of the model. Finally, under these hypotheses, we recall at the end of this section a technical result from [8] providing a decomposition of the predictable projection \(\nu \) of the measure \(\mu \) in terms of the distributions \((\gamma ^{0}_{n})_{n\in \mathbb {N}^{*}}\) and \((\gamma ^{1}_{n})_{n\in \mathbb {N}^{*}}\) of the next intervention.

We consider an optimization problem with \(p\in \mathbb {N}\) constraints where the performance and the constraint criteria are given in terms of infinite-horizon discounted functionals. In order to define these criteria, we need to introduce the cost rates \(\big (C^{g}_{j}\big )_{j\in \mathbb {N}_{p}}\) associated with continuous actions. For any \(j\in \mathbb {N}_{p}\), the real-valued mapping \(C^{g}_{j}\) is defined on \(\mathbb {K}^{g}\). The costs \(\big (C^{i}_{j}\big )_{j\in \mathbb {N}_{p}}\) associated with an intervention \(y=(x_{0},a_{0},x_{1},a_{1},\ldots )\in \mathbf {Y}\) are given by \(C^{i}_{j}(y)=\sum _{k\in \mathbb {N}} c^{i}_{j}(x_k,a_k)\), where for any \(j\in \mathbb {N}_{p}\), \(c^{i}_{j}\) is a non-negative real-valued mapping defined on \(\mathbb {K}^{i}_{\Delta }\) satisfying \(c^{i}_{j}(x,a)=0\) if \((x,a)\notin \mathbb {K}^{i}\). For any \((x,a)\in \mathbb {K}^{i}\) and \(j\in \mathbb {N}_{p}\), \(c^{i}_{j}(x,a)\) corresponds to the cost associated with a single jump at \(x\in \mathbf {X}\) resulting from the impulsive action \(a\in \mathbf {A}^{i}(x)\). The cost associated with a randomized intervention \(\rho \in \mathcal {P}^{\mathbf {Y}}(x)\) for \(x\in \mathbf {X}\) is given by \(\int _{\mathbf {Y}} C^{i}_{j}(y)\rho (dy|x)\) for any \(j\in \mathbb {N}_{p}\). Therefore, the infinite-horizon discounted performance criteria corresponding to an admissible control strategy \(u\in \mathcal{U}\) are defined by

$$\begin{aligned} \mathcal {V}_{j}(u,x_{0})&= \int _{\mathbf {Y}} C^{i}_{j}(y) u_{0}(dy|x_{0}) + \mathbb {E}^{u}_{x_{0}} \Bigg [ \int _{0}^{+\infty } e^{-\alpha s} \int _{\mathbf {A}^{g}} C^{g}_{j}(\overline{x}(\xi _{s-}),a) \pi (da |s) ds \Bigg ] \nonumber \\&\,\quad + \mathbb {E}^{u}_{x_{0}} \Bigg [ \int _{]0,\infty [\times {\mathbf {Y}}} e^{-\alpha s} C^{i}_{j}(y) \mu (ds,dy) \Bigg ], \end{aligned}$$
(7)

for any \(j\in \mathbb {N}_{p}\). In the previous expression, \(\alpha >0\) is the discount factor. Note that the performance criteria are well defined under Assumption A imposed below.
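For intuition only: in a finite-state model with no impulsive actions and a fixed stationary policy, criterion (7) reduces to \(V(x)=\mathbb{E}_x\int_0^\infty e^{-\alpha s}C^g(\xi_s)\,ds\), which solves the linear system \(\alpha V = c + QV\) for the rate matrix Q induced by the policy. The two-state generator and cost rates below are toy assumptions, not data from the paper.

```python
import numpy as np

# Hedged toy example: discounted cost of a fixed stationary policy in a
# two-state CTMDP without impulses.  V solves alpha*V = c + Q V, i.e.
# V = (alpha*I - Q)^{-1} c.  Q and c are illustrative assumptions.

alpha = 0.5
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])   # conservative rate matrix: rows sum to zero
c = np.array([1.0, 3.0])      # cost rates under the policy

V = np.linalg.solve(alpha * np.eye(2) - Q, c)
# Each V(x) lies between min(c)/alpha and max(c)/alpha, here between 2 and 6.
```

This is the standard resolvent formula for the discounted cost; the impulsive terms of (7) have no counterpart in this reduced setting.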

Definition 3.1

The constrained optimization problem consists in minimizing \(\mathcal {V}_{0}(u,x_{0})\) over the class of admissible strategies \(u\in \mathcal {U}\) satisfying \(\mathcal {V}_{j}(u,x_{0}) \le B_{j}\) for any \(j\in \mathbb {N}_{p}^{*}\), where \(x_{0}\) is the initial state and \((B_{j})_{j\in \mathbb {N}^{*}_{p}}\) are nonnegative real numbers representing the constraint bounds. The class of feasible strategies will be denoted by \(\displaystyle \mathcal {U}^{f} = \Big \{ u\in \mathcal {U}: \mathcal {V}_{j}(u,x_{0}) \le B_{j}, \text { for any } j\in \mathbb {N}_{p}^{*} \Big \}\).

Assumption A

There exists a constant \(K\in \mathbb {R}_{+}\) such that for any \(x\in \mathbf {X}\), \(a^g\in \mathbf {A}^{g}(x)\), \(a^i\in \mathbf {A}^{i}(x)\) and \(j\in \mathbb {N}_{p}\)

  1. (A1)

    \(\bar{q}(\mathbf {X}|x,a^g)\le K\).

  2. (A2)

    \(C^{g}_{j}(x,a^g)\ge 0\).

  3. (A3)

    \(c^{i}_{j}(x,a^i)\ge 0\).

Remark 3.1

It must be emphasized that Assumption (A2) can be replaced by the following apparently weaker condition \(C^g_j(x,a^g)\ge -K\).

The purpose of the next assumption is to rule out infinitely many simultaneous interventions. This is a classical hypothesis in the framework of impulsive control problems; see for example [6].

Assumption B

There exists a constant \(\underline{c}>0\) such that \(\sum _{j\in \mathbb {N}_{p}} c^{i}_{j}(x,a)\ge \underline{c}\) for any \((x,a)\in \mathbb {K}^{i}\).

Finally, let us recall the following technical result from [8, Lemma 3.1].

Lemma 3.2

The predictable projection of the random measure \(\mu \) is given by \(\nu =\nu _{0}+\nu _{1}\) where

$$\begin{aligned} \nu _{0}(\Gamma ,\Gamma _{y}) = \int _{\Gamma }\int _{\mathbf {A}^{g}}\int _\mathbf {X} \gamma ^{0}(\Gamma _{y}|x,s) \bar{q}(dx | \overline{x}(\xi _{s-}),a) \pi (da|s) ds, \end{aligned}$$
$$\begin{aligned} \nu _{1}(\Gamma ,\Gamma _{y}) = \sum _{n\in \mathbb {N}^{*}} \gamma _{n}^{1}(\Gamma _{y}| H_{n}) \int _{\Gamma } I_{\{T_{n}< s \le T_{n+1}\}} \frac{\psi _{n}(ds-T_{n} | H_{n})}{\psi _{n}([s-T_{n},+\infty ] | H_{n})}, \end{aligned}$$

with \(\displaystyle \gamma ^{0}(dy|x,t)=\sum _{n\in \mathbb {N}^{*}} I_{\{T_{n}< t \le T_{n+1}\}} \gamma ^{0}_{n}(dy|H_{n},t-T_n,x)\) and \(\displaystyle \pi (da|t)=\sum _{n\in \mathbb {N}^{*}} I_{\{T_{n}< t \le T_{n+1}\}} \pi _{n}(da|H_{n},t-T_{n})\), for any \(\Gamma \in \mathcal {B}(\mathbb {R}_{+}^{*})\), \(\Gamma _{y}\in \mathcal {B}(\mathbf {Y})\) and \(t\in \mathbb {R}_{+}\).

4 Occupation Measures and Their Properties

In this section, we introduce the definition of an occupation measure \(\eta _{u}\) induced by a control strategy \(u\in \mathcal {U}\) (see Definition 4.2). The objective of this section is twofold. First, it is shown in Theorem 4.6 that for any control strategy \(u\in \mathcal {U}\), the corresponding occupation measure \(\eta _{u}\) satisfies a linear equation depending on the kernels q and Q; a measure satisfying such an equation will be called admissible. Second, from any admissible measure \(\eta \) one can construct a randomized stationary control strategy \(u\in \mathcal {U}\) such that the restrictions of \(\eta \) and \(\eta _{u}\) to \(\mathbb {K}^{g}\) are equal and the restriction of \(\eta _{u}\) to \(\mathbb {K}^{i}\) is smaller than the restriction of \(\eta \) to \(\mathbb {K}^{i}\) (see Theorem 4.9). These two results will play an important role in the next section to establish a connection between the constrained optimal control problem and the linear program given in Definition 5.1. Their proofs rely on auxiliary results, which are deferred to Appendices 1 and 2 to streamline the presentation.

Definition 4.1

To any \(\gamma \in \mathcal {M}(\mathbf {Y})\), we associate the measure \(\widetilde{\gamma } \in \mathcal {M}(\mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta })\) defined by \(\displaystyle \widetilde{\gamma }(\Gamma ) = \sum _{j=1}^{\infty } \gamma \big (\{\mathbf {y}\in \mathbf {Y}: \mathbf {y}_{j}\in \Gamma \} \big )\), for any \(\Gamma \in \mathcal {B}(\mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta })\) and where \(\mathbf {y}_{j}\) is the jth coordinate of \(\mathbf {y}\in \mathbf {Y}\subset (\mathbb {K}^{i}_{\Delta })^{\infty }\). Similarly, if R is a stochastic kernel on \(\mathbf {Y}\) given a Borel space \(\mathbf {Z}\) then \(\widetilde{R}\) is a kernel on \(\mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta }\) given \(\mathbf {Z}\) defined by \(\displaystyle \widetilde{R}(\Gamma |z) = \sum _{j=1}^{\infty } R\big (\{\mathbf {y}\in \mathbf {Y}: \mathbf {y}_{j}\in \Gamma \} | z\big )\), for any \(z\in \mathbf {Z}\) and \(\Gamma \in \mathcal {B}(\mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta })\).

These definitions will be naturally extended to probability measures defined on \((\mathbb {K}^{i}_{\Delta })^{\infty }\) and to stochastic kernels on \((\mathbb {K}^{i}_{\Delta })^{\infty }\) given a Borel space \(\mathbf {Z}\). Now, we introduce the definition of an occupation measure \(\eta _{u}\) induced by an admissible control strategy u.

Definition 4.2

For a strategy \(u=(u_{n})_{n\in \mathbb {N}}\in \mathcal {U}\), let us introduce the measures \(\eta _{u}^{g}\) (respectively, \(\mu _{u}^{i}\) and \(\eta _{u}^{i}\)) defined on \(\mathbf {X}\times \mathbf {A}^{g}\) (respectively, \(\mathbf {Y}\) and \(\mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta }\)) by

$$\begin{aligned} \eta _{u}^{g}(dx,da)= & {} \alpha \mathbb {E}^{u}_{x_{0}} \Bigg [ \int _{0}^{T_{\infty }} e^{-\alpha s} \delta _{\bar{x}(\xi _{s-})}(dx) \pi (da|s) ds \Bigg ], \end{aligned}$$
(8)
$$\begin{aligned} \mu _{u}^{i}(dy)= & {} \mathbb {E}^{u}_{x_{0}} \Bigg [ \int _{]0,T_{\infty }[} e^{-\alpha s} \mu (ds,dy) \Bigg ] + u_{0}(dy|x_{0}), \end{aligned}$$
(9)

and

$$\begin{aligned} \eta _{u}^{i}(dx,da)= & {} \widetilde{\mu }_{u}^{i}(dx,da). \end{aligned}$$
(10)

Note that the measure \(\eta _{u}^{g}\) is supported on \(\mathbb {K}^{g}\) and is clearly finite for any \(u\in \mathcal {U}\), while the measure \(\eta _{u}^{i}\) is supported on \(\mathbb {K}^{i}_{\Delta }\). Then, the measure \(\eta _{u}\) defined on \(\mathbf {X}\times \mathbf {A}\) by

$$\begin{aligned} \eta _{u}(\Gamma )=\eta _{u}^{g}(\Gamma \mathop {\cap }\mathbb {K}^{g})+\eta _{u}^{i}(\Gamma \mathop {\cap }\mathbb {K}^{i}), \end{aligned}$$
(11)

for any \(\Gamma \in \mathcal {B}(\mathbf {X}\times \mathbf {A})\) is called the occupation measure of the controlled process induced by the control strategy u. Clearly, the measure \(\eta _{u}\) is supported on \(\mathbb {K}^{g}\mathop {\cup }\mathbb {K}^{i}\).

The infinite-horizon discounted performance criteria corresponding to an admissible control strategy \(u\in \mathcal{U}\) satisfying \(\mathbb {P}^{u}_{x_{0}}(T_{\infty }=+\infty )=1\) can be written in terms of the measure \(\eta _{u}\) as follows

$$\begin{aligned} \mathcal {V}_{j}(u,x_{0})&= \eta _{u}(C_{j}) \end{aligned}$$
(12)

with \(C_{j}(x,a)=\frac{1}{\alpha }C^{g}_{j}(x,a)I_{\mathbb {K}^{g}}(x,a)+c^{i}_{j}(x,a) I_{\mathbb {K}^{i}}(x,a)\), for \(j\in \mathbb {N}_{p}\).
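To illustrate identity (12) in the simplest setting, consider again a finite model with no impulsive actions: the marginal of the occupation measure (8) is then \(\widehat\eta^{\,g} = \alpha\, e_{x_0}^{\top}(\alpha I - Q)^{-1}\), and pairing it with \(C_0 = C^g_0/\alpha\) recovers the discounted cost. The numerical data below are toy assumptions.

```python
import numpy as np

# Hedged toy check of (12): with no impulses, eta_hat(x) is alpha times the
# expected discounted time spent in state x, and
#   eta_u(C_0) = sum_x eta_hat(x) * c(x)/alpha
# equals the discounted cost V(x0).  Q, c, x0 are illustrative assumptions.

alpha = 0.5
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
c = np.array([1.0, 3.0])
x0 = 0

e0 = np.zeros(2)
e0[x0] = 1.0
eta_hat = alpha * np.linalg.solve((alpha * np.eye(2) - Q).T, e0)  # occupation marginal
V = np.linalg.solve(alpha * np.eye(2) - Q, c)                     # discounted cost
lhs = eta_hat @ (c / alpha)                                       # eta_u(C_0)
```

The normalization \(\widehat\eta^{\,g}(\mathbf{X})=1\), used later in the proof of Theorem 4.9, also holds in this sketch.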

In order to prove the first main result of this section, Theorem 4.6, we need two intermediate results: Propositions 4.3 and 4.4. For clarity of exposition, their proofs are presented in Appendix 1. Roughly speaking, these two technical results establish links between the measures \(\eta ^{g}_{u}\) and \(\eta ^{i}_{u}\).

Proposition 4.3

Consider a fixed strategy \(u=(u_{n})_{n\in \mathbb {N}}\in \mathcal {U}\) with \(u_{n}=\big ( \psi _{n},\pi _{n},\gamma ^0_{n},\gamma ^1_{n} \big )\) for \(n\in \mathbb {N}^{*}\), satisfying \(\eta ^{i}_{u}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta })<\infty \). Then, for any \(\Gamma \in \mathcal {B}(\mathbf {X})\),

$$\begin{aligned} \eta _{u}^{g}(\Gamma \times \mathbf {A}^{g})&= \eta _{u}^{i}(\Gamma \times \{\Delta \}) - \frac{1}{\alpha } \int _{\mathbf {X}\times \mathbf {A}^{g}} I_{\Gamma } (x) \bar{q}(\mathbf {X}| x,a) \eta _{u}^{g}(dx,da) \nonumber \\&\,- \mathbb {E}^{u}_{x_{0}} \Bigg [ \sum _{n\in \mathbb {N}^{*}} e^{-\alpha T_{n}} I_{\Gamma }(\overline{x}(Y_{n})) \int _{]0,\infty [} e^{-\alpha s} \psi _{n}(ds | H_{n}) \Bigg ]. \end{aligned}$$
(13)

Proof

See Appendix 1. \(\square \)

Proposition 4.4

Consider a fixed strategy \(u=(u_{n})_{n\in \mathbb {N}}\in \mathcal {U}\) with \(u_{n}=\big ( \psi _{n},\pi _{n},\gamma ^0_{n},\gamma ^1_{n} \big )\) for \(n\in \mathbb {N}^{*}\), satisfying \(\mathbb {P}^{u}_{x_{0}}(T_{\infty }=+\infty )=1\). Then, for any \(\Gamma \in \mathcal {B}(\mathbf {X}_{\Delta })\)

$$\begin{aligned} \eta _{u}^{i}(\Gamma \times \mathbf {A}^{i}_{\Delta })&= \delta _{x_{0}}(\Gamma ) + \int _{\mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta }} Q_{\Delta }(\Gamma | z,b) \eta _{u}^{i}(dz,db)\nonumber \\&\quad + \frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} \bar{q}(\Gamma \mathop {\cap }\mathbf {X} | x,a) \eta _{u}^{g}(dx,da) \nonumber \\&\,+\mathbb {E}^{u}_{x_{0}} \Bigg [ \sum _{n\in \mathbb {N}^{*}} e^{-\alpha T_{n}} I_{\Gamma }(\overline{x}(Y_{n})) \int _{]0,\infty [} e^{-\alpha s} \psi _{n}(ds | H_{n}) \Bigg ]. \end{aligned}$$
(14)

Proof

See Appendix 1. \(\square \)

Remark 4.1

Observe that in the previous result, if we consider \(\Gamma \) in \(\mathcal {B}(\mathbf {X})\) then Eq. (14) becomes

$$\begin{aligned} \eta _{u}^{i}(\Gamma \times \mathbf {A}^{i}_{\Delta })&= \delta _{x_{0}}(\Gamma ) + \int _{\mathbf {X}\times \mathbf {A}^{i}} Q_{\Delta }(\Gamma | z,b) \eta _{u}^{i}(dz,db)\nonumber \\&\quad + \frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} \bar{q}(\Gamma | x,a) \eta _{u}^{g}(dx,da)\nonumber \\&\,+\mathbb {E}^{u}_{x_{0}} \Bigg [ \sum _{n\in \mathbb {N}^{*}} e^{-\alpha T_{n}} I_{\Gamma }(\overline{x}(Y_{n})) \int _{]0,\infty [} e^{-\alpha s} \psi _{n}(ds | H_{n}) \Bigg ], \end{aligned}$$
(15)

since \(Q_{\Delta }(\Gamma | x,a)=0\) for any \((x,a)\notin \mathbb {K}^{i}\).

Definition 4.5

A measure \(\rho \in \mathcal {M}_{f}(\mathbf {X}\times \mathbf {A})\) is said to be admissible if \(\rho \) is concentrated on \(\mathbb {K}^{g}\mathop {\cup }\mathbb {K}^{i}\) and

$$\begin{aligned} \rho (\Gamma \times \mathbf {A})&= \delta _{x_{0}}(\Gamma ) + \int _{\mathbf {X}\times \mathbf {A}^{i}} Q_{\Delta }(\Gamma | z,b) \rho (dz,db) + \frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} q(\Gamma | x,a) \rho (dx,da), \end{aligned}$$
(16)

for any \(\Gamma \in \mathcal {B}(\mathbf {X})\).

The next result shows that any occupation measure is admissible.

Theorem 4.6

Consider a fixed strategy \(u=(u_{n})_{n\in \mathbb {N}}\in \mathcal {U}\) with \(u_{n}=\big ( \psi _{n},\pi _{n},\gamma ^0_{n},\gamma ^1_{n} \big )\) for \(n\in \mathbb {N}^{*}\), satisfying \(\eta ^{i}_{u}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta })<\infty \). Then, the measure \(\eta _{u}\) is admissible.

Proof

First of all, recall that \(\eta _{u}\) is finite and notice that \(q(\Gamma |x,a)=\bar{q}(\Gamma |x,a)-I_{\Gamma }(x)\bar{q}(\mathbf {X}|x,a)\) for any \(\Gamma \in \mathcal {B}(\mathbf {X})\) and \((x,a)\in \mathbb {K}^{g}\). Now consider \(\Gamma \in \mathcal {B}(\mathbf {X})\); then adding Eqs. (13) and (15) yields

$$\begin{aligned} \eta _{u}(\Gamma \times \mathbf {A})= & {} \eta _{u}^{g}(\Gamma \times \mathbf {A}^{g}) + \eta _{u}^{i}(\Gamma \times \mathbf {A}^{i}) \nonumber \\= & {} \delta _{x_{0}}(\Gamma ) + \int _{\mathbf {X}\times \mathbf {A}^{i}} Q_{\Delta }(\Gamma | z,b) \eta _{u}^{i}(dz,db) + \frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} q(\Gamma | x,a) \eta _{u}^{g}(dx,da) \nonumber \\= & {} \delta _{x_{0}}(\Gamma ) + \int _{\mathbf {X}\times \mathbf {A}^{i}} Q_{\Delta }(\Gamma | z,b) \eta _{u}(dz,db) + \frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} q(\Gamma | x,a) \eta _{u}(dx,da), \end{aligned}$$

showing the result. \(\square \)
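As a sanity check of Theorem 4.6 in a toy setting (finite state space, no impulsive actions, with a rate matrix assumed purely for illustration), the admissibility equation (16) reduces to \(\rho = \delta_{x_0} + \frac{1}{\alpha}Q^{\top}\rho\) in matrix form, and the occupation-measure marginal solves it:

```python
import numpy as np

# Hedged sketch: with no impulsive actions, Eq. (16) reads, in matrix form,
#   rho = e_{x0} + (1/alpha) * Q^T rho,
# and the occupation-measure marginal rho = alpha * ((alpha*I - Q)^T)^{-1} e_{x0}
# solves it, in line with Theorem 4.6.  Q, alpha, x0 are toy assumptions.

alpha = 0.5
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
x0 = 0
e0 = np.zeros(2)
e0[x0] = 1.0

rho = alpha * np.linalg.solve((alpha * np.eye(2) - Q).T, e0)
residual = rho - (e0 + (Q.T @ rho) / alpha)   # should vanish
```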

Let \(\varphi \) be a randomized stationary policy for the model \(\mathcal {M}^{i}\). Introduce the set

$$\begin{aligned} \mathcal {S}_{\varphi }= \big \{ x\in \mathbf {X}: \widetilde{P}^{\varphi }(\mathbf {X} \times \{\Delta \}|x)=1 \big \}, \end{aligned}$$
(17)

and the stochastic kernel \(R^{\varphi }\) on \(\mathbf {Y}\) given \(\mathbf {X}\) by

$$\begin{aligned} R^{\varphi }(dy| x)= P^{\varphi }(dy|x) I_{\mathcal {S}_{\varphi }}(x)+\delta _{(x,\Delta ,\Delta ,\ldots )}(dy) I_{\mathcal {S}_{\varphi }^{c}}(x). \end{aligned}$$
(18)

We introduce now a special class of randomized stationary control strategies.

Definition 4.7

Consider \(\pi \in \mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\) and \(\varphi \in \mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\). The strategy \(u^{\pi ,\varphi }=(u^{\pi ,\varphi }_{n})_{n\in \mathbb {N}}\) is defined by \(u^{\pi ,\varphi }_{0}(\cdot )=R^{\varphi }(\cdot |x_{0})\) and by \(u^{\pi ,\varphi } _{n}=\big ( \psi _{n},\pi _{n},\gamma ^0_{n},\gamma ^1_{n} \big )\) for \(n\in \mathbb {N}^{*}\), where for any \(x\in \mathbf {X}\), \(t\in \mathbb {R}_{+}\) and \(h_{n}=(y_0,\theta _1,y_1,\ldots ,\theta _{n},y_{n})\in \mathbf {H}_{n}\): \(\psi _{n}(\cdot |h_{n})=\delta _{+\infty }(\cdot )\), \(\pi _{n}(\cdot |h_{n},t)=\pi (\cdot |\bar{x}(y_{n}))\), and \(\gamma ^0_{n}(\cdot |h_{n},t,x)=R^{\varphi }(\cdot |x)\). Finally, \(\gamma ^{1}_{n}\) is defined by \(\gamma ^{1}_{n}(\cdot |h_{n})=\gamma ^{1}(\cdot |\overline{x}(y_{n}))\) where \(\gamma ^{1}\) is an arbitrary stochastic kernel on \(\mathbf {Y}\) given \( \mathbf {X}\) satisfying \(\gamma ^{1}(\cdot |x)\in \mathcal {P}^{\mathbf {Y}^{*}}(x)\) for \(x\in \mathbb {X}^{i}\) and \(\gamma ^{1}(\cdot |x)=\delta _{(x,\Delta ,\Delta ,\ldots )}(\cdot )\) otherwise.

Clearly, the strategy so defined satisfies \(u^{\pi ,\varphi } \in \mathcal {U}\). The following proposition provides important properties of the measures \(\eta _{u^{\pi ,\varphi }}^{g}\) and \(\eta _{u^{\pi ,\varphi }}^{i}\) that will be needed in the proof of Theorem 4.9.

Proposition 4.8

Consider \(\pi \in \mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\) and \(\varphi \in \mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\). Then

$$\begin{aligned} \eta _{u^{\pi ,\varphi }}^{i}(dx,da) =&\Big [ \delta _{x_{0}}+ \frac{1}{\alpha } \widehat{\eta }_{u^{\pi ,\varphi }}^{g} \bar{q}^{\pi } \Big ] \widetilde{R}^{\varphi }(dx,da), \end{aligned}$$
(19)

and for any \(\Gamma \in \mathcal {B}(\mathbf {X})\)

$$\begin{aligned} \widehat{\eta }_{u^{\pi ,\varphi }}^{g} (\Gamma )=&\frac{1}{\alpha } \widehat{\eta }_{u^{\pi ,\varphi }}^{g} r^{\pi }_{\varphi }(\Gamma ) +\widetilde{R}^{\varphi } (\Gamma \times \{\Delta \}|x_{0}), \end{aligned}$$
(20)

where \(\widehat{\eta }_{u^{\pi ,\varphi }}^{g}(dx)\) denotes \(\eta _{u^{\pi ,\varphi }}^{g}(dx, \mathbf {A}^{g})\) and the transition rate \(r_{\varphi }^{\pi }\) on \(\mathbf {X}\) given \(\mathbf {X}\) is defined by

$$\begin{aligned} r_{\varphi }^{\pi }(\Gamma | x)&= \bar{q}^{\pi } \widetilde{R}^{\varphi } (\Gamma \times \{\Delta \}) - I_{\Gamma }(x) \bar{q}^{\pi }(\mathbf {X}|x). \end{aligned}$$
(21)

Proof

See Appendix 2. \(\square \)

The following theorem is the second main result of this section. Roughly speaking, it can be seen as a converse of Theorem 4.6. In particular, it shows that an admissible measure \(\eta \) is not necessarily an occupation measure, but one can construct from \(\eta \) a randomized stationary control strategy u such that the corresponding occupation measure \(\eta _{u}\) is smaller than \(\eta \) (see Eq. 26).

Theorem 4.9

Let \(\eta \) be an admissible measure. Let us define the measure \(\eta ^{g}\) on \(\mathbf {X}\times \mathbf {A}^{g}\) by

$$\begin{aligned} \eta ^{g}(\Gamma ) = \eta (\Gamma \mathop {\cap }\mathbb {K}^{g}), \end{aligned}$$
(22)

for any \(\Gamma \in \mathcal {B}(\mathbf {X}\times \mathbf {A}^{g})\) and the measure \(\eta ^{i}\) on \(\mathbf {X}\times \mathbf {A}^{i}_{\Delta }\) by

$$\begin{aligned} \eta ^{i}(\Gamma ) = \eta (\Gamma \mathop {\cap }\mathbb {K}^{i}), \end{aligned}$$
(23)

for any \(\Gamma \in \mathcal {B}(\mathbf {X}\times \mathbf {A}^{i})\) and

$$\begin{aligned} \eta ^{i}(\Gamma \times \{\Delta \}) = \frac{1}{\alpha } \int _{\mathbf {X}\times \mathbf {A}^{g}} I_{\Gamma } (x) \bar{q}(\mathbf {X}| x,a) \eta ^{g}(dx,da)+\eta ^{g}(\Gamma \times \mathbf {A}^{g}), \end{aligned}$$

for any \(\Gamma \in \mathcal {B}(\mathbf {X})\). Then, there exist a stochastic kernel \(\pi \in \mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\) satisfying

$$\begin{aligned} \eta ^{g}(\Gamma )=\int _{\Gamma } \pi (da|x) \eta ^{g}(dx,\mathbf {A}^{g}), \end{aligned}$$
(24)

for any \(\Gamma \in \mathcal {B}(\mathbf {X}\times \mathbf {A}^{g})\) and a stochastic kernel \(\varphi \in \mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\) satisfying

$$\begin{aligned} \eta ^{i}(\Gamma )=\int _{\Gamma } \varphi (da|x) \eta ^{i}(dx,\mathbf {A}^{i}_{\Delta }) \end{aligned}$$
(25)

for any \(\Gamma \in \mathcal {B}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta })\), with \(\varphi (\{\Delta \}|\Delta )=1\). Moreover,

$$\begin{aligned} \eta ^{g}_{u^{\pi ,\varphi }}=\eta ^{g} \text{ and } \eta ^{i}_{u^{\pi ,\varphi }}\le \eta ^{i} \end{aligned}$$
(26)

and

$$\begin{aligned} \eta _{u^{\pi ,\varphi }}^{i} = \Big [ \delta _{x_{0}}+ \frac{1}{\alpha } \eta _{u^{\pi ,\varphi }}^{g} \bar{q} \Big ] \widetilde{P}^{\varphi }. \end{aligned}$$
(27)

Proof

First, notice that \(\eta ^{i}\in \mathcal {M}_{f}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta })\) and since \(Q_{\Delta }(\mathbf {X}|x,a)=1\) for any \((x,a)\in \mathbb {K}^{i}\) and \(q(\mathbf {X}|x,a)=0\) for any \((x,a)\in \mathbb {K}^{g}\), we obtain from Eq. (16) that \(\eta ^{g}\in \mathcal {P}(\mathbf {X}\times \mathbf {A}^{g})\). Consequently, Proposition D.8 in [13, p. 184] ensures the existence of \(\pi \in \mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\) and \(\varphi \in \mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\) satisfying respectively Eqs. (24) and (25). Consider an arbitrary set \(\Gamma \in \mathcal {B}(\mathbf {X})\). For notational convenience, let us denote \(\eta ^{i}(\Gamma \times \mathbf {A}^{i}_{\Delta })\) by \( \widehat{\eta }^{i}(\Gamma )\) and \(\eta ^{g}(\Gamma \times \mathbf {A}^{g})\) by \(\widehat{\eta }^{g}(\Gamma )\). Observe that \(Q^{\varphi }_{\Delta }(\Gamma | \{\Delta \} )=0\). By using the definitions of \(\eta ^{g}\) and \(\eta ^{i}\) and Eq. (16) which is satisfied by \(\eta \), it is easy to see that

$$\begin{aligned} \widehat{\eta }^{g}(\Gamma )= & {} \eta ^{i}(\Gamma \times \{\Delta \}) - \frac{1}{\alpha } \int _{\mathbf {X}\times \mathbf {A}^{g}} I_{\Gamma } (x) \bar{q}^{\pi }(\mathbf {X}| x,a) \widehat{\eta }^{g}(dx),\end{aligned}$$
(28)
$$\begin{aligned} \widehat{\eta }^{i}(\Gamma )= & {} \delta _{x_{0}}(\Gamma ) + \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi }(\Gamma ) + \int _{\mathbf {X}} Q^{\varphi }_{\Delta }(\Gamma | z) \widehat{\eta }^{i}(dz) \end{aligned}$$
(29)

for any \(\Gamma \in \mathcal {B}(\mathbf {X})\) by recalling that \(q(\Gamma |x,a)=\bar{q}(\Gamma |x,a)-I_{\Gamma }(x)\bar{q}(\mathbf {X}|x,a)\). Consequently,

$$\begin{aligned} \widehat{\eta }^{i}(\Gamma )&= \delta _{x_{0}}(\Gamma ) + \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi }(\Gamma ) + \widehat{\eta }^{i} I_{X}Q^{\varphi }_{\Delta }(\Gamma ). \end{aligned}$$
(30)

Moreover, \(I_{X}Q^{\varphi }_{\Delta } I_{X}=Q^{\varphi }_{\Delta } I_{X}\) and therefore, by iterating Eq. (30) we have

$$\begin{aligned} \int _{\mathbf {X}} \sum _{k=0}^{n} \big (Q^{\varphi }_{\Delta }\big )^{k}(\Gamma | x) \Big [ \delta _{x_{0}}(dx)+ \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi }(dx) \Big ] \le \widehat{\eta }^{i}(\Gamma ) \end{aligned}$$
(31)

for any \(n\in \mathbb {N}\). Observe that \(\displaystyle \sum _{k=0}^{\infty } \big (Q^{\varphi }_{\Delta }\big )^{k}(\Gamma |x)=\widetilde{P}^{\varphi }(\Gamma \times \mathbf {A}^{i}_{\Delta }|x)\). For notational convenience, let us denote by \(\rho \) the measure \(\Big [ \delta _{x_{0}}+ \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi } \Big ] \widetilde{P}^{\varphi } \in \mathcal {M}(\mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta })\), which is concentrated on \(\mathbb {K}^{i}_{\Delta }\). Applying the monotone convergence theorem and letting n tend to infinity in Eq. (31), it follows that

$$\begin{aligned} \rho (\Gamma \times \mathbf {A}^{i}_{\Delta }) \le \widehat{\eta }^{i}(\Gamma ). \end{aligned}$$
(32)

By using the fact that \(\widetilde{P}^{\varphi }(dz,db | x)=\varphi (db|z) \widetilde{P}^{\varphi }(dz\times \mathbf {A}^{i}_{\Delta }|x)\) for any \(x\in \mathbf {X}\), it follows that

$$\begin{aligned} \rho (dz,db) = \varphi (db|z) \rho (dz, \mathbf {A}^{i}_{\Delta }). \end{aligned}$$
(33)

Combining Eqs. (25) and (32)–(33) yields

$$\begin{aligned} \rho \le \eta ^{i}. \end{aligned}$$
(34)

Clearly, the measure \(\rho \) belongs to \(\mathcal {M}_{f}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta })\) and satisfies

$$\begin{aligned} \rho (\Gamma \times \mathbf {A}^{i}_{\Delta })&= \delta _{x_{0}}(\Gamma ) + \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi }(\Gamma ) + \int _{\mathbf {X}} Q^{\varphi }_{\Delta }(\Gamma | z) \rho (dz,\mathbf {A}^{i}_{\Delta }) \end{aligned}$$

and so, recalling (33)

$$\begin{aligned} \rho (\Gamma \times \mathbf {A}^{i}_{\Delta })&= \delta _{x_{0}}(\Gamma ) + \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi }(\Gamma ) + \int _{\mathbf {X}\times \mathbf {A}^{i}} Q_{\Delta }(\Gamma | z,b) \rho (dz,db) . \end{aligned}$$

Since \(Q_{\Delta }(\mathbf {X} | z,b)=I_{\mathbb {K}^{i}}(z,b)\) and \(\rho (\mathbb {K}^{i})= \rho (\mathbf {X}\times \mathbf {A}^{i})\), we obtain from the previous equation

$$\begin{aligned} \rho (\mathbf {X}\times \{\Delta \}) + \rho (\mathbf {X}\times \mathbf {A}^{i})&= \delta _{x_{0}}(\mathbf {X}) + \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi }(\mathbf {X}) + \rho (\mathbf {X}\times \mathbf {A}^{i}) \end{aligned}$$

showing

$$\begin{aligned} \rho (\mathbf {X}\times \{\Delta \})&= \delta _{x_{0}}(\mathbf {X}) + \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi }(\mathbf {X}). \end{aligned}$$

Moreover, from the definition of \(\rho \) we obtain that

$$\begin{aligned} \int _{\mathbf {X}} \big [ \widetilde{P}^{\varphi }(\mathbf {X}\times \{\Delta \} | x) -1 \big ] \delta _{x_{0}}(dx) + \int _{\mathbf {X}} \big [ \widetilde{P}^{\varphi }(\mathbf {X}\times \{\Delta \} | x) - 1 \big ] \widehat{\eta }^{g} \bar{q}^{\pi } (dx)=0. \end{aligned}$$

Lemma B.1 yields that \(\widetilde{P}^{\varphi }(\mathbf {X}\times \{\Delta \} | x)\le 1\) for any \(x\in \mathbf {X}\). Consequently,

$$\begin{aligned} \big [\delta _{x_{0}}+\widehat{\eta }^{g} \bar{q}^{\pi }\big ] \big (\mathcal {S}_{\varphi }^{c}\big )=0, \end{aligned}$$
(35)

where the set \(\mathcal {S}_{\varphi }\) has been introduced in (17). Now, combining Eqs. (29) and (34), it follows

$$\begin{aligned} \widehat{\eta }^{g}(\Gamma )&\ge \widetilde{P}^{\varphi }(\Gamma \times \{\Delta \} | x_{0}) + \frac{1}{\alpha } \widehat{\eta }^{g} p^{\pi }_{\varphi }(\Gamma ), \end{aligned}$$

where

$$\begin{aligned} p_{\varphi }^{\pi }(\Gamma | x)&= \bar{q}^{\pi } \widetilde{P}^{\varphi } (\Gamma \times \{\Delta \} | x) - I_{\Gamma }(x) \bar{q}^{\pi }(\mathbf {X}|x). \end{aligned}$$
(36)

Now, from Eq. (35) we have \(\widetilde{P}^{\varphi }(\mathbf {X}\times \{\Delta \} | x_{0})=1\) and

$$\begin{aligned} \widehat{\eta }^{g} p^{\pi }_{\varphi }(\mathbf {X})=\int _{\mathbf {X}} \widetilde{P}^{\varphi } (\mathbf {X}\times \{\Delta \} | x) \widehat{\eta }^{g} \bar{q}^{\pi } (dx) - \widehat{\eta }^{g} \bar{q}^{\pi } (\mathbf {X}) =0. \end{aligned}$$

Therefore, since \(\widehat{\eta }^{g}(\mathbf {X})=1\) it follows that the positive measure \(\gamma \) defined on \(\mathbf {X}\) by

$$\begin{aligned} \gamma = \widehat{\eta }^{g} -\widetilde{P}^{\varphi }(\cdot , \{\Delta \} | x_{0}) -\frac{1}{\alpha } \widehat{\eta }^{g} p^{\pi }_{\varphi }, \end{aligned}$$

satisfies \(\gamma (\mathbf {X})=0\) and so

$$\begin{aligned} \widehat{\eta }^{g}(\Gamma )&= \widetilde{P}^{\varphi }(\Gamma \times \{\Delta \} | x_{0}) + \frac{1}{\alpha } \widehat{\eta }^{g} p^{\pi }_{\varphi }(\Gamma ). \end{aligned}$$
(37)

Moreover, from the definition of \(R^{\varphi }\) in Eq. (18) we have

$$\begin{aligned} \widetilde{R}^{\varphi }(dz,db|x)=\widetilde{P}^{\varphi }(dz,db|x)I_{\mathcal {S}_{\varphi }}(x) +\delta _{(x,\Delta )}(dz,db) I_{\mathcal {S}_{\varphi }^{c}}(x), \end{aligned}$$

on \(\mathbf {X}\times \mathbf {A}^{i}_{\Delta }\). Now, the previous equation and (35) yield

$$\begin{aligned} \Big [ \delta _{x_{0}}+ \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi } \Big ] \widetilde{R}^{\varphi }=\Big [ \delta _{x_{0}}+ \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi } \Big ] \widetilde{P}^{\varphi }. \end{aligned}$$
(38)

Combining Eqs. (36)–(38), we obtain that

$$\begin{aligned} \widehat{\eta }^{g}(\Gamma )&= \widetilde{R}^{\varphi }(\Gamma \times \{\Delta \} | x_{0}) + \frac{1}{\alpha } \widehat{\eta }^{g} r^{\pi }_{\varphi }(\Gamma ), \end{aligned}$$

where \(r^{\pi }_{\varphi }\) has been defined in Eq. (21). Clearly, \(r^{\pi }_{\varphi }\) is a transition rate on \(\mathbf {X}\) given \(\mathbf {X}\) and so, applying the uniqueness result of item (d) of Theorem 3.2 in [24], we obtain from (20) that \(\widehat{\eta }_{u^{\pi ,\varphi }}^{g}=\widehat{\eta }^{g}\). From the definitions of \(u^{\pi ,\varphi }\) and \(\eta _{u}^{g}\) it is easy to show that \(\eta _{u^{\pi ,\varphi }}^{g}(dx,da) = \widehat{\eta }_{u^{\pi ,\varphi }}^{g}(dx)\pi (da|x)\). Therefore,

$$\begin{aligned} \eta _{u^{\pi ,\varphi }}^{g}(dx,da) = \widehat{\eta }_{u^{\pi ,\varphi }}^{g}(dx)\pi (da|x)=\widehat{\eta }^{g}(dx)\pi (da|x)=\eta ^{g}(dx,da), \end{aligned}$$
(39)

showing the first part of the result.

Now, combining the previous equation, (34) and (38) we get that

$$\begin{aligned} \Big [ \delta _{x_{0}}+ \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi } \Big ] \widetilde{R}^{\varphi }=\Big [ \delta _{x_{0}}+ \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi } \Big ] \widetilde{P}^{\varphi }=\rho \le \eta ^{i}, \end{aligned}$$

and so, by using Eqs. (19) and (39), we have that \(\eta _{u^{\pi ,\varphi }}^{i} = \Big [ \delta _{x_{0}}+ \frac{1}{\alpha } \widehat{\eta }_{u^{\pi ,\varphi }}^{g} \bar{q}^{\pi } \Big ] \widetilde{P}^{\varphi } \le \eta ^{i}\) giving the last assertions. \(\square \)

Finally, the following corollary shows that, although \(\eta ^i\) and \(\eta ^i_{u^{\pi ,\varphi }}\) need not be equal, there exists a subset of \(\mathbf {X}\times \mathbf {A}^{i}\) on which they coincide.

Corollary 4.10

Under the conditions of Theorem 4.9, there exists a set \(D\in \mathcal{B}(\mathbf {X})\) such that for any \(z\in D\), \(Q^\varphi _\Delta (D|z)=1\); \(\widehat{\eta }^g_{u^{\pi ,\varphi }}(D)=\widehat{\eta }^i_{u^{\pi ,\varphi }}(D)=0\), and \(\eta ^i(\Gamma )= \eta ^i_{u^{\pi ,\varphi }}(\Gamma )\) for any \(\Gamma \in \mathcal{B}\big ( (\mathbf {X} \setminus D)\times \mathbf {A}^{i}_\Delta \big )\).

Proof

From Proposition 4.4 the pair of measures \((\eta _{u^{\pi ,\varphi }}^{g},\eta _{u^{\pi ,\varphi }}^{i})\) satisfies

$$\begin{aligned} \widehat{\eta }_{u^{\pi ,\varphi }}^{i}(\Gamma )&= \delta _{x_{0}}(\Gamma )+\frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} \bar{q}(\Gamma | x,a) \eta _{u^{\pi ,\varphi }}^{g}(dx,da) + \int _{\mathbf {X}} Q^\varphi _{\Delta }(\Gamma | z) \widehat{\eta }_{u^{\pi ,\varphi }}^{i}(dz), \end{aligned}$$
(40)

since for \(u^{\pi ,\varphi }\) we have \(\psi _{n}(\cdot |h_{n})=\delta _{+\infty }(\cdot )\) for any \(h_{n}=(y_0,\theta _1,\ldots ,\theta _{n},y_{n})\in \mathbf {H}_{n}\). Moreover, according to Eq. (29), we have

$$\begin{aligned} \widehat{\eta }^{i}(\Gamma ) = \delta _{x_{0}}(\Gamma )+\frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} \bar{q}(\Gamma | x,a) \eta ^{g}(dx,da) + \int _{\mathbf {X}} Q^\varphi _{\Delta }(\Gamma | z) \widehat{\eta }^{i}(dz) \end{aligned}$$

for any \(\Gamma \in \mathcal{B}(\mathbf {X} )\). Therefore, the measure \(\gamma \) defined on \(\mathbf {X}\) by \(\gamma =\widehat{\eta }^i-\widehat{\eta }_{u^{\pi ,\varphi }}^{i}\) satisfies the following equation

$$\begin{aligned} \gamma (\Gamma )&= \int _{\mathbf {X}} Q^\varphi _{\Delta }(\Gamma | z) \gamma (dz) \text{ for } \text{ any } \Gamma \in \mathcal{B}(\mathbf {X} ) \end{aligned}$$
(41)

since \(\eta _{u^{\pi ,\varphi }}^{g}=\eta ^g\). Define the sequence of sets \((\mathbf {X}_{n})_{n\in \mathbb {N}}\) by \(\mathbf {X}_{0}=\mathbf {X}\) and \(\mathbf {X}_{n+1}=\{z\in \mathbf {X}_{n}:\ Q^\varphi _{\Delta }(\mathbf {X}_{n} | z)=1\}\) for \(n\in \mathbb {N}\). This sequence satisfies \(\gamma (\mathbf {X}_n\setminus \mathbf {X}_{n+1}) =0\) and \( Q^\varphi _{\Delta }(\mathbf {X}_n | z)=1\) for any \(z\in \mathbf {X}_{n+1}\), and

$$\begin{aligned} \gamma (\mathbf {X}_n)= & {} \int _{\mathbf {X}_n} Q^\varphi _{\Delta }(\mathbf {X}_n | z) \gamma (dz), \end{aligned}$$
(42)

for any \(n\in \mathbb {N}\). Indeed, Eq. (42) clearly holds for \(n=0\) by using (41). Therefore, we have \(\gamma (z\in \mathbf {X}_0:\ Q^\varphi _{\Delta }(\mathbf {X}_0 | z)<1)=0\) implying that \(\gamma (\mathbf {X}_{0}\setminus \mathbf {X}_{1}) =0\). Moreover, by definition of \(\mathbf {X}_{1}\) it is straightforward to see that \(Q^\varphi _{\Delta }(\mathbf {X}_{0} | z)=1\) for any \(z\in \mathbf {X}_{1}\). Suppose the decreasing family of sets \((\mathbf {X}_{j})_{j\in \mathbb {N}_{n}}\) satisfies the above equations. From Eq. (42) we have \(\gamma (z\in \mathbf {X}_{n}:\ Q^\varphi _{\Delta }(\mathbf {X}_{n} | z)<1)=0\) showing that \(\gamma (\mathbf {X}_{n}\setminus \mathbf {X}_{n+1})=0\). Then by using Eq. (41), we obtain \(\displaystyle \gamma (\mathbf {X}_{n+1}) = \int _{\mathbf {X}_{n+1}} Q^\varphi _{\Delta }(\mathbf {X}_{n+1} | z) \gamma (dz)\). Finally, by definition of \(\mathbf {X}_{n+1}\), we have \(Q^\varphi _{\Delta }(\mathbf {X}_{n} | z)=1\) for any \(z\in \mathbf {X}_{n+1}\). Let us introduce the set \(D\subset \mathbf {X}\) defined by \(\displaystyle D=\bigcap _{j=0}^\infty \mathbf {X}_{j}\). Then \(\gamma (\mathbf {X}\setminus D)=\sum _{j=0}^\infty \gamma (\mathbf {X}_{j}\setminus \mathbf {X}_{j+1})=0\). Consequently, for any \(\Gamma \in \mathcal{B}\big ( (\mathbf {X} \setminus D) \big )\) we have \(\widehat{\eta }^{i}(\Gamma )= \widehat{\eta }^{i}_{u^{\pi ,\varphi }}(\Gamma )\), so the measures \(\eta ^i\) and \(\eta _{u^{\pi ,\varphi }}^{i}\) coincide on \((\mathbf {X}\setminus D)\times \mathbf {A}^i_\Delta \) because \(\eta ^i(dx,da)=\widehat{\eta }^{i}(dx)\varphi (da | x)\) and \(\eta _{u^{\pi ,\varphi }}^i(dx,da)=\widehat{\eta }_{u^{\pi ,\varphi }}^{i}(dx)\varphi (da | x)\) due to (27).

Now, observe that for any \(z\in D\) and \(j\in \mathbb {N}\), \(Q^\varphi _{\Delta }(\mathbf {X}_{j} |z)=1\) implying \(Q^\varphi _{\Delta }(D |z)=1\) and so, choosing \(\Gamma =D\) in Eq. (40), we have

$$\begin{aligned} \widehat{\eta }_{u^{\pi ,\varphi }}^{i}(D)&= \delta _{x_{0}}(D)+\frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} \bar{q}(D| x,a) \eta _{u^{\pi ,\varphi }}^{g}(dx,da) +\widehat{\eta }_{u^{\pi ,\varphi }}^{i}(D)\\&\quad + \int _{\mathbf {X}\setminus D} Q^\varphi _{\Delta }(D | z) \widehat{\eta }_{u^{\pi ,\varphi }}^{i}(dz) \end{aligned}$$

leading to \(\displaystyle \delta _{x_{0}}(\Gamma )+\frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} \bar{q}(\Gamma | x,a) \eta _{u^{\pi ,\varphi }}^{g}(dx,da)+ \int _{\mathbf {X}\setminus D} Q^\varphi _{\Delta }(\Gamma | z) \widehat{\eta }_{u^{\pi ,\varphi }}^{i}(dz)=0\), for any \(\Gamma \in \mathcal {B}(D)\). Consequently, \(\delta _{x_{0}}(D)+\frac{1}{\alpha } \eta _{u^{\pi ,\varphi }}^{g} \bar{q}(D)=0\) and \(\big [\delta _{x_{0}}+\frac{1}{\alpha } \eta _{u^{\pi ,\varphi }}^{g} \bar{q} \big ] \sum _{k\in \mathbb {N}} \big ( Q^\varphi _{\Delta } \big )^{k} I_{\mathbf {X}\setminus D} Q^\varphi _{\Delta }(D)=0\), where we have used (27) to get the last equation. Combining the two previous equations, it can be shown easily by induction that \(\big [\delta _{x_{0}}+\frac{1}{\alpha } \eta _{u^{\pi ,\varphi }}^{g} \bar{q} \big ] \big ( Q^\varphi _{\Delta } \big )^{k} (D)=0\) for any \(k\in \mathbb {N}\) and so, recalling (27), it follows that \(\widehat{\eta }_{u^{\pi ,\varphi }}^{i}(D)=0\).

Now, from Eq. (40) and by using the fact that \(\delta _{x_{0}}(D)=\widehat{\eta }_{u^{\pi ,\varphi }}^{i}(D)=0\), we obtain

$$\begin{aligned} \int _{\mathbf {X} \times \mathbf {A}^{i}} Q_{\Delta }(D | z,b) \eta _{u^{\pi ,\varphi }}^{i}(dz,db) +\frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} \bar{q}(D| x,a) \eta _{u^{\pi ,\varphi }}^{g}(dx,da)=0. \end{aligned}$$

Therefore, Eq. (16) yields \(\displaystyle \widehat{\eta }_{u^{\pi ,\varphi }}^{g}(D\times \mathbf {A}^{g})=\frac{1}{\alpha } \int _{D \times \mathbf {A}^{g}} q(\{x\}| x,a) \eta _{u^{\pi ,\varphi }}^{g}(dx,da)\). Since \(\eta _{u^{\pi ,\varphi }}^{g}\ge 0\) and \(q(\{x\}| x,a)\le 0\), we conclude that \(\widehat{\eta }_{u^{\pi ,\varphi }}^{g}(D\times \mathbf {A}^{g})=0\). \(\square \)
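The construction of the set D in the proof above is algorithmic when the state space is finite: the sets \(\mathbf {X}_{n}\) decrease to the largest set that the kernel never leaves. A minimal sketch, with a hypothetical substochastic matrix `Q` playing the role of \(Q^\varphi _{\Delta }\):

```python
import numpy as np

def largest_absorbing_set(Q, tol=1e-12):
    """Iterate X_0 = X, X_{n+1} = {z in X_n : Q(X_n | z) = 1}
    until stabilization; returns the indicator of D = the
    intersection of the X_n.  Q[z, y] is the (sub)stochastic
    kernel Q(dy | z) on a finite state space."""
    current = np.ones(Q.shape[0], dtype=bool)      # X_0 = X
    while True:
        mass = Q[:, current].sum(axis=1)           # mass z sends into X_n
        nxt = current & (mass >= 1.0 - tol)        # keep z only if mass = 1
        if np.array_equal(nxt, current):           # fixed point reached: D
            return current
        current = nxt

# Hypothetical 4-state kernel: {0, 1} is a closed class, while
# states 2 and 3 eventually leak mass outside.
Q = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.2, 0.8, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 0.3, 0.5]])
print(largest_absorbing_set(Q))   # -> [ True  True False False]
```

On a finite space the iteration stabilizes after at most \(|\mathbf {X}|\) steps, since the sets are decreasing; the proof handles the general Borel case, where D is only reached as a countable intersection.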

5 The LP Formulation

The main objective of this section is to show the existence of an optimal strategy for the constrained optimal control problem introduced in Definition 3.1. The idea behind this existence result can be decomposed into two steps. First, we introduce the (primal) linear program \(\mathbb {PLP}\) associated with the optimization problem under consideration (see Definition 5.1) and show in Theorem 5.2 that there exists an optimal strategy for the constrained optimal control problem if and only if the linear program \(\mathbb {PLP}\) is solvable. The second step consists of showing that \(\mathbb {PLP}\) is solvable (Theorem 5.5); this is done by introducing an auxiliary linear program (whose properties are studied in Proposition 5.4) and by considering an additional set of hypotheses (see Assumption C). Combining these two steps, it is straightforward to obtain the existence of an optimal randomized control strategy for the constrained optimal control problem (see Theorem 5.6). An easy consequence of this result is that the class of strategies introduced in Definition 4.7 is a sufficient set. Assumption C is a standard hypothesis in the literature on CTMDPs (see for example [24]) and mainly requires that the parameters of the system be lower semicontinuous and that the transition rate be weakly continuous. As an independent result, the dual linear program associated with \(\mathbb {PLP}\) is briefly discussed at the end of this section.

Definition 5.1

The constrained linear program, labeled \(\mathbb {PLP}\), is defined as follows: minimize \(\eta (C_{0})\) subject to \(\eta \in \mathbb {L}\), where \(\mathbb {L}\) is the set of measures \(\eta \) in \(\mathcal {M}_{f}(\mathbf {X}\times \mathbf {A})\) that are admissible in the sense of Definition 4.5 and satisfy \( \eta (C_{j})\le B_{j}\) for any \(j\in \mathbb {N}_{p}^{*}\).

The nonnegative real number \(\inf _{\eta \in \mathbb {L}} \eta (C_{0})\) is called the value of the constrained linear program \(\mathbb {PLP}\). Below, we say that \(\mathbb {PLP}\) is solvable if there is \(\eta ^*\in \mathbb {L}\) such that \(\eta ^*(C_0)=\inf _{\eta \in \mathbb {L}} \eta (C_{0})\).

Theorem 5.2

The values of the constrained control problem and of the linear program \(\mathbb {PLP}\) coincide:

$$\begin{aligned} \inf _{\eta \in \mathbb {L}} \eta (C_{0}) =\inf _{u\in \mathcal{U}^f} \mathcal {V}_0(u,x_0). \end{aligned}$$

Moreover, assume the existence of \(\bar{u}\in \mathcal {U}^{f}\) such that \(\mathcal {V}_{0}(\bar{u},x_{0})<\infty \). Then the following assertions hold:

  1. (i)

    The measure \(\eta _{\bar{u}}\) as defined in Eq. (11) for the strategy \(\bar{u}\) belongs to \(\mathbb {L}\) and \(\eta _{\bar{u}}(C_{0})<\infty \).

  2. (ii)

    The constrained optimal control problem as introduced in Definition 3.1 is solvable if and only if the linear program \(\mathbb {PLP}\) is solvable.

  3. (iii)

    If the constrained optimal control problem is solvable then there exists a randomized stationary optimal control strategy where the interventions only occur after the natural jumps and with a possible intervention at the initial moment.

Proof

If \(\eta \in \mathbb {L}\) then it is admissible in the sense of Definition 4.5. From Theorem 4.9, the control strategy \(u^{\pi ,\varphi }\in \mathcal {U}\), where \(\pi \) (respectively, \(\varphi \)) has been defined in Eq. (24) (respectively, (25)), satisfies \(\eta ^{g}_{u^{\pi ,\varphi }}=\eta ^{g}\) and \(\eta ^{i}_{u^{\pi ,\varphi }}\le \eta ^{i}\) with \(\eta ^{g}\) (respectively, \(\eta ^{i}\)) given in (22) (respectively, (23)). Therefore, for any \(j\in \mathbb {N}_{p}^{*}\), \(\mathcal {V}_{j}(u^{\pi ,\varphi },x_{0})\le \eta (C_{j})\le B_{j}\) and \(\mathcal {V}_{0}(u^{\pi ,\varphi },x_{0})\le \eta (C_{0})\). In particular, this first statement implies that, on the one hand, \(\mathbb {L}\) is empty if \(\mathcal {U}^f\) is empty and, on the other hand, if the set \(\mathcal {U}^f\) is not empty with \(\mathcal {V}_0(u,x_0)=\infty \) for any \(u\in \mathcal {U}^f\), then either \(\mathbb {L}\) is empty or \(\displaystyle \inf _{\eta \in \mathbb {L}} \eta (C_{0})=\infty \), showing that in any case \(\displaystyle \inf _{\eta \in \mathbb {L}} \eta (C_{0}) =\inf _{u\in \mathcal{U}^f} \mathcal {V}_0(u,x_0)=\infty .\)

Now, if \(\mathcal {V}_{0}(u,x_{0})<\infty \) for \(u\in \mathcal {U}\) and \(\mathcal {V}_{j}(u,x_{0})\le B_{j}\) for any \(j\in \mathbb {N}_{p}^{*}\), then, recalling Assumptions (A2), (A3) and B, we necessarily have \(\eta _{u}^{i}(\mathbf {X}\times \mathbf {A}^{i})<\infty \). From Lemma A.1, this gives \(\eta _{u}^{i}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta })<\infty \). Consequently, according to Theorem 4.6, for any admissible control strategy \(u\in \mathcal {U}\) such that \(\mathcal {V}_{0}(u,x_{0})<\infty \) and \(\mathcal {V}_{j}(u,x_{0}) \le B_{j}\) for any \(j\in \mathbb {N}_{p}^{*}\), there exists a finite measure \(\eta _{u}\in \mathcal {M}_{f}(\mathbf {X}\times \mathbf {A})\) concentrated on \(\mathbb {K}^{g}\mathop {\cup }\mathbb {K}^{i}\) satisfying Eq. (16) with \(\eta _{u}(C_{j})=\mathcal {V}_{j}(u,x_{0})\) for any \(j\in \mathbb {N}_{p}\), implying that \(\eta _{u}\in \mathbb {L}\) and \(\eta _{u}(C_{0})<\infty \).

Combining these two statements, we easily obtain the result. \(\square \)

To study the solvability of the linear program \(\mathbb {PLP}\), we need to introduce an auxiliary linear program. First, let us define \(\mathbf {X}_{\sigma }=\mathbf {X}\mathop {\cup }\{\sigma \}\), where \(\sigma \) is an isolated point, and the kernel \(\widetilde{Q}\) on \(\mathbf {X}_{\sigma }\) given \(\mathbb {K}^{g}\mathop {\cup }\mathbb {K}^{i}\mathop {\cup }(\{\sigma \}\times \mathbf {A})\) by

$$\begin{aligned} \widetilde{Q}(\Gamma |x,a)= & {} \Big \{ \frac{1}{K+\alpha }\Big [ \bar{q}(\Gamma \mathop {\cap }\mathbf {X}|x,a)+\delta _{x}(\Gamma \mathop {\cap }\mathbf {X})\big (K-\bar{q}(\mathbf {X}|x,a)\big )\Big ] \nonumber \\&+\frac{\alpha }{K+\alpha }\delta _{\sigma }(\Gamma ) \Big \} I_{\mathbb {K}^{g}}(x,a) +Q(\Gamma \mathop {\cap }\mathbf {X}|x,a) I_{\mathbb {K}^{i}}(x,a) + \delta _{\sigma }(\Gamma ) I_{\{\sigma \}}(x).\nonumber \\ \end{aligned}$$
(43)

Definition 5.3

The auxiliary linear program, labeled \(\mathbb {LP}'\), is defined as follows: minimize \(\rho (C'_{0})\) subject to \(\rho \in \mathbb {L}'\), where \(\mathbb {L}'\) is the set of measures \(\rho \) in \(\mathcal {M}(\mathbf {X}_{\sigma }\times \mathbf {A})\) concentrated on \(\mathbb {K}^{g}\mathop {\cup }\mathbb {K}^{i}\mathop {\cup }(\{\sigma \}\times \mathbf {A})\) such that for any \(\Gamma \in \mathcal {B}(\mathbf {X}_{\sigma })\)

$$\begin{aligned} \rho (\Gamma \times \mathbf {A})&= \delta _{x_{0}}(\Gamma ) + \int _{\mathbf {X}_{\mathbf {\sigma }}\times \mathbf {A}} \widetilde{Q}(\Gamma | z,b) \rho (dz,db), \end{aligned}$$
(44)
$$\begin{aligned} \rho (C'_{j})\le B_{j}, \text { for any } j\in \mathbb {N}_{p}^{*}, \end{aligned}$$
$$\begin{aligned} \rho (\mathbb {K}^{g})\le \frac{K+\alpha }{\alpha }, \end{aligned}$$

with \(C'_{j}(x,a)=\frac{1}{K+\alpha }C^{g}_{j}(x,a)I_{\mathbb {K}^{g}}(x,a)+c^{i}_{j}(x,a) I_{\mathbb {K}^{i}}(x,a)\) for \(j\in \mathbb {N}_{p}\).
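For intuition, the continuous-action part of \(\widetilde{Q}\) in Eq. (43) is a uniformization: jump rates are rescaled by the bound K, a self-loop absorbs the remainder \(K-\bar{q}(\mathbf {X}|x,a)\), and the discount \(\alpha \) becomes a killing probability \(\alpha /(K+\alpha )\) of jumping to \(\sigma \). A finite-state sketch (all numerical data hypothetical) checking that \(\widetilde{Q}\) is indeed a stochastic kernel:

```python
import numpy as np

def uniformized_kernel(qbar, K, alpha):
    """Continuous-action part of the kernel of Eq. (43) on X ∪ {σ}.
    qbar[x, y]: jump rate x -> y (zero diagonal), with row sums <= K.
    The last column of the result is the mass sent to the point σ."""
    n = qbar.shape[0]
    total = qbar.sum(axis=1)                  # \bar{q}(X | x, a)
    P = np.zeros((n, n + 1))
    P[:, :n] = (qbar + np.diag(K - total)) / (K + alpha)
    P[:, n] = alpha / (K + alpha)             # killing: jump to σ
    return P

qbar = np.array([[0.0, 2.0, 1.0],
                 [0.5, 0.0, 0.5],
                 [0.0, 3.0, 0.0]])
P = uniformized_kernel(qbar, K=3.0, alpha=0.1)
print(P.sum(axis=1))    # every row sums to 1: a stochastic kernel
```

Only the \(I_{\mathbb {K}^{g}}\) branch of (43) is sketched here; on \(\mathbb {K}^{i}\) the kernel \(\widetilde{Q}\) simply coincides with Q restricted to \(\mathbf {X}\), and \(\sigma \) is absorbing.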

Proposition 5.4

The following assertions hold:

  1. i)

    If the measure \(\eta \) belongs to \(\mathbb {L}\) then the measure \(\rho \), defined on \(\mathbf {X}_{\sigma }\times \mathbf {A}\) by

    $$\begin{aligned} \rho (\Gamma )=\eta (\Gamma \mathop {\cap }\mathbb {K}^{i})+\frac{K+\alpha }{\alpha }\eta (\Gamma \mathop {\cap }\mathbb {K}^{g}), \end{aligned}$$

    for any \(\Gamma \in \mathcal {B}(\mathbf {X}\times \mathbf {A})\) and \(\rho (\{\sigma \}\times \Gamma )=+\infty \) for any \(\Gamma \in \mathcal {B}(\mathbf {A})\), belongs to \( \mathbb {L}'\). Moreover, \(\rho (C'_{j})=\eta (C_{j})\) for any \(j\in \mathbb {N}_{p}\).

  2. ii)

    If the measure \(\rho \in \mathbb {L}'\) satisfies \(\rho (C'_{0})<\infty \) then the measure \(\eta \) defined on \(\mathbf {X}\times \mathbf {A}\) by

    $$\begin{aligned} \eta (\Gamma )=\rho (\Gamma \mathop {\cap }\mathbb {K}^{i})+\frac{\alpha }{K+\alpha }\rho (\Gamma \mathop {\cap }\mathbb {K}^{g}) \end{aligned}$$

    belongs to \(\mathbb {L}\). Moreover, \(\eta (C_{j})=\rho (C'_{j})\) for any \(j\in \mathbb {N}_{p}\).

Proof

Regarding item i), it is clear that \(\rho \) so defined is a positive measure on \(\mathbf {X}_{\sigma }\times \mathbf {A}\) concentrated on \(\mathbb {K}^{g}\mathop {\cup }\mathbb {K}^{i}\mathop {\cup }(\{\sigma \}\times \mathbf {A})\). Moreover, a straightforward calculation shows that \(\rho \) satisfies Eq. (44) with \(\rho (C'_{j})=\eta (C_{j})\) for any \(j\in \mathbb {N}_{p}\), giving the first part of the result. For item ii), we have \(\eta (\mathbb {K}^{g})=\frac{\alpha }{K+\alpha }\rho (\mathbb {K}^{g})\le 1\). Moreover, combining Assumptions (A2), (A3) and B, it follows that \(\underline{c}\eta (\mathbb {K}^{i})\le \sum _{j\in \mathbb {N}_{p}^{*}} B_{j}+\rho (C'_{0})\). Consequently, \(\eta \in \mathcal {M}_{f}(\mathbf {X}\times \mathbf {A})\) and \(\eta \) is clearly concentrated on \(\mathbb {K}^{g}\mathop {\cup }\mathbb {K}^{i}\). Now, simple algebraic manipulations yield that the measure \(\eta \) satisfies Eq. (16) with \(\eta (C_{j})=\rho (C'_{j})\) for any \(j\in \mathbb {N}_{p}\), showing the last part of the result. \(\square \)

Assumption C   

(C1):

The transition rate q is weakly continuous, that is, for any \(h\in \mathbb {C}_{b}(\mathbf {X})\), \(qh \in \mathbb {C}_{b}(\mathbb {K}^{g})\).

(C2):

If \(\mathbb {K}^{i}\ne \emptyset \), the transition kernel Q is weakly continuous, that is, \(Qh \in \mathbb {C}_{b}(\mathbb {K}^{i})\) for any \(h\in \mathbb {C}_{b}(\mathbf {X})\).

(C3):

For any \(j\in \mathbb {N}_{p}\), the function \(c^{i}_{j}\) is lower semicontinuous on \(\mathbb {K}^{i}\).

(C4):

For any \(j\in \mathbb {N}_{p}\), the function \(C^{g}_{j}\) is lower semicontinuous on \(\mathbb {K}^{g}\).

(C5):

For any \(x\in \mathbf {X}\), \(\mathbf {A}^{g}(x)\) and \(\mathbf {A}^{i}(x)\) are compact sets.

(C6):

The multifunction \(\Psi : \mathbf {X}\rightarrow \mathbf {A}\) defined by \(\Psi (x)=\mathbf {A}^{g}(x)\mathop {\cup }\mathbf {A}^{i}(x)\) is upper semicontinuous.

Theorem 5.5

Suppose that Assumption C holds and that there exists \(\bar{\eta }\in \mathbb {L}\) such that \(\bar{\eta }(C_{0})<\infty \). Then the linear program \(\mathbb {PLP}\) is solvable.

Proof

Introduce \(\mathbb {K}=\mathbb {K}^{g}\mathop {\cup }\mathbb {K}^{i}\mathop {\cup }(\{\sigma \}\times \mathbf {A})\). According to Assumption C, it is easy to see that the stochastic kernel \(\widetilde{Q}\) on \(\mathbf {X}_{\sigma }\) given \(\mathbb {K}\) is weakly continuous and that the mappings \(C'_{j}\) are nonnegative and lower semicontinuous on \(\mathbb {K}\) for any \(j\in \mathbb {N}_{p}\). Consequently, by using Theorem 7.2 in [5], the auxiliary linear program \(\mathbb {LP}'\) is solvable because, according to item i) of Proposition 5.4, there is \(\bar{\rho }\in \mathbb {L}'\) such that \(\bar{\rho }(C'_0)=\bar{\eta }(C_0)<\infty \). Observe that Theorem 7.2 in [5] is an extension of Theorem 4.1 in [7] to the case where the action space depends on the state variable. Let \(\rho ^{*}\) be a measure in \(\mathbb {L}'\) such that \(\inf _{\rho \in \mathbb {L}'} \rho (C'_{0})=\rho ^{*}(C'_{0})\). Then \(\rho ^{*}(C'_{0})\le \bar{\rho }(C'_{0})<\infty \). Consequently, item ii) of Proposition 5.4 implies that the measure \(\eta ^{*}\) defined on \(\mathbf {X}\times \mathbf {A}\) by

$$\begin{aligned} \eta ^{*}(\Gamma )=\rho ^{*}(\Gamma \mathop {\cap }\mathbb {K}^{i})+\frac{\alpha }{K+\alpha }\rho ^{*}(\Gamma \mathop {\cap }\mathbb {K}^{g}) \end{aligned}$$

belongs to \(\mathbb {L}\) with \(\eta ^{*}(C_{0})=\rho ^{*}(C'_{0})\). Finally, it is easy to show that we have necessarily

$$\begin{aligned} \inf _{\eta \in \mathbb {L}} \eta (C_{0})=\eta ^{*}(C_{0}) \end{aligned}$$

by using item i) of Proposition 5.4, giving the result. \(\square \)

If \(\mathbb {L}\ne \emptyset \) and \(\eta (C_0)=\infty \) for any \(\eta \in \mathbb {L}\), then \(\mathbb {PLP}\) is also solvable without Assumption C. The following theorem is the main result of this section. It establishes the existence of an optimal randomized control strategy and states that the class of strategies introduced in Definition 4.7 is a sufficient set.

Theorem 5.6

Assume that there exists \(\bar{u}\in \mathcal {U}^{f}\) such that \(\mathcal {V}_{0}(\bar{u},x_{0})<\infty \). Then, there exists a randomized stationary optimal control strategy, for the constrained optimal control problem introduced in Definition 3.1, where the interventions only occur after the natural jumps and with a possible intervention at the initial moment, that is, there exist \(\pi ^{*}\in \mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\) and \(\varphi ^{*} \in \mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\) such that the strategy \(u^{\pi ^{*},\varphi ^{*}}\) as introduced in Definition 4.7 belongs to \(\mathcal {U}^{f}\) and satisfies

$$\begin{aligned} \inf _{u\in \mathcal {U}^{f}} \mathcal {V}_{0}(u,x_{0}) =\mathcal {V}_{0}(u^{\pi ^{*},\varphi ^{*}},x_{0}) =\eta _{u^{\pi ^{*},\varphi ^{*}}}(C_{0})=\inf _{\eta \in \mathbb {L}} \eta (C_{0}). \end{aligned}$$
(45)

As a consequence, the class of randomized stationary control strategies introduced in Definition 4.7 is a sufficient class of control strategies for the constrained control problem under consideration.

Proof

This result is a straightforward consequence of Theorems 5.2 and 5.5. \(\square \)

We now briefly discuss the dual program associated with the linear program \(\mathbb {PLP}\) in the case of an unconstrained control problem, that is, \(p=0\). Assume that \(|C^{g}_{0}(x,a)|\le K\) for any \((x,a)\in \mathbb {K}^{g}\). The dual linear program is defined as follows: maximize \(W(x_{0})\) subject to \(W\in \mathbb {L}^{*}\), where \(\mathbb {L}^{*}\) is the set of functions belonging to \(\mathbb {B}(\mathbf {X})\) such that

$$\begin{aligned} W(x) \le C_{0}(x,a)+ I_{\mathbb {K}^i}(x,a)\int _{\mathbf {X}} W(y)Q(dy|x,a)+\frac{1}{\alpha }I_{\mathbb {K}^g}(x,a)\int _{\mathbf {X}} W(y)q(dy|x,a). \end{aligned}$$

It is easy to show that for any \(\eta \in \mathbb {L}\) and \(W\in \mathbb {L}^{*}\), we have \(\eta (C_{0})\ge W(x_{0})\), showing that \(\inf _{\eta \in \mathbb {L}} \eta (C_{0}) \ge \sup _{W\in \mathbb {L}^{*}} W(x_{0})\). Now, if Assumptions A and B hold, then, from Theorem 5.2, we have \(\displaystyle \inf _{u\in \mathcal {U}} \mathcal {V}_{0}(u,x_{0})=\inf _{\eta \in \mathbb {L}} \eta (C_{0}) \ge \sup _{W\in \mathbb {L}^{*}} W(x_{0})\). Observe that the set \(\mathbb {L}^{*}\) can be equivalently described as the set of functions belonging to \(\mathbb {B}(\mathbf {X})\) satisfying the following two inequalities

$$\begin{aligned} \left\{ \begin{array}{rl} \alpha W(x) &{} \displaystyle \le \inf _{a\in \mathbf {A}^g(x)}\left\{ C^{g}_{0}(x,a)+\int _{\mathbf {X}} W(y) q(dy|x,a)\right\} , \\ W(x) &{} \displaystyle \le \inf _{a\in \mathbf {A}^i(x)}\left\{ c^{i}_{0}(x,a)+\int _{\mathbf {X}} W(y) Q(dy|x,a)\right\} . \end{array}\right. \end{aligned}$$
(46)

Now, assume that the sets \(\mathbf {A}^{g}\) and \(\mathbf {A}^{i}\) are compact, that the sets \(\mathbb {K}^g\) and \(\mathbb {K}^i\) are closed in \(\mathbf {X}\times \mathbf {A}^{g}\) and \(\mathbf {X}\times \mathbf {A}^{i}\) respectively, and that Assumptions (C1)–(C4) hold. In this context, according to item c) of Corollary 4.8 in [8], \(\inf _{u\in \mathcal{U}}\mathcal{V}_0(u,x_0)=V(x_0)\) where V is the unique bounded measurable solution to the Bellman equation

$$\begin{aligned} \inf _{a\in \mathbf {A}^g(x)}&\left\{ -\alpha V(x)+C^g_{0}(x,a)+\int _{\mathbf {X}} V(y) q(dy|x,a)\right\} \nonumber \\&\wedge \inf _{a\in \mathbf {A}^i(x)}\left\{ -V(x)+c^i_{0}(x,a)+\int _{\mathbf {X}} V(y) Q(dy|x,a)\right\} =0. \end{aligned}$$

Since V satisfies the inequalities in (46), it belongs to \(\mathbb {L}^{*}\), and therefore

$$\begin{aligned} \inf _{u\in \mathcal {U}} \mathcal {V}_{0}(u,x_{0})=\inf _{\eta \in \mathbb {L}} \eta (C_{0}) = \sup _{W\in \mathbb {L}^{*}} W(x_{0})=V(x_0), \end{aligned}$$

and there is no duality gap. Consequently, if there are no constraints then solving the dual linear program is equivalent to solving the Bellman equation.
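On a finite model, this equivalence can be checked numerically: iterating a uniformized Bellman operator yields the bounded solution V, which attains the common primal and dual value. A small sketch, with all model data hypothetical (two states, two continuous actions each, and an impulsive reset available in state 1):

```python
import numpy as np

alpha, K = 0.1, 2.0                       # discount and rate bound
# qbar[x, a, y]: off-diagonal jump rates under continuous action a
qbar = np.array([[[0.0, 1.0], [0.0, 2.0]],
                 [[0.2, 0.0], [0.1, 0.0]]])
Cg = np.array([[1.0, 1.5],                # running costs C^g_0(x, a)
               [8.0, 9.0]])
c_i, Q_imp = 2.5, np.array([1.0, 0.0])    # impulse in state 1: reset to 0

V = np.zeros(2)
for _ in range(3000):
    lam = qbar.sum(axis=2)                # total jump rates
    # continuous branch of the Bellman equation, alpha*V = min_a {C^g + qV},
    # rewritten as a fixed point via uniformization with the constant K
    cont = (Cg + qbar @ V + (K - lam) * V[:, None]).min(axis=1) / (K + alpha)
    # impulsive branch: V = min_b {c^i + QV}, available in state 1 only
    imp1 = c_i + Q_imp @ V
    V = np.array([cont[0], min(cont[1], imp1)])
print(np.round(V, 6))   # -> [35.  37.5]: the impulse is optimal in state 1
```

The continuous-branch update is a contraction of modulus \(K/(K+\alpha )\), so on this toy instance the iteration converges to the solution of the Bellman equation, and \(V(x_0)\) gives the common value of the primal and dual programs.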