1 Introduction

The objective of this work is to study continuous-time Markov decision processes (CTMDPs) with constraints for the infinite-horizon discounted cost, with both impulsive and continuous controls. CTMDPs form a general family of controlled stochastic processes suitable for modeling sequential decision-making problems. They appear in many fields such as engineering, computer science, economics and operational research, among others. Our goal is to study the two types of control for CTMDPs as described by Davis in his book [6]: continuous control, acting at all times on the process through the transition rate, and impulsive control, used to describe control actions that move the process to a new point of the state space at some specific times. Continuous control for CTMDPs has been extensively studied in the literature; see for example the recent books [9, 10, 27] and the references therein. Meanwhile, impulsive control for CTMDPs has received less attention and was studied in [6, 8, 15, 16, 25, 29–32]; see also the recent work [4], where a general Markov model was considered with application to financial mathematics. We do not attempt to present here an exhaustive panorama of this topic, but refer the interested reader to [8] for a brief survey of CTMDPs with impulsive control.

It is important to emphasize that in the framework of impulsive control for CTMDPs, there exist two rather distinct families of problems. The first class is related to models allowing only one impulsive action at a time. The second family is more general and studies models with possibly multiple impulses at the same time moment. This latter set of problems is much more delicate to analyze. Indeed, if the process may take different values at the same time moment, this leads to nonstandard paths for the controlled process. Most of the works in the literature are concerned with the first class of problems. The second family of problems has been addressed mainly by Yushkevich in [29–32]. Yushkevich introduced a new class of stochastic models, the so-called T-processes, where, roughly speaking, the processes are indexed by a parameter representing the natural current time and the number of the impulsive actions at that time moment. In [8], another approach has been developed by the authors in order to use the standard theory of marked point processes [17, 23]. Roughly speaking, the model discussed in [8] is defined by the following components: the state space \(\mathbf {X}\), the set of continuous actions \(\mathbf {A}^{g}\), the space of impulsive actions \(\mathbf {A}^{i}\), a transition rate q on \(\mathbf {X}\) given \(\mathbf {X} \times \mathbf {A}^{g}\) and a stochastic kernel Q on \(\mathbf {X}\) given \(\mathbf {X}\times \mathbf {A}^{i}\). The model is given by a marked point process \((\Theta _{n},Y_{n})_{n\in \mathbb {N}}\), where \(\Theta _{n}\) represents the sojourn time between two consecutive epochs induced either by a natural jump or by an intervention of the decision-maker. The natural jumps are generated by the transition rate q. The state vector \(Y_{n}\) represents the successive jumps of the process and the associated impulsive actions at the n-th epoch.
More precisely, \(Y_{n}\) is of the form \((x_0,a_0,x_1,a_1,\ldots ,x_k,a_k,x_{k+1},\Delta ,\Delta ,\ldots )\), where \(x_{0}\) corresponds to a possible natural jump or to the value of the process just before the intervention. The triple \((x_{j},a_{j},x_{j+1})\) indicates that the impulsive action \(a_{j}\) has been applied to the system at state \(x_{j}\), leading to a new state \(x_{j+1}\) with distribution \(Q(\cdot |x_{j},a_{j})\). The special impulsive action \(\Delta \) means that the impulses are over, and the artificial state \(\Delta \) means the same.

The necessity of an immediate sequence of impulses appears naturally in many mathematical models describing real-life problems. For example, in heavy traffic control problems, the approximation of the real physical system may lead to control problems with simultaneous multiple impulses in the limit model. For the original physical system, the impulses are separated in time, but for the limit model, impulses can occur at the same time moment due to the fact that time is rescaled and compressed. For a detailed exposition of such a phenomenon, the reader is referred, for example, to Sect. 8 of the book [21] (and the references therein), where an example of a production system in heavy traffic with impulsive control is discussed.

From a theoretical point of view, constrained CTMDPs are substantially different and more difficult to study than unconstrained CTMDPs. The linear programming technique has proved to be a very efficient method for solving such problems. The key idea is to reformulate the original sequential decision-making problem as an infinite-dimensional static optimization problem over a space of measures, where the admissible solutions are the so-called occupation measures of the controlled process. This technique has been extensively studied in recent decades for continuous-time processes in the context of continuous (gradual) control. We do not pretend to present here an exhaustive panorama of this approach, but the interested reader may consult the following works and the references therein: [11, 21, 24, 26] for CTMDPs, [1, 3, 22] for diffusion processes and [2, 12, 20, 28] for controlled martingale problems. However, the linear programming approach has been considerably less studied for impulsive control, whether for unconstrained or constrained problems. To the best of our knowledge, this paper is the first attempt to tackle such problems. Using the Lagrange multiplier approach, the impulsive control problem with constraints was addressed in [25] for a model with no continuous control, finite state and action spaces, and where a single impulsive action is allowed at a time.

In this paper, we investigate a constrained optimization problem for a CTMDP with general state and action spaces, and with both impulsive and continuous controls, where the performance and constraint criteria are given in terms of infinite-horizon discounted functionals. Our model allows multiple impulses at the same time moment. To the best of our knowledge, this paper can be seen as the first attempt to solve such general CTMDPs. A distinguishing feature of this work with respect to [8] is that in the present paper we consider the constrained control problem, while in [8] the unconstrained case was studied by using the dynamic programming technique. The main objective of the present work is to provide sufficient conditions ensuring the existence of an optimal control. First, we study the properties of the occupation measures. It is shown in Theorem 4.5 that for any admissible control strategy, the corresponding occupation measure satisfies a specific linear equation. It is then proved that this linear equation characterizes the optimal control problem under consideration, in the sense that from any measure \(\eta \) satisfying such a linear equation one can construct a control strategy u such that the corresponding occupation measure \(\eta _{u}\) is smaller than \(\eta \) (for a precise mathematical statement, see Theorem 4.9). Based on these properties, one can introduce a linear program, labeled \(\mathbb {PLP}\), and prove that the solvability of the constrained optimization problem is equivalent to the solvability of the \(\mathbb {PLP}\), and that these two optimization problems give the same value. By introducing a set of weak hypotheses, the solvability of the \(\mathbb {PLP}\) is proved in Theorem 5.5. Finally, Theorem 5.6 establishes the existence of an optimal randomized control strategy and shows that the class of such strategies is a sufficient set for the constrained optimization problem under consideration.

The rest of the paper is organized as follows. In Sect. 2, the CTMDP under consideration is discussed. Section 3 is devoted to the presentation of the performance criteria and the introduction of the main assumptions. The properties of the occupation measures are derived in Sect. 4. Finally, the linear program is studied in Sect. 5 where the existence of an optimal control strategy is shown. Several auxiliary results are presented in the Appendices 1 and 2 to streamline the presentation.

2 The Continuous-Time Markov Control Process

The main goal of this section is to introduce the notations, as well as the parameters defining the model, and to present the construction of the controlled process. In particular, having defined the class of admissible strategies, we introduce a probability measure \(\mathbb {P}_{x_{0}}^{u}\) with respect to which the controlled process \((\Theta _{n},Y_{n})_{n\in \mathbb {N}}\) has the required conditional distributions.

The following basic notation will be used in this paper: \(\mathbb {N}\) is the set of natural numbers including 0, \(\mathbb {N}^{*}=\mathbb {N}\setminus \{0\}\), \(\mathbb {R}\) denotes the set of real numbers, \(\mathbb {R}_{+}\) the set of non-negative real numbers, \(\mathbb {R}_{+}^{*}=\mathbb {R}_{+}\setminus \{0\}\), \(\bar{\mathbb {R}}_{+}=\mathbb {R}_{+}\mathop {\cup }\{+\infty \}\) and \(\bar{\mathbb {R}}_{+}^*=\mathbb {R}_{+}^*\mathop {\cup }\{+\infty \}\). For any \(p\in \mathbb {N}\), \(\mathbb {N}_{p}\) is the set \(\{0,1,\ldots ,p\}\) and for any \(p\in \mathbb {N}^{*}\), \(\mathbb {N}_{p}^{*}\) is the set \(\{1,\ldots ,p\}\). The term measure will always refer to a countably additive, \(\bar{\mathbb {R}}_{+}\)-valued set function. A finite (respectively, signed) measure is a countably additive, \(\mathbb {R}_{+}\)-valued (respectively, \(\mathbb {R}\)-valued) set function. Let \(\mathbf {X}\) be a Borel space and denote by \(\mathcal {B}(\mathbf {X})\) its associated Borel \(\sigma \)-algebra. For any set A, \(I_{A}\) denotes the indicator function of the set A. The set of measures (respectively, signed measures) defined on \((\mathbf {X},\mathcal {B}(\mathbf {X}))\) is denoted by \(\mathcal {M}(\mathbf {X})\) (respectively, \(\mathcal {M}_{s}(\mathbf {X})\)). The set of finite measures on \((\mathbf {X},\mathcal {B}(\mathbf {X}))\) is denoted by \(\mathcal {M}_{f}(\mathbf {X})\) and \(\mathcal {P}(\mathbf {X})\) is the set of probability measures defined on \((\mathbf {X},\mathcal {B}(\mathbf {X}))\). For any point \(x\in \mathbf {X}\), \(\delta _{x}\) denotes the Dirac measure defined by \(\delta _{x}(\Gamma )=I_{\Gamma }(x)\) for any \(\Gamma \in \mathcal {B}(\mathbf {X})\). 
The set of bounded real-valued measurable functions defined on \(\mathbf {X}\) is denoted by \(\mathbb {B}(\mathbf {X})\) and the set of \(\mathbb {R}\)-valued (respectively, \(\bar{\mathbb {R}}_{+}\)-valued, and \(\bar{\mathbb {R}}^{*}_{+}\)-valued) measurable functions defined on \(\mathbf {X}\) is denoted by \(\mathbb {M}(\mathbf {X})\) (respectively, \(\bar{\mathbb {M}}_{+}(\mathbf {X})\), and \(\bar{\mathbb {M}}^{*}_{+}(\mathbf {X})\)). The set of continuous functions in \(\mathbb {B}(\mathbf {X})\) is denoted by \(\mathbb {C}_{b}(\mathbf {X})\).

Let \(\mathbf {X}\) and \(\mathbf {Z}\) be two Borel spaces. A kernel (respectively, signed kernel) \(T(\cdot |\cdot )\) on \(\mathbf {Z}\) given \(\mathbf {X}\) is an \(\bar{\mathbb {R}}_{+}\)-valued (respectively, \(\mathbb {R}\)-valued) mapping defined on \(\mathcal {B}(\mathbf {Z})\times \mathbf {X}\) such that for any \(A\in \mathcal {B}(\mathbf {Z})\), \(T(A|\cdot )\in \bar{\mathbb {M}}_{+}(\mathbf {X})\) (respectively, \(T(A|\cdot )\in \mathbb {M}(\mathbf {X})\)) and for any \(x\in \mathbf {X}\), \(T(\cdot |x)\in \mathcal {M}(\mathbf {Z})\) (respectively, \(T(\cdot |x)\in \mathcal {M}_{s}(\mathbf {Z})\)). A kernel \(T(\cdot |\cdot )\) on \(\mathbf {Z}\) given \(\mathbf {X}\) is called stochastic (respectively, finite) if for any \(x\in \mathbf {X}\), \(T(\cdot |x)\in \mathcal {P}(\mathbf {Z})\) (respectively, \(T(\cdot |x)\in \mathcal {M}_{f}(\mathbf {Z})\)). \(\mathcal {P}(\mathbf {Z}|\mathbf {X})\) denotes the set of stochastic kernels on \(\mathbf {Z}\) given \(\mathbf {X}\). A transition rate q on \(\mathbf {X}\) given \(\mathbf {X}\times \mathbf {Z}\) is a signed kernel on \(\mathbf {X}\) given \(\mathbf {X}\times \mathbf {Z}\) satisfying \(q(\mathbf {X} |x,z)= 0\) and \(q(\Gamma \setminus \{x\} |x,z)\ge 0\) for any \(\Gamma \in \mathcal {B}(\mathbf {X})\) and any \((x,z)\in \mathbf {X}\times \mathbf {Z}\). To any transition rate q on \(\mathbf {X}\) given \(\mathbf {X}\times \mathbf {Z}\), we associate a kernel \(\bar{q}\) on \(\mathbf {X}\) given \(\mathbf {X}\times \mathbf {Z}\) defined by \(\bar{q}(\Gamma |x,z)=q(\Gamma \setminus \{x\} |x,z)\) for any \(\Gamma \in \mathcal {B}(\mathbf {X})\) and any \((x,z)\in \mathbf {X}\times \mathbf {Z}\).

Let \(\eta \in \mathcal {M}(\mathbf {X})\), \(f\in \bar{\mathbb {M}}_{+}(\mathbf {Z})\) and let T be a kernel on \(\mathbf {Z}\) given \(\mathbf {X}\); then \(\eta T\) denotes the measure on \(\mathbf {Z}\) defined by \(\displaystyle \eta T (\Gamma )=\int _{\mathbf {X}} T(\Gamma |x)\eta (dx)\) for any \(\Gamma \in \mathcal {B}(\mathbf {Z})\), and Tf denotes the function defined on \(\mathbf {X}\) by \(\displaystyle Tf(x)=\int _{\mathbf {Z}} f(z) T(dz|x)\) for any \(x\in \mathbf {X}\). Moreover, if R is a kernel on \(\mathbf {Y}\) given \(\mathbf {Z}\), then TR is the kernel on \(\mathbf {Y}\) given \(\mathbf {X}\) defined by \(\displaystyle TR(\Gamma |x)=\int _{\mathbf {Z}} R(\Gamma |z)T(dz|x)\) for any \(\Gamma \in \mathcal {B}(\mathbf {Y})\) and \(x\in \mathbf {X}\).
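On finite spaces, these operations reduce to matrix and vector products: measures on \(\mathbf {X}\) become row vectors, kernels become matrices whose rows are the measures \(T(\cdot |x)\), and \(\eta T\), \(Tf\), \(TR\) become `eta @ T`, `T @ f` and `T @ R`. The following NumPy sketch is purely illustrative; the spaces and all numbers are hypothetical.

```python
import numpy as np

# Finite illustration: X = {0, 1}, Z = {0, 1, 2}.
# A kernel T on Z given X is stored row-wise: T[x, :] = T(.|x).
T = np.array([[0.2, 0.5, 0.3],
              [0.6, 0.1, 0.3]])

eta = np.array([0.4, 0.6])      # a (probability) measure on X
f = np.array([1.0, 2.0, 3.0])   # a non-negative function on Z

# eta T (Gamma) = sum_x T(Gamma|x) eta({x}): a measure on Z
etaT = eta @ T
# Tf(x) = sum_z f(z) T({z}|x): a function on X
Tf = T @ f

# If R is a kernel on Y given Z (here Y = {0, 1}), then TR is the
# kernel on Y given X obtained by composing the two kernels.
R = np.array([[0.5, 0.5],
              [0.9, 0.1],
              [0.0, 1.0]])
TR = T @ R                      # rows of TR remain probability vectors
```

Note that when T and R are stochastic, `TR` is again row-stochastic, matching the fact that the composition of stochastic kernels is stochastic.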

Finally, the infimum over an empty set is understood to be equal to \(+\infty \).

2.1 Parameters of the Model

We deal with a control model defined through the following elements:

  • \(\mathbf {X}\) is the state space, assumed to be a Borel space (i.e., a measurable subset of a complete and separable metric space).

  • \(\mathbf {A}\) is the action space, assumed to be also a Borel space. \(\mathbf {A}^{i}\in \mathcal {B}(\mathbf {A})\) (respectively \(\mathbf {A}^{g}\in \mathcal {B}(\mathbf {A})\)) is the set of impulsive (respectively continuous) actions satisfying \(\mathbf {A}=\mathbf {A}^i\cup \mathbf {A}^g\) with \(\mathbf {A}^i\cap \mathbf {A}^g=\emptyset \).

  • The set of feasible actions in state \(x\in \mathbf {X}\) is \(\mathbf {A}(x)\), which is a nonempty measurable subset of \(\mathbf {A}\). Admissible impulsive and continuous actions in the state \(x\in \mathbf {X}\) are denoted by \(\mathbf {A}^i(x)=\mathbf {A}(x)\cap \mathbf {A}^i\) and \(\mathbf {A}^g(x)=\mathbf {A}(x)\cap \mathbf {A}^g\). It is supposed that \(\mathbb {K}^g=\{(x,a)\in \mathbf {X}\times \mathbf {A}:a\in \mathbf {A}^{g}(x)\}\in \mathcal {B}(\mathbf {X}\times \mathbf {A}^g)\) and that this set contains the graph of a measurable function from \(\mathbf {X}\) to \(\mathbf {A}^g\) (so that, necessarily, \(\mathbf {A}^g(x)\ne \emptyset \) for all \(x\in \mathbf {X}\)); similarly, it is supposed that \(\mathbb {K}^i=\{(x,a)\in \mathbf {X}\times \mathbf {A}^{i}:a\in \mathbf {A}^i(x)\}\in \mathcal {B}(\mathbb {X}^{i}\times \mathbf {A}^i)\), where \(\mathbb {X}^{i}=\{x\in \mathbf {X}: \mathbf {A}^i(x)\ne \emptyset \}\in \mathcal {B}(\mathbf {X})\), and that \(\mathbb {K}^i\) contains the graph of a measurable function from \(\mathbb {X}^{i}\) to \(\mathbf {A}\).

  • The stochastic kernel Q on \(\mathbf {X}\) given \(\mathbb {K}^{i}\) describes the result of an impulsive action. In other words, if \(x\in \mathbb {X}^{i}\) and an impulsive action \(a\in \mathbf {A}^i(x)\) is applied, then the state of the process changes instantaneously according to \(Q(\cdot |x,a)\).

  • The signed kernel q on \(\mathbf {X}\) given \(\mathbb {K}^g\) is the intensity of jumps governing the dynamics of the process between interventions. For notational convenience, let us denote \(q(\Gamma \setminus \{x\}|x,a)\) by \(\bar{q}(\Gamma |x,a)\) for \(\Gamma \in \mathcal {B}(\mathbf {X})\) and \((x,a)\in \mathbb {K}^{g}\). It satisfies \(q(\mathbf {X}|x,a)=0\) and \(\bar{q}(\mathbf {X}|x,a)\ge 0\) for any \((x,a)\in \mathbb {K}^g\), and \(\sup _{a\in \mathbf {A}^{g}(x)} \bar{q}(\mathbf {X}|x,a)< \infty \) for any \(x\in \mathbf {X}\).
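For intuition, on a finite state space and for a fixed continuous action, a transition rate \(q(\cdot |x,a)\) is a signed matrix whose rows sum to zero and whose off-diagonal entries are non-negative, and the associated kernel \(\bar{q}\) is obtained by zeroing out the diagonal. A small sketch of these two objects (state space and numbers arbitrary):

```python
import numpy as np

# Finite illustration: X = {0, 1, 2}, one fixed continuous action a.
# q[x, :] = q(.|x, a): each row sums to 0, off-diagonal entries >= 0.
q = np.array([[-1.0,  0.7,  0.3],
              [ 0.5, -0.5,  0.0],
              [ 0.2,  0.8, -1.0]])

# qbar(Gamma|x, a) = q(Gamma \ {x} | x, a): zero out the diagonal.
qbar = q - np.diag(np.diag(q))
```

By construction, \(\bar{q}(\mathbf {X}|x,a)\) (the row sums of `qbar`) equals \(-q(\{x\}|x,a)\), the total rate of leaving state x.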

In our model, an intervention consists of a finite sequence of pairs, each formed by an impulsive action and the associated jump. This finite sequence can equivalently be described by an infinite sequence of state-action pairs in which, after finitely many steps, the pairs are set to the fictitious state and action. As a result, an intervention is an element of the set

$$\begin{aligned} \mathbf {Y}=\bigcup _{k\in \mathbb {N}}\mathbf {Y}_k \text { with } \mathbf {Y}_k=(\mathbf {X}\times \mathbf {A}^i)^k\times (\mathbf {X}\times \{\Delta \})\times (\{\Delta \}\times \{\Delta \})^\infty , \end{aligned}$$

where \(\Delta \) will play the role of the fictitious state and action. The dynamics of such sequences are governed by the Markov Decision Process (MDP) \(\mathcal {M}^{i}\) defined by \(\mathcal {M}^{i}=\big (\mathbf {X}_\Delta ,\mathbf {A}^i_\Delta ,(\mathbf {A}^i_\Delta (x))_{x\in \mathbf {X}_\Delta },Q_{\Delta }\big )\), where \(\mathbf {X}_{\Delta }\), \(\mathbf {A}^{i}_{\Delta }\) and \(\big (\mathbf {A}^{i}_{\Delta }(x)\big )_{x\in \mathbf {X}_{\Delta }}\) are the state and action spaces augmented by the fictitious state and action \(\Delta \): \(\mathbf {X}_{\Delta }=\mathbf {X}\mathop {\cup }\{\Delta \}\), \(\mathbf {A}^{i}_{\Delta }=\mathbf {A}^{i}\mathop {\cup }\{\Delta \}\), \(\mathbf {A}^{i}_{\Delta }(x)=\mathbf {A}^{i}(x)\mathop {\cup }\{\Delta \}\) for \(x\in \mathbf {X}\) and \(\mathbf {A}^{i}_{\Delta }(\Delta )=\{\Delta \}\). The dynamics are given by \(Q_{\Delta }(\cdot |x,a)=Q(\cdot |x,a)\) for any \((x,a)\in \mathbb {K}^{i}\) and \(Q_{\Delta }(\{\Delta \}|x,a)=1\) otherwise. For the model \(\mathcal {M}^{i}\), according to the Ionescu-Tulcea Theorem (see Proposition C.10 in [13]), there exists a unique strategic measure \(P^{\beta }(\cdot |x)\) on \((\mathbf {X}_\Delta \times \mathbf {A}^i_\Delta )^\infty \) associated with the policy \(\beta \) and the initial distribution \(\delta _x\). Here and below, we use the standard terminology for MDPs: see for example [13]. A policy is a sequence of past-dependent distributions on the action space. A randomized (respectively, non-randomized) policy consists in choosing the actions randomly (respectively, deterministically) over time, according to a probability law depending on the past history of the state and action processes. A Markov non-randomized policy is a sequence \((\varphi _j^i)_{j\in \mathbb {N}}\) of \(\mathbf {A}^i_\Delta \)-valued mappings on \(\mathbf {X}_\Delta \), and so on.
Observe that \(P^{\beta }\) is in fact a stochastic kernel on \((\mathbf {X}_\Delta \times \mathbf {A}^i_\Delta )^\infty \) given \(\mathbf {X}\); see Proposition C.10 in [13]. Since we only consider interventions as elements of \(\mathbf {Y}\), we introduce \(\Xi \) as the set of policies \(\beta \) satisfying \(P^{\beta }(\mathbf {Y}|x)=1\) for any \(x\in \mathbf {X}\). We consider randomized interventions, and consequently an intervention is an element of

$$\begin{aligned} \mathcal {P}^\mathbf {Y}=\{\gamma \in \mathcal {P}(\mathbf {Y}|\mathbf {X}): \gamma (\cdot |\cdot )=P^{\beta }(\cdot |\cdot ) \text { for some } \beta \in \Xi \}, \end{aligned}$$

and

$$\begin{aligned} \mathcal {P}^\mathbf {Y}(x)=\{\rho \in \mathcal {P}(\mathbf {Y}): \rho (\cdot )=P^{\beta }(\cdot |x) \text { for some } \beta \in \Xi \} \end{aligned}$$

is the set of feasible interventions in state \(x\in \mathbf {X}\). Observe that if an intervention is chosen in \(\mathbf {Y}_{0}\), this actually means that the controller has not intervened on the process through impulsive actions. For technical reasons, it appears necessary to introduce the set \(\mathbf {Y}^{*}\) of real interventions given by \(\mathbf {Y}^{*}=\bigcup _{k=1}^{\infty }\mathbf {Y}_k \). The associated sets of real randomized interventions are defined by

$$\begin{aligned} \mathcal {P}^{\mathbf {Y}^{*}}= & {} \{\gamma \in \mathcal {P}(\mathbf {Y}|\mathbf {X}): \gamma (\cdot |\cdot )=P^{\beta }(\cdot |\cdot ) \text { for some } \beta \in \Xi \\&\text { and }P^{\beta }(\mathbf {Y}^{*}|x)=1, \text{ for } \text{ any } x\in \mathbb {X}^{i}\} \end{aligned}$$

and

$$\begin{aligned} \mathcal {P}^{\mathbf {Y}^{*}}(x)=\{ \rho \in \mathcal {P}(\mathbf {Y}): \rho (\cdot )=P^{\beta }(\cdot |x) \text { for some } \beta \in \Xi \text { and }P^{\beta }(\mathbf {Y}^{*}|x)=1\} \end{aligned}$$

for \(x\in \mathbf {X}\). Note that \(\mathcal {P}^{\mathbf {Y}^{*}}(x)=\emptyset \) if \(x\notin \mathbb {X}^{i}\).
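To fix ideas, an element of \(\mathbf {Y}\) can be generated by running the MDP \(\mathcal {M}^{i}\) until the fictitious action \(\Delta \) is chosen. The sketch below stores the outcome as a flat tuple \((x_0,a_0,\ldots ,x_{k+1},\Delta )\) with the trailing \((\Delta ,\Delta )\) pairs truncated; the two-point state space, the kernel `Q_delta` and the stationary policy `policy` are all hypothetical, chosen only for illustration.

```python
import random

DELTA = "Delta"  # fictitious state / action

def Q_delta(x, a):
    # Hypothetical kernel Q_Delta: the impulsive action "shift" swaps
    # states 0 and 1; the fictitious action absorbs at DELTA.
    if a == DELTA:
        return DELTA
    return 1 - x

def policy(x, rng):
    # Hypothetical stationary policy: from a real state, apply the
    # impulse "shift" with probability 1/2, otherwise stop (action DELTA).
    if x == DELTA:
        return DELTA
    return "shift" if rng.random() < 0.5 else DELTA

def sample_intervention(x0, rng, max_steps=50):
    """Run M^i from x0 and return the truncated element of Y:
    (x0, a0, x1, a1, ..., x_{k+1}, DELTA)."""
    traj, x = [], x0
    for _ in range(max_steps):
        a = policy(x, rng)
        traj += [x, a]
        if a == DELTA:            # the impulses are over
            return tuple(traj)
        x = Q_delta(x, a)
    raise RuntimeError("no absorption within max_steps")
```

Under this policy the number of impulses is geometric, so the trajectory lands in some \(\mathbf {Y}_k\) with probability one, i.e. the policy belongs to \(\Xi \).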

Let us denote by \(\mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\) the set of stochastic kernels \(\pi \in \mathcal {P}(\mathbf {A}^{g}|\mathbf {X})\) such that \(\pi (\mathbf {A}^{g}(x)|x)=1\) for any \(x\in \mathbf {X}\), and by \(\mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\) the set of stochastic kernels \(\varphi \in \mathcal {P}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\) such that \(\varphi (\mathbf {A}^{i}_{\Delta }(x)|x)=1\) for any \(x\in \mathbf {X}_{\Delta }\). For \(\pi \in \mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\), \(\bar{q}^{\pi }\) denotes the kernel on \(\mathbf {X}\) given \(\mathbf {X}\) defined by \(\displaystyle \bar{q}^{\pi }(\Gamma |x)=\int _{\mathbf {A}^{g}} \bar{q}(\Gamma |x,a) \pi (da|x)\) and \(q^{\pi }\) denotes the transition rate on \(\mathbf {X}\) given \(\mathbf {X}\) defined by \(\displaystyle q^{\pi }(\Gamma |x)=\int _{\mathbf {A}^{g}} q(\Gamma |x,a) \pi (da|x)\) for any \(\Gamma \in \mathcal {B}(\mathbf {X})\) and \(x\in \mathbf {X}\). Similarly, for \(\varphi \in \mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\), \(Q_{\Delta }^{\varphi }\) denotes the stochastic kernel on \(\mathbf {X}_{\Delta }\) given \(\mathbf {X}_{\Delta }\) defined by \(\displaystyle Q_{\Delta }^{\varphi }(\Gamma |x)=\int _{\mathbf {A}^{i}_{\Delta }} Q_{\Delta }(\Gamma |x,a) \varphi (da|x)\) for any \(\Gamma \in \mathcal {B}(\mathbf {X}_{\Delta })\) and \(x\in \mathbf {X}_{\Delta }\).

At some point, we need to consider strategic measures for the model \(\mathcal {M}^{i}\) generated by arbitrary randomized stationary policies not necessarily belonging to \(\Xi \). Consequently, let us introduce the space

$$\begin{aligned} \mathbb {K}^{i}_{\Delta }=\{(x,a)\in \mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta }: a\in \mathbf {A}^{i}_{\Delta }(x)\} \in \mathcal {B}(\mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta }). \end{aligned}$$

Let \(\varphi \in \mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\). By a slight abuse of notation, we denote by \(\varphi \) the randomized stationary policy induced by the stochastic kernel \(\varphi \), and by \(P^{\varphi }\) the strategic measure for the model \(\mathcal {M}^{i}\) generated by the policy \(\varphi \). Clearly, \(P^{\varphi } \in \mathcal {P}((\mathbb {K}^{i}_{\Delta })^{\infty }|\mathbf {X})\).

Finally, we end this subsection by introducing a projection mapping that will be used repeatedly in the paper. If \(y\in \mathbf {Y}\), then there exists a unique \(k\in \mathbb {N}\) such that \(y=(x_0,a_0,\ldots ,x_k,a_k,x_{k+1},\Delta ,\Delta ,\ldots )\in \mathbf {Y}_k\). The \(\mathbf {X}\)-valued mapping \(\bar{x}\) on \(\mathbf {Y}\) is defined by

$$\begin{aligned} \bar{x}(y)=x_{k+1}. \end{aligned}$$
(1)
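If an element \(y\in \mathbf {Y}_k\) is stored as a flat tuple \((x_0,a_0,\ldots ,x_k,a_k,x_{k+1},\Delta ,\ldots )\), with states at even positions, then \(\bar{x}(y)\) is simply the last component preceding the fictitious symbols. A hypothetical helper (the tuple encoding is an assumption of this illustration, not part of the model):

```python
DELTA = "Delta"  # fictitious state / action

def x_bar(y):
    """Return x_{k+1}, the last genuine state in
    y = (x0, a0, ..., xk, ak, x_{k+1}, Delta, Delta, ...)."""
    last = None
    for i in range(0, len(y), 2):   # states sit at even positions
        if y[i] == DELTA:
            break
        last = y[i]
    return last
```

For instance, an intervention with no impulse, \(y=(x_0,\Delta ,\Delta ,\ldots )\), gives \(\bar{x}(y)=x_0\), consistent with \(k=0\) in (1) when no impulsive action is applied.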

2.2 Construction of the Process

Having introduced the parameters of the model, we are now in a position to construct the controlled Markov process. Let \(\mathbf {Y}_\infty =\mathbf {Y}\cup \{y_{\infty }\}\), where \(y_{\infty }\) is an artificial (isolated) point, and \(\Omega _{n}=\mathbf {Y}\times (\mathbb {R}_{+}^{*}\times \mathbf {Y})^n\times (\{\infty \}\times \{y_{\infty }\})^\infty \) for \(n\in \mathbb {N}\). The canonical space \(\Omega \) is defined as \(\Omega =\Big (\bigcup _{n=1}^\infty \Omega _{n}\Big )\cup \big ( \mathbf {Y}\times (\mathbb {R}_{+}^{*}\times \mathbf {Y})^\infty \big )\) and is endowed with its Borel \(\sigma \)-algebra, denoted by \(\mathcal {F}\). For notational convenience, \(\omega \in \Omega \) will be represented as

$$\begin{aligned} \omega =(y_0,\theta _1,y_1,\theta _2,y_2,\ldots ). \end{aligned}$$

Here, \(y_0=(x_0,\Delta ,\Delta ,\ldots )\) is the initial state of the controlled point process \(\xi \) with values in \(\mathbf {Y}\), defined below; \(\theta _1=0\), and \(y_1\in \mathbf {Y}\) is the result of the initial intervention. The components \(\theta _{n}>0\) for \(n\ge 2\) are the sojourn times; \(y_{n}\) denotes the result of an intervention (if \(y_{n}\in \mathbf {Y}^{*}\)) or corresponds to a natural jump (if \(y_{n}\in \mathbf {Y}\setminus \mathbf {Y}^{*}\)). In case \(\theta _{n}<\infty \) and \(\theta _{n+1}=\infty \), the trajectory has only n jumps, and we put \(y_m=y_{\infty }\) for all \(m\ge n+1\).

The path up to \(n\in \mathbb {N}\) is denoted by \(h_{n}=(y_0,\theta _1,y_1,\theta _2,y_2,\ldots ,\theta _{n},y_{n})\) and the collection of all such paths is denoted by \(\mathbf {H}_{n}\). For \(n\in \mathbb {N}\), introduce the mappings \(Y_{n}:~\Omega \rightarrow \mathbf {Y}_\infty \) by \(Y_{n}(\omega )=y_{n}\) and, for \(n\ge 2\), the mappings \(\Theta _{n}:~\Omega \rightarrow \overline{\mathbb {R}}_{+}^{*}\) by \(\Theta _{n}(\omega )=\theta _{n}\), with \(\Theta _{1}(\omega )=0\). The sequence \((T_{n})_{n\in \mathbb {N}^{*}}\) of \(\overline{\mathbb {R}}_{+}^{*}\)-valued mappings is defined on \(\Omega \) by \(T_{n}(\omega )=\sum _{i=1}^n\Theta _i(\omega )=\sum _{i=1}^n\theta _i\) and \(T_\infty (\omega )=\lim _{n\rightarrow \infty }T_{n}(\omega )\). For notational convenience, we denote by \(H_{n}=(Y_0,\Theta _1,Y_1,\ldots ,\Theta _{n},Y_{n})\) the n-term history process taking values in \(\mathbf {H}_{n}\) for \(n\in \mathbb {N}\).

The random measure \(\mu \) associated with \((\Theta _{n},Y_{n})_{n\in \mathbb {N}}\) is a measure defined on \(\mathbb {R}^{*}_{+}\times \mathbf {Y}\) by

$$\begin{aligned} \mu (\omega ;dt,dy)=\sum _{n\ge 2}I_{\{T_{n}(\omega )<\infty \}}\delta _{(T_{n}(\omega ),Y_{n}(\omega ))}(dt,dy). \end{aligned}$$

For notational convenience, the dependence on \(\omega \) will be omitted, and we write \(\mu (dt,dy)\) instead of \(\mu (\omega ;dt,dy)\). For \(t\in \mathbb {R}_{+}\), define \(\mathcal {F}_t=\sigma \{H_1\}\vee \sigma \{\mu (]0,s]\times B):~s\le t,B\in \mathcal {B}(\mathbf {Y})\}\). Finally, we define the controlled process \(\big \{\xi _t\big \}_{t\in \mathbb {R}_{+}}\):

$$\begin{aligned} \xi _t(\omega )=\left\{ \begin{array}{ll} Y_{n}(\omega ), &{} \quad {\text {if}}\,\,T_{n}\le t<T_{n+1} \quad {\text {for}}\,\,n\in \mathbb {N}^{*}; \\ y_{\infty }, &{} \quad {\text {if}}\,\,T_\infty \le t, \end{array}\right. \end{aligned}$$

and \(\xi _{0-}(\omega )=Y_0=y_0\) with \(y_0=(x_0,\Delta ,\Delta ,\ldots )\). Obviously, the controlled process \(\{\xi _t\}_{t\in \mathbb {R}_{+}}\) can be equivalently described by the sequence \((\Theta _{n},Y_{n})_{n\in \mathbb {N}}\). The sequence \((T_{n})_{n\in \mathbb {N}^{*}}\) describes the jump times of \(\{\xi _t\}_{t\in \mathbb {R}_{+}}\): \(T_{n}\) is the n-th jump moment. The state \(\xi _{t}\) is constant between the jump times \(T_{n}\) and \(T_{n+1}\) and represents the successive jumps of the process and the associated impulsive actions at time \(T_{n}\). This choice for the process \(\{\xi _t\}_{t\in \mathbb {R}_{+}}\) is motivated by the fact that we consider models with possibly multiple impulses at the same time moment. In such a framework, we extend the state space from \(\mathbf {X}\) to \(\mathbf {Y}\) in order to include the sequence of successive instantaneous jumps and corresponding impulsive actions. The extended state is of the form \((x_0,a_0,x_1,a_1,\ldots ,x_k,a_k,x_{k+1},\Delta ,\Delta ,\ldots )\), where \(x_{0}\) corresponds to a possible natural jump or to the value of the process just before the intervention. The triple \((x_{j},a_{j},x_{j+1})\) indicates that the impulsive action \(a_{j}\) has been applied to the system at state \(x_{j}\), leading to a new state \(x_{j+1}\) with distribution \(Q(\cdot |x_{j},a_{j})\). The special impulsive action \(\Delta \) means that the impulses are over, and the artificial state \(\Delta \) means the same. Observe that the last component different from \(\Delta \) in \(\xi _{t}\) corresponds to the last position of the process after a sequence of successive instantaneous jumps and impulses; it is given by \(\overline{x}(\xi _{t})\). It may appear odd to have included the impulsive actions (associated with the successive jumps) in the state process \(\xi _{t}\); however, this choice was made to simplify the model describing the dynamics of the process. Other approaches would have been possible: for example, a model where \(\{\xi _t\}_{t\in \mathbb {R}_{+}}\) consisted only of the successive jumps. This, however, would have led to a much more complicated mathematical description of the dynamics, with heavy notation.
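The piecewise-constant evaluation of \(\xi _t\) from the sequence \((\theta _{n},y_{n})\) can be sketched as follows; the helper is illustrative only and works with truncated finite lists, a trailing sojourn time of infinity encoding a trajectory with finitely many jumps.

```python
import itertools
import math

def xi(t, thetas, ys):
    """Piecewise-constant evaluation: xi_t = y_n on [T_n, T_{n+1}),
    where T_n = theta_1 + ... + theta_n and theta_1 = 0.
    A trailing theta = inf encodes a trajectory with only n jumps."""
    T = list(itertools.accumulate(thetas))  # jump times T_1, T_2, ...
    state = None                            # never returned: T_1 = 0 <= t
    for Tn, yn in zip(T, ys):
        if math.isinf(Tn) or Tn > t:
            break
        state = yn
    return state
```

Since \(\theta _1=0\), the first mark \(y_1\) is always picked up at \(t=0\), matching the convention that the initial intervention occurs at time zero.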

2.3 Admissible Strategies and Conditional Distribution of the Controlled Process

An admissible (randomized) control strategy is a sequence \(u=(u_{n})_{n\in \mathbb {N}}\) such that \(u_{0}\in \mathcal {P}^{\mathbf {Y}}(x_0)\) and, for any \(n\in \mathbb {N}^{*}\), \(u_{n}\) is given by

$$\begin{aligned} u_{n}=\big ( \psi _{n},\pi _{n},\gamma ^0_{n},\gamma ^1_{n} \big ), \end{aligned}$$

where:

  • \(\psi _{n}\) is a stochastic kernel on \(\overline{\mathbb {R}}^{*}_{+}\) given \(\mathbf {H}_{n}\) satisfying \(\psi _{n}(\cdot |h_{n})=\delta _{+\infty }(\cdot )\) for any \(h_{n}=(y_0,\theta _1,\ldots ,\theta _{n},y_{n})\in \mathbf {H}_{n}\) with \(\overline{x}(y_{n})\notin \mathbb {X}^{i}\);

  • \(\pi _{n}\) is a stochastic kernel on \(\mathbf {A}^{g}\) given \( \mathbf {H}_{n}\times \mathbb {R}_{+}^{*}\) satisfying \(\pi _{n}(\mathbf {A}^{g}(\overline{x}(y_{n}))|h_{n},t)=1\) for any \(t\in \mathbb {R}_{+}^{*}\) and \(h_{n}=(y_0,\theta _1,\ldots ,\theta _{n},y_{n})\in \mathbf {H}_{n}\);

  • \(\gamma ^0_{n}\) is a stochastic kernel on \(\mathbf {Y}\) given \( \mathbf {H}_{n}\times \mathbb {R}_{+}^{*}\times \mathbf {X}\) satisfying \(\gamma ^0_{n}(\cdot |h_{n},t,x) \in \mathcal {P}^{\mathbf {Y}}(x)\) for any \(h_{n}\in \mathbf {H}_{n}\), \(t\in \mathbb {R}_{+}^{*}\) and \(x\in \mathbf {X}\);

  • \(\gamma ^1_{n}\) is a stochastic kernel on \(\mathbf {Y}\) given \( \mathbf {H}_{n}\) satisfying \(\gamma ^1_{n}(\cdot |h_{n})\in \mathcal {P}^{\mathbf {Y}^{*}}(\overline{x}(y_{n}))\) for any \(h_{n}=(y_0,\theta _1,\ldots ,\theta _{n},y_{n})\in \mathbf {H}_{n}\) with \(\overline{x}(y_{n})\in \mathbb {X}^{i}\); if \(\overline{x}(y_n)\notin \mathbb {X}^i\), then \(\gamma ^1_n(\cdot |h_n)=\delta _{(\overline{x}(y_n),\Delta ,\Delta ,\ldots )}(\cdot )\).

The above conditions apply when \(y_n\ne y_\infty \); otherwise, all the values of \(\psi _n(\cdot | h_n)\), \(\pi _n(\cdot |h_n,t)\), \(\gamma ^0_n(\cdot |h_n,t,\cdot )\) and \(\gamma ^1_n(\cdot |h_n)\) may be arbitrary.

The set of admissible control strategies is denoted by \(\mathcal U\). In what follows, we use the notation \(\gamma _{n}=(\gamma ^0_{n},\gamma ^1_{n})\). An admissible strategy u with \(u_{n}=\big ( \psi _{n},\pi _{n},\gamma ^0_{n},\gamma ^1_{n} \big )\) for \(n\in \mathbb {N}^{*}\) is called randomized stationary if there exist \(\psi \in \bar{\mathbb {M}}_{+}^{*}(\mathbf {X})\), \(\pi \in \mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\), a stochastic kernel \(\gamma ^{0}\) on \(\mathbf {Y}\) given \(\mathbf {X}\) with \(\gamma ^0(\cdot |x) \in \mathcal {P}^{\mathbf {Y}}(x)\) for any \(x\in \mathbf {X}\), and a stochastic kernel \(\gamma ^{1}\) on \(\mathbf {Y}\) given \(\mathbf {X}\) with \(\gamma ^1(\cdot |x) \in \mathcal {P}^{\mathbf {Y}^{*}}(x)\) for any \(x\in \mathbb {X}^{i}\), such that \(u_{0}(\cdot )=\gamma ^{0}(\cdot |x_{0})\), \(\psi _n(\cdot |h_n)=\delta _{\psi (\overline{x}(y_n))}(\cdot )\), \(\pi _n(\cdot |h_n,t)=\pi (\cdot |\overline{x}(y_n))\), \(\gamma ^{0}_{n}(\cdot |h_n,t,x)=\gamma ^{0}(\cdot |x)\), and \(\gamma ^{1}_{n}(\cdot |h_n)=\gamma ^{1}(\cdot |\overline{x}(y_n))\) when \(\overline{x}(y_{n})\in \mathbb {X}^{i}\).

Roughly speaking, \(\psi _{n}\) represents the conditional distribution of the time of the next possible intervention after \(T_{n}\); \(\pi _{n}\) is the usual continuous control, influencing the jump intensity q between \(T_{n}\) and \(T_{n+1}\); \(\gamma ^{0}_{n}\) is the distribution of the next intervention if it is decided to intervene immediately after a natural jump; and \(\gamma ^{1}_{n}\) is the distribution of the next intervention if it is decided to intervene before a natural jump.

Suppose a strategy \(u=(u_{n})_{n\in \mathbb {N}}\in \mathcal {U}\) is fixed with \(u_{n}=\big ( \psi _{n},\pi _{n},\gamma ^0_{n},\gamma ^1_{n} \big )\) for \(n\in \mathbb {N}^{*}\). We introduce the intensity of the natural jumps

$$\begin{aligned} \lambda _{n}(\Gamma _x,h_{n},t)= & {} \int _{\mathbf {A}^{g}} \bar{q}(\Gamma _x | \overline{x}(y_{n}),a) \pi _{n}(da | h_{n},t), \end{aligned}$$

and the rate of the natural jumps

$$\begin{aligned} \Lambda _{n}(\Gamma _{x},h_{n},t)= & {} \int _{]0,t[} \lambda _{n}(\Gamma _{x},h_{n},s) ds \end{aligned}$$

for any \(n\in \mathbb {N}^{*}\), \(\Gamma _{x}\in \mathcal {B}(\mathbf {X})\), \(h_{n}=(y_0,\theta _1,y_1,\ldots ,\theta _{n},y_{n})\in \mathbf {H}_{n}\) and \(t\in \bar{\mathbb {R}}_{+}^{*}\). Now, for any \(n\in \mathbb {N}^{*}\), the stochastic kernel \(G_{n}\) on \(\mathbf {Y}_{\infty }\times \overline{\mathbb {R}}_{+}^{*}\) given \(\mathbf {H}_{n}\) is defined by

$$\begin{aligned} G_{n}(\{+\infty \}\times \{y_{\infty }\} | h_{n})= & {} \delta _{y_{n}} (\{y_{\infty }\}) + \delta _{y_{n}} (\mathbf {Y}) e^{-\Lambda _{n}(\mathbf {X},h_{n},+\infty )}\psi _{n}(\{+\infty \}|h_n) \end{aligned}$$
(2)

and

$$\begin{aligned} G_{n}(\Gamma _{\Theta } \times \Gamma _{y}| h_{n})&= \delta _{y_{n}} (\mathbf {Y}) \Big [ \gamma _{n}^{1}(\Gamma _{y}| h_{n}) \int _{\Gamma _{\Theta }} e^{-\Lambda _{n}(\mathbf {X},h_{n},t)} \psi _{n}(dt | h_{n}) \nonumber \\&\,\quad + \int _{\Gamma _{\Theta }} \int _{\mathbf {X}} \psi _{n}([t,\infty ] | h_{n}) \gamma _{n}^{0}(\Gamma _{y}| h_{n},t,x) \lambda _{n}(dx,h_{n},t) e^{-\Lambda _{n}(\mathbf {X},h_{n},t)} dt \Big ], \end{aligned}$$
(3)

and

$$\begin{aligned} G_{n}(\{+\infty \} \times \Gamma _{y}| h_{n}) = 0, \end{aligned}$$
(4)

where \(\Gamma _{y}\in \mathcal {B}(\mathbf {Y})\), \(\Gamma _{\Theta }\in \mathcal {B}(\mathbb {R}_{+}^{*})\) and \(h_{n}=(y_0,\theta _1,y_1,\ldots ,\theta _{n},y_{n})\in \mathbf {H}_{n}\). Note that the kernel \(\gamma ^1_{n}\) does not appear in the formula for \(G_{n}\) if \(\overline{x}(y_{n})\notin \mathbb {X}^{i}\).

Consider an admissible strategy \(u\in \mathcal {U}\) and an initial state \(x_{0}\in \mathbf {X}\). Recalling that \(\Omega =\bigcup _{n=1}^\infty \Omega _{n}\bigcup \big ( \mathbf {Y}\times (\mathbb {R}_{+}^{*}\times \mathbf {Y})^\infty \big )\) and that \(\mathcal {F}\) denotes its associated Borel \(\sigma \)-algebra, Theorem 3.6 in [17] (or Remark 3.43 in [18] or [19]) implies the existence of a probability \(\mathbb {P}^{u}_{x_{0}}\) on \((\Omega ,\mathcal {F})\) such that the restriction of \(\mathbb {P}^{u}_{x_{0}}\) to \((\Omega ,\mathcal {F}_{0})\) is given by

$$\begin{aligned} \mathbb {P}^{u}_{x_{0}} \big ( \{Y_{0}\}\times \{0\} \times \Gamma _y \times (\bar{\mathbb {R}}_{+}^{*}\times \mathbf {Y}_{\infty })^{\infty } \big )= & {} u_{0}(\Gamma _y|x_{0}) \end{aligned}$$
(5)

for any \(\Gamma _y\in \mathcal {B}(\mathbf {Y})\) and the positive random measure \(\nu \) defined on \(\mathbb {R}_{+}^{*}\times \mathbf {Y}\) by

$$\begin{aligned} \nu (dt,dy)= \sum _{n\in \mathbb {N}^{*}} \frac{G_{n}(dt-T_{n}, dy | H_{n})}{G_{n}([t-T_{n},+\infty ]\times \mathbf {Y}_{\infty } | H_{n})} I_{\{T_{n}< t \le T_{n+1}\}} \end{aligned}$$
(6)

is the predictable projection of \(\mu \) with respect to \(\mathbb {P}^{u}_{x_{0}}\).

Remark 2.1

Observe that \(\mathcal {F}_{T_{n}}\) is the \(\sigma \)-algebra generated by the random variable \(H_{n}\) for \(n\in \mathbb {N}^{*}\). The conditional distribution of \((Y_{n+1},\Theta _{n+1})\) given \(\mathcal {F}_{T_{n}}\) under \(\mathbb {P}^{u}_{x_{0}}\) is determined by \(G_{n}(\cdot | H_{n})\) and the conditional survival function of \(\Theta _{n+1}\) given \(\mathcal {F}_{T_{n}}\) under \(\mathbb {P}^{u}_{x_{0}}\) is given by \(G_{n}([t,+\infty ]\times \mathbf {Y}_{\infty }| H_{n})\).

3 Optimization Problem and Assumptions

The objective of this section is to introduce the infinite-horizon performance criteria we are concerned with. Next, we state our assumptions on the parameters of the model. Finally, under these hypotheses, we recall at the end of this section a technical result from [8] providing a decomposition of the predictable projection \(\nu \) of the measure \(\mu \) in terms of the distributions \((\gamma ^{0}_{n})_{n\in \mathbb {N}^{*}}\) and \((\gamma ^{1}_{n})_{n\in \mathbb {N}^{*}}\) of the next intervention.

We consider an optimization problem with \(p\in \mathbb {N}\) constraints where the performance and the constraint criteria are given in terms of infinite-horizon discounted functionals. In order to define these criteria, we need to introduce the cost rates \(\big (C^{g}_{j}\big )_{j\in \mathbb {N}_{p}}\) associated with continuous actions. For any \(j\in \mathbb {N}_{p}\), the real-valued mapping \(C^{g}_{j}\) is defined on \(\mathbb {K}^{g}\). The costs \(\big (C^{i}_{j}\big )_{j\in \mathbb {N}_{p}}\) associated with an intervention \(y=(x_{0},a_{0},x_{1},a_{1},\ldots )\in \mathbf {Y}\) are given by \(C^{i}_{j}(y)=\sum _{k\in \mathbb {N}} c^{i}_{j}(x_k,a_k)\), where for any \(j\in \mathbb {N}_{p}\), \(c^{i}_{j}\) is a non-negative real-valued mapping defined on \(\mathbb {K}^{i}_{\Delta }\) satisfying \(c^{i}_{j}(x,a)=0\) if \((x,a)\notin \mathbb {K}^{i}\). For any \((x,a)\in \mathbb {K}^{i}\) and \(j\in \mathbb {N}_{p}\), \(c^{i}_{j}(x,a)\) corresponds to the cost associated with a single jump at \(x\in \mathbf {X}\) resulting from the impulsive action \(a\in \mathbf {A}^{i}(x)\). The cost associated with a randomized intervention \(\rho \in \mathcal {P}^{\mathbf {Y}}(x)\) for \(x\in \mathbf {X}\) is given by \(\int _{\mathbf {Y}} C^{i}_{j}(y)\rho (dy|x)\) for any \(j\in \mathbb {N}_{p}\). Therefore, the infinite-horizon discounted performance criteria corresponding to an admissible control strategy \(u\in \mathcal{U}\) are defined by

$$\begin{aligned} \mathcal {V}_{j}(u,x_{0})&= \int _{\mathbf {Y}} C^{i}_{j}(y) u_{0}(dy|x_{0}) + \mathbb {E}^{u}_{x_{0}} \Bigg [ \int _{0}^{+\infty } e^{-\alpha s} \int _{\mathbf {A}^{g}} C^{g}_{j}(\overline{x}(\xi _{s-}),a) \pi (da |s) ds \Bigg ] \nonumber \\&\,\quad + \mathbb {E}^{u}_{x_{0}} \Bigg [ \int _{]0,\infty [\times {\mathbf {Y}}} e^{-\alpha s} C^{i}_{j}(y) \mu (ds,dy) \Bigg ], \end{aligned}$$
(7)

for any \(j\in \mathbb {N}_{p}\). In the previous expression, \(\alpha >0\) is the discount factor. Note that the performance criteria are well defined under Assumption A imposed below.
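For intuition only: in a finite-state model with no impulsive actions and a fixed stationary policy, criterion (7) reduces to \(V(x)=\mathbb{E}_x\int_0^\infty e^{-\alpha s}C^g(\xi_s)\,ds\), which solves the linear system \(\alpha V = c + QV\) for the rate matrix Q induced by the policy. The two-state generator and cost rates below are toy assumptions, not data from the paper.

```python
import numpy as np

# Hedged toy example: discounted cost of a fixed stationary policy in a
# two-state CTMDP without impulses.  V solves alpha*V = c + Q V, i.e.
# V = (alpha*I - Q)^{-1} c.  Q and c are illustrative assumptions.

alpha = 0.5
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])   # conservative rate matrix: rows sum to zero
c = np.array([1.0, 3.0])      # cost rates under the policy

V = np.linalg.solve(alpha * np.eye(2) - Q, c)
# Each V(x) lies between min(c)/alpha and max(c)/alpha, here between 2 and 6.
```

This is the standard resolvent formula for the discounted cost; the impulsive terms of (7) have no counterpart in this reduced setting.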

Definition 3.1

The constrained optimization problem consists in minimizing \(\mathcal {V}_{0}(u,x_{0})\) over the class of admissible strategies \(u\in \mathcal {U}\) satisfying \(\mathcal {V}_{j}(u,x_{0}) \le B_{j}\) for any \(j\in \mathbb {N}_{p}^{*}\), where \(x_{0}\) is the initial state and \((B_{j})_{j\in \mathbb {N}^{*}_{p}}\) are nonnegative real numbers representing the constraint bounds. The class of feasible strategies will be denoted by \(\displaystyle \mathcal {U}^{f} = \Big \{ u\in \mathcal {U}: \mathcal {V}_{j}(u,x_{0}) \le B_{j}, \text { for any } j\in \mathbb {N}_{p}^{*} \Big \}\).

Assumption A

There exists a constant \(K\in \mathbb {R}_{+}\) such that for any \(x\in \mathbf {X}\), \(a^g\in \mathbf {A}^{g}(x)\), \(a^i\in \mathbf {A}^{i}(x)\) and \(j\in \mathbb {N}_{p}\)

  1. (A1)

    \(\bar{q}(\mathbf {X}|x,a^g)\le K\).

  2. (A2)

    \(C^{g}_{j}(x,a^g)\ge 0\).

  3. (A3)

    \(c^{i}_{j}(x,a^i)\ge 0\).

Remark 3.1

It must be emphasized that Assumption (A2) can be replaced by the following apparently weaker condition \(C^g_j(x,a^g)\ge -K\).

The purpose of the next assumption is to rule out infinitely many simultaneous interventions. This is a classical hypothesis in the framework of impulsive control problems; see for example [6].

Assumption B

There exists a constant \(\underline{c}>0\) such that \(\sum _{j\in \mathbb {N}_{p}} c^{i}_{j}(x,a)\ge \underline{c}\) for any \((x,a)\in \mathbb {K}^{i}\).

Finally, let us recall the following technical result from [8, Lemma 3.1].

Lemma 3.2

The predictable projection of the random measure \(\mu \) is given by \(\nu =\nu _{0}+\nu _{1}\) where

$$\begin{aligned} \nu _{0}(\Gamma ,\Gamma _{y}) = \int _{\Gamma }\int _{\mathbf {A}^{g}}\int _\mathbf {X} \gamma ^{0}(\Gamma _{y}|x,s) \bar{q}(dx | \overline{x}(\xi _{s-}),a) \pi (da|s) ds, \end{aligned}$$
$$\begin{aligned} \nu _{1}(\Gamma ,\Gamma _{y}) = \sum _{n\in \mathbb {N}^{*}} \gamma _{n}^{1}(\Gamma _{y}| H_{n}) \int _{\Gamma } I_{\{T_{n}< s \le T_{n+1}\}} \frac{\psi _{n}(ds-T_{n} | H_{n})}{\psi _{n}([s-T_{n},+\infty ] | H_{n})}, \end{aligned}$$

with \(\displaystyle \gamma ^{0}(dy|x,t)=\sum _{n\in \mathbb {N}^{*}} I_{\{T_{n}< t \le T_{n+1}\}} \gamma ^{0}_{n}(dy|H_{n},t-T_n,x)\) and \(\displaystyle \pi (da|t)=\sum _{n\in \mathbb {N}^{*}} I_{\{T_{n}< t \le T_{n+1}\}} \pi _{n}(da|H_{n},t-T_{n})\), for any \(\Gamma \in \mathcal {B}(\mathbb {R}_{+}^{*})\), \(\Gamma _{y}\in \mathcal {B}(\mathbf {Y})\) and \(t\in \mathbb {R}_{+}\).

4 Occupation Measures and Their Properties

In this section, we introduce the definition of an occupation measure \(\eta _{u}\) induced by a control strategy \(u\in \mathcal {U}\) (see Definition 4.2). The objective of this section is twofold. First, it is shown in Theorem 4.6 that for any control strategy \(u\in \mathcal {U}\), the corresponding occupation measure \(\eta _{u}\) satisfies a linear equation depending on the kernels q and Q; a measure satisfying such an equation will be called admissible. Second, from any admissible measure \(\eta \) one can construct a randomized stationary control strategy \(u\in \mathcal {U}\) such that the restrictions of \(\eta \) and \(\eta _{u}\) to \(\mathbb {K}^{g}\) are equal and the restriction of \(\eta _{u}\) to \(\mathbb {K}^{i}\) is smaller than the restriction of \(\eta \) to \(\mathbb {K}^{i}\) (see Theorem 4.9). These two results will play an important role in the next section to establish a connection between the constrained optimal control problem and the linear program given in Definition 5.1. Their proofs rely on auxiliary results, which are deferred to Appendices 1 and 2 to streamline the presentation.

Definition 4.1

To any \(\gamma \in \mathcal {M}(\mathbf {Y})\), we associate the measure \(\widetilde{\gamma } \in \mathcal {M}(\mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta })\) defined by \(\displaystyle \widetilde{\gamma }(\Gamma ) = \sum _{j=1}^{\infty } \gamma \big (\{\mathbf {y}\in \mathbf {Y}: \mathbf {y}_{j}\in \Gamma \} \big )\), for any \(\Gamma \in \mathcal {B}(\mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta })\) and where \(\mathbf {y}_{j}\) is the jth coordinate of \(\mathbf {y}\in \mathbf {Y}\subset (\mathbb {K}^{i}_{\Delta })^{\infty }\). Similarly, if R is a stochastic kernel on \(\mathbf {Y}\) given a Borel space \(\mathbf {Z}\) then \(\widetilde{R}\) is a kernel on \(\mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta }\) given \(\mathbf {Z}\) defined by \(\displaystyle \widetilde{R}(\Gamma |z) = \sum _{j=1}^{\infty } R\big (\{\mathbf {y}\in \mathbf {Y}: \mathbf {y}_{j}\in \Gamma \} | z\big )\), for any \(z\in \mathbf {Z}\) and \(\Gamma \in \mathcal {B}(\mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta })\).

These definitions will be naturally extended to probability measures defined on \((\mathbb {K}^{i}_{\Delta })^{\infty }\) and to stochastic kernels on \((\mathbb {K}^{i}_{\Delta })^{\infty }\) given a Borel space \(\mathbf {Z}\). Now, we introduce the definition of an occupation measure \(\eta _{u}\) induced by an admissible control strategy u.

Definition 4.2

For a strategy \(u=(u_{n})_{n\in \mathbb {N}}\in \mathcal {U}\), let us introduce the measures \(\eta _{u}^{g}\) (respectively, \(\mu _{u}^{i}\) and \(\eta _{u}^{i}\)) defined on \(\mathbf {X}\times \mathbf {A}^{g}\) (respectively, \(\mathbf {Y}\) and \(\mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta }\)) by

$$\begin{aligned} \eta _{u}^{g}(dx,da)= & {} \alpha \mathbb {E}^{u}_{x_{0}} \Bigg [ \int _{0}^{T_{\infty }} e^{-\alpha s} \delta _{\bar{x}(\xi _{s-})}(dx) \pi (da|s) ds \Bigg ], \end{aligned}$$
(8)
$$\begin{aligned} \mu _{u}^{i}(dy)= & {} \mathbb {E}^{u}_{x_{0}} \Bigg [ \int _{]0,T_{\infty }[} e^{-\alpha s} \mu (ds,dy) \Bigg ] + u_{0}(dy|x_{0}), \end{aligned}$$
(9)

and

$$\begin{aligned} \eta _{u}^{i}(dx,da)= & {} \widetilde{\mu }_{u}^{i}(dx,da). \end{aligned}$$
(10)

Note that the measure \(\eta _{u}^{g}\) is supported on \(\mathbb {K}^{g}\) and is clearly finite for any \(u\in \mathcal {U}\), while the measure \(\eta _{u}^{i}\) is supported on \(\mathbb {K}^{i}_{\Delta }\). Then, the measure \(\eta _{u}\) defined on \(\mathbf {X}\times \mathbf {A}\) by

$$\begin{aligned} \eta _{u}(\Gamma )=\eta _{u}^{g}(\Gamma \mathop {\cap }\mathbb {K}^{g})+\eta _{u}^{i}(\Gamma \mathop {\cap }\mathbb {K}^{i}), \end{aligned}$$
(11)

for any \(\Gamma \in \mathcal {B}(\mathbf {X}\times \mathbf {A})\) is called the occupation measure of the controlled process induced by the control strategy u. Clearly, the measure \(\eta _{u}\) is supported on \(\mathbb {K}^{g}\mathop {\cup }\mathbb {K}^{i}\).

The infinite-horizon discounted performance criteria corresponding to an admissible control strategy \(u\in \mathcal{U}\) satisfying \(\mathbb {P}^{u}_{x_{0}}(T_{\infty }=+\infty )=1\) can be written in terms of the measure \(\eta _{u}\) as follows

$$\begin{aligned} \mathcal {V}_{j}(u,x_{0})&= \eta _{u}(C_{j}) \end{aligned}$$
(12)

with \(C_{j}(x,a)=\frac{1}{\alpha }C^{g}_{j}(x,a)I_{\mathbb {K}^{g}}(x,a)+c^{i}_{j}(x,a) I_{\mathbb {K}^{i}}(x,a)\), for \(j\in \mathbb {N}_{p}\).
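To illustrate identity (12) in the simplest setting, consider again a finite model with no impulsive actions: the marginal of the occupation measure (8) is then \(\widehat\eta^{\,g} = \alpha\, e_{x_0}^{\top}(\alpha I - Q)^{-1}\), and pairing it with \(C_0 = C^g_0/\alpha\) recovers the discounted cost. The numerical data below are toy assumptions.

```python
import numpy as np

# Hedged toy check of (12): with no impulses, eta_hat(x) is alpha times the
# expected discounted time spent in state x, and
#   eta_u(C_0) = sum_x eta_hat(x) * c(x)/alpha
# equals the discounted cost V(x0).  Q, c, x0 are illustrative assumptions.

alpha = 0.5
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
c = np.array([1.0, 3.0])
x0 = 0

e0 = np.zeros(2)
e0[x0] = 1.0
eta_hat = alpha * np.linalg.solve((alpha * np.eye(2) - Q).T, e0)  # occupation marginal
V = np.linalg.solve(alpha * np.eye(2) - Q, c)                     # discounted cost
lhs = eta_hat @ (c / alpha)                                       # eta_u(C_0)
```

The normalization \(\widehat\eta^{\,g}(\mathbf{X})=1\), used later in the proof of Theorem 4.9, also holds in this sketch.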

In order to prove the first main result of this section, Theorem 4.6, we need two intermediate results: Propositions 4.3 and 4.4. For clarity of exposition, their proofs are presented in Appendix 1. Roughly speaking, these two technical results establish links between the measures \(\eta ^{g}_{u}\) and \(\eta ^{i}_{u}\).

Proposition 4.3

Consider a fixed strategy \(u=(u_{n})_{n\in \mathbb {N}}\in \mathcal {U}\) with \(u_{n}=\big ( \psi _{n},\pi _{n},\gamma ^0_{n},\gamma ^1_{n} \big )\) for \(n\in \mathbb {N}^{*}\), satisfying \(\eta ^{i}_{u}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta })<\infty \). Then, for any \(\Gamma \in \mathcal {B}(\mathbf {X})\),

$$\begin{aligned} \eta _{u}^{g}(\Gamma \times \mathbf {A}^{g})&= \eta _{u}^{i}(\Gamma \times \{\Delta \}) - \frac{1}{\alpha } \int _{\mathbf {X}\times \mathbf {A}^{g}} I_{\Gamma } (x) \bar{q}(\mathbf {X}| x,a) \eta _{u}^{g}(dx,da) \nonumber \\&\,- \mathbb {E}^{u}_{x_{0}} \Bigg [ \sum _{n\in \mathbb {N}^{*}} e^{-\alpha T_{n}} I_{\Gamma }(\overline{x}(Y_{n})) \int _{]0,\infty [} e^{-\alpha s} \psi _{n}(ds | H_{n}) \Bigg ]. \end{aligned}$$
(13)

Proof

See Appendix 1. \(\square \)

Proposition 4.4

Consider a fixed strategy \(u=(u_{n})_{n\in \mathbb {N}}\in \mathcal {U}\) with \(u_{n}=\big ( \psi _{n},\pi _{n},\gamma ^0_{n},\gamma ^1_{n} \big )\) for \(n\in \mathbb {N}^{*}\), satisfying \(\mathbb {P}^{u}_{x_{0}}(T_{\infty }=+\infty )=1\). Then, for any \(\Gamma \in \mathcal {B}(\mathbf {X}_{\Delta })\)

$$\begin{aligned} \eta _{u}^{i}(\Gamma \times \mathbf {A}^{i}_{\Delta })&= \delta _{x_{0}}(\Gamma ) + \int _{\mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta }} Q_{\Delta }(\Gamma | z,b) \eta _{u}^{i}(dz,db)\nonumber \\&\quad + \frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} \bar{q}(\Gamma \mathop {\cap }\mathbf {X} | x,a) \eta _{u}^{g}(dx,da) \nonumber \\&\,+\mathbb {E}^{u}_{x_{0}} \Bigg [ \sum _{n\in \mathbb {N}^{*}} e^{-\alpha T_{n}} I_{\Gamma }(\overline{x}(Y_{n})) \int _{]0,\infty [} e^{-\alpha s} \psi _{n}(ds | H_{n}) \Bigg ]. \end{aligned}$$
(14)

Proof

See Appendix 1. \(\square \)

Remark 4.1

Observe that in the previous result, if we consider \(\Gamma \) in \(\mathcal {B}(\mathbf {X})\) then Eq. (14) becomes

$$\begin{aligned} \eta _{u}^{i}(\Gamma \times \mathbf {A}^{i}_{\Delta })&= \delta _{x_{0}}(\Gamma ) + \int _{\mathbf {X}\times \mathbf {A}^{i}} Q_{\Delta }(\Gamma | z,b) \eta _{u}^{i}(dz,db)\nonumber \\&\quad + \frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} \bar{q}(\Gamma | x,a) \eta _{u}^{g}(dx,da)\nonumber \\&\,+\mathbb {E}^{u}_{x_{0}} \Bigg [ \sum _{n\in \mathbb {N}^{*}} e^{-\alpha T_{n}} I_{\Gamma }(\overline{x}(Y_{n})) \int _{]0,\infty [} e^{-\alpha s} \psi _{n}(ds | H_{n}) \Bigg ], \end{aligned}$$
(15)

since \(Q_{\Delta }(\Gamma | x,a)=0\) for any \((x,a)\notin \mathbb {K}^{i}\).

Definition 4.5

A measure \(\rho \in \mathcal {M}_{f}(\mathbf {X}\times \mathbf {A})\) is said to be admissible if \(\rho \) is concentrated on \(\mathbb {K}^{g}\mathop {\cup }\mathbb {K}^{i}\) and

$$\begin{aligned} \rho (\Gamma \times \mathbf {A})&= \delta _{x_{0}}(\Gamma ) + \int _{\mathbf {X}\times \mathbf {A}^{i}} Q_{\Delta }(\Gamma | z,b) \rho (dz,db) + \frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} q(\Gamma | x,a) \rho (dx,da), \end{aligned}$$
(16)

for any \(\Gamma \in \mathcal {B}(\mathbf {X})\).

The next result shows that any occupation measure is admissible.

Theorem 4.6

Consider a fixed strategy \(u=(u_{n})_{n\in \mathbb {N}}\in \mathcal {U}\) with \(u_{n}=\big ( \psi _{n},\pi _{n},\gamma ^0_{n},\gamma ^1_{n} \big )\) for \(n\in \mathbb {N}^{*}\), satisfying \(\eta ^{i}_{u}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta })<\infty \). Then, the measure \(\eta _{u}\) is admissible.

Proof

First of all, recall that \(\eta _{u}\) is finite and notice that \(q(\Gamma |x,a)=\bar{q}(\Gamma |x,a)-I_{\Gamma }(x)\bar{q}(\mathbf {X}|x,a)\) for any \(\Gamma \in \mathcal {B}(\mathbf {X})\) and \((x,a)\in \mathbb {K}^{g}\). Now consider \(\Gamma \in \mathcal {B}(\mathbf {X})\); then adding Eqs. (13) and (15) yields

$$\begin{aligned} \eta _{u}(\Gamma \times \mathbf {A})= & {} \eta _{u}^{g}(\Gamma \times \mathbf {A}^{g}) + \eta _{u}^{i}(\Gamma \times \mathbf {A}^{i}) \nonumber \\= & {} \delta _{x_{0}}(\Gamma ) + \int _{\mathbf {X}\times \mathbf {A}^{i}} Q_{\Delta }(\Gamma | z,b) \eta _{u}^{i}(dz,db) + \frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} q(\Gamma | x,a) \eta _{u}^{g}(dx,da) \nonumber \\= & {} \delta _{x_{0}}(\Gamma ) + \int _{\mathbf {X}\times \mathbf {A}^{i}} Q_{\Delta }(\Gamma | z,b) \eta _{u}(dz,db) + \frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} q(\Gamma | x,a) \eta _{u}(dx,da), \end{aligned}$$

showing the result. \(\square \)
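As a sanity check of Theorem 4.6 in a toy setting (finite state space, no impulsive actions, with a rate matrix assumed purely for illustration), the admissibility equation (16) reduces to \(\rho = \delta_{x_0} + \frac{1}{\alpha}Q^{\top}\rho\) in matrix form, and the occupation-measure marginal solves it:

```python
import numpy as np

# Hedged sketch: with no impulsive actions, Eq. (16) reads, in matrix form,
#   rho = e_{x0} + (1/alpha) * Q^T rho,
# and the occupation-measure marginal rho = alpha * ((alpha*I - Q)^T)^{-1} e_{x0}
# solves it, in line with Theorem 4.6.  Q, alpha, x0 are toy assumptions.

alpha = 0.5
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
x0 = 0
e0 = np.zeros(2)
e0[x0] = 1.0

rho = alpha * np.linalg.solve((alpha * np.eye(2) - Q).T, e0)
residual = rho - (e0 + (Q.T @ rho) / alpha)   # should vanish
```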

Let \(\varphi \) be a randomized stationary policy for the model \(\mathcal {M}^{i}\). Introduce the set

$$\begin{aligned} \mathcal {S}_{\varphi }= \big \{ x\in \mathbf {X}: \widetilde{P}^{\varphi }(\mathbf {X} \times \{\Delta \}|x)=1 \big \}, \end{aligned}$$
(17)

and the stochastic kernel \(R^{\varphi }\) on \(\mathbf {Y}\) given \(\mathbf {X}\) by

$$\begin{aligned} R^{\varphi }(dy| x)= P^{\varphi }(dy|x) I_{\mathcal {S}_{\varphi }}(x)+\delta _{(x,\Delta ,\Delta ,\ldots )}(dy) I_{\mathcal {S}_{\varphi }^{c}}(x). \end{aligned}$$
(18)

We introduce now a special class of randomized stationary control strategies.

Definition 4.7

Consider \(\pi \in \mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\) and \(\varphi \in \mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\). The strategy \(u^{\pi ,\varphi }=(u^{\pi ,\varphi }_{n})_{n\in \mathbb {N}}\) is defined by \(u^{\pi ,\varphi }_{0}(\cdot )=R^{\varphi }(\cdot |x_{0})\) and by \(u^{\pi ,\varphi } _{n}=\big ( \psi _{n},\pi _{n},\gamma ^0_{n},\gamma ^1_{n} \big )\) for \(n\in \mathbb {N}^{*}\), where for any \(x\in \mathbf {X}\), \(t\in \mathbb {R}_{+}\) and \(h_{n}=(y_0,\theta _1,y_1,\ldots ,\theta _{n},y_{n})\in \mathbf {H}_{n}\): \(\psi _{n}(\cdot |h_{n})=\delta _{+\infty }(\cdot )\), \(\pi _{n}(\cdot |h_{n},t)=\pi (\cdot |\bar{x}(y_{n}))\), and \(\gamma ^0_{n}(\cdot |h_{n},t,x)=R^{\varphi }(\cdot |x)\). Finally, \(\gamma ^{1}_{n}\) is defined by \(\gamma ^{1}_{n}(\cdot |h_{n})=\gamma ^{1}(\cdot |\overline{x}(y_{n}))\) where \(\gamma ^{1}\) is an arbitrary stochastic kernel on \(\mathbf {Y}\) given \( \mathbf {X}\) satisfying \(\gamma ^{1}(\cdot |x)\in \mathcal {P}^{\mathbf {Y}^{*}}(x)\) for \(x\in \mathbb {X}^{i}\) and \(\gamma ^{1}(\cdot |x)=\delta _{(x,\Delta ,\Delta ,\ldots )}(\cdot )\) otherwise.

Clearly, the strategy so defined satisfies \(u^{\pi ,\varphi } \in \mathcal {U}\). The following proposition provides important properties of the measures \(\eta _{u^{\pi ,\varphi }}^{g}\) and \(\eta _{u^{\pi ,\varphi }}^{i}\) that will be needed in the proof of Theorem 4.9.

Proposition 4.8

Consider \(\pi \in \mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\) and \(\varphi \in \mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\). Then

$$\begin{aligned} \eta _{u^{\pi ,\varphi }}^{i}(dx,da) =&\Big [ \delta _{x_{0}}+ \frac{1}{\alpha } \widehat{\eta }_{u^{\pi ,\varphi }}^{g} \bar{q}^{\pi } \Big ] \widetilde{R}^{\varphi }(dx,da), \end{aligned}$$
(19)

and for any \(\Gamma \in \mathcal {B}(\mathbf {X})\)

$$\begin{aligned} \widehat{\eta }_{u^{\pi ,\varphi }}^{g} (\Gamma )=&\frac{1}{\alpha } \widehat{\eta }_{u^{\pi ,\varphi }}^{g} r^{\pi }_{\varphi }(\Gamma ) +\widetilde{R}^{\varphi } (\Gamma \times \{\Delta \}|x_{0}), \end{aligned}$$
(20)

where \(\widehat{\eta }_{u^{\pi ,\varphi }}^{g}(dx)\) denotes \(\eta _{u^{\pi ,\varphi }}^{g}(dx, \mathbf {A}^{g})\) and the transition rate \(r_{\varphi }^{\pi }\) on \(\mathbf {X}\) given \(\mathbf {X}\) is defined by

$$\begin{aligned} r_{\varphi }^{\pi }(\Gamma | x)&= \bar{q}^{\pi } \widetilde{R}^{\varphi } (\Gamma \times \{\Delta \}) - I_{\Gamma }(x) \bar{q}^{\pi }(\mathbf {X}|x). \end{aligned}$$
(21)

Proof

See Appendix 2. \(\square \)

The following theorem is the second main result of this section. Roughly speaking, it can be seen as a converse of Theorem 4.6. In particular, it shows that an admissible measure \(\eta \) is not necessarily an occupation measure, but one can construct from \(\eta \) a randomized stationary control strategy u such that the corresponding occupation measure \(\eta _{u}\) is smaller than \(\eta \) (see Eq. 26).

Theorem 4.9

Let \(\eta \) be an admissible measure. Let us define the measure \(\eta ^{g}\) on \(\mathbf {X}\times \mathbf {A}^{g}\) by

$$\begin{aligned} \eta ^{g}(\Gamma ) = \eta (\Gamma \mathop {\cap }\mathbb {K}^{g}), \end{aligned}$$
(22)

for any \(\Gamma \in \mathcal {B}(\mathbf {X}\times \mathbf {A}^{g})\) and the measure \(\eta ^{i}\) on \(\mathbf {X}\times \mathbf {A}^{i}_{\Delta }\) by

$$\begin{aligned} \eta ^{i}(\Gamma ) = \eta (\Gamma \mathop {\cap }\mathbb {K}^{i}), \end{aligned}$$
(23)

for any \(\Gamma \in \mathcal {B}(\mathbf {X}\times \mathbf {A}^{i})\) and

$$\begin{aligned} \eta ^{i}(\Gamma \times \{\Delta \}) = \frac{1}{\alpha } \int _{\mathbf {X}\times \mathbf {A}^{g}} I_{\Gamma } (x) \bar{q}(\mathbf {X}| x,a) \eta ^{g}(dx,da)+\eta ^{g}(\Gamma \times \mathbf {A}^{g}), \end{aligned}$$

for any \(\Gamma \in \mathcal {B}(\mathbf {X})\). Then, there exist a stochastic kernel \(\pi \in \mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\) satisfying

$$\begin{aligned} \eta ^{g}(\Gamma )=\int _{\Gamma } \pi (da|x) \eta ^{g}(dx,\mathbf {A}^{g}), \end{aligned}$$
(24)

for any \(\Gamma \in \mathcal {B}(\mathbf {X}\times \mathbf {A}^{g})\) and a stochastic kernel \(\varphi \in \mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\) satisfying

$$\begin{aligned} \eta ^{i}(\Gamma )=\int _{\Gamma } \varphi (da|x) \eta ^{i}(dx,\mathbf {A}^{i}_{\Delta }) \end{aligned}$$
(25)

for any \(\Gamma \in \mathcal {B}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta })\), with \(\varphi (\{\Delta \}|\Delta )=1\). Moreover,

$$\begin{aligned} \eta ^{g}_{u^{\pi ,\varphi }}=\eta ^{g} \text{ and } \eta ^{i}_{u^{\pi ,\varphi }}\le \eta ^{i} \end{aligned}$$
(26)

and

$$\begin{aligned} \eta _{u^{\pi ,\varphi }}^{i} = \Big [ \delta _{x_{0}}+ \frac{1}{\alpha } \eta _{u^{\pi ,\varphi }}^{g} \bar{q} \Big ] \widetilde{P}^{\varphi }. \end{aligned}$$
(27)

Proof

First, notice that \(\eta ^{i}\in \mathcal {M}_{f}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta })\) and since \(Q_{\Delta }(\mathbf {X}|x,a)=1\) for any \((x,a)\in \mathbb {K}^{i}\) and \(q(\mathbf {X}|x,a)=0\) for any \((x,a)\in \mathbb {K}^{g}\), we obtain from Eq. (16) that \(\eta ^{g}\in \mathcal {P}(\mathbf {X}\times \mathbf {A}^{g})\). Consequently, Proposition D.8 in [13, p. 184] ensures the existence of \(\pi \in \mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\) and \(\varphi \in \mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\) satisfying respectively Eqs. (24) and (25). Consider an arbitrary set \(\Gamma \in \mathcal {B}(\mathbf {X})\). For notational convenience, let us denote \(\eta ^{i}(\Gamma \times \mathbf {A}^{i}_{\Delta })\) by \( \widehat{\eta }^{i}(\Gamma )\) and \(\eta ^{g}(\Gamma \times \mathbf {A}^{g})\) by \(\widehat{\eta }^{g}(\Gamma )\). Observe that \(Q^{\varphi }_{\Delta }(\Gamma | \{\Delta \} )=0\). By using the definitions of \(\eta ^{g}\) and \(\eta ^{i}\) and Eq. (16) which is satisfied by \(\eta \), it is easy to see that

$$\begin{aligned} \widehat{\eta }^{g}(\Gamma )= & {} \eta ^{i}(\Gamma \times \{\Delta \}) - \frac{1}{\alpha } \int _{\mathbf {X}\times \mathbf {A}^{g}} I_{\Gamma } (x) \bar{q}^{\pi }(\mathbf {X}| x,a) \widehat{\eta }^{g}(dx),\end{aligned}$$
(28)
$$\begin{aligned} \widehat{\eta }^{i}(\Gamma )= & {} \delta _{x_{0}}(\Gamma ) + \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi }(\Gamma ) + \int _{\mathbf {X}} Q^{\varphi }_{\Delta }(\Gamma | z) \widehat{\eta }^{i}(dz) \end{aligned}$$
(29)

for any \(\Gamma \in \mathcal {B}(\mathbf {X})\) by recalling that \(q(\Gamma |x,a)=\bar{q}(\Gamma |x,a)-I_{\Gamma }(x)\bar{q}(\mathbf {X}|x,a)\). Consequently,

$$\begin{aligned} \widehat{\eta }^{i}(\Gamma )&= \delta _{x_{0}}(\Gamma ) + \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi }(\Gamma ) + \widehat{\eta }^{i} I_{X}Q^{\varphi }_{\Delta }(\Gamma ). \end{aligned}$$
(30)

Moreover, \(I_{X}Q^{\varphi }_{\Delta } I_{X}=Q^{\varphi }_{\Delta } I_{X}\) and therefore, by iterating Eq. (30) we have

$$\begin{aligned} \int _{\mathbf {X}} \sum _{k=0}^{n} \big (Q^{\varphi }_{\Delta }\big )^{k}(\Gamma | x) \Big [ \delta _{x_{0}}(dx)+ \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi }(dx) \Big ] \le \widehat{\eta }^{i}(\Gamma ) \end{aligned}$$
(31)

for any \(n\in \mathbb {N}\). Observe that \(\displaystyle \sum _{k=0}^{\infty } \big (Q^{\varphi }_{\Delta }\big )^{k}(\Gamma |x)=\widetilde{P}^{\varphi }(\Gamma \times \mathbf {A}^{i}_{\Delta }|x)\). For notational convenience, let us denote by \(\rho \) the measure \(\Big [ \delta _{x_{0}}+ \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi } \Big ] \widetilde{P}^{\varphi } \in \mathcal {M}(\mathbf {X}_{\Delta }\times \mathbf {A}^{i}_{\Delta })\), which is concentrated on \(\mathbb {K}^{i}_{\Delta }\). Applying the monotone convergence theorem and letting n tend to infinity in Eq. (31), it follows that

$$\begin{aligned} \rho (\Gamma \times \mathbf {A}^{i}_{\Delta }) \le \widehat{\eta }^{i}(\Gamma ). \end{aligned}$$
(32)

By using the fact that \(\widetilde{P}^{\varphi }(dz,db | x)=\varphi (db|z) \widetilde{P}^{\varphi }(dz\times \mathbf {A}^{i}_{\Delta }|x)\) for any \(x\in \mathbf {X}\), it follows that

$$\begin{aligned} \rho (dz,db) = \varphi (db|z) \rho (dz, \mathbf {A}^{i}_{\Delta }). \end{aligned}$$
(33)

Combining Eqs. (25) and (32)–(33) yields

$$\begin{aligned} \rho \le \eta ^{i}. \end{aligned}$$
(34)

Clearly, the measure \(\rho \) belongs to \(\mathcal {M}_{f}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta })\) and satisfies

$$\begin{aligned} \rho (\Gamma \times \mathbf {A}^{i}_{\Delta })&= \delta _{x_{0}}(\Gamma ) + \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi }(\Gamma ) + \int _{\mathbf {X}} Q^{\varphi }_{\Delta }(\Gamma | z) \rho (dz,\mathbf {A}^{i}_{\Delta }) \end{aligned}$$

and so, recalling (33)

$$\begin{aligned} \rho (\Gamma \times \mathbf {A}^{i}_{\Delta })&= \delta _{x_{0}}(\Gamma ) + \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi }(\Gamma ) + \int _{\mathbf {X}\times \mathbf {A}^{i}} Q_{\Delta }(\Gamma | z,b) \rho (dz,db) . \end{aligned}$$

Since \(Q_{\Delta }(\mathbf {X} | z,b)=I_{\mathbb {K}^{i}}(z,b)\) and \(\rho (\mathbb {K}^{i})= \rho (\mathbf {X}\times \mathbf {A}^{i})\), we obtain from the previous equation

$$\begin{aligned} \rho (\mathbf {X}\times \{\Delta \}) + \rho (\mathbf {X}\times \mathbf {A}^{i})&= \delta _{x_{0}}(\mathbf {X}) + \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi }(\mathbf {X}) + \rho (\mathbf {X}\times \mathbf {A}^{i}) \end{aligned}$$

showing

$$\begin{aligned} \rho (\mathbf {X}\times \{\Delta \})&= \delta _{x_{0}}(\mathbf {X}) + \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi }(\mathbf {X}). \end{aligned}$$

Moreover, from the definition of \(\rho \) we obtain that

$$\begin{aligned} \int _{\mathbf {X}} \big [ \widetilde{P}^{\varphi }(\mathbf {X}\times \{\Delta \} | x) -1 \big ] \delta _{x_{0}}(dx) + \int _{\mathbf {X}} \big [ \widetilde{P}^{\varphi }(\mathbf {X}\times \{\Delta \} | x) - 1 \big ] \widehat{\eta }^{g} \bar{q}^{\pi } (dx)=0. \end{aligned}$$

Lemma B.1 yields that \(\widetilde{P}^{\varphi }(\mathbf {X}\times \{\Delta \} | x)\le 1\) for any \(x\in \mathbf {X}\). Consequently,

$$\begin{aligned} \big [\delta _{x_{0}}+\widehat{\eta }^{g} \bar{q}^{\pi }\big ] \big (\mathcal {S}_{\varphi }^{c}\big )=0, \end{aligned}$$
(35)

where the set \(\mathcal {S}_{\varphi }\) has been introduced in (17). Now, combining Eqs. (29) and (34), it follows

$$\begin{aligned} \widehat{\eta }^{g}(\Gamma )&\ge \widetilde{P}^{\varphi }(\Gamma \times \{\Delta \} | x_{0}) + \frac{1}{\alpha } \widehat{\eta }^{g} p^{\pi }_{\varphi }(\Gamma ), \end{aligned}$$

where

$$\begin{aligned} p_{\varphi }^{\pi }(\Gamma | x)&= \bar{q}^{\pi } \widetilde{P}^{\varphi } (\Gamma \times \{\Delta \} | x) - I_{\Gamma }(x) \bar{q}^{\pi }(\mathbf {X}|x). \end{aligned}$$
(36)

Now, from Eq. (35) we have \(\widetilde{P}^{\varphi }(\mathbf {X}\times \{\Delta \} | x_{0})=1\) and

$$\begin{aligned} \widehat{\eta }^{g} p^{\pi }_{\varphi }(\mathbf {X})=\int _{\mathbf {X}} \widetilde{P}^{\varphi } (\mathbf {X}\times \{\Delta \} | x) \widehat{\eta }^{g} \bar{q}^{\pi } (dx) - \widehat{\eta }^{g} \bar{q}^{\pi } (\mathbf {X}) =0. \end{aligned}$$

Therefore, since \(\widehat{\eta }^{g}(\mathbf {X})=1\) it follows that the positive measure \(\gamma \) defined on \(\mathbf {X}\) by

$$\begin{aligned} \gamma = \widehat{\eta }^{g} -\widetilde{P}^{\varphi }(\cdot , \{\Delta \} | x_{0}) -\frac{1}{\alpha } \widehat{\eta }^{g} p^{\pi }_{\varphi }, \end{aligned}$$

satisfies \(\gamma (\mathbf {X})=0\) and so

$$\begin{aligned} \widehat{\eta }^{g}(\Gamma )&= \widetilde{P}^{\varphi }(\Gamma \times \{\Delta \} | x_{0}) + \frac{1}{\alpha } \widehat{\eta }^{g} p^{\pi }_{\varphi }(\Gamma ). \end{aligned}$$
(37)

Moreover, from the definition of \(R^{\varphi }\) in Eq. (18) we have

$$\begin{aligned} \widetilde{R}^{\varphi }(dz,db|x)=\widetilde{P}^{\varphi }(dz,db|x)I_{\mathcal {S}_{\varphi }}(x) +\delta _{(x,\Delta )}(dz,db) I_{\mathcal {S}_{\varphi }^{c}}(x), \end{aligned}$$

on \(\mathbf {X}\times \mathbf {A}^{i}_{\Delta }\). Now, the previous equation and (35) yield

$$\begin{aligned} \Big [ \delta _{x_{0}}+ \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi } \Big ] \widetilde{R}^{\varphi }=\Big [ \delta _{x_{0}}+ \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi } \Big ] \widetilde{P}^{\varphi }. \end{aligned}$$
(38)

Combining Eqs. (36)–(38), we obtain that

$$\begin{aligned} \widehat{\eta }^{g}(\Gamma )&= \widetilde{R}^{\varphi }(\Gamma \times \{\Delta \} | x_{0}) + \frac{1}{\alpha } \widehat{\eta }^{g} r^{\pi }_{\varphi }(\Gamma ), \end{aligned}$$

where \(r^{\pi }_{\varphi }\) has been defined in Eq. (21). Clearly, \(r^{\pi }_{\varphi }\) is a transition rate on \(\mathbf {X}\) given \(\mathbf {X}\) and so, applying the uniqueness result of item (d) of Theorem 3.2 in [24], we obtain from (20) that \(\widehat{\eta }_{u^{\pi ,\varphi }}^{g}=\widehat{\eta }^{g}\). From the definitions of \(u^{\pi ,\varphi }\) and \(\eta _{u}^{g}\) it is easy to show that \(\eta _{u^{\pi ,\varphi }}^{g}(dx,da) = \widehat{\eta }_{u^{\pi ,\varphi }}^{g}(dx)\pi (da|x)\). Therefore,

$$\begin{aligned} \eta _{u^{\pi ,\varphi }}^{g}(dx,da) = \widehat{\eta }_{u^{\pi ,\varphi }}^{g}(dx)\pi (da|x)=\widehat{\eta }^{g}(dx)\pi (da|x)=\eta ^{g}(dx,da), \end{aligned}$$
(39)

showing the first part of the result.

Now, combining the previous equation, (34) and (38) we get that

$$\begin{aligned} \Big [ \delta _{x_{0}}+ \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi } \Big ] \widetilde{R}^{\varphi }=\Big [ \delta _{x_{0}}+ \frac{1}{\alpha } \widehat{\eta }^{g} \bar{q}^{\pi } \Big ] \widetilde{P}^{\varphi }=\rho \le \eta ^{i}, \end{aligned}$$

and so, by using Eqs. (19) and (39), we have that \(\eta _{u^{\pi ,\varphi }}^{i} = \Big [ \delta _{x_{0}}+ \frac{1}{\alpha } \widehat{\eta }_{u^{\pi ,\varphi }}^{g} \bar{q}^{\pi } \Big ] \widetilde{P}^{\varphi } \le \eta ^{i}\) giving the last assertions. \(\square \)

Finally, the following corollary shows that, although \(\eta ^i\) and \(\eta ^i_{u^{\pi ,\varphi }}\) need not be equal, there exists a subset of \(\mathbf {X}\times \mathbf {A}^{i}\) on which they coincide.

Corollary 4.10

Under the conditions of Theorem 4.9, there exists a set \(D\in \mathcal{B}(\mathbf {X})\) such that for any \(z\in D\), \(Q^\varphi _\Delta (D|z)=1\); \(\widehat{\eta }^g_{u^{\pi ,\varphi }}(D)=\widehat{\eta }^i_{u^{\pi ,\varphi }}(D)=0\), and \(\eta ^i(\Gamma )= \eta ^i_{u^{\pi ,\varphi }}(\Gamma )\) for any \(\Gamma \in \mathcal{B}\big ( (\mathbf {X} \setminus D)\times \mathbf {A}^{i}_\Delta \big )\).

Proof

From Proposition 4.4 the pair of measures \((\eta _{u^{\pi ,\varphi }}^{g},\eta _{u^{\pi ,\varphi }}^{i})\) satisfies

$$\begin{aligned} \widehat{\eta }_{u^{\pi ,\varphi }}^{i}(\Gamma )&= \delta _{x_{0}}(\Gamma )+\frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} \bar{q}(\Gamma | x,a) \eta _{u^{\pi ,\varphi }}^{g}(dx,da) + \int _{\mathbf {X}} Q^\varphi _{\Delta }(\Gamma | z) \widehat{\eta }_{u^{\pi ,\varphi }}^{i}(dz), \end{aligned}$$
(40)

since for \(u^{\pi ,\varphi }\) we have \(\psi _{n}(\cdot |h_{n})=\delta _{+\infty }(\cdot )\) for any \(h_{n}=(y_0,\theta _1,\ldots ,\theta _{n},y_{n})\in \mathbf {H}_{n}\). Moreover, according to Eq. (29), we have

$$\begin{aligned} \widehat{\eta }^{i}(\Gamma ) = \delta _{x_{0}}(\Gamma )+\frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} \bar{q}(\Gamma | x,a) \eta ^{g}(dx,da) + \int _{\mathbf {X}} Q^\varphi _{\Delta }(\Gamma | z) \widehat{\eta }^{i}(dz) \end{aligned}$$

for any \(\Gamma \in \mathcal{B}(\mathbf {X} )\). Therefore, the measure \(\gamma \) defined on \(\mathbf {X}\) by \(\gamma =\widehat{\eta }^i-\widehat{\eta }_{u^{\pi ,\varphi }}^{i}\) satisfies the following equation

$$\begin{aligned} \gamma (\Gamma )&= \int _{\mathbf {X}} Q^\varphi _{\Delta }(\Gamma | z) \gamma (dz) \text{ for } \text{ any } \Gamma \in \mathcal{B}(\mathbf {X} ) \end{aligned}$$
(41)

since \(\eta _{u^{\pi ,\varphi }}^{g}=\eta ^g\). Define the sequence of sets \((\mathbf {X}_{n})_{n\in \mathbb {N}}\) by \(\mathbf {X}_{0}=\mathbf {X}\) and \(\mathbf {X}_{n+1}=\{z\in \mathbf {X}_{n}:\ Q^\varphi _{\Delta }(\mathbf {X}_{n} | z)=1\}\) for \(n\in \mathbb {N}\). This sequence satisfies \(\gamma (\mathbf {X}_n\setminus \mathbf {X}_{n+1}) =0\) and \( Q^\varphi _{\Delta }(\mathbf {X}_n | z)=1\) for any \(z\in \mathbf {X}_{n+1}\), and

$$\begin{aligned} \gamma (\mathbf {X}_n)= & {} \int _{\mathbf {X}_n} Q^\varphi _{\Delta }(\mathbf {X}_n | z) \gamma (dz), \end{aligned}$$
(42)

for any \(n\in \mathbb {N}\). Indeed, Eq. (42) clearly holds for \(n=0\) by using (41). Therefore, we have \(\gamma (z\in \mathbf {X}_0:\ Q^\varphi _{\Delta }(\mathbf {X}_0 | z)<1)=0\) implying that \(\gamma (\mathbf {X}_{0}\setminus \mathbf {X}_{1}) =0\). Moreover, by definition of \(\mathbf {X}_{1}\) it is straightforward to see that \(Q^\varphi _{\Delta }(\mathbf {X}_{0} | z)=1\) for any \(z\in \mathbf {X}_{1}\). Suppose the decreasing family of sets \((\mathbf {X}_{j})_{j\in \mathbb {N}_{n}}\) satisfies the above equations. From Eq. (42) we have \(\gamma (z\in \mathbf {X}_{n}:\ Q^\varphi _{\Delta }(\mathbf {X}_{n} | z)<1)=0\) showing that \(\gamma (\mathbf {X}_{n}\setminus \mathbf {X}_{n+1})=0\). Then by using Eq. (41), we obtain \(\displaystyle \gamma (\mathbf {X}_{n+1}) = \int _{\mathbf {X}_{n+1}} Q^\varphi _{\Delta }(\mathbf {X}_{n+1} | z) \gamma (dz)\). Finally, by definition of \(\mathbf {X}_{n+1}\), we have \(Q^\varphi _{\Delta }(\mathbf {X}_{n} | z)=1\) for any \(z\in \mathbf {X}_{n+1}\). Let us introduce the set \(D\subset \mathbf {X}\) defined by \(\displaystyle D=\bigcap _{j=0}^\infty \mathbf {X}_{j}\). Then \(\gamma (\mathbf {X}\setminus D)=\sum _{j=0}^\infty \gamma (\mathbf {X}_{j}\setminus \mathbf {X}_{j+1})=0\). Consequently, for any \(\Gamma \in \mathcal{B}\big ( (\mathbf {X} \setminus D) \big )\) we have \(\widehat{\eta }^{i}(\Gamma )= \widehat{\eta }^{i}_{u^{\pi ,\varphi }}(\Gamma )\), so the measures \(\eta ^i\) and \(\eta _{u^{\pi ,\varphi }}^{i}\) coincide on \((\mathbf {X}\setminus D)\times \mathbf {A}^i_\Delta \) because \(\eta ^i(dx,da)=\widehat{\eta }^{i}(dx)\varphi (da | x)\) and \(\eta _{u^{\pi ,\varphi }}^i(dx,da)=\widehat{\eta }_{u^{\pi ,\varphi }}^{i}(dx)\varphi (da | x)\) due to (27).

Now, observe that for any \(z\in D\) and \(j\in \mathbb {N}\), \(Q^\varphi _{\Delta }(\mathbf {X}_{j} |z)=1\) implying \(Q^\varphi _{\Delta }(D |z)=1\) and so, choosing \(\Gamma =D\) in Eq. (40), we have

$$\begin{aligned} \widehat{\eta }_{u^{\pi ,\varphi }}^{i}(D)&= \delta _{x_{0}}(D)+\frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} \bar{q}(D| x,a) \eta _{u^{\pi ,\varphi }}^{g}(dx,da) +\widehat{\eta }_{u^{\pi ,\varphi }}^{i}(D)\\&\quad + \int _{\mathbf {X}\setminus D} Q^\varphi _{\Delta }(D | z) \widehat{\eta }_{u^{\pi ,\varphi }}^{i}(dz) \end{aligned}$$

leading to \(\displaystyle \delta _{x_{0}}(\Gamma )+\frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} \bar{q}(\Gamma | x,a) \eta _{u^{\pi ,\varphi }}^{g}(dx,da)+ \int _{\mathbf {X}\setminus D} Q^\varphi _{\Delta }(\Gamma | z) \widehat{\eta }_{u^{\pi ,\varphi }}^{i}(dz)=0\), for any \(\Gamma \in \mathcal {B}(D)\). Consequently, \(\delta _{x_{0}}(D)+\frac{1}{\alpha } \eta _{u^{\pi ,\varphi }}^{g} \bar{q}(D)=0\) and \(\big [\delta _{x_{0}}+\frac{1}{\alpha } \eta _{u^{\pi ,\varphi }}^{g} \bar{q} \big ] \sum _{k\in \mathbb {N}} \big ( Q^\varphi _{\Delta } \big )^{k} I_{\mathbf {X}\setminus D} Q^\varphi _{\Delta }(D)=0\), where we have used (27) to get the last equation. Combining the two previous equations, it can be shown easily by induction that \(\big [\delta _{x_{0}}+\frac{1}{\alpha } \eta _{u^{\pi ,\varphi }}^{g} \bar{q} \big ] \big ( Q^\varphi _{\Delta } \big )^{k} (D)=0\) for any \(k\in \mathbb {N}\) and so, recalling (27), it follows that \(\widehat{\eta }_{u^{\pi ,\varphi }}^{i}(D)=0\).

Now, from Eq. (40) and by using the fact that \(\delta _{x_{0}}(D)=\widehat{\eta }_{u^{\pi ,\varphi }}^{i}(D)=0\), we obtain

$$\begin{aligned} \int _{\mathbf {X} \times \mathbf {A}^{i}} Q_{\Delta }(D | z,b) \eta _{u^{\pi ,\varphi }}^{i}(dz,db) +\frac{1}{\alpha } \int _{\mathbf {X} \times \mathbf {A}^{g}} \bar{q}(D| x,a) \eta _{u^{\pi ,\varphi }}^{g}(dx,da)=0. \end{aligned}$$

Therefore, Eq. (16) yields \(\displaystyle \widehat{\eta }_{u^{\pi ,\varphi }}^{g}(D\times \mathbf {A}^{g})=\frac{1}{\alpha } \int _{D \times \mathbf {A}^{g}} q(\{x\}| x,a) \eta _{u^{\pi ,\varphi }}^{g}(dx,da)\). Since \(\eta _{u^{\pi ,\varphi }}^{g}\ge 0\) and \(q(\{x\}| x,a)\le 0\), we conclude that \(\widehat{\eta }_{u^{\pi ,\varphi }}^{g}(D\times \mathbf {A}^{g})=0\). \(\square \)
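The construction of the set D in the proof above is algorithmic when the state space is finite: the sets \(\mathbf {X}_{n}\) decrease to the largest set that the kernel never leaves. A minimal sketch, with a hypothetical substochastic matrix `Q` playing the role of \(Q^\varphi _{\Delta }\):

```python
import numpy as np

def largest_absorbing_set(Q, tol=1e-12):
    """Iterate X_0 = X, X_{n+1} = {z in X_n : Q(X_n | z) = 1}
    until stabilization; returns the indicator of D = the
    intersection of the X_n.  Q[z, y] is the (sub)stochastic
    kernel Q(dy | z) on a finite state space."""
    current = np.ones(Q.shape[0], dtype=bool)      # X_0 = X
    while True:
        mass = Q[:, current].sum(axis=1)           # mass z sends into X_n
        nxt = current & (mass >= 1.0 - tol)        # keep z only if mass = 1
        if np.array_equal(nxt, current):           # fixed point reached: D
            return current
        current = nxt

# Hypothetical 4-state kernel: {0, 1} is a closed class, while
# states 2 and 3 eventually leak mass outside.
Q = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.2, 0.8, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 0.3, 0.5]])
print(largest_absorbing_set(Q))   # -> [ True  True False False]
```

On a finite space the iteration stabilizes after at most \(|\mathbf {X}|\) steps, since the sets are decreasing; the proof handles the general Borel case, where D is only reached as a countable intersection.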

5 The LP Formulation

The main objective of this section is to show the existence of an optimal strategy for the constrained optimal control problem introduced in Definition 3.1. The idea behind this existence result can be decomposed into two steps. First, we introduce the (primal) linear program \(\mathbb {PLP}\) associated with the optimization problem under consideration (see Definition 5.1) and show in Theorem 5.2 that there exists an optimal strategy for the constrained optimal control problem if and only if the linear program \(\mathbb {PLP}\) is solvable. The second step consists of showing that \(\mathbb {PLP}\) is solvable (Theorem 5.5); this is done by introducing an auxiliary linear program (whose properties are studied in Proposition 5.4) and by considering an additional set of hypotheses (see Assumption C). Combining these two steps, it is straightforward to obtain the existence of an optimal randomized control strategy for the constrained optimal control problem (see Theorem 5.6). An easy consequence of this result is that the class of strategies introduced in Definition 4.7 is a sufficient set. Assumption C is a standard hypothesis in the literature on CTMDPs (see for example [24]) and mainly requires that the parameters of the system be lower semicontinuous and that the transition rate be weakly continuous. As an independent result, the dual linear program associated with \(\mathbb {PLP}\) is briefly discussed at the end of this section.

Definition 5.1

The constrained linear program, labeled \(\mathbb {PLP}\), is defined as follows: minimize \(\eta (C_{0})\) subject to \(\eta \in \mathbb {L}\), where \(\mathbb {L}\) is the set of measures \(\eta \) in \(\mathcal {M}_{f}(\mathbf {X}\times \mathbf {A})\) that are admissible in the sense of Definition 4.5 and satisfy \( \eta (C_{j})\le B_{j}\) for any \(j\in \mathbb {N}_{p}^{*}\).

The nonnegative real number \(\inf _{\eta \in \mathbb {L}} \eta (C_{0})\) is called the value of the constrained linear program \(\mathbb {PLP}\). Below, we say that \(\mathbb {PLP}\) is solvable if there is \(\eta ^*\in \mathbb {L}\) such that \(\eta ^*(C_0)=\inf _{\eta \in \mathbb {L}} \eta (C_{0})\).

Theorem 5.2

The values of the constrained control problem and of the linear program \(\mathbb {PLP}\) coincide:

$$\begin{aligned} \inf _{\eta \in \mathbb {L}} \eta (C_{0}) =\inf _{u\in \mathcal{U}^f} \mathcal {V}_0(u,x_0). \end{aligned}$$

Moreover, assume the existence of \(\bar{u}\in \mathcal {U}^{f}\) such that \(\mathcal {V}_{0}(\bar{u},x_{0})<\infty \). Then the following assertions hold:

  1. (i)

    The measure \(\eta _{\bar{u}}\) as defined in Eq. (11) for the strategy \(\bar{u}\) belongs to \(\mathbb {L}\) and \(\eta _{\bar{u}}(C_{0})<\infty \).

  2. (ii)

    The constrained optimal control problem as introduced in Definition 3.1 is solvable if and only if the linear program \(\mathbb {PLP}\) is solvable.

  3. (iii)

    If the constrained optimal control problem is solvable then there exists a randomized stationary optimal control strategy where the interventions only occur after the natural jumps and with a possible intervention at the initial moment.

Proof

If \(\eta \in \mathbb {L}\) then it is admissible in the sense of Definition 4.5. From Theorem 4.9, the control strategy \(u^{\pi ,\varphi }\in \mathcal {U}\), where \(\pi \) (respectively, \(\varphi \)) has been defined in Eq. (24) (respectively, (25)), satisfies \(\eta ^{g}_{u^{\pi ,\varphi }}=\eta ^{g}\) and \(\eta ^{i}_{u^{\pi ,\varphi }}\le \eta ^{i}\) with \(\eta ^{g}\) (respectively, \(\eta ^{i}\)) given in (22) (respectively, (23)). Therefore, for any \(j\in \mathbb {N}_{p}^{*}\), \(\mathcal {V}_{j}(u^{\pi ,\varphi },x_{0})\le \eta (C_{j})\le B_{j}\) and \(\mathcal {V}_{0}(u^{\pi ,\varphi },x_{0})\le \eta (C_{0})\). In particular, this first statement implies that, on the one hand, \(\mathbb {L}\) is empty if \(\mathcal {U}^f\) is empty and, on the other hand, if the set \(\mathcal {U}^f\) is not empty with \(\mathcal {V}_0(u,x_0)=\infty \) for any \(u\in \mathcal {U}^f\), then either \(\mathbb {L}\) is empty or \(\displaystyle \inf _{\eta \in \mathbb {L}} \eta (C_{0})=\infty \), showing that in any case \(\displaystyle \inf _{\eta \in \mathbb {L}} \eta (C_{0}) =\inf _{u\in \mathcal{U}^f} \mathcal {V}_0(u,x_0)=\infty .\)

Now, if \(\mathcal {V}_{0}(u,x_{0})<\infty \) for \(u\in \mathcal {U}\) and \(\mathcal {V}_{j}(u,x_{0})\le B_{j}\) for any \(j\in \mathbb {N}_{p}^{*}\), then, recalling Assumptions (A2), (A3) and B, we necessarily have \(\eta _{u}^{i}(\mathbf {X}\times \mathbf {A}^{i})<\infty \). From Lemma A.1, this gives \(\eta _{u}^{i}(\mathbf {X}\times \mathbf {A}^{i}_{\Delta })<\infty \). Consequently, according to Theorem 4.6, for any admissible control strategy \(u\in \mathcal {U}\) such that \(\mathcal {V}_{0}(u,x_{0})<\infty \) and \(\mathcal {V}_{j}(u,x_{0}) \le B_{j}\) for any \(j\in \mathbb {N}_{p}^{*}\), there exists a finite measure \(\eta _{u}\in \mathcal {M}_{f}(\mathbf {X}\times \mathbf {A})\) concentrated on \(\mathbb {K}^{g}\mathop {\cup }\mathbb {K}^{i}\) satisfying Eq. (16) with \(\eta _{u}(C_{j})=\mathcal {V}_{j}(u,x_{0})\) for any \(j\in \mathbb {N}_{p}\), implying that \(\eta _{u}\in \mathbb {L}\) and \(\eta _{u}(C_{0})<\infty \).

Combining these two statements, we easily obtain the result. \(\square \)

To study the solvability of the linear program \(\mathbb {PLP}\), we need to introduce an auxiliary linear program. First, let us define \(\mathbf {X}_{\sigma }=\mathbf {X}\mathop {\cup }\{\sigma \}\), where \(\sigma \) is an isolated point, and the kernel \(\widetilde{Q}\) on \(\mathbf {X}_{\sigma }\) given \(\mathbb {K}^{g}\mathop {\cup }\mathbb {K}^{i}\mathop {\cup }(\{\sigma \}\times \mathbf {A})\) by

$$\begin{aligned} \widetilde{Q}(\Gamma |x,a)= & {} \Big \{ \frac{1}{K+\alpha }\Big [ \bar{q}(\Gamma \mathop {\cap }\mathbf {X}|x,a)+\delta _{x}(\Gamma \mathop {\cap }\mathbf {X})\big (K-\bar{q}(\mathbf {X}|x,a)\big )\Big ] \nonumber \\&+\frac{\alpha }{K+\alpha }\delta _{\sigma }(\Gamma ) \Big \} I_{\mathbb {K}^{g}}(x,a) +Q(\Gamma \mathop {\cap }\mathbf {X}|x,a) I_{\mathbb {K}^{i}}(x,a) + \delta _{\sigma }(\Gamma ) I_{\{\sigma \}}(x).\nonumber \\ \end{aligned}$$
(43)

Definition 5.3

The auxiliary linear program, labeled \(\mathbb {LP}'\), is defined as follows: minimize \(\rho (C'_{0})\) subject to \(\rho \in \mathbb {L}'\), where \(\mathbb {L}'\) is the set of measures \(\rho \) in \(\mathcal {M}(\mathbf {X}_{\sigma }\times \mathbf {A})\) concentrated on \(\mathbb {K}^{g}\mathop {\cup }\mathbb {K}^{i}\mathop {\cup }(\{\sigma \}\times \mathbf {A})\) such that for any \(\Gamma \in \mathcal {B}(\mathbf {X}_{\sigma })\)

$$\begin{aligned} \rho (\Gamma \times \mathbf {A})&= \delta _{x_{0}}(\Gamma ) + \int _{\mathbf {X}_{\mathbf {\sigma }}\times \mathbf {A}} \widetilde{Q}(\Gamma | z,b) \rho (dz,db), \end{aligned}$$
(44)
$$\begin{aligned} \rho (C'_{j})\le B_{j}, \text { for any } j\in \mathbb {N}_{p}^{*}, \end{aligned}$$
$$\begin{aligned} \rho (\mathbb {K}^{g})\le \frac{K+\alpha }{\alpha }, \end{aligned}$$

with \(C'_{j}(x,a)=\frac{1}{K+\alpha }C^{g}_{j}(x,a)I_{\mathbb {K}^{g}}(x,a)+c^{i}_{j}(x,a) I_{\mathbb {K}^{i}}(x,a)\) for \(j\in \mathbb {N}_{p}\).
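For intuition, the continuous-action part of \(\widetilde{Q}\) in Eq. (43) is a uniformization: jump rates are rescaled by the bound K, a self-loop absorbs the remainder \(K-\bar{q}(\mathbf {X}|x,a)\), and the discount \(\alpha \) becomes a killing probability \(\alpha /(K+\alpha )\) of jumping to \(\sigma \). A finite-state sketch (all numerical data hypothetical) checking that \(\widetilde{Q}\) is indeed a stochastic kernel:

```python
import numpy as np

def uniformized_kernel(qbar, K, alpha):
    """Continuous-action part of the kernel of Eq. (43) on X ∪ {σ}.
    qbar[x, y]: jump rate x -> y (zero diagonal), with row sums <= K.
    The last column of the result is the mass sent to the point σ."""
    n = qbar.shape[0]
    total = qbar.sum(axis=1)                  # \bar{q}(X | x, a)
    P = np.zeros((n, n + 1))
    P[:, :n] = (qbar + np.diag(K - total)) / (K + alpha)
    P[:, n] = alpha / (K + alpha)             # killing: jump to σ
    return P

qbar = np.array([[0.0, 2.0, 1.0],
                 [0.5, 0.0, 0.5],
                 [0.0, 3.0, 0.0]])
P = uniformized_kernel(qbar, K=3.0, alpha=0.1)
print(P.sum(axis=1))    # every row sums to 1: a stochastic kernel
```

Only the \(I_{\mathbb {K}^{g}}\) branch of (43) is sketched here; on \(\mathbb {K}^{i}\) the kernel \(\widetilde{Q}\) simply coincides with Q restricted to \(\mathbf {X}\), and \(\sigma \) is absorbing.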

Proposition 5.4

The following assertions hold:

  1. i)

    If the measure \(\eta \) belongs to \(\mathbb {L}\) then the measure \(\rho \), defined on \(\mathbf {X}_{\sigma }\times \mathbf {A}\) by

    $$\begin{aligned} \rho (\Gamma )=\eta (\Gamma \mathop {\cap }\mathbb {K}^{i})+\frac{K+\alpha }{\alpha }\eta (\Gamma \mathop {\cap }\mathbb {K}^{g}), \end{aligned}$$

    for any \(\Gamma \in \mathcal {B}(\mathbf {X}\times \mathbf {A})\) and \(\rho (\{\sigma \}\times \Gamma )=+\infty \) for any \(\Gamma \in \mathcal {B}(\mathbf {A})\), belongs to \( \mathbb {L}'\). Moreover, \(\rho (C'_{j})=\eta (C_{j})\) for any \(j\in \mathbb {N}_{p}\).

  2. ii)

    If the measure \(\rho \in \mathbb {L}'\) satisfies \(\rho (C'_{0})<\infty \) then the measure \(\eta \) defined on \(\mathbf {X}\times \mathbf {A}\) by

    $$\begin{aligned} \eta (\Gamma )=\rho (\Gamma \mathop {\cap }\mathbb {K}^{i})+\frac{\alpha }{K+\alpha }\rho (\Gamma \mathop {\cap }\mathbb {K}^{g}) \end{aligned}$$

    belongs to \(\mathbb {L}\). Moreover, \(\eta (C_{j})=\rho (C'_{j})\) for any \(j\in \mathbb {N}_{p}\).

Proof

Regarding item i), it is clear that \(\rho \) so defined is a positive measure on \(\mathbf {X}_{\sigma }\times \mathbf {A}\) concentrated on \(\mathbb {K}^{g}\mathop {\cup }\mathbb {K}^{i}\mathop {\cup }(\{\sigma \}\times \mathbf {A})\). Moreover, a straightforward calculation shows that \(\rho \) satisfies Eq. (44) with \(\rho (C'_{j})=\eta (C_{j})\) for any \(j\in \mathbb {N}_{p}\), giving the first part of the result. For item ii), we have \(\eta (\mathbb {K}^{g})=\frac{\alpha }{K+\alpha }\rho (\mathbb {K}^{g})\le 1\). Moreover, combining Assumptions (A2), (A3) and B, it follows that \(\underline{c}\eta (\mathbb {K}^{i})\le \sum _{j\in \mathbb {N}_{p}^{*}} B_{j}+\rho (C'_{0})\). Consequently, \(\eta \in \mathcal {M}_{f}(\mathbf {X}\times \mathbf {A})\) and \(\eta \) is clearly concentrated on \(\mathbb {K}^{g}\mathop {\cup }\mathbb {K}^{i}\). Now, simple algebraic manipulations yield that the measure \(\eta \) satisfies Eq. (16) with \(\eta (C_{j})=\rho (C'_{j})\) for any \(j\in \mathbb {N}_{p}\), showing the last part of the result. \(\square \)

Assumption C   

(C1):

The transition rate q is weakly continuous, that is, for any \(h\in \mathbb {C}_{b}(\mathbf {X})\), \(qh \in \mathbb {C}_{b}(\mathbb {K}^{g})\).

(C2):

If \(\mathbb {K}^{i}\ne \emptyset \), the transition kernel Q is weakly continuous, that is, \(Qh \in \mathbb {C}_{b}(\mathbb {K}^{i})\) for any \(h\in \mathbb {C}_{b}(\mathbf {X})\).

(C3):

For any \(j\in \mathbb {N}_{p}\), the function \(c^{i}_{j}\) is lower semicontinuous on \(\mathbb {K}^{i}\).

(C4):

For any \(j\in \mathbb {N}_{p}\), the function \(C^{g}_{j}\) is lower semicontinuous on \(\mathbb {K}^{g}\).

(C5):

For any \(x\in \mathbf {X}\), \(\mathbf {A}^{g}(x)\) and \(\mathbf {A}^{i}(x)\) are compact sets.

(C6):

The multifunction \(\Psi : \mathbf {X}\rightarrow \mathbf {A}\) defined by \(\Psi (x)=\mathbf {A}^{g}(x)\mathop {\cup }\mathbf {A}^{i}(x)\) is upper semicontinuous.

Theorem 5.5

Suppose that Assumption C holds and that there exists \(\bar{\eta }\in \mathbb {L}\) such that \(\bar{\eta }(C_{0})<\infty \). Then the linear program \(\mathbb {PLP}\) is solvable.

Proof

Introduce \(\mathbb {K}=\mathbb {K}^{g}\mathop {\cup }\mathbb {K}^{i}\mathop {\cup }(\{\sigma \}\times \mathbf {A})\). According to Assumption C, it is easy to see that the stochastic kernel \(\widetilde{Q}\) on \(\mathbf {X}_{\sigma }\) given \(\mathbb {K}\) is weakly continuous and that the mappings \(C'_{j}\) are nonnegative and lower semicontinuous on \(\mathbb {K}\) for any \(j\in \mathbb {N}_{p}\). Consequently, by using Theorem 7.2 in [5], the auxiliary linear program \(\mathbb {LP}'\) is solvable because, according to item i) of Proposition 5.4, there is \(\bar{\rho }\in \mathbb {L}'\) such that \(\bar{\rho }(C'_0)=\bar{\eta }(C_0)<\infty \). Observe that Theorem 7.2 in [5] is an extension of Theorem 4.1 in [7] to the case where the action space depends on the state variable. Let \(\rho ^{*}\) be a measure in \(\mathbb {L}'\) such that \(\inf _{\rho \in \mathbb {L}'} \rho (C'_{0})=\rho ^{*}(C'_{0})\). Then \(\rho ^{*}(C'_{0})\le \bar{\rho }(C'_{0})<\infty \). Consequently, item ii) of Proposition 5.4 implies that the measure \(\eta ^{*}\) defined on \(\mathbf {X}\times \mathbf {A}\) by

$$\begin{aligned} \eta ^{*}(\Gamma )=\rho ^{*}(\Gamma \mathop {\cap }\mathbb {K}^{i})+\frac{\alpha }{K+\alpha }\rho ^{*}(\Gamma \mathop {\cap }\mathbb {K}^{g}) \end{aligned}$$

belongs to \(\mathbb {L}\) with \(\eta ^{*}(C_{0})=\rho ^{*}(C'_{0})\). Finally, it is easy to show that we have necessarily

$$\begin{aligned} \inf _{\eta \in \mathbb {L}} \eta (C_{0})=\eta ^{*}(C_{0}) \end{aligned}$$

by using item i) of Proposition 5.4, giving the result. \(\square \)

If \(\mathbb {L}\ne \emptyset \) and \(\eta (C_0)=\infty \) for any \(\eta \in \mathbb {L}\), then \(\mathbb {PLP}\) is also solvable without Assumption C. The following theorem is the main result of this section. It establishes the existence of an optimal randomized control strategy and states that the class of strategies introduced in Definition 4.7 is a sufficient set.

Theorem 5.6

Assume that there exists \(\bar{u}\in \mathcal {U}^{f}\) such that \(\mathcal {V}_{0}(\bar{u},x_{0})<\infty \). Then, there exists a randomized stationary optimal control strategy, for the constrained optimal control problem introduced in Definition 3.1, where the interventions only occur after the natural jumps and with a possible intervention at the initial moment, that is, there exist \(\pi ^{*}\in \mathcal {P}^{g}(\mathbf {A}^{g}|\mathbf {X})\) and \(\varphi ^{*} \in \mathcal {P}^{i}(\mathbf {A}^{i}_{\Delta }|\mathbf {X}_{\Delta })\) such that the strategy \(u^{\pi ^{*},\varphi ^{*}}\) as introduced in Definition 4.7 belongs to \(\mathcal {U}^{f}\) and satisfies

$$\begin{aligned} \inf _{u\in \mathcal {U}^{f}} \mathcal {V}_{0}(u,x_{0}) =\mathcal {V}_{0}(u^{\pi ^{*},\varphi ^{*}},x_{0}) =\eta _{u^{\pi ^{*},\varphi ^{*}}}(C_{0})=\inf _{\eta \in \mathbb {L}} \eta (C_{0}). \end{aligned}$$
(45)

As a consequence, the class of randomized stationary control strategies introduced in Definition 4.7 is a sufficient class of control strategies for the constrained control problem under consideration.

Proof

This result is a straightforward consequence of Theorems 5.2 and 5.5. \(\square \)

We now briefly discuss the dual program associated with the linear program \(\mathbb {PLP}\) in the case of an unconstrained control problem, that is, \(p=0\). Assume that \(|C^{g}_{0}(x,a)|\le K\) for any \((x,a)\in \mathbb {K}^{g}\). The dual linear program is defined as follows: maximize \(W(x_{0})\) subject to \(W\in \mathbb {L}^{*}\), where \(\mathbb {L}^{*}\) is the set of functions belonging to \(\mathbb {B}(\mathbf {X})\) such that

$$\begin{aligned} W(x) \le C_{0}(x,a)+ I_{\mathbb {K}^i}(x,a)\int _{\mathbf {X}} W(y)Q(dy|x,a)+\frac{1}{\alpha }I_{\mathbb {K}^g}(x,a)\int _{\mathbf {X}} W(y)q(dy|x,a). \end{aligned}$$

It is easy to show that for any \(\eta \in \mathbb {L}\) and \(W\in \mathbb {L}^{*}\), we have \(\eta (C_{0})\ge W(x_{0})\), showing that \(\inf _{\eta \in \mathbb {L}} \eta (C_{0}) \ge \sup _{W\in \mathbb {L}^{*}} W(x_{0})\). Now, if Assumptions A and B hold, then, from Theorem 5.2, we have \(\displaystyle \inf _{u\in \mathcal {U}} \mathcal {V}_{0}(u,x_{0})=\inf _{\eta \in \mathbb {L}} \eta (C_{0}) \ge \sup _{W\in \mathbb {L}^{*}} W(x_{0})\). Observe that the set \(\mathbb {L}^{*}\) can be equivalently described as the set of functions belonging to \(\mathbb {B}(\mathbf {X})\) satisfying the following two inequalities

$$\begin{aligned} \left\{ \begin{array}{rl} \alpha W(x) &{} \displaystyle \le \inf _{a\in \mathbf {A}^g(x)}\left\{ C^{g}_{0}(x,a)+\int _{\mathbf {X}} W(y) q(dy|x,a)\right\} , \\ W(x) &{} \displaystyle \le \inf _{a\in \mathbf {A}^i(x)}\left\{ c^{i}_{0}(x,a)+\int _{\mathbf {X}} W(y) Q(dy|x,a)\right\} . \end{array}\right. \end{aligned}$$
(46)

Now, assume that the sets \(\mathbf {A}^{g}\) and \(\mathbf {A}^{i}\) are compact, that the sets \(\mathbb {K}^g\) and \(\mathbb {K}^i\) are closed in \(\mathbf {X}\times \mathbf {A}^{g}\) and \(\mathbf {X}\times \mathbf {A}^{i}\) respectively, and that Assumptions (C1)–(C4) hold. In this context, according to item c) of Corollary 4.8 in [8], \(\inf _{u\in \mathcal{U}}\mathcal{V}_0(u,x_0)=V(x_0)\) where V is the unique bounded measurable solution to the Bellman equation

$$\begin{aligned} \inf _{a\in \mathbf {A}^g(x)}&\left\{ -\alpha V(x)+C^g_{0}(x,a)+\int _{\mathbf {X}} V(y) q(dy|x,a)\right\} \nonumber \\&\wedge \inf _{a\in \mathbf {A}^i(x)}\left\{ -V(x)+c^i_{0}(x,a)+\int _{\mathbf {X}} V(y) Q(dy|x,a)\right\} =0. \end{aligned}$$

Since V satisfies the inequalities in (46), it belongs to \(\mathbb {L}^{*}\), and therefore

$$\begin{aligned} \inf _{u\in \mathcal {U}} \mathcal {V}_{0}(u,x_{0})=\inf _{\eta \in \mathbb {L}} \eta (C_{0}) = \sup _{W\in \mathbb {L}^{*}} W(x_{0})=V(x_0), \end{aligned}$$

and there is no duality gap. Consequently, if there are no constraints then solving the dual linear program is equivalent to solving the Bellman equation.
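On a finite model, this equivalence can be checked numerically: iterating a uniformized Bellman operator yields the bounded solution V, which attains the common primal and dual value. A small sketch, with all model data hypothetical (two states, two continuous actions each, and an impulsive reset available in state 1):

```python
import numpy as np

alpha, K = 0.1, 2.0                       # discount and rate bound
# qbar[x, a, y]: off-diagonal jump rates under continuous action a
qbar = np.array([[[0.0, 1.0], [0.0, 2.0]],
                 [[0.2, 0.0], [0.1, 0.0]]])
Cg = np.array([[1.0, 1.5],                # running costs C^g_0(x, a)
               [8.0, 9.0]])
c_i, Q_imp = 2.5, np.array([1.0, 0.0])    # impulse in state 1: reset to 0

V = np.zeros(2)
for _ in range(3000):
    lam = qbar.sum(axis=2)                # total jump rates
    # continuous branch of the Bellman equation, alpha*V = min_a {C^g + qV},
    # rewritten as a fixed point via uniformization with the constant K
    cont = (Cg + qbar @ V + (K - lam) * V[:, None]).min(axis=1) / (K + alpha)
    # impulsive branch: V = min_b {c^i + QV}, available in state 1 only
    imp1 = c_i + Q_imp @ V
    V = np.array([cont[0], min(cont[1], imp1)])
print(np.round(V, 6))   # -> [35.  37.5]: the impulse is optimal in state 1
```

The continuous-branch update is a contraction of modulus \(K/(K+\alpha )\), so on this toy instance the iteration converges to the solution of the Bellman equation, and \(V(x_0)\) gives the common value of the primal and dual programs.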