Abstract
We consider a piecewise deterministic Markov decision process, where the expected exponential utility of total (nonnegative) cost is to be minimized. The cost rate, transition rate and post-jump distributions are under control. The state space is Borel, and the transition and cost rates are locally integrable along the drift. Under natural conditions, we establish the optimality equation, justify the value iteration algorithm, and show the existence of a deterministic stationary optimal policy. Applied to special cases, the obtained results already significantly improve some existing results in the literature on finite horizon and infinite horizon discounted risk-sensitive continuous-time Markov decision processes.
1 Introduction
Since the pioneering work [16], risk-sensitive discrete-time Markov decision processes (DTMDPs) have been studied intensively. Restricting attention to total undiscounted or discounted problems, let us mention e.g., [4, 6, 7, 11, 12, 17, 18], most of which deal with the exponential utility, as does the present paper. As an application, an open problem in insurance was recently solved in [1] in the framework of risk-sensitive DTMDPs. There are notable differences between risk-sensitive and risk-neutral DTMDPs. For instance, in a finite model, i.e., when the state and action spaces are both finite, there is always a deterministic stationary optimal policy in a discounted risk-neutral DTMDP, but not always in a discounted risk-sensitive DTMDP; see [17].
One of the first works on risk-sensitive continuous-time Markov decision processes (CTMDPs) is [21], where only verification theorems were presented. Recently, there has been a revival of interest in this topic; see e.g., [8, 14, 20, 24, 25, 27]. A finite horizon total undiscounted risk-sensitive CTMDP was considered in [14, 21, 24], whose arguments can be summarized as follows. Firstly, the optimality equation is shown to admit a solution in a small enough class. Secondly, by using the Feynman–Kac formula, this solution is shown to be the value function, and any Markov policy attaining the minimizer in the optimality equation is optimal. The proofs of [14, 24] reveal that the main technicalities lie in the first step, for which the state space was assumed to be denumerable. This assumption is important for the diagonalization argument used in [24], which extends [14] from a bounded transition rate to a possibly unbounded transition rate whose growth is bounded by a Lyapunov function. The latter requirement and the boundedness of the cost rate then validate the Feynman–Kac formula applied in the second step. Wei [24] mentioned that it was unclear how to extend his argument to an unbounded cost rate; see Sect. 7 therein. Following a similar argument, a discounted risk-sensitive CTMDP was also considered in [14], although now the first step becomes, to quote the authors’ words (see p. 658 therein), “surprisingly far more involved”, for which the state space was further assumed to be finite; see Remark 3.6 therein. As a corollary of the present paper, we significantly weaken the restrictive conditions in [14, 24]; see Sect. 3 below.
The present paper is concerned with a risk-sensitive piecewise deterministic Markov decision process (PDMDP), where the expected exponential utility of the total cost is to be minimized. The state space is a general Borel space, and the transition and the nonnegative cost rates need only be locally integrable along the drift. A PDMDP is an extension of a CTMDP: between two consecutive jumps, the process evolves according to a deterministic Markov process. For simplicity, and to keep the conditions as weak as possible, we do not consider control of the drift. Although there is a vast literature on PDMDPs (see the well-known monographs [9, 10] and the references therein), to the best of our knowledge, risk-sensitive PDMDPs have not been systematically studied before.
Our main contributions are the following. We establish the optimality equation satisfied by the value function, justify the value iteration algorithm, and show the existence of a deterministic stationary optimal policy. As an application and corollary, finite horizon and infinite horizon discounted risk-sensitive CTMDPs are reformulated as total undiscounted risk-sensitive PDMDPs, and are thus treated in a unified way and under much weaker conditions than in [14, 24]. This is possible because we follow a different argument. Namely, we directly show that the value function satisfies the optimality equation, by reducing the total undiscounted risk-sensitive PDMDP to a risk-sensitive DTMDP. This method, which does not refer to the Feynman–Kac formula, was originally developed by Yushkevich [26] for risk-neutral CTMDPs. Later, it was employed in [2, 3, 9, 10, 13, 23] in studies of risk-neutral PDMDPs, and in [27] for risk-sensitive CTMDPs. In [8], restricting to stationary policies, the authors reduced the discounted risk-sensitive CTMDP with bounded transition rates to a DTMDP problem using the uniformization technique. The induced DTMDP is less standard (with a random cost), and was not further investigated there.
The rest of the paper is organized as follows. In Sect. 2 we describe the optimal control problem under consideration. In Sect. 3 we present the main results, the proofs of which are postponed to Sect. 4. We finish the paper with a conclusion in Sect. 5. Some relevant facts are collected in the Appendix for ease of reference.
2 Model Description and Problem Statement
2.1 Notations and Conventions
In what follows, \(\mathcal{{B}}(X)\) is the Borel \(\sigma \)-algebra of the topological space X, I stands for the indicator function, and \(\delta _{\{x\}}(\cdot )\) is the Dirac measure concentrated on the singleton \(\{x\},\) assumed to be measurable. A measure is \(\sigma \)-additive and \([0,\infty ]\)-valued. Below, unless stated otherwise, measurability is always understood in the Borel sense. Throughout this paper, we adopt the conventions of
If a mapping f is defined on X, and \(\{X_i\}\) is a partition of X, then when f is defined piecewise by \(f(x)=g_i(x)\) for all \(x\in X_i\), the notation \(f(x)=\sum _i I\{x\in X_i\}g_i(x)\) is used, even if f is not real-valued.
Let S be a nonempty Borel state space, A be a nonempty Borel action space, and q stand for a signed kernel q(dy|x, a) on \(\mathcal{{B}}(S)\) given \((x,a)\in S\times A\) such that
for all \(\Gamma _S\in \mathcal{{B}}(S).\) Throughout this article we assume that \(q(\cdot |x,a)\) is conservative and stable, i.e.,
where \(q_x(a):=-q(\{x\}|x,a).\) The signed kernel q is often called the transition rate. Between two consecutive jumps, the state of the process evolves according to a measurable mapping \(\phi \) from \(S\times [0,\infty )\) to S, see (5) below. It is assumed that for each \(x\in S\)
and \(t\rightarrow \phi (x,t)\) is continuous.
Finally let the cost rate c be a \([0,\infty )\)-valued measurable function on \(S\times A\). For simplicity, we do not consider the case of different admissible action spaces at different states.
Condition 2.1
-
(a)
For each bounded measurable function f on S and each \(x\in S\), \(\int _S f(y)\tilde{q}(dy|x,a)\) is continuous in \(a\in A.\)
-
(b)
For each \(x\in S,\) the (nonnegative) function c(x, a) is lower semicontinuous in \(a\in A.\)
-
(c)
The action space A is a compact Borel space.
Condition 2.2
For each \(x\in S\), \(\int _{0}^t\overline{q}_{\phi (x,s)}ds<\infty \), and \(\int _{0}^t \sup _{a\in A} c(\phi (x,s),a)ds<\infty \), for each \(t\in [0,\infty ).\)
The integrals in the above condition are well defined: the integrands are universally measurable in \(s\in [0,\infty )\); see Chapter 7 of [5].
We obtain the sample space \(\Omega \) by adjoining to the countable product space \(S\times ((0,\infty )\times S)^\infty \) the sequences of the form \((x_0,\theta _1,\dots ,\theta _n,x_n,\infty ,x_\infty ,\infty ,x_\infty ,\dots ),\) where \(x_0,x_1,\dots ,x_n\) belong to S, \(\theta _1,\dots ,\theta _n\) belong to \((0,\infty ),\) and \(x_{\infty }\notin S\) is an isolated point. We equip \(\Omega \) with its Borel \(\sigma \)-algebra \(\mathcal F\).
Let \(t_0(\omega ):=0=:\theta _0,\) and for each \(n\ge 0\), and each element \(\omega :=(x_0,\theta _1,x_1,\theta _2,\dots )\in \Omega \), let
and
Obviously, \((t_n(\omega ))\) are measurable mappings on \((\Omega ,\mathcal{F})\). In what follows, we often omit the argument \(\omega \in \Omega \) from the presentation for simplicity. Also, we regard \(x_n\) and \(\theta _{n+1}\) as the coordinate variables, and note that the pairs \(\{t_n,x_n\}\) form a marked point process with the internal history \(\{\mathcal{F}_t\}_{t\ge 0},\) i.e., the filtration generated by \(\{t_n,x_n\}\); see Chapter 4 of [19] for further details. The marked point process \(\{t_n,x_n\}\) defines the stochastic process \(\{\xi _t,t\ge 0\}\) on \((\Omega ,\mathcal{F})\) of interest by
where \(S_{\infty }:=S\bigcup \{x_\infty \},\) and we accept \(0\cdot x:=0\) and \(1\cdot x:=x\) for each \(x\in S_\infty .\)
A (history-dependent) policy \(\pi \) is given by a sequence \((\pi _n)\) such that, for each \(n=0,1,2,\dots ,\)\(\pi _n(da|x_0,\theta _1,\dots ,x_{n},s)\) is a stochastic kernel on A, and for each \(\omega =(x_0,\theta _1,x_1,\theta _2,\dots )\in \Omega \), \(t> 0,\)
where \(a_\infty \notin A\) is some isolated point. A policy \(\pi \) is called Markov if, with a slight abuse of notation, \( \pi (da|\omega ,s)=\pi ^M(da|\xi _{s-},s)\) for some stochastic kernel \(\pi ^M\). A Markov policy is further called deterministic if \(\pi ^M(da|x,s)=\delta _{\{f^M(x,s)\}}(da)\) for some measurable mapping \(f^M\) from \(S\times (0,\infty )\) to A. A policy is called deterministic stationary if for each \(n=0,1,\dots ,\) \(\pi _{n}(da|x_0,\theta _1,\dots ,\theta _n,x_n, t-t_n)=\delta _{\{f(\phi (x_n,t-t_n))\}}(da)\) for some measurable mapping f from S to A. We shall identify such a deterministic stationary policy with the underlying measurable mapping f.
The class of all policies is denoted by \(\Pi .\) Under a fixed policy \(\pi =(\pi _n)\), for each initial distribution \(\gamma \) on \((S,\mathcal{B}(S)),\) by using the Ionescu–Tulcea theorem, one can build a probability measure \(P_\gamma ^\pi \) on \((\Omega ,\mathcal{F})\) such that \(P_\gamma ^\pi (x_0\in \Gamma )=\gamma (\Gamma )\) for each \(\Gamma \in \mathcal{B}(S)\), and the conditional distribution of \((\theta _{n+1},x_{n+1})\) given \(x_0,\theta _1,x_1,\dots ,\theta _{n},x_n\) is given on \(\{\omega :x_n(\omega )\in S\}\) by
and given on \(\{\omega :x_n(\omega )=x_\infty \}\) by
Below, when \(\gamma \) is a Dirac measure concentrated at \(x\in S,\) we write \({}{P}_x^\pi .\) Expectations with respect to \({}{P}_\gamma ^\pi \) and \({}{P}_x^\pi \) are denoted by \({}{E}_{\gamma }^\pi \) and \({}{E}_{x}^\pi ,\) respectively. Roughly speaking, the uncontrolled version of the process evolves as follows: given the current state, the process evolves deterministically according to the mapping \(\phi \) up to the next jump, which takes place after a random time whose distribution is (nonstationary) exponential, and the dynamics then continue in a similar manner. A detailed book treatment, with many examples of this and more general types of processes allowing deterministic jumps, can be found in [10].
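To fix ideas, the uncontrolled dynamics just described can be simulated directly: along the drift, the sojourn time \(\theta \) until the next jump has survival function \(e^{-\int _0^\theta q_{\phi (x,s)}ds}\), so it can be sampled by inverse transform. The following Python sketch uses a toy one-dimensional model of our own choosing; the drift \(\phi \), the rate \(q_x\) and the post-jump distribution below are illustrative assumptions, not taken from this paper.

```python
import math
import random

def phi(x, t):
    """Deterministic drift between jumps (toy choice): exponential decay."""
    return x * math.exp(-t)

def rate(x):
    """Jump intensity q_x (toy choice): 1 + x, bounded below by 1."""
    return 1.0 + x

def sample_sojourn(x, rng):
    """Sample theta with P(theta > t) = exp(-int_0^t q_{phi(x,s)} ds).

    Inverse transform: solve H(theta) = E with E ~ Exp(1), where here
    H(t) = int_0^t (1 + x e^{-s}) ds = t + x (1 - e^{-t}).
    """
    e = rng.expovariate(1.0)
    lo, hi = 0.0, 1.0
    while hi + x * (1.0 - math.exp(-hi)) < e:   # bracket the root
        hi *= 2.0
    for _ in range(80):                          # bisection
        mid = 0.5 * (lo + hi)
        if mid + x * (1.0 - math.exp(-mid)) < e:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate(x0, horizon, rng):
    """Return the marked point process {(t_n, x_n)} up to the horizon."""
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        theta = sample_sojourn(x, rng)
        if t + theta > horizon:
            return path
        t += theta
        # Toy post-jump distribution: drift endpoint plus an Exp(2) kick.
        x = phi(x, theta) + rng.expovariate(2.0)
        path.append((t, x))

rng = random.Random(0)
path = simulate(x0=1.0, horizon=5.0, rng=rng)
```

Between the recorded jump epochs \(t_n\), the state of the process is \(\phi (x_n, t-t_n)\), in agreement with (5).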
For each \(x\in S\), and policy \(\pi =(\pi _n)\),
defines the performance measure of the policy \(\pi \in \Pi \) given the initial state \(x\in S.\) Here and below, we put \(c(x_\infty ,a):=0\) for each \(a\in A,\) and \(\phi (x_\infty ,t):=x_\infty \) for each \(t\in [0,\infty ).\) We are interested in the following optimal control problem for each \(x\in S:\)
A policy \(\pi ^*\) is called optimal if \( V(x,\pi ^*)=\inf _{\pi \in \Pi }V(x,\pi )=:V^*(x)\) for each \(x\in S\).
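For intuition about this criterion, consider a toy two-state CTMDP of our own making (not from this paper): state 0 incurs cost at constant rate c and jumps, after an \(\mathrm{Exp}(\lambda )\) sojourn, to an absorbing, cost-free state. The total cost is then \(cT\) with \(T\sim \mathrm{Exp}(\lambda )\), so the expected exponential utility equals \(\lambda /(\lambda -c)\) whenever \(c<\lambda \), and \(+\infty \) otherwise, which illustrates why the finiteness of the value function is a genuine restriction. A minimal Monte Carlo check of this closed form:

```python
import math
import random

def exp_utility_mc(lam, c, n, rng):
    """Monte Carlo estimate of V = E[exp(total cost)] in the toy two-state
    CTMDP: cost accrues at rate c until an Exp(lam) jump to an absorbing,
    cost-free state.  Exact value: lam / (lam - c) when c < lam."""
    total = 0.0
    for _ in range(n):
        t = rng.expovariate(lam)     # sojourn time in state 0
        total += math.exp(c * t)     # exponential utility of total cost c*t
    return total / n

rng = random.Random(1)
est = exp_utility_mc(lam=2.0, c=0.5, n=200_000, rng=rng)
exact = 2.0 / (2.0 - 0.5)            # = 4/3
```

The estimate is always at least 1, consistent with \(V^*(x)\ge 1\) noted below.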
The objective of this paper is to show, under the imposed conditions, the existence of a deterministic stationary optimal policy, and to establish the corresponding optimality equation satisfied by the value function \(V^*\), together with its value iteration. Evidently, \(V^*(x)\ge 1\) for each \(x\in S.\) Under the next condition, it will be seen that for each \(x\in S,\)\(V^*(\phi (x,s))\) is absolutely continuous in s.
Condition 2.3
For each \(x\in S,\)\(V^*(x)<\infty \).
The above condition is mainly assumed for notational convenience. In fact, the main optimality results (such as the existence of a deterministic stationary optimal policy) obtained in this paper can be established without assuming Condition 2.3, at the cost of some additional notation. In a nutshell, one has to consider the sets \( \hat{S}:=\{x\in S:~V^*(x)<\infty \} \) and \(S{\setminus }\hat{S}\) separately, and note that if \(x\in \hat{S}\), then \(\phi (x,t)\in \hat{S}\) for each \(t\in [0,\infty ).\) The reasoning presented under Condition 2.3 can be followed in an obvious manner. We formulate the corresponding optimality results in Remarks 3.1 and 3.2 below.
3 Main Statements
We first present the main optimality results concerning problem (8) for the PDMDP model. Their proofs are postponed to the next section.
Theorem 3.1
Suppose Conditions 2.1, 2.2 and 2.3 are satisfied. Then the following assertions hold.
- (a)
The value function \(V^*\) for problem (8) is the minimal \([1,\infty )\)-valued solution to the following optimality equation:
$$\begin{aligned}&-\,(V(\phi (x,t))-V(x))\\&\quad =\int _0^t \inf _{a\in A}\left\{ \int _S V(y)\tilde{q}(dy|\phi (x,\tau ),a)- (q_{\phi (x,\tau )}(a)\right. \\&\qquad \left. -\,c(\phi (x,\tau ),a) )V(\phi (x,\tau ))\right\} d\tau ,t\in [0,\infty ),x\in S. \end{aligned}$$In particular, \(V^*(\phi (x,t))\) is absolutely continuous in t for each \(x\in S.\)
- (b)
There exists a deterministic stationary optimal policy f, which can be taken as any measurable mapping from S to A such that
$$\begin{aligned}&\inf _{a\in A}\left\{ \int _S V^*(y)\tilde{q}(dy|x,a)- (q_{x}(a)-c(x,a))V^*(x)\right\} \\&\quad =\int _S V^*(y)\tilde{q}(dy|x,f(x))- (q_{x}(f(x))-c(x,f(x)))V^*(x),~\forall ~x\in S. \end{aligned}$$
Remark 3.1
By inspecting its proof, one can see the following version of Theorem 3.1 holds without assuming Condition 2.3. Suppose Conditions 2.1 and 2.2 are satisfied. Then the following assertions hold.
- (a)
The value function \(V^*\) for problem (8) is the minimal \([1,\infty ]\)-valued solution to the following optimality equation:
$$\begin{aligned}&-\,(V(\phi (x,t))-V(x))\\&\quad =\int _0^t \inf _{a\in A}\Bigg \{ \int _S V(y)\tilde{q}(dy|\phi (x,\tau ),a)- (q_{\phi (x,\tau )}(a)\\&\qquad -\,c(\phi (x,\tau ),a) )V(\phi (x,\tau ))\Bigg \}d\tau ,\quad t\in [0,\infty ),\quad x\in \hat{S};\\&\quad \qquad V(x)<\infty ,\quad x\in \hat{S};\quad V(x)=\infty ,\quad x\in S{\setminus }\hat{S}. \end{aligned}$$In particular, \(V^*(\phi (x,t))\) is absolutely continuous in t for each \(x\in \hat{S}.\)
- (b)
There exists a deterministic stationary optimal policy f, which can be taken as any measurable mapping from S to A such that
$$\begin{aligned}&\inf _{a\in A}\left\{ \int _S V^*(y)\tilde{q}(dy|x,a)- (q_{x}(a)-c(x,a))V^*(x)\right\} \\&\quad =\int _S V^*(y)\tilde{q}(dy|x,f(x))- (q_{x}(f(x))-c(x,f(x)))V^*(x),~\forall ~x\in \hat{S}. \end{aligned}$$
Next, we present the value iteration algorithm for the value function \(V^*\).
Theorem 3.2
Suppose Conditions 2.1, 2.2 and 2.3 are satisfied. Let \(V^{(0)}(x):=1\) for each \(x\in S\). For each \(n\ge 0,\) let \(V^{(n+1)}\) be the minimal \([1,\infty )\)-valued measurable solution to
such that \(V^{(n+1)}(\phi (x,t))\) is absolutely continuous in t for each \(x\in S.\) (For each \(n\ge 0,\) such a solution always exists.) Furthermore, \(\{V^{(n)}\}\) is a monotone nondecreasing sequence of measurable functions on S such that for each \(x\in S,\) \(V^{(n)}(x)\uparrow V^*(x)\) as \(n\uparrow \infty .\)
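As a concrete illustration of the monotone convergence \(V^{(n)}\uparrow V^*\), consider the CTMDP special case \(\phi (x,t)\equiv x\) and restrict attention, for illustration only, to constant controls. For an action a with \(q_x(a)>c(x,a)\), integrating \(e^{-(q_x(a)-c(x,a))\tau }\) over \(\tau \in [0,\infty )\) reduces the minimization to the fixed-point form \(V(x)=\min _{a} \int _S V(y)\tilde{q}(dy|x,a)/(q_x(a)-c(x,a))\), the boundary term vanishing. The Python sketch below iterates this operator, starting from \(V^{(0)}\equiv 1\), on a three-state toy model entirely of our own choosing (all states, rates and cost rates are illustrative assumptions).

```python
# States 0, 1 are transient; state 2 is absorbing and cost-free, so V(2) = 1.
# MODEL[(x, a)] = (total rate q_x(a), cost rate c(x, a), {y: rate to y}),
# chosen so that q_x(a) > c(x, a) for every state-action pair.
MODEL = {
    (0, "fast"): (2.0, 0.5, {2: 2.0}),    # jump straight to absorption
    (0, "detour"): (3.0, 0.5, {1: 3.0}),  # pass through state 1
    (1, "only"): (1.0, 0.2, {2: 1.0}),
}

def bellman(v):
    """One value-iteration step of the risk-sensitive fixed point
    V(x) = min_a sum_{y != x} q(y|x,a) V(y) / (q_x(a) - c(x,a))."""
    new = dict(v)
    for x in (0, 1):
        best = float("inf")
        for (s, a), (qx, c, jumps) in MODEL.items():
            if s != x:
                continue
            val = sum(r * v[y] for y, r in jumps.items()) / (qx - c)
            best = min(best, val)
        new[x] = best
    return new

v = {0: 1.0, 1: 1.0, 2: 1.0}   # V^(0) := 1
for _ in range(60):
    v = bellman(v)
```

In this toy model the iterates increase to the fixed point \(V^*(0)=4/3\), \(V^*(1)=5/4\), \(V^*(2)=1\); as in the theorem, convergence is monotone from below, and here it is exact after two iterations.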
Remark 3.2
Similar to Remark 3.1, we have the following version of Theorem 3.2 without assuming Condition 2.3. Suppose Conditions 2.1 and 2.2 are satisfied. Let \(V^{(0)}(x):=1\) for each \(x\in \hat{S}\) and \(V^{(0)}(x):=\infty \) if \(x\in S{\setminus }\hat{S}\). For each \(n\ge 0,\) let \(V^{(n+1)}\) be the minimal \([1,\infty ]\)-valued measurable solution to
Here \(V^{(n+1)}(\phi (x,t))\) is absolutely continuous in t for each \(x\in \hat{S}.\) (For each \(n\ge 0,\) such a solution always exists.) Furthermore, \(\{V^{(n)}\}\) is a monotone nondecreasing sequence of measurable functions on S such that for each \(x\in S,\) \(V^{(n)}(x)\uparrow V^*(x)\) as \(n\uparrow \infty .\)
We can apply our theorems to the special case of a CTMDP, i.e., \(\phi (x,t)\equiv x\) for each \(x\in S.\) The following \(\alpha \)-discounted risk-sensitive CTMDP problem was considered in [14]:
Here \(\alpha >0\) is a fixed constant. In fact, Ghosh and Saha [14] restricted themselves to Markov policies, bounded transition and cost rates, i.e., \(\sup _{x\in S}\overline{q}_x<\infty \) and \(\sup _{x\in S,a\in A}c(x,a)<\infty \), and a finite state space S. These restrictions, e.g., the finiteness of S, were needed for their investigations; see e.g., [14, Remark 3.6]. Under the compactness-continuity condition (Condition 2.1), it was shown in [14] that there exists an optimal Markov policy for the discounted risk-sensitive CTMDP, and the optimality equation was established. By using the theorems presented earlier in this section, we can obtain these optimality results for problem (10) in a much more general setup: the state space S is Borel, there is no boundedness requirement on the transition rate with respect to the state \(x\in S\), and the optimality is over the class of history-dependent policies. Furthermore, we let the CTMDP model be nonhomogeneous, i.e., the transition rate q(dy|t, x, a) is now a signed kernel on \(\mathcal{B}(S)\) from \((t,x,a)\in [0,\infty )\times S\times A\), satisfying the corresponding version of (3); the notation \(\tilde{q}\) is kept as before, see (2), with the extra argument t in addition to x. Similarly, the nonnegative cost rate c is allowed to be a measurable function on \([0,\infty )\times S\times A\).
Corollary 1
Consider the \(\alpha \)-discounted risk-sensitive (nonhomogeneous) CTMDP problem (10) with \(c(\xi _t,a)\) being replaced by \(c(t,\xi _t,a)\). Suppose
and the corresponding version of Condition 2.1, where x is replaced by (t, x), is satisfied by the nonhomogeneous CTMDP model. Then the following assertions hold.
- (a)
There exists some \([1,\infty )\)-valued measurable solution on \([0,\infty )\times S\) to
$$\begin{aligned}&-\,(V(t,x)-V(0,x))\\&\quad =\int _0^t \inf _{a\in A}\left\{ \int _S V(u,y)\tilde{q}(dy|u,x,a)+(e^{-\alpha u} c(u,x,a)\right. \\&\qquad \left. -\,q_{(u,x)}(a))V(u,x) \right\} du,~x\in S,~t\in [0,\infty ), \end{aligned}$$so that V(t, x) is absolutely continuous in t for each \(x\in S.\)
- (b)
Let L be the minimal \([1,\infty )\)-valued measurable solution on \([0,\infty )\times S\) to the above equation. Then the value function say \(L^*\) to the \(\alpha \)-discounted risk-sensitive CTMDP problem (10) (with \(c(\xi _t,a)\) being replaced by \(c(t,\xi _t,a)\)) is given by \(L^*(x)=L(0,x)\) for each \(x\in S.\)
- (c)
There exists an optimal deterministic Markov policy f for the \(\alpha \)-discounted risk-sensitive CTMDP problem (10) (with \(c(\xi _t,a)\) being replaced by \(c(t,\xi _t,a)\)). One can take f as any measurable mapping from \([0,\infty )\times S\) to A such that
$$\begin{aligned}&\inf _{a\in A}\left\{ \int _S L(u,y)\tilde{q}(dy|u,x,a)+(e^{-\alpha u} c(u,x,a)-q_{(u,x)}(a))L(u,x) \right\} \\&\quad =\int _S L(u,y)\tilde{q}(dy|u,x,f(u,x))+(e^{-\alpha u} c(u,x,f(u,x))\\&\qquad -\,q_{(u,x)}(f(u,x)))L(u,x) \end{aligned}$$for each \(u\in [0,\infty )\) and \(x\in S.\)
Proof
We prove this by reformulating the \(\alpha \)-discounted risk-sensitive (nonhomogeneous) CTMDP problem (10) in the form of problem (8) for a PDMDP, which we introduce as follows. We use the notation “hat” to distinguish this model from the original (nonhomogeneous) CTMDP model.
The state space is \(\hat{S}=[0,\infty )\times S.\)
The action space is the same as in the CTMDP: \(\hat{A}=A.\)
The transition rate \(\hat{q}(ds\times dy|(t,x),a)\) is defined by
$$\begin{aligned} \hat{q}(ds\times dy|(t,x),a):=\tilde{\hat{q}}(ds\times dy|(t,x),a)-I\{(t,x)\in ds\times dy\}q_{(t,x)}(a), \end{aligned}$$where
$$\begin{aligned} \tilde{\hat{q}}(ds\times dy|(t,x),a):=I\{t\in ds\}\tilde{q}(dy|t,x,a), \end{aligned}$$for each \((t,x)\in \hat{S}\) and \(a\in \hat{A}.\)
The drift is given by \(\hat{\phi }((t,x),s):=(t+s,x)\) for each \(x\in S\) and \(t,s\ge 0.\) Clearly it satisfies the corresponding version of (4).
The cost rate is given by
$$\begin{aligned} \hat{c}((t,x),a):=e^{-\alpha t} c(t,x,a),\quad ~\forall ~t\in [0,\infty ),\quad ~x\in S,\quad ~a\in A. \end{aligned}$$
Now the marked point process \(\{\hat{t}_n,\hat{x}_n\}\) and controlled process \(\hat{\xi }_t\) in this PDMDP model are connected to those in the original (nonhomogeneous) CTMDP model, namely \((t_n,x_n)\) and \(\xi _t\), via \(\hat{t}_n=t_n\), \(\hat{x}_n=(t_n,x_n),\) and \(\hat{\xi }_t=(t,\xi _t).\) For example, under a fixed policy \(\hat{\pi }\) and initial distribution \(\hat{\gamma }\) in this PDMDP model, the version of the first equation in (7) now reads on \(\{\omega :x_n(\omega )\in S\}\)
Clearly, Conditions 2.1, 2.2 and 2.3 are satisfied by this PDMDP model. It remains to apply Theorem 3.1. \(\square \)
The condition in the previous corollary is much weaker than in [14], and can be further weakened: one only needs the reformulated PDMDP to satisfy Conditions 2.1, 2.2 and 2.3. Moreover, the boundedness of the cost rate c was assumed in the previous corollary only to ensure that Condition 2.3 is satisfied. It can be relaxed if one formulates the previous corollary using the statements in Remarks 3.1 and 3.2.
One can also consider the risk-sensitive nonhomogeneous CTMDP problem on the finite horizon [0, T] with \(T>0\) being a fixed constant:
where g is a \([0,\infty )\)-valued measurable function; g(x) represents the terminal cost incurred when \(\xi _T=x\in S\). Let us put \(g(x_\infty ):=0.\) Here \(\alpha \) is a fixed nonnegative finite constant. A simpler version of this problem was considered in [24], with \(\alpha =0\) and a bounded cost rate, where additional restrictions were put on the growth of the transition rate. We can reformulate this problem as the PDMDP problem (8) just as above. The only difference is that now we put \(q_{(t,x)}(a)\equiv 0\) for each \(x\in S\) and \(t\ge T,\) and introduce the following cost rate for each \(x\in S\), \(t\ge 0\) and \(a\in A:\)
4 Proof of the Main Statements
For the rest of this paper, it is convenient to introduce the following notation. Let \(\mathbb {P}(A)\) be the space of probability measures on \(\mathcal{B}(A)\), endowed with the standard weak topology. For each \(\mu \in \mathbb {P}(A)\),
Let \(\mathcal{R}\) denote the set of (Borel) measurable mappings \(\rho _t(da)\) from \((0,\infty )\) to \(\mathbb {P}(A).\) Here, we do not distinguish two measurable mappings that coincide almost everywhere with respect to the Lebesgue measure on \((0,\infty ).\) Let us equip \(\mathcal{R}\) with the Young topology, which is the weakest topology with respect to which the function \( \rho \in \mathcal{{R}}\rightarrow \int _0^\infty \int _A f(t,a)\rho _t(da)dt \) is continuous for each strongly integrable Carathéodory function f on \((0,\infty )\times A\). Here a real-valued measurable function f on \((0,\infty )\times A\) is called a strongly integrable Carathéodory function if for each fixed \(t\in (0,\infty )\), f(t, a) is continuous in \(a\in A,\) and \(\sup _{a\in A}|f(t,a)|\) is integrable in t, i.e., \(\int _0^\infty \sup _{a\in A}|f(t,a)|dt<\infty .\) It is known that if A is a compact Borel space, then so is \(\mathcal{R}\); see Chapter 4 of [10].
Lemma 4.1
Suppose Conditions 2.1 and 2.2 are satisfied. Then the following assertions hold.
- (a)
The value function \(V^*\) is the minimal \([1,\infty ]\)-valued measurable solution to
$$\begin{aligned}&V^*(x)\\&\quad = \inf _{\rho \in \mathcal{R}}\left\{ \int _0^\infty e^{-\int _0^\tau (q_{\phi (x,s)}(\rho _s)-c(\phi (x,s),\rho _s))ds} \left( \int _S V^*(y)\tilde{q}(dy|\phi (x,\tau ),\rho _\tau )\right) d\tau \right. \\&\qquad \left. +\,e^{-\int _0^\infty q_{\phi (x,s)}(\rho _s)ds}e^{\int _0^\infty c(\phi (x,s),\rho _s)ds} \right\} ,\quad ~\forall ~x\in S. \end{aligned}$$
- (b)
The mapping
$$\begin{aligned}&\rho \in \mathcal{R}\rightarrow W(x,\rho )\\&\quad :=\int _0^\infty e^{-\int _0^\tau (q_{\phi (x,s)}(\rho _s)-c(\phi (x,s),\rho _s))ds} \left( \int _S V^*(y)\tilde{q}(dy|\phi (x,\tau ),\rho _\tau )\right) d\tau \\&\qquad +\,e^{-\int _0^\infty q_{\phi (x,s)}(\rho _s)ds}e^{\int _0^\infty c(\phi (x,s),\rho _s)ds} \end{aligned}$$is lower semicontinuous for each \(x\in S.\)
Proof
One can legitimately consider the following DTMDP (discrete-time Markov decision process): according to [9, Lemma 2.29], all the involved mappings are measurable.
The state space is \(\mathbf X :=((0,\infty )\times S)\bigcup \{(\infty ,x_\infty )\}\). Whenever the topology is concerned, \((\infty ,x_\infty )\) is regarded as an isolated point in \(\mathbf X .\)
The action space is \(\mathbf A :=\mathcal{R}\).
The transition kernel p on \(\mathcal{B}(\mathbf X )\) from \(\mathbf X \times \mathbf A \), c.f. (7), is given for each \(\rho \in \mathbf A \) by
$$\begin{aligned}&p(\Gamma _1\times \Gamma _2|(\theta ,x),\rho ):=\int _{\Gamma _2} e^{-\int _0^t q_{\phi (x,s)}(\rho _s)ds}\tilde{q}(\Gamma _1|\phi (x,t),\rho _t)dt,\\&\quad \forall ~\Gamma _1\in \mathcal{B}(S),~\Gamma _2 \in \mathcal{B}((0,\infty )),~x\in S,~\theta \in (0,\infty ),\nonumber \\&p(\{(\infty ,x_\infty )\}|(\theta ,x),\rho ):=e^{-\int _0^\infty q_{\phi (x,s)}(\rho _s)ds},\quad ~\forall ~x\in S,\quad ~\theta \in (0,\infty );\nonumber \\&p(\{(\infty ,x_\infty )\}|(\infty ,x_\infty ),\rho ):=1. \end{aligned}$$The cost function l is a \([0,\infty ]\)-valued measurable function on \(\mathbf X \times \mathbf A \times \mathbf X \) given by
$$\begin{aligned}&l((\theta ,x),\rho ,(\tau ,y))\\&\quad :=\int _0^\infty I\{s<\tau \} c(\phi (x,s),\rho _s)ds,~\forall ~((\theta ,x),\rho ,(\tau ,y))\in \mathbf X \times \mathbf A \times \mathbf X . \end{aligned}$$
The relevant facts and statements for the DTMDP are included in the Appendix.
One can show that under Conditions 2.1 and 2.2, for each \((\theta ,x)\in \mathbf X \), \(\rho \in \mathbf A \rightarrow \int _\mathbf{X }f(z)p(dz|(\theta ,x),\rho )\) is continuous for each bounded measurable function f on \(\mathbf X \); for each \((\theta ,x)\in \mathbf X \) and \((\tau ,y)\in \mathbf X \), \(\rho \in \mathbf A \rightarrow l((\theta ,x),\rho ,(\tau ,y))\) is lower semicontinuous; and \(\mathbf A \) is a compact Borel space. Hence, Condition A.1 for the DTMDP model \(\{\mathbf{X },\mathbf{A },p,l\}\) is satisfied.
The controlled process in the above DTMDP model \(\{\mathbf{X },\mathbf{A },p,l\}\) is denoted by \(\{Y_n,n=0,1,\dots \}\), where \(Y_n=(\Theta _n,X_n)\), and the controlling process is denoted by \(\{A_n,n=0,1,\dots \}.\) For \(n\ge 1,\) \(\Theta _n\) and \(X_n\) correspond to the nth sojourn time and post-jump state in the PDMDP; \(\Theta _0\) is fictitious, and \(X_0\) is the initial state in the PDMDP. Let \(\Sigma \) be the class of all strategies for the DTMDP model \(\{\mathbf{X },\mathbf{A },p,l\}\), and \(\Sigma _{DM}^0\) be the class of deterministic Markov strategies of the form \(\sigma =(\varphi _n)\) for which \(\varphi _0((\theta ,x))\) does not depend on \(\theta \in (0,\infty )\) for each \(x\in S.\) We reserve the term “policy” for the PDMDP and the term “strategy” for the DTMDP.
According to Proposition A.1, the function
is the minimal \([1,\infty ]\)-valued measurable solution to the optimality equation
for each \(x\in S\) and \(\theta \in (0,\infty );\) this is just (20). Furthermore, by Proposition A.1, there exists a deterministic stationary strategy \(\sigma ^*\) for the DTMDP such that \(\sigma ^*((\theta ,x))\) attains the above infimum for each \(x\in S\) and \(\theta \in (0,\infty ),\) and any such strategy \(\sigma ^*\) verifies
Let \(\hat{\theta }\in (0,\infty )\) be arbitrarily fixed. Since the function \(\mathbf V ^*((\theta ,x))\) is measurable in \((\theta ,x)\in \mathbf X \), the mapping \(x\in S\rightarrow \mathbf V ^*((\hat{\theta },x))\) is measurable. The strategy \(\sigma ^*\) and the constant \(\hat{\theta }\) induce a deterministic Markov strategy \(\sigma ^{**}=(\varphi _n)\in \Sigma ^0_{DM}\), where \(\varphi _0((\theta ,x)):=\sigma ^*((\hat{\theta },x))\) for each \(\theta \in (0,\infty ),~x\in S\), and \(\varphi _n((\theta ,x)):=\sigma ^*((\theta ,x))\) for each \(n\ge 1\), \(\theta \in (0,\infty ),~x\in S.\) (The control at the isolated point \((\infty ,x_\infty )\) is irrelevant, and we do not specify the strategy at that point.) This strategy can be identified with a policy \(\pi ^*\) in the PDMDP, cf. (6). On the other hand, each policy \(\pi =(\pi _n)\) can be identified with a deterministic strategy in this DTMDP. Thus,
for each \(x\in S.\) Consequently, the policy \(\pi ^*\) is optimal, \(V^*(x)=\mathbf V ^*((\theta ,x))\) for each \(x\in S\) and \(\theta \in (0,\infty );\) recall that \(\hat{\theta }\) was arbitrarily fixed. The statement of this lemma now follows. \(\square \)
The policy \(\pi ^*\) in the proof of the previous lemma is actually optimal for problem (8). However, it is not necessarily deterministic or stationary. The reduction of the risk-sensitive PDMDP problem (8) to a risk-sensitive problem for the DTMDP model \(\{\mathbf{X },\mathbf{A },p,l\}\), as seen in the proof of the above lemma, will be used without special reference in what follows.
Lemma 4.2
Suppose Conditions 2.1, 2.2 and 2.3 are satisfied. For each \(x\in S\) and \(\rho \in \mathcal{R}\),
is monotone nondecreasing in \(t\in [0,\infty )\).
Proof
Let \(0\le t_1<t_2<\infty \) be arbitrarily fixed. We need to show
Without loss of generality, we may assume
Then all the four terms in (11) are nonnegative and finite, and (11) is equivalent to
which is verified as follows. Let \(\delta >0\) be arbitrarily fixed. By Lemma 4.1, there exists some \(\hat{\nu }\in \mathcal{R}\) such that
(Recall \(\phi (x,t_2+t)=\phi (\phi (x,t_2),t)\) for each \(t \ge 0.\)) Consider \(\tilde{\nu }\in \mathcal{R}\) defined by
Then routine calculations lead to
Since \(\delta >0\) was arbitrarily fixed, it follows that the term in parentheses in (12) is nonnegative, and thus inequality (12) is verified. \(\square \)
Lemma 4.3
Suppose Conditions 2.1, 2.2 and 2.3 are satisfied. For each \(x\in S\), there is some \(\rho ^*\in \mathcal{R}\) such that
Proof
Let \(x\in S\) be fixed, and let \(\rho ^*\in \mathcal{R}\) be such that \(V^*(x)=W(x,\rho ^*)\), see Lemma 4.1. Suppose \(t\in [0,\infty )\) is arbitrarily fixed. Consider \(\tilde{\rho }\in \mathcal{R}\) defined by \( \tilde{\rho }_s=\rho ^*_{t+s}\) for each \(s>0\). Then
recall (4). On the other hand, by Lemma 4.2,
The statement of this lemma is thus proved. \(\square \)
Lemma 4.4
Suppose Conditions 2.1, 2.2 and 2.3 are satisfied. Then for each \(x\in S,\)\(t\in [0,\infty )\rightarrow V^*(\phi (x,t))\) is absolutely continuous.
Proof
This immediately follows from Lemma 4.3. \(\square \)
Proof of Theorem 3.1
(a) Under Conditions 2.1, 2.2 and 2.3, by Lemma 4.4, for each \(x\in S,\) let \(t\in [0,\infty )\rightarrow U^*(x,t)\) be an integrable real-valued function such that \(U^*(x,t)\) coincides with the derivative of \(t\in [0,\infty )\rightarrow V^*(\phi (x,t))\) almost everywhere. Let \(x\in S\) and \(t\in [0,\infty )\) be fixed, and let \(\rho ^*\in \mathcal{R}\) be from Lemma 4.3.
and
are absolutely continuous in \(\tau \) and are finite for each \(\tau \in [0,\infty )\). Since \(\phi (x,0)=x\), see (4),
Now by Lemma 4.3,
where f is a measurable mapping from S to A such that
for each \(x\in S\); the existence of such a mapping follows from a well-known measurable selection theorem, cf. Proposition D.5 of [15].
Note that \(e^{-\int _0^\tau (q_{\phi (x,v)}(\rho _v)-c(\phi (x,v),\rho _v))dv}\) is bounded and separated from zero in \(\tau \in [0,t]\) for each \(\rho \in \mathcal{R};\) recall Condition 2.2. So
is finite. If
then
which contradicts (14). Therefore,
Then
is absolutely continuous on [0, t]. After legitimately differentiating the above expression with respect to v, and applying Lemma 4.2, we see
for almost all \(v\in [0,t].\) This and (14) imply
almost everywhere in \(\tau \in [0,t].\) Remember, \(t\in [0,\infty )\) was arbitrarily fixed. The first part of (a) is thus verified, and we postpone the justification of the second part of (a) after the proof of part (b).
(b) We use the same notation as in the above. Note that
Indeed, if either \(\int _0^\infty q_{\phi (x,s)}(f(\phi (x,s)))ds\) or \(\int _0^\infty c(\phi (x,s),f(\phi (x,s))))ds\) is finite, then in the above inequality, the equality takes place; and if both \(\int _0^\infty q_{\phi (x,s)}(f(\phi (x,s)))ds\) and \(\int _0^\infty c(\phi (x,s),f(\phi (x,s))))ds\) are infinite, then the right hand side of the inequality is zero according to (1).
In the proof of part (a), it was observed that
and
are absolutely continuous in t and are thus finite for each \(t\in [0,\infty )\). As in the proof of part (a), similar calculations to those in (14) imply that for each \(t\in [0,\infty ),\)
where the last equality is by what was established in part (a). Therefore, for each \(t\in [0,\infty ),\)
where the inequality holds because \(V^*(x)\ge 1\) for each \(x\in S.\) Taking \(\mathop {\underline{\lim }}_{t\rightarrow \infty }\) on the both sides of the previous equality yields:
with the inequality following from (15). Hence
Here it is clear that \(s\in [0,\infty )\rightarrow f(\phi (x,s))\) can be identified as an element of \(\mathcal{R}\), denoted as \(\tilde{f}^x\). In fact, \(\tilde{f}_s^x=\delta _{\{f(\phi (x,s))\}}\) for each \(s\in [0,\infty )\), whereas \(x\in S\rightarrow \tilde{f}^x\in \mathcal{R}\) is measurable. This measurable mapping \(x\in S\rightarrow \tilde{f}^x\in \mathcal{R}\) defines a deterministic stationary optimal strategy for the risk-sensitive DTMDP problem (20) by Proposition A.1. It is clear that the measurable mapping \(x\in S\rightarrow f(x)\in A\) defines an optimal deterministic stationary policy for the PDMDP problem (8).
Finally, we show the remaining part of (a). Let \(H^*\) be a measurable \([1,\infty )\)-valued function on S such that
There exists a measurable mapping h from S to A such that
c.f., Proposition D.5 of [15]. It follows that \(\int _0^s\int _S H^*(y)\tilde{q}(dy|\phi (x,\tau ),h(\phi (x,\tau )))d\tau \) is absolutely continuous in \(s\in [0,t]\) for each \(t\ge 0.\) As in the proof of part (b),
and by passing to the lower limit as \(t\rightarrow \infty \),
It remains to refer to Proposition A.1 for that \(H^*(x)\ge V^*(x)\) for each \(x\in S.\)\(\square \)
Proof of Theorem 3.2
Let \(V^*_0(x):=1\) for each \(x\in S.\) For each \(n\ge 0,\) one can legitimately define
Recall that the DTMDP model \(\{\mathbf{X },\mathbf{A },p,l\}\) satisfies Condition A.1, as noted in the proof of Lemma 4.1. Then by Proposition A.1, \(\{V_n^*\}\) is a monotone nondecreasing sequence of \([1,\infty )\)-valued measurable functions on S such that \(V^*_n(x)\uparrow V^*(x)\) as \(n\uparrow \infty ,\) for each \(x\in S.\)
Let \(n\ge 0\) be fixed. As in Lemma 4.3, for each \(x\in S\), there is some \(\rho ^*\in \mathcal{R}\) such that
Also the relevant version of Lemma 4.2 holds: for each \(x\in S\) and \(\rho \in \mathcal{R}\),
is monotone nondecreasing in \(t\in [0,\infty )\). Clearly, \(V^*_{n+1}(\phi (x,t))\) is absolutely continuous in \(t\in [0,\infty )\) for each \(x\in S\).
Corresponding to (14), we now have
where \(\tau \in [0,t]\rightarrow U^*_{n+1}(x,\tau )\) is integrable and coincides with \(\frac{\partial V^*_{n+1}(\phi (x,t))}{\partial t}\) almost everywhere, and f is some measurable mapping from S to A, whose existence is guaranteed by [15, Proposition D.5]. Continued from the above relation, the reasoning in the proof of the first assertion in part (a) of Theorem 3.1 can be followed: eventually we see
almost everywhere in \(\tau \in [0,t],\) i.e., the equation
is satisfied by \(V=V^*_{n+1}.\)
Recall that \(V^*_{0}=V^{(0)}\). Suppose the recursive definition in (9) is valid up to step n, and \(V^*_{n}(x)=V^{(n)}(x)\) for each \(x\in S.\) Consider an arbitrarily fixed \([1,\infty )\)-valued measurable solution V to (18), and let \(f^*\) be a measurable mapping from S to A such that
One can follow the reasoning in the last part of the proof of Theorem 3.1, and see, c.f. (16),
where the last equality is by (17). Thus, \(V^*_{n+1}\) is the minimal \([1,\infty )\)-valued measurable solution to (18), and coincides with \(V^{(n+1)}\). Therefore, by induction \(V^*_{n}=V^{(n)}\) for each \(n\ge 0.\) It follows now that \(V^{(n)}(x)\uparrow V^*(x)\) as \(n\uparrow \infty \) for each \(x\in S.\)\(\square \)
5 Conclusion
In this paper, we considered total undiscounted risk-sensitive PDMDP in Borel state and action spaces with a nonnegative cost rate. The transition and cost rates are assumed to be locally integrable along the drift. Under quite natural conditions, we showed that the value function is a solution to the optimality equation, justified the value iteration algorithm, and showed the existence of deterministic stationary optimal policy. As a corollary, the obtained results were applied to improving significantly known results for finite horizon undiscounted and infinite horizon discounted risk-sensitive CTMDP in the literature.
References
Bäuerle, N., Jaśkiewicz, A.: Risk-sensitive Divident problems. Eur. J. Oper. Res. 242, 161–171 (2015)
Bäuerle, N., Rieder, U.: MDP algorithms for portfolio optimization problems in pure jump markets. Financ. Stoch. 13, 591–611 (2009)
Bäuerle, N., Rieder, U.: Markov Decision Processes with Applications to Finance. Springer, Berlin (2011)
Bäuerle, N., Rieder, U.: More risk-sensitive Markov decision processes. Math. Oper. Res. 39, 105–120 (2014)
Bertsekas, D., Shreve, S.: Stochastic Optimal Control. Academic Press, New York (1978)
Cavazos-Cadena, R., Montes-de-Oca, R.: Optimal stationary policies in risk-sensitive dynamic programs with finite state space and nonnegative rewards. Appl. Math. (Warsaw) 27, 167–185 (2000)
Chung, K., Sobel, M.: Discounted MDP’s: distribution functions and exponential utility maximization. SIAM J Control Optim. 25, 49–62 (1987)
Coraluppi, S., Marcus, S.: Risk-sensitive queueing. In: Proceedings of the 35th Annual Allerton Conference on Communication Control and Computing, 943–952 (1997)
Costa, O., Dufour, F.: Continuous Average Control of Piecewise Deterministic Markov Processes. Springer, New York (2013)
Davis, M.: Markov Models and Optimization. Chapman and Hall, London (1993)
Di Masi, G., Stettner, L.: Risk-sensitive control of discrete-time Markov processes with infinite horizon. SIAM J. Control Optim. 38, 61–78 (1999)
Fainberg, E.: Controlled Markov processes with arbitrary numerical criteria. Theory Probab. Appl. 27, 486–503 (1982)
Forwick, L., Schäl, M., Schmitz, M.: Piecewise deterministic Markov control processes with feedback controls and unbounded costs. Acta Appl. Math. 82, 239–267 (2004)
Ghosh, M., Saha, S.: Risk-sensitive control of continuous time Markov chains. Stochastics 86, 655–675 (2014)
Hernández-Lerma, O., Lasserre, J.: Discrete-Time Markov Control Processes. Springer, New York (1996)
Howard, R., Matheson, J.: Risk-sensitive Markov decision proceses. Manag. Sci. 18, 356–369 (1972)
Jaquette, S.: A utility criterion for Markov decision processes. Manag. Sci. 23, 43–49 (1976)
Jaśkiewicz, A.: A note on negative dynamic programming for risk-sensitive control. Oper. Res. Lett. 36, 531–534 (2008)
Kitaev, M., Rykov, V.: Controlled Queueing Systems. CRC Press, Boca Raton (1995)
Kumar, S., Pal, C.: Risk-sensitive control of pure jump process on countable space with near monotone cost. Appl. Math. Optim. 68, 311–331 (2013)
Piunovski, A., Khametov, V.: New effective solutions of optimality equations for the controlled Markov chains with continuous parameter (the unbounded price-function). Probl. Control Inf. Theory 14, 303–318 (1985)
Piunovskiy, A.: Optimal Control of Random Sequences in Problems with Constraints. Kluwer, Dordrecht (1997)
Schäl, M.: On piecewise deterministic Markov control processes: control of jumps and of risk processes in insurance. Insur. Math. Econ. 22, 75–91 (1998)
Wei, Q.: Continuous-time Markov decision processes with risk-sensitive finite-horizon cost criterion. Math. Methods Oper. Res. 84, 461–487 (2016)
Wei, Q., Chen, X.: Continuous-time Markov decision processes under the risk-sensitive average cost criterion. Oper. Res. Lett. 44, 457–462 (2016)
Yushkevich, A.: On reducing a jump controllable Markov model to a model with discrete time. Theory Probab. Appl. 25, 58–68 (1980)
Zhang, Y.: Continuous-time Markov decision processes with exponential utility. SIAM J. Control Optim. 55, 2636–2660 (2017)
Acknowledgements
We thank the referees for their remarks, which improved the presentation of this paper. This work is partially supported by a grant from the Royal Society (IE160503).
Author information
Authors and Affiliations
Corresponding author
Appendix
Appendix
Consider a discrete-time Markov decision process with the following primitives:
\(\mathbf X \) is a nonempty Borel state space.
\(\mathbf A \) is a nonempty Borel action space.
p(dy|x, a) is a stochastic kernel on \(\mathcal{B}(\mathbf X )\) given \((x,a)\in \mathbf X \times \mathbf A \).
l a \([0,\infty ]\)-valued measurable cost function on \(\mathbf X \times \mathbf A \times \mathbf X .\)
Let \(\Sigma \) be the space of strategies, and \(\Sigma _{DM}\) be the space of all deterministic strategies for the DTMDP. Let the controlled and controlling processes be denoted by \(\{Y_n, n=0,1,\dots ,\infty \}\) and \(\{A_n,n=0,1,\dots ,\infty \}\), respectively. The strategic measure of a strategy \(\sigma \) given the initial state \(x\in \mathbf X \) is denoted by \(\mathbf P _x^\sigma \). The expectation taken with respect to \(\mathbf P _x^\sigma \) is denoted by \(\mathbf E _x^\sigma .\)
Consider the optimal control problem
It is also referred to as the risk-sensitive DTMDP problem. We denote the value function of problem (19) by \(\mathbf V ^*\). Then a strategy \(\sigma ^*\) is called optimal for problem (19) if \(\mathbf V (x,\sigma ^*)=\mathbf V ^*(x)\) for each \(x\in \mathbf X .\)
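To make the exponential-utility criterion \(\mathbf E _x^\sigma [e^{\sum l}]\) in (19) concrete, here is a minimal Monte Carlo sketch on a hypothetical two-state chain under a fixed strategy (all numbers are illustrative and not taken from the paper): state 1 is absorbing and cost-free, so the total cost is finite almost surely, and by Jensen's inequality the risk-sensitive value exceeds the exponential of the risk-neutral value.

```python
import math
import random

random.seed(0)

# Hypothetical two-state chain under a fixed strategy (illustrative numbers):
# from state 0, stay with probability 0.5 at cost 0.2, or jump to the
# absorbing, cost-free state 1 at cost 0.1.
def run_episode():
    x, total = 0, 0.0
    while x == 0:
        if random.random() < 0.5:
            total += 0.2          # stay in state 0
        else:
            total += 0.1          # absorb in state 1
            x = 1
    return total

n = 200_000
costs = [run_episode() for _ in range(n)]
risk_neutral = sum(costs) / n                         # E[ total cost ] = 0.3
risk_sensitive = sum(math.exp(c) for c in costs) / n  # E[ exp(total cost) ], cf. (19)
# Exact value: sum_k 0.5^{k+1} e^{0.2k+0.1} = 0.5 e^{0.1} / (1 - 0.5 e^{0.2})
```

In this toy model the risk-sensitive value is \(0.5e^{0.1}/(1-0.5e^{0.2})\approx 1.419\), strictly larger than \(e^{0.3}\approx 1.350\), illustrating the risk aversion built into the exponential utility.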
Condition A.1
-
(a)
The function l(x, a, y) is lower semicontinuous in \(a\in \mathbf A \) for each \(x,y\in \mathbf X .\)
-
(b)
For each bounded measurable function f on \(\mathbf X \) and each \(x\in \mathbf X ,\)\(\int _\mathbf{X }f(y)p(dy|x,a)\) is continuous in \(a\in \mathbf A .\)
-
(c)
The space \(\mathbf A \) is a compact Borel space.
Proposition A.1
Suppose Condition A.1 is satisfied.
- (a)
The value function \(\mathbf V ^*\) is the minimal \([1,\infty ]\)-valued measurable solution to
$$\begin{aligned} \mathbf V (x)=\inf _{a\in \mathbf A }\left\{ \int _\mathbf{X }p(dy|x,a)e^{l(x,a,y)}{} \mathbf V (y)\right\} ,\quad ~x\in \mathbf X . \end{aligned}$$(20) - (b)
Let \(\mathbf U \) be a \([1,\infty ]\)-valued lower semianalytic function on \(\mathbf X \). If
$$\begin{aligned} \mathbf U (x)\ge \inf _{a\in \mathbf A }\left\{ \int _\mathbf{X }p(dy|x,a)e^{l(x,a,y)}{} \mathbf U (y)\right\} ,\quad ~\forall ~x\in \mathbf X , \end{aligned}$$then \(\mathbf U (x)\ge \mathbf V ^*(x)\) for each \(x\in \mathbf X .\) In particular, if the function \(\mathbf U \) satisfying the above relation is \([1,\infty )\)-valued, then so is the value function \(\mathbf V ^*.\)
(c) Let \(\varphi \) be a deterministic stationary strategy for the DTMDP model \(\{\mathbf{X },\mathbf{A },p,l\}\). If
$$\begin{aligned} \mathbf V ^*(x)=\int _\mathbf{X }p(dy|x,\varphi (x))e^{l(x,\varphi (x),y)}\mathbf V ^*(y),\quad ~\forall ~x\in \mathbf X , \end{aligned}$$
(21)
then \(\mathbf V ^*(x)=\mathbf V (x,\varphi )\) for each \(x\in \mathbf X .\)
(d) Let \(\mathbf V ^{(0)}(x):=1\) for each \(x\in \mathbf X \), and for each \(n=1,2,\dots ,\)
$$\begin{aligned} \mathbf V ^{(n)}(x):=\inf _{a\in \mathbf A }\left\{ \int _\mathbf{X }p(dy|x,a)e^{l(x,a,y)}\mathbf V ^{(n-1)}(y)\right\} ,\quad ~\forall ~x\in \mathbf X . \end{aligned}$$
Then \((\mathbf V ^{(n)}(x))\) increases to \(\mathbf V ^*(x)\) for each \(x\in \mathbf X \), where \(\mathbf V ^*\) is the value function for problem (19). Furthermore, there exists a deterministic stationary strategy \(\varphi \) satisfying (21), and so in particular, there exists a deterministic stationary optimal strategy for the risk-sensitive DTMDP problem (19).
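On a finite model, the value iteration of Proposition A.1(d) is directly implementable. The sketch below iterates \(\mathbf V ^{(n)}(x)=\min _a\sum _y p(y|x,a)e^{l(x,a,y)}\mathbf V ^{(n-1)}(y)\) from \(\mathbf V ^{(0)}\equiv 1\) on a hypothetical two-state, two-action model (all numbers illustrative), with a greedy arg-min playing the role of the measurable selection step; state 1 is made absorbing and cost-free so that the value function is finite.

```python
import numpy as np

# Hypothetical model (illustrative numbers); state 1 is absorbing, cost-free.
# p[a, x, y] = transition probability, l[a, x, y] = nonnegative one-step cost.
p = np.array([[[0.5, 0.5], [0.0, 1.0]],
              [[0.2, 0.8], [0.0, 1.0]]])
l = np.array([[[0.2, 0.1], [0.0, 0.0]],
              [[0.5, 0.3], [0.0, 0.0]]])

V = np.ones(2)                      # V^(0) = 1
for _ in range(500):
    # Q[a, x] = sum_y p(y|x,a) * exp(l(x,a,y)) * V(y)
    Q = np.einsum('axy,axy,y->ax', p, np.exp(l), V)
    V_new = Q.min(axis=0)           # value iteration step of Proposition A.1(d)
    if np.max(np.abs(V_new - V)) < 1e-12:
        V = V_new
        break
    V = V_new

policy = Q.argmin(axis=0)           # greedy selector, cf. the selection theorem
```

For this toy model the iterates converge to \(\mathbf V ^*(0)=0.5e^{0.1}/(1-0.5e^{0.2})\approx 1.419\) and \(\mathbf V ^*(1)=1\), with the first action optimal in state 0; the deterministic stationary strategy read off from `policy` satisfies (21).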
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Guo, X., Zhang, Y. On Risk-Sensitive Piecewise Deterministic Markov Decision Processes. Appl Math Optim 81, 685–710 (2020). https://doi.org/10.1007/s00245-018-9485-x
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00245-018-9485-x
Keywords
- Continuous-time Markov decision processes
- Piecewise deterministic Markov decision processes
- Exponential utility
- Dynamic programming