Abstract
In this paper, we consider a risk-sensitive discounted control problem for continuous-time jump Markov processes taking values in a general state space. The transition rates of the underlying continuous-time jump Markov processes and the cost rates are allowed to be unbounded. Under a certain Lyapunov condition, we establish the existence and uniqueness of the solution to the Hamilton–Jacobi–Bellman equation. We also prove the existence of an optimal risk-sensitive control in the class of Markov controls and completely characterize the optimal controls.
1 Introduction
In this paper, we study the risk-sensitive discounted criterion for continuous-time Markov decision processes (CTMDPs) with a Borel state space. In the risk-neutral criterion, the controller optimizes the expected value of the total payoff. In the risk-sensitive criterion, by contrast, the controller considers the expected value of the exponential of the total payoff, which provides better protection from risk. The risk-sensitive, or exponential-of-integral, criterion is therefore a popular cost criterion owing to its applications in many areas, such as queueing systems and finance; for more details see Bauerle and Rieder (2014), Whittle (1990) and the references therein. Risk-sensitive control problems for CTMDPs form an important class of stochastic optimal control problems and have been widely studied under different sets of conditions. Finite-horizon risk-sensitive CTMDPs with a countable state space were studied in Ghosh and Saha (2014), Guo et al. (2019) and Wei (2016); for infinite-horizon risk-sensitive CTMDPs we refer to Ghosh and Saha (2014), Guo and Zhang (2020), Kumar and Pal (2013, 2015), Pal and Pradhan (2019) and Zhang (2017). For important contributions to the risk-sensitive control of discrete-time MDPs on a general state space, see Masi and Stettner (2000, 2007). Although risk-sensitive control of CTMDPs on a countable state space has been studied extensively, the corresponding literature on a general state space is rather limited; some exceptions are Guo and Zhang (2019, 2020) and Pal and Pradhan (2019).
In Pal and Pradhan (2019), the authors studied risk-sensitive control of pure jump processes on a general state space, with bounded transition and cost rates and Markov controls, and proved the HJB characterization of the optimal risk-sensitive control. The boundedness assumption on the transition and cost rates plays a key role in their proof of the existence of an optimal risk-sensitive control. This requirement, however, is restrictive in applications, for instance in queueing control and population processes, where the transition and reward/cost rates are usually unbounded. Moreover, in many real-life situations the state space may be uncountable, for example in chemical reaction and Gaussian models; see Guo and Zhang (2019), Piunovskiy and Zhang (2020) and the references therein for such examples. In Guo and Zhang (2019), the authors considered the finite-horizon risk-sensitive control problem for CTMDPs on a Borel state space with unbounded transition and cost rates and proved the existence of optimal controls via the HJB equation.
In this paper, we study a much more general risk-sensitive control problem for CTMDPs with a general state space. To the best of our knowledge, this is the first work dealing with infinite-horizon discounted risk-sensitive control for CTMDPs on a general state space with unbounded cost and transition rates and with possibly history-dependent controls. The main objective of this work is to prove the existence of a solution to the HJB equation and to characterize the optimal risk-sensitive controls. In particular: (1) we prove that the HJB equation has a unique solution \(\varphi _\alpha \in L^\infty _{{\mathcal {V}}}([0,1]\times S)\) satisfying the bounds in Eq. (3.1) below, where the space \(L^\infty _{{\mathcal {V}}}([0,1]\times S)\) is described below; (2) we prove that any measurable minimizer of the HJB equation is optimal and, conversely, that any optimal control in the class of Markov controls is a minimizer of the HJB Eq. (3.1) below. We first consider bounded transition and cost rates and establish the existence of a solution to the corresponding HJB equation by Banach’s fixed point theorem, as in Pal and Pradhan (2019). We then relax the boundedness hypothesis and extend this result to unbounded transition and cost rates. We characterize the value function via the HJB equation, prove the existence of an optimal control in the class of Markov controls, and give a complete HJB characterization of the optimal risk-sensitive controls. In Corollary 5.1, we show that if the cost and transition rates are bounded, then an optimal control exists for our model.
The rest of this article is structured as follows. Section 2 deals with the description of the problem, the required notation, some assumptions, and preliminary results. In Sect. 3, we give a continuity-compactness assumption and prove a stochastic representation of the solution of the HJB Eq. (3.1). In Sect. 4, we truncate the transition and cost rates and prove the existence of a unique solution to the HJB equation. A complete characterization of optimal controls is given in Sect. 5. In Sect. 6, we illustrate our theory and assumptions with an example.
2 The control problems
The model of CTMDP is a five-tuple which consists of the following elements:
-
a Borel space S, called the state space, whose elements are referred to as states of the system and the corresponding Borel \(\sigma \)-algebra is \({\mathcal {B}}(S)\). (Throughout the whole paper we consider that for any Borel space X, the corresponding Borel \(\sigma \)-algebra is \({\mathcal {B}}(X)\).)
-
A is the action set, which is assumed to be Borel space with the Borel \(\sigma \)-algebra \({\mathcal {B}}(A)\).
-
for each \(x\in S\), \(A(x)\in {\mathcal {B}}(A)\) denotes the set of admissible actions for state x. Let \(K:=\{(x, a)|x\in S, a\in A(x)\}\), which is a Borel subset of \(S\times A\).
-
the measurable function \(c:K \rightarrow {\mathbb {R}}_{+}\) denotes the cost rate function; the value c(x, a) measures the cost incurred per unit time when action a is taken in state x.
-
given any \((x, a)\in K\), the transition rate \(q(\cdot | x, a)\) is a Borel measurable signed kernel on S given K. That is, \(q(\cdot |x,a)\) is countably additive, and \(q(D| x, a)\ge 0 \) for all \((x,a)\in K\) and \(D\in {\mathcal {B}}(S)\) with \(x\notin D\). Moreover, we assume that \(q(\cdot | x, a)\) satisfies the following conservative and stable conditions: for any \(x\in S,\)
$$\begin{aligned}&q(S|x,a)\equiv 0 ~~~\text {and}\\&~q^{*}(x):=\sup _{a\in A(x)}q_x(a)<\infty , \end{aligned}$$where \(q_x(a):=-q(\{x\}| x, a)\ge 0.\) We need the transition rates to specify the random dynamic evolution of the system.
Next, we give an informal description of the evolution of the CTMDPs as follows. The controller observes continuously the current state of the system. When the system is in state \(x\in S\) at time \(t\ge 0\), he/she chooses action \(a_t\in A(x)\) according to some control. As a consequence of this, the following happens:
-
the controller incurs an immediate cost at rate \(c(x, a_t)\); and
-
after a random sojourn time (i.e., the holding time at state x), the system jumps to a set \(B\in {\mathcal {B}}(S)\) (\(x\notin B\)) of states with the transition probability \(\dfrac{q(B|x,a_t)}{q_x(a_t)}\) determined by the transition rates \(q(dy|x,a_t)\). The distribution function of the sojourn time is \(u\mapsto 1-e^{-\int _{t}^{t+u}q_x(a_s)ds}\); see Guo and Hernandez-Lerma (2009, Proposition B.8, p. 205) for details.
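To make this jump mechanism concrete, the following sketch simulates the embedded jump chain of a small CTMDP under a stationary control. The three-state model, its rates, and the action names are our own toy assumptions, not part of the paper. Under a stationary control the action is constant between jumps, so the sojourn-time distribution above reduces to an exponential law with rate \(q_x(a)\), after which the process jumps to \(y\ne x\) with probability \(q(\{y\}|x,a)/q_x(a)\).

```python
import random

# Hypothetical 3-state model: Q[x][a] maps each action a to the rates
# q({y}|x,a) for y != x.  All names and numbers here are illustrative.
Q = {
    0: {"slow": {1: 1.0}, "fast": {1: 3.0, 2: 1.0}},
    1: {"slow": {0: 2.0}, "fast": {0: 2.0, 2: 2.0}},
    2: {"slow": {0: 1.0}, "fast": {0: 1.0}},
}

def simulate(x0, policy, horizon, rng):
    """Simulate the jump pairs (T_k, x_k) up to `horizon` under a stationary
    policy x -> a."""
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        a = policy(x)
        rates = Q[x][a]
        qx = sum(rates.values())      # q_x(a) = -q({x}|x,a)
        t += rng.expovariate(qx)      # sojourn time ~ Exp(q_x(a))
        if t >= horizon:
            return path
        # embedded jump: go to y != x with probability q({y}|x,a)/q_x(a)
        u, acc = rng.random() * qx, 0.0
        for y, r in rates.items():
            acc += r
            if u <= acc:
                x = y
                break
        path.append((t, x))

rng = random.Random(7)
path = simulate(0, lambda x: "fast", horizon=10.0, rng=rng)
```

The path starts at the initial state, jump times increase, and consecutive states always differ, mirroring the description above.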
When the state of the system transits to a new state \(y\ne x\), the above procedure is repeated. Thus, the controller tries to minimize his/her costs with respect to some performance criterion \({\mathcal {J}}_\alpha (\cdot ,\cdot , \cdot )\), which in our present case is defined by (2.2), below. To formalize what is described above, below we describe the construction of continuous-time Markov decision processes (CTMDPs) under possibly history-dependent controls. To construct the underlying CTMDPs (as in Guo and Piunovskiy 2011; Kitaev 1995; Piunovskiy and Zhang 2011, 2020) we introduce some notation: let \(S_\Delta :=S \cup \{\Delta \}\) (with some “isolated” state \(\Delta \notin S\)), \(\Omega _0:=(S\times (0,\infty ))^\infty \), \(\Omega _k:=(S\times (0,\infty ))^k\times S\times (\{\infty \}\times \{\Delta \})^\infty \) for \(k\ge 1\) and \(\Omega :=\cup _{k=0}^\infty \Omega _k\). Let \({\mathcal {F}}\) be the Borel \(\sigma \)-algebra on \(\Omega \). Then we obtain the measurable space \((\Omega , {\mathcal {F}})\). For \(k\ge 1\) and a sample \( \omega :=(x_0, \theta _1, x_1, \cdots , \theta _k, x_k, \cdots )\in \Omega ,\) define
$$\begin{aligned} T_0(\omega ):=0,\quad T_k(\omega ):=\theta _1+\theta _2+\cdots +\theta _k,\quad T_\infty (\omega ):=\lim _{k\rightarrow \infty }T_k(\omega ). \end{aligned}$$
Using \(\{T_k\}\), we define the state process \(\{\xi _t\}_{t\ge 0}\) as
$$\begin{aligned} \xi _t(\omega ):=\sum _{k\ge 0}I_{\{T_k\le t<T_{k+1}\}}x_k+I_{\{t\ge T_\infty \}}\Delta \quad \text {for } t\ge 0. \end{aligned}$$(2.1)
Here, \(I_{E}\) denotes the indicator function of a set E, and we use the conventions \(0+z=:z\) and \(0z=:0\) for all \(z\in S_\Delta \). Obviously, \(\xi _t(\omega )\) is right-continuous on \([0,\infty )\). We denote \(\xi _{t-}(\omega ):=\liminf _{s\rightarrow t-}\xi _s(\omega )\). From Eq. (2.1), we see that \(T_k(\omega )\) \((k\ge 1)\) denotes the k-th jump moment of \(\{\xi _t, t\ge 0\}\), that \(x_{k-1}\) is the state of the process on \([T_{k-1}(\omega ),T_k(\omega ))\), that \(\theta _k=T_k(\omega )-T_{k-1}(\omega )\) plays the role of the sojourn time at state \(x_{k-1}\), and that the sample path \(\{\xi _t(\omega ),t\ge 0\}\) has at most denumerably many states \(x_k\) \((k=0,1,\cdots )\). The process after \(T_\infty \) is regarded as being absorbed in the state \(\Delta \). Thus, let \(q(\cdot | \Delta , a_\Delta ):\equiv 0\), \(A_\Delta :=A\cup \{a_\Delta \}\), \( A(\Delta ):=\{a_\Delta \}\), \(c(\Delta , a):\equiv 0\) for all \(a\in A_\Delta \), where \(a_\Delta \) is an isolated point.
To precisely define the criterion, we need to introduce the concept of a control, as in Guo et al. (2012), Guo and Piunovskiy (2011) and Kitaev and Rykov (1995). Take the right-continuous \(\sigma \)-algebras \(\{{\mathcal {F}}_t\}_{t\ge 0}\) with \({\mathcal {F}}_t:=\sigma (\{T_k\le s,\xi _{T_k}\in S\}: 0\le s\le t, k\ge 0)\). For each \(s>0\), let \({\mathcal {F}}_{s-}:=\bigvee _{0\le t<s}{\mathcal {F}}_t\). Now define the \(\sigma \)-algebra \({\mathcal {P}}:=\sigma (A\times \{0\}, B\times (s,\infty ): A\in {\mathcal {F}}_0, B\in {\mathcal {F}}_{s-})\), which is the \(\sigma \)-algebra of predictable sets on \(\Omega \times [0,\infty )\) related to \(\{{\mathcal {F}}_t\}_{t\ge 0}\). To complete the specification of a stochastic optimal control problem, we need, of course, an optimality criterion, which requires defining the class of controls as below.
Definition 2.1
A history-dependent control \(\pi :=\{\pi _t(\omega )\}_{t\ge 0}\) is a measurable map from \((\Omega \times [0,\infty ),{\mathcal {P}})\) to \((A_\Delta ,{\mathcal {B}}(A_\Delta ))\) satisfying \(\pi _t(\omega )\in A(\xi _{t-}(\omega ))\) for all \(\omega \in \Omega \) and \(t\ge 0\). For notational simplicity, we denote a history-dependent control by \(\{\pi _t\}_{t\ge 0}\). The set of all history-dependent controls is denoted by \(\Pi \). A control \(\pi \in \Pi \) is called Markov if \(\pi _t(\omega )=\pi _t( \xi _{t-}(\omega ))\) for every \(\omega \in \Omega \) and \(t\ge 0\), where \(\xi _{t-}(\omega ):=\lim _{s\uparrow t}\xi _s(\omega )\). We denote by \(\Pi ^{m}\) the family of all Markov controls.
For any compact metric space Y, let P(Y) denote the space of probability measures on Y with the Prohorov topology. Under Assumption 2.1 below, for any initial state \(x\in S\) and any control \(\pi \in \Pi \), Theorem 4.27 in Kitaev and Rykov (1995) yields the existence of a unique probability measure \(P^{\pi }_x\) on \((\Omega ,{\mathcal {F}})\). Let \(E^{\pi }_x\) be the expectation operator with respect to \(P^{\pi }_x\). Fix any discount factor \(\alpha >0\). For any \(\pi \in \Pi \) and \(x\in S\), the risk-sensitive discounted criterion is defined as
$$\begin{aligned} {\mathcal {J}}_\alpha (\theta ,x,\pi ):=\frac{1}{\theta }\log E^{\pi }_x\biggl [\exp \biggl (\theta \int _{0}^{\infty }e^{-\alpha t}c(\xi _t,\pi _t)dt\biggr )\biggr ], \end{aligned}$$(2.2)
provided that the integral is well defined, where \(\{\xi _t\}_{t\ge 0}\) is the Markov process corresponding to \(\pi =\{\pi _t\}_{t\ge 0}\in \Pi \) and \(\theta \in (0,1]\) is the risk-sensitivity parameter; the limiting case \(\theta \rightarrow 0\) corresponds to the risk-neutral criterion. For each \(x\in S\), let
$$\begin{aligned} {\mathcal {J}}^{*}_\alpha (\theta ,x):=\inf _{\pi \in \Pi }{\mathcal {J}}_\alpha (\theta ,x,\pi ). \end{aligned}$$
A control \(\pi ^{*}\in \Pi \) is said to be optimal if \({\mathcal {J}}_\alpha (\theta ,x,\pi ^{*})={\mathcal {J}}^{*}_\alpha (\theta ,x)\) for all \(x\in S\). The objective of this paper is to provide conditions for the existence of an optimal control and to give an HJB characterization of such controls.
Since the logarithm is an increasing function, instead of studying \({\mathcal {J}}_\alpha (\theta ,x,\pi )\), we will consider \({\tilde{J}}_\alpha (\theta ,x,\pi )\) on \([0,1]\times S\times \Pi \) defined by
$$\begin{aligned} {\tilde{J}}_\alpha (\theta ,x,\pi ):=E^{\pi }_x\biggl [\exp \biggl (\theta \int _{0}^{\infty }e^{-\alpha t}c(\xi _t,\pi _t)dt\biggr )\biggr ]. \end{aligned}$$(2.3)
Obviously, \({\tilde{J}}_\alpha (\theta ,x,\pi )\ge 1\) for \((\theta ,x)\in [0,1]\times S\) and \(\pi \in \Pi \), and \(\pi ^{*}\) is optimal if and only if \(\displaystyle \inf _{\pi \in \Pi }{\tilde{J}}_\alpha (\theta ,x,\pi )={\tilde{J}}_{\alpha }(\theta ,x,\pi ^{*}) =:{\tilde{J}}^{*}_\alpha (\theta ,x) ~\forall x\in S.\) Since the rates q(dy|x, a) and the costs c(x, a) are allowed to be unbounded, we next give conditions for the non-explosion of \(\{\xi _t,t\ge 0\}\) and the finiteness of \( {\mathcal {J}}_\alpha (\theta , x, \pi )\), which have been widely used in CTMDPs; see, for instance, Guo and Hernandez-Lerma (2009), Guo et al. (2012), Guo and Liao (2019), Guo and Piunovskiy (2011), Prieto-Rumeau and Hernandez-Lerma (2012) and the references therein.
Assumption 2.1
There exists a real-valued Borel measurable function \({\mathcal {V}} \ge 1\) on S and constants \(\rho _0> 0\), \(M_0>0\), \(L_0\ge 0\) and \(0<\rho _1<\min \{\alpha ,\rho ^{-1}_0\alpha ^2\}\) such that
-
(i)
\(\int _{S}{\mathcal {V}}(y)q(dy |x, a)\le \rho _0 {\mathcal {V}}(x)~~~\forall (x, a)\in K\);
-
(ii)
\(\sup _{a\in A(x)}q_x(a)\le M_0 {\mathcal {V}}(x)~~~\forall x\in S\);
-
(iii)
\(\sup _{a\in A(x)}c(x,a)\le \rho _1\log {\mathcal {V}}(x)+L_0~~~\forall x\in S.\)
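For intuition, the drift inequalities (i)–(iii) can be checked numerically on a toy example. The sketch below uses a birth–death model of our own choosing (not the example of Sect. 6): Lyapunov function \({\mathcal V}(x)=x+1\), birth rates \(\lambda (a)\le 2\), death rate \(\mu x\), cost \(c(x,a)=a\log (x+1)\), and candidate constants \(\rho _0=M_0=2\), \(L_0=0\) and \(\rho _1=0.4<\min \{\alpha ,\rho _0^{-1}\alpha ^2\}=0.5\) with \(\alpha =1\).

```python
import math

# Toy birth-death model (illustrative, not from the paper): birth rate lam(a),
# death rate mu*x, Lyapunov function V(x) = x + 1, cost c(x,a) = a*log(x+1).
mu, alpha = 1.0, 1.0
actions = [0.1, 0.2, 0.4]           # A(x) = actions for every x
lam = lambda a: 2.0 * a / 0.4       # birth rate in (0, 2]
V = lambda x: x + 1.0

rho0, M0 = 2.0, 2.0                 # candidate constants for (i) and (ii)
rho1, L0 = 0.4, 0.0                 # rho1 < min{alpha, alpha**2/rho0} = 0.5

def drift(x, a):
    """integral of V(y) q(dy|x,a): jumps x->x+1 (rate lam) and x->x-1 (rate mu*x)."""
    d = lam(a) * (V(x + 1) - V(x))
    if x > 0:
        d += mu * x * (V(x - 1) - V(x))
    return d

for x in range(500):
    for a in actions:
        assert drift(x, a) <= rho0 * V(x)                         # 2.1(i)
        assert lam(a) + mu * x <= M0 * V(x)                       # 2.1(ii)
        assert a * math.log(x + 1) <= rho1 * math.log(V(x)) + L0  # 2.1(iii)
```

Here \(\int _S{\mathcal V}(y)q(dy|x,a)=\lambda (a)-\mu x\le 2\le \rho _0{\mathcal V}(x)\), so the check passes for every grid point; the finite range of x is of course only a sanity check, not a proof.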
Remark 2.1
-
(a)
Note that when the transition rates are bounded, i.e., \(\sup _{x\in S} q^*(x)<\infty \), Assumptions 2.1(i) and (ii) are satisfied by taking \({\mathcal {V}}\) to be a suitably large constant.
-
(b)
Under Assumption 2.1 (iii) the criterion (2.3) is well defined and finite; see Proposition 2.1(c) below.
Proposition 2.1
Grant Assumption 2.1. Then for any control \(\pi \in \Pi \) and \((\theta ,x)\in [0,1]\times S\), the following results are true:
-
(a)
\({P}^{\pi }_x(T_\infty =\infty )=1\), \(P^\pi _x(\xi _0=x)=1\), and \({P}^{\pi }_x(\xi _t\in S)=1\) for all \(t\ge 0\);
-
(b)
\({E}^{\pi }_x[{\mathcal {V}}(\xi _t)]\le e^{\rho _0 t} {\mathcal {V}}(x)\) for all \(t\ge 0;\)
-
(c)
We have
$$\begin{aligned} {\tilde{J}}_\alpha (\theta ,x,\pi )\le \frac{\alpha ^2}{\alpha ^2-\rho _0\rho _1\theta }e^{ {\theta L_0}/{\alpha }}[{\mathcal {V}}(x)]^{\frac{\rho _1\theta }{\alpha }}\le \frac{\alpha ^2}{\alpha ^2-\rho _0\rho _1}e^{{L_0}/{\alpha }}{\mathcal {V}}(x). \end{aligned}$$Also, we get
$$\begin{aligned} {\mathcal {J}}^{*}_\alpha (\theta ,x)\le \log \biggl ({\frac{\alpha ^2}{\alpha ^2-\rho _0\rho _1}}\biggr )+\frac{L_0}{\alpha }+\frac{\rho _1}{\alpha }\log {{\mathcal {V}}(x)} ~~\forall \theta \in (0,1],x\in S. \end{aligned}$$(2.4)
Proof
For parts (a) and (b), see Guo et al. (2012) and Guo and Piunovskiy (2011, Theorem 3.1).
Proof of part (c): Observe that \(d(- e^{-\alpha t})\) is a probability measure on \([0,\infty ).\) For any \(\pi \in \Pi \) and \((\theta ,x)\in [0,1]\times S\), by (2.3) and Jensen’s inequality we have
By Assumption 2.1 and part (b) we obtain
where the last equality holds due to the fact that \(\rho _0\rho _1\theta <\alpha ^2\).
Next observe that \(\displaystyle \sup _{\theta \in [0,1]}{\tilde{J}}^{*}_\alpha (\theta ,x)\le \frac{\alpha ^2}{\alpha ^2-\rho _0\rho _1} e^{{ L_0}/{\alpha }}{\mathcal {V}}(x)\), and
Finally, a simple and direct calculation yields (2.4). \(\square \)
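The bound in part (c) can also be sanity-checked by simulation. The sketch below is an illustrative two-state uncontrolled chain of our own choosing; since its rates and costs are bounded, Assumption 2.1 holds with \({\mathcal V}\equiv 1\), \(\rho _0=\rho _1=0.1\) and \(L_0=0.5\), and a Monte Carlo estimate of \({\tilde J}_\alpha (1,x,\pi )\) should not exceed \(\frac{\alpha ^2}{\alpha ^2-\rho _0\rho _1}e^{L_0/\alpha }{\mathcal V}(x)\).

```python
import math, random

# Two-state uncontrolled chain (illustrative): rates q({1}|0)=1, q({0}|1)=2,
# cost rates c(0)=0.2, c(1)=0.5, discount alpha=1, risk parameter theta=1.
alpha, theta = 1.0, 1.0
rate = {0: 1.0, 1: 2.0}
cost = {0: 0.2, 1: 0.5}
bound = (alpha**2 / (alpha**2 - 0.1 * 0.1)) * math.exp(0.5 / alpha)  # Prop. 2.1(c)

def sample_cost(x0, horizon, rng):
    """One sample of exp(theta * int_0^horizon e^{-alpha t} c(xi_t) dt)."""
    t, x, integral = 0.0, x0, 0.0
    while t < horizon:
        s = min(t + rng.expovariate(rate[x]), horizon)
        # int_t^s e^{-alpha u} c(x) du in closed form on the sojourn interval
        integral += cost[x] * (math.exp(-alpha * t) - math.exp(-alpha * s)) / alpha
        t, x = s, 1 - x
    return math.exp(theta * integral)

rng = random.Random(0)
est = sum(sample_cost(0, 30.0, rng) for _ in range(4000)) / 4000
assert 1.0 < est <= bound   # consistent with tilde-J >= 1 and Prop. 2.1(c)
```

Indeed, every sample is at most \(e^{0.5}\) here, so the estimate sits below the (slightly larger) bound of Proposition 2.1(c).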
In Ghosh and Saha (2014) and Kumar and Pal (2013), the authors used Dynkin’s formula within the class of Markov controls, exploiting the Markov property of the state process \(\{\xi _t\}_{t\ge 0}\). This Markov property may fail to hold within the class of history-dependent controls, and consequently we cannot directly apply Dynkin’s formula here. Hence we assume the following condition, under which Dynkin’s formula applies to a sufficiently large class of functions; such conditions have been widely used in CTMDPs, see, for instance, Guo and Liao (2019), Guo et al. (2019) and Guo and Zhang (2019).
Assumption 2.2
The Borel measurable function \({\mathcal {V}}^2\ge 1\) on S satisfies the following Lyapunov condition
for some constants \(0<\rho _2<\alpha \) and \(b_0\ge 0\). Here \({\mathcal {V}}\) is as in Assumption 2.1.
We now introduce some frequently used notations.
-
\(C^\infty _c(a,b)\) denotes the set of all infinitely differentiable functions on (a, b) with compact support.
-
Let \(A_{as}([0,1]\times S)\) denote the space of all real-valued functions that are differentiable almost everywhere with respect to the first variable \(\theta \in [0,1]\). Given any real-valued function \(W\ge 1\) on S and any Borel set X, a real-valued function \(\varphi \) on \(X\times S\) is called W-bounded if \(\displaystyle \Vert \varphi \Vert ^\infty _{W}:=\sup _{(\theta ,x)\in X\times S}\frac{|\varphi (\theta ,x)|}{W(x)}< \infty \). Denote by \(B_{W}(X\times S)\) the Banach space of all W-bounded functions. When \(W\equiv 1\), \(B_{1}([0,1]\times S)\) is the space of all bounded functions on \([0,1]\times S.\) Now define \(L^\infty _{W}([0,1]\times S):=\{\varphi :[0,1]\times S\rightarrow {\mathbb {R}}:\varphi \in B_{W}([0,1]\times S)\cap A_{as}([0,1]\times S)\}\).
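The weighted norm \(\Vert \cdot \Vert ^\infty _W\) is easy to evaluate on a finite grid; the following small sketch (all functions are illustrative choices of our own) computes it for a W-bounded candidate and for one whose weighted norm grows with the size of the state grid.

```python
# Weighted sup-norm ||phi||_W^inf over a finite grid of (theta, x) pairs;
# W plays the role of the Lyapunov function V.  Everything is illustrative.
def weighted_norm(phi, W, thetas, states):
    return max(abs(phi(th, x)) / W(x) for th in thetas for x in states)

W = lambda x: x + 1.0
thetas = [k / 10 for k in range(11)]   # grid for theta in [0, 1]
states = range(50)

phi = lambda th, x: th * (x + 1.0)         # W-bounded: |phi| <= W, norm 1
growing = lambda th, x: (x + 1.0) ** 2     # weighted ratio grows like x + 1

assert weighted_norm(phi, W, thetas, states) == 1.0
assert weighted_norm(growing, W, thetas, states) == 50.0
```

On an infinite state space the second candidate would fail to lie in \(B_W\), which is exactly what the growing grid value signals.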
3 Stochastic representation of a solution to the HJB equation
In this section, we show that if the HJB equation for the cost criterion (2.3) has a solution, then that solution admits a stochastic representation. Using dynamic programming heuristics, the HJB equation for the discounted cost criterion (2.3) is given by
$$\begin{aligned} \left\{ \begin{array}{ll} \alpha \theta \dfrac{\partial \varphi _\alpha }{\partial \theta }(\theta ,x) =\displaystyle {\inf _{a\in A(x)}\biggl [\theta c(x,a)\varphi _\alpha (\theta ,x)+\int _{S}q(dy|x,a)\varphi _\alpha (\theta ,y)\biggr ]},\\ 1\le \varphi _\alpha (\theta ,x)\le \dfrac{\alpha ^2 e^{{\theta L_0}/{\alpha }}}{\alpha ^2-\rho _0\rho _1\theta }({\mathcal {V}}(x))^{\frac{\rho _1\theta }{\alpha }}, \end{array}\right. \end{aligned}$$(3.1)for each \(x\in S\) and a.e. \(\theta \in [0,1]\), where the upper bound on \(\varphi _\alpha (\theta ,x)\) is motivated by Proposition 2.1.
Remark 3.1
To prove the existence of an optimal control for bounded cost and transition rates, the authors in Pal and Pradhan (2019) studied the following HJB equation, which has a solution \(\phi _\alpha (\theta ,x)\) on \([0,1]\times S\) such that
In the arguments for the existence of a unique solution to Eq. (3.2), it is necessary that \(\phi _\alpha (\theta ,x)\) converge to 1 uniformly in x as \(\theta \rightarrow 0\). This is not true in general when the cost and transition rates are unbounded; for more details see Example 3.2 in Guo and Liao (2019). In this article we replace the uniform convergence condition with the new condition above.
To ensure the existence of an optimal control, in addition to Assumptions 2.1 and 2.2, we also need the following continuity and compactness conditions.
Assumption 3.1
The following conditions hold:
-
(i)
for each \(x\in S\), the set A(x) is compact;
-
(ii)
for any fixed \(x\in S\), the function c(x, a) is continuous in \(a\in A(x)\);
-
(iii)
for any given \(x\in S\), the function \(\displaystyle \int _{S}{\mathcal {V}}(y)q(dy|x,a)\) is continuous in \(a\in A(x)\), where \({\mathcal {V}}\) is introduced in Assumption 2.1.
Remark 3.2
Assumptions 3.1 (i)–(iii) are commonly used to find an optimal control for continuous-time MDP, see Guo and Hernandez-Lerma (2009), Guo and Liao (2019), Guo et al. (2019), Guo and Piunovskiy (2011) and Guo and Zhang (2019). Also, note that if Assumption 3.1 (iii) is satisfied, then for any given \(x\in S\), the function \(\displaystyle \int _{S}u(y)q(dy|x,a)\) is continuous in \(a\in A(x)\) for each function \(u\in B_{{\mathcal {V}}}(S)\).
In the next theorem we show that if the HJB equation has a solution then its stochastic representation is equal to the value function corresponding to the cost criterion (2.3).
Theorem 3.1
Under Assumptions 2.1, 2.2, and 3.1 suppose that the HJB Eq. (3.1) has a solution \(\varphi _\alpha \in L^\infty _{{\mathcal {V}}}([0,1]\times S)\) satisfying the bounds as in Eq. (3.1). Then, for all \((\theta ,x)\in [0,1]\times S\), we have the probabilistic representation of \(\varphi _\alpha \) as
i.e., \(\varphi _\alpha (\theta ,x)={\tilde{J}}^{*}_\alpha (\theta ,x)\) for all \((\theta ,x)\in [0,1]\times S\).
Proof
First, we see that
is continuous in \(a\in A(x)\) and A(x) is compact. So, by a measurable selection theorem (Bertsekas and Shreve 1996, Proposition 7.33), there exists a measurable function \(f^{*}:[0,1]\times S\rightarrow A\) such that
Let
be defined by
Now we observe from Eq. (3.1) that for any \(x\in S\), \(a\in A(x)\) and a.e. \(\theta \in [0,1]\) that
For any history-dependent control \(\pi \in \Pi \) and \(\theta \in [0,1]\), let \(\{\xi _t, t\ge 0\}\) be the corresponding process, and define \(\theta (t):=\theta e^{-\alpha t}\). Now for each \(\omega \in \Omega \), by Eq. (3.5), we get for a.e. \(s\ge 0\),
Define a function \(g: [0,\infty )\times S\times \Omega \rightarrow [0, \infty )\) by
In view of Assumptions 2.1 and 2.2, we have
where the second inequality is obtained by using Jensen’s inequality.
Hence \(E^{\pi }_x\biggl [\exp \biggl (\int _{0}^{t} 2e^{-\alpha s}c(\xi _s,\pi _s)ds\biggr )\biggr ]<\infty \) for all \(x\in S\) and \(t\in (0,\infty )\). Thus, applying the extension of Dynkin’s formula in Guo et al. (2019, Theorem 3.1) to the function g, we have
Now from (3.6) and (3.8), we have
Given any \(p>1\), let \(q>1\) be such that \(\frac{1}{p}+\frac{1}{q}=1\); by Hölder’s inequality we have
For \(T_2(q,t):=\{E^{\pi }_x[\varphi ^q_\alpha (\theta (t),\xi _t)]\}^{{1}/{q}}\), by the upper bound of \(\varphi _\alpha \) in (3.1), we have
If \(t>\alpha ^{-1}\log ({\theta q\rho _1}/{\alpha })\) then \({\theta e^{-\alpha t}q\rho _1}/{\alpha }<1\). Applying Jensen’s inequality and Proposition 2.1(b), we get
Next take \(t \rightarrow \infty \) and get
By (3.10), (3.11) and (3.12) we obtain
Now, taking the limit as \(p\downarrow 1\), we obtain
Since \(\pi \in \Pi \) is an arbitrary control, we have
Using (3.1), (3.4) and (3.10), we can show that
Now, using the lower bound of \(\varphi _\alpha \) in (3.1) and Fatou’s lemma, we obtain
From (3.14) and (3.15), we have
Thus
4 The existence of solution to the HJB equation
In this section, we prove that Eq. (3.1) is the HJB equation for the \(\alpha \)-discounted cost (2.3) and that Eq. (3.1) has a solution in \( L^\infty _{{\mathcal {V}}}([0,1]\times S)\). We now make this analysis rigorous. First, we prove a lemma on the existence of a solution of the HJB equation for bounded transition and cost rates; see Lemma 4.1 below. Then, in Theorem 4.1, we relax this boundedness condition and prove the existence of a solution to the HJB Eq. (3.1). To this end, we first truncate the transition and cost rates, which plays a crucial role in deriving the HJB equations and finding the solution. Fix any \(n\ge 1\) and \(0<\delta <1\). For each \(n\ge 1\), \(x\in S\), \(a\in A(x)\), let \(A_n(x):=A(x)\), \(S_n:=\{x\in S|{\mathcal {V}}(x)\le n\}\), and \(K_n:=\{(x,a)|x\in S_n,a\in A_n(x)\}\). Moreover, for each \(x\in S\), \(a\in A_n(x)\), define
and
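The displays (4.1)–(4.2) define the truncated rates \(c_n\) and \(q^{(n)}\); since only their qualitative properties (boundedness, agreement with c and q on \(S_n\)) are used in what follows, the sketch below codes one plausible version as an explicit assumption of ours: cap the cost at n and freeze the dynamics outside \(S_n=\{x:{\mathcal V}(x)\le n\}\). The model and \({\mathcal V}\) are toy choices.

```python
# Assumed truncation (our own illustrative version of (4.1)-(4.2)):
# costs are capped at n, and outside S_n = {x : V(x) <= n} the chain is frozen.
V = lambda x: x + 1.0

def make_c_n(c, n):
    return lambda x, a: min(c(x, a), n) if V(x) <= n else 0.0

def make_q_n(q, n):
    # q(x, a) returns a dict {y: q({y}|x,a), y != x}; outside S_n no jumps occur
    return lambda x, a: q(x, a) if V(x) <= n else {}

# toy model: birth rate 2, death rate x, cost a*(x+1)
q = lambda x, a: {x + 1: 2.0, x - 1: 1.0 * x} if x > 0 else {1: 2.0}
c = lambda x, a: a * (x + 1.0)
n = 10
cn, qn = make_c_n(c, n), make_q_n(q, n)

# the truncated model has bounded cost (<= n) and bounded total jump rates
assert all(cn(x, 0.5) <= n for x in range(100))
assert all(sum(qn(x, 0.5).values()) <= 2.0 + n for x in range(100))
```

Any truncation with these two properties supports the fixed-point argument of Lemma 4.1 below; the particular formulas here are not claimed to be those of (4.1)–(4.2).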
Lemma 4.1
Grant Assumptions 2.1, 2.2 and 3.1. Then, there exists a unique function \(\varphi ^{(n,\delta )}_\alpha \) (depending on n and \(\delta \)) in \(L^\infty _{{\mathcal {V}}}([0,1]\times S)\) for which the following hold:
-
(1)
\(\varphi ^{(n,\delta )}_\alpha \in B_{1}([0,1]\times S)\) is a bounded solution to the following differential equations (DEs) for all \(x\in S\) and a.e. \(\theta \in (\delta ,1]:\)
$$\begin{aligned} \left\{ \begin{array}{ll} \alpha \theta \frac{\partial \varphi ^{(n,\delta )}_\alpha }{\partial \theta }(\theta ,x) &{}=\displaystyle {\inf _{a\in A(x)}\biggl [\theta c_n(x,a)\varphi ^{(n,\delta )}_\alpha (\theta ,x)+\int _{S}q^{(n)}(dy|x,a)\varphi ^{(n,\delta )}_\alpha (\theta ,y)\biggr ]}\\ \varphi ^{(n,\delta )}_\alpha (\delta ,x)&{}=e^{{n\delta }/{\alpha }}. \end{array}\right. \end{aligned}$$(4.3) -
(2)
\(\varphi ^{(n,\delta )}_\alpha (\theta ,x)\) has a stochastic representation as follows: for each \(x\in S\) and a.e. \(\theta \in (\delta ,1]\),
$$\begin{aligned}&\varphi _\alpha ^{(n,\delta )}(\theta ,x)=\inf _{\pi \in \Pi }E^{\pi }_x\biggl [e^{{n\delta }/{\alpha }}\exp \biggl (\theta \int _{0}^{T_\delta (\theta )}e^{-\alpha t}c_n(\xi ^{(n)}_t,\pi _t)dt\biggr )\biggr ], \end{aligned}$$(4.4)where \(T_\delta (\theta ):=\alpha ^{-1}\log (\theta / \delta )\) and \(\{\xi ^{(n)}_t\}_{t\ge 0}\) is the process corresponding to the \(q^{(n)}(\cdot |x,a)\).
Proof
(1) Since \(S_n:=\{x\in S|{\mathcal {V}}(x)\le n\}\), by Assumption 2.1(ii) we see that \(\displaystyle q^{(n)}_x(a):=\int _{S\setminus \{x\}}q^{(n)}(dy|x,a)\) is bounded. So we can use the Lyapunov function \(V\equiv 1\), for which \(\int _{S}q^{(n)}(dy|x,a)V(y)\le \rho _0 V(x)\) and \({\overline{q}}^{(n)}:=\sup _{(x,a)\in K}q^{(n)}_x(a)<\infty \). Now let us define a nonlinear operator T on \(B_{1}([0,1]\times S)\) as follows:
where \(u\in B_{1}([0,1]\times S)\) and \((\theta ,x)\in [\delta ,1]\times S\). Using Assumption 2.1 and the fact that \(c_n\) is bounded, we obtain
Therefore, T is a nonlinear operator from \(B_{1}([0,1]\times S)\) to \(B_{1}([0,1]\times S)\). For any \(g_1,g_2\in B_{1}([0,1]\times S)\) and \(\theta \in [\delta ,1]\), we have
Now, we prove the following:
Since \(\sum _{k\ge 1}\frac{1}{\alpha ^k \cdot k!}\biggl [-2{\overline{q}}^{(n)}\log \delta +n(1-\delta )\biggr ]^k<\infty \), there exists some m such that \(\beta :=\frac{1}{\alpha ^m \cdot m!}\biggl [-2{\overline{q}}^{(n)}\log \delta +n(1-\delta )\biggr ]^m<1,\) which implies that \(\Vert T^m g_1-T^m g_2\Vert _1^\infty \le \beta \Vert g_1-g_2\Vert ^\infty _1\). Therefore, T is an m-step contraction operator on \(B_{1}([0,1]\times S)\). So, by the Banach fixed point theorem, there exists a unique bounded function \(\varphi _\alpha ^{(n,\delta )}\in B_{1}([0,1]\times S)\) (depending on \((n,\delta )\)) such that \(T\varphi ^{(n,\delta )}_\alpha =\varphi ^{(n,\delta )}_\alpha \); that is,
Also note that \(\varphi ^{(n,\delta )}_\alpha (\delta ,x)=e^{{\delta n}/{\alpha }}\). Hence by using (4.1), (4.2) and the above equation, we have \(\varphi ^{(n,\delta )}_\alpha \in L^\infty _{{\mathcal {V}}}([0,1]\times S)\) and it satisfies equation (4.3).
(2) First we see that
is continuous in \(a\in A(x)\) and A(x) is compact. So, by a measurable selection theorem (Bertsekas and Shreve 1996, Proposition 7.33), there exists a measurable function \(f^{*\delta }:[0,1]\times S\rightarrow A\) such that
Let
be defined by
Let \(\theta (t):=\theta e^{-\alpha t}\) for \(t\in [0,\infty )\). Since \(c_n\) and \(\varphi _\alpha ^{(n,\delta )}\) are bounded, by Dynkin’s formula we get
By using (4.3) and (4.8), we obtain
Since \(\pi \in \Pi \) is an arbitrary control and \(\varphi _\alpha ^{(n,\delta )}(\theta (T_\delta (\theta )),\xi ^{(n)}_{T_\delta })=e^{{n \delta }/{\alpha }}\), we have
Using Eqs. (4.3), (4.7) and (4.8), we can show that
Therefore
Therefore, from (4.9) and (4.10), we obtain (4.4). This completes the proof. \(\square \)
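The constructive content of Lemma 4.1 can be illustrated numerically: starting from the boundary value \(\varphi ^{(n,\delta )}_\alpha (\delta ,x)=e^{n\delta /\alpha }\), one can march Eq. (4.3) forward in \(\theta \) by an explicit Euler scheme. The sketch below does this for a toy two-state, two-action model (all numbers are our own illustrative assumptions) and checks that the computed solution respects \(1\le \varphi ^{(n,\delta )}_\alpha \le e^{2n/\alpha }\).

```python
import math

# Euler marching in theta for the truncated equation (4.3) on a toy finite
# model: d phi/d theta = (1/(alpha*theta)) * min_a [ theta*c_n(x,a)*phi(theta,x)
#   + sum_y q_n(dy|x,a)*phi(theta,y) ], phi(delta, x) = e^{n*delta/alpha}.
alpha, delta, n = 1.0, 0.1, 2.0
states, actions = [0, 1], [0, 1]
c_n = {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 1.5, (1, 1): 0.8}    # costs capped at n
qrate = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 1.0, (1, 1): 0.5}  # rate of leaving x

def rhs(theta, phi):
    out = []
    for x in states:
        # integral of phi against q_n(.|x,a): mass qrate on the other state,
        # mass -qrate on x itself (conservative kernel)
        best = min(
            theta * c_n[x, a] * phi[x] + qrate[x, a] * (phi[1 - x] - phi[x])
            for a in actions
        )
        out.append(best / (alpha * theta))
    return out

h, theta = 1e-4, delta
phi = [math.exp(n * delta / alpha)] * 2      # boundary condition at theta = delta
while theta < 1.0:
    d = rhs(theta, phi)
    phi = [phi[x] + h * d[x] for x in states]
    theta += h

assert all(p >= 1.0 for p in phi)                       # solution stays >= 1
assert all(p <= math.exp(2 * n / alpha) for p in phi)   # crude bound e^{2n/alpha}
```

The marching scheme is only a numerical illustration; uniqueness and well-posedness come from the m-step contraction argument in the lemma.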
Theorem 4.1
Grant Assumptions 2.1, 2.2 and 3.1. Then the HJB Eq. (3.1) has a unique solution \(\varphi _\alpha \in L^\infty _{{\mathcal {V}}}([0,1]\times S)\) satisfying \(1\le \varphi _\alpha (\theta ,x)\le {\frac{\alpha ^2 e^{{\theta L_0}/{\alpha }}}{\alpha ^2-\rho _0\rho _1\theta }} ({\mathcal {V}}(x))^{\frac{\rho _1\theta }{\alpha }}\) for all \((\theta ,x)\in [0,1]\times S.\)
Proof
First note that \(\varphi ^{(n,\delta )}_\alpha \) is the solution to the Eq. (4.3), which depends on two parameters n, \(\delta \). We prove this theorem in two steps.
Step 1: In the first step, we construct a solution \(\varphi ^{(n)}_\alpha (\cdot ,x)\) from \(\varphi ^{(n,\delta )}_\alpha (\cdot ,x)\) by passing to the limit as \(\delta \rightarrow 0\), such that \(\varphi ^{(n)}_\alpha (\cdot ,x)\) is absolutely continuous and satisfies the following DEs:
Given \(0<\delta <1\) and \(1\le n<\infty \), by (4.4) and \(\displaystyle \sup _{(x,a)\in K}c_n(x,a)\le n\), we have
\(\varphi ^{(n,\delta )}_\alpha (\theta ,x)\le e^{2n/\alpha },~~x\in S,\theta \in [\delta ,1]\).
Next, we extend the domain of \(\varphi ^{(n,\delta )}_\alpha \) to \([0,1]\times S\) by
We consider the following expression, for any given \(\pi \in \Pi \), \(x\in S\), \(\theta ,\theta _0\in [\delta ,1]\):
where
and
Write \(c\wedge d:=\min \{c,d\}\) and \(c\vee d:=\max \{c,d\}\). Then, for fixed \(n\ge 1\), we have
and
Using the above results and the fact that \(e^{bz}-1\le (e^b-1)z\) for all \(z\in [0,1]\) and \(b>0\), we obtain
Similarly for \(P_2\) we have
Hence for all \((\theta ,x)\in [0,1]\times S\), we have
Now we want to show that \({\overline{\varphi }}^{(n,\delta )}_\alpha (\theta ,x)\) decreases as \(\delta \rightarrow 0\) for any \((\theta ,x)\). For a fixed \(\alpha >0\) and \(\varepsilon >0\) small enough, consider \({\overline{\varphi }}^{(n,\delta +\varepsilon )}_\alpha (\theta ,x)-{\overline{\varphi }}^{(n,\delta )}_\alpha (\theta ,x)\) and set \(h_\delta :=e^{\frac{n\delta }{\alpha }}\). By the measurable selection theorem, we get a minimizer \(\pi ^{*(\delta +\varepsilon )}\), as in Eq. (3.4), corresponding to \({\overline{\varphi }}_\alpha ^{(n,\delta +\varepsilon )}\), such that the following cases hold.
Case 1. If \(\delta +\varepsilon <\theta \) then
Case 2. \(\delta <\theta \le \delta +\varepsilon \)
Case 3. \(\theta \le \delta \)
Hence \({\overline{\varphi }}_\alpha ^{(n,\delta )}(\theta ,x)\) is increasing in \(\delta \) for any \((\theta ,x)\in [0,1]\times S\). From (4.12), we know that for each \(x\in S\), \({\overline{\varphi }}_\alpha ^{(n,\delta )}(\cdot ,x)\) is Lipschitz continuous in \(\theta \in [0, 1]\). Also, \({\overline{\varphi }}_\alpha ^{(n,\delta )}(\theta ,x)\) is increasing in \(\delta \) for any \((\theta ,x)\in [0,1]\times S\) and bounded above (since \({\overline{\varphi }}^{(n,\delta )}_\alpha (\theta ,x)\le e^{2n/\alpha }\) for \(x\in S,\theta \in [\delta ,1]\)); therefore, there exists a function \(\varphi ^{(n)}_\alpha \) on \([0,1]\times S\), continuous with respect to \(\theta \in [0,1]\), such that along a subsequence \(\delta _m\rightarrow 0\) we have \(\lim _{m\rightarrow \infty }{\overline{\varphi }}_\alpha ^{(n,\delta _m)}(\theta ,x)=\varphi _\alpha ^{(n)}(\theta ,x)\), and for any fixed \(x\in S\) this convergence is uniform in \(\theta \in [0,1]\).
Let \(\psi \in C^\infty _c(0,1)\), then we have
Now take \(\tau (x):=M_0 {\mathcal {V}}(x)\) and define
for all \((x,a)\in K\) where \(\delta _x(\cdot )\) is the Dirac measure concentrated at x. We see that under Assumption 2.1, \(Q^{(n)}\) is a stochastic kernel on S given K. Then (4.13) can be written as
Now
Since, for each fixed \(x\in S\), A(x) is compact, there exist a subsequence of \(\{m\}\) (not relabeled, by abuse of notation) and \(a^*\in A(x)\) such that \(\lim _{m\rightarrow \infty }a^{*}_m=a^*\). Now, from (4.13), for any \(a\in A(x)\), we have
So, by Lemma 8.3.7 in Hernandez-Lerma and Lasserre (1999), taking the limit as \(m\rightarrow \infty \) in (4.16), we get
Hence
But
By analogous arguments, we get
From (4.17) and (4.18), we get
Thus we obtain
Hence
in the sense of distribution. Now for \(\theta \in [\delta _m,1]\), by using (4.4) and Proposition 2.1, we have
Note that \(\varphi _\alpha ^{(n,\delta _m)}\rightarrow \varphi _\alpha ^{(n)}\) as \(m\rightarrow \infty \). Thus, letting \(m\rightarrow \infty \) in the above equation, we obtain
By using (4.1), (4.2), (4.20), and the differential equation satisfied by \(\varphi ^{(n)}_{\alpha }\) (just proved), we see that \(\varphi ^{(n)}_{\alpha }\in L^\infty _{{\mathcal {V}}}([0,1]\times S)\) and that it is a solution of (4.11). Thus, closely mimicking the arguments of Theorem 3.1, one obtains the stochastic representation of the solution \(\varphi _\alpha ^{(n)}\), that is
Step 2: In this step we prove Theorem 4.1 by passing to the limit as \(n \rightarrow \infty \). First, we show that for each \(x\in S\), the family \(\{\varphi ^{(n)}_\alpha \}_{n\ge 1}\) is equicontinuous on [0, 1]. Consider the following expression, for any given \(\pi \in \Pi \), \(x\in S\), and \(\theta ,\theta _0\in [0,1]\):
where
Here, the first inequality uses \(e^{bz}-1\le (e^b-1)z\) for all \(z\in [0,1]\) and \(b>0\), and the last inequality follows from (3.7). Therefore, we have
By the measurable selection theorem [Bertsekas and Shreve (1996), Proposition 7.33], there exists a measurable function \(f^{*n}:[0,1]\times S\rightarrow A\) such that
Let
be defined by
Hence, by Eq. (4.11), for a.e. \(\theta \in [0,1]\) and all \(x\in S\), we have
Since \(c_n\ge 0\), by (4.21) we see that \(\varphi ^{(n)}_\alpha (\theta ,x)\) is increasing in \(\theta \). We also know that \(\varphi ^{(n)}_\alpha (\theta ,x)\) is differentiable a.e. with respect to \(\theta \in [0,1]\). So
So, by (4.1), (4.2) and (4.24), for all \(x\in S\) and for a.e. \(\theta \), we have
and
So, by Dynkin formula, we get
Also using (4.11) and Dynkin formula (see (3.7) and (3.13)), we have
By (4.28) and (4.29), we have \(\varphi ^{(n-1)}_\alpha (\theta ,x)\le \varphi ^{(n)}_\alpha (\theta ,x).\)
Hence \(\varphi ^{(n)}_\alpha (\theta ,x)\) is increasing in \(n\) for any \((\theta ,x)\in [0,1]\times S\). Now from (4.22), we know that for each \(x\in S\), \(\varphi ^{(n)}_\alpha (\cdot ,x)\) is Lipschitz continuous in \(\theta \in [0, 1]\). Moreover, \(\varphi ^{(n)}_\alpha (\theta ,x)\) is increasing in \(n\) and bounded above (by (4.20)); therefore there exists a function \(\varphi _\alpha \) on \([0,1]\times S\), continuous with respect to \(\theta \in [0,1]\), such that along a subsequence \(n_k\rightarrow \infty \) we have \(\lim _{k\rightarrow \infty }\varphi ^{(n_k)}_\alpha (\theta ,x)=\varphi _\alpha (\theta ,x)\), and this convergence is uniform in \(\theta \in [0,1]\) for each fixed \(x\in S\). Moreover, by (4.20), we have
Arguing as in the proof of Eq. (4.11) in Step 1 (starting from the first equality of (4.13)), we see that \(\varphi _{\alpha }\) is a solution to the HJB Eq. (3.1). Also, by (4.30), we conclude that \(\varphi _{\alpha }\in L^\infty _{{\mathcal {V}}}([0,1]\times S)\). Finally, the uniqueness of \(\varphi _\alpha (\theta ,x)\) follows from the stochastic representation in Theorem 3.1. \(\square \)
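Incidentally, the elementary bound \(e^{bz}-1\le (e^b-1)z\) for \(z\in [0,1]\), \(b>0\), invoked in the equicontinuity estimate of Step 2, is just convexity of \(z\mapsto e^{bz}\): the function lies below its chord over \([0,1]\). A quick numerical sanity check (grid and values of \(b\) are illustrative only):

```python
import math

def lhs(b, z):
    # left-hand side: e^{bz} - 1
    return math.exp(b * z) - 1.0

def rhs(b, z):
    # right-hand side: the chord (e^b - 1) z of z -> e^{bz} - 1 over [0, 1]
    return (math.exp(b) - 1.0) * z

# Convexity of z -> e^{bz} puts the graph below the chord joining
# (0, 1) and (1, e^b), i.e. e^{bz} - 1 <= (e^b - 1) z on [0, 1].
for b in (0.1, 1.0, 3.0, 10.0):
    for k in range(101):
        z = k / 100.0
        assert lhs(b, z) <= rhs(b, z) + 1e-12
print("chord bound verified on a grid")
```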
5 The existence of optimal control
In this section, we present the main result of this article. Here we show the existence of an optimal control.
Theorem 5.1
Suppose that Assumptions 2.1, 2.2 and 3.1 are satisfied. Then, the following assertions hold.
(1)
The HJB Eq. (3.1) has a unique solution \(\varphi _\alpha \in L^\infty _{{\mathcal {V}}}([0,1]\times S)\) and the solution admits the following representation
$$\begin{aligned} 1\le \varphi _\alpha (\theta ,x)&=\inf _{\pi \in \Pi }E^{\pi }_x\biggl [\exp \biggl (\theta \int _{0}^{\infty }e^{-\alpha t}c(\xi _t,\pi _t)dt\biggr )\biggr ]\\&\le {\frac{\alpha ^2 e^{{\theta L_0}/{\alpha }}}{\alpha ^2-\rho _0\rho _1\theta }} ({\mathcal {V}}(x))^{\frac{\rho _1\theta }{\alpha }}. \end{aligned}$$
(2)
There exists a measurable function \(f^*: [0,1]\times S \rightarrow A\) such that
$$\begin{aligned} \alpha \theta \frac{\partial \varphi _\alpha }{\partial \theta }(\theta ,x)&=\biggl [\int _{S}q(dy|x,f^{*}(\theta ,x))\varphi _\alpha (\theta ,y)+\theta c(x,f^{*}(\theta ,x))\varphi _\alpha (\theta ,x)\biggr ]\nonumber \\ \text {a.e.}~\theta \in [0,1]. \end{aligned}$$(5.1)
(3)
Furthermore an optimal Markov control for the cost criterion (2.2) exists and is given by
$$\begin{aligned} {\tilde{\pi }}^*_t(x):=f^*(\theta e^{-\alpha t},x), \end{aligned}$$where \(f^*\) satisfies (5.1).
Proof
Part (1) follows from Theorems 3.1 and 4.1.
To prove (2), by Hernandez-Lerma and Lasserre (1999), we first observe that the function
is continuous in \(a\in A(x)\) for each given \((\theta ,x)\in [0,1]\times S\). Thus, by the measurable selection theorem [Bertsekas and Shreve (1996), Proposition 7.33], there exists a measurable function \(f^{*}\) satisfying (5.1), and (2) follows. For part (3), take any \(f^{*}\) satisfying (5.1). Then, by Theorem 3.1, we have \(\displaystyle \inf _{\pi \in \Pi } {\tilde{J}}_\alpha (\theta ,x,{\pi })={\tilde{J}}_\alpha (\theta ,x,{\tilde{\pi }}^{*})= \varphi _\alpha (\theta ,x)\), which, together with (2.2), (2.3) and part (1), gives \(\displaystyle \inf _{\pi \in \Pi }{\mathcal {J}}_\alpha (\theta ,x,{\pi })={\mathcal {J}}_\alpha (\theta ,x,{\tilde{\pi }}^{*})=\frac{1}{\theta }\ln {\tilde{J}}_\alpha (\theta ,x,{\tilde{\pi }}^{*})= \frac{1}{\theta }\ln \varphi _\alpha (\theta ,x).\) Hence \({\tilde{\pi }}^{*}\) is an optimal Markov control. \(\square \)
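To illustrate part (1) concretely, the quantity \({\tilde{J}}_\alpha (\theta ,x,\pi )=E^{\pi }_x[\exp (\theta \int _0^\infty e^{-\alpha t}c(\xi _t,\pi _t)dt)]\) can be estimated by Monte Carlo for simple chains. The sketch below uses a hypothetical two-state chain with a fixed policy already folded into the rates; all rates, costs, and parameters are invented for illustration, and the infinite horizon is truncated where the discount factor is negligible.

```python
import math
import random

# Hypothetical two-state chain (states 0 and 1): lam[i] is the jump rate
# out of state i (always to the other state), c[i] the running cost,
# alpha the discount rate, theta the risk-sensitivity parameter.
lam = [1.0, 2.0]
c = [0.5, 1.5]
alpha, theta = 1.0, 0.5

def sample_cost(x0, horizon=40.0):
    # One sample of exp(theta * int_0^T e^{-alpha t} c(xi_t) dt); the tail
    # beyond `horizon` is negligible since e^{-alpha*40} ~ 4e-18.
    t, x, integral = 0.0, x0, 0.0
    while t < horizon:
        hold = random.expovariate(lam[x])       # exponential holding time
        upper = min(t + hold, horizon)
        # closed-form integral of e^{-alpha s} c(x) over [t, upper]
        integral += c[x] * (math.exp(-alpha * t) - math.exp(-alpha * upper)) / alpha
        t, x = t + hold, 1 - x
    return math.exp(theta * integral)

random.seed(0)
est = sum(sample_cost(0) for _ in range(20000)) / 20000
print(f"Monte Carlo estimate of tilde J_alpha(theta, 0, pi): {est:.4f}")
# consistent with 1 <= varphi_alpha and the exponential upper bound
assert 1.0 <= est <= math.exp(theta * max(c) / alpha)
```

Since \(c\ge 0\), every sample is at least 1, matching the lower bound \(1\le \varphi _\alpha \) in part (1); in this bounded toy case the deterministic bound \(e^{\theta \Vert c\Vert _\infty /\alpha }\) plays the role of the \({\mathcal {V}}\)-dependent upper bound.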
Now we prove the converse of Theorem 5.1.
Theorem 5.2
Grant Assumptions 2.1, 2.2 and 3.1. Suppose that there exists an optimal Markov control for the cost criterion (2.2), given by
for some measurable function \({\tilde{f}}^*\). Then \({\tilde{f}}^*\) is a minimizing selector of (3.1).
Proof
Since \({\hat{\pi }}^*\) is optimal for the cost criterion (2.2), we have
Now, for \({\tilde{f}}^*\), by Theorem 4.1 there exists a unique solution \(\psi _\alpha \in L^\infty _{{\mathcal {V}}}([0,1]\times S)\) of the equation
for each \(x\in S\) and a.e. \(\theta \in [0,1]\), satisfying \(1\le \psi _\alpha (\theta ,x)\le {\frac{\alpha ^2 e^{{\theta L_0}/{\alpha }}}{\alpha ^2-\rho _0\rho _1\theta }} ({\mathcal {V}}(x))^{\frac{\rho _1\theta }{\alpha }}\) for all \((\theta ,x)\in [0,1]\times S.\)
Now by Theorem 3.1, we know that
So, in view of Theorem 3.1, by Eqs. (3.1), (5.3), and (5.5), we conclude that \({\tilde{f}}^*\) is a minimizing selector of (3.1). \(\square \)
When the transition and cost rates are bounded, the existence of an optimal control is ensured by Theorem 5.1.
Corollary 5.1
Grant Assumption 3.1 ((i)–(ii)). Also, assume that the transition and cost rates are bounded. Then the HJB Eq. (3.1) has a unique solution \(\varphi _\alpha \), and an optimal control exists.
Proof
Suppose there exist constants \(L_1\) and \(b_1\) such that \(\displaystyle \sup _{(x,a)\in {K}}q_x(a)\le L_1\) and \(\displaystyle \sup _{(x,a)\in {K}}c(x,a)\le b_1\). First we take the constant Lyapunov function \({\mathcal {V}}(x)\equiv P\) for all \(x\in S\), where \(P\ge 1\) is a constant. Then \(\int _{S}{\mathcal {V}}(y)q(dy|x,a)=\int _{S}{\mathcal {V}}^2(y)q(dy|x,a)=0\) for all \((x,a)\in {K}\). Now take \({\rho }_0=\alpha \), \(M_0=L_1\), any \({\rho }_1\in (0,\alpha )\), and \({L}_0=b_1\). Then Assumption 2.1 is verified. Next, take any constants \({\rho }_2\in (0,\alpha )\) and \(b_0\in (0,\infty )\); then Assumption 2.2 holds for all \(x\in S\). Also, \(\int _{S}{\mathcal {V}}(y)q(dy|x,a)\) is continuous in \(a\in A(x)\), so Assumption 3.1 holds as well. Hence, by Theorem 5.1, the HJB Eq. (3.1) has a unique solution \(\varphi _\alpha \) and an optimal control exists. \(\square \)
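The key point in the corollary is that a constant Lyapunov function has zero drift under any conservative transition kernel, since \(q(S|x,a)=0\). A minimal finite-state illustration (the generator matrix below is hypothetical):

```python
# A conservative generator matrix on three states: off-diagonal entries are
# jump rates q(y|x) >= 0 and each row sums to zero, i.e. q(S|x,a) = 0.
Q = [
    [-3.0,  2.0,  1.0],
    [ 0.5, -0.5,  0.0],
    [ 1.0,  1.0, -2.0],
]
P = 7.0  # constant Lyapunov function V(x) = P >= 1

for row in Q:
    assert abs(sum(row)) < 1e-12              # conservativeness
    drift = sum(P * q_xy for q_xy in row)     # integral of V against q(.|x)
    assert abs(drift) < 1e-12                 # zero drift, as in the proof
print("constant Lyapunov function has zero drift under a conservative Q")
```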
6 Application and example
In this section, we illustrate the above assumptions with an example in which the transition and cost rates are unbounded.
Example 6.1
The Gaussian Model: Suppose a hunter hunts outside his house on behalf of his manager, and the house is at state 0. A positive state represents the distance from the house to the right, and a negative state represents the distance from the house to the left; let \(S={\mathbb {R}}\). If the current position is \(x\in S\) and the hunter takes an action \(a\in A(x)\), then after an exponentially distributed travel time with rate \(\lambda (x,a)>0\), the hunter reaches a new position, which follows the normal distribution with mean \(x\) and variance \(\sigma ^2\). (Equivalently, we can interpret \(\lambda (x,a)\) as the total jump intensity, an arbitrary measurable positive-valued function on \(S\times A\), where the distribution of the state after a jump from \(x\in S\) is normal with variance \(\sigma ^2\) and expectation \(x\).) Assume also that the hunter receives a payoff \(c(x,a)\) from his manager for each unit of time he spends there. Let us consider the model \(A_2:=\{S,(A,A(x),x\in S),c(x,a),q(dy|x,a)\}\), where \(S=(-\infty ,\infty )\). For each \(D\in {\mathcal {B}}(S)\), the transition rate is
To ensure the existence of an optimal Markov control for the model, we consider the following hypotheses.
(I)
For each fixed \(x\in S\), \(\lambda (x,a)\) is continuous in \(a\in A(x)\) and there exists a positive constant \(M_1\) such that \(\displaystyle 0<\sup _{a\in A(x)}\lambda (x,a)\le M_1(x^2+1)\) and \(M_1<\frac{\alpha }{6\sigma ^2(\sigma ^2+1)}\).
(II)
For each \(x\in S\), the cost rate c(x, a) is nonnegative and continuous in \(a\in A(x)\) and there exists a constant \(0<\rho _1<\min \{\alpha ,\frac{\alpha ^2}{M_1\sigma ^2}\}\) such that
$$\begin{aligned} \sup _{a\in A(x)}c(x,a)\le \rho _1 \log (1+x^2). \end{aligned}$$
(III)
For each fixed \(x\in S\), \(A(x)\) is a compact subset of the Borel space \(A\).
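Under (I)–(III), sample paths of the Gaussian model are easy to simulate: from state \(x\) under action \(a\), wait an exponential time with rate \(\lambda (x,a)\), then jump to a normally distributed state centered at \(x\). A simulation sketch, in which \(\sigma \), \(M_1\), the rate function and the policy are all invented for illustration, with the rate chosen to respect the bound in condition (I):

```python
import math
import random

# Gaussian-model simulator: hold at x for an Exp(lambda(x, a)) time, then
# jump to a Normal(x, sigma^2) state. The rate satisfies
# lambda(x, a) <= M1 * (x^2 + 1), as required by condition (I).
sigma, M1 = 0.5, 0.1

def lam(x, a):
    return M1 * (x * x + 1.0) * math.exp(-a)  # hypothetical rate, a >= 0

def simulate(x0, horizon, policy=lambda x: 0.0):
    t, x = 0.0, x0
    path = [(0.0, x0)]
    while True:
        a = policy(x)
        t += random.expovariate(lam(x, a))    # exponential travel time
        if t >= horizon:
            return path
        x = random.gauss(x, sigma)            # new position ~ N(x, sigma^2)
        path.append((t, x))

random.seed(3)
path = simulate(0.0, horizon=50.0)
print(f"{len(path) - 1} jumps before t = 50; final state {path[-1][1]:.3f}")
```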
Proposition 6.1
Under conditions (I)–(III), the above controlled system satisfies the Assumptions 2.1, 2.2, and 3.1. Hence by Theorem 5.1, there exists an optimal Markov control for this model.
Proof
We know that \(\frac{1}{\sqrt{2\pi }\sigma }\int _{-\infty }^{\infty }(y-x)^{2k+1}e^{-\frac{(y-x)^2}{2\sigma ^2}}dy=0\) and \(\frac{1}{\sqrt{2\pi }\sigma }\int _{-\infty }^{\infty }(y-x)^{2k}e^{-\frac{(y-x)^2}{2\sigma ^2}}dy=1\cdot 3\cdots (2k-1)\sigma ^{2k}\) for all \(k=0,1,\ldots \) (with the convention that the empty product equals 1).
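These Gaussian central moment identities (odd central moments vanish; even ones equal \((2k-1)!!\,\sigma ^{2k}\)) can be sanity-checked by Monte Carlo; the state \(x\) and \(\sigma \) below are arbitrary illustrative choices:

```python
import random

# Monte Carlo check of the central moment identities for N(x, sigma^2):
# E[(Y - x)^{2k+1}] = 0 and E[(Y - x)^{2k}] = (2k-1)!! sigma^{2k}.
x, sigma, n = 1.3, 0.8, 400000
random.seed(1)
samples = [random.gauss(x, sigma) for _ in range(n)]

def central_moment(p):
    return sum((y - x) ** p for y in samples) / n

def double_factorial(m):
    # (2k-1)!! = 1 * 3 * ... * (2k-1); empty product (m <= 0) is 1
    out = 1
    while m > 1:
        out *= m
        m -= 2
    return out

for k in range(3):
    odd = central_moment(2 * k + 1)
    even = central_moment(2 * k)
    assert abs(odd) < 0.1                                   # vanishes
    assert abs(even - double_factorial(2 * k - 1) * sigma ** (2 * k)) < 0.1
print("Gaussian central moment identities confirmed numerically")
```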
We first verify Assumption 2.1. Let \({\mathcal {V}}(x)=x^2+1\).
Let \(\rho _0=M_1 \sigma ^2\). Then \( \int _{S}{\mathcal {V}}(y)q(dy|x,a)\le \rho _0{\mathcal {V}}(x).\) Now
Now by condition (II), we can write
Observe that by condition (II), \(0<\rho _1<\min \{\alpha ,\rho _0^{-1}\alpha ^2\}\). Hence Assumption 2.1 is verified with \(M_0=L_0= M_1\).
Next we verify Assumption 2.2. For any \(x\in S\), \(a\in A(x)\),
where \(\rho _2=6M_1\sigma ^2(\sigma ^2+1)\) and \(b_0=1\). Then, by condition (I), we have \(0<\rho _2<\alpha \). Hence Assumption 2.2 is verified. By conditions (I) and (II), \(c(x,a)\) is continuous in \(a\in A(x)\), and by condition (I) and (6.2), \(\int _{S}{\mathcal {V}}(y)q(dy|x,a)\) is continuous in \(a\in A(x)\). Hence Assumption 3.1 is also verified. So, by Theorem 5.1, there exists an optimal Markov control for this model. \(\square \)
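As a numerical check of the first step of this proof: with \({\mathcal {V}}(x)=x^2+1\) and \(q(dy|x,a)=\lambda (x,a)[N(x,\sigma ^2)(dy)-\delta _x(dy)]\), one has \(\int _S{\mathcal {V}}(y)q(dy|x,a)=\lambda (x,a)\sigma ^2\le \rho _0{\mathcal {V}}(x)\) with \(\rho _0=M_1\sigma ^2\). The sketch below verifies this by Monte Carlo for an invented rate that attains the bound in condition (I); the constraint of condition (I) linking \(M_1\) to \(\alpha \) is not used in this drift computation.

```python
import random

# Drift check for the Gaussian model with V(x) = x^2 + 1:
# int_S V(y) q(dy|x,a) = lam(x,a) * (E[V(Y)] - V(x)) = lam(x,a) * sigma^2
#                      <= M1 * (x^2 + 1) * sigma^2 = rho0 * V(x).
# sigma and M1 are illustrative; lam below attains the bound in (I).
sigma, M1 = 0.7, 0.4
V = lambda x: x * x + 1.0
lam = lambda x: M1 * (x * x + 1.0)
rho0 = M1 * sigma ** 2

random.seed(2)
for x in (-3.0, 0.0, 1.5, 10.0):
    n = 200000
    mc = sum(V(random.gauss(x, sigma)) for _ in range(n)) / n
    drift = lam(x) * (mc - V(x))              # Monte Carlo int V dq
    assert abs(drift - lam(x) * sigma ** 2) < 0.05 * V(x)
    assert drift <= (rho0 + 0.05) * V(x)
print("drift bound int V dq <= rho0 * V verified for the Gaussian model")
```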
Remark 6.1
As mentioned in the introduction, there are many real-life applications in which the underlying system dynamics are modeled as a CTMDP with Borel state and action spaces and unbounded cost and transition rates; see, for instance, the cash-flow problem in Guo and Zhang (2019) and [p. 112, Piunovskiy and Zhang (2020)]. Moreover, many real-life examples with uncountable state spaces, such as the infrastructure surveillance models [pp. 115–116, Piunovskiy and Zhang (2020)] and the queueing model [p. 192, Piunovskiy and Zhang (2020)], can be formulated in our set-up.
References
Bauerle N, Rieder U (2014) More risk-sensitive Markov decision processes. Math Oper Res 39:105–120
Bertsekas D, Shreve S (1996) Stochastic optimal control: the discrete-time case. Academic Press Inc, New York
Ghosh MK, Saha S (2014) Risk-sensitive control of continuous-time Markov chains. Stoch 86:655–675
Guo XP, Hernandez-Lerma O (2009) Continuous-time Markov decision processes: theory and applications. Stochastic modelling and applied probability. Springer, Berlin
Guo X, Liao ZW (2019) Risk-sensitive discounted continuous-time Markov decision processes with unbounded rates. SIAM J Control Optim 57:3857–3883
Guo X, Piunovskiy A (2011) Discounted continuous-time Markov decision processes with constraints: unbounded transition and loss rates. Math Oper Res 36:105–132
Guo X, Zhang J (2019) Risk-sensitive continuous-time Markov decision processes with unbounded rates and Borel spaces. Discrete Event Dyn Syst 29:445–471
Guo X, Zhang Y (2020) On risk-sensitive piecewise deterministic Markov decision processes. Appl Math Optim 81:685–710
Guo X, Huang Y, Song X (2012) Linear programming and constrained average optimality for general continuous-time Markov decision processes in history-dependent polices. SIAM J Control Optim 50:23–47
Guo X, Liu Q, Zhang Y (2019) Finite horizon risk-sensitive continuous-time Markov decision processes with unbounded transition and cost rates. 4OR 17:427–442
Hernandez-Lerma O, Lasserre J (1999) Further topics on discrete-time Markov control processes. Springer, New York
Kitaev MY (1995) Semi-Markov and jump Markov controlled models: average cost criterion. Theory Probab Appl 30:272–288
Kitaev MY, Rykov VV (1995) Controlled queueing systems. CRC Press, Boca Raton
Kumar KS, Pal C (2013) Risk-sensitive control of jump process on denumerable state space with near monotone cost. Appl Math Optim 68:311–331
Kumar KS, Pal C (2015) Risk-sensitive control of continuous-time Markov processes with denumerable state space. Stoch Anal Appl 33:863–881
Masi GB, Stettner L (2000) Infinite horizon risk-sensitive control of discrete time Markov processes with small risk. Syst Control Lett 40:15–20
Masi GB, Stettner L (2007) Infinite horizon risk-sensitive control of discrete time Markov processes under minorization property. SIAM J Control Optim 46:231–252
Pal C, Pradhan S (2019) Risk-sensitive control of pure jump processes on a general state space. Int J Probab Stoch Process 91(2):155–174
Piunovskiy A, Zhang Y (2020) Continuous-time Markov decision processes. Springer, Berlin
Piunovskiy A, Zhang Y (2011) Discounted continuous-time Markov decision processes with unbounded rates: the convex analytic approach. SIAM J Control Optim 49:2032–2061
Prieto-Rumeau T, Hernandez-Lerma O (2012) Selected topics in continuous-time controlled Markov chains and Markov games. Imperial College Press, London
Wei Q (2016) Continuous-time Markov decision processes with risk-sensitive finite-horizon cost criterion. Math Methods Oper Res 84:461–487
Whittle P (1990) Risk-sensitive optimal control, Wiley-Inter Science series in systems and optimization. Wiley, Chichester
Zhang Y (2017) Continuous-time Markov decision processes with exponential utility. SIAM J Control Optim 55:2636–2660
Acknowledgements
We thank the anonymous referees for their valuable comments and helpful suggestions that have improved the presentation of this paper.
Golui, S., Pal, C. Risk-sensitive discounted cost criterion for continuous-time Markov decision processes on a general state space. Math Meth Oper Res 95, 219–247 (2022). https://doi.org/10.1007/s00186-022-00779-9