Abstract
In this paper, we consider a risk-sensitive discounted control problem for continuous-time jump Markov processes taking values in a general state space. The transition rates of the underlying continuous-time jump Markov processes and the cost rates are allowed to be unbounded. Under a certain Lyapunov condition, we establish the existence and uniqueness of the solution to the Hamilton–Jacobi–Bellman equation. We also prove the existence of an optimal risk-sensitive control in the class of Markov controls and completely characterize the optimal controls.
1 Introduction
In this paper, we study the risk-sensitive discounted criterion for continuous-time Markov decision processes (CTMDPs) with a Borel state space. In the risk-neutral criterion, the controller optimizes the expected value of the total payoff. In the risk-sensitive criterion, by contrast, the controller considers the expected value of the exponential of the total payoff, which provides better protection from risk. The risk-sensitive, or exponential-of-integral, criterion is therefore a popular cost criterion owing to its applications in many areas, such as queueing systems and finance; for more details see Bauerle and Rieder (2014), Whittle (1990) and the references therein. Risk-sensitive control problems for CTMDPs form an important class of stochastic optimal control problems and have been widely studied under different sets of conditions. Finite-horizon risk-sensitive CTMDPs with a countable state space were studied in Ghosh and Saha (2014), Guo et al. (2019) and Wei (2016); for infinite-horizon risk-sensitive CTMDPs we refer to Ghosh and Saha (2014), Guo and Zhang (2020), Kumar and Pal (2013, 2015), Pal and Pradhan (2019) and Zhang (2017). For important contributions to the risk-sensitive control of discrete-time MDPs on a general state space, see Masi and Stettner (2000, 2007). Although risk-sensitive control of CTMDPs on a countable state space has been studied extensively, the corresponding literature on a general state space is rather limited; some exceptions are Guo and Zhang (2019, 2020) and Pal and Pradhan (2019).
In Pal and Pradhan (2019), the authors studied risk-sensitive control of pure jump processes on a general state space, with bounded transition and cost rates and Markov controls, and proved the HJB characterization of the optimal risk-sensitive control. The boundedness assumption on the transition and cost rates plays a key role in their proof of the existence of an optimal risk-sensitive control. This requirement, however, is restrictive in applications, for instance in queueing control and population processes, where the transition and reward/cost rates are usually unbounded. Moreover, in many real-life situations the state space may be uncountable, for example in chemical reaction and Gaussian models; see Guo and Zhang (2019), Piunovskiy and Zhang (2020) and the references therein for such examples. In Guo and Zhang (2019), the authors considered the finite-horizon risk-sensitive control problem for CTMDPs on a Borel state space with unbounded transition and cost rates and proved the existence of optimal controls via the HJB equation.
In this paper, we study a much more general risk-sensitive control problem for CTMDPs with a general state space. To the best of our knowledge, this is the first work dealing with infinite-horizon discounted risk-sensitive control for CTMDPs on a general state space with unbounded cost and transition rates and with possibly history-dependent controls. The main objective of this work is to prove the existence of a solution to the HJB equation and to characterize the optimal risk-sensitive controls. In particular: (1) we prove that the HJB equation has a unique solution \(\varphi _\alpha \in L^\infty _{{\mathcal {V}}}([0,1]\times S)\) satisfying the bounds in Eq. (3.1) below, where the space \(L^\infty _{{\mathcal {V}}}([0,1]\times S)\) is described below; (2) we prove that any measurable minimizer of the HJB equation is optimal and, conversely, that any optimal control in the class of Markov controls is a minimizer of the HJB Eq. (3.1) below. We first consider bounded transition and cost rates and establish the existence of a solution to the corresponding HJB equation by Banach’s fixed point theorem, as in Pal and Pradhan (2019). We then relax the boundedness hypothesis and extend this result to unbounded transition and cost rates. We characterize the value function via the HJB equation, prove the existence of an optimal control in the class of Markov controls, and give a complete HJB characterization of the optimal risk-sensitive controls. In Corollary 5.1, we show that if the cost and transition rates are bounded, then an optimal control exists for our model.
The rest of this article is structured as follows. Section 2 deals with the description of the problem, the required notation, some assumptions, and preliminary results. In Sect. 3, we give a continuity-compactness assumption and prove a stochastic representation of the solution of the HJB Eq. (3.1). In Sect. 4, we truncate the transition and cost rates and prove the existence of a unique solution to the HJB equation. A complete characterization of optimal controls is given in Sect. 5. In Sect. 6, we illustrate our theory and assumptions with an example.
2 The control problems
The model of CTMDP is a five-tuple which consists of the following elements:
-
a Borel space S, called the state space, whose elements are referred to as states of the system and the corresponding Borel \(\sigma \)-algebra is \({\mathcal {B}}(S)\). (Throughout the whole paper we consider that for any Borel space X, the corresponding Borel \(\sigma \)-algebra is \({\mathcal {B}}(X)\).)
-
A is the action set, which is assumed to be Borel space with the Borel \(\sigma \)-algebra \({\mathcal {B}}(A)\).
-
for each \(x\in S\), \(A(x)\in {\mathcal {B}}(A)\) denotes the set of admissible actions for state x. Let \(K:=\{(x, a)|x\in S, a\in A(x)\}\), which is a Borel subset of \(S\times A\).
-
the measurable function \(c:K \rightarrow {\mathbb {R}}_{+}\) denotes the cost rate function; the value c(x, a) measures the cost incurred per unit time when action a is taken in state x.
-
given any \((x, a)\in K\), the transition rate \(q(\cdot | x, a)\) is a Borel measurable signed kernel on S given K. That is, \(q(\cdot |x,a)\) is countably additive, and \(q(D| x, a)\ge 0 \) for all \((x,a)\in K\) and \(D\in {\mathcal {B}}(S)\) with \(x\notin D\). Moreover, we assume that \(q(\cdot | x, a)\) satisfies the following conservative and stable conditions: for any \(x\in S,\)
$$\begin{aligned}&q(S|x,a)\equiv 0 ~~~\text {and}\\&~q^{*}(x):=\sup _{a\in A(x)}q_x(a)<\infty , \end{aligned}$$where \(q_x(a):=-q(\{x\}| x, a)\ge 0.\) We need the transition rates to specify the random dynamic evolution of the system.
Next, we give an informal description of the evolution of the CTMDPs as follows. The controller observes continuously the current state of the system. When the system is in state \(x\in S\) at time \(t\ge 0\), he/she chooses action \(a_t\in A(x)\) according to some control. As a consequence of this, the following happens:
-
the controller incurs an immediate cost at rate \(c(x, a_t)\); and
-
after a random sojourn time (i.e., the holding time at state x), the system jumps to a set \(B\in {\mathcal {B}}(S)\) (\(x\notin B\)) of states with the transition probability \(\dfrac{q(B|x,a_t)}{q_x(a_t)}\) determined by the transition rates \(q(dy|x,a_t)\). The distribution function of the sojourn time is \(u\mapsto 1-e^{-\int _{t}^{t+u}q_x(a_s)ds}\); see Guo and Hernandez-Lerma (2009, Proposition B.8, p. 205) for details.
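To make this jump mechanism concrete, the following sketch simulates the embedded jump chain of a small CTMDP under a stationary control. The three-state model, its rates, and the action names are our own toy assumptions, not part of the paper. Under a stationary control the action is constant between jumps, so the sojourn-time distribution above reduces to an exponential law with rate \(q_x(a)\), after which the process jumps to \(y\ne x\) with probability \(q(\{y\}|x,a)/q_x(a)\).

```python
import random

# Hypothetical 3-state model: Q[x][a] maps each action a to the rates
# q({y}|x,a) for y != x.  All names and numbers here are illustrative.
Q = {
    0: {"slow": {1: 1.0}, "fast": {1: 3.0, 2: 1.0}},
    1: {"slow": {0: 2.0}, "fast": {0: 2.0, 2: 2.0}},
    2: {"slow": {0: 1.0}, "fast": {0: 1.0}},
}

def simulate(x0, policy, horizon, rng):
    """Simulate the jump pairs (T_k, x_k) up to `horizon` under a stationary
    policy x -> a."""
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        a = policy(x)
        rates = Q[x][a]
        qx = sum(rates.values())      # q_x(a) = -q({x}|x,a)
        t += rng.expovariate(qx)      # sojourn time ~ Exp(q_x(a))
        if t >= horizon:
            return path
        # embedded jump: go to y != x with probability q({y}|x,a)/q_x(a)
        u, acc = rng.random() * qx, 0.0
        for y, r in rates.items():
            acc += r
            if u <= acc:
                x = y
                break
        path.append((t, x))

rng = random.Random(7)
path = simulate(0, lambda x: "fast", horizon=10.0, rng=rng)
```

The path starts at the initial state, jump times increase, and consecutive states always differ, mirroring the description above.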
When the state of the system transits to a new state \(y\ne x\), the above procedure is repeated. Thus, the controller tries to minimize his/her costs with respect to some performance criterion \({\mathcal {J}}_\alpha (\cdot ,\cdot , \cdot )\), which in our present case is defined by (2.2), below. To formalize what is described above, below we describe the construction of continuous-time Markov decision processes (CTMDPs) under possibly history-dependent controls. To construct the underlying CTMDPs (as in Guo and Piunovskiy 2011; Kitaev 1995; Piunovskiy and Zhang 2011, 2020) we introduce some notation: let \(S_\Delta :=S \cup \{\Delta \}\) (with some “isolated” state \(\Delta \notin S\)), \(\Omega _0:=(S\times (0,\infty ))^\infty \), \(\Omega _k:=(S\times (0,\infty ))^k\times S\times (\{\infty \}\times \{\Delta \})^\infty \) for \(k\ge 1\) and \(\Omega :=\cup _{k=0}^\infty \Omega _k\). Let \({\mathcal {F}}\) be the Borel \(\sigma \)-algebra on \(\Omega \). Then we obtain the measurable space \((\Omega , {\mathcal {F}})\). For \(k\ge 1\) and a sample \( \omega :=(x_0, \theta _1, x_1, \cdots , \theta _k, x_k, \cdots )\in \Omega ,\) define
$$\begin{aligned} T_0(\omega ):=0,\quad T_k(\omega ):=\theta _1+\theta _2+\cdots +\theta _k,\quad T_\infty (\omega ):=\lim _{k\rightarrow \infty }T_k(\omega ). \end{aligned}$$
Using \(\{T_k\}\), we define the state process \(\{\xi _t\}_{t\ge 0}\) as
$$\begin{aligned} \xi _t(\omega ):=\sum _{k\ge 0}I_{\{T_k\le t<T_{k+1}\}}x_k+I_{\{t\ge T_\infty \}}\Delta \quad \text {for } t\ge 0. \end{aligned}$$(2.1)
Here, \(I_{E}\) denotes the indicator function of a set E, and we use the conventions \(0+z=:z\) and \(0z=:0\) for all \(z\in S_\Delta \). Obviously, \(\xi _t(\omega )\) is right-continuous on \([0,\infty )\). We denote \(\xi _{t-}(\omega ):=\liminf _{s\rightarrow t-}\xi _s(\omega )\). From Eq. (2.1), we see that \(T_k(\omega )\) \((k\ge 1)\) denotes the k-th jump moment of \(\{\xi _t, t\ge 0\}\), that \(x_{k-1}\) is the state of the process on \([T_{k-1}(\omega ),T_k(\omega ))\), that \(\theta _k=T_k(\omega )-T_{k-1}(\omega )\) plays the role of the sojourn time at state \(x_{k-1}\), and that the sample path \(\{\xi _t(\omega ),t\ge 0\}\) has at most denumerably many states \(x_k\) \((k=0,1,\cdots )\). The process after \(T_\infty \) is regarded as being absorbed in the state \(\Delta \). Thus, let \(q(\cdot | \Delta , a_\Delta ):\equiv 0\), \(A_\Delta :=A\cup \{a_\Delta \}\), \( A(\Delta ):=\{a_\Delta \}\), \(c(\Delta , a):\equiv 0\) for all \(a\in A_\Delta \), where \(a_\Delta \) is an isolated point.
To precisely define the criterion, we need to introduce the concept of a control, as in Guo et al. (2012), Guo and Piunovskiy (2011) and Kitaev and Rykov (1995). Take the right-continuous \(\sigma \)-algebras \(\{{\mathcal {F}}_t\}_{t\ge 0}\) with \({\mathcal {F}}_t:=\sigma (\{T_k\le s,\xi _{T_k}\in S\}: 0\le s\le t, k\ge 0)\). For each \(s>0\), let \({\mathcal {F}}_{s-}:=\bigvee _{0\le t<s}{\mathcal {F}}_t\). Now define the \(\sigma \)-algebra \({\mathcal {P}}:=\sigma (A\times \{0\}, B\times (s,\infty ): A\in {\mathcal {F}}_0, B\in {\mathcal {F}}_{s-})\), which is the \(\sigma \)-algebra of predictable sets on \(\Omega \times [0,\infty )\) related to \(\{{\mathcal {F}}_t\}_{t\ge 0}\). To complete the specification of a stochastic optimal control problem, we need, of course, an optimality criterion, which requires defining the class of controls as below.
Definition 2.1
A history-dependent control \(\pi :=\{\pi _t(\omega )\}_{t\ge 0}\) is a measurable map from \((\Omega \times [0,\infty ),{\mathcal {P}})\) to \((A_\Delta ,{\mathcal {B}}(A_\Delta ))\) satisfying \(\pi _t(\omega )\in A(\xi _{t-}(\omega ))\) for all \(\omega \in \Omega \) and \(t\ge 0\). For notational simplicity, we denote a history-dependent control by \(\{\pi _t\}_{t\ge 0}\). The set of all history-dependent controls is denoted by \(\Pi \). A control \(\pi \in \Pi \) is called Markov if \(\pi _t(\omega )=\pi _t( \xi _{t-}(\omega ))\) for every \(\omega \in \Omega \) and \(t\ge 0\), where \(\xi _{t-}(\omega ):=\lim _{s\uparrow t}\xi _s(\omega )\). We denote by \(\Pi ^{m}\) the family of all Markov controls.
For any compact metric space Y, let P(Y) denote the space of probability measures on Y with the Prohorov topology. Under Assumption 2.1 below, for any initial state \(x\in S\) and any control \(\pi \in \Pi \), Theorem 4.27 in Kitaev and Rykov (1995) yields the existence of a unique probability measure \(P^{\pi }_x\) on \((\Omega ,{\mathcal {F}})\). Let \(E^{\pi }_x\) be the expectation operator with respect to \(P^{\pi }_x\). Fix any discount factor \(\alpha >0\). For any \(\pi \in \Pi \) and \(x\in S\), the risk-sensitive discounted criterion is defined as
$$\begin{aligned} {\mathcal {J}}_\alpha (\theta ,x,\pi ):=\frac{1}{\theta }\log E^{\pi }_x\biggl [\exp \biggl (\theta \int _{0}^{\infty }e^{-\alpha t}c(\xi _t,\pi _t)dt\biggr )\biggr ], \end{aligned}$$(2.2)
provided that the integral is well defined, where \(\{\xi _t\}_{t\ge 0}\) is the Markov process corresponding to \(\pi =\{\pi _t\}_{t\ge 0}\in \Pi \) and \(\theta \in (0,1]\) is the risk-sensitivity parameter; the limiting case \(\theta \rightarrow 0\) corresponds to the risk-neutral criterion. For each \(x\in S\), let
$$\begin{aligned} {\mathcal {J}}^{*}_\alpha (\theta ,x):=\inf _{\pi \in \Pi }{\mathcal {J}}_\alpha (\theta ,x,\pi ). \end{aligned}$$
A control \(\pi ^{*}\in \Pi \) is said to be optimal if \({\mathcal {J}}_\alpha (\theta ,x,\pi ^{*})={\mathcal {J}}^{*}_\alpha (\theta ,x)\) for all \(x\in S\). The objective of this paper is to provide conditions for the existence of an optimal control and to give an HJB characterization of such controls.
Since the logarithm is an increasing function, instead of studying \({\mathcal {J}}_\alpha (\theta ,x,\pi )\), we will consider \({\tilde{J}}_\alpha (\theta ,x,\pi )\) on \([0,1]\times S\times \Pi \) defined by
$$\begin{aligned} {\tilde{J}}_\alpha (\theta ,x,\pi ):=E^{\pi }_x\biggl [\exp \biggl (\theta \int _{0}^{\infty }e^{-\alpha t}c(\xi _t,\pi _t)dt\biggr )\biggr ]. \end{aligned}$$(2.3)
Obviously, \({\tilde{J}}_\alpha (\theta ,x,\pi )\ge 1\) for \((\theta ,x)\in [0,1]\times S\) and \(\pi \in \Pi \), and \(\pi ^{*}\) is optimal if and only if \(\displaystyle \inf _{\pi \in \Pi }{\tilde{J}}_\alpha (\theta ,x,\pi )={\tilde{J}}_{\alpha }(\theta ,x,\pi ^{*}) =:{\tilde{J}}^{*}_\alpha (\theta ,x) ~\forall x\in S.\) Since the rates q(dy|x, a) and the costs c(x, a) are allowed to be unbounded, we next give conditions for the non-explosion of \(\{\xi _t,t\ge 0\}\) and the finiteness of \( {\mathcal {J}}_\alpha (\theta , x, \pi )\), which have been widely used in CTMDPs; see, for instance, Guo and Hernandez-Lerma (2009), Guo et al. (2012), Guo and Liao (2019), Guo and Piunovskiy (2011), Prieto-Rumeau and Hernandez-Lerma (2012) and the references therein.
Assumption 2.1
There exists a real-valued Borel measurable function \({\mathcal {V}} \ge 1\) on S and constants \(\rho _0> 0\), \(M_0>0\), \(L_0\ge 0\) and \(0<\rho _1<\min \{\alpha ,\rho ^{-1}_0\alpha ^2\}\) such that
-
(i)
\(\int _{S}{\mathcal {V}}(y)q(dy |x, a)\le \rho _0 {\mathcal {V}}(x)~~~\forall (x, a)\in K\);
-
(ii)
\(\sup _{a\in A(x)}q_x(a)\le M_0 {\mathcal {V}}(x)~~~\forall x\in S\);
-
(iii)
\(\sup _{a\in A(x)}c(x,a)\le \rho _1\log {\mathcal {V}}(x)+L_0~~~\forall x\in S.\)
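For intuition, the drift inequalities (i)–(iii) can be checked numerically on a toy example. The sketch below uses a birth–death model of our own choosing (not the example of Sect. 6): Lyapunov function \({\mathcal V}(x)=x+1\), birth rates \(\lambda (a)\le 2\), death rate \(\mu x\), cost \(c(x,a)=a\log (x+1)\), and candidate constants \(\rho _0=M_0=2\), \(L_0=0\) and \(\rho _1=0.4<\min \{\alpha ,\rho _0^{-1}\alpha ^2\}=0.5\) with \(\alpha =1\).

```python
import math

# Toy birth-death model (illustrative, not from the paper): birth rate lam(a),
# death rate mu*x, Lyapunov function V(x) = x + 1, cost c(x,a) = a*log(x+1).
mu, alpha = 1.0, 1.0
actions = [0.1, 0.2, 0.4]           # A(x) = actions for every x
lam = lambda a: 2.0 * a / 0.4       # birth rate in (0, 2]
V = lambda x: x + 1.0

rho0, M0 = 2.0, 2.0                 # candidate constants for (i) and (ii)
rho1, L0 = 0.4, 0.0                 # rho1 < min{alpha, alpha**2/rho0} = 0.5

def drift(x, a):
    """integral of V(y) q(dy|x,a): jumps x->x+1 (rate lam) and x->x-1 (rate mu*x)."""
    d = lam(a) * (V(x + 1) - V(x))
    if x > 0:
        d += mu * x * (V(x - 1) - V(x))
    return d

for x in range(500):
    for a in actions:
        assert drift(x, a) <= rho0 * V(x)                         # 2.1(i)
        assert lam(a) + mu * x <= M0 * V(x)                       # 2.1(ii)
        assert a * math.log(x + 1) <= rho1 * math.log(V(x)) + L0  # 2.1(iii)
```

Here \(\int _S{\mathcal V}(y)q(dy|x,a)=\lambda (a)-\mu x\le 2\le \rho _0{\mathcal V}(x)\), so the check passes for every grid point; the finite range of x is of course only a sanity check, not a proof.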
Remark 2.1
-
(a)
Note that when the transition rates are bounded, i.e., \(\sup _{x\in S} q^*(x)<\infty \), Assumptions 2.1(i) and (ii) are satisfied by taking \({\mathcal {V}}\) to be a suitably large constant.
-
(b)
Under Assumption 2.1 (iii) the criterion (2.3) is well defined and finite; see Proposition 2.1(c) below.
Proposition 2.1
Grant Assumption 2.1. Then for any control \(\pi \in \Pi \) and \((\theta ,x)\in [0,1]\times S\), the following results are true:
-
(a)
\({P}^{\pi }_x(T_\infty =\infty )=1\), \(P^\pi _x(\xi _0=x)=1\), and \({P}^{\pi }_x(\xi _t\in S)=1\) for all \(t\ge 0\);
-
(b)
\({E}^{\pi }_x[{\mathcal {V}}(\xi _t)]\le e^{\rho _0 t} {\mathcal {V}}(x)\) for all \(t\ge 0;\)
-
(c)
We have
$$\begin{aligned} {\tilde{J}}_\alpha (\theta ,x,\pi )\le \frac{\alpha ^2}{\alpha ^2-\rho _0\rho _1\theta }e^{ {\theta L_0}/{\alpha }}[{\mathcal {V}}(x)]^{\frac{\rho _1\theta }{\alpha }}\le \frac{\alpha ^2}{\alpha ^2-\rho _0\rho _1}e^{{L_0}/{\alpha }}{\mathcal {V}}(x). \end{aligned}$$Also, we get
$$\begin{aligned} {\mathcal {J}}^{*}_\alpha (\theta ,x)\le \log \biggl ({\frac{\alpha ^2}{\alpha ^2-\rho _0\rho _1}}\biggr )+\frac{L_0}{\alpha }+\frac{\rho _1}{\alpha }\log {{\mathcal {V}}(x)} ~~\forall \theta \in (0,1],x\in S. \end{aligned}$$(2.4)
Proof
For parts (a) and (b), see Guo et al. (2012) and Guo and Piunovskiy (2011, Theorem 3.1).
Proof of part (c): Observe that \(d(- e^{-\alpha t})\) is a probability measure on \([0,\infty ).\) For any \(\pi \in \Pi \) and \((\theta ,x)\in [0,1]\times S\), by (2.3) and Jensen’s inequality we have
By Assumption 2.1 and part (b) we obtain
where the last equality holds due to the fact that \(\rho _0\rho _1\theta <\alpha ^2\).
Next observe that \(\displaystyle \sup _{\theta \in [0,1]}{\tilde{J}}^{*}_\alpha (\theta ,x)\le \frac{\alpha ^2}{\alpha ^2-\rho _0\rho _1} e^{{ L_0}/{\alpha }}{\mathcal {V}}(x)\), and
Finally, a simple and direct calculation yields (2.4). \(\square \)
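The bound in part (c) can also be sanity-checked by simulation. The sketch below is an illustrative two-state uncontrolled chain of our own choosing; since its rates and costs are bounded, Assumption 2.1 holds with \({\mathcal V}\equiv 1\), \(\rho _0=\rho _1=0.1\) and \(L_0=0.5\), and a Monte Carlo estimate of \({\tilde J}_\alpha (1,x,\pi )\) should not exceed \(\frac{\alpha ^2}{\alpha ^2-\rho _0\rho _1}e^{L_0/\alpha }{\mathcal V}(x)\).

```python
import math, random

# Two-state uncontrolled chain (illustrative): rates q({1}|0)=1, q({0}|1)=2,
# cost rates c(0)=0.2, c(1)=0.5, discount alpha=1, risk parameter theta=1.
alpha, theta = 1.0, 1.0
rate = {0: 1.0, 1: 2.0}
cost = {0: 0.2, 1: 0.5}
bound = (alpha**2 / (alpha**2 - 0.1 * 0.1)) * math.exp(0.5 / alpha)  # Prop. 2.1(c)

def sample_cost(x0, horizon, rng):
    """One sample of exp(theta * int_0^horizon e^{-alpha t} c(xi_t) dt)."""
    t, x, integral = 0.0, x0, 0.0
    while t < horizon:
        s = min(t + rng.expovariate(rate[x]), horizon)
        # int_t^s e^{-alpha u} c(x) du in closed form on the sojourn interval
        integral += cost[x] * (math.exp(-alpha * t) - math.exp(-alpha * s)) / alpha
        t, x = s, 1 - x
    return math.exp(theta * integral)

rng = random.Random(0)
est = sum(sample_cost(0, 30.0, rng) for _ in range(4000)) / 4000
assert 1.0 < est <= bound   # consistent with tilde-J >= 1 and Prop. 2.1(c)
```

Indeed, every sample is at most \(e^{0.5}\) here, so the estimate sits below the (slightly larger) bound of Proposition 2.1(c).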
In Ghosh and Saha (2014) and Kumar and Pal (2013), the authors used Dynkin’s formula within the class of Markov controls, exploiting the Markov property of the state process \(\{\xi _t\}_{t\ge 0}\). This Markov property may fail to hold within the class of history-dependent controls, and consequently we cannot directly apply Dynkin’s formula here. Hence we assume the following condition, under which Dynkin’s formula applies to a sufficiently large class of functions; such conditions have been widely used in CTMDPs, see, for instance, Guo and Liao (2019), Guo et al. (2019) and Guo and Zhang (2019).
Assumption 2.2
The Borel measurable function \({\mathcal {V}}^2\ge 1\) on S satisfies the following Lyapunov condition
for some constants \(0<\rho _2<\alpha \) and \(b_0\ge 0\). Here \({\mathcal {V}}\) is as in Assumption 2.1.
We now introduce some frequently used notations.
-
\(C^\infty _c(a,b)\) denotes the set of all infinitely differentiable functions on (a, b) with compact support.
-
Let \(A_{as}([0,1]\times S)\) denote the space of all real-valued functions that are differentiable almost everywhere with respect to the first variable \(\theta \in [0,1]\). Given any real-valued function \(W\ge 1\) on S and any Borel set X, a real-valued function \(\varphi \) on \(X\times S\) is called W-bounded if \(\displaystyle \Vert \varphi \Vert ^\infty _{W}:=\sup _{(\theta ,x)\in X\times S}\frac{|\varphi (\theta ,x)|}{W(x)}< \infty \). Denote by \(B_{W}(X\times S)\) the Banach space of all W-bounded functions. When \(W\equiv 1\), \(B_{1}([0,1]\times S)\) is the space of all bounded functions on \([0,1]\times S.\) Now define \(L^\infty _{W}([0,1]\times S):=\{\varphi :[0,1]\times S\rightarrow {\mathbb {R}}:\varphi \in B_{W}([0,1]\times S)\cap A_{as}([0,1]\times S)\}\).
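The weighted norm \(\Vert \cdot \Vert ^\infty _W\) is easy to evaluate on a finite grid; the following small sketch (all functions are illustrative choices of our own) computes it for a W-bounded candidate and for one whose weighted norm grows with the size of the state grid.

```python
# Weighted sup-norm ||phi||_W^inf over a finite grid of (theta, x) pairs;
# W plays the role of the Lyapunov function V.  Everything is illustrative.
def weighted_norm(phi, W, thetas, states):
    return max(abs(phi(th, x)) / W(x) for th in thetas for x in states)

W = lambda x: x + 1.0
thetas = [k / 10 for k in range(11)]   # grid for theta in [0, 1]
states = range(50)

phi = lambda th, x: th * (x + 1.0)         # W-bounded: |phi| <= W, norm 1
growing = lambda th, x: (x + 1.0) ** 2     # weighted ratio grows like x + 1

assert weighted_norm(phi, W, thetas, states) == 1.0
assert weighted_norm(growing, W, thetas, states) == 50.0
```

On an infinite state space the second candidate would fail to lie in \(B_W\), which is exactly what the growing grid value signals.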
3 Stochastic representation of a solution to the HJB equation
In this section, we show that if the HJB equation for the cost criterion (2.3) has a solution, then that solution admits a stochastic representation. Using dynamic programming heuristics, the HJB equation for the discounted cost criterion (2.3) is given by
$$\begin{aligned} \left\{ \begin{array}{ll} \alpha \theta \dfrac{\partial \varphi _\alpha }{\partial \theta }(\theta ,x) =\displaystyle {\inf _{a\in A(x)}\biggl [\theta c(x,a)\varphi _\alpha (\theta ,x)+\int _{S}q(dy|x,a)\varphi _\alpha (\theta ,y)\biggr ]},\\ 1\le \varphi _\alpha (\theta ,x)\le \dfrac{\alpha ^2 e^{{\theta L_0}/{\alpha }}}{\alpha ^2-\rho _0\rho _1\theta }({\mathcal {V}}(x))^{\frac{\rho _1\theta }{\alpha }}, \end{array}\right. \end{aligned}$$(3.1)for each \(x\in S\) and a.e. \(\theta \in [0,1]\), where the upper bound on \(\varphi _\alpha (\theta ,x)\) is motivated by Proposition 2.1.
Remark 3.1
To prove the existence of an optimal control for bounded cost and transition rates, the authors in Pal and Pradhan (2019) studied the following HJB equation, which has a solution \(\phi _\alpha (\theta ,x)\) on \([0,1]\times S\) such that
In the arguments for the existence of a unique solution to Eq. (3.2), it is necessary that \(\phi _\alpha (\theta ,x)\) converge to 1 uniformly in x as \(\theta \rightarrow 0\). This is not true in general when the cost and transition rates are unbounded; for more details see Example 3.2 in Guo and Liao (2019). In this article we replace the uniform convergence condition with the new condition above.
To ensure the existence of an optimal control, in addition to Assumptions 2.1 and 2.2, we also need the following continuity and compactness conditions.
Assumption 3.1
The following conditions hold:
-
(i)
for each \(x\in S\), the set A(x) is compact;
-
(ii)
for any fixed \(x\in S\), the function c(x, a) is continuous in \(a\in A(x)\);
-
(iii)
for any given \(x\in S\), the function \(\displaystyle \int _{S}{\mathcal {V}}(y)q(dy|x,a)\) is continuous in \(a\in A(x)\), where \({\mathcal {V}}\) is introduced in Assumption 2.1.
Remark 3.2
Assumptions 3.1 (i)–(iii) are commonly used to find an optimal control for continuous-time MDP, see Guo and Hernandez-Lerma (2009), Guo and Liao (2019), Guo et al. (2019), Guo and Piunovskiy (2011) and Guo and Zhang (2019). Also, note that if Assumption 3.1 (iii) is satisfied, then for any given \(x\in S\), the function \(\displaystyle \int _{S}u(y)q(dy|x,a)\) is continuous in \(a\in A(x)\) for each function \(u\in B_{{\mathcal {V}}}(S)\).
In the next theorem we show that if the HJB equation has a solution then its stochastic representation is equal to the value function corresponding to the cost criterion (2.3).
Theorem 3.1
Under Assumptions 2.1, 2.2, and 3.1 suppose that the HJB Eq. (3.1) has a solution \(\varphi _\alpha \in L^\infty _{{\mathcal {V}}}([0,1]\times S)\) satisfying the bounds as in Eq. (3.1). Then, for all \((\theta ,x)\in [0,1]\times S\), we have the probabilistic representation of \(\varphi _\alpha \) as
i.e., \(\varphi _\alpha (\theta ,x)={\tilde{J}}^{*}_\alpha (\theta ,x)\) for all \((\theta ,x)\in [0,1]\times S\).
Proof
First, we see that
is continuous in \(a\in A(x)\) and A(x) is compact. So, by a measurable selection theorem (Bertsekas and Shreve 1996, Proposition 7.33), there exists a measurable function \(f^{*}:[0,1]\times S\rightarrow A\) such that
Let
be defined by
Now we observe from Eq. (3.1) that for any \(x\in S\), \(a\in A(x)\) and a.e. \(\theta \in [0,1]\) that
For any history-dependent control \(\pi \in \Pi \) and \(\theta \in [0,1]\), let \(\{\xi _t, t\ge 0\}\) be the corresponding process, and define \(\theta (t):=\theta e^{-\alpha t}\). Now for each \(\omega \in \Omega \), by Eq. (3.5), we get for a.e. \(s\ge 0\),
Define a function \(g: [0,\infty )\times S\times \Omega \rightarrow [0, \infty )\) by
In view of Assumptions 2.1 and 2.2, we have
where the second inequality is obtained by using Jensen’s inequality.
Hence \(E^{\pi }_x\biggl [\exp \biggl (\int _{0}^{t} 2e^{-\alpha s}c(\xi _s,\pi _s)ds\biggr )\biggr ]<\infty \) for all \(x\in S\) and \(t\in (0,\infty )\). Thus, applying the extension of Dynkin’s formula in Guo et al. (2019, Theorem 3.1) to the function g, we have
Now from (3.6) and (3.8), we have
Given any \(p>1\), let \(q>1\) be such that \(\frac{1}{p}+\frac{1}{q}=1\); by Hölder’s inequality we have
For \(T_2(q,t):=\{E^{\pi }_x[\varphi ^q_\alpha (\theta (t),\xi _t)]\}^{{1}/{q}}\), by the upper bound of \(\varphi _\alpha \) in (3.1), we have
If \(t>\alpha ^{-1}\log ({\theta q\rho _1}/{\alpha })\) then \({\theta e^{-\alpha t}q\rho _1}/{\alpha }<1\). Applying Jensen’s inequality and Proposition 2.1(b), we get
Next take \(t \rightarrow \infty \) and get
By (3.10), (3.11) and (3.12) we obtain
Now, taking the limit as \(p\downarrow 1\), we obtain
Since \(\pi \in \Pi \) is an arbitrary control, we have
Using (3.1), (3.4) and (3.10), we can show that
Now, using the lower bound of \(\varphi _\alpha \) in (3.1) and Fatou’s lemma, we obtain
From (3.14) and (3.15), we have
Thus
4 The existence of solution to the HJB equation
In this section, we prove that Eq. (3.1) is the HJB equation for the \(\alpha \)-discounted cost (2.3) and that Eq. (3.1) has a solution in \( L^\infty _{{\mathcal {V}}}([0,1]\times S)\). We now make this analysis rigorous. First, we prove a lemma on the existence of a solution of the HJB equation for bounded transition and cost rates; see Lemma 4.1 below. Then, in Theorem 4.1, we relax this boundedness condition and prove the existence of a solution to the HJB Eq. (3.1). To this end, we first truncate the transition and cost rates, which plays a crucial role in deriving the HJB equations and finding the solution. Fix any \(n\ge 1\) and \(0<\delta <1\). For each \(n\ge 1\), \(x\in S\), \(a\in A(x)\), let \(A_n(x):=A(x)\), \(S_n:=\{x\in S|{\mathcal {V}}(x)\le n\}\), and \(K_n:=\{(x,a)|x\in S_n,a\in A_n(x)\}\). Moreover, for each \(x\in S\), \(a\in A_n(x)\), define
and
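The displays (4.1)–(4.2) define the truncated rates \(c_n\) and \(q^{(n)}\); since only their qualitative properties (boundedness, agreement with c and q on \(S_n\)) are used in what follows, the sketch below codes one plausible version as an explicit assumption of ours: cap the cost at n and freeze the dynamics outside \(S_n=\{x:{\mathcal V}(x)\le n\}\). The model and \({\mathcal V}\) are toy choices.

```python
# Assumed truncation (our own illustrative version of (4.1)-(4.2)):
# costs are capped at n, and outside S_n = {x : V(x) <= n} the chain is frozen.
V = lambda x: x + 1.0

def make_c_n(c, n):
    return lambda x, a: min(c(x, a), n) if V(x) <= n else 0.0

def make_q_n(q, n):
    # q(x, a) returns a dict {y: q({y}|x,a), y != x}; outside S_n no jumps occur
    return lambda x, a: q(x, a) if V(x) <= n else {}

# toy model: birth rate 2, death rate x, cost a*(x+1)
q = lambda x, a: {x + 1: 2.0, x - 1: 1.0 * x} if x > 0 else {1: 2.0}
c = lambda x, a: a * (x + 1.0)
n = 10
cn, qn = make_c_n(c, n), make_q_n(q, n)

# the truncated model has bounded cost (<= n) and bounded total jump rates
assert all(cn(x, 0.5) <= n for x in range(100))
assert all(sum(qn(x, 0.5).values()) <= 2.0 + n for x in range(100))
```

Any truncation with these two properties supports the fixed-point argument of Lemma 4.1 below; the particular formulas here are not claimed to be those of (4.1)–(4.2).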
Lemma 4.1
Grant Assumptions 2.1, 2.2 and 3.1. Then, there exists a unique function \(\varphi ^{(n,\delta )}_\alpha \) (depending on n and \(\delta \)) in \(L^\infty _{{\mathcal {V}}}([0,1]\times S)\) for which the following hold:
-
(1)
\(\varphi ^{(n,\delta )}_\alpha \in B_{1}([0,1]\times S)\) is a bounded solution to the following differential equations (DEs) for all \(x\in S\) and a.e. \(\theta \in (\delta ,1]:\)
$$\begin{aligned} \left\{ \begin{array}{ll} \alpha \theta \frac{\partial \varphi ^{(n,\delta )}_\alpha }{\partial \theta }(\theta ,x) &{}=\displaystyle {\inf _{a\in A(x)}\biggl [\theta c_n(x,a)\varphi ^{(n,\delta )}_\alpha (\theta ,x)+\int _{S}q^{(n)}(dy|x,a)\varphi ^{(n,\delta )}_\alpha (\theta ,y)\biggr ]}\\ \varphi ^{(n,\delta )}_\alpha (\delta ,x)&{}=e^{{n\delta }/{\alpha }}. \end{array}\right. \end{aligned}$$(4.3) -
(2)
\(\varphi ^{(n,\delta )}_\alpha (\theta ,x)\) has a stochastic representation as follows: for each \(x\in S\) and a.e. \(\theta \in (\delta ,1]\),
$$\begin{aligned}&\varphi _\alpha ^{(n,\delta )}(\theta ,x)=\inf _{\pi \in \Pi }E^{\pi }_x\biggl [e^{{n\delta }/{\alpha }}\exp \biggl (\theta \int _{0}^{T_\delta (\theta )}e^{-\alpha t}c_n(\xi ^{(n)}_t,\pi _t)dt\biggr )\biggr ], \end{aligned}$$(4.4)where \(T_\delta (\theta ):=\alpha ^{-1}\log (\theta / \delta )\) and \(\{\xi ^{(n)}_t\}_{t\ge 0}\) is the process corresponding to the \(q^{(n)}(\cdot |x,a)\).
Proof
(1) Since \(S_n:=\{x\in S|{\mathcal {V}}(x)\le n\}\), by Assumption 2.1(ii) we see that \(\displaystyle q^{(n)}_x(a):=\int _{S\setminus \{x\}}q^{(n)}(dy|x,a)\) is bounded. So we can use the Lyapunov function \(V\equiv 1\), for which \(\int _{S}q^{(n)}(dy|x,a)V(y)\le \rho _0 V(x)\) and \({\overline{q}}^{(n)}:=\sup _{(x,a)\in K}q^{(n)}_x(a)<\infty \). Now let us define a nonlinear operator T on \(B_{1}([0,1]\times S)\) as follows:
where \(u\in B_{1}([0,1]\times S)\) and \((\theta ,x)\in [\delta ,1]\times S\). Using Assumption 2.1 and the fact that \(c_n\) is bounded, we obtain
Therefore, T is a nonlinear operator from \(B_{1}([0,1]\times S)\) to \(B_{1}([0,1]\times S)\). For any \(g_1,g_2\in B_{1}([0,1]\times S)\) and \(\theta \in [\delta ,1]\), we have
Now, we prove the following:
Since \(\sum _{k\ge 1}\frac{1}{\alpha ^k \cdot k!}\biggl [-2{\overline{q}}^{(n)}\log \delta +n(1-\delta )\biggr ]^k<\infty \), there exists some m such that \(\beta :=\frac{1}{\alpha ^m \cdot m!}\biggl [-2{\overline{q}}^{(n)}\log \delta +n(1-\delta )\biggr ]^m<1,\) which implies that \(\Vert T^m g_1-T^m g_2\Vert _1^\infty \le \beta \Vert g_1-g_2\Vert ^\infty _1\). Therefore, T is an m-step contraction operator on \(B_{1}([0,1]\times S)\). So, by the Banach fixed point theorem, there exists a unique bounded function \(\varphi _\alpha ^{(n,\delta )}\in B_{1}([0,1]\times S)\) (depending on \((n,\delta )\)) such that \(T\varphi ^{(n,\delta )}_\alpha =\varphi ^{(n,\delta )}_\alpha \); that is,
Also note that \(\varphi ^{(n,\delta )}_\alpha (\delta ,x)=e^{{\delta n}/{\alpha }}\). Hence by using (4.1), (4.2) and the above equation, we have \(\varphi ^{(n,\delta )}_\alpha \in L^\infty _{{\mathcal {V}}}([0,1]\times S)\) and it satisfies equation (4.3).
(2) First we see that
is continuous in \(a\in A(x)\) and A(x) is compact. So, by a measurable selection theorem (Bertsekas and Shreve 1996, Proposition 7.33), there exists a measurable function \(f^{*\delta }:[0,1]\times S\rightarrow A\) such that
Let
be defined by
Let \(\theta (t):=\theta e^{-\alpha t}\) for \(t\in [0,\infty )\). Since \(c_n\) and \(\varphi _\alpha ^{(n,\delta )}\) are bounded, by Dynkin’s formula we get
By using (4.3) and (4.8), we obtain
Since \(\pi \in \Pi \) is an arbitrary control and \(\varphi _\alpha ^{(n,\delta )}(\theta (T_\delta (\theta )),\xi ^{(n)}_{T_\delta })=e^{{n \delta }/{\alpha }}\), we have
Using Eqs. (4.3), (4.7) and (4.8), we can show that
Therefore
Therefore, from (4.9) and (4.10), we obtain (4.4). This completes the proof. \(\square \)
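The constructive content of Lemma 4.1 can be illustrated numerically: starting from the boundary value \(\varphi ^{(n,\delta )}_\alpha (\delta ,x)=e^{n\delta /\alpha }\), one can march Eq. (4.3) forward in \(\theta \) by an explicit Euler scheme. The sketch below does this for a toy two-state, two-action model (all numbers are our own illustrative assumptions) and checks that the computed solution respects \(1\le \varphi ^{(n,\delta )}_\alpha \le e^{2n/\alpha }\).

```python
import math

# Euler marching in theta for the truncated equation (4.3) on a toy finite
# model: d phi/d theta = (1/(alpha*theta)) * min_a [ theta*c_n(x,a)*phi(theta,x)
#   + sum_y q_n(dy|x,a)*phi(theta,y) ], phi(delta, x) = e^{n*delta/alpha}.
alpha, delta, n = 1.0, 0.1, 2.0
states, actions = [0, 1], [0, 1]
c_n = {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 1.5, (1, 1): 0.8}    # costs capped at n
qrate = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 1.0, (1, 1): 0.5}  # rate of leaving x

def rhs(theta, phi):
    out = []
    for x in states:
        # integral of phi against q_n(.|x,a): mass qrate on the other state,
        # mass -qrate on x itself (conservative kernel)
        best = min(
            theta * c_n[x, a] * phi[x] + qrate[x, a] * (phi[1 - x] - phi[x])
            for a in actions
        )
        out.append(best / (alpha * theta))
    return out

h, theta = 1e-4, delta
phi = [math.exp(n * delta / alpha)] * 2      # boundary condition at theta = delta
while theta < 1.0:
    d = rhs(theta, phi)
    phi = [phi[x] + h * d[x] for x in states]
    theta += h

assert all(p >= 1.0 for p in phi)                       # solution stays >= 1
assert all(p <= math.exp(2 * n / alpha) for p in phi)   # crude bound e^{2n/alpha}
```

The marching scheme is only a numerical illustration; uniqueness and well-posedness come from the m-step contraction argument in the lemma.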
Theorem 4.1
Grant Assumptions 2.1, 2.2 and 3.1. Then the HJB Eq. (3.1) has a unique solution \(\varphi _\alpha \in L^\infty _{{\mathcal {V}}}([0,1]\times S)\) satisfying \(1\le \varphi _\alpha (\theta ,x)\le {\frac{\alpha ^2 e^{{\theta L_0}/{\alpha }}}{\alpha ^2-\rho _0\rho _1\theta }} ({\mathcal {V}}(x))^{\frac{\rho _1\theta }{\alpha }}\) for all \((\theta ,x)\in [0,1]\times S.\)
Proof
First note that \(\varphi ^{(n,\delta )}_\alpha \) is the solution to the Eq. (4.3), which depends on two parameters n, \(\delta \). We prove this theorem in two steps.
Step 1: In the first step, we construct a solution \(\varphi ^{(n)}_\alpha (\cdot ,x)\) from \(\varphi ^{(n,\delta )}_\alpha (\cdot ,x)\) by passing to the limit as \(\delta \rightarrow 0\), such that \(\varphi ^{(n)}_\alpha (\cdot ,x)\) is absolutely continuous and satisfies the following DEs:
Given \(0<\delta <1\) and \(1\le n<\infty \), by (4.4) and \(\displaystyle \sup _{(x,a)\in K}c_n(x,a)\le n\), we have
\(\varphi ^{(n,\delta )}_\alpha (\theta ,x)\le e^{2n/\alpha },~~x\in S,\theta \in [\delta ,1]\).
Next, we extend the domain of \(\varphi ^{(n,\delta )}_\alpha \) to \([0,1]\times S\) by
We consider the following expression, for any given \(\pi \in \Pi \), \(x\in S\), \(\theta ,\theta _0\in [\delta ,1]\):
where
and
Write \(c\wedge d:=\min \{c,d\}\) and \(c\vee d:=\max \{c,d\}\). Then, for fixed \(n\ge 1\), we have
and
Using the above results and the fact that \(e^{bz}-1\le (e^b-1)z\) for all \(z\in [0,1]\) and \(b>0\), we obtain
Similarly for \(P_2\) we have
Hence for all \((\theta ,x)\in [0,1]\times S\), we have
Now we want to show that \({\overline{\varphi }}^{(n,\delta )}_\alpha (\theta ,x)\) decreases as \(\delta \rightarrow 0\) for any \((\theta ,x)\). For a fixed \(\alpha >0\) and \(\varepsilon >0\) small enough, consider \({\overline{\varphi }}^{(n,\delta +\varepsilon )}_\alpha (\theta ,x)-{\overline{\varphi }}^{(n,\delta )}_\alpha (\theta ,x)\) and set \(h_\delta :=e^{\frac{n\delta }{\alpha }}\). By the measurable selection theorem, we get a minimizer \(\pi ^{*(\delta +\varepsilon )}\), as in Eq. (3.4), corresponding to \({\overline{\varphi }}_\alpha ^{(n,\delta +\varepsilon )}\), such that the following cases hold.
Case 1. If \(\delta +\varepsilon <\theta \) then
Case 2. \(\delta <\theta \le \delta +\varepsilon \)
Case 3. \(\theta \le \delta \)
Hence \({\overline{\varphi }}_\alpha ^{(n,\delta )}(\theta ,x)\) is increasing in \(\delta \) for any \((\theta ,x)\in [0,1]\times S\). From (4.12), we know that for each \(x\in S\), \({\overline{\varphi }}_\alpha ^{(n,\delta )}(\cdot ,x)\) is Lipschitz continuous in \(\theta \in [0, 1]\). Also, \({\overline{\varphi }}_\alpha ^{(n,\delta )}(\theta ,x)\) is increasing in \(\delta \) for any \((\theta ,x)\in [0,1]\times S\) and bounded above (since \({\overline{\varphi }}^{(n,\delta )}_\alpha (\theta ,x)\le e^{2n/\alpha }\) for \(x\in S,\theta \in [\delta ,1]\)); therefore, there exists a function \(\varphi ^{(n)}_\alpha \) on \([0,1]\times S\), continuous with respect to \(\theta \in [0,1]\), such that along a subsequence \(\delta _m\rightarrow 0\) we have \(\lim _{m\rightarrow \infty }{\overline{\varphi }}_\alpha ^{(n,\delta _m)}(\theta ,x)=\varphi _\alpha ^{(n)}(\theta ,x)\), and for any fixed \(x\in S\) this convergence is uniform in \(\theta \in [0,1]\).
Let \(\psi \in C^\infty _c(0,1)\), then we have
Now take \(\tau (x):=M_0 {\mathcal {V}}(x)\) and define
for all \((x,a)\in K\) where \(\delta _x(\cdot )\) is the Dirac measure concentrated at x. We see that under Assumption 2.1, \(Q^{(n)}\) is a stochastic kernel on S given K. Then (4.13) can be written as
Now
Since, for each fixed \(x\in S\), A(x) is compact, there exist a subsequence of \(\{m\}\) (not relabeled, by abuse of notation) and \(a^*\in A(x)\) such that \(\lim _{m\rightarrow \infty }a^{*}_m=a^*\). Now, from (4.13), for any \(a\in A(x)\), we have
So, by Lemma 8.3.7 in Hernandez-Lerma and Lasserre (1999), taking the limit as \(m\rightarrow \infty \) in (4.16), we get
Hence
But
By analogous arguments, we get
From (4.17) and (4.18), we get
Thus we obtain
Hence
in the sense of distribution. Now for \(\theta \in [\delta _m,1]\), by using (4.4) and Proposition 2.1, we have
Note that \(\varphi _\alpha ^{(n,\delta _m)}\rightarrow \varphi _\alpha ^{(n)}\) as \(m\rightarrow \infty \). Thus, letting \(m\rightarrow \infty \) in the above equation, we obtain
By using (4.1), (4.2), (4.20), and the differential equation satisfied by \(\varphi ^{(n)}_{\alpha }\) (just proved), we see that \(\varphi ^{(n)}_{\alpha }\in L^\infty _{{\mathcal {V}}}([0,1]\times S)\) and that it is a solution of (4.11). Thus, closely mimicking the arguments of Theorem 3.1, one obtains the stochastic representation of the solution \(\varphi _\alpha ^{(n)}\), that is
Step 2: In this step we prove Theorem 4.1 by passing to the limit as \(n \rightarrow \infty \). First, we show that for each \(x\in S\), the family \(\{\varphi ^{(n)}_\alpha \}_{n\ge 1}\) is equicontinuous on [0, 1]. Consider the following expression, for any given \(\pi \in \Pi \), \(x\in S\), and \(\theta ,\theta _0\in [0,1]\):
where
Here, the first inequality uses \(e^{bz}-1\le (e^b-1)z\) for all \(z\in [0,1]\) and \(b>0\), and the last inequality follows from (3.7). Therefore, we have
By the measurable selection theorem [Bertsekas and Shreve (1996), Proposition 7.33], there exists a measurable function \(f^{*n}:[0,1]\times S\rightarrow A\) such that
Let
be defined by
Hence, by Eq. (4.11), for a.e. \(\theta \in [0,1]\) and all \(x\in S\), we have
Since \(c_n\ge 0\), by (4.21) we see that \(\varphi ^{(n)}_\alpha (\theta ,x)\) is increasing in \(\theta \). We also know that \(\varphi ^{(n)}_\alpha (\theta ,x)\) is differentiable a.e. with respect to \(\theta \in [0,1]\). So
So, by (4.1), (4.2) and (4.24), for all \(x\in S\) and for a.e. \(\theta \), we have
and
So, by Dynkin formula, we get
Also using (4.11) and Dynkin formula (see (3.7) and (3.13)), we have
By (4.28) and (4.29), we have \(\varphi ^{(n-1)}_\alpha (\theta ,x)\le \varphi ^{(n)}_\alpha (\theta ,x).\)
Hence \(\varphi ^{(n)}_\alpha (\theta ,x)\) is increasing in \(n\) for any \((\theta ,x)\in [0,1]\times S\). Now from (4.22), we know that for each \(x\in S\), \(\varphi ^{(n)}_\alpha (\cdot ,x)\) is Lipschitz continuous in \(\theta \in [0, 1]\). Moreover, \(\varphi ^{(n)}_\alpha (\theta ,x)\) is increasing in \(n\) and bounded above (by (4.20)); therefore there exists a function \(\varphi _\alpha \) on \([0,1]\times S\), continuous with respect to \(\theta \in [0,1]\), such that along a subsequence \(n_k\rightarrow \infty \) we have \(\lim _{k\rightarrow \infty }\varphi ^{(n_k)}_\alpha (\theta ,x)=\varphi _\alpha (\theta ,x)\), and this convergence is uniform in \(\theta \in [0,1]\) for each fixed \(x\in S\). Moreover, by (4.20), we have
Arguing as in the proof of Eq. (4.11) in Step 1 (starting from the first equality of (4.13)), we see that \(\varphi _{\alpha }\) is a solution to the HJB Eq. (3.1). Also, by (4.30), we conclude that \(\varphi _{\alpha }\in L^\infty _{{\mathcal {V}}}([0,1]\times S)\). Finally, the uniqueness of \(\varphi _\alpha (\theta ,x)\) follows from the stochastic representation in Theorem 3.1. \(\square \)
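Incidentally, the elementary bound \(e^{bz}-1\le (e^b-1)z\) for \(z\in [0,1]\), \(b>0\), invoked in the equicontinuity estimate of Step 2, is just convexity of \(z\mapsto e^{bz}\): the function lies below its chord over \([0,1]\). A quick numerical sanity check (grid and values of \(b\) are illustrative only):

```python
import math

def lhs(b, z):
    # left-hand side: e^{bz} - 1
    return math.exp(b * z) - 1.0

def rhs(b, z):
    # right-hand side: the chord (e^b - 1) z of z -> e^{bz} - 1 over [0, 1]
    return (math.exp(b) - 1.0) * z

# Convexity of z -> e^{bz} puts the graph below the chord joining
# (0, 1) and (1, e^b), i.e. e^{bz} - 1 <= (e^b - 1) z on [0, 1].
for b in (0.1, 1.0, 3.0, 10.0):
    for k in range(101):
        z = k / 100.0
        assert lhs(b, z) <= rhs(b, z) + 1e-12
print("chord bound verified on a grid")
```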
5 The existence of optimal control
In this section, we present the main result of this article. Here we show the existence of an optimal control.
Theorem 5.1
Suppose that Assumptions 2.1, 2.2 and 3.1 are satisfied. Then, the following assertions hold.
(1)
The HJB Eq. (3.1) has a unique solution \(\varphi _\alpha \in L^\infty _{{\mathcal {V}}}([0,1]\times S)\) and the solution admits the following representation
$$\begin{aligned} 1\le \varphi _\alpha (\theta ,x)&=\inf _{\pi \in \Pi }E^{\pi }_x\biggl [\exp \biggl (\theta \int _{0}^{\infty }e^{-\alpha t}c(\xi _t,\pi _t)dt\biggr )\biggr ]\\&\le {\frac{\alpha ^2 e^{{\theta L_0}/{\alpha }}}{\alpha ^2-\rho _0\rho _1\theta }} ({\mathcal {V}}(x))^{\frac{\rho _1\theta }{\alpha }}. \end{aligned}$$
(2)
There exists a measurable function \(f^*: [0,1]\times S \rightarrow A\) such that
$$\begin{aligned} \alpha \theta \frac{\partial \varphi _\alpha }{\partial \theta }(\theta ,x)&=\biggl [\int _{S}q(dy|x,f^{*}(\theta ,x))\varphi _\alpha (\theta ,y)+\theta c(x,f^{*}(\theta ,x))\varphi _\alpha (\theta ,x)\biggr ]\nonumber \\ \text {a.e.}~\theta \in [0,1]. \end{aligned}$$(5.1)
(3)
Furthermore an optimal Markov control for the cost criterion (2.2) exists and is given by
$$\begin{aligned} {\tilde{\pi }}^*_t(x):=f^*(\theta e^{-\alpha t},x), \end{aligned}$$where \(f^*\) satisfies (5.1).
Proof
Part (1) follows from Theorems 3.1 and 4.1.
To prove (2), by Hernandez-Lerma and Lasserre (1999), we first observe that the function
is continuous in \(a\in A(x)\) for each given \((\theta ,x)\in [0,1]\times S\). Thus, by the measurable selection theorem [Bertsekas and Shreve (1996), Proposition 7.33], there exists a measurable function \(f^{*}\) satisfying (5.1), and (2) follows. For part (3), take any \(f^{*}\) satisfying (5.1). Then, by Theorem 3.1, we have \(\displaystyle \inf _{\pi \in \Pi } {\tilde{J}}_\alpha (\theta ,x,{\pi })={\tilde{J}}_\alpha (\theta ,x,{\tilde{\pi }}^{*})= \varphi _\alpha (\theta ,x)\), which, together with (2.2), (2.3) and part (1), gives \(\displaystyle \inf _{\pi \in \Pi }{\mathcal {J}}_\alpha (\theta ,x,{\pi })={\mathcal {J}}_\alpha (\theta ,x,{\tilde{\pi }}^{*})=\frac{1}{\theta }\ln {\tilde{J}}_\alpha (\theta ,x,{\tilde{\pi }}^{*})= \frac{1}{\theta }\ln \varphi _\alpha (\theta ,x).\) Hence \({\tilde{\pi }}^{*}\) is an optimal Markov control. \(\square \)
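To illustrate part (1) concretely, the quantity \({\tilde{J}}_\alpha (\theta ,x,\pi )=E^{\pi }_x[\exp (\theta \int _0^\infty e^{-\alpha t}c(\xi _t,\pi _t)dt)]\) can be estimated by Monte Carlo for simple chains. The sketch below uses a hypothetical two-state chain with a fixed policy already folded into the rates; all rates, costs, and parameters are invented for illustration, and the infinite horizon is truncated where the discount factor is negligible.

```python
import math
import random

# Hypothetical two-state chain (states 0 and 1): lam[i] is the jump rate
# out of state i (always to the other state), c[i] the running cost,
# alpha the discount rate, theta the risk-sensitivity parameter.
lam = [1.0, 2.0]
c = [0.5, 1.5]
alpha, theta = 1.0, 0.5

def sample_cost(x0, horizon=40.0):
    # One sample of exp(theta * int_0^T e^{-alpha t} c(xi_t) dt); the tail
    # beyond `horizon` is negligible since e^{-alpha*40} ~ 4e-18.
    t, x, integral = 0.0, x0, 0.0
    while t < horizon:
        hold = random.expovariate(lam[x])       # exponential holding time
        upper = min(t + hold, horizon)
        # closed-form integral of e^{-alpha s} c(x) over [t, upper]
        integral += c[x] * (math.exp(-alpha * t) - math.exp(-alpha * upper)) / alpha
        t, x = t + hold, 1 - x
    return math.exp(theta * integral)

random.seed(0)
est = sum(sample_cost(0) for _ in range(20000)) / 20000
print(f"Monte Carlo estimate of tilde J_alpha(theta, 0, pi): {est:.4f}")
# consistent with 1 <= varphi_alpha and the exponential upper bound
assert 1.0 <= est <= math.exp(theta * max(c) / alpha)
```

Since \(c\ge 0\), every sample is at least 1, matching the lower bound \(1\le \varphi _\alpha \) in part (1); in this bounded toy case the deterministic bound \(e^{\theta \Vert c\Vert _\infty /\alpha }\) plays the role of the \({\mathcal {V}}\)-dependent upper bound.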
Now we prove the converse of Theorem 5.1.
Theorem 5.2
Grant Assumptions 2.1, 2.2 and 3.1. Suppose that there exists an optimal Markov control for the cost criterion (2.2), given by
for some measurable function \({\tilde{f}}^*\). Then \({\tilde{f}}^*\) is a minimizing selector of (3.1).
Proof
Since \({\hat{\pi }}^*\) is optimal for the cost criterion (2.2), we have
Now, for \({\tilde{f}}^*\), by Theorem 4.1 there exists a unique solution \(\psi _\alpha \in L^\infty _{{\mathcal {V}}}([0,1]\times S)\) of the equation
for each \(x\in S\) and a.e. \(\theta \in [0,1]\), satisfying \(1\le \psi _\alpha (\theta ,x)\le {\frac{\alpha ^2 e^{{\theta L_0}/{\alpha }}}{\alpha ^2-\rho _0\rho _1\theta }} ({\mathcal {V}}(x))^{\frac{\rho _1\theta }{\alpha }}\) for all \((\theta ,x)\in [0,1]\times S.\)
Now by Theorem 3.1, we know that
So, in view of Theorem 3.1, by Eqs. (3.1), (5.3), and (5.5), we conclude that \({\tilde{f}}^*\) is a minimizing selector of (3.1). \(\square \)
When the transition and cost rates are bounded, the existence of an optimal control is ensured by Theorem 5.1.
Corollary 5.1
Grant Assumption 3.1 ((i)–(ii)). Also, assume that the transition and cost rates are bounded. Then the HJB Eq. (3.1) has a unique solution \(\varphi _\alpha \), and an optimal control exists.
Proof
Suppose there exist constants \(L_1\) and \(b_1\) such that \(\displaystyle \sup _{(x,a)\in {K}}q_x(a)\le L_1\) and \(\displaystyle \sup _{(x,a)\in {K}}c(x,a)\le b_1\). First we take the constant Lyapunov function \({\mathcal {V}}(x)\equiv P\) for all \(x\in S\), where \(P\ge 1\) is a constant. Then \(\int _{S}{\mathcal {V}}(y)q(dy|x,a)=\int _{S}{\mathcal {V}}^2(y)q(dy|x,a)=0\) for all \((x,a)\in {K}\). Now take \({\rho }_0=\alpha \), \(M_0=L_1\), any \({\rho }_1\in (0,\alpha )\), and \({L}_0=b_1\). Then Assumption 2.1 is verified. Next, take any constants \({\rho }_2\in (0,\alpha )\) and \(b_0\in (0,\infty )\); then Assumption 2.2 holds for all \(x\in S\). Also, \(\int _{S}{\mathcal {V}}(y)q(dy|x,a)\) is continuous in \(a\in A(x)\), so Assumption 3.1 holds as well. Hence, by Theorem 5.1, the HJB Eq. (3.1) has a unique solution \(\varphi _\alpha \) and an optimal control exists. \(\square \)
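The key point in the corollary is that a constant Lyapunov function has zero drift under any conservative transition kernel, since \(q(S|x,a)=0\). A minimal finite-state illustration (the generator matrix below is hypothetical):

```python
# A conservative generator matrix on three states: off-diagonal entries are
# jump rates q(y|x) >= 0 and each row sums to zero, i.e. q(S|x,a) = 0.
Q = [
    [-3.0,  2.0,  1.0],
    [ 0.5, -0.5,  0.0],
    [ 1.0,  1.0, -2.0],
]
P = 7.0  # constant Lyapunov function V(x) = P >= 1

for row in Q:
    assert abs(sum(row)) < 1e-12              # conservativeness
    drift = sum(P * q_xy for q_xy in row)     # integral of V against q(.|x)
    assert abs(drift) < 1e-12                 # zero drift, as in the proof
print("constant Lyapunov function has zero drift under a conservative Q")
```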
6 Application and example
In this section, we illustrate the above assumptions with an example in which the transition and cost rates are unbounded.
Example 6.1
The Gaussian Model: Suppose a hunter hunts outside his house on behalf of his manager, and the house is at state 0. A positive state represents the distance from the house to the right, and a negative state represents the distance from the house to the left; let \(S={\mathbb {R}}\). If the current position is \(x\in S\) and the hunter takes an action \(a\in A(x)\), then after an exponentially distributed travel time with rate \(\lambda (x,a)>0\), the hunter reaches a new position, which follows the normal distribution with mean \(x\) and variance \(\sigma ^2\). (Equivalently, we can interpret \(\lambda (x,a)\) as the total jump intensity, an arbitrary measurable positive-valued function on \(S\times A\), where the distribution of the state after a jump from \(x\in S\) is normal with variance \(\sigma ^2\) and expectation \(x\).) Assume also that the hunter receives a payoff \(c(x,a)\) from his manager for each unit of time he spends there. Let us consider the model \(A_2:=\{S,(A,A(x),x\in S),c(x,a),q(dy|x,a)\}\), where \(S=(-\infty ,\infty )\). For each \(D\in {\mathcal {B}}(S)\), the transition rate is
To ensure the existence of an optimal Markov control for the model, we consider the following hypotheses.
(I)
For each fixed \(x\in S\), \(\lambda (x,a)\) is continuous in \(a\in A(x)\) and there exists a positive constant \(M_1\) such that \(\displaystyle 0<\sup _{a\in A(x)}\lambda (x,a)\le M_1(x^2+1)\) and \(M_1<\frac{\alpha }{6\sigma ^2(\sigma ^2+1)}\).
(II)
For each \(x\in S\), the cost rate c(x, a) is nonnegative and continuous in \(a\in A(x)\) and there exists a constant \(0<\rho _1<\min \{\alpha ,\frac{\alpha ^2}{M_1\sigma ^2}\}\) such that
$$\begin{aligned} \sup _{a\in A(x)}c(x,a)\le \rho _1 \log (1+x^2). \end{aligned}$$
(III)
For each fixed \(x\in S\), \(A(x)\) is a compact subset of the Borel space \(A\).
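Under (I)–(III), sample paths of the Gaussian model are easy to simulate: from state \(x\) under action \(a\), wait an exponential time with rate \(\lambda (x,a)\), then jump to a normally distributed state centered at \(x\). A simulation sketch, in which \(\sigma \), \(M_1\), the rate function and the policy are all invented for illustration, with the rate chosen to respect the bound in condition (I):

```python
import math
import random

# Gaussian-model simulator: hold at x for an Exp(lambda(x, a)) time, then
# jump to a Normal(x, sigma^2) state. The rate satisfies
# lambda(x, a) <= M1 * (x^2 + 1), as required by condition (I).
sigma, M1 = 0.5, 0.1

def lam(x, a):
    return M1 * (x * x + 1.0) * math.exp(-a)  # hypothetical rate, a >= 0

def simulate(x0, horizon, policy=lambda x: 0.0):
    t, x = 0.0, x0
    path = [(0.0, x0)]
    while True:
        a = policy(x)
        t += random.expovariate(lam(x, a))    # exponential travel time
        if t >= horizon:
            return path
        x = random.gauss(x, sigma)            # new position ~ N(x, sigma^2)
        path.append((t, x))

random.seed(3)
path = simulate(0.0, horizon=50.0)
print(f"{len(path) - 1} jumps before t = 50; final state {path[-1][1]:.3f}")
```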
Proposition 6.1
Under conditions (I)–(III), the above controlled system satisfies the Assumptions 2.1, 2.2, and 3.1. Hence by Theorem 5.1, there exists an optimal Markov control for this model.
Proof
We know that \(\frac{1}{\sqrt{2\pi }\sigma }\int _{-\infty }^{\infty }(y-x)^{2k+1}e^{-\frac{(y-x)^2}{2\sigma ^2}}dy=0\) and \(\frac{1}{\sqrt{2\pi }\sigma }\int _{-\infty }^{\infty }(y-x)^{2k}e^{-\frac{(y-x)^2}{2\sigma ^2}}dy=1\cdot 3\cdots (2k-1)\sigma ^{2k}\) for all \(k=0,1,\ldots \) (with the convention that the empty product equals 1).
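These Gaussian central moment identities (odd central moments vanish; even ones equal \((2k-1)!!\,\sigma ^{2k}\)) can be sanity-checked by Monte Carlo; the state \(x\) and \(\sigma \) below are arbitrary illustrative choices:

```python
import random

# Monte Carlo check of the central moment identities for N(x, sigma^2):
# E[(Y - x)^{2k+1}] = 0 and E[(Y - x)^{2k}] = (2k-1)!! sigma^{2k}.
x, sigma, n = 1.3, 0.8, 400000
random.seed(1)
samples = [random.gauss(x, sigma) for _ in range(n)]

def central_moment(p):
    return sum((y - x) ** p for y in samples) / n

def double_factorial(m):
    # (2k-1)!! = 1 * 3 * ... * (2k-1); empty product (m <= 0) is 1
    out = 1
    while m > 1:
        out *= m
        m -= 2
    return out

for k in range(3):
    odd = central_moment(2 * k + 1)
    even = central_moment(2 * k)
    assert abs(odd) < 0.1                                   # vanishes
    assert abs(even - double_factorial(2 * k - 1) * sigma ** (2 * k)) < 0.1
print("Gaussian central moment identities confirmed numerically")
```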
We first verify Assumption 2.1. Let \({\mathcal {V}}(x)=x^2+1\).
Let \(\rho _0=M_1 \sigma ^2\). Then \( \int _{S}{\mathcal {V}}(y)q(dy|x,a)\le \rho _0{\mathcal {V}}(x).\) Now
Now by condition (II), we can write
Observe that by condition (II), \(0<\rho _1<\min \{\alpha ,\rho _0^{-1}\alpha ^2\}\). Hence Assumption 2.1 is verified with \(M_0=L_0= M_1\).
Next we verify Assumption 2.2. For any \(x\in S\), \(a\in A(x)\),
where \(\rho _2=6M_1\sigma ^2(\sigma ^2+1)\) and \(b_0=1\). Then, by condition (I), we have \(0<\rho _2<\alpha \). Hence Assumption 2.2 is verified. By conditions (I) and (II), \(c(x,a)\) is continuous in \(a\in A(x)\), and by condition (I) and (6.2), \(\int _{S}{\mathcal {V}}(y)q(dy|x,a)\) is continuous in \(a\in A(x)\). Hence Assumption 3.1 is also verified. So, by Theorem 5.1, there exists an optimal Markov control for this model. \(\square \)
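As a numerical check of the first step of this proof: with \({\mathcal {V}}(x)=x^2+1\) and \(q(dy|x,a)=\lambda (x,a)[N(x,\sigma ^2)(dy)-\delta _x(dy)]\), one has \(\int _S{\mathcal {V}}(y)q(dy|x,a)=\lambda (x,a)\sigma ^2\le \rho _0{\mathcal {V}}(x)\) with \(\rho _0=M_1\sigma ^2\). The sketch below verifies this by Monte Carlo for an invented rate that attains the bound in condition (I); the constraint of condition (I) linking \(M_1\) to \(\alpha \) is not used in this drift computation.

```python
import random

# Drift check for the Gaussian model with V(x) = x^2 + 1:
# int_S V(y) q(dy|x,a) = lam(x,a) * (E[V(Y)] - V(x)) = lam(x,a) * sigma^2
#                      <= M1 * (x^2 + 1) * sigma^2 = rho0 * V(x).
# sigma and M1 are illustrative; lam below attains the bound in (I).
sigma, M1 = 0.7, 0.4
V = lambda x: x * x + 1.0
lam = lambda x: M1 * (x * x + 1.0)
rho0 = M1 * sigma ** 2

random.seed(2)
for x in (-3.0, 0.0, 1.5, 10.0):
    n = 200000
    mc = sum(V(random.gauss(x, sigma)) for _ in range(n)) / n
    drift = lam(x) * (mc - V(x))              # Monte Carlo int V dq
    assert abs(drift - lam(x) * sigma ** 2) < 0.05 * V(x)
    assert drift <= (rho0 + 0.05) * V(x)
print("drift bound int V dq <= rho0 * V verified for the Gaussian model")
```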
Remark 6.1
As mentioned in the introduction, there are many real-life applications in which the underlying system dynamics are modeled as a CTMDP with Borel state and action spaces and unbounded cost and transition rates; see, for instance, the cash-flow problem in Guo and Zhang (2019) and [p. 112, Piunovskiy and Zhang (2020)]. Moreover, many real-life examples with uncountable state spaces, such as the infrastructure surveillance models [pp. 115–116, Piunovskiy and Zhang (2020)] and the queueing model [p. 192, Piunovskiy and Zhang (2020)], can be formulated in our set-up.
References
Bauerle N, Rieder U (2014) More risk-sensitive Markov decision processes. Math Oper Res 39:105–120
Bertsekas D, Shreve S (1996) Stochastic optimal control: the discrete-time case. Academic Press Inc, New York
Ghosh MK, Saha S (2014) Risk-sensitive control of continuous-time Markov chains. Stoch 86:655–675
Guo XP, Hernandez-Lerma O (2009) Continuous-time Markov decision processes: theory and applications. Stochastic modelling and applied probability. Springer, Berlin
Guo X, Liao ZW (2019) Risk-sensitive discounted continuous-time Markov decision processes with unbounded rates. SIAM J Control Optim 57:3857–3883
Guo X, Piunovskiy A (2011) Discounted continuous-time Markov decision processes with constraints: unbounded transition and loss rates. Math Oper Res 36:105–132
Guo X, Zhang J (2019) Risk-sensitive continuous-time Markov decision processes with unbounded rates and Borel spaces. Discrete Event Dyn Syst 29:445–471
Guo X, Zhang Y (2020) On risk-sensitive piecewise deterministic Markov decision processes. Appl Math Optim 81:685–710
Guo X, Huang Y, Song X (2012) Linear programming and constrained average optimality for general continuous-time Markov decision processes in history-dependent polices. SIAM J Control Optim 50:23–47
Guo X, Liu Q, Zhang Y (2019) Finite horizon risk-sensitive continuous-time Markov decision processes with unbounded transition and cost rates. 4OR 17:427–442
Hernandez-Lerma O, Lasserre J (1999) Further topics on discrete-time Markov control processes. Springer, New York
Kitaev MY (1995) Semi-Markov and jump Markov controlled models: average cost criterion. Theory Probab Appl 30:272–288
Kitaev MY, Rykov VV (1995) Controlled queueing systems. CRC Press, Boca Raton
Kumar KS, Pal C (2013) Risk-sensitive control of jump process on denumerable state space with near monotone cost. Appl Math Optim 68:311–331
Kumar KS, Pal C (2015) Risk-sensitive control of continuous-time Markov processes with denumerable state space. Stoch Anal Appl 33:863–881
Masi GB, Stettner L (2000) Infinite horizon risk-sensitive control of discrete time Markov processes with small risk. Syst Control Lett 40:15–20
Masi GB, Stettner L (2007) Infinite horizon risk-sensitive control of discrete time Markov processes under minorization property. SIAM J Control Optim 46:231–252
Pal C, Pradhan S (2019) Risk-sensitive control of pure jump processes on a general state space. Int J Probab Stoch Process 91(2):155–174
Piunovskiy A, Zhang Y (2020) Continuous-time Markov decision processes. Springer, Berlin
Piunovskiy A, Zhang Y (2011) Discounted continuous-time Markov decision processes with unbounded rates: the convex analytic approach. SIAM J Control Optim 49:2032–2061
Prieto-Rumeau T, Hernandez-Lerma O (2012) Selected topics in continuous-time controlled Markov chains and Markov games. Imperial College Press, London
Wei Q (2016) Continuous-time Markov decision processes with risk-sensitive finite-horizon cost criterion. Math Methods Oper Res 84:461–487
Whittle P (1990) Risk-sensitive optimal control, Wiley-Inter Science series in systems and optimization. Wiley, Chichester
Zhang Y (2017) Continuous-time Markov decision processes with exponential utility. SIAM J Control Optim 55:2636–2660
Acknowledgements
We thank the anonymous referees for their valuable comments and helpful suggestions that have improved the presentation of this paper.
Golui, S., Pal, C. Risk-sensitive discounted cost criterion for continuous-time Markov decision processes on a general state space. Math Meth Oper Res 95, 219–247 (2022). https://doi.org/10.1007/s00186-022-00779-9