1 Introduction

Markov decision processes have been widely applied to queueing systems, power-managed systems, inventory control, telecommunication, infrastructure surveillance models, preventive maintenance, epidemic control and other areas (see [11, 18, 20, 21]). Under the standard expected discounted cost criterion, the decision horizon is infinite and the decision-maker is assumed to be risk-neutral (see [11, 14, 18, 20, 21]). In terms of risk preference, however, the decision-maker may be risk-sensitive rather than risk-neutral; see, for instance, [2, 4,5,6,7,8,9,10, 15,16,17, 22, 24,25,28]. The exponential utility function is a common way of characterizing the risk-sensitivity of the decision-maker in Markov decision processes; see, for instance, [8, 10, 16, 25] for the finite-horizon risk-sensitive cost criterion, [5, 8, 15] for the infinite-horizon risk-sensitive discounted cost criterion and [4,5,6,7,8,9, 22, 24, 26, 27] for the infinite-horizon risk-sensitive average cost criterion. In all of [4,5,6,7,8,9,10, 15, 16, 22, 24,25,26,27], the decision horizon is either finite or infinite. In practical applications, however, the decision-maker may only be concerned with the costs incurred before the state of the controlled stochastic system enters some target set, so the decision horizon is uncertain; see, for instance, [12, 19]. In [12, 19], this uncertainty is described by the first passage time to a target set. More precisely, Guo et al. [12] studied the first passage mean-variance criterion for discounted continuous-time Markov decision processes, and [19] investigated nonzero-sum games under the first passage expected discounted payoff criterion for continuous-time jump processes. It is therefore meaningful to take into account both the risk-sensitivity of the decision-maker and the uncertainty of the decision horizon. 
In this paper, we investigate continuous-time Markov decision processes under the risk-sensitive first passage discounted cost criterion, which has not been studied yet.

In this paper, the state and action spaces are Borel spaces, and the cost and transition rates may be unbounded. To ensure the finiteness of the risk-sensitive first passage discounted cost criterion and the existence of optimal policies, we impose a drift condition on the transition rates, a growth condition on the cost rate, and continuity and compactness conditions. First, we derive bounds on the risk-sensitive first passage discounted cost criterion and a Feynman–Kac formula that is applicable to a class of unbounded functions (see Lemmas 3.1 and 3.3). Then, we construct an approximating sequence of bounded cost rates and introduce a new value iteration [see (4.1)]. Using the results for this approximating sequence, we obtain the existence of a solution to the risk-sensitive first passage discounted cost optimality equation for a Borel state space and unbounded transition and cost rates. Moreover, using the Feynman–Kac formula, we prove that the risk-sensitive first passage discounted cost optimal value function is the unique solution to this optimality equation. In addition, from the optimality equation we derive the existence of a deterministic Markov optimal policy in the class of randomized history-dependent policies (see Theorem 4.1). Finally, we provide a cash flow model to illustrate the optimality conditions.

Compared with the existing literature on the risk-sensitive discounted continuous-time Markov decision processes, the main contributions of this paper are as follows.

  1. (I)

    (New method). The existence of a solution to the infinite-horizon risk-sensitive discounted cost optimality equation is obtained in [5] for a Borel state space with bounded cost and transition rates, in [8] for a denumerable state space with bounded cost and transition rates, and in [15] for a denumerable state space with unbounded cost and transition rates. Extending the results in [5, 8, 15] to a Borel state space with unbounded cost and transition rates is nontrivial. We establish the existence of a solution to the optimality equation by introducing a new value iteration, which differs from the methods in [5, 8, 15]. More precisely, in [5, 8, 15] the existence of a solution to the infinite-horizon risk-sensitive discounted cost optimality equation is obtained via a fixed point technique. Since the function \(\frac{1}{\lambda }\) is singular at \(\lambda =0\), the integral in the construction of the fixed point mapping in [5, 8, 15] has to start from some \(\varepsilon >0\); letting \(\varepsilon \rightarrow 0\) then yields a solution to the infinite-horizon risk-sensitive discounted cost optimality equation for bounded cost and transition rates. The fixed point method requires the cost and transition rates to be bounded, so the approach in [5, 8] cannot handle unbounded cost and transition rates. To treat the unbounded case, [15] first constructs an approximating sequence of bounded cost and transition rates via truncation and then applies the fixed point technique to this approximating sequence; in this way, [15] establishes the existence of a solution to the infinite-horizon risk-sensitive discounted cost optimality equation for a denumerable state space and unbounded cost and transition rates. 
Because the diagonalization arguments in [15] require the state space to be denumerable, the method in [15] does not apply to a Borel state space. To deal with a Borel state space and unbounded cost and transition rates, we introduce a new value iteration [see (4.1)], which has two advantages. On the one hand, it avoids the singularity issue of the fixed point method in [5, 8, 15]: the approximation \(\varepsilon \rightarrow 0\) used in [5, 8, 15] is not required. On the other hand, it treats a Borel state space and unbounded transition rates directly, without the boundedness assumption of [5, 8] and without the diagonalization arguments and the approximating sequence of bounded transition rates used in [15].

  2. (II)

    (Weaker conditions). We obtain the existence result under conditions weaker than those in [5, 8, 15]. More specifically, we remove the boundedness condition on the cost and transition rates in [5, 8] and the uniform convergence condition on the solution in [8]. Unlike [15], we do not require the state space to be denumerable, the transition rates to be uniformly conservative, or the constant in the second-order drift condition to be less than the discount factor.

The remainder of the paper is organized as follows. In Sect. 2, we introduce the decision model and the risk-sensitive first passage discounted cost criterion. In Sect. 3, we provide the optimality conditions for the existence of optimal policies and prove a Feynman–Kac formula that is applicable to a class of unbounded functions. In Sect. 4, we employ a new value iteration to establish the risk-sensitive first passage discounted cost optimality equation and show the existence of optimal policies. In Sect. 5, we use a cash flow model to illustrate the main results. Sect. 6 concludes the paper.

2 The Decision Model

In this section, we first describe the decision model and then introduce the notion of a policy and the risk-sensitive first passage discounted cost criterion. The decision model consists of the following components:

$$\begin{aligned} \{X,A, (A(x)\subseteq A,x\in X),q(\cdot |x,a), c(x,a)\}. \end{aligned}$$

The state space X and action space A are assumed to be Borel spaces endowed with the Borel \(\sigma \)-algebras \(\mathcal {B}(X)\) and \(\mathcal {B}(A)\), respectively. For each \(x\in X\), \(A(x)\in \mathcal {B}(A)\) denotes the set of all admissible actions at state x. Let \(K:=\{(x,a):x\in X, a\in A(x)\}\), which is assumed to be a Borel measurable subset of \(X\times A\). The transition rate q satisfies the following properties: (i) for any \((x,a)\in K\), \(q(\cdot |x,a)\) is a signed measure on \(\mathcal {B}(X)\), and for any \(D\in \mathcal {B}(X)\), \(q(D|\cdot )\) is a real-valued measurable function on K; (ii) \(0\le q(D|x,a)<\infty \) for all \((x,a)\in K\) and \(x\notin D\in \mathcal {B}(X)\); (iii) \(q(X|x,a)=0\) for all \((x,a)\in K\); and (iv) \(q^*(x):=\sup _{a\in A(x)}|q(\{x\}|x,a)|<\infty \) for all \(x\in X\). The cost rate c is a nonnegative real-valued measurable function on K.
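For a finite state space and finitely many actions, properties (ii) to (iv) reduce to elementary checks on an array of rates. The sketch below assumes a hypothetical layout `q[x, a, y]` \(= q(\{y\}|x,a)\) with every action admissible in every state (so \(K=X\times A\) and the measurability in (i) is automatic); the function name `check_transition_rates` is illustrative only.

```python
import numpy as np


def check_transition_rates(q, tol=1e-9):
    """Check properties (ii)-(iv) for a finite array q[x, a, y] = q({y} | x, a).

    Hypothetical finite setting: X = {0, ..., n-1} and every action is admissible,
    so the measurability in property (i) holds trivially.
    Returns the vector q_star[x] = max_a |q({x} | x, a)|.
    """
    q = np.asarray(q, dtype=float)
    if not np.all(np.isfinite(q)):
        raise ValueError("rates must be finite")
    n_states = q.shape[0]
    off_diag = q.copy()
    for x in range(n_states):
        off_diag[x, :, x] = 0.0
    # (ii): rates into states y != x are nonnegative
    if np.any(off_diag < -tol):
        raise ValueError("negative off-diagonal rate")
    # (iii): conservativeness, q(X | x, a) = 0
    if not np.allclose(q.sum(axis=2), 0.0, atol=tol):
        raise ValueError("rates out of some (x, a) do not sum to zero")
    # (iv): q*(x) = sup_a |q({x} | x, a)|, finite here because A is finite
    return np.array([np.abs(q[x, :, x]).max() for x in range(n_states)])
```

For example, a three-state model in which state 2 is absorbing has \(q^*=(2,4,0)\) when the diagonal rates are \(-1,-2\) in state 0 and \(-4,-0.5\) in state 1.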

Next, we describe the construction of the state process. To this end, set \(X_{\infty }:=X\cup \{x_{\infty }\}\) with an isolated point \(x_{\infty }\notin X\), \(\mathbb {R}_+:=(0,\infty )\), \(\Omega ^0:=(X\times \mathbb {R}_+)^{\infty }\) and \(\Omega :=\Omega ^0\cup \{(x_0,\theta _1,x_1,\ldots ,\theta _n,x_n,\infty ,x_{\infty },\infty ,x_{\infty },\ldots )|x_0\in X,x_l\in X,\theta _l\in \mathbb {R}_+ \ \mathrm{for \ each} \ 1\le l\le n,n\ge 1\}\). Denote by \(\mathcal {F}\) the Borel \(\sigma \)-algebra of \(\Omega \). For any \(\omega =(x_0,\theta _1,x_1,\ldots )\in \Omega \), define \(S_0(\omega ):=x_0\), \(T_0(\omega ):=0\), \(S_n(\omega ):=x_n\), \(T_n(\omega ):=\sum _{i=1}^n\theta _i\) for all \(n\ge 1\), \(T_{\infty }(\omega ):=\lim _{n\rightarrow \infty }T_n(\omega )\) and the state process

$$\begin{aligned} \xi _t(\omega ):=\sum _{n\ge 0}I_{\{T_n\le t<T_{n+1}\}}x_n+I_{\{t\ge T_{\infty }\}}x_{\infty } \end{aligned}$$

for \(t\ge 0\), where \(I_D\) stands for the indicator function of a set D. Moreover, let \(A(x_{\infty }):=\{a_{\infty }\}\) with an isolated point \(a_{\infty }\notin A\), \(A_{\infty }:=A\cup \{a_{\infty }\}\), \(c(x_{\infty },a_{\infty }):=0\), \(q(\{x_{\infty }\}|x_{\infty },a_{\infty }):=0\), \(\mathcal {F}_t:=\sigma (\{T_n\le s,S_n\in D\}: s\le t,D\in \mathcal {B}(X),n\ge 0)\) for all \(t\ge 0\), \(\mathcal {F}_{s-}:=\bigvee _{0\le t<s}\mathcal {F}_t\), and \(\mathcal {P}:=\sigma (\{D\times \{0\},D\in \mathcal {F}_0\}\cup \{D\times (s,\infty ),D\in \mathcal {F}_{s-},s>0\})\), which is the \(\sigma \)-algebra of predictable sets on \(\Omega \times [0,\infty )\) with respect to \(\{\mathcal {F}_t\}_{t\ge 0}\).
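The definition of \(\xi _t\) says that the path sits in \(x_n\) on \([T_n,T_{n+1})\), so \(\xi _t\) is right-continuous and piecewise constant. As an illustration, the sketch below evaluates \(\xi _t(\omega )\) for a path of the second type in \(\Omega \), i.e. \(\omega =(x_0,\theta _1,x_1,\ldots ,\theta _n,x_n,\infty ,x_{\infty },\ldots )\); for such a path \(T_{n+1}=T_{\infty }=\infty \), so \(x_{\infty }\) is never reached. The function name `state_at` and the list-based representation are assumptions made for the example.

```python
from bisect import bisect_right
from itertools import accumulate


def state_at(t, x0, sojourns, states):
    """Evaluate xi_t(omega) for omega = (x0, theta_1, x_1, ..., theta_n, x_n, inf, x_inf, ...).

    sojourns = [theta_1, ..., theta_n] (all > 0) and states = [x_1, ..., x_n].
    For such a path theta_{n+1} = inf, so T_infinity = inf and xi_t = x_n for t >= T_n.
    """
    if t < 0:
        raise ValueError("t must be nonnegative")
    if len(sojourns) != len(states):
        raise ValueError("need one post-jump state per sojourn time")
    jump_times = [0.0] + list(accumulate(sojourns))  # T_0 = 0, T_1, ..., T_n
    path = [x0] + list(states)                       # x_0, x_1, ..., x_n
    k = bisect_right(jump_times, t) - 1              # largest k with T_k <= t
    return path[k]
```

Using `bisect_right` makes the lookup return \(x_k\) exactly at \(t=T_k\), which matches the indicator \(I_{\{T_n\le t<T_{n+1}\}}\) in the definition.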

Now we give the definition of a policy.

Definition 2.1

A transition probability \(\pi (\cdot |\cdot )\) on \(A_{\infty }\) given \(\Omega \times [0,\infty )\) is called a randomized history-dependent policy if for any \((\omega ,t)\in \Omega \times [0,\infty )\), \(\pi (\cdot |\omega ,t)\) is a probability measure supported on \(A(\xi _{t-}(\omega ))\) and for any \(D\in \mathcal {B}(A_{\infty })\), \(\pi (D|\cdot )\) is \(\mathcal {P}\)-measurable, where \(\xi _{t-}=\lim _{s\uparrow t}\xi _s\). The set of all randomized history-dependent policies is denoted by \(\Pi \). A policy \(\pi \in \Pi \) is said to be deterministic Markov if there exists a measurable mapping \(f:[0,\infty )\times X_{\infty }\rightarrow A_{\infty }\) with \(f(t,x)\in A(x)\) for all \((t,x)\in [0,\infty )\times X_{\infty }\) such that \(\pi (\cdot |\omega ,t)\) is the Dirac measure at the point \(f(t,\xi _{t-}(\omega ))\) for all \((\omega ,t)\in \Omega \times [0,\infty )\). The set of all deterministic Markov policies is denoted by \(\Pi _M\).

For any initial state \(x\in X\) and an arbitrary policy \(\pi \in \Pi \), by Theorem 4.27 in [20] there exists a unique probability measure \(P_x^{\pi }\) on \((\Omega ,\mathcal {F})\). The expectation operator with respect to \(P_x^{\pi }\) is denoted by \(E_x^{\pi }\).

Finally, we introduce the risk-sensitive first passage discounted cost criterion. Fix the target set \(B\in \mathcal {B}(X)\) and the discount factor \(\alpha >0\). The first passage time to the target set B is given by

$$\begin{aligned} \tau _B:=\inf \{t\ge 0:\xi _t\in B\} \ \mathrm{with \ the \ convention \ that} \ \inf \emptyset :=\infty . \end{aligned}$$

For any risk-sensitivity coefficient \(\lambda >0\), the risk-sensitive first passage discounted cost criterion is defined as follows:

$$\begin{aligned} J(\lambda , x,\pi ):=\frac{1}{\lambda }\ln E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{\tau _B}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}\right] \end{aligned}$$
(2.1)

for all \(x\in X\) and \(\pi \in \Pi \). Taking \(B=\emptyset \), so that \(\tau _B=\infty \), we see that the risk-sensitive first passage discounted cost criterion (2.1) reduces to the infinite-horizon risk-sensitive discounted cost criterion in [5, 8, 15]. Let

$$\begin{aligned} \widehat{J}(\lambda , x,\pi ):=E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{\tau _B}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}\right] \end{aligned}$$

for all \(x\in X\) and \(\pi \in \Pi \). Since \(\tau _B=0\) whenever \(\xi _0\in B\), we have \(J(\lambda , x,\pi )=0\) and \(\widehat{J}(\lambda , x,\pi )=1\) for all \(x\in B\) and \(\pi \in \Pi \).
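To make (2.1) concrete, the following is a minimal Monte Carlo sketch for a hypothetical finite model under a deterministic stationary policy f, so that \(\pi (\cdot |\omega ,t)\) is the Dirac measure at \(f(\xi _{t-})\). The inputs `G[x, y]` \(=q(\{y\}|x,f(x))\) and `cost[x]` \(=c(x,f(x))\), the name `estimate_J` and the cut-off `t_max` are assumptions; the cut-off neglects a discounted cost of at most \(\max _x c(x,f(x))\,\text {e}^{-\alpha t_{\max }}/\alpha \).

```python
import numpy as np


def estimate_J(lam, x0, G, cost, in_B, alpha, n_paths=4000, t_max=60.0, seed=0):
    """Monte Carlo sketch of J(lam, x0, f) in (2.1) for a deterministic stationary policy f.

    Hypothetical finite setting: G[x, y] = q({y} | x, f(x)), cost[x] = c(x, f(x)),
    in_B = boolean mask of the target set B. Paths stop at tau_B or are cut at t_max.
    Returns (estimate of J, sample mean of the discounted cost up to tau_B).
    """
    rng = np.random.default_rng(seed)
    G = np.asarray(G, dtype=float)
    totals = np.zeros(n_paths)
    for i in range(n_paths):
        x, t, total = x0, 0.0, 0.0
        while not in_B[x] and t < t_max:
            rate = -G[x, x]
            theta = rng.exponential(1.0 / rate) if rate > 0 else np.inf
            t_next = min(t + theta, t_max)
            # discounted cost accrued at the constant rate cost[x] on [t, t_next)
            total += cost[x] * (np.exp(-alpha * t) - np.exp(-alpha * t_next)) / alpha
            if t_next >= t_max:
                break
            # next state from the embedded jump chain: off-diagonal rates / total rate
            probs = np.maximum(G[x], 0.0) / rate
            x = int(rng.choice(len(probs), p=probs))
            t = t_next
        totals[i] = total
    J = np.log(np.mean(np.exp(lam * totals))) / lam
    return J, totals.mean()
```

Returning the sample mean alongside the estimate shows the effect of risk-sensitivity: by Jensen's inequality the estimate of \(J\) is never below the mean discounted cost, the two nearly coincide for small \(\lambda \), and for \(x_0\in B\) both are zero.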

Definition 2.2

A policy \(\pi ^*\in \Pi \) is said to be optimal if \( J(\lambda , x,\pi ^*)=\inf _{\pi \in \Pi }J(\lambda , x,\pi )=:J^*(\lambda ,x)\) for all \(x\in X\). The function \(J^*\) is called the risk-sensitive first passage discounted cost optimal value function.

3 Preliminaries

In this section, we give the optimality conditions for the existence of optimal policies. To investigate the possibly unbounded cost and transition rates, we require the following assumption.

Assumption 3.1

There exist a real-valued measurable function \(w\ge 1\) on X, constants \(\rho >0\), \(d\ge 0\), \(L>0\), \(b\ge 0\) and \(0\le M<\min \left\{ \frac{\alpha ^2}{\rho \lambda },\frac{\alpha }{2\lambda }\right\} \) such that

  1. (i)

    \(\int _Xw(y)q(\text {d}y|x,a)\le \rho w(x)+d\) for all \((x,a)\in K\);

  2. (ii)

    \(q^*(x)\le Lw(x)\) for all \(x\in X\);

  3. (iii)

    \(c(x,a)\le M\ln w(x)+b\) for all \((x,a)\in K\).

Remark 3.1

Assumptions 3.1(i) and (ii) are used to ensure the non-explosion of the state process; see, for example, [9,10,11,12, 14,15,16, 19, 25, 27]. Assumption 3.1(iii) is used to guarantee the finiteness of the risk-sensitive first passage discounted cost criterion.
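As a sanity check of how the constants in Assumption 3.1 (and in Assumption 3.2(iii)) can be found, consider a hypothetical controlled birth-death model that is not taken from this paper: \(X=\{0,1,2,\ldots \}\), \(q(\{x+1\}|x,a)=a(x+1)\), \(q(\{x-1\}|x,a)=\mu x\), \(c(x,a)=M_0\ln (x+1)+a\) and \(w(x)=x+1\). Then \(\int _Xw(y)q(\text {d}y|x,a)=a(x+1)-\mu x\), which suggests \(\rho =\max a\), \(d=0\), \(L=\max a+\mu \), \(b=\max a\), \(M=M_0\) and, for \(w^2\), \(\overline{\rho }=3\max a\), \(\overline{d}=0\). The sketch verifies these inequalities on a finite range of states and an assumed finite action grid.

```python
import numpy as np


def check_assumption_31(a_grid, mu, alpha, lam, M0, n_states=500):
    """Check Assumption 3.1 (and 3.2(iii)) for a hypothetical controlled birth-death model.

    Illustrative model: birth rate a(x+1), death rate mu*x, cost M0*ln(x+1) + a, w(x) = x+1.
    Candidate constants: rho = max(a), d = 0, L = max(a) + mu, b = max(a),
    rho_bar = 3*max(a), d_bar = 0. Inequalities are checked for x < n_states only.
    Returns (all inequalities hold, M0 satisfies the bound on M).
    """
    a_max = max(a_grid)
    rho, d, L, b = a_max, 0.0, a_max + mu, a_max
    rho_bar, d_bar = 3.0 * a_max, 0.0
    x = np.arange(n_states, dtype=float)
    w = x + 1.0
    ok = True
    for a in a_grid:
        birth, death = a * (x + 1.0), mu * x              # q({x+1}|x,a), q({x-1}|x,a)
        drift_w = birth - death                           # int w(y) q(dy|x,a)
        drift_w2 = birth * (2.0 * x + 3.0) - death * (2.0 * x + 1.0)  # int w^2(y) q(dy|x,a)
        cost = M0 * np.log(w) + a
        ok &= bool(np.all(drift_w <= rho * w + d))                # 3.1(i)
        ok &= bool(np.all(birth + death <= L * w))                # 3.1(ii): |q({x}|x,a)| <= L w(x)
        ok &= bool(np.all(cost <= M0 * np.log(w) + b))            # 3.1(iii) with M = M0
        ok &= bool(np.all(drift_w2 <= rho_bar * w ** 2 + d_bar))  # 3.2(iii)
    # growth constant: 0 <= M < min{alpha^2 / (rho * lam), alpha / (2 * lam)}
    M_ok = 0.0 <= M0 < min(alpha ** 2 / (rho * lam), alpha / (2.0 * lam))
    return ok, M_ok
```

The second return value reflects the coupling in Assumption 3.1 between the growth constant M and the risk-sensitivity coefficient \(\lambda \): the same model may satisfy the assumption for small \(\lambda \) but not for large \(\lambda \).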

Under Assumption 3.1, we have the following result.

Lemma 3.1

Under Assumption 3.1, the following assertions hold.

  1. (a)

    \( E_x^{\pi }[w(\xi _t)]\le \text {e}^{\rho t}w(x)+\frac{d}{\rho }\text {e}^{\rho t}\le \text {e}^{\rho t}\left( 1+\frac{d}{\rho }\right) w(x)\) for all \(x\in X\) and \(\pi \in \Pi \).

  2. (b)

    \(\widehat{J}(\lambda ,x,\pi )\le R_{\lambda }w^{\frac{\lambda M}{\alpha }}(x)\) for all \(x\in X\) and \(\pi \in \Pi \), where the constant \(R_{\lambda }=\frac{\alpha ^2\text {e}^{\frac{\lambda b}{\alpha }}}{\alpha ^2-\rho \lambda M}\left( 1+\frac{d}{\rho }\right) ^{\frac{\lambda M}{\alpha }}\).

  3. (c)

    \(J(\lambda ,x,\pi )\le \frac{1}{\lambda }\ln R_{\lambda }+\frac{M}{\alpha }\ln w(x)\) for all \(x\in X\) and \(\pi \in \Pi \).

Proof

  1. (a)

    The assertion follows from Theorem 3.1 in [14].

  2. (b)

    Direct computations yield

    $$\begin{aligned} \widehat{J}(\lambda ,x,\pi )&\le E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{\infty }\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}\right] \\&\le E_x^{\pi }\left[ \int _0^{\infty }\text {e}^{\frac{\lambda }{\alpha }\int _Ac(\xi _t,a)\pi (\text {d}a|\omega ,t)}\alpha \text {e}^{-\alpha t}\text {d}t\right] \\&\le \alpha \text {e}^{\frac{\lambda b}{\alpha }} E_x^{\pi }\left[ \int _0^{\infty }\text {e}^{-\alpha t}w^{\frac{\lambda M}{\alpha }}(\xi _t)\text {d}t\right] \\&\le \alpha \text {e}^{\frac{\lambda b}{\alpha }}\int _0^{\infty }\text {e}^{-\alpha t} (E_x^{\pi }[w(\xi _t)])^{\frac{\lambda M}{\alpha }}\text {d}t\\&\le \alpha \text {e}^{\frac{\lambda b}{\alpha }}\left( 1+\frac{d}{\rho }\right) ^{\frac{\lambda M}{\alpha }}w^{\frac{\lambda M}{\alpha }}(x)\int _0^{\infty }\text {e}^{-\alpha t}\text {e}^{\frac{\rho \lambda M t}{\alpha }}\text {d}t \\&=\frac{\alpha ^2\text {e}^{\frac{\lambda b}{\alpha }}}{\alpha ^2-\rho \lambda M}\left( 1+\frac{d}{\rho }\right) ^{\frac{\lambda M}{\alpha }}w^{\frac{\lambda M}{\alpha }}(x) \end{aligned}$$

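for all \(x\in X\) and \(\pi \in \Pi \), where the first inequality holds because \(c\ge 0\) and \(\tau _B\le \infty \); the second follows from Jensen's inequality with respect to the probability measure \(\alpha \text {e}^{-\alpha t}\text {d}t\) on \([0,\infty )\); the third follows from Assumption 3.1(iii); the fourth follows from Jensen's inequality, since \(\frac{\lambda M}{\alpha }<\frac{1}{2}\) by Assumption 3.1 and hence \(y\mapsto y^{\frac{\lambda M}{\alpha }}\) is concave; and the fifth is due to part (a).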

  3. (c)

    Taking logarithms in part (b) and dividing by \(\lambda \) yields part (c).\(\square \)
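The constant \(R_{\lambda }\) in Lemma 3.1(b) comes from the Laplace-type integral \(\alpha \int _0^{\infty }\text {e}^{-\alpha t}\text {e}^{\frac{\rho \lambda M t}{\alpha }}\text {d}t=\frac{\alpha ^2}{\alpha ^2-\rho \lambda M}\), which is finite exactly because \(\rho \lambda M<\alpha ^2\). The sketch below computes \(R_{\lambda }\) and the bound in part (c) and checks the integral numerically with a hand-written trapezoid rule; the function names are hypothetical.

```python
import math

import numpy as np


def R_lambda(lam, alpha, rho, d, b, M):
    """Constant R_lambda of Lemma 3.1(b); requires 0 <= M < min{alpha^2/(rho lam), alpha/(2 lam)}."""
    if not 0.0 <= M < min(alpha ** 2 / (rho * lam), alpha / (2.0 * lam)):
        raise ValueError("M violates Assumption 3.1")
    return (alpha ** 2 * math.exp(lam * b / alpha) / (alpha ** 2 - rho * lam * M)
            * (1.0 + d / rho) ** (lam * M / alpha))


def J_upper_bound(lam, alpha, rho, d, b, M, w_x):
    """Right-hand side of Lemma 3.1(c): (1/lam) ln R_lambda + (M/alpha) ln w(x)."""
    return math.log(R_lambda(lam, alpha, rho, d, b, M)) / lam + M / alpha * math.log(w_x)


def laplace_factor(lam, alpha, rho, M, t_max=60.0, n=600_001):
    """Trapezoid approximation of alpha * int_0^t_max e^{-alpha t} e^{rho lam M t / alpha} dt."""
    t = np.linspace(0.0, t_max, n)
    f = alpha * np.exp(-alpha * t + rho * lam * M * t / alpha)
    h = t[1] - t[0]
    return h * (f.sum() - 0.5 * (f[0] + f[-1]))
```

For instance, with \(\lambda =0.5\), \(\alpha =\rho =1\), \(d=b=0\) and \(M=0.3\), one gets \(R_{\lambda }=1/0.85\), and the numerical integral agrees with \(\alpha ^2/(\alpha ^2-\rho \lambda M)\) to high accuracy.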

To obtain the existence of optimal policies, we also need the following assumption.

Assumption 3.2

  1. (i)

    For each \(x\in X\), the set A(x) is compact.

  2. (ii)

    For each \(x\in X\), \(c(x,a)\) is lower semi-continuous in \(a\in A(x)\) and \(\int _Xv(y)q(\text {d}y|x,a)\) is continuous in \(a\in A(x)\) for any bounded measurable function v on X.

  3. (iii)

    There exist constants \(\overline{\rho }>0\) and \(\overline{d}\ge 0\) such that \( \int _Xw^2(y)q(\text {d}y|x,a)\le \overline{\rho }w^2(x)+\overline{d}\) for all \((x,a)\in K\), where the function w on X comes from Assumption 3.1.

Remark 3.2

The continuity and compactness conditions in Assumptions 3.2(i) and (ii) are used to obtain the existence of optimal policies; see, for example, [9,10,11,12, 14,15,16, 19, 25,26,27]. Assumption 3.2(iii) has been widely used in continuous-time Markov decision processes; see, for example, [9,10,11,12, 14,15,16, 19, 25, 27]. Here, it ensures the integrability needed to establish the Feynman–Kac formula in Lemma 3.3.

Lemma 3.2

Under Assumptions 3.2(i) and (ii), the following statements are true.

  1. (a)

    For any nonnegative real-valued measurable function v on X, \(\int _Xv(y)q(\text {d}y|x,a)\) is lower semi-continuous in \(a\in A(x)\) for all \(x\in X\).

  2. (b)

    Let \(\{v_n,n\ge 1\}\) be a sequence of nonnegative real-valued measurable functions on X with \(\lim _{n\rightarrow \infty }v_n=v\). Then, for any \(x\in X\) and any sequence \(\{a_n,n\ge 1\}\subseteq A(x)\) satisfying \(a_n\rightarrow a\) as \(n\rightarrow \infty \), we have

    $$\begin{aligned}\liminf _{n\rightarrow \infty }\int _Xv_n(y)q(\mathrm{{d}}y|x,a_n)\ge \int _Xv(y)q(\mathrm{{d}}y|x,a). \end{aligned}$$

Proof

(a) Define \(v_m(x):=\min \{v(x),m\}\) for all \(x\in X\) and \(m\ge 1\). Fix any \(x\in X\). Let \(\{a_n,n\ge 1\}\subseteq A(x)\) be a sequence satisfying \(a_n\rightarrow a\) as \(n\rightarrow \infty \). Employing Assumption 3.2(ii), we can get

$$\begin{aligned} \lim _{n\rightarrow \infty }q(\{x\}|x,a_n)&=\lim _{n\rightarrow \infty }\int _XI_{\{x\}}(y)q(\text {d}y|x,a_n)\nonumber \\&=\int _XI_{\{x\}}(y)q(\text {d}y|x,a)=q(\{x\}|x,a). \end{aligned}$$
(3.1)

For each \(m\ge 1\), we have

$$\begin{aligned} \liminf _{n\rightarrow \infty }\int _{X\setminus \{x\}}v(y)q(\text {d}y|x,a_n)&\ge \liminf _{n\rightarrow \infty }\int _{X\setminus \{x\}}v_m(y)q(\text {d}y|x,a_n)\nonumber \\&=\liminf _{n\rightarrow \infty }\left[ \int _{X}v_m(y)q(\text {d}y|x,a_n)-v_m(x)q(\{x\}|x,a_n)\right] \nonumber \\&=\int _{X\setminus \{x\}}v_m(y)q(\text {d}y|x,a), \end{aligned}$$
(3.2)

where the last equality follows from Assumption 3.2(ii) and (3.1). Thus, letting \(m\rightarrow \infty \) in (3.2) and using the monotone convergence theorem, we derive \(\liminf _{n\rightarrow \infty }\int _{X{\setminus } \{x\}}v(y)q(\text {d}y|x,a_n)\ge \int _{X{\setminus } \{x\}}v(y)q(\text {d}y|x,a)\). Observe that \(\lim _{n\rightarrow \infty }v(x)q(\{x\}|x,a_n)=v(x)q(\{x\}|x,a)\). Hence, we can obtain \( \liminf _{n\rightarrow \infty }\int _{X}v(y)q(\text {d}y|x,a_n)\ge \int _{X}v(y)q(\text {d}y|x,a)\). Therefore, \(\int _Xv(y)q(\text {d}y|x,a)\) is lower semi-continuous in \(a\in A(x)\).

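(b) For each \(m\ge 1\), define \(\widetilde{v}_m:=\inf _{n\ge m}v_n\), so that \(v_n\ge \widetilde{v}_m\) for all \(n\ge m\) and \(\widetilde{v}_m\uparrow v\) as \(m\rightarrow \infty \). Since \(q(\cdot |x,a_n)\) is nonnegative on \(X\setminus \{x\}\), we have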

$$\begin{aligned}&\liminf _{n\rightarrow \infty }\int _{X\setminus \{x\}}v_n(y)q(\text {d}y|x,a_n)\ge \liminf _{n\rightarrow \infty }\int _{X\setminus \{x\}}\widetilde{v}_m(y)q(\text {d}y|x,a_n)\nonumber \\&\quad =\liminf _{n\rightarrow \infty }\left[ \int _{X}\widetilde{v}_m(y)q(\text {d}y|x,a_n)-\widetilde{v}_m(x)q(\{x\}|x,a_n)\right] \ge \int _{X\setminus \{x\}}\widetilde{v}_m(y)q(\text {d}y|x,a), \end{aligned}$$
(3.3)

where the last inequality follows from part (a) and (3.1). Letting \(m\rightarrow \infty \) in (3.3), using the monotone convergence theorem, and noting that \(\lim _{n\rightarrow \infty }v_n(x)q(\{x\}|x,a_n)=v(x)q(\{x\}|x,a)\) by (3.1), we obtain the statement. \(\square \)

Let \(\varLambda :=\min \left\{ \frac{\alpha ^2}{\rho M},\frac{\alpha }{2\,M}\right\} \) and the function w on X be as in Assumption 3.1. The notation \(U_{w}((0,\varLambda )\times X)\) denotes the set of all real-valued measurable functions v on \((0,\varLambda )\times X\) which satisfy the following properties:

  1. (i)

    \(\sup _{(\lambda ,x)\in (0,\varLambda )\times X}\frac{|v(\lambda ,x)|}{w(x)}<\infty \);

  2. (ii)

    for any \(x\in X\) and \([\underline{\gamma },\overline{\gamma }]\subset (0,\varLambda )\), \(v(\cdot ,x)\) is absolutely continuous on \([\underline{\gamma },\overline{\gamma }]\) (this implies that the partial derivative of \(v(\cdot ,x)\) with respect to \(\lambda \) exists for almost every (a.e.) \(\lambda \in (0,\varLambda )\); see Remark 3.3 for details), and \(\lambda \left| \frac{\partial v}{\partial \lambda }(\lambda ,x)\right| \le \overline{R}_{v}w^2(x)\) for a.e. \(\lambda \in (0,\varLambda )\), for some positive constant \(\overline{R}_{v}\) independent of \(\lambda \) and x.

Remark 3.3

Let \(v\in U_{w}((0,\varLambda )\times X)\) and fix any \(x\in X\). Note that \((0,\varLambda )=\bigcup _{n=1}^{\infty }\left[ \frac{1}{n},\varLambda -\frac{1}{n}\right] \). For each \(n\ge 1\), since \(v(\cdot ,x)\) is absolutely continuous on \(\left[ \frac{1}{n},\varLambda -\frac{1}{n}\right] \), there exists \(O_{x,n}\subset (0,\varLambda )\) with Lebesgue measure zero such that the partial derivative of \(v(\cdot ,x)\) with respect to the variable \(\lambda \) exists for all \(\lambda \in \left[ \frac{1}{n},\varLambda -\frac{1}{n}\right] {\setminus } O_{x,n}\). Set \(O_x:=\bigcup _{n=1}^{\infty }O_{x,n}\). Then, the Lebesgue measure of \(O_x\) is zero. Moreover, the partial derivative of \(v(\cdot ,x)\) with respect to the variable \(\lambda \) exists for all \(\lambda \in (0,\varLambda ){\setminus } O_x\). Hence, the partial derivative of \(v(\cdot ,x)\) with respect to the variable \(\lambda \) exists a.e. \(\lambda \in (0,\varLambda )\).

Inspired by Theorem 3.1 in [10] and Lemma 3.2 in [16], we have the following Feynman–Kac formula which plays a key role in proving the existence of optimal policies.

Lemma 3.3

Suppose that Assumptions 3.1(i), (ii) and 3.2(iii) hold. Then, for any bounded measurable function r on K, \(v\in U_{w}((0,\varLambda )\times X)\), \(T\ge 0\), \(\lambda \in (0,\varLambda )\), \(\pi \in \Pi \) and \(\{\mathcal {F}_t\}\)-stopping time \(\eta \), we have

$$\begin{aligned}&E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{T\wedge \eta }\int _A\text {e}^{-\alpha t}r(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}v(\lambda \text {e}^{-\alpha (T\wedge \eta )}, \xi _{T\wedge \eta })\right] -v(\lambda ,x)\\&\quad =E_x^{\pi }\bigg [\int _0^{T\wedge \eta }\text {e}^{\lambda \int _0^s\int _A \text {e}^{-\alpha u }r(\xi _u,a)\pi (\text {d}a|\omega ,u)\text {d}u}\\&\quad \bigg (\lambda \text {e}^{-\alpha s}\int _Ar(\xi _s,a)\pi (\text {d}a|\omega ,s)v(\lambda \text {e}^{-\alpha s},\xi _s)-\alpha \lambda \text {e}^{-\alpha s}\frac{\partial v}{\partial \lambda }(\lambda \text {e}^{-\alpha s},\xi _s)\\&\quad \quad +\int _X\int _Av(\lambda \text {e}^{-\alpha s},y)q(\text {d}y|\xi _s,a)\pi (\text {d}a|\omega ,s) \bigg )\text {d}s\bigg ] \end{aligned}$$

for all \(x\in X\), where \(T\wedge \eta :=\min \{T,\eta \}\).

Proof

Fix any \(x\in X\), \(\pi \in \Pi \), \(T\ge 0\), \(\lambda \in (0,\varLambda )\) and \(v\in U_{w}((0,\varLambda )\times X)\). Let \(\Vert r\Vert :=\sup _{(x,a)\in K}|r(x,a)|\), \(\Vert v\Vert _{w}:=\sup _{(\lambda ,x)\in (0,\varLambda )\times X}\frac{|v(\lambda ,x)|}{w(x)}\) and \(G(\omega ,t,x):=\text {e}^{\lambda \int _0^t\int _A\text {e}^{-\alpha s}r(\xi _s,a)\pi (\text {d}a|\omega ,s)\text {d}s}v(\lambda \text {e}^{-\alpha t},x)\) for all \(t\in [0,T]\). By the Dynkin formula, we have

$$\begin{aligned} E_x^{\pi }[w(\xi _{T\wedge \eta })]=w(x)+E_x^{\pi }\left[ \int _0^{T\wedge \eta }\int _X\int _Aw(y)q(\text {d}y|\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t\right] . \end{aligned}$$
(3.4)

Employing Assumptions 3.1(i), (ii), 3.2(iii), (3.4) and Lemma 3.1(a), we can derive

$$\begin{aligned}&E_x^{\pi }\left[ |G(\omega ,T\wedge \eta ,\xi _{T\wedge \eta })|\right] \le \Vert v\Vert _w\text {e}^{\frac{\lambda \Vert r\Vert }{\alpha }} E_x^{\pi }[w(\xi _{T\wedge \eta })]\\&\quad \le \,\Vert v\Vert _w\text {e}^{\frac{\lambda \Vert r\Vert }{\alpha }}w(x) +\Vert v\Vert _w\text {e}^{\frac{\lambda \Vert r\Vert }{\alpha }}E_x^{\pi }\left[ \int _0^{T}\int _X\int _Aw(y)|q(\text {d}y|\xi _t,a)|\pi (\text {d}a|\omega ,t)\text {d}t\right] \\&\quad \le \, \Vert v\Vert _w\text {e}^{\frac{\lambda \Vert r\Vert }{\alpha }}w(x)+\Vert v\Vert _w\text {e}^{\frac{\lambda \Vert r\Vert }{\alpha }} \int _0^TE_x^{\pi }\left[ \int _X\int _Aw(y)q(\text {d}y|\xi _t,a)\pi (\text {d}a|\omega ,t)+2q^*(\xi _t)w(\xi _t)\right] \text {d}t\\&\quad \le \, \Vert v\Vert _w\text {e}^{\frac{\lambda \Vert r\Vert }{\alpha }}w(x)+\Vert v\Vert _w\text {e}^{\frac{\lambda \Vert r\Vert }{\alpha }} \int _0^TE_x^{\pi }\left[ \rho w(\xi _t)+d+2Lw^2(\xi _t)\right] \text {d}t\\&\quad \le \,\Vert v\Vert _w\text {e}^{\frac{\lambda \Vert r\Vert }{\alpha }}w(x)+\Vert v\Vert _w\text {e}^{\frac{\lambda \Vert r\Vert }{\alpha }}T \left[ \rho \text {e}^{\rho T}\left( 1+\frac{d}{\rho }\right) w(x)+d+2L\left( \text {e}^{\overline{\rho }T}w^2(x) +\frac{\overline{d}}{\overline{\rho }}\text {e}^{\overline{\rho }T}\right) \right] , \\&E_x^{\pi }\left[ \int _0^{T\wedge \eta }\text {e}^{\lambda \int _0^s\int _A \text {e}^{-\alpha u }r(\xi _u,a)\pi (\text {d}a|\omega ,u)\text {d}u} \text {e}^{-\alpha s}\int _A|r(\xi _s,a)|\pi (\text {d}a|\omega ,s)|v(\lambda \text {e}^{-\alpha s},\xi _s)|\text {d}s\right] \\&\quad \le \,\Vert v\Vert _w\text {e}^{\frac{\lambda \Vert r\Vert }{\alpha }}\Vert r\Vert \int _0^TE_x^{\pi }[w(\xi _s)]\text {d}s\le \Vert v\Vert _w\text {e}^{\frac{\lambda \Vert r\Vert }{\alpha }}\Vert r\Vert T\text {e}^{\rho T}\left( 1+\frac{d}{\rho }\right) w(x), \\&E_x^{\pi }\left[ \int _0^{T\wedge \eta }\text {e}^{\lambda \int _0^s\int _A \text {e}^{-\alpha u }r(\xi _u,a)\pi (\text {d}a|\omega ,u)\text {d}u}\text {e}^{-\alpha s}\lambda \left| \frac{\partial v}{\partial \lambda }(\lambda \text {e}^{-\alpha s},\xi _s)\right| \text {d}s\right] \\&\quad \le \, \overline{R}_{v}\text {e}^{\frac{\lambda \Vert r\Vert }{\alpha }}\int _0^TE_x^{\pi }[w^2(\xi _s)]\text {d}s\le \overline{R}_{v}\text {e}^{\frac{\lambda \Vert r\Vert }{\alpha }}T\left[ \text {e}^{\overline{\rho }T}w^2(x)+\frac{\overline{d}}{\overline{\rho }}\text {e}^{\overline{\rho }T}\right] , \end{aligned}$$

and

$$\begin{aligned}&E_x^{\pi }\left[ \int _0^{T\wedge \eta }\text {e}^{\lambda \int _0^s\int _A \text {e}^{-\alpha u }r(\xi _u,a)\pi (\text {d}a|\omega ,u)\text {d}u}\int _X\int _A|v(\lambda \text {e}^{-\alpha s},y)||q(\text {d}y|\xi _s,a)|\pi (\text {d}a|\omega ,s)\text {d}s\right] \\&\quad \le \, \Vert v\Vert _w\text {e}^{\frac{\lambda \Vert r\Vert }{\alpha }} \int _0^TE_x^{\pi }\left[ \int _X\int _Aw(y)q(\text {d}y|\xi _s,a)\pi (\text {d}a|\omega ,s)+2q^*(\xi _s)w(\xi _s)\right] \text {d}s\\&\quad \le \, \Vert v\Vert _w\text {e}^{\frac{\lambda \Vert r\Vert }{\alpha }}T\left[ \rho \text {e}^{\rho T}\left( 1+\frac{d}{\rho }\right) w(x)+d+2L\left( \text {e}^{\overline{\rho }T}w^2(x) +\frac{\overline{d}}{\overline{\rho }}\text {e}^{\overline{\rho }T}\right) \right] . \end{aligned}$$

Since \([\lambda \text {e}^{-\alpha T},\lambda ]\) is a compact subinterval of \((0,\varLambda )\), \(v(\cdot ,y)\) is absolutely continuous on it for every \(y\in X\); hence, we obtain

$$\begin{aligned} G(\omega ,T\wedge \eta ,\xi _{T\wedge \eta })=v(\lambda ,x)+\int _0^{T\wedge \eta }G'(\omega ,t,\xi _t)\text {d}t+\sum _{k\ge 1}\int _{(0,T\wedge \eta ]}\Delta G(\omega ,t,\xi _t)\delta _{T_k}(\text {d}t), \end{aligned}$$
(3.5)

where \(\Delta G(\omega ,t,\xi _t):= G(\omega ,t,\xi _t)-G(\omega ,t-,\xi _{t-})\), \(G'\) denotes the derivative of G with respect to t, and \(\delta _s(\cdot )\) is the Dirac measure concentrated at s. Hence, using (3.5), the integrability estimates above and arguments similar to those in the proof of Theorem 3.1 in [10] or Lemma 3.2 in [16], we obtain the assertion. \(\square \)

4 The Main Results

In this section, we show the existence of a unique solution to the risk-sensitive first passage discounted cost optimality equation and the existence of optimal policies. To this end, we introduce the following new value iteration.

Let \(m(x):=Lw(x)\) for all \(x\in X\), where the constant L and the function w on X are as in Assumption 3.1. Set

$$\begin{aligned} c_n(x,a):=\min \{c(x,a),n\} \ \textrm{and} \ Q(\text {d}y|x,a):=\frac{q(\text {d}y|x,a)}{m(x)}+\delta _x(\text {d}y) \end{aligned}$$

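for all \((x,a)\in K\) and \(n\ge 1\). Since \(q^*(x)\le m(x)\) by Assumption 3.1(ii), \(Q(\cdot |x,a)\) is a probability measure on X for each \((x,a)\in K\). For each \(n\ge 1\), define \(v_n^{(0)}(\lambda ,x):=1\) for all \((\lambda ,x)\in (0,\varLambda )\times X\) and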

$$\begin{aligned} \left\{ \begin{array}{ll} v_n^{(k+1)}(\lambda ,x):=&{}\,(\alpha \lambda )^{-\frac{m(x)}{\alpha }}\int _0^{\lambda }(\alpha t)^{\frac{m(x)}{\alpha }}\inf _{a\in A(x)}\bigg \{\frac{1}{\alpha }c_n(x,a)v_n^{(k)}(t,x)\\ &{}+\frac{m(x)}{\alpha t}\int _{X}v_n^{(k)}(t,y)Q(\text {d}y|x,a)\bigg \}\text {d}t \ \mathrm{for \ all} \ (\lambda ,x)\in (0,\varLambda )\times B^c\\ v_n^{(k+1)}(\lambda ,x):=&{}\,1 \ \mathrm{for \ all} \ (\lambda ,x)\in (0,\varLambda )\times B. \end{array}\right. \end{aligned}$$
(4.1)

for all \(k\ge 0\), where \(B^c\) stands for the complement of B with respect to X.
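The value iteration (4.1) can be tried out numerically on a small finite model. The sketch below is only an illustration under several assumptions: finitely many states and actions, all actions admissible, \(w\equiv 1\) (so \(m(x)\equiv L\) with \(L\ge q^*(x)\) and \(L\ge 1\)), and a grid \(t_j=j\lambda _{\max }/J\) on which the \(\lambda \)-integral is approximated by a left-endpoint product rule that integrates the weights \(t^{p}\) and \(t^{p-1}\), \(p=m/\alpha \), exactly on each cell. The name `value_iteration_41` and its parameters are hypothetical. Writing \((\alpha \lambda )^{-\frac{m}{\alpha }}(\alpha t)^{\frac{m}{\alpha }}=(t/\lambda )^{p}\) and \(\frac{m}{\alpha t}t^{p}=p\,t^{p-1}\) gives the two weight integrals used in the code.

```python
import numpy as np


def value_iteration_41(q, c, in_B, alpha, lam_max, n_trunc, n_grid=200, n_iter=None):
    """Discretized sketch of the value iteration (4.1) on a hypothetical finite model.

    q[x, a, y] = q({y} | x, a) (conservative), c[x, a] >= 0, every action admissible,
    in_B = boolean mask of B. Assumption: w = 1, so m(x) = L is a constant >= q*(x).
    The lambda-integral uses a left-endpoint product rule on t_j = j * lam_max / n_grid.
    Returns the grid t and the list of iterates, iterates[k][j, x] ~ v_n^(k)(t_j, x).
    """
    q = np.asarray(q, dtype=float)
    in_B = np.asarray(in_B, dtype=bool)
    n_x = q.shape[0]
    c_n = np.minimum(np.asarray(c, dtype=float), n_trunc)            # c_n = min{c, n}
    q_star = np.array([np.abs(q[x, :, x]).max() for x in range(n_x)])
    m = max(q_star.max(), 1.0)                                       # m(x) = L * w(x), w = 1
    Q = q / m + np.eye(n_x)[:, None, :]                              # Q = q/m + delta_x
    p = m / alpha
    t = np.linspace(0.0, lam_max, n_grid + 1)                        # t_0 = 0 < ... < t_J
    W1 = np.diff(t ** (p + 1.0)) / (p + 1.0)                         # cell integrals of t^p
    W0 = np.diff(t ** p) / p                                         # cell integrals of t^(p-1)
    v = np.ones((n_grid + 1, n_x))                                   # v^(0) = 1
    iterates = [v]
    for _ in range(n_grid + 1 if n_iter is None else n_iter):
        left = v[:-1]                                                # v^(k)(t_i, .) on cell i
        cost_term = (c_n[None, :, :] / alpha) * left[:, :, None] * W1[:, None, None]
        jump_term = p * np.einsum("xay,iy->ixa", Q, left) * W0[:, None, None]
        cell = (cost_term + jump_term).min(axis=2)                   # inf over actions
        new = np.ones_like(v)
        new[1:] = np.cumsum(cell, axis=0) / t[1:, None] ** p         # (t_j)^(-p) * sum of cells
        new[:, in_B] = 1.0                                           # v^(k+1) = 1 on B
        v = new
        iterates.append(v)
    return t, iterates
```

With the left-endpoint rule, the value at \(t_j\) uses only earlier grid points, so the iterates stop changing after `n_grid + 1` sweeps; this is a property of the discretization, not of (4.1). The iterates are nondecreasing in k and stay between 1 and \(\text {e}^{\frac{t\Vert c_n\Vert }{\alpha }}\), in line with the monotonicity and bounds discussed for (4.1).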

We have the following lemma which plays a key role in obtaining the existence of a unique solution to the risk-sensitive first passage discounted cost optimality equation.

Lemma 4.1

Suppose that Assumptions 3.1 and 3.2 are satisfied. Then, the following statements hold for all \(n\ge 1\).

  1. (a)

    There exists a bounded measurable function \(v^*_n\) on \((0,\varLambda )\times X\) satisfying the following equation:

    $$\begin{aligned} \left\{ \begin{array}{ll} v^*_{n}(\lambda ,x)=&{}\,(\alpha \lambda )^{-\frac{m(x)}{\alpha }}\int _0^{\lambda }(\alpha t)^{\frac{m(x)}{\alpha }}\inf _{a\in A(x)}\bigg \{\frac{1}{\alpha }c_n(x,a)v^*_n(t,x)\\ &{}+\frac{m(x)}{\alpha t}\int _{X}v^*_n(t,y)Q(\text {d}y|x,a)\bigg \}\text {d}t \ for \ all \ (\lambda ,x)\in (0,\varLambda )\times B^c\\ v^*_{n}(\lambda ,x)=&{}\,1 \ for \ all \ (\lambda ,x)\in (0,\varLambda )\times B. \end{array}\right. \end{aligned}$$

    Moreover, \(1\le v^*_n(\lambda ,x)\le \text {e}^{\frac{\lambda \Vert c_n\Vert }{\alpha }}\le \text {e}^{\frac{\varLambda \Vert c_n\Vert }{\alpha }}\) for all \((\lambda ,x)\in (0,\varLambda )\times X\), where \(\Vert c_n\Vert :=\sup _{(x,a)\in K}c_n(x,a)\le n\).

  2. (b)

    \(v^*_n(\lambda ,x)=\inf _{\pi \in \Pi }E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{\tau _B}\int _A\text {e}^{-\alpha t}c_n(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}\right] \) for all \((\lambda ,x)\in (0,\varLambda )\times X\).
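The bounds in part (a) can be traced to an elementary product-rule identity, recorded here as a worked step. For \(x\in B^c\) and any constant \(C\ge 0\), since \(m(x)>0\), we have \(\frac{\text {d}}{\text {d}t}\left[ (\alpha t)^{\frac{m(x)}{\alpha }}\text {e}^{\frac{tC}{\alpha }}\right] =(\alpha t)^{\frac{m(x)}{\alpha }}\left[ \frac{C}{\alpha }+\frac{m(x)}{\alpha t}\right] \text {e}^{\frac{tC}{\alpha }}\), and hence

$$\begin{aligned} (\alpha \lambda )^{-\frac{m(x)}{\alpha }}\int _0^{\lambda }(\alpha t)^{\frac{m(x)}{\alpha }}\left[ \frac{C}{\alpha }+\frac{m(x)}{\alpha t}\right] \text {e}^{\frac{tC}{\alpha }}\text {d}t=(\alpha \lambda )^{-\frac{m(x)}{\alpha }}\int _0^{\lambda }\frac{\text {d}}{\text {d}t}\left[ (\alpha t)^{\frac{m(x)}{\alpha }}\text {e}^{\frac{tC}{\alpha }}\right] \text {d}t=\text {e}^{\frac{\lambda C}{\alpha }}. \end{aligned}$$

Consequently, if \(1\le v_n^{(k)}(t,y)\le \text {e}^{\frac{t\Vert c_n\Vert }{\alpha }}\) for all \((t,y)\in (0,\varLambda )\times X\), then bounding \(c_n(x,a)\le \Vert c_n\Vert \) in (4.1) and using that \(Q(\cdot |x,a)\) is a probability measure (as \(q^*(x)\le m(x)\) by Assumption 3.1(ii)) gives \(v_n^{(k+1)}(\lambda ,x)\le \text {e}^{\frac{\lambda \Vert c_n\Vert }{\alpha }}\) by the identity with \(C=\Vert c_n\Vert \); the lower bound \(v_n^{(k+1)}(\lambda ,x)\ge 1\) follows in the same way from \(c_n\ge 0\) and the case \(C=0\).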

Proof

(a) Fix any \(n\ge 1\). By (4.1), Assumption 3.2, Lemma 3.2, Lemma 8.3.8 in [18] and an induction argument, \(v_n^{(k)}\) is measurable on \((0,\varLambda )\times X\) for all \(k\ge 0\). We next show that

$$\begin{aligned} v_n^{(k)}\le v_n^{(k+1)} \ \mathrm{for \ all} \ k\ge 0. \end{aligned}$$
(4.2)

Indeed, since \(c_n\ge 0\), \(v_n^{(0)}\equiv 1\) and \(Q(X|x,a)=1\), (4.1) yields

$$\begin{aligned} v_n^{(1)}(\lambda ,x)\ge (\alpha \lambda )^{-\frac{m(x)}{\alpha }}\int _0^{\lambda }(\alpha t)^{\frac{m(x)}{\alpha }}\frac{m(x)}{\alpha t}\text {d}t=v_n^{(0)}(\lambda ,x) \end{aligned}$$

for all \((\lambda ,x)\in (0,\varLambda )\times B^c\). Thus, (4.2) is true for \(k=0\). Suppose that (4.2) holds for some \(k_0\ge 0\). Then, it follows from (4.1) and the induction hypothesis that

$$\begin{aligned} v_n^{(k_0+2)}(\lambda ,x)&\ge (\alpha \lambda )^{-\frac{m(x)}{\alpha }}\int _0^{\lambda }(\alpha t)^{\frac{m(x)}{\alpha }}\inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c_n(x,a)v_n^{(k_0)}(t,x)\right. \\&\quad \left. +\frac{m(x)}{\alpha t}\int _{X}v_n^{(k_0)}(t,y)Q(\text {d}y|x,a)\right\} \text {d}t\\&=v_n^{(k_0+1)}(\lambda ,x) \end{aligned}$$

for all \((\lambda ,x)\in (0,\varLambda )\times B^c\). Hence, (4.2) holds for \(k=k_0+1\), and (4.2) follows by induction. Since \(v_n^{(k)}\) is nondecreasing in \(k\), we may define \(v^*_n(\lambda ,x):=\lim _{k\rightarrow \infty }v_n^{(k)}(\lambda ,x)\) for all \((\lambda ,x)\in (0,\varLambda )\times X\). Then, we have \(v^*_n(\lambda ,x)=1\) for all \((\lambda ,x)\in (0,\varLambda )\times B\). For each \(k\ge 0\) and \((t,x)\in (0,\varLambda )\times B^c\), by Assumption 3.2, Lemma 3.2 and Theorem 2.43 in [1], there exists \(a^{(k)}_{t,x}\in A(x)\) such that

$$\begin{aligned}&\inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c_n(x,a)v_n^{(k)}(t,x)+\frac{m(x)}{\alpha t}\int _{X}v_n^{(k)}(t,y)Q(\text {d}y|x,a)\right\} \nonumber \\&\quad =\frac{1}{\alpha }c_n(x,a^{(k)}_{t,x})v_n^{(k)}(t,x)+\frac{m(x)}{\alpha t}\int _{X}v_n^{(k)}(t,y)Q(\text {d}y|x,a^{(k)}_{t,x}). \end{aligned}$$
(4.3)

Because \(A(x)\) is compact, there exists a subsequence of \(\{a^{(k)}_{t,x},k\ge 0\}\) (still denoted by \(\{a^{(k)}_{t,x}\}\)) such that \(a^{(k)}_{t,x}\) converges to some \(\widetilde{a}_{t,x}\in A(x)\). Moreover, by (4.1) and induction, we see that

$$\begin{aligned} 1\le v_n^{(k)}(t,x)\le \text {e}^{\frac{t \Vert c_n\Vert }{\alpha }} \end{aligned}$$
(4.4)

for all \((t,x)\in (0,\varLambda )\times B^c\) and \(k\ge 0\). Then, it follows from (4.3), (4.4), Assumption 3.2 and Lemma 3.2 that

$$\begin{aligned}&\lim _{k\rightarrow \infty }\inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c_n(x,a)v_n^{(k)}(t,x)+\frac{m(x)}{\alpha t}\int _{X}v_n^{(k)}(t,y)Q(\text {d}y|x,a)\right\} \nonumber \\&\quad \ge \frac{1}{\alpha }c_n(x,\widetilde{a}_{t,x})v_n^{*}(t,x)+\frac{m(x)}{\alpha t}\int _{X}v_n^{*}(t,y)Q(\text {d}y|x,\widetilde{a}_{t,x})\nonumber \\&\quad \ge \inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c_n(x,a)v_n^{*}(t,x)+\frac{m(x)}{\alpha t}\int _{X}v_n^{*}(t,y)Q(\text {d}y|x,a)\right\} \end{aligned}$$
(4.5)

for all \((t,x)\in (0,\varLambda )\times B^c\). Moreover, we have

$$\begin{aligned}&\inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c_n(x,a)v_n^{(k)}(t,x)+\frac{m(x)}{\alpha t}\int _{X}v_n^{(k)}(t,y)Q(\text {d}y|x,a)\right\} \\&\quad \le \frac{1}{\alpha }c_n(x,a)v_n^{(k)}(t,x)+\frac{m(x)}{\alpha t}\int _{X}v_n^{(k)}(t,y)Q(\text {d}y|x,a) \end{aligned}$$

for all \(k\ge 0\), which yields

$$\begin{aligned}&\lim _{k\rightarrow \infty }\inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c_n(x,a)v_n^{(k)}(t,x)+\frac{m(x)}{\alpha t}\int _{X}v_n^{(k)}(t,y)Q(\text {d}y|x,a)\right\} \nonumber \\&\quad \le \,\frac{1}{\alpha }c_n(x,a)v_n^{*}(t,x)+\frac{m(x)}{\alpha t}\int _{X}v_n^{*}(t,y)Q(\text {d}y|x,a) \end{aligned}$$

for all \((t,x)\in (0,\varLambda )\times B^c\) and \(a\in A(x)\). Thus, from the last inequality and (4.5), we get

$$\begin{aligned}&\lim _{k\rightarrow \infty }\inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c_n(x,a)v_n^{(k)}(t,x)+\frac{m(x)}{\alpha t}\int _{X}v_n^{(k)}(t,y)Q(\text {d}y|x,a)\right\} \nonumber \\&\quad =\inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c_n(x,a)v_n^{*}(t,x)+\frac{m(x)}{\alpha t}\int _{X}v_n^{*}(t,y)Q(\text {d}y|x,a)\right\} \end{aligned}$$
(4.6)

for all \((t,x)\in (0,\varLambda )\times B^c\). Hence, letting \(k\rightarrow \infty \) in (4.1) and using (4.4), (4.6) and the monotone convergence theorem, we derive

$$\begin{aligned} v^*_n(\lambda ,x)&=(\alpha \lambda )^{-\frac{m(x)}{\alpha }}\int _0^{\lambda }(\alpha t)^{\frac{m(x)}{\alpha }}\inf _{a\in A(x)}\\&\quad \left\{ \frac{1}{\alpha }c_n(x,a)v_n^{*}(t,x)+\frac{m(x)}{\alpha t}\int _{X}v_n^{*}(t,y)Q(\text {d}y|x,a)\right\} \text {d}t \end{aligned}$$

for all \((\lambda ,x)\in (0,\varLambda )\times B^c\). Furthermore, by (4.4) we can obtain \(1\le v^*_n(\lambda ,x)\le \text {e}^{\frac{\lambda \Vert c_n\Vert }{\alpha }}\le \text {e}^{\frac{\varLambda \Vert c_n\Vert }{\alpha }}\) for all \((\lambda ,x)\in (0,\varLambda )\times B^c\).
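The monotone scheme of part (a) can be run numerically on a finite model. The sketch below assumes that (4.1) is exactly the recursion used in the induction step above, started from \(v_n^{(0)}\equiv 1\), with \(Q(\text {d}y|x,a)=\frac{1}{m(x)}q(\text {d}y|x,a)+\delta _x(\text {d}y)\) as implicit in (4.8). The substitution \(t=\lambda s\) with \(s=u^{\alpha /m(x)}\) turns the right-hand side into \(\int _0^1\min _{a\in A(x)}\big \{\frac{\lambda c_n(x,a)}{m(x)}s\,v_n^{(k)}(\lambda s,x)+\sum _{y}v_n^{(k)}(\lambda s,y)Q(y|x,a)\big \}\text {d}u\), whose integrand stays bounded near \(u=0\). Values on a \(\lambda \)-grid are interpolated linearly, and the \(u\)-integral uses Gauss-Legendre quadrature, so the output only approximates \(v^*_n\). The function name, the array layout, and the use of the same action set in every state are hypothetical simplifications.

```python
import numpy as np


def monotone_iteration(q, c, m, in_B, alpha, lam_grid, n_iter=500, n_quad=40, tol=1e-12):
    """Approximate v_n^* on a finite toy model by iterating from v^(0) = 1.

    q        : (A, S, S) generators, one per action (rows sum to zero)
    c        : (S, A) bounded nonnegative costs c_n(x, a)
    m        : (S,) positive rates with m[x] >= -q[a][x, x] for all a
    in_B     : (S,) boolean mask of the target set B
    lam_grid : increasing grid with lam_grid[0] = 0 (where v = 1)
    """
    S, A = c.shape
    # Q(y|x,a) = q(y|x,a) / m(x) + delta_x(y) is a stochastic kernel
    Q = np.stack([q[a] / m[:, None] + np.eye(S) for a in range(A)])
    nodes, weights = np.polynomial.legendre.leggauss(n_quad)
    u, w = 0.5 * (nodes + 1.0), 0.5 * weights  # quadrature rule on [0, 1]
    v = np.ones((len(lam_grid), S))            # v^(0) = 1
    for _ in range(n_iter):
        v_new = np.ones_like(v)                # v = 1 on B stays fixed
        for x in np.flatnonzero(~in_B):
            s = u ** (alpha / m[x])            # t = lam * s after the substitution
            for j, lam in enumerate(lam_grid):
                # v^(k)(lam * s, y): quadrature nodes in rows, states y in columns
                vs = np.column_stack([np.interp(lam * s, lam_grid, v[:, y]) for y in range(S)])
                vals = (lam * c[x] / m[x]) * s[:, None] * vs[:, [x]] + vs @ Q[:, x, :].T
                v_new[j, x] = w @ vals.min(axis=1)  # minimise over actions, then integrate
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v
```

Take one state outside \(B\) that jumps into \(B\) at rate \(1\), with \(\alpha =1\) and unit cost. In that case the iterates increase to \((\text {e}^{\lambda }-1)/\lambda \), in line with the bound \(1\le v^*_n\le \text {e}^{\lambda \Vert c_n\Vert /\alpha }\).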

(b) The assertion is obviously true for all \((\lambda , x)\in (0,\varLambda )\times B\). Below, we show that the assertion is true for all \((\lambda , x)\in (0,\varLambda )\times B^c\). Fix any \(n\ge 1\). For each \(x\in B^c\), by part (a) we have

$$\begin{aligned} (\alpha \lambda )^{\frac{m(x)}{\alpha }}v^*_{n}(\lambda ,x)&=\int _0^{\lambda }(\alpha t)^{\frac{m(x)}{\alpha }}\inf _{a\in A(x)}\nonumber \\&\quad \left\{ \frac{1}{\alpha }c_n(x,a)v^*_n(t,x)+\frac{m(x)}{\alpha t}\int _{X}v^*_n(t,y)Q(\text {d}y|x,a)\right\} \text {d}t \end{aligned}$$
(4.7)

for all \(\lambda \in (0,\varLambda )\). For any \(x\in B^c\) and any interval \([\underline{\gamma },\overline{\gamma }]\subset (0,\varLambda )\), Theorem 3.11 and Exercise 22 in [23, pp. 130, 149], together with (4.7), show that \(v^*_{n}(\cdot ,x)\) is absolutely continuous on \([\underline{\gamma },\overline{\gamma }]\). Thus, for each \(x\in B^c\), the partial derivative of \(v^*_n(\cdot ,x)\) with respect to \(\lambda \) exists for a.e. \(\lambda \in (0,\varLambda )\). Differentiating both sides of (4.7) with respect to \(\lambda \), for each \(x\in B^c\) we obtain

$$\begin{aligned}&(\alpha \lambda )^{\frac{m(x)}{\alpha }}\frac{m(x)}{\alpha \lambda }v^*_{n}(\lambda ,x)+(\alpha \lambda )^{\frac{m(x)}{\alpha }}\frac{\partial v^*_n}{\partial \lambda }(\lambda ,x)\nonumber \\&\quad =(\alpha \lambda )^{\frac{m(x)}{\alpha }}\inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c_n(x,a)v^*_n(\lambda ,x)+\frac{m(x)}{\alpha \lambda }\int _{X}v^*_n(\lambda ,y)Q(\text {d}y|x,a)\right\} \nonumber \\&\quad =(\alpha \lambda )^{\frac{m(x)}{\alpha }}\inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c_n(x,a)v^*_n(\lambda ,x)+\frac{m(x)}{\alpha \lambda }\left( \frac{1}{m(x)}\int _Xv^*_n(\lambda ,y)q(\text {d}y|x,a)+v^*_n(\lambda ,x)\right) \right\} \end{aligned}$$
(4.8)

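for a.e. \(\lambda \in (0,\varLambda )\). Dividing (4.8) by \((\alpha \lambda )^{\frac{m(x)}{\alpha }}\) and cancelling the term \(\frac{m(x)}{\alpha \lambda }v^*_n(\lambda ,x)\), which does not depend on \(a\), we see that for each \(x\in B^c\) there exists a set \(O_x\subset (0,\varLambda )\) of Lebesgue measure zero such that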

$$\begin{aligned} \frac{\partial v^*_n}{\partial \lambda }(\lambda ,x)=\inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c_n(x,a)v^*_n(\lambda ,x)+\frac{1}{\alpha \lambda }\int _{X}v^*_n(\lambda ,y)q(\text {d}y|x,a)\right\} \end{aligned}$$
(4.9)

for all \(\lambda \in O_x^c\), where \(O^c_x\) denotes the complement of \(O_x\) with respect to \((0,\varLambda )\). Thus, it follows from (4.9), Assumption 3.1 and part (a) that

$$\begin{aligned} \lambda \left| \frac{\partial v^*_n}{\partial \lambda }(\lambda ,x)\right| \le \frac{\varLambda }{\alpha }\Vert c_n\Vert \text {e}^{\frac{\varLambda \Vert c_n\Vert }{\alpha }}+\frac{2}{\alpha } \text {e}^{\frac{\varLambda \Vert c_n\Vert }{\alpha }}q^*(x)\le \left( \frac{\varLambda }{\alpha }\Vert c_n\Vert \text {e}^{\frac{\varLambda \Vert c_n\Vert }{\alpha }}+\frac{2}{\alpha } \text {e}^{\frac{\varLambda \Vert c_n\Vert }{\alpha }}L\right) w(x) \end{aligned}$$

for all \(x\in X\) and \(\lambda \in O_x^c\). Hence, by Lemma 3.3, we obtain

$$\begin{aligned}&E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{T\wedge \tau _B}\int _A\text {e}^{-\alpha t}c_n(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}v^*_n(\lambda \text {e}^{-\alpha (T\wedge \tau _B)}, \xi _{T\wedge \tau _B})\right] -v^*_n(\lambda ,x)\nonumber \\&\quad =E_x^{\pi }\bigg [\int _0^{T\wedge \tau _B}\text {e}^{\lambda \int _0^s\int _A \text {e}^{-\alpha u }c_n(\xi _u,a)\pi (\text {d}a|\omega ,u)\text {d}u}\bigg (\lambda \text {e}^{-\alpha s}\int _Ac_n(\xi _s,a)\pi (\text {d}a|\omega ,s)v^*_n(\lambda \text {e}^{-\alpha s},\xi _s)\nonumber \\&\qquad -\alpha \lambda \text {e}^{-\alpha s}\frac{\partial v^*_n}{\partial \lambda }(\lambda \text {e}^{-\alpha s},\xi _s) +\int _X\int _Av^*_n(\lambda \text {e}^{-\alpha s},y)q(\text {d}y|\xi _s,a)\pi (\text {d}a|\omega ,s) \bigg )\text {d}s\bigg ] \end{aligned}$$
(4.10)

for all \(x\in X\), \(\lambda \in (0,\varLambda )\), \(T\ge 0\) and \(\pi \in \Pi \). By Assumption 3.2, Lemma 3.2 and Lemma 8.3.8 in [18], there exists a measurable mapping \(f^*_n:(0,\varLambda )\times X\rightarrow A\) with \(f^*_n(\lambda ,x)\in A(x)\) for all \((\lambda ,x)\in (0,\varLambda )\times X\) such that

$$\begin{aligned}&\inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c_n(x,a)v^*_n(\lambda ,x)+\frac{1}{\alpha \lambda }\int _{X}v^*_n(\lambda ,y)q(\text {d}y|x,a)\right\} \nonumber \\&\quad =\frac{1}{\alpha }c_n(x,f^*_n(\lambda ,x))v^*_n(\lambda ,x)+\frac{1}{\alpha \lambda }\int _{X}v^*_n(\lambda ,y)q(\text {d}y|x,f^*_n(\lambda ,x)) \end{aligned}$$
(4.11)

for all \((\lambda ,x)\in (0,\varLambda )\times X\). Let \(\pi ^*_n(\cdot |\omega ,t):=\delta _{f^*_n(\lambda \text {e}^{-\alpha t},\xi _{t-})}(\cdot )\) for all \(t\ge 0\). Using (4.9)–(4.11) we can obtain

$$\begin{aligned} v^*_n(\lambda ,x)= E_x^{\pi ^*_n}\left[ \text {e}^{\lambda \int _0^{T\wedge \tau _B}\text {e}^{-\alpha t}c_n(\xi _t,f^*_n(\lambda \text {e}^{-\alpha t},\xi _{t}))\text {d}t}v^*_n(\lambda \text {e}^{-\alpha (T\wedge \tau _B)}, \xi _{T\wedge \tau _B})\right] \end{aligned}$$
(4.12)

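for all \(x\in B^c\), \(\lambda \in (0,\varLambda )\) and \(T\ge 0\). By the bounds in part (a), \(v^*_n(\lambda ,x)\rightarrow 1\) as \(\lambda \downarrow 0\), uniformly in \(x\in X\); accordingly, we set \(v^*_n(0,x):=1\) for all \(x\in X\). Employing (4.12), we have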

$$\begin{aligned} v^*_n(\lambda ,x)=&\,\,E_x^{\pi ^*_n}\left[ \text {e}^{\lambda \int _0^{T\wedge \tau _B}\text {e}^{-\alpha t}c_n(\xi _t,f^*_n(\lambda \text {e}^{-\alpha t},\xi _{t}))\text {d}t}v^*_n(\lambda \text {e}^{-\alpha (T\wedge \tau _B)}, \xi _{T\wedge \tau _B})I_{\{\tau _B<\infty \}}\right] \nonumber \\&\quad +E_x^{\pi ^*_n}\left[ \text {e}^{\lambda \int _0^{T}\text {e}^{-\alpha t}c_n(\xi _t,f^*_n(\lambda \text {e}^{-\alpha t},\xi _{t}))\text {d}t}v^*_n(\lambda \text {e}^{-\alpha T}, \xi _{T})I_{\{\tau _B=\infty \}}\right] \end{aligned}$$
(4.13)

for all \(x\in B^c\), \(\lambda \in (0,\varLambda )\) and \(T\ge 0\). Therefore, letting \(T\rightarrow \infty \) in (4.13), by part (a) and the dominated convergence theorem, we can derive

$$\begin{aligned} v^*_n(\lambda ,x)&=E_x^{\pi ^*_n}\left[ \text {e}^{\lambda \int _0^{\tau _B}\text {e}^{-\alpha t}c_n(\xi _t,f^*_n(\lambda \text {e}^{-\alpha t},\xi _{t}))\text {d}t}I_{\{\tau _B<\infty \}}\right] \nonumber \\&\quad +E_x^{\pi ^*_n}\left[ \text {e}^{\lambda \int _0^{\infty }\text {e}^{-\alpha t}c_n(\xi _t,f^*_n(\lambda \text {e}^{-\alpha t},\xi _{t}))\text {d}t}I_{\{\tau _B=\infty \}}\right] \nonumber \\&=E_x^{\pi ^*_n}\left[ \text {e}^{\lambda \int _0^{\tau _B}\text {e}^{-\alpha t}c_n(\xi _t,f^*_n(\lambda \text {e}^{-\alpha t},\xi _{t}))\text {d}t}\right] \nonumber \\&\ge \inf _{\pi \in \Pi }E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{\tau _B}\int _A\text {e}^{-\alpha t}c_n(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}\right] \end{aligned}$$
(4.14)

for all \(x\in B^c\) and \(\lambda \in (0,\varLambda )\). On the other hand, since the infimum in (4.9) does not exceed the value at any action, it follows from (4.9) and (4.10) that

$$\begin{aligned} v^*_n(\lambda ,x)\le E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{T\wedge \tau _B}\int _A\text {e}^{-\alpha t}c_n(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}v^*_n(\lambda \text {e}^{-\alpha (T\wedge \tau _B)}, \xi _{T\wedge \tau _B})\right] \end{aligned}$$

for all \(x\in B^c\), \(\lambda \in (0,\varLambda )\), \(T\ge 0\) and \(\pi \in \Pi \). Thus, using the last inequality and arguments similar to those leading to (4.14), we obtain

$$\begin{aligned} v^*_n(\lambda ,x)\le E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{\tau _B}\int _A\text {e}^{-\alpha t}c_n(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}\right] \end{aligned}$$
(4.15)

for all \(x\in B^c\), \(\lambda \in (0,\varLambda )\) and \(\pi \in \Pi \). Hence, the assertion follows from (4.14) and (4.15). \(\square \)
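A hypothetical one-state example illustrates Lemma 4.1; the proof does not need it. Suppose that \(x\) is the only state outside \(B\), that \(A(x)\) is a singleton, and that \(c_n(x,\cdot )\equiv c>0\). Suppose also that \(q(B|x)=m(x)=\alpha \), so \(\tau _B\) is exponentially distributed with rate \(\alpha \) and \(\text {e}^{-\alpha \tau _B}\) is uniform on \((0,1)\). Then (4.9) reduces to a linear ordinary differential equation, and part (b) gives its solution explicitly:

$$\begin{aligned} \frac{\partial v^*_n}{\partial \lambda }(\lambda ,x)&=\frac{c}{\alpha }v^*_n(\lambda ,x)+\frac{1}{\lambda }\big (1-v^*_n(\lambda ,x)\big ),\\ v^*_n(\lambda ,x)&=E_x\left[ \text {e}^{\frac{\lambda c}{\alpha }(1-\text {e}^{-\alpha \tau _B})}\right] =\int _0^1\text {e}^{\frac{\lambda c}{\alpha }(1-u)}\text {d}u=\frac{\alpha }{\lambda c}\left( \text {e}^{\frac{\lambda c}{\alpha }}-1\right) . \end{aligned}$$

A direct computation confirms that this function satisfies the differential equation and tends to \(1\) as \(\lambda \downarrow 0\). It also obeys \(1\le v^*_n(\lambda ,x)\le \text {e}^{\frac{\lambda c}{\alpha }}\), as in part (a), because \(y\le \text {e}^{y}-1\le y\text {e}^{y}\) for \(y>0\).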

Denote by \(V_{w}((0,\varLambda )\times X)\) the set of all nonnegative real-valued measurable functions \(v\in U_w((0,\varLambda )\times X)\) such that \(v(\lambda ,x)\le R_{\lambda }w^{\frac{\lambda M}{\alpha }}(x)\) for all \((\lambda ,x)\in (0,\varLambda )\times X\), where the constant \(R_{\lambda }\) is as in Lemma 3.1. Using Lemma 4.1, we establish in the following theorem the existence of a unique solution in \(V_{w}((0,\varLambda )\times X)\) to the risk-sensitive first passage discounted cost optimality equation, together with the existence of optimal policies.

Theorem 4.1

Under Assumptions 3.1 and 3.2, the following assertions are true.

  (a)

    There exists a measurable function \(v^*\in V_{w}((0,\varLambda )\times X)\) satisfying \(v^*(\lambda ,x)=1\) for all \((\lambda ,x)\in (0,\varLambda )\times B\) and

    $$\begin{aligned} \frac{\partial v^*}{\partial \lambda }(\lambda ,x)=\inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c(x,a)v^*(\lambda ,x)+\frac{1}{\alpha \lambda }\int _Xv^*(\lambda ,y)q(\text {d}y|x,a)\right\} \end{aligned}$$
    (4.16)

    for all \(x\in B^c\) and a.e. \(\lambda \in (0,\varLambda )\).

  (b)

    There exists a measurable mapping \(f^*:(0,\varLambda )\times X\rightarrow A\) with \(f^*(\lambda ,x)\in A(x)\) for all \((\lambda ,x)\in (0,\varLambda )\times X\) such that

    $$\begin{aligned}&\inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c(x,a)v^*(\lambda ,x)+\frac{1}{\alpha \lambda }\int _{X}v^*(\lambda ,y)q(\text {d}y|x,a)\right\} \\&\quad =\frac{1}{\alpha }c(x,f^*(\lambda ,x))v^*(\lambda ,x)+\frac{1}{\alpha \lambda }\int _{X}v^*(\lambda ,y)q(\text {d}y|x,f^*(\lambda ,x)) \end{aligned}$$

    for all \((\lambda ,x)\in (0,\varLambda )\times X\). Let \(\pi ^*(\cdot |\omega ,t):=\delta _{f^*(\lambda \text {e}^{-\alpha t},\xi _{t-})}(\cdot )\) for all \(t\ge 0\). Then, we have \(v^*(\lambda ,x)=\widehat{J}(\lambda , x,\pi ^*)=\inf _{\pi \in \Pi }\widehat{J}(\lambda , x,\pi )\) for all \((\lambda ,x)\in (0,\varLambda )\times X\). Hence, there exists a deterministic Markov optimal policy under the risk-sensitive first passage discounted cost criterion.

  (c)

    If there exists a measurable function \(v\in V_{w}((0,\varLambda )\times X)\) satisfying \(v(\lambda ,x)=1\) for all \((\lambda ,x)\in (0,\varLambda )\times B\) and (4.16), then we have \(v(\lambda ,x)=\inf _{\pi \in \Pi }\widehat{J}(\lambda , x,\pi )\) for all \((\lambda ,x)\in (0,\varLambda )\times X\).
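An elementary identity explains heuristically why \(\pi ^*\) evaluates \(f^*\) at the shifted parameter \(\lambda \text {e}^{-\alpha t}\). Write \(\bar{c}(s):=\int _Ac(\xi _s,a)\pi (\text {d}a|\omega ,s)\); this shorthand is introduced only here. Substituting \(s=t+r\) in the integral over \([t,\tau _B]\) gives, on the event \(\{t<\tau _B\}\),

$$\begin{aligned} \text {e}^{\lambda \int _0^{\tau _B}\text {e}^{-\alpha s}\bar{c}(s)\text {d}s} =\text {e}^{\lambda \int _0^{t}\text {e}^{-\alpha s}\bar{c}(s)\text {d}s}\, \text {e}^{(\lambda \text {e}^{-\alpha t})\int _0^{\tau _B-t}\text {e}^{-\alpha r}\bar{c}(t+r)\text {d}r}. \end{aligned}$$

Informally, the cost still to be incurred after time \(t\) is a risk-sensitive first passage discounted cost of the same form, with risk parameter \(\lambda \text {e}^{-\alpha t}\) in place of \(\lambda \). This is consistent with choosing the action \(f^*(\lambda \text {e}^{-\alpha t},\xi _{t-})\) at time \(t\).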

Proof

(a) By Lemmas 3.1 and 4.1, we can obtain

$$\begin{aligned} 1\le v^*_n(\lambda ,x)\le \inf _{\pi \in \Pi }E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{\tau _B}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}\right] \le R_{\lambda }w^{\frac{\lambda M}{\alpha }}(x) \end{aligned}$$
(4.17)

for all \((\lambda ,x)\in (0,\varLambda )\times X\) and \(n\ge 1\). From Lemma 4.1(b), we see that \(v^*_n\) is nondecreasing in \(n\). Let \(v^*(\lambda ,x):=\lim _{n\rightarrow \infty }v^*_n(\lambda ,x)\) for all \((\lambda ,x)\in (0,\varLambda )\times X\). Lemma 4.1(a) gives \(v^*(\lambda ,x)=1\) for all \((\lambda ,x)\in (0,\varLambda )\times B\). Using arguments similar to those leading to (4.6), we obtain

$$\begin{aligned}&\lim _{n\rightarrow \infty } \inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c_n(x,a)v_n^{*}(t,x)+\frac{m(x)}{\alpha t}\int _{X}v_n^{*}(t,y)Q(\text {d}y|x,a)\right\} \nonumber \\&\quad =\inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c(x,a)v^{*}(t,x)+\frac{m(x)}{\alpha t}\int _{X}v^{*}(t,y)Q(\text {d}y|x,a)\right\} \end{aligned}$$
(4.18)

for all \((t,x)\in (0,\varLambda )\times B^c\). Moreover, letting \(n\rightarrow \infty \) in the equation of Lemma 4.1(a) and using (4.18) and the monotone convergence theorem, we derive

$$\begin{aligned} v^*(\lambda ,x)&=(\alpha \lambda )^{-\frac{m(x)}{\alpha }}\int _0^{\lambda }(\alpha t)^{\frac{m(x)}{\alpha }}\inf _{a\in A(x)}\nonumber \\&\quad \left\{ \frac{1}{\alpha }c(x,a)v^*(t,x) +\frac{m(x)}{\alpha t}\int _{X}v^*(t,y)Q(\text {d}y|x,a)\right\} \text {d}t \end{aligned}$$
(4.19)

for all \((\lambda ,x)\in (0,\varLambda )\times B^c\). We now show that \(v^*\in V_{w}((0,\varLambda )\times X)\). Indeed, (4.17) implies

$$\begin{aligned} 1\le v^*(\lambda ,x)\le R_{\lambda }w^{\frac{\lambda M}{\alpha }}(x) \end{aligned}$$
(4.20)

for all \((\lambda ,x)\in (0,\varLambda )\times X\). For each \(x\in B^c\), by (4.19) we get

$$\begin{aligned} (\alpha \lambda )^{\frac{m(x)}{\alpha }} v^*(\lambda ,x)&=\int _0^{\lambda }(\alpha t)^{\frac{m(x)}{\alpha }}\inf _{a\in A(x)}\nonumber \\&\quad \left\{ \frac{1}{\alpha }c(x,a)v^*(t,x) +\frac{m(x)}{\alpha t}\int _{X}v^*(t,y)Q(\text {d}y|x,a)\right\} \text {d}t \end{aligned}$$
(4.21)

for all \(\lambda \in (0,\varLambda )\). For any \(x\in B^c\) and any interval \([\underline{\gamma },\overline{\gamma }]\subset (0,\varLambda )\), Theorem 3.11 and Exercise 22 in [23, pp. 130, 149], together with (4.21), show that \(v^*(\cdot ,x)\) is absolutely continuous on \([\underline{\gamma },\overline{\gamma }]\). Arguing as in the derivation of (4.9) from (4.7), we obtain from (4.21) that

$$\begin{aligned} \frac{\partial v^*}{\partial \lambda }(\lambda ,x)=\inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c(x,a)v^*(\lambda ,x)+\frac{1}{\alpha \lambda }\int _{X}v^*(\lambda ,y)q(\text {d}y|x,a)\right\} \end{aligned}$$
(4.22)

for a.e. \(\lambda \in (0,\varLambda )\). Thus, by (4.22) and Assumption 3.1, for any \(x\in B^c\), we have

$$\begin{aligned}&\lambda \left| \frac{\partial v^*}{\partial \lambda }(\lambda ,x)\right| \le \frac{\varLambda }{\alpha }(M\ln w(x)+b)R_{\lambda }w(x)+\frac{R_{\lambda }}{\alpha }\int _Xw(y)|q(\text {d}y|x,a)|\\&\le \frac{\varLambda }{\alpha }(M+b)R_{\lambda }w^2(x)+\frac{R_{\lambda }}{\alpha }\left[ \int _Xw(y)q(\text {d}y|x,a)+2q^*(x)w(x)\right] \\&\le \left[ \frac{\varLambda }{\alpha }(M+b)R_{\varLambda }+\frac{R_{\varLambda }}{\alpha }(\rho +d+2L)\right] w^2(x) \end{aligned}$$

for a.e. \(\lambda \in (0,\varLambda )\). Note that \(v^*(\lambda ,x)=1\) and \(\frac{\partial v^*}{\partial \lambda }(\lambda ,x)=0\) for all \((\lambda ,x)\in (0,\varLambda )\times B\). Hence, \(v^*\in V_{w}((0,\varLambda )\times X)\).
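The bound on \(\lambda |\partial v^*/\partial \lambda |\) above replaces \(\int _Xw(y)|q(\text {d}y|x,a)|\) by \(\int _Xw(y)q(\text {d}y|x,a)+2q^*(x)w(x)\). That step rests on an elementary identity for the total variation of \(q(\cdot |x,a)\). The derivation below assumes that \(q(\cdot |x,a)\) is conservative, i.e. \(q(X|x,a)=0\). It writes \(q(x,a):=-q(\{x\}|x,a)\ge 0\) (notation introduced here) and assumes \(q(x,a)\le q^*(x)\) for all \(a\in A(x)\). Since \(q(\cdot |x,a)\) is nonnegative on \(X\setminus \{x\}\),

$$\begin{aligned} \int _Xw(y)|q(\text {d}y|x,a)| =\int _{X\setminus \{x\}}w(y)q(\text {d}y|x,a)+q(x,a)w(x) =\int _Xw(y)q(\text {d}y|x,a)+2q(x,a)w(x)\le \int _Xw(y)q(\text {d}y|x,a)+2q^*(x)w(x). \end{aligned}$$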

(b) By Assumption 3.2(ii) and Lemma 3.2, we see that for each \((\lambda ,x)\in (0,\varLambda )\times X\), \(\frac{1}{\alpha }c(x,a)v^*(\lambda ,x)+\frac{1}{\alpha \lambda }\int _{X}v^*(\lambda ,y)q(\text {d}y|x,a)\) is lower semi-continuous in \(a\in A(x)\). Then, using Assumption 3.2(i) and Lemma 8.3.8 in [18], we can derive the existence of a measurable mapping \(f^*:(0,\varLambda )\times X\rightarrow A\) with \(f^*(\lambda ,x)\in A(x)\) for all \((\lambda ,x)\in (0,\varLambda )\times X\) which satisfies

$$\begin{aligned}&\inf _{a\in A(x)}\left\{ \frac{1}{\alpha }c(x,a)v^*(\lambda ,x)+\frac{1}{\alpha \lambda }\int _{X}v^*(\lambda ,y)q(\text {d}y|x,a)\right\} \nonumber \\&\quad =\frac{1}{\alpha }c(x,f^*(\lambda ,x))v^*(\lambda ,x)+\frac{1}{\alpha \lambda }\int _{X}v^*(\lambda ,y)q(\text {d}y|x,f^*(\lambda ,x)) \end{aligned}$$
(4.23)

for all \((\lambda ,x)\in (0,\varLambda )\times X\). Let \(\pi ^*(\cdot |\omega ,t):=\delta _{f^*(\lambda \text {e}^{-\alpha t},\xi _{t-})}(\cdot )\) for all \(t\ge 0\). Part (a) and Lemma 3.3 give

$$\begin{aligned}&E_x^{\pi ^*}\left[ \text {e}^{\lambda \int _0^{T\wedge \tau _B}\text {e}^{-\alpha t}c_n(\xi _t,f^*(\lambda \text {e}^{-\alpha t},\xi _{t}))\text {d}t}v^*(\lambda \text {e}^{-\alpha (T\wedge \tau _B)}, \xi _{T\wedge \tau _B})\right] -v^*(\lambda ,x)\nonumber \\&\quad =E_x^{\pi ^*}\bigg [\int _0^{T\wedge \tau _B}\text {e}^{\lambda \int _0^s \text {e}^{-\alpha u }c_n(\xi _u,f^*(\lambda \text {e}^{-\alpha u},\xi _{u}))du}\bigg (\lambda \text {e}^{-\alpha s}c_n(\xi _s,f^*(\lambda \text {e}^{-\alpha s},\xi _{s}))v^*(\lambda \text {e}^{-\alpha s},\xi _s)\nonumber \\&\quad \quad -\alpha \lambda \text {e}^{-\alpha s}\frac{\partial v^*}{\partial \lambda }(\lambda \text {e}^{-\alpha s},\xi _s) +\int _Xv^*(\lambda \text {e}^{-\alpha s},y)q(\text {d}y|\xi _s,f^*(\lambda \text {e}^{-\alpha s},\xi _{s})) \bigg )\text {d}s\bigg ] \end{aligned}$$
(4.24)

for all \(x\in X\), \(\lambda \in (0,\varLambda )\), \(T\ge 0\) and \(n\ge 1\). From part (a) and (4.23), for each \(x\in B^c\), we have

$$\begin{aligned} \frac{\partial v^*}{\partial \lambda }(\lambda ,x)\ge \frac{1}{\alpha }c_n(x,f^*(\lambda ,x))v^*(\lambda ,x)+\frac{1}{\alpha \lambda }\int _{X}v^*(\lambda ,y)q(\text {d}y|x,f^*(\lambda ,x)) \end{aligned}$$

a.e. \(\lambda \in (0,\varLambda )\). Thus, it follows from (4.20), (4.24) and the last inequality that

$$\begin{aligned} v^*(\lambda ,x)\ge&\, E_x^{\pi ^*}\left[ \text {e}^{\lambda \int _0^{T\wedge \tau _B}\text {e}^{-\alpha t}c_n(\xi _t,f^*(\lambda \text {e}^{-\alpha t},\xi _{t}))\text {d}t}v^*(\lambda \text {e}^{-\alpha (T\wedge \tau _B)}, \xi _{T\wedge \tau _B})\right] \nonumber \\ \ge&\,E_x^{\pi ^*}\left[ \text {e}^{\lambda \int _0^{T\wedge \tau _B}\text {e}^{-\alpha t}c_n(\xi _t,f^*(\lambda \text {e}^{-\alpha t},\xi _{t}))\text {d}t}\right] \end{aligned}$$
(4.25)

for all \(x\in X\), \(\lambda \in (0,\varLambda )\), \(T\ge 0\) and \(n\ge 1\). Letting \(n\rightarrow \infty \) in (4.25), we get

$$\begin{aligned} v^*(\lambda ,x)\ge E_x^{\pi ^*}\left[ \text {e}^{\lambda \int _0^{T\wedge \tau _B}\text {e}^{-\alpha t}c(\xi _t,f^*(\lambda \text {e}^{-\alpha t},\xi _{t}))\text {d}t}\right] \end{aligned}$$

for all \(x\in X\), \(\lambda \in (0,\varLambda )\) and \(T\ge 0\). Moreover, letting \(T\rightarrow \infty \) in the last inequality, the monotone convergence theorem implies

$$\begin{aligned} v^*(\lambda ,x)\ge E_x^{\pi ^*}\left[ \text {e}^{\lambda \int _0^{\tau _B}\text {e}^{-\alpha t}c(\xi _t,f^*(\lambda \text {e}^{-\alpha t},\xi _{t}))\text {d}t}\right] \ge \inf _{\pi \in \Pi }\widehat{J}(\lambda , x,\pi ) \end{aligned}$$
(4.26)

for all \(x\in X\) and \(\lambda \in (0,\varLambda )\).
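The proof of (4.27) below uses Jensen's inequality for the probability density \(\frac{\alpha \text {e}^{-\alpha t}}{1-\text {e}^{-\alpha T}}\) on \([0,T]\). For a cost path \(g\ge 0\) it takes the form \(\text {e}^{\lambda \int _0^T\text {e}^{-\alpha t}g(t)\text {d}t}\le \int _0^T\text {e}^{\frac{\lambda }{\alpha }(1-\text {e}^{-\alpha T})g(t)}\frac{\alpha \text {e}^{-\alpha t}}{1-\text {e}^{-\alpha T}}\text {d}t\). It is applied with \(2\lambda \) in place of \(\lambda \), and again with \(\lambda p\) in place of \(\lambda \) and \(T=\infty \). The Python sketch below checks this numerically for a deterministic path on a midpoint grid. The discrete weights are normalized, so the inequality holds exactly for the discretized measure. The function name and the test paths are illustrative choices.

```python
import math


def jensen_sides(g, lam, alpha, T, n=10_000):
    """Return (lhs, rhs) of the Jensen step for a deterministic path g on [0, T].

    lhs approximates exp(lam * int_0^T e^{-alpha t} g(t) dt);
    rhs approximates int_0^T exp(k g(t)) alpha e^{-alpha t} / (1 - e^{-alpha T}) dt,
    with k = lam * (1 - e^{-alpha T}) / alpha.
    """
    dt = T / n
    ts = [(i + 0.5) * dt for i in range(n)]                # midpoint nodes
    raw = [alpha * math.exp(-alpha * t) * dt for t in ts]
    z = sum(raw)                                           # approximately 1 - e^{-alpha T}
    p = [r / z for r in raw]                               # discrete probability weights
    k = lam * (1.0 - math.exp(-alpha * T)) / alpha
    lhs = math.exp(k * sum(pi * g(t) for pi, t in zip(p, ts)))
    rhs = sum(pi * math.exp(k * g(t)) for pi, t in zip(p, ts))
    return lhs, rhs
```

For a constant path the two sides agree, which matches the equality case of Jensen's inequality.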

Below, we show that

$$\begin{aligned} v^*(\lambda ,x)\le \inf _{\pi \in \Pi }\widehat{J}(\lambda , x,\pi ) \ \mathrm{for \ all} \ x\in X \ \textrm{and} \ \lambda \in (0,\varLambda ). \end{aligned}$$
(4.27)

Let \(\widehat{c}(x):=\max _{a\in A(x)}c(x,a)\) for all \(x\in X\), \(Y_{k}:=\{x:\widehat{c}(x)>k\}\) and \(\eta _{Y_k}:=\inf \{t\ge 0:\xi _t\in Y_k\}\) for all \(k\ge 1\). Then, for any \(n>k\), we can obtain

$$\begin{aligned}&E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{T\wedge \tau _B\wedge \eta _{Y_k}}\int _A\text {e}^{-\alpha t}c_n(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}v^*(\lambda \text {e}^{-\alpha (T\wedge \tau _B\wedge \eta _{Y_k})}, \xi _{T\wedge \tau _B\wedge \eta _{Y_k}})\right] -v^*(\lambda ,x)\nonumber \\&\quad =E_x^{\pi }\bigg [\int _0^{T\wedge \tau _B\wedge \eta _{Y_k}}\text {e}^{\lambda \int _0^s\int _A \text {e}^{-\alpha u }c_n(\xi _u,a)\pi (\text {d}a|\omega ,u)du}\bigg (\lambda \text {e}^{-\alpha s}\int _Ac_n(\xi _s,a)\pi (\text {d}a|\omega ,s)v^*(\lambda \text {e}^{-\alpha s},\xi _s)\nonumber \\&\quad \quad -\alpha \lambda \text {e}^{-\alpha s}\frac{\partial v^*}{\partial \lambda }(\lambda \text {e}^{-\alpha s},\xi _s) +\int _X\int _Av^*(\lambda \text {e}^{-\alpha s},y)q(\text {d}y|\xi _s,a)\pi (\text {d}a|\omega ,s) \bigg )\text {d}s\bigg ]\nonumber \\&\quad =E_x^{\pi }\bigg [\int _0^{T\wedge \tau _B\wedge \eta _{Y_k}}\text {e}^{\lambda \int _0^s\int _A \text {e}^{-\alpha u }c(\xi _u,a)\pi (\text {d}a|\omega ,u)du}\bigg (\lambda \text {e}^{-\alpha s}\int _Ac(\xi _s,a)\pi (\text {d}a|\omega ,s)v^*(\lambda \text {e}^{-\alpha s},\xi _s)\nonumber \\&\quad \quad -\alpha \lambda \text {e}^{-\alpha s}\frac{\partial v^*}{\partial \lambda }(\lambda \text {e}^{-\alpha s},\xi _s) +\int _X\int _Av^*(\lambda \text {e}^{-\alpha s},y)q(\text {d}y|\xi _s,a)\pi (\text {d}a|\omega ,s) \bigg )\text {d}s\bigg ]\ge 0 \end{aligned}$$
(4.28)

for all \(x\in X\), \(\lambda \in (0,\varLambda )\), \(T\ge 0\), \(k\ge 1\) and \(\pi \in \Pi \), where the first equality follows from part (a) and Lemma 3.3, and the inequality is due to part (a). Employing (4.28) we derive

$$\begin{aligned} v^*(\lambda ,x)\le E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{T\wedge \tau _B\wedge \eta _{Y_k}}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}v^*(\lambda \text {e}^{-\alpha (T\wedge \tau _B\wedge \eta _{Y_k})}, \xi _{T\wedge \tau _B\wedge \eta _{Y_k}})\right] \end{aligned}$$
(4.29)

for all \(x\in X\), \(\lambda \in (0,\varLambda )\), \(T\ge 0\), \(k\ge 1\) and \(\pi \in \Pi \). By Lemma 4.1(b), we see that \(v^*(\lambda ,x)\) is nondecreasing in \(\lambda \) for all \(x\in X\). Moreover, we have

$$\begin{aligned}&\text {e}^{\lambda \int _0^{T\wedge \tau _B\wedge \eta _{Y_k}}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}v^*(\lambda \text {e}^{-\alpha (T\wedge \tau _B\wedge \eta _{Y_k})}, \xi _{T\wedge \tau _B\wedge \eta _{Y_k}})\nonumber \\ \le&\text {e}^{\lambda \int _0^{T}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}v^*(\lambda , \xi _{T\wedge \tau _B\wedge \eta _{Y_k}})\nonumber \\ \le&R_{\lambda } \text {e}^{\lambda \int _0^{T}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}w^{\frac{\lambda M}{\alpha }}(\xi _{T\wedge \tau _B\wedge \eta _{Y_k}})\nonumber \\ \le&R^2_{\lambda } \text {e}^{2\lambda \int _0^{T}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}+w^{\frac{2\lambda M}{\alpha }}(\xi _{T\wedge \tau _B\wedge \eta _{Y_k}})\nonumber \\ \le&R^2_{\lambda } \text {e}^{2\lambda \int _0^{T}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}+w(\xi _{T\wedge \tau _B\wedge \eta _{Y_k}}) \end{aligned}$$
(4.30)

for all \(\lambda \in (0,\varLambda )\), \(T\ge 0\) and \(k\ge 1\), where the second inequality follows from (4.20). From the Dynkin formula, we can get

$$\begin{aligned} E_x^{\pi }[w(\xi _{T\wedge \tau _B\wedge \eta _{Y_k}})]=w(x)+E_x^{\pi }\left[ \int _0^{T\wedge \tau _B\wedge \eta _{Y_k}}\int _X\int _Aw(y)q(\text {d}y|\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t\right] \end{aligned}$$
(4.31)

for all \(x\in X\), \(\pi \in \Pi \), \(T\ge 0\) and \(k\ge 1\). Using Assumptions 3.1, 3.2 and Lemma 3.1(a), we obtain

$$\begin{aligned}&E_x^{\pi }\left[ \int _0^{T}\int _X\int _Aw(y)|q(\text {d}y|\xi _t,a)|\pi (\text {d}a|\omega ,t)\text {d}t\right] \nonumber \\&\quad \le \int _0^TE_x^{\pi }[\rho w(\xi _t)+d+2q^*(\xi _t)w(\xi _t)]\text {d}t\nonumber \\&\quad \le (\rho +d+2L)\int _0^TE_x^{\pi }[w^2(\xi _t)]\text {d}t\le (\rho +d+2L)T\left[ \text {e}^{\overline{\rho }T}w^2(x)+\frac{\overline{d}}{\overline{\rho }}\text {e}^{\overline{\rho }T}\right] \end{aligned}$$
(4.32)

for all \(x\in X\), \(\pi \in \Pi \) and \(T\ge 0\). Thus, we can derive

$$\begin{aligned}&\lim _{k\rightarrow \infty }E_x^{\pi }[w(\xi _{T\wedge \tau _B\wedge \eta _{Y_k}})]=w(x)+E_x^{\pi }\left[ \int _0^{T\wedge \tau _B }\int _X\int _Aw(y)q(\text {d}y|\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t\right] \nonumber \\&\quad =E_x^{\pi }[w(\xi _{T\wedge \tau _B})] \end{aligned}$$
(4.33)

for all \(x\in X\), \(\pi \in \Pi \) and \(T\ge 0\), where the first equality follows from (4.31) and the dominated convergence theorem, and the second one is due to the Dynkin formula. Direct calculations give

$$\begin{aligned}&E_x^{\pi }\left[ R^2_{\lambda }\text {e}^{2\lambda \int _0^{T}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}+w(\xi _{T\wedge \tau _B})\right] \\&\quad \le R^2_{\lambda } E_x^{\pi }\left[ \int _0^{T}\text {e}^{\frac{2\lambda }{\alpha }(1-\text {e}^{-\alpha T})\int _Ac(\xi _t,a)\pi (\text {d}a|\omega ,t)}\frac{\alpha \text {e}^{-\alpha t}}{1-\text {e}^{-\alpha T}}\text {d}t\right] +w(x)\\&\qquad + E_x^{\pi }\left[ \int _0^{T}\int _X\int _Aw(y)|q(\text {d}y|\xi _t,a)|\pi (\text {d}a|\omega ,t)\text {d}t\right] \\&\quad \le \frac{R^2_{\lambda }\alpha \text {e}^{\frac{2\lambda b}{\alpha }}}{1-\text {e}^{-\alpha T}}\int _0^{T}\text {e}^{-\alpha t}E_x^{\pi }[w(\xi _t)]\text {d}t+w(x)+(\rho +d+2L)T\left[ \text {e}^{\overline{\rho }T}w^2(x)+\frac{\overline{d}}{\overline{\rho }}\text {e}^{\overline{\rho }T}\right] \\&\quad \le \frac{R^2_{\lambda }\alpha \text {e}^{\frac{2\lambda b}{\alpha }}(\text {e}^{(\rho -\alpha )T}-1)}{(1-\text {e}^{-\alpha T})(\rho -\alpha )}\left( 1+\frac{d}{\rho }\right) w (x)+w(x)+(\rho +d+2L)T\left[ \text {e}^{\overline{\rho }T}w^2(x)+\frac{\overline{d}}{\overline{\rho }}\text {e}^{\overline{\rho }T}\right] \end{aligned}$$

for all \(x\in X\), \(\lambda \in (0,\varLambda )\), \(\pi \in \Pi \) and \(T\ge 0\), where the first inequality follows from the Jensen inequality and (4.33), the second one is due to Assumption 3.1 and (4.32), and the last one follows from Lemma 3.1(a). Hence, by (4.30)–(4.33), the last inequality and the generalized dominated convergence theorem (see Theorem 2.88 in [3]), we have

$$\begin{aligned}&\lim _{k\rightarrow \infty }E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{T\wedge \tau _B\wedge \eta _{Y_k}}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}v^*(\lambda \text {e}^{-\alpha (T\wedge \tau _B\wedge \eta _{Y_k})}, \xi _{T\wedge \tau _B\wedge \eta _{Y_k}})\right] \\&\quad =E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{T\wedge \tau _B}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}v^*(\lambda \text {e}^{-\alpha (T\wedge \tau _B)}, \xi _{T\wedge \tau _B })\right] , \end{aligned}$$

which together with (4.29) implies

$$\begin{aligned} v^*(\lambda ,x)\le E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{T\wedge \tau _B}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}v^*(\lambda \text {e}^{-\alpha (T\wedge \tau _B)}, \xi _{T\wedge \tau _B })\right] \end{aligned}$$
(4.34)

for all \(x\in X\), \(\lambda \in (0,\varLambda )\), \(\pi \in \Pi \) and \(T\ge 0\). Let p be an arbitrary constant satisfying \(p>1\), \(\frac{\lambda Mp}{\alpha }<1\) and \(\frac{\lambda M\rho p}{\alpha ^2}<1\). Employing the Hölder inequality, we can derive

$$\begin{aligned}&E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{T\wedge \tau _B}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}v^*(\lambda \text {e}^{-\alpha (T\wedge \tau _B)}, \xi _{T\wedge \tau _B })\right] \nonumber \\&\quad \le \left( E_x^{\pi }\left[ \text {e}^{\lambda p\int _0^{T\wedge \tau _B}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}\right] \right) ^{\frac{1}{p}} \left( E_x^{\pi }\left[ v^{*\frac{p}{p-1}}(\lambda \text {e}^{-\alpha (T\wedge \tau _B)}, \xi _{T\wedge \tau _B })\right] \right) ^{\frac{p-1}{p}} \end{aligned}$$
(4.35)

for all \(x\in X\), \(\lambda \in (0,\varLambda )\), \(\pi \in \Pi \) and \(T\ge 0\). Moreover, by part (a) we obtain

$$\begin{aligned}&E_x^{\pi }\left[ v^{*\frac{p}{p-1}}(\lambda \text {e}^{-\alpha (T\wedge \tau _B)}, \xi _{T\wedge \tau _B })\right] \nonumber \\&\quad =E_x^{\pi }\left[ v^{*\frac{p}{p-1}}(\lambda \text {e}^{-\alpha (T\wedge \tau _B)}, \xi _{T\wedge \tau _B })\left( I_{\{\tau _B\ge T\}}+I_{\{\tau _B<T\}}I_{\{\tau _B<\infty \}}\right) \right] \nonumber \\&\quad \le E_x^{\pi }\left[ v^{*\frac{p}{p-1}}(\lambda \text {e}^{-\alpha T}, \xi _{T})\right] +E_x^{\pi }\left[ v^{*\frac{p}{p-1}}(\lambda \text {e}^{-\alpha \tau _B}, \xi _{\tau _B})I_{\{\tau _B<\infty \}}\right] \nonumber \\&\quad \le \left( \frac{\alpha ^2\text {e}^{\frac{\lambda \text {e}^{-\alpha T}}{\alpha }}}{\alpha ^2-\rho \lambda \text {e}^{-\alpha T}M}\right) ^{\frac{p}{p-1}}\left( 1+\frac{d}{\rho }\right) ^{\frac{\lambda \text {e}^{-\alpha T} Mp}{\alpha (p-1)}} E_x^{\pi }\left[ w^{\frac{\lambda \text {e}^{-\alpha T}Mp}{\alpha (p-1)}}(\xi _T)\right] +1 \end{aligned}$$
(4.36)

for all \(x\in X\), \(\lambda \in (0,\varLambda )\), \(\pi \in \Pi \) and \(T\ge 0\). Observe that there exists a constant \(\widehat{T}>0\) satisfying \(\frac{\lambda \text {e}^{-\alpha T}Mp}{\alpha (p-1)}<1\) for all \(T\ge \widehat{T}\). Thus, using Lemma 3.1(a), (4.36) and the Jensen inequality, we get

$$\begin{aligned}&E_x^{\pi }\left[ v^{*\frac{p}{p-1}}(\lambda \text {e}^{-\alpha (T\wedge \tau _B)}, \xi _{T\wedge \tau _B })\right] \\&\quad \le \left( \frac{\alpha ^2\text {e}^{\frac{\lambda \text {e}^{-\alpha T}}{\alpha }}}{\alpha ^2-\rho \lambda \text {e}^{-\alpha T}M}\right) ^{\frac{p}{p-1}}\left( 1+\frac{d}{\rho }\right) ^{\frac{\lambda \text {e}^{-\alpha T} Mp}{\alpha (p-1)}} (E_x^{\pi }\left[ w(\xi _T)\right] )^{\frac{\lambda \text {e}^{-\alpha T}Mp}{\alpha (p-1)}}+1\\&\quad \le \left( \frac{\alpha ^2\text {e}^{\frac{\lambda \text {e}^{-\alpha T}}{\alpha }}}{\alpha ^2-\rho \lambda \text {e}^{-\alpha T}M}\right) ^{\frac{p}{p-1}}\left( 1+\frac{d}{\rho }\right) ^{\frac{2\lambda \text {e}^{-\alpha T} Mp}{\alpha (p-1)}} \text {e}^{\frac{\lambda \text {e}^{-\alpha T}M\rho Tp}{\alpha (p-1)}}w^{\frac{\lambda \text {e}^{-\alpha T}Mp}{\alpha (p-1)}}(x)+1, \end{aligned}$$

which together with (4.35) yields

$$\begin{aligned}&E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{T\wedge \tau _B}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}v^*(\lambda \text {e}^{-\alpha (T\wedge \tau _B)}, \xi _{T\wedge \tau _B })\right] \nonumber \\&\quad \le \,\left( E_x^{\pi }\left[ \text {e}^{\lambda p\int _0^{T\wedge \tau _B}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}\right] \right) ^{\frac{1}{p}}\nonumber \\&\qquad \times \left[ \left( \frac{\alpha ^2\text {e}^{\frac{\lambda \text {e}^{-\alpha T}}{\alpha }}}{\alpha ^2-\rho \lambda \text {e}^{-\alpha T}M}\right) ^{\frac{p}{p-1}}\left( 1+\frac{d}{\rho }\right) ^{\frac{2\lambda \text {e}^{-\alpha T} Mp}{\alpha (p-1)}} \text {e}^{\frac{\lambda \text {e}^{-\alpha T}M\rho Tp}{\alpha (p-1)}}w^{\frac{\lambda \text {e}^{-\alpha T}Mp}{\alpha (p-1)}}(x)+1\right] ^{\frac{p-1}{p}} \end{aligned}$$
(4.37)

for all \(x\in X\), \(\lambda \in (0,\varLambda )\), \(\pi \in \Pi \) and \(T\ge \widehat{T}\). Direct calculations give

$$\begin{aligned} \text {e}^{\lambda p\int _0^{T\wedge \tau _B}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}&\le \text {e}^{\lambda p\int _0^{\infty }\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}\\&\le \int _0^{\infty }\text {e}^{\frac{\lambda p}{\alpha }\int _A c(\xi _t,a)\pi (\text {d}a|\omega ,t)}\alpha \text {e}^{-\alpha t}\text {d}t\\&\le \alpha \text {e}^{\frac{\lambda bp}{\alpha }}\int _0^{\infty }\text {e}^{-\alpha t}w^{\frac{\lambda Mp}{\alpha }}(\xi _t)\text {d}t \end{aligned}$$

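for all \(T\ge 0\), where the first inequality holds because c is nonnegative, the second follows from the Jensen inequality applied to the probability measure \(\alpha \text {e}^{-\alpha t}\text {d}t\) on \([0,\infty )\), and the third follows from Assumption 3.1. Moreover, by Lemma 3.1(a) and the Jensen inequality, we have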

$$\begin{aligned}&E_x^{\pi }\left[ \int _0^{\infty }\text {e}^{-\alpha t}w^{\frac{\lambda Mp}{\alpha }}(\xi _t)\text {d}t\right] \\&\quad \le \int _0^{\infty }\text {e}^{-\alpha t}(E_x^{\pi }[w(\xi _t)])^{\frac{\lambda Mp}{\alpha }}\text {d}t\le \left( 1+\frac{d}{\rho }\right) ^{\frac{\lambda Mp}{\alpha }}\int _0^{\infty }\text {e}^{-\alpha t}\text {e}^{\frac{\rho \lambda Mpt}{\alpha }}w^{\frac{\lambda Mp}{\alpha }}(x)\text {d}t\\&\quad =\left( 1+\frac{d}{\rho }\right) ^{\frac{\lambda Mp}{\alpha }}\frac{\alpha }{\alpha ^2-\rho \lambda Mp}w^{\frac{\lambda Mp}{\alpha }}(x) \end{aligned}$$

for all \(x\in X\) and \(\pi \in \Pi \). Hence, letting \(T\rightarrow \infty \) in (4.37) and using (4.34) and the dominated convergence theorem, we can derive

$$\begin{aligned} v^*(\lambda ,x)\le 2^{\frac{p-1}{p}}\left( E_x^{\pi }\left[ \text {e}^{\lambda p\int _0^{\tau _B}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}\right] \right) ^{\frac{1}{p}} \end{aligned}$$

for all \(x\in X\), \(\lambda \in (0,\varLambda )\) and \(\pi \in \Pi \). Furthermore, letting \(p\downarrow 1\) in the last inequality and using the dominated convergence theorem, we obtain \(v^*(\lambda ,x)\le \widehat{J}(\lambda , x,\pi )\) for all \(x\in X\), \(\lambda \in (0,\varLambda )\) and \(\pi \in \Pi \). Therefore, (4.27) holds. Moreover, (4.26) and (4.27) give \(v^*(\lambda ,x)=\widehat{J}(\lambda , x,\pi ^*)=\inf _{\pi \in \Pi }\widehat{J}(\lambda , x,\pi )\) for all \((\lambda ,x)\in (0,\varLambda )\times X\). Hence, \(\pi ^*\in \Pi _M\) is an optimal policy.
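The choice of \(\widehat{T}\) and the two deterministic limits used in the proof of (b) can be made explicit. The display below is a sketch assuming \(M>0\) (if \(M=0\), the condition defining \(\widehat{T}\) holds for every T); the shorthand \(\gamma _T\) is introduced here only for readability and does not appear elsewhere. Passing these limits inside the expectations still relies on (4.34) and the dominated convergence arguments given in the proof.

$$\begin{aligned}&\gamma _T:=\frac{\lambda \text {e}^{-\alpha T}Mp}{\alpha (p-1)}<1\quad \text {for all}\ T\ge \widehat{T}\ \text {whenever}\ \widehat{T}>\max \left\{ 0,\frac{1}{\alpha }\ln \frac{\lambda Mp}{\alpha (p-1)}\right\} ,\\&\lim _{T\rightarrow \infty }\left[ \left( \frac{\alpha ^2\text {e}^{\frac{\lambda \text {e}^{-\alpha T}}{\alpha }}}{\alpha ^2-\rho \lambda \text {e}^{-\alpha T}M}\right) ^{\frac{p}{p-1}}\left( 1+\frac{d}{\rho }\right) ^{2\gamma _T}\text {e}^{\rho T\gamma _T}w^{\gamma _T}(x)+1\right] ^{\frac{p-1}{p}}=(1+1)^{\frac{p-1}{p}}=2^{\frac{p-1}{p}},\\&\lim _{p\downarrow 1}2^{\frac{p-1}{p}}=1. \end{aligned}$$

The first line uses that \(T\mapsto \gamma _T\) is strictly decreasing; the second uses \(\gamma _T\rightarrow 0\) and \(T\gamma _T\rightarrow 0\) as \(T\rightarrow \infty \), so each factor in front of the "+1" tends to 1.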

(c) The assertion follows from the same arguments as those used to prove (4.26) and (4.27). \(\square \)
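In the proof of (b), the bound on \(E_x^{\pi }\left[ \int _0^{\infty }\text {e}^{-\alpha t}w^{\frac{\lambda Mp}{\alpha }}(\xi _t)\text {d}t\right] \) ends with the closed form \(\int _0^{\infty }\text {e}^{-\alpha t}\text {e}^{\frac{\rho \lambda Mpt}{\alpha }}\text {d}t=\frac{\alpha }{\alpha ^2-\rho \lambda Mp}\), which is finite only when \(\alpha ^2>\rho \lambda Mp\), and the Jensen step before it treats \(\alpha \text {e}^{-\alpha t}\text {d}t\) as a probability measure on \([0,\infty )\). The Python sketch below checks both facts numerically with a composite Simpson rule. The parameter values are hypothetical, chosen only so that \(\alpha ^2>\rho \lambda Mp\); the function names are illustrative.

```python
import math

def closed_form(alpha, rho, lam, M, p):
    """alpha / (alpha^2 - rho*lam*M*p), the value of
    int_0^inf e^{-alpha t} e^{rho*lam*M*p*t/alpha} dt when alpha^2 > rho*lam*M*p."""
    if alpha ** 2 <= rho * lam * M * p:
        raise ValueError("integral diverges: need alpha^2 > rho*lam*M*p")
    return alpha / (alpha ** 2 - rho * lam * M * p)

def simpson(f, a, b, n=20000):
    """Composite Simpson rule for int_a^b f(t) dt with n (made even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3

# Hypothetical values with alpha^2 = 1 > rho*lam*M*p = 0.3.
alpha, rho, lam, M, p = 1.0, 0.5, 0.4, 1.0, 1.5
growth = rho * lam * M * p / alpha
# Truncate [0, inf) at t = 60; the neglected tail is of order e^{-42}.
numeric = simpson(lambda t: math.exp(-alpha * t) * math.exp(growth * t), 0.0, 60.0)
exact = closed_form(alpha, rho, lam, M, p)
# Total mass of the weight alpha * e^{-alpha t} dt used in the Jensen step.
mass = simpson(lambda t: alpha * math.exp(-alpha * t), 0.0, 60.0)
print(numeric, exact, mass)
```

Raising `growth` until \(\alpha ^2\le \rho \lambda Mp\) makes `closed_form` raise, mirroring the fact that the bound in the proof is only available below that threshold.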

Remark 4.1

(a) For a Borel state space and unbounded cost and transition rates, Theorem 4.1 establishes that the risk-sensitive first passage discounted cost optimality equation (4.16) has a unique solution in \(V_{w}((0,\varLambda )\times X)\) and that a deterministic Markov optimal policy exists. This extends the existence results for the infinite-horizon risk-sensitive discounted cost criterion in [5] (Borel state space, bounded cost and transition rates), [8] (denumerable state space, bounded cost and transition rates) and [15] (denumerable state space, unbounded cost and transition rates).

(b) In [5, 8, 15], the existence of a solution to the infinite-horizon risk-sensitive discounted cost optimality equation is derived by a fixed point technique, which requires the cost and transition rates to be bounded. Thus, the method in [5, 8] cannot be applied when the cost and transition rates are unbounded. To handle the unbounded case, [15] constructs an approximating sequence of bounded cost and transition rates and applies the fixed point technique to this sequence. The diagonalization arguments in [15] require the state space to be denumerable, so the method in [15] is inapplicable to a Borel state space. We instead introduce a new value iteration, given by (4.1), which handles the Borel state space and unbounded transition rates directly: it requires neither the boundedness assumption of [5, 8] nor the approximating sequence of bounded transition rates and the diagonalization arguments of [15].

(c) We obtain the existence result in Theorem 4.1 under conditions weaker than those in [5, 8, 15]. More precisely, we remove the boundedness condition on the cost and transition rates imposed in [5, 8] and the uniform convergence condition on the solution (i.e., \(\lim _{\lambda \rightarrow 0}v^*(\lambda ,x)=1\) uniformly in \(x\in X\)) imposed in [8]. Moreover, we require neither the denumerability of the state space, nor the uniformly conservative condition on the transition rates (i.e., for any \(i\in X\), \(\sum _{j\in X}q(j|i,a)=0\) uniformly in \(a\in A(i)\)), nor the additional condition \(\overline{\rho }<\alpha \), all of which are assumed in [15].

5 An Example

In this section, a cash flow model is given to illustrate the main results.

Example 5.1

(A cash flow model in [13]) The amount of cash is regarded as the state variable, and the state space is \(X=(-\infty ,\infty )\). The action a denotes the rate at which cash is withdrawn (if \(a<0\)) or supplied (if \(a>0\)). When the amount of cash is \(x\in X\), the decision-maker takes an action from the set \(A(x)=[\zeta _1(x),\zeta _2(x)]\), where \(\zeta _1\) and \(\zeta _2\) are measurable functions on X satisfying \(\zeta _1(x)<0\) and \(\zeta _2(x)>0\) for all \(x\in X\). The amount of cash x and the chosen action a incur a nonnegative cost \(c(x,a)\). Moreover, when the amount of cash equals x and the decision-maker takes the action \(a\in A(x)\), the amount of cash jumps, after an exponentially distributed random time with rate \(\kappa (x,a)>0\), to a new state that is normally distributed with mean x and variance \(\beta ^2\). Thus, the transition rate is given by

$$\begin{aligned} q(D|x,a)=\kappa (x,a)\left[ \int _{D\setminus \{x\}}\frac{1}{\sqrt{2\pi }\beta }\text {e}^{-\frac{(y-x)^2}{2\beta ^2}}\text {d}y-\delta _x(D)\right] \end{aligned}$$
(5.1)

for all \((x,a)\in K\) and \(D\in \mathcal {B}(X)\). Let \(\lambda >0\) denote the risk-sensitivity coefficient of the decision-maker, who wishes to minimize the risk-sensitive discounted cost incurred before the amount of cash falls into the target set \(B=(-\infty ,0)\).
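For reference, the quantity minimized in this example can be written out. Judging from the limit \(p\downarrow 1\) in the proof of Theorem 4.1(b), the risk-sensitive first passage discounted cost of a policy \(\pi \) takes the form below; here \(\tau _B\) is taken to be the first time the amount of cash enters B (with \(\inf \emptyset =\infty \)), which we assume agrees with the first passage time defined earlier in the paper.

$$\begin{aligned} \widehat{J}(\lambda ,x,\pi )=E_x^{\pi }\left[ \text {e}^{\lambda \int _0^{\tau _B}\int _A\text {e}^{-\alpha t}c(\xi _t,a)\pi (\text {d}a|\omega ,t)\text {d}t}\right] ,\qquad \tau _B=\inf \{t\ge 0:\xi _t\in B\},\quad B=(-\infty ,0). \end{aligned}$$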

We consider the following conditions to guarantee the existence of optimal policies for the cash flow model.

  1. (E1)

    There exists a constant \(\widehat{L}>0\) such that \(\kappa (x,a)\le \widehat{L}(x^2+1)\) for all \((x,a)\in K\).

  2. (E2)

    There exist constants \(0\le \widehat{M}<\min \left\{ \frac{\alpha ^2}{\widehat{L}\beta ^2\lambda },\frac{\alpha }{2\lambda }\right\} \) and \(\widehat{b}\ge 0\) such that \(c(x,a)\le \widehat{M}\ln (x^2+1)+\widehat{b}\) for all \((x,a)\in K\).

  3. (E3)

    The function \(\kappa (x,a)\) is measurable on K and continuous in \(a\in A(x)\), and the function \(c(x,a)\) is measurable on K and lower semi-continuous in \(a\in A(x)\) for all \(x\in X\).
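To make the criterion concrete, the following Python sketch estimates by Monte Carlo the risk-sensitive first passage discounted cost of one fixed deterministic stationary policy in the cash flow model. Everything model-specific in it is a hypothetical choice for illustration, not taken from [13] or from this paper: the jump rate \(\kappa (x,a)=1+x^2/(1+|a|)\) (which satisfies (E1) with \(\widehat{L}=1\)), the cost \(c(x,a)=0.1\ln (x^2+1)+0.5a^2/(1+a^2)\) (which satisfies (E2) with \(\widehat{M}=0.1\) and \(\widehat{b}=0.5\) for the parameters used), the threshold policy, and the assumption that \([-1,1]\subset A(x)\) for every x. Both functions are continuous in a, so (E3) holds as well. Paths are truncated at a finite horizon, so the estimate drops the heavily discounted cost after that time. The sketch evaluates a given policy; it does not compute the optimal policy \(\pi ^*\).

```python
import math
import random

# Hypothetical parameters (not from the paper): discount factor alpha,
# jump standard deviation beta, risk-sensitivity coefficient lambda.
ALPHA, BETA, LAM = 1.0, 0.5, 0.2
M_HAT, B_HAT, L_HAT = 0.1, 0.5, 1.0

def kappa(x, a):
    """Hypothetical jump rate: positive, continuous in a, <= L_HAT*(x^2+1) as in (E1)."""
    return 1.0 + x * x / (1.0 + abs(a))

def cost(x, a):
    """Hypothetical nonnegative cost: continuous in a, <= M_HAT*ln(x^2+1)+B_HAT as in (E2)."""
    return M_HAT * math.log(x * x + 1.0) + B_HAT * a * a / (1.0 + a * a)

def policy(x):
    """Hypothetical stationary policy: supply cash below 5, withdraw above.
    Assumes [-1, 1] is contained in A(x) for every x."""
    return 1.0 if x < 5.0 else -1.0

def path_exponent(x0, rng, horizon=20.0):
    """lambda * int_0^{min(tau_B, horizon)} e^{-alpha t} c(xi_t, f(xi_t)) dt on one path."""
    t, x, acc = 0.0, x0, 0.0
    while x >= 0.0 and t < horizon:          # stop once the cash level enters B = (-inf, 0)
        a = policy(x)
        hold = rng.expovariate(kappa(x, a))  # exponential holding time with rate kappa(x, a)
        t_end = min(t + hold, horizon)       # cost after the horizon is discarded
        # exact discounted integral of the constant cost rate over [t, t_end)
        acc += cost(x, a) * (math.exp(-ALPHA * t) - math.exp(-ALPHA * t_end)) / ALPHA
        t += hold
        x = rng.gauss(x, BETA)               # post-jump state ~ N(x, beta^2), cf. (5.1)
    return LAM * acc

def estimate_J(x0, n_paths=2000, seed=0):
    """Monte Carlo estimate of E[exp(path exponent)] for the fixed policy above."""
    rng = random.Random(seed)
    return sum(math.exp(path_exponent(x0, rng)) for _ in range(n_paths)) / n_paths

# (E2) requires M_HAT < min{alpha^2 / (L_HAT beta^2 lambda), alpha / (2 lambda)}.
assert M_HAT < min(ALPHA ** 2 / (L_HAT * BETA ** 2 * LAM), ALPHA / (2 * LAM))
print(estimate_J(2.0, n_paths=1000))
```

Starting inside B gives an estimate of exactly 1, since \(\tau _B=0\) and no cost accrues; for starting points outside B the estimate exceeds 1 because the hypothetical cost is strictly positive.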

Proposition 5.1

Under conditions (E1)–(E3), Example 5.1 satisfies Assumptions 3.1 and 3.2. Therefore, by Theorem 4.1 there exists a deterministic Markov optimal policy for the cash flow model under the risk-sensitive first passage discounted cost criterion.

Proof

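Take \(w(x)=x^2+1\) for all \(x\in X\). Since a normally distributed random variable with mean x and variance \(\beta ^2\) has second moment \(x^2+\beta ^2\), (5.1) and condition (E1) yield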

$$\begin{aligned} \int _Xw(y)q(\text {d}y|x,a)=\beta ^2\kappa (x,a)\le \widehat{L}\beta ^2w(x) \ \textrm{and}\ q^*(x)=\sup _{a\in A(x)}\kappa (x,a)\le \widehat{L}w(x) \end{aligned}$$
(5.2)

for all \(x\in X\) and \(a\in A(x)\). Thus, it follows from (5.2) and condition (E2) that Assumption 3.1 is satisfied with \(\rho =\widehat{L}\beta ^2\), \(d=0\), \(L=\widehat{L}\), \(M=\widehat{M}\) and \(b=\widehat{b}\). Moreover, from the description of the model and condition (E3), we see that Assumptions 3.2(i) and (ii) hold. Finally, we verify Assumption 3.2(iii). Since the fourth moment of a normally distributed random variable with mean x and variance \(\beta ^2\) equals \(x^4+6\beta ^2x^2+3\beta ^4\), employing (5.1) and condition (E1) we can derive

$$\begin{aligned} \int _Xw^2(y)q(\text {d}y|x,a)=\kappa (x,a)(3\beta ^4+6\beta ^2x^2+2\beta ^2)\le \widehat{L}(3\beta ^4+6\beta ^2)w^2(x) \end{aligned}$$

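for all \((x,a)\in K\), where the inequality uses \(\kappa (x,a)\le \widehat{L}w(x)\) and \(3\beta ^4+6\beta ^2x^2+2\beta ^2\le (3\beta ^4+6\beta ^2)w(x)\). Hence, Assumption 3.2(iii) holds with \(\overline{\rho }=\widehat{L}(3\beta ^4+6\beta ^2)\) and \(\overline{d}=0\). \(\square \)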
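The two integrals against the transition rate computed in this proof can be checked numerically. Because \(\{x\}\) has Lebesgue measure zero, (5.1) gives \(\int _Xg(y)q(\text {d}y|x,a)=\kappa (x,a)\left( \int _Xg(y)\frac{1}{\sqrt{2\pi }\beta }\text {e}^{-\frac{(y-x)^2}{2\beta ^2}}\text {d}y-g(x)\right) \), so \(\kappa (x,a)\) is a common factor and need not be specified. The Python sketch below approximates the normal integrals by a composite Simpson rule for \(g=w\) and \(g=w^2\); the test points \((x,\beta )\) and the function names are arbitrary illustrative choices.

```python
import math

def normal_expectation(g, mean, sd, n=4000, width=12.0):
    """Approximate E[g(Y)], Y ~ N(mean, sd^2), by composite Simpson on mean +/- width*sd."""
    lo, hi = mean - width * sd, mean + width * sd
    h = (hi - lo) / n  # n must be even for Simpson's rule

    def f(y):
        dens = math.exp(-((y - mean) ** 2) / (2.0 * sd * sd)) / (math.sqrt(2.0 * math.pi) * sd)
        return g(y) * dens

    s = f(lo) + f(hi) + sum((4 if i % 2 else 2) * f(lo + i * h) for i in range(1, n))
    return s * h / 3.0

def w(y):
    """Weight function w(y) = y^2 + 1 from the proof of Proposition 5.1."""
    return y * y + 1.0

def rates_over_kappa(x, beta):
    """(int w dq, int w^2 dq) divided by kappa(x, a) for the kernel (5.1)."""
    first = normal_expectation(w, x, beta) - w(x)
    second = normal_expectation(lambda y: w(y) ** 2, x, beta) - w(x) ** 2
    return first, second

for x, beta in [(0.0, 0.5), (1.5, 1.0), (-3.0, 2.0)]:
    first, second = rates_over_kappa(x, beta)
    # Compare with beta^2 and 3 beta^4 + 6 beta^2 x^2 + 2 beta^2 from the proof.
    print(x, beta, first, beta ** 2, second, 3 * beta ** 4 + 6 * beta ** 2 * x ** 2 + 2 * beta ** 2)
```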

Remark 5.1

In Example 5.1, the state space is a Borel space and the transition and cost rates may be unbounded. Hence, the conditions in [5, 8, 15] are not satisfied: [5, 8] require bounded cost and transition rates, and [15] requires a denumerable state space.

6 Conclusions

In this paper, we have investigated continuous-time Markov decision processes under the risk-sensitive first passage discounted cost criterion, with Borel state and action spaces and possibly unbounded cost and transition rates. We have introduced a new value iteration to establish, under suitable conditions, the existence of a solution to the risk-sensitive first passage discounted cost optimality equation. Moreover, using the Feynman–Kac formula, we have proved that the risk-sensitive first passage discounted cost optimal value function is the unique solution to this optimality equation. In addition, we have obtained the existence of a deterministic Markov optimal policy within the class of randomized history-dependent policies.