1 Introduction

Continuous-time Markov decision processes (CTMDPs) have wide applications in queueing systems, inventory management, telecommunications, epidemic control, etc.; see, for instance, Puterman (1994), Kitaev and Rykov (1995), Guo and Hernández-Lerma (2009), Guo et al. (2012) and the references therein. The average cost criterion is a common optimality criterion for CTMDPs; it includes the expected average cost and the risk-sensitive average cost criteria. For the expected average cost criterion, the decision-maker is assumed to be risk-neutral, and there is a vast literature; see, for instance, Puterman (1994), Kitaev and Rykov (1995), Guo and Hernández-Lerma (2009), Guo et al. (2012), Wei and Chen (2014) and the extensive references therein. For the risk-sensitive average cost criterion, the exponential utility function is employed to characterize the risk preferences of the decision-maker. More specifically, when the risk-sensitivity parameter of the exponential utility function takes positive (negative) values, the decision-maker is risk-averse (risk-seeking). Although the risk-sensitive average cost criterion for discrete-time MDPs has been widely studied (see, for instance, Cavazos-Cadena and Fernández-Gaucherand (2005), Cavazos-Cadena (2010) and Cavazos-Cadena and Hernández-Hernández (2011) for the countable state space and Jaśkiewicz (2007) and Di Masi and Stettner (2007) for the uncountable state space), only a handful of works address this criterion for CTMDPs. Ghosh and Saha (2014) and Wei and Chen (2016) investigate the risk-sensitive average cost criterion with a positive risk-sensitivity parameter for CTMDPs and obtain the existence of optimal policies via the optimality equation approach. Moreover, to the best of our knowledge, no existing work deals with the risk-sensitive average cost criterion for CTMDPs when the risk-sensitivity parameter is allowed to take negative values.

On the other hand, in real-world applications the risk preferences of the decision-maker may be described neither by the identity function nor by the exponential utility function. Besides these two functions, other utility functions can describe the risk preferences of the decision-maker, such as the logarithmic utility function and the power utility function. Thus, it is desirable to consider the average cost criterion induced by a general utility function. For discrete-time MDPs, Bäuerle and Rieder (2014) discuss the average cost criterion induced by the power utility function, and Cavazos-Cadena and Hernández-Hernández (2016) study the average cost criterion induced by a regular utility function, which is referred to as the U-average cost criterion for simplicity. The U-average cost criterion includes the expected average cost criterion induced by the identity function, the risk-sensitive average cost criterion induced by the exponential utility function and the average cost criterion induced by the logarithmic and power utility functions. For CTMDPs, as far as we can tell, the existing discussions of the average cost criterion focus only on the expected average cost and risk-sensitive average cost criteria.

In this paper we study the U-average cost criterion for CTMDPs. The state space is a finite set and the action space is a Borel space. Since the existence of optimal policies for the U-average cost criterion is closely connected with the risk-sensitive average cost criterion in which the risk-sensitivity parameter may take any nonzero value, we first investigate the risk-sensitive average cost criterion. Under the optimality conditions of the paper (i.e., the standard continuity-compactness condition and the irreducibility condition), we show that the simultaneous Doeblin condition holds (see Theorem 3.1). The simultaneous Doeblin condition plays a crucial role in establishing the existence of a solution to the risk-sensitive average cost optimality equation. Then we introduce an auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function (see Theorem 3.2). Based on Theorem 3.2, we show that the pair of the optimal value functions of the risk-sensitive average cost criterion and the risk-sensitive first passage criterion is a solution to the optimality equation of the risk-sensitive average cost criterion for any nonzero risk-sensitivity parameter (see Theorem 3.3). This generalizes the results in Ghosh and Saha (2014) and Wei and Chen (2016), which only allow the risk-sensitivity parameter to take positive values (see Remarks 3.3 and 3.4); the extension is nontrivial. Moreover, we prove that the optimal value function of the risk-sensitive average cost criterion is continuous with respect to the risk-sensitivity parameter (see Theorem 3.4). Finally, using the results on the expected average cost and risk-sensitive average cost criteria, we show that there exists an optimal deterministic stationary policy in the class of all randomized Markov policies for the U-average cost criterion. In addition, the optimal value function and the set of all optimal stationary policies for the U-average cost criterion coincide with those for the average cost criterion induced either by the identity function or by the exponential utility function with some risk-sensitivity parameter (see Theorem 4.1).

The rest of this paper is organized as follows. In Section 2, we introduce the control model and the optimality criterion. In Section 3, we present the results on the simultaneous Doeblin condition, the risk-sensitive first passage criterion and the risk-sensitive average cost criterion, whose proofs are given in Sections 5–7. In Section 4, we state and prove the main results on the U-average cost criterion. In Section 8, we conclude with some remarks.

2 Preliminaries

In this section, we introduce the control model and the average cost criterion induced by the regular utility function for the CTMDPs. The control model in this paper is given by

$$\mathcal{M}:=\{S, A, (A(i), i\in S), q(j|i,a), c(i,a)\}. $$
  • (i) The state space S is a finite set endowed with the discrete topology.

  • (ii) The action space A is a Borel space with the Borel σ-algebra \(\mathcal {B}(A)\).

  • (iii) The set of all admissible actions in state i∈S, denoted by A(i), is a Borel-measurable subset of A. Moreover, we set K:={(i,a)|i∈S, a∈A(i)}, which stands for the set of all admissible state-action pairs.

  • (iv) The real-valued measurable transition rate q(j|i,a) satisfies q(j|i,a)≥0 for all (i,a)∈K and j≠i, and is conservative (i.e., \({\sum }_{j\in S}q(j|i,a)=0\) for all (i,a)∈K).

  • (v) The positive real-valued cost rate function c(i,a) is measurable in a∈A(i) for each i∈S.

The evolution of a CTMDP is intuitively described as follows. The state of the dynamical system is observed continuously by a decision-maker. When the system occupies state i∈S, the decision-maker takes an action a from the set of all admissible actions A(i). As a result of this action, a cost is incurred at the rate c(i,a), and the system stays at state i for a random time following the exponential distribution and then jumps to a new state j≠i according to some distribution (see Proposition B.8 in Guo and Hernández-Lerma (2009, p.205) for the explicit expressions of the corresponding distributions). When the system moves to the state j, the above procedure is repeated.

Below we formally give a mathematical description.

Let \(S_{\infty }:=S\cup \{i_{\infty }\}\) with an isolated point \(i_{\infty }\notin S\), \(\mathbb {R}_{+}:=(0,+\infty )\), \({\Omega }^{0}:=(S\times \mathbb {R}_{+})^{\infty }\), \({\Omega }:={\Omega }^{0}\cup \{(i_{0}, \theta _{1},i_{1}, \ldots , \theta _{m-1},i_{m-1}, \infty , i_{\infty },\infty , i_{\infty },\ldots )| i_{0}\in S, \ i_{l}\in S, \ \theta _{l}\in \mathbb {R}_{+} \ \text {for \ each} \ 1\leq l\leq m-1, \ m\geq 2\}\), and \(\mathcal {F}\) be the Borel σ-algebra of Ω. For each \(\omega =(i_{0},\theta _{1},i_{1},\ldots )\in {\Omega }\), define \(X_{0}(\omega ):=i_{0}\), \(T_{0}(\omega ):=0\), \(X_{m}(\omega ):=i_{m}\), \(T_{m}(\omega ):=\theta _{1}+\theta _{2}+\cdots +\theta _{m}\) for m≥1, \(T_{\infty }(\omega ):=\lim _{m\to \infty }T_{m}(\omega )\), and the state process

$$\xi_{t}(\omega):={\sum}_{m\geq0}I_{\{T_{m}\leq t<T_{m+1}\}}i_{m}+I_{\{t\geq T_{\infty}\}}i_{\infty}\ \ \text{for} \ t\geq0, $$

where \(I_{D}\) denotes the indicator function of a set D. The process after \(T_{\infty }\) is regarded as absorbed in the state \(i_{\infty }\). Hence, we write \(q(i_{\infty }|i_{\infty }, a_{\infty })=0\), \(c(i_{\infty }, a_{\infty })=0\), \(A(i_{\infty }):=\{a_{\infty }\}\), \(A_{\infty }:=A\cup \{a_{\infty }\}\), where \(a_{\infty }\) is an isolated point. Let \(\mathcal {F}_{t}:=\sigma (\{T_{m}\leq s, X_{m}=i\}: i\in S, s\leq t, m\geq 0)\) for t≥0, \(\mathcal {F}_{s-}:=\bigvee _{0\leq t<s}\mathcal {F}_{t}\), and \(\mathcal {P}:=\sigma (\{D\times \{0\}, D\in \mathcal {F}_{0}\}\cup \{D\times (s,\infty ), D\in \mathcal {F}_{s-}, s>0\})\) which denotes the σ-algebra of predictable sets on \({\Omega }\times [0,\infty )\) related to \(\{\mathcal {F}_{t}\}_{t\geq 0}\).

Before giving the optimality criterion, we need to introduce the following definition of a randomized Markov policy.

Definition 2.1

A \(\mathcal {P}\)-measurable transition probability π(⋅|ω,t) on \((A_{\infty }, \mathcal {B}(A_{\infty }))\), concentrated on \(A(\xi _{t}(\omega ))\), is called a randomized Markov policy if there exists a kernel φ on \(A_{\infty }\) given \(S_{\infty }\times [0,\infty )\) such that \(\pi (\cdot |\omega ,t)=\varphi (\cdot |\xi _{t}(\omega ),t)\). A policy π is said to be deterministic stationary if there exists a function f on \(S_{\infty }\) satisfying f(i)∈A(i) for all \(i\in S_{\infty }\) and \(\pi (\cdot |\omega , t)=\delta _{f(\xi _{t-}(\omega ))}(\cdot )\), where \(\delta _{x}(\cdot )\) is the Dirac measure concentrated at the point x.

Let Π and F be the set of all randomized Markov policies and the set of all deterministic stationary policies, respectively.

Given an arbitrary initial state i∈S and any policy π∈Π, employing Theorem 4.27 in Kitaev and Rykov (1995), we obtain the existence of a unique probability measure denoted by \(P_{i}^{\pi }\) on \(({\Omega },\mathcal {F})\). The notation \(E_{i}^{\pi }\) represents the expectation operator with respect to \(P_{i}^{\pi }\).

Let \(\mathcal {U}\) be the set of all the real-valued utility functions U on \(\mathbb {R}_{+}\) satisfying the following properties: (i) U has continuous derivatives up to second order; (ii) the first derivative \(U^{\prime }(x)\) is positive for all \(x\in \mathbb {R}_{+}\). For any \(U\in \mathcal {U}\), the Arrow-Pratt risk-sensitivity function \(\mathcal {A}_{U}\) is defined by \(\mathcal {A}_{U}(x):=\frac {U^{\prime \prime }(x)}{U^{\prime }(x)}\) for all \(x\in \mathbb {R}_{+}\), where \(U^{\prime \prime }\) denotes the second derivative of U. Below we give the definition of a regular utility function in Cavazos-Cadena and Hernández-Hernández (2016).

Definition 2.2

A utility function \(U\in \mathcal {U}\) is said to be regular if \(\lambda _{U}:= \lim _{x\to \infty }\mathcal {A}_{U}(x)\) exists in \(\mathbb {R}:=(-\infty ,\infty )\). The constant \(\lambda _{U}\) is called the asymptotic risk-sensitivity parameter of the regular utility function U.

Let \(\mathcal {U}_{r}\) be the set of all the regular utility functions in \(\mathcal {U}\). For any \(U\in \mathcal {U}_{r}\), i∈S and π∈Π, the average cost criterion induced by the regular utility function U is defined by

$$ J_{U}(i,\pi):=\limsup_{T\to\infty}\frac{1}{T}U^{-1}\left(E_{i}^{\pi}\left [U\left({{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt\right)\right]\right), $$
(2.1)

where \(U^{-1}\) denotes the inverse function of U. In the following, we refer to the average cost criterion defined in Eq. 2.1 as the U-average cost criterion for simplicity.

Remark 2.1

Let \(Y:={{\int }_{0}^{T}}{\int }_{A}c(\xi _{t},a)\pi (da|\xi _{t},t)dt\) be the total cost incurred during the finite time interval [0,T]. The quantity \(U^{-1}(E_{i}^{\pi }[U(Y)])\) is the certainty equivalent of Y with respect to the utility function U: the decision-maker is indifferent between paying the random cost Y and paying its certainty equivalent; see the detailed discussions in Cavazos-Cadena and Fernández-Gaucherand (2005), Cavazos-Cadena (2010), Cavazos-Cadena and Hernández-Hernández (2011), Bäuerle and Rieder (2014), and Cavazos-Cadena and Hernández-Hernández (2016).
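To make the certainty equivalent concrete, the following minimal Python sketch evaluates \(U^{-1}(E[U(Y)])\) for a two-point random cost under three utility functions (identity, exponential and logarithmic). The cost distribution, the parameter value 0.5 and the helper name `certainty_equivalent` are illustrative assumptions and are not taken from the paper.

```python
import math

def certainty_equivalent(outcomes, probs, utility, inverse_utility):
    """Return U^{-1}(E[U(Y)]) for a discrete random cost Y (hypothetical helper)."""
    expected_utility = sum(p * utility(y) for y, p in zip(outcomes, probs))
    return inverse_utility(expected_utility)

# Illustrative two-point cost: Y = 1 or Y = 3, each with probability 1/2.
outcomes, probs = [1.0, 3.0], [0.5, 0.5]
lam = 0.5  # assumed risk-sensitivity parameter for the exponential utility

# Identity utility: the certainty equivalent is the mean cost.
ce_identity = certainty_equivalent(outcomes, probs, lambda x: x, lambda u: u)

# Exponential utility e^{lam x}, whose inverse is ln(u) / lam.
ce_exponential = certainty_equivalent(
    outcomes, probs,
    lambda x: math.exp(lam * x),
    lambda u: math.log(u) / lam,
)

# Logarithmic utility ln x, whose inverse is e^u.
ce_logarithmic = certainty_equivalent(outcomes, probs, math.log, math.exp)

print(ce_logarithmic, ce_identity, ce_exponential)
```

By Jensen's inequality, the convex exponential utility yields a certainty equivalent above the mean cost, while the concave logarithmic utility yields one below it.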

Definition 2.3

A policy \(\pi ^{*}\in {\Pi }\) is said to be U-average optimal if

$$J_{U}(i,\pi^{*})=\inf_{\pi\in{\Pi}}J_{U}(i,\pi)=:J_{U}^{*}(i) $$

for all i∈S. The function \(J^{*}_{U}\) on S is referred to as the optimal value function of the U-average cost criterion.

Finally, we introduce the expected average cost and risk-sensitive average cost criteria which are the particular cases of the U-average cost criterion and play an important role in proving the existence of U-average optimal policies.

For each real number \(\lambda \in \mathbb {R}\), define the real-valued function \(V_{\lambda }\) on \(\mathbb {R}_{+}\) as follows:

$$\begin{array}{@{}rcl@{}} V_{\lambda}(x)=\left\{\begin{array}{ll} e^{\lambda x}, &\text{ if } \lambda>0,\\ x, & \text{ if } \lambda=0,\\ -e^{\lambda x}, &\text{ if } \lambda<0, \end{array}\right. \end{array} $$
(2.2)

for all \(x\in \mathbb {R}_{+}\). It is obvious that \(V_{\lambda }\) belongs to \(\mathcal {U}_{r}\) for all \(\lambda \in \mathbb {R}\). Then for each λ≠0, i∈S and π∈Π, by Eq. 2.1 we have

$$J_{V_{\lambda}}(i,\pi)=\limsup_{T\to\infty}\frac{1}{\lambda T}\ln E_{i}^{\pi}\left[e^{{\lambda{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt}\right], $$

which is induced by the exponential utility function and called the risk-sensitive average cost criterion in Ghosh and Saha (2014) and Wei and Chen (2016). For λ=0, Eq. 2.1 gives

$$J_{V_{0}}(i,\pi)=\limsup_{T\to\infty}\frac{1}{T}E_{i}^{\pi}\left[{{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt\right] $$

for all i∈S and π∈Π, which is induced by the identity function and referred to as the expected average cost criterion; see, for instance, Puterman (1994), Kitaev and Rykov (1995), Guo and Hernández-Lerma (2009), Guo et al. (2012), and Wei and Chen (2014). Hence, the \(V_{\lambda }\)-average cost criterion contains the expected average cost and risk-sensitive average cost criteria.

3 The \(V_{\lambda }\)-average cost criterion

In this section, we aim to give the optimality conditions for the existence of \(V_{\lambda }\)-average optimal policies and establish the existence of a solution to the optimality equation of the \(V_{\lambda }\)-average cost criterion. Since the existence of optimal policies and the optimality equation for the \(V_{0}\)-average cost criterion (i.e., the expected average cost criterion) have been well studied (see Puterman (1994), Kitaev and Rykov (1995), Guo and Hernández-Lerma (2009), and Wei and Chen (2014) and the references therein), we mainly focus on the \(V_{\lambda }\)-average cost criterion for all λ≠0 (i.e., the risk-sensitive average cost criterion) below. To do so, we introduce the following assumption from Wei and Chen (2016), i.e., the usual continuity-compactness condition and the irreducibility condition.

Assumption 3.1

  • (i) For each i∈S, the set A(i) is compact.

  • (ii) For each i,j∈S, the functions c(i,a) and q(j|i,a) are continuous in a∈A(i).

  • (iii) For each f∈F, the continuous-time Markov chain associated with the transition rate q(⋅|⋅,f) is irreducible, which means that for any two states i≠j, there exist different states \(j_{1}=i, j_{2},\ldots , j_{m}\) such that \(q(j_{2}|j_{1},f)\cdots q(j|j_{m},f)>0\), where we write q(j|i,f):=q(j|i,f(i)).

Remark 3.1

For each f∈F, f can be viewed as \({\prod }_{i\in S}f(i)\in {\prod }_{i\in S}A(i)\). Thus, by Assumption 3.1(i) and the Tychonoff theorem, we have that F is compact and metrizable.

Fix any state z∈S throughout the paper. Define

$$\tau_{z}:=\inf\{t\geq T_{1}:\xi_{t}=z\} \ \text{with \ the \ convention\ that} \ \inf\emptyset:=\infty. $$

Here \(\tau _{z}\) is the time of the first entry into the state z after the first transition has occurred. Under Assumption 3.1, the following result indicates that the simultaneous Doeblin condition (i.e., the statement of Theorem 3.1(b)) for the CTMDPs holds.

Theorem 3.1

Suppose that Assumption 3.1 is satisfied. Then the following assertions hold.

  • (a) There exists a constant \(L\in \mathbb {R}_{+}\) such that \({E_{i}^{f}}[\tau _{z}]\leq L\) for all i∈S and f∈F.

  • (b) There exist constants \(t_{0}\in \mathbb {R}_{+}\) and α∈(0,1) such that \({P_{i}^{f}}(\tau _{z}>t_{0})\leq \alpha \) for all i∈S and f∈F.

Proof

See Section 5. □

Remark 3.2

The assertion of part (a) is equivalent to that of part (b). Indeed, from the proof of Theorem 3.1, we see that part (a) implies part (b). On the other hand, suppose that part (b) holds. Then by an induction argument, we have \({P_{i}^{f}}(\tau _{z}>mt_{0})\leq \alpha ^{m}\) for all i∈S, f∈F and m=1,2,…. Thus, employing the last inequality and Lemma 3.4 in Kallenberg (2012, p.49), we obtain

$${E_{i}^{f}}[\tau_{z}]={\int}_{0}^{\infty}{P_{i}^{f}}(\tau_{z}>t)dt={\sum}_{m=0}^{\infty}{\int}_{mt_{0}}^{(m+1)t_{0}}{P_{i}^{f}}(\tau_{z}>t)dt\leq t_{0}{\sum}_{m=0}^{\infty}{P_{i}^{f}}(\tau_{z}>mt_{0})\leq\frac{t_{0}}{1-\alpha} $$

for all i∈S and f∈F. Hence, part (b) implies that part (a) holds with \(L=\frac {t_{0}}{1-\alpha }\).
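The bound in Remark 3.2 can be checked numerically on a toy case. The sketch below assumes, purely for illustration, that \(\tau _{z}\) is exponentially distributed with rate `rate`, so that its tail probability at time t is exp(-rate·t); this distribution and all variable names are hypothetical and do not come from the paper.

```python
import math

rate = 2.0   # assumed rate of an exponential hitting time (illustration only)
t0 = 0.5     # assumed time threshold playing the role of t_0 in part (b)
alpha = math.exp(-rate * t0)   # P(tau_z > t0) for this toy distribution

# Geometric tail bound P(tau_z > m * t0) <= alpha**m; it holds with equality
# here, so a tiny tolerance absorbs floating-point rounding.
tails_ok = all(
    math.exp(-rate * m * t0) <= alpha ** m + 1e-15 for m in range(1, 20)
)

exact_mean = 1.0 / rate        # E[tau_z] for an exponential random variable
bound = t0 / (1.0 - alpha)     # the constant L = t0 / (1 - alpha)

print(exact_mean, bound)
```

In this toy case the exact mean stays below the constant \(t_{0}/(1-\alpha )\), as the chain of inequalities predicts.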

To obtain the existence of a solution to the optimality equation of the risk-sensitive average cost criterion, we need to introduce the following auxiliary risk-sensitive first passage optimization problem which is of interest on its own.

For each i∈S and f∈F, we set c(i,f):=c(i,f(i)). For each \(g\in \mathbb {R}\), λ≠0, i∈S and f∈F, the risk-sensitive first passage criterion \(h_{g,\lambda }(i,f)\) and the corresponding optimal value function \(h^{*}_{g,\lambda }(\cdot )\) on S are given by

$$ h_{g,\lambda}(i,f):=\frac{1}{\lambda}\ln {E_{i}^{f}}\left[e^{\lambda{\int}_{0}^{\tau_{z}}\left(c(\xi_{t},f)-g\right)dt}\right] \ \ \text{ and } \ \ h_{g,\lambda}^{*}(i):=\inf_{f\in F}h_{g,\lambda}(i,f), $$
(3.1)

respectively. Let \(G_{\lambda }:=\left \{g\in \mathbb {R}:h^{*}_{g,\lambda }(z)\leq 0\right \}\) for all λ>0 and \(G_{\lambda }:=\left \{g\in \mathbb {R}:h^{*}_{g,\lambda }(z)\geq 0\right \}\) for all λ<0. Moreover, we define

$$\begin{array}{@{}rcl@{}} g^{*}_{\lambda}:=\left\{\begin{array}{ll} \inf G_{\lambda}, &\textrm{if \(\lambda>0\)},\\ \sup G_{\lambda}, &\textrm{if \(\lambda<0\)}. \end{array}\right. \end{array} $$
(3.2)

Then we have the following assertions on the risk-sensitive first passage criterion.

Theorem 3.2

Under Assumption 3.1, the following statements hold for all λ≠0.

  • (a) The set \(G_{\lambda }\) is nonempty.

  • (b) For each \(g\in \mathbb {R}\) and f∈F, the function \(h_{g,\lambda }(\cdot ,f)\) on S satisfies the following equations

    $$\begin{array}{@{}rcl@{}} \left\{\begin{array}{ll} e^{\lambda h_{g,\lambda}(i,f)}=Q(i,f,g,\lambda)\left(q(z|i,f)+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h_{g,\lambda}(j,f)}q(j|i,f)\right) \\ e^{\lambda h_{g,\lambda}(z,f)}=Q(z,f,g,\lambda) {\sum}_{j\in S\setminus\{z\}}e^{\lambda h_{g,\lambda}(j,f)}q(j|z,f) \end{array}\right. \end{array} $$

    for all i∈S∖{z}, where we set \(Q(i,f,g, \lambda ):={\int }_{0}^{\infty }e^{\lambda (c(i,f)-g)s+q(i|i,f)s}ds\) and make a convention that \(0\cdot \infty :=0\).

  • (c) For each \(g\in G_{\lambda }\), the function \(h_{g,\lambda }^{*}\) on S satisfies the following equations

    $$ \left\{\begin{array}{ll} e^{\lambda h^{*}_{g,\lambda}(i)}=sgn(\lambda)\inf\limits_{a\in A(i)}\left\{sgn(\lambda)Q(i,a,g,\lambda)\left(q(z|i,a)+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|i,a)\right)\right\}\\ e^{\lambda h^{*}_{g,\lambda}(z)}=sgn(\lambda)\inf\limits_{a\in A(z)}\left\{sgn(\lambda)Q(z,a,g,\lambda){\sum}_{j\in S\setminus\{z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|z,a)\right\} \end{array}\right. $$
    (3.3)

    for all i∈S∖{z}, where we set \(Q(i,a,g,\lambda ):={\int }_{0}^{\infty }e^{\lambda (c(i,a)-g)s+q(i|i,a)s}ds\) and sgn(λ) is the sign function, i.e., if λ>0, sgn(λ)=1; if λ<0, sgn(λ)=−1. Moreover, there exists a policy \(f^{*}_{g,\lambda }\in F\) with \(f^{*}_{g,\lambda }(i)\in A(i)\) attaining the minimum of Eq. 3.3, and for any \(f^{*}_{g,\lambda }\in F\) with \(f^{*}_{g,\lambda }(i)\in A(i)\) attaining the minimum of Eq. 3.3, we have \(h_{g,\lambda }(i,f^{*}_{g,\lambda })=h^{*}_{g,\lambda }(i)\in \mathbb {R}\) and \(Q(i,f^{*}_{g,\lambda },g,\lambda )<\infty \) for all i∈S.

  • (d) We have \(g^{*}_{\lambda }\in G_{\lambda }\) and \(h^{*}_{g^{*}_{\lambda },\lambda }(z)=0\).

Proof

See Section 6. □
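For intuition, the factor \(Q(i,a,g,\lambda )\) has a closed form: the integral equals \(1/(-q(i|i,a)-\lambda (c(i,a)-g))\) when the exponent \(\lambda (c(i,a)-g)+q(i|i,a)\) is negative and \(+\infty \) otherwise. The Python sketch below uses this to evaluate the equations of Theorem 3.2(b) for a hypothetical two-state model with S={0,1}, z=0 and a single admissible action per state (so that \(h^{*}_{g,\lambda }=h_{g,\lambda }\)), and then locates by bisection the value of g at which \(h_{g,\lambda }(z)=0\), as in part (d). The rates, costs and function names are illustrative assumptions, not part of the paper.

```python
import math

def q_factor(cost, g, lam, q_ii):
    """Closed form of Q = int_0^inf exp((lam*(cost - g) + q_ii) * s) ds."""
    exponent = lam * (cost - g) + q_ii
    return math.inf if exponent >= 0 else -1.0 / exponent

def first_passage_value_at_z(g, lam, a=1.0, b=2.0, c0=1.0, c1=4.0):
    """h_{g,lam}(z) for S = {0, 1}, z = 0, with q(1|0) = a and q(0|1) = b."""
    # State 1 can only jump to z, so exp(lam*h(1)) = Q(1) * q(z|1).
    exp_h1 = q_factor(c1, g, lam, -b) * b
    # State z can only jump to 1, so exp(lam*h(z)) = Q(z) * exp(lam*h(1)) * q(1|z).
    exp_h0 = q_factor(c0, g, lam, -a) * a * exp_h1
    return math.log(exp_h0) / lam

def solve_g(lam, lo=1.0, hi=4.0, iterations=200):
    """Bisection for the root of g -> h_{g,lam}(z), which is decreasing in g
    for this toy model; [lo, hi] is the range of the assumed cost rates."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if first_passage_value_at_z(mid, lam) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

g_pos = solve_g(lam=1.0)    # risk-averse case
g_neg = solve_g(lam=-1.0)   # risk-seeking case
print(g_pos, g_neg)
```

When Q is infinite, the sketch returns \(+\infty \) for λ>0 and \(-\infty \) for λ<0, so in both cases the root of \(g\mapsto h_{g,\lambda }(z)\) separates \(G_{\lambda }\) from its complement in this toy model.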

Remark 3.3

The equations in Eq. 3.3 are referred to as the optimality equations of the risk-sensitive first passage criterion. The statements of Theorem 3.2 hold for an arbitrary risk-sensitivity parameter λ≠0 and extend the results in Wei and Chen (2016) for any λ>0. Moreover, as can be seen in the proof of Theorem 3.2, the treatment of the case λ<0 is more difficult than that of the case λ>0. Hence, the extension is nontrivial.

Let B(S) be the set of all real-valued functions on S. Below we state the optimality equation and the existence of optimal policies for the risk-sensitive average cost criterion.

Theorem 3.3

Suppose that Assumption 3.1 is satisfied. For each λ≠0, let \(g^{*}_{\lambda }\) and \(h^{*}_{g^{*}_{\lambda },\lambda }\) be as in Eqs. 3.1 and 3.2. Then the following assertions hold.

  • (a) The pair \((g^{*}_{\lambda }, h^{*}_{g^{*}_{\lambda },\lambda })\in \mathbb {R}\times B(S)\) satisfies the following optimality equation

    $$ \lambda g^{*}_{\lambda} e^{\lambda h^{*}_{g^{*}_{\lambda},\lambda}(i)}=sgn(\lambda)\inf\limits_{a\in A(i)}\left\{sgn(\lambda)\left(\lambda c(i,a)e^{\lambda h^{*}_{g^{*}_{\lambda},\lambda}(i)}+{\sum}_{j\in S} e^{\lambda h^{*}_{g^{*}_{\lambda},\lambda}(j)}q(j|i,a)\right)\right\} $$
    (3.4)

    for all i∈S. Moreover, there exists \(f^{*}_{\lambda }\in F\) with \(f^{*}_{\lambda }(i)\in A(i)\) attaining the minimum of Eq. 3.4.

  • (b) For any \(f^{*}_{\lambda }\in F\) with \(f^{*}_{\lambda }(i)\in A(i)\) attaining the minimum of Eq. 3.4, we have

    $$J^{*}_{V_{\lambda}}(i)=J_{V_{\lambda}}(i,f^{*}_{\lambda})=\lim_{T\to\infty}\frac{1}{\lambda T}\ln E_{i}^{f^{*}_{\lambda}}\left[e^{{\lambda{\int}_{0}^{T}}c(\xi_{t},f^{*}_{\lambda})dt}\right]=g^{*}_{\lambda}$$

    for all i∈S. Hence, the policy \(f^{*}_{\lambda }\) is \(V_{\lambda }\)-average optimal.

Proof

The assertions follow from Theorem 3.2 above, the Feynman-Kac formula and techniques similar to those used in the proof of Theorem 3.2 in Wei and Chen (2016). □

Remark 3.4

Theorem 3.3 establishes the existence of a solution to the optimality equation and the existence of an optimal stationary policy for the \(V_{\lambda }\)-average cost criterion with an arbitrary risk-sensitivity parameter λ≠0, which generalizes the corresponding results in Ghosh and Saha (2014) and Wei and Chen (2016). More precisely, the risk-sensitivity parameter λ is positive in both Ghosh and Saha (2014) and Wei and Chen (2016), and in Ghosh and Saha (2014) it must also satisfy the additional condition \(\lambda \max _{(i,a)\in K}c(i,a)<b\) for some constant b>0. Moreover, to the best of our knowledge, the risk-sensitive average cost criterion with a negative risk-sensitivity parameter has not been studied in the existing literature.

Finally, we give the following statement on the continuity of \(J^{*}_{V_{\lambda }}(i)\) in \(\lambda \in \mathbb {R}\), which plays a crucial role in the study on the existence of U-average optimal policies.

Theorem 3.4

Suppose that Assumption 3.1 holds. Then for each i∈S, \(J^{*}_{V_{\lambda }}(i)\) is continuous in \(\lambda \in \mathbb {R}\).

Proof

See Section 7. □
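For a fixed stationary policy f, Eq. 3.4 reads \((Q_{f}+\lambda \,\mathrm {diag}(c_{f}))v=\lambda g v\) with \(v(i)=e^{\lambda h(i)}>0\), so \(\lambda g\) is the largest real eigenvalue of the matrix \(Q_{f}+\lambda \,\mathrm {diag}(c_{f})\), where \(Q_{f}\) is the transition rate matrix and \(c_{f}\) the cost vector under f. This is a standard Perron-Frobenius observation added here for illustration, not a statement taken from the paper. The Python sketch below evaluates it in closed form for a hypothetical two-state chain and compares the values for small |λ| with the expected average cost, in the spirit of Theorem 3.4. All rates, costs and function names are assumptions.

```python
import math

def risk_sensitive_average_cost(lam, a=1.0, b=2.0, c0=1.0, c1=4.0):
    """g = rho / lam, where rho is the largest eigenvalue of Q + lam * diag(c)
    for a two-state chain with q(1|0) = a and q(0|1) = b (requires lam != 0)."""
    m00 = -a + lam * c0
    m11 = -b + lam * c1
    # Largest root of x^2 - (m00 + m11) x + (m00 * m11 - a * b) = 0; the
    # discriminant equals (m00 - m11)^2 + 4ab > 0, so the root is real.
    rho = 0.5 * (m00 + m11 + math.sqrt((m00 - m11) ** 2 + 4.0 * a * b))
    return rho / lam

def expected_average_cost(a=1.0, b=2.0, c0=1.0, c1=4.0):
    """Long-run mean cost; the invariant measure is (b, a) / (a + b)."""
    return (b * c0 + a * c1) / (a + b)

g_zero = expected_average_cost()
values = {lam: risk_sensitive_average_cost(lam) for lam in (-1.0, -1e-4, 1e-4, 1.0)}
for lam, g in values.items():
    print(lam, g)
print("lambda = 0:", g_zero)
```

In this toy chain the risk-sensitive values approach the expected average cost from both sides as λ tends to zero, and they increase with λ.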

4 The existence of U-average optimal policies

In this section, we show the existence of optimal policies for the U-average cost criterion induced by a regular utility function U with the asymptotic risk-sensitivity parameter \(\lambda _{U}\).

Below we state the main results on the U-average cost criterion.

Theorem 4.1

Suppose that Assumption 3.1 is satisfied. Let \(g^{*}_{\lambda _{U}}\) be as in Eq. 3.2 with \(\lambda _{U}\) in lieu of λ for all \(\lambda _{U}\neq 0\) and \(g^{*}_{0}:=\lim _{\lambda \to 0}g^{*}_{\lambda }\). Then the following assertions hold.

  • (a) \(J^{*}_{U}(i)=J^{*}_{V_{\lambda _{U}}}(i)=g^{*}_{\lambda _{U}}\) for all i∈S.

  • (b) For each f∈F and i∈S, \(J_{U}(i,f)=J_{V_{\lambda _{U}}}(i,f)=\lim _{T\to \infty }\frac {1}{\lambda _{U} T}\ln {E_{i}^{f}}\left [e^{\lambda _{U}{{\int }_{0}^{T}}c(\xi _{t},f)dt}\right ]\) for all \(\lambda _{U}\neq 0\) and \(J_{U}(i,f)=J_{V_{0}}(i,f)=\lim _{T\to \infty }\frac {1}{T}{E_{i}^{f}}\left [{{\int }_{0}^{T}}c(\xi _{t},f)dt\right ]\) for \(\lambda _{U}=0\). Moreover, the limits are independent of the state i∈S.

  • (c) For any \(\lambda _{U}\neq 0\) and \(f^{*}_{\lambda _{U}}\in F\) attaining the minimum of Eq. 3.4 with \(\lambda _{U}\) in lieu of λ, we have \(J^{*}_{U}(i)=J_{U}(i,f^{*}_{\lambda _{U}})=g^{*}_{\lambda _{U}}\) for all i∈S. For \(\lambda _{U}=0\), there exist \((g^{*}_{0},h_{0})\in \mathbb {R}\times B(S)\) and a policy \(f^{*}_{0}\in F\) satisfying

    $$\begin{array}{@{}rcl@{}} g^{*}_{0}&=&\inf_{a\in A(i)}\left\{c(i,a)+{\sum}_{j\in S}h_{0}(j)q(j|i,a)\right\} \end{array} $$
    (4.1)
    $$\begin{array}{@{}rcl@{}} &=&c(i,f^{*}_{0})+{\sum}_{j\in S}h_{0}(j)q(j|i,f^{*}_{0}) \end{array} $$

    for all i∈S. Moreover, for any \(f^{*}_{0}\in F\) attaining the minimum of Eq. 4.1, we have \(J^{*}_{U}(i)=J_{U}(i,f^{*}_{0})=g^{*}_{0}\) for all i∈S. Hence, the policy \(f^{*}_{\lambda _{U}}\) is U-average optimal.

Proof

  • (a) Fix any i∈S, π∈Π and η>0. The relation \(\lim _{x\to \infty }\mathcal {A}_{U}(x)=\lambda _{U}\) implies that there exists a positive constant \(x_{0}\) satisfying

    $$ \lambda_{U}-\eta\leq\mathcal{A}_{U}(x)\leq \lambda_{U}+\eta \ \ \text{for \ all} \ x>x_{0}. $$
    (4.2)

    Note that \(\min _{(i,a)\in K}c(i,a)>0\). Thus, there exists a positive constant \(T^{*}\) such that

    $$Y:={{\int}_{0}^{T}}{\int}_{A} c(\xi_{t},a)\pi(da|\xi_{t},t)dt> x_{0} \ \ \text{for \ all} \ T> T^{*}. $$

    Let \(V_{\lambda _{U}-\eta }\) and \(V_{\lambda _{U}+\eta }\) be the regular utility functions given by Eq. 2.2. Then we have

    $$ \mathcal{A}_{V_{\lambda_{U}-\eta}}(x)=\lambda_{U}-\eta \ \ \text{and} \ \ \mathcal{A}_{V_{\lambda_{U}+\eta}}(x)=\lambda_{U}+\eta \ \ \text{for \ all} \ x\in\mathbb{R}_{+}. $$
    (4.3)

    Moreover, direct calculations yield that \(E_{i}^{\pi }[U(Y)]\), \(E_{i}^{\pi }[V_{\lambda _{U}-\eta }(Y)]\) and \(E_{i}^{\pi }[V_{\lambda _{U}+\eta }(Y)]\) are finite. Thus, employing Theorem 4.1 in Cavazos-Cadena and Hernández-Hernández (2016) together with Eqs. 4.2 and 4.3, we obtain \(J_{V_{\lambda _{U}-\eta }}(i,\pi )\leq J_{U}(i,\pi )\leq J_{V_{\lambda _{U}+\eta }}(i,\pi )\), which gives

    $$ J^{*}_{V_{\lambda_{U}-\eta}}(i)\leq J^{*}_{U}(i)\leq J^{*}_{V_{\lambda_{U}+\eta}}(i). $$
    (4.4)

    Hence, letting η→0 in Eq. 4.4 and using Theorems 3.3(b) and 3.4, we get the desired result.

  • (b) Fix any f∈F. Let \(\mathcal {M}_{f}\) be the control model in which we take A(i)={f(i)} for all i∈S and the other components are the same as in the model \(\mathcal {M}\). Then it is obvious that the model \(\mathcal {M}_{f}\) satisfies Assumption 3.1. Thus, part (b) follows directly from part (a), Lemma 3.1(b) in Guo and Hernández-Lerma (2009) and Theorem 3.3.

  • (c) By part (a), Theorem 7.8 in Guo and Hernández-Lerma (2009) and Theorem 3.3 we have \(J^{*}_{U}(i)=J^{*}_{V_{\lambda _{U}}}(i)=J_{V_{\lambda _{U}}}(i,f^{*}_{\lambda _{U}})=g^{*}_{\lambda _{U}}\) for all i∈S, which together with part (b) implies the assertion.

Remark 4.1

  • (a) Theorem 4.1 indicates that the optimal value function of the U-average cost criterion induced by a regular utility function U with the asymptotic risk-sensitivity parameter \(\lambda _{U}\) is a constant and equals the optimal value function of the \(V_{\lambda _{U}}\)-average cost criterion. Moreover, the set of all U-average optimal stationary policies coincides with the set of all \(V_{\lambda _{U}}\)-average optimal stationary policies. Hence, we can compute a U-average optimal policy and the optimal value function of the U-average cost criterion via the policy iteration algorithms given in Ghosh and Saha (2014) for the risk-sensitive average cost criterion with the risk-sensitivity parameter \(\lambda _{U}\neq 0\) or in Guo and Hernández-Lerma (2009) for the expected average cost criterion with \(\lambda _{U}=0\).

  • (b) Besides the risk-sensitive average cost criterion induced by the exponential utility function and the expected average cost criterion induced by the identity function, the U-average cost criterion includes other average cost criteria, such as those induced by the logarithmic utility function \(W(x)=\ln x\) and the power utility function \(U_{\beta }(x):=x^{\beta }\) (β>0) for all \(x\in \mathbb {R}_{+}\). Obviously, we have that the utility functions W and \(U_{\beta }\) are regular with the asymptotic risk-sensitivity parameters \(\lambda _{W}=\lambda _{U_{\beta }}=0\). Thus, Theorem 4.1 implies \(J^{*}_{W}(i)=J^{*}_{U_{\beta }}(i)=g_{0}^{*}\) for all i∈S. That is, under Assumption 3.1, the optimal value functions \(J^{*}_{W}\) and \(J^{*}_{U_{\beta }}\) are independent of the state variable, and equal the optimal value function of the expected average cost criterion which is risk-neutral.
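The asymptotic risk-sensitivity parameters of the logarithmic and power utility functions follow from a direct computation of the Arrow-Pratt function of Section 2; the short derivation below is added for illustration. For all \(x\in \mathbb {R}_{+}\),

$$\mathcal{A}_{W}(x)=\frac{W^{\prime\prime}(x)}{W^{\prime}(x)}=\frac{-1/x^{2}}{1/x}=-\frac{1}{x}, \qquad \mathcal{A}_{U_{\beta}}(x)=\frac{\beta(\beta-1)x^{\beta-2}}{\beta x^{\beta-1}}=\frac{\beta-1}{x}, $$

and both expressions tend to 0 as \(x\to \infty \). By contrast, for the exponential utility function \(V_{\lambda }\) with λ≠0 the ratio is constant in x:

$$\mathcal{A}_{V_{\lambda}}(x)=\frac{\lambda^{2}e^{\lambda x}}{\lambda e^{\lambda x}}=\lambda \ \ \text{for} \ \lambda>0 \qquad \text{and} \qquad \mathcal{A}_{V_{\lambda}}(x)=\frac{-\lambda^{2}e^{\lambda x}}{-\lambda e^{\lambda x}}=\lambda \ \ \text{for} \ \lambda<0. $$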

5 Proof of Theorem 3.1

In this section, we give the proof of Theorem 3.1.

Proof

  • (a) By Assumption 3.1(iii) and the finiteness of S, for each f∈F, the continuous-time Markov chain associated with the transition rate q(⋅|⋅,f) has a unique invariant probability measure denoted by \(\mu _{f}\), which satisfies \({\sum }_{i\in S}q(j|i,f)\mu _{f}(i)=0\) for all j∈S. Below we show that \(\mu _{f}\) is continuous in f∈F. In fact, let \(\{f_{n},n\geq 1\}\subseteq F\) be an arbitrary sequence converging to f∈F. Note that \(0\leq \mu _{f_{n}}(j)\leq 1\) for all j∈S and n≥1. Fix any i∈S and choose any convergent subsequence \(\{\mu _{f_{n_{l}}}(i),l\geq 1\}\) of \(\{\mu _{f_{n}}(i),n\geq 1\}\). Let \(\lim _{l\to \infty }\mu _{f_{n_{l}}}(i)=:\mu (i)\). Moreover, there exists a subsequence of \(\{n_{l}\}\) (still denoted by \(\{n_{l}\}\)) such that \(\lim _{l\to \infty }\mu _{f_{n_{l}}}(j)=:\widetilde {\mu }(j)\) for all j∈S and \(\widetilde {\mu }(i)=\mu (i)\). Thus, we have \(0\leq \widetilde {\mu }(j)\leq 1\), \({\sum }_{j\in S}\widetilde {\mu }(j)=1\), and

    $${\sum}_{k\in S}q(j|k,f)\widetilde{\mu}(k)=\lim_{l\to\infty}{\sum}_{k\in S}q(j|k,f_{n_{l}})\mu_{f_{n_{l}}}(k)=0. $$

    Hence, by the uniqueness of the invariant probability measure, we obtain \(\widetilde {\mu }(j)=\mu _{f}(j)\) for all j∈S. Therefore, the continuity of \(\mu _{f}\) in f∈F follows from the fact that any convergent subsequence \(\{\mu _{f_{n_{l}}}(i),l\geq 1\}\) of \(\{\mu _{f_{n}}(i),n\geq 1\}\) has the same limit \(\mu _{f}(i)\). Set

    $$\widetilde{g}:=\sup_{f\in F}\frac{{E_{z}^{f}}\left[{\int}_{0}^{\tau_{z}}(1-I_{z}(\xi_{t}))dt\right]}{{E_{z}^{f}}[\tau_{z}]}. $$

    Then direct calculations give

    $$\widetilde{g}=\sup_{f\in F}\frac{{E_{z}^{f}}\left[{\int}_{T_{1}}^{\tau_{z}}(1-I_{z}(\xi_{t}))dt\right]}{{E_{z}^{f}}[\tau_{z}]}=\sup_{f\in F}\frac{{E_{z}^{f}}[\tau_{z}]-{E_{z}^{f}}[T_{1}]}{{E_{z}^{f}}[\tau_{z}]}=\sup_{f\in F}\left(1-\mu_{f}(z)\right), $$

    where the last equality is due to Proposition 2.1 in Anderson (1991, p.213) and Proposition B.8 in Guo and Hernández-Lerma (2009, p.205). Thus, by the compactness of F and the continuity of \(\mu _{f}(z)\) in f∈F, there exists \(\widetilde {f}\in F\) such that \(\widetilde {g}=1-\mu _{\widetilde {f}}(z)\). Moreover, by Assumption 3.1(iii), we have \(0<\mu _{\widetilde {f}}(z)\leq 1\), which implies \(0\leq \widetilde {g}<1\). Define

    $$\widetilde{h}(i):=E_{i}^{\widetilde{f}}\left[{\int}_{0}^{\tau_{z}}(1-I_{z}(\xi_{t})-\widetilde{g})dt\right] \ \ \text{for\ all} \ i\in S. $$

    Then we have

    $$ \widetilde{h}(z)=0 \ \ \text{and} \ \ \widetilde{h}(i)\geq0 \ \ \text{for \ all} \ i\in S\setminus\{z\}. $$
    (5.1)

    Next, we show that \(\widetilde {h}\in B(S)\) and

    $$ \widetilde{g}=\sup_{a\in A(i)}\left\{1-I_{z}(i)+{\sum}_{j\in S}\widetilde{h}(j)q(j|i,a)\right\} \ \ \text{for \ all} \ i\in S. $$
    (5.2)

    Indeed, for each i∈S∖{z}, by the strong Markov property, direct calculations yield

    $$\begin{array}{@{}rcl@{}} \widetilde{h}(i)&=&E_{i}^{\widetilde{f}}\left[{\int}_{0}^{\tau_{z}}(1-I_{z}(\xi_{t})-\widetilde{g})dtI_{\{\tau_{z}=T_{1}\}}\right]{\kern-.5pt}+{\kern-.5pt} E_{i}^{\widetilde{f}}\left[{\int}_{0}^{\tau_{z}}(1{\kern-.5pt}-{\kern-.5pt}I_{z}(\xi_{t}){\kern-.5pt}-{\kern-.5pt}\widetilde{g})dtI_{\{\tau_{z}>T_{1}\}}\right]\!\!\\ &=&(1-\widetilde{g})E_{i}^{\widetilde{f}}[T_{1}I_{\{\tau_{z}=T_{1}\}}]+(1-\widetilde{g})E_{i}^{\widetilde{f}}[T_{1}I_{\{\tau_{z}>T_{1}\}}]\\ &&+E_{i}^{\widetilde{f}}\left[I_{\{\tau_{z}>T_{1}\}}E_{i}^{\widetilde{f}}\left[{\int}_{T_{1}}^{\tau_{z}}(1-I_{z}(\xi_{t})-\widetilde{g})dt \big|\mathcal{F}_{T_{1}}\right]\right]\\ &=&(1-\widetilde{g})E_{i}^{\widetilde{f}}[T_{1}]+ E_{i}^{\widetilde{f}}[I_{\{\tau_{z}>T_{1}\}}\widetilde{h}(\xi_{T_{1}})]\\ &=&-\frac{1-\widetilde{g}}{q(i|i,\widetilde{f})}-{\sum}_{j\in S\setminus\{i,z\}}\frac{\widetilde{h}(j)q(j|i,\widetilde{f})}{q(i|i,\widetilde{f})}, \end{array} $$
    (5.3)

    where the fourth equality follows from Proposition B.8 in Guo and Hernández-Lerma (2009, p.205). Similarly, we have

    $$\begin{array}{@{}rcl@{}} \widetilde{h}(z)&=&-\widetilde{g}E_{z}^{\widetilde{f}}[T_{1}I_{\{\tau_{z}>T_{1}\}}] +E_{z}^{\widetilde{f}}[I_{\{\tau_{z}>T_{1}\}}\widetilde{h}(\xi_{T_{1}})]\\ &=&\frac{\widetilde{g}}{q(z|z,\widetilde{f})}-{\sum}_{j\in S\setminus\{z\}}\frac{\widetilde{h}(j)q(j|z,\widetilde{f})}{q(z|z,\widetilde{f})}. \end{array} $$
    (5.4)

    For any i≠z, Assumption 3.1(iii) implies that there exist distinct states \(j_{1}=z, j_{2},\ldots ,j_{m}=i\) such that \(q(j_{n+1}|j_{n},\widetilde {f})>0\) for all n=1,2,…,m−1. Then by Eq. 5.4 we obtain \(\widetilde {h}(j_{2})<\infty \), which together with Eq. 5.3 and an induction argument gives \(\widetilde {h}(i)<\infty \) for all i∈S. Hence, we get \(\widetilde {h}\in B(S)\). Employing Eqs. 5.1, 5.3 and 5.4, we have

    $$\begin{array}{@{}rcl@{}} \widetilde{g}&=&1-I_{z}(i)+{\sum}_{j\in S}\widetilde{h}(j)q(j|i,\widetilde{f}) \end{array} $$
    (5.5)
    $$\begin{array}{@{}rcl@{}} &\leq&\sup_{a\in A(i)}\left\{1-I_{z}(i)+{\sum}_{j\in S}\widetilde{h}(j)q(j|i,a)\right\} \end{array} $$
    (5.6)

    for all i∈S. On the other hand, fix any (k,a)∈K and define

    $$\begin{array}{@{}rcl@{}} {\Phi}(i):=\left\{\begin{array}{ll} \widetilde{g}-1+I_{z}(i)-{\sum}_{j\in S}\widetilde{h}(j)q(j|i,a), & \ \text{if} \ i=k,\\ 0, & \ \text{otherwise}. \end{array}\right. \end{array} $$
    (5.7)

    Obviously, we get Φ∈B(S). Let \(\widehat {f}\in F\) be a policy with \(\widehat {f}(k)=a\) and \(\widehat {f}(i)=\widetilde {f}(i)\) for all i∈S∖{k}. Then by Eqs. 5.5 and 5.7, we obtain

    $$\widetilde{g}=1-I_{z}(i)+{\Phi}(i)+{\sum}_{j\in S}\widetilde{h}(j)q(j|i,\widehat{f}) $$

    for all i∈S. Thus, using the last equality and the Dynkin formula, we have

    $$\widetilde{g}T=E_{i}^{\widehat{f}}\left[{{\int}_{0}^{T}}(1-I_{z}(\xi_{t})+{\Phi}(\xi_{t}))dt\right] +E_{i}^{\widehat{f}}[\widetilde{h}(\xi_{T})]-\widetilde{h}(i), $$

    which together with the fact that \(\widetilde {h}\in B(S)\) gives

    $$ \widetilde{g}=\lim_{T\to\infty}\frac{1}{T}E_{i}^{\widehat{f}}\left[{{\int}_{0}^{T}}(1-I_{z}(\xi_{t})+{\Phi}(\xi_{t}))dt\right] =1-\mu_{\widehat{f}}(z)+{\Phi}(k)\mu_{\widehat{f}}(k) $$
    (5.8)

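    for all i∈S. Note that \(\mu _{\widehat {f}}(k)>0\) by Assumption 3.1(iii), and \(\widetilde {g}\geq 1-\mu _{\widehat {f}}(z)\) by the definition of \(\widetilde {g}\) as a supremum over F. Hence, by Eq. 5.8 we obtain Φ(k)≥0. Since (k,a)∈K is arbitrary, we get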

    $$\widetilde{g}\geq \sup_{a\in A(i)}\left\{1-I_{z}(i)+{\sum}_{j\in S}\widetilde{h}(j)q(j|i,a)\right\} $$

    for all i∈S, which together with Eq. 5.6 implies Eq. 5.2. Fix any i∈S and f∈F below. By Eqs. 5.1 and 5.2 we obtain

    $$\widetilde{h}(i)\geq-\frac{1-I_{z}(i)-\widetilde{g}}{q(i|i,f)}-{\sum}_{j\in S\setminus\{i,z\}}\frac{\widetilde{h}(j)q(j|i,f)}{q(i|i,f)}. $$

    Then employing the last inequality, we have

    $$ \widetilde{h}(\xi_{T_{m}})\geq {E_{i}^{f}}\left[{\int}_{T_{m}}^{T_{m+1}}(1-I_{z}(\xi_{t})-\widetilde{g})dt\big|\mathcal{F}_{T_{m}}\right]+ {E_{i}^{f}}\left[\widetilde{h}(\xi_{T_{m+1}})I_{\{\tau_{z}>T_{m+1}\}}\big|\mathcal{F}_{T_{m}}\right] $$
    (5.9)

    for all m=0,1,…. Thus, using Eq. 5.9 and an induction argument, we get

    $$\widetilde{h}(i)\geq-\frac{1-I_{z}(i)-\widetilde{g}}{q(i|i,f)}+(1-\widetilde{g}){\sum}_{l=1}^{n}{E_{i}^{f}}[\theta_{l+1}I_{\{\tau_{z}>T_{l}\}}] +{E_{i}^{f}}\left[\widetilde{h}(\xi_{T_{n+1}})I_{\{\tau_{z}>T_{n+1}\}}\right] $$

    for all n=1,2,…, which together with Eq. 5.1 yields

    $$\widetilde{h}(i)+\frac{1-I_{z}(i)-\widetilde{g}}{q(i|i,f)}\geq(1-\widetilde{g}){\sum}_{l=1}^{\infty}{E_{i}^{f}}[\theta_{l+1}I_{\{\tau_{z}>T_{l}\}}]. $$

    Hence, by the last inequality we obtain

    $$ {\sum}_{l=2}^{\infty}{E_{i}^{f}}[\theta_{l}I_{\{\tau_{z}\geq T_{l}\}}]\leq \frac{1}{1-\widetilde{g}}\left[\max_{i\in S}\widetilde{h}(i)-\min_{(i,a)\in K}\frac{\widetilde{g}}{q(i|i,a)}\right]:=L_{1}. $$
    (5.10)

    Observe that

    $${E_{i}^{f}}[\tau_{z}]={\sum}_{l=1}^{\infty}{E_{i}^{f}}[\theta_{l}I_{\{\tau_{z}\geq T_{l}\}}]=-\frac{1}{q(i|i,f)}+{\sum}_{l=2}^{\infty}{E_{i}^{f}}[\theta_{l}I_{\{\tau_{z}\geq T_{l}\}}], $$

    which together with Eq. 5.10 implies \({E_{i}^{f}}[\tau _{z}]\leq L_{1}-\min _{(i,a)\in K}\frac {1}{q(i|i,a)}\). Therefore, the assertion holds with \(L:=L_{1}-\min _{(i,a)\in K}\frac {1}{q(i|i,a)}\).
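
The identity \(\sup _{f\in F}(1-\mu _{f}(z))=\sup _{f\in F}({E_{z}^{f}}[\tau _{z}]-{E_{z}^{f}}[T_{1}])/{E_{z}^{f}}[\tau _{z}]\) used in part (a) can be checked numerically for a single fixed policy. The following is a minimal sketch, not part of the proof: the 3-state generator `Q`, the choice `z = 0`, and the helper names `solve`, `invariant_measure` and `mean_return_time` are all hypothetical. It computes the invariant measure from the balance equations and the mean return time by first-step analysis, using only the Python standard library.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def invariant_measure(q):
    """Solve sum_i mu(i) q[i][j] = 0 for all j together with sum_i mu(i) = 1."""
    n = len(q)
    a = [[q[i][j] for i in range(n)] for j in range(n - 1)]
    a.append([1.0] * n)  # normalisation replaces one redundant balance equation
    b = [0.0] * (n - 1) + [1.0]
    return solve(a, b)


def mean_return_time(q, z):
    """Return E_z[tau_z], the mean time to come back to z after first leaving it."""
    n = len(q)
    others = [i for i in range(n) if i != z]
    # Mean hitting times m(i) of z from i != z satisfy sum_{j != z} q[i][j] m(j) = -1.
    a = [[q[i][j] for j in others] for i in others]
    hit = dict(zip(others, solve(a, [-1.0] * len(others))))
    # First-step analysis at z: sojourn time plus hitting time from the next state.
    return (1.0 + sum(q[z][j] * hit[j] for j in others)) / (-q[z][z])


# Hypothetical generator: Q[i][j] plays the role of q(j|i,f) for one fixed policy f.
Q = [[-2.0, 1.0, 1.0],
     [1.0, -3.0, 2.0],
     [2.0, 2.0, -4.0]]
z = 0
mu = invariant_measure(Q)
ret = mean_return_time(Q, z)      # E_z[tau_z]
sojourn = 1.0 / (-Q[z][z])        # E_z[T_1]
ratio = (ret - sojourn) / ret     # (E_z[tau_z] - E_z[T_1]) / E_z[tau_z]
```

For this example `1 - mu[z]` and `ratio` agree up to rounding, which is the single-policy version of the last equality in the display defining \(\widetilde {g}\).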

  • (b) By Markov's inequality and part (a), we have

    $${P_{i}^{f}}(\tau_{z}>t)\leq \frac{{E_{i}^{f}}[\tau_{z}]}{t}\leq \frac{L}{t} $$

    for all i∈S, f∈F and t>0. Moreover, any \(t_{0}\in \mathbb {R}_{+}\) with \(t_{0}>L\) satisfies \(\frac {L}{t_{0}}\in (0,1)\). Hence, part (b) holds with \(\alpha :=\frac {L}{t_{0}}\).
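
As an illustration of part (b), the tail bound can be checked on a chain whose return-time distribution is known in closed form. The sketch below assumes a hypothetical two-state chain with reference state z that is left at rate `a` and re-entered from the other state at rate `b` (with `a != b`); these rates and all function names are illustrative only. It verifies the Markov bound \(L/t\) on a grid and the geometric bound \(\alpha ^{n}\) at the times \(nt_{0}\), which is the form in which part (b) is iterated via the Markov property.

```python
import math


def tail_from_z(t, a, b):
    """P_z(tau_z > t): sojourn at z (rate a) followed by the return time (rate b), a != b."""
    return (b * math.exp(-a * t) - a * math.exp(-b * t)) / (b - a)


def tail_from_other(t, b):
    """P_1(tau_z > t) for the single state 1 != z, which jumps to z at rate b."""
    return math.exp(-b * t)


a, b = 1.0, 2.0            # hypothetical jump rates
L = 1.0 / a + 1.0 / b      # sup_i E_i[tau_z], attained at i = z
t0 = 2.0 * L               # any t0 > L would do
alpha = L / t0             # equals 1/2 here


def worst_tail(t):
    """sup over initial states of P_i(tau_z > t)."""
    return max(tail_from_z(t, a, b), tail_from_other(t, b))


# Markov bound P_i(tau_z > t) <= L / t on a grid of t values.
markov_ok = all(worst_tail(0.1 * k) <= L / (0.1 * k) for k in range(1, 301))
# Geometric bound P_i(tau_z > n t0) <= alpha^n obtained by iterating part (b).
geometric_ok = all(worst_tail(n * t0) <= alpha ** n for n in range(1, 11))
```

The choice \(t_{0}=2L\) is only one convenient option; any \(t_{0}>L\) gives some \(\alpha \in (0,1)\).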

6 Proof of Theorem 3.2

In this section, we present the proof of Theorem 3.2.

Proof

The statements for the case λ>0 follow from Theorem 3.1 in Wei and Chen (2016). Below we only need to prove the case λ<0.

  • (a) Set \(\widetilde {M}:=\min _{(i,a)\in K}c(i,a)\). Since λ<0 and \(c(i,a)-\widetilde {M}\geq 0\) on K, we obtain \(h_{\widetilde {M},\lambda }(i,f)\geq 0\) for all i∈S and f∈F, which gives \(h^{*}_{\widetilde {M},\lambda }(z)\geq 0\). Therefore, the set \(G_{\lambda }\) is nonempty.

  • (b) From the proof of Theorem 3.1(b) in Wei and Chen (2016), we see that part (b) also holds for the case λ<0.

  • (c) Fix any \(g\in G_{\lambda }\). Set \(\overline {c}:=\max _{(i,a)\in K} |c(i,a)-g|\). For each i∈S∖{z}, f∈F and m≥1, direct calculations yield

    $$\begin{array}{@{}rcl@{}} e^{\lambda h_{g,\lambda}(z,f)}&\geq & {E_{z}^{f}}\left[e^{\lambda {\int}_{0}^{\tau_{z}}(c(\xi_{t},f)-g)dt}I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq m-1, \xi_{T_{m}}=i\right\}}\right]\\ &=&{E_{z}^{f}}\left[I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq m-1, \xi_{T_{m}}=i\right\}}e^{\lambda {\int}_{0}^{T_{m}}(c(\xi_{t},f)-g)dt}{E_{z}^{f}}\left[e^{\lambda {\int}_{T_{m}}^{\tau_{z}}(c(\xi_{t},f)-g)dt}\big|\mathcal{F}_{T_{m}}\right]\right]\\ &=&e^{\lambda h_{g,\lambda}(i,f)}{E_{z}^{f}}\left[I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq m-1, \xi_{T_{m}}=i\right\}}e^{\lambda {\int}_{0}^{T_{m}}(c(\xi_{t},f)-g)dt}\right]\\ &\geq&e^{\lambda h_{g,\lambda}(i,f)}{E_{z}^{f}}\left[I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq m-1, \xi_{T_{m}}=i\right\}}e^{\lambda \overline{c}T_{m}} \right], \end{array} $$

    which together with Eq. 3.1 and the definition of \(G_{\lambda }\) gives

    $$ e^{\lambda h_{g,\lambda}(i,f)}{E_{z}^{f}}\left[I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq m-1, \xi_{T_{m}}=i\right\}}e^{\lambda \overline{c}T_{m}} \right]\leq e^{\lambda h^{*}_{g,\lambda}(z)}\leq 1. $$
    (6.1)

    Suppose that \(\sup _{f\in F}e^{\lambda h_{g,\lambda }(i,f)}=\infty \). Then there exists a sequence \(\{f_{n},n\geq 1\}\subseteq F\) such that \(e^{\lambda h_{g,\lambda }(i,f_{n})}\geq n\) for all n≥1. Thus, by Eq. 6.1 we obtain

    $$ \lim_{n\to\infty}E_{z}^{f_{n}}\left[I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq m-1, \xi_{T_{m}}=i\right\}}e^{\lambda \overline{c}T_{m}} \right]=0 $$
    (6.2)

    for all m≥1. Because F is compact, there exists a subsequence of \(\{f_{n},n\geq 1\}\) (denoted by the same sequence) such that \(f_{n}\) converges to some \(\overline {f}\in F\), i.e.,

    $$ f_{n}(j)\to \overline{f}(j) \ \ \text{for \ all} \ j\in S \ \ \text{as} \ n\to\infty. $$
    (6.3)

    Moreover, for each m≥1, by Proposition B.8 in Guo and Hernández-Lerma (2009, p.205) we have

    $$\begin{array}{@{}rcl@{}} &&E_{z}^{f_{n}}\left[I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq m-1, \xi_{T_{m}}=i\right\}}e^{\lambda \overline{c}T_{m}} \right]\\ &=&{\sum}_{j_{1}\in S\setminus\{i\},j_{l+1}\in S\setminus\{j_{l},i,z\},l=1,2,\ldots,m-2}\left(-\frac{q(j_{1}|z,f_{n})}{q(z|z,f_{n})+\lambda \overline{c}}\right){\prod}_{l=1}^{m-2}\left(-\frac{q(j_{l+1}|j_{l},f_{n})}{q(j_{l}|j_{l},f_{n})+\lambda \overline{c}}\right)\\ &&\times\left(-\frac{q(i|j_{m-1},f_{n})}{q(j_{m-1}|j_{m-1},f_{n})+\lambda \overline{c}}\right) \end{array} $$

    for all n≥1, which together with Assumption 3.1(ii), Eqs. 6.2 and 6.3 implies

    $$ E_{z}^{\overline{f}}\left[I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq m-1, \xi_{T_{m}}=i\right\}}e^{\lambda \overline{c}T_{m}} \right]=0. $$
    (6.4)

    On the other hand, Assumption 3.1(iii) implies that there exist distinct states \(k_{0}=z, k_{1},\ldots ,k_{\widetilde {m}}=i\) such that \(q(k_{n+1}|k_{n},\overline {f})>0\) for all \(n=0,1,\ldots , \widetilde {m}-1\). Thus, we get

    $$E_{z}^{\overline{f}}\left[I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq \widetilde{m}-1, \xi_{T_{\widetilde{m}}}=i\right\}}e^{\lambda \overline{c}T_{\widetilde{m}}} \right]\geq{\prod}_{l=0}^{\widetilde{m}-1}\left(-\frac{q(k_{l+1}|k_{l},\overline{f})}{q(k_{l}|k_{l},\overline{f})+\lambda \overline{c}}\right)>0, $$

    which contradicts (6.4). Hence, we obtain

    $$ e^{\lambda h^{*}_{g,\lambda}(i)}=\sup_{f\in F}e^{\lambda h_{g,\lambda}(i,f)}<\infty \ \ \text{for \ all} \ i\in S. $$
    (6.5)

    By Eq. 3.1 and part (b), we have

    $$\begin{array}{@{}rcl@{}} \left\{\begin{array}{ll} e^{\lambda h^{*}_{g,\lambda}(i)}\geq Q(i,f,g,\lambda)\left(q(z|i,f)+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h_{g,\lambda}(j,f)}q(j|i,f)\right) \\ e^{\lambda h^{*}_{g,\lambda}(z)}\geq Q(z,f,g,\lambda) {\sum}_{j\in S\setminus\{z\}}e^{\lambda h_{g,\lambda}(j,f)}q(j|z,f) \end{array}\right. \end{array} $$
    (6.6)

    for all i∈S∖{z} and f∈F. Note that Theorem 3.1 implies

    $$ e^{\lambda h_{g,\lambda}(i,f)}\geq {E_{i}^{f}}\left[e^{\lambda \max_{(i,a)\in K}|c(i,a)-g|\tau_{z}}\right]>0 $$
    (6.7)

    for all i∈S and f∈F. Thus, employing Eqs. 6.5–6.7 and Assumption 3.1(iii), we get

    $$ e^{\lambda h_{g,\lambda}(i,f)}<\infty \ \ \text{and} \ \ Q(i,f,g,\lambda)<\infty \ \ \text{for \ all} \ i\in S \ \text{and} \ f\in F. $$
    (6.8)

    Moreover, using Eq. 3.1 and part (b) again, we obtain

    $$\begin{array}{@{}rcl@{}} \left\{\begin{array}{ll} e^{\lambda h^{*}_{g,\lambda}(i)}\leq\sup\limits_{a\in A(i)}\left\{Q(i,a,g,\lambda)\left(q(z|i,a)+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|i,a)\right)\right\} \\ e^{\lambda h^{*}_{g,\lambda}(z)}\leq\sup\limits_{a\in A(z)}\left\{Q(z,a,g,\lambda){\sum}_{j\in S\setminus\{z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|z,a)\right\} \end{array}\right. \end{array} $$
    (6.9)

    for all i∈S∖{z}. By Theorem 3.1(c) in Wei and Chen (2016), Assumption 3.1(ii) and Eq. 6.5, we see that \(Q(i,a,g,\lambda )(q(z|i,a)+{\sum }_{j\in S\setminus \{i, z\}}e^{\lambda h^{*}_{g,\lambda }(j)}q(j|i,a))\) and \(Q(z,a,g,\lambda ){\sum }_{j\in S\setminus \{z\}}e^{\lambda h^{*}_{g,\lambda }(j)}q(j|z,a)\) in Eq. 6.9 are continuous in a∈A(i) and a∈A(z), respectively. Thus, the Weierstrass theorem in Aliprantis and Border (2007, p.40) and Assumption 3.1 imply that there exists \(f^{*}_{g,\lambda }\in F\) with \(f^{*}_{g,\lambda }(i)\in A(i)\) attaining the maximum of Eq. 6.9, i.e.,

    $$\begin{array}{@{}rcl@{}} \!\!\!\!\!\!\!\left\{\begin{array}{ll} e^{\lambda h^{*}_{g,\lambda}(i)}\leq Q(i,f^{*}_{g,\lambda},g,\lambda)\left(q(z|i,f^{*}_{g,\lambda})+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|i,f^{*}_{g,\lambda})\right)\\ e^{\lambda h^{*}_{g,\lambda}(z)}\leq Q(z,f^{*}_{g,\lambda},g,\lambda){\sum}_{j\in S\setminus\{z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|z,f^{*}_{g,\lambda}) \end{array}\right. \end{array} $$
    (6.10)

    for all i∈S∖{z}. Hence, by Eq. 6.10, Proposition B.8 in Guo and Hernández-Lerma (2009, p.205) and an induction argument, we have

    $$\begin{array}{@{}rcl@{}} e^{\lambda h^{*}_{g,\lambda}(i)}&\leq& {\sum}_{m=1}^{n} E_{i}^{f^{*}_{g,\lambda}}\left[e^{\lambda{\int}_{0}^{T_{m}}\left(c(\xi_{t},f^{*}_{g,\lambda})-g\right)dt} I_{\left\{\tau_{z}=T_{m}\right\}}\right]\\ &&+E_{i}^{f^{*}_{g,\lambda}}\left[e^{\lambda{\int}_{0}^{T_{n}}\left(c(\xi_{t},f^{*}_{g,\lambda})-g\right)dt}e^{\lambda h^{*}_{g,\lambda}(\xi_{T_{n}})}I_{\left\{\tau_{z}>T_{n}\right\}}\right] \end{array} $$
    (6.11)

    for all i∈S∖{z} and n=1,2,…. Furthermore, it follows from part (b) and arguments similar to those for Eq. 6.11 that

    $$\begin{array}{@{}rcl@{}} e^{\lambda h_{g,\lambda}(i,f^{*}_{g,\lambda})}&=&{\sum}_{m=1}^{n} E_{i}^{f^{*}_{g,\lambda}}\left[e^{\lambda{\int}_{0}^{T_{m}}\left(c(\xi_{t},f^{*}_{g,\lambda})-g\right)dt} I_{\left\{\tau_{z}=T_{m}\right\}}\right]\\ &&+E_{i}^{f^{*}_{g,\lambda}}\left[e^{\lambda{\int}_{0}^{T_{n}}\left(c(\xi_{t},f^{*}_{g,\lambda})-g\right)dt}e^{\lambda h_{g,\lambda}(\xi_{T_{n}},f^{*}_{g,\lambda})}I_{\left\{\tau_{z}>T_{n}\right\}}\right] \end{array} $$

    for all i∈S∖{z} and n=1,2,…. Thus, employing the last equality and the fact that \(\min _{i\in S}e^{\lambda h_{g,\lambda }(i,f^{*}_{g,\lambda })}>0\), we obtain

    $$ \liminf_{n\to\infty}E_{i}^{f^{*}_{g,\lambda}}\left[e^{\lambda{\int}_{0}^{T_{n}}\left(c(\xi_{t},f^{*}_{g,\lambda})-g\right)dt}I_{\left\{\tau_{z}>T_{n}\right\}}\right]=0 $$
    (6.12)

    for all i∈S∖{z}. Hence, using Eqs. 6.11 and 6.12, we get

    $$e^{\lambda h^{*}_{g,\lambda}(i)}\!\leq e^{\lambda h_{g,\lambda}(i,f^{*}_{g,\lambda})} +\max_{j\in S}e^{\lambda h^{*}_{g,\lambda}(j)}\liminf_{n\to\infty}E_{i}^{f^{*}_{g,\lambda}}\!\left[e^{\lambda{\int}_{0}^{T_{n}} \left(c(\xi_{t},f^{*}_{g,\lambda})-g\right)dt}I_{\left\{\tau_{z}>T_{n}\right\}}\right]=e^{\lambda h_{g,\lambda}(i,f^{*}_{g,\lambda})}, $$

    which together with Eq. 3.1 implies

    $$ h^{*}_{g,\lambda}(i)=h_{g,\lambda}(i,f^{*}_{g,\lambda}) \ \ \text{for \ all} \ i\in S\setminus\{z\}. $$
    (6.13)

    Thus, using Eqs. 6.6 and 6.13, we get

    $$\begin{array}{@{}rcl@{}} e^{\lambda h^{*}_{g,\lambda}(i)}&\geq & Q(i,f^{*}_{g,\lambda},g,\lambda)\left(q(z|i,f^{*}_{g,\lambda})+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|i,f^{*}_{g,\lambda})\right)\\ &=&\sup\limits_{a\in A(i)}\left\{Q(i,a,g,\lambda)\left(q(z|i,a)+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|i,a)\right)\right\}, \end{array} $$

    which together with Eq. 6.9 yields

    $$ e^{\lambda h^{*}_{g,\lambda}(i)} =\sup\limits_{a\in A(i)}\left\{Q(i,a,g,\lambda)\left(q(z|i,a)+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|i,a)\right)\right\} $$
    (6.14)

    for all i∈S∖{z}. Following arguments similar to those for Eqs. 6.13 and 6.14, we have

    $$ h^{*}_{g,\lambda}(z)=h_{g,\lambda}(z,f^{*}_{g,\lambda})\ \ \text{and} \ \ e^{\lambda h^{*}_{g,\lambda}(z)}=\sup\limits_{a\in A(z)}\left\{Q(z,a,g,\lambda){\sum}_{j\in S\setminus\{z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|z,a)\right\}. $$
    (6.15)

    Therefore, the function \(h^{*}_{g,\lambda }\) on S satisfies (3.3). Moreover, Eq. 6.7 gives

    $$ e^{\lambda h^{*}_{g,\lambda}(i)}=e^{\lambda h_{g,\lambda}(i,f^{*}_{g,\lambda})}>0 \ \ \text{for \ all} \ i\in S. $$
    (6.16)

    Hence, by Eqs. 6.5, 6.8, 6.13, 6.15 and 6.16, we see that for any \(f^{*}_{g,\lambda }\in F\) with \(f^{*}_{g,\lambda }(i)\in A(i)\) attaining the minimum of Eq. 3.3, \(h_{g,\lambda }(i,f^{*}_{g,\lambda })=h^{*}_{g,\lambda }(i)\in \mathbb {R}\) and \(Q(i,f^{*}_{g,\lambda },g,\lambda )<\infty \) for all i∈S.
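
The product formula invoked in part (c) is built from one-step factors of the form \(-q(j|k,f)/(q(k|k,f)+\lambda \overline {c})\). Under the standard description of a continuous-time Markov chain (an exponential holding time at k with rate \(-q(k|k,f)\), followed by an independent jump to j with probability \(q(j|k,f)/(-q(k|k,f))\)), this factor equals \({\int }_{0}^{\infty }e^{\lambda \overline {c}s}q(j|k,f)e^{q(k|k,f)s}ds\). The sketch below compares the closed form with a Simpson approximation of that integral and multiplies factors along a path; the rates, the generator `Q`, and the function names are hypothetical and chosen only for illustration.

```python
import math


def one_step_factor(q_kj, q_kk, lam, c_bar):
    """Closed form -q(j|k) / (q(k|k) + lam * c_bar), valid when q(k|k) + lam * c_bar < 0."""
    return -q_kj / (q_kk + lam * c_bar)


def one_step_factor_numeric(q_kj, q_kk, lam, c_bar, steps=20000):
    """Composite Simpson approximation of int_0^inf e^{lam*c_bar*s} q(j|k) e^{q(k|k)*s} ds."""
    rate = -(q_kk + lam * c_bar)   # positive decay rate of the integrand
    upper = 40.0 / rate            # the tail beyond this point is of order e^{-40}
    h = upper / steps              # steps must be even for Simpson's rule
    total = 0.0
    for n in range(steps + 1):
        weight = 1 if n in (0, steps) else (4 if n % 2 else 2)
        total += weight * q_kj * math.exp(-rate * n * h)
    return total * h / 3.0


def path_weight(q, path, lam, c_bar):
    """Product of one-step factors along a path of states (k_0, ..., k_m)."""
    w = 1.0
    for k, j in zip(path, path[1:]):
        w *= one_step_factor(q[k][j], q[k][k], lam, c_bar)
    return w


# Hypothetical data: lam < 0 and c_bar >= 0, as in the case treated in part (c).
lam, c_bar = -0.5, 2.0
closed = one_step_factor(1.5, -3.0, lam, c_bar)
numeric = one_step_factor_numeric(1.5, -3.0, lam, c_bar)

# Q[i][j] plays the role of q(j|i,f) for one fixed policy f.
Q = [[-2.0, 1.0, 1.0],
     [1.0, -3.0, 2.0],
     [2.0, 2.0, -4.0]]
weight = path_weight(Q, [0, 1, 2], lam, c_bar)
```

Each factor is strictly positive whenever the corresponding transition rate is positive, so the weight of any path with positive rates is strictly positive; this is the mechanism behind the contradiction with Eq. 6.4.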

  • (d) Choose a sequence \(\{\overline {g}^{\lambda }_{n},n\geq 1\}\subseteq G_{\lambda }\) satisfying

    $$ \overline{g}^{\lambda}_{n}\leq \overline{g}^{\lambda}_{n+1} \ \ \text{for \ all} \ \ n\geq1 \ \ \text{and} \ \ \lim_{n\to\infty}\overline{g}^{\lambda}_{n}=g^{*}_{\lambda}. $$
    (6.17)

    By Eqs. 3.1 and 6.17, we obtain

    $$ h^{*}_{\overline{g}^{\lambda}_{n},\lambda}(i)\geq h^{*}_{\overline{g}^{\lambda}_{n+1},\lambda}(i)\geq h^{*}_{g^{*}_{\lambda},\lambda}(i) \ \ \text{for \ all} \ i\in S \ \text{and} \ n\geq1. $$
    (6.18)

    Set \(H_{\lambda }(i):=\lim _{n\to \infty }h^{*}_{\overline {g}^{\lambda }_{n},\lambda }(i)\) for all i∈S. Then by the definition of \(G_{\lambda }\) and Eq. 6.18, we have

    $$ H_{\lambda}(z)\geq0 \ \ \text{and} \ \ H_{\lambda}(i)\geq h^{*}_{g^{*}_{\lambda},\lambda}(i) \ \ \text{for \ all} \ \ i\in S. $$
    (6.19)

    Moreover, employing part (c), we get

    $$\begin{array}{@{}rcl@{}} \left\{\begin{array}{ll} e^{\lambda h^{*}_{\overline{g}^{\lambda}_{n},\lambda}(i)}\geq Q(i,f,\overline{g}^{\lambda}_{n},\lambda)\left(q(z|i,f)+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h^{*}_{\overline{g}^{\lambda}_{n},\lambda}(j)}q(j|i,f)\right)\\ e^{\lambda h^{*}_{\overline{g}^{\lambda}_{n},\lambda}(z)}\geq Q(z,f,\overline{g}^{\lambda}_{n},\lambda){\sum}_{j\in S\setminus\{z\}}e^{\lambda h^{*}_{\overline{g}^{\lambda}_{n},\lambda}(j)}q(j|z,f) \end{array}\right. \end{array} $$

    for all n≥1, which together with the Fatou lemma gives

    $$\begin{array}{@{}rcl@{}} \left\{\begin{array}{ll} e^{\lambda H_{\lambda}(i)}\geq Q(i,f,g^{*}_{\lambda},\lambda)\left(q(z|i,f) +{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda H_{\lambda}(j)}q(j|i,f)\right)\\ e^{\lambda H_{\lambda}(z)}\geq Q(z,f,g^{*}_{\lambda},\lambda){\sum}_{j\in S\setminus\{z\}}e^{\lambda H_{\lambda}(j)}q(j|z,f) \end{array}\right. \end{array} $$
    (6.20)

    for all i∈S∖{z} and f∈F. Thus, using Eq. 6.20 and arguments similar to those for Eq. 6.11, we obtain \(e^{\lambda H_{\lambda }(i)}\geq e^{\lambda h_{g^{*}_{\lambda }, \lambda }(i,f)}\) for all f∈F, which together with Eqs. 3.1 and 6.19 yields \(h^{*}_{g^{*}_{\lambda },\lambda }(z)=H_{\lambda }(z)\geq 0\). Hence, we obtain \(g^{*}_{\lambda }\in G_{\lambda }\).

    Suppose that \(h^{*}_{g^{*}_{\lambda },\lambda }(z)>0\). For each n≥1, let \(\gamma _{n}:=e^{n\lambda h^{*}_{g^{*}_{\lambda },\lambda }(z)}\). Then we have \(\gamma _{n}\in (0,1)\) for all n≥1 and \(\lim _{n\to \infty }\gamma _{n}=0\). Employing Eq. 6.8 we get \(\lambda c(i,f)-\lambda g^{*}_{\lambda }+q(i|i,f)<0\) for all i∈S and f∈F. For each n≥1 and f∈F, define the transition rate as follows:

    $$ \overline{p}_{n}(z|z,f):=-\gamma_{n} e^{\lambda h_{g^{*}_{\lambda},\lambda}(z,f)}, \ \overline{p}_{n}(j|z,f):=-\frac{\gamma_{n}e^{\lambda h_{g^{*}_{\lambda},\lambda}(j,f)}q(j|z,f)}{\lambda c(z,f)-\lambda g^{*}_{\lambda}+q(z|z,f)} \ \ \text{for} \ j\in S\setminus\{z\}, $$
    (6.21)

    and for any i∈S∖{z},

    $$\begin{array}{@{}rcl@{}} \overline{p}_{n}(i|i,f)&:=&-\gamma_{n} e^{\lambda h_{g^{*}_{\lambda},\lambda}(i,f)}, \ \overline{p}_{n}(z|i,f):=-\frac{\gamma_{n} q(z|i,f)}{\lambda c(i,f)-\lambda g^{*}_{\lambda}+q(i|i,f)}, \end{array} $$
    (6.22)
    $$\begin{array}{@{}rcl@{}} \overline{p}_{n}(j|i,f)&:=&-\frac{\gamma_{n}e^{\lambda h_{g^{*}_{\lambda},\lambda}(j,f)}q(j|i,f)}{\lambda c(i,f)-\lambda g^{*}_{\lambda}+q(i|i,f)} \ \ \text{for} \ j\in S\setminus \{i,z\}. \end{array} $$
    (6.23)

    For any initial state i∈S and any policy f∈F, the probability measure and expectation operator corresponding to the transition rate \(\overline {p}_{n}(\cdot |\cdot ,f)\) defined in Eqs. 6.21–6.23 are denoted by \(\overline {P}^{f}_{i,n}\) and \(\overline {E}^{f}_{i,n}\), respectively. For any κ>0 and n≥1, define

    $$\overline{H}_{\kappa,n}(i,f):=\frac{1}{\lambda}\ln \overline{E}^{f}_{i,n}\left[e^{-\lambda \kappa \tau_{z}}\right] \ \ \text{for \ all} \ i\in S\ \text{and} \ f\in F. $$

    Note that Eq. 6.7 gives \(e^{\lambda h_{g^{*}_{\lambda },\lambda }(i,f)}>0\) for all i∈S and f∈F. Employing Eq. 3.1 and part (c), we have \(\min _{i\in S}\inf _{f\in F}e^{-\lambda h_{g^{*}_{\lambda },\lambda }(i,f)}=\min _{i\in S}e^{-\lambda h^{*}_{g^{*}_{\lambda },\lambda }(i)}>0\). Then by the fact that \(\lim _{n\to \infty }\gamma _{n}=0\), there exists a positive integer \(n_{1}\) such that

    $$\begin{array}{@{}rcl@{}} \gamma_{n_{1}}&<& \min_{(i,a)\in K}\left(\lambda g^{*}_{\lambda}-\lambda c(i,a)-q(i|i,a)\right) \ \ \text{and} \end{array} $$
    (6.24)
    $$\begin{array}{@{}rcl@{}} \gamma_{n_{1}}&\leq&\min_{(i,a)\in K}\left(\lambda g^{*}_{\lambda}-\lambda c(i,a)-q(i|i,a)\right)\times\min_{i\in S}\inf_{f\in F}e^{-\lambda h_{g^{*}_{\lambda},\lambda}(i,f)}\\ &\leq&\left(\lambda g^{*}_{\lambda}-\lambda c(i,f)-q(i|i,f)\right)e^{-\lambda h_{g^{*}_{\lambda},\lambda}(i,f)} \end{array} $$
    (6.25)

    for all i∈S and f∈F. Moreover, there exists \(f^{1}\in F\) satisfying \(h_{g^{*}_{\lambda },\lambda }(i,f^{1})=\sup _{f\in F}h_{g^{*}_{\lambda },\lambda }(i,f)=:\overline {h}_{g^{*}_{\lambda },\lambda }(i)\) for all i∈S. In fact, by part (b) we get

    $$\begin{array}{@{}rcl@{}} \left\{\begin{array}{ll} e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(i)}\geq \inf_{a\in A(i)}\left\{Q(i,a,g^{*}_{\lambda},\lambda)\left(q(z|i,a)+{\sum}_{j\in S\setminus\{i,z\}}e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(j)}q(j|i,a)\right)\right\}\\ e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(z)}\geq \inf_{a\in A(z)}\left\{Q(z,a,g^{*}_{\lambda},\lambda){\sum}_{j\in S\setminus\{z\}}e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(j)}q(j|z,a)\right\} \end{array}\right. \end{array} $$
    (6.26)

    for all i∈S∖{z}. Then employing Assumption 3.1, Theorem 3.1(c) in Wei and Chen (2016) and the Weierstrass theorem in Aliprantis and Border (2007, p.40), we obtain the existence of \(f^{1}\in F\) with \(f^{1}(i)\in A(i)\) attaining the minimum of Eq. 6.26, i.e.,

    $$\begin{array}{@{}rcl@{}} \left\{\begin{array}{ll} e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(i)}\geq Q(i,f^{1},g^{*}_{\lambda},\lambda)\left(q(z|i,f^{1})+{\sum}_{j\in S\setminus\{i,z\}}e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(j)}q(j|i,f^{1})\right)\\ e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(z)}\geq Q(z,f^{1},g^{*}_{\lambda},\lambda){\sum}_{j\in S\setminus\{z\}}e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(j)}q(j|z,f^{1}) \end{array}\right. \end{array} $$
    (6.27)

    for all i∈S∖{z}. Thus, using Eq. 6.27 and following arguments similar to those for Eq. 6.11, we have

    $$e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(i)}\!\geq \!{\sum}_{m=1}^{n}E_{i}^{f^{1}}\!\left[e^{\lambda{\int}_{0}^{T_{m}}(c(\xi_{t},f^{1})-g^{*}_{\lambda})dt}I_{\{\tau_{z}=T_{m}\}}\right] +E_{i}^{f^{1}}\!\left[e^{\lambda{\int}_{0}^{T_{n}}(c(\xi_{t},f^{1})-g^{*}_{\lambda})dt}e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(\xi_{T_{n}})}I_{\{\tau_{z}>T_{n}\}}\!\right] $$

    for all n=1,2,…, which gives \(\overline {h}_{g^{*}_{\lambda },\lambda }(i)\leq h_{g^{*}_{\lambda },\lambda }(i,f^{1})\) for all i∈S∖{z}. Hence, we get \(\overline {h}_{g^{*}_{\lambda },\lambda }(i)=h_{g^{*}_{\lambda },\lambda }(i,f^{1})\) for all i∈S∖{z}. Similarly, by Eq. 6.27 we can obtain \(\overline {h}_{g^{*}_{\lambda },\lambda }(z)=h_{g^{*}_{\lambda },\lambda }(z,f^{1})\). Furthermore, by Eq. 6.7 we have

    $$ \inf_{f\in F}e^{\lambda h_{g^{*}_{\lambda},\lambda}(i,f)}=e^{\lambda \sup_{f\in F}h_{g^{*}_{\lambda},\lambda}(i,f)}=e^{\lambda h_{g^{*}_{\lambda},\lambda}(i,f^{1})}>0, $$
    (6.28)

    which implies \(\zeta (i):=\inf _{f\in F}\frac {1}{\lambda }\overline {p}_{n_{1}}(i|i,f)>0\) for all i∈S. Hence, for any \(\kappa \in \left (0,\min \limits _{i\in S}\zeta (i)\right )=:\overline {O}_{n_{1}}\), by Eqs. 6.21–6.23 and arguments similar to those for Theorem 3.1(b) in Wei and Chen (2016), we get

    $$\begin{array}{@{}rcl@{}} \!\!\!\!\!\!\!\left\{\begin{array}{ll} e^{\lambda \overline{H}_{\kappa, n_{1}}(i,f)}=-\frac{1}{\gamma_{n_{1}} e^{\lambda h_{g^{*}_{\lambda},\lambda}(i,f)}+\lambda \kappa}\!\left(\frac{\gamma_{n_{1}} q(z|i,f)}{\lambda c(i,f)-\lambda g^{*}_{\lambda}+q(i|i,f)}+\!\!\!\sum\limits_{j\in S\setminus\{i, z\}}\!\!\!\!\frac{\gamma_{n_{1}} e^{\lambda \overline{H}_{\kappa,n_{1}}(j,f)+\lambda h_{g^{*}_{\lambda},\lambda}(j,f)}q(j|i,f)}{\lambda c(i,f)-\lambda g^{*}_{\lambda}+q(i|i,f)}\right)\\ e^{\lambda \overline{H}_{\kappa, n_{1}}(z,f)}=-\frac{1}{\gamma_{n_{1}}e^{\lambda h_{g^{*}_{\lambda},\lambda}(z,f)}+\lambda\kappa} \!\sum\limits_{j\in S\setminus\{z\}}\frac{\gamma_{n_{1}} e^{\lambda \overline{H}_{\kappa,n_{1}}(j,f)+\lambda h_{g^{*}_{\lambda},\lambda}(j,f)}q(j|z,f)}{\lambda c(z,f)-\lambda g^{*}_{\lambda}+q(z|z,f)} \end{array}\right. \end{array} $$
    (6.29)

    for all i∈S∖{z} and f∈F. Below we show that there exist \(\widetilde {t}_{0}\in \mathbb {R}_{+}\) and \(\widetilde {\alpha }\in (0,1)\) such that

    $$ \overline{P}^{f}_{i,n_{1}}(\tau_{z}>\widetilde{t}_{0})\leq \widetilde{\alpha} \ \ \text{for \ all} \ i\in S \ \text{and} \ f\in F. $$
    (6.30)

    By Theorem 3.1 and an induction argument, we have \({P_{i}^{f}}(\tau _{z}>nt_{0})\leq \alpha ^{n}\) for all n=1,2,…, which implies that there exists a positive integer \(n_{2}\) such that

    $$ 0<\alpha^{n_{2}}\leq\frac{1}{3} \ \ \text{and} \ \ {P_{i}^{f}}(\tau_{z}>n_{2}t_{0})\leq \alpha^{n_{2}} $$
    (6.31)

    for all i∈S and f∈F. Let \(\pi _{1}:=\max _{(i,a)\in K}(-q(i|i,a))\). Then direct calculations yield

    $$\begin{array}{@{}rcl@{}} {P_{i}^{f}}(T_{m} < n_{2}t_{0})&=&{P_{i}^{f}}(\theta_{1}+\cdots+\theta_{m} < n_{2}t_{0})\\ &=&{\int}_{s_{1}+\cdots+s_{m} <n_{2}t_{0}}{\sum}_{j_{1}\in S\setminus\{i\},j_{l+1}\in S\setminus\{j_{l}\},l=1,\ldots,m-1}e^{q(i|i,f)s_{1}}q(j_{1}|i,f)\\ &&\times{\prod}_{l=1}^{m-1}\left(e^{q(j_{l}|j_{l},f)s_{l+1}} q(j_{l+1}|j_{l},f)\right)ds_{m}{\cdots} ds_{1}\\ &\leq&{\pi_{1}^{m}}{\int}_{s_{1}+\cdots+s_{m}<n_{2}t_{0}}ds_{m}{\cdots} ds_{1}=\frac{(\pi_{1}n_{2}t_{0})^{m}}{m!} \end{array} $$
    (6.32)

    for all i∈S and f∈F, where the second equality follows from Proposition B.8 in Guo and Hernández-Lerma (2009, p.205). Thus, using Eq. 6.32 we obtain \(\lim _{m\to \infty }\sup _{i\in S,f\in F}{P_{i}^{f}}(T_{m}< n_{2}t_{0})=0\), which implies that there exists a positive integer \(m^{*}\) satisfying

    $$ {P_{i}^{f}}(T_{m^{*}}<n_{2}t_{0})\leq \frac{\alpha^{n_{2}}}{2} \ \ \text{for \ all} \ i\in S \ \text{and} \ f\in F. $$
    (6.33)

    For each f∈F, let \(\{Y_{n},n=0,1,\ldots \}\) be the embedded Markov chain of the continuous-time Markov chain associated with the transition rate q(⋅|⋅,f) and define \(\tau _{1}:=\inf \{n\geq 1:Y_{n}=z\}\). Employing Eqs. 6.31 and 6.33 we get

    $$\begin{array}{@{}rcl@{}} {P_{i}^{f}}(\tau_{1}\leq m^{*})={P_{i}^{f}}(\tau_{z}\leq T_{m^{*}})&\geq&{P_{i}^{f}}(\tau_{z}\leq n_{2}t_{0}, T_{m^{*}}\geq n_{2}t_{0})\\ &\geq&{P_{i}^{f}}(\tau_{z}\leq n_{2}t_{0})+{P_{i}^{f}}(T_{m^{*}}\geq n_{2}t_{0})-1\geq\frac{1}{2} \end{array} $$
    (6.34)

    for all i∈S and f∈F. Let \(\pi _{2}:=\min _{(i,a)\in K}\left (-\frac {\gamma _{n_{1}}}{\lambda c(i,a)-\lambda g^{*}_{\lambda }+q(i|i,a)}\right )\), \(\pi _{3}:=\min _{i\in S}\inf _{f\in F}e^{\lambda h_{g^{*}_{\lambda },\lambda }(i,f)}\), and \(\pi _{4}:=\min _{i\in S}\inf _{f\in F}\overline {p}_{n_{1}}(i|i,f)\). Then Eq. 6.24 and Theorem 3.2(c) imply \(\pi _{2}\in (0,1)\) and \(\pi _{4}\in (-\infty ,0)\). By Eqs. 3.1, 6.28 and the fact that \(g^{*}_{\lambda }\in G_{\lambda }\), we obtain

    $$\pi_{3}=\min_{i\in S}e^{\lambda h_{g^{*}_{\lambda},\lambda}(i,f^{1})}>0 \ \ \text{and} \ \ \pi_{3}\leq e^{\lambda h^{*}_{g^{*}_{\lambda},\lambda}(z)}\leq1. $$

    Thus, for each i∈S, f∈F and m≥1, direct calculations yield

    $$\begin{array}{@{}rcl@{}} &&\overline{P}^{f}_{i,n_{1}}(\tau_{z}\leq n_{2}t_{0})\\ &\geq&\overline{P}^{f}_{i,n_{1}}(\tau_{z}\leq n_{2}t_{0},\tau_{z}\leq T_{m})\\ &=&{\sum}_{n=1}^{m}\overline{P}^{f}_{i,n_{1}}(\tau_{z}\leq n_{2}t_{0},\tau_{z}=T_{n})\\ &=&{\sum}_{n=1}^{m}{\int}_{s_{1}+\cdots+s_{n}\leq n_{2}t_{0}}{\sum}_{j_{1}\in S\setminus\{ i,z\},j_{l+1}\in S\setminus\{j_{l},z\},l=1,2,\ldots,n-2}e^{\overline{p}_{n_{1}}(i|i,f)s_{1}}\overline{p}_{n_{1}}(j_{1}|i,f)\\ &&\times{\prod}_{l=1}^{n-2} e^{\overline{p}_{n_{1}}(j_{l}|j_{l},f)s_{l+1}}\overline{p}_{n_{1}}(j_{l+1}|j_{l},f) e^{\overline{p}_{n_{1}}(j_{n-1}|j_{n-1},f)s_{n}}\overline{p}_{n_{1}}(z|j_{n-1},f)ds_{n} {\cdots} ds_{1}\\ &\geq&{\sum}_{n=1}^{m}{\int}_{s_{1}+\cdots+s_{n}\leq n_{2}t_{0}}{\sum}_{j_{1}\in S\setminus\{ i,z\},j_{l+1}\in S\setminus\{j_{l},z\},l=1,2,\ldots,n-2}\pi_{2}\pi_{3}e^{\pi_{4}s_{1}}e^{q(i|i,f)s_{1}} q(j_{1}|i,f)\\ &&\times{\prod}_{l=1}^{n-2} \pi_{2}\pi_{3}e^{\pi_{4}s_{l+1}}e^{q(j_{l}|j_{l},f)s_{l+1}}q(j_{l+1}|j_{l},f) \pi_{2}e^{\pi_{4}s_{n}}e^{q(j_{n-1}|j_{n-1},f)s_{n}}q(z|j_{n-1},f)ds_{n} {\cdots} ds_{1}\\ &=&{\sum}_{n=1}^{m}(\pi_{2}\pi_{3})^{n-1}\pi_{2}{\int}_{s_{1}+\cdots+s_{n}\leq n_{2}t_{0}}{\sum}_{j_{1}\in S\setminus\{ i,z\},j_{l+1}\in S\setminus\{j_{l},z\},l=1,2,\ldots,n-2}e^{\pi_{4}(s_{1}+\cdots+s_{n})}\\ &&\times e^{q(i|i,f)s_{1}}q(j_{1}|i,f){\prod}_{l=1}^{n-2} e^{q(j_{l}|j_{l},f)s_{l+1}}q(j_{l+1}|j_{l},f) e^{q(j_{n-1}|j_{n-1},f)s_{n}}q(z|j_{n-1},f) ds_{n} {\cdots} ds_{1}\\ &\geq& (\pi_{2}\pi_{3})^{m-1}\pi_{2}e^{\pi_{4}n_{2}t_{0}}{\sum}_{n=1}^{m}{\int}_{s_{1}+\cdots+s_{n}\leq n_{2}t_{0}}{\sum}_{j_{1}\in S\setminus\{ i,z\},j_{l+1}\in S\setminus\{j_{l},z\},l=1,2,\ldots,n-2} e^{q(i|i,f)s_{1}}q(j_{1}|i,f)\\ &&\times{\prod}_{l=1}^{n-2} e^{q(j_{l}|j_{l},f)s_{l+1}}q(j_{l+1}|j_{l},f) e^{q(j_{n-1}|j_{n-1},f)s_{n}}q(z|j_{n-1},f) ds_{n} {\cdots} ds_{1}\\ &=&(\pi_{2}\pi_{3})^{m-1}\pi_{2}e^{\pi_{4}n_{2}t_{0}}{P_{i}^{f}}(\tau_{z}\leq n_{2}t_{0},\tau_{z}\leq T_{m}), \end{array} $$
    (6.35)

    where the second equality follows from Proposition B.8 in Guo and Hernández-Lerma (2009, p.205). Note that \(\overline {b}:=(\pi _{2}\pi _{3})^{m^{*}-1}\pi _{2}e^{\pi _{4}n_{2}t_{0}}\in (0,1)\). Hence, using Eqs. 6.31, 6.34 and 6.35, we have

    $$\begin{array}{@{}rcl@{}} \overline{P}^{f}_{i,n_{1}}(\tau_{z}\leq n_{2}t_{0})&\geq& \overline{b}{P_{i}^{f}}(\tau_{z}\leq n_{2}t_{0},\tau_{z}\leq T_{m^{*}})\\ &=&\overline{b}{P_{i}^{f}}(\tau_{z}\leq n_{2}t_{0},\tau_{1}\leq m^{*})\\ &\geq&\overline{b}[{P_{i}^{f}}(\tau_{z}\leq n_{2}t_{0})+{P_{i}^{f}}(\tau_{1}\leq m^{*})-1] \geq\frac{1}{6}\overline{b} \end{array} $$

    for all i∈S and f∈F. Therefore, Eq. 6.30 holds with \(\widetilde {t}_{0}:=n_{2}t_{0}\) and \(\widetilde {\alpha }:=1-\frac {1}{6}\overline {b}\). Furthermore, employing Eq. 6.30 and an induction argument, we obtain

    $$ \overline{P}^{f}_{i,n_{1}}(\tau_{z}>n\widetilde{t}_{0})\leq \widetilde{\alpha}^{n} \ \ \text{for \ all} \ i\in S, \ f\in F \ \text{and} \ n=1,2,\ldots. $$
    (6.36)

    Then for any \(\kappa _{0}\in \overline {O}_{n_{1}}\) satisfying \(\kappa _{0}<\frac {\ln \widetilde {\alpha }}{\lambda \widetilde {t}_{0}}\), i∈S and f∈F, by Eq. 6.36 we get

    $$\begin{array}{@{}rcl@{}} e^{\lambda \overline{H}_{\kappa_{0},n_{1}}(i,f)}&=&{\sum}_{m=0}^{\infty}\overline{E}_{i,n_{1}}^{f}\left[e^{-\lambda \kappa_{0} \tau_{z}}I_{\{\tau_{z}\in(m\widetilde{t}_{0},(m+1)\widetilde{t}_{0}]\}}\right]\\ &\leq&{\sum}_{m=0}^{\infty}e^{-\lambda\kappa_{0}(m+1)\widetilde{t}_{0}}\overline{P}_{i,n_{1}}^{f}(\tau_{z}>m\widetilde{t}_{0})\\ &\leq&\frac{e^{-\lambda \kappa_{0}\widetilde{t}_{0}}}{1-\widetilde{\alpha} e^{-\lambda \kappa_{0}\widetilde{t}_{0}}}. \end{array} $$

    Take any \(\kappa _{1}\in (0,\kappa _{0})\) satisfying \(\kappa _{1}<\min _{(i,a)\in K}\left \{\frac {1}{\lambda }[\lambda c(i,a)-\lambda g^{*}_{\lambda }+q(i|i,a)]\right \}\) and define \(\overline {H}^{*}_{\kappa _{1},n_{1}}(i,f):=\gamma _{n_{1}}e^{\lambda \overline {H}_{\kappa _{1},n_{1}}(i,f)+\lambda h_{g^{*}_{\lambda },\lambda }(i,f)}\) for all i∈S and f∈F. Thus, using Eqs. 6.25 and 6.29, we have

    $$\begin{array}{@{}rcl@{}} \left\{\begin{array}{ll} \overline{H}^{*}_{\kappa_{1},n_{1}}(i,f)\geq-\frac{1}{\lambda c(i,f)-\lambda g^{*}_{\lambda}-\lambda \kappa_{1}+q(i|i,f)}\left(\gamma_{n_{1}} q(z|i,f)+{\sum}_{j\in S\setminus\{i, z\}} \overline{H}^{*}_{\kappa_{1},n_{1}}(j,f)q(j|i,f)\right)\\ \overline{H}^{*}_{\kappa_{1},n_{1}}(z,f)\geq-\frac{1}{\lambda c(z,f)-\lambda g^{*}_{\lambda}-\lambda \kappa_{1}+q(z|z,f)}{\sum}_{j\in S\setminus\{z\}}\overline{H}^{*}_{\kappa_{1},n_{1}}(j,f)q(j|z,f) \end{array}\right. \end{array} $$

    for all \(i\in S\setminus \{z\}\) and \(f\in F\). Hence, by the last inequalities and arguments similar to those for Eq. 6.11, we obtain \(\overline {H}^{*}_{\kappa _{1},n_{1}}(i,f)\geq \gamma _{n_{1}}e^{\lambda h_{g^{*}_{\lambda }+\kappa _{1},\lambda }(i,f)}\) for all \(i\in S\) and \(f\in F\), which implies

    $$ e^{\lambda h^{*}_{g^{*}_{\lambda},\lambda}(z)} \sup_{f\in F}e^{\lambda \overline{H}_{\kappa_{1},n_{1}}(z,f)}\geq\sup_{f\in F}e^{\lambda \overline{H}_{\kappa_{1},n_{1}}(z,f)+\lambda h_{g^{*}_{\lambda},\lambda}(z,f)}\geq e^{\lambda h^{*}_{g^{*}_{\lambda}+\kappa_{1},\lambda}(z)}. $$
    (6.37)

    Let \(\{\rho _{m},m\geq 1\}\subseteq (0,\kappa _{1})\) be a sequence satisfying \(\lim _{m\to \infty }\rho _{m}=0\). Note that \(0\leq e^{-\lambda \rho _{m}\widetilde {t}_{0}}\widetilde {\alpha }\leq e^{-\lambda \kappa _{0}\widetilde {t}_{0}}\widetilde {\alpha }<1\) for all m≥1. Then direct calculations give

    $$\begin{array}{@{}rcl@{}} 0\leq \sup_{f\in F}e^{\lambda \overline{H}_{\rho_{m},n_{1}}(z,f)}-1&=&\sup_{f\in F}{\sum}_{n=1}^{\infty}\frac{(-\lambda\rho_{m})^{n}}{n!}\overline{E}_{z,n_{1}}^{f}[{\tau_{z}^{n}}]\\ &=&\sup_{f\in F}{\sum}_{n=1}^{\infty}\frac{(-\lambda\rho_{m})^{n}}{(n-1)!}{\int}_{0}^{\infty}t^{n-1}\overline{P}_{z,n_{1}}^{f}(\tau_{z}>t)dt\\ &=&\sup_{f\in F}{\sum}_{n=1}^{\infty}\frac{(-\lambda\rho_{m})^{n}}{(n-1)!}{\sum}_{l=0}^{\infty}{\int}_{l\widetilde{t}_{0}}^{(l+1) \widetilde{t}_{0}}t^{n-1}\overline{P}_{z,n_{1}}^{f}(\tau_{z}>t)dt\\ &\leq&\sup_{f\in F}{\sum}_{n=1}^{\infty}\frac{(-\lambda\rho_{m}\widetilde{t}_{0})^{n}}{(n-1)!}{\sum}_{l=0}^{\infty} (l+1)^{n-1}\overline{P}_{z,n_{1}}^{f}(\tau_{z}>l\widetilde{t}_{0})\\ &\leq&{\sum}_{n=1}^{\infty}\frac{(-\lambda\rho_{m}\widetilde{t}_{0})^{n}}{(n-1)!}{\sum}_{l=0}^{\infty} (l+1)^{n-1}\widetilde{\alpha}^{l}\\ &=&-\frac{\lambda\rho_{m}\widetilde{t}_{0}e^{-\lambda\rho_{m}\widetilde{t}_{0}}} {1-\widetilde{\alpha}e^{-\lambda\rho_{m}\widetilde{t}_{0}}} \end{array} $$
    (6.38)

    for all m≥1, where the second equality is due to Lemma 3.4 in Kallenberg (2012, p.49) and the last inequality follows from Eq. 6.36. Thus, employing Eq. 6.38 we obtain \(\lim _{m\to \infty }\sup _{f\in F}e^{\lambda \overline {H}_{\rho _{m},n_{1}}(z,f)}=1\). Hence, for any \(\varepsilon \in (0,e^{-\lambda h^{*}_{g^{*}_{\lambda },\lambda }(z)}-1]\), there exists a positive integer \(m_{0}\) such that \(\sup _{f\in F}e^{\lambda \overline {H}_{\rho _{m_{0}},n_{1}}(z,f)}-1\leq \varepsilon \), which together with Eq. 6.37 implies

    $$ e^{\lambda h^{*}_{g^{*}_{\lambda}+\rho_{m_{0}},\lambda}(z)}\leq e^{\lambda h^{*}_{g^{*}_{\lambda},\lambda}(z)} \sup_{f\in F}e^{\lambda \overline{H}_{\rho_{m_{0}},n_{1}}(z,f)}\leq1. $$
    (6.39)

    Moreover, by Eq. 6.39 we have \(g^{*}_{\lambda }+\rho _{m_{0}}\in G_{\lambda }\), which leads to the contradiction \(g^{*}_{\lambda }+\rho _{m_{0}}\leq g^{*}_{\lambda }\). Therefore, we get \(h^{*}_{g^{*}_{\lambda },\lambda }(z)=0\).
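The closed form in the last line of Eq. 6.38 can be checked numerically. The following is a minimal sketch, not part of the proof: it writes `x` for \(-\lambda \rho _{m}\widetilde {t}_{0}\) and `alpha` for \(\widetilde {\alpha }\), sums the double series term by term, and compares the result with \(xe^{x}/(1-\widetilde {\alpha }e^{x})\). The function names, the truncation levels and the sample values are illustrative assumptions, and the check presumes \(\widetilde {\alpha }e^{x}<1\), as noted before Eq. 6.38.

```python
import math


def double_series(x, alpha, n_terms=200, l_terms=200):
    """Truncated sum over n >= 1 and l >= 0 of
    x**n / (n-1)! * (l+1)**(n-1) * alpha**l, accumulated term by term."""
    total = 0.0
    for l in range(l_terms):
        weight = alpha ** l
        term = x  # n = 1 term: x**1 / 0! * (l+1)**0
        for n in range(1, n_terms + 1):
            total += weight * term
            # Move from the n-th to the (n+1)-th term without forming
            # large powers or factorials explicitly.
            term *= x * (l + 1) / n
    return total


def closed_form(x, alpha):
    """Right-hand side x * e**x / (1 - alpha * e**x), valid when alpha * e**x < 1."""
    return x * math.exp(x) / (1.0 - alpha * math.exp(x))


if __name__ == "__main__":
    x, alpha = 0.3, 0.5  # illustrative values with alpha * e**x < 1
    print(double_series(x, alpha), closed_form(x, alpha))
```

Because the closed form is of order `x` as `x` tends to zero, the same expression also makes visible why the supremum in Eq. 6.38 tends to 1 along the sequence \(\rho _{m}\to 0\).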

7 Proof of Theorem 3.4

In this section, we prove Theorem 3.4.

Proof

Fix any \(i\in S\) and \(\pi \in \Pi \). For any \(\lambda _{1},\lambda _{2}>0\), we have

$$\begin{array}{@{}rcl@{}} E_{i}^{\pi}\!\left[e^{\lambda_{1}{{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt}\!\right] =&E_{i}^{\pi}\!\left[e^{\lambda_{2}{{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt} e^{(\lambda_{1}-\lambda_{2}){{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt}\right]\\ &{}\leq E_{i}^{\pi}\left[e^{\lambda_{2}{{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt} \right]e^{|\lambda_{1}-\lambda_{2}|T\max_{(i,a)\in K}c(i,a)} \end{array} $$

and

$$\begin{array}{@{}rcl@{}} E_{i}^{\pi}\left[e^{\lambda_{1}{{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt}\right] \geq&E_{i}^{\pi}\left[e^{\lambda_{2}{{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt} \right]e^{-|\lambda_{1}-\lambda_{2}|T\max_{(i,a)\in K}c(i,a)}, \end{array} $$

which give

$$|\lambda_{1} J^{*}_{V_{\lambda_{1}}}(i)-\lambda_{2} J^{*}_{V_{\lambda_{2}}}(i)|\leq|\lambda_{1}-\lambda_{2}|\max_{(i,a)\in K}c(i,a). $$

Thus, Theorem 3.3(b) and the last inequality yield that \(\lambda g^{*}_{\lambda }\) is continuous in \(\lambda \in (0,\infty )\). Hence, by Theorem 3.3(b) again, we obtain that \(J^{*}_{V_{\lambda }}(i)\) is continuous in \(\lambda \in (0,\infty )\). Moreover, by the same technique, we obtain the continuity of \(J^{*}_{V_{\lambda }}(i)\) in \(\lambda \in (-\infty ,0)\).

Below we show that \(J^{*}_{V_{\lambda }}(i)\) is continuous at \(\lambda =0\). Take an arbitrary sequence \(\{\lambda _{n},n\geq 1\}\subseteq (0,\infty )\) with \(\lim _{n\to \infty }\lambda _{n}=0\). For each n≥1, let \(g^{*}_{\lambda _{n}}\) and \(h^{*}_{g^{*}_{\lambda _{n}},\lambda _{n}}\) be as in Theorem 3.3 with \(\lambda _{n}\) in lieu of \(\lambda \). Set \(\widetilde {h}^{*}_{g^{*}_{\lambda _{n}},\lambda _{n}}(i):=h^{*}_{g^{*}_{\lambda _{n}},\lambda _{n}}(i)-\min _{j\in S}h^{*}_{g^{*}_{\lambda _{n}},\lambda _{n}}(j)\) for all n≥1. Then for each n≥1, we have \(\widetilde {h}^{*}_{g^{*}_{\lambda _{n}},\lambda _{n}}(i)\geq 0\) and there exists \(i^{\lambda _{n}}\in S\) satisfying \(\widetilde {h}^{*}_{g^{*}_{\lambda _{n}},\lambda _{n}}(i^{\lambda _{n}})=0\). Moreover, using Theorem 3.3(a), for each n≥1, there exists \(f^{*}_{\lambda _{n}}\in F\) such that

$$\begin{array}{@{}rcl@{}} \lambda_{n} g^{*}_{\lambda_{n}} e^{\lambda_{n} \widetilde{h}^{*}_{g^{*}_{\lambda_{n}},\lambda_{n}}(i)}&=&\underset{a\in A(i)}{\inf} \!\left\{\lambda_{n} c(i,a)e^{\lambda_{n} \widetilde{h}^{*}_{g^{*}_{\lambda_{n}},\lambda_{n}}(i)}\!\,+\,\underset{j\in S}{\sum}~e^{\lambda_{n} \widetilde{h}^{*}_{g^{*}_{\lambda_{n}},\lambda_{n}}(j)}q(j|i,a)\right\} \end{array} $$
(7.1)
$$\begin{array}{@{}rcl@{}} &=&\lambda_{n} c(i,f^{*}_{\lambda_{n}})e^{\lambda_{n} \widetilde{h}^{*}_{g^{*}_{\lambda_{n}},\lambda_{n}}(i)}+\underset{j\in S}{\sum}~e^{\lambda_{n} \widetilde{h}^{*}_{g^{*}_{\lambda_{n}},\lambda_{n}}(j)}q(j|i,f^{*}_{\lambda_{n}}). \end{array} $$
(7.2)

Note that Theorem 3.3(b) gives \(0\leq g^{*}_{\lambda _{n}}\leq \max _{(i,a)\in K}c(i,a)\) for all n≥1. Thus, choose any convergent subsequence \(\{g^{*}_{\lambda _{n_{l}}},l\geq 1\}\) of \(\{g^{*}_{\lambda _{n}},n\geq 1\}\) and denote the corresponding limit by

$$ \overline{g}:=\lim_{l\to\infty}g^{*}_{\lambda_{n_{l}}}\in \left[0,\max_{(i,a)\in K}c(i,a)\right]. $$
(7.3)

Furthermore, by the finiteness of S and the compactness of F and \([0,\infty ]\), there exists a subsequence of \(\{n_{l}\}\) (still denoted by \(\{n_{l}\}\)) such that \(i^{\lambda _{n_{l}}}=i^{*}\in S\) for all l≥1 and the limits of the sequences \(\{\widetilde {h}^{*}_{g^{*}_{\lambda _{n_{l}}},\lambda _{n_{l}}},l\geq 1\}\) and \(\{f^{*}_{\lambda _{n_{l}}},l\geq 1\}\) exist. Set

$$ \overline{h}(j):=\lim_{l\to\infty}\widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(j)\ \ \text{and} \ \ f^{*}(j):=\lim_{l\to\infty}f^{*}_{\lambda_{n_{l}}}(j)\ \ \text{for \ all} \ j\in S. $$
(7.4)

Then we have \(\overline {h}(j)\geq 0\) for all \(j\in S\) and \(\overline {h}(i^{*})=0\). Below we show \(\overline {h}(j)<\infty \) for all \(j\in S\) by an induction argument. For any \(j\neq i^{*}\), Assumption 3.1(iii) yields that there exist different states \(k_{1}=i^{*}\), \(k_{2},\ldots ,k_{m^{\prime }}=j\) such that \(q(k_{n+1}|k_{n},f^{*})>0\) for all \(n=1,2,\ldots , m^{\prime }-1\). Observe that \(\overline {h}(k_{1})=\overline {h}(i^{*})=0\). Thus, \(\overline {h}(k_{n})<\infty \) holds for n=1. Suppose that \(\overline {h}(k_{n^{*}})<\infty \) for some \(n^{*}\in \{1,2,\ldots ,m^{\prime }-1\}\). Employing Eq. 7.2 we obtain

$$g^{*}_{\lambda_{n_{l}}} e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(k_{n^{*}})} =c(k_{n^{*}},f^{*}_{\lambda_{n_{l}}})e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(k_{n^{*}})}+{\sum}_{j\in S} \frac{e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(j)}-1}{\lambda_{n_{l}}}q(j|k_{n^{*}},f^{*}_{\lambda_{n_{l}}}), $$

which together with the inequality \(e^{x}-1\geq x\) (\(x\geq 0\)) gives

$$\begin{array}{@{}rcl@{}} g^{*}_{\lambda_{n_{l}}} e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(k_{n^{*}})}\geq&c(k_{n^{*}},f^{*}_{\lambda_{n_{l}}})e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(k_{n^{*}})}+\frac{e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(k_{n^{*}})}-1}{\lambda_{n_{l}}}q(k_{n^{*}}|k_{n^{*}},f^{*}_{\lambda_{n_{l}}})\\ &+\widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}} (k_{n^{*}+1})q(k_{n^{*}+1}|k_{n^{*}},f^{*}_{\lambda_{n_{l}}}) \end{array} $$

for all l≥1. Then letting \(l\to \infty \) on both sides of the last inequality and using the induction hypothesis, Assumption 3.1(ii), Eqs. 7.3 and 7.4, we get

$$\overline{g}\geq c(k_{n^{*}},f^{*})+\overline{h}(k_{n^{*}})q(k_{n^{*}}|k_{n^{*}},f^{*})+\overline{h}(k_{n^{*}+1})q(k_{n^{*}+1}|k_{n^{*}},f^{*}),$$

which implies \(\overline {h}(k_{n^{*}+1})<\infty \). Hence, by induction we have \(\overline {h}(k_{n})<\infty \) for all \(n=1,2,\ldots ,m^{\prime }\). Therefore, we obtain \(\overline {h}(j)\in [0,\infty )\) for all \(j\in S\). By Eq. 7.1 we get

$$\begin{array}{@{}rcl@{}} g^{*}_{\lambda_{n_{l}}} e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(i)}&\leq&c(i,a)e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(i)}+{\sum}_{j\in S} \frac{e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(j)}-1}{\lambda_{n_{l}}}q(j|i,a) \end{array} $$

for all l≥1 and \(a\in A(i)\). Then the last inequality together with Eqs. 7.3 and 7.4 yields

$$ \overline{g}\leq c(i,a)+\sum\limits_{j\in S}\overline{h}(j)q(j|i,a) \ \ \text{for \ all} \ a\in A(i). $$
(7.5)

Thus, employing Eq. 7.5 and the Dynkin formula, we obtain

$$\overline{g}T\leq E_{i}^{\pi}\left[{{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt\right]+E_{i}^{\pi}[\overline{h}(\xi_{T})]-\overline{h}(i) $$

for all T>0, which gives

$$ \overline{g}\leq J^{*}_{V_{0}}(i). $$
(7.6)
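The step from the Dynkin-formula bound to Eq. 7.6 can be spelled out as follows. This is a sketch under the assumption, not restated in this section, that \(J_{V_{0}}(i,\pi )\) is the long-run time average of the expected cost under \(\pi \) (the argument works for either the limsup or the liminf version). Since S is finite and \(\overline {h}(j)\in [0,\infty )\) for all \(j\in S\), we have \(E_{i}^{\pi }[\overline {h}(\xi _{T})]-\overline {h}(i)\leq \max _{j\in S}\overline {h}(j)<\infty \), so dividing the bound by T gives

$$ \overline{g}\leq \frac{1}{T}E_{i}^{\pi}\left[{{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt\right]+\frac{1}{T}\max_{j\in S}\overline{h}(j) \ \ \text{for \ all} \ T>0. $$

Letting \(T\to \infty \) removes the second term on the right-hand side and yields \(\overline {g}\leq J_{V_{0}}(i,\pi )\). Because \(\pi \) is arbitrary, taking the infimum over all policies gives Eq. 7.6.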

On the other hand, using Eq. 7.2 and arguments similar to those for Eq. 7.6, we have

$$ \overline{g}=J_{V_{0}}(i,f^{*})\geq J^{*}_{V_{0}}(i). $$
(7.7)

Then combining Eqs. 7.6 and 7.7, we get \(\overline {g}=J^{*}_{V_{0}}(i)\). Since all the convergent subsequences of \(\{g^{*}_{\lambda _{n}},n\geq 1\}\) have the same limit \(J^{*}_{V_{0}}(i)\), we obtain \(\lim _{n\to \infty }g^{*}_{\lambda _{n}}=J^{*}_{V_{0}}(i)\), which together with Theorem 3.3(b) implies \(\lim _{n\to \infty }J^{*}_{V_{\lambda _{n}}}(i)=J^{*}_{V_{0}}(i)\). Therefore, \(J^{*}_{V_{\lambda }}(i)\) is right-continuous at \(\lambda =0\). Moreover, by the same technique, \(J^{*}_{V_{\lambda }}(i)\) is left-continuous at \(\lambda =0\). This completes the proof of the theorem. □
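The continuity at \(\lambda =0\) can be illustrated numerically on a toy model. The following is a minimal sketch, not part of the proof. It assumes a hypothetical two-state chain with a single action, so that the infimum in Eq. 7.1 is trivial. Writing \(v(j)=e^{\lambda h(j)}\), Eq. 7.1 then reads \((Q+\lambda \,\mathrm {diag}(c))v=\lambda g v\) with \(v>0\), so \(\lambda g\) is the Perron eigenvalue of \(Q+\lambda \,\mathrm {diag}(c)\), which has a closed form for a 2×2 matrix. The rates `a`, `b`, the costs `c0`, `c1` and the function names are made-up illustrative choices.

```python
import math


def risk_sensitive_average_cost(lam, a, b, c0, c1):
    """g_lam for a two-state chain with rates a (0 -> 1) and b (1 -> 0),
    cost rates c0, c1 and risk-sensitivity parameter lam != 0."""
    m00 = -a + lam * c0
    m11 = -b + lam * c1
    # Larger eigenvalue of [[m00, a], [b, m11]]; for positive off-diagonal
    # entries it is the one with a strictly positive eigenvector.
    top = 0.5 * (m00 + m11 + math.sqrt((m00 - m11) ** 2 + 4.0 * a * b))
    return top / lam


def expected_average_cost(a, b, c0, c1):
    """Risk-neutral average cost: stationary distribution (b, a) / (a + b)."""
    return (b * c0 + a * c1) / (a + b)


if __name__ == "__main__":
    a, b, c0, c1 = 1.0, 2.0, 3.0, 1.0  # illustrative values only
    print("risk-neutral:", expected_average_cost(a, b, c0, c1))
    for lam in (1.0, 0.1, 0.001, -0.001, -0.1, -1.0):
        print(lam, risk_sensitive_average_cost(lam, a, b, c0, c1))
```

In this toy model the values for small positive and small negative `lam` both approach the risk-neutral average cost, in line with the right- and left-continuity established above. The sketch says nothing about models with more than one action, where the infimum in Eq. 7.1 has to be handled as in the proof.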

8 Concluding remarks

In this paper we have studied the U-average cost criterion for CTMDPs with a finite state space. Under the continuity-compactness condition and the irreducibility condition, we have shown that the simultaneous Doeblin condition for the CTMDPs holds. Moreover, we have obtained the optimality equation of the auxiliary risk-sensitive first passage optimization problem and the properties of the corresponding optimal value function for any nonzero risk-sensitivity parameter. Then, employing the obtained results on the risk-sensitive first passage criterion, we have established the existence of a solution to the optimality equation of the risk-sensitive average cost criterion, allowing the risk-sensitivity parameter to take any nonzero value. Furthermore, we have proven that the optimal value function of the risk-sensitive average cost criterion is continuous with respect to the risk-sensitivity parameter. Finally, we have given the connections between the U-average cost criterion and the average cost criteria induced by the identity function and the exponential utility function, from which the existence of a U-average optimal deterministic stationary policy has been shown. It should be mentioned that the optimality equation of the risk-sensitive average cost criterion, valid for any nonzero risk-sensitivity parameter, plays a crucial role in the study of the U-average cost criterion. Hence, when dealing with the U-average cost criterion with a countable state space, the difficulty lies in finding conditions under which the optimality equation of the risk-sensitive average cost criterion holds for any nonzero risk-sensitivity parameter. In addition, CTMDPs with bounded transition rates under the expected discounted cost and expected average cost criteria can be transformed into equivalent discrete-time MDPs by the uniformization technique. Whether the uniformization technique is applicable to CTMDPs under the risk-sensitive average cost criterion is a very interesting problem.