Average cost criterion induced by the regular utility function for continuous-time Markov decision processes

Wei, Qingda; Chen, Xian

doi:10.1007/s10626-017-0237-x

Published: 20 February 2017

Volume 27, pages 501–524, (2017)

Discrete Event Dynamic Systems


Abstract

In this paper we study the average cost criterion induced by the regular utility function (U-average cost criterion) for continuous-time Markov decision processes. This criterion is a generalization of the risk-sensitive average cost and expected average cost criteria. We first introduce an auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function under the slight conditions. Then we show that the pair of the optimal value functions of the risk-sensitive average cost criterion and the risk-sensitive first passage criterion is a solution to the optimality equation of the risk-sensitive average cost criterion allowing the risk-sensitivity parameter to take any nonzero value. Moreover, we have that the optimal value function of the risk-sensitive average cost criterion is continuous with respect to the risk-sensitivity parameter. Finally, we give the connections between the U-average cost criterion and the average cost criteria induced by the identity function and the exponential utility function, and prove the existence of a U-average optimal deterministic stationary policy in the class of all randomized Markov policies.


1 Introduction

Continuous-time Markov decision processes (CTMDPs) have rich applications in queueing systems, inventory management, telecommunication, epidemic control, etc.; see, for instance, Puterman (1994), Kitaev and Rykov (1995), Guo and Hernández-Lerma (2009), Guo et al. (2012) and the references therein. The average cost criterion is a common optimality criterion for CTMDPs; it includes the expected average cost and the risk-sensitive average cost criteria. For the expected average cost criterion, the decision-maker is supposed to be risk-neutral and there exists a vast literature; see, for instance, Puterman (1994), Kitaev and Rykov (1995), Guo and Hernández-Lerma (2009), Guo et al. (2012), Wei and Chen (2014) and the extensive references therein. For the risk-sensitive average cost criterion, the exponential utility function is employed to characterize the risk preferences of the decision-maker. More specifically, when the risk-sensitivity parameter of the exponential utility function takes positive (negative) values, the decision-maker is risk-averse (risk-seeking). Although the risk-sensitive average cost criterion for discrete-time MDPs has been widely studied (see, for instance, Cavazos-Cadena and Fernández-Gaucherand (2005), Cavazos-Cadena (2010) and Cavazos-Cadena and Hernández-Hernández (2011) for the countable state space and Jaśkiewicz (2007) and Di Masi and Stettner (2007) for the uncountable state space), only a handful of papers address this criterion for CTMDPs. Ghosh and Saha (2014) and Wei and Chen (2016) investigate the risk-sensitive average cost criterion with a positive risk-sensitivity parameter for CTMDPs and obtain the existence of optimal policies via the optimality equation approach. Moreover, to the best of our knowledge, there is no existing literature dealing with the risk-sensitive average cost criterion for CTMDPs in which the risk-sensitivity parameter is allowed to take negative values.

On the other hand, in real-world applications the risk preferences of the decision-maker may be described neither by the identity function nor by the exponential utility function. Besides the identity function and the exponential utility function, there are other utility functions that describe the risk preferences of the decision-maker, such as the logarithmic utility function, the power utility function, etc. Thus, it is desirable to consider the average cost criterion induced by a general utility function. For discrete-time MDPs, Bäuerle and Rieder (2014) discuss the average cost criterion induced by the power utility function, and Cavazos-Cadena and Hernández-Hernández (2016) study the average cost criterion induced by the regular utility function, which is referred to as the U-average cost criterion for simplicity. The U-average cost criterion includes the expected average cost criterion induced by the identity function, the risk-sensitive average cost criterion induced by the exponential utility function and the average cost criteria induced by the logarithmic and power utility functions. For CTMDPs, as far as we can tell, the discussions on the average cost criterion focus only on the expected average cost and risk-sensitive average cost criteria.

In this paper we study the U-average cost criterion for the CTMDPs. The state space is a finite set and the action space is a Borel space. Since the existence of optimal policies for the U-average cost criterion is closely connected with the risk-sensitive average cost criterion allowing the risk-sensitivity parameter to take any nonzero value, we need to investigate the risk-sensitive average cost criterion first. Under the optimality conditions of the paper (i.e., the standard continuity-compactness condition and the irreducibility condition), we first show that the simultaneous Doeblin condition holds (see Theorem 3.1). The simultaneous Doeblin condition plays a crucial role in establishing the existence of a solution to the risk-sensitive average cost optimality equation. Then we introduce an auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function (see Theorem 3.2). Based on Theorem 3.2, we have that the pair of the optimal value functions of the risk-sensitive average cost criterion and the risk-sensitive first passage criterion is a solution to the optimality equation of the risk-sensitive average cost criterion allowing the risk-sensitivity parameter to take any nonzero value (see Theorem 3.3). This generalizes the results in Ghosh and Saha (2014) and Wei and Chen (2016), which only allow the risk-sensitivity parameter to take positive values (see Remarks 3.3 and 3.4). It should be noted that the extension is nontrivial. Moreover, we prove that the optimal value function of the risk-sensitive average cost criterion is continuous with respect to the risk-sensitivity parameter (see Theorem 3.4). Finally, using the results on the expected average cost and risk-sensitive average cost criteria, we show that there exists an optimal deterministic stationary policy in the class of all randomized Markov policies for the U-average cost criterion. Moreover, we have that the optimal value function and the set of all optimal stationary policies for the U-average cost criterion coincide with those for the average cost criterion induced either by the identity function or by the exponential utility function with some risk-sensitivity parameter (see Theorem 4.1).

The rest of this paper is organized as follows. In Section 2, we introduce the control model and the optimality criterion. In Section 3, we present the results on the simultaneous Doeblin condition, the risk-sensitive first passage criterion and the risk-sensitive average cost criterion, whose proofs are given in Sections 5–7. In Section 4, we state and prove the main results on the U-average cost criterion. In Section 8, we conclude with some remarks.

2 Preliminaries

In this section, we introduce the control model and the average cost criterion induced by the regular utility function for the CTMDPs. The control model in this paper is given by

$$\mathcal{M}:=\{S, A, (A(i), i\in S), q(j|i,a), c(i,a)\}. $$

(i) The state space S is a finite set endowed with the discrete topology.
(ii) The action space A is a Borel space with the Borel σ-algebra $\mathcal {B}(A)$.
(iii) The set of all admissible actions in state i∈S denoted by A(i) is a Borel-measurable subset of A. Moreover, we set K:={(i,a)|i∈S,a∈A(i)} which stands for the set of all admissible state-action pairs.
(iv) The real-valued measurable transition rate q(j|i,a) satisfies q(j|i,a)≥0 for all (i,a)∈K and j≠i, and is conservative (i.e., ${\sum }_{j\in S}q(j|i,a)=0$ for all (i,a)∈K).
(v) The positive real-valued cost rate function c(i,a) is measurable in a∈A(i) for each i∈S.

The evolution of a CTMDP is intuitively described as follows. The state of the dynamical system is observed continuously by a decision-maker. When the system occupies state i∈S, the decision-maker takes an action a from the set of all admissible actions A(i). As a result of this action, a cost is incurred at the rate c(i,a), and the system stays at state i for a random time following the exponential distribution and then jumps to a new state j≠i according to some distribution (see Proposition B.8 in Guo and Hernández-Lerma (2009, p.205) for the explicit expressions of the corresponding distributions). When the system moves to the state j, the above procedure is repeated.
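The evolution just described can be sketched in a few lines of Python. This is only an illustrative simulation under simplifying assumptions that are not part of the paper: each action set is finite and indexed by integers, the policy is deterministic stationary, the sojourn time in state i under action a is taken to be exponential with rate −q(i|i,a), and the next state j≠i is drawn with probability q(j|i,a)/(−q(i|i,a)). The names `simulate_total_cost`, `q`, `c`, and `f` are hypothetical.

```python
import random


def simulate_total_cost(q, c, f, i0, horizon, rng):
    """Simulate one path on [0, horizon] and return the accumulated cost.

    q[i][a][j] : transition rate from state i to state j under action a
                 (conservative: each row sums to zero).
    c[i][a]    : positive cost rate in state i under action a.
    f[i]       : action index chosen in state i (deterministic stationary policy).
    """
    t, state, total = 0.0, i0, 0.0
    while True:
        a = f[state]
        rate_out = -q[state][a][state]  # total rate of leaving the current state
        if rate_out <= 0.0:
            # Absorbing state: the cost accrues until the horizon.
            return total + c[state][a] * (horizon - t)
        sojourn = rng.expovariate(rate_out)  # exponential holding time
        if t + sojourn >= horizon:
            return total + c[state][a] * (horizon - t)
        total += c[state][a] * sojourn
        t += sojourn
        # Jump to j != state with probability q(j|i,a) / rate_out.
        targets = [j for j in range(len(q)) if j != state]
        weights = [q[state][a][j] for j in targets]
        state = rng.choices(targets, weights=weights)[0]


# Hypothetical two-state example with one action per state.
q = [[[-1.0, 1.0]], [[2.0, -2.0]]]
c = [[1.0], [3.0]]
f = [0, 0]
sample = simulate_total_cost(q, c, f, 0, 10.0, random.Random(0))
```

Averaging `sample / horizon` over many paths and long horizons gives a Monte Carlo estimate of the expected average cost of the policy `f`.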

Below we formally give a mathematical description.

Let $S_{\infty }:=S\cup \{i_{\infty }\}$ with an isolated point $i_{\infty }\notin S$, $\mathbb {R}_{+}:=(0,+\infty )$, ${\Omega }^{0}:=(S\times \mathbb {R}_{+})^{\infty }$, ${\Omega }:={\Omega }^{0}\cup \{(i_{0}, \theta _{1},i_{1}, \ldots , \theta _{m-1},i_{m-1}, \infty , i_{\infty },\infty , i_{\infty },\ldots )| i_{0}\in S, \ i_{l}\in S, \ \theta _{l}\in \mathbb {R}_{+} \ \text {for \ each} \ 1\leq l\leq m-1, \ m\geq 2\}$, and let $\mathcal {F}$ be the Borel σ-algebra of Ω. For each $\omega =(i_{0},\theta _{1},i_{1},\ldots )\in {\Omega }$, define $X_{0}(\omega ):=i_{0}$, $T_{0}(\omega ):=0$, $X_{m}(\omega ):=i_{m}$, $T_{m}(\omega ):=\theta _{1}+\theta _{2}+\cdots +\theta _{m}$ for m≥1, $T_{\infty }(\omega ):=\lim _{m\to \infty }T_{m}(\omega )$, and the state process

$$\xi_{t}(\omega):={\sum}_{m\geq0}I_{\{T_{m}\leq t<T_{m+1}\}}i_{m}+I_{\{t\geq T_{\infty}\}}i_{\infty}\ \ \text{for} \ t\geq0, $$

where $I_{D}$ denotes the indicator function of a set D. The process after $T_{\infty }$ is regarded as absorbed in the state $i_{\infty }$. Hence, we write $q(i_{\infty }|i_{\infty }, a_{\infty })=0$, $c(i_{\infty }, a_{\infty })=0$, $A(i_{\infty }):=\{a_{\infty }\}$, $A_{\infty }:=A\cup \{a_{\infty }\}$, where $a_{\infty }$ is an isolated point. Let $\mathcal {F}_{t}:=\sigma (\{T_{m}\leq s, X_{m}=i\}: i\in S, s\leq t, m\geq 0)$ for t≥0, $\mathcal {F}_{s-}:=\bigvee _{0\leq t<s}\mathcal {F}_{t}$, and $\mathcal {P}:=\sigma (\{D\times \{0\}, D\in \mathcal {F}_{0}\}\cup \{D\times (s,\infty ), D\in \mathcal {F}_{s-}, s>0\})$, which denotes the σ-algebra of predictable sets on ${\Omega }\times [0,\infty )$ related to $\{\mathcal {F}_{t}\}_{t\geq 0}$.

Before giving the optimality criterion, we need to introduce the following definition of a randomized Markov policy.

Definition 2.1

A $\mathcal {P}$-measurable transition probability π(⋅|ω,t) on $(A_{\infty }, \mathcal {B}(A_{\infty }))$, concentrated on $A(\xi _{t-}(\omega ))$, is called a randomized Markov policy if there exists a kernel φ on $A_{\infty }$ given $S_{\infty }\times [0,\infty )$ such that $\pi (\cdot |\omega , t)=\varphi (\cdot |\xi _{t-}(\omega ),t)$. A policy π is said to be deterministic stationary if there exists a function f on $S_{\infty }$ satisfying f(i)∈A(i) for all $i\in S_{\infty }$ and $\pi (\cdot |\omega , t)=\delta _{f(\xi _{t-}(\omega ))}(\cdot )$, where $\delta _{x}(\cdot )$ is the Dirac measure concentrated at the point x.

Let Π and F be the set of all randomized Markov policies and the set of all deterministic stationary policies, respectively.

Given an arbitrary initial state i∈S and any policy π∈Π, employing Theorem 4.27 in Kitaev and Rykov (1995), we obtain the existence of a unique probability measure denoted by $P_{i}^{\pi }$ on $({\Omega },\mathcal {F})$. The notation $E_{i}^{\pi }$ represents the expectation operator with respect to $P_{i}^{\pi }$.

Let $\mathcal {U}$ be the set of all the real-valued utility functions U on $\mathbb {R}_{+}$ satisfying the following properties: (i) U has continuous derivatives up to second order; (ii) the first derivative $U^{\prime }(x)$ is positive for all $x\in \mathbb {R}_{+}$. For any $U\in \mathcal {U}$, the Arrow-Pratt risk-sensitivity function $\mathcal {A}_{U}$ is defined by $\mathcal {A}_{U}(x):=\frac {U^{\prime \prime }(x)}{U^{\prime }(x)}$ for all $x\in \mathbb {R}_{+}$, where $U^{\prime \prime }$ denotes the second derivative of U. Below we give the definition of a regular utility function in Cavazos-Cadena and Hernández-Hernández (2016).

Definition 2.2

A utility function $U\in \mathcal {U}$ is said to be regular if $\lambda _{U}:= \lim _{x\to \infty }\mathcal {A}_{U}(x)$ exists in $\mathbb {R}:=(-\infty ,\infty )$. The constant λ _U is called the asymptotic risk-sensitivity parameter of the regular utility function U.

Let $\mathcal {U}_{r}$ be the set of all the regular utility functions in $\mathcal {U}$. For any $U\in \mathcal {U}_{r}$, i∈S and π∈Π, the average cost criterion induced by the regular utility function U is defined by

$$ J_{U}(i,\pi):=\limsup_{T\to\infty}\frac{1}{T}U^{-1}\left(E_{i}^{\pi}\left [U\left({{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt\right)\right]\right), $$

(2.1)

where U ⁻¹ denotes the inverse function of U. In the following, we refer to the average cost criterion defined in Eq. 2.1 as the U-average cost criterion for simplicity.

Remark 2.1

Let $Y:={{\int }_{0}^{T}}{\int }_{A}c(\xi _{t},a)\pi (da|\xi _{t},t)dt$ be the total cost incurred during the finite time interval [0,T]. The quantity $U^{-1}(E_{i}^{\pi }[U(Y)])$ stands for the certainty equivalent of Y with respect to the utility function U, and the decision-maker is indifferent between paying the random cost Y and paying the certainty equivalent of Y; see the detailed discussions in Cavazos-Cadena and Fernández-Gaucherand (2005), Cavazos-Cadena (2010), Cavazos-Cadena and Hernández-Hernández (2011), Bäuerle and Rieder (2014), and Cavazos-Cadena and Hernández-Hernández (2016).
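As a small numerical illustration of the certainty equivalent, the sketch below replaces the expectation by an empirical mean over a list of sampled costs. The function name `certainty_equivalent` and the sample values are hypothetical and chosen only for illustration; the utility functions used are the identity, an exponential utility with parameter 0.5, and the logarithm.

```python
import math


def certainty_equivalent(samples, utility, utility_inverse):
    """Return U^{-1}(mean of U(y)) over the sampled costs y."""
    mean_utility = sum(utility(y) for y in samples) / len(samples)
    return utility_inverse(mean_utility)


costs = [1.0, 3.0]  # hypothetical equally likely total costs

# Identity utility: the certainty equivalent is the plain mean.
ce_identity = certainty_equivalent(costs, lambda y: y, lambda u: u)

# Exponential utility with a positive parameter (risk-averse for costs).
lam = 0.5
ce_exponential = certainty_equivalent(
    costs, lambda y: math.exp(lam * y), lambda u: math.log(u) / lam
)

# Logarithmic utility: the certainty equivalent is the geometric mean.
ce_log = certainty_equivalent(costs, math.log, math.exp)
```

With these samples the exponential certainty equivalent lies above the mean and the logarithmic one lies below it, which reflects the convexity and concavity of the respective utility functions.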

Definition 2.3

A policy $\pi ^{*}\in {\Pi }$ is said to be U-average optimal if

$$J_{U}(i,\pi^{*})=\inf_{\pi\in{\Pi}}J_{U}(i,\pi)=:J_{U}^{*}(i) $$

for all i∈S. The function $J^{*}_{U}$ on S is referred to as the optimal value function of the U-average cost criterion.

Finally, we introduce the expected average cost and risk-sensitive average cost criteria which are the particular cases of the U-average cost criterion and play an important role in proving the existence of U-average optimal policies.

For each real number $\lambda \in \mathbb {R}$, define the real-valued function V _λ on $\mathbb {R}_{+}$ as follows:

$$\begin{array}{@{}rcl@{}} V_{\lambda}(x)=\left\{\begin{array}{ll} e^{\lambda x}, &\text{ if } \lambda>0,\\ x, & \text{ if } \lambda=0,\\ -e^{\lambda x}, &\text{ if } \lambda<0, \end{array}\right. \end{array} $$

(2.2)

for all $x\in \mathbb {R}_{+}$. It is obvious that $V_{\lambda }$ belongs to $\mathcal {U}_{r}$ for all $\lambda \in \mathbb {R}$. Then for each λ≠0, i∈S and π∈Π, by Eq. 2.1 we have

$$J_{V_{\lambda}}(i,\pi)=\limsup_{T\to\infty}\frac{1}{\lambda T}\ln E_{i}^{\pi}\left[e^{{\lambda{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt}\right], $$

which is induced by the exponential utility function and called the risk-sensitive average cost criterion in Ghosh and Saha (2014) and Wei and Chen (2016). For λ=0, Eq. 2.1 gives

$$J_{V_{0}}(i,\pi)=\limsup_{T\to\infty}\frac{1}{T}E_{i}^{\pi}\left[{{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt\right] $$

for all i∈S and π∈Π, which is induced by the identity function and referred to as the expected average cost criterion; see, for instance, Puterman (1994), Kitaev and Rykov (1995), Guo and Hernández-Lerma (2009), Guo et al. (2012), and Wei and Chen (2014). Hence, the $V_{\lambda }$-average cost criterion contains the expected average cost and risk-sensitive average cost criteria.
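For completeness, the step from Eq. 2.1 to the logarithmic expression for λ≠0 can be spelled out as follows; here Y denotes the total cost on [0,T] as in Remark 2.1, and the computation only uses the definition of $V_{\lambda }$ in Eq. 2.2. Inverting $V_{\lambda }$ on its range gives

$$V_{\lambda}^{-1}(y)=\frac{1}{\lambda}\ln y \ \ \text{for} \ \lambda>0, \qquad V_{\lambda}^{-1}(y)=\frac{1}{\lambda}\ln(-y) \ \ \text{for} \ \lambda<0, $$

so that in both cases

$$\frac{1}{T}V_{\lambda}^{-1}\left(E_{i}^{\pi}\left[V_{\lambda}(Y)\right]\right)=\frac{1}{\lambda T}\ln E_{i}^{\pi}\left[e^{\lambda Y}\right]. $$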

3 The V _λ-average cost criterion

In this section, we aim to give the optimality conditions for the existence of V _λ-average optimal policies and establish the existence of a solution to the optimality equation of the V _λ-average cost criterion. Since the existence of optimal policies and the optimality equation for the V ₀-average cost criterion (i.e., the expected average cost criterion) have been well studied (see Puterman (1994), Kitaev and Rykov (1995), Guo and Hernández-Lerma (2009), and Wei and Chen (2014) and the references therein), we mainly focus on the V _λ-average cost criterion for all λ≠0 (i.e., the risk-sensitive average cost criterion) below. To do so, we introduce the following assumption in Wei and Chen (2016), i.e., the usual continuity-compactness condition and the irreducibility condition.

Assumption 3.1

(i) For each i∈S, the set A(i) is compact.
(ii) For each i,j∈S, the functions c(i,a) and q(j|i,a) are continuous in a∈A(i).
(iii) For each f∈F, the continuous-time Markov chain associated with the transition rate q(⋅|⋅,f) is irreducible, which means that for any two states i≠j, there exist different states j ₁=i, j ₂,…, j _m such that q(j ₂|j ₁,f)⋯q(j|j _m,f)>0, where we write q(j|i,f):=q(j|i,f(i)).

Remark 3.1

For each f∈F, f can be viewed as ${\prod }_{i\in S}f(i)\in {\prod }_{i\in S}A(i)$. Thus, by Assumption 3.1(i) and the Tychonoff theorem, we have that F is compact and metrizable.

Fix any state z∈S throughout the paper. Define

$$\tau_{z}:=\inf\{t\geq T_{1}:\xi_{t}=z\} \ \text{with \ the \ convention\ that} \ \inf\emptyset:=\infty. $$

τ _z is the time of the first entry into the state z after the first transition has occurred. Under Assumption 3.1, the following result indicates that the simultaneous Doeblin condition (i.e., the statement of Theorem 3.1(b)) for the CTMDPs holds.

Theorem 3.1

Suppose that Assumption 3.1 is satisfied. Then the following assertions hold.

(a) There exists a constant $L\in \mathbb {R}_{+}$ such that ${E_{i}^{f}}[\tau _{z}]\leq L$ for all i∈S and f∈F.
(b) There exist constants $t_{0}\in \mathbb {R}_{+}$ and α∈(0,1) such that ${P_{i}^{f}}(\tau _{z}>t_{0})\leq \alpha $ for all i∈S and f∈F.

Proof

See Section 5. □
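For one fixed policy f, the quantity ${E_{i}^{f}}[\tau _{z}]$ bounded in Theorem 3.1(a) can be computed by a linear solve. The sketch below is an illustration under assumptions, not the argument of the paper: it takes the generator of the chain under f as a NumPy array `Q`, uses the standard first-step equations for hitting times, and also computes the invariant distribution so that the renewal identity $\mu _{f}(z)={E_{z}^{f}}[T_{1}]/{E_{z}^{f}}[\tau _{z}]$ can be checked numerically. The function names are hypothetical.

```python
import numpy as np


def expected_return_times(Q, z):
    """Return the vector of E_i[tau_z] for an irreducible generator Q.

    For i != z, tau_z is the hitting time of z, which solves
    sum_j Q[i, j] * m[j] = -1 with m[z] = 0. For i = z, the chain first
    waits an exponential time with mean 1 / (-Q[z, z]) and then jumps.
    """
    n = Q.shape[0]
    others = [i for i in range(n) if i != z]
    hitting = np.linalg.solve(Q[np.ix_(others, others)], -np.ones(len(others)))
    tau = np.zeros(n)
    tau[others] = hitting
    exit_rate = -Q[z, z]
    tau[z] = 1.0 / exit_rate + (Q[z, others] @ hitting) / exit_rate
    return tau


def invariant_distribution(Q):
    """Solve mu Q = 0 with sum(mu) = 1 by least squares."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu


# Hypothetical two-state generator under a fixed policy.
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
tau = expected_return_times(Q, 0)
mu = invariant_distribution(Q)
```

Taking the maximum of `tau` over a finite grid of policies gives a numerical stand-in for the constant L, although the theorem itself covers all of F at once.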

Remark 3.2

The assertion of part (a) is equivalent to that of part (b). Indeed, from the proof of Theorem 3.1, we see that part (a) implies part (b). On the other hand, suppose that part (b) holds. Then by an induction argument, we have ${P_{i}^{f}}(\tau _{z}>mt_{0})\leq \alpha ^{m}$ for all i∈S, f∈F and m=1,2,…. Thus, employing the last inequality and Lemma 3.4 in Kallenberg (2012, p.49), we obtain

$${E_{i}^{f}}[\tau_{z}]={\int}_{0}^{\infty}{P_{i}^{f}}(\tau_{z}>t)dt={\sum}_{m=0}^{\infty}{\int}_{mt_{0}}^{(m+1)t_{0}}{P_{i}^{f}}(\tau_{z}>t)dt\leq t_{0}{\sum}_{m=0}^{\infty}{P_{i}^{f}}(\tau_{z}>mt_{0})\leq\frac{t_{0}}{1-\alpha} $$

for all i∈S and f∈F. Hence, part (b) implies that part (a) holds with $L=\frac {t_{0}}{1-\alpha }$.

To obtain the existence of a solution to the optimality equation of the risk-sensitive average cost criterion, we need to introduce the following auxiliary risk-sensitive first passage optimization problem which is of interest on its own.

For each i∈S and f∈F, we set c(i,f):=c(i,f(i)). For each $g\in \mathbb {R}$, λ≠0, i∈S and f∈F, the risk-sensitive first passage criterion $h_{g,\lambda }(i,f)$ and the corresponding optimal value function $h^{*}_{g,\lambda }(\cdot )$ on S are given by

$$ h_{g,\lambda}(i,f):=\frac{1}{\lambda}\ln {E_{i}^{f}}\left[e^{\lambda{\int}_{0}^{\tau_{z}}\left(c(\xi_{t},f)-g\right)dt}\right] \ \ \text{ and } \ \ h_{g,\lambda}^{*}(i):=\inf_{f\in F}h_{g,\lambda}(i,f), $$

(3.1)

respectively. Let $G_{\lambda }:=\left \{g\in \mathbb {R}:h^{*}_{g,\lambda }(z)\leq 0\right \}$ for all λ>0 and $G_{\lambda }:=\left \{g\in \mathbb {R}:h^{*}_{g,\lambda }(z)\geq 0\right \}$ for all λ<0. Moreover, we define

$$\begin{array}{@{}rcl@{}} g^{*}_{\lambda}:=\left\{\begin{array}{ll} \inf G_{\lambda}, &\textrm{if $\lambda>0$},\\ \sup G_{\lambda}, &\textrm{if $\lambda<0$}. \end{array}\right. \end{array} $$

(3.2)

Then we have the following assertions on the risk-sensitive first passage criterion.

Theorem 3.2

Under Assumption 3.1, the following statements hold for all λ≠0.

(a) The set G _λ is nonempty.
(b) For each $g\in \mathbb {R}$ and f∈F, the function $h_{g,\lambda }(\cdot ,f)$ on S satisfies the following equations
$$\begin{array}{@{}rcl@{}} \left\{\begin{array}{ll} e^{\lambda h_{g,\lambda}(i,f)}=Q(i,f,g,\lambda)\left(q(z|i,f)+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h_{g,\lambda}(j,f)}q(j|i,f)\right) \\ e^{\lambda h_{g,\lambda}(z,f)}=Q(z,f,g,\lambda) {\sum}_{j\in S\setminus\{z\}}e^{\lambda h_{g,\lambda}(j,f)}q(j|z,f) \end{array}\right. \end{array} $$
for all i∈S∖{z}, where we set $Q(i,f,g, \lambda ):={\int }_{0}^{\infty }e^{\lambda (c(i,f)-g)s+q(i|i,f)s}ds$ and make a convention that $0\cdot \infty :=0$.
(c) For each $g\in G_{\lambda }$, the function $h_{g,\lambda }^{*}$ on S satisfies the following equations
$$ \left\{\begin{array}{ll} e^{\lambda h^{*}_{g,\lambda}(i)}=sgn(\lambda)\inf\limits_{a\in A(i)}\left\{sgn(\lambda)Q(i,a,g,\lambda)\left(q(z|i,a)+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|i,a)\right)\right\}\\ e^{\lambda h^{*}_{g,\lambda}(z)}=sgn(\lambda)\inf\limits_{a\in A(z)}\left\{sgn(\lambda)Q(z,a,g,\lambda){\sum}_{j\in S\setminus\{z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|z,a)\right\} \end{array}\right. $$
(3.3)
for all i∈S∖{z}, where we set $Q(i,a,g,\lambda ):={\int }_{0}^{\infty }e^{\lambda (c(i,a)-g)s+q(i|i,a)s}ds$ and sgn(λ) is the sign function, i.e., if λ>0, sgn(λ)=1; if λ<0, sgn(λ)=−1. Moreover, there exists a policy $f^{*}_{g,\lambda }\in F$ with $f^{*}_{g,\lambda }(i)\in A(i)$ attaining the minimum of Eq. 3.3 , and for any $f^{*}_{g,\lambda }\in F$ with $f^{*}_{g,\lambda }(i)\in A(i)$ attaining the minimum of Eq. 3.3 , we have $h_{g,\lambda }(i,f^{*}_{g,\lambda })=h^{*}_{g,\lambda }(i)\in \mathbb {R}$ and $Q(i,f^{*}_{g,\lambda },g,\lambda )<\infty $ for all i∈S.
(d) We have $g^{*}_{\lambda }\in G_{\lambda }$ and $h^{*}_{g^{*}_{\lambda },\lambda }(z)=0$.

Proof

See Section 6. □
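As a reading aid for the quantity Q(i,a,g,λ) in Theorem 3.2, note that the integrand is a single exponential, so the integral can be evaluated in closed form directly from its definition:

$$Q(i,a,g,\lambda)={\int}_{0}^{\infty}e^{\left(\lambda (c(i,a)-g)+q(i|i,a)\right)s}ds=\left\{\begin{array}{ll} \dfrac{1}{-q(i|i,a)-\lambda (c(i,a)-g)}, &\text{ if } \lambda (c(i,a)-g)+q(i|i,a)<0,\\ \infty, &\text{ otherwise}. \end{array}\right. $$

In particular, Q(i,a,g,λ) is finite exactly when the exponential growth rate λ(c(i,a)−g) is strictly smaller than the exit rate −q(i|i,a) of state i under action a, which is where the convention $0\cdot \infty :=0$ becomes relevant.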

Remark 3.3

The equations in Eq. 3.3 are referred to as the optimality equations of the risk-sensitive first passage criterion. The statements of Theorem 3.2 hold for an arbitrary risk-sensitivity parameter λ≠0 and extend the results in Wei and Chen (2016) for any λ>0. Moreover, as can be seen in the proof of Theorem 3.2, the treatment of the case λ<0 is more difficult than that of the case λ>0. Hence, the extension is nontrivial.

Let B(S) be the set of all real-valued functions on S. Below we state the optimality equation and the existence of optimal policies for the risk-sensitive average cost criterion.

Theorem 3.3

Suppose that Assumption 3.1 is satisfied. For each λ≠0, let $g^{*}_{\lambda }$ and $h^{*}_{g^{*}_{\lambda },\lambda }$ be as in Eqs. 3.1 and 3.2. Then the following assertions hold.

(a) The pair $(g^{*}_{\lambda }, h^{*}_{g^{*}_{\lambda },\lambda })\in \mathbb {R}\times B(S)$ satisfies the following optimality equation
$$ \lambda g^{*}_{\lambda} e^{\lambda h^{*}_{g^{*}_{\lambda},\lambda}(i)}= sgn(\lambda)\inf\limits_{a\in A(i)}\left\{sgn(\lambda)\left(\lambda c(i,a)e^{\lambda h^{*}_{g^{*}_{\lambda},\lambda}(i)}+{\sum}_{j\in S} e^{\lambda h^{*}_{g^{*}_{\lambda},\lambda}(j)}q(j|i,a)\right)\right\} $$
(3.4)
for all i∈S. Moreover, there exists $f^{*}_{\lambda }\in F$ with $f^{*}_{\lambda }(i)\in A(i)$ attaining the minimum of Eq. 3.4.
(b) For any $f^{*}_{\lambda }\in F$ with $f^{*}_{\lambda }(i)\in A(i)$ attaining the minimum of Eq. 3.4, we have
$$J^{*}_{V_{\lambda}}(i)=J_{V_{\lambda}}(i,f^{*}_{\lambda})=\lim_{T\to\infty}\frac{1}{\lambda T}\ln E_{i}^{f^{*}_{\lambda}}\left[e^{{\lambda{\int}_{0}^{T}}c(\xi_{t},f^{*}_{\lambda})dt}\right]=g^{*}_{\lambda}$$
for all i∈S. Hence, the policy $f^{*}_{\lambda }$ is V _λ -average optimal.

Proof

The assertions follow from Theorem 3.2 above, the Feynman-Kac formula and techniques similar to those used for Theorem 3.2 in Wei and Chen (2016). □
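For a single stationary policy f, Eq. 3.4 without the infimum reads $(Q_{f}+\lambda \,\mathrm {diag}(c_{f}))v=\lambda g v$ with $v(i)=e^{\lambda h(i)}>0$, where $Q_{f}$ is the generator under f. For an irreducible chain this identifies λg with the eigenvalue of largest real part of $Q_{f}+\lambda \,\mathrm {diag}(c_{f})$, which is the one with a positive eigenvector. The sketch below uses this observation numerically. It is an illustration under assumptions that are not in the paper: every action set is finite, the optimum is found by brute-force enumeration of F rather than by the paper's first passage construction, and the function names are hypothetical.

```python
import itertools

import numpy as np


def risk_sensitive_average_cost(Q, c, lam):
    """Risk-sensitive average cost of one stationary policy, for lam != 0.

    Q   : (n, n) conservative generator of the irreducible chain under the policy.
    c   : (n,) positive cost rates under the policy.
    lam : nonzero risk-sensitivity parameter.
    """
    M = Q + lam * np.diag(c)
    principal = np.max(np.linalg.eigvals(M).real)  # eigenvalue of largest real part
    return principal / lam


def best_stationary_policy(q, c, lam):
    """Enumerate all deterministic stationary policies (finite action sets only).

    q[i][a] is the row of transition rates out of state i under action a,
    c[i][a] the corresponding cost rate. Returns (value, policy).
    """
    n = len(q)
    best = None
    for f in itertools.product(*[range(len(q[i])) for i in range(n)]):
        Q = np.array([q[i][f[i]] for i in range(n)], dtype=float)
        cost = np.array([c[i][f[i]] for i in range(n)], dtype=float)
        g = risk_sensitive_average_cost(Q, cost, lam)
        if best is None or g < best[0]:
            best = (g, f)
    return best


# Hypothetical example: in state 0 the controller may leave at rate 1 or 2.
q = [[[-1.0, 1.0], [-2.0, 2.0]], [[1.0, -1.0]]]
c = [[1.0, 1.0], [3.0]]
value_averse, policy_averse = best_stationary_policy(q, c, 0.5)
value_seeking, policy_seeking = best_stationary_policy(q, c, -0.5)
```

In this example state 0 is the cheap state, so leaving it slowly is preferable for both signs of the parameter, and the risk-averse value exceeds the risk-seeking one.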

Remark 3.4

Theorem 3.3 establishes the existence of a solution to the optimality equation and the existence of an optimal stationary policy for the V _λ-average cost criterion with an arbitrary risk-sensitivity parameter λ≠0, which generalizes those in Ghosh and Saha (2014) and Wei and Chen (2016). More precisely, the risk-sensitivity parameter λ is positive in Ghosh and Saha (2014) and Wei and Chen (2016) and satisfies some additional condition that $\lambda \max _{(i,a)\in K}c(i,a)<b$ (for some constant b>0) in Ghosh and Saha (2014). Moreover, to the best of our knowledge, the risk-sensitive average cost criterion with a negative risk-sensitivity parameter has not been studied in the existing literature.

Finally, we give the following statement on the continuity of $J^{*}_{V_{\lambda }}(i)$ in $\lambda \in \mathbb {R}$, which plays a crucial role in the study on the existence of U-average optimal policies.

Theorem 3.4

Suppose that Assumption 3.1 holds. Then for each i∈S, $J^{*}_{V_{\lambda }}(i)$ is continuous in $\lambda \in \mathbb {R}$.

Proof

See Section 7. □

4 The existence of U-average optimal policies

In this section, we show the existence of optimal policies for the U-average cost criterion induced by a regular utility function U with the asymptotic risk-sensitivity parameter λ _U.

Below we state the main results on the U-average cost criterion.

Theorem 4.1

Suppose that Assumption 3.1 is satisfied. Let $g^{*}_{\lambda _{U}}$ be as in Eq. 3.2 with λ _U in lieu of λ for all λ _U ≠0 and $g^{*}_{0}:=\lim _{\lambda \to 0}g^{*}_{\lambda }$. Then the following assertions hold.

(a) $J^{*}_{U}(i)=J^{*}_{V_{\lambda _{U}}}(i)=g^{*}_{\lambda _{U}}$ for all i∈S.
(b) For each f∈F and i∈S, $J_{U}(i,f)=J_{V_{\lambda _{U}}}(i,f)=\lim _{T\to \infty }\frac {1}{\lambda _{U} T}\ln {E_{i}^{f}}\left [e^{\lambda _{U}{{\int }_{0}^{T}}c(\xi _{t},f)dt}\right ]$ for all $\lambda _{U}\neq 0$ and $J_{U}(i,f)=J_{V_{0}}(i,f)=\lim _{T\to \infty }\frac {1}{T}{E_{i}^{f}}\left [{{\int }_{0}^{T}}c(\xi _{t},f)dt\right ]$ for $\lambda _{U}=0$. Moreover, the limits are independent of the state i∈S.
(c) For any λ _U ≠0 and $f^{*}_{\lambda _{U}}\in F$ attaining the minimum of Eq. 3.4 with λ _U in lieu of λ, we have $J^{*}_{U}(i)=J_{U}(i,f^{*}_{\lambda _{U}})=g^{*}_{\lambda _{U}}$ for all i∈S. For λ _U =0, there exist $(g^{*}_{0},h_{0})\in \mathbb {R}\times B(S)$ and a policy $f^{*}_{0}\in F$ satisfying
$$\begin{array}{@{}rcl@{}} g^{*}_{0}&=&\inf_{a\in A(i)}\left\{c(i,a)+{\sum}_{j\in S}h_{0}(j)q(j|i,a)\right\} \\ &=&c(i,f^{*}_{0})+{\sum}_{j\in S}h_{0}(j)q(j|i,f^{*}_{0}) \end{array} $$
(4.1)
for all i∈S. Moreover, for any $f^{*}_{0}\in F$ attaining the minimum of Eq. 4.1, we have $J^{*}_{U}(i)=J_{U}(i,f^{*}_{0})=g^{*}_{0}$ for all i∈S. Hence, the policy $f^{*}_{\lambda _{U}}$ is U-average optimal.

Proof

(a) Fix any i∈S, π∈Π and η>0. The relation $\lim _{x\to \infty }\mathcal {A}_{U}(x)=\lambda _{U}$ implies that there exists a positive constant $x_{0}$ satisfying
$$ \lambda_{U}-\eta\leq\mathcal{A}_{U}(x)\leq \lambda_{U}+\eta \ \ \text{for \ all} \ x>x_{0}. $$
(4.2)
Note that $\min _{(i,a)\in K}c(i,a)>0$. Thus, there exists a positive constant $T^{*}$ such that
$$Y:={{\int}_{0}^{T}}{\int}_{A} c(\xi_{t},a)\pi(da|\xi_{t},t)dt> x_{0} \ \ \text{for \ all} \ T> T^{*}. $$
Let $V_{\lambda _{U}-\eta }$ and $V_{\lambda _{U}+\eta }$ be the regular utility functions given by Eq. 2.2. Then we have
$$ \mathcal{A}_{V_{\lambda_{U}-\eta}}(x)=\lambda_{U}-\eta \ \ \text{and} \ \ \mathcal{A}_{V_{\lambda_{U}+\eta}}(x)=\lambda_{U}+\eta \ \ \text{for \ all} \ x\in\mathbb{R}_{+}. $$
(4.3)
Moreover, direct calculations yield that $E_{i}^{\pi }[U(Y)]$, $E_{i}^{\pi }[V_{\lambda _{U}-\eta }(Y)]$ and $E_{i}^{\pi }[V_{\lambda _{U}+\eta }(Y)]$ are finite. Thus, employing Theorem 4.1 in Cavazos-Cadena and Hernández-Hernández (2016) together with Eqs. 4.2 and 4.3, we obtain $J_{V_{\lambda _{U}-\eta }}(i,\pi )\leq J_{U}(i,\pi )\leq J_{V_{\lambda _{U}+\eta }}(i,\pi )$, which gives
$$ J^{*}_{V_{\lambda_{U}-\eta}}(i)\leq J^{*}_{U}(i)\leq J^{*}_{V_{\lambda_{U}+\eta}}(i). $$
(4.4)
Hence, letting η→0 in Eq. 4.4 and using Theorems 3.3(b) and 3.4, we get the desired result.
(b) Fix any f∈F. Let $\mathcal {M}_{f}$ be the control model in which we take A(i)={f(i)} for all i∈S and the other components are the same as in the model $\mathcal {M}$. Then it is obvious that the model $\mathcal {M}_{f}$ satisfies Assumption 3.1. Thus, part (b) follows directly from part (a), Lemma 3.1(b) in Guo and Hernández-Lerma (2009) and Theorem 3.3.
(c) By part (a), Theorem 7.8 in Guo and Hernández-Lerma (2009) and Theorem 3.3 we have $J^{*}_{U}(i)=J^{*}_{V_{\lambda _{U}}}(i)=J_{V_{\lambda _{U}}}(i,f^{*}_{\lambda _{U}})=g^{*}_{\lambda _{U}}$ for all i∈S, which together with part (b) implies the assertion.

□

Remark 4.1

(a) Theorem 4.1 indicates that the optimal value function of the U-average cost criterion induced by a regular utility function U with the asymptotic risk-sensitivity parameter λ _U is a constant and equals the optimal value function of the $V_{\lambda _{U}}$-average cost criterion. Moreover, the set of all U-average optimal stationary policies coincides with the set of all $V_{\lambda _{U}}$-average optimal stationary policies. Hence, we can compute a U-average optimal policy and the optimal value function of the U-average cost criterion via the policy iteration algorithms given in Ghosh and Saha (2014) for the risk-sensitive average cost criterion with the risk-sensitivity parameter λ _U≠0 or in Guo and Hernández-Lerma (2009) for the expected average cost criterion with λ _U=0.
(b) Besides the risk-sensitive average cost criterion induced by the exponential utility function and the expected average cost criterion induced by the identity function, the U-average cost criterion includes other average cost criteria, such as those induced by the logarithmic utility function $W(x)=\ln x$ and the power utility function $U_{\beta }(x):=x^{\beta }$ (β>0) for all $x\in \mathbb {R}_{+}$. Obviously, we have that the utility functions W and $U_{\beta }$ are regular with the asymptotic risk-sensitivity parameters $\lambda _{W}=\lambda _{U_{\beta }}=0$. Thus, Theorem 4.1 implies $J^{*}_{W}(i)=J^{*}_{U_{\beta }}(i)=g_{0}^{*}$ for all i∈S. That is, under Assumption 3.1, the optimal value functions $J^{*}_{W}$ and $J^{*}_{U_{\beta }}$ are independent of the state variable, and equal the optimal value function of the expected average cost criterion, which is risk-neutral.
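The regularity of the logarithmic and power utility functions mentioned in part (b) can be verified directly from the definition of the Arrow-Pratt risk-sensitivity function in Section 2. For all $x\in \mathbb {R}_{+}$ and β>0,

$$\mathcal{A}_{W}(x)=\frac{W^{\prime\prime}(x)}{W^{\prime}(x)}=\frac{-1/x^{2}}{1/x}=-\frac{1}{x}, \qquad \mathcal{A}_{U_{\beta}}(x)=\frac{\beta(\beta-1)x^{\beta-2}}{\beta x^{\beta-1}}=\frac{\beta-1}{x}, $$

and both expressions converge to 0 as x→∞. By contrast, for the exponential utility function $V_{\lambda }$ with λ≠0 the ratio is constant, $\mathcal {A}_{V_{\lambda }}(x)=\lambda $, in agreement with Eq. 4.3.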

5 Proof of Theorem 3.1

In this section, we give the proof of Theorem 3.1.

Proof

(a) By Assumption 3.1(iii) and the finiteness of S, for each f∈F, the continuous-time Markov chain associated with the transition rate q(⋅|⋅,f) has a unique invariant probability measure denoted by $\mu_{f}$, which satisfies ${\sum }_{i\in S}q(j|i,f)\mu _{f}(i)=0$ for all j∈S. Below we show that $\mu_{f}$ is continuous in f∈F. In fact, let $\{f_{n},n\geq 1\}\subseteq F$ be an arbitrary sequence converging to f∈F. Note that $0\leq \mu _{f_{n}}(j)\leq 1$ for all j∈S and n≥1. Fix any i∈S and choose any convergent subsequence $\{\mu _{f_{n_{l}}}(i),l\geq 1\}$ of $\{\mu _{f_{n}}(i),n\geq 1\}$. Let $\lim _{l\to \infty }\mu _{f_{n_{l}}}(i)=:\mu (i)$. Moreover, there exists a subsequence of $\{n_{l}\}$ (still denoted by $\{n_{l}\}$) such that $\lim _{l\to \infty }\mu _{f_{n_{l}}}(j)=:\widetilde {\mu }(j)$ for all j∈S and $\widetilde {\mu }(i)=\mu (i)$. Thus, we have $0\leq \widetilde {\mu }(j)\leq 1$, ${\sum }_{j\in S}\widetilde {\mu }(j)=1$, and
$${\sum}_{k\in S}q(j|k,f)\widetilde{\mu}(k)=\lim_{l\to\infty}{\sum}_{k\in S}q(j|k,f_{n_{l}})\mu_{f_{n_{l}}}(k)=0. $$
Hence, by the uniqueness of the invariant probability measure, we obtain $\widetilde {\mu }(j)=\mu _{f}(j)$ for all j∈S. Therefore, the continuity of $\mu_{f}$ in f∈F follows from the fact that any convergent subsequence $\{\mu _{f_{n_{l}}}(i),l\geq 1\}$ of $\{\mu _{f_{n}}(i),n\geq 1\}$ has the same limit $\mu_{f}(i)$. Set
$$\widetilde{g}:=\sup_{f\in F}\frac{{E_{z}^{f}}\left[{\int}_{0}^{\tau_{z}}(1-I_{z}(\xi_{t}))dt\right]}{{E_{z}^{f}}[\tau_{z}]}. $$
Then direct calculations give
$$\widetilde{g}=\sup_{f\in F}\frac{{E_{z}^{f}}\left[{\int}_{T_{1}}^{\tau_{z}}(1-I_{z}(\xi_{t}))dt\right]}{{E_{z}^{f}}[\tau_{z}]}=\sup_{f\in F}\frac{{E_{z}^{f}}[\tau_{z}]-{E_{z}^{f}}[T_{1}]}{{E_{z}^{f}}[\tau_{z}]}=\sup_{f\in F}\left(1-\mu_{f}(z)\right), $$
where the last equality is due to Proposition 2.1 in Anderson (1991, p.213) and Proposition B.8 in Guo and Hernández-Lerma (2009, p.205). Thus, by the compactness of F and the continuity of $\mu_{f}(z)$ in f∈F, there exists $\widetilde {f}\in F$ such that $\widetilde {g}=1-\mu _{\widetilde {f}}(z)$. Moreover, by Assumption 3.1(iii), we have $0<\mu _{\widetilde {f}}(z)\leq 1$, which implies $0\leq \widetilde {g}<1$. Define
$$\widetilde{h}(i):=E_{i}^{\widetilde{f}}\left[{\int}_{0}^{\tau_{z}}(1-I_{z}(\xi_{t})-\widetilde{g})dt\right] \ \ \text{for\ all} \ i\in S. $$
Then we have
$$ \widetilde{h}(z)=0 \ \ \text{and} \ \ \widetilde{h}(i)\geq0 \ \ \text{for \ all} \ i\in S\setminus\{z\}. $$
(5.1)
Next, we show that $\widetilde {h}\in B(S)$ and
$$ \widetilde{g}=\sup_{a\in A(i)}\left\{1-I_{z}(i)+{\sum}_{j\in S}\widetilde{h}(j)q(j|i,a)\right\} \ \ \text{for \ all} \ i\in S. $$
(5.2)
Indeed, for each i∈S∖{z}, by the strong Markov property, direct calculations yield
$$\begin{array}{@{}rcl@{}} \widetilde{h}(i)&=&E_{i}^{\widetilde{f}}\left[{\int}_{0}^{\tau_{z}}(1-I_{z}(\xi_{t})-\widetilde{g})dtI_{\{\tau_{z}=T_{1}\}}\right]{\kern-.5pt}+{\kern-.5pt} E_{i}^{\widetilde{f}}\left[{\int}_{0}^{\tau_{z}}(1{\kern-.5pt}-{\kern-.5pt}I_{z}(\xi_{t}){\kern-.5pt}-{\kern-.5pt}\widetilde{g})dtI_{\{\tau_{z}>T_{1}\}}\right]\!\!\\ &=&(1-\widetilde{g})E_{i}^{\widetilde{f}}[T_{1}I_{\{\tau_{z}=T_{1}\}}]+(1-\widetilde{g})E_{i}^{\widetilde{f}}[T_{1}I_{\{\tau_{z}>T_{1}\}}]\\ &&+E_{i}^{\widetilde{f}}\left[I_{\{\tau_{z}>T_{1}\}}E_{i}^{\widetilde{f}}\left[{\int}_{T_{1}}^{\tau_{z}}(1-I_{z}(\xi_{t})-\widetilde{g})dt \big|\mathcal{F}_{T_{1}}\right]\right]\\ &=&(1-\widetilde{g})E_{i}^{\widetilde{f}}[T_{1}]+ E_{i}^{\widetilde{f}}[I_{\{\tau_{z}>T_{1}\}}\widetilde{h}(\xi_{T_{1}})]\\ &=&-\frac{1-\widetilde{g}}{q(i|i,\widetilde{f})}-{\sum}_{j\in S\setminus\{i,z\}}\frac{\widetilde{h}(j)q(j|i,\widetilde{f})}{q(i|i,\widetilde{f})}, \end{array} $$
(5.3)
where the fourth equality follows from Proposition B.8 in Guo and Hernández-Lerma (2009, p.205). Similarly, we have
$$\begin{array}{@{}rcl@{}} \widetilde{h}(z)&=&-\widetilde{g}E_{z}^{\widetilde{f}}[T_{1}I_{\{\tau_{z}>T_{1}\}}] +E_{z}^{\widetilde{f}}[I_{\{\tau_{z}>T_{1}\}}\widetilde{h}(\xi_{T_{1}})]\\ &=&\frac{\widetilde{g}}{q(z|z,\widetilde{f})}-{\sum}_{j\in S\setminus\{z\}}\frac{\widetilde{h}(j)q(j|z,\widetilde{f})}{q(z|z,\widetilde{f})}. \end{array} $$
(5.4)
For any i≠z, Assumption 3.1(iii) implies that there exist distinct states $j_{1}=z, j_{2},\ldots ,j_{m}=i$ such that $q(j_{n+1}|j_{n},\widetilde {f})>0$ for all n=1,2,…,m−1. Then by Eq. 5.4 we obtain $\widetilde {h}(j_{2})<\infty $, which together with Eq. 5.3 and an induction argument gives $\widetilde {h}(i)<\infty $ for all i∈S. Hence, we get $\widetilde {h}\in B(S)$. Employing Eqs. 5.1, 5.3 and 5.4, we have
$$\begin{array}{@{}rcl@{}} \widetilde{g}&=&1-I_{z}(i)+{\sum}_{j\in S}\widetilde{h}(j)q(j|i,\widetilde{f}) \end{array} $$
(5.5)

$$\begin{array}{@{}rcl@{}} &\leq&\sup_{a\in A(i)}\left\{1-I_{z}(i)+{\sum}_{j\in S}\widetilde{h}(j)q(j|i,a)\right\} \end{array} $$
(5.6)
for all i∈S. On the other hand, fix any (k,a)∈K and define
$$\begin{array}{@{}rcl@{}} {\Phi}(i):=\left\{\begin{array}{ll} \widetilde{g}-1+I_{z}(i)-{\sum}_{j\in S}\widetilde{h}(j)q(j|i,a), & \ \text{if} \ i=k,\\ 0, & \ \text{otherwise}. \end{array}\right. \end{array} $$
(5.7)
Obviously, we get Φ∈B(S). Let $\widehat {f}\in F$ be a policy with $\widehat {f}(k)=a$ and $\widehat {f}(i)=\widetilde {f}(i)$ for all i∈S∖{k}. Then by Eqs. 5.5 and 5.7, we obtain
$$\widetilde{g}=1-I_{z}(i)+{\Phi}(i)+{\sum}_{j\in S}\widetilde{h}(j)q(j|i,\widehat{f}) $$
for all i∈S. Thus, using the last equality and the Dynkin formula, we have
$$\widetilde{g}T=E_{i}^{\widehat{f}}\left[{{\int}_{0}^{T}}(1-I_{z}(\xi_{t})+{\Phi}(\xi_{t}))dt\right] +E_{i}^{\widehat{f}}[\widetilde{h}(\xi_{T})]-\widetilde{h}(i), $$
which together with the fact that $\widetilde {h}\in B(S)$ gives
$$ \widetilde{g}=\lim_{T\to\infty}\frac{1}{T}E_{i}^{\widehat{f}}\left[{{\int}_{0}^{T}}(1-I_{z}(\xi_{t})+{\Phi}(\xi_{t}))dt\right] =1-\mu_{\widehat{f}}(z)+{\Phi}(k)\mu_{\widehat{f}}(k) $$
(5.8)
for all i∈S. Note that $\mu _{\widehat {f}}(k)>0$ and $\widetilde {g}\geq 1-\mu _{\widehat {f}}(z)$. Hence, by Eq. 5.8 we obtain Φ(k)≥0. Therefore, we get
$$\widetilde{g}\geq \sup_{a\in A(i)}\left\{1-I_{z}(i)+{\sum}_{j\in S}\widetilde{h}(j)q(j|i,a)\right\} $$
for all i∈S, which together with Eq. 5.6 implies Eq. 5.2. Fix any i∈S and f∈F below. By Eqs. 5.1 and 5.2 we obtain
$$\widetilde{h}(i)\geq-\frac{1-I_{z}(i)-\widetilde{g}}{q(i|i,f)}-{\sum}_{j\in S\setminus\{i,z\}}\frac{\widetilde{h}(j)q(j|i,f)}{q(i|i,f)}. $$
Then employing the last inequality, we have
$$ \widetilde{h}(\xi_{T_{m}})\geq {E_{i}^{f}}\left[{\int}_{T_{m}}^{T_{m+1}}(1-I_{z}(\xi_{t})-\widetilde{g})dt\big|\mathcal{F}_{T_{m}}\right]+ {E_{i}^{f}}\left[\widetilde{h}(\xi_{T_{m+1}})I_{\{\tau_{z}>T_{m+1}\}}\big|\mathcal{F}_{T_{m}}\right] $$
(5.9)
for all m=0,1,…. Thus, using Eq. 5.9 and an induction argument, we get
$$\widetilde{h}(i)\geq-\frac{1-I_{z}(i)-\widetilde{g}}{q(i|i,f)}+(1-\widetilde{g}){\sum}_{l=1}^{n}{E_{i}^{f}}[\theta_{l+1}I_{\{\tau_{z}>T_{l}\}}] +{E_{i}^{f}}\left[\widetilde{h}(\xi_{T_{n+1}})I_{\{\tau_{z}>T_{n+1}\}}\right] $$
for all n=1,2,…, which together with Eq. 5.1 yields
$$\widetilde{h}(i)+\frac{1-I_{z}(i)-\widetilde{g}}{q(i|i,f)}\geq(1-\widetilde{g}){\sum}_{l=1}^{\infty}{E_{i}^{f}}[\theta_{l+1}I_{\{\tau_{z}>T_{l}\}}]. $$
Hence, by the last inequality we obtain
$$ {\sum}_{l=2}^{\infty}{E_{i}^{f}}[\theta_{l}I_{\{\tau_{z}\geq T_{l}\}}]\leq \frac{1}{1-\widetilde{g}}\left[\max_{i\in S}\widetilde{h}(i)-\min_{(i,a)\in K}\frac{\widetilde{g}}{q(i|i,a)}\right]:=L_{1}. $$
(5.10)
Observe that
$${E_{i}^{f}}[\tau_{z}]={\sum}_{l=1}^{\infty}{E_{i}^{f}}[\theta_{l}I_{\{\tau_{z}\geq T_{l}\}}]=-\frac{1}{q(i|i,f)}+{\sum}_{l=2}^{\infty}{E_{i}^{f}}[\theta_{l}I_{\{\tau_{z}\geq T_{l}\}}], $$
which together with Eq. 5.10 implies ${E_{i}^{f}}[\tau _{z}]\leq L_{1}-\min _{(i,a)\in K}\frac {1}{q(i|i,a)}$. Therefore, the assertion holds with $L:=L_{1}-\min _{(i,a)\in K}\frac {1}{q(i|i,a)}$.
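
The identity $\widetilde {g}=\sup _{f\in F}(1-\mu _{f}(z))$ used in part (a) rests on the relation ${E_{z}^{f}}[T_{1}]/{E_{z}^{f}}[\tau _{z}]=\mu _{f}(z)$. The following is a minimal sketch, not part of the proof, that checks this relation in exact rational arithmetic for one fixed policy. The three-state generator `Q_EXAMPLE` (with state 0 playing the role of z) and the helper names `solve`, `invariant_measure` and `mean_return_time` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as Fr

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A square and nonsingular)."""
    n = len(A)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def invariant_measure(Q):
    """Solve sum_i mu(i) q(j|i) = 0 with sum_i mu(i) = 1, where Q[i][j] = q(j|i)."""
    n = len(Q)
    # Balance equations for j = 0..n-2; the last one is replaced by normalisation.
    A = [[Q[i][j] for i in range(n)] for j in range(n - 1)] + [[Fr(1)] * n]
    return solve(A, [Fr(0)] * (n - 1) + [Fr(1)])

def mean_return_time(Q, z):
    """E_z[tau_z]: mean holding time at z plus the mean hitting time of z afterwards."""
    others = [i for i in range(len(Q)) if i != z]
    # Hitting times m(i), i != z, solve sum_{j != z} q(j|i) m(j) = -1.
    A = [[Q[i][j] for j in others] for i in others]
    m = dict(zip(others, solve(A, [Fr(-1)] * len(others))))
    hold = -1 / Q[z][z]                       # E_z[T_1]
    return hold + sum(Q[z][j] / -Q[z][z] * m[j] for j in others)

# Hypothetical irreducible generator (rows sum to zero); state 0 stands for z.
Q_EXAMPLE = [[Fr(-2), Fr(1), Fr(1)],
             [Fr(1), Fr(-3), Fr(2)],
             [Fr(2), Fr(2), Fr(-4)]]

mu = invariant_measure(Q_EXAMPLE)
tau = mean_return_time(Q_EXAMPLE, 0)
hold = -1 / Q_EXAMPLE[0][0]
ratio = (tau - hold) / tau                    # (E_z[tau_z] - E_z[T_1]) / E_z[tau_z]
```

In this toy chain `ratio` and `1 - mu[0]` agree exactly, which is the fixed-policy version of the last equality in the display defining $\widetilde {g}$.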
(b) By part (a) we have
$${P_{i}^{f}}(\tau_{z}>t)\leq \frac{{E_{i}^{f}}[\tau_{z}]}{t}\leq \frac{L}{t} $$
for all i∈S, f∈F and t>0. Moreover, there exists $t_{0}\in \mathbb {R}_{+}$ such that $\frac {L}{t_{0}}\in (0,1)$. Hence, part (b) holds with $\alpha :=\frac {L}{t_{0}}$.

□
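
Part (b) is Markov's inequality applied to the uniform bound L from part (a). As a rough numerical illustration only, the sketch below takes the hypothetical case in which $\tau _{z}$ is exponentially distributed with a known rate, so that the bound L equals its mean, and compares the exact tail with the bound L/t. The function names and the numbers are assumptions made for the example.

```python
import math

def markov_tail_bound(mean_bound, t):
    """Markov's inequality: P(tau > t) <= E[tau] / t <= mean_bound / t for t > 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    return mean_bound / t

def choose_t0(mean_bound, alpha):
    """Pick t0 with mean_bound / t0 = alpha for a target alpha in (0, 1)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    return mean_bound / alpha

# Hypothetical toy case: tau_z ~ Exp(rate), hence E[tau_z] = 1 / rate =: L.
rate = 0.5
L = 1 / rate
t0 = choose_t0(L, alpha=0.5)
exact_tail = math.exp(-rate * t0)             # true P(tau_z > t0) in the toy case
bound = markov_tail_bound(L, t0)              # the value alpha = L / t0
```

Any $t_{0}>L$ works in the proof; the choice $t_{0}=2L$ made here simply gives $\alpha =1/2$.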

6 Proof of Theorem 3.2

In this section, we present the proof of Theorem 3.2.

Proof

The statements for the case λ>0 follow from Theorem 3.1 in Wei and Chen (2016). Below we only need to prove the case λ<0.

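(a) Set $\widetilde {M}:=\min _{(i,a)\in K}c(i,a)$. Since λ<0 and $c(i,a)-\widetilde {M}\geq 0$ for all (i,a)∈K, the exponent $\lambda {\int}_{0}^{\tau_{z}}(c(\xi_{t},f)-\widetilde{M})dt$ is nonpositive, so the corresponding expectation is at most 1. Thus, we obtain $h_{\widetilde {M},\lambda }(i,f)\geq 0$ for all i∈S and f∈F, which gives $h^{*}_{\widetilde {M},\lambda }(z)\geq 0$. Therefore, the set $G_{\lambda}$ is nonempty.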
(b) From the proof of Theorem 3.1(b) in Wei and Chen (2016), we see that part (b) also holds for the case λ<0.
(c) Fix any $g\in G_{\lambda}$. Set $\overline {c}:=\max _{(i,a)\in K} |c(i,a)-g|$. For each i∈S∖{z}, f∈F and m≥1, direct calculations yield
$$\begin{array}{@{}rcl@{}} e^{\lambda h_{g,\lambda}(z,f)}&\geq & {E_{z}^{f}}\left[e^{\lambda {\int}_{0}^{\tau_{z}}(c(\xi_{t},f)-g)dt}I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq m-1, \xi_{T_{m}}=i\right\}}\right]\\ &=&{E_{z}^{f}}\left[I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq m-1, \xi_{T_{m}}=i\right\}}e^{\lambda {\int}_{0}^{T_{m}}(c(\xi_{t},f)-g)dt}{E_{z}^{f}}\left[e^{\lambda {\int}_{T_{m}}^{\tau_{z}}(c(\xi_{t},f)-g)dt}\big|\mathcal{F}_{T_{m}}\right]\right]\\ &=&e^{\lambda h_{g,\lambda}(i,f)}{E_{z}^{f}}\left[I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq m-1, \xi_{T_{m}}=i\right\}}e^{\lambda {\int}_{0}^{T_{m}}(c(\xi_{t},f)-g)dt}\right]\\ &\geq&e^{\lambda h_{g,\lambda}(i,f)}{E_{z}^{f}}\left[I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq m-1, \xi_{T_{m}}=i\right\}}e^{\lambda \overline{c}T_{m}} \right], \end{array} $$
which together with Eq. 3.1 and the definition of $G_{\lambda}$ gives
$$ e^{\lambda h_{g,\lambda}(i,f)}{E_{z}^{f}}\left[I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq m-1, \xi_{T_{m}}=i\right\}}e^{\lambda \overline{c}T_{m}} \right]\leq e^{\lambda h^{*}_{g,\lambda}(z)}\leq 1. $$
(6.1)
Suppose that $\sup _{f\in F}e^{\lambda h_{g,\lambda }(i,f)}=\infty $. Then there exists a sequence $\{f_{n},n\geq 1\}\subseteq F$ such that $e^{\lambda h_{g,\lambda}(i,f_{n})}\geq n$ for all n≥1. Thus, by Eq. 6.1 we obtain
$$ \lim_{n\to\infty}E_{z}^{f_{n}}\left[I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq m-1, \xi_{T_{m}}=i\right\}}e^{\lambda \overline{c}T_{m}} \right]=0 $$
(6.2)
for all m≥1. Because F is compact, there exists a subsequence of $\{f_{n},n\geq 1\}$ (denoted by the same sequence) such that $f_{n}$ converges to some $\overline {f}\in F$, i.e.,
$$ f_{n}(j)\to \overline{f}(j) \ \ \text{for \ all} \ j\in S \ \ \text{as} \ n\to\infty. $$
(6.3)
Moreover, for each m≥1, by Proposition B.8 in Guo and Hernández-Lerma (2009, p.205) we have
$$\begin{array}{@{}rcl@{}} &&E_{z}^{f_{n}}\left[I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq m-1, \xi_{T_{m}}=i\right\}}e^{\lambda \overline{c}T_{m}} \right]\\ &=&{\sum}_{j_{1}\in S\setminus\{i\},j_{l+1}\in S\setminus\{j_{l},i,z\},l=1,2,\ldots,m-2}\left(-\frac{q(j_{1}|z,f_{n})}{q(z|z,f_{n})+\lambda \overline{c}}\right){\prod}_{l=1}^{m-2}\left(-\frac{q(j_{l+1}|j_{l},f_{n})}{q(j_{l}|j_{l},f_{n})+\lambda \overline{c}}\right)\\ &&\times\left(-\frac{q(i|j_{m-1},f_{n})}{q(j_{m-1}|j_{m-1},f_{n})+\lambda \overline{c}}\right) \end{array} $$
for all n≥1, which together with Assumption 3.1(ii), Eqs. 6.2 and 6.3 implies
$$ E_{z}^{\overline{f}}\left[I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq m-1, \xi_{T_{m}}=i\right\}}e^{\lambda \overline{c}T_{m}} \right]=0. $$
(6.4)
On the other hand, Assumption 3.1(iii) gives that there exist distinct states $k_{0}=z, k_{1},\ldots ,k_{\widetilde {m}}=i$ such that $q(k_{n+1}|k_{n},\overline {f})>0$ for all $n=0,1,\ldots , \widetilde {m}-1$. Thus, we get
$$E_{z}^{\overline{f}}\left[I_{\left\{\xi_{T_{l}}\neq i,z, 1\leq l\leq \widetilde{m}-1, \xi_{T_{\widetilde{m}}}=i\right\}}e^{\lambda \overline{c}T_{\widetilde{m}}} \right]\geq{\prod}_{l=0}^{\widetilde{m}-1}\left(-\frac{q(k_{l+1}|k_{l},\overline{f})}{q(k_{l}|k_{l},\overline{f})+\lambda \overline{c}}\right)>0, $$
which contradicts (6.4). Hence, we obtain
$$ e^{\lambda h^{*}_{g,\lambda}(i)}=\sup_{f\in F}e^{\lambda h_{g,\lambda}(i,f)}<\infty \ \ \text{for \ all} \ i\in S. $$
(6.5)
By Eq. 3.1 and part (b), we have
$$\begin{array}{@{}rcl@{}} \left\{\begin{array}{ll} e^{\lambda h^{*}_{g,\lambda}(i)}\geq Q(i,f,g,\lambda)\left(q(z|i,f)+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h_{g,\lambda}(j,f)}q(j|i,f)\right) \\ e^{\lambda h^{*}_{g,\lambda}(z)}\geq Q(z,f,g,\lambda) {\sum}_{j\in S\setminus\{z\}}e^{\lambda h_{g,\lambda}(j,f)}q(j|z,f) \end{array}\right. \end{array} $$
(6.6)
for all i∈S∖{z} and f∈F. Note that Theorem 3.1 implies
$$ e^{\lambda h_{g,\lambda}(i,f)}\geq {E_{i}^{f}}\left[e^{\lambda \max_{(i,a)\in K}|c(i,a)-g|\tau_{z}}\right]>0 $$
(6.7)
for all i∈S and f∈F. Thus, employing Eqs. 6.5–6.7 and Assumption 3.1(iii), we get
$$ e^{\lambda h_{g,\lambda}(i,f)}<\infty \ \ \text{and} \ \ Q(i,f,g,\lambda)<\infty \ \ \text{for \ all} \ i\in S \ \text{and} \ f\in F. $$
(6.8)
Moreover, using Eq. 3.1 and part (b) again, we obtain
$$\begin{array}{@{}rcl@{}} \left\{\begin{array}{ll} e^{\lambda h^{*}_{g,\lambda}(i)}\leq\sup\limits_{a\in A(i)}\left\{Q(i,a,g,\lambda)\left(q(z|i,a)+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|i,a)\right)\right\} \\ e^{\lambda h^{*}_{g,\lambda}(z)}\leq\sup\limits_{a\in A(z)}\left\{Q(z,a,g,\lambda){\sum}_{j\in S\setminus\{z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|z,a)\right\} \end{array}\right. \end{array} $$
(6.9)
for all i∈S∖{z}. By Theorem 3.1(c) in Wei and Chen (2016), Assumption 3.1(ii) and Eq. 6.5, we see that $Q(i,a,g,\lambda )(q(z|i,a)+{\sum }_{j\in S\setminus \{i, z\}}e^{\lambda h^{*}_{g,\lambda }(j)}q(j|i,a))$ and $Q(z,a,g,\lambda ){\sum }_{j\in S\setminus \{z\}}e^{\lambda h^{*}_{g,\lambda }(j)}q(j|z,a)$ in Eq. 6.9 are continuous in a∈A(i) and a∈A(z), respectively. Thus, the Weierstrass theorem in Aliprantis and Border (2007, p.40) and Assumption 3.1 imply that there exists $f^{*}_{g,\lambda }\in F$ with $f^{*}_{g,\lambda }(i)\in A(i)$ attaining the maximum of Eq. 6.9, i.e.,
$$\begin{array}{@{}rcl@{}} \!\!\!\!\!\!\!\left\{\begin{array}{ll} e^{\lambda h^{*}_{g,\lambda}(i)}\leq Q(i,f^{*}_{g,\lambda},g,\lambda)\left(q(z|i,f^{*}_{g,\lambda})+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|i,f^{*}_{g,\lambda})\right)\\ e^{\lambda h^{*}_{g,\lambda}(z)}\leq Q(z,f^{*}_{g,\lambda},g,\lambda){\sum}_{j\in S\setminus\{z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|z,f^{*}_{g,\lambda}) \end{array}\right. \end{array} $$
(6.10)
for all i∈S∖{z}. Hence, by Eq. 6.10, Proposition B.8 in Guo and Hernández-Lerma (2009, p.205) and an induction argument, we have
$$\begin{array}{@{}rcl@{}} e^{\lambda h^{*}_{g,\lambda}(i)}&\leq& {\sum}_{m=1}^{n} E_{i}^{f^{*}_{g,\lambda}}\left[e^{\lambda{\int}_{0}^{T_{m}}\left(c(\xi_{t},f^{*}_{g,\lambda})-g\right)dt} I_{\left\{\tau_{z}=T_{m}\right\}}\right]\\ &&+E_{i}^{f^{*}_{g,\lambda}}\left[e^{\lambda{\int}_{0}^{T_{n}}\left(c(\xi_{t},f^{*}_{g,\lambda})-g\right)dt}e^{\lambda h^{*}_{g,\lambda}(\xi_{T_{n}})}I_{\left\{\tau_{z}>T_{n}\right\}}\right] \end{array} $$
(6.11)
for all i∈S∖{z} and n=1,2,…. Furthermore, it follows from part (b) and arguments similar to those for Eq. 6.11 that
$$\begin{array}{@{}rcl@{}} e^{\lambda h_{g,\lambda}(i,f^{*}_{g,\lambda})}&=&{\sum}_{m=1}^{n} E_{i}^{f^{*}_{g,\lambda}}\left[e^{\lambda{\int}_{0}^{T_{m}}\left(c(\xi_{t},f^{*}_{g,\lambda})-g\right)dt} I_{\left\{\tau_{z}=T_{m}\right\}}\right]\\ &&+E_{i}^{f^{*}_{g,\lambda}}\left[e^{\lambda{\int}_{0}^{T_{n}}\left(c(\xi_{t},f^{*}_{g,\lambda})-g\right)dt}e^{\lambda h_{g,\lambda}(\xi_{T_{n}},f^{*}_{g,\lambda})}I_{\left\{\tau_{z}>T_{n}\right\}}\right] \end{array} $$
for all i∈S∖{z} and n=1,2,…. Thus, employing the last equality and the fact that $\min _{i\in S}e^{\lambda h_{g,\lambda }(i,f^{*}_{g,\lambda })}>0$, we obtain
$$ \liminf_{n\to\infty}E_{i}^{f^{*}_{g,\lambda}}\left[e^{\lambda{\int}_{0}^{T_{n}}\left(c(\xi_{t},f^{*}_{g,\lambda})-g\right)dt}I_{\left\{\tau_{z}>T_{n}\right\}}\right]=0 $$
(6.12)
for all i∈S∖{z}. Hence, using Eqs. 6.11 and 6.12, we get
$$e^{\lambda h^{*}_{g,\lambda}(i)}\!\leq e^{\lambda h_{g,\lambda}(i,f^{*}_{g,\lambda})} +\max_{j\in S}e^{\lambda h^{*}_{g,\lambda}(j)}\liminf_{n\to\infty}E_{i}^{f^{*}_{g,\lambda}}\!\left[e^{\lambda{\int}_{0}^{T_{n}} \left(c(\xi_{t},f^{*}_{g,\lambda})-g\right)dt}I_{\left\{\tau_{z}>T_{n}\right\}}\right]=e^{\lambda h_{g,\lambda}(i,f^{*}_{g,\lambda})}, $$
which together with Eq. 3.1 implies
$$ h^{*}_{g,\lambda}(i)=h_{g,\lambda}(i,f^{*}_{g,\lambda}) \ \ \text{for \ all} \ i\in S\setminus\{z\}. $$
(6.13)
Thus, using Eqs. 6.6 and 6.13, we get
$$\begin{array}{@{}rcl@{}} e^{\lambda h^{*}_{g,\lambda}(i)}&\geq & Q(i,f^{*}_{g,\lambda},g,\lambda)\left(q(z|i,f^{*}_{g,\lambda})+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|i,f^{*}_{g,\lambda})\right)\\ &=&\sup\limits_{a\in A(i)}\left\{Q(i,a,g,\lambda)\left(q(z|i,a)+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|i,a)\right)\right\}, \end{array} $$
which together with Eq. 6.9 yields
$$ e^{\lambda h^{*}_{g,\lambda}(i)} =\sup\limits_{a\in A(i)}\left\{Q(i,a,g,\lambda)\left(q(z|i,a)+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|i,a)\right)\right\} $$
(6.14)
for all i∈S∖{z}. Following arguments similar to those for Eqs. 6.13 and 6.14, we have
$$ h^{*}_{g,\lambda}(z)=h_{g,\lambda}(z,f^{*}_{g,\lambda})\ \ \text{and} \ \ e^{\lambda h^{*}_{g,\lambda}(z)}=\sup\limits_{a\in A(z)}\left\{Q(z,a,g,\lambda){\sum}_{j\in S\setminus\{z\}}e^{\lambda h^{*}_{g,\lambda}(j)}q(j|z,a)\right\}. $$
(6.15)
Therefore, the function $h^{*}_{g,\lambda }$ on S satisfies (3.3). Moreover, Eq. 6.7 gives
$$ e^{\lambda h^{*}_{g,\lambda}(i)}=e^{\lambda h_{g,\lambda}(i,f^{*}_{g,\lambda})}>0 \ \ \text{for \ all} \ i\in S. $$
(6.16)
Hence, by Eqs. 6.5, 6.8, 6.13, 6.15 and 6.16, we see that for any $f^{*}_{g,\lambda }\in F$ with $f^{*}_{g,\lambda }(i)\in A(i)$ attaining the minimum of Eq. 3.3, $h_{g,\lambda }(i,f^{*}_{g,\lambda })=h^{*}_{g,\lambda }(i)\in \mathbb {R}$ and $Q(i,f^{*}_{g,\lambda },g,\lambda )<\infty $ for all i∈S.
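
The contradiction between Eq. 6.4 and the strictly positive lower bound that follows it uses the closed form for the expectation along one prescribed jump path: each step from a state k to a state k' contributes the factor $-q(k'|k,f)/(q(k|k,f)+\lambda \overline {c})$. The sketch below evaluates this product for a hypothetical generator and path; the function name `path_weight` and the numbers are assumptions for illustration, and the product is only a lower bound for the expectation in the proof, which sums over all admissible paths.

```python
from fractions import Fraction as Fr

def path_weight(Q, path, lam, cbar):
    """Product over consecutive states (k, k2) on `path` of
    -q(k2|k) / (q(k|k) + lam * cbar), where Q[k][k2] = q(k2|k) under a fixed policy.
    For lam <= 0 <= cbar every denominator is negative, so each factor is >= 0,
    and it is > 0 exactly when the rate q(k2|k) is positive."""
    weight = Fr(1)
    for k, k2 in zip(path, path[1:]):
        weight *= -Q[k][k2] / (Q[k][k] + lam * cbar)
    return weight

# Hypothetical generator (rows sum to zero); state 0 stands for z, state 2 for i.
Q_EXAMPLE = [[Fr(-2), Fr(1), Fr(1)],
             [Fr(1), Fr(-3), Fr(2)],
             [Fr(2), Fr(2), Fr(-4)]]

w_risk = path_weight(Q_EXAMPLE, [0, 1, 2], Fr(-1), Fr(1))   # lam = -1, cbar = 1
w_zero = path_weight(Q_EXAMPLE, [0, 1, 2], 0, 1)            # lam = 0: embedded-chain path probability
```

With λ=0 the product reduces to the probability that the embedded chain follows the path, and a negative λ only shrinks each factor without making it vanish.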
(d) Choose a sequence $\{\overline {g}^{\lambda }_{n},n\geq 1\}\subseteq G_{\lambda }$ satisfying
$$ \overline{g}^{\lambda}_{n}\leq \overline{g}^{\lambda}_{n+1} \ \ \text{for \ all} \ \ n\geq1 \ \ \text{and} \ \ \lim_{n\to\infty}\overline{g}^{\lambda}_{n}=g^{*}_{\lambda}. $$
(6.17)
By Eqs. 3.1 and 6.17, we obtain
$$ h^{*}_{\overline{g}^{\lambda}_{n},\lambda}(i)\geq h^{*}_{\overline{g}^{\lambda}_{n+1},\lambda}(i)\geq h^{*}_{g^{*}_{\lambda},\lambda}(i) \ \ \text{for \ all} \ i\in S \ \text{and} \ n\geq1. $$
(6.18)
Set $H_{\lambda }(i):=\lim _{n\to \infty }h^{*}_{\overline {g}^{\lambda }_{n},\lambda }(i)$ for all i∈S. Then by the definition of $G_{\lambda}$ and Eq. 6.18, we have
$$ H_{\lambda}(z)\geq0 \ \ \text{and} \ \ H_{\lambda}(i)\geq h^{*}_{g^{*}_{\lambda},\lambda}(i) \ \ \text{for \ all} \ \ i\in S. $$
(6.19)
Moreover, employing part (c), we get
$$\begin{array}{@{}rcl@{}} \left\{\begin{array}{ll} e^{\lambda h^{*}_{\overline{g}^{\lambda}_{n},\lambda}(i)}\geq Q(i,f,\overline{g}^{\lambda}_{n},\lambda)\left(q(z|i,f)+{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda h^{*}_{\overline{g}^{\lambda}_{n},\lambda}(j)}q(j|i,f)\right)\\ e^{\lambda h^{*}_{\overline{g}^{\lambda}_{n},\lambda}(z)}\geq Q(z,f,\overline{g}^{\lambda}_{n},\lambda){\sum}_{j\in S\setminus\{z\}}e^{\lambda h^{*}_{\overline{g}^{\lambda}_{n},\lambda}(j)}q(j|z,f) \end{array}\right. \end{array} $$
for all n≥1, which together with the Fatou lemma gives
$$\begin{array}{@{}rcl@{}} \left\{\begin{array}{ll} e^{\lambda H_{\lambda}(i)}\geq Q(i,f,g^{*}_{\lambda},\lambda)\left(q(z|i,f) +{\sum}_{j\in S\setminus\{i, z\}}e^{\lambda H_{\lambda}(j)}q(j|i,f)\right)\\ e^{\lambda H_{\lambda}(z)}\geq Q(z,f,g^{*}_{\lambda},\lambda){\sum}_{j\in S\setminus\{z\}}e^{\lambda H_{\lambda}(j)}q(j|z,f) \end{array}\right. \end{array} $$
(6.20)
for all i∈S∖{z} and f∈F. Thus, using Eq. 6.20 and arguments similar to those for Eq. 6.11, we obtain $e^{\lambda H_{\lambda }(i)}\geq e^{\lambda h_{g^{*}_{\lambda }, \lambda }(i,f)}$ for all f∈F, which together with Eqs. 3.1 and 6.19 yields $h^{*}_{g^{*}_{\lambda },\lambda }(z)=H_{\lambda }(z)\geq 0$. Hence, we obtain $g^{*}_{\lambda }\in G_{\lambda }$.
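
To make the set $G_{\lambda }$ and its largest element concrete for λ<0, consider a hypothetical chain with two states z and 1 and a single policy, with rate a from z to 1 and rate b from 1 to z. The sketch assumes that $Q(i,f,g,\lambda )=-1/(\lambda (c(i,f)-g)+q(i|i,f))$ when the denominator is negative and $+\infty$ otherwise; this form is inferred from Eqs. 6.21–6.23 and 6.29 and is not restated in this section. Under that assumption the equations of part (b) have a closed form, the map $g\mapsto e^{\lambda h_{g,\lambda }(z)}$ is increasing on the range where it is finite, and the largest g with $h_{g,\lambda }(z)\geq 0$ can be located by bisection.

```python
import math

def exp_lambda_h_z(g, lam, a, b, c_z, c_1):
    """e^{lam * h_{g,lam}(z)} for the two-state chain z -> 1 (rate a), 1 -> z (rate b).
    Uses x_1 = Q_1 * b and x_z = Q_z * a * x_1 with the ASSUMED form
    Q_s = -1 / (lam * (c_s - g) + q(s|s)); returns inf if a denominator is >= 0."""
    den_1 = lam * (c_1 - g) - b
    den_z = lam * (c_z - g) - a
    if den_1 >= 0 or den_z >= 0:
        return math.inf
    x_1 = -b / den_1
    return (-a / den_z) * x_1

def sup_G(lam, a, b, c_z, c_1, tol=1e-12):
    """Largest g with h_{g,lam}(z) >= 0, i.e. e^{lam h(z)} <= 1, for lam < 0."""
    lo = min(c_z, c_1)                        # h >= 0 at the minimal cost, as in part (a)
    hi = min(c_z + a / -lam, c_1 + b / -lam)  # at this value a denominator vanishes
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if exp_lambda_h_z(mid, lam, a, b, c_z, c_1) <= 1.0:
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical data: a = 1, b = 2, c(z) = 1, c(1) = 4.
g_risk = sup_G(-1.0, 1.0, 2.0, 1.0, 4.0)
g_near_neutral = sup_G(-1e-6, 1.0, 2.0, 1.0, 4.0)
```

For these numbers the equation $e^{\lambda h_{g,\lambda }(z)}=1$ with λ=−1 reduces to $(6-g)(2-g)=2$, whose root in the feasible range is $4-\sqrt {6}$. As λ increases to 0 the computed value approaches the stationary average cost $(bc(z)+ac(1))/(a+b)=2$ of this toy chain.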

Suppose that $h^{*}_{g^{*}_{\lambda },\lambda }(z)>0$. For each n≥1, let $\gamma _{n}:=e^{n\lambda h^{*}_{g^{*}_{\lambda },\lambda }(z)}$. Then we have $\gamma_{n}\in (0,1)$ for all n≥1 and $\lim _{n\to \infty }\gamma _{n}=0$. Employing Eq. 6.8 we get $\lambda c(i,f)-\lambda g^{*}_{\lambda }+q(i|i,f)<0$ for all i∈S and f∈F. For each n≥1 and f∈F, define the transition rate as follows:
$$ \overline{p}_{n}(z|z,f):=-\gamma_{n} e^{\lambda h_{g^{*}_{\lambda},\lambda}(z,f)}, \ \overline{p}_{n}(j|z,f):=-\frac{\gamma_{n}e^{\lambda h_{g^{*}_{\lambda},\lambda}(j,f)}q(j|z,f)}{\lambda c(z,f)-\lambda g^{*}_{\lambda}+q(z|z,f)} \ \ \text{for} \ j\in S\setminus\{z\}, $$
(6.21)
and for any i∈S∖{z},
$$\begin{array}{@{}rcl@{}} \overline{p}_{n}(i|i,f)&:=&-\gamma_{n} e^{\lambda h_{g^{*}_{\lambda},\lambda}(i,f)}, \ \overline{p}_{n}(z|i,f):=-\frac{\gamma_{n} q(z|i,f)}{\lambda c(i,f)-\lambda g^{*}_{\lambda}+q(i|i,f)}, \end{array} $$
(6.22)

$$\begin{array}{@{}rcl@{}} \overline{p}_{n}(j|i,f)&:=&-\frac{\gamma_{n}e^{\lambda h_{g^{*}_{\lambda},\lambda}(j,f)}q(j|i,f)}{\lambda c(i,f)-\lambda g^{*}_{\lambda}+q(i|i,f)} \ \ \text{for} \ j\in S\setminus \{i,z\}. \end{array} $$
(6.23)
For any initial state i∈S and any policy f∈F, the probability measure and expectation operator corresponding to the transition rate $\overline {p}_{n}(\cdot |\cdot ,f)$ defined in Eqs. 6.21–6.23 are denoted by $\overline {P}^{f}_{i,n}$ and $\overline {E}^{f}_{i,n}$. For any κ>0 and n≥1, define
$$\overline{H}_{\kappa,n}(i,f):=\frac{1}{\lambda}\ln \overline{E}^{f}_{i,n}\left[e^{-\lambda \kappa \tau_{z}}\right] \ \ \text{for \ all} \ i\in S\ \text{and} \ f\in F. $$
Note that Eq. 6.7 gives $e^{\lambda h_{g^{*}_{\lambda },\lambda }(i,f)}>0$ for all i∈S and f∈F. Employing Eq. 3.1 and part (c), we have $\min _{i\in S}\inf _{f\in F}e^{-\lambda h_{g^{*}_{\lambda },\lambda }(i,f)}=\min _{i\in S}e^{-\lambda h^{*}_{g^{*}_{\lambda },\lambda }(i)}>0$. Then by the fact that $\lim _{n\to \infty }\gamma _{n}=0$, there exists a positive integer $n_{1}$ such that
$$\begin{array}{@{}rcl@{}} \gamma_{n_{1}}&<& \min_{(i,a)\in K}\left(\lambda g^{*}_{\lambda}-\lambda c(i,a)-q(i|i,a)\right) \ \ \text{and} \end{array} $$
(6.24)

$$\begin{array}{@{}rcl@{}} \gamma_{n_{1}}&\leq&\min_{(i,a)\in K}\left(\lambda g^{*}_{\lambda}-\lambda c(i,a)-q(i|i,a)\right)\times\min_{i\in S}\inf_{f\in F}e^{-\lambda h_{g^{*}_{\lambda},\lambda}(i,f)}\\ &\leq&\left(\lambda g^{*}_{\lambda}-\lambda c(i,f)-q(i|i,f)\right)e^{-\lambda h_{g^{*}_{\lambda},\lambda}(i,f)} \end{array} $$
(6.25)
for all i∈S and f∈F. Moreover, we have that there exists $f^{1}\in F$ satisfying $h_{g^{*}_{\lambda },\lambda }(i,f^{1})=\sup _{f\in F}h_{g^{*}_{\lambda },\lambda }(i,f)=:\overline {h}_{g^{*}_{\lambda },\lambda }(i)$ for all i∈S. In fact, by part (b) we get
$$\begin{array}{@{}rcl@{}} \left\{\begin{array}{ll} e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(i)}\geq \inf_{a\in A(i)}\left\{Q(i,a,g,\lambda)\left(q(z|i,a)+{\sum}_{j\in S\setminus\{i,z\}}e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(j)}q(j|i,a)\right)\right\}\\ e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(z)}\geq \inf_{a\in A(z)}\left\{Q(z,a,g,\lambda){\sum}_{j\in S\setminus\{z\}}e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(j)}q(j|z,a)\right\} \end{array}\right. \end{array} $$
(6.26)
for all i∈S∖{z}. Then employing Assumption 3.1, Theorem 3.1(c) in Wei and Chen (2016) and the Weierstrass theorem in Aliprantis and Border (2007, p.40), we obtain the existence of $f^{1}\in F$ with $f^{1}(i)\in A(i)$ attaining the minimum of Eq. 6.26, i.e.,
$$\begin{array}{@{}rcl@{}} \left\{\begin{array}{ll} e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(i)}\geq Q(i,f^{1},g,\lambda)\left(q(z|i,f^{1})+{\sum}_{j\in S\setminus\{i,z\}}e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(j)}q(j|i,f^{1})\right)\\ e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(z)}\geq Q(z,f^{1},g,\lambda){\sum}_{j\in S\setminus\{z\}}e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(j)}q(j|z,f^{1}) \end{array}\right. \end{array} $$
(6.27)
for all i∈S∖{z}. Thus, using Eq. 6.27 and following arguments similar to those for Eq. 6.11, we have
$$e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(i)}\!\geq \!{\sum}_{m=1}^{n}E_{i}^{f^{1}}\!\left[e^{\lambda{\int}_{0}^{T_{m}}(c(\xi_{t},f^{1})-g^{*}_{\lambda})dt}I_{\{\tau_{z}=T_{m}\}}\right] +E_{i}^{f^{1}}\!\left[e^{\lambda{\int}_{0}^{T_{n}}(c(\xi_{t},f^{1})-g^{*}_{\lambda})dt}e^{\lambda \overline{h}_{g^{*}_{\lambda},\lambda}(\xi_{T_{n}})}I_{\{\tau_{z}>T_{n}\}}\!\right] $$
for all n=1,2,…, which gives $\overline {h}_{g^{*}_{\lambda },\lambda }(i)\leq h_{g^{*}_{\lambda },\lambda }(i,f^{1})$ for all i∈S∖{z}. Hence, we get $\overline {h}_{g^{*}_{\lambda },\lambda }(i)=h_{g^{*}_{\lambda },\lambda }(i,f^{1})$ for all i∈S∖{z}. Similarly, by Eq. 6.27 we can obtain $\overline {h}_{g^{*}_{\lambda },\lambda }(z)=h_{g^{*}_{\lambda },\lambda }(z,f^{1})$. Furthermore, by Eq. 6.7 we have
$$ \inf_{f\in F}e^{\lambda h_{g^{*}_{\lambda},\lambda}(i,f)}=e^{\lambda \sup_{f\in F}h_{g^{*}_{\lambda},\lambda}(i,f)}=e^{\lambda h_{g^{*}_{\lambda},\lambda}(i,f^{1})}>0, $$
(6.28)
which implies $\zeta (i):=\inf _{f\in F}\frac {1}{\lambda }\overline {p}_{n_{1}}(i|i,f)>0$ for all i∈S. Hence, for any $\kappa \in \left (0,\min \limits _{i\in S}\zeta (i)\right )=:\overline {O}_{n_{1}}$, by Eqs. 6.21–6.23 and the similar arguments of Theorem 3.1(b) in Wei and Chen (2016), we get
$$\begin{array}{@{}rcl@{}} \!\!\!\!\!\!\!\left\{\begin{array}{ll} e^{\lambda \overline{H}_{\kappa, n_{1}}(i,f)}=-\frac{1}{\gamma_{n_{1}} e^{\lambda h_{g^{*}_{\lambda},\lambda}(i,f)}+\lambda \kappa}\!\left(\frac{\gamma_{n_{1}} q(z|i,f)}{\lambda c(i,f)-\lambda g^{*}_{\lambda}+q(i|i,f)}+\!\!\!\sum\limits_{j\in S\setminus\{i, z\}}\!\!\!\!\frac{\gamma_{n_{1}} e^{\lambda \overline{H}_{\kappa,n_{1}}(j,f)+\lambda h_{g^{*}_{\lambda},\lambda}(j,f)}q(j|i,f)}{\lambda c(i,f)-\lambda g^{*}_{\lambda}+q(i|i,f)}\right)\\ e^{\lambda \overline{H}_{\kappa, n_{1}}(z,f)}=-\frac{1}{\gamma_{n_{1}}e^{\lambda h_{g^{*}_{\lambda},\lambda}(z,f)}+\lambda\kappa} \!\sum\limits_{j\in S\setminus\{z\}}\frac{\gamma_{n_{1}} e^{\lambda \overline{H}_{\kappa,n_{1}}(j,f)+\lambda h_{g^{*}_{\lambda},\lambda}(j,f)}q(j|z,f)}{\lambda c(z,f)-\lambda g^{*}_{\lambda}+q(z|z,f)} \end{array}\right. \end{array} $$
(6.29)
for all i∈S∖{z} and f∈F. Below we show that there exist $\widetilde {t}_{0}\in \mathbb {R}_{+}$ and $\widetilde {\alpha }\in (0,1)$ such that
$$ \overline{P}^{f}_{i,n_{1}}(\tau_{z}>\widetilde{t}_{0})\leq \widetilde{\alpha} \ \ \text{for \ all} \ i\in S \ \text{and} \ f\in F. $$
(6.30)
By Theorem 3.1 and an induction argument, we have ${P_{i}^{f}}(\tau _{z}>nt_{0})\leq \alpha ^{n}$ for all n=1,2,…, which implies that there exists a positive integer $n_{2}$ such that
$$ 0<\alpha^{n_{2}}\leq\frac{1}{3} \ \ \text{and} \ \ {P_{i}^{f}}(\tau_{z}>n_{2}t_{0})\leq \alpha^{n_{2}} $$
(6.31)
for all i∈S and f∈F. Let $\pi _{1}:=\max _{(i,a)\in K}(-q(i|i,a))$. Then direct calculations yield
$$\begin{array}{@{}rcl@{}} {P_{i}^{f}}(T_{m} < n_{2}t_{0})&=&{P_{i}^{f}}(\theta_{1}+\cdots+\theta_{m} < n_{2}t_{0})\\ &=&{\int}_{s_{1}+\cdots+s_{m} <n_{2}t_{0}}{\sum}_{j_{1}\in S\setminus\{i\},j_{l+1}\in S\setminus\{j_{l}\},l=1,\ldots,m-1}e^{q(i|i,f)s_{1}}q(j_{1}|i,f)\\ &&\times{\prod}_{l=1}^{m-1}\left(e^{q(j_{l}|j_{l},f)s_{l+1}} q(j_{l+1}|j_{l},f)\right)ds_{m}{\cdots} ds_{1}\\ &\leq&{\pi_{1}^{m}}{\int}_{s_{1}+\cdots+s_{m}<n_{2}t_{0}}ds_{m}{\cdots} ds_{1}=\frac{(\pi_{1}n_{2}t_{0})^{m}}{m!} \end{array} $$
(6.32)
for all i∈S and f∈F, where the second equality follows from Proposition B.8 in Guo and Hernández-Lerma (2009, p.205). Thus, using Eq. 6.32 we obtain $\lim _{m\to \infty }\sup _{i\in S,f\in F}{P_{i}^{f}}(T_{m}< n_{2}t_{0})=0$, which gives that there exists a positive integer $m^{*}$ satisfying
$$ {P_{i}^{f}}(T_{m^{*}}<n_{2}t_{0})\leq \frac{\alpha^{n_{2}}}{2} \ \ \text{for \ all} \ i\in S \ \text{and} \ f\in F. $$
(6.33)
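
The bound in Eq. 6.32 is $\pi _{1}^{m}$ times the volume of the simplex $\{s_{1}+\cdots +s_{m}<n_{2}t_{0}\}$, and it tends to zero as m grows. As a rough sanity check, the sketch below considers the hypothetical extreme case in which every state has exit rate exactly $\pi _{1}$, so that $T_{m}$ has an Erlang distribution, compares the exact probability with the bound, and searches for an index playing the role of $m^{*}$ in Eq. 6.33. The function names and the numbers are illustrative assumptions.

```python
import math

def erlang_cdf(m, rate, t):
    """P(T_m < t) when T_m is a sum of m independent Exp(rate) holding times."""
    x = rate * t
    return 1.0 - math.exp(-x) * sum(x**k / math.factorial(k) for k in range(m))

def simplex_bound(m, rate, t):
    """Right-hand side of the bound: (rate * t)^m / m!."""
    return (rate * t) ** m / math.factorial(m)

def smallest_m(rate, t, target):
    """Smallest m >= 1 with (rate * t)^m / m! <= target; exists since the bound -> 0."""
    m = 1
    while simplex_bound(m, rate, t) > target:
        m += 1
    return m

# Hypothetical numbers: pi_1 = 2, n_2 * t_0 = 3, alpha^{n_2} = 1/3 (so the target is 1/6).
m_star = smallest_m(2.0, 3.0, (1.0 / 3.0) / 2.0)
```

The exact Erlang probability is typically far below the simplex bound, so the index found this way is conservative, which is all the proof requires.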
For each f∈F, let $\{Y_{n},n=0,1,\ldots \}$ be the embedded Markov chain of the continuous-time Markov chain associated with the transition rate q(⋅|⋅,f) and define $\tau _{1}:=\inf \{n\geq 1:Y_{n}=z\}$. Employing Eqs. 6.31 and 6.33 we get
$$\begin{array}{@{}rcl@{}} {P_{i}^{f}}(\tau_{1}\leq m^{*})={P_{i}^{f}}(\tau_{z}\leq T_{m^{*}})&\geq&{P_{i}^{f}}(\tau_{z}\leq n_{2}t_{0}, T_{m^{*}}\geq n_{2}t_{0})\\ &\geq&{P_{i}^{f}}(\tau_{z}\leq n_{2}t_{0})+{P_{i}^{f}}(T_{m^{*}}\geq n_{2}t_{0})-1\geq\frac{1}{2} \end{array} $$
(6.34)
for all i∈S and f∈F. Let $\pi _{2}:=\min _{(i,a)\in K}\left (-\frac {\gamma _{n_{1}}}{\lambda c(i,a)-\lambda g^{*}_{\lambda }+q(i|i,a)}\right )$, $\pi _{3}:=\min _{i\in S}\inf _{f\in F}e^{\lambda h_{g^{*}_{\lambda },\lambda }(i,f)}$, and $\pi _{4}:=\min _{i\in S}\inf _{f\in F}\overline {p}_{n_{1}}(i|i,f)$. Then Eq. 6.24 and Theorem 3.2(c) imply $\pi_{2}\in (0,1)$ and $\pi _{4}\in (-\infty ,0)$. By Eqs. 3.1, 6.28 and the fact that $g^{*}_{\lambda }\in G_{\lambda }$, we obtain
$$\pi_{3}=\min_{i\in S}e^{\lambda h_{g^{*}_{\lambda},\lambda}(i,f^{1})}>0 \ \ \text{and} \ \ \pi_{3}\leq e^{\lambda h^{*}_{g^{*}_{\lambda},\lambda}(z)}\leq1. $$
Thus, for each i∈S, f∈F and m≥1, direct calculations yield
$$\begin{array}{@{}rcl@{}} &&\overline{P}^{f}_{i,n_{1}}(\tau_{z}\leq n_{2}t_{0})\\ &\geq&\overline{P}^{f}_{i,n_{1}}(\tau_{z}\leq n_{2}t_{0},\tau_{z}\leq T_{m})\\ &=&{\sum}_{n=1}^{m}\overline{P}^{f}_{i,n_{1}}(\tau_{z}\leq n_{2}t_{0},\tau_{z}=T_{n})\\ &=&{\sum}_{n=1}^{m}{\int}_{s_{1}+\cdots+s_{n}\leq n_{2}t_{0}}{\sum}_{j_{1}\in S\setminus\{ i,z\},j_{l+1}\in S\setminus\{j_{l},z\},l=1,2,\ldots,n-2}e^{\overline{p}_{n_{1}}(i|i,f)s_{1}}\overline{p}_{n_{1}}(j_{1}|i,f)\\ &&\times{\prod}_{l=1}^{n-2} e^{\overline{p}_{n_{1}}(j_{l}|j_{l},f)s_{l+1}}\overline{p}_{n_{1}}(j_{l+1}|j_{l},f) e^{\overline{p}_{n_{1}}(j_{n-1}|j_{n-1},f)s_{n}}\overline{p}_{n_{1}}(z|j_{n-1},f)ds_{n} {\cdots} ds_{1}\\ &\geq&{\sum}_{n=1}^{m}{\int}_{s_{1}+\cdots+s_{n}\leq n_{2}t_{0}}{\sum}_{j_{1}\in S\setminus\{ i,z\},j_{l+1}\in S\setminus\{j_{l},z\},l=1,2,\ldots,n-2}\pi_{2}\pi_{3}e^{\pi_{4}s_{1}}e^{q(i|i,f)s_{1}} q(j_{1}|i,f)\\ &&\times{\prod}_{l=1}^{n-2} \pi_{2}\pi_{3}e^{\pi_{4}s_{l+1}}e^{q(j_{l}|j_{l},f)s_{l+1}}q(j_{l+1}|j_{l},f) \pi_{2}e^{\pi_{4}s_{n}}e^{q(j_{n-1}|j_{n-1},f)s_{n}}q(z|j_{n-1},f)ds_{n} {\cdots} ds_{1}\\ &=&{\sum}_{n=1}^{m}(\pi_{2}\pi_{3})^{n-1}\pi_{2}{\int}_{s_{1}+\cdots+s_{n}\leq n_{2}t_{0}}{\sum}_{j_{1}\in S\setminus\{ i,z\},j_{l+1}\in S\setminus\{j_{l},z\},l=1,2,\ldots,n-2}e^{\pi_{4}(s_{1}+\cdots+s_{n})}\\ &&\times e^{q(i|i,f)s_{1}}q(j_{1}|i,f){\prod}_{l=1}^{n-2} e^{q(j_{l}|j_{l},f)s_{l+1}}q(j_{l+1}|j_{l},f) e^{q(j_{n-1}|j_{n-1},f)s_{n}}q(z|j_{n-1},f) ds_{n} {\cdots} ds_{1}\\ &\geq& (\pi_{2}\pi_{3})^{m-1}\pi_{2}e^{\pi_{4}n_{2}t_{0}}{\sum}_{n=1}^{m}{\int}_{s_{1}+\cdots+s_{n}\leq n_{2}t_{0}}{\sum}_{j_{1}\in S\setminus\{ i,z\},j_{l+1}\in S\setminus\{j_{l},z\},l=1,2,\ldots,n-2} e^{q(i|i,f)s_{1}}q(j_{1}|i,f)\\ &&\times{\prod}_{l=1}^{n-2} e^{q(j_{l}|j_{l},f)s_{l+1}}q(j_{l+1}|j_{l},f) e^{q(j_{n-1}|j_{n-1},f)s_{n}}q(z|j_{n-1},f) ds_{n} {\cdots} ds_{1}\\ &=&(\pi_{2}\pi_{3})^{m-1}\pi_{2}e^{\pi_{4}n_{2}t_{0}}{P_{i}^{f}}(\tau_{z}\leq n_{2}t_{0},\tau_{z}\leq T_{m}), \end{array} $$
(6.35)
where the second equality follows from Proposition B.8 in Guo and Hernández-Lerma (2009, p.205). Note that $\overline {b}:=(\pi _{2}\pi _{3})^{m^{*}-1}\pi _{2}e^{\pi _{4}n_{2}t_{0}}\in (0,1)$. Hence, using Eqs. 6.31, 6.34 and 6.35, we have
$$\begin{array}{@{}rcl@{}} \overline{P}^{f}_{i,n_{1}}(\tau_{z}\leq n_{2}t_{0})&\geq& \overline{b}{P_{i}^{f}}(\tau_{z}\leq n_{2}t_{0},\tau_{z}\leq T_{m^{*}})\\ &=&\overline{b}{P_{i}^{f}}(\tau_{z}\leq n_{2}t_{0},\tau_{1}\leq m^{*})\\ &\geq&\overline{b}[{P_{i}^{f}}(\tau_{z}\leq n_{2}t_{0})+{P_{i}^{f}}(\tau_{1}\leq m^{*})-1] \geq\frac{1}{6}\overline{b} \end{array} $$
for all i∈S and f∈F. Therefore, Eq. 6.30 holds with $\widetilde {t}_{0}:=n_{2}t_{0}$ and $\widetilde {\alpha }:=1-\frac {1}{6}\overline {b}$. Furthermore, employing Eq. 6.30 and an induction argument, we obtain
$$ \overline{P}^{f}_{i,n_{1}}(\tau_{z}>n\widetilde{t}_{0})\leq \widetilde{\alpha}^{n} \ \ \text{for \ all} \ i\in S, \ f\in F \ \text{and} \ n=1,2,\ldots. $$
(6.36)
Then for any $\kappa _{0}\in \overline {O}_{n_{1}}$ satisfying $\kappa _{0}<\frac {\ln \widetilde {\alpha }}{\lambda \widetilde {t}_{0}}$, i∈S and f∈F, by (6.36) we get
$$\begin{array}{@{}rcl@{}} e^{\lambda \overline{H}_{\kappa_{0},n_{1}}(i,f)}&=&{\sum}_{m=0}^{\infty}\overline{E}_{i,n_{1}}^{f}\left[e^{-\lambda \kappa_{0} \tau_{z}}I_{\{\tau_{z}\in(m\widetilde{t}_{0},(m+1)\widetilde{t}_{0}]\}}\right]\\ &\leq&{\sum}_{m=0}^{\infty}e^{-\lambda\kappa_{0}(m+1)\widetilde{t}_{0}}\overline{P}_{i,n_{1}}^{f}(\tau_{z}>m\widetilde{t}_{0})\\ &\leq&\frac{e^{-\lambda \kappa_{0}\widetilde{t}_{0}}}{1-\widetilde{\alpha} e^{-\lambda \kappa_{0}\widetilde{t}_{0}}}. \end{array} $$
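The last step above sums a geometric series with ratio $\widetilde{\alpha}e^{-\lambda\kappa_{0}\widetilde{t}_{0}}$, which is below 1 precisely because $\kappa_{0}<\frac{\ln\widetilde{\alpha}}{\lambda\widetilde{t}_{0}}$. The following minimal sketch checks the closed form numerically. The function names and all numerical values are illustrative assumptions, not taken from the paper; $\lambda<0$ is used here, consistent with the inequality $e^{-\lambda\rho_{m}\widetilde{t}_{0}}\widetilde{\alpha}\leq e^{-\lambda\kappa_{0}\widetilde{t}_{0}}\widetilde{\alpha}<1$ that appears later in this proof.

```python
import math

def tail_sum(lam, kappa0, t0, alpha, terms=2000):
    """Partial sum of sum_{m>=0} exp(-lam*kappa0*(m+1)*t0) * alpha**m."""
    r = math.exp(-lam * kappa0 * t0)
    # Each term is r * (alpha*r)**m, a geometric series with ratio alpha*r.
    return sum(r * (alpha * r) ** m for m in range(terms))

def closed_form(lam, kappa0, t0, alpha):
    """Closed form exp(-lam*kappa0*t0) / (1 - alpha*exp(-lam*kappa0*t0))."""
    r = math.exp(-lam * kappa0 * t0)
    if alpha * r >= 1.0:
        raise ValueError("series diverges: need alpha*exp(-lam*kappa0*t0) < 1")
    return r / (1.0 - alpha * r)

# Illustrative values (assumptions): lam < 0, alpha in (0, 1).
lam, t0, alpha = -1.0, 2.0, 0.9
kappa_max = math.log(alpha) / (lam * t0)   # upper limit for kappa0
kappa0 = 0.5 * kappa_max                   # any value in (0, kappa_max) works

print(tail_sum(lam, kappa0, t0, alpha), closed_form(lam, kappa0, t0, alpha))
```

Choosing `kappa0` at or above `kappa_max` makes the ratio reach 1, and `closed_form` then raises an error; this is the numerical counterpart of the restriction on $\kappa_{0}$.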
Take any $\kappa_{1}\in(0,\kappa_{0})$ satisfying $\kappa _{1}<\min _{(i,a)\in K}\left \{\frac {1}{\lambda }[\lambda c(i,a)-\lambda g^{*}_{\lambda }+q(i|i,a)]\right \}$ and define $\overline {H}^{*}_{\kappa _{1},n_{1}}(i,f):=\gamma _{n_{1}}e^{\lambda \overline {H}_{\kappa _{1},n_{1}}(i,f)+\lambda h_{g^{*}_{\lambda },\lambda }(i,f)}$ for all i∈S and f∈F. Thus, using Eqs. 6.25 and 6.29, we have
$$\begin{array}{@{}rcl@{}} \left\{\begin{array}{ll} \overline{H}^{*}_{\kappa_{1},n_{1}}(i,f)\geq-\frac{1}{\lambda c(i,f)-\lambda g^{*}_{\lambda}-\lambda \kappa_{1}+q(i|i,f)}\left(\gamma_{n_{1}} q(z|i,f)+{\sum}_{j\in S\setminus\{i, z\}} \overline{H}^{*}_{\kappa_{1},n_{1}}(j,f)q(j|i,f)\right)\\ \overline{H}^{*}_{\kappa_{1},n_{1}}(z,f)\geq-\frac{1}{\lambda c(z,f)-\lambda g^{*}_{\lambda}-\lambda \kappa_{1}+q(z|z,f)}{\sum}_{j\in S\setminus\{z\}}\overline{H}^{*}_{\kappa_{1},n_{1}}(j,f)q(j|z,f) \end{array}\right. \end{array} $$
for all i∈S∖{z} and f∈F. Hence, by the last inequalities and arguments similar to those used for Eq. 6.11, we obtain $\overline {H}^{*}_{\kappa _{1},n_{1}}(i,f)\geq \gamma _{n_{1}}e^{\lambda h_{g^{*}_{\lambda }+\kappa _{1},\lambda }(i,f)}$ for all i∈S and f∈F, which implies
$$ e^{\lambda h^{*}_{g^{*}_{\lambda},\lambda}(z)} \sup_{f\in F}e^{\lambda \overline{H}_{\kappa_{1},n_{1}}(z,f)}\geq\sup_{f\in F}e^{\lambda \overline{H}_{\kappa_{1},n_{1}}(z,f)+\lambda h_{g^{*}_{\lambda},\lambda}(z,f)}\geq e^{\lambda h^{*}_{g^{*}_{\lambda}+\kappa_{1},\lambda}(z)}. $$
(6.37)
Let $\{\rho _{m},m\geq 1\}\subseteq (0,\kappa _{1})$ be a sequence satisfying $\lim _{m\to \infty }\rho _{m}=0$. Note that $0\leq e^{-\lambda \rho _{m}\widetilde {t}_{0}}\widetilde {\alpha }\leq e^{-\lambda \kappa _{0}\widetilde {t}_{0}}\widetilde {\alpha }<1$ for all m≥1. Then direct calculations give
$$\begin{array}{@{}rcl@{}} 0\leq \sup_{f\in F}e^{\lambda \overline{H}_{\rho_{m},n_{1}}(z,f)}-1&=&\sup_{f\in F}{\sum}_{n=1}^{\infty}\frac{(-\lambda\rho_{m})^{n}}{n!}\overline{E}_{z,n_{1}}^{f}[{\tau_{z}^{n}}]\\ &=&\sup_{f\in F}{\sum}_{n=1}^{\infty}\frac{(-\lambda\rho_{m})^{n}}{(n-1)!}{\int}_{0}^{\infty}t^{n-1}\overline{P}_{z,n_{1}}^{f}(\tau_{z}>t)dt\\ &=&\sup_{f\in F}{\sum}_{n=1}^{\infty}\frac{(-\lambda\rho_{m})^{n}}{(n-1)!}{\sum}_{l=0}^{\infty}{\int}_{l\widetilde{t}_{0}}^{(l+1) \widetilde{t}_{0}}t^{n-1}\overline{P}_{z,n_{1}}^{f}(\tau_{z}>t)dt\\ &\leq&\sup_{f\in F}{\sum}_{n=1}^{\infty}\frac{(-\lambda\rho_{m}\widetilde{t}_{0})^{n}}{(n-1)!}{\sum}_{l=0}^{\infty} (l+1)^{n-1}\overline{P}_{z,n_{1}}^{f}(\tau_{z}>l\widetilde{t}_{0})\\ &\leq&{\sum}_{n=1}^{\infty}\frac{(-\lambda\rho_{m}\widetilde{t}_{0})^{n}}{(n-1)!}{\sum}_{l=0}^{\infty} (l+1)^{n-1}\widetilde{\alpha}^{l}\\ &=&-\frac{\lambda\rho_{m}\widetilde{t}_{0}e^{-\lambda\rho_{m}\widetilde{t}_{0}}} {1-\widetilde{\alpha}e^{-\lambda\rho_{m}\widetilde{t}_{0}}} \end{array} $$
(6.38)
for all m≥1, where the second equality is due to Lemma 3.4 in Kallenberg (2012, p.49) and the last inequality follows from Eq. 6.36. Thus, employing Eq. 6.38 we obtain $\lim _{m\to \infty }\sup _{f\in F}e^{\lambda \overline {H}_{\rho _{m},n_{1}}(z,f)}=1$. Hence, for any $\varepsilon \in (0,e^{-\lambda h^{*}_{g^{*}_{\lambda },\lambda }(z)}-1]$, there exists a positive integer $m_{0}$ such that $\sup _{f\in F}e^{\lambda \overline {H}_{\rho _{m_{0}},n_{1}}(z,f)}-1\leq \varepsilon $, which together with Eq. 6.37 implies
$$ e^{\lambda h^{*}_{g^{*}_{\lambda}+\rho_{m_{0}},\lambda}(z)}\leq e^{\lambda h^{*}_{g^{*}_{\lambda},\lambda}(z)} \sup_{f\in F}e^{\lambda \overline{H}_{\rho_{m_{0}},n_{1}}(z,f)}\leq1. $$
(6.39)
Moreover, by Eq. 6.39 we have $g^{*}_{\lambda }+\rho _{m_{0}}\in G_{\lambda }$, which gives $g^{*}_{\lambda }+\rho _{m_{0}}\leq g^{*}_{\lambda }$, a contradiction since $\rho _{m_{0}}>0$. Therefore, we get $h^{*}_{g^{*}_{\lambda },\lambda }(z)=0$.

□
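The last equality in Eq. 6.38 rests on the identity $\sum_{n\geq 1}\frac{x^{n}}{(n-1)!}\sum_{l\geq 0}(l+1)^{n-1}\widetilde{\alpha}^{l}=\frac{xe^{x}}{1-\widetilde{\alpha}e^{x}}$ with $x=-\lambda\rho_{m}\widetilde{t}_{0}$, which follows by exchanging the two sums (all terms are nonnegative when $x>0$) and recognizing an exponential series inside a geometric one. The sketch below, a numerical check under assumed illustrative values of `x` and `alpha`, evaluates the double series directly and compares it with the closed form. It also shows the closed form shrinking with `x`, which is what drives the limit $\sup_{f\in F}e^{\lambda\overline{H}_{\rho_{m},n_{1}}(z,f)}\to 1$.

```python
import math

def double_series(x, alpha, n_terms=150, l_terms=800):
    """Truncation of sum_{n>=1} x^n/(n-1)! * sum_{l>=0} (l+1)^(n-1) * alpha^l."""
    total = 0.0
    for l in range(l_terms):
        y = x * (l + 1)
        term = x            # n = 1 term: x^1 * (l+1)^0 / 0!
        inner = 0.0
        for n in range(1, n_terms + 1):
            inner += term
            term *= y / n   # ratio of the (n+1)-th to the n-th term
        total += alpha ** l * inner
    return total

def closed_form(x, alpha):
    """x*e^x / (1 - alpha*e^x), valid when alpha*e^x < 1."""
    if alpha * math.exp(x) >= 1.0:
        raise ValueError("need alpha*exp(x) < 1")
    return x * math.exp(x) / (1.0 - alpha * math.exp(x))

# Illustrative values (assumptions): x plays the role of -lam*rho_m*t0 > 0.
x, alpha = 0.05, 0.9
print(double_series(x, alpha), closed_form(x, alpha))

# The bound shrinks as x decreases, mirroring rho_m -> 0.
for small_x in (1e-2, 1e-4, 1e-6):
    print(small_x, closed_form(small_x, alpha))
```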

7 Proof of Theorem 3.4

In this section, we prove Theorem 3.4.

Proof

Fix any i∈S and π∈Π. For any $\lambda_{1},\lambda_{2}>0$, we have

$$\begin{array}{@{}rcl@{}} E_{i}^{\pi}\!\left[e^{\lambda_{1}{{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt}\!\right] =&E_{i}^{\pi}\!\left[e^{\lambda_{2}{{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt} e^{(\lambda_{1}-\lambda_{2}){{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt}\right]\\ &{}\leq E_{i}^{\pi}\left[e^{\lambda_{2}{{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt} \right]e^{|\lambda_{1}-\lambda_{2}|T\max_{(i,a)\in K}c(i,a)} \end{array} $$

and

$$\begin{array}{@{}rcl@{}} E_{i}^{\pi}\left[e^{\lambda_{1}{{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt}\right] \geq&E_{i}^{\pi}\left[e^{\lambda_{2}{{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt} \right]e^{-|\lambda_{1}-\lambda_{2}|T\max_{(i,a)\in K}c(i,a)}, \end{array} $$

which give

$$|\lambda_{1} J^{*}_{V_{\lambda_{1}}}(i)-\lambda_{2} J^{*}_{V_{\lambda_{2}}}(i)|\leq|\lambda_{1}-\lambda_{2}|\max_{(i,a)\in K}c(i,a). $$
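The two displayed bounds become the Lipschitz estimate after taking logarithms and dividing by $T$, assuming (the definition lies outside this excerpt) that $\lambda J_{V_{\lambda}}$ is a large-$T$ limit of $\frac{1}{T}\ln E_{i}^{\pi}[e^{\lambda X_{T}}]$ with $X_{T}$ the accumulated cost. The sketch below checks the finite-horizon inequality for a hypothetical accumulated cost $X$ with a finite distribution supported in $[0,T\max c]$. The distribution, the horizon `T`, and `c_max` are invented for illustration only.

```python
import math

def scaled_log_mgf(lam, values, probs, T):
    """(1/T) * ln E[exp(lam * X)] for a finitely supported random variable X."""
    mgf = sum(p * math.exp(lam * x) for x, p in zip(values, probs))
    return math.log(mgf) / T

# Hypothetical data (assumptions): horizon, cost-rate bound, and a toy law of X.
T, c_max = 5.0, 2.0
values = [0.0, 2.5, 7.0, 10.0]     # all within [0, T * c_max]
probs = [0.1, 0.4, 0.3, 0.2]

def lipschitz_sides(lam1, lam2):
    """Return (left side, right side) of the Lipschitz-type bound."""
    lhs = abs(scaled_log_mgf(lam1, values, probs, T)
              - scaled_log_mgf(lam2, values, probs, T))
    rhs = abs(lam1 - lam2) * c_max
    return lhs, rhs

for lam1, lam2 in [(0.1, 0.2), (0.5, 3.0), (1.0, 1.001)]:
    print(lam1, lam2, lipschitz_sides(lam1, lam2))
```

Because the right-hand side does not depend on $T$, the same bound survives the limit in $T$ and the infimum over policies.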

Thus, Theorem 3.3(b) and the last inequality yield that $\lambda g^{*}_{\lambda }$ is continuous in $\lambda \in (0,\infty )$. Hence, by Theorem 3.3(b) again, we obtain that $J^{*}_{V_{\lambda }}(i)$ is continuous in $\lambda \in (0,\infty )$. Moreover, employing the same technique as above, we can obtain the continuity of $J^{*}_{V_{\lambda }}(i)$ in $\lambda \in (-\infty ,0)$. Below we show that $J^{*}_{V_{\lambda }}(i)$ is continuous at $\lambda=0$. Take an arbitrary sequence $\{\lambda _{n},n\geq 1\}\subseteq (0,\infty )$ with $\lim _{n\to \infty }\lambda _{n}=0$. For each n≥1, let $g^{*}_{\lambda _{n}}$ and $h^{*}_{g^{*}_{\lambda _{n}},\lambda _{n}}$ be as in Theorem 3.3 with $\lambda_{n}$ in lieu of $\lambda$. Set $\widetilde {h}^{*}_{g^{*}_{\lambda _{n}},\lambda _{n}}(i):=h^{*}_{g^{*}_{\lambda _{n}},\lambda _{n}}(i)-\min _{j\in S}h^{*}_{g^{*}_{\lambda _{n}},\lambda _{n}}(j)$ for all n≥1. Then for each n≥1, we have $\widetilde {h}^{*}_{g^{*}_{\lambda _{n}},\lambda _{n}}(i)\geq 0$ and there exists $i^{\lambda_{n}}\in S$ satisfying $\widetilde {h}^{*}_{g^{*}_{\lambda _{n}},\lambda _{n}}(i^{\lambda _{n}})=0$. Moreover, using Theorem 3.3(a), for each n≥1, there exists $f^{*}_{\lambda _{n}}\in F$ such that

$$\begin{array}{@{}rcl@{}} \lambda_{n} g^{*}_{\lambda_{n}} e^{\lambda_{n} \widetilde{h}^{*}_{g^{*}_{\lambda_{n}},\lambda_{n}}(i)}&=&\underset{a\in A(i)}{\inf} \!\left\{\lambda_{n} c(i,a)e^{\lambda_{n} \widetilde{h}^{*}_{g^{*}_{\lambda_{n}},\lambda_{n}}(i)}\!\,+\,\underset{j\in S}{\sum}~e^{\lambda_{n} \widetilde{h}^{*}_{g^{*}_{\lambda_{n}},\lambda_{n}}(j)}q(j|i,a)\right\} \end{array} $$

(7.1)

$$\begin{array}{@{}rcl@{}} &=&\lambda_{n} c(i,f^{*}_{\lambda_{n}})e^{\lambda_{n} \widetilde{h}^{*}_{g^{*}_{\lambda_{n}},\lambda_{n}}(i)}+\underset{j\in S}{\sum}~e^{\lambda_{n} \widetilde{h}^{*}_{g^{*}_{\lambda_{n}},\lambda_{n}}(j)}q(j|i,f^{*}_{\lambda_{n}}). \end{array} $$

(7.2)

Note that Theorem 3.3(b) gives $0\leq g^{*}_{\lambda _{n}}\leq \max _{(i,a)\in K}c(i,a)$ for all n≥1. Thus, choose any convergent subsequence $\{g^{*}_{\lambda _{n_{l}}},l\geq 1\}$ of $\{g^{*}_{\lambda _{n}},n\geq 1\}$ and denote the corresponding limit by

$$ \overline{g}:=\lim_{l\to\infty}g^{*}_{\lambda_{n_{l}}}\in \left[0,\max_{(i,a)\in K}c(i,a)\right]. $$

(7.3)

Furthermore, by the finiteness of S and the compactness of F and $[0,\infty ]$, there exists a subsequence of $\{n_{l}\}$ (still denoted by $\{n_{l}\}$) such that $i^{\lambda _{n_{l}}}=i^{*}\in S$ for all l≥1 and the limits of the sequences $\{\widetilde {h}^{*}_{g^{*}_{\lambda _{n_{l}}},\lambda _{n_{l}}},l\geq 1\}$ and $\{f^{*}_{\lambda _{n_{l}}},l\geq 1\}$ exist. Set

$$ \overline{h}(j):=\lim_{l\to\infty}\widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(j)\ \ \text{and} \ \ f^{*}(j):=\lim_{l\to\infty}f^{*}_{\lambda_{n_{l}}}(j)\ \ \text{for \ all} \ j\in S. $$

(7.4)

Then we have $\overline {h}(j)\geq 0$ for all j∈S and $\overline {h}(i^{*})=0$. Below we show $\overline {h}(j)<\infty $ for all j∈S by an induction argument. For any $j\neq i^{*}$, Assumption 3.1(iii) yields that there exist distinct states $k_{1}=i^{*},k_{2},\ldots,k_{m^{\prime }}=j$ such that $q(k_{n+1}|k_{n},f^{*})>0$ for all $n=1,2,\ldots , m^{\prime }-1$. Observe that $\overline {h}(k_{1})=\overline {h}(i^{*})=0$. Thus, $\overline {h}(k_{n})<\infty $ holds for n=1. Suppose that $\overline {h}(k_{n^{*}})<\infty $ for some $n^{*}\in \{1,2,\ldots ,m^{\prime }-1\}$. Employing Eq. 7.2 we obtain

$$g^{*}_{\lambda_{n_{l}}} e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(k_{n^{*}})} =c(k_{n^{*}},f^{*}_{\lambda_{n_{l}}})e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(k_{n^{*}})}+{\sum}_{j\in S} \frac{e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(j)}-1}{\lambda_{n_{l}}}q(j|k_{n^{*}},f^{*}_{\lambda_{n_{l}}}), $$

which together with the inequality $e^{x}-1\geq x$ ($x\geq 0$) gives

$$\begin{array}{@{}rcl@{}} g^{*}_{\lambda_{n_{l}}} e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(k_{n^{*}})}\geq&c(k_{n^{*}},f^{*}_{\lambda_{n_{l}}})e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(k_{n^{*}})}+\frac{e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(k_{n^{*}})}-1}{\lambda_{n_{l}}}q(k_{n^{*}}|k_{n^{*}},f^{*}_{\lambda_{n_{l}}})\\ &+\widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}} (k_{n^{*}+1})q(k_{n^{*}+1}|k_{n^{*}},f^{*}_{\lambda_{n_{l}}}) \end{array} $$

for all l≥1. Then letting $l\to \infty $ on both sides of the last inequality and using the induction hypothesis, Assumption 3.1(ii), Eqs. 7.3 and 7.4, we get

$$\overline{g}\geq c(k_{n^{*}},f^{*})+\overline{h}(k_{n^{*}})q(k_{n^{*}}|k_{n^{*}},f^{*})+\overline{h}(k_{n^{*}+1})q(k_{n^{*}+1}|k_{n^{*}},f^{*}),$$

which implies $\overline {h}(k_{n^{*}+1})<\infty $. Hence, by induction we have $\overline {h}(k_{n})<\infty $ for all $n=1,2,\ldots ,m^{\prime }$. Therefore, we obtain $\overline {h}(j)\in [0,\infty )$ for all j∈S. By Eq. 7.1 we get

$$\begin{array}{@{}rcl@{}} g^{*}_{\lambda_{n_{l}}} e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(i)}&\leq&c(i,a)e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(i)}+{\sum}_{j\in S} \frac{e^{\lambda_{n_{l}} \widetilde{h}^{*}_{g^{*}_{\lambda_{n_{l}}},\lambda_{n_{l}}}(j)}-1}{\lambda_{n_{l}}}q(j|i,a) \end{array} $$

for all l≥1 and a∈A(i). Then the last inequality, Eqs. 7.3 and 7.4 yield

$$ \overline{g}\leq c(i,a)+\sum\limits_{j\in S}\overline{h}(j)q(j|i,a) \ \ \text{for \ all} \ a\in A(i). $$

(7.5)

Thus, employing Eq. 7.5 and the Dynkin formula, we obtain

$$\overline{g}T\leq E_{i}^{\pi}\left[{{\int}_{0}^{T}}{\int}_{A}c(\xi_{t},a)\pi(da|\xi_{t},t)dt\right]+E_{i}^{\pi}[\overline{h}(\xi_{T})]-\overline{h}(i) $$

for all T>0, which gives

$$ \overline{g}\leq J^{*}_{V_{0}}(i). $$

(7.6)

On the other hand, using Eq. 7.2 and arguments similar to those used for Eq. 7.6, we have

$$ \overline{g}=J_{V_{0}}(i,f^{*})\geq J^{*}_{V_{0}}(i). $$

(7.7)

Then combining Eqs. 7.6 and 7.7, we get $\overline {g}=J^{*}_{V_{0}}(i)$. Since all the convergent subsequences of $\{g^{*}_{\lambda _{n}},n\geq 1\}$ have the same limit $J^{*}_{V_{0}}(i)$, we obtain $\lim _{n\to \infty }g^{*}_{\lambda _{n}}=J^{*}_{V_{0}}(i)$, which together with Theorem 3.3(b) implies $\lim _{n\to \infty }J^{*}_{V_{\lambda _{n}}}(i)=J^{*}_{V_{0}}(i)$. Therefore, $J^{*}_{V_{\lambda }}(i)$ is right-continuous at $\lambda=0$. Moreover, following the same technique as above, we have that $J^{*}_{V_{\lambda }}(i)$ is left-continuous at $\lambda=0$. Hence, we complete the proof of the theorem. □
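The continuity at $\lambda=0$ can be observed numerically for a single fixed stationary policy. For fixed $f$, Eq. 7.2 says that the positive vector $u(j)=e^{\lambda\widetilde{h}(j)}$ satisfies $\sum_{j}[q(j|i,f)+\lambda c(i,f)\delta_{ij}]u(j)=\lambda g\,u(i)$, so $\lambda g$ is the Perron eigenvalue of $Q_{f}+\lambda\,\mathrm{diag}(c_{f})$. The sketch below uses this observation on a hypothetical irreducible 3-state chain; the rate matrix `Q`, the costs `c`, and the helper names are assumptions made for illustration, and no optimization over policies is performed. It compares $g_{\lambda}$ for small $|\lambda|$ with the expected average cost $\sum_{i}\mu(i)c(i)$, where $\mu$ is the stationary distribution.

```python
def perron(matrix, iters=2000):
    """Power iteration: Perron root and sum-normalized eigenvector of a
    nonnegative irreducible matrix with positive diagonal."""
    n = len(matrix)
    v = [1.0 / n] * n
    root = 0.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        root = sum(w)                 # equals the eigenvalue once v has converged
        v = [x / root for x in w]
    return root, v

def risk_sensitive_average(Q, c, lam):
    """g_lam = (Perron eigenvalue of Q + lam*diag(c)) / lam, for lam != 0."""
    n = len(Q)
    # Shift by a multiple of the identity so every entry is nonnegative.
    shift = max(abs(Q[i][i]) for i in range(n)) + abs(lam) * max(c) + 1.0
    M = [[Q[i][j] + (lam * c[i] + shift if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    root, _ = perron(M)
    return (root - shift) / lam

def expected_average(Q, c):
    """sum_i mu(i) c(i), with mu the stationary distribution (mu Q = 0)."""
    n = len(Q)
    shift = max(abs(Q[i][i]) for i in range(n)) + 1.0
    Qt = [[Q[j][i] + (shift if i == j else 0.0) for j in range(n)]
          for i in range(n)]          # transpose of Q + shift*I
    _, mu = perron(Qt)
    return sum(m * ci for m, ci in zip(mu, c))

# Hypothetical generator (rows sum to zero) and cost rates under a fixed policy.
Q = [[-2.0, 1.0, 1.0],
     [1.0, -4.0, 3.0],
     [2.0, 2.0, -4.0]]
c = [1.0, 3.0, 0.5]

print("expected average cost:", expected_average(Q, c))
for lam in (1.0, 0.1, 1e-2, 1e-4, -1e-4, -1e-2, -0.1, -1.0):
    print(lam, risk_sensitive_average(Q, c, lam))
```

Under these assumptions the printed values of $g_{\lambda}$ approach the expected average cost from above as $\lambda\downarrow 0$ and from below as $\lambda\uparrow 0$, in line with the right- and left-continuity shown in the proof.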

8 Concluding remarks

In this paper we have studied the U-average cost criterion for CTMDPs with a finite state space. Under the continuity-compactness condition and the irreducibility condition, we have shown that the simultaneous Doeblin condition for the CTMDPs holds. Moreover, we have obtained the optimality equation of the auxiliary risk-sensitive first passage optimization problem and the properties of the corresponding optimal value function for any nonzero risk-sensitivity parameter. Then, employing the results obtained for the risk-sensitive first passage criterion, we have established the existence of a solution to the optimality equation of the risk-sensitive average cost criterion for any nonzero value of the risk-sensitivity parameter. Furthermore, we have proven that the optimal value function of the risk-sensitive average cost criterion is continuous with respect to the risk-sensitivity parameter. Finally, we have given the connections between the U-average cost criterion and the average cost criteria induced by the identity function and the exponential utility function, from which the existence of a U-average optimal deterministic stationary policy has been shown. It should be mentioned that the optimality equation of the risk-sensitive average cost criterion, valid for any nonzero risk-sensitivity parameter, plays a crucial role in the study of the U-average cost criterion. Hence, when dealing with the U-average cost criterion with a countable state space, the difficulty lies in finding conditions under which the optimality equation of the risk-sensitive average cost criterion holds for any nonzero risk-sensitivity parameter. In addition, CTMDPs with bounded transition rates under the expected discounted cost and expected average cost criteria can be transformed into equivalent discrete-time MDPs by the uniformization technique. Whether the uniformization technique is applicable to CTMDPs under the risk-sensitive average cost criterion is a very interesting problem.

References

Aliprantis C, Border K (2007) Infinite dimensional analysis. Springer, New York
Anderson WJ (1991) Continuous-time Markov chains: an applications-oriented approach. Springer, New York
Bäuerle N, Rieder U (2014) More risk-sensitive Markov decision processes. Math Oper Res 39:105–120
Cavazos-Cadena R, Fernández-Gaucherand E (2005) Risk-sensitive optimal control in communicating average Markov decision chains. In: Dror M, L’Ecuyer P, Szidarovsky F (eds) Modelling uncertainty: an examination of stochastic theory, methods, and applications. Springer, New York, pp 515–553
Cavazos-Cadena R (2010) Optimality equations and inequalities in a class of risk-sensitive average cost Markov decision chains. Math Meth Oper Res 71:47–84
Cavazos-Cadena R, Hernández-Hernández D (2011) Discounted approximations for risk-sensitive average criteria in Markov decision chains with finite state space. Math Oper Res 36:133–146
Cavazos-Cadena R, Hernández-Hernández D (2016) A characterization of the optimal certainty equivalent of the average cost via the Arrow-Pratt sensitivity function. Math Oper Res 41:224–235
Di Masi GB, Stettner L (2007) Infinite horizon risk sensitive control of discrete time Markov processes under minorization property. SIAM J Control Optim 46:231–252
Ghosh MK, Saha S (2014) Risk-sensitive control of continuous time Markov chains. Stochastics 86:655–675
Guo XP, Hernández-Lerma O (2009) Continuous-time Markov decision processes: theory and applications. Springer, Berlin
Guo XP, Huang YH, Song XY (2012) Linear programming and constrained average optimality for general continuous-time Markov decision processes in history-dependent policies. SIAM J Control Optim 50:23–47
Jaśkiewicz A (2007) Average optimality for risk-sensitive control with general state space. Ann Appl Probab 17:654–675
Kallenberg O (2012) Foundations of modern probability. Springer, New York
Kitaev MY, Rykov VV (1995) Controlled queueing systems. CRC Press, Boca Raton
Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York
Wei QD, Chen X (2014) Strong average optimality criterion for continuous-time Markov decision processes. Kybernetika 50:950–977
Wei QD, Chen X (2016) Continuous-time Markov decision processes under the risk-sensitive average cost criterion. Oper Res Lett 44:457–462

Acknowledgments

The authors are greatly indebted to the referees for the valuable comments and suggestions which have greatly improved the presentation. The research of the first author was supported by National Natural Science Foundation of China (Grant No.11601166) and Cultivation Program for Outstanding Young Scientific Talents of Fujian Province. The research of the second author was supported by the Fundamental Research Funds for the Central Universities of Xiamen University (Grant No.20720170008).

Author information

Authors and Affiliations

School of Economics and Finance, Huaqiao University, Quanzhou, 362021, People’s Republic of China
Qingda Wei
School of Mathematical Sciences, Xiamen University, Xiamen, 361005, People’s Republic of China
Xian Chen

Corresponding author

Correspondence to Xian Chen.

About this article

Cite this article

Wei, Q., Chen, X. Average cost criterion induced by the regular utility function for continuous-time Markov decision processes. Discrete Event Dyn Syst 27, 501–524 (2017). https://doi.org/10.1007/s10626-017-0237-x

Received: 05 January 2016
Accepted: 05 February 2017
Published: 20 February 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s10626-017-0237-x
