Occupation Measures: Asymptotic Properties and Ramification

Yin, G. George; Zhang, Qing

doi:10.1007/978-1-4614-4346-9_5

G. George Yin³ &
Qing Zhang⁴

Part of the book series: Stochastic Modelling and Applied Probability ((SMAP,volume 37))

3675 Accesses

Abstract

Chapter 4 deals with the probability distribution of α^ε( ⋅) through the corresponding forward equation and is mainly an analytical approach, whereas the current chapter is largely probabilistic in nature. The central theme of this chapter is limit results of unscaled as well as scaled sequences of occupation measures, which include the law of large numbers for an unscaled sequence, exponential upper bounds, and asymptotic distribution of a suitably scaled sequence of occupation times.

Access provided by Autonomous University of Puebla. Download chapter PDF

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Chapter 4 deals with the probability distribution of α^ε( ⋅) through the corresponding forward equation and is mainly an analytical approach, whereas the current chapter is largely probabilistic in nature. The central theme of this chapter is limit results of unscaled as well as scaled sequences of occupation measures, which include the law of large numbers for an unscaled sequence, exponential upper bounds, and asymptotic distribution of a suitably scaled sequence of occupation times. It further exploits the deviation of the functional occupation times from its quasi-stationary distribution. We obtain estimates of centered deviations, prove the convergence of a properly scaled and centered sequence of occupation times, characterize the limit process by deriving the limit distribution and providing explicit formulas for the mean and covariance functions, and provide exponential bounds for the normalized process. It is worthwhile to note that the limit covariance function depends on the initial-layer terms in contrast with most of the existing results of central limit type.

The rest of the chapter is arranged as follows. We first study the asymptotic properties of irreducible Markov chains in Section 5.2. In view of the developments in Remarks 4.34 and 4.39, the Markov chain with recurrent states is the most illustrative and representative one. As a result, in the remaining chapters, we mainly treat problems associated with this model. Starting in Section 5.3.1, we consider Markov chains with weak and strong interactions with generators consisting of multiple irreducible blocks. After treating aggregation of the Markov states, we study the corresponding exponential bounds. We deal with asymptotic distributions. Then in Section 5.4, we treat Markov chains with generators that are merely measurable. Next, remarks on inclusion of transient and absorbing states are provided in Section 5.5. Applications of the weak convergence results and a related stability problem are provided in Section 5.6. Finally, Section 5.7 concludes the chapter with notes and further remarks.

2 The Irreducible Case

The notion of occupation measure is set forth first. We consider a sequence of unscaled occupation measures and establish its convergence in probability to that of the accumulative quasi-stationary distribution. This is followed by exponential bounds of the function occupation time and moment estimates. In addition, asymptotic normality is derived. Although the prelimit process has nonzero mean and is nonstationary, using the results of Section 4.2, the quasi-stationary regime is established after a short period (of order O(ε)). We also calculate explicitly the covariance representation of the limit process, and prove that the process α^ε( ⋅) satisfies a mixing condition. The tightness of the sequence and the w.p.1 continuity of the sample paths of the limit process are proved by estimating the fourth moment. The limit of the finite-dimensional distributions is then calculated and shown to be Gaussian. By proving a series of lemmas, we derive the desired asymptotic normality.

As was mentioned in previous chapters, the process α^ε( ⋅) arises from pervasive practical use that involves a rapidly fluctuating finite-state Markov chain. In these applications, the asymptotic behavior of the Markov chain α^ε( ⋅) has a major influence. Further investigation and understanding of the asymptotic properties of α^ε( ⋅), in particular, the probabilistic structures, play an important role in the in-depth study.

In Section 4.2, using singular perturbation techniques, we examined the asymptotic properties of ${p}_{i}^\varepsilon (t) = P({\alpha }^\varepsilon (t) = i)$. It has been proved that p ^ε(t) = (p ₁ ^ε(t), …, p _m ^ε(t)) converges to the quasi-stationary distribution ν(t) as ε → 0 for each t > 0 and p ^ε(t) admits an asymptotic expansion in terms of ε. To gain further insight, we ask whether there is a limit result for the occupation measure $\int \nolimits_{0}^{t}{I}_{\{{\alpha }^\varepsilon (s)=i\}}ds$. If a convergence is expected to take place, then what is the rate of convergence? Does one have a central limit theorem associated with the α^ε( ⋅)-process? The answers to these questions are affirmative. We will prove a number of limit results related to an unscaled sequence, and a suitably scaled and normalized sequence. Owing to the asymptotic expansions, the scaling factor is $\sqrt \varepsilon $. The limit process is Gaussian with zero mean, and the covariance of the limit process depends on the asymptotic expansion in an essential way, which reflects one of the distinct features of the central limit theorem. It appears that it is virtually impossible to calculate the asymptotic covariance of the Gaussian process without the help of the asymptotic expansion, which reveals a salient characteristic of the two-time-scale Markov chain.

A related problem is to examine the exponential bounds on the scaled occupation measure process. This is similar to the estimation of the moment generating function. Such estimates have been found useful in studying hierarchical controls of manufacturing systems. Using the asymptotic expansion and the martingale representation of finite-state Markov chains, we are able to establish such exponential bounds for the scaled occupation measures.

2.1 Occupation Measure

Let $(\Omega,\mathcal{F},P)$ denote the underlying probability space. As in Section 4.2, α^ε( ⋅) is a nonstationary Markov chain on $(\Omega,\mathcal{F},P)$ with finite-state space $\mathcal{M} =\{ 1,\ldots,m\}$ and generator ${Q}^\varepsilon (t) = Q(t)/\varepsilon $.

For each $i \in \mathcal{M}$, let β_i( ⋅) denote a bounded Borel measurable deterministic function and define a sequence of centered (around the quasi-stationary distribution) occupation measures Z _i ^ε(t) as

$${Z}_{i}^\varepsilon (t) ={ \int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu }_{i}(s)\right ){\beta }_{i}(s)ds.$$

(5.1)

Set Z ^ε(t) = (Z ₁ ^ε(t), …, Z _m ^ε(t)). It is a measure of the functional occupancy for the process α^ε( ⋅). Our interest lies in the asymptotic properties of the sequence defined in (5.1). To proceed, we first present some conditions and preliminary results needed in the sequel.

Note that a special choice of β_i( ⋅) is β_i(t) = 1, for t ∈ [0, T]. To insert β_i( ⋅) in sequence allows one to treat various situations in some applications. For example, in the manufacturing problem, β_i(t) is often given by a function of a control process; see Chapter 8 for further details.

2.2 Conditions and Preliminary Results

To proceed, we make the following assumptions.

(A5.1)
For each t ∈ [0, T], Q(t) is weakly irreducible.
(A5.2)
( ⋅) is continuously differentiable on [0, T], and its derivative is Lipschitz.

Recall that ${p}^\varepsilon (t) = (P({\alpha }^\varepsilon (t) = 1),\ldots,P({\alpha }^\varepsilon (t) = m))$ and let

$${p}_{ij}^\varepsilon (t,{t}_{ 0}) = P({\alpha }^\varepsilon (t) = j\vert {\alpha }^\varepsilon ({t}_{ 0}) = i)\quad \mbox{ for all }i,j \in \mathcal{M}.$$

Use P ^ε(t, t ₀) to denote the transition matrix (p _ij ^ε(t, t ₀)). The following lemma is on the asymptotic expansion of P ^ε(t, t ₀).

Lemma 5.1

Assume (A5.1) and (A5.2) . Then there exists a positive constant κ ₀ such that for each fixed 0 ≤ T < ∞,

$${P}^\varepsilon (t,{t}_{ 0}) = {P}_{0}(t) + O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - {t}_{0})} \varepsilon \right )\right )$$

(5.2)

uniformly in (t ₀ ,t) where 0 ≤ t ₀ ≤ t ≤ T and

$${P}_{0}(t) = \left (\begin{array}{c} \nu (t)\\ \vdots\\ \nu (t) \end{array} \right ).$$

In addition, assume Q(⋅) to be twice continuously differentiable on [0,T] with the second derivative being Lipschitz. Then

$$\begin{array}{ll} {P}^\varepsilon (t,{t}_{0})& = {P}_{0}(t) + \varepsilon {P}_{1}(t) \\ &\quad + {Q}_{0}\left (\frac{t - {t}_{0}} \varepsilon ,{t}_{0}\right ) + \varepsilon {Q}_{1}\left (\frac{t - {t}_{0}} \varepsilon ,{t}_{0}\right ) + O(\varepsilon ^{2}) \end{array}$$

(5.3)

uniformly in (t ₀ ,t), where 0 ≤ t ₀ ≤ t ≤ T,

$${P}_{1}(t) = \left (\begin{array}{c} {\varphi }_{1}(t)\\ \vdots \\ {\varphi }_{1}(t) \end{array} \right ),$$

$$\begin{array}{rl} &\frac{d{Q}_{0}(\tau,{t}_{0})} {d\tau } = {Q}_{0}(\tau,{t}_{0})Q({t}_{0}),\;\tau \geq 0, \\ &{Q}_{0}(0,{t}_{0}) = I - {P}_{0}({t}_{0}),\end{array}$$

and

$$\begin{array}{rl} &\frac{d{Q}_{1}(\tau,{t}_{0})} {d\tau } = {Q}_{1}(\tau,{t}_{0})Q({t}_{0}) + \tau {Q}_{0}(\tau,{t}_{0})\frac{dQ({t}_{0})} {dt},\ \ \tau \geq 0 \\ &{Q}_{1}(0,{t}_{0}) = -{P}_{1}({t}_{0}), \end{array}$$

where φ ₁ (t) is given in (4.13) ( with $\tau := (t - {t}_{0})/\varepsilon $ ) . Furthermore, for i = 0,1, the P _i (⋅) are (2 − i) times continuously differentiable on [0,T] and there exist constants K > 0 and κ ₀ > 0 such that

$$\left \vert {Q}_{i}\left (\tau,{t}_{0}\right )\right \vert \leq K\exp (-{\kappa }_{0}\tau ),$$

(5.4)

uniformly for t ₀ ∈ [0,T].

Remark 5.2

Recall that ν(t) and φ₁(t) are row vectors. As a result, P₀(⋅) and P₁(⋅) have identical rows. This is a consequence of the convergence of p^ε(t) to the quasi-stationary distribution and the asymptotic expansions.

Proof of Lemma 5.1: It suffices to verify (5.3) because (5.2) can be derived similarly. The asymptotic expansion of P ^ε(t, t ₀) can be obtained as in Section 4.2. Thus only the exponential bound (5.4) needs to be proved. The main task is to verify the uniformity in t ₀. To this end, it suffices to treat each row of Q _i(τ, t ₀) separately. For a fixed i = 0, 1, let

$$\eta (\tau,{t}_{0}) = ({\eta }_{1}(\tau,{t}_{0}),\ldots,{\eta }_{m}(\tau,{t}_{0}))$$

denote any row of Q _i(τ, t ₀) and η₀(t ₀) the corresponding row in Q _i(0, t ₀) with

$$\begin{array}{rl} &{Q}_{0}(0,{t}_{0}) = I - {P}_{0}({t}_{0})\quad \mbox{ and } \\ &{Q}_{1}(0,{t}_{0}) = -{P}_{1}({t}_{0})\end{array}$$

Then η(τ, t ₀) satisfies the differential equation

$$\begin{array}{ll} &\frac{d\eta (\tau,{t}_{0})} {d\tau } = \eta (\tau,{t}_{0})Q({t}_{0}),\;\tau \geq 0, \\ &\eta (0,{t}_{0}) = {\eta }_{0}({t}_{0})\end{array}$$

By virtue of the assumptions of Lemma 5.1 and the asymptotic expansion, it follows that η₀(t ₀) is uniformly bounded and η₀(t ₀)1 = 0.

Define

$$\widehat{\kappa } = -\max \{\mbox{ The real parts of eigenvalues of }Q(t),\;t \in [0,T]\}.$$

Then Lemma A.6 implies that $\widehat{k} > 0$. In view of Theorem 4.5, it suffices to show that for all $\tau \geq 0$ and for some constant K > 0 independent of t ₀,

$$\vert \eta (\tau,{t}_{0})\vert \leq K\exp \left (-\frac{\widehat{\kappa }\tau } {2} \right ).$$

(5.5)

To verify (5.5), note that for any ς₀ ∈ [0, T],

$${ d\eta (\tau,{t}_{0}) \over d\tau } = \eta (\tau,{t}_{0})Q({\varsigma }_{0}) + \eta (\tau,{t}_{0})[Q({t}_{0}) - Q({\varsigma }_{0})].$$

Solving this differential equation by treating η(τ, t ₀)[Q(t ₀) − Q(ς₀)] as the driving term, we have

$$\begin{array}{ll} \eta (\tau,{t}_{0})& = {\eta }_{0}({t}_{0})\exp \left (Q({\varsigma }_{0})\tau \right ) \\ &\quad +{ \int }_{0}^{\tau }\eta (\varsigma,{t}_{ 0})[Q({t}_{0}) - Q({\varsigma }_{0})]\exp \left (Q({\varsigma }_{0})(\tau - \varsigma )\right )d\varsigma.\end{array}$$

(5.6)

In view of (A5.2), for some K ₀ > 0,

$$\vert Q({t}_{0}) - Q({\varsigma }_{0})\vert \leq {K}_{0}\vert {t}_{0} - {\varsigma }_{0}\vert.$$

Noting that η₀(t ₀)1 = 0 and that P ₀(t) has identical rows, we have

$${\eta }_{0}({t}_{0}){P}_{0}(t) = 0,\quad \mbox{ for }t \geq 0.$$

Thus the equation in (5.6) is equivalent to

$$\begin{array}{rl} \eta (\tau,{t}_{0})& = {\eta }_{0}({t}_{0})(\exp \left (Q({\varsigma }_{0})s\right ) - {P}_{0}({\varsigma }_{0})) \\ &\ +{ \int }_{0}^{\tau }\eta (\varsigma,{t}_{ 0})[Q({t}_{0}) - Q({\varsigma }_{0})](\exp \left (Q({\varsigma }_{0})(\tau - \varsigma )\right ) - {P}_{0}({\varsigma }_{0}))d\varsigma \end{array}$$

From Lemma A.2, we have

$$\vert \eta (\tau,{t}_{0})\vert \leq {K}_{1}\exp \left (-\widehat{\kappa }\tau \right ) + {K}_{2}\vert {t}_{0} - {\varsigma }_{0}\vert {\int }_{0}^{\tau }\vert \eta (\varsigma,{t}_{ 0})\vert \exp \left (-\widehat{\kappa }(\tau - \varsigma )\right )d\varsigma,$$

for some constants K ₁ and K ₂ which may depend on ς₀ but are independent of t ₀. By Gronwall’s inequality (see Hale [79, p. 36]),

$$\vert \eta (\tau,{t}_{0})\vert \leq {K}_{1}\exp \left (-(\widehat{\kappa } - {K}_{2}\vert {t}_{0} - {\varsigma }_{0}\vert )\tau \right ),$$

(5.7)

for all t ₀ ∈ [0, T] and τ > 0.

If (5.5) does not hold uniformly, then there exist τ_n > 0 and t _n ∈ [0, T] such that

$$\vert \eta ({\tau }_{n},{t}_{n})\vert \geq n\exp \left (-\frac{\widehat{\kappa }{\tau }_{n}} {2} \right ).$$

Since T is finite, we may assume t _n → ς₀, as n → ∞. This contradicts (5.7) for n large enough satisfying $\vert {t}_{n} - {\varsigma }_{0}\vert <\widehat{ \kappa }/(2{K}_{2})$ and K ₁ < n. Thus the proof is complete. □

Unscaled Occupation Measure

To study the unscaled occupation measure Z _i ^ε(t) in (5.1), we define a related sequence $\{\widehat{{Z}}^\varepsilon (t)\}$ of ${\mathbb{R}}^{m}$-valued processes with its ith component $\widehat{{Z}}_{i}^\varepsilon (t)$ given by

$$\widehat{{Z}}_{i}^\varepsilon (t) ={ \int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - P({\alpha }^\varepsilon (s) = i)\right ){\beta }_{ i}(s)ds.$$

Assume the conditions (A5.1) and (A5.2). We claim that for any δ > 0,

$$ \lim \limits_{\varepsilon \rightarrow 0} \left( \sup\limits_{0\leq t\leq T}P(\vert \widehat{{Z}}^\varepsilon (t)\vert \geq \delta )\right) = 0\mbox{ and }$$

(5.8)

$$ \lim \limits_{\varepsilon \rightarrow 0}\left( \sup \limits_{0\leq t\leq T}E\vert \widehat{{Z}}^\varepsilon (t){\vert}^{2}\right) = 0.$$

(5.9)

Note that (5.8) follows from (5.9) using Tchebyshev’s inequality. The verification of (5.9), which mainly depends on a mixing property of the underlying sequence, is almost the same as the moment estimates in the proof of asymptotic normality in Lemma 5.13. The details of the verifications of (5.8) and (5.9) are omitted here.

With (5.9) in hand for any δ > 0, to study the asymptotic properties of Z ^ε( ⋅), it remains to show that

$$\begin{array}{rl} \lim \limits {}_{\varepsilon \rightarrow 0}\left( \sup \limits_{0\leq t\leq T}P(\vert {Z}^\varepsilon (t)\vert \geq \delta )\right ) = 0 \mbox{ and } \quad \lim \limits_{\varepsilon \rightarrow 0}\left( \sup \limits_{0\leq t\leq T}E\vert {Z}^\varepsilon (t){\vert }^{2}\right ) = 0 .\end{array}$$

In fact, it is enough to work with each component of Z ^ε(t). Note that both Z ^ε(t) and $\widehat{Z}^\varepsilon(t)$ are bounded. This together with the boundedness of β(t) and Lemma 5.1 implies that for each $i \in \mathcal{M}$,

$$\begin{array}{rl} & \sup \limits_{0\leq t\leq T}E\vert {Z}_{i}^\varepsilon (t){\vert }^{2} \\ & \leq 2 \left(\sup \limits_{0\leq t\leq T}E\vert \widehat{{Z}}_{i}^\varepsilon(t){\vert }^{2} + \sup \limits_{ 0\leq t\leq T}E |\int_{0}^{t}\left(P({\alpha }^\varepsilon (s) = i) - {\nu }_{i}(s)\right){\beta }_{i}(s)d s {|}^{2}\right) \\ & \leq 2\left(\sup \limits_{0\leq t\leq T}E\vert \widehat{{Z}}_{i}^\varepsilon(t){\vert }^{2} +{ \int }_{0}^{T}O(\varepsilon )ds\right )\rightarrow 0, \end{array}$$

as ε → 0, which yields the desired results.

The limit result above is of the law-of-large-numbers type. What has been obtained is that as ε → 0,

$${\int }_{0}^{t}{I}_{\{{ \alpha }^\varepsilon (s)=i\}}ds \rightarrow {\int }_{0}^{t}{\nu }_{ i}(s)ds\quad \mbox{ in probability as }\varepsilon \rightarrow 0,$$

for 0 < t ≤ T. In fact, a somewhat stronger result on uniform convergence in terms of the second moment is established. To illustrate, suppose that ${\alpha }^\varepsilon (t) = \alpha (t/\varepsilon )$ such that α( ⋅) is a stationary process with stationary distribution $\overline{\nu } = ({\overline{\nu }}_{1},\ldots,{\overline{\nu }}_{m})$. Then via a change of variable $\varsigma = s/\varepsilon $, we have

$$\begin{array}{rl} &\frac{1} {t}{\int }_{0}^{t}{I}_{\{{ \alpha }^\varepsilon (s)=i\}}ds = \frac\varepsilon {t}{\int }_{0}^{t/\varepsilon }{I}_{\{ \alpha (\varsigma )=i\}}d\varsigma \\ &\quad ={ \varepsilon \over t} {\int }_{0}^{t/\varepsilon }{I}_{\{ \alpha (\varsigma )=i\}}d\varsigma \rightarrow {\overline{\nu }}_{i}\quad \mbox{ in probability as }\varepsilon \rightarrow 0, \end{array}$$

for 0 < t ≤ T. This is exactly the continuous-time version of the law of large numbers.

Example 5.3

Let us return to the singularly perturbed Cox process of Section 3.3. Recall that the compensator of the singularly perturbed Cox process is given by

$${G}^\varepsilon (t) = {G}_{ 0} + \sum \limits_{i=1}^{m}{ \int }_{0}^{t}{a}_{ i}{I}_{\{{\alpha }^\varepsilon (s)=i\}}ds,$$

where a_i > 0 for i = 1,…,m. Assume that all the conditions in Lemma 5.1 hold. Then Theorem 4.5 implies that P(α ^ε (t) = i) → ν _i (t) as ε → 0. What we have discussed thus far implies that for each $i \in \mathcal{M}$,

$${\int }_{0}^{t}{a}_{ i}{I}_{\{{\alpha }^\varepsilon (s)=i\}}ds \rightarrow {\int }_{0}^{t}{a}_{ i}{\nu }_{i}(s)ds\quad \mbox{ in probability as }\varepsilon \rightarrow 0\mbox{ and }$$

$${G}^\varepsilon (t) \rightarrow G(t) = {G}_{ 0} + \sum \limits_{i=1}^{m}{ \int }_{0}^{t}{a}_{ i}{\nu }_{i}(s)ds\quad \mbox{ in probability.}$$

Moreover,

$$\lim \limits_{\varepsilon \rightarrow 0}\left(\sup \limits_{0\leq t\leq T}E\vert {G}^\varepsilon (t) - G(t){\vert }^{2}\right ) = 0.$$

In the rest of this chapter, we treat suitably scaled occupation measures; the corresponding results for the Cox process can also be derived.

With the limit results in hand, the next question is this: How fast does the convergence take place? The rate of convergence issue together with more detailed asymptotic properties is examined fully in the following sections.

2.3 Exponential Bounds

This section is devoted to the derivation of exponential bounds for the normalized occupation measure (or occupation time) n ^ε( ⋅). Given a deterministic process β( ⋅), we consider the “centered” and “scaled” functional occupation-time process n ^ε(t, i) defined by

$$\begin{array}{ll} &{n}^\varepsilon (t,i) = \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu }_{i}(s)\right ){\beta }_{i}(s)ds\ \mbox{ and } \\ &{n}^\varepsilon (t) = ({n}^\varepsilon (t,1),\ldots,{n}^\varepsilon (t,m)) \in {\mathbb{R}}^{1\times m}.\end{array}$$

(5.10)

In view of Lemma 5.1, we have, for 0 ≤ s ≤ t ≤ T,

$${P}^\varepsilon (t,s) - {P}_{ 0}(t) = O\left (\varepsilon +\exp \left (-\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right ),$$

for some κ₀ > 0. Note that the big O( ⋅) usually depends on T. Let K _T denote an upper bound of

$$\frac{{P}^\varepsilon (t,s) - {P}_{0}(t)} {\varepsilon +\exp (-{\kappa }_{0}(t - s)/\varepsilon )}$$

for 0 ≤ s ≤ t ≤ T. For convenience, we use the notation O ₁(y) to denote a function of y such that | O ₁(y) | ∕ | y | ≤ 1. The rationale is that K _T represents the magnitude of the bounding constant and the rest of the bound is in terms of a function with norm bounded by 1. Using this notation and K _T, we write

$${P}^\varepsilon (t,s) - {P}_{ 0}(t) = {K}_{T}{O}_{1}\left (\varepsilon +\exp \left (-\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right ).$$

(5.11)

Let y(t) = ( y _ij(t)) and z( t) = (z _i(t)) denote a matrix-valued function and a vector-valued function defined on [0, T], respectively. Their norms are defined by

$$\begin{array}{ll} &\vert y{\vert }_{T} =\max \limits_{i,j}\sup\limits _{0\leq t\leq T}\vert {y}_{ij}(t)\vert, \\ &\vert z{\vert }_{T} =\max\limits _{i}\sup \limits_{0\leq t\leq T}\vert {z}_{i}(t)\vert. \end{array}$$

(5.12)

For future use, define β(t) = diag(β ₁(t), …, β _m(t)). The following theorem is concerned with the exponential bound of n ^ε(t) for ε sufficiently small.

Theorem 5.4

Assume that (A5.1) and (A5.2) are satisfied. Then there exist ε ₀ > 0 and K > 0 such that for all 0 < ε ≤ ε ₀ , T ≥ 0, and any bounded and measurable deterministic function β(⋅) = diag (β ₁ (⋅),…,β _m (⋅)), the following exponential bound holds:

$$E\exp \left \{ \frac{{\theta }_{T}} {{(T + 1)}^{\frac{3} {2} }} \sup \limits_{0\leq t\leq T}\left \vert {n}^\varepsilon (t)\right \vert \right \}\leq K,$$

(5.13)

where θ _T is a constant satisfying

$$0 \leq {\theta }_{T} \leq \frac{\min \{1,{\kappa }_{0}\}} {{K}_{T}\vert \beta {\vert }_{T}}$$

(5.14)

with κ ₀ being the exponential constant in Lemma 5.1.

Remark 5.5

Note that the constants ε ₀ and K are independent of T. This is a convenient feature of the estimate in certain applications. The result is in terms of a fixed but otherwise arbitrary T, which is particularly useful for systems in an infinite horizon.

Proof: The proof is divided into several steps.

Step 1. In the first step, we show that (5.13) holds when the “sup” is absent. Let χ^ε( ⋅) denote the indicator vector of α^ε( ⋅), that is,

$$\begin{array}{rl} &{\chi }^\varepsilon (t) = \left ({I}_{\{{ \alpha }^\varepsilon (t)=1\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)=m\}}\right )\quad \mbox{ and } \\ &{w}^\varepsilon (t) = {\chi }^\varepsilon (t) - {\chi }^\varepsilon (0) -\frac{1} \varepsilon {\int }_{0}^{t}{\chi }^\varepsilon (s)Q(s)ds\end{array}.$$

It is well known (see Elliott [56]) that w ^ε(t) = (w ₁ ^ε(t), …, w _m ^ε(t)), for t ≥ 0, is a σ{α ^ε(s) : s ≤ t}-martingale. In view of a result of Kunita and Watanabe [134] (see also Ikeda and Watanabe [91, p. 55]), one can define a stochastic integral with respect to w ^ε(t). Moreover, the solution of the linear stochastic differential equation

$$d{\chi }^\varepsilon (t) = {\chi }^\varepsilon (t){Q}^\varepsilon (t)dt + d{w}^\varepsilon (t)$$

is given by

$${\chi }^\varepsilon (t) = {\chi }^\varepsilon (0){P}^\varepsilon (t,0) + {\int }_{0}^{t}(d{w}^\varepsilon (s)){P}^\varepsilon (t,s),$$

where P ^ε(t, s) is the principal matrix solution to the equation

$$\frac{d{P}^\varepsilon (t,s)} {dt} = \frac{1} \varepsilon {P}^\varepsilon (t,s)Q(t),\;\mbox{ with }{P}^\varepsilon (s,s) = I$$

representing the transition probability matrix.

Note that for t ≥ s ≥ 0,

$${\chi }^\varepsilon (s){P}_{ 0}(t) = ({\chi }^\varepsilon (s)\mathrm{l})\nu (t) = \nu (t).$$

Using this and (5.11), we have

$$\begin{array}{l} {\chi }^\varepsilon (t) - \nu (t) \\ \ = {\chi }^\varepsilon (0)({P}^\varepsilon (t,0) - {P}_{ 0}(t)) + {\int }_{0}^{t}(d{w}^\varepsilon (s))[({P}^\varepsilon (t,s) - {P}_{ 0}(t)) + {P}_{0}(t)] \\ \ = {K}_{T}{O}_{1}\left (\varepsilon + \exp \left (-\frac{{\kappa }_{0}t} \varepsilon \right )\right ) \\ \quad \ + {K}_{T}{\int }_{0}^{t}(d{w}^\varepsilon (s)){O}_{ 1}\left (\varepsilon + \exp \left (-\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right ) + {w}^\varepsilon (t){P}_{ 0}(t) \\ \ = {K}_{T}{O}_{1}\left (\varepsilon + \exp \left (-\frac{{\kappa }_{0}t} \varepsilon \right )\right ) \\ \quad \ + {K}_{T}{\int }_{0}^{t}(d{w}^\varepsilon (s)){O}_{ 1}\left (\varepsilon + \exp \left (-\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right ).\end{array}$$

The last equality above follows from the observation that

$$Q(s){P}_{0}(t) = 0\mbox{ for all }t \geq s \geq 0,$$

and

$$\begin{array}{rl} {w}^\varepsilon (t){P}_{0}(t) =&\left ({\chi }^\varepsilon (t) - {\chi }^\varepsilon (0) -{ 1 \over \varepsilon } {\int }_{0}^{t}{\chi }^\varepsilon (r)Q(r)dr\right ){P}_{ 0}(t) \\ =&\nu (t) - \nu (t) -{ 1 \over \varepsilon } {\int }_{0}^{t}{\chi }^\varepsilon (r)Q(r){P}_{ 0}(t)dr = 0 .\end{array}$$

Recall that β( t) = diag(β₁(t), …, β_m(t)). Then it follows that

$$\begin{array}{rl} &{\int }_{0}^{t}({\chi }^\varepsilon (s) - \nu (s))\beta (s)ds \\ =&{K}_{T}{O}_{1}(\varepsilon (t + 1)) \\ & + {K}_{T}{\int }_{0}^{t}{\int }_{0}^{s}(d{w}^\varepsilon (r)){O}_{ 1}\left (\varepsilon + \exp \left (-\frac{{\kappa }_{0}(s - r)} \varepsilon \right )\right )\beta (s)ds \\ =&{K}_{T}{O}_{1}(\varepsilon (t + 1)) \\ & + {K}_{T}{\int }_{0}^{t}(d{w}^\varepsilon (r))\left ({\int }_{r}^{t}{O}_{ 1}\left (\varepsilon + \exp \left (-\frac{{\kappa }_{0}(s - r)} \varepsilon \right )\right )\beta (s)ds\right ) \\ =&{K}_{T}{O}_{1}(\varepsilon (t + 1)) \\ & + \varepsilon {K}_{T}{\int }_{0}^{t}(d{w}^\varepsilon (r)){O}_{ 1}\left ((t - r) + \frac{1} {{\kappa }_{0}}\left (1 -\exp \left (-\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right )\right )\vert \beta {\vert }_{T} \\ =&{K}_{T}{O}_{1}(\varepsilon (t + 1)) + \varepsilon {K}_{T}\vert \beta {\vert }_{T}\left (T + \frac{1} {{\kappa }_{0}}\right ){\int }_{0}^{t}(d{w}^\varepsilon (r))b(r,t), \\ \end{array}$$

where b( s, t) is a measurable function and | b(s, t) | ≤ 1 for all s and t. Dividing both sides by (T + 1), we obtain

$$\begin{array}{l} \frac{1} {T + 1}\left \vert {\int }_{0}^{t}({\chi }^\varepsilon (s) - \nu (s))\beta (s)ds\right \vert \\ \ = \varepsilon {K}_{T}{O}_{1}(1) + \varepsilon {K}_{T}\vert \beta {\vert }_{T}\left (\frac{T + (1/{\kappa }_{0})} {T + 1} \right )\left \vert {\int }_{0}^{t}(d{w}^\varepsilon (s))b(s,t)\right \vert. \end{array}$$

(5.15)

Therefore, we have

$$\begin{array}{rl} &E\exp \left \{ \frac{{\theta }_{T}} {\sqrt\varepsilon {(T + 1)}^{\frac{3} {2} }} \left \vert {\int }_{0}^{t}({\chi }^\varepsilon (s) - \nu (s))\beta (s)ds\right \vert \right \} \\ \leq &E\exp \left \{ \frac{1} {\sqrt\varepsilon \sqrt{T + 1}}\left [\varepsilon {O}_{1}(1) + \varepsilon \left \vert {\int }_{0}^{t}(d{w}^\varepsilon (s))b(s,t)\right \vert \right ]\right \}.\end{array}$$

In view of the choice of θ_T, it follows that

$$\begin{array}{rl} &E\exp \left \{ \frac{{\theta }_{T}} {\sqrt\varepsilon {(T + 1)}^{\frac{3} {2} }} \left \vert {\int }_{0}^{t}({\chi }^\varepsilon (s) - \nu (s))\beta (s)ds\right \vert \right \} \\ &\leq \exp \left ( \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\right )E\exp \left \{ \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}(d{w}^\varepsilon (s))b(s,t)\right \vert \right \} \\ &\leq eE\exp \left \{ \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}(d{w}^\varepsilon (s))b(s,t)\right \vert \right \} .\end{array}$$

(5.16)

Recall that

$${w}^\varepsilon (t) = ({w}_{ 1}^\varepsilon (t),\ldots,{w}_{ m}^\varepsilon (t)).$$

It suffices to work out the estimate for each component w _i ^ε (t). That is, it is enough to show that for each i = 1, …, m,

$$E\exp \left \{ \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}b(s,t)d{w}_{ i}^\varepsilon (s)\right \vert \right \}\leq K,$$

(5.17)

for all measurable functions b( ⋅, ⋅) with | b(s, t) | ≤ 1 and 0 ≤ t ≤ T. For each t ₀ ≥ 0, let b ₀(s) = b(s, t ₀).

For any nonnegative random variable ξ,

$$\begin{array}{rl} E{e}^{\xi } =&\sum \limits_{j=0}^{\infty }{\int }_{\{j\leq \xi <j+1\}}{e}^{\xi }dP \\ \leq &\sum \limits_{j=0}^{\infty }{\int }_{\{j\leq \xi <j+1\}}{e}^{j+1}dP \\ =&\sum \limits_{j=0}^{\infty }{e}^{j+1}P(j \leq \xi < j + 1) \\ =&\sum \limits_{j=0}^{\infty }{e}^{j+1}[P(\xi \geq j) - P(\xi \geq j + 1)] \\ \leq &e + (e - 1)\sum \limits_{j=1}^{\infty }{e}^{j}P(\xi \geq j)\end{array}$$

By virtue of the inequality above, we have

$$\begin{array}{l} E\exp \left \{ \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s)\right \vert \right \} \\ \leq e + (e - 1)\sum \limits_{j=1}^{\infty }{e}^{j}P\left ( \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}{b}_{0}(s)d{w}_{i}^\varepsilon (s)\right \vert \geq j\right).\end{array}$$

(5.18)

To proceed, let us concentrate on the estimate of

$$P\left ( \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s)\right \vert \geq j\right ).$$

For each i = 1, …, m, let

$$\widetilde{{p}}_{i}(t) ={ \int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s)$$

and let $\widetilde{{q}}_{i}(\cdot )$ denote the only solution to the following equation (see Elliott [55, Chapter 13])

$$\widetilde{{q}}_{i}(t) = 1 + \zeta {\int }_{0}^{t}\widetilde{{q}}_{ i}({s}^{-})d\widetilde{{p}}_{ i}(s),$$

where $\widetilde{{q}}_{i}({s}^{-})$ is the left-hand limit of $\widetilde{{q}}_{i}$ at s and ζ is a positive constant to be determined later. In what follows, we suppress the i-dependence and write $\widetilde{{p}}_{i}(\cdot )$ and $\widetilde{{q}}_{i}(\cdot )$ as $\widetilde{p}(\cdot )$ and $\widetilde{q}(\cdot )$ whenever there is no confusion.

Note that $\widetilde{p}(t)$ , for t ≥ 0, is a local martingale. Since

$$\zeta {\int }_{0}^{t}\widetilde{q}({s}^{-})d\widetilde{p}(s),\;t \geq 0,$$

is a local martingale, we have $E\widetilde{q}(t) \leq 1\mbox{ for all }t \geq 0$. Moreover, $\widetilde{q}(t)$ can be written as follows (see Elliott [55, Chapter 13]):

$$\widetilde{q}(t) =\exp \left (\zeta \widetilde{p}(t)\right )\prod \limits_{s\leq t}(1 + \zeta \Delta \widetilde{p}(s))\exp \left (-\zeta \Delta \widetilde{p}(s)\right ),$$

(5.19)

where $\Delta \widetilde{p}(s) :=\widetilde{ p}(s) -\widetilde{ p}({s}^{-})$, with $\vert \Delta \widetilde{p}(s)\vert \leq 1$.

Now observe that there exist positive constants ζ₀ and κ₁ such that for 0 < ζ ≤ ζ₀ and for all s > 0,

$$(1 + \zeta \Delta \widetilde{p}(s))\exp \left (-\zeta \Delta \widetilde{p}(s)\right ) \geq \exp \left (-{\kappa }_{1}{\zeta }^{2}\right ).$$

(5.20)

Combining (5.19) and (5.20), we obtain

$$\widetilde{q}(t) \geq \exp \{ \zeta \widetilde{p}(t) - {\kappa }_{1}{\zeta }^{2}N(t)\},\;\mbox{ for }0 < \zeta \leq {\zeta }_{ 0},\;t > 0,$$

where N(t) is the number of jumps of $\widetilde{p}(s)$ in s ∈ [0, t]. Since N(t) is a monotonically increasing process, we have

$$\widetilde{q}(t) \geq \exp \left \{\zeta \widetilde{p}(t) - {\kappa }_{1}{\zeta }^{2}N(T)\right \},\;\mbox{ for }0 < \zeta \leq {\zeta }_{ 0}.$$

Note also that for each i = 1, …, m,

$$\begin{array}{rl} &P\left ( \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s)\right \vert \geq j\right ) \\ =&P\left (\vert \widetilde{p}(t)\vert \geq \frac{j\sqrt{T + 1}} {\sqrt\varepsilon } \right ) \\ \leq &P\left (\widetilde{p}(t) \geq \frac{j\sqrt{T + 1}} {\sqrt\varepsilon } \right ) + P\left (-\widetilde{p}(t) \geq \frac{j\sqrt{T + 1}} {\sqrt\varepsilon } \right ).\end{array}$$

Consider the first term on the right-hand side of the inequality above. Let ${a}_{j} = j(T + 1)/(8{\kappa }_{1}\varepsilon )$. Then

$$\begin{array}{rl} &P\left (\widetilde{p}(t) \geq \frac{j\sqrt{T + 1}} {\sqrt\varepsilon } \right ) \\ \leq &P\left (\widetilde{q}(t) \geq \exp \left \{\frac{j\zeta \sqrt{T + 1}} {\sqrt\varepsilon } - {\kappa }_{1}{\zeta }^{2}N(T)\right \}\right ) \\ \leq &P\left (\widetilde{q}(t) \geq \exp \left \{\frac{j\zeta \sqrt{T + 1}} {\sqrt\varepsilon } - {\kappa }_{1}{\zeta }^{2}N(T)\right \},N(T) \leq {a}_{ j}\right ) \\ & +\; P(N(T) \geq {a}_{j}) \\ \leq &P\left (\widetilde{q}(t) \geq \exp \left (\frac{j\zeta \sqrt{T + 1}} {\sqrt\varepsilon } - {\kappa }_{1}{\zeta }^{2}{a}_{ j}\right )\right ) + P(N(T) \geq {a}_{j}) \\ \leq &2\exp \left (-\frac{j\zeta \sqrt{T + 1}} {\sqrt\varepsilon } + {\kappa }_{1}{\zeta }^{2}{a}_{ j}\right ) + P(N(T) \geq {a}_{j}).\end{array}$$

The last inequality follows from the local martingale property (see Elliott [55, Theorem 4.2]).

Now if we choose $\zeta = 4\sqrt\varepsilon /\sqrt{T + 1}$, then

$$\exp \left (-\frac{j\zeta \sqrt{T + 1}} {\sqrt\varepsilon } + {\kappa }_{1}{\zeta }^{2}{a}_{ j}\right ) = {e}^{-2j}.$$

In view of the construction of Markov chains in Section 2.4, there exists a Poisson process N ₀ ( ⋅) with parameter (i.e., mean) a ∕ ε for some a > 0, such that N(t) ≤ N ₀(t). Assume a = 1 without loss of generality (otherwise one may replace ε by εa ^− 1 ). Using the Poisson distribution of N ₀(t), we have

$$P({N}_{0}(T) \geq k) \leq \frac{{(T/\varepsilon )}^{k}} {k!} \mbox{ for }k \geq 0.$$

In view of Stirling’s formula (see Chow and Teicher [30] or Feller [60]), for ε small enough,

$$P(N(T) \geq {a}_{j}) \leq \frac{{(T/\varepsilon )}^{\lfloor {a}_{j}\rfloor }} {\lfloor {a}_{j}\rfloor !} \leq 2{\left (\frac{8{\kappa }_{1}} {j} \right )}^{{a}_{j}-1} := 2{\gamma }_{ 0}^{{a}_{j}-1},$$

where ⌊ a _j⌋ is the integer part of a _j and

$${\gamma }_{0} = \frac{8e{\kappa }_{1}} {{j}_{0}} \in (0,1)\quad \mbox{ for }{j}_{0} >\max \{ 1,8e{\kappa }_{1}\}.$$

Thus, for j ≥ j ₀,

$$P\left ( \frac{\sqrt\varepsilon } {\sqrt{T + 1}}{\int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s) \geq j\right ) \leq 2{e}^{-2j} + 2{\gamma }_{ 0}^{{a}_{j}-1}.$$

Repeating the same argument for the martingale $(-\widetilde{p}(\cdot ))$, we get for j ≥ j ₀,

$$P\left (- \frac{\sqrt\varepsilon } {\sqrt{T + 1}}{\int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s) \geq j\right ) \leq 2{e}^{-2j} + 2{\gamma }_{ 0}^{{a}_{j}-1}.$$

Combining the above two inequalities yields that for j ≥ j ₀,

$$P\left ( \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s)\right \vert \geq j\right ) \leq 4({e}^{-2j} + {\gamma }_{ 0}^{{a}_{j}-1}).$$

Then by (5.18),

$$\begin{array}{rl} &E\exp \left \{ \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s)\right \vert \right \} \\ &\qquad \leq {K}_{0} + 4(e - 1)\sum \limits_{j=1}^{\infty }{e}^{j}({e}^{-2j} + {\gamma }_{ 0}^{{a}_{j}-1}), \end{array}$$

where K ₀ is the sum corresponding to those terms with j ≤ j ₀. Now choose ε small enough that $e{\gamma }_{0}^{1/(8{\kappa }_{1}\varepsilon )} \leq 1/2$. Then

$$E\exp \left \{ \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s)\right \vert \right \}\leq {K}_{ 0} + 4e{\gamma }_{0}^{-1}.$$

Since t ₀ is arbitrary, we may take t ₀ = t in the above inequality. Then

$$E\exp \left \{ \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}b(s,t)d{w}_{ i}^\varepsilon (s)\right \vert \right \}\leq {K}_{ 0} + 4e{\gamma }_{0}^{-1}.$$

Combining this inequality with (5.16) leads to

$$\begin{array}{rl} &E\exp \left \{ \frac{{\theta }_{T}} {\sqrt\varepsilon {(T + 1)}^{\frac{3} {2} }} \left \vert {\int }_{0}^{t}({\chi }^\varepsilon (s) - \nu (s))\beta (s)ds\right \vert \right \} \\ & \leq e({K}_{0} + 4e{\gamma }_{0}^{-1}) := K\end{array}$$

Step 2. Recall that

$${n}^\varepsilon (t,i) = \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}\left ({I}_{\{ \alpha (\varepsilon,s)=i\}} - {\nu }_{i}(s)\right ){\beta }_{i}(s)\,ds.$$

Then, for each $i \in \mathcal{M}$, n ^ε(t, i) is nearly a martingale, i.e., for ε small enough,

$$\vert E[{n}^\varepsilon (t,i)\vert {\mathcal{F}}_{ s}] - {n}^\varepsilon (s,i)\vert \leq O(\sqrt\varepsilon ),\quad \mbox{ for all }\omega \in \Omega \mbox{ and }0 \leq s \leq t \leq T.$$

(5.21)

Here $O \sqrt{\varepsilon}$ is deterministic, i.e., it does not depend on the sample point ω. The reason is that it is obtained from the asymptotic expansions. In fact, for all ${i}_{0} \in \mathcal{M}$,

$$ \begin{array}{l} E\left[{\int }_{s}^{t}({I}_{\{ \alpha (\varepsilon,r)=i\}} - {\nu }_{i}(r)){\beta }_{i}(r)\,dr {\vert }\alpha (\varepsilon,s) = {i}_{0}\right] \\ ={ \int }_{s}^{t}(E[{I}_{\{ \alpha (\varepsilon,r)=i\}}\vert \alpha (\varepsilon,s) = {i}_{0}] - {\nu }_{i}(r)){\beta }_{i}(r)\,dr \\ ={ \int }_{s}^{t}[P(\alpha (\varepsilon,r) = i\vert \alpha (\varepsilon,s) = {i}_{ 0}) - {\nu }_{i}({r}_)]{\beta }_{i}(r)\,dr \\ ={ \int }_{s}^{t}O(\varepsilon +\exp (-{\kappa }_{ 0}(r - s)/\varepsilon )\,dr = O(\varepsilon )\end{array}$$

So, (5.21) follows.

Step 3. We show that for each a > 0,

$$E[\exp \{a\vert {n}^\varepsilon (t,i)\vert \}\vert {\mathcal{F}}_{ s}] \geq \exp \{ a\vert {n}^\varepsilon (s,i)\vert \}(1 + O(\sqrt\varepsilon )).$$

First of all, note that ϕ(x) = | x | is a convex function. There exists a vector function ϕ₀(x) bounded by 1 such that

$$\phi (x) \geq \phi (a) + {\phi }_{0}(a) \cdot (x - a),$$

for all x and a. Noting that $O(\sqrt\varepsilon ) = -O(\sqrt\varepsilon )$ , we have

$$\begin{array}{rl} E[\vert {n}^\varepsilon (t,i)\vert \;\vert {\mathcal{F}}_{s}] \geq &\vert {n}^\varepsilon (s,i)\vert + {\phi }_{0}({n}^\varepsilon (s,i)) \cdot E[{n}^\varepsilon (t,i) - {n}^\varepsilon (s,i)\vert {\mathcal{F}}_{s}] \\ \geq &\vert {n}^\varepsilon (s,i)\vert + O(\sqrt\varepsilon )\end{array}$$

Moreover, note that e ^ax is also convex. It follows that

$$ \begin{array}{l} E[\exp (a\vert {n}^\varepsilon (t,i)\vert )\vert {\mathcal{F}}_{s}] \\ \geq \exp (a\vert {n}^\varepsilon (s,i)\vert ) + a\exp \{a\vert {n}^\varepsilon (s,i)\vert \}E[\vert {n}^\varepsilon (t,i)\vert -\vert {n}^\varepsilon (s,i)\vert \;\vert {\mathcal{F}}_{s}] \\ \geq \exp (a\vert {n}^\varepsilon (s,i)\vert )(1 + O(\sqrt\varepsilon )).\end{array}$$

Step 4. Let x ^ε(t) = exp(a | n ^ε(t, i) | ) for a > 0. Then, for any ${\mathcal{F}}_{t}$ stopping time τ ≤ T,

$$E[{x}^\varepsilon (T)\vert {\mathcal{F}}_{ \tau }] \geq {x}^\varepsilon (\tau )(1 + O(\sqrt\varepsilon )).$$

(5.22)

Note that x ^ε(t) is continuous. Therefore, it suffices to show the above inequality when τ takes values in a countable set {t ₁ , t ₂, …}. To this end, note that, for each t _i,

$$E[{x}^\varepsilon (T)\vert {\mathcal{F}}_{{ t}_{i}}] \geq {x}^\varepsilon ({t}_{ i})(1 + O(\sqrt\varepsilon )).$$

For all $A \in {\mathcal{F}}_{\tau }$ , we have $A \cap \{ \tau = {t}_{i}\} \in {\mathcal{F}}_{{t}_{i}}$ . Therefore,

$${\int }_{A\cap \{\tau ={t}_{i}\}}{x}^\varepsilon (T)\,dP \geq \left ({\int }_{A\cap \{\tau ={t}_{i}\}}{x}^\varepsilon (\tau )\,dP\right )(1 + O(\sqrt\varepsilon )).$$

Thus

$${\int }_{A}{x}^\varepsilon (T)\,dP \geq \left ({\int }_{A}{x}^\varepsilon (\tau )\,dP\right )(1 + O(\sqrt\varepsilon )),$$

and (5.22) follows.

Step 5. Let $a = \theta /\sqrt{{(T + 1)}^{3}}$ in Step 3. Then, for ε small enough, there exists K such that

$$P\left (\sup \limits_{t\leq T}{x}^\varepsilon (t) \geq x\right ) \leq \frac{K} {x},$$

(5.23)

for all x > 0.

In fact, let τ = inf{t > 0 : x ^ε(t) ≥ x}. We adopt the convention that τ = ∞ if { t > 0 : x ^ε(t) ≥ x} = ∅. Then we have

$$E[{x}^\varepsilon (T)] \geq (E[{x}^\varepsilon (T \wedge \tau )])(1 + O(\sqrt\varepsilon )),$$

and write

$$E[{x}^\varepsilon (T \wedge \tau )] = E[{x}^\varepsilon (\tau ){I}_{\{ \tau <T\}}] + E[{x}^\varepsilon (T){I}_{\{ \tau \geq T\}}] \geq E[{x}^\varepsilon (\tau ){I}_{\{ \tau <T\}}].$$

Moreover, in view of the definition of τ, we have

$$E\left [{x}^\varepsilon (\tau ){I}_{\{ \tau <T\}}\right ] \geq xP(\tau < T) \geq xP\left (\sup \limits_{t\leq T}{x}^\varepsilon (t) \geq x\right ).$$

It follows that

$$P\left (\sup \limits_{t\leq T}{x}^\varepsilon (t) \geq x\right ) \leq \frac{E[{x}^\varepsilon (T)]} {(1 + O(\sqrt\varepsilon ))x} \leq \frac{K} {x}.$$

Thus, (5.23) follows.

Finally, to complete the proof of (5.13), note that, for 0 < κ < 1,

$$E\exp \left ( \frac{\kappa \theta } {\sqrt{{(1 + T)}^{3}}}\sup \limits _{t\leq T}\vert {n}^\varepsilon (t,i)\vert \right ) = E\left [\sup \limits_{ t\leq T}{({x}^\varepsilon (t))}^{\kappa }\right ].$$

It follows that

$$\begin{array}{rl} E\left [\sup \limits_{t\leq T}{({x}^\varepsilon (t))}^{\kappa }\right ] =&{\int }_{0}^{\infty }P\left (\sup \limits_{ t\leq T}{({x}^\varepsilon (t))}^{\kappa } \geq x\right )\,dx \\ \leq &1 +{ \int }_{1}^{\infty }P\left (\sup \limits_{ t\leq T}{({x}^\varepsilon (t))}^{\kappa } \geq x\right )\,dx \\ \leq &1 +{ \int }_{1}^{\infty }P\left (\sup \limits_{ t\leq T}{x}^\varepsilon (t) \geq {x}^{1/\kappa }\right )\,dx \\ \leq &1 +{ \int }_{1}^{\infty}K{x}^{-1/\kappa }\,dx < \infty. \end{array}$$

This completes the proof. □

Next we give several corollaries to the theorem. Such estimates are useful for establishing exponential bounds of asymptotic optimal hierarchical controls in manufacturing models (see Sethi and Zhang [192]).

Corollary 5.6

In Theorem 5.4 , if Q(t) = Q, a constant matrix, then the following stronger estimate holds:

$$E\exp \left \{ \frac{{\theta }_{T}} {\sqrt{1 + T}}\sup \limits_{0\leq t\leq T}\left \vert {n}^\varepsilon (t)\right \vert \right \}\leq K.$$

(5.24)

Moreover, the constant θ _T = θ is independent of T for T > 0.

Proof: If Q(t) = Q, then φ₁(t) in Lemma 5.1 is identically 0. Therefore, the estimate (5.11) can be replaced by

$${P}^\varepsilon (t,s) - {P}_{ 0}(t) = {K}_{T}{O}_{1}\left (\exp \left (-\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right ).$$

As a result, the estimate in (5.15) can be replaced by

$$\sup \limits_{0\leq t\leq T}\left \vert {\int }_{0}^{t}({\chi }^\varepsilon (s) - \nu (s))\beta (s)ds\right \vert = \varepsilon {K}_{ T}{O}_{1}(1)+\varepsilon K{}_{T}\sup \limits_{0\leq t\leq T}\left \vert {\int }_{0}^{t}{O}_{ 1}(1)d{w}^\varepsilon (s)\right \vert.$$

The proof of (5.24) follows in essentially the same way as that of Theorem 5.4 (from equation (5.15) on).

To see that θ_T in (5.24) is independent of T, it suffices to note that in (5.11) the constant K _T is independent of T, which can be seen by examining closely Example 4.16. □

Corollary 5.7

Under the conditions of Theorem 5.4 , there exist constants K _j , such that for j = 1,2,…,

$$E\sup \limits_{0\leq t\leq T}{\left \vert {n}^\varepsilon (t)\right \vert }^{2j} \leq {K}_{ j}{(1 + T)}^{3j}.$$

(5.25)

Moreover, if Q(t) = Q, then for some K _j independent of T and

$$E\sup \limits_{0\leq t\leq T}{\left \vert {n}^\varepsilon (t)\right \vert }^{2j} \leq {K}_{ j}{(1 + T)}^{j}.$$

(5.26)

Proof: Since (5.26) follows from a similar argument to that of Corollary 5.6, it suffices to verify (5.25) using Theorem 5.4. Note that for each j = 1, 2, …, there exists K _j ⁰ such that for all x, we have x ^2j ≤ K _j ⁰ e ^x. Thus,

$${\left ( \frac{{\theta }_{T}} {{(T + 1)}^{\frac{3} {2} }} \sup \limits _{0\leq t\leq T}\left \vert {n}^\varepsilon (t)\right \vert \right)}^{2j} \leq {K}_{ j}^{0}\exp \left \{ \frac{{\theta }_{T}} {{(T + 1)}^{\frac{3} {2} }} \sup \limits _{0\leq t\leq T}\left \vert{n}^\varepsilon (t)\right \vert \right \}.$$

Taking expectations on both sides of the above inequality yields the desired estimate. □

Corollary 5.8

Under the conditions of Theorem 5.4 , for each 0 < δ < 1∕2, we have

$$\begin{array}{l} P\left (\sup \limits_{0\leq t\leq T}\left \vert {\int}_{0}^{t}({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu}_{i}(s)){\beta }_{i}(s)ds\right \vert \geq \varepsilon ^{\frac{1}{2} -\delta }\right ) \\ \leq K\exp \left \{- \frac{{\theta }_{T}}{\varepsilon ^{\delta }{(T + 1)}^{\frac{3} {2} }}\right\}.\end{array}$$

(5.27)

Moreover, if Q(t) = Q, then θ _T = θ is independent of T and

$$\begin{array}{l} P\left (\sup \limits_{0\leq t\leq T}\left \vert {\int}_{0}^{t}({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu}_{i}(s)){\beta }_{i}(s)ds\right \vert \geq \varepsilon ^{\frac{1}{2} -\delta }\right ) \\ \leq K\exp \left \{- \frac{\theta }{\varepsilon ^{\delta }\sqrt{1 + T}}\right \}.\end{array}$$

(5.28)

Proof: Using Theorem 5.4, we obtain

$$\begin{array}{rl} &P\left (\sup \limits_{0\leq t\leq T}\left \vert{\int }_{0}^{t}({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu}_{i}(s)){\beta }_{i}(s)ds\right \vert \geq \varepsilon ^{\frac{1} {2} -\delta }\right ) \\ =&P\left (\exp \left \{ \frac{{\theta}_{T}} {{(T + 1)}^{\frac{3} {2} }} \sup \limits_{0\leq t\leq T}\left \vert{n}^\varepsilon (t)\right \vert \right \}\geq \exp \left \{\frac{{\theta }_{T}\varepsilon ^{\frac{1} {2} -\delta }}{\sqrt\varepsilon {(T + 1)}^{\frac{3} {2} }} \right \}\right ) \\ \leq &K\exp \left \{- \frac{{\theta }_{T}} {\varepsilon ^{\delta}{(T + 1)}^{\frac{3} {2} }} \right \} .\end{array}$$

This proves (5.27). Similarly, (5.28) follows from Corollary 5.6. □

2.4 Asymptotic Normality

Recall that the ith component of n ^ε( ⋅) is given by

$${n}^\varepsilon (t,i) = \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu }_{i}(s)\right ){\beta }_{i}(s)ds.$$

It is expected that the sequence of centered and scaled occupation measures will display certain “central limit type” phenomena. The goal here is to study the asymptotic properties of n ^ε( ⋅) as ε → 0. To be more specific, we show that n ^ε( ⋅) converges to a Gaussian process as ε goes to 0. The following theorem is the main result of this section.

Theorem 5.9

Suppose that (A5.1) is satisfied and Q(⋅) is twice continuously differentiable in [0,T] with the second derivative being Lipschitz. Then for t ∈ [0,T], the process n ^ε (⋅) converges weakly to a Gaussian process n(⋅) with independent increments such that

$$En(t) = 0\mbox{ and }E[n^{\prime}(t)n(t)] ={ \int }_{0}^{t}A(s)ds,$$

(5.29)

where A(t) = (A _ij (t)) with

$${A}_{ij}(t) = {\beta }_{i}(t){\beta }_{j}(t)\left [{\nu }_{i}(t){\int }_{0}^{\infty }{q}_{ 0,ij}(r,t)dr + {\nu }_{j}(t){\int }_{0}^{\infty }{q}_{ 0,ji}(r,t)dr\right ],$$

(5.30)

and Q ₀ (r,t) = (q _0,ij (r,t)).

Remark 5.10

In view of (5.29) and the independent increment property of n(t), it follows that

$$E[n^{\prime}({t}_{1})n({t}_{2})] ={ \int }_{0}^{\min \{{t}_{1},{t}_{2}\} }A(s)ds.$$

(5.31)

The form of the covariance matrix (between t₁ and t₂) reveals the nonstationarity of the limit process n(⋅). Note that the limit covariance of n(t) given in (5.31) is an integral of the function A(s) defined in (5.30). For simplicity, with a slight abuse of notation, we shall also call A(t) as the covariance. This convention will be used throughout the chapter.

Remark 5.11

The additional assumptions on the second derivative of Q(⋅) in Theorem 5.9 are required for computing or characterizing the function A(⋅). It is not crucial for the convergence of n ^ε (⋅); see Remark 5.44 in Section 5.3.3 for details.

Proof of Theorem 5.9: We divide the proof into several steps, which are presented by a number of lemmas.

Step 1. Show that the limit of the mean of n ^ε( ⋅) is 0.

Lemma 5.12

For each t ∈ [0,T],

$$\lim \limits_{\varepsilon \rightarrow 0}E{n}^\varepsilon (t) = 0.$$

Proof: Using Theorem 4.5 and the boundedness of β_i( ⋅), for t ∈ [0, T],

$$\begin{array}{rl} E{n}^\varepsilon (t,i) =& \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}(E{I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu }_{i}(s)){\beta }_{i}(s)ds \\ =& \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}(P({\alpha }^\varepsilon (s) = i) - {\nu }_{ i}(s)){\beta }_{i}(s)ds \\ =& \frac{1} {\sqrt\varepsilon}{\int }_{0}^{t}\left [O(\varepsilon ) + O\left (\exp \left ( -\frac{{\kappa }_{0}s} \varepsilon \right )\right )\right ]{\beta}_{i}(s)ds \\ =&O(\sqrt\varepsilon ) + \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}O\left(\exp \left( -\frac{{\kappa }_{0}s} \varepsilon \right)\right.ds = O(\sqrt\varepsilon ) \rightarrow 0, \end{array}$$

for each $i \in \mathcal{M}$. □

Step 2. Calculate the limit covariance function.

Lemma 5.13

For each t ∈ [0,T],

$$\lim \limits_{\varepsilon \rightarrow 0}E({n}^{\varepsilon,{\prime}}(t){n}^\varepsilon (t)) ={ \int }_{0}^{t}A(s)ds,$$

(5.32)

where A(t) is given by (5.30).

Proof: For each $i,j \in \mathcal{M}$,

$$\begin{array}{rl} E\left [{n}^\varepsilon (t,i){n}^\varepsilon (t,j)\right ] =&\frac{1} \varepsilon E\left[\left({\int}_{0}^{t}({I}_{\{{ \alpha }^\varepsilon (\varsigma )=i\}} - {\nu}_{i}(\varsigma )){\beta }_{i}(\varsigma )d\varsigma \right)\right. \\ & \left.\times \left ({\int }_{0}^{t}({I}_{\{{ \alpha }^\varepsilon (r)=j\}} - {\nu }_{j}(r)){\beta }_{j}(r)dr\right )\right] \\ =&\frac{1} \varepsilon E\left[{\int }_{0}^{t}{ \int }_{0}^{t}\left({I}_{\{{ \alpha }^\varepsilon (\varsigma )=i,{\alpha}^\varepsilon (r)=j\}} - {\nu }_{i}(\varsigma ){I}_{\{{\alpha}^\varepsilon (r)=j\}}\right.\right. \\ & \left.\left.- {\nu}_{j}(r){I}_{\{{\alpha }^\varepsilon (\varsigma )=i\}} + {\nu}_{i}(\varsigma ){\nu }_{j}(r)\right){\beta }_{i}(\varsigma ){\beta}_{j}(r)d\varsigma dr\right]\end{array}.$$

Let

$$\begin{array}{rl} &{D}_{1} =\{ (\varsigma,r) :\; 0 \leq r \leq \varsigma \leq t\}, \\ &{D}_{2} =\{ (\varsigma,r) :\; 0 \leq \varsigma \leq r \leq t\}, \end{array}$$

and let

$$\begin{array}{rl} {\Phi }^\varepsilon (\varsigma,r)& = P({\alpha }^\varepsilon (\varsigma ) = i,{\alpha }^\varepsilon (r) = j) - {\nu }_{ i}(\varsigma )P({\alpha }^\varepsilon (r) = j) \\ &\quad - {\nu }_{j}(r)P({\alpha }^\varepsilon (\varsigma ) = i) + {\nu }_{ i}(\varsigma ){\nu }_{j}(r)\end{array}.$$

Then it follows that

$$\begin{array}{rl} E\left [{n}^\varepsilon (t,i){n}^\varepsilon (t,j)\right ] =&\frac{1} \varepsilon \left [{\int }_{0}^{t}{ \int }_{0}^{t}{\Phi }^\varepsilon (\varsigma,r){\beta }_{ i}(\varsigma ){\beta }_{j}(r)d\varsigma dr\right ] \\ =&\frac{1} \varepsilon \left ({\int }_{{D}_{1}} +{ \int }_{{D}_{2}}\right ){\Phi }^\varepsilon (\varsigma,r){\beta }_{ i}(\varsigma ){\beta }_{j}(r)d\varsigma dr\end{array}.$$

Note that if (ς, r) ∈ D ₁, then ς ≥ r and

$$\begin{array}{rl} &P({\alpha }^\varepsilon (\varsigma ) = i,{\alpha }^\varepsilon (r) = j) \\ &\quad = P({\alpha }^\varepsilon (\varsigma ) = i\vert {\alpha }^\varepsilon (r) = j)P({\alpha }^\varepsilon (r) = j)\end{array}.$$

Hence, for (ς, r) ∈ D ₁ we have

$$\begin{array}{ll} {\Phi }^\varepsilon (\varsigma,r) =&[P({\alpha }^\varepsilon (\varsigma ) = i\vert {\alpha }^\varepsilon (r) = j) - {\nu }_{ i}(\varsigma )]P({\alpha }^\varepsilon (r) = j) \\ & + {\nu }_{j}(r)[{\nu }_{i}(\varsigma ) - P({\alpha }^\varepsilon (\varsigma ) = i)]\end{array}.$$

Using Theorem 4.5 and Lemma 5.1, for (ς, r) ∈ D ₁,

$$\begin{array}{rl} {\Phi }^\varepsilon (\varsigma,r) =&\left (\varepsilon {\varphi }_{1}^{i}(\varsigma ) + {q}_{ 0,ji}\left (\frac{\varsigma - r} \varepsilon ,r\right ) + \varepsilon {q}_{1,ji}\left (\frac{\varsigma - r} \varepsilon ,r\right ) + O(\varepsilon ^{2})\right ) \\ & \times \left ( {\nu }_{j}(r) + \varepsilon {\varphi }_{1}^{j}(r) + {\psi }_{ 0}^{j}\left (\frac{r} \varepsilon \right ) + \varepsilon {\psi }_{1}^{j}\left (\frac{r} \varepsilon \right ) + O(\varepsilon ^{2})\right ) \\ & - {\nu }_{j}(r)\left (\varepsilon {\varphi }_{1}^{i}(\varsigma ) + {\psi }_{ 0}^{i}\left (\frac{\varsigma } \varepsilon \right ) + \varepsilon {\psi }_{1}^{i}\left (\frac{\varsigma } \varepsilon \right ) + O(\varepsilon ^{2})\right ) \\ =&{\nu }_{j}(r){q}_{0,ji}\left (\frac{\varsigma - r} \varepsilon ,r\right ) +\left [ O\left (\varepsilon \exp \left ( -\frac{{\kappa }_{0}r} \varepsilon \right )\right )\right. \\ & \left.+ O\left (\varepsilon \exp \left ( -\frac{{\kappa }_{0}(\varsigma - r)} \varepsilon \right )\right ) + O\left (\exp \left ( -\frac{{\kappa }_{0}\varsigma } \varepsilon \right )\right ) + O(\varepsilon ^{2})\right ]\end{array}.$$

In the above, φ _ℓ ⁱ and ψ _ℓ ⁱ denote the ith components of the vectors φ_ℓ and ψ_ℓ, respectively. By elementary integration, we have

$$\begin{array}{l} {\int }_{0}^{t}\left ({ \int }_{0}^{\varsigma }\exp \left ( -\frac{{\kappa }_{0}\varsigma } \varepsilon \right )dr\right )d\varsigma ={ \int }_{0}^{t}\varsigma \exp \left ( -\frac{{\kappa }_{0}\varsigma } \varepsilon \right )d\varsigma = O(\varepsilon ^{2}), \\ \varepsilon {\int }_{0}^{t}\left ({ \int }_{0}^{\varsigma }\exp \left ( -\frac{{\kappa }_{0}r} \varepsilon \right )dr\right )d\varsigma = \frac{\varepsilon ^{2}} {{\kappa }_{0}}{ \int }_{0}^{t}\left (1 -\exp \left (-\frac{{\kappa }_{0}\varsigma } \varepsilon \right )\right )d\varsigma = O(\varepsilon ^{2}), \\ \mbox{and} \\ \varepsilon {\int }_{0}^{t}\left ({ \int }_{0}^{\varsigma }\exp \left ( -\frac{{\kappa }_{0}(\varsigma - r)} \varepsilon \right )dr\right )d\varsigma = \varepsilon {\int }_{0}^{t}\left ({ \int }_{0}^{\varsigma }\exp \left ( -\frac{{\kappa }_{0}r} \varepsilon \right )dr\right )d\varsigma = O(\varepsilon ^{2})\end{array}.$$

Thus, it follows that

$$\begin{array}{rl} &{ \int }_{{D}_{1}}{\Phi }^\varepsilon (\varsigma,r){\beta }_{ i}(\varsigma ){\beta }_{j}(r)d\varsigma dr \\ =&{\int }_{0}^{t}\left ({\int }_{0}^{\varsigma }{q}_{ 0,ji}\left (\frac{\varsigma - r} \varepsilon ,r\right ){\nu }_{j}(r){\beta }_{i}(\varsigma ){\beta }_{j}(r)dr\right )d\varsigma + O(\varepsilon ^{2})\end{array}.$$

Exchanging the order of integration leads to

$$\begin{array}{rl} &{\int }_{0}^{t}\left ({\int }_{0}^{\varsigma }{q}_{ 0,ji}\left (\frac{\varsigma - r} \varepsilon ,r\right ){\nu }_{j}(r){\beta }_{i}(\varsigma ){\beta }_{j}(r)dr\right )d\varsigma \\ =&{\int }_{0}^{t}\left ({\int }_{r}^{t}{q}_{ 0,ji}\left (\frac{\varsigma - r} \varepsilon ,r\right ){\nu }_{j}(r){\beta }_{i}(\varsigma ){\beta }_{j}(r)d\varsigma \right )dr \\ =&{\int }_{0}^{t}{\beta }_{ j}(r){\nu }_{j}(r)\left ({\int }_{r}^{t}{q}_{ 0,ji}\left (\frac{\varsigma - r} \varepsilon ,r\right ){\beta }_{i}(\varsigma )d\varsigma \right )dr .\end{array}$$

Making a change of variables (via $\varsigma - r = \varepsilon s$) yields

$${\int }_{r}^{t}{q}_{ 0,ji}\left (\frac{\varsigma - r} \varepsilon ,r\right ){\beta }_{i}(\varsigma )d\varsigma = \varepsilon {\int }_{0}^{(t-r)/\varepsilon }{q}_{ 0,ji}(s,r){\beta }_{i}(r + \varepsilon s)ds.$$

We note that β_i ( ⋅) is bounded and β _i (r + ε s) → β_i(r) in L ¹ for each r ∈ [0, T], as ε → 0. Since q _0, ji( ⋅) decays exponentially fast, as in Lemma 5.1, we have

$${\int }_{0}^{(t-r)/\varepsilon }{q}_{ 0,ji}(s,r){\beta }_{i}(r + \varepsilon s)ds \rightarrow {\beta }_{i}(r){\int }_{0}^{\infty }{q}_{ 0,ji}(s,r)ds.$$

Therefore, we obtain

$$\begin{array}{rl} &\lim \limits_{\varepsilon \rightarrow 0 } \frac{1} \varepsilon {\int }_{{D}_{1}}{\Phi }^\varepsilon (\varsigma,r){\beta }_{ i}(\varsigma ){\beta }_{j}(r)d\varsigma dr \\ =&{\int }_{0}^{t}{\beta }_{ i}(r){\beta }_{j}(r){\nu }_{j}(r)\left ({\int }_{0}^{\infty }{q}_{ 0,ji}(s,r)ds\right )dr.\end{array}$$

(5.33)

Similarly, we can show that

$$\begin{array}{rl} &\lim \limits_{\varepsilon \rightarrow 0 } \frac{1} \varepsilon {\int }_{{D}_{2}}{\Phi }^\varepsilon (\varsigma,r){\beta }_{ i}(\varsigma ){\beta }_{j}(r)d\varsigma dr \\ =&{\int }_{0}^{t}{\beta }_{ i}(r){\beta }_{j}(r){\nu }_{i}(r)\left ({\int }_{0}^{\infty }{q}_{ 0,ij}(s,r)ds\right )dr.\end{array}$$

(5.34)

Combining (5.33) and (5.34), we obtain

$$\lim \limits_{\varepsilon \rightarrow 0}E\left [{n}^\varepsilon (t,i){n}^\varepsilon (t,j)\right ] ={ \int }_{0}^{t}{A}_{ ij}(s)ds,$$

with A(t) = ( A _ij(t)) given by (5.30). □

Step 3. Establish a mixing condition for the sequence { n ^ε( ⋅)}.

Lemma 5.14

For any ς ≥ 0 and σ{α ^ε (s) : s ≥ t + ς}-measurable η with |η|≤ 1,

$${\vert }E(\eta \vert {\alpha }^\varepsilon (s) :\; s \leq t) -E\eta {\vert }\leq K\exp l \left( -\frac{\kappa \varsigma } \varepsilon \right )\quad \mbox{ w.p.1.}$$

(5.35)

Remark 5.15

It follows from (5.35) that for any σ{α^ε(s) : 0 ≤ s ≤ t}-measurable ξ with |ξ|≤ 1 and η given in Lemma 5.14,

$${\vert }E\xi \eta - E\xi E\eta {\vert }\leq K\exp \left ( -\frac{\kappa \varsigma } \varepsilon \right ).$$

(5.36)

We will make crucial use of (5.35) and (5.36) in what follows.

Proof of Lemma 5.14: For any

$$0 \leq {s}_{1} \leq {s}_{2} \leq \cdots \leq {s}_{n} = t \leq t + \varsigma = {t}_{0} \leq {t}_{1} \leq \cdots \leq {t}_{l} < \infty,$$

let

$$\begin{array}{rl} &{E}_{1} =\{ {\alpha }^\varepsilon (t) = i,\;{\alpha }^\varepsilon ({s}_{n-1}) = {i}_{n-1},\;\ldots,{\alpha }^\varepsilon ({s}_{1}) = {i}_{1}\}\mbox{ and } \\ &{E}_{2} =\{ {\alpha }^\varepsilon (t + \varsigma ) = j,\;{\alpha }^\varepsilon ({t}_{1}) = {j}_{1},\;\ldots,{\alpha }^\varepsilon ({t}_{l}) = {j}_{l}\}\end{array}$$

Then in view of the Markovian property of α ^ε ( ⋅),

$$\begin{array}{rl} &P({E}_{2}\vert {E}_{1}) = P({E}_{2}\vert {\alpha }^\varepsilon (t) = i) \\ & = P({\alpha }^\varepsilon (t + \varsigma ) = j\vert {\alpha }^\varepsilon (t) = i)[{p}_{ j,{j}_{1}}^\varepsilon ({t}_{ 1},t + \varsigma )\cdots {p}_{{j}_{l-1},{j}_{l}}^\varepsilon ({t}_{ l},{t}_{l-1})]\end{array}.$$

Similarly, we have

$$P({E}_{2}) = P({\alpha }^\varepsilon (t + \varsigma ) = j)[{p}_{ j,{j}_{1}}^\varepsilon ({t}_{ 1},t + \varsigma )\cdots {p}_{{j}_{l-1},{j}_{l}}^\varepsilon ({t}_{ l},{t}_{l-1})].$$

We first show that

$${\vert }P({E}_{2}\vert {E}_{1}) - P({E}_{2}){\vert }\leq K\exp \left ( -\frac{\kappa \varsigma } \varepsilon \right ),$$

(5.37)

for some positive constants K and κ that are independent of $i,j \in \mathcal{M}$ and t ∈ [0, T].

To verify (5.37), it suffices to show that for any $k \in \mathcal{M}$,

$${\vert }{p}_{ij}^\varepsilon (t + \varsigma,t) - {p}_{kj}^\varepsilon (t + \varsigma,t){\vert }\leq K\exp \left (-\frac{2\kappa \varsigma } \varepsilon \right ).$$

(5.38)

Since P ₀ ( ⋅) and P ₁( ⋅) have identical rows, the asymptotic expansion in (5.3) implies that ${p}_{ij}^\varepsilon (t + \zeta,t) - {p}_{kj}^\varepsilon (t + \zeta,t)$ is determined by Q ₀ (ζ ∕ ε, t). By virtue of the asymptotic expansion (see Theorem 4.5 and Lemma 5.1), there exist a K ₁ > 0 and a κ ₀ > 0 such that

$$\left\vert {Q}_{0}\left(\frac{(t +\widetilde{ t}) - t} \varepsilon ,t\right)\right \vert \leq {K}_{1}\exp \left(-\frac{{\kappa }_{0}\widetilde{t}} \varepsilon \right),\mbox{ for all }\widetilde{t} \geq 0.$$

Choose N > 0 sufficiently large that K ₁ exp( − κ ₀ N) < 1. Then for ε > 0 sufficiently small, there is a 0 < ρ < 1 such that

$$\left\vert{p}_{ij}^\varepsilon (t + N\varepsilon,t) - {p}_{ kj}^\varepsilon (t + N\varepsilon,t)\right\vert \leq \rho.$$

To proceed, subdivide $[t + N\varepsilon,t + \varsigma ]$ into intervals of length Nε.

In view of the Chapman–Kolmogorov equation,

$$\begin{array}{rl} &\vert {p}_{ij }^\varepsilon (t + 2N\varepsilon,t) - {p}_{ kj}^\varepsilon (t + 2N\varepsilon,t)\vert \\ =&\left |{\sum }_{{l}_{0}=1}^{m}[{p}_{ i{l}_{0}}^\varepsilon (t + N\varepsilon,t) - {p}_{ k{l}_{0}}^\varepsilon (t + N\varepsilon,t)]{p}_{{ l}_{0}j}^\varepsilon (t + 2N\varepsilon,t + N\varepsilon )\right | \\ =&\left |{\sum }_{{l}_{0}=1}^{m}[{p}_{ i{l}_{0}}^\varepsilon (t + N\varepsilon,t) - {p}_{ k{l}_{0}}^\varepsilon (t + N\varepsilon,t)] \right.\\ & \times [{p}_{{l}_{0}j}^\varepsilon (t + 2N\varepsilon,t + N\varepsilon ) - {p}_{{ l}_{1}j}^\varepsilon (t + 2N\varepsilon,t + N\varepsilon )] | \leq K{\rho }^{2},\end{array}$$

for any ${l}_{1} \in \mathcal{M}$ . Iterating on the inequality above, we arrive at

$$\left\vert{p}_{ij}^\varepsilon (t + {k}_{ 0}N\varepsilon,t) - {p}_{kj}^\varepsilon (t + {k}_{ 0}N\varepsilon,t)\right\vert\leq K{\rho }^{{k}_{0} },\;\mbox{ for }{k}_{0} \geq 1.$$

Choose $\kappa = -1/(2N)\log \rho $ , and note that κ > 0. Then for any ς satisfying k ₀ Nε ≤ ς < ( k ₀ + 1)Nε,

$$\left\vert{p}_{ij}^\varepsilon (t + \varsigma,t) - {p}_{ kj}^\varepsilon (t + \varsigma,t)\right\vert\leq K\exp \left( -\frac{2\kappa \varsigma } \varepsilon \right).$$

Thus (5.37) holds. This implies that α^ε( ⋅) is a mixing process with exponential mixing rate. By virtue of Lemma A.16, (5.35) holds. □

Step 4. Prove that the sequence n ^ε( ⋅) is tight, and any weakly convergent subsequence of {n ^ε( ⋅)} has continuous paths with probability 1.

Lemma 5.16

The following assertions hold:

(a)
{n ^ε(t); t ∈ [0, T]} is tight in $D([0,T]; {\mathbb{R}}^{m})$, where $D([0,T]; {\mathbb{R}}^{m})$ denotes the space of functions that are defined on [0, T] and that are right continuous with left limits.
(b)
The limitn( ⋅) of any weakly convergent subsequence ofn ^ε( ⋅) has continuous sample paths with probability 1.

Proof: For $i \in \mathcal{M}$ , define

$$\widetilde{{n}}^\varepsilon (t,i) ={ 1 \over \sqrt\varepsilon } {\int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - P({\alpha }^\varepsilon (s) = i)\right ){\beta }_{ i}(s)ds.$$

By virtue of Theorem 4.5,

$$\frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}\left (P({\alpha }^\varepsilon (s) = i) - {\nu }_{ i}(s)\right ){\beta }_{i}(s)ds = O(\sqrt\varepsilon ).$$

Thus ${n}^\varepsilon (t,i) =\widetilde{ {n}}^\varepsilon (t,i) + O(\sqrt\varepsilon )$ , and as a result the tightness of { n ^ε( ⋅)} will follow from the tightness of $\{\widetilde{{n}}^\varepsilon (\cdot )\}$ (see Kushner [139, Lemma 5, p. 50]).

For the tightness of $\{\widetilde{{n}}^\varepsilon (\cdot )\}$, in view of Kushner [139, Theorem 5, p. 32], it suffices to show that

$$E\vert \widetilde{{n}}^\varepsilon (t + \varsigma ) -\widetilde{ {n}}^\varepsilon (t){\vert }^{4} \leq K{\varsigma }^{2}.$$

(5.39)

To verify this assertion, it is enough to prove that for each $i \in \mathcal{M}$, $\widetilde{{n}}^\varepsilon (\cdot,i)$ satisfies the condition.

Fix $i \in \mathcal{M}$ and for any 0 ≤ t ≤ T, let

$$\theta (t) = \left ({I}_{\{{\alpha }^\varepsilon (t)=i\}} - P({\alpha }^\varepsilon (t) = i)\right ){\beta }_{ i}(t).$$

We have suppressed the i and ε dependence in θ( t) for ease of presentation. Let $D =\{ ({s}_{1},{s}_{2},{s}_{3},{s}_{4}) : t \leq {s}_{i} \leq t + \varsigma,i = 1,2,3,4\}$. It follows that

$$\begin{array}{l} E\vert \widetilde{{n}}^\varepsilon (t + \varsigma,i) -\widetilde{ {n}}^\varepsilon (t,i){\vert }^{4} \\ \leq { 1 \over \varepsilon ^{2}} { \int }_{D}\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4})\vert d{s}_{1}d{s}_{2}d{s}_{3}d{s}_{4}.\end{array}$$

(5.40)

Let (i ₁ , i ₂, i ₃ , i ₄) denote a permutation of (1, 2, 3, 4) and

$${D}_{{i}_{1}{i}_{2}{i}_{3}{i}_{4}} =\{ ({s}_{1},{s}_{2},{s}_{3},{s}_{4}) :\; t \leq {s}_{{i}_{1}} \leq {s}_{{i}_{2}} \leq {s}_{{i}_{3}} \leq {s}_{{i}_{4}} \leq t + \varsigma \}.$$

Then it is easy to see that $D = \cup {D}_{{i}_{1}{i}_{2}{i}_{3}{i}_{4}}$. This and (5.40) leads to

$$\begin{array}{rl} &E\vert \widetilde{{n}}^\varepsilon (t + \varsigma,i) -\widetilde{ {n}}^\varepsilon (t,i){\vert }^{4} \\ & \leq \frac{K} {\varepsilon ^{2}}{ \int }_{{D}_{0}}\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4})\vert d{s}_{1}d{s}_{2}d{s}_{3}d{s}_{4}, \end{array}$$

where D ₀ = D ₁₂₃₄.

Note that

$$\begin{array}{rl} &\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4})\vert \\ &\leq \vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4}) - E\theta ({s}_{1})\theta ({s}_{2})E\theta ({s}_{3})\theta ({s}_{4})\vert \\ &\qquad + \vert E\theta ({s}_{1})\theta ({s}_{2})\vert \vert E\theta ({s}_{3})\theta ({s}_{4})\vert.\end{array}$$

(5.41)

By virtue of (5.36) and Eθ( t) = 0, t ≥ 0,

$$\begin{array}{rl} \vert E\theta ({s}_{1})\theta ({s}_{2})\vert & = \vert E\theta ({s}_{1})\theta ({s}_{2}) - E\theta ({s}_{1})E\theta ({s}_{2})\vert \\ & \leq K\exp \left (-\frac{\kappa ({s}_{2} - {s}_{1})} \varepsilon \right ).\end{array}$$

Similarly, we have

$$\begin{array}{rl} \vert E\theta ({s}_{3})\theta ({s}_{4})\vert & = \vert E\theta ({s}_{3})\theta ({s}_{4}) - E\theta ({s}_{3})E\theta ({s}_{4})\vert \\ &\leq K\exp \left (-\frac{\kappa ({s}_{4} - {s}_{3})} \varepsilon \right )\end{array}$$

Therefore, it follows that

$${ K \over \varepsilon ^{2}} { \int }_{{D}_{0}}\vert E\theta ({s}_{1})\theta ({s}_{2})\vert \cdot \vert E\theta ({s}_{3})\theta ({s}_{4})\vert d{s}_{1}d{s}_{2}d{s}_{3}d{s}_{4} \leq K{\varsigma }^{2}.$$

(5.42)

The elementary inequality ${(a + b)}^{1/2} \leq {a}^{1/2} + {b}^{1/2}$ for nonnegative numbers a and b yields that

$$\begin{array}{rl} &\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4}) - E\theta ({s}_{1})\theta ({s}_{2})E\theta ({s}_{3})\theta ({s}_{4})\vert \\ &\quad ={ \left (\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4}) - E\theta ({s}_{1})\theta ({s}_{2})E\theta ({s}_{3})\theta ({s}_{4}){\vert }^{\frac{1} {2} }\right )}^{2} \\ & \quad \leq \vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4}) - E\theta ({s}_{1})\theta ({s}_{2})E\theta ({s}_{3})\theta ({s}_{4})){\vert }^{\frac{1} {2} } \\ & \qquad \times \left (\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4}){\vert }^{\frac{1} {2} } + \vert E\theta ({s}_{1})\theta ({s}_{2})E\theta ({s}_{3})\theta ({s}_{4}){\vert }^{\frac{1} {2} }\right )\end{array}$$

In view of (5.36), we obtain

$$\begin{array}{rl} &\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4}) - E\theta ({s}_{1})\theta ({s}_{2})E\theta ({s}_{3})\theta ({s}_{4})){\vert }^{\frac{1} {2} } \\ & \quad \leq K\exp \left (-\frac{\kappa ({s}_{3} - {s}_{2})} {2\varepsilon } \right )\end{array}$$

Similarly, by virtue of (5.35) and the boundedness of θ(s),

$$\begin{array}{rl} &\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4}){\vert }^{\frac{1} {2} } \\ & \quad = \vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})(E(\theta ({s}_{4})\vert {\alpha }^\varepsilon (s) :\; s \leq {s}_{3}) - E\theta ({s}_{4})){\vert }^{\frac{1} {2} } \\ & \quad \leq K\exp \left (-\frac{\kappa ({s}_{4} - {s}_{3})} {2\varepsilon } \right ),\end{array}$$

and

$$\begin{array}{rl} &\vert E\theta ({s}_{1})\theta ({s}_{2})E\theta ({s}_{3})\theta ({s}_{4}){\vert }^{\frac{1} {2} } \\ & \quad = \vert (E\theta ({s}_{1})\theta ({s}_{2}) - E\theta ({s}_{1})E\theta ({s}_{2}))(E\theta ({s}_{3})\theta ({s}_{4}) - E\theta ({s}_{3})E\theta ({s}_{4})){\vert }^{\frac{1} {2} } \\ & \quad \leq K\exp \left (-\frac{\kappa ({s}_{2} - {s}_{1})} {2\varepsilon } \right )\exp \left (-\frac{\kappa ({s}_{4} - {s}_{3})} {2\varepsilon } \right )\end{array}$$

By virtue of the estimates above, we arrive at

$$\begin{array}{ll} &{ K \over \varepsilon ^{2}} { \int }_{{D}_{0}}\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4}) \\ &\qquad - E\theta ({s}_{1})\theta ({s}_{2})E\theta ({s}_{3})\theta ({s}_{4})\vert d{s}_{1}d{s}_{2}d{s}_{3}d{s}_{4} \leq K{\varsigma }^{2}.\end{array}$$

(5.43)

The estimate (5.39) then follows from (5.42) and (5.43), and so does the desired tightness of {n ^ε ( ⋅)}.

Since {n ^ε( ⋅)} is tight, by Prohorov’s theorem, we extract a convergent subsequence, and for notational simplicity, we still denote the sequence by {n ^ε ( ⋅)} whose limit is n( ⋅). By virtue of Kushner [139, Theorem 5, p. 32] or Ethier and Kurtz [59, Proposition 10.3, p. 149], n( ⋅) has continuous paths with probability 1. □

Remark 5.17

Step 4 implies that both n^ε(⋅) and n(⋅) have continuous sample paths with probability 1. It follows, in view of Prohorov’s theorem (see Billingsley [13]), that n^ε(⋅) is tight in $C([0,T]; {\mathbb{R}}^{m})$.

Step 5. Show that the finite-dimensional distributions of n ^ε( ⋅) converge to that of a Gaussian process with independent increments.

This part of the proof is similar to Khasminskii [112] (see also Friedlin and Wentzel [67, pp. 224]). Use ι to denote the imaginary number ${\iota }^{2} = -1$. To prove the convergence of the finite-dimensional distributions, we use the characteristic function Eexp(ι⟨ z, n ^ε(t)⟩), where $z \in {\mathbb{R}}^{m}$ and ⟨ ⋅. ⋅⟩ denotes the usual inner product in ${\mathbb{R}}^{m}$. Owing to the mixing property and repeated applications of Remark 5.15, for arbitrary positive real numbers s _l and t _l satisfying

$$0 \leq {s}_{0} \leq {t}_{0} \leq {s}_{1} \leq {t}_{1} \leq {s}_{2} \leq \cdots \leq {s}_{n} \leq {t}_{n},$$

we have

$$\begin{array}{rl} &|E\exp \left (\iota \sum \limits_{l=0}^{n}\langle {z}_{ l},({n}^\varepsilon ({t}_{ l}) - {n}^\varepsilon ({s}_{ l}))\rangle \right ) \\ & -{\prod }_{l=0}^{n}E\exp \left (\iota \langle {z}_{ l},({n}^\varepsilon ({t}_{ l}) - {n}^\varepsilon ({s}_{ l}))\rangle \right )| \rightarrow 0\end{array}$$

as ε → 0, for ${z}_{l} \in {\mathbb{R}}^{m}$. This, in turn, implies that the limit process n( ⋅) has independent increments. Moreover, in view of Lemma 5.16, the limit process has continuous path with probability 1. In accordance with a result in Skorohod [197, p. 7], if a process with independent increments has continuous paths w.p.1, then it must necessarily be a Gaussian process. This implies that the limits of the finite-dimensional distribution of n( ⋅) are Gaussian.

Consequently, n( ⋅) is a process having Gaussian finite-dimensional distributions, with mean zero and covariance ∫₀ ^t A( s)ds given by Lemma 5.13. Moreover, the limit does not depend on the chosen subsequence. Thus n ^ε( ⋅) converges weakly to the Gaussian process n( ⋅). This completes the proof of the theorem. □

To illustrate, we give an example in which the covariance function of the limit process can be calculated explicitly.

Example 5.18

Let ${\alpha }^\varepsilon (t) \in \mathcal{M} =\{ 1,2\}$ be a two-state Markov chain with a generator

$$Q(t) = \left (\begin{array}{*{10}c} -{\mu }_{1}(t)& {\mu }_{1}(t) \\ {\mu }_{2}(t) &-{\mu }_{2}(t)\\ \end{array} \right )$$

where μ₁(t) ≥ 0, μ₂(t) ≥ 0, and μ₁(t) + μ₂(t) > 0 for each t ∈ [0,T]. Moreover, μ₁(⋅) and μ₂(⋅) are twice continuously differentiable with Lipschitz continuous second derivatives. It is easy to see that assumptions (A5.1) and (A5.2) are satisfied. Therefore the desired asymptotic normality follows.

In this example,

$$\nu (t) = ({\nu }_{1}(t),{\nu }_{2}(t)) = \left ( \frac{{\mu }_{2}(t)} {{\mu }_{1}(t) + {\mu }_{2}(t)}, \frac{{\mu }_{1}(t)} {{\mu }_{1}(t) + {\mu }_{2}(t)}\right ).$$

Moreover,

$${Q}_{0}(s,{t}_{0}) = -\frac{\exp (-({\mu }_{1}({t}_{0}) + {\mu }_{2}({t}_{0}))s)} {{\mu }_{1}({t}_{0}) + {\mu }_{2}({t}_{0})} Q({t}_{0}).$$

Thus,

$$A(t) = \frac{2{\mu }_{1}(t){\mu }_{2}(t)} {{({\mu }_{1}(t) + {\mu }_{2}(t))}^{3}}\left (\begin{array}{cc} {({\beta }_{1}(t))}^{2} & - {\beta }_{1}(t){\beta }_{2}(t) \\ - {\beta }_{1}(t){\beta }_{2}(t)& {({\beta }_{2}(t))}^{2} \end{array} \right ).$$

2.5 Extensions

In this section, we generalize our results in the previous sections including asymptotic expansions, asymptotic normality, and exponential bounds, to the Markov chain α^ε ( ⋅) with generator given by ${Q}^\varepsilon (t) = Q(t)/\varepsilon +\widehat{ Q}(t)$ with weakly irreducible generator Q(t). Recall that the vector of probabilities ${p}^\varepsilon (t) = (P({\alpha }^\varepsilon (t) = 1),\ldots,P({\alpha }^\varepsilon (t) = m))$ satisfies the differential equation

$$\begin{array}{l} \frac{d{p}^\varepsilon (t)} {dt} = {p}^\varepsilon (t){Q}^\varepsilon (t),\;{p}^\varepsilon (t) \in {\mathbb{R}}^{m}, \\ {p}^\varepsilon (0) = {p}^{0}\mbox{ with }{p}_{ i}^{0} \geq 0\mbox{ for }i \in \mathcal{M}\mbox{ and }{\sum }_{i=1}^{m}{p}_{ i}^{0} = 1, \end{array}$$

To proceed, the following conditions are needed.

(A5.3)
Both Q(t) and $\widehat{Q}(t)$ are generators. For each t ∈ [0, T], Q(t) is weakly irreducible.
(A5.4)
For some positive integer n ₀, Q( ⋅) is ( n ₀ + 1)-times continuously differentiable on [0, T] and $({d}^{{n}_{0}+1}/d{t}^{{n}_{0}+1})Q(\cdot )$ is Lipschitz. Moreover, $\widehat{Q}(\cdot )$ is n ₀-times continuously differentiable on [0, T] and $({d}^{{n}_{0}}/d{t}^{{n}_{0}})\widehat{Q}(\cdot )$ is Lipschitz.

Similarly to Section 4.2 for $k = 1,\ldots,{n}_{0} + 1$, the outer expansions lead to equations

$$\begin{array}{l} \varepsilon ^{0} :\; {\varphi }_{ 0}(t)Q(t) = 0, \\ \varepsilon ^{1} :\; {\varphi }_{ 1}(t)Q(t) + {\varphi }_{0}(t)\widehat{Q}(t) ={ d{\varphi }_{0}(t) \over dt}, \\ \ \qquad \ \cdots \\ \varepsilon ^{k} :\; {\varphi }_{ k}(t)Q(t) + {\varphi }_{k-1}(t)\widehat{Q}(t) ={ d{\varphi }_{k-1}(t) \over dt},\end{array}$$

(5.44)

with constraints

$${\sum }_{i=1}^{m}{\varphi }_{ 0,i}(t) = 1$$

and

$${\sum }_{i=1}^{m}{\varphi }_{ k,i}(t) = 0,\mbox{ for }k \geq 1.$$

The initial-layer correction terms are

$$\begin{array}{l} \varepsilon ^{0} :\ { d{\psi }_{0}(\tau ) \over d\tau } = {\psi }_{0}(\tau )Q(0), \\ \varepsilon ^{1} :\ { d{\psi }_{1}(\tau ) \over d\tau } = {\psi }_{1}(\tau )Q(0) + {\psi }_{0}(\tau )\left (\tau { dQ(0) \over dt} +\widehat{ Q}(0)\right ), \\ \ \qquad \ \cdots \\ \varepsilon ^{k} :\ { d{\psi }_{k}(\tau ) \over d\tau } = {\psi }_{k}(\tau )Q(0) + {r}_{k}(\tau ),\end{array}$$

(5.45)

where

$${r}_{k}(\tau ) = \sum \limits_{i=1}^{k}{\psi }_{ k-i}(\tau )\left ({ {\tau }^{i} \over i!} { {d}^{i}Q(0) \over d{t}^{i}} +{ {\tau }^{i-1} \over (i - 1)!} { {d}^{i-1}\widehat{Q}(0) \over d{t}^{i-1}} \right ),$$

with initial conditions

$$\begin{array}{rl} &{\psi }_{0}(0) = {p}^{0} - {\varphi }_{ 0}(0),\mbox{ and } \\ &{\psi }_{k}(0) = -{\varphi }_{k}(0)\mbox{ for }k \geq \end{array}$$

(1.)

Theorem 5.19

Suppose that (A5.3) and (A5.4) are satisfied. Then

(a)
φ_i( ⋅) is $({n}_{0} + 1 - i)$ -times continuously differentiable on [0, T],
(b)
for each i, there is a $\widehat{\kappa } > 0$ such that
$${\left |{\psi }_{i}\left ({ t \over \varepsilon } \right )\right |} \leq K\exp \left (-\frac{\widehat{\kappa }t} \varepsilon \right ),\,\mbox{ and}$$
(c)
the approximation error satisfies
$${ \sup }_{t\in [0,T]}{\left |{p}^\varepsilon (t) -{\sum }_{i=0}^{{n}_{0} }\varepsilon ^{i}{\varphi }_{ i}(t) -{\sum }_{i=0}^{{n}_{0} }\varepsilon ^{i}{\psi }_{ i}\left ({ t \over \varepsilon } \right )\right |} \leq K\varepsilon ^{{n}_{0}+1}.$$
(5.46)

The proof of this theorem is similar to those of Theorem 4.5, and is thus omitted. We also omit the proofs of the following two theorems because they are similar to that of Theorem 5.4 and Theorem 5.9, respectively.

Theorem 5.20

Suppose (A5.3) and (A5.4) are satisfied with n ₀ = 0. Then there exist positive constants ε ₀ and K such that for 0 < ε ≤ ε ₀,$i \in \mathcal{M}$ , and for any deterministic process β _i (⋅) satisfying |β _i (t)|≤ 1 for all t ≥ 0, we have

$$E\exp \left \{{ \frac{{\theta }_{T}} {{(T + 1)}^{\frac{3} {2} }} \sup }_{0\leq t\leq T}\left \vert {n}^\varepsilon (t)\right \vert \right \}\leq K,$$

where θ _T and n ^ε (⋅) are as defined previously.

Corollary 5.21

Consider ${Q}^\varepsilon = Q/\varepsilon +\widehat{ Q}$ with constant generators Q and $\widehat{Q}$ such that Q is weakly irreducible. Then (5.25) and (5.27) hold with constants K and K _j independent of T.

Theorem 5.22

Suppose (A5.3) and (A5.4) are satisfied with n ₀ = 1. Then for t ∈ [0,T], the process n ^ε (⋅) converges weakly to a Gaussian process n(⋅) such that

$$En(t) = 0\mbox{ and }E[n^{\prime}(t)n(t)] ={ \int }_{0}^{t}A(s)ds,$$

where A(t) = (A _ij (t)) with

$${A}_{ij}(t) = {\beta }_{i}(t){\beta }_{j}(t)\left [{\nu }_{i}(t){\int }_{0}^{\infty }{q}_{ 0,ij}(r,t)dr + {\nu }_{j}(t){\int }_{0}^{\infty }{q}_{ 0,ji}(r,t)dr\right ],$$

and Q ₀ (r,t) = (q _0,ij (r,t)) satisfying

$$\begin{array}{rl} &\frac{d{Q}_{0}(r,t)} {dr} = {Q}_{0}(r,t)Q(t),\;r \geq 0, \\ &{Q}_{0}(0,t) = I - {P}_{0}(t), \end{array}$$

with P ₀ (t) = (ν′(t),…,ν′(t))′.

Remark 5.23

In view of Theorem 5.22 , the asymptotic covariance is determined by the quasi-stationary distribution ν(t) and Q ₀ (r,t). Both ν(t) and Q ₀ (r,t) are determined by Q(t), the dominating term in Q ^ε (t). In the asymptotic normality analysis, it is essential to have the irreducibility condition of Q(t), whereas the role of $\widehat{Q}(t)$ is not as important. If Q(t) is weakly irreducible, then there exists an ε ₀ > 0 such that ${Q}^\varepsilon (t) = Q(t)/\varepsilon +\widehat{ Q}(t)$ is weakly irreducible for 0 < ε ≤ ε ₀, as shown in Sethi and Zhang [192, Lemma J.10].

By introducing another generator $\widehat{Q}(t)$ , we are dealing with a singularly perturbed Markovian system with fast and slow motions. Nevertheless, the entire system under consideration is still weakly irreducible. This irreducibility allows us to extend our previous results with minor modifications.

Although most of the results in this section can be extended to the case with ${Q}^\varepsilon (t) = Q(t)/\varepsilon +\widehat{ Q}(t)$ , there are some exceptions. For example, Corollary 5.6 would not go through because even with constant matrix $\widehat{Q}(t) =\widehat{ Q}$ , φ ₁ (t) in Lemma 5.1 does not equal 0 when $\widehat{Q}\not =0$.

One may wonder what happens if Q(t) in Q ^ε (t) is not weakly irreducible. In particular, one can consider the case in which Q(t) consists of several blocks of irreducible submatrices. Related results of asymptotic normality and the exponential bounds are treated in subsequent sections.

3 Markov Chains with Weak and Strong Interactions

For brevity, unless otherwise noted, in the rest of the book, whenever the phrase “weak and strong interaction” is used, it refers to the case of two-time-scale Markov chains with all states being recurrent. Similar approaches can be used for the other cases as well. The remainder of the chapter concentrates on exploiting detailed structures of the weak and strong interactions. In addition, it deals with convergence of the probability distribution with merely measurable generators.

We continue our investigation of asymptotic properties of the Markov chain α^ε ( ⋅) generated by Q ^ε( ⋅), with

$${Q}^\varepsilon (t) = \frac{1} \varepsilon \widetilde{Q}(t) +\widehat{ Q}(t),\mbox{ for }t \geq 0,$$

(5.47)

where $\widetilde{Q}(t) = \mathrm{diag}(\widetilde{{Q}}^{1}(t),\ldots,\widetilde{{Q}}^{l}(t))$ is a block-diagonal matrix such that $\widehat{Q}(t)$ and $\widetilde{{Q}}^{k}(t)$, for k = 1, …, l, are themselves generators. The state space of α ^ε ( ⋅) is given by

$$\mathcal{M} = \{{s}_{11},\ldots,{s}_{1{m}_{1}},\ldots,{s}_{l1},\ldots,{s}_{l{m}_{l}}\}.$$

For each k = 1, …, l, let ${\mathcal{M}}_{k} =\{ {s}_{k1},\ldots,{s}_{k{m}_{k}}\}$, representing the group of states corresponding to $\widetilde{{Q}}^{k}(t)$.

The results in Section 5.3.1 reveal the structures of the Markov chains with weak and strong interactions based on the following observations. Intuitively, for small ε, the Markov chain α^ε( ⋅) jumps more frequently within the states in ${\mathcal{M}}_{k}$ and less frequently from ${\mathcal{M}}_{k}$ to ${\mathcal{M}}_{j}$ for j ≠ k. Therefore, the states in ${\mathcal{M}}_{k}$ can be aggregated and represented by a single state k (one may view the state k as a super state). That is, one can approximate α^ε( ⋅) by an aggregated process, say, ${\overline{\alpha }}^\varepsilon (\cdot )$. Furthermore, by examining the tightness and finite-dimensional distribution of ${\overline{\alpha }}^\varepsilon (\cdot )$, it will be shown that ${\overline{\alpha }}^\varepsilon (\cdot )$ converges weakly to a Markov chain $\overline{\alpha }(\cdot )$ generated by

$$\overline{Q}(t) = \mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t))\widehat{Q}(t)\mathrm{diag}(\mathrm{1}{\mathrm{l}}_{{ m}_{1}},\ldots,\mathrm{1}{\mathrm{l}}_{{m}_{l}}).$$

(5.48)

Section 5.3.2 continues the investigation along the line of estimating the error bounds of the approximation. Our interest lies in finding how closely one can approximate an unscaled sequence of occupation measures. The study is through the examination of appropriate exponential-type bounds. To take a suitable scaled sequence, one first centers the sequence around the “mean,” and then compares the actual sequence of occupation measures with this “mean.” In contrast to the results of Section 5.2, in lieu of taking the difference of the occupation measure with that of a deterministic function, it is compared with a random process. One of the key points here is the utilization of solutions of linear time-varying stochastic differential equations, in which the stochastic integration is with respect to a square-integrable martingale.

In comparison with the central limit theorem obtained in Section 5.2, it is interesting to know whether these results still hold under the structure of weak and strong interactions. The answer to this question is in Section 5.3.3, which also contains further study on related scaled sequences of occupation measures. The approach is quite different from that of Section 5.2. We use the martingale formulation and apply the techniques of perturbed test functions. It is interesting to note that the limit process is a switching diffusion process, which does not have independent increments. When the generator is weakly irreducible as in Section 5.2, the motion of jumping around the grouped states disappears and the diffusion becomes the dominant force.

We have considered only Markov chains with smooth generators up to now. However, there are cases in certain applications in which the generators may be merely measurable. Section 5.4 takes care of the scenario in which the Markov chains are governed by generators that are only measurable. Formulation via weak derivatives is also discussed briefly. Finally the chapter is concluded with a few more remarks. Among other things, additional references are given.

3.1 Aggregation of Markov Chains

This section deals with an aggregation of α^ε( ⋅). The following assumptions will be needed:

(A5.5)
For each k = 1, …, l and t ∈ [0, T], $\widetilde{{Q}}^{k}(t)$ is weakly irreducible.
(A5.6)
$\widetilde{Q}(\cdot )$ is differentiable on [0, T] and its derivative is Lipschitz. Moreover, $\widehat{Q}(\cdot )$ is also Lipschitz.

The assumptions above guarantee the existence of an asymptotic expansion up to zeroth order. To prepare for the subsequent study, we first provide the following error estimate. Since only the zeroth-order expansion is needed here, the estimate is confined to such an approximation. Higher-order terms can be obtained in a similar way.

Lemma 5.24

Assume (A5.5) and (A5.6) . Let P ^ε (t,t ₀ ) denote the transition probability of α ^ε (⋅). Then for some κ ₀ > 0,

$${P}^\varepsilon (t,{t}_{ 0}) = {P}_{0}(t,{t}_{0}) + O\left (\varepsilon +\exp \left (-\frac{{\kappa }_{0}(t - {t}_{0})} \varepsilon \right )\right ),$$

where

$$\begin{array}{ll} {P}_{0}(t,{t}_{0})& =\widetilde{ \mathrm{1}\mathrm{l}}\Theta (t,{t}_{0})\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t)) \\ & = \left (\begin{array}{ccc} \mathrm{1}{\mathrm{l}}_{{m}_{1}}{\nu }^{1}(t){{\vartheta}}_{11}(t,{t}_{0}),&\ldots &,\mathrm{1}{\mathrm{l}}_{{m}_{1}}{\nu }^{l}(t){{\vartheta}}_{1l}(t,{t}_{0})\\ \vdots &\cdots & \vdots \\ \mathrm{1}{\mathrm{l}}_{{m}_{l}}{\nu }^{1}(t){{\vartheta}}_{l1}(t,{t}_{0}),&\ldots &,\mathrm{1}{\mathrm{l}}_{{m}_{l}}{\nu }^{l}(t){{\vartheta}}_{ll}(t,{t}_{0})\\ \end{array} \right ), \end{array}$$

(5.49)

where ν ^k (t) is the quasi-stationary distribution of $\widetilde{{Q}}^{k}(t)$ , and Θ(t,t ₀) $= ({{\vartheta}}_{ij}(t,{t}_{0})) \in {\mathbb{R}}^{l\times l}$ is the solution to the following initial value problem:

$$\begin{array}{ll} &{ d\Theta (t,{t}_{0}) \over dt} = \Theta (t,{t}_{0})\overline{Q}(t), \\ &\Theta ({t}_{0},{t}_{0}) = I.\end{array}$$

(5.50)

Proof: The proof is similar to those of Lemma 5.1 and Theorem 4.29, except that the notation is more involved. □

Define an aggregated process of α^ε ( ⋅) on [0, T] by

$${ \overline{\alpha }}^\varepsilon (t) = k\mbox{ if }{\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ k}.$$

(5.51)

The idea to follow is to treat a related Markov chain having only l states. The transitions among its states correspond to the jumps from one group ${\mathcal{M}}_{k}$ to another ${\mathcal{M}}_{j}$,j≠k, in the original Markov chain.

Theorem 5.25

Assume (A5.5) and (A5.6) . Then, for any i = 1,…,l, j = 1,…,m _i , and bounded and measurable deterministic function β _ij (⋅),

$$E\left ({\int }_{0}^{T}{\left ({I}_{\{{ \alpha }^\varepsilon (t)={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=i\}}\right ){\beta }_{ij}(t)dt}\right )^{2} = O(\varepsilon ).$$

Proof: For any i, j and 0 ≤ t ≤ T, let

$${\eta }^\varepsilon (t) = E{\left ({\int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (r)={s}_{ij}\}} - {\nu }_{j}^{i}(r){I}_{\{{\overline{\alpha }}^{ \varepsilon }(r)=i\}}\right ){\beta }_{ij}(r)dr\right )}^{2}.$$

(5.52)

We have suppressed the i, j dependence of η^ε( ⋅) for notational simplicity. Loosely speaking, the argument used below is a Liapunov stability one, and η^ε( ⋅) can be viewed as a Liapunov function. By differentiating η^ε( ⋅), we have

$$\begin{array}{rl} \frac{d{\eta }^\varepsilon (t)} {dt} =&2E\left [\left ({\int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (r)={s}_{ij}\}} - {\nu }_{j}^{i}(r){I}_{\{{\overline{\alpha }}^{ \varepsilon }(r)=i\}}\right ){\beta }_{ij}(r)dr\right )\right. \\ &\left.\qquad \times \left ({I}_{\{{\alpha }^\varepsilon (t)={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=i\}}\right ){\beta }_{ij}(t)\right ]\end{array}$$

The definition of ${\overline{\alpha }}^\varepsilon (\cdot )$ yields that $\{{\overline{\alpha }}^\varepsilon (t) = i\} =\{ {\alpha }^\varepsilon (t) \in {\mathcal{M}}_{i}\}$. Thus,

$$\frac{d{\eta }^\varepsilon (t)} {dt} = 2{\int }_{0}^{t}{\Phi }^\varepsilon (t,r){\beta }_{ ij}(t){\beta }_{ij}(r)dr,$$

where ${\Phi }^\varepsilon (t,r) = {\Phi }_{1}^\varepsilon (t,r) + {\Phi }_{2}^\varepsilon (t,r)$ with

$$\begin{array}{rl} {\Phi }_{1}^\varepsilon (t,r) =&P({\alpha }^\varepsilon (t) = {s}_{ij},{\alpha }^\varepsilon (r) = {s}_{ij}) \\ &\ - {\nu }_{j}^{i}(t)P({\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ i},{\alpha }^\varepsilon (r) = {s}_{ ij}), \end{array}$$

(5.53)

and

$$\begin{array}{rl} {\Phi }_{2}^\varepsilon (t,r) =& - {\nu }_{j}^{i}(r)P({\alpha }^\varepsilon (t) = {s}_{ij},{\alpha }^\varepsilon (r) \in {\mathcal{M}}_{i}) \\ & + {\nu }_{j}^{i}(r){\nu }_{ j}^{i}(t)P({\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ i},{\alpha }^\varepsilon (r) \in {\mathcal{M}}_{ i}).\end{array}$$

(5.54)

Note that the Markov property of α^ε( ⋅) implies that for 0 ≤ r ≤ t,

$$\begin{array}{rl} &P({\alpha }^\varepsilon (t)= {s}_{ij},{\alpha }^\varepsilon (r) = {s}_{ij}) \\ &\quad = P({\alpha }^\varepsilon (t) = {s}_{ij}\vert {\alpha }^\varepsilon (r) = {s}_{ij})P({\alpha }^\varepsilon (r) = {s}_{ij})\end{array}$$

In view of the asymptotic expansion, we have

$$\begin{array}{l} P({\alpha }^\varepsilon (t) = {s}_{ ij}\vert {\alpha }^\varepsilon (r) = {s}_{ ij}) \\ = {\nu }_{j}^{i}(t){{\vartheta}}_{ ii}(t,r) + O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ).\end{array}$$

(5.55)

It follows that

$$\begin{array}{rl} &P({\alpha }^\varepsilon (t)\in {\mathcal{M}}_{i}\vert {\alpha }^\varepsilon (r) = {s}_{ij}) \\ =&{\sum }_{k=1}^{{m}_{i} }{\nu }_{k}^{i}(t){{\vartheta}}_{ ii}(t,r) + O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ) \\ =&{{\vartheta}}_{ii}(t,r) + O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ).\end{array}$$

(5.56)

Combining (5.55) and (5.56) leads to

$${\Phi }_{1}^\varepsilon (t,r) = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ).$$

Similarly, we can show that

$${\Phi }_{2}^\varepsilon (t,r) = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ),$$

by noting that

$$\begin{array}{rl} {\Phi }_{2}^\varepsilon (t,r) =& - {\nu }_{j}^{i}(r){\sum }_{k=1}^{{m}_{i} }P({\alpha }^\varepsilon (t) = {s}_{ ij},{\alpha }^\varepsilon (r) = {s}_{ ik}) \\ &\ + {\nu }_{j}^{i}(r){\sum }_{k=1}^{{m}_{i} }{\nu }_{j}^{i}(t)P({\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ i},{\alpha }^\varepsilon (r) = {s}_{ ik}) \end{array}$$

and

$$\begin{array}{rl} &P({\alpha }^\varepsilon (t) = {s}_{ ij}\vert {\alpha }^\varepsilon (r) = {s}_{ ik}) \\ &\quad = {\nu }_{j}^{i}(t){{\vartheta}}_{ ii}(t,r) + O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ), \end{array}$$

for any k = 1, …, m _i. Therefore,

$$\frac{d{\eta }^\varepsilon (t)} {dt} = 2{\int }_{0}^{t}O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right )dr = O(\varepsilon ).$$

(5.57)

This together with η^ε (0) = 0 implies that η ^ε(t) = O(ε). □

Theorem 5.25 indicates that ν^k(t) together with ${\overline{\alpha }}^\varepsilon (\cdot )$ approximates well the Markov chain α^ε( ⋅) in an appropriate sense. Nevertheless, in general, {α^ε( ⋅)} is not tight. The following example provides a simple illustration.

Example 5.26

Let α ^ε (⋅) ∈{ 1,2} denote a Markov chain generated by

$${ 1 \over \varepsilon } \left (\begin{array}{cc} - \lambda & \lambda \\ \mu & -\mu \\ \end{array} \right ),$$

for some λ,μ > 0. Then α ^ε (⋅) is not tight.

Proof: If α ^ε (⋅) is tight, then there exists a sequence ε _k → 0 such that ${\alpha }^{\varepsilon _{k}}(\cdot )$ converges weakly to a stochastic process $\alpha (\cdot ) \in D([0,T];\mathcal{M})$ . In view of the Skorohod representation (without changing notation for simplicity), Theorem A.11 , we may assume ${\alpha }^{\varepsilon _{k}}(\cdot ) \rightarrow \alpha (\cdot )$ w.p.1. It follows from Lemma A.41 that

$$E{\left \vert {\int }_{0}^{t}{\alpha }^{\varepsilon _{k} }(s)ds -{\int }_{0}^{t}\alpha (s)ds\right \vert }^{2} \rightarrow 0,$$

for all t ∈ [0,T]. Moreover, similarly as in Theorem 5.25 , we obtain

$$E{\left \vert {\int }_{0}^{t}{\alpha }^{\varepsilon _{k} }(s)ds -{\int }_{0}^{t}({\nu }_{ 1} + 2{\nu }_{2})ds\right \vert }^{2} \rightarrow 0,$$

where (ν ₁ ,ν ₂ ) is the stationary distribution of α ^ε (⋅) and ν ₁ + 2ν ₂ is the mean with respect to the stationary distribution. As a consequence, it follows that $\alpha (t) = {\nu }_{1} + 2{\nu }_{2}$ for all t ∈ [0,T] w.p.1. Let

$${\delta }_{0} =\min \{ \vert 1 - ({\nu }_{1} + 2{\nu }_{2})\vert,\vert 2 - ({\nu }_{1} + 2{\nu }_{2})\vert \} > 0.$$

Then for t ∈ [0,T],

$$\vert {\alpha }^\varepsilon (t) - ({\nu }_{ 1} + 2{\nu }_{2})\vert \geq {\delta }_{0}.$$

Hence, under the Skorohod topology

$$d({\alpha }^{\varepsilon _{k} }(\cdot ),{\nu }_{1} + 2{\nu }_{2}) \geq {\delta }_{0}.$$

This contradicts the fact that ${\alpha }^{\varepsilon _{k}}(\cdot ) \rightarrow \alpha (\cdot ) = {\nu }_{ 1} + 2{\nu }_{2}$ w.p.1. Therefore, α ^ε(⋅) cannot be tight. □

Although α^ε ( ⋅) is not tight because it fluctuates in ${\mathcal{M}}_{k}$ very rapidly for small ε, its aggregation ${\overline{\alpha }}^\varepsilon (\cdot )$ is tight, and converges weakly to $\overline{\alpha }(t)$,t ≥ 0, a Markov chain generated by $\overline{Q}(t)$, t ≥ 0, where $\overline{Q}(t)$ is defined in (5.48). The next theorem shows that ${\overline{\alpha }}^\varepsilon (\cdot )$ can be further approximated by $\overline{\alpha }(\cdot )$.

Theorem 5.27

Assume (A5.5) and (A5.6) . Then ${\overline{\alpha }}^\varepsilon (\cdot )$ converges weakly to $\overline{\alpha }(\cdot )$ in $D([0,T];\overline{\mathcal{M}})$ , as ε → 0.

Proof: The proof is divided into two steps. First, we show that ${\overline{\alpha }}^\varepsilon (\cdot )$ defined in (5.51) is tight in $D([0,T];\overline{\mathcal{M}})$. The definition of ${\overline{\alpha }}^\varepsilon (\cdot )$ implies that

$$\{{\overline{\alpha }}^\varepsilon (t) = i\} =\{ {\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ i}\} =\{ {\alpha }^\varepsilon (t) = {s}_{ ij}\mbox{ for some }j = 1,\ldots,{m}_{i}\}.$$

Consider the conditional expectation

$$\begin{array}{l} E\left[{\left ({\overline{\alpha }}^\varepsilon (t + s) -{\overline{\alpha }}^\varepsilon (s)\right)}^{2}\vert{\alpha }^\varepsilon (s) = {s}_{ ij}\right] \\ \quad=E\left[{\left ({\overline{\alpha }}^\varepsilon (t + s) - i\right)}^{2}\vert{\alpha }^\varepsilon (s) = {s}_{ ij}\right ] \\ \quad = \sum_{k=1}^{l}E\left [{\left ({\overline{\alpha }}^\varepsilon (t + s) - i\right )}^{2}{I}_{\{{\overline{\alpha }}^{ \varepsilon}(t+s)=k\}} \vert{\alpha }^\varepsilon (s) = {s}_{ ij}\right] \\ \quad={\sum }_{k=1}^{l}{(k - i)}^{2}P({\overline{\alpha}}^\varepsilon (t + s) = k\vert {\alpha }^\varepsilon (s) = {s}_{ij}) \\ \quad \leq {l}^{2} \sum \limits_{k\neq i}P({\overline{\alpha}}^\varepsilon (t + s) = k\vert {\alpha}^\varepsilon (s) = {s}_{ij})\end{array}$$

Since $\{{\overline{\alpha }}^\varepsilon (t + s) = k\} =\{ {\alpha }^\varepsilon (t + s) \in {\mathcal{M}}_{k}\}$, it follows that

$$\begin{array}{rl} &P({\overline{\alpha } }^\varepsilon (t + s) = k\vert {\alpha }^\varepsilon (s) = {s}_{ij}) \\ =&{\sum }_{{k}_{1}=1}^{{m}_{k} }P({\alpha }^\varepsilon (t + s) = {s}_{ k{k}_{1}}\vert {\alpha }^\varepsilon (s) = {s}_{ ij}) \\ =&{\sum }_{{k}_{1}=1}^{{m}_{k} }{\nu }_{{k}_{1}}^{k}(t + s){{\vartheta}}_{ ik}(t + s,s) + O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}t} \varepsilon \right )\right ) \\ =&{{\vartheta}}_{ik}(t + s,s) + O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}t} \varepsilon \right )\right )\end{array}$$

Therefore, we obtain

$$\begin{array}{rl} &E\left [{\left ({\overline{\alpha }}^\varepsilon (t + s) -{\overline{\alpha }}^\varepsilon (s)\right )}^{2}\vert {\alpha }^\varepsilon (s) = {s}_{ij}\right ] \\ \leq &{l}^{2} \sum \limits_{k\neq i}{{\vartheta}}_{ik}(t + s,s) + O\left(\varepsilon +\exp \left( -\frac{{\kappa }_{0}t} \varepsilon \right)\right)\end{array}$$

Note that $\lim \limits_{t\rightarrow 0}{{\vartheta}}_{ik}(t + s,s) = 0$ for i≠ k.

$$\lim \limits_{t\rightarrow 0}\left (\lim \limits_{\varepsilon \rightarrow 0}E\left ({({\overline{\alpha }}^\varepsilon (t + s) -{\overline{\alpha }}^\varepsilon (s))}^{2}\vert {\alpha }^\varepsilon (s) = {s}_{ ij}\right )\right ) = 0.$$

Thus, the Markov property of α^ε( ⋅) implies

$${ \lim }_{t\rightarrow 0}\left (\lim \limits_{\varepsilon \rightarrow 0}E\left ({({\overline{\alpha }}^\varepsilon (t + s) -{\overline{\alpha }}^\varepsilon (s))}^{2}\vert {\alpha }^\varepsilon (r) :\; r \leq s\right )\right ) = 0.$$

(5.58)

Recall that ${\overline{\alpha }}^\varepsilon (\cdot )$ is bounded. The tightness of ${\overline{\alpha }}^\varepsilon (\cdot )$ follows from Kurtz’ tightness criterion (see Lemma A.17).

To complete the proof, it remains to show that the finite-dimensional distributions of ${\overline{\alpha }}^\varepsilon (\cdot )$ converge to that of $\overline{\alpha }(\cdot )$. In fact, for any

$$0 \leq {t}_{1} < {t}_{2} < \cdots < {t}_{n} \leq T\mbox{ and }{i}_{1},{i}_{2},\ldots,{i}_{n} \in \overline{\mathcal{M}} =\{ 1,\ldots,l\},$$

we have

$$\begin{array}{rl} &P({\overline{\alpha } }^\varepsilon ({t}_{n}) = {i}_{n},\ldots,{\overline{\alpha }}^\varepsilon ({t}_{1}) = {i}_{1}) \\ =&P({\alpha }^\varepsilon ({t}_{ n}) \in {\mathcal{M}}_{{i}_{n}},\ldots,{\alpha }^\varepsilon ({t}_{ 1}) \in {\mathcal{M}}_{{i}_{1}}) \\ =&{\sum }_{{j}_{1},\ldots,{j}_{n}}P({\alpha }^\varepsilon ({t}_{ n}) = {s}_{{i}_{n}{j}_{n}},\ldots,{\alpha }^\varepsilon ({t}_{ 1}) = {s}_{{i}_{1}{j}_{1}}) \\ =&{\sum }_{{j}_{1},\ldots,{j}_{n}}P({\alpha }^\varepsilon ({t}_{ n}) = {s}_{{i}_{n}{j}_{n}}\vert {\alpha }^\varepsilon ({t}_{ n-1}) = {s}_{{i}_{n-1}{j}_{n-1}}) \\ &\qquad \times \cdots \times P({\alpha }^\varepsilon ({t}_{ 2}) = {s}_{{i}_{2}{j}_{2}}\vert {\alpha }^\varepsilon ({t}_{ 1}) = {s}_{{i}_{1}{j}_{1}})P({\alpha }^\varepsilon ({t}_{ 1}) = {s}_{{i}_{1}{j}_{1}})\end{array}$$

In view of Lemma 5.24, for each k, we have

$$P({\alpha }^\varepsilon ({t}_{ k}) = {s}_{{i}_{k}{j}_{k}}\vert {\alpha }^\varepsilon ({t}_{ k-1}) = {s}_{{i}_{k-1}{j}_{k-1}}) \rightarrow {\nu }_{{j}_{k}}^{{i}_{k} }({t}_{k}){{\vartheta}}_{{i}_{k-1}{i}_{k}}({t}_{k},{t}_{k-1}).$$

Moreover, note that

$${\sum }_{{j}_{k}=1}^{{m}_{{i}_{k}} }{\nu }_{{j}_{k}}^{{i}_{k} }({t}_{k}) = 1.$$

It follows that

$$\begin{array}{rl} & \sum \limits_{{j}_{1},\ldots,{j}_{n}}P({\alpha }^\varepsilon ({t}_{ n}) = {s}_{{i}_{n}{j}_{n}}\vert {\alpha }^\varepsilon ({t}_{ n-1}) = {s}_{{i}_{n-1}{j}_{n-1}}) \\ &\qquad \times \cdots \times P({\alpha }^\varepsilon ({t}_{ 2}) = {s}_{{i}_{2}{j}_{2}}\vert {\alpha }^\varepsilon ({t}_{ 1}) = {s}_{{i}_{1}{j}_{1}})P({\alpha }^\varepsilon ({t}_{ 1}) = {s}_{{i}_{1}{j}_{1}}) \\ \rightarrow &{\sum }_{{j}_{1},\ldots,{j}_{n}}{\nu }_{{j}_{n}}^{{i}_{n} }({t}_{n}){{\vartheta}}_{{i}_{n-1}{i}_{n}}({t}_{n},{t}_{n-1})\cdots {\nu }_{{j}_{2}}^{{i}_{2} }({t}_{2}){{\vartheta}}_{{i}_{1}{i}_{2}}({t}_{2},{t}_{1}){\nu }_{{j}_{1}}^{{i}_{1} }({t}_{1})\widetilde{{{\vartheta}}}_{{i}_{1}}({t}_{1}) \\ =&{{\vartheta}}_{{i}_{n-1}{i}_{n}}({t}_{n},{t}_{n-1})\cdots {{\vartheta}}_{{i}_{1}{i}_{2}}({t}_{2},{t}_{1})\widetilde{{{\vartheta}}}_{{i}_{1}}({t}_{1}) \\ =&P(\overline{\alpha }({t}_{n}) = {i}_{n},\ldots,\overline{\alpha }({t}_{1}) = {i}_{1}), \end{array}$$

where ${\sum }_{{j}_{1},\ldots,{j}_{n}} = \sum \limits_{{j}_{1}=1}^{{m}_{{i}_{1}}}\cdots {\sum }_{{j}_{ n}=1}^{{m}_{{i}_{n}}}$ and $\widetilde{{{\vartheta}}}_{{i}_{1}}({t}_{1})$ denotes the initial distribution (also known as absolute probability in the literature of Markov chains). Thus, ${\overline{\alpha }}^\varepsilon (\cdot ) \rightarrow \overline{\alpha }(\cdot )$ in distribution. □

This theorem implies that ${\overline{\alpha }}^\varepsilon (\cdot )$ converges to a Markov chain, although ${\overline{\alpha }}^\varepsilon (\cdot )$ itself is not a Markov chain in general. If, however, the generator Q ^ε(t) has some specific structure, then ${\overline{\alpha }}^\varepsilon (\cdot )$ is a Markov chain. The following example demonstrates this point.

Example 5.28

Let $\widetilde{Q}(t) = (\widetilde{{q}}_{ij}(t))$ and $\overline{Q}(t) = ({\overline{q}}_{ij}(t))$ denote generators with the corresponding state spaces $\{{a}_{1},\ldots,{a}_{{m}_{0}}\}$ and {1,…,l}, respectively. Consider

$${ Q}^\varepsilon (t) = \frac{1} \varepsilon \left (\begin{array}{cccc} \widetilde{Q}(t)&&& \\ & &\ddots &\\ & & &\widetilde{Q}(t) \\ \end{array} \right )+\left (\begin{array}{ccc} {\overline{q}}_{11}(t){I}_{{m}_{0}} & \cdots &{\overline{q}}_{1l}(t){I}_{{m}_{0}}\\ \vdots & \vdots & \vdots \\ {\overline{q}}_{l1}(t){I}_{{m}_{0}} & \cdots & {\overline{q}}_{ll}(t){I}_{{m}_{0}}\\ \end{array} \right ),$$

(5.59)

where ${I}_{{m}_{0}}$ is the m ₀ × m ₀ identity matrix. In this case

$${m}_{1} = {m}_{2} = \cdots = {m}_{l} = {m}_{0}.$$

Then ${\overline{\alpha }}^\varepsilon (\cdot )$ is a Markov chain generated by $\overline{Q}(t)$ . In fact, let

$${\chi }^\varepsilon (t) = \left ({I}_{\{{ \alpha }^\varepsilon (t)={s}_{11}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{1{m}_{0}}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{l1}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{l{m}_{0}}\}}\right ).$$

Note that s _ij = (i,a _j ) for j = 1,…,m ₀ and i = 1,…,l. In view of Lemma 2.4 , we obtain that

$${\chi }^\varepsilon (t) -{\int }_{0}^{t}{\chi }^\varepsilon (s){Q}^\varepsilon (s)ds$$

(5.60)

is a martingale. Postmultiplying (multiplying from the right) (5.60) by

$$\widetilde{\mathrm{1}\mathrm{l}} = \mathrm{diag}(\mathrm{1}{\mathrm{l}}_{{m}_{0}},\ldots,\mathrm{1}{\mathrm{l}}_{{m}_{0}})$$

and noting that $\{{\overline{\alpha }}^\varepsilon (t) = i\} =\{ {\alpha }^\varepsilon (t) \in {\mathcal{M}}_{i}\}$ and

$${\chi }^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}} = ({I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=1\}},\ldots,{I}_{\{{\overline{\alpha }}^\varepsilon (t)=l\}}),$$

we obtain that

$$\begin{array}{rl} \left ({I}_{\{{\overline{\alpha }}^\varepsilon (t)=1\}},\ldots,{I}_{\{{\overline{\alpha }}^\varepsilon (t)=l\}}\right ) -{\int }_{0}^{t}{\chi }^\varepsilon (s){Q}^\varepsilon (s)ds\widetilde{\mathrm{1}\mathrm{l}} \end{array}$$

is still a martingale. In view of the special structure of Q ^ε (t) in (5.59),

$$\widetilde{Q}(t)\mathrm{1}{\mathrm{l}}_{{m}_{0}} = 0,\quad {Q}^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}} =\widehat{ Q}(t)\widetilde{\mathrm{1}\mathrm{l}},$$

and

$${\chi }^\varepsilon (s)\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}} = \left ({I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=1\}},\ldots,{I}_{\{{\overline{\alpha }}^\varepsilon (s)=l\}}\right )\overline{Q}(s).$$

Therefore, (5.60) implies that

$$\left ({I}_{\{{\overline{\alpha }}^\varepsilon (t)=1\}},\ldots,{I}_{\{{\overline{\alpha }}^\varepsilon (t)=l\}}\right ) -{\int }_{0}^{t}\left ({I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=1\}},\ldots,{I}_{\{{\overline{\alpha }}^\varepsilon (s)=l\}}\right )\overline{Q}(s)ds$$

is a martingale. This implies, in view of Lemma 2.4, that ${\overline{\alpha }}^\varepsilon (\cdot )$ is a Markov chain generated by $\overline{Q}(t)$ , t ≥ 0.

3.2 Exponential Bounds

For each i = 1, …, l, j = 1, …, m _i, $\alpha \in \mathcal{M}$, and t ≥ 0, let β _ij(t) be a bounded, Borel measurable, deterministic function and let

$${W}_{ij}(t,\alpha ) = \left ({I}_{\{\alpha ={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{ \alpha \in {\mathcal{M}}_{i}\}}\right ){\beta }_{ij}(t).$$

(5.61)

Consider normalized occupation measures

$${n}^\varepsilon (t) = \left ({n}_{ 11}^\varepsilon (t),\ldots,{n}_{ 1{m}_{1}}^\varepsilon (t),\ldots,{n}_{ l1}^\varepsilon (t),\ldots,{n}_{ l{m}_{l}}^\varepsilon (t)\right ),$$

where

$${n}_{ij}^\varepsilon (t) = \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}{W}_{ ij}(s,{\alpha }^\varepsilon (s))ds.$$

In this section, we establish the exponential error bound for n ^ε( ⋅), a sequence of suitably scaled occupation measures for the singularly perturbed Markov chains with weak and strong interactions.

In view of Theorem 4.29, there exists κ₀ > 0 such that

$$\vert{P}^\varepsilon (t,s) - {P}_{ 0}(t,s)\vert = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right ).$$

(5.62)

Similar to Section 5.3.2, for fixed but otherwise arbitrary T > 0, let

$${K}_{T} =\max {\left \{ 1{,\sup}_{0\leq s\leq t\leq T}\left (\frac{\vert {P}^\varepsilon (t,s) - {P}_{0}(t,s)\vert } {\varepsilon +\exp (-{\kappa }_{0}(t - s)/\varepsilon )} \right )\right \}}.$$

(5.63)

We may write (5.62) in terms of K _T and O ₁( ⋅) as follows:

$$\vert{P}^\varepsilon (t,s) - {P}_{ 0}(t,s)\vert = {K}_{T}{O}_{1}\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right ),$$

(5.64)

where | O ₁(y) | ∕ | y | ≤ 1. The notation of K _T and O ₁( ⋅) above emphasizes the separation of the dependence of the constant and a “norm 1” function. Essentially, K _T serves as a magnitude of the bound indicating the size of the bounding region, and the rest is absorbed into the function O ₁( ⋅).

Theorem 5.29

Assume (A5.5) and (A5.6) . Then there exist ε ₀ > 0 and K > 0 such that for 0 < ε ≤ ε ₀ , T ≥ 0, and for any bounded, Borel measurable, and deterministic process β _ij (⋅),

$$E\exp \left ({ \frac{{\theta }_{T}} {{(T + 1)}^{3}}\sup }_{0\leq t\leq T}\vert {n}^\varepsilon (t)\vert \right ) \leq K,$$

(5.65)

where θ _T is any constant satisfying

$$0 \leq {\theta }_{T} \leq \frac{\min \{1,{\kappa }_{0}\}} {{K}_{T}\vert \beta {\vert }_{T}(1 + \vert \widehat{Q}{\vert }_{T})},$$

(5.66)

and where |⋅| _T denotes the matrix norm as defined in (5.12) , that is,

$$\vert \beta {\vert }_{T} {=\max { }_{i,j}\sup }_{0\leq t\leq T}\vert {\beta }_{ij}(t)\vert,$$

similarly for $\vert \widehat{Q}{\vert }_{T}$.

Remark 5.30

This theorem is a natural extension to Theorem 5.4 . Owing to the existence of the weak and strong interactions, slightly stronger conditions on K _T and θ _T are made in (5.63) and (5.66). Also the exponential constant in (5.65) is changed to (T + 1)³.

Proof of Theorem 5.29: Here the proof is again along the lines of Theorem 5.4. Since Steps 2-5 in the proof are similar to those of Theorem 5.4, we will only give the proof for Step 1.

Let χ^ε ( ⋅) denote the vector of indicators corresponding to α ^ε ( ⋅), that is,

$${\chi }^\varepsilon (t) = \left ({I}_{\{{ \alpha }^\varepsilon (t)={s}_{11}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{1{m}_{1}}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{l1}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{l{m}_{l}}\}}\right ).$$

Then w ^ε( ⋅) defined by

$${w}^\varepsilon (t) = {\chi }^\varepsilon (t) - {\chi }^\varepsilon (0) -{\int }_{0}^{t}{\chi }^\varepsilon (s){Q}^\varepsilon (s)ds$$

(5.67)

is an ${\mathbb{R}}^{m}$-valued martingale. In fact, w ^ε ( ⋅) is square integrable on [0, T]. It then follows from a well-known result (see Elliott [55] or Kunita and Watanabe [134]) that a stochastic integral with respect to w ^ε(t) can be defined. In view of the defining equation (5.67), the linear stochastic differential equation

$$d{\chi }^\varepsilon (t) = {\chi }^\varepsilon (t){Q}^\varepsilon (t)dt + d{w}^\varepsilon (t)$$

(5.68)

makes sense. Recall that P ^ε(t, s) is the principal matrix solution of the matrix differential equation

$${ dy(t) \over dt} = y(t){Q}^\varepsilon (t).$$

(5.69)

The solution of this stochastic differential equation is

$$\begin{array}{ll} {\chi }^\varepsilon (t)& = {\chi }^\varepsilon (0){P}^\varepsilon (t,0) +{ \int }_{0}^{t}(d{w}^\varepsilon (s)){P}^\varepsilon (t,s) \\ & = {\chi }^\varepsilon (0)\left ({P}^\varepsilon (t,0) - {P}_{ 0}(t,0)\right ) \\ &\quad \quad +{ \int }_{0}^{t}(d{w}^\varepsilon (s))\left ({P}^\varepsilon (t,s) - {P}_{ 0}(t,s)\right ) \\ &\quad \quad + {\chi }^\varepsilon (0){P}_{ 0}(t,0) +{ \int }_{0}^{t}(d{w}^\varepsilon (s)){P}_{ 0}(t,s).\end{array}$$

(5.70)

Use ${{\vartheta}}_{ij}(t,s)$ defined in Lemma 5.24 and write $\Theta (t,s) = ({{\vartheta}}_{ij}(t,s))$ . Then it is easy to check that

$${P}_{0}(t,s) =\widetilde{ \mathrm{1}\mathrm{l}}\Theta (t,s)\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t)).$$

(5.71)

Set

$${\overline{\chi }}^\varepsilon (t) = \left ({\nu }^{1}(t){I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=1\}},\ldots,{\nu }^{l}(t){I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=l\}}\right ) \in {\mathbb{R}}^{m}$$

and

$$\widetilde{{\chi }}^\varepsilon (t) = \left ({I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=1\}},\ldots,{I}_{\{{\overline{\alpha }}^\varepsilon (t)=l\}}\right ) \in {\mathbb{R}}^{l}.$$

Then it follows that

$$\begin{array}{l} \widetilde{{\chi }}^\varepsilon (t) = {\chi }^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}}\quad \mbox{ and } \\ {\overline{\chi }}^\varepsilon (t) =\widetilde{ {\chi }}^\varepsilon (t)\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t)).\end{array}$$

(5.72)

Moreover, postmultiplying both sides of (5.67) by $\widetilde{\mathrm{1}\mathrm{l}}$ yields that

$${\chi }^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}} - {\chi }^\varepsilon (0)\widetilde{\mathrm{1}\mathrm{l}} -{\int }_{0}^{t}{\chi }^\varepsilon (s){Q}^\varepsilon (s)\widetilde{\mathrm{1}\mathrm{l}}ds = {w}^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}}.$$

(5.73)

Here ${w}^\varepsilon (\cdot )\widetilde{\mathrm{1}\mathrm{l}}$ is also a square-integrable martingale. Note that $\widetilde{Q}(s)\widetilde{\mathrm{1}\mathrm{l}} = 0$ and hence

$$\begin{array}{l} {Q}^\varepsilon (s)\widetilde{\mathrm{1}\mathrm{l}} =\widehat{ Q}(s)\widetilde{\mathrm{1}\mathrm{l}}\quad \mbox{ and } \\ {\overline{\chi }}^\varepsilon (s)\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}} =\widetilde{ {\chi }}^\varepsilon (s)\mathrm{diag}({\nu }^{1}(s),\ldots,{\nu }^{l}(s))\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}} =\widetilde{ {\chi }}^\varepsilon (s)\overline{Q}(s)\end{array}$$

We obtain from (5.73) that

$$\widetilde{{\chi }}^\varepsilon (t) -\widetilde{ {\chi }}^\varepsilon (0) -{\int }_{0}^{t}\left (({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}} +\widetilde{ {\chi }}^\varepsilon (s)\overline{Q}(s)\right )ds = {w}^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}}.$$

Since Θ(t, s) is the principal matrix solution to

$$\frac{d\Theta (t,s)} {dt} = \Theta (t,s)\overline{Q}(t),\,\mbox{ with }\Theta (s,s) = I,$$

similar to (5.68), solving the stochastic differential equation for $\widetilde{{\chi }}^\varepsilon (\cdot )$ leads to the equation:

$$\begin{array}{l} \widetilde{{\chi }}^\varepsilon (t) =\widetilde{ {\chi }}^\varepsilon (0)\Theta (t,0) +{ \int }_{0}^{t}(d{w}^\varepsilon (s)\widetilde{\mathrm{1}\mathrm{l}})\Theta (t,s) \\ \quad \quad +{ \int }_{0}^{t}({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}}\Theta (t,s)ds.\end{array}$$

(5.74)

Let us now return to the last two terms in (5.70) and use (5.71), (5.72), and (5.74) to obtain

$$\begin{array}{l} {\chi }^\varepsilon (0){P}_{ 0}(t,0) +{ \int }_{0}^{t}(d{w}^\varepsilon (s)){P}_{ 0}(t,s) \\ \quad = \left ({\chi }^\varepsilon (0)\widetilde{\mathrm{1}\mathrm{l}}\Theta (t,0) +{ \int }_{0}^{t}(d{w}^\varepsilon (s))\widetilde{\mathrm{1}\mathrm{l}}\Theta (t,s)\right )\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t)) \\ \quad = \left (\widetilde{{\chi }}^\varepsilon (0)\Theta (t,0) +{ \int }_{0}^{t}(d{w}^\varepsilon (s)\widetilde{\mathrm{1}\mathrm{l}})\Theta (t,s)\right )\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t)) \\ \quad = \left (\widetilde{{\chi }}^\varepsilon (t) -{\int }_{0}^{t}({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}}\Theta (t,s)ds\right )\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t)) \\ \quad ={ \overline{\chi }}^\varepsilon (t) -{\int }_{0}^{t}({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}}\Theta (t,s)\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t))ds \\ \quad ={ \overline{\chi }}^\varepsilon (t) -{\int }_{0}^{t}({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s){P}_{ 0}(t,s)ds\end{array}$$

Combining this with (5.70), we have

$$({\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t)) +{ \int }_{0}^{t}({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s){P}_{ 0}(t,s)ds = {\eta }^\varepsilon (t),$$

(5.75)

where

$${\eta }^\varepsilon (t) = {\chi }^\varepsilon (0)\left ({P}^\varepsilon (t,0) - {P}_{ 0}(t,0)\right ) +{ \int }_{0}^{t}(d{w}^\varepsilon (s))\left ({P}^\varepsilon (t,s) - {P}_{ 0}(t,s)\right ).$$

Note that the matrix P ^ε(t, s) is invertible but P ₀(t, s) is not. The idea is to approximate the noninvertible matrix P ₀(t, s) by the invertible P ^ε(t, s). Let

$${\eta }_{1}^\varepsilon (t) ={ \int }_{0}^{t}({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s)\left ({P}_{ 0}(t,s) - {P}^\varepsilon (t,s)\right )ds$$

(5.76)

and

$${\phi }^\varepsilon (t) = ({\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t)) - ({\eta }^\varepsilon (t) - {\eta }_{ 1}^\varepsilon (t)).$$

Then ϕ^ε(0) = 0 and ϕ^ε(t) satisfies the following equation:

$${\phi }^\varepsilon (t) +{ \int }_{0}^{t}{\phi }^\varepsilon (s)\widehat{Q}(s){P}^\varepsilon (t,s)ds +{ \int }_{0}^{t}({\eta }^\varepsilon (s) - {\eta }_{ 1}^\varepsilon (s))\widehat{Q}(s){P}^\varepsilon (t,s)ds = 0.$$

The properties of the principal matrix solution imply that

$${P}^\varepsilon (t,s) = {P}^\varepsilon (0,s){P}^\varepsilon (t,0).$$

Set

$$\begin{array}{l} \check{{Q}}^\varepsilon (t) = {P}^\varepsilon (t,0)\widehat{Q}(t){P}^\varepsilon (0,t), \\ {\psi }^\varepsilon (t) = {\phi }^\varepsilon (t){P}^\varepsilon (0,t),\quad \mbox{ and } \\ {\eta }_{2}^\varepsilon (t) = ({\eta }^\varepsilon (t) - {\eta }_{1}^\varepsilon (t))\widehat{Q}(t){P}^\varepsilon (0,t)\end{array}$$

Owing to the properties of the principal matrix solution, for any t ∈ [0, T], we have

$${P}^\varepsilon (0,t){P}^\varepsilon (t,0) = {P}^\varepsilon (t,t) = I,$$

(5.77)

ψ ^ε (0) = 0 and ψ ^ε(t) satisfies the equation

$${\psi }^\varepsilon (t) +{ \int }_{0}^{t}{\psi }^\varepsilon (s)\check{{Q}}^\varepsilon (s)ds +{ \int }_{0}^{t}{\eta }_{ 2}^\varepsilon (s)ds = 0.$$

The solution to this equation is given by

$${\psi }^\varepsilon (t) = -{\int }_{0}^{t}{\eta }_{ 2}^\varepsilon (s)\check{{\Phi }}^\varepsilon (t,s)ds,$$

(5.78)

where $\check{{\Phi }}^\varepsilon (t,s)$ is the principal matrix solution to

$$\frac{d\check{{\Phi }}^\varepsilon (t,s)} {dt} = -\check{{\Phi }}^\varepsilon (t,s)\check{{Q}}^\varepsilon (t),\;\mbox{ with }\check{{\Phi }}^\varepsilon (s,s) = I.$$

Postmultiplying both sides of (5.78) by P ^ε(t, 0) yields

$$\begin{array}{rl} {\phi }^\varepsilon (t)& = {\psi }^\varepsilon (t){P}^\varepsilon (t,0) \\ & = -{\int }_{0}^{t}{\eta }_{ 2}^\varepsilon (s)\check{{\Phi }}^\varepsilon (t,s){P}^\varepsilon (t,0)ds \\ & = -{\int }_{0}^{t}({\eta }^\varepsilon (s) - {\eta }_{ 1}^\varepsilon (s))\widehat{Q}(s)\check{{\Psi }}^\varepsilon (t,s)ds, \end{array}$$

where

$$\check{{\Psi }}^\varepsilon (t,s) = {P}^\varepsilon (0,s)\check{{\Phi }}^\varepsilon (t,s){P}^\varepsilon (t,0).$$

Thus it follows that

$${\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t) = {\eta }^\varepsilon (t) - {\eta }_{ 1}^\varepsilon (t) -{\int }_{0}^{t}({\eta }^\varepsilon (s) - {\eta }_{ 1}^\varepsilon (s))\widehat{Q}(s)\check{{\Psi }}^\varepsilon (t,s)ds.$$

(5.79)

Again using (5.77), we have

$$\begin{array}{l} \frac{d} {dt}\left (\check{{\Phi }}^\varepsilon (t,0){P}^\varepsilon (t,0)\right ) \\ \quad = \left (\frac{d\check{{\Phi }}^\varepsilon (t,0)} {dt} \right ){P}^\varepsilon (t,0) +\check{ {\Phi }}^\varepsilon (t,0)\left (\frac{d{P}^\varepsilon (t,0)} {dt} \right ) \\ \quad = -\check{{\Phi }}^\varepsilon (t,0)\check{{Q}}^\varepsilon (t){P}^\varepsilon (t,0) +\check{ {\Phi }}^\varepsilon (t,0){P}^\varepsilon (t,0){Q}^\varepsilon (t) \\ \quad = -\check{{\Phi }}^\varepsilon (t,0){P}^\varepsilon (t,0)\widehat{Q}(t){P}^\varepsilon (0,t){P}^\varepsilon (t,0) +\check{ {\Phi }}^\varepsilon (t,0){P}^\varepsilon (t,0){Q}^\varepsilon (t) \\ \quad = -\check{{\Phi }}^\varepsilon (t,0){P}^\varepsilon (t,0)\widehat{Q}(t) +\check{ {\Phi }}^\varepsilon (t,0){P}^\varepsilon (t,0){Q}^\varepsilon (t) \\ \quad =\check{ {\Phi }}^\varepsilon (t,0){P}^\varepsilon (t,0)\left (-\widehat{Q}(t) + {Q}^\varepsilon (t)\right ) \\ \quad =\check{ {\Phi }}^\varepsilon (t,0){P}^\varepsilon (t,0)\left (\frac{1} \varepsilon \widetilde{Q}(t)\right )\end{array}$$

This implies that $\check{{\Psi }}^\varepsilon (t,s)$ is the principal matrix solution to the differential equation

$$\frac{d\check{{\Psi }}^\varepsilon (t,s)} {dt} =\check{ {\Psi }}^\varepsilon (t,s)\left (\frac{1} \varepsilon \widetilde{Q}(t)\right ),\;\mbox{ with }\check{{\Psi }}^\varepsilon (s,s) = I.$$

(5.80)

Therefore, all entries of $\check{{\Psi }}^\varepsilon (t,s)$ are bounded below from 0 and bounded above by 1, and these bounds are uniform in 0 ≤ s ≤ t ≤ T. Thus, $\vert \check{{\Psi }}^\varepsilon (t,s){\vert }_{T} \leq 1$.

Multiplying both sides of (5.79) by the m ×m matrix

$$\beta (t) := \mathrm{diag}({\beta }_{11}(t),\ldots,{\beta }_{1{m}_{1}}(t),\ldots,{\beta }_{l1}(t),\ldots,{\beta }_{l{m}_{l}}(t))$$

from the right and integrating over the interval [0, ς], for each ς ∈ [0, T], we have

$$\begin{array}{rl} { \int }_{0}^{\varsigma }({\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t))\beta (t)dt& ={ \int }_{0}^{\varsigma }{\eta }^\varepsilon (t)\beta (t)dt -{\int }_{0}^{\varsigma }{\eta }_{ 1}^\varepsilon (t)\beta (t)dt \\ &\quad -{\int }_{0}^{\varsigma }{ \int }_{0}^{t}({\eta }^\varepsilon (s) - {\eta }_{ 1}^\varepsilon (s))\widehat{Q}(s)\check{{\Psi }}^\varepsilon (t,s)ds\beta (t)dt\end{array}$$

By changing the order of integration, we write the last term in the above expression as

$$\begin{array}{l} { \int }_{0}^{\varsigma }{ \int }_{0}^{t}({\eta }^\varepsilon (s) - {\eta }_{ 1}^\varepsilon (s))\widehat{Q}(s)\check{{\Psi }}^\varepsilon (t,s)ds\beta (t)dt \\ \quad ={ \int }_{0}^{\varsigma }({\eta }^\varepsilon (s) - {\eta }_{ 1}^\varepsilon (s))\left ({\int }_{s}^{\varsigma }\widehat{Q}(s)\check{{\Psi }}^\varepsilon (t,s)\beta (t)dt\right )ds\end{array}$$

Therefore, it follows that

$${\int }_{0}^{\varsigma }({\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t))\beta (t)dt ={ \int }_{0}^{\varsigma }{\eta }^\varepsilon (t)\widetilde{\beta }(t)dt -{\int }_{0}^{\varsigma }{\eta }_{ 1}^\varepsilon (t)\widetilde{\beta }(t)dt,$$

(5.81)

where

$$\widetilde{\beta }(t) = \beta (t) +{ \int }_{t}^{\varsigma }\widehat{Q}(t)\check{{\Psi }}^\varepsilon (r,t)\beta (r)dr.$$

Moreover, in view of the fact that $\vert \check{{\Psi }}^\varepsilon (t,s){\vert }_{T} \leq 1$, it is easy to see that

$$\vert \widetilde{\beta }{\vert }_{T} \leq (1 + T)\vert \beta {\vert }_{T}(1 + \vert \widehat{Q}{\vert }_{T}).$$

(5.82)

Note that n ^ε ( ⋅) can be written in terms of χ ^ε ( ⋅) and ${\overline{\chi }}^\varepsilon (\cdot )$ as

$${n}^\varepsilon (\varsigma ) = \frac{1} {\sqrt\varepsilon }{\int }_{0}^{\varsigma }({\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t))\beta (t)dt.$$

By virtue of (5.81), it follows that

$$\vert {n}^\varepsilon (\varsigma )\vert \leq \frac{1} {\sqrt\varepsilon }\left \vert {\int }_{0}^{\varsigma }{\eta }^\varepsilon (t)\widetilde{\beta }(t)dt\right \vert + \frac{1} {\sqrt\varepsilon }\left \vert {\int }_{0}^{\varsigma }{\eta }_{ 1}^\varepsilon (t)\widetilde{\beta }(t)dt\right \vert.$$

Note that in view of the definition of η₁ ^ε( ⋅) in (5.76),

$$\vert {\eta }_{1}^\varepsilon (t)\vert ={ \int }_{0}^{t}O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right )ds = O(\varepsilon (t + 1)).$$

Thus, in view of (5.82),

$$\begin{array}{rl} {\sup }_{0\leq \varsigma \leq T}\left \vert {\int }_{0}^{\varsigma }{\eta }_{ 1}^\varepsilon (t)\widetilde{\beta }(t)dt\right \vert & = \vert \widetilde{\beta }{\vert {}_{ T}\sup }_{0\leq \varsigma \leq T}{ \int }_{0}^{\varsigma }O(\varepsilon (t + 1))dt \\ & = \vert \widetilde{\beta }{\vert {}_{T}\sup }_{0\leq \varsigma \leq T}O(\varepsilon ({\varsigma }^{2} + \varsigma )) \\ & = \vert \widetilde{\beta }{\vert }_{T}({T}^{2} + T)O(\varepsilon ) \\ & \leq {(1 + T)}^{3}\vert \beta {\vert }_{ T}(1 + \vert \widehat{Q}{\vert }_{T})O(\varepsilon ).\end{array}$$

(5.83)

Thus, in view of (5.63) and (5.66), for some ε₀ > 0, and all 0 < ε ≤ ε ₀,

$$\begin{array}{l} \exp \left ({\frac{{\theta }_{T}} {{(T + 1)}^{3}}\sup }_{0\leq \varsigma \leq T}{\left |{ \int }_{0}^{\varsigma }\frac{{\eta }_{1}^\varepsilon (t)\widetilde{\beta }(t)} {\sqrt\varepsilon } dt\right |}\right ) \\ \quad \leq \exp \left ( \frac{O(\sqrt\varepsilon )\min \{1,{\kappa }_{0}\}} {{K}_{T}} \right ) \\ \quad \leq \exp \left(O(\sqrt{\varepsilon _{0}})\min \{1,{\kappa }_{0}\}\right) \leq K.\end{array}$$

(5.84)

Moreover, using (5.64), as in the proof of Theorem 5.4, we obtain that

$$E\exp \left ({ \frac{{\theta }_{T}} {{(T + 1)}^{\frac{3} {2} }} \sup }_{0\leq \varsigma \leq T}{\left |{ \int }_{0}^{\varsigma }\left (\frac{{\eta }^\varepsilon (t)\widetilde{\beta }(t)} {\sqrt\varepsilon } \right )dt\right |}\right ) \leq K,$$

(5.85)

for

$$0 \leq {\theta }_{T} \leq \frac{\min \{1,{\kappa }_{0}\}} {{K}_{T}\vert \beta {\vert }_{T}(1 + \vert \widehat{Q}{\vert }_{T})}.$$

Finally, combine (5.81), (5.83), and (5.85) to obtain

$$E\exp \left ({ \frac{{\theta }_{T}} {{(T + 1)}^{3}}\sup }_{0\leq t\leq T}\vert {n}^\varepsilon (t)\vert \right ) \leq K.$$

This completes the proof. □

Remark 5.31

It is easily seen that the error bound so obtained has a form similar to that of the martingale inequality. If n^ε(⋅) were a martingale, the inequality would be obtained much more easily since exp (⋅) is a convex function. As in Section 5.2, the error bound is still a measure of “goodness” of approximation. However, one cannot compare the unscaled occupation measures with a deterministic function. A sensible alternative is to use an approximation by the aggregated process that is no longer deterministic. The exponential bounds obtained tell us exactly how closely one can carry out the approximation. It should be particularly useful for many applications in stochastic control problems with Markovian jump disturbances under discounted cost criteria.

The next two corollaries show that the error bound can be improved under additional conditions by having smaller exponential constants, e.g., ${(T + 1)}^{3/2}$ or ${(T + 1)}^{5/2}$ instead of ( T + 1)³.

Corollary 5.32

Assume that the conditions of Theorem 5.29 hold. Let $\widetilde{Q}(t) = (\widetilde{{q}}_{ij}(t))$ and $\overline{Q}(t) = ({\overline{q}}_{ij}(t))$ denote generators with the corresponding state spaces $\{{a}_{1},\ldots,{a}_{{m}_{0}}\}$ and {1,…,l}, respectively. Consider

$${Q}^\varepsilon (t) = \frac{1} \varepsilon \left (\begin{array}{ccc} \widetilde{Q}(t)&& \\ &\ddots &\\ & &\widetilde{Q}(t) \\ \end{array} \right )+\left (\begin{array}{ccc} {\overline{q}}_{11}(t){I}_{{m}_{0}} & \cdots &{\overline{q}}_{1l}(t){I}_{{m}_{0}} \\ \vdots & \cdots & \vdots \\ {\overline{q}}_{l1}(t){I}_{{m}_{0}} & \cdots & {\overline{q}}_{ll}(t){I}_{{m}_{0}}\\ \end{array} \right ),$$

where ${I}_{{m}_{0}}$ is the m ₀ × m ₀ identity matrix. Then there exist positive constants ε ₀ and K such that for 0 < ε ≤ ε ₀ , and T ≥ 0,

$$E\exp \left ({ \frac{{\theta }_{T}} {{(T + 1)}^{\frac{3} {2} }} \sup }_{0\leq t\leq T}\vert {n}^\varepsilon (t)\vert \right ) \leq K.$$

Proof: Under the special structure of the generator Q ^ε, it is easy to see that

$$\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}} =\widetilde{ \mathrm{1}\mathrm{l}}\overline{Q}(s),$$

where $\widetilde{\mathrm{1}\mathrm{l}}$ now takes the form

$$\widetilde{\mathrm{1}\mathrm{l}} = \mathrm{diag}(\mathrm{1}{\mathrm{l}}_{{m}_{0}},\ldots,\mathrm{1}{\mathrm{l}}_{{m}_{0}}).$$

Note that under current conditions on the fast-changing part of the generator $\widetilde{Q}(t)$,

$${\nu }^{1}(t) = {\nu }^{2}(t) = \cdots = {\nu }^{l}(t)\mbox{ and }\mbox{ diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t))\widetilde{\mathrm{1}\mathrm{l}} = {I}_{ l},$$

where I _l denotes the l-dimensional identity matrix. This together with (5.72) implies that

$$({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}} = 0.$$

It follows from (5.71) that

$${\int }_{0}^{t}({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s){P}_{ 0}(t,s)ds = 0.$$

Then (5.75) becomes

$${\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t) = {\eta }^\varepsilon (t).$$

The rest of the proof follows exactly that of Theorem 5.29. □

Corollary 5.33

Assume the conditions of Theorem 5.29 . Suppose $\widetilde{Q}(t) =\widetilde{ Q}$ and $\widehat{Q}(t) =\widehat{ Q}$ for some constant matrices $\widetilde{Q}$ and $\widehat{Q}$ . Then there exist positive constants ε ₀ and K such that for 0 < ε ≤ ε ₀ , and T ≥ 0,

$$E\exp \left ({ \frac{{\theta }_{T}} {{(T + 1)}^{\frac{5} {2} }} \sup }_{0\leq t\leq T}\vert {n}^\varepsilon (t)\vert \right ) \leq K.$$

Remark 5.34

Note that in view of Corollary 4.31 , one can show under the condition $\widetilde{Q}(t) =\widetilde{ Q}$ and $\widehat{Q}(t) =\widehat{ Q}$ that there exists a constant K such that

$${P}^\varepsilon (t,s) - {P}_{ 0}(t,s) = K(T + 1){O}_{1}\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right ).$$

In this case, θ _T can be taken as

$$0 \leq {\theta }_{T} \leq \frac{\min \{1,{\kappa }_{0}\}} {K(T + 1)\vert \beta {\vert }_{T}(1 + \vert \widehat{Q}{\vert }_{T})}.$$

That is, compared with the general result, the constant K _T can be further specified as ${K}_{T} = K(T + 1)$.

Proof of Corollary 5.33: Note that when the generators are time independent, the quasi-stationary distribution νⁱ(t) is also independent of time and is denoted by νⁱ. In this case, the argument from (5.75) to (5.80) can be replaced by the following. Let

$$\check{{Q}}_{0} =\widehat{ Q}\widetilde{\mathrm{1}\mathrm{l}}\mathrm{diag}({\nu }^{1},\ldots,{\nu }^{l}).$$

Then it can be shown that

$$\widehat{Q}\widetilde{\mathrm{1}\mathrm{l}}{\left (\overline{Q}\right )}^{k}\mathrm{diag}({\nu }^{1},\ldots,{\nu }^{l}) = {(\check{{Q}}_{ 0})}^{k+1},\mbox{ for }k \geq 0.$$

This implies that

$$\begin{array}{ll} \widehat{Q}{P}_{0}(t,s)& =\widehat{ Q}\widetilde{\mathrm{1}\mathrm{l}}\exp \left (\overline{Q}(t - s)\right )\mathrm{diag}({\nu }^{1},\ldots,{\nu }^{l}) \\ & =\check{ {Q}}_{0}\exp (\check{{Q}}_{0}(t - s))\end{array}$$

Let ${\phi }^\varepsilon (t) = ({\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t)) - {\eta }^\varepsilon (t)$. Then ϕ^ε( ⋅) satisfies the equation

$${\phi }^\varepsilon (t) +{ \int }_{0}^{t}({\phi }^\varepsilon (s) + {\eta }^\varepsilon (s))\check{{Q}}_{ 0}\exp (\check{{Q}}_{0}(t - s))ds = 0.$$

Solving for ϕ^ε( ⋅), we obtain

$${\phi }^\varepsilon (t) = -{\int }_{0}^{t}{\eta }^\varepsilon (s)\check{{Q}}_{ 0}ds.$$

Writing ${\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t)$ in terms of ϕ^ε (t) and η ^ε(t) yields,

$${\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t) = {\eta }^\varepsilon (t) -{\int }_{0}^{t}{\eta }^\varepsilon (s)\check{{Q}}_{ 0}ds.$$

The rest of the proof follows that of Theorem 5.29. □

Similar to Section 5.2, we derive estimates that are analogous to Corollary 5.7 and Corollary 5.8. The details are omitted, however.

3.3 Asymptotic Distributions

In Section 5.2, we obtained a central limit theorem for a class of Markov chains generated by ${Q}^\varepsilon (t) = Q(t)/\varepsilon +\widehat{ Q}(t)$ with a weakly irreducible Q( t). In this case for sufficiently small ε > 0, Q ^ε(t) is weakly irreducible. What, if anything, can be said about the weak and strong interaction models, when $\widetilde{Q}(t)$ is not weakly irreducible? Is there a central limit theorem for the corresponding occupation measure when one has a singularly perturbed Markov chain with weak and strong interactions? This section deals with such an issue; our interest lies in the asymptotic distribution as ε → 0. It is shown that the asymptotic distribution of the corresponding occupation measure can be obtained. However, the limit distribution is no longer Gaussian, but a Gaussian mixture, and the proof is quite different from that of the irreducible case in Section 5.2.

For each i = 1, …, l, j = 1, …, m _i, $\alpha \in \mathcal{M}$ , and t ≥ 0, let β_ij(t) be a bounded Borel measurable deterministic function. Use W _ij(t, α) defined in (5.61) and the normalized occupation measure

$${n}^\varepsilon (t) = \left ({n}_{ 11}^\varepsilon (t),\ldots,{n}_{ 1{m}_{1}}^\varepsilon (t),\ldots,{n}_{ l1}^\varepsilon (t),\ldots,{n}_{ l{m}_{l}}^\varepsilon (t)\right ),$$

with

$${n}_{ij}^\varepsilon (t) = \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}{W}_{ ij}(s,{\alpha }^\varepsilon (s))ds.$$

We will show in this section that n ^ε( ⋅) converges weakly to a switching diffusion modulated by $\overline{\alpha }(\cdot )$. The procedure is as follows:

(a)
Show that $({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))$ is tight;
(b)
verify that the limit of a subsequence of $({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))$ is a solution to a martingale problem that has a unique solution;
(c)
characterize the solution of the associated martingale problem;
(d)
construct a switching diffusion that is also a solution to the martingale problem and therefore the limit of $({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))$.

To accomplish our goal, these steps are realized by proving a series of lemmas. Recall that ${\mathcal{F}}_{t}^\varepsilon = \sigma \{{\alpha }^\varepsilon (s) :\; 0 \leq s \leq t\}$ denotes the filtration generated by α^ε( ⋅). The lemma below is on the order estimates of the conditional moments, and is useful for getting the tightness result in what follows.

Lemma 5.35

Assume (A5.5) and (A5.6) . Then for all 0 ≤ s ≤ t ≤ T and ε small enough, the following hold:

$$\begin{array}{l} {\mbox{ (a) }\sup }_{s\leq t\leq T}E[{n}^\varepsilon (t) - {n}^\varepsilon (s)\vert {\mathcal{F}}_{ s}^\varepsilon ] = O(\sqrt\varepsilon ); \\ {\mbox{ (b) }\sup }_\varepsilon E\left [\;\vert {n}^\varepsilon (t) - {n}^\varepsilon (s){\vert }^{2}\vert {\mathcal{F}}_{ s}^\varepsilon \right ] = O(t - s)\end{array}$$

Proof: First, note that for any fixed i, j,

$$E[({n}_{ij}^\varepsilon (t) - {n}_{ ij}^\varepsilon (s))\vert {\mathcal{F}}_{ s}^\varepsilon ] = \frac{1} {\sqrt\varepsilon }{\int }_{s}^{t}E[{W}_{ ij}(r,{\alpha }^\varepsilon (r))\vert {\mathcal{F}}_{ s}^\varepsilon ]dr.$$

Moreover, in view of the definition of W _ij(t, α) and the Markov property, we have, for 0 ≤ s ≤ r,

$$\begin{array}{l} E[{W}_{ij}(r,{\alpha }^\varepsilon (r))\vert {\mathcal{F}}_{ s}^\varepsilon ] \\ \quad = E\left [\left ({I}_{\{{\alpha }^\varepsilon (r)={s}_{ij}\}} - {\nu }_{j}^{i}(r){I}_{\{{ \alpha }^\varepsilon (r)\in {\mathcal{M}}_{i}\}}\right )\vert {\mathcal{F}}_{s}^\varepsilon \right ]{\beta }_{ ij}(r) \\ \quad = \left (P({\alpha }^\varepsilon (r) = {s}_{ ij}\vert {\mathcal{F}}_{s}^\varepsilon ) - {\nu }_{ j}^{i}(r)P({\alpha }^\varepsilon (r) \in {\mathcal{M}}_{ i}\vert {\mathcal{F}}_{s}^\varepsilon )\right ){\beta }_{ ij}(r) \\ \quad = \left (P({\alpha }^\varepsilon (r) = {s}_{ ij}\vert {\alpha }^\varepsilon (s)) - {\nu }_{ j}^{i}(r)P({\alpha }^\varepsilon (r) \in {\mathcal{M}}_{ i}\vert {\alpha }^\varepsilon (s))\right ){\beta }_{ ij}(r)\end{array}$$

In view of Lemma 5.24, in particular, similar to (5.55) and (5.56), for all i ₀ = 1, …, l and ${j}_{0} = 1,\ldots,{m}_{{i}_{0}}$,

$$\begin{array}{l} P({\alpha }^\varepsilon (r) = {s}_{ ij}\vert {\alpha }^\varepsilon (s) = {s}_{{ i}_{0}{j}_{0}}) - {\nu }_{j}^{i}(r)P({\alpha }^\varepsilon (r) \in {\mathcal{M}}_{ i}\vert {\alpha }^\varepsilon (s) = {s}_{{ i}_{0}{j}_{0}}) \\ = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(r - s)} \varepsilon \right )\right )\end{array}$$

Thus owing to Lemma A.42, we have

$$\begin{array}{rl} &\left (P({\alpha }^\varepsilon (r) = {s}_{ ij}\vert {\alpha }^\varepsilon (s)) - {\nu }_{ j}^{i}(r)P({\alpha }^\varepsilon (r) \in {\mathcal{M}}_{ i}\vert {\alpha }^\varepsilon (s))\right ){\beta }_{ ij}(r) \\ &\quad = \sum \limits_{{i}_{0}=1}^{l} \sum \limits_{{j}_{0}=1}^{{m}_{{i}_{0}} }{I}_{\{{\alpha }^\varepsilon (s)={s}_{{i}_{ 0}{j}_{0}}\}}\left (P({\alpha }^\varepsilon (r) = {s}_{ ij}\vert {\alpha }^\varepsilon (s) = {s}_{{ i}_{0}{j}_{0}})\right. \\ &\left.\quad \quad \quad - {\nu }_{j}^{i}(r)P({\alpha }^\varepsilon (r) \in {\mathcal{M}}_{ i}\vert {\alpha }^\varepsilon (s) = {s}_{{ i}_{0}{j}_{0}})\right ){\beta }_{ij}(r) \\ &\quad = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(r - s)} \varepsilon \right )\right )\end{array}$$

Note also that

$$\frac{1} {\sqrt\varepsilon }{\int }_{s}^{t}O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(r - s)} \varepsilon \right )\right )dr = O(\sqrt\varepsilon ).$$

This implies (a).

To verify (b), fix and suppress i, j and define

$${\eta }^\varepsilon (t) = E\left [\left ({{\int }_{s}^{t}{W}_{ ij}(r,{\alpha }^\varepsilon (r))dr}\right )^{2}\left |{\mathcal{F}}_{ s}^\varepsilon \right]\right..$$

Then by the definition of n _ij( ⋅),

$$E\left [{\left ({n}_{ij}^\varepsilon (t) - {n}_{ ij}^\varepsilon (s)\right )}^{2}\left |{\mathcal{F}}_{ s}^\varepsilon \right ] = \frac{{\eta }^\varepsilon (t)} \varepsilon .\right.$$

(5.86)

In accordance with the definition of ${\overline{\alpha }}^\varepsilon (\cdot )$,${\overline{\alpha }}^\varepsilon (t) = i$ iff ${\alpha }^\varepsilon (t) \in {\mathcal{M}}_{i}$ . In what follows, we use ${\alpha }^\varepsilon (t) \in {\mathcal{M}}_{i}$ and ${\overline{\alpha }}^\varepsilon (t) = i$ interchangeably. Set

$$\begin{array}{rl} &{\Psi }_{1}^\varepsilon (t,r) = {I}_{\{{ \alpha }^\varepsilon (r)={s}_{ij}\}}{I}_{\{{\alpha }^\varepsilon (t)={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{{ \alpha }^\varepsilon (r)={s}_{ij}\}}{I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}}, \\ &{\Psi }_{2}^\varepsilon (t,r) = -{\nu }_{ j}^{i}(r){I}_{\{{\overline{\alpha }}^{ \varepsilon }(r)=i\}}{I}_{\{{\alpha }^\varepsilon (t)={s}_{ij}\}} + {\nu }_{j}^{i}(r){\nu }_{ j}^{i}(t){I}_{\{{\overline{\alpha }}^{ \varepsilon }(r)=i\}}{I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}}\end{array}$$

Then as in the proof of Theorem 5.25,

$${ d{\eta }^\varepsilon (t) \over dt} = 2{\int }_{s}^{t}E\left [{\Psi }_{ 1}^\varepsilon (t,r) + {\Psi }_{ 2}^\varepsilon (t,r)\vert {\mathcal{F}}_{ s}^\varepsilon \right ]{\beta }_{ ij}(r){\beta }_{ij}(t)dr.$$

Using Lemma 5.24, we obtain

$$\begin{array}{rl} &E[{\Psi }_{1}^\varepsilon (t,r)\vert {\alpha }^\varepsilon (s) = {s}_{{ i}_{0}{j}_{0}}] = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ), \\ &E[{\Psi }_{2}^\varepsilon (t,r)\vert {\alpha }^\varepsilon (s) = {s}_{{ i}_{0}{j}_{0}}] = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ), \end{array}$$

for all i ₀ = 1, …, l and ${j}_{0} = 1,\ldots,{m}_{{i}_{0}}$. Then from Lemma A.42, we obtain

$$\begin{array}{rl} &E[{\Psi }_{1}^\varepsilon (t,r)\vert {\mathcal{F}}_{ s}^\varepsilon ] = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ), \\ &E[{\Psi }_{2}^\varepsilon (t,r)\vert {\mathcal{F}}_{ s}^\varepsilon ] = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right )\end{array}$$

As a consequence, we have

$$\frac{d{\eta }^\varepsilon (t)} {dt} = O(\varepsilon ).$$

Integrating both sides over [s, t] and recalling η^ε(s) = 0 yields

$$\frac{{\eta }^\varepsilon (t)} \varepsilon = O(t - s).$$

This completes the proof of the lemma. □

The next lemma is concerned with the tightness of $\{({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))\}$.

Lemma 5.36

Assume (A5.5) and (A5.6) . Then $\{({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))\}$ is tight in $D([0,T]; {\mathbb{R}}^{m} \times \overline{\mathcal{M}})$.

Proof: The proof uses Lemma A.17. We first verify that the condition given in Remark A.18 holds. To this end, note that $0 \leq {\overline{\alpha }}^\varepsilon (t) \leq l$ for all t ∈ [0, T]. Moreover, by virtue of Theorem 5.25, for each δ > 0 and each rational t ≥ 0,

$$\begin{array}{rl} {\inf }_\varepsilon P\left (\vert {n}^\varepsilon (t)\vert \leq {K}_{ t,\delta }\right )& {=\inf }_\varepsilon [1 - P(\vert {n}^\varepsilon (t)\vert \geq {K}_{ t,\delta })] \\ & {\geq \inf }_\varepsilon \left (1 -\frac{E\vert {n}^\varepsilon (t){\vert }^{2}} {{K}_{t,\delta }^{2}} \right ) \\ & \geq 1 - \frac{Kt} {{K}_{t,\delta }^{2}}, \end{array}$$

where the last inequality is due to Theorem 5.25. Thus if we choose ${K}_{t,\delta } > \sqrt{KT/\delta }$, (A.6) will follow.

It follows from Lemma 5.35 and (5.58) that for all t ∈ [0, T],

$$\begin{array}{l} \lim \limits_{\Delta \rightarrow 0}{\left \{ limsup{}_{\varepsilon \rightarrow 0}\left ({\sup }_{0\leq s\leq \Delta }E\left \{E\left [\;\vert {n}_{ij}^\varepsilon (t + s) - {n}_{ ij}^\varepsilon (t){\vert }^{2}\vert {\mathcal{F}}_{ t}^\varepsilon \right ]\right \}\right )\right \}} = 0, \\ \lim \limits_{\Delta \rightarrow 0}{\left \{ limsup{}_{\varepsilon \rightarrow 0}\left ({\sup }_{0\leq s\leq \Delta }E\left \{E\left [\;\vert {\overline{\alpha }}^\varepsilon (t + s) -{\overline{\alpha }}^\varepsilon (t){\vert }^{2}\vert {\mathcal{F}}_{ t}^\varepsilon \right ]\right \}\right )\right \}} = 0.\end{array}$$

(5.87)

Using (5.86) and (5.87), Theorem A.17 yields the desired result. □

The tightness of $({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))$ and Prohorov’s theorem allow one to extract convergent subsequences. We next show that the limit of such a subsequence is uniquely determined in distribution. An equivalent statement is that the associated martingale problem has a unique solution. The following lemma is a generalization of Theorem 5.25 and is needed for proving such a uniqueness property.

Lemma 5.37

Let ξ(t,x) be a real-valued function that is Lipschitz in (t,x) $\in {\mathbb{R}}^{m+1}$ . Then

$${\sup }_{0\leq \varsigma \leq T}E{|{\int }_{0}^{\varsigma }{W}_{ ij}(s,{\alpha }^\varepsilon (s))\xi (s,{n}^\varepsilon (s))d{s|}}^{2} \rightarrow 0,$$

where ${W}_{ij}(t,\alpha ) = ({I}_{\{\alpha ={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{\alpha \in {\mathcal{M}}_{i}\}}){\beta }_{ij}(t)$ as defined in (5.61).

Remark 5.38

This lemma indicates that the weighted occupation measure (with weighting function ξ(t,α^ε(t))) defined above goes to zero in mean square uniformly in t ∈ [0,ς]. If ξ(⋅) were a bounded and measurable deterministic function not depending on α^ε(⋅) or n^ε(⋅), this assertion would follow from Theorem 5.25 easily. In the current situation, it is a function of n ^ε (⋅) and therefore a function of α ^ε (⋅), which results in much of the difficulty. Intuitively, if we can “separate” the functions W _ij (⋅) and ξ(⋅) in the sense treating ξ(⋅) as deterministic, then Theorem 5.25 can be applied to obtain the desired limit. To do so, subdivide the interval [0,ς] into small intervals so that on each of the small intervals, the two functions can be separated. To be more specific, on each partitioned interval, use a piecewise-constant function to approximate ξ(⋅), and show that the error goes to zero. In this process, the Lipschitz condition of ξ(t,x) plays a crucial role.

Proof of Lemma 5.37: For 0 < δ < 1 and 0 < ς ≤ T, let $N = [\varsigma /\varepsilon ^{1-\delta }]$. Use a partition of [0, ς] given by

$$[{t}_{0},{t}_{1}] \cup [{t}_{1},{t}_{2}) \cup \cdots \cup [{t}_{N},{t}_{N+1}]$$

of [0, ς], where ${t}_{k} = \varepsilon ^{1-\delta }k$ for k = 0, 1, …, N and ${t}_{N+1} = \varsigma $ . Consider a piecewise-constant function

$$\widetilde{\xi }(t) = \left \{\begin{array}{ll} \xi (0,{n}^\varepsilon (0)), &\mbox{ if }0 \leq t < {t}_{2}, \\ \xi ({t}_{k-1},{n}^\varepsilon ({t}_{k-1})), &\mbox{ if }{t}_{k} \leq t < {t}_{k+1},\;k = 2,\ldots N, \\ \xi ({t}_{N-1},{n}^\varepsilon ({t}_{N-1})),&\mbox{ if }t = {t}_{N+1}. \end{array} \right.$$

Let W _ij ^ε(t) = W _ij(t, α ^ε(t)). Then

$$\begin{array}{l} E{ |{ \int }_{0}^{\varsigma }{W}_{ ij}^\varepsilon (t)\xi (t,{n}^\varepsilon (t))d{t |}}^{2} \\ \leq 2E{ |{\int }_{0}^{\varsigma }{W}_{ ij}^\varepsilon (t)\vert \xi (t,{n}^\varepsilon (t)) -\widetilde{ \xi }(t)\vert d{t |}}^{2} + 2E{ |{\int }_{0}^{\varsigma }{W}_{ ij}^\varepsilon (t)\widetilde{\xi }(t)d{t |}}^{2}.\end{array}$$

(5.88)

We now estimate the first term on the second line above. In view of the Cauchy inequality and the boundedness of W _ij ^ε (t), it follows, for 0 ≤ ς ≤ T, that

$$\begin{array}{rl} E{ |{\int }_{0}^{\varsigma }{W}_{ ij}^\varepsilon (t)\vert \xi (t,{n}^\varepsilon (t)) -\widetilde{ \xi }(t)\vert d{t |}}^{2} & \leq TE{\int }_{0}^{\varsigma }{(\xi (t,{n}^\varepsilon (t)) -\widetilde{ \xi }(t))}^{2}dt \\ & = T{\int }_{0}^{\varsigma }E{(\xi (t,{n}^\varepsilon (t)) -\widetilde{ \xi }(t))}^{2}dt.\end{array}$$

Note that Theorem 5.25 implies

$$E\vert {n}^\varepsilon (t){\vert }^{2} \leq K,$$

for a positive constant K and for all t ∈ [0, T]. Therefore, in view of the Lipschitz condition of ξ( ⋅), we have

$$E\vert \xi (t,{n}^\varepsilon (t))\vert \leq K(1 + E\vert {n}^\varepsilon (t)\vert ) \leq K(1 + {(E\vert {n}^\varepsilon (t){\vert }^{2})}^{\frac{1} {2} }) = O(1).$$

Noting that ${t}_{2} = 2\varepsilon ^{1-\delta } = O(\varepsilon ^{1-\delta })$ , it follows that

$$\begin{array}{l} { \int }_{0}^{\varsigma }E{(\xi (t,{n}^\varepsilon (t)) -\widetilde{ \xi }(t))}^{2}dt \\ \quad = \sum \limits_{k=2}^{N}{ \int }_{{t}_{k}}^{{t}_{k+1} }E{(\xi (t,{n}^\varepsilon (t)) -\widetilde{ \xi }(t))}^{2}dt + O(\varepsilon ^{1-\delta })\end{array}$$

Using the definition of $\widetilde{\xi }(t)$ , the Lipschitz property of ξ( t, x) in ( t, x), the choice of the partition of [0, ς], and Lemma 5.35, we have

$$\begin{array}{l} \sum \limits_{k=2}^{N}{ \int }_{{t}_{k}}^{{t}_{k+1} }E{(\xi (t,{n}^\varepsilon (t)) -\widetilde{ \xi }(t))}^{2}dt \\ \quad = \sum \limits_{k=2}^{N}{ \int }_{{t}_{k}}^{{t}_{k+1} }E{(\xi (t,{n}^\varepsilon (t)) - \xi ({t}_{ k-1},{n}^\varepsilon ({t}_{ k-1})))}^{2}dt \\ \quad \leq 2{\sum }_{k=2}^{N}{ \int }_{{t}_{k}}^{{t}_{k+1} }K\left ({(t - {t}_{k-1})}^{2} + E\vert {n}^\varepsilon (t) - {n}^\varepsilon ({t}_{ k-1})){\vert }^{2}\right )dt \\ \quad \leq 2{\sum }_{k=2}^{N}{ \int }_{{t}_{k}}^{{t}_{k+1} }K\left ({(t - {t}_{k-1})}^{2} + O(t - {t}_{ k-1})\right )dt \\ \quad = 2{\sum }_{k=2}^{N}{ \int }_{{t}_{k}}^{{t}_{k+1} }O(\varepsilon ^{1-\delta })dt = O(\varepsilon ^{1-\delta })\end{array}$$

Let us estimate the second term on the second line in (5.88). Set

$$\widetilde{{\eta }}^\varepsilon (t) = E{\left ({\int }_{0}^{t}{W}_{ ij}^\varepsilon (s)\widetilde{\xi }(s)ds\right )}^{2}.$$

Then the derivative of $\widetilde{{\eta }}^\varepsilon (t)$ is given by

$$\begin{array}{l} \frac{d\widetilde{{\eta }}^\varepsilon (t)} {dt} = 2E{\int }_{0}^{t}{W}_{ ij}^\varepsilon (s)\widetilde{\xi }(s){W}_{ ij}^\varepsilon (t)\widetilde{\xi }(t)ds \\ \quad = 2{\int }_{0}^{t}E\left ({W}_{ ij}^\varepsilon (s)\widetilde{\xi }(s){W}_{ ij}^\varepsilon (t)\widetilde{\xi }(t)\right )ds\end{array}$$

For 0 ≤ t ≤ t ₂, in view of the Lipschitz property and Theorem 5.25, we obtain

$$\begin{array}{rl} {\int }_{0}^{{t}_{2} }E\left ({W}_{ij}^\varepsilon (s)\widetilde{\xi }(s){W}_{ ij}^\varepsilon (t)\widetilde{\xi }(t)\right )ds& \leq {\int }_{0}^{{t}_{2} }E\left (\vert \widetilde{\xi }(s)\vert \cdot \vert \widetilde{\xi }(t)\vert \right )ds \\ & \leq {\int }_{0}^{{t}_{2} }{(E\vert \widetilde{\xi }(s){\vert }^{2})}^{\frac{1} {2} }{(E\vert \widetilde{\xi }(t){\vert }^{2})}^{\frac{1} {2} }ds \\ & = O({t}_{2}) = O(\varepsilon ^{1-\delta })\end{array}$$

If t _k ≤ t < t _k + 1, for k = 2, …, N, then using the same argument gives us

$$\begin{array}{l} { \int }_{k-1}^{t}E\left ({W}_{ ij}^\varepsilon (s)\widetilde{\xi }(s){W}_{ ij}^\varepsilon (t)\widetilde{\xi }(t)\right )ds \\ \quad = O(t - {t}_{k-1}) = O({t}_{k+1} - {t}_{k-1}) = O(\varepsilon ^{1-\delta }) \end{array}$$

and

$$\frac{d\widetilde{{\eta }}^\varepsilon (t)} {dt} = 2{\int }_{0}^{{t}_{k-1} }E\left ({W}_{ij}^\varepsilon (s)\widetilde{\xi }(s){W}_{ ij}^\varepsilon (t)\widetilde{\xi }(t)\right )ds + O(\varepsilon ^{1-\delta }).$$

Recall that ${\mathcal{F}}_{t}^\varepsilon = \sigma \{{\alpha }^\varepsilon (s) :\; 0 \leq s \leq t\}$ . For $s \leq {t}_{k-1} < {t}_{k} \leq t < {t}_{k+1}$,

$$\begin{array}{l} E\left ({W}_{ij}^\varepsilon (s)\widetilde{\xi }(s){W}_{ ij}^\varepsilon (t)\widetilde{\xi }(t)\right ) \\ \quad = E\left ({W}_{ij}^\varepsilon (s)\widetilde{\xi }(s)E[{W}_{ ij}^\varepsilon (t)\widetilde{\xi }(t)\vert {\mathcal{F}}_{{ t}_{k-1}}]\right ).\end{array}$$

(5.89)

Moreover, in view of the definition of $\widetilde{\xi }(\cdot )$ and the proof of Lemma 5.35, we have for some κ₀ > 0,

$$\begin{array}{rl} E[{W}_{ij}^\varepsilon (t)\widetilde{\xi }(t)\vert {\mathcal{F}}_{{ t}_{k-1}}]& =\widetilde{ \xi }(t)E[{W}_{ij}^\varepsilon (t)\vert {\mathcal{F}}_{{ t}_{k-1}}] \\ & =\widetilde{ \xi }(t)O\left (\varepsilon +\exp \left (-\frac{{\kappa }_{0}(t - {t}_{k-1})} \varepsilon \right )\right ) \\ & =\widetilde{ \xi }(t)O\left (\varepsilon +\exp \left (-\frac{{\kappa }_{0}({t}_{k} - {t}_{k-1})} \varepsilon \right )\right ) \\ & =\widetilde{ \xi }(t)O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}} {\varepsilon ^{\delta }} \right )\right ) =\widetilde{ \xi }(t)O(\varepsilon )\end{array}$$

Combine this with (5.89) to obtain

$$E\left ({W}_{ij}^\varepsilon (s)\widetilde{\xi }(s){W}_{ ij}^\varepsilon (t)\widetilde{\xi }(t)\right ) = O(\varepsilon )E\vert \widetilde{\xi }(s)\widetilde{\xi }(t)\vert = O(\varepsilon ).$$

Therefore,

$$\frac{d\widetilde{{\eta }}^\varepsilon (t)} {dt} = O(\varepsilon ^{1-\delta })$$

uniformly on [0, T], which implies, together with $\widetilde{{\eta }}^\varepsilon (0) = 0$, that

$${\sup }_{0\leq \varsigma \leq T}\widetilde{{\eta }}^\varepsilon (\varsigma ) {=\sup }_{ 0\leq \varsigma \leq T}{ \int }_{0}^{\varsigma }\left (\frac{d\widetilde{{\eta }}^\varepsilon (t)} {dt} \right )dt = O(\varepsilon ^{1-\delta }).$$

This completes the proof. □

To characterize the limit of $({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))$, consider the martingale problem associated with $({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))$. Note that

$$\frac{d{n}^\varepsilon (t)} {dt} = \frac{1} {\sqrt\varepsilon }W(t,{\alpha }^\varepsilon (t))\mbox{ and }{n}^\varepsilon (0) = 0,$$

where

$$\begin{array}{l} W(t,\alpha ) = \left ({W}_{11}(t,\alpha ),\ldots,{W}_{1{m}_{1}}(t,\alpha ),\ldots,{W}_{l1}(t,\alpha ),\ldots,{W}_{l{m}_{l}}(t,\alpha )\right )\end{array}$$

Let ${\mathcal{G}}^\varepsilon (t)$ be the operator

$$\begin{array}{l} {\mathcal{G}}^\varepsilon (t)f(t,x,\alpha ) = \frac{\partial } {\partial t}f(t,x,\alpha ) + \frac{1} {\sqrt\varepsilon }\langle W(t,\alpha ),{\nabla }_{x}f(t,x,\alpha )\rangle \\ + {Q}^\varepsilon (t)f(t,x,\cdot )(\alpha ), \end{array}$$

for all f( ⋅, ⋅, α) ∈ C ^1, 1, where ∇ _x denotes the gradient with respect to x and ⟨ ⋅, ⋅⟩ denotes the usual inner product in Euclidean space. It is well known that (see Davis [41, Chapter 2])

$$f(t,{n}^\varepsilon (t),{\alpha }^\varepsilon (t)) -{\int }_{0}^{t}{\mathcal{G}}^\varepsilon (s)f(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s))ds$$

(5.90)

is a martingale.

We use the perturbed test function method (see Ethier and Kurtz [59] and Kushner [139]) to study the limit as ε → 0. To begin with, we define a functional space on ${\mathbb{R}}^{m} \times \overline{\mathcal{M}}$

$$\begin{array}{l} {C}_{L}^{2} = \{{f}^{0}(x,i) : \mbox{ with bounded derivatives up to the } \\ \mbox{ second order such that the second derivative is Lipschitz}\}.\end{array}$$

(5.91)

For any real-valued function ${f}^{0}(\cdot,i) \in {C}_{L}^{2}$, define

$$\overline{f}(x,\alpha ) = \sum \limits_{i=1}^{l}{f}^{0}(x,i){I}_{\{ \alpha \in {\mathcal{M}}_{i}\}} = \left \{\begin{array}{cc} {f}^{0}(x,1),&\mbox{ if }\alpha \in {\mathcal{M}}_{1},\\ \vdots & \vdots \\ {f}^{0}(x,l), & \mbox{ if }\alpha \in {\mathcal{M}}_{l},\\ \end{array} \right.$$

and consider the function

$$f(t,x,\alpha ) = \overline{f}(x,\alpha ) + \sqrt\varepsilon h(t,x,\alpha ),$$

(5.92)

where h( t, x, α) is to be specified later. The main idea is that by appropriate choice of h( ⋅), the perturbation is small and results in the desired cancelation in the calculation.

In view of the block-diagonal structure of $\widetilde{Q}(t)$ and the definition of $\overline{f}(x,\alpha )$, it is easy to see that

$$\widetilde{Q}(t)\overline{f}(x,\cdot )(\alpha ) = 0.$$

Applying the operator ${\mathcal{G}}^\varepsilon (t)$ to the function f( ⋅) defined in (5.92) yields that

$$\begin{array}{l} \overline{f}({n}^\varepsilon (t),{\alpha }^\varepsilon (t)) + \sqrt\varepsilon h(t,{n}^\varepsilon (t),{\alpha }^\varepsilon (t)) \\ \quad -{\int }_{0}^{t}\left \{ \frac{1} {\sqrt\varepsilon }\langle W(s,{\alpha }^\varepsilon (s)),{\nabla }_{ x}\overline{f}({n}^\varepsilon (s),{\alpha }^\varepsilon (s)) + \sqrt\varepsilon {\nabla }_{ x}h(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s))\rangle \right.\\ \qquad \qquad + \sqrt\varepsilon \frac{\partial } {\partial s}h(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s)) + \frac{1} {\sqrt\varepsilon }\widetilde{Q}(s)h(s,{n}^\varepsilon (s),\cdot )({\alpha }^\varepsilon (s)) \\ \left.\qquad \qquad +\widehat{ Q}(s)(\overline{f}({n}^\varepsilon (s),\cdot ) + \sqrt\varepsilon h(s,{n}^\varepsilon (s),\cdot )({\alpha }^\varepsilon (s))\right \}ds \end{array}$$

defines a martingale.

The basic premise of the perturbed test function method is to choose the function h( ⋅) that cancels the “bad” terms of order 1 ∕ \sqrt{ε}:

$$\widetilde{Q}(s)h(s,x,\cdot )(\alpha ) = -\langle W(s,\alpha ),{\nabla }_{x}\overline{f}(x,\alpha )\rangle.$$

(5.93)

Note that as mentioned previously, $\widetilde{Q}(t)$ has rank m − l. Thus the dimension of the null space is l; that is, $N(\widetilde{Q}(t)) = l$. A crucial observation is that in view of the Fredholm alternative (see Lemma A.37 and Corollary A.38), a solution of (5.93) exists iff the matrix $(\langle W(s,{s}_{ij}),{\nabla }_{x}\overline{f}(x,{s}_{ij})\rangle )$ is orthogonal to $\widetilde{\mathrm{1}{\mathrm{l}}}_{{m}_{1}},\ldots,\widetilde{\mathrm{1}{\mathrm{l}}}_{{m}_{l}}$ , the span of $N(\widetilde{Q}(t))$ (see Remark 4.23 for the notation). Moreover, since f ⁰( ⋅, i) is C _L ²,h( ⋅) can be chosen to satisfy the following properties assuming β_ij( ⋅) to be Lipschitz on [0, T]:

$$\begin{array}{l} \mbox{ (1) }h(t,x,\alpha )\mbox{ is uniformly Lipschitz in }t; \\ \mbox{ (2) }\vert h(t,x,\alpha )\vert \mbox{ and }\vert {\nabla }_{x}h(t,x,\alpha )\vert \mbox{ are bounded; } \\ \mbox{ (3) }{\nabla }_{x}h(t,x,\alpha )\mbox{ is Lipschitz in }(t,x)\end{array}$$

Such an $h(\cdot )$ leads to

$$\begin{array}{l} \overline{f }({n}^\varepsilon (t),{\alpha }^\varepsilon (t)) + \sqrt\varepsilon h(t,{n}^\varepsilon (t),{\alpha }^\varepsilon (t)) \\ -{\int }_{0}^{t}\left \{\langle W(s,{\alpha }^\varepsilon (s)),{\nabla }_{ x}h(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s))\rangle \right.\\ \qquad + \sqrt\varepsilon \left ( \frac{\partial } {\partial s}h(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s))\right ) +\widehat{ Q}(s)\overline{f}({n}^\varepsilon (s),\cdot )({\alpha }^\varepsilon (s)) \\ \left.\qquad + \sqrt\varepsilon \widehat{Q}(s)h(s,{n}^\varepsilon (s),\cdot )({\alpha }^\varepsilon (s))\right \}ds \end{array}$$

(5.94)

being a martingale. For each s, x, α, define

$$g(s,x,\alpha ) =\langle W(s,\alpha ),{\nabla }_{x}h(s,x,\alpha )\rangle.$$

(5.95)

With f ⁰ ∈ C _L ² , it is easy to see that g(s, x, α) is Lipschitz in (s, x). This function will be used in defining the operator for the limit problem later.

Remark 5.39

Note that the choice of h(⋅) in (5.93) is not unique. If h ₁ (⋅) and h ₂(⋅) are both solutions to (5.93), then the irreducibility of $\widetilde{{Q}}^{i}(s)$ implies that, for each i = 1,…,l,

$$\left (\begin{array}{c} {h}_{1}(s,x,{s}_{i1})\\ \vdots \\ {h}_{1}(s,x,{s}_{i{m}_{i}}) \end{array} \right )-\left (\begin{array}{c} {h}_{2}(s,x,{s}_{i1})\\ \vdots \\ {h}_{2}(s,x,{s}_{i{m}_{i}}) \end{array} \right ) = {h}^{0}(s,x,i)\mathrm{1}{\mathrm{l}}_{{ m}_{i}}$$

for some scalar functions h ⁰ (s,x,i). Although the choice of h is not unique, the resulting function g(s,x,α) is well defined. As in Remark 4.23, the consistency condition or solvability condition due to Fredholm alternative is in force. Therefore, if h ₁ and h ₂ are both solutions to (5.93), then

$$\langle W(s,\alpha ),{\nabla }_{x}{h}_{1}(s,x,\alpha )\rangle =\langle W(s,\alpha ),{\nabla }_{x}{h}_{2}(s,x,\alpha )\rangle,$$

for $\alpha \in {\mathcal{M}}_{i}$ and i = 1,…,l.

Using g(s, x, α) defined above, we obtain

$$\begin{array}{l} {\int }_{0}^{t}\langle W(s,{\alpha }^\varepsilon (s)),{\nabla }_{ x}h(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s))\rangle ds \\ \quad ={ \int }_{0}^{t}g(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s))ds \\ \quad ={ \int }_{0}^{t} \sum \limits_{i=1}^{l} \sum \limits_{j=1}^{{m}_{i} }{I}_{\{{\alpha }^\varepsilon (s)={s}_{ij}\}}g(s,{n}^\varepsilon (s),{s}_{ ij})ds \\ \quad ={ \int }_{0}^{t} \sum \limits_{i=1}^{l} \sum \limits_{j=1}^{{m}_{i} }({I}_{\{{\alpha }^\varepsilon (s)={s}_{ij}\}} - {\nu }_{j}^{i}(s){I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=i\}})g(s,{n}^\varepsilon (s),{s}_{ ij})ds \\ \qquad \quad +{ \int }_{0}^{t} \sum \limits_{i=1}^{l} \sum \limits_{j=1}^{{m}_{i} }{I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}}{\nu }_{j}^{i}(s)g(s,{n}^\varepsilon (s),{s}_{ ij})ds\end{array}$$

In view of Lemma 5.37, the term in the fourth line above goes to zero in mean square uniformly in t ∈ [0, T]. Let

$$\overline{g}(s,x,i) = \sum \limits_{j=1}^{{m}_{i} }{\nu }_{j}^{i}(s)g(s,x,{s}_{ ij}).$$

Then it follows that

$$\begin{array}{l} {\int }_{0}^{t} \sum \limits_{i=1}^{l} \sum \limits_{j=1}^{{m}_{i} }{I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}}{\nu }_{j}^{i}(s)g(s,{n}^\varepsilon (s),{s}_{ ij})ds \\ \quad ={ \int }_{0}^{t} \sum \limits_{i=1}^{l}{I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=i\}}\overline{g}(s,{n}^\varepsilon (s),i)ds \\ \quad ={ \int }_{0}^{t}\overline{g}(s,{n}^\varepsilon (s),{\overline{\alpha }}^\varepsilon (s))ds\end{array}$$

Therefore, as ε → 0, we have

$$\begin{array}{l} E|{ \int }_{0}^{t}\langle W(s,{\alpha }^\varepsilon (s)),{\nabla }_{ x}h(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s))\rangle ds \\ \quad -{\int }_{0}^{t}\overline{g}(s,{n}^\varepsilon (s),{\overline{\alpha }}^\varepsilon (s))d{s|}^{2} \rightarrow 0 \end{array}$$

(5.96)

uniformly in t ∈ [0, T].

Furthermore, we have

$$\begin{array}{l} {\int }_{0}^{t}\widehat{Q}(s)\overline{f}({n}^\varepsilon (s),\cdot )({\alpha }^\varepsilon (s))ds \\ \quad ={ \int }_{0}^{t} \sum \limits_{i=1}^{l} \sum \limits_{j=1}^{{m}_{i} }{I}_{\{{\alpha }^\varepsilon (s)={s}_{ij}\}}\widehat{Q}(s)\overline{f}({n}^\varepsilon (s),\cdot )({s}_{ ij})ds \\ \quad ={ \int }_{0}^{t} \sum \limits_{i=1}^{l} \sum \limits_{j=1}^{{m}_{i} }({I}_{\{{\alpha }^\varepsilon (s)={s}_{ij}\}} - {\nu }_{j}^{i}(s){I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=i\}})\widehat{Q}(s)\overline{f}({n}^\varepsilon (s),\cdot )({s}_{ ij})ds \\ \quad \quad +{ \int }_{0}^{t} \sum \limits_{i=1}^{l} \sum \limits_{j=1}^{{m}_{i} }{\nu }_{j}^{i}(s){I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=i\}}\widehat{Q}(s)\overline{f}({n}^\varepsilon (s),\cdot )({s}_{ ij})ds\end{array}$$

Again, Lemma 5.37 implies that the third line above goes to 0 in mean square uniformly in t ∈ [0, T]. The last term above equals

$${\int }_{0}^{t}\overline{Q}(s){f}^{0}({n}^\varepsilon (s),\cdot )({\overline{\alpha }}^\varepsilon (s))ds,$$

where $\overline{Q}(s) = \mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t))\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}}$. It follows that as ε → 0,

$$\begin{array}{l} E|{ \int }_{0}^{t}\widehat{Q}(s)\overline{f}({n}^\varepsilon (s),\cdot )({\alpha }^\varepsilon (s))ds \\ \quad -{\int }_{0}^{t}\overline{Q}(s){f}^{0}({n}^\varepsilon (s),\cdot )({\overline{\alpha }}^\varepsilon (s))ds| \rightarrow 0 \end{array}$$

(5.97)

uniformly in t ∈ [0, T].

We next examine the function $\overline{g}(s,x,i)$ closely. Using the block-diagonal structure of $\widetilde{Q}(s)$, we can write (5.93) in terms of each block $\widetilde{{Q}}^{j}(s)$. For j = 1, …, l,

$$\widetilde{{Q}}^{j}(s)\left (\begin{array}{c} h(s,x,{s}_{j1})\\ \vdots \\ h(s,x,{s}_{j{m}_{j}})\\ \end{array} \right ) = -\left (\begin{array}{c} \langle W(s,{s}_{j1}),{\nabla }_{x}{f}^{0}(x,j)\rangle \\ \vdots \\ \langle W(s,{s}_{j{m}_{j}}),{\nabla }_{x}{f}^{0}(x,j)\rangle \\ \end{array} \right ).$$

(5.98)

Note that $\widetilde{{Q}}^{j}(s)$ is weakly irreducible so $\mathrm{rank}(\widetilde{{Q}}^{j}(s)) = {m}_{j} - 1$. As in Remark 4.9, equation (5.98) has a solution since it is consistent and the solvability condition in the sense of Fredholm alternative is satisfied. We can solve (5.98) using exactly the same technique as in Section 4.2 for obtaining the φ_i(t), that is, replacing one of the rows of the augmented matrix in (5.98) by (1, 1, …, 1, 0), which represents the equation ${\sum }_{k=1}^{{m}_{j}}h(s,x,{s}_{jk}) = 0$. The coefficient matrix of the resulting equation then has full rank; one readily obtains a solution. Equivalently, the solution may be written as

$$\begin{array}{l} \left (\begin{array}{c} h(s,x,{s}_{j1})\\ \vdots \\ h(s,x,{s}_{j{m}_{j}})\\ \end{array} \right ) = \\ \quad -{\left [\left (\begin{array}{c} \widetilde{{Q}}^{j}(s) \\ \mathrm{1}{\mathrm{l}}_{{m}_{j}}^{\prime}\\ \end{array} \right )^{\prime}\left (\begin{array}{c} \widetilde{{Q}}^{j}(s) \\ \mathrm{1}{\mathrm{l}}_{{m}_{j}}^{\prime}\\ \end{array} \right )\right ]}^{-1}\left (\begin{array}{c} \widetilde{{Q}}^{j}(s) \\ \mathrm{1}{\mathrm{l}}_{{m}_{j}}^{\prime}\\ \end{array} \right )^{\prime}\left (\begin{array}{c} \langle W(s,{s}_{j1}),{\nabla }_{x}{f}^{0}(x,j)\rangle \\ \vdots \\ \langle W(s,{s}_{j{m}_{j}}),{\nabla }_{x}{f}^{0}(x,j)\rangle \\ 0\\ \end{array} \right )\end{array}$$

Note that

$${I}_{\{\alpha ={s}_{jk}\}} - {\nu }_{k}^{j}(t){I}_{\{ \alpha \in {\mathcal{M}}_{j}\}} = 0\mbox{ if }\alpha \not\in {\mathcal{M}}_{j}.$$

Recall the notation for the partitioned vector x = ( x ¹, …, x ^l) where x ^j is an m _j-dimensional vector and ${x}^{j} = ({x}_{1}^{j},\ldots,{x}_{{m}_{j}}^{j})$. For the partial derivatives, use the notation

$${\partial }_{j,k} ={ \partial \over \partial {x}_{k}^{j}} \mbox{ and }{\partial }_{j,{j}_{1}{j}_{2}}^{2} ={ {\partial }^{2} \over \partial {x}_{{j}_{1}}^{j}\partial {x}_{{j}_{2}}^{j}}.$$

Then h(s, x, s _jk) is a functional of ∂ _j, j1 f ⁰(x, j),...${\partial }_{j,{m}_{j}}{f}^{0}(x,j)$. It follows that g( s, x, s _jk) is a functional of ${\partial }_{j,{j}_{1}{j}_{2}}^{2}{f}^{0}(x,j)$ , for j ₁, j ₂ = 1, …, m _j, and so is $\overline{g}(s,x,j)$. Write

$$\overline{g}(s,x,j) = \frac{1} {2}{\sum }_{{j}_{1},{j}_{2}=1}^{{m}_{j} }{a}_{{j}_{1}{j}_{2}}(s,j){\partial }_{j,{j}_{1}{j}_{2}}^{2}{f}^{0}(x,j),$$

(5.99)

for some continuous functions ${a}_{{j}_{1}{j}_{2}}(s,j)$.

Lemma 5.40

Assume (A5.5) and (A5.6) . Suppose $({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))$ converges weakly to $(n(\cdot ),\overline{\alpha }(\cdot ))$ . Then for f ⁰ (⋅,i) ∈ C _L ²,

$${f}^{0}(n(t),\overline{\alpha }(t)) -{\int }_{0}^{t}\left (\overline{g}(s,n(s),\overline{\alpha }(s)) + \overline{Q}(s){f}^{0}(n(s),\cdot )(\overline{\alpha }(s))\right )ds$$

is a martingale.

Proof: Define

$$\begin{array}{l} {H}^\varepsilon (t) = \overline{f}({n}^\varepsilon (t),{\alpha }^\varepsilon (t)) + \sqrt\varepsilon h(t,{n}^\varepsilon (t),{\alpha }^\varepsilon (t)) \\ \quad \, -{\int }_{0}^{t}\left \{\langle W(s,{\alpha }^\varepsilon (s)),{\nabla }_{ x}h(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s))\rangle \right.\\ \qquad \qquad \, + \sqrt\varepsilon \frac{\partial } {\partial s}h(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s)) +\widehat{ Q}(s)\overline{f}({n}^\varepsilon (s),\cdot )({\alpha }^\varepsilon (s)) \\ \left.\qquad \qquad \, + \sqrt\varepsilon \widehat{Q}(s)h(s,{n}^\varepsilon (s),\cdot )({\alpha }^\varepsilon (s))\right \}ds\end{array}$$

The martingale property implies that

$$E\left [({H}^\varepsilon (t) - {H}^\varepsilon (s)){z}_{ 1}({n}^\varepsilon ({t}_{ 1}),{\overline{\alpha }}^\varepsilon ({t}_{ 1}))\cdots {z}_{k}({n}^\varepsilon ({t}_{ k}),{\overline{\alpha }}^\varepsilon ({t}_{ k}))\right ] = 0,$$

for any 0 ≤ t ₁ ≤ ⋯ ≤ t _k ≤ s ≤ t and any bounded and continuous functions z ₁ ( ⋅), …, z _k( ⋅).

In view of the choice of h( ⋅), it follows that all the three terms

$$\begin{array}{l} \sqrt\varepsilon h(t,{n}^\varepsilon (t),{\alpha }^\varepsilon (t)), \\ \sqrt\varepsilon \left ( \frac{\partial } {\partial t}h(t,{n}^\varepsilon (t),{\alpha }^\varepsilon (t))\right ),\mbox{ and} \\ \sqrt\varepsilon \widehat{Q}(t)h(t,{n}^\varepsilon (t),\cdot )({\alpha }^\varepsilon (t)) \end{array}$$

converge to 0 in mean square. Recall (5.96), (5.97), and

$$\overline{f}({n}^\varepsilon (t),{\alpha }^\varepsilon (t)) = {f}^{0}({n}^\varepsilon (t),{\overline{\alpha }}^\varepsilon (t)).$$

Denote the weak limit of H ^ε( ⋅) by $\overline{H}(\cdot )$. We have

$$E\left [\left (\overline{H}(t) -\overline{H}(s)\right ){z}_{1}(n({t}_{1}),\overline{\alpha }({t}_{1}))\cdots {z}_{k}(n({t}_{k}),\overline{\alpha }({t}_{k}))\right ] = 0,$$

where $\overline{H}(\cdot )$ is given by

$$\begin{array}{l} \overline{H }(t)= {f}^{0}(n(t),\overline{\alpha }(t)) \\ \quad -{\int }_{0}^{t}\left (\overline{g}(r,n(r),\overline{\alpha }(r)) + \overline{Q}(r){f}^{0}(n(r),\cdot )(\overline{\alpha }(r))\right )dr\end{array}$$

Thus $(n(\cdot ),\overline{\alpha }(\cdot ))$ is a solution to the martingale problem. □

Lemma 5.41

Let $\mathcal{L}$ denote the operator given by

$$\mathcal{L}{f}^{0}(x,j) ={ 1 \over 2} {\sum }_{{j}_{1},{j}_{2}=1}^{{m}_{j} }{a}_{{j}_{1}{j}_{2}}(s,j){\partial }_{j,{j}_{1}{j}_{2}}^{2}{f}^{0}(x,j) + \overline{Q}(s){f}^{0}(x,\cdot )(j).$$

Then the martingale problem with operator $\mathcal{L}$ has a unique solution.

Proof: In view of Lemma A.14, we need only verify the uniqueness in distribution of $(n(t),\overline{\alpha }(t))$ for each t ∈ [0, T]. Let

$$f(x,j) =\exp \left (\iota \{\langle \theta,x\rangle + {\theta }_{0}j\}\right ),$$

where $\theta \in {\mathbb{R}}^{m}$,${\theta }_{0} \in \mathbb{R}$,$j \in \mathcal{M}$ , and ι is the pure imaginary number with ${\iota }^{2} = -1$.

For fixed j ₀, k ₀ , let ${F}_{{j}_{0}{k}_{0}}(x,j) = {I}_{\{j={j}_{0}\}}f(x,{k}_{0})$ . Then

$${F}_{{j}_{0}{k}_{0}}(n(t),\overline{\alpha }(t)) = {I}_{\{\overline{\alpha }(t)={j}_{0}\}}f(n(t),{k}_{0}).$$

Moreover, note that

$$\begin{array}{l} \overline{g }(s,n(s),\overline{\alpha }(s)) = \sum \limits_{j=1}^{l}{I}_{\{\overline{\alpha } (s)=j\}}\overline{g}(s,n(s),j) \\ \quad ={ 1 \over 2} {\sum }_{j=1}^{l}{I}_{\{\overline{\alpha } (s)=j\}} \sum \limits_{{j}_{1},{j}_{2}=1}^{{m}_{j} }{a}_{{j}_{1}{j}_{2}}(s,j){\partial }_{j,{j}_{1}{j}_{2}}^{2}{F}_{{ j}_{0}{k}_{0}}(n(s),j) \\ \quad ={ 1 \over 2} {I}_{\{\overline{\alpha }(s)={j}_{0}\}} \sum \limits_{{j}_{1},{j}_{2}=1}^{{m}_{{j}_{0}} }{a}_{{j}_{1}{j}_{2}}(s,{j}_{0}){\partial }_{{j}_{0},{j}_{1}{j}_{2}}^{2}f(n(s),{k}_{ 0}) \\ \quad ={ 1 \over 2} {\sum }_{{j}_{1},{j}_{2}=1}^{{m}_{j} }{a}_{{j}_{1}{j}_{2}}(s,{j}_{0})(-{\theta }_{{j}_{0}{j}_{1}}{\theta }_{{j}_{0}{j}_{2}})({I}_{\{\overline{\alpha }(s)={j}_{0}\}}f(n(s),{k}_{0})).\end{array}$$

(5.100)

Furthermore, we have

$$\begin{array}{l} \overline{Q }(s){F}_{{j}_{0}{k}_{0}}(n(s),\cdot )(\overline{\alpha }(s)) \\ \quad = \sum \limits_{j=1}^{l}{I}_{\{\overline{\alpha } (s)=j\}}\overline{Q}(s){F}_{{j}_{0}{k}_{0}}(n(s),\cdot )(j) \\ \quad = \sum \limits_{j=1}^{l}{I}_{\{\overline{\alpha } (s)=j\}} \sum \limits_{k=1}^{l}{\overline{q}}_{ jk}(s){F}_{{j}_{0}{k}_{0}}(n(s),k) \\ \quad = \sum \limits_{j=1}^{l}{I}_{\{\overline{\alpha } (s)=j\}} \sum \limits_{k=1}^{l}{\overline{q}}_{ jk}(s){I}_{\{k={j}_{0}\}}f(n(s),{k}_{0}) \\ \quad = \sum \limits_{j=1}^{l}{I}_{\{\overline{\alpha } (s)=j\}}{\overline{q}}_{j{j}_{0}}(s)f(n(s),{k}_{0}) \\ \quad = \sum \limits_{j=1}^{l}{\overline{q}}_{ j{j}_{0}}(s){I}_{\{\overline{\alpha }(s)=j\}}f(n(s),{k}_{0}).\end{array}$$

(5.101)

Let

$${\phi }_{jk}(t) = E\left ({I}_{\{\overline{\alpha }(t)=j\}}f(n(t),k)\right ),\ \mbox{ for }\ j,k = 1,\ldots,l.$$

Then in view of (5.100) and (5.101),

$$\begin{array}{l} {\phi }_{{j}_{0 } {k}_{0}}(t) - {\phi }_{{j}_{0}{k}_{0}}(0) -{\int }_{0}^{t}\left \{\;\; \sum \limits_{{j}_{1},{j}_{2}=1}^{{m}_{{j}_{0}} }{a}_{{j}_{1}{j}_{2}}(s,{j}_{0})(-{\theta }_{{j}_{0}{j}_{1}}{\theta }_{{j}_{0}{j}_{2}}){\phi }_{{j}_{0}{k}_{0}}(s) \right.\\ \left.\qquad \qquad + \sum \limits_{j=1}^{l}{\overline{q}}_{ j{j}_{0}}(s){\phi }_{j{k}_{0}}(s)\right \}ds = 0.\end{array}$$

(5.102)

Let

$$\phi (t) = ({\phi }_{11}(t),\ldots,{\phi }_{1{m}_{1}}(t),\ldots,{\phi }_{l1}(t),\ldots,{\phi }_{l{m}_{l}}(t)).$$

Rewrite (5.102) in terms of ϕ( ⋅) as

$$\phi (t) = \phi (0) +{ \int }_{0}^{t}\phi (s)B(s)ds,$$

where ϕ(0) = (ϕ _jk (0)) with ${\phi }_{jk}(0) = E{I}_{\{\overline{\alpha }(0)=j\}}f(0,k)$, and B( t) is a matrix-valued function whose entries are defined by the integrand of (5.102). The equation for ϕ(t) is a linear ordinary differential equation. It is well known that such a differential equation has a unique solution. Hence, ϕ(t) is uniquely determined. In particular,

$$\begin{array}{l} E\exp \left (\iota \{\langle \theta,n(t)\rangle + {\theta }_{0}\overline{\alpha }(t)\}\right ) \\ \quad = \sum \limits_{j=1}^{l}E\left ({I}_{\{\overline{\alpha } (t)=j\}}\exp \left (\iota \{\langle \theta,n(t)\rangle + j{\theta }_{0}\}\right )\right ) \end{array}$$

is uniquely determined for all θ, θ₀, so is the distribution of $(n(t),\overline{\alpha }(t))$ by virtue of the uniqueness theorem and the inversion formula of the characteristic function (see Chow and Teicher [30]). □

The tightness of $({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))$ together with Lemma 5.40 and Lemma 5.41 implies that $({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))$ converges weakly to $(n(\cdot ),\overline{\alpha }(\cdot ))$. We will show that n( ⋅) is a switching diffusion, i.e., a diffusion process modulated by a Markov process such that the covariance of the diffusion depends on the Markov jump process. Precisely, owing to the presence of the jump Markov chains, the limit process does not possess the independent increment property shared by many processes. A moment of reflection reveals that, necessarily, the coefficients in $\overline{g}(s,x,i)$ must consist of a symmetric nonnegative definite matrix serving as a covariance matrix. The following lemma verifies this assertion.

Lemma 5.42

For s ∈ [0,T] and j = 1,…,l, the matrix

$$A(s,j) = ({a}_{{j}_{1}{j}_{2}}(s,j))$$

is symmetric and nonnegative definite.

Proof: Let ${\eta }^{j} = ({\eta }_{j1},\ldots,{\eta }_{j{m}_{j}})^{\prime}$ and ${x}^{j} = ({x}_{j1},\ldots,{x}_{j{m}_{j}})^{\prime}$. Define

$${f}_{j}(x) = \frac{1} {2}{\left (\langle {\eta }^{j},{x}^{j}\rangle \right )}^{2}.$$

Then the corresponding $\overline{g}(\cdot )$ defined in (5.99) has the following form:

$$\overline{g}(s,x,j) = \frac{1} {2}{\eta }^{j,{\prime}}A(s,j){\eta }^{j}.$$

Moreover, let f _j (x, k) = f _j (x), independent of k. Then for all k = 1, …, l,

$$\overline{Q}(s){f}_{j}({n}^\varepsilon (s),\cdot )(k) = 0.$$

To verify the nonnegativity of A(s, j), it suffices to show that

$${\int }_{s}^{t}{\eta }^{j,{\prime}}A(r,j){\eta }^{j}dr \geq 0,$$

for all 0 ≤ s ≤ t ≤ T. Recall that f _j(x) is a quadratic function. In view of (5.94) and the proof of Lemma 5.40, it then follows that

$$\frac{1} {2}{\int }_{s}^{t}{\eta }^{j,{\prime}}A(r,j){\eta }^{j}dr {=\lim }_{ \varepsilon \rightarrow 0}\left (E{f}_{j}({n}^\varepsilon (t)) - E{f}_{ j}({n}^\varepsilon (s))\right ).$$

We are in a position to show that the limit is nonnegative. Let

$${n}^{\varepsilon,j}(t) = ({n}_{ j1}^\varepsilon (t),\ldots,{n}_{ j{m}_{j}}^\varepsilon (t)).$$

Then

$$\begin{array}{l} E\left ({f}_{j}({n}^\varepsilon (t)) - {f}_{ j}({n}^\varepsilon (s))\right ) \\ \quad = \frac{1} {2}E\left (\langle {\eta }^{j},{n}^{\varepsilon,j}{(t)\rangle }^{2} -\langle {\eta }^{j},{n}^{\varepsilon,j}{(s)\rangle }^{2}\right )\end{array}$$

For t ≥ s ≥ 0, using

$$\langle {\eta }^{j},{n}^{\varepsilon,j}(t)\rangle =\langle {\eta }^{j},{n}^{\varepsilon,j}(s)\rangle +\langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\rangle,$$

we have

$$\begin{array}{l} E\left (\langle {\eta }^{j},{n}^{\varepsilon,j}{(t)\rangle }^{2} -\langle {\eta }^{j},{n}^{\varepsilon,j}{(s)\rangle }^{2}\right ) \\ = E\left (2\langle {\eta }^{j},{n}^{\varepsilon,j}(s)\rangle \langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\rangle +\langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}{(s)\rangle }^{2}\right ) \\ \geq 2E\left (\langle {\eta }^{j},{n}^{\varepsilon,j}(s)\rangle \langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\rangle \right ) \\ = 2E\left (\langle {\eta }^{j},{n}^{\varepsilon,j}(s)\rangle E\left [\langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\rangle \left |{\mathcal{F}}_{ s}^\varepsilon \right ]\right )\right.\end{array}$$

We next show that the last term goes to 0 as ε → 0. In fact, in view of (a) in Lemma 5.35, it follows that

$$E[{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\vert {\mathcal{F}}_{ s}^\varepsilon ] = O(\sqrt\varepsilon ),$$

and hence

$$\begin{array}{rl} &E\left [\langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\rangle \left |{\mathcal{F}}_{ s}^\varepsilon \right ] \right.\\ &\qquad \quad =\langle {\eta }^{j},E[({n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s))\vert {\mathcal{F}}_{ s}^\varepsilon ]\rangle = O(\sqrt\varepsilon )\end{array}$$

Using (b) in Lemma 5.35, we derive the following inequalities

$$E\langle {\eta }^{j},{n}^{\varepsilon,j}{(s)\rangle }^{2} \leq \vert {\eta }^{j}{\vert }^{2}E\vert {n}^{\varepsilon,j}(s){\vert }^{2} \leq \vert {\eta }^{j}{\vert }^{2}O(s).$$

The Cauchy–Schwarz inequality then leads to

$$\begin{array}{l} \vert E\left (\langle {\eta }^{j},{n}^{\varepsilon,j}(s)\rangle E\left [\langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\rangle \left |{\mathcal{F}}_{ s}^\varepsilon \right ]\right )\vert \right.\\ \quad \leq {(E\langle {\eta }^{j},{n}^{\varepsilon,j}{(s)\rangle }^{2})}^{\frac{1} {2} }{(E{[\langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\rangle |{\mathcal{F}}_{s}^\varepsilon]}^{2})}^{\frac{1} {2} } \\ \quad ={ \left (E\langle {\eta }^{j},{n}^{\varepsilon,j}{(s)\rangle }^{2}\right )}^{\frac{1} {2} }O(\sqrt\varepsilon ) \rightarrow 0,\ \mbox{ as }\varepsilon \rightarrow 0. \end{array}$$

As a result for some K > 0, we have

$$E\left (\langle {\eta }^{j},{n}^{\varepsilon,j}(s)\rangle E\left [\langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\rangle |{\mathcal{F}}_{ s}^\varepsilon \right ]\right ) \geq -K\vert {\eta }^{j}\vert s\sqrt\varepsilon \rightarrow 0,$$

as ε → 0. The nonnegativity of A(s, j) follows.

To show that A( s, j) is symmetric, consider

$${f}_{j,{j}_{1}{j}_{2}}(x) = {x}_{j{j}_{1}}{x}_{j{j}_{2}}\mbox{ for }{j}_{1},{j}_{2} = 1,\ldots,{m}_{j}.$$

Then, we have

$$\frac{1} {2}{\int }_{0}^{t}{a}_{{ j}_{1}{j}_{2}}(s,j)ds {=\lim }_{\varepsilon \rightarrow 0}E({n}_{{j}_{1}}^{\varepsilon,j}(t){n}_{{ j}_{2}}^{\varepsilon,j}(t)) = \frac{1} {2}{\int }_{0}^{t}{a}_{{ j}_{2}{j}_{1}}(s,j)ds,$$

(5.103)

for all t ∈ [0, T]. Thus, A(s, j) is symmetric. □

Next, we derive an explicit representation of the nonnegative definite matrix A( s, j) similar to that of Theorem 5.9. Recall that given a function f ⁰( ⋅), one can find h( ⋅) as in (5.93). Using this h( ⋅), one defines f( ⋅) as in (5.95) which leads to $\overline{g}(\cdot )$ given in (5.99). In view of the result in Theorem 5.9 for a single block of the irreducible matrix $\widetilde{{Q}}^{j}(t)$ together with the computations of $\overline{g}(s,x,j)$ , it follows that A(s, j) = 2A ⁰(s, j), where

$$\begin{array}{l} {A}^{0 } (t,j) = {\beta }_{\mathrm{ diag}}^{j}(t)\left ({\nu }_{\mathrm{ diag}}^{j}(t){\int }_{0}^{\infty }{Q}_{ 0}(r,t,j)dr\right. \\ \quad \quad \quad +\left.\left ({ \int }_{0}^{\infty }{Q}_{ 0}(r,t,j)dr\right ){\nu }_{\mathrm{diag}}^{j}(t)\right ){\beta }_{\mathrm{ diag}}^{j}(t), \end{array}$$

with

$$\begin{array}{l} {\beta }_{\mathrm{diag}}^{j}(t) = \mathrm{diag}({\beta }_{j1}(t),\ldots,{\beta }_{j{m}_{j}}(t)), \\ {\nu }_{\mathrm{diag}}^{j}(t) = \mathrm{diag}({\nu }_{1}^{j}(t),\ldots,{\nu }_{{m}_{j}}^{j}(t)), \end{array}$$

and

$${Q}_{0}(r,t,j) = \left [I -\left (\begin{array}{c} {\nu }^{j}(t)\\ \vdots \\ {\nu }^{j}(t) \end{array} \right )\right ]\exp \left (\widetilde{{Q}}^{j}(t)r\right ).$$

Applying Lemma 5.42 to the case of $\widetilde{Q}(s)$ a single block irreducible matrix $\widetilde{{Q}}^{j}(s)$ , it follows that A ⁰(s, j) is symmetric and nonnegative definite. Hence, standard results in linear algebra yield that there exists an m _j ×m _j matrix σ ⁰ (s, j) such that

$${\sigma }^{0}(s,j){\sigma }^{0,{\prime}}(s,j) = {A}^{0}(s,j).$$

(5.104)

Note that the definition of $\overline{g}(s,x,j)$ is independent of $\widehat{Q}(t)$, so for determining A ⁰(s, j), we may consider $\widehat{Q}(t) = 0$ . Note also that

$$\widetilde{Q}(t) = \mathrm{diag}(\widetilde{{Q}}^{1}(t),0,\ldots,0) + \cdots + \mathrm{diag}(0,\ldots,0,\widetilde{{Q}}^{l}(t)).$$

The foregoing statements suggest that in view of (5.104), the desired covariance matrix is given by

$$\begin{array}{ll} \sigma (s,j)& = \left (\begin{array}{*{10}c} {0}_{{m}_{1}\times {m}_{1}}& && && \\ &{0}_{{m}_{2}\times {m}_{2}}&& &&\\ & &\ddots &&& \\ & &&{\sigma }^{0}(s,j)&&\\ & & &&\ddots& \\ & && &&{0}_{{m}_{l}\times {m}_{l}}\\ \end{array} \right ) \\ & = \mathrm{diag}({0}_{{m}_{1}\times {m}_{1}},{0}_{{m}_{2}\times {m}_{2}},\ldots,{\sigma }^{0}(s,j)\ldots,{0}_{{ m}_{l}\times {m}_{l}}), \end{array}$$

(5.105)

where ${0}_{{m}_{k}\times {m}_{k}}$ is the m _k ×m _k zero matrix. That is, it is a matrix with the jth block-diagonal submatrix equal to σ⁰(s, j) and the rest of its elements equal to zero.

Theorem 5.43

Assume that (A5.5) holds. Suppose $\widetilde{Q}(\cdot )$ is twice differentiable with Lipschitz continuous second derivative and $\widehat{Q}(\cdot )$ is differentiable with Lipschitz continuous derivative. Let β _ij (⋅) be bounded and Lipschitz continuous deterministic functions. Then n ^ε (⋅) converges weakly to a switching diffusion n(⋅), where

$$n(t) = \left({\int }_{0}^{t}\sigma (s,\overline{\alpha }(s))dw(s)\right)^{\prime}$$

and w(⋅) is a standard m-dimensional Brownian motion.

Proof: Let

$$\widetilde{n}(t) = \left({\int }_{0}^{t}\sigma (s,\overline{\alpha }(s))dw(s)\right)^{\prime}$$

and $\overline{\alpha }(\cdot )$ be a Markov chain generated by $\overline{Q}(t)$. Then for all f ⁰ ( ⋅, i) ∈ C _L ²,

$${f}^{0}(\widetilde{n}(t),\overline{\alpha }(t)) -{\int }_{0}^{t}\left (\overline{g}(s,\widetilde{n}(s),\overline{\alpha }(s)) + \overline{Q}(s){f}^{0}(\widetilde{n}(s),\cdot )(\overline{\alpha }(s))\right )ds$$

is a martingale. This and the uniqueness of the martingale problem in Lemma 5.41 yields that $(\widetilde{n}(\cdot ),\overline{\alpha }(\cdot ))$ has the same probability distribution as $(n(\cdot ),\overline{\alpha }(\cdot ))$. This proves the theorem. □

Remark 5.44

Note that the Lipschitz condition on β _ij(⋅) is not required in analyzing the asymptotic normality in Section 5.3.3. It is needed in this section because the perturbed test function method typically requires smoothness conditions of the associated processes.

It appears that the conditions in (A5.5) and (A5.6) together with the Lipschitz property of β _ij (⋅) are sufficient for the convergence of n ^ε (⋅) to a switching diffusion n(⋅). The additional assumptions on further derivatives of $\widetilde{Q}(\cdot )$ and $\widehat{Q}(\cdot )$ are needed for computing the covariance of the limit process n(⋅).

Remark 5.45

If $\overline{\alpha }(\cdot )$ were a deterministic function, n(⋅) above would be a diffusion process in the usual sense. However since the limit $\overline{\alpha }(\cdot )$ is a Markov chain, the diffusion process is modulated by this jump process; the resulting distribution has the features of the “continuous” diffusion process and the “discrete” Markov chain limit.

In this section, we use the perturbed test function method, which is quite different from the approach of Section 5.2. The method used in that section, which might be called a direct approach, is interesting in its own right and makes a close connection between asymptotic expansion and asymptotic normality. It is effective whenever it can be applied. One of the main ingredients is that the direct approach makes use of the mixing properties of the scaled occupation measures heavily. In fact, using asymptotic expansion, it was shown that the scaled sequence of occupation measures is a mixing process with exponential mixing rate. For the weak and strong interaction cases presented, the mixing condition, and even approximate mixing conditions, no longer hold. To illustrate, consider Example 4.20 with constant jump rates and calculate

$$E[{n}^{\varepsilon,{\prime}}(s)({n}^\varepsilon (t) - {n}^\varepsilon (s))].$$

By virtue of the proof of Theorem 5.25 , a straightforward but tedious calculation shows that

$$E\left [{n}^{\varepsilon,{\prime}}(s)({n}^\varepsilon (t) - {n}^\varepsilon (s))\right ]\rightarrow 0\ \mbox{ as }\varepsilon \rightarrow 0$$

for the weak and strong interaction models because E[n ^ε,′ (s)(n ^ε (t) − n ^ε (s))] depends on P ₁ (t,s), generally a nonzero function. A direct consequence is that the limit process does not have independent increments in general. It is thus difficult to characterize the limit process via the direct approach. The perturbed test function method, on the other hand, can be considered as a combined approach. It uses enlarged or augmented states by treating the scaled occupation measure n ^ε (⋅) and the Markov chain α ^ε (⋅) together. That is, one considers a new state variable with two components (x,α). This allows us to bypass the verification of mixing-like properties such that the limit process is characterized by means of solutions of appropriate martingale problems via perturbed test functions, which underlies the rationale and essence of the approach. As a consequence, the limit process is characterized via the limit of the underlying sequence of operators.

Note that if $\widetilde{Q}(t)$ itself is weakly irreducible (i.e., $\widetilde{Q}(t)$ consists of only one block), then the covariance matrix is given by (5.30). In this case, since there is only one group of recurrent states, the jump behavior due to the limit process $\overline{\alpha }(\cdot )$ will disappear. Moreover, owing to the fast transition rate $\widetilde{Q}(t)/\varepsilon $, the singularly perturbed Markov chain rapidly reaches its quasi-stationary regime. As a result, the jump behavior does not appear in the asymptotic distribution, and the diffusion becomes the dominant factor. Although the method employed in this chapter is different from that of Section 5.2, the result coincides with that of Section 5.2 under irreducibility. We state this in the following corollary.

Corollary 5.46

Assume that the conditions of Theorem 5.43 are fulfilled with l = 1 (i.e., $\widetilde{Q}(t)$ has only one block). Then n ^ε (⋅) converges weakly to the diffusion process

$$n(t) = \left({\int }_{0}^{t}\sigma (s)dw(s)\right)^{\prime},$$

where w(⋅) is an m-dimensional standard Brownian motion with covariance

$$A(t) = \sigma (t)\sigma ^{\prime}(t)$$

given by (5.30).

To further illustrate, consider the following example. This problem is concerned with a singularly perturbed Markov chain with four states divided into two groups. It has been used in modeling production planning problems with failure-prone machines. As was mentioned, from a modeling point of view, it may be used to depict the situation that two machines operate in tandem, in which the operating conditions (the machine capacity) of one of the machines change much faster than the other; see also related discussions in Chapters 7 and 8.

Example 5.47

Let α ^ε (⋅) be a Markov chain generated by

$$\begin{array}{l} {Q}^\varepsilon (t) = \frac{1} \varepsilon \left (\begin{array}{cccc} - {\lambda }_{1}(t)& {\lambda }_{1}(t) & 0 & 0 \\ {\mu }_{1}(t) & - {\mu }_{1}(t)& 0 & 0 \\ 0 & 0 & - {\lambda }_{1}(t)& {\lambda }_{1}(t) \\ 0 & 0 & {\mu }_{1}(t) & - {\mu }_{1}(t)\\ \end{array} \right ) \\ \\ \quad \quad + \left (\begin{array}{cccc} - {\lambda }_{2}(t)& 0 & {\lambda }_{2}(t) & 0 \\ 0 & - {\lambda }_{2}(t)& 0 & {\lambda }_{2}(t) \\ {\mu }_{2}(t) & 0 & - {\mu }_{2}(t)& 0 \\ 0 & {\mu }_{2}(t) & 0 & - {\mu }_{2}(t)\\ \end{array} \right ).\end{array}$$

Then

$$\overline{Q}(t) = \left (\begin{array}{cc} - {\lambda }_{2}(t)& {\lambda }_{2}(t) \\ {\mu }_{2}(t) & - {\mu }_{2}(t) \end{array} \right ).$$

Let $\overline{\alpha }(\cdot )$ be a Markov chain generated by $\overline{Q}(t)$ , t ≥ 0. In this example,

$${\sigma }^{0}(s,1) = 2{\left ( \frac{{\lambda }_{1}(s){\mu }_{1}(s)} {{({\lambda }_{1}(s) + {\mu }_{1}(s))}^{3}}\right )}^{\frac{1} {2} }\left (\begin{array}{cc} {\beta }_{11}(s) &0\\ - {\beta }_{ 12}(s)&0\\ \end{array} \right ),$$

$${\sigma }^{0}(s,2) = 2{\left ( \frac{{\lambda }_{1}(s){\mu }_{1}(s)} {{({\lambda }_{1}(s) + {\mu }_{1}(s))}^{3}}\right )}^{\frac{1} {2} }\left (\begin{array}{cc} {\beta }_{21}(s) &0\\ - {\beta }_{ 22}(s)&0\\ \end{array} \right ),$$

$$\sigma (s,1) = 2{\left ( \frac{{\lambda }_{1}(s){\mu }_{1}(s)} {{({\lambda }_{1}(s) + {\mu }_{1}(s))}^{3}}\right )}^{\frac{1} {2} }\left (\begin{array}{cccc} {\beta }_{11}(s) &0&0&0 \\ - {\beta }_{12}(s)&0&0&0 \\ 0 &0&0&0\\ 0 &0 &0 &0\\ \end{array} \right ),$$

and

$$\sigma (s,2) = 2{\left ( \frac{{\lambda }_{1}(s){\mu }_{1}(s)} {{({\lambda }_{1}(s) + {\mu }_{1}(s))}^{3}}\right )}^{\frac{1} {2} }\left (\begin{array}{cccc} 0&0& 0 &0\\ 0 &0 & 0 &0 \\ 0&0& {\beta }_{21}(s) &0 \\ 0&0& - {\beta }_{22}(s)&0\\ \end{array} \right ).$$

The limit of n ^ε (⋅) is given by

$$n(t) = \left({\int }_{0}^{t}\sigma (s,\overline{\alpha (s)})dw(s)\right)^{\prime},$$

where w(⋅) is a standard Brownian motion taking values in ${\mathbb{R}}^{4}$.

4 Measurable Generators

In Section 4.2, we considered the asymptotic expansions of probability distributions. A natural requirement of such expansions is that the generator Q ^ε(t) be smooth enough to establish the desired error bounds. It would be interesting to consider the case in which the generator Q ^ε(t), t ≥ 0, is merely measurable. The method used in this section is very useful in some manufacturing problems; see Sethi and Zhang [192]. Moreover, the results are used in Section 8.6 to deal with a control problem under relaxed control formulation. Given only the measurability of Q ^ε(t), there seems to be little hope to obtain an asymptotic expansion. Instead of constructing an asymptotic series of the corresponding probability distribution, we consider the convergence of P(α ^ε(t) = s _ij) under the framework of convergence of

$${\int }_{0}^{T}P({\alpha }^\varepsilon (t) = {s}_{ ij})f(t)dt\ \mbox{ for }\ f(\cdot ) \in {L}^{2}[0,T]; \mathbb{R}).$$

Since the phrase “weak convergence” is reserved throughout the book for the convergence of probability measures, to avoid confusion, we refer to the convergence above as convergence in the weak sense on ${L}^{2}([0,T]; \mathbb{R})$ or convergence under the weak topology of ${L}^{2}([0,T]; \mathbb{R})$.

4.1 Case I: Weakly Irreducible $\widetilde{Q}(t)$

Let ${\alpha }^\varepsilon (\cdot ) \in \mathcal{M} =\{ 1,\ldots,m\}$ denote the Markov chain generated by

$${Q}^\varepsilon (t) = \frac{1} \varepsilon \widetilde{Q}(t) +\widehat{ Q}(t),$$

where both $\widetilde{Q}(t)$ and $\widehat{Q}(t)$ are generators.

We assume the following conditions in this subsection.

(A5.7)
$\widetilde{Q}(t)$ and $\widehat{Q}(t)$ are bounded and Borel measurable. Moreover, $\widetilde{Q}(t)$ is weakly irreducible.

Remark 5.48

In fact, both the boundedness and the Borel measurability in (A5.7) are redundant. Recall that our definition of generators (see Definition 2.2) uses the q-Property, which includes both the Borel measurability and the boundedness. Thus, (A5.7) requires only weak irreducibility. Nevertheless, we retain both boundedness and measurability for those who read only this section. Similar comments apply to assumption (A5.8) in what follows.

Define the probability distribution vector

$${p}^\varepsilon (t) = (P({\alpha }^\varepsilon (t) = 1),\ldots,P({\alpha }^\varepsilon (t) = m))$$

and the transition matrix

$${P}^\varepsilon (t,s) = ({p}_{ ij}^\varepsilon (t,s)) = \left (P({\alpha }^\varepsilon (t) = j\vert {\alpha }^\varepsilon (s) = i)\right ).$$

Then using the martingale property in Lemma 2.4, we have

$${p}^\varepsilon (t) = {p}^\varepsilon (s) +{ \int }_{s}^{t}{p}^\varepsilon (r){Q}^\varepsilon (r)dr$$

(5.106)

and

$${P}^\varepsilon (t,s) = I +{ \int }_{s}^{t}{P}^\varepsilon (r,s){Q}^\varepsilon (r)dr.$$

(5.107)

The next two lemmas are concerned with the asymptotic properties of p ^ε(t) and P ^ε(t, s).

Lemma 5.49

Assume (A5.7) . Then for each i, j, and T > 0, P(α ^ε (t) = i) and $P({\alpha }^\varepsilon (t) = i\vert {\alpha }^\varepsilon (s) = j)$ both converge weakly to ν _i (t) on ${L}^{2}([0,T]; \mathbb{R})$ and ${L}^{2}([s,T]; \mathbb{R})$ , respectively, that is, as ε → 0,

$${\int }_{0}^{T}[P({\alpha }^\varepsilon (t) = i) - {\nu }_{ i}(t)]f(t)dt \rightarrow 0$$

(5.108)

and

$${\int }_{s}^{T}[P({\alpha }^\varepsilon (t) = i\vert {\alpha }^\varepsilon (s) = j) - {\nu }_{ i}(t)]f(t)dt \rightarrow 0,$$

(5.109)

for all $f(\cdot ) \in {L}^{2}([0,T]; \mathbb{R})$ and ${L}^{2}([s,T]; \mathbb{R})$ , respectively.

Proof: We only verify (5.108); the proof of (5.109) is similar. Recall that

$${p}^\varepsilon (t) = ({p}_{ 1}^\varepsilon (t),\ldots,{p}_{ m}^\varepsilon (t)) = (P({\alpha }^\varepsilon (t) = 1),\ldots,P({\alpha }^\varepsilon (t) = m)).$$

Since ${p}^\varepsilon (\cdot ) \in {L}^{2}([0,T]; {\mathbb{R}}^{m})$ (space of square-integrable functions on [0, T] taking values in ${\mathbb{R}}^{m}$ ), for each subsequence of ε → 0 there exists (see Lemma A.36) a further subsequence of ε → 0 (still denoted by ε for simplicity), and for such ε, the corresponding { p ^ε( ⋅)} converges (in the weak sense on ${L}^{2}([0,T]; {\mathbb{R}}^{m})$) to some $p(\cdot ) = ({p}_{1}(\cdot ),\ldots,{p}_{m}(\cdot )) \in {L}^{2}([0,T]; {\mathbb{R}}^{m})$, that is,

$${\int }_{0}^{T}{p}^\varepsilon (r)({f}_{ 1}(r),\ldots,{f}_{m}(r))^{\prime}dr \rightarrow {\int }_{0}^{T}p(r)({f}_{ 1}(r),\ldots,{f}_{m}(r))^{\prime}dr,$$

for any $({f}_{1}(\cdot ),\ldots,{f}_{m}(\cdot ))^{\prime} \in {L}^{2}([0,T]; {\mathbb{R}}^{m})$. Moreover,

$$0 \leq {p}_{i}(t) \leq 1\quad \mbox{ and }\quad {p}_{1}(t) + \cdots + {p}_{m}(t) = 1$$

(5.110)

almost everywhere. Since $\widetilde{Q}(\cdot ) \in {L}^{2}([0,T]; {\mathbb{R}}^{m\times m})$, we have for 0 ≤ s ≤ t ≤ T,

$${\int }_{s}^{t}{p}^\varepsilon (r)\widetilde{Q}(r)dr \rightarrow {\int }_{s}^{t}p(r)\widetilde{Q}(r)dr.$$

Thus, using (5.106) we obtain

$$\begin{array}{rl} {\int }_{s}^{t}p(r)\widetilde{Q}(r)dr =&\lim \limits_{ \varepsilon \rightarrow 0}{ \int }_{s}^{t}{p}^\varepsilon (r)\widetilde{Q}(r)dr \\ =&\lim \limits_{\varepsilon \rightarrow 0}\left (\varepsilon ({p}^\varepsilon (t) - {p}^\varepsilon (s)) - \varepsilon {\int }_{s}^{t}{p}^\varepsilon (r)\widehat{Q}(r)dr\right ) = 0.\end{array}$$

Since s and t are arbitrary, it follows immediately that

$$p(t)\widetilde{Q}(t) = 0\mbox{ a.e. in }t.$$

By virtue of (5.110), the irreducibility of $\widetilde{Q}(t)$ implies p(t) = ν( t) almost everywhere. Thus the limit is independent of the chosen subsequence. Therefore, p ^ε ( ⋅) → ν( ⋅) in the weak sense on ${L}^{2}([0,T]; {\mathbb{R}}^{m})$. □

Theorem 5.50

Assume (A5.7) . Then for any bounded deterministic function β _i (⋅) and for each $i \in \mathcal{M}$ and t ≥ 0,

$$E{\left \vert {\int }_{0}^{t}({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu }_{i}(s)){\beta }_{i}(s)ds\right \vert }^{2} \rightarrow 0\mbox{ as }\varepsilon \rightarrow 0.$$

(5.111)

Proof: Let

$$\eta (t) = E{\left \vert {\int }_{0}^{t}({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu }_{i}(s)){\beta }_{i}(s)ds\right \vert }^{2}.$$

Then as in the proof of Theorem 5.25, we can show that

$$\eta (t) = 2({\eta }_{1}(t) + {\eta }_{2}(t)),$$

where

$$\begin{array}{rl} {\eta }_{1}(t) =&{\int }_{0}^{t}{ \int }_{0}^{s}(-{\nu }_{ i}(r))[P({\alpha }^\varepsilon (s) = i) - {\nu }_{ i}(s)]{\beta }_{i}(s){\beta }_{i}(r)drds, \\ {\eta }_{2}(t) =&{\int }_{0}^{t}{ \int }_{0}^{s}P({\alpha }^\varepsilon (r) = i)[P({\alpha }^\varepsilon (s) = i\vert {\alpha }^\varepsilon (r) = i) - {\nu }_{ i}(s)] \\ &\qquad \times {\beta }_{i}(s){\beta }_{i}(r)drds\end{array}$$

By virtue of Lemma 5.49, P(α^ε(s) = i) → ν_i(s) in the weak sense on ${L}^{2}([0,T]; \mathbb{R})$ and therefore as ε → 0,

$${\eta }_{1}(t) ={ \int }_{0}^{t}[P({\alpha }^\varepsilon (s) = i) - {\nu }_{ i}(s)]{\beta }_{i}(s)\left ({\int }_{0}^{s}(-{\nu }_{ i}(r)){\beta }_{i}(r)dr\right )ds \rightarrow 0.$$

Similarly, in view of the convergence of

$$P({\alpha }^\varepsilon (s) = i\vert {\alpha }^\varepsilon (r) = i) \rightarrow {\nu }_{ i}(s)$$

under the weak topology of ${L}^{2}([r,t]; \mathbb{R})$, we have

$$\begin{array}{rl} &{\eta }_{2}(t) ={ \int }_{0}^{t}\left [{\int }_{r}^{t}[P({\alpha }^\varepsilon (s) = i\vert {\alpha }^\varepsilon (r) = i) - {\nu }_{ i}(s)]{\beta }_{i}(s)ds\right ] \\ &\quad \times P({\alpha }^\varepsilon (r) = i){\beta }_{i}(r)dr \rightarrow 0. \end{array}$$

This concludes the proof of the theorem. □

4.2 Case II: $\widetilde{Q}(t) = \mathrm{diag}(\widetilde{{Q}}^{1}(t),\ldots,\widetilde{{Q}}^{l}(t))$

This subsection extends the preceding result to the cases in which $\widetilde{Q}(t)$ is a block-diagonal matrix with irreducible blocks. We make the following assumptions:

(A5.8)
$\widehat{Q}(t)$ and $\widetilde{{Q}}^{i}(t)$, for i = 1, …, l, are bounded and Borel measurable. Moreover, $\widetilde{{Q}}^{i}(t)$,i = 1, …, l, are weakly irreducible.

Lemma 5.51

Assume (A5.8) . Then the following assertions hold:

(a)
For each i = 1, …, l and j = 1, …, m _i, P(α^ε(t) = s _ij) converges in the weak sense to ${\nu }_{j}^{i}(t){{\vartheta}}^{i}(t)$ on ${L}^{2}([0,T]; \mathbb{R})$ , that is,
$${\int }_{0}^{T}[P({\alpha }^\varepsilon (t) = {s}_{ ij}) - {\nu }_{j}^{i}(t){{\vartheta}}^{i}(t)]f(t)dt \rightarrow 0,$$
(5.112)
for all $f(\cdot ) \in {L}^{2}([0,T]; \mathbb{R})$ , where
$$({{\vartheta}}^{1}(t),\ldots,{{\vartheta}}^{l}(t)) = {p}_{ 0}\widetilde{\mathrm{1}\mathrm{l}} +{ \int }_{0}^{t}({{\vartheta}}^{1}(s),\ldots,{{\vartheta}}^{l}(s))\overline{Q}(s)ds.$$
(b)
For each i, j, i ₁, j ₁,$P({\alpha }^\varepsilon (t) = {s}_{ij}\vert {\alpha }^\varepsilon (s) = {s}_{{i}_{1}{j}_{1}})$ converges in the weak sense to ${\nu }_{j}^{i}(t){{\vartheta}}_{ii}(t,s)$ on ${L}^{2}([s,T]; \mathbb{R})$ , that is,
$${\int }_{s}^{T}[P({\alpha }^\varepsilon (t) = {s}_{ ij}\vert {\alpha }^\varepsilon (s) = {s}_{{ i}_{1}{j}_{1}}) - {\nu }_{j}^{i}(t){{\vartheta}}_{ ii}(t,s)]f(t)dt \rightarrow 0,$$
(5.113)
for all $f(\cdot ) \in {L}^{2}([s,T]; \mathbb{R})$ , where ${{\vartheta}}_{ij}(t,s)$ is defined in Lemma 5.24 (see (5.50)).

Proof: We only derive (5.112); the proof of (5.113) is similar. Let

$${p}^\varepsilon (t) = \left ({p}_{ 11}^\varepsilon (t),\ldots,{p}_{ 1{m}_{1}}^\varepsilon (t),\ldots,{p}_{ l1}^\varepsilon (t),\ldots,{p}_{ l{m}_{l}}^\varepsilon (t)\right )$$

where ${p}_{ij}^\varepsilon (t) = P({\alpha }^\varepsilon (t) = {s}_{ij})$ . Since ${p}^\varepsilon (\cdot ) \in {L}^{2}([0,T]; {\mathbb{R}}^{m})$ , there exists (see Lemma A.36) a subsequence of ε → 0 (still denoted by ε for simplicity), such that corresponding to this ε, p ^ε(t) converges to some $p(\cdot ) \in {L}^{2}([0,T]; {\mathbb{R}}^{m})$ under the weak topology. Let

$$p(t) = \left ({p}_{11}(t),\ldots,{p}_{1{m}_{1}}(t),\ldots,{p}_{l1}(t),\ldots,{p}_{l{m}_{l}}(t)\right ).$$

Then 0 ≤ p _ij(t) ≤ 1 and ∑_{i,
j} p _ij (t) = 1 almost everywhere. Similarly as in the proof of Lemma 5.49, for 0 ≤ t ≤ T,

$$p(t)\widetilde{Q}(t) = 0\mbox{ a.e. in }t.$$

The irreducibility of $\widetilde{{Q}}^{k}(t)$,k = 1, …, l, implies that

$$p(t) = ({{\vartheta}}^{1}(t),\ldots,{{\vartheta}}^{l}(t))\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t)),$$

(5.114)

for some functions ${{\vartheta}}^{1}(t),\ldots,{{\vartheta}}^{l}(t)$.

In view of (5.106), we have

$${p}^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}} = {p}_{ 0}\widetilde{\mathrm{1}\mathrm{l}} +{ \int }_{0}^{t}{p}^\varepsilon (s)\left (\frac{1} \varepsilon \widetilde{Q}(s) +\widehat{ Q}(s)\right )\widetilde{\mathrm{1}\mathrm{l}}ds.$$

Since $\widetilde{Q}(s)\widetilde{\mathrm{1}\mathrm{l}} = 0$, it follows that

$${p}^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}} = {p}_{ 0}\widetilde{\mathrm{1}\mathrm{l}} +{ \int }_{0}^{t}{p}^\varepsilon (s)\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}}ds.$$

Owing to the convergence of p ^ε(t) → p( t) under the weak topology of ${L}^{2}([0,T]; {\mathbb{R}}^{m})$, we have

$$p(t)\widetilde{\mathrm{1}\mathrm{l}} = {p}_{0}\widetilde{\mathrm{1}\mathrm{l}} +{ \int }_{0}^{t}p(s)\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}}ds.$$

Using (5.114) and noting that

$$\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t))\widetilde{\mathrm{1}\mathrm{l}} = I,$$

we have

$$({{\vartheta}}^{1}(t),\ldots,{{\vartheta}}^{l}(t)) = {p}_{ 0}\widetilde{\mathrm{1}\mathrm{l}} +{ \int }_{0}^{t}({{\vartheta}}^{1}(s),\ldots,{{\vartheta}}^{l}(s))\overline{Q}(s)ds.$$

The uniqueness of the solution then yields the lemma. □

Theorem 5.52

Assume (A5.8) . Then for any i = 1,…,l, j = 1,…,m _i , and bounded deterministic function β _ij (t), t ≥ 0,

$$E\left ({\int }_{0}^{T}\left ({I}_{\{{ \alpha }^\varepsilon (t)={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=i\}}\right ){\beta }_{ij}(t)d{t}\right )^{2} \rightarrow 0,\mbox{ as }\varepsilon \rightarrow 0.$$

Proof: Let η(t) be defined as in (5.52). Then we can show similarly as in the proof of Theorem 5.25 that

$$\eta (T) = 2{\int }_{0}^{T}{ \int }_{0}^{t}{\Phi }^\varepsilon (t,r){\beta }_{ ij}(t){\beta }_{ij}(r)drdt,$$

where ${\Phi }^\varepsilon (t,r) = {\Phi }_{1}^\varepsilon (t,r) + {\Phi }_{2}^\varepsilon (t,r)$ with Φ ₁ ^ε(t, r) and Φ ₂ ^ε(t, r) defined by (5.53) and (5.54), respectively.

Note that by changing the order of integration,

$$\begin{array}{rl} &{\int }_{0}^{T}{ \int }_{0}^{t}{\Phi }_{ 1}^\varepsilon (t,r){\beta }_{ ij}(t){\beta }_{ij}(r)drdt \\ &\quad ={ \int }_{0}^{T}P({\alpha }^\varepsilon (r) = {s}_{ ij}){\beta }_{ij}(r)\left \{{\int }_{r}^{T}[P({\alpha }^\varepsilon (t) = {s}_{ ij}\vert {\alpha }^\varepsilon (r) = {s}_{ ij})\right. \\ &\left.\qquad - {\nu }_{j}^{i}(t)P({\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ i}\vert {\alpha }^\varepsilon (r) = {s}_{ ij})]{\beta }_{ij}(t)dt\right \}dr\end{array}$$

Since the β _ij( ⋅) are bounded uniformly on [0, T], ${\beta }_{ij}(\cdot ) \in {L}^{2}([0,T]; \mathbb{R})$. As a result, Lemma 5.51 implies that

$$\begin{array}{rl} &{ \int }_{r}^{T}[P({\alpha }^\varepsilon (t) = {s}_{ ij}\vert {\alpha }^\varepsilon (r) = {s}_{ ij}) \\ & - {\nu }_{j}^{i}(t)P({\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ i}\vert {\alpha }^\varepsilon (r) = {s}_{ ij})]{\beta }_{ij}(t)dt \rightarrow 0.\end{array}$$

Hence as ε → 0,

$${\int }_{0}^{T}{ \int }_{0}^{t}{\Phi }_{ 1}^\varepsilon (t,r){\beta }_{ ij}(t){\beta }_{ij}(r)drdt \rightarrow 0.$$

Similarly,

$${\int }_{0}^{T}{ \int }_{0}^{t}{\Phi }_{ 2}^\varepsilon (t,r){\beta }_{ ij}(t){\beta }_{ij}(r)drdt \rightarrow 0,\mbox{ as }\varepsilon \rightarrow 0.$$

The proof is complete. □

Theorem 5.53

Assume (A5.8) . Then ${\overline{\alpha }}^\varepsilon (\cdot )$ converges weakly to $\overline{\alpha }(\cdot )$ on $D([0,T];\overline{\mathcal{M}})$, as ε → 0.

Proof: Recall that χ^ε(t) denotes the vector of indicator functions

$$\left ({I}_{\{{\alpha }^\varepsilon (t)={s}_{11}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{1{m}_{ 1}}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{l1}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{l{m}_{ l}}\}}\right ),$$

and let

$${\overline{\chi }}^\varepsilon (t) = ({\overline{\chi }}_{ 1}^\varepsilon (t),\ldots,{\overline{\chi }}_{ l}^\varepsilon (t)) = {\chi }^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}}.$$

Then ${\overline{\chi }}_{i}^\varepsilon (t) = {I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}}$ for i = 1, …, l.

We show that ${\overline{\chi }}^\varepsilon (\cdot )$ is tight in D ^l[0, T] first. Let ${\mathcal{F}}_{t}^\varepsilon = \sigma \{{\alpha }^\varepsilon (r) :\; r \leq t\}$. Then in view of the martingale property associated with α^ε( ⋅), we have, for 0 ≤ s ≤ t,

$$E\left [{\chi }^\varepsilon (t) - {\chi }^\varepsilon (s) -{\int }_{s}^{t}{\chi }^\varepsilon (r){Q}^\varepsilon (r)dr|{\mathcal{F}}_{ s}^\varepsilon \right ] = 0.$$

Right multiplying both sides of the equation by $\widetilde{\mathrm{1}\mathrm{l}}$ and noting that $\widetilde{Q}(r)\widetilde{\mathrm{1}\mathrm{l}} = 0$, we obtain

$$E\left [{\overline{\chi }}^\varepsilon (t) -{\overline{\chi }}^\varepsilon (s) -{\int }_{s}^{t}{\chi }^\varepsilon (r)\widehat{Q}(r)\widetilde{\mathrm{1}\mathrm{l}}dr|{\mathcal{F}}_{ s}^\varepsilon \right ] = 0.$$

(5.115)

Note that

$$\left \vert {\int }_{s}^{t}{\chi }^\varepsilon (r)\widehat{Q}(r)\widetilde{\mathrm{1}\mathrm{l}}dr\right \vert = O(t - s).$$

It follows from (5.115) that

$$E\left [{I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}}\vert {\mathcal{F}}_{s}^\varepsilon \right ] = {I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=i\}} + O(t - s).$$

(5.116)

Note also that (I _A)² = I _A for any set A. We have, in view of (5.116),

$$\begin{array}{rl} &E\left [{\left ({I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}} - {I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}}\right )}^{2}|{\mathcal{F}}_{ s}^\varepsilon \right ] \\ & = E\left [{I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}} - 2{I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}}{I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}} + {I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}}|{\mathcal{F}}_{s}^\varepsilon \right ] \\ & = E\left [{I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}}|{\mathcal{F}}_{s}^\varepsilon \right ] - 2E\left [{I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=i\}}|{\mathcal{F}}_{s}^\varepsilon \right ]{I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=i\}} + {I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}} \\ & = {I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}} + O(t - s) \\ &\qquad \quad - 2\left ({I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}} + O(t - s)\right ){I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}} + {I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}} \\ & = O(t - s), \end{array}$$

for each i = 1, …, l. Hence,

$$\lim {}_{t\rightarrow s}\lim \limits_{\varepsilon \rightarrow 0}E\left \{E\left [{\left ({I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}} - {I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}}\right )}^{2}|{\mathcal{F}}_{ s}^\varepsilon \right ]\right \} = 0.$$

Therefore, by Lemma A.17, ${\overline{\chi }}^\varepsilon (\cdot )$ is tight.

The tightness of ${\overline{\chi }}^\varepsilon (\cdot )$ implies that for any sequence ε_k → 0, there exists a subsequence of {ε_k } (still denoted by {ε _k }) such that ${\overline{\chi }}^{\varepsilon _{k}}(\cdot )$ converges weakly. We next show that the limit of such a subsequence is uniquely determined by $\overline{Q}(\cdot ) := \mathrm{diag}({\nu }^{1}(\cdot ),\ldots,{\nu }^{l}(\cdot ))\widehat{Q}(\cdot )\widetilde{\mathrm{1}\mathrm{l}}$.

Note that

$$\begin{array}{rl} &{ \int }_{s}^{t}{\chi }^\varepsilon (r)\widehat{Q}(r)\widetilde{\mathrm{1}\mathrm{l}}dr ={ \int }_{s}^{t}{\overline{\chi }}^\varepsilon (r)\overline{Q}(r)dr \\ & +{ \int }_{s}^{t}\left ({\chi }^\varepsilon (r) -{\overline{\chi }}^\varepsilon (r)\mathrm{diag}({\nu }^{1}(r),\ldots,{\nu }^{l}(r))\right )\widehat{Q}(r)\widetilde{\mathrm{1}\mathrm{l}}dr.\end{array}$$

In view of Theorem 5.52, we have, as ε → 0,

$$E\left \vert {\int }_{s}^{t}\left [{\chi }^\varepsilon (r) -{\overline{\chi }}^\varepsilon (r)\mathrm{diag}({\nu }^{1}(r),\ldots,{\nu }^{l}(r))\right ]\widehat{Q}(r)\widetilde{\mathrm{1}\mathrm{l}}dr\right \vert \rightarrow 0.$$

(5.117)

Now by virtue of (5.115),

$$E\left [\left ({\overline{\chi }}^\varepsilon (t) -{\overline{\chi }}^\varepsilon (s) -{\int }_{s}^{t}{\chi }^\varepsilon (r)\widehat{Q}(r)\widetilde{\mathrm{1}\mathrm{l}}dr\right ){z}_{ 1}({\overline{\chi }}^\varepsilon ({t}_{ 1}))\cdots {z}_{j}({\overline{\chi }}^\varepsilon ({t}_{ j}))\right ] = 0,$$

for 0 ≤ t ₁ ≤ ⋯ ≤ t _j ≤ s ≤ t and bounded and continuous functions z ₁ ( ⋅), …, z _j( ⋅).

Let $\overline{\chi }(\cdot )$ denote the limit in distribution of ${\overline{\chi }}^{\varepsilon _{k}}(\cdot )$. Then in view of (5.117) and the continuity of ${\int }_{s}^{t}\eta (r)\overline{Q}(r)dr$ with respect to η( ⋅) (see Lemma A.40), we have ${\overline{\chi }}^\varepsilon (\cdot ) \rightarrow \overline{\chi }(\cdot )$ as ε_k → 0, and $\overline{\chi }(\cdot )$ satisfies

$$E\left [\left (\overline{\chi }(t) -\overline{\chi }(s) -{\int }_{s}^{t}\overline{\chi }(r)\overline{Q}(r)dr\right ){z}_{ 1}(\overline{\chi }({t}_{1}))\cdots {z}_{j}(\overline{\chi }({t}_{j}))\right ] = 0.$$

It is easy to see that $\overline{\chi }(\cdot ) = ({\overline{\chi }}_{1}(\cdot ),\ldots,{\overline{\chi }}_{l}(\cdot ))$ is an l-valued measurable process having sample paths in D ^l[0, T] and satisfying ${\overline{\chi }}_{i}(t) = 0$ or 1 and ${\overline{\chi }}_{1}(\cdot ) + \cdots +{ \overline{\chi }}_{l}(\cdot ) = 1$ w.p.1. Let

$$\overline{\alpha }(t) = \sum \limits_{i=1}^{l}i{I}_{\{{\overline{\chi }}_{ i}(t)=1\}},$$

or in an expanded form,

$$\overline{\alpha }(t) = \left \{\begin{array}{cl} 1,&\mbox{ if }{\overline{\chi }}_{1}(t) = 1, \\ 2,&\mbox{ if }{\overline{\chi }}_{1}(t) = 0,\;{\overline{\chi }}_{2}(t) = 1,\\ \vdots &\vdots \\ l, &\mbox{ if }{\overline{\chi }}_{i}(t) = 0,\mbox{ for }i \leq l - 1,\;{\overline{\chi }}_{l}(t) = 1. \end{array} \right.$$

Then $\overline{\alpha }(\cdot )$ is a process with sample paths in $D([0,T];\overline{\mathcal{M}})$ and

$$\overline{\chi }(t) = ({I}_{\{\overline{\alpha }(t)=1\}},\ldots,{I}_{\{\overline{\alpha }(t)=l\}})\mbox{ w.p.1}.$$

Therefore, $\overline{\alpha }(\cdot )$ is a Markov chain generated by $\overline{Q}(\cdot )$ . As a result, its distribution is uniquely determined by $\overline{Q}(\cdot )$ . It follows that ${\overline{\alpha }}^\varepsilon (\cdot )$ converges weakly to $\overline{\alpha }(\cdot )$. □

Remark 5.54

Note that Theorem 5.53 gives the same result as Theorem 5.27 under weaker conditions. The proofs are quite different. The proof of Theorem 5.53 is based on martingale properties associated with the Markov chain, whereas the proof of Theorem 5.27 follows the traditional approach, i.e., after the tightness is verified, the convergence of finite-dimensional distributions is proved.

Remark 5.55

In view of the development in Chapter 4, apart from the smoothness conditions, one of the main ingredients is the use of the Fredholm alternative. One hopes that this will carry over (under suitable conditions) to the measurable generators. A possible approach is the utilization of the formulation of weak derivatives initiated in the study of partial differential equations (see Hutson and Pym [90]).

Following the tactics of the weak sense formulation, for some T < ∞ and for given $g(\cdot ) \in {L}^{2}([0,T]; \mathbb{R})$ , a function $f(\cdot ) \in {L}^{2}([0,T]; \mathbb{R})$ is a weak solution of $(d/dt)f = g$ if

$${\int }_{0}^{T}f(t)\left ({ d\phi (t) \over dt} \right )dt ={ \int }_{0}^{T}g(t)\phi (t)dt$$

for any C ^∞ -functions on [0,T] vanishing on the boundary together with their derivatives (denoted by $\phi \in {C}_{0}^{\infty }([0,T]; \mathbb{R})$ ). Write the weak solution as $(d/dt)f\mbox{ w} =g$.

Recall that L _loc ² is the set of functions that lie in ${L}^{2}(S; \mathbb{R})$ for every closed and bounded set S ⊂ (0,T). A function f(⋅) ∈ L _loc ² has a jth-order weak derivative if there is a function g(⋅) ∈ L _loc ² such that

$${\int }_{0}^{T}g(t)\phi (t)dt = {(-1)}^{j}{ \int }_{0}^{T}f(t){ {d}^{j}\phi (t) \over d{t}^{j}} dt$$

for all $\phi \in {C}_{0}^{\infty }([0,T]; \mathbb{R})$ . The function g(⋅) above is called the jth-order weak derivative of f(⋅), and is denoted by D ^j f = g.

To proceed, define the space of functions H ⁿ as

$${H}^{n} =\{ f\mbox{ on }[0,T];\ \mbox{ for }0 \leq j \leq n,\ {D}^{j}f\mbox{ exist and are in }{L}^{2}([0,T]; \mathbb{R})\}.$$

Equip H ⁿ with an inner product and a norm as

$$\begin{array}{rl} &{(f,g)}_{n} = \sum \limits_{j\leq n}{ \int }_{0}^{T}{D}^{j}f{D}^{j}gdt, \\ &\vert f{\vert }_{n}^{2} = {(f,f)}_{ n} = \sum \limits_{j\leq n}{ \int }_{0}^{T}\vert {D}^{j}f{\vert }^{2}dt\end{array}$$

One can then work under such a framework and proceed to obtain the asymptotic expansion of the probability distribution. It seems that the conditions required are not much different from those in the case of smooth generators; we will not pursue this issue further.

5 Remarks on Inclusion of Transient and Absorbing States

So far, the development in this chapter has focused on Markov chains with only recurrent states (either a single weakly irreducible class or a number of weakly irreducible classes). This section extends the results obtained to the case that a transient class or a group of absorbing states is included.

5.1 Inclusion of Transient States

Consider the Markov chain ${\alpha }^\varepsilon (\cdot ) \in \mathcal{M}$, where its generator is still given by (5.47) and the state space of α^ε(t) is given by

$$\mathcal{M} = {\mathcal{M}}_{1} \cup {\mathcal{M}}_{2} \cup \cdots \cup {\mathcal{M}}_{l} \cup {\mathcal{M}}_{{_\ast}},$$

(5.118)

with ${\mathcal{M}}_{i} =\{ {s}_{i1},\ldots,{s}_{i{m}_{i}}\}$ and ${\mathcal{M}}_{{_\ast}} =\{ {s}_{{_\ast}1},\ldots,{s}_{{_\ast}{m}_{{_\ast}}}\}$. In what follows, we present results concerning the asymptotic distributions of scaled occupation measures and properties of measurable generators. While main assumptions and results are provided, the full proofs are omitted. The interested reader can derive the results using the ideas presented in the previous sections.

To proceed, assume that $\widetilde{Q}(t)$ is a generator of a Markov chain satisfying

$$\widetilde{Q}(t) = \left (\begin{array}{cccc} \widetilde{{Q}}^{1}(t) & & & \\ \\ \\ & \ddots & &\\ \\ \\ & & \widetilde{{Q}}^{l}(t) & \\ \\ \\ \widetilde{{Q}}_{{_\ast}}^{1}(t)&\cdots &\widetilde{{Q}}_{{_\ast}}^{l}(t)&\widetilde{{Q}}_{{_\ast}}(t)\\ \end{array} \right )$$

(5.119)

such that for each t ∈ [0, T] and each i = 1, …, l, $\widetilde{{Q}}^{i}(t)$ is a generator with dimension m _i ×m _i, $\widetilde{{Q}}_{{_\ast}}(t)$ is an m _∗ ×m _∗ matrix, $\widetilde{{Q}}_{{_\ast}}^{i}(t) \in {\mathbb{R}}^{{m}_{{_\ast}}\times {m}_{i}}$, and ${m}_{1} + {m}_{2} + \cdots + {m}_{l} + {m}_{{_\ast}} = m$. We impose the following conditions.

(A5.9)
For all t ∈ [0, T], and i = 1, …, l, $\widetilde{{Q}}^{i}(t)$ are weakly irreducible, and $\widetilde{{Q}}_{{_\ast}}(t)$ is Hurwitz (i.e., all of its eigenvalues have negative real parts). Moreover, $\widetilde{Q}(\cdot )$ is differentiable on [0, T] and its derivative is Lipschitz; $\widehat{Q}(\cdot )$ is Lipschitz continuous on [0, T].

Use the partition

$$\widehat{Q}(t) = \left (\begin{array}{*{10}c} \widehat{{Q}}^{11}(t)&\widehat{{Q}}^{12}(t) \\ \widehat{{Q}}^{21}(t)&\widehat{{Q}}^{22}(t)\\ \end{array} \right )$$

where

$$\begin{array}{rl} &\widehat{{Q}}^{11}(t) \in {\mathbb{R}}^{(m-{m}_{{_\ast}})\times (m-{m}_{{_\ast}})},\ \widehat{{Q}}^{12}(t) \in {\mathbb{R}}^{(m-{m}_{{_\ast}})\times {m}_{{_\ast}} }, \\ &\widehat{{Q}}^{21}(t) \in {\mathbb{R}}^{{m}_{{_\ast}}\times (m-{m}_{{_\ast}})},\ \mbox{ and }\ \widehat{{Q}}^{22}(t) \in {\mathbb{R}}^{{m}_{{_\ast}}\times {m}_{{_\ast}} }, \end{array}$$

and write

$$\begin{array}{ll} &{\overline{Q}}_{{_\ast}}(t) = \mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t))(\widehat{{Q}}^{11}(t)\widetilde{\mathrm{1}\mathrm{l}} +\widehat{ {Q}}^{12}(t)({a}_{{ m}_{1}}(t),\ldots,{a}_{{m}_{l}}(t))) \\ &\overline{Q}(t) = \mathrm{diag}({\overline{Q}}_{{_\ast}}(t),{0}_{{m}_{{_\ast}}\times {m}_{{_\ast}}}), \end{array}$$

(5.120)

where

$$\widetilde{\mathrm{1}\mathrm{l}} = \mathrm{diag}(\mathrm{1}{\mathrm{l}}_{{m}_{1}},\ldots,\mathrm{1}{\mathrm{l}}_{{m}_{l}}),\;\mathrm{1}{\mathrm{l}}_{{m}_{j}} = (1,\ldots,1)^{\prime} \in {\mathbb{R}}^{{m}_{j}\times 1},$$

and

$${a}_{{m}_{i}}(t) = -\widetilde{{Q}}_{{_\ast}}^{-1}(t)\widetilde{{Q}}_{ {_\ast}}^{i}(t)\mathrm{1}{\mathrm{l}}_{{ m}_{i}},\mbox{ for }i = 1,\ldots,l.$$

(5.121)

In what follows, if ${a}_{{m}_{i}}(t)$ is time independent, we will simply write it as ${a}_{{m}_{i}}$. The requirement on $\widetilde{{Q}}_{{_\ast}}(t)$ in (A5.9) implies that the corresponding states are transient. The Hurwitzian property also has the following interesting implication: For each t ∈ [0, T], and each i = 1, …, l, ${a}_{{m}_{i}}(t) = ({a}_{{m}_{i},1}(t),\ldots,{a}_{{m}_{i},{m}_{{_\ast}}}(t))^{\prime} \in {\mathbb{R}}^{{m}_{{_\ast}}\times 1}$. Then

$${a}_{{m}_{i},j}(t) \geq 0\ \mbox{ and }\ {\sum }_{i=1}^{l}{a}_{{ m}_{i},j}(t) = 1$$

(5.122)

for each j = 1, …, m _∗. That is, for each t ∈ [0, T] and each j = 1, …, l, $({a}_{{m}_{1},j}(t),\ldots,{a}_{{m}_{l},j}(t))$ can be considered a probability row vector. To see this, note that

$${\int }_{0}^{\infty }\exp (\widetilde{{Q}}_{ {_\ast}}(t)s)ds = -\widetilde{{Q}}_{{_\ast}}^{-1}(t),$$

which has nonnegative components. It follows from the definition that ${a}_{{m}_{i}}(t) \geq 0$. Furthermore,

$${\sum }_{i=1}^{l}{a}_{{ m}_{i}}(t) = -\widetilde{{Q}}_{{_\ast}}^{-1}(t){\sum }_{i=1}^{l}\widetilde{{Q}}_{ {_\ast}}^{i}(t)\mathrm{1}{\mathrm{l}}_{{ m}_{i}} = (-\widetilde{{Q}}_{{_\ast}}^{-1}(t))(-\widetilde{{Q}}_{ {_\ast}}(t))\mathrm{1}{\mathrm{l}}_{{m}_{{_\ast}}} = \mathrm{1}{\mathrm{l}}_{{m}_{{_\ast}}}.$$

Thus (5.122) follows. Similar to the development in the section for the case of weak and strong interactions, we can derive the following results.

Theorem 5.56

Define

$${ \chi }_{ij}^\varepsilon (t) = \left \{\begin{array}{@{}l@{\quad }l@{}} {\int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (s)={s}_{ij}\}} - {\nu }_{j}^{i}(s){I}_{\{{ \alpha }^\varepsilon (s)\in {\mathcal{M}}_{i}\}}\right )ds,\quad &fori=1,\ldots,l, \\ {\int }_{0}^{t}{I}_{\{{ \alpha }^\varepsilon (s)={s}_{{_\ast}j}\}}ds, \quad &fori=*,\\ \quad \end{array} \right.$$

(5.123)

and assume (A5.9) . Then for each j = 1,…,m _i,

$${ \sup }_{t\in [0,T]}E\vert {\chi }_{ij}^\varepsilon (t){\vert }^{2} = \left \{\begin{array}{@{}l@{\quad }l@{}} O(\varepsilon ), \quad &fori=1,\ldots,l, \\ O(\varepsilon ^{2}),\quad &fori=*.\\ \quad \end{array} \right.$$

(5.124)

Next, for each fixed t ∈ [0, T], let ξ be a random variable uniformly distributed on [0, 1] that is independent of α^ε( ⋅). For each j = 1, …, m _∗, define an integer-valued random variable ξ_j(t) by

$$\begin{array}{rl} {\xi }_{j}(t)& = {I}_{\{0\leq \xi \leq {a}_{{m}_{ 1},j}(t)\}} + 2{I}_{\{{a}_{{m}_{ 1},j}(t)<\xi \leq {a}_{{m}_{1},j}(t)+{a}_{{m}_{2},j}(t)\}} \\ & \qquad + \cdots + l{I}_{\{{a}_{{m}_{ 1},j}(t)+\cdots +{a}_{{m}_{l-1},j}(t)<\xi \leq 1\}}\end{array}$$

Now redefine the aggregated process ${\overline{\alpha }}^\varepsilon (\cdot )$ by

$${ \overline{\alpha }}^\varepsilon (t) = \left \{\begin{array}{ll} i, &\mbox{ if }{\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ i}, \\ {\xi }_{j}(t),&\mbox{ if }{\alpha }^\varepsilon (t) = {s}_{{_\ast}j}(t).\\ \end{array} \right.$$

(5.125)

Note that the state space of ${\overline{\alpha }}^\varepsilon (t)$ is $\overline{\mathcal{M}} =\{ 1,\ldots,l\}$, and that ${\overline{\alpha }}^\varepsilon (\cdot ) \in D([0,T];\overline{\mathcal{M}})$. Similar to the weak and strong interaction case, but with more effort, we can obtain the following result.

Theorem 5.57

Under conditions (A5.9), ${\overline{\alpha }}^\varepsilon (\cdot )$ converges weakly to $\overline{\alpha }(\cdot )$ , a Markov chain generated by ${\overline{Q}}_{{_\ast}}(\cdot )$ given by (5.120).

Next, for t ≥ 0, and $\alpha \in \mathcal{M}$, let β_ij(t) be bounded Borel measurable deterministic functions, and let

$${ W}_{ij}(t,\alpha ) = \left \{\begin{array}{ll} ({I}_{\{\alpha ={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{\alpha \in {\mathcal{M}}_{i}\}}){\beta }_{ij}(t),&\mbox{ if }i = 1,\ldots,l,\ j = 1,\ldots,{m}_{i}, \\ {I}_{\{\alpha ={s}_{{_\ast}j}\}}{\beta }_{ij}(t), &\mbox{ if }i = {_\ast},\ j = 1,\ldots,{m}_{{_\ast}}. \end{array} \right.$$

(5.126)

Consider the normalized occupation measure

$${n}^\varepsilon (t) = ({n}_{ 11}^\varepsilon (t),\ldots,{n}_{ 1{m}_{1}}^\varepsilon (t),\ldots,{n}_{ {_\ast}1}^\varepsilon (t),\ldots,{n}_{ {_\ast}{m}_{{_\ast}}}^\varepsilon (t)),$$

where

$${n}_{ij}^\varepsilon (t) = \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}{W}_{ ij}(s,{\alpha }^\varepsilon (s))ds.$$

We can then proceed to obtain the asymptotic distribution.

Theorem 5.58

Assume (A5.9) , and suppose $\widetilde{Q}(\cdot )$ is twice differentiable with Lipschitz continuous second derivative. Moreover, $\widehat{Q}(\cdot )$ is differentiable with Lipschitz continuous derivative. Let β _ij (⋅) (for i = 1,…,l, j = 1,…,m _i ) be bounded and Lipschitz continuous deterministic functions. Then n ^ε (⋅) converges weakly to a switching diffusion n(⋅), where

$$n(t) = \left({\int }_{0}^{t}\sigma (s,\overline{\alpha }(s))dw(s)\right)^{\prime},$$

(5.127)

where σ(s,i) is similar to (5.105) with the following modifications:

$$\sigma (s,i) = \mathrm{diag}({0}_{{m}_{1}\times {m}_{1}},\ldots,{\sigma }^{0}(s,i),\ldots,{0}_{{ m}_{l}\times {m}_{l}},{0}_{{m}_{{_\ast}}\times {m}_{{_\ast}}})$$

(5.128)

and w(⋅) is a standard m-dimensional Brownian motion.

Finally, we confirm that the case of the generators being merely measurable can be treated as well. We state this as the following theorem.

Theorem 5.59

Assume the generator is given by (5.47) with $\widetilde{Q}(\cdot )$ given by (5.120) such that $\widetilde{Q}$ and $\widehat{Q}$ are measurable and bounded and that $\widetilde{{Q}}^{i}(t)$ is weakly irreducible for each $i = 1,\ldots,l$ . Then the following assertions hold:

For any i = 1, …, l, j = 1, …, m _i, and bounded deterministic function β_ij(t), t ≥ 0,
$$E\left ({\int }_{0}^{T}\left ({I}_{\{{ \alpha }^\varepsilon (t)={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=i\}}\right ){\beta }_{ij}(t)dt\right )^{2} \rightarrow 0,\mbox{ as }\varepsilon \rightarrow 0.$$
${\overline{\alpha }}^\varepsilon (\cdot )$ converges weakly to $\overline{\alpha }(\cdot )$, a Markov chain generated by ${\overline{Q}}_{{_\ast}}(\cdot )$.

5.2 Inclusion of Absorbing States

Consider the Markov chain ${\alpha }^\varepsilon (\cdot ) \in \mathcal{M}$, where the generator of α^ε( ⋅) is still given by (5.47) with

$$\widetilde{Q}(t) = \mathrm{diag}(\widetilde{{Q}}^{1}(t),\ldots,\widetilde{{Q}}^{l}(t),{0}_{{ m}_{a}\times {m}_{a}}),$$

(5.129)

where ${0}_{{m}_{a}\times {m}_{a}}$ is the m _a ×m _a zero matrix, the state space of α^ε(t) is given by

$$\mathcal{M} = {\mathcal{M}}_{1} \cup {\mathcal{M}}_{2} \cup \cdots \cup {\mathcal{M}}_{l} \cup {\mathcal{M}}_{a},$$

(5.130)

with ${\mathcal{M}}_{i} =\{ {s}_{i1},\ldots,{s}_{i{m}_{i}}\}$ and ${\mathcal{M}}_{a} =\{ {s}_{a1},\ldots,{s}_{a{m}_{a}}\}$, and ${m}_{1} + {m}_{2} + \cdots + {m}_{l} + {m}_{a} = m$. Assume the following conditions.

(A5.10)
For all t ∈ [0, T] and i = 1, …, l, $\widetilde{{Q}}^{i}(t)$ is weakly irreducible. Furthermore, $\widetilde{Q}(\cdot )$ is differentiable on [0, T] and its derivative is Lipschitz. Moreover, $\widehat{Q}(\cdot )$ is Lipschitz continuous on [0, T].

Define

$$\begin{array}{ll} &\widetilde{\mathrm{1}\mathrm{l}} = \mathrm{diag}(\mathrm{1}{\mathrm{l}}_{{m}_{1}},\ldots,\mathrm{1}{\mathrm{l}}_{{m}_{l}})\ \mbox{ and }\ \widetilde{\mathrm{1}{\mathrm{l}}}_{a} = \mathrm{diag}(\widetilde{\mathrm{1}\mathrm{l}},{I}_{{m}_{a}}) \\ &\overline{Q}(t) = \mathrm{diag}({\nu }^{1}(t),{\nu }^{2}(t),\ldots,{\nu }^{l}(t),{I}_{{ m}_{a}})\widehat{Q}(t)\widetilde{\mathrm{1}{\mathrm{l}}}_{a}.\end{array}$$

(5.131)

Assume that the conditions in (A5.10) are satisfied. Then we can prove the following:

(a)
As ε → 0,
$${p}^\varepsilon (t) = ({\vartheta}(t),{{\vartheta}}^{a}(t))\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t),{I}_{{ m}_{a}}) + O\left (\varepsilon +\exp (-{\kappa }_{0}t/\varepsilon )\right ),$$
where
$$\begin{array}{rl} &{\vartheta}(t) = ({{\vartheta}}^{1}(t),\ldots,{{\vartheta}}^{l}(t))) \in {\mathbb{R}}^{1\times l}\ \mbox{ and } \\ &{{\vartheta}}^{a}(t) = ({{\vartheta}}_{ 1}^{a}(t),\ldots,{{\vartheta}}_{{ m}_{a}}^{a}(t)) \in {\mathbb{R}}^{1\times {m}_{a} },\end{array}$$
satisfying
$$\begin{array}{rl} { d({\vartheta}(t),{{\vartheta}}^{a}(t)) \over dt} = ({\vartheta}(t),{{\vartheta}}^{a}(t))\overline{Q}(t),\ \ ({\vartheta}(0),{{\vartheta}}^{a}(0)) = {p}^\varepsilon (0)\widetilde{\mathrm{1}{\mathrm{l}}}_{ a} \end{array}$$
where $\overline{Q}(t)$ is given in (5.131) and p ^ε(0) = (p ^ε, 1(0), …, p ^ε, l(0), p ^ε, a(0)) with ${p}^{\varepsilon,i}(0) \in {\mathbb{R}}^{1\times {m}_{i}}$ and ${p}^{\varepsilon,a}(0) \in {\mathbb{R}}^{1\times {m}_{a}}$.
(b)
For the transition probability P ^ε(t, t ₀), we have
$${P}^\varepsilon (t,{t}_{ 0}) = {P}^{0}(t,{t}_{ 0}) + O\left (\varepsilon +\exp (-{\kappa }_{0}(t - {t}_{0})/\varepsilon ))\right ),$$
(5.132)
for some κ₀ > 0, where
$${P}^{0}(t,{t}_{ 0}) =\widetilde{ \mathrm{1}{\mathrm{l}}}_{a}\Theta (t,{t}_{0})\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t),{I}_{{ m}_{a}}),$$
and
$${ d\Theta (t,{t}_{0}) \over dt} = \Theta (t,{t}_{0})\overline{Q}(t),\ \ \Theta ({t}_{0},{t}_{0}) = I.$$

To proceed, we aggregate the states in ${\mathcal{M}}_{i}$ for i = 1, …, l as one state leading to the definition of the following process:

$${ \overline{\alpha }}^\varepsilon (t) = \left \{\begin{array}{@{}l@{\quad }l@{}} i, \quad &if{\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ i}, \\ {\alpha }^\varepsilon (t),\quad &if{\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ a}.\\ \quad \end{array} \right.$$

(5.133)

For each j = 1, …, m _i, we also define a sequence of centered occupation measures by

$${ \chi }_{ij}^\varepsilon (t) = \left \{\begin{array}{@{}l@{\quad }l@{}} {\int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (s)={s}_{ij}\}} - {\nu }_{j}^{i}(s){I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=i\}}\right )ds,\quad &fori=1,\ldots,l, \\ {\int }_{0}^{t}({I}_{\{{ \alpha }^\varepsilon (s)={s}_{aj}\}} - {{\vartheta}}_{j}^{a}(s))ds. \quad & \\ \quad \end{array} \right.$$

(5.134)

For t ≥ 0 and $\alpha \in \mathcal{M}$, let

$${ W}_{ij}(t,\alpha ) = \left \{\begin{array}{@{}l@{\quad }l@{}} {I}_{\{\alpha ={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{ \alpha \in {\mathcal{M}}_{i}\}},\quad &fori = 1,\ldots,l,j = 1,\ldots,{m}_{i}, \\ {I}_{\{\alpha ={s}_{aj}\}} - {{\vartheta}}_{j}^{a}(t), \quad &forj = 1,\ldots,{m}_{ a}.\\ \quad \end{array} \right.$$

(5.135)

Consider the normalized occupation measure

$${n}^\varepsilon (t) = ({n}_{ 11}^\varepsilon (t),\ldots,{n}_{ 1{m}_{1}}^\varepsilon (t),\ldots,{n}_{ l1}^\varepsilon (t),\ldots,{n}_{ l{m}_{l}}^\varepsilon (t),{n}_{ a1}^\varepsilon (t),\ldots,{n}_{ a{m}_{a}}^\varepsilon (t)),$$

where

$${n}_{ij}^\varepsilon (t) = \left \{\begin{array}{@{}l@{\quad }l@{}} \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}{W}_{ ij}(s,{\alpha }^\varepsilon (s)){\beta }_{ ij}(s)ds,\ i = 1,\ldots,l,\ j = 1,\ldots,{m}_{i},\quad \\ {\int }_{0}^{t}{W}_{ aj}(s,{\alpha }^\varepsilon (s)){\beta }_{ aj}(s)ds,\ j = 1,\ldots,{m}_{a}. \quad \\ \quad \end{array} \right.\ $$

Note that

$$\begin{array}{rl} &{ d{n}^\varepsilon (t) \over dt} = \left \{\begin{array}{@{}l@{\quad }l@{}} \frac{1} {\sqrt\varepsilon }{W}^{r}(t,{\alpha }^\varepsilon (t)),\quad &for{\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ 1} \cup \cdots \cup {\mathcal{M}}_{l}, \\ {W}^{a}(t,{\alpha }^\varepsilon (t)), \quad &for{\alpha }^\varepsilon (t) \in {\mathcal{M}}_{a},\\ \quad \end{array} \right. \\ &{n}^\varepsilon (0) = 0, \end{array}$$

where

$$\begin{array}{rl} &{W}^{r}(t,\alpha ) = ({W}_{ 11}(t,\alpha ),\ldots,{W}_{1{m}_{1}}(t,\alpha ),\ldots,{W}_{l1}(t,\alpha ),\ldots,{W}_{l{m}_{l}}(t,\alpha )), \\ &{W}^{a}(t,\alpha ) = ({W}_{ a1}(t,\alpha ),\ldots,{W}_{a{m}_{a}}(t,\alpha )),\mbox{ and } \\ &W(t,\alpha ) = ({W}^{r}(t,\alpha ),{W}^{a}(t,\alpha ))\end{array}$$

$$\langle {W}^{a}(u,\alpha )),{\nabla }_{ x}^{a}{f}^{0}(x,\alpha )\rangle = \sum \limits_{j=1}^{{m}_{a} }{b}_{j}(u,\alpha ){ \partial \over {\partial }_{a,j}} {f}^{0}(x,\alpha ).$$

We can obtain the following results.

Theorem 5.60

Assume (A5.10) . Then the following assertions hold.

(a)
For all i = 1, …, l and j = 1, …, m _i, corresponding to the recurrent states, sup_{t ∈ [0, T]} E | O _ij ^ε(t) | ² = O(ε).
(b)
${\overline{\alpha }}^\varepsilon (\cdot )$ converges weakly to $\overline{\alpha }(\cdot )$, a Markov chain generated by $\overline{Q}(\cdot )$.
(c)
Define the generator $\mathcal{L}$ by
$$\begin{array}{rl} \mathcal{L}{f}^{0}(x,\alpha )& ={ 1 \over 2} {\sum }_{{j}_{1},{j}_{2}=1}^{{m}_{\alpha } }{a}_{{j}_{1}{j}_{2}}(s,\alpha ){\partial }_{\alpha,{j}_{1}{j}_{2}}^{2}{f}^{0}(x,\alpha ) \\ &\ + \sum \limits_{j=1}^{{m}_{a} }{b}_{j}(s,\alpha ){\partial }_{a,j}{f}^{0}(x,\alpha ) + \overline{Q}(s){f}^{0}(x,\cdot )(\alpha )\end{array}$$
Then the sequence ${Y }^\varepsilon (\cdot ) = ({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))$ converges weakly to $\overline{Y }(\cdot ) = (n(\cdot ),\overline{\alpha }(\cdot ))$ that is a solution of the martingale problem with operator $\mathcal{L}$.

Next, assume that $\widetilde{Q}(\cdot )$ and $\widehat{Q}(\cdot )$ are bounded and measurable and $\widetilde{{Q}}^{i}(t)$ for each $i = 1,\ldots,l$ is weakly irreducible. Then

$${p}^\varepsilon (t) = ({p}_{ 11}^\varepsilon (t),\ldots,{p}_{ 1m1}^\varepsilon (t),\ldots,{p}_{ l1}^\varepsilon (t),\ldots,{p}_{ l{m}_{l}}^\varepsilon (t),{p}_{ a1}^\varepsilon (t),\ldots,{p}_{ a{m}_{a}}^\varepsilon (t))$$

converges in the weak topology of ${L}^{2}([0,T]; {\mathbb{R}}^{m})$ (with $m = \sum \limits_{i=1}^{l}{m}_{i} + {m}_{a})$ to

$$p(t) = ({{\vartheta}}_{1}(t){\nu }^{1}(t),\ldots,{\nu }^{1}(t){{\vartheta}}_{ l}(t),{p}^{0,a}),$$

where p ^0, a is the subvector in the initial data p ⁰ corresponding the absorbing state.

Note that in deriving the asymptotic distribution of the scaled occupation measures, we need to compute the asymptotic covariance of the limit process. That is, we need to evaluate the limit of

$$\begin{array}{ll} &E{\int }_{0}^{t}\left (\begin{array}{*{10}c} { 1 \over \sqrt\varepsilon } ({W}^{r}(s,{\alpha }^\varepsilon (s)))^{\prime} \\ ({W}^{a}(s,{\alpha }^\varepsilon (s)))^{\prime}\\ \end{array} \right )\left (\begin{array}{*{10}c} { 1 \over \sqrt\varepsilon } {W}^{r}(s,{\alpha }^\varepsilon (s)),&{W}^{a}(s,{\alpha }^\varepsilon (s)) \\ \end{array} \right )ds \\ &\qquad \stackrel{\mathrm{def}}{=}\left (\begin{array}{*{10}c} {W}_\varepsilon ^{rr}(t)&{W}_\varepsilon ^{ra}(t) \\ {W}_\varepsilon ^{ar}(t)&{W}_\varepsilon ^{aa}(t)\\ \end{array} \right ),\end{array}$$

(5.136)

where

$$\begin{array}{rl} &{W}_\varepsilon ^{rr}(t) ={ 1 \over \varepsilon } {\int }_{0}^{t}E({W}^{r}(s,{\alpha }^\varepsilon (s)))^{\prime}{W}^{r}(s,{\alpha }^\varepsilon (s))ds \\ &{W}_\varepsilon ^{ra}(t) ={ 1 \over \sqrt\varepsilon } {\int }_{0}^{t}E({W}^{r}(s,{\alpha }^\varepsilon (s)))^{\prime}{W}^{a}(s,{\alpha }^\varepsilon (s))ds \\ &{W}_\varepsilon ^{ar}(t) ={ 1 \over \sqrt\varepsilon } {\int }_{0}^{t}E({W}^{a}(s,{\alpha }^\varepsilon (s)))^{\prime}{W}^{r}(s,{\alpha }^\varepsilon (s))ds \\ &{W}_\varepsilon ^{aa}(t) ={ \int }_{0}^{t}E({W}^{a}(s,{\alpha }^\varepsilon (s)))^{\prime}{W}^{a}(s,{\alpha }^\varepsilon (s))ds\end{array}$$

It can be shown that

$${W}_\varepsilon ^{rr}(t) \rightarrow {\overline{W}}^{r}(t),\ {W}_{ \varepsilon }^{ra}(t) \rightarrow 0,\ {W}_{ \varepsilon }^{ar}(t) \rightarrow 0,\ \mbox{ and }{W}_{ \varepsilon }^{aa}(t) \rightarrow {\overline{W}}^{a}(t),$$

as ε → 0, where for i = 1, …, l, ${\overline{W}}^{r}(t) ={ \overline{W}}^{r}(t,i) ={ \int }_{0}^{t}\widehat{{W}}^{r}(s,i)ds$ with

$$\widehat{{W}}^{r}(s,i) = \mathrm{diag}({0}_{{ m}_{1}\times {m}_{1}},\ldots,\sigma (s,i),\ldots,{0}_{{m}_{l}\times {m}_{l}})$$

(5.137)

with σ(s, i) the m _i ×m _i matrix such that $\sigma (s,i)\sigma ^{\prime}(s,i) = A(s,i)\mbox{ for }i = 1,\ldots,l,$ and

$$\begin{array}{ll} &{\overline{W}}^{a}(t) = ({\overline{W}}_{ jk}^{a}(t))\mbox{ with }\ {\overline{W}}_{ jk}^{a}(t) ={ \int }_{0}^{t}\left ({\delta }_{ jk}{{\vartheta}}_{j}^{a}(s) - {{\vartheta}}_{ j}^{a}(s){{\vartheta}}_{ k}^{a}(s)\right )ds, \end{array}$$

(5.138)

where δ_jk = 1 if j = k, δ_jk = 0 if j ≠ k. The detailed proof of Theorem 5.60 can be found in Yin, Zhang, and Badowski [241].

6 Remarks on a Stability Problem

So far, our study has been devoted to systems with two time scales in a finite interval. In many problems arising in networked control systems, stability is often a main concern. A related problem along this line is in Badowski and Yin [5].

It is interesting to note that intuitive ideas sometimes are not necessarily true for systems with switching, for example, if one put together two stable systems by using, for instance, Markovian switching. Our intuition may lead to the conclusion that the combined systems should also be stable. Nevertheless, this is, in fact, not true. Such an idea was illustrated in Wang, Khargonekar, and Beydoun [212] for deterministically switched systems; see also Chapter 1 of this book concerning this matter.

As a variation of the system in [212], we consider the following example. Suppose that α^ε( ⋅) is a continuous-time Markov chain with state space $\mathcal{M} =\{ 1,2\}$ and generator ${Q}^\varepsilon = Q/\varepsilon $, where $Q = \left (\begin{array}{cc} - 1& 1\\ \\ \\ 1 & -1 \end{array} \right )$. Consider a controlled system

$$\dot{x} = A({\alpha }^\varepsilon (t))x + B({\alpha }^\varepsilon (t))u(t),$$

with state feedback u(t) = K(α^ε(t))x(t). Then we obtain the equivalent representation

$$\dot{x} = [A({\alpha }^\varepsilon (t)) - B({\alpha }^\varepsilon (t))K({\alpha }^\varepsilon (t))]x.$$

(5.139)

Suppose that

$$\begin{array}{rl} &G(1) = A(1) - B(1)K(1) = \left (\begin{array}{cc} - 100& 20\\ \\ \\ 200 & - 100 \end{array} \right ), \\ &G(2) = A(2) - B(2)K(2) = \left (\begin{array}{cc} - 100& 200\\ \\ \\ 20 & - 100 \end{array} \right )\end{array}$$

Note that both matrices are Hurwitz (i.e., their eigenvalues have negative real parts). A question of interest is this: Is system (5.139) stable? The key to understanding the system is to examine

$$\dot{{x}}^\varepsilon (t) = G({\alpha }^\varepsilon (t)){x}^\varepsilon (t),$$

(5.140)

where both G(1) and G(2) are stable matrices.

Since Q is irreducible, the stationary distribution associated with Q is given by $(1/2,1/2)$. As a result, as ε → 0, using our weak convergence result, x ^ε( ⋅) converges weakly to x( ⋅), which is a solution of the system

$$\begin{array}{ll} &\dot{x}(t) = \overline{G}x(t),\ \mbox{ where } \\ &\overline{G} ={ 1 \over 2} (G(1) + G(2)) = \left (\begin{array}{cc} - 100& 110\\ \\ \\ 110 & - 100 \end{array} \right ).\end{array}$$

(5.141)

In addition, for any T < ∞, using the large deviations result obtained in He, Yin, and Zhang [84], we can show that for any δ > 0, there is a c ₁ > 0 such that

$$P({\rho }_{0,T}({x}^\varepsilon (t),x(t)) \geq \delta ) \leq \exp (-{c}_{ 1}/\varepsilon ),$$

(5.142)

where ${\rho }_{0,T}(x,y) {=\sup }_{0\leq t\leq T}\vert x(t) - y(t)\vert $.

Note that $\overline{G}$ is an unstable matrix with eigenvalues − 210 and 10. Thus for (5.141), the critical point (0, 0)′ is a saddle point. But why should the stability of the averaged system dominate that of the original system? To see this, from a result of differential equations, there is a nonsingular matrix H such that $H\overline{G}{H}^{-1} = \Lambda = \mathrm{diag}(-210,10)$. Clearly, the stability of (5.141) is equivalent to that of

$$\dot{y}(t) = \Lambda y(t) = H{\sum }_{i=1}^{2}{\nu }_{ i}G(i){H}^{-1}y(t) = \mathrm{diag}(-210,10)y(t),$$

(5.143)

where $y = Hx = ({y}_{1},{y}_{2})^{\prime}$. The stability of (5.141) is equivalent to that of (5.143), which is completely decoupled and ${y}_{1}(t) =\exp (-210t){y}_{1}(0) \rightarrow 0$ and y ₂(t) = exp(10t)y ₂(0) → ∞. To see how the original system (5.140) behaves, we apply the same transformation to get

$$\dot{{y}}^\varepsilon (t) = H{\sum }_{i=1}^{2}{I}_{\{{ \alpha }^\varepsilon (t)=i\}}G(i){H}^{-1}{y}^\varepsilon (t).$$

(5.144)

For the transformed system (5.143), by choosing $V (y) = {y}_{2}^{2}/2$, we obtain $\dot{V }(y(t)) = 10{y}_{2}^{2} > 0$ for all y ₂ ≠ 0. Define ${L}^\varepsilon z(t) {=\lim }_{\delta \downarrow 0}{E}_{t}^\varepsilon [z(t + \delta ) - z(t)]/\delta $ for a real-valued function z(t) that is continuously differentiable, where E _t ^ε denotes the conditioning on the ${\mathcal{F}}_{t}^\varepsilon = \sigma \{{\alpha }^\varepsilon (s) : s \leq t\}$. With $V (y) = {y}_{2}^{2}/2$, we have

$${L}^\varepsilon V ({y}^\varepsilon (t)) = 10{({y}_{ 2}^\varepsilon (t))}^{2} + {V ^{\prime}}_{ y}({y}^\varepsilon (t))H{\sum }_{i=1}^{2}[{I}_{\{{ \alpha }^\varepsilon (t)=i\}} - {\nu }_{i}]G(i){H}^{-1}{y}^\varepsilon (t),$$

where ${V ^{\prime}}_{y}(y) = (0,{y}_{2}) \in {\mathbb{R}}^{1\times 2}$. Using perturbed Liapunov function techniques as done in Badowski and Yin [5], define a perturbation

$${V }_{2}^\varepsilon (y,t) = {E}_{ t}^\varepsilon { \int }_{t}^{\infty }{e}^{t-s}{V }_{ y}^{\prime}(y)H{\sum }_{i=1}^{2}[{I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu }_{i}]G(i){H}^{-1}y.$$

It can be shown that V ₂ ^ε(y, t) = O(ε)V (y). In addition,

$$\begin{array}{rl} &{L}^\varepsilon {V }_{ 2}^\varepsilon ({y}^\varepsilon (t),t) = -{V ^{\prime}}_{ y}({y}^\varepsilon (t))H{\sum }_{i=1}^{2}[{I}_{\{{ \alpha }^\varepsilon (t)=i\}} - {\nu }_{i}]G(i){H}^{-1}{y}^\varepsilon (t) \\ & + O(\varepsilon )V ({y}^\varepsilon (t))\end{array}$$

Define

$${V }^\varepsilon (y,t) = V (y) + {V }_{ 2}^\varepsilon (y,t).$$

Evaluate L ^ε V ^ε(y ^ε(t), t). Upon cancelation, for sufficiently small ε, we can make

$$O(\varepsilon )V ({y}^\varepsilon (t)) \geq -{({y}_{ 2}^\varepsilon (t))}^{2}.$$

It then follows that

$$\begin{array}{rl} {L}^\varepsilon {V }^\varepsilon ({y}^\varepsilon (t),t)& = 10{({y}_{ 2}^\varepsilon (t))}^{2} + O(\varepsilon )V ({y}^\varepsilon (t)) \\ & \geq 9{({y}_{2}^\varepsilon (t))}^{2}\end{array}$$

Taking expectation of the left- and right-hand sides above leads to

$${ d \over dt} E\vert {y}_{2}^\varepsilon (t){\vert }^{2} \geq 9E\vert {y}_{ 2}^\varepsilon (t){\vert }^{2},$$

which in turn yields that

$$E{({y}_{2}^\varepsilon (t))}^{2} \geq E{({y}_{ 2}^\varepsilon (0))}^{2}\exp (9t) \rightarrow \infty \ \mbox{ as }\ t \rightarrow \infty.$$

Similar to the previous development, choose $V (y) = {y}_{1}^{2}/2$, define

$${V }_{1}^\varepsilon (y,t) = {E}_{ t}^\varepsilon { \int }_{t}^{\infty }{e}^{t-s}{V }_{ y}^{\prime}(y)H{\sum }_{i=1}^{2}[{I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu }_{i}]G(i){H}^{-1}y,$$

and redefine

$${V }^\varepsilon (y,t) = V (y) + {V }_{ 1}^\varepsilon (y,t).$$

Using the upper bound O(ε)V (y ^ε(t)) ≤ (y ₁ ^ε(t))² this time and calculating L ^ε V ^ε(y ^ε(t), t), we obtain

$${ d \over dt} E\vert {y}_{1}^\varepsilon (t){\vert }^{2} \leq -209E\vert {y}_{ 1}^\varepsilon (t){\vert }^{2},$$

which in turn yields that

$$E{({y}_{1}^\varepsilon (t))}^{2} \leq E{({y}_{ 1}^\varepsilon (0))}^{2}\exp (-209t) \rightarrow 0\ \mbox{ as }\ t \rightarrow \infty.$$

This yields that (5.144) and hence (5.140) are unstable in probability (see Yin and Zhu [244, p. 220] for a definition). In fact, it can be seen that the trivial solution of the original system is also a saddle.

In the same spirit of the last example, consider a system given by

$$\begin{array}{ll} &\dot{{x}}^\varepsilon (t) = G({\alpha }^\varepsilon (t)){x}^\varepsilon (t),\ \ {\alpha }^\varepsilon (t) \sim Q/\varepsilon,\ \mbox{ where } \\ &G(1) = \left (\begin{array}{rr} -{ 7 \over 3} & - 1\\ \\ \\ 0& 1 \end{array} \right ),\ G(2) = \left (\begin{array}{rr} 1& 0\\ \\ \\ - 1& -{ 7 \over 3} \end{array} \right ), \end{array}$$

(5.145)

where Q is as in the last example. Then it can be shown that x ^ε( ⋅) converges weakly to x( ⋅) that is a solution of the following system

$$\begin{array}{ll} \dot{x}(t)& = \overline{G}x(t), \\ \overline{G} & = \left (\begin{array}{rr} -{ 4 \over 3} & - 1\\ \\ \\ - 1& -{ 4 \over 3} \end{array} \right )\end{array}$$

(5.146)

Neither G(1) nor G(2) is a stable matrix, but the system (5.146) is a stable one. The stability analysis is again carried out using perturbed Liapunov function methods. Here exactly the same kind of argument as in [5] can be applied. Using the techniques of perturbed Liapunov functions, we can show that the stability of the averaged system “implies” that of the original system.

These two examples illustrate that one can combine two stable systems using Markovian switching to produce an unstable limit system. Likewise, one can combine two unstable systems to produce a limit stable systems. More importantly, using our weak convergence result of this chapter and the large deviations results in He, Yin, and Zhang [84], combined with the perturbed Liapunov function argument, we can give the reason why such a thing can happen.

7 Notes

This chapter concerns sequences of functional occupation measures. It includes convergence of an unscaled sequence (in probability) and central-limit-type results for suitably scaled sequences. For a general introduction to central limit theorems, we refer to the book by Chow and Teicher [30] and the references therein. In the stationary case, that is, Q(t) = Q, a constant matrix, the central limit theorem may be obtained as in Friedlin and Wentzell [67]. Some results of central limit type for discrete Markov chains are in Dobrushin [50] (see also the work of Linnik on time-inhomogeneous Markov chains [147]). Work in the context of random evolution, with primary concern the central limit theorem involving a singularly perturbed Markov chain, is in Pinsky [176]; see also Kurtz [135, 137] for related discussions and the martingale problem formulation. Exponential bounds for Markov processes are quite useful in analyzing the behavior of the underlying stochastic processes. Some results in connection with diffusions can be found in Kallianpur [102]. Corollary 5.8 can be viewed as a large deviations result. For extensive treatment of large deviations, see Varadhan [207].

The central theme here is limit results of unscaled as well as scaled sequences of occupation measures, which include the law of large numbers for an unscaled sequence, exponential upper bounds, and asymptotic distribution of a suitably scaled sequence of occupation times. Results in Section 5.2 are based on the paper of Zhang and Yin [252]; however, a somewhat different approach to the central limit theorem was used in [252]. Some of the results in Section 5.3 are based on Zhang and Yin [253]. The result on exponential error bound in Section 5.3 is a natural extension for the irreducible generators. Such result holds uniformly in t ∈ [0, T] for fixed but otherwise arbitrary T > 0. The main motivation for treating T as a parameter stems from various control and optimization problems with discounted cost over the infinite horizon. In such a situation, the magnitude of the bound counts. Thus detailed information on the bounding constant is helpful for dealing with the near optimality of the underlying problem. Section 5.3 also presents a characterization of the limit process using martingale problem formulations. Much of the foundation of this useful approach is in the work of Stroock and Varadhan [203]. Using perturbed operators to study limit behavior may be traced back to Kurtz [135]. The general idea of perturbed test functions was used in Blankenship and Papanicolaou [16], and Papanicolaou, Stroock, and Varadhan [168]. It was further developed and extended by Kushner [139] for various stochastic systems, and singularly perturbed systems in Kushner [140]; see also Kushner and Yin [145] for related stochastic approximation problems, and Ethier and Kurtz [59] and Kurtz [137] for related work in stochastic processes. The results of this section have benefited from the discussion with Thomas Kurtz, who suggested treating the pair of processes (n ^ε( ⋅), α^ε( ⋅)) together, which led to the current version. Earlier treatment of a pair of processes may be found in the work of Kesten and Papanicolaou [110] for stochastic acceleration.

The results on asymptotic properties for the inclusion of transient states can be found in Yin, Zhang, and Badowski [239]; the results for the case of generators being measurable can be found in the work of Yin, Zhang, and Badowski [240]; the results on asymptotic properties of occupation measures with absorbing states can be found in Yin, Zhang, and Badowski [241].

References

M. Abbad, J.A. Filar, and T.R. Bielecki, Algorithms for singularly perturbed limiting average Markov control problems, IEEE Trans. Automat. Control AC-37 (1992), 1421–1425.
Google Scholar
R. Akella and P.R. Kumar, Optimal control of production rate in a failure-prone manufacturing system, IEEE Trans. Automat. Control AC-31 (1986), 116–126.
Google Scholar
W.J. Anderson, Continuous-Time Markov Chains: An Application-Oriented Approach, Springer-Verlag, New York, 1991.
Google Scholar
E. Altman, K.E. Avrachenkov, and R. Nunez-Queija, Pertrubation analysis for denumerable Markov chains with applications to queueing models, Adv. in Appl. Probab., 36 (2004), 839–853.
MathSciNet MATH Google Scholar
G. Badowski and G. Yin, Stability of hybrid dynamic systems containing singularly perturbed random processes, IEEE Trans. Automat. Control, 47 (2002), 2021–2031.
MathSciNet Google Scholar
G. Barone-Adesi and R. Whaley, Efficient analytic approximation of American option values, J. Finance, 42 (1987), 301–320.
Google Scholar
A. Bensoussan, J.L. Lion, and G.C. Papanicolaou, Asymptotic Analysis of Periodic Structures, North-Holland, Amsterdam, 1978.
Google Scholar
A. Bensoussan, Perturbation Methods in Optimal Control, J. Wiley, Chichester, 1988.
MATH Google Scholar
L.D. Berkovitz, Optimal Control Theory, Springer-Verlag, New York, 1974.
MATH Google Scholar
A.T. Bharucha-Reid, Elements of the Theory of Markov Processes and Their Applications, McGraw-Hill, New York, 1960.
MATH Google Scholar
T.R. Bielecki and J.A. Filar, Singularly perturbed Markov control problem: Limiting average cost, Ann. Oper. Res. 28 (1991), 153–168.
Google Scholar
T.R. Bielecki and P.R. Kumar, Optimality of zero-inventory policies for unreliable manufacturing systems, Oper. Res. 36 (1988), 532–541.
Google Scholar
P. Billingsley, Convergence of Probability Measures, J. Wiley, New York, 1968.
MATH Google Scholar
T. Björk, Finite dimensional optimal filters for a class of Ito processes with jumping parameters, Stochastics, 4 (1980), 167–183.
MathSciNet MATH Google Scholar
W.P. Blair and D.D. Sworder, Feedback control of a class of linear discrete systems with jump parameters and quadratic cost criteria, Int. J. Control, 21 (1986), 833–841.
MathSciNet Google Scholar
G.B. Blankenship and G.C. Papanicolaou, Stability and control of stochastic systems with wide band noise, SIAM J. Appl. Math. 34 (1978), 437–476.
Google Scholar
H.A.P. Blom and Y. Bar-Shalom, The interacting multiple model algorithm for systems with Markovian switching coefficients, IEEE Trans. Automat. Control, AC-33 (1988), 780–783.
Google Scholar
N.N. Bogoliubov and Y.A. Mitropolskii, Asymptotic Methods in the Theory of Nonlinear Oscillator, Gordon and Breach, New York, 1961.
Google Scholar
E.K. Boukas and A. Haurie, Manufacturing flow control and preventive maintenance: A stochastic control approach, IEEE Trans. Automat. Control AC-35 (1990), 1024–1031.
Google Scholar
P. Brémaud, Point Processes and Queues, Springer-Verlag, New York, 1981.
MATH Google Scholar
P.E. Caines and H.-F. Chen, Optimal adaptive LQG control for systems with finite state process parameters, IEEE Trans. Automat. Control, AC-30 (1985), 185–189.
MathSciNet Google Scholar
S.L. Campbell, Singular perturbation of autonomous linear systems, II, J. Differential Equations 29 (1978), 362–373.
Google Scholar
S.L. Campbell and N.J. Rose, Singular perturbation of autonomous linear systems, SIAM J. Math. Anal. 10 (1979), 542–551.
Google Scholar
M. Caramanis and G. Liberopoulos, Perturbation analysis for the design of flexible manufacturing system flow controllers, Oper. Res. 40 (1992), 1107–1125.
Google Scholar
M.-F. Chen, From Markov Chains to Non-equilibrium Particle Systems, 2nd ed., World Scientific, Singapore, 2004.
MATH Google Scholar
S. Chen, X. Li, and X.Y. Zhou, Stochastic linear quadratic regulators with indefinite control weight costs, SIAM J. Control Optim. 36 (1998), 1685–1702.
Google Scholar
C.L. Chiang, An Introduction to Stochastic Processes and Their Applications, Kreiger, Hungtington, 1980.
MATH Google Scholar
T.-S. Chiang and Y. Chow, A limit theorem for a class of inhomogeneous Markov processes, Ann. Probab. 17 (1989), 1483–1502.
Google Scholar
P.L. Chow, J.L. Menaldi, and M. Robin, Additive control of stochastic linear systems with finite horizon, SIAM J. Control Optim. 23 (1985), 859–899.
Google Scholar
Y.S. Chow and H. Teicher, Probability Theory, Springer-Verlag, New York, 1978.
MATH Google Scholar
K.L. Chung, Markov Chains with Stationary Transition Probabilities, 2nd Ed., Springer-Verlag, New York, 1967.
MATH Google Scholar
F. Clarke, Optimization and Non-smooth Analysis, Wiley Interscience, New York, 1983.
Google Scholar
O.L.V. Costa and F. Dufour, Singular perturbation for the discounted continuous control of piecewise deterministic Markov processes, Appl. Math. Optim., 63 (2011), 357–384.
MathSciNet MATH Google Scholar
O.L.V. Costa and F. Dufour, Singularly perturbed discounted Markov control processes in a general state space, SIAM J. Control Optim., 50 (2012), 720–747.
MathSciNet MATH Google Scholar
P.J. Courtois, Decomposability: Queueing and Computer System Applications, Academic Press, New York, NY, 1977.
MATH Google Scholar
D.R. Cox and H.D. Miller, The Theory of Stochastic Processes, J. Wiley, New York, 1965.
MATH Google Scholar
M.G. Crandall, C. Evans, and P.L. Lions, Some properties of viscosity solutions of Hamilton-Jacobi equations, Trans. Amer. Math. Soc. 282 (1984), 487–501.
Google Scholar
M.G. Crandall, H. Ishii, and P.L. Lions, User’s guide to viscosity solutions of second order partial differential equations, Bull. Amer. Math. Soc. 27 (1992), 1–67.
Google Scholar
M.G. Crandall and P.L. Lions, Viscosity solutions of Hamilton-Jacobi equations, Trans. Amer. Math. Soc. 277 (1983), 1–42.
Google Scholar
I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Regional Conf. Ser. Appl. Math., SIAM, Philadelphia, PA, 1992.
MATH Google Scholar
M.H.A. Davis, Markov Models and Optimization, Chapman & Hall, London, 1993.
Google Scholar
M.V. Day, Boundary local time and small parameter exit problems with characteristic boundaries, SIAM J. Math. Anal. 20 (1989), 222–248.
Google Scholar
F. Delebecque, A reduction process for perturbed Markov chains, SIAM J. Appl. Math., 48 (1983), 325–350.
MathSciNet Google Scholar
F. Delebecque and J. Quadrat, Optimal control for Markov chains admitting strong and weak interactions, Automatica 17 (1981), 281–296.
Google Scholar
F. Delebecque, J. Quadrat, and P. Kokotovic, A unified view of aggregation and coherency in networks and Markov chains, Internat. J. Control 40 (1984), 939–952.
Google Scholar
C. Derman, Finite State Markovian Decision Processes, Academic Press, New York, 1970.
MATH Google Scholar
G.B. Di Masi and Yu.M. Kabanov, The strong convergence of two-scale stochastic systems and singular perturbations of filtering equations, J. Math. Systems, Estimation Control 3 (1993), 207–224.
Google Scholar
G.B. Di Masi and Yu.M. Kabanov, A first order approximation for the convergence of distributions of the Cox processes with fast Markov switchings, Stochastics Stochastics Rep. 54 (1995), 211–219.
Google Scholar
J.L. Doob, Stochastic Processes, Wiley Classic Library Edition, Wiley, New York, 1990.
MATH Google Scholar
R.L. Dobrushin, Central limit theorem for nonstationary Markov chains, Theory Probab. Appl. 1 (1956), 65–80, 329–383.
Google Scholar
E.B. Dynkin, Markov Processes, Springer-Verlag, Berlin, 1965.
MATH Google Scholar
N. Dunford and J.T. Schwartz, Linear Operators, Interscience, New York, 1958.
MATH Google Scholar
E.B. Dynkin and A.A. Yushkevich, Controlled Markov Processes, Springer-Verlag, New York, 1979.
Google Scholar
W. Eckhaus, Asymptotic Analysis of Singular Perturbations, North-Holland, Amsterdam, 1979.
MATH Google Scholar
R.J. Elliott, Stochastic Calculus and Applications, Springer-Verlag, New York, 1982.
MATH Google Scholar
R.J. Elliott, Smoothing for a finite state Markov process, in Lecture Notes in Control and Inform. Sci., 69, 199–206, Springer-Verlag, New York, 1985.
Google Scholar
R.J. Elliott, L. Aggoun, and J. Moore, Hidden Markov Models: Estimation and Control, Springer-Verlag, New York, 1995.
MATH Google Scholar
A. Erdélyi, Asymptotic Expansions, Dover, New York, 1956.
MATH Google Scholar
S.N. Ethier and T.G. Kurtz, Markov Processes: Characterization and Convergence, J. Wiley, New York, 1986.
MATH Google Scholar
W. Feller, An Introduction to Probability Theory and Its Applications, J. Wiley, New York, Vol. I, 1957; Vol. II, 1966.
Google Scholar
W.H. Fleming, Functions of Several Variables, Addison-Wesley, Reading, 1965.
MATH Google Scholar
W.H. Fleming, Generalized solution in optimal stochastic control, in Proc. URI Conf. on Control, 147–165, Kingston, RI, 1982.
Google Scholar
W.H. Fleming and R.W. Rishel, Deterministic and Stochastic Optimal Control, Springer-Verlag, New York, 1975.
MATH Google Scholar
W.H. Fleming and H.M. Soner, Controlled Markov Processes and Viscosity Solutions, Springer-Verlag, New York, 1992.
Google Scholar
W.H. Fleming, S.P. Sethi, and H.M. Soner, An optimal stochastic production planning problem with randomly fluctuating demand, SIAM J. Control Optim. 25 (1987), 1494–1502.
Google Scholar
W.H. Fleming and Q. Zhang, Risk-sensitive production planning of a stochastic manufacturing system, SIAM J. Control Optim., 36 (1998), 1147–1170.
MathSciNet MATH Google Scholar
M.I. Friedlin and A.D. Wentzel, Random Perturbations of Dynamical Systems, Springer-Verlag, New York, 1984.
Google Scholar
C.W. Gardiner, Handbook of Stochastic Methods for Physics, Chemistry, and the Natural Sciences, 2nd Ed., Springer-Verlag, Berlin, 1985.
Google Scholar
V.G. Gaitsgori and A.A. Pervozvanskii, Aggregation of states in a Markov chain with weak interactions, Kybernetika 11 (1975), 91–98.
Google Scholar
D. Geman and S. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Machine Intelligence 6 (1984), 721–741.
Google Scholar
S.B. Gershwin, Manufacturing Systems Engineering, Prentice-Hall, Englewood Cliffs, 1994.
Google Scholar
M.K. Ghosh, A. Arapostathis, and S.I. Marcus, Ergodic control of switching diffusions, SIAM J. Control Optim., 35 (1997), 1952–1988.
MathSciNet MATH Google Scholar
I.I. Gihman and A.V. Skorohod, Introduction to the Theory of Random Processes, W.B. Saunders, Philadelphia, 1969.
Google Scholar
P. Glasserman, Gradient Estimation via Perturbation Analysis, Kluwer, Boston, MA, 1991.
MATH Google Scholar
R. Goodman, Introduction to Stochastic Models, Benjamin/Cummings, Menlo Park, CA, 1988.
Google Scholar
R.J. Griego and R. Hersh, Random evolutions, Markov chains, and systems of partial differential equations, Proc. Nat. Acad. Sci. U.S.A. 62 (1969), 305–308.
Google Scholar
R.J. Griego and R. Hersh, Theory of random evolutions with applications to partial differential equations, Trans. Amer. Math. Soc. 156 (1971), 405–418.
Google Scholar
X. Guo and O. Hernàndez-Lerma, Continuous-time Markov Decision Processes: Theory and Applications, Springer, Heidelberg, 2001.
Google Scholar
J.K. Hale, Ordinary Differential Equations, R.E. Krieger Publishing Co., 2nd Ed., Malabar, 1980.
Google Scholar
P. Hänggi, P. Talkner, and M. Borkovec, Reaction-rate theory: Fifty years after Kramers, Rev. Modern Phys. 62 (1990), 251–341.
Google Scholar
J.M. Harrison and M.I. Reiman, Reflected Brownian motion on an orthant, Ann. Probab. 9 (1981), 302–308.
Google Scholar
U.G. Haussmann and Q. Zhang, Stochastic adaptive control with small observation noise, Stochastics Stochastics Rep. 32 (1990), 109–144.
Google Scholar
U.G. Haussmann and Q. Zhang, Discrete time stochastic adaptive control with small observation noise, Appl. Math. Optim. 25 (1992), 303–330.
Google Scholar
Q. He, G. Yin, and Q. Zhang, Large Deviations for Two-Time-Scale Systems Driven by Nonhomogeneous Markov Chains and LQ Control Problems, SIAM J. Control Optim., 49, (2011), 1737–1765.
MathSciNet MATH Google Scholar
R. Hersh, Random evolutions: A survey of results and problems, Rocky Mountain J. Math. 4 (1974), 443–477.
Google Scholar
F.S. Hillier and G.J. Lieberman, Introduction to Operations Research, McGraw-Hill, New York, 1989.
Google Scholar
Y.C. Ho and X.R. Cao, Perturbation Analysis of Discrete Event Dynamic Systems, Kluwer, Boston, MA, 1991.
MATH Google Scholar
A. Hoyland and M. Rausand, System Reliability Theory: Models and Statistical Methods, J. Wiley, New York, 1994.
Google Scholar
Z. Hou and Q. Guo, Homogeneous Denumerable Markov Processes, Springer-Verlag, Berlin, 1980.
Google Scholar
V. Hutson and J.S. Pym, Applications of Functional Analysis and Operator Theory, Academic Press, London, 1980.
MATH Google Scholar
N. Ikeda and S. Watanabe, Stochastic Differential Equations and Diffusion Processes, North-Holland, Amsterdam, 1981.
MATH Google Scholar
A.M. Il’in, Matching of Asymptotic Expansions of Solutions of Boundary Value Problems, Trans. Math. Monographs, Vol. 102, Amer. Math. Soc., Providence, 1992.
Google Scholar
A.M. Il’in and R.Z. Khasminskii, Asymptotic behavior of solutions of parabolic equations and ergodic properties of nonhomogeneous diffusion processes, Math. Sbornik. 60 (1963), 366–392.
Google Scholar
A.M. Il’in, R.Z. Khasminskii, and G. Yin, Singularly perturbed switching diffusions: Rapid switchings and fast diffusions, J. Optim. Theory Appl. 102 (1999), 555–591.
Google Scholar
M. Iosifescu, Finite Markov Processes and Their Applications, Wiley, Chichester, 1980.
MATH Google Scholar
H. Ishii, Uniqueness of unbounded viscosity solutions of Hamilton-Jacobi equations, Indiana Univ. Math. J. 33 (1984), 721–748.
Google Scholar
Y. Ji and H.J. Chizeck, Controllability, stabilizability, and continuous-time Markovian jump linear quadratic control, IEEE Trans. Automatic Control, 35 (1990), 777–788.
MathSciNet MATH Google Scholar
Y. Ji and H.J. Chizeck, Jump linear quadratic Gaussian control in continuous time, IEEE Trans. Automat. Control AC-37 (1992), 1884–1892.
Google Scholar
J. Jiang and S.P. Sethi, A state aggregation approach to manufacturing systems having machines states with weak and strong interactions, Oper. Res. 39 (1991), 970–978.
Google Scholar
Yu. Kabanov and S. Pergamenshchikov, Two-scale Stochastic Systems: Asymptotic Analysis and Control, Springer, New York, NY, 2003.
Google Scholar
I.Ia. Kac and N.N. Krasovskii, On the stability of systems with random parameters, J. Appl. Math. Mech., 24 (1960), 1225–1246.
Google Scholar
G. Kallianpur, Stochastic Filtering Theory, Springer-Verlag, New York, 1980.
MATH Google Scholar
D. Kannan, An Introduction to Stochastic Processes, North-Holland, New York, 1980.
Google Scholar
S. Karlin and J. McGregor, The classification of birth and death processes, Trans. Amer. Math. Soc. 85 (1957), 489–546.
Google Scholar
S. Karlin and H.M. Taylor, A First Course in Stochastic Processes, 2nd Ed., Academic Press, New York, 1975.
MATH Google Scholar
S. Karlin and H.M. Taylor, A Second Course in Stochastic Processes, Academic Press, New York, 1981.
MATH Google Scholar
J. Keilson, Green’s Function Methods in Probability Theory, Griffin, London, 1965.
MATH Google Scholar
J. Kevorkian and J.D. Cole, Perturbation Methods in Applied Mathematics, Springer-Verlag, New York, 1981.
MATH Google Scholar
J. Kevorkian and J.D. Cole, Multiple Scale and Singular Perturbation Methods, Springer-Verlag, New York, 1996.
MATH Google Scholar
H. Kesten and G.C. Papanicolaou, A limit theorem for stochastic acceleration, Comm. Math. Phys. 78 (1980), 19–63.
Google Scholar
R.Z. Khasminskii, On diffusion processes with a small parameter, Izv. Akad. Nauk U.S.S.R. Ser. Mat. 27 (1963), 1281–1300.
Google Scholar
R.Z. Khasminskii, On stochastic processes defined by differential equations with a small parameter, Theory Probab. Appl. 11 (1966), 211–228.
Google Scholar
R.Z. Khasminskii, On an averaging principle for Ito stochastic differential equations, Kybernetika 4 (1968), 260-279.
Google Scholar
R.Z. Khasminskii, Stochastic Stability of Differential Equations, 2nd Ed., Springer, New York, 2012.
MATH Google Scholar
R.Z. Khasminskii and G. Yin, Asymptotic series for singularly perturbed Kolmogorov-Fokker-Planck equations, SIAM J. Appl. Math. 56 (1996), 1766–1793.
Google Scholar
R.Z. Khasminskii and G. Yin, On transition densities of singularly perturbed diffusions with fast and slow components, SIAM J. Appl. Math. 56 (1996), 1794–1819.
Google Scholar
R.Z. Khasminskii and G. Yin, On averaging principles: An asymptotic expansion approach, SIAM J. Math. Anal., 35 (2004), 1534–1560.
MathSciNet MATH Google Scholar
R.Z. Khasminskii and G. Yin, Limit behavior of two-time-scale diffusions revisited, J. Differential Eqs., 212 (2005) 85–113.
MathSciNet MATH Google Scholar
R.Z. Khasminskii, G. Yin, and Q. Zhang, Asymptotic expansions of singularly perturbed systems involving rapidly fluctuating Markov chains, SIAM J. Appl. Math. 56 (1996), 277–293.
Google Scholar
R.Z. Khasminskii, G. Yin, and Q. Zhang, Constructing asymptotic series for probability distribution of Markov chains with weak and strong interactions, Quart. Appl. Math. LV (1997), 177–200.
Google Scholar
J.G. Kimemia and S.B. Gershwin, An algorithm for the computer control production in flexible manufacturing systems, IIE Trans. 15 (1983), 353–362.
Google Scholar
J.F.C. Kingman, Poisson Processes, Oxford Univ. Press, Oxford, 1993.
MATH Google Scholar
S. Kirkpatrick, C. Gebatt, and M. Vecchi, Optimization by simulated annealing, Science 220 (1983), 671–680.
Google Scholar
C. Knessel, On finite capacity processor-shared queues, SIAM J. Appl. Math. 50 (1990), 264–287.
Google Scholar
C. Knessel and J.A. Morrison, Heavy traffic analysis of a data handling system with multiple sources, SIAM J. Appl. Math. 51 (1991), 187–213.
Google Scholar
P.V. Kokotovic, Application of singular perturbation techniques to control problems, SIAM Rev. 26 (1984), 501–550.
Google Scholar
P.V. Kokotovic, A. Bensoussan, and G. Blankenship (Eds.), Singular Perturbations and Asymptotic Analysis in Control Systems, Lecture Notes in Control and Inform. Sci. 90, Springer-Verlag, Berlin, 1987.
Google Scholar
P.V. Kokotovic and H.K. Khalil (Eds.), Singular Perturbations in Systems and Control, IEEE Press, New York, 1986.
Google Scholar
P.V. Kokotovic, H.K. Khalil, and J. O’Reilly, Singular Perturbation Methods in Control, Academic Press, London, 1986.
MATH Google Scholar
V. Korolykuk and A. Swishchuk, Evolution of Systems in Random Media, CRC Press, Boca Raton, 1995.
Google Scholar
V.S. Korolyuk and N. Limnios, Diffusion approximation with equilibrium of evolutionary systems switched by semi-Markov processes, translation in Ukrainian Math. J. 57 (2005), 1466–1476.
Google Scholar
V.S. Korolyuk and N. Limnios, Stochastic systems in merging phase space, World Sci., Hackensack, NJ, 2005.
Google Scholar
N.M. Krylov and N.N. Bogoliubov, Introduction to Nonlinear Mechanics, Princeton Univ. Press, Princeton, 1947.
Google Scholar
H. Kunita and S. Watanabe, On square integrable martingales, Nagoya Math. J. 30 (1967), 209–245.
Google Scholar
T.G. Kurtz, A limit theorem for perturbed operator semigroups with applications to random evolutions, J. Functional Anal. 12 (1973), 55–67.
Google Scholar
T.G. Kurtz, Approximation of Population Processes, SIAM, Philadelphia, PA, 1981.
Google Scholar
T.G. Kurtz, Averaging for martingale problems and stochastic approximation, in Proc. US-French Workshop on Appl. Stochastic Anal., Lecture Notes in Control and Inform. Sci., 177, I. Karatzas and D. Ocone (Eds.), 186–209, Springer-Verlag, New York, 1991.
Google Scholar
H.J. Kushner, Probability Methods for Approximation in Stochastic Control and for Elliptic Equations, Academic Press, New York, 1977.
Google Scholar
H.J. Kushner, Approximation and Weak Convergence Methods for Random Processes, with Applications to Stochastic Systems Theory, MIT Press, Cambridge, MA, 1984.
MATH Google Scholar
H.J. Kushner, Weak Convergence Methods and Singularly Perturbed Stochastic Control and Filtering Problems, Birkhäuser, Boston, 1990.
MATH Google Scholar
H.J. Kushner and P.G. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time, Springer-Verlag, New York, 1992.
MATH Google Scholar
H.J. Kushner and W. Runggaldier, Nearly optimal state feedback controls for stochastic systems with wideband noise disturbances, SIAM J Control Optim. 25 (1987), 289–315.
Google Scholar
H.J. Kushner and F.J. Vázquez-Abad, Stochastic approximation algorithms for systems over an infinite horizon, SIAM J. Control Optim. 34 (1996), 712–756.
Google Scholar
H.J. Kushner and G. Yin, Asymptotic properties of distributed and communicating stochastic approximation algorithms, SIAM J. Control Optim. 25 (1987), 1266–1290.
Google Scholar
H.J. Kushner and G. Yin, Stochastic Approximation and Recursive Algorithms and Applications, 2nd Edition, Springer-Verlag, New York, 2003.
MATH Google Scholar
X.R. Li, Hybrid estimation techniques, in Control and Dynamic Systems, Vol. 76, C.T. Leondes (Ed.), Academic Press, New York, 1996.
Google Scholar
Yu. V. Linnik, On the theory of nonhomogeneous Markov chains, Izv. Akad. Nauk. USSR Ser. Mat. 13 (1949), 65–94.
Google Scholar
Y.J. Liu, G. Yin, and X.Y. Zhou, Near-optimal controls of random-switching LQ problems with indefinite control weight costs, Automatica, 41 (2005) 1063–1070.
MathSciNet MATH Google Scholar
P. Lochak and C. Meunier, Multiphase Averaging for Classical Systems, Springer-Verlag, New York, 1988.
MATH Google Scholar
J. Lehoczky, S.P. Sethi, H.M. Soner, and M. Taksar, An asymptotic analysis of hierarchical control of manufacturing systems under uncertainty, Math. Oper. Res. 16 (1992), 596–608.
Google Scholar
G. Lerman and Z. Schuss, Asymptotic theory of large deviations for Markov chains, SIAM J. Appl. Math., 58 (1998), 1862–1877.
MathSciNet MATH Google Scholar
D. Ludwig, Persistence of dynamical systems under random perturbations, SIAM Rev. 17 (1975), 605–640.
Google Scholar
X. Mao and C. Yuan, Stochastic Differential Equations with Markovian Switching, Imperial College Press, London, UK, 2006.
MATH Google Scholar
M. Mariton, Robust jump linear quadratic control: A mode stabilizing solution, IEEE Trans. Automat. Control, AC-30 (1985), 1145–1147.
MathSciNet Google Scholar
M. Mariton, Jump Linear Systems in Automatic Control, Marcel Dekker, Inc., New York, 1990.
Google Scholar
L.F. Martins and H.J. Kushner, Routing and singular control for queueing networks in heavy traffic, SIAM J. Control Optim. 28 (1990), 1209–1233.
Google Scholar
W.A. Massey and W. Whitt, Uniform acceleration expansions for Markov chains with time-varying rates, Ann. Appl. Probab., 8 (1998), 1130–1155.
MathSciNet MATH Google Scholar
B.J. Matkowsky and Z. Schuss, The exit problem for randomly perturbed dynamical systems, SIAM J. Appl. Math. 33 (1977), 365–382.
Google Scholar
S.P. Meyn and R.L. Tweedie, Markov Chains and Stochastic Stability, Springer-Verlag, London, 1993.
MATH Google Scholar
T. Naeh, M.M. Klosek, B.J. Matkowski, and Z. Schuss, A direct approach to the exit problem, SIAM J. Appl. Math. 50 (1990), 595–627.
Google Scholar
A.H. Nayfeh, Introduction to Perturbation Techniques, J. Wiley, New York, 1981.
MATH Google Scholar
M.F. Neuts, Matrix-Geometric Solutions in Stochastic Models, Johns Hopkins Univ. Press, Baltimore, 1981.
MATH Google Scholar
R.E. O’Malley, Jr., Singular Perturbation Methods for Ordinary Differential Equations, Springer-Verlag, New York, 1991.
MATH Google Scholar
Z.G. Pan and T. Başar, H ^∞-control of Markovian jump linear systems and solutions to associated piecewise-deterministic differential games, in New Trends in Dynamic Games and Applications, G.J. Olsder Ed., 61–94, Birkhäuser, Boston, MA, 1995.
Google Scholar
Z.G. Pan and T. Başar, H ^∞ control of large scale jump linear systems via averaging and aggregation, in Proc. 34th IEEE Conf. Decision Control, 2574-2579, New Orleans, LA, 1995.
Google Scholar
Z.G. Pan and T. Başar, Random evolutionary time-scale decomposition in robust control of jump linear systems, in Proc. 35th IEEE Conf. Decision Control, Kobe, Japan, 1996.
Google Scholar
Z.G. Pan and T. Basar, H ^∞ control of large-scale jump linear systems via averaging and aggregation, Internat. J. Control, 72 (1999), 866–881.
MathSciNet MATH Google Scholar
G.C. Papanicolaou, D. Stroock, and S.R.S. Varadhan, Martingale approach to some limit theorems, in Proc. 1976 Duke Univ. Conf. on Turbulence, Durham, NC, 1976.
Google Scholar
G.C. Papanicolaou, Introduction to the asymptotic analysis of stochastic equations, in Lectures in Applied Mathematics, Amer. Math. Soc., Vol. 16, 1977, 109-147.
MathSciNet Google Scholar
G.C. Papanicolaou, Asymptotic analysis of stochastic equations, Studies in Probability Theory, M. Rosenblatt (Ed.), Vol. 18, MAA, 1978, 111–179.
Google Scholar
E. Pardoux and S. Peng, Adapted solution of backward stochastic equation, Syst. Control Lett., 14 (1990), 55–61.
MathSciNet MATH Google Scholar
A. Pazy, Semigroups of Linear Operators and Applications to Partial Differential Equations, Springer-Verlag, New York, 1983.
MATH Google Scholar
L. Perko, Differential Equations and Dynamical Systems, Springer, 3rd Ed., New York, 2001.
Google Scholar
A.A. Pervozvanskii and V.G. Gaitsgori, Theory of Suboptimal Decisions: Decomposition and Aggregation, Kluwer, Dordrecht, 1988.
MATH Google Scholar
R.G. Phillips and P.V. Kokotovic, A singular perturbation approach to modelling and control of Markov chains, IEEE Trans. Automat. Control 26 (1981), 1087–1094.
Google Scholar
M.A. Pinsky, Differential equations with a small parameter and the central limit theorem for functions defined on a Markov chain, Z. Wahrsch. verw. Gebiete 9 (1968), 101–111.
Google Scholar
M.A. Pinsky, Multiplicative operator functionals and their asymptotic properties, in Advances in Probability Vol. 3, P. Ney and S. Port (Eds.), Marcel Dekker, New York, 1974.
Google Scholar
L. Prandtl, Über Flüssigkeits – bewegung bei kleiner Reibung, Verhandlungen, in III. Internat. Math. Kongresses, (1905), 484–491.
Google Scholar
M.L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, J. Wiley, New York, 1994.
MATH Google Scholar
D. Revuz, Markov Chains, Revised Ed., North-Holland, Amsterdam, 1975.
Google Scholar
R. Rishel, Controlled wear process: Modeling optimal control, IEEE Trans. Automat. Control 36 (1991), 1100–1102.
Google Scholar
H. Risken, The Fokker-Planck Equation: Methods of Solution and Applications, 2nd Ed., Springer-Verlag, London, 1989.
MATH Google Scholar
M. Rosenblatt, Markov Processes: Structure and Asymptotic Behavior, Springer-Verlag, Berlin, 1971.
MATH Google Scholar
S. Ross, Introduction to Stochastic Dynamic Programming, Academic Press, New York, 1983.
MATH Google Scholar
E. Roxin, The existence of optimal controls, Mich. Math. J. 9 (1962), 109–119.
Google Scholar
V.R. Saksena, J. O’Reilly, and P.V. Kokotovic, Singular perturbations and time-scale methods in control theory: Survey 1976-1983, Automatica 20 (1984), 273–293.
Google Scholar
Z. Schuss, Singularly perturbation methods in stochastic differential equations of mathematical physics, SIAM Rev. 22 (1980), 119–155.
Google Scholar
Z. Schuss, Theory and Applications of Stochastic Differential Equations, J. Wiley, New York, 1980.
MATH Google Scholar
E. Seneta, Non-negative Matrices and Markov Chains, Springer-Verlag, New York, 1981.
MATH Google Scholar
R. Serfozo, Introduction to Stochastic Networks, Springer, New York, 1999.
MATH Google Scholar
S.P. Sethi and G.L. Thompson, Applied Optimal Control: Applications to Management Science, Martinus Nijhoff, Boston, MA, 1981.
Google Scholar
S.P. Sethi and Q. Zhang, Hierarchical Decision Making in Stochastic Manufacturing Systems, Birkhäuser, Boston, 1994.
MATH Google Scholar
S.P. Sethi and Q. Zhang, Multilevel hierarchical decision making in stochastic marketing-production systems, SIAM J. Control Optim. 33 (1995), 528–553.
Google Scholar
O.P. Sharma, Markov Queues, Ellis Horwood, New York, 1990.
Google Scholar
H.A. Simon, Models of Discovery and Other Topics in the Methods of Science, D. Reidel Publ. Co., Boston, MA, 1977.
MATH Google Scholar
H.A. Simon and A. Ando, Aggregation of variables in dynamic systems, Econometrica 29 (1961), 111–138.
Google Scholar
A.V. Skorohod, Studies in the Theory of Random Processes, Dover, New York, 1982.
Google Scholar
A.V. Skorohod, Asymptotic Methods of the Theory of Stochastic Differential Equations, Trans. Math. Monographs, Vol. 78, Amer. Math. Soc., Providence, 1989.
Google Scholar
D.R. Smith, Singular Perturbation Theory, Cambridge Univ. Press, New York, 1985.
MATH Google Scholar
D. Snyder, Random Point Processes, Wiley, New York, 1975.
MATH Google Scholar
H.M. Soner, Optimal control with state space constraints II, SIAM J. Control Optim. 24 (1986), 1110–1122.
Google Scholar
H.M. Soner, Singular perturbations in manufacturing systems, SIAM J. Control Optim. 31 (1993), 132–146.
Google Scholar
D.W. Stroock and S.R.S. Varadhan, Multidimensional Diffusion Processes, Springer-Verlag, Berlin, 1979.
MATH Google Scholar
H.M. Taylor and S. Karlin, An Introduction to Stochastic Modeling, Academic Press, Boston, 1994.
MATH Google Scholar
W.A. Thompson, Jr., Point Process Models with Applications to Safety and Reliability, Chapman and Hall, New York, 1988.
MATH Google Scholar
D.N.C. Tse, R.G. Gallager, and J.N. Tsitsiklis, Statistical multiplexing of multiple time-scale Markov streams, IEEE J. Selected Areas Comm. 13 (1995), 1028–1038.
Google Scholar
S.R.S. Varadhan, Large Deviations and Applications, SIAM, Philadelphia, 1984.
Google Scholar
N.G. van Kampen, Stochastic Processes in Physics and Chemistry, North-Holland, Amsterdam, 1992.
Google Scholar
A.B. Vasil’eava and V.F. Butuzov, Asymptotic Expansions of the Solutions of Singularly Perturbed Equations, Nauka, Moscow, 1973.
Google Scholar
A.B. Vasil’eava and V.F. Butuzov, Asymptotic Methods in Singular Perturbations Theory (in Russian), Vysshaya Shkola, Moscow, 1990.
Google Scholar
D. Vermes, Optimal control of piecewise deterministic Markov processes, Stochastics, 14 (1985), 165–207.
MathSciNet MATH Google Scholar
L.Y. Wang, P.P. Khargonekar, and A. Beydoun, Robust control of hybrid systems: Performance guided strategies, in Hybrid Systems V, P. Antsaklis, W. Kohn, M. Lemmon, A. Nerode, amd S. Sastry Eds., Lecuture Notes in Computer Sci., 1567, 356–389, Berlin, 1999.
Google Scholar
Z. Wang and X. Yang, Birth and Death Processes and Markov Chains, Springer-Verlag, Science Press, Beijing, 1992.
MATH Google Scholar
J. Warga, Relaxed variational problems, J Math. Anal. Appl. 4 (1962), 111–128.
Google Scholar
W. Wasow, Asymptotic Expansions for Ordinary Differential Equations, Interscience, New York, 1965.
MATH Google Scholar
W. Wasow, Linear Turning Point Theory, Springer-Verlag, New York, 1985.
MATH Google Scholar
A.D. Wentzel, On the asymptotics of eigenvalues of matrices with elements of order exp( − V _ij ∕ 2ε²), Dokl. Akad. Nauk SSSR 222 (1972), 263–265.
Google Scholar
D.J. White, Markov Decision Processes, Wiley, New York, 1992.
Google Scholar
J.H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, New York, 1988.
MATH Google Scholar
H. Yan, G. Yin, and S. X. C. Lou, Using stochastic optimization to determine threshold values for control of unreliable manufacturing systems, J. Optim. Theory Appl. 83 (1994), 511–539.
Google Scholar
H. Yan and Q. Zhang, A numerical method in optimal production and setup scheduling in stochastic manufacturing systems, IEEE Trans. Automat. Control, 42 (1997), 1452–1455.
MathSciNet MATH Google Scholar
G. Yin, Asymptotic properties of an adaptive beam former algorithm, IEEE Trans. Information Theory IT-35 (1989), 859-867.
Google Scholar
G. Yin, Asymptotic expansions of option price under regime-switching diffusions with a fast-varying switching process, Asymptotic Anal., 65 (2009), 203–222.
MATH Google Scholar
G. Yin and I. Gupta, On a continuous time stochastic approximation problem, Acta Appl. Math. 33 (1993), 3–20.
Google Scholar
G. Yin, V. Krishnamurthy, and C. Ion, Regime switching stochastic approximation algorithms with application to adaptive discrete stochastic optimization, SIAM J. Optim., 14 (2004), 1187–1215.
MathSciNet MATH Google Scholar
G. Yin and D.T. Nguyen, Asymptotic expansions of backward equations for two-time-scale Markov chains in continuous time, Acta Math. Appl. Sinica, 25 (2009), 457–476.
MathSciNet MATH Google Scholar
G. Yin and K.M. Ramachandran, A differential delay equation with wideband noise perturbation, Stochastic Process Appl. 35 (1990), 231–249.
Google Scholar
G. Yin, H. Yan, and S.X.C. Lou, On a class of stochastic optimization algorithms with applications to manufacturing models, in Model-Oriented Data Analysis, W.G. Müller, H.P. Wynn and A.A. Zhigljavsky (Eds.), 213–226, Physica-Verlag, Heidelberg, 1993.
Google Scholar
G. Yin and H.L. Yang, Two-time-scale jump-diffusion models with Markovian switching regimes, Stochastics Stochastics Rep., 76 (2004), 77–99.
MathSciNet MATH Google Scholar
G. Yin and H. Zhang, Two-time-scale markov chains and applications to quasi-birth-death queues, SIAM J. Appl. Math., 65 (2005), 567–586.
MathSciNet MATH Google Scholar
G. Yin and H. Zhang, Singularly perturbed markov chains: Limit results and applications, Ann. Appl. Probab., 17 (2007), 207–229.
MathSciNet MATH Google Scholar
G. Yin, H. Zhang, and Q. Zhang, Applications of Two-time-scale Markovian Systems, Preprint, 2012.
Google Scholar
G. Yin and Q. Zhang, Near optimality of stochastic control in systems with unknown parameter processes, Appl. Math. Optim. 29 (1994), 263–284.
Google Scholar
G. Yin and Q. Zhang, Control of dynamic systems under the influence of singularly perturbed Markov chains, J. Math. Anal. Appl., 216 (1997), 343–367.
MathSciNet MATH Google Scholar
G. Yin and Q. Zhang (Eds.), Recent Advances in Control and Optimization of Manufacturing Systems, Lecture Notes in Control and Information Sciences (LNCIS) series, Vol. 214, Springer-Verlag, New York, 1996.
Google Scholar
G. Yin and Q. Zhang (Eds.), Mathematics of Stochastic Manufacturing Systems, Proc. 1996 AMS-SIAM Summer Seminar in Applied Mathematics, Lectures in Applied Mathematics, Amer. Math. Soc., Providence, RI, 1997.
Google Scholar
G. Yin and Q. Zhang, Continuous-Time Markov Chains and Applications: A Singular Perturbation Approach, 1st Ed., Springer-Verlag, New York, 1998.
MATH Google Scholar
G. Yin and Q. Zhang, Discrete-time Markov Chains: Two-time-scale Methods and Applications, Springer, New York. 2005.
Google Scholar
G. Yin, Q. Zhang, and G. Badowski, Asymptotic properties of a singularly perturbed Markov chain with inclusion of transient states, Ann. Appl. Probab., 10 (2000), 549–572.
MathSciNet MATH Google Scholar
G. Yin, Q. Zhang, and G. Badowski, Singularly perturbed Markov chains: Convergence and aggregation, J. Multivariate Anal., 72 (2000), 208–229.
MathSciNet MATH Google Scholar
G. Yin, Q. Zhang, and G. Badowski, Occupation measures of singularly perturbed Markov chains with absorbing states, Acta Math. Sinica, 16 (2000), 161–180.
MathSciNet MATH Google Scholar
G. Yin, Q. Zhang, and G. Badowski, Discrete-time singularly perturbed Markov chains: Aggregation, occupation measures, and switching diffusion limit, Adv. in Appl. Probab., 35 (2003), 449–476.
MathSciNet MATH Google Scholar
G. Yin and X.Y. Zhou, Markowitz’s mean-variance portfolio selection with regime switching: From discrete-time models to their continuous-time limits, IEEE Trans. Automat. Control, 49 (2004), 349–360.
MathSciNet Google Scholar
G. Yin and C. Zhu, Hybrid Switching Diffusions: Properties and Applications, Springer, New York, 2010.
Google Scholar
K. Yosida, Functional Analysis, 6th Ed., Springer-Verlag, New York, NY, 1980.
MATH Google Scholar
J. Yong and X.Y. Zhou, Stochastic Controls: Hamiltonian Systems and HJB Equations, Springer, New York, 1999.
MATH Google Scholar
Q. Zhang, An asymptotic analysis of controlled diffusions with rapidly oscillating parameters, Stochastics Stochastics Rep. 42 (1993), 67–92.
Google Scholar
Q. Zhang, Risk sensitive production planning of stochastic manufacturing systems: A singular perturbation approach, SIAM J. Control Optim. 33 (1995), 498–527.
Google Scholar
Q. Zhang, Finite state Markovian decision processes with weak and strong interactions, Stochastics Stochastics Rep. 59 (1996), 283–304.
Google Scholar
Q. Zhang, Nonlinear filtering and control of a switching diffusion with small observation noise, SIAM J. Control Optim., 36 (1998), 1738–1768.
Google Scholar
Q. Zhang and G. Yin, Turnpike sets in stochastic manufacturing systems with finite time horizon, Stochastics Stochastics Rep. 51 (1994), 11–40.
Google Scholar
Q. Zhang and G. Yin, Central limit theorems for singular perturbations of nonstationary finite state Markov chains, Ann. Appl. Probab. 6 (1996), 650–670.
Google Scholar
Q. Zhang and G. Yin, Structural properties of Markov chains with weak and strong interactions, Stochastic Process Appl., 70 (1997), 181–197.
MathSciNet MATH Google Scholar
Q. Zhang and G. Yin, On nearly optimal controls of hybrid LQG problems, IEEE Trans. Automat. Control, 44 (1999), 2271–2282.
MathSciNet MATH Google Scholar
Q. Zhang and G. Yin, Nearly optimal asset allocation in hybrid stock-investment models, J. Optim. Theory Appl., 121 (2004), 197–222.
MathSciNet Google Scholar
Q. Zhang, G. Yin, and E.K. Boukas, Controlled Markov chains with weak and strong interactions: Asymptotic optimality and application in manufacturing, J. Optim. Theory Appl., 94 (1997), 169–194.
MathSciNet MATH Google Scholar
Q. Zhang, G. Yin, and R.H. Liu, A near-optimal selling rule for a two-time-scale market model, SIAM J. Multiscale Modeling Simulation, 4 (2005), 172–193.
MathSciNet MATH Google Scholar
X.Y. Zhou, Verification theorem within the framework of viscosity solutions, J. Math. Anal. Appl. 177 (1993), 208–225.
Google Scholar
X.Y. Zhou and G. Yin, Markowitz mean-variance portfolio selection with regime switching: A continuous-time model, SIAM J. Control Optim., 42 (2003), 1466–1482.
MathSciNet MATH Google Scholar
C. Zhu, G. Yin, and Q.S. Song, Stability of random-switching systems of differential equations, Quart. Appl. Math., 67 (2009), 201–220.
MathSciNet MATH Google Scholar

Download references

Author information

Authors and Affiliations

Department of Mathematics, Wayne State University, Detroit, Michigan, USA
G. George Yin
Department of Mathematics, University of Georgia, Athens, Georgia, USA
Qing Zhang

Authors

G. George Yin
View author publications
You can also search for this author in PubMed Google Scholar
Qing Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Yin, G.G., Zhang, Q. (2013). Occupation Measures: Asymptotic Properties and Ramification. In: Continuous-Time Markov Chains and Applications. Stochastic Modelling and Applied Probability, vol 37. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4346-9_5

Download citation

DOI: https://doi.org/10.1007/978-1-4614-4346-9_5
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-4345-2
Online ISBN: 978-1-4614-4346-9
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)

Publish with us

Policies and ethics

Occupation Measures: Asymptotic Properties and Ramification

Abstract

Keywords

1 Introduction

2 The Irreducible Case

2.1 Occupation Measure

2.2 Conditions and Preliminary Results

Lemma 5.1

Remark 5.2

Example 5.3

2.3 Exponential Bounds

Theorem 5.4

Remark 5.5

Corollary 5.6

Corollary 5.7

Corollary 5.8

2.4 Asymptotic Normality

Theorem 5.9

Remark 5.10

Remark 5.11

Lemma 5.12

Lemma 5.13

Lemma 5.14

Remark 5.15

Lemma 5.16

Remark 5.17

Example 5.18

2.5 Extensions

Theorem 5.19

Theorem 5.20

Corollary 5.21

Theorem 5.22

Remark 5.23

3 Markov Chains with Weak and Strong Interactions

3.1 Aggregation of Markov Chains

Lemma 5.24

Theorem 5.25

Example 5.26

Theorem 5.27

Example 5.28

3.2 Exponential Bounds

Theorem 5.29

Remark 5.30

Remark 5.31

Corollary 5.32

Corollary 5.33

Remark 5.34

3.3 Asymptotic Distributions

Lemma 5.35

Lemma 5.36

Lemma 5.37

Remark 5.38

Remark 5.39

Lemma 5.40

Lemma 5.41

Lemma 5.42

Theorem 5.43

Remark 5.44

Remark 5.45

Corollary 5.46

Example 5.47

4 Measurable Generators

4.1 Case I: Weakly Irreducible \(\widetilde{Q}(t)\)

Remark 5.48

Lemma 5.49

Theorem 5.50

4.2 Case II: \(\widetilde{Q}(t) = \mathrm{diag}(\widetilde{{Q}}^{1}(t),\ldots,\widetilde{{Q}}^{l}(t))\)

Lemma 5.51

Theorem 5.52

Theorem 5.53

Remark 5.54

Remark 5.55

5 Remarks on Inclusion of Transient and Absorbing States

5.1 Inclusion of Transient States

Theorem 5.56

Theorem 5.57

Theorem 5.58

Theorem 5.59

5.2 Inclusion of Absorbing States

Theorem 5.60