1 Introduction

Chapter 4 deals with the probability distribution of αε( ⋅) through the corresponding forward equation and is mainly analytical in approach, whereas the current chapter is largely probabilistic in nature. The central theme of this chapter is limit results for unscaled as well as scaled sequences of occupation measures, including the law of large numbers for an unscaled sequence, exponential upper bounds, and the asymptotic distribution of a suitably scaled sequence of occupation times. It further examines the deviation of the functional occupation time from the quasi-stationary distribution. We obtain estimates of the centered deviations, prove the convergence of a properly scaled and centered sequence of occupation times, characterize the limit process by deriving the limit distribution and providing explicit formulas for the mean and covariance functions, and provide exponential bounds for the normalized process. It is worth noting that, in contrast with most existing results of central limit type, the limit covariance function depends on the initial-layer terms.

The rest of the chapter is arranged as follows. We first study the asymptotic properties of irreducible Markov chains in Section 5.2. In view of the developments in Remarks 4.34 and 4.39, the Markov chain with recurrent states is the most illustrative and representative case; as a result, in the remaining chapters, we mainly treat problems associated with this model. Starting in Section 5.3.1, we consider Markov chains with weak and strong interactions whose generators consist of multiple irreducible blocks. After treating aggregation of the Markov states, we study the corresponding exponential bounds and then deal with asymptotic distributions. Then in Section 5.4, we treat Markov chains with generators that are merely measurable. Next, remarks on the inclusion of transient and absorbing states are provided in Section 5.5. Applications of the weak convergence results and a related stability problem are provided in Section 5.6. Finally, Section 5.7 concludes the chapter with notes and further remarks.

2 The Irreducible Case

The notion of occupation measure is set forth first. We consider a sequence of unscaled occupation measures and establish its convergence in probability to the cumulative quasi-stationary distribution. This is followed by exponential bounds on the functional occupation time and by moment estimates. In addition, asymptotic normality is derived. Although the prelimit process has nonzero mean and is nonstationary, using the results of Section 4.2, the quasi-stationary regime is established after a short period (of order O(ε)). We also calculate explicitly the covariance representation of the limit process, and prove that the process αε( ⋅) satisfies a mixing condition. The tightness of the sequence and the w.p.1 continuity of the sample paths of the limit process are proved by estimating the fourth moment. The limit of the finite-dimensional distributions is then calculated and shown to be Gaussian. By proving a series of lemmas, we derive the desired asymptotic normality.

As mentioned in previous chapters, the process αε( ⋅) arises in a wide range of applications involving a rapidly fluctuating finite-state Markov chain. In these applications, the asymptotic behavior of the Markov chain αε( ⋅) has a major influence. Further investigation and understanding of the asymptotic properties of αε( ⋅), in particular its probabilistic structure, play an important role in the in-depth study of such applications.

In Section 4.2, using singular perturbation techniques, we examined the asymptotic properties of \({p}_{i}^\varepsilon (t) = P({\alpha }^\varepsilon (t) = i)\). It has been proved that \({p}^\varepsilon (t) = ({p}_{1}^\varepsilon (t),\ldots,{p}_{m}^\varepsilon (t))\) converges to the quasi-stationary distribution ν(t) as ε → 0 for each t > 0 and that p ε(t) admits an asymptotic expansion in terms of ε. To gain further insight, we ask whether there is a limit result for the occupation measure \(\int \nolimits_{0}^{t}{I}_{\{{\alpha }^\varepsilon (s)=i\}}ds\). If such a convergence takes place, what is the rate of convergence? Does one have a central limit theorem associated with the αε( ⋅)-process? The answers to these questions are affirmative. We will prove a number of limit results for an unscaled sequence as well as for a suitably scaled and normalized sequence. Owing to the asymptotic expansions, the scaling factor is \(\sqrt \varepsilon \). The limit process is Gaussian with zero mean, and the covariance of the limit process depends on the asymptotic expansion in an essential way, which reflects one of the distinct features of this central limit theorem. It appears virtually impossible to calculate the asymptotic covariance of the Gaussian process without the help of the asymptotic expansion, which reveals a salient characteristic of the two-time-scale Markov chain.

A related problem is to examine the exponential bounds on the scaled occupation measure process. This is similar to the estimation of the moment generating function. Such estimates have been found useful in studying hierarchical controls of manufacturing systems. Using the asymptotic expansion and the martingale representation of finite-state Markov chains, we are able to establish such exponential bounds for the scaled occupation measures.

2.1 Occupation Measure

Let \((\Omega,\mathcal{F},P)\) denote the underlying probability space. As in Section 4.2, αε( ⋅) is a nonstationary Markov chain on \((\Omega,\mathcal{F},P)\) with finite-state space \(\mathcal{M} =\{ 1,\ldots,m\}\) and generator \({Q}^\varepsilon (t) = Q(t)/\varepsilon \).

For each \(i \in \mathcal{M}\), let β i ( ⋅) denote a bounded Borel measurable deterministic function and define a sequence of centered (around the quasi-stationary distribution) occupation measures Z i ε(t) as

$${Z}_{i}^\varepsilon (t) ={ \int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu }_{i}(s)\right ){\beta }_{i}(s)ds.$$
(5.1)

Set \({Z}^\varepsilon (t) = ({Z}_{1}^\varepsilon (t),\ldots,{Z}_{m}^\varepsilon (t))\). It is a measure of the functional occupancy for the process αε( ⋅). Our interest lies in the asymptotic properties of the sequence defined in (5.1). To proceed, we first present some conditions and preliminary results needed in the sequel.

Note that a special choice of β i ( ⋅) is β i (t) = 1, for t ∈ [0, T]. Inserting β i ( ⋅) into the sequence allows one to treat various situations arising in applications. For example, in the manufacturing problem, β i (t) is often given by a function of a control process; see Chapter 8 for further details.

2.2 Conditions and Preliminary Results

To proceed, we make the following assumptions.

  1. (A5.1)

    For each t ∈ [0, T], Q(t) is weakly irreducible.

  2. (A5.2)

    Q( ⋅) is continuously differentiable on [0, T], and its derivative is Lipschitz. A simple example satisfying both conditions is given below.
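For illustration (this example is ours, not taken from the text), a two-state generator satisfying both (A5.1) and (A5.2) on [0, T] is

$$Q(t) = \left (\begin{array}{cc} - (1 + t)&1 + t\\ 1 & - 1 \end{array} \right ),\qquad \nu (t) = \left ( \frac{1} {2 + t},\ \frac{1 + t} {2 + t}\right ).$$

Each row of Q(t) sums to zero and the off-diagonal entries are positive, so Q(t) is weakly irreducible with ν(t)Q(t) = 0 and ν 1 (t) + ν 2 (t) = 1; moreover, Q( ⋅) is infinitely differentiable on [0, T] with Lipschitz derivative.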

Recall that \({p}^\varepsilon (t) = (P({\alpha }^\varepsilon (t) = 1),\ldots,P({\alpha }^\varepsilon (t) = m))\) and let

$${p}_{ij}^\varepsilon (t,{t}_{ 0}) = P({\alpha }^\varepsilon (t) = j\vert {\alpha }^\varepsilon ({t}_{ 0}) = i)\quad \mbox{ for all }i,j \in \mathcal{M}.$$

Use P ε(t, t 0) to denote the transition matrix (p ij ε(t, t 0)). The following lemma is on the asymptotic expansion of P ε(t, t 0).

Lemma 5.1

Assume (A5.1) and (A5.2) . Then there exists a positive constant κ 0 such that for each fixed 0 ≤ T < ∞,

$${P}^\varepsilon (t,{t}_{ 0}) = {P}_{0}(t) + O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - {t}_{0})} \varepsilon \right )\right )$$
(5.2)

uniformly in (t 0 ,t) where 0 ≤ t 0 ≤ t ≤ T and

$${P}_{0}(t) = \left (\begin{array}{c} \nu (t)\\ \vdots\\ \nu (t) \end{array} \right ).$$

In addition, assume Q(⋅) to be twice continuously differentiable on [0,T] with the second derivative being Lipschitz. Then

$$\begin{array}{ll} {P}^\varepsilon (t,{t}_{0})& = {P}_{0}(t) + \varepsilon {P}_{1}(t) \\ &\quad + {Q}_{0}\left (\frac{t - {t}_{0}} \varepsilon ,{t}_{0}\right ) + \varepsilon {Q}_{1}\left (\frac{t - {t}_{0}} \varepsilon ,{t}_{0}\right ) + O(\varepsilon ^{2}) \end{array}$$
(5.3)

uniformly in (t 0 ,t), where 0 ≤ t 0 ≤ t ≤ T,

$${P}_{1}(t) = \left (\begin{array}{c} {\varphi }_{1}(t)\\ \vdots \\ {\varphi }_{1}(t) \end{array} \right ),$$
$$\begin{array}{rl} &\frac{d{Q}_{0}(\tau,{t}_{0})} {d\tau } = {Q}_{0}(\tau,{t}_{0})Q({t}_{0}),\;\tau \geq 0, \\ &{Q}_{0}(0,{t}_{0}) = I - {P}_{0}({t}_{0}),\end{array}$$

and

$$\begin{array}{rl} &\frac{d{Q}_{1}(\tau,{t}_{0})} {d\tau } = {Q}_{1}(\tau,{t}_{0})Q({t}_{0}) + \tau {Q}_{0}(\tau,{t}_{0})\frac{dQ({t}_{0})} {dt},\ \ \tau \geq 0 \\ &{Q}_{1}(0,{t}_{0}) = -{P}_{1}({t}_{0}), \end{array}$$

where φ 1 (t) is given in (4.13) ( with \(\tau := (t - {t}_{0})/\varepsilon \) ) . Furthermore, for i = 0,1, the P i (⋅) are (2 − i) times continuously differentiable on [0,T] and there exist constants K > 0 and κ 0 > 0 such that

$$\left \vert {Q}_{i}\left (\tau,{t}_{0}\right )\right \vert \leq K\exp (-{\kappa }_{0}\tau ),$$
(5.4)

uniformly for t 0 ∈ [0,T].

Remark 5.2

Recall that ν(t) and φ1(t) are row vectors. As a result, P0(⋅) and P1(⋅) have identical rows. This is a consequence of the convergence of pε(t) to the quasi-stationary distribution and the asymptotic expansions.
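To see what Lemma 5.1 asserts in the simplest setting, consider (purely as an illustration, not part of the lemma) a constant two-state generator

$$Q = \left (\begin{array}{cc} - \lambda & \lambda \\ \mu & - \mu \end{array} \right ),\quad \lambda,\mu > 0,\qquad \nu = \left ( \frac{\mu } {\lambda + \mu },\ \frac{\lambda } {\lambda + \mu }\right ),\qquad {P}_{0} = \left (\begin{array}{c} \nu \\ \nu \end{array} \right ).$$

Here \(Q = -(\lambda + \mu )(I - {P}_{0})\) and \(I - {P}_{0}\) is idempotent, so

$${P}^\varepsilon (t,{t}_{0}) =\exp \left (Q\,\frac{t - {t}_{0}} \varepsilon \right ) = {P}_{0} +\exp \left (-\frac{(\lambda + \mu )(t - {t}_{0})} \varepsilon \right )(I - {P}_{0}).$$

Consequently \({Q}_{0}(\tau,{t}_{0}) =\exp (-(\lambda + \mu )\tau )(I - {P}_{0})\), P 1 ≡ 0, and Q 1 ≡ 0, so the exponential bound (5.4) holds with κ 0 = λ + μ.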

Proof of Lemma  5.1: It suffices to verify (5.3) because (5.2) can be derived similarly. The asymptotic expansion of P ε(t, t 0) can be obtained as in Section 4.2. Thus only the exponential bound (5.4) needs to be proved. The main task is to verify the uniformity in t 0. To this end, it suffices to treat each row of Q i (τ, t 0) separately. For a fixed i = 0, 1, let

$$\eta (\tau,{t}_{0}) = ({\eta }_{1}(\tau,{t}_{0}),\ldots,{\eta }_{m}(\tau,{t}_{0}))$$

denote any row of Q i (τ, t 0) and η0(t 0) the corresponding row in Q i (0, t 0) with

$$\begin{array}{rl} &{Q}_{0}(0,{t}_{0}) = I - {P}_{0}({t}_{0})\quad \mbox{ and } \\ &{Q}_{1}(0,{t}_{0}) = -{P}_{1}({t}_{0})\end{array}$$

Then η(τ, t 0) satisfies the differential equation

$$\begin{array}{ll} &\frac{d\eta (\tau,{t}_{0})} {d\tau } = \eta (\tau,{t}_{0})Q({t}_{0}),\;\tau \geq 0, \\ &\eta (0,{t}_{0}) = {\eta }_{0}({t}_{0})\end{array}$$

By virtue of the assumptions of Lemma  5.1 and the asymptotic expansion, it follows that η0(t 0) is uniformly bounded and η0(t 0)1  = 0. 

Define

$$\widehat{\kappa } = -\max \{\mbox{ The real parts of eigenvalues of }Q(t),\;t \in [0,T]\}.$$

Then Lemma  A.6 implies that \(\widehat{\kappa } > 0\). In view of Theorem  4.5, it suffices to show that for all \(\tau \geq 0\) and for some constant K > 0 independent of t 0,

$$\vert \eta (\tau,{t}_{0})\vert \leq K\exp \left (-\frac{\widehat{\kappa }\tau } {2} \right ).$$
(5.5)

To verify (5.5), note that for any ς0 ∈ [0, T],

$${ d\eta (\tau,{t}_{0}) \over d\tau } = \eta (\tau,{t}_{0})Q({\varsigma }_{0}) + \eta (\tau,{t}_{0})[Q({t}_{0}) - Q({\varsigma }_{0})].$$

Solving this differential equation by treating η(τ, t 0)[Q(t 0) − Q(ς 0)] as the driving term, we have

$$\begin{array}{ll} \eta (\tau,{t}_{0})& = {\eta }_{0}({t}_{0})\exp \left (Q({\varsigma }_{0})\tau \right ) \\ &\quad +{ \int }_{0}^{\tau }\eta (\varsigma,{t}_{ 0})[Q({t}_{0}) - Q({\varsigma }_{0})]\exp \left (Q({\varsigma }_{0})(\tau - \varsigma )\right )d\varsigma.\end{array}$$
(5.6)

In view of (A5.2), for some K 0 > 0,

$$\vert Q({t}_{0}) - Q({\varsigma }_{0})\vert \leq {K}_{0}\vert {t}_{0} - {\varsigma }_{0}\vert.$$

Noting that η0(t 0)1  = 0 and that P 0(t) has identical rows, we have

$${\eta }_{0}({t}_{0}){P}_{0}(t) = 0,\quad \mbox{ for }t \geq 0.$$

Thus the equation in (5.6) is equivalent to

$$\begin{array}{rl} \eta (\tau,{t}_{0})& = {\eta }_{0}({t}_{0})(\exp \left (Q({\varsigma }_{0})\tau \right ) - {P}_{0}({\varsigma }_{0})) \\ &\ +{ \int }_{0}^{\tau }\eta (\varsigma,{t}_{ 0})[Q({t}_{0}) - Q({\varsigma }_{0})](\exp \left (Q({\varsigma }_{0})(\tau - \varsigma )\right ) - {P}_{0}({\varsigma }_{0}))d\varsigma. \end{array}$$

From Lemma  A.2, we have

$$\vert \eta (\tau,{t}_{0})\vert \leq {K}_{1}\exp \left (-\widehat{\kappa }\tau \right ) + {K}_{2}\vert {t}_{0} - {\varsigma }_{0}\vert {\int }_{0}^{\tau }\vert \eta (\varsigma,{t}_{ 0})\vert \exp \left (-\widehat{\kappa }(\tau - \varsigma )\right )d\varsigma,$$

for some constants K 1 and K 2 which may depend on ς0 but are independent of t 0. By Gronwall’s inequality (see Hale [79, p. 36]),

$$\vert \eta (\tau,{t}_{0})\vert \leq {K}_{1}\exp \left (-(\widehat{\kappa } - {K}_{2}\vert {t}_{0} - {\varsigma }_{0}\vert )\tau \right ),$$
(5.7)

for all t 0 ∈ [0, T] and τ > 0.

If (5.5) does not hold uniformly, then there exist τ n  > 0 and t n  ∈ [0, T] such that

$$\vert \eta ({\tau }_{n},{t}_{n})\vert \geq n\exp \left (-\frac{\widehat{\kappa }{\tau }_{n}} {2} \right ).$$

Since T is finite, we may assume t n  → ς0, as n → ∞. This contradicts (5.7) for n large enough satisfying \(\vert {t}_{n} - {\varsigma }_{0}\vert <\widehat{ \kappa }/(2{K}_{2})\) and K 1 < n. Thus the proof is complete. □ 

Unscaled Occupation Measure

To study the unscaled occupation measure Z i ε(t) in (5.1), we define a related sequence \(\{\widehat{{Z}}^\varepsilon (t)\}\) of \({\mathbb{R}}^{m}\)-valued processes with its ith component \(\widehat{{Z}}_{i}^\varepsilon (t)\) given by

$$\widehat{{Z}}_{i}^\varepsilon (t) ={ \int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - P({\alpha }^\varepsilon (s) = i)\right ){\beta }_{ i}(s)ds.$$

Assume the conditions (A5.1) and (A5.2). We claim that for any δ > 0,

$$ \lim \limits_{\varepsilon \rightarrow 0} \left( \sup\limits_{0\leq t\leq T}P(\vert \widehat{{Z}}^\varepsilon (t)\vert \geq \delta )\right) = 0\mbox{ and }$$
(5.8)
$$ \lim \limits_{\varepsilon \rightarrow 0}\left( \sup \limits_{0\leq t\leq T}E\vert \widehat{{Z}}^\varepsilon (t){\vert}^{2}\right) = 0.$$
(5.9)

Note that (5.8) follows from (5.9) using Tchebyshev’s inequality. The verification of (5.9), which mainly depends on a mixing property of the underlying sequence, is almost the same as the moment estimates in the proof of asymptotic normality in Lemma  5.13. The details of the verifications of (5.8) and (5.9) are omitted here.
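Spelled out, the Tchebyshev step amounts to observing that for each t ∈ [0, T] and δ > 0,

$$P(\vert \widehat{{Z}}^\varepsilon (t)\vert \geq \delta ) \leq \frac{E\vert \widehat{{Z}}^\varepsilon (t){\vert }^{2}} {{\delta }^{2}},$$

so taking the supremum over t ∈ [0, T] and letting ε → 0 in (5.9) yields (5.8).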

With (5.8) and (5.9) in hand, to study the asymptotic properties of Z ε( ⋅), it remains to show that for any δ > 0,

$$\begin{array}{rl} \lim \limits_{\varepsilon \rightarrow 0}\left( \sup \limits_{0\leq t\leq T}P(\vert {Z}^\varepsilon (t)\vert \geq \delta )\right ) = 0 \quad \mbox{ and }\quad \lim \limits_{\varepsilon \rightarrow 0}\left( \sup \limits_{0\leq t\leq T}E\vert {Z}^\varepsilon (t){\vert }^{2}\right ) = 0. \end{array}$$

In fact, it is enough to work with each component of Z ε(t). Note that both Z ε(t) and \(\widehat{Z}^\varepsilon(t)\) are bounded. This together with the boundedness of β(t) and Lemma  5.1 implies that for each \(i \in \mathcal{M}\),

$$\begin{array}{rl} & \sup \limits_{0\leq t\leq T}E\vert {Z}_{i}^\varepsilon (t){\vert }^{2} \\ & \leq 2 \left(\sup \limits_{0\leq t\leq T}E\vert \widehat{{Z}}_{i}^\varepsilon(t){\vert }^{2} + \sup \limits_{ 0\leq t\leq T}E |\int_{0}^{t}\left(P({\alpha }^\varepsilon (s) = i) - {\nu }_{i}(s)\right){\beta }_{i}(s)d s {|}^{2}\right) \\ & \leq 2\left(\sup \limits_{0\leq t\leq T}E\vert \widehat{{Z}}_{i}^\varepsilon(t){\vert }^{2} +{ \int }_{0}^{T}O(\varepsilon )ds\right )\rightarrow 0, \end{array}$$

as ε → 0, which yields the desired results.

The limit result above is of the law-of-large-numbers type. What has been obtained is that

$${\int }_{0}^{t}{I}_{\{{ \alpha }^\varepsilon (s)=i\}}ds \rightarrow {\int }_{0}^{t}{\nu }_{ i}(s)ds\quad \mbox{ in probability as }\varepsilon \rightarrow 0,$$

for 0 < t ≤ T. In fact, a somewhat stronger result on uniform convergence in terms of the second moment is established. To illustrate, suppose that \({\alpha }^\varepsilon (t) = \alpha (t/\varepsilon )\) such that α( ⋅) is a stationary process with stationary distribution \(\overline{\nu } = ({\overline{\nu }}_{1},\ldots,{\overline{\nu }}_{m})\). Then via a change of variable \(\varsigma = s/\varepsilon \), we have

$$\frac{1} {t}{\int }_{0}^{t}{I}_{\{{ \alpha }^\varepsilon (s)=i\}}ds = \frac\varepsilon {t}{\int }_{0}^{t/\varepsilon }{I}_{\{ \alpha (\varsigma )=i\}}d\varsigma \rightarrow {\overline{\nu }}_{i}\quad \mbox{ in probability as }\varepsilon \rightarrow 0,$$

for 0 < t ≤ T. This is exactly the continuous-time version of the law of large numbers.
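The law-of-large-numbers statement above is easy to check numerically. The following sketch is only an illustration: the constant two-state generator, the rates, the initial state, and all numerical values are our own choices and are not taken from the text. It simulates αε(t) = α(t∕ε) with exponential holding times and compares the occupation fraction of a fixed state over [0, T] with the corresponding component of \(\overline{\nu }\).

```python
import numpy as np

rng = np.random.default_rng(0)

lam, mu = 2.0, 3.0                          # illustrative jump rates of the two-state chain
nu_bar = np.array([mu, lam]) / (lam + mu)   # stationary distribution (nu_bar_1, nu_bar_2)
T = 1.0

def occupation_fraction(eps, state=0):
    """Simulate alpha^eps on [0, T] (generator Q/eps, Q = [[-lam, lam], [mu, -mu]])
    and return the fraction of time spent in `state`."""
    rates = np.array([lam, mu]) / eps       # holding rates of the two states under Q/eps
    t, i, occ = 0.0, 0, 0.0                 # start in state 0 (the initial layer is O(eps))
    while t < T:
        hold = rng.exponential(1.0 / rates[i])
        if i == state:
            occ += min(hold, T - t)
        t += hold
        i = 1 - i                           # two states: always jump to the other one
    return occ / T

for eps in [1e-1, 1e-2, 1e-3]:
    est = np.mean([occupation_fraction(eps) for _ in range(200)])
    print(f"eps={eps:7.0e}  mean occupation fraction = {est:.4f}   nu_bar = {nu_bar[0]:.4f}")
```

As ε decreases, the averaged occupation fraction of the first state settles near μ∕(λ + μ) = 0.6, in agreement with the convergence in probability stated above.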

Example 5.3

Let us return to the singularly perturbed Cox process of Section 3.3. Recall that the compensator of the singularly perturbed Cox process is given by

$${G}^\varepsilon (t) = {G}_{ 0} + \sum \limits_{i=1}^{m}{ \int }_{0}^{t}{a}_{ i}{I}_{\{{\alpha }^\varepsilon (s)=i\}}ds,$$

where a i  > 0 for i = 1,…,m. Assume that all the conditions in Lemma  5.1 hold. Then Theorem  4.5 implies that P(α ε (t) = i) → ν i (t) as ε → 0. What we have discussed thus far implies that for each \(i \in \mathcal{M}\),

$${\int }_{0}^{t}{a}_{ i}{I}_{\{{\alpha }^\varepsilon (s)=i\}}ds \rightarrow {\int }_{0}^{t}{a}_{ i}{\nu }_{i}(s)ds\quad \mbox{ in probability as }\varepsilon \rightarrow 0\mbox{ and }$$
$${G}^\varepsilon (t) \rightarrow G(t) = {G}_{ 0} + \sum \limits_{i=1}^{m}{ \int }_{0}^{t}{a}_{ i}{\nu }_{i}(s)ds\quad \mbox{ in probability.}$$

Moreover,

$$\lim \limits_{\varepsilon \rightarrow 0}\left(\sup \limits_{0\leq t\leq T}E\vert {G}^\varepsilon (t) - G(t){\vert }^{2}\right ) = 0.$$

In the rest of this chapter, we treat suitably scaled occupation measures; the corresponding results for the Cox process can also be derived.

With the limit results in hand, the next question is this: How fast does the convergence take place? The rate of convergence issue together with more detailed asymptotic properties is examined fully in the following sections.

2.3 Exponential Bounds

This section is devoted to the derivation of exponential bounds for the normalized occupation measure (or occupation time) n ε( ⋅). Given a deterministic process β( ⋅), we consider the “centered” and “scaled” functional occupation-time process n ε(t, i) defined by

$$\begin{array}{ll} &{n}^\varepsilon (t,i) = \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu }_{i}(s)\right ){\beta }_{i}(s)ds\ \mbox{ and } \\ &{n}^\varepsilon (t) = ({n}^\varepsilon (t,1),\ldots,{n}^\varepsilon (t,m)) \in {\mathbb{R}}^{1\times m}.\end{array}$$
(5.10)

In view of Lemma  5.1, we have, for 0 ≤ s ≤ t ≤  T,

$${P}^\varepsilon (t,s) - {P}_{ 0}(t) = O\left (\varepsilon +\exp \left (-\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right ),$$

for some κ0 > 0. Note that the big O( ⋅) usually depends on T. Let K T denote an upper bound of

$$\frac{{P}^\varepsilon (t,s) - {P}_{0}(t)} {\varepsilon +\exp (-{\kappa }_{0}(t - s)/\varepsilon )}$$

for 0 ≤ s ≤  t ≤ T. For convenience, we use the notation O 1(y) to denote a function of y such that | O 1(y) | ∕ | y | ≤ 1. The rationale is that K T represents the magnitude of the bounding constant and the rest of the bound is in terms of a function with norm bounded by 1. Using this notation and K T , we write

$${P}^\varepsilon (t,s) - {P}_{ 0}(t) = {K}_{T}{O}_{1}\left (\varepsilon +\exp \left (-\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right ).$$
(5.11)

Let y(t) = ( y ij (t)) and z( t) = (z i (t)) denote a matrix-valued function and a vector-valued function defined on [0, T], respectively. Their norms are defined by

$$\begin{array}{ll} &\vert y{\vert }_{T} =\max \limits_{i,j}\sup\limits _{0\leq t\leq T}\vert {y}_{ij}(t)\vert, \\ &\vert z{\vert }_{T} =\max\limits _{i}\sup \limits_{0\leq t\leq T}\vert {z}_{i}(t)\vert. \end{array}$$
(5.12)

For future use, define β(t) = diag(β 1(t), …, β m (t)). The following theorem is concerned with the exponential bound of n ε(t) for ε sufficiently small.

Theorem 5.4

Assume that (A5.1) and (A5.2) are satisfied. Then there exist ε 0 > 0 and K > 0 such that for all 0 < ε ≤ ε 0 , T ≥ 0, and any bounded and measurable deterministic function β(⋅) = diag(β 1 (⋅),…,β m (⋅)), the following exponential bound holds:

$$E\exp \left \{ \frac{{\theta }_{T}} {{(T + 1)}^{\frac{3} {2} }} \sup \limits_{0\leq t\leq T}\left \vert {n}^\varepsilon (t)\right \vert \right \}\leq K,$$
(5.13)

where θ T is a constant satisfying

$$0 \leq {\theta }_{T} \leq \frac{\min \{1,{\kappa }_{0}\}} {{K}_{T}\vert \beta {\vert }_{T}}$$
(5.14)

with κ 0 being the exponential constant in Lemma  5.1.

Remark 5.5

Note that the constants ε 0 and K are independent of T. This is a convenient feature of the estimate in certain applications. The result is in terms of a fixed but otherwise arbitrary T, which is particularly useful for systems in an infinite horizon.

Proof: The proof is divided into several steps.

Step 1. In the first step, we show that (5.13) holds when the “sup” is absent. Let χε( ⋅) denote the indicator vector of αε( ⋅), that is,

$$\begin{array}{rl} &{\chi }^\varepsilon (t) = \left ({I}_{\{{ \alpha }^\varepsilon (t)=1\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)=m\}}\right )\quad \mbox{ and } \\ &{w}^\varepsilon (t) = {\chi }^\varepsilon (t) - {\chi }^\varepsilon (0) -\frac{1} \varepsilon {\int }_{0}^{t}{\chi }^\varepsilon (s)Q(s)ds\end{array}.$$

It is well known (see Elliott [56]) that w ε(t) = (w 1 ε(t), …,  w m ε(t)), for t ≥ 0, is a σ{α ε(s) : s ≤  t}-martingale. In view of a result of Kunita and Watanabe [134] (see also Ikeda and Watanabe [91, p. 55]), one can define a stochastic integral with respect to w ε(t). Moreover, the solution of the linear stochastic differential equation

$$d{\chi }^\varepsilon (t) = {\chi }^\varepsilon (t){Q}^\varepsilon (t)dt + d{w}^\varepsilon (t)$$

is given by

$${\chi }^\varepsilon (t) = {\chi }^\varepsilon (0){P}^\varepsilon (t,0) + {\int }_{0}^{t}(d{w}^\varepsilon (s)){P}^\varepsilon (t,s),$$

where P ε(t, s) is the principal matrix solution to the equation

$$\frac{d{P}^\varepsilon (t,s)} {dt} = \frac{1} \varepsilon {P}^\varepsilon (t,s)Q(t),\;\mbox{ with }{P}^\varepsilon (s,s) = I$$

representing the transition probability matrix.

Note that for t ≥ s ≥ 0,

$${\chi }^\varepsilon (s){P}_{ 0}(t) = ({\chi }^\varepsilon (s)\mathbb{1})\nu (t) = \nu (t).$$

Using this and (5.11), we have

$$\begin{array}{l} {\chi }^\varepsilon (t) - \nu (t) \\ \ = {\chi }^\varepsilon (0)({P}^\varepsilon (t,0) - {P}_{ 0}(t)) + {\int }_{0}^{t}(d{w}^\varepsilon (s))[({P}^\varepsilon (t,s) - {P}_{ 0}(t)) + {P}_{0}(t)] \\ \ = {K}_{T}{O}_{1}\left (\varepsilon + \exp \left (-\frac{{\kappa }_{0}t} \varepsilon \right )\right ) \\ \quad \ + {K}_{T}{\int }_{0}^{t}(d{w}^\varepsilon (s)){O}_{ 1}\left (\varepsilon + \exp \left (-\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right ) + {w}^\varepsilon (t){P}_{ 0}(t) \\ \ = {K}_{T}{O}_{1}\left (\varepsilon + \exp \left (-\frac{{\kappa }_{0}t} \varepsilon \right )\right ) \\ \quad \ + {K}_{T}{\int }_{0}^{t}(d{w}^\varepsilon (s)){O}_{ 1}\left (\varepsilon + \exp \left (-\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right ).\end{array}$$

The last equality above follows from the observation that

$$Q(s){P}_{0}(t) = 0\mbox{ for all }t \geq s \geq 0,$$

and

$$\begin{array}{rl} {w}^\varepsilon (t){P}_{0}(t) =&\left ({\chi }^\varepsilon (t) - {\chi }^\varepsilon (0) -{ 1 \over \varepsilon } {\int }_{0}^{t}{\chi }^\varepsilon (r)Q(r)dr\right ){P}_{ 0}(t) \\ =&\nu (t) - \nu (t) -{ 1 \over \varepsilon } {\int }_{0}^{t}{\chi }^\varepsilon (r)Q(r){P}_{ 0}(t)dr = 0 .\end{array}$$

Recall that β(t) = diag(β 1(t), …, β m (t)). Then it follows that

$$\begin{array}{rl} &{\int }_{0}^{t}({\chi }^\varepsilon (s) - \nu (s))\beta (s)ds \\ =&{K}_{T}{O}_{1}(\varepsilon (t + 1)) \\ & + {K}_{T}{\int }_{0}^{t}{\int }_{0}^{s}(d{w}^\varepsilon (r)){O}_{ 1}\left (\varepsilon + \exp \left (-\frac{{\kappa }_{0}(s - r)} \varepsilon \right )\right )\beta (s)ds \\ =&{K}_{T}{O}_{1}(\varepsilon (t + 1)) \\ & + {K}_{T}{\int }_{0}^{t}(d{w}^\varepsilon (r))\left ({\int }_{r}^{t}{O}_{ 1}\left (\varepsilon + \exp \left (-\frac{{\kappa }_{0}(s - r)} \varepsilon \right )\right )\beta (s)ds\right ) \\ =&{K}_{T}{O}_{1}(\varepsilon (t + 1)) \\ & + \varepsilon {K}_{T}{\int }_{0}^{t}(d{w}^\varepsilon (r)){O}_{ 1}\left ((t - r) + \frac{1} {{\kappa }_{0}}\left (1 -\exp \left (-\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right )\right )\vert \beta {\vert }_{T} \\ =&{K}_{T}{O}_{1}(\varepsilon (t + 1)) + \varepsilon {K}_{T}\vert \beta {\vert }_{T}\left (T + \frac{1} {{\kappa }_{0}}\right ){\int }_{0}^{t}(d{w}^\varepsilon (r))b(r,t), \\ \end{array}$$

where b( s, t) is a measurable function and |  b(s,  t) | ≤ 1 for all s and t. Dividing both sides by (T + 1), we obtain

$$\begin{array}{l} \frac{1} {T + 1}\left \vert {\int }_{0}^{t}({\chi }^\varepsilon (s) - \nu (s))\beta (s)ds\right \vert \\ \ = \varepsilon {K}_{T}{O}_{1}(1) + \varepsilon {K}_{T}\vert \beta {\vert }_{T}\left (\frac{T + (1/{\kappa }_{0})} {T + 1} \right )\left \vert {\int }_{0}^{t}(d{w}^\varepsilon (s))b(s,t)\right \vert. \end{array}$$
(5.15)

Therefore, we have

$$\begin{array}{rl} &E\exp \left \{ \frac{{\theta }_{T}} {\sqrt\varepsilon {(T + 1)}^{\frac{3} {2} }} \left \vert {\int }_{0}^{t}({\chi }^\varepsilon (s) - \nu (s))\beta (s)ds\right \vert \right \} \\ \leq &E\exp \left \{ \frac{1} {\sqrt\varepsilon \sqrt{T + 1}}\left [\varepsilon {O}_{1}(1) + \varepsilon \left \vert {\int }_{0}^{t}(d{w}^\varepsilon (s))b(s,t)\right \vert \right ]\right \}.\end{array}$$

In view of the choice of θ T , it follows that

$$\begin{array}{rl} &E\exp \left \{ \frac{{\theta }_{T}} {\sqrt\varepsilon {(T + 1)}^{\frac{3} {2} }} \left \vert {\int }_{0}^{t}({\chi }^\varepsilon (s) - \nu (s))\beta (s)ds\right \vert \right \} \\ &\leq \exp \left ( \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\right )E\exp \left \{ \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}(d{w}^\varepsilon (s))b(s,t)\right \vert \right \} \\ &\leq eE\exp \left \{ \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}(d{w}^\varepsilon (s))b(s,t)\right \vert \right \} .\end{array}$$
(5.16)

Recall that

$${w}^\varepsilon (t) = ({w}_{ 1}^\varepsilon (t),\ldots,{w}_{ m}^\varepsilon (t)).$$

It suffices to work out the estimate for each component w i ε (t). That is, it is enough to show that for each i = 1, …,  m,

$$E\exp \left \{ \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}b(s,t)d{w}_{ i}^\varepsilon (s)\right \vert \right \}\leq K,$$
(5.17)

for all measurable functions b( ⋅,  ⋅) with |  b(s,  t) | ≤ 1 and 0 ≤ t ≤  T. For each t 0  ≥ 0, let b 0(s) =  b(s,  t 0).

For any nonnegative random variable ξ,

$$\begin{array}{rl} E{e}^{\xi } =&\sum \limits_{j=0}^{\infty }{\int }_{\{j\leq \xi <j+1\}}{e}^{\xi }dP \\ \leq &\sum \limits_{j=0}^{\infty }{\int }_{\{j\leq \xi <j+1\}}{e}^{j+1}dP \\ =&\sum \limits_{j=0}^{\infty }{e}^{j+1}P(j \leq \xi < j + 1) \\ =&\sum \limits_{j=0}^{\infty }{e}^{j+1}[P(\xi \geq j) - P(\xi \geq j + 1)] \\ \leq &e + (e - 1)\sum \limits_{j=1}^{\infty }{e}^{j}P(\xi \geq j)\end{array}$$

By virtue of the inequality above, we have

$$\begin{array}{l} E\exp \left \{ \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s)\right \vert \right \} \\ \leq e + (e - 1)\sum \limits_{j=1}^{\infty }{e}^{j}P\left ( \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}{b}_{0}(s)d{w}_{i}^\varepsilon (s)\right \vert \geq j\right).\end{array}$$
(5.18)

To proceed, let us concentrate on the estimate of

$$P\left ( \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s)\right \vert \geq j\right ).$$

For each i = 1, …, m, let

$$\widetilde{{p}}_{i}(t) ={ \int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s)$$

and let \(\widetilde{{q}}_{i}(\cdot )\) denote the unique solution to the following equation (see Elliott [55, Chapter 13])

$$\widetilde{{q}}_{i}(t) = 1 + \zeta {\int }_{0}^{t}\widetilde{{q}}_{ i}({s}^{-})d\widetilde{{p}}_{ i}(s),$$

where \(\widetilde{{q}}_{i}({s}^{-})\) is the left-hand limit of \(\widetilde{{q}}_{i}\) at s and ζ is a positive constant to be determined later. In what follows, we suppress the i-dependence and write \(\widetilde{{p}}_{i}(\cdot )\) and \(\widetilde{{q}}_{i}(\cdot )\) as \(\widetilde{p}(\cdot )\) and \(\widetilde{q}(\cdot )\) whenever there is no confusion.

Note that \(\widetilde{p}(t)\) , for t ≥ 0, is a local martingale. Since

$$\zeta {\int }_{0}^{t}\widetilde{q}({s}^{-})d\widetilde{p}(s),\;t \geq 0,$$

is a local martingale, we have \(E\widetilde{q}(t) \leq 1\mbox{ for all }t \geq 0\). Moreover, \(\widetilde{q}(t)\) can be written as follows (see Elliott [55, Chapter 13]):

$$\widetilde{q}(t) =\exp \left (\zeta \widetilde{p}(t)\right )\prod \limits_{s\leq t}(1 + \zeta \Delta \widetilde{p}(s))\exp \left (-\zeta \Delta \widetilde{p}(s)\right ),$$
(5.19)

where \(\Delta \widetilde{p}(s) :=\widetilde{ p}(s) -\widetilde{ p}({s}^{-})\), with \(\vert \Delta \widetilde{p}(s)\vert \leq 1\).

Now observe that there exist positive constants ζ0 and κ1 such that for 0 < ζ ≤ ζ0 and for all s > 0,

$$(1 + \zeta \Delta \widetilde{p}(s))\exp \left (-\zeta \Delta \widetilde{p}(s)\right ) \geq \exp \left (-{\kappa }_{1}{\zeta }^{2}\right ).$$
(5.20)

Combining (5.19) and (5.20), we obtain

$$\widetilde{q}(t) \geq \exp \{ \zeta \widetilde{p}(t) - {\kappa }_{1}{\zeta }^{2}N(t)\},\;\mbox{ for }0 < \zeta \leq {\zeta }_{ 0},\;t > 0,$$

where N(t) is the number of jumps of \(\widetilde{p}(s)\) in s ∈ [0, t]. Since N(t) is a monotonically increasing process, we have

$$\widetilde{q}(t) \geq \exp \left \{\zeta \widetilde{p}(t) - {\kappa }_{1}{\zeta }^{2}N(T)\right \},\;\mbox{ for }0 < \zeta \leq {\zeta }_{ 0}.$$

Note also that for each i = 1, …,  m,

$$\begin{array}{rl} &P\left ( \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s)\right \vert \geq j\right ) \\ =&P\left (\vert \widetilde{p}(t)\vert \geq \frac{j\sqrt{T + 1}} {\sqrt\varepsilon } \right ) \\ \leq &P\left (\widetilde{p}(t) \geq \frac{j\sqrt{T + 1}} {\sqrt\varepsilon } \right ) + P\left (-\widetilde{p}(t) \geq \frac{j\sqrt{T + 1}} {\sqrt\varepsilon } \right ).\end{array}$$

Consider the first term on the right-hand side of the inequality above. Let \({a}_{j} = j(T + 1)/(8{\kappa }_{1}\varepsilon )\). Then

$$\begin{array}{rl} &P\left (\widetilde{p}(t) \geq \frac{j\sqrt{T + 1}} {\sqrt\varepsilon } \right ) \\ \leq &P\left (\widetilde{q}(t) \geq \exp \left \{\frac{j\zeta \sqrt{T + 1}} {\sqrt\varepsilon } - {\kappa }_{1}{\zeta }^{2}N(T)\right \}\right ) \\ \leq &P\left (\widetilde{q}(t) \geq \exp \left \{\frac{j\zeta \sqrt{T + 1}} {\sqrt\varepsilon } - {\kappa }_{1}{\zeta }^{2}N(T)\right \},N(T) \leq {a}_{ j}\right ) \\ & +\; P(N(T) \geq {a}_{j}) \\ \leq &P\left (\widetilde{q}(t) \geq \exp \left (\frac{j\zeta \sqrt{T + 1}} {\sqrt\varepsilon } - {\kappa }_{1}{\zeta }^{2}{a}_{ j}\right )\right ) + P(N(T) \geq {a}_{j}) \\ \leq &2\exp \left (-\frac{j\zeta \sqrt{T + 1}} {\sqrt\varepsilon } + {\kappa }_{1}{\zeta }^{2}{a}_{ j}\right ) + P(N(T) \geq {a}_{j}).\end{array}$$

The last inequality follows from the local martingale property (see Elliott [55, Theorem 4.2]).

Now if we choose \(\zeta = 4\sqrt\varepsilon /\sqrt{T + 1}\), then

$$\exp \left (-\frac{j\zeta \sqrt{T + 1}} {\sqrt\varepsilon } + {\kappa }_{1}{\zeta }^{2}{a}_{ j}\right ) = {e}^{-2j}.$$
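Indeed, with this choice, \(j\zeta \sqrt{T + 1}/\sqrt\varepsilon = 4j\) and \({\kappa }_{1}{\zeta }^{2}{a}_{j} = {\kappa }_{1} \cdot \frac{16\varepsilon } {T + 1} \cdot \frac{j(T + 1)} {8{\kappa }_{1}\varepsilon } = 2j\), so the exponent equals  − 4j + 2j = −2j.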

In view of the construction of Markov chains in Section 2.4, there exists a Poisson process N 0 ( ⋅) with parameter (i.e., mean) a ∕ ε for some a > 0, such that N(t) ≤  N 0(t). Assume a = 1 without loss of generality (otherwise one may replace ε by εa  − 1 ). Using the Poisson distribution of N 0(t), we have

$$P({N}_{0}(T) \geq k) \leq \frac{{(T/\varepsilon )}^{k}} {k!} \mbox{ for }k \geq 0.$$

In view of Stirling’s formula (see Chow and Teicher [30] or Feller [60]), for ε small enough,

$$P(N(T) \geq {a}_{j}) \leq \frac{{(T/\varepsilon )}^{\lfloor {a}_{j}\rfloor }} {\lfloor {a}_{j}\rfloor !} \leq 2{\left (\frac{8{\kappa }_{1}} {j} \right )}^{{a}_{j}-1} := 2{\gamma }_{ 0}^{{a}_{j}-1},$$

where ⌊ a j ⌋ is the integer part of a j and

$${\gamma }_{0} = \frac{8e{\kappa }_{1}} {{j}_{0}} \in (0,1)\quad \mbox{ for }{j}_{0} >\max \{ 1,8e{\kappa }_{1}\}.$$

Thus, for j ≥ j 0,

$$P\left ( \frac{\sqrt\varepsilon } {\sqrt{T + 1}}{\int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s) \geq j\right ) \leq 2{e}^{-2j} + 2{\gamma }_{ 0}^{{a}_{j}-1}.$$

Repeating the same argument for the martingale \((-\widetilde{p}(\cdot ))\), we get for j ≥ j 0,

$$P\left (- \frac{\sqrt\varepsilon } {\sqrt{T + 1}}{\int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s) \geq j\right ) \leq 2{e}^{-2j} + 2{\gamma }_{ 0}^{{a}_{j}-1}.$$

Combining the above two inequalities yields that for j ≥ j 0,

$$P\left ( \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s)\right \vert \geq j\right ) \leq 4({e}^{-2j} + {\gamma }_{ 0}^{{a}_{j}-1}).$$

Then by (5.18),

$$\begin{array}{rl} &E\exp \left \{ \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s)\right \vert \right \} \\ &\qquad \leq {K}_{0} + 4(e - 1)\sum \limits_{j=1}^{\infty }{e}^{j}({e}^{-2j} + {\gamma }_{ 0}^{{a}_{j}-1}), \end{array}$$

where K 0 is the sum corresponding to those terms with j ≤  j 0. Now choose ε small enough that \(e{\gamma }_{0}^{1/(8{\kappa }_{1}\varepsilon )} \leq 1/2\). Then

$$E\exp \left \{ \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}{b}_{ 0}(s)d{w}_{i}^\varepsilon (s)\right \vert \right \}\leq {K}_{ 0} + 4e{\gamma }_{0}^{-1}.$$

Since t 0 is arbitrary, we may take t 0 = t in the above inequality. Then

$$E\exp \left \{ \frac{\sqrt\varepsilon } {\sqrt{T + 1}}\left \vert {\int }_{0}^{t}b(s,t)d{w}_{ i}^\varepsilon (s)\right \vert \right \}\leq {K}_{ 0} + 4e{\gamma }_{0}^{-1}.$$

Combining this inequality with (5.16) leads to

$$\begin{array}{rl} &E\exp \left \{ \frac{{\theta }_{T}} {\sqrt\varepsilon {(T + 1)}^{\frac{3} {2} }} \left \vert {\int }_{0}^{t}({\chi }^\varepsilon (s) - \nu (s))\beta (s)ds\right \vert \right \} \\ & \leq e({K}_{0} + 4e{\gamma }_{0}^{-1}) := K\end{array}$$

Step 2. Recall that

$${n}^\varepsilon (t,i) = \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu }_{i}(s)\right ){\beta }_{i}(s)\,ds.$$

Then, for each \(i \in \mathcal{M}\), n ε(t, i) is nearly a martingale, i.e., for ε small enough,

$$\vert E[{n}^\varepsilon (t,i)\vert {\mathcal{F}}_{ s}] - {n}^\varepsilon (s,i)\vert \leq O(\sqrt\varepsilon ),\quad \mbox{ for all }\omega \in \Omega \mbox{ and }0 \leq s \leq t \leq T.$$
(5.21)

Here \(O(\sqrt{\varepsilon })\) is deterministic, i.e., it does not depend on the sample point ω. The reason is that it is obtained from the asymptotic expansions. In fact, for all \({i}_{0} \in \mathcal{M}\),

$$ \begin{array}{l} E\left[{\int }_{s}^{t}({I}_{\{{\alpha }^\varepsilon (r)=i\}} - {\nu }_{i}(r)){\beta }_{i}(r)\,dr \,\Big\vert\, {\alpha }^\varepsilon (s) = {i}_{0}\right] \\ ={ \int }_{s}^{t}(E[{I}_{\{{\alpha }^\varepsilon (r)=i\}}\vert {\alpha }^\varepsilon (s) = {i}_{0}] - {\nu }_{i}(r)){\beta }_{i}(r)\,dr \\ ={ \int }_{s}^{t}[P({\alpha }^\varepsilon (r) = i\vert {\alpha }^\varepsilon (s) = {i}_{ 0}) - {\nu }_{i}(r)]{\beta }_{i}(r)\,dr \\ ={ \int }_{s}^{t}O(\varepsilon +\exp (-{\kappa }_{ 0}(r - s)/\varepsilon ))\,dr = O(\varepsilon ).\end{array}$$

So, (5.21) follows.

Step 3. We show that for each a > 0,

$$E[\exp \{a\vert {n}^\varepsilon (t,i)\vert \}\vert {\mathcal{F}}_{ s}] \geq \exp \{ a\vert {n}^\varepsilon (s,i)\vert \}(1 + O(\sqrt\varepsilon )).$$

First of all, note that ϕ(x) =  |  x | is a convex function. There exists a vector function ϕ0(x) bounded by 1 such that

$$\phi (x) \geq \phi (a) + {\phi }_{0}(a) \cdot (x - a),$$

for all x and a. Noting that \(O(\sqrt\varepsilon ) = -O(\sqrt\varepsilon )\) , we have

$$\begin{array}{rl} E[\vert {n}^\varepsilon (t,i)\vert \;\vert {\mathcal{F}}_{s}] \geq &\vert {n}^\varepsilon (s,i)\vert + {\phi }_{0}({n}^\varepsilon (s,i)) \cdot E[{n}^\varepsilon (t,i) - {n}^\varepsilon (s,i)\vert {\mathcal{F}}_{s}] \\ \geq &\vert {n}^\varepsilon (s,i)\vert + O(\sqrt\varepsilon )\end{array}$$

Moreover, note that e ax is also convex. It follows that

$$ \begin{array}{l} E[\exp (a\vert {n}^\varepsilon (t,i)\vert )\vert {\mathcal{F}}_{s}] \\ \geq \exp (a\vert {n}^\varepsilon (s,i)\vert ) + a\exp \{a\vert {n}^\varepsilon (s,i)\vert \}E[\vert {n}^\varepsilon (t,i)\vert -\vert {n}^\varepsilon (s,i)\vert \;\vert {\mathcal{F}}_{s}] \\ \geq \exp (a\vert {n}^\varepsilon (s,i)\vert )(1 + O(\sqrt\varepsilon )).\end{array}$$

Step 4. Let x ε(t) = exp(a |  n ε(t,  i) | ) for a > 0. Then, for any \({\mathcal{F}}_{t}\) stopping time τ ≤  T,

$$E[{x}^\varepsilon (T)\vert {\mathcal{F}}_{ \tau }] \geq {x}^\varepsilon (\tau )(1 + O(\sqrt\varepsilon )).$$
(5.22)

Note that x ε(t) is continuous. Therefore, it suffices to show the above inequality when τ takes values in a countable set {t 1 ,  t 2, …}. To this end, note that, for each t i ,

$$E[{x}^\varepsilon (T)\vert {\mathcal{F}}_{{ t}_{i}}] \geq {x}^\varepsilon ({t}_{ i})(1 + O(\sqrt\varepsilon )).$$

For all \(A \in {\mathcal{F}}_{\tau }\) , we have \(A \cap \{ \tau = {t}_{i}\} \in {\mathcal{F}}_{{t}_{i}}\) . Therefore,

$${\int }_{A\cap \{\tau ={t}_{i}\}}{x}^\varepsilon (T)\,dP \geq \left ({\int }_{A\cap \{\tau ={t}_{i}\}}{x}^\varepsilon (\tau )\,dP\right )(1 + O(\sqrt\varepsilon )).$$

Thus

$${\int }_{A}{x}^\varepsilon (T)\,dP \geq \left ({\int }_{A}{x}^\varepsilon (\tau )\,dP\right )(1 + O(\sqrt\varepsilon )),$$

and (5.22) follows.

Step 5. Let \(a = \theta /\sqrt{{(T + 1)}^{3}}\) in Step 3. Then, for ε small enough, there exists K such that

$$P\left (\sup \limits_{t\leq T}{x}^\varepsilon (t) \geq x\right ) \leq \frac{K} {x},$$
(5.23)

for all x > 0.

In fact, let τ = inf{t > 0  :   x ε(t) ≥  x}. We adopt the convention that τ = ∞ if { t > 0  :   x ε(t) ≥ x} = ∅. Then we have

$$E[{x}^\varepsilon (T)] \geq (E[{x}^\varepsilon (T \wedge \tau )])(1 + O(\sqrt\varepsilon )),$$

and write

$$E[{x}^\varepsilon (T \wedge \tau )] = E[{x}^\varepsilon (\tau ){I}_{\{ \tau <T\}}] + E[{x}^\varepsilon (T){I}_{\{ \tau \geq T\}}] \geq E[{x}^\varepsilon (\tau ){I}_{\{ \tau <T\}}].$$

Moreover, in view of the definition of τ, we have

$$E\left [{x}^\varepsilon (\tau ){I}_{\{ \tau <T\}}\right ] \geq xP(\tau < T) \geq xP\left (\sup \limits_{t\leq T}{x}^\varepsilon (t) \geq x\right ).$$

It follows that

$$P\left (\sup \limits_{t\leq T}{x}^\varepsilon (t) \geq x\right ) \leq \frac{E[{x}^\varepsilon (T)]} {(1 + O(\sqrt\varepsilon ))x} \leq \frac{K} {x}.$$

Thus, (5.23) follows.

Finally, to complete the proof of (5.13), note that, for 0 < κ < 1,

$$E\exp \left ( \frac{\kappa \theta } {\sqrt{{(1 + T)}^{3}}}\sup \limits _{t\leq T}\vert {n}^\varepsilon (t,i)\vert \right ) = E\left [\sup \limits_{ t\leq T}{({x}^\varepsilon (t))}^{\kappa }\right ].$$

It follows that

$$\begin{array}{rl} E\left [\sup \limits_{t\leq T}{({x}^\varepsilon (t))}^{\kappa }\right ] =&{\int }_{0}^{\infty }P\left (\sup \limits_{ t\leq T}{({x}^\varepsilon (t))}^{\kappa } \geq x\right )\,dx \\ \leq &1 +{ \int }_{1}^{\infty }P\left (\sup \limits_{ t\leq T}{({x}^\varepsilon (t))}^{\kappa } \geq x\right )\,dx \\ \leq &1 +{ \int }_{1}^{\infty }P\left (\sup \limits_{ t\leq T}{x}^\varepsilon (t) \geq {x}^{1/\kappa }\right )\,dx \\ \leq &1 +{ \int }_{1}^{\infty}K{x}^{-1/\kappa }\,dx < \infty. \end{array}$$

This completes the proof. □ 

Next we give several corollaries to the theorem. Such estimates are useful for establishing exponential bounds of asymptotic optimal hierarchical controls in manufacturing models (see Sethi and Zhang [192]).

Corollary 5.6

In Theorem  5.4 , if Q(t) = Q, a constant matrix, then the following stronger estimate holds:

$$E\exp \left \{ \frac{{\theta }_{T}} {\sqrt{1 + T}}\sup \limits_{0\leq t\leq T}\left \vert {n}^\varepsilon (t)\right \vert \right \}\leq K.$$
(5.24)

Moreover, the constant θ T = θ is independent of T for T > 0.

Proof: If Q(t) =  Q, then φ1(t) in Lemma  5.1 is identically 0. Therefore, the estimate (5.11) can be replaced by

$${P}^\varepsilon (t,s) - {P}_{ 0}(t) = {K}_{T}{O}_{1}\left (\exp \left (-\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right ).$$

As a result, the estimate in (5.15) can be replaced by

$$\sup \limits_{0\leq t\leq T}\left \vert {\int }_{0}^{t}({\chi }^\varepsilon (s) - \nu (s))\beta (s)ds\right \vert = \varepsilon {K}_{ T}{O}_{1}(1)+\varepsilon {K}_{T}\sup \limits_{0\leq t\leq T}\left \vert {\int }_{0}^{t}{O}_{ 1}(1)d{w}^\varepsilon (s)\right \vert.$$

The proof of (5.24) follows in essentially the same way as that of Theorem  5.4 (from equation (5.15) on).

To see that θ T in (5.24) is independent of T, it suffices to note that in (5.11) the constant K T is independent of T, which can be seen by examining closely Example  4.16. □ 

Corollary 5.7

Under the conditions of Theorem  5.4 , there exist constants K j such that for j = 1,2,…,

$$E\sup \limits_{0\leq t\leq T}{\left \vert {n}^\varepsilon (t)\right \vert }^{2j} \leq {K}_{ j}{(1 + T)}^{3j}.$$
(5.25)

Moreover, if Q(t) = Q, then for some K j independent of T,

$$E\sup \limits_{0\leq t\leq T}{\left \vert {n}^\varepsilon (t)\right \vert }^{2j} \leq {K}_{ j}{(1 + T)}^{j}.$$
(5.26)

Proof: Since (5.26) follows from a similar argument to that of Corollary  5.6, it suffices to verify (5.25) using Theorem  5.4. Note that for each j = 1, 2, …, there exists K j 0 such that x 2j ≤ K j 0 e x for all x ≥ 0. Thus,

$${\left ( \frac{{\theta }_{T}} {{(T + 1)}^{\frac{3} {2} }} \sup \limits _{0\leq t\leq T}\left \vert {n}^\varepsilon (t)\right \vert \right)}^{2j} \leq {K}_{ j}^{0}\exp \left \{ \frac{{\theta }_{T}} {{(T + 1)}^{\frac{3} {2} }} \sup \limits _{0\leq t\leq T}\left \vert{n}^\varepsilon (t)\right \vert \right \}.$$

Taking expectations on both sides of the above inequality yields the desired estimate. □ 

Corollary 5.8

Under the conditions of Theorem  5.4 , for each 0 < δ < 1∕2, we have

$$\begin{array}{l} P\left (\sup \limits_{0\leq t\leq T}\left \vert {\int}_{0}^{t}({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu}_{i}(s)){\beta }_{i}(s)ds\right \vert \geq \varepsilon ^{\frac{1}{2} -\delta }\right ) \\ \leq K\exp \left \{- \frac{{\theta }_{T}}{\varepsilon ^{\delta }{(T + 1)}^{\frac{3} {2} }}\right\}.\end{array}$$
(5.27)

Moreover, if Q(t) = Q, then θ T = θ is independent of T and

$$\begin{array}{l} P\left (\sup \limits_{0\leq t\leq T}\left \vert {\int}_{0}^{t}({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu}_{i}(s)){\beta }_{i}(s)ds\right \vert \geq \varepsilon ^{\frac{1}{2} -\delta }\right ) \\ \leq K\exp \left \{- \frac{\theta }{\varepsilon ^{\delta }\sqrt{1 + T}}\right \}.\end{array}$$
(5.28)

Proof: Using Theorem  5.4, we obtain

$$\begin{array}{rl} &P\left (\sup \limits_{0\leq t\leq T}\left \vert{\int }_{0}^{t}({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu}_{i}(s)){\beta }_{i}(s)ds\right \vert \geq \varepsilon ^{\frac{1} {2} -\delta }\right ) \\ =&P\left (\exp \left \{ \frac{{\theta}_{T}} {{(T + 1)}^{\frac{3} {2} }} \sup \limits_{0\leq t\leq T}\left \vert{n}^\varepsilon (t)\right \vert \right \}\geq \exp \left \{\frac{{\theta }_{T}\varepsilon ^{\frac{1} {2} -\delta }}{\sqrt\varepsilon {(T + 1)}^{\frac{3} {2} }} \right \}\right ) \\ \leq &K\exp \left \{- \frac{{\theta }_{T}} {\varepsilon ^{\delta}{(T + 1)}^{\frac{3} {2} }} \right \} .\end{array}$$

This proves (5.27). Similarly, (5.28) follows from Corollary  5.6. □ 

2.4 Asymptotic Normality

Recall that the ith component of n ε( ⋅) is given by

$${n}^\varepsilon (t,i) = \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu }_{i}(s)\right ){\beta }_{i}(s)ds.$$

It is expected that the sequence of centered and scaled occupation measures will display certain “central limit type” phenomena. The goal here is to study the asymptotic properties of n ε( ⋅) as ε → 0. To be more specific, we show that n ε( ⋅) converges to a Gaussian process as ε goes to 0. The following theorem is the main result of this section.

Theorem 5.9

Suppose that (A5.1) is satisfied and Q(⋅) is twice continuously differentiable on [0,T] with the second derivative being Lipschitz. Then for t ∈ [0,T], the process n ε (⋅) converges weakly to a Gaussian process n(⋅) with independent increments such that

$$En(t) = 0\mbox{ and }E[n^{\prime}(t)n(t)] ={ \int }_{0}^{t}A(s)ds,$$
(5.29)

where A(t) = (A ij (t)) with

$${A}_{ij}(t) = {\beta }_{i}(t){\beta }_{j}(t)\left [{\nu }_{i}(t){\int }_{0}^{\infty }{q}_{ 0,ij}(r,t)dr + {\nu }_{j}(t){\int }_{0}^{\infty }{q}_{ 0,ji}(r,t)dr\right ],$$
(5.30)

and Q 0 (r,t) = (q 0,ij (r,t)).

Remark 5.10

In view of (5.29) and the independent increment property of n(t), it follows that

$$E[n^{\prime}({t}_{1})n({t}_{2})] ={ \int }_{0}^{\min \{{t}_{1},{t}_{2}\} }A(s)ds.$$
(5.31)

The form of the covariance matrix (between t1 and t2) reveals the nonstationarity of the limit process n(⋅). Note that the limit covariance of n(t) given in (5.31) is an integral of the function A(s) defined in (5.30). For simplicity, with a slight abuse of notation, we shall also refer to A(t) as the covariance. This convention will be used throughout the chapter.
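For orientation, the covariance can be written in closed form in a simple special case (this computation is our illustration, not part of the theorem): take β i (t) ≡ 1 and the constant two-state generator with rates λ and μ used in the illustration following Remark 5.2, so that \({Q}_{0}(r) =\exp (-(\lambda + \mu )r)(I - {P}_{0})\) and ν = (μ,λ)∕(λ + μ). Then, for example,

$${A}_{11} = 2{\nu }_{1}{\int }_{0}^{\infty }{q}_{ 0,11}(r)dr = 2\, \frac{\mu } {\lambda + \mu }\, \frac{\lambda } {{(\lambda + \mu )}^{2}} = \frac{2\lambda \mu } {{(\lambda + \mu )}^{3}},$$

and altogether

$$A = \frac{2\lambda \mu } {{(\lambda + \mu )}^{3}}\left (\begin{array}{cc} 1 & - 1\\ - 1& 1 \end{array} \right ),\qquad E[n^{\prime}(t)n(t)] = tA.$$

The rows of A sum to zero, as they must: with β i ≡ 1 one has n ε(t,1) + n ε(t,2) = 0 for every t, so the same holds for the limit process.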

Remark 5.11

The additional assumptions on the second derivative of Q(⋅) in Theorem  5.9 are required for computing or characterizing the function A(⋅). They are not crucial for the convergence of n ε (⋅); see Remark  5.44 in Section 5.3.3 for details.
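As a numerical sanity check (a simulation sketch only; the generator, rates, ε, and sample size are our illustrative choices and not part of the theorem), one can sample n ε(T,1) for the constant two-state generator above with β 1 ≡ 1 and compare the empirical variance with \({\int }_{0}^{T}{A}_{11}(s)ds = 2\lambda \mu T/{(\lambda + \mu )}^{3}\).

```python
import numpy as np

rng = np.random.default_rng(1)

lam, mu = 2.0, 3.0                   # illustrative rates; Q = [[-lam, lam], [mu, -mu]]
nu1 = mu / (lam + mu)                # quasi-stationary probability of state 1
T, eps, n_rep = 1.0, 1e-2, 4000

def n_eps_T():
    """One sample of n^eps(T,1) = eps^{-1/2} * int_0^T (I{alpha^eps(s)=1} - nu1) ds."""
    rates = {1: lam / eps, 2: mu / eps}      # holding rates under the generator Q/eps
    t, state, occ = 0.0, 1, 0.0              # start in state 1 (the initial layer is O(eps))
    while t < T:
        hold = rng.exponential(1.0 / rates[state])
        if state == 1:
            occ += min(hold, T - t)
        t += hold
        state = 3 - state                    # switch between states 1 and 2
    return (occ - nu1 * T) / np.sqrt(eps)

samples = np.array([n_eps_T() for _ in range(n_rep)])
var_limit = 2 * lam * mu * T / (lam + mu) ** 3   # = int_0^T A_11(s) ds
print(f"empirical mean {samples.mean():+.4f}, empirical variance {samples.var():.4f}, "
      f"limit variance {var_limit:.4f}")
```

For the chosen rates the limit variance is 2λμT∕(λ + μ)³ = 0.096, and a histogram of the samples should be approximately Gaussian with mean near 0 and variance near this value, which is what Theorem 5.9 predicts in this special case.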

Proof of Theorem  5.9: We divide the proof into several steps, which are presented by a number of lemmas.

Step 1. Show that the limit of the mean of n ε( ⋅) is 0.

Lemma 5.12

For each t ∈ [0,T],

$$\lim \limits_{\varepsilon \rightarrow 0}E{n}^\varepsilon (t) = 0.$$

Proof: Using Theorem  4.5 and the boundedness of β i ( ⋅), for t ∈ [0, T],

$$\begin{array}{rl} E{n}^\varepsilon (t,i) =& \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}(E{I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu }_{i}(s)){\beta }_{i}(s)ds \\ =& \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}(P({\alpha }^\varepsilon (s) = i) - {\nu }_{ i}(s)){\beta }_{i}(s)ds \\ =& \frac{1} {\sqrt\varepsilon}{\int }_{0}^{t}\left [O(\varepsilon ) + O\left (\exp \left ( -\frac{{\kappa }_{0}s} \varepsilon \right )\right )\right ]{\beta}_{i}(s)ds \\ =&O(\sqrt\varepsilon ) + \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}O\left(\exp \left( -\frac{{\kappa }_{0}s} \varepsilon \right)\right)ds = O(\sqrt\varepsilon ) \rightarrow 0, \end{array}$$

for each \(i \in \mathcal{M}\). □ 

Step 2. Calculate the limit covariance function.

Lemma 5.13

For each t ∈ [0,T],

$$\lim \limits_{\varepsilon \rightarrow 0}E({n}^{\varepsilon,{\prime}}(t){n}^\varepsilon (t)) ={ \int }_{0}^{t}A(s)ds,$$
(5.32)

where A(t) is given by (5.30).

Proof: For each \(i,j \in \mathcal{M}\),

$$\begin{array}{rl} E\left [{n}^\varepsilon (t,i){n}^\varepsilon (t,j)\right ] =&\frac{1} \varepsilon E\left[\left({\int}_{0}^{t}({I}_{\{{ \alpha }^\varepsilon (\varsigma )=i\}} - {\nu}_{i}(\varsigma )){\beta }_{i}(\varsigma )d\varsigma \right)\right. \\ & \left.\times \left ({\int }_{0}^{t}({I}_{\{{ \alpha }^\varepsilon (r)=j\}} - {\nu }_{j}(r)){\beta }_{j}(r)dr\right )\right] \\ =&\frac{1} \varepsilon E\left[{\int }_{0}^{t}{ \int }_{0}^{t}\left({I}_{\{{ \alpha }^\varepsilon (\varsigma )=i,{\alpha}^\varepsilon (r)=j\}} - {\nu }_{i}(\varsigma ){I}_{\{{\alpha}^\varepsilon (r)=j\}}\right.\right. \\ & \left.\left.- {\nu}_{j}(r){I}_{\{{\alpha }^\varepsilon (\varsigma )=i\}} + {\nu}_{i}(\varsigma ){\nu }_{j}(r)\right){\beta }_{i}(\varsigma ){\beta}_{j}(r)d\varsigma dr\right]\end{array}.$$

Let

$$\begin{array}{rl} &{D}_{1} =\{ (\varsigma,r) :\; 0 \leq r \leq \varsigma \leq t\}, \\ &{D}_{2} =\{ (\varsigma,r) :\; 0 \leq \varsigma \leq r \leq t\}, \end{array}$$

and let

$$\begin{array}{rl} {\Phi }^\varepsilon (\varsigma,r)& = P({\alpha }^\varepsilon (\varsigma ) = i,{\alpha }^\varepsilon (r) = j) - {\nu }_{ i}(\varsigma )P({\alpha }^\varepsilon (r) = j) \\ &\quad - {\nu }_{j}(r)P({\alpha }^\varepsilon (\varsigma ) = i) + {\nu }_{ i}(\varsigma ){\nu }_{j}(r)\end{array}.$$

Then it follows that

$$\begin{array}{rl} E\left [{n}^\varepsilon (t,i){n}^\varepsilon (t,j)\right ] =&\frac{1} \varepsilon \left [{\int }_{0}^{t}{ \int }_{0}^{t}{\Phi }^\varepsilon (\varsigma,r){\beta }_{ i}(\varsigma ){\beta }_{j}(r)d\varsigma dr\right ] \\ =&\frac{1} \varepsilon \left ({\int }_{{D}_{1}} +{ \int }_{{D}_{2}}\right ){\Phi }^\varepsilon (\varsigma,r){\beta }_{ i}(\varsigma ){\beta }_{j}(r)d\varsigma dr\end{array}.$$

Note that if (ς, r) ∈  D 1, then ς ≥ r and

$$\begin{array}{rl} &P({\alpha }^\varepsilon (\varsigma ) = i,{\alpha }^\varepsilon (r) = j) \\ &\quad = P({\alpha }^\varepsilon (\varsigma ) = i\vert {\alpha }^\varepsilon (r) = j)P({\alpha }^\varepsilon (r) = j)\end{array}.$$

Hence, for (ς,  r) ∈ D 1 we have

$$\begin{array}{ll} {\Phi }^\varepsilon (\varsigma,r) =&[P({\alpha }^\varepsilon (\varsigma ) = i\vert {\alpha }^\varepsilon (r) = j) - {\nu }_{ i}(\varsigma )]P({\alpha }^\varepsilon (r) = j) \\ & + {\nu }_{j}(r)[{\nu }_{i}(\varsigma ) - P({\alpha }^\varepsilon (\varsigma ) = i)]\end{array}.$$

Using Theorem  4.5 and Lemma  5.1, for (ς, r) ∈ D 1,

$$\begin{array}{rl} {\Phi }^\varepsilon (\varsigma,r) =&\left (\varepsilon {\varphi }_{1}^{i}(\varsigma ) + {q}_{ 0,ji}\left (\frac{\varsigma - r} \varepsilon ,r\right ) + \varepsilon {q}_{1,ji}\left (\frac{\varsigma - r} \varepsilon ,r\right ) + O(\varepsilon ^{2})\right ) \\ & \times \left ( {\nu }_{j}(r) + \varepsilon {\varphi }_{1}^{j}(r) + {\psi }_{ 0}^{j}\left (\frac{r} \varepsilon \right ) + \varepsilon {\psi }_{1}^{j}\left (\frac{r} \varepsilon \right ) + O(\varepsilon ^{2})\right ) \\ & - {\nu }_{j}(r)\left (\varepsilon {\varphi }_{1}^{i}(\varsigma ) + {\psi }_{ 0}^{i}\left (\frac{\varsigma } \varepsilon \right ) + \varepsilon {\psi }_{1}^{i}\left (\frac{\varsigma } \varepsilon \right ) + O(\varepsilon ^{2})\right ) \\ =&{\nu }_{j}(r){q}_{0,ji}\left (\frac{\varsigma - r} \varepsilon ,r\right ) +\left [ O\left (\varepsilon \exp \left ( -\frac{{\kappa }_{0}r} \varepsilon \right )\right )\right. \\ & \left.+ O\left (\varepsilon \exp \left ( -\frac{{\kappa }_{0}(\varsigma - r)} \varepsilon \right )\right ) + O\left (\exp \left ( -\frac{{\kappa }_{0}\varsigma } \varepsilon \right )\right ) + O(\varepsilon ^{2})\right ]\end{array}.$$

In the above, \({\varphi }_{1}^{i}\) and \({\psi }_{k}^{i}\), k = 0,1, denote the ith components of the vectors \({\varphi }_{1}\) and \({\psi }_{k}\), respectively. By elementary integration, we have

$$\begin{array}{l} {\int }_{0}^{t}\left ({ \int }_{0}^{\varsigma }\exp \left ( -\frac{{\kappa }_{0}\varsigma } \varepsilon \right )dr\right )d\varsigma ={ \int }_{0}^{t}\varsigma \exp \left ( -\frac{{\kappa }_{0}\varsigma } \varepsilon \right )d\varsigma = O(\varepsilon ^{2}), \\ \varepsilon {\int }_{0}^{t}\left ({ \int }_{0}^{\varsigma }\exp \left ( -\frac{{\kappa }_{0}r} \varepsilon \right )dr\right )d\varsigma = \frac{\varepsilon ^{2}} {{\kappa }_{0}}{ \int }_{0}^{t}\left (1 -\exp \left (-\frac{{\kappa }_{0}\varsigma } \varepsilon \right )\right )d\varsigma = O(\varepsilon ^{2}), \\ \mbox{and} \\ \varepsilon {\int }_{0}^{t}\left ({ \int }_{0}^{\varsigma }\exp \left ( -\frac{{\kappa }_{0}(\varsigma - r)} \varepsilon \right )dr\right )d\varsigma = \varepsilon {\int }_{0}^{t}\left ({ \int }_{0}^{\varsigma }\exp \left ( -\frac{{\kappa }_{0}r} \varepsilon \right )dr\right )d\varsigma = O(\varepsilon ^{2})\end{array}.$$

Thus, it follows that

$$\begin{array}{rl} &{ \int }_{{D}_{1}}{\Phi }^\varepsilon (\varsigma,r){\beta }_{ i}(\varsigma ){\beta }_{j}(r)d\varsigma dr \\ =&{\int }_{0}^{t}\left ({\int }_{0}^{\varsigma }{q}_{ 0,ji}\left (\frac{\varsigma - r} \varepsilon ,r\right ){\nu }_{j}(r){\beta }_{i}(\varsigma ){\beta }_{j}(r)dr\right )d\varsigma + O(\varepsilon ^{2})\end{array}.$$

Exchanging the order of integration leads to

$$\begin{array}{rl} &{\int }_{0}^{t}\left ({\int }_{0}^{\varsigma }{q}_{ 0,ji}\left (\frac{\varsigma - r} \varepsilon ,r\right ){\nu }_{j}(r){\beta }_{i}(\varsigma ){\beta }_{j}(r)dr\right )d\varsigma \\ =&{\int }_{0}^{t}\left ({\int }_{r}^{t}{q}_{ 0,ji}\left (\frac{\varsigma - r} \varepsilon ,r\right ){\nu }_{j}(r){\beta }_{i}(\varsigma ){\beta }_{j}(r)d\varsigma \right )dr \\ =&{\int }_{0}^{t}{\beta }_{ j}(r){\nu }_{j}(r)\left ({\int }_{r}^{t}{q}_{ 0,ji}\left (\frac{\varsigma - r} \varepsilon ,r\right ){\beta }_{i}(\varsigma )d\varsigma \right )dr .\end{array}$$

Making a change of variables (via \(\varsigma - r = \varepsilon s\)) yields

$${\int }_{r}^{t}{q}_{ 0,ji}\left (\frac{\varsigma - r} \varepsilon ,r\right ){\beta }_{i}(\varsigma )d\varsigma = \varepsilon {\int }_{0}^{(t-r)/\varepsilon }{q}_{ 0,ji}(s,r){\beta }_{i}(r + \varepsilon s)ds.$$

We note that β i ( ⋅) is bounded and β i (r + ε s) → β i (r) in L 1 for each r ∈ [0, T], as ε → 0. Since q 0, ji ( ⋅) decays exponentially fast, as in Lemma  5.1, we have

$${\int }_{0}^{(t-r)/\varepsilon }{q}_{ 0,ji}(s,r){\beta }_{i}(r + \varepsilon s)ds \rightarrow {\beta }_{i}(r){\int }_{0}^{\infty }{q}_{ 0,ji}(s,r)ds.$$

Therefore, we obtain

$$\begin{array}{rl} &\lim \limits_{\varepsilon \rightarrow 0 } \frac{1} \varepsilon {\int }_{{D}_{1}}{\Phi }^\varepsilon (\varsigma,r){\beta }_{ i}(\varsigma ){\beta }_{j}(r)d\varsigma dr \\ =&{\int }_{0}^{t}{\beta }_{ i}(r){\beta }_{j}(r){\nu }_{j}(r)\left ({\int }_{0}^{\infty }{q}_{ 0,ji}(s,r)ds\right )dr.\end{array}$$
(5.33)

Similarly, we can show that

$$\begin{array}{rl} &\lim \limits_{\varepsilon \rightarrow 0 } \frac{1} \varepsilon {\int }_{{D}_{2}}{\Phi }^\varepsilon (\varsigma,r){\beta }_{ i}(\varsigma ){\beta }_{j}(r)d\varsigma dr \\ =&{\int }_{0}^{t}{\beta }_{ i}(r){\beta }_{j}(r){\nu }_{i}(r)\left ({\int }_{0}^{\infty }{q}_{ 0,ij}(s,r)ds\right )dr.\end{array}$$
(5.34)

Combining (5.33) and (5.34), we obtain

$$\lim \limits_{\varepsilon \rightarrow 0}E\left [{n}^\varepsilon (t,i){n}^\varepsilon (t,j)\right ] ={ \int }_{0}^{t}{A}_{ ij}(s)ds,$$

with A(t) = ( A ij (t)) given by (5.30). □ 

Step 3. Establish a mixing condition for the sequence { n ε( ⋅)}.

Lemma 5.14

For any ς ≥ 0 and σ{α ε (s) :  s ≥ t + ς}-measurable η with |η|≤ 1,

$${\vert }E(\eta \vert {\alpha }^\varepsilon (s) :\; s \leq t) -E\eta {\vert }\leq K\exp \left( -\frac{\kappa \varsigma } \varepsilon \right )\quad \mbox{ w.p.1.}$$
(5.35)

Remark 5.15

It follows from (5.35) that for any σ{αε(s) :  0 ≤ s ≤ t}-measurable ξ with |ξ|≤ 1 and η given in Lemma  5.14,

$${\vert }E\xi \eta - E\xi E\eta {\vert }\leq K\exp \left ( -\frac{\kappa \varsigma } \varepsilon \right ).$$
(5.36)

We will make crucial use of (5.35) and (5.36) in what follows.

Proof of Lemma  5.14: For any

$$0 \leq {s}_{1} \leq {s}_{2} \leq \cdots \leq {s}_{n} = t \leq t + \varsigma = {t}_{0} \leq {t}_{1} \leq \cdots \leq {t}_{l} < \infty,$$

let

$$\begin{array}{rl} &{E}_{1} =\{ {\alpha }^\varepsilon (t) = i,\;{\alpha }^\varepsilon ({s}_{n-1}) = {i}_{n-1},\;\ldots,{\alpha }^\varepsilon ({s}_{1}) = {i}_{1}\}\mbox{ and } \\ &{E}_{2} =\{ {\alpha }^\varepsilon (t + \varsigma ) = j,\;{\alpha }^\varepsilon ({t}_{1}) = {j}_{1},\;\ldots,{\alpha }^\varepsilon ({t}_{l}) = {j}_{l}\}\end{array}$$

Then in view of the Markovian property of α ε ( ⋅),

$$\begin{array}{rl} &P({E}_{2}\vert {E}_{1}) = P({E}_{2}\vert {\alpha }^\varepsilon (t) = i) \\ & = P({\alpha }^\varepsilon (t + \varsigma ) = j\vert {\alpha }^\varepsilon (t) = i)[{p}_{ j,{j}_{1}}^\varepsilon ({t}_{ 1},t + \varsigma )\cdots {p}_{{j}_{l-1},{j}_{l}}^\varepsilon ({t}_{ l},{t}_{l-1})]\end{array}.$$

Similarly, we have

$$P({E}_{2}) = P({\alpha }^\varepsilon (t + \varsigma ) = j)[{p}_{ j,{j}_{1}}^\varepsilon ({t}_{ 1},t + \varsigma )\cdots {p}_{{j}_{l-1},{j}_{l}}^\varepsilon ({t}_{ l},{t}_{l-1})].$$

We first show that

$${\vert }P({E}_{2}\vert {E}_{1}) - P({E}_{2}){\vert }\leq K\exp \left ( -\frac{\kappa \varsigma } \varepsilon \right ),$$
(5.37)

for some positive constants K and κ that are independent of \(i,j \in \mathcal{M}\) and t ∈ [0,  T].

To verify (5.37), it suffices to show that for any \(k \in \mathcal{M}\),

$${\vert }{p}_{ij}^\varepsilon (t + \varsigma,t) - {p}_{kj}^\varepsilon (t + \varsigma,t){\vert }\leq K\exp \left (-\frac{2\kappa \varsigma } \varepsilon \right ).$$
(5.38)

Since P 0 ( ⋅) and P 1( ⋅) have identical rows, the asymptotic expansion in (5.3) implies that \({p}_{ij}^\varepsilon (t + \varsigma,t) - {p}_{kj}^\varepsilon (t + \varsigma,t)\) is determined by Q 0 (ς ∕ ε,  t). By virtue of the asymptotic expansion (see Theorem  4.5 and Lemma  5.1), there exist a K 1  > 0 and a κ 0  > 0 such that

$$\left\vert {Q}_{0}\left(\frac{(t +\widetilde{ t}) - t} \varepsilon ,t\right)\right \vert \leq {K}_{1}\exp \left(-\frac{{\kappa }_{0}\widetilde{t}} \varepsilon \right),\mbox{ for all }\widetilde{t} \geq 0.$$

Choose N > 0 sufficiently large that K 1 exp( − κ 0 N) < 1. Then for ε > 0 sufficiently small, there is a 0 < ρ < 1 such that

$$\left\vert{p}_{ij}^\varepsilon (t + N\varepsilon,t) - {p}_{ kj}^\varepsilon (t + N\varepsilon,t)\right\vert \leq \rho.$$

To proceed, subdivide \([t + N\varepsilon,t + \varsigma ]\) into intervals of length Nε.

In view of the Chapman–Kolmogorov equation,

$$\begin{array}{rl} &\vert {p}_{ij }^\varepsilon (t + 2N\varepsilon,t) - {p}_{ kj}^\varepsilon (t + 2N\varepsilon,t)\vert \\ =&\left |{\sum }_{{l}_{0}=1}^{m}[{p}_{ i{l}_{0}}^\varepsilon (t + N\varepsilon,t) - {p}_{ k{l}_{0}}^\varepsilon (t + N\varepsilon,t)]{p}_{{ l}_{0}j}^\varepsilon (t + 2N\varepsilon,t + N\varepsilon )\right | \\ =&\left |{\sum }_{{l}_{0}=1}^{m}[{p}_{ i{l}_{0}}^\varepsilon (t + N\varepsilon,t) - {p}_{ k{l}_{0}}^\varepsilon (t + N\varepsilon,t)] \right.\\ & \times [{p}_{{l}_{0}j}^\varepsilon (t + 2N\varepsilon,t + N\varepsilon ) - {p}_{{ l}_{1}j}^\varepsilon (t + 2N\varepsilon,t + N\varepsilon )] | \leq K{\rho }^{2},\end{array}$$

for any \({l}_{1} \in \mathcal{M}\) . Iterating on the inequality above, we arrive at

$$\left\vert{p}_{ij}^\varepsilon (t + {k}_{ 0}N\varepsilon,t) - {p}_{kj}^\varepsilon (t + {k}_{ 0}N\varepsilon,t)\right\vert\leq K{\rho }^{{k}_{0} },\;\mbox{ for }{k}_{0} \geq 1.$$

Choose \(\kappa = -1/(2N)\log \rho \) , and note that κ > 0. Then for any ς satisfying k 0 Nε ≤ ς < ( k 0 + 1)Nε,

$$\left\vert{p}_{ij}^\varepsilon (t + \varsigma,t) - {p}_{ kj}^\varepsilon (t + \varsigma,t)\right\vert\leq K\exp \left( -\frac{2\kappa \varsigma } \varepsilon \right).$$

Thus (5.37) holds. This implies that αε( ⋅) is a mixing process with exponential mixing rate. By virtue of Lemma  A.16, (5.35) holds. □ 

Step 4. Prove that the sequence n ε( ⋅) is tight, and any weakly convergent subsequence of {n ε( ⋅)} has continuous paths with probability 1.

Lemma 5.16

The following assertions hold:

  1. (a)

    {n ε(t);   t ∈ [0, T]} is tight in \(D([0,T]; {\mathbb{R}}^{m})\), where \(D([0,T]; {\mathbb{R}}^{m})\) denotes the space of functions that are defined on [0, T] and that are right continuous with left limits.

  2. (b)

    The limit n( ⋅) of any weakly convergent subsequence of n ε( ⋅) has continuous sample paths with probability 1.

Proof: For \(i \in \mathcal{M}\) , define

$$\widetilde{{n}}^\varepsilon (t,i) ={ 1 \over \sqrt\varepsilon } {\int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - P({\alpha }^\varepsilon (s) = i)\right ){\beta }_{ i}(s)ds.$$

By virtue of Theorem  4.5,

$$\frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}\left (P({\alpha }^\varepsilon (s) = i) - {\nu }_{ i}(s)\right ){\beta }_{i}(s)ds = O(\sqrt\varepsilon ).$$

Thus \({n}^\varepsilon (t,i) =\widetilde{ {n}}^\varepsilon (t,i) + O(\sqrt\varepsilon )\) , and as a result the tightness of { n ε( ⋅)} will follow from the tightness of \(\{\widetilde{{n}}^\varepsilon (\cdot )\}\) (see Kushner [139, Lemma 5, p. 50]).

For the tightness of \(\{\widetilde{{n}}^\varepsilon (\cdot )\}\), in view of Kushner [139, Theorem 5, p. 32], it suffices to show that

$$E\vert \widetilde{{n}}^\varepsilon (t + \varsigma ) -\widetilde{ {n}}^\varepsilon (t){\vert }^{4} \leq K{\varsigma }^{2}.$$
(5.39)

To verify this assertion, it is enough to prove that for each \(i \in \mathcal{M}\), \(\widetilde{{n}}^\varepsilon (\cdot,i)\) satisfies the condition.

Fix \(i \in \mathcal{M}\) and for any 0 ≤ t ≤  T, let

$$\theta (t) = \left ({I}_{\{{\alpha }^\varepsilon (t)=i\}} - P({\alpha }^\varepsilon (t) = i)\right ){\beta }_{ i}(t).$$

We have suppressed the i and ε dependence in θ( t) for ease of presentation. Let \(D =\{ ({s}_{1},{s}_{2},{s}_{3},{s}_{4}) : t \leq {s}_{i} \leq t + \varsigma,i = 1,2,3,4\}\). It follows that

$$\begin{array}{l} E\vert \widetilde{{n}}^\varepsilon (t + \varsigma,i) -\widetilde{ {n}}^\varepsilon (t,i){\vert }^{4} \\ \leq { 1 \over \varepsilon ^{2}} { \int }_{D}\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4})\vert d{s}_{1}d{s}_{2}d{s}_{3}d{s}_{4}.\end{array}$$
(5.40)

Let (i 1 ,  i 2, i 3 ,  i 4) denote a permutation of (1, 2, 3, 4) and

$${D}_{{i}_{1}{i}_{2}{i}_{3}{i}_{4}} =\{ ({s}_{1},{s}_{2},{s}_{3},{s}_{4}) :\; t \leq {s}_{{i}_{1}} \leq {s}_{{i}_{2}} \leq {s}_{{i}_{3}} \leq {s}_{{i}_{4}} \leq t + \varsigma \}.$$

Then it is easy to see that \(D = \cup {D}_{{i}_{1}{i}_{2}{i}_{3}{i}_{4}}\). This and (5.40) lead to

$$\begin{array}{rl} &E\vert \widetilde{{n}}^\varepsilon (t + \varsigma,i) -\widetilde{ {n}}^\varepsilon (t,i){\vert }^{4} \\ & \leq \frac{K} {\varepsilon ^{2}}{ \int }_{{D}_{0}}\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4})\vert d{s}_{1}d{s}_{2}d{s}_{3}d{s}_{4}, \end{array}$$

where D 0  =  D 1234.

Note that

$$\begin{array}{rl} &\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4})\vert \\ &\leq \vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4}) - E\theta ({s}_{1})\theta ({s}_{2})E\theta ({s}_{3})\theta ({s}_{4})\vert \\ &\qquad + \vert E\theta ({s}_{1})\theta ({s}_{2})\vert \vert E\theta ({s}_{3})\theta ({s}_{4})\vert.\end{array}$$
(5.41)

By virtue of (5.36) and Eθ( t) = 0, t ≥ 0,

$$\begin{array}{rl} \vert E\theta ({s}_{1})\theta ({s}_{2})\vert & = \vert E\theta ({s}_{1})\theta ({s}_{2}) - E\theta ({s}_{1})E\theta ({s}_{2})\vert \\ & \leq K\exp \left (-\frac{\kappa ({s}_{2} - {s}_{1})} \varepsilon \right ).\end{array}$$

Similarly, we have

$$\begin{array}{rl} \vert E\theta ({s}_{3})\theta ({s}_{4})\vert & = \vert E\theta ({s}_{3})\theta ({s}_{4}) - E\theta ({s}_{3})E\theta ({s}_{4})\vert \\ &\leq K\exp \left (-\frac{\kappa ({s}_{4} - {s}_{3})} \varepsilon \right )\end{array}$$

Therefore, it follows that

$${ K \over \varepsilon ^{2}} { \int }_{{D}_{0}}\vert E\theta ({s}_{1})\theta ({s}_{2})\vert \cdot \vert E\theta ({s}_{3})\theta ({s}_{4})\vert d{s}_{1}d{s}_{2}d{s}_{3}d{s}_{4} \leq K{\varsigma }^{2}.$$
(5.42)

The elementary inequality \({(a + b)}^{1/2} \leq {a}^{1/2} + {b}^{1/2}\) for nonnegative numbers a and b yields that

$$\begin{array}{rl} &\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4}) - E\theta ({s}_{1})\theta ({s}_{2})E\theta ({s}_{3})\theta ({s}_{4})\vert \\ &\quad ={ \left (\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4}) - E\theta ({s}_{1})\theta ({s}_{2})E\theta ({s}_{3})\theta ({s}_{4}){\vert }^{\frac{1} {2} }\right )}^{2} \\ & \quad \leq \vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4}) - E\theta ({s}_{1})\theta ({s}_{2})E\theta ({s}_{3})\theta ({s}_{4}){\vert }^{\frac{1} {2} } \\ & \qquad \times \left (\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4}){\vert }^{\frac{1} {2} } + \vert E\theta ({s}_{1})\theta ({s}_{2})E\theta ({s}_{3})\theta ({s}_{4}){\vert }^{\frac{1} {2} }\right ).\end{array}$$

In view of (5.36), we obtain

$$\begin{array}{rl} &\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4}) - E\theta ({s}_{1})\theta ({s}_{2})E\theta ({s}_{3})\theta ({s}_{4}){\vert }^{\frac{1} {2} } \\ & \quad \leq K\exp \left (-\frac{\kappa ({s}_{3} - {s}_{2})} {2\varepsilon } \right ).\end{array}$$

Similarly, by virtue of (5.35) and the boundedness of θ(s),

$$\begin{array}{rl} &\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4}){\vert }^{\frac{1} {2} } \\ & \quad = \vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})(E(\theta ({s}_{4})\vert {\alpha }^\varepsilon (s) :\; s \leq {s}_{3}) - E\theta ({s}_{4})){\vert }^{\frac{1} {2} } \\ & \quad \leq K\exp \left (-\frac{\kappa ({s}_{4} - {s}_{3})} {2\varepsilon } \right ),\end{array}$$

and

$$\begin{array}{rl} &\vert E\theta ({s}_{1})\theta ({s}_{2})E\theta ({s}_{3})\theta ({s}_{4}){\vert }^{\frac{1} {2} } \\ & \quad = \vert (E\theta ({s}_{1})\theta ({s}_{2}) - E\theta ({s}_{1})E\theta ({s}_{2}))(E\theta ({s}_{3})\theta ({s}_{4}) - E\theta ({s}_{3})E\theta ({s}_{4})){\vert }^{\frac{1} {2} } \\ & \quad \leq K\exp \left (-\frac{\kappa ({s}_{2} - {s}_{1})} {2\varepsilon } \right )\exp \left (-\frac{\kappa ({s}_{4} - {s}_{3})} {2\varepsilon } \right )\end{array}$$

By virtue of the estimates above, we arrive at

$$\begin{array}{ll} &{ K \over \varepsilon ^{2}} { \int }_{{D}_{0}}\vert E\theta ({s}_{1})\theta ({s}_{2})\theta ({s}_{3})\theta ({s}_{4}) \\ &\qquad - E\theta ({s}_{1})\theta ({s}_{2})E\theta ({s}_{3})\theta ({s}_{4})\vert d{s}_{1}d{s}_{2}d{s}_{3}d{s}_{4} \leq K{\varsigma }^{2}.\end{array}$$
(5.43)

The estimate (5.39) then follows from (5.42) and (5.43), and so does the desired tightness of {n ε ( ⋅)}.

Since {n ε( ⋅)} is tight, by Prohorov’s theorem, we extract a convergent subsequence, and for notational simplicity, we still denote the sequence by {n ε ( ⋅)} whose limit is n( ⋅). By virtue of Kushner [139, Theorem 5, p. 32] or Ethier and Kurtz [59, Proposition 10.3, p. 149], n( ⋅) has continuous paths with probability 1. □ 

Remark 5.17

Step 4 implies that both nε(⋅) and n(⋅) have continuous sample paths with probability 1. It follows, in view of Prohorov’s theorem (see Billingsley [13]), that nε(⋅) is tight in \(C([0,T]; {\mathbb{R}}^{m})\).

Step 5. Show that the finite-dimensional distributions of n ε( ⋅) converge to that of a Gaussian process with independent increments.

This part of the proof is similar to Khasminskii [112] (see also Freidlin and Wentzell [67, p. 224]). Use ι to denote the imaginary unit, \({\iota }^{2} = -1\). To prove the convergence of the finite-dimensional distributions, we use the characteristic function Eexp(ι⟨ z, n ε(t)⟩), where \(z \in {\mathbb{R}}^{m}\) and ⟨ ⋅, ⋅⟩ denotes the usual inner product in \({\mathbb{R}}^{m}\). Owing to the mixing property and repeated applications of Remark  5.15, for arbitrary nonnegative real numbers s l and t l satisfying

$$0 \leq {s}_{0} \leq {t}_{0} \leq {s}_{1} \leq {t}_{1} \leq {s}_{2} \leq \cdots \leq {s}_{n} \leq {t}_{n},$$

we have

$$\begin{array}{rl} &|E\exp \left (\iota \sum \limits_{l=0}^{n}\langle {z}_{ l},({n}^\varepsilon ({t}_{ l}) - {n}^\varepsilon ({s}_{ l}))\rangle \right ) \\ & -{\prod }_{l=0}^{n}E\exp \left (\iota \langle {z}_{ l},({n}^\varepsilon ({t}_{ l}) - {n}^\varepsilon ({s}_{ l}))\rangle \right )| \rightarrow 0\end{array}$$

as ε → 0, for \({z}_{l} \in {\mathbb{R}}^{m}\). This, in turn, implies that the limit process n( ⋅) has independent increments. Moreover, in view of Lemma  5.16, the limit process has continuous paths with probability 1. In accordance with a result in Skorohod [197, p. 7], if a process with independent increments has continuous paths w.p.1, then it must necessarily be a Gaussian process. This implies that the finite-dimensional distributions of the limit n( ⋅) are Gaussian.

Consequently, n( ⋅) is a process having Gaussian finite-dimensional distributions, with mean zero and covariance ∫0 t A( s)ds given by Lemma  5.13. Moreover, the limit does not depend on the chosen subsequence. Thus n ε( ⋅) converges weakly to the Gaussian process n( ⋅). This completes the proof of the theorem. □ 

To illustrate, we give an example in which the covariance function of the limit process can be calculated explicitly.

Example 5.18

Let \({\alpha }^\varepsilon (t) \in \mathcal{M} =\{ 1,2\}\) be a two-state Markov chain with a generator

$$Q(t) = \left (\begin{array}{*{10}c} -{\mu }_{1}(t)& {\mu }_{1}(t) \\ {\mu }_{2}(t) &-{\mu }_{2}(t)\\ \end{array} \right )$$

where μ1(t) ≥ 0, μ2(t) ≥ 0, and μ1(t) + μ2(t) > 0 for each t ∈ [0,T]. Moreover, μ1(⋅) and μ2(⋅) are twice continuously differentiable with Lipschitz continuous second derivatives. It is easy to see that assumptions (A5.1) and (A5.2) are satisfied. Therefore the desired asymptotic normality follows.

In this example,

$$\nu (t) = ({\nu }_{1}(t),{\nu }_{2}(t)) = \left ( \frac{{\mu }_{2}(t)} {{\mu }_{1}(t) + {\mu }_{2}(t)}, \frac{{\mu }_{1}(t)} {{\mu }_{1}(t) + {\mu }_{2}(t)}\right ).$$

Moreover,

$${Q}_{0}(s,{t}_{0}) = -\frac{\exp (-({\mu }_{1}({t}_{0}) + {\mu }_{2}({t}_{0}))s)} {{\mu }_{1}({t}_{0}) + {\mu }_{2}({t}_{0})} Q({t}_{0}).$$

Thus,

$$A(t) = \frac{2{\mu }_{1}(t){\mu }_{2}(t)} {{({\mu }_{1}(t) + {\mu }_{2}(t))}^{3}}\left (\begin{array}{cc} {({\beta }_{1}(t))}^{2} & - {\beta }_{1}(t){\beta }_{2}(t) \\ - {\beta }_{1}(t){\beta }_{2}(t)& {({\beta }_{2}(t))}^{2} \end{array} \right ).$$
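
To make the example concrete, here is a minimal numerical sketch (with hypothetical constant rates μ 1 , μ 2 and weights β 1 , β 2 ; any admissible choice would do) that evaluates the integral formula for A ij (t) directly from Q 0 (s,t) and compares it with the closed form above.

```python
import numpy as np

# Sanity check of Example 5.18 at a fixed time t, with hypothetical constant
# data mu1, mu2 (jump rates) and b1, b2 (bounded weights beta_1, beta_2).
mu1, mu2 = 1.5, 0.7
b1, b2 = 0.8, -1.2

Q = np.array([[-mu1,  mu1],
              [ mu2, -mu2]])
nu = np.array([mu2, mu1]) / (mu1 + mu2)      # quasi-stationary distribution
beta = np.array([b1, b2])

# Q_0(s,t) = -exp(-(mu1+mu2)s) Q / (mu1+mu2); integrate each entry over s >= 0
# with a plain Riemann sum (the integrand decays exponentially).
ds = 1e-4
s = np.arange(0.0, 40.0 / (mu1 + mu2), ds)
int_exp = np.exp(-(mu1 + mu2) * s).sum() * ds
int_Q0 = -Q * int_exp / (mu1 + mu2)          # approx. of int_0^infty q_{0,ij}(s,t) ds

# A_ij(t) = beta_i beta_j [ nu_i int q_{0,ij} + nu_j int q_{0,ji} ], cf. (5.30)
A_formula = np.outer(beta, beta) * (nu[:, None] * int_Q0 + nu[None, :] * int_Q0.T)

# Closed form given in Example 5.18
c = 2.0 * mu1 * mu2 / (mu1 + mu2) ** 3
A_closed = c * np.array([[ b1 * b1, -b1 * b2],
                         [-b1 * b2,  b2 * b2]])

print(np.allclose(A_formula, A_closed, atol=1e-3))   # expected: True
```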

2.5 Extensions

In this section, we generalize the results of the previous sections, including asymptotic expansions, asymptotic normality, and exponential bounds, to the Markov chain αε ( ⋅) with generator given by \({Q}^\varepsilon (t) = Q(t)/\varepsilon +\widehat{ Q}(t)\) with weakly irreducible Q(t). Recall that the vector of probabilities \({p}^\varepsilon (t) = (P({\alpha }^\varepsilon (t) = 1),\ldots,P({\alpha }^\varepsilon (t) = m))\) satisfies the differential equation

$$\begin{array}{l} \frac{d{p}^\varepsilon (t)} {dt} = {p}^\varepsilon (t){Q}^\varepsilon (t),\;{p}^\varepsilon (t) \in {\mathbb{R}}^{m}, \\ {p}^\varepsilon (0) = {p}^{0}\mbox{ with }{p}_{ i}^{0} \geq 0\mbox{ for }i \in \mathcal{M}\mbox{ and }{\sum }_{i=1}^{m}{p}_{ i}^{0} = 1. \end{array}$$
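
As a quick illustration (a sketch only, with hypothetical constant generators Q and \(\widehat{Q}\) that are not taken from the text), the forward equation can be integrated numerically; after an initial layer of order O(ε), the probability vector stays within O(ε) of the quasi-stationary distribution of Q.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch: solve dp/dt = p Q^eps(t) for a hypothetical 3-state chain with
# Q^eps = Q/eps + Qhat (constant generators chosen for illustration only).
eps = 0.01
Q = np.array([[-2.0,  1.0,  1.0],
              [ 1.0, -3.0,  2.0],
              [ 2.0,  1.0, -3.0]])        # weakly irreducible fast part
Qhat = np.array([[-0.5,  0.5,  0.0],
                 [ 0.0, -0.3,  0.3],
                 [ 0.2,  0.0, -0.2]])     # slow perturbation

def rhs(t, p):
    # row vector p times the generator Q^eps(t)
    return p @ (Q / eps + Qhat)

p0 = np.array([1.0, 0.0, 0.0])            # start in state 1
sol = solve_ivp(rhs, (0.0, 1.0), p0, max_step=eps / 5, rtol=1e-8, atol=1e-10)

nu = np.array([7.0, 4.0, 5.0]) / 16.0     # solves nu Q = 0 with entries summing to 1
print(sol.y[:, -1])                       # nonnegative entries summing to 1
print(np.abs(sol.y[:, -1] - nu).max())    # small, of order eps
```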

To proceed, the following conditions are needed.

  1. (A5.3)

    Both Q(t) and \(\widehat{Q}(t)\) are generators. For each t ∈ [0, T], Q(t) is weakly irreducible.

  2. (A5.4)

    For some positive integer n 0, Q( ⋅) is ( n 0 + 1)-times continuously differentiable on [0, T] and \(({d}^{{n}_{0}+1}/d{t}^{{n}_{0}+1})Q(\cdot )\) is Lipschitz. Moreover, \(\widehat{Q}(\cdot )\) is n 0-times continuously differentiable on [0, T] and \(({d}^{{n}_{0}}/d{t}^{{n}_{0}})\widehat{Q}(\cdot )\) is Lipschitz.

Similarly to Section 4.2, for \(k = 1,\ldots,{n}_{0} + 1\), the outer expansions lead to the equations

$$\begin{array}{l} \varepsilon ^{0} :\; {\varphi }_{ 0}(t)Q(t) = 0, \\ \varepsilon ^{1} :\; {\varphi }_{ 1}(t)Q(t) + {\varphi }_{0}(t)\widehat{Q}(t) ={ d{\varphi }_{0}(t) \over dt}, \\ \ \qquad \ \cdots \\ \varepsilon ^{k} :\; {\varphi }_{ k}(t)Q(t) + {\varphi }_{k-1}(t)\widehat{Q}(t) ={ d{\varphi }_{k-1}(t) \over dt},\end{array}$$
(5.44)

with constraints

$${\sum }_{i=1}^{m}{\varphi }_{ 0,i}(t) = 1$$

and

$${\sum }_{i=1}^{m}{\varphi }_{ k,i}(t) = 0,\mbox{ for }k \geq 1.$$
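
For instance, the zeroth-order equation in (5.44), combined with the first constraint and the weak irreducibility of Q(t) in (A5.3), already determines the leading term uniquely:

$${\varphi }_{0}(t)Q(t) = 0,\quad {\sum }_{i=1}^{m}{\varphi }_{0,i}(t) = 1\;\Longrightarrow \;{\varphi }_{0}(t) = \nu (t),$$

the quasi-stationary distribution of Q(t).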

The initial-layer correction terms are

$$\begin{array}{l} \varepsilon ^{0} :\ { d{\psi }_{0}(\tau ) \over d\tau } = {\psi }_{0}(\tau )Q(0), \\ \varepsilon ^{1} :\ { d{\psi }_{1}(\tau ) \over d\tau } = {\psi }_{1}(\tau )Q(0) + {\psi }_{0}(\tau )\left (\tau { dQ(0) \over dt} +\widehat{ Q}(0)\right ), \\ \ \qquad \ \cdots \\ \varepsilon ^{k} :\ { d{\psi }_{k}(\tau ) \over d\tau } = {\psi }_{k}(\tau )Q(0) + {r}_{k}(\tau ),\end{array}$$
(5.45)

where

$${r}_{k}(\tau ) = \sum \limits_{i=1}^{k}{\psi }_{ k-i}(\tau )\left ({ {\tau }^{i} \over i!} { {d}^{i}Q(0) \over d{t}^{i}} +{ {\tau }^{i-1} \over (i - 1)!} { {d}^{i-1}\widehat{Q}(0) \over d{t}^{i-1}} \right ),$$

with initial conditions

$$\begin{array}{rl} &{\psi }_{0}(0) = {p}^{0} - {\varphi }_{ 0}(0),\mbox{ and } \\ &{\psi }_{k}(0) = -{\varphi }_{k}(0)\mbox{ for }k \geq 1.\end{array}$$

Theorem 5.19

Suppose that (A5.3) and (A5.4) are satisfied. Then

  1. (a)

    φ i ( ⋅) is \(({n}_{0} + 1 - i)\) -times continuously differentiable on [0,  T],

  2. (b)

    for each i, there is a \(\widehat{\kappa } > 0\) such that

    $${\left |{\psi }_{i}\left ({ t \over \varepsilon } \right )\right |} \leq K\exp \left (-\frac{\widehat{\kappa }t} \varepsilon \right ),\,\mbox{ and}$$
  3. (c)

    the approximation error satisfies

    $${ \sup }_{t\in [0,T]}{\left |{p}^\varepsilon (t) -{\sum }_{i=0}^{{n}_{0} }\varepsilon ^{i}{\varphi }_{ i}(t) -{\sum }_{i=0}^{{n}_{0} }\varepsilon ^{i}{\psi }_{ i}\left ({ t \over \varepsilon } \right )\right |} \leq K\varepsilon ^{{n}_{0}+1}.$$
    (5.46)

The proof of this theorem is similar to that of Theorem  4.5, and is thus omitted. We also omit the proofs of the following two theorems because they are similar to those of Theorems  5.4 and  5.9, respectively.

Theorem 5.20

Suppose (A5.3) and (A5.4) are satisfied with n 0 = 0. Then there exist positive constants ε 0 and K such that for 0 < ε ≤ ε 0,\(i \in \mathcal{M}\) , and for any deterministic process β i (⋅) satisfying |β i (t)|≤ 1 for all t ≥ 0, we have

$$E\exp \left \{{ \frac{{\theta }_{T}} {{(T + 1)}^{\frac{3} {2} }} \sup }_{0\leq t\leq T}\left \vert {n}^\varepsilon (t)\right \vert \right \}\leq K,$$

where θ T and n ε (⋅) are as defined previously.

Corollary 5.21

Consider \({Q}^\varepsilon = Q/\varepsilon +\widehat{ Q}\) with constant generators Q and \(\widehat{Q}\) such that Q is weakly irreducible. Then (5.25) and (5.27) hold with constants K and K j independent of T.

Theorem 5.22

Suppose (A5.3) and (A5.4) are satisfied with n 0 = 1. Then for t ∈ [0,T], the process n ε (⋅) converges weakly to a Gaussian process n(⋅) such that

$$En(t) = 0\mbox{ and }E[n^{\prime}(t)n(t)] ={ \int }_{0}^{t}A(s)ds,$$

where A(t) = (A ij (t)) with

$${A}_{ij}(t) = {\beta }_{i}(t){\beta }_{j}(t)\left [{\nu }_{i}(t){\int }_{0}^{\infty }{q}_{ 0,ij}(r,t)dr + {\nu }_{j}(t){\int }_{0}^{\infty }{q}_{ 0,ji}(r,t)dr\right ],$$

and Q 0 (r,t) = (q 0,ij (r,t)) satisfying

$$\begin{array}{rl} &\frac{d{Q}_{0}(r,t)} {dr} = {Q}_{0}(r,t)Q(t),\;r \geq 0, \\ &{Q}_{0}(0,t) = I - {P}_{0}(t), \end{array}$$

with P 0 (t) = (ν′(t),…,ν′(t))′.

Remark 5.23

In view of Theorem  5.22 , the asymptotic covariance is determined by the quasi-stationary distribution ν(t) and Q 0 (r,t). Both ν(t) and Q 0 (r,t) are determined by Q(t), the dominating term in Q ε (t). In the asymptotic normality analysis, it is essential to have the irreducibility condition of Q(t), whereas the role of \(\widehat{Q}(t)\) is not as important. If Q(t) is weakly irreducible, then there exists an ε 0 > 0 such that \({Q}^\varepsilon (t) = Q(t)/\varepsilon +\widehat{ Q}(t)\) is weakly irreducible for 0 < ε ≤ ε 0, as shown in Sethi and Zhang [192, Lemma J.10].

By introducing another generator \(\widehat{Q}(t)\) , we are dealing with a singularly perturbed Markovian system with fast and slow motions. Nevertheless, the entire system under consideration is still weakly irreducible. This irreducibility allows us to extend our previous results with minor modifications.

Although most of the results in this section can be extended to the case with \({Q}^\varepsilon (t) = Q(t)/\varepsilon +\widehat{ Q}(t)\) , there are some exceptions. For example, Corollary  5.6 would not go through because even with constant matrix \(\widehat{Q}(t) =\widehat{ Q}\) , φ 1 (t) in Lemma  5.1 does not equal 0 when \(\widehat{Q}\not =0\).

One may wonder what happens if Q(t) in Q ε (t) is not weakly irreducible. In particular, one can consider the case in which Q(t) consists of several blocks of irreducible submatrices. Related results of asymptotic normality and the exponential bounds are treated in subsequent sections.

3 Markov Chains with Weak and Strong Interactions

For brevity, unless otherwise noted, in the rest of the book, whenever the phrase “weak and strong interaction” is used, it refers to the case of two-time-scale Markov chains with all states being recurrent. Similar approaches can be used for the other cases as well. The remainder of the chapter concentrates on exploiting detailed structures of the weak and strong interactions. In addition, it deals with convergence of the probability distribution with merely measurable generators.

We continue our investigation of asymptotic properties of the Markov chain αε ( ⋅) generated by Q ε( ⋅), with

$${Q}^\varepsilon (t) = \frac{1} \varepsilon \widetilde{Q}(t) +\widehat{ Q}(t),\mbox{ for }t \geq 0,$$
(5.47)

where \(\widetilde{Q}(t) = \mathrm{diag}(\widetilde{{Q}}^{1}(t),\ldots,\widetilde{{Q}}^{l}(t))\) is a block-diagonal matrix such that \(\widehat{Q}(t)\) and \(\widetilde{{Q}}^{k}(t)\), for k = 1, …, l, are themselves generators. The state space of α ε ( ⋅) is given by

$$\mathcal{M} = \{{s}_{11},\ldots,{s}_{1{m}_{1}},\ldots,{s}_{l1},\ldots,{s}_{l{m}_{l}}\}.$$

For each k = 1, …,  l, let \({\mathcal{M}}_{k} =\{ {s}_{k1},\ldots,{s}_{k{m}_{k}}\}\), representing the group of states corresponding to \(\widetilde{{Q}}^{k}(t)\).

The results in Section 5.3.1 reveal the structures of the Markov chains with weak and strong interactions based on the following observations. Intuitively, for small ε, the Markov chain αε( ⋅) jumps more frequently within the states in \({\mathcal{M}}_{k}\) and less frequently from \({\mathcal{M}}_{k}\) to \({\mathcal{M}}_{j}\) for j ≠  k. Therefore, the states in \({\mathcal{M}}_{k}\) can be aggregated and represented by a single state k (one may view the state k as a super state). That is, one can approximate αε( ⋅) by an aggregated process, say, \({\overline{\alpha }}^\varepsilon (\cdot )\). Furthermore, by examining the tightness and finite-dimensional distribution of \({\overline{\alpha }}^\varepsilon (\cdot )\), it will be shown that \({\overline{\alpha }}^\varepsilon (\cdot )\) converges weakly to a Markov chain \(\overline{\alpha }(\cdot )\) generated by

$$\overline{Q}(t) = \mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t))\widehat{Q}(t)\mathrm{diag}(\mathrm{1}{\mathrm{l}}_{{ m}_{1}},\ldots,\mathrm{1}{\mathrm{l}}_{{m}_{l}}).$$
(5.48)
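
The following minimal sketch illustrates formula (5.48) with hypothetical data (two groups of two states and an arbitrarily chosen slow generator \(\widehat{Q}\)); the resulting \(\overline{Q}\) is again a generator, since its rows sum to zero.

```python
import numpy as np

# Sketch of (5.48): Qbar = diag(nu^1, nu^2) Qhat diag(1l_2, 1l_2), with
# hypothetical quasi-stationary rows nu^1, nu^2 and slow generator Qhat.
nu1 = np.array([2.0, 1.0]) / 3.0
nu2 = np.array([1.0, 3.0]) / 4.0
Qhat = np.array([[-0.4,  0.0,  0.4,  0.0],
                 [ 0.0, -0.4,  0.0,  0.4],
                 [ 0.3,  0.0, -0.3,  0.0],
                 [ 0.0,  0.3,  0.0, -0.3]])

diag_nu = np.zeros((2, 4))        # l x m block-diagonal matrix of row vectors
diag_nu[0, :2] = nu1
diag_nu[1, 2:] = nu2

ones_blk = np.zeros((4, 2))       # m x l block-diagonal matrix of ones columns
ones_blk[:2, 0] = 1.0
ones_blk[2:, 1] = 1.0

Qbar = diag_nu @ Qhat @ ones_blk  # the l x l generator of the limit chain
print(Qbar)
print(Qbar.sum(axis=1))           # each row sums to zero
```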

Section 5.3.2 continues the investigation along the line of estimating the error bounds of the approximation. Our interest lies in finding how closely one can approximate an unscaled sequence of occupation measures. The study is through the examination of appropriate exponential-type bounds. To obtain a suitably scaled sequence, one first centers the sequence around the “mean,” and then compares the actual sequence of occupation measures with this “mean.” In contrast to the results of Section 5.2, in lieu of taking the difference of the occupation measure with a deterministic function, it is compared with a random process. One of the key points here is the utilization of solutions of linear time-varying stochastic differential equations, in which the stochastic integration is with respect to a square-integrable martingale.

In comparison with the central limit theorem obtained in Section 5.2, it is interesting to know whether these results still hold under the structure of weak and strong interactions. The answer to this question is in Section 5.3.3, which also contains further study on related scaled sequences of occupation measures. The approach is quite different from that of Section 5.2. We use the martingale formulation and apply the techniques of perturbed test functions. It is interesting to note that the limit process is a switching diffusion process, which does not have independent increments. When the generator is weakly irreducible as in Section 5.2, the motion of jumping around the grouped states disappears and the diffusion becomes the dominant force.

We have considered only Markov chains with smooth generators up to now. However, there are cases in certain applications in which the generators may be merely measurable. Section 5.4 takes care of the scenario in which the Markov chains are governed by generators that are only measurable. Formulation via weak derivatives is also discussed briefly. Finally the chapter is concluded with a few more remarks. Among other things, additional references are given.

3.1 Aggregation of Markov Chains

This section deals with an aggregation of αε( ⋅). The following assumptions will be needed:

  1. (A5.5)

    For each k = 1, …, l and t ∈ [0, T], \(\widetilde{{Q}}^{k}(t)\) is weakly irreducible.

  2. (A5.6)

    \(\widetilde{Q}(\cdot )\) is differentiable on [0,  T] and its derivative is Lipschitz. Moreover, \(\widehat{Q}(\cdot )\) is also Lipschitz.

The assumptions above guarantee the existence of an asymptotic expansion up to zeroth order. To prepare for the subsequent study, we first provide the following error estimate. Since only the zeroth-order expansion is needed here, the estimate is confined to such an approximation. Higher-order terms can be obtained in a similar way.

Lemma 5.24

Assume (A5.5) and (A5.6) . Let P ε (t,t 0 ) denote the transition probability of α ε (⋅). Then for some κ 0 > 0,

$${P}^\varepsilon (t,{t}_{ 0}) = {P}_{0}(t,{t}_{0}) + O\left (\varepsilon +\exp \left (-\frac{{\kappa }_{0}(t - {t}_{0})} \varepsilon \right )\right ),$$

where

$$\begin{array}{ll} {P}_{0}(t,{t}_{0})& =\widetilde{ \mathrm{1}\mathrm{l}}\Theta (t,{t}_{0})\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t)) \\ & = \left (\begin{array}{ccc} \mathrm{1}{\mathrm{l}}_{{m}_{1}}{\nu }^{1}(t){{\vartheta}}_{11}(t,{t}_{0}),&\ldots &,\mathrm{1}{\mathrm{l}}_{{m}_{1}}{\nu }^{l}(t){{\vartheta}}_{1l}(t,{t}_{0})\\ \vdots &\cdots & \vdots \\ \mathrm{1}{\mathrm{l}}_{{m}_{l}}{\nu }^{1}(t){{\vartheta}}_{l1}(t,{t}_{0}),&\ldots &,\mathrm{1}{\mathrm{l}}_{{m}_{l}}{\nu }^{l}(t){{\vartheta}}_{ll}(t,{t}_{0})\\ \end{array} \right ), \end{array}$$
(5.49)

where ν k (t) is the quasi-stationary distribution of \(\widetilde{{Q}}^{k}(t)\) , and Θ(t,t 0) \(= ({{\vartheta}}_{ij}(t,{t}_{0})) \in {\mathbb{R}}^{l\times l}\) is the solution to the following initial value problem:

$$\begin{array}{ll} &{ d\Theta (t,{t}_{0}) \over dt} = \Theta (t,{t}_{0})\overline{Q}(t), \\ &\Theta ({t}_{0},{t}_{0}) = I.\end{array}$$
(5.50)

Proof: The proof is similar to those of Lemma  5.1 and Theorem  4.29, except that the notation is more involved. □ 
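
As a small numerical sketch of (5.49) and (5.50) (hypothetical, time-independent data with l = 2 groups of two states; with a constant \(\overline{Q}\) the solution of (5.50) is a matrix exponential), the assembled P 0 (t,t 0 ) is a transition matrix:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical data: quasi-stationary rows nu^1, nu^2 and a constant Qbar.
nu1 = np.array([2.0, 1.0]) / 3.0
nu2 = np.array([1.0, 3.0]) / 4.0
Qbar = np.array([[-0.2,   0.2],
                 [ 0.15, -0.15]])
t, t0 = 1.0, 0.0

Theta = expm((t - t0) * Qbar)     # solves dTheta/dt = Theta Qbar, Theta(t0,t0) = I

diag_nu = np.zeros((2, 4))        # diag(nu^1, nu^2)
diag_nu[0, :2] = nu1
diag_nu[1, 2:] = nu2
ones_blk = np.zeros((4, 2))       # the block-diagonal matrix of ones columns
ones_blk[:2, 0] = 1.0
ones_blk[2:, 1] = 1.0

P0 = ones_blk @ Theta @ diag_nu   # zeroth-order approximation (5.49), 4 x 4
print(P0.sum(axis=1))             # each row sums to 1
```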

Define an aggregated process of αε ( ⋅) on [0,  T] by

$${ \overline{\alpha }}^\varepsilon (t) = k\mbox{ if }{\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ k}.$$
(5.51)

The idea to follow is to treat a related Markov chain having only l states. The transitions among its states correspond to the jumps from one group \({\mathcal{M}}_{k}\) to another \({\mathcal{M}}_{j}\), j ≠ k, in the original Markov chain.

Theorem 5.25

Assume (A5.5) and (A5.6) . Then, for any i = 1,…,l, j = 1,…,m i , and bounded and measurable deterministic function β ij (⋅),

$$E\left ({\int }_{0}^{T}{\left ({I}_{\{{ \alpha }^\varepsilon (t)={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=i\}}\right ){\beta }_{ij}(t)dt}\right )^{2} = O(\varepsilon ).$$

Proof: For any i,  j and 0 ≤ t ≤  T, let

$${\eta }^\varepsilon (t) = E{\left ({\int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (r)={s}_{ij}\}} - {\nu }_{j}^{i}(r){I}_{\{{\overline{\alpha }}^{ \varepsilon }(r)=i\}}\right ){\beta }_{ij}(r)dr\right )}^{2}.$$
(5.52)

We have suppressed the i,  j dependence of ηε( ⋅) for notational simplicity. Loosely speaking, the argument used below is a Liapunov stability one, and ηε( ⋅) can be viewed as a Liapunov function. By differentiating ηε( ⋅), we have

$$\begin{array}{rl} \frac{d{\eta }^\varepsilon (t)} {dt} =&2E\left [\left ({\int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (r)={s}_{ij}\}} - {\nu }_{j}^{i}(r){I}_{\{{\overline{\alpha }}^{ \varepsilon }(r)=i\}}\right ){\beta }_{ij}(r)dr\right )\right. \\ &\left.\qquad \times \left ({I}_{\{{\alpha }^\varepsilon (t)={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=i\}}\right ){\beta }_{ij}(t)\right ]\end{array}$$

The definition of \({\overline{\alpha }}^\varepsilon (\cdot )\) yields that \(\{{\overline{\alpha }}^\varepsilon (t) = i\} =\{ {\alpha }^\varepsilon (t) \in {\mathcal{M}}_{i}\}\). Thus,

$$\frac{d{\eta }^\varepsilon (t)} {dt} = 2{\int }_{0}^{t}{\Phi }^\varepsilon (t,r){\beta }_{ ij}(t){\beta }_{ij}(r)dr,$$

where \({\Phi }^\varepsilon (t,r) = {\Phi }_{1}^\varepsilon (t,r) + {\Phi }_{2}^\varepsilon (t,r)\) with

$$\begin{array}{rl} {\Phi }_{1}^\varepsilon (t,r) =&P({\alpha }^\varepsilon (t) = {s}_{ij},{\alpha }^\varepsilon (r) = {s}_{ij}) \\ &\ - {\nu }_{j}^{i}(t)P({\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ i},{\alpha }^\varepsilon (r) = {s}_{ ij}), \end{array}$$
(5.53)

and

$$\begin{array}{rl} {\Phi }_{2}^\varepsilon (t,r) =& - {\nu }_{j}^{i}(r)P({\alpha }^\varepsilon (t) = {s}_{ij},{\alpha }^\varepsilon (r) \in {\mathcal{M}}_{i}) \\ & + {\nu }_{j}^{i}(r){\nu }_{ j}^{i}(t)P({\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ i},{\alpha }^\varepsilon (r) \in {\mathcal{M}}_{ i}).\end{array}$$
(5.54)

Note that the Markov property of αε( ⋅) implies that for 0 ≤ r ≤  t,

$$\begin{array}{rl} &P({\alpha }^\varepsilon (t)= {s}_{ij},{\alpha }^\varepsilon (r) = {s}_{ij}) \\ &\quad = P({\alpha }^\varepsilon (t) = {s}_{ij}\vert {\alpha }^\varepsilon (r) = {s}_{ij})P({\alpha }^\varepsilon (r) = {s}_{ij})\end{array}$$

In view of the asymptotic expansion, we have

$$\begin{array}{l} P({\alpha }^\varepsilon (t) = {s}_{ ij}\vert {\alpha }^\varepsilon (r) = {s}_{ ij}) \\ = {\nu }_{j}^{i}(t){{\vartheta}}_{ ii}(t,r) + O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ).\end{array}$$
(5.55)

It follows that

$$\begin{array}{rl} &P({\alpha }^\varepsilon (t)\in {\mathcal{M}}_{i}\vert {\alpha }^\varepsilon (r) = {s}_{ij}) \\ =&{\sum }_{k=1}^{{m}_{i} }{\nu }_{k}^{i}(t){{\vartheta}}_{ ii}(t,r) + O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ) \\ =&{{\vartheta}}_{ii}(t,r) + O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ).\end{array}$$
(5.56)

Combining (5.55) and (5.56) leads to

$${\Phi }_{1}^\varepsilon (t,r) = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ).$$

Similarly, we can show that

$${\Phi }_{2}^\varepsilon (t,r) = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ),$$

by noting that

$$\begin{array}{rl} {\Phi }_{2}^\varepsilon (t,r) =& - {\nu }_{j}^{i}(r){\sum }_{k=1}^{{m}_{i} }P({\alpha }^\varepsilon (t) = {s}_{ ij},{\alpha }^\varepsilon (r) = {s}_{ ik}) \\ &\ + {\nu }_{j}^{i}(r){\sum }_{k=1}^{{m}_{i} }{\nu }_{j}^{i}(t)P({\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ i},{\alpha }^\varepsilon (r) = {s}_{ ik}) \end{array}$$

and

$$\begin{array}{rl} &P({\alpha }^\varepsilon (t) = {s}_{ ij}\vert {\alpha }^\varepsilon (r) = {s}_{ ik}) \\ &\quad = {\nu }_{j}^{i}(t){{\vartheta}}_{ ii}(t,r) + O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ), \end{array}$$

for any k = 1, …, m i . Therefore,

$$\frac{d{\eta }^\varepsilon (t)} {dt} = 2{\int }_{0}^{t}O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right )dr = O(\varepsilon ).$$
(5.57)

This together with ηε (0) = 0 implies that η ε(t) =  O(ε). □ 

Theorem  5.25 indicates that νk(t) together with \({\overline{\alpha }}^\varepsilon (\cdot )\) approximates well the Markov chain αε( ⋅) in an appropriate sense. Nevertheless, in general, {αε( ⋅)} is not tight. The following example provides a simple illustration.

Example 5.26

Let α ε (⋅) ∈{ 1,2} denote a Markov chain generated by

$${ 1 \over \varepsilon } \left (\begin{array}{cc} - \lambda & \lambda \\ \mu & -\mu \\ \end{array} \right ),$$

for some λ,μ > 0. Then α ε (⋅) is not tight.

Proof: If α ε (⋅) is tight, then there exists a sequence ε k → 0 such that \({\alpha }^{\varepsilon _{k}}(\cdot )\) converges weakly to a stochastic process \(\alpha (\cdot ) \in D([0,T];\mathcal{M})\) . In view of the Skorohod representation (without changing notation for simplicity), Theorem  A.11 , we may assume \({\alpha }^{\varepsilon _{k}}(\cdot ) \rightarrow \alpha (\cdot )\) w.p.1. It follows from Lemma  A.41 that

$$E{\left \vert {\int }_{0}^{t}{\alpha }^{\varepsilon _{k} }(s)ds -{\int }_{0}^{t}\alpha (s)ds\right \vert }^{2} \rightarrow 0,$$

for all t ∈ [0,T]. Moreover, arguing as in the proof of Theorem  5.25 , we obtain

$$E{\left \vert {\int }_{0}^{t}{\alpha }^{\varepsilon _{k} }(s)ds -{\int }_{0}^{t}({\nu }_{ 1} + 2{\nu }_{2})ds\right \vert }^{2} \rightarrow 0,$$

where (ν 1 ,ν 2 ) is the stationary distribution of α ε (⋅) and ν 1 + 2ν 2 is the mean with respect to the stationary distribution. As a consequence, it follows that \(\alpha (t) = {\nu }_{1} + 2{\nu }_{2}\) for all t ∈ [0,T] w.p.1. Let

$${\delta }_{0} =\min \{ \vert 1 - ({\nu }_{1} + 2{\nu }_{2})\vert,\vert 2 - ({\nu }_{1} + 2{\nu }_{2})\vert \} > 0.$$

Then for t ∈ [0,T],

$$\vert {\alpha }^\varepsilon (t) - ({\nu }_{ 1} + 2{\nu }_{2})\vert \geq {\delta }_{0}.$$

Hence, under the Skorohod topology

$$d({\alpha }^{\varepsilon _{k} }(\cdot ),{\nu }_{1} + 2{\nu }_{2}) \geq {\delta }_{0}.$$

This contradicts the fact that \({\alpha }^{\varepsilon _{k}}(\cdot ) \rightarrow \alpha (\cdot ) = {\nu }_{ 1} + 2{\nu }_{2}\) w.p.1. Therefore, α ε(⋅) cannot be tight. □

Although αε ( ⋅) is not tight because it fluctuates in \({\mathcal{M}}_{k}\) very rapidly for small ε, its aggregation \({\overline{\alpha }}^\varepsilon (\cdot )\) is tight, and converges weakly to \(\overline{\alpha }(t)\),t ≥ 0, a Markov chain generated by \(\overline{Q}(t)\), t ≥ 0, where \(\overline{Q}(t)\) is defined in (5.48). The next theorem shows that \({\overline{\alpha }}^\varepsilon (\cdot )\) can be further approximated by \(\overline{\alpha }(\cdot )\).

Theorem 5.27

Assume (A5.5) and (A5.6) . Then \({\overline{\alpha }}^\varepsilon (\cdot )\) converges weakly to \(\overline{\alpha }(\cdot )\) in \(D([0,T];\overline{\mathcal{M}})\) , as ε → 0.

Proof: The proof is divided into two steps. First, we show that \({\overline{\alpha }}^\varepsilon (\cdot )\) defined in (5.51) is tight in \(D([0,T];\overline{\mathcal{M}})\). The definition of \({\overline{\alpha }}^\varepsilon (\cdot )\) implies that

$$\{{\overline{\alpha }}^\varepsilon (t) = i\} =\{ {\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ i}\} =\{ {\alpha }^\varepsilon (t) = {s}_{ ij}\mbox{ for some }j = 1,\ldots,{m}_{i}\}.$$

Consider the conditional expectation

$$\begin{array}{l} E\left[{\left ({\overline{\alpha }}^\varepsilon (t + s) -{\overline{\alpha }}^\varepsilon (s)\right)}^{2}\vert{\alpha }^\varepsilon (s) = {s}_{ ij}\right] \\ \quad=E\left[{\left ({\overline{\alpha }}^\varepsilon (t + s) - i\right)}^{2}\vert{\alpha }^\varepsilon (s) = {s}_{ ij}\right ] \\ \quad = \sum_{k=1}^{l}E\left [{\left ({\overline{\alpha }}^\varepsilon (t + s) - i\right )}^{2}{I}_{\{{\overline{\alpha }}^{ \varepsilon}(t+s)=k\}} \vert{\alpha }^\varepsilon (s) = {s}_{ ij}\right] \\ \quad={\sum }_{k=1}^{l}{(k - i)}^{2}P({\overline{\alpha}}^\varepsilon (t + s) = k\vert {\alpha }^\varepsilon (s) = {s}_{ij}) \\ \quad \leq {l}^{2} \sum \limits_{k\neq i}P({\overline{\alpha}}^\varepsilon (t + s) = k\vert {\alpha}^\varepsilon (s) = {s}_{ij})\end{array}$$

Since \(\{{\overline{\alpha }}^\varepsilon (t + s) = k\} =\{ {\alpha }^\varepsilon (t + s) \in {\mathcal{M}}_{k}\}\), it follows that

$$\begin{array}{rl} &P({\overline{\alpha } }^\varepsilon (t + s) = k\vert {\alpha }^\varepsilon (s) = {s}_{ij}) \\ =&{\sum }_{{k}_{1}=1}^{{m}_{k} }P({\alpha }^\varepsilon (t + s) = {s}_{ k{k}_{1}}\vert {\alpha }^\varepsilon (s) = {s}_{ ij}) \\ =&{\sum }_{{k}_{1}=1}^{{m}_{k} }{\nu }_{{k}_{1}}^{k}(t + s){{\vartheta}}_{ ik}(t + s,s) + O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}t} \varepsilon \right )\right ) \\ =&{{\vartheta}}_{ik}(t + s,s) + O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}t} \varepsilon \right )\right )\end{array}$$

Therefore, we obtain

$$\begin{array}{rl} &E\left [{\left ({\overline{\alpha }}^\varepsilon (t + s) -{\overline{\alpha }}^\varepsilon (s)\right )}^{2}\vert {\alpha }^\varepsilon (s) = {s}_{ij}\right ] \\ \leq &{l}^{2} \sum \limits_{k\neq i}{{\vartheta}}_{ik}(t + s,s) + O\left(\varepsilon +\exp \left( -\frac{{\kappa }_{0}t} \varepsilon \right)\right)\end{array}$$

Note that \(\lim \limits_{t\rightarrow 0}{{\vartheta}}_{ik}(t + s,s) = 0\) for i ≠ k. Therefore,

$$\lim \limits_{t\rightarrow 0}\left (\lim \limits_{\varepsilon \rightarrow 0}E\left ({({\overline{\alpha }}^\varepsilon (t + s) -{\overline{\alpha }}^\varepsilon (s))}^{2}\vert {\alpha }^\varepsilon (s) = {s}_{ ij}\right )\right ) = 0.$$

Thus, the Markov property of αε( ⋅) implies

$${ \lim }_{t\rightarrow 0}\left (\lim \limits_{\varepsilon \rightarrow 0}E\left ({({\overline{\alpha }}^\varepsilon (t + s) -{\overline{\alpha }}^\varepsilon (s))}^{2}\vert {\alpha }^\varepsilon (r) :\; r \leq s\right )\right ) = 0.$$
(5.58)

Recall that \({\overline{\alpha }}^\varepsilon (\cdot )\) is bounded. The tightness of \({\overline{\alpha }}^\varepsilon (\cdot )\) follows from Kurtz’ tightness criterion (see Lemma  A.17).

To complete the proof, it remains to show that the finite-dimensional distributions of \({\overline{\alpha }}^\varepsilon (\cdot )\) converge to that of \(\overline{\alpha }(\cdot )\). In fact, for any

$$0 \leq {t}_{1} < {t}_{2} < \cdots < {t}_{n} \leq T\mbox{ and }{i}_{1},{i}_{2},\ldots,{i}_{n} \in \overline{\mathcal{M}} =\{ 1,\ldots,l\},$$

we have

$$\begin{array}{rl} &P({\overline{\alpha } }^\varepsilon ({t}_{n}) = {i}_{n},\ldots,{\overline{\alpha }}^\varepsilon ({t}_{1}) = {i}_{1}) \\ =&P({\alpha }^\varepsilon ({t}_{ n}) \in {\mathcal{M}}_{{i}_{n}},\ldots,{\alpha }^\varepsilon ({t}_{ 1}) \in {\mathcal{M}}_{{i}_{1}}) \\ =&{\sum }_{{j}_{1},\ldots,{j}_{n}}P({\alpha }^\varepsilon ({t}_{ n}) = {s}_{{i}_{n}{j}_{n}},\ldots,{\alpha }^\varepsilon ({t}_{ 1}) = {s}_{{i}_{1}{j}_{1}}) \\ =&{\sum }_{{j}_{1},\ldots,{j}_{n}}P({\alpha }^\varepsilon ({t}_{ n}) = {s}_{{i}_{n}{j}_{n}}\vert {\alpha }^\varepsilon ({t}_{ n-1}) = {s}_{{i}_{n-1}{j}_{n-1}}) \\ &\qquad \times \cdots \times P({\alpha }^\varepsilon ({t}_{ 2}) = {s}_{{i}_{2}{j}_{2}}\vert {\alpha }^\varepsilon ({t}_{ 1}) = {s}_{{i}_{1}{j}_{1}})P({\alpha }^\varepsilon ({t}_{ 1}) = {s}_{{i}_{1}{j}_{1}})\end{array}$$

In view of Lemma  5.24, for each k, we have

$$P({\alpha }^\varepsilon ({t}_{ k}) = {s}_{{i}_{k}{j}_{k}}\vert {\alpha }^\varepsilon ({t}_{ k-1}) = {s}_{{i}_{k-1}{j}_{k-1}}) \rightarrow {\nu }_{{j}_{k}}^{{i}_{k} }({t}_{k}){{\vartheta}}_{{i}_{k-1}{i}_{k}}({t}_{k},{t}_{k-1}).$$

Moreover, note that

$${\sum }_{{j}_{k}=1}^{{m}_{{i}_{k}} }{\nu }_{{j}_{k}}^{{i}_{k} }({t}_{k}) = 1.$$

It follows that

$$\begin{array}{rl} & \sum \limits_{{j}_{1},\ldots,{j}_{n}}P({\alpha }^\varepsilon ({t}_{ n}) = {s}_{{i}_{n}{j}_{n}}\vert {\alpha }^\varepsilon ({t}_{ n-1}) = {s}_{{i}_{n-1}{j}_{n-1}}) \\ &\qquad \times \cdots \times P({\alpha }^\varepsilon ({t}_{ 2}) = {s}_{{i}_{2}{j}_{2}}\vert {\alpha }^\varepsilon ({t}_{ 1}) = {s}_{{i}_{1}{j}_{1}})P({\alpha }^\varepsilon ({t}_{ 1}) = {s}_{{i}_{1}{j}_{1}}) \\ \rightarrow &{\sum }_{{j}_{1},\ldots,{j}_{n}}{\nu }_{{j}_{n}}^{{i}_{n} }({t}_{n}){{\vartheta}}_{{i}_{n-1}{i}_{n}}({t}_{n},{t}_{n-1})\cdots {\nu }_{{j}_{2}}^{{i}_{2} }({t}_{2}){{\vartheta}}_{{i}_{1}{i}_{2}}({t}_{2},{t}_{1}){\nu }_{{j}_{1}}^{{i}_{1} }({t}_{1})\widetilde{{{\vartheta}}}_{{i}_{1}}({t}_{1}) \\ =&{{\vartheta}}_{{i}_{n-1}{i}_{n}}({t}_{n},{t}_{n-1})\cdots {{\vartheta}}_{{i}_{1}{i}_{2}}({t}_{2},{t}_{1})\widetilde{{{\vartheta}}}_{{i}_{1}}({t}_{1}) \\ =&P(\overline{\alpha }({t}_{n}) = {i}_{n},\ldots,\overline{\alpha }({t}_{1}) = {i}_{1}), \end{array}$$

where \({\sum }_{{j}_{1},\ldots,{j}_{n}} = \sum \limits_{{j}_{1}=1}^{{m}_{{i}_{1}}}\cdots {\sum }_{{j}_{ n}=1}^{{m}_{{i}_{n}}}\) and \(\widetilde{{{\vartheta}}}_{{i}_{1}}({t}_{1})\) denotes the initial distribution (also known as absolute probability in the literature of Markov chains). Thus, \({\overline{\alpha }}^\varepsilon (\cdot ) \rightarrow \overline{\alpha }(\cdot )\) in distribution. □ 

This theorem implies that \({\overline{\alpha }}^\varepsilon (\cdot )\) converges to a Markov chain, although \({\overline{\alpha }}^\varepsilon (\cdot )\) itself is not a Markov chain in general. If, however, the generator Q ε(t) has some specific structure, then \({\overline{\alpha }}^\varepsilon (\cdot )\) is a Markov chain. The following example demonstrates this point.

Example 5.28

Let \(\widetilde{Q}(t) = (\widetilde{{q}}_{ij}(t))\) and \(\overline{Q}(t) = ({\overline{q}}_{ij}(t))\) denote generators with the corresponding state spaces \(\{{a}_{1},\ldots,{a}_{{m}_{0}}\}\) and {1,…,l}, respectively. Consider

$${ Q}^\varepsilon (t) = \frac{1} \varepsilon \left (\begin{array}{cccc} \widetilde{Q}(t)&&& \\ & &\ddots &\\ & & &\widetilde{Q}(t) \\ \end{array} \right )+\left (\begin{array}{ccc} {\overline{q}}_{11}(t){I}_{{m}_{0}} & \cdots &{\overline{q}}_{1l}(t){I}_{{m}_{0}}\\ \vdots & \vdots & \vdots \\ {\overline{q}}_{l1}(t){I}_{{m}_{0}} & \cdots & {\overline{q}}_{ll}(t){I}_{{m}_{0}}\\ \end{array} \right ),$$
(5.59)

where \({I}_{{m}_{0}}\) is the m 0 × m 0 identity matrix. In this case

$${m}_{1} = {m}_{2} = \cdots = {m}_{l} = {m}_{0}.$$

Then \({\overline{\alpha }}^\varepsilon (\cdot )\) is a Markov chain generated by \(\overline{Q}(t)\) . In fact, let

$${\chi }^\varepsilon (t) = \left ({I}_{\{{ \alpha }^\varepsilon (t)={s}_{11}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{1{m}_{0}}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{l1}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{l{m}_{0}}\}}\right ).$$

Note that s ij = (i,a j ) for j = 1,…,m 0 and i = 1,…,l. In view of Lemma  2.4 , we obtain that

$${\chi }^\varepsilon (t) -{\int }_{0}^{t}{\chi }^\varepsilon (s){Q}^\varepsilon (s)ds$$
(5.60)

is a martingale. Postmultiplying (multiplying from the right) (5.60) by

$$\widetilde{\mathrm{1}\mathrm{l}} = \mathrm{diag}(\mathrm{1}{\mathrm{l}}_{{m}_{0}},\ldots,\mathrm{1}{\mathrm{l}}_{{m}_{0}})$$

and noting that \(\{{\overline{\alpha }}^\varepsilon (t) = i\} =\{ {\alpha }^\varepsilon (t) \in {\mathcal{M}}_{i}\}\) and

$${\chi }^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}} = ({I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=1\}},\ldots,{I}_{\{{\overline{\alpha }}^\varepsilon (t)=l\}}),$$

we obtain that

$$\begin{array}{rl} \left ({I}_{\{{\overline{\alpha }}^\varepsilon (t)=1\}},\ldots,{I}_{\{{\overline{\alpha }}^\varepsilon (t)=l\}}\right ) -{\int }_{0}^{t}{\chi }^\varepsilon (s){Q}^\varepsilon (s)ds\widetilde{\mathrm{1}\mathrm{l}} \end{array}$$

is still a martingale. In view of the special structure of Q ε (t) in (5.59),

$$\widetilde{Q}(t)\mathrm{1}{\mathrm{l}}_{{m}_{0}} = 0,\quad {Q}^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}} =\widehat{ Q}(t)\widetilde{\mathrm{1}\mathrm{l}},$$

and

$${\chi }^\varepsilon (s)\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}} = \left ({I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=1\}},\ldots,{I}_{\{{\overline{\alpha }}^\varepsilon (s)=l\}}\right )\overline{Q}(s).$$

Therefore, (5.60) implies that

$$\left ({I}_{\{{\overline{\alpha }}^\varepsilon (t)=1\}},\ldots,{I}_{\{{\overline{\alpha }}^\varepsilon (t)=l\}}\right ) -{\int }_{0}^{t}\left ({I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=1\}},\ldots,{I}_{\{{\overline{\alpha }}^\varepsilon (s)=l\}}\right )\overline{Q}(s)ds$$

is a martingale. This implies, in view of Lemma 2.4, that \({\overline{\alpha }}^\varepsilon (\cdot )\) is a Markov chain generated by \(\overline{Q}(t)\) , t ≥ 0.
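
The block structure in (5.59) is conveniently written with Kronecker products. The following sketch (hypothetical constant \(\widetilde{Q}\) and \(\overline{Q}\) with m 0 = l = 2) checks numerically the two identities used above: postmultiplication by \(\widetilde{\mathrm{1}\mathrm{l}}\) removes the fast part, and the slow part then acts on the group indicators exactly as \(\overline{Q}\).

```python
import numpy as np

# Hypothetical constant generators: Qtilde on {a_1, a_2}, Qbar on {1, 2}.
Qtilde = np.array([[-1.0,  1.0],
                   [ 2.0, -2.0]])
Qbar = np.array([[-0.5,  0.5],
                 [ 0.3, -0.3]])
m0, l, eps = 2, 2, 0.01

# Q^eps of (5.59): block-diagonal fast part plus Qbar "expanded" by I_{m0}.
Qeps = np.kron(np.eye(l), Qtilde) / eps + np.kron(Qbar, np.eye(m0))

ones_blk = np.kron(np.eye(l), np.ones((m0, 1)))     # the m x l matrix of ones columns

# Qtilde 1l_{m0} = 0, so the fast part vanishes after postmultiplication ...
print(np.allclose(Qeps @ ones_blk, np.kron(Qbar, np.eye(m0)) @ ones_blk))
# ... and the slow part reproduces Qbar acting on the group indicators.
print(np.allclose(np.kron(Qbar, np.eye(m0)) @ ones_blk, ones_blk @ Qbar))
```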

3.2 Exponential Bounds

For each i = 1, …, l, j = 1, …, m i , \(\alpha \in \mathcal{M}\), and t ≥ 0, let β ij (t) be a bounded, Borel measurable, deterministic function and let

$${W}_{ij}(t,\alpha ) = \left ({I}_{\{\alpha ={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{ \alpha \in {\mathcal{M}}_{i}\}}\right ){\beta }_{ij}(t).$$
(5.61)

Consider normalized occupation measures

$${n}^\varepsilon (t) = \left ({n}_{ 11}^\varepsilon (t),\ldots,{n}_{ 1{m}_{1}}^\varepsilon (t),\ldots,{n}_{ l1}^\varepsilon (t),\ldots,{n}_{ l{m}_{l}}^\varepsilon (t)\right ),$$

where

$${n}_{ij}^\varepsilon (t) = \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}{W}_{ ij}(s,{\alpha }^\varepsilon (s))ds.$$

In this section, we establish the exponential error bound for n ε( ⋅), a sequence of suitably scaled occupation measures for the singularly perturbed Markov chains with weak and strong interactions.
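
Before stating the precise bound, here is a simulation sketch (hypothetical constant generators, β ij ≡ 1, a single sample path; not part of the development) that generates a path of αε ( ⋅), aggregates it, and forms the normalized occupation measure n ε (T).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two groups of two states (not from the text).
Qt1 = np.array([[-1.0,  1.0], [ 2.0, -2.0]])        # fast generator of group 1
Qt2 = np.array([[-3.0,  3.0], [ 1.0, -1.0]])        # fast generator of group 2
Qhat = np.array([[-0.4,  0.0,  0.4,  0.0],
                 [ 0.0, -0.4,  0.0,  0.4],
                 [ 0.3,  0.0, -0.3,  0.0],
                 [ 0.0,  0.3,  0.0, -0.3]])
eps, T = 0.005, 1.0

Qeps = np.zeros((4, 4))
Qeps[:2, :2] = Qt1 / eps
Qeps[2:, 2:] = Qt2 / eps
Qeps += Qhat

def quasi_stationary(Q):
    # left null vector of a weakly irreducible generator, normalized to sum 1
    vals, vecs = np.linalg.eig(Q.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals))])
    return v / v.sum()

nu = np.concatenate([quasi_stationary(Qt1), quasi_stationary(Qt2)])
group = np.array([0, 0, 1, 1])            # state s_ij -> group index i

# Simulate one path of alpha^eps on [0, T], accumulating occupation times.
state, t = 0, 0.0
occ = np.zeros(4)                         # time spent in each state s_ij
occ_grp = np.zeros(2)                     # time spent in each group M_i
while t < T:
    rate = -Qeps[state, state]
    dt = min(rng.exponential(1.0 / rate), T - t)
    occ[state] += dt
    occ_grp[group[state]] += dt
    t += dt
    if t < T:
        jump = Qeps[state].clip(min=0.0)
        state = rng.choice(4, p=jump / jump.sum())

# n^eps_ij(T) with beta_ij = 1: occupation time of s_ij minus nu_j^i times the
# occupation time of its group, scaled by 1/sqrt(eps); it stays of moderate
# size as eps shrinks, consistent with the exponential bound established below.
n_eps = (occ - nu * occ_grp[group]) / np.sqrt(eps)
print(n_eps)
```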

In view of Theorem  4.29, there exists κ0 > 0 such that

$$\vert{P}^\varepsilon (t,s) - {P}_{ 0}(t,s)\vert = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right ).$$
(5.62)

Similar to Section 5.2, for fixed but otherwise arbitrary T > 0, let

$${K}_{T} =\max \left \{ 1,\;{\sup }_{0\leq s\leq t\leq T}\left (\frac{\vert {P}^\varepsilon (t,s) - {P}_{0}(t,s)\vert } {\varepsilon +\exp (-{\kappa }_{0}(t - s)/\varepsilon )} \right )\right \}.$$
(5.63)

We may write (5.62) in terms of K T and O 1( ⋅) as follows:

$$\vert{P}^\varepsilon (t,s) - {P}_{ 0}(t,s)\vert = {K}_{T}{O}_{1}\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right ),$$
(5.64)

where | O 1(y) | ∕ | y | ≤ 1. The notation K T and O 1( ⋅) above emphasizes the separation of the constant from a “norm 1” function. Essentially, K T serves as the magnitude of the bound, indicating the size of the bounding region, and the rest is absorbed into the function O 1( ⋅).

Theorem 5.29

Assume (A5.5) and (A5.6) . Then there exist ε 0 > 0 and K > 0 such that for 0 < ε ≤ ε 0 , T ≥ 0, and for any bounded, Borel measurable, and deterministic process β ij (⋅),

$$E\exp \left ({ \frac{{\theta }_{T}} {{(T + 1)}^{3}}\sup }_{0\leq t\leq T}\vert {n}^\varepsilon (t)\vert \right ) \leq K,$$
(5.65)

where θ T is any constant satisfying

$$0 \leq {\theta }_{T} \leq \frac{\min \{1,{\kappa }_{0}\}} {{K}_{T}\vert \beta {\vert }_{T}(1 + \vert \widehat{Q}{\vert }_{T})},$$
(5.66)

and where |⋅| T denotes the matrix norm as defined in (5.12) , that is,

$$\vert \beta {\vert }_{T} {=\max { }_{i,j}\sup }_{0\leq t\leq T}\vert {\beta }_{ij}(t)\vert,$$

similarly for \(\vert \widehat{Q}{\vert }_{T}\).

Remark 5.30

This theorem is a natural extension of Theorem  5.4 . Owing to the presence of the weak and strong interactions, slightly stronger conditions on K T and θ T are imposed in (5.63) and (5.66). Also, the scaling factor in (5.65) is changed to \({(T + 1)}^{3}\).

Proof of Theorem  5.29: Here the proof is again along the lines of Theorem  5.4. Since Steps 2-5 in the proof are similar to those of Theorem  5.4, we will only give the proof for Step 1.

Let χε ( ⋅) denote the vector of indicators corresponding to α ε ( ⋅), that is,

$${\chi }^\varepsilon (t) = \left ({I}_{\{{ \alpha }^\varepsilon (t)={s}_{11}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{1{m}_{1}}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{l1}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{l{m}_{l}}\}}\right ).$$

Then w ε( ⋅) defined by

$${w}^\varepsilon (t) = {\chi }^\varepsilon (t) - {\chi }^\varepsilon (0) -{\int }_{0}^{t}{\chi }^\varepsilon (s){Q}^\varepsilon (s)ds$$
(5.67)

is an \({\mathbb{R}}^{m}\)-valued martingale. In fact, w ε ( ⋅) is square integrable on [0,  T]. It then follows from a well-known result (see Elliott [55] or Kunita and Watanabe [134]) that a stochastic integral with respect to w ε(t) can be defined. In view of the defining equation (5.67), the linear stochastic differential equation

$$d{\chi }^\varepsilon (t) = {\chi }^\varepsilon (t){Q}^\varepsilon (t)dt + d{w}^\varepsilon (t)$$
(5.68)

makes sense. Recall that P ε(t, s) is the principal matrix solution of the matrix differential equation

$${ dy(t) \over dt} = y(t){Q}^\varepsilon (t).$$
(5.69)

The solution of this stochastic differential equation is

$$\begin{array}{ll} {\chi }^\varepsilon (t)& = {\chi }^\varepsilon (0){P}^\varepsilon (t,0) +{ \int }_{0}^{t}(d{w}^\varepsilon (s)){P}^\varepsilon (t,s) \\ & = {\chi }^\varepsilon (0)\left ({P}^\varepsilon (t,0) - {P}_{ 0}(t,0)\right ) \\ &\quad \quad +{ \int }_{0}^{t}(d{w}^\varepsilon (s))\left ({P}^\varepsilon (t,s) - {P}_{ 0}(t,s)\right ) \\ &\quad \quad + {\chi }^\varepsilon (0){P}_{ 0}(t,0) +{ \int }_{0}^{t}(d{w}^\varepsilon (s)){P}_{ 0}(t,s).\end{array}$$
(5.70)

Use \({{\vartheta}}_{ij}(t,s)\) defined in Lemma  5.24 and write \(\Theta (t,s) = ({{\vartheta}}_{ij}(t,s))\) . Then it is easy to check that

$${P}_{0}(t,s) =\widetilde{ \mathrm{1}\mathrm{l}}\Theta (t,s)\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t)).$$
(5.71)

Set

$${\overline{\chi }}^\varepsilon (t) = \left ({\nu }^{1}(t){I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=1\}},\ldots,{\nu }^{l}(t){I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=l\}}\right ) \in {\mathbb{R}}^{m}$$

and

$$\widetilde{{\chi }}^\varepsilon (t) = \left ({I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=1\}},\ldots,{I}_{\{{\overline{\alpha }}^\varepsilon (t)=l\}}\right ) \in {\mathbb{R}}^{l}.$$

Then it follows that

$$\begin{array}{l} \widetilde{{\chi }}^\varepsilon (t) = {\chi }^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}}\quad \mbox{ and } \\ {\overline{\chi }}^\varepsilon (t) =\widetilde{ {\chi }}^\varepsilon (t)\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t)).\end{array}$$
(5.72)

Moreover, postmultiplying both sides of (5.67) by \(\widetilde{\mathrm{1}\mathrm{l}}\) yields that

$${\chi }^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}} - {\chi }^\varepsilon (0)\widetilde{\mathrm{1}\mathrm{l}} -{\int }_{0}^{t}{\chi }^\varepsilon (s){Q}^\varepsilon (s)\widetilde{\mathrm{1}\mathrm{l}}ds = {w}^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}}.$$
(5.73)

Here \({w}^\varepsilon (\cdot )\widetilde{\mathrm{1}\mathrm{l}}\) is also a square-integrable martingale. Note that \(\widetilde{Q}(s)\widetilde{\mathrm{1}\mathrm{l}} = 0\) and hence

$$\begin{array}{l} {Q}^\varepsilon (s)\widetilde{\mathrm{1}\mathrm{l}} =\widehat{ Q}(s)\widetilde{\mathrm{1}\mathrm{l}}\quad \mbox{ and } \\ {\overline{\chi }}^\varepsilon (s)\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}} =\widetilde{ {\chi }}^\varepsilon (s)\mathrm{diag}({\nu }^{1}(s),\ldots,{\nu }^{l}(s))\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}} =\widetilde{ {\chi }}^\varepsilon (s)\overline{Q}(s)\end{array}$$

We obtain from (5.73) that

$$\widetilde{{\chi }}^\varepsilon (t) -\widetilde{ {\chi }}^\varepsilon (0) -{\int }_{0}^{t}\left (({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}} +\widetilde{ {\chi }}^\varepsilon (s)\overline{Q}(s)\right )ds = {w}^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}}.$$

Since Θ(t,  s) is the principal matrix solution to

$$\frac{d\Theta (t,s)} {dt} = \Theta (t,s)\overline{Q}(t),\,\mbox{ with }\Theta (s,s) = I,$$

similar to (5.68), solving the stochastic differential equation for \(\widetilde{{\chi }}^\varepsilon (\cdot )\) leads to the equation:

$$\begin{array}{l} \widetilde{{\chi }}^\varepsilon (t) =\widetilde{ {\chi }}^\varepsilon (0)\Theta (t,0) +{ \int }_{0}^{t}(d{w}^\varepsilon (s)\widetilde{\mathrm{1}\mathrm{l}})\Theta (t,s) \\ \quad \quad +{ \int }_{0}^{t}({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}}\Theta (t,s)ds.\end{array}$$
(5.74)

Let us now return to the last two terms in (5.70) and use (5.71), (5.72), and (5.74) to obtain

$$\begin{array}{l} {\chi }^\varepsilon (0){P}_{ 0}(t,0) +{ \int }_{0}^{t}(d{w}^\varepsilon (s)){P}_{ 0}(t,s) \\ \quad = \left ({\chi }^\varepsilon (0)\widetilde{\mathrm{1}\mathrm{l}}\Theta (t,0) +{ \int }_{0}^{t}(d{w}^\varepsilon (s))\widetilde{\mathrm{1}\mathrm{l}}\Theta (t,s)\right )\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t)) \\ \quad = \left (\widetilde{{\chi }}^\varepsilon (0)\Theta (t,0) +{ \int }_{0}^{t}(d{w}^\varepsilon (s)\widetilde{\mathrm{1}\mathrm{l}})\Theta (t,s)\right )\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t)) \\ \quad = \left (\widetilde{{\chi }}^\varepsilon (t) -{\int }_{0}^{t}({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}}\Theta (t,s)ds\right )\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t)) \\ \quad ={ \overline{\chi }}^\varepsilon (t) -{\int }_{0}^{t}({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}}\Theta (t,s)\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t))ds \\ \quad ={ \overline{\chi }}^\varepsilon (t) -{\int }_{0}^{t}({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s){P}_{ 0}(t,s)ds\end{array}$$

Combining this with (5.70), we have

$$({\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t)) +{ \int }_{0}^{t}({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s){P}_{ 0}(t,s)ds = {\eta }^\varepsilon (t),$$
(5.75)

where

$${\eta }^\varepsilon (t) = {\chi }^\varepsilon (0)\left ({P}^\varepsilon (t,0) - {P}_{ 0}(t,0)\right ) +{ \int }_{0}^{t}(d{w}^\varepsilon (s))\left ({P}^\varepsilon (t,s) - {P}_{ 0}(t,s)\right ).$$

Note that the matrix P ε(t, s) is invertible but P 0(t,  s) is not. The idea is to approximate the noninvertible matrix P 0(t, s) by the invertible P ε(t,  s). Let

$${\eta }_{1}^\varepsilon (t) ={ \int }_{0}^{t}({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s)\left ({P}_{ 0}(t,s) - {P}^\varepsilon (t,s)\right )ds$$
(5.76)

and

$${\phi }^\varepsilon (t) = ({\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t)) - ({\eta }^\varepsilon (t) - {\eta }_{ 1}^\varepsilon (t)).$$

Then ϕε(0) = 0 and ϕε(t) satisfies the following equation:

$${\phi }^\varepsilon (t) +{ \int }_{0}^{t}{\phi }^\varepsilon (s)\widehat{Q}(s){P}^\varepsilon (t,s)ds +{ \int }_{0}^{t}({\eta }^\varepsilon (s) - {\eta }_{ 1}^\varepsilon (s))\widehat{Q}(s){P}^\varepsilon (t,s)ds = 0.$$

The properties of the principal matrix solution imply that

$${P}^\varepsilon (t,s) = {P}^\varepsilon (0,s){P}^\varepsilon (t,0).$$

Set

$$\begin{array}{l} \check{{Q}}^\varepsilon (t) = {P}^\varepsilon (t,0)\widehat{Q}(t){P}^\varepsilon (0,t), \\ {\psi }^\varepsilon (t) = {\phi }^\varepsilon (t){P}^\varepsilon (0,t),\quad \mbox{ and } \\ {\eta }_{2}^\varepsilon (t) = ({\eta }^\varepsilon (t) - {\eta }_{1}^\varepsilon (t))\widehat{Q}(t){P}^\varepsilon (0,t)\end{array}$$

Owing to the properties of the principal matrix solution, for any t ∈ [0, T], we have

$${P}^\varepsilon (0,t){P}^\varepsilon (t,0) = {P}^\varepsilon (t,t) = I,$$
(5.77)

ψ ε (0) = 0 and ψ ε(t) satisfies the equation

$${\psi }^\varepsilon (t) +{ \int }_{0}^{t}{\psi }^\varepsilon (s)\check{{Q}}^\varepsilon (s)ds +{ \int }_{0}^{t}{\eta }_{ 2}^\varepsilon (s)ds = 0.$$

The solution to this equation is given by

$${\psi }^\varepsilon (t) = -{\int }_{0}^{t}{\eta }_{ 2}^\varepsilon (s)\check{{\Phi }}^\varepsilon (t,s)ds,$$
(5.78)

where \(\check{{\Phi }}^\varepsilon (t,s)\) is the principal matrix solution to

$$\frac{d\check{{\Phi }}^\varepsilon (t,s)} {dt} = -\check{{\Phi }}^\varepsilon (t,s)\check{{Q}}^\varepsilon (t),\;\mbox{ with }\check{{\Phi }}^\varepsilon (s,s) = I.$$

Postmultiplying both sides of (5.78) by P ε(t, 0) yields

$$\begin{array}{rl} {\phi }^\varepsilon (t)& = {\psi }^\varepsilon (t){P}^\varepsilon (t,0) \\ & = -{\int }_{0}^{t}{\eta }_{ 2}^\varepsilon (s)\check{{\Phi }}^\varepsilon (t,s){P}^\varepsilon (t,0)ds \\ & = -{\int }_{0}^{t}({\eta }^\varepsilon (s) - {\eta }_{ 1}^\varepsilon (s))\widehat{Q}(s)\check{{\Psi }}^\varepsilon (t,s)ds, \end{array}$$

where

$$\check{{\Psi }}^\varepsilon (t,s) = {P}^\varepsilon (0,s)\check{{\Phi }}^\varepsilon (t,s){P}^\varepsilon (t,0).$$

Thus it follows that

$${\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t) = {\eta }^\varepsilon (t) - {\eta }_{ 1}^\varepsilon (t) -{\int }_{0}^{t}({\eta }^\varepsilon (s) - {\eta }_{ 1}^\varepsilon (s))\widehat{Q}(s)\check{{\Psi }}^\varepsilon (t,s)ds.$$
(5.79)

Again using (5.77), we have

$$\begin{array}{l} \frac{d} {dt}\left (\check{{\Phi }}^\varepsilon (t,0){P}^\varepsilon (t,0)\right ) \\ \quad = \left (\frac{d\check{{\Phi }}^\varepsilon (t,0)} {dt} \right ){P}^\varepsilon (t,0) +\check{ {\Phi }}^\varepsilon (t,0)\left (\frac{d{P}^\varepsilon (t,0)} {dt} \right ) \\ \quad = -\check{{\Phi }}^\varepsilon (t,0)\check{{Q}}^\varepsilon (t){P}^\varepsilon (t,0) +\check{ {\Phi }}^\varepsilon (t,0){P}^\varepsilon (t,0){Q}^\varepsilon (t) \\ \quad = -\check{{\Phi }}^\varepsilon (t,0){P}^\varepsilon (t,0)\widehat{Q}(t){P}^\varepsilon (0,t){P}^\varepsilon (t,0) +\check{ {\Phi }}^\varepsilon (t,0){P}^\varepsilon (t,0){Q}^\varepsilon (t) \\ \quad = -\check{{\Phi }}^\varepsilon (t,0){P}^\varepsilon (t,0)\widehat{Q}(t) +\check{ {\Phi }}^\varepsilon (t,0){P}^\varepsilon (t,0){Q}^\varepsilon (t) \\ \quad =\check{ {\Phi }}^\varepsilon (t,0){P}^\varepsilon (t,0)\left (-\widehat{Q}(t) + {Q}^\varepsilon (t)\right ) \\ \quad =\check{ {\Phi }}^\varepsilon (t,0){P}^\varepsilon (t,0)\left (\frac{1} \varepsilon \widetilde{Q}(t)\right )\end{array}$$

This implies that \(\check{{\Psi }}^\varepsilon (t,s)\) is the principal matrix solution to the differential equation

$$\frac{d\check{{\Psi }}^\varepsilon (t,s)} {dt} =\check{ {\Psi }}^\varepsilon (t,s)\left (\frac{1} \varepsilon \widetilde{Q}(t)\right ),\;\mbox{ with }\check{{\Psi }}^\varepsilon (s,s) = I.$$
(5.80)

Therefore, all entries of \(\check{{\Psi }}^\varepsilon (t,s)\) are nonnegative and bounded above by 1, and these bounds are uniform in 0 ≤ s ≤  t ≤ T. Thus, \(\vert \check{{\Psi }}^\varepsilon (t,s){\vert }_{T} \leq 1\).

Multiplying both sides of (5.79) by the m ×m matrix

$$\beta (t) := \mathrm{diag}({\beta }_{11}(t),\ldots,{\beta }_{1{m}_{1}}(t),\ldots,{\beta }_{l1}(t),\ldots,{\beta }_{l{m}_{l}}(t))$$

from the right and integrating over the interval [0, ς], for each ς ∈ [0,  T], we have

$$\begin{array}{rl} { \int }_{0}^{\varsigma }({\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t))\beta (t)dt& ={ \int }_{0}^{\varsigma }{\eta }^\varepsilon (t)\beta (t)dt -{\int }_{0}^{\varsigma }{\eta }_{ 1}^\varepsilon (t)\beta (t)dt \\ &\quad -{\int }_{0}^{\varsigma }{ \int }_{0}^{t}({\eta }^\varepsilon (s) - {\eta }_{ 1}^\varepsilon (s))\widehat{Q}(s)\check{{\Psi }}^\varepsilon (t,s)ds\beta (t)dt\end{array}$$

By changing the order of integration, we write the last term in the above expression as

$$\begin{array}{l} { \int }_{0}^{\varsigma }{ \int }_{0}^{t}({\eta }^\varepsilon (s) - {\eta }_{ 1}^\varepsilon (s))\widehat{Q}(s)\check{{\Psi }}^\varepsilon (t,s)ds\beta (t)dt \\ \quad ={ \int }_{0}^{\varsigma }({\eta }^\varepsilon (s) - {\eta }_{ 1}^\varepsilon (s))\left ({\int }_{s}^{\varsigma }\widehat{Q}(s)\check{{\Psi }}^\varepsilon (t,s)\beta (t)dt\right )ds\end{array}$$

Therefore, it follows that

$${\int }_{0}^{\varsigma }({\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t))\beta (t)dt ={ \int }_{0}^{\varsigma }{\eta }^\varepsilon (t)\widetilde{\beta }(t)dt -{\int }_{0}^{\varsigma }{\eta }_{ 1}^\varepsilon (t)\widetilde{\beta }(t)dt,$$
(5.81)

where

$$\widetilde{\beta }(t) = \beta (t) +{ \int }_{t}^{\varsigma }\widehat{Q}(t)\check{{\Psi }}^\varepsilon (r,t)\beta (r)dr.$$

Moreover, in view of the fact that \(\vert \check{{\Psi }}^\varepsilon (t,s){\vert }_{T} \leq 1\), it is easy to see that

$$\vert \widetilde{\beta }{\vert }_{T} \leq (1 + T)\vert \beta {\vert }_{T}(1 + \vert \widehat{Q}{\vert }_{T}).$$
(5.82)

Note that n ε ( ⋅) can be written in terms of χ ε ( ⋅) and \({\overline{\chi }}^\varepsilon (\cdot )\) as

$${n}^\varepsilon (\varsigma ) = \frac{1} {\sqrt\varepsilon }{\int }_{0}^{\varsigma }({\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t))\beta (t)dt.$$

By virtue of (5.81), it follows that

$$\vert {n}^\varepsilon (\varsigma )\vert \leq \frac{1} {\sqrt\varepsilon }\left \vert {\int }_{0}^{\varsigma }{\eta }^\varepsilon (t)\widetilde{\beta }(t)dt\right \vert + \frac{1} {\sqrt\varepsilon }\left \vert {\int }_{0}^{\varsigma }{\eta }_{ 1}^\varepsilon (t)\widetilde{\beta }(t)dt\right \vert.$$

Note that in view of the definition of η1 ε( ⋅) in (5.76),

$$\vert {\eta }_{1}^\varepsilon (t)\vert ={ \int }_{0}^{t}O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right )ds = O(\varepsilon (t + 1)).$$

Thus, in view of (5.82),

$$\begin{array}{rl} {\sup }_{0\leq \varsigma \leq T}\left \vert {\int }_{0}^{\varsigma }{\eta }_{ 1}^\varepsilon (t)\widetilde{\beta }(t)dt\right \vert & = \vert \widetilde{\beta }{\vert {}_{ T}\sup }_{0\leq \varsigma \leq T}{ \int }_{0}^{\varsigma }O(\varepsilon (t + 1))dt \\ & = \vert \widetilde{\beta }{\vert {}_{T}\sup }_{0\leq \varsigma \leq T}O(\varepsilon ({\varsigma }^{2} + \varsigma )) \\ & = \vert \widetilde{\beta }{\vert }_{T}({T}^{2} + T)O(\varepsilon ) \\ & \leq {(1 + T)}^{3}\vert \beta {\vert }_{ T}(1 + \vert \widehat{Q}{\vert }_{T})O(\varepsilon ).\end{array}$$
(5.83)

Thus, in view of (5.63) and (5.66), for some ε0  > 0, and all 0 < ε ≤ ε 0,

$$\begin{array}{l} \exp \left ({\frac{{\theta }_{T}} {{(T + 1)}^{3}}\sup }_{0\leq \varsigma \leq T}{\left |{ \int }_{0}^{\varsigma }\frac{{\eta }_{1}^\varepsilon (t)\widetilde{\beta }(t)} {\sqrt\varepsilon } dt\right |}\right ) \\ \quad \leq \exp \left ( \frac{O(\sqrt\varepsilon )\min \{1,{\kappa }_{0}\}} {{K}_{T}} \right ) \\ \quad \leq \exp \left(O(\sqrt{\varepsilon _{0}})\min \{1,{\kappa }_{0}\}\right) \leq K.\end{array}$$
(5.84)

Moreover, using (5.64), as in the proof of Theorem  5.4, we obtain that

$$E\exp \left ({ \frac{{\theta }_{T}} {{(T + 1)}^{\frac{3} {2} }} \sup }_{0\leq \varsigma \leq T}{\left |{ \int }_{0}^{\varsigma }\left (\frac{{\eta }^\varepsilon (t)\widetilde{\beta }(t)} {\sqrt\varepsilon } \right )dt\right |}\right ) \leq K,$$
(5.85)

for

$$0 \leq {\theta }_{T} \leq \frac{\min \{1,{\kappa }_{0}\}} {{K}_{T}\vert \beta {\vert }_{T}(1 + \vert \widehat{Q}{\vert }_{T})}.$$

Finally, combine (5.81), (5.83), and (5.85) to obtain

$$E\exp \left ({ \frac{{\theta }_{T}} {{(T + 1)}^{3}}\sup }_{0\leq t\leq T}\vert {n}^\varepsilon (t)\vert \right ) \leq K.$$

This completes the proof. □ 

Remark 5.31

It is easily seen that the error bound so obtained has a form similar to that of the martingale inequality. If nε(⋅) were a martingale, the inequality would be obtained much more easily since exp (⋅) is a convex function. As in Section 5.2, the error bound is still a measure of “goodness” of approximation. However, one cannot compare the unscaled occupation measures with a deterministic function. A sensible alternative is to use an approximation by the aggregated process that is no longer deterministic. The exponential bounds obtained tell us exactly how closely one can carry out the approximation. They should be particularly useful for many applications in stochastic control problems with Markovian jump disturbances under discounted cost criteria.

The next two corollaries show that the error bound can be improved under additional conditions by having smaller exponential constants, e.g., \({(T + 1)}^{3/2}\) or \({(T + 1)}^{5/2}\) instead of \({(T + 1)}^{3}\).

Corollary 5.32

Assume that the conditions of Theorem  5.29 hold. Let \(\widetilde{Q}(t) = (\widetilde{{q}}_{ij}(t))\) and \(\overline{Q}(t) = ({\overline{q}}_{ij}(t))\) denote generators with the corresponding state spaces \(\{{a}_{1},\ldots,{a}_{{m}_{0}}\}\) and {1,…,l}, respectively. Consider

$${Q}^\varepsilon (t) = \frac{1} \varepsilon \left (\begin{array}{ccc} \widetilde{Q}(t)&& \\ &\ddots &\\ & &\widetilde{Q}(t) \\ \end{array} \right )+\left (\begin{array}{ccc} {\overline{q}}_{11}(t){I}_{{m}_{0}} & \cdots &{\overline{q}}_{1l}(t){I}_{{m}_{0}} \\ \vdots & \cdots & \vdots \\ {\overline{q}}_{l1}(t){I}_{{m}_{0}} & \cdots & {\overline{q}}_{ll}(t){I}_{{m}_{0}}\\ \end{array} \right ),$$

where \({I}_{{m}_{0}}\) is the m 0 × m 0 identity matrix. Then there exist positive constants ε 0 and K such that for 0 < ε ≤ ε 0 , and T ≥ 0,

$$E\exp \left ({ \frac{{\theta }_{T}} {{(T + 1)}^{\frac{3} {2} }} \sup }_{0\leq t\leq T}\vert {n}^\varepsilon (t)\vert \right ) \leq K.$$

Proof: Under the special structure of the generator Q ε, it is easy to see that

$$\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}} =\widetilde{ \mathrm{1}\mathrm{l}}\overline{Q}(s),$$

where \(\widetilde{\mathrm{1}\mathrm{l}}\) now takes the form

$$\widetilde{\mathrm{1}\mathrm{l}} = \mathrm{diag}(\mathrm{1}{\mathrm{l}}_{{m}_{0}},\ldots,\mathrm{1}{\mathrm{l}}_{{m}_{0}}).$$

Note that under current conditions on the fast-changing part of the generator \(\widetilde{Q}(t)\),

$${\nu }^{1}(t) = {\nu }^{2}(t) = \cdots = {\nu }^{l}(t)\mbox{ and }\mbox{ diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t))\widetilde{\mathrm{1}\mathrm{l}} = {I}_{ l},$$

where I l denotes the l-dimensional identity matrix. This together with (5.72) implies that

$$({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}} = 0.$$

It follows from (5.71) that

$${\int }_{0}^{t}({\chi }^\varepsilon (s) -{\overline{\chi }}^\varepsilon (s))\widehat{Q}(s){P}_{ 0}(t,s)ds = 0.$$

Then (5.75) becomes

$${\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t) = {\eta }^\varepsilon (t).$$

The rest of the proof follows exactly that of Theorem  5.29. □ 
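The special structure of the generator in Corollary  5.32 is conveniently assembled with Kronecker products, and the identity \(\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}} =\widetilde{ \mathrm{1}\mathrm{l}}\overline{Q}(s)\) used in the proof can be checked numerically. The following is a minimal sketch with hypothetical constant generators (NumPy assumed); it is purely illustrative.

```python
# Numerical check (hypothetical generators) of the structure in Corollary 5.32:
# Q_eps = (1/eps) diag(Qtilde, ..., Qtilde) + (qbar_ij * I_{m0}) and the identity
# Qhat @ One_tilde = One_tilde @ Qbar, with One_tilde = diag(1l_{m0}, ..., 1l_{m0}).
import numpy as np

m0, l, eps = 2, 3, 0.01

# hypothetical constant generators (rows sum to zero, off-diagonal entries nonnegative)
Qtilde = np.array([[-1.0,  1.0],
                   [ 2.0, -2.0]])             # fast part, m0 x m0
Qbar = np.array([[-0.5,  0.3,  0.2],
                 [ 0.4, -0.9,  0.5],
                 [ 0.1,  0.6, -0.7]])          # slow (aggregated) part, l x l

Q_fast = np.kron(np.eye(l), Qtilde) / eps      # (1/eps) diag(Qtilde, ..., Qtilde)
Q_hat  = np.kron(Qbar, np.eye(m0))             # block matrix (qbar_ij * I_{m0})
Q_eps  = Q_fast + Q_hat                        # full generator of Corollary 5.32

One_tilde = np.kron(np.eye(l), np.ones((m0, 1)))   # diag(1l_{m0}, ..., 1l_{m0})

# the identity used in the proof
assert np.allclose(Q_hat @ One_tilde, One_tilde @ Qbar)
print("rows of Q_eps sum to zero:", np.allclose(Q_eps.sum(axis=1), 0.0))
```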

Corollary 5.33

Assume the conditions of Theorem  5.29 . Suppose \(\widetilde{Q}(t) =\widetilde{ Q}\) and \(\widehat{Q}(t) =\widehat{ Q}\) for some constant matrices \(\widetilde{Q}\) and \(\widehat{Q}\) . Then there exist positive constants ε 0 and K such that for 0 < ε ≤ ε 0 , and T ≥ 0,

$$E\exp \left ({ \frac{{\theta }_{T}} {{(T + 1)}^{\frac{5} {2} }} \sup }_{0\leq t\leq T}\vert {n}^\varepsilon (t)\vert \right ) \leq K.$$

Remark 5.34

Note that in view of Corollary  4.31 , one can show under the condition \(\widetilde{Q}(t) =\widetilde{ Q}\) and \(\widehat{Q}(t) =\widehat{ Q}\) that there exists a constant K such that

$${P}^\varepsilon (t,s) - {P}_{ 0}(t,s) = K(T + 1){O}_{1}\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - s)} \varepsilon \right )\right ).$$

In this case, θ T can be taken as

$$0 \leq {\theta }_{T} \leq \frac{\min \{1,{\kappa }_{0}\}} {K(T + 1)\vert \beta {\vert }_{T}(1 + \vert \widehat{Q}{\vert }_{T})}.$$

That is, compared with the general result, the constant K T can be further specified as \({K}_{T} = K(T + 1)\).

Proof of Corollary  5.33: Note that when the generators are time independent, the quasi-stationary distribution νi(t) is also independent of time and is denoted by νi. In this case, the argument from (5.75) to (5.80) can be replaced by the following. Let

$$\check{{Q}}_{0} =\widehat{ Q}\widetilde{\mathrm{1}\mathrm{l}}\mathrm{diag}({\nu }^{1},\ldots,{\nu }^{l}).$$

Then it can be shown that

$$\widehat{Q}\widetilde{\mathrm{1}\mathrm{l}}{\left (\overline{Q}\right )}^{k}\mathrm{diag}({\nu }^{1},\ldots,{\nu }^{l}) = {(\check{{Q}}_{ 0})}^{k+1},\mbox{ for }k \geq 0.$$

This implies that

$$\begin{array}{ll} \widehat{Q}{P}_{0}(t,s)& =\widehat{ Q}\widetilde{\mathrm{1}\mathrm{l}}\exp \left (\overline{Q}(t - s)\right )\mathrm{diag}({\nu }^{1},\ldots,{\nu }^{l}) \\ & =\check{ {Q}}_{0}\exp (\check{{Q}}_{0}(t - s))\end{array}$$

Let \({\phi }^\varepsilon (t) = ({\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t)) - {\eta }^\varepsilon (t)\). Then ϕε( ⋅) satisfies the equation

$${\phi }^\varepsilon (t) +{ \int }_{0}^{t}({\phi }^\varepsilon (s) + {\eta }^\varepsilon (s))\check{{Q}}_{ 0}\exp (\check{{Q}}_{0}(t - s))ds = 0.$$

Solving for ϕε( ⋅), we obtain

$${\phi }^\varepsilon (t) = -{\int }_{0}^{t}{\eta }^\varepsilon (s)\check{{Q}}_{ 0}ds.$$

Writing \({\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t)\) in terms of ϕε (t) and η ε(t) yields,

$${\chi }^\varepsilon (t) -{\overline{\chi }}^\varepsilon (t) = {\eta }^\varepsilon (t) -{\int }_{0}^{t}{\eta }^\varepsilon (s)\check{{Q}}_{ 0}ds.$$

The rest of the proof follows that of Theorem  5.29. □ 

As in Section 5.2, one can derive estimates analogous to Corollary  5.7 and Corollary  5.8; however, the details are omitted.

3.3 Asymptotic Distributions

In Section 5.2, we obtained a central limit theorem for a class of Markov chains generated by \({Q}^\varepsilon (t) = Q(t)/\varepsilon +\widehat{ Q}(t)\) with a weakly irreducible Q( t). In this case for sufficiently small ε > 0, Q ε(t) is weakly irreducible. What, if anything, can be said about the weak and strong interaction models, when \(\widetilde{Q}(t)\) is not weakly irreducible? Is there a central limit theorem for the corresponding occupation measure when one has a singularly perturbed Markov chain with weak and strong interactions? This section deals with such an issue; our interest lies in the asymptotic distribution as ε → 0. It is shown that the asymptotic distribution of the corresponding occupation measure can be obtained. However, the limit distribution is no longer Gaussian, but a Gaussian mixture, and the proof is quite different from that of the irreducible case in Section 5.2.

For each i = 1,…,l, j = 1,…,m i , \(\alpha \in \mathcal{M}\) , and t ≥ 0, let β ij (t) be a bounded Borel measurable deterministic function. Use W ij (t, α) defined in (5.61) and the normalized occupation measure

$${n}^\varepsilon (t) = \left ({n}_{ 11}^\varepsilon (t),\ldots,{n}_{ 1{m}_{1}}^\varepsilon (t),\ldots,{n}_{ l1}^\varepsilon (t),\ldots,{n}_{ l{m}_{l}}^\varepsilon (t)\right ),$$

with

$${n}_{ij}^\varepsilon (t) = \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}{W}_{ ij}(s,{\alpha }^\varepsilon (s))ds.$$
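In numerical experiments, n ij ε(t) can be approximated from a sampled path of αε( ⋅) by a Riemann sum. The sketch below is illustrative only: the arguments path, nu, and beta are hypothetical placeholders for a path recorded on a uniform time grid, the quasi-stationary distribution ν j i(t), and the weight β ij (t); NumPy is assumed.

```python
# Sketch (hypothetical interfaces) of computing n_ij^eps(t) from a recorded path.
import numpy as np

def n_eps(t, grid, path, i, j, nu, beta, eps):
    """Riemann-sum approximation of n_ij^eps(t) = eps^{-1/2} int_0^t W_ij(s, alpha^eps(s)) ds.

    grid : uniform time grid; path[k] = (i, j) gives the state s_ij at time grid[k]
    nu   : function (t, i, j) -> nu_j^i(t);  beta : function (t, i, j) -> beta_ij(t)
    """
    dt = grid[1] - grid[0]
    total = 0.0
    for k, s in enumerate(grid):
        if s >= t:
            break
        ii, jj = path[k]
        ind_state = 1.0 if (ii, jj) == (i, j) else 0.0   # I{alpha^eps(s) = s_ij}
        ind_group = 1.0 if ii == i else 0.0              # I{alpha^eps(s) in M_i}
        W = (ind_state - nu(s, i, j) * ind_group) * beta(s, i, j)
        total += W * dt
    return total / np.sqrt(eps)
```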

We will show in this section that n ε( ⋅) converges weakly to a switching diffusion modulated by \(\overline{\alpha }(\cdot )\). The procedure is as follows:

  (a) show that \(({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))\) is tight;

  (b) verify that the limit of a subsequence of \(({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))\) is a solution to a martingale problem that has a unique solution;

  (c) characterize the solution of the associated martingale problem;

  (d) construct a switching diffusion that is also a solution to the martingale problem and therefore the limit of \(({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))\).

These steps are carried out by proving a series of lemmas. Recall that \({\mathcal{F}}_{t}^\varepsilon = \sigma \{{\alpha }^\varepsilon (s) :\; 0 \leq s \leq t\}\) denotes the σ-algebra generated by αε( ⋅) up to time t. The lemma below provides order estimates of the conditional moments and is useful for obtaining the tightness result in what follows.

Lemma 5.35

Assume (A5.5) and (A5.6) . Then for all 0 ≤ s ≤ t ≤ T and ε small enough, the following hold:

$$\begin{array}{l} {\mbox{ (a) }\sup }_{s\leq t\leq T}E[{n}^\varepsilon (t) - {n}^\varepsilon (s)\vert {\mathcal{F}}_{ s}^\varepsilon ] = O(\sqrt\varepsilon ); \\ {\mbox{ (b) }\sup }_\varepsilon E\left [\;\vert {n}^\varepsilon (t) - {n}^\varepsilon (s){\vert }^{2}\vert {\mathcal{F}}_{ s}^\varepsilon \right ] = O(t - s)\end{array}$$

Proof: First, note that for any fixed i, j,

$$E[({n}_{ij}^\varepsilon (t) - {n}_{ ij}^\varepsilon (s))\vert {\mathcal{F}}_{ s}^\varepsilon ] = \frac{1} {\sqrt\varepsilon }{\int }_{s}^{t}E[{W}_{ ij}(r,{\alpha }^\varepsilon (r))\vert {\mathcal{F}}_{ s}^\varepsilon ]dr.$$

Moreover, in view of the definition of W ij (t, α) and the Markov property, we have, for 0 ≤ s ≤  r,

$$\begin{array}{l} E[{W}_{ij}(r,{\alpha }^\varepsilon (r))\vert {\mathcal{F}}_{ s}^\varepsilon ] \\ \quad = E\left [\left ({I}_{\{{\alpha }^\varepsilon (r)={s}_{ij}\}} - {\nu }_{j}^{i}(r){I}_{\{{ \alpha }^\varepsilon (r)\in {\mathcal{M}}_{i}\}}\right )\vert {\mathcal{F}}_{s}^\varepsilon \right ]{\beta }_{ ij}(r) \\ \quad = \left (P({\alpha }^\varepsilon (r) = {s}_{ ij}\vert {\mathcal{F}}_{s}^\varepsilon ) - {\nu }_{ j}^{i}(r)P({\alpha }^\varepsilon (r) \in {\mathcal{M}}_{ i}\vert {\mathcal{F}}_{s}^\varepsilon )\right ){\beta }_{ ij}(r) \\ \quad = \left (P({\alpha }^\varepsilon (r) = {s}_{ ij}\vert {\alpha }^\varepsilon (s)) - {\nu }_{ j}^{i}(r)P({\alpha }^\varepsilon (r) \in {\mathcal{M}}_{ i}\vert {\alpha }^\varepsilon (s))\right ){\beta }_{ ij}(r)\end{array}$$

In view of Lemma  5.24, in particular, similar to (5.55) and (5.56), for all i 0 = 1,…,l and \({j}_{0} = 1,\ldots,{m}_{{i}_{0}}\),

$$\begin{array}{l} P({\alpha }^\varepsilon (r) = {s}_{ ij}\vert {\alpha }^\varepsilon (s) = {s}_{{ i}_{0}{j}_{0}}) - {\nu }_{j}^{i}(r)P({\alpha }^\varepsilon (r) \in {\mathcal{M}}_{ i}\vert {\alpha }^\varepsilon (s) = {s}_{{ i}_{0}{j}_{0}}) \\ = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(r - s)} \varepsilon \right )\right )\end{array}$$

Thus owing to Lemma  A.42, we have

$$\begin{array}{rl} &\left (P({\alpha }^\varepsilon (r) = {s}_{ ij}\vert {\alpha }^\varepsilon (s)) - {\nu }_{ j}^{i}(r)P({\alpha }^\varepsilon (r) \in {\mathcal{M}}_{ i}\vert {\alpha }^\varepsilon (s))\right ){\beta }_{ ij}(r) \\ &\quad = \sum \limits_{{i}_{0}=1}^{l} \sum \limits_{{j}_{0}=1}^{{m}_{{i}_{0}} }{I}_{\{{\alpha }^\varepsilon (s)={s}_{{i}_{ 0}{j}_{0}}\}}\left (P({\alpha }^\varepsilon (r) = {s}_{ ij}\vert {\alpha }^\varepsilon (s) = {s}_{{ i}_{0}{j}_{0}})\right. \\ &\left.\quad \quad \quad - {\nu }_{j}^{i}(r)P({\alpha }^\varepsilon (r) \in {\mathcal{M}}_{ i}\vert {\alpha }^\varepsilon (s) = {s}_{{ i}_{0}{j}_{0}})\right ){\beta }_{ij}(r) \\ &\quad = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(r - s)} \varepsilon \right )\right )\end{array}$$

Note also that

$$\frac{1} {\sqrt\varepsilon }{\int }_{s}^{t}O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(r - s)} \varepsilon \right )\right )dr = O(\sqrt\varepsilon ).$$

This implies (a).
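To see the last estimate explicitly, one may integrate the exponential term:

$${\int }_{s}^{t}\exp \left ( -\frac{{\kappa }_{0}(r - s)}{\varepsilon }\right )dr = \frac{\varepsilon }{{\kappa }_{0}}\left (1 -\exp \left ( -\frac{{\kappa }_{0}(t - s)}{\varepsilon }\right )\right ) \leq \frac{\varepsilon }{{\kappa }_{0}},$$

so that

$$\frac{1} {\sqrt\varepsilon }{\int }_{s}^{t}O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(r - s)}{\varepsilon }\right )\right )dr = O\left (\sqrt\varepsilon \,(t - s)\right ) + O\left (\frac{\sqrt\varepsilon } {{\kappa }_{0}}\right ) = O(\sqrt\varepsilon )$$

for 0 ≤ s ≤ t ≤ T.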

To verify (b), fix i and j, suppress them in the notation, and define

$${\eta }^\varepsilon (t) = E\left [{\left ({\int }_{s}^{t}{W}_{ij}(r,{\alpha }^\varepsilon (r))dr\right )}^{2}\Big\vert {\mathcal{F}}_{s}^\varepsilon \right ].$$

Then by the definition of n ij ( ⋅),

$$E\left [{\left ({n}_{ij}^\varepsilon (t) - {n}_{ij}^\varepsilon (s)\right )}^{2}\Big\vert {\mathcal{F}}_{s}^\varepsilon \right ] = \frac{{\eta }^\varepsilon (t)}{\varepsilon }.$$
(5.86)

In accordance with the definition of \({\overline{\alpha }}^\varepsilon (\cdot )\),\({\overline{\alpha }}^\varepsilon (t) = i\) iff \({\alpha }^\varepsilon (t) \in {\mathcal{M}}_{i}\) . In what follows, we use \({\alpha }^\varepsilon (t) \in {\mathcal{M}}_{i}\) and \({\overline{\alpha }}^\varepsilon (t) = i\) interchangeably. Set

$$\begin{array}{rl} &{\Psi }_{1}^\varepsilon (t,r) = {I}_{\{{ \alpha }^\varepsilon (r)={s}_{ij}\}}{I}_{\{{\alpha }^\varepsilon (t)={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{{ \alpha }^\varepsilon (r)={s}_{ij}\}}{I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}}, \\ &{\Psi }_{2}^\varepsilon (t,r) = -{\nu }_{ j}^{i}(r){I}_{\{{\overline{\alpha }}^{ \varepsilon }(r)=i\}}{I}_{\{{\alpha }^\varepsilon (t)={s}_{ij}\}} + {\nu }_{j}^{i}(r){\nu }_{ j}^{i}(t){I}_{\{{\overline{\alpha }}^{ \varepsilon }(r)=i\}}{I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}}\end{array}$$

Then as in the proof of Theorem  5.25,

$${ d{\eta }^\varepsilon (t) \over dt} = 2{\int }_{s}^{t}E\left [{\Psi }_{ 1}^\varepsilon (t,r) + {\Psi }_{ 2}^\varepsilon (t,r)\vert {\mathcal{F}}_{ s}^\varepsilon \right ]{\beta }_{ ij}(r){\beta }_{ij}(t)dr.$$

Using Lemma  5.24, we obtain

$$\begin{array}{rl} &E[{\Psi }_{1}^\varepsilon (t,r)\vert {\alpha }^\varepsilon (s) = {s}_{{ i}_{0}{j}_{0}}] = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ), \\ &E[{\Psi }_{2}^\varepsilon (t,r)\vert {\alpha }^\varepsilon (s) = {s}_{{ i}_{0}{j}_{0}}] = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ), \end{array}$$

for all i 0 = 1, …,  l and \({j}_{0} = 1,\ldots,{m}_{{i}_{0}}\). Then from Lemma  A.42, we obtain

$$\begin{array}{rl} &E[{\Psi }_{1}^\varepsilon (t,r)\vert {\mathcal{F}}_{ s}^\varepsilon ] = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right ), \\ &E[{\Psi }_{2}^\varepsilon (t,r)\vert {\mathcal{F}}_{ s}^\varepsilon ] = O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}(t - r)} \varepsilon \right )\right )\end{array}$$

As a consequence, we have

$$\frac{d{\eta }^\varepsilon (t)} {dt} = O(\varepsilon ).$$

Integrating both sides over [s,  t] and recalling ηε(s) = 0 yields

$$\frac{{\eta }^\varepsilon (t)} \varepsilon = O(t - s).$$

This completes the proof of the lemma. □ 

The next lemma is concerned with the tightness of \(\{({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))\}\).

Lemma 5.36

Assume (A5.5) and (A5.6) . Then \(\{({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))\}\) is tight in \(D([0,T]; {\mathbb{R}}^{m} \times \overline{\mathcal{M}})\).

Proof: The proof uses Lemma  A.17. We first verify that the condition given in Remark  A.18 holds. To this end, note that \(0 \leq {\overline{\alpha }}^\varepsilon (t) \leq l\) for all t ∈ [0,  T]. Moreover, by virtue of Theorem  5.25, for each δ > 0 and each rational t ≥ 0,

$$\begin{array}{rl} {\inf }_\varepsilon P\left (\vert {n}^\varepsilon (t)\vert \leq {K}_{ t,\delta }\right )& {=\inf }_\varepsilon [1 - P(\vert {n}^\varepsilon (t)\vert \geq {K}_{ t,\delta })] \\ & {\geq \inf }_\varepsilon \left (1 -\frac{E\vert {n}^\varepsilon (t){\vert }^{2}} {{K}_{t,\delta }^{2}} \right ) \\ & \geq 1 - \frac{Kt} {{K}_{t,\delta }^{2}}, \end{array}$$

where the last inequality is due to Theorem  5.25. Thus if we choose \({K}_{t,\delta } > \sqrt{KT/\delta }\), (A.6) will follow.

It follows from Lemma  5.35 and (5.58) that for all t ∈ [0, T],

$$\begin{array}{l} \lim \limits_{\Delta \rightarrow 0}\left \{\limsup \limits_{\varepsilon \rightarrow 0}\left ({\sup }_{0\leq s\leq \Delta }E\left \{E\left [\;\vert {n}_{ij}^\varepsilon (t + s) - {n}_{ij}^\varepsilon (t){\vert }^{2}\vert {\mathcal{F}}_{t}^\varepsilon \right ]\right \}\right )\right \} = 0, \\ \lim \limits_{\Delta \rightarrow 0}\left \{\limsup \limits_{\varepsilon \rightarrow 0}\left ({\sup }_{0\leq s\leq \Delta }E\left \{E\left [\;\vert {\overline{\alpha }}^\varepsilon (t + s) -{\overline{\alpha }}^\varepsilon (t){\vert }^{2}\vert {\mathcal{F}}_{t}^\varepsilon \right ]\right \}\right )\right \} = 0.\end{array}$$
(5.87)

Using (5.86) and (5.87), Lemma  A.17 yields the desired result. □ 

The tightness of \(({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))\) and Prohorov’s theorem allow one to extract convergent subsequences. We next show that the limit of such a subsequence is uniquely determined in distribution. An equivalent statement is that the associated martingale problem has a unique solution. The following lemma is a generalization of Theorem  5.25 and is needed for proving such a uniqueness property.

Lemma 5.37

Let ξ(t,x) be a real-valued function that is Lipschitz in (t,x) \(\in {\mathbb{R}}^{m+1}\) . Then

$${\sup }_{0\leq \varsigma \leq T}E{\left \vert {\int }_{0}^{\varsigma }{W}_{ij}(s,{\alpha }^\varepsilon (s))\xi (s,{n}^\varepsilon (s))ds\right \vert }^{2} \rightarrow 0,$$

where \({W}_{ij}(t,\alpha ) = ({I}_{\{\alpha ={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{\alpha \in {\mathcal{M}}_{i}\}}){\beta }_{ij}(t)\) as defined in (5.61).

Remark 5.38

This lemma indicates that the weighted occupation measure defined above (with weighting function ξ(t,nε(t))) goes to zero in mean square uniformly in ς ∈ [0,T]. If ξ(⋅) were a bounded and measurable deterministic function not depending on αε(⋅) or nε(⋅), this assertion would follow easily from Theorem  5.25. In the current situation, ξ(⋅) is a function of n ε (⋅) and therefore of α ε (⋅), which causes much of the difficulty. Intuitively, if we can “separate” the functions W ij (⋅) and ξ(⋅), in the sense of treating ξ(⋅) as deterministic, then Theorem  5.25 can be applied to obtain the desired limit. To do so, subdivide the interval [0,ς] into small subintervals so that on each of them the two functions can be separated. To be more specific, on each subinterval, use a piecewise-constant function to approximate ξ(⋅), and show that the resulting error goes to zero. In this process, the Lipschitz condition on ξ(t,x) plays a crucial role.
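To make this device concrete, the following minimal sketch builds the mesh \({t}_{k} = \varepsilon ^{1-\delta }k\) and the frozen approximation \(\widetilde{\xi }\) used in the proof below. It is illustrative only; the argument grid_values, standing for the recorded values of nε( ⋅), is a hypothetical placeholder.

```python
# Sketch of the partitioning device: freeze xi(t, n^eps(t)) at the value taken two
# subintervals earlier, so that on each subinterval the frozen factor is measurable
# with respect to F_{t_{k-1}} and can be treated as deterministic.
def piecewise_freeze(xi, grid_values, varsigma, eps, delta):
    """Return the piecewise-constant approximation tilde{xi} as a function of t.

    xi          : function (t, x) -> float, assumed Lipschitz in (t, x)
    grid_values : function t -> x, a recorded path of n^eps (placeholder interface)
    """
    h = eps ** (1 - delta)                      # mesh of the partition
    N = int(varsigma / h)
    t_k = [h * k for k in range(N + 1)] + [varsigma]

    def xi_tilde(t):
        if t < t_k[2]:                          # first two subintervals: freeze at time 0
            return xi(0.0, grid_values(0.0))
        k = min(int(t / h), N)                  # t lies in [t_k, t_{k+1})
        return xi(t_k[k - 1], grid_values(t_k[k - 1]))

    return xi_tilde
```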

Proof of Lemma  5.37: For 0 < δ < 1 and 0 < ς ≤ T, let \(N = [\varsigma /\varepsilon ^{1-\delta }]\). Use the partition of [0, ς] given by

$$[{t}_{0},{t}_{1}) \cup [{t}_{1},{t}_{2}) \cup \cdots \cup [{t}_{N},{t}_{N+1}],$$

where \({t}_{k} = \varepsilon ^{1-\delta }k\) for k = 0, 1,…,N and \({t}_{N+1} = \varsigma \) . Consider a piecewise-constant function

$$\widetilde{\xi }(t) = \left \{\begin{array}{ll} \xi (0,{n}^\varepsilon (0)), &\mbox{ if }0 \leq t < {t}_{2}, \\ \xi ({t}_{k-1},{n}^\varepsilon ({t}_{k-1})), &\mbox{ if }{t}_{k} \leq t < {t}_{k+1},\;k = 2,\ldots,N, \\ \xi ({t}_{N-1},{n}^\varepsilon ({t}_{N-1})),&\mbox{ if }t = {t}_{N+1}. \end{array} \right.$$

Let W ij ε(t) = W ij (t, α ε(t)). Then

$$\begin{array}{l} E{ |{ \int }_{0}^{\varsigma }{W}_{ ij}^\varepsilon (t)\xi (t,{n}^\varepsilon (t))d{t |}}^{2} \\ \leq 2E{ |{\int }_{0}^{\varsigma }{W}_{ ij}^\varepsilon (t)\vert \xi (t,{n}^\varepsilon (t)) -\widetilde{ \xi }(t)\vert d{t |}}^{2} + 2E{ |{\int }_{0}^{\varsigma }{W}_{ ij}^\varepsilon (t)\widetilde{\xi }(t)d{t |}}^{2}.\end{array}$$
(5.88)

We now estimate the first term on the second line above. In view of the Cauchy inequality and the boundedness of W ij ε (t), it follows, for 0 ≤ ς ≤ T, that

$$\begin{array}{rl} E{ |{\int }_{0}^{\varsigma }{W}_{ ij}^\varepsilon (t)\vert \xi (t,{n}^\varepsilon (t)) -\widetilde{ \xi }(t)\vert d{t |}}^{2} & \leq TE{\int }_{0}^{\varsigma }{(\xi (t,{n}^\varepsilon (t)) -\widetilde{ \xi }(t))}^{2}dt \\ & = T{\int }_{0}^{\varsigma }E{(\xi (t,{n}^\varepsilon (t)) -\widetilde{ \xi }(t))}^{2}dt.\end{array}$$

Note that Theorem  5.25 implies

$$E\vert {n}^\varepsilon (t){\vert }^{2} \leq K,$$

for a positive constant K and for all t ∈ [0, T]. Therefore, in view of the Lipschitz condition of ξ( ⋅), we have

$$E\vert \xi (t,{n}^\varepsilon (t))\vert \leq K(1 + E\vert {n}^\varepsilon (t)\vert ) \leq K(1 + {(E\vert {n}^\varepsilon (t){\vert }^{2})}^{\frac{1} {2} }) = O(1).$$

Noting that \({t}_{2} = 2\varepsilon ^{1-\delta } = O(\varepsilon ^{1-\delta })\) , it follows that

$$\begin{array}{l} { \int }_{0}^{\varsigma }E{(\xi (t,{n}^\varepsilon (t)) -\widetilde{ \xi }(t))}^{2}dt \\ \quad = \sum \limits_{k=2}^{N}{ \int }_{{t}_{k}}^{{t}_{k+1} }E{(\xi (t,{n}^\varepsilon (t)) -\widetilde{ \xi }(t))}^{2}dt + O(\varepsilon ^{1-\delta })\end{array}$$

Using the definition of \(\widetilde{\xi }(t)\) , the Lipschitz property of ξ( t, x) in ( t, x), the choice of the partition of [0, ς], and Lemma  5.35, we have

$$\begin{array}{l} \sum \limits_{k=2}^{N}{ \int }_{{t}_{k}}^{{t}_{k+1} }E{(\xi (t,{n}^\varepsilon (t)) -\widetilde{ \xi }(t))}^{2}dt \\ \quad = \sum \limits_{k=2}^{N}{ \int }_{{t}_{k}}^{{t}_{k+1} }E{(\xi (t,{n}^\varepsilon (t)) - \xi ({t}_{ k-1},{n}^\varepsilon ({t}_{ k-1})))}^{2}dt \\ \quad \leq 2{\sum }_{k=2}^{N}{ \int }_{{t}_{k}}^{{t}_{k+1} }K\left ({(t - {t}_{k-1})}^{2} + E\vert {n}^\varepsilon (t) - {n}^\varepsilon ({t}_{ k-1})){\vert }^{2}\right )dt \\ \quad \leq 2{\sum }_{k=2}^{N}{ \int }_{{t}_{k}}^{{t}_{k+1} }K\left ({(t - {t}_{k-1})}^{2} + O(t - {t}_{ k-1})\right )dt \\ \quad = 2{\sum }_{k=2}^{N}{ \int }_{{t}_{k}}^{{t}_{k+1} }O(\varepsilon ^{1-\delta })dt = O(\varepsilon ^{1-\delta })\end{array}$$

Let us estimate the second term on the second line in (5.88). Set

$$\widetilde{{\eta }}^\varepsilon (t) = E{\left ({\int }_{0}^{t}{W}_{ ij}^\varepsilon (s)\widetilde{\xi }(s)ds\right )}^{2}.$$

Then the derivative of \(\widetilde{{\eta }}^\varepsilon (t)\) is given by

$$\begin{array}{l} \frac{d\widetilde{{\eta }}^\varepsilon (t)} {dt} = 2E{\int }_{0}^{t}{W}_{ ij}^\varepsilon (s)\widetilde{\xi }(s){W}_{ ij}^\varepsilon (t)\widetilde{\xi }(t)ds \\ \quad = 2{\int }_{0}^{t}E\left ({W}_{ ij}^\varepsilon (s)\widetilde{\xi }(s){W}_{ ij}^\varepsilon (t)\widetilde{\xi }(t)\right )ds\end{array}$$

For 0 ≤  t ≤ t 2, in view of the Lipschitz property and Theorem  5.25, we obtain

$$\begin{array}{rl} {\int }_{0}^{{t}_{2} }E\left ({W}_{ij}^\varepsilon (s)\widetilde{\xi }(s){W}_{ ij}^\varepsilon (t)\widetilde{\xi }(t)\right )ds& \leq {\int }_{0}^{{t}_{2} }E\left (\vert \widetilde{\xi }(s)\vert \cdot \vert \widetilde{\xi }(t)\vert \right )ds \\ & \leq {\int }_{0}^{{t}_{2} }{(E\vert \widetilde{\xi }(s){\vert }^{2})}^{\frac{1} {2} }{(E\vert \widetilde{\xi }(t){\vert }^{2})}^{\frac{1} {2} }ds \\ & = O({t}_{2}) = O(\varepsilon ^{1-\delta })\end{array}$$

If t k  ≤ t < t k + 1, for k = 2,…,N, then using the same argument gives us

$$\begin{array}{l} { \int }_{{t}_{k-1}}^{t}E\left ({W}_{ij}^\varepsilon (s)\widetilde{\xi }(s){W}_{ij}^\varepsilon (t)\widetilde{\xi }(t)\right )ds \\ \quad = O(t - {t}_{k-1}) = O({t}_{k+1} - {t}_{k-1}) = O(\varepsilon ^{1-\delta }) \end{array}$$

and

$$\frac{d\widetilde{{\eta }}^\varepsilon (t)} {dt} = 2{\int }_{0}^{{t}_{k-1} }E\left ({W}_{ij}^\varepsilon (s)\widetilde{\xi }(s){W}_{ ij}^\varepsilon (t)\widetilde{\xi }(t)\right )ds + O(\varepsilon ^{1-\delta }).$$

Recall that \({\mathcal{F}}_{t}^\varepsilon = \sigma \{{\alpha }^\varepsilon (s) :\; 0 \leq s \leq t\}\) . For \(s \leq {t}_{k-1} < {t}_{k} \leq t < {t}_{k+1}\),

$$\begin{array}{l} E\left ({W}_{ij}^\varepsilon (s)\widetilde{\xi }(s){W}_{ ij}^\varepsilon (t)\widetilde{\xi }(t)\right ) \\ \quad = E\left ({W}_{ij}^\varepsilon (s)\widetilde{\xi }(s)E[{W}_{ ij}^\varepsilon (t)\widetilde{\xi }(t)\vert {\mathcal{F}}_{{ t}_{k-1}}]\right ).\end{array}$$
(5.89)

Moreover, in view of the definition of \(\widetilde{\xi }(\cdot )\) and the proof of Lemma  5.35, we have for some κ0  > 0,

$$\begin{array}{rl} E[{W}_{ij}^\varepsilon (t)\widetilde{\xi }(t)\vert {\mathcal{F}}_{{ t}_{k-1}}]& =\widetilde{ \xi }(t)E[{W}_{ij}^\varepsilon (t)\vert {\mathcal{F}}_{{ t}_{k-1}}] \\ & =\widetilde{ \xi }(t)O\left (\varepsilon +\exp \left (-\frac{{\kappa }_{0}(t - {t}_{k-1})} \varepsilon \right )\right ) \\ & =\widetilde{ \xi }(t)O\left (\varepsilon +\exp \left (-\frac{{\kappa }_{0}({t}_{k} - {t}_{k-1})} \varepsilon \right )\right ) \\ & =\widetilde{ \xi }(t)O\left (\varepsilon +\exp \left ( -\frac{{\kappa }_{0}} {\varepsilon ^{\delta }} \right )\right ) =\widetilde{ \xi }(t)O(\varepsilon )\end{array}$$

Combine this with (5.89) to obtain

$$E\left ({W}_{ij}^\varepsilon (s)\widetilde{\xi }(s){W}_{ ij}^\varepsilon (t)\widetilde{\xi }(t)\right ) = O(\varepsilon )E\vert \widetilde{\xi }(s)\widetilde{\xi }(t)\vert = O(\varepsilon ).$$

Therefore,

$$\frac{d\widetilde{{\eta }}^\varepsilon (t)} {dt} = O(\varepsilon ^{1-\delta })$$

uniformly on [0,  T], which implies, together with \(\widetilde{{\eta }}^\varepsilon (0) = 0\), that

$${\sup }_{0\leq \varsigma \leq T}\widetilde{{\eta }}^\varepsilon (\varsigma ) {=\sup }_{ 0\leq \varsigma \leq T}{ \int }_{0}^{\varsigma }\left (\frac{d\widetilde{{\eta }}^\varepsilon (t)} {dt} \right )dt = O(\varepsilon ^{1-\delta }).$$

This completes the proof. □ 

To characterize the limit of \(({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))\), consider the martingale problem associated with \(({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))\). Note that

$$\frac{d{n}^\varepsilon (t)} {dt} = \frac{1} {\sqrt\varepsilon }W(t,{\alpha }^\varepsilon (t))\mbox{ and }{n}^\varepsilon (0) = 0,$$

where

$$\begin{array}{l} W(t,\alpha ) = \left ({W}_{11}(t,\alpha ),\ldots,{W}_{1{m}_{1}}(t,\alpha ),\ldots,{W}_{l1}(t,\alpha ),\ldots,{W}_{l{m}_{l}}(t,\alpha )\right )\end{array}$$

Let \({\mathcal{G}}^\varepsilon (t)\) be the operator

$$\begin{array}{l} {\mathcal{G}}^\varepsilon (t)f(t,x,\alpha ) = \frac{\partial } {\partial t}f(t,x,\alpha ) + \frac{1} {\sqrt\varepsilon }\langle W(t,\alpha ),{\nabla }_{x}f(t,x,\alpha )\rangle \\ + {Q}^\varepsilon (t)f(t,x,\cdot )(\alpha ), \end{array}$$

for all f( ⋅,  ⋅, α) ∈  C 1, 1, where ∇  x denotes the gradient with respect to x and ⟨ ⋅,  ⋅⟩ denotes the usual inner product in Euclidean space. It is well known that (see Davis [41, Chapter 2])

$$f(t,{n}^\varepsilon (t),{\alpha }^\varepsilon (t)) -{\int }_{0}^{t}{\mathcal{G}}^\varepsilon (s)f(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s))ds$$
(5.90)

is a martingale.

We use the perturbed test function method (see Ethier and Kurtz [59] and Kushner [139]) to study the limit as ε → 0. To begin with, we define a functional space on \({\mathbb{R}}^{m} \times \overline{\mathcal{M}}\)

$$\begin{array}{l} {C}_{L}^{2} = \{{f}^{0}(x,i) : \mbox{ with bounded derivatives up to the } \\ \mbox{ second order such that the second derivative is Lipschitz}\}.\end{array}$$
(5.91)

For any real-valued function \({f}^{0}(\cdot,i) \in {C}_{L}^{2}\), define

$$\overline{f}(x,\alpha ) = \sum \limits_{i=1}^{l}{f}^{0}(x,i){I}_{\{ \alpha \in {\mathcal{M}}_{i}\}} = \left \{\begin{array}{cc} {f}^{0}(x,1),&\mbox{ if }\alpha \in {\mathcal{M}}_{1},\\ \vdots & \vdots \\ {f}^{0}(x,l), & \mbox{ if }\alpha \in {\mathcal{M}}_{l},\\ \end{array} \right.$$

and consider the function

$$f(t,x,\alpha ) = \overline{f}(x,\alpha ) + \sqrt\varepsilon h(t,x,\alpha ),$$
(5.92)

where h( t, x, α) is to be specified later. The main idea is that by appropriate choice of h( ⋅), the perturbation is small and results in the desired cancelation in the calculation.

In view of the block-diagonal structure of \(\widetilde{Q}(t)\) and the definition of \(\overline{f}(x,\alpha )\), it is easy to see that

$$\widetilde{Q}(t)\overline{f}(x,\cdot )(\alpha ) = 0.$$

Applying the operator \({\mathcal{G}}^\varepsilon (t)\) to the function f( ⋅) defined in (5.92) yields that

$$\begin{array}{l} \overline{f}({n}^\varepsilon (t),{\alpha }^\varepsilon (t)) + \sqrt\varepsilon h(t,{n}^\varepsilon (t),{\alpha }^\varepsilon (t)) \\ \quad -{\int }_{0}^{t}\left \{ \frac{1} {\sqrt\varepsilon }\langle W(s,{\alpha }^\varepsilon (s)),{\nabla }_{ x}\overline{f}({n}^\varepsilon (s),{\alpha }^\varepsilon (s)) + \sqrt\varepsilon {\nabla }_{ x}h(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s))\rangle \right.\\ \qquad \qquad + \sqrt\varepsilon \frac{\partial } {\partial s}h(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s)) + \frac{1} {\sqrt\varepsilon }\widetilde{Q}(s)h(s,{n}^\varepsilon (s),\cdot )({\alpha }^\varepsilon (s)) \\ \left.\qquad \qquad +\widehat{ Q}(s)\left (\overline{f}({n}^\varepsilon (s),\cdot ) + \sqrt\varepsilon h(s,{n}^\varepsilon (s),\cdot )\right )({\alpha }^\varepsilon (s))\right \}ds \end{array}$$

defines a martingale.

The basic premise of the perturbed test function method is to choose the function h( ⋅) that cancels the “bad” terms of order \(1/\sqrt{\varepsilon }\):

$$\widetilde{Q}(s)h(s,x,\cdot )(\alpha ) = -\langle W(s,\alpha ),{\nabla }_{x}\overline{f}(x,\alpha )\rangle.$$
(5.93)

Note that as mentioned previously, \(\widetilde{Q}(t)\) has rank m −  l. Thus the dimension of the null space is l; that is, \(\dim N(\widetilde{Q}(t)) = l\). A crucial observation is that in view of the Fredholm alternative (see Lemma  A.37 and Corollary  A.38), a solution of (5.93) exists iff the vector \((\langle W(s,{s}_{ij}),{\nabla }_{x}\overline{f}(x,{s}_{ij})\rangle )\) is orthogonal to \(\widetilde{\mathrm{1}{\mathrm{l}}}_{{m}_{1}},\ldots,\widetilde{\mathrm{1}{\mathrm{l}}}_{{m}_{l}}\), which span \(N(\widetilde{Q}(t))\) (see Remark  4.23 for the notation). Moreover, since f 0( ⋅, i) is in C L 2, h( ⋅) can be chosen to satisfy the following properties, assuming β ij ( ⋅) to be Lipschitz on [0, T]:

$$\begin{array}{l} \mbox{ (1) }h(t,x,\alpha )\mbox{ is uniformly Lipschitz in }t; \\ \mbox{ (2) }\vert h(t,x,\alpha )\vert \mbox{ and }\vert {\nabla }_{x}h(t,x,\alpha )\vert \mbox{ are bounded; } \\ \mbox{ (3) }{\nabla }_{x}h(t,x,\alpha )\mbox{ is Lipschitz in }(t,x)\end{array}$$

Such an \(h(\cdot )\) leads to

$$\begin{array}{l} \overline{f }({n}^\varepsilon (t),{\alpha }^\varepsilon (t)) + \sqrt\varepsilon h(t,{n}^\varepsilon (t),{\alpha }^\varepsilon (t)) \\ -{\int }_{0}^{t}\left \{\langle W(s,{\alpha }^\varepsilon (s)),{\nabla }_{ x}h(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s))\rangle \right.\\ \qquad + \sqrt\varepsilon \left ( \frac{\partial } {\partial s}h(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s))\right ) +\widehat{ Q}(s)\overline{f}({n}^\varepsilon (s),\cdot )({\alpha }^\varepsilon (s)) \\ \left.\qquad + \sqrt\varepsilon \widehat{Q}(s)h(s,{n}^\varepsilon (s),\cdot )({\alpha }^\varepsilon (s))\right \}ds \end{array}$$
(5.94)

being a martingale. For each s,  x, α, define

$$g(s,x,\alpha ) =\langle W(s,\alpha ),{\nabla }_{x}h(s,x,\alpha )\rangle.$$
(5.95)

With f 0  ∈  C L 2 , it is easy to see that g(s,  x, α) is Lipschitz in (s,  x). This function will be used in defining the operator for the limit problem later.

Remark 5.39

Note that the choice of h(⋅) in (5.93) is not unique. If h 1 (⋅) and h 2(⋅) are both solutions to (5.93), then the irreducibility of \(\widetilde{{Q}}^{i}(s)\) implies that, for each i = 1,…,l,

$$\left (\begin{array}{c} {h}_{1}(s,x,{s}_{i1})\\ \vdots \\ {h}_{1}(s,x,{s}_{i{m}_{i}}) \end{array} \right )-\left (\begin{array}{c} {h}_{2}(s,x,{s}_{i1})\\ \vdots \\ {h}_{2}(s,x,{s}_{i{m}_{i}}) \end{array} \right ) = {h}^{0}(s,x,i)\mathrm{1}{\mathrm{l}}_{{ m}_{i}}$$

for some scalar functions h 0 (s,x,i). Although the choice of h is not unique, the resulting function g(s,x,α) is well defined. As in Remark  4.23, the consistency condition or solvability condition due to Fredholm alternative is in force. Therefore, if h 1 and h 2 are both solutions to (5.93), then

$$\langle W(s,\alpha ),{\nabla }_{x}{h}_{1}(s,x,\alpha )\rangle =\langle W(s,\alpha ),{\nabla }_{x}{h}_{2}(s,x,\alpha )\rangle,$$

for \(\alpha \in {\mathcal{M}}_{i}\) and i = 1,…,l.

Using g(s,  x, α) defined above, we obtain

$$\begin{array}{l} {\int }_{0}^{t}\langle W(s,{\alpha }^\varepsilon (s)),{\nabla }_{ x}h(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s))\rangle ds \\ \quad ={ \int }_{0}^{t}g(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s))ds \\ \quad ={ \int }_{0}^{t} \sum \limits_{i=1}^{l} \sum \limits_{j=1}^{{m}_{i} }{I}_{\{{\alpha }^\varepsilon (s)={s}_{ij}\}}g(s,{n}^\varepsilon (s),{s}_{ ij})ds \\ \quad ={ \int }_{0}^{t} \sum \limits_{i=1}^{l} \sum \limits_{j=1}^{{m}_{i} }({I}_{\{{\alpha }^\varepsilon (s)={s}_{ij}\}} - {\nu }_{j}^{i}(s){I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=i\}})g(s,{n}^\varepsilon (s),{s}_{ ij})ds \\ \qquad \quad +{ \int }_{0}^{t} \sum \limits_{i=1}^{l} \sum \limits_{j=1}^{{m}_{i} }{I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}}{\nu }_{j}^{i}(s)g(s,{n}^\varepsilon (s),{s}_{ ij})ds\end{array}$$

In view of Lemma  5.37, the term in the fourth line above goes to zero in mean square uniformly in t ∈ [0,  T]. Let

$$\overline{g}(s,x,i) = \sum \limits_{j=1}^{{m}_{i} }{\nu }_{j}^{i}(s)g(s,x,{s}_{ ij}).$$

Then it follows that

$$\begin{array}{l} {\int }_{0}^{t} \sum \limits_{i=1}^{l} \sum \limits_{j=1}^{{m}_{i} }{I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}}{\nu }_{j}^{i}(s)g(s,{n}^\varepsilon (s),{s}_{ ij})ds \\ \quad ={ \int }_{0}^{t} \sum \limits_{i=1}^{l}{I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=i\}}\overline{g}(s,{n}^\varepsilon (s),i)ds \\ \quad ={ \int }_{0}^{t}\overline{g}(s,{n}^\varepsilon (s),{\overline{\alpha }}^\varepsilon (s))ds\end{array}$$

Therefore, as ε → 0, we have

$$\begin{array}{l} E|{ \int }_{0}^{t}\langle W(s,{\alpha }^\varepsilon (s)),{\nabla }_{ x}h(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s))\rangle ds \\ \quad -{\int }_{0}^{t}\overline{g}(s,{n}^\varepsilon (s),{\overline{\alpha }}^\varepsilon (s))d{s|}^{2} \rightarrow 0 \end{array}$$
(5.96)

uniformly in t ∈ [0,  T].

Furthermore, we have

$$\begin{array}{l} {\int }_{0}^{t}\widehat{Q}(s)\overline{f}({n}^\varepsilon (s),\cdot )({\alpha }^\varepsilon (s))ds \\ \quad ={ \int }_{0}^{t} \sum \limits_{i=1}^{l} \sum \limits_{j=1}^{{m}_{i} }{I}_{\{{\alpha }^\varepsilon (s)={s}_{ij}\}}\widehat{Q}(s)\overline{f}({n}^\varepsilon (s),\cdot )({s}_{ ij})ds \\ \quad ={ \int }_{0}^{t} \sum \limits_{i=1}^{l} \sum \limits_{j=1}^{{m}_{i} }({I}_{\{{\alpha }^\varepsilon (s)={s}_{ij}\}} - {\nu }_{j}^{i}(s){I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=i\}})\widehat{Q}(s)\overline{f}({n}^\varepsilon (s),\cdot )({s}_{ ij})ds \\ \quad \quad +{ \int }_{0}^{t} \sum \limits_{i=1}^{l} \sum \limits_{j=1}^{{m}_{i} }{\nu }_{j}^{i}(s){I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=i\}}\widehat{Q}(s)\overline{f}({n}^\varepsilon (s),\cdot )({s}_{ ij})ds\end{array}$$

Again, Lemma  5.37 implies that the third line above goes to 0 in mean square uniformly in t ∈ [0,  T]. The last term above equals

$${\int }_{0}^{t}\overline{Q}(s){f}^{0}({n}^\varepsilon (s),\cdot )({\overline{\alpha }}^\varepsilon (s))ds,$$

where \(\overline{Q}(s) = \mathrm{diag}({\nu }^{1}(s),\ldots,{\nu }^{l}(s))\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}}\). It follows that as ε → 0,

$$\begin{array}{l} E|{ \int }_{0}^{t}\widehat{Q}(s)\overline{f}({n}^\varepsilon (s),\cdot )({\alpha }^\varepsilon (s))ds \\ \quad -{\int }_{0}^{t}\overline{Q}(s){f}^{0}({n}^\varepsilon (s),\cdot )({\overline{\alpha }}^\varepsilon (s))ds| \rightarrow 0 \end{array}$$
(5.97)

uniformly in t ∈ [0,  T].

We next examine the function \(\overline{g}(s,x,i)\) closely. Using the block-diagonal structure of \(\widetilde{Q}(s)\), we can write (5.93) in terms of each block \(\widetilde{{Q}}^{j}(s)\). For j = 1,…,l,

$$\widetilde{{Q}}^{j}(s)\left (\begin{array}{c} h(s,x,{s}_{j1})\\ \vdots \\ h(s,x,{s}_{j{m}_{j}})\\ \end{array} \right ) = -\left (\begin{array}{c} \langle W(s,{s}_{j1}),{\nabla }_{x}{f}^{0}(x,j)\rangle \\ \vdots \\ \langle W(s,{s}_{j{m}_{j}}),{\nabla }_{x}{f}^{0}(x,j)\rangle \\ \end{array} \right ).$$
(5.98)

Note that \(\widetilde{{Q}}^{j}(s)\) is weakly irreducible, so \(\mathrm{rank}(\widetilde{{Q}}^{j}(s)) = {m}_{j} - 1\). As in Remark  4.9, equation (5.98) has a solution since it is consistent and the solvability condition in the sense of the Fredholm alternative is satisfied. We can solve (5.98) using exactly the same technique as in Section 4.2 for obtaining the φ i (t), that is, replacing one of the rows of the augmented matrix in (5.98) by (1, 1,…,1, 0), which represents the equation \({\sum }_{k=1}^{{m}_{j}}h(s,x,{s}_{jk}) = 0\). The coefficient matrix of the resulting equation then has full rank, and one readily obtains a solution. Equivalently, the solution may be written as

$$\begin{array}{l} \left (\begin{array}{c} h(s,x,{s}_{j1})\\ \vdots \\ h(s,x,{s}_{j{m}_{j}})\\ \end{array} \right ) = \\ \quad -{\left [\left (\begin{array}{c} \widetilde{{Q}}^{j}(s) \\ \mathrm{1}{\mathrm{l}}_{{m}_{j}}^{\prime}\\ \end{array} \right )^{\prime}\left (\begin{array}{c} \widetilde{{Q}}^{j}(s) \\ \mathrm{1}{\mathrm{l}}_{{m}_{j}}^{\prime}\\ \end{array} \right )\right ]}^{-1}\left (\begin{array}{c} \widetilde{{Q}}^{j}(s) \\ \mathrm{1}{\mathrm{l}}_{{m}_{j}}^{\prime}\\ \end{array} \right )^{\prime}\left (\begin{array}{c} \langle W(s,{s}_{j1}),{\nabla }_{x}{f}^{0}(x,j)\rangle \\ \vdots \\ \langle W(s,{s}_{j{m}_{j}}),{\nabla }_{x}{f}^{0}(x,j)\rangle \\ 0\\ \end{array} \right )\end{array}$$
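Numerically, the displayed formula amounts to a least-squares solve of the stacked system \(\left (\widetilde{{Q}}^{j}(s);\mathrm{1}{\mathrm{l}}_{{m}_{j}}^{\prime}\right )h = (-b;0)\). The sketch below uses a hypothetical 2 × 2 block and a right-hand side chosen so that the consistency condition holds; NumPy is assumed.

```python
# Sketch of solving (5.98): Qtilde^j h = -b together with the normalization 1l' h = 0.
import numpy as np

def solve_h_block(Q_j, b):
    """Solve the stacked system via the normal equations, as in the displayed formula."""
    m_j = Q_j.shape[0]
    A = np.vstack([Q_j, np.ones((1, m_j))])       # (Qtilde^j ; 1l')
    rhs = np.concatenate([-b, [0.0]])             # (-b ; 0)
    return np.linalg.solve(A.T @ A, A.T @ rhs)

Q_j = np.array([[-1.0,  1.0],
                [ 3.0, -3.0]])                    # hypothetical weakly irreducible block
b = np.array([0.2, -0.6])                         # satisfies the consistency condition for nu^j = (3/4, 1/4)
h = solve_h_block(Q_j, b)
print(h, Q_j @ h + b, h.sum())                    # residual and normalization are ~0
```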

Note that

$${I}_{\{\alpha ={s}_{jk}\}} - {\nu }_{k}^{j}(t){I}_{\{ \alpha \in {\mathcal{M}}_{j}\}} = 0\mbox{ if }\alpha \not\in {\mathcal{M}}_{j}.$$

Recall the notation for the partitioned vector x = ( x 1, …,  x l) where x j is an m j -dimensional vector and \({x}^{j} = ({x}_{1}^{j},\ldots,{x}_{{m}_{j}}^{j})\). For the partial derivatives, use the notation

$${\partial }_{j,k} ={ \partial \over \partial {x}_{k}^{j}} \mbox{ and }{\partial }_{j,{j}_{1}{j}_{2}}^{2} ={ {\partial }^{2} \over \partial {x}_{{j}_{1}}^{j}\partial {x}_{{j}_{2}}^{j}}.$$

Then h(s,  x, s jk ) is a functional of \({\partial }_{j,1}{f}^{0}(x,j),\ldots,{\partial }_{j,{m}_{j}}{f}^{0}(x,j)\). It follows that g( s, x,  s jk ) is a functional of \({\partial }_{j,{j}_{1}{j}_{2}}^{2}{f}^{0}(x,j)\) , for j 1, j 2 = 1,…,m j , and so is \(\overline{g}(s,x,j)\). Write

$$\overline{g}(s,x,j) = \frac{1} {2}{\sum }_{{j}_{1},{j}_{2}=1}^{{m}_{j} }{a}_{{j}_{1}{j}_{2}}(s,j){\partial }_{j,{j}_{1}{j}_{2}}^{2}{f}^{0}(x,j),$$
(5.99)

for some continuous functions \({a}_{{j}_{1}{j}_{2}}(s,j)\).

Lemma 5.40

Assume (A5.5) and (A5.6) . Suppose \(({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))\) converges weakly to \((n(\cdot ),\overline{\alpha }(\cdot ))\) . Then for f 0 (⋅,i) ∈ C L 2,

$${f}^{0}(n(t),\overline{\alpha }(t)) -{\int }_{0}^{t}\left (\overline{g}(s,n(s),\overline{\alpha }(s)) + \overline{Q}(s){f}^{0}(n(s),\cdot )(\overline{\alpha }(s))\right )ds$$

is a martingale.

Proof: Define

$$\begin{array}{l} {H}^\varepsilon (t) = \overline{f}({n}^\varepsilon (t),{\alpha }^\varepsilon (t)) + \sqrt\varepsilon h(t,{n}^\varepsilon (t),{\alpha }^\varepsilon (t)) \\ \quad \, -{\int }_{0}^{t}\left \{\langle W(s,{\alpha }^\varepsilon (s)),{\nabla }_{ x}h(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s))\rangle \right.\\ \qquad \qquad \, + \sqrt\varepsilon \frac{\partial } {\partial s}h(s,{n}^\varepsilon (s),{\alpha }^\varepsilon (s)) +\widehat{ Q}(s)\overline{f}({n}^\varepsilon (s),\cdot )({\alpha }^\varepsilon (s)) \\ \left.\qquad \qquad \, + \sqrt\varepsilon \widehat{Q}(s)h(s,{n}^\varepsilon (s),\cdot )({\alpha }^\varepsilon (s))\right \}ds\end{array}$$

The martingale property implies that

$$E\left [({H}^\varepsilon (t) - {H}^\varepsilon (s)){z}_{ 1}({n}^\varepsilon ({t}_{ 1}),{\overline{\alpha }}^\varepsilon ({t}_{ 1}))\cdots {z}_{k}({n}^\varepsilon ({t}_{ k}),{\overline{\alpha }}^\varepsilon ({t}_{ k}))\right ] = 0,$$

for any 0 ≤  t 1 ≤ ⋯ ≤ t k  ≤ s ≤  t and any bounded and continuous functions z 1 ( ⋅),…,z k ( ⋅).

In view of the choice of h( ⋅), all three terms

$$\begin{array}{l} \sqrt\varepsilon h(t,{n}^\varepsilon (t),{\alpha }^\varepsilon (t)), \\ \sqrt\varepsilon \left ( \frac{\partial } {\partial t}h(t,{n}^\varepsilon (t),{\alpha }^\varepsilon (t))\right ),\mbox{ and} \\ \sqrt\varepsilon \widehat{Q}(t)h(t,{n}^\varepsilon (t),\cdot )({\alpha }^\varepsilon (t)) \end{array}$$

converge to 0 in mean square. Recall (5.96), (5.97), and

$$\overline{f}({n}^\varepsilon (t),{\alpha }^\varepsilon (t)) = {f}^{0}({n}^\varepsilon (t),{\overline{\alpha }}^\varepsilon (t)).$$

Denote the weak limit of H ε( ⋅) by \(\overline{H}(\cdot )\). We have

$$E\left [\left (\overline{H}(t) -\overline{H}(s)\right ){z}_{1}(n({t}_{1}),\overline{\alpha }({t}_{1}))\cdots {z}_{k}(n({t}_{k}),\overline{\alpha }({t}_{k}))\right ] = 0,$$

where \(\overline{H}(\cdot )\) is given by

$$\begin{array}{l} \overline{H }(t)= {f}^{0}(n(t),\overline{\alpha }(t)) \\ \quad -{\int }_{0}^{t}\left (\overline{g}(r,n(r),\overline{\alpha }(r)) + \overline{Q}(r){f}^{0}(n(r),\cdot )(\overline{\alpha }(r))\right )dr\end{array}$$

Thus \((n(\cdot ),\overline{\alpha }(\cdot ))\) is a solution to the martingale problem. □ 

Lemma 5.41

Let \(\mathcal{L}\) denote the operator given by

$$\mathcal{L}{f}^{0}(x,j) ={ 1 \over 2} {\sum }_{{j}_{1},{j}_{2}=1}^{{m}_{j} }{a}_{{j}_{1}{j}_{2}}(s,j){\partial }_{j,{j}_{1}{j}_{2}}^{2}{f}^{0}(x,j) + \overline{Q}(s){f}^{0}(x,\cdot )(j).$$

Then the martingale problem with operator \(\mathcal{L}\) has a unique solution.

Proof: In view of Lemma  A.14, we need only verify the uniqueness in distribution of \((n(t),\overline{\alpha }(t))\) for each t ∈ [0, T]. Let

$$f(x,j) =\exp \left (\iota \{\langle \theta,x\rangle + {\theta }_{0}j\}\right ),$$

where \(\theta \in {\mathbb{R}}^{m}\),\({\theta }_{0} \in \mathbb{R}\),\(j \in \mathcal{M}\) , and ι is the pure imaginary number with \({\iota }^{2} = -1\).

For fixed j 0, k 0 , let \({F}_{{j}_{0}{k}_{0}}(x,j) = {I}_{\{j={j}_{0}\}}f(x,{k}_{0})\) . Then

$${F}_{{j}_{0}{k}_{0}}(n(t),\overline{\alpha }(t)) = {I}_{\{\overline{\alpha }(t)={j}_{0}\}}f(n(t),{k}_{0}).$$

Moreover, note that

$$\begin{array}{l} \overline{g }(s,n(s),\overline{\alpha }(s)) = \sum \limits_{j=1}^{l}{I}_{\{\overline{\alpha } (s)=j\}}\overline{g}(s,n(s),j) \\ \quad ={ 1 \over 2} {\sum }_{j=1}^{l}{I}_{\{\overline{\alpha } (s)=j\}} \sum \limits_{{j}_{1},{j}_{2}=1}^{{m}_{j} }{a}_{{j}_{1}{j}_{2}}(s,j){\partial }_{j,{j}_{1}{j}_{2}}^{2}{F}_{{ j}_{0}{k}_{0}}(n(s),j) \\ \quad ={ 1 \over 2} {I}_{\{\overline{\alpha }(s)={j}_{0}\}} \sum \limits_{{j}_{1},{j}_{2}=1}^{{m}_{{j}_{0}} }{a}_{{j}_{1}{j}_{2}}(s,{j}_{0}){\partial }_{{j}_{0},{j}_{1}{j}_{2}}^{2}f(n(s),{k}_{ 0}) \\ \quad ={ 1 \over 2} {\sum }_{{j}_{1},{j}_{2}=1}^{{m}_{{j}_{0}} }{a}_{{j}_{1}{j}_{2}}(s,{j}_{0})(-{\theta }_{{j}_{0}{j}_{1}}{\theta }_{{j}_{0}{j}_{2}})({I}_{\{\overline{\alpha }(s)={j}_{0}\}}f(n(s),{k}_{0})).\end{array}$$
(5.100)

Furthermore, we have

$$\begin{array}{l} \overline{Q }(s){F}_{{j}_{0}{k}_{0}}(n(s),\cdot )(\overline{\alpha }(s)) \\ \quad = \sum \limits_{j=1}^{l}{I}_{\{\overline{\alpha } (s)=j\}}\overline{Q}(s){F}_{{j}_{0}{k}_{0}}(n(s),\cdot )(j) \\ \quad = \sum \limits_{j=1}^{l}{I}_{\{\overline{\alpha } (s)=j\}} \sum \limits_{k=1}^{l}{\overline{q}}_{ jk}(s){F}_{{j}_{0}{k}_{0}}(n(s),k) \\ \quad = \sum \limits_{j=1}^{l}{I}_{\{\overline{\alpha } (s)=j\}} \sum \limits_{k=1}^{l}{\overline{q}}_{ jk}(s){I}_{\{k={j}_{0}\}}f(n(s),{k}_{0}) \\ \quad = \sum \limits_{j=1}^{l}{I}_{\{\overline{\alpha } (s)=j\}}{\overline{q}}_{j{j}_{0}}(s)f(n(s),{k}_{0}) \\ \quad = \sum \limits_{j=1}^{l}{\overline{q}}_{ j{j}_{0}}(s){I}_{\{\overline{\alpha }(s)=j\}}f(n(s),{k}_{0}).\end{array}$$
(5.101)

Let

$${\phi }_{jk}(t) = E\left ({I}_{\{\overline{\alpha }(t)=j\}}f(n(t),k)\right ),\ \mbox{ for }\ j,k = 1,\ldots,l.$$

Then in view of (5.100) and (5.101),

$$\begin{array}{l} {\phi }_{{j}_{0 } {k}_{0}}(t) - {\phi }_{{j}_{0}{k}_{0}}(0) -{\int }_{0}^{t}\left \{\;\; \sum \limits_{{j}_{1},{j}_{2}=1}^{{m}_{{j}_{0}} }{a}_{{j}_{1}{j}_{2}}(s,{j}_{0})(-{\theta }_{{j}_{0}{j}_{1}}{\theta }_{{j}_{0}{j}_{2}}){\phi }_{{j}_{0}{k}_{0}}(s) \right.\\ \left.\qquad \qquad + \sum \limits_{j=1}^{l}{\overline{q}}_{ j{j}_{0}}(s){\phi }_{j{k}_{0}}(s)\right \}ds = 0.\end{array}$$
(5.102)

Let

$$\phi (t) = ({\phi }_{11}(t),\ldots,{\phi }_{1l}(t),\ldots,{\phi }_{l1}(t),\ldots,{\phi }_{ll}(t)).$$

Rewrite (5.102) in terms of ϕ( ⋅) as

$$\phi (t) = \phi (0) +{ \int }_{0}^{t}\phi (s)B(s)ds,$$

where ϕ(0) = (ϕ jk (0)) with \({\phi }_{jk}(0) = E{I}_{\{\overline{\alpha }(0)=j\}}f(0,k)\), and B( t) is a matrix-valued function whose entries are defined by the integrand of (5.102). The equation for ϕ(t) is a linear ordinary differential equation. It is well known that such a differential equation has a unique solution. Hence, ϕ(t) is uniquely determined. In particular,

$$\begin{array}{l} E\exp \left (\iota \{\langle \theta,n(t)\rangle + {\theta }_{0}\overline{\alpha }(t)\}\right ) \\ \quad = \sum \limits_{j=1}^{l}E\left ({I}_{\{\overline{\alpha } (t)=j\}}\exp \left (\iota \{\langle \theta,n(t)\rangle + j{\theta }_{0}\}\right )\right ) \end{array}$$

is uniquely determined for all θ and θ0, and so is the distribution of \((n(t),\overline{\alpha }(t))\), by virtue of the uniqueness theorem and the inversion formula for characteristic functions (see Chow and Teicher [30]). □ 

The tightness of \(({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))\) together with Lemma  5.40 and Lemma  5.41 implies that \(({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))\) converges weakly to \((n(\cdot ),\overline{\alpha }(\cdot ))\). We will show that n( ⋅) is a switching diffusion, i.e., a diffusion process modulated by a Markov process, in which the covariance of the diffusion depends on the Markov jump process. More precisely, owing to the presence of the modulating jump process, the limit process does not possess the independent-increment property shared by many familiar processes. A moment of reflection reveals that the coefficients in \(\overline{g}(s,x,i)\) must necessarily form a symmetric nonnegative definite matrix serving as a covariance matrix. The following lemma verifies this assertion.

Lemma 5.42

For s ∈ [0,T] and j = 1,…,l, the matrix

$$A(s,j) = ({a}_{{j}_{1}{j}_{2}}(s,j))$$

is symmetric and nonnegative definite.

Proof: Let \({\eta }^{j} = ({\eta }_{j1},\ldots,{\eta }_{j{m}_{j}})^{\prime}\) and \({x}^{j} = ({x}_{j1},\ldots,{x}_{j{m}_{j}})^{\prime}\). Define

$${f}_{j}(x) = \frac{1} {2}{\left (\langle {\eta }^{j},{x}^{j}\rangle \right )}^{2}.$$

Then the corresponding \(\overline{g}(\cdot )\) defined in (5.99) has the following form:

$$\overline{g}(s,x,j) = \frac{1} {2}{\eta }^{j,{\prime}}A(s,j){\eta }^{j}.$$

Moreover, let f j (x,  k) = f j (x), independent of k. Then for all k = 1,  , l,

$$\overline{Q}(s){f}_{j}({n}^\varepsilon (s),\cdot )(k) = 0.$$

To verify the nonnegativity of A(s,  j), it suffices to show that

$${\int }_{s}^{t}{\eta }^{j,{\prime}}A(r,j){\eta }^{j}dr \geq 0,$$

for all 0 ≤ s ≤  t ≤ T. Recall that f j (x) is a quadratic function. In view of (5.94) and the proof of Lemma  5.40, it then follows that

$$\frac{1} {2}{\int }_{s}^{t}{\eta }^{j,{\prime}}A(r,j){\eta }^{j}dr {=\lim }_{ \varepsilon \rightarrow 0}\left (E{f}_{j}({n}^\varepsilon (t)) - E{f}_{ j}({n}^\varepsilon (s))\right ).$$

We are in a position to show that the limit is nonnegative. Let

$${n}^{\varepsilon,j}(t) = ({n}_{ j1}^\varepsilon (t),\ldots,{n}_{ j{m}_{j}}^\varepsilon (t)).$$

Then

$$\begin{array}{l} E\left ({f}_{j}({n}^\varepsilon (t)) - {f}_{ j}({n}^\varepsilon (s))\right ) \\ \quad = \frac{1} {2}E\left (\langle {\eta }^{j},{n}^{\varepsilon,j}{(t)\rangle }^{2} -\langle {\eta }^{j},{n}^{\varepsilon,j}{(s)\rangle }^{2}\right )\end{array}$$

For t ≥  s ≥ 0, using

$$\langle {\eta }^{j},{n}^{\varepsilon,j}(t)\rangle =\langle {\eta }^{j},{n}^{\varepsilon,j}(s)\rangle +\langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\rangle,$$

we have

$$\begin{array}{l} E\left (\langle {\eta }^{j},{n}^{\varepsilon,j}{(t)\rangle }^{2} -\langle {\eta }^{j},{n}^{\varepsilon,j}{(s)\rangle }^{2}\right ) \\ = E\left (2\langle {\eta }^{j},{n}^{\varepsilon,j}(s)\rangle \langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\rangle +\langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}{(s)\rangle }^{2}\right ) \\ \geq 2E\left (\langle {\eta }^{j},{n}^{\varepsilon,j}(s)\rangle \langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\rangle \right ) \\ = 2E\left (\langle {\eta }^{j},{n}^{\varepsilon,j}(s)\rangle E\left [\langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\rangle \left |{\mathcal{F}}_{ s}^\varepsilon \right ]\right )\right.\end{array}$$

We next show that the last term goes to 0 as ε → 0. In fact, in view of (a) in Lemma  5.35, it follows that

$$E[{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\vert {\mathcal{F}}_{ s}^\varepsilon ] = O(\sqrt\varepsilon ),$$

and hence

$$\begin{array}{rl} &E\left [\langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\rangle \left |{\mathcal{F}}_{ s}^\varepsilon \right ] \right.\\ &\qquad \quad =\langle {\eta }^{j},E[({n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s))\vert {\mathcal{F}}_{ s}^\varepsilon ]\rangle = O(\sqrt\varepsilon )\end{array}$$

Using (b) in Lemma  5.35, we derive the following inequalities

$$E\langle {\eta }^{j},{n}^{\varepsilon,j}{(s)\rangle }^{2} \leq \vert {\eta }^{j}{\vert }^{2}E\vert {n}^{\varepsilon,j}(s){\vert }^{2} \leq \vert {\eta }^{j}{\vert }^{2}O(s).$$

The Cauchy–Schwarz inequality then leads to

$$\begin{array}{l} \vert E\left (\langle {\eta }^{j},{n}^{\varepsilon,j}(s)\rangle E\left [\langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\rangle \left |{\mathcal{F}}_{ s}^\varepsilon \right ]\right )\vert \right.\\ \quad \leq {(E\langle {\eta }^{j},{n}^{\varepsilon,j}{(s)\rangle }^{2})}^{\frac{1} {2} }{(E{[\langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\rangle |{\mathcal{F}}_{s}^\varepsilon]}^{2})}^{\frac{1} {2} } \\ \quad ={ \left (E\langle {\eta }^{j},{n}^{\varepsilon,j}{(s)\rangle }^{2}\right )}^{\frac{1} {2} }O(\sqrt\varepsilon ) \rightarrow 0,\ \mbox{ as }\varepsilon \rightarrow 0. \end{array}$$

As a result for some K > 0, we have

$$E\left (\langle {\eta }^{j},{n}^{\varepsilon,j}(s)\rangle E\left [\langle {\eta }^{j},{n}^{\varepsilon,j}(t) - {n}^{\varepsilon,j}(s)\rangle |{\mathcal{F}}_{ s}^\varepsilon \right ]\right ) \geq -K\vert {\eta }^{j}\vert s\sqrt\varepsilon \rightarrow 0,$$

as ε → 0. The nonnegativity of A(s,  j) follows.

To show that A( s, j) is symmetric, consider

$${f}_{j,{j}_{1}{j}_{2}}(x) = {x}_{j{j}_{1}}{x}_{j{j}_{2}}\mbox{ for }{j}_{1},{j}_{2} = 1,\ldots,{m}_{j}.$$

Then, we have

$$\frac{1} {2}{\int }_{0}^{t}{a}_{{ j}_{1}{j}_{2}}(s,j)ds {=\lim }_{\varepsilon \rightarrow 0}E({n}_{{j}_{1}}^{\varepsilon,j}(t){n}_{{ j}_{2}}^{\varepsilon,j}(t)) = \frac{1} {2}{\int }_{0}^{t}{a}_{{ j}_{2}{j}_{1}}(s,j)ds,$$
(5.103)

for all t ∈ [0, T]. Thus, A(s,  j) is symmetric. □ 

Next, we derive an explicit representation of the nonnegative definite matrix A( s, j) similar to that of Theorem  5.9. Recall that given a function f 0( ⋅), one can find h( ⋅) as in (5.93). Using this h( ⋅), one defines g( ⋅) as in (5.95), which leads to \(\overline{g}(\cdot )\) given in (5.99). In view of the result in Theorem  5.9 for a single block of the irreducible matrix \(\widetilde{{Q}}^{j}(t)\) together with the computations of \(\overline{g}(s,x,j)\) , it follows that A(s,  j) = 2A 0(s, j), where

$$\begin{array}{l} {A}^{0 } (t,j) = {\beta }_{\mathrm{ diag}}^{j}(t)\left ({\nu }_{\mathrm{ diag}}^{j}(t){\int }_{0}^{\infty }{Q}_{ 0}(r,t,j)dr\right. \\ \quad \quad \quad +\left.{\left ({ \int }_{0}^{\infty }{Q}_{ 0}(r,t,j)dr\right )}^{\prime}{\nu }_{\mathrm{diag}}^{j}(t)\right ){\beta }_{\mathrm{ diag}}^{j}(t), \end{array}$$

with

$$\begin{array}{l} {\beta }_{\mathrm{diag}}^{j}(t) = \mathrm{diag}({\beta }_{j1}(t),\ldots,{\beta }_{j{m}_{j}}(t)), \\ {\nu }_{\mathrm{diag}}^{j}(t) = \mathrm{diag}({\nu }_{1}^{j}(t),\ldots,{\nu }_{{m}_{j}}^{j}(t)), \end{array}$$

and

$${Q}_{0}(r,t,j) = \left [I -\left (\begin{array}{c} {\nu }^{j}(t)\\ \vdots \\ {\nu }^{j}(t) \end{array} \right )\right ]\exp \left (\widetilde{{Q}}^{j}(t)r\right ).$$

Applying Lemma  5.42 to the case in which \(\widetilde{Q}(s)\) consists of the single irreducible block \(\widetilde{{Q}}^{j}(s)\) , it follows that A 0(s,  j) is symmetric and nonnegative definite. Hence, standard results in linear algebra yield that there exists an m j ×m j matrix σ 0 (s,  j) such that

$${\sigma }^{0}(s,j){\sigma }^{0,{\prime}}(s,j) = {A}^{0}(s,j).$$
(5.104)
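The representation of A 0(t, j) and the factorization (5.104) can be evaluated numerically by truncating the integral of Q 0(r,t,j) and taking a symmetric square root of the resulting matrix. The sketch below uses a hypothetical block \(\widetilde{{Q}}^{j}\), its quasi-stationary distribution ν j, and weights β jk ; NumPy and SciPy are assumed, and the truncation level is chosen ad hoc.

```python
# Sketch: evaluate A^0(t, j) by truncating the integral of Q_0(r,t,j) and factor it.
import numpy as np
from scipy.linalg import expm

Q_j    = np.array([[-1.0,  1.0],
                   [ 3.0, -3.0]])       # hypothetical Qtilde^j(t)
nu_j   = np.array([0.75, 0.25])         # its quasi-stationary distribution (nu_j @ Q_j = 0)
beta_j = np.array([1.0, 2.0])           # hypothetical beta_{j1}(t), beta_{j2}(t)

m_j = Q_j.shape[0]
Pi = np.outer(np.ones(m_j), nu_j)       # matrix whose rows are nu^j(t)

# int_0^R Q_0(r,t,j) dr with Q_0 = (I - Pi) exp(Qtilde^j r); the integrand decays
# exponentially, so a moderate truncation level R suffices here
R, dr = 40.0, 1e-3
E_dr = expm(Q_j * dr)
M = np.eye(m_j)
integral = np.zeros((m_j, m_j))
for _ in range(int(R / dr)):
    integral += (np.eye(m_j) - Pi) @ M * dr
    M = M @ E_dr

nu_diag, beta_diag = np.diag(nu_j), np.diag(beta_j)
A0 = beta_diag @ (nu_diag @ integral + integral.T @ nu_diag) @ beta_diag
A0 = 0.5 * (A0 + A0.T)                  # clean up roundoff; A0 is symmetric in theory

# sigma^0(s, j) with sigma^0 sigma^0' = A^0, via the eigendecomposition of A^0
w, V = np.linalg.eigh(A0)
sigma0 = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
print(np.allclose(sigma0 @ sigma0.T, A0))
```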

Note that the definition of \(\overline{g}(s,x,j)\) is independent of \(\widehat{Q}(t)\), so for determining A 0(s, j), we may consider \(\widehat{Q}(t) = 0\) . Note also that

$$\widetilde{Q}(t) = \mathrm{diag}(\widetilde{{Q}}^{1}(t),0,\ldots,0) + \cdots + \mathrm{diag}(0,\ldots,0,\widetilde{{Q}}^{l}(t)).$$

The foregoing statements suggest that in view of (5.104), the desired covariance matrix is given by

$$\begin{array}{ll} \sigma (s,j)& = \left (\begin{array}{*{10}c} {0}_{{m}_{1}\times {m}_{1}}& && && \\ &{0}_{{m}_{2}\times {m}_{2}}&& &&\\ & &\ddots &&& \\ & &&{\sigma }^{0}(s,j)&&\\ & & &&\ddots& \\ & && &&{0}_{{m}_{l}\times {m}_{l}}\\ \end{array} \right ) \\ & = \mathrm{diag}({0}_{{m}_{1}\times {m}_{1}},{0}_{{m}_{2}\times {m}_{2}},\ldots,{\sigma }^{0}(s,j)\ldots,{0}_{{ m}_{l}\times {m}_{l}}), \end{array}$$
(5.105)

where \({0}_{{m}_{k}\times {m}_{k}}\) is the m k ×m k zero matrix. That is, it is a matrix with the jth block-diagonal submatrix equal to σ0(s,  j) and the rest of its elements equal to zero.

Theorem 5.43

Assume that (A5.5) holds. Suppose \(\widetilde{Q}(\cdot )\) is twice differentiable with Lipschitz continuous second derivative and \(\widehat{Q}(\cdot )\) is differentiable with Lipschitz continuous derivative. Let β ij (⋅) be bounded and Lipschitz continuous deterministic functions. Then n ε (⋅) converges weakly to a switching diffusion n(⋅), where

$$n(t) = \left({\int }_{0}^{t}\sigma (s,\overline{\alpha }(s))dw(s)\right)^{\prime}$$

and w(⋅) is a standard m-dimensional Brownian motion.

Proof: Let

$$\widetilde{n}(t) = \left({\int }_{0}^{t}\sigma (s,\overline{\alpha }(s))dw(s)\right)^{\prime}$$

and \(\overline{\alpha }(\cdot )\) be a Markov chain generated by \(\overline{Q}(t)\). Then for all f 0 ( ⋅,  i) ∈ C L 2,

$${f}^{0}(\widetilde{n}(t),\overline{\alpha }(t)) -{\int }_{0}^{t}\left (\overline{g}(s,\widetilde{n}(s),\overline{\alpha }(s)) + \overline{Q}(s){f}^{0}(\widetilde{n}(s),\cdot )(\overline{\alpha }(s))\right )ds$$

is a martingale. This and the uniqueness of the martingale problem in Lemma  5.41 yield that \((\widetilde{n}(\cdot ),\overline{\alpha }(\cdot ))\) has the same probability distribution as \((n(\cdot ),\overline{\alpha }(\cdot ))\). This proves the theorem. □ 

Remark 5.44

Note that the Lipschitz condition on β ij(⋅) is not required in analyzing the asymptotic normality in Section 5.3.3. It is needed in this section because the perturbed test function method typically requires smoothness conditions of the associated processes.

It appears that the conditions in (A5.5) and (A5.6) together with the Lipschitz property of β ij (⋅) are sufficient for the convergence of n ε (⋅) to a switching diffusion n(⋅). The additional assumptions on further derivatives of \(\widetilde{Q}(\cdot )\) and \(\widehat{Q}(\cdot )\) are needed for computing the covariance of the limit process n(⋅).

Remark 5.45

If \(\overline{\alpha }(\cdot )\) were a deterministic function, n(⋅) above would be a diffusion process in the usual sense. However since the limit \(\overline{\alpha }(\cdot )\) is a Markov chain, the diffusion process is modulated by this jump process; the resulting distribution has the features of the “continuous” diffusion process and the “discrete” Markov chain limit.

In this section, we use the perturbed test function method, which is quite different from the approach of Section 5.2. The method used in that section, which might be called a direct approach, is interesting in its own right and makes a close connection between asymptotic expansions and asymptotic normality. It is effective whenever it can be applied. One of its main ingredients is the heavy use of the mixing properties of the scaled occupation measures. In fact, using asymptotic expansions, it was shown that the scaled sequence of occupation measures is a mixing process with an exponential mixing rate. For the weak and strong interaction cases presented here, the mixing condition, and even approximate mixing conditions, no longer hold. To illustrate, consider Example  4.20 with constant jump rates and calculate

$$E[{n}^{\varepsilon,{\prime}}(s)({n}^\varepsilon (t) - {n}^\varepsilon (s))].$$

By virtue of the proof of Theorem  5.25 , a straightforward but tedious calculation shows that

$$E\left [{n}^{\varepsilon,{\prime}}(s)({n}^\varepsilon (t) - {n}^\varepsilon (s))\right ]\not\rightarrow 0\ \mbox{ as }\varepsilon \rightarrow 0$$

for the weak and strong interaction models, because E[n ε,′ (s)(n ε (t) − n ε (s))] depends on P 1 (t,s), which is generally a nonzero function. A direct consequence is that the limit process does not have independent increments in general. It is thus difficult to characterize the limit process via the direct approach. The perturbed test function method, on the other hand, can be considered as a combined approach. It uses enlarged or augmented states by treating the scaled occupation measure n ε (⋅) and the Markov chain α ε (⋅) together. That is, one considers a new state variable with two components (x,α). This allows us to bypass the verification of mixing-like properties: the limit process is characterized by means of solutions of appropriate martingale problems via perturbed test functions, which is the rationale and essence of the approach. As a consequence, the limit process is characterized via the limit of the underlying sequence of operators.

Note that if \(\widetilde{Q}(t)\) itself is weakly irreducible (i.e., \(\widetilde{Q}(t)\) consists of only one block), then the covariance matrix is given by (5.30). In this case, since there is only one group of recurrent states, the jump behavior due to the limit process \(\overline{\alpha }(\cdot )\) will disappear. Moreover, owing to the fast transition rate \(\widetilde{Q}(t)/\varepsilon \), the singularly perturbed Markov chain rapidly reaches its quasi-stationary regime. As a result, the jump behavior does not appear in the asymptotic distribution, and the diffusion becomes the dominant factor. Although the method employed in this chapter is different from that of Section 5.2, the result coincides with that of Section 5.2 under irreducibility. We state this in the following corollary.

Corollary 5.46

Assume that the conditions of Theorem  5.43 are fulfilled with l = 1 (i.e., \(\widetilde{Q}(t)\) has only one block). Then n ε (⋅) converges weakly to the diffusion process

$$n(t) = \left({\int }_{0}^{t}\sigma (s)dw(s)\right)^{\prime},$$

where w(⋅) is an m-dimensional standard Brownian motion and σ(t) satisfies

$$A(t) = \sigma (t)\sigma ^{\prime}(t),$$

with A(t) given by (5.30).

To further illustrate, consider the following example. This problem is concerned with a singularly perturbed Markov chain with four states divided into two groups. It has been used in modeling production planning problems with failure-prone machines. As was mentioned, from a modeling point of view, it may be used to depict the situation in which two machines operate in tandem and the operating conditions (the machine capacity) of one machine change much faster than those of the other; see also related discussions in Chapters 7 and 8.

Example 5.47

Let α ε (⋅) be a Markov chain generated by

$$\begin{array}{l} {Q}^\varepsilon (t) = \frac{1} \varepsilon \left (\begin{array}{cccc} - {\lambda }_{1}(t)& {\lambda }_{1}(t) & 0 & 0 \\ {\mu }_{1}(t) & - {\mu }_{1}(t)& 0 & 0 \\ 0 & 0 & - {\lambda }_{1}(t)& {\lambda }_{1}(t) \\ 0 & 0 & {\mu }_{1}(t) & - {\mu }_{1}(t)\\ \end{array} \right ) \\ \\ \quad \quad + \left (\begin{array}{cccc} - {\lambda }_{2}(t)& 0 & {\lambda }_{2}(t) & 0 \\ 0 & - {\lambda }_{2}(t)& 0 & {\lambda }_{2}(t) \\ {\mu }_{2}(t) & 0 & - {\mu }_{2}(t)& 0 \\ 0 & {\mu }_{2}(t) & 0 & - {\mu }_{2}(t)\\ \end{array} \right ).\end{array}$$

Then

$$\overline{Q}(t) = \left (\begin{array}{cc} - {\lambda }_{2}(t)& {\lambda }_{2}(t) \\ {\mu }_{2}(t) & - {\mu }_{2}(t) \end{array} \right ).$$

Let \(\overline{\alpha }(\cdot )\) be a Markov chain generated by \(\overline{Q}(t)\) , t ≥ 0. In this example,

$${\sigma }^{0}(s,1) = 2{\left ( \frac{{\lambda }_{1}(s){\mu }_{1}(s)} {{({\lambda }_{1}(s) + {\mu }_{1}(s))}^{3}}\right )}^{\frac{1} {2} }\left (\begin{array}{cc} {\beta }_{11}(s) &0\\ - {\beta }_{ 12}(s)&0\\ \end{array} \right ),$$
$${\sigma }^{0}(s,2) = 2{\left ( \frac{{\lambda }_{1}(s){\mu }_{1}(s)} {{({\lambda }_{1}(s) + {\mu }_{1}(s))}^{3}}\right )}^{\frac{1} {2} }\left (\begin{array}{cc} {\beta }_{21}(s) &0\\ - {\beta }_{ 22}(s)&0\\ \end{array} \right ),$$
$$\sigma (s,1) = 2{\left ( \frac{{\lambda }_{1}(s){\mu }_{1}(s)} {{({\lambda }_{1}(s) + {\mu }_{1}(s))}^{3}}\right )}^{\frac{1} {2} }\left (\begin{array}{cccc} {\beta }_{11}(s) &0&0&0 \\ - {\beta }_{12}(s)&0&0&0 \\ 0 &0&0&0\\ 0 &0 &0 &0\\ \end{array} \right ),$$

and

$$\sigma (s,2) = 2{\left ( \frac{{\lambda }_{1}(s){\mu }_{1}(s)} {{({\lambda }_{1}(s) + {\mu }_{1}(s))}^{3}}\right )}^{\frac{1} {2} }\left (\begin{array}{cccc} 0&0& 0 &0\\ 0 &0 & 0 &0 \\ 0&0& {\beta }_{21}(s) &0 \\ 0&0& - {\beta }_{22}(s)&0\\ \end{array} \right ).$$

The limit of n ε (⋅) is given by

$$n(t) = \left({\int }_{0}^{t}\sigma (s,\overline{\alpha }(s))dw(s)\right)^{\prime},$$

where w(⋅) is a standard Brownian motion taking values in \({\mathbb{R}}^{4}\).
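As a sanity check on the aggregation in this example, the following sketch simulates the four-state chain generated by Q ε with illustrative constant rates, forms the aggregated process ᾱ ε by lumping states {1, 2} and {3, 4}, and compares the empirical fraction of time spent in the second group with the stationary probability λ 2 ∕(λ 2 + μ 2 ) of the limit chain generated by \(\overline{Q}\). The chosen ε, horizon, and seed are arbitrary; for small ε and a long horizon the two quantities should be close.

```python
import numpy as np

rng = np.random.default_rng(0)
lam1, mu1, lam2, mu2 = 3.0, 2.0, 1.0, 1.5      # illustrative constant rates
eps, T = 1e-2, 50.0

# Q^eps = Qtilde/eps + Qhat for the four-state chain of Example 5.47.
Qt = np.array([[-lam1, lam1, 0, 0],
               [mu1, -mu1, 0, 0],
               [0, 0, -lam1, lam1],
               [0, 0, mu1, -mu1]])
Qh = np.array([[-lam2, 0, lam2, 0],
               [0, -lam2, 0, lam2],
               [mu2, 0, -mu2, 0],
               [0, mu2, 0, -mu2]])
Q = Qt / eps + Qh

def simulate(Q, T, x0=0):
    """Exact simulation of a constant-generator Markov chain on [0, T]."""
    t, x, times, states = 0.0, x0, [0.0], [x0]
    while t < T:
        t += rng.exponential(1.0 / (-Q[x, x]))          # exponential holding time
        probs = np.clip(Q[x], 0, None); probs[x] = 0.0   # jump distribution
        x = rng.choice(len(probs), p=probs / probs.sum())
        times.append(min(t, T)); states.append(x)
    return np.array(times), np.array(states)

times, states = simulate(Q, T)
dt = np.diff(times)
agg = (states[:-1] >= 2).astype(int)   # aggregated state: 0 for {1,2}, 1 for {3,4}
print("empirical time fraction in aggregate state 2:", dt[agg == 1].sum() / T)
print("stationary value for Qbar:", lam2 / (lam2 + mu2))
```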

4 Measurable Generators

In Section 4.2, we considered the asymptotic expansions of probability distributions. A natural requirement of such expansions is that the generator Q ε(t) be smooth enough to establish the desired error bounds. It would be interesting to consider the case in which the generator Q ε(t), t ≥ 0, is merely measurable. The method used in this section is very useful in some manufacturing problems; see Sethi and Zhang [192]. Moreover, the results are used in Section 8.6 to deal with a control problem under relaxed control formulation. Given only the measurability of Q ε(t), there seems to be little hope to obtain an asymptotic expansion. Instead of constructing an asymptotic series of the corresponding probability distribution, we consider the convergence of P(α ε(t) = s ij ) under the framework of convergence of

$${\int }_{0}^{T}P({\alpha }^\varepsilon (t) = {s}_{ ij})f(t)dt\ \mbox{ for }\ f(\cdot ) \in {L}^{2}([0,T]; \mathbb{R}).$$

Since the phrase “weak convergence” is reserved throughout the book for the convergence of probability measures, to avoid confusion, we refer to the convergence above as convergence in the weak sense on \({L}^{2}([0,T]; \mathbb{R})\) or convergence under the weak topology of \({L}^{2}([0,T]; \mathbb{R})\).

4.1 Case I: Weakly Irreducible \(\widetilde{Q}(t)\)

Let \({\alpha }^\varepsilon (\cdot ) \in \mathcal{M} =\{ 1,\ldots,m\}\) denote the Markov chain generated by

$${Q}^\varepsilon (t) = \frac{1} \varepsilon \widetilde{Q}(t) +\widehat{ Q}(t),$$

where both \(\widetilde{Q}(t)\) and \(\widehat{Q}(t)\) are generators.

We assume the following conditions in this subsection.

  1. (A5.7)

    \(\widetilde{Q}(t)\) and \(\widehat{Q}(t)\) are bounded and Borel measurable. Moreover, \(\widetilde{Q}(t)\) is weakly irreducible.

Remark 5.48

In fact, both the boundedness and the Borel measurability in (A5.7) are redundant. Recall that our definition of generators (see Definition  2.2) uses the q-Property, which includes both the Borel measurability and the boundedness. Thus, (A5.7) requires only weak irreducibility. Nevertheless, we retain both boundedness and measurability for those who read only this section. Similar comments apply to assumption (A5.8) in what follows.

Define the probability distribution vector

$${p}^\varepsilon (t) = (P({\alpha }^\varepsilon (t) = 1),\ldots,P({\alpha }^\varepsilon (t) = m))$$

and the transition matrix

$${P}^\varepsilon (t,s) = ({p}_{ ij}^\varepsilon (t,s)) = \left (P({\alpha }^\varepsilon (t) = j\vert {\alpha }^\varepsilon (s) = i)\right ).$$

Then using the martingale property in Lemma  2.4, we have

$${p}^\varepsilon (t) = {p}^\varepsilon (s) +{ \int }_{s}^{t}{p}^\varepsilon (r){Q}^\varepsilon (r)dr$$
(5.106)

and

$${P}^\varepsilon (t,s) = I +{ \int }_{s}^{t}{P}^\varepsilon (r,s){Q}^\varepsilon (r)dr.$$
(5.107)

The next two lemmas are concerned with the asymptotic properties of p ε(t) and P ε(t,  s).

Lemma 5.49

Assume (A5.7) . Then for each i, j, and T > 0, P(α ε (t) = i) and \(P({\alpha }^\varepsilon (t) = i\vert {\alpha }^\varepsilon (s) = j)\) both converge weakly to ν i (t) on \({L}^{2}([0,T]; \mathbb{R})\) and \({L}^{2}([s,T]; \mathbb{R})\) , respectively, that is, as ε → 0,

$${\int }_{0}^{T}[P({\alpha }^\varepsilon (t) = i) - {\nu }_{ i}(t)]f(t)dt \rightarrow 0$$
(5.108)

and

$${\int }_{s}^{T}[P({\alpha }^\varepsilon (t) = i\vert {\alpha }^\varepsilon (s) = j) - {\nu }_{ i}(t)]f(t)dt \rightarrow 0,$$
(5.109)

for all \(f(\cdot ) \in {L}^{2}([0,T]; \mathbb{R})\) and \({L}^{2}([s,T]; \mathbb{R})\) , respectively.

Proof: We only verify (5.108); the proof of (5.109) is similar. Recall that

$${p}^\varepsilon (t) = ({p}_{ 1}^\varepsilon (t),\ldots,{p}_{ m}^\varepsilon (t)) = (P({\alpha }^\varepsilon (t) = 1),\ldots,P({\alpha }^\varepsilon (t) = m)).$$

Since \({p}^\varepsilon (\cdot ) \in {L}^{2}([0,T]; {\mathbb{R}}^{m})\) (space of square-integrable functions on [0, T] taking values in \({\mathbb{R}}^{m}\) ), for each subsequence of ε → 0 there exists (see Lemma  A.36) a further subsequence of ε → 0 (still denoted by ε for simplicity), and for such ε, the corresponding { p ε( ⋅)} converges (in the weak sense on \({L}^{2}([0,T]; {\mathbb{R}}^{m})\)) to some \(p(\cdot ) = ({p}_{1}(\cdot ),\ldots,{p}_{m}(\cdot )) \in {L}^{2}([0,T]; {\mathbb{R}}^{m})\), that is,

$${\int }_{0}^{T}{p}^\varepsilon (r)({f}_{ 1}(r),\ldots,{f}_{m}(r))^{\prime}dr \rightarrow {\int }_{0}^{T}p(r)({f}_{ 1}(r),\ldots,{f}_{m}(r))^{\prime}dr,$$

for any \(({f}_{1}(\cdot ),\ldots,{f}_{m}(\cdot ))^{\prime} \in {L}^{2}([0,T]; {\mathbb{R}}^{m})\). Moreover,

$$0 \leq {p}_{i}(t) \leq 1\quad \mbox{ and }\quad {p}_{1}(t) + \cdots + {p}_{m}(t) = 1$$
(5.110)

almost everywhere. Since \(\widetilde{Q}(\cdot ) \in {L}^{2}([0,T]; {\mathbb{R}}^{m\times m})\), we have for 0 ≤ s ≤  t ≤ T,

$${\int }_{s}^{t}{p}^\varepsilon (r)\widetilde{Q}(r)dr \rightarrow {\int }_{s}^{t}p(r)\widetilde{Q}(r)dr.$$

Thus, using (5.106) we obtain

$$\begin{array}{rl} {\int }_{s}^{t}p(r)\widetilde{Q}(r)dr =&\lim \limits_{ \varepsilon \rightarrow 0}{ \int }_{s}^{t}{p}^\varepsilon (r)\widetilde{Q}(r)dr \\ =&\lim \limits_{\varepsilon \rightarrow 0}\left (\varepsilon ({p}^\varepsilon (t) - {p}^\varepsilon (s)) - \varepsilon {\int }_{s}^{t}{p}^\varepsilon (r)\widehat{Q}(r)dr\right ) = 0.\end{array}$$

Since s and t are arbitrary, it follows immediately that

$$p(t)\widetilde{Q}(t) = 0\mbox{ a.e. in }t.$$

By virtue of (5.110), the irreducibility of \(\widetilde{Q}(t)\) implies p(t) = ν( t) almost everywhere. Thus the limit is independent of the chosen subsequence. Therefore, p ε ( ⋅) → ν( ⋅) in the weak sense on \({L}^{2}([0,T]; {\mathbb{R}}^{m})\). □ 
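The mechanism behind Lemma  5.49 is easy to observe numerically. The sketch below is only an illustration: the piecewise-constant rates, the slow generator, and the test function are arbitrary choices. It propagates the forward equation (5.106) by matrix exponentials over small steps and evaluates \({\int }_{0}^{T}[P({\alpha }^\varepsilon (t) = 1) - {\nu }_{1}(t)]f(t)dt\) for decreasing ε. The generator is discontinuous at t = 0.5, so no asymptotic expansion is available, yet the weighted error still shrinks.

```python
import numpy as np
from scipy.linalg import expm

# Piecewise-constant (hence merely measurable) rates on [0, 1]:
# lambda(t), mu(t) jump at t = 0.5, so Qtilde(t) is not even continuous.
def rates(t):
    return (1.0, 2.0) if t < 0.5 else (4.0, 1.0)

def Qtilde(t):
    lam, mu = rates(t)
    return np.array([[-lam, lam], [mu, -mu]])

Qhat = np.array([[-0.7, 0.7], [0.3, -0.3]])     # an arbitrary slow generator
def nu1(t):
    lam, mu = rates(t)
    return mu / (lam + mu)                      # quasi-stationary prob. of state 1

T, n = 1.0, 2000
h = T / n
f = lambda t: np.cos(2 * np.pi * t)             # a test function in L^2([0,T];R)

for eps in (1e-1, 1e-2, 1e-3):
    p = np.array([1.0, 0.0])                    # p^eps(0)
    acc = 0.0
    for k in range(n):
        t = k * h
        acc += (p[0] - nu1(t)) * f(t) * h       # int (P(alpha^eps(t)=1) - nu_1(t)) f(t) dt
        p = p @ expm((Qtilde(t) / eps + Qhat) * h)   # exact step for a frozen generator
    print(f"eps = {eps:7.0e}   weighted error = {acc:+.5f}")
```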

Theorem 5.50

Assume (A5.7) . Then for any bounded deterministic function β i (⋅) and for each \(i \in \mathcal{M}\) and t ≥ 0,

$$E{\left \vert {\int }_{0}^{t}({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu }_{i}(s)){\beta }_{i}(s)ds\right \vert }^{2} \rightarrow 0\mbox{ as }\varepsilon \rightarrow 0.$$
(5.111)

Proof: Let

$$\eta (t) = E{\left \vert {\int }_{0}^{t}({I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu }_{i}(s)){\beta }_{i}(s)ds\right \vert }^{2}.$$

Then as in the proof of Theorem  5.25, we can show that

$$\eta (t) = 2({\eta }_{1}(t) + {\eta }_{2}(t)),$$

where

$$\begin{array}{rl} {\eta }_{1}(t) =&{\int }_{0}^{t}{ \int }_{0}^{s}(-{\nu }_{ i}(r))[P({\alpha }^\varepsilon (s) = i) - {\nu }_{ i}(s)]{\beta }_{i}(s){\beta }_{i}(r)drds, \\ {\eta }_{2}(t) =&{\int }_{0}^{t}{ \int }_{0}^{s}P({\alpha }^\varepsilon (r) = i)[P({\alpha }^\varepsilon (s) = i\vert {\alpha }^\varepsilon (r) = i) - {\nu }_{ i}(s)] \\ &\qquad \times {\beta }_{i}(s){\beta }_{i}(r)drds\end{array}$$

By virtue of Lemma  5.49, \(P({\alpha }^\varepsilon (s) = i) \rightarrow {\nu }_{i}(s)\) in the weak sense on \({L}^{2}([0,T]; \mathbb{R})\), and therefore as ε → 0,

$${\eta }_{1}(t) ={ \int }_{0}^{t}[P({\alpha }^\varepsilon (s) = i) - {\nu }_{ i}(s)]{\beta }_{i}(s)\left ({\int }_{0}^{s}(-{\nu }_{ i}(r)){\beta }_{i}(r)dr\right )ds \rightarrow 0.$$

Similarly, in view of the convergence of

$$P({\alpha }^\varepsilon (s) = i\vert {\alpha }^\varepsilon (r) = i) \rightarrow {\nu }_{ i}(s)$$

under the weak topology of \({L}^{2}([r,t]; \mathbb{R})\), we have

$$\begin{array}{rl} &{\eta }_{2}(t) ={ \int }_{0}^{t}\left [{\int }_{r}^{t}[P({\alpha }^\varepsilon (s) = i\vert {\alpha }^\varepsilon (r) = i) - {\nu }_{ i}(s)]{\beta }_{i}(s)ds\right ] \\ &\quad \times P({\alpha }^\varepsilon (r) = i){\beta }_{i}(r)dr \rightarrow 0. \end{array}$$

This concludes the proof of the theorem. □ 

4.2 Case II: \(\widetilde{Q}(t) = \mathrm{diag}(\widetilde{{Q}}^{1}(t),\ldots,\widetilde{{Q}}^{l}(t))\)

This subsection extends the preceding result to the cases in which \(\widetilde{Q}(t)\) is a block-diagonal matrix with irreducible blocks. We make the following assumptions:

  1. (A5.8)

    \(\widehat{Q}(t)\) and \(\widetilde{{Q}}^{i}(t)\), for i = 1, …, l, are bounded and Borel measurable. Moreover, \(\widetilde{{Q}}^{i}(t)\), i = 1, …, l, are weakly irreducible.

Lemma 5.51

Assume (A5.8) . Then the following assertions hold:

  1. (a)

    For each i = 1,  , l and j = 1, …,  m i , Pε(t) =  s ij ) converges in the weak sense to \({\nu }_{j}^{i}(t){{\vartheta}}^{i}(t)\) on \({L}^{2}([0,T]; \mathbb{R})\) , that is,

    $${\int }_{0}^{T}[P({\alpha }^\varepsilon (t) = {s}_{ ij}) - {\nu }_{j}^{i}(t){{\vartheta}}^{i}(t)]f(t)dt \rightarrow 0,$$
    (5.112)

    for all \(f(\cdot ) \in {L}^{2}([0,T]; \mathbb{R})\) , where

    $$({{\vartheta}}^{1}(t),\ldots,{{\vartheta}}^{l}(t)) = {p}_{ 0}\widetilde{\mathrm{1}\mathrm{l}} +{ \int }_{0}^{t}({{\vartheta}}^{1}(s),\ldots,{{\vartheta}}^{l}(s))\overline{Q}(s)ds.$$
  2. (b)

    For each i, j,  i 1, j 1,\(P({\alpha }^\varepsilon (t) = {s}_{ij}\vert {\alpha }^\varepsilon (s) = {s}_{{i}_{1}{j}_{1}})\) converges in the weak sense to \({\nu }_{j}^{i}(t){{\vartheta}}_{ii}(t,s)\) on \({L}^{2}([s,T]; \mathbb{R})\) , that is,

    $${\int }_{s}^{T}[P({\alpha }^\varepsilon (t) = {s}_{ ij}\vert {\alpha }^\varepsilon (s) = {s}_{{ i}_{1}{j}_{1}}) - {\nu }_{j}^{i}(t){{\vartheta}}_{ ii}(t,s)]f(t)dt \rightarrow 0,$$
    (5.113)

    for all \(f(\cdot ) \in {L}^{2}([s,T]; \mathbb{R})\) , where \({{\vartheta}}_{ij}(t,s)\) is defined in Lemma  5.24 (see (5.50)).

Proof: We only derive (5.112); the proof of (5.113) is similar. Let

$${p}^\varepsilon (t) = \left ({p}_{ 11}^\varepsilon (t),\ldots,{p}_{ 1{m}_{1}}^\varepsilon (t),\ldots,{p}_{ l1}^\varepsilon (t),\ldots,{p}_{ l{m}_{l}}^\varepsilon (t)\right )$$

where \({p}_{ij}^\varepsilon (t) = P({\alpha }^\varepsilon (t) = {s}_{ij})\) . Since \({p}^\varepsilon (\cdot ) \in {L}^{2}([0,T]; {\mathbb{R}}^{m})\) , there exists (see Lemma  A.36) a subsequence of ε → 0 (still denoted by ε for simplicity), such that corresponding to this ε, p ε(t) converges to some \(p(\cdot ) \in {L}^{2}([0,T]; {\mathbb{R}}^{m})\) under the weak topology. Let

$$p(t) = \left ({p}_{11}(t),\ldots,{p}_{1{m}_{1}}(t),\ldots,{p}_{l1}(t),\ldots,{p}_{l{m}_{l}}(t)\right ).$$

Then 0 ≤  p ij (t) ≤ 1 and ∑ i,  j p ij (t) = 1 almost everywhere. Similarly as in the proof of Lemma  5.49, for 0 ≤ t ≤ T,

$$p(t)\widetilde{Q}(t) = 0\mbox{ a.e. in }t.$$

The irreducibility of \(\widetilde{{Q}}^{k}(t)\),k = 1, …,  l, implies that

$$p(t) = ({{\vartheta}}^{1}(t),\ldots,{{\vartheta}}^{l}(t))\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t)),$$
(5.114)

for some functions \({{\vartheta}}^{1}(t),\ldots,{{\vartheta}}^{l}(t)\).

In view of (5.106), we have

$${p}^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}} = {p}_{ 0}\widetilde{\mathrm{1}\mathrm{l}} +{ \int }_{0}^{t}{p}^\varepsilon (s)\left (\frac{1} \varepsilon \widetilde{Q}(s) +\widehat{ Q}(s)\right )\widetilde{\mathrm{1}\mathrm{l}}ds.$$

Since \(\widetilde{Q}(s)\widetilde{\mathrm{1}\mathrm{l}} = 0\), it follows that

$${p}^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}} = {p}_{ 0}\widetilde{\mathrm{1}\mathrm{l}} +{ \int }_{0}^{t}{p}^\varepsilon (s)\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}}ds.$$

Owing to the convergence of p ε(t) → p( t) under the weak topology of \({L}^{2}([0,T]; {\mathbb{R}}^{m})\), we have

$$p(t)\widetilde{\mathrm{1}\mathrm{l}} = {p}_{0}\widetilde{\mathrm{1}\mathrm{l}} +{ \int }_{0}^{t}p(s)\widehat{Q}(s)\widetilde{\mathrm{1}\mathrm{l}}ds.$$

Using (5.114) and noting that

$$\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t))\widetilde{\mathrm{1}\mathrm{l}} = I,$$

we have

$$({{\vartheta}}^{1}(t),\ldots,{{\vartheta}}^{l}(t)) = {p}_{ 0}\widetilde{\mathrm{1}\mathrm{l}} +{ \int }_{0}^{t}({{\vartheta}}^{1}(s),\ldots,{{\vartheta}}^{l}(s))\overline{Q}(s)ds.$$

The uniqueness of the solution then yields the lemma. □ 

Theorem 5.52

Assume (A5.8) . Then for any i = 1,…,l, j = 1,…,m i , and bounded deterministic function β ij (t), t ≥ 0,

$$E\left ({\int }_{0}^{T}\left ({I}_{\{{ \alpha }^\varepsilon (t)={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=i\}}\right ){\beta }_{ij}(t)d{t}\right )^{2} \rightarrow 0,\mbox{ as }\varepsilon \rightarrow 0.$$

Proof: Let η(t) be defined as in (5.52). Then we can show similarly as in the proof of Theorem  5.25 that

$$\eta (T) = 2{\int }_{0}^{T}{ \int }_{0}^{t}{\Phi }^\varepsilon (t,r){\beta }_{ ij}(t){\beta }_{ij}(r)drdt,$$

where \({\Phi }^\varepsilon (t,r) = {\Phi }_{1}^\varepsilon (t,r) + {\Phi }_{2}^\varepsilon (t,r)\) with Φ 1 ε(t,  r) and Φ 2 ε(t, r) defined by (5.53) and (5.54), respectively.

Note that by changing the order of integration,

$$\begin{array}{rl} &{\int }_{0}^{T}{ \int }_{0}^{t}{\Phi }_{ 1}^\varepsilon (t,r){\beta }_{ ij}(t){\beta }_{ij}(r)drdt \\ &\quad ={ \int }_{0}^{T}P({\alpha }^\varepsilon (r) = {s}_{ ij}){\beta }_{ij}(r)\left \{{\int }_{r}^{T}[P({\alpha }^\varepsilon (t) = {s}_{ ij}\vert {\alpha }^\varepsilon (r) = {s}_{ ij})\right. \\ &\left.\qquad - {\nu }_{j}^{i}(t)P({\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ i}\vert {\alpha }^\varepsilon (r) = {s}_{ ij})]{\beta }_{ij}(t)dt\right \}dr\end{array}$$

Since the β ij ( ⋅) are bounded uniformly on [0, T], \({\beta }_{ij}(\cdot ) \in {L}^{2}([0,T]; \mathbb{R})\). As a result, Lemma  5.51 implies that

$$\begin{array}{rl} &{ \int }_{r}^{T}[P({\alpha }^\varepsilon (t) = {s}_{ ij}\vert {\alpha }^\varepsilon (r) = {s}_{ ij}) \\ & - {\nu }_{j}^{i}(t)P({\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ i}\vert {\alpha }^\varepsilon (r) = {s}_{ ij})]{\beta }_{ij}(t)dt \rightarrow 0.\end{array}$$

Hence as ε → 0,

$${\int }_{0}^{T}{ \int }_{0}^{t}{\Phi }_{ 1}^\varepsilon (t,r){\beta }_{ ij}(t){\beta }_{ij}(r)drdt \rightarrow 0.$$

Similarly,

$${\int }_{0}^{T}{ \int }_{0}^{t}{\Phi }_{ 2}^\varepsilon (t,r){\beta }_{ ij}(t){\beta }_{ij}(r)drdt \rightarrow 0,\mbox{ as }\varepsilon \rightarrow 0.$$

The proof is complete. □ 

Theorem 5.53

Assume (A5.8) . Then \({\overline{\alpha }}^\varepsilon (\cdot )\) converges weakly to \(\overline{\alpha }(\cdot )\) on \(D([0,T];\overline{\mathcal{M}})\), as ε → 0.

Proof: Recall that χε(t) denotes the vector of indicator functions

$$\left ({I}_{\{{\alpha }^\varepsilon (t)={s}_{11}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{1{m}_{ 1}}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{l1}\}},\ldots,{I}_{\{{\alpha }^\varepsilon (t)={s}_{l{m}_{ l}}\}}\right ),$$

and let

$${\overline{\chi }}^\varepsilon (t) = ({\overline{\chi }}_{ 1}^\varepsilon (t),\ldots,{\overline{\chi }}_{ l}^\varepsilon (t)) = {\chi }^\varepsilon (t)\widetilde{\mathrm{1}\mathrm{l}}.$$

Then \({\overline{\chi }}_{i}^\varepsilon (t) = {I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}}\) for i = 1, …, l.

We show that \({\overline{\chi }}^\varepsilon (\cdot )\) is tight in D l[0, T] first. Let \({\mathcal{F}}_{t}^\varepsilon = \sigma \{{\alpha }^\varepsilon (r) :\; r \leq t\}\). Then in view of the martingale property associated with αε( ⋅), we have, for 0 ≤ s ≤  t,

$$E\left [{\chi }^\varepsilon (t) - {\chi }^\varepsilon (s) -{\int }_{s}^{t}{\chi }^\varepsilon (r){Q}^\varepsilon (r)dr|{\mathcal{F}}_{ s}^\varepsilon \right ] = 0.$$

Right multiplying both sides of the equation by \(\widetilde{\mathrm{1}\mathrm{l}}\) and noting that \(\widetilde{Q}(r)\widetilde{\mathrm{1}\mathrm{l}} = 0\), we obtain

$$E\left [{\overline{\chi }}^\varepsilon (t) -{\overline{\chi }}^\varepsilon (s) -{\int }_{s}^{t}{\chi }^\varepsilon (r)\widehat{Q}(r)\widetilde{\mathrm{1}\mathrm{l}}dr|{\mathcal{F}}_{ s}^\varepsilon \right ] = 0.$$
(5.115)

Note that

$$\left \vert {\int }_{s}^{t}{\chi }^\varepsilon (r)\widehat{Q}(r)\widetilde{\mathrm{1}\mathrm{l}}dr\right \vert = O(t - s).$$

It follows from (5.115) that

$$E\left [{I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}}\vert {\mathcal{F}}_{s}^\varepsilon \right ] = {I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=i\}} + O(t - s).$$
(5.116)

Note also that (I A )2  = I A for any set A. We have, in view of (5.116),

$$\begin{array}{rl} &E\left [{\left ({I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}} - {I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}}\right )}^{2}|{\mathcal{F}}_{ s}^\varepsilon \right ] \\ & = E\left [{I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}} - 2{I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}}{I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}} + {I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}}|{\mathcal{F}}_{s}^\varepsilon \right ] \\ & = E\left [{I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}}|{\mathcal{F}}_{s}^\varepsilon \right ] - 2E\left [{I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=i\}}|{\mathcal{F}}_{s}^\varepsilon \right ]{I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=i\}} + {I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}} \\ & = {I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}} + O(t - s) \\ &\qquad \quad - 2\left ({I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}} + O(t - s)\right ){I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}} + {I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}} \\ & = O(t - s), \end{array}$$

for each i = 1, …,  l. Hence,

$$\lim {}_{t\rightarrow s}\lim \limits_{\varepsilon \rightarrow 0}E\left \{E\left [{\left ({I}_{\{{\overline{\alpha }}^\varepsilon (t)=i\}} - {I}_{\{{\overline{\alpha }}^\varepsilon (s)=i\}}\right )}^{2}|{\mathcal{F}}_{ s}^\varepsilon \right ]\right \} = 0.$$

Therefore, by Lemma  A.17, \({\overline{\chi }}^\varepsilon (\cdot )\) is tight.

The tightness of \({\overline{\chi }}^\varepsilon (\cdot )\) implies that for any sequence ε k  → 0, there exists a subsequence of {ε k } (still denoted by {ε k }) such that \({\overline{\chi }}^{\varepsilon _{k}}(\cdot )\) converges weakly. We next show that the limit of such a subsequence is uniquely determined by \(\overline{Q}(\cdot ) := \mathrm{diag}({\nu }^{1}(\cdot ),\ldots,{\nu }^{l}(\cdot ))\widehat{Q}(\cdot )\widetilde{\mathrm{1}\mathrm{l}}\).

Note that

$$\begin{array}{rl} &{ \int }_{s}^{t}{\chi }^\varepsilon (r)\widehat{Q}(r)\widetilde{\mathrm{1}\mathrm{l}}dr ={ \int }_{s}^{t}{\overline{\chi }}^\varepsilon (r)\overline{Q}(r)dr \\ & +{ \int }_{s}^{t}\left ({\chi }^\varepsilon (r) -{\overline{\chi }}^\varepsilon (r)\mathrm{diag}({\nu }^{1}(r),\ldots,{\nu }^{l}(r))\right )\widehat{Q}(r)\widetilde{\mathrm{1}\mathrm{l}}dr.\end{array}$$

In view of Theorem  5.52, we have, as ε → 0,

$$E\left \vert {\int }_{s}^{t}\left [{\chi }^\varepsilon (r) -{\overline{\chi }}^\varepsilon (r)\mathrm{diag}({\nu }^{1}(r),\ldots,{\nu }^{l}(r))\right ]\widehat{Q}(r)\widetilde{\mathrm{1}\mathrm{l}}dr\right \vert \rightarrow 0.$$
(5.117)

Now by virtue of (5.115),

$$E\left [\left ({\overline{\chi }}^\varepsilon (t) -{\overline{\chi }}^\varepsilon (s) -{\int }_{s}^{t}{\chi }^\varepsilon (r)\widehat{Q}(r)\widetilde{\mathrm{1}\mathrm{l}}dr\right ){z}_{ 1}({\overline{\chi }}^\varepsilon ({t}_{ 1}))\cdots {z}_{j}({\overline{\chi }}^\varepsilon ({t}_{ j}))\right ] = 0,$$

for 0 ≤  t 1 ≤ ⋯ ≤ t j  ≤ s ≤  t and bounded and continuous functions z 1 ( ⋅), …, z j ( ⋅).

Let \(\overline{\chi }(\cdot )\) denote the limit in distribution of \({\overline{\chi }}^{\varepsilon _{k}}(\cdot )\). Then in view of (5.117) and the continuity of \({\int }_{s}^{t}\eta (r)\overline{Q}(r)dr\) with respect to η( ⋅) (see Lemma  A.40), we have \({\overline{\chi }}^\varepsilon (\cdot ) \rightarrow \overline{\chi }(\cdot )\) as ε k  → 0, and \(\overline{\chi }(\cdot )\) satisfies

$$E\left [\left (\overline{\chi }(t) -\overline{\chi }(s) -{\int }_{s}^{t}\overline{\chi }(r)\overline{Q}(r)dr\right ){z}_{ 1}(\overline{\chi }({t}_{1}))\cdots {z}_{j}(\overline{\chi }({t}_{j}))\right ] = 0.$$

It is easy to see that \(\overline{\chi }(\cdot ) = ({\overline{\chi }}_{1}(\cdot ),\ldots,{\overline{\chi }}_{l}(\cdot ))\) is an l-valued measurable process having sample paths in D l[0, T] and satisfying \({\overline{\chi }}_{i}(t) = 0\) or 1 and \({\overline{\chi }}_{1}(\cdot ) + \cdots +{ \overline{\chi }}_{l}(\cdot ) = 1\) w.p.1. Let

$$\overline{\alpha }(t) = \sum \limits_{i=1}^{l}i{I}_{\{{\overline{\chi }}_{ i}(t)=1\}},$$

or in an expanded form,

$$\overline{\alpha }(t) = \left \{\begin{array}{cl} 1,&\mbox{ if }{\overline{\chi }}_{1}(t) = 1, \\ 2,&\mbox{ if }{\overline{\chi }}_{1}(t) = 0,\;{\overline{\chi }}_{2}(t) = 1,\\ \vdots &\vdots \\ l, &\mbox{ if }{\overline{\chi }}_{i}(t) = 0,\mbox{ for }i \leq l - 1,\;{\overline{\chi }}_{l}(t) = 1. \end{array} \right.$$

Then \(\overline{\alpha }(\cdot )\) is a process with sample paths in \(D([0,T];\overline{\mathcal{M}})\) and

$$\overline{\chi }(t) = ({I}_{\{\overline{\alpha }(t)=1\}},\ldots,{I}_{\{\overline{\alpha }(t)=l\}})\mbox{ w.p.1}.$$

Therefore, \(\overline{\alpha }(\cdot )\) is a Markov chain generated by \(\overline{Q}(\cdot )\) . As a result, its distribution is uniquely determined by \(\overline{Q}(\cdot )\) . It follows that \({\overline{\alpha }}^\varepsilon (\cdot )\) converges weakly to \(\overline{\alpha }(\cdot )\). □ 

Remark 5.54

Note that Theorem  5.53 gives the same result as Theorem  5.27 under weaker conditions. The proofs are quite different. The proof of Theorem  5.53 is based on martingale properties associated with the Markov chain, whereas the proof of Theorem  5.27 follows the traditional approach, i.e., after the tightness is verified, the convergence of finite-dimensional distributions is proved.

Remark 5.55

In view of the development in Chapter 4, apart from the smoothness conditions, one of the main ingredients is the use of the Fredholm alternative. One hopes that this will carry over (under suitable conditions) to the measurable generators. A possible approach is the utilization of the formulation of weak derivatives initiated in the study of partial differential equations (see Hutson and Pym [90]).

Following the tactics of the weak sense formulation, for some T < ∞ and for given \(g(\cdot ) \in {L}^{2}([0,T]; \mathbb{R})\) , a function \(f(\cdot ) \in {L}^{2}([0,T]; \mathbb{R})\) is a weak solution of \((d/dt)f = g\) if

$${\int }_{0}^{T}f(t)\left ({ d\phi (t) \over dt} \right )dt = -{\int }_{0}^{T}g(t)\phi (t)dt$$

for any C ∞ -functions on [0,T] vanishing on the boundary together with their derivatives (denoted by \(\phi \in {C}_{0}^{\infty }([0,T]; \mathbb{R})\) ). Write the weak solution as \((d/dt)f\stackrel{\mathrm{w}}{=}g\).

Recall that L loc 2 is the set of functions that lie in \({L}^{2}(S; \mathbb{R})\) for every closed and bounded set S ⊂ (0,T). A function f(⋅) ∈ L loc 2 has a jth-order weak derivative if there is a function g(⋅) ∈ L loc 2 such that

$${\int }_{0}^{T}g(t)\phi (t)dt = {(-1)}^{j}{ \int }_{0}^{T}f(t){ {d}^{j}\phi (t) \over d{t}^{j}} dt$$

for all \(\phi \in {C}_{0}^{\infty }([0,T]; \mathbb{R})\) . The function g(⋅) above is called the jth-order weak derivative of f(⋅), and is denoted by D j f = g.
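As a toy illustration of this definition, one can check numerically that g(t) = sign(t − 0.3) is the first-order weak derivative of f(t) = |t − 0.3| on [0, 1] by testing against a C 0 ∞ bump function; the point 0.3, the interval, and the test function below are arbitrary choices.

```python
import numpy as np
from scipy.integrate import quad

T, c = 1.0, 0.3
f = lambda t: abs(t - c)          # in L^2([0,T];R) but not classically differentiable at t = c
g = lambda t: np.sign(t - c)      # candidate first-order weak derivative D^1 f

def phi(t):                       # a C_0^infty test function vanishing at 0 and T
    if t <= 0.0 or t >= T:
        return 0.0
    return np.exp(-1.0 / (t * (T - t)))

def dphi(t):                      # its classical derivative
    if t <= 0.0 or t >= T:
        return 0.0
    return phi(t) * (T - 2.0 * t) / (t * (T - t)) ** 2

lhs, _ = quad(lambda t: g(t) * phi(t), 0.0, T, points=[c], limit=200)
rhs, _ = quad(lambda t: f(t) * dphi(t), 0.0, T, points=[c], limit=200)
print(lhs, -rhs)                  # agree: int g phi dt = (-1) int f (dphi/dt) dt
```

Both printed numbers agree up to quadrature error, which is exactly the defining identity with j = 1.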

To proceed, define the space of functions H n as

$${H}^{n} =\{ f\mbox{ on }[0,T];\ \mbox{ for }0 \leq j \leq n,\ {D}^{j}f\mbox{ exist and are in }{L}^{2}([0,T]; \mathbb{R})\}.$$

Equip H n with an inner product and a norm as

$$\begin{array}{rl} &{(f,g)}_{n} = \sum \limits_{j\leq n}{ \int }_{0}^{T}{D}^{j}f{D}^{j}gdt, \\ &\vert f{\vert }_{n}^{2} = {(f,f)}_{ n} = \sum \limits_{j\leq n}{ \int }_{0}^{T}\vert {D}^{j}f{\vert }^{2}dt\end{array}$$

One can then work under such a framework and proceed to obtain the asymptotic expansion of the probability distribution. It seems that the conditions required are not much different from those in the case of smooth generators; we will not pursue this issue further.

5 Remarks on Inclusion of Transient and Absorbing States

So far, the development in this chapter has focused on Markov chains with only recurrent states (either a single weakly irreducible class or a number of weakly irreducible classes). This section extends the results obtained to the case that a transient class or a group of absorbing states is included.

5.1 Inclusion of Transient States

Consider the Markov chain \({\alpha }^\varepsilon (\cdot ) \in \mathcal{M}\), where its generator is still given by (5.47) and the state space of αε(t) is given by

$$\mathcal{M} = {\mathcal{M}}_{1} \cup {\mathcal{M}}_{2} \cup \cdots \cup {\mathcal{M}}_{l} \cup {\mathcal{M}}_{{_\ast}},$$
(5.118)

with \({\mathcal{M}}_{i} =\{ {s}_{i1},\ldots,{s}_{i{m}_{i}}\}\) and \({\mathcal{M}}_{{_\ast}} =\{ {s}_{{_\ast}1},\ldots,{s}_{{_\ast}{m}_{{_\ast}}}\}\). In what follows, we present results concerning the asymptotic distributions of scaled occupation measures and properties of measurable generators. While main assumptions and results are provided, the full proofs are omitted. The interested reader can derive the results using the ideas presented in the previous sections.

To proceed, assume that \(\widetilde{Q}(t)\) is a generator of a Markov chain satisfying

$$\widetilde{Q}(t) = \left (\begin{array}{cccc} \widetilde{{Q}}^{1}(t) & & & \\ & \ddots & & \\ & & \widetilde{{Q}}^{l}(t) & \\ \widetilde{{Q}}_{{_\ast}}^{1}(t)&\cdots &\widetilde{{Q}}_{{_\ast}}^{l}(t)&\widetilde{{Q}}_{{_\ast}}(t)\\ \end{array} \right )$$
(5.119)

such that for each t ∈ [0, T] and each i = 1, …, l, \(\widetilde{{Q}}^{i}(t)\) is a generator with dimension m i ×m i , \(\widetilde{{Q}}_{{_\ast}}(t)\) is an m  ∗  ×m  ∗  matrix, \(\widetilde{{Q}}_{{_\ast}}^{i}(t) \in {\mathbb{R}}^{{m}_{{_\ast}}\times {m}_{i}}\), and \({m}_{1} + {m}_{2} + \cdots + {m}_{l} + {m}_{{_\ast}} = m\). We impose the following conditions.

  1. (A5.9)

    For all t ∈ [0, T], and i = 1, …, l, \(\widetilde{{Q}}^{i}(t)\) are weakly irreducible, and \(\widetilde{{Q}}_{{_\ast}}(t)\) is Hurwitz (i.e., all of its eigenvalues have negative real parts). Moreover, \(\widetilde{Q}(\cdot )\) is differentiable on [0, T] and its derivative is Lipschitz; \(\widehat{Q}(\cdot )\) is Lipschitz continuous on [0, T].

Use the partition

$$\widehat{Q}(t) = \left (\begin{array}{*{10}c} \widehat{{Q}}^{11}(t)&\widehat{{Q}}^{12}(t) \\ \widehat{{Q}}^{21}(t)&\widehat{{Q}}^{22}(t)\\ \end{array} \right )$$

where

$$\begin{array}{rl} &\widehat{{Q}}^{11}(t) \in {\mathbb{R}}^{(m-{m}_{{_\ast}})\times (m-{m}_{{_\ast}})},\ \widehat{{Q}}^{12}(t) \in {\mathbb{R}}^{(m-{m}_{{_\ast}})\times {m}_{{_\ast}} }, \\ &\widehat{{Q}}^{21}(t) \in {\mathbb{R}}^{{m}_{{_\ast}}\times (m-{m}_{{_\ast}})},\ \mbox{ and }\ \widehat{{Q}}^{22}(t) \in {\mathbb{R}}^{{m}_{{_\ast}}\times {m}_{{_\ast}} }, \end{array}$$

and write

$$\begin{array}{ll} &{\overline{Q}}_{{_\ast}}(t) = \mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t))(\widehat{{Q}}^{11}(t)\widetilde{\mathrm{1}\mathrm{l}} +\widehat{ {Q}}^{12}(t)({a}_{{ m}_{1}}(t),\ldots,{a}_{{m}_{l}}(t))) \\ &\overline{Q}(t) = \mathrm{diag}({\overline{Q}}_{{_\ast}}(t),{0}_{{m}_{{_\ast}}\times {m}_{{_\ast}}}), \end{array}$$
(5.120)

where

$$\widetilde{\mathrm{1}\mathrm{l}} = \mathrm{diag}(\mathrm{1}{\mathrm{l}}_{{m}_{1}},\ldots,\mathrm{1}{\mathrm{l}}_{{m}_{l}}),\;\mathrm{1}{\mathrm{l}}_{{m}_{j}} = (1,\ldots,1)^{\prime} \in {\mathbb{R}}^{{m}_{j}\times 1},$$

and

$${a}_{{m}_{i}}(t) = -\widetilde{{Q}}_{{_\ast}}^{-1}(t)\widetilde{{Q}}_{ {_\ast}}^{i}(t)\mathrm{1}{\mathrm{l}}_{{ m}_{i}},\mbox{ for }i = 1,\ldots,l.$$
(5.121)

In what follows, if \({a}_{{m}_{i}}(t)\) is time independent, we will simply write it as \({a}_{{m}_{i}}\). The requirement on \(\widetilde{{Q}}_{{_\ast}}(t)\) in (A5.9) implies that the corresponding states are transient. The Hurwitzian property also has the following interesting implication: For each t ∈ [0, T], and each i = 1, …, l, \({a}_{{m}_{i}}(t) = ({a}_{{m}_{i},1}(t),\ldots,{a}_{{m}_{i},{m}_{{_\ast}}}(t))^{\prime} \in {\mathbb{R}}^{{m}_{{_\ast}}\times 1}\). Then

$${a}_{{m}_{i},j}(t) \geq 0\ \mbox{ and }\ {\sum }_{i=1}^{l}{a}_{{ m}_{i},j}(t) = 1$$
(5.122)

for each j = 1, …, m ∗ . That is, for each t ∈ [0, T] and each j = 1, …, m ∗ , \(({a}_{{m}_{1},j}(t),\ldots,{a}_{{m}_{l},j}(t))\) can be considered a probability row vector. To see this, note that

$${\int }_{0}^{\infty }\exp (\widetilde{{Q}}_{ {_\ast}}(t)s)ds = -\widetilde{{Q}}_{{_\ast}}^{-1}(t),$$

which has nonnegative components. It follows from the definition that \({a}_{{m}_{i}}(t) \geq 0\). Furthermore,

$${\sum }_{i=1}^{l}{a}_{{ m}_{i}}(t) = -\widetilde{{Q}}_{{_\ast}}^{-1}(t){\sum }_{i=1}^{l}\widetilde{{Q}}_{ {_\ast}}^{i}(t)\mathrm{1}{\mathrm{l}}_{{ m}_{i}} = (-\widetilde{{Q}}_{{_\ast}}^{-1}(t))(-\widetilde{{Q}}_{ {_\ast}}(t))\mathrm{1}{\mathrm{l}}_{{m}_{{_\ast}}} = \mathrm{1}{\mathrm{l}}_{{m}_{{_\ast}}}.$$

Thus (5.122) follows. Similar to the development in the section for the case of weak and strong interactions, we can derive the following results.
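Before stating them, the probability-vector property (5.122) is easy to check numerically. In the sketch below the block sizes, rates, and matrices are arbitrary illustrative choices, subject only to the zero row-sum and Hurwitz requirements; it uses l = 2 two-state recurrent blocks and m ∗  = 2 transient states.

```python
import numpy as np

# Illustrative data: two irreducible 2x2 blocks plus m_* = 2 transient states.
Q1 = np.array([[-1.0, 1.0], [2.0, -2.0]])
Q2 = np.array([[-3.0, 3.0], [1.0, -1.0]])
Qs1 = np.array([[1.0, 0.5], [0.5, 0.5]])        # rates from transient states into M_1
Qs2 = np.array([[0.5, 0.0], [0.25, 0.25]])      # rates from transient states into M_2
Qss = np.array([[-3.0, 1.0], [0.5, -2.0]])      # transient-to-transient block (Hurwitz)

# The bottom rows of the full generator must sum to zero, and Qss must be Hurwitz.
bottom = np.hstack([Qs1, Qs2, Qss])
assert np.allclose(bottom.sum(axis=1), 0.0)
assert np.all(np.linalg.eigvals(Qss).real < 0)

ones2 = np.ones(2)
a = [-np.linalg.solve(Qss, Qsi @ ones2) for Qsi in (Qs1, Qs2)]   # a_{m_i} = -Qss^{-1} Qs^i 1l
print("a_{m_1} =", a[0], "  a_{m_2} =", a[1])
print("nonnegative:", all((ai >= -1e-12).all() for ai in a))
print("sum over i:", a[0] + a[1])               # componentwise equal to (1, 1)
```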

Theorem 5.56

Define

$${ \chi }_{ij}^\varepsilon (t) = \left \{\begin{array}{@{}l@{\quad }l@{}} {\int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (s)={s}_{ij}\}} - {\nu }_{j}^{i}(s){I}_{\{{ \alpha }^\varepsilon (s)\in {\mathcal{M}}_{i}\}}\right )ds,\quad &\mbox{for }i=1,\ldots,l, \\ {\int }_{0}^{t}{I}_{\{{ \alpha }^\varepsilon (s)={s}_{{_\ast}j}\}}ds, \quad &\mbox{for }i=\ast,\\ \quad \end{array} \right.$$
(5.123)

and assume (A5.9) . Then for each j = 1,…,m i,

$${ \sup }_{t\in [0,T]}E\vert {\chi }_{ij}^\varepsilon (t){\vert }^{2} = \left \{\begin{array}{@{}l@{\quad }l@{}} O(\varepsilon ), \quad &\mbox{for }i=1,\ldots,l, \\ O(\varepsilon ^{2}),\quad &\mbox{for }i=\ast,\\ \quad \end{array} \right.$$
(5.124)

Next, for each fixed t ∈ [0, T], let ξ be a random variable uniformly distributed on [0, 1] that is independent of αε( ⋅). For each j = 1, …, m ∗ , define an integer-valued random variable ξ j (t) by

$$\begin{array}{rl} {\xi }_{j}(t)& = {I}_{\{0\leq \xi \leq {a}_{{m}_{ 1},j}(t)\}} + 2{I}_{\{{a}_{{m}_{ 1},j}(t)<\xi \leq {a}_{{m}_{1},j}(t)+{a}_{{m}_{2},j}(t)\}} \\ & \qquad + \cdots + l{I}_{\{{a}_{{m}_{ 1},j}(t)+\cdots +{a}_{{m}_{l-1},j}(t)<\xi \leq 1\}}\end{array}$$

Now redefine the aggregated process \({\overline{\alpha }}^\varepsilon (\cdot )\) by

$${ \overline{\alpha }}^\varepsilon (t) = \left \{\begin{array}{ll} i, &\mbox{ if }{\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ i}, \\ {\xi }_{j}(t),&\mbox{ if }{\alpha }^\varepsilon (t) = {s}_{{_\ast}j}.\\ \end{array} \right.$$
(5.125)

Note that the state space of \({\overline{\alpha }}^\varepsilon (t)\) is \(\overline{\mathcal{M}} =\{ 1,\ldots,l\}\), and that \({\overline{\alpha }}^\varepsilon (\cdot ) \in D([0,T];\overline{\mathcal{M}})\). Similar to the weak and strong interaction case, but with more effort, we can obtain the following result.
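As a small computational aside, sampling ξ j (t) from the probability vector \(({a}_{{m}_{1},j}(t),\ldots,{a}_{{m}_{l},j}(t))\) with a single uniform random variable is just inverse-transform sampling; the sketch below is illustrative, with an arbitrary probability vector and seed.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_xi(a_col, xi=None):
    """Sample xi_j from the probability vector (a_{m_1,j}, ..., a_{m_l,j})
    using one uniform variable, as in the construction preceding (5.125)."""
    xi = rng.uniform() if xi is None else xi
    return int(np.searchsorted(np.cumsum(a_col), xi)) + 1   # value in {1, ..., l}

a_col = np.array([0.727, 0.273])                # e.g. (a_{m_1,j}(t), a_{m_2,j}(t))
draws = np.array([sample_xi(a_col) for _ in range(10000)])
print("empirical P(xi_j = 1):", np.mean(draws == 1))        # approx a_{m_1,j}(t)
```

The empirical frequency of the value 1 approximates a m 1 , j (t), as required by the redefinition of \({\overline{\alpha }}^\varepsilon (\cdot )\) at the transient states.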

Theorem 5.57

Under conditions (A5.9), \({\overline{\alpha }}^\varepsilon (\cdot )\) converges weakly to \(\overline{\alpha }(\cdot )\) , a Markov chain generated by \({\overline{Q}}_{{_\ast}}(\cdot )\) given by (5.120).

Next, for t ≥ 0, and \(\alpha \in \mathcal{M}\), let β ij (t) be bounded Borel measurable deterministic functions, and let

$${ W}_{ij}(t,\alpha ) = \left \{\begin{array}{ll} ({I}_{\{\alpha ={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{\alpha \in {\mathcal{M}}_{i}\}}){\beta }_{ij}(t),&\mbox{ if }i = 1,\ldots,l,\ j = 1,\ldots,{m}_{i}, \\ {I}_{\{\alpha ={s}_{{_\ast}j}\}}{\beta }_{ij}(t), &\mbox{ if }i = {_\ast},\ j = 1,\ldots,{m}_{{_\ast}}. \end{array} \right.$$
(5.126)

Consider the normalized occupation measure

$${n}^\varepsilon (t) = ({n}_{ 11}^\varepsilon (t),\ldots,{n}_{ 1{m}_{1}}^\varepsilon (t),\ldots,{n}_{ {_\ast}1}^\varepsilon (t),\ldots,{n}_{ {_\ast}{m}_{{_\ast}}}^\varepsilon (t)),$$

where

$${n}_{ij}^\varepsilon (t) = \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}{W}_{ ij}(s,{\alpha }^\varepsilon (s))ds.$$

We can then proceed to obtain the asymptotic distribution.

Theorem 5.58

Assume (A5.9) , and suppose \(\widetilde{Q}(\cdot )\) is twice differentiable with Lipschitz continuous second derivative. Moreover, \(\widehat{Q}(\cdot )\) is differentiable with Lipschitz continuous derivative. Let β ij (⋅) (for i = 1,…,l, j = 1,…,m i ) be bounded and Lipschitz continuous deterministic functions. Then n ε (⋅) converges weakly to a switching diffusion n(⋅), where

$$n(t) = \left({\int }_{0}^{t}\sigma (s,\overline{\alpha }(s))dw(s)\right)^{\prime},$$
(5.127)

where σ(s,i) is similar to (5.105) with the following modifications:

$$\sigma (s,i) = \mathrm{diag}({0}_{{m}_{1}\times {m}_{1}},\ldots,{\sigma }^{0}(s,i),\ldots,{0}_{{ m}_{l}\times {m}_{l}},{0}_{{m}_{{_\ast}}\times {m}_{{_\ast}}})$$
(5.128)

and w(⋅) is a standard m-dimensional Brownian motion.

Finally, we confirm that the case of the generators being merely measurable can be treated as well. We state this as the following theorem.

Theorem 5.59

Assume the generator is given by (5.47) with \(\widetilde{Q}(\cdot )\) given by (5.119) such that \(\widetilde{Q}\) and \(\widehat{Q}\) are measurable and bounded and that \(\widetilde{{Q}}^{i}(t)\) is weakly irreducible for each \(i = 1,\ldots,l\) . Then the following assertions hold:

  • For any i = 1, …, l, j = 1, …, m i , and bounded deterministic function β ij (t), t ≥ 0,

    $$E\left ({\int }_{0}^{T}\left ({I}_{\{{ \alpha }^\varepsilon (t)={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{{\overline{\alpha }}^{ \varepsilon }(t)=i\}}\right ){\beta }_{ij}(t)dt\right )^{2} \rightarrow 0,\mbox{ as }\varepsilon \rightarrow 0.$$
  • \({\overline{\alpha }}^\varepsilon (\cdot )\) converges weakly to \(\overline{\alpha }(\cdot )\), a Markov chain generated by \({\overline{Q}}_{{_\ast}}(\cdot )\).

5.2 Inclusion of Absorbing States

Consider the Markov chain \({\alpha }^\varepsilon (\cdot ) \in \mathcal{M}\), where the generator of αε( ⋅) is still given by (5.47) with

$$\widetilde{Q}(t) = \mathrm{diag}(\widetilde{{Q}}^{1}(t),\ldots,\widetilde{{Q}}^{l}(t),{0}_{{ m}_{a}\times {m}_{a}}),$$
(5.129)

where \({0}_{{m}_{a}\times {m}_{a}}\) is the m a ×m a zero matrix, the state space of αε(t) is given by

$$\mathcal{M} = {\mathcal{M}}_{1} \cup {\mathcal{M}}_{2} \cup \cdots \cup {\mathcal{M}}_{l} \cup {\mathcal{M}}_{a},$$
(5.130)

with \({\mathcal{M}}_{i} =\{ {s}_{i1},\ldots,{s}_{i{m}_{i}}\}\) and \({\mathcal{M}}_{a} =\{ {s}_{a1},\ldots,{s}_{a{m}_{a}}\}\), and \({m}_{1} + {m}_{2} + \cdots + {m}_{l} + {m}_{a} = m\). Assume the following conditions.

  1. (A5.10)

    For all t ∈ [0, T] and i = 1, …, l, \(\widetilde{{Q}}^{i}(t)\) is weakly irreducible. Furthermore, \(\widetilde{Q}(\cdot )\) is differentiable on [0, T] and its derivative is Lipschitz. Moreover, \(\widehat{Q}(\cdot )\) is Lipschitz continuous on [0, T].

Define

$$\begin{array}{ll} &\widetilde{\mathrm{1}\mathrm{l}} = \mathrm{diag}(\mathrm{1}{\mathrm{l}}_{{m}_{1}},\ldots,\mathrm{1}{\mathrm{l}}_{{m}_{l}})\ \mbox{ and }\ \widetilde{\mathrm{1}{\mathrm{l}}}_{a} = \mathrm{diag}(\widetilde{\mathrm{1}\mathrm{l}},{I}_{{m}_{a}}) \\ &\overline{Q}(t) = \mathrm{diag}({\nu }^{1}(t),{\nu }^{2}(t),\ldots,{\nu }^{l}(t),{I}_{{ m}_{a}})\widehat{Q}(t)\widetilde{\mathrm{1}{\mathrm{l}}}_{a}.\end{array}$$
(5.131)

Assume that the conditions in (A5.10) are satisfied. Then we can prove the following:

  1. (a)

    As ε → 0,

    $${p}^\varepsilon (t) = ({\vartheta}(t),{{\vartheta}}^{a}(t))\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t),{I}_{{ m}_{a}}) + O\left (\varepsilon +\exp (-{\kappa }_{0}t/\varepsilon )\right ),$$

    where

    $$\begin{array}{rl} &{\vartheta}(t) = ({{\vartheta}}^{1}(t),\ldots,{{\vartheta}}^{l}(t)) \in {\mathbb{R}}^{1\times l}\ \mbox{ and } \\ &{{\vartheta}}^{a}(t) = ({{\vartheta}}_{ 1}^{a}(t),\ldots,{{\vartheta}}_{{ m}_{a}}^{a}(t)) \in {\mathbb{R}}^{1\times {m}_{a} },\end{array}$$

    satisfying

    $$\begin{array}{rl} { d({\vartheta}(t),{{\vartheta}}^{a}(t)) \over dt} = ({\vartheta}(t),{{\vartheta}}^{a}(t))\overline{Q}(t),\ \ ({\vartheta}(0),{{\vartheta}}^{a}(0)) = {p}^\varepsilon (0)\widetilde{\mathrm{1}{\mathrm{l}}}_{ a} \end{array}$$

    where \(\overline{Q}(t)\) is given in (5.131) and \({p}^\varepsilon (0) = ({p}^{\varepsilon,1}(0),\ldots,{p}^{\varepsilon,l}(0),{p}^{\varepsilon,a}(0))\) with \({p}^{\varepsilon,i}(0) \in {\mathbb{R}}^{1\times {m}_{i}}\) and \({p}^{\varepsilon,a}(0) \in {\mathbb{R}}^{1\times {m}_{a}}\).

  2. (b)

    For the transition probability P ε(t, t 0), we have

    $${P}^\varepsilon (t,{t}_{ 0}) = {P}^{0}(t,{t}_{ 0}) + O\left (\varepsilon +\exp (-{\kappa }_{0}(t - {t}_{0})/\varepsilon ))\right ),$$
    (5.132)

    for some κ0 > 0, where

    $${P}^{0}(t,{t}_{ 0}) =\widetilde{ \mathrm{1}{\mathrm{l}}}_{a}\Theta (t,{t}_{0})\mathrm{diag}({\nu }^{1}(t),\ldots,{\nu }^{l}(t),{I}_{{ m}_{a}}),$$

    and

    $${ d\Theta (t,{t}_{0}) \over dt} = \Theta (t,{t}_{0})\overline{Q}(t),\ \ \Theta ({t}_{0},{t}_{0}) = I.$$

To proceed, we aggregate the states in \({\mathcal{M}}_{i}\) for i = 1, …, l as one state, leading to the definition of the following process:

$${ \overline{\alpha }}^\varepsilon (t) = \left \{\begin{array}{@{}l@{\quad }l@{}} i, \quad &\mbox{if }{\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ i}, \\ {\alpha }^\varepsilon (t),\quad &\mbox{if }{\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ a}.\\ \quad \end{array} \right.$$
(5.133)

For each j = 1, …, m i , we also define a sequence of centered occupation measures by

$${ \chi }_{ij}^\varepsilon (t) = \left \{\begin{array}{@{}l@{\quad }l@{}} {\int }_{0}^{t}\left ({I}_{\{{ \alpha }^\varepsilon (s)={s}_{ij}\}} - {\nu }_{j}^{i}(s){I}_{\{{\overline{\alpha }}^{ \varepsilon }(s)=i\}}\right )ds,\quad &\mbox{for }i=1,\ldots,l, \\ {\int }_{0}^{t}({I}_{\{{ \alpha }^\varepsilon (s)={s}_{aj}\}} - {{\vartheta}}_{j}^{a}(s))ds, \quad &\mbox{for }i=a.\\ \quad \end{array} \right.$$
(5.134)

For t ≥ 0 and \(\alpha \in \mathcal{M}\), let

$${ W}_{ij}(t,\alpha ) = \left \{\begin{array}{@{}l@{\quad }l@{}} {I}_{\{\alpha ={s}_{ij}\}} - {\nu }_{j}^{i}(t){I}_{\{ \alpha \in {\mathcal{M}}_{i}\}},\quad &\mbox{for }i = 1,\ldots,l,\ j = 1,\ldots,{m}_{i}, \\ {I}_{\{\alpha ={s}_{aj}\}} - {{\vartheta}}_{j}^{a}(t), \quad &\mbox{for }j = 1,\ldots,{m}_{ a}.\\ \quad \end{array} \right.$$
(5.135)

Consider the normalized occupation measure

$${n}^\varepsilon (t) = ({n}_{ 11}^\varepsilon (t),\ldots,{n}_{ 1{m}_{1}}^\varepsilon (t),\ldots,{n}_{ l1}^\varepsilon (t),\ldots,{n}_{ l{m}_{l}}^\varepsilon (t),{n}_{ a1}^\varepsilon (t),\ldots,{n}_{ a{m}_{a}}^\varepsilon (t)),$$

where

$${n}_{ij}^\varepsilon (t) = \left \{\begin{array}{@{}l@{\quad }l@{}} \frac{1} {\sqrt\varepsilon }{\int }_{0}^{t}{W}_{ ij}(s,{\alpha }^\varepsilon (s)){\beta }_{ ij}(s)ds,\ i = 1,\ldots,l,\ j = 1,\ldots,{m}_{i},\quad \\ {\int }_{0}^{t}{W}_{ aj}(s,{\alpha }^\varepsilon (s)){\beta }_{ aj}(s)ds,\ j = 1,\ldots,{m}_{a}. \quad \\ \quad \end{array} \right.\ $$

Note that

$$\begin{array}{rl} &{ d{n}^\varepsilon (t) \over dt} = \left \{\begin{array}{@{}l@{\quad }l@{}} \frac{1} {\sqrt\varepsilon }{W}^{r}(t,{\alpha }^\varepsilon (t)),\quad &\mbox{for }{\alpha }^\varepsilon (t) \in {\mathcal{M}}_{ 1} \cup \cdots \cup {\mathcal{M}}_{l}, \\ {W}^{a}(t,{\alpha }^\varepsilon (t)), \quad &\mbox{for }{\alpha }^\varepsilon (t) \in {\mathcal{M}}_{a},\\ \quad \end{array} \right. \\ &{n}^\varepsilon (0) = 0, \end{array}$$

where

$$\begin{array}{rl} &{W}^{r}(t,\alpha ) = ({W}_{ 11}(t,\alpha ),\ldots,{W}_{1{m}_{1}}(t,\alpha ),\ldots,{W}_{l1}(t,\alpha ),\ldots,{W}_{l{m}_{l}}(t,\alpha )), \\ &{W}^{a}(t,\alpha ) = ({W}_{ a1}(t,\alpha ),\ldots,{W}_{a{m}_{a}}(t,\alpha )),\mbox{ and } \\ &W(t,\alpha ) = ({W}^{r}(t,\alpha ),{W}^{a}(t,\alpha ))\end{array}$$
$$\langle {W}^{a}(u,\alpha ),{\nabla }_{ x}^{a}{f}^{0}(x,\alpha )\rangle = \sum \limits_{j=1}^{{m}_{a} }{b}_{j}(u,\alpha ){\partial }_{a,j}{f}^{0}(x,\alpha ).$$

We can obtain the following results.

Theorem 5.60

Assume (A5.10) . Then the following assertions hold.

  1. (a)

    For all i = 1, …, l and j = 1, …, m i , corresponding to the recurrent states, \({\sup }_{t\in [0,T]}E\vert {\chi }_{ij}^\varepsilon (t){\vert }^{2} = O(\varepsilon )\). 

  2. (b)

    \({\overline{\alpha }}^\varepsilon (\cdot )\) converges weakly to \(\overline{\alpha }(\cdot )\), a Markov chain generated by \(\overline{Q}(\cdot )\).

  3. (c)

    Define the generator \(\mathcal{L}\) by

    $$\begin{array}{rl} \mathcal{L}{f}^{0}(x,\alpha )& ={ 1 \over 2} {\sum }_{{j}_{1},{j}_{2}=1}^{{m}_{\alpha } }{a}_{{j}_{1}{j}_{2}}(s,\alpha ){\partial }_{\alpha,{j}_{1}{j}_{2}}^{2}{f}^{0}(x,\alpha ) \\ &\ + \sum \limits_{j=1}^{{m}_{a} }{b}_{j}(s,\alpha ){\partial }_{a,j}{f}^{0}(x,\alpha ) + \overline{Q}(s){f}^{0}(x,\cdot )(\alpha )\end{array}$$

    Then the sequence \({Y }^\varepsilon (\cdot ) = ({n}^\varepsilon (\cdot ),{\overline{\alpha }}^\varepsilon (\cdot ))\) converges weakly to \(\overline{Y }(\cdot ) = (n(\cdot ),\overline{\alpha }(\cdot ))\) that is a solution of the martingale problem with operator \(\mathcal{L}\).

Next, assume that \(\widetilde{Q}(\cdot )\) and \(\widehat{Q}(\cdot )\) are bounded and measurable and \(\widetilde{{Q}}^{i}(t)\) for each \(i = 1,\ldots,l\) is weakly irreducible. Then

$${p}^\varepsilon (t) = ({p}_{ 11}^\varepsilon (t),\ldots,{p}_{ 1{m}_{1}}^\varepsilon (t),\ldots,{p}_{ l1}^\varepsilon (t),\ldots,{p}_{ l{m}_{l}}^\varepsilon (t),{p}_{ a1}^\varepsilon (t),\ldots,{p}_{ a{m}_{a}}^\varepsilon (t))$$

converges in the weak topology of \({L}^{2}([0,T]; {\mathbb{R}}^{m})\) (with \(m = \sum \limits_{i=1}^{l}{m}_{i} + {m}_{a})\) to

$$p(t) = ({{\vartheta}}_{1}(t){\nu }^{1}(t),\ldots,{{\vartheta}}_{l}(t){\nu }^{l}(t),{p}^{0,a}),$$

where p 0, a is the subvector in the initial data p 0 corresponding to the absorbing states.

Note that in deriving the asymptotic distribution of the scaled occupation measures, we need to compute the asymptotic covariance of the limit process. That is, we need to evaluate the limit of

$$\begin{array}{ll} &E{\int }_{0}^{t}\left (\begin{array}{*{10}c} { 1 \over \sqrt\varepsilon } ({W}^{r}(s,{\alpha }^\varepsilon (s)))^{\prime} \\ ({W}^{a}(s,{\alpha }^\varepsilon (s)))^{\prime}\\ \end{array} \right )\left (\begin{array}{*{10}c} { 1 \over \sqrt\varepsilon } {W}^{r}(s,{\alpha }^\varepsilon (s)),&{W}^{a}(s,{\alpha }^\varepsilon (s)) \\ \end{array} \right )ds \\ &\qquad \stackrel{\mathrm{def}}{=}\left (\begin{array}{*{10}c} {W}_\varepsilon ^{rr}(t)&{W}_\varepsilon ^{ra}(t) \\ {W}_\varepsilon ^{ar}(t)&{W}_\varepsilon ^{aa}(t)\\ \end{array} \right ),\end{array}$$
(5.136)

where

$$\begin{array}{rl} &{W}_\varepsilon ^{rr}(t) ={ 1 \over \varepsilon } {\int }_{0}^{t}E({W}^{r}(s,{\alpha }^\varepsilon (s)))^{\prime}{W}^{r}(s,{\alpha }^\varepsilon (s))ds \\ &{W}_\varepsilon ^{ra}(t) ={ 1 \over \sqrt\varepsilon } {\int }_{0}^{t}E({W}^{r}(s,{\alpha }^\varepsilon (s)))^{\prime}{W}^{a}(s,{\alpha }^\varepsilon (s))ds \\ &{W}_\varepsilon ^{ar}(t) ={ 1 \over \sqrt\varepsilon } {\int }_{0}^{t}E({W}^{a}(s,{\alpha }^\varepsilon (s)))^{\prime}{W}^{r}(s,{\alpha }^\varepsilon (s))ds \\ &{W}_\varepsilon ^{aa}(t) ={ \int }_{0}^{t}E({W}^{a}(s,{\alpha }^\varepsilon (s)))^{\prime}{W}^{a}(s,{\alpha }^\varepsilon (s))ds\end{array}$$

It can be shown that

$${W}_\varepsilon ^{rr}(t) \rightarrow {\overline{W}}^{r}(t),\ {W}_{ \varepsilon }^{ra}(t) \rightarrow 0,\ {W}_{ \varepsilon }^{ar}(t) \rightarrow 0,\ \mbox{ and }{W}_{ \varepsilon }^{aa}(t) \rightarrow {\overline{W}}^{a}(t),$$

as ε → 0, where for i = 1, …, l, \({\overline{W}}^{r}(t) ={ \overline{W}}^{r}(t,i) ={ \int }_{0}^{t}\widehat{{W}}^{r}(s,i)ds\) with

$$\widehat{{W}}^{r}(s,i) = \mathrm{diag}({0}_{{ m}_{1}\times {m}_{1}},\ldots,\sigma (s,i),\ldots,{0}_{{m}_{l}\times {m}_{l}})$$
(5.137)

with σ(s, i) the m i ×m i matrix such that \(\sigma (s,i)\sigma ^{\prime}(s,i) = A(s,i)\mbox{ for }i = 1,\ldots,l,\) and

$$\begin{array}{ll} &{\overline{W}}^{a}(t) = ({\overline{W}}_{ jk}^{a}(t))\mbox{ with }\ {\overline{W}}_{ jk}^{a}(t) ={ \int }_{0}^{t}\left ({\delta }_{ jk}{{\vartheta}}_{j}^{a}(s) - {{\vartheta}}_{ j}^{a}(s){{\vartheta}}_{ k}^{a}(s)\right )ds, \end{array}$$
(5.138)

where δ jk  = 1 if j = k, δ jk  = 0 if j ≠ k. The detailed proof of Theorem  5.60 can be found in Yin, Zhang, and Badowski [241].

6 Remarks on a Stability Problem

So far, our study has been devoted to systems with two time scales on a finite interval. In many problems arising in networked control systems, stability is often a main concern. A related problem along this line is treated in Badowski and Yin [5].

It is interesting to note that intuitive ideas are not necessarily true for systems with switching. For example, suppose one puts together two stable systems by using, for instance, Markovian switching. Our intuition may lead to the conclusion that the combined system should also be stable. Nevertheless, this is, in fact, not true. Such an idea was illustrated in Wang, Khargonekar, and Beydoun [212] for deterministically switched systems; see also Chapter 1 of this book concerning this matter.

As a variation of the system in [212], we consider the following example. Suppose that αε( ⋅) is a continuous-time Markov chain with state space \(\mathcal{M} =\{ 1,2\}\) and generator \({Q}^\varepsilon = Q/\varepsilon \), where \(Q = \left (\begin{array}{cc} - 1& 1\\ 1 & -1 \end{array} \right )\). Consider a controlled system

$$\dot{x} = A({\alpha }^\varepsilon (t))x + B({\alpha }^\varepsilon (t))u(t),$$

with state feedback \(u(t) = K({\alpha }^\varepsilon (t))x(t)\). Then we obtain the equivalent representation

$$\dot{x} = [A({\alpha }^\varepsilon (t)) - B({\alpha }^\varepsilon (t))K({\alpha }^\varepsilon (t))]x.$$
(5.139)

Suppose that

$$\begin{array}{rl} &G(1) = A(1) - B(1)K(1) = \left (\begin{array}{cc} - 100& 20\\ 200 & - 100 \end{array} \right ), \\ &G(2) = A(2) - B(2)K(2) = \left (\begin{array}{cc} - 100& 200\\ 20 & - 100 \end{array} \right ).\end{array}$$

Note that both matrices are Hurwitz (i.e., their eigenvalues have negative real parts). A question of interest is this: Is system (5.139) stable? The key to understanding the system is to examine

$$\dot{{x}}^\varepsilon (t) = G({\alpha }^\varepsilon (t)){x}^\varepsilon (t),$$
(5.140)

where both G(1) and G(2) are stable matrices.

Since Q is irreducible, the stationary distribution associated with Q is given by \((1/2,1/2)\). As a result, as ε → 0, using our weak convergence result, x ε( ⋅) converges weakly to x( ⋅), which is a solution of the system

$$\begin{array}{ll} &\dot{x}(t) = \overline{G}x(t),\ \mbox{ where } \\ &\overline{G} ={ 1 \over 2} (G(1) + G(2)) = \left (\begin{array}{cc} - 100& 110\\ 110 & - 100 \end{array} \right ).\end{array}$$
(5.141)

In addition, for any T < ∞, using the large deviations result obtained in He, Yin, and Zhang [84], we can show that for any δ > 0, there is a c 1 > 0 such that

$$P({\rho }_{0,T}({x}^\varepsilon (t),x(t)) \geq \delta ) \leq \exp (-{c}_{ 1}/\varepsilon ),$$
(5.142)

where \({\rho }_{0,T}(x,y) {=\sup }_{0\leq t\leq T}\vert x(t) - y(t)\vert \).

Note that \(\overline{G}\) is an unstable matrix with eigenvalues − 210 and 10. Thus for (5.141), the critical point (0, 0) is a saddle point. But why should the stability of the averaged system dominate that of the original system? To see this, note that by a standard result of linear algebra, there is a nonsingular matrix H such that \(H\overline{G}{H}^{-1} = \Lambda = \mathrm{diag}(-210,10)\). Clearly, the stability of (5.141) is equivalent to that of

$$\dot{y}(t) = \Lambda y(t) = H{\sum }_{i=1}^{2}{\nu }_{ i}G(i){H}^{-1}y(t) = \mathrm{diag}(-210,10)y(t),$$
(5.143)

where \(y = Hx = ({y}_{1},{y}_{2})^{\prime}\). The transformed system (5.143) is completely decoupled, with \({y}_{1}(t) =\exp (-210t){y}_{1}(0) \rightarrow 0\) and y 2(t) = exp(10t)y 2(0) → ∞ as t → ∞. To see how the original system (5.140) behaves, we apply the same transformation to get

$$\dot{{y}}^\varepsilon (t) = H{\sum }_{i=1}^{2}{I}_{\{{ \alpha }^\varepsilon (t)=i\}}G(i){H}^{-1}{y}^\varepsilon (t).$$
(5.144)

For the transformed system (5.143), by choosing \(V (y) = {y}_{2}^{2}/2\), we obtain \(\dot{V }(y(t)) = 10{y}_{2}^{2} > 0\) for all y 2 ≠ 0. Define \({L}^\varepsilon z(t) {=\lim }_{\delta \downarrow 0}{E}_{t}^\varepsilon [z(t + \delta ) - z(t)]/\delta \) for a real-valued function z(t) that is continuously differentiable, where E t ε denotes the conditional expectation given \({\mathcal{F}}_{t}^\varepsilon = \sigma \{{\alpha }^\varepsilon (s) : s \leq t\}\). With \(V (y) = {y}_{2}^{2}/2\), we have

$${L}^\varepsilon V ({y}^\varepsilon (t)) = 10{({y}_{ 2}^\varepsilon (t))}^{2} + {V ^{\prime}}_{ y}({y}^\varepsilon (t))H{\sum }_{i=1}^{2}[{I}_{\{{ \alpha }^\varepsilon (t)=i\}} - {\nu }_{i}]G(i){H}^{-1}{y}^\varepsilon (t),$$

where \({V ^{\prime}}_{y}(y) = (0,{y}_{2}) \in {\mathbb{R}}^{1\times 2}\). Using perturbed Liapunov function techniques as done in Badowski and Yin [5], define a perturbation

$${V }_{2}^\varepsilon (y,t) = {E}_{ t}^\varepsilon { \int }_{t}^{\infty }{e}^{t-s}{V }_{ y}^{\prime}(y)H{\sum }_{i=1}^{2}[{I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu }_{i}]G(i){H}^{-1}y\,ds.$$

It can be shown that V 2 ε(y, t) = O(ε)V (y). In addition,

$$\begin{array}{rl} &{L}^\varepsilon {V }_{ 2}^\varepsilon ({y}^\varepsilon (t),t) = -{V ^{\prime}}_{ y}({y}^\varepsilon (t))H{\sum }_{i=1}^{2}[{I}_{\{{ \alpha }^\varepsilon (t)=i\}} - {\nu }_{i}]G(i){H}^{-1}{y}^\varepsilon (t) \\ & + O(\varepsilon )V ({y}^\varepsilon (t))\end{array}$$

Define

$${V }^\varepsilon (y,t) = V (y) + {V }_{ 2}^\varepsilon (y,t).$$

Evaluate L ε V ε(y ε(t), t). Upon cancelation, for sufficiently small ε, we can make

$$O(\varepsilon )V ({y}^\varepsilon (t)) \geq -{({y}_{ 2}^\varepsilon (t))}^{2}.$$

It then follows that

$$\begin{array}{rl} {L}^\varepsilon {V }^\varepsilon ({y}^\varepsilon (t),t)& = 10{({y}_{ 2}^\varepsilon (t))}^{2} + O(\varepsilon )V ({y}^\varepsilon (t)) \\ & \geq 9{({y}_{2}^\varepsilon (t))}^{2}\end{array}$$

Taking expectation of the left- and right-hand sides above leads to

$${ d \over dt} E\vert {y}_{2}^\varepsilon (t){\vert }^{2} \geq 9E\vert {y}_{ 2}^\varepsilon (t){\vert }^{2},$$

which in turn yields that

$$E{({y}_{2}^\varepsilon (t))}^{2} \geq E{({y}_{ 2}^\varepsilon (0))}^{2}\exp (9t) \rightarrow \infty \ \mbox{ as }\ t \rightarrow \infty.$$
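The passage from the differential inequality to this exponential lower bound is the standard comparison argument: since

$${ d \over dt} \Big({e}^{-9t}E\vert {y}_{2}^\varepsilon (t){\vert }^{2}\Big) = {e}^{-9t}\Big({ d \over dt} E\vert {y}_{2}^\varepsilon (t){\vert }^{2} - 9E\vert {y}_{2}^\varepsilon (t){\vert }^{2}\Big) \geq 0,$$

the function \({e}^{-9t}E\vert {y}_{2}^\varepsilon (t){\vert }^{2}\) is nondecreasing in t, which gives the displayed bound. The same comparison, with the inequality reversed, is used for the \({y}_{1}\)-component below.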

In a similar manner, to treat the \({y}_{1}\)-component, choose \(V (y) = {y}_{1}^{2}/2\) and define

$${V }_{1}^\varepsilon (y,t) = {E}_{ t}^\varepsilon { \int }_{t}^{\infty }{e}^{t-s}{V }_{ y}^{\prime}(y)H{\sum }_{i=1}^{2}[{I}_{\{{ \alpha }^\varepsilon (s)=i\}} - {\nu }_{i}]G(i){H}^{-1}y\,ds,$$

and redefine

$${V }^\varepsilon (y,t) = V (y) + {V }_{ 1}^\varepsilon (y,t).$$

Using the upper bound \(O(\varepsilon )V ({y}^\varepsilon (t)) \leq {({y}_{1}^\varepsilon (t))}^{2}\) this time and calculating \({L}^\varepsilon {V }^\varepsilon ({y}^\varepsilon (t),t)\), we obtain

$${ d \over dt} E\vert {y}_{1}^\varepsilon (t){\vert }^{2} \leq -209E\vert {y}_{ 1}^\varepsilon (t){\vert }^{2},$$

which in turn yields that

$$E{({y}_{1}^\varepsilon (t))}^{2} \leq E{({y}_{ 1}^\varepsilon (0))}^{2}\exp (-209t) \rightarrow 0\ \mbox{ as }\ t \rightarrow \infty.$$

This shows that (5.144), and hence (5.140), is unstable in probability (see Yin and Zhu [244, p. 220] for a definition). In fact, it can be seen that the trivial solution of the original system is also a saddle point.

In the same spirit as the last example, consider a system given by

$$\begin{array}{ll} &\dot{{x}}^\varepsilon (t) = G({\alpha }^\varepsilon (t)){x}^\varepsilon (t),\ \ {\alpha }^\varepsilon (t) \sim Q/\varepsilon,\ \mbox{ where } \\ &G(1) = \left (\begin{array}{rr} -{ 7 \over 3} & - 1\\ 0& 1 \end{array} \right ),\quad G(2) = \left (\begin{array}{rr} 1& 0\\ - 1& -{ 7 \over 3} \end{array} \right ), \end{array}$$
(5.145)

where Q is as in the last example. Then it can be shown that \({x}^\varepsilon (\cdot )\) converges weakly to \(x(\cdot )\), the solution of the averaged system

$$\begin{array}{ll} \dot{x}(t)& = \overline{G}x(t), \\ \overline{G} & ={ 1 \over 2} (G(1) + G(2)) = \left (\begin{array}{rr} -{ 2 \over 3} & -{ 1 \over 2}\\ -{ 1 \over 2}& -{ 2 \over 3} \end{array} \right ).\end{array}$$
(5.146)

Neither G(1) nor G(2) is a stable matrix, but the averaged system (5.146) is stable. The stability analysis is again carried out using perturbed Liapunov function methods; exactly the same kind of argument as in Badowski and Yin [5] applies and shows that, for sufficiently small ε, the stability of the averaged system “implies” that of the original system.
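The averaging effect in this example can also be seen by direct simulation. The following sketch is illustrative only: the generator Q = [[−1, 1], [1, −1]], the crude Euler discretization, and all numerical parameters are our own assumptions here (the text only specifies that Q is as in the last example; any irreducible Q whose stationary distribution is ν = (1/2, 1/2), as implied by the averaging in (5.141), behaves the same way up to a time change).

```python
import numpy as np

rng = np.random.default_rng(0)

# Switching matrices from (5.145); each has the unstable eigenvalue 1.
G = {1: np.array([[-7.0 / 3.0, -1.0], [0.0, 1.0]]),
     2: np.array([[1.0, 0.0], [-1.0, -7.0 / 3.0]])}

# Assumed generator of the fast chain (not restated in this section);
# its stationary distribution is nu = (1/2, 1/2).
Q = np.array([[-1.0, 1.0], [1.0, -1.0]])

def simulate(eps, T=30.0, dt=1e-3, x0=(1.0, 1.0)):
    """Euler path of dx/dt = G(alpha(t)) x with alpha generated by Q/eps."""
    x = np.array(x0, dtype=float)
    state = 1
    for _ in range(int(T / dt)):
        # Jump of the fast chain on [t, t + dt): rate -Q[i, i] / eps.
        if rng.random() < -Q[state - 1, state - 1] * dt / eps:
            state = 2 if state == 1 else 1
        x = x + dt * (G[state] @ x)
    return np.linalg.norm(x)

for eps in (1.0, 0.1, 0.01):
    print(f"eps = {eps:4.2f}:  |x(T)| = {simulate(eps):.3e}")

# Averaged matrix under nu = (1/2, 1/2): both eigenvalues are negative,
# although neither G(1) nor G(2) is stable.
print("eigenvalues of (G(1)+G(2))/2:", np.linalg.eigvals(0.5 * (G[1] + G[2])))
```

For ε of order one the sample paths typically grow, reflecting the unstable eigenvalue of each G(i); for small ε the fast switching averages the dynamics and \(\vert {x}^\varepsilon (t)\vert \) decays, consistent with the weak convergence to (5.146).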

These two examples illustrate that one can combine two stable systems using Markovian switching to produce an unstable limit system; likewise, one can combine two unstable systems to produce a stable limit system. More importantly, using the weak convergence results of this chapter and the large deviations results in He, Yin, and Zhang [84], combined with the perturbed Liapunov function argument, we can explain why this happens.

7 Notes

This chapter concerns sequences of functional occupation measures. It includes convergence of an unscaled sequence (in probability) and central-limit-type results for suitably scaled sequences. For a general introduction to central limit theorems, we refer to the book by Chow and Teicher [30] and the references therein. In the stationary case, that is, Q(t) = Q, a constant matrix, the central limit theorem may be obtained as in Freidlin and Wentzell [67]. Some results of central limit type for discrete Markov chains are in Dobrushin [50] (see also the work of Linnik on time-inhomogeneous Markov chains [147]). Work in the context of random evolution, whose primary concern is the central limit theorem involving a singularly perturbed Markov chain, is in Pinsky [176]; see also Kurtz [135, 137] for related discussions and the martingale problem formulation. Exponential bounds for Markov processes are quite useful in analyzing the behavior of the underlying stochastic processes. Some results in connection with diffusions can be found in Kallianpur [102]. Corollary 5.8 can be viewed as a large deviations result. For an extensive treatment of large deviations, see Varadhan [207].

The central theme here is limit results for unscaled as well as scaled sequences of occupation measures, which include the law of large numbers for an unscaled sequence, exponential upper bounds, and the asymptotic distribution of a suitably scaled sequence of occupation times. Results in Section 5.2 are based on the paper of Zhang and Yin [252]; however, a somewhat different approach to the central limit theorem was used in [252]. Some of the results in Section 5.3 are based on Zhang and Yin [253]. The exponential error bound in Section 5.3 is a natural extension of the corresponding bound for irreducible generators. Such a result holds uniformly in t ∈ [0, T] for fixed but otherwise arbitrary T > 0. The main motivation for treating T as a parameter stems from various control and optimization problems with discounted cost over the infinite horizon. In such a situation, the magnitude of the bound counts, so detailed information on the bounding constant is helpful for dealing with the near optimality of the underlying problem. Section 5.3 also presents a characterization of the limit process using martingale problem formulations. Much of the foundation of this useful approach is in the work of Stroock and Varadhan [203]. Using perturbed operators to study limit behavior may be traced back to Kurtz [135]. The general idea of perturbed test functions was used in Blankenship and Papanicolaou [16], and Papanicolaou, Stroock, and Varadhan [168]. It was further developed and extended by Kushner [139] for various stochastic systems, and for singularly perturbed systems in Kushner [140]; see also Kushner and Yin [145] for related stochastic approximation problems, and Ethier and Kurtz [59] and Kurtz [137] for related work in stochastic processes. The results of this section have benefited from discussions with Thomas Kurtz, who suggested treating the pair of processes \(({n}^\varepsilon (\cdot ),{\alpha }^\varepsilon (\cdot ))\) together, which led to the current version. Earlier treatment of a pair of processes may be found in the work of Kesten and Papanicolaou [110] on stochastic acceleration.

The results on asymptotic properties for the inclusion of transient states can be found in Yin, Zhang, and Badowski [239]; the results for the case of generators being measurable can be found in the work of Yin, Zhang, and Badowski [240]; the results on asymptotic properties of occupation measures with absorbing states can be found in Yin, Zhang, and Badowski [241].