Abstract
Let \((\xi _n)_{n=0}^\infty \) be a nonhomogeneous Markov chain taking values in a finite state-space \(\mathbf {X}=\{1,2,\ldots ,b\}\). In this paper, we will study the generalized entropy ergodic theorem with almost-everywhere and \(\mathcal {L}_1\) convergence for nonhomogeneous Markov chains; this generalizes the corresponding classical results for Markov chains.
1 Introduction
We begin by introducing some notation that will be used throughout the paper. Assume that \((\xi _n)_{n=0}^\infty \) is a sequence of arbitrary random variables taking values in the finite set \(\mathbf {X} =\{1,2,\ldots ,b\}\), and let \((\Omega ,\mathcal {F},\mathbb {P})\) be the underlying probability space. For convenience, denote by \(\xi _{m,n}\) the random vector \((\xi _m,\ldots ,\xi _{m+n})\) and by \(x_{m,n}=(x_m,\ldots ,x_{m+n})\) a realization of \(\xi _{m,n}\). Suppose the joint distribution of \(\xi _{m,n}\) is
Let \((a_n)_{n=0}^\infty \) and \((\phi (n))_{n=0}^\infty \) be two sequences of nonnegative integers such that \(\phi (n)\) tends to infinity as \(n\rightarrow \infty \). Let
where \(\log \) is the natural logarithm. We call \(f_{a_n,\phi (n)}(\omega )\) the generalized entropy density of \(\xi _{a_n,\phi (n)}\). If \(a_n\equiv 0\) and \(\phi (n)=n\), then \(f_{a_n,\phi (n)}(\omega )\) reduces to the classical entropy density of \(\xi _{0,n}\), defined as follows:
If \((\xi _n)_{n=0}^\infty \) is a nonhomogeneous Markov chain taking values in the finite state-space \(\mathbf {X} =\{1,2,\ldots ,b\}\) with the initial distribution
and the transition matrices
where \(p_n(i,j)=\mathbb {P}(\xi _n=j|\xi _{n-1}=i)\), then
where \(\mu _{a_n}(x)\) is the distribution of \(\xi _{a_n}\).
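To make the Markov factorization above concrete, the following sketch evaluates the generalized entropy density of a fixed realization. It is illustrative only: the two-state chain, the constant transition matrix, and the distribution \(\mu \) are hypothetical, and the normalization \(-\frac{1}{\phi (n)}\log \mathbb {P}(\xi _{a_n,\phi (n)}=x_{a_n,\phi (n)})\) is assumed, since the display of Eq. (1.2) is not reproduced here.

```python
import math

# Hypothetical two-state chain; a constant transition matrix keeps the
# sketch short (a homogeneous chain is a special case of a nonhomogeneous one).
P = [[0.9, 0.1], [0.2, 0.8]]

def p(k):
    """Transition matrix used at step k (constant in this toy example)."""
    return P

def entropy_density(path, mu_an, a_n, phi_n):
    """Generalized entropy density of x_{a_n}, ..., x_{a_n+phi(n)}, assuming
    the normalization -(1/phi(n)) log P(segment) and the Markov factorization
    P(segment) = mu_{a_n}(x_{a_n}) * prod_k p_k(x_{k-1}, x_k)."""
    log_joint = math.log(mu_an[path[a_n]])
    for k in range(a_n + 1, a_n + phi_n + 1):
        log_joint += math.log(p(k)[path[k - 1]][path[k]])
    return -log_joint / phi_n

path = [0, 0, 1, 1, 1, 0, 0, 0]   # a fixed realization x_0, ..., x_7
mu = [0.5, 0.5]                   # assumed distribution of xi_{a_n}
f = entropy_density(path, mu, a_n=2, phi_n=5)
```

Here the segment \(x_2,\ldots ,x_7\) contributes one initial-probability factor and five transition factors.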
The convergence of \(f_{0,n}(\omega )\) to a constant, in the sense of \(\mathcal {L}_1\) convergence, convergence in probability, or \(a.e.\) convergence, is called the Shannon–McMillan–Breiman theorem, the entropy ergodic theorem, or the asymptotic equipartition property (AEP) in information theory. Shannon [11] first established the entropy ergodic theorem for convergence in probability for stationary ergodic information sources with a finite alphabet. McMillan [10] and Breiman [3] obtained, for finite stationary ergodic information sources, the entropy ergodic theorem in \(\mathcal {L}_1\) and \(a.e.\) convergence, respectively. Chung [6] considered the case of a countable alphabet. The entropy ergodic theorem for general stochastic processes can be found, for example, in Barron [2], Kieffer [8], or Algoet and Cover [1]. Yang [12] obtained the entropy ergodic theorem for a class of nonhomogeneous Markov chains; Yang and Liu [13] obtained it for a class of \(m\)th-order nonhomogeneous Markov chains; and Zhong, Yang, and Liang [14] obtained it for a class of asymptotic circular Markov chains.
The second term of Eq. (1.6) is actually a delayed sum of random variables; delayed sums were first introduced by Zygmund [15], who used them to prove a Tauberian theorem of Hardy. Since then, much work has been done to investigate the properties of delayed sums. For example, by using the limiting behavior of delayed sums, Chow [4] found necessary and sufficient conditions for the Borel summability of i.i.d. random variables and simplified the proofs of a number of well-known results such as the Hsu–Robbins–Spitzer–Katz theorem. Lai [9] studied analogues of the law of the iterated logarithm for delayed sums of independent random variables. Recently, Gut and Stadtmüller [7] studied the strong law of large numbers for delayed sums of random fields.
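As a toy illustration (not from the paper), the normalized delayed sum simply averages the window of length \(\phi (n)\) starting just after index \(a_n\):

```python
def delayed_average(x, a_n, phi_n):
    """Normalized delayed sum: (1/phi(n)) * sum_{k=a_n+1}^{a_n+phi(n)} x_k."""
    return sum(x[a_n + 1 : a_n + phi_n + 1]) / phi_n

x = list(range(20))                        # x_k = k for k = 0, ..., 19
avg = delayed_average(x, a_n=4, phi_n=10)  # averages x_5, ..., x_14
```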
Let \((\xi _n)_{n=0}^\infty \) be a nonhomogeneous Markov chain with the transition matrices (1.5). Yang [12] showed that the classical entropy density \(f_{0,n}(\omega )\) of this Markov chain converges \(a.e.\) to the entropy rate of a Markov chain under the condition that \(\lim _{n\rightarrow \infty }\frac{1}{n}\sum _{k=1}^{n}|p_k(i,j)-p(i,j)|=0\) for all \(i,j\in \mathbf {X}\), where \(P=(p(i,j))_{b\times b}\) is an irreducible transition matrix. In this paper, we prove that the generalized entropy density \(f_{a_n,\phi (n)}(\omega )\) converges \(a.e.\) and in \(\mathcal {L}_1\) to this entropy rate under some mild conditions; we call this the generalized entropy ergodic theorem. The results of this paper generalize those in [12].
To prove the main results, we first establish a strong limit theorem for delayed sums of functions of two variables for nonhomogeneous Markov chains; we then obtain strong limit theorems for the frequencies of occurrence of states and of ordered pairs of states in the segment \(\xi _{a_n},\ldots ,\xi _{a_n+\phi (n)}\) for these Markov chains. Finally, we present the main results. We also prove that the \(f_{a_n,\phi (n)}(\omega )\) are uniformly integrable for an arbitrary sequence of random variables taking values in a finite set.
The approach used in this paper differs from that of some previous works [12, 13], where the strong law of large numbers for martingales is applied. Since \(f_{a_n,\phi (n)}(\omega )\) is a delayed sum of \(\log p_k(\xi _{k-1},\xi _{k})\), the strong law of large numbers for martingales cannot be applied. The essence of the technique used in this paper is first to construct a one-parameter family of random variables, each with mean 1, and then, using the Borel–Cantelli lemma, to prove the \(a.e.\) convergence of certain random variables.
The rest of this paper is organized as follows. In Sect. 2, we establish some preliminary results that will be used to prove our main results; the main results and their proofs are presented in Sect. 3.
2 Some Lemmas
Before proving the main results, we begin with some lemmas.
Lemma 1
Suppose \((\xi _n)_{n=0}^\infty \) is a nonhomogeneous Markov chain taking values in the finite state-space \(\mathbf {X} =\{1,2,\ldots ,b\}\) with the initial distribution (1.4) and the transition matrices (1.5). Suppose \((a_n)_{n=0}^\infty \) and \((\phi (n))_{n=0}^\infty \) are two sequences of nonnegative integers such that \(\phi (n)\) tends to infinity as \(n\rightarrow \infty \). Let \((g_n(x,y))_{n=0}^\infty \) be a sequence of real functions defined on \(\mathbf {X}\times \mathbf {X}\). If for every \(\varepsilon >0\)
and there exists a real number \( 0<\gamma <\infty \) such that
then, we have
Remark 1
Obviously, condition (2.1) in Lemma 1 can be easily satisfied. For example, let \(\phi (n)=[n^\alpha ]\ (\alpha >0)\), where \([\cdot ]\) is the greatest integer function; then (2.1) holds. If the \((g_n(x,y))_{n=1}^\infty \) are uniformly bounded, then Eq. (2.2) holds.
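Assuming condition (2.1) is the summability requirement \(\sum _{n=1}^\infty e^{-\varepsilon \phi (n)}<\infty \) for every \(\varepsilon >0\) (an assumption here, since the display is not reproduced above), the claim for \(\phi (n)=[n^\alpha ]\) follows from \([n^\alpha ]\ge n^\alpha -1\) and the integral test:

```latex
\sum_{n=1}^{\infty} e^{-\varepsilon [n^{\alpha}]}
  \le e^{\varepsilon} \sum_{n=1}^{\infty} e^{-\varepsilon n^{\alpha}}
  \le e^{\varepsilon} \Bigl( e^{-\varepsilon} + \int_{1}^{\infty} e^{-\varepsilon t^{\alpha}}\, \mathrm{d}t \Bigr)
  < \infty , \qquad \alpha > 0,\ \varepsilon > 0 .
```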
Remark 2
Since \(E[g_k(\xi _{k-1},\xi _{k})|\xi _{k-1}]=\sum _{j=1}^bg_k(\xi _{k-1},j) p_k(\xi _{k-1},j) \), Eq. (2.3) can be rewritten as
Proof
Let \(s\) be a nonzero real number, and define
and note that
For any \(\varepsilon >0\), by the Markov inequality and Eq. (2.1), we have
By the Borel–Cantelli lemma and the arbitrariness of \(\varepsilon \), we have
Note that
By Eqs. (2.7) and (2.8), we have
Taking \(0<s<\gamma \) and dividing both sides of Eq. (2.9) by \(s\), we obtain
Using the inequalities \(\log x\le x-1\ (x> 0)\) and \(0\le e^x-1-x\le \frac{1}{2}x^2e^{|x|}\ ( x\in \mathbf {R})\), from Eq. (2.10), we have
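As a quick sanity check (illustrative only, not part of the proof), both elementary inequalities can be verified numerically at a few sample points:

```python
import math

def check(x):
    """Spot-check log x <= x - 1 (x > 0) and 0 <= e^x - 1 - x <= x^2 e^|x| / 2."""
    ok1 = (x <= 0) or (math.log(x) <= x - 1)
    g = math.exp(x) - 1 - x
    ok2 = 0 <= g <= 0.5 * x * x * math.exp(abs(x))
    return ok1 and ok2

results = [check(x) for x in (-3.0, -0.5, 0.25, 1.0, 4.0)]
```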
Letting \(s\downarrow 0\) in Eq. (2.11), we obtain
Letting \(-\gamma <s<0\) in (2.9), we similarly obtain
Equation (2.3) follows immediately from Eqs. (2.12) and (2.13).\(\square \)
Let \(\mathbf {1}_{\{\cdot \}}(\cdot )\) be the indicator function and \(S_{m,n}(j;\omega )\) the number of occurrences of \(j\) in the segment \(\xi _{m},\ldots ,\xi _{m+n-1}\). It is easy to see that
Let \(S_{m,n}(i,j;\omega )\) be the number of occurrences of the pair \((i,j)\) in the sequence of ordered pairs \((\xi _{m},\xi _{m+1}),\ldots ,\, (\xi _{m+n-1},\xi _{m+n})\). Then
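The counting functions just defined are straightforward to compute from a realization; the following sketch (with a hypothetical path) counts states and ordered pairs in a segment:

```python
def S_state(x, m, n, j):
    """Number of occurrences of j in the segment x_m, ..., x_{m+n-1}."""
    return sum(1 for k in range(m, m + n) if x[k] == j)

def S_pair(x, m, n, i, j):
    """Occurrences of (i, j) among (x_m, x_{m+1}), ..., (x_{m+n-1}, x_{m+n})."""
    return sum(1 for k in range(m, m + n) if x[k] == i and x[k + 1] == j)

x = [1, 2, 1, 1, 2, 1, 2, 2]        # hypothetical realization x_0, ..., x_7
s1 = S_state(x, m=1, n=5, j=1)      # state 1 among x_1, ..., x_5
s2 = S_pair(x, m=1, n=5, i=1, j=2)  # pair (1,2) among (x_1,x_2), ..., (x_5,x_6)
```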
Corollary 1
Under the conditions of Lemma 1, let \(S_{m,n}(j;\omega )\) be defined as before. Then
Proof
Let \( g_k(x,y)=\mathbf {1}_{\{j\}}(y)\) in Lemma 1. It is easy to see that \(\{ g_k(x,y), k\ge 1\}\) satisfies Eq. (2.2) of Lemma 1. Noticing that
Equation (2.14) follows from Lemma 1.\(\square \)
Corollary 2
Under the conditions of Lemma 1, let \(S_{m,n}(i,j;\omega )\) be defined as before. Then
Proof
Let \(g_k(x,y)=\mathbf {1}_{\{i\}}(x)\mathbf {1}_{\{j\}}(y)\) in Lemma 1. It is easy to see that \(\{ g_k(x,y), k\ge 1\}\) satisfies Eq. (2.2) of Lemma 1. Noticing that
Equation (2.16) follows from Lemma 1.\(\square \)
Lemma 2
Let \((a_n)_{n=0}^\infty \) and \((\phi (n))_{n=0}^\infty \) be as in Lemma 1, let \(h(x)\) be a bounded function defined on an interval \(I\), and let \((x_n)_{n=0}^\infty \) be a sequence in \(I\). If
and \(h(x)\) is continuous at the point \(x\), then
Proof
The proof of this lemma is similar to that of Lemma 2 in [12], so we omit it.\(\square \)
Lemma 3
Let \((\xi _n)_{n=0}^\infty \) be a sequence of arbitrary random variables taking values in the finite state-space \(\mathbf {X} =\{1,2,\ldots ,b\}\), and let \(f_{a_n,\phi (n)}(\omega )\) be defined by Eq. (1.2). Then the \(f_{a_n,\phi (n)}(\omega )\) are uniformly integrable.
Proof
To prove that the \(f_{a_n,\phi (n)}(\omega )\) are uniformly integrable, it is sufficient to verify the following two conditions (see [5], p. 96):

(a) For every \(\varepsilon >0\), there exists \(\delta (\varepsilon )>0\) such that for any \(A\in \mathcal {F}\),

$$\begin{aligned} \mathbb {P}(A)<\delta (\varepsilon )\Longrightarrow \int _Af_{a_n,\phi (n)}(\omega )\,\mathrm{d}\mathbb {P}<\varepsilon \ \ \text {for every}\ n. \end{aligned}$$

(b) \(Ef_{a_n,\phi (n)}(\omega )\) are bounded for all \(n\).
Let \(A\in \mathcal {F}\). It is easy to see that
Replacing \(\log \mathbb {P}(A\cap \{\xi _{a_n,\phi (n)}=x_{a_n,\phi (n)}\})\) by \(\log \frac{\mathbb {P}(A)}{b^{\phi (n)+1}}\) in Eq. (2.18) and noting that
by the entropy inequality
where \(p_k,q_k\ge 0,\ k=1,2,\ldots ,s\) and \(\sum _{k=1}^sp_k=\sum _{k=1}^sq_k\), we have
Since \(\lim _{x\rightarrow 0^+}x(2\log b-\log x)=0\), the left-hand side of Eq. (2.19) is small provided \(\mathbb {P}(A)\) is small, so (a) holds. Letting \(A=\Omega \) in Eq. (2.19), we have
Thus (b) holds, and the proof of Lemma 3 is complete.\(\square \)
3 The Main Results
In this section, we establish the strong law of large numbers for the frequencies of occurrence of states and of pairs of states for delayed sums of nonhomogeneous Markov chains, together with the generalized entropy ergodic theorem for such chains.
Theorem 1
Suppose \((\xi _n)_{n=0}^\infty \) is a nonhomogeneous Markov chain taking values in the finite state-space \(\mathbf {X} =\{1,2,\ldots ,b\}\) with the initial distribution (1.4) and the transition matrices (1.5). Let \((a_n)_{n=0}^\infty \) and \((\phi (n))_{n=0}^\infty \) be as in Lemma 1. Let \(S_{a_n,\phi (n)}(i;\omega )\) and \(S_{a_n,\phi (n)}(i,j;\omega )\) be defined as before, and let \(f_{a_n,\phi (n)}(\omega )\) be defined by Eq. (1.6). Let \(P=(p(i,j))_{b\times b}\) be another transition matrix, and assume that \(P\) is irreducible. If Eq. (2.1) holds and
then
where \((\pi _1,\ldots ,\pi _b)\) is the unique stationary distribution determined by the transition matrix \(P\).
Remark 3
It is easy to see that if \(\lim _n p_n(i,j)= p(i,j) \quad \forall i,j \in \mathbf {X}, \) then Eq. (3.1) holds. Observe that
If, in addition, \(\{\frac{a_n}{\phi (n)}\}\) is bounded, then Eq. (3.1) follows from the following equation
But in general Eq. (3.5) may not imply (3.1). For example, let
Let \((\xi _n)_{n=0}^\infty \) be a nonhomogeneous Markov chain with transition matrices
Let \(P=P_2\). It is easy to see that when \(2^k\le n< 2^{k+1}\), for any \(i,j \in \mathbf {X}\)
So Eq. (3.5) holds. However, if we let \(a_n=2^n\) and \(\phi (n)=n\), then
so Eq. (3.1) does not hold.
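The mechanism behind this counterexample can be visualized numerically. The sketch below uses hypothetical two-state matrices in place of the example's \(P_1, P_2\) (which are not reproduced above): with the blocks \([2^m, 2^{m+1})\) alternating between the two matrices, a window of length \(n\) starting at \(a_n=2^n\) lies entirely inside one block, so the delayed average of \(|p_k(i,j)-p(i,j)|\) alternates with the parity of \(n\) instead of converging.

```python
# Hypothetical stand-ins for the example's P_1, P_2 (not reproduced above).
P1 = [[0.5, 0.5], [0.5, 0.5]]
P2 = [[0.9, 0.1], [0.2, 0.8]]

def p_k(k):
    """Transition matrix at time k >= 1: the blocks [2^m, 2^{m+1}) alternate,
    using P_1 when the block index m is even and P_2 when m is odd."""
    m = k.bit_length() - 1
    return P1 if m % 2 == 0 else P2

def delayed_avg_dev(a_n, phi_n, i, j, target):
    """(1/phi(n)) * sum_{k=a_n+1}^{a_n+phi(n)} |p_k(i,j) - target(i,j)|."""
    return sum(abs(p_k(k)[i][j] - target[i][j])
               for k in range(a_n + 1, a_n + phi_n + 1)) / phi_n

# With a_n = 2^n and phi(n) = n, the window sits inside a single block, so
# the delayed average flips with the parity of n rather than converging.
d_even = delayed_avg_dev(2 ** 8, 8, 0, 0, P2)  # block index 8 -> P1 throughout
d_odd = delayed_avg_dev(2 ** 9, 9, 0, 0, P2)   # block index 9 -> P2 throughout
```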
Remark 4
From Lemma 3, we know that \(\frac{1}{\phi (n)}S_{a_n,\phi (n)}(i;\omega )\), \(\frac{1}{\phi (n)}S_{a_n,\phi (n)}(i,j;\omega )\), and \(f_{a_n,\phi (n)}(\omega )\) are all uniformly integrable, so Eqs. (3.2), (3.3) and (3.4) also hold in the sense of \(\mathcal {L}_1\) convergence.
Remark 5
The right-hand side of Eq. (3.4) is actually the entropy rate of a Markov chain with the transition matrix \(P\).
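Concretely, the entropy rate of a Markov chain with irreducible transition matrix \(P\) and stationary distribution \(\pi \) is \(H=-\sum _i\pi _i\sum _jp(i,j)\log p(i,j)\). The sketch below, with a hypothetical matrix, computes it via power iteration:

```python
import math

P = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical irreducible transition matrix

def stationary(P, iters=1000):
    """Stationary distribution by power iteration (valid for an irreducible,
    aperiodic matrix such as this one)."""
    b = len(P)
    pi = [1.0 / b] * b
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(b)) for j in range(b)]
    return pi

def entropy_rate(P):
    """H = -sum_i pi_i sum_j p(i,j) log p(i,j), with the natural logarithm."""
    pi = stationary(P)
    return -sum(pi[i] * P[i][j] * math.log(P[i][j])
                for i in range(len(P)) for j in range(len(P)) if P[i][j] > 0)

H = entropy_rate(P)
```

For this matrix the stationary distribution is \((2/3, 1/3)\), giving \(H\approx 0.3835\) nats.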
Remark 6
If we define a statistic as follows:
it is easy to see from Theorem 1 that \(\hat{H}\) is a strongly consistent estimator of the entropy rate \(H\), where
Putting \(a_n=2^n\) and \(\phi (n)=n\), under the condition of Eq. (3.1), we can use information from a segment of \((\xi _n)_{n=0}^\infty \) to estimate the entropy rate of a nonhomogeneous Markov chain.
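The display defining \(\hat{H}\) is not reproduced above; a natural plug-in form (an assumption here, built from the state and pair counts of Theorem 1) estimates the entropy rate from an observed segment:

```python
import math

def entropy_estimate(x, a_n, phi_n):
    """Plug-in estimate (an assumed form of Hhat):
    Hhat = -sum_{i,j} (S(i,j)/phi(n)) * log(S(i,j)/S(i)),
    where S(i,j) counts ordered pairs along x_{a_n}, ..., x_{a_n+phi(n)}
    and S(i) = sum_j S(i,j)."""
    S1, S2 = {}, {}
    for k in range(a_n + 1, a_n + phi_n + 1):
        i, j = x[k - 1], x[k]
        S1[i] = S1.get(i, 0) + 1
        S2[(i, j)] = S2.get((i, j), 0) + 1
    return -sum(c / phi_n * math.log(c / S1[i]) for (i, j), c in S2.items())

x = [1, 1, 2, 1, 1, 2, 2, 1, 1, 1]   # hypothetical observed segment
Hhat = entropy_estimate(x, a_n=0, phi_n=9)
```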
Proof
Proof of (i). It is easy to see that
and
From (3.1), we have that
Combining Eqs. (2.14), (3.6), (3.7) and (3.8), we have
Multiplying both sides of Eq. (3.9) by \(p(j,k)\) and summing over \(j=1,2,\ldots ,b\), we have
where \(p^{(l)}(i, k)\) (\(l\) is a positive integer) is the \(l\)-step transition probability determined by the transition matrix \(P\). By induction, for all \(l\ge 1\), we have
and
It is easy to see that \(\sum _{i=1}^bS_{a_n,\phi (n)}(i;\omega )=\phi (n)\); by (3.12), we have for all \(m\ge 1\)
Since \(P\) is irreducible,
Equation (3.2) follows from Eqs. (3.13) and (3.14).
Proof of (ii). Observe that
From Eq. (3.1), we have that
Combining Eqs. (2.16), (3.15) and (3.16), we have
Equation (3.3) follows from Eqs. (3.2) and (3.17).
Proof of (iii). Since \(Ee^{|\log \mu _{a_n}(\xi _{a_n})|}=\sum _{i=1}^be^{-\log \mu _{a_n}(i)} \mu _{a_n}(i)=b\), by the Markov inequality and Eq. (2.1), for every \(\varepsilon >0\) we have
By the Borel–Cantelli lemma, we obtain
Letting \(g_k(x,y)=\log p_k(x,y)\) and \(\gamma =\frac{1}{2}\) in Lemma 1, and noticing that
it follows from Lemma 1 that
Now
By Lemma 2, Eq. (3.1) and the continuity of \(h(x)=x\log x\), we have
Combining Eqs. (3.21), (3.22), (3.23) and (3.2), we have
From Eqs. (1.6), (3.19) and (3.24), Eq. (3.4) follows.\(\square \)
Corollary 3
(see [12]). Under the conditions of Theorem 1, if Eq. (3.5) holds, then
where \((\pi _1,\ldots ,\pi _b)\) is the unique stationary distribution determined by the transition matrix \(P\).
References
Algoet, P.H., Cover, T.M.: A sandwich proof of the Shannon–McMillan–Breiman theorem. Ann. Probab. 16, 899–909 (1988)
Barron, A.R.: The strong ergodic theorem for densities: generalized Shannon–McMillan–Breiman theorem. Ann. Probab. 13, 1292–1303 (1985)
Breiman, L.: The individual ergodic theorem of information theory. Ann. Math. Stat. 28, 809–811 (1957)
Chow, Y.S.: Delayed sums and Borel summability for independent, identically distributed random variables. Bull. Inst. Math. Academia Sinica 1, 207–220 (1972)
Chung, K.L.: A Course in Probability Theory. Academic Press, New York (1974)
Chung, K.L.: The ergodic theorem of information theory. Ann. Math. Stat. 32, 612–614 (1961)
Gut, A., Stadtmüller, U.: On the strong law of large numbers for delayed sums and random fields. Acta Math. Hung. 129(1–2), 182–203 (2010)
Kieffer, J.C.: A simple proof of the Moy–Perez generalization of the Shannon–McMillan theorem. Pac. J. Math. 51, 203–204 (1974)
Lai, T.L.: Limit theorems for delayed sums. Ann. Probab. 2(3), 432–440 (1974)
McMillan, B.: The basic theorem of information theory. Ann. Math. Stat. 24, 196–215 (1953)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948)
Yang, W.G.: The asymptotic equipartition property for nonhomogeneous Markov information sources. Probab. Eng. Inform. Sci. 12, 509–518 (1998)
Yang, W.G., Liu, W.: The asymptotic equipartition property for Mth-order nonhomogeneous Markov information sources. IEEE Trans. Inform. Theory 50(12), 3326–3330 (2004)
Zhong, P.P., Yang, W.G., Liang, P.P.: The asymptotic equipartition property for asymptotic circular Markov chains. Probab. Eng. Inform. Sci. 24, 279–288 (2010)
Zygmund, A.: Trigonometric Series 1. Cambridge University Press, New York (1959)
Acknowledgments
The authors are very thankful to Professor Keyue Ding, who greatly helped us to improve the English of this paper, and to the reviewers for their valuable comments.
This work was supported by the National Natural Science Foundation of China (11071104), the National Natural Science Foundation of Anhui Province (1408085MA04) and Foundation of Anhui Educational Committee (KJ2012B117).
Wang, Z., Yang, W. The Generalized Entropy Ergodic Theorem for Nonhomogeneous Markov Chains. J Theor Probab 29, 761–775 (2016). https://doi.org/10.1007/s10959-015-0597-9