1 Introduction

We begin by introducing some notation that will be used throughout the paper. Let \((\xi _n)_{n=0}^\infty \) be a sequence of arbitrary random variables taking values in a finite set \(\mathbf {X} =\{1,2,\ldots ,b\}\), defined on an underlying probability space \((\Omega ,\mathcal {F},\mathbb {P})\). For convenience, denote by \( \xi _{m,n}\) the random vector \((\xi _m,\ldots ,\xi _{m+n})\) and by \(x_{m,n}=(x_m,\ldots ,x_{m+n})\) a realization of \(\xi _{m,n}\). Suppose the joint distribution of \(\xi _{m,n}\) is

$$\begin{aligned} \mathbb {P}(\xi _{m,n}=x_{m,n})\!=\!p(x_m,\ldots ,x_{m+n})\!=\!p(x_{m,n}), \ \ x_k\in \mathbf {X},\ m\le k\le m\!+\! n. \end{aligned}$$
(1.1)

Let \((a_n)_{n=0}^\infty \) and \((\phi (n))_{n=0}^\infty \) be two sequences of nonnegative integers such that \(\phi (n)\) tends to infinity as \(n\rightarrow \infty \). Let

$$\begin{aligned} f_{a_n,\phi (n)}(\omega )=-\frac{1}{\phi (n)}\log p(\xi _{a_n,\phi (n)}), \end{aligned}$$
(1.2)

where \(\log \) is the natural logarithm. We call \(f_{a_n,\phi (n)}(\omega )\) the generalized entropy density of \(\xi _{a_n,\phi (n)}\). If \(a_n\equiv 0\) and \(\phi (n)=n\), then \(f_{a_n,\phi (n)}(\omega )\) reduces to the classical entropy density of \(\xi _{0,n}\), defined as follows:

$$\begin{aligned} f_{0,n}(\omega )=-\frac{1}{n}\log p(\xi _{0,n}). \end{aligned}$$
(1.3)

If \((\xi _n)_{n=0}^\infty \) is a nonhomogeneous Markov chain taking values in the finite state space \(\mathbf {X} =\{1,2,\ldots ,b\}\) with the initial distribution

$$\begin{aligned} (\mu _0(1),\ldots ,\mu _0(b)), \end{aligned}$$
(1.4)

and the transition matrices

$$\begin{aligned} P_n=(p_{n}(i,j))_{b\times b},\ \ i,j\in \mathbf {X},\ n=1,2\ldots , \end{aligned}$$
(1.5)

where \(p_n(i,j)=\mathbb {P}(\xi _n=j|\xi _{n-1}=i)\), then

$$\begin{aligned} f_{a_n,\phi (n)}(\omega )=-\frac{1}{\phi (n)}\left\{ \log \mu _{a_n}(\xi _{a_n})+\sum _{k=a_n+1}^{a_n+\phi (n)} \log p_k(\xi _{k-1},\xi _k)\right\} , \end{aligned}$$
(1.6)

where \(\mu _{a_n}(x)\) is the distribution of \(\xi _{a_n}\).
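Formula (1.6) can be evaluated directly on an observed path. The following is a minimal sketch, not part of the original development: the function name, the argument layout, and the convention that `transition(k)` returns \(P_k\) as nested lists are all assumptions made for illustration.

```python
import math

def generalized_entropy_density(path, a, phi, mu_a, transition):
    """Evaluate (1.6) on one observed path.

    path[k] is the observed value of xi_k (states are 1..b),
    mu_a[x - 1] is P(xi_a = x), and transition(k) returns the b x b
    matrix P_k as nested lists (a hypothetical calling convention).
    """
    # First term: log mu_a(xi_a)
    total = math.log(mu_a[path[a] - 1])
    # Delayed sum: log p_k(xi_{k-1}, xi_k) for k = a+1, ..., a+phi
    for k in range(a + 1, a + phi + 1):
        p_k = transition(k)
        total += math.log(p_k[path[k - 1] - 1][path[k] - 1])
    return -total / phi
```

For a homogeneous chain one may pass `transition=lambda k: P`; taking `a = 0` and `phi = n` then gives the classical entropy density (1.3).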

The convergence of \(f_{0,n}(\omega )\) to a constant in the sense of \(\mathcal {L}_1\) convergence, convergence in probability, or \(a.e.\) convergence is called the Shannon–McMillan–Breiman theorem, the entropy ergodic theorem, or the asymptotic equipartition property (AEP), respectively, in information theory. Shannon [11] first established the entropy ergodic theorem for convergence in probability for stationary ergodic information sources with a finite alphabet. McMillan [10] and Breiman [3] obtained, for finite stationary ergodic information sources, the entropy ergodic theorem in \(\mathcal {L}_1\) and \(a.e.\) convergence, respectively. Chung [6] considered the case of a countable alphabet. The entropy ergodic theorem for general stochastic processes can be found, for example, in Barron [2], Kieffer [8], or Algoet and Cover [1]. Yang [12] obtained the entropy ergodic theorem for a class of nonhomogeneous Markov chains, Yang and Liu [13] for a class of \(m\)th-order nonhomogeneous Markov chains, and Zhong, Yang and Liang [14] for a class of asymptotic circular Markov chains.

The second term of Eq. (1.6) is in fact a delayed sum of random variables. Delayed sums were first introduced by Zygmund [15], who used them to prove a Tauberian theorem of Hardy. Since then, much work has been done to investigate their properties. For example, by using the limiting behavior of delayed sums, Chow [4] found necessary and sufficient conditions for the Borel summability of i.i.d. random variables and simplified the proofs of a number of well-known results such as the Hsu–Robbins–Spitzer–Katz theorem. Lai [9] studied the analogues of the law of the iterated logarithm for delayed sums of independent random variables. Recently, Gut and Stradtmüller [7] studied the strong law of large numbers for delayed sums of random fields.

Let \((\xi _n)_{n=0}^\infty \) be a nonhomogeneous Markov chain with the transition matrices (1.5). Yang [12] showed that the classical entropy density \(f_{0,n}(\omega )\) of this Markov chain converges \(a.e.\) to the entropy rate of a Markov chain under the condition that \(\lim _{n\rightarrow \infty }\frac{1}{n}\sum _{k=1}^{n}|p_k(i,j)-p(i,j)|=0\) for all \(i,j\in \mathbf {X}\), where \(P=(p(i,j))_{b\times b}\) is an irreducible transition matrix. In this paper, we prove that the generalized entropy density \(f_{a_n,\phi (n)}(\omega )\) converges \(a.e.\) and in \(\mathcal {L}_1\) to this entropy rate under some mild conditions; we call this result the generalized entropy ergodic theorem. The results of this paper generalize those in [12].

To prove the main results, we first establish a strong limit theorem for delayed sums of functions of two variables for nonhomogeneous Markov chains. We then obtain strong limit theorems for the frequencies of occurrence of states and of ordered couples of states in the segment \(\xi _{a_n},\ldots ,\xi _{a_n+\phi (n)}\) of the Markov chain. Finally, we present the main results. We also prove that \(f_{a_n,\phi (n)}(\omega )\) are uniformly integrable for an arbitrary sequence of random variables taking values in a finite set.

The approach used in this paper differs from that of some previous works [12, 13], where the strong law of large numbers for martingales is applied. Because \(f_{a_n,\phi (n)}(\omega )\) involves delayed sums of \(\log p_k(\xi _{k-1},\xi _{k})\), the strong law of large numbers for martingales cannot be applied. The essence of the technique used in this paper is first to construct a one-parameter class of random variables with mean 1 and then to use the Borel–Cantelli lemma to prove the \(a.e.\) convergence of certain random variables.

The rest of this paper is organized as follows. In Sect. 2, we establish some preliminary results that will be used to prove our main results. In Sect. 3, we present the main results and their proofs.

2 Some Lemmas

Before proving the main results, we begin with some lemmas.

Lemma 1

Suppose \((\xi _n)_{n=0}^\infty \) is a nonhomogeneous Markov chain taking values from a finite state-space of \(\mathbf {X} =\{1,2,\ldots ,b\}\) with the initial distribution (1.4) and the transition matrices (1.5). Suppose \((a_n)_{n=0}^\infty \) and \((\phi (n))_{n=0}^\infty \) are two sequences of nonnegative integers such that \(\phi (n)\) tends to infinity as \(n\rightarrow \infty \). Let \((g_n(x,y))_{n=0}^\infty \) be a sequence of real functions defined on \(\mathbf {X}\times \mathbf {X}\). If for every \(\varepsilon >0\)

$$\begin{aligned} \sum _{n=1}^\infty \exp [-\varepsilon \phi (n)]<\infty , \end{aligned}$$
(2.1)

and there exists a real number \( 0<\gamma <\infty \) such that

$$\begin{aligned} \limsup _n\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}E[|g_k(\xi _{k-1}, \xi _k)|^2 e^{\gamma |g_k(\xi _{k-1},\xi _k)|}|\xi _{k-1}]\!=\!c(\gamma ;\omega )\!<\!\infty \ \ a.e., \end{aligned}$$
(2.2)

then, we have

$$\begin{aligned} \lim _n\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}\{g_k(\xi _{k-1}, \xi _k)-E[g_k(\xi _{k-1},\xi _k)|\xi _{k-1}]\}=0\ \ a.e. \end{aligned}$$
(2.3)

Remark 1

Condition (2.1) in Lemma 1 is easily satisfied. For example, if \(\phi (n)=[n^\alpha ]\) with \(\alpha >0\), where \([\cdot ]\) is the usual greatest integer function, then (2.1) holds. If \((g_n(x,y))_{n=1}^\infty \) are uniformly bounded, then Eq. (2.2) holds.
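Condition (2.1) asks for more than \(\phi (n)\rightarrow \infty \). As an illustration that is not taken from the text, \(\phi (n)=[\log n]\) gives \(\exp (-\varepsilon [\log n])\ge n^{-\varepsilon }\), so the series in (2.1) diverges for \(\varepsilon \le 1\). The sketch below compares the tails of the partial sums for the two choices; the helper names are hypothetical, and a finite partial sum can only suggest, not prove, convergence or divergence.

```python
import math

def partial_sum(phi, eps, n_max):
    """Partial sum of exp(-eps * phi(n)) for n = 1..n_max, as in (2.1)."""
    return sum(math.exp(-eps * phi(n)) for n in range(1, n_max + 1))

def phi_power(n, alpha=0.5):
    """phi(n) = [n^alpha]; satisfies (2.1) for every alpha > 0."""
    return math.floor(n ** alpha)

def phi_log(n):
    """phi(n) = [log n]; tends to infinity, but too slowly for (2.1)."""
    return math.floor(math.log(n))

# Contribution of the terms n = 2001..4000 with eps = 1
tail_power = partial_sum(phi_power, 1.0, 4000) - partial_sum(phi_power, 1.0, 2000)
tail_log = partial_sum(phi_log, 1.0, 4000) - partial_sum(phi_log, 1.0, 2000)
```

With these choices `tail_power` is negligible, whereas `tail_log` stays of order one, in line with the comparison with the harmonic series.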

Remark 2

Since \(E[g_k(\xi _{k-1},\xi _{k})|\xi _{k-1}]=\sum _{j=1}^bg_k(\xi _{k-1},j) p_k(\xi _{k-1},j) \), Eq. (2.3) can be rewritten as

$$\begin{aligned} \lim _{n}\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}\{ g_k(\xi _{k-1},\xi _{k})- \sum _{j=1}^bg_k(\xi _{k-1},j)p_k(\xi _{k-1},j)\}=0 \ \ a.e. \end{aligned}$$
(2.4)

Proof

Let \(s\) be a nonzero real number, and define

$$\begin{aligned} \Lambda _{a_n,\phi (n)}(s,\omega )=\frac{\exp \{s \sum _{k=a_n+1}^{a_n+\phi (n)}g_k(\xi _{k-1},\xi _k)\}}{\prod _{k=a_n+1}^{a_n+\phi (n)}E[e^{s g_k(\xi _{k-1},\xi _k)}|\xi _{k-1}]},\ \ n=1,2,\ldots . \end{aligned}$$

Note that

$$\begin{aligned}&E\Lambda _{a_n,\phi (n)}(s,\omega )= E[E[\Lambda _{a_n,\phi (n)}(s,\omega )|\xi _{0,a_n+\phi (n)-1}]]\nonumber \\&\quad \!=\!E\left[ E[\Lambda _{a_n,\phi (n)-1}(s,\omega )\frac{e^{s g_{a_n+\phi (n)} (\xi _{a_n+\phi (n)-1},\xi _{a_n+\phi (n)})}}{E[e^{sg_{a_n+\phi (n)}(\xi _{a_n+\phi (n)-1},\xi _{a_n+\phi (n)})} |\xi _{a_n+\phi (n)-1}]}|\xi _{0,a_n+\phi (n)\!-\!1}]\right] \nonumber \\&\quad =E\left[ \frac{\Lambda _{a_n,\phi (n)-1}(s,\omega ) E[e^{sg_{a_n+\phi (n)} (\xi _{a_n+\phi (n)-1},\xi _{a_n+\phi (n)})}|\xi _{a_n+\phi (n)-1}] }{E[e^{sg_{a_n+\phi (n)}(\xi _{a_n+\phi (n)-1},\xi _{a_n+\phi (n)})}|\xi _{a_n+\phi (n)-1}]}\right] \nonumber \\&\quad =E\Lambda _{a_n,\phi (n)-1}(s,\omega )=\cdots =E\Lambda _{a_n,1}(s,\omega )=1. \end{aligned}$$
(2.5)
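Identity (2.5) can be checked exactly on a small chain by enumerating all paths. The sketch below is an illustration under simplifying assumptions: the window starts at time 0 (that is, \(a_n=0\)), states are indexed from 0, and the function name and the signature `g(k, x, y)` are hypothetical.

```python
import itertools
import math

def expected_lambda(mu0, matrices, g, s):
    """Exact E[Lambda] over all paths xi_0..xi_N of a finite chain.

    matrices[k - 1] is P_k for k = 1..N, and g(k, x, y) plays the role
    of g_k(x, y). States are indexed 0..b-1 in this sketch.
    """
    b = len(mu0)
    n_steps = len(matrices)
    total = 0.0
    for path in itertools.product(range(b), repeat=n_steps + 1):
        prob = mu0[path[0]]
        lam = 1.0
        for k in range(1, n_steps + 1):
            x, y = path[k - 1], path[k]
            p_k = matrices[k - 1]
            prob *= p_k[x][y]
            # Conditional moment generating function E[exp(s g_k) | xi_{k-1} = x]
            mgf = sum(math.exp(s * g(k, x, j)) * p_k[x][j] for j in range(b))
            lam *= math.exp(s * g(k, x, y)) / mgf
        total += prob * lam
    return total
```

For any choice of transition matrices, functions \(g_k\), and \(s\), the returned value should equal 1 up to rounding, which is the normalization that makes Markov's inequality effective in the next step.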

For any \(\varepsilon >0\), by Markov's inequality and Eqs. (2.1) and (2.5), we have

$$\begin{aligned} \sum _{n=1}^\infty \mathbb {P}\left[ \phi ^{-1}(n)\log \Lambda _{a_n,\phi (n)}(s,\omega )\ge \varepsilon \right]&=\sum _{n=1}^\infty \mathbb {P} \left[ \Lambda _{a_n,\phi (n)}(s,\omega )\ge \exp (\phi (n)\varepsilon )\right] \nonumber \\&\le \sum _{n=1}^\infty 1\cdot \exp (-\phi (n)\varepsilon )<\infty . \end{aligned}$$
(2.6)

By the Borel–Cantelli lemma and the arbitrariness of \(\varepsilon \), we have

$$\begin{aligned} \limsup _n\frac{1}{\phi (n)}\log \Lambda _{a_n,\phi (n)}(s,\omega )\le 0\ \ a.e. \end{aligned}$$
(2.7)

Note that

$$\begin{aligned} \frac{1}{\phi (n)}\log \Lambda _{a_n,\phi (n)}(s,\omega )=\frac{1}{\phi (n)} \sum _{k=a_n+1}^{a_n+\phi (n)}\{sg_k(\xi _{k-1},\xi _{k})-\log {E[e^{s g_k(\xi _{k-1},\xi _k)}|\xi _{k-1}]}\}. \end{aligned}$$
(2.8)

By Eqs. (2.7) and (2.8), we have

$$\begin{aligned} \limsup _n\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)} \left\{ sg_k(\xi _{k-1},\xi _{k})-\log E[e^{sg_k(\xi _{k-1},\xi _{k})}|\xi _{k-1}]\right\} \le 0\ \ a.e. \end{aligned}$$
(2.9)

Taking \(0<s<\gamma \) and dividing both sides of Eq. (2.9) by \(s\), we obtain

$$\begin{aligned} \limsup _n\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)} \left\{ g_k(\xi _{k-1},\xi _{k})-\frac{1}{s }\log E[e^{s g_k(\xi _{k-1},\xi _{k})}|\xi _{k-1}]\right\} \le 0\ \ a.e. \end{aligned}$$
(2.10)

Using the inequalities \(\log x\le x-1\ (x> 0)\) and \(0\le e^x-1-x\le \frac{1}{2}x^2e^{|x|}\ ( x\in \mathbf {R})\), from Eq. (2.10), we have

$$\begin{aligned}&\limsup _n\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}\{g_k(\xi _{k-1}, \xi _{k})-E[g_k(\xi _{k-1},\xi _{k})|\xi _{k-1}]\}\nonumber \\&\quad \le \limsup _n\frac{1}{ \phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)} \left\{ \frac{1}{s }\log E[e^{s g_k(\xi _{k-1},\xi _{k})}|\xi _{k-1}]-E[g_k(\xi _{k-1},\xi _{k})| \xi _{k-1}]\right\} \nonumber \\&\quad \le \limsup _n\frac{1}{ \phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}\left\{ \frac{E[(e^{sg_k(\xi _{k-1}, \xi _{k})}-1-sg_k(\xi _{k-1},\xi _{k}))|\xi _{k-1}]}{s}\right\} \nonumber \\&\quad \le \frac{s}{2}\limsup _n\frac{1}{ \phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}E[g_k^2(\xi _{k-1},\xi _{k})e^{s|g_k (\xi _{k-1},\xi _{k})|}|\xi _{k-1}]\nonumber \\&\quad \le \frac{1}{2}sc(\gamma ;\omega ) <\infty \ \ \ a.e. \end{aligned}$$
(2.11)

Letting \(s\downarrow 0\) in Eq. (2.11), we obtain

$$\begin{aligned} \limsup _n\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)} [g_k(\xi _{k-1},\xi _{k})-E(g_k(\xi _{k-1},\xi _{k})|\xi _{k-1})]\le 0\ \ a.e. \end{aligned}$$
(2.12)

Letting \(-\gamma <s<0\) in (2.9), similarly, we obtain that

$$\begin{aligned} \liminf _n\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)} [g_k(\xi _{k-1},\xi _{k})-E(g_k(\xi _{k-1},\xi _{k})|\xi _{k-1})]\ge 0\ \ a.e. \end{aligned}$$
(2.13)

Equation (2.3) follows immediately from Eqs. (2.12) and (2.13).\(\square \)

Let \(\mathbf {1}_{\{\cdot \}}(\cdot )\) be the indicator function and \(S_{m,n}(j;\omega )\) the number of occurrences of \(j\) in the segment \(\xi _{m},\ldots ,\xi _{m+n-1}\). It is easy to see that

$$\begin{aligned} S_{m,n}(j;\omega )=\sum _{k=m}^{m+n-1}\mathbf {1}_{\{j\}}(\xi _k). \end{aligned}$$

Let \(S_{m,n}(i,j;\omega )\) be the number of occurrences of the pair \((i,j)\) in the sequence of ordered pairs \((\xi _{m},\xi _{m+1}),\ldots ,\, (\xi _{m+n-1},\xi _{m+n})\). Then

$$\begin{aligned} S_{m,n}(i,j;\omega )=\sum _{k=m}^{m+n-1}\mathbf {1}_{\{i\}}(\xi _k) \mathbf {1}_{\{j\}}(\xi _{k+1}). \end{aligned}$$
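The two counting functions translate directly into code. The sketch below is only an illustration; the function names and the representation of a path as a list with `path[k]` equal to \(\xi _k\) are assumptions.

```python
def state_count(path, m, n, j):
    """S_{m,n}(j): occurrences of j among xi_m, ..., xi_{m+n-1}."""
    return sum(1 for k in range(m, m + n) if path[k] == j)

def pair_count(path, m, n, i, j):
    """S_{m,n}(i,j): occurrences of (i,j) among (xi_k, xi_{k+1}), m <= k <= m+n-1."""
    return sum(1 for k in range(m, m + n)
               if path[k] == i and path[k + 1] == j)
```

Directly from the definitions, \(\sum _{j=1}^bS_{m,n}(i,j;\omega )=S_{m,n}(i;\omega )\) and \(\sum _{i=1}^bS_{m,n}(i;\omega )=n\), which gives a quick consistency check on any implementation.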

Corollary 1

Under the conditions of Lemma 1, let \(S_{m,n}(j;\omega )\) be defined as before. Then

$$\begin{aligned} \lim _{n}\frac{1}{\phi (n)}\{S_{a_n,\phi (n)} (j;\omega )-\sum _{k=a_n+1}^{a_n+\phi (n)} p_k(\xi _{k-1},j)\}=0 \ \ a.e. \end{aligned}$$
(2.14)

Proof

Let \( g_k(x,y)=\mathbf {1}_{\{j\}}(y)\) in Lemma 1. Since \(|g_k|\le 1\), the sequence \(\{ g_k(x,y), k\ge 1\}\) satisfies Eq. (2.2) of Lemma 1. Noticing that

$$\begin{aligned}&\sum _{k=a_n+1}^{a_n+\phi (n)}\{g_k(\xi _{k-1},\xi _{k})- \sum _{l=1}^bg_k(\xi _{k-1},l)p_k(\xi _{k-1},l)\}\nonumber \\&\quad =\sum _{k=a_n+1}^{a_n+\phi (n)}\{\mathbf {1}_{\{j\}} (\xi _k) -\sum _{l=1}^b\mathbf {1}_{\{j\}}(l) p_{k}(\xi _{k-1},l)\}\nonumber \\&\quad = S_{a_n,\phi (n)}(j;\omega )+\mathbf {1}_{\{j\}}(\xi _{a_n+\phi (n)}) -\mathbf {1}_{\{j\}}(\xi _{a_n})-\sum _{k=a_n+1}^{a_n+\phi (n)} p_{k}(\xi _{k-1},j), \end{aligned}$$
(2.15)

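and that the two boundary indicator terms in (2.15) are bounded by 1 while \(\phi (n)\rightarrow \infty \), we see that Eq. (2.14) follows from Lemma 1.\(\square \)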

Corollary 2

Under the conditions of Lemma 1, let \(S_{m,n}(i,j;\omega )\) be defined as before. Then

$$\begin{aligned} \lim _{n}\frac{1}{\phi (n)}\{S_{a_n,\phi (n)}(i,j;\omega ) -\sum _{k=a_n+1}^{a_n+\phi (n)} \mathbf {1}_{\{i\}}(\xi _{k-1})p_k(i,j)\}=0 \ \ a.e. \end{aligned}$$
(2.16)

Proof

Let \(g_k(x,y)=\mathbf {1}_{\{i\}}(x)\mathbf {1}_{\{j\}}(y)\) in Lemma 1. Since \(|g_k|\le 1\), the sequence \(\{ g_k(x,y), k\ge 1\}\) satisfies Eq. (2.2) of Lemma 1. Noticing that

$$\begin{aligned}&\sum _{k=a_n+1}^{a_n+\phi (n)}\{g_k(\xi _{k-1},\xi _{k})- \sum _{l=1}^bg_k(\xi _{k-1},l)p_k(\xi _{k-1},l)\}\nonumber \\&\quad =\sum _{k=a_n+1}^{a_n+\phi (n)}\{\mathbf {1}_{\{i\}}(\xi _{k-1})\mathbf {1}_ {\{j\}}(\xi _k)-\sum _{l=1}^b\mathbf {1}_{\{i\}}(\xi _{k-1})\mathbf {1}_{\{j\}} (l)p_k(\xi _{k-1},l)\}\nonumber \\&\quad = S_{a_n,\phi (n)}(i,j;\omega )-\sum _{k=a_n+1}^{a_n+\phi (n)} \mathbf {1}_{\{i\}}(\xi _{k-1})p_k(i,j), \end{aligned}$$
(2.17)

Equation (2.16) follows from Lemma 1.\(\square \)

Lemma 2

Let \((a_n)_{n=0}^\infty \) and \((\phi (n))_{n=0}^\infty \) be as in Lemma 1, and \(h(x)\) be a bounded function defined on an interval I, and \((x_n)_{n=0}^\infty \) a sequence in I. If

$$\begin{aligned} \lim _{n}\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}|x_{k}-x|=0, \end{aligned}$$

and \(h(x)\) is continuous at point \(x\), then,

$$\begin{aligned} \lim _{n}\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}|h(x_{k})-h(x)|=0. \end{aligned}$$

Proof

The proof of this lemma is similar to that of Lemma 2 in [12], so we omit it.\(\square \)

Lemma 3

Let \((\xi _n)_{n=0}^\infty \) be a sequence of arbitrary random variables taking values from a finite state-space of \(\mathbf {X} =\{1,2,\ldots ,b\}\), and let \(f_{a_n,\phi (n)}(\omega )\) be defined by Eq. (1.2). Then \(f_{a_n,\phi (n)}(\omega )\) are uniformly integrable.

Proof

To prove that \(f_{a_n,\phi (n)}(\omega )\) are uniformly integrable, it is sufficient to verify the following two conditions (see [5], p. 96):

(a) For every \(\varepsilon >0\), there exists \(\delta (\varepsilon )>0\) such that for any \(A\in \mathcal {F}\),

$$\begin{aligned} \mathbb {P}(A)<\delta (\varepsilon )\Longrightarrow \int _Af_{a_n,\phi (n)}(\omega )\mathrm{d}\mathbb {P}<\varepsilon \ \ \text {for every}\ n. \end{aligned}$$

(b) \(Ef_{a_n,\phi (n)}(\omega )\) are bounded for all \(n\).

Let \(A\in \mathcal {F}\). It is easy to see that

$$\begin{aligned}&\int _Af_{a_n,\phi (n)}(\omega )\mathrm{d}\mathbb {P} =-\int _A\frac{1}{\phi (n)}\log p(\xi _{a_n,\phi (n)})\mathrm{d}\mathbb {P}\nonumber \\&\!=\!-\sum _{x_{a_n},\ldots ,x_{a_n+\phi (n)}}\frac{1}{\phi (n)}\log p(x_{a_n,\phi (n)})\cdot \mathbb {P}(A\cap \{\xi _{a_n,\phi (n)} =x_{a_n,\phi (n)}\})\nonumber \\&\!\le \!-\sum _{x_{a_n},\ldots ,x_{a_n+\phi (n)}}\frac{1}{\phi (n)}\log \mathbb {P}(A\cap \{\xi _{a_n,\phi (n)}\!=\!x_{a_n,\phi (n)}\})\cdot \mathbb {P} (A\cap \{\xi _{a_n,\phi (n)}\!=\!x_{a_n,\phi (n)}\}). \end{aligned}$$
(2.18)

Replacing \(\log \mathbb {P}(A\cap \{\xi _{a_n,\phi (n)}=x_{a_n,\phi (n)}\})\) by \(\log \frac{\mathbb {P}(A)}{b^{\phi (n)+1}}\) in Eq. (2.18) and noting that

$$\begin{aligned} \sum _{x_{a_n},\ldots ,x_{a_n+\phi (n)}}\mathbb {P}(A\cap \{\xi _{a_n,\phi (n)} =x_{a_n,\phi (n)}\})=\mathbb {P}(A)=\sum _{x_{a_n},\ldots ,x_{a_n+\phi (n)}} \frac{\mathbb {P}(A)}{b^{\phi (n)+1}}, \end{aligned}$$

by the entropy inequality

$$\begin{aligned} -\sum _{k=1}^sp_k\log p_k\le -\sum _{k=1}^sp_k\log q_k, \end{aligned}$$

where \(p_k,q_k\ge 0,\ k=1,2,\ldots ,s\) and \(\sum _{k=1}^sp_k=\sum _{k=1}^sq_k\), we have

$$\begin{aligned} \int _Af_{a_n,\phi (n)}(\omega )\mathrm{d}\mathbb {P}&\le -\sum _{x_{a_n},\ldots ,x_{a_n+\phi (n)}}\frac{1}{\phi (n)}\log \frac{\mathbb {P}(A)}{b^{\phi (n)+1}}\cdot \mathbb {P}(A\cap \{\xi _{a_n,\phi (n)} =x_{a_n,\phi (n)}\})\nonumber \\ \!&=\!-\frac{1}{\phi (n)}\left( \log \frac{\mathbb {P}(A)}{b^{\phi (n)+1}}\sum _{x_{a_n},\ldots ,x_{a_n+\phi (n)}} \mathbb {P}(A\cap \{\xi _{a_n,\phi (n)}\!=\!x_{a_n,\phi (n)}\})\right) \nonumber \\&=\left( \frac{\phi (n)+1}{\phi (n)}\log b-\frac{\log \mathbb {P}(A)}{\phi (n)}\right) \cdot \mathbb {P}(A)\nonumber \\&\le (2\log b-\log \mathbb {P}(A))\mathbb {P}(A). \end{aligned}$$
(2.19)

Since \(\lim _{x\rightarrow 0^+}x(2\log b-\log x)=0\), the left-hand side of Eq. (2.19) is small, uniformly in \(n\), provided \(\mathbb {P}(A)\) is small, so (a) holds. Letting \(A=\Omega \) in Eq. (2.19), we have

$$\begin{aligned} Ef_{a_n,\phi (n)}(\omega )=\int f_{a_n,\phi (n)}(\omega )\mathrm{d}\mathbb {P}\le 2\log b. \end{aligned}$$

Thus (b) holds, and the proof of Lemma 3 is complete.\(\square \)

3 The Main Results

In this section, we will establish the strong law of large numbers for frequencies of occurrence of states and the pairs of states for delayed sums of nonhomogeneous Markov chains and the generalized entropy ergodic theorem for the Markov chains.

Theorem 1

Suppose \((\xi _n)_{n=0}^\infty \) is a nonhomogeneous Markov chain taking values in the finite state space \(\mathbf {X} =\{1,2,\ldots ,b\}\) with the initial distribution (1.4) and the transition matrices (1.5). Let \((a_n)_{n=0}^\infty \) and \((\phi (n))_{n=0}^\infty \) be as in Lemma 1. Let \(S_{a_n,\phi (n)}(i;\omega )\) and \(S_{a_n,\phi (n)}(i,j;\omega )\) be defined as before, and \(f_{a_n,\phi (n)}(\omega )\) be defined by Eq. (1.6). Let \(P=(p(i,j))_{b\times b}\) be another transition matrix, and assume that \(P\) is irreducible. If Eq. (2.1) holds and

$$\begin{aligned} \lim _{n}\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}|p_{k}(i,j)-p(i,j)|=0, \quad \forall i,j\in \mathbf {X}, \end{aligned}$$
(3.1)

then

$$\begin{aligned} (i)&\quad \lim _{n}\frac{1}{\phi (n)}S_{a_n,\phi (n)}(i;\omega )=\pi _{i} \ \ a.e.\quad \forall i\in \mathbf {X},\end{aligned}$$
(3.2)
$$\begin{aligned} (ii)&\quad \lim _{n}\frac{1}{\phi (n)}S_{a_n,\phi (n)}(i,j;\omega )=\pi _{i}p(i,j) \ \ a.e.\quad \forall i,j \in \mathbf {X},\end{aligned}$$
(3.3)
$$\begin{aligned} (iii)&\quad \lim _{n}f_{a_n,\phi (n)}(\omega )=-\sum _{i=1}^b\sum _{j=1}^b\pi _{i}p(i,j)\log p(i,j) \ \ a.e.,\ \ \ \ \ \ \ \ \end{aligned}$$
(3.4)

where \((\pi _1,\ldots ,\pi _b)\) is the unique stationary distribution determined by the transition matrix \(P\).

Remark 3

It is easy to see that if \(\lim _n p_n(i,j)= p(i,j) \quad \forall i,j \in \mathbf {X}, \) then Eq. (3.1) holds. Observe that

$$\begin{aligned} \frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}|p_{k}(i,j)-p(i,j)| \!\le \! (1\!+\!\frac{a_n}{\phi (n)})\frac{1}{a_n+\phi (n)}\sum _{k=1}^ {a_n+\phi (n)}|p_{k}(i,j)-p(i,j)|. \end{aligned}$$

If, in addition, \(\{\frac{a_n}{\phi (n)}\}\) is bounded, then Eq. (3.1) follows from the following equation

$$\begin{aligned} \lim _n\frac{1}{n}\sum _{k=1}^n|p_k(i,j)-p(i,j)|=0 \quad \forall i,j \in \mathbf {X}. \end{aligned}$$
(3.5)

But in general Eq. (3.5) may not imply (3.1). For example, let

$$\begin{aligned} P_1=\left[ \begin{array}{cc}\frac{1}{3}&{}\frac{2}{3}\\ \frac{2}{3}&{}\frac{1}{3}\end{array}\right] , P_2=\left[ \begin{array} {cc}\frac{1}{2}&{}\frac{1}{2}\\ \\ \frac{1}{2}&{}\frac{1}{2}\end{array}\right] . \end{aligned}$$

Let \((\xi _n)_{n=0}^\infty \) be a nonhomogeneous Markov chain with transition matrices

$$\begin{aligned} P_n=\left\{ \begin{array}{ll} P_1, \ \ &{}\text {if} \ 2^k\le n\le 2^k+k, k\ge 0,\\ \\ P_2, \ \ &{}\text {otherwise}. \end{array} \right. \end{aligned}$$

Let \(P=P_2\). It is easy to see that when \(2^k\le n< 2^{k+1}\), for any \(i,j \in \mathbf {X}\)

$$\begin{aligned}&\frac{1}{n}\sum _{l=1}^n|p_l(i,j)-p(i,j)| \le \frac{1}{2^k}\sum _{l=1}^{2^{k+1}-1}|p_l(i,j)-p(i,j)|\\&\quad \le \frac{1+2+3+\cdots +(k+1)}{2^k}\frac{1}{6}=\frac{1}{2^k} \frac{(k+2)(k+1)}{2}\frac{1}{6}\rightarrow 0 \ (k\rightarrow \infty ). \end{aligned}$$

So Eq. (3.5) holds. However, if we let \(a_n=2^n\) and \(\phi (n)=n\), then

$$\begin{aligned} \frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}|p_{k}(i,j)-p(i,j)|= \frac{1}{n}\sum _{k=2^n+1}^{2^n+n}|p_{k}(i,j)-p(i,j)|=\frac{1}{6}, \end{aligned}$$

so Eq. (3.1) does not hold.
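The counterexample above is easy to verify numerically. In the sketch below, the function names are hypothetical; the only facts used are that \(P_n=P_1\) exactly when \(2^k\le n\le 2^k+k\) and that every entry of \(P_1\) differs from the corresponding entry of \(P_2\) by \(\frac{1}{6}\).

```python
def uses_p1(n):
    """True when 2^k <= n <= 2^k + k for some k >= 0, i.e. P_n = P_1."""
    k = n.bit_length() - 1          # the unique k with 2^k <= n < 2^(k+1)
    return n - (1 << k) <= k

def deviation(n):
    """|p_n(i,j) - p(i,j)| with P = P_2; equals 1/6 whenever P_n = P_1."""
    return 1.0 / 6.0 if uses_p1(n) else 0.0

def cesaro_mean(n):
    """The average in (3.5) before taking the limit."""
    return sum(deviation(l) for l in range(1, n + 1)) / n

def window_mean(n):
    """The average in (3.1) with a_n = 2^n and phi(n) = n."""
    a = 1 << n
    return sum(deviation(k) for k in range(a + 1, a + n + 1)) / n
```

Evaluating `cesaro_mean` at large arguments gives values close to 0, while `window_mean(n)` equals \(\frac{1}{6}\) up to rounding for every \(n\ge 1\), because the window \(2^n+1,\ldots ,2^n+n\) lies entirely inside a block where \(P_k=P_1\).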

Remark 4

The frequencies \(\frac{1}{\phi (n)}S_{a_n,\phi (n)}(i;\omega )\) and \(\frac{1}{\phi (n)}S_{a_n,\phi (n)}(i,j;\omega )\) are bounded by 1, and by Lemma 3 the \(f_{a_n,\phi (n)}(\omega )\) are uniformly integrable. Hence all three sequences are uniformly integrable, so Eqs. (3.2), (3.3) and (3.4) also hold with \(\mathcal {L}_1\) convergence.

Remark 5

The right hand side of Eq. (3.4) is actually the entropy rate of a Markov chain with the transition matrix \(P\).

Remark 6

If we define a statistic as follows:

$$\begin{aligned} \hat{H}=-\sum _{i=1}^b\sum _{j=1}^b\frac{S_{a_n,\phi (n)}(i;\omega )}{\phi (n)}\frac{S_{a_n,\phi (n)}(i,j;\omega )}{S_{a_n,\phi (n)}(i;\omega )} \log \frac{S_{a_n,\phi (n)}(i,j;\omega )}{S_{a_n,\phi (n)}(i;\omega )}, \end{aligned}$$

it is easy to see from Theorem 1 that \(\hat{H}\) is a strongly consistent estimator of the entropy rate \(H\), where

$$\begin{aligned} H= -\sum _{i=1}^b\sum _{j=1}^b\pi _{i}p(i,j)\log p(i,j). \end{aligned}$$

Putting \(a_n=2^n\) and \(\phi (n)=n\), under the condition of Eq. (3.1), we can use information from a segment of \((\xi _n)_{n=0}^\infty \) to estimate the entropy rate of a nonhomogeneous Markov chain.
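The statistic of Remark 6 can be computed from the counts of states and pairs in the observed segment. The following is a minimal sketch: the function name is hypothetical, and the treatment of zero counts (terms with \(S_{a_n,\phi (n)}(i,j;\omega )=0\) are skipped, i.e. \(0\log 0=0\)) is an assumption that the text does not spell out.

```python
import math

def entropy_rate_estimate(path, a, phi, b):
    """Plug-in estimate H-hat built from the segment xi_a, ..., xi_{a+phi}.

    States are 1..b and path[k] is the observed value of xi_k. Terms
    with zero pair counts are skipped (assumed convention 0 log 0 = 0).
    """
    state = [0] * (b + 1)
    pair = [[0] * (b + 1) for _ in range(b + 1)]
    for k in range(a, a + phi):
        state[path[k]] += 1                  # contributes to S_{a,phi}(i)
        pair[path[k]][path[k + 1]] += 1      # contributes to S_{a,phi}(i,j)
    h = 0.0
    for i in range(1, b + 1):
        for j in range(1, b + 1):
            if pair[i][j] > 0:
                ratio = pair[i][j] / state[i]
                h -= (state[i] / phi) * ratio * math.log(ratio)
    return h
```

For the path `[1, 1, 2, 2, 1]` with `a = 0`, `phi = 4`, and `b = 2`, each of the four ordered pairs occurs once, so every empirical transition frequency is \(\frac{1}{2}\) and the estimate equals \(\log 2\).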

Proof

Proof of (i). It is easy to see that

$$\begin{aligned} \sum _{k=a_n+1}^{a_n+\phi (n)}p_{k}( \xi _{k-1},j)=\sum _{k=a_n+1}^{a_n+\phi (n)}\sum _{i=1}^b\mathbf {1} _{\{i\}}(\xi _{k-1})p_k(i,j),\quad \forall j\in \mathbf {X}, \end{aligned}$$
(3.6)

and

$$\begin{aligned} \sum _{k=a_n+1}^{a_n+\phi (n)}\sum _{i=1}^b\mathbf {1}_{\{i\}}(\xi _{k-1})p(i,j) =\sum _{i=1}^bS_{a_n,\phi (n)}(i;\omega )p(i,j),\quad \forall j\in \mathbf {X}. \end{aligned}$$
(3.7)

From (3.1), we have that

$$\begin{aligned}&\lim _{n}\left| \frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}\sum _{i=1} ^b\mathbf {1}_{\{i\}}(\xi _{k-1})[p_k(i,j)-p(i,j)]\right| \nonumber \\&\quad \le \sum _{i=1}^b\lim _{n}\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}|p_{k} (i,j)-p(i,j)|=0, \quad \forall j\in \mathbf {X}. \end{aligned}$$
(3.8)

Combining Eqs. (2.14), (3.6), (3.7) and (3.8), we have

$$\begin{aligned}&\lim _{n}\frac{1}{\phi (n)}[S_{a_n,\phi (n)}(j;\omega )-\sum _{i=1}^bS_{a_n, \phi (n)}(i;\omega )p(i,j)]\nonumber \\&\quad = \lim _{n}\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}\sum _{i=1} ^b\mathbf {1}_{\{i\}}(\xi _{k-1})[p_k(i,j)-p(i,j)]\nonumber \\&\quad =0, \ \ a.e. \quad \forall j\in \mathbf {X}. \end{aligned}$$
(3.9)

Multiplying the two sides of Eq. (3.9) by \(p(j,k)\), and adding them together for \(j=1,2,\ldots ,b\), we have

$$\begin{aligned} 0=&\lim _{n}\frac{1}{\phi (n)}[\sum _{j=1}^bS_{a_n,\phi (n)}(j;\omega )p(j,k) -\sum _{j=1}^b\sum _{i=1}^bS_{a_n,\phi (n)}(i;\omega )p(i,j)p(j,k)]\nonumber \\ =&\lim _{n}[\sum _{j=1}^b\frac{1}{\phi (n)}S_{a_n,\phi (n)}(j;\omega ) p(j,k)-\frac{1}{\phi (n)}S_{a_n,\phi (n)}(k;\omega )]\nonumber \\&+\lim _n [\frac{1}{\phi (n)}S_{a_n,\phi (n)}(k;\omega )- \sum _{j=1}^b\sum _{i=1}^b\frac{1}{\phi (n)}S_{a_n,\phi (n)}(i;\omega ) p(i,j)p(j,k)]\nonumber \\ =&\lim _{n}[\frac{1}{\phi (n)}S_{a_n,\phi (n)} (k;\omega )-\sum _{i=1}^{b}\frac{1}{\phi (n)}S_{a_n,\phi (n)}(i;\omega )p^{(2)}(i,k)]\ \ a.e., \end{aligned}$$
(3.10)

where \(p^{(l)}(i, k)\) (\(l\) is a positive integer) is the \(l\)-step transition probability determined by the transition matrix \(P\). By induction, for all \(l\ge 1\), we have

$$\begin{aligned} \lim _{n}\frac{1}{\phi (n)}[S_{a_n,\phi (n)}(k;\omega )-\sum _{i=1}^{b}S_{a_n,\phi (n)}(i;\omega )p^{(l)}(i,k)]= 0, \ \ a.e., \end{aligned}$$
(3.11)

and

$$\begin{aligned} \lim _{n}[\frac{1}{\phi (n)}S_{a_n,\phi (n)}(k;\omega )\!-\!\frac{1}{\phi (n)}\sum _{i=1}^{b}S_{a_n,\phi (n)}(i;\omega )\frac{1}{m}\sum _{l=1}^mp^{(l)}(i,k)]\!=\! 0, \ \ a.e. \end{aligned}$$
(3.12)

Since \(\sum _{i=1}^bS_{a_n,\phi (n)}(i;\omega )=\phi (n)\), it follows from (3.12) that, for all \(m\ge 1\),

$$\begin{aligned} \limsup _{n}|\frac{1}{\phi (n)}S_{a_n,\phi (n)}(k;\omega )-\pi _k|\le \sum _{i=1}^b|\frac{1}{m}\sum _{l=1}^mp^{(l)}(i,k)-\pi _k| \ \ a.e. \end{aligned}$$
(3.13)

Since \(P\) is irreducible, we have

$$\begin{aligned} \lim _m\frac{1}{m}\sum _{l=1}^mp^{(l)}(i,k)=\pi _k,\quad \forall i\in \mathbf {X}. \end{aligned}$$
(3.14)

Equation (3.2) follows from Eqs. (3.13) and (3.14).
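The argument uses the Cesàro averages in (3.14) rather than the powers \(P^l\) themselves, because an irreducible matrix may be periodic, in which case \(p^{(l)}(i,k)\) need not converge. The sketch below illustrates this with a two-state chain that always switches state; this example and the helper names are assumptions made for illustration and do not appear in the text.

```python
def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cesaro_average_of_powers(p, m):
    """(1/m) * sum_{l=1}^{m} P^l as a nested list."""
    n = len(p)
    power = [row[:] for row in p]            # holds P^l, starting with l = 1
    acc = [[0.0] * n for _ in range(n)]
    for _ in range(m):
        for i in range(n):
            for j in range(n):
                acc[i][j] += power[i][j]
        power = mat_mul(power, p)
    return [[acc[i][j] / m for j in range(n)] for i in range(n)]

flip = [[0.0, 1.0], [1.0, 0.0]]   # irreducible, period 2, pi = (1/2, 1/2)
avg = cesaro_average_of_powers(flip, 1000)
```

Here \(P^l\) alternates between `flip` and the identity matrix, yet every entry of `avg` equals \(\frac{1}{2}\), the stationary probability.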

Proof of (ii). Observe that

$$\begin{aligned} \sum _{k=a_n+1}^{a_n+\phi (n)}\mathbf {1}_{\{i\}}(\xi _{k-1})p(i,j)= S_{a_n,\phi (n)}(i;\omega )p(i,j). \end{aligned}$$
(3.15)

From Eq. (3.1), we have that

$$\begin{aligned} \lim _n\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}\mathbf {1}_{\{i\}} (\xi _{k-1})[p_{k}(i,j)-p(i,j)]=0. \end{aligned}$$
(3.16)

Combining Eqs. (2.16), (3.15) and (3.16), we have

$$\begin{aligned}&\lim _{n}\frac{1}{\phi (n)}[S_{a_n,\phi (n)}(i,j;\omega )-S_{a_n,\phi (n)} (i;\omega )p(i,j)]\nonumber \\&\quad =\lim _{n}\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}\mathbf {1}_{\{i\}} (\xi _{k-1})[p_{k}(i,j)-p(i,j)]=0 \ \ a.e. \end{aligned}$$
(3.17)

Equation (3.3) follows from Eqs. (3.2) and (3.17).

Proof of (iii). Since \(Ee^{|\log \mu _{a_n}(\xi _{a_n})|}=\sum _{i:\,\mu _{a_n}(i)>0}e^{-\log \mu _{a_n}(i)} \mu _{a_n}(i)\le b\), by Markov's inequality and Eq. (2.1), for every \(\varepsilon >0\) we have

$$\begin{aligned} \sum _{n=1}^\infty \mathbb {P}\left[ \phi (n)^{-1}|\log \mu _{a_n}(\xi _{a_n})|\ge \varepsilon \right] \le b\sum _{n=1}^\infty \exp (-\phi (n)\varepsilon )<\infty . \end{aligned}$$
(3.18)

By the Borel–Cantelli lemma, we obtain

$$\begin{aligned} \lim _n\frac{1}{\phi (n)}\log \mu _{a_n}(\xi _{a_n})=0\ \ a.e. \end{aligned}$$
(3.19)

Letting \(g_k(x,y)=\log p_k(x,y)\) and \(\gamma =\frac{1}{2}\) in Lemma 1, and noticing that

$$\begin{aligned}&E[(\log p_k(\xi _{k-1},\xi _k))^2e^{\frac{1}{2}|\log p_k(\xi _{k-1},\xi _k)|}|\xi _{k-1}]\nonumber \\&\quad =\sum _{j=1}^bp_k^{-\frac{1}{2}}(\xi _{k-1},j)\log ^2p_k(\xi _{k-1},j) p_k(\xi _{k-1},j)\nonumber \\&\quad =\sum _{j=1}^bp_k^{\frac{1}{2}}(\xi _{k-1},j)\log ^2p_k(\xi _{k-1},j) \le 16be^{-2}, \end{aligned}$$
(3.20)

it follows from Lemma 1 that

$$\begin{aligned}&\lim _n\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}\left\{ \log p_k(\xi _{k-1},\xi _k)-\sum _{j=1}^bp_k(\xi _{k-1},j)\log p_k(\xi _{k-1},j)\right\} \nonumber \\&\qquad \qquad =0\ \ a.e. \end{aligned}$$
(3.21)
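The constant \(16be^{-2}\) in (3.20) comes from maximizing \(x^{1/2}\log ^2x\) over \(0<x\le 1\): the derivative vanishes at \(\log x=-4\), where the function equals \(16e^{-2}\), and each of the \(b\) summands is bounded by this value. The short numerical check below is a sanity check on a grid rather than a proof; the helper name `h` and the grid size are arbitrary choices.

```python
import math

def h(x):
    """x^(1/2) * (log x)^2, the summand appearing in (3.20)."""
    return math.sqrt(x) * math.log(x) ** 2

bound = 16.0 * math.exp(-2.0)
# Largest value of h on the grid 1e-5, 2e-5, ..., 1
grid_max = max(h(i / 100000.0) for i in range(1, 100001))
# Value of h at the critical point x = e^{-4}
at_critical_point = h(math.exp(-4.0))
```

On this grid `grid_max` stays below `bound`, and `at_critical_point` agrees with `bound` up to rounding.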

Now

$$\begin{aligned}&|\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}\sum _{j=1}^bp_k(\xi _{k-1},j) \log p_k(\xi _{k-1},j)-\sum _{i=1}^b\pi _i\sum _{j=1}^bp(i,j)\log p(i,j)|\nonumber \\&\quad \le |\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}\sum _{i=1}^b \sum _{j=1}^b\mathbf {1}_{\{i\}}(\xi _{k-1})p_k(i,j)\log p_k(i,j)\nonumber \\&\quad -\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)} \sum _{i=1}^b\sum _{j=1}^b\mathbf {1}_{\{i\}}(\xi _{k-1})p(i,j) \log p(i,j)|\nonumber \\&\quad \!+\!|\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}\sum _{i=1}^b\sum _{j=1} ^b\mathbf {1}_{\{i\}}(\xi _{k-1})p(i,j)\log p(i,j){-}\!\sum _{i=1}^b\pi _i\sum _{j=1}^bp(i,j)\log p(i,j)|\nonumber \\&\quad \le \sum _{i=1}^b\sum _{j=1}^b\frac{1}{\phi (n)}\sum _{k=a_n+1}^ {a_n+\phi (n)}|p_k(i,j)\log p_k(i,j)-p(i,j)\log p(i,j)|\nonumber \\&\quad +\sum _{i=1}^b\sum _{j=1}^b|p(i,j)\log p(i,j)||\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}\mathbf {1}_{\{i\}} (\xi _{k-1}){-}\pi _i|. \end{aligned}$$
(3.22)

By Lemma 2, Eq. (3.1) and the continuity of \(h(x)=x\log x\), we have

$$\begin{aligned}&\lim _n\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}|p_k(i,j)\log p_k(i,j)\nonumber \\&\quad -p(i,j)\log p(i,j)|=0\quad \forall i,j\in \mathbf {X}. \end{aligned}$$
(3.23)

Combining Eqs. (3.21), (3.22), (3.23) and (3.2), we have

$$\begin{aligned}&\lim _n\frac{1}{\phi (n)}\sum _{k=a_n+1}^{a_n+\phi (n)}\log p_k(\xi _{k-1},\xi _k)=\sum _{i=1}^b\pi _i\sum _{j=1}^bp(i,j)\log p(i,j)\ \ a.e. \end{aligned}$$
(3.24)

From Eqs. (1.6), (3.19) and (3.24), Eq. (3.4) follows.\(\square \)
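Part (iii) of Theorem 1 can be illustrated by simulation. The sketch below is only an example under stated assumptions: the limiting matrix \(P\), the perturbation \(P_k=(1-\frac{1}{k+1})P+\frac{1}{k+1}U\) with \(U\) uniform, the initial distribution, the seed, and the function names are all hypothetical choices. Since \(p_k(i,j)\rightarrow p(i,j)\), condition (3.1) holds by Remark 3. A single simulated path illustrates the theorem but does not verify it.

```python
import math
import random

P = [[0.9, 0.1], [0.2, 0.8]]      # limiting matrix (hypothetical example)

def p_k(k):
    """P_k = (1 - w) P + w U with w = 1/(k+1) and U uniform, so P_k -> P."""
    w = 1.0 / (k + 1)
    return [[(1 - w) * P[i][j] + w * 0.5 for j in range(2)] for i in range(2)]

def simulate_density(a, phi, seed=0):
    """One sample of f_{a,phi} from (1.6); states are coded 0 and 1 here."""
    rng = random.Random(seed)
    mu = [0.5, 0.5]                           # assumed law of xi_0
    x = 0 if rng.random() < mu[0] else 1
    for k in range(1, a + 1):                 # run the chain up to time a
        pk = p_k(k)
        mu = [mu[0] * pk[0][j] + mu[1] * pk[1][j] for j in range(2)]
        x = 0 if rng.random() < pk[x][0] else 1
    log_sum = math.log(mu[x])                 # log mu_a(xi_a)
    for k in range(a + 1, a + phi + 1):
        pk = p_k(k)
        y = 0 if rng.random() < pk[x][0] else 1
        log_sum += math.log(pk[x][y])         # log p_k(xi_{k-1}, xi_k)
        x = y
    return -log_sum / phi

def entropy_rate():
    """Right-hand side of (3.4) for this particular P."""
    pi = [2.0 / 3.0, 1.0 / 3.0]               # solves pi P = pi for this P
    return -sum(pi[i] * P[i][j] * math.log(P[i][j])
                for i in range(2) for j in range(2))
```

Comparing `simulate_density(1000, 20000)` with `entropy_rate()` shows the kind of agreement that Eq. (3.4) predicts for a long window.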

Corollary 3

(see [12]). Under the conditions of Theorem 1, if Eq. (3.5) holds, then

$$\begin{aligned}&(i)\quad \quad \lim _{n}\frac{1}{n}S_{0,n}(i;\omega )=\pi _{i} \ \ a.e., \text {and} \ \ \mathcal {L}_1 \quad \forall i\in \mathbf {X},\end{aligned}$$
(3.25)
$$\begin{aligned}&(ii)\quad \quad \lim _{n}\frac{1}{n}S_{0,n}(i,j;\omega )=\pi _{i}p(i,j) \ \ a.e.,\ \text {and} \ \ \mathcal {L}_1 \quad \forall i,j \in \mathbf {X},\end{aligned}$$
(3.26)
$$\begin{aligned}&(iii)\quad \quad \lim _{n}f_{0,n}(\omega )=-\sum _{i=1}^b\sum _{j=1}^b\pi _{i}p(i,j)\log p(i,j) \ \ a.e.,\ \text {and} \ \ \mathcal {L}_1, \ \ \ \ \ \ \ \end{aligned}$$
(3.27)

where \((\pi _1,\ldots ,\pi _b)\) is the unique stationary distribution determined by the transition matrix \(P\).