1 Introduction

We begin by introducing some notation. Let \(Z\) be a collection of points in \(\mathbf{Z}\times \mathbf{N}\) and let

$$\begin{aligned} Z^h&= \left\{ (n,k) \in Z \ : \ k \ge h \right\} ,\\ Z^h_{\alpha }&= \left\{ (z,s) \in \mathbf{Z}^2 \ : \ |z-y| < \alpha (s-r) \ \text{for some} \ (y,r) \in Z^h \right\} \end{aligned}$$

and

$$\begin{aligned} Z^h_{\alpha }(\lambda ) = \left\{ n \ : \ (n, \lambda ) \in Z^h_{\alpha } \right\} \qquad \qquad ( \lambda \in \mathbf{N} ). \end{aligned}$$

Geometrically we can think of \(Z^1 _{ \alpha }\) as the set of lattice points contained in the union of all solid cones of aperture \(\alpha \) whose vertices lie in \(Z^1= Z\). We say a sequence of pairs of natural numbers \((n_l ,k_l )_{l=1}^{\infty }\) is Stoltz if there exist a collection of points \(Z\) in \(\mathbf{Z}\times \mathbf{N}\) and a function \(h = h(t)\) tending to infinity with \(t\) such that \((n_l,k_l ) \in Z^{h(t)}\) for all \(l \ge t\), and if there exist \(h_0\), \(\alpha _0\) and \(A > 0\) such that \(|Z^{h_0}_{\alpha _0}(\lambda )| \le A\lambda \) for all integers \(\lambda > 0\). This technical condition is interesting because of the following theorem [2].
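To get a feel for the counting condition, here is a minimal sketch (ours, not from [2]) that truncates the example \(Z = \{ (2^{2^l}, 2^{2^{l-1}}) : l \ge 1 \}\) appearing below and counts the horizontal sections \(Z^{h_0}_{\alpha _0}(\lambda )\) by brute force; the choices \(h_0 = 2\), \(\alpha _0 = 1\) and the sampled values of \(\lambda \) are arbitrary.

```python
import math

# Illustrative sketch (not from the paper): for a finite truncation of the
# example Z = {(2^(2^l), 2^(2^(l-1))) : l >= 1}, count the horizontal
# sections Z^{h0}_{alpha0}(lam) and compare them with lam.

def cone_section(Z, h0, alpha0, lam):
    """Integers n with (n, lam) inside some open cone of aperture alpha0
    whose vertex (y, r) lies in Z^{h0} = {(y, r) in Z : r >= h0}."""
    section = set()
    for (y, r) in Z:
        if r < h0 or lam <= r:
            continue  # vertex excluded by h0, or the cone is empty at height lam
        width = alpha0 * (lam - r)
        # all integers z with |z - y| < width
        section.update(range(math.floor(y - width) + 1, math.ceil(y + width)))
    return section

Z = [(2 ** (2 ** l), 2 ** (2 ** (l - 1))) for l in range(1, 7)]
for lam in (4, 16, 256, 65536):
    size = len(cone_section(Z, h0=2, alpha0=1.0, lam=lam))
    print(f"lambda = {lam:6d}: |section| = {size:6d}, ratio = {size / lam:.2f}")
```

For these choices the printed ratios \(|Z^{h_0}_{\alpha _0}(\lambda )|/\lambda \) stay bounded by about \(3\), consistent with the requirement \(|Z^{h_0}_{\alpha _0}(\lambda )| \le A\lambda \).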

Theorem 1.1

Let \((X,\beta ,\mu , T )\) denote a dynamical system, consisting of a set \(X\), a \(\sigma \)-algebra \(\beta \) of its subsets, a measure \(\mu \) defined on the measurable space \((X,\beta )\) such that \(\mu (X) = 1\), and a measurable, measure preserving map \(T: X \rightarrow X\). Suppose \(f\) is in \(L^1(X,\beta , \mu )\) and that the sequence of pairs of natural numbers \((n_l , k_l)_{l=1}^{\infty }\) is Stoltz. Then

$$\begin{aligned} m_f(x) = \lim _{l\rightarrow \infty }{1\over k_l}\sum _{i=1}^{k_l}f\left( T^{n_l+i}x\right) , \end{aligned}$$

exists almost everywhere with respect to \(\mu \).
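As a small numerical illustration of the statement (ours, not from [2]), take the irrational rotation \(Tx = x + \gamma \pmod 1\) on \([0,1)\) with \(f(x) = x\) and the Stoltz sequence \(n_l = 2^{2^l}\), \(k_l = 2^{2^{l-1}}\) discussed below; the moving averages should settle near \(\int _0^1 f \, dx = 1/2\). The rotation number and starting point are arbitrary choices.

```python
import math

# Illustrative sketch (not from the paper): moving averages along the Stoltz
# sequence n_l = 2^(2^l), k_l = 2^(2^(l-1)) for the irrational rotation
# Tx = x + gamma (mod 1) and f(x) = x, whose space average is 1/2.

gamma = math.sqrt(2) - 1      # an irrational rotation number (arbitrary choice)
x0 = 0.123                    # an arbitrary starting point

for l in range(1, 6):
    n_l, k_l = 2 ** (2 ** l), 2 ** (2 ** (l - 1))
    avg = sum((x0 + (n_l + i) * gamma) % 1.0 for i in range(1, k_l + 1)) / k_l
    print(f"l = {l}: n_l = {n_l}, k_l = {k_l}, moving average = {avg:.4f}")
```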

See [9, 10] and [11] for applications of this theorem to the metric theory of continued fractions. Note that if \(m_{l, f}(x) = {1\over k_l}\sum _{i=1}^{k_l}f(T^{n_l+i}x)\) then

$$\begin{aligned} m_{l,f}(Tx)- m_{l,f}(x) = k_l^{-1}\left( f(T^{n_l+k_l+1}x)- f(T^{n_l+1}x)\right) . \end{aligned}$$

This means that \(m_f(Tx) = m_{f}(x)\) \(\mu \) almost everywhere. A dynamical system \((X,\beta , \mu , T)\) is called ergodic if every \(A \in \beta \) with \(T^{-1}A := \{ x \in X : Tx \in A \} = A\) has either full or null measure. A standard fact in ergodic theory is that if \((X,\beta , \mu , T)\) is ergodic and a \(\mu \) measurable function \(k\) on \(X\) satisfies \(k(Tx) = k(x)\) almost everywhere, then \(k(x) = \int _X k\,d\mu \) \(\mu \) almost everywhere [6]. Averages where \(n_l = 1\) for all \(l\) will be called non-moving. Moving averages satisfying the above hypothesis can be constructed by taking, for instance, \(n_l = 2^{2^l}\) and \(k_l = 2^{2^{l-1}}\).

Proving pointwise ergodic theorems like Theorem 1.1 is closely related to proving results in differentiation theory. For instance, the proof of Birkhoff's pointwise ergodic theorem is effected using a maximal inequality called the maximal ergodic theorem. An idea due to N. Wiener, clarified by A. Calderon [5], reduces proving the maximal ergodic theorem to proving it in the special case where \(X=\mathbf{Z}\) and \(T\) is addition by \(1\), that is, \(Tx= x+1\) for \(x \in \mathbf{Z}\). In this setting the maximal ergodic theorem is nothing other than the Hardy–Littlewood maximal inequality on the group \(\mathbf{Z}\), whose proof is essentially identical to that in the more familiar case of the group \(\mathbf{R}\). As is well known, a continuous function on the unit circle can be realised as the boundary value function of a harmonic function defined inside the unit disc. The limit behaviour of this harmonic function as its argument tends toward the unit circle has long been of interest to analysts. It turns out that the almost everywhere convergence behaviour of this limit is also governed by Hardy–Littlewood maximal functions. To prove this theorem, it is required that the argument approach the unit circle confined to a cone within the unit disc, called the Stoltz cone. It is this condition in harmonic analysis that inspired the authors of [2] to prove Theorem 1.1. See [13] for more background.
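To make the transference concrete, the following sketch (ours) computes the uncentred Hardy–Littlewood maximal function on \(\mathbf{Z}\) for a finitely supported \(f\) and checks a weak \((1,1)\) bound \(|\{ Mf > t \}| \le C \Vert f \Vert _1 / t\) by brute force. The test function, the window and the constant \(C = 2\) are illustrative choices, not claims about sharp constants.

```python
# Illustrative sketch (not from the paper): the uncentred Hardy-Littlewood
# maximal function Mf(n) = sup over intervals [a,b] containing n of the
# average of |f| over [a,b], for a finitely supported f on Z.

def maximal_function(f, lo, hi):
    """Compute Mf on the window [lo, hi].  This is exact when f is supported
    inside the window: enlarging an interval past the support of f only
    lowers its average."""
    xs = [abs(f.get(n, 0.0)) for n in range(lo, hi + 1)]
    prefix = [0.0]
    for v in xs:
        prefix.append(prefix[-1] + v)   # prefix sums of |f| on the window
    M = {}
    for i in range(len(xs)):
        best = 0.0
        for a in range(i + 1):
            for b in range(i, len(xs)):
                best = max(best, (prefix[b + 1] - prefix[a]) / (b - a + 1))
        M[lo + i] = best
    return M

f = {0: 5.0, 3: 1.0, 10: 2.0}               # an arbitrary finitely supported f
l1_norm = sum(abs(v) for v in f.values())
M = maximal_function(f, -40, 50)            # window wide enough for these t
for t in (0.5, 1.0, 2.0):
    count = sum(1 for v in M.values() if v > t)
    print(f"t = {t}: |{{Mf > t}}| = {count} <= 2*||f||_1/t = {2 * l1_norm / t:.0f}")
```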

Prior to the proof of Theorem 1.1 a number of authors considered moving averages for specific sequences \((n_l,k_l)_{l\ge 1}\). For instance, in [1] it is shown that if \(n_l=l\) and \(k_l=\sqrt{l}\) then there is an \(L^{\infty }\) function for which pointwise convergence of the moving averages fails. In [14] it is shown that if \(n_l =p(l)\) for a non-constant polynomial \(p\) with coefficients in \(\mathbf{Z}\), and \(k_l \over n_l\) tends to \(0\) as \(l\) tends to infinity, then there is an \(L^{\infty }\) function for which convergence fails. Another \(L^{\infty }\) counterexample appears in [3] in the case \(n_l=4^l\) and \(k_l=2^l\). On the other hand, as the authors of [2] state, it was known at the time of writing of [2] that if \(n_l=2^{2^l}\) and \(k_l = \sqrt{n_l}\) then pointwise convergence takes place. All this is now resolved by Theorem 1.1.

Let \(\ldots, x_{-1}, x_0, x_1, \ldots \) be a two sided stationary process taking values in the finite set \(K= { \{ a_1, \ldots , a_s\}} \) and let \(p(x_0, \ldots , x_n)\) denote the joint distribution function of the variables \(x_0, \ldots , x_n\). In this paper we prove the following theorem.

Theorem 1.2

Suppose \((n_l, k_l)_{l=1}^{\infty }\) is Stoltz. Then there is a constant \(H\) such that

$$\begin{aligned} \lim _{l\rightarrow \infty }{1\over k_l} \log p(x_{n_l}, \ldots , x_{n_l+k_l}) =-H, \end{aligned}$$

almost everywhere.
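Before discussing applications, here is a numerical illustration of the statement (ours, not from the paper) in the simplest case of an i.i.d. Bernoulli source, for which \(H\) is the Shannon entropy of the marginal distribution. Since the process is stationary, the block \((x_{n_l}, \ldots , x_{n_l+k_l})\) has the same law as a block starting at time \(0\), so it suffices to simulate the block itself.

```python
import math, random

# Illustrative check (not from the paper): for an i.i.d. Bernoulli(q) source
# Theorem 1.2 predicts -(1/k_l) log2 p(x_{n_l}, ..., x_{n_l+k_l}) -> H along
# the Stoltz sequence n_l = 2^(2^l), k_l = 2^(2^(l-1)).  q is arbitrary.

random.seed(1)
q = 0.3
H = -q * math.log2(q) - (1 - q) * math.log2(1 - q)   # entropy, bits per symbol

def neg_log2_prob(block):
    """-log2 p(block) under the i.i.d. model."""
    ones = sum(block)
    return -(ones * math.log2(q) + (len(block) - ones) * math.log2(1 - q))

for l in range(1, 6):
    n_l, k_l = 2 ** (2 ** l), 2 ** (2 ** (l - 1))
    block = [1 if random.random() < q else 0 for _ in range(k_l + 1)]
    print(f"l = {l}: -(1/k_l) log2 p = {neg_log2_prob(block) / k_l:.3f}  (H = {H:.3f})")
```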

In the case \(n_l=1\) for all \(l\), Theorem 1.2 reduces to the famous Shannon–McMillan–Breiman theorem, referred to briefly as the SMB theorem [4, 12, 15], which is the fundamental theorem of information theory. A primary application of the SMB theorem is to give a theoretical underpinning to binary data compression of an ergodic time series of entropy \(H>0\). Because the SMB theorem describes generic behaviour, one is led to the concept of a typical set. In particular, suppose \(x_1, \ldots , x_n\) is a sequence of stationary variables taking values in a finite state space \(K\). Given \(\epsilon \in (0,1)\), a typical set with respect to the probability \(p\) is the set

$$\begin{aligned} A_{\epsilon }^{n} :=\left\{ (x_1, \ldots , x_n) \in K^n: 2^{-n(H+\epsilon )} \le p(x_1, \ldots , x_n ) \le 2^{-n(H- \epsilon ) } \right\} . \end{aligned}$$
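As a concrete illustration (ours), the set \(A_{\epsilon }^{n}\) can be constructed by brute force for an i.i.d. Bernoulli source, and the quantities appearing in the inequalities (1.1)–(1.3) recorded below can be read off directly. The parameters \(q\), \(n\) and \(\epsilon \) are arbitrary, and for such a small \(n\) the bounds only just hold.

```python
import itertools, math

# Illustrative sketch (not from the paper): brute-force construction of the
# typical set A_eps^n for an i.i.d. Bernoulli(q) source.

q, n, eps = 0.3, 16, 0.2
H = -q * math.log2(q) - (1 - q) * math.log2(1 - q)    # entropy, bits per symbol

A, P_A = [], 0.0
for x in itertools.product((0, 1), repeat=n):
    p = q ** sum(x) * (1 - q) ** (n - sum(x))         # p(x_1, ..., x_n)
    if 2 ** (-n * (H + eps)) <= p <= 2 ** (-n * (H - eps)):
        A.append(x)
        P_A += p

print(f"P(A) = {P_A:.3f} >= 1 - eps = {1 - eps}")                      # cf. (1.1)
print(f"|A| = {len(A)} <= 2^(n(H+eps)) = {2 ** (n * (H + eps)):.0f}")  # cf. (1.2)
print(f"|A| = {len(A)} >= (1-eps) 2^(n(H-eps)) = "
      f"{(1 - eps) * 2 ** (n * (H - eps)):.0f}")                       # cf. (1.3)
```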

Elementary arguments, to be found in standard textbooks (see for instance [7]), enable one to conclude that

$$\begin{aligned} P(A_{\epsilon }^n)&\ge 1- \epsilon , \end{aligned}$$
(1.1)
$$\begin{aligned} |A_{\epsilon }^{n}|&\le 2^{n(H+ \epsilon )}; \end{aligned}$$
(1.2)

and

$$\begin{aligned} |A_{\epsilon }^{n}| \ge (1- \epsilon ) 2^{n(H- \epsilon )} \end{aligned}$$
(1.3)

all for large \(n\). These inequalities can be used to describe how to faithfully compress the data from the sequence \(x_1, \ldots, x_n\) using a binary code; see [7] for instance for details of how this can be done. The role of the SMB theorem here is to ensure the existence of the typical sets described above that are used in the compression.

We can now consider an alternative scenario in which we have a stationary series of random variables \((x_n)_{n\ge 1}\) that we are only able to observe from time to time. Say, for instance, a spaceship travels towards a far-off data source, and the data it gathers is collected, compressed and returned to Earth. Suppose data becomes more plentiful as the ship approaches the source, but that to conserve resources the data is collected only intermittently. A protocol is needed to manage the collection, compression and communication of this data, and Theorem 1.2 tells us how this might be done. Suppose \(S= (n_l, k_l)_{l\ge 1}\) denotes a sequence of Stoltz intervals and that data collection and communication are switched off outside the Stoltz intervals. We can, without loss of generality, assume that the Stoltz intervals are disjoint. Associated to these Stoltz intervals we can define a typical set

$$\begin{aligned} B_{S,\epsilon }^{l} :=\left\{ (x_{n_l+1}, \ldots , x_{n_l+k_l}) \in K^{k_l}: 2^{-k_l(H+\epsilon )} \le p(x_{n_l +1}, \ldots , x_{n_l+k_l} ) \le 2^{-k_l (H- \epsilon ) } \right\} . \end{aligned}$$

In light of Theorem 1.2, it is possible to prove, along similar lines, analogues of the inequalities (1.1), (1.2) and (1.3) for the sets \((B_{S,\epsilon }^l)_{l\ge 1}\), which can then be used to construct a compression scheme along the lines of the one constructed from \((A_{\epsilon }^n)_{n\ge 1}\), as the sketch below indicates.
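A minimal sketch of such a scheme (ours, following the classical construction in [7] rather than anything prescribed in this paper): each Stoltz window is transmitted with a one-bit flag, typical blocks as a fixed-length index into an agreed enumeration of \(B_{S,\epsilon }^{l}\), atypical blocks uncoded. The names `p_block`, `H`, `eps` and `alphabet_size` are illustrative.

```python
import math

# Hypothetical protocol sketch (not from the paper): fixed-length coding of
# one Stoltz window.  p_block is a caller-supplied probability model, H the
# entropy and eps the typicality margin; all of these are assumptions.

def code_length_bits(block, p_block, H, eps, alphabet_size):
    """Bits used to transmit one window (x_{n_l+1}, ..., x_{n_l+k_l})."""
    k = len(block)
    if 2 ** (-k * (H + eps)) <= p_block(block) <= 2 ** (-k * (H - eps)):
        # typical block: by the analogue of (1.2), an index into B_{S,eps}^l
        # fits in ceil(k*(H+eps)) bits, plus a one-bit flag
        return 1 + math.ceil(k * (H + eps))
    # atypical block: flag plus uncoded transmission of the k symbols
    return 1 + k * math.ceil(math.log2(alphabet_size))
```

Since, by the analogue of (1.1), atypical windows occur with probability at most \(\epsilon \) for large \(l\), the expected cost of such a scheme is close to \(H + \epsilon \) bits per observed symbol.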

2 Proof of Theorem 1.2

In the case of a one sided shift, we can think of it as the future of the two sided shift arising from the natural extension of the one sided shift. We will denote by \(\mathbf{E}(f|{\mathcal {A}})(x)\) the conditional expectation of the function \(f\) with respect to the \(\sigma \)-algebra \({\mathcal {A}}\), and by \({\mathcal {I}}\) the \(\sigma \)-algebra of \(T\)-invariant sets. To prove Theorem 1.2 we need the following lemma.

Lemma 2.1

Suppose \((\Omega , {\mathcal {B}}, p, T)\) is a dynamical system, that \((n_l, k_l)_{l=1}^{\infty }\) is Stoltz, and that \((g_k)_{k=1}^{\infty }\) is a sequence of \(p\) measurable functions converging pointwise to \(g\). Then if \(\sup _{k\ge 1}|g_k| \in L^1(\Omega , {\mathcal {B}}, p)\) we have

$$\begin{aligned} \lim _{l\rightarrow \infty }{1\over k_l}\sum _{k=n_l}^{n_l+k_l}g_k(T^k \omega ) = \mathbf{E}(g| {\mathcal {I}})(\omega ) \end{aligned}$$

for \(p\) almost every \(\omega \).

Proof

We have

$$\begin{aligned} {1\over k_l}\sum _{k=n_l}^{n_l+k_l} g_k(T^kx) ={1\over k_l}\sum _{k=n_l}^{n_l+k_l} g(T^kx) + {1\over k_l}\sum _{k=n_l}^{n_l+k_l} \left[ g_k(T^kx)-g(T^kx)\right] . \end{aligned}$$

Using the moving average ergodic theorem (Theorem 1.1), the first term on the right tends to \(\mathbf{E}(g|{\mathcal {I}})(x)\) almost everywhere. Let \(G_N(x) =\sup _{k\ge N}|g_k(x)-g(x)|\). Then, noting that \(|g_k(y)-g(y)| \le G_N(y)\) for all \(k \ge N\), for the second term on the right we have the estimate

$$\begin{aligned}&\limsup _{l\rightarrow \infty } \left| {1\over k_l}\sum _{k=n_l}^{n_l+k_l}\left[ g_k(T^kx)-g(T^kx)\right] \right| \\&\quad \le \limsup _{l\rightarrow \infty } {1\over k_l}\sum _{k=n_l}^{n_l+k_l} |g_k(T^kx)-g(T^kx) |\\&\quad \le \limsup _{l\rightarrow \infty } {1\over k_l}\sum _{k=n_l}^{n_l+k_l} |G_N(T^kx)| = \mathbf{E}(G_N|{\mathcal {I}})(x) \end{aligned}$$

almost everywhere, where the final equality is a further application of the moving average ergodic theorem, this time to \(G_N\). Now \((G_N)_{N\ge 1}\) converges monotonically to zero and, since

$$\begin{aligned} \mathbf{E}G_0 \le \mathbf{E}\left( \sup _k|g_k| + |g| \right) < \infty , \end{aligned}$$

the monotone convergence theorem implies that \(\mathbf{E}(G_N|{\mathcal {I}} )(x)\) converges to \(0\) almost everywhere. Lemma 2.1 is proved. \(\square \)

We now complete the proof of Theorem 1.2. Set \(g_0(x) =-\log p(x_0)\) and, for \(k\ge 1\), set \(g_k(x) = \log {p(x_{-k}, \ldots , x_{-1}) \over p( x_{-k}, \ldots , x_0)} = -\log p(x_0 \,|\, x_{-1}, \ldots , x_{-k})\), where if \((x_n)_{n=0}^{\infty }\) is a one sided sequence we work with the two sided sequence obtained via the natural extension \(T\) of the shift map. Then

$$\begin{aligned} -{1\over k_l} \log p(x_{n_l}, \ldots , x_{n_l +k_l}) = -{1\over k_l}\, T^{n_l} \log p(x_{0}, \ldots , x_{k_l}) = {1\over k_l}\, T^{n_l} \left( \sum _{k=0}^{k_l}g_k(T^kx)\right) , \end{aligned}$$

where \(T\) acts on functions by composition, \(Tf = f\circ T\).
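The decomposition above rests on the telescoping identity \(\sum _{k=0}^{m} g_k(T^kx) = -\log p(x_0, \ldots , x_m)\). The following check (ours) verifies this identity numerically for a two-state stationary Markov chain; the chain and the sample path are arbitrary choices.

```python
import math

# Illustrative check (not from the paper): the summands
# g_k(T^k x) = -log p(x_k | x_{k-1}, ..., x_0) telescope, so their sum over
# 0 <= k <= m equals -log p(x_0, ..., x_m).

pi = {0: 0.8, 1: 0.2}                                      # stationary initial law
P = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6}   # transition probabilities

def joint(path):
    """p(x_0, ..., x_m) under the Markov model."""
    p = pi[path[0]]
    for a, b in zip(path, path[1:]):
        p *= P[(a, b)]
    return p

def g(k, path):
    """-log p(x_k | x_{k-1}, ..., x_0); by the Markov property only x_{k-1} matters."""
    return -math.log(pi[path[0]] if k == 0 else P[(path[k - 1], path[k])])

x = [0, 0, 1, 1, 0, 1, 0, 0]
assert abs(sum(g(k, x) for k in range(len(x))) + math.log(joint(x))) < 1e-12
print("telescoping identity verified")
```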

Since \(T\) is 1–1 and measure preserving, the proof of Theorem 1.2 is completed once we show that \((g_k)_{k\ge 0}\) converges almost everywhere and that \(\mathbf{E}(\sup _k g_k) < \infty \). To do this we start with an inequality of McMillan [12]:

$$\begin{aligned} \int _{m\le g_k < m+1} g_k \le s(m+1)2^{-m}. \end{aligned}$$

We confine attention to the cylinder set \(Z_i \subseteq \Omega \) with \(Z_i = \{ x : x_0 = a_i \}\). On \(Z_i\) we have

$$\begin{aligned} g_k(x) = -\log p (x_0 = a_i \,|\,x_{-1} , \ldots , x_{-k} ). \end{aligned}$$
(2.1)

As \(( p (x_0 = a_i \,|\,x_{-1} , \ldots , x_{-k} ))_{k\ge 1}\) is a martingale, and \(-\log \) is a convex function, the sequence \((g_k(x))_{k\ge 1}\) is a semi-martingale. Hence \((g_k)_{k \ge 1}\) converges almost everywhere on \(Z_i\), and hence on \(\Omega \) [8]. Furthermore, by the semi-martingale property

$$\begin{aligned} \int _{Z_i} { \sup _{0\le k \le n} g_k} \le {e\over e-1} + {e\over e-1}\int _{Z_i} g_n\log ^+ g_n . \end{aligned}$$

Using (2.1) again we have

$$\begin{aligned} \int _{Z_i} (g_n \log ^+ g_n )&= \sum _{m=0}^{\infty } \int _{Z_i \cap [m\le g_n < m+1]} (g_n\log ^+ g_n )\\&\le \sum _{m=0}^{\infty } s(m+1) \log (m+1) 2^{-m}. \end{aligned}$$

Thus \(\sup _n \int _{Z_i} \sup _{0\le k\le n} g_k < \infty \), so by monotone convergence \(\int _{Z_i} \sup _k g_k < \infty \). Summing over the cylinder sets \(Z_i\) gives \(\mathbf{E}( \sup _k g_k) < \infty \), and Theorem 1.2 is proved. \(\square \)