The Word Entropy and How to Compute It

Ferenczi, Sébastien; Mauduit, Christian; Moreira, Carlos Gustavo

doi:10.1007/978-3-319-66396-8_15

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10432)

Included in the following conference series:

International Conference on Combinatorics on Words

Abstract

The complexity function of an infinite word counts the number of its factors of each length. For any positive function f, its exponential rate of growth $E_0(f)$ is $\liminf \limits _{n\rightarrow \infty } \frac{1}{n}\log f(n)$. We define a new quantity, the word entropy $E_W(f)$, as the maximal exponential growth rate of a complexity function smaller than f. This is in general smaller than $E_0(f)$, and more difficult to compute; we give an algorithm to estimate it. The quantity $E_W(f)$ is used to compute the Hausdorff dimension of the set of real numbers whose expansions in a given base have complexity bounded by f.

1 Definitions

Let A be the finite alphabet $\{0,1,\dots ,q-1\}$. If $w\in A^{{\mathbb N}}$, we denote by L(w) the set of finite factors of w; for any non-negative integer n, we write $L_n(w)=L(w)\cap A^n$. The classical complexity function is described for example in [2].

Definition 1

The complexity function of $w\in A^{{\mathbb N}}$ is defined for any non-negative integer n by $p_w(n)=|L_n(w)|$.
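As an illustration of Definition 1, the following minimal sketch counts the factors of a finite prefix. The helper names `complexity` and `thue_morse_prefix` are hypothetical, and the Thue–Morse word is used only as a convenient test case that does not appear in the paper. A finite prefix can only give a lower bound on $p_w(n)$, since some factors of w may first occur later.

```python
def complexity(word: str, n: int) -> int:
    """Number of distinct factors (contiguous blocks) of length n in a finite word."""
    if n == 0:
        return 1  # only the empty word
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


def thue_morse_prefix(length: int) -> str:
    """Prefix of the Thue-Morse word: letter i is the parity of the binary digit sum of i."""
    return "".join(str(bin(i).count("1") % 2) for i in range(length))


prefix = thue_morse_prefix(1024)
# Lower bounds for p_w(1), ..., p_w(5) read off the prefix.
values = [complexity(prefix, n) for n in range(1, 6)]
print(values)
```

For a word of low complexity such as this one, a short prefix already contains every short factor, so the counts agree with the true values of $p_w(n)$ for small n.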

Our work concerns the study of infinite words w whose complexity function is bounded by a given function f from ${\mathbb N}$ to ${\mathbb R}^{+}$. More precisely, if f is such a function, we put

$$\begin{aligned} W(f)=\{w\in A^{{\mathbb N}}, p_w(n)\le f(n), \forall n \in {\mathbb N}\}. \end{aligned}$$

Definition 2

If f is a function from ${\mathbb N}$ to ${\mathbb R}^{+}$, we call exponential rate of growth of f the quantity

$$\begin{aligned} E_0(f)=\liminf \limits _{n\rightarrow \infty } \frac{1}{n}\log f(n) \end{aligned}$$

and word entropy of f the quantity

$$\begin{aligned} E_W (f) =\sup _{w \in W(f)} E_0(p_w). \end{aligned}$$

Of course, if $E_0=0$ then $E_W$ is zero also. Thus the study of $E_W$ is interesting only when f has exponential growth: we are in the little-explored field of word combinatorics in positive entropy, or exponential complexity. For an equivalent theory in zero entropy, see [3, 4].

2 First Properties of $E_0$ and $E_W$

The basic study of these quantities is carried out in [5], where the following results are proved.

If f is itself a complexity function (i.e. $f=p_w$ for some $w \in A^{\mathbb N}$), then $E_W (f)=E_0 (f)$. But in general $E_W$ may be much smaller than $E_0$.

We define mild regularity conditions for f: f is said to satisfy $(\mathcal {C})$ if the sequence $\left( f(n)\right) _{n\ge 1}$ is strictly increasing, there exists $n_0\in {\mathbb N}$ such that $f(n+1) \le f(1) f(n)$ for all $n \ge n_0$, and the sequence $\left( \frac{1}{n} \log f(n)\right) _{n\ge 1}$ converges.

However, for each $1<\theta \le q$ and each $n_0 \in {\mathbb N}$ such that $\theta ^{n_0+1} > n_0+q-1$, we define the function f by $f(n) = n+q-1$ for $1 \le n \le n_0$ (so that $f(1)=q$) and $f(n)=\theta ^n$ for $n > n_0$. We have $E_0(f) = \log \theta $ and it is proved that

$$\begin{aligned} E_W(f) \le \frac{1}{n_0} \log (n_0+q-1), \end{aligned}$$

which can be made arbitrarily small, independently of $\theta $, while f satisfies $(\mathcal {C})$.
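To see how far apart the two quantities can be in this example, the sketch below evaluates $E_0(f)=\log \theta$ and the upper bound $\frac{1}{n_0}\log (n_0+q-1)$ for a few values of $n_0$. The function names and the chosen parameters ($q=2$, $\theta =2$) are illustrative assumptions, not taken from [5].

```python
import math


def f_example(n: int, q: int, theta: float, n0: int) -> float:
    """The example function: n + q - 1 for 1 <= n <= n0, theta**n for n > n0."""
    return n + q - 1 if n <= n0 else theta ** n


def bounds(q: int, theta: float, n0: int) -> tuple[float, float]:
    """Return (E_0(f), upper bound on E_W(f)) for the example function."""
    # Hypotheses stated in the text for this example.
    assert 1 < theta <= q and theta ** (n0 + 1) > n0 + q - 1
    e0 = math.log(theta)
    ew_upper = math.log(n0 + q - 1) / n0
    return e0, ew_upper


for n0 in (10, 100, 1000):
    e0, ew_upper = bounds(q=2, theta=2.0, n0=n0)
    print(f"n0={n0}: E_0 = {e0:.4f}, E_W <= {ew_upper:.4f}")
```

The bound decreases like $\frac{\log n_0}{n_0}$ while $E_0(f)$ stays equal to $\log \theta$.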

We define stronger regularity conditions for f.

Definition 3

We say that a function f from ${\mathbb N}$ to ${\mathbb R}^{+}$ satisfies the conditions $(\mathcal C^*)$ if (i) for any $n \in {\mathbb N}$ we have $f(n+1) > f(n) \ge n+1$; (ii) for any $(n, n' ) \in {\mathbb N}^2$ we have $f(n+n') \le f(n) f(n') $.

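But even with $(\mathcal C^*)$ we may have $E_W(f) < E_0(f)$. Indeed, let f be the function defined by $f(n)=\lceil 3^{n/2} \rceil $ for any $n \in {\mathbb N}$. Then it is easy to check that f satisfies conditions $(\mathcal C^*)$ and that $E_0(f)=\lim \limits _{n\rightarrow \infty }\frac{1}{n} \log f(n) =\log (\sqrt{3})$. On the other hand, we have $f(1)=2$ and $f(2)=3$; thus any $w \in W(f)$ uses at most two letters, say 0 and 1, and its language omits at least one of the four words of length 2. If 01 or 10 is omitted, w is eventually constant and $E_0(p_w)=0$; otherwise the language has no 00 or no 11, and the number of binary words of length n avoiding 11 (or 00) is a Fibonacci number. This implies that $E_W(f) \le \log (\frac{1+\sqrt{5}}{2})<E_0(f)$.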
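The claims in this example can be checked numerically on a finite range. The sketch below is only a sanity check under stated assumptions: it tests conditions $(\mathcal C^*)$ for $n, n' < 40$ (not for all n), and the helper names are hypothetical. The ceiling is computed in exact integer arithmetic to avoid floating-point rounding.

```python
import math


def f(n: int) -> int:
    """ceil(3**(n/2)), computed exactly as isqrt(3**n - 1) + 1."""
    return math.isqrt(3 ** n - 1) + 1


def satisfies_c_star(g, limit: int) -> bool:
    """Check (i) g(n+1) > g(n) >= n+1 and (ii) g(n+n') <= g(n) g(n') for n, n' < limit."""
    increasing = all(g(n + 1) > g(n) >= n + 1 for n in range(limit))
    submultiplicative = all(
        g(a + b) <= g(a) * g(b) for a in range(limit) for b in range(limit)
    )
    return increasing and submultiplicative


def count_words_avoiding_11(n: int) -> int:
    """Number of binary words of length n with no factor 11 (a Fibonacci number)."""
    a, b = 1, 2  # counts for lengths 0 and 1
    for _ in range(n):
        a, b = b, a + b
    return a


golden = math.log((1 + math.sqrt(5)) / 2)
print(f(1), f(2), satisfies_c_star(f, 40))
print(math.log(count_words_avoiding_11(400)) / 400, golden, math.log(math.sqrt(3)))
```

The growth rate of the Fibonacci counts approaches $\log (\frac{1+\sqrt{5}}{2}) \approx 0.481$, which is strictly below $\log \sqrt{3} \approx 0.549$.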

Nevertheless, under these conditions, we have the following important result.

Theorem 4

If f is a function from ${\mathbb N}$ to ${\mathbb R}^{+}$ satisfying the conditions $(\mathcal C^*)$, then $E_W(f)> \frac{1}{2} E_0(f)$.

It is also shown in [5] that the constant $\frac{1}{2}$ is optimal.

Finally, it will be useful to know that

Theorem 5

For any function f from ${\mathbb N}$ to ${\mathbb R}^{+}$, there exists $w \in W(f)$ such that for any $ n \in {\mathbb N}$ we have $p_w(n) \ge \exp (E_W(f) n)$.

3 Algorithm

In general $E_W(f)$ is much more difficult to compute than $E_0(f)$; now we will give an algorithm which allows us to estimate with arbitrary precision $E_W(f)$ from finitely many values of f, if we know already $E_0(f)$ and have some information on the speed with which this limit is approximated.

We assume that f satisfies conditions $(\mathcal C^*)$. We do not lose much generality with this assumption: if the function f satisfies only the weaker conditions $(\mathcal C)$, we can replace it by the function $\tilde{f}$ given recursively by

$$\begin{aligned} \tilde{f}(n):=\min \{f(n), \min _{1 \le k <n}\tilde{f}(k) \tilde{f}(n-k) \}, \end{aligned}$$

which satisfies conditions $(\mathcal C^*)$ and is such that $\tilde{f}(n) \le f(n)$ for all $n \in {\mathbb N}$ and $W(\tilde{f})=W(f)$.
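The recursion for $\tilde{f}$ is straightforward to evaluate on finitely many values. The following is a minimal sketch; the function name `regularize` and the sample values are hypothetical, and indices start at $n=1$.

```python
def regularize(f_values: list[float]) -> list[float]:
    """Given [f(1), ..., f(N)], return [f~(1), ..., f~(N)] where
    f~(n) = min(f(n), min over 1 <= k < n of f~(k) * f~(n - k))."""
    tilde: list[float] = []
    for n, fn in enumerate(f_values, start=1):
        best = fn
        for k in range(1, n):
            # tilde[k - 1] is f~(k); tilde[n - k - 1] is f~(n - k).
            best = min(best, tilde[k - 1] * tilde[n - k - 1])
        tilde.append(best)
    return tilde


print(regularize([2, 5, 7, 20]))
```

Here $f(2)=5$ is lowered to $\tilde{f}(1)^2=4$ and $f(4)=20$ to $\tilde{f}(1)\tilde{f}(3)=14$. No word is lost in passing from f to $\tilde{f}$, because a complexity function always satisfies $p_w(n+n')\le p_w(n)p_w(n')$.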

Theorem 6

There is an algorithm which gives, starting from f and ${\varepsilon }$, a quantity h such that $(1-{\varepsilon })h\le E_W(f) \le h$. The quantity h depends explicitly on ${\varepsilon }$, $E_0(f)$, N, f(1), ..., f(N), for an integer N which depends explicitly on ${\varepsilon }$, $E_0(f)$, and an integer $n_0$, larger than an explicit function of ${\varepsilon }$ and $E_0(f)$, and such that

$$\begin{aligned} \frac{\log f(n)}{n}< (1+\frac{E_0(f) {\varepsilon }}{210(4+2E_0(f))})E_0(f), \quad \text{ for }\quad n_0 \le n <2 n_0. \end{aligned}$$

We shall now give the algorithm. The function f is given, and henceforth we omit it from the notation $E_0(f)$ and $E_W(f)$. Also given is ${\varepsilon }\in (0,1)$.

Description of the algorithm

Let
$$\begin{aligned} \delta :=\frac{E_0 {\varepsilon }}{105(4+2E_0)}<\frac{{\varepsilon }}{210}. \end{aligned}$$
Let
$$\begin{aligned} K:=\lceil \delta ^{-1} \rceil +1. \end{aligned}$$
Choose a positive integer
$$\begin{aligned} n_0\ge K\vee \frac{4K^2}{420^3E_0} \end{aligned}$$
such that
$$\begin{aligned} \frac{\log f(n)}{n} < (1+\frac{\delta }{2})E_0, \forall n \ge n_0; \end{aligned}$$
in view of conditions $(\mathcal C^*)$, this last condition is equivalent to $\frac{\log f(n)}{n}< (1+\frac{\delta }{2})E_0$ for $n_0 \le n <2 n_0$.
Choose intervals so large that all the lengths of words we manipulate stay in one of them. Namely, for each $t \ge 0$, let
$$\begin{aligned} n_{t+1}:= \exp (K((1+\delta )^2E_0n_t+E_0)). \end{aligned}$$
We take
$$\begin{aligned} N:=n_K. \end{aligned}$$
Choose a set $Y \subset A^N$: for each possible Y, we define $L_n(Y)=\bigcup _{{\gamma }\in Y}L_n({\gamma })$ and $q_n(Y):=|L_n(Y)|$, for $1 \le n \le N$. We look at those Y for which $q_n(Y) \le f(n)$ for all $n \le N$, and choose one among them such that
$$\begin{aligned} \min _{1 \le n \le N}\frac{\log q_n(Y)}{n} \end{aligned}$$
is maximum.
By Lemma 7 below, on one of the large intervals we have defined, namely $[n_r,n_{r+1}]$, $\frac{\log q_n(Y)}{n}$ will be almost constant. Let
$$\begin{aligned} h:=\frac{\log q_{n_{r}}(Y)}{n_{r}}. \end{aligned}$$
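The selection of Y can be sketched by exhaustive search, but only for toy sizes. The code below is an illustration under strong assumptions: it takes a tiny N chosen by hand (the actual $N=n_K$ is obtained by iterating an exponential K times and is far beyond any exhaustive search), it assumes $q\le 10$ so that letters are single digits, and it returns the maximal value of $\min _{1\le n\le N}\frac{\log q_n(Y)}{n}$ rather than the final h. All function names are hypothetical.

```python
import math
from itertools import combinations, product


def factors(words, n: int) -> set[str]:
    """L_n(Y): all factors of length n of the words in Y."""
    return {w[i:i + n] for w in words for i in range(len(w) - n + 1)}


def best_Y(q: int, N: int, f):
    """Brute-force search over non-empty Y in A^N with q_n(Y) <= f(n) for n <= N,
    maximizing min over n of log(q_n(Y)) / n. Feasible only for very small q, N."""
    alphabet = "".join(str(a) for a in range(q))  # assumes q <= 10
    all_words = ["".join(p) for p in product(alphabet, repeat=N)]
    best, best_score = None, -1.0
    for size in range(1, len(all_words) + 1):
        for Y in combinations(all_words, size):
            qs = [len(factors(Y, n)) for n in range(1, N + 1)]
            if all(qn <= f(n) for n, qn in enumerate(qs, start=1)):
                score = min(math.log(qn) / n for n, qn in enumerate(qs, start=1))
                if score > best_score:
                    best, best_score = Y, score
    return best, best_score


def f(n: int) -> int:
    """The example f(n) = ceil(3**(n/2)), in exact integer arithmetic."""
    return math.isqrt(3 ** n - 1) + 1


Y, score = best_Y(q=2, N=3, f=f)
print(Y, score)
```

With $q=2$, $N=3$ and $f(n)=\lceil 3^{n/2}\rceil$, the search examines 255 subsets and selects five words of length 3 avoiding one of the squares 00 or 11, with score $\frac{\log 5}{3}$.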

Here is the lemma we need; henceforth, Y is fixed and we write $q_n$ for $q_n(Y)$:

Lemma 7

There exists $r<K$, such that

$$\begin{aligned} \frac{\log q_{n_r}}{n_{r}} < (1+\delta ) \frac{\log q_{n_{r+1}}}{{n_{r+1}}}. \end{aligned}$$

Proof

Otherwise $\frac{\log q_{n_0}}{n_0}\ge (1+\delta )^K \frac{\log q_{n_K}}{{n_K}}$: as $K>\frac{1}{\delta }$, $(1+\delta )^K$ would be close to e for $\delta $ small enough, and is larger than $\frac{9}{4}$ as $\delta <\frac{1}{2}$; thus, as $\frac{\log q_{n_K}}{n_K} \ge E_W$ by the proof of Proposition 8 below, we have $\frac{\log q_{n_0}}{n_0}\ge \frac{9}{4}E_W$, but $q_{n_0}\le f(n_0)$ hence $\frac{\log q_{n_0}}{n_0} < (1+\frac{\delta }{2})E_0$, and this contradicts $E_0\le 2E_W$, which is true by Theorem 4.
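The two numerical facts used in this proof can be spelled out as follows; this is a sketch of the omitted arithmetic, assuming $E_W>0$. Since $(1+\delta )^{1/\delta }$ is decreasing in $\delta $ and equals $\frac{9}{4}$ at $\delta =\frac{1}{2}$, we get, for $0<\delta <\frac{1}{2}$ and $K>\frac{1}{\delta }$,

$$\begin{aligned} (1+\delta )^K> (1+\delta )^{1/\delta } > \left( 1+\frac{1}{2}\right) ^{2} = \frac{9}{4}. \end{aligned}$$

Combining the inequalities of the proof with $E_0\le 2E_W$ then gives

$$\begin{aligned} \frac{9}{4}E_W \le \frac{\log q_{n_0}}{n_0} < \left( 1+\frac{\delta }{2}\right) E_0 \le \left( 1+\frac{\delta }{2}\right) 2E_W = (2+\delta )E_W, \end{aligned}$$

which would force $\delta >\frac{1}{4}$, whereas $\delta <\frac{{\varepsilon }}{210}<\frac{1}{210}$.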

We prove now that indeed h is a good approximation of the word entropy.

Proposition 8

$$\begin{aligned} h \ge E_W. \end{aligned}$$

Proof

We prove that

$$\begin{aligned} \min _{1 \le n \le N}\frac{\log q_n}{n} \ge E_W. \end{aligned}$$

We know by Theorem 5 that there is $\hat{w} \in W(f)$ with $p_{\hat{w}}(n) \ge \exp (E_W n)$, for all $n \ge 1$. For such a word $\hat{w}$, let $X:=L_N(\hat{w})\subset A^N$. We have, for each n with $1\le n \le N$, $L_n(X)=L_n(\hat{w})$ and $f(n)\ge |L_n(\hat{w})|=p_{\hat{w}}(n)\ge \exp (E_W n)$. Thus X is one of the possible Y, and the result follows from the maximality of $\min _{1 \le n \le N}\frac{\log q_n}{n}$.

What remains to prove is the following proposition (which, understandably, does not use the maximality of $\min _{1 \le n \le N}\frac{\log q_n}{n}$).

Proposition 9

$$\begin{aligned} (1-{\varepsilon })h\le E_W. \end{aligned}$$

Proof

Our strategy is to build a word w such that, for all $n\ge 1$,

$$\begin{aligned} \exp ((1-{\varepsilon })hn)\le p_w(n)\le f(n), \end{aligned}$$

which gives the conclusion by definition of $E_W$. To build the word w, we shall define an integer m, and build successive subsets of $L_m(Y)$; for such a subset Z, we order it (lexicographically for example) and define w(Z) to be the Champernowne word on Z: namely, if $Z=\{{\beta }_1,{\beta }_2,...,{\beta }_t\}$, we build the infinite word

$$\begin{aligned} w(Z):={\beta }_1{\beta }_2\dots {\beta }_t{\beta }_1{\beta }_1{\beta }_1{\beta }_2{\beta }_1{\beta }_3\dots {\beta }_{t-1}{\beta }_t{\beta }_1 {\beta }_1{\beta }_1\dots {\beta }_t{\beta }_t{\beta }_t\dots \end{aligned}$$

made by concatenation of all words in Z followed by the concatenations of all pairs of words of Z followed by the concatenations of all triples of words of Z, etc.
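The construction of w(Z) can be sketched as a lazy generator. The names `champernowne` and `prefix` are hypothetical, Z is assumed to be a non-empty set of words of the same length m, and the order is the lexicographic one suggested in the text.

```python
from itertools import count, islice, product


def champernowne(Z):
    """Yield the letters of w(Z): all words of Z, then all pairs, then all triples, ..."""
    ordered = sorted(Z)
    if not ordered:
        raise ValueError("Z must be non-empty")
    for k in count(1):
        # All k-tuples of words of Z, in lexicographic order of the tuples.
        for block in product(ordered, repeat=k):
            for word in block:
                yield from word


def prefix(Z, length: int) -> str:
    """First `length` letters of w(Z)."""
    return "".join(islice(champernowne(Z), length))


Z = {"00", "01", "10"}  # words of length m = 2
p = prefix(Z, 42)       # 3 words, then 9 pairs: 3*2 + 9*4 = 42 letters
print(p)
print(len({p[i:i + 4] for i in range(len(p) - 3)}))
```

Because all words of Z have the same length m, distinct k-tuples give distinct concatenations, so w(Z) contains at least $|Z|^k$ factors of length km; the last line checks this for $k=2$ on the prefix.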

The word w(Z) will satisfy $\exp ((1-{\varepsilon })hn)\le p_{w(Z)}(n)$ for all n as soon as

$$\begin{aligned} |Z| \ge \exp ((1- {\varepsilon })h m), \end{aligned}$$

since, for every positive integer k, we will have at least $|Z|^k$ factors of length km in w(Z).

The successive (decreasing) subsets Z of $L_m(Y)$ we build will all have cardinality at least $\exp ((1- {\varepsilon })h m)$, and the words w(Z) will satisfy $p_n(w(Z))\le f(n)$ for n in an interval which will increase at each new set Z we build, and ultimately contains all the integers.

We give only the main ideas of the remaining proof. In the first stage we define two lengths of words, $\hat{n}$ and $m>\frac{\hat{n}}{2{\varepsilon }}$, which will be both in the interval $[n_r,n_{r+1}]$, and a set $Z_1$ of words of length m of the form $\gamma \theta $, for words ${\gamma }$ of length $\hat{n}$, such that the word $\gamma \theta \gamma $ is in $L_{m+\hat{n}}(Y)$. This is done by looking precisely at twin occurrences of words.

Let $\tilde{\varepsilon }= \frac{{\varepsilon }}{15}=\frac{7(4+2E_0)\delta }{E_0}>14 \delta $; then we can get such a set $Z_1$ with $|Z_1|\ge \exp ((1-\tilde{\varepsilon })h (m+\hat{n}))$.

In the second stage, we define a new set $Z_2\subset Z_1$ in which all the words have the same prefix ${\gamma }_1$ of length $6\tilde{\varepsilon }hm$, and all the words have the same suffix ${\gamma }_2$ of length $6\tilde{\varepsilon }hm$, with $|Z_2|\ge |Z_1|\exp (-12\tilde{\varepsilon }h m-2\delta h\hat{n})$, and $2\delta h\hat{n} \le (1-\tilde{\varepsilon })h\hat{n}$, thus

$$\begin{aligned} |Z_2|\ge \exp ((1-13\tilde{\varepsilon })h m). \end{aligned}$$

As a consequence of the definition of $Z_2$, all words of $Z_2$ have the same prefix of length $\hat{n}$, which is a prefix ${\gamma }_0$ of ${\gamma }_1$; as $Z_2$ is included in $Z_1$, any word of $Z_2$ is of the form $\gamma _0\theta $, and the word $\gamma _0\theta \gamma _0$ is in $L_{m+\hat{n}}(Y)$.

At this stage we can prove

Claim

$p_{w(Z_2)}(n)\le f(n)$ for all $1\le n\le \hat{n}+1$.

Let us shrink again our set of words.

Lemma 10

For a given subset Z of $Z_2$, there exists $Z'\subset Z$, $|Z'|\ge (1-\exp (-(j-1)\frac{E_0}{2}))^j|Z|$, such that the total number of factors of length $\hat{n}+j$ of all words ${\gamma }_0\theta {\gamma }_0$ such that ${\gamma }_0\theta $ is in $Z'$ is at most $f(\hat{n}+j)-j$.

We start from $Z_2$ and apply successively Lemma 10 from $j=2$ to $j=6\tilde{\varepsilon }m$, getting $6\tilde{\varepsilon }m-1$ successive sets $Z'$; at the end, we get a set $Z_3$ such that the total number of factors of length $\hat{n}+j$ of words ${\gamma }_0\theta {\gamma }_0$ for ${\gamma }_0\theta $ in $Z_3$ is at most $f(\hat{n}+j)-j$ for $j=2,\ldots , 6\tilde{\varepsilon }m$, and $\frac{|Z_3|}{|Z_2|}$ is at least

$$\begin{aligned} \prod _{2 \le j \le 6 \tilde{\varepsilon }m-\hat{n}}(1-\exp (-(j-1)\frac{E_0}{2}))^j \ge \prod _{j \ge 2}(1-\exp (-(j-1)\frac{E_0}{2}))^j, \end{aligned}$$

which implies after computations that

$$\begin{aligned} |Z_3| \ge \exp ((1-14\tilde{\varepsilon })h m). \end{aligned}$$

We can now bound the number of short factors by using the factors we have just deleted and properties of ${\gamma }_0$, ${\gamma }_1$ and ${\gamma }_2$.

Claim

$p_{w(Z_3)}(n)\le f(n)$ for all $1\le n\le 6\tilde{\varepsilon }m$.

We shrink our set again.

Let $m \ge n > 6 \tilde{\varepsilon }m$; on average a factor of length n of a word in $Z_3$ occurs in at most $\frac{m|Z_3|}{f(n)}$ elements of $Z_3$. We consider the $\frac{f(n)}{mn^2}$ factors of length n which occur the least often. In total, these factors occur in at most $\frac{m|Z_3|}{f(n)}\frac{f(n)}{mn^2}=\frac{|Z_3|}{n^2}$ elements of $Z_3$. We remove these elements from $Z_3$, for all $m \ge n > 6 \tilde{\varepsilon }m$, obtaining a set $Z_4$ with $|Z_4| \ge \exp ((1-15\tilde{\varepsilon })h m)$.

We can now control medium length factors, using again the missing factors we have just created, and ${\gamma }_1$ and ${\gamma }_2$, but not ${\gamma }_0$.

Claim

$p_{w (Z_4)}(n)\le f(n)$ for all $1\le n\le m$.

Finally we put $Z_5=Z_4$ if $|Z_4|\le \exp ((1-4\tilde{\varepsilon })hm)$, otherwise we take for $Z_5$ any subset of $Z_4$ with $\lceil \exp ((1-4\tilde{\varepsilon })hm)\rceil $ elements. In both cases we have

$$\begin{aligned} |Z_5| \ge \exp ((1- {\varepsilon })h m). \end{aligned}$$

For the long factors, we use mainly the fact that there are many missing factors of length m, but we need also some help from ${\gamma }_1$ and ${\gamma }_2$.

Claim

$p_{w(Z_5)}(n)\le f(n)$ for all n.

In view of the considerations at the beginning of the proof of Proposition 9, this last claim completes the proof of that proposition, and thus of Theorem 6.

4 Application

We define

$$\begin{aligned} C(f)=\{x = \sum \limits _{n \ge 0} \frac{w_n}{q^{n+ 1}} \in [0,1], w(x) = {w_0}{w_1}\cdots {w_n}\cdots \in W(f)\}. \end{aligned}$$

We are interested in the Hausdorff dimension of this set, see [1] for definitions; indeed, the main motivation for studying the word entropy is Theorem 4.8 of [5]:

Theorem 11

The Hausdorff dimension of C(f) is equal to $E_W(f)/\log q$.
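As a worked illustration, assuming $q=2$ and taking the function $f(n)=\lceil 3^{n/2} \rceil $ of Sect. 2, Theorem 11 combined with the bounds $\frac{1}{2}E_0(f)<E_W(f)\le \log (\frac{1+\sqrt{5}}{2})$ gives (numerical values rounded)

$$\begin{aligned} 0.396 \approx \frac{\log \sqrt{3}}{2\log 2}< \dim _H C(f) = \frac{E_W(f)}{\log 2} \le \frac{\log \frac{1+\sqrt{5}}{2}}{\log 2} \approx 0.694 < \frac{E_0(f)}{\log 2} = \frac{\log \sqrt{3}}{\log 2} \approx 0.792. \end{aligned}$$

So in this example replacing $E_W$ by $E_0$ would overestimate the dimension.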

References

1. Falconer, K.: Fractal Geometry: Mathematical Foundations and Applications. Wiley, Chichester (1990)
2. Ferenczi, S.: Complexity of sequences and dynamical systems. Discrete Math. 206(1–3), 145–154 (1999). http://dx.doi.org/10.1016/S0012-365X(98)00400-2, (Tiruchirappalli 1996)
3. Mauduit, C., Moreira, C.G.: Complexity of infinite sequences with zero entropy. Acta Arith. 142(4), 331–346 (2010). http://dx.doi.org/10.4064/aa142-4-3
4. Mauduit, C., Moreira, C.G.: Generalized Hausdorff dimensions of sets of real numbers with zero entropy expansion. Ergodic Theor. Dynam. Syst. 32(3), 1073–1089 (2012). http://dx.doi.org/10.1017/S0143385711000137
5. Mauduit, C., Moreira, C.G.: Complexity and fractal dimensions for infinite sequences with positive entropy (2017)

Author information

Authors and Affiliations

Aix Marseille Université, CNRS, Centrale Marseille, Institut de Mathématiques de Marseille, I2M - UMR 7373, 163, avenue de Luminy, 13288, Marseille Cedex 9, France
Sébastien Ferenczi & Christian Mauduit
Instituto de Matemática Pura e Aplicada, Estrada Dona Castorina 110, Rio de Janeiro, RJ, 22460-320, Brazil
Carlos Gustavo Moreira

Corresponding author

Correspondence to Sébastien Ferenczi.

Editor information

Editors and Affiliations

Université du Québec à Montréal, Montreal, Québec, Canada
Srečko Brlek
Université du Québec à Montréal, Montreal, Québec, Canada
Francesco Dolce
Université du Québec à Montréal, Montreal, Québec, Canada
Christophe Reutenauer
Université du Québec à Montréal, Montreal, Québec, Canada
Élise Vandomme

Cite this paper

Ferenczi, S., Mauduit, C., Moreira, C.G. (2017). The Word Entropy and How to Compute It. In: Brlek, S., Dolce, F., Reutenauer, C., Vandomme, É. (eds) Combinatorics on Words. WORDS 2017. Lecture Notes in Computer Science, vol 10432. Springer, Cham. https://doi.org/10.1007/978-3-319-66396-8_15

DOI: https://doi.org/10.1007/978-3-319-66396-8_15
Published: 15 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66395-1
Online ISBN: 978-3-319-66396-8
eBook Packages: Computer Science, Computer Science (R0)
