1 Introduction

In this work we introduce a notion of independence for pairs of infinite words, based on finite-state automata. We call it finite-state independence.

The concept of independence appears in many mathematical theories, formalizing the idea that two elements are independent if they have no common parts. In classical probability theory the notion of independence is defined for random variables. In the case of random variables with finite range, independence can be reformulated in terms of the Shannon entropy function: two random variables are independent if the entropy of the pair is exactly the sum of the individual entropies. An equivalent formulation says that two random variables are independent if conditioning on one does not decrease the entropy of the other. In algorithmic information theory the notion of independence can be defined for finite objects using program-size (Kolmogorov–Chaitin) complexity. Namely, finite words x and y are independent if the program-size complexity of the pair (x,y) is close to the sum of the program-size complexities of x and y. Equivalently, up to a small error term, two finite words are independent if the program-size complexity of one does not decrease when we allow the other as an oracle.
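The entropy reformulation can be checked numerically. The following sketch (plain Python, not part of the paper's formalism; all names are illustrative) computes Shannon entropy from a distribution given as a dictionary and verifies that, for two independent fair bits, the entropy of the pair equals the sum of the individual entropies.

```python
from math import log2

def entropy(dist):
    """Shannon entropy (in bits) of a probability distribution given as a dict."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Joint distribution of two independent fair bits: P(x, y) = P(x) * P(y).
joint = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
marginal_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
marginal_y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

# Independence is equivalent to H(X, Y) = H(X) + H(Y).
print(entropy(joint), entropy(marginal_x) + entropy(marginal_y))
```

For a dependent pair, for instance Y = X, the entropy of the pair drops to the entropy of a single variable.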

Algorithmic information theory also defines the notion of independence for random infinite words, as follows. Recall that, according to Martin-Löf’s definition, an infinite word is random if it does not belong to any effectively null set; an equivalent characterization establishes that an infinite word is random if its prefixes have nearly maximal program-size complexity, which means that they are incompressible with Turing machines. Two random infinite words x 1 x 2… and y 1 y 2… are independent if their join x 1 y 1 x 2 y 2… is random [1, 23], see also [17, Theorem 3.4.6] and [14] for independence on stronger notions of randomness. An equivalent definition establishes that two random infinite words are independent if the program-size complexity of the initial segment of one, conditioned on the other one, is nearly maximal. This means that one word remains incompressible even when using the other one as an oracle. See [10, 16, 22] for a thorough presentation of this material. While the notion of independence for random infinite words is well understood, algorithmic information theory has not provided a fully satisfactory definition of independence for arbitrary infinite words, see the discussion in [7].

Here we scale down the notion of independence given by algorithmic information theory by considering incompressibility by finite-state automata instead of incompressibility by Turing machines. Our definition builds on the theory of finite-state compression ratio introduced by Dai, Lathrop, Lutz and Mayordomo [9]. The finite-state compression ratio of an infinite word indicates how much it can be compressed by one-to-one finite-state transducers, which are finite-state automata augmented with an output transition function such that the automaton's input–output behavior is one-to-one (Huffman [12] called them lossless finite-state compressors). The infinite words that cannot be compressed are exactly the Borel normal words (this result was first known from combining [20] and [9], see [4] for a direct proof). We say that two infinite words are finite-state independent if one does not help to compress the other using finite-state transducers. In Theorem 5.1 we show that the set of finite-state independent pairs of infinite words has Lebesgue measure 1, giving an elementary proof.

As expected, the join of two finite-state independent normal words is normal (Theorem 4.1). However, independence of two normal words is not guaranteed if we just require that their join is normal. To show this we construct a normal word x that is equal to the word even(x) consisting of the symbols of x at even positions (Theorem 4.4). Thus, if odd(x) consists of the symbols of x at odd positions, both odd(x) and even(x) are normal, and their join is normal. But odd(x) and even(x) are not independent: odd(x) equals odd(even(x)). This phenomenon is not isolated: Alexander Shen (personal communication, August 2016) proved that a word chosen at random from the set of words x such that x = even(x) is normal with probability 1.

The notion of finite-state independence we present here is based just on deterministic finite-state transducers. It remains to investigate if non-deterministic finite-state transducers operating with an oracle can achieve different compression ratios. In the case of finite-state transducers with no oracle, it is already known that the deterministic and the non-deterministic models compress exactly the same words, namely, the non-normal words [3]. Some other models also compress exactly the same words, such as the finite-state transducers with a single counter [3] and the two-way transducers [8]. It is still unknown if there is a deterministic push-down transducer that can compress some normal words.

There also remains the question of how to characterize finite-state independence other than by the conditional compression ratio in finite-state automata. One would like a characterization in terms of a complexity function based on finite automata, such as those considered in [21] and [13].

2 Primary Definitions

Let A be a finite set of symbols, the alphabet. We write A ω for the set of all infinite words over A, and A k stands for the set of all words of length k. The length of a finite word w is denoted by |w|. The positions in finite and infinite words are numbered starting from 1. To denote the symbol at position i of a word w we write w[i], and to denote the subword of w from position i to j we write w[i..j]. We use the customary notation for the asymptotic growth of functions, saying that f(n) is in O(g(n)) if ∃k > 0 ∃n 0 ∀n > n 0, |f(n)| ≤ k|g(n)|.

2.1 Normality

A presentation of the definitions and basic results on normal sequences can be read from [2]. Here we start by introducing the number of occurrences and the number of aligned occurrences of a word u in a word w.

Definition 2.1

For two words w and u, the number of occurrences of u in w, denoted by |w| u , and the number of aligned occurrences of u in w, denoted by | |w| | u , are defined as

$$\begin{array}{@{}rcl@{}} |w|_{u} & =&|\{ i : w[i..i+|u|-1] = u \}|, \\ |\!|w|\!|_{u} & =&|\{ i : w[i..i+|u|-1] = u \text{ and } i \equiv 1 \bmod |u|\}|. \end{array} $$

For example, if w = aaaaa and u = aa, then |w| u = 4 and | |w| | u = 2. Aligned occurrences are obtained by cutting w into |u|-sized pieces starting from the left. Notice that the definition of aligned occurrences has the condition i ≡ 1 mod |u| (and not i ≡ 0 mod |u|), because the positions are numbered starting from 1. Of course, when a word u is just a symbol, |w| u and | |w| | u coincide. Aligned occurrences can be seen as symbol occurrences using a power alphabet: if w is a word whose length is a multiple of r, then w can be considered as a word π(w) over A r by grouping its symbols into blocks of length r. The aligned occurrences of a word u of length r in w then correspond to the occurrences of the symbol π(u) in the word π(w), and | |w| | u = |π(w)| π(u).
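The two counting functions can be sketched directly in code (the helper names are illustrative); the example reproduces the counts for w = aaaaa and u = aa.

```python
def occurrences(w, u):
    """Number of (possibly overlapping) occurrences of u in w: |w|_u."""
    return sum(1 for i in range(len(w) - len(u) + 1) if w[i:i + len(u)] == u)

def aligned_occurrences(w, u):
    """Number of aligned occurrences ||w||_u: only positions i with
    i = 1 (mod |u|) count, i.e. w is cut into |u|-sized pieces from the left."""
    return sum(1 for i in range(0, len(w) - len(u) + 1, len(u)) if w[i:i + len(u)] == u)

print(occurrences("aaaaa", "aa"), aligned_occurrences("aaaaa", "aa"))  # 4 and 2
```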

We recall the definition of Borel normality [5] for infinite words (see the books [6, 15] for a complete presentation). An infinite word x is simply normal to word length ℓ if all the blocks of length ℓ have asymptotically the same frequency of aligned occurrences, i.e., if for every u ∈ A ℓ,

$$\begin{array}{@{}rcl@{}} \lim\limits_{n \to \infty} \frac{|\!|x[1..n]|\!|_{u}}{n/\ell} = |A|^{-\ell}. \end{array} $$

An infinite word x is normal if it is simply normal to every length. Normality is defined here in terms of aligned occurrences but it could also be defined in terms of all occurrences. The equivalence between the two definitions requires a proof (see Theorem 4.5 in [6]).
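On a finite prefix the defining limit becomes an empirical aligned frequency, which is easy to compute. The sketch below (illustrative names) evaluates it on the word 00011011, in which every block of length 2 occurs exactly once among the aligned blocks (a 2-perfect word in the sense of Section 4.1), so every aligned frequency is exactly 2^{-2}.

```python
def aligned_freq(prefix, u):
    """Empirical aligned frequency of u in a finite prefix:
    ||prefix[1..n]||_u divided by the number n/|u| of aligned blocks."""
    l = len(u)
    blocks = [prefix[i:i + l] for i in range(0, len(prefix) - l + 1, l)]
    return blocks.count(u) / len(blocks)

w = "00011011"
freqs = {u: aligned_freq(w, u) for u in ("00", "01", "10", "11")}
print(freqs)  # every block of length 2 has aligned frequency 1/4
```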

2.2 Automata

We consider k-tape automata, also known as k-tape transducers when k is greater than 1 [18, 19]. We call them k-automata and we consider them for k equal to 1, 2, or 3 (Fig. 1). A k-automaton is a tuple \(\mathcal {T}= \langle Q,A,\delta ,I \rangle \), where Q is the finite state set, A is the alphabet, δ is the transition relation, I is the set of initial states. The transition relation is a subset of Q × (A ∪{ε})k × Q. A transition is thus a tuple 〈p,α 1,…,α k ,q〉 where p is its starting state, 〈α 1,…,α k 〉 is its label and q is its ending state. Note that each α i is here either a symbol of the alphabet or the empty word. A transition is written as \(p \xrightarrow {\alpha _{1},\ldots ,\alpha _{k}} q\). As usual, two transitions are called consecutive if the ending state of the first is the starting state of the second.

Fig. 1
figure 1

A 3-automaton and its tapes

An infinite run is an infinite sequence of consecutive transitions

$$\begin{array}{@{}rcl@{}} q_{0} \xrightarrow{\alpha_{1,1},\ldots,\alpha_{k,1}} q_{1} \xrightarrow{\alpha_{1,2},\ldots,\alpha_{k,2}} q_{2} \xrightarrow{\alpha_{1,3},\ldots,\alpha_{k,3}} q_{3} \xrightarrow{}\cdots \end{array} $$

The label of the run is the component-wise concatenation of the labels of the transitions given by the tuple 〈x 1,…,x k 〉 where each x j for 1 ≤ jk is equal to α j,1 α j,2 α j,3⋯. Note that some label x j might be finite although the run is infinite since some transitions may have empty labels. The run is accepting if its first state q 0 is initial and each word x j is infinite. Such an accepting run is written shortly as \(q_{0} \xrightarrow {x_{1},\ldots ,x_{k}} \infty \). The tuple 〈x 1,…,x k 〉 is accepted if there exists at least one accepting run with label 〈x 1,…,x k 〉. The 1-automata are the usual automata with ε-transitions and the 2-automata are the usual automata with input and output also known as transducers.

In this work we consider only deterministic k-automata. We actually consider k-automata where the run is determined by a subset of the k tapes. Informally, a k-automaton is ℓ-deterministic, for 1 ≤ ℓ ≤ k, if the run is entirely determined by the contents of the first ℓ tapes. More precisely, a k-automaton is ℓ-deterministic if the following two conditions are fulfilled,

  • the set I of initial states is a singleton set;

  • for any two transitions \(p \xrightarrow {\alpha _{1},\ldots ,\alpha _{k}} q\) and \(p^{\prime } \xrightarrow {\alpha ^{\prime }_{1},\ldots ,\alpha ^{\prime }_{k}} q^{\prime }\) with p = p′,

    if α j = ε for some 1 ≤ j ≤ ℓ, then \(\alpha ^{\prime }_{j} = \varepsilon \);

    if \(\alpha _{1} = \alpha ^{\prime }_{1}, \ldots , \alpha _{\ell } = \alpha ^{\prime }_{\ell }\), then \(\alpha _{\ell + 1} = \alpha ^{\prime }_{\ell + 1}, \ldots ,\alpha _{k} = \alpha ^{\prime }_{k}\) and q = q′.

The conditions on the transitions leaving a state p are the following. The first one requires that, among the first ℓ components, the ones with empty label are the same for all transitions leaving p. This means that each state determines (among the first ℓ tapes) the tapes from which a symbol is read (the ones with a symbol as label) and the tapes from which no symbol is read (the ones with empty label). The second condition is the usual one stating that two transitions leaving p with the same labels in the first ℓ components must be the same.

Figures 2 and 3 show 3-automata that accept a triple 〈x,y,z〉 of infinite words over the alphabet {0,1}. In Fig. 2 the tuple 〈x,y,z〉 is accepted if z is a shuffle of x and y. In Fig. 3 the tuple 〈x,y,z〉 is accepted if z is the join of x and y (the join of two infinite words x = a 1 a 2 a 3⋯ and y = b 1 b 2 b 3⋯ is the infinite word z = a 1 b 1 a 2 b 2 a 3⋯.) The 3-automaton in Fig. 3 is 2-deterministic but the one in Fig. 2 is not because the two transitions \(q_{0} \xrightarrow {0,\varepsilon ,0} q_{0}\) and \(q_{0} \xrightarrow {\varepsilon ,0,0} q_{0}\) violate the required condition.

Fig. 2
figure 2

A non-deterministic 3-automaton for the shuffle

Fig. 3
figure 3

A 2-deterministic 3-automaton for the join
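The two determinism conditions can be checked mechanically. The sketch below (the tuple encoding of transitions and the function name are illustrative) reproduces the transitions of the join automaton of Fig. 3 and the shuffle automaton of Fig. 2 as described in the text, and confirms that the former is 2-deterministic while the latter is not.

```python
def is_l_deterministic(transitions, l, initials):
    """Check the two conditions for l-determinism on transitions given as
    (state, label_tuple, state) triples, with '' standing for epsilon."""
    if len(initials) != 1:
        return False
    by_state = {}
    for p, label, q in transitions:
        by_state.setdefault(p, []).append((label, q))
    for outs in by_state.values():
        # Condition 1: among the first l components, the epsilon positions
        # are the same for all transitions leaving the state.
        eps_patterns = {tuple(a == '' for a in label[:l]) for label, _ in outs}
        if len(eps_patterns) > 1:
            return False
        # Condition 2: equal labels on the first l components force equal
        # remaining labels and equal target state.
        seen = {}
        for label, q in outs:
            key = label[:l]
            if key in seen and seen[key] != (label[l:], q):
                return False
            seen[key] = (label[l:], q)
    return True

# Transitions of the join automaton (Fig. 3) and shuffle automaton (Fig. 2).
join = [("q0", ("0", "", "0"), "q1"), ("q0", ("1", "", "1"), "q1"),
        ("q1", ("", "0", "0"), "q0"), ("q1", ("", "1", "1"), "q0")]
shuffle = [("q0", ("0", "", "0"), "q0"), ("q0", ("1", "", "1"), "q0"),
           ("q0", ("", "0", "0"), "q0"), ("q0", ("", "1", "1"), "q0")]

print(is_l_deterministic(join, 2, ["q0"]))     # True
print(is_l_deterministic(shuffle, 2, ["q0"]))  # False
```

The shuffle automaton fails condition 1 at q0: its transitions \(q_{0} \xrightarrow {0,\varepsilon ,0} q_{0}\) and \(q_{0} \xrightarrow {\varepsilon ,0,0} q_{0}\) read from different subsets of the two input tapes.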

Each ℓ-deterministic k-automaton computes a partial function (A ω)ℓ → (A ω)k−ℓ, as follows. Let \(\mathcal {T}\) be an ℓ-deterministic k-automaton. For each tuple 〈x 1,…,x ℓ〉 of infinite words there exists at most one tuple 〈y ℓ+1,…,y k〉 of infinite words such that the k-tuple 〈x 1,…,x ℓ,y ℓ+1,…,y k〉 is accepted by \(\mathcal {T}\). The automaton \(\mathcal {T}\) thus realizes a partial function from (A ω)ℓ to (A ω)k−ℓ, and the tuple 〈y ℓ+1,…,y k〉 is denoted \(\mathcal {T}(x_{1},\ldots ,x_{\ell })\). The 1-deterministic 2-automata are also called sequential transducers in the literature. When a k-automaton is ℓ-deterministic, we write a transition as \(p \xrightarrow {\alpha _{1},\ldots ,\alpha _{\ell }|\beta _{\ell + 1},\ldots , \beta _{k}} q\) to emphasize that the first ℓ tapes are input tapes and the k − ℓ remaining ones are output tapes.

We comment here on transitions reading no symbol from the input tapes. Let \(\mathcal {A}\) be an ℓ-deterministic k-automaton. Suppose that there exists a transition from state p to state q whose label has the form 〈ε,…,ε,β ℓ+1,…,β k〉, with empty labels in the first ℓ components. Let us call such a transition an ε ℓ-transition. We claim that it is always possible to get rid of ε ℓ-transitions. Since \(\mathcal {A}\) is deterministic, such a transition is the only transition leaving state p. If there exists a cycle made of ε ℓ-transitions, this cycle is a dead end in the automaton and its states can be removed without changing the set of accepting runs of \(\mathcal {A}\); recall that all labels of an accepting run are required to be infinite. Assume now that there is no cycle of ε ℓ-transitions. Removing the transition \(p \xrightarrow {\varepsilon ,\ldots ,\varepsilon |\beta _{\ell + 1},\ldots , \beta _{k}} q\) and adding a transition \(p \xrightarrow {\alpha _{1},\ldots ,\alpha _{\ell }|\beta _{\ell + 1}\gamma _{\ell + 1},\ldots , \beta _{k}\gamma _{k}} r\) for each transition \(q \xrightarrow {\alpha _{1},\ldots ,\alpha _{\ell }|\gamma _{\ell + 1},\ldots , \gamma _{k}} r\) leaving q preserves the accepting runs and decreases the number of ε ℓ-transitions. Iterating this process until no ε ℓ-transition remains removes all of them. In the rest of the paper, we always assume that ε ℓ-transitions have been removed.
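The elimination procedure can be sketched for the simplest case, a 1-deterministic 2-automaton. The transition encoding and function names below are illustrative, and cycles of ε-input transitions are assumed to have been removed beforehand, as in the text.

```python
def remove_eps(transitions):
    """Eliminate transitions reading no input symbol by composing each
    epsilon-input transition with the transitions leaving its target state.
    Assumes determinism (an epsilon-input transition is alone at its state)
    and that epsilon-cycles have already been removed."""
    trans = {p: list(ts) for p, ts in transitions.items()}
    changed = True
    while changed:
        changed = False
        for p in trans:
            ts = trans[p]
            if len(ts) == 1 and ts[0][0] == '':
                _, v, q = ts[0]
                if any(a == '' for a, _, _ in trans[q]):
                    continue  # eliminate q's epsilon-input transition first
                trans[p] = [(a, v + v2, r) for a, v2, r in trans[q]]
                changed = True
    return trans

def run(transitions, start, word):
    """Output of the automaton on a finite input word (epsilon-input
    transitions are followed without consuming an input symbol)."""
    out, state = "", start
    for a in word:
        while transitions[state][0][0] == '':
            _, v, state = transitions[state][0]
            out += v
        for b, v, q in transitions[state]:
            if b == a:
                out, state = out + v, q
                break
    return out

# A toy 1-deterministic 2-automaton with one epsilon-input transition at p1.
T = {'p0': [('0', 'x', 'p1'), ('1', '', 'p0')],
     'p1': [('', 'y', 'p2')],
     'p2': [('0', 'z', 'p2'), ('1', 'w', 'p0')]}
T2 = remove_eps(T)
print(T2['p1'])  # the epsilon-transition is replaced by composed transitions
print(run(T, 'p0', "0010") == run(T2, 'p0', "0010"))
```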

Let \(\mathcal {T}\) be a 1-deterministic 2-automaton. To define the compression ratio of an infinite word x = a 1 a 2… by \(\mathcal {T}\), denoted by \(\rho _{\mathcal {T}}(x) \), consider the unique accepting run

$$q_{0} \xrightarrow{a_{1}|v_{1}} q_{1} \xrightarrow{a_{2}|v_{2}} q_{2} \xrightarrow{a_{3}|v_{3}} q_{3} \cdots $$

of \(\mathcal {T}\), where each a i is a symbol in A, and each v i is a finite word (possibly empty) of symbols in A. Then,

$$\begin{array}{@{}rcl@{}} \rho_{\mathcal{T}}(x) = \liminf\limits_{n \to \infty} \frac{|v_{1}v_{2}{\cdots} v_{n}|}{|a_{1}a_{2}{\cdots} a_{n}|}. \end{array} $$

For a given infinite word x it may happen that for some automaton \(\mathcal {T}\), \( \rho _{\mathcal {T}}(x)\) is greater than 1, while for some other automaton \(\mathcal {T}^{\prime }\), \(\rho _{\mathcal {T}^{\prime }}(x)\) is less than 1. We say that an infinite word x is compressible by a 1-deterministic 2-automaton \(\mathcal {T}\) if \(\rho _{\mathcal {T}}(x) < 1\). A 1-deterministic 2-automaton \(\mathcal {T}\) is called one-to-one if the function which maps x to \(\mathcal {T}(x)\) is one-to-one. The compression ratio of an infinite word x is the infimum of the compression ratios achievable by all one-to-one 1-deterministic 2-automata:

$$\begin{array}{@{}rcl@{}} \rho(x) = \inf\{\rho_{\mathcal{T}}(x) : \mathcal{T}\text{ is a one-to-one 1-deterministic 2-automaton}\}. \end{array} $$

This compression ratio ρ(x) is always less than or equal to 1, as witnessed by the one-to-one compressor \(\mathcal {C}_{0}\) which copies each symbol of the input to the output. For each infinite word x, the compression ratio \(\rho _{\mathcal {C}_{0}}(x)\) is equal to 1.

The infinite word x = 0ω, made entirely of zeros, has compression ratio ρ(x) = 0, because for each positive real number ε there exists a compressor \(\mathcal {C}\) such that \(\rho _{\mathcal {C}}(x) < \varepsilon \). However, the compression ratio 0 is not achieved by any single one-to-one 1-deterministic 2-automaton \(\mathcal {C}\), for the following reason. Every such automaton \(\mathcal {C}\) computes a function A ω → A ω. If the input word x is ultimately periodic, as is x = 0ω, and x is in the domain of \(\mathcal {C}\), then the run on x is also ultimately periodic; the compression ratio of x by \(\mathcal {C}\) is then the ratio between the output and the input lengths along the cycle eventually reached by the run, which is non-zero. On the other extreme, the words with compression ratio equal to 1 are exactly the normal words.
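To see how a single transducer reaches ratio 1/k on 0ω, here is a sketch of the standard block compressor in ordinary code (the function names are illustrative; a finite-state transducer can realize it by storing the current block in its state). Each block of k zeros is replaced by a single 0, and any other block is emitted verbatim after a marker 1, which keeps the map one-to-one.

```python
def compress_zeros(x, k):
    """One-to-one block compressor: a block of k zeros becomes '0', any
    other block of length k becomes '1' followed by the block itself.
    On a long run of zeros this achieves ratio 1/k."""
    out = []
    for i in range(0, len(x) - k + 1, k):
        block = x[i:i + k]
        out.append('0' if block == '0' * k else '1' + block)
    return ''.join(out)

def decompress_zeros(y, k):
    out, i = [], 0
    while i < len(y):
        if y[i] == '0':
            out.append('0' * k)
            i += 1
        else:
            out.append(y[i + 1:i + 1 + k])
            i += 1 + k
    return ''.join(out)

prefix = '0' * 1000
k = 10
compressed = compress_zeros(prefix, k)
print(len(compressed) / len(prefix))  # 0.1, i.e. ratio 1/k
```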

3 Finite-State Independence

To define the notion of finite-state independence we introduce the notion of conditional compression ratio. The conditional compression ratio of an infinite word x with respect to another infinite word y is the ratio of compression of x when y is used as an oracle. To define this notion, we consider 2-deterministic 3-automata such that two input tapes contain the words x and y and the output tape contains the result of the compression of x with the help of y.

A compressor is a 2-deterministic 3-automaton \(\mathcal {C}\) such that, for any fixed infinite word y, the function \(x \mapsto \mathcal {C}(x,y)\) which maps x to the output \(\mathcal {C}(x,y)\) is one-to-one. This guarantees that, if y is known, x can be recovered from \(\mathcal {C}(x,y)\). Note that we do not require that the function \((x,y) \mapsto \mathcal {C}(x,y)\) be one-to-one, which would be a much stronger requirement.

Definition 3.1

Let \(\mathcal {C}\) be a compressor. The conditional compression ratio of an infinite word x with respect to y for \(\mathcal {C}\) is determined by the unique accepting run

$$\begin{array}{@{}rcl@{}} q_{0} \xrightarrow{\alpha_{1},\beta_{1}|w_{1}} q_{1} \xrightarrow{\alpha_{2},\beta_{2}|w_{2}} q_{2} \xrightarrow{\alpha_{3},\beta_{3}|w_{3}} q_{3} \cdots \end{array} $$

such that x = α 1 α 2 α 3⋯ and y = β 1 β 2 β 3⋯, with each α i , β i ∈ A ∪ {ε}, as

$$\begin{array}{@{}rcl@{}} \rho_{\mathcal{C}}(x/y) = \liminf\limits_{n \to \infty} \frac{|w_{1}w_{2}{\cdots} w_{n}|}{|\alpha_{1} \alpha_{2}{\cdots} \alpha_{n}|}. \end{array} $$

Notice that the number of symbols read from y, the length of β 1 β 2β n , is not taken into account when defining \(\rho _{\mathcal {C}}(x/y)\).

The conditional compression ratio of an infinite word x given an infinite word y, denoted by ρ(x/y), is the infimum of the compression ratios \(\rho _{\mathcal {C}}(x/y)\) of all compressors \(\mathcal {C}\) with input x and oracle y.

Notice that the plain compression ratio ρ(x) and the conditional compression ratio ρ(x/y) always exist and they are values between 0 and 1 (witnessed by the identity function).

The following proposition gives sufficient conditions for the maximum compression, when the conditional compression ratio is equal to zero.

Proposition 3.2

If a function f is realizable by a 1-deterministic 2-automaton then, for every x, ρ(f(x)/x) = 0.

Proof

Assume that the function f is realized by a 1-deterministic 2-automaton \(\mathcal {T}\), so \(f(x) = \mathcal {T}(x)\) for every infinite word x. We fix a positive integer k and we construct a compressor \(\mathcal {C}\) such that \(\rho _{\mathcal {C}}(f(x)/x)= 1/k\). This compressor has two input tapes, the first one containing a word y and the second one containing the word x. It compresses y in a non-trivial way only when y is equal to f(x). The automaton \(\mathcal {C}\) proceeds as follows. It reads the infinite word x from the second input tape and computes f(x) by simulating the automaton \(\mathcal {T}\). As long as the next k symbols of y coincide with those of f(x), \(\mathcal {C}\) writes a single 0. When there is a mismatch, \(\mathcal {C}\) writes a 1 and then copies the remaining part of y to the output tape. Thus, if the mismatch occurs at position m = kp + r with 1 ≤ r ≤ k, the automaton \(\mathcal {C}\) writes p symbols 0 before the mismatch, then a symbol 1, and then y kp+1 y kp+2 y kp+3⋯. In particular, when y = f(x) there is never a mismatch, the output has one symbol per k input symbols, and \(\rho _{\mathcal {C}}(f(x)/x) = 1/k\). □
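The encoding in the proof can be sketched on finite prefixes, taking even(·) in the role of f (the proposition applies to it, since it is realizable by a 1-deterministic 2-automaton); all names below are illustrative, and the Thue-Morse prefix is only a sample input.

```python
def encode_with_oracle(y, x, f, k):
    """Sketch of the compressor in the proof: compress y using x as oracle.
    While the next k symbols of y agree with those of f(x), emit one '0';
    at the first mismatching block emit '1' and copy the rest of y."""
    fx = f(x)
    out, i = [], 0
    while i + k <= len(y) and y[i:i + k] == fx[i:i + k]:
        out.append('0')
        i += k
    return ''.join(out) + '1' + y[i:]

def decode_with_oracle(code, x, f, k):
    p = code.index('1')            # p blocks of k symbols matched f(x)
    return f(x)[:p * k] + code[p + 1:]

even = lambda w: w[1::2]           # even(w): symbols at even positions
x = "0110100110010110"            # a prefix of the Thue-Morse word
y = even(x)                        # y = f(x): everything matches
code = encode_with_oracle(y, x, even, 4)
print(code)                        # '001': two matched blocks, then the end
```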

The following proposition provides a sufficient condition for compressing x given y: some correlation between symbols in x and y at the same positions ensures ρ(x/y) < 1.

Proposition 3.3

Let x and y be two infinite words. If there are three symbols c, c′ and d and an increasing sequence (m n ) n≥ 0 of integers such that

$$\begin{array}{@{}rcl@{}} \lim\limits_{n\to\infty}{\frac{|\{ 1 \!\le\! i \!\le\! m_{n} \!:\! x[i] \,=\, c, y[i] \,=\, d\}|}{m_{n}}} \!\neq\! \lim\limits_{n\to\infty}{\frac{|\{ 1 \!\le\! i \!\le\! m_{n} \!:\! x[i] \,=\, c^{\prime}, y[i] \,=\, d\}|}{m_{n}}}, \end{array} $$

then ρ(x/y) < 1.

We assume here that both limits exist; the same conclusion holds if one or both of them fail to exist, by passing to a subsequence of (m n ) n≥ 0 along which both limits exist and differ.

Proof

By replacing the sequence (m n ) n≥ 0 by some subsequence of it, we may assume without loss of generality that \(\lim _{n\to \infty }|\{ 1 \le i \le m_{n} : x[i] = a, y[i] = b\}|/m_{n}\) exists for arbitrary symbols a and b. This limit is denoted by π(a,b). The existence of all the limits π(a,b) implies that for each symbol b,

$$\lim\limits_{n\to\infty}{|\{ 1 \le i \le m_{n} : y[i] = b \}|/m_{n}} $$

also exists and

$$\lim\limits_{n\to\infty}{\frac{|\{ 1 \le i \le m_{n} : y[i] = b \}|}{m_{n}}} = \sum\limits_{a \in A} \pi(a,b). $$

This limit is denoted by π y (b). Define

$$\nu(a/b)=\left\{ \begin{array}{ll} \pi(a,b)/\pi_{y}(b)&\text{ if } \pi_{y}(b) \neq 0 \\ 1/|A|& \text{ otherwise.} \end{array} \right. $$

Let k be a block length to be fixed later. For two words u = a 1a k and v = b 1b k of length k, define

$$\nu(u/v) = \prod\limits_{i = 1}^{k}\nu(a_{i}/b_{i}). $$

Let us recall that a set P of words is prefix-free if no distinct words u,v ∈ P are such that u is a prefix of v. Note that if P is prefix-free, every word w ∈ A ∗ has at most one factorization w = u 1⋯u n where each u i belongs to P. We will use the following well-known fact due to Huffman [12].

Let p 1,…,p n be real numbers such that 0 ≤ p i ≤ 1 for each 1 ≤ i ≤ n and \({\sum }_{i = 1}^{n}{p_{i}} = 1\). Then there exist n distinct words u 1,…,u n over A such that the set {u 1,…,u n } is prefix-free and \(|u_{i}| \le \lceil -\log _{|A|} p_{i} \rceil \) for each 1 ≤ i ≤ n.

It is purely routine to check that \({\sum }_{u \in A^{k}} \nu (u/v) = 1\) for each word v of length k. Hence there exists, for each word v of length k, a prefix-free set {w(u,v) : u ∈ A k} such that \(|w(u,v)| \le \lceil -\log _{|A|} \nu (u/v)\rceil \) holds for each u. These words w(u,v) are used by the transducer to encode the infinite word x with the help of y. The transducer reads x and y by blocks of length k. For each pair of blocks u and v, it outputs w(u,v). This output can be decoded with the help of y because, for each fixed block v, the possible blocks u are in one-to-one correspondence with the words w(u,v).
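The prefix-free codes promised by Huffman's fact can be produced by the canonical-code construction from the lengths ⌈−log p i⌉, relying on Kraft's inequality. The sketch below does this in base 2 (the function name is illustrative).

```python
from math import ceil, log2

def kraft_code(probs):
    """Prefix-free binary words u_i with |u_i| <= ceil(-log2 p_i),
    built as a canonical code: assign codewords in order of length,
    extending a binary counter to each new length."""
    lengths = sorted((ceil(-log2(p)), i) for i, p in enumerate(probs))
    codes, value, prev_len = [None] * len(probs), 0, 0
    for length, i in lengths:
        value <<= (length - prev_len)   # extend the counter to the new length
        codes[i] = format(value, '0{}b'.format(length))
        value += 1
        prev_len = length
    return codes

codes = kraft_code([0.5, 0.25, 0.125, 0.125])
print(codes)  # ['0', '10', '110', '111']
```

Since the lengths satisfy ∑ 2^{−⌈−log2 p i⌉} ≤ ∑ p i = 1, the counter never overflows and the resulting set is prefix-free.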

We now evaluate the length of the output of the transducer. Let p(n,u,v) be the number of aligned occurrences of the pair (u,v) at the same positions in the length-n prefixes of x and y. Then,

$$\begin{array}{@{}rcl@{}} p(n,u,v) \,=\, |\{ 1 \le i \le n-k : i \equiv 1 \bmod k,\ x[i..i+k-1] \,=\, u,\ y[i..i+k-1] \,=\, v\}| \end{array} $$
$$\begin{array}{@{}rcl@{}} \rho(x/y) & \le& \lim\limits_{n\to\infty} \frac{1}{n} \sum\limits_{u,v \in A^{k}} p(n,u,v)|w(u,v)| \\ & \le& \lim\limits_{n\to\infty} \frac{1}{n} \sum\limits_{u,v \in A^{k}} p(n,u,v) \lceil -\log_{|A|} \nu(u/v)\rceil \\ & \le& \frac{1}{k} + \lim\limits_{n\to\infty} \frac{1}{n}\sum\limits_{u,v \in A^{k}} p(n,u,v)(-\log_{|A|} \nu(u/v)) \\ & = &\frac{1}{k} + \lim\limits_{n\to\infty} \frac{1}{n}\sum\limits_{\substack{u = a_{1} {\cdots} a_{k} \\ v = b_{1} {\cdots} b_{k}}} p(n,u,v)\left(-\log_{|A|} \prod\limits_{i = 1}^{k}\nu(a_{i}/b_{i})\right) \\ & = &\frac{1}{k} + \lim\limits_{n\to\infty} \frac{1}{n}\sum\limits_{a,b\in A} |\{ 1 \le i \le n : x[i] = a, y[i]=b\}| \log_{|A|} \frac{1}{\nu(a/b)} \\ & = &\frac{1}{k} + \sum\limits_{b\in A,\, \pi_{y}(b)\neq 0} \pi_{y}(b) \sum\limits_{a\in A} \frac{\pi(a,b)}{\pi_{y}(b)} \log_{|A|} \frac{\pi_{y}(b)}{\pi(a,b)} \end{array} $$

The second-to-last equality results from counting correlated symbols instead of correlated blocks of length k. The last equality results from using \(\nu (a/b) = \frac {\pi (a,b)}{\pi _{y}(b)}\) and applying the definitions of π(⋅,⋅) and π y (⋅).

We have that for each symbol b, the sum

$$\sum\limits_{a\in A} \frac{\pi(a,b)}{\pi_{y}(b)}\log_{|A|} \frac{\pi_{y}(b)}{\pi(a,b)} \leq 1. $$

However, for b = d the above sum is strictly less than 1: the hypothesis π(c,d) ≠ π(c′,d) forces π y (d) ≠ 0 and makes the distribution ν(⋅/d) non-uniform, and the entropy in base |A| of a non-uniform distribution on A is strictly less than 1. It follows that

$$\begin{array}{@{}rcl@{}} \sum\limits_{b\in A,\, \pi_{y}(b)\neq 0} \pi_{y}(b) \sum\limits_{a\in A} \frac{\pi(a,b)}{\pi_{y}(b)} \log_{|A|} \frac{\pi_{y}(b)}{\pi(a,b)} < 1. \end{array} $$

If k is chosen large enough that 1/k is smaller than the gap to 1 in the previous inequality, the conditional compression ratio satisfies ρ(x/y) < 1. □

The definition of finite-state independence of two infinite words is based on the conditional compression ratio.

Definition 3.4

Two infinite words x and y are finite-state independent if ρ(x/y) = ρ(x), ρ(y/x) = ρ(y) and the compression ratios of x and y are non-zero.

Note that we require that the compression ratios of x and y are non-zero. This means that a word x such that ρ(x) = 0 cannot be part of an independent pair. Without this requirement, every two words x and y such that ρ(x) = ρ(y) = 0 would be independent. In particular, every word x with ρ(x) = 0 would be independent of itself.

4 Join Independence is not Enough

Recall that the join of two infinite words x = a 1 a 2 a 3⋯ and y = b 1 b 2 b 3⋯ is the infinite word a 1 b 1 a 2 b 2⋯ obtained by interleaving their symbols. It is denoted x ∨ y. A possible definition of independence for normal words x and y would be to require their join x ∨ y to be normal. We call this notion join independence. This definition of independence would be natural since it mimics the definition of independence in the theory of algorithmic randomness, where two random infinite words are independent if their join is random [23], see also [17, Theorem 3.4.6]. One can ask whether a similar result holds for our definition: is it true that two normal words are finite-state independent if and only if their join is normal? It turns out that finite-state independence implies join independence, but the converse fails.

We use the following notation. For a given infinite word x = a 1 a 2 a 3…, we define infinite words even(x) and odd(x) as a 2 a 4 a 6⋯ and a 1 a 3 a 5⋯ respectively. Similarly, for a finite word w, even(w) and odd(w) are the words appearing on the even and the odd positions in w. For example, x = even(x) means that a n = a 2n for all n.
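These operations are easy to state in code (0-based indexing shifts the paper's 1-based positions; the names are illustrative). Note that every word satisfies w = odd(w) ∨ even(w).

```python
def odd(w):   # symbols at odd positions 1, 3, 5, ... (1-based numbering)
    return w[0::2]

def even(w):  # symbols at even positions 2, 4, 6, ...
    return w[1::2]

def join(u, v):
    """The join u ∨ v interleaves u and v: u[1] v[1] u[2] v[2] ..."""
    return ''.join(a + b for a, b in zip(u, v))

w = "abcdef"
print(odd(w), even(w))        # ace bdf
print(join(odd(w), even(w)))  # abcdef
```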

Theorem 4.1

Let x and y be normal. If x and y are finite-state independent then x ∨ y is normal.

Proof

Suppose x ∨ y is not normal. Then, there is a length k such that some word of length k does not have aligned frequency \(2^{-k}\) in x ∨ y. We can assume without loss of generality that this length is even (increasing it if necessary), so we assume that some word of length 2k does not have aligned frequency \(2^{-2k}\). Split x ∨ y into blocks of size 2k:

$$x \vee y = (u_{1}\vee v_{1})(u_{2}\vee v_{2})\cdots $$

where x = u 1 u 2… and y = v 1 v 2… are the respective decompositions into k-blocks. Consider a sequence of positions (m n ) n≥ 1 in x ∨ y along which each block of length 2k has an aligned frequency, and let u ∨ v be a block of length 2k whose frequency f along (m n ) n≥ 1 is not \(2^{-2k}\). Let u′ ∨ v be another block of length 2k whose frequency along (m n ) n≥ 1 differs from f. (Notice that if all blocks u′ ∨ v have the same frequency f along (m n ) n≥ 1 then, necessarily, there is a block v′ such that u ∨ v′ has frequency along (m n ) n≥ 1 different from f, and we exchange the roles of x and y.) Then, at the positions (m n /2) n≥ 1

$$\begin{array}{@{}rcl@{}} && \lim_{n\to\infty}{\frac{|\{ 1 \le i \le m_{n}/2 : x[i..i+k-1] = u, y[i..i+k-1] = v\}|}{m_{n}}} \\ && \qquad \neq \lim_{n\to\infty}{\frac{|\{ 1 \le i \le m_{n}/2 : x[i..i+k-1] = u^{\prime}, y[i..i+k-1] = v\}|}{m_{n}}}. \end{array} $$

By the argument in Proposition 3.3 we conclude that ρ(u 1 u 2…/v 1 v 2…) < 1. By considering a compressor that reads blocks of length k we obtain

$$\rho(x/y)=\rho(u_{1}u_{2}\ldots/v_{1}v_{2}\ldots)<1. $$

□

Actually, the proof of Theorem 4.1 gives a stronger result.

Proposition 4.2

If y is normal and ρ(x/y) = 1, then x ∨ y is normal.

We now show that there are two normal words that are join independent but not finite-state independent. The proof is based on the existence of a normal word x such that x = even(x), which we prove in Theorem 4.4 below.

Theorem 4.3

There exist two normal words x and y such that x ∨ y is normal but x and y are not finite-state independent.

Proof

By Theorem 4.4, proved below, there is a normal word x such that x = even(x). Let y = odd(x) and z = even(x). Since x is normal and x = y ∨ z, the words y and z are join independent. Since y = odd(x) and x = even(x) = z, we have the equality y = odd(z). This implies, by Proposition 3.2, that ρ(y/z) = 0; hence, y and z are not finite-state independent. □

4.1 Construction of a Normal Word x such that x = even(x)

Theorem 4.4

There is a normal word x such that x = even(x).

Here we prove this existence by giving an explicit construction of a normal word x = a 1 a 2 a 3⋯ over the alphabet {0,1} such that a 2n = a n for every n. The construction can be easily extended to an alphabet of size k to obtain a word a 1 a 2 a 3⋯ such that a kn = a n for each integer n ≥ 1. Beware that the construction, as it is presented below, cannot provide a word a 1 a 2 a 3⋯ over a binary alphabet such that a 3n = a n (a more sophisticated construction is needed, but it can be done; the probabilistic argument also works).

A finite word w is called ℓ-perfect, for an integer ℓ ≥ 1, if |w| is a multiple of ℓ and all words of length ℓ have the same number of aligned occurrences, namely \(|w|/(\ell 2^{\ell })\), in w.

Lemma 4.5

Let w be an ℓ-perfect word such that |w| is a multiple of \(\ell 2^{2\ell }\). Then, there exists a 2ℓ-perfect word z of length 2|w| such that even(z) = w.

Proof

Since |w| is a multiple of \(\ell 2^{2\ell }\) and w is ℓ-perfect, for each word u of length ℓ, | |w| | u is a multiple of \(2^{\ell }\). Consider a factorization w = w 1 w 2⋯w r such that for each i, |w i | = ℓ. Thus, r = |w|/ℓ. Since w is ℓ-perfect, for any word u of length ℓ, the set {i : w i = u} has cardinality \(r/2^{\ell }\). Define z of length 2|w| as z = z 1 z 2⋯z r such that for each i, |z i | = 2ℓ, even(z i ) = w i , and for all words u and u′ of length ℓ, the set {i : z i = u′ ∨ u} has cardinality \(r/2^{2\ell }\). This latter condition is achievable because, for each word u of length ℓ, the set {i : even(z i ) = u} has cardinality \(r/2^{\ell }\), which is a multiple of \(2^{\ell }\), the number of possible words u′. □
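The proof is effectively an algorithm. Here is a sketch over the binary alphabet (the names are illustrative), together with a check on the 1-perfect word w = 0101, whose extension is 2-perfect and satisfies even(z) = w.

```python
from itertools import product

def extend_perfect(w, l):
    """Sketch of Lemma 4.5 over {0,1}: from an l-perfect word w whose length
    is a multiple of l * 2**(2 * l), build a 2l-perfect word z of length
    2 * |w| with even(z) = w."""
    r = len(w) // l
    blocks = [w[i * l:(i + 1) * l] for i in range(r)]
    words = [''.join(t) for t in product('01', repeat=l)]
    counters = {u: 0 for u in words}
    z_blocks = []
    for u in blocks:
        # Distribute the odd halves u' uniformly: since {i : w_i = u} has
        # cardinality r / 2**l, a multiple of 2**l, each pair (u', u) is
        # used exactly r / 2**(2 * l) times.
        uprime = words[counters[u] % (2 ** l)]
        counters[u] += 1
        # z_i carries u' on its odd positions and u on its even positions.
        z_blocks.append(''.join(a + b for a, b in zip(uprime, u)))
    return ''.join(z_blocks)

def aligned_count(w, u):
    l = len(u)
    return sum(1 for i in range(0, len(w) - l + 1, l) if w[i:i + l] == u)

w = "0101"                  # 1-perfect, |w| = 4 = 1 * 2**2
z = extend_perfect(w, 1)
print(z, z[1::2] == w)      # even(z) recovers w
```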

Corollary 4.6

Let w be an ℓ-perfect word for some even integer ℓ. Then there exists an ℓ-perfect word z of length 2|w| such that even(z) = w.

Proof

Since w is ℓ-perfect, it is also ℓ/2-perfect. Furthermore, if u and v are words of length ℓ/2 and ℓ respectively, then | |w| | u = \(2^{\ell /2 + 1}\) | |w| | v . Thus, the hypothesis of Lemma 4.5 is fulfilled with ℓ/2 in place of ℓ. □

Corollary 4.7

There exist a sequence (w n ) n≥ 1 of words and a sequence of positive integers (ℓ n ) n≥ 1 such that |w n | = \(2^{n}\), even(w n+ 1) = w n , w n is ℓ n -perfect, and (ℓ n ) n≥ 1 is non-decreasing and unbounded. Furthermore, it can be assumed that w 1 = 01.

Proof

We start with w 1 = 01, ℓ 1 = 1, w 2 = 1001 and ℓ 2 = 1. For each n ≥ 2, if \(\ell _{n}2^{2\ell _{n}}\) divides |w n |, then ℓ n+ 1 = 2ℓ n and w n+ 1 is obtained by Lemma 4.5. Otherwise, ℓ n+ 1 = ℓ n and w n+ 1 is obtained by Corollary 4.6. Note that the former case happens infinitely often, so (ℓ n ) n≥ 1 is unbounded. Also note that each ℓ n is a power of 2. □

Lemma 4.8 (Theorem 148 in [11])

Let A be an alphabet of b symbols. Let p(k,r,j)be the number of words of length k with exactly j occurrences of a given word of length r , at any position:

$$\begin{array}{@{}rcl@{}} p(k,r,j)=\left|\bigcup\limits_{u\in A^{r}} \{ w\in A^{k} : |w|_{u}=j\} \right|. \end{array} $$

For every integer r greater than or equal to 1, for every integer k large enough and for every real number ε such that 6/⌊k/r⌋≤ ε ≤ 1/b r ,

$$\begin{array}{@{}rcl@{}} \sum\limits_{i:\ \left| i - k/b^{r} \right| \ge \varepsilon k} \!\!\!\!\!\!\! p(k,r,i) < 2\ b^{k + 2r-2}r \ e^{- b^{r} \varepsilon^2 k/6r}. \end{array} $$

Lemma 4.9 (Theorem 4.6 in [6])

Let A be an alphabet. An infinite word x is normal if and only if there is a positive number C such that, for every word u,

$$\begin{array}{@{}rcl@{}} \limsup\limits_{N \to\infty} \frac{|x[1..N]|_{u}}{N} < \frac{C}{|A|^{|u|}}. \end{array} $$

Finally, the next lemma is similar to Lemma 4.9 but with aligned occurrences.

Lemma 4.10

Let A be an alphabet. An infinite word x is normal if and only if there is a positive number C such that, for infinitely many lengths ℓ and for every word w of length ℓ,

$$\begin{array}{@{}rcl@{}} \limsup\limits_{N \to \infty} \frac{|\!|x[1..\ell N]|\!|_{w}}{N} < \frac{C}{|A|^{\ell}}. \end{array} $$

Proof

The implication from left to right is immediate from the definition of normality. We prove the other. Fix an alphabet A with b symbols, and fix x and C. Assume that for infinitely many lengths ℓ and for every word w of length ℓ the stated condition holds. Equivalently,

$$ \limsup\limits_{N \to \infty } \frac{|\!|x [1..N] |\!|_{w}}{N} < \frac{C}{\ell |A|^{\ell}}. $$
(*)

We will prove that for every word u, of any length,

$$\begin{array}{@{}rcl@{}} \limsup\limits_{N \to \infty } \frac{|x [1..N] |_{u}}{N} < \frac{C}{|A|^{|u|}} \end{array} $$

and conclude that x is normal by Lemma 4.9. The task is to switch from aligned occurrences to non-aligned occurrences. For this we consider long consecutive blocks and use the fact that most of them contain roughly the expected number of non-aligned occurrences of short words.

Fix a length r and a word u of length r. Let ℓ be any length greater than r for which (*) holds. Fix ε. We classify the words of length ℓ as good or bad for r and ε. The bad ones deviate from the expected number of occurrences of some word of length r by εℓ or more. The good ones do not. Lemma 4.8 bounds the number of bad words.

We use three facts: each bad word has at most ℓ − r + 1 occurrences of u; each good word has at most ℓ/b^r + εℓ occurrences of u; and between any two consecutive blocks of length ℓ there are at most r − 1 occurrences of u straddling the boundary.

$$\begin{array}{@{}rcl@{}} \frac{|x[1..N]|_{u}}{N} &<&\frac{1}{N} \sum\limits_{w\in A^{\ell}} |\!|x[1..N]|\!|_{w} (|w|_{u} + (r-1)) \\ &=&\frac{1}{N} \sum\limits_{\text{bad } w} |\!|x[1..N]|\!|_{w} \left( |w|_{u} + (r-1)\right) \\ &&+ \frac{1}{N} \sum\limits_{\text{good } w} |\!|x[1..N]|\!|_{w} \left( |w|_{u} + (r-1)\right) \\ &<&\frac{1}{N} (\ell -r + 1 + r-1) \sum\limits_{\text{bad } w} |\!|x[1..N]|\!|_{w} \\ &&+ \frac{1}{N} \left( \frac{\ell}{b^{r}} + \varepsilon\ell +r-1\right) \sum\limits_{\text{good } w} |\!|x[1..N]|\!|_{w} \\ &<&\frac{1}{N} \ \ell \ \left( 2 b^{\ell} \ b^{2r-2} r \ e^{-b^{r} \varepsilon^{2} \ell/(6r)} \right) \frac{C N}{\ell b^{\ell}} + \frac{1}{N} \left( \frac{\ell}{b^{r}} + \varepsilon\ell +r-1\right) \frac{ N}{\ell} \\ &=&2 b^{2r-2 } r e^{-b^{r} \varepsilon^{2} \ell/(6r)} C + \left( \frac{1}{b^{r}} + \varepsilon +\frac{r-1}{\ell}\right). \end{array} $$

For ℓ large enough and ε = ℓ^{−1/3}, the values \(2 b^{2r-2 } r e^{-b^{r} \varepsilon ^{2} \ell /(6r)} C\) and \(\left (\varepsilon +\frac {r-1}{\ell }\right )\) are arbitrarily small. So,

$$\begin{array}{@{}rcl@{}} \limsup\limits_{N\to\infty} \frac{|x[1..N]|_{u}}{N} < \frac{C}{b^{r}}. \end{array} $$

□
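Two of the counting facts used above (the maximum number of occurrences inside a block, and the number of boundary-straddling occurrences) can be verified exhaustively for small parameters. A throwaway sanity check, with our own helper name occ, not part of the proof:

```python
from itertools import product

def occ(w, u):
    """Non-aligned (sliding-window) occurrences |w|_u."""
    return sum(1 for i in range(len(w) - len(u) + 1) if w[i:i + len(u)] == u)

l, r = 6, 2
words = [''.join(p) for p in product('01', repeat=l)]
for u in map(''.join, product('01', repeat=r)):
    # a word of length l has at most l - r + 1 occurrences of u
    assert all(occ(w, u) <= l - r + 1 for w in words)
    # at most r - 1 occurrences of u straddle the boundary of two blocks
    for w1, w2 in product(words, repeat=2):
        assert 0 <= occ(w1 + w2, u) - occ(w1, u) - occ(w2, u) <= r - 1
```

The straddling bound is tight: concatenating two blocks of zeros creates exactly one extra occurrence of 00 across the boundary.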

Proof of Theorem 4.4

Let (w_n)_{n≥1} be a sequence given by Corollary 4.7. Let x = 11w_1w_2w_3⋯ We first prove that x satisfies x = even(x). Note that x[2^k + 1..2^{k+1}] = w_k for each k ≥ 1 and x[1..2^{k+1}] = 11w_1⋯w_k. The fact that w_n = even(w_{n+1}) implies x[2n] = x[n], for every n ≥ 3. The cases n = 1 and n = 2 hold because x[1..4] = 1101.

We prove that x is normal. Consider an arbitrary index n_0. By construction, \(w_{n_{0}}\) is \(\ell _{n_{0}}\)-perfect and, for each n ≥ n_0, w_n is also \(\ell _{n_{0}}\)-perfect. For every word u of length \(\ell _{n_{0}}\) and for every n ≥ n_0,

$$\begin{array}{@{}rcl@{}} |\!| x[1..2^{n + 1}]|\!|_{u} \leq |\!| x[1..2^{n_{0}}]|\!|_{u} + |\!|w_{n_{0}} {\ldots} w_{n}|\!|_{u}. \end{array} $$

Then, for every N such that 2^n ≤ N < 2^{n+1} and n ≥ n_0,

$$\begin{array}{@{}rcl@{}} \frac{|\!|x[1..N]|\!|_{u}}{N/\ell_{n_{0}}} & \le &\frac{|\!|x[1..2^{n + 1}]|\!|_{u}}{N /\ell_{n_{0}}} \\ & \le &\frac{|\!|x[1..2^{n_{0}}]|\!|_{u} +|\!|w_{n_{0}} {\ldots} w_{n}|\!|_{u}} {N/ \ell_{n_{0}}} \\ & \le &\frac{|\!|x[1..2^{n_{0}}]|\!|_{u}}{2^{n}/ \ell_{n_{0}}} + \frac{|\!|w_{n_{0}}{\ldots} w_{n}|\!|_{u}}{2^{n}/\ell_{n_{0}}} \\ & = &\frac{|\!|x[1..2^{n_{0}}]|\!|_{u}}{2^{n}/ \ell_{n_{0}}} + \frac{(2^{n_{0}} + {\ldots} + 2^{n})/(\ell_{n_{0}} 2^{\ell_{n_{0}}})} {2^{n}/ \ell_{n_{0}}} \\ & = &\frac{|\!|x[1..2^{n_{0}}]|\!|_{u}}{2^{n}/ \ell_{n_{0}}} + \frac{2^{n + 1}-2^{n_{0}}}{2^{n} 2^{\ell_{n_{0}}}} \\ & < &\frac{|\!|x[1..2^{n_{0}}]|\!|_{u}}{2^{n}/ \ell_{n_{0}}} + \frac{2}{2^{\ell_{n_{0}}}}. \end{array} $$

For large values of N and n such that 2^n ≤ N < 2^{n+1}, the expression

$$\frac{|\!| x[1..2^{n_{0}}]|\!|_{u} }{ 2^{n}/ \ell_{n_{0}}} $$

becomes arbitrarily small. We obtain for every word u of length \(\ell _{n_{0}}\),

$$\begin{array}{@{}rcl@{}} \limsup\limits_{N\to \infty} \frac{|\!|x[1..N]|\!|_{u}}{N/ \ell_{n_{0}}} \leq 3\ 2^{-\ell_{n_{0}}}. \end{array} $$

Since the choice of n_0 was arbitrary, the above inequality holds for each \(\ell _{n}\). Since (ℓ_n)_{n≥1} is unbounded, the hypothesis of Lemma 4.10 is fulfilled with C = 3, so we conclude that x is normal. □

Alexander Shen (personal communication, August 2016) proved that almost all binary words x satisfying x = even(x) are normal. The argument in his proof works whenever the distances between the different occurrences of the same repeated symbol grow sufficiently fast. For example, his argument can also be used to prove that almost all binary words satisfying x_{3n} = x_n are normal.

5 Almost All Pairs are Independent

The next theorem establishes that almost all pairs of infinite words are independent.

Theorem 5.1

The set I = {(x,y) : x and y are independent} has measure 1.

To prove it we use the fact that if the oracle y is normal, then the number of symbols read from y is linearly bounded by the number of symbols read from the input x. This property, stated in Lemma 5.3 below, requires the notions of a finite run and of a forward pair.

A finite run of a k-automaton \(\mathcal T\) is a finite sequence of consecutive transitions

$$\begin{array}{@{}rcl@{}} q_{0} \xrightarrow{a_{1,1},\ldots,a_{k,1}} q_{1} \xrightarrow{a_{1,2},\ldots,a_{k,2}} q_{2} \xrightarrow{}\cdots\xrightarrow{} q_{n-1} \xrightarrow{a_{1,n},\ldots,a_{k,n}} q_{n}. \end{array} $$

The label of the run is the component-wise concatenation of the labels of the transitions. More precisely, it is the tuple 〈u_1,…,u_k〉 where each u_j, for 1 ≤ j ≤ k, is equal to a_{j,1}a_{j,2}⋯a_{j,n}. Such a run is written shortly as \(q_{0} \xrightarrow {u_{1},\ldots ,u_{k}} q_{n}\).

Let \(\mathcal {T}\) be a 2-deterministic 3-automaton with state set Q. Let v be a finite word. A pair (p,a) ∈ Q × A is called a forward pair of v if there is a finite run \(p \xrightarrow {au,v|w} q\) for some finite words u and w and some state q. A finite word v is called a forward word if it has the maximum possible number of forward pairs. Since the number of pairs (p,a) ∈ Q × A is finite, forward words exist. Note that, if the automaton is total, every extension of a forward word is also a forward word. However, not every prefix of a forward word is a forward word.

Lemma 5.2

Let \(\mathcal {T}\) be a 2-deterministic 3-automaton and let x and y be two infinite words such that the run \(\gamma = q_{0} \xrightarrow {x,y|z} \infty \) is accepting. Let v be a forward word for \(\mathcal {T}\). For each factorization \(\gamma = q_{0} \xrightarrow {u_{1},v_{1}|w_{1}} q_{1} \xrightarrow {u_{2},v_{2}|w_{2}} q_{2} \xrightarrow {x_{1},y_{1}|z_{1}} \infty \) such that v occurs in v_2, the word u_2 is non-empty.

Proof

It suffices to prove the result when the word v_2 is equal to v. Suppose, towards a contradiction, that u_2 is empty. Let a be the first symbol of x_1. Since u_2 is empty, the pair (q_1,a) is not a forward pair of v. Since x_1 is infinite, there exists a right extension vv′ of v such that (q_1,a) is a forward pair of vv′. This contradicts the fact that v has the maximum possible number of forward pairs. □

Lemma 5.3

Let \(\mathcal {T}\) be a 2-deterministic 3-automaton and let x and y be two infinite words such that the run \(\gamma = q_{0} \xrightarrow {x,y|z} \infty \) is accepting. If y is normal, there is a constant K, depending only on \(\mathcal {T}\), such that for any factorization \(\gamma = q_{0} \xrightarrow {u_{1},v_{1}|w_{1}} q_{1} \xrightarrow {x_{1},y_{1}|z_{1}} \infty \) with |u_1| long enough, |v_1| ≤ K|u_1|.

Proof

Let v be a forward word for \(\mathcal {T}\). Since y is normal, there is a positive constant k less than 1 such that the number of disjoint occurrences of v in y[1..n] is greater than kn for n large enough. Each of these occurrences of v in v_1 forces, by the previous lemma, at least one symbol of u_1 to be read, so |u_1| ≥ k|v_1| holds. The result then holds with K = 1/k. □

We write μ for the Lebesgue measure.

Theorem 5.4

For each normal word y, the set {x : ρ(x/y) < ρ(x)} has Lebesgue measure 0.

Proof

Fix a normal word y. Since ρ(x) ≤ 1 for every x (see the comment after Definition 3.1), it suffices to prove that {x : ρ(x/y) < 1} has measure 0. The inequality ρ(x/y) < 1 holds if there exists a 2-deterministic 3-automaton that compresses x using y as oracle. For a 2-deterministic 3-automaton \(\mathcal {T}\), let Q be its state set and let K be the constant obtained from Lemma 5.3. For any integer n and any positive real number ε < 1, let X_{n,ε} be defined by

$$\begin{array}{@{}rcl@{}} X_{n,\varepsilon} = \{ x : q_{0}\xrightarrow{u,v|w} q, u = x[1..n], v = y[1..n^{\prime}] \text{ and } |w| < (1-\varepsilon)n \}. \end{array} $$

We claim that μ(X_{n,ε}) ≤ |Q|Kn2^{−εn}. The number of configurations 〈q,n′,w〉, where q is the current state, n′ the number of oracle symbols read and w the output produced so far, is at most |Q|Kn2^{(1−ε)n}: there are |Q| possible states, n′ ≤ Kn by Lemma 5.3, and there are at most 2^{(1−ε)n} words w of length smaller than (1 − ε)n. Two words x and x′ with different prefixes of length n cannot reach the same configuration. If they did, the rest of the run would be the same and this would contradict the injectivity of \(\mathcal {T}\). Therefore, X_{n,ε} is contained in at most |Q|Kn2^{(1−ε)n} cylinders, each of measure 2^{−n}.

Observe that, for fixed n, the inclusion \(X_{n,\varepsilon } \subseteq X_{n,\varepsilon ^{\prime }}\) holds for ε′ ≤ ε. Let ε(n) be a decreasing function which maps each integer n to a real number such that

$$\sum\limits_{n\ge0}{n2^{-\varepsilon(n)n}} < \infty, $$

for instance \(\varepsilon (n) = 1/\sqrt n\). For each k, define

$$\begin{array}{@{}rcl@{}} \tilde X_{k} = \bigcup\limits_{n \ge k} {X_{n,\varepsilon(n)}}. \end{array} $$

By the choice of the function ε(n), \(\lim _{k \to \infty }{\mu (\tilde X_{k})} = 0\). We claim that each word x compressible by \(\mathcal {T}\) with normal oracle y belongs to \(\tilde X_{k}\) for each k. It suffices to prove that it belongs to infinitely many sets X_{n,ε(n)}. Suppose x is compressed with ratio 1 − δ, for some δ > 0. Then there is an infinite sequence of integers (n_j)_{j≥0} such that, after reading the prefix of length n_j, the automaton has output a word w_j with |w_j| < (1 − δ)n_j. Let j_0 be such that \(\varepsilon (n_{j_{0}})<\delta \). Then, for each j ≥ j_0, we have \(x\in X_{n_{j},\varepsilon (n_{j})}\). Since this holds for each of the countably many 3-automata \(\mathcal {T}\), we conclude that the measure of the set of words compressible with normal oracle y is null. □
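As a numeric illustration of the choice ε(n) = 1/√n (ours, only to see the orders of magnitude): the terms n2^{−ε(n)n} = n2^{−√n} decay fast enough that the tail sums, and with them the bounds on \(\mu (\tilde X_{k})\), go to 0.

```python
from math import sqrt

def term(n):
    # upper bound on mu(X_{n, eps(n)}) up to the constant factor |Q|K
    return n * 2 ** (-sqrt(n))

total = sum(term(n) for n in range(1, 5001))
tail = sum(term(n) for n in range(2000, 5001))
print(f"sum up to 5000: {total:.4f}, tail past 2000: {tail:.2e}")
```

The partial sums stabilize quickly: almost all of the mass of the series sits below n of a few hundred.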

Proof of Theorem 5.1

By definition of independence, the complement \(\bar {I} = A^{\omega } \times A^{\omega } \setminus I\) of I can be decomposed as \(\bar {I} = J_{1} \cup J_{2} \cup J_{3} \cup J_{4}\) where the sets J 1, J 2, J 3 and J 4 are defined by the following equations.

$$\begin{array}{@{}rcl@{}} J_{1} & = &\{ (x,y) : \rho(x) = 0 \} \qquad\qquad\qquad\ \ J_{2} = \{ (x,y) : \rho(y) = 0 \} \\ J_{3} & = & \{ (x,y) : \rho(x/y) < \rho(x) \} \qquad\qquad J_{4} = \{ (x,y) : \rho(y/x) < \rho(y) \} \end{array} $$

The sets J 1 and J 2 satisfy μ(J 1) = μ(J 2) = 0. By symmetry, the sets J 3 and J 4 satisfy μ(J 3) = μ(J 4). To show that μ(I) = 1, it suffices then to show that μ(J 3) = 0.

The measure of J 3 is then given by

$$\begin{array}{@{}rcl@{}} \mu(J_{3}) = \int\!\!\!{\int}_{x,y} 1_{J_{3}}\, dxdy = {\int}_{y} \left( {\int}_{x} 1_{J_{3}}\,dx \right) dy = {\int}_{y} f(y)\,dy \end{array} $$

where the function f is defined by f(y) = μ({x : ρ(x/y) < ρ(x)}). By Theorem 5.4, f(y) is equal to 0 for each normal word y. Since the set of normal words has measure 1, f(y) is equal to 0 for almost all y. It follows that μ(J 3) = 0. This concludes the proof of the theorem. □