1.1 Conventions

The set of nonnegative integers (respectively, integers, rational numbers, real numbers, and complex numbers) is written \(\mathbb {N}\) (respectively, \(\mathbb {Z}\), \(\mathbb {Q}\), \(\mathbb {R}\), and \(\mathbb {C}\)). In particular, the set \(\mathbb {N}\) is {0, 1, 2, …}. We use the notation \([\![ i,j]\!]\) for the set of integers {i, i + 1, …, j}. The floor of a real number x is \(\left \lfloor x \right \rfloor =\sup \{z\in \mathbb {Z}\mid z\le x\}\), whereas {x} = x − ⌊x⌋ stands for the fractional part of x. Recall that ⌈⋅⌉ denotes the ceiling function, i.e., \(\lceil x\rceil =\inf \{z\in \mathbb {Z}\mid z\ge x\}\). The characteristic sequence χ_X of a set \(X \subset {\mathbb N} ^d\) takes its values in {0, 1} and satisfies χ_X(n) = 1 if and only if n ∈ X.

Let us recall the notation for asymptotics. Let \(f,g:\mathbb {R}\to \mathbb {R}\) be two functions. The definitions given below also apply to functions defined on another domain like \(\mathbb {R}_{>a}\), \(\mathbb {N}\), or \(\mathbb {Z}\). We implicitly assume that the following notions are defined for x → +∞. We write \(f\in \mathcal {O}(g)\) if there exist two constants x_0 and C > 0 such that, for all x ≥ x_0, |f(x)| ≤ C|g(x)|. We also write f ≪ g or g ≫ f, or else g ∈ Ω(f). Note that we can write either \(f\in \mathcal {O}(g)\) or \( f=\mathcal {O}(g)\). Be aware that in the literature, authors sometimes give different meanings to the notation Ω(f). Here we consider a bound holding for all large enough x, but there exist variants where the bound holds only along an increasing sequence (x_n)_{n≥0} of reals, i.e., \(\limsup_{x\to+\infty}|g(x)|/|f(x)|>0\).

If g belongs to \(\mathcal {O}(f)\cap \varOmega (f)\), i.e., there exist constants x_0, C_1, C_2 with C_1, C_2 > 0 such that, for all x ≥ x_0, C_1|f(x)| ≤ |g(x)| ≤ C_2|f(x)|, then we write g ∈ Θ(f). As an example, the function \(x^2+\sin 6x\) is in Θ(x^2), and \(x^2|\sin {}(4x)|\) is in \(\mathcal {O}(x^2)\) but not in Θ(x^2).

1.2 Algebraic Structures

We briefly recall the basic definitions of monoid, (semi)group, (semi)ring, field, ideal, vector space, and module.

Definition 1.2.1

Let \(\mathbb {S}\) be a set equipped with a single binary operation

$$\begin{aligned} \star:\mathbb{S}\times \mathbb{S}\to \mathbb{S}\, . \end{aligned}$$

It is convenient to call this operation a multiplication over \(\mathbb {S}\), and the product of \(x,y\in \mathbb {S}\) is usually denoted by xy.

If this multiplication is associative, i.e., for all \(x,y,z\in \mathbb {S}\), (xy)z = x(yz), then the algebraic structure given by the pair \((\mathbb {S},\star )\) is a semigroup.

If, moreover, multiplication has an identity element, i.e., there exists some element \(1\in \mathbb {S}\) such that, for all \(x\in \mathbb {S}\), x1 = x = 1x, then \((\mathbb {S},\star )\) is a monoid.

If, in addition, every element \(x\in \mathbb {S}\) has an inverse, i.e., there exists \(y\in \mathbb {S}\) such that xy = 1 = yx, then \((\mathbb {S},\star )\) is a group.

Definition 1.2.2

A semiring is a set R equipped with two binary operations + and ⋅ such that

  1.

    (R, +) is a commutative monoid with identity element 0.

  2.

    (R, ⋅) is a monoid with identity element 1.

  3.

    The product is distributive with respect to the sum.

  4.

    For all r ∈ R, 0 ⋅ r = 0 = r ⋅ 0.

If, moreover, ⋅ is commutative, then the semiring is said to be commutative. A ring is a semiring where (R, +) is a commutative group. A field is a commutative ring where every nonzero element is invertible, i.e., (R ∖ {0}, ⋅) is a group.

Definition 1.2.3

A (two-sided) ideal of a ring (R, +, ⋅) is a nonempty subset I of R, such that (I, +) is a subgroup of (R, +) and for all i ∈ I and all r ∈ R, i ⋅ r and r ⋅ i belong to I.

Definition 1.2.4

Let K be a field with identity element 1 for its multiplication. A vector space over K is a set V equipped with a binary operation + : V × V → V such that (V, +) is a commutative group and a binary operation ⋅ : K × V → V such that, for all k, ℓ ∈ K and all x, y ∈ V,

  1.

    k ⋅ (ℓ ⋅ x) = (kℓ) ⋅ x

  2.

    1 ⋅ x = x

  3.

    (k + ℓ) ⋅ x = k ⋅ x + ℓ ⋅ x

  4.

    k ⋅ (x + y) = k ⋅ x + k ⋅ y

A K-module is defined similarly, but it is built over a ring K instead of a field.

We now consider natural notions specific to group and semigroup theory (see also Section 9.3.1 for further basic definitions on group theory and Chapter 11).

For a given property \(\mathcal {P}\) of groups (abelian, free, nilpotent, soluble, …), a group G is called virtually \(\mathcal {P}\) if G contains a finite-index subgroup satisfying property \(\mathcal {P}\). See also Definition 9.3.36 and Section 9.3.4.1 for properties of virtually free groups, such as the decidability of the word problem (Theorem 9.3.37).

Schreier graphs generalize Cayley graphs. Let G be a group generated by S and acting on a set X. The vertices of its Schreier graph (which depends on S) are the elements of X, and there is an edge from x to y if y is the image of x under the action of some element of S. When the group acts on itself by right multiplication, this graph coincides with its Cayley graph. See also Definition 10.3.1 and Section 11.3.

Let G be a finitely generated group with a generating set S = {g_1, …, g_m}. The length of g ∈ G (with respect to S) is the smallest integer ℓ such that g can be represented by a product of the form

$$\begin{aligned} g=g_{i_1}^{\pm 1}\cdots g_{i_\ell}^{\pm 1}, \end{aligned}$$

i.e., the length of the shortest decomposition of g. The growth of the group G (with respect to S) is the map

$$\begin{aligned} \gamma_S:\mathbb{N}\to\mathbb{N},\ n\mapsto\mathrm{Card}\{g\in G\mid d_S(g)\leq n\}\, , \end{aligned}$$

where d_S(g) is the length of g with respect to S. This definition can be made essentially independent of S by noticing that the growth functions corresponding to two generating sets are equivalent [409]. Note that a finite group has bounded growth, an infinite abelian group has polynomial growth, and a non-abelian free group has exponential growth. The growth of a finitely generated group can also be seen as the growth of its Cayley graph: we count the vertices within distance n of the identity element. This notion is considered in Sections 10.3.4.1, 11.3.1, and 11.4.
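As an illustration (this sketch and its helper names are ours, not the book's), the growth function γ_S can be computed for small n by a breadth-first search in the Cayley graph. The sketch below does this for \(\mathbb{Z}^2\) with the four standard generators and for the free group of rank 2, whose elements are encoded as freely reduced words (a capital letter denoting the inverse of the corresponding generator); the two runs exhibit the polynomial versus exponential behaviour mentioned above.

```python
def growth(generators, step, identity, n_max):
    """gamma_S(n) = Card{g : d_S(g) <= n}, computed by BFS in the Cayley graph."""
    seen = {identity}
    frontier = [identity]
    counts = [1]                          # gamma_S(0) = 1
    for _ in range(n_max):
        next_frontier = []
        for g in frontier:
            for s in generators:
                h = step(g, s)
                if h not in seen:
                    seen.add(h)
                    next_frontier.append(h)
        frontier = next_frontier
        counts.append(len(seen))
    return counts

# Z^2 with generators (+-1, 0) and (0, +-1): gamma_S(n) = 2n^2 + 2n + 1.
z2 = growth([(1, 0), (-1, 0), (0, 1), (0, -1)],
            lambda g, s: (g[0] + s[0], g[1] + s[1]), (0, 0), 5)

# Free group on {a, b}: multiply a reduced word by a generator, cancelling if needed.
INV = {'a': 'A', 'A': 'a', 'b': 'B', 'B': 'b'}

def mult(w, s):
    return w[:-1] if w and w[-1] == INV[s] else w + s

f2 = growth(['a', 'A', 'b', 'B'], mult, '', 5)

assert z2 == [1, 5, 13, 25, 41, 61]      # polynomial growth
assert f2 == [1, 5, 17, 53, 161, 485]    # exponential growth: 2 * 3^n - 1
```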

1.3 Words

This section gives basic definitions about words, either finite or infinite. Words are ubiquitous when encoding a piece of information. As an example, a finite word over the alphabet of digits {0, …, k − 1} can be seen as the k-ary expansion of an integer. On the other hand, an infinite word over {0, 1} can be used as the characteristic sequence of a subset of \(\mathbb {N}\). For material not covered here, see the classical Lothaire textbooks on finite and infinite words and their properties [31, 32, 387]. See also Allouche and Shallit's book [14] about automatic sequences, or Queffélec's book [488] for a dynamical point of view. For a quick overview, the reader can have a look at the chapter [150] or the tutorial [75]. The book [504] is also intended to serve as introductory lecture notes on the subject.

1.3.1 Finite Words

An alphabet is a finite nonempty set. Its elements are called symbols or letters.

Definition 1.3.1

A (finite) word over Σ is a finite sequence of letters from Σ. The empty sequence is called the empty word and is denoted by ε. The set of all finite words (respectively, finite nonempty words) over Σ is denoted by Σ^* (respectively, Σ^+). A word w = w_0w_1⋯w_{n−1}, where w_i ∈ Σ for 0 ≤ i < n, can be seen as a function w : {0, 1, …, n − 1} → Σ in which w(i) = w_i for all i. The empty word is the word whose domain is the empty set.

Let u = u_0⋯u_{m−1} and v = v_0⋯v_{n−1} be two words over Σ. The concatenation of u and v is the word w = w_0⋯w_{m+n−1} defined by w_i = u_i if 0 ≤ i < m, and w_i = v_{i−m} otherwise. We write u ⋅ v, or simply uv, to express the concatenation of u and v. The concatenation (or catenation) of words is an associative operation, i.e., given three words u, v, and w, (uv)w = u(vw). Hence, parentheses can be omitted. In particular, the set Σ^* (respectively, Σ^+) equipped with the concatenation product is a monoid (respectively, a semigroup).

The length of a word w, denoted by |w|, is the number of occurrences of letters in w. In other words, if w = w_0w_1⋯w_{n−1} with w_i ∈ Σ, 0 ≤ i < n, then |w| = n. In particular, the length of the empty word is zero. The set of words of length k (respectively, of length at most k) over Σ is denoted by Σ^k (respectively, Σ^{≤k}). For a ∈ Σ and w ∈ Σ^*, we write |w|_a for the number of occurrences of a in w. Therefore, we have

$$\begin{aligned} |w|=\sum_{a\in\varSigma}|w|_a\, . \end{aligned}$$

If u and v are two words over Σ such that |u|_a = |v|_a for all a ∈ Σ, then u is obtained by permuting the letters of v; u and v are then said to be abelian equivalent. In everyday language, they are anagrams.

A word u is a factor of a word v (respectively, a prefix or a suffix) if there exist words x and y such that v = xuy (respectively, v = uy, or v = xu). A factor (respectively, prefix or suffix) u of a word v is called proper if u ≠ v and u ≠ ε. Prefixes and suffixes are sometimes called initial and terminal factors. Thus, for example, if w = concatenation, then con is a prefix, ate is a factor, and nation is a suffix of w. If w = w_0⋯w_{n−1} and u is a factor of w such that u = w_i⋯w_{i+|u|−1}, we say that u occurs in w at position i. For instance, in abbabaabbaab, the factor ab occurs at positions 0, 3, 6, 10. The set of factors of u (respectively, of prefixes of u) is denoted by Fac(u) (respectively, Pref(u)).

The mirror (sometimes called reversal) of a word u = u_0⋯u_{m−1} is the word \(\tilde {u}=u_{m-1}\cdots u_0\). It can be defined inductively on the length of the word by \(\tilde {\varepsilon }=\varepsilon \) and \(\widetilde {au}=\tilde {u}a\) for a ∈ Σ and u ∈ Σ^*. Notice that for u, v ∈ Σ^*, \(\widetilde {uv}=\tilde {v}\tilde {u}\). A palindrome is a word u such that \(\tilde {u}=u\). For instance, the palindromes of length at most 3 over {0, 1} are ε, 0, 1, 00, 11, 000, 010, 101, 111.
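These notions are easy to experiment with. The short Python sketch below (the helper names are ours) recovers the examples of this section: the occurrences of ab in abbabaabbaab, the identity \(\widetilde {uv}=\tilde {v}\tilde {u}\), and the nine palindromes of length at most 3 over {0, 1}.

```python
from itertools import product

def occurrences(w, u):
    """Positions where the factor u occurs in w."""
    return [i for i in range(len(w) - len(u) + 1) if w[i:i + len(u)] == u]

def mirror(u):
    """Mirror (reversal) of a finite word."""
    return u[::-1]

def is_palindrome(u):
    return u == mirror(u)

# the factor ab occurs in abbabaabbaab at positions 0, 3, 6, 10
assert occurrences('abbabaabbaab', 'ab') == [0, 3, 6, 10]

# mirror of a concatenation: (uv)~ = v~ u~
assert mirror('con' + 'cat') == mirror('cat') + mirror('con')

# the nine palindromes of length at most 3 over {0, 1}
pals = [''.join(t) for k in range(4) for t in product('01', repeat=k)
        if is_palindrome(''.join(t))]
assert len(pals) == 9
```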

1.3.2 Infinite Words

Instead of considering finite sequences of elements of an alphabet Σ, it is also relevant to consider infinite sequences of elements of Σ.

Definition 1.3.2

A (one-sided right) infinite word is a map from \(\mathbb {N}\) to Σ. If w is an infinite word, we often write

$$\begin{aligned} \mathbf{w} = a_0 a_1 a_2\cdots, \end{aligned}$$

where each a_i ∈ Σ. The set of all infinite words over Σ is denoted by Σ^ω (one can also find the notation \(\varSigma ^{\mathbb {N}}\)).

Example 1.3.3

Consider the infinite word x = x_0x_1x_2⋯ whose letters x_i ∈ {0, …, 9} are given by the digits appearing in the usual decimal expansion of π − 3,

$$\begin{aligned} \pi-3=\sum_{i=0}^{+\infty} x_i\, 10^{-i-1}, \end{aligned}$$

i.e., x = 14159265358979323846264338327950288419⋯ is an infinite word.

The notions of factor, prefix, and suffix introduced for finite words extend to infinite words. Factors and prefixes are finite words, but a suffix of an infinite word is itself an infinite word. We still make use of the notation Fac(w) and Pref(w).

Definition 1.3.4

The language of the infinite word x is the set of all its factors. It is denoted by Fac(x). The set of factors of length n occurring in x is denoted by Fac_n(x).

Definition 1.3.5

The complexity function, or factor complexity, of an infinite word x maps \(n\in \mathbb {N}\) to the number p_x(n) = Card(Fac_n(x)) of distinct factors of length n occurring in x.

Example 1.3.6

The Thue–Morse word t = t_0t_1t_2⋯ (a ubiquitous word in combinatorics on words [18]) can be defined over {a, b} by t_n = a if and only if there is an even number of ones in the base-2 expansion of n ≥ 0, i.e., if the sum of the base-2 digits of n is even. Thus a prefix of t is given by

$$\begin{aligned} \mathtt{abbabaab baababba baababbaabbabaab} \cdots\, . \end{aligned}$$

If we replace a with 1 and b with 0, then we get the characteristic sequence χ_E of the set E of integers whose sum of base-2 digits is even. The factor complexity of the Thue–Morse word t is well known [107, 391]. See also [78, p. 225], where a chapter is devoted to the factor complexity of morphic words. For n ≥ 3, letting m be the unique integer such that \(2\cdot 2^m<n\le 4\cdot 2^m\), we have

$$\begin{aligned} p_{\mathbf{t}}(n) = \left\{ \begin{array}{cl} 4n-2\cdot2^m-4, & \ \text{if} \ 2\cdot2^m<n\leq3\cdot2^m; \\ 2n+4\cdot2^m-2, & \ \text{if} \ 3\cdot2^m<n\leq4\cdot2^m. \end{array} \right. \end{aligned}$$
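The formula above can be checked experimentally. The following sketch (ours; it assumes, as holds for the Thue–Morse word, that a prefix of length 4096 already contains every factor of length at most 16) compares the number of distinct factors of a long prefix of t with the closed form, for 3 ≤ n ≤ 16.

```python
def thue_morse(length):
    """Prefix of t over {a, b}: t_n = a iff n has an even number of 1 bits."""
    return ''.join('a' if bin(n).count('1') % 2 == 0 else 'b' for n in range(length))

def complexity(w, n):
    """Number of distinct factors of length n of the finite word w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})

def p_formula(n):
    """The closed form for p_t(n), valid for n >= 3."""
    m = 0
    while not 2 * 2**m < n <= 4 * 2**m:
        m += 1
    if n <= 3 * 2**m:
        return 4 * n - 2 * 2**m - 4
    return 2 * n + 4 * 2**m - 2

t = thue_morse(4096)
assert t.startswith('abbabaabbaababba')
assert all(complexity(t, n) == p_formula(n) for n in range(3, 17))
```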

Definition 1.3.7

A two-sided or bi-infinite word is a map from \(\mathbb {Z}\) to Σ. The set of all bi-infinite words is denoted by \({}^\omega \varSigma ^\omega \) (one can also find the notation \(\varSigma ^{\mathbb {Z}}\)).

Definition 1.3.8

An infinite word x = x_0x_1⋯ is (purely) periodic if there exists a finite word u = u_0⋯u_{k−1} ≠ ε such that x = u^ω, i.e., for all n ≥ 0, we have x_n = u_r where n = dk + r with r ∈ {0, …, k − 1}. An infinite word x is eventually periodic (or ultimately periodic) if there exist two finite words u, v ∈ Σ^*, with v ≠ ε, such that x = uvvv⋯ = uv^ω. Notice that purely periodic words are special cases of eventually periodic words. For any eventually periodic word x, there exist words u, v of shortest length such that x = uv^ω; the integer |u| (respectively, |v|) is then referred to as the preperiod (respectively, period) of x. An infinite word is said to be nonperiodic if it is not eventually periodic.

Let us mention the following result, known as the Morse–Hedlund theorem.

Theorem 1.3.9

Let w be an infinite word over a finite alphabet. The word w is eventually periodic if and only if there exists some integer N such that p_w(N) ≤ N.

Among the nonperiodic words of low factor complexity, Sturmian words play a special role and have been extensively studied. An infinite word x is Sturmian if p_x(n) = n + 1 for all n ≥ 0. Since p_x(1) = 2, Sturmian words are over a 2-letter alphabet. For general references, see [386, Chapter 2] or [487, Chapter 6]. They will be considered in Chapter 6.

Definition 1.3.10

An infinite word x is recurrent if each of its factors occurs infinitely often in x. It is uniformly recurrent if it is recurrent and if, for every factor u of x, considering the infinite set

$$\begin{aligned} \left\{i_1^{(u)}<i_2^{(u)}<i_3^{(u)}<\cdots \right\} \end{aligned}$$

of positions where u occurs in x, there exists a constant C u such that, for all j ≥ 1,

$$\begin{aligned} i_{j+1}^{(u)}-i_j^{(u)}\le C_u\, . \end{aligned}$$

Note that, by Furstenberg’s theorem , for any infinite word w, there is a uniformly recurrent word r over the same alphabet such that every finite factor of r is a factor of w, i.e., Fac(r) ⊆Fac(w) (see Theorem 4.4.9).

Let x be an infinite word. The function \(R_{\mathbf {x}}:\mathrm {Fac}(\mathbf {x})\to \mathbb {N}\cup \{\infty \}\) maps a factor u of x to the smallest k such that every factor of x of length k contains u, or to ∞ if no such k exists. Otherwise stated, an infinite word x is uniformly recurrent if, for every factor u of x, R_x(u) is finite. The recurrence function maps \(n\in \mathbb {N}\) to

$$\begin{aligned} R_{\mathbf{x}}(n)=\max_{u\in\mathrm{Fac}_n(\mathbf{x})}R_{\mathbf{x}}(u)\, . \end{aligned}$$

Otherwise stated, if x is uniformly recurrent, then R_x(n) is finite for every n, and every factor u of length n of x occurs in all factors of length R_x(n) of x.

Assume that Σ is totally ordered by <. Let x, y be two infinite words over Σ. We say that x is lexicographically less than y if there exists N such that x_i = y_i for all i < N and x_N < y_N.

Definition 1.3.11

One can endow Σω with a distance d defined as follows. Let x, y be two infinite words over Σ. Let x ∧y denote the longest common prefix of x and y. Then the distance d is given by

$$\begin{aligned} d(\mathbf{x},\mathbf{y}):=\left\{\begin{array}{ll} 0, &\text{ if }\mathbf{x}=\mathbf{y},\\ 2^{-|\mathbf{x}\wedge \mathbf{y}|},&\text{ otherwise}.\\ \end{array}\right. \end{aligned}$$

This notion of distance extends to \(\varSigma ^{\mathbb {Z}}\). Notice that the induced topology on Σ^ω is the product topology (of the discrete topology on Σ). The space Σ^ω is a compact Cantor set, that is, a totally disconnected compact space without isolated points. Since Σ^ω is a (complete) metric space, it is therefore relevant to speak of convergent sequences of infinite words. The sequence (z_n)_{n≥0} of infinite words over Σ converges to x ∈ Σ^ω if, for all ϵ > 0, there exists \(N \in \mathbb {N}\) such that, for all n ≥ N, d(z_n, x) < ϵ. To express the fact that a sequence of finite words (w_n)_{n≥0} over Σ converges to an infinite word y, it is assumed that Σ is extended with an extra letter c ∉ Σ. Each finite word w_n is replaced with the infinite word w_nccc⋯, and if the sequence of infinite words (w_nccc⋯)_{n≥0} converges to y, then the sequence (w_n)_{n≥0} is said to converge to y.
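As a small sketch (ours), with long finite prefixes standing in for infinite words, the distance is computed from the length of the longest common prefix:

```python
def lcp(x, y):
    """Length of the longest common prefix of x and y."""
    n = 0
    while n < min(len(x), len(y)) and x[n] == y[n]:
        n += 1
    return n

def d(x, y):
    """d(x, y) = 0 if x = y, and 2^{-|x ∧ y|} otherwise."""
    return 0.0 if x == y else 2.0 ** -lcp(x, y)

# two words sharing exactly a prefix of length 5 are at distance 2^{-5}
assert d('abbabaab', 'abbabbab') == 2 ** -5
# sharing a longer common prefix means being closer
assert d('abbabaab', 'abbabaabx') < d('abbabaab', 'abbabbab')
```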

Let (u_n)_{n≥0} be a sequence of nonempty finite words. If we define, for all ℓ ≥ 0, the finite word v_ℓ as the concatenation u_0u_1⋯u_ℓ, then the sequence (v_ℓ)_{ℓ≥0} of finite words converges to an infinite word. This latter word is said to be the concatenation of the elements of the infinite sequence of finite words (u_n)_{n≥0}. In particular, for a constant sequence u_n = u for all n ≥ 0, we have v_ℓ = u^{ℓ+1}, and the concatenation of infinitely many copies of the finite word u is denoted by u^ω.

We have discussed the fact that a (finite) word u may appear as a factor of an infinite word x. It may occur a finite number of times, infinitely often, or even in such a way that R_x(u) is finite. We can also introduce the frequency of a factor u occurring in x as the following limit, if it exists,

$$\begin{aligned} \lim_{n\to +\infty}\frac{\mathrm{Card}\left(\{i\le n-|u|\mid x_i\cdots x_{i+|u|-1}=u\}\right)}{n}\, . \end{aligned}$$

For instance, consider the infinite word w = \(0\,1\,0^2\,1^2\,0^4\,1^4\,0^8\,1^8\,0^{16}\,1^{16}\cdots\), where longer and longer blocks of consecutive zeroes are followed by longer and longer blocks of ones. The frequencies of 0 and 1 in w do not exist. Frequency appears naturally in the definition of normal numbers given below. See also Theorem 1.6.10 about the frequency of symbols in automatic sequences and morphic words. Frequencies are also considered in Chapter 5 in the framework of repetitions, and in Chapters 7 and 8 in the framework of normality.
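This can be checked numerically. The sketch below (ours) builds a long prefix of w and evaluates the proportion of the letter 0 at the ends of the successive blocks: it is exactly 1/2 after each block of ones and tends to 2/3 along the ends of the blocks of zeroes, so the limit defining the frequency does not exist.

```python
def w_prefix(kmax):
    """Prefix 0 1 0^2 1^2 0^4 1^4 ... of w, up to the blocks of length 2^kmax."""
    return ''.join('0' * 2**k + '1' * 2**k for k in range(kmax + 1))

w = w_prefix(12)
after_zeros, after_ones, n = [], [], 0
for k in range(13):
    n += 2**k                                   # end of the k-th block of zeroes
    after_zeros.append(w[:n].count('0') / n)
    n += 2**k                                   # end of the k-th block of ones
    after_ones.append(w[:n].count('0') / n)

assert after_ones[-1] == 0.5                    # exactly 1/2 at these prefixes
assert abs(after_zeros[-1] - 2 / 3) < 1e-3      # close to 2/3 at those
```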

1.3.3 Number Representations

We refer the reader to Frougny’s chapter [386] or to [227] for a general presentation of numeration systems. The book [503] can also serve as an introduction to the subject. We also mention the survey [36]. More details are also discussed in Section 3.2 of this book.

Let k ≥ 2 be an integer. Let us recall how the base-k expansion of an integer may be computed. For any positive integer n, there exists a unique ℓ ≥ 0 such that k^ℓ ≤ n < k^{ℓ+1}, and there exist unique coefficients c_0, …, c_ℓ ∈ {0, …, k − 1} such that

$$\begin{aligned} n=\sum_{i=0}^\ell c_{i}\, k^i \text{ and }c_\ell\neq 0\, . \end{aligned}$$

The coefficients c_ℓ, …, c_0 can be computed by successive Euclidean divisions. Set n_0 := n. We have n_0 = c_ℓ k^ℓ + n_1 with n_1 < k^ℓ and, for i = 1, …, ℓ, n_i = c_{ℓ−i} k^{ℓ−i} + n_{i+1} with n_{i+1} < k^{ℓ−i}. The word c_ℓ⋯c_0 is said to be the k-ary representation or k-ary expansion of n (sometimes called the greedy representation) and is denoted by rep_k(n). If d_ℓ⋯d_0 is a word over an alphabet of digits included in \(\mathbb {Z}\), we define

$$\begin{aligned} \mathrm{val}_k(d_\ell\cdots d_0)=\sum_{i=0}^\ell d_i\, k^i\, . \end{aligned}$$
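A sketch in Python (ours; it computes the digits bottom-up by repeated division, which produces the same word \(c_\ell\cdots c_0\) as the greedy top-down divisions described above):

```python
def rep_k(n, k):
    """k-ary representation of a positive integer, most significant digit first."""
    digits = ''
    while n > 0:
        digits = str(n % k) + digits
        n //= k
    return digits

def val_k(word, k):
    """val_k(d_l ... d_0) = sum_i d_i k^i, digits given most significant first."""
    return sum(int(d) * k**i for i, d in enumerate(reversed(word)))

assert rep_k(100, 3) == '10201'
assert val_k('10201', 3) == 100
# val_k is a left inverse of rep_k
assert all(val_k(rep_k(n, 7), 7) == n for n in range(1, 500))
```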

If one replaces the sequence (k^n)_{n≥0} with an increasing sequence (U_n)_{n≥0} of integers such that U_0 = 1, then a similar greedy algorithm may be applied. The corresponding U-expansions are over the alphabet \(\{0,\ldots ,\sup_n \lceil \frac {U_{n+1}}{U_n}\rceil -1\}\). One finds the general terminology positional numeration system. It is also possible to extend the procedure to represent real numbers. Let x ∈ (0, 1). There exists a decomposition of the form

$$\begin{aligned} x=\sum_{i=1}^{+\infty} c_i\, k^{-i} \end{aligned}$$

where c_i ∈ {0, …, k − 1} for all i ≥ 1. If we forbid sequences where c_i = k − 1 for all large enough i, then the sequence (c_i)_{i≥1} is unique. Given x ∈ [0, 1), the algorithm in Table 1.1 provides the corresponding sequence (c_i)_{i≥1} of digits.

Table 1.1 An algorithm for computing the base-k expansion of x ∈ [0, 1).

In this algorithm, we iterate a map from the interval [0, 1) onto itself, i.e.,

$$\begin{aligned} T_k:[0,1)\to [0,1), y\mapsto \{k y\}\end{aligned} $$
(1.1)

and the value taken by the image determines the next digit of the expansion. This yields a dynamical system, as discussed in Section 1.7. The interval [0, 1) is thus split into k subintervals [j/k, (j + 1)/k), for j = 0, …, k − 1. For all i ≥ 0, if \(T_k^i(x)\) belongs to the subinterval [j/k, (j + 1)/k), then the digit c_{i+1} of the expansion of x is equal to j. It is indeed natural to consider such subintervals: if y belongs to [j/k, (j + 1)/k), then ky has integer part j, and the map T_k is continuous and increasing on every subinterval [j/k, (j + 1)/k). Note also that the range of T_k on any of these subintervals is [0, 1), so applying T_k to a point in one of these subintervals can lead to a point belonging to any of them (later on, we shall introduce other transformations, e.g., β-transformations, where a restriction appears on the intervals that can be reached). So to speak, the base-k expansion of x can be derived from the trajectory of x under T_k, i.e., from the sequence \((T_k^n(x))_{n\ge 0}\).

As an example, consider the base k = 3 and the expansion of x = 3/10. The point lies in the interval [0, 1/3); thus the first digit of the expansion is 0. Then T_3(3/10) = 9/10 lies in the interval [2/3, 1); thus the second digit is 2. If we apply T_3 again, we get \(T_3^2(3/10)=\{27/10\}=7/10\), which again belongs to [2/3, 1), giving the digit 2. Then \(T_3^3(3/10)=1/10\), giving the digit 0, and finally \(T_3^4(3/10)=3/10\). So rep_3(3/10) = (0220)^ω.
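The computation above can be reproduced exactly with rational arithmetic. The sketch below (ours) iterates \(T_k\) and records, at each step, the subinterval [j/k, (j + 1)/k) containing the current point.

```python
from fractions import Fraction

def base_k_digits(x, k, n):
    """First n digits of the base-k expansion of x in [0, 1), by iterating T_k(y) = {ky}."""
    digits = []
    for _ in range(n):
        x *= k
        j = int(x)        # the previous point was in [j/k, (j+1)/k), so the next digit is j
        digits.append(j)
        x -= j            # fractional part: one application of T_k
    return digits

# rep_3(3/10) = (0220)^omega
assert base_k_digits(Fraction(3, 10), 3, 8) == [0, 2, 2, 0, 0, 2, 2, 0]
```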

A natural generalization of the base-k expansion (discussed in Section 3.6 and in Example 8.1.2) is to replace the base k with a real number β > 1. In particular, the transformation T_k is replaced by the so-called β-transformation. Note that we shall be concerned with expansions of numbers in [0, 1): if x ≥ 1, then there exists a smallest d such that x/β^d belongs to [0, 1), so it is enough to concentrate on [0, 1).

Definition 1.3.12 (β-Expansions)

We will only represent real numbers in the interval [0, 1). Let β > 1 be a real number. The representations discussed here are a direct generalization of the base-k expansions. Every real number x ∈ [0, 1) can be written as a series

$$\begin{aligned} x=\sum_{i=0}^{+\infty} c_i\, \beta^{-i-1} \end{aligned} $$
(1.2)

where the c_i belong to {0, …, ⌈β⌉ − 1}. Note that if β is an integer, then ⌈β⌉ − 1 = β − 1. For integer base-b expansions, a number may have more than one representation, namely those ending with 0^ω or (b − 1)^ω. For a real base β, we obtain many more representations. Consider the Golden mean ϕ, which satisfies \(\phi ^2-\phi -1=0\), and thus

$$\begin{aligned} \frac{1}{\phi^n}=\frac{1}{\phi^{n+1}}+\frac{1}{\phi^{n+2}},\quad \forall n\ge 0\, . \end{aligned}$$

As an example, the number 1/ϕ has thus infinitely many representations as a power series with negative powers of ϕ and coefficients 0 and 1:

$$\begin{aligned} \frac{1}{\phi}=\frac{1}{\phi^{2}}+\frac{1}{\phi^{3}}=\frac{1}{\phi^{2}}+\frac{1}{\phi^{4}}+\frac{1}{\phi^{5}}=\frac{1}{\phi^{2}}+\frac{1}{\phi^{4}}+\frac{1}{\phi^{6}}+\frac{1}{\phi^{7}}=\cdots\, . \end{aligned}$$

To get a canonical expansion of a real number x ∈ [0, 1), we just replace the integer base b with β and consider the so-called β-transformation

$$\begin{aligned} T_\beta:[0,1)\to[0,1),\ x\mapsto \{\beta x\} \end{aligned}$$

in the algorithm from Table 1.1. For i = 0, 1, …, the idea is to remove the largest integer multiple c_i of β^{−i−1} and then repeat the process with the remainder and the next negative power of β, which yields (1.2). Note that c_i is less than ⌈β⌉ because of the greediness of the process: otherwise, one could have removed a larger multiple of the power of β at a previous step. The corresponding infinite word c_0c_1⋯ is called the β-expansion of x and is usually denoted by d_β(x). Any word d_0d_1⋯ over a finite alphabet of nonnegative integers satisfying

$$\begin{aligned} x=\sum_{i=0}^{+\infty} d_i\, \beta^{-i-1} \end{aligned}$$

is said to be a β-representation of x. Thus, the β-expansion of x is the lexicographically maximal word among the β-representations of x.

The greediness of the algorithm can be reformulated as follows.

Lemma 1.3.13

A word d_0d_1⋯ over {0, …, ⌈β⌉ − 1} is the β-expansion of a real number x ∈ [0, 1) if and only if, for all j ≥ 0,

$$\begin{aligned} \sum_{i=j}^{+\infty} d_i\, \beta^{-i-1}<\beta^{-j}\, . \end{aligned}$$
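The greedy digits can be computed by iterating \(T_\beta\), exactly as in the base-k case. The Python sketch below (ours) uses the rational base β = 3/2 so that the run is exact with fractions; for an irrational β such as ϕ, exact arithmetic in the field \(\mathbb{Q}(\beta)\) would be needed. The tail bound checked at the end is the inequality of Lemma 1.3.13.

```python
from fractions import Fraction

def beta_expansion(x, beta, n):
    """First n digits of the greedy beta-expansion d_beta(x), for x in [0, 1)."""
    digits = []
    for _ in range(n):
        y = beta * x
        c = int(y)          # digit: integer part of beta times the current point
        digits.append(c)
        x = y - c           # beta-transformation: keep the fractional part
    return digits

beta, x = Fraction(3, 2), Fraction(1, 2)
d = beta_expansion(x, beta, 25)
assert all(c in (0, 1) for c in d)          # digits lie in {0, ..., ceil(beta) - 1}

# greediness: the partial sums approach x from below, with tail < beta^{-25}
s = sum(c * beta**(-i - 1) for i, c in enumerate(d))
assert 0 <= x - s < beta**(-25)
```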

Proposition 1.3.14

Let x, y be real numbers in [0, 1). We have x < y if and only if d β (x) is lexicographically less than d β (y).

1.3.4 Normality

Now that number representations and the frequency of a factor have been introduced, we can define normal numbers.

A real number x is simply normal with respect to base b ≥ 2 if, in the base-b expansion of x (which is an infinite word over {0, …, b − 1}), the frequency of every digit d ∈ {0, 1, …, b − 1} exists and is equal to 1/b. Furthermore, x is normal in base b if it is simply normal with respect to the bases b, b^2, b^3, …. An equivalent definition is to say that, for all k ≥ 1 and every word u = u_1⋯u_k ∈ {0, 1, …, b − 1}^k, the frequency of u in the base-b expansion of x exists and is equal to 1/b^k. A real number x is absolutely normal if x is normal to every integer base greater than or equal to 2.

Normality can also be expressed in terms of uniform distribution modulo 1 [578] (see Section 7.6 for corresponding definitions). Indeed, a real number x is normal to base b if and only if the sequence (bj x)j≥0 is uniformly distributed modulo 1.
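As a numerical illustration (the choice of example is ours): Champernowne proved that the constant 0.123456789101112…, obtained by concatenating the decimal representations of the successive integers, is normal in base 10. In the prefix formed by the integers up to 99999, each nonzero digit occurs exactly 5·10^4 times, so the digit frequencies are already close to 1/10.

```python
# prefix of the Champernowne word: decimal representations of 1, 2, ..., 99999
s = ''.join(str(i) for i in range(1, 100000))

assert len(s) == 488889
assert s.count('1') == 50000      # each nonzero digit occurs exactly 5 * 10^4 times

freqs = {digit: s.count(digit) / len(s) for digit in '0123456789'}
# the frequency of every digit is already within 0.03 of 1/10
# (0 is underrepresented because leading zeroes are forbidden)
assert all(abs(f - 0.1) < 0.03 for f in freqs.values())
```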

These notions were introduced by Borel [99] and are discussed in Chapters 2, 7, and 8. In particular, constructions of normal numbers are provided in Sections 7.7 and 7.8. See also Theorem 7.4.1 (the so-called Hot Spot Lemma according to [101]) for a further convenient characterization of normality in terms of limsups instead of limits. For a dynamical viewpoint, see Section 8.2, where the definition of a normal number is transferred to symbolic dynamical systems, and constructions with concatenation of words for languages with specification are provided.

1.3.5 Repetitions in Words

In combinatorics on words, a question that naturally arises is to study the repetitions that should occur or may be avoided in words. See in particular Chapter 5 and Chapters 4 and 5 in [79].

Concatenating a word w with itself k times is abbreviated w^k. In particular, w^0 = ε. Furthermore, for an integer m and a word w = w_1w_2⋯w_n, where w_i ∈ Σ for 1 ≤ i ≤ n (here it is convenient to start indexing with 1), the rational power

$$\begin{aligned} w^{m/n} \end{aligned}$$

is the word w^q w_1w_2⋯w_r, where m = qn + r with 0 ≤ r < n. For instance, we have

$$\begin{aligned} \mathtt{(abbab)^{9/5}=abbababba}\, . \end{aligned}$$
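A one-line computation (ours) of this rational power:

```python
def rational_power(w, m):
    """w^{m/n} with n = |w|: q full copies of w followed by its first r letters, m = qn + r."""
    q, r = divmod(m, len(w))
    return w * q + w[:r]

assert rational_power('abbab', 9) == 'abbababba'   # (abbab)^{9/5}
```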

Consider now some definitions that have to do with repetitions in words. A square is a nonempty word of the form xx, where x ∈ Σ^*. An example of a square in English is the word murmur, with x equal to mur. An overlap is a word of the form axaxa, where a ∈ Σ and x ∈ Σ^*. The word alfalfa is an example of an overlap in English, with x equal to lf. It is obvious that every overlap has a square as a prefix. For any integer k ≥ 2, a k-power is a nonempty word of the form x^k. Thus a 2-power is a square, and a 3-power is a cube. A nonempty word that is not a k-power for any k ≥ 2 is primitive.

Let us say a few words about avoidance (which is the topic of Chapter 5). It is an easy exercise to show that, over a 2-letter alphabet, every word of length at least 4 contains a square. This raises several questions. Over a 3-letter alphabet, can we build longer words with no square as a factor? In particular, does there exist an infinite word with no square in it? Also, over a 2-letter alphabet, since squares cannot be avoided, could we avoid cubes or even overlaps?

We say that a word w (finite or infinite) is square-free (or avoids squares) if no factor of w is a square. A finite or infinite word is overlap-free if it contains no factor that is an overlap. Thue [563] was the first to show the existence of an infinite overlap-free binary word. The Thue–Morse word (see Example 1.3.6) is overlap-free. See [79, Chapter 4] for more on avoidable repetitions and regularities in words. More generally, a (finite or infinite) word is k-power-free (or avoids k-powers) if none of its factors is a k-power. For instance, one can check that abbabaabbaab is overlap-free (it is indeed a prefix of the Thue–Morse word). The goal of Chapter 5 is to present general techniques to prove positive or negative results about the appearance of a repetition pattern. The general question is to know whether an infinite word avoiding a given pattern exists over an alphabet of a given size. Another question is to consider the growth function (in the sense of Definition 1.5.7) of the language of finite words avoiding a particular pattern.
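Both claims, that every binary word of length 4 contains a square and that abbabaabbaab is overlap-free, can be verified by brute force; the Python sketch below (ours) tests all factors of all candidate lengths.

```python
from itertools import product

def has_square(w):
    """Does w contain a nonempty factor of the form xx?"""
    n = len(w)
    return any(w[i:i + L] == w[i + L:i + 2 * L]
               for i in range(n) for L in range(1, (n - i) // 2 + 1))

def has_overlap(w):
    """Does w contain a factor of the form a x a x a (a word of length 2L + 1 with period L)?"""
    n = len(w)
    return any(i + 2 * L < n
               and w[i:i + L] == w[i + L:i + 2 * L]
               and w[i + 2 * L] == w[i]
               for i in range(n) for L in range(1, (n - i) // 2 + 1))

assert has_square('murmur') and has_overlap('alfalfa')

# every binary word of length 4 contains a square:
assert all(has_square(''.join(w)) for w in product('ab', repeat=4))

# abbabaabbaab (a prefix of the Thue-Morse word) is overlap-free but not square-free:
assert has_square('abbabaabbaab') and not has_overlap('abbabaabbaab')
```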

Many variations on these topics exist. For instance, an abelian square is a word of the form uv where u and v are abelian equivalent. One can check that over a 3-letter alphabet, every long enough finite word contains an abelian square.

In Chapter 6, the question addressed is this: given a nonperiodic word x ∈ Σ^ω, does there exist a finite nonempty set C and a mapping φ : Σ^+ → C such that, for each factorization x = u_1u_2u_3⋯, there exist i, j ≥ 1 with φ(u_i) ≠ φ(u_j)?

1.4 Morphisms

Infinite words of particular interest can be obtained by iterating morphisms of free monoids. Such words have many interesting combinatorial properties and can be generated by simple means.

Definition 1.4.1

A map h : Σ^* → Δ^*, where Σ and Δ are alphabets, is called a morphism if h satisfies h(xy) = h(x)h(y) for all x, y ∈ Σ^*. In particular, we have h(ε) = ε. When Σ = Δ, morphisms are also called substitutions.

A morphism may be specified by providing the values h(a) for all a ∈ Σ. For example, we may define the morphism t : {0, 1}^* → {0, 1}^* by

$$\begin{aligned} \begin{array}{rcl} {} 0 &\mapsto &01 \\ 1 &\mapsto &10. \end{array} \end{aligned} $$
(1.3)

This morphism is often referred to as the Thue–Morse morphism. The domain Σ^* of a morphism h is easily extended to the set Σ^ω of (one-sided) infinite words. Let h : Σ^* → Δ^* be a morphism and x = x_0x_1x_2⋯ be an infinite word over Σ. Simply consider the sequence (h(x_0⋯x_n))_{n≥0} of images of the prefixes of x. The limit of this sequence is h(x). In particular, if h : Σ^* → Σ^* and x is an infinite word such that h(x) = x, then x is said to be a fixed point of h.

A morphism h : Σ^* → Σ^* such that h(a) = ax for some a ∈ Σ and x ∈ Σ^* with h^i(x) ≠ ε for all i is said to be prolongable on a. The Thue–Morse morphism t given by (1.3) is prolongable on 0 (and also on 1). The first few iterations of t are

$$\begin{aligned} \begin{array}{rcl} t(0)&=&01\\ t^2(0)&=&0110\\ t^3(0)&=&01101001\\ t^4(0)&=&01101001 10010110\\ &\vdots& \\ \end{array} \end{aligned}$$

Since |t(0)| = |t(1)| = 2, we have |t^n(0)| = 2^n for all n ≥ 0. It is easy to prove that t^n(0) is a proper prefix of t^{n+1}(0), and thus the sequence (t^n(0))_{n≥0} converges to an infinite word. So we get the fixed point of t

$$\begin{aligned}t^\omega(0)= 0110100110010110\cdots. \end{aligned}$$

One can prove that the fixed point tω(0) is the Thue–Morse word introduced in Example 1.3.6.
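One can check this agreement on a long prefix; the sketch below (ours) iterates t six times from the letter 0 and compares the result with the bit-parity definition of Example 1.3.6 (with a and b renamed 0 and 1).

```python
def apply_morphism(h, w):
    """Image of the finite word w under the morphism given by the dict h."""
    return ''.join(h[a] for a in w)

t = {'0': '01', '1': '10'}

w = '0'
for _ in range(6):
    w = apply_morphism(t, w)          # t^6(0), a prefix of the fixed point, length 64

assert w.startswith('0110100110010110')
# position n carries the parity of the number of 1s in the binary expansion of n
assert all(w[n] == str(bin(n).count('1') % 2) for n in range(len(w)))
```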

More generally, if h : Σ^* → Σ^* is a morphism prolongable on a, we may repeatedly iterate h to obtain the infinite fixed point

$$\begin{aligned} h^\omega(a)= a\, x \, h(x) \,h^2(x) \, h^3(x)\cdots. \end{aligned}$$

This infinite word is said to be purely morphic.

The factor complexity of purely morphic words is well understood. The next result was stated by Pansiot in [467] and then generalized in [468]. For a comprehensive presentation, see [78, Section 4.7]. Recall that the case of eventually periodic words is settled by the Morse–Hedlund theorem.

Theorem 1.4.2

Let w be a purely morphic word. If w is not eventually periodic, then its factor complexity p w belongs to Θ(n), \(\varTheta (n\log \log n)\) , \(\varTheta (n\log n)\) , or Θ(n2).

Definition 1.4.3

A morphism h is non-erasing if h(a) ≠ ϵ for all a ∈ Σ. Otherwise it is erasing. A morphism is k-uniform if |h(a)| = k for all a ∈ Σ; it is uniform if it is k-uniform for some k. A 1-uniform morphism is often said to be a letter-to-letter morphism or a coding .

The Thue–Morse morphism t given in (1.3) is 2-uniform.

Example 1.4.4 (Fibonacci Word)

Another significant example of a purely morphic word is the Fibonacci word. It is obtained from the non-uniform morphism defined over the alphabet {0, 1} by σ : 0↦01, 1↦0,

$$\begin{aligned} \sigma^\omega(0)=(x_n)_{n\ge 0}=0100101001001010010100100101001001010010100\cdots\, . \end{aligned}$$

It is a Sturmian word and can be obtained as follows. Let \(\phi =(1+\sqrt {5})/2\) be the Golden mean . For all n ≥ 1, if ⌊(n + 1)ϕ⌋−⌊nϕ⌋ = 2, then xn−1 = 0; otherwise xn−1 = 1.
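The two constructions can be checked against each other; a small sketch comparing a long iterate of σ with the Beatty-difference formula (floating-point ϕ is accurate enough for the first hundred indices):

```python
import math

# Fibonacci word two ways: iterating sigma: 0 -> 01, 1 -> 0, and the
# Sturmian formula x_{n-1} = 0 iff floor((n+1)*phi) - floor(n*phi) = 2.
sigma = {"0": "01", "1": "0"}
w = "0"
for _ in range(12):                      # |w| = F_14 = 377 >= 100
    w = "".join(sigma[a] for a in w)

phi = (1 + math.sqrt(5)) / 2
beatty = "".join(
    "0" if math.floor((n + 1) * phi) - math.floor(n * phi) == 2 else "1"
    for n in range(1, 101)
)

assert w[:100] == beatty
assert w.startswith("0100101001001010010")
```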

An infinite word x over Δ is morphic if there exists a purely morphic word y over Σ and a morphism g : Σ→ Δ such that x = g(y).

We can always restrict ourselves to non-erasing prolongable morphisms and codings. This result was already stated in [154], and J.-J. Pansiot also considered it in [466]. For a proof, see [14]; an alternative short proof is given in [298]. This result is also discussed in detail in [134] and [146].

Theorem 1.4.5

Let f : Σ∗→ Σ∗ be a (possibly erasing) morphism that is prolongable on a letter a ∈ Σ. Let g : Σ∗→ Γ∗ be a (possibly erasing) morphism. If the word g(fω(a)) is infinite, there exist a non-erasing morphism h : Δ∗→ Δ∗ prolongable on a letter c ∈ Δ and a coding j : Δ→ Γ such that g(fω(a)) = j(hω(c)).

1.5 Languages and Machines

Formal language theory is mostly concerned with the study of the mathematical properties of sets of words. For a comprehensive exposition on regular (or rational) languages and automata theory, see, for instance, Sakarovitch’s book [518]. For the connections with infinite words, see [476]; for an overview, see the chapter [590]. Finally, see [555], Hopcroft and Ullman’s classic book [301], or its updated version [300] for general books on formal language theory.

1.5.1 Languages of Finite Words

Let Σ be an alphabet. A subset L of Σ∗ is said to be a language . Since a language is a set of words, we can apply all the usual set operations like union, intersection, or set difference: ∪, ∩, or ∖. The concatenation of words can be extended to define an operation on languages. If L, M are languages, LM is the language of the words obtained by concatenating a word in L and a word in M, i.e.,

$$\begin{aligned} LM=\{u v\mid u\in L, v\in M\}\, . \end{aligned}$$

We can of course define the concatenation of a language with itself, which permits us to introduce the powers of a language. Let \(n\in \mathbb {N}\), Σ be an alphabet, and L ⊆ Σ∗ be a language. The language Ln is the set of words obtained by concatenating n words in L. We set L0 := {ϵ}. In particular, we recall that Σn denotes the set of words of length n over Σ, i.e., concatenations of n letters in Σ. The (Kleene) star of the language L is defined as

$$\begin{aligned} L^*=\bigcup_{i\ge 0}L^i\, . \end{aligned}$$

Otherwise stated, L∗ contains the words that are obtained as the concatenation of an arbitrary number of words in L. Notice that the definition of the Kleene star is compatible with the notation Σ∗ introduced to denote the set of finite words over Σ. We also write L≤n as a shorthand for

$$\begin{aligned} L^{\le n}=\bigcup_{i=0}^n L^i\, . \end{aligned}$$

Note that if the empty word belongs to L, then L≤n = Ln. We recall that Σ≤n is the set of words over Σ of length at most n.

Example 1.5.1

Let L = {a, ab, aab} and M = {a, ab, ba} be two finite languages. We have

$$\begin{aligned} L^2=\{aa,aab,aaab,aba,abab,abaab,aaba,aabab,aabaab\} \end{aligned}$$

and

$$\begin{aligned} M^2=\{aa,aab,aba,abab,abba,baa,baab,baba\}\, . \end{aligned}$$

One can notice that Card (L2) = (Card L)2 but Card (M2) < (Card M)2. This is due to the fact that all words in L2 have a unique factorization as concatenation of two elements in L, but this is not the case for M, where (ab)a = a(ba). We can notice that

$$\begin{aligned} L^*=\{a\}^*\cup\{a^{i_1}ba^{i_2}b\cdots a^{i_n}ba^{i_{n+1}}\mid n\ge 1,\ i_1,\ldots,i_{n}\ge 1, \ i_{n+1} \geq 0\}\, . \end{aligned}$$
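The computations of Example 1.5.1 can be replayed mechanically; a quick sketch:

```python
# Concatenation of finite languages: LM = { uv | u in L, v in M }.
def cat(L, M):
    return {u + v for u in L for v in M}

L = {"a", "ab", "aab"}
M = {"a", "ab", "ba"}
L2, M2 = cat(L, L), cat(M, M)

assert len(L2) == len(L) ** 2          # 9 words: unique factorizations
assert len(M2) == 8 < len(M) ** 2      # (ab)a = a(ba) collapses two products
assert M2 == {"aa", "aab", "aba", "abab", "abba", "baa", "baab", "baba"}
```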

Since languages are sets of (finite) words, a language can be either finite or infinite . For instance, the language L∗ differs from {ϵ} if and only if L∗ is infinite. Let L be a language; we set L+ = LL∗. The mirror operation can also be extended from words to languages: \(\tilde {L}=\{\tilde {u}\mid u\in L\}\).

Definition 1.5.2

A language is prefix-closed (respectively suffix-closed ) if it contains all prefixes (respectively suffixes) of any of its elements. A language is factorial if it contains all factors of any of its elements.

Obviously, any factorial language is prefix-closed and suffix-closed. The converse does not hold. For instance, the language \(\{a^nb\mid n\ge 0\}\) is suffix-closed but not factorial.

Example 1.5.3

Connected with the Thue–Morse word (see Example 1.3.6), the set of words over {0, 1} containing an even number of ones is the language

$$\begin{aligned} \begin{array}{rcl} E&=&\{w\in\{0,1\}^*\mid |w|{}_1\equiv 0\pmod 2\}\\ &=&\{\epsilon,0,00,11,000,011,101,110,0000,0011,\ldots\}. \end{array} \end{aligned} $$

This language is closed under mirror, i.e., \(\tilde {E}=E\). Notice that the concatenation E{1}E is the language of words containing an odd number of ones and E ∪ E{1}E = E({ϵ}∪{1}E) = \(\{0,1\}^*\). Notice that E is neither prefix-closed, since 1001 ∈ E but 100∉E, nor suffix-closed.

Definition 1.5.4

The set of factors of a language L is denoted as Fac(L), whereas the set of prefixes of a language L is denoted as Pref(L). The notation w−1 L stands for w−1 L = {u ∣ wu ∈ L}.

If a language L over Σ can be obtained by applying to some finite languages a finite number of operations of union, concatenation, and Kleene star, then this language is said to be a regular language. This generation process leads to regular expressions which are well-formed expressions used to describe how a regular language is built in terms of these operations.

Note that the Chomsky–Schützenberger hierarchy introduced in the theory of formal languages provides a classification depending on the machine needed to recognize an infinite language of finite words. From a computational perspective, the simplest languages are the regular languages. They are accepted (or recognized) by finite automata, and described by regular expressions. One then has context-free languages that are recognized by non-deterministic pushdown automata, context-sensitive languages recognized by linear-bounded non-deterministic Turing machines, and lastly, recursively enumerable languages recognized by Turing machines. See Section 2.1.2 for a similar hierarchy for Mahler functions and regular sequences.

From the definition of a regular language, the following result is immediate.

Theorem 1.5.5

The class of regular languages over Σ is the smallest subset of \(2^{\varSigma ^*}\) (for inclusion) containing the languages ∅ and {a} for all a ∈ Σ and closed under union, concatenation, and Kleene star.

Example 1.5.6

For instance, the language L over {0, 1} whose words do not contain the factor 11 is regular. It is called the Golden mean shift . This language can be described by the regular expression L = {0}∗{1}{0, 01}∗∪{0}∗. Otherwise stated, it is generated from the finite languages {0}, {0, 01}, and {1} by applying union, concatenation, and star operations. Its complement in Σ∗ is also regular and is described by the regular expression Σ∗{11}Σ∗. The language E from Example 1.5.3 is also regular; we have the regular expression {0}∗({1}{0}∗{1}{0}∗)∗ describing E.
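A sketch using Python's re module: the pattern 0*(1(0|01)*)? transcribes the expression {0}∗{1}{0, 01}∗∪{0}∗, and its agreement with the defining property "no factor 11" can be checked exhaustively on short words:

```python
import re
from itertools import product

# The golden mean shift: words over {0,1} without factor 11.
# 0*(1(0|01)*)? transcribes {0}*{1}{0,01}* U {0}* as a regex.
golden = re.compile(r"0*(1(0|01)*)?")

for n in range(10):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert (golden.fullmatch(w) is not None) == ("11" not in w)
```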

Definition 1.5.7

Let L ⊆ Σ∗ be a language over the alphabet Σ. The growth function of L is the map

$$\begin{aligned} g_L:\mathbb{N}\to\mathbb{N},\ n\mapsto \mathrm{Card} (L\cap \varSigma^n)\, . \end{aligned}$$

In particular, g L (n) ≤ (Card Σ)n for all n ≥ 0. Note that the complexity function of an infinite word x (see Definition 1.3.5) is exactly the growth function of the language Fac(x) of x.
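The growth function of Fac(x), i.e., the factor complexity, can be estimated from a long finite prefix; a sketch for the Thue–Morse word (a prefix of length 2¹² is more than enough for these small n):

```python
# Factor complexity of the Thue-Morse word, computed on a finite prefix.
t = {"0": "01", "1": "10"}
w = "0"
for _ in range(12):
    w = "".join(t[a] for a in w)       # prefix of length 2**12

def growth(w, n):
    """Number of distinct factors of length n of the finite word w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})

values = [growth(w, n) for n in range(1, 9)]
assert values[:4] == [2, 4, 6, 10]     # first complexity values of Thue-Morse
assert all(v <= 4 * n for n, v in enumerate(values, start=1))  # linear growth
```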

1.5.2 Formal Series

Let R be a semiring (see Definition 1.2.2). We can consider a map m from Σ∗ to R. This map can be represented as a formal series

$$\begin{aligned} S=\sum_{w\in\varSigma^*} m(w)\, w\, . \end{aligned}$$

This means that the coefficient (S, w) of the series S for the word w is given by m(w). The set of these formal series is denoted by R〈〈Σ∗〉〉 and has a semiring structure for the two operations defined as follows:

$$\begin{aligned} (S+T,w)=(S,w)+(T,w) \end{aligned}$$

and

$$\begin{aligned} (ST,w)=\sum_{uv=w}(S,u)(T,v)\, . \end{aligned}$$

In particular, a finite word w of length n can be factored as w = uv in exactly n + 1 ways, so the sum above is finite. When R is the Boolean semiring \(\mathbb {B}\), then \(\mathbb {B}\langle \langle \varSigma ^*\rangle \rangle \) is just the set of languages over Σ. As a prominent example, Mahler functions are studied in detail in Chapter 2.

1.5.3 Codes

A subset X ⊂ Σ+ is a code if every word in X∗ has a unique factorization with factors in X, i.e.,

$$\begin{aligned} (x_1\cdots x_m=y_1\cdots y_n,\ x_1,\ldots,x_m,y_1,\ldots,y_n\in X) \Rightarrow (m=n\ \text{ and }\ x_i=y_i\ \forall i)\, . \end{aligned}$$

As an example, the set X = {a, ab, ba} is not a code because the word aba has two X-factorizations: a(ba) and (ab)a. The language \(\{a^ib\mid i\ge 0\}\) is clearly a code. For an introduction to codes, see Bruyère’s chapter in [386].

Let X be a set of words where no word in X is a proper prefix of another word in X. Then X is said to be a prefix code . The terminology of code comes from the following proposition.

Proposition 1.5.8

A subset X ⊂ Σ+ is a code if and only if any morphism f : Γ∗→ Σ∗ induced by a one-to-one correspondence (i.e., a bijection) from Γ to X is one-to-one (injective).

The notion can be extended to deal with infinite words. A subset X ⊂ Σ+ is an ω-code if every word in Σω has at most one factorization with words in X. As an example, X = {a, ab, bb} is a code but it is not an ω-code. The infinite word abbb⋯ has two X-factorizations (a, bb, bb, …) and (ab, bb, bb, …).
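Both examples can be checked with a short dynamic program counting X-factorizations; for the ω-code failure we can only exhibit the two factorizations on prefixes:

```python
from functools import lru_cache

def count_factorizations(w, X):
    """Number of ways to write the finite word w as a product of words of X."""
    @lru_cache(maxsize=None)
    def go(i):
        if i == len(w):
            return 1
        return sum(go(i + len(x)) for x in X if w.startswith(x, i))
    return go(0)

assert count_factorizations("aba", ("a", "ab", "ba")) == 2   # a(ba) = (ab)a

# Y = {a, ab, bb} is a code: finite words factorize in at most one way...
Y = ("a", "ab", "bb")
assert all(count_factorizations("a" + "b" * n, Y) <= 1 for n in range(20))

# ...but not an omega-code: both (a, bb, bb, ...) and (ab, bb, bb, ...)
# spell out the same infinite word abbb...
w = "a" + "b" * 100
assert w.startswith("".join(["a"] + ["bb"] * 10))
assert w.startswith("".join(["ab"] + ["bb"] * 10))
```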

1.5.4 Automata

As we shall briefly explain in this section, the regular languages are exactly the languages recognized by finite automata. We start with non-deterministic automata in Definition 1.5.9, then we present the deterministic ones in Definition 1.5.13. Finally, we introduce automata with output in Definition 1.5.17. The notions recalled here will be used in particular in Section 7.5 in connection with normality, and in Chapter 10 with the notion of Mealy automaton.

Definition 1.5.9

A finite automaton is a labeled graph given by a 5-tuple \(\mathcal {A}=(Q,\varSigma ,E,I,T)\) where Q is the (finite) set of states, E ⊆ Q × Σ× Q is the finite set of edges defining the transition relation , I ⊆ Q is the set of initial states, and T ⊆ Q is the set of terminal (or final) states . A path in the automaton is a sequence

$$\begin{aligned} (q_0,u_0,q_1,u_1,\ldots,q_{k-1},u_{k-1},q_{k}) \end{aligned}$$

such that, for all i ∈{0, …, k − 1}, (q i , u i , qi+1) ∈ E. The word u0⋯uk−1 is the label of the path. Such a path is successful if q0 ∈ I and q k  ∈ T. The language \(L(\mathcal {A})\) recognized (or accepted) by \(\mathcal {A}\) is the set of labels of all successful paths in \(\mathcal {A}\).

Any finite automaton \(\mathcal {A}\) gives a partition of Σ∗ into \(L(\mathcal {A})\) and \(\varSigma ^*\setminus L(\mathcal {A})\). When depicting an automaton, initial states are marked with an incoming arrow and terminal states are marked with an outgoing arrow. A transition (q, u, r) is represented by a directed edge from q to r with label u, written \(q\stackrel {u}{\longrightarrow }r\).

Example 1.5.10

In Figure 1.1 the automaton has two initial states p and r and three terminal states q, r, and s. For instance, the word ba is recognized by the automaton. There are two successful paths corresponding to the label ba: (p, b, q, a, s) and (p, b, p, a, s). For this latter path, we can write \(p\stackrel {b}{\longrightarrow }p\stackrel {a}{\longrightarrow }s\). On the other hand, the word baab is not recognized by the automaton.

Fig. 1.1
figure 1

A finite automaton.

Example 1.5.11

The automaton in Figure 1.2 recognizes exactly the language E of the words having an even number of 1 from Example 1.5.3.

Fig. 1.2
figure 2

An automaton recognizing words with an even number of 1.

Definition 1.5.12

Let \(\mathcal {A}=(Q,\varSigma ,E,I,T)\) be a finite automaton. A state q ∈ Q is accessible (respectively co-accessible ) if there exists a path from an initial state to q (respectively from q to some terminal state). If all states of \(\mathcal {A}\) are both accessible and co-accessible, then \(\mathcal {A}\) is said to be trim .

Definition 1.5.13

A finite automaton \(\mathcal {A}=(Q,\varSigma ,E,I,T)\) is said to be deterministic (DFA) if it has only one initial state q0 and, for each (q, a) ∈ Q × Σ, there is at most one state r ∈ Q such that (q, a, r) ∈ E. In that case, E defines a partial function \(\delta _{\mathcal {A}}:Q\times \varSigma \to Q\) that is called the transition function of \(\mathcal {A}\). The adjective partial means that the domain of \(\delta _{\mathcal {A}}\) can be a strict subset of Q × Σ. When the transition function is total, the DFA is said to be complete . To get a total function, one can add to Q a new “sink state” s and, for all (q, a) ∈ Q × Σ such that \(\delta _{\mathcal {A}}(q,a)\) is not defined, set \(\delta _{\mathcal {A}}(q,a):=s\), with \(\delta _{\mathcal {A}}(s,a):=s\) for all a ∈ Σ. This operation does not alter the language recognized by \(\mathcal {A}\). We can extend \(\delta _{\mathcal {A}}\) to be defined on Q × Σ∗ by \(\delta _{\mathcal {A}}(q,\epsilon )=q\) and, for all q ∈ Q, a ∈ Σ, and u ∈ Σ∗, \(\delta _{\mathcal {A}}(q,au)=\delta _{\mathcal {A}}(\delta _{\mathcal {A}}(q,a),u)\). Otherwise stated, the language recognized by \(\mathcal {A}\) is \(L(\mathcal {A})=\{u\in \varSigma ^*\mid \delta _{\mathcal {A}}(q_0,u)\in T\}\) where q0 is the initial state of \(\mathcal {A}\). If the automaton is deterministic, it is sometimes convenient to refer to the 5-tuple \(\mathcal {A}=(Q,\varSigma ,\delta _{\mathcal {A}},q_0,T)\).
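A complete DFA and its extended transition function fit in a few lines; a sketch for the language E of Example 1.5.3, with the two states named p (even number of ones, initial and terminal) and q (odd), presumably as in Figure 1.2:

```python
# DFA for E = words over {0,1} with an even number of 1s.
delta = {("p", "0"): "p", ("p", "1"): "q",
         ("q", "0"): "q", ("q", "1"): "p"}

def accepts(w, q0="p", terminal=("p",)):
    q = q0
    for a in w:                 # extended transition function on Q x Sigma*
        q = delta[(q, a)]
    return q in terminal

assert accepts("")                                 # epsilon has zero 1s
assert accepts("1001") and not accepts("100")
assert all(accepts(w) == (w.count("1") % 2 == 0)
           for w in ("0", "1", "0110", "11100", "101101"))
```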

As explained by the following result, referred to as the Rabin–Scott theorem [489], for languages of finite words, finite automata and deterministic finite automata recognize exactly the same languages.

Theorem 1.5.14

If L is recognized by a finite automaton \(\mathcal {A}\) , there exists a DFA, which can be effectively computed from \(\mathcal {A}\) , recognizing the same language L.

A proof and more details about classical results in automata theory can be found in textbooks like [300, 518] or [539]. For standard material in automata theory, we shall not refer again to these references below.

One important result is that the set of regular languages coincides with the set of languages recognized by finite automata. The following result is referred to as Kleene’s theorem [349].

Theorem 1.5.15

A language is regular if and only if it is recognized by a (deterministic) finite automaton.

Observe that if L and M are two regular languages over Σ, then L ∩ M, L ∪ M, LM, and L ∖ M are also regular languages. In particular, a language over Σ is regular if and only if its complement in Σ is regular.

Example 1.5.16

The regular language L = {0}∗{1}{0, 01}∗∪{0}∗ introduced in Example 1.5.6 is recognized by the DFA depicted in Figure 1.3. Notice that the state s is a sink : a non-terminal state in which all transitions remain.

Fig. 1.3
figure 3

A DFA accepting words without factor 11.

We introduce the notion of automaton with output (see also more generally Definition 7.5.1 for the notion of a transducer). It generalizes the classical DFA: if the output function takes at most two values, then it is a DFA. The extra output function will take care of the extra coding.

Definition 1.5.17

A deterministic finite automaton with output or DFAO for short is given by a 5-tuple \(\mathcal {A}=(Q,q_0,A,\delta ,\mu )\) where Q is a finite set of states, q0 ∈ Q is the initial state, δ : Q × A → Q is the transition function, and μ : Q → B is the output map (where B is some finite set).

Finite automata accepting languages of infinite words are not presented here. Büchi automata (where an accepting run goes infinitely often through an accepting state) are introduced in Section 3.6.

1.6 Sequences and Machines

1.6.1 Automatic Sequences

We now consider how finite automata can be used to generate sequences with values in a finite alphabet, namely, we present the automatic sequences. As we shall soon see, they are particular morphic words and are deeply linked with the integer base-k numeration system. They were introduced by A. Cobham [156] under the name uniform tag sequences. Automatic sequences will appear in Chapters 2, 3, and 4. See in particular Section 2.2 for definitions, properties and examples, and connections with Mahler functions. We will recall that automatic sequences may be obtained as the image under a coding of the fixed point of a k-uniform morphism. Equivalently, for all n ≥ 0, the nth symbol of such a sequence is the output of a deterministic finite automaton with output fed with the k-ary expansion of n.

Definition 1.6.1

Let k ≥ 2. Consider an infinite word w = g(fω(a)) where f : Σ→ Σ is a k-uniform morphism prolongable on a and g : Σ→ Γ is a coding. We say that w is k-automatic.

Observe that |fn(a)| = kn for all n ≥ 0. We first consider the “internal sequence,” i.e., the fixed point x = fω(a) = x0 x1 x2⋯. Let j be such that k ≤ j < k2; then j = kq + r with 1 ≤ q < k and 0 ≤ r < k. The symbol x j is the (r + 1)st symbol occurring in f(x q ). As depicted in Figure 1.4, this simply comes from one iteration of the k-uniform morphism.

Fig. 1.4
figure 4

Iterating a k-uniform morphism (with k = 4).

We obtain the following result by induction on m ≥ 0. Even though it is not surprising, it has an important consequence about how the word can be obtained.

Lemma 1.6.2

Let j be such that km ≤ j < km+1 , for some m ≥ 0. Then j = kq + r with km−1 ≤ q < km and 0 ≤ r < k, and the symbol x j is the (r + 1)st symbol occurring in f(x q ).

The quotient ⌊j/k⌋ of the Euclidean division of j by k is denoted by j DIV k. So to speak, for any symbol x j occurring in x = fω(a), we can track its history: x j has been produced by f from xj DIV k. The latter symbol appears itself in the image by f of x(j DIV k) DIV k, and so on and so forth.

Note that if the base-k expansion of j is rep k (j) = c ℓ ⋯ c1 c0, then the base-k expansion of j DIV k is c ℓ ⋯ c1. This simple observation permits one to easily track the past of a given symbol by considering the prefixes of rep k (j). Consider, for instance, the symbol t28 occurring in the Thue–Morse word:

$$\begin{aligned} \mathbf{t}=0\underline{1}1\overline{0}100\underline{1}100101\overline{1}0 100101100110\underline{1}001\cdots\, . \end{aligned}$$

Since rep2(28) = 11100, this symbol comes from t14 because rep2(14) = 1110. Then t14 appears in the image of t7, itself appearing in the image of t3 and finally in the image of t1.
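The chain of divisions above is easy to verify programmatically; a sketch tracking t28 back to t1:

```python
# History of t_28 in the Thue-Morse word: j -> j DIV 2 drops the last
# binary digit of rep_2(j).
t = {"0": "01", "1": "10"}
w = "0"
for _ in range(6):
    w = "".join(t[a] for a in w)    # prefix of length 64

j, history = 28, []
while j > 0:
    history.append(j)
    j //= 2

assert history == [28, 14, 7, 3, 1]
assert [bin(j)[2:] for j in history] == ["11100", "1110", "111", "11", "1"]

# Each symbol is produced by the morphism from its parent x_{j DIV 2}:
for j in history:
    assert w[j] == t[w[j // 2]][j % 2]
```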

But Lemma 1.6.2 provides some extra knowledge. Let j be such that j = kq + r with km−1 ≤ q < km and 0 ≤ r < k, for some m ≥ 0. We have just explained how x j comes from x q . But the knowledge of x q and r entirely determines x j . It is thus time to explain where the term automatic sequence comes from.

We can associate with a k-uniform morphism f : Σ∗→ Σ∗ and a letter a ∈ Σ a DFA where δ f (b, i) = wb,i if f(b) = wb,0 ⋯wb,k−1. Note that the alphabet Σ is the set of states of this automaton.

Example 1.6.3

Consider the morphism f and the associated automaton depicted in Figure 1.5.

Fig. 1.5
figure 5

A 3-uniform morphism and the associated automaton \(\mathcal {A}_f\).

The next propositions explain the terminology of automatic sequences.

Proposition 1.6.4

Let x = fω(a) = x0 x1⋯ with f a k-uniform morphism. With the above notation, for all j ≥ 0,

$$\begin{aligned} x_j=\delta_f(a,\mathrm{rep}_k(j))\, . \end{aligned}$$

Proof

This is a direct consequence of Lemma 1.6.2. □

The converse also holds.

Proposition 1.6.5

Let \(\mathcal {A}\) be a DFA with set of states Σ, alphabet {0, …, k − 1}, initial state a, and transition function δ such that δ(a, 0) = a. Then the word x = x0 x1 x2⋯ defined by x j  = δ(a, rep k (j)), for all j ≥ 0, is the fixed point of a k-uniform morphism f prolongable on a where f(b) = δ(b, 0)⋯δ(b, k − 1) for all b ∈ Σ.

Proof

This is again a direct consequence of Lemma 1.6.2. □

The reader may object that we have not taken into account that an extra coding can be applied to x = f(x). This does not require many changes: we simply have to make use of automata with output, as stated below in Cobham’s theorem on automatic sequences [156].

Theorem 1.6.6

Let w = w0 w1 w2⋯ be an infinite word over an alphabet Γ. It is of the form g(fω(a)), where f : Σ∗→ Σ∗ is a k-uniform morphism prolongable on a ∈ Σ and g : Σ→ Γ is a coding, if and only if there exists a DFAO \(\mathcal {A}=(\varSigma ,a,\{0,\ldots ,k-1\},\delta ,\mu )\) such that δ(a, 0) = a and, for all j ≥ 0, w j  = μ(δ(a, rep k (j))).

Proof

Proceed as above; the coding g coincides with the output function μ. □

Example 1.6.7

From the morphism t given in (1.3) generating the Thue–Morse word, we derive the automaton depicted in Figure 1.2. Again considering 28, which is written 11100 in base 2, if we start from the initial state p and we read consecutively the symbols in rep2(28) from left to right, then we follow some path in the automaton, and the state q finally reached gives the symbol t28. The output function maps p to 0 and q to 1.

Example 1.6.8

Let us consider a more intricate example where a coding, and thus an output function, is used. The morphism f and the coding g are given in Figure 1.6. The corresponding automaton is represented on the right of the same figure. We have

$$\begin{aligned} f^\omega(\mathtt{a})=\mathtt{acabaccaacababacacabaccaaccaacab}\cdots \end{aligned}$$

and

$$\begin{aligned} g(f^\omega(\mathtt{a}))=00010000000101000001000000000001\cdots\, . \end{aligned}$$
Fig. 1.6
figure 6

A 2-uniform morphism, a coding and the corresponding DFAO.

Again, the jth symbol in g(fω(a)) can be readily obtained from rep2(j) fed to the DFAO represented in Figure 1.6 where the states contain the information about the value of the output function.
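The morphism and coding of Figure 1.6 are not reproduced here, but they are forced by desubstituting the displayed words; the following sketch assumes f: a↦ac, b↦ca, c↦ab and g: a↦0, b↦1, c↦0 (an assumption recovered from the text) and replays the example:

```python
# Example 1.6.8 replayed, with f and g recovered from the displayed words.
f = {"a": "ac", "b": "ca", "c": "ab"}
g = {"a": "0", "b": "1", "c": "0"}

x = "a"
for _ in range(5):
    x = "".join(f[s] for s in x)    # f^5(a), 32 letters

assert x == "acabaccaacababacacabaccaaccaacab"
assert "".join(g[s] for s in x) == "00010000000101000001000000000001"

def dfao(j):
    """Output for index j: feed rep_2(j) to delta(b, i) = f(b)[i], then code."""
    q = "a"
    for c in bin(j)[2:]:
        q = f[q][int(c)]
    return g[q]

assert all(dfao(j) == "".join(g[s] for s in x)[j] for j in range(32))
```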

Now we turn to the factors occurring in an automatic sequence w = g(x), where x is a fixed point of the k-uniform morphism f : Σ→ Σ. Let u be a factor of length n occurring in x. There exists i such that ki−1 ≤ n < ki. Note that |fi(b)| = ki for all b ∈ Σ. We consider the factorization of x into consecutive blocks of length ki of the form fi(b). Therefore, the factor u either occurs inside some fi(b) or it overlaps two images, i.e., u occurs in fi(bc) for some letters b, c ∈ Σ. Actually, there exist two letters b and c such that fi(bc) = pus with |p| < ki. This last condition tells us that u starts inside fi(b). Such a simple observation, where we look backwards at the images of the morphism, as suggested by Figure 1.7, is sometimes called a desubstitution . It provides us with an upper bound on the number of factors of length n that may occur in x: the number of pairs of letters (b, c) is (Card Σ)2 and u should start in one of the ki symbols of fi(b). Therefore, the number of factors of length n in x is at most

$$\begin{aligned} (\mathrm{Card}\, \varSigma)^2\, k^i\le (\mathrm{Card}\, \varSigma)^2\, k\, n\, . \end{aligned}$$

We can even replace (Card Σ)2 with p x (2) because only the factors bc occurring in x give factors of the form fi(b)fi(c) occurring in x = fi(x). Moreover, every factor of length n of w = g(x) is the image under g of some factor of length n of x, so applying the coding g cannot increase the number of factors:

$$\begin{aligned} \mathrm{Card}(\mathrm{Fac}(\mathbf{w})\cap \varGamma^n)\le \mathrm{Card} \{ g(u)\mid u\in \mathrm{Fac}(\mathbf{x})\cap \varSigma^n\}\le \mathrm{Card}(\mathrm{Fac}(\mathbf{x})\cap \varSigma^n)\, , \end{aligned}$$

and so we have obtained the following result.

Fig. 1.7
figure 7

Iterating a 2-uniform morphism.

Theorem 1.6.9

Let w be a k-automatic sequence. Then p w (n) is in \(\mathcal {O}(n)\).

A proof of the following result can be found in [14, Section 8.4].

Theorem 1.6.10

If the frequency of a letter in a morphic sequence exists, then it is an algebraic number. If the frequency of a letter in an automatic sequence exists, then it is a rational number.

To conclude this section, we present another characterization of k-automatic sequences. This is not the last one; in Chapter 3, Section 3.3, a logical characterization of k-automatic sequences will be discussed, whereas Chapter 4 will provide an algebraic characterization in terms of polynomial identities (see Corollary 4.5.3).

Definition 1.6.11

Let k ≥ 2 be an integer. Given a sequence s = (s(n))n≥0, we define a particular set of subsequences called the k-kernel of s

$$\begin{aligned} \mathrm{Ker}_k(s):=\left\{(s(k^\ell n+r))_{n\geq 0}\mid \ell\geq 0,\ 0\leq r< k^\ell\right\}\, . \end{aligned}$$

An equivalent definition of the k-kernel is to introduce k operators of k-decimation acting on the set of sequences and defined, for r ∈{0, …, k − 1}, by

$$\begin{aligned} \rho_{k,r} ((s(n))_{n\ge 0}) = (s(kn+r))_{n\ge 0}\, . \end{aligned}$$

Thus Ker k (s) is the set of sequences of the form

$$\begin{aligned} \rho_{k,r_1} \circ \cdots \circ \rho_{k,r_m}((s(n))_{n\ge 0}) \end{aligned} $$
(1.4)

for all m ≥ 0 and r1, …, r m  ∈{0, …, k − 1}. These decimation operators are close to the Cartier operators discussed in Chapter 2. The following result appeared in Eilenberg’s book [211]. Note that if a sequence t belongs to Ker k (s), then ρk,r(t) also belongs to Ker k (s).

Theorem 1.6.12

A sequence is k-automatic if and only if its k-kernel is finite.

Example 1.6.13

The 2-kernel of the Thue–Morse sequence contains exactly two sequences (the sequence itself and its “complement”). Indeed, letting s2(n) denote the sum of the digits of the binary expansion of n, we have

$$\begin{aligned} s_2(2n)= s_2(n), \quad s_2(2n+1)= s_2(n)+1\, . \end{aligned} $$
(1.5)
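The finiteness of the 2-kernel can be explored mechanically: encode the kernel element (s(2ℓ n + r))n≥0 by the pair (ℓ, r) and compare length-256 prefixes, which is enough to separate the two sequences here; a sketch:

```python
# Closure of the Thue-Morse sequence t(n) = s_2(n) mod 2 under the
# decimation operators rho_{2,c}: the pair (l, r) encodes the sequence
# n -> t(2^l * n + r), and rho_{2,c} sends (l, r) to (l + 1, r + c * 2^l).
def s2(n):
    return bin(n).count("1")

N = 256
def prefix(l, r):
    return tuple(s2((1 << l) * n + r) % 2 for n in range(N))

seen = {prefix(0, 0)}
frontier = [(0, 0)]
while frontier:
    l, r = frontier.pop()
    for c in (0, 1):
        nl, nr = l + 1, r + c * (1 << l)
        p = prefix(nl, nr)
        if p not in seen:
            seen.add(p)
            frontier.append((nl, nr))

# t(2n) = t(n) and t(2n+1) = 1 - t(n): the kernel is {t, complement of t}.
assert len(seen) == 2
```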

1.6.2 Regular Sequences

We have seen that k-automatic sequences may be defined through the finiteness of their k-kernel (Theorem 1.6.12). This characterization is used to extend the notion to sequences taking infinitely many values. Allouche and Shallit considered sequences taking values in a ring R containing a commutative Noetherian ring R′ (i.e., every ideal of R′ is finitely generated). Examples of such rings R′ are given by all finite rings, all principal ideal domains (in particular \(\mathbb {Z}\) and the rings of polynomials with coefficients in a field), and all fields. We may consider linear combinations with coefficients in R′ (R′-linear combinations) of sequences in \(R^{\mathbb {N}}\). Endowed with pointwise addition and multiplication by an element of R′, the set \(R^{\mathbb {N}}\) has an R′-module structure: if r = (r(n))n≥0 and s = (s(n))n≥0 belong to \(R^{\mathbb {N}}\) and α belongs to R′, then, for all \(n\in \mathbb {N}\),

$$\begin{aligned} (r+s)(n)=r(n)+s(n)\end{aligned} $$

and

$$\begin{aligned} (\alpha\cdot r)(n)=\alpha\cdot r(n).\end{aligned} $$

In this short section, we mainly consider sequences in \(\mathbb {Z}^{\mathbb {N}}\), i.e., \(R=R'=\mathbb {Z}\). We will encounter regular sequences in Chapters 2, 3, and 4 of this book. To have stand-alone chapters, these notions will also be repeated there. In Chapter 3 (see in particular Section 3.4.1), k-regularity will be extended to sequences taking values in a semiring.

Regular sequences appeared in [16]. Many examples are given in [15]. See also [14, Chapter 16] and the updated version of Berstel and Reutenauer’s book [77] where a chapter is devoted to regular sequences and linked with rational series.

Let M be an R-module and X ⊂ M a subset. The submodule generated by X is the intersection of all submodules of M containing X. It is denoted by 〈X〉 and is the set of all finite R-linear combinations of elements of X. A module is finitely generated (over R) when it is generated by a finite set (i.e., it is the R-span of a finite set). One also says that the module is of finite type or even finite over R. Note that a finite set of generators is not necessarily a basis.

Definition 1.6.14

Let k ≥ 2 be an integer. A sequence s = (s(n))n≥0 taking integer values is k-regular if the \(\mathbb {Z}\)-module generated by its k-kernel \(\left \langle \mathrm {Ker}_k(s)\right \rangle \) is finitely generated, i.e., there exists a finite number of sequences in \(\mathbb {Z}^{\mathbb {N}}\)

$$\begin{aligned} t_1=(t_1(n))_{n\geq 0}, \ldots, t_\ell=(t_\ell(n))_{n\geq 0} \end{aligned}$$

such that

$$\begin{aligned} \left\langle \mathrm{Ker}_k(s)\right\rangle=\langle t_1,\ldots,t_\ell\rangle.\end{aligned} $$

In particular, every sequence in Ker k (s) is a \(\mathbb {Z}\)-linear combination of the t j ’s. For all i ≥ 0 and all r ∈{0, …, ki − 1}, there exist integers ci,1, …, ci,ℓ such that

$$\begin{aligned} \forall n \ge 0, \quad s(k^i n + r) = \sum_{j=1}^\ell c_{i,j}\, t_j (n).\, \end{aligned}$$

One can consider another point of view: a sequence is said to be k-regular if its orbit under the action of the operators of k-decimation remains in a finite-dimensional vector space. Indeed, \(\mathbb {Z}\) is included in fields such as \(\mathbb {Q}\), \(\mathbb {R}\), or \(\mathbb {C}\); thus the sequences can be seen as elements of \(\mathbb {Q}^{\mathbb {N}}\), which is a \(\mathbb {Q}\)-vector space.

Remark 1.6.15

The original definition in [16] was formulated differently. Let R be a ring containing a commutative Noetherian ring R′. A sequence s = (s(n))n≥0 in \(R^{\mathbb {N}}\) is (R′, k)-regular if there exists a finite number of sequences in \(R^{\mathbb {N}}\)

$$\begin{aligned} t_1=(t_1(n))_{n\geq 0}, \ldots, t_\ell=(t_\ell(n))_{n\geq 0} \end{aligned}$$

such that every sequence in Ker k (s) is an R′-linear combination of t1, …, t ℓ . Thus the definition means that \(\left \langle \mathrm {Ker}_k(s)\right \rangle \subseteq \langle t_1,\ldots ,t_\ell \rangle \). Otherwise stated, \(\left \langle \mathrm {Ker}_k(s)\right \rangle \) is a submodule of a finitely generated R′-module (in general, this does not imply that the submodule itself is finitely generated). Since R′ is assumed to be Noetherian, every submodule of a finitely generated R′-module is finitely generated, and thus \(\left \langle \mathrm {Ker}_k(s)\right \rangle \) is finitely generated. This was the point of view adopted in Definition 1.6.14. In particular, if the setting does not assume that R′ is Noetherian (for instance, if R or R′ is a semiring), then Definition 1.6.14 would be stronger than simply requiring \(\left \langle \mathrm {Ker}_k(s)\right \rangle \subseteq \langle t_1,\ldots ,t_\ell \rangle \).

Example 1.6.16

The base-2 sum-of-digits function s2 gives the sequence

$$\begin{aligned} (s_2(n))_{n\ge 0}=0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, \ldots\, . \end{aligned}$$

(Notice that we can interchange the words function and sequence and also speak of k-regular functions when defined over \(\mathbb {N}\).) Clearly this sequence is unbounded: \(s_2(2^n-1)=n\) for all n. Nevertheless, in view of (1.5), the \(\mathbb {Z}\)-module generated by its 2-kernel is generated by the sequence (s2(n))n≥0 itself and the constant sequence (1)n≥0.
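The relations behind this example extend to the whole 2-kernel: since the digits of r < 2i and of n do not interact in 2i n + r, every kernel element equals s2(n) + s2(r), a \(\mathbb {Z}\)-linear combination of s2 and the constant sequence 1. A quick check:

```python
# s_2(2^i * n + r) = s_2(n) + s_2(r) for 0 <= r < 2^i: the 2-kernel of
# s_2 lies in the Z-module generated by s_2 and the constant sequence 1.
def s2(n):
    return bin(n).count("1")

assert all(s2(2 * n) == s2(n) for n in range(200))           # (1.5)
assert all(s2(2 * n + 1) == s2(n) + 1 for n in range(200))   # (1.5)
assert s2(2 ** 10 - 1) == 10                                 # unbounded values

for i in range(1, 6):
    for r in range(1 << i):
        assert all(s2((1 << i) * n + r) == s2(n) + s2(r) for n in range(64))
```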

Obviously, every k-automatic sequence is k-regular.

Proposition 1.6.17

Let s be a sequence taking finitely many different values, i.e., there exists a finite alphabet Σ such that s ∈ Σω . Let k ≥ 2. The sequence is k-automatic if and only if it is k-regular.

There is an intermediate class of sequences between k-automatic and k-regular sequences [130].

Definition 1.6.18

Let k ≥ 2 be an integer. The map rep k is extended to \(\mathbb {N}\times \mathbb {N}\) as follows. For all \(m,n\in \mathbb {N}\),

$$\begin{aligned} \mathrm{rep}_k(m,n)=\left(0^{M-|\mathrm{rep}_k(m)|}\mathrm{rep}_k(m),0^{M-|\mathrm{rep}_k(n)|}\mathrm{rep}_k(n)\right) \end{aligned}$$

where \(M=\max \{|\mathrm {rep}_k(m)|,|\mathrm {rep}_k(n)|\}\). The idea is that the shortest word is padded with leading zeroes to get two words of the same length.
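A minimal sketch of this padded pairing (function names are ours; we use the convention that \(\mathrm {rep}_k(0)\) is the empty word):

```python
def rep_k(n, k=2):
    """Canonical base-k expansion, most significant digit first (empty for 0)."""
    digits = ""
    while n > 0:
        digits = str(n % k) + digits
        n //= k
    return digits

def rep_k_pair(m, n, k=2):
    """rep_k on pairs: pad the shorter expansion with leading zeroes
    so both components have the common length M."""
    u, v = rep_k(m, k), rep_k(n, k)
    M = max(len(u), len(v))
    return (u.zfill(M), v.zfill(M))

print(rep_k_pair(5, 19))  # ('00101', '10011')
```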

A sequence (s(n))n≥0 of integers is said to be k-synchronized if the language \(\{\mathrm {rep}_k(n,s(n)) \mid n \in \mathbb {N}\}\) is accepted by some finite automaton reading pairs of digits.

As an example, the complexity function (p x (n))n≥0 of a k-automatic sequence x is k-synchronized [522]; we refer to Proposition 3.4.16. More results of this form are provided in Section 3.4. For results on the growth of regular sequences, see Section 2.3.

Proposition 1.6.19

Let s be a sequence taking finitely many different values, i.e., there exists a finite alphabet Σ such that \(s\in \varSigma ^{\omega }\). Let k ≥ 2. The sequence is k-automatic if and only if it is k-synchronized.

Similarly to recognizable formal series, with every k-regular sequence \((s(n))_{n\ge 0}\in \mathbb {Z}^{\mathbb {N}}\) is associated a linear representation (λ, μ, ν): there exist a positive integer r, a row vector \(\lambda \in \mathbb {Z}^{1\times r}\), a column vector \(\nu \in \mathbb {Z}^{r\times 1}\), and a matrix-valued morphism \(\mu :\{0,\ldots ,k-1\}\to \mathbb {Z}^{r\times r}\) such that

$$\begin{aligned} s(n)=\lambda \mu(c_0\cdots c_\ell) \nu \end{aligned}$$

for all \(c_\ell ,\ldots ,c_0\in \{0,\ldots ,k-1\}\) such that \(\mathrm {val}_k(c_\ell \cdots c_0)=\sum _{i=0}^\ell c_i\, k^i=n\). The converse also holds: if there exists a linear representation associated with the canonical k-ary expansion of integers (one has to take into account the technicality of representations with leading zeros), then the sequence is k-regular. See, for instance, [14, Theorem 16.2.3]. As a corollary, the nth term of a k-regular sequence can be computed with \(\lfloor \log _k(n)\rfloor \) matrix multiplications.

Proof

Let \(s=(s(n))_{n\ge 0}\in \mathbb {Z}^{\mathbb {N}}\) be a k-regular sequence. By definition, there exists a finite number of sequences \(t_1,\ldots ,t_\ell \) such that \(\left \langle \mathrm {Ker}_k(s)\right \rangle =\langle t_1,\ldots ,t_\ell \rangle \). In particular, each t j is a \(\mathbb {Z}\)-linear combination of elements in the k-kernel of s. We have finitely many t j 's, so \(t_1,\ldots ,t_\ell \) are linear combinations of finitely many elements in Ker k (s). Thus we can assume that \(\left \langle \mathrm {Ker}_k(s)\right \rangle \) is generated by finitely many elements from Ker k (s) itself. Without loss of generality, we will now assume that \(t_1,\ldots ,t_\ell \) belong to Ker k (s).

From (1.4), for all r ∈{0, …, k − 1} and all \(i\in \{1,\ldots ,\ell \}\), ρk,r(t i ) is a sequence in Ker k (s), and thus, there exist coefficients \((A_r)_{1,i},\ldots ,(A_r)_{\ell ,i}\) such that

$$\begin{aligned} \rho_{k,r}( t_i )=\sum_{j=1}^\ell (A_r)_{j,i}\, t_j\, .\end{aligned} $$

Notice that A r is an \(\ell \times \ell \) matrix. Roughly, if we were in a vector space setting, this means that the matrices A r represent the linear operators ρk,r in the basis \(t_1,\ldots ,t_\ell \). Let p ≥ 0 be an integer. Notice that if \(\mathrm {rep}_k(p)=r_m\cdots r_0\), then s(p) is the first term, i.e., corresponding to the index 0, of the sequence

$$\begin{aligned} (s(k^{m+1}n+p))_{n\ge 0}=\rho_{k,r_0}\circ\cdots\circ\rho_{k,r_m}\left((s(n))_{n\ge 0}\right)\, . \end{aligned}$$

We will use the fact that ρk,r is linear, i.e., if α, β are coefficients and v, w are two sequences, then ρk,r(αv + βw) = αρk,r(v) + βρk,r(w). It is easy to see that

$$\begin{aligned} \rho_{k,r_0}\circ\cdots\circ\rho_{k,r_m}\left(t_i\right)=\sum_{j=1}^\ell (A_{r_0}\cdot \cdots \cdot A_{r_m})_{j,i}\, t_j\, . \end{aligned}$$

If we have the following decomposition of s (in a vector space setting, we would have a unique decomposition of s in the basis t1, …, t )

$$\begin{aligned} s=\sum_{i=1}^\ell \sigma_i\, t_i \end{aligned}$$

then, by linearity,

$$\begin{aligned} (s(k^{m+1}n+p))_{n\ge 0}=\sum_{i=1}^\ell \sigma_i\, \sum_{j=1}^\ell (A_{r_0}\cdot \cdots \cdot A_{r_m})_{j,i}\, (t_j(n))_{n\ge 0}=\sum_{j=1}^\ell \tau_j\, (t_j(n))_{n\ge 0} \end{aligned}$$

where

$$\begin{aligned} \begin{pmatrix} \tau_1\\ \vdots\\ \tau_\ell\\ \end{pmatrix}=A_{r_0}\cdot \cdots \cdot A_{r_m} \begin{pmatrix} \sigma_1\\ \vdots\\ \sigma_\ell\\ \end{pmatrix}\, . \end{aligned}$$

Consequently, s(p) is obtained as

$$\begin{aligned} s(p)=\sum_{i=1}^\ell \tau_i\, t_i(0)=\begin{pmatrix} t_1(0) & \cdots &t_\ell (0)\\ \end{pmatrix} A_{r_0}\cdot \cdots \cdot A_{r_m} \begin{pmatrix} \sigma_1\\ \vdots\\ \sigma_\ell\\ \end{pmatrix}\, . \end{aligned}$$

For a reader familiar with rational series, the previous result can be reformulated as follows. A sequence (s(n))n≥0 is k-regular if and only if the formal series

$$\begin{aligned} \sum_{w\in\{0,\ldots,k-1\}^*} s(\mathrm{val}_k(w))\, w \end{aligned}$$

is recognizable (with the terminology of [77]; see Definition 3.4.1).

Example 1.6.20

For the sum-of-digits function given in Example 1.6.16, the sequence s2 = (s2(n))n≥0 has a (base-2) linear representation given by

$$\begin{aligned} \lambda= \begin{pmatrix} 0&1\\ \end{pmatrix},\ \mu(i)= \begin{pmatrix} 1&0\\ i&1\\ \end{pmatrix},\ \nu= \begin{pmatrix} 1\\ 0\\ \end{pmatrix}\, . \end{aligned}$$

We let 1 denote the constant sequence (1)n≥0. It does not belong to the 2-kernel of s2, but it belongs to the \(\mathbb {Z}\)-module generated by it because it is equal to ρ2,1(s2) − s2. Nevertheless, it is enough to see that ρ2,0(1) = ρ2,1(1) = 1 and take s2 and 1 as generators to proceed as in the proof above. From the following relations, we derive the two columns of the matrix μ(0)

$$\begin{aligned} \rho_{2,0}(s_2)=1\cdot s_2+ 0\cdot \mathbf{1},\quad \rho_{2,0}(\mathbf{1})=0\cdot s_2+ 1\cdot \mathbf{1} \end{aligned}$$

and for μ(1)

$$\begin{aligned} \rho_{2,1}(s_2)=1\cdot s_2+ 1\cdot \mathbf{1},\quad \rho_{2,1}(\mathbf{1})=0\cdot s_2+ 1\cdot \mathbf{1}\, . \end{aligned}$$

The vector λ is given by s2(0) = 0 and 1(0) = 1. The vector ν is obtained from s2 = 1 ⋅ s2 + 0 ⋅1. To compute s2(19), observe that rep2(19) = 10011. Thus we compute

$$\begin{aligned} \begin{pmatrix} 0&1\\ \end{pmatrix} \mu(1)\mu(1)\mu(0)\mu(0)\mu(1) \begin{pmatrix} 1\\ 0\\ \end{pmatrix}=3\, . \end{aligned}$$
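The computation above can be reproduced with a small generic evaluator (function names are ours), following the convention of the text in which the product reads the digits c0, c1, … least significant first; for this example μ(0) and μ(1) commute, so the digit order is immaterial:

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def linrep_value(lam, mu, nu, n, k=2):
    """s(n) = lam . mu(c_0) ... mu(c_l) . nu, where c_0 is the least
    significant base-k digit of n; one matrix product per digit."""
    d = len(nu)
    P = [[int(i == j) for j in range(d)] for i in range(d)]  # identity
    while n > 0:
        P = mat_mul(P, mu[n % k])
        n //= k
    return sum(lam[i] * P[i][j] * nu[j] for i in range(d) for j in range(d))

# The representation of s2 from this example: mu(i) = [[1, 0], [i, 1]].
lam, nu = [0, 1], [1, 0]
mu = {0: [[1, 0], [0, 1]], 1: [[1, 0], [1, 1]]}

assert linrep_value(lam, mu, nu, 19) == 3          # s2(19) = 3
assert all(linrep_value(lam, mu, nu, n) == bin(n).count("1")
           for n in range(256))
```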

Example 1.6.21

A less trivial example, considered in [201], counts the number of odd entries in the first n rows of Pascal's triangle. This sequence has a (base-2) linear representation given by

$$\begin{aligned} \lambda= \begin{pmatrix} 0&1\\ \end{pmatrix},\ \mu(0)= \begin{pmatrix} 3&6\\ 0&1\\ \end{pmatrix},\ \mu(1)= \begin{pmatrix} 0&-6\\ 1&5\\ \end{pmatrix},\ \nu= \begin{pmatrix} 1\\ 0\\ \end{pmatrix}\, . \end{aligned}$$
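A quick numerical check of this representation (helper names are ours). Row i of Pascal's triangle contains \(2^{s_2(i)}\) odd entries (a classical fact due to Glaisher), which gives an independent formula for the count; note that for these non-commuting matrices the product order giving a match reads the binary digits most significant first (conventions for linear representations differ up to transposition):

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

lam, nu = [0, 1], [1, 0]
mu = {0: [[3, 6], [0, 1]], 1: [[0, -6], [1, 5]]}

def value(n):
    """lam . mu(digits of n) . nu, reading the binary digits of n
    most significant first (mu(0) and mu(1) do not commute here)."""
    P = [[1, 0], [0, 1]]
    for c in map(int, bin(n)[2:]):
        P = mat_mul(P, mu[c])
    return sum(lam[i] * P[i][j] * nu[j] for i in range(2) for j in range(2))

def odd_in_first_rows(n):
    """Row i of Pascal's triangle has 2^{s_2(i)} odd entries, so the
    count over rows 0..n-1 is a sum of such powers of 2."""
    return sum(2 ** bin(i).count("1") for i in range(n))

assert all(value(n) == odd_in_first_rows(n) for n in range(1, 300))
```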

Remark 1.6.22

In [15, Section 6], a practical procedure is described to guess relations that a possibly k-regular sequence may satisfy. Consider a sequence (s(n))n≥0. The idea is to construct a matrix in which the rows represent truncated versions of elements of the k-kernel of (s(n))n≥0, together with row reduction. Start with a matrix having a single row, say, corresponding to the first m elements of the sequence. Then repeatedly add subsequences of the form \((s(k^en+r))_{n\ge 0}\) that are not linearly dependent on the previously stored sequences. From this, one obtains candidate relations that remain to be proven.
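A toy version of such a guessing procedure might look as follows (a heuristic sketch, not the exact procedure of [15]; the names and the naive reduction strategy are ours). It stores length-m truncations of kernel sequences, reduces each new one against those kept so far over \(\mathbb {Q}\), and keeps only the apparently independent ones:

```python
from fractions import Fraction

def guess_kernel_generators(s, k=2, depth=3, m=64):
    """Keep the pairs (k^e, r) whose kernel sequence (s(k^e n + r)) has a
    length-m truncation outside the rational span of those kept so far.
    Purely a guessing device: relations found this way remain to be proven."""
    kept, basis = [], []            # kept (q, r) pairs / reduced rows
    queue = [(1, 0)]                # (q, r) with q = k^e and 0 <= r < q
    while queue:
        q, r = queue.pop(0)
        if q > k ** depth:
            continue
        w = [Fraction(s(q * n + r)) for n in range(m)]
        for b in basis:             # naive elimination against kept rows
            p = next(i for i, x in enumerate(b) if x)
            if w[p]:
                c = w[p] / b[p]
                w = [wi - c * bi for wi, bi in zip(w, b)]
        if any(w):                  # apparently independent: keep it
            kept.append((q, r))
            basis.append(w)
            queue.extend((q * k, r + j * q) for j in range(k))
        # otherwise: a candidate linear relation has been found
    return kept

s2 = lambda n: bin(n).count("1")
print(guess_kernel_generators(s2))  # [(1, 0), (2, 1)]
```

On s2 the procedure keeps only the sequence itself and (s2(2n + 1))n≥0, in accordance with Example 1.6.16.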

Considering again the sum-of-digits function, Delange [191] showed that the summatory function of s2 exhibits a particular behavior (see also [14, Thm. 3.5.4]):

$$\begin{aligned} \frac{1}{N} \sum_{j=0}^{N-1}s_2(j)=\frac{1}{2} \log_2 N+ \mathcal{G}(\log_2 N) \end{aligned} $$
(1.6)

where \(\mathcal {G}\) is a continuous nowhere differentiable periodic function of period 1 (Figure 1.8).
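Since (1.6) is an exact identity and \(\mathcal {G}\) has period 1, doubling N must leave the quantity \(\frac {1}{N}\sum _{j<N}s_2(j)-\frac {1}{2}\log _2 N\) unchanged, which is easy to test numerically (the function name is ours):

```python
from math import log2

def delange_fluctuation(N):
    """G(log_2 N) = (1/N) sum_{j<N} s2(j) - (1/2) log_2 N, by (1.6)."""
    total = sum(bin(j).count("1") for j in range(N))
    return total / N - 0.5 * log2(N)

# Period 1 in log_2 N: doubling N leaves the value unchanged.
for N in range(1, 200):
    assert abs(delange_fluctuation(2 * N) - delange_fluctuation(N)) < 1e-9

# The fluctuation is small but genuinely nonconstant.
print([round(delange_fluctuation(N), 4) for N in (2, 3, 5, 7)])
```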

Fig. 1.8

The periodic function \(\mathcal {G}\) on [0, 1].

General results do exist for summatory functions of k-regular sequences. The result below can be found in [14, Thm. 16.4.1].

Theorem 1.6.23

Let a = (a(n))n≥0 and b = (b(n))n≥0 be k-regular sequences. Then c = a ⋆ b , where, for all n ≥ 0, \(c(n)=\sum _{i=0}^n a_i\, b_{n-i}\) , is k-regular.

Corollary 1.6.24

Let a = (a(n))n≥0 be a k-regular sequence. The sequence of partial sums

$$\begin{aligned} \left(\sum_{i=0}^n a_i\right)_{n\ge 0}\end{aligned} $$

is k-regular.

Proof

One simply takes for b the constant sequence (1)n≥0 in Theorem 1.6.23. □
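For a small numerical illustration (names are ours): convolving s2 with the constant sequence (1)n≥0, as in the corollary, produces the partial sums of s2:

```python
def convolve(a, b, N):
    """First N terms of c = a * b, with c(n) = sum_{i=0}^{n} a(i) b(n-i)."""
    return [sum(a(i) * b(n - i) for i in range(n + 1)) for n in range(N)]

s2 = lambda n: bin(n).count("1")
one = lambda n: 1

# Convolving with the constant sequence (1) gives the partial sums of s2.
print(convolve(s2, one, 10))  # [0, 1, 2, 4, 5, 7, 9, 12, 13, 15]
```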

A linear representation of the summatory sequence can easily be deduced from the linear representation of the sequence itself, see [201, Lemma 1] or Proposition 2.2.11 in Chapter 2. Let us state the following result obtained by Dumas [201, 202] (see also Theorem 2.3.13).

Theorem 1.6.25

Let k ≥ 2 be an integer. The summatory function of a k-regular sequence (u(n))n≥0 with a linear representation given by the matrices Γ0, …, Γk−1 admits an asymptotic expansion which is a sum of terms of the form

$$\begin{aligned} N^{\log_k\rho}\, \binom{\log_kN}{m}\, e^{i\theta\log_kN}\, \varphi(\log_kN) \end{aligned}$$

for the eigenvalues \(\rho e^{i\theta }\) of Γ := Γ0 + ⋯ + Γk−1 whose modulus ρ is larger than the joint spectral radius of Γ0, …, Γk−1 and where m is an integer bounded by the maximal size of a Jordan block associated with \(\rho e^{i\theta }\) and φ is a periodic function of period 1. For this asymptotic expansion, there is an error term in \({\mathcal O}(N^{\log _k r})\) for every r larger than the joint spectral radius of the matrices Γ0, …, Γk−1.

The definition of the joint spectral radius will be given in Chapter 2; see also Section 11.8.1. Similar results are also discussed by Drmota and Grabner in [78, Theorem 9.2.15]. Let us also mention another result (see [14, Theorem 3.5.1]) with stronger assumptions but avoiding error terms. In this result, if v belongs to \(\mathbb {C}^d\), then the notation ||v|| stands for the Euclidean norm of v defined by \(\left ( \sum _{i=1}^d |v_i|{ }^2 \right ) ^{\frac {1}{2}}\). Moreover, if M is a square matrix of dimension d with entries in \(\mathbb {C}\), then by ||M|| we mean the L2 norm, i.e., the matrix norm associated with the usual Euclidean norm on \(\mathbb {C}^d\) by the formula ||M|| =sup||x||=1||Mx||.

Theorem 1.6.26

Let k ≥ 2 be an integer. Suppose there exist an integer d ≥ 1, a sequence of vectors (V n )n≥0 , \(V_n\in \mathbb {C}^d\) , defined by

$$\begin{aligned} V_n= \begin{pmatrix} V_n^{(1)}\\ V_n^{(2)}\\ \vdots\\ V_n^{(d)}\\ \end{pmatrix}, \end{aligned}$$

and k square matrices Γ0, Γ1, …, Γk−1 of dimension d such that

  1. \(V_{kn+r} = \varGamma _r V_n\) for all n ≥ 0 and all r, 0 ≤ r < k.

  2. \(||V_n|| = O(\log n)\).

  3. There exist a d × d matrix Λ and a constant c > 0, with either ||Λ|| < c or Λ nilpotent, such that Γ := Γ0 + Γ1 + ⋯ + Γk−1 = cI + Λ.

The matrix Γ being clearly invertible, if ||Γ−1|| < 1, then there exists a continuous function \(G :\mathbb {R} \to \mathbb {C}^d\) of period 1 such that

$$\begin{aligned} \sum_{0\le n < N} V_n = N^{\log_k c}(I + c^{-1} \varLambda)^{\log_k N} G\left( \log_k N \right). \end{aligned}$$

1.7 Dynamical Systems

There are two main types of dynamical systems, namely, topological ones and measure-theoretic ones. Dynamical systems will be considered in particular in Chapters 8, 9, and 11.

1.7.1 Topological Dynamical Systems

Definition 1.7.1

A topological dynamical system (X, T) is defined as a compact metric space X together with a continuous map T from X to itself.

We are interested in iterating the map T, and we look at the orbit \(\mathcal {O}(x)\) of a point x ∈ X under the action of T, defined as

$$\begin{aligned}\mathcal{O}(x)=\{T^n(x)\colon n\in\mathbb{N}\}.\end{aligned}$$

The trajectory of x ∈ X is the sequence (Tn(x))n≥0.
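On a finite toy state space (not a compact metric space as in the definition, but convenient for illustration, and our own choice of example), orbits can be computed exactly:

```python
def orbit(T, x, max_iter=100):
    """The orbit {T^n(x)} of x, computed until it closes up (or max_iter)."""
    seen, y = [], x
    while y not in seen and len(seen) < max_iter:
        seen.append(y)
        y = T(y)
    return seen

# T(n) = 3n mod 10 on Z/10Z: the orbit of 1 is the cycle {1, 3, 9, 7}.
print(orbit(lambda n: 3 * n % 10, 1))  # [1, 3, 9, 7]
```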

A topological dynamical system (X, T) is minimal if, for all x in X, the orbit of x, i.e., the set \( \{T^n\mathbf {x} \mid n \in \mathbb {N} \}\), is dense in X. Let us note that if (X, S) is a minimal subshift, then (X, S) is periodic if and only if X is finite.

Two dynamical systems (X1, T1) and (X2, T2) are said to be topologically conjugate (or topologically isomorphic ) if there exists a homeomorphism f from X1 onto X2 which conjugates T1 and T2, that is:

$$\begin{aligned} f \circ T_1 = T_2 \circ f. \end{aligned}$$

If f is only assumed to be continuous and onto (not necessarily invertible), then (X1, T1) is said to factor onto (X2, T2), (X2, T2) is a factor of (X1, T1), and f is called a factor map.

1.7.2 Measure-Theoretic Dynamical Systems

We have considered here the notion of dynamical system, that is, a map acting on a given set, in a topological context. This notion can be extended to measurable spaces; we thus get measure-theoretic dynamical systems. For more details, one can refer, for instance, to [579]. See also Section 11.11.3.

Definition 1.7.2

A measure-theoretic dynamical system is defined as a system \((X, \mathcal {B}, \mu ,T)\), where \(\mathcal {B}\) is a σ-algebra, μ a probability measure defined on \(\mathcal {B}\), and T : X → X is a measurable map which preserves the measure μ, i.e., for all \(B \in \mathcal {B}\), μ(T−1(B)) = μ(B). Such a measure is said to be T-invariant and the map T is said to preserve the measure μ.

The transformation T (or the system \((X, \mathcal {B}, \mu ,T)\)) is ergodic if every \(B \in \mathcal {B}\) such that T−1(B) = B has either zero measure or full measure.

A topological dynamical system (X, T) always has an invariant probability measure. The case where there exists only one T-invariant measure is of particular interest. A topological dynamical system (X, T) is said to be uniquely ergodic if there exists one and only one T-invariant Borel probability measure over X. In particular, a uniquely ergodic topological dynamical system yields an ergodic measure-theoretic dynamical system.

A measure-theoretic ergodic dynamical system satisfies the Birkhoff ergodic theorem , also called individual ergodic theorem . Let us recall that the abbreviation a.e. stands for “almost everywhere” : a property holds almost everywhere if the set of elements for which the property does not hold is contained in a set of zero measure.

Theorem 1.7.3

Let \((X, \mathcal {B}, \mu ,T)\) be a measure-theoretic dynamical system. Let \( f\in L^1(X,\mathbb {R}) \) . Then the sequence \( (\frac {1}{n} \sum _{k=0}^{n-1} f \circ T^k )_{n \ge 1}\) converges a.e. to a function \(f^* \in L^1(X,\mathbb {R}) \) . One has \(f^*\circ T = f^*\) a.e. and \(\int _X f^* \, d\mu = \int _X f\, d\mu \) . Furthermore, if T is ergodic, since \(f^*\) is a.e. constant, one has:

$$\begin{aligned} \forall f\in L^1(X,\mathbb{R}) \ , \ \ \ \frac{1}{n} \sum_{k=0}^{n-1} f \circ T^k \xrightarrow[ n \rightarrow \infty]{\mu - a.e.} \int_{X} f\, d\mu\, .\end{aligned} $$
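A numerical illustration (our own example, not from the text): the rotation T(x) = x + α mod 1 with α irrational is uniquely ergodic for Lebesgue measure, so for f the indicator of [0, 1∕2) the Birkhoff averages converge to \(\mu ([0,1/2))=1/2\) for every starting point:

```python
from math import sqrt

# Rotation T(x) = x + alpha mod 1 with alpha irrational, f = 1_[0,1/2).
# By unique ergodicity, the ergodic averages converge to 1/2.
alpha = sqrt(2) - 1
x, hits, N = 0.1, 0, 100_000
for _ in range(N):
    hits += x < 0.5            # f(T^k(x)), accumulated
    x = (x + alpha) % 1.0
print(hits / N)  # close to 1/2
```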

Note that the notions of conjugacy and factor between two topological dynamical systems extend in a natural way to the measure-theoretic context.

1.7.3 Symbolic Dynamics

Let us introduce some basic notions in symbolic dynamics. For expository books on the subject, see [167, 348, 381, 475] and [488]. For references on ergodic theory, also see, e.g., [579]. These notions will be central in particular in Chapters 8 and 9.

Let S denote the following map defined on Σω, called the one-sided shift :

$$\begin{aligned} S((x_n)_{n \ge 0})=(x_{n+1})_{n \ge 0}\, . \end{aligned}$$

In particular, if x = x0 x1 x2⋯ is an infinite word over Σ, then for all n ≥ 0, its suffix x n xn+1⋯ is simply Sn(x). The map S is uniformly continuous, onto but not one to one on Σω. This notion extends in a natural way to \(\varSigma ^{\mathbb Z}\). In this latter case, the shift S is one to one. We thus get symbolic dynamical systems. Here symbolic refers to the fact that they are defined on words.
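On finite prefixes, which stand in for infinite words in this sketch (names are ours), the shift is simply a suffix operation:

```python
def shift(x, n=1):
    """n-fold one-sided shift: S^n(x) drops the first n letters,
    so S^n(x)_i = x_{n+i}."""
    return x[n:]

w = "0110100110010110"        # a prefix of the Thue-Morse word
assert shift(w) == "110100110010110"
assert shift(w, 3) == w[3:]
# S is onto but not one-to-one: distinct words share the same image.
assert shift("0" + w) == shift("1" + w) == w
```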

The definitions given below correspond to the one-sided shift, but they extend in a natural way to the two-sided shift.

Definition 1.7.4

Let x be an infinite word over the alphabet Σ. The symbolic dynamical system associated with x is then defined as the shift orbit closure \((\overline {\mathcal {O}(\mathbf {x})},S)\), where \( \overline {\mathcal {O}(\mathbf {x})}\subseteq \varSigma ^{\omega }\) is the closure of the orbit \( \mathcal {O}(\mathbf {x})=\{S^n\mathbf {x} \mid n\in {\mathbb {N}}\}\) of x.

In the case of bi-infinite words, we similarly define \(\mathcal {O}(\mathbf {x}) = \{ S^n \mathbf {x }\mid n\in \mathbb {Z} \}\) where the (two-sided) shift map is defined on \(\varSigma ^{\mathbb {Z}}\). The set \(X_{\mathbf {x}}:= \overline {\mathcal {O}(\mathbf {x})}\) is a closed subset of the compact set Σω; hence it is a compact space and S is a continuous map acting on it. One checks that, for every infinite word y ∈ Σω, the word y belongs to X x if and only if L(y) ⊆ L(x). For a proof, see [488] or Chapter 1 of [487]. Note that \(\overline {\mathcal {O}(\mathbf {x})}\) is finite if and only if x is eventually periodic. Moreover, if x is an infinite word, (X x , S) is minimal if and only if x is uniformly recurrent. Indeed, if w is a factor of x, we write

$$\begin{aligned} \overline{\mathcal{O}(\mathbf{x})}=\bigcup_{n \in \mathbb{N}} S^{-n}[w], \end{aligned}$$

and we conclude by a compactness argument.

Generic examples of symbolic dynamical systems are provided by subshifts (also called shifts for short). Let Y be a closed subset of Σω that is stable under the action of the shift S. The system (Y, S) is called a subshift . The full shift is defined as (Σω, S). If (Y, S) is a subshift, there exists a set \({\mathcal F} \subset \varSigma ^*\) of finite words such that an infinite word x belongs to Y if and only if none of its factors belongs to \({\mathcal F}\). A subshift is called a subshift of finite type if one can choose the set \({\mathcal F}\) to be finite. A subshift is said to be sofic if the set \({\mathcal F}\) can be chosen to be a regular language. A subshift (Y, S) is said to be periodic if there exist x ∈ Y and an integer k such that Y = {x, S x, …, Sk x = x}. Otherwise it is said to be aperiodic .

For the more general case of a group G acting on configurations in ΣG, see Chapter 9. Elements of ΣG can be considered as colorings of a group G by a finite alphabet Σ. The set of configurations ΣG, endowed with the product topology, is a compact space on which we define the shift transformations: for every g ∈ G, the shift Sg translates a configuration x ∈ ΣG through \(S^g(x)_h = x_{g^{-1}h}\) for every h ∈ G. In this framework, subshifts are exactly the subsets of ΣG that are both shift-invariant and closed for the product topology.

Example 1.7.5

The set of infinite words over {0, 1} of Example 1.5.6 which do not contain the factor 11 is a subshift of finite type, whereas the set of infinite words over {0, 1} having an even number of 1 between two occurrences of the letter 0 is a sofic subshift which is not of finite type.
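Membership of a finite factor in the languages of these two subshifts can be tested directly (a sketch with our own helper names; an infinite word belongs to the subshift if and only if every one of its factors passes the test). The sofic condition amounts to a two-state parity automaton:

```python
def avoids_11(w):
    """Golden mean shift (an SFT): the single factor 11 is forbidden."""
    return "11" not in w

def even_ones_between_zeros(w):
    """Sofic 'even' shift: every block of 1s lying between two
    occurrences of 0 must have even length (parity automaton)."""
    parity, seen_zero = 0, False
    for c in w:
        if c == "1":
            parity ^= 1
        else:                     # a 0 closes the current block of 1s
            if seen_zero and parity == 1:
                return False
            parity, seen_zero = 0, True
    return True                   # a trailing block of 1s is unconstrained

assert avoids_11("01010010") and not avoids_11("0110")
assert even_ones_between_zeros("0110110") and not even_ones_between_zeros("010")
```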

Definition 1.7.6

Let Y be a subshift. For a word w = w0⋯w r , the cylinder set [w] is the set {y ∈ Y ∣y0 = w0, …, y r  = w r }.

The cylinder sets are clopen (open and closed) sets and form a basis of open sets for the topology of Y . Furthermore, one checks that a clopen set is a finite union of cylinders. In the bi-infinite case, the cylinders are the sets

$$\begin{aligned}{}[u.v]_Y = \{ y\in Y \mid y_i = u_i, y_j=v_j, \ -|u| \leq i\leq -1, \ 0\leq j \leq |v| -1 \} \end{aligned}$$

and the same remark holds.

The topological entropy h(X) of the symbolic dynamical system (X, S) measures the richness of its language L, defined as the set of factors of elements in X. It is defined as

$$\begin{aligned}h(X)=\lim_{n\to\infty}\frac{1}{n}\ln\left\vert L \cap \varSigma^n \right\vert.\end{aligned}$$

It is closely related to the growth rate of the language L defined as \(\limsup _{n\to \infty } \vert L \cap \varSigma ^n \vert ^{\frac {1}{n}}\) and considered in Chapter 5.
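For the golden mean shift of Example 1.7.5 (binary words avoiding the factor 11), the counts \(\vert L\cap \varSigma ^n\vert \) satisfy a Fibonacci recurrence and the entropy is \(\ln ((1+\sqrt {5})/2)\approx 0.4812\); a quick numerical check (names are ours):

```python
from math import log, sqrt

def count_words(n):
    """|L ∩ {0,1}^n| for the golden mean shift (no factor 11):
    the counts 1, 2, 3, 5, 8, ... satisfy a Fibonacci recurrence."""
    a, b = 1, 2
    for _ in range(n):
        a, b = b, a + b
    return a

h = log((1 + sqrt(5)) / 2)     # the entropy: log of the golden ratio
for n in (10, 50, 200):
    print(n, round(log(count_words(n)) / n, 4))  # approaches h ~ 0.4812
```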