Codes and Automata in Minimal Sets

Perrin, Dominique

doi:10.1007/978-3-319-23660-5_4

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9304)

Included in the following conference series: International Conference on Combinatorics on Words

Abstract

We explore several notions concerning codes and automata in a restricted set of words S. We define a notion of S-degree of an automaton and prove an inequality relating the cardinality of a prefix code included in a minimal set S and its S-degree.

1 Introduction

We have introduced in [1] the notion of tree set as a common generalization of Sturmian sets and of interval exchange sets. In this paper, we investigate several new directions concerning codes and automata in minimal sets.

Codes and automata in restricted sets of words have already been investigated several times. In particular, Restivo has investigated codes in sets of finite type [2] and Reutenauer has studied the more general notion of codes of paths in a graph [3]. In [4], together with several other authors, we initiated a systematic study of bifix codes in Sturmian sets, a subject already considered in [5]. The overall conclusion of this study is that surprising phenomena appear in this context in relation to subgroups of finite index of the free group, allowing one to obtain positive bases of the subgroups contained in a given minimal set.

In this paper, we investigate several notions concerning codes and automata in relation to a factorial set S. This includes a definition of the S-minimal rank of an automaton, which is equal to 1 if and only if the automaton admits a synchronizing word in S. We prove a result which allows one to compute the S-minimal rank when S is minimal (Theorem 3.1). We also show that for a recurrent set S and a strongly connected automaton $\mathcal {A}$, the set of elements of the transition monoid M of S-minimal rank is included in a $\mathcal {D}$-class of M called its S-minimal $\mathcal {D}$-class (Proposition 3.2). This regular $\mathcal {D}$-class is unique when S is minimal and it is related to the results of [6] and [7] on the regular $\mathcal {J}$-classes of free profinite semigroups.

We define the S-degree of a prefix code X included in S as the S-minimal rank of the minimal automaton of $X^*$. We show that the cardinality of a prefix code is bounded below by a linear function of its S-degree (Theorem 4.4).

Let X be a prefix code and let M be the transition monoid of the minimal automaton of $X^*$. We associate to X a permutation group denoted $G_X(S)$ which is the structure group of the S-minimal $\mathcal {D}$-class of M. We show that for any uniformly recurrent tree set S and any finite S-maximal bifix code X, the group $G_X(S)$ is equivalent to the representation of the free group on the cosets of the subgroup generated by X (Theorem 4.5).

2 Neutral and Tree Sets

Let A be a finite alphabet. We denote by $A^*$ the set of all words on A. We denote by $\varepsilon $ or 1 the empty word. A set of words on the alphabet A and containing A is said to be factorial if it contains the factors of its elements. An internal factor of a word x is a word v such that $x = uvw$ with u, w nonempty.

2.1 Neutral Sets

Let S be a factorial set on the alphabet A. For $w\in S$, we denote

$$\begin{aligned} L_S(w)&= \{a \in A \mid aw \in S\},\\ R_S(w)&= \{a \in A \mid wa \in S\},\\ E_S(w)&= \{(a,b) \in A \times A \mid awb \in S\}, \end{aligned}$$

and further $\ell _S(w) = {{\mathrm{Card}}}(L_S(w))$, $r_S(w) = {{\mathrm{Card}}}(R_S(w))$, $e_S(w) = {{\mathrm{Card}}}(E_S(w))$.

We omit the subscript S when it is clear from the context. A word w is right-extendable if $r(w)>0$, left-extendable if $\ell (w)>0$ and biextendable if $e(w)>0$. A factorial set S is called right-extendable (resp. left-extendable, resp. biextendable) if every word in S is right-extendable (resp. left-extendable, resp. biextendable).

A word w is called right-special if $r(w) \ge 2$. It is called left-special if $\ell (w) \ge 2$. It is called bispecial if it is both left-special and right-special. For $w\in S$, we denote

$$ m_S(w) = e_S(w) - \ell _S(w) - r_S(w) + 1. $$

A word w is called neutral if $m_S(w) = 0$. We say that a set S is neutral if it is factorial and every nonempty word $w \in S$ is neutral. The characteristic of S is the integer $\chi (S) = 1 - m_S(\varepsilon )$.

A neutral set of characteristic 1, simply called a neutral set, is such that all words (including the empty word) are neutral.

The following is a trivial example of a neutral set of characteristic 2.

Example 2.1

Let $A = \{a,b\}$ and let S be the set of factors of $(ab)^*$. Then S is neutral of characteristic 2.
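The claim of Example 2.1 can be checked numerically. The following is a minimal sketch, assuming that a long finite power of ab is an adequate stand-in for $(ab)^*$ when only short words are examined; the helper names `factors` and `multiplicity` are illustrative and not taken from the paper.

```python
def factors(word, max_len):
    """All factors of `word` of length at most `max_len`, including the empty word."""
    result = {""}
    for i in range(len(word)):
        for j in range(i + 1, min(i + max_len, len(word)) + 1):
            result.add(word[i:j])
    return result


def multiplicity(S, alphabet, w):
    """m_S(w) = e_S(w) - l_S(w) - r_S(w) + 1, computed from the factor set S."""
    left = [a for a in alphabet if a + w in S]
    right = [b for b in alphabet if w + b in S]
    both = [(a, b) for a in alphabet for b in alphabet if a + w + b in S]
    return len(both) - len(left) - len(right) + 1


# Finite approximation of the set of factors of (ab)^* (assumption: 20 periods
# are enough to see every factor of length at most 10).
S = factors("ab" * 20, 10)
characteristic = 1 - multiplicity(S, "ab", "")
nonempty_neutral = all(
    multiplicity(S, "ab", w) == 0 for w in S if 0 < len(w) <= 8
)
print(characteristic, nonempty_neutral)  # prints: 2 True
```

The length bound 8 keeps every extension awb inside the stored factors of length at most 10.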

As a more interesting example, any Sturmian set is a neutral set [1] (by a Sturmian set, we mean the set of factors of a strict episturmian word, see [8]).

The following example is the classical example of a Sturmian set.

Example 2.2

Let $A=\{a,b\}$ and let $f:A^*\rightarrow A^*$ be the Fibonacci morphism defined by $f(a)=ab$ and $f(b)=a$. The infinite word $x=\lim _{n\rightarrow \infty }f^n(a)$ is the Fibonacci word. One has $x=abaababa\cdots $. The Fibonacci set is the set of factors of the Fibonacci word. It is a Sturmian set, and thus a neutral set.
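A prefix of the Fibonacci word is easy to generate by iterating the morphism. The sketch below is only an illustration; the function name `iterate_morphism` and the choice of 10 iterations are assumptions made here.

```python
def iterate_morphism(images, start, steps):
    """Apply the morphism described by the dict `images` to `start`, `steps` times."""
    word = start
    for _ in range(steps):
        word = "".join(images[letter] for letter in word)
    return word


fibonacci_morphism = {"a": "ab", "b": "a"}
prefix = iterate_morphism(fibonacci_morphism, "a", 10)

print(prefix[:8])   # abaababa
print(len(prefix))  # 144
```

Since $f^{n+1}(a)$ begins with $f^n(a)$, each iterate is a prefix of the Fibonacci word, so factors read off such a prefix belong to the Fibonacci set.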

The factor complexity of a factorial set S of words on an alphabet A is the sequence $p_n = {{\mathrm{Card}}}(S \cap A^n)$. The complexity of a Sturmian set is $p_n=n({{\mathrm{Card}}}(A)-1)+1$. The following result (see [9]) shows that a neutral set has linear complexity.

Proposition 2.1

The factor complexity of a neutral set on k letters is given by $p_0 = 1$ and $p_n = n(k - \chi (S)) + \chi (S)$ for every $n \ge 1$.

Example 2.3

The complexity of the set of Example 2.1 is $p_n=2$ for any $n\ge 1$.
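Proposition 2.1 can be compared with a direct count of factors. The following sketch assumes that the finite words used (a prefix of the Fibonacci word and a power of ab) are long enough to contain every factor of length at most 10; the names `complexity` and `predicted` are hypothetical helpers.

```python
def complexity(word, n):
    """Number of distinct factors of length n occurring in `word`."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


def predicted(n, k, chi):
    """Value of p_n given by Proposition 2.1 for n >= 1."""
    return n * (k - chi) + chi


# Long prefix of the Fibonacci word (k = 2 letters, characteristic 1).
fib = "a"
for _ in range(15):
    fib = "".join({"a": "ab", "b": "a"}[c] for c in fib)

# Finite stand-in for (ab)^* (k = 2 letters, characteristic 2).
periodic = "ab" * 50

fib_ok = all(complexity(fib, n) == predicted(n, 2, 1) for n in range(1, 11))
per_ok = all(complexity(periodic, n) == predicted(n, 2, 2) for n in range(1, 11))
print(fib_ok, per_ok)  # prints: True True
```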

A set of words $S \ne \{\varepsilon \}$ is recurrent if it is factorial and for any $u,w \in S$, there is a $v \in S$ such that $uvw \in S$. An infinite factorial set is said to be minimal or uniformly recurrent if for any word $u \in S$ there is an integer $n \ge 1$ such that u is a factor of any word of S of length n. A uniformly recurrent set is recurrent.

2.2 Tree Sets

Let S be a biextendable set of words. For $w \in S$, we consider the set E(w) as an undirected graph on the set of vertices which is the disjoint union of L(w) and R(w) with edges the pairs $(a,b) \in E(w)$. This graph is called the extension graph of w. We sometimes denote $1 \otimes L(w)$ and $R(w) \otimes 1$ the copies of L(w) and R(w) used to define the set of vertices of E(w). We note that since E(w) has $\ell (w)+r(w)$ vertices and e(w) edges, the number $1-m_S(w)$ is the Euler characteristic of the graph E(w).

A biextendable set S is called a tree set of characteristic c if for any nonempty $w \in S$, the graph E(w) is a tree and if $E(\varepsilon )$ is a union of c trees. Note that a tree set of characteristic c is a neutral set of characteristic c.

Example 2.4

The set S of Example 2.1 is a tree set of characteristic 2.

A tree set of characteristic 1, simply called a tree set as in [1], is such that E(w) is a tree for any $w\in S$.

As an example, a Sturmian set is a tree set [1].

Example 2.5

Let $A=\{a,b\}$ and let $f:A^*\rightarrow A^*$ be the morphism defined by $f(a)=ab$ and $f(b)=ba$. The infinite word $x=\lim _{n\rightarrow \infty }f^n(a)$ is the Thue-Morse word. The Thue-Morse set is the set of factors of the Thue-Morse word. It is uniformly recurrent but it is not a tree set since $E(\varepsilon )=A\times A$.
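The tree condition on extension graphs can be tested mechanically. The sketch below builds E(w) from a finite set of factors and checks that it is connected with one edge fewer than vertices; it assumes that the prefixes used contain all factors of length at most 6, and all function names are illustrative.

```python
def factor_set(word, max_len):
    """Factors of `word` of length at most `max_len`, including the empty word."""
    return {word[i:j] for i in range(len(word) + 1)
            for j in range(i, min(i + max_len, len(word)) + 1)}


def extension_graph(S, alphabet, w):
    """Vertices and edges of E(w); left letters are tagged 'L', right letters 'R'."""
    vertices = {("L", a) for a in alphabet if a + w in S}
    vertices |= {("R", b) for b in alphabet if w + b in S}
    edges = [(("L", a), ("R", b)) for a in alphabet for b in alphabet
             if a + w + b in S]
    return vertices, edges


def is_tree(vertices, edges):
    """A finite nonempty graph is a tree iff it is connected with |V| - 1 edges."""
    if not vertices or len(edges) != len(vertices) - 1:
        return False
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == vertices


def morphic_prefix(images, steps):
    """Iterate the morphism `images` on the letter a."""
    word = "a"
    for _ in range(steps):
        word = "".join(images[c] for c in word)
    return word


thue_morse = factor_set(morphic_prefix({"a": "ab", "b": "ba"}, 10), 6)
fibonacci = factor_set(morphic_prefix({"a": "ab", "b": "a"}, 15), 6)

# E(empty word) in the Thue-Morse set has 4 vertices and 4 edges, hence a cycle.
tm_empty_is_tree = is_tree(*extension_graph(thue_morse, "ab", ""))
# In the Fibonacci set every extension graph examined is a tree.
fib_all_trees = all(is_tree(*extension_graph(fibonacci, "ab", w))
                    for w in fibonacci if len(w) <= 4)
print(tm_empty_is_tree, fib_all_trees)  # prints: False True
```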

Let S be a set of words. For $w\in S$, let $\varGamma _S(w)=\{x\in S\mid wx\in S\cap A^+w\}$. If S is recurrent, the set $\varGamma _S(w)$ is nonempty. Let

$$\begin{aligned} \mathrm{Ret}_S(w)=\varGamma _S(w)\setminus \varGamma _S(w) A^+ \end{aligned}$$

be the set of return words to w.

Note that a recurrent set S is uniformly recurrent if and only if the set $\mathrm{Ret}_S(w)$ is finite for any $w\in S$. Indeed, if N is the maximal length of the words in $\mathrm{Ret}_S(w)$ for a word w of length n, any word in S of length $N+n$ contains an occurrence of w. The converse is obvious.

We will use the following result [1, Theorem 4.5]. We denote by $F_A$ the free group on A.

Theorem 2.2

(Return Theorem). Let S be a uniformly recurrent tree set. For any $w\in S$, the set $\mathrm{Ret}_S(w)$ is a basis of the free group $F_A$.

Note that this result implies in particular that for any $w\in S$, the set $\mathrm{Ret}_S(w)$ has ${{\mathrm{Card}}}(A)$ elements.

Example 2.6

Let S be the Tribonacci set. It is the set of factors of the infinite word $x=abacaba\cdots $ which is the fixed point of the morphism f defined by $f(a)=ab$, $f(b)=ac$, $f(c)=a$. It is a Sturmian set (see [8]). We have $\mathrm{Ret}_S(a)=\{a,ba,ca\}$.
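Return words can be read off consecutive occurrences of w in a long prefix of the infinite word. The sketch below follows the definition of $\mathrm{Ret}_S(w)$ given above (right return words); it assumes the chosen prefix is long enough to exhibit every return word, and the helper names are hypothetical.

```python
def morphic_prefix(images, steps):
    """Iterate the morphism `images` on the letter a."""
    word = "a"
    for _ in range(steps):
        word = "".join(images[c] for c in word)
    return word


def return_words(word, w):
    """Right return words to `w`: for consecutive occurrences of `w` at
    positions i < j, the word x = word[i+|w| : j+|w|] satisfies that wx ends with w."""
    positions = [i for i in range(len(word) - len(w) + 1)
                 if word.startswith(w, i)]
    return {word[i + len(w): j + len(w)]
            for i, j in zip(positions, positions[1:])}


tribonacci = morphic_prefix({"a": "ab", "b": "ac", "c": "a"}, 12)
fibonacci = morphic_prefix({"a": "ab", "b": "a"}, 15)

print(sorted(return_words(tribonacci, "a")))  # ['a', 'ba', 'ca']
print(sorted(return_words(fibonacci, "ab")))  # ['aab', 'ab']
```

In both cases the number of return words equals ${{\mathrm{Card}}}(A)$, as the Return Theorem predicts.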

3 Automata

All automata considered in this paper are deterministic and strongly connected and we simply call them automata. An automaton on a finite set Q of states is given by a partial map from $Q\times A$ into Q denoted $p\mapsto p\cdot a$, and extended to words with the same notation. For a word w, we denote by $\varphi _\mathcal {A}(w)$ the map $p\in Q\mapsto p\cdot w\in Q$.

The transition monoid of the automaton $\mathcal {A}$ is the monoid M of partial maps from Q to itself of the form $\varphi _\mathcal {A}(w)$ for $w\in A^*$. The rank of an element m of M is the cardinality of its image, denoted $\mathrm{Im}(m)$.

Let $\mathcal {A}$ be an automaton and let S be a set of words. Denote by ${{\mathrm{rank}}}_\mathcal {A}(w)$ the rank of the map $\varphi _\mathcal {A}(w)$, also called the rank of w with respect to the automaton $\mathcal {A}$. The S-minimal rank of $\mathcal {A}$ is the minimal value of ${{\mathrm{rank}}}_\mathcal {A}(w)$ for $w\in S$. It is denoted ${{\mathrm{rank}}}_\mathcal {A}(S)$. A word of rank 1 is called synchronizing.
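To make the difference between the rank over $A^*$ and the S-minimal rank concrete, the following sketch generates the transition monoid of a small partial automaton by closure under the letters. The three-state automaton used here is a hypothetical example chosen for illustration (it is not one of the figures of the paper), and the encoding of partial maps as tuples with `None` for undefined values is an assumption of the sketch.

```python
def transition_monoid(delta, states, alphabet):
    """All maps phi(w), each encoded as a tuple indexed like `states`
    (entry None means that p.w is undefined)."""
    def step(m, a):
        return tuple(None if q is None else delta.get((q, a)) for q in m)

    identity = tuple(states)
    seen = {identity}
    queue = [identity]
    while queue:
        m = queue.pop()
        for a in alphabet:
            n = step(m, a)
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return seen


def rank(m):
    """Cardinality of the image of the partial map m."""
    return len({q for q in m if q is not None})


# Hypothetical partial automaton on states 1, 2, 3 (3.b is undefined).
states = (1, 2, 3)
delta = {(1, "a"): 2, (1, "b"): 3, (2, "a"): 1, (2, "b"): 1, (3, "a"): 1}

M = transition_monoid(delta, states, "ab")
print(rank((2, 1, 1)))           # rank of phi(a): 2
print(min(rank(m) for m in M))   # 0, reached by phi(bbb)
```

Over all of $A^*$ the rank drops to 0 here because bbb cannot be read from any state; restricting to words of a set S that avoids bb gives a different, larger minimum.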

The following result gives a method to compute ${{\mathrm{rank}}}_\mathcal {A}(S)$ and thus gives a method to decide if $\mathcal {A}$ admits synchronizing words.

Theorem 3.1

Let S be a recurrent set and let $\mathcal {A}$ be an automaton. Let w be in S and let $I=\mathrm{Im}(w)$. Then w has rank equal to ${{\mathrm{rank}}}_\mathcal {A}(S)$ if and only if ${{\mathrm{rank}}}_\mathcal {A}(wz)={{\mathrm{rank}}}_\mathcal {A}(w)$ for any $z\in \mathrm{Ret}_S(w)$.

Proof

Assume first that ${{\mathrm{rank}}}_\mathcal {A}(w)={{\mathrm{rank}}}_\mathcal {A}(S)$. If z is in $\mathrm{Ret}_S(w)$, then wz is in S. Since ${{\mathrm{rank}}}_\mathcal {A}(wz)\le {{\mathrm{rank}}}_\mathcal {A}(w)$ and since ${{\mathrm{rank}}}_\mathcal {A}(w)$ is minimal, this forces ${{\mathrm{rank}}}_\mathcal {A}(wz)={{\mathrm{rank}}}_\mathcal {A}(w)$.

Conversely, assume that w satisfies the condition. For any $r\in \mathrm{Ret}_S(w)$, we have $I\cdot r=\mathrm{Im}(wr)\subset \mathrm{Im}(w)=I$. Since ${{\mathrm{rank}}}_\mathcal {A}(wr)={{\mathrm{rank}}}_\mathcal {A}(w)$, this forces $I\cdot r=I$. Since $\varGamma _S(w)\subset \mathrm{Ret}_S(w)^*$, this proves that

$$\begin{aligned} \varGamma _S(w)\subset \{z\in S\mid I\cdot z=I\}. \end{aligned}$$

(3.1)

Let u be a word of S of minimal rank. Since S is recurrent, there exist words $v,v'$ such that $wvuv'w\in S$. Then $vuv'w$ is in $\varGamma _S(w)$ and thus $I\cdot vuv'w=I$ by (3.1). Since $I\cdot vuv'w\subset \mathrm{Im}(vuv'w)$, this implies that ${{\mathrm{rank}}}_\mathcal {A}(u)\ge {{\mathrm{rank}}}_\mathcal {A}(vuv'w)\ge {{\mathrm{Card}}}(I)={{\mathrm{rank}}}_\mathcal {A}(w)$. Thus w has minimal rank in S.

Theorem 3.1 can be used to compute the S-minimal rank of an automaton in an effective way for a uniformly recurrent set S provided one can compute effectively the finite sets $\mathrm{Ret}_S(w)$ for $w\in S$.
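The test provided by Theorem 3.1 can be sketched as follows. The automaton is a hypothetical three-state example, S is the Fibonacci set approximated by a long prefix of the Fibonacci word, and the return words are read off that prefix (an assumption that holds only if the prefix is long enough to show all of them). The function names are illustrative.

```python
def image(delta, states, w):
    """Im(w): the set of states p.w that are defined."""
    current = set(states)
    for letter in w:
        current = {delta[(p, letter)] for p in current if (p, letter) in delta}
    return current


def return_words(word, w):
    """Right return words to `w` read off consecutive occurrences in `word`."""
    positions = [i for i in range(len(word) - len(w) + 1)
                 if word.startswith(w, i)]
    return {word[i + len(w): j + len(w)]
            for i, j in zip(positions, positions[1:])}


def has_minimal_rank(delta, states, w, returns):
    """Criterion of Theorem 3.1: rank(wz) == rank(w) for every return word z."""
    r = len(image(delta, states, w))
    return all(len(image(delta, states, w + z)) == r for z in returns)


# Hypothetical partial automaton on states 1, 2, 3 (3.b is undefined).
states = (1, 2, 3)
delta = {(1, "a"): 2, (1, "b"): 3, (2, "a"): 1, (2, "b"): 1, (3, "a"): 1}

# Long prefix of the Fibonacci word.
fib = "a"
for _ in range(15):
    fib = "".join({"a": "ab", "b": "a"}[c] for c in fib)

ret_a = return_words(fib, "a")        # {'a', 'ba'}
ret_empty = return_words(fib, "")     # the letters {'a', 'b'}

print(has_minimal_rank(delta, states, "a", ret_a))     # True
print(len(image(delta, states, "a")))                  # 2
print(has_minimal_rank(delta, states, "", ret_empty))  # False
```

For this automaton the word a passes the test, so the S-minimal rank is 2, whereas the empty word fails it because reading a already lowers the rank from 3 to 2.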

Example 3.1

Let S be the Fibonacci set and let $\mathcal {A}$ be the automaton given by its transitions in Fig. 3.1 on the left. One has $\mathrm{Im}(a^2)=\{1,2,4\}$. The action on the 3-element sets of states of the automaton is shown on the right. By Theorem 3.1, we obtain ${{\mathrm{rank}}}_\mathcal {A}(S)=3$.

We denote by $\mathcal {L},\mathcal {R},\mathcal {D},\mathcal {H}$ the usual Green relations on a monoid M (see [10]). Recall that $\mathcal {R}$ is the equivalence on M defined by $m\mathcal {R}n$ if $mM=nM$. The $\mathcal {R}$-class of m is denoted R(m). Symmetrically, one denotes by $\mathcal {L}$ the equivalence defined by $m\mathcal {L}n$ if $Mm=Mn$. It is well-known that the equivalences $\mathcal {R}$ and $\mathcal {L}$ commute. The equivalence $\mathcal {R}\mathcal {L}=\mathcal {L}\mathcal {R}$ is denoted $\mathcal {D}$. Finally, one denotes by $\mathcal {H}$ the equivalence $\mathcal {R}\cap \mathcal {L}$.

The following result is proved in [4] in a particular case (that is, for an automaton recognizing the submonoid generated by a bifix code).

Proposition 3.2

Let S be a recurrent set and $\mathcal {A}$ be a strongly connected automaton. Set $\varphi =\varphi _\mathcal {A}$ and $M=\varphi (A^*)$. The set of elements of $\varphi (S)$ of rank ${{\mathrm{rank}}}_\mathcal {A}(S)$ is included in a regular $\mathcal {D}$-class of M.

Proof

Set $d={{\mathrm{rank}}}_\mathcal {A}(S)$. Let $u,v\in S$ be two words of rank d. Set $m=\varphi (u)$ and $n=\varphi (v)$. Let w be such that $uwv\in S$. We show first that $m\mathcal {R}\varphi (uwv)$ and $n\mathcal {L}\varphi (uwv)$.

For this, let t be such that $uwvtu\in S$. Set $z=wvtu$. Since $uz\in S$, the rank of uz is d. Since $\mathrm{Im}(uz)\subset \mathrm{Im}(z)\subset \mathrm{Im}(u)$, this implies that the images are equal. Consequently, the restriction of $\varphi (z)$ to $\mathrm{Im}(u)$ is a permutation. Since $\mathrm{Im}(u)$ is finite, there is an integer $\ell \ge 1$ such that $\varphi (z)^\ell $ is the identity on $\mathrm{Im}(u)$. Set $e=\varphi (z)^\ell $ and $s=tuz^{\ell -1}$. Then, since e is the identity on $\mathrm{Im}(u)$, one has $m=me$. Thus $m=\varphi (uwv)\varphi (s)$, and since $\varphi (uwv)=m\varphi (wv)$, it follows that m and $\varphi (uwv)$ are $\mathcal {R}$-equivalent.

Similarly n and $\varphi (uwv)$ are $\mathcal {L}$-equivalent. Indeed, let $t'$ be such that $vt'uwv\in S$. Set $z'=t'uwv$. Then $\mathrm{Im}(vz')\subset \mathrm{Im}(z')\subset \mathrm{Im}(v)$. Since $vz'$ is in S, its rank is at least d, and since ${{\mathrm{rank}}}(vz')\le {{\mathrm{rank}}}(v)=d$, the word $vz'$ has rank d. Consequently the images $\mathrm{Im}(vz')$, $\mathrm{Im}(z')$ and $\mathrm{Im}(v)$ are equal. There is an integer $\ell '\ge 1$ such that $\varphi (z')^{\ell '}$ is the identity on $\mathrm{Im}(v)$. Set $e'=\varphi (z')^{\ell '}$. Then $n=ne'=n\varphi (z')^{\ell '-1}\varphi (t'uwv)=nq\varphi (uwv)$, with $q=\varphi (z')^{\ell '-1}\varphi (t')$. Since $\varphi (uwv)=\varphi (uw)n$, one has $n\mathcal {L}\varphi (uwv)$. Thus m, n are $\mathcal {D}$-equivalent, and $\varphi (uwv)\in R(m)\cap L(n)$.

Set $p=\varphi (wv)$. Then $p=\varphi (w)n$ and, with the previous notation, $n=ne'=nq\varphi (u)p$, so $L(n)=L(p)$. Thus $mp=\varphi (uwv)\in R(m)\cap L(p)$, and by Clifford and Miller’s Lemma, $R(p)\cap L(m)$ contains an idempotent. Thus the $\mathcal {D}$-class of m, p and n is regular.

The $\mathcal {D}$-class containing the elements of $\varphi (S)$ of rank ${{\mathrm{rank}}}_\mathcal {A}(S)$ is called the S-minimal $\mathcal {D}$-class of M. This $\mathcal {D}$-class appears in a different context in [11] (for a survey concerning the use of Green’s relations in automata theory, see [12]).

Example 3.2

Let S be the Fibonacci set and let $\mathcal {A}$ be the automaton represented in Fig. 3.2 on the left. The S-minimal $\mathcal {D}$-class of the transition monoid of $\mathcal {A}$ is represented in Fig. 3.2 on the right.

Thus ${{\mathrm{rank}}}_\mathcal {A}(S)=1$. We indicate with a $*$ the $\mathcal {H}$-classes containing an idempotent.

Let us recall some notions concerning groups in transformation monoids (see [4] for a more detailed presentation). Let M be a transformation monoid on a set Q. For $I\subset Q$, we denote

$$\begin{aligned} {{\mathrm{Stab}}}_M(I)=\{x\in M\mid Ix=I\} \end{aligned}$$

or ${{\mathrm{Stab}}}(I)$ if the monoid M is understood. The holonomy group of M relative to I is the restriction of the elements of ${{\mathrm{Stab}}}_M(I)$ to the set I. It is denoted ${{\mathrm{Group}}}(I)$.

Let D be a regular $\mathcal {D}$-class in a transformation monoid M on a set Q. The holonomy groups of M relative to the sets Qm for $m\in D$ are all equivalent. The structure group of D is any of them.

Let $\mathcal {A}$ be an automaton with Q as set of states and let $I\subset Q$. Let w be a word such that $\varphi _\mathcal {A}(w)\in {{\mathrm{Stab}}}(I)$. The restriction of $\varphi _\mathcal {A}(w)$ to I is a permutation which belongs to ${{\mathrm{Group}}}(I)$. It is called the permutation defined by the word w on the set I.

Let $\mathcal {A}$ be a strongly connected automaton and let S be a recurrent set of words. The S-group of $\mathcal {A}$ is the structure group of its S-minimal $\mathcal {D}$-class. It is denoted $G_\mathcal {A}(S)$.

For the set $S=A^*$ and a strongly connected automaton, the group $G_\mathcal {A}(S)$ is a transitive permutation group of degree ${{\mathrm{rank}}}_\mathcal {A}(S)$ (see [10, Theorem 9.3.10]). We conjecture that this also holds when S is a uniformly recurrent tree set. It does not hold for an arbitrary uniformly recurrent set S, as shown in the following examples.

Example 3.3

Let S be the set of factors of $(ab)^*$ and let $\mathcal {A}$ be the automaton of Fig. 3.3. The S-minimal rank of $\mathcal {A}$ is 2 but the group $G_\mathcal {A}(S)$ is trivial.

Example 3.4

Let S be the Thue-Morse set and let $\mathcal {A}$ be the automaton represented in Fig. 3.4 on the left. The word aa has rank 3 and image $I=\{1,2,4\}$.

The action on the images accessible from I is given in Fig. 3.5. All words with image $\{1,2,4\}$ end with aa. The paths returning for the first time to $\{1,2,4\}$ are labeled by the set $\mathrm{Ret}_S(aa)=\{b^2a^2,bab^2aba^2,bab^2a^2,b^2aba^2\}$. Thus ${{\mathrm{rank}}}_\mathcal {A}(S)=3$ by Theorem 3.1. Moreover each of the words of $\mathrm{Ret}_S(a^2)$ defines the trivial permutation on the set $\{1,2,4\}$. Thus $G_\mathcal {A}(S)$ is trivial.

The fact that ${{\mathrm{rank}}}_\mathcal {A}(S)=3$ and that $G_\mathcal {A}(S)$ is trivial can be seen directly as follows. Consider the group automaton $\mathcal {B}$ represented in Fig. 3.4 on the right and corresponding to the map sending each word to the difference modulo 3 of the number of occurrences of a and b. There is a reduction $\rho $ from $\mathcal {A}$ onto $\mathcal {B}$ such that $1\mapsto 0$, $2\mapsto 1$, and $4\mapsto 2$. This accounts for the fact that ${{\mathrm{rank}}}_\mathcal {A}(S)=3$. Moreover, one may verify that any return word x to $a^2$ has an equal number of occurrences of a and b (if $x=uaa$ then aauaa is in S, which implies that aua and thus uaa have the same number of occurrences of a and b). This implies that the permutation $\varphi _\mathcal {B}(x)$ is the identity, and therefore also the restriction of $\varphi _\mathcal {A}(x)$ to I. The same argument holds for Example 3.3 by considering the parity of the length.
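The balance property of the return words to $a^2$ in the Thue-Morse set can be checked on a long prefix. This is a minimal sketch, assuming that a prefix of length $2^{10}$ already exhibits the return words listed in Example 3.4; the helper `return_words` is a hypothetical name.

```python
def return_words(word, w):
    """Right return words to `w` read off consecutive occurrences in `word`."""
    positions = [i for i in range(len(word) - len(w) + 1)
                 if word.startswith(w, i)]
    return {word[i + len(w): j + len(w)]
            for i, j in zip(positions, positions[1:])}


# Prefix of the Thue-Morse word obtained by iterating a -> ab, b -> ba.
thue_morse = "a"
for _ in range(10):
    thue_morse = "".join({"a": "ab", "b": "ba"}[c] for c in thue_morse)

returns = return_words(thue_morse, "aa")
listed = {"bbaa", "babbabaa", "babbaa", "bbabaa"}  # the set given in Example 3.4

# Each return word is sent to 0 by the map (#a - #b) mod 3 of the group automaton B.
balanced = all((z.count("a") - z.count("b")) % 3 == 0 for z in returns)
print(listed <= returns, balanced)  # prints: True True
```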

4 Codes

A code is a set X such that for any $n,m\ge 0$ any $x_1,\ldots ,x_n$ and $y_1,\ldots ,y_m$ in X, one has $x_1\cdots x_n=y_1\cdots y_m$ only if $n=m$ and $x_1=y_1$,..., $x_n=y_n$. A prefix code is a set X of nonempty words which does not contain any proper prefix of its elements. A suffix code is defined symmetrically. A bifix code is a set which is both a prefix code and a suffix code.

Let S be a set of words. A prefix code $X\subset S$ is said to be S-maximal if it is not properly contained in any prefix code $Y\subset S$. The notions of an S-maximal suffix code and of an S-maximal bifix code are defined symmetrically.

It follows from results of [4] that for a recurrent set S, a finite bifix code $X\subset S$ is S-maximal as a bifix code if and only if it is S-maximal as a prefix code.

Given a set $X\subset S$, we denote $\lambda _S(X) = \sum _{x \in X}\lambda _S(x)$ where $\lambda _S$ is the map defined by $\lambda _S(x)=e_S(x)-r_S(x)$. The following result is [9, Proposition 4].

Proposition 4.1

Let S be a neutral set of characteristic c on the alphabet A, and let X be a finite S-maximal prefix code. Then $\lambda _S(X) = {{\mathrm{Card}}}(A) - c$.

Symmetrically, one denotes $\rho _S(x)=e_S(x)-\ell _S(x)$. The dual of Proposition 4.1 holds for suffix codes instead of prefix codes with $\rho _S$ instead of $\lambda _S$.

Note that when S is Sturmian, one has $\lambda _S(x)={{\mathrm{Card}}}(A)-1$ if x is left-special and $\lambda _S(x)=0$ otherwise. Thus Proposition 4.1 expresses the fact that any finite S-maximal prefix code contains exactly one left-special word [4, Proposition 5.1.5].

Example 4.1

Let S be the Fibonacci set and let $X=\{aa,ab,b\}$. The set X is an S-maximal prefix code. It contains exactly one left-special word, namely ab. Accordingly, one has $\lambda _S(X)=1$.
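The value $\lambda _S(X)=1$ of Example 4.1 can be recomputed from the definition $\lambda _S(x)=e_S(x)-r_S(x)$. The sketch below approximates the Fibonacci set by the factors of a long prefix (assumed to contain all factors of length at most 6); the function names are illustrative.

```python
def factor_set(word, max_len):
    """Factors of `word` of length at most `max_len`, including the empty word."""
    return {word[i:j] for i in range(len(word) + 1)
            for j in range(i, min(i + max_len, len(word)) + 1)}


def is_prefix_code(X):
    """True if X has no empty word and no element is a proper prefix of another."""
    return all(x and not (y != x and y.startswith(x)) for x in X for y in X)


def lam(S, alphabet, x):
    """lambda_S(x) = e_S(x) - r_S(x)."""
    e = sum(1 for a in alphabet for b in alphabet if a + x + b in S)
    r = sum(1 for b in alphabet if x + b in S)
    return e - r


fib = "a"
for _ in range(15):
    fib = "".join({"a": "ab", "b": "a"}[c] for c in fib)
S = factor_set(fib, 6)

X = ["aa", "ab", "b"]
values = {x: lam(S, "ab", x) for x in X}
print(is_prefix_code(X))     # True
print(values)                # {'aa': 0, 'ab': 1, 'b': 0}
print(sum(values.values()))  # 1, that is Card(A) - 1
```

Only the left-special word ab contributes, in agreement with the remark preceding the example.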

Let S be a factorial set and let $X\subset S$ be a finite prefix code. The S-degree of X is the S-minimal rank of the minimal automaton of $X^*$. It is denoted $d_X(S)$.

When X is a finite bifix code, the S-degree can be defined in a different way. A parse of a word w is a triple (s, x, p) such that $w=sxp$ with $s\in A^*\setminus A^*X$, $x\in X^*$ and $p\in A^*\setminus XA^*$. For a recurrent set S and an S-maximal bifix code X, $d_X(S)$ is the maximal number of parses of a word of S. A word $w\in S$ has $d_X(S)$ parses if and only if it is not an internal factor of a word of X (see [4]).
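The definition of a parse translates directly into an enumeration of the factorizations $w=sxp$. The following sketch is illustrative only; it uses the bifix code $X=\{aa,ab,ba\}$ (the words of length 2 of the Fibonacci set), and the function names `in_star` and `parses` are assumptions made here.

```python
def in_star(X, x):
    """Decide whether x is in X^* by dynamic programming over prefixes of x."""
    ok = [True] + [False] * len(x)
    for j in range(1, len(x) + 1):
        # x[:j] is in X^* if it ends with some c in X and the rest is in X^*.
        ok[j] = any(ok[j - len(c)] for c in X if x.endswith(c, 0, j))
    return ok[-1]


def parses(X, w):
    """All triples (s, x, p) with w = sxp, s without a suffix in X,
    x in X^*, and p without a prefix in X."""
    result = []
    for i in range(len(w) + 1):
        s = w[:i]
        if any(s.endswith(c) for c in X):
            continue
        for j in range(i, len(w) + 1):
            x, p = w[i:j], w[j:]
            if in_star(X, x) and not any(p.startswith(c) for c in X):
                result.append((s, x, p))
    return result


X = ["aa", "ab", "ba"]
print(parses(X, "a"))    # [('', '', 'a'), ('a', '', '')]
print(parses(X, "aba"))  # [('', 'ab', 'a'), ('a', 'ba', '')]
```

Both words have two parses, which matches the S-degree 2 of this code in the Fibonacci set.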

The following result is [13, Theorem 4.4].

Theorem 4.2

(Finite Index Basis Theorem). Let S be a uniformly recurrent tree set and let $X\subset S$ be a finite bifix code. Then X is an S-maximal bifix code of S-degree d if and only if it is a basis of a subgroup of index d of $F_A$.

Note that the result implies that any S-maximal bifix code of S-degree d has $d({{\mathrm{Card}}}(A)-1)+1$ elements. Indeed, by Schreier’s Formula, a subgroup of index d of a free group of rank r has rank $d(r-1)+1$.
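As a worked instance of this count (an illustration using the Fibonacci set, where $r={{\mathrm{Card}}}(A)=2$, and index 2), one obtains

$$\begin{aligned} 2(2-1)+1=3, \end{aligned}$$

which agrees with the three elements of the code $X=\{aa,ab,ba\}$ of Example 4.5. On a three-letter alphabet and for the same index, the formula gives $2(3-1)+1=5$.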

Example 4.2

Let S be a Sturmian set. For any $n\ge 1$, the set $X=S\cap A^n$ is an S-maximal bifix code of S-degree n. According to Theorem 4.2, it is a basis of the subgroup which is the kernel of the group morphism from $F_A$ onto the additive group $\mathbb {Z}/n\mathbb {Z}$ sending each letter to 1.

The following statement generalizes [4, Theorem 4.3.7] where it is proved for a bifix code (and in this case with a stronger conclusion).

Theorem 4.3

Let S be a recurrent set and let X be a finite S-maximal prefix code of S-degree n. The set of nonempty proper prefixes of X contains a disjoint union of $n-1$ S-maximal suffix codes.

Proof

Let P be the set of proper prefixes of X. Any word of S of rank n of length larger than the words of X has n suffixes which are in P.

We claim that this implies that any word in S is a suffix of a word with at least n suffixes in P. Indeed, let $x\in S$ be of minimal rank. For any $w\in S$, since S is recurrent, there is some u such that $xuw\in S$. Then xuw is of rank n and has n suffixes in P. This proves the claim.

Let $Y_i$ for $1\le i\le n$ be the set of $p\in P$ which have i suffixes in P. One has $Y_1=\{\varepsilon \}$ and each $Y_i$ for $2\le i\le n$ is clearly a suffix code. It follows from the claim above that it is S-maximal. Since the $Y_i$ are also disjoint, the result follows.

Corollary 1

Let S be a recurrent neutral set of characteristic c, and let X be a finite S-maximal prefix code of S-degree n. The set P of proper prefixes of X satisfies $\rho _S(P)\ge n({{\mathrm{Card}}}(A)-c)$.

Proof

By Theorem 4.3, there exist $n-1$ pairwise disjoint S-maximal suffix codes $Y_i$ ($2\le i\le n$) such that P contains all $Y_i$. By the dual of Proposition 4.1, we have $\rho _S(Y_i)={{\mathrm{Card}}}(A)-c$ for $2\le i\le n$. Since $\rho _S(\varepsilon )=e_S(\varepsilon )-\ell _S(\varepsilon )= m_S(\varepsilon )+r_S(\varepsilon )-1={{\mathrm{Card}}}(A)-c$, we obtain $\rho _S(P)\ge \rho _S(\varepsilon )+(n-1)({{\mathrm{Card}}}(A)-c)=n({{\mathrm{Card}}}(A)-c)$.

4.1 A Cardinality Theorem for Prefix Codes

Theorem 4.4

Let S be a uniformly recurrent neutral set of characteristic c. Any finite S-maximal prefix code X has at least $d_X(S)({{\mathrm{Card}}}(A)-c)+1$ elements.

Proof

Let P be the set of proper prefixes of X and set $n=d_X(S)$. We may identify X with the set of leaves of a tree having P as set of internal nodes, each having $r_S(p)$ sons. By a well-known argument on trees, we have ${{\mathrm{Card}}}(X)=1+\sum _{p\in P}(r_S(p)-1)$. Thus ${{\mathrm{Card}}}(X)=1+\rho _S(P)$. By Corollary 1, we have $\rho _S(P)\ge n({{\mathrm{Card}}}(A)-c)$, and the statement follows.
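The tree count used in this proof can be illustrated on the code $X=\{aa,ab,b\}$ of Example 4.1 in the Fibonacci set (a worked instance added for illustration). Here $P=\{\varepsilon ,a\}$, with $r_S(\varepsilon )=2$ and $r_S(a)=2$ since aa and ab are both in S, so that

$$\begin{aligned} {{\mathrm{Card}}}(X)=1+(r_S(\varepsilon )-1)+(r_S(a)-1)=1+1+1=3. \end{aligned}$$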

The next example shows that the prefix code can have strictly more than $d_X(S)({{\mathrm{Card}}}(A)-c)+1$ elements.

Example 4.3

Let S be the Fibonacci set. Let X be the S-maximal prefix code represented in Fig. 4.1. The states of the minimal automaton of $X^*$ are represented on the figure. The automaton coincides with that of Example 3.1. Thus $d_X(S)=3$ and ${{\mathrm{Card}}}(X)=6$ while $d_X(S)({{\mathrm{Card}}}(A)-1)+1=4$.

If X is bifix, then it has $d_X(S)({{\mathrm{Card}}}(A)-c)+1$ elements by a result of [9]. The following example shows that an S-maximal prefix code can have $d_X(S)({{\mathrm{Card}}}(A)-c)+1$ elements without being bifix.

Example 4.4

Let S be the Fibonacci set and let

$$\begin{aligned} X=\{aaba,ab,ba\}. \end{aligned}$$

The literal automaton of $X^*$ is represented in Fig. 4.2 on the left. The prefix code X is S-maximal. The word ab has rank 2 in the literal automaton of $X^*$. Indeed, $\mathrm{Im}(ab)=\{1,3\}$. Moreover $\mathrm{Ret}_S(ab)=\{ab,aab\}$. The ranks of abab and abaab are also equal to 2, as shown in Fig. 4.2 on the right. Thus the S-degree of X is 2 by Theorem 3.1. The code X is not bifix since ba is a suffix of aaba.

4.2 The Group of a Bifix Code

The following result is proved in [4, Theorem 7.2.5] for a Sturmian set S. Recall that a group code of degree d is a bifix code Z such that $Z^*=\varphi ^{-1}(K)$ for a surjective morphism $\varphi $ from $A^*$ onto a finite group G and a subgroup K of index d in G. Equivalently, a bifix code Z is a group code if it generates the submonoid $H\cap A^*$ where H is a subgroup of index d of the free group $F_A$.

The S-group of a prefix code, denoted $G_X(S)$, is the group $G_\mathcal {A}(S)$ where $\mathcal {A}$ is the minimal automaton of $X^*$.

Theorem 4.5

Let Z be a group code of degree d and let S be a uniformly recurrent tree set. The set $X=Z\cap S$ is an S-maximal bifix code of S-degree d and $G_X(S)$ is equivalent to the representation of $F_A$ on the cosets of the subgroup generated by X.

Proof

The first part is [14, Theorem 5.10], obtained as a corollary of the Finite Index Basis Theorem. To see the second part, let H be the subgroup generated by X of the free group $F_A$. Consider a word $w\in S$ which is not an internal factor of a word of X. Let P be the set of proper prefixes of X which are suffixes of w. Then P has d elements since for each $p\in P$, there is a parse of w of the form (s, x, p). Moreover P is a set of representatives of the right cosets of H. Indeed, let $p,q\in P$ and assume that $p=uq$ with $u\in S$. If $p\in Hq$, then $u\in X^*\cap S$. Since p cannot have a prefix in X, we conclude that $p=q$. Since H has index d, this implies the conclusion.

Let $\mathcal {A}=(Q,i,i)$ be the minimal automaton of $X^*$. Set $I=Q\cdot w$. Let ${{\mathrm{Stab}}}(I)$ be the set of words $x\in A^*$ such that $I\cdot x=I$. Note that ${{\mathrm{Stab}}}(I)$ contains the set $\mathrm{Ret}_S(w)$ of right return words to w. For $x\in {{\mathrm{Stab}}}(I)$, let $\pi (x)$ be the permutation defined by x on I. By definition, the group $G_X(S)$ is generated by $\pi ({{\mathrm{Stab}}}(I))$. Since ${{\mathrm{Stab}}}(I)$ contains $\mathrm{Ret}_S(w)$ and since $\mathrm{Ret}_S(w)$ generates the free group $F_A$, the set ${{\mathrm{Stab}}}(I)$ generates $F_A$.

Let $x\in {{\mathrm{Stab}}}(I)$. For $p,q\in I$, let $u,v\in P$ be such that $i\cdot u=p$, $i\cdot v=q$. Let us verify that

$$\begin{aligned} p\cdot x=q\Leftrightarrow ux\in Hv. \end{aligned}$$

(4.1)

Indeed, let $t\in S$ be such that $vt\in X$. Then, one has $p\cdot x=q$ if and only if $uxt\in X^*$ which is equivalent to $ux\in Hv$. Since ${{\mathrm{Stab}}}(I)$ generates $F_A$, Eq. (4.1) shows that the bijection $u\mapsto i\cdot u$ from P onto I defines an equivalence from $G_X(S)$ onto the representation of $F_A$ on the cosets of H.

Example 4.5

Let S be the Fibonacci set and let $Z=A^2$ which is a group code of degree 2 corresponding to the morphism from $A^*$ onto the additive group $\mathbb {Z}/2\mathbb {Z}$ sending each letter to 1. Then $X=\{aa,ab,ba\}$. The minimal automaton of $X^*$ is represented in Fig. 4.3 on the left. The word a has 2 parses and its image is the set $\{1,2\}$. We have $\mathrm{Ret}_S(a)=\{a,ba\}$ and the action of $\mathrm{Ret}_S(a)$ on the minimal images is indicated in Fig. 4.3 on the right. The word a defines the permutation (12) and the word ba the identity.
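The permutations of Example 4.5 can be recomputed from the minimal automaton of $X^*$. In the sketch below, the state numbering (1 initial and terminal, 2 reached by a, 3 reached by b) is an assumption made here, since the figure is not reproduced; it is consistent with the image $\{1,2\}$ of the word a stated in the example.

```python
# Minimal automaton of X^* for X = {aa, ab, ba}; state numbering is assumed.
delta = {(1, "a"): 2, (1, "b"): 3, (2, "a"): 1, (2, "b"): 1, (3, "a"): 1}
states = (1, 2, 3)


def run(p, w):
    """State p.w, or None if the path is undefined."""
    for letter in w:
        p = delta.get((p, letter))
        if p is None:
            return None
    return p


def image(w):
    """Im(w): the set of defined states p.w."""
    return {run(p, w) for p in states} - {None}


def permutation(I, w):
    """Map induced by w on I; it is a permutation when I.w = I."""
    return {p: run(p, w) for p in sorted(I)}


I = image("a")
print(I)                     # {1, 2}
print(permutation(I, "a"))   # {1: 2, 2: 1}, the transposition (12)
print(permutation(I, "ba"))  # {1: 1, 2: 2}, the identity
```

The two return words to a thus generate the symmetric group on $\{1,2\}$, which is the action of $F_A$ on the two cosets of the subgroup of words of even length.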

Theorem 4.5 is not true for an arbitrary minimal set instead of a minimal tree set (see Example 3.4). The second part is true for an arbitrary finite S-maximal bifix code by the Finite Index Basis Theorem. We have no example where the second part is not true when X is S-maximal prefix instead of S-maximal bifix.

References

1. Berthé, V., De Felice, C., Dolce, F., Leroy, J., Perrin, D., Reutenauer, C., Rindone, G.: Acyclic, connected and tree sets. Monats. Math. 176, 521–550 (2015)
2. Restivo, A.: Codes and local constraints. Theoret. Comput. Sci. 72(1), 55–64 (1990)
3. Reutenauer, C.: Ensembles libres de chemins dans un graphe. Bull. Soc. Math. France 114(2), 135–152 (1986)
4. Berstel, J., De Felice, C., Perrin, D., Reutenauer, C., Rindone, G.: Bifix codes and Sturmian words. J. Algebra 369, 146–202 (2012)
5. Carpi, A., de Luca, A.: Codes of central Sturmian words. Theoret. Comput. Sci. 340(2), 220–239 (2005)
6. Almeida, J., Costa, A.: On the transition semigroups of centrally labeled Rauzy graphs. Internat. J. Algebra Comput. 22(2), 1250018 (2012). 25
7. Almeida, J., Costa, A.: Presentations of Schützenberger groups of minimal subshifts. Israel J. Math. 196(1), 1–31 (2013)
8. Droubay, X., Justin, J., Pirillo, G.: Episturmian words and some constructions of de Luca and Rauzy. Theoret. Comput. Sci. 255(1–2), 539–553 (2001)
9. Dolce, F., Perrin, D.: Enumeration formulæ in neutral sets. In: Potapov, I. (ed.) DLT 2015. LNCS, vol. 9168, pp. 215–227. Springer, Heidelberg (2015)
10. Berstel, J., Perrin, D., Reutenauer, C.: Codes and Automata. Cambridge University Press, Cambridge (2009)
11. Perrin, D., Schupp, P.: Automata on the integers, recurrence, distinguishability, and the equivalence of monadic theories. LICS 1986, 301–304 (1986)
12. Colcombet, T.: Green’s relations and their use in automata theory. In: Dediu, A.-H., Inenaga, S., Martín-Vide, C. (eds.) LATA 2011. LNCS, vol. 6638, pp. 1–21. Springer, Heidelberg (2011)
13. Berthé, V., De Felice, C., Dolce, F., Leroy, J., Perrin, D., Reutenauer, C., Rindone, G.: The finite index basis property. J. Pure Appl. Algebra 219, 2521–2537 (2015)
14. Berthé, V., De Felice, C., Dolce, F., Leroy, J., Perrin, D., Reutenauer, C., Rindone, G.: Maximal bifix decoding. Discrete Math. 338, 725–742 (2015)

Author information

Authors and Affiliations

LIGM, Université Paris Est, Paris, France
Dominique Perrin

Corresponding author

Correspondence to Dominique Perrin.

Editor information

Editors and Affiliations

Universität Kiel, Kiel, Germany
Florin Manea
Universität Kiel, Kiel, Germany
Dirk Nowotka

Cite this paper

Perrin, D. (2015). Codes and Automata in Minimal Sets. In: Manea, F., Nowotka, D. (eds) Combinatorics on Words. WORDS 2015. Lecture Notes in Computer Science, vol 9304. Springer, Cham. https://doi.org/10.1007/978-3-319-23660-5_4

DOI: https://doi.org/10.1007/978-3-319-23660-5_4
Published: 27 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23659-9
Online ISBN: 978-3-319-23660-5
eBook Packages: Computer Science, Computer Science (R0)
