Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

In several previous papers (e.g., [1, 2124, 32] we have explored the ramifications of a decision procedure for the logical theory \(\text {Th}({\mathbb {N}}, +, a(n))\), where \((a(n))_{n \ge 0}\) is an infinite sequence specified by a finite-state machine M. Furthermore, in many cases we can explicitly enumerate various aspects of such sequences, such as subword complexity [8]. Roughly speaking, given a predicate P of one or more variables in the logical theory, the method transforms M to a new automaton \(M'\) that accepts the representations of those variables making the predicate true. The ideas are based on extensions of the logical theory \(\text {Th}({\mathbb {N}},+)\), sometimes called Presburger arithmetic [28, 29]. See, for example, [6].

A critical point is what sort of representations are allowed. According to the results in [5], it suffices that the representations be based on a Pisot number. The critical point is that there must be an automaton that can perform addition of two numbers in the appropriate representation [15, 16].

In the papers mentioned above, we applied our method to the so-called k-automatic sequences, which correspond to automata that work with the ordinary base-k expansions of numbers. More recently, we also proved a number of new results using Fibonacci (or “Zeckendorf”) representation [13], which is based on writing integers as a sum of Fibonacci numbers.

It is our contention that the power of this approach has not been widely appreciated, and that many results, previously proved using long and involved ad hoc techniques, can be proved with much less effort by phrasing them as logical predicates and employing a decision procedure. Furthermore, many enumeration questions can be solved with a similar approach. Although the worst-case running time of the procedure is enormously large (of the form where the number of 2’s is the number of quantifier alternations), n is the size of the polynomial, and p is a polynomial), in practice the procedure often terminates in a reasonable time.

In this paper we discuss our implementation of an analogous algorithm for Tribonacci representation. We use it to reprove some old results from the literature purely mechanically, as well as obtain some new results. The implementation of the decision procedure was created by the first author. It is called Walnut, and is available for free download at https://www.cs.uwaterloo.ca/~shallit/papers.html.

We have not rigorously proved the correctness of this implementation, but it has been tested in a large number of different ways (including some results verified with independently-written programs). In this, we are well in the tradition of many other results in combinatorics on words that have been verified with machine computations — despite a lack of formal verification of the code. Even if the code were formally verified, one could reasonably ask for a proof of the correctness of the verification code! We believe that publication of our code, allowing checking by any interested reader, serves as an adequate check.

We view our work as part of a modern trend in mathematics. For other works on using computerized formal methods to prove theorems see, for example, [25, 27].

2 Tribonacci Representation

Let the Tribonacci numbers be defined, as usual, by the linear recurrence \(T_n = T_{{n-1}} + T_{{n-2}}+ T_{{n-3}}\) for \(n \ge 3\) with initial values \(T_0 = 0\), \(T_1 = 1\), \(T_2 = 1\). (We caution the reader that some authors use a different indexing for these numbers.) Here are the first few values of this sequence.

n

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

\(T_n\)

0

1

1

2

4

7

13

24

44

81

149

274

504

927

1705

3136

5768

From the theory of linear recurrences we know that

$$\begin{aligned} T_n = c_1 \alpha ^n + c_2 \beta ^n + c_3 \gamma ^n \end{aligned}$$

where \(\alpha , \beta , \gamma \) are the zeros of the polynomial \(x^3 - x^2 -x - 1\). The only real zero is \(\alpha \doteq 1.83928675521416113\); the other two zeros are complex and are of magnitude \(<3/4\). Solving for the constants, we find that \(c_1 \doteq 0.336228116994941094225\), the real zero of the polynomial \(44x^3-2x-1=0\). It follows that \(T_n = c_1 \alpha ^n + O(.75^n)\). In particular \(T_n/T_{n-1} = \alpha + O(.41^n)\).

It is well-known that every non-negative integer can be represented, in an essentially unique way, as a sum of Tribonacci numbers \((T_i)_{i\ge 2}\), subject to the constraint that no three consecutive Tribonacci numbers are used [7]. For example, \(43 = T_7 + T_6 + T_4 + T_3\).

Such a representation can be written as a binary word \(a_1 a_2 \cdots a_n\) representing the integer \(\sum _{1 \le i \le n} a_i T_{n+2-i}\). For example, the binary word 110110 is the Tribonacci representation of 43.

Let \(\varSigma _2 = \{ 0, 1 \}\). For \(w = a_1 a_2 \cdots a_n \in \varSigma _2^*\), we define \([a_1 a_2 \cdots a_n]_T := \sum _{1 \le i \le n} a_i T_{n+2-i}\), even if \(a_1 a_2 \cdots a_n\) has leading zeros or occurrences of the word 111.

By \((n)_T\) we mean the canonical Tribonacci representation for the integer n, having no leading zeros or occurrences of 111. Note that \((0)_T = \epsilon \), the empty word. The language of all canonical representations of elements of \({\mathbb {N}}\) is \(\epsilon + (1 + 11)(0+01+011)^*\).

Just as Tribonacci representation is an analogue of base-k representation, we can define the notion of Tribonacci-automatic sequence as the analogue of the more familiar notation of k-automatic sequence [2, 11]. We say that an infinite word \(\mathbf{a} = (a_n)_{n \ge 0}\) is Tribonacci-automatic if there exists an automaton with output \(M = (Q, \varSigma _2, q_0, \delta , \kappa , \varDelta )\) for a coding \(\kappa \) such that \(a_n = \kappa (\delta (q_0, (n)_T))\) for all \(n \ge 0\). An example of a Tribonacci-automatic sequence is the infinite Tribonacci word,

$$\begin{aligned} \mathbf{T} = T_0 T_1 T_2 \cdots = 0102010010201\cdots \end{aligned}$$

which is generated by the following 3-state automaton (Fig. 1):

Fig. 1.
figure 1

Automaton generating the Tribonacci sequence

To compute \(T_i\), we express i in canonical Tribonacci representation, and feed it into the automaton. Then \(T_i\) is the output associated with the last state reached (denoted by the symbol after the slash).

A basic fact about Tribonacci representation is that addition can be performed by a finite automaton. To make this precise, we need to generalize our notion of Tribonacci representation to r-tuples of integers for \(r \ge 1\). A representation for \((x_1, x_2,\ldots , x_r)\) consists of a string of symbols z over the alphabet \(\varSigma _2^r\), such that the projection \(\pi _i(z)\) over the i’th coordinate gives a Tribonacci representation of \(x_i\). Notice that since the canonical Tribonacci representations of the individual \(x_i\) may have different lengths, padding with leading zeros will often be necessary. A representation for \((x_1, x_2, \ldots , x_r)\) is called canonical if it has no leading \([0,0,\ldots 0]\) symbols and the projections into individual coordinates have no occurrences of 111. We write the canonical representation as \((x_1, x_2, \ldots , x_r)_T\). Thus, for example, the canonical representation for (9, 16) is [0, 1][1, 0][0, 0][1, 1][0, 1].

Thus, our claim about addition in Tribonacci representation is that there exists a deterministic finite automaton (DFA) \(M_\mathrm{add}\) that takes input words of the form \([0,0,0]^* (x,y,z)_T\), and accepts if and only if \(x +y =z\). Thus, for example, \(M_\mathrm{add}\) accepts [1, 0, 1][0, 1, 1][0, 0, 0] since the three words obtained by projection are 100, 010, and 110, which represent, respectively, 4, 2, and 6 in Tribonacci representation.

Since this automaton does not appear to have been given explicitly in the literature and it is essential to our implementation, we give it below in Table 1. This automaton actually works even for non-canonical expansions having three consecutive 1’s. The initial state is state 1. The state 0 is a “dead state” that can safely be ignored.

We briefly sketch a proof of the correctness of this automaton. States can be identified with certain sequences, as follows: if xyz are the identical-length words arising from projection of a word that takes \(M_\mathrm{add}\) from the initial state 1 to the state t, then t is identified with the integer sequence \(([x0^n]_T + [y0^n]_T - [z0^n]_T)_{n \ge 0}\). State 0 corresponds to sequences that can never lead to 0, as they are too positive or too negative.

When we intersect this automaton with the appropriate regular language (ruling out input triples containing 111 in any coordinate), we get an automaton with 149 states accepting \(0^*(x,y,z)_T\) such that \(x + y = z\).

Another basic fact about Tribonacci representation is that, for canonical representations containing no three consecutive 1’s or leading zeros, the radix order on representations is the same as the ordinary ordering on \({\mathbb {N}}\). It follows that a very simple automaton can, on input \((x,y)_T\), decide whether \(x < y\).

Putting this all together, we get the following decision procedure:

Procedure 1

(Decision Procedure for Tribonacci-Automatic Words) Input:

  • \(m,n \in {\mathbb {N}}\);

  • m DFA’s generating the Tribonacci-automatic words \(\mathbf{w}_1,\mathbf{w}_2,\dots ,\mathbf{w}_m\);

  • a first-order proposition with n free variables \(\varphi (v_1,v_2,\dots ,v_n)\) using constants and relations definable in \(\text {Th}({\mathbb {N}},0,1,+)\) and indexing into \(\mathbf{w}_1,\mathbf{w}_2,\dots ,\mathbf{w}_m\).

Output: DFA with input alphabet \(\varSigma _2^n\) accepting

\(\{ (k_1,k_2,\dots ,k_n)_T \;:\;\varphi (k_1,k_2,\dots ,k_n) \text { holds} \}\).

Table 1. Transition table for \(M_\mathrm{add}\) for Tribonacci addition

3 Mechanical Proofs of Properties of the Infinite Tribonacci Word

Recall that a word x, whether finite or infinite, is said to have period p if \(x[i] = x[i+p]\) for all i for which this equality is meaningful. Thus, for example, the English word \(\mathtt{alfalfa}\) has period 3. The exponent of a finite word x, written \(\exp (x)\), is |x| / P, where P is the smallest period of x. Thus \(\exp (\mathtt{alfalfa}) = 7/3\).

If \(\mathbf{x}\) is an infinite word with a finite period, we say it is ultimately periodic. An infinite word \(\mathbf{x}\) is ultimately periodic if and only if there are finite words uv such that \(x = uv^\omega \), where \(v^\omega = vvv \cdots \).

A nonempty word of the form xx is called a square, and a nonempty word of the form xxx is called a cube. More generally, a nonempty word of the form \(x^n\) is called an n’th power. By the order of a square xx, cube xxx, or n’th power \(x^n\), we mean the length |x|.

The infinite Tribonacci word \(\mathbf{T} = 0102010 \cdots = T_0 T_1 T_2 \cdots \) can be described in many different ways. In addition to our definition in terms of automata, it is also the fixed point of the morphism \(\varphi (0) = 01\), \(\varphi (1) = 02\), and \(\varphi (1) = 0\). This word has been studied extensively in the literature; see, for example, [3, 9, 14, 17, 30, 31, 33, 34].

It can also be described as the limit of the finite Tribonacci words \((Y_n)_{n \ge 0}\), defined as follows:

$$\begin{aligned} Y_0&= \epsilon \quad&Y_1&= 2 \quad&Y_2&= 0 \\ Y_3&= 01 \quad&Y_n&= Y_{n-1} Y_{n-2} Y_{n-3} \ \text {for}\, n \ge 4. \end{aligned}$$

Note that \(Y_n\), for \(n \ge 2\), is the prefix of length \(T_n\) of \(\mathbf{T}\).

In the next subsection, we use our implementation to prove a variety of results about repetitions in \(\mathbf{T}\).

3.1 Repetitions

It is known that all strict epistandard words (or Arnoux-Rauzy words), are not ultimately periodic (see, for example, [19]). Since \(\mathbf{T}\) is in this class, we have the following known result which we can reprove using our method.

Theorem 2

The word \(\mathbf{T}\) is not ultimately periodic.

Proof

We construct a predicate asserting that the integer \(p \ge 1\) is a period of some suffix of \(\mathbf{T}\):

$$\begin{aligned} (p \ge 1) \ \wedge \ \exists n \ \forall i \ge n\ \mathbf{T}[i] =\mathbf{T}[i+p] . \end{aligned}$$

(Note: unless otherwise indicated, whenever we refer to a variable in a predicate, the range of the variable is assumed to be \({\mathbb {N}}= \{ 0, 1, 2, \ldots \}\).) From this predicate, using our program, we constructed an automaton accepting the language

$$\begin{aligned} L = 0^*\ \{ (p)_T \ : \ (p \ge 1) \ \wedge \ \exists n\ \forall i \ge n \ \mathbf{T}[i] = \mathbf{T}[i+p] \} . \end{aligned}$$

This automaton accepts the empty language, and so it follows that \(\mathbf{T}\) is not ultimately periodic.

Here is the log of our program:

figure a

The largest intermediate automaton during the computation had 5999 states.

A few words of explanation are in order: here “T” refers to the sequence \(\mathbf{T}\), and “E” is our abbreviation for \(\exists \) and “A” is our abbreviation for \(\forall \). The symbol “=> ” is logical implication, and “&” is logical and.    \(\square \)

From now on, whenever we discuss the language accepted by an automaton, we will omit the \(0^*\) at the beginning.

We now turn to repetitions. As a particular case of [17, Theorem. 6.31 and Example 7.6, p. 130] and [18, Example 6.21] we have the following result, which we can reprove using our method.

Theorem 3

\(\mathbf{T}\) contains no fourth powers.

Proof

A natural predicate for the orders of all fourth powers occurring in \(\mathbf{T}\):

$$\begin{aligned} (n > 0) \ \wedge \ \exists i \ \forall t<3n \ \mathbf{T}[i+t] = \mathbf{T}[i+n+t] . \end{aligned}$$

However, this predicate could not be run on our prover. It runs out of space while trying to determinize an NFA with 24904 states.

Instead, we make the substitution \(j = i+t\), obtaining the new predicate

$$\begin{aligned} (n > 0) \ \wedge \ \exists i \ \forall j\ ((j \ge i) \wedge (j<i+3n)) \implies \mathbf{T}[j] = \mathbf{T}[j+n] . \end{aligned}$$

The resulting automaton accepts nothing, so there are no fourth powers.

Here is the log.

figure b

The largest intermediate automaton in the computation had 86711 states.    \(\square \)

Next, we move on to a description of the orders of squares occurring in \(\mathbf{T}\). We reprove a result of Glen [17, Sect. 6.3.5].

Theorem 4

All squares in \(\mathbf{T}\) are of order \(T_n\) or \(T_n + T_{n-1}\) for some \(n \ge 2\). Furthermore, for all \(n \ge 2\), there exists a square of order \(T_n\) and \(T_n + T_{n-1}\) in \(\mathbf{T}\).

Proof

A natural predicate for the lengths of squares is

$$\begin{aligned} (n > 0) \ \wedge \ \exists i \ \forall t<n \ \mathbf{T}[i+t] = \mathbf{T}[i+n+t] . \end{aligned}$$

but when we run our solver on this predicate, we get an intermediate NFA of 4612 states that our solver could not determinize in the allotted space. The problem appears to arise from the three different variables indexing T. To get around this problem, we rephrase the predicate, introducing a new variable j that represents \(i+t\). This gives the predicate

$$\begin{aligned} (n>0) \ \wedge \ \exists i \ \forall j \ ((i\le j) \wedge (j< i+n)) \implies \mathbf{T}[j] = \mathbf{T}[j+n] . \end{aligned}$$

and the following log

The resulting automaton accepts exactly the language \(10^* + 110^*\). The largest intermediate automaton had 26949 states.   \(\square \)

figure c

We can easily get more information about the square occurrences in \(\mathbf{T}\). By modifying our previous predicate, we get

$$\begin{aligned} (n>0) \ \wedge \ \forall j \ ((i\le j) \wedge (j< i+n)) \implies \mathbf{T}[j] = \mathbf{T}[j+n] \end{aligned}$$

which encodes those (in) pairs such that there is a square of order n beginning at position i of \(\mathbf{T}\).

This automaton has only 10 states and efficiently encodes the orders and starting positions of each square in \(\mathbf{T}\). During the computation, the largest intermediate automaton had 26949 states. Thus we have proved the following new result:

Theorem 5

The language

$$\begin{aligned} \{ (i,n)_T \ : \ there\; is\; a\; square\; of\; order\; n\; beginning\; at\; position\; i\; in\; \mathbf{T} \} \end{aligned}$$

is accepted by the automaton in Fig. 2.

Fig. 2.
figure 2

Automaton accepting orders and positions of all squares in \(\mathbf{T}\)

Next, we examine the cubes in \(\mathbf{T}\). Evidently Theorem 4 implies that any cube in \(\mathbf{T}\) must be of order \(T_n\) or \(T_n + T_{n-1}\) for some n. However, not every order occurs. We thus recover the following result of Glen [17, Sect. 6.3.7].

Theorem 6

The cubes in \(\mathbf{T}\) are of order \(T_n\) for \(n \ge 5\), and a cube of each such order occurs.

Proof

We use the predicate

$$\begin{aligned} (n>0) \ \wedge \ \exists i \ \forall j \ ((i\le j) \wedge (j< i+2n)) \implies \mathbf{T}[j] = \mathbf{T}[j+n] . \end{aligned}$$

When we run our program, we obtain an automaton accepting exactly the language \((1000)0^*\), which corresponds to \(T_n\) for \(n \ge 5\). The largest intermediate automaton had 60743 states.    \(\square \)

4 Enumeration

Mimicking the base-k ideas in [8], we can also mechanically enumerate many aspects of Tribonacci-automatic sequences. We do this by encoding the factors having the property in terms of paths of an automaton. This gives the concept of Tribonacci-regular sequence. Roughly speaking, a sequence \((a(n))_{n \ge 0}\) taking values in \({\mathbb {N}}\) is Tribonacci-regular if the set of sequences

$$\begin{aligned} \{ (a([xw]_T)_{w \in \varSigma _2^*} \ : \ x \in \varSigma _2^* \} \end{aligned}$$

is finitely generated. Here we assume that \(a([xw]_T)\) evaluates to 0 if xw contains the word 111. Every Tribonacci-regular sequence \((a(n))_{n \ge 0}\) has a linear representation of the form \((u, \mu , v)\) where u and v are row and column vectors, respectively, and \(\mu :\varSigma _2 \rightarrow {\mathbb {N}}^{d \times d}\) is a matrix-valued morphism, where \(\mu (0) = M_0\) and \(\mu (1) = M_1\) are \(d \times d\) matrices for some \(d \ge 1\), such that

$$\begin{aligned} a(n) = u \cdot \mu (x) \cdot v \end{aligned}$$

whenever \([x]_T = n\). The rank of the representation is the integer d.

Recall that if \(\mathbf{x}\) is an infinite word, then the subword complexity function \(\rho _\mathbf{x} (n)\) counts the number of distinct factors of length n. Then, in analogy with [8, Theorem. 27], we have

Theorem 7

If \(\mathbf{x}\) is Tribonacci-automatic, then the subword complexity function of \(\mathbf{x}\) is Tribonacci-regular.

Using our implementation, we can obtain a linear representation of the subword complexity function for \(\mathbf{T}\). An obvious choice is to use the language

$$\begin{aligned} \{ (n,i)_T \ : \ \forall j < i \ \mathbf{T}[i..i+n-1] \ne \mathbf{T}[j..j+n-1] \} , \end{aligned}$$

based on a predicate that expresses the assertion that the factor of length n beginning at position i has never appeared before. Then, for each n, the number of corresponding i gives \(\rho _\mathbf{T}(n)\).

However, this does not run to completion in our implementation in the allotted time and space. Instead, let us substitute \(u = j+t\) and \(k = i-j\) to get the predicate

$$\begin{aligned} \forall k\ (((k>0)\wedge (k\le i)) \implies (\exists u\ ((u\ge j) \wedge (u<n+j)\wedge (\mathbf{T}[u]\ne \mathbf{T}[u+k])))) . \end{aligned}$$

This predicate is close to the upper limit of what we can compute using our program. The largest intermediate automaton had 1230379 states and the program took 12323.82 s, giving us a linear representation \((u, \mu , v)\) rank 22. When we minimize this using the algorithm in [4] we get the rank-12 linear representation

$$\begin{aligned} u = [1\ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 ] \end{aligned}$$
$$\begin{aligned} M_0&= \left[ \begin{array}{cccccccccccc} 1&{} 0&{} 0&{} 0&{} 0&{} 0&{} 0&{} 0&{} 0&{} 0&{} 0&{} 0\\ 0&{} 0&{} 1&{} 0&{} 0&{} 0&{} 0&{} 0&{} 0&{} 0&{} 0&{} 0\\ 0&{} 0&{} 0&{} 0&{} 1&{} 0&{} 0&{} 0&{} 0&{} 0&{} 0&{} 0\\ -1&{} 0&{} 1&{} 0&{} 1&{} 0&{} 0&{} 0&{} 0&{} 0&{} 0&{} 0\\ 0&{} 0&{} 0&{} 0&{} 0&{} 0&{} 1&{} 0&{} 0&{} 0&{} 0&{} 0\\ -1&{} 0&{} 1&{} 0&{} 0&{} 0&{} 1&{} 0&{} 0&{} 0&{} 0&{} 0\\ -2&{} 0&{} 1&{} 0&{} 1&{} 0&{} 1&{} 0&{} 0&{} 0&{} 0&{} 0\\ -3&{} 0&{} 2&{} 0&{} 1&{} 0&{} 1&{} 0&{} 0&{} 0&{} 0&{} 0\\ -4&{} 0&{} 2&{} 0&{} 2&{} 0&{} 1&{} 0&{} 0&{} 0&{} 0&{} 0\\ -5&{} 0&{} 2&{} 0&{} 2&{} 0&{} 2&{} 0&{} 0&{} 0&{} 0&{} 0\\ -6&{} 0&{} 2&{} 0&{} 3&{} 0&{} 2&{} 0&{} 0&{} 0&{} 0&{} 0\\ -10&{} 0&{} 3&{} 0&{} 4&{} 0&{} 4&{} 0&{} 0&{} 0&{} 0&{} 0\\ \end{array} \right] \quad&\quad M_1&= \left[ \begin{array}{cccccccccccc} 0&{}1&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0\\ 0&{}0&{}0&{}1&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0\\ 0&{}0&{}0&{}0&{}0&{}1&{}0&{}0&{}0&{}0&{}0&{}0\\ 0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0\\ 0&{}0&{}0&{}0&{}0&{}0&{}0&{}1&{}0&{}0&{}0&{}0\\ 0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}1&{}0&{}0&{}0\\ 0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}1&{}0&{}0\\ 0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}1&{}0\\ 0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0\\ 0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}1\\ 0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0\\ 0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}0\\ \end{array} \right] \end{aligned}$$
$$\begin{aligned} v' = [1\ 3\ 5\ 7\ 9\ 11\ 15\ 17\ 21\ 29\ 33\ 55]^R . \end{aligned}$$

Comparing this to an independently-derived linear representation of the function \(2n+1\), we see they are the same. From this we get a well-known result (see, e.g., [12, Theorem 7]):

Theorem 8

The subword complexity function of \(\mathbf{T}\) is \(2n+1\).

We now turn to computing the exact number of square occurrences in the finite Tribonacci words \(Y_n\).

To solve this using our approach, we first generalize the problem to consider any length-n prefix of \(Y_n\), and not simply the prefixes of length \(T_n\).

The predicate represents the number of distinct squares in \(\mathbf{T}[0..n-1]\):

$$ L_\mathrm{ds} := \{ (n,i,j)_T \ : \ (j \ge 1) \text { and } (i+2j \le n) \text { and } \mathbf{T}[i..i+j-1] = \mathbf{T}[i+j..i+2j-1] \\ \text { and } \forall i' < i \ \mathbf{T}[i'..i'+2j-1] \not = \mathbf{T}[i..i+2j-1] \} . $$

This predicate asserts that \(\mathbf{T}[i..i+2j-1]\) is a square occurring in \(\mathbf{T}[0..n-1]\) and that furthermore it is the first occurrence of this particular word in \(\mathbf{T}[0..n-1]\).

This represents the total number of occurrences of squares in \(\mathbf{T}[0..n-1]\):

$$\begin{aligned} L_\mathrm{dos} := \{ (n,i,j)_T \ : \ (j \ge 1) \text { and }(i+2j \le n) \text { and } \mathbf{T}[i..i+j-1] = \mathbf{T}[i+j..i+2j-1] \} . \end{aligned}$$

This predicate asserts that \(\mathbf{T}[i..i+2j-1]\) is a square occurring in \(\mathbf{T}[0..n-1]\).

Unfortunately, applying our enumeration method to this suffers from the same problem as before, so we rewrite it as

$$\begin{aligned} (j \ge 1) \wedge \ (i+2j \le n) \ \wedge \forall u \ ((u \ge i)\wedge (u<i+j)) \implies \mathbf{T}[u] = \mathbf{T}[u+j] \end{aligned}$$

When we compute the linear representation of the function counting the number of such i and j, we get a linear representation of rank 63. Now we compute the minimal polynomial of \(M_0\) which is \((x-1)^2 (x^2 + x + 1)^2 (x^3 - x^2 -x - 1)^2\). Solving a linear system in terms of the roots (or, more accurately, in terms of the sequences 1, n, \(T_n\), \(T_{n-1}\), \(T_{n-2}\), \(nT_n\), \(nT_{n-1}\), \(n T_{n-2}\)) gives

Theorem 9

The total number of occurrences of squares in the Tribonacci word \(Y_n\) is

$$\begin{aligned} c(n) = {n \over {22}}(9 T_n - T_{n-1} - 5 T_{n-2}) +{1 \over {44}} (-117 T_n + 30 T_{n-1} + 33 T_{n-2}) + n - {7 \over 4} \end{aligned}$$

for \(n \ge 5\).

In a similar way, we can count the occurrences of cubes in the finite Tribonacci word \(Y_n\). Here we get a linear representation of rank 46. The minimal polynomial for \(M_0\) is \(x^4(x^3-x^2-x-1)^2(x^2+x+1)^2(x-1)^2\). Using analysis exactly like the square case, we easily find

Theorem 10

Let C(n) denote the number of cube occurrences in the Tribonacci word \(Y_n\). Then for \(n \ge 3\) we have

$$ C(n) = {1 \over {44}} (T_n + 2T_{n-1} -33 T_{n-2}) + {n \over {22}} (-6T_n + 8T_{n-1} + 7T_{n-2}) + {n \over 6} \\ -{1 \over 4} [n \equiv 0\ (\mathrm{mod}\ 3)] + {1 \over 12} [n \equiv 1\ (\mathrm{mod}\ 3)] -{7 \over 12} [n \equiv 2\ (\mathrm{mod}\ 3)] . $$

Here [P] is Iverson notation, and equals 1 if P holds and 0 otherwise.

5 Additional Results

Next, we encode the orders and positions of all cubes. We build a DFA accepting the language

$$\begin{aligned} \{ (i,n)_T \ : \ (n>0) \ \wedge \ \forall j \ ((i\le j) \wedge (j< i+2n))\implies \mathbf{T}[j] = \mathbf{T}[j+n]\} . \end{aligned}$$

Theorem 11

The language

$$\begin{aligned} \{ (n,i)_T \ : \ there\; is\; a\; cube\; of\; order\; n\; beginning\; at\; position\; i\; in\; \mathbf{T} \} \end{aligned}$$

is accepted by the automaton in Fig. 3.

Fig. 3.
figure 3

Automaton accepting orders and positions of all cubes in \(\mathbf{T}\)

We also computed an automaton accepting those pairs (pn) such that there is a factor of \(\mathbf{T}\) having length n and period p, and n is the largest such length corresponding to the period p. However, this automaton has 266 states, so we do not give it here.

5.1 Palindromes

We now turn to a characterization of the palindromes in \(\mathbf{T}\). Once again it turns out that the predicate we previously used in [13], namely,

$$\begin{aligned} \exists i \ \forall j<n \ \mathbf{T}[i+j] = \mathbf{T}[i+n-1-j], \end{aligned}$$

resulted in an intermediate NFA of 5711 states that we could not successfully determinize.

Instead, we used two equivalent predicates. The first accepts n if there is an even-length palindrome, of length 2n, centered at position i:

$$\begin{aligned} \exists i \ge n \ \forall j<n\ \mathbf{T}[i+j]=\mathbf{T}[i-j-1] . \end{aligned}$$

The second accepts n if there is an odd-length palindrome, of length \(2n+1\), centered at position i:

$$\begin{aligned} \exists i \ge n \ \forall j\ (1 \le j \le n) \implies \mathbf{T}[i+j]=\mathbf{T}[i-j]. \end{aligned}$$

Theorem 12

There exist palindromes of every length \(\ge 0\) in \(\mathbf{T}\).

Proof

For the first predicate, our program outputs the automaton below. It clearly accepts the Tribonacci representations for all n (Fig. 4).

The log of our program follows.

Fig. 4.
figure 4

Automaton accepting lengths of palindromes in \(\mathbf{T}\)

figure d

The largest intermediate automaton had 918871 states. This was a fairly significant computation, taking about two hours’ CPU time on a laptop.

We omit the details of the computation for the odd-length palindromes, which are quite similar.   \(\square \)

Remark 1

A. Glen has pointed out to us that this follows from the fact that \(\mathbf{T}\) is episturmian and hence rich, so a new palindrome is introduced at each new position in T.

We could also characterize the positions of all nonempty palindromes. To illustrate the idea, we generated an automaton accepting (in) such that \(\mathbf{T}[i-n..i+n-1]\) is an (even-length) palindrome (Fig. 5).

Fig. 5.
figure 5

Automaton accepting orders and positions of all nonempty even-length palindromes in \(\mathbf{T}\)

The prefixes are factors of particular interest. Let us determine which prefixes are palindromes:

Theorem 13

The prefix \(\mathbf{T}[0..n-1]\) of length n is a palindrome if and only if \(n=0\) or \((n)_T \in 1 + 11 + 10(010)^*(00+001+0011)\).

Proof

We use the predicate

$$\begin{aligned} \forall i<n\ \mathbf{T}[i] = \mathbf{T}[n-1-i]. \end{aligned}$$

The automaton generated is given below (Fig. 6).

Fig. 6.
figure 6

Automaton accepting lengths of palindromes in \(\mathbf{T}\)

   \(\square \)

Remark 2

A. Glen points out to us that the palindromic prefixes of \(\mathbf{T}\) are precisely those of the form \(\text {Pal}(w)\), where w is a finite prefix of the infinite word \((012)^\omega \) and \(\text {Pal}\) denotes the “iterated palindromic closure”; see, for example, [19, Example 2.6]. She also points out that these lengths are precisely the integers \((T_i + T_{i+2} - 3)/2\) for \(i \ge 1\).

5.2 Quasiperiods

We now turn to quasiperiods. An infinite word \(\mathbf a\) is said to be quasiperiodic if there is some finite nonempty word x such that \(\mathbf{a}\) can be completely “covered” with translates of x. Here we study the stronger version of quasiperiodicity where the first copy of x used must be aligned with the left edge of \(\mathbf w\) and is not allowed to “hang over”; these are called aligned covers in [10]. More precisely, for us \(\mathbf{a} = a_0 a_1 a_2 \cdots \) is quasiperiodic if there exists x such that for all \(i \ge 0\) there exists \(j\ge 0\) with \(i-n < j \le i\) such that \(a_j a_{j+1} \cdots a_{j+n-1} = x\), where \(n = |x|\). Such an x is called a quasiperiod. Note that the condition \(j \ge 0\) implies that, in this interpretation, any quasiperiod must actually be a prefix of \(\mathbf a\).

Glen, Levé, and Richomme characterized the quasiperiods of a large class of words, including the Tribonacci word [20, Theorem 4.19]. However, their characterization did not explicitly give the lengths of the quasiperiods. We do that in the following new result.

Theorem 14

A nonempty length-n prefix of \(\mathbf{T}\) is a quasiperiod of \(\mathbf{T}\) if and only if n is accepted by the following automaton (Fig. 7):

Fig. 7.
figure 7

Automaton accepting lengths of quasiperiods of the Tribonacci sequence

Proof

We write a predicate for the assertion that the length-n prefix is a quasiperiod:

$$\begin{aligned} \forall i \ge 0 \ \exists j \text { with } i-n < j \le i\text { such that } \forall t<n \ \mathbf{T}[t] = \mathbf{T}[j+t] . \end{aligned}$$

When we do this, we get the automaton above. These numbers are those i for which \(T_n \le i \le U_n\) for \(n \ge 5\), where \(U_2 = 0\), \(U_3 = 1\), \(U_4 = 3\), and \(U_n = U_{n-1} + U_{n-2} + U_{n-3} + 3\) for \(n \ge 5\).   \(\square \)

5.3 Unbordered Factors

Next we look at unbordered factors. A word y is said to be a border of x if y is both a nonempty proper prefix and suffix of x. A word x is bordered if it has at least one border. It is easy to see that if a word y is bordered iff it has a border of length \(\ell \) with \(0 < \ell \le |y|/2\).

Theorem 15

There is an unbordered factor of length n of \(\mathbf{T}\) if and only if \((n)_T\) is accepted by the automaton given below (Fig. 8).

Fig. 8.
figure 8

Automaton accepting lengths of unbordered factors of the Tribonacci sequence

Proof

As in a previous paper [13] we can express the property of having an unbordered factor of length n as follows

$$\begin{aligned} \exists i\ \forall j, 1 \le j \le n/2, \ \exists t<j\ \mathbf{T}[i+t] \ne \mathbf{T}[i+n-j+t] . \end{aligned}$$

However, this does not run to completion within the available space on our prover. Instead, make the substitutions \(t' = n-j\) and \(u = i+t\). This gives the predicate

$$\begin{aligned} \exists i\ \forall t',\ n/2 \le t' <n, \ \exists u, \ (i \le u < i+n-t')\ \mathbf{T}[u] \ne \mathbf{T}[u+t'] . \end{aligned}$$

Here is the log:

figure e

5.4 Lyndon Words

Next, we turn to some results about Lyndon words. Recall that a nonempty word x is a Lyndon word if it is lexicographically less than all of its nonempty proper prefixes.Footnote 1

Theorem 16

There is a factor of length n of \(\mathbf{T}\) that is Lyndon if and only if n is accepted by the automaton given below (Fig. 9).

Fig. 9.
figure 9

Automaton accepting lengths of Lyndon factors of the Tribonacci sequence

Proof

Here is a predicate specifying that there is a factor of length n that is Lyndon:

$$\begin{aligned} \exists i\ \forall j, 1 \le j < n, \ \exists t < n-j \ (\forall u<t \ \mathbf{T}[i+u]=\mathbf{T}[i+j+u]) \ \wedge \ \mathbf{T}[i+t] < \mathbf{T}[i+j+t] . \end{aligned}$$

Unfortunately this predicate did not run to completion, so we substituted \(u' := i+u\) to get

$$\begin{aligned} \exists i\ \forall j, 1 \le j < n, \ \exists t < n-j \ (\forall u', i \le u' < i+t \ \mathbf{T}[u']=\mathbf{T}[u'+j]) \ \wedge \ \mathbf{T}[i+t] < \mathbf{T}[i+j+t]. \quad \square \end{aligned}$$

5.5 Critical Exponent

Recall from Sect. 3 that \(\exp (w) = |w|/P\), where P is the smallest period of w. The critical exponent of an infinite word \(\mathbf{x}\) is the supremum, over all factors w of \(\mathbf{x}\), of \(\exp (w)\).

Then Tan and Wen [33] proved that

Theorem 17

The critical exponent of \(\mathbf{T}\) is \(\rho \doteq 3.19148788395311874706\), the real zero of the polynomial \(2x^3-12x^2+22x-13\).

A. Glen points out that this result can also be deduced from [26, Theorem 5.2].

Proof

Let x be any factor of exponent \(\ge 3\) in \(\mathbf{T}\). From Theorem 11 we know that such x exist. Let \(n = |x|\) and p be the period, so that \(n/p \ge 3\). Then by considering the first 3p symbols of x, which form a cube, we have by Theorem 11 that \(p = T_n\). So it suffices to determine the largest n corresponding to every p of the form \(T_n\). We did this using the predicate (Fig. 10).

Fig. 10.
figure 10

Length n of longest factors having period \(p= T_n\) of Tribonacci sequence

From inspection of the automaton, we see that the maximum length of a factor \(n = U_j\) having period \(p = T_j\), \(j \ge 2\), is given by

$$\begin{aligned} U_j = \left\{ \begin{array}{ll} 2, &{} \text {if}\, j = 2; \\ 5, &{} \text {if}\, j = 3; \\ {}[110(100)^{i-1} 0]_T, &{} \text {if}\, j = 3i+1 \ge 4; \\ {}[110(100)^{i-1} 01]_T, &{} \text {if}\, j = 3i+2 \ge 5; \\ {}[110(100)^{i-1} 011]_T, &{} \text {if}\, j = 3i+3 \ge 6. \end{array}\right. \end{aligned}$$

A tedious induction shows that \(U_j\) satisfies the linear recurrence \(U_j = U_{j-1}+U_{j-2}+U_{j-3} + 3\) for \(j \ge 5\). Hence we can write \(U_j\) as a linear combination of Tribonacci sequences and the constant sequence 1, and solving for the constants we get

$$\begin{aligned} U_j = {5 \over 2}T_j + T_{j-1} + {1 \over 2} T_{j-2} - {3 \over 2} \end{aligned}$$

for \(j \ge 2\).

The critical exponent of T is then \(\sup _{j \ge 1} U_j/T_j\). Now

$$\begin{aligned} U_j/T_j = {5 \over 2} + {{T_{j-1}} \over {T_j}} + {{T_{j-2}} \over {2T_j}} - {3 \over {2T_j}} = {5 \over 2} + \alpha ^{-1} + {1 \over 2} \alpha ^{-2} + O(1.8^{-j}). \end{aligned}$$

Hence \(U_j/T_j\) tends to \(5/2 + \alpha ^{-1} + {1\over 2}\alpha ^{-2} = \rho \).   \(\square \)

We can also ask the same sort of questions about the initial critical exponent of a word \(\mathbf w\), which is the supremum over the exponents of all prefixes of \(\mathbf w\).

Theorem 18

The initial critical exponent of \(\mathbf{T}\) is \(\rho - 1\).

Proof

We create an automaton \(M_\mathrm{ice}\) accepting the language

$$\begin{aligned} L = \{ (n,p)_T \ : \ \mathbf{T}[0..n-1] \text { has least period } p \} . \end{aligned}$$

It is depicted in Fig. 11 below. An analysis similar to that we gave above for the critical exponent gives the result.   \(\square \)

Fig. 11.
figure 11

Automaton accepting least periods of prefixes of length n

Theorem 19

The only prefixes of the Tribonacci word that are powers are those of length \(2 T_n\) for \(n \ge 5\).

Proof

The predicate

$$\begin{aligned} \exists d < n\ (\forall j<n-d \ \mathbf{T}[j]=\mathbf{T}[d+j]) \ \wedge \ (\forall k<d \ \mathbf{T}[k] = \mathbf{T}[n-d+k]) \end{aligned}$$

asserts that the prefix \(\mathbf{T}[0..n-1]\) is a power. When we run this through our program, the resulting automaton accepts \(100010^*\), which corresponds to \(F_{n+1} + F_{n-3} = 2T_n\) for \(n \ge 5\).   \(\square \)

6 Abelian Properties

We can derive some results about the abelian properties of the Tribonacci word \(\mathbf{T}\) by proving the analogue of Theorem 63 of [13]:

Theorem 20

Let n be a non-negative integer and let \(e_1 e_2 \cdots e_j\) be a Tribonacci representation of n, possibly with leading zeros, with \(j \ge 3\). Then

  1. (a)

    \(|\mathbf{T}[0..n-1]|_0 = [e_1 e_2 \cdots e_{j-1}]_T + e_j\).

  2. (b)

    \(|\mathbf{T}[0..n-1]|_1 = [e_1 e_2 \cdots e_{j-2}]_T + e_{j-1}\).

  3. (c)

    \(|\mathbf{T}[0..n-1]|_2 = [e_1 e_2 \cdots e_{j-3}]_T + e_{j-2}\).

Proof

By induction, in analogy with the proof of [13, Theorem 63].   \(\square \)

Recall that the Parikh vector \(\psi (x)\) of a word x over an ordered alphabet \(\varSigma = \{ a_1, a_2, \ldots , a_k \}\) is defined to be \((|x|_{a_1}, \ldots , |x|_{a_k})\), the number of occurrences of each letter in x. Recall that the abelian complexity function \(\rho _\mathbf{w}^\mathrm{ab} (n)\) counts the number of distinct Parikh vectors of the length-n factors of an infinite word \(\mathbf w\).

Using Theorem 20 we get another proof of a recent result of Turek [34].

Corollary 1

The abelian complexity function of \(\mathbf{T}\) is Tribonacci-regular.

Proof

First, from Theorem 20 there exists an automaton \(\text {TAB}\) such that \((n,i,j,k)_T\) is accepted iff \((i,j,k) = \psi (\mathbf{T}[0..n-1])\). In fact, such an automaton has 32 states.

Using this automaton, we can create a predicate P(ni) such that the number of i for which P(ni) is true equals \(\rho _\mathbf{T}^\mathrm{ab} (n)\). For this we assert that i is the least index at which we find an occurrence of the Parikh vector of \(\mathbf{T}[i..i+n-1]\):

$$ \forall i' < i \ \exists a_0, a_1, a_2, b_0,b_1, b_2, c_0,c_1,c_2, d_0, d_1, d_2 \\ \text {TAB}(i+n,a_0,a_1,a_2) \ \wedge \ \text {TAB}(i,b_0,b_1,b_2) \ \wedge \ \text {TAB}(i'+n,c_0,c_1,c_2) \ \wedge \ \text {TAB}(i',d_0,d_1,d_2) \ \wedge \ \\ ((a_0 - b_0 \not = c_0 - d_0) \ \vee \ (a_1 - b_1 \not = c_1 - d_1) \ \vee \ (a_2 - b_2 \not = c_2 - d_2)).\quad \square $$

Remark 3

Note that exactly the same proof would work for any word and numeration system where the Parikh vector of prefixes of length n is “synchronized” with n.

Remark 4

In principle we could mechanically compute the Tribonacci-regular representation of the abelian complexity function using this technique, but with our current implementation this is not computationally feasible.

Theorem 21

Any morphic image of the Tribonacci word is Tribonacci-automatic.

Proof

In analogy with Corollary 69 of [13].    \(\square \)

7 Things We Could Not Do Yet

There are a number of things we have not succeeded in computing with our prover because it ran out of space. These include

  • Mirror invariance of \(\mathbf{T}\) (that is, if x is a finite factor then so is \(x^R\));

  • Counting the number of special factors of length n (although it can be deduced from the subword complexity function);

  • Statistics about, e.g., lengths of squares, cubes, etc., in the “flipped” Tribonacci sequence [31], the fixed point of \( 0 \rightarrow 01\), \(1 \rightarrow 20\), \(2 \rightarrow 0\);

  • Recurrence properties of the Tribonacci word;

  • Counting the number of distinct squares (not occurrences) in the finite Tribonacci word \(Y_n\).

  • Abelian complexity of the Tribonacci word.

In the future, an improved implementation may succeed in resolving these in a mechanical fashion.