1 Introduction

In [14], Droubay, Justin and Pirillo observed that the number of distinct palindromes occurring in a finite word w of length n does not exceed \(n+1\). This upper bound motivated Brlek, Hamel, Nivat, and Reutenauer to define in [9] the palindromic defect D(w) of a finite word w as the difference between the upper bound \(n+1\) and the actual number of distinct palindromic factors occurring in w (the empty word included). One can say that the palindromic defect measures the number of “missing” palindromic factors in the given word. A word with zero palindromic defect is usually called rich or full.

For an infinite word \(\mathbf {u}\) the palindromic defect \(D(\mathbf {u})\) is naturally defined as the supremum of the set \(\{D(w) :w \text { is a factor of } \mathbf {u}\} \). Many classes of words with zero defect have been found, for example Sturmian words, words coding symmetric interval exchange transformations, and complementary symmetric Rote words (see [2, 7, 15]).

Palindromic defect has been actively studied over the last decade. During these years many nice properties of words with zero defect have been brought to light. Some of them have already been proved, others are formulated as conjectures and remain open. Not even the basic question “What is the number of rich words of a given length?” has been answered. This question is extremely interesting as the set of rich words is a very naturally defined factorial language which has superpolynomial and subexponential growth, as was shown in [17] by C. Guo, J. Shallit and A.M. Shur and in [30] by J. Rukavicka, respectively.

This article consists of three parts. In the first part, we present relevant known results. In the last part we give a list of open questions connected to the palindromic defect and we also recall a close connection to the well-known conjecture of Hof, Knill, and Simon. The middle part contains a new result. It is devoted to so-called compatible words, i.e., to pairs of finite rich words which can occur simultaneously as factors of a longer rich word. We believe that our result may help to characterize words w with the following property: \(D(w) = 1\) and \(D(u) =0\) for each proper factor u of w. A characterization of these words seems to be the missing point in answering several open questions.

2 Preliminaries

2.1 Basic Notations and Definitions

Let \(\mathcal A\) be a finite set, called an alphabet. Its elements are called letters. A finite word w is an element of \(\mathcal A^n\) for \(n \in \mathbb N\). The length of w is n and is denoted |w|. The set of all finite words over \(\mathcal A\) is denoted \(\mathcal A^*\). An infinite word over \(\mathcal A\) is an infinite sequence of letters from \(\mathcal A\).

A finite word w is a factor of a finite or infinite word v if there exist words p and s such that v is a concatenation of p, w, and s, denoted \(v = pws\). The word p is said to be a prefix and s a suffix of v. The set of all factors of a word \({\mathbf u}\) is the language of \({\mathbf{u}}\) and is denoted \(\mathcal {L}({\mathbf u})\). All factors of \({\mathbf u}\) of length n are denoted by \(\mathcal {L}_n({\mathbf u})\).

An occurrence of \(w = w_0w_1 \cdots w_{n-1} \in \mathcal A^n\) in a word \(v = v_0v_1v_2 \ldots \) is an index i such that \(v_i \cdots v_{i+n-1} = w\). A factor w is unioccurrent in v if there is exactly one occurrence of w in v. A complete return word of a factor w (in v) is a factor f (of v) containing exactly two occurrences of w such that w is its prefix and also its suffix. For instance, the word 010011010 is a complete return word of 010.
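The notions of occurrence and complete return word translate directly into a few lines of code. The following Python sketch is only an illustration; the function names `occurrences` and `is_complete_return_word` are ours and do not come from the cited literature.

```python
def occurrences(w: str, v: str) -> list[int]:
    """Return every index i with v[i:i+len(w)] == w (overlaps allowed)."""
    n = len(w)
    return [i for i in range(len(v) - n + 1) if v[i:i + n] == w]


def is_complete_return_word(f: str, w: str) -> bool:
    """True if f contains exactly two occurrences of w: as a prefix and as a suffix."""
    occ = occurrences(w, f)
    return len(occ) == 2 and occ[0] == 0 and occ[1] == len(f) - len(w)


# The example from the text: 010011010 is a complete return word of 010.
print(occurrences("010", "010011010"))
print(is_complete_return_word("010011010", "010"))
```

Overlapping occurrences are allowed, so under this sketch 01010 is also a complete return word of 010.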

The reversal or mirror mapping assigns to a word \(w \in \mathcal A^*\) the word \(\widetilde{w}\) with the letters reversed, i.e.,

$$\begin{aligned} \widetilde{w} = w_{n-1}w_{n-2}\cdots w_1 w_0 \quad \text { where } w = w_0w_1 \cdots w_{n-1} \in \mathcal A^n. \end{aligned}$$

A word w is a palindrome if \(w = \widetilde{w}\). We say that a language \(\mathcal {L}\subset \mathcal A^*\) is closed under reversal if for all \(w \in \mathcal {L}\) we have \(\widetilde{w} \in \mathcal {L}\).
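With palindromes defined, the palindromic defect D(w) recalled in the Introduction can be computed by brute force for short words. The sketch below is a naive cubic-time illustration, not an optimized algorithm; the names `palindromic_factors` and `defect` are hypothetical helpers introduced here.

```python
def palindromic_factors(w: str) -> set[str]:
    """All distinct palindromic factors of w, including the empty word."""
    pals = {""}
    n = len(w)
    for i in range(n):
        for j in range(i + 1, n + 1):
            f = w[i:j]
            if f == f[::-1]:
                pals.add(f)
    return pals


def defect(w: str) -> int:
    """Palindromic defect D(w) = |w| + 1 - (number of distinct palindromic factors)."""
    return len(w) + 1 - len(palindromic_factors(w))


print(defect("0100101001"))  # a prefix of the Fibonacci word
print(defect("11010011"))    # a factor of the Thue-Morse word
```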

Given an infinite word \({\mathbf u}\), its factor complexity \(\mathcal C_{\mathbf u}(n)\) is the count of its factors of length n:

$$\begin{aligned} \mathcal C_{\mathbf u}(n) = \# \mathcal {L}_n({\mathbf u}) \quad \text { for all } n \in \mathbb N. \end{aligned}$$

Let \({\mathrm {Pal}}({\mathbf u})\) be the set of all palindromic factors of the infinite word \({\mathbf u}\). The palindromic complexity \({\mathcal P}_{\mathbf u}(n)\) of \({\mathbf u}\) is given by

$$\begin{aligned} {\mathcal P}_{\mathbf u}(n) = \# ( \mathcal {L}_n({\mathbf u}) \cap {\mathrm {Pal}}({\mathbf u}) ) \quad \text { for all } n \in \mathbb N. \end{aligned}$$

We omit the subscript \({\mathbf u}\) if there is no confusion.

2.2 Fixed Points of Morphisms and Their Properties

A morphism \(\varphi \) is a mapping \(\mathcal A^* \rightarrow {\mathcal B}^*\) where \(\mathcal A\) and \({\mathcal B}\) are alphabets such that for all \(v,w \in \mathcal A^*\) we have \(\varphi (vw) = \varphi (v)\varphi (w)\) (it is a homomorphism of the monoids \(\mathcal A^*\) and \({\mathcal B}^*\)). Its action is extended to \(\mathcal A^\mathbb N\): if \({\mathbf u}= u_0u_1u_2 \ldots \in \mathcal A^\mathbb N\), then

$$\begin{aligned} \varphi ({\mathbf u}) = \varphi (u_0) \varphi (u_1) \varphi (u_2) \ldots \in {\mathcal B}^\mathbb N. \end{aligned}$$

If \(\varphi \) is an endomorphism of \(\mathcal A^*\), we may find its fixed point, i.e., a word \({\mathbf u}\) such that \(\varphi ({\mathbf u}) = {\mathbf u}\). We are interested mainly in the case of \({\mathbf u}\) being infinite. A morphism \(\varphi : \mathcal A^* \rightarrow \mathcal A^*\) is primitive if there exists an integer k such that for every \(a,b \in \mathcal A\) the letter b occurs in \(\varphi ^k(a)\).

Two morphisms \(\varphi , \psi : \mathcal A^* \rightarrow {\mathcal B}^*\) are conjugate if there exists a word \(w \in {\mathcal B}^*\) such that

$$\begin{aligned} \forall a \in \mathcal A, \varphi (a)w = w \psi (a) \quad \text { or } \quad \forall a \in \mathcal A, w\varphi (a) = \psi (a)w. \end{aligned}$$

If \(\varphi \) is primitive and conjugate to \(\psi \), then the languages of fixed points of \(\varphi \) and \(\psi \) are the same.

A morphism \(\psi : \mathcal A^* \rightarrow {\mathcal B}^*\) is of class P if \(\psi (a) = pp_a\) for all \(a \in \mathcal A\) where p and \(p_a\) are both palindromes (possibly empty). A morphism \(\varphi \) is of class \(P'\) if it is conjugate to a morphism of class P.

The following examples illustrate the last few notions.

Example 1

Let \(\varphi : \{a,b\}^* \rightarrow \{a,b\}^*\) be determined by \( \varphi : \begin{array}{rcl} a &{} \mapsto &{} abab, \\ b &{} \mapsto &{} aab. \end{array} \) The fixed point of \(\varphi \) is

$$\begin{aligned} {\mathbf u}= \lim _{k \rightarrow +\infty } \varphi ^k(a) = \underbrace{abab}_{\varphi (a)} \underbrace{aab}_{\varphi (b)} \underbrace{abab}_{\varphi (a)} \underbrace{aab}_{\varphi (b)} \underbrace{abab}_{\varphi (a)} \ldots \end{aligned}$$

The morphism \(\varphi \) is of class \(P'\) since it is conjugate to \(\psi \) given by \( \psi : \begin{array}{rcl} a &{} \mapsto &{} abab, \\ b &{} \mapsto &{} aba. \end{array} \) Indeed, we have \(ab \varphi (a) = \psi (a) ab\) and \(ab \varphi (b) = \psi (b) ab \). To see that \(\psi \) is of class P, i.e., it is of the form \(a \mapsto pp_a\) and \(b \mapsto pp_b\), it suffices to set \(p = aba\), \(p_a = b\) and \(p_b = \varepsilon \). The fixed point of \(\psi \) is

$$\begin{aligned} {\mathbf v}= \lim _{k \rightarrow +\infty } \psi ^k(a) = \underbrace{abab}_{\psi (a)} \underbrace{aba}_{\psi (b)} \underbrace{abab}_{\psi (a)} \underbrace{aba}_{\psi (b)} \underbrace{abab}_{\psi (a)} \ldots \end{aligned}$$

We have \(\mathcal {L}({\mathbf u}) = \mathcal {L}({\mathbf v})\).
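The identities used in Example 1 are easy to check mechanically. The following Python sketch encodes the two morphisms as dictionaries (a representation chosen here only for illustration) and verifies the conjugacy relation with the word ab as well as the class P decomposition with \(p = aba\).

```python
phi = {"a": "abab", "b": "aab"}
psi = {"a": "abab", "b": "aba"}


def apply(morphism: dict[str, str], word: str) -> str:
    """Apply a morphism letter by letter."""
    return "".join(morphism[c] for c in word)


def iterate(morphism: dict[str, str], seed: str, k: int) -> str:
    """Return morphism^k(seed)."""
    word = seed
    for _ in range(k):
        word = apply(morphism, word)
    return word


def is_palindrome(s: str) -> bool:
    return s == s[::-1]


# Conjugacy: ab phi(x) = psi(x) ab for every letter x.
conjugate = all("ab" + phi[x] == psi[x] + "ab" for x in "ab")

# Class P: psi(x) = p p_x with p, p_a, p_b palindromes.
p, tails = "aba", {"a": "b", "b": ""}
class_p = is_palindrome(p) and all(
    is_palindrome(tails[x]) and psi[x] == p + tails[x] for x in "ab"
)

print(conjugate, class_p)
print(iterate(phi, "a", 3)[:18])
print(iterate(psi, "a", 3)[:18])
```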

Example 2

The two famous examples of infinite words, the Thue–Morse word \({\mathbf t}\) and the Fibonacci word \({\mathbf f}\), are both fixed points of a morphism.

The word \({\mathbf t}\) is fixed by the morphism \(\varphi _{TM}\) determined by \(\varphi _{TM}(0) = 01\) and \(\varphi _{TM}(1) = 10\). Note that this morphism in fact has two fixed points, one being obtained from the other by replacing 0 with 1 and 1 with 0. By \({\mathbf t}\) we denote the fixed point starting with 0.

The word \({\mathbf f}\) is fixed by the morphism \(\varphi _F\) defined by \(\varphi _F(0) = 01\) and \(\varphi _F(1) = 0\).

An (infinite) fixed point of a morphism of class \(P'\) clearly contains infinitely many palindromes, which is one motivation for this notion. Class P was introduced in [19] in the context of discrete Schrödinger operators.

3 The Study of Palindromic Defect

3.1 Characterizations of Words with the Zero Defect

We start by giving some of the known characterizations of infinite rich words.

Theorem 3

For an infinite word \({\mathbf u}\) with language closed under reversal the following statements are equivalent:

  1. \(D({\mathbf u})\) is zero [9];

  2. any prefix of \({\mathbf u}\) has a unioccurrent longest palindromic suffix [14];

  3. for any palindromic factor w of \({\mathbf u}\), every complete return word of w is a palindrome [16];

  4. for any factor w of \({\mathbf u}\), every factor of \({\mathbf u}\) that contains w only as its prefix and \(\widetilde{w}\) only as its suffix is a palindrome [16];

  5. for each \(n \in \mathbb N\) we have \(\mathcal C(n+1) - \mathcal C(n) + 2 = {\mathcal P}(n) + {\mathcal P}(n+1)\) [11].

We generalized the previous theorem to infinite words with finite palindromic defect, see [3, 26]. In particular, we showed that an infinite word has a finite palindromic defect \(D({\mathbf u})\) if and only if the equality \(\mathcal C(n+1) - \mathcal C(n) + 2 ={\mathcal P}(n) + {\mathcal P}(n+1)\) is valid for all \(n\in \mathbb N\) up to finitely many exceptions. A surprising observation that these exceptional indices allow one to determine the value of the palindromic defect was made by Brlek and Reutenauer. In [8] they proved for infinite periodic words and conjectured for general words the following equality

$$\begin{aligned} 2 D({\mathbf u}) = \sum _{n = 0}^{+\infty } \Bigl ( \mathcal C_{\mathbf u}(n+1)-\mathcal C_{\mathbf u}(n)+2-{\mathcal P}_{\mathbf u}(n+1)-{\mathcal P}_{\mathbf u}(n) \Bigr ). \end{aligned}$$
(1)

The conjecture was confirmed in [4] where we showed the following theorem.

Theorem 4

Equation (1) is true for any infinite word \({\mathbf u}\) whose language is closed under reversal.

Besides these general properties, many examples of words with zero or finite palindromic defect were found:

  • In [12, 27], further characterizations of rich words are given.

  • In [13], the relation of rich words to so-called periodic-like words is exhibited.

  • Links to another class of words, trapezoidal words, are shown in [24].

  • Words coding symmetric interval exchange transformations are rich by [2].

  • In [7], the authors show that words coding rotation on the unit circle with respect to partition consisting of two intervals are rich.

  • In [29], the authors show a connection of rich words with the Burrows–Wheeler transform.

  • In [32], we show that morphic images of episturmian words, a known class of rich words, produce words with finite palindromic defect.

  • The articles [20, 28, 31] exhibit more examples of words with finite palindromic defect (along with some examples of words with finite generalized palindromic defect).

3.2 Palindromic Defect of Fixed Points of Morphisms

We now focus on words that are fixed by a morphism with the assumption that their language is closed under reversal. The main motivation to study their palindromic defect is the following conjecture.

Conjecture 5

(Zero defect conjecture [6]). Let \({\mathbf u}\) be an aperiodic fixed point of a primitive morphism having its language closed under reversal. We have \(D({\mathbf u}) = 0\) or \(D({\mathbf u}) = +\infty \).

The Thue–Morse word \({\mathbf t}\) and the Fibonacci word \({\mathbf f}\) are examples of aperiodic fixed points of a primitive morphism (see Example 2) having their language closed under reversal. We have \(D({\mathbf f}) = 0\) and \(D({\mathbf t}) = +\infty \).
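The contrast between \({\mathbf f}\) and \({\mathbf t}\) can be observed experimentally on finite prefixes. The following Python sketch generates prefixes of both fixed points and computes their defects by brute force. Finite prefixes can only illustrate the statement; they prove neither \(D({\mathbf f}) = 0\) nor \(D({\mathbf t}) = +\infty \). The helper names are hypothetical.

```python
def prefix_of_fixed_point(morphism: dict[str, str], seed: str, length: int) -> str:
    """Iterate the morphism on the seed until the word is long enough, then truncate."""
    word = seed
    while len(word) < length:
        word = "".join(morphism[c] for c in word)
    return word[:length]


def defect(w: str) -> int:
    """Brute-force palindromic defect: |w| + 1 minus the number of palindromic factors."""
    pals = {""}
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            if w[i:j] == w[i:j][::-1]:
                pals.add(w[i:j])
    return len(w) + 1 - len(pals)


thue_morse = {"0": "01", "1": "10"}
fibonacci = {"0": "01", "1": "0"}

for n in (8, 16, 32, 64):
    f = prefix_of_fixed_point(fibonacci, "0", n)
    t = prefix_of_fixed_point(thue_morse, "0", n)
    print(n, defect(f), defect(t))
```

Since appending a letter adds at most one new palindromic factor, the defect of the prefixes is non-decreasing in the prefix length.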

Counterexamples to the conjecture were given in [1, 10]. Thus, the conjecture is not true as currently stated. There still might be some refinement of the current statement that is valid, as there are many examples supporting it and the known counterexamples seem to have some specific properties. Indeed, in [22] we prove that the conjecture is true for a special class of morphisms. A morphism \(\varphi \) is marked if there exist two morphisms \(\varphi _1\) and \(\varphi _2\), both being conjugate to \(\varphi \), such that

$$\begin{aligned} \{ \text {last letter of } \varphi _1(a) :a \in \mathcal A\} = \{ \text {first letter of } \varphi _2(a) :a \in \mathcal A\} = \mathcal A. \end{aligned}$$

In other words, the set of the last letters of the images of letters by \(\varphi _1\) is the whole alphabet \(\mathcal A\) and the set of the first letters of the images of letters by \(\varphi _2\) is also the whole alphabet \(\mathcal A\).

For instance, \(\varphi = \varphi _{TM}: 0 \mapsto 01, 1 \mapsto 10\) is marked (here \(\varphi = \varphi _1 = \varphi _2\)). For \(\varphi = \varphi _F: 0 \mapsto 01, 1 \mapsto 0\) we have \(\varphi = \varphi _1\) and \(\varphi _2: 0 \mapsto 10, 1 \mapsto 0\). Thus, \(\varphi _F\) is also marked.

If a morphism \(\varphi \) is conjugate to no other morphism except for \(\varphi \) itself, then we say that \(\varphi \) is stationary. In other words, a morphism \(\varphi \) is stationary if the longest common prefix and the longest common suffix of \(\varphi \)-images of all letters are both empty words.

In [22] we show the following theorems:

Theorem 6

Let \(\varphi \) be a primitive marked morphism and let \({\mathbf u}\) be its fixed point with finite palindromic defect. If all complete return words of all letters in \({\mathbf u}\) are palindromes or \(\varphi \) is not stationary, then \(D({\mathbf u}) = 0\).

Moreover, the binary alphabet allows for all of the assumptions to be dropped:

Theorem 7

If \({\mathbf u}\in \mathcal A^\mathbb N\) is a fixed point of a primitive morphism over a binary alphabet and \(D({\mathbf u}) < +\infty \), then \(D({\mathbf u}) = 0\) or \({\mathbf u}\) is periodic.

We thus confirm that for a large class of fixed points of morphisms, their palindromic defect is either zero or infinite.

3.3 Enumeration of Rich Words

Let \(R_{d}(n)\) denote the number of rich words of length n over an alphabet with d elements. As we have already mentioned, no closed-form formula for \(R_{d}(n)\) is known.

In [34], Vesti gives a recursive lower bound on \(R_{d}(n)\) and an upper bound on \(R_{2}(n)\). Both these estimates seem to be very rough.

In [17], Guo, Shallit and Shur constructed for each n a large set of binary rich words of length n. They show that for any two sequences of integers \(0\le n_1\le n_2 \le \cdots \le n_k\) and \(0\le m_1\le m_2 \le \cdots \le m_k\) satisfying \(n= \sum _{i=1}^k n_i + \sum _{i=1}^k m_i\), the word \(a^{n_1}b^{m_1}a^{n_2}b^{m_2}\cdots a^{n_k}b^{m_k}\) of length n is rich. This construction currently gives the best lower bound on the number of binary rich words, namely \(R_2(n) \ge \frac{ C^{\sqrt{n}}}{p(n)}\) where p(n) is a polynomial and the constant \(C \approx 37\). They also conjectured that \(R_2(n) = \varTheta \Bigl (\bigl (\frac{n}{g(n)}\bigr )^{\sqrt{n}}\Bigr )\) for some infinitely growing function g(n).
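The construction can be tried out on small instances. The sketch below assumes that the intended word is \(a^{n_1}b^{m_1}a^{n_2}b^{m_2}\cdots a^{n_k}b^{m_k}\) with both exponent sequences non-decreasing, builds it, and checks richness by brute force. The names `run_word` and `is_rich` are introduced here for illustration.

```python
def run_word(ns: list[int], ms: list[int]) -> str:
    """Build a^{n_1} b^{m_1} ... a^{n_k} b^{m_k} from two non-decreasing sequences."""
    if len(ns) != len(ms):
        raise ValueError("both sequences must have the same length k")
    if list(ns) != sorted(ns) or list(ms) != sorted(ms):
        raise ValueError("both sequences must be non-decreasing")
    return "".join("a" * n + "b" * m for n, m in zip(ns, ms))


def is_rich(w: str) -> bool:
    """w is rich iff it has exactly |w| distinct non-empty palindromic factors."""
    pals = {
        w[i:j]
        for i in range(len(w))
        for j in range(i + 1, len(w) + 1)
        if w[i:j] == w[i:j][::-1]
    }
    return len(pals) == len(w)


for ns, ms in (([1, 2], [1, 2]), ([1, 1, 2], [1, 2, 3])):
    w = run_word(ns, ms)
    print(w, is_rich(w))
```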

The best upper bound is provided by Rukavicka in [30]. He shows that \(R_d(n)\) has a subexponential growth on any alphabet. More precisely, for any cardinality d of the alphabet \(\lim \limits _{n\rightarrow \infty }\sqrt[n]{R_d(n)}=1\). The result uses a specific factorization of a rich word into distinct rich palindromes, called UPS-factorization (Unioccurrent Palindromic Suffix factorization).

4 Compatible Pairs

The set of rich words is a factorial language but it is not recurrent. Let us recall that a language \(\mathcal {L} \subset \mathcal {A}^*\) is recurrent if for any two words \(u,v \in \mathcal {L}\) there exists \(w \in \mathcal {L}\) such that u is a prefix of w and v is a suffix of w. Using results of Glen et al. [16], Vesti in [34] formulated a sufficient condition which prevents two rich words u and v from being simultaneously factors of another rich word. His proposition uses the notions of the longest palindromic suffix of a factor u, denoted \({{\mathrm{lps}}}(u)\), and the longest palindromic prefix of a factor u, denoted \({{\mathrm{lpp}}}(u)\). We say that two finite words are compatible if there exists a rich word having these two words as factors.

Proposition 8

Let u and v be two words such that

$$\begin{aligned} u\ne v, \ \ u,v \ \ \text {rich}, \quad {{\mathrm{lpp}}}(u)={{\mathrm{lpp}}}(v) \quad \text {and}\quad {{\mathrm{lps}}}(u)={{\mathrm{lps}}}(v). \end{aligned}$$
(2)

If a word w contains factors u and v, then w is not rich, i.e., u and v are not compatible.

We give an example which demonstrates that a word w can be non-rich without containing factors u and v satisfying (2).

Example 9

Consider the word \(w=11010011\), which is not rich. In fact, it is a factor of the Thue–Morse word. As pointed out in [5], the length 8 is the shortest length of a non-rich binary word.

Table 1 depicts all non-empty rich factors u of w together with the pairs \(({{\mathrm{lpp}}}(u), {{\mathrm{lps}}}(u))\). The map \(u \mapsto ({{\mathrm{lpp}}}(u), {{\mathrm{lps}}}(u))\) is injective. In other words, no pair of factors u, v of the non-rich word \(w=11010011\) satisfies (2).

Table 1. All non-empty rich factors u of w from Example 9 together with the pairs \(({{\mathrm{lpp}}}(u), {{\mathrm{lps}}}(u))\).
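The pairs \(({{\mathrm{lpp}}}(u), {{\mathrm{lps}}}(u))\) can be recomputed directly from the definitions. The following Python sketch enumerates the non-empty rich factors of \(w=11010011\) and checks that the map is injective; it is a brute-force illustration with helper names of our own choosing.

```python
def lpp(u: str) -> str:
    """Longest palindromic prefix of u."""
    for j in range(len(u), 0, -1):
        if u[:j] == u[:j][::-1]:
            return u[:j]
    return ""


def lps(u: str) -> str:
    """Longest palindromic suffix of u."""
    for i in range(len(u)):
        if u[i:] == u[i:][::-1]:
            return u[i:]
    return ""


def nonempty_factors(w: str) -> set[str]:
    return {w[i:j] for i in range(len(w)) for j in range(i + 1, len(w) + 1)}


def is_rich(u: str) -> bool:
    """u is rich iff it has exactly |u| distinct non-empty palindromic factors."""
    return sum(1 for f in nonempty_factors(u) if f == f[::-1]) == len(u)


w = "11010011"
table = {
    u: (lpp(u), lps(u))
    for u in sorted(nonempty_factors(w), key=lambda s: (len(s), s))
    if is_rich(u)
}

for u, pair in table.items():
    print(u, pair)
print("injective:", len(set(table.values())) == len(table))
```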

Let us formulate another sufficient condition for non-richness of a word w.

Proposition 10

Let u and v be two words satisfying

$$\begin{aligned} u\ne \widetilde{v}, \ \ u,v \ \ \text {rich}, \quad {{\mathrm{lps}}}(u)={{\mathrm{lpp}}}(v) \quad \text {and}\quad {{\mathrm{lps}}}(v)={{\mathrm{lpp}}}(u). \end{aligned}$$
(3)

If a word w contains factors u and v, then w is not rich.

Proof

First we show (by contradiction) that the assumption (3) gives

$$\begin{aligned} u, \widetilde{u} \notin \mathcal {L}(v)\cup \mathcal {L}(\widetilde{v}) \quad and\quad v, \widetilde{v} \notin \mathcal {L}(u)\cup \mathcal {L}(\widetilde{u}). \end{aligned}$$
(4)

As the roles of v and u are symmetric, we have to discuss the following two cases:

  (1) \(u \in \mathcal {L}(v)\):

    As v is rich, \({{\mathrm{lps}}}(v)\) is unioccurrent in v. Since \({{\mathrm{lps}}}(v) = {{\mathrm{lpp}}}(u)\), we have that \({{\mathrm{lpp}}}(u)\) occurs only as a suffix of v. Since \(u \in \mathcal {L}(v)\), necessarily \(u = {{\mathrm{lpp}}}(u)\) and thus u is a palindrome. It follows that \(u={{\mathrm{lps}}}(u)={{\mathrm{lpp}}}(v)={{\mathrm{lps}}}(v)\). Richness of v implies that \({{\mathrm{lpp}}}(v)\) and \({{\mathrm{lps}}}(v)\) are unioccurrent in v and consequently v is a palindrome satisfying \(v={{\mathrm{lpp}}}(v) = u = \widetilde{u}\), which is a contradiction.

  (2) \(\widetilde{u} \in \mathcal {L}(v)\):

    Since \({{\mathrm{lps}}}(v) = {{\mathrm{lps}}}(\widetilde{u}) \) is unioccurrent in v, we have that \(\widetilde{u}\) occurs only as a suffix of v. Similarly, as \({{\mathrm{lpp}}}(v) = {{\mathrm{lpp}}}(\widetilde{u}) \) is unioccurrent in v, we get that \(\widetilde{u}\) occurs only as a prefix of v. It implies \(v= \widetilde{u}\), which is again a contradiction.

Obviously, the assumption (3) implies that u and v are not palindromes.

To prove the proposition itself (again by contradiction), we assume that w is rich and let f denote the shortest factor of w such that f contains as its factor u or \(\widetilde{u}\) and f contains as its factor v or \(\widetilde{v}\). Without loss of generality and due to (4), we have to discuss the following two cases:

  (1) u is a proper prefix and v is a proper suffix of f:

    The word \({{\mathrm{lps}}}(f)\) is not longer than v; otherwise, we obtain a contradiction with the choice of f as the shortest factor with the given property. Thus \({{\mathrm{lps}}}(f) = {{\mathrm{lps}}}(v)\). Similarly, \({{\mathrm{lpp}}}(f) = {{\mathrm{lpp}}}(u)\). By (3), \({{\mathrm{lps}}}(v) = {{\mathrm{lpp}}}(u)\), so the palindrome \({{\mathrm{lps}}}(f)\) occurs in f both as a proper suffix and as a prefix. Hence \( {{\mathrm{lps}}}(f)\) is not unioccurrent in f, which contradicts the richness of f.

  (2) u is a proper prefix and \(\widetilde{v}\) is a proper suffix of f:

    By the same argument as before, \({{\mathrm{lps}}}(f) = {{\mathrm{lps}}}(\widetilde{v})= {{\mathrm{lpp}}}(v)\). It means that \({{\mathrm{lpp}}}(v) = {{\mathrm{lps}}}(u)\) occurs as a suffix of f and also as a suffix of u. Since u is a proper prefix of f, the factor \({{\mathrm{lpp}}}(v) = {{\mathrm{lps}}}(f)\) occurs in f twice—a contradiction with the richness of f.

Example 11

We consider again the non-rich word \(w=11010011\). It contains the factors \(u =11010\), \(v = 010011\) such that \({{\mathrm{lpp}}}(u) =11={{\mathrm{lps}}}(v)\) and \({{\mathrm{lps}}}(u) =010={{\mathrm{lpp}}}(v)\). Also the pairs \(u' = 1101001\), \(v' = 10011\) and \(u'' = 110100\), \(v'' = 0011\) satisfy (3).
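The three pairs listed in Example 11 can be verified against condition (3) mechanically. The Python sketch below does so by brute force; the function `satisfies_condition_3` is a hypothetical helper named after the equation number.

```python
def lpp(u: str) -> str:
    """Longest palindromic prefix of u."""
    for j in range(len(u), 0, -1):
        if u[:j] == u[:j][::-1]:
            return u[:j]
    return ""


def lps(u: str) -> str:
    """Longest palindromic suffix of u."""
    for i in range(len(u)):
        if u[i:] == u[i:][::-1]:
            return u[i:]
    return ""


def is_rich(u: str) -> bool:
    """u is rich iff it has exactly |u| distinct non-empty palindromic factors."""
    pals = {
        u[i:j]
        for i in range(len(u))
        for j in range(i + 1, len(u) + 1)
        if u[i:j] == u[i:j][::-1]
    }
    return len(pals) == len(u)


def satisfies_condition_3(u: str, v: str) -> bool:
    """u != reversal of v, both rich, lps(u) = lpp(v) and lps(v) = lpp(u)."""
    return (
        u != v[::-1]
        and is_rich(u)
        and is_rich(v)
        and lps(u) == lpp(v)
        and lps(v) == lpp(u)
    )


w = "11010011"
pairs = [("11010", "010011"), ("1101001", "10011"), ("110100", "0011")]
for u, v in pairs:
    print(u, v, u in w and v in w, satisfies_condition_3(u, v))
```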

We show that a pair of factors with the property (3) occurs in each non-rich word.

Proposition 12

If w is a non-rich word, then w has two factors u and v such that

$$\begin{aligned} u\ne \widetilde{v}, \ \ u,v \ \ \text {rich}, \quad {{\mathrm{lps}}}(u)={{\mathrm{lpp}}}(v) \quad \text {and} \quad {{\mathrm{lps}}}(v)={{\mathrm{lpp}}}(u). \end{aligned}$$

Proof

As w is not rich, it contains a complete return word r to a palindrome p such that r is not a palindrome. Let r be the shortest non-palindromic complete return word in w to a palindrome. Denote by t the first letter of r and find the longest q such that tq is a prefix of r and \(\widetilde{q}t\) is a suffix of r. Clearly, p is a prefix of tq and p is a suffix of \(\widetilde{q}t\). Let us denote by x and y the letters such that tqx is a prefix of r and \(y\widetilde{q}t\) is a suffix of r. Obviously, \(x\ne y\).

  • If q is empty, then r is a non-palindromic complete return word to the letter t, i.e., the letter t does not occur in the factor f given by \(r = tft\), i.e., \(f=t^{-1}rt^{-1}\). Choose \(z \in \{x,y\}\) such that \(z\ne t\) and put

    \(u:=\) the shortest prefix of r which ends with the letter z and

    \(v:=\) the shortest suffix of r which starts with the letter z.

    In particular, both letters z and t are unioccurrent in u and also in v. It means that \({{\mathrm{lpp}}}(u)=t={{\mathrm{lps}}}(v)\) and \({{\mathrm{lps}}}(u)=z={{\mathrm{lpp}}}(v)\). One of the words u and v has length 2 and the other one is longer than 2. It implies that \(u\ne \widetilde{v}\).

  • Let us assume that \(q\ne \varepsilon \). The word \(f= t^{-1}rt^{-1}\) has a prefix qx and a suffix \(y\widetilde{q}\). First we show

Claim:

Occurrences of q and \(\widetilde{q}\) in f alternate and moreover each factor of f starting with q and ending with \(\widetilde{q}\) without other occurrences of q and \(\widetilde{q}\) is a palindrome.

Proof of the claim:

Let \(w'\) be an arbitrary suffix of f such that \(|w'| > |\widetilde{q}|\) and \(w'\) has a prefix q. Clearly, f has a suffix \(\widetilde{q}\) and thus \(\widetilde{q}\) is a suffix of \(w'\) as well. Let us denote \(p'={{\mathrm{lpp}}}(q)\). Since q is rich, \(p'\) is unioccurrent in q. But \(p'\) occurs in \(w'\) at least twice, as \(\widetilde{q}\) is a suffix of \(w'\). Let us denote by \(r'\) the complete return word to \(p'\) that is a prefix of \(w'\). From minimality of r, the complete return word \(r'\) to \(p'\) is a palindrome. Therefore, \(w'\) has prefixes \(p'\), q and \(r'\), and their lengths satisfy \(|p'|\le |q| < |r'|\). It implies that \(\widetilde{q}\) is a suffix of the palindrome \(r'\) and thus the first occurrence of q in \(w'\) is followed by an occurrence of \(\widetilde{q}\).

Since f is not a palindrome, the previous claim implies that q and \(\widetilde{q}\) occur also as inner factors of f. It means that there exists a palindromic factor, say \(w''\), of the word f such that \(\widetilde{q}\) is a prefix and q is a suffix of \(w''\) and \(|w''| > |q|\). Let z denote the letter satisfying that \(\widetilde{q}z\) is a prefix of \(w''\). Obviously, zq is a suffix of \(w''\). Let us stress that \(z\ne t\), otherwise r would not be a complete return word to the palindrome p. The letter z enables us to identify the factors v and u announced in the proposition. Put

\(u:=\) the shortest prefix of \(r = tft\) which ends with \(\widetilde{q}z\)

\(v:=\) the shortest suffix of \(r = tft\) which starts with zq.

To prove \({{\mathrm{lpp}}}(u)={{\mathrm{lps}}}(v)\), we apply the simple observation: If a word \(s'\) is a prefix of a word s and \({{\mathrm{lpp}}}(s)\) is a prefix of \(s'\), then \({{\mathrm{lpp}}}(s)={{\mathrm{lpp}}}(s')\).

In our situation: \(p = {{\mathrm{lpp}}}(r)={{\mathrm{lpp}}}(u)\). Analogously, \(p = {{\mathrm{lps}}}(r)={{\mathrm{lps}}}(v)\).

To show \({{\mathrm{lps}}}(u)={{\mathrm{lpp}}}(v)\), we use a simple consequence of the claim: Any occurrence of \(\ell q\) in r, where \(\ell \) is a letter with \(\ell \ne t\), is preceded by an occurrence of \(\widetilde{q}\ell \). Therefore, our definition of u guarantees that \({{\mathrm{lps}}}(u)\) is not longer than \(\widetilde{q}z\), i.e., \({{\mathrm{lps}}}(u) ={{\mathrm{lps}}}(\widetilde{q}z)\). For the same reason, \({{\mathrm{lpp}}}(v) ={{\mathrm{lpp}}}(zq)\). As \({{\mathrm{lpp}}}(zq) = {{\mathrm{lps}}}(\widetilde{q}z)\), the equality \({{\mathrm{lps}}}(u)={{\mathrm{lpp}}}(v)\) is proven.

Obviously, \(u\ne \widetilde{v}\). Otherwise, we have a contradiction with the assumption that tq is the longest prefix of r such that \(\widetilde{q}t\) is a suffix of r.

The last proof has an interesting direct consequence on a binary alphabet. It is based on the fact that the case \(q = \varepsilon \) is not possible on a binary alphabet and the second case implies that q is not a palindrome. We state this consequence of the construction in the second case as the following corollary.

Corollary 13

Let \(w \in \{0,1\}^*\) be a binary word. The word w is not rich if and only if there exists a non-palindromic word q such that

$$\begin{aligned} 0q0, 1q1, 0\widetilde{q}1, 1\widetilde{q}0 \in \mathcal {L}(w). \end{aligned}$$
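The criterion of Corollary 13 lends itself to a direct search. The Python sketch below looks for a non-palindromic word q such that all four words \(0q0\), \(1q1\), \(0\widetilde{q}1\), \(1\widetilde{q}0\) are factors of a given binary word. It is a naive illustration, and the name `find_witness` is ours. Since \(0q0\) being a factor forces q to be a factor, it suffices to let q range over the factors of w.

```python
from typing import Optional


def all_factors(w: str) -> set[str]:
    """All factors of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def find_witness(w: str) -> Optional[str]:
    """Return a shortest non-palindromic q with 0q0, 1q1, 0q~1, 1q~0 in L(w), if any."""
    fac = all_factors(w)
    for q in sorted(fac, key=len):
        qr = q[::-1]
        if q == qr:
            continue  # q must not be a palindrome
        needed = ("0" + q + "0", "1" + q + "1", "0" + qr + "1", "1" + qr + "0")
        if all(x in fac for x in needed):
            return q
    return None


print(find_witness("11010011"))    # non-rich factor of the Thue-Morse word
print(find_witness("0100101001"))  # rich prefix of the Fibonacci word
```

For \(w = 11010011\) the search finds \(q = 10\), with the four factors 0100, 1101, 0011 and 1010.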

5 Open Questions and Related Problems

We finish this article with a list of open questions that we deem important in further understanding of the structure of rich words (and more generally, words with finite palindromic defect).

  • The subexponential upper bound on the number of rich words \(R_d(n) \) of length n over d letters is based on the statement that any rich word of length n can be factorized into at most \(c\frac{n}{\ln n}\) distinct palindromes. In fact, the number of palindromes is overestimated, as the factorization does not take into consideration that each of the palindromes is rich as well. Any asymptotic improvement of the bound \(c\frac{n}{\ln n}\) would improve the upper bound on \(R_d(n) \).

  • To our knowledge, there is no result on morphisms preserving the set of rich words. A class of morphisms preserving richness would allow one to construct a class of rich words other than the set constructed in [17] and thus to obtain a lower bound on \(R_2(n)\). In particular, any fixed point of a primitive morphism which preserves the set of rich words must be rich as well. From this point of view the following question is also important.

  • Theorem 6 confirms the validity of the zero defect conjecture only for marked morphisms \(\varphi \) satisfying the following assumption: all complete return words of all letters in \({\mathbf u}\) are palindromes or \(\varphi \) is not stationary. We know of no example showing that this peculiar assumption is really needed.

  • We do not know how to decide whether two rich words u and v are factors of a common rich word w. A related task is to identify the minimal non-rich words, i.e., words which are not rich but all of whose proper factors are rich.

Primitive morphisms that preserve the set of rich words are included in a larger set of morphisms having infinitely many palindromic factors in their fixed points. An infinite word having infinitely many palindromic factors is usually called palindromic. A very useful property of morphisms in this larger set is given by the following conjecture.

Conjecture 14

(Class P conjecture [19]). Let \({\mathbf u}\) be a palindromic fixed point of a primitive morphism \(\varphi \). There exists a morphism of class \(P'\) such that its fixed point has the same language as \({\mathbf u}\).

The original statement of the conjecture in [19] is ambiguous and allows for several interpretations, see also [21] or [18]. The statement of Conjecture 14 given above follows from two results. First, for a binary alphabet the question is solved by B. Tan in [33]: if a fixed point of a primitive morphism \(\varphi \) over a binary alphabet contains infinitely many palindromes, then \(\varphi \) or \(\varphi ^2\) is of class \(P'\). Second, in [23], S. Labbé shows that the previous result cannot be generalized to multiliteral alphabets: there exists a word \({\mathbf w}\) over a ternary alphabet which is a palindromic fixed point of a primitive morphism and which is not fixed by any morphism of class \(P'\). However, the authors of [18] note that the language of the word \({\mathbf w}\) may indeed be generated by a morphism of class P.

At this moment only partial answers to Conjecture 14 are known: as already mentioned, the binary case is solved ([33]); for larger alphabets an affirmative answer is provided only for some special classes of morphisms.

In [25], we confirm the conjecture for morphisms fixing a coding of a non-degenerate exchange of 3 intervals. In [21], the authors prove the validity of the conjecture for marked morphisms. Moreover, they show that a power of the marked morphism itself is in class \(P'\). The technique and results used in the proof of the latter fact are crucial in showing the zero defect conjecture for marked morphisms in [22].

Palindromicity of a fixed point \({\mathbf u}\) is linked to the symmetry of the language \(\mathcal {L}({\mathbf u})\), namely the closedness under reversal. One direction of this connection is trivial: If a fixed point of a primitive morphism contains infinitely many palindromes, then its language is closed under reversal. The non-trivial converse is shown in [21] for marked morphisms. The mentioned results and computer experiments lead to the formulation of the following conjecture.

Conjecture 15

Let \(\varphi : \mathcal A^* \rightarrow \mathcal A^*\) be a primitive morphism having a fixed point \({\mathbf u}\). Its language \(\mathcal {L}({\mathbf u})\) is closed under reversal if and only if \({\mathbf u}\) is palindromic.

A proof of this conjecture in full generality would have applications in algorithmic analysis of the language of a given morphism. Specifically, it would allow for an efficient test whether the language of a fixed point is closed under reversal. For marked primitive morphisms, such an algorithm may be devised based on the following results of [21]:

  1. Every marked morphism has a so-called well-marked power (see [21] for a definition). If the fixed point of the morphism is palindromic, then this power is of class \(P'\).

  2. Conjecture 15 is true for marked morphisms.

Overall, closedness under reversal of the language generated by a marked primitive morphism is equivalent to palindromicity of the language, which in turn is equivalent to the well-marked power being in class \(P'\). Therefore, given a marked primitive morphism, the test whether the language it generates is closed under reversal consists of finding the well-marked power and checking whether this power is in class \(P'\). Since both these tasks can be performed efficiently in a straightforward manner, the whole test can be easily executed.

In view of this special case, Conjecture 15 may be seen as a first step towards an efficient test of closedness under reversal for the language generated by any primitive morphism for which the class P conjecture holds.