1 Introduction

Word equations have been studied both from an algorithmic and algebraic point of view. Some well-known results are that the complexity of the satisfiability problem of word equations can be solved in nondeterministic linear space [1], and that every system of word equations is equivalent to a finite subsystem [2, 3].

In this article, we concentrate on constant-free equations, and all equations are assumed to be constant-free from now on. For some relations between constant-free equations and equations with constants, see [4].

Equations with one or two variables have only periodic solutions, and are therefore not interesting. The three-variable case, on the other hand, is highly nontrivial, while simultaneously being much simpler than the four-variable case. Some examples of difficult results about three-variable equations are Hmelevskii’s theorem that every three-variable equation has a parametric solution [5] (this does not hold for equations with four or more variables), and a bound of 18 for the size of independent systems [6] (no finite bound is known for equations with four or more variables).

Hmelevskii’s theorem, in particular, is relevant for this article. The original proof, and even the simpler modern version of that proof in [7, 8], is very long. Also, while the basic concept of parametric solutions is simple, the actual parametric formulas that arise from the proof can be very complicated. They were analyzed in [8, 9], and it was proved, for example, that the number of numerical parameters needed in these formulas is at most logarithmic with respect to the length of the equation, and the total length of the formulas is at most exponential.

In this article, we find out that for certain families of three-variable equations and systems of equations, namely, for so-called unbalanced equations and entire systems, the set of solutions can be described using explicitly given formulas that are quite simple. In particular, only two numerical parameters are needed to represent all solutions, and only one numerical parameter is needed if we disregard periodic solutions. We also outline a strategy to possibly extend our result to all three-variable equations, although one or two numerical parameters will probably no longer be sufficient in that case. This can potentially lead to a much stronger and more explicit version of Hmelevskii’s theorem in the future.

This article is an extended version of the conference paper [10]. The most important differences are the following: We have added Section 3 about parametric words. In the conference version, parametric words were mentioned but not defined formally. We have added Theorem 5.2 and Corollary 5.3. These are essentially direct consequences of results in [10], but the relevant parts were split and hidden inside many lemmas. We have added Theorem 7.2. The idea behind this theorem was briefly discussed in [10], but there was no formal statement or proof. Finally, we have added Theorem 8.2.

2 Preliminaries

First, we go through some basic definitions and lemmas abouts words. For more, see [11, 12].

Let \(\mathbb {N}\) denote the set of nonnegative integers. Throughout the article, let \(\Sigma \) be an alphabet that contains at least two letters a and b. Let \(\varepsilon \) denote the empty word.

A word u is a factor (prefix, suffix) of a word w if there exist words xy such that \(w = xuy\) (\(w = uy\), \(w = xu\), respectively). If one of the words uv is a prefix (suffix) of the other, we use the notation \(u \sim _{\textrm{p}}v\) (\(u \sim _{\textrm{s}}v\), respectively).

If \(w \in \Sigma ^*\) and \(n \in \mathbb {N}\), then \(w^n\) is called a power of w, or more specifically, an n-power of w. If u is a prefix of w, then \(w^n u\) is a called a fractional power of w. We also use negative powers as follows: If \(w = uv\), then \(u^{-1} w = v\) and \(w v^{-1} = u\). If x is not a prefix (suffix) of w, then \(x^{-1} w\) (\(w x^{-1}\), respectively) is not defined, so whenever we use expressions like these, we have to make sure that they represent well-defined words.

Words u and v are conjugates if there exist words xy such that \(u = xy\) and \(v = yx\). If a word is an n-power, then all of its conjugates are also n-powers.

Let \(\Xi \) be another alphabet. A mapping \(h: \Xi ^* \rightarrow \Sigma ^*\) is a morphism if \(h(uv) = h(u) h(v)\) for all \(u, v \in \Xi ^*\). The morphism h is periodic if there exists \(w \in \Sigma ^*\) such that \(h(u) \in w^*\) for all \(u \in \Xi ^*\), and nonperiodic otherwise.

Next, we state some well-known results that are needed later. These can be considered folklore results, and Lemmas 2.1 and 2.3 can be found, for example, in [11, Corollary 1.2.6, Proposition 1.3.4].

Lemma 2.1

Let \(x, y \in \Sigma ^*\). The following are equivalent:

  1. 1.

    x and y are powers of a common word.

  2. 2.

    \(xy = yx\).

  3. 3.

    x and y satisfy a nontrivial relation, that is, there exist \(x_1, \dots , x_m, y_1, \dots , y_n \in \{x, y\}\) such that \((x_1, \dots , x_m) \ne (y_1, \dots , y_n)\) but \(x_1 \cdots x_m = y_1 \cdots y_n\).

Lemma 2.2

Let \(x, y \in \Sigma ^*\). If y is a fractional power of x and x is a suffix of y, then x and y are powers of a common word.

Proof

Because y is a fractional power of x, we can write \(x = pq\) and \(y = (pq)^n p\) for some words pq and integer \(n \ge 0\). Because x is a suffix of y, \(x = qp\). It follows that \(pq = qp\) and, by Lemma 2.1, \(p, q \in r^*\) for some word r. Then also \(x, y \in r^*\).

\(\square \)

Lemma 2.3

Let \(x, y, z \in \Sigma ^*\) and let \(xy = yz\). Then \(x = z = \varepsilon \) or

$$\begin{aligned} x = uv,\qquad y = (uv)^j u,\qquad z = vu \end{aligned}$$

for some \(u, v \in \Sigma ^*\) and \(j \in \mathbb {N}\).

The next result is one of the equivalent formulations of the periodicity theorem of Fine and Wilf.

Theorem 2.4

(Fine and Wilf [13]) Let \(x, y \in \Sigma ^*\). If a power of x and a power of y have a common prefix of length \(|xy| - \gcd (|x|, |y|)\), then x and y are powers of a common word.

Let us fix an alphabet of variables \(\Xi \) and an alphabet of constants \(\Sigma \). A word equation is a pair \(E = (L, R)\), where \(L, R \in \Xi ^*\), and a solution of E is a morphism \(h: \Xi ^* \rightarrow \Sigma ^*\) such that \(h(L) = h(R)\). The equation E is nontrivial if \(L \ne R\).

A system of equations is a set of equations. A solution of a system is a morphism that satisfies all equations in the system. A system is nontrivial if it contains at least one nontrivial equation.

The set of all solutions of an equation or system E is denoted by \({{\,\textrm{Sol}\,}}(E)\), and the set of all equations satisfied by a morphism h is denoted by \(\textrm{Eq}(h)\). Then \(\textrm{Eq}(h)\) is a system of equations, and it is called an entire system. Equations or systems \(E_1, E_2\) are equivalent if \({{\,\textrm{Sol}\,}}(E_1) = {{\,\textrm{Sol}\,}}(E_2)\), and morphisms \(h_1, h_2\) are equivalent if \(\textrm{Eq}(h_1) = \textrm{Eq}(h_2)\).

We are particularly interested in the three-variable case \(\Xi = \{X, Y, Z\}\). Throughout the article, we let XYZ be distinct variables, and we use the shorthand notation [xyz], where \(x, y, z \in \Sigma ^*\), for the morphism \(h: \{X, Y, Z\}^* \rightarrow \Sigma ^*\) defined by \(h(X) = x\), \(h(Y) = y\), \(h(Z) = z\).

Example 2.5

Consider the equation \(E = (XY, YZ)\). We can easily check that the morphism \(h = [uv, (uv)^j u, vu]\), where \(u, v \in \Sigma ^*\) and \(j \in \mathbb {N}\), is a solution of E:

$$\begin{aligned} h(XY) = uv (uv)^j u = (uv)^j u vu = h(YZ). \end{aligned}$$

It follows from Lemma 2.3 that all solutions of E are of this form or of the form \([\varepsilon , u, \varepsilon ]\).

An equation (LR) is balanced if every variable has as many occurrences in L as in R, and unbalanced otherwise. Results related to balanced equations can be found, for example, in [14, 15]. We need the following two theorems.

Theorem 2.6

(Harju and Nowotka [14]) Let \(g, h: \{X, Y, Z\}^* \rightarrow \Sigma ^*\) be nonperiodic morphisms such that \(\textrm{Eq}(g) \ne \textrm{Eq}(h)\). Then \(\textrm{Eq}(g) \cap \textrm{Eq}(h)\) does not contain any unbalanced equations.

Theorem 2.7

(Harju and Nowotka [14]) If two unbalanced equations have a common nonperiodic solution, then they have the same set of periodic solutions.

3 Parametric Words

Parametric words can be formalized in a couple of different ways. In this section, we define them as certain kinds of functions.

First, let us consider functions \((\Sigma ^*)^p \times \mathbb {N}^q \rightarrow \Sigma ^*\). Here \((\Sigma ^*)^p\) means a cartesian product rather than a concatenation of languages, so these are functions with p word parameters and q numerical parameters. For \(i \in \{1, \dots , p\}\), let \(U_i\) be the function defined by

$$\begin{aligned} U_i(u_1, \dots , u_p; j_1, \dots , j_q) = u_i. \end{aligned}$$

We can use \(\mathcal {E}\) to mean the function that maps everything to the empty word. If \(\alpha , \beta \) are functions, then their product (or concatenation), denoted by \(\alpha \beta \), is the function

$$\begin{aligned} (\alpha \beta )(u_1, \dots , u_p; j_1, \dots , j_q) \!=\! \alpha (u_1, \dots , u_p; j_1, \dots , j_q) \beta (u_1, \dots , u_p; j_1, \dots , j_q). \end{aligned}$$

For all \(i \in \{1, \dots , q\}\), we also define a formal power of \(\alpha \), denoted by \(\alpha ^{J_i}\), as the function

$$\begin{aligned} \alpha ^{J_i}(u_1, \dots , u_p; j_1, \dots , j_q) = \alpha (u_1, \dots , u_p; j_1, \dots , j_q)^{j_i}. \end{aligned}$$

Parametric words can now be defined as the functions we get by starting with the functions \(\mathcal {E}, U_1, \dots , U_p\) and repeatedly applying the above-mentioned operations. Formally, let

$$\begin{aligned} \mathcal {P}_0(p, q) = \{U_{k_1} \cdots U_{k_n} \mid n \in \mathbb {N}, k_i \in \{1, \dots , p\}\}, \end{aligned}$$

where the case \(n = 0\) gives \(\mathcal {E}\). For \(d \ge 1\), let

$$\begin{aligned} \mathcal {P}_d(p, q) = \{ \alpha _0 \beta _1^{J_{k_1}} \alpha _1 \cdots \beta _n^{J_{k_n}} \alpha _n \mid n \in \mathbb {N}, k_i \in \{1, \dots , q\}, \alpha _i, \beta _i \in \mathcal {P}_{d - 1}(p, q) \}. \end{aligned}$$

The elements of

$$\begin{aligned} \bigcup _{i = 0}^\infty \mathcal {P}_i(p, q). \end{aligned}$$

are called parametric words with p word parameters and q numerical parameters. For a parametric word \(\alpha \), the smallest d such that \(\alpha \in \mathcal {P}_d(p, q)\) is called the nesting level of \(\alpha \). Note that \(\mathcal {P}_{d - 1}(p, q) \subseteq \mathcal {P}_d(p, q)\). This means that \(\mathcal {P}_d(p, q)\) is the set of parametric words with nesting level at most d.

We can naturally use shorthand notation such as \(\alpha ^{J_1} \alpha = \alpha ^{J_1 + 1}\), \(\alpha ^{J_1} \alpha ^{J_1} = \alpha ^{2 J_1}\), and \(\alpha ^{J_1} \alpha ^{J_2} = \alpha ^{J_1 + J_2}\). We are mostly interested in the case \(p = 2\), and then we use the notation \(U = U_1\) and \(V = U_2\). We can also let \(J = J_1\) and \(K = J_2\).

Example 3.1

The parametric word \(\alpha = U (V^{J} U)^{K} U V^{J} \in \mathcal {P}_2(2, 2)\) is the function defined by \(\alpha (u, v; j, k) = u (v^{j} u)^{k} u v^{j}\).

Next we define parametric representations and solutions. To simplify notation, we concentrate on the three-variable case, which is the only one needed in this article. Generalizing the definitions for more variables would be straightforward.

A finite set

$$\begin{aligned} \{(\alpha _i, \beta _i, \gamma _i) \mid i \in \{1, \dots , k\}\} \end{aligned}$$

of triples of parametric words in \(\mathcal {P}_d(p, q)\) is a parametric representation of the set of morphisms

$$\begin{aligned} \{ [\alpha _i(x), \beta _i(x), \gamma _i(x)] \mid i \in \{1, \dots , k\}, x \in (\Sigma ^*)^p \times \mathbb {N}^q \}. \end{aligned}$$

The set of all such parametric representations is denoted by \(\mathcal {R}_d(p, q)\). For a three-variable equation E, a parametric representation of \({{\,\textrm{Sol}\,}}(E)\) is called a parametric solution of E.

Example 3.2

By Lemma 2.5, the equation (XYYZ) has a parametric solution

$$\begin{aligned} \{(UV, (UV)^J U, VU), (\mathcal {E}, U, \mathcal {E})\} \in \mathcal {R}_1(2, 1). \end{aligned}$$

Hmelevskii [5] proved that every three-variable equation has a parametric solution. These parametric solutions use two word parameters, except that trivial equations have the parametric solution \(\{(U_1, U_2, U_3)\} \in \mathcal {R}_0(3, 0)\). The number of numerical parameters and nesting level were not analyzed in [5], but a logarithmic upper bound follows from the proofs in [8, 9], leading to the next theorem.

Theorem 3.3

For every nontrivial three-variable equation E, there exist qd such that E has a parametric solution in \(\mathcal {R}_d(2, q)\). Moreover, \(q, d \in O(\log |E|)\).

The process of solving a word equation is often divided into two parts: Finding all periodic solutions (which is straightforward), and finding all nonperiodic solutions (which is much more difficult). To make it easier to talk about these situations, we give the following definition: A parametric representation for a set \(S \subseteq {{\,\textrm{Sol}\,}}(E)\) such that every nonperiodic solution of E is in S is called a parametric NonPer-solution of E.

Lemma 3.4

If a three-variable equation E has a parametric NonPer-solution in \(\mathcal {R}_d(2, q)\), then it has a parametric solution in \(\mathcal {R}_{\max (d, 1)}(2, \max (q, 3))\). Moreover, if E is unbalanced, then it has a parametric solution in \(\mathcal {R}_{\max (d, 1)}(2, \max (q, 2))\).

Proof

It is well-known that the set of periodic solutions has the parametric representation \(\{(U^{J_1}, U^{J_2}, U^{J_3})\} \in \mathcal {R}_1(1, 3)\) if the equation is balanced, and it has a parametric representation in \(\mathcal {R}_1(1, 2)\) if the equation is unbalanced (see, for instance, Example 5.1.2 and the proof of Theorem 5.1.3 in [8] for an explanation). This parametric representation together with the NonPer-solution gives a parametric solution of E. The claim follows.\(\square \)

4 Lemmas

Before moving to the classification of entire systems and unbalanced equations in Section 5 and to the solution sets of these in Section 6, we prove some (mostly technical) lemmas that are needed in those sections.

Lemma 4.1

Let \(x, y \in \Sigma ^*\) and \(m, n, p, q \in \mathbb {N}\) and \(m, n, p + q \ge 1\). Let \(x^p \sim _{\textrm{p}}y^m x\) and \(x^q \sim _{\textrm{s}}x y^n\). Then x and y are powers of a common word or

$$\begin{aligned} x = (uv)^j u, \qquad y = x^{p - 1} uv x^{q} = x^{p} vu x^{q - 1} \end{aligned}$$

for some \(u, v \in \Sigma ^*\) and \(j \in \mathbb {N}\).

Proof

Throughout the proof, we assume that \(p \ge q\) and, consequently, \(p \ge 1\). The case \(p < q\) is symmetric and can be handled in the same way by reversing all the words.

First, let \(|y| \ge |x^{p + q - 1}|\). Then \(y = x^{p - 1} w x^q\) for some word w. From \(x^p \sim _{\textrm{p}}y^m x\) it follows that \(x \sim _{\textrm{p}}w x^q (x^{p - 1} w x^q)^{m - 1} x\). Clearly, \(w x^q (x^{p - 1} w x^q)^{m - 1} x\) has a prefix \(w^k x\) for some \(k \ge 1\) (if \(q + p - 1 \ge 1\), then \(k = 1\), otherwise \(k = m\)), and then x is a prefix of \(w^k x\). Let \(j \ge 0\) be the largest integer such that \(w^j\) is a prefix of x and let u be such that \(x = w^j u\). Then u is a prefix of \(w^k u\) and, by the maximality of j, u is shorter than w and thus a prefix of w, so we can write \(w = uv\) for some word v, and then \(x = (uv)^j u\).

Next, let \(|y| < |x^{p + q - 1}|\) and \(q = 0\). Then \(|x^p| \ge |xy|\) and \(|y^m x| \ge |xy|\). From \(x^p \sim _{\textrm{p}}y^m x\) it follows that x is a fractional power of y, and then from the theorem of Fine and Wilf it follows that x and y are powers of a common word.

Next, let \(|y| < |x^{p + q - 1}|\) and \(q \ge 1\) and \(|x^p| < |y|\). Then y has a prefix \(x^p\) and a suffix \(x^q\). Let \(y = z x^q\). We have \(|zx| < |x^p|\), so zx is a prefix of \(x^p\) and therefore a fractional power of x. By Lemma 2.2, x and zx are powers of a common word, and then also y is a power of that same word.

Next, let \(|y| < |x^{p + q - 1}|\) and \(q \ge 1\) and \(|x| < |y| \le |x^p|\). Then y is a fractional power of x and ends in x. By Lemma 2.2, x and y are powers of a common word.

Finally, let \(|y| < |x^{p + q - 1}|\) and \(q \ge 1\) and \(|y| \le |x|\). Let \(k \ge 0\) be the largest integer such that \(y^k\) is a prefix of x and let z be such that \(x = y^k z\). From x being a prefix of \(y^m x\) it follows that z is a prefix of \(y^m z\), and by the maximality of k, z is shorter than y and thus a prefix of y. This means that x is a fractional power of y. Also, x ends in y. By Lemma 2.2, x and y are powers of a common word.\(\square \)

Lemma 4.2

Let \(x, y \in \Sigma ^*\) and \(m, n, \in \mathbb {N}\) and \(\gcd (m, n) = 1\). Let \(x^m y = y z^n\). Then xyz are powers of a common word or

$$\begin{aligned} x = (st)^n, \qquad y = (st)^i s, \qquad z = (ts)^m \end{aligned}$$

for some \(s, t \in \Sigma ^*\) and \(i \in \mathbb {N}\).

Proof

By Lemma 2.3, \(x^m = uv\), \(y = (uv)^j u\), \(z^n = vu\) for some \(u, v \in \Sigma ^*\) and \(j \in \mathbb {N}\). Then uv is an m-power, and because its conjugate vu is an n-power, uv is also an n-power, say, \(uv = r^n\). From \(x^m = r^n\) and Lemma 2.1 it follows that x and r are powers of a common word p, and then \(uv = p^l\) for some positive integer l, and l is divisible by both m and n. By \(\gcd (m, n) = 1\), l is divisible by mn, and \(uv = w^{mn}\) for some \(w \in p^*\). We can write \(w = st\), \(u = (st)^k s\), \(v = t (st)^{mn - k - 1}\) for some \(s, t \in \Sigma ^*\) and \(k \in \mathbb {N}\), \(k \le mn - 1\). Then \(x = (st)^n\) and \(z = (ts)^m\) and \(y = (st)^{mnj + k} s\). \(\square \)

Lemma 4.3

Let \(x, y, z \in \Sigma ^*\) and \(i, k \in \mathbb {N}\) and \(i \ge 1\) and \(k \ge 2\). Let \((xz)^i x = y^k\). Then xyz are powers of a common word or

$$\begin{aligned} x = (uv)^j u,\qquad y = (uv)^{j + 1} u,\qquad z = vu ((uv)^{j + 1} u)^{k - 2} uv \end{aligned}$$

for some \(u, v \in \Sigma ^*\) and \(j \ge 0\).

Proof

If \(i \ge 2\) or \(|x| \ge |y|\), then \(|(xz)^i x| = |y^k| \ge |xzy|\), so xz and y are powers of a common word by the theorem of Fine and Wilf, and then xyz are powers of a common word.

If \(i = 1\) and \(|x| < |y|\), then \(y = sx = xt\) and \(z = t y^{k - 2} s\) for some \(s, t \in \Sigma ^+\). By Lemma 2.3, \(s = uv\), \(t = vu\), \(x = (uv)^j u\) for some \(u, v \in \Sigma ^*\) and \(j \in \mathbb {N}\), and then \(y = (uv)^{j + 1} u\) and \(z = vu ((uv)^{j + 1} u)^{k - 2} uv\).\(\square \)

Lemma 4.4

Let \(h: \{X, Y, Z\}^* \rightarrow \Sigma ^*\) be a nonperiodic morphism. If \(E \in \textrm{Eq}(h)\) is unbalanced, then E is equivalent to \(\textrm{Eq}(h)\).

Proof

Every solution of \(\textrm{Eq}(h)\) is a solution of E. Every periodic solution of E is a solution of all balanced equations in \(\textrm{Eq}(h)\), because periodic morphisms satisfy all balanced equations, and also a solution of all unbalanced equations in \(\textrm{Eq}(h)\) by Theorem 2.7. If g is a nonperiodic solution of E, then \(E \in \textrm{Eq}(g) \cap \textrm{Eq}(h)\), so it must be \(\textrm{Eq}(g) = \textrm{Eq}(h)\) by Theorem 2.6. This means that g is a solution of \(\textrm{Eq}(h)\). We have shown that E and \(\textrm{Eq}(h)\) have the same solutions. \(\square \)

5 Classification of Entire Systems

Budkina and Markov [16] classified all three-generator subsemigroups of a free semigroup. In the next theorem, we give a reformulation of this theorem in terms of morphisms and equations. An essentially equivalent result was proved independently by Spehner [17, 18]. These results have been used to study three-variable word equations in [14] and [19], for example. In [14], there is also a good comparison of these results.

Theorem 5.1

(Budkina and Markov [16]) Every nonperiodic morphism \(\{X, Y, Z\}^* \rightarrow \Sigma ^*\) that satisfies a nontrivial equation is equivalent, up to a permutation of the variables, to a morphism of one of the following types:

  1. BM1.

    \([a, b, a^{k_0} \prod _{i = 1}^n b a^{k_i}],\) where \(n, k_0, \dots , k_n \in \mathbb {N}\).

  2. BM2.

    \([a, b^m, b^n],\) where \(m, n \in \mathbb {N}\) and \(m, n \ge 1\) and \(\gcd (m, n) = 1\).

  3. BM3.

    \([a, a^p b a^q, a^{p'} b \prod _{i = 1}^n (a^{k_i} b) a^{q'}],\) where \(p, q, p', q', n, k_1, \dots , k_n \in \mathbb {N}\) and \(pp' = qq' = 0\) and \(1 \le p + q \le k_1, \dots , k_n\).

  4. BM4.

    \( [a, a^p b (a^k b)^m, b (a^k b)^n a^q],\) where \(p, q, k, m, n \in \mathbb {N}\) and \(k, m, n \ge 1\) and \(p, q \le k\) and \(\gcd (m + 1, n + 1) = 1\).

  5. BM5.

    \( [a, a^p b (a^k b)^m a^q, b (a^k b)^n],\) where \(p, q, k, m, n \in \mathbb {N}\) and \(p, q, k, m, n \ge 1\) and \(p, q \le k\) and \(\gcd (m + 1, n + 1) = 1\).

  6. BM6.

    \( [a, a^p b a^q, b \prod _{i = 1}^n (a^{k_i} b) (a^k b \prod _{i = 1}^n (a^{k_i} b))^m],\) where \(p, q, k, m, n, k_1, \dots , k_n \) \( \in \mathbb {N}\) and \(m, p, q \ge 1\) and \(p, q \le k < p + q \le k_1, \dots , k_n\).

With the help of Theorem 5.1, we can prove a similar classification result for entire systems (Theorem 5.2) and for unbalanced equations (Corollary 5.3).

Theorem 5.2

For every nonperiodic morphism \(h: \{X, Y, Z\}^* \rightarrow \Sigma ^*\) that satisfies a nontrivial equation, \(\textrm{Eq}(h)\) is equivalent, up to a permutation of the variables, to an unbalanced equation of one of the following types:

  1. BME1.

    \( (X^{k_0} \prod _{i = 1}^n Y X^{k_i}, Z),\) where \(n, k_0, \dots , k_n \in \mathbb {N}\).

  2. BME2.

    \( (Y^n, Z^m),\) where \(m, n \in \mathbb {N}\) and \(m, n \ge 1\) and \(\gcd (m, n) = 1\).

  3. BME3.

    \( (X^p Z X^q, X^{p'} Y \prod _{i = 1}^n (X^{k_i - p - q} Y) X^{q'}),\) where \(p, q, p', q', n, k_1, \dots , k_n \in \mathbb {N}\) and \(pp' = qq' = 0\) and \(1 \le p + q \le k_1, \dots , k_n\).

  4. BME4.

    \(((X^{k - p} Y)^{n + 1} X^k, X^k (Z X^{k - q})^{m + 1}),\) where \(p, q, k, m, n \in \mathbb {N}\) and \(k, m, n \ge 1\) and \(p, q \le k\) and \(\gcd (m + 1, n + 1) = 1\).

  5. BME5.

    \(((X^{k - p} Y X^{k - q} Z)^{n + 1}, (X^k Z)^{m + n + 2}),\) where \(p, q, k, m, n \in \mathbb {N}\) and \(p, q,\) \( k, m, n \ge 1\) and \(p, q \le k\) and \(\gcd (m + 1, n + 1) = 1\).

  6. BME6.

    \(((X^k Z)^{m + 2}, (X^{k - p} Y \prod _{i = 1}^n (X^{k_i - p - q} Y) X^{k - q} Z)^{m + 1}),\) where \(p, q, k, m, n,\) \( k_1, \dots , k_n \in \mathbb {N}\) and \(m, p, q \ge 1\) and \(p, q \le k < p + q \le k_1, \dots , k_n.\)

Proof

By Theorem 5.1, h is equivalent, up to a permutation of the variables, to one of the morphisms BM1–BM6. It can be verified by a straightforward computation that each of these morphisms satisfies the corresponding equation BME1–BME6, and these equations are unbalanced. Thus there exists an unbalanced equation \(E \in \textrm{Eq}(h)\) that is equal, up to a permutation of the variables, to one of the equations BME1–BME6. By Lemma 4.4, \(\textrm{Eq}(h)\) is equivalent to E. \(\square \)

Corollary 5.3

Every unbalanced three-variable equation E with a nonperiodic solution h is equivalent, up to a permutation of the variables, to one of the equations BME1–BME6.

Proof

By Lemma 4.4, E is equivalent to \(\textrm{Eq}(h)\), so the claim follows from Theorem 5.2.\(\square \)

6 Solutions of Entire Systems

In this section, we solve the equations BME1–BME6. By Theorem 5.2 and Corollary 5.3, this essentially solves all entire systems and all unbalanced equations in the three-variable case. For each equation, we find an explicit description of all nonperiodic solutions of that equation using two word parameters, denoted by u and v, and possibly one numerical parameter, denoted by j. Each of the results could be formulated in terms of parametric solutions, but we return to this in Section 7.

Lemma 6.1

Let E be the equation BME1. Then [xyz] is a solution of E if and only if

$$\begin{aligned} x = u,\qquad y = v,\qquad z = u^{k_0} \prod _{i = 1}^n v u^{k_i} \end{aligned}$$
(1)

for some \(u, v \in \Sigma ^*\).

Proof

Let \(g = [x, y, z]\) be a solution of E. We can let x and y be arbitrary words u and v, and then g is a solution if and only if \(z = u^{k_0} \prod _{i = 1}^n v u^{k_i}\). \(\square \)

Lemma 6.2

Let E be the equation BME2. Then [xyz] is a solution of E if and only if

$$\begin{aligned} x = u,\qquad y = v^m,\qquad z = v^n \end{aligned}$$
(2)

for some \(u, v \in \Sigma ^*\).

Proof

Let \(g = [x, y, z]\) be a solution of E. Then \(y^n = z^m\) is both an n-power and an m-power, so it is also an mn-power of some word v because \(\gcd (m, n) = 1\). This means that \(y = v^m\) and \(z = v^n\). Then x can be an arbitrary word u and this always gives a solution g. \(\square \)

Lemma 6.3

Let E be the equation BME3. If [xyz] is a nonperiodic solution of E, then

$$\begin{aligned} x = (uv)^j u,\qquad y = x^{p - 1} uv x^{q},\qquad z = x^{p' - 1} uv \prod _{i = 1}^n (x^{k_i - 1} uv) x^{q'} \end{aligned}$$
(3)

for some \(u, v \in \Sigma ^*\) and \(j \ge 0\). Moreover, every morphism defined by these formulas is a solution of E, except that if \(p' = q' = 0\) and \(k_i = 1\) for all i, then we must require that \(j \le n\).

Proof

Let \(g = [x, y, z]\) be a nonperiodic solution of E. We have

$$\begin{aligned} x^p z x^q = x^{p'} y \prod _{i = 1}^n (x^{k_i - p - q} y) x^{q'}. \end{aligned}$$

We show that there exists \(m \ge 1\) such that \(x^p \sim _{\textrm{p}}y^m x\). First, if \(p = 0\), this is trivial. Second, if \(p > 0\) and \(k_i > p + q\) for some i, then \(p' = 0\) and we can let m be the smallest number i such that \(k_i > p + q\). Finally, if \(p > 0\) and \(k_i = p + q\) for all i, then \(p' = 0\) and \(x^p\) is a prefix of \(y^{n + 1} x^{q'}\), which either has a prefix \(y^{n + 1} x\) (if \(q' \ge 1\)) or is a prefix of \(y^{n + 1} x\) (if \(q' = 0\)), and in either case we can let \(m = n + 1\). We have shown that \(x^p \sim _{\textrm{p}}y^m x\). Similarly, \(x^q \sim _{\textrm{s}}x y^{m'}\) for some \(m' \ge 1\).

If x and y are powers of a common word, then g is periodic, so it follows from Lemma 4.1 that x and y are of the claimed form. Now g is a solution of E if and only if

$$\begin{aligned} z&= x^{-p + p'} y \prod _{i = 1}^n (x^{k_i - p - q} y) x^{q' - q}\\&= x^{-p + p'} x^{p - 1} uv x^{q} \prod _{i = 1}^n (x^{k_i - p - q} x^{p - 1} uv x^{q}) x^{q' - q}\\&= x^{p' - 1} uv \prod _{i = 1}^n (x^{k_i - 1} uv) x^{q'} \end{aligned}$$

and if this is a well-defined word, despite \(p' - 1\) being negative in the case \(p' = 0\). If \(k_i \ge 2\) for some i or if \(q' \ge 1\), then \(x^{p' - 1}\) is followed by \((uv)^r x = x (vu)^r\) for some \(r \ge 1\), making z a well-defined word. If \(k_i = 1\) for all i and \(p' = q' = 0\), then

$$\begin{aligned} z = x^{-1} uv \prod _{i = 1}^n (uv) = ((uv)^j u)^{-1} (uv)^{n + 1} = v (uv)^{n - j} \end{aligned}$$

which is a well-defined word if and only if \(j \le n\) (or if \(u = \varepsilon \) and \(v^{n - j + 1} = \varepsilon \), but that only gives periodic solutions). \(\square \)

Lemma 6.4

Let E be the equation BME4. If [xyz] is a nonperiodic solution of E, then

$$\begin{aligned} x = (uv)^j u,\qquad y = x^{p - 1} uv (x^{k - 1} uv)^m,\qquad z = (vu x^{k - 1})^n vu x^{q - 1} \end{aligned}$$
(4)

for some \(u, v \in \Sigma ^*\) and \(j \ge 0\). Moreover, every morphism defined by these formulas is a solution of E, except that if \(k = 1\) and \(p = 0\), then we must require \(j \le m\), and if \(k = 1\) and \(q = 0\), then we must require \(j \le n\).

Proof

Let \(g = [x, y, z]\) be a nonperiodic solution of E. We have

$$\begin{aligned} (x^{k - p} y)^{n + 1} x^k = x^k (z x^{k - q})^{m + 1}, \end{aligned}$$

so Lemma 4.2 gives

$$\begin{aligned} x^{k - p} y = (st)^{m + 1},\qquad z x^{k - q} = (ts)^{n + 1},\qquad x^k = (st)^i s \end{aligned}$$

for some \(s, t \in \Sigma ^*\) and \(i \in \mathbb {N}\).

If \(i = 0\), then \(s = x^k\) and

$$\begin{aligned} y&= x^{p - k} (st)^{m + 1} = x^{p - k} (x^k t)^{m + 1} = x^p t (x^k t)^m,\\ z&= (ts)^{n + 1} x^{q - k} = (t x^k)^{n + 1} x^{q - k} = (t x^k)^n t x^q. \end{aligned}$$

If we let \(u = x\) and \(v = t\), then this matches (4) with \(j = 0\).

If \(k \ge 2\) and \(i \ge 1\), then by Lemma 4.3, either xst are powers of a common word, making g periodic, or \(s = (uv)^j u\), \(x = (uv)^{j + 1} u\), \(t = vu x^{k - 2} uv\) for some \(u, v \in \Sigma ^*\) and \(j \in \mathbb {N}\). We get

$$\begin{aligned} y&= x^{p - k} (st)^{m + 1} = x^{p - k} ((uv)^j u vu x^{k - 2} uv)^{m + 1} = x^{p - 1} uv (x^{k - 1} uv)^m\\ z&= (ts)^{n + 1} x^{q - k} = (vu x^{k - 2} uv (uv)^j u)^{n + 1} x^{q - k} = (vu x^{k - 1})^n vu x^{q - 1}. \end{aligned}$$

This matches (4) with \(j \ge 1\). Because \(uv (x^{k - 1} uv)^m\) always begins with \(uv x = x vu\), y is a well-defined word even if \(p = 0\). Similarly, z is a well-defined word even if \(q = 0\).

If \(k = 1\) and \(i \ge 1\), then

$$\begin{aligned} x = (st)^i s,\qquad y = x^{p - 1} (st)^{m + 1},\qquad z = (ts)^{n + 1} x^{q - 1}. \end{aligned}$$

If we let \(u = s\) and \(v = t\) and \(j = i\), then this matches (4) with \(j \ge 1\). Finally, we have to make sure that y and z are well-defined words even if \(p = 0\) or \(q = 0\). If \(p = 0\), then we must require \(i \le m\), and if \(q = 0\), then we must require \(i \le n\). \(\square \)

Lemma 6.5

Let E be the equation BME5. If [xyz] is a nonperiodic solution of E, then

$$\begin{aligned} x = (uv)^j u,\qquad y = x^{p - 1} uv (x^{k - 1} uv)^m x^q,\qquad z = x^{-1} uv (x^{k - 1} uv)^n \end{aligned}$$
(5)

for some \(u, v \in \Sigma ^*\) and \(j \ge 0\). Moreover, every morphism defined by these formulas is a solution of E, except that if \(k = 1\), then we must require that \(j \le n\).

Proof

Let \(g = [x, y, z]\) be a nonperiodic solution of E. We have

$$\begin{aligned} (x^{k - p} y x^{k - q} z)^{n + 1} = (x^k z)^{m + n + 2}, \end{aligned}$$

and \(\gcd (n + 1, m + n + 2) = 1\), so

$$\begin{aligned} x^{k - p} y x^{k - q} z = w^{m + n + 2},\qquad x^k z = w^{n + 1} \end{aligned}$$

for some \(w \in \Sigma ^*\). We get

$$\begin{aligned} w^{m + 1} = w^{m + n + 2} w^{-n - 1} = x^{k - p} y x^{k - q} z (x^k z)^{-1} = x^{k - p} y x^{-q}. \end{aligned}$$

If \(k = 1\), then from \(xz = w^{n + 1}\) it follows that \(w = uv\) and \(x = (uv)^j u\) for some \(u, v \in \Sigma ^*\) and \(j \in \mathbb {N}\), \(j \le n\). Then

$$\begin{aligned} y = x^{p - 1} w^{m + 1} x^q = x^{p - 1} uv (uv)^m x^q,\qquad z = x^{-1} w^{n + 1} = x^{-1} uv (uv)^n. \end{aligned}$$

This matches (5).

If \(k \ge 2\) and \(|x^{k - 1}| \le |w|\), then from \(x^k z = w^{n + 1}\) it follows that \(w = x^{k - 1} t\) for some \(t \in \Sigma ^*\), and that \(x^k\) is a prefix of \(x^{k - 1} t x\), so x is a prefix of tx. This means that \(t = uv\) and \(x = (uv)^j u\) for some \(u, v \in \Sigma ^*\) and \(j \in \mathbb {N}\). Then

$$\begin{aligned} y = x^{p - k} w^{m + 1} x^q = x^{p - 1} uv (x^{k - 1} uv)^m x^q,\qquad z = x^{-k} w^{n + 1} = x^{-1} uv (x^{k - 1} uv)^n. \end{aligned}$$

This matches (5). Because \(uv (x^{k - 1} uv)^n\) always begins with \(uv x = x vu\), z is a well-defined word.

If \(k \ge 2\) and \(|w| < |x^{k - 1}|\), then from \(x^k z = w^{n + 1}\) it follows that \(x^k\) and \(w^{n + 1}\) have a common prefix of length \(|x^k| > |xw|\). By the theorem of Fine and Wilf, x and w are powers of a common word, and this leads to g being periodic. \(\square \)

Lemma 6.6

Let E be the equation BME6. If [xyz] is a nonperiodic solution of E, then

$$\begin{aligned} x = (uv)^j u,\qquad y = x^{p - 1} uv x^{q},\qquad z = x^{-k} \Big (x^{k - 1} uv \prod _{i = 1}^n (x^{k_i - 1} uv)\Big )^{m + 1} \end{aligned}$$
(6)

for some \(u, v \in \Sigma ^*\) and \(j \ge 0\). Moreover, every morphism defined by these formulas is a solution of E, except that if \(k = 1\) and \(n = 0\), then we must require that \(j \le m\).

Proof

Let \(g = [x, y, z]\) be a nonperiodic solution of E. We have

$$\begin{aligned} (x^k z)^{m + 2} = \Big (x^{k - p} y \prod _{i = 1}^n (x^{k_i - p - q} y) x^{k - q} z\Big )^{m + 1}, \end{aligned}$$
(7)

and therefore,

$$\begin{aligned} |x^k z| \le |x^{k - p} y \prod _{i = 1}^n (x^{k_i - p - q} y) x^{k - q} z|. \end{aligned}$$

Thus \(x^k\) is a prefix of \(x^{k - p} y \prod _{i = 1}^n (x^{k_i - p - q} y) x^{k - q}\), and consequently, \(x^p\) is a prefix of \(y \prod _{i = 1}^n (x^{k_i - p - q} y) x^{k - q}\). It follows that \(x^p \sim _{\textrm{p}}y^{m'} x\) for some \(m' \ge 1\). Similarly, we see that \(x^q \sim _{\textrm{s}}x y^{m''}\) for some \(m'' \ge 1\). If x and y are powers of a common word, then g is periodic, so it follows from Lemma 4.1 that x and y are of the claimed form.

The left-hand side and right-hand side of (7) is both an \((m + 2)\)-power and an \((m + 1)\)-power, so it is an \((m + 2)(m + 1)\)-power of some word w. Now g is a solution of E if and only if

$$\begin{aligned} x^k z = w^{m + 1}, \qquad x^{k - p} y \prod _{i = 1}^n (x^{k_i - p - q} y) x^{k - q} z = w^{m + 2}, \end{aligned}$$

so

$$\begin{aligned} w = w^{m + 2} w^{-m -1} = x^{k - p} y \prod _{i = 1}^n (x^{k_i - p - q} y) x^{-q} \end{aligned}$$

and

$$\begin{aligned} z&= x^{-k} w^{m + 1} = x^{-k} \Big (x^{k - p} y \prod _{i = 1}^n (x^{k_i - p - q} y) x^{-q}\Big )^{m + 1}\\&= x^{-k} \Big (x^{k - p} x^{p - 1} uv x^{q} \prod _{i = 1}^n (x^{k_i - p - q} x^{p - 1} uv x^{q}) x^{-q}\Big )^{m + 1}\\&= x^{-k} \Big (x^{k - 1} uv \prod _{i = 1}^n (x^{k_i - 1} uv)\Big )^{m + 1}. \end{aligned}$$

This is always a well-defined word, except that in the case \(k = 1\) and \(n = 0\), we get \(z = x^{-1} (uv)^{m + 1}\), and we must additionally require that \(j \le m\). \(\square \)

7 Main Result

Let us take a closer look at the lemmas proved in the previous section. We see that the formulas (1)–(6) that describe the nonperiodic solutions contain at most one free numerical parameter j. The other numbers in these formulas, denoted by symbols such as \(k_i\), p, q and so on, are actually constants defined by the morphism h. The next example illustrates this in the case of the last lemma. In the following theorem, we formulate this precisely in terms of parametric solutions.

Example 7.1

Consider Lemma 6.6 and equation BME6. If \(p = q = m = 1\) and \(k = 2\) and \(n = 0\), then the equation has a parametric NonPer-solution

$$\begin{aligned} \{((UV)^J U, UV (UV)^J U, VUUV)\} \in \mathcal {R}_1(2, 1). \end{aligned}$$

If \(p = q = k = m = 1\) and \(n = 0\), then the equation has a parametric NonPer-solution

$$\begin{aligned} \{(U, UV U, V UV), (UV U, UV UV U, V)\} \in \mathcal {R}_0(2, 0). \end{aligned}$$

Theorem 7.2

Every nontrivial entire system \(\textrm{Eq}(h)\), where h is nonperiodic, and every unbalanced equation E on three variables has a parametric NonPer-solution in \(\mathcal {R}_1(2, 1)\) and a parametric solution in \(\mathcal {R}_1(2, 2)\).

Proof

By Theorem 5.2, the entire system \(\textrm{Eq}(h)\) is equivalent, up to a permutation of the variables, to one of the equations BME1–BME6, and by Corollary 5.3, the same is true for E if it has a nonperiodic solution. By the lemmas of Section 6, each of the equations BME1–BME6 has a parametric NonPer-solution in \(\mathcal {R}_1(2, 1)\) and, by Lemma 3.4, a parametric solution in \(\mathcal {R}_1(2, 2)\).

If E has only periodic solutions, then the empty set is a parametric NonPer-solution of E, and E has a parametric solution in \(\mathcal {R}_1(2, 2)\) by Lemma 3.4. \(\square \)

8 Towards a Characterization for Balanced Equations

We would like to prove a result similar to Theorem 7.2 for balanced three-variable equations. The next theorem points towards such a result.

Theorem 8.1

Let E be a three-variable equation. Let H be a set of representatives of all equivalence classes of morphisms \(\{X, Y, Z\}^* \rightarrow \Sigma ^*\). Then

$$\begin{aligned} {{\,\textrm{Sol}\,}}(E) = \bigcup _{h \in {{\,\textrm{Sol}\,}}(E) \cap H} {{\,\textrm{Sol}\,}}(\textrm{Eq}(h)). \end{aligned}$$

Proof

Every \(g \in {{\,\textrm{Sol}\,}}(E)\) is equivalent to some \(h \in H\), and then \(h \in {{\,\textrm{Sol}\,}}(E) \cap H\) and \(g \in {{\,\textrm{Sol}\,}}(\textrm{Eq}(g)) = {{\,\textrm{Sol}\,}}(\textrm{Eq}(h))\).

On the other hand, if \(f \in {{\,\textrm{Sol}\,}}(\textrm{Eq}(h))\) for some \(h \in {{\,\textrm{Sol}\,}}(E) \cap H\), then \(E \in \textrm{Eq}(h)\) and thus \(f \in {{\,\textrm{Sol}\,}}(\textrm{Eq}(h)) \subseteq {{\,\textrm{Sol}\,}}(E)\). \(\square \)

If the set \({{\,\textrm{Sol}\,}}(E) \cap H\) is finite, then this theorem, together with the results proved in this article, leads to a simple parametric solution for E. Unfortunately, the set \({{\,\textrm{Sol}\,}}(E) \cap H\) can be infinite, but the results in [19] give some restrictions on how complex this set can be. This could allow us to prove a stronger version of Hmelevskii’s theorem.

Theorem 8.2

Let E be a three-variable equation. Let H be the set of morphisms of the form BM1–BM6. Then \({{\,\textrm{Sol}\,}}(E) \cap H = (A \cap H) \cup (B \cap H)\), where A is either a finite set or the set

$$\begin{aligned} \{[a, a^i b a^j, (uv)^k u] \mid k \in \mathbb {N}\} \end{aligned}$$

for some \(i, j \in \mathbb {N}\) and \(u, v \in \Sigma ^*\), and B is either a finite set or the set

$$\begin{aligned} \{[a, a^i b (a^m b)^p a^j, a^k b (a^m b)^q a^l] \mid p, q \ge 1, \gcd (p + 1, q + 1) = 1\} \end{aligned}$$

for some \(i, j, k, l, m \in \mathbb {N}\).

Proof

Each of the morphisms in H is either of the form \([a, a^i b a^j, w]\) or of the form \([a, a^i b (a^m b)^p a^j, a^k b (a^m b)^q a^l]\).

For morphisms of the form \([a, a^i b a^j, w]\), it was proved in [19] that E can have such a solution for at most one pair (ij). If we fix i and j and replace X by a and Y by \(a^i b a^j\) in E, we get a nontrivial one-variable equation with constants, and solving this one-variable equation gives us the possible values for w. It is known that if such an equation has infinitely many solutions, then there are uv such that for each k, the morphism that maps the remaining variable Z to \((uv)^k u\) is a solution, and there are no other solutions, see [20].

For morphisms of the form \([a, a^i b (a^m b)^p a^j, a^k b (a^m b)^q a^l]\), it was proved in [19] that if E has infinitely many such solutions, then the set of such solutions in \({{\,\textrm{Sol}\,}}(E)\) is

$$\begin{aligned} \{[a, a^i b (a^m b)^p a^j, a^k b (a^m b)^q a^l] \mid p, q \ge 1, \gcd (p + 1, q + 1) = 1\} \end{aligned}$$

for some \(i, j, k, l, m \in \mathbb {N}\).\(\square \)

We expect that every nontrivial three-variable equation has a parametric solution in \(\mathcal {R}_d(2, q)\) for some fixed numbers dq, perhaps even for \(d = 2\) and \(q = 3\). However, proving this kind of a result requires additional work in the future.