1 Introduction

The theory of languages is extremely rich and important automata-logic correspondances have been shown for various classes of logics, automata, and structures. There are less known automata-logic connections in the theory of transductions. Nevertheless, important results have been obtained for the class of MSO-definable transductions, as defined by Courcelle. Most notably, it has been shown by J. Engelfriet and H.J. Hoogeboom that MSO-definable (finite) string to string transductions are exactly those transductions defined by deterministic two-way transducers [12]. This result has then been extended to ordered ranked trees by J. Engelfriet and S. Maneth, for the class of linear-size increase macro tree transducers [13] and recently to nested words-to-words transductions [10]. MSO-definable transductions of finite strings have also been characterized by a new automata model, that of (copyless) streaming string transducers, by R. Alur and P. Černý [2].

Copyless streaming string transducers (SST) extend deterministic finite state automata with a finite set of string variables \(X,Y,\dots \). Each variable stores an intermediate string output and can be combined and updated with other variables. Along the transitions, a finite string can be appended or prepended to a variable, and variables can be reset or concatenated. The variable updates along the transitions are formally defined by variable substitutions and the copyless restriction is defined by considering only linear substitutions. Therefore, variable update such as \(X := XX\) are forbidden by the copyless restriction. The SST model has then been extended to other structures such as infinite strings [7], trees [4], and quantitative languages [5].

Two examples of SST are depicted in Fig. 1:

  • The SST \(T_0\) depicted on the left realizes the function \(f_0\) mapping any input word \(u\in \varSigma ^*\) to the word \(u\overline{u}\), where \(\overline{u}\) is the mirror image of the word u. Indeed, when the input word u has been read by the automaton, the variable X contains the word u, while the variable Y contains the word \(\overline{u}\). Hence, the final output, defined as XY, is equal to the concatenation \(u\overline{u}\). It is worth noting that this SST is copyless.

  • The SST \(T_1\) depicted on the right realizes the function \(f_1\) mapping any input word \(u = a^n\), with \(n \ge 1\), to the output word \(a^{2^n}\). This SST is copyful.

Fig. 1.
figure 1

Two streaming string transducers \(T_0\) (left) and \(T_1\) (right).

One of the most important and fundamental problem in the theory of transducers is the equivalence problem, which asks, given two transducers, whether they realize the same transduction. This problem is well known to be decidable for rational functions, and more generally for MSO-definable transductions [14], and hence for copyless SST (in PSpace, as shown in [1]). The problem gets undecidable when the transducers define binary relations instead of functions: it is already undecidable for rational transducers [17], a strict subclass of non-deterministic SST. However, it was unknown whether, in the functional case, decidability still holds without the copyless restriction, as mentioned in [3], an extended version of [4].

HDT0L systems allow to define languages by means of morphism iteration: a sequence of indices \(i_1,\ldots ,i_k\) induces a composition of morphisms (one morphism for each index \(i_j\)) which, applied on an initial and fixed word, produces a new word. An important result related to HDT0L systems has been obtained in the 80’s, see [9]. It states that the equivalence of finite-valued transducers over HDT0L languages is decidable, with unknown complexity.

In this paper, we build a tight connection between HDT0L systems and streaming string transducers. To this end, we introduce a new semantics of HDT0L systems, viewed as transducers. This allows us to prove that (total) copyful SST and HDT0L systems (seen as transducers) have the same expressive power, with back and forth transformations of linear complexity. As a corollary of this result, we obtain that the equivalence problem of copyful SST and the equivalence problem of HDT0L systems are inter-reducible, in linear time. This result has two consequences:

  • first, the decidability of SST equivalence directly follows from [9],

  • second, the functionality problem for non-deterministic (copyful) SST is decidable.

Note that the decidability of SST equivalence also follows from the two recent results [8, 21].

Last, we study the following subclass definability problem: given a (copyful) SST, does there exist an equivalent copyless SST? We show that this problem is decidable in polynomial time, using a reduction to a boundedness problem for products of matrices studied by Mandel and Simon [20], and show that when possible, we can build an equivalent copyless SST.

Organization of the paper. We introduce the models of streaming string transducers and HDT0L systems in Sect. 2. In Sect. 3, we prove that these two models are equi-expressive. We apply this result to prove the decidability of the equivalence of copyful SST and of the functionality of non-deterministic SST in Sect. 4. Last, in Sect. 5, we study the subclass definability problem for copyless SST.

2 Preliminaries

For all finite alphabets \(\varSigma \), we denote by \(\varSigma ^*\) the set of finite words over \(\varSigma \), and by \(\epsilon \) the empty word. Given two alphabets \(\varSigma \) and \(\varGamma \), a transduction R from \(\varSigma ^*\) to \(\varGamma ^*\) is a subset of \(\varSigma ^*\times \varGamma ^*\). It is functional if it defines a function, i.e. for all \(w\in \varSigma ^*\), there exists at most one \(v\in \varGamma ^*\) such that \((w,v)\in R\). The domain of R, denoted by \(\textit{Dom}(R)\), is the set \(\textit{Dom}(R) = \{ w\in \varSigma ^*\mid \exists v\in \varGamma ^*, (w,v)\in R\}\). A transduction is total if \(\textit{Dom}(R) = \varSigma ^*\). Given two finite alphabets \(\varSigma ,\varGamma \), a morphim from \(\varSigma ^*\) to \(\varGamma ^*\) is a mapping \(h:\varSigma ^* \rightarrow \varGamma ^*\) such that \(h(uv)=h(u)h(v)\) for any two words \(u,v\in \varSigma ^*\).

2.1 Streaming String Transducers

Let \(\mathcal {X}\) be a finite set of variables denoted by \(X,Y,\dots \) and \(\varGamma \) be a finite alphabet. A substitution s is defined as a mapping \({s : \mathcal {X}\rightarrow (\varGamma \cup \mathcal {X})^*}\). Let \(\mathcal {S}_{\mathcal {X}, \varGamma }\) be the set of all substitutions. Any substitution s can be extended to \(\hat{s}: (\varGamma \cup \mathcal {X})^* \rightarrow (\varGamma \cup \mathcal {X})^*\) in a straightforward manner. The composition \(s_1\circ s_2\) (or \(s_1s_2\) for short) of two substitutions \(s_1,s_2\in \mathcal {S}_{\mathcal {X}, \varGamma }\) is defined as the standard function composition \(\hat{s_1} \circ s_2\), i.e. \((s_1s_2)(X)=(\hat{s_1}s_2)(X) = \hat{s_1}(s_2(X))\) for all \(X \in \mathcal {X}\).

Definition 1

A non-deterministic streaming string transducer (NSST for short) is a tuple \(T = (\varSigma , \varGamma , Q, Q_0, Q_f, \varDelta , \mathcal {X}, \rho , s_0,s_f)\) where:

  • \(\varSigma \) and \(\varGamma \) are finite alphabets of input and output symbols,

  • Q is a finite set of states,

  • \(Q_0\subseteq Q\) is a set of initial states,

  • \(Q_f\subseteq Q\) is a set of final states,

  • \(\varDelta \subseteq Q \times \varSigma \times Q\) is a transition relation,

  • \(\mathcal {X}\) is a finite set of variables,

  • \(\rho : \varDelta \rightarrow 2^{\mathcal {S}_{\mathcal {X}, \varGamma }}\) is a variable update function such that \(\rho (t)\) is finite, for all \(t\in \varDelta \),

  • \(s_0 : \mathcal {X}\rightarrow \varGamma ^*\) is the initial function that gives the initial content of the variables,

  • \(s_f: Q_f \rightarrow (\mathcal {X}\cup \varGamma )^*\) is the final output function, which gives what is output for each final state.

The concept of a run of an NSST is defined in an analogous manner to that of a finite state automaton: it is a finite sequence \(r\in (Q\varSigma )^*Q\), denoted \(r = q_0 \xrightarrow {\sigma _1} q_1 \xrightarrow {\sigma _2} q_2 \ldots q_{n-1}\xrightarrow {\sigma _n} q_n\), such that \((q_i,\sigma _{i+1},q_{i+1})\in \varDelta \) for all \(0\le i<n\). It is accepting if \(q_0\in Q_0\) and \(q_n\in Q_f\). A sequence of substitutions \(\overline{s} = \langle s_i \rangle _{1 \le i \le n}\) in \(\mathcal {S}_{\mathcal {X}, \varGamma }\) is compatible with r if for all \(1\le i\le n\), \(s_i \in \rho (q_{i-1}, \sigma _{i},q_i)\).

If r is accepting, the output of r, denoted by \(\textit{Out}(r)\subseteq \varGamma ^*\), is defined as

$$ \textit{Out}(r)=\{s_0s_1\dots s_ns_f(q_n)\ |\ \langle s_{i} \rangle _{1 \le i \le n}\in (\mathcal {S}_{\mathcal {X}, \varGamma })^*\text { is compatible with } r\}. $$

For all words \(w\in \varSigma ^*\), the output of w by T, denoted by T(w), is \(T(w) = \{\textit{Out}(r)\ |\ r\text { is an accepting run on } w\}\). The domain of T, denoted by \(\textit{Dom}(T)\), is defined as the set of words w such that \(T(w)\ne \emptyset \). The transduction \(\llbracket T \rrbracket \) defined by T is the relation from \(\varSigma ^*\) to \(\varGamma ^*\) given by the set of pairs (wv) such that \(v\in T(w)\).

Deterministic and functional SST. A deterministic SST (SST for short) is an NSST such that \(|Q_0|=1\) and for all \(p\in Q,\sigma \in \varSigma \), there exists at most one \(q\in Q\) such that \((p,\sigma ,q)\in \varDelta \), and such that for all \(t\in \varDelta \), \(|\rho (t)|=1\). When an SST is deterministic, we identify \(Q_0\) with \(q_0\), and given \(t\in \varDelta \), we write \(\rho (t) = s\) instead of \(\rho (t)=\{s\}\).

In the following, the streaming string transducers we consider are deterministic, unless they are explicitly stated to be non deterministic.

In [2, 6], the variable updates are required to be copyless, i.e. for every variable \(X\in \mathcal {X}\), and for every transition \(t\in \varDelta \), X occurs at most once in \(\rho (t)(X_1),\ldots ,\rho (t)(X_n)\) where \(\{X_1,\ldots ,X_n\}=\mathcal {X}\). One of the main result of [2] is to show that this restriction, as well as determinism, allows one to capture exactly the class of MSO-definable transductions.

It is worth noting that any SST, since it is deterministic, defines a functional transduction. More generally, we say that an NSST T is functional if \(\llbracket T \rrbracket \) is functional. It is known that functional NSST with copyless update are no more expressive than (deterministic) SST with copyless update [6]. We show a similar result for (copyful) SST:

Proposition 1

Functional NSST and SST are equi-expressive.

Proof

Let \(T=(\varSigma , \varGamma , Q, Q_0, Q_f, \varDelta , \mathcal {X}, \rho , s_0,s_f)\) be a functional NSST. Without loss of generality, we assume that T is trim, i.e. every state of T is reachable from some initial state, and co-reachable from an accepting state. Any SST can be made trim by filtering out the states that do not have this property (which is decidable in PTime).

The main idea is to realize a subset construction on T (a similar construction was given in [7]). On states, the subset construction is just as the subset construction for NFA. On variables, one needs to duplicate each variable as many times as the number of states. The invariant property is the following: after reading a word w, if there exists a run \(\rho \) of T on w leading to q, then for all \(X\in \mathcal {X}\) such that there exists an accepting continuation \(w'\) of w (i.e. \(ww'\in \textit{Dom}(T)\)) whose output uses the content of X after \(\rho \), then \(X_q\) and X have the same content after reading w. There might be several runs of T leading to q, but since T is trim and functional, then content of F does not depend on the chosen run. Hence the invariant is well-defined.

Formally, we define an equivalent SST \(T'=(\varSigma ,\varGamma ,Q', q'_0, Q'_f, \varDelta ', \mathcal {X}', \rho ', s'_0,s'_f)\) such that \((\varSigma ,\varGamma ,Q',q'_0,Q'_f,\varDelta ')\) is the DFA resulting from the classical subset construction (in particular \(Q'=2^Q\)) and such that:

  • \(\mathcal {X}' = \mathcal {X}\times Q\) (each variable is denoted by \(X_q\))

  • \(\forall t'=(Q_1,\sigma ,Q_2) \in \varDelta ', \forall q_2 \in Q_2, \forall X\in \mathcal {X}\), \(\rho '(t')(X_{q_2}) = \text {rename}_{q_1}(\rho (t)(X))\) for some \(q_1\in Q_1\) such that \(t=(q_1,\sigma ,q_2) \in \varDelta \), where \(\text {rename}_{q_1}\) is the identity morphism on \(\varSigma \) and replaces any \(Y\in \mathcal {X}\) by \(Y_{q_1}\). As explained before, the functionality of T entails that the choice of \(q_1\) is not important (a different choice would give the same value to \(X_{q_2}\)). This choice can be made canonical by using some order on the states of T.

  • \(\forall P\in Q'_f\), \(\rho '(P)= \text {rename}_q(\rho (q))\) for some \(q\in P\cap Q_f\). Once again, by functionality of T, the choice of q does not matter and can be made canonical.   \(\square \)

We say that a SST T is total if \(\llbracket T \rrbracket \) is total. We also show that regarding the equivalence problem, considering total SST is harmless, as one can modify a SST in linear time in order to make it total.

Proposition 2

Given two SST \(T,T'\), one can build in linear time two total SST \(T_{tot}\) and \(T_{tot}'\) such that \(\llbracket T \rrbracket =\llbracket T' \rrbracket \) iff \(\llbracket T_{tot} \rrbracket =\llbracket T_{tot}' \rrbracket \).

Proof

Indeed, let \(\#\not \in \varGamma \). Any (partial) SST T can be transformed into an SST \(T_{tot}\) that defines the following transduction: \(\llbracket T_{tot} \rrbracket (u) = \llbracket T \rrbracket (u)\) if \(u\in \textit{Dom}(T)\), and \(\llbracket T_{tot} \rrbracket (u) = \#\) otherwise. This is achieved using a new variable \(X_\#\) whose content is always \(\#\), and by completing the rules of T by adding an accepting sink state \(q_{sink}\). We also modify final states and the final output function: states that were not final are declared final, and the final output function associates with these states the variable \(X_\#\).    \(\square \)

2.2 HDT0L Systems

Lindenmayer introduced in the sixties a formal grammar in order to model the developement process of some biological systems [19]. We consider here a particular class of these systems, called HDT0L systems (HDT0L stands for Deterministic 0-context Lindenmayer systems with Tables and with an additional Homomorphism).

Definition 2

(HDT0L System). An HDT0L system over \(\varSigma \) and \(\varGamma \) is defined as a tuple \(H=(\varSigma ,A,\varGamma ,v,h,(h_\sigma )_{\sigma \in \varSigma })\) where:

  • \(\varSigma \), A and \(\varGamma \) are finite alphabets,

  • \(v\in A^*\) is the initial word,

  • h is a morphism from \(A^*\) to \(\varGamma ^*\),

  • for each \(\sigma \in \varSigma \), \(h_\sigma \) is a morphism from \(A^*\) to \(A^*\).

The equivalence problem for HDT0L systems asks, given two such systems \(H=(\varSigma ,A,\varGamma ,v,h,(h_\sigma )_{\sigma \in \varSigma })\) and \(G=(\varSigma ,A,\varGamma ,w,g,(g_\sigma )_{\sigma \in \varSigma })\), whether, for every \(\sigma _1 \ldots \sigma _k\in \varSigma ^*\), we have \(h(h_{\sigma _1}\ldots h_{\sigma _k}(v)) = g(g_{\sigma _1}\ldots g_{\sigma _k}(w))\). This problem is known to be decidable [9], with unknown complexity. The original proof of [9] is based on Ehrenfeucht’s conjecture and Makanin’s algorithm. Honkala provided a simpler proof in [18], based on Hilbert’s Basis Theorem.

In order to transfer this decidability result to SST, we introduce a semantics of HDT0L systems in terms of transductions.

Definition 3

(Transduction realized by an HDT0L system). Let \(H=(\varSigma ,A,\varGamma ,v,h,(h_\sigma )_{\sigma \in \varSigma })\) be an HDT0L system. We define \(\llbracket H \rrbracket \) as a (total) transduction from \(\varSigma ^*\) to \(\varGamma ^*\) defined by \(\llbracket H \rrbracket (\sigma _1 \ldots \sigma _k) = h(h_{\sigma _1}\ldots h_{\sigma _k}(v))\).

Example 1

Let us consider the function \(f_0\) introduced in the introduction. We define an HDT0L \(H_0 = (\varSigma ,A,\varSigma ,v_0,h,(h_\sigma )_{\sigma \in \varSigma })\) such that \(\llbracket H_0 \rrbracket = f_0\), with \(A = \{\$_1,\$_2,a,b\}\), \(\varSigma = \{a,b\}\), \(v_0 = \$_1\$_2\), and the morphisms are defined as follows:

$$ \begin{array}{cccccccccccccccccccccccccc} h &{} : &{} a &{} \rightarrow &{} a &{}\qquad h_a &{} : &{} a &{} \rightarrow &{} a &{} \qquad h_b &{} : &{} a &{} \rightarrow a \\ &{} &{} b &{} \rightarrow &{} b &{} &{} &{} b &{} \rightarrow &{} b &{} &{} &{} b &{} \rightarrow b \\ &{} &{} \$_1 &{} \rightarrow &{} \epsilon &{} &{} &{} \$_1 &{} \rightarrow &{} \$_1a &{} &{} &{} \$_1 &{} \rightarrow &{} \$_1b\\ &{} &{} \$_2 &{} \rightarrow &{} \epsilon &{} &{} &{} \$_2 &{} \rightarrow &{} a\$_2 &{} &{} &{} \$_2 &{} \rightarrow &{} b\$_2\\ \end{array} $$

For instance, we have the following derivation:

$$ \begin{array}{llllllll} \llbracket H_0 \rrbracket (abb) &{} = &{} hh_{abb}(\$_1\$_2) = hh_{ab} \circ h_b(\$_1\$_2) = hh_{ab} (\$_1bb\$_2) = hh_{a} (\$_1bbbb\$_2) \\ &{} = &{} h (\$_1abbbba\$_2) = abbbba \end{array} $$

We can now rephrase the result of [9] as follows:

Theorem 1

[9]. Given two HDT0L systems \(H_1,H_2\) over \(\varSigma \) and \(\varGamma \), it is decidable whether \(\llbracket H_1 \rrbracket = \llbracket H_2 \rrbracket \).

In the next section, we show that HDT0L systems and SST define the same class of transductions.

3 SST and HDT0L Systems Are Equi-Expressive

Let \(\varSigma \) and \(\varGamma \) two alphabets. In this section, we always consider that SST and HDT0L systems are over \(\varSigma \) and \(\varGamma \). We prove the following theorem:

Theorem 2

HDT0L systems over \(\varSigma \) and \(\varGamma \) and total SST define the same class of transductions. Moreover, the constructions are effective in both directions, in linear-time.

A direct consequence of this result is:

Corollary 1

The equivalence problems for HDT0L systems and for (copyful) streaming string transducers are inter-reducible in linear time.

We prove successively the two directions of Theorem 2.

Lemma 1

For all HDT0L systems H, there exists an equivalent (total) SST T with only one state.

Proof

Let \(H = (\varSigma ,A,\varGamma ,v,h,(h_\sigma )_{\sigma \in \varSigma })\) be an HDT0L system. We construct a total SST T over \(\varSigma \) and \(\varGamma \) such that \(\llbracket T \rrbracket = \llbracket H \rrbracket \). The SST has one state q, both initial and accepting. Its set of variables is the set \(\mathcal {X}= \{ X_a\ |\ a\in A\}\). Its transitions are defined by \(q\xrightarrow {\sigma }q\) for all \(\sigma \in \varSigma \).

To define the update functions, we first introduce the morphism \(\text {rename}_X : A^*\rightarrow \mathcal {X}^*\) defined for all \(a\in A\) by \(\text {rename}_X(a) = X_a\). Then, the update function \(\rho \) is defined, for all \(\sigma \in \varSigma \) and \(a\in A\) by \(\rho ((q,\sigma ,q))(X_a) = \text {rename}_X(h_\sigma (a))\).

Finally, the initial function is defined by \(s_0(X_a) = h(a)\) for all \(a\in A\), and the final function \(s_f\) by \(s_f(q) = \text {rename}_X(v)\).    \(\square \)

Example 2

For the HDT0L system \(H_0\) of Example 1, we obtain the SST \(T_2\) depicted in Fig. 2.

Fig. 2.
figure 2

A SST \(T_2\).

We now prove the converse.

Lemma 2

For all total SST T, there exists an equivalent HDT0L system H.

Proof

Let \(T = (\varSigma , \varGamma , Q, q_0, Q_f, \varDelta , \mathcal {X}, \rho , s_0,s_f)\) be a total SST (remember that by default an SST is deterministic). We define an equivalent HDT0L H as follows.

We consider the finite alphabet \(A = \{\alpha _q \mid \alpha \in \varGamma \cup \mathcal {X}, q \in Q\}\). For every \(q\in Q\), we consider the morphism \(\text {subscript}_q : (\varGamma \cup \mathcal {X})^*\rightarrow A^*\) defined for all \(\alpha \in \varGamma \cup \mathcal {X}\) by \(\text {subscript}_q(\alpha ) = \alpha _q\).

As T is total, we have that \(Q_f=Q\). We consider an enumeration \(q_1,\ldots ,q_n\) of Q. We define the initial word v as follows:

$$v= \text {subscript}_{q_1}(s_f(q_1)) \ldots \text {subscript}_{q_n}(s_f(q_n))$$

We define the morphim \(h:A^* \rightarrow \varGamma ^*\) as follows:

$$ \begin{array}{lllll} h: &{} \gamma _{q_0} &{} \rightarrow &{} \gamma &{} \text { with }\gamma \in \varGamma \\ &{} X_{q_0} &{} \rightarrow &{} s_0(X) &{} \text { with } X \in \mathcal {X}\\ &{} \alpha _q &{} \rightarrow &{} \epsilon &{} \text { with } q\ne q_0 \text { and }\alpha \in \varGamma \cup \mathcal {X}\end{array} $$

Last, given \(\sigma \in \varSigma \) we define the morphism \(h_\sigma :A^* \rightarrow A^*\) as follows. Given a state q, we define the set \(Pre_q^\sigma \subseteq Q\) as the set of states p such that \((p,\sigma ,q)\in \varDelta \).

We define: (by convention, the product over the empty set gives the empty word)

$$ \begin{array}{llll} \forall \gamma \in \varGamma , &{} h_\sigma (\gamma _q) &{}=&{} \varPi _{p\in Pre_q^\sigma }\text {subscript}_{p}(\gamma )\\ \forall X\in \mathcal {X}, &{} h_\sigma (X_q) &{}=&{} \varPi _{p\in Pre_q^\sigma }\text {subscript}_{p}(\rho (p,\sigma ,q)(X)) \end{array} $$

Intuitively, the HDT0L system simulates the computations of the SST in a backward manner, starting from the final states. These computations are encoded using the labelling of symbols by states. One can easily prove by induction on the length of some input word w that after reading w, for every state q, the projection of \(h_w(v)\) on the subalphabet \(\text {subscript}_{q}(\varGamma \cup \mathcal {X})\) encodes the run of the SST on w starting in state q, which is unique since T is deterministic. The morphism h then simply erases parts of the computations that did not reach the initial state \(q_0\).    \(\square \)

We point out the following result, which follows from Lemmas 1 and 2:

Corollary 2

For all SST T, one can construct in polynomial time an equivalent SST \(T'\) such that the underlying input DFA of \(T'\) is the minimal complete DFA recognizing \(\textit{Dom}(T)\).

Proof

If T is total, then the result is a direct consequence of the successive application of Lemmas 2 and 1. Note that in this case, \(T'\) has only one state.

If T is not total, we make it total as in the proof of Proposition 2, and obtain a total SST S which can be converted into a single state SST \(S'\). Then, we minimize the underlying input DFA of T (which recognizes \(\textit{Dom}(T)\)) into a minimal complete DFA \(A_{min}\). Finally, \(T'\) is defined as a kind of product of \(S'\) and \(A_{min}\): if \(A_{min} = (\varSigma ,P,p_0,P_f,\delta )\) and \(S' = (\varSigma ,\varGamma ,\{q\},q,\{q\},\{(q,\sigma ,q) \mid \sigma \in \varSigma \},\mathcal {X},\rho ,s_0,s_f)\), then we let \(T'=(\varSigma ,\varGamma ,P,p_0,P_f,\delta ,\mathcal {X},\rho ',s_0,s'_f)\) where:

  • \(\rho '(p,\sigma ,p')=\rho (q,\sigma ,q)\) for all \((p,\sigma ,p')\in \delta \),

  • \(s'_f(p_f) = s_f(q)\) for all \(p_f\in P_f\).    \(\square \)

Remark 1

By this corollary, any total SST is equivalent to some single state SST.

4 Applications: SST Equivalence and Functionality of NSST

Based on the correspondence between SST and HDT0L systems, and the fact that the HDT0L system equivalence problem is decidable, we show that that the SST equivalence and functionality problems are decidable.

Theorem 3

  1. 1.

    Given two SST T and \(T'\), it is decidable whether they are equivalent, i.e. \(\llbracket T \rrbracket = \llbracket T' \rrbracket \).

  2. 2.

    Given an NSST T, it is decidable whether T is functional.

Proof

The first statement is straightforward by Theorems 1, 2 and Proposition 2.

To prove the second statement, we reduce the functionality problem to the equivalence of two (deterministic) SST \(T_1\) and \(T_2\). Let \(T = (\varSigma , \varGamma , Q, Q_0, Q_f, \varDelta , \mathcal {X},\) \( \rho , s_0, s_f)\) be an NSST. We extend the alphabet \(\varSigma \) with pairs of rules of T as follows: \(\varSigma ' = \varSigma \times \varDelta ^2\). Now, \(T_1\) and \(T_2\) are defined as the square of T: they run on words \(w'\) over \(\varSigma '\), and make sure that the sequence of transitions are valid runs of T on the \(\varSigma \)-projections of \(w'\). In addition, \(T_i\) simulates T on the \((i+1)\)-th component, for all \(i=1,2\), by following the transitions defined on the input letters. Clearly, \(T_1\) and \(T_2\) have the same domain, are deterministic, and are equivalent iff all pairs of accepting runs of T on the same input word produce the same output, i.e., iff T is functional. The conclusion follows from statement 1.    \(\square \)

5 Deciding the Subclass of Copyless SST

The subclass of copyless SST is of great interest, as it exactly corresponds to the class of regular functions which enjoys multiple characterizations (MSO transductions and deterministic two-way transducers for instance) and has been widely studied in the literature (see for instance [16] for a survey). Given a copyful SST, it is thus a natural question whether there exists an equivalent copyless SST and if so, whether one can actually compute such an equivalent machine. In this section, we answer these questions positively:

Theorem 4

Given an SST T, it is decidable in PTime whether there exists an equivalent copyless SST. If this is the case, one can build an equivalent copyless SST.

It is well-known that copyless SST define transductions f that are linear-size increase (LSI for short), i.e. there exists some constant M such that for every w, we have \(|f(w)|\le M |w|\) (this can be observed for instance using the MSOT presentation). We will use this semantical condition in order to solve the above problem.

Proof

Let \(T = (\varSigma , \varGamma , Q, q_0, Q_f, \varDelta , \mathcal {X}, \rho , s_0,s_f)\) be an SST. Consider a run \(q_0\xrightarrow {\sigma _1}q_1 \ldots q_{n-1}\xrightarrow {\sigma _n}q_n = p\) of T starting in the initial state. By definition of the semantics of T, after this run, one can associate, in state p, with variable X the content \(\nu (X)\in \varGamma ^*\) defined as \(s_0s_1\ldots s_n(X)\) where \(s_i=\rho (q_{i-1},\sigma _i,q_i)\). We use the notation \(q_0 \xrightarrow {\sigma _1\ldots \sigma _n} (p,\nu )\) to describe this fact, and may remove the label \(\sigma _1\ldots \sigma _n\) when it is useless.

We also say that a pair \((p,X)\in Q\times \mathcal {X}\) is co-accessible whenever there exists a run starting in state p reaching a final state \(q_f\) such that the final output \(s_f(q_f)\) involves a variable whose content depends on the content of X at the beginning of the run. In other words, the content of X at configuration (pX) flows into \(s_f(q_f)\), i.e. X is “useful” for \(s_f(q_f)\). This intuitive notion of variable flow is formally defined in [15].

We introduce the following objects:

  • we let \(val(p,X)=\{\nu (X) \in \varGamma ^* \mid \text { there exists a run } q_0 \rightarrow (p,\nu )\}\)

  • we let \(INF=\{(p,X)\mid (p,X) \text { is co-accessible and }val(p,X) \text { is infinite} \}\)

  • we define \((p,X) \xrightarrow {u\mid n} (q,Y)\) if there exists a run from p to q on word u on which X flows n times in Y. We may omit u when it is useless.

Claim: \(\llbracket T \rrbracket \) is definable by a copyless SST iff there exists \(K\in \mathbb {N}\) such that for all \((p,X) \xrightarrow {n} (q,Y)\) with \((p,X),(q,Y)\in INF\), we have \(n\le K\).

Before proving the claim, we show that it implies decidability. First, the set INF is computable in polynomial time (fixpoint computation in the set of pairs \((p,X)\in Q\times \mathcal {X}\)). Second, we define the following set \(\mathcal M\) of square matrices indexed by elements of INF with coefficients in \(\mathbb {N}\cup \{\bot \}\). We suppose that \(\bot \) behaves as 0: for every integer \(n\in \mathbb {N}\), we have \(n.\bot =\bot .n=\bot \) and \(n+\bot =\bot +n=n\). The set \(\mathcal M\) is defined as the set of finite products of matrices \(\{M_a \mid a\in \varSigma \}\), where, for each letter \(a\in \varSigma \), we define matrix \(M_a\) by:

$$ M_a[(p,X),(q,Y)]= \left\{ \begin{array}{lll} n &{} \text {if} &{}(p,a,q) \in \varDelta \text { and }|\rho (p,a,q)(Y)|_X=n\\ \bot &{} \text {if} &{} (p,a,q) \not \in \varDelta \end{array} \right. $$

We then observe that the right property of the claim is satisfied iff the set \(\mathcal M\) is finite. By Mandel and Simon [20], this last property is decidable in polynomial time.

We turn to the proof of the claim:

  • \(\Rightarrow \) We prove the contraposition, by showing that if the right property of the claim is not satisfied, then \(\llbracket T \rrbracket \) is not LSI, which implies that \(\llbracket T \rrbracket \) is not definable by a copyless SST as copyless SST are LSI. We thus assume that the set of flow matrices defined previously is not bounded. By the characterization proven in Mandel and Simon of this property, two cases may occur:

    1. 1.

      there exists \((p,X)\in INF\) such that \((p,X) \xrightarrow {n} (p,X)\) with \(n\ge 2\),

    2. 2.

      there exist \((p,X),(q,Y)\in INF\) such that \((p,X) \xrightarrow {u\mid n_1} (p,X)\), \((p,X) \xrightarrow {u\mid n_2} (q,Y)\) and \((q,Y) \xrightarrow {u\mid n_3} (q,Y)\), with \(n_1,n_2,n_3\ge 1\), for some word u.

    In the first case, using the fact that \((p,X)\in INF\), one can prove that \(\llbracket T \rrbracket \) is not LSI. In the second case, for every \(n\ge 1\), one has \((p,X)\xrightarrow {u^n \mid m} (q,Y)\) for some \(m\ge n\). Again, one can use this property to show that \(\llbracket T \rrbracket \) is not LSI.

  • \(\Leftarrow \) The constraint expressed by the right property of the claim precisely states that, with respect to pairs \((p,X)\in INF\), the SST is bounded copy. One can then easily remove variables (pX) that do not belong to INF, so as to obtain a bounded copy SST, and it is known that every bounded copy SST can be turned into an equivalent copyless SST [7, 11].    \(\square \)

6 Conclusion

Our results establish a bridge between the theory of SST and the theory of systems of iterated morphisms. It allows to solve an interesting open problem for copyful streaming string transducers, namely the decidability of the equivalence problem. We have also proven the decidability of functionality for non-deterministic SST, and that of the subclass of copyless SST, using a reduction to a boundedness problem.

We hope that these positive decidability results will pave the way to a further study of the class of copyful SST. As future work, we want to investigate what the theory of iterated morphisms can bring to the theory of SST, and conversely, in terms of tight complexity results. For instance, the class of copyless SST, for which equivalence is PSpace-complete, could have an interesting interpretation in terms of HDT0L systems.