1 Introduction

Linear extended top-down tree transducers (l-xt) were introduced (under a different name) and investigated already in [1]. We present them in the framework of synchronous grammars [3] since in syntax-based statistical machine translation these transducers are applied, and since we utilize some results of [6, 14]. An l-xt M has a finite set of states and finitely many rules of the form \(\langle \ell , q, r\rangle \), where q is a state and the left- and right-hand side \(\ell \) and r are trees, which may also contain state-labeled leaves such that each state in r also occurs in \(\ell \). Linearity requires that each state occurs at most once both in \(\ell \) and in r. In particular, in \(\varepsilon \)-rules the left-hand side \(\ell \) and in non-strict rules the right-hand side r is just a state. The semantics of M is defined by means of synchronous rewriting using the derivation relation \(\Rightarrow \). It is defined over sentential forms, which are triples \((\xi , L, \zeta )\) consisting of trees \(\xi \) and \(\zeta \) with state-labeled leaves and a set L of links. A link is a pair (uv) of positions pointing to occurrences of the same state in the trees \(\xi \) and \(\zeta \), respectively. A rule \(\langle \ell , q, r\rangle \) can be applied to a sentential form \((\xi , L, \zeta )\) if there is a link \((u, v) \in L\) such that u and v point to an occurrence of the state q. In this case we write \((\xi , L, \zeta ) \Rightarrow (\xi ', L', \zeta ')\), where the sentential form \((\xi ', L', \zeta ')\) is obtained by replacing the linked occurrences of q in \(\xi \) and \(\zeta \) by \(\ell \) and r, respectively. In addition, L is updated to include links induced by occurrences of the same state in \(\ell \) and r. The initial sentential form is \((q_0, \{(\varepsilon , \varepsilon )\}, q_0)\), in which \(q_0\) is the initial state of M, and we apply derivation steps until no occurrences of linked states remain. Any remaining (unlinked) state occurrence in the input tree t can then be replaced by an arbitrary tree. An instance of t is obtained by replacing all state occurrences and I(t) is the set of all instances of t. The tree transformation induced by M consists of all pairs \((t', u)\) such that \((q_0, \{(\varepsilon ,\varepsilon )\}, q_0) \Rightarrow ^* (t, \emptyset , u)\) and \(t' \in I(t)\). In order to increase their expressive power, l-xt can be equipped with regular look-ahead [4], which restricts the instances I(t) such that an unlinked occurrence of a state q can only be replaced by an element of a given regular tree language c(q). We abbreviate ‘l-xt with regular look-ahead’ by l-xt\(^\mathrm{R}\).

Weighted l-xt\(^\mathrm{R}\), abbreviated by l-wxt\(^\mathrm{R}\), are able to express quantitative properties of the tree transformations [7, 13, 14, 16]. In an l-wxt\(^\mathrm{R}\) a weight from a semiring \({\mathbb {K}}\) is associated to each rule, and these rule weights are multiplied in a derivation. Provided that several derivations exist, these derivation weights are summed up. In this manner, an l-wxt\(^\mathrm{R}\,M\) assigns a weight \(\Vert M \Vert _{{\mathbb {K}}}(t, u) \in {\mathbb {K}}\) to each pair (tu) of trees. It turned out that both l-wxt and l-wxt\(^\mathrm{R}\) over the probabilistic semiring can serve as formal models of tree transformations which are used in syntax-based statistical machine translation [10, 12].

We focus on the composition closure of l-wxt and l-wxt\(^\mathrm{R}\) without \(\varepsilon \)-rules (l-wxt and l-wxt\(^\mathrm{R}\), respectively) and some of their restricted subclasses because compositions of weighted tree transformations induced by them can be defined in terms of finite sums. Our motivation is that complex systems are often specified in terms of compositions of simpler tree transformations [17], which are easier to develop, train, and maintain [12]. More precisely, let \({\mathcal {C}}\) be a class of weighted tree transformations (e.g. the class of all weighted tree transformations induced by l-wxt\(^\mathrm{R}\)). The composition hierarchy of \({\mathcal {C}}\) is \({\mathcal {C}} \subseteq {\mathcal {C}}^2 \subseteq {\mathcal {C}}^3 \subseteq \cdots {}\), where \({\mathcal {C}}^n\) denotes the n-fold composition of \({\mathcal {C}}\). It is either infinite (i.e., \({\mathcal {C}}^n \subsetneq {\mathcal {C}}^{n+1}\) for all n) or finite (i.e., \(\mathcal C^n = {\mathcal {C}}^{n+1}\) for some n). In the latter case, the minimal such n is interesting since all compositions can be reduced to this length.

The additional standard restrictions we consider are strictness (‘s’), which requires that the right-hand side r is not a single state, and nondeletion (‘n’), which means that each state in the left-hand side \(\ell \) occurs also in the right-hand side r, in both cases for each rule \(\langle \ell , q, r\rangle \) of the l-wxt\(^\mathrm{R}\). Thus, for instance sl-wxt\(^\mathrm{R}\) abbreviates the expression ‘strict l-wxt\(^\mathrm{R}\)’. The class of all weighted tree transformations induced by certain kind of l-wxt\(^\mathrm{R}\) is denoted by typesetter letters so for instance stands for the set of all weighted tree transformations computable by sl-wxt\(^\mathrm{R}\) over the semiring \({\mathbb {K}}\). We consider the composition hierarchies of the classes , which is also investigated in [15], and , , , and . As main result we show that the composition hierarchies of these classes collapse at levels 2, 2, 2, 3, and 4, respectively, for an arbitrary commutative semiring \({\mathbb {K}}\) (cf. Theorem 16). We achieve our results by lifting the results [1, Theorem 6.2] and [6, Theorems 26, 31, 33, 34], where it is shown that the corresponding hierarchies in the unweighted cases collapse at the same levels. To this end, we decompose an l-wxt\(^\mathrm{R}\) into a weighted relabeling that handles all weights and nondeterminism, and a Boolean functional unambiguous l-wxt\(^\mathrm{R}\) (cf. Lemma 12). Moreover, we show that we can compose any such relabeling to the right of any l-wxt\(^\mathrm{R}\) (cf. Lemma 13). These two constructions together will allow us to take all l-wxt\(^{\text{ R }}\) in a composition chain into a particularly simple normal form (cf. Theorem 14). After some additional technical tailoring, we can utilize the mentioned results [1, 6] and lift them to the corresponding weighted devices over any commutative semiring.

2 Preliminaries

We let \({\mathbb {N}}= \{0, 1, 2, \cdots \}\) and \([n] = \{i \in {\mathbb {N}}\mid 1 \le i \le n\}\) for every \(n \in {\mathbb {N}}\). For sets S and T, we let \(2^S = \{S' \mid S' \subseteq S\}\), and we identify \(S = \{s\}\) with the element s. Moreover, \(T^S = \{f \mid f :S \rightarrow T\}\), and \(\vert S \vert \) is the cardinality of S. For every \(R \subseteq S \times T\), we let \({{\,\mathrm{dom}\,}}(R) = \{s \mid \exists t \in T :(s, t) \in R\}\) and \({{\,\mathrm{range}\,}}(R) = \{t \mid \exists s \in S :(s, t) \in R\}\). The composition of R with \(R' \subseteq T \times U\) is \(R \mathbin ; R' = \{(s, u) \mid \exists t \in T :(s, t) \in R,\, (t, u) \in R'\}\). Given \(n \in {\mathbb {N}}\), we let \(S^n\) be the n-fold Cartesian product of S with itself and \(S^* = \bigcup _{n \in {\mathbb {N}}} S^n\) is the set of all words over S. The length \(\vert w \vert \) of \(w \in S^n\) is n. The empty word () is also denoted by \(\varepsilon \). The concatenation of \(v, w \in S^*\) is v.w or vw.

A ranked alphabet consists of a nonempty, finite set \(\varSigma \) and , which we omit whenever it is obvious. We let \(\varSigma ^{(n)} = \{\sigma \in \varSigma \mid {{\,\mathrm{rk}\,}}(\sigma ) = n\}\) for every \(n \in {\mathbb {N}}\). In the following, \(\varSigma \), \(\varDelta \), and \(\varGamma \) are arbitrary ranked alphabets. Let V be a set with \(V\,\cap \,\varSigma = \emptyset \). The set \(T_\varSigma (V)\) of \(\varSigma \)trees indexed by V is defined in the usual way, and we let \(T_\varSigma = T_\varSigma (\emptyset )\). If \(t \in T_\varSigma (V)\) is given as \(t = \sigma (t_{1}, \cdots , t_{n})\), then we often omit the obvious quantification that \(n \in {\mathbb {N}}\), \(\sigma \in \varSigma ^{(n)}\), and \(t_{1}, \cdots , t_{n} \in T_\varSigma (V)\). The map assigning positions is inductively defined for all \(t \in T_\varSigma (V)\) by \({{\,\mathrm{pos}\,}}(t) = \{\varepsilon \}\) if \(t \in V\) and \({{\,\mathrm{pos}\,}}(t) = \{\varepsilon \} \cup \{i.w \mid i \in [n],\, w \in {{\,\mathrm{pos}\,}}(t_i) \}\) if \(t = \sigma (t_{1}, \cdots , t_{n})\). The size \(\vert t \vert \) of a tree \(t \in T_\varSigma (V)\) is \(\vert {{\,\mathrm{pos}\,}}(t) \vert \). The label of t at \(w \in {{\,\mathrm{pos}\,}}(t)\) is t(w) and the subtree of t rooted at w is \(t|_w\). For every set \(D \subseteq \varSigma \cup V\) of labels, we let \({{\,\mathrm{pos}\,}}_D(t) = \{ w \in {{\,\mathrm{pos}\,}}(t) \mid t(w) \in D\}\). The tree t is linear (resp. nondeleting) in \(V' \subseteq V\) if \(\vert {{\,\mathrm{pos}\,}}_v(t) \vert \le 1\) (resp., \(\vert {{\,\mathrm{pos}\,}}_v(t) \vert \ge 1\)) for every \(v \in V'\). We let \(T^{\text {lin}}_\varSigma (V)\) be the set of all trees of \(T_\varSigma (V)\) that are linear in V. Moreover, \({{\,\mathrm{var}\,}}(t) = \{v \in V \mid {{\,\mathrm{pos}\,}}_v(t) \ne \emptyset \}\). We use the countably infinite set \(X = \{x_i \mid i \in {\mathbb {N}}\}\) and \(X_n = \{x_i \mid i \in [n]\}\) for every \(n \in {\mathbb {N}}\). For every \(n \in {\mathbb {N}}\), we define the set \(C_\varSigma (X_n) = \{t \in T^{\text {lin}}_\varSigma (X_n) \mid {{\,\mathrm{var}\,}}(t) = X_n\}\) of n-contexts and the set \(\widehat{C_\varSigma }(X_n) = \{t \in C_\varSigma (X_n) \mid \text {the order of variables in } t \text { is }x_{1}, \cdots , x_{n}\}\) of straight n-contexts. Let \(X' \subseteq X\) and \(\theta :X' \rightarrow T_\varSigma \). Each such mapping \(\theta \) extends to a mapping \((\cdot ) \theta :T_\varSigma (X) \rightarrow T_\varSigma (X)\) such that for all \(t \in T_\varSigma (X)\) we have \(t\theta = t\) if \(t \in X \setminus X'\), \(t\theta = \theta (t)\) if \(t \in X'\), and \(t\theta = \sigma (t_1\theta , \cdots , t_n\theta )\) if \(t = \sigma (t_{1}, \cdots , t_{n})\). Given \(t \in T_\varSigma (X)\) and \(t_{1}, \cdots , t_{n} \in T_\varSigma \), we write \(t[t_{1}, \cdots , t_{n}]\) for \(t\theta \), where \(\theta :X_n \rightarrow T_\varSigma \) is given by \(\theta (x_i) = t_i\) for every \(i \in [n]\). Moreover, for every \(t \in T_\varSigma \) and \(k \in {\mathbb {N}}\), let

$$\begin{aligned} {{\,\mathrm{decomp}\,}}(t)&= \bigcup _{n \in {\mathbb {N}}} \bigl \{ (c, t_{1}, \cdots , t_{n}) \in \widehat{C_\varSigma }(X_n) \times (T_\varSigma )^n \mid t = c[t_{1}, \cdots , t_{n}] \bigr \} \end{aligned}$$

and \({{\,\mathrm{subst}\,}}_k(t) = \bigl \{ (u, \theta ) \in T_\varSigma ^{\text {lin}}(X_k) \times (T_\varSigma )^{{{\,\mathrm{var}\,}}(u)} \mid t = u\theta \bigr \}\). Note that both \({{\,\mathrm{decomp}\,}}(t)\) and \({{\,\mathrm{subst}\,}}_k(t)\) are finite sets.

As weight structures we use commutative semirings [9, 11]. Formally, a commutative semiring is an algebraic structure such that and are commutative monoids, \(\cdot \) distributes over finite sums, and for all \(a \in K\) we have \(a \cdot 0 = 0\). Given a mapping \(f :S \rightarrow K\), it is Boolean if \({{\,\mathrm{range}\,}}(f) \subseteq \{0, 1\}\), and its support \({{\,\mathrm{supp}\,}}(f)\) is \({{\,\mathrm{supp}\,}}(f) = \{s \in S \mid f(s) \ne 0\}\). Any map \(L :T_\varSigma \rightarrow K\) is a weighted tree language.

We will often utilize the Boolean semiring \({\mathbb {B}} = (\{0,1\}, \vee , \wedge , 0, 1)\), which is used to model the unweighted case, and the semiring . For the rest of this contribution, let be a commutative semiring.

A weighted tree automaton is a tuple , in which Q is a finite set of states, and \(Q_0 \subseteq Q\) is a set of initial states, \(\varSigma \) is a ranked alphabet of input symbols, and is a weight assignment to transitions. It is Boolean if the weight assignment ‘’ is Boolean, and it is (bottom-up) deterministic if \(q = q'\) for all . The semantics \(\Vert A \Vert _{{\mathbb {K}}} :T_\varSigma \rightarrow K\) of A is \(\Vert A \Vert _{{\mathbb {K}}}(t) = \sum _{q \in Q_0} \Vert A \Vert _{{\mathbb {K}}}^q(t)\) for every \(t \in T_\varSigma \), where the map \(\Vert A \Vert _{{\mathbb {K}}}^q :T_\varSigma \rightarrow K\) is inductively defined for every \(q \in Q\) and tree \(t = \sigma (t_{1}, \cdots , t_{n})\) by \(\Vert A \Vert _{{\mathbb {K}}}^q(t) = \sum _{q_{1}, \cdots , q_{n} \in Q} {{\,\mathrm{wt}\,}}(q_{1}, \cdots , q_{n},\sigma , q) \cdot \prod _{i = 1}^n \Vert A \Vert _{{\mathbb {K}}}^{q_i}(t_i)\). A weighted tree language \(L :T_\varSigma \rightarrow K\) is (\({\mathbb {K}}\)-)regular if there exists a weighted tree automaton A such that \(L = \Vert A \Vert _{{\mathbb {K}}}\). The class of all such regular weighted tree languages is denoted by \({\texttt {REG}}_\varSigma ({{\mathbb {K}}})\). For a Boolean deterministic weighted tree automaton A the weighted tree languages \(\Vert A \Vert _{{\mathbb {K}}}\) and \(\Vert A \Vert _{{\mathbb {K}}}^q\) for all \(q \in Q\) are obviously Boolean. The regular weighted tree languages are closed under the weighted relabelings [7, Theorem 5.3], which we introduce next.

A weighted tree transformation is a mapping \(\tau :T_\varSigma (V) \times T_\varDelta (V) \rightarrow K\). The domain ‘\({{\,\mathrm{dom}\,}}(\tau )\)’ and range ‘\({{\,\mathrm{range}\,}}(\tau )\)’ for a weighted tree transformation \(\tau :T_\varSigma (V) \times T_\varDelta (V) \rightarrow K\) are simply defined by \({{\,\mathrm{dom}\,}}(\tau ) = {{\,\mathrm{dom}\,}}({{\,\mathrm{supp}\,}}(\tau ))\) and \({{\,\mathrm{range}\,}}(\tau ) = {{\,\mathrm{range}\,}}({{\,\mathrm{supp}\,}}(\tau ))\). The transformation \(\tau \) is functional (resp., finitary), if \(\{u \mid (t, u) \in {{\,\mathrm{supp}\,}}(\tau )\}\) contains at most one element (resp., is finite) for every \(t \in T_\varSigma (V)\). For a functional \(\tau \), its support \({{\,\mathrm{supp}\,}}(\tau )\) is a partial function.

A weighted relabeling is a mapping \(\nu :\varSigma \times \varDelta \rightarrow K\) such that \(\nu (\sigma , \delta ) = 0\) for all \(\sigma \in \varSigma \) and \(\delta \in \varDelta \) with \({{\,\mathrm{rk}\,}}(\sigma ) \ne {{\,\mathrm{rk}\,}}(\delta )\). Each weighted relabeling \(\nu :\varSigma \times \varDelta \rightarrow K\) extends to a finitary weighted tree transformation \({\overline{\nu }} :T_\varSigma (V) \times T_\varDelta (V) \rightarrow K\), which is given as follows: for all variables \(v, v' \in V\), trees \(t = \sigma (t_{1}, \cdots , t_{n}) \in T_\varSigma (V)\), and \(u = \delta (u_{1}, \cdots , u_{k}) \in T_\varDelta (V)\), we define \({\overline{\nu }}(v, u) = {\overline{\nu }}(t, v') = 0\) and

$$\begin{aligned} {\overline{\nu }}(v, v')&= {\left\{ \begin{array}{ll} 1, &{} \text {if } v = v'; \\ 0, &{} \text {otherwise,} \end{array}\right. } \qquad&{\overline{\nu }}(t, u)&= {\left\{ \begin{array}{ll} \nu (\sigma , \delta ) \cdot \prod _{i = 1}^{n} \overline{\nu }(t_i, u_i)\,, &{} \text {if } n = k; \\ 0, &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Note that the weighted tree transformation \({\overline{\nu }}\) is finitary. Since \({\overline{\nu }}\) and \(\nu \) coincide on \(\varSigma ^{(0)} \times \varDelta ^{(0)}\), we will not distinguish carefully between them and use just \(\nu \) for both. In fact, for all \(t \in T_\varSigma (V)\), \(u \in T_\varDelta (V)\), and each \((c', u_{1}, \cdots , u_{n}) \in {{\,\mathrm{decomp}\,}}(u)\)

$$\begin{aligned} \nu (t, u) = \sum _{(c, t_{1}, \cdots , t_{n}) \in {{\,\mathrm{decomp}\,}}(t)} \nu (c, c') \cdot \prod _{i = 1}^n \nu (t_i, u_i) . \end{aligned}$$
(1)

There is actually at most one decomposition of t in (1) that yields a non-zero weight for the sum (since the shapes of c and \(c'\) and similarly \(t_i\) and \(u_i\) need to coincide for all \(i \in [n]\)). The analogous property holds provided that a decomposition of t is given. The class of all weighted tree transformations induced by weighted relabelings is denoted by \({\texttt {WREL}}({{\mathbb {K}}})\).

Given a finitary weighted tree transformation \(\tau :T_\varSigma \times T_\varDelta \rightarrow K\) and a weighted tree language \(L :T_\varDelta \rightarrow K\), we can define the pre-image \(\tau ^{-1}(L) :T_\varSigma \rightarrow K\) of L via \(\tau \) for every \(t \in T_\varSigma \) by \(\bigl (\tau ^{-1}(L) \bigr )(t) = \sum _{u \in T_\varDelta } \tau (t, u) \cdot L(u)\). Given another weighted tree transformation \(\tau ' :T_\varDelta \times T_\varGamma \rightarrow K\), we define the composition \(\tau \mathbin ; \tau ' :T_\varSigma \times T_\varGamma \rightarrow K\) of \(\tau \) followed by \(\tau '\) for every \(t \in T_\varSigma \) and \(v \in T_\varGamma \) as \(\bigl (\tau \mathbin ; \tau '\bigr )(t, v) = \sum _{u \in T_\varDelta } \tau (t, u) \cdot \tau '(u, v)\). Given classes \({\mathcal {C}}, \mathcal C'\) of tree transformations, we let \({\mathcal {C}} \mathbin ; {\mathcal {C}}' = \{ \tau \mathbin ; \tau ' \mid \tau \in {\mathcal {C}},\, \tau ' \in {\mathcal {C}}'\}\). We also write \(\tau \mathbin ; L\) instead of \(\tau ^{-1}(L)\) for a weighted relabeling \(\tau \) and a weighted tree language \(L :T_\varDelta \rightarrow K\).

Theorem 1

([7, Theorem 5.1]). For every weighted relabeling \(\tau \in {\texttt {WREL}}({{\mathbb {K}}})\) of type \(\tau :T_\varSigma \times T_\varDelta \rightarrow K\) and regular weighted tree language \(L \in {\texttt {REG}}_\varDelta ({{\mathbb {K}}})\), the weighted tree language \(\tau ^{-1}(L)\) is again regular [i.e., \(\tau ^{-1}(L) \in {\texttt {REG}}_\varSigma ({{\mathbb {K}}})\)].

3 Transformational Model

A linear weighted extended top-down tree transducer with regular look-ahead (for short: l-wxt\(^{\text{ R }}\)) over \({\mathbb {K}}\) [14] is a tuple , in which

  • Q is a finite set of states, and \(q_0 \in Q\) is an initial state,

  • \(\varSigma \) and \(\varDelta \) are ranked alphabets of input and output symbols, respectively,

  • \(R \subseteq T^{\text {lin}}_\varSigma (Q) \times Q \times T^{\text {lin}}_\varDelta (Q)\) is a finite set of rules such that \({{\,\mathrm{var}\,}}(r) \subseteq {{\,\mathrm{var}\,}}(\ell )\) and \(\{\ell , r\} \not \subseteq Q\) for every rule \(\langle \ell , q, r \rangle \in R\),

  • is a weight assignment to the rules, and

  • \(c :Q \rightarrow {\texttt {REG}}_\varSigma ({{\mathbb {K}}})\) is a regular weighted look-ahead for each state.

To save parentheses, we will write \(c^q\) instead of c(q) for every state \(q \in Q\).

Next, we recall some common restrictions of the general model that have already been discussed in [7]. The l-wxt\({}^{\text{ R }}\,\,M\) is

  • \(\varepsilon \)-free (resp., strict), if \(\ell \notin Q\) (resp., \(r \notin Q\)) for every rule \(\langle \ell , q, r\rangle \in R\),

  • nondeleting, if \({{\,\mathrm{var}\,}}(\ell ) = {{\,\mathrm{var}\,}}(r)\) for every rule \(\langle \ell , q, r\rangle \in R\),

  • Boolean, if ‘’ and \(c^q\) are Boolean for every state \(q \in Q\),

  • an l-wxt (i.e., without look-ahead) if \(c^q(t) = 1\) for every state \(q \in Q\) and tree \(t \in T_\varSigma \), and

  • an l-wt\(^{\text{ R }}\) (i.e., non-extended) if \({{\,\mathrm{pos}\,}}_\varSigma (\ell ) = \{\varepsilon \}\) for every rule \(\langle \ell , q, r\rangle \in R\).

Next we recall the semantics of an l-, which is the weighted tree transformation \(\Vert M \Vert _{{\mathbb {K}}}^q :T_\varSigma \times T_\varDelta \rightarrow K\) defined inductively for every \(t \in T_\varSigma \) and \(u \in T_\varDelta \) by

(2)

Using our remarks about \({{\,\mathrm{decomp}\,}}(t)\) and \({{\,\mathrm{subst}\,}}_n(u)\), all sets used in the index of the sum are finite, so the sum has only finitely many summands. Since we have \(\langle \ell [q_{1}, \cdots , q_{n}], q, r[q_{1}, \cdots , q_{n}] \rangle \in R\), we know that \(\ell \notin Q\) or \(r \notin Q\). Consequently, \(\vert t_i \vert < \vert t \vert \) or \(\vert x_i\theta \vert < \vert u \vert \) for every \(i \in [n]\) with \(x_i \in {{\,\mathrm{var}\,}}(r)\), which proves that the recursion is well-founded. Besides the rule weight \({{\,\mathrm{wt}\,}}(\rho )\), we multiply the weights \(\Vert M \Vert _{{\mathbb {K}}}^{q_i}(t_i, x_i\theta )\) of the recursive processing of those subtrees \(t_i\) that are further processed. The subtrees that are not further processed contribute their look-ahead weight \(c^{q_i}(t_i)\). The semantics \(\Vert M \Vert _{{\mathbb {K}}} :T_\varSigma \times T_\varDelta \rightarrow K\) is then given for every \(t \in T_\varSigma \) and \(u \in T_\varDelta \) by \(\Vert M \Vert _{{\mathbb {K}}}(t, u) = \Vert M \Vert ^{q_0}_{{\mathbb {K}}}(t, u)\). For nondeleting l-wxt\({}^{\text{ R }}\) the rightmost product in (2) yields 1, hence nondeleting l-wxt\({}^{\text{ R }}\) and nondeleting l-wxt are equally expressive. An l-wxt\({}^{\text{ R }}\,\,M\) is functional if \(\Vert M \Vert _{{\mathbb {K}}}\) is functional. We note that each l-wxt\({}^{\text{ R }}\,\,M\) that is \(\varepsilon \)-free or functional induces a finitary \(\Vert M \Vert _{{\mathbb {K}}}\). Next we relate l-wxt\({}^{\text{ R }}\) over the semiring \({\mathbb {N}}\) and l-wxt\({}^{\text{ R }}\) over \({\mathbb {K}}\). For this, we recall that \({\mathbb {N}}\) is the initial commutative semiring, so there exists a unique homomorphism [9, 11] from \({\mathbb {N}}\) to \({\mathbb {K}}\), i.e., a mapping \(h :{\mathbb {N}}\rightarrow K\) with \(h(0) = 0\), \(h(1) = 1\), \(h(n + n') = h(n) + h(n')\), and \(h(n \cdot n') = h(n) \cdot h(n')\) for all \(n, n' \in {\mathbb {N}}\).

Lemma 2

Let be an l-wxt\({}^{\text{ R }}\) over the semiring \({\mathbb {N}}\) of nonnegative integers, and let \(h :{\mathbb {N}}\rightarrow K\) be the unique semiring homomorphism from \({\mathbb {N}}\) to \({{\mathbb {K}}}\). Then \(h(\Vert M \Vert _{\mathbb {N}}(t, u)) = \Vert M' \Vert _{{\mathbb {K}}}(t, u)\) for all \((t, u) \in T_\varSigma \times T_\varDelta \), where is the l-wxt\({}^{\text{ R }}\) over the semiring \({\mathbb {K}}\) with \({{\,\mathrm{wt}\,}}'(\rho ) = h({{\,\mathrm{wt}\,}}(\rho ))\) and \((c')^q(t) = h(c^q(t))\) for all \(\rho \in R\), \(q \in Q\), and \(t \in T_\varSigma \).

We abbreviate the properties of l-wxt\({}^{\text{ R }}\) as follows: , , , , . We use these shorthands with the stems ‘l-wxt\({}^{\text{ R }}\)’, ‘l-wxt’, ‘l-wt\({}^{\text{ R }}\)’, and ‘l-wt’ to talk about an l-wxt\({}^{\text{ R }}\), l-wxt, l-wt\({}^{\text{ R }}\), or l-wt that additionally has the abbreviated properties attached as a prefix. For example, Bnl-wxt stands for “Boolean nondeleting l-wxt”. We use the same abbreviations with the stem (i.e., the material behind the hyphen) in typesetter letters (and the semiring \({\mathbb {K}}\) in parentheses) for the corresponding classes of induced weighted tree transformations. For instance, \({\texttt {n}}{\texttt {l}}\mathord {{{\,\mathrm{-}\,}}}{\texttt {WXT}}({\mathbb {K}})\) stands for the set of all weighted tree transformations computable by nl-wxt.

Utilizing the bimorphism characterizations of [7, Section 4] and the closure of the regular tree languages under linear homomorphisms [8, Theorem II.4.16], we easily obtain that both the domain as well as the range of each such transducer are regular, which we utilize without explicit mention.

Lemma 3

For every \(\tau \in {\texttt {l}}\mathord {{{\,\mathrm{-}\,}}}{\texttt {WXT}}^{{\texttt {R}}}({\mathbb {B}})\) both \({{\,\mathrm{dom}\,}}(\tau )\) and \({{\,\mathrm{range}\,}}(\tau )\) are regular.

Finally, we recall the results for the composition hierarchies of the unweighted tree transformation classes, which we lift to the weighted case in Sect. 5.

Theorem 4

(see [1, Theorem 6.2] and [6, Theorems 26, 31, 33, 34]).

figure a

Additionally, the classes of (3b) and (3d) as well as (3c) and (3e) coincide.

4 Faithful Representation

In this section, we deal with the question in which cases unweighted transducers faithfully represent certain Boolean weighted transducers (note that the weighted tree transformation induced by a Boolean l-wxt\({}^{\text{ R }}\) might not be Boolean). Moreover, we consider for which such transducers M another such transducer \(M'\) exists, which induces a partial function \(\Vert M' \Vert _{{\mathbb {B}}} \subseteq \Vert M \Vert _{{\mathbb {B}}}\) with the same domain as \(\Vert M \Vert _{{\mathbb {B}}}\). We start with the definition of unambiguous l-wxt\({}^{\text{ R }}\). To this end, we reinterpret Boolean l-wxt\({}^{\text{ R }}\) over the semiring \({\mathbb {K}}\), which anyway only use the neutral elements 0 and 1, as Boolean l-wxt\({}^{\text{ R }}\) over the semiring \({\mathbb {N}}\) of nonnegative integers by identifying the neutral elements 0 and 1 in \(\mathbb K\) and \({\mathbb {N}}\). Thus, given a Boolean l-wxt\({}^{\text{ R }}\,\,M\) (over \({\mathbb {K}}\)), we write \(\Vert M \Vert _{\mathbb {N}}\) for its semantics in the semiring of nonnegative integers. We also reinterpret M over the Boolean semiring \({\mathbb {B}}\) and write \(\Vert M \Vert _{{\mathbb {B}}}\). Over the Boolean semiring \({\mathbb {B}}\), we sometimes identify mappings \(f :S \rightarrow \{0,1\}\) with their support \({{\,\mathrm{supp}\,}}(f)\) and vice versa, so \(\Vert M \Vert _{{\mathbb {B}}} :T_\varSigma \times T_\varDelta \rightarrow \{0, 1\}\) is identified with \(\Vert M \Vert _{{\mathbb {B}}} \subseteq T_\varSigma \times T_\varDelta \) for a Boolean l-wxt\({}^{\text{ R }}\,\,M\).

Definition 5

Let be a Boolean l-wxt\({}^{\text{ R }}\) over \({\mathbb {K}}\) and \(L \subseteq T_\varSigma \). We say that M is unambiguous on L if \(\Vert M \Vert _{\mathbb {N}}(t, u) \in \{0, 1\}\) for every \((t, u) \in L \times T_\varDelta \) (i.e., the mapping \(\Vert M \Vert _{\mathbb {N}}\) restricted to \(L \times T_\varDelta \) is Boolean). In the case of \(L = T_\varSigma \), we also say that M is unambiguous.

Lemma 6

Let be a Boolean l-wxt\({}^{\text{ R }}\) over \({\mathbb {K}}\) that is unambiguous on \(L \subseteq T_\varSigma \). Then \(\Vert M \Vert _{{\mathbb {K}}}(t, u) = \Vert M \Vert _{{\mathbb {B}}}(t, u)\) for all \((t,u) \in L \times T_\varDelta \).

Next we consider how Boolean weighted tree transformations behave under composition. We will identify another restriction, functionality, that is required for the faithful representation via unweighted composition.

Lemma 7

Let \(\tau :T_\varSigma \times T_\varDelta \rightarrow K\) and \(\tau ' :T_\varDelta \times T_\varGamma \rightarrow K\) be Boolean weighted tree transformations. If \(\tau \) is functional, then \(\tau \mathbin ; \tau ' = {{\,\mathrm{supp}\,}}(\tau ) \mathbin ; {{\,\mathrm{supp}\,}}(\tau ')\).

With this knowledge we are now ready to state the main lemma of this section. Given Boolean transducers that additionally obey certain functionality and unambiguity restrictions, the computation inside the Boolean semiring faithfully represents the computation in the semiring \({\mathbb {K}}\). We recall that the neutral element of composition is the identity mapping of the correct set \(T_\varSigma \), whose range is clearly \(T_\varSigma \).

Lemma 8

Let \(n \ge 1\) be an integer and be a Boolean l-wxt\({}^{\text{ R }}\) for every \(i \in [n]\). If (i) \(M_i\) is unambiguous on

$$ {{\,\mathrm{range}\,}}(\Vert M_1 \Vert _{{\mathbb {K}}} \mathbin ; \cdots \mathbin ; \Vert M_{i-1} \Vert _{{\mathbb {K}}}) $$

and (ii) \(\Vert M_1 \Vert _{{\mathbb {K}}} \mathbin ; \cdots \mathbin ; \Vert M_{i-1} \Vert _{{\mathbb {K}}}\) is functional for every \(i \in [n]\), then

$$ \Vert M_1 \Vert _{{\mathbb {K}}} \mathbin ; \cdots \mathbin ; \Vert M_n \Vert _{{\mathbb {K}}} = \Vert M_1 \Vert _{{\mathbb {B}}} \mathbin ; \cdots \mathbin ; \Vert M_n \Vert _{{\mathbb {B}}} . $$

We identify a weighted tree transformation over \({\mathbb {B}}\) with its support for the rest of this section. A uniformizer [2] of a tree transformation \(\tau \subseteq T_\varSigma \times T_\varDelta \) is a partial function \(f :T_\varSigma \rightarrow T_\varDelta \) such that \(f \subseteq \tau \) and \({{\,\mathrm{dom}\,}}(f) = {{\,\mathrm{dom}\,}}(\tau )\). In other words, a uniformizer of \(\tau \) is a maximal partial function contained in \(\tau \). We start with a simple proposition that shows how uniformizers behave under composition.

Lemma 9

([2, Lemma 24]). Let \(n \ge 1\) be an integer, \(\varSigma _{1}, \cdots , \varSigma _{n+1}\) be ranked alphabets, and \(\tau _i \subseteq T_{\varSigma _i} \times T_{\varSigma _{i+1}}\) and \(f_i :T_{\varSigma _i} \rightarrow T_{\varSigma _{i+1}}\) be tree transformations for all \(i \in [n]\). If

  1. 1.

    \({{\,\mathrm{range}\,}}(\tau _j) \subseteq {{\,\mathrm{dom}\,}}(\tau _{j+1})\) for all \(1 \le j \le n-1\) and

  2. 2.

    \(f_i\) is a uniformizer for \(\tau _i\) for all \(i \in [n]\),

then \(f = f_1 \mathbin ; \cdots \mathbin ; f_n\) is a uniformizer for \(\tau = \tau _1 \mathbin ; \cdots \mathbin ; \tau _n\) and \({{\,\mathrm{dom}\,}}(\tau ) = {{\,\mathrm{dom}\,}}(\tau _1)\). If additionally \(\tau \) is functional, then \(f = \tau \).

Finally, we need two results from the theory of unweighted tree transducers. The first statement establishes the existence of uniformizers of tree transformations induced by over \({\mathbb {B}}\).

Lemma 10

(variant of [5, Lemma]). For every \(w \subseteq \{{\texttt {n}}, {\texttt {s}}\}\), each tree transformation of has a uniformizer in .

The second statement builds also on [5, Lemma], which essentially says that functional top-down tree transducers can be determinized provided that they have regular look-ahead. We utilize the same idea to prove a corresponding lemma, where we use unambiguity instead of determinism. The lemma shows that we can remove ambiguity from a functional l-wxt\({}^{\text{ R }}\) over \({\mathbb {B}}\). We use the shorthand ‘u’ to abbreviate ‘unambiguous’.

Lemma 11

\(w{\texttt {f}}{\texttt {l}}\mathord {{{\,\mathrm{-}\,}}}{\texttt {WXT}}^{{\texttt {R}}}({{\mathbb {B}}}) = w{\texttt {f}}{\texttt {u}}{\texttt {l}}\mathord {{{\,\mathrm{-}\,}}}{\texttt {WXT}}^{{\texttt {R}}}({{\mathbb {B}}})\) for all .

5 Main Results

We start with a construction that shows that a weighted relabeling can be separated from an \(\varepsilon \)-free l-wxt\({}^{\text{ R }}\) that handles all the weights and the nondeterminism leaving a Boolean functional \(\varepsilon \)-free l-wxt\({}^{\text{ R }}\) that is also unambiguous.

Lemma 12

For all \(w \subseteq \{{\texttt {n}}, {\texttt {s}}\}\)

figure b

Proof

Let be an arbitrary \(\varepsilon \)-free l-wxt\({}^{\text{ R }}\). Since we also need access to the transitions of a weighted tree automaton computing the regular look-ahead, let be a weighted tree automaton such that \(Q \subseteq Q'\) and \(\Vert A \Vert _{{\mathbb {K}}}^q = c^q\) for every \(q \in Q\) (e.g., we can take the disjoint union of the weighted tree automata computing \(c^q\) for all \(q \in Q\)). Let \(P = R \cup {{\,\mathrm{supp}\,}}(\underline{{{\,\mathrm{wt}\,}}})\) be the set of all rules and transitions used in M and A. We first construct the ranked alphabet consisting of the symbols \(\varSigma ' = \varSigma \cup (\varSigma \times P)\) such that \({{\,\mathrm{rk}\,}}'(\sigma ) = {{\,\mathrm{rk}\,}}(\sigma )\) and \({{\,\mathrm{rk}\,}}'(\langle \sigma , \rho \rangle ) = {{\,\mathrm{rk}\,}}(\sigma )\) for every \(\sigma \in \varSigma \) and \(\rho \in P\). Next, we construct the weighted relabeling \(\nu :\varSigma \times \varSigma ' \rightarrow K\) as follows: \(\nu (\sigma , \sigma ') = \delta _{\sigma \sigma '}\) and

for all \(\sigma , \sigma ' \in \varSigma \) and \(\rho \in P\), where \(\delta _{\sigma \sigma '}\) is the usual Kronecker delta. In other words, the relabeling either (i) keeps the symbol or (ii) keeps the input symbol, but annotates a rule or transition and charges the corresponding weight. Intuitively, the relabeling annotates the rules and transitions to be executed, but the relabeling does not ensure that the annotation can actually be executed at the annotated position. This check and the execution are performed by the Boolean \(\varepsilon \)-free l-, to which the Boolean weighted tree automaton \(A' = (Q', \varSigma ', Q'_0, \underline{{{\,\mathrm{wt}\,}}'})\) is associated via \({\underline{c}}^q = \Vert A' \Vert _{{\mathbb {K}}}^q\) for every state \(q \in Q\). We set

  • \(\rho ' = \bigl \langle \langle \ell (\varepsilon ), \rho \rangle (\ell |_1, \cdots , \ell |_{{{\,\mathrm{rk}\,}}(\sigma )}), q, r \bigr \rangle \in R'\) and \({{\,\mathrm{wt}\,}}'(\rho ') = 1\) for every rule \(\rho = \langle \ell , q, r\rangle \in R\),

  • \(\underline{{{\,\mathrm{wt}\,}}'}(q'_{1}, \cdots , q'_{n},\langle \sigma , \rho \rangle , q') = 1\) for every \(\rho = (q'_{1}, \cdots , q'_{n},\sigma , q') \in {{\,\mathrm{supp}\,}}(\underline{{{\,\mathrm{wt}\,}}})\), and

  • no additional rules are in \(R'\) and \(\underline{{{\,\mathrm{wt}\,}}'}(\rho ) = 0\) for all other transitions \(\rho \).

Note that \(\ell (\varepsilon ) \in \varSigma \) in the first item because M is \(\varepsilon \)-free. Hence \(M'\) is \(\varepsilon \)-free. Moreover, \(A'\) is deterministic and both \(M'\) and \(A'\) are Boolean because \({\underline{c}}^q = \Vert A' \Vert _{{\mathbb {K}}}^q\) is also Boolean. So the constructed l-wxt\({}^{\text{ R }}\,\,M'\) is Boolean and \(\varepsilon \)-free. Moreover, it inherits the properties ‘nondeleting’ and ‘strict’ from M. If M has trivial look-ahead c, then we set \({\underline{c}}\) to the trivial look-ahead for \(T_{\varSigma '}\) for the statements on l-wxt. Finally, it is straightforward to establish that \(M'\) is functional and unambiguous as it can at most execute the annotated rules and transitions (and can perform this in at most one fashion).

It remains to prove that \(\Vert M \Vert _{{\mathbb {K}}} = \nu \mathbin ; \Vert M' \Vert _{{\mathbb {K}}}\), for which we first prove that \(\Vert M \Vert _{{\mathbb {K}}}^q = \nu \mathbin ; \Vert M' \Vert _{{\mathbb {K}}}^q\) for every \(q \in Q\) and \(\Vert A \Vert _{{\mathbb {K}}}^{q'} = \nu \mathbin ; \Vert A' \Vert _{{\mathbb {K}}}^{q'}\) for every \(q' \in Q'\). These proofs can be found in the appendix.

We prove that we can compose any such relabeling to the right of any l-wxt\({}^{\text{ R }}\).

Lemma 13

For all \(w \subseteq \{{\texttt {n}}, {\texttt {s}}\}\)

figure c

Proof

Let be an \(\varepsilon \)-free l-wxt\({}^{\text{ R }}\) and \(\nu :\varDelta \times \varDelta ' \rightarrow K\) be a weighted relabeling. We construct the l- such that \(\langle \ell , q, r'\rangle \in R'\) and \({{\,\mathrm{wt}\,}}'(\langle \ell , q, r'\rangle ) = \sum _{\langle \ell , q, r''\rangle \in R} {{\,\mathrm{wt}\,}}(\langle \ell , q, r''\rangle ) \cdot \nu (r'', r')\) for every translation rule \(\langle \ell , q, r\rangle \in R\) and \(r' \in T_{\varDelta '}(Q)\) with \(\nu (r, r') \ne 0\). No additional rules are in \(R'\). Since the left-hand sides and the shape of the right-hand sides remains the same, it is clear that \(M'\) inherits the properties ‘\(\varepsilon \)-free’, ‘nondeleting’, and ‘strict’ directly from M. Since the look-ahead coincides these results also hold for l-wxt.

It remains to prove that \(\Vert M \Vert _{{\mathbb {K}}} \mathbin ; \nu = \Vert M' \Vert _{{\mathbb {K}}}\), which we prove again with the help of \(\Vert M \Vert _{{\mathbb {K}}}^q \mathbin ; \nu = \Vert M' \Vert _{{\mathbb {K}}}^q\) for every \(q \in Q\). The proof details are in the appendix.

We now have the two ingredients that allow us to normalize composition chains of \(\varepsilon \)-free l-wxt\(^{\text {R}}\).

Theorem 14

For all \(n \ge 1\) and \(w \subseteq \{{\texttt {n}}, {\texttt {s}}\}\)

figure d

Proof

We prove the statements by induction on n. For \(n = 1\), they are proved in Lemma 12. Suppose that the property holds for \(n \ge 1\). Then using Lemmas 12 and 13 and the induction hypothesis in sequence we obtain

figure e

and the same reasoning proves the statement also for l-wxt.

Finally, we compose a weighted relabeling to the left of an \(\varepsilon \)-free l-wxt\({}^{\text{ R }}\) to eliminate the weighted relabeling again.

Lemma 15

for all \(w \subseteq \{{\texttt {n}}, {\texttt {s}}\}\).

Proof

Let \(\nu :\varSigma \times \varSigma ' \rightarrow K\) be a weighted relabeling, be an arbitrary \(\varepsilon \)-free l-wxt\({}^{\text{ R }}\). The \(\varepsilon \)-free l- is given by \(\langle \ell ', q, r \rangle \in R'\) and \({{\,\mathrm{wt}\,}}'(\langle \ell ', q, r\rangle ) = \sum _{\langle \ell '', q, r\rangle \in R} \nu (\ell ', \ell '') \cdot {{\,\mathrm{wt}\,}}(\langle \ell '', q, r\rangle )\) for every rule \(\rho = \langle \ell , q, r\rangle \in R\) and \(\ell ' \in T_\varSigma (Q)\) with \(\nu (\ell ', \ell ) \ne 0\). No additional rules are in \(R'\). In addition, we let \({\underline{c}}^q = \nu \mathbin ; c^q\) for every \(q \in Q\), which is regular by Theorem 1. Obviously, the constructed l-wxt\({}^{\text{ R }}\,\,M'\) is \(\varepsilon \)-free, and it inherits the properties ‘nondeleting’ and ‘strict’ from M.

To prove that \(\nu \mathbin ; \Vert M \Vert _{{\mathbb {K}}} = \Vert M' \Vert _{\mathbb K}\), we first prove that \(\nu \mathbin ; \Vert M \Vert _{{\mathbb {K}}}^q = \Vert M' \Vert _{{\mathbb {K}}}^q\) for every \(q \in Q\). Both proofs can be found in the appendix.

Now we are ready to state and prove our main results.

Theorem 16

figure f

Additionally, the classes of (4b) and (4d) as well as (4c) and (4e) coincide.

Proof

All the right-to-left inclusions are trivial. The left-to-right inclusions are shown as follows. To prove (4a), let be of type \(\tau :T_\varSigma \times T_\varDelta \rightarrow K\). According to Theorem 14 there exist a weighted relabeling \(\nu :\varSigma \times \varSigma ' \rightarrow K\) and \(\tau ' :T_{\varSigma '} \times T_\varDelta \rightarrow K\) such that \(\tau = \nu \mathbin ; \tau '\) and . Since the composition of functional weighted tree transformations is naturally again functional, we can apply Lemma 8 to obtain that . Using (3a), we obtain . Let \(\tau ' = \tau '_1 \mathbin ; \tau '_2\) with . Next we restrict the range of \(\tau '_1\) to the regular tree language \({{\,\mathrm{dom}\,}}(\tau '_2)\). In this way, we obtain the tree transformation . Clearly, \(\tau ''_1 \mathbin ; \tau '_2 = \tau '_1 \mathbin ; \tau '_2\). Using Lemma 10 we additionally obtain uniformizers for \(\tau ''_1\) and \(\tau '_2\), respectively. Since \(\tau '\) is functional, we fulfill all the requirements of Lemma 9 and we can conclude that \(f''_1 \mathbin ; f''_2 = \tau ''_1 \mathbin ; \tau '_2 = \tau '_1 \mathbin ; \tau '_2 = \tau '\). Hence . With the help of Lemma 11 we obtain , which immediately also yields by Lemma 8. Finally, we utilize Lemma 15 to show that as desired.

This approach works in the same manner for (4d) and (4e). However, instead of (3a), we use the results (3d) and (3e), respectively.

To prove (4b) we proceed in the same manner as in case (4a) and obtain that \(\tau = \nu \mathbin ; \tau '\), where \(\nu \) is a weighted relabeling and . Then we use

from [6, Theorem 20], and again as in the proof of case (4a) we obtain the statement . It is well-known that regular look-ahead can be simulated by a nondeleting, strict, and linear weighted top-down tree transducer. Thus, we obtain that \(\tau '\) is an element of

figure g

where the second inclusion is due to [13, Theorem 8]. Then by Lemma 15 we obtain that .

Finally, for (4c) we also proceed as usual: \(\tau = \nu \mathbin ; \tau '\), where \(\nu \) is a weighted relabeling and . We continue with

where the first equality is by (3c) and the second inclusion is due to [6, Theorem 24]. Continuing on as before, we arrive at

Removing the regular look-ahead with the constructions already mentioned, we obtain

and then by Lemma 15 we conclude that . The final equalities between the classes of (4b) and (4d) as well as (4c) and (4e) follow directly from the presented arguments.