1 Introduction

Numerous extensions of nondeterministic finite-state string automata have been proposed in the past few decades. On the one hand, the qualitative evaluation of inputs was extended to a quantitative evaluation in the weighted automata of [23]. This development led to the fruitful study of recognizable formal power series [22], which are well-suited for representing factors such as costs, consumption of resources, or time and probabilities related to the processed input. The main algebraic structure for the weight calculations are semirings [16, 17], which offer a nice compromise between generality and efficiency of computation (due to their distributivity).

On the other hand, finite-state automata have been generalized to other input structures such as infinite words [21] and trees [4]. Finite-state tree automata were introduced independently in [7, 24, 25] and they and the tree languages they generate, called regular tree languages, have been intensively studied since their inception [4]. They are successfully utilized in various applications in many diverse areas like natural language processing [18], picture generation [8], and compiler construction [28]. Indeed several applications require the combination of the two mentioned generalizations and a broad range of weighted tree automaton (WTA) models has been studied (cf. [13, Chapter 9] for an overview). It is well-known that finite-state tree automata cannot ensure that two subtrees (of potentially arbitrary size) are always equal in an accepted tree [14]. An extension proposed in [20] aims to remedy this problem and introduces a tree automaton model that explicitly can require certain subtrees to be equal or unequal. Such models are very useful when investigating transduction models (see [13] for an overview) that can copy subtrees (thus resulting in equal subtrees) and they are the main tool used in the seminal paper [15] that proved that the HOM problem is decidable.

The HOM problem was a long-standing open problem in the theory of tree languages and recently solved in [15]. It asks whether the image of an (effectively presented) regular tree language under a given tree homomorphism is again regular. This is not necessarily the case as tree homomorphisms can create copies of subtrees. Indeed removing this ability from the tree homomorphism, obtaining a linear tree homomorphism, yields that the mentioned image is always regular [14]. In the solution to the HOM problem provided in [15] the image is first represented by a tree automaton with constraints and then it is investigated whether this tree automaton actually generates a regular tree language.

In the weighted setting, the HOM problem is also interesting as it once again provides an answer whether a given homomorphic image of a regular weighted tree language can efficiently be represented. While preservation of regularity has been investigated [3, 10,11,12] also in the weighted setting, the decidability of the HOM problem remains wide open. With the goal of investigating this problem, we introduce weighted tree automata with constraints (WTAc for short) in this contribution. We demonstrate that those WTAc can again represent all homomorphic images of the regular weighted tree languages. Thus, in principle, it only remains to provide a decision procedure for determining whether a given WTAc generates a regular weighted tree language. We approach this task by providing some common closure properties following essentially the steps also taken in [15]. For zero-sum free semirings we can also show that decidability of support emptiness and finiteness are directly inherited from the unweighted case [15].

2 Preliminaries

We denote the set of nonnegative integers by \(\mathbb {N}\), and for every \(k \in \mathbb {N}\), we let \([k] = \{i \in \mathbb {N}\mid 1 \le i \le k\}\). For all sets T and Z let \(T^Z\) be the set of all mappings \(\varphi :Z \rightarrow T\), and correspondingly we sometimes write \(\varphi _z\) instead of \(\varphi (z)\) for every \(\varphi \in T^Z\). The cardinality of Z is denoted by \(|Z |\).

A ranked alphabet \((\varSigma , \mathord {\mathrm {rk}})\) is a pair consisting of a finite set \(\varSigma \) and a mapping \(\mathord {\mathrm {rk}} \in \mathbb {N}^\varSigma \) that assigns a rank to each symbol of \(\varSigma \). If there is no risk of confusion, we denote a ranked alphabet \((\varSigma , \mathord {\mathrm {rk}})\) by \(\varSigma \). We write \(\sigma ^{(k)}\) to indicate that \(\mathrm {rk}(\sigma ) = k\). Moreover, for every \(k \in \mathbb {N}\) we let \(\varSigma _k = \mathrm {rk}^{-1}(k)\). Let \(X = \{x_i \mid i \in \mathbb {N}\}\) be a countable set of (formal) variables. For each \(n\in \mathbb {N}\) we let \(X_n = \bigl \{x_i \mid i \in [n] \bigr \}\). Given a ranked alphabet \(\varSigma \) and a set Z, the set \(T_{\varSigma }(Z)\) of \(\varSigma \)trees indexed by Z is the smallest set such that \(Z \subseteq T_{\varSigma }(Z)\) and \(\sigma (t_{1}, \dotsc , t_{k}) \in T_{\varSigma }(Z)\) for every \(k \in \mathbb {N}\), \(\sigma \in \varSigma _k\), and \(t_{1}, \dotsc , t_{k} \in T_{\varSigma }(Z)\). We abbreviate \(T_{\varSigma }(\emptyset )\) simply to \(T_{\varSigma }\), and any subset \(L \subseteq T_{\varSigma }\) is called a tree language.

Let \(\varSigma \) be a ranked alphabet, Z a set, and \(t \in T_{\varSigma }(Z)\). The set \(\mathrm {pos}(t)\) of positions of t is inductively defined by \(\mathrm {pos}(z) = \{\varepsilon \}\) for all \(z \in Z\) and by \(\mathrm {pos}\bigl (\sigma (t_{1}, \dotsc , t_{k}) \bigr ) = \bigl \{\varepsilon \bigr \} \cup \bigcup _{i \in [k]} \bigl \{iw \mid w \in \mathrm {pos}(t_i) \bigr \}\) for all \(k \in \mathbb {N}\), \(\sigma \in \varSigma _k\), and \(t_{1}, \dotsc , t_{k} \in T_{\varSigma }(Z)\). The size \(|t |\) of t is defined as \(|t | = |\mathrm {pos}(t) |\). For \(w\in \mathrm {pos}(t)\) and \(t' \in T_{\varSigma }(Z)\), the label t(w) of t at w, the subtree \(t|_w\) of t at w, and the substitution \(t[t']_w\) of \(t'\) into t at w are defined by \(z(\varepsilon ) = z|_\varepsilon = z\) and \(z[t']_\varepsilon = t'\) for all \(z \in Z\) and for \(t = \sigma (t_{1}, \dotsc , t_{k})\) by \(t(\varepsilon ) = \sigma \), \(t(iw') = t_i(w')\), \(t|_\varepsilon = t\), \(t|_{iw'} = t_i|_{w'}\), \(t[t']_\varepsilon = t'\), and \(t[t']_{iw'} = \sigma \bigl (t_{1}, \dotsc , t_{i-1}, t_i[t']_{w'}, t_{i+1}, \dotsc , t_{k} \bigr )\) for all \(k \in \mathbb {N}\), \(\sigma \in \varSigma _k\), \(t_{1}, \dotsc , t_{k} \in T_{\varSigma }(Z)\), \(i \in [k]\), and \(w' \in \mathrm {pos}(t_i)\). For all \(\sigma \in \varSigma \cup Z\), we let \(\mathrm {pos}_{\sigma }(t) = \bigl \{w \in \mathrm {pos}(t) \mid t(w) = \sigma \bigr \}\) and \(\mathrm {var}(t) = \{x \in X \mid \mathrm {pos}_x(t) \ne \emptyset \}\). Finally, for every \(t\in T_{\varSigma }(Z)\), finite \(V \subseteq Z\), and \(\theta \in T_{\varSigma }(Z)^V\), the substitution \(\theta \) applied to t is written as \(t\theta \) and defined by \(v\theta = \theta _v\) for every \(v \in V\), \(z\theta = z\) for every \(z \in Z \setminus V\), and \(\sigma (t_{1}, \dotsc , t_{k})\theta = \sigma (t_1\theta , \dotsc , t_k\theta )\) for all \(k \in \mathbb {N}\), \(\sigma \in \varSigma _k\), and \(t_{1}, \dotsc , t_{k} \in T_{\varSigma }(Z)\). We also write the substitution \(\theta \in T_{\varSigma }(Z)^V\) as (i) \([v_1 \leftarrow \theta _{v_1}, \dotsc , v_n \leftarrow \theta _{v_n}]\) if \(V = \{v_{1}, \dotsc , v_{n}\}\) or (ii) \([\theta _{x_1}, \dotsc , \theta _{x_n}]\) if \(V = X_n\).

A commutative semiring [16, 17] is a tuple \((\mathbb {S}, \mathord +, \mathord \cdot , 0, 1)\) such that \((\mathbb {S}, \mathord +, 0)\) and \((\mathbb {S}, \mathord \cdot , 1)\) are commutative monoids, \(\cdot \) distributes over \(+\), and \(0 \cdot s = 0\) for all \(s\in \mathbb {S}\). Examples include (i) the Boolean semiring \(\mathbb {B} = \bigl (\{ 0,1 \}, \mathord \vee , \mathord \wedge , 0, 1 \bigr )\), (ii) the semiring \(\mathbb {N}= \bigl (\mathbb {N}, \mathord +, \mathord \cdot , 0, 1)\), (iii) the tropical semiring \(\mathbb T = \bigl (\mathbb {N}\cup \{\infty \}, \mathord {\min }, \mathord +, \infty , 0 \bigr )\), and (iv) the arctic semiring \(\mathbb A = \bigl (\mathbb {N}\cup \{-\infty \}, \mathord {\max }, \mathord +, -\infty , 0 \bigr )\). Given semirings \((\mathbb S, \mathord +, \mathord \cdot , 0, 1)\) and \((\mathbb T, \mathord \oplus , \mathord \odot , \bot , \top )\), a semiring homomorphism is a mapping \(h \in \mathbb T^{\mathbb S}\) such that \(h(0) = \bot \), \(h(1) = \top \), and \(h(s_1 + s_2) = h(s_1) \oplus h(s_2)\) as well as \(h(s_1 \cdot s_2) = h(s_1) \odot h(s_2)\) for all \(s_1, s_2 \in \mathbb S\). When there is no risk of confusion, we refer to a semiring \((\mathbb {S}, \mathord +, \mathord \cdot , 0, 1)\) simply by its carrier set \(\mathbb {S}\). A semiring \(\mathbb {S}\) is a ring if there exists \(-1 \in \mathbb S\) such that \(-1 + 1 = 0\). Let \(\varSigma \) be a ranked alphabet. Any mapping \(A \in \mathbb S^{T_{\varSigma }}\) is called a weighted tree language over \(\mathbb {S}\) and its support is \(\mathrm {supp}(A) = \{t \in T_{\varSigma }\mid A_t \ne 0\}\).

Let \(\varSigma \) and \(\varDelta \) be ranked alphabets and let \(h' \in T_\varDelta (X)^\varSigma \) be a mapping such that \(h'_\sigma \in T_\varDelta (X_k) \) for all \(k \in \mathbb {N}\) and \(\sigma \in \varSigma _k\). We extend \(h'\) to \(h \in T_\varDelta ^{T_{\varSigma }}\) by (i) \(h_\alpha = h'_\alpha \in T_\varDelta (X_0) = T_\varDelta \) for all \(\alpha \in \varSigma _0\) and (ii) \(h_{\sigma (t_{1}, \dotsc , t_{k})} = h'_\sigma [h_{t_1}, \dotsc , h_{t_k}]\) for all \(k \in \mathbb {N}\), \(\sigma \in \varSigma _k\), and \(t_{1}, \dotsc , t_{k} \in T_{\varSigma }\). The mapping h is called the tree homomorphism induced by \(h'\), and we identify \(h'\) and its induced tree homomorphism h. It is nonerasing if \(h'_\sigma \notin X\) for all \(k\in \mathbb {N}\) and \(\sigma \in \varSigma _k\), and it is nondeleting if \(\mathrm {var}(h'_\sigma ) = X_k\) for all \(k\in \mathbb {N}\) and \(\sigma \in \varSigma _k\). Let \(h \in T_\varDelta ^{T_{\varSigma }}\) be a nonerasing and nondeleting homomorphism. Then h is input finitary; i.e., the set \(h^{-1}(u)\) is finite for every \(u \in T_\varDelta \) because \(|t | \le |u |\) for each \(t \in h^{-1}(u)\). Additionally, let \(A \in \mathbb {S}^{T_{\varSigma }}\) be a weighted tree language. We define the weighted tree language \(h(A) \in \mathbb {S}^{T_\varDelta }\) for every \(u \in T_\varDelta \) by \(h(A)_u = \sum _{t \in h^{-1}(u)} A_t\).

3 Weighted Tree Grammars with Constraints

Let us start with the formal definition of our weighted tree grammars. They are a weighted variant of the tree automata with equality and inequality constraints originally introduced in [1, 5] (with constraints on direct subtrees). An overview of further developments for these automata can be found in [26]. We essentially use the version recently utilized to solve the HOM problem [15, Definition 4.1].

Definition 1

A weighted tree grammar with constraints (WTGc) is a tuple \(G = (Q, \varSigma , F, P, \mathord {\mathrm {wt}})\) such that

  • Q is a finite set of nonterminals and \(F \in \mathbb S^Q\) assigns final weights,

  • \(\varSigma \) is a ranked alphabet of input symbols,

  • P is a finite set of productions of the form \((\ell , q, E, I)\), where \(\ell \in T_\varSigma (Q) \setminus Q\), \(q \in Q\), and \(E, I \subseteq \mathbb N^* \times \mathbb N^*\) are finite sets, and

  • \(\mathord {\mathrm {wt}} \in \mathbb S^P\) assigns a weight to each production.    \(\square \)

In the following, let \(G = (Q, \varSigma , F, P, \mathord {\mathrm {wt}})\) be a WTGc. The components of a production \(p = (\ell , q, E, I) \in P\) are the left-hand side \(\ell \), the governing nonterminal q, the set E of equality constraints, and the set I of inequality constraints. Correspondingly, the production p is also written \(\ell {\mathop {\longrightarrow }\limits ^{E,I}}q\) or even \(\ell {\mathop {\longrightarrow }\limits ^{E,I}}_{\mathrm {wt}_p} q\) if we want to indicate its weight. Additionally, we simply list an equality constraint \((v, v') \in E\) as \(v = v'\) and an inequality constraint \((v, v') \in I\) as \(v \ne v'\). A production \(\ell {\mathop {\longrightarrow }\limits ^{E,I}}q \in P\) is normalized if \(\ell = \sigma (q_{1}, \dotsc , q_{k})\) for some \(k \in \mathbb N\), \(\sigma \in \varSigma _k\), and \(q_{1}, \dotsc , q_{k} \in Q\), and it is unconstrained if \(E = \emptyset = I\); in this case we also simply write \(\ell \rightarrow q\). The WTGc G is a weighted tree automaton with constraints (WTAc) if all productions \(p \in P\) are normalized, and it is a weighted tree grammar (WTG) [14] if all productions \(p \in P\) are unconstrained. If G is both a WTAc as well as a WTG, then it is a weighted tree automaton (WTA) [14]. All these devices have Boolean final weights if \(F \in \{0,1\}^Q\). Finally, if we utilize the Boolean semiring \(\mathbb B\), then we reobtain the unweighted versions and omit the ‘W’ in the abbreviations and the mapping ‘\(\mathrm {wt}\)’ from the tuple.

The semantics for our WTGc G is a slightly non-standard derivation semantics when compared to [15, Definitions 4.3 and 4.4]. Let \((v,v') \in \mathbb {N}^* \times \mathbb {N}^*\) and \(t \in T_{\varSigma }\). If \(v,v' \in \mathrm {pos}(t)\) and \(t|_v = t|_{v'}\), we say that t satisfies \((v,v')\), otherwise t dissatisfies \((v,v')\). Let now \(C \subseteq \mathbb {N}^* \times \mathbb {N}^*\) be a finite set of constraints. We write \(t\models C\) if t satisfies all \((v,v') \in C\), and  if t dissatisfies all \((v,v') \in C\). Universally dissatisfying C is generally stronger than simply not satisfying C.

Definition 2

A sentential form (for G) is simply a tree of \(\xi \in T_\varSigma (Q)\). Given an input tree \(t \in T_\varSigma \), sentential forms \(\xi , \zeta \in T_\varSigma (Q)\), a production \(p = \ell {\mathop {\longrightarrow }\limits ^{E,I}}q \in P\), and a position \(w \in \mathrm {pos}(\xi )\), we write \(\xi \Rightarrow _{G,t}^{p,w} \zeta \) if \(\xi |_w = \ell \), \(\zeta = \xi [q]_w\), and the constraints E and I are fulfilled on \(t|_w\); i.e., \(t|_w \models E\) and . A sequence \(d = (p_1,w_1) \cdots (p_n, w_n) \in (P \times \mathbb N^*)^*\) is a derivation of G for t if there exist \(\xi _{1}, \dotsc , \xi _{n} \in T_\varSigma (Q)\) such that \(t \Rightarrow _{G, t}^{p_1,w_1} \xi _1 \Rightarrow _{G, t}^{p_2,w_2} \cdots \Rightarrow _{G, t}^{p_n,w_n} \xi _n\). It is left-most if additionally \(w_1 \prec w_2 \prec \cdots \prec w_n\), where \(\preceq \) is the lexicographic order on \(\mathbb N^*\) in which prefixes are larger, so \(\varepsilon \) is the largest element.    \(\square \)

Note that the sentential forms \(\xi _{1}, \dotsc , \xi _{n}\) are uniquely determined if they exist, and for any derivation d for t there exists a unique permutation of d that is a left-most derivation for t. The derivation d is complete if \(\xi _n \in Q\), and in that case it is also called a derivation to \(\xi _n\). The set of all complete left-most derivations for t to \(q \in Q\) is denoted by \(D^q_G(t)\). The WTGc G is unambiguous if \(\sum _{q \in \mathrm {supp}(F)} |D_G^q(t) | \le 1\) for every \(t \in T_{\varSigma }\).

Definition 3

The weight of a derivation \(d = (p_1,w_1) \cdots (p_n, w_n)\) is defined to be \(\mathrm {wt}_G(d) = \prod _{i = 1}^n \mathrm {wt}(p_i)\). The weighted tree language generated by G, written simply \(G \in \mathbb S^{T_{\varSigma }}\), is defined for every \(t \in T_{\varSigma }\) by

$$\begin{aligned} G_t = \sum _{q \in Q,\, d \in D^q_G(t)} F_q \cdot \mathrm {wt}_G(d). \end{aligned}$$

   \(\square \)

Two WTGc are equivalent if they generate the same weighted tree language. Finally, a weighted tree language is regular if it is generated by a WTG, and it is constraint-regular if it is generated by a WTGc. Since the weights of productions are multiplied, we can assume that \(\mathrm {wt}_p \ne 0\) for all \(p \in P\).

Example 4

Consider the WTGc \(G = \bigl (Q, \varSigma , F, P, \mathord {\mathrm {wt}} \bigr )\) over \(\mathbb A\) with \(Q = \{q, q'\}\), \(\varSigma = \{\alpha ^{(0)}, \gamma ^{(1)}, \sigma ^{(2)}\}\), \(F_q = -\infty \), \(F_{q'} = 0\), and P and ‘\(\mathrm {wt}\)’ given by the productions \(p_1 = \alpha \rightarrow _0 q\), \(p_2 = \gamma (q) \rightarrow _1 q\), and \(p_3 = \sigma \bigl (\gamma (q), q\bigr ) {\mathop {\longrightarrow }\limits ^{11=2}}_1 q'\). The tree \(t = \sigma \bigl (\gamma (\gamma (\alpha )), \gamma (\alpha ) \bigr )\) has the unique left-most derivation

$$ d = (p_1, 111) \, (p_2, 11) \, (p_1, 21) \, (p_2, 2) \, (p_3, \varepsilon ) $$

to the nonterminal \(q'\). Overall, we have \(\mathrm {supp}(G) = \big \{ \sigma \bigl (\gamma ^{i+1}(\alpha ), \gamma ^i(\alpha )\bigr ) \mid i\in \mathbb {N}\bigr \}\) and \(G_t = |\mathrm {pos}_\gamma (t) |\) for every \(t \in \mathrm {supp}(G)\).    \(\square \)

For the restricted model of WTAc we introduce another semantics, called initial algebra semantics, which is often more convenient in proofs.

Definition 5

If G is a WTAc, then for each \(q \in Q\) we define \(\mathord {\mathrm {wt}_G^q} \in \mathbb S^{T_{\varSigma }}\) for every \(t = \sigma (t_{1}, \dotsc , t_{k})\) with \(k \in \mathbb N\), \(\sigma \in \varSigma _k\), and \(t_{1}, \dotsc , t_{k} \in T_\varSigma \) by

   \(\square \)

It is a routine matter to verify \(\mathrm {wt}_G^q(t) = \sum _{d \in D^q_G(t)} \mathrm {wt}_G(d)\) for every \(q \in Q\) and \(t \in T_{\varSigma }\). Indeed as for WTG and WTA [13] also every WTGc can be turned into an equivalent WTAc at the expense of additional nonterminals by decomposing the left-hand sides.

Lemma 6

(cf. [15, Lemma 4.8]). WTGc and WTAc are equally expressive.

Proof

Let \(G = (Q, \varSigma , F, P, \mathord {\mathrm {wt}})\) be a WTGc with a non-normalized production \(p = \sigma (\ell _{1}, \dotsc , \ell _{k}) {\mathop {\longrightarrow }\limits ^{E,I}}q \in P\), let U be an infinite set with \(Q\subseteq U\) and let \(\varphi \in U^{T_\varSigma (Q)}\) be an injective map such that \(\varphi _q = q\) for all \(q \in Q\). We define the WTGc \(G' = (Q', \varSigma , F', P', \mathord {\mathrm {wt}'})\) such that \(Q' = Q \cup \{\varphi _{\ell _1}, \dotsc , \varphi _{\ell _k}\}\), \(F'_q = F_q\) for all \(q \in Q\) and \(F'_{q'} = 0\) for all \(q' \in Q' \setminus Q\), and

$$ P' = (P \setminus \{p\}) \cup \{\sigma (\varphi _{\ell _1}, \dotsc , \varphi _{\ell _k}) {\mathop {\longrightarrow }\limits ^{E,I}}q\} \cup \{\ell _i \rightarrow \varphi _{\ell _i} \mid i \in [k], \ell _i \notin Q\}, $$

and for every \(p' \in P'\)

$$ \mathrm {wt}'_{p'} = {\left\{ \begin{array}{ll} \mathrm {wt}_{p'} &{} \text {if } p' \in P \setminus \{p\} \\ \mathrm {wt}_p &{} \text {if } p' = \sigma (\varphi _{\ell _1}, \dotsc , \varphi _{\ell _k}) {\mathop {\longrightarrow }\limits ^{E,I}}q \\ 1 &{} \text {otherwise.} \end{array}\right. } $$

   \(\square \)

Example 7

Consider the WTGc G of Example 4 and its non-normalized production \(p = \sigma \big (\gamma (q), q\big ) {\mathop {\longrightarrow }\limits ^{11=2}}_1 q'\). Applying the construction in the proof of Lemma 6 we replace p by the productions \(\sigma (q'', q) {\mathop {\longrightarrow }\limits ^{11=2}}_1 q\) and \(\gamma (q) \rightarrow _0 q''\), where \(q''\) is some new nonterminal. The such obtained WTGc is already a WTAc.    \(\square \)

Another routine normalization turns the final weights into Boolean final weights [2, Lemma 6.1.1]. This is achieved by adding special copies of all nonterminals that terminate the derivation and pre-apply the final weight.

Lemma 8

WTAc and WTAc with Boolean final weights are equally expressive.

Let \(d \in D_G^q(t)\) be a derivation for some \(q \in Q\) and \(t \in T_{\varSigma }\). Since we often argue with the help of such derivations d, it is a nuisance that we might have \(\mathrm {wt}_G(d) = 0\). This anomaly can occur even if \(\mathrm {wt}_p \ne 0\) for all \(p \in P\) due to the presence of zero-divisors, which are elements \(s, s' \in \mathbb S \setminus \{0\}\) such that \(s \cdot s' = 0\). However, we can fortunately avoid such anomalies altogether utilizing a construction of [19] based on Dickson’s Lemma [6], which has been lifted to tree automata in [9]. We note that the construction preserves Boolean final weights.

Lemma 9

For every WTAc G there exists a WTAc \(G' = (Q', \varSigma , F', P', \mathord {\mathrm {wt}'})\) that is equivalent and \(\mathrm {wt}'_{G'}(d') \ne 0\) for all \(q' \in Q'\), \(t' \in T_{\varSigma }\), and \(d' \in D_{G'}^{q'}(t')\).

For zero-sum free semirings [16, 17] we obtain that the support \(\mathrm {supp}(G)\) of an WTAc can be generated by a TAc. A semiring is zero-sum free if \(s = 0 = s'\) for every \(s, s' \in \mathbb S\) such that \(s + s' = 0\). Clearly, rings are never zero-sum free, but the mentioned semirings \(\mathbb B\), \(\mathbb N\), \(\mathbb T\), and \(\mathbb A\) are all zero-sum free.

Corollary 10

(of Lemmata 6 and 9). If \(\mathbb S\) is zero-sum free, then \(\mathrm {supp}(G)\) is constraint-regular for every WTGc G.

Proof

We apply Lemmata 6 and 8 to obtain an equivalent WTAc with Boolean final weights and then Lemma 9 to obtain the WTAc \(G' = (Q', \varSigma , F', P', \mathord {\mathrm {wt}'})\) with Boolean final weights. As mentioned we can assume that \(\mathrm {wt}'_{p'} \ne 0\) for all \(p' \in P'\). Let \(q' \in \mathrm {supp}(F')\) and \(t' \in T_{\varSigma }\) with \(D_{G'}^{q'}(t')\ne \emptyset \). Since \(\mathrm {wt}'_{G'}(d') \ne 0\) for every \(d' \in D_{G'}^{q'}(t')\) and \(s + s' \ne 0\) for all \(s, s' \in \mathbb S \setminus \{0\}\) due to zero-sum freeness, we obtain \(t' \in \mathrm {supp}(G')\). Thus, the existence of a complete derivation for \(t'\) to an accepting nonterminal (i.e., one with final weight 1) characterizes whether we have \(t' \in \mathrm {supp}(G')\). Consequently, the TAc \(\bigl (Q', \varSigma , \mathrm {supp}(F'), P' \bigr )\) generates the tree language \(\mathrm {supp}(G')\), which is thus constraint-regular.    \(\square \)

4 Closure Properties

In this section we investigate several closure properties of the constraint-regular weighted tree languages. We start with the (point-wise) sum, which is given by \((A + A')_t = A_t + A'_t\) for every \(t \in T_{\varSigma }\) and \(A, A' \in \mathbb S^{T_{\varSigma }}\). Given WTGc G and \(G'\) generating A and \(A'\) we can trivially use a disjoint union construction to obtain a WTGc generating \(A + A'\). We omit the details.

Proposition 11

The constraint-regular weighted tree languages (over the same ranked-alphabet) are closed under sums.    \(\square \)

The corresponding (point-wise) product is the Hadamard product, which is given by \((A \cdot A')_t = A_t \cdot A'_t\) for every \(t \in T_{\varSigma }\) and \(A, A' \in \mathbb S^{T_{\varSigma }}\). With the help of a standard product construction we show that the constraint-regular weighted tree languages are also closed under Hadamard product. As preparation we introduce a special normal form. A WTAc \(G = (Q, \varSigma , F, P, \mathord {\mathrm {wt}})\) is constraint-determined if \(E = E'\) and \(I = I'\) for all productions \(\sigma (q_{1}, \dotsc , q_{k}) {\mathop {\longrightarrow }\limits ^{E, I}}q \in P\) and \(\sigma (q_{1}, \dotsc , q_{k}) {\mathop {\longrightarrow }\limits ^{E', I'}}q \in P\). In other words, two productions cannot differ only in the sets of constraints. It is straightforward to turn any WTAc into an equivalent constraint-determined WTAc by introducing additional nonterminals (e.g. annotate the constraints to the state on the right-hand side).

Theorem 12

The constraint-regular weighted tree languages (over the same ranked alphabet) are closed under Hadamard product.

Proof

Let \(A, A' \in \mathbb S^{T_{\varSigma }}\) be constraint-regular. Without loss of generality (see Lemma 6) we can assume constraint-determined WTAc \(G = (Q, \varSigma , F, P, \mathord {\mathrm {wt}})\) and \(G' = (Q', \varSigma , F', P', \mathord {\mathrm {wt}'})\) that generate A and \(A'\), respectively. We construct the direct product WTAc \(G \times G' = (Q \times Q', \varSigma , F'', P'', \mathord {\mathrm {wt}''})\) such that \(F''_{\langle q, q'\rangle } = F_q \cdot F'_{q'}\) for every \(q \in Q\) and \(q' \in Q'\) and for every production \(p = \sigma (q_{1}, \dotsc , q_{k}) {\mathop {\longrightarrow }\limits ^{E, I}}~q \in P\) and production \(p' = \sigma (q'_{1}, \dotsc , q'_{k}) {\mathop {\longrightarrow }\limits ^{E', I'}}q' \in P'\) the production

$$ p'' = \sigma \bigl ( \langle q_1, q'_1\rangle , \dotsc , \langle q_k, q'_k\rangle \bigr ) {\mathop {\longrightarrow }\limits ^{E \cup E', I \cup I'}}\langle q, q'\rangle $$

belongs to \(P''\) and its weight is \(\mathrm {wt}''_{p''} = \mathrm {wt}_p \cdot \mathrm {wt}'_{p'}\). No other productions belong to \(P''\). The proof that \(G \times G' = A \cdot A'\) is a straightforward induction proving \(\mathrm {wt}_{G \times G'}^{\langle q, q'\rangle }(t) = \mathrm {wt}_G^q(t) \cdot \mathrm {wt}_{G'}^{q'}(t)\) for all \(t \in T_\varSigma \) using the initial algebra semantics. The WTAc G and \(G'\) are required to be constraint-determined, so that we can uniquely identify the productions \(p \in P\) and \(p' \in P'\) that construct a production \(p'' \in P''\).    \(\square \)

Example 13

Let \(G = \bigl (\{q\}, \varSigma , F, P, \mathord {\mathrm {wt}} \bigr )\) and \(G' = \bigl (\{z\}, \varSigma , F', P', \mathord {\mathrm {wt}'} \bigr )\) be WTAc over \(\mathbb A\) and \(\varSigma = \{\alpha ^{(0)}, \gamma ^{(1)}, \sigma ^{(2)}\}\), \(F_q = F'_z = 0\), and the productions

figure a

We observe that

$$\begin{aligned} \mathrm {supp}(G)&= \bigl \{t \in T_{\varSigma }\mid \forall w \in \mathrm {pos}_\sigma (t) :t|_{w1} = t|_{w2} \bigr \} \\ \mathrm {supp}(G')&= \bigl \{t \in T_{\varSigma }\mid \forall w \in \mathrm {pos}_\gamma (t) :\text { if } t(w1) = \sigma \text { then } t|_{w11} \ne t|_{w12} \bigr \} \end{aligned}$$

and \(G_t = 2|\mathrm {pos}_\gamma (t) |\) as well as \(G'_{t'} = |\mathrm {pos}_\gamma (t') | + |\mathrm {pos}_\sigma (t') |\) for all \(t \in \mathrm {supp}(G)\) and \(t' \in \mathrm {supp}(G')\). We obtain the WTAc \(G \times G' = \bigl (\{\langle q,z\rangle \}, \varSigma , F'', P'', \mathord {\mathrm {wt}''}\bigr )\) with \(F''_{\langle q, z\rangle } = 0\) and the following productions.

$$ \alpha \rightarrow _0 \langle q,z\rangle \qquad \gamma \bigl (\langle q,z\rangle \bigr ) {\mathop {\longrightarrow }\limits ^{11\ne 12}}_3 \langle q, z\rangle \qquad \sigma \bigl ( \langle q,z\rangle , \langle q,z\rangle \bigr ) {\mathop {\longrightarrow }\limits ^{1=2}}_1 \langle q,z\rangle $$

Hence we obtain the equality \((G \times G')_t = 3|\mathrm {pos}_\gamma (t) | + |\mathrm {pos}_\sigma (t) | = G_t \cdot G'_t\) for every tree \(t \in \mathrm {supp}(G) \cap \mathrm {supp}(G')\).    \(\square \)

Next, we use an extended version of the classical power set construction to obtain an unambiguous WTAc that keeps track of the reachable nonterminals, but preserves only the homomorphic image of its weight. The unweighted part of the construction mimics a power-set construction and the handling of constraints roughly follows [15, Definition 3.1].

Theorem 14

Let \(h \in \mathbb T^{\mathbb S}\) be a semiring homomorphism into a finite semiring \(\mathbb T\). For every WTAc \(G = (Q, \varSigma , F, P, \mathord {\mathrm {wt}})\) over \(\mathbb S\) there exists an unambiguous WTAc \(G' = (\mathbb T^Q, \varSigma , F', P', \mathord {\mathrm {wt}'})\) such that for every \(t \in T_{\varSigma }\) and \(\varphi \in \mathbb T^Q\)

$$ \mathrm {wt}_{G'}^\varphi (t) = {\left\{ \begin{array}{ll} 1 &{} if \varphi _q = h_{\mathrm {wt}_G^q(t)} for all q \in Q \\ 0 &{} otherwise. \end{array}\right. } $$

Moreover, \(G'_t = h_{G(t)}\) for every \(t \in T_{\varSigma }\).

Proof

Let \(\mathcal C = \bigl \{ E \mid \sigma (q_{1}, \dotsc , q_{k}) {\mathop {\longrightarrow }\limits ^{E,I}}q \in P \bigr \} \cup \bigl \{ I \mid \sigma (q_{1}, \dotsc , q_{k}) {\mathop {\longrightarrow }\limits ^{E,I}}q \in P \bigr \}\) be the constraints that occur in G. We let \(F'_\varphi = \sum _{q \in Q} h_{F(q)} \cdot \varphi _q\) for every \(\varphi \in \mathbb T^Q\). For all \(k \in \mathbb {N}\), \(\sigma \in \varSigma _k\), nonterminals \(\varphi ^1, \dotsc , \varphi ^k \in \mathbb T^Q\), and constraints \(\mathcal E \subseteq \mathcal C\) we let \(p' = \sigma (\varphi ^1, \dotsc \varphi ^k) {\mathop {\longrightarrow }\limits ^{\mathcal E, \mathcal I}}\varphi \in P'\), where \(\mathcal I = \mathcal C \setminus \mathcal E\) and for every \(q \in Q\)

$$\begin{aligned} \varphi _q = \sum _{\begin{array}{c} p = \sigma (q_{1}, \dotsc , q_{k}) {\mathop {\longrightarrow }\limits ^{E, I}}q \in P \\ E \subseteq \mathcal E,\, I \subseteq \mathcal I \end{array}} h_{\mathrm {wt}(p)} \cdot \varphi ^1_{q_1} \cdot \ldots \cdot \varphi ^k_{q_k} . \end{aligned}$$
(1)

No additional productions belong to \(P'\). Finally, we set \(\mathord {\mathrm {wt}'_{p'}} = 1\) for all \(p' \in P'\). In general, the WTAc \(G'\) is certainly not deterministic due to the choice of constraints, but \(G'\) is unambiguous since the resulting \(2^{|\mathcal C |}\) rules for each left-hand side have mutually exclusive constraint sets. In fact, for each \(t\in T_{\varSigma }\) there is exactly one left-most complete derivation of \(G'\) for t, and it derives to \(\varphi \in \mathbb T^Q\) such that \(\varphi _q = h_{\mathrm {wt}_G^q(t)}\) for every \(q \in Q\). The weight of that derivation is 1. These statements are proven inductively. The final statement \(G'_t = h_{G(t)}\) is an easy consequence of the previous statements.    \(\square \)

Example 15

Reconsider the WTAc obtained as the disjoint union of the WTAc G and \(G'\) of Example 13 as well as the semiring homomorphism \(h \in \mathbb B^{\mathbb A}\) given by \(h_a = 1\) for all \(a \in \mathbb A \setminus \{-\infty \}\) and \(h_{-\infty } = 0\). The set \(\mathcal C\) of utilized constraints is \(\bigl \{(1,2), (11, 12) \bigr \}\) and we write \(\varphi \in \mathbb B^Q\) simply as subsets of Q. We obtain the unambiguous WTAc \(G''\) with the following (sensible, i.e. having satisfiable constraints) productions for all \(Q', Q'' \subseteq \{q, z\}\), which all have weight 1.

$$\begin{aligned} \alpha&{\mathop {\longrightarrow }\limits ^{1\ne 2, 11\ne 12}}\{q, z\} \\ \gamma (Q')&{\mathop {\longrightarrow }\limits ^{1=2, 11=12}}Q' \cap \{q\}&\gamma (Q')&{\mathop {\longrightarrow }\limits ^{1 \ne 2, 11=12}}Q' \cap \{q\} \\* \gamma (Q')&{\mathop {\longrightarrow }\limits ^{1=2, 11\ne 12}}Q'&\gamma (Q')&{\mathop {\longrightarrow }\limits ^{1 \ne 2, 11\ne 12}}Q' \\* \sigma (Q', Q'')&{\mathop {\longrightarrow }\limits ^{1=2, 11=12}}Q' \cap Q''&\sigma (Q', Q'')&{\mathop {\longrightarrow }\limits ^{1 \ne 2, 11=12}}Q' \cap Q'' \cap \{z\} \\* \sigma (Q', Q'')&{\mathop {\longrightarrow }\limits ^{1=2, 11\ne 12}}Q' \cap Q''&\sigma (Q', Q'')&{\mathop {\longrightarrow }\limits ^{1 \ne 2, 11\ne 12}}Q' \cap Q'' \cap \{z\} \end{aligned}$$

Each \(t \in T_{\varSigma }\) has exactly one left-most complete derivation in \(G''\); it derives to \(Q'\), where (i) \(q \in Q'\) iff \(t \in \mathrm {supp}(G)\) and (ii) \(z \in Q'\) iff \(t \in \mathrm {supp}(G')\).    \(\square \)

Corollary 16

(of Theorem 14). Let \(\mathbb S\) be finite. For every WTAc over \(\mathbb S\) there exists an equivalent unambiguous WTAc.    \(\square \)

Corollary 17

(of Theorem 14). Let \(\mathbb S\) be zero-sum free. For every WTAc G over \(\mathbb S\) there exists an unambiguous TAc generating \(\mathrm {supp}(G)\).

Proof

Utilizing Lemma 8 we can first construct an equivalent WTAc with Boolean final weights. If \(\mathbb S\) is zero-sum free, then there exists a semiring homomorphism \(h \in \mathbb B^{\mathbb S}\) by [27]. By Lemma 9 we can assume that each derivation of G has non-zero weight and sums of non-zero elements remain non-zero by zero-sum freeness. Thus we can simply replace the factor \(h_{\mathrm {wt}(p)}\) by 1 in (1). The such obtained TAc generates \(\mathrm {supp}(G)\).    \(\square \)

Let \(A, A' \in \mathbb S^{T_{\varSigma }}\). It is often useful (see [15, Definition 4.11]) to restrict A to the support of \(A'\) but without changing the weights of those trees inside the support. Formally, we define \(A|_{\mathrm {supp}(A')} \in \mathbb S^{T_{\varSigma }}\) for every \(t \in T_{\varSigma }\) by \(A|_{\mathrm {supp}(A')}(t) = A(t)\) if \(t \in \mathrm {supp}(A')\) and \(A|_{\mathrm {supp}(A')}(t) = 0\) otherwise. Utilizing the unambiguous WTAc and the Hadamard product, we can show that \(A|_{\mathrm {supp}(A')}\) is constraint-regular if A and \(A'\) are constraint-regular and the semiring \(\mathbb S\) is zero-sum free.

Theorem 18

Let \(\mathbb S\) be zero-sum free. For all WTAc G and \(G'\) there exists a WTAc H such that \(H = G|_{\mathrm {supp}(G')}\).

Proof

By Corollary 10 the support \(\mathrm {supp}(G')\) is constraint-regular. Hence we can obtain an unambiguous WTAc \(G''\) for \(\mathrm {supp}(G')\) using Theorem 14. Without loss of generality we assume that both G and \(G''\) are constraint-determined; we note that the normalization preserves unambiguous WTAc. Finally we construct \(G \times G''\), which by Theorem 12 generates exactly \(G|_{\mathrm {supp}(G')}\) as required.    \(\square \)

5 Towards HOM Problem

The strategy of [15] for deciding the HOM problem first represents the homomorphic image L of the regular tree language with the help of an WTGc G. For deciding whether L is regular, a tree automaton \(G'\) simulating the behavior of G up to a certain bounded height is constructed. If \(G'=G\), then L is regular. If not, pumping arguments are used to prove that it is impossible to find any TA for L. Overall, they reduce the HOM problem to an equivalence problem.

Towards solving the HOM problem in the weighted case we now proceed similarly. First, we show that WTGc can encode each (well-defined) homomorphic image of a regular weighted tree language. This ability motivated their definition in the unweighted case [15, Proposition 4.6], and it also applies in the weighted case with minor restrictions that just enforce that all obtained sums are finite.

Theorem 19

Let \(G = (Q, \varSigma , F, P, \mathord {\mathrm {wt}})\) be a WTA and \(h \in T_\varDelta ^{T_{\varSigma }}\) be a nondeleting and nonerasing tree homomorphism. There exists a WTGc \(G'\) with \(G' = h(G)\).

Proof

We construct a WTGc \(G'\) for h(G) in two stages. First, we construct the WTGc \(G'' = \bigl (Q \cup \{\bot \}, \varDelta \cup \varDelta \times P, F'', P'', \mathord {\mathrm {wt}''} \bigr )\) such that for every production \(p = \sigma (q_{1}, \dotsc , q_{k}) \rightarrow q \in P\) and \(h_\sigma = u = \delta (u_{1}, \dotsc , u_{n})\),

$$ p'' = \Bigl ( \langle \delta ,p\rangle (u_{1}, \dotsc , u_{n}) \llbracket q_{1}, \dotsc , q_{k} \rrbracket {\mathop {\longrightarrow }\limits ^{E, \emptyset }}q \Bigr ) \in P'' \quad \text {with} \quad E = \bigcup _{i \in [k]} \mathrm {pos}_{x_i}(u)^2 $$

where the substitution \(\langle \delta ,p\rangle (u_{1}, \dotsc , u_{n})\llbracket q_{1}, \dotsc , q_{k} \rrbracket \) replaces for every \(i \in [k]\) only the left-most occurrence of \(x_i\) in \(\langle \delta ,p\rangle (u_{1}, \dotsc , u_{n})\) by \(q_i\) and all other occurrences by \(\bot \). Moreover \(\mathrm {wt}''_{p''} = \mathrm {wt}_p\). Additionally, we let \(p''_\delta = \delta (\bot , \dotsc , \bot ) \rightarrow \bot \in P''\) with \(\mathrm {wt}''_{p''_\delta } = 1\) for every \(k \in \mathbb N\) and \(\delta \in \varDelta _k \cup \varDelta _k \times P\). No other productions are in \(P''\). Finally, we let \(F''_q = F_q\) for all \(q \in Q\) and \(F''_\bot = 0\).

We can now delete the annotation. First we remove all productions to \(\bot \) that are labeled with symbols from \(\varDelta \times P\). Second, we use a deterministic relabeling to remove the second components of labels of \(\varDelta \times P\). Thus, we overall obtain a WTGc \(G'\) (using only equality constraints) such that \(G' = h(G)\).

The sole purpose of the annotations is to establish a one-to-one correspondence between the valid runs of G and those of \(G''\), before evaluating the sums to compute h(G). This simplifies the understanding of the correctness of the construction, but is otherwise superfluous and may be omitted for efficiency.    \(\square \)

Let us illustrate the construction on a simple example.

Example 20

Consider the WTA \(G = \bigl (\{q, q'\}, \varSigma , F, P, \mathord {\mathrm {wt}} \bigr )\) over the semiring \(\mathbb {N}\) with \(\varSigma = \{\alpha ^{(0)}, \phi ^{(1)}, \gamma ^{(1)}, \epsilon ^{(1)}\}\), \(F_q = 0\), \(F_{q'} = 1\), and the set of productions and their weights given by \( p_1 = \alpha \rightarrow _1 q , p_2 = \gamma (q) \rightarrow _2 q, p_3 = \epsilon (q) \rightarrow _1 q \) and \( p_4 = \phi (q) \rightarrow _1 q'\). We have \(\mathrm {supp}(G) = \bigl \{ \phi (t) \mid t \in T_{\varSigma \setminus \{\phi \}} \bigr \}\) and \(G_t = 2^{|\mathrm {pos}_\gamma (t) |}\) for \(t \in \mathrm {supp}(G)\). Consider the ranked alphabet \(\varDelta = \{\alpha ^{(0)}, \gamma ^{(1)}, \sigma ^{(2)}\}\) and the homomorphism h induced by \(h_\alpha = \alpha \), \(h_\gamma = h_\epsilon = \gamma (x_1)\), and \(h_\phi = \sigma \bigl (\gamma (x_1), x_1 \bigr )\). So \( \mathrm {supp}\bigl (h(G) \bigr ) = \bigl \{ \sigma \bigl (\gamma ^{n+1}(\alpha ), \gamma ^n(\alpha ) \bigr ) \mid n \in \mathbb {N}\bigr \} \) and \( h(G)_t = \sum _{k=0}^{n} \left( {\begin{array}{c}n\\ k\end{array}}\right) 2^k=3^n \) for every \(t = \sigma \bigl (\gamma ^{n+1}(\alpha ), \gamma ^n(\alpha ) \bigr ) \in \mathrm {supp}\bigl (h(G) \bigr )\). A WTGc for h(G) is constructed as follows. First, we let \(G'' = \bigl (\{q, q', \bot \}, \varDelta \cup \varDelta \times P, F'', P'', \mathord {\mathrm {wt}''} \bigr )\) with \(F''_{q'} = 1\), \(F''_{q} = F''_\bot = 0\) and the productions and their weights are given by

$$\begin{aligned} \langle \alpha , p_1\rangle&\rightarrow _1 q&\langle \gamma , p_2 \rangle (q)&\rightarrow _2 q&\langle \gamma , p_3 \rangle (q)&\rightarrow _1 q&\langle \sigma , p_4 \rangle \bigl (\gamma (q), \bot \bigr )&{\mathop {\longrightarrow }\limits ^{11=2}}_1 q' \end{aligned}$$

and \(\delta (\bot , \dotsc , \bot ) \rightarrow _1 \bot \) for all \(\delta \in \varDelta \cup \varDelta \times P\). Next we remove the second component of the labels and add weights of productions that become equal. This applies to the production \(\gamma (q) \rightarrow q\), which obtains the sum of the two productions (with annotations \(p_2\) and \(p_3\)). So we obtain the WTGc \(G' = \bigl (\{q, q', \bot \}, \varDelta , F'', P', \mathord {\mathrm {wt}'} \bigr )\) with the following productions for all \(\delta \in \varDelta \).

$$\begin{aligned} \alpha&\rightarrow _1 q&\gamma (q)&\rightarrow _3 q&\sigma \bigl (\gamma (q), \bot \bigr )&{\mathop {\longrightarrow }\limits ^{11=2}}_1 q'&\delta (\bot , \dotsc , \bot )&\rightarrow _1 \bot \end{aligned}$$

   \(\square \)

Although for zero-sum free semirings, the support of a regular weighted tree language is again regular, in general, the converse is not true, so we cannot apply the decision procedure from [15] to the support of h(G) in order to decide its regularity. Instead, we hope to extend the unweighted argument in a way that tracks the weights sufficiently close. For this, we prepare two decidability results, which rely mostly on the corresponding results in the unweighted case. To this end, we need to relate our WTGc constructed in Theorem 19 to those used in [15]. This requires that the equality constraints in every production refer to positions that occur in its left-hand side and are labeled by the same nonterminal.

Definition 21

A WTGc \(G = (Q, \varSigma , F, P, \mathord {\mathrm {wt}})\) is classic if \(\{v, v'\} \subseteq \mathrm {pos}(\ell )\) and \(\ell (v) = \ell (v') \in Q\) for every production \(\ell {\mathop {\longrightarrow }\limits ^{E,I}}~q \in P\) and \((v, v') \in E\).    \(\square \)

Theorem 22

Let \(\mathbb S\) be a zero-sum free semiring, \(G = (Q, \varSigma , F, P, \mathord {\mathrm {wt}})\) be a WTA and \(h \in T_\varDelta ^{T_{\varSigma }}\) be a nondeleting and nonerasing tree homomorphism. Finally, let \(A = h(G)\). Emptiness and finiteness of \(\mathrm {supp}(A)\) are decidable.

The proof of Theorem 22 applies the corresponding result for the unweighted case. In short, we use Theorem 19 to represent A by a WTGc for which we drop the weights. The resulting TGc representing \(\mathrm {supp}(A)\) is then modified into an equivalent, classic one. For this, emptiness and finiteness are decidable by [15].