Keywords

1 Introduction

In 1967, Thatcher [32] investigated the theory of finite pseudoterms (nowadays known as unranked trees) and pseudoautomata (or unranked tree automata). In contrast to ranked trees (see [9, 22, 23] for surveys), for unranked trees the number of children of a node is not determined by the label of that node. Since then, due to the development of the modern document language XML and the fact that (fully structured) XML-documents can be formalized as unranked trees, the theory of unranked tree automata and unranked tree languages has developed intensively, cf. e.g. [2, 7, 28,29,30, 32] and Chap. 8 of [9].

Classical unranked tree automata (amongst others) provide the opportunity to cope with qualitative questions, like reachability. More recently, also quantitative aspects gained much attention in automata theory. For instance, weighted real-time automata were investigated in [27], weighted modal systems in [4], and axiomatizations of weighted transition systems in [25]. For ranked trees, weighted automata were introduced in [1, 6]; for surveys we refer to [13, 21]. Weighted automata for unranked trees over semirings were investigated in [20, 26]. A weighted unranked tree automata model over tree valuation monoids was introduced in [12]. Tree valuation monoids provide a very general weight structure including all semirings, bounded (possibly non-distributive) lattices, and in addition, computations of averages or discounting of weights.

Nivat-type results provide a close relationship between weighted and unweighted automata models. In 1968, Nivat [31] (see also [5], Theorem 4.1) proved the fundamental theorem which characterizes rational transductions, and thereby established a connection between rational transductions and rational languages. Droste and Kuske [14] extended Nivat’s theorem to weighted word automata over semirings. They showed that recognizable word series are exactly those which can be constructed from recognizable languages and very particular recognizable series using operations like morphisms and intersections. Recently, other extensions followed. Nivat theorems were given in [17, 18] for weighted timed automata and weighted timed pushdown automata over timed valuation monoids and thereby implicitly also for weighted word automata over valuation monoids, in [3] for weighted picture automata over picture valuation monoids, in [10] for weighted graph automata over semirings, and in [33, 34] for probabilistic automata on finite and infinite words and ranked trees, respectively.

The goal of this paper is such a Nivat result for weighted unranked tree automata over tree valuation monoids. Such automata consist of a state set and a family of weighted word automata. The latter are used to calculate the local weight at a position of a tree by letting the weighted word automaton run on states at the children of the position. To define the behavior, we use extended runs which were already introduced in [20]. Additionally to the information of classical runs, extended runs also include runs of the weighted word automata called at positions of the input tree. Then the local weight of a position equals the weight of the transition taken for this position in the run of the position’s parent. We use the valuation function of the tree valuation monoid to calculate the weights of an extended run in a global way, i.e. given a run we apply the valuation function to all local weights which appear along the extended run. We obtain the weight of the tree as the sum of the weights of all its extended runs. In [12] it was shown that this model of weighted unranked tree automata is expressively equivalent to a suitable weighted MSO logic for unranked trees.

The main result of this paper gives a Nivat-type result for weighted unranked tree automata. We show that the behaviors of weighted unranked tree automata are exactly the functions which can be constructed from recognizable tree languages and behaviors of very simple weighted unranked tree automata by using operations like relabelings and intersections. Indeed, it even suffices to take functions mapping tree labels to tree valuation monoid elements instead of the very simple weighted unranked tree automata. It is clear that these functions define simple, recognizable tree series. Together with the results of [12], our present main result shows that the weighted unranked tree automata model of [12] is robust. In comparison to the proofs of the Nivat theorem for words (cf. [14]), for unranked trees technical difficulties arise from the technically more complex extended runs. Moreover, we also give a Nivat theorem for weighted ranked tree automata over tree valuation monoids.

2 Preliminaries

Let \(\mathbb {N}= \{1, 2, \ldots \}\) be the set of all natural numbers and \(\mathbb {N}_0=\mathbb {N}\cup \{0\}\). For a set H, we denote by |H| the cardinality of H and by \(H^*\) the set of all finite words over H. The empty word is denoted by \(\varepsilon \). For sets \(H_1,\dots ,H_n\) and \(x\in H_1\times \ldots \times H_n\), \(x_i\) equals the i-th component of x.

2.1 Trees and Tree Valuation Monoids

A tree domain is a finite, non-empty subset \(\mathcal {B}\) of \(\mathbb {N}^*\) such that for all \(u\in \mathbb {N}^*\) and \(i\in \mathbb {N}\), if \(u.i\in \mathcal {B}\), then \( u,\ u.1,\dots ,\ u.(i-1)\in \mathcal {B}\). An unranked tree over a set H (of labels) is a mapping \(t:\mathcal {B} \rightarrow H\) such that \({\text {dom}}(t)=\mathcal {B}\) is a tree domain. The set of all unranked trees over H is denoted by \(U_H\). For every \(h\in H\), we denote also by h the particular tree defined by \(t:\{\varepsilon \}\rightarrow H\) and \(t(\varepsilon )=h\). Let \(t\in U_H\). The elements of \({\text {dom}}(t)\) are called positions of t. Let \(u\in {\text {dom}}(t)\). We call t(u) the label of t at u. The rank \({\text {rk}}_t(u)\) of u is defined to be \(\max \{i\in \mathbb {N}\mid u.i\in {\text {dom}}(t)\}\). If \({\text {rk}}_t(u)=0\), then u is also called a leaf of t. We denote by \({\text {leaf}}(t)\) the set of all leaves of t.

A tree valuation monoid (tv-monoid for short) [11, 15] is a quadruple \((D,+,{\text {Val}},\mathbb {0})\) such that \((D,+,\mathbb {0})\) is a commutative monoid and \({\text {Val}}:U_D \rightarrow D\) is a function, called (tree) valuation function, which satisfies that \({\text {Val}}(d)=d\) for every tree \(d\in D\), and \({\text {Val}}(t)=\mathbb {0}\) for every \(t\in U_D\) with \(\mathbb {0}\in {\text {im}}(t)\).

Example 1

\(\mathbb {Q}_{\max }=(\mathbb {Q}\cup \{-\infty \},\max ,{\text {avg}},-\infty )\) with \({\text {avg}}(t)=\frac{\sum _{u\in {\text {dom}}(t)}t(u)}{|{\text {dom}}(t)|}\) for all \(t\in U_{\mathbb {Q}\cup \{-\infty \}}\) is a tv-monoid. The valuation function of this tv-monoid calculates the average of all weights of a tree. The idea for the average calculation was already suggested in [8, 16] for words and in [11] for trees.

2.2 Weighted Unranked Tree Automata

Here we recall the definition of the class of recognizable tree series which were introduced in connection with restricted weighted MSO logics, cf. [12]. Since weighted unranked tree automata use weighted word automata we first recall the definition of weighted word automata over tree valuation monoids.

Let \(\mathbb D\,\) be a tv-momoid and \(\varSigma \) an alphabet, i.e. a non-empty, finite set. A weighted word automaton over \(\varSigma \) and \(\mathbb D\,\) is a quadruple \(\mathcal {A}=(P,I,\mu ,F)\) where P is a non-empty, finite set of states, \(I,F\subseteq P\) are the sets of initial and final states, respectively, and \(\mu :P\times \varSigma \times P\rightarrow \mathbb D\,\). A run of \(\mathcal {A}\) on \(w=w_1\dots w_n\) with \(w_1,\dots ,w_n\in \varSigma \) and \(n\ge 0\) is a sequence \(\pi =(p_{i-1},w_i,p_i)_{1\le i\le n}\) if \(n>0\), and a state \(\pi =p_0\) if \(n=0\) where \(p_0,\dots ,p_n\in P\). The run \(\pi \) is successful if \(p_0\in I\) and \(p_n\in F\). In order to define the weight \({\text {wt}}(\pi )\) of \(\pi \) using a tree valuation function \({\text {Val}}\), we define a tree \(t_\pi \) by letting \({\text {dom}}(t_\pi )=\{1^i\mid 0\le i< n\}\) and \(t_\pi (1^{i})=\mu (p_{i-1},w_i,p_i)\) (\(0\le i< n\)) if \(n>0\), and \(t_\pi (\varepsilon )=\mathbb {0}\) if \(n=0\). Then let \({\text {wt}}(\pi )={\text {Val}}(t_\pi )\). The behavior of \(\mathcal {A}\) is the function \(\Vert \mathcal {A} \Vert :\varSigma ^*\rightarrow \mathbb D\,\) with \(\Vert \mathcal {A} \Vert (w)=\sum _{\pi \text { successful run on } w}{\text {wt}}(\pi )\) for \(w\in \varSigma ^*\).

A weighted unranked tree automaton (WUTA for short) over \(\varSigma \) and \(\mathbb D\,\) is a triple \(\mathcal {M}=(Q,\mathcal {A},\gamma )\) where Q is a non-empty, finite set of states, \(\mathcal {A}=(\mathcal {A}_{q,a}\mid q\in Q,a\in \varSigma )\) is a family of weighted word automata over Q as alphabet and \(\mathbb D\,\), and \(\gamma :Q\rightarrow \mathbb D\,\) is a root weight function. Let \(\mathcal {A}_{q,a}=(P_{q,a},I_{q,a},\mu _{q,a},F_{q,a})\) for all \(q\in Q\), \(a\in \varSigma \). We assume the sets \(P_{q,a}\) to be pairwise disjoint and let \(P_\mathcal {A}=\bigcup _{q\in Q,a\in \varSigma }P_{q,a}\). Moreover, let \(\mu _\mathcal {A}\) be the union of the transition functions \(\mu _{q,a}\).

Intuitively, an extended run assigns a state \(q\in Q\) to each position u of a given tree \(t\in U_\varSigma \) and then consists of one run of \(\mathcal {A}_{q,t(u)}\) on \(q_1\dots q_{{\text {rk}}_t(u)}\) where \(q_i\) is the state assigned to the i-th child of u. Formally, an extended run of \(\mathcal {M}\) on a tree t is a triple (qsl) such that

  • \(q\in Q\) is the root state;

  • \(s:{\text {dom}}(t)\setminus \{\varepsilon \}\rightarrow P_\mathcal {A}\times Q\times P_\mathcal {A}\) is a function such that \(s(1) \dots s({\text {rk}}_t(\varepsilon ))\) is a run of \(\mathcal {A}_{q,t(\varepsilon )}\) and \(s(u.1)\dots s(u.{\text {rk}}_t(u))\) is a run of \(\mathcal {A}_{s(u)_2,t(u)}\) for every \(u\in {\text {dom}}(t)\setminus ({\text {leaf}}(t)\cup \{\varepsilon \})\);

  • \(l:{\text {leaf}}(t)\rightarrow P_\mathcal {A}\) is a function satisfying \(l(\varepsilon )\in P_{q,t(\varepsilon )}\) if t only consists of the root, and if \(u\ne \varepsilon \) is a leaf, then \(l(u)\in P_{s(u)_2,t(u)}\).

An extended run (qsl) is successful if \(s(u.1)\dots s(u.{\text {rk}}_t(u))\) is successful for all \(u\in {\text {dom}}(t)\setminus {\text {leaf}}(t)\) and if l(u) is successful for all \(u\in {\text {leaf}}(t)\) (i.e., l(u) is an initial and final state of \(\mathcal {A}_{s(u)_2,t(u)}\) if \(u\ne \varepsilon \) respectively of \(\mathcal {A}_{q,t(\varepsilon )}\) if \(u=\varepsilon \)). We let \({\text {succ}}(\mathcal {M},t)\) denote the set of all successful extended runs of \(\mathcal {M}\) on t.

We will define the local weight of a position u by the weight of the transition taken for u in the run of the parent of u. This gives a tree \(\mu (t,(q,s,l))\in U_\mathbb D\,\) of weights with the same domain as t; then we apply \({\text {Val}}\) to obtain the weight of the run (qsl) on t. Formally, we define a tree \(\mu (t,(q,s,l))\in U_\mathbb D\,\) where \({\text {dom}}(\mu (t,(q,s,l)))={\text {dom}}(t)\) and

$$ \mu (t,(q,s,l))(u) = {\left\{ \begin{array}{ll}\gamma (q)&{}\text {if }u=\varepsilon ,\\ \mu _\mathcal {A}(s(u))&{}\text {otherwise} \end{array}\right. } $$

for all \(u \in {\text {dom}}(t)\). We call \(\mu (t,(q,s,l))(u)\) the local weight of u. Then \({\text {Val}}(\mu (t,(q,s,l)))\) is the weight of (q, s, l) on t. The behavior of a WUTA \(\mathcal {M}\) is the function \(\Vert \mathcal {M} \Vert :U_\varSigma \rightarrow \mathbb D\,\) defined by

$$\begin{aligned} \Vert \mathcal {M} \Vert (t) = \sum _{(q,s,l)\in {\text {succ}}(\mathcal {M},t)}{\text {Val}}(\mu (t,(q,s,l))) \end{aligned}$$

for all \(t \in U_{\varSigma }\). If no successful extended run on t exists, we put \(\Vert \mathcal {M} \Vert (t) = \mathbb {0}\).

Any mapping from \(U_\varSigma \) to \(\mathbb D\,\) is called an (unranked) tree series. A tree series \(S:U_\varSigma \rightarrow \mathbb D\,\) is called recognizable over \(\mathbb D\,\) if there is a WUTA \(\mathcal {M}\) over \(\varSigma \) and \(\mathbb D\,\) with \(\Vert \mathcal {M} \Vert =S\).

Remark:

Every unranked tree automaton M (see [32] for a definition of unranked tree automata) over an alphabet \(\varSigma \) can be seen as a weighted unranked tree automaton over \(\varSigma \) and the boolean semiring \(\mathbb {B}=(\{0,1\},\wedge ,\vee ,0,1)\). Let \(M=(Q,A,\gamma )\) be an unranked tree automaton. In the following, we identify the weight functions \(\gamma \) and \(\mu _{q,a}\) (\(q\in Q, a\in \varSigma \)) with their support sets. The language of M is defined as follows. We say that M recognizes \(t\in U_\varSigma \) if there is a function \(r:{\text {dom}}(t)\rightarrow Q\) (which we call successful run) with \(r(\epsilon )\in \gamma \) and such that there is a successful run of \(A_{r(u),t(u)}\) on \(r(u.1)\dots r(u.{\text {rk}}(t(u)))\) with \({\text {wt}}(r(u.1)\dots r(u.{\text {rk}}(t(u))))=1\) for all \(u\in {\text {dom}}(t)\). The tree language recognized by M is defined by \(L(M)=\{t\in U_\varSigma \ |\ M \text { recognizes } t\}\). A tree language \(L\subseteq U_\varSigma \) is recognizable if there is a unranked tree automaton M with \(L(M)=L\). Note that this behavior definition is expressively equivalent to the earlier extended run behavior over the boolean semiring, nevertheless, we later use this behavior definition to avoid the syntactically more complex extended runs.

Fig. 1.
figure 1

Sub-automata of the example WUTA \(\mathcal {M}\). Here, incoming arrows symbolize that a state is initial whereas a double border indicates that a state is final. An edge from one state \(p_1\) to another state \(p_2\) (\(p_1\) and \(p_2\) can be the same state) labeled with ad stands for the transition \((p_1,a,p_2)\) with weight d. Transitions with weight \(-\infty \) are omitted.

Example 2

Let \(\mathbb {Q}_{\max }\) be the tv-monoid from Example 1 and \(\varSigma \) be an arbitrary, but fixed alphabet. In [12], a WUTA \(\mathcal {M}\) which calculates the leaves-to-size ratio of a given input tree were given. The size of a tree is the number of all positions of the tree. For sake of completeness we include the definition of \(\mathcal {M}\) and some considerations concerning \(\mathcal {M}\)’s behavior here. Let \(\mathcal {M}=(\{c,n\},\mathcal {A},\gamma )\) over \(\varSigma \) with \(\gamma (c)=1\), \(\gamma (n)=0\), and

  • \(\mathcal {A}_{n,a}=(\{i,f\},\{i\},\mu _{n,a},\{f\})\) where \(\mu _{n,a}(i,n,f)=\mu _{n,a}(f,n,f)=0\), \(\mu _{n,a}(i,c,f)=\mu _{n,a}(f,c,f)=1\) and \(\mu _{n,a}(f,q,i)=\mu _{n,a}(i,q,i)=-\infty \)

  • \(\mathcal {A}_{c,a}=(\{p\},\{p\},\mu _{c,a},\{p\})\) where \(\mu _{c,a}(p,q,p)=-\infty \)

for all \(q\in \{c,n\}\) and \(a\in \varSigma \); for notational convenience, here we have dropped the condition on pairwise disjointness of the state sets. The sub-automata \(\mathcal {A}_{n,a}\) and \(\mathcal {A}_{c,a}\) depicted in Fig. 1 for some \(a\in \varSigma \).

First, let us consider an example tree. For this, we choose \(\varSigma =\{\alpha ,\beta \}\) and the tree

Then (nsl) with

is an extended run on \(t_\text {ex}\). Note that the domain of s excludes \(\varepsilon \). therefore, above, the position \(\varepsilon \) is unlabeled for s. Similarly, the function l has leaves of \(t_\text {ex}\) as domain; therefore, above, the positions \(\varepsilon \) and 1 are unlabeled for l.

Obviously (nsl) is successful, since the runs \(s(1)s(2)=(i,c,f)(f,n,f)\) and \(s(2.1)=(i,c,f)\) are successful in \(\mathcal {A}_{n,\alpha }\) and \(\mathcal {A}_{n,\beta }\), respectively, and the run p is successful in \(\mathcal {A}_{c,\alpha }\) as well as in \(\mathcal {A}_{c,\beta }\). The local weights of (nsl) are

figure a

and thus the weight of (nsl) equals \(\frac{1}{2}\).

Now let t be an arbitrary, but fixed tree. It is easy to see that for every successful extended run (qsl) on t, \(l(u)=p\) for every leaf u of t. Assume that in addition (qsl) assigns the state n to each inner position of t. Let \(\pi _u\) be the unique run of \(\mathcal {A}_{n,t(u)}\) for which \(t_{\pi _u}\) has no label equal to \(-\infty \), thus, \(\pi _u\) leads directly from i to f and finally loops in f. If (qsl) consists for every inner position \(u\ne \varepsilon \) of \(\pi _u\), then (qsl) is the only successful extended run such that \(\mu (t,(q,s,l))\) does not contain \(-\infty \). Let \(\pi \) denote this unique extended run. For leaves u of t, \(\mu (t,\pi )(u)=1\) and for inner positions \(u'\), \(\mu (t,\pi )(u')=0\). Thus,

We will recall some properties of recognizable unranked tree series. Let \(L\subseteq U_\varSigma \) and \(S:U_\varSigma \rightarrow \mathbb D\,\). We define the restriction of S on L by the tree series \(S\cap L:U_\varSigma \rightarrow \mathbb D\,\) by letting \((S\cap L)(t) = S(t)\) if \(t\in L\) and \((S\cap L)(t)=\mathbb {0}\) if \(t\notin L\).

Proposition 3

[12, Lemma 3.4(2)]. Let \(\mathbb D\,\) be a tv-monoid, \(L\subseteq U_\varSigma \) and \(S:U_\varSigma \rightarrow \mathbb D\,\) be recognizable. Then \(S \cap L\) is also recognizable.

Now we consider the closure under relabeling, similarly to [16, 19]. Let \(\varSigma \) and \(\varGamma \) be two alphabets and \(h:\varSigma \rightarrow 2^{\varGamma }\) be a mapping. Then h can be extended to a mapping \(h':U_\varSigma \rightarrow 2^{U_\varGamma }\) by letting \(h'(t)\) be the set of all unranked trees \(t'\) over \(\varGamma \) such that \({\text {dom}}(t') = {\text {dom}}(t)\) and \(t'(w) \in h(t(w))\) for each position \(w \in {\text {dom}}(t)\). For every \(S:U_\varSigma \rightarrow \mathbb D\,\) the tree series \(h''(S):U_\varGamma \rightarrow \mathbb D\,\) is defined by

$$\begin{aligned} h''(S)(t') = \sum _{t \in U_\varSigma \, \wedge \, t' \in h'(t) } S(t) \end{aligned}$$

for all \(t'\in U_\varGamma \). Clearly, the index set of the summation is finite. We denote \(h'\) and \(h''\) also by h which we call a relabeling.

Proposition 4

[12, Lemma 3.6]. Recognizable tree series are closed under relabeling.

3 Nivat-Classes for Unranked Trees

Here we will define the set of all tree series which can be constructed from recognizable tree languages and behaviors of very simple weighted unranked tree automata by using operations like relabelings and intersections. Inspired by Weidner [35], we will call this set Nivat-class for unranked trees.

For the rest of this section let \(\mathbb D\,=(D,+,{\text {Val}},\mathbb {0})\) be a tv-monoid, \(\varGamma \) be an alphabet and \({\text {g}}:\varGamma \rightarrow D\) be a function. The function \({\text {g}}\) later assigns labels to valuation monoid elements. The extension \({\text {g}}':U_\varGamma \rightarrow U_D\) of \({\text {g}}\) is defined by \({\text {g}}'(t)(u)={\text {g}}(t(u))\) for all \(t\in \mathcal {O}(\varGamma )\) and \(u\in {\text {dom}}(t)\). In the following we denote \({\text {g}}'\) also by \({\text {g}}\). Then \({\text {Val}}\circ {\text {g}}\) assigns to each tree \(t\in U_\varGamma \) the weight \({\text {Val}}({\text {g}}(t))\).

Definition 5

Let \(\varSigma \) be an alphabet, \(\mathbb D\,\) be a tv-monoid. The Nivat-class \(\mathcal {N}_\mathbb D\,(U_\varSigma )\) for unranked trees consists of all \(S:U_\varSigma \rightarrow \mathbb D\,\) for which there are:

  • an alphabet \(\varGamma \),

  • a recognizable tree language \(L\subseteq U_\varGamma \),

  • a relabeling \(h:\varGamma \rightarrow \varSigma \),

  • a function \({\text {g}}:\varGamma \rightarrow D\)

such that

$$\begin{aligned} S=h(({\text {Val}}\circ {\text {g}})\cap L). \end{aligned}$$

Example 6

Let \(\mathbb {Q}_{\max }\) be the tv-monoid of Example 1 and \(\varSigma \) be an arbitrary, but fixed alphabet. We will show that the tree series defined by the WUTA in Example 2, which calculates the leaves-to-size ratio of trees, is in \(\mathcal {N}_{\mathbb {Q}_{\max }}(U_\varSigma )\). For this, let

  • \(\varGamma =\varSigma \times \{\text {leaf},\text {noleaf}\}\),

  • \(L=\{t\in U_\varGamma \mid \forall u\in {\text {dom}}(t)\setminus {\text {leaf}}(t):t(u)_2=\text {noleaf}\)

                      \(\wedge \forall u\in {\text {leaf}}(t):t(u)_2=\text {leaf}\}\),

  • \(h(a)=a_1\) for all \(a\in \varGamma \),

  • \(g(a,\text {leaf})=1\) and \(g(a,\text {noleaf})=0\) for all \(a\in \varSigma \).

It is easy to check that L is indeed recognizable and that

for all \(t\in U_\varSigma \).

Our main result will show that the Nivat-class for unranked trees and the set of all recognizable tree series are the same. For the proof of the inclusion of the Nivat-class in the set of all recognizable tree series we first prove that \({\text {Val}}(g)\) is recognizable by an especially simple weighted tree automaton \(\mathcal {M}\). In particular, we can choose \(\mathcal {M}\) with state set \(\varGamma \) and \(\mathcal {A}=(A_{q,a}\mid q\in \varGamma ,a\in \varGamma )\) where \(\mathcal {A}_{q,a}\) has only one state.

Lemma 7

Let \(\varGamma \) be an alphabet, \(\mathbb D\,=(D,+,{\text {Val}},\mathbb {0})\) be a tv-monoid, and \({\text {g}}:\varGamma \rightarrow D\). Then \({\text {Val}}\circ {\text {g}}\) is recognizable.

Proof

We will build a WUTA \(\mathcal {M}\) which recognizes \({\text {Val}}\circ {\text {g}}\). Basically, \(\mathcal {M}\) will have exactly one successful extended run per input tree \(t\in U_\varGamma \). The local weight of this extended run at a position u shall be \({\text {g}}(t(u))\). For this we set the state set of \(\mathcal {M}\) to \(\varGamma \) and let the root weight of state \(a\in \varGamma \) be \({\text {g}}(a)\). Moreover, each subautomaton \(\mathcal {A}_{a,a}\) (\(a\in \varGamma \)) of \(\mathcal {M}\) has only one state q which is initial and final (so that there is only one successful run), and each transition of \(\mathcal {A}_{a,a}\) labeled with \(b\in \varGamma \) has weight \({\text {g}}(b)\) (this generates the local weight which we would like to produce). All other subautomata do not produce any successful run. The latter secures that one only gets a successful extended run if for a-labeled positions \(\mathcal {A}_{a,a}\) is called. Formally, let \(\mathcal {M}=(\varGamma , \mathcal {A},\gamma )\) with \(\gamma (a)=g(a)\) for all \(a\in \varGamma \) and \(\mathcal {A}=(\mathcal {A}_{q,a}\mid q\in \varGamma , a\in \varGamma )\) with \(\mathcal {A}_{a,a}=(\{q\},\{q\},\mu _{a,a},\{q\})\) with \(\mu _{a,a}(b)=g(b)\) for all \(b\in \varGamma \) and \(\mathcal {A}_{c,a}=(\{q\},\{q\},\underline{\mathbb {0}},\emptyset )\) for all \(a,c\in \varGamma \) with \(a\ne c\) where \(\underline{\mathbb {0}}\) denotes a function which maps all triples to \(\mathbb {0}\).

Using the intuition behind \(\mathcal {M}\) given above, one can easily check that \(\Vert \mathcal {M} \Vert (t)={\text {Val}}(g(t))\) for all \(t\in U_\varGamma \).   \(\square \)

Now we prove our main theorem.

Theorem 8

Let \(\varSigma \) be an alphabet, \(\mathbb D\,\) be a tv-monoid, and \(S:U_\varSigma \rightarrow \mathbb D\,\) a tree series. Then S is recognizable iff \(S\in \mathcal {N}_\mathbb D\,(U_\varSigma )\).

Proof

We start with the proof of the “if”-implication. For this, let \(\varGamma \) be an alphabet, \(L\subseteq U_\varGamma \) a recognizable tree language, \(h:\varGamma \rightarrow \varSigma \) a relabeling, and \({\text {g}}:\varGamma \rightarrow D\) a function such that \(S=h(({\text {Val}}\circ {\text {g}})\cap L)\). By Lemma 7, \({\text {Val}}\circ {\text {g}}\) is recognizable, and thus, by Proposition 3, also \(({\text {Val}}\circ {\text {g}})\cap L\) is recognizable. Hence by Proposition 4, \(S=h(({\text {Val}}\circ {\text {g}})\cap L)\) is recognizable.

For the converse, let S be recognizable and \(\mathcal {M}=(Q,\mathcal {A},\gamma )\) be a WUTA with \(\Vert \mathcal {M} \Vert =S\). Moreover let \(\mathcal {A}_{q,a}=(P_{q,a},I_{q,a},\mu _{q,a},F_{q,a})\) for all \(q\in Q\), \(a\in \varSigma \). We assume the sets \(P_{q,a}\) to be pairwise disjoint and let \(P_\mathcal {A}=\bigcup _{q\in Q,a\in \varSigma }P_{q,a}\). Let \(\mu _\mathcal {A}\) be the union of the transition functions \(\mu _{q,a}\).

We will simulate the behavior of \(\mathcal {M}\) by appropriately chosen \(\varGamma \), L, h, and \({\text {g}}\). The main idea for the choice of \(\varGamma \), L, h, and \({\text {g}}\) is that L will be the set of successful extended runs of \(\mathcal {M}\), \({\text {g}}\) will determine the local weights of the extended runs in L, the valuation function \({\text {Val}}\) will calculate the weights of the extended runs, and h will be a projection of the extended runs on their related trees (this results in the “sum over all trees” since h is a relabeling). Since L shall be a set of trees over an alphabet \(\varGamma \), we have to encode (successful) extended runs by trees. As indicated in Example 2, each component q, s, and l, respectively, of an extended run (qsl) on a tree t can be viewed as a tree with possibly unlabeled positions over the tree domain \({\text {dom}}(t)\). Then q has labels in \(\varGamma _1=Q\), s in \(\varGamma _2=P_\mathcal {A}\times Q\times P_\mathcal {A}\), and l in \(P_\mathcal {A}\) whereby the root is the only labeled position in q and the only unlabeled position in s, respectively, and in l only leaves are labeled. We combine these three trees to one tree via building tuples from the labels of q, s, and l. The resulting tree then has labels in \(\varGamma _1\cup \varGamma _2 \cup \varGamma _3\) where \(\varGamma _3=((P_\mathcal {A}\times Q\times P_\mathcal {A})\times P_\mathcal {A})\). Thus we set \(\varGamma =(\varGamma _1\cup \varGamma _2 \cup \varGamma _3)\times \varSigma \). We built the Cartesian product with \(\varSigma \), so that later h can extract the related tree. Let \(t\in U_\varSigma \), (qsl) an extended run on t, and \(\tau \) a tree over \(\varGamma \). We say that \(\tau \) encodes the pair ((qsl), t) if \({\text {dom}}(t)={\text {dom}}(\tau )\) and \(\tau (\varepsilon )=(q,t(\varepsilon ))\), \(\tau (u)=(s(u),t(u))\) for all \(u\in {\text {dom}}(t)\setminus (\{\varepsilon \}\cup {\text {leaf}}(t))\), and \(\tau (u)=((s(u),l(u)),t(u))\) for all \(u\in {\text {leaf}}(t)\setminus \{\varepsilon \}\). From now on we identify a pair ((qsl), t) and its encoding \(\tau \in U_\varGamma \).

Now we can define \(\varGamma \), L, h, and \({\text {g}}\):

  • \(\varGamma =(\varGamma _1\cup \varGamma _2 \cup \varGamma _3)\times \varSigma \)

  • \(L=\{((q,s,l),t)\in U_\varGamma \mid (q,s,l)\in {\text {succ}}(\mathcal {M},t)\}\)

  • \(h(q',a)=a\)

  • \({\text {g}}(q',a)={\left\{ \begin{array}{ll}\gamma (a)&{}\text {if }q'\in \varGamma _1\\ \mu (p_1,q,p_2)&{}\text {if }q'=(p_1,q,p_2)\in \varGamma _2\text { or }q'=((p_1,q,p_2),p)\in \varGamma _3\end{array}\right. }\)

for \(a\in \varSigma \) and \((q',a)\in \varGamma \).

We show that L is actually recognizable. For this, we construct an unranked tree automaton \(M_\text {runs}=(Q_\text {runs},A,\gamma _\text {runs})\) which has only \(\tau \) as successful run on an input tree \(\tau =((q,s,l),t)\in U_\varGamma \) iff (qsl) is a successful extended run of \(\mathcal {M}\) on t. Thus, the state set \(Q_\text {runs}\) of \(M_\text {runs}\) will be \(\varGamma \) and only subautomata \(A_{\alpha ,\alpha }\) (\(\alpha \in \varGamma \)) actually accept a non-empty language. The subautomaton \(\mathcal {A}_{\alpha ,\alpha }\) of \(M_\text {runs}\) will be a version of the subautomaton \(\mathcal {A}_{q,a}\) (where q is the Q-component and a is the \(\varSigma \)-component, respectively, of \(\alpha \)) of \(\mathcal {M}\) without weights. Hence

  • \(\tau (1)\dots \tau ({\text {rk}}_\tau (\varepsilon ))\in L(\delta (\tau (\varepsilon ),\tau (\varepsilon )))\) iff \(s(1)\dots s({\text {rk}}_t(u))\) is a successful run in \(\mathcal {A}_{q,t(\varepsilon )}\),

  • \(\tau (u.1)\dots \tau (u.{\text {rk}}_\tau (u))\in L(\delta (\tau (u),\tau (u)))\) iff \(s(u.1)\dots s(u.{\text {rk}}_t(u))\) is a successful run in \(\mathcal {A}_{s(u)_2,t(u)}\) for all \(u\in {\text {dom}}(t)\setminus (\{\varepsilon \}\cup {\text {leaf}}(t))\),

  • \(\varepsilon \in L(\delta (\tau (u),\tau (u)))\) iff l(u) is a successful run in \(\mathcal {A}_{s(u)_2,t(u)}\) for all leaves u

for all trees \(\tau =((q,s,l),t)\in U_\varGamma \). To guarantee that trees accepted by \(M_\text {runs}\) have their root label in \(\varGamma _1\times \varSigma \), we let \(\varGamma _1\times \varSigma \) be the set of final states \(\gamma _\text {runs}\) of \(M_\text {runs}\). Moreover, subautomata associated to a state in \(\varGamma _2\times \varSigma \) will not accept the empty word, hence, \(M_\text {runs}\) does not allow runs where states in \(\varGamma _2\times \varSigma \) occur at leaf positions. Leaf positions shall be labeled with states in \(\varGamma _3\times \varSigma \). We achieve this by letting \(\mathcal {A}_{\alpha ,\alpha }\) for \(\alpha \in \varGamma _3\times \varSigma \) be a word automaton which at most accepts the empty word.

Formally, let \(M_\text {runs}=(\varGamma ,A,Q\times \varSigma )\) with

  • \(A_{\alpha ,\alpha }=(P_{q,a},I_{q,a},\mu _{q,a},F_{q,a})\) for \(\alpha =(q,a)\in (\varGamma _1\times \varSigma )\) where \(P_{q,a},I_{q,a},F_{q,a}\) are as in \(\mathcal {A}_{q,a}\) and

    $$\begin{aligned} \mu _{q,a}=\;&\{(p_1,(p_1,q',p_2),p_2)\mid p_1,p_2\in P_{q,a}, q'\in Q\}\\&\cup \{(p_1,((p_1,q',p_2),p),p_2)\mid p_1,p_2,p\in P_{q,a}, q'\in Q\} \end{aligned}$$
  • \(A_{\alpha ,\alpha }=(P_{q,a}\cup \overline{I_{q,a}},\overline{I_{q,a}},\mu _{q,a}',F_{q,a})\) for \(\alpha =((p_1',q,p_2'),a)\in (\varGamma _2\times \varSigma )\) where \(P_{q,a},I_{q,a},F_{q,a}\) are as in \(\mathcal {A}_{q,a}\), \(\overline{I_{q,a}}\) is a disjoint copy of \(I_{q,a}\), and

    $$\begin{aligned} \mu _{q,a}'=&\;T_{q,a}\cup \{(\overline{i_1},(i,q',p_2),p_2)\mid i\in I_{q,a},p_2\in P_{q,a}, q'\in Q\}\\&\cup \{(\overline{i},((i,q',p_2),p),p_2)\mid i\in I_{q,,a},p_2,p\in P_{q,a}, q'\in Q\} \end{aligned}$$

    with

    $$\begin{aligned} \mu _{q,a}=\;&\{(p_1,(p_1,q',p_2),p_2)\mid p_1,p_2\in P_{q,a}, q'\in Q\}\\&\cup \{(p_1,((p_1,q',p_2),p),p_2)\mid p_1,p_2,p\in P_{q,a}, q'\in Q\} \end{aligned}$$
  • \(A_{\alpha ,\alpha }=(\{p_\alpha \},\{p_\alpha \},\emptyset ,\{p_\alpha \})\) for \(\alpha =(((p_1',q,p_2'),p'),a)\in (\varGamma _3\times \varSigma )\) with \(p'\in I_{q,a}\cap F_{q,a}\) and \(A_{\alpha ,\alpha }=(\{p_\alpha \},\{p_\alpha \},\emptyset ,\emptyset )\) for \(\alpha =(((p_1',q,p_2'),p'),a)\) with \(p'\notin I_{q,a}\cap F_{q,a}\)

  • \(A_{\alpha ,\beta }=(\{p\},\{p\},\emptyset ,\emptyset )\)

for \(\alpha ,\beta \in \varGamma \) and \(\alpha \ne \beta \). One can easily prove that \(L(M_\text {runs})=L\).

Now let \(t\in U_\varSigma \). Then

$$\begin{aligned} h(({\text {Val}}\circ {\text {g}})\cap L)(t)&=\sum _{\tau \in U_\varGamma \wedge t\in h(\tau )}(({\text {Val}}\circ {\text {g}})\cap L)(\tau )\\&=\sum _{\tau =((q,s,l),t)\wedge (q,s,l)\in {\text {succ}}(\mathcal {M},t)}{\text {Val}}(g(\tau ))\\&=\sum _{\tau =((q,s,l),t)\wedge (q,s,l)\in {\text {succ}}(\mathcal {M},t)}{\text {Val}}(\mu (t,(s,q,l)))\\&=\sum _{(q,s,l)\in {\text {succ}}(\mathcal {M},t)}{\text {Val}}(\mu (t,(s,q,l)))\\&=\Vert \mathcal {M} \Vert (t)\,. \end{aligned}$$

   \(\square \)

4 The Ranked Tree Case

In this section we will show a version of Theorem 8 for ranked trees. For this, we briefly recall the definitions of ranked alphabets, ranked trees and weighted ranked tree automata over tv-monoids as well as some considerations on the relationship between weighted ranked tree automata and weighted unranked tree automata.

A ranked alphabet is a pair \((\varSigma ,{\text {rk}}_\varSigma )\), where \(\varSigma \) is an alphabet and \({\text {rk}}_\varSigma :\varSigma \rightarrow \mathbb {N}_0\) is a mapping which assigns to each symbol of \(\varSigma \) its rank. We denote by \(\varSigma ^{(k)}\) the set of all symbols which have rank k and by \(a^{(k)}\) that a is in \(\varSigma ^{(k)}\). Usually we drop \({\text {rk}}_\varSigma \) and denote a ranked alphabet simply by \(\varSigma \). In this paper we assume that \(\varSigma ^{(0)}\ne \emptyset \). We define \(\max _\varSigma =\max \{{\text {rk}}_\varSigma (\sigma )\ |\ \sigma \in \varSigma \}\). A ranked tree over a ranked alphabet \(\varSigma \) is an unranked tree over the set \(\varSigma \) such that for all \(u\in {\text {dom}}(t)\), \({\text {rk}}_t(u)=k\) whenever \(t(u)\in \varSigma ^{(k)}\). We denote the set of all ranked trees over \(\varSigma \) by \(T_\varSigma \).

Let \(\varSigma \) be a ranked alphabet and \((D,+,{\text {Val}},\mathbb {0})\) a tv-monoid. A weighted ranked tree automaton (WRTA for short) over \(\varSigma \) and D is a triple \(\mathcal {M} = (Q, \mu , F)\) where Q is a non-empty finite set of states, \(\mu = (\mu _m)_{0\le m \le \max _\varSigma }\) is a family of transition mappings \(\mu _m:Q^m \times \varSigma ^{(m)}\times Q \rightarrow D\), and \(F \subseteq Q\) is a set of final states.

A run r of \(\mathcal {M}\) on a tree \(t\in T_\varSigma \) is a mapping \(r:{\text {dom}}(t) \rightarrow Q\). Since the domain of a run is a tree domain, each run r on t defines a tree \(\mu (t,r)\in T_D\) where \({\text {dom}}(\mu (t,r))={\text {dom}}(t)\) and \(\mu (t,r)(u) = \mu _m(r(u.1) \ldots r(u.m),t(u),r(u))\) with \(t(u)\in \varSigma ^{(m)}\) for all \(u \in {\text {dom}}(t)\). We call r on t successful if \(r(\varepsilon ) \in F\). The behavior of a WRTA \(\mathcal {M}\) is the function \(\Vert \mathcal {M} \Vert :T_\varSigma \rightarrow \mathbb D\,\) defined by

$$\begin{aligned} \Vert \mathcal {M} \Vert (t) = \sum \bigl ({\text {Val}}(\mu (t,r))\,|\,r\text { is successful run of} \, \mathcal {M}\, \mathrm{on} \, t\bigr ) \end{aligned}$$

for all \(t \in T_{\varSigma }\). If no successful run on t exists, we put \(\Vert \mathcal {M} \Vert (t) = \mathbb {0}\).

A ranked tree series is a mapping \(S:T_{\varSigma } \rightarrow \mathbb D\,\). A tree series S is called recognizable if \(S = \Vert \mathcal {M} \Vert \) for some WRTA \(\mathcal {M}\). As is well-known, a ranked tree automaton (cf. [9]) can be seen as a weighted ranked tree automaton over the boolean semiring, and conversely.

In passing, we note the following result. It shows that on ranked trees WRTA and WUTA have the same expressive power.

Proposition 9

[24, Lemma 3.10 and Lemma 3.11]. Let \(\varSigma \) be a ranked alphabet.

  1. 1.

    For every WRTA \(\mathcal {N}\) over \(\varSigma \) there exists a WUTA \(\mathcal {M}\) such that \(\Vert \mathcal {M} \Vert (t)=\Vert \mathcal {N} \Vert (t)\) for all \(t\in T_\varSigma \) and \(\Vert \mathcal {M} \Vert (t)=\mathbb {0}\) for all \(t\in U_\varSigma \setminus T_\varSigma \).

  2. 2.

    For every WUTA \(\mathcal {M}\) over \(\varSigma \) there exists a WRTA \(\mathcal {N}\) such that \(\Vert \mathcal {N} \Vert (t)=\Vert \mathcal {M} \Vert (t)\) for all \(t\in T_\varSigma \).

Now we define a Nivat-class for ranked trees. For this, we define for all \(L\subseteq T_\varSigma \) and \(S:T_\varSigma \rightarrow \mathbb D\,\) the restriction of S on L analogously to the respective definition of the restriction of a tree series and an unranked tree language. Moreover, let \(\varSigma \) and \(\varGamma \) be two ranked alphabets and \(h:\varSigma \rightarrow 2^{\varGamma }\) be a mapping with \({\text {rk}}_\varGamma (b)={\text {rk}}_\varSigma (a)\) for all \(b\in h(a)\). We extend h to a mapping \(h'\) from ranked trees over \(\varSigma \) to the power set of ranked trees over \(\varGamma \), and afterwards to a mapping \(h''\) from ranked tree series over \(\varSigma \) and \(\mathbb D\,\) to ranked tree series over \(\varGamma \) and \(\mathbb D\,\) analogously as we did in the unranked tree case. Again, we denote \(h'\) and \(h''\) also by h which we call a relabeling. Let \(\varGamma \) be a ranked alphabet, \({\text {g}}:\varGamma \rightarrow D\) be a function, and the extension \({\text {g}}':U_\varGamma \rightarrow U_D\) of \({\text {g}}\) be defined by \({\text {g}}'(t)(u)={\text {g}}(t(u))\) for all \(t\in T_\varGamma \) and \(u\in {\text {dom}}(t)\). In the following we denote \({\text {g}}'\) also by \({\text {g}}\).

Definition 10

The Nivat-class \(\mathcal {N}_\mathbb D\,(T_\varSigma )\) for ranked trees consists of all \(S:T_\varSigma \rightarrow \mathbb D\,\) for which there are:

  • an alphabet \(\varGamma \),

  • a recognizable ranked tree language \(L\subseteq T_\varGamma \),

  • a relabeling \(h:\varGamma \rightarrow \varSigma \),

  • a function \({\text {g}}:\varGamma \rightarrow D\)

such that

$$\begin{aligned} S=h(({\text {Val}}\circ {\text {g}})\cap L)\,. \end{aligned}$$

Example 11

Let \(\varSigma \) be a ranked alphabet. We will show that also the ranked tree series which calculates the leaves-to-size ratio of trees is in \(\mathcal {N}_{\mathbb {Q}_{\max }}(T_\varSigma )\). We let \(\varGamma =\varSigma \), \(L=T_\varGamma \), \(h(a)=a\) for all \(a\in \varGamma \), and \(g(a^{(0)})=1\) and \(g(a^{(k)})=0\) for all \(a\in \varSigma \) and \(k>0\). Obviously L is recognizable and \(h(({\text {Val}}\circ {\text {g}})\cap L)\) calculates the leaves-to-size ratio of ranked trees. A WRTA that recognizes \(h(({\text {Val}}\circ {\text {g}})\cap L)\) was given in [11].

Next we prove a Nivat theorem for ranked trees.

Theorem 12

Let \(\varSigma \) be a ranked alphabet, \(\mathbb D\,\) a tv-monoid, and \(S:T_\varSigma \rightarrow \mathbb D\,\) a tree series. Then S is recognizable iff \(S\in \mathcal {N}_\mathbb D\,(T_\varSigma )\).

Proof

Let \(\varGamma \) be an alphabet, \(L\subseteq T_\varGamma \) a recognizable ranked tree language, \(h:\varGamma \rightarrow \varSigma \) a relabeling, and \({\text {g}}:\varGamma \rightarrow D\) a function such that \(S=h(({\text {Val}}\circ {\text {g}})\cap L)\). As is easy to see, \({\text {Val}}\circ {\text {g}}\) is accepted by a one state automaton \(\mathcal {M}\). Indeed, \(\mathcal {M}=(\{q\},\mu ,\{q\})\) with \(\mu _m(q\dots q,a,q)=g(a)\) for all \(a\in \varGamma ^{(m)}\) and \(m\in \mathbb {N}\). By the versions of Proposition 3 and Proposition 4 for ranked trees (cf. [11]), \(({\text {Val}}\circ {\text {g}})\cap L\) is recognizable, and thus, \(S=h(({\text {Val}}\circ {\text {g}})\cap L)\) is recognizable.

For the converse implication, let \(S=\Vert \mathcal {M} \Vert \) for some WRTA \(\mathcal {M}=(Q,\mu ,F)\). Then let

  • \(\varGamma =\bigcup _{0\le m\le \max _\varSigma }Q^m\times \varSigma ^{(m)}\times Q\) with \({\text {rk}}(q_1\dots q_m,a,q)={\text {rk}}(a)\) for all \(q_1,\dots ,q_m,q\in Q\), \(a\in \varSigma ^{(m)}\) be the set of all transitions of \(\mathcal {M}\),

  • \(L=\{t\in T_\varGamma \mid \forall u\in {\text {dom}}(t)\text { with }t(u)\in \varGamma ^{(m)}:t(u.i)_{3}=(t(u)_1)_i\text { for all } 1\le i\le m\text { and }t(\varepsilon )_3\in F\}\) describing the set of all successful runs of \(\mathcal {M}\),

  • \(h((q_1,\dots , q_m),a,q)=a\),

  • \(g((q_1,\dots , q_m),a,q)=\mu _m(q_1\dots q_m,a,q)\)

for all \(q_1,\dots ,q_m,q\in Q\), \(a\in \varSigma ^{(m)}\). One can check that L is recognizable and \(h(({\text {Val}}\circ {\text {g}})\cap L)=S\).    \(\square \)

5 Conclusion

We proved two Nivat theorems for weighted unranked tree automata and for weighted ranked tree automata over tree valuation monoids.

In [17, 33, 34], the Nivat theorem was used to show the expressive equivalence of a suitable MSO logic and the respective automata model. We think that, similarly, Theorem 8 could be used to derive an alternative proof to the one in [12] showing that the weighted MSO logic defined there and weighted unranked tree automata over tree valuation monoids are expressively equivalent.