1 Introduction

Two words \(u, v\) over an alphabet \(\Sigma \) are said to be Parikh equivalent if, for each \(a \in \Sigma \), the number of occurrences of a in u and v is the same. The Parikh image of a language L is the set of Parikh equivalence classes of words in L. One of the most celebrated results in automata theory, Parikh’s theorem [27], states that for any context-free language L, there is a regular language \(L'\) such that the Parikh images of L and \(L'\) are the same. For example, the context-free language \(\left\{ \mathtt {a}^n\mathtt {b}^n \mid n \ge 0 \right\} \) has the same Parikh image as the regular language \((ab)^*\); both Parikh images consist only of those equivalence classes where the number of as equals the number of bs. An important and immediate consequence of this result is that every context-free language over a unary alphabet is in fact regular. Parikh’s theorem has found many applications: in automata theory, to prove non-context-freeness of languages [13] and to solve decision problems for membership, universality and inclusions involving context-free languages and semi-linear sets [11, 17,18,19]; in verification of subclasses and extensions of counter machines [8, 11, 14, 15, 20, 23, 33, 35, 37]; in automata and logics over unranked trees with counting [2, 34]; and in PAC-learning [23].
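Parikh equivalence is easy to probe computationally on finite fragments of languages; the following sketch (function names and encoding are ours, not from the paper) compares truncations of the two example languages.

```python
from collections import Counter

def parikh(w):
    # Parikh image of a word: the map sending each symbol to its count
    return Counter(w)

def parikh_image(language):
    # Parikh image of a (finite fragment of a) language: the set of
    # Parikh images of its words, frozen so they can live in a set
    return {frozenset(parikh(w).items()) for w in language}

# {a^n b^n : n >= 0} versus the regular language (ab)^*, truncated at n = 6
cf = ["a" * n + "b" * n for n in range(7)]
reg = ["ab" * n for n in range(7)]
assert parikh_image(cf) == parikh_image(reg)
```

The truncated images agree because each class with equally many as and bs contains exactly one word of each language.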

Weighted automata [10, 32] are a generalization of classical automata (finite or otherwise) in which each transition has an associated weight from a semiring. Recall that a semiring is an algebra with two operations \(\oplus \) and \(\otimes \) such that \(\oplus \) is a commutative monoid operation, \(\otimes \) is a monoid operation, \(\otimes \) distributes over \(\oplus \), and the identity of \(\oplus \) is an annihilator for \(\otimes \). Unlike classical automata, which compute Boolean-valued functions over words, weighted automata compute more general functions over words—the weight of an accepting computation is the product of the weights of its transitions, and the weight of a word is the sum of the weights of all its accepting computations. Since the seminal paper of Schützenberger [32], weighted automata have inspired a wealth of extensions and further research (see [10] for a recent handbook compilation). Weighted automata have found applications in verification [5, 6, 9, 24], reasoning about the competitive ratio of online algorithms [1], digital image compression [7, 16, 21, 22], speech-to-text processing [4, 25, 26], and data-flow analysis [30, 31]. A special case of weighted automata are probabilistic automata [28, 29], which model randomized algorithms and stochastic uncertainties in the system environment.

In this paper, we investigate whether Parikh’s theorem can be generalized to the weighted case. In particular, we investigate whether for any weighted context-free grammar \(\mathrm {G} \) there is a weighted right-linear grammar \(\mathrm {G} '\) such that for any Parikh equivalence class C, the sum of the weights of words in C under \(\mathrm {G} \) and \(\mathrm {G} '\) is the same. It is easy to see that if the weight domain is not commutative (i.e., \(\otimes \) is not a commutative operation) then Parikh’s theorem does not hold. Thus we focus our attention on commutative weight domains.

Our first result concerns weight domains that are additionally idempotent, which means that \(\oplus \) is an idempotent operation. A classical example of such a semiring is the min-plus or tropical semiring over the natural numbers where \(\min \) is the “addition” operation, and \(+\) is the “product” operation. We show that Parikh’s theorem does indeed hold for weighted automata over commutative, idempotent semirings.

Next, we show that our assumption about idempotence of semirings is necessary. In particular, we give an example of a stochastic context-free grammar \(\mathrm {G} \) over the unary alphabet such that the function computed by \(\mathrm {G} \) cannot be realized by any stochastic right linear grammar.

Our last result concerns unary grammars that are polynomially ambiguous. Recall that a grammar is polynomially ambiguous if there is a polynomial p such that on any word of length n in the language, the number of derivation trees for the word is bounded by p(n). We prove that Parikh’s theorem extends to such grammars. Specifically, we show that, over the unary alphabet, any probability function realized by a polynomially ambiguous stochastic context-free grammar can also be realized by a stochastic right-linear grammar. Though we present this result in the context of stochastic grammars, the proof applies to any polynomially ambiguous weighted context-free grammar over a semiring that is commutative, but not necessarily idempotent.

The rest of the paper is organized as follows. We introduce the basic models and notation in Sect. 2. Parikh’s theorem for weighted automata over commutative, idempotent semirings is presented in Sect. 3. In Sect. 4, we present an example unary stochastic context-free grammar, and show that there is no stochastic right-linear grammar that is equivalent to it. Section 5 contains our proof of Parikh’s theorem for polynomially ambiguous grammars. Even though this proof is presented in the context of stochastic grammars, it is easy to see that it extends to any weighted context-free grammar over a commutative (but not necessarily idempotent) semiring. Finally, we present our conclusions and directions for future work in Sect. 6.

2 Preliminaries

Strings. Let us fix a finite string/word \(w \in \Sigma ^*\) over \(\Sigma \). For a subset \(\Gamma \subseteq \Sigma \), \({w \mathord {\upharpoonright }_{\Gamma }}\) will denote the string over \(\Gamma \) obtained by removing the symbols not in \(\Gamma \) from w. The Parikh map, or Parikh image, of \(w \in \Sigma ^*\), denoted by \({{\mathrm{\mathrm {Pk}}}}(w)\), is a mapping from \(\Sigma \) to \(\mathbb {N}\), such that for \(a \in \Sigma \), \({{\mathrm{\mathrm {Pk}}}}(w)(a)\) is the number of occurrences of a in w. The Parikh equivalence class of \(w \) is the set of all words with the same Parikh image as \(w \). We extend the Parikh map to languages \(L\subseteq \Sigma ^*\) by defining \({{\mathrm{\mathrm {Pk}}}}(L) = \left\{ {{\mathrm{\mathrm {Pk}}}}(w) \mid w \in L \right\} \).

Context Free Grammars. We will consider context-free grammars in Greibach Normal Form. Formally (in this paper), a context-free grammar is \(\mathrm {G} = \left( \mathrm {V}, \Sigma , \mathrm {P}, S \right) \), where \(\mathrm {V} \) and \(\Sigma \) are disjoint sets of variables (or non-terminals) and terminals, respectively; \(S \in \mathrm {V} \) is the start symbol; and \(\mathrm {P} \subseteq \mathrm {V} \times \Sigma \mathrm {V} ^*\) is a finite set of productions, where each production is of the form \(A \rightarrow \mathtt {a}\beta \) with \(\mathtt {a}\in \Sigma \) and \(\beta \in \mathrm {V} ^*\). Without loss of generality, we assume that every production in the grammar is used in some derivation from \(S \) to a string in \(\Sigma ^*\). A sentence is a string in \((\Sigma \cup \mathrm {V})^*\). A right-linear grammar is a context-free grammar where the productions have at most one non-terminal on the right-hand side, i.e., \(\mathrm {P} \subseteq \mathrm {V} \times (\Sigma (\left\{ \epsilon \right\} \cup \mathrm {V}))\). It is well known that a language is generated by a right-linear grammar if and only if it is regular.

We will find it convenient to partition the variables of a grammar into those that have exactly one derivation tree and those that have more than one. Formally, the set of single-derivation variables \(\mathrm {X} \subseteq \mathrm {V} \) is the smallest set containing all variables \(A \) with exactly one production of the form \(A \rightarrow \mathtt {a}\) (with \(\mathtt {a}\in \Sigma \)) and having the property that if a variable \(A \) has exactly one production of the form \(A \rightarrow \mathtt {a}\alpha \) where \(\mathtt {a}\in \Sigma \) and \(\alpha \in \mathrm {X} ^*\) then \(A \in \mathrm {X} \). The remaining variables, i.e. \(\mathrm {Y} = \mathrm {V} \setminus \mathrm {X} \), are multiple-derivation variables.
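This partition can be computed as a least fixpoint; below is a minimal sketch with a grammar encoding of our own choosing (each variable maps to its list of (terminal, body) productions in Greibach Normal Form).

```python
def single_derivation_variables(productions):
    """Least set X of variables A having exactly one production
    A -> a·alpha such that every variable of alpha is already in X."""
    X = set()
    changed = True
    while changed:
        changed = False
        for A, rules in productions.items():
            if A in X or len(rules) != 1:
                continue
            _terminal, body = rules[0]
            if all(B in X for B in body):
                X.add(A)
                changed = True
    return X

# a small grammar: S -> aSB | aB, B -> b
prods = {"S": [("a", ("S", "B")), ("a", ("B",))], "B": [("b", ())]}
assert single_derivation_variables(prods) == {"B"}   # S has two productions
```

A variable whose only production mentions itself is never added, matching the "smallest set" reading of the definition.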

Prioritized leftmost derivations. In this paper we will consider special derivation sequences of a context-free grammar that expand the leftmost variable while giving priority to single-derivation variables. We call these prioritized leftmost (PLM) derivations, and we define them precisely next.

Definition 1

Consider a context-free grammar \(\mathrm {G} = \left( \mathrm {V}, \Sigma , \mathrm {P}, S \right) \), where the non-terminals \(\mathrm {V} \) have been partitioned into the set of single-derivation variables \(\mathrm {X} \) and multiple-derivation variables \(\mathrm {Y} \). We say that \(\alpha A \beta \) rewrites in a single prioritized leftmost derivation step to \(\alpha \gamma \beta \) (denoted as \(\alpha A \beta \Rightarrow _{\mathrm {plm}}\alpha \gamma \beta \)) iff \(\exists \pi \in \mathrm {P}, \pi = (A \rightarrow \gamma )\) such that either

  1. \(A \in \mathrm {X} \), \(\alpha \in (\Sigma \cup \mathrm {Y})^*\), and \(\beta \in \mathrm {V} ^*\), or

  2. \(A \in \mathrm {Y} \), \(\alpha \in \Sigma ^*\), and \(\beta \in (\Sigma \cup \mathrm {Y})^*\).

In other words, either \(A \) is the leftmost single-derivation variable in \(\alpha A \beta \), or \(A \) is the leftmost multiple-derivation variable and \(\alpha A \beta \) has no single-derivation variables. If \(\alpha \Rightarrow _{\mathrm {plm}}\beta \) by application of the production \(\pi \), we’ll write \(\alpha \Rightarrow _{\mathrm {plm}}^{\pi }\beta \). Note that if \(\alpha \Rightarrow _{\mathrm {plm}}\beta \) there is always a unique \(\pi \) such that \(\alpha \Rightarrow _{\mathrm {plm}}^{\pi }\beta \).

A prioritized leftmost (PLM) derivation is a sequence \({\psi } = \alpha _1,\cdots ,\alpha _n\) such that \(\alpha _1 \Rightarrow _{\mathrm {plm}}\alpha _2 \Rightarrow _{\mathrm {plm}}\cdots \Rightarrow _{\mathrm {plm}}\alpha _n\). The set of all PLM derivations is denoted \({{\mathrm{\mathrm {Der}_{\mathrm {plm}}}}}(\mathrm {G})\).

The language generated by \(\mathrm {G} \) is \({{\mathrm{L}}}(\mathrm {G}) = \left\{ w \in \Sigma ^* \mid S \Rightarrow ^*_{\mathrm {plm}}w \right\} \), where \(\Rightarrow ^*_{\mathrm {plm}}\) is the reflexive and transitive closure of \(\Rightarrow _{\mathrm {plm}}\). Finally, the parse of a word \(w \in (\Sigma \cup \mathrm {V})^*\), denoted \({{\mathrm{\mathrm {parse}}}}_\mathrm {G} (w)\), is the set of all PLM derivations starting at \(S \) and yielding \(w \).

Example 1

We present a simple example to illustrate the definitions. Consider the grammar \(\mathrm {G} = \left( \left\{ S, B \right\} , \left\{ \mathtt {a}, \mathtt {b} \right\} , \mathrm {P}, S \right) \) where \(\mathrm {P} \) consists of the following productions: \(\pi _1 = S \rightarrow \mathtt {a}S B \), \(\pi _2 = S \rightarrow \mathtt {a}B \), and \(\pi _3 = B \rightarrow \mathtt {b}\). The set of single-derivation variables is \(\{B\}\) and the set of multiple-derivation variables is \(\{S\}\). An example of a prioritized leftmost derivation is \(S \Rightarrow _{\mathrm {plm}}\mathtt {a}S B \Rightarrow _{\mathrm {plm}}\mathtt {a}S \mathtt {b}\Rightarrow _{\mathrm {plm}}\mathtt {a}\mathtt {a}B \mathtt {b}\Rightarrow _{\mathrm {plm}}\mathtt {a}\mathtt {a}\mathtt {b}\mathtt {b}\); note that the single-derivation variable \(B \) is expanded before the multiple-derivation variable \(S \).

The language generated by this grammar is \(\left\{ \mathtt {a}^n\mathtt {b}^n \mid n \ge 1 \right\} \).
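The PLM derivations of a small grammar can be enumerated by brute force; the sketch below (encoding ours) enumerates the PLM derivations of \(\mathtt {a}\mathtt {a}\mathtt {b}\mathtt {b}\) under this grammar.

```python
def plm_derivations(grammar, X, start, word):
    """All PLM derivations of `word` from `start`, by brute force.
    grammar: dict mapping each variable to its (terminal, body)
    productions, body a tuple of variables (Greibach Normal Form).
    X: the single-derivation variables; the rest are in Y."""
    results = []

    def step(sentence, history):
        variables = [s for s in sentence if s in grammar]
        if not variables:
            if "".join(sentence) == word:
                results.append(history)
            return
        if len(sentence) - len(variables) >= len(word):
            return  # enough terminals already, yet variables remain: prune
        # PLM position: leftmost single-derivation variable if one occurs,
        # otherwise the leftmost variable (which is then in Y)
        idx = next((i for i, s in enumerate(sentence) if s in X),
                   min(i for i, s in enumerate(sentence) if s in grammar))
        for a, body in grammar[sentence[idx]]:
            new = sentence[:idx] + (a,) + body + sentence[idx + 1:]
            step(new, history + [new])

    step((start,), [(start,)])
    return results

# Example 1's grammar: S -> aSB | aB, B -> b, with X = {B}
g = {"S": [("a", ("S", "B")), ("a", ("B",))], "B": [("b", ())]}
derivs = plm_derivations(g, {"B"}, "S", "aabb")
assert len(derivs) == 1                      # this grammar is unambiguous
assert derivs[0][-1] == ("a", "a", "b", "b")
```

Since each GNF step adds exactly one terminal, derivations whose terminal count reaches the target length with variables left over can be pruned, so the search terminates.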

Derivation trees. The set of all derivation trees for \(\mathrm {G} \) will be denoted as \({\Delta _{\mathrm {G}}} \). For a derivation tree \(\tau \), a node \(n \) in \(\tau \), and a path p from the root in \(\tau \), \({{\mathrm{\ell }}}(\tau )\), \({{\mathrm{\ell }}}(n)\) and \({{\mathrm{\ell }}}(p)\) will denote the label of the root, the node \(n \), and the node reached by path p in \(\tau \), respectively. For any node \(n \) in a tree \(\tau \) and path p from the root, we denote the subtree rooted at \(n \) by \(\tau (n)\), and the subtree rooted at the node reached by path p by \(\tau (p)\). The frontier of a tree \(\tau \), denoted \({{\mathrm{\mathrm {Fr}}}}(\tau )\) is the sentence \(\ell (n _1)\ell (n _2)\cdots \ell (n _k)\) where \(n _1,\cdots ,n _k\) are the leaves of \(\tau \) in left-to-right order.

For any variable \(A \in \mathrm {V} \), \({\Delta _{\mathrm {G}}} (A)\) is the subset of derivation trees whose root is labeled \(A \). A tree \(\tau \) for which \({{\mathrm{\mathrm {Fr}}}}(\tau ) \in \Sigma ^*\) is called a complete derivation tree, and the set of all complete derivation trees rooted at \(A \) is \({\Delta ^{\Sigma }_{\mathrm {G}}} (A)\). The set of all complete derivation trees is \({\Delta ^{\Sigma }_{\mathrm {G}}} \). A tree \(\tau \in {\Delta _{\mathrm {G}}} (A)\) is said to be an \(A \)-pumping tree if \({{{\mathrm{\mathrm {Fr}}}}(\tau )\mathord {\upharpoonright }_{\mathrm {V}}} = A \). The set of \(A \)-pumping trees is \({\Delta _{\mathrm {G}}^p} (A)\), and the set of all pumping trees is given by \({\Delta _{\mathrm {G}}^p} = \bigcup _{A \in \mathrm {V}} {\Delta _{\mathrm {G}}^p} (A)\).

Remark 1

In a context-free grammar (where all productions are “useful”), every single-derivation variable is the root of exactly one complete derivation tree, and every multiple-derivation variable is the root of at least two complete derivation trees.

Tree Notation. We will use the following notation on derivation trees. Let \(\tau \in {\Delta _{\mathrm {G}}} \), \(n \) be a node in \(\tau \), and p be a path from the root in \(\tau \). The leftmost child of the node reached by path p, will be the one reached by the path \(p\cdot 0\) with the other children corresponding to the paths \(p\cdot 1\), \(p\cdot 2\), etc. For \(\tau _1 \in {\Delta _{\mathrm {G}}} ({{\mathrm{\ell }}}(n))\) (\(\tau _1 \in {\Delta _{\mathrm {G}}} ({{\mathrm{\ell }}}(p))\)), \(\tau [n \mapsto \tau _1]\) (\(\tau [p \mapsto \tau _1]\)) denotes the derivation tree obtained by replacing \(\tau (n)\) (\(\tau (p)\)) by the tree \(\tau _1\). We denote by \(\mathsf {rem}_{p}(\tau )\) the tree obtained by replacing \(\tau (p)\) by the root of \(\tau (p)\), i.e., by “removing” all the children of p. Finally, for a rule \(A \rightarrow \mathtt {a}\alpha \) with \(\alpha = A _1A _2\cdots A _k\), and trees \(\tau _i \in {\Delta _{\mathrm {G}}} (A _i)\), \(A _{\mathtt {a}\alpha }(\tau _1,\tau _2,\ldots \tau _k)\) denotes the tree with root labeled A and children \(\mathtt {a},\tau _1,\ldots \tau _k\) from left-to-right. Thus, \(A _\mathtt {a}\) denotes the tree with root labeled \(A \) and one child labeled \(\mathtt {a}\).

Cuts. Observe that, for any string \(\alpha \in (\mathrm {V} \cup \Sigma )^*\), there is a bijection between derivation trees \(\tau \) with \({{\mathrm{\mathrm {Fr}}}}(\tau ) = \alpha \) and PLM derivations in \({{\mathrm{\mathrm {parse}}}}_{\mathrm {G}}(\alpha )\). A set of nodes separating the root of \(\tau \) from all of the leaves in \(\tau \) is a cut of \(\tau \). Now consider the unique PLM derivation \(\Psi \) corresponding to \(\tau \). Every sentence in \(\Psi \) corresponds to a cut \(\mathcal {C} \) in \(\tau \). We call any such \(\mathcal {C} \) a prioritized leftmost (PLM) cut of \(\tau \). For a set of trees T and a variable \(A \in \mathrm {V} \), the Parikh supremum of variable \(A \) in T, denoted by \({{\mathrm{\sup _{{{\mathrm{\mathrm {Pk}}}}}}}}(A,T)\), is the maximum number of occurrences of \(A \) in any PLM cut of any tree \(\tau \in T\). Observe that any PLM derivation sequence corresponding to a tree \(\tau \) in T can have at most \({{\mathrm{\sup _{{{\mathrm{\mathrm {Pk}}}}}}}}(A,T)\) occurrences of the variable \(A \) in any sentence.

Ambiguity. We will say that a set of trees \(\Gamma \) is ambiguous if there are two distinct trees \(\tau _1,\tau _2 \in \Gamma \) such that \({{\mathrm{\mathrm {Fr}}}}(\tau _1) = {{\mathrm{\mathrm {Fr}}}}(\tau _2)\); if \(\Gamma \) is not ambiguous, we say it is unambiguous. The ambiguity function \({{\mathrm{\mu }}}_\mathrm {G}: \mathbb {N}\rightarrow \mathbb {N}\) for a grammar \(\mathrm {G} \) is the function mapping every natural number n to the maximal number of PLM derivations that a word of length n may have. Formally, \({{\mathrm{\mu }}}_\mathrm {G} (n) = \max _{w\in {{\mathrm{L}}}(\mathrm {G}), \left| w \right| = n} \left| {{\mathrm{\mathrm {parse}}}}_\mathrm {G} (w) \right| \). A grammar is said to have exponential ambiguity if its ambiguity function is in \(2^{\Theta (n)}\), and it is said to have polynomially-bounded ambiguity, or to be polynomially ambiguous, if its ambiguity function is in \(O(n^d)\) for some \(d\in \mathbb {N}_0\). Any grammar \(\mathrm {G} \) has either exponential ambiguity or polynomially-bounded ambiguity [36]. The following characterization of polynomial ambiguity was proved in [36].
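Since the ambiguity function only involves counting parses, it can be prototyped by brute force; a sketch (encoding ours), exploiting the fact that in Greibach Normal Form each derivation step consumes exactly one letter:

```python
from functools import lru_cache
from itertools import product

def parse_count(grammar, start, word):
    """Number of PLM derivations of `word` (equivalently, derivation
    trees), by recursion on (pending variable stack, remaining word).
    grammar: variable -> list of (terminal, tuple-of-variables), in GNF."""
    @lru_cache(maxsize=None)
    def go(stack, rest):
        if not rest:
            return 1 if not stack else 0
        if not stack:
            return 0
        A, tail = stack[0], stack[1:]
        return sum(go(body + tail, rest[1:])
                   for a, body in grammar[A] if a == rest[0])
    return go((start,), word)

def mu(grammar, start, alphabet, n):
    # ambiguity function: max parse count over words of length n
    # (words outside the language contribute 0)
    return max(parse_count(grammar, start, "".join(w))
               for w in product(alphabet, repeat=n))

# the grammar of Example 1 is unambiguous ...
g1 = {"S": [("a", ("S", "B")), ("a", ("B",))], "B": [("b", ())]}
# ... while S -> a | aSS has exponentially many parses (Catalan numbers)
g2 = {"S": [("a", ()), ("a", ("S", "S"))]}
assert mu(g1, "S", "ab", 4) == 1
assert parse_count(g2, "S", "a" * 5) == 2
```

The counting recursion is the same shape as the weight computation of Sect. 2.1, instantiated with the semiring \((\mathbb {N}, +, \times )\).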

Theorem 1

[36]. A context-free grammar \(\mathrm {G} \) has polynomially-bounded ambiguity if and only if \({\Delta _{\mathrm {G}}^p} \) is unambiguous.

We conclude the preliminaries by recalling a classical result due to Parikh [27].

Theorem 2

(Parikh’s Theorem [27]). For every context-free grammar \(\mathrm {G} \), there is a right-linear context-free grammar \(\mathrm {G} '\) such that \({{\mathrm{\mathrm {Pk}}}}({{\mathrm{L}}}(\mathrm {G})) = {{\mathrm{\mathrm {Pk}}}}({{\mathrm{L}}}(\mathrm {G} '))\).

2.1 Weighted and Stochastic Context-Free Grammars

Weighted context-free grammars define a function that associates a value in a semiring with each string. Stochastic context-free grammars are special weighted context-free grammars that associate probabilities with strings. We recall these classical definitions in this section. We begin by defining a semiring.

Semiring. A semiring is a structure \(\mathbb {D}= (D,\oplus ,\otimes ,0_D,1_D)\) where \((D,\oplus ,0_D)\) is a commutative monoid with identity \(0_D\), \((D \setminus \left\{ 0_D \right\} , \otimes ,1_D)\) is a monoid with identity \(1_D\), \(\otimes \) distributes over \(\oplus \) (i.e., \((a\,\oplus \,b) \otimes \,c = (a\,\otimes \,c) \oplus (b\,\otimes \,c)\) and \(a\,\otimes (b\,\oplus \,c) = (a\,\otimes \,b) \oplus (a\,\otimes \,c)\), for every \(a,b,c \in D\)), and \(0_D\) is an annihilator for \(\otimes \) (i.e., \(a\,\otimes 0_D = 0_D\otimes a = 0_D\) for every \(a \in D\)). We abuse notation and use \(\mathbb {D}\) to denote the semiring and the underlying set where the meaning is clear from context. We define \(\mathbb {D}_0= D\setminus \left\{ 0_D \right\} \). When considering an abstract semiring \(\mathbb {D}\), we’ll write \(0_\mathbb {D}\) and \(1_\mathbb {D}\) for \(0_D\) and \(1_D\) respectively. An idempotent semiring satisfies the additional requirement that for all \(a\in \mathbb {D}\), \(a\oplus a = a\). A commutative semiring is one where \(\otimes \) is commutative, i.e., \((D \setminus \left\{ 0_D \right\} , \otimes ,1_D)\) is a commutative monoid as well.

Example 2

Classical examples of semirings are the tropical semiring and the probability semiring. The tropical or min-plus semiring is \((\mathbb {N}\cup \left\{ \infty \right\} , \min , +, \infty , 0)\), where \(\infty \) is taken to be larger than everything in \(\mathbb {N}\). It is commutative and idempotent, as \(\min (a,a) = a\) for any a. The probability semiring is \(([0,1], +, \times , 0, 1)\), where [0, 1] is the set of reals between 0 and 1. It is commutative as \(\times \) is commutative. However, since addition is not idempotent (\(a + a \ne a\) for \(a \ne 0\)), the probability semiring is not idempotent.
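These two semirings are small enough to probe directly in code; a sketch (names ours):

```python
from fractions import Fraction

# Tropical (min-plus) semiring: oplus = min, otimes = +, 0_D = inf, 1_D = 0
INF = float("inf")
t_plus, t_times = min, lambda a, b: a + b
# Probability semiring: oplus = +, otimes = *, 0_D = 0, 1_D = 1
p_plus, p_times = (lambda a, b: a + b), (lambda a, b: a * b)

assert t_plus(3, 3) == 3                  # idempotent: min(a, a) = a
assert t_times(2, 5) == t_times(5, 2)     # otimes is commutative
assert t_times(7, INF) == INF             # 0_D annihilates under otimes
half = Fraction(1, 2)
assert p_plus(half, half) != half         # + is not idempotent
assert p_times(half, Fraction(1, 3)) == Fraction(1, 6)
```

Exact rationals (`Fraction`) stand in for [0, 1] so the idempotence failure is not an artifact of floating point.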

Weighted context-free grammars. A weighted context-free grammar is a pair \(\left( \mathrm {G}, {{\mathrm{W}}} \right) \) where \(\mathrm {G} =\left( \mathrm {V}, \Sigma , \mathrm {P}, S \right) \) is a context-free grammar, and \({{\mathrm{W}}}: \mathrm {P} \rightarrow \mathbb {D}\) assigns a weight from \(\mathbb {D}\) to each production in \(\mathrm {P} \), for some semiring \(\mathbb {D}\). (Note that \({{\mathrm{W}}}\) may assign \(0_\mathbb {D}\) to some productions in \(\mathrm {P} \).) The weight of a PLM derivation \({\psi } = \alpha _1 \Rightarrow _{\mathrm {plm}}^{\pi _1} \alpha _2 \Rightarrow _{\mathrm {plm}}^{\pi _2} \cdots \Rightarrow _{\mathrm {plm}}^{\pi _{n-1}} \alpha _n\) of \(\mathrm {G} \) is given by \({{\mathrm{W}}}({\psi }) \triangleq \otimes _{i=1}^{n-1} {{\mathrm{W}}}(\pi _i)\). For \(w \in \Sigma ^*\), \({{\mathrm{W}}}(w) \triangleq \oplus _{{\psi } \in {{\mathrm{\mathrm {parse}}}}_{\mathrm {G}}(w)} {{\mathrm{W}}}({\psi })\); if \({{\mathrm{\mathrm {parse}}}}_{\mathrm {G}}(w) = \emptyset \) (i.e., \(w \not \in {{\mathrm{L}}}(\mathrm {G})\)) then \({{\mathrm{W}}}(w) = 0_\mathbb {D}\). The semantics of a weighted grammar \((\mathrm {G},{{\mathrm{W}}})\) is the function mapping each word \(w \) to its weight \({{\mathrm{W}}}(w)\) in \(\mathrm {G} \).

Example 3

Let \(\mathrm {G} \) be the grammar described in Example 1. Consider a weight function \({{\mathrm{W}}}\) that assigns weights from the tropical semiring, with the weight of every production \(\pi \in \mathrm {P} \) being equal to 1. Then the semantics of \(\left( \mathrm {G},{{\mathrm{W}}} \right) \) is given by \({{\mathrm{W}}}(w) = \left| w \right| \) when \(w \in {{\mathrm{L}}}(\mathrm {G})\), and \({{\mathrm{W}}}(w) = \infty \) when \(w \not \in {{\mathrm{L}}}(\mathrm {G})\); indeed, each word \(\mathtt {a}^n\mathtt {b}^n\) has a unique PLM derivation, which uses \(2n\) productions of weight 1, and the tropical product of these weights is \(2n = \left| w \right| \).
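The weight of a word can be computed recursively thanks to distributivity; the sketch below (encoding ours, defaulting to the tropical semiring) reproduces the weights of this example's grammar.

```python
INF = float("inf")

def weight(grammar, stack, word, plus=min, times=lambda a, b: a + b,
           zero=INF, one=0):
    """Weight of `word` derived from the variable `stack` (a tuple):
    the oplus-sum over derivations of the otimes-product of production
    weights, computed recursively using distributivity. grammar maps
    each variable to (terminal, body, weight) triples, in GNF."""
    if not word:
        return one if not stack else zero
    if not stack:
        return zero
    A, tail = stack[0], stack[1:]
    total = zero
    for a, body, wt in grammar[A]:
        if a == word[0]:
            total = plus(total, times(wt, weight(
                grammar, body + tail, word[1:], plus, times, zero, one)))
    return total

# Example 1's grammar with every production weight equal to 1
g = {"S": [("a", ("S", "B"), 1), ("a", ("B",), 1)], "B": [("b", (), 1)]}
assert weight(g, ("S",), "aabb") == len("aabb")   # weight |w| on L(G)
assert weight(g, ("S",), "aab") == INF            # 0_D off the language
```

Swapping in the probability semiring's operations (`plus`, `times`, `zero=0`, `one=1`) gives the stochastic semantics of Sect. 2.1 with the same code.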

Definition 2

The Parikh image of a weighted context-free grammar \(\left( \mathrm {G},{{\mathrm{W}}} \right) \) is the function mapping each Parikh equivalence class \(C\) to \(\bigoplus _{w \in C} {{\mathrm{W}}}(w)\), the sum of the weights of the words in \(C\).

Stochastic Context-free Grammars. A stochastic context-free grammar is a weighted context-free grammar \(\left( \mathrm {G} = (\mathrm {V}, \Sigma , \mathrm {P}, S),{{\mathrm{W}}} \right) \) where the weight domain is the probability semiring \(([0,1], +, \times , 0, 1)\), and for any \(A \in \mathrm {V} \) and \(\mathtt {a}\in \Sigma \), we have

$$ \sum _{\alpha \in V^*:(A \rightarrow \mathtt {a}\alpha ) \in \mathrm {P}} {{\mathrm{W}}}(A \rightarrow \mathtt {a}\alpha ) \in \left[ 0,1 \right] . $$
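The displayed condition is straightforward to check mechanically; a sketch (encoding ours) over the one-variable grammar with productions \(S \rightarrow \mathtt {a}\) of weight \(1-p\) and \(S \rightarrow \mathtt {a}S S \) of weight \(p\):

```python
from collections import defaultdict
from fractions import Fraction

def is_stochastic(weighted_productions):
    """Check that for each pair (A, a) the total weight of the
    productions A -> a·alpha lies in [0, 1]."""
    totals = defaultdict(Fraction)  # Fraction() == 0
    for A, a, _body, w in weighted_productions:
        totals[(A, a)] += w
    return all(0 <= t <= 1 for t in totals.values())

p = Fraction(1, 3)
g_star = [("S", "a", (), 1 - p), ("S", "a", ("S", "S"), p)]
assert is_stochastic(g_star)        # (1 - p) + p = 1
```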

3 A Parikh’s Theorem for Weighted CFGs

The main result of this section is that for any weighted context-free grammar over an idempotent, commutative semiring (like the tropical semiring), there is a Parikh-equivalent weighted right-linear context-free grammar. This extends the classical result to weighted CFGs over commutative, idempotent semirings.

Theorem 3

(Weighted Parikh’s Theorem). For every weighted context-free grammar \((\mathrm {G},{{\mathrm{W}}})\) over an idempotent, commutative semiring, there exists a weighted right-linear grammar \((\mathrm {G} _*, {{\mathrm{W}}}_*)\) with the same Parikh image.

Proof

The full proof can be found in [3]. Here we present the broad ideas.

Let \(\mathrm {G} = \left( \mathrm {V}, \Sigma , \mathrm {P}, S \right) \) be a context-free grammar and let \(W : \mathrm {P} \rightarrow \mathbb {D}\) be a weight function over a commutative, idempotent weight domain \(\mathbb {D}\). Consider the following homomorphism \(h: \mathrm {P} ^* \rightarrow \Sigma ^*\) defined as \(h(\pi ) = \mathtt {a}\), where \(\pi = A \rightarrow \mathtt {a}\alpha \in \mathrm {P} \).

We begin by first constructing a weighted context-free grammar \(\left( \mathrm {G} _1,{{\mathrm{W}}}_1 \right) \) over the alphabet \(\mathrm {P} \), whose image under h gives us \(\mathrm {G} \). Formally, \(\mathrm {G} _1 = (\mathrm {V},\mathrm {P}, \mathrm {P} _1, S)\) has productions \(\mathrm {P} _1 = \left\{ A \rightarrow \pi \alpha \mid \pi = (A \rightarrow \mathtt {a}\alpha ) \in \mathrm {P} \right\} \). In addition, take \({{\mathrm{W}}}_1\) to be \({{\mathrm{W}}}_1(A \rightarrow \pi \alpha ) = {{\mathrm{W}}}(\pi )\). It is easy to see that \(h({{\mathrm{L}}}(\mathrm {G} _1)) = {{\mathrm{L}}}(\mathrm {G})\) by construction. Moreover, given \({{\mathrm{W}}}_1\) and \({{\mathrm{W}}}\), we can conclude that the weight of each word \(w \) under \(\left( \mathrm {G},{{\mathrm{W}}} \right) \) is the sum of the weights under \(\left( \mathrm {G} _1,{{\mathrm{W}}}_1 \right) \) of the words in \(h^{-1}(w) \cap {{\mathrm{L}}}(\mathrm {G} _1)\).

By Parikh’s theorem (Theorem 2), there is a right-linear grammar \(\mathrm {G} _2 = (\mathrm {V} _2, \mathrm {P}, \mathrm {P} _2, S _2)\) such that \({{\mathrm{\mathrm {Pk}}}}({{\mathrm{L}}}(\mathrm {G} _2)) = {{\mathrm{\mathrm {Pk}}}}({{\mathrm{L}}}(\mathrm {G} _1))\). Define the weight function \({{\mathrm{W}}}_2\) as \({{\mathrm{W}}}_2(A \rightarrow \pi B) = {{\mathrm{W}}}(\pi )\) to give us the weighted CFG \(\left( \mathrm {G} _2,{{\mathrm{W}}}_2 \right) \). Using the fact that \(\otimes \) is commutative and \(\oplus \) is idempotent, we can prove that \(\left( \mathrm {G} _1,{{\mathrm{W}}}_1 \right) \) and \(\left( \mathrm {G} _2,{{\mathrm{W}}}_2 \right) \) have the same Parikh image.

Finally, we consider the context-free grammar \(\mathrm {G} _3\) obtained by “applying the homomorphism h” to \(\mathrm {G} _2\). Formally, \(\mathrm {G} _3 = (\mathrm {V} _2,\Sigma ,\mathrm {P} _3,S _2)\), where \(\mathrm {P} _3 = \left\{ A \rightarrow h(\pi )\beta \mid (A \rightarrow \pi \beta ) \in \mathrm {P} _2 \right\} \). The weight function \({{\mathrm{W}}}_3\) is defined in such a way that the weight of \(A \rightarrow \mathtt {a}B \) is the sum of the weights of all productions \(A \rightarrow \pi B \), where \(h(\pi ) = \mathtt {a}\), i.e.,

$$ {{\mathrm{W}}}_3(A \rightarrow \mathtt {a}B) = \bigoplus _{\pi : h(\pi ) = \mathtt {a}} {{\mathrm{W}}}_2(A \rightarrow \pi B). $$

\(\left( \mathrm {G} _3,{{\mathrm{W}}}_3 \right) \) and \(\left( \mathrm {G} _2,{{\mathrm{W}}}_2 \right) \) share the same relationship as \(\left( \mathrm {G},{{\mathrm{W}}} \right) \) and \(\left( \mathrm {G} _1,{{\mathrm{W}}}_1 \right) \). That is, we have \(h({{\mathrm{L}}}(\mathrm {G} _2)) = {{\mathrm{L}}}(\mathrm {G} _3)\), and the weight of each word \(w \) under \(\left( \mathrm {G} _3,{{\mathrm{W}}}_3 \right) \) is the sum of the weights under \(\left( \mathrm {G} _2,{{\mathrm{W}}}_2 \right) \) of the words in \(h^{-1}(w) \cap {{\mathrm{L}}}(\mathrm {G} _2)\).

\(\left( \mathrm {G} _3,{{\mathrm{W}}}_3 \right) \) is the desired weighted grammar, i.e., it has the same Parikh image as \(\left( \mathrm {G},{{\mathrm{W}}} \right) \).   \(\square \)

Corollary 1

If \((\mathrm {G}, {{\mathrm{W}}})\) is a weighted context-free grammar over an idempotent, commutative weight domain and a unary alphabet, then there exists a weighted right-linear context-free grammar \((\mathrm {G} ', {{\mathrm{W}}}')\) that computes the same function; over a unary alphabet, each Parikh equivalence class contains a single word, so Parikh equivalence coincides with equality of semantics.

Example 4

Starting with the weighted grammar \((\mathrm {G}, {{\mathrm{W}}})\) from Example 3, the construction used in the proof of Theorem 3 would have \(\mathrm {P} _1\) containing the following productions: \(S \rightarrow \pi _1S B,\quad S \rightarrow \pi _2B,\quad B \rightarrow \pi _3\). The language of this grammar is \(\left\{ \pi _1^n\pi _2\pi _3^{n+1} \mid n \ge 0 \right\} \).

One candidate for \(\mathrm {G} _2\) would have as productions \(S \rightarrow \pi _1 J\), \(S \rightarrow \pi _2K \), \(J \rightarrow \pi _3S \), and \(K \rightarrow \pi _3\). The language is \(\left\{ (\pi _1\pi _3)^n\pi _2\pi _3 \mid n \ge 0 \right\} \).

In that case \((\mathrm {G} _3,{{\mathrm{W}}}_3)\) would have productions and weights as follows:

$$\begin{aligned} {{\mathrm{W}}}_3(S \rightarrow \mathtt {a}J)&= 1&{{\mathrm{W}}}_3(S \rightarrow \mathtt {a}K)&= 1\\ {{\mathrm{W}}}_3(J \rightarrow \mathtt {b}S)&= 1&{{\mathrm{W}}}_3(K \rightarrow \mathtt {b})&= 1 \end{aligned}$$

The language of the underlying grammar would be \(\left\{ (\mathtt {a}\mathtt {b})^n \mid n \ge 1 \right\} \), and the weight assigned to each word \(w \) in it is \(\left| w \right| \); hence \(\left( \mathrm {G} _3,{{\mathrm{W}}}_3 \right) \) has the same Parikh image as the weighted grammar we started with.
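The last step of the construction (the \(\oplus \)-sum over \(h\)-preimages defining \({{\mathrm{W}}}_3\)) can be replayed mechanically on this example; a sketch with our own encoding, writing None for the absence of a variable on the right-hand side:

```python
from collections import defaultdict

INF = float("inf")                         # 0_D of the tropical semiring

h = {"pi1": "a", "pi2": "a", "pi3": "b"}   # h(pi) = terminal of pi
W = {"pi1": 1, "pi2": 1, "pi3": 1}         # production weights from Example 3

# G_2's right-linear productions over the alphabet P
P2 = [("S", "pi1", "J"), ("S", "pi2", "K"),
      ("J", "pi3", "S"), ("K", "pi3", None)]

# W_3(A -> a B) = oplus over pi with h(pi) = a of W_2(A -> pi B);
# oplus is min in the tropical semiring, and W_2(A -> pi B) = W(pi)
W3 = defaultdict(lambda: INF)
for A, pi, B in P2:
    key = (A, h[pi], B)
    W3[key] = min(W3[key], W[pi])

assert dict(W3) == {("S", "a", "J"): 1, ("S", "a", "K"): 1,
                    ("J", "b", "S"): 1, ("K", "b", None): 1}
```

The computed table matches the four weights displayed above.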

4 A Counterexample to Parikh’s Theorem for Stochastic Grammars

Theorem 3 crucially relies on the semiring being idempotent. In this section, we show that Theorem 3 fails to generalize if we drop the requirement of idempotence. We give an example of a stochastic context-free grammar over the unary alphabet that is not equivalent to any stochastic right-linear grammar. Before presenting the example stochastic context-free grammar and proving the inexpressivity result, we recall some classical observations about unary stochastic right-linear grammars.

4.1 Properties of Unary Stochastic Right-Linear Grammars

Stochastic right-linear grammars satisfy pumping lemma type properties. Here we recall such an observation for unary stochastic right-linear grammars.

Theorem 4

(Pumping Lemma). Let \((\mathrm {G},{{\mathrm{W}}})\) be a stochastic right-linear grammar over the unary alphabet. There is a number k and real numbers \(c_0,c_1,\ldots ,c_k\) with \(c_0+c_1+\cdots + c_k = 1\) such that for every \(\ell \in \mathbb {N}\), \({{\mathrm{W}}}(\mathtt {a}^{\ell +k+1}) = \sum _{i=0}^{k} c_i\,{{\mathrm{W}}}(\mathtt {a}^{\ell +i})\).

Proof

The result is a consequence of the Cayley-Hamilton theorem and the fact that 1 is an eigenvalue of stochastic matrices. We skip the proof as it is a specialization of Theorem 2.8 in Chapter II.C in [28].    \(\square \)
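To illustrate the recurrence, consider a tiny unary stochastic right-linear grammar of our own: \(S \rightarrow \mathtt {a}S \) with weight 1/2 and \(S \rightarrow \mathtt {a}\) with weight 1/2, so \({{\mathrm{W}}}(\mathtt {a}^n) = (1/2)^n\) for \(n \ge 1\). Extending the transition matrix with the accepting state yields a stochastic matrix with eigenvalue 1 whose characteristic polynomial \((x-\tfrac{1}{2})(x-1) = x^2 - \tfrac{3}{2}x + \tfrac{1}{2}\) gives coefficients \(\tfrac{3}{2}\) and \(-\tfrac{1}{2}\) summing to 1:

```python
from fractions import Fraction

def W(n):
    # W(a^n) for the grammar S -> aS | a, both productions of weight 1/2
    return Fraction(1, 2) ** n if n >= 1 else Fraction(0)

c1, c0 = Fraction(3, 2), Fraction(-1, 2)
assert c1 + c0 == 1                       # 1 is an eigenvalue
for l in range(1, 30):                    # the recurrence of Theorem 4
    assert W(l + 2) == c1 * W(l + 1) + c0 * W(l)
```

Exact rationals make the check a genuine identity rather than a floating-point approximation.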

Let \((\mathrm {G},{{\mathrm{W}}})\) be a unary weighted context-free grammar. The generating function of such a grammar is \(P(x) = \sum _{k \ge 0} {{\mathrm{W}}}(\mathtt {a}^k)\,x^k\). We conclude this section by observing that if \(\mathrm {G} \) is right-linear, then its generating function must be a rational function, i.e., P(x) is an algebraic fraction where both the numerator and denominator are polynomials.

Theorem 5

Let \((\mathrm {G}, {{\mathrm{W}}})\) be a stochastic right-linear grammar over the unary alphabet. Then the generating function is a rational function.

Proof

Observe that Theorem 4 says that the sequence \(\left( {{\mathrm{W}}}(\mathtt {a}^k) \right) _{k \ge 0}\) satisfies a linear homogeneous recurrence with constant coefficients. Thus, its generating function must be rational.    \(\square \)

4.2 The Counterexample

We now present a unary stochastic CFG and show that there is no stochastic right-linear CFG that is equivalent to it. Consider the grammar \(\mathrm {G} _* = (\left\{ S \right\} , \left\{ \mathtt {a} \right\} , \left\{ (S \rightarrow \mathtt {a}), (S \rightarrow \mathtt {a}S S) \right\} , S)\). Let p be some number in (0, 1). The weight function \({{\mathrm{W}}}_*\) is defined as follows: \({{\mathrm{W}}}_*(S \rightarrow \mathtt {a}) = 1-p\), and \({{\mathrm{W}}}_*(S \rightarrow \mathtt {a}S S) = p\). Taking \(c_n\) to be the nth Catalan number, we can see that \({{\mathrm{W}}}_*(\mathtt {a}^{2k+1}) = c_k\,p^k(1-p)^{k+1}\); this is because the probability of any PLM derivation for \(\mathtt {a}^{2k+1}\) is \(p^k(1-p)^{k+1}\) and there are \(c_k\) elements in \({{\mathrm{\mathrm {parse}}}}_{\mathrm {G} _*}(\mathtt {a}^{2k+1})\). Taking \(b_k = {{\mathrm{W}}}_*(\mathtt {a}^{k})\), we have

$$b_k = {\left\{ \begin{array}{ll} c_{(k-1)/2}p^{(k-1)/2}(1-p)^{(k-1)/2 + 1}&{}\,\text {if}\, k\, \text {is odd}\\ 0&{}\,\text {otherwise} \end{array}\right. }$$

Recall that the generating function for the Catalan numbers, \(C(z) = \sum _{k \ge 0} c_kz^k\), is given by \(C(z) = \frac{1-\sqrt{1-4z}}{2z}\). Based on the above observations, the generating function for the grammar \((\mathrm {G} _*,{{\mathrm{W}}}_*)\), \(P(z) = \sum _{k \ge 0} b_kz^k\) can be written as follows.

$$\begin{aligned} P(z)&= \sum _{k\ge 0} b_k z^k = \sum _{k\ge 0} b_{2k+1}z^{2k+1}\\&= z\sum _{k\ge 0} b_{2k+1}\left( z^2 \right) ^{k} = z \sum _{k\ge 0} c_{k}p^k(1-p)^{k+1}\left( z^2 \right) ^{k}\\&= z(1-p) \sum _{k\ge 0} c_{k} \left( z^2p(1-p) \right) ^k\\&= z(1-p) C(z^2p(1-p))\\&= z(1-p) \frac{1-\sqrt{1 - 4z^2p(1-p)}}{2z^2p(1-p)}\\&= \frac{1}{2zp}\left( 1 - \sqrt{1-4z^2p(1-p)} \right) \end{aligned}$$
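The computation above can be cross-checked with exact arithmetic; the sketch below (ours) recomputes \({{\mathrm{W}}}_*(\mathtt {a}^n)\) by recursion on how \(S \rightarrow \mathtt {a}S S \) splits the remaining letters, and compares against the closed form \(b_{2k+1} = c_k p^k (1-p)^{k+1}\).

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

p = Fraction(1, 3)               # any p in (0, 1) works for this check

@lru_cache(maxsize=None)
def w(n):
    """Total weight of the derivation trees of S yielding a^n in G_*."""
    if n < 1:
        return Fraction(0)
    total = (1 - p) if n == 1 else Fraction(0)       # S -> a
    # S -> aSS: the remaining n-1 letters split between the two copies of S
    total += p * sum(w(i) * w(n - 1 - i) for i in range(1, n - 1))
    return total

def catalan(k):
    return comb(2 * k, k) // (k + 1)

for k in range(8):
    assert w(2 * k + 1) == catalan(k) * p**k * (1 - p)**(k + 1)
    assert w(2 * k + 2) == 0     # even-length words get weight 0
```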

Having identified an expression for the generating function for \((\mathrm {G} _*,{{\mathrm{W}}}_*)\), we are ready to prove that there is no Parikh equivalent right-linear grammar for \((\mathrm {G} _*,{{\mathrm{W}}}_*)\). First notice that if a weighted grammar \((\mathrm {G},{{\mathrm{W}}})\) is over the unary alphabet, then its Parikh image and its semantics determine each other, since every Parikh equivalence class contains exactly one word. Therefore, to establish the result of this section, it suffices to prove that there is no stochastic right-linear grammar that is equivalent to \((\mathrm {G} _*,{{\mathrm{W}}}_*)\); this is the content of the next theorem.

Theorem 6

There is no stochastic right-linear grammar \((\mathrm {G},{{\mathrm{W}}})\) such that \((\mathrm {G},{{\mathrm{W}}})\) and \((\mathrm {G} _*,{{\mathrm{W}}}_*)\) compute the same function.

Proof

Given Theorem 5, it suffices to prove that the generating function P(z) for \((\mathrm {G} _*,{{\mathrm{W}}}_*)\) is not rational. Taking \(Q(z) = \sqrt{1 - 4z^2p(1-p)}\), we see that \(P(z) = \frac{1}{2zp}\left( 1-Q(z) \right) \). Given this relationship, we can conclude that if P(z) is a rational function, then so is Q(z). Thus, our goal will be to prove that Q(z) is not a rational function.

Assume for contradiction that Q(z) is rational. Then \(Q(z) = f(z)/g(z)\) where f and g are both polynomials with greatest common divisor having degree 0. Then \(Q^2(z) = f^2(z)/g^2(z)\) and \(Q^2(z)g^2(z) = \left( 1- 4z^2p(1-p) \right) g^2(z) = f^2(z)\). Thus, \(Q^2(z)\) must divide \(f^2(z)\). We observe that \(Q^2(z) = (1-2z\sqrt{p(1-p)})(1+2z\sqrt{p(1-p)})\) is square-free so \(Q^2(z)\) must divide f(z), and \(f(z) = Q^2(z)h(z)\) for some polynomial h. Substituting for f(z) in \(Q^2(z)g^2(z) = f^2(z)\) and rearranging we obtain \(Q^2(z)g^2(z) = (Q^2(z))^2h^2(z) \implies g^2(z) = Q^2(z)h^2(z)\), and by the same argument as above, \(Q^2(z)\) divides g(z). Thus, \(Q^2(z)\) divides both f(z) and g(z). Since \(Q^2(z)\) is not a degree 0 polynomial, we contradict the assumption that the greatest common divisor of f and g has degree 0.    \(\square \)

5 Parikh’s Theorem for Unary Polynomially Ambiguous Stochastic Grammars

The stochastic context-free grammar \((\mathrm {G} _*,{{\mathrm{W}}}_*)\) in Sect. 4.2 is exponentially ambiguous; its ambiguity function \({{\mathrm{\mu }}}_{\mathrm {G} _*}\) grows like the Catalan numbers, since \({{\mathrm{\mu }}}_{\mathrm {G} _*}(2k+1) = c_k\). Exponential ambiguity turns out to be critical to the construction of such counterexamples. In this section, we prove that any unary stochastic context-free grammar with polynomial ambiguity is equivalent to a unary stochastic right-linear grammar. The proof of this result relies on an observation that in any PLM cut in a complete derivation tree of a unary polynomially ambiguous grammar, the number of occurrences of any variable is bounded by a constant dependent on the grammar. The unary alphabet assumption is crucial in obtaining such a bound (Lemma 3). In the next two subsections we present a proof of this observation by first bounding the number of occurrences of variables in cuts of pumping trees, and then using that to bound their number in complete derivation trees. In Sect. 5.3, we then present the construction of the right-linear grammar. Though we present this result in the context of stochastic grammars, it applies to any weighted CFG over a commutative (but not necessarily idempotent) semiring.

In the rest of this section, let us fix a unary, polynomially ambiguous, context-free grammar \(\mathrm {G} = (\mathrm {V}, \left\{ \mathtt {a} \right\} , \mathrm {P}, S)\) and a stochastic weight function \(\Pr \) (for probability). We assume that the set of variables \(\mathrm {V} \) is partitioned into single-derivation variables \(\mathrm {X} \) and multiple-derivation variables \(\mathrm {Y} \). As we have done throughout this paper, we assume that every production in \(\mathrm {G} \) is “useful”, that is, it is used in some complete derivation tree whose root is labeled \(S \). Finally, we let m denote the maximum length of the right-hand side of any production in \(\mathrm {P} \).

5.1 Parikh Suprema in Pumping Trees

In this section we will bound the number of times a variable can appear in any PLM cut of a pumping tree in \(\mathrm {G} \). We begin by observing some simple properties of single-derivation variables \(\mathrm {X} \) and multiple-derivation variables \(\mathrm {Y} \). Since every production in the grammar is useful, we can conclude that there is a unique complete derivation tree with root \(A \) if \(A \in \mathrm {X} \), and that there are at least two complete derivation trees with root \(A \) if \(A \in \mathrm {Y} \), i.e., \(|{\Delta ^{\Sigma }_{\mathrm {G}}} (A)| = 1\) if \(A \in \mathrm {X} \), and \(|{\Delta ^{\Sigma }_{\mathrm {G}}} (A)| > 1\) if \(A \in \mathrm {Y} \). Next, for \(A \in \mathrm {X} \), the unique \(\tau \in {\Delta ^{\Sigma }_{\mathrm {G}}} (A)\) has the following properties: (a) no node is labeled by a variable in \(\mathrm {Y} \); (b) each variable in \(\mathrm {X} \) labels at most one node along any path in \(\tau \). Property (b) holds because if \(A \in \mathrm {X} \) had a derivation \(A \Rightarrow ^*_{\mathrm {plm}}\alpha A \beta \), then \(A \) could not have any complete derivation tree, and its productions would be useless. This also means that any pumping tree \(\tau \in {\Delta _{\mathrm {G}}^p} \) must have \({{\mathrm{\ell }}}(\tau ) \in \mathrm {Y} \). These properties allow us to bound the size of the unique complete derivation tree for a variable \(A \in \mathrm {X} \).

Lemma 1

For any \(A \in \mathrm {X} \), the unique tree \(\tau \in {\Delta ^{\Sigma }_{\mathrm {G}}} (A)\) has size at most \(m^{\left| \mathrm {X} \right| }\).

Proof

Since only variables in \(\mathrm {X} \) can appear as labels in \(\tau \) and no variable appears more than once in any path, the height of \(\tau \) is \(\le \left| \mathrm {X} \right| \). Finally, since any node has at most m children, we get the bound on the size of \(\tau \).   \(\square \)

Next, using Lemma 1, we bound the number of times any single-derivation variable can appear in a PLM cut of a pumping tree.

Lemma 2

For any \(A \in \mathrm {Y} \) and \(B \in \mathrm {X} \), \({{\mathrm{\sup _{{{\mathrm{\mathrm {Pk}}}}}}}}(B, {\Delta _{\mathrm {G}}^p} (A)) \le m^{\left| \mathrm {X} \right| + 1}\).

Proof

Let \(\tau \in {\Delta _{\mathrm {G}}^p} (A)\) be an arbitrary \(A \)-pumping tree, where \(A \in \mathrm {Y} \). Let \(\mathcal {C} \) be an arbitrary PLM cut of \(\tau \). We will prove a slightly stronger statement; we will show that the total number of single-derivation variables in \(\mathcal {C} \) is \(\le m^{\left| \mathrm {X} \right| +1}\). This will bound the Parikh supremum for any single-derivation variable.

Without loss of generality, assume that \(\mathcal {C} \) has at least one node with label in \(\mathrm {X} \). Amongst all nodes in \(\mathcal {C} \) that are labeled by a variable in \(\mathrm {X} \), let \(n \) be the node that is closest to the root, and if there are multiple such nodes, take \(n \) to be the leftmost one. From the definition of PLM cuts, the following properties hold for \(\mathcal {C} \) and \(n \): (a) any node to the right of \(n \) in \(\mathcal {C} \) that is labeled by a variable in \(\mathrm {X} \) must be a sibling of \(n \), and (b) all nodes to the left of \(n \) in \(\mathcal {C} \) labeled by variables of \(\mathrm {X} \) must be descendants of some left sibling (say \(n _1\)) of \(n \) that is also labeled by a variable in \(\mathrm {X} \). Thus, the number of nodes to the right of \(n \) (including \(n \)) in \(\mathcal {C} \) labeled by \(\mathrm {X} \) is at most m, and, by Lemma 1, the number of nodes to the left of \(n \) in \(\mathcal {C} \) labeled by \(\mathrm {X} \) is at most \(m^{\left| \mathrm {X} \right| }\). Putting these together, the total number of nodes in \(\mathcal {C} \) labeled by some variable in \(\mathrm {X} \) is at most \(m + m^{\left| \mathrm {X} \right| } \le m^{\left| \mathrm {X} \right| +1}\).    \(\square \)

Lemma 3

For any \(A \in \mathrm {V} \), and \(B \in \mathrm {Y} \), \({{\mathrm{\sup _{{{\mathrm{\mathrm {Pk}}}}}}}}(B,{\Delta _{\mathrm {G}}^p} (A))\le 2\).

Proof

Let \(\tau \) be an \(A \)-pumping tree, for some variable \(A \). Note that \(A \) must be a multiple-derivation variable because of property (b) before Lemma 1. Let \(\mathcal {C} \) be any PLM cut of \(\tau \). Since \(\tau \) is an \(A \)-pumping tree, it must contain \(A \) in its frontier. Then there must be some node \(n \) in \(\mathcal {C} \) such that the subtree \(\tau (n)\) contains \(A \) in its frontier. Let \(C = {{\mathrm{\ell }}}(n)\). Observe that \(C \) is a multiple-derivation variable because a node labeled \(A \in \mathrm {Y} \) is a descendant of \(n \). Thus, there are two complete derivation trees \(\tau ^{C}_1,\tau ^{C}_2\) with roots labeled \(C \) (Remark 1).

We first show that there cannot be more than two nodes labeled \(C \) in \(\mathcal {C} \). Assume towards the contrary that there are at least three nodes \(n _1, n _2,n _3\) in \(\mathcal {C} \) with \({{\mathrm{\ell }}}(n _1) = {{\mathrm{\ell }}}(n _2) = {{\mathrm{\ell }}}(n _3) = C \). Without loss of generality, assume that \(n _1\), \(n _2\), and \(n _3\) are in left-to-right order in \(\tau \) and that \(n \in \left\{ n _1,n _2,n _3 \right\} \). Since \(n _1,n _2,n _3\) belong to a cut, they are pairwise unrelated by the ancestor/descendant relationship.

Let \(\tau _1\) be the tree \(\tau [n _1 \mapsto \tau ^{C}_1, n _2 \mapsto \tau ^{C}_2, n _3 \mapsto \tau (n)]\), and let \(\tau _2\) be the tree \(\tau [n _1 \mapsto \tau ^{C}_2, n _2 \mapsto \tau ^{C}_1, n _3 \mapsto \tau (n)]\). By construction, \(\tau _1\) and \(\tau _2\) are both \(A \)-pumping trees with \({{\mathrm{\mathrm {Fr}}}}(\tau _1) = {{\mathrm{\mathrm {Fr}}}}(\tau _2)\) and \(\tau _1 \ne \tau _2\). However, since \(\mathrm {G} \) is polynomially ambiguous, by Theorem 1, the set of pumping trees is unambiguous, giving us the desired contradiction.

Next, we show that there cannot be more than two nodes labeled \(B \in \mathrm {Y} \) in \(\mathcal {C} \), where \(B \ne C \). Assume that there are at least three nodes \(n _1, n _2,n _3\) in \(\mathcal {C} \) with \({{\mathrm{\ell }}}(n _1) = {{\mathrm{\ell }}}(n _2) = {{\mathrm{\ell }}}(n _3) = B \). Again assume \(n _1\), \(n _2\), and \(n _3\) are in left-to-right order in \(\tau \). Further, since \(B \in \mathrm {Y} \), there are two complete derivation trees \(\tau ^{B}_1\) and \(\tau ^{B}_2\) with root labeled \(B \) (Remark 1).

Observe that at least two nodes of \(\{n _1,n _2,n _3\}\) must lie to one side of \(n \) in \(\tau \). Without loss of generality we may assume that \(n _1\) and \(n _2\) are those nodes. Let \(\tau _1\) be the tree \(\tau [n _1 \mapsto \tau ^{B}_1, n _2 \mapsto \tau ^{B}_2]\), and let \(\tau _2\) be the tree \(\tau [n _1 \mapsto \tau ^{B}_2, n _2 \mapsto \tau ^{B}_1]\). Clearly, \(\tau _1,\tau _2\) are \(A \)-pumping trees with \({{\mathrm{\mathrm {Fr}}}}(\tau _1) = {{\mathrm{\mathrm {Fr}}}}(\tau _2)\), and \(\tau _1 \ne \tau _2\).    \(\square \)

5.2 Parikh Suprema in Complete Derivation Trees

We will now use the results in Sect. 5.1 to bound the Parikh supremum of any variable in a complete derivation tree of \(\mathrm {G} \). The key property we will exploit is the fact that any complete derivation tree can be written as the “composition” of a small number of pumping trees (see Fig. 1) such that any PLM cut is the union of cuts in each of these pumping trees. The bounds on Parikh suprema will then follow from the observations in Sect. 5.1.

Fig. 1.

A complete derivation tree for the grammar in Example 5 (left) and the compressed tree data structure with the pumping trees removed (right).

We begin with some convenient notation. For a \(\tau \in {\Delta _{\mathrm {G}}} \), let \({{\mathrm{longestpath}}}(\tau )\) denote the longest path from the root of \(\tau \) to a node labeled \({{\mathrm{\ell }}}(\tau )\). If there are multiple such paths, \({{\mathrm{longestpath}}}(\tau )\) is the lexicographically-first path among them. Note that \({{\mathrm{longestpath}}}(\tau )\) can be \(\varepsilon \) if the root is the only node with label \({{\mathrm{\ell }}}(\tau )\) in \(\tau \). Let \({{\mathrm{depth}}}(\tau )\) denote the length of the longest path from root to leaf in \(\tau \).

We now describe two procedures \({{\mathrm{\mathtt {compress}}}}\) and \({{\mathrm{\mathtt {decompress}}}}\). Let us fix a complete derivation tree \(\tau \). The procedure \({{\mathrm{\mathtt {compress}}}}\) returns a data structure of pumping trees. These pumping trees are small in number and \(\tau \) is the “composition” of these pumping trees. Let \(n \) be the lowest node in \(\tau \) that has the same label as the root. \({{\mathrm{\mathtt {compress}}}}\) identifies the pumping tree obtained by removing the children of \(n \), and recursively compresses the subtrees rooted at the children of \(n \). Note that if \(n \) is the same as the root, then the pumping tree identified by \({{\mathrm{\mathtt {compress}}}}\) will just be the tree with one node.

figure a

The tree \(\tau \) is the “composition” of pumping trees in the data structure returned by \({{\mathrm{\mathtt {compress}}}}\). We describe this “composition operation” itself by an algorithm \({{\mathrm{\mathtt {decompress}}}}\).

figure b
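To make the two procedures concrete, here is a minimal Python sketch on a bare-bones tree type. The naming (`Node`, `lowest_same_label`, `compress`, `decompress`) and the list-based output format are ours, not the paper's; productions and weights are ignored, and the sketch only illustrates the decomposition and the round-trip property of Lemma 4.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str                                  # variable or terminal label
    children: List["Node"] = field(default_factory=list)

def lowest_same_label(t: Node) -> Node:
    """Node at longestpath(t): the deepest (leftmost among equally deep)
    node carrying the same label as the root; the root itself if unique."""
    best, best_depth = t, 0
    def walk(n: Node, d: int) -> None:
        nonlocal best, best_depth
        if n.label == t.label and d > best_depth:
            best, best_depth = n, d
        for c in n.children:
            walk(c, d + 1)
    walk(t, 0)
    return best

def compress(t: Node) -> list:
    """Detach the children of the lowest root-labeled node n; keep the
    resulting pumping tree and recursively compress n's subtrees."""
    n = lowest_same_label(t)
    kids, n.children = n.children, []           # remove the subtrees below n
    return [t, [compress(k) if k.children else k for k in kids]]

def decompress(d: list) -> Node:
    """Inverse of compress: reattach the recursively decompressed
    subtrees below the node at longestpath of the pumping tree."""
    pumping, subs = d
    n = lowest_same_label(pumping)
    n.children = [decompress(s) if isinstance(s, list) else s for s in subs]
    return pumping
```

Each recursive call peels off one pumping tree, and `decompress(compress(t))` restores the original tree, matching Lemma 4.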

The following lemma characterizing the relationship between \({{\mathrm{\mathtt {compress}}}}\) and \({{\mathrm{\mathtt {decompress}}}}\) is easy to see.

Lemma 4

For any complete derivation tree \(\tau \), \(\tau = {{\mathrm{\mathtt {decompress}}}}({{\mathrm{\mathtt {compress}}}}(\tau ))\).

Example 5

Consider a grammar \((\left\{ S,B,C \right\} ,\left\{ \mathtt {a} \right\} ,\mathrm {P},S)\) with productions

$$ S \rightarrow \mathtt {a}S B | \mathtt {a}B B | \mathtt {a}B,\quad B \rightarrow \mathtt {a}B C | \mathtt {a},\quad C \rightarrow \mathtt {a}\text {.} $$

Consider the complete derivation tree shown on the left in Fig. 1. The output of \({{\mathrm{\mathtt {compress}}}}\) will be

$$ [\tau _1,S \rightarrow \mathtt {a}B B, [\tau _2,B \rightarrow \mathtt {a}], B _{\mathtt {a}}] $$

We will now show that the data structure returned by \({{\mathrm{\mathtt {compress}}}}\) has a constant number of pumping trees. Consider a call of \({{\mathrm{\mathtt {compress}}}}(\tau )\), where \(p = {{\mathrm{longestpath}}}(\tau )\). The key property that we exploit is the fact that the label \({{\mathrm{\ell }}}(\tau )\) does not appear in the subtrees rooted at the children of \(\tau (p)\).

Lemma 5

For any complete derivation tree \(\tau \), the number of trees in the data structure returned by \({{\mathrm{\mathtt {compress}}}}(\tau )\) is at most \(m^{\left| \mathrm {V} \right| }\).

Proof

Let \(p = {{\mathrm{longestpath}}}(\tau )\), and let \(\tau (p\cdot 0), \tau (p\cdot 1), \ldots , \tau (p\cdot k)\) be the children of \(\tau (p)\). As observed before, the label \({{\mathrm{\ell }}}(\tau )\) does not appear in the subtrees \(\tau (p\cdot i)\). Thus, the depth of the recursion in \({{\mathrm{\mathtt {compress}}}}\) is bounded by \(\left| \mathrm {V} \right| \). Finally, observing that \(k \le m\), we get the desired bound.   \(\square \)

We are now ready to prove the main result of this section.

Lemma 6

For any variable \(A \), \({{\mathrm{\sup _{{{\mathrm{\mathrm {Pk}}}}}}}}(A, {\Delta ^{\Sigma }_{\mathrm {G}}}) \le m^{\left| \mathrm {X} \right| +\left| \mathrm {V} \right| +1}\).

Proof

By Lemma 5, we know that the number of trees in \({{\mathrm{\mathtt {compress}}}}(\tau )\) is at most \(m^{\left| \mathrm {V} \right| }\). Consider any PLM cut \(\mathcal {C} \) of \(\tau \). Any node in \(\mathcal {C} \) belongs to at most one tree in \({{\mathrm{\mathtt {compress}}}}(\tau )\). Further, for any \(\tau _1 \in {{\mathrm{\mathtt {compress}}}}(\tau )\), \(\mathcal {C} \) restricted to \(\tau _1\) is a PLM cut of \(\tau _1\). Thus, \(\mathcal {C} \) can be seen as the union of at most \(m^{\left| \mathrm {V} \right| }\) PLM cuts in pumping trees. By Lemmas 2 and 3, the Parikh supremum of any variable in any of these pumping trees is at most \(m^{\left| \mathrm {X} \right| +1}\). Hence, the number of occurrences of \(A \) in \(\mathcal {C} \) is at most \(m^{\left| \mathrm {V} \right| } \cdot m^{\left| \mathrm {X} \right| +1} = m^{\left| \mathrm {X} \right| +\left| \mathrm {V} \right| +1}\).   \(\square \)

5.3 Right-Linear SCFG for Polynomially Ambiguous SCFGs

For this section, let us fix \(k = m^{\left| \mathrm {V} \right| +\left| \mathrm {X} \right| +1}\). By Lemma 6, in any PLM derivation of \(\mathrm {G} \), any variable appears at most k times at any step. Since k is a constant, a right-linear grammar can simulate every PLM derivation of \(\mathrm {G} \) by explicitly keeping track of at most k copies of each variable. This idea is very similar to the one used in [12]. We now give the formal definition of the right-linear grammar based on this intuition.

For a sentence , we define . The stochastic right-linear grammar \(({\mathrm{G}}_1=({\mathrm{V}}_1, \{\mathrm{a}\},{\mathrm{P}}_1,{S}_1),{\mathrm{Pr}}_1)\) is formally defined as follows.

  1.

    . Thus, the variables of \(\mathrm {G}_1 \) are sequences of single-derivation variables followed by multiple-derivation variables from \(\mathrm {G} \) in which each variable appears at most k times.

  2.

    \(S _1 = \left\langle S \right\rangle \)

  3.

    For any production \(\pi = (A \rightarrow \mathtt {a}\beta ) \in \mathrm {P} \) and sentence \(\alpha \in \mathrm {V} ^*\) we define a production

    $$ \pi ^\alpha = (\left\langle A \alpha \right\rangle \rightarrow \mathtt {a}\left\langle {{\mathrm{\varvec{\ell \!f}}}}(\beta \alpha ) \right\rangle ) $$

    corresponding to applying the production \(\pi \) as a PLM step from the sentence \(A\alpha \). The set \(\mathrm {P}_1 \) is defined as

  4.

    Finally is defined as .
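To illustrate the construction, here is a hedged Python sketch that enumerates the reachable variables of \(\mathrm {G}_1 \) and the productions \(\pi ^\alpha \). All names are ours, weights are omitted, and the helper `lf` is only a stand-in guess for \({{\mathrm{\varvec{\ell \!f}}}}\) (it keeps the variables of a sentence, each capped at k occurrences, without reordering single- and multiple-derivation variables).

```python
def build_rightlinear(productions, start, k):
    """Sketch of the right-linear grammar construction.
    `productions` maps a variable A to a list of right-hand sides, each a
    list of symbols beginning with the terminal 'a' (A -> a beta).
    States of the right-linear grammar are tuples of variables; each
    production triple (state, 'a', succ) stands for <state> -> a <succ>."""
    def lf(sent):
        # Stand-in for the paper's normal form: drop terminals and cap
        # each variable at k occurrences (a guess; the paper's definition
        # also orders single-derivation before multiple-derivation vars).
        out, count = [], {}
        for s in sent:
            if s == 'a':
                continue
            if count.get(s, 0) < k:
                count[s] = count.get(s, 0) + 1
                out.append(s)
        return tuple(out)

    start_state = (start,)
    states, p1 = {start_state}, []
    work = [start_state]
    while work:
        state = work.pop()
        if not state:
            continue                      # empty state: derivation finished
        head, rest = state[0], list(state[1:])
        for rhs in productions.get(head, []):
            succ = lf(rhs + rest)         # PLM step: rewrite leftmost variable
            p1.append((state, 'a', succ))
            if succ not in states:
                states.add(succ)
                work.append(succ)
    return states, p1
```

Since every state is a tuple in which each variable occurs at most k times, the set of states is finite and the enumeration terminates; on the grammar of Example 5 with k = 2, the sketch derives, for instance, the production \(\left\langle S \right\rangle \rightarrow \mathtt {a}\left\langle B \right\rangle \) from \(S \rightarrow \mathtt {a}B \).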

We first observe that \(({\mathrm{G}}_1,{\mathrm{Pr}}_1)\) is a stochastic CFG; the proof is in [3].

Proposition 1

\(({\mathrm{G}}_1,{\mathrm{Pr}}_1)\) is a stochastic CFG.

Next, \(({\mathrm{G}}_1,{\mathrm{Pr}}_1)\) is equivalent to \((\mathrm {G},\Pr )\); the proof is in [3].

Theorem 7

For any unary, stochastic grammar \((\mathrm {G},\Pr )\) of polynomial ambiguity, there is a stochastic right-linear grammar \(({\mathrm{G}}_1,{\mathrm{Pr}}_1)\) such that .

6 Conclusions

In this paper we investigated whether Parikh’s theorem generalizes to weighted automata. We proved that it does indeed when the weighted context-free grammar is over a commutative, idempotent semiring. We showed that idempotence of the weight domain is necessary by demonstrating that Parikh’s theorem does not extend to unary, stochastic grammars. However, we proved that if the context-free grammar is polynomially ambiguous, then idempotence of the weight domain is not required for Parikh’s theorem to hold.

Our proof of Parikh’s theorem for commutative and idempotent semirings extends (as is) to pushdown automata (as opposed to context-free grammars). However, the same does not apply to our result for unary, polynomially ambiguous grammars over non-idempotent semirings; our current proof subtly relies on the “one state” property of context-free grammars. It would be interesting to see how to generalize these ideas to the case of pushdown automata. Finally, stochastic context-free grammars have a (weaker) semantics as language acceptors: the grammar accepts a word if its weight is \(> \frac{1}{2}\). Our results imply that every unary language accepted by a polynomially ambiguous, stochastic context-free grammar is also accepted by a probabilistic automaton (with probability \(> \frac{1}{2}\)). It is open whether this also holds when the grammar is exponentially ambiguous; our counterexample in this paper only shows that there is no probabilistic automaton that satisfies the stronger requirement that words are accepted with the same probability.