
1 Introduction

Probabilistic automata are fundamental models of randomized computation. They have been used in the study of such topics as the semantics and correctness of probabilistic programming languages [18, 20], randomized algorithms [24, 25], and machine learning [3, 26]. Removing randomness but adding nondeterminism, nondeterministic automata are established tools for describing concurrent and distributed systems [27].

Interest in systems that exhibit both random and nondeterministic behaviors goes back to Rabin’s randomized techniques to increase the efficiency of distributed algorithms in the 1970s and 1980s [24, 25]. This line of research yielded several automata models supporting both nondeterministic and probabilistic choices [4, 16, 28]. Many formal techniques and tools were developed for these models, and they have been successfully used in verification tasks [15, 16, 19, 30], but there are many ways of combining nondeterminism and randomization, and there remains plenty of room for further investigation.

In this paper we study nondeterministic probabilistic automata (NPAs) and propose a novel probabilistic language semantics. NPAs are similar to Segala systems [28] in that transitions can make combined nondeterministic and probabilistic choices, but NPAs also have an output weight in [0, 1] for each state, reminiscent of observations in Markov Decision Processes. This enables us to define the expected weight associated with a word in a similar way to what one would do for standard nondeterministic automata—the output of an NPA on an input word can be computed in a deterministic version of the automaton, using a careful choice of algebraic structure for the state space.

Equivalence in our semantics is language equivalence (also known as trace equivalence), which is coarser than probabilistic bisimulation [7, 9, 17, 32], which distinguishes systems with different branching structure even if the total weight assigned to a word is the same. This generalizes the classical difference between branching and linear semantics [31] to the probabilistic setting, with different target applications calling for different semantics.

After reviewing mathematical preliminaries in Sect. 2, we introduce the NPA model and explore its semantics in Sect. 3. We show that there are precisely two natural ways to define the language semantics of such systems—by either taking the maximum or the minimum of the weights associated with the different paths labeled by an input word. The proof of this fact relies on an abstract view on these automata generating probabilistic languages with algebraic structure. Specifically, probabilistic languages have the structure of a convex algebra, analogous to the join-semilattice structure of standard languages. These features can abstractly be seen as so-called Eilenberg-Moore algebras for a monad—the distribution and the powerset monads, respectively—which can support new semantics and proof techniques (see, e.g., [6, 7]).

In Sect. 4, we compare NPAs with standard, deterministic probabilistic automata (DPAs) as formulated by Rabin [23]. Our semantics ensures that NPAs recover DPAs in the special case when there is no nondeterministic choice. More interestingly, we show that there are weighted languages accepted by NPAs that are not accepted by any DPA. We use the theory of linear recurrence sequences to give a separation even for weighted languages over a unary alphabet.

In Sect. 5, we turn to equivalence. We prove that language equivalence of NPAs is undecidable by reduction from so-called threshold problems, which are undecidable [5, 12, 22]. The hard instances encoding the threshold problem are equivalences between probabilistic automata over a two-letter alphabet. Thus, the theorem immediately implies that equivalence of NPAs is undecidable when the alphabet size is at least two. The situation for automata over unary alphabets is more subtle; in particular, the threshold problem over a unary alphabet is not known to be undecidable. However, we give a reduction from the Positivity problem on linear recurrence sequences, a problem where a decision procedure would necessarily entail breakthroughs in open problems in number theory [21]. Finally, we show that despite the undecidability result we can provide a discounted metric that can be computed to arbitrarily high precision.

We survey related work and conclude in Sect. 6.

2 Preliminaries

Before we present our main technical results, we review some necessary mathematical background on convex algebras, monads, probabilistic automata, and language semantics.

2.1 Convex Algebra

A set A is a convex algebra, or a convex set, if for all \(n \in \mathbb {N}\) and tuples \((p_i)_{i = 1}^n\) of numbers in [0, 1] summing up to 1 there is an operation denoted \(\sum _{i = 1}^n p_i (-)_i :A^n \rightarrow A\) satisfying the following properties for \((a_1, \dots , a_n) \in A^n\):  

Projection:

If \(p_j = 1\) (and hence \(p_i = 0\) for all \(i \ne j\)), we have \(\sum _{i = 1}^n p_i a_i = a_j\).

Barycenter:

For any n tuples \((q_{i, j})_{j = 1}^m\) in [0, 1] summing up to 1, we have

$$ \sum _{i = 1}^n p_i \left( \sum _{j = 1}^m q_{i, j} a_j \right) = \sum _{j = 1}^m \left( \sum _{i = 1}^n p_i q_{i, j} \right) a_j . $$

Informally, a convex algebra structure gives a way to take finite convex combinations of elements in a set A. Given this structure, we can define convex subsets and generate them by elements of A.
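For intuition, here is a minimal Python sketch (ours; the helper `convex_comb` is hypothetical) of the convex algebra on [0, 1], where the n-ary operations are weighted averages; the asserts check one instance each of the projection and barycenter axioms.

```python
# A minimal sketch (ours): the unit interval as a convex algebra, with
# the n-ary operation interpreted as the weighted average.
def convex_comb(ps, xs):
    """Evaluate sum_i ps[i] * xs[i] for convex coefficients ps."""
    assert all(p >= 0 for p in ps) and abs(sum(ps) - 1.0) < 1e-9
    return sum(p * x for p, x in zip(ps, xs))

# Projection: coefficient 1 on the second argument returns it.
assert convex_comb([0.0, 1.0, 0.0], [0.3, 0.5, 0.9]) == 0.5

# Barycenter: a combination of combinations equals the flattened combination.
inner = [convex_comb([0.5, 0.5], [0.2, 0.8]), convex_comb([0.1, 0.9], [0.2, 0.8])]
lhs = convex_comb([0.25, 0.75], inner)
rhs = convex_comb([0.25 * 0.5 + 0.75 * 0.1, 0.25 * 0.5 + 0.75 * 0.9], [0.2, 0.8])
assert abs(lhs - rhs) < 1e-9
```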

Definition 1

A subset \(S \subseteq A\) is convex if it is closed under all convex combinations. (Such a set can also be seen as a convex subalgebra.) A convex set S is generated by a set \(G \subseteq A\) if for all \(s \in S\), there exist \(n \in \mathbb {N}\), coefficients \((p_i)_{i = 1}^n\) in [0, 1] summing to 1, and \((g_i)_{i = 1}^n \in G^n\) such that \(s = \sum _i p_i g_i\). When G is finite, we say that S is finitely generated.

We can also define morphisms between convex sets.

Definition 2

An affine map between two convex sets A and B is a function \(h :A \rightarrow B\) commuting with convex combinations:

$$ h \left( \sum _{i = 1}^n p_i a_i \right) = \sum _{i = 1}^n p_i h(a_i) . $$

2.2 Monads and Their Algebras

Our definition of language semantics will be based on the category theoretic framework of monads and their algebras. Monads can be used to model computational side-effects such as nondeterminism and probabilistic choice. An algebra allows us to interpret such side-effects within an object of the category.

Definition 3

A monad \((T, \eta , \mu )\) consists of an endofunctor T and two natural transformations: a unit \(\eta : Id \Rightarrow T\) and a multiplication \(\mu :TT \Rightarrow T\), satisfying the unit laws \(\mu \circ \eta T = \mu \circ T\eta = \mathrm{id}\) and the associativity law \(\mu \circ \mu T = \mu \circ T\mu \) (usually presented as commutative diagrams).

When there is no risk of confusion, we identify a monad with its endofunctor. An example of a monad in the category of sets is the triple \((\mathcal {P}_\omega , \{-\}, \bigcup )\), where \(\mathcal {P}_\omega \) denotes the finite powerset functor sending each set to the set of its finite subsets, \(\{-\}\) is the singleton operation, and \(\bigcup \) is set union.

Definition 4

An algebra for a monad \((T, \eta , \mu )\) is a pair (X, h) consisting of a carrier set X and a function \(h :TX \rightarrow X\) satisfying \(h \circ \eta _X = \mathrm{id}_X\) and \(h \circ Th = h \circ \mu _X\).

Definition 5

A homomorphism from an algebra (X, h) to an algebra (Y, k) for a monad T is a function \(f :X \rightarrow Y\) satisfying \(f \circ h = k \circ Tf\).

The algebras for the finite powerset monad are precisely the join-semilattices with bottom, and their homomorphisms are maps that preserve finite joins. The algebras for any monad together with their homomorphisms form a category.

2.3 Distribution and Convex Powerset Monads

We will work with two monads closely associated with convex sets. In the category of sets, the distribution monad \(\mathcal {D}\) maps a set X to the set \(\mathcal {D}X\) of distributions over X with finite support. The unit \(\eta :X \rightarrow \mathcal {D}X\) maps \(x \in X\) to the point distribution \(\delta _x\) at x. For the multiplication \(\mu :\mathcal {D}\mathcal {D}X \rightarrow \mathcal {D}X\), let \(d \in \mathcal {D}\mathcal {D}X\) be a finite distribution with support \(\{d_1, \dots , d_n\}\) and define \(\mu (d) = \sum _{i = 1}^n p_i d_i\), where \(p_i\) is the probability of producing \(d_i\) under d. The category of algebras for the distribution monad is precisely the category of convex sets and affine maps—we will often convert between these two representations implicitly.
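A small Python illustration (ours; the representation of finite-support distributions as dicts is an assumption of the sketch) of the unit and multiplication of the distribution monad follows.

```python
# A small illustration (ours) of the distribution monad: finite-support
# distributions over X are dicts mapping elements to probabilities; a
# distribution over distributions is a list of (distribution, prob) pairs.
def unit(x):
    """eta: the point distribution at x."""
    return {x: 1.0}

def mult(dd):
    """mu: flatten a distribution over distributions by averaging."""
    out = {}
    for d, p in dd:
        for x, q in d.items():
            out[x] = out.get(x, 0.0) + p * q
    return out

d1, d2 = {'a': 0.5, 'b': 0.5}, unit('a')
flat = mult([(d1, 0.2), (d2, 0.8)])        # the mixture 0.2*d1 + 0.8*d2
assert abs(flat['a'] - 0.9) < 1e-9 and abs(flat['b'] - 0.1) < 1e-9
```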

In the category of convex sets, the finitely generated nonempty convex powerset monad \(\mathcal {P}_c\) [7] maps a convex set A to the set \(\mathcal {P}_c A\) of finitely generated nonempty convex subsets of A. The convex algebra structure on \(\mathcal {P}_c A\) is given by \(\sum _{i = 1}^n p_i S_i = \{\sum _{i = 1}^n p_i a_i \mid a_i \in S_i\}\) with every \(S_i \in \mathcal {P}_c A\). The unit maps \(a \in A\) to the singleton convex set \(\{ a \}\), and the multiplication is again the union operation, which collapses nested convex sets.

As an example, we can consider this monad on the convex algebra [0, 1]. The result \(\mathcal {P}_c[0, 1]\) is a finitely generated convex set.

Lemma 1

The convex set \(\mathcal {P}_c[0, 1]\) is generated by its elements \(\{0\}\), \(\{1\}\), and [0, 1]; that is, every element of \(\mathcal {P}_c[0, 1]\) is a convex combination of these three.

Proof

The finitely generated nonempty convex subsets of [0, 1] are of the form [p, q] for \(p, q \in [0, 1]\) with \(p \le q\), and \([p, q] = p\{1\} + (q - p)[0, 1] + (1 - q)\{0\}\).    \(\square \)

To describe automata with both nondeterministic and probabilistic transitions, we will work with convex powersets of distributions. The functor \(\mathcal {P}_c\mathcal {D}\) taking sets X to the set of finitely generated nonempty convex sets of distributions over X can be given a monad structure.

Explicitly, writing \(\alpha \) for the (affine) convex algebra structure on \(\mathcal {P}_c A\) for any convex algebra A, the composite monad \(\mathcal {P}_c\mathcal {D}\) is given by

$$ \eta ^{\mathcal {P}_c\mathcal {D}}(x) = \{\eta ^{\mathcal {D}}(x)\} , \qquad \mu ^{\mathcal {P}_c\mathcal {D}} = \bigcup {} \circ \mathcal {P}_c\alpha . \qquad \qquad (1) $$

For all convex sets A and finite nonempty subsets \(S \subseteq A\), we can define the convex closure of S (sometimes called the convex hull) by

$$ \mathsf {Conv}(S) = \{ \alpha (d) \mid d \in \mathcal {D}A,\ \mathsf {supp}(d) \subseteq S \}, $$

where \(\alpha :\mathcal {D}A \rightarrow A\) is the convex algebra structure on A. \(\mathsf {Conv}\) is in fact a natural transformation, a fact we will use later.

Lemma 2

For all convex sets \((A, \alpha )\) and \((B, \beta )\), affine maps \(f :A \rightarrow B\), and finite nonempty subsets \(S \subseteq A\), \(\{f(a) \mid a \in \mathsf {Conv}(S)\} = \mathsf {Conv}(\{f(a) \mid a \in S\})\).

Proof

We will first show that

$$ \{\mathcal {D}f(d') \mid d' \in \mathcal {D}A,\ \mathsf {supp}(d') \subseteq S\} = \{d \in \mathcal {D}B \mid \mathsf {supp}(d) \subseteq \{f(a) \mid a \in S\}\} \qquad \qquad (2) $$

for all finite nonempty \(S \subseteq A\). For the inclusion from left to right, note that for each \(d' \in \mathcal {D}A\) such that \(\mathsf {supp}(d') \subseteq S\) we have \(\mathcal {D}f(d')(b) > 0\) only if there exists \(a \in S\) such that \(f(a) = b\). Thus, \(\mathsf {supp}(\mathcal {D}f(d')) \subseteq \{f(a) \mid a \in S\}\). Conversely, consider \(d \in \mathcal {D}B\) such that \(\mathsf {supp}(d) \subseteq \{f(a) \mid a \in S\}\). We define \(d' \in \mathcal {D}A\) by

$$ d'(a) = \frac{d(f(a))}{|\{a' \in S \mid f(a') = f(a)\}|} $$

for \(a \in S\) and \(d'(a) = 0\) otherwise. Then \(\mathsf {supp}(d') \subseteq S\) and \(\mathcal {D}f(d') = d\), which establishes (2). Now we have

$$\begin{aligned} \{f(a) \mid a \in \mathsf {Conv}(S)\}&= \{f(\alpha (d')) \mid d' \in \mathcal {D}A,\ \mathsf {supp}(d') \subseteq S\} \\&= \{\beta (\mathcal {D}f(d')) \mid d' \in \mathcal {D}A,\ \mathsf {supp}(d') \subseteq S\} \\&= \{\beta (d) \mid d \in \mathcal {D}B,\ \mathsf {supp}(d) \subseteq \{f(a) \mid a \in S\}\} \\&= \mathsf {Conv}(\{f(a) \mid a \in S\}), \end{aligned}$$

using affineness of f in the second step and (2) in the third.    \(\square \)

2.4 Automata and Language Semantics

In this section we review the general language semantics for automata with side-effects provided by a monad (see, e.g., [2, 14, 29]). This categorical framework is the foundation of our language semantics for NPAs.

Definition 6

Given a monad \((T, \eta , \mu )\) in the category of sets, an output set O, and a (finite) alphabet A, a T-automaton is defined by a tuple \((S, s_0, \gamma , \{ \tau _a \}_{a \in A})\), where S is the set of states, \(s_0 \in S\) is the initial state, \(\gamma :S \rightarrow O\) is the output function, and \(\tau _a :S \rightarrow TS\) for \(a \in A\) are the transition functions.

This abstract formulation encompasses many standard notions of automata. For instance, we recover deterministic (Moore) automata by letting T be the identity monad; deterministic acceptors are a further specialization where the output set is the set \(2 = \{0, 1\}\), with 0 modeling rejecting states and 1 modeling accepting states. If we use the powerset monad, we recover nondeterministic acceptors.

Any T-automaton can be determinized, using a categorical generalization of the powerset construction [29].

Definition 7

Given a monad \((T, \eta , \mu )\) in the category of sets, an output set O with a T-algebra structure \(o :TO \rightarrow O\), and a (finite) alphabet A, a T-automaton \((S, s_0, \gamma , \{ \tau _a \}_{a \in A})\) can be determinized into the deterministic automaton \((TS, s_0', \gamma ', \{ \tau _a' \}_{a \in A})\) given by \(s_0' = \eta (s_0) \in TS\) and

$$\begin{aligned} \begin{array}{ll} \gamma ' :TS \rightarrow O &{} \tau _a' :TS \rightarrow TS \\ \gamma ' = o \circ T\gamma &{} \tau _a' = \mu \circ T\tau _a. \end{array} \end{aligned}$$
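To make the construction concrete, the following Python sketch (ours; the NFA and the name `run_nfa` are hypothetical) instantiates Definition 7 for the finite powerset monad, giving the classical subset construction. The output algebra on \(2 = \{0, 1\}\) takes finite joins, so a subset accepts exactly when some member accepts.

```python
# A sketch (ours) of Definition 7 for T the finite powerset monad.
def run_nfa(init, out, trans, word):
    current = {init}                                   # eta(s_0)
    for a in word:
        current = set().union(*(trans[s][a] for s in current))  # mu . T(tau_a)
    return max((out[s] for s in current), default=0)   # o . T(gamma), o = join

# Hypothetical NFA over {a, b} accepting exactly the words ending in 'a'.
trans = {0: {'a': {0, 1}, 'b': {0}}, 1: {'a': set(), 'b': set()}}
out = {0: 0, 1: 1}
assert run_nfa(0, out, trans, "aba") == 1
assert run_nfa(0, out, trans, "ab") == 0
```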

This construction allows us to define the language semantics of any T-automaton as the semantics of its determinization. More formally, we have the following definition.

Definition 8

Given a monad \((T, \eta , \mu )\) in the category of sets, an output set O with a T-algebra structure \(o :TO \rightarrow O\), and a (finite) alphabet A, the language accepted by a T-automaton \(\mathcal {A} = (S, s_0, \gamma , \{ \tau _a \}_{a \in A})\) is the function \(l_{\mathcal {A}} :A^* \rightarrow O\) given by \(l_{\mathcal {A}}(u) = \gamma '(r(u))\), where \(r :A^* \rightarrow TS\) is defined inductively by \(r(\varepsilon ) = s_0'\) and \(r(ua) = \tau _a'(r(u))\), with \(s_0'\), \(\gamma '\), and \(\tau _a'\) as in Definition 7.

As an example, we recover deterministic probabilistic automata (DPAs) by taking T to be the distribution monad \(\mathcal {D}\) and letting the output set be the interval [0, 1]. That is, a DPA with finite state space S has an output function of type \(S \rightarrow [0, 1]\), and each of its transition functions is of type \(S \rightarrow \mathcal {D}S\). To give a semantics to such an automaton, we use the usual \(\mathcal {D}\)-algebra structure \(\mathbb {E} :\mathcal {D}[0, 1] \rightarrow [0, 1]\) computing the expected weight: \(\mathbb {E}(d) = \sum _{r \in \mathsf {supp}(d)} d(r) \cdot r\).

More concretely, the semantics works as follows. Let \((S, s_0, \gamma , \{ \tau _a \}_{a \in A})\) be a DPA. At any time while reading a word, we are in a convex combination of states \(\sum _{i = 1}^n p_i s_i\) (equivalently, a distribution over states). The current output is given by evaluating the sum \(\sum _{i = 1}^n p_i \gamma (s_i)\). On reading a symbol \(a \in A\), we transition to the convex combination of convex combinations \(\sum _{i = 1}^n p_i \tau _a(s_i)\), say \(\sum _{i = 1}^n p_i \sum _{j = 1}^{m_i} q_{i, j} s_{i, j}\), which is collapsed to the final convex combination \(\sum _{i = 1}^n \sum _{j = 1}^{m_i} p_i q_{i, j} s_{i, j}\) (again, a distribution over states).
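The walkthrough above can be transcribed directly into code. The following Python sketch (ours; the two-state DPA is a made-up example) represents distributions over states as dictionaries and computes the expected weight of a word.

```python
# A direct transcription (ours) of the walkthrough above.
def dpa_weight(init, out, trans, word):
    dist = {init: 1.0}
    for a in word:
        nxt = {}
        for s, p in dist.items():                  # current combination sum_i p_i s_i
            for t, q in trans[s][a].items():       # tau_a(s_i) = sum_j q_{i,j} s_{i,j}
                nxt[t] = nxt.get(t, 0.0) + p * q   # collapse to sum_{i,j} p_i q_{i,j}
        dist = nxt
    return sum(p * out[s] for s, p in dist.items())  # expected output

# Hypothetical DPA: from state 0, letter 'a' stays with probability 1/2.
trans = {0: {'a': {0: 0.5, 1: 0.5}}, 1: {'a': {1: 1.0}}}
out = {0: 1.0, 1: 0.0}
assert abs(dpa_weight(0, out, trans, "aa") - 0.25) < 1e-9
```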

Remark 1

One may wonder if the automaton model would be more expressive if the initial state \(s_0\) in an automaton \((S, s_0, \gamma , \{ \tau _a \}_{a \in A})\) were an element of TS rather than S. This is not the case, since we can always add a new element to S that simulates \(s_0\) by setting its output to \((o \circ T\gamma )(s_0)\) and its transition on \(a \in A\) to \((\mu \circ T\tau _a)(s_0)\).

For instance, DPAs allowing a distribution over states as the initial state can be represented by an initial state distribution \(\mu \), an output vector \(\gamma \), and transitions \(\tau _a\). In typical presentations, \(\mu \) and \(\gamma \) are represented as weight vectors over states, and the \(\tau _a\) are encoded by stochastic matrices.
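A quick sketch of this matrix presentation (ours; it assumes numpy, and the concrete automaton is hypothetical): the weight of the word \(a^n\) is the product of initial vector, n copies of the transition matrix, and output vector.

```python
import numpy as np

# A sketch (ours) of the matrix presentation: a unary DPA is (mu, M, gamma)
# with initial row vector mu, stochastic matrix M, and output vector gamma;
# the weight of a^n is mu @ M^n @ gamma.
mu = np.array([1.0, 0.0])                  # start in state 0
M = np.array([[0.5, 0.5],                  # on 'a': stay or leave, prob 1/2 each
              [0.0, 1.0]])
gamma = np.array([1.0, 0.0])               # state outputs

def weight(n):
    return mu @ np.linalg.matrix_power(M, n) @ gamma

print([weight(n) for n in range(4)])       # [1.0, 0.5, 0.25, 0.125]
```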

3 Nondeterministic Probabilistic Automata

We work with an automaton model supporting probabilistic and nondeterministic behaviors, inspired by Segala [28]. On each input letter, the automaton can choose from a finitely generated nonempty convex set of distributions over states. After selecting a distribution, the automaton then transitions to its next state probabilistically. Each state has an output weight in [0, 1]. The following formalization is an instantiation of Definition 6 with the monad \(\mathcal {P}_c\mathcal {D}\).

Definition 9

A nondeterministic probabilistic automaton (NPA) over a (finite) alphabet A is defined by a tuple \((S, s_0, \gamma , \{ \tau _a \}_{a \in A})\), where S is a finite set of states, \(s_0 \in S\) is the initial state, \(\gamma :S \rightarrow [0, 1]\) is the output function, and \(\tau _a :S \rightarrow \mathcal {P}_c\mathcal {D}S\) are the transition functions indexed by inputs \(a \in A\).

As an example, consider the NPA below.

(3)

States are labeled by their direct output (i.e., their weight under \(\gamma \)), and we write the state name next to each state; outgoing edges represent transitions. We indicate only a set of generators of the convex subset that a state transitions into. If one of these generators is a distribution with nonsingleton support, the transition is depicted as an edge into a black dot, whose outgoing edges, labeled with probabilities, represent the distribution.

Our NPAs recognize weighted languages. The rest of the section is concerned with formally defining this semantics, based on the general framework from Sect. 2.4.

3.1 From Convex Algebra to Language Semantics

To define language semantics for NPAs, we will use the monad structure of \(\mathcal {P}_c\mathcal {D}\). To be able to use the semantics from Sect. 2.4, we need to specify a \(\mathcal {P}_c\mathcal {D}\)-algebra structure \(o :\mathcal {P}_c\mathcal {D}[0, 1] \rightarrow [0, 1]\). Moreover, our model should naturally coincide with DPAs when transitions make no nondeterministic choices, i.e., when each transition function maps each state to a singleton set containing one distribution over states. Thus, we require the \(\mathcal {P}_c\mathcal {D}\)-algebra o to extend the expected weight function \(\mathbb {E}\), in the sense that

$$ o(\{d\}) = \mathbb {E}(d) \quad \text {for all } d \in \mathcal {D}[0, 1]. \qquad \qquad (4) $$

3.2 Characterizing the Convex Algebra on [0, 1]

While in principle there could be many different \(\mathcal {P}_c\mathcal {D}\)-algebras on [0, 1] leading to different language semantics for NPAs, we show that (i) each \(\mathcal {P}_c\mathcal {D}\)-algebra extending the \(\mathcal {D}\)-algebra \(\mathbb {E}\) on [0, 1] is fully determined by a \(\mathcal {P}_c\)-algebra on [0, 1], and (ii) there are exactly two \(\mathcal {P}_c\)-algebras on [0, 1]: the map computing the minimum and the map computing the maximum.

Proposition 1

Any \(\mathcal {P}_c\mathcal {D}\)-algebra o on [0, 1] extending \(\mathbb {E}\) is of the form \(o = \alpha \circ \mathcal {P}_c\mathbb {E}\), where \(\alpha \) is a \(\mathcal {P}_c\)-algebra.

Proof

Let \(o :\mathcal {P}_c\mathcal {D}[0, 1] \rightarrow [0, 1]\) be a \(\mathcal {P}_c\mathcal {D}\)-algebra extending \(\mathbb {E}\). We define \(\alpha = o \circ \mathcal {P}_c\eta ^{\mathcal {D}} :\mathcal {P}_c[0, 1] \rightarrow [0, 1]\), where \(\eta ^{\mathcal {D}}\) is the unit of \(\mathcal {D}\). A diagram chase using the algebra laws for o and the assumption that o extends \(\mathbb {E}\) shows that \(\alpha \circ \mathcal {P}_c\mathbb {E} = o\), so it only remains to show that \(\alpha \) is a \(\mathcal {P}_c\)-algebra. The unit law holds since \(\alpha (\{r\}) = o(\{\delta _r\}) = \mathbb {E}(\delta _r) = r\), and the multiplication law follows from the corresponding law for o.    \(\square \)

Proposition 2

The only \(\mathcal {P}_c\)-algebras on the convex set [0, 1] are \(\mathsf {min}\) and \(\mathsf {max}\).

Proof

Let \(\alpha :\mathcal {P}_c[0, 1] \rightarrow [0, 1]\) be a \(\mathcal {P}_c\)-algebra. Then for any \(r \in [0, 1]\), \(\alpha (\{r\}) = r\), and the multiplication law

$$ \alpha \circ \mathcal {P}_c\alpha = \alpha \circ \bigcup \qquad \qquad (5) $$

must hold. Furthermore, \(\alpha \) is an affine map. Since \(\mathcal {P}_c[0, 1]\) is generated by \(\{0\}\), \(\{1\}\), and [0, 1] by Lemma 1, and \(\alpha (\{0\}) = 0\) and \(\alpha (\{1\}) = 1\), \(\alpha \) is completely determined by \(\alpha ([0, 1])\). We now calculate that \(\alpha ([0, 1]) \in \{0, 1\}\): applying (5) to \(U = \mathsf {Conv}\{\{0\}, [0, 1]\} \in \mathcal {P}_c\mathcal {P}_c[0, 1]\), for which \(\bigcup U = [0, 1]\) and \(\mathcal {P}_c\alpha (U) = [0, \alpha ([0, 1])]\), and using affineness of \(\alpha \) gives

$$ \alpha ([0, 1]) = \alpha ([0, \alpha ([0, 1])]) = \alpha ([0, 1]) \cdot \alpha ([0, 1]) . $$

Thus, we have either \(\alpha ([0, 1]) = 0\) or \(\alpha ([0, 1]) = 1\). Consider any finitely generated nonempty convex subset \([p, q] \subseteq [0, 1]\). If \(\alpha ([0, 1]) = 0\), then Lemma 1 gives

$$\begin{aligned} \alpha ([p, q])&= \alpha (p\{1\} + (q - p)[0, 1] + (1 - q)\{0\}) \\&= p \cdot \alpha (\{1\}) + (q - p) \cdot \alpha ([0, 1]) + (1 - q) \cdot \alpha (\{0\}) \\&= p \cdot 1 + (q - p) \cdot 0 + (1 - q) \cdot 0 = p = \mathsf {min}([p, q]); \end{aligned}$$

if \(\alpha ([0, 1]) = 1\), then

$$\begin{aligned} \alpha ([p, q])&= \alpha (p\{1\} + (q - p)[0, 1] + (1 - q)\{0\}) \\&= p \cdot \alpha (\{1\}) + (q - p) \cdot \alpha ([0, 1]) + (1 - q) \cdot \alpha (\{0\}) \\&= p \cdot 1 + (q - p) \cdot 1 + (1 - q) \cdot 0 = q = \mathsf {max}([p, q]). \end{aligned}$$

We now show that \(\mathsf {min}\) is an algebra; the case for \(\mathsf {max}\) is analogous. We have

$$ \mathsf {min}\left( \sum _{i = 1}^n p_i S_i \right) = \sum _{i = 1}^n p_i \, \mathsf {min}(S_i), $$

so \(\mathsf {min}\) is an affine map. Furthermore, clearly \(\mathsf {min}(\{r\}) = r\) for all \(r \in [0, 1]\), and for all \(U \in \mathcal {P}_c\mathcal {P}_c[0, 1]\),

$$ \mathsf {min}\left( \bigcup U \right) = \mathsf {min}(\{\mathsf {min}(S) \mid S \in U\}) = (\mathsf {min} \circ \mathcal {P}_c\mathsf {min})(U) . \qquad \square $$

Corollary 1

The only \(\mathcal {P}_c\mathcal {D}\)-algebras on [0, 1] extending \(\mathbb {E}\) are \(\mathsf {min} \circ \mathcal {P}_c\mathbb {E}\) and \(\mathsf {max} \circ \mathcal {P}_c\mathbb {E}\).

Consider again the NPA (3). Since we can always choose to remain in the initial state, the \(\mathsf {max}\) semantics assigns 1 to each word for this automaton. The \(\mathsf {min}\) semantics is more interesting. Consider reading the word aa. On the first a, we transition from \(s_0\) into a convex set of distributions over states; reading the second a yields a convex set of distributions over such sets. Pushing the inner transition sets through the outer distributions and then taking the union collapses the nested structure into a single convex subset of distributions over states, and hence, applying \(\gamma \), over outputs.

Calculating the expected weights of these distributions gives a convex set of values whose minimum is \(\frac{1}{4}\). One can show that on reading any word \(u \in A^*\) the automaton outputs \(2^{-n}\) under the \(\mathsf {min}\) semantics, where n is the length of the longest sequence of a's occurring in u.
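The computation above can be mechanized by tracking only generator distributions: since \(\mathbb {E}\) is affine, the minimum over a finitely generated convex set is attained at a generator. The Python sketch below is ours; the two-state NPA over the single letter a is a hypothetical example, chosen so that \(a^n\) receives weight \(2^{-n}\). Note that the generator list can grow exponentially, so this is an illustration rather than an efficient procedure.

```python
from itertools import product

# A sketch (ours) of the min semantics via generators. tau[s][a] is a list
# of generator distributions (dicts over states).
def npa_min_weight(init, out, tau, word):
    gens = [{init: 1.0}]                       # generators of {eta(s_0)}
    for a in word:
        new_gens = []
        for d in gens:
            support = list(d)
            # one new generator per choice of a generator of tau[s][a] per state s
            for choice in product(*(tau[s][a] for s in support)):
                g = {}
                for s, gd in zip(support, choice):
                    for t, q in gd.items():
                        g[t] = g.get(t, 0.0) + d[s] * q
                new_gens.append(g)
        gens = new_gens
    # E is affine, so the minimum over the convex set is attained at a generator
    return min(sum(p * out[s] for s, p in d.items()) for d in gens)

# Hypothetical NPA: state 0 may stay put or move to the sink 1 with prob 1/2.
tau = {0: {'a': [{0: 1.0}, {0: 0.5, 1: 0.5}]}, 1: {'a': [{1: 1.0}]}}
out = {0: 1.0, 1: 0.0}
assert abs(npa_min_weight(0, out, tau, "aa") - 0.25) < 1e-9
```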

The semantics arising from \(\mathsf {max}\) and \(\mathsf {min}\) are highly symmetrical; in a sense, they are two representations of the same semantics. Technically, we establish the following relation between the two semantics, which will allow us to avoid proving each property twice.

Proposition 3

Consider an NPA \(\mathcal {A} = (S, s_0, \gamma , \{ \tau _a \}_{a \in A})\) under the \(\mathsf {min}\) semantics. Define \(\gamma ' :S \rightarrow [0, 1]\) by \(\gamma '(s) = 1 - \gamma (s)\), and consider the NPA \(\mathcal {A}' = (S, s_0, \gamma ', \{ \tau _a \}_{a \in A})\) under the \(\mathsf {max}\) semantics. Then \(l_{\mathcal {A}'}(u) = 1 - l_{\mathcal {A}}(u)\) for all \(u \in A^*\).

Proof

We prove a stronger property by induction on u: for all \(S' \in \mathcal {P}_c\mathcal {D}S\) and \(u \in A^*\), the weight that the determinization of \(\mathcal {A}'\) computes for u starting from \(S'\) equals 1 minus the weight that the determinization of \(\mathcal {A}\) computes for u starting from \(S'\). This is sufficient because \(\mathcal {A}\) and \(\mathcal {A}'\) have the same initial state. For the base case, \(\mathbb {E}(\mathcal {D}\gamma '(d)) = 1 - \mathbb {E}(\mathcal {D}\gamma (d))\) for every \(d \in \mathcal {D}S\), and \(\mathsf {max}(\{1 - r \mid r \in V\}) = 1 - \mathsf {min}(V)\) for every \(V \in \mathcal {P}_c[0, 1]\).

Furthermore, the two automata have identical transition functions, so reading a letter preserves the induction hypothesis.    \(\square \)

4 Expressive Power of NPAs

Our convex language semantics for NPAs coincides with the standard semantics for DPAs when all convex sets in the transition functions are singletons. In this section, we show that NPAs are in fact strictly more expressive than DPAs. We give two results. First, we exhibit a concrete language over a binary alphabet that is recognizable by an NPA but not recognizable by any DPA. This argument uses elementary facts about the Hankel matrix, and in fact shows that the language is not recognizable by any weighted finite automaton (WFA).

Next, we separate NPAs and DPAs over a unary alphabet. This argument is substantially more technical, relying on deeper results from number theory about linear recurrence sequences.

4.1 Separating NPAs and DPAs: Binary Alphabet

Consider the language \(l :\{a, b\}^* \rightarrow [0, 1]\) given by \(l(u) = 2^{-n}\), where n is the length of the longest sequence of a's occurring in u. Recall that this language is accepted by the NPA (3) using the \(\mathsf {min}\) algebra.

Theorem 1

NPAs are more expressive than DPAs. Specifically, there is no DPA, or even WFA, accepting \(l\).

Proof

Assume there exists a WFA accepting \(l\), and for \(u \in \{a, b\}^*\) let \(l_u\) be the language of the linear combination of states reached after reading the word u. We will show that the languages \(l_{a^nb}\) for \(n \in \mathbb {N}\) are linearly independent. Since the function that assigns to each linear combination of states its accepted language is a linear map, this implies that the set of linear combinations of states of the WFA is a vector space of infinite dimension, and hence the WFA cannot exist.

The proof is by induction on a natural number m. Assume that for all natural numbers \(i \le m\) the languages \(l_{a^ib}\) are linearly independent. For all \(i \le m\) we have \(l_{a^ib}(a^m) = 2^{-m}\) and \(l_{a^ib}(a^{m + 1}) = 2^{-m - 1}\); however, \(l_{a^{m + 1}b}(a^m) = l_{a^{m + 1}b}(a^{m + 1}) = 2^{-m - 1}\). If \(l_{a^{m + 1}b}\) is a linear combination of the languages \(l_{a^ib}\) for \(i \le m\), then there are constants \(c_0, \ldots , c_m \in \mathbb {R}\) such that in particular

$$ \sum _{i = 0}^m c_i \cdot 2^{-m} = 2^{-m - 1} \qquad \text {and} \qquad \sum _{i = 0}^m c_i \cdot 2^{-m - 1} = 2^{-m - 1}, $$

i.e., \(\sum _i c_i = \frac{1}{2}\) and \(\sum _i c_i = 1\). These equations cannot be satisfied. Therefore, for all natural numbers \(i \le m + 1\) the languages \(l_{a^ib}\) are linearly independent. We conclude by induction that for all \(m \in \mathbb {N}\) the languages \(l_{a^ib}\) for \(i \le m\) are linearly independent, which implies that all languages \(l_{a^nb}\) for \(n \in \mathbb {N}\) are linearly independent.    \(\square \)

A similar argument works for NPAs under the \(\mathsf {max}\) algebra semantics: one can repeat the argument in the above theorem for the language \(1 - l\) accepted by the NPA resulting from applying Proposition 3 to the NPA (3).

4.2 Separating NPAs and DPAs: Unary Alphabet

We now turn to the unary case. A weighted language over a unary alphabet can be represented by a sequence \(\langle u_i \rangle = u_0, u_1, \dots \) of real numbers. We will give such a language that is recognizable by an NPA but not recognizable by any WFA (and in particular, any DPA) using results on linear recurrence sequences, an established tool for studying unary weighted languages.

We begin with some mathematical preliminaries. A sequence of real numbers \(\langle u_i \rangle \) is a linear recurrence sequence (LRS) if for some integer \(k \in \mathbb {N}\) (the order), constants \(u_0, \dots , u_{k - 1} \in \mathbb {R}\) (the initial conditions), and coefficients \(b_0, \dots , b_{k - 1} \in \mathbb {R}\), we have

$$ u_{n + k} = b_{k - 1} u_{n + k - 1} + \cdots + b_0 u_n $$

for every \(n \in \mathbb {N}\). A well-known example of an LRS is the Fibonacci sequence, an order-2 LRS satisfying the recurrence \(f_{n + 2} = f_{n + 1} + f_n\). Another example of an LRS is any constant sequence, i.e., \(\langle u_i \rangle \) with \(u_i = c\) for all i.

Linear recurrence sequences are closed under linear combinations: for any two LRS \(\langle u_i \rangle , \langle v_i \rangle \) and constants \(\alpha , \beta \in \mathbb {R}\), the sequence \(\langle \alpha u_i + \beta v_i \rangle \) is again an LRS (possibly of larger order). We will use one important theorem about LRSs. See the monograph by Everest et al. [11] for details.
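For concreteness, a small Python sketch (ours; the helper `lrs` is hypothetical) that unfolds an LRS from its coefficients and initial conditions and illustrates closure under linear combinations:

```python
# A quick illustration (ours) of linear recurrence sequences.
def lrs(coeffs, init, n):
    """u_0..u_{n-1} with u_{m+k} = coeffs[k-1]*u_{m+k-1} + ... + coeffs[0]*u_m."""
    u = list(init)
    while len(u) < n:
        u.append(sum(c * x for c, x in zip(coeffs, u[-len(coeffs):])))
    return u[:n]

fib = lrs([1, 1], [0, 1], 10)       # Fibonacci, order 2
const = lrs([1], [0.25], 10)        # the constant sequence 1/4, order 1
combo = [2 * f + 3 * c for f, c in zip(fib, const)]  # again an LRS
print(fib)                          # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```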

Theorem 2

(Skolem-Mahler-Lech). If \(\langle u_i \rangle \) is an LRS, then its zero set \(\{i \in \mathbb {N}\mid u_i = 0\}\) is the union of a finite set and finitely many arithmetic progressions (i.e., sets of the form \(\{ p + k n \mid n \in \mathbb {N} \}\) with \(k \ne 0\)).

This is a celebrated result in number theory and not at all easy to prove. To make the connection to probabilistic and weighted automata, we will use two results. The first proposition follows from the Cayley-Hamilton Theorem.

Proposition 4

(see, e.g., [21]). Let \(l\) be a weighted unary language recognizable by a weighted automaton W. Then the sequence of weights \(\langle u_i \rangle \) with \(u_i = l(a^i)\) is an LRS, whose order is at most the number of states in W.

While not every LRS can be recognized by a DPA, it is known that DPAs can recognize a weighted language encoding the sign of a given LRS.

Theorem 3

(Akshay et al. [1, Theorem 3, Corollary 4]). Given any LRS \(\langle u_i \rangle \), there exists a stochastic matrix M such that

$$ u_n \ge 0 \iff u^T M^n v \ge 1/4 $$

for all n, where \(u = (1, 0, \dots , 0)\) and \(v = (0, 1, 0, \dots , 0)\). Equality holds on the left if and only if it holds on the right. The weighted language \(\langle u^T M^n v \rangle \) is recognizable by a DPA with input vector u, output vector v, and transition matrix M (Remark 1). If the LRS is rational, M can be taken to be rational as well.

We are now ready to separate NPAs and WFAs over a unary alphabet.

Theorem 4

There is a language over a unary alphabet that is recognizable by an NPA but not by any WFA (and in particular any DPA).

Proof

We will work in the complex numbers \(\mathbb {C}\), with i denoting the imaginary unit. Let \(a, b \in \mathbb {Q}\) be nonzero such that \(z \triangleq a + bi\) lies on the unit circle in \(\mathbb {C}\), for instance \(a = 3/5, b = 4/5\), so that \(|a + bi|^2 = a^2 + b^2 = 1\). Let \(\bar{z} = a - bi\) denote the complex conjugate of z and let \(\text {Re}(z)\) denote the real part of a complex number z. It is possible to show that z is not a root of unity, i.e., \(z^k \ne 1\) for all \(k \in \mathbb {N}\). Let \(\langle x_n \rangle \) be the sequence \(x_n \triangleq (z^n + \bar{z}^n)/2 = \text {Re}(z^n)\). By direct calculation, this sequence has imaginary part zero and satisfies the recurrence

$$ x_{n + 2} = 2 a x_{n + 1} - (a^2 + b^2) x_n $$

with \(x_0 = 1\) and \(x_1 = a\), so \(\langle x_n \rangle \) is an order-2 rational LRS. By Theorem 3, there exists a stochastic matrix M and non-negative vectors u, v such that

$$ x_n \ge 0 \iff u^T M^n v \ge 1/4 $$

for all n, where equality holds on the left if and only if equality holds on the right. Note that \(x_n = \text {Re} (z^n) \ne 0\) since z is not a root of unity (so in particular \(z^n \ne \pm i\)), hence equality never holds on the right. Letting \(\langle y_n \rangle \) be the sequence \(y_n = u^T M^n v\), the (unary) language with weights \(\langle y_n \rangle \) is recognized by the DPA with input u, output v and transition matrix M. Furthermore, the constant sequence \(\langle 1/4 \rangle \) is recognizable by a DPA.

Now we define a sequence \(\langle w_n \rangle \) with \(w_n = \max (y_n, 1/4)\). Since \(\langle y_n \rangle \) and \(\langle 1/4 \rangle \) are recognizable by DPAs, \(\langle w_n \rangle \) is recognizable by an NPA whose initial state nondeterministically chooses between the two DPAs (see Remark 1). Suppose for the sake of contradiction that it is also recognizable by a WFA. Then \(\langle w_n \rangle \) is an LRS (by Proposition 4) and hence so is \(\langle t_n \rangle \) with \(t_n = w_n - y_n\). If we now consider the zero set

$$ S = \{n \in \mathbb {N}\mid t_n = 0\} = \{n \in \mathbb {N}\mid y_n \ge 1/4\} = \{n \in \mathbb {N}\mid x_n > 0\} $$

(using that equality never holds on either side of the equivalence in Theorem 3), Theorem 2 implies that S is the union of a finite set and finitely many arithmetic progressions. Note that S cannot be finite: \(x_n = \text {Re}(z^n)\) and \(z^n\) is dense in the unit circle since z is not a root of unity, so \(x_n > 0\) for infinitely many n. Hence there must be at least one arithmetic progression \(\{ p + kn \mid n \in \mathbb {N} \}\) with \(k \ne 0\) contained in S. Letting \(\langle r_n \rangle \) be

$$ r_n = (z^p \cdot (z^k)^n + \bar{z}^p \cdot (\bar{z}^k)^n)/2 = \text {Re}(z^p \cdot (z^k)^n) = x_{p + kn} , $$

we have \(p + kn \in S\) and hence \(r_n = x_{p + kn} > 0\) for all \(n \in \mathbb {N}\). But this is impossible, since \(\langle r_n \rangle \) is dense in \([-1, 1]\): \(z^k\) is not a root of unity for \(k \ne 0\), so \(z^p \cdot (z^k)^n\) is dense in the unit circle.

Hence, the unary weighted language \(\langle w_n \rangle \) can be recognized by an NPA but not by a WFA.    \(\square \)
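The first steps of this construction are easy to check numerically. The following Python snippet (ours; it is a sanity check, not part of the proof) verifies the recurrence for \(x_n = \text {Re}(z^n)\) and prints the first indices with \(x_n > 0\).

```python
# A numerical sanity check (ours): with z = 3/5 + (4/5)i, the sequence
# x_n = Re(z^n) satisfies the stated order-2 recurrence.
a, b = 3 / 5, 4 / 5
z = complex(a, b)
xs = [1.0, a]
for n in range(2, 50):
    xs.append(2 * a * xs[-1] - (a * a + b * b) * xs[-2])
for n in range(50):
    assert abs(xs[n] - (z ** n).real) < 1e-9
print([n for n in range(20) if xs[n] > 0])   # indices with x_n > 0
```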

5 Checking Language Equivalence of NPAs

Now that we have a coalgebraic model for NPAs, a natural question is whether there is a procedure to check language equivalence of NPAs. We will show that language equivalence of NPAs is undecidable by reduction from the threshold problem on DPAs. Nevertheless, we can define a metric on the set of languages recognized by NPAs to measure their similarity. While this metric cannot be computed exactly, it can be approximated to any given precision in finite time.

5.1 Undecidability and Hardness

Theorem 5

Equivalence of NPAs is undecidable when \(|A| \ge 2\) and the \(\mathcal {P}_c\mathcal {D}\)-algebra on [0, 1] extends the usual \(\mathcal {D}\)-algebra \(\mathbb {E}\) on [0, 1].

Proof

Let X be a DPA and \(\kappa \in [0, 1]\). We define NPAs Y and Z as follows: Y consists of a single state with output \(\kappa \) and, for every element of A, a transition back into itself; Z consists of a copy of Y together with a copy of X, with an initial state of output \(\kappa \) from which every element of A allows a nondeterministic choice between the \(\kappa \)-state of Y and the initial state of X. We see that \(l_Y(v) = \kappa \) for all \(v \in A^*\), while (for \(\alpha \) either \(\mathsf {min}\) or \(\mathsf {max}\), as follows from Corollary 1) \(l_Z\) resolves the nondeterministic choice by applying \(\alpha \) to the convex hull of \(\kappa \) and the corresponding weight under X.

Thus, if \(\alpha = \mathsf {min}\), then \(l_Y = l_Z\) if and only if \(l_X(v) \ge \kappa \) for all \(v \in A^*\); if \(\alpha = \mathsf {max}\), then \(l_Y = l_Z\) if and only if \(l_X(v) \le \kappa \) for all \(v \in A^*\). Both of these threshold problems are undecidable for alphabets of size at least 2 [5, 12, 22].    \(\square \)

The situation for automata over unary alphabets is more subtle; in particular, the threshold problem is not known to be undecidable in this case. However, there is a reduction to a long-standing open problem on LRSs.

Given an LRS \(\langle u_i \rangle \), the Positivity problem is to decide whether \(u_i\) is non-negative for all \(i \in \mathbb {N}\) (see, e.g., [21]). While the decidability of this problem has remained open for more than 80 years, it is known that a decision procedure for Positivity would necessarily entail breakthroughs in open problems in number theory. That is, it would give an algorithm to compute the homogeneous Diophantine approximation type for a class of transcendental numbers [21]. Furthermore, the Positivity problem can be reduced to the threshold problem on unary probabilistic automata. Putting everything together, we have the following reduction.

Corollary 2

The Positivity problem for linear recurrence sequences can be reduced to the equivalence problem of NPAs over a unary alphabet.

Proof

The construction in Theorem 5 shows that the lesser-than threshold problem can be reduced to the equivalence problem for NPAs with \(\mathsf {max}\) semantics, so we show that Positivity can be reduced to the lesser-than threshold problem on probabilistic automata with a unary alphabet. Given any rational LRS \(\langle u_i \rangle \), clearly \(\langle - u_i \rangle \) is an LRS as well, so by Theorem 3 there exists a rational stochastic matrix M such that

$$ -u_n > 0 \iff u^T M^n v > 1/4 $$

for all n, where \(u = (1, 0, \dots , 0)\) and \(v = (0, 1, 0, \dots , 0)\). Taking M to be the transition matrix, v to be the input vector, and u to be the output vector, the resulting probabilistic automaton is a satisfying instance of the lesser-than threshold problem with threshold 1/4 (i.e., assigns every word a weight of at most 1/4) if and only if \(\langle u_i \rangle \) is a satisfying instance of the Positivity problem.

Applying Proposition 3 yields an analogous reduction from Positivity to the equivalence problem of NPAs with \(\mathsf {min}\) semantics.    \(\square \)

5.2 Checking Approximate Equivalence

The previous negative results show that deciding exact equivalence of NPAs is computationally intractable (or at least very difficult, for a unary alphabet). A natural question is whether we might be able to check approximate equivalence. In this section, we show how to approximate a metric on weighted languages. Our metric will be discounted—differences in weights of longer words will contribute less to the metric than differences in weights of shorter words.

Given \(c \in [0, 1)\) and two weighted languages \(l_1, l_2 :A^* \rightarrow [0, 1]\), we define

$$ d_c(l_1, l_2) = \sum _{u \in A^*} |l_1(u) - l_2(u)| \cdot \left( \frac{c}{|A|}\right) ^{|u|} . $$

Suppose that \(l_1\) and \(l_2\) are recognized by given NPAs. Since \(d_c(l_1, l_2) = 0\) if and only if the languages (and automata) are equivalent, we cannot hope to compute the metric exactly. We can, however, compute the weight of any finite word under \(l_1\) and \(l_2\). Combined with the discounting in the metric, this lets us approximate \(d_c\) within any desired (nonzero) error.

Theorem 6

There is a procedure that given \(c \in [0, 1)\), \(\kappa > 0\), and computable functions \(l_1, l_2 :A^* \rightarrow [0, 1]\) outputs \(x \in \mathbb {R}_+\) such that \(|d_c(l_1, l_2) - x| \le \kappa \).

Proof

Let \(n = \lceil \log _c((1 - c) \cdot \kappa )\rceil \in \mathbb {N}\) and define

$$ x = \sum _{u \in A^*, |u| < n}|l_1(u) - l_2(u)| \cdot \left( \frac{c}{|A|}\right) ^{|u|}. $$

This sum is over a finite set of finite strings and the weights of \(l_1(u)\) and \(l_2(u)\) can all be computed exactly, so x is computable as well. Now we can bound

$$\begin{aligned} |d_c(l_1, l_2) - x|&= \sum _{u \in A^*, |u| \ge n}|l_1(u) - l_2(u)| \cdot \left( \frac{c}{|A|}\right) ^{|u|} \le \sum _{u \in A^*, |u| \ge n}\left( \frac{c}{|A|}\right) ^{|u|} \\&= \sum _{i \in \mathbb {N}, i \ge n}|A|^{i}\cdot \left( \frac{c}{|A|}\right) ^{i} = \sum _{i \in \mathbb {N}, i \ge n}c^{i} = \frac{c^n}{1 - c} \le \kappa , \end{aligned}$$

where the last step is because \(n \ge \log _c((1 - c) \cdot \kappa )\), and thus \(c^n \le (1 - c) \cdot \kappa \), noting that \(c \in [0, 1)\) and \(\kappa > 0\).    \(\square \)
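The procedure is short enough to state in code. Below is a Python sketch (ours; the function `approx_distance` and the example languages are hypothetical), assuming \(0 < c < 1\) and computable \(l_1, l_2\).

```python
from itertools import product
from math import ceil, log

# A sketch (ours) of the procedure in Theorem 6, assuming 0 < c < 1.
def approx_distance(l1, l2, alphabet, c, kappa):
    n = ceil(log((1 - c) * kappa, c))    # smallest n with c^n <= (1 - c)*kappa
    x = 0.0
    for length in range(n):              # all words u with |u| < n
        for u in product(alphabet, repeat=length):
            word = "".join(u)
            x += abs(l1(word) - l2(word)) * (c / len(alphabet)) ** length
    return x

# Example: two languages agreeing except on words containing 'b'.
l1 = lambda u: 2.0 ** -len(u)
l2 = lambda u: 0.0 if 'b' in u else 2.0 ** -len(u)
print(approx_distance(l1, l2, "ab", c=0.5, kappa=0.01))
```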

We leave approximating other metrics on weighted languages—especially nondiscounted metrics—as an intriguing open question.

6 Conclusions

We have defined a novel probabilistic language semantics for nondeterministic probabilistic automata (NPAs). We proved that NPAs are strictly more expressive than deterministic probabilistic automata, and that exact equivalence is undecidable. We have shown how to approximate the equivalence question to arbitrary precision using a discounted metric. There are several directions for future work that we would like to explore. First, it would be interesting to see whether different metrics can be defined on probabilistic languages and what approximate equivalence procedures they give rise to. Second, we would like to explore whether we can extend logical characterization results in the style of Panangaden et al. [10, 13]. Finally, it would be interesting to investigate the class of languages recognizable by our NPAs.

Related Work. There are many papers studying probabilistic automata and variants thereof. The work in our paper is closest to the work of Segala [28] in that our automaton model has both nondeterminism and probabilistic choice. However, we enrich the states with an output weight that is used in the definition of the language semantics. Our language semantics is coarser than probabilistic (convex) bisimilarity [28] and bisimilarity on distributions [17]. In fact, in contrast to the hardness and undecidability results we proved for probabilistic language equivalence, bisimilarity on distributions can be shown to be decidable [17] with the help of convexity. The techniques we use in defining the semantics are closely related to the recent categorical understanding of bisimilarity on distributions [7].