Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Weighted finite automata are used in many applications including speech recognition [19], speech synthesis [11], machine translation [13], computational biology [10], image processing [2], and optical character recognition [7]. Such applications often have strict time and memory requirements, so efficient representations and algorithms are paramount. We examine one useful technique, the use of failure transitions, to represent automata compactly. A failure transition is taken only when no immediate match to the input is possible at a given state. In this paper, we will present efficient algorithms to combine, optimize and search weighted automata with failure transitions.

Aho and Corasick [1] introduce failure transitions in the context of efficient string matching from a finite set of strings input. Mohri [16] shows how to use failure transitions in string matching from finite automata input. Several authors explore constructing deterministic failure automata from arbitrary deterministic finite automata (DFA) for space optimization [6, 15, 22].

Automata with failure transitions, initially introduced for string matching problems, have found wider use including compactly representing language, pronunciation, transliteration and semantic models [3, 8, 12, 14, 20, 21, 27].

Mohri [18] gives a concise presentation of various fundamental weighted automata algorithms, many generalizations of classical algorithms to the weighted case. These algorithms include intersection, epsilon removal and shortest distance. Our goal here is to present similar algorithms for weighted automata with failure transitions.

This paper is organized as follows. In Sect. 2 we introduce the automata classes and related notation used here. In Sect. 3 we present the algorithms for automata with failure transitions. In Sect. 4 offer discussion and include mention of a related open-source software library.

2 Preliminaries

2.1 Semirings

A semiring \(K = (\mathbb {K}, \oplus , \otimes , \overline{0}, \overline{1})\) consists of a set \(\mathbb {K}\) together with an associative and commutative operation \(\oplus \) and an associative operation \(\otimes \), with identities \(\overline{0}\) and \(\overline{1}\), respectively, such that \(\otimes \) distributes over \(\oplus \), and \(\overline{0}\otimes x = x\otimes \overline{0}= \overline{0}\). A semiring is commutative if the \(\otimes \) operation is also commutative.

Let K be a semiring equipped with a metric \(\varDelta \).Footnote 1 A family \(\left\{ x_i \right\} _{i\in I}\) of elements in \(\mathbb {K}\) is summable to \(x \in \mathbb {K}\) if \(\forall \eta > 0\) there is a \(J_\eta \subset I\) such that

$$\begin{aligned} \varDelta \Big (\bigoplus _{i \in L} x_i, x \Big ) \le \eta \end{aligned}$$
(1)

for all finite L with \(J_\eta \subseteq L \subset I\) [25].

2.2 Weighted Automata

A weighted finite automaton (WFA) \(A= (\varSigma , Q, E, i, F, \rho )\) over a semiring K is given by a finite alphabet \(\varSigma \), a finite set of states Q, a finite set of transitions \(E \subseteq Q \times \varSigma \times \mathbb {K}\times Q\), an initial state \(i \in Q\), a set of final states \(F \subseteq Q\), and a final weight function \(\rho :F \rightarrow \mathbb {K}\).

A transition \(e = (p[{e}], \ell [e], w[e], n[{e}]) \in E\) represents a move from the source or previous state \(p[{e}]\) to the destination or next state \(n[{e}]\) with the label \(\ell [e]\) and weight \(w[e]\). The transitions with source state q are denoted by E[q].

Transitions \(e_1\) and \(e_2\) are consecutive if \(n[{e_i}] = p[{e_{i+1}}]\). A path \(\pi = e_1 \cdots e_n \in E^*\) is a finite sequence of consecutive transitions. The source state of a path we denote by \(p[{\pi }]\) and the destination state by \(n[{\pi }]\). The label of a path is the concatenation of its transition labels: \(\ell [\pi ] = \ell [e_1] \cdots \ell [e_n]\). The weight of a path is obtained by \(\otimes \)-multiplying its transition weights: \(w[\pi ] = w[e_1] \otimes \cdots \otimes w[e_n]\). For a non-empty path, the i-th transition is denoted by \(\pi _i\).

\(P(q,q')\) denotes the set of all paths in \(A\) from q to \(q'\). We extend this to sets in the obvious way: P(qR) denotes the set of all paths q to \(q' \in R\) and so forth.

A path \(\pi \) is successful if it is in P(iF) and in that case the automaton is said to accept the input string \(\alpha = \ell [\pi ]\). The weight of \(\alpha \in \varSigma ^*\) assigned by the automaton is:

$$\begin{aligned} A{(\alpha )} = \bigoplus _{\pi \in P(i, F) : \, \ell [\pi ] = \alpha } w[\pi ] \rho (n[{\pi }]). \end{aligned}$$
(2)

2.3 Weighted Automata with \(\epsilon \) or \(\varphi \) Transitions

A weighted finite automaton with \(\epsilon \)-transitions (\(\epsilon \)-WFA) \(A_\epsilon = (\varSigma , Q, E_\epsilon , i, F, \rho )\) is a WFA extended to allow a transition to have an empty label denoted by \(\epsilon \): \(E_\epsilon \subseteq Q \times (\varSigma \cup \{\epsilon \}) \times \mathbb {K}\times Q\). A weighted finite automaton with failure transitions (\(\varphi \)-WFA) \(A_\varphi = (\varSigma , Q, E_\varphi , i, F, \rho )\) is a WFA extended to allow a transition to have a special failure label denoted by \(\varphi \): \(E_\varphi \subseteq Q \times (\varSigma \cup \{\varphi \}) \times \mathbb {K}\times Q\).

Neither \(\epsilon \) and \(\varphi \) transitions add to a path label; they consume no input as their labels are identity elements of string concatenation for their respective automata. An \(\epsilon \)-transition places no restriction on a path; it is a ‘free’ move. A failure transition, however, is followed only when the input can not be read immediately.

Specifically, a path \(e_1\cdots e_n\) in a \(\varphi \)-WFA is disallowed if it contains a subpath \(e_i \cdots e_j\) such that \(\ell [e_k] = \varphi \) for \(i \le k < j\) and there is another transition \(e \in E\) such that \(p[{e_i}] = p[{e}]\) and \(\ell [e_j] = \ell [e] \in \varSigma \) (see Fig. 1). Since the label \(a = l[e_j]\) can be read on e, we do not follow the failure transitions to read it on \(e_j\) as well.

Fig. 1.
figure 1

The (dashed red) path \(e_i = (q_i, \varphi , \omega _i, q_{i+1})\) to \(e_j = (q_j, a, \omega _j, q_{j+1})\) is disallowed since a can be read already on \(e = (q_i, a, \omega , q)\). (Color figure online)

We use \(P^*(q, q') \subseteq P(q, q')\) to denote the set of (not dis-) allowed paths from q to \(q'\) in a \(\varphi \)-WFA. This again extends to sets in the obvious way. A path \(\pi \) is successful in a \(\varphi \)-WFA if \(\pi \in P^*(i, F)\) and \(\ell [\pi _{|\pi |}] \ne \varphi \) and only in that case is the input string \(\alpha = \ell [\pi ]\) accepted.Footnote 2

The weight of \(\alpha \in \varSigma ^*\) assigned by the automaton is:

$$\begin{aligned} A_\varphi {(\alpha )} = \bigoplus _{\pi \in P^*(i, F) : \, \ell [\pi ] = \alpha , \ell [\pi _{|\pi |}] \ne \varphi } w[\pi ] \rho (n[{\pi }]). \end{aligned}$$
(3)

For these automata, we will assume there are no \(\epsilon \)- or \(\varphi \)-labeled cycles. When there is at most one exiting failure transition per state we call the automaton \(\varphi \)-deterministic. We will also assume the \(\varphi \)-WFAs in this paper are \(\varphi \)-deterministic.

Two automata are equivalent if they accept the same strings with the same weights. Any weighted finite automaton is trivially also a \(\varphi \)-WFA. In the next section we describe how to remove the failure transitions from a \(\varphi \)-WFA to produce an equivalent \(\varphi \)-free WFA. As with \(\epsilon \)-transitions, \(\varphi \)-transitions do not extend the set of weighted languages, rational power series, representable by WFAs [5].

Fig. 2.
figure 2

Pseudocode of the intersection algorithm with failure transitions.

2.4 \(\varphi \)-Removed Automata

Given a \(\varphi \)-WFA \(A= (\varSigma , Q, E, i, F, \rho )\), let the \(\varphi \)-removed transitions leaving q be defined as:

$$\begin{aligned}\ E^*[q] = \Big \{ (q,a,\omega ,q') :\quad&\pi \in P^*(q,Q), a = \ell [\pi ] = \ell [\pi _{|\pi |}] \in \varSigma , q' = n[{\pi }], \\&\omega = \bigoplus _{\pi ' \in P^*(q,q') : \, a = \ell [\pi '] = \ell [\pi '_{|\pi '|}], q' = n[{\pi '}]} w[\pi '] \Big \} \end{aligned}$$

This is a set of (possibly new) transitions \((q,a,\omega ,q')\), one for each source state q and destination state \(q'\) of one or more a-labeled paths with optional leading failure transitions. The weight is the \(\oplus \)-sum of all such paths between those state pairs and with that label.

Then define the \(\varphi \)-removed WFA as \((\varSigma , Q, \cup _{q \in Q} E^*[q], i, F, \rho )\).

3 Algorithms

In this section we wish to extend some common WFA algorithms to the case where failure transitions are present.

3.1 Intersection

Computing the intersection of two WFAs is a fundamental operation and one of the most useful. For example, the application of an n-gram language model to an unweighted string (or more general unweighted automaton) is accomplished by intersection [3]. We extend intersection to \(\varphi \)-WFAs as follows:

Let K be a commutative semiring and let \(A_1 = (\varSigma , Q_1, E_1, i_i, F_1, \rho _1)\) and \(A_2= (\varSigma , Q_2, E_2, i_2, F_2, \rho _2)\) be two \(\varphi \)-WFAs over the same alphabet. The intersection of \(A_1\) and \(A_2\) is a \(\varphi \)-WFA denoted by \(A_1 \cap A_2\) and specified for all x by:

$$\begin{aligned} (A_1 \cap A_2) (x) = A_1(x) \otimes A_2(x). \end{aligned}$$
(4)

Leaving aside \(\varphi \)-transitions, the following rule specifies how to compute a transition of \(A_1 \cap A_2\) from transitions of \(A_1\) and \(A_2\) with matching labels: \((q_1, a, \omega _1, q'_1)\) and \((q_2, a, w_2, q'_2)\) results in \(((q_1,q_2), a, w_1 \otimes w_2, (q'_1,q'_2))\). A simple algorithm to compute the intersection of two such automata, following the above rule, is given in [16].

The idea for extending the intersection algorithm when one or both automata have failure transitions is to output failure transitions where appropriate otherwise follow the failure transitions when matching. Figure 2 gives the pseudocode for computing \(A= A_1 \cap A_2 = (\varSigma , Q, E, i, F, \rho )\) in this case.

E and F are assumed initialized to the empty set and grown as needed. The algorithm uses a queue S with arbitrary discipline to hold the state pairs yet to be examined. The state set Q is initially the pair of initial states (lines 1–2). Each time through the loop in lines 3–14, a new pair of states \((q_1,q_2)\) is extracted from S (lines 4–5). Each non-\(\varphi \)-transition \(e_1\) leaving \(q_1\) is matched with \(\varphi \)-removed transitions \(e_2\) leaving \(q_2\) (lines 7–8). A transition is created with the matching label from \((q_1, q_2)\) to \((n[{e_1}], n[{e_2}])\) with weight computed by \(\otimes \)-multiplying the weights of the matching transitions. A transition is also created for each \(\varphi \)-transition \(e_1\) leaving \(q_1\) from \((q_1, q_2)\) to \((n[{e_1}], q_2)\) with weight \(w[e_1]\) (lines 10–11). If \(q_1\) and \(q_2\) are final, the pair \((q_1, q_2)\) is final with final weight computed by \(\otimes \)-multiplying the component final weights (lines 12–14).

If there are no failure transitions in \(A_1\), this algorithm is simply the WFA intersection of \(A_1\) with the \(\varphi \)-removed \(A_2\) (lines 7–9). Failure transitions in \(A_1\), however, appear in the output; they delay failure matching on that side (lines 10–11) similar to epsilon processing in \(\epsilon \)-WFAs [16]. The choice of which automaton is used in line 6 can be generalized to be state-dependent. For example, one could use the automaton at line 6 for which \(|E_i(q_i)|\) is less.

The worst-case time complexity of the algorithm is in \(O(|E_1| |Q_2| (m_2 + l_2 {\text {log}}\, d_2))\), where \(d_i\) is the maximum out-degree, \(m_i\) is the maximum label multiplicityFootnote 3 and \(l_i\) the maximum length of a \(\varphi \)-labeled path at a state in \(A_i\) (assuming line 8 is implemented as a binary search).

Fig. 3.
figure 3

Pseudocode of the \(\varphi \)-removal algorithm

Fig. 4.
figure 4

Pseudocode to trim a \(\varphi \)-WFA.

3.2 \(\varphi \)-Removal

An algorithm to \(\varphi \)-remove a WFA A over alphabet \(\varSigma \) to produce an equivalent WFA is shown in Fig. 3. The pseudocode uses the algorithm of Fig. 2 to intersect the input with a \(\varphi \)-free WFA that accepts \(\varSigma ^*\). The intersection algorithm only outputs failure transitions from its first argument, as previously noted, and there are none. Intersection with \(\varSigma ^*\) produces an equivalent result.

3.3 Trimming

Trimming removes states and transitions from an automaton that are useless. These could arise, for example, as the by-product of an intersection algorithm. In a WFA, a state or transition is useless if it is not on a successful path or equivalently, it is not both accessible and coaccessible. A state is accessible if it can be discovered in a visitation (e.g., depth-first) from the initial state [9]. Similarly, coaccessibility can be determined in a visitation in the reverse direction from the final states [9].

For a \(\varphi \)-WFA we must keep each state and transition that is on a successful path or equivalently if it is both \(\varphi \)-accessible from the initial state and \(\varphi \)-coaccessible to a final state. A state q is \(\varphi \)-accessible if \(P^*(i,q)\) is not empty. \(\varphi \)-accessible transitions and \(\varphi \)-coaccessible states and transitions are similarly defined. Unlike a WFA we may also need to retain a state or transition in order to disallow a path in a \(\varphi \)-WFA. For example in Fig. 7 the a-labeled transition leaving state q and its destination state are needed, regardless if on a successful path, to disallow reading \(\varphi a\) from q to \(q''\).

Figure 4 gives the pseudocode to \(\varphi \)-trim a \(\varphi \)-WFA \(A\). First the \(\varphi \)-accessible and \(\varphi \)-coaccessible states are found (lines 1–2). The set of disallowed labels at a state \(\varSigma _\perp \), i.e. those that are not the label of a transition leaving q that is on a \(\varphi \)-accessible path, is also computed at this time. See below for their implementations. The disallowed transitions are then filtered out (line 3).

Fig. 5.
figure 5

Pseudocode to determine which states are \(\varphi \)-accessible and which labels are disallowed at a state.

The automaton is then examined for useful transitions and any found are retained in the result (lines 4–11). A transition is useful if its source state is \(\varphi \)-accessible and its destination is \(\varphi \)-coaccessible (lines 1–4) or if it is needed to forbid a path (lines 5–8).

Figure 5 gives the pseudocode for computing \(\varphi \)-accessibility. The algorithm uses an arbitrary queue S containing the states to be processed. Each time through the loop in lines 5–21, a new state q is extracted from S (lines 6–7). Each transition e leaving q that is not already discovered, visited or has a disallowed label is considered in turn (lines 8–9). If it is a non-\(\varphi \)-transition, then its destination state is marked as discovered and cleared of any previously disallowed labels (lines 10–12). Otherwise, it is a \(\varphi \)-transition and its destination state is marked as \(\varphi \)-discovered (line 14–15).

In this \(\varphi \)-discovered case, the disallowed labels at q together with the transition labels leaving q become the disallowed labels at \(n[{q}]\) if just discovered (lines 1–2). Otherwise, the existing set is filtered by the disallowed labels at q and any newly allowed transition labels leaving q (lines 3–4).

Once a transition is processed, its destination state \(n[{q}]\) is enqueued if needed (lines 16–17). Once all the transitions leaving q are processed, state q is marked as visited or \(\varphi \)-visited if previously discovered or \(\varphi \)-discovered (lines 18–21).

\(\varphi \)-coaccessibility on the \(\varphi \)-accessible component of \(A\) can be computed with a standard coaccessibility computation if restricted to paths that are \(\varphi \)-accessible, as previously found, and that do not end in a \(\varphi \)-labeled transition. The time complexity of trimming is dominated by the computation of \(\varphi \)-accessibility, which is in \(O(l (|E| + (C_S + {\text {log}}\, d) |Q|))\) where d is the maximum out-degree, l is the maximum length of a \(\varphi \)-labeled path in A, and \(C_S\) is the maximum cost of a queue operation for S.

3.4 Shortest Distance

Shortest-distance algorithms play a central role in applications on weighted automata requiring searching, counting, normalization or approximation [3, 20, 23, 27]. In these applications, the tropical semiring \((\mathbb {R}_+\cup \{\infty \}, \min , +, \infty , 0)\) and the positive real semiring \(\mathcal{R}_+= (\mathbb {R}_+, +, \times , 0, 1)\) are among the most widely used.

Shortest Distance on WFAs and \(\epsilon \)-WFAs. The shortest distance from the initial state i to state q in a WFA \(A\) over semiring K is defined as:

$$\begin{aligned} \delta [q] = \bigoplus _{\pi \in P(i,q)} w[\pi ] \end{aligned}$$
(5)

when well-defined and in \(\mathbb {K}\) [17]. Mohri presented an algorithm to compute this shortest distance that is often much more efficient than alternatives such as Floyd-Warshall [17]. The pseudocode is shown in Fig. 6. We show his extended version where \(\mathbb {K}\) is equipped with a metric \(\varDelta \) and an \(\epsilon \) threshold.

The algorithm is a generalization of classical shortest distance algorithms over the tropical semiringFootnote 4 to more general semirings [9]. The algorithm uses a queue S to extract states (line 7) whose transitions e are relaxed to update \(d[n[{e}]]\) (lines 11–13), the estimate of the shortest distance to that state. Unlike the classical algorithms, a second array r[q] is maintained (lines 9, 14) that ensures that the weights added to d[q] from paths in P(iq) are applied only once, important for the non-idempotent case (i.e. when \(a \oplus a \ne a\)). See [17] for the detailed proofs.

Mohri proved exact computation of \(\delta [q]\) (with \(\varDelta (x,y) = 1_{x\ne y}\) and \(\epsilon = 1\) for any queue S) when the input is acyclic, is over a k-closed semiring, or is k-closed for \(A\).Footnote 5 For other semirings such as \(\mathcal{R}_+\), it is an approximation algorithm controlled by \(\epsilon \). We prove convergence and correctness for \(\mathcal{R}_+\) below.

Theorem 1

Let \(A\) be a WFA over \(\mathcal{R}_+\) equipped with the usual metric \(\varDelta (x,y) = |x - y|\). Assume the family of path weights \(\left\{ w[\pi ] \right\} _{\pi \in P(i,q)}\) defining \(\delta [q]\) is summable for all \(q \in Q\). Then ShortestDistance(\(A\), S, \(\epsilon \)) terminates for any queue S and any \(\epsilon > 0\). Further for any \(\eta > 0\), there is an \(\epsilon > 0\) such that at termination \(\varDelta (d[q], \delta [q]) \le \eta \) for any \(q \in Q\).

To prove this theorem, we first introduce a lemma. As in [17], define finite \(D(q) \subseteq P(i,q), \,q \in Q\) as the set of all paths whose weight have been added so far to d[q] at some time in the execution of the program.

Fig. 6.
figure 6

Shortest-distance algorithm without [17] and with failure transitions.

Lemma 1

For any path \(\pi \in P(i,q)\), there is a \(\theta _\pi > 0\) such that \(\pi \) is in \(D(n[{\pi }])\) after some point in the execution of the algorithm provided the algorithm is run with \(\epsilon \le \theta _\pi \) and it terminates.

Proof

If \(|\pi | = 0\), then \(\pi \in D(i)\) with \(\theta _\pi = 1\) by line 3. If \(|\pi | > 0\), let \(\pi = \tau e, \,e \in E\) and assume, by induction, \(\tau \in D(n[{\tau }])\) for all \(\epsilon \le \theta _\tau \). Let \(\theta _\pi = \min \left\{ \theta _\tau , w[\pi ] \right\} \). State \(n[{\tau }]\) was enqueued since \(\tau \in D(n[{\tau }])\). When it is dequeued, which must happen since the algorithm is assumed to terminate, line 11 will succeed for \(\pi \). This follows since \(\varDelta (d[n[{e}]], d[n[{e}]] \oplus (r' \otimes w[e])) = |r' w[e]| \ge |(w[\tau ] + x)\,w[e]| \ge w[\pi ] \ge \theta _\pi \), where x represents the weight of any other paths added to \(r'\). Thus \(\pi \) is added to \(D(n[{\pi }])\) at line 12 for all \(\epsilon \le \theta _\pi \).

Proof

(Theorem 1). Termination: The d[q] form a monotone increasing sequence of partial sums \(d_1, d_2, \ldots \) during the execution of the program and are bounded above by \(\delta (q)\). This ensures the condition in line 11 succeeds only finitely many times.

Convergence to \(\delta [q]\) : Select finite \(J_\eta (q)\) to satisfy Eq. 1 for the summable family \(\left\{ w[\pi ] \right\} _{\pi \in P(i,q)}\) for \(\eta > 0\). By Lemma 1 there is a \(\theta _q = \min _{\pi \in J_\eta (q)} \theta _\pi \) such that \(J_\eta (q) \subseteq D(q)\) and by Eq. 1

$$\varDelta \Big (\bigoplus _{\pi \in D(q)} w[\pi ], \delta [q]\Big ) \le \eta $$

provided \(\epsilon \le \theta _q\). But \(d[q] = \bigoplus _{\pi \in D(q)} w[\pi ]\) as shown in [17]. So if we select \(\epsilon \) as \(\min _{q \in Q} \theta _q\), we prove the theorem. \(\square \)

Fig. 7.
figure 7

Failure transitions in \(A\) are replaced by \(\epsilon \)-transitions in \(A_\epsilon \). To compensate for the formerly disallowed paths, new (dashed red) negatively-weighted paths are added. (Color figure online)

The time complexity of the algorithm is in \(O(N_\epsilon (|E| + C_S |Q|))\) where \(N_\epsilon \) is the maximum number of times a state can be enqueued given the threshold \(\epsilon \) and \(C_S\) is the maximum cost of a queue operation for S [17].

Since the transition labels play no role in the definitions and results of this section, they apply equally to \(\epsilon \)-WFAs. In the next section, the labels matter.

Shortest Distance on \(\varphi \)-WFAs. We define the shortest distance from the initial state i to state q in a \(\varphi \)-WFA as:

$$\begin{aligned} \delta [q] = \bigoplus _{\pi \in P^*(i,q)} w[\pi ] \end{aligned}$$
(6)

when well-defined and in \(\mathbb {K}\).

We present an algorithm for the positive real semiring \(\mathcal{R}_+\), an important case in applications. To do so, we will transform \(\varphi \)-WFA \(A\) on \(\mathcal{R}_+\) to an equivalent \(\epsilon \)-WFA \(A_\epsilon \) on \(\mathcal{R}= (\mathbb {R}, +, *, 0, 1)\). We can then use the ShortestDistance algorithm in Fig. 6, suitably adapted.

Given a \(\varphi \)-WFA \(A= (\varSigma , Q, E, i, F, \rho )\) with \(Q = \left\{ 1,\ldots ,|Q| \right\} \) define \(Q_\epsilon = \left\{ 1,\ldots ,2|Q| \right\} \) and

$$\begin{aligned} E_\epsilon =&\left\{ (q,a,\omega ,q') \in E :\, a \in \varSigma \right\} \cup \\&\left\{ (q,\epsilon ,\beta ,q') :\, (q,\varphi ,\beta ,q') \in E \right\} \cup \left\{ (q,\epsilon ,\beta ,q+|Q|) :\, (q,\varphi ,\beta ,q') \in E \right\} \cup \\&\left\{ (q+|Q|,a,-\nu ,q'') :\, (q,\varphi ,\beta ,q')(q',a,\nu ,q'') \in P(q,q''), \, a \in \varSigma \right\} . \end{aligned}$$

Then let the \(\epsilon \)-WFA be \(A_\epsilon = (\varSigma , Q_\epsilon , E_\epsilon , i, F, \rho )\). Each failure transition in \(A\) is relabeled with an \(\epsilon \) in \(A_\epsilon \), allowing previously disallowed paths. To compensate for this, each disallowed consecutive \(\varphi \)- and a-labeled transition pair in \(A\) also has a corresponding negatively-weighted consecutive \(\varphi \)- and a-labeled transition pair in \(A_\epsilon \) that cancels it (see Fig. 7) [20, 23].

If \(A\) is acyclic, the family of path weights defining \(\delta [q]\) is summable in \(A_\epsilon \) since finite and ShortestDistance can be applied. The cyclic case requires more care; the presence of negative weights means that the partial sums may diverge or converge to something other than \(\delta [q]\) depending on the ordering of the summands (cf. the Reimann Rearrangement Theorem [24]). We will select a queue discipline for the new states in \(A_\epsilon \) to ensure the correct behavior. The idea is to have the negatively-weighted terms be immediately cancelled by their positively-weighted counterparts in the running of the algorithm (cf. [26]). Figure 6 shows the pseudocode for the queue and the general shortest distance algorithm with failure transitions.

The queue \(S_\epsilon \) enqueues states in Q in the arbitrary subqueue S (lines 1–2). New states \(q+|Q|, \, q \in Q\) are enqueued in a FIFO subqueue array indexed by the \(\varphi \)-successor state \(q'\) of q (lines 3–5). \(S_\epsilon \) ensures that any new state is dequeued from the subqueue array (lines 2–3) just before its index state (lines 4–5). In this way, the positive and negative weights of the disallowed a transitions will immediately cancel in \(d[q'']\) at relaxation (see Fig. 7).Footnote 6

The complexity is the same as in the \(\varphi \)-free case since \(|Q_\epsilon | = 2|Q|\), \(|E_\epsilon | < 2 |E|\) and \(C_{S_\epsilon } \in O(C_S)\).

4 Discussion

The intersection algorithm presented ensures \(\varphi \)-determinism in the result with \(\varphi \)-deterministic input. Mohri and Yang [20] describe an alternative algorithm that adds \(\varphi \)-non-determinism but avoids the \(\varphi \)-removal \(E_2^*[q_2]\).

The proof of Theorem 1 takes advantage of rather specific (complete, monotonic, total) order properties of \(\mathbb {R}\) rather than more general metric properties.Footnote 7 Can this shortest distance algorithm be extended to further semirings? It is easy to see that if the algorithm is correct with semirings \(K_1\) and \(K_2\) having metrics \(\varDelta _1\) and \(\varDelta _2\), then \(K_1 \times K_2\) having the square metric \(\varDelta (x,y) = \max (\varDelta _1(x,y), \varDelta _2(x,y))\) is also correct.

These algorithms have been implemented as part of SFST, an open-source C++ library for normalizing, sampling, combining, and approximating stochastic finite-state transducers [4].