
1 Introduction

It is well known that parsing for graph grammars based on hyperedge replacement (HR) is in general NP-hard, even for a fixed grammar [8]. In earlier work [6], we devised predictive shift-reduce (PSR) parsing, which lifts Knuth’s LR string parsing [9] to graphs and is efficient, but is unfortunately restricted to a subclass of unambiguous HR grammars. This makes it unsuitable for applications in natural language processing (NLP), where grammars are often ambiguous. In this paper, we therefore extend the PSR algorithm to arbitrary HR grammars: just like Tomita’s generalized LR string parser [12], the generalized PSR parser pursues all possible parses of a graph in parallel whenever ambiguity occurs. We describe the implementation of the generalized PSR parser by Mark Minas, and compare its efficiency with that of the Cocke-Younger-Kasami parser for arbitrary HR grammars [11].

The remainder of this paper is structured as follows. After recalling HR grammars in Sect. 2 and PSR parsing in Sect. 3, we introduce generalized PSR parsing in Sect. 4, and compare its performance with CYK parsing in Sect. 5. Due to lack of space, our presentation is driven by a small example—a grammar for series-parallel graphs—that exhibits many peculiarities of generalized PSR parsing. In Sect. 6, we conclude by indicating related and future work.

2 Graph Grammars Based on Hyperedge Replacement

Throughout the paper, we assume that X is a global, countably infinite supply of nodes, and that \(\varSigma \) is a finite set of symbols that comes with an arity function \( arity :\varSigma \rightarrow \mathbb {N}\), and is partitioned into disjoint subsets \(\mathcal {N}\) of nonterminals and \(\mathcal {T}\) of terminals.

We represent hypergraphs as ordered sequences of edge literals, where each literal represents an edge with its attached nodes. This is convenient as we shall derive (and parse) the edges of a graph in a fixed order.

Definition 1

(Hypergraph). For a symbol \(\textsf {a} \in \varSigma \) and \(k= arity (\textsf {a})\) pairwise distinct nodes \(x_1,\dots ,x_k\in X\), \(a = \textsf {a}(x_1, \dots , x_k) \) represents a hyperedge that is labeled with \(\textsf {a}\) and attached to \(x_1, \dots , x_k\). \({\mathcal E}_{\varSigma }\) denotes the set of hyperedges (over \(\varSigma \)).

A hypergraph \(\langle {\gamma }, { V}\rangle \) consists of a sequence \(\gamma = e_1 \cdots e_n\in {\mathcal E}_{\varSigma }^*\) of hyperedges and a finite set \(V \subseteq X\) of nodes that contains all nodes attached to the hyperedges of \(\gamma \). \(\mathcal {G}_{\varSigma }\) denotes the set of all hypergraphs (over \({\varSigma }\)).

In the following, we usually call hypergraphs just graphs and hyperedges just edges. Moreover, we denote a graph just by its edges \(\gamma \), and refer to its nodes by \(V_{\!\gamma }\). The “concatenation” of two graphs \(\alpha , \beta \in \mathcal {G}_{\varSigma }\) yields a graph \(\gamma = \alpha \beta \) with nodes \(V_{\!\gamma }= V_{\!\alpha }\cup V_{\!\beta }\). Two graphs \(\gamma \) and \(\gamma '\) are equivalent, written \(\gamma \bowtie \gamma '\), if \(V_{\!\gamma }= V_{\!\gamma '}\) and \(\gamma \) is a permutation of \(\gamma '\).

Note that we order the edges of a graph in rules and derivations. However, the relation \(\bowtie \) makes graphs with permuted edges equivalent, like in ordinary definitions of graphs. Our parsers will make sure that equivalent graphs are always processed in the same way.
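As a concrete illustration (not part of the formal development), edge literals and the equivalence \(\bowtie \) might be represented as follows in Java; node names are integers here, and isolated nodes are not tracked, which the formal definition allows but this sketch omits:

```java
import java.util.*;

/** A hyperedge literal a(x1,...,xk): a label and its ordered attached nodes. */
record Edge(String label, List<Integer> nodes) {}

/** Helpers for graphs represented as edge sequences (isolated nodes omitted). */
final class Graphs {
    /** The node set implied by the literals of g. */
    static Set<Integer> nodes(List<Edge> g) {
        Set<Integer> v = new HashSet<>();
        for (Edge e : g) v.addAll(e.nodes());
        return v;
    }
    /** gamma and gamma' are equivalent iff they have the same nodes and one
     *  edge sequence is a permutation of the other (edges may repeat). */
    static boolean equivalent(List<Edge> g1, List<Edge> g2) {
        if (g1.size() != g2.size() || !nodes(g1).equals(nodes(g2))) return false;
        Map<Edge, Integer> count = new HashMap<>();
        for (Edge e : g1) count.merge(e, 1, Integer::sum);
        for (Edge e : g2) if (count.merge(e, -1, Integer::sum) < 0) return false;
        return true;
    }
}
```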

An injective function \(\varrho :X \rightarrow X\) is called a renaming, and \(\gamma ^\varrho \) denotes the graph obtained by replacing all nodes in \(\gamma \) (and in \(V_{\!\gamma }\)) according to \(\varrho \). A hyperedge replacement rule \(r=(A \rightarrow \alpha )\) (rule for short) has a nonterminal edge \(A \in {\mathcal E}_{\mathcal {N}}\) as its left-hand side, and a graph \(\alpha \in \mathcal {G}_{\varSigma }\) with \(V_{\!A} \subseteq V_{\!\alpha }\) as its right-hand side.

Consider a graph \(\gamma = \beta \bar{A} \bar{\beta }\in \mathcal {G}_{\varSigma }\) with a nonterminal edge \(\bar{A}\) and a rule \(r=(A \rightarrow \alpha )\). A renaming \(\mu :X \rightarrow X\) is a match (of r in \(\gamma \)) if \(A^\mu = \bar{A}\) and if \(V_{\!\gamma }\cap V_{\!\alpha ^\mu } \subseteq V_{\!A^\mu }\). A match \(\mu \) of r derives \(\gamma \) to the graph \(\gamma ' = \beta \alpha ^\mu \bar{\beta }\). This is denoted as \(\gamma \mathop {\Rightarrow }_{r,\mu } \gamma '\). If \(\mathcal {R}\) is a finite set of rules, we write \(\gamma \Rightarrow _\mathcal {R}\gamma '\) if \(\gamma \mathop {\Rightarrow }_{r,\mu } \gamma '\) for some match \(\mu \) of some rule \(r \in \mathcal {R}\).

Definition 2

(HR Grammar). A hyperedge replacement grammar \(\varGamma = (\varSigma ,\mathcal {T},\mathcal {R},Z)\) (HR grammar for short) consists of symbols \(\varSigma \) with terminals \(\mathcal {T}\subseteq \varSigma \) as assumed above, a finite set \(\mathcal {R}\) of rules, and a start graph \(Z= \textsf {Z}()\) with \(\textsf {Z}\in \mathcal {N}\) of arity 0. \(\varGamma \) generates the language

$${\mathcal {L}}(\varGamma ) = \{ g' \in \mathcal {G}_{\mathcal {T}} \mid Z\mathop {\Rightarrow }\nolimits _\mathcal {R}^* g, g' \bowtie g \}. $$

In the following, we simply write \(\mathop {\Rightarrow }\) and \(\mathop {\Rightarrow }^*\) because the rule set \(\mathcal {R}\) in question will always be clear from the context.

Example 1

(An HR Grammar for Series-Parallel Graphs). The following rules

$$\textsf {Z}() \rightarrow \textsf {G}(x,y) \qquad \textsf {G}(x,y) \rightarrow \textsf {e}(x,y) \;\;(1) \qquad \textsf {G}(x,y) \rightarrow \textsf {G}(x,y)\,\textsf {G}(x,y) \;\;(2) \qquad \textsf {G}(x,y) \rightarrow \textsf {G}(x,z)\,\textsf {G}(z,y) \;\;(3)$$

generate series-parallel graphs [8, p. 99]; we refer to the alternatives for \(\textsf {G}(x,y)\) as rule 1, rule 2 (parallel), and rule 3 (series). See Fig. 1 for a derivation with graphs drawn as diagrams.
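In the literal notation of Sect. 2, a derivation of the graph in Fig. 1 reads as follows: rule 3 first creates the inner node 3, rule 2 duplicates \(\textsf {G}(1,3)\), rule 3 then creates node 2, and rule 1 finally replaces each nonterminal edge by a terminal one:

$$\textsf {Z}() \Rightarrow \textsf {G}(1,4) \Rightarrow \textsf {G}(1,3)\,\textsf {G}(3,4) \Rightarrow \textsf {G}(1,3)\,\textsf {G}(1,3)\,\textsf {G}(3,4) \Rightarrow \textsf {G}(1,3)\,\textsf {G}(1,2)\,\textsf {G}(2,3)\,\textsf {G}(3,4) \Rightarrow ^* \textsf {e}(1,3)\,\textsf {e}(1,2)\,\textsf {e}(2,3)\,\textsf {e}(3,4)$$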

Fig. 1. A derivation of the graph \(\textsf {e}(1,3) \, \textsf {e}(1,2) \, \textsf {e}(2,3) \, \textsf {e}(3,4)\). Nodes are drawn as circles, nonterminal edges as boxes around their label with lines to their attached nodes, and terminal edges as arrows from their first to their second attached node; since \(\textsf {e}\) is the only terminal label, we omit it in the terminal graph.

3 Predictive Shift-Reduce Parsing

The article [6] gives detailed definitions and correctness proofs for PSR parsing. Here we recall the concepts only as far as needed to describe their generalization in the next section.

A PSR parser attempts to construct a derivation by reading the edges of a given input graph one after the other. However, the parser must not assume that the edges of the input graph come in the same order as in a derivation. E.g., when constructing the derivation in Fig. 1, it must also accept an input graph \(\textsf {e}(2,3) \, \textsf {e}(1,2) \, \textsf {e}(1,3) \, \textsf {e}(3,4)\) where the edges are permuted.

Before parsing starts, a procedure described in [5, Sect. 4] analyzes the grammar for the unique start node property by computing the possible incidences of all nodes created by the grammar. The unique start nodes have to be matched by certain nodes in the start rule of the grammar, thus determining where parsing begins. For our example, the procedure detects that every series-parallel graph has a unique root (a node without ingoing edges), and that the node x in the start rule \(\textsf {Z}()\rightarrow \textsf {G}(x,y)\) must be bound to the root of any input graph. If the input graph has no root, or more than one, it cannot be series-parallel, and parsing fails immediately.
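For the series-parallel example, this check boils down to finding the unique node without ingoing edges. Below is a minimal sketch reusing the Edge literals from the sketch in Sect. 2; the actual analysis of [5] is grammar-driven and more general:

```java
import java.util.*;

final class StartNodes {
    /** Sketch of the pre-parsing check for the series-parallel grammar: a
     *  valid input graph must have exactly one root, i.e., a node without
     *  ingoing terminal edges; that node is bound to x in Z() -> G(x,y). */
    static OptionalInt uniqueRoot(List<Edge> input) {
        Set<Integer> candidates = new HashSet<>(), targets = new HashSet<>();
        for (Edge e : input) {                 // binary terminal edges e(x,y)
            candidates.addAll(e.nodes());
            targets.add(e.nodes().get(1));     // y has an ingoing edge
        }
        candidates.removeAll(targets);         // nodes without ingoing edges
        return candidates.size() == 1
                ? OptionalInt.of(candidates.iterator().next())
                : OptionalInt.empty();         // no root or several: fail at once
    }
}
```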

A PSR parser is a push-down automaton that is controlled by a characteristic finite automaton (CFA). The stack of the PSR parser consists of states of the CFA. The parser makes sure that the sequence of states on its stack always describes a valid walk through its CFA. In order to do so, the parser generator computes a parsing table with processing instructions that control the parser.

Table 1. PSR parsing table for series-parallel graphs.

Table 1 shows the parsing table for our example of series-parallel graphs. It has been generated by the graph parser generator Grappa, using the constructions described in [6]. The rows of the table correspond to states of the CFA. Each of these states has a certain number of parameters. For instance, \(Q_2(p,q)\) has two parameters \(p\) and \(q\). Parameters remain abstract in the CFA and in the parsing table; only the parser will bind them to nodes of the input graph, and store them in concrete states on its stack.

When the parser starts, nothing of its input graph has been read yet, and its stack consists of a single concrete state \(Q_0\), where its parameter \(p\) is bound to the unique start node, namely the root of the input graph.

The columns of the table correspond to terminal and nonterminal edges as well as the end-of-input marker $. Column \(\textsf {e}(x,y)\) in Table 1 contains all actions that can be taken by the parser if the input graph contains an edge \(\textsf {e}(x,y)\) that is still unread. Column $ contains the actions to be done if the input graph has been read completely. We will come to column \(\textsf {G}(x,y)\) later.

The parser looks up its next action in the parsing table by inspecting the top-most state on its stack and all unread edges of the input graph. For illustration, let us assume that the parser has state \(Q_3\) on top of its stack, with its parameters \(p\) and \(q\) being bound (by an appropriate renaming \(\sigma \)) to the input graph nodes \(p^\sigma \) and \(q^\sigma \), resp., so that the parser must look into row \(Q_3(p,q)\). If the input graph contains an unread \(\textsf {e}\)-labeled edge, the entry in column \(\textsf {e}(x,y)\) applies. The corresponding table entry contains two possible actions, a shift and a reduce.

The shift operation can be selected if the input graph contains an unread \(\textsf {e}\)-labeled edge e that connects \(p^\sigma \), either with \(q^\sigma \), or with any node that has not yet occurred in the parse, indicated by “\(-\)” in the condition. If this shift operation is selected, it marks e as read and pushes a new concrete state \(Q_1\) onto the stack, where the parameters \(p\) and \(q\) of \(Q_1\) are bound to the source and the target node of e.

The reduce operation does not require any further condition to be satisfied. It consists of two steps: First, it pops as many states from the top of the stack as the right-hand side of the rule has edges. In our example “reduce 2, \(\textsf {G}(p,q)\)” refers to rule 2, with two edges in its right-hand side. Second, the reduce operation looks up the row for the new top-most state of the stack, selects the operation for the new nonterminal edge with label \(\textsf {G}\) that connects \(p^\sigma \) with \(q^\sigma \), i.e., in column \(\textsf {G}(x,y)\), and pushes the corresponding state onto the stack.

The entries accept and error in column $ express that, if all edges of the input graph have been read, the parser terminates with success if the top-most state is \(Q_2\), or with failure if it is \(Q_0\) or \(Q_4\).
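The control structure just described can be summarized as follows. This is a hypothetical sketch: the types Action, State, and Table and their methods are our assumptions for illustration, not the API of a parser generated by Grappa. A PSR parser relies on lookup always returning a single applicable action; conflicts are discussed below.

```java
import java.util.*;

interface State {}                                  // concrete CFA state with bound nodes

sealed interface Action permits Shift, Reduce, Accept, Error {}
record Shift(Edge edge, State next) implements Action {}        // edge to read, state to push
record Reduce(int rhsLength, Edge lhsEdge) implements Action {} // |rhs|, nonterminal edge
record Accept() implements Action {}
record Error() implements Action {}

interface Table {                                   // assumed interface of Table 1
    Action lookup(State top, Set<Edge> unread);     // row: top state; column: edge pattern or $
    State goTo(State top, Edge nonterminal);        // column for a nonterminal, e.g., G(x,y)
}

final class PsrParser {
    private final Table table;
    PsrParser(Table table) { this.table = table; }

    boolean parse(Deque<State> stack, Set<Edge> unread) {
        while (true) {
            Action a = table.lookup(stack.peek(), unread);
            if (a instanceof Accept) return true;   // all edges read, accept state on top
            if (a instanceof Error) return false;
            if (a instanceof Shift s) {
                unread.remove(s.edge());            // mark the matched edge as read
                stack.push(s.next());               // push concrete state with bound nodes
            } else if (a instanceof Reduce r) {
                for (int i = 0; i < r.rhsLength(); i++) stack.pop(); // pop |rhs| states
                stack.push(table.goTo(stack.peek(), r.lhsEdge()));   // push goto state
            }
        }
    }
}
```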

A PSR parser must always be able to select the correct operation; it must not happen that the parser has to choose between two or more operations where one of them leads to a successful parse whereas another one leads to failure. Such a situation is called a conflict. Clearly, a PSR parser always selects the correct action if conflicts cannot occur. However, a PSR parser does not know a priori which unread edge must be selected next. Hence, there are not only the shift-reduce and reduce-reduce conflicts well known from LR parsing [9]; shift-shift conflicts may also occur if the parser has to choose which input edge to read next. Moreover, a shift alone may raise a conflict, since the unread input graph may contain more than one edge matching a pattern like \(\textsf {e}(p,-)\). Only if the free edge choice property holds does the parser know that any of these edges may be processed without affecting the result of the parse.

For our example of series-parallel graphs, conflicts arise in all states except \(Q_0\) and \(Q_1\). In \(Q_3(p^\sigma ,q^\sigma )\), e.g., the parser can always select the reduction, but it can also shift any edge connecting \(p^\sigma \) with \(q^\sigma \) or with any node not yet used in the parse. Clearly, not every choice leads to a successful parse, even if the input graph is valid.

Thus the parsing table does not always allow the parser to predict the next correct action, and the grammar does not have a PSR parser. This problem can be solved by generalizing PSR parsing, as described in the next section.

4 Generalized Predictive Shift-Reduce Parsing

Before we describe GPSR parsing for HR grammars, let us briefly recapitulate LR(k) parsing for context-free string grammars ([9], with \(k=1\) symbols of look-ahead) and how it is extended to generalized LR (GLR) parsing [12]. An LR(1) parser is controlled by a parsing table derived from the CFA of the grammar. The parsing table assigns a unique parser action to each state of the CFA and each terminal symbol: shift, reduce, accept, or error. In each step, the parser executes the action specified for the current state on top of the stack and the next unread input symbol (the look-ahead). However, LR(1) parsing is not possible if the parsing table has conflicts, i.e., if there is a state q and a look-ahead symbol a associated with two or more actions. A parser that reaches q with look-ahead a then has as many choices for how to continue, i.e., the parse stack can be modified in different ways. A search process must then explore which of the resulting parse stacks can be extended to a successful parse.

A GLR parser organizes this search process as a breadth-first search. It reads the input string from left to right; at any time, it has read a certain prefix \(\alpha \) of the input string, and it maintains the set of all (parse) stacks that can be obtained by reading \(\alpha \). This set of stacks is processed in rounds as follows: For each stack, the parser determines all possible actions based on the parsing table, the top-most state of the stack, and the look-ahead symbol. The parser has found a successful parse if the action is accept and the entire input string has been read. (It may proceed if further parses shall be found.) If the action is an error, the parser just discards this stack; it stops if this was the last remaining stack, and fails altogether if it has not found a successful parse previously. If the parsing table indicates more than one possible action, the parser duplicates the stack for each of them, and performs each action on one of the copies. If the action is a shift, the resulting stack is no longer considered in this round, but only in the next one. This way, at the beginning of the next round, each stack is the result of reading the look-ahead symbol in a shift action, and all stacks have read the same prefix of the input string.
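One round of this breadth-first search can be sketched as follows; ParseStack and its methods are assumptions for illustration, and a real GLR parser operates on the shared graph-structured stack introduced next rather than on stack copies:

```java
import java.util.*;

/** Sketch of one GLR round: every stack has read the same input prefix;
 *  reduces are performed within the round, shifts of the look-ahead
 *  produce the stacks of the next round. */
final class Glr {
    enum Kind { SHIFT, REDUCE, ACCEPT, ERROR }
    record Action(Kind kind, int target) {}          // target: next state or rule number

    interface ParseStack {                           // assumed stack abstraction
        List<Action> actions(char lookahead);        // more than one entry on a conflict
        ParseStack copy();                           // conceptual; a GSS shares prefixes
        ParseStack apply(Action a, char lookahead);  // perform a shift or reduce
    }

    static Set<ParseStack> round(Set<ParseStack> stacks, char lookahead,
                                 boolean atEndOfInput, List<ParseStack> accepted) {
        Set<ParseStack> next = new HashSet<>();      // stacks for the next round
        Deque<ParseStack> todo = new ArrayDeque<>(stacks);
        while (!todo.isEmpty()) {
            ParseStack s = todo.pop();
            for (Action a : s.actions(lookahead)) {  // duplicate the stack on conflicts
                switch (a.kind()) {
                    case ACCEPT: if (atEndOfInput) accepted.add(s); break;
                    case ERROR:  break;              // discard this stack
                    case REDUCE: todo.push(s.copy().apply(a, lookahead)); break; // same round
                    case SHIFT:  next.add(s.copy().apply(a, lookahead)); break;  // next round
                }
            }
        }
        return next;  // each of these stacks has now read the look-ahead symbol
    }
}
```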

In fact, a GLR parser does not store complete copies of stacks, but shares their common prefixes and suffixes. The resulting structure is known as a graph-structured stack (GSS). An individual stack is represented as a path in the GSS, from some top-most state to the unique initial state.

Table 2. Graph-structured stacks and steps of the GPSR parser when parsing the graph consisting of the hyperedges \(a=\textsf {e}(1,2), b=\textsf {e}(2,3), c=\textsf {e}(1,3), d=\textsf {e}(3,4)\).

A GPSR parser generalizes a PSR parser in the same way as a GLR parser generalizes an LR parser. It also maintains a set of parse stacks, which contain concrete states whose parameters are bound to nodes of the input graph. However, a GPSR parser must also deal with the fact that there is no a priori reading sequence of edges of the input graph.

This affects a GPSR parser even more than a PSR parser, since a GPSR parser may be forced to pursue different reading sequences in parallel while it performs the search process. This has the following consequences:

  • Each parse stack corresponds to a specific part of the input graph that has been read already. Hence, the parser must store, for each stack separately, which edges of the input graph have been read. Sets of stacks are stored as a GSS like in GLR parsers. Each GSS node corresponds to a concrete state. Additionally, each GSS node keeps track of the set of input graph edges that have been read so far. Note that GSS nodes may be shared only if both their concrete states and their sets of read edges coincide (see the sketch after this list).

  • GPSR parsers cannot process their sets of stacks in rounds. When a stack is obtained by executing a shift action, the parser must not wait until the same edge has been read in all the other stacks; they may read other edges first. As a consequence, a GPSR parser needs other strategies to control the order in which stacks are processed. Strategies are discussed in Sect. 5.
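The GSS node structure described in the first item can be sketched as follows; the State and Edge types are the ones assumed in the earlier sketches, and the field layout is our assumption, not Grappa’s actual representation:

```java
import java.util.*;

/** Sketch of a GPSR GSS node. A stack is a path from a top-most node to
 *  the unique initial node along the predecessor links. */
final class GssNode {
    final State state;        // concrete state, e.g., Q5 with nodes 3,4,1 bound
    final Set<Edge> read;     // input edges read on every stack ending here
    final Set<GssNode> preds = new HashSet<>();   // shared stack prefixes

    GssNode(State state, Set<Edge> read) {
        this.state = state;
        this.read = read;
    }

    /** Unlike in GLR, sharing requires the read sets to coincide as well. */
    boolean canShareWith(GssNode other) {
        return state.equals(other.state) && read.equals(other.read);
    }
}
```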

We demonstrate GPSR parsing using the example of series-parallel graphs and the input graph \(\textsf {e}(1,2) \, \textsf {e}(2,3) \, \textsf {e}(1,3) \, \textsf {e}(3,4)\) derived in Fig. 1. We refer to these edges by the letters a, b, c, and d. We write GSS nodes in compact form: e.g., \(5^{341}_{cd}\) refers to the concrete state \(Q_5(3, 4, 1)\) and indicates that the edges \(c = \textsf {e}(1,3)\) and \(d = \textsf {e}(3,4)\) have been read already.

The parser determines node 1 as the unique start node, i.e., it starts with concrete state \(Q_0(1)\), with all edges of the input graph unread. So the GSS in step 0 consists of \(0^{1}_{\varnothing }\) (cf. Table 2). The parsing table in Table 1 indicates that the parser can shift both edge \(a = \textsf {e}(1,2)\) and edge \(c = \textsf {e}(1,3)\), resulting in the stacks \(0^{1}_{\varnothing }\) \(1^{12}_{a}\) and \(0^{1}_{\varnothing }\) \(1^{13}_{c}\). Table 2 shows the corresponding GSS in step 1. (Note that “new” GSS nodes are set in boldface.) Step 1 continues with processing stack \(0^{1}_{\varnothing }\) \(1^{12}_{a}\). (See Sect. 5 for a discussion of strategies.) State \(Q_1(1,2)\) just allows a reduction by rule 1, producing a nonterminal edge \(\textsf {G}(1,2)\). This pops \(1^{12}_{a}\) from the stack; processing \(\textsf {G}(1,2)\) pushes the concrete state \(Q_2(1,2)\), which is represented by \(2^{12}_{a}\) in step 2.

State \(5^{231}_{ab}\) in step 7 allows both to shift \(d=\textsf {e}(3,4)\) and to reduce by rule 3, with nonterminal edge \(\textsf {G}(1,3)\). The resulting GSS nodes are \(1^{34}_{abd}\) and \(2^{13}_{ab}\) in step 8. State \(5^{341}_{cd}\) allows just a reduce action by grammar rule 3. However, this reduce operation with nonterminal edge \(\textsf {G}(1,4)\) is invalid: if it were valid, \(\textsf {G}(1,4)\) could be derived to the graph consisting of just \(c=\textsf {e}(1,3)\) and \(d=\textsf {e}(3,4)\), i.e., it would generate node 3. However, this contradicts the fact that the unread edge \(b=\textsf {e}(2,3)\) is attached to node 3, which must therefore be generated earlier in the derivation. Therefore, the stack with top-most state \(5^{341}_{cd}\) is discarded in this step (indicated by the asterisk), and analogously in steps 12 and 13.

Note that the shift action in step 9 results in a GSS where \(1^{34}_{abd}\) is the top-most state of two stacks. The reduce operation in step 10, however, removes \(1^{34}_{abd}\) from the GSS and from the corresponding stacks again, and produces the two stacks with top-most states \(5^{341}_{abd}\) and \(5^{342}_{abd}\).

The GSS in step 18 contains node \(2^{14}_{abcd}\), i.e., the accept state \(Q_2(1,4)\) with the entire input graph being read. The GPSR parser, therefore, has found a successful parse of the input graph.

The current implementation stops when the first successful parse has been found. Another successful parse could have been found if \(1^{12}_{ac}\) had been processed in step 5 or later.

5 Parsing Experiments

We now report on runtime experiments with different parsers applied to series-parallel graphs and to structured flowcharts. The latter are flowcharts that do not allow arbitrary jumps, but represent structured programs with conditional statements and while loops. They consist of rectangles containing instructions, diamonds indicating conditions, and ovals indicating the beginning and the end of the program. Arrows indicate control flow; see Fig. 2 for an example. Flowcharts are easily represented by graphs, as also shown in Fig. 2. Figure 3 defines the rules of an HR grammar generating all graphs representing structured flowcharts. This grammar is not PSR because a state of its CFA has conflicts.

Fig. 2. A structured flowchart (text within the blocks has been omitted) and its graph representation.

Fig. 3. HR rules for structured flowcharts.

Fig. 4. Definition of flowchart graphs \(F_n\).

We generated three different parsers for the grammars of series-parallel graphs and of structured flowcharts: a Cocke-Younger-Kasami style parser (CYK, [11]) using DiaGen, and two variants of the GPSR parser using Grappa. The CYK parser creates nonterminal edges by dynamic programming, where each of these edges can be derived to a certain subgraph of the input graph. It was in fact optimized in two ways: it never creates two or more indistinguishable nonterminal edges for the same subgraph, even if the nonterminals represent different derivation trees, and it stops as soon as it finds the first derivation of the entire input graph.
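The first optimization can be pictured as a chart keyed by nonterminal label, attachment, and covered subgraph; the following is a hypothetical sketch, not DiaGen’s actual data structure:

```java
import java.util.*;

/** Sketch of the de-duplication in the optimized CYK parser: at most one
 *  nonterminal edge is stored per label, attachment, and covered subgraph. */
final class Chart {
    record Key(String label, List<Integer> attached, Set<Edge> covered) {}

    private final Map<Key, Object> items = new HashMap<>(); // Object: derivation info

    /** Returns false if an indistinguishable nonterminal already exists;
     *  different derivation trees for the same key are merged this way. */
    boolean add(Key key, Object derivation) {
        return items.putIfAbsent(key, derivation) == null;
    }
}
```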

The GPSR parsers differ in the strategy that controls which of the currently considered stacks is selected for the next step. GPSR 1 simply maintains a FIFO queue of all such stacks, i.e., new states are enqueued as soon as they are created, and a top-most state is selected for processing as soon as it is next in the queue. GPSR 2, however, applies a more sophisticated strategy. It requires grammar rules to be annotated with either first or second priority. The GPSR 2 parser provides two queues, the first one using FIFO and the second LIFO. New states that result from handling a first priority rule go into the first queue, the others into the second. The parser always tries to select states from the first queue; it selects from the second queue only if the first queue is empty. This way one can control, by annotating grammar rules, which rules should be considered first. This does not affect the correctness of the parser; it can still examine the entire search space. However, it will stop as soon as it finds the first successful parse. By appropriately annotating grammar rules, one can thus speed up the parser if the input graph is valid. However, there is no speed-up for invalid input graphs, since the parser must inspect the entire search space in this case.
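A minimal sketch of this two-queue strategy, reusing the GssNode sketch from Sect. 4 (class and method names are our assumptions, not Grappa’s API):

```java
import java.util.*;

/** Sketch of the GPSR 2 selection strategy: GSS tops produced by
 *  first-priority rules go into a FIFO queue, the others onto a LIFO
 *  stack; the LIFO is consulted only when the FIFO is empty. */
final class TwoQueueStrategy {
    private final ArrayDeque<GssNode> fifo = new ArrayDeque<>();
    private final ArrayDeque<GssNode> lifo = new ArrayDeque<>();

    void add(GssNode top, boolean firstPriority) {
        if (firstPriority) fifo.addLast(top);   // enqueue (FIFO)
        else               lifo.addFirst(top);  // push (LIFO)
    }

    /** The next stack to process, or null if the search space is exhausted. */
    GssNode next() {
        return !fifo.isEmpty() ? fifo.pollFirst() : lifo.pollFirst();
    }
}
```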

The GPSR 2 parser for series-parallel graphs gives rule 3 (series) precedence over rule 2 (parallel); it has been applied to series-parallel graphs \(S_n\) with different values of \(n\). The GPSR 2 parser for structured flowcharts gives sequences priority over conditional statements; it has been applied to the flowcharts \(F_n\) defined in Fig. 4, consisting of \(n\) conditions and \(3n+1\) instructions. The flowchart in Fig. 2 is in fact \(F_3\). \(F_n\) has a subgraph \(D_n\), which, for \(n>0\), contains subgraphs \(D_m\) and \(D_{m'}\) with \(n = m + m' + 1\). Note that the conditions in \(F_n\) form a binary tree with \(n\) nodes when we ignore instructions; we always choose \(m\) and \(m'\) such that this tree is complete.
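The paper does not spell out how \(m\) and \(m'\) are computed; one standard way to split \(n\) nodes into a complete binary tree (all levels full except the last, which is filled from the left), shown purely for illustration, is:

```java
final class TreeSplit {
    /** Split n = m + m' + 1 condition nodes so that they form a complete
     *  binary tree; returns m, the size of the left subtree, for n >= 1. */
    static int leftSize(int n) {
        if (n == 1) return 0;                         // a single node has no subtrees
        int h = 31 - Integer.numberOfLeadingZeros(n); // floor(log2 n) = tree height
        int last = n - ((1 << h) - 1);                // nodes on the partial last level
        int halfLeaves = 1 << (h - 1);                // left subtree's last-level capacity
        return (halfLeaves - 1) + Math.min(last, halfLeaves);
    }
    // Example: n = 3 yields m = 1 and m' = 1, matching D_3 in F_3 (Fig. 2).
}
```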

Fig. 5. Runtime (in milliseconds) of different parsers for series-parallel graphs and structured flowcharts.

Figure 5 shows the runtime of the different parsers applied to \(S_n\) and \(F_n\) for varying values of \(n\). Runtime has been measured on an iMac 2017, 4.2 GHz Intel Core i7, Java 1.8.0_181 with standard configuration, and is shown in milliseconds on the y-axis, with \(n\) on the x-axis.

The experiments first demonstrate that the more sophisticated strategy of GPSR 2 really pays off: GPSR 2 finds a derivation much faster than GPSR 1. For parsing \(F_{1000}\), e.g., GPSR 1 needs \(4\,013\,880\) steps, but GPSR 2 just \(13\,004\). The experiments also show that GPSR 1 is in fact much slower than CYK, which underlines the need for a sophisticated strategy in the GPSR parser. For series-parallel graphs, however, even GPSR 2 is much slower than CYK. This is because the grammar of series-parallel graphs is highly ambiguous: \(S_{100}\), for instance, has the ridiculous number of \(6.1\cdot 10^{281}\) derivation trees. The CYK parser has to create only \(40\,422\) nonterminal edges for \(S_{100}\), and just \(6\,582\) for \(S_{40}\), where most of these edges represent a high number of different derivation trees. GPSR 2, in contrast, needs \(908\,122\) steps to find a derivation for \(S_{40}\). Apparently, the compactification performed by the optimized CYK parser is more effective at cutting down the number of choices the parser has to follow.

6 Conclusions

We have generalized PSR parsing for HR grammars [6] to cope with ambiguous graph grammars, by pursuing all possible parses of a graph in parallel until the first derivation has been found. This work is inspired by Tomita’s GLR string parsers [12], which extend D.E. Knuth’s LR string parsers [9]. For the academic example grammars examined in Sect. 5, in particular the highly ambiguous grammar for series-parallel graphs, the comparison of our parser with the CYK parser does not give a clear picture. Moreover, the speed-up obtained by choosing an appropriate strategy only helps when parsing valid graphs, not when processing invalid ones. Our experiments shall be extended in two respects: First, we shall study more, and more realistic, HR grammars, e.g., the modestly ambiguous (and big) grammars used for processing abstract meaning representations in natural language processing (NLP). Second, we shall compare the GPSR parser with two parsers used in NLP: the Bolinas parser [1] by D. Chiang, K. Knight et al., which implements the polynomial algorithm for a restricted class of HR grammars devised in [10], and the s-graph parser [7] by A. Koller et al., which uses a similar formalism.

We also intend to extend both the original and the generalized PSR parsers to contextual HR grammars [2, 3], which have greater generative power, and can be used for analyzing graph models that are more general, and more relevant in practice. Our experience with PTD parsing [4] suggests that this should be relatively easy.