Keywords

1 Introduction

Graphs and graph transformation (GT) systems have been applied to a variety of problems with industrial relevance [12, 17,18,19]. GT tools play an important role in making this feasible. A considerable number of such tools have been developed over the last decades [1, 15]. Some of these tools aim to closely implement results from GT theory, e.g., results from algebraic graph rewriting [16, 20]). Other tools are more concerned with incorporating a notion of GTs with practical applications, e.g., software design [10] and verification [13]. However, one aspect where all current GT tools fall short of reflecting mathematical theory is that they maintain graphs in ephemeral data structures, i.e., graphs are maintained in stateful objects that are updated when GT rules are applied and previous version of graphs are lost. The use of ephemeral data structures for maintaining graphs is particularly limiting for example when an application needs to “look back” into graph derivation histories [8] or if it needs to perform some type of reasoning about graph processes [4]. Significant extensions of GT theory, such as the definition of transactional graph transformation systems as processes [2], cannot readily be implemented with current GT tools. From a more mundane perspective, it is well known from the programming languages and software engineering domains that stateless computation avoids complexity and is easier to understand and test.

All data in functional programming is immutable. Under the hood, this is implemented with persistent data structures which, in contrast to ephemeral data structures, retain all previous versions upon update [9]. A data structure is partially persistent if all versions can be accessed but only the latest version can be modified. Data structures where all versions can be modified are referred to as fully persistent.

In this paper, we present GrapeVine, a GT tool that maintains graphs in a fully persistent data structure and treats graphs as immutable data objects. GrapeVine is based on our earlier work on Grape (the Graph Rewriting And Persistence Engine) [21] and its integration with computational notebook technology (GrapePress) [22]. Like other tools, Grape (and GrapePress) uses an ephemeral data structure to store graphs, i.e., graphs are destructively rewritten “in-place”. (The term “persistence” in the tool’s name was originally chosen to indicate that data structure was persisted in a database with transactional support.) While GrapeVine is based on these earlier works, it constitutes a complete reimplementation of its core engine and data structures to achieve support for truly functional graph rewriting.

The rest of this paper is structured as follows: We discuss related work in the following section. Section 3 provides an overview of GrapeVine and its new persistent data structure. Section 4 comments on the tool demonstration provided in the appendix and Sect. 5 offers concluding remarks and an outlook to future work.

2 Related Work

Our current work on GrapeVine can be seen as a major revision of our earlier work on Grape [21] and GrapePress [22], with the fundamental difference that GrapeVine uses a “functional programming” paradigm and treats graphs as immutable objects. Historically, the name Grape referred to the GT engine and the domain-specific language for defining and controlling GTs, while the name GrapePress was used to refer to the computational notebook platform that integrates with Grape. While GrapeVine inherited the same architecture (i.e., its language and engine can be used without the computational notebook platform), we use a single name for the tool, going forward. Similarly, we will use the name GrapePress in this paper to also include the Grape engine.

GrapeVine has the ability to concurrently explore and compare many (all) possible derivations of arbitrarily many graphs. This feature is related to functionality provided by the GROOVE tool, which provides support for state-space exploration of GT systems [13]. However, GROOVE does not persist different versions of a graph but only fingerprints for the purpose of detecting collisions during model-checking (which is the primary function of graph process exploration in GROOVE).

The functionality provided by the fully persistent data structure implemented in GrapeVine resembles that of model versioning and indexing system like Hawk [3]. However, those approaches are primarily used for model management but they are not integrated with the computational model of the transformation tool. In contrast, GrapeVine realizes a truly functional computation model based on its fully-persistent graph data structure.

The data structure implemented in GrapeVine was inspired by the theoretical concept of a graph process, as defined by Corradini et al. [4], which is an a graph of graph transformation occurrences. Moreover, transaction-handling in GrapeVine is based on graph processes, as proposed by Baldan et al. [2].

3 GrapeVine Concepts

3.1 Overview

A core objective in the design of GrapeVine (and its predecessor GrapePress) has been to make it easy to integrate GT programming with common software engineering tools and processes. As such, GrapeVine was developed as an internal domain-specific language (DSL) to a general purpose programming language (Clojure). Regular tools like text editors, IDEs and configuration management systems can be used, without the need to install and learn a particular graphical tool for GT development and execution [21]. GrapeVine comes integrated with an optional computational notebook user interface, which provides graphical visualization of rules, graphs and graph history [22]. GrapeVine computational notebooks are an excellent way to quickly explore and document GT-based computations and systems. Notebook worksheets can be shared as “executable” papers, but they can also be saved as regular program code, to be used as part of larger software applications.

Under the hood, GrapeVine uses the Neo4J graph database management system, which makes it highly scalable to ultra-large graphs. GrapeVine is available as a Docker image and installs with a single command.

Graph Model. The tool uses directed, attributed, node- and edge-labeled (danel) graphs. Graphs do not need to be typed but it is possible to define constraints on graphs, which are enforced whenever rules are applied. GrapeVine comes with a number of predefined constraint types (e.g., for allowed node and edge types, cardinalities, and uniqueness of attribute values) but also allows users to define complex, user-defined constraints based on Oreja et al.’s logic of graph constraints [11].

Graph Transformation Rules. In the previous version of the tool (GrapePress), each GT rule could be defined with either single-pushout (SPO) or double-pushout (DPO) semantics [6]. The main difference concerned the way how any “dangling” edges would be handled during the execution of rules that delete nodes. SPO rules would delete such dangling edges, while the application of a DPO rule is simply not permitted in a context that would cause dangling edges to arise. GrapeVine has removed this choice and adopted SPO semantics throughout. We found that the complexities caused by providing this option (both in terms of tool implementation as well as in terms of reasoning about such “mixed” GT systems) are not justified, since the behaviour of DPO rules can be simulated in the SPO approach, given the availability of additional mechanisms, like negative application conditions.

The user can still choose between homomorphic and isomorphic matching semantics for each rule. Rules can be equipped with applications conditions (i.e., conditions that must be true for a rule to be applicable) [5] and negative application conditions (i.e., conditions that prevent rule application) [7].

Rules can be parameterized and rule parameters can be used to define or restrict the labels and attributes of graph elements. The previous version of the tool (GrapePress) treated attributes as variables and therefore also provided an assignment operator to change their values. The concept of variables no longer applies to the functional computation paradigm implemented in GrapeVine. Since graphs are immutable, attributes also have that property. This means that GTs that seek to “modify” attributes of a graph element (node or edge) need to replace that graph element with a new graph element of the same type, while copying the unchanged attributes and (re)defining the “changed” ones.

Control Structures. Since GrapeVine rules are defined with an internal DSL to a general purpose programming language (Clojure), the control structures of the host language are available for programming with GTs. The previous version of the tool (GrapePress) also provided a set of dedicated control structures (e.g., atomic blocks, loops, non-deterministic choice), which could be used to define composite transactions with ACID properties and backtracking for dealing with non-determinism during rule applications [21]. Given the paradigm shift to a purely functional model of computation in GrapeVine, these control structures have been revised completely. All GrapeVine control structures are defined as functions on graph sets rather than graphs. This allows the definition of deterministic operators for rule application. (Non-deterministic operators are also still available.) GrapeVine therefore no longer needs complex backtracking mechanisms to “undo” unsuccessful derivations. Similarly, there is no longer a need for a dedicated transaction manager to achieve ACID properties for programmed transformation units, since unsuccessful (partial) execution of such units can simply be “forgotten” [2].

Graph Queries. Graph queries are pattern-based matches for the purpose of returning data from the graph or testing the existence of conditions. The previous version of the tool (GrapePress) did not have a dedicated syntactic form for graph queries; graph queries were merely defined as rules that do not alter the graph. Instead, GrapeVine has a dedicated syntax for defining graph queries and enforces their read-only semantics. This is important because, in contrast to rule applications, the application of queries does not generate a new graph.

3.2 A Fully-Persistent Data Structure for Functional Graph Rewriting

There are several choices that can be made when designing a fully persistent data structure. Of course, there is always the naïve approach of copying a data item upon modification. However, that approach makes sense only for relatively small data items. Since in our application, each occurrence of a GT creates a new graph, one choice is to record only the differences from the “previous” graph in our data structure. We haven chosen that option and Fig. 1 presents the model that was implemented for this purpose.

Nodes of type Graph reference the graph elements (nodes and edges) that were created or deleted when the “last” transformation occurred. We also record which graph elements where read (preserved) when a transformation occurred. While the latter is not needed for implementing a versioned data structure, we capture this data for the purpose of using the tool for reasoning about the provenance of applied transformation rules.

Fig. 1.
figure 1

Meta-model for persistent graph data structure in GrapeVine

Graphs are uniquely identified by a globally unique hash code (id). Graphs are arranged in a partial order, which is induced by the previous relationship. Computing the elements of a given graph G simply requires retrieving all elements that have been created by any graph in the history of G minus those elements that have been deleted in G’s history. This query can efficiently be executed in the Neo4J graph database. The Neo4J query language is well understandable and we provide it below instead of creating our own mathematical formulation. The query looks up a graph (g) with the given id and collects all graphs gs in its history by repeatedly traversing previous edges. The query then collects all graph elements e that have a created edge from any node in gs without also having a deleted edge from any node in gs.

figure a

Now that we have described the design chosen for implementing the persistent data structure for graphs in GrapeVine, we briefly want to comment on a disadvantage of the chosen approach, i.e., GrapeVine graphs are no longer represented “varbatim” in the Neo4J graph database. While the previous version of the tool (GrapePress) was able to operate on any any graph in a Neo4J database, independently of its origin, GrapeVine now requires graphs to conform to the meta-model in Fig. 1. For example, we note that GrapeVine now represents graph edges as Neo4J graph nodes, to allow incoming provenance edges from Graph nodes. (Neo4J does not support hypergraphs.)

We could have represented GrapeVine edges directly as edges in the Neo4J database if we replaced the provenance edges (create, delete, read) by “foreign key” attributes (createdBy, deletedBy, readBy) on edges. (Neo4J allows attributes on edges.) These attributes could store the IDs of the corresponding graphs (that created, deleted or read the edge), respectively. However, such a design would be inefficient as each graph lookup would then involve value-based “join” operations on potentially large sets of elements. Moreover, the benefit of such a design would be small, as other external tools accessing the graph database directly would still need to use a projection to assemble a concrete graph from its history. The only alternative that would avoid the need for such a projection would be the naïve approach of copying the graph upon each change, which does not appear scalable.

As mentioned in the previous section, GrapeVine no longer requires a sophisticated transaction manager like the one used on the previous (stateful) version of the tool (GrapePress). Rather, GrapeVine implements transactions based on the notion of graph processes. [2] Unsuccessful or incomplete executions of composite GT programs can simply be “forgotten”. Of course, from a practical point of view, these graphs would still fill up the database over time. GrapeVine therefore provides a mechanism to purge them. Any graph of interest can be “committed” to the database by tagging it with a commit tag (ctag) in Fig. 1. Conversely, GrapeVine offers a rollback operation that deletes all graphs that are not in the history of any committed graph.

4 A Taste of Interacting with GrapeVine

Like the previous version of the tool, GrapeVine can be used as a library in software programming projects or it can be used as a stand-alone tool with its integrated computational notebook. The demo in the appendix uses the latter.

At first glance, GrapeVine looks similar to the previous version of the tool (GrapePress). Worksheets consist of sequences of static segments and dynamic segments. Dynamic segments consist of executable code and a display for the output of the computation. A major advantage of the functional (side-effect free) computational model in GrapeVine is, however, that it is now much easier to create idempotent dynamic segments. In the previous version of the tool, code segments needed to be executed “in the right order” to produce the desired output result. For example, if a graph was rewritten by a graph transformation once, applying that transformation another time would usually result in a different output.

The demo shows how using sets of graphs rather than single graphs as a data type for computation simplifies the composition of (GT) operations and allows for the definition of a deterministic operator for rule application (which simply return the set of all possible direct derivations). Another advantage of this choice of data type is that unsuccessful rule applications simply return an empty set rather than requiring special handling of failure.

Finally, the demo illustrates how GrapeVine’s data structure maintains the derivation history of all graphs as a graph of graphs. We demonstrate how such history graphs can be reflected as regular GrapeVine graphs and how the tool supports visualizing and reasoning about derivation histories.

5 Conclusions and Future Work

The fact that current GT tools maintain graphs in ephemeral data structures limits their usefulness in practice. Applications that require knowledge of the derivation history of graphs cannot readily be supported. Current tools do not support computing with and reasoning about graph derivation history. Even in applications where this is not needed, stateful computation is harder to understand and test. Stateless computation is also preferrable when working with GrapeVine as a computational notebook. Users of the predecessor tool had to remember to execute code segments in worksheets “in the right order” since GTs had side-effects on the shared graph and few code segments would be idempotent.

Finally, from a tool developer’s point of view, the associated stateful computation model is complex and prone to problems. For example, the transaction manager developed for the (stateful) prior version of this tool (GrapePress) was by far the most complex module of the tool. Removing it from GrapeVine was liberating, but also painful because of all the work that had gone into it.

Our current plan for evolving GrapeVine has two near-term objectives. Firstly, we will evolve the persistent data structure to make it confluently persistent. A data structure is called confluently persistent if there is a “merge” operation to join two branches originating from a common version [9]. While we did not talk about it in this paper, GrapeVine already has an operator to filter out “duplicate” graphs in a graph set (up to isomorphism). That operator is implemented along a similar mechanism as described by Rensink [14]. We are working on using this operator as a basis for defining a “merge” operation for GrapeVine’s data structure. Our second objective is to integrate theoretical results for verifying properties of graph transformation systems, in particular with respect to the confluence of rule sets, i.e., critical pair analysis.

figure b
figure c
figure d
figure e