1 Introduction

Mean field approximations (MFAs) are used in the study of complex systems to obtain simplified and revealing descriptions of their dynamics. MFAs are used in many disparate contexts such as Chemical Reaction Networks (CRNs) and their derivatives  [13, 23, 34], walkers on bio-polymers  [24, 44], models of epidemic spreading  [27], and the evolution of social networks  [20]. These examples witness both the power and universality of MFA techniques, and the fact that they are pursued in a seemingly ad hoc, case-by-case fashion.

The case of CRNs is particularly interesting because they provide a human-readable, declarative language for a common class of complex systems. The stochastic semantics of a CRN is given by a continuous-time Markov chain (CTMC) which gives rise to the so-called master equation (ME). The ME is a system of differential equations describing the time evolution of the probability of finding the CRN in any given state. Various tools have been developed to automate the generation and solution of the ME from a given CRN, liberating modellers from the daunting task of working with the ME directly (e.g.  [25, 36, 45]).

Fig. 1. Stukalin model of a walking DNA bimotor.

The high dimensionality of the ME often precludes its exact solution. This is where MFA techniques become effective. The generally countably infinite ME is replaced by a finite system of differential equations, called the rate equations (RE)  [28, 34], which describe the time evolution of the average occurrence count of individual species. Here, we extend this idea to the case of graphs and, in fact, the resulting framework subsumes all the examples mentioned above (including CRNs). The main finding is summarised in a single Eq. (15) which we call the generalised rate equations for graphs (GREG). In previous work, we published a solution to this problem for the subclass of reversible graph rewriting systems  [17, 18]. The solution presented here is valid for any such system, reversible or not. The added mathematical difficulty is substantial and concentrates in the backward modularity Lemma 2. As in Ref.  [18], the somewhat informal approach of Ref.  [17] is replaced with precise category-theoretical language, with which the backward modularity lemma finds a concise and natural formulation.

As the reader will notice, Eq. (15) is entirely combinatorial and can be readily implemented. Our implementation can be played with at https://rhz.github.io/fragger. Its source can be found at https://github.com/rhz/fragger.

1.1 Two-Legged DNA Walker

Let us start with an example from biophysics  [44]. The model describes a protein complex walking on DNA. The walker contains two special proteins – the legs – each binding a different DNA strand. The legs are able to move along the strands independently but can be at most m DNA segments apart.

Following Stukalin et al.  [44], we are interested in computing the velocity at which a two-legged walker moves on DNA with \(m=1\). In this case, and assuming the two legs are symmetric, there are only two configurations a walker can be in: either extended (E) or compressed (C). Therefore all possible transitions can be compactly represented by the four rules shown in Fig. 1, where the grey node represents the walker and white nodes are DNA segments. The polarisation of the DNA double helix is represented by the direction of the edge that binds two consecutive DNA segments. Rules are labelled by two subscripts: the first tells us if the leg that changes position is moving forward (F) or backward (B), while the second states whether the rule extends or compresses the current configuration.

The mean velocity V of a single walker in the system can be computed from the rates at which it moves forward and backward and the expected occurrence counts \(\mathbb {E}{\left[ G_i\right] }\), where \(G_i\) is one of the three possible configurations depicted in Fig. 1, and \(\left[ G_i\right] \) is short for \(\left[ G_i\right] (X(t))\), the integer-valued random variable that tracks the number of occurrences of \(G_i\) in the (random) state X(t) of the system at time t. We call any real- or integer-valued function on X(t) an observable.

In the case where there is only a single motor in the system, the observables \(\left[ G_i\right] \) are Bernoulli-distributed random variables, and the expectations \(\mathbb {E}{\left[ G_i\right] }\) correspond to the probabilities of finding the motor in the configuration \(G_i\) at any given time. Thus, by constructing the ODEs for these observables, we can compute the mean velocity of a single motor in the system. That is, we must compute the rate equations for these graphs.

Intuitively, to compute rate equations we must find all ways in which the rules can create or destroy an occurrence of an observable of interest. When, and only when, a rule application and an occurrence of the given observable overlap, can this occurrence be created or destroyed. A systematic inventory of all such overlaps can be obtained by enumerating the so-called minimal gluings (MGs) of the graph underlying the given observable and the left- and right-hand sides of each rule in the system. MGs show how two graphs can overlap (full definition in the next section). Such an enumeration of MGs is shown in Fig. 2, where the two graphs used to compute the MGs are the extended walker motif – the middle graph in Fig. 1 – and the left-hand side of the forward-extension rule. The MGs are related and partially ordered by graph morphisms between them.

In theory, since we are gluing with the left-hand side of a rule, each one of the MGs represents a configuration in which the application of the rule might destroy an occurrence of the observable. However, if we suppose that walkers initially have two legs, then 13 of the 21 MGs in Fig. 2 are impossible to produce by the rules, because no rule can create additional legs. Therefore those configurations will never be reached by the system and we can disregard them. If we further suppose the DNA backbone to be simple and non-branching, we eliminate three more gluings. Finally, if there is only one motor, the remaining four non-trivial gluings are eliminated. In this way, invariants can considerably reduce the number of gluings that have to be considered. Removing terms corresponding to observables which, under the assumptions above, are identically zero, we get the following series of ODEs. For readability, only a subset of the terms is shown, and we write G instead of the proper \(\mathbb {E}{\left[ G\right] }\) in ODEs.

figure a

Notice how only graphs with extra white nodes to the right are obtained when computing the ODE for the left graph in Fig. 1. The opposite is true for the right graph in Fig. 1. This infinite expansion can be further simplified if we assume the DNA chain to be infinite or circular. In this case we can avoid boundary conditions and replace the left- and right-hand observables below by the simpler middle observable:

figure b

The infinite expansion above now boils down to a simple finite ODE system.

figure c

From the above ODEs and assumptions, we get the steady state equation.

figure d

Since we have only one motor,

figure e

Using this, we can derive the steady state value for the mean velocity:

figure f

This exact equation is derived in Ref.  [44]. We obtain it as a particular case of the general notion of rate equations for graphs explained below. It is worth noting that, despite the simplicity of the equation, it is not easily derivable by hand. This and other examples are available to play with in our web app at https://rhz.github.io/fragger/. The example models include a population model, a voter model, and a preferential attachment model.

The DNA walker model presented in this introduction is small and reversible; it requires no approximation to obtain a finite expansion. By contrast, the population model and the preferential attachment model are irreversible, and the population model and the voter model require an approximation to obtain a finite expansion.

Fig. 2. The poset of minimal gluings of \(G_2\) and \(G_1\). The disjoint sum is at the top. Gluings are layered by the number of node and edge identifications or, equivalently, by the size of their intersection.

1.2 Discussion

The reasoning and derivation carried out above for the DNA walker can in fact be made completely general. Given a graph observable \(\left[ F\right] \), meaning a function counting the number of embeddings of the graph F in the state X, one can build an ODE which describes the rate at which the mean occurrence count \({{\,\mathrm{\mathbb {E}}\,}}(\left[ F\right] (X(t)))\) changes over time.

Because the underlying Markov process X(t) is generated by graph-rewriting rules, the one combinatorial ingredient needed to build that equation is the notion of minimal gluings (MGs) of a pair of graphs. Terms in the ODE for F are derived from the set of MGs of F with the left- and right-hand sides of the rules which generate X(t). Moreover, each term in F’s ODE depends on the current state only via expressions of the form \({{\,\mathrm{\mathbb {E}}\,}}(\left[ G\right] )\) for G a graph defining a new observable. Each fresh observable \(\left[ G\right] \) can then be submitted to the same treatment, and one obtains in general a countable system of rate equations for graphs. In good cases (as in the walker example), the expansion is finite and there is no need for any approximation. In general, one needs to truncate the expansion. As the MFA expansion is a symbolic procedure, one can in principle pursue it to any order.

The significance of the method hinges both on how many models can be captured in graph-like languages, and on how accurate the obtained MFAs are. While these models do not exhaust all possibilities, graph transformation systems (GTSs) seem very expressive. In addition, our approach to the derivation of rate equations for graphs uses a general categorical treatment which subsumes various graph-like structures such as hypergraphs, typed graphs, etc.  [2, 35]. This abstract view is mathematically convenient, and broadens the set of models to which the method applies.

What we know about the existence of solutions to the (in general) countable ODE systems generated by our method is limited. For general countable continuous-time Markov chains and observables, existence of a solution is not guaranteed  [43]. Despite their great popularity, the current mathematical understanding of the quality of MFAs only addresses the case of CRNs and density-dependent Markov chains, with Kurtz’ theory of scalings  [23, Chap. 11], or the case of dynamics on static graphs  [27]. Some progress on going beyond the formal point of view and obtaining existence theorems for solutions of REs for graphs was reported in Ref.  [16]. Another limitation is the accuracy of MFAs once truncated (as they must be if one wants to plug them into an ODE solver). Even if an MFA can be built to any desired order, it might still fall short of giving a sensible picture of the dynamics of interest. Finally, it may also be that the cost of running a convincing approximation is about the same as that of simulating the system upfront.

This paper follows ideas on applying the methods of abstract interpretation to the differential semantics of site graph rewriting  [15, 26, 30]. Another more remote influence is Lynch’s finite-model theoretic approach to MFAs  [39]. From the GTS side, the theory of site graph rewriting had long been thought to be a lucky anomaly until a recent series of work showed that most of its ingredients could be made sense of, and given a much larger basis of applications, through the use of algebraic graph-rewriting techniques  [3, 31, 32]. These latter investigations motivated us to try to address MFA-related questions at a higher level of generality.

1.3 Relation to the Rule-Algebraic Approach

Another approach to the same broad set of questions started with Behr et al. introducing ideas from representation theory commonly used in statistical physics  [5]. They show that one can study the algebra of rules and their composition law in a way that is decoupled from what is actually being rewritten. The “rule-algebraic” theory allows one to derive Kolmogorov equations for observables based on a systematic use of rule commutators (recently implemented in  [12]). Interestingly, novel notions of graph rewriting appear  [11]. Partial differential equations describing the generating function of such observables can be derived systematically  [7]. As the theory can handle adhesive categories in general and sesqui-pushout rewriting  [10], it offers a treatment of irreversible rewrites alternative to the one presented in this paper. (The rule-algebraic approach can also handle application conditions  [9].) Further work will be needed to pinpoint precisely how these two threads of work articulate, both at the theoretical and at the implementation levels.

Outline. The paper is organised as follows: Sect. 2 collects preliminaries on graph rewriting and establishes the key forward and backward modularity lemmas; Sect. 3 derives our main result, namely a concrete formula for the action of a generator associated to a set of graph-rewriting rules as specified in Sect. 2. From this formula, the rate equation for graphs follows easily. Basic category-theoretical definitions needed in the main text are given in App. A; axiomatic proofs in App. B.

2 Stochastic Graph Rewriting

We turn now to the graphical framework within which we will carry out the derivation of our generalised rate equation (GREG) in Sect. 3. We use a categorical approach known as algebraic graph rewriting, specifically the single pushout (SPO) approach  [22, 37]. The reasons for this choice are twofold: first, we benefit from a solid body of preexisting work; second, it allows for a succinct and ‘axiomatic’ presentation abstracting over the details of the graph-like structures that are being rewritten. Working at this high level of abstraction allows us to identify a set of generic properties necessary for the derivation of the GREG without getting bogged down in the details of the objects being rewritten. Indeed, while we only treat the case of directed multigraphs (graphs with an arbitrary number of directed edges between any two nodes) in this section, the proofs of all lemmas are set in the more general context of adhesive categories  [35] in App. B. This extends the applicability of our technique to rewrite systems over typed graphs and hypergraphs, among others.

For the convenience of the reader, we reproduce from Ref.  [18] our basic definitions for the category \(\varvec{\mathbf {Grph}}\) of directed multigraphs. Next, we briefly summarise the SPO approach and its stochastic semantics  [33]. We conclude with the modularity lemmas, which are key to the derivation of the GREG in the next section.

2.1 The Category of Directed Multigraphs

A directed multigraph G consists of a finite set of nodes \(V_G\), a finite set of edges \(E_G\), and source and target maps \(s_{G}, t_{G} :E_G \rightarrow V_G\). A graph morphism \(f :G \rightarrow H\) between graphs G and H is a pair of maps \(f_E :E_G \rightarrow E_H\), \(f_V :V_G \rightarrow V_H\) which preserve the graph structure, i.e. such that for all \(e \in E_G\),

$$\begin{aligned} s_{H}(f_E(e))&= f_V(s_{G}(e))&\text { and }&t_{H}(f_E(e))&= f_V(t_{G}(e)). \end{aligned}$$

The graphs G and H are called the domain and codomain of f. A graph morphism \(f :G \rightarrow H\) is a monomorphism, or simply a mono, if \(f_V\) and \(f_E\) are injective; it is a graph inclusion if both \(f_V\) and \(f_E\) are inclusion maps, in which case G is a subgraph of H and we write \(G \subseteq H\). Every morphism \(f :G \rightarrow H\) induces a subgraph \(f(G) \subseteq H\) called the direct image (or just the image) of f in H, such that \(V_{f(G)} = f_V(V_G)\) and \(E_{f(G)} = f_E(E_G)\). Figure 3 illustrates a graph and a graph morphism.
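Concretely, the structure-preservation condition can be checked mechanically. The following Python sketch is illustrative only and not part of the formal development: the encoding of a directed multigraph as a `(nodes, edges)` pair, with `edges` a dict from edge identifiers to `(source, target)` pairs, is our own choice.

```python
def is_morphism(g, h, fv, fe):
    """Check that (fv, fe) is a graph morphism g -> h, i.e. that
    s_H(fe(e)) = fv(s_G(e)) and t_H(fe(e)) = fv(t_G(e)) for all edges e."""
    (gn, ge), (hn, he) = g, h
    return (all(fv[n] in hn for n in gn)
            and all(fe[e] in he for e in ge)
            and all(he[fe[e]] == (fv[ge[e][0]], fv[ge[e][1]]) for e in ge))

def is_mono(g, h, fv, fe):
    """A morphism is a mono when both component maps are injective."""
    (gn, ge), _ = g, h
    return (is_morphism(g, h, fv, fe)
            and len({fv[n] for n in gn}) == len(gn)
            and len({fe[e] for e in ge}) == len(ge))

# A single directed edge u -> v, and a 2-cycle 1 -> 2 -> 1.
edge = ({'u', 'v'}, {'e': ('u', 'v')})
cycle = ({1, 2}, {'a': (1, 2), 'b': (2, 1)})

print(is_morphism(edge, cycle, {'u': 1, 'v': 2}, {'e': 'a'}))  # True
print(is_morphism(edge, cycle, {'u': 1, 'v': 2}, {'e': 'b'}))  # False: 'b' flips the endpoints
```

The second check fails precisely because mapping e to b violates the displayed equations: \(s_H(f_E(e)) = 2 \ne 1 = f_V(s_G(e))\).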

Fig. 3. Examples of a) a directed multigraph, b) a graph morphism.

Fig. 4. A derivation. (Color figure online)

A partial graph morphism \(p :G \rightharpoonup H\) is a pair of partial maps \(p_V :V_G \rightharpoonup V_H\) and \(p_E :E_G \rightharpoonup E_H\) that preserve the graph structure. Equivalently, p can be represented as a span of (total) graph morphisms, that is, a pair of morphisms \(p_1 :K \rightarrow G\), \(p_2 :K \rightarrow H\) with common domain K, where \(p_1\) is mono and K is the domain of definition of p. We will use whichever representation is more appropriate for the task at hand. Graphs and graph morphisms form the category \(\varvec{\mathbf {Grph}}\) (with the obvious notion of composition and identity) while graphs and partial graph morphisms form the category \(\varvec{\mathbf {\varvec{\mathbf {Grph}}_*}}\).

Graph morphisms provide us with a notion of pattern matching on graphs while partial graph morphisms provide the accompanying notion of rewrite rule. We restrict pattern matching to monos: a match of a pattern L in a graph G is a monomorphism \(f :L \rightarrow G\). We write \(\left[ {L}, {G}\right] \) for the set of matches of L in G. We also restrict rules: a rule is a partial graph morphism \(\alpha :L \rightharpoonup R = \left( \alpha _1 :K \rightarrow L, \alpha _2 :K \rightarrow R \right) \) where both \(\alpha _1\) and \(\alpha _2\) are monos. We say that L and R are \(\alpha \)’s left- and right-hand sides (LHS and RHS). Rules are special cases of partial graph morphisms and compose as such. Given a rule \(\alpha :L \rightharpoonup R = \left( \alpha _1, \alpha _2 \right) \), we define the reverse rule \(\alpha ^\dag :R \rightharpoonup L\) as the pair \(\alpha ^\dag := \left( \alpha _2, \alpha _1 \right) \), not to be confused with the inverse of \(\alpha \) (which does not exist in general). Note that \(-^\dag \) is an involution, that is, \((\alpha ^\dag )^\dag = \alpha \).

2.2 Graph Rewriting

The basic rewrite steps of a GTS are called derivations. We first describe them informally. Figure 4 shows a commutative square, with a match \(f :L \rightarrow G\) on the left and a rule \(\alpha :L \rightharpoonup R\), on top. The match f identifies the subgraph in G that is to be modified, while the rule \(\alpha \) describes how to carry out the modification. In order to obtain the comatch \(g :R \rightarrow H\) on the right, one starts by removing nodes and edges from f(L) which do not have a preimage under \(f {{\,\mathrm{\circ }\,}}\alpha _1\), as well as any edges left dangling (coloured red in the figure). To complete the derivation, one extends the resulting match by adjoining to D the nodes and edges in R that do not have a preimage under \(\alpha _2\) (coloured green in the figure).

Derivations constructed in this way have the defining property of pushout squares (PO) in \(\varvec{\mathbf {\varvec{\mathbf {Grph}}_*}}\), hence the name SPO for the approach. Alternatively, one can describe a derivation through the properties of its inner squares: the left square is the final pullback complement (FPBC) of \(\alpha _1\) and f, while the right one is a PO in \(\varvec{\mathbf {Grph}}\)  [14]. (Definitions and basic properties of POs and FPBCs are given in App. A.)

Definition 1

A derivation of a comatch \(g :R \rightarrow H\) from a match \(f :L \rightarrow G\) by a rule \(\alpha = \left( \alpha _1:K \rightarrow L, \alpha _2:K \rightarrow R \right) \) is a diagram in \(\varvec{\mathbf {Grph}}\) such as (1), where the left square is an FPBC of f and \(\alpha _1\) and the right square is a PO,

figure g

with h, g matches and \(\beta = \left( \beta _1, \beta _2 \right) \) a rule, called the corule of the derivation.

Equivalently, a derivation of g from f by \(\alpha \) is a PO in \(\varvec{\mathbf {\varvec{\mathbf {Grph}}_*}}\) as in (2), with corule \(\beta \). We will mostly use this second characterisation of derivations.

Write \(f \Rightarrow _{\alpha } g\) if there is a derivation of g from f by \(\alpha \). Since derivations are POs of partial morphisms and \(\varvec{\mathbf {\varvec{\mathbf {Grph}}_*}}\) has all such POs  [37], the relation \(\Rightarrow _{\alpha }\) is total, that is, for any match f and rule \(\alpha \) (with common domain), we can find a comatch g. However, the converse is not true: not every match g having the RHS of \(\alpha \) as its domain is a comatch of a derivation by \(\alpha \). Which is to say, there might not exist f such that \(f \Rightarrow _{\alpha } g\) (the relation \(\Rightarrow _{\alpha }\) is not surjective). When there is such an f, we say g is derivable by \(\alpha \). Consider the example in Fig. 5. Here, g is \(\alpha \)-derivable (as witnessed by f) but h is not: no match of the LHS could contain a “preimage” of the extra (red) edge e in the codomain of h because the target node of e has not yet been created.

We say a derivation \(f \Rightarrow _{\alpha } g\) (with corule \(\beta \)) is reversible if \(g \Rightarrow _{\alpha ^\dag } f\) (with corule \(\beta ^\dag \)), and irreversible otherwise. Clearly, derivations are not reversible in general, otherwise \(\Rightarrow _{\alpha }\) would be surjective. Consider the derivation shown in Fig. 4. The derivation removes two (red) edges from the codomain of f; the removal of the lower edge is specified in the LHS of \(\alpha \), whereas the removal of the upper edge is a side effect of removing the red node to which the edge is connected (graphs cannot contain dangling edges). Applying the reverse rule \(\alpha ^\dag \) to the comatch g restores the red node and the lower red edge, but not the upper red edge. In other words, f is not \(\alpha ^\dag \)-derivable, hence the derivation in Fig. 4 is irreversible. In previous work, we have shown how to derive rate equations for graph transformation systems with only reversible derivations  [15, 17, 18]. In Sect. 3, we overcome this limitation, giving a procedure that extends to the irreversible case.

Fig. 5. The match g is \(\alpha \)-derivable, while h is not. (Color figure online)

Since POs are unique (up to unique isomorphism), \(\Rightarrow _{\alpha }\) is also functional (up to isomorphism). The fact that derivations are only defined up to isomorphism is convenient as it allows us to manipulate them without paying attention to the concrete naming of nodes and edges. Without this flexibility, stating and proving properties such as Lemmas 2 and 3 below would be exceedingly cumbersome. On the other hand, when defining the stochastic semantics of our rewrite systems, it is more convenient to restrict \(\Rightarrow _{\alpha }\) to a properly functional relation. To this end, we fix once and for all, for any given match \(f :L \rightarrow G\) and rule \(\alpha :L \rightharpoonup R\), a representative \(f \Rightarrow _{\alpha } \alpha (f)\) from the corresponding isomorphism class of derivations, with (unique) comatch \(\alpha (f) :R \rightarrow H\), and (unique) corule \(f(\alpha ) :G \rightharpoonup H\).

A set of rules \(\mathcal {R}\) thus defines a labelled transition system (LTS) over graphs, with corules as transitions, labelled by the associated pair \(\left( f, \alpha \right) \). Given a rule \(\alpha :L \rightharpoonup R\), we define a stochastic rate matrix \(Q_\alpha := (q^\alpha _{G H})\) over graphs as follows.

(3)

Given a model, that is to say a finite set of rules \(\mathcal {R}\) and a rate map \(k:\mathcal {R} \rightarrow \mathbb {R}^+\), we define the model rate matrix \(Q(\mathcal {R},k)\) as

$$\begin{aligned} Q(\mathcal {R},k)&:= \textstyle \sum _{\alpha \in \mathcal {R}} k(\alpha ) Q_\alpha \end{aligned}$$
(4)

Thus a model defines a CTMC over \(\varvec{\mathbf {\varvec{\mathbf {Grph}}_*}}\). As \(\mathcal {R}\) is finite, \(Q(\mathcal {R},k)\) is row-finite.

2.3 Composition and Modularity of Derivations

By the well-known Pushout Lemma, derivations can be composed horizontally (rule composition) and vertically (rule specialisation) in the sense that if the inner squares below are derivations, so are the outer ones:

figure h

Derivations can also be decomposed vertically. First, one has a forward decomposition (which follows immediately from pasting of POs in \(\varvec{\mathbf {\varvec{\mathbf {Grph}}_*}}\)):

Lemma 1 (Forward modularity)

Let \(\alpha \), \(\beta \), \(\gamma \) be rules and \(f_1\), \(f_2\), g, \(g_1\) matches such that diagrams (5) and (6) are derivations. Then there is a unique match \(g_2\) such that diagram (7) commutes (in \(\varvec{\mathbf {\varvec{\mathbf {Grph}}_*}}\)) and is a vertical composition of derivations.

figure i

A novel observation, which will play a central role in the next section, is that one also has a backward decomposition:

Lemma 2 (Backward modularity)

Let \(\alpha \), \(\beta \), \(\gamma \) be rules and f, \(f_1\), \(g_1\), \(g_2\) matches such that diagrams (8) and (9) are derivations. Then there is a unique match \(f_2\) such that diagram (10) commutes (in \(\varvec{\mathbf {\varvec{\mathbf {Grph}}_*}}\)) and is a vertical composition of derivations.

figure j

Forward and backward modularity look deceptively similar, but while Lemma 1 is a standard property of POs, Lemma 2 is decidedly non-standard. Remember that derivations are generally irreversible. It is therefore not at all obvious that one should be able to transport factorisations of comatches backwards along a rule, let alone in a unique fashion. Nor is it obvious that the top half of the resulting decomposition should be reversible. The crucial ingredient that makes backward modularity possible is that both matches and rules are monos. Because rules are (partial) monos, we can reverse \(\alpha \) and \(\beta \) in (8), and the resulting diagram still commutes (though it is no longer a derivation in general). The existence and uniqueness of \(f_2\) is then a direct consequence of the universal property of (9), seen as a PO. The fact that (9) is reversible relies on matches also being monos, but in a more subtle way. Intuitively, the graph T cannot contain any superfluous edges of the sort that render the derivation in Fig. 4 irreversible because, \(g_2\) being a mono, such edges would appear in H as subgraphs, contradicting the \(\alpha \)-derivability of \(g_2 {{\,\mathrm{\circ }\,}}g_1\). Together, the factorisation of f and the reversibility of (9) then induce the decomposition in (10) by Lemma 1. A full, axiomatic proof of Lemma 2 is given in App. B.3.

Among other things, Lemma 2 allows one to relate derivability of matches to reversibility of derivations:

Lemma 3

A match \(g :R \rightarrow H\) is derivable by a rule \(\alpha :L \rightharpoonup R\) if and only if the derivation \(g \Rightarrow _{\alpha ^\dag } f\) is reversible.

2.4 Gluings

Given \(G_1 \subseteq H\) and \(G_2 \subseteq H\), the union of \(G_1\) and \(G_2\) in H is the unique subgraph \(G_1 \cup G_2\) of H, such that \(V_{(G_1 \cup G_2)} = V_{G_1} \cup V_{G_2}\) and \(E_{(G_1 \cup G_2)} = E_{G_1} \cup E_{G_2}\). The intersection \((G_1 \cap G_2) \subseteq H\) is defined analogously. The subgraphs of H form a complete distributive lattice with \(\cup \) and \(\cap \) as the join and meet operations. One can glue arbitrary graphs as follows:

Definition 2

A gluing of graphs \(G_1\), \(G_2\) is a pair of matches \(i_1 :G_1 \rightarrow U\), \(i_2 :G_2 \rightarrow U\) with common codomain U; if in addition \(U = i_1(G_1) \cup i_2(G_2)\), one says the gluing is minimal.

Two gluings \(i_1 :G_1 \rightarrow U\), \(i_2 :G_2 \rightarrow U\) and \(j_1 :G_1 \rightarrow V\), \(j_2 :G_2 \rightarrow V\) are said to be isomorphic if there is an isomorphism \(u :U \rightarrow V\), such that \(j_1 = u {{\,\mathrm{\circ }\,}}i_1\) and \(j_2 = u {{\,\mathrm{\circ }\,}}i_2\). We write \(G_1 *_{\simeq } G_2\) for the set of isomorphism classes of minimal gluings (MG) of \(G_1\) and \(G_2\), and \(G_1 *_{} G_2\) for an arbitrary choice of representatives from each class in \(G_1 *_{\simeq } G_2\). Given a gluing \(\mu :G_1 \rightarrow H \leftarrow G_2\), denote by \(\hat{\mu }\) its “tip”, i.e. the common codomain \(\hat{\mu } = H\) of \(\mu \).

It is easy to see the following (see App. B for an axiomatic proof):

Lemma 4

Let \(G_1\), \(G_2\) be graphs, then \(G_1 *_{} G_2\) is finite, and for every gluing \(f_1 :G_1 \rightarrow H\), \(f_2 :G_2 \rightarrow H\), there is a unique MG \(i_1 :G_1 \rightarrow U\), \(i_2 :G_2 \rightarrow U\) in \(G_1 *_{} G_2\) and match \(u :U \rightarrow H\) such that \(f_1 = u {{\,\mathrm{\circ }\,}}i_1\) and \(f_2 = u {{\,\mathrm{\circ }\,}}i_2\).

See Fig. 2 in Sect. 1 for an example of a set of MGs.
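These minimal gluings can be enumerated mechanically: up to isomorphism, an MG is the pushout of the two graphs along a shared subgraph. The following Python sketch is a brute-force illustration under our own ad hoc encoding of directed multigraphs as `(nodes, edges)` pairs (it is not the paper's implementation), and returns one representative per isomorphism class.

```python
from itertools import chain, combinations, permutations

def powerset(xs):
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def subgraphs(g):
    """All subgraphs of g = (nodes, edges), with edges a dict id -> (src, tgt)."""
    nodes, edges = g
    for ns in powerset(nodes):
        ns = frozenset(ns)
        cand = [e for e in edges if edges[e][0] in ns and edges[e][1] in ns]
        for es in powerset(cand):
            yield (ns, {e: edges[e] for e in es})

def isomorphisms(g, h):
    """All isomorphisms (fv, fe) from g to h, found by brute force."""
    (gn, ge), (hn, he) = g, h
    if len(gn) != len(hn) or len(ge) != len(he):
        return
    gns, ges = sorted(gn), sorted(ge)
    for np_ in permutations(sorted(hn)):
        fv = dict(zip(gns, np_))
        for ep in permutations(sorted(he)):
            fe = dict(zip(ges, ep))
            if all(he[fe[e]] == (fv[ge[e][0]], fv[ge[e][1]]) for e in ges):
                yield fv, fe

def is_iso(g, h):
    return next(isomorphisms(g, h), None) is not None

def glue(g1, g2, fv, fe):
    """Pushout of g1 and g2 along the shared subgraph embedded by (fv, fe)."""
    (n1, e1), (n2, e2) = g1, g2
    back = {fv[n]: ('a', n) for n in fv}            # identified g2 node -> its g1 copy
    mv2 = {n: back.get(n, ('b', n)) for n in n2}
    nodes = frozenset(('a', n) for n in n1) | frozenset(mv2.values())
    edges = {('a', e): (('a', s), ('a', t)) for e, (s, t) in e1.items()}
    shared = set(fe.values())
    for e, (s, t) in e2.items():
        if e not in shared:                          # shared edges already present
            edges[('b', e)] = (mv2[s], mv2[t])
    return nodes, edges

def minimal_gluings(g1, g2):
    """One representative per isomorphism class of minimal gluings."""
    reps = []
    for k1 in subgraphs(g1):
        for k2 in subgraphs(g2):
            for fv, fe in isomorphisms(k1, k2):
                u = glue(g1, g2, fv, fe)
                if not any(is_iso(u, r) for r in reps):
                    reps.append(u)
    return reps

edge = (frozenset({'u', 'v'}), {'e': ('u', 'v')})
print(len(minimal_gluings(edge, edge)))  # 7
```

Gluing a single directed edge with itself yields seven classes, matching the overlaps one would enumerate by hand: the disjoint sum, fan-out, a length-2 path, fan-in, a parallel edge pair, a 2-cycle, and the edge itself.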

3 Graph-Based GREs

To derive the GRE for graphs (GREG) we follow the development in our previous work  [17, 18] with the important difference that we do not assume derivations to be reversible. The key technical innovation that allows us to avoid the assumption of reversibility is the backward modularity lemma (Lemma 2).

As sketched in Sect. 1.2, our GRE for graphs is defined in terms of graph observables, which we now define formally. Fix S to be the countable (up to iso) set of finite graphs, and let \(F \in S\) be a graph. The graph observable \(\left[ F\right] :S \rightarrow \mathbb {N}\) is the integer-valued function \(\left[ F\right] (G) := \left|\left[ {F}, {G}\right] \right|\) counting the number of occurrences (i.e. matches) of F in a given graph G. Graph observables are elements of the vector space \(\mathbb {R}^S\) of real-valued functions on S.
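On small graphs, the observable \(\left[ F\right] \) can be evaluated by brute force. The sketch below uses our own illustrative encoding of graphs as `(nodes, edges)` pairs (an assumption of this example, not the paper's implementation) and counts monomorphisms by enumerating injective node and edge maps; on multigraphs, note, matches choose edges, not merely endpoint pairs.

```python
from itertools import permutations

def count_matches(f, g):
    """[F](G): number of monomorphisms of the pattern f into the graph g."""
    (fn, fe), (gn, ge) = f, g
    fns, fes = sorted(fn), sorted(fe)
    total = 0
    for nsel in permutations(sorted(gn), len(fns)):      # injective node maps
        fv = dict(zip(fns, nsel))
        for esel in permutations(sorted(ge), len(fes)):  # injective edge maps
            fE = dict(zip(fes, esel))
            if all(ge[fE[e]] == (fv[fe[e][0]], fv[fe[e][1]]) for e in fes):
                total += 1
    return total

node = ({'u'}, {})
edge = ({'u', 'v'}, {'e': ('u', 'v')})
cycle = ({1, 2}, {'a': (1, 2), 'b': (2, 1)})
parallel = ({1, 2}, {'a': (1, 2), 'b': (1, 2)})

print(count_matches(node, cycle))     # 2: one match per node
print(count_matches(edge, cycle))     # 2
print(count_matches(edge, parallel))  # 2: one per parallel edge
```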

The stochastic rate matrix \(Q_\alpha \) for a rule \(\alpha :L \rightharpoonup R\) defined in (3) is a linear map on \(\mathbb {R}^S\). Its action on an observable \(\left[ F\right] \) is given by

$$\begin{aligned} (Q_\alpha \left[ F\right] )(G)&:= \textstyle \sum _H q^\alpha _{G H}(\left[ F\right] (H) - \left[ F\right] (G))&\text {for } G, H \in S. \end{aligned}$$
(11)

Since the sum above is finite, \(Q_\alpha \left[ F\right] \) is indeed a well-defined element of \(\mathbb {R}^S\). We call \(Q_\alpha \left[ F\right] \) the jump of \(\left[ F\right] \) relative to \(Q_\alpha \). Intuitively, \((Q_\alpha \left[ F\right] )(G)\) is the expected rate of change in \(\left[ F\right] \) given that the CTMC sits at G.

To obtain the GREG as sketched in Sect. 1, we want to express the jump as a finite linear combination of graph observables. We start by substituting the definition of \(Q_\alpha \) in (11).

$$\begin{aligned} (Q_\alpha \left[ F\right] ) (G) \;&= \; \textstyle \sum _{H} q^\alpha _{G H} (\left[ F\right] (H) - \left[ F\right] (G)) \\ \nonumber&= \; \textstyle \sum _{H} \textstyle \sum _{ f \in \left[ {L}, {G}\right] \text { s.t. } \alpha (f) \in \left[ {R}, {H}\right] } \, (\left|\left[ {F}, {H}\right] \right| - \left|\left[ {F}, {G}\right] \right|) \\ \nonumber&= \; \textstyle \sum _{f \in \left[ {L}, {G}\right] } \left( \left|\left[ {F}, {{{\,\mathrm{cod}\,}}(\alpha (f))}\right] \right| - \left|\left[ {F}, {G}\right] \right| \right) . \end{aligned}$$

where the simplification in the last step is justified by the fact that f and \(\alpha \) uniquely determine \(\alpha (f)\). The last line suggests a decomposition of \(Q_\alpha \left[ F\right] \) as \(Q_\alpha \left[ F\right] = Q^+_\alpha \left[ F\right] - Q^-_\alpha \left[ F\right] \), where \(Q^+_\alpha \) produces new instances of F while \(Q^-_\alpha \) consumes existing ones.

By Lemma 4, we can factor the action of the consumption term \(Q^-_\alpha \) through the MGs \(L *_{} F\) of L and F to obtain

$$ (Q^-_\alpha \left[ F\right] )(G) \; = \; \textstyle \sum _{f \in \left[ {L}, {G}\right] } \left|\left[ {F}, {G}\right] \right| \; = \; \left|\left[ {L}, {G}\right] \right| \cdot \left|\left[ {F}, {G}\right] \right| \; = \; \sum _{\mu \in L *_{} F} \left|\left[ {\hat{\mu }}, {G}\right] \right|. $$

The resulting sum is a linear combination of a finite number of graph observables, which is exactly what we are looking for.
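The factorisation behind this identity can be checked numerically on small examples. The following self-contained Python sketch (a brute-force illustration under our own encoding of graphs, not the paper's implementation) enumerates minimal gluings and match counts, and verifies that \(\sum _{\mu \in L *_{} F} \left|\left[ {\hat{\mu }}, {G}\right] \right| = \left|\left[ {L}, {G}\right] \right| \cdot \left|\left[ {F}, {G}\right] \right|\) with L = F a single edge and G a 2-cycle.

```python
from itertools import chain, combinations, permutations

def powerset(xs):
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def subgraphs(g):
    nodes, edges = g
    for ns in powerset(nodes):
        ns = frozenset(ns)
        cand = [e for e in edges if edges[e][0] in ns and edges[e][1] in ns]
        for es in powerset(cand):
            yield (ns, {e: edges[e] for e in es})

def isomorphisms(g, h):
    (gn, ge), (hn, he) = g, h
    if len(gn) != len(hn) or len(ge) != len(he):
        return
    gns, ges = sorted(gn), sorted(ge)
    for np_ in permutations(sorted(hn)):
        fv = dict(zip(gns, np_))
        for ep in permutations(sorted(he)):
            fe = dict(zip(ges, ep))
            if all(he[fe[e]] == (fv[ge[e][0]], fv[ge[e][1]]) for e in ges):
                yield fv, fe

def glue(g1, g2, fv, fe):
    (n1, e1), (n2, e2) = g1, g2
    back = {fv[n]: ('a', n) for n in fv}
    mv2 = {n: back.get(n, ('b', n)) for n in n2}
    nodes = frozenset(('a', n) for n in n1) | frozenset(mv2.values())
    edges = {('a', e): (('a', s), ('a', t)) for e, (s, t) in e1.items()}
    shared = set(fe.values())
    for e, (s, t) in e2.items():
        if e not in shared:
            edges[('b', e)] = (mv2[s], mv2[t])
    return nodes, edges

def minimal_gluings(g1, g2):
    reps = []
    for k1 in subgraphs(g1):
        for k2 in subgraphs(g2):
            for fv, fe in isomorphisms(k1, k2):
                u = glue(g1, g2, fv, fe)
                if not any(next(isomorphisms(u, r), None) is not None for r in reps):
                    reps.append(u)
    return reps

def count_matches(f, g):
    (fn, fe), (gn, ge) = f, g
    fns, fes = sorted(fn), sorted(fe)
    total = 0
    for nsel in permutations(sorted(gn), len(fns)):
        fv = dict(zip(fns, nsel))
        for esel in permutations(sorted(ge), len(fes)):
            fE = dict(zip(fes, esel))
            if all(ge[fE[e]] == (fv[fe[e][0]], fv[fe[e][1]]) for e in fes):
                total += 1
    return total

L = F = (frozenset({'u', 'v'}), {'e': ('u', 'v')})   # a single directed edge
G = (frozenset({1, 2}), {'x': (1, 2), 'y': (2, 1)})  # a 2-cycle
lhs = sum(count_matches(mu, G) for mu in minimal_gluings(L, F))
rhs = count_matches(L, G) * count_matches(F, G)
print(lhs, rhs)  # 4 4
```

Of the seven MGs of an edge with itself, only the 2-cycle and the single-edge gluings embed in G, contributing 2 matches each; the product on the right is likewise 2 · 2 = 4.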

Simplifying the production term requires a bit more work. Applying the same factorisation Lemma 4, we arrive at

$$\begin{aligned} (Q^+_\alpha \left[ F\right] )(G) \;&= \; \textstyle \sum _{f \in \left[ {L}, {G}\right] } \left|\left[ {F}, {\hat{\alpha }(f)}\right] \right| \\&= \; \textstyle \sum _{f \in \left[ {L}, {G}\right] } \; \textstyle \sum _{(\mu _1, \mu _2) \in R *_{} F} \left|\left\{ g \in \left[ {\hat{\mu }}, {\hat{\alpha }(f)}\right] \mid g {{\,\mathrm{\circ }\,}}\mu _1 = \alpha (f) \right\} \right|, \end{aligned}$$

where \(\hat{\alpha }(f) = {{\,\mathrm{cod}\,}}(\alpha (f))\) denotes the codomain of the comatch of f. To simplify this expression further, we use the properties of derivations introduced in Sect. 2.3. First, we observe that \(\mu _1\) must be derivable by \(\alpha \) for the set of g’s in the above expression to be nonempty.

Lemma 5

Let \(\alpha :L \rightharpoonup R\) be a rule and \(f :L \rightarrow G\), \(g :R \rightarrow H\), \(g_1 :R \rightarrow T\) matches such that \(f \Rightarrow _{\alpha } g\), but \(g_1\) is not derivable by \(\alpha \). Then there is no match \(g_2 :T \rightarrow H\) such that \(g_2 {{\,\mathrm{\circ }\,}}g_1 = g\).

Proof

By the contrapositive of backward modularity. Any such \(g_2\) would induce, by Lemma 2, a match \(f_1:L \rightarrow S\) and a derivation \(f_1 \Rightarrow _{\alpha } g_1\).    \(\square \)

We may therefore restrict the set \(R *_{} F\) of right-hand MGs under consideration to the subset \(\alpha *_{R} F := \left\{ (\mu _1, \mu _2) \in R *_{} F \mid \exists h. \, h \Rightarrow _{\alpha } \mu _1 \right\} \) of MGs with a first projection derivable by \(\alpha \). Next, we observe that the modularity Lemmas 1 and 2 establish a one-to-one correspondence between the set of factorisations of the comatches \(\alpha (f)\) (through the MGs in \(\alpha *_{R} F\)) and a set of factorisations of the corresponding matches f.

Lemma 6 (correspondence of matches)

Let \(\alpha \), \(\beta \), \(\gamma \), f, \(f_1\), g, \(g_1\) such that diagrams (12) and (13) are derivations and \(g_1\) is derivable by \(\alpha \). Then the set \(M_L = \left\{ f_2 \in \left[ {S}, {G}\right] \mid f_2 {{\,\mathrm{\circ }\,}}f_1 = f \right\} \) is in one-to-one correspondence with the set \(M_R = \{g_2 \in \left[ {T}, {H}\right] \mid g_2 {{\,\mathrm{\circ }\,}}g_1 = g\}\).

figure k

Proof

Since \(g_1\) is \(\alpha \)-derivable, the diagram (13) is reversible, that is, \(f_1 \Rightarrow _{\alpha } g_1\), with corule \(\gamma \) (by Lemma 3). Hence, if we are given a match \(f_2\) in \(M_L\), we can forward-decompose (12) vertically along the factorisation \(f_2 {{\,\mathrm{\circ }\,}}f_1 = f\), resulting in the diagram below (by forward modularity, Lemma 1). Furthermore, the comatch \(g_2\) is unique with respect to this decomposition, thus defining a function \(\phi :M_L \rightarrow M_R\) that maps any \(f_2\) in \(M_L\) to the corresponding comatch \(\phi (f_2) = g_2\) in \(M_R\). We want to show that \(\phi \) is a bijection. By backward modularity (Lemma 2), there is a match \(f_2 \in M_L\) for any match \(g_2 \in M_R\) such that \(\phi (f_2) = g_2\) (surjectivity), and furthermore, \(f_2\) is the unique match for which \(\phi (f_2) = g_2\) (injectivity).    \(\square \)

figure l

Using Lemmas 5 and 6, we can simplify \(Q^+_\alpha \) as follows:

$$\begin{aligned} (Q^+_\alpha \left[ F\right] )(G) \;&= \sum _{f \in \left[ {L}, {G}\right] } \; \sum _{\mu \in \alpha *_{R} F} \left|\left\{ g_2 \in \, \left[ {\hat{\mu }}, {\hat{\alpha }(f)}\right] \mid g_2 {{\,\mathrm{\circ }\,}}\mu _1 = \alpha (f) \right\} \right| \\&= \sum _{\mu \in \alpha *_{R} F} \, \sum _{f \in \left[ {L}, {G}\right] } \left|\left\{ f_2 \in \left[ {\hat{\alpha }^\dag (\mu _1)}, {G}\right] \mid f_2 {{\,\mathrm{\circ }\,}}\alpha ^\dag (\mu _1) = f \right\} \right| \\&= \sum _{\mu \in \alpha *_{R} F} \left|\left[ {\hat{\alpha }^\dag (\mu _1)}, {G}\right] \right| . \end{aligned}$$

If we set \(\alpha *_{L} F := L *_{} F\) to symmetrise notation, we obtain

$$\begin{aligned} Q_\alpha (\left[ F\right] )&= \textstyle \sum _{\mu \in \alpha *_{R} F} \left[ \hat{\alpha }^\dag (\mu _1)\right] - \textstyle \sum _{\mu \in \alpha *_{L} F} \left[ \hat{\mu }\right] . \end{aligned}$$
(14)

Now, in general for a CTMC on a countable state space S, the time-dependent probability distribution p on S generated by the Markov chain follows the master equation [1, 40]: \(\tfrac{d}{dt}p^T = p^T Q\). Given an abstract observable f in \(\mathbb {R}^S\), and writing \({{\,\mathrm{\mathbb {E}}\,}}_p(f) := p^T f\) for the expected value of f according to p, we can then derive the formal Kolmogorov equation for f:

$$\begin{aligned} \tfrac{d}{dt}{{\,\mathrm{\mathbb {E}}\,}}_p(f) \; = \; \tfrac{d}{dt}p^T f \; = \; p^T Q f \; = \; {{\,\mathrm{\mathbb {E}}\,}}_p (Q f), \end{aligned}$$

giving us an equation for the rate of change of the mean of f(X(t)). Following this general recipe, the GRE for graphs follows immediately from (14):
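This recipe can be checked numerically on any finite CTMC, where everything is explicit. A minimal sketch for a two-state chain with hypothetical rates a, b and an arbitrary observable f, comparing a central-difference derivative of \({{\,\mathrm{\mathbb {E}}\,}}_p(f)\) against \({{\,\mathrm{\mathbb {E}}\,}}_p(Qf) = p^T Q f\):

```python
import math

# two-state CTMC with hypothetical jump rates a (0 -> 1) and b (1 -> 0);
# Q is its generator (rate matrix), f an arbitrary observable
a, b = 2.0, 3.0
Q = [[-a, a], [b, -b]]
f = [1.0, 5.0]
p0 = [1.0, 0.0]          # start in state 0

def p(t):
    # exact solution of the master equation dp^T/dt = p^T Q:
    # exponential relaxation to the stationary distribution pi
    pi = [b / (a + b), a / (a + b)]
    decay = math.exp(-(a + b) * t)
    return [pi[i] + (p0[i] - pi[i]) * decay for i in range(2)]

def expect(dist, obs):
    return sum(dist[i] * obs[i] for i in range(2))

t, h = 0.7, 1e-6
# left-hand side: central-difference derivative of E_p(f)
lhs = (expect(p(t + h), f) - expect(p(t - h), f)) / (2 * h)
# right-hand side: E_p(Q f) = p(t)^T Q f
Qf = [sum(Q[i][j] * f[j] for j in range(2)) for i in range(2)]
rhs = expect(p(t), Qf)
assert abs(lhs - rhs) < 1e-5
```

For graph rewriting systems the state space is countably infinite, so the same equation cannot be evaluated state by state; (15) instead expresses the right-hand side in terms of finitely many graph observables.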

$$\begin{aligned} \frac{d}{dt}{{\,\mathrm{\mathbb {E}}\,}}_p (\left[ F\right] ) =&- \sum _{\alpha \in \mathcal {R}} k(\alpha ) \!\!\! \sum _{\mu \in \alpha *_{L} F} \!\!\! {{\,\mathrm{\mathbb {E}}\,}}_p \left[ \hat{\mu }\right] + \sum _{\alpha \in \mathcal {R}} k(\alpha ) \!\!\! \sum _{\mu \in \alpha *_{R} F} \!\!\! {{\,\mathrm{\mathbb {E}}\,}}_p \left[ \hat{\alpha }^\dag (\mu _1)\right] . \end{aligned}$$
(15)

Remember that \(\mu _1\) denotes the left injection of the MG \(\mu = (\mu _1, \mu _2)\) while \(\hat{\mu }\) denotes its codomain, and that \(\hat{\alpha }^\dag (f) = {{\,\mathrm{cod}\,}}(\alpha ^\dag (f))\).

Unsurprisingly, the derivation of (15) was more technically challenging than that of the GRE for reversible graph rewrite systems (cf.  [18, Theorem 2]). Yet the resulting GREs look almost identical (cf.  [18, Eq. (7)]). The crucial difference is in the production term \(Q^+_\alpha \), where we no longer sum over the full set of right-hand MGs \(R *_{} F\) but only over the subset \(\alpha *_{R} F\) of MGs that are \(\alpha \)-derivable. This extra condition is the price we pay for dealing with irreversibility: irreversible rules can consume all MGs, but only produce some.

Note that the number of terms in (15) depends on the size of the relevant sets of left and right-hand MGs, which is worst-case exponential in the size of the graphs involved, due to the combinatorial nature of MGs. (See Fig. 2 in Sect. 1 for an example.) In practice, one often finds many pairs of irrelevant MGs, the terms of which cancel out exactly. This reduces the effective size of the equations but not the overall complexity of generating the GREG.

Finally, as said in Sect. 1.2, the repeated application of (15) will lead to an infinite expansion in general. In practice, the system of ODEs needs to be truncated. For certain models, one can identify invariants in the underlying rewrite system via static analysis, which result in a finite closure even though the set of reachable components is demonstrably infinite  [19]. We have seen an example in Sect. 1.
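As a generic illustration of such a truncation (a naive mean-field closure, not the invariant-based closure of  [19]), take the one-species rule \(X + X \rightarrow X\) with propensity \(k\,n(n-1)\). The exact first-moment equation \(\tfrac{d}{dt}{{\,\mathrm{\mathbb {E}}\,}}[n] = -k\,{{\,\mathrm{\mathbb {E}}\,}}[n(n-1)]\) involves the second moment; the closure \({{\,\mathrm{\mathbb {E}}\,}}[n(n-1)] \approx {{\,\mathrm{\mathbb {E}}\,}}[n]^2\) truncates the hierarchy to a single ODE with closed-form solution \(x(t) = x_0/(1 + k x_0 t)\), which the sketch below recovers numerically (rate, initial value and step size are all hypothetical):

```python
# Mean-field closure for X + X -> X with propensity k * n * (n - 1):
# the closed first-moment equation is dx/dt = -k * x**2.
k, x0, dt, steps = 0.1, 100.0, 1e-4, 50_000

x = x0
for _ in range(steps):           # forward Euler on the closed equation
    x += dt * (-k * x * x)

t = dt * steps
exact = x0 / (1 + k * x0 * t)    # closed-form solution of the truncated ODE
assert abs(x - exact) / exact < 1e-2
```

Pair-approximation-style closures, mentioned in the conclusion, refine this idea by retaining some higher-order observables before truncating.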

4 Conclusion

We have developed a computer-supported method for mean field approximations (MFA) of stochastic systems with graph-like states described by rules of SPO rewriting. The underlying theory unifies a large and seemingly unstructured collection of MFA approaches which share a graphical “air de famille”. Based on the categorical framework of graph transformation systems (GTS), we have developed MFA-specific techniques, in particular concerning the combinatorics of minimal gluings. The main technical hurdle was to show that the set of subgraph observables is closed under the action of the rate matrix (a.k.a. the infinitesimal generator) of the continuous-time Markov chain generated by an irreversible GTS. The proof is constructive and yields an explicit term for the derivative of the mean of any observable of interest.

Mean field approximation and moment-closure methods are of wide use in applications, as typical probabilistic systems tend to have state spaces which defy more direct approaches. To reach their full potential, MFAs need to be combined with reachability and invariant analysis (as illustrated in Sect. 1).

We have worked out the construction at the general axiomatic level of SPO rewriting, with matches and rules restricted to monomorphisms. One interesting extension is to include nested application conditions (NACs)  [29, 41], where the application of a rule can be modulated locally by the context of the match. NACs are useful in practice, and bring the expressive power of first-order logic to the description of transformation rules. We plan to investigate the extension of our approach to NACs and, in particular, whether it is possible to incorporate them axiomatically, and what additional complexity cost they might incur.

Another direction for future work is to improve on the method of truncation. In the literature, graphical MFAs are often used in combination with conditional independence assumptions to control the size of connected observables, as in the so-called pair approximation  [20, 27]. Since these methods are known to improve on the accuracy of naive truncation, we wish to understand if and how they can be brought inside our formal approach.