1 Introduction

We consider the problem of counting the number of times a pattern graph H appears in a host graph G as an induced subgraph. Without any restrictions on G, this problem is already difficult for very simple H: Flum and Grohe [10] showed that it is \(\#\textsf{W}[1]\)-complete when H is a clique and Chen, Thurley, and Weyer showed that it is \(\#\textsf{W}[1]\)-complete when it is a path [2] (a related result by Chen and Flum shows that counting maximal paths is \(\#\textsf{W}[1]\)-hard). That is, there is little hope for algorithms with running time \(f(|H|) \cdot \textsf{poly}(|G|)\) for these problems unless, e.g., counting satisfying assignments of a 3-CNF formula is possible in time \(2^{o(n)}\) (further details on parameterized counting classes can be found in Flum and Grohe’s book [11]).

The situation is less glum when we restrict ourselves to sparse host graphs. For example, Eppstein, Löffler, and Strash showed that enumerating all cliques in a d-degenerate host graph G is possible in time \(O(d \cdot 3^{d/3} |G|)\) [8]. More generally, we can count any pattern graph H on h vertices in time \(O(f(h) \cdot |G|)\) provided that G is taken from a graph class of bounded expansion (where f depends on the class) and time \(O(f(h) \cdot |G|^{1+o(1)})\) if it is taken from a nowhere dense graph class. These two classes generalize well-known notions like classes excluding a (topological) minor or classes of bounded degree and are therefore attractive targets for algorithmic research.

Intuitively, bounded expansion classes have the property that they have bounded average degree and that this property is maintained when taking a minor of bounded depth (also called a shallow minor), meaning a minor whose branch sets have bounded radius (this bound is the ‘depth’). The average degree of the graphs that can be obtained by such an operation is then only a function of the original graph class and the depth of the minor. Nowhere dense classes are similar, but instead of the average degree we require that the clique number remains constant under this operation. Classes with bounded expansion are degenerate, but there are degenerate classes that do not have bounded expansion (the simplest example is the class of 1-subdivisions of complete graphs, which are 2-degenerate but neither have bounded expansion nor are nowhere dense). Nowhere dense classes generalise bounded expansion classes and allow a (slightly) superlinear number of edges; as such, these classes are incomparable with degenerate classes.

In summary, bounded expansion is a very general notion of sparseness that allows us to design linear-time algorithms for problems that, in all likelihood, do not admit linear-time algorithms on general graphs. Moreover, we have evidence based on random graph models [3, 9, 22] as well as measurements [4, 17] that at least some types of networks can be described as “having bounded expansion”. We therefore believe that algorithms based on the bounded expansion toolkit can be useful to analyse real-world data, though making these algorithms practical is not without challenges.

Case in point, the two existing types of approaches to count substructures in bounded expansion classes have not shown much promise in practice. One class of algorithms is based on so-called p-treedepth colorings: given a class \({\mathcal {G}}\) of bounded expansion we can color any \(G \in {\mathcal {G}}\) in time \(f(p) \cdot |G|\) with f(p) colors so that any subgraph of G with \(i < p\) colors has treedepth \(\le i\). Computing an h-treedepth coloring thus effectively reduces the problem to counting H in a graph \(G'\) of treedepth \(t \le |H|\). Ossona de Mendez and Nešetřil, who also introduced the notion of bounded expansion [18] and nowhere dense classes [19], presented an algorithm for this latter step with a running time of \(O(2^{ht}ht \cdot |G'|)\) [20]; with Demaine, Rossmanith, Sánchez Villaamil, and Sikdar we later improved this to \(O((6t)^h h^2 \cdot |G'|)\) [3]. Using this subroutine, we can count occurrences of H in G by first computing an h-treedepth coloring with \(f'(h)\) colors, then iterating through all \(\sum _{i=1}^h \binom{f'(h)}{i}\) color combinations and counting, in the aforementioned time, the number of times H appears in the subgraph \(G'\) induced by these colors. The final count is then computed via inclusion-exclusion over the counts obtained for the color sets.

While conceptually simple, it turns out that these algorithms are currently impractical: a) computing h-treedepth colorings is computationally quite expensive and b) the number of colors \(f'(h)\) is so big that already the act of enumerating all relevant color subsets takes too long [21]. It turns out that the underlying technique for these algorithms, so-called transitive-fraternal augmentations [20] (tf-augmentations) with some practical improvements [21, 22], also lies at the heart of the other available technique. Kazana and Segoufin used tf-augmentations to enumerate first-order queries with constant delay (or to count such queries in linear time) in classes with bounded expansion [14] and Dvořák and Tůma designed a dynamic data structure to count subgraphs with amortized polylogarithmic updates [7]. The latter approach also has the drawback that in order to count induced subgraphs, one must perform a big inclusion-exclusion over all supergraphs of the pattern.

Despite our best efforts to make tf-augmentations practical, so far they seem to be only useful in very tame settings like bounded-degree graphs [1]. It is thus natural to ask whether we can solve the subgraph-counting problem without relying on p-treedepth colorings or even tf-augmentations. In particular, the computation of so-called generalized coloring numbers (a set of graph measures introduced by Kierstead and Yang [15] which provide an alternative characterisation of bounded expansion/nowhere dense classes [24]) appears much more feasible in practice [17] and offers an attractive ordering-based alternative.

Our contribution here is to provide an algorithm to count induced subgraphs which is solely based on the weak coloring number (or the coloring number). At a high level, we do this by using a suitable linear order of the host graph and counting how often each of the possible pattern graph orders appears in it. The crucial insight here is that under some orderings, the pattern graph can only appear inside certain neighborhood-subsets and that all other orderings can be reduced to these easily countable cases via inclusion-exclusion style arguments. Note that in contrast to Dvořák and Tůma’s approach, the objects in our inclusion–exclusion are specific ordered graphs and we can therefore avoid counting all supergraphs of the pattern.

In order to establish the practicality of our approach, we implemented a prototype of the entire algorithmic pipeline described in this paper using a combination of Rust and Python. The code is available under a BSD 3-clause license at http://www.github.com/theoryinpractice/mandoline.

We begin in Sect. 2 by providing necessary definitions and notation related to ordered graphs, reachability and bounded expansion. We then describe our approach to decomposing the pattern graph and combining counts of partial matches in Sect. 3. We combine these subroutines with a new data structure in Sect. 4 to form the basis of our linear-fpt algorithm. Finally, in Sect. 5, we briefly discuss our experimental results and future work.

2 Preliminaries

We begin with a high-level description of how our algorithm works, in order to motivate the technical concepts and notation defined in this section. To that end, let us first review how complete subgraphs can be counted in linear time in a d-degenerate graph G: We first compute a degeneracy ordering of G, meaning that every vertex v has at most d left neighbors \(N^-(v)\), i.e. neighbors of v that appear before v in the ordering. Assume some vertex set \(X \subseteq V(G)\) induces a complete subgraph \(K_t\) in G and let \(v \in X\) be the last vertex of X in the ordering; then \(X \subseteq N^- (v) \cup \{ v \}\). Accordingly, we can count all complete subgraphs in G by exhaustively searching every left-neighborhood \(N^-(v) \cup \{v\}\) for every vertex \(v \in G\) and tallying up the results.
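To make this concrete, here is a minimal Python sketch of the clique-counting routine just described (our own illustration with hypothetical names, not taken from the implementation referenced in Sect. 1); it computes a degeneracy ordering greedily and then enumerates clique subsets of each left-neighborhood.

```python
from itertools import combinations

def degeneracy_order(adj):
    """Greedy ordering: repeatedly remove a minimum-degree vertex; the reversed
    removal order guarantees that every vertex has few neighbors to its left."""
    remaining = {v: set(ns) for v, ns in adj.items()}
    removal = []
    while remaining:
        v = min(remaining, key=lambda u: len(remaining[u]))
        removal.append(v)
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
    return removal[::-1]

def count_cliques(adj):
    """Count all complete subgraphs (including single vertices and edges)."""
    order = degeneracy_order(adj)
    pos = {v: i for i, v in enumerate(order)}
    total = 0
    for v in order:
        left = [u for u in adj[v] if pos[u] < pos[v]]          # N^-(v)
        for k in range(len(left) + 1):
            for subset in combinations(left, k):
                # v together with subset is a clique iff subset itself is a clique
                if all(b in adj[a] for a, b in combinations(subset, 2)):
                    total += 1
    return total

if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant vertex 3: 4 vertices + 4 edges + 1 triangle = 9
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(count_cliques(graph))  # 9
```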

In order to extend this algorithm, we need to generalise it in two directions. First, instead of using degeneracy orderings and left neighborhoods we will use weak r-coloring orderings and the related weakly r-reachable sets. Similar to the clique example, we find that for some pattern graphs H it is enough to search the weakly r-reachable set of each vertex to locate H. For other patterns, however, we can only locate pieces of the pattern and we need to combine these pieces in order to count appearances of H.

More specifically, our algorithm enumerates all possible orderings of the pattern H and counts how often the pattern appears in this order inside the ordered host graph. For most ordered patterns, the algorithm needs to decompose the pattern into pieces, a process that depends on both the ordering of the pattern as well as certain connectivity properties of the pattern itself. This is formalized by the notion of tree ordered graphs and relaxations below. The tree order informs how an ordered pattern needs to be decomposed into piece-sums, that is, a collection of pieces whose respective counts in the host graph can be combined to compute the number of times the ordered pattern appears.

Finally, the above process needs to be corrected as it will also count certain subgraphs which are not isomorphic to the pattern. We call these graphs defects: in short, these graphs contain all the necessary pieces that we identified in the decomposition of the pattern, but in such a way that pieces either share vertices or are connected by edges and hence do not form the sought pattern. In order to describe defects precisely, we introduce the notion of embeddings of tree-ordered graphs below. Our algorithm counts defects by the same procedure as it counts the patterns, i.e. it decomposes them into pieces and counts those individually; this in turn generates further defects to be counted, and so on. This recursion terminates since at every step the defects become denser and “more linear”.

2.1 General Notation

We frequently use the notation \([n] := \{1,\ldots ,n\}\) for natural numbers n. The symbol \(\cdot \) is reserved for variables whose value is unimportant in this context; we write e.g. \(f(\cdot )\) to emphasise that f has a single argument. In the proof of Lemma 6 we make heavy use of Iverson brackets: for any predicate S the expression \( \llbracket S \rrbracket \) evaluates to 1 if S is true and 0 otherwise.

2.2 Trees

All trees in this paper will be assumed to be rooted. In particular, a subtree is always a rooted subtree. For a tree T, we write \({\text {root}}(T)\) to denote its root and \({\text {leaves}}(T)\) to denote its leaves. The root path \({\text {rpath}}_T(x)\) for a node \(x \in T\) is the unique path from \({\text {root}}(T)\) to x in T.

The ancestor relationship \({{\,\mathrm{\preccurlyeq _{\textsf{anc}}}\,}}^{T}\) of a tree T is the partial order defined via

$$\begin{aligned} x {{\,\mathrm{\preccurlyeq _{\textsf{anc}}}\,}}^{T}y \iff x \in {\text {rpath}}_T(y). \end{aligned}$$

2.3 Partial and Total Orders

We will use the symbol \(\preccurlyeq \) to denote partial orders and the symbol \(\prec \) to denote the relation \((x \preccurlyeq y) \wedge (x \ne y)\). Given a partial order \(\preccurlyeq \) over S, its digraph representation is a dag with vertices S and arcs \(\{ xy \in S \times S \mid x \prec y\}\).

The principal digraph of a partial order \(\preccurlyeq \) over S is the dag with vertices S and the arcs

$$\begin{aligned} \{ xy \in S \times S \mid x \prec y ~\text {and there is no }z \in S\text { with}~x \prec z \prec y\} \end{aligned}$$

Note that if \(\vec {D}\) is the principal digraph of \(\preccurlyeq \) over S, then the transitive closure of \(\vec {D}\) is the digraph representation of \(\preccurlyeq \) over S.

A partial order \(\preccurlyeq \) over S is a tree if it has a unique minimum and if for every element \(x \in S\), the set \(\{ y \mid y \preccurlyeq x \}\) is well-ordered by \(\preccurlyeq \). Alternatively, \(\preccurlyeq \) is a tree if its principal digraph is a directed tree, i.e. all arcs are oriented away from the root node.

A linear extension of \(\preccurlyeq \) is a total order \(\le \) such that \(x \preccurlyeq y\) implies \(x \le y\). The linear extensions of \(\preccurlyeq \) are precisely the topological orderings of either its digraph representation or its principal digraph.
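As a small, self-contained illustration (our own code; the partial order is passed as a comparison function), the following sketch computes the digraph representation, the principal digraph, and one linear extension by topologically sorting the digraph representation.

```python
def digraph_representation(elements, leq):
    """Arcs xy with x strictly below y."""
    return {(x, y) for x in elements for y in elements
            if x != y and leq(x, y)}

def principal_digraph(elements, leq):
    """Keep only arcs xy with no z strictly between x and y."""
    arcs = digraph_representation(elements, leq)
    return {(x, y) for (x, y) in arcs
            if not any((x, z) in arcs and (z, y) in arcs for z in elements)}

def linear_extension(elements, arcs):
    """Topological ordering of the given dag (Kahn's algorithm)."""
    indeg = {v: 0 for v in elements}
    for (_, y) in arcs:
        indeg[y] += 1
    order, ready = [], [v for v in elements if indeg[v] == 0]
    while ready:
        v = ready.pop()
        order.append(v)
        for (x, y) in arcs:
            if x == v:
                indeg[y] -= 1
                if indeg[y] == 0:
                    ready.append(y)
    return order

if __name__ == "__main__":
    # Divisibility order on {1, 2, 3, 6}: 1 below everything, 2 and 3 below 6.
    elems = [1, 2, 3, 6]
    leq = lambda x, y: y % x == 0
    print(sorted(principal_digraph(elems, leq)))  # [(1, 2), (1, 3), (2, 6), (3, 6)]
    # one valid linear extension, e.g. [1, 2, 3, 6]
    print(linear_extension(elems, digraph_representation(elems, leq)))
```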

2.4 Ordered Graphs

A tree ordered graph (tog) \(\textbf{G}= (G, \preccurlyeq )\) is a graph whose vertex set \(V(\textbf{G}) := V(G)\) is imbued with a (partial) order relation \(\preccurlyeq \) with the following properties:

  1. 1.

    The relation \(\preccurlyeq \) is a tree order.

  2. 2.

    The relation E(G) is guarded by \(\preccurlyeq \): for every edge \(uv \in E(G)\) it holds that either \(u \preccurlyeq v\) or \(v \preccurlyeq u\).

We define \(T(\textbf{G})\) to be the tree-representation of \(\preccurlyeq \) with node set \(V(\textbf{G})\). We extend the notions and notation of roots, leaves, and root-paths to togs via \({\text {root}}(\textbf{G}) := {\text {root}}(T(\textbf{G}))\), \({\text {leaves}}(\textbf{G}) := {\text {leaves}}(T(\textbf{G}))\), and \({\text {rpath}}_\textbf{G}(x) := {\text {rpath}}_{T(\textbf{G})}(x)\). Given a tog \(\textbf{G}\) we write \(\preccurlyeq _\textbf{G}\) to denote its tree-order relation and we will use the notation \(u \prec _\textbf{G}v\) to mean that \(u \preccurlyeq _\textbf{G}v\) and \(u \ne v\). An ordered vertex set \({\bar{x}} := x_1,\ldots ,x_\ell \) of a tog \(\textbf{G}\) is a sequence of vertices which satisfies \(x_1 \prec _\textbf{G}x_2 \prec _\textbf{G}\ldots \prec _\textbf{G}x_\ell \). The length of an ordered vertex set \({\bar{x}}\) is the number of elements in it. We use the symbol \(\emptyset \) to denote both the empty set and the empty ordered vertex set and make sure it is clear from the context which is meant.

If \(\preccurlyeq _\textbf{G}\) is a total order we call \(\textbf{G}\) a linear graph. We will use the symbol \(\mathbb {G}\) (instead of \(\textbf{G}\)) and \(\le _{\mathbb {G}}\) (instead of \(\preccurlyeq _{\textbf{G}}\)) in cases where we want to emphasize that the ordering is linear. For a given graph G we write \(\Pi (G)\) for the set of all linear graphs obtained from G by permuting its vertex set.

A tog isomorphism \(\textbf{H}\simeq \textbf{G}\) is a bijection between the vertex sets of \(\textbf{H}\) and \(\textbf{G}\) that preserves both the edge and the ordering relations. Given a vertex set \(X \subseteq V(\textbf{G})\) with a unique minimum under \(\preccurlyeq _\textbf{G}\), the tog induced by X, denoted by \(\textbf{G}[X]\), is the tog \((G[X], \preccurlyeq _\textbf{G}\!|_X)\). Note that we need X to have a unique minimum in order to ensure that \(\preccurlyeq _\textbf{G}\!|_X\) is still a tree order. Since \(\preccurlyeq _\textbf{G}\) guards the edge relation E(G) this is already true if G[X] is connected. In general, a tog \(\textbf{H}\) is an induced subtog of a tog \(\textbf{G}\) if there exists a vertex set X such that \(\textbf{H}\simeq \textbf{G}[X]\) and we write \(\textbf{H}\subseteq \textbf{G}\).

The stem of a tog \(\textbf{G}\) is the ordered set \({\bar{x}}\) of maximal length such that \({\bar{x}}\) is linearly ordered under \(\preccurlyeq _\textbf{G}\) and \(\max {\bar{x}} \preccurlyeq _\textbf{G}u\) for all vertices \(u \in V(\textbf{G}) - {\bar{x}}\). If we visualize \(\preccurlyeq _\textbf{G}\) as a tree then the stem is the path from the root to the first node with more than one child.

We say that a tog \(\textbf{H}\) embeds into a tog \(\textbf{G}\) if there exists an induced subgraph isomorphism \(\phi \) from H to G that further satisfies

$$\begin{aligned} u \preccurlyeq _\textbf{H}v \implies \phi (u) \preccurlyeq _\textbf{G}\phi (v) \end{aligned}$$

and we write . If we do not need to assign a variable for the embedding, we will simply write or, in cases where we want to specify that the embedding maps stem-vertices \({\bar{x}}\) of \(\textbf{H}\) onto \({\bar{y}}\) in \(\textbf{G}\), we write .

We further write if and \(\phi \) is an isomorphism between H and G; we will call such embeddings strict. Again, we simply write to denote that such a strict embedding exists or if we want to specify the mapping of stem-vertices.

For a vertex set \(X \subseteq V(\textbf{G})\), we let \(\min _\textbf{G}X\) and \(\max _\textbf{G}X\) be the minimum and maximum according to \(\preccurlyeq _\textbf{G}\) (if they exist). We extend this notation to subtogs via \(\min _\textbf{G}\textbf{H}:= \min _\textbf{G}V(\textbf{H})\) and \(\max _\textbf{G}\textbf{H}:= \max _\textbf{G}V(\textbf{H})\). We observe that minima are preserved by tog embeddings:

Observation 1

Let and let \(\textbf{H}' \subseteq \textbf{H}\). Then \(\min _\textbf{G}\phi (\textbf{H}') = \phi (\min _\textbf{H}\textbf{H}')\).

Proof

Since \(\textbf{H}'\) is a subtog of \(\textbf{H}\) the minimum \(\min _\textbf{H}\textbf{H}'\) exists. From \(\min _\textbf{H}\textbf{H}' \preccurlyeq _\textbf{H}u\) for every \(u \in \textbf{H}'\) and the fact that \(\phi \) is an embedding we conclude that \(\phi (\min _\textbf{H}\textbf{H}') \preccurlyeq _\textbf{G}\phi (u)\) for every \(u \in \textbf{H}'\). Therefore \(\min _\textbf{G}\phi (\textbf{H}') = \phi (\min _\textbf{H}\textbf{H}')\). \(\square \)

This in particular implies that embeddings preserve ordered vertex sets: if \({\bar{x}}\) is an ordered vertex set of \(\textbf{H}\) and , then \(\phi ({\bar{x}})\) is an ordered vertex set of \(\textbf{G}\). Similarly, if \({\bar{x}}\) is the stem of \(\textbf{H}\) and \({\bar{y}}\) is the stem of \(\textbf{G}\) then \(\phi ({\bar{x}})\) is a prefix of \({\bar{y}}\).

Finally, we note that embeddings are transitive:

Observation 2

If and then .

The notion of elimination trees (also known as treedepth decompositions, among many other names) connects tree ordered graphs to linearly ordered graphs.

Definition 1

(Elimination tree) Given a connected linearly ordered graph \(\mathbb {H}\), the elimination tree \({{\,\textrm{ET}\,}}(\mathbb {H})\) is defined recursively as follows: Let \(x := \min \mathbb {H}\) and let \(\mathbb {K}_1,\ldots ,\mathbb {K}_s\) be the connected components of \(\mathbb {H}- x\). Then \({{\,\textrm{ET}\,}}(\mathbb {H})\) has x as its root with the roots of \({{\,\textrm{ET}\,}}(\mathbb {K}_1),\ldots , {{\,\textrm{ET}\,}}(\mathbb {K}_s)\) as its children.

Definition 2

(Tree order relaxation) Given a connected linearly ordered graph \(\mathbb {H}\) and its elimination tree \(T := {{\,\textrm{ET}\,}}(\mathbb {H})\), we define its tree order relaxation as the tog \({{\,\mathrm{{\textsf{relax}}}\,}}(\mathbb {H}) = (H, {{\,\mathrm{\preccurlyeq _{\textsf{anc}}}\,}}^{T})\).

Observe that and these embeddings have the stem of \({{\,\mathrm{{\textsf{relax}}}\,}}(\mathbb {H})\) as fixed points.
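For small examples, both constructions can be computed directly; the following Python sketch (our own code and data layout) builds the elimination tree of a connected linearly ordered graph as a parent map and exposes the relaxed tree order as a comparison predicate.

```python
def connected_components(vertices, adj):
    """Components of the subgraph induced by `vertices` (a set)."""
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] & vertices)
        seen |= comp
        comps.append(comp)
    return comps

def elimination_tree(order, adj):
    """Parent map of ET(H) for a connected linear graph given as (order, adjacency)."""
    parent = {}
    def rec(vertices, par):
        root = min(vertices, key=order.index)    # smallest vertex of this part
        parent[root] = par
        for comp in connected_components(vertices - {root}, adj):
            rec(comp, root)
    rec(set(order), None)
    return parent

def relax_order(parent):
    """Ancestor relation of the elimination tree: leq(u, v) iff u lies on rpath(v)."""
    def leq(u, v):
        while v is not None:
            if u == v:
                return True
            v = parent[v]
        return False
    return leq

if __name__ == "__main__":
    # P4 with vertices ordered b < a < c < d and edges a-b, b-c, c-d.
    order = ["b", "a", "c", "d"]
    adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    parent = elimination_tree(order, adj)
    print(parent)                          # b is the root, a and c its children, d below c
    leq = relax_order(parent)
    print(leq("b", "d"), leq("a", "d"))    # True False: a and d are incomparable
```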

Definition 3

(Elimination-ordered graph (etog)) A tog \(\textbf{H}\) for which there exists a linear graph \(\mathbb {H}\) such that \({{\,\mathrm{{\textsf{relax}}}\,}}(\mathbb {H}) = \textbf{H}\) is called an elimination-ordered graph (etog).

Lemma 1

Let \(\textbf{H}= {{\,\mathrm{{\textsf{relax}}}\,}}(\mathbb {H})\) be a tree order relaxation of a connected linear graph \(\mathbb {H}\). Then for every pair of vertices \(x,y \in V(\textbf{H})\) it holds that \(x \preccurlyeq _\textbf{H}y\) if and only if there exists an xy-path P with \(\min _\textbf{H}P = x\).

Proof

Let \(T = {{\,\textrm{ET}\,}}(\mathbb {H})\) be the elimination tree whose ancestor relationship defines \(\preccurlyeq _\textbf{H}\). First assume that \(x \preccurlyeq _\textbf{H}y\). By definition, the nodes of the subtree \(T_x\) induce a connected subgraph of \(\mathbb {H}\) and hence of \(\textbf{H}\), thus there exists a path P from x to y in \(\textbf{H}[V(T_x)]\) and hence \(\min _\textbf{H}P = x\). In the other direction, assume an xy-path P with \(\min _\textbf{H}P = x\) exists. Since \(\preccurlyeq _\textbf{H}\) guards E(H), every edge leaving the subtree \(T_x\) ends in a strict ancestor of x, so a path that starts in x and has x as its minimum cannot leave \(T_x\). Since \(y \in P\) it follows that \(y \in T_x\) and therefore \(x \preccurlyeq _\textbf{H}y\), as claimed. \(\square \)

Corollary 1

Let \(\textbf{H}= {{\,\mathrm{{\textsf{relax}}}\,}}(\mathbb {H})\) be a tree order relaxation of a connected linear graph \(\mathbb {H}\). Then for every pair of vertices \(x,y \in V(\textbf{H})\) which are incomparable under \(\preccurlyeq _{\textbf{H}}\), every path P from x to y satisfies \(\min _\textbf{H}P \not \in \{x,y\}\).

2.5 Reachability and Bounded Expansion

For any integer r, we define the set \({\mathcal {P}}^r(u)\) as the set of all paths with length between 1 and r which have u as one of their endpoints. Similarly, we define the set \({\mathcal {P}}^r(u,v)\) as the set of all uv-paths of length \(\le r\). With this notation, we can now define the weak r-neighbors of a vertex u in a linear graph \(\mathbb {G}\) as the set

$$\begin{aligned} W^r_\mathbb {G}(u) = \{ \min P \mid P \in {\mathcal {P}}^r(u) \} \setminus \{u\}, \end{aligned}$$

that is, \(W^r_\mathbb {G}(u)\) contains all vertices that are weakly r-reachable from u. Notice that by definition every vertex in \(W^r_\mathbb {G}(u)\) precedes u in \(\mathbb {G}\). In words, this set contains those vertices \(v \preccurlyeq _\mathbb {G}u\) that can be reached from u by using a path of length at most r which does not use any vertices to the left of v.

We also define the strong r-neighbors as the set

$$\begin{aligned} S^r_\mathbb {G}(u) = \{ v \preccurlyeq _\mathbb {G}u \mid \exists P \in {\mathcal {P}}^r(u,v) ~\text {s.t.}~ u \preccurlyeq _\mathbb {G}(P-v) \}, \end{aligned}$$

that is, \(S^r_\mathbb {G}(u)\) contains all vertices that are strongly r-reachable from u. This set contains those vertices \(v \preccurlyeq _\mathbb {G}u\) which can be reached by a path of length at most r which, apart from its endpoints, lies entirely to the right of u. For convenience, we define \(W^r_\mathbb {G}[u] := W^r_\mathbb {G}(u) \cup \{u\}\) and \(S^r_\mathbb {G}[u] := S^r_\mathbb {G}(u) \cup \{u\}\). As usual, we omit the subscript \(\mathbb {G}\) if clear from the context.
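The following brute-force sketch (our own code; it enumerates all simple paths of length at most r and is therefore only suitable for small examples) computes the weakly and strongly r-reachable sets of a vertex in a linearly ordered graph.

```python
def paths_from(u, adj, max_len):
    """All simple paths of length 1..max_len starting at u (as vertex tuples)."""
    result, stack = [], [(u,)]
    while stack:
        path = stack.pop()
        if 1 <= len(path) - 1 <= max_len:
            result.append(path)
        if len(path) - 1 < max_len:
            for w in adj[path[-1]]:
                if w not in path:
                    stack.append(path + (w,))
    return result

def weak_reachable(u, order, adj, r):
    """W^r(u): minima (w.r.t. the ordering) of paths starting at u, minus u itself."""
    pos = {v: i for i, v in enumerate(order)}
    return {min(p, key=pos.get) for p in paths_from(u, adj, r)} - {u}

def strong_reachable(u, order, adj, r):
    """S^r(u): endpoints v < u of paths whose remaining vertices all lie at or after u."""
    pos = {v: i for i, v in enumerate(order)}
    out = set()
    for p in paths_from(u, adj, r):
        v = p[-1]
        if pos[v] < pos[u] and all(pos[w] >= pos[u] for w in p[:-1]):
            out.add(v)
    return out

if __name__ == "__main__":
    # The path 1-2-3-4 ordered 1 < 2 < 3 < 4.
    order = [1, 2, 3, 4]
    adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    print(weak_reachable(4, order, adj, 3))    # {1, 2, 3}
    print(strong_reachable(4, order, adj, 3))  # {3}: internal vertices must stay right of 4
```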

The notions of weak and strong reachability are at the core of the generalized coloring numbers \( {\text {col}}_{r} \) and \( {\text {wcol}}_{r} \). For a linear graph \(\mathbb {G}\), we define them as

$$\begin{aligned} {\text {wcol}}_{r} (\mathbb {G})&:= \max _{v \in \mathbb {G}} | W^r_\mathbb {G}[v] |, \\ {\text {col}}_{r} (\mathbb {G})&:= \max _{v \in \mathbb {G}} | S^r_\mathbb {G}[v] |, \\ \end{aligned}$$

which lets us define the generalized coloring numbers for unordered graphs as

$$\begin{aligned} {\text {wcol}}_{r} (G)&:= \min _{\mathbb {G}\in \Pi (G)} {\text {wcol}}_{r} (\mathbb {G}), \\ {\text {col}}_{r} (G)&:= \min _{\mathbb {G}\in \Pi (G)} {\text {col}}_{r} (\mathbb {G}). \end{aligned}$$

Kierstead and Yang [15] showed that the weak r-coloring number is bounded iff the r-coloring number is:

$$\begin{aligned} {\text {col}}_{r} (G) \le {\text {wcol}}_{r} (G) \le {\text {col}}_{r} (G)^r, \end{aligned}$$

and Zhu related the above graph measures to classes of bounded expansion [24]. As a result, we can work with the following characterisation of bounded expansion and nowhere dense classes:

Proposition 1

The following statements about a graph class \({\mathcal {G}}\) are equivalent:

  1. 1.

    \({\mathcal {G}}\) has bounded expansion,

  2. 2.

    there exists a function f such that \( {\text {col}}_{r} (G) < f(r)\) for all \(G \in {\mathcal {G}}\) and all \(r \in {\mathbb {N}}_0\),

  3. 3.

    there exists a function g such that \( {\text {wcol}}_{r} (G) < g(r)\) for all \(G \in {\mathcal {G}}\) and all \(r \in {\mathbb {N}}_0\).

In nowhere dense classes these measures might depend on the size of the graph, albeit only sublinearly [13]:

Proposition 2

The following statements about a graph class \({\mathcal {G}}\) are equivalent:

  1. 1.

    \({\mathcal {G}}\) is nowhere dense,

  2. 2.

    there exists a sequence of functions \((f_r)_{r \in \mathbb {N}_0}\) with \(f_r(n) = O(n^{o(1)})\) such that \( {\text {col}}_{r} (G) < f_r(|G|)\) for all \(G \in {\mathcal {G}}\) and all \(r \in \mathbb {N}_0\),

  3. 3.

    there exists a sequence of functions \((g_r)_{r \in \mathbb {N}_0}\) with \(g_r(n) = O(n^{o(1)})\) such that \( {\text {wcol}}_{r} (G) < g_r(|G|)\) for all \(G \in {\mathcal {G}}\) and all \(r \in \mathbb {N}_0\).

We are left with the question of computing orderings which provide small values for \(W^r\) or \(S^r\). Finding optimal orderings for weakly reachable sets is \(\textsf{N}\textsf{P}\)-complete [12] for \(r \ge 3\); we therefore have to resort to approximations. To our knowledge, the best current option for theoretical purposes is via admissibility, yet another order-based measure: the r-admissibility \( {\text {adm}}_{r} ^\mathbb {G}(v)\) of a vertex v in an ordered graph \(\mathbb {G}\) is the maximum number of paths of length at most r which a) only intersect in v and b) end in vertices that come before v in \(\le _\mathbb {G}\). The admissibility of an ordered graph \(\mathbb {G}\) is then

$$\begin{aligned} {\text {adm}}_{r} (\mathbb {G})&= \max _{v \in \mathbb {G}} {\text {adm}}_{r} ^\mathbb {G}(v), \end{aligned}$$

and it is not too difficult to see that \( {\text {adm}}_{r} (\mathbb {G}) \le {\text {col}}_{r} (\mathbb {G})\). As with the other generalized coloring numbers, the admissibility of an unordered graph is taken over all its possible orderings:

$$\begin{aligned} {\text {adm}}_{r} (G)&= \min _{\mathbb {G}\in \Pi (G)} {\text {adm}}_{r} (\mathbb {G}). \end{aligned}$$

In the other direction, we have the following results:

Proposition 3

(cf. Dvořák [5]) For any linear ordering \(\mathbb {G}\) of G and \(r \in \mathbb {N}\) it holds that

$$\begin{aligned} {\text {col}}_{r} (\mathbb {G}) \le {\text {adm}}_{r} (\mathbb {G})( {\text {adm}}_{r} (\mathbb {G})-1)^{r-1} + 1. \end{aligned}$$

Proposition 4

(cf. Dvořák [6]) For any linear ordering \(\mathbb {G}\) of G and \(r \in \mathbb {N}\) it holds that

$$\begin{aligned} {\text {wcol}}_{r} (\mathbb {G}) \le \big (r^2 {\text {adm}}_{r} (\mathbb {G}) \big )^r. \end{aligned}$$

Importantly, a linear-time algorithm to compute the admissibility exists.

Proposition 5

(cf. Dvořák [5]) Let \({\mathcal {G}}\) be a class with bounded expansion and \(r \in \mathbb {N}\). There exists a linear-time algorithm that for each \(G \in \mathcal G\) computes an ordering \(\mathbb {G}\) with \( {\text {adm}}_{r} (\mathbb {G}) = {\text {adm}}_{r} (G)\).

As a corollary to these three propositions, we can compute an ordering \(\mathbb {G}\) of G in linear time with

$$\begin{aligned} {\text {col}}_{r} (\mathbb {G}) \le {\text {col}}_{r} (G)( {\text {col}}_{r} (G)-1)^{r-1} + 1 = O( {\text {col}}_{r} (G)^r) \end{aligned}$$

and

$$\begin{aligned} {\text {wcol}}_{r} (\mathbb {G}) \le \big (r^2 {\text {wcol}}_{r} (G) \big )^r. \end{aligned}$$

2.6 Conventions

In the remainder, we fix a linear graph \(\mathbb {G}\), the host graph, and a pattern graph H. Our goal is to count how often H appears as an induced subgraph in the underlying graph G of \(\mathbb {G}\). For ease of presentation, we will assume that H is connected and discuss later how the algorithms can be modified for disconnected patterns.

3 Pattern Decomposition

We will be counting the pattern by considering the possible orderings in which it may appear in the host graph. However, it turns out that some of these orderings need to be treated as a unit with our approach, namely those orderings that result in the same pattern relaxation. In that sense, we count the number of embeddings only for members of the following set:

Definition 4

(Pattern relaxation) For the pattern graph H we define its pattern relaxations as the set

$$\begin{aligned} {\mathcal {H}} := \{ {{\,\mathrm{{\textsf{relax}}}\,}}((H, \le _\pi )) \mid \pi \in \Pi (V(H)) \}. \end{aligned}$$
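For small patterns, the set \({\mathcal {H}}\) can be generated by brute force. The sketch below (our own code; each relaxation is represented by its set of strictly comparable vertex pairs, so two orderings that yield the same tree order are collected only once) enumerates all vertex orderings of H and computes the resulting relaxations inline.

```python
from itertools import permutations

def relaxation_pairs(order, adj):
    """Comparable pairs (u, v) with u strictly below v in relax((H, order))."""
    pairs = set()
    def rec(vertices, ancestors):
        root = min(vertices, key=order.index)
        pairs.update((a, root) for a in ancestors)
        rest = vertices - {root}
        # split `rest` into connected components and recurse on each
        while rest:
            comp, stack = set(), [next(iter(rest))]
            while stack:
                v = stack.pop()
                if v in comp:
                    continue
                comp.add(v)
                stack.extend(adj[v] & rest)
            rest -= comp
            rec(comp, ancestors + [root])
    rec(set(order), [])
    return frozenset(pairs)

def pattern_relaxations(vertices, adj):
    """All distinct relaxations of H, each given by its set of comparable pairs."""
    return {relaxation_pairs(list(p), adj) for p in permutations(vertices)}

if __name__ == "__main__":
    # P4: a-b-c-d
    adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    rels = pattern_relaxations("abcd", adj)
    print(len(rels))  # number of distinct relaxations of the path on four vertices
```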

Each pattern relaxation will be decomposed further until we arrive at an object that is easily countable. To that end, we define the following:

Definition 5

(Pieces, linear pieces) Given a pattern relaxation \(\textbf{H}\in {\mathcal {H}}\) and a subset of its leaves \(S \subseteq {\text {leaves}}(\textbf{H})\), the piece induced by S is the induced subtog

$$\begin{aligned} \textbf{H}\big [ \bigcup _{x \in S} {\text {rpath}}(x) \big ]. \end{aligned}$$

If \(|S| = 1\), the resulting piece is a linear graph and we refer to it as a linear piece.
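With the tree order stored as a parent map (as in the elimination-tree sketch above), pieces are simply unions of root paths; a minimal sketch of this extraction:

```python
def root_path(v, parent):
    """Vertices on the path from the root to v, inclusive."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]

def piece(leaf_subset, parent):
    """Vertex set of the piece induced by a subset S of the leaves."""
    return set().union(*(root_path(x, parent) for x in leaf_subset))

if __name__ == "__main__":
    # Tree order on four vertices: b is the root, a and c its children, d lies below c;
    # its leaves are a and d.
    parent = {"b": None, "a": "b", "c": "b", "d": "c"}
    print(sorted(piece({"a"}, parent)))       # ['a', 'b']  -- a linear piece
    print(sorted(piece({"a", "d"}, parent)))  # ['a', 'b', 'c', 'd']
```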

With that, we define the decomposition of a pattern relaxation via piece sums (see Fig. 1 for examples):

Definition 6

(Piece sum) Let \(\textbf{H}\) be a tog with stem \({\bar{x}}\). We write \(\textbf{H}= \textbf{H}_1 \oplus _{{\bar{x}}} \textbf{H}_2\) to denote that \(\textbf{H}_1\) and \(\textbf{H}_2\) are pieces of \(\textbf{H}\) with the properties that

  1. 1.

    \({\text {leaves}}(\textbf{H}_1)\) and \({\text {leaves}}(\textbf{H}_2)\) are both non-empty and partition \({\text {leaves}}(\textbf{H})\),

  2. 2.

    and \(\textbf{H}_1 \cap \textbf{H}_2 = {\bar{x}}\).

Note that pieces of a connected tog are not necessarily connected. However, in the case of etogs, some connectivity is maintained, namely between non-stem vertices:

Lemma 2

Let \(\textbf{H}= \textbf{H}_1 \oplus _{{\bar{x}}} \textbf{H}_2\) be an etog and let \(u \in \textbf{H}_1 - {\bar{x}}\). Then any vertex v with \(u \preccurlyeq _{\textbf{H}} v\) is contained in \(\textbf{H}_1 - {\bar{x}}\). Moreover, there exists a uv-path \(P \subseteq \textbf{H}_1 - {\bar{x}}\) with \(\min _{\textbf{H}} P = \min _{\textbf{H}_1} P = u\).

Proof

Since \(\textbf{H}\) is an etog, there exists an elimination tree T whose ancestor relationship defines \(\preccurlyeq _{\textbf{H}}\). Consider the subtree \(T_u\) rooted at u; because \(u \preccurlyeq _{\textbf{H}} v\) we have that \(v \in T_u\). Recall that by the definition of elimination trees \(H[T_u]\) is connected, therefore \(H[T_u]\) contains a uv-path \(P \subseteq T_u\).

Because \(u \in \textbf{H}_1 - {\bar{x}}\) it follows that \(T_u \subseteq \textbf{H}_1 - {\bar{x}}\), and thus the same is true for P. It immediately follows that \(\min _{\textbf{H}_1} P = u\). \(\square \)

Fig. 1 The tog \(\textbf{H}\) is decomposed into pieces \(\textbf{H}_1\) and \(\textbf{H}_2\) along its stem \({\bar{x}}\). \(\textbf{D}\) is a defect of \(\textbf{H}\) as \(\textbf{H}\not \hookrightarrow \textbf{D}\) but there exists a mapping \(\phi \) such that and . On the right the same situation with a concrete example where \(\textbf{H}\) is an etog of \(P_4\)

We now show that linear pieces can be enumerated or counted in linear time given a suitable vertex ordering of the host graph with constant-sized weak/strong r-neighborhoods.

3.1 Counting (Relevant) Linear Pieces

We first prove that all relevant linear pieces (those that can be completed to the full pattern) are completely contained in weakly reachable sets and can therefore be counted easily in time \(O( {\text {wcol}}_{|\textbf{H}|} (\mathbb {G})^{|\textbf{H}|-1} \cdot |\mathbb {G}|)\); see Sect. 4 for details.

Lemma 3

Let \(\mathbb {K}\) be a linear piece for some pattern relaxation \(\textbf{H}\in {\mathcal {H}}\) and let \(z = \max (\mathbb {K})\). Then for every it holds that \(\phi (\mathbb {K})\) is contained in \(W^{|\textbf{H}|}_\mathbb {G}[\phi (z)]\).

Proof

Let \(\phi \) be such an embedding and fix any \(x \in \mathbb {K}\). We need to show that \(\phi (x) \in W^{|\textbf{H}|}_\mathbb {G}[\phi (z)]\). Since we assumed that H is connected, so is \(\textbf{H}\). Then by Lemma 1, there exists a path P from x to z in \(\textbf{H}\) with \(\min _\textbf{H}P = x\). Since \(\phi \) is an embedding, by Observation 1 it holds that

$$\begin{aligned} \min _\mathbb {G}\phi (P) = \phi (\min _\textbf{H}P) = \phi (x). \end{aligned}$$

We conclude that \(\phi (x) \in W^{|P|}_\mathbb {G}[\phi (z)] \subseteq W^{|\textbf{H}|}_\mathbb {G}[\phi (z)]\). \(\square \)

The above does not hold if we replace weak reachability by strong reachability; however, the following statement already suffices to build the strong-reachability variant of our algorithm:

Lemma 4

Let \(\mathbb {K}\) be a linear piece for some pattern relaxation \(\textbf{H}\in {\mathcal {H}}\). Let \(z = \max (\mathbb {K})\) and let \(x <_\mathbb {K}z\) be an arbitrary vertex of \(\mathbb {K}.\) There exists a vertex \(y \in \mathbb {K}\), \(x <_\mathbb {K}y \le _\mathbb {K}z\) such that for every embedding it holds that \( \phi (x) \in S_\mathbb {G}^{|H|}[\phi (y)] \).

Proof

Let \(\phi \) be such an embedding. Again, since \(\textbf{H}\) is connected there exists a path P from x to z in \(\textbf{H}\) with \(\min _\textbf{H}P = x\). Let \(y = \min _\textbf{H}((P - x) \cap V(\mathbb {K}))\) be the smallest vertex of \(\mathbb {K}\) which lies on P; since z lies in this intersection this minimum must exist. Let \(P'\) be the portion of P which goes from x to y.

Claim

\(y = \min _\textbf{H}(P'-x)\).

Proof

Assume towards a contradiction that \(y' = \min _\textbf{H}(P'-x)\) with \(y' \ne y\). Note that by our choice of P it holds that \(y' \prec _\textbf{H}z\).

First consider the case that \(y' \prec _\textbf{H}y\). Hence \(y'\) must lie somewhere on the path from x to y in \(T(\textbf{H})\). But then \(y'\) is contained in the piece \(\mathbb {K}\) and hence in \((P-x) \cap V(\mathbb {K})\), contradicting our choice of y.

Otherwise, \(y'\) and y are incomparable under \(\preccurlyeq _\textbf{H}\) and in particular \(y'\) cannot lie anywhere on \({\text {rpath}}_{T(\textbf{H})}(y)\) or anywhere below y in \(T(\textbf{H})\). Since \(\preccurlyeq _\textbf{H}\) guards E(H) the path P can only go from \(y'\) to y by intersecting \({\text {rpath}}_{T(\textbf{H})}(y)\) in some vertex \(y''\). But then \(y'' \in (P-x) \cap V(\mathbb {K})\), contradicting our choice of y. \(\square \)

Finally, we apply Observation 1 and find that

$$\begin{aligned} \min _\mathbb {G}\phi (P'-x) = \phi (\min _\textbf{H}(P'-x)) = \phi (y) \end{aligned}$$

from which we conclude that indeed \(\phi (x) \in S^{|P|}_\mathbb {G}[\phi (y)] \subseteq S^{|H|}_\mathbb {G}[\phi (y)]\). \(\square \)

We call such a vertex y a hint and introduce the following notation to speak about it more succinctly:

Definition 7

(\({{\,\mathrm{{\textsf{hint}}}\,}}\)). Let \(\mathbb {K}\) be a linear piece of \(\textbf{H}\in {\mathcal {H}}\) with vertices \(x_1,\ldots ,x_p\). For every index \(i \in [p]\) we define the function \({{\,\mathrm{{\textsf{hint}}}\,}}^\textbf{H}_\mathbb {K}(i)\) to be the largest index \(j > i\) such that for every embedding it holds that \(\phi (x_i) \in S_\mathbb {G}^{|H|}[\phi (x_j)]\).

To use these hints algorithmically, we proceed as follows. Given a linear piece \(\mathbb {K}\) with vertices \(x_1,\ldots ,x_p\), we first match the last vertex \(x_p\) to a vertex \({\hat{x}}_p\) in \(\textbf{G}\). By Lemma 4, we can now restrict our search for \({\hat{x}}_{p-1}\) to vertices in the set \(S^p_\mathbb {G}[{\hat{x}}_p]\). For any match of \(x_{p-1}\) to \({\hat{x}}_{p-1}\) in \(\textbf{G}\), we can then attempt to find a match for \(x_{p-2}\) by searching the set \(S^p_\mathbb {G}[{\hat{x}}_j]\) where \(j = {{\,\mathrm{{\textsf{hint}}}\,}}(p-2) \in \{p-1,p\}\). This algorithm is described in more detail in the last section.
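A recursive sketch of this matching procedure (our own simplification; the strongly reachable sets `strong` and the hint indices `hint` are assumed to be precomputed, e.g. along the lines of the sketches above, and the full algorithm of Sect. 4 additionally fixes stem prefixes and aggregates the counts):

```python
def count_piece_matches(piece, piece_adj, hint, order, adj, strong):
    """
    piece:     vertices x_1,...,x_p of a linear piece, listed in tree order
    piece_adj: adjacency dict of the piece
    hint:      hint[i] = index j > i (0-based); the image of piece[i] is searched
               inside strong[image of piece[j]]
    order:     linear order of the host graph (list of vertices)
    adj:       adjacency dict of the host graph
    strong:    strong[v] = strongly reachable set of v in the host (precomputed)
    Counts the matches of the piece found via the hint-restricted search.
    """
    pos = {v: i for i, v in enumerate(order)}
    p = len(piece)

    def ok(partial, i, cand):
        for j in range(i + 1, p):
            img = partial[j]
            if pos[cand] >= pos[img]:                           # must respect the ordering
                return False
            if (piece[j] in piece_adj[piece[i]]) != (img in adj[cand]):
                return False                                    # edges must match exactly
        return True

    def extend(partial, i):
        if i < 0:
            return 1
        total = 0
        for cand in strong[partial[hint[i]]]:
            if cand not in partial.values() and ok(partial, i, cand):
                partial[i] = cand
                total += extend(partial, i - 1)
                del partial[i]
        return total

    # choose the image of the last vertex x_p, then work backwards
    return sum(extend({p - 1: v}, p - 2) for v in order)
```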

3.2 Combining Counts

In order to succinctly describe our approach, we need to introduce the following notation for counting embeddings of a pattern graph \(\textbf{H}\) into a host graph \(\textbf{G}\) where we already fix the embedding of a prefix of \(\textbf{H}\)’s stem vertices.

Definition 8

((Strict) embedding count) For togs \(\textbf{H}, \textbf{G}\) with \({\bar{x}}\) a stem prefix of \(\textbf{H}\) and \({\bar{y}} \subseteq \textbf{G}\) an ordered vertex set with \(|{\bar{x}}| = |{\bar{y}}|\), we define

The central idea is now that in order to count \(\textbf{H}\), we instead count the occurrences of the two pieces \(\textbf{H}_1 \oplus _{{{\,\mathrm{{\textsf{stem}}}\,}}\textbf{H}} \textbf{H}_2\) and compute by taking the product . Of course, the latter quantity over-counts the former, as we will discuss below. First, let us introduce the following notation for this ‘estimated’ embedding count:

Definition 9

(Relaxed embedding) For togs \(\textbf{H}= \textbf{H}_1 \oplus _{{\bar{x}}} \textbf{H}_2, \textbf{G}\) with \(\bar{x}\) a stem prefix of \(\textbf{H}\) we write to denote that the mapping \(\phi \in V(\textbf{G})^{V(\textbf{H})}\) has the following properties:

  • and ,

  • \(\phi (\textbf{H}) = V(\textbf{G})\).

Definition 10

(Relaxed embedding count) For togs \(\textbf{H}= \textbf{H}_1 \oplus _{{\bar{x}}} \textbf{H}_2, \textbf{G}\) with \(\bar{x}\) a stem prefix of \(\textbf{H}\) and \({\bar{y}} \subseteq \textbf{G}\) an ordered vertex set with \(|{\bar{x}}| = |{\bar{y}}|\) we write for the number of embeddings with \(\phi ({\bar{x}}) = {\bar{y}}\).

Now, how does a mapping which embeds \(\textbf{H}_1\) and \(\textbf{H}_2\) fail to embed \(\textbf{H}= \textbf{H}_1 \oplus _{{\bar{x}}} \textbf{H}_2\)? We must either have that the images of \(\textbf{H}_1\) and \(\textbf{H}_2\) intersect or that there exists an edge between their images which does not belong to \(\textbf{H}\). We will call such a pair of embeddings a defect:

Definition 11

(Defect) Let \(\textbf{H}\in {\mathcal {H}}\) be a pattern relaxation and let \(\textbf{H}_1 \oplus _{{\bar{x}}} \textbf{H}_2 = \textbf{H}\). A defect of \(\textbf{H}_1, \textbf{H}_2\) is any etog \(\textbf{D}\) that satisfies the following properties:

  1. 1.

    \(\textbf{H}\not \hookrightarrow \textbf{D}\),

  2. 2.

    ,

  3. 3.

    where \(\phi \) is the identity on the set \((V(\textbf{H}_2)\setminus V(\textbf{H}_1)) \cup {\bar{x}}\),

  4. 4.

    and \(V(\textbf{D}) = V(\textbf{H}_1) \cup \phi (\textbf{H}_2)\).

We will write \({{\,\mathrm{{\mathcal {D}}}\,}}(\textbf{H}_1,\textbf{H}_2)\) to denote the set of all defects for the pair \(\textbf{H}_1,\textbf{H}_2\).

Note that several of the above properties are for convenience only: we insist that \(\textbf{H}_1\) is a subgraph of \(\textbf{D}\) to avoid handling yet another embedding and we make \(\phi \) preserve all vertices that it possibly can for the same reason. Importantly, all the togs \(\textbf{H}\), \(\textbf{H}_1\), \(\textbf{H}_2\), and \(\textbf{D}\) share the ordered set \({\bar{x}}\) as a stem prefix.

At this point we should point out that it is not a priori clear that it is enough to consider defects that are etogs themselves; it could very well be the case that defects are arbitrary tree-ordered or just ‘ordered’ graphs. Note that what we really want to count are linear subgraphs \(\mathbb {D}\subseteq \mathbb {G}\) into which \(\textbf{H}_1\) and \(\textbf{H}_2\) embed, but \(\textbf{H}= \textbf{H}_1 \oplus _{{\bar{x}}} \textbf{H}_2\) does not (as these are precisely the cases that we over-count in the product , for some prefix \({\bar{y}}\) of \(\mathbb {D}\)). The next lemma shows that instead of trying to find these linear subgraphs, we can instead resort to counting their relaxations, thus circling back to etogs:

Lemma 5

Let \(\textbf{H}\) be a connected etog with pieces \(\textbf{H}_1 \oplus _{{\bar{x}}} \textbf{H}_2 = \textbf{H}\). Let \(\textbf{D}\in {{\,\mathrm{{\mathcal {D}}}\,}}(\textbf{H}_1,\textbf{H}_2)\). Then for every linear graph \(\mathbb {D}\) with \({{\,\mathrm{{\textsf{relax}}}\,}}(\mathbb {D}) = \textbf{D}\) it holds that

Proof

Fix a linear graph \(\mathbb {D}\) with \(\textbf{D}= {{\,\mathrm{{\textsf{relax}}}\,}}(\mathbb {D})\) in the following. Let \(\xi _1,\ldots ,\xi _p\) be all strict embeddings of \(\textbf{D}\) into \(\mathbb {D}\) with fixed points \({\bar{x}}\), that is, and \(\xi _i({\bar{x}}) = {\bar{x}}\) for all \(i \in [p]\). Recall that the mappings \(\xi _i\) are in particular automorphisms of the underlying graph D and therefore the inverse mappings \(\xi ^{-1}_i\) exist.

Claim

Let with \(\phi ({\bar{x}}) = {\bar{x}}\). Then and \((\xi _i \circ \phi )(\bar{x}) = {\bar{x}}\) for each \(i \in [p]\).

Proof

For each \(\textbf{H}_j\), \(j \in \{1,2\}\), and \(i \in [p]\) we have that , therefore by transitivity . Since \(\phi (\textbf{H}) = V(\textbf{D})\) and each \(\xi _i\) is an automorphism of the underlying graph D, it follows that \((\xi _i \circ \phi )(\textbf{H}) = V(\textbf{D}) = V(\mathbb {D})\). We conclude that indeed . Finally, since both \(\phi \) and each \(\xi _i\) have \({\bar{x}}\) as fixed points, \((\xi _i \circ \phi )({\bar{x}}) = {\bar{x}}\). \(\square \)

Claim

Let with \(\psi (\bar{x}) = {\bar{x}}\). Then and \((\xi ^{-1}_i \circ \psi )({\bar{x}}) = {\bar{x}}\) for each \(i \in [p]\).

Proof

Fix \(\xi _i\), \(i \in [p]\), and let \(\theta := \xi ^{-1}_i \circ \psi \). We first show that for \(j \in \{1,2\}\). Clearly \(\theta \) is an induced subgraph isomorphism from \(H_j\) to D with fixed points \({\bar{x}}\); it remains to show that the ordering \(\preccurlyeq _{\textbf{H}_j}\) is properly embedded into \(\preccurlyeq _{\textbf{D}}\).

Consider a pair \(u \preccurlyeq _{\textbf{H}_j} v\). Then \(\psi (u) \le _{\mathbb {D}} \psi (v)\) and since , we have two possibilities for the pre-images of \(\psi (u)\) and \(\psi (v)\) under \(\xi _i\): either \((\xi ^{-1}_i \circ \psi )(u) = \theta (u)\) and \((\xi ^{-1}_i \circ \psi )(v) = \theta (v)\) satisfy \(\theta (u) \preccurlyeq _{\textbf{D}} \theta (v)\), in which case we are done, or the two are incomparable. We now argue that the latter case is impossible.


First, note that if \(u \in {\bar{x}}\) then necessarily \(\theta (u) \preccurlyeq _{\textbf{D}} \theta (v)\): if \(v \in {\bar{x}}\) then their order is maintained because \(\theta ({\bar{x}}) = {\bar{x}}\); otherwise the claim follows because \({\bar{x}}\) is a stem-prefix of \(\textbf{D}\) and therefore \({\bar{x}} \preccurlyeq _{\textbf{D}} \textbf{D}- {\bar{x}}\).

Therefore we may assume that \(u \in \textbf{H}_j - {\bar{x}}\) and we apply Lemma 2 to obtain a uv-path \(P \subseteq H_j - {\bar{x}}\) with \(\min _{\textbf{H}_j} P = u\). Since , there exists a \(\psi (u)\)\(\psi (v)\)-path \(P' \subseteq \psi (P)\) in \(\mathbb {D}\) with \(\min _\mathbb {D}P' = \psi (\min _{\textbf{H}_j} P) = \psi (u)\). But then by Corollary 1, the pre-images of \(\psi (u)\) and \(\psi (v)\) under \(\xi _i\) (namely \(\theta (u)\) and \(\theta (v)\)) cannot be incomparable (Corollary 1 applies after a trivial relabelling of graphs to \(\textbf{D}\) and \(\mathbb {D}\)).

We conclude that indeed \(\theta (u) \preccurlyeq _{\textbf{D}} \theta (v)\) and therefore that for \(j \in \{1,2\}\), as claimed. \(\square \)

Let \((L \cup R, E)\) be a bipartite graph where the side L contains all relevant mappings from \(\textbf{H}\) into \(\textbf{D}\) and the side R similarly those from \(\textbf{H}\) into \(\mathbb {D}\). We draw an edge between \(\phi \in L\) and \(\psi \in R\) if \(\xi _i \circ \phi = \psi \) for some \(i \in [p]\). This is of course precisely true iff \(\phi = \xi ^{-1}_i \circ \psi \). Therefore, by the above observations, we find that vertices on both sides have degree exactly p. The lemma now follows by observing that in a regular bipartite graph both sides must have the same size. \(\square \)

We are now ready to prove the main technical lemma of this paper, the recurrence that will allow us to compute , i.e. the number of embeddings from \(\textbf{H}\) into \(\mathbb {G}\) which map the stem prefix \({\bar{x}}\) of \(\textbf{H}\) onto the ordered subset \({\bar{y}}\) of \(\mathbb {G}\). Note that in order to compute the number of induced subgraphs, we simply have to divide this value by , the number of automorphisms of \(\textbf{H}\) with fixed points \({\bar{x}}\).
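Schematically (our paraphrase in an ad-hoc shorthand; the precise terms and coefficients are exactly those stated in Lemma 6), the recurrence has the shape

$$\begin{aligned} \#_{{\bar{x}} \mapsto {\bar{y}}}\big (\textbf{H}\big ) = \#_{{\bar{x}} \mapsto {\bar{y}}}\big (\textbf{H}_1\big ) \cdot \#_{{\bar{x}} \mapsto {\bar{y}}}\big (\textbf{H}_2\big ) - \sum _{\textbf{D}\in {{\,\mathrm{{\mathcal {D}}}\,}}(\textbf{H}_1,\textbf{H}_2)} c_{\textbf{D}} \cdot \#_{{\bar{x}} \mapsto {\bar{y}}}\big (\textbf{D}\big ), \end{aligned}$$

where \(\#_{{\bar{x}} \mapsto {\bar{y}}}(\cdot )\) informally stands for the embedding counts of Definition 8 with the stem prefix \({\bar{x}}\) mapped onto \({\bar{y}}\), and the coefficients \(c_{\textbf{D}}\) are as given in the lemma.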

Lemma 6

Let \(\textbf{H}\in {\mathcal {H}}\) be a (non-linear) pattern relaxation and let \(\textbf{H}_1 \oplus _{{\bar{x}}} \textbf{H}_2 = \textbf{H}\). Fix an ordered vertex set \({\bar{y}} \subseteq \mathbb {G}\) such that \(\textbf{H}[{\bar{x}}] \simeq \mathbb {G}[{\bar{y}}]\). Then

with coefficients

Proof

In order to prove this equation we will rewrite it in terms of individual linear subtogs \(\mathbb {K}\subseteq \mathbb {G}\) and consider how they contribute to the different terms. To that end, define \(\Phi (\mathbb {K})\) to be all pairs \((\phi _1,\phi _2)\) with \(\phi _1({\bar{x}} ) = \phi _2({\bar{x}}) = {\bar{y}}\) such that

  1. (a)

    and ; and

  2. (b)

    \(V(\mathbb {K}) = V(\phi _1(\textbf{H}_1)) \cup V(\phi _2(\textbf{H}_2))\).

That is, \(\Phi (\mathbb {K})\) contains all pairs of embeddings that minimally embed the graphs \(\textbf{H}_1\) and \(\textbf{H}_2\) into \(\mathbb {K}\) while also mapping \({\bar{x}}\) onto \({\bar{y}}\).

Claim

Let \(\mathbb {K}\subseteq \mathbb {G}\) such that . Then .

Proof

First note that since , it follows that \(H \simeq K\). Fix an embedding-pair \((\phi _1, \phi _2) \in \Phi (\mathbb {K})\) and let \(V_1 := V(\phi _1(\textbf{H}_1))\) and \(V_2 := V(\phi _2(\textbf{H}_2))\). By definition, \(V_1 \cup V_2 = V(\mathbb {K})\) and since \(|\textbf{H}| = |\mathbb {K}|\) it follows that \(V_1\) and \(V_2\) partition \(V(\mathbb {K})\).

As the embeddings of \(\textbf{H}_1\) and \(\textbf{H}_2\) are vertex-disjoint and since \(\Vert \mathbb {K}\Vert = \Vert \textbf{H}\Vert \), it follows that there cannot exist an edge between \(V_1 \setminus {\bar{y}}\) and \(V_2 \setminus {\bar{y}}\). But then with \(\xi (u) = \phi _1(u)\) for \(u \in \textbf{H}_1\) and \(\xi (u) = \phi _2(u)\) for \(u \in \textbf{H}_2 - {\bar{x}}\).

In the other direction, any embedding  with \(\xi ({\bar{x}}) = {\bar{y}}\) can of course be trivially decomposed into an embedding-pair \((\phi _1,\phi _2) \in \Phi (\mathbb {K})\). Accordingly, we have a one-to-one correspondence and we conclude that , as claimed. \(\square \)

We can now use Iverson bracket notation to rewrite and obtain

which holds because \(\Phi (\mathbb {K}) = \emptyset \) whenever \(\mathbb {K}\) does not have \({\bar{y}}\) as a stem prefix. We will now consider the two sums (\(\clubsuit \)) and (\(\spadesuit \)) individually.

Claim

Proof

Fix an embedding-pair \((\phi _1, \phi _2)\) with \(\phi _1({\bar{x}}) = \phi _2({\bar{x}}) = {\bar{y}}\) as well as and . Such a pair contributes exactly one to the left-hand side of the equation and we therefore have to show that there is exactly one \(\mathbb {K}\subseteq \mathbb {G}\) with \((\phi _1,\phi _2) \in \Phi (\mathbb {K})\). Concretely, we claim that this subtog is \(\mathbb {K}= \mathbb {G}[V(\phi _1(\textbf{H}_1)) \cup V(\phi _2(\textbf{H}_2))]\). Clearly \((\phi _1,\phi _2) \in \Phi (\mathbb {K})\); to see that this is the only subtog with this property, simply note that \((\phi _1,\phi _2) \in \Phi (\mathbb {K}')\) implies that \(V(\mathbb {K}') = V(\phi _1(\textbf{H}_1)) \cup V(\phi _2(\textbf{H}_2)) = V(\mathbb {K})\) and therefore \(\mathbb {K}' = \mathbb {K}\). \(\square \)

The following claim will help us to rewrite the sum (\(\spadesuit \)):

Claim

Let \(\mathbb {K}\subseteq \mathbb {G}\) be a linear subtog with and \(\Phi (\mathbb {K}) \ne \emptyset \). Then there exists a unique defect \(\textbf{D}\in {{\,\mathrm{{\mathcal {D}}}\,}}(H_1,H_2)\) with \(\textbf{D}\simeq {{\,\mathrm{{\textsf{relax}}}\,}}(\mathbb {K})\) and .

Proof

Since \(\Phi (\mathbb {K})\) is non-empty there exists at least one embedding-pair \((\phi _1,\phi _2)\) with , and \(V(\mathbb {K}) = V(\phi _1(\textbf{H}_1)) \cup V(\phi _2(\textbf{H}_2))\). Define \(\xi :V(\textbf{H}) \rightarrow V(\mathbb {K})\) to be \(\xi (u) = \phi _1(u)\) for \(u \in \textbf{H}_1\) and \(\xi (u) = \phi _2(u)\) for \(u \in \textbf{H}_2 \setminus \textbf{H}_1\). Then \(\xi \) fulfils all the properties of Definition 9, meaning that

where we applied Lemma 5 with \(\textbf{K}:= {{\,\mathrm{{\textsf{relax}}}\,}}(\mathbb {K})\) in the second step.

Now note that the etog \(\textbf{K}\) satisfies and \(V(\textbf{K}) = V(\phi _1(\textbf{H}_1)) \cup V(\phi _2(\textbf{H}_2))\). Therefore \(\textbf{K}\) is, up to relabelling, a defect of \(\textbf{H}_1, \textbf{H}_2\). Concretely, there exists a unique \(\textbf{D}\in {{\,\mathrm{{\mathcal {D}}}\,}}(\textbf{H}_1,\textbf{H}_2)\) such that \(\textbf{D}\simeq \textbf{K}\) and accordingly

from which the claim follows. \(\square \)

With this claim, we can now rewrite the sum \((\spadesuit )\) as follows:


Since we are now summing only over subtogs whose relaxation is isomorphic to a defect in \({{\,\mathrm{{\mathcal {D}}}\,}}(\textbf{H}_1,\textbf{H}_2)\), we can drop the condition that :

The sum \(\sum _{\mathbb {K}\subseteq \mathbb {G}} \llbracket \textbf{D}\simeq {{\,\mathrm{{\textsf{relax}}}\,}}(\mathbb {K}) \rrbracket \) counts how many subtogs of \(\mathbb {G}\) are isomorphic to \(\textbf{D}\). This is the same as counting how often \(\textbf{D}\) embeds into \(\mathbb {G}\) divided by the number of automorphisms \(\textbf{D}\) has. We finally arrive at

With these alternative computations of the sums (\(\clubsuit \)) and (\(\spadesuit \)) we finally arrive at the claimed equality

\(\square \)


We next prove that the recurrence implied by the equation in Lemma 6 has bounded depth, which shows that the decomposition algorithm does indeed terminate. Here the recurrence terminates if the first argument is a linear graph, as these cannot be decomposed further and, as described in the previous sections, these graphs can be counted in linear time. We are not interested in precise bounds here; instead, we provide relevant measurements for small graphs of interest in Table 1 in the last section.

Lemma 7

The recurrence for as stated in Lemma 6 has depth at most \(|\textbf{H}|^2\).

Proof

Let \(h := |\textbf{H}|\). We argue that the measure

$$\begin{aligned} f(\textbf{G}) = (h-|{{\,\mathrm{{\textsf{stem}}}\,}}(\textbf{G})|)h + \kappa (\textbf{G}- {{\,\mathrm{{\textsf{stem}}}\,}}(\textbf{G})), \end{aligned}$$

where \(\kappa \) denotes the number of connected components, strictly decreases for all graphs involved in the right-hand side of the recurrence. That is, we show that \(f(\textbf{K}) < f(\textbf{H})\) for all togs \(\textbf{K}\in \{\textbf{H}_1, \textbf{H}_2 \} \cup {{\,\mathrm{{\mathcal {D}}}\,}}(\textbf{H}_1,\textbf{H}_2)\). All these graphs have at most h vertices; therefore \(\kappa (\textbf{K}- {{\,\mathrm{{\textsf{stem}}}\,}}(\textbf{K})) \le h\). Note therefore that the measure already decreases if \(|{{\,\mathrm{{\textsf{stem}}}\,}}(\textbf{K})| > |{{\,\mathrm{{\textsf{stem}}}\,}}(\textbf{H})|\), independent of the value of \(\kappa (\textbf{K}- {{\,\mathrm{{\textsf{stem}}}\,}}(\textbf{K}))\).

First consider \(\textbf{H}_1, \textbf{H}_2\). Since \(\textbf{H}_1 \oplus _{{\bar{x}}} \textbf{H}_2 = \textbf{H}\), we have that \(\kappa (\textbf{H}_i - {{\,\mathrm{{\textsf{stem}}}\,}}(\textbf{H})) < \kappa (\textbf{H}- {{\,\mathrm{{\textsf{stem}}}\,}}(\textbf{H}))\) for \(i \in \{1,2\}\). Thus for \({{\,\mathrm{{\textsf{stem}}}\,}}(\textbf{H}_i) = {{\,\mathrm{{\textsf{stem}}}\,}}(\textbf{H})\) we have \(f(\textbf{H}_i) < f(\textbf{H})\); as observed above, the same holds true when \(|{{\,\mathrm{{\textsf{stem}}}\,}}(\textbf{H}_i)| > |{{\,\mathrm{{\textsf{stem}}}\,}}(\textbf{H})|\).

Thus we are left to prove that the measure decreases for all \(\textbf{D}\in {{\,\mathrm{{\mathcal {D}}}\,}}(\textbf{H}_1, \textbf{H}_2)\). Let \(\phi \) be the embedding with fixed points \({\bar{x}}\) (recall that \(\textbf{H}_1\) is simply a subtog of \(\textbf{D}\)). Since \(\textbf{H}\not \hookrightarrow {} \textbf{D}\), we conclude that \(\textbf{H}_1 - {\bar{x}}\) and \(\phi (\textbf{H}_2) - {\bar{x}}\) must either be connected by an edge or share a vertex in \(\textbf{D}\). In either case, \(\kappa (\textbf{D}- {\bar{x}}) < \kappa (\textbf{H}- {\bar{x}})\). Accordingly, the measure decreases if \(\bar{x} = {{\,\mathrm{{\textsf{stem}}}\,}}(\textbf{D})\). If that is not the case, we necessarily have that \(|{{\,\mathrm{{\textsf{stem}}}\,}}(\textbf{D})| > |{{\,\mathrm{{\textsf{stem}}}\,}}(\textbf{H})|\). We conclude that the measure indeed decreases in either case.

Finally, note that once the measure is 0 we have that \(|{{\,\mathrm{{\textsf{stem}}}\,}}(\textbf{K})| = h\) and the etog in question is therefore linear. Since we are only considering connected patterns, the maximum value the measure can possibly take is \((h-1)h + h - 1 = h^2-1\). We conclude that the recurrence has depth at most \(|\textbf{H}|^2\). \(\square \)

3.3 Computing Defects

For the remainder of this section, fix \(\textbf{H}= \textbf{H}_1 \oplus _{{\bar{x}}} \textbf{H}_2\) where \({\bar{x}} := {{\,\mathrm{{\textsf{stem}}}\,}}(\textbf{H})\). Let also \(V_1 := V(\textbf{H}_1) - {\bar{x}}\) and \(V_2 := V(\textbf{H}_2) - {\bar{x}}\) be the vertex sets exclusive to \(\textbf{H}_1\) and \(\textbf{H}_2\).

Definition 12

(Monotone) Let \(\preccurlyeq \) be a partial order over a set S and let \(M \subseteq \binom{S}{2}\) be a matching. Let further \(\vec {D}\) be the digraph representation of \(\preccurlyeq \). We say that M is monotone with respect to \(\preccurlyeq \) if the digraph obtained from \(\vec {D}\) by identifying the pairs in M is a dag.

Definition 13

(Defect map) A defect map is a bijection \(\kappa :{\bar{x}} \cup \tilde{V}_1 \rightarrow {\bar{x}} \cup {\tilde{V}}_2\) for subsets \({\tilde{V}}_1 \subseteq V_1\) and \({\tilde{V}}_2 \subseteq V_2\) with the following properties:

  • \(\kappa \) is an isomorphism between \(H[{\bar{x}} \cup {\tilde{V}}_1]\) and \(H[{\bar{x}} \cup {\tilde{V}}_2]\),

  • the matching \(\{x\kappa (x) \mid x \in {\tilde{V}}_1\}\) is monotone with respect to \(\preccurlyeq _\textbf{H}\).

In the following we construct a set of etogs \({\mathcal {D}}'\) and prove that it is precisely \({{\,\mathrm{{\mathcal {D}}}\,}}(\textbf{H}_1,\textbf{H}_2)\). Given the decomposition \(\textbf{H}_1 \oplus _{{\bar{x}}} \textbf{H}_2\) of \(\textbf{H}\), we generate the etogs in \({\mathcal {D}}'\) as follows:

  1. 1.

    Select appropriate subsets \({\tilde{V}}_1 \subseteq V_1\) and \({\tilde{V}}_2 \subseteq V_2\) and a defect map \(\kappa :{\bar{x}} \cup {\tilde{V}}_1 \rightarrow {\bar{x}} \cup {\tilde{V}}_2\). Let \(M := \{x\kappa (x) \mid x \in {\tilde{V}}_1 \}\).

  2. 2.

    Identify the pairs matched by M in H to create the (unordered) graph \(H'\) and create the relation \(\preccurlyeq _M\) from \(\preccurlyeq _\textbf{H}\) by the same process.

  3. 3.

    Select a set \(E^+ \subseteq (V_1 - {\tilde{V}}_1) \times (V_2 - {\tilde{V}}_2)\) with \(E^+ \cap E(H) = \emptyset \) and add it to \(H'\); we only allow \(E^+ = \emptyset \) if \({\tilde{V}}_1, {\tilde{V}}_2 \ne \emptyset \).

  4. 4.

    For every linear ordering \(\le \) of \(V(H')\) that is compatible with \(\preccurlyeq _M\), add the graph \({{\,\mathrm{{\textsf{relax}}}\,}}((H',\le ))\) to \({\mathcal {D}}'\).

For compatibility with Definition 11, whenever we identify vertices \(xy \in M\), we label the resultant vertex x; thus \(V_1 \subseteq V(H')\).
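For small patterns, steps 1 and 2 can be carried out by brute force. The sketch below (our own code and data layout; as in the proof of Lemma 8 below, the maps are restricted to be the identity on the stem, and steps 3 and 4 of the procedure are not shown) enumerates defect maps by testing every candidate bijection for the isomorphism and monotonicity conditions.

```python
from itertools import chain, combinations, permutations

def subsets(xs):
    return chain.from_iterable(combinations(xs, k) for k in range(len(xs) + 1))

def is_dag(vertices, arcs):
    """Standard DFS-based cycle check on the digraph (vertices, arcs)."""
    color = {v: 0 for v in vertices}                        # 0 new, 1 on stack, 2 done
    succ = {v: [w for (u, w) in arcs if u == v] for v in vertices}
    for s in vertices:
        if color[s]:
            continue
        color[s] = 1
        stack = [(s, iter(succ[s]))]
        while stack:
            v, it = stack[-1]
            w = next(it, None)
            if w is None:
                color[v] = 2
                stack.pop()
            elif color[w] == 1:
                return False
            elif color[w] == 0:
                color[w] = 1
                stack.append((w, iter(succ[w])))
    return True

def defect_maps(adj, strict_pairs, stem, V1, V2):
    """Yield pairs (kappa, M): kappa is a defect map fixing the stem, M its matching.
    `adj` is the adjacency of H and `strict_pairs` the arc set of the digraph
    representation of the tree order of H."""
    for T1 in subsets(V1):
        for T2 in subsets(V2):
            if len(T1) != len(T2):
                continue
            for image in permutations(T2):
                kappa = dict(zip(T1, image))
                kappa.update({x: x for x in stem})
                dom = list(stem) + list(T1)
                # kappa must be an isomorphism between H[stem + T1] and H[stem + T2]
                if not all((v in adj[u]) == (kappa[v] in adj[kappa[u]])
                           for u, v in combinations(dom, 2)):
                    continue
                # monotonicity: contracting the matched pairs must leave a dag
                rep = {kappa[x]: x for x in T1}             # contract kappa(x) onto x
                arcs = {(rep.get(u, u), rep.get(v, v)) for (u, v) in strict_pairs
                        if rep.get(u, u) != rep.get(v, v)}
                verts = {rep.get(v, v) for v in adj}
                if is_dag(verts, arcs):
                    yield kappa, {frozenset({x, kappa[x]}) for x in T1}
```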

Theorem 1

The above process generates exactly \({{\,\mathrm{{\mathcal {D}}}\,}}(\textbf{H}_1,\textbf{H}_2)\).

We prove Theorem 1 by showing the following two lemmas.

Lemma 8

\({{\,\mathrm{{\mathcal {D}}}\,}}(\textbf{H}_1,\textbf{H}_2) \subseteq {\mathcal {D}}'\).

Proof

Consider \(\textbf{D}\in {{\,\mathrm{{\mathcal {D}}}\,}}(\textbf{H}_1,\textbf{H}_2)\) and let such that \(\phi \) is the identity on the set \((V(\textbf{H}_2)\setminus V(\textbf{H}_1)) \cup {\bar{x}}\). Recall that, by convention, . Let \({\tilde{V}} := V_1 \cap \phi (V_2)\) and define the mapping \(\kappa :{\bar{x}} \cup {\tilde{V}} \rightarrow {\bar{x}} \cup \phi ^{-1}({\tilde{V}})\) as the identity on \({\bar{x}}\) and \(\kappa (x) := \phi ^{-1}(x)\) for \(x \in {\tilde{V}} \subseteq V_1\). Let further \(M := \{x\kappa (x) \mid x \in \tilde{V}\}\).

Claim

\(\kappa \) is a defect map.

Proof

Since and we conclude that \(\kappa \) is an isomorphism of the underlying graphs \(H_1[{\bar{x}} \cup {\tilde{V}}]\) and \(H_2[\kappa ({\bar{x}} \cup {\tilde{V}})]\).

Let \(\vec {O}\) be the digraph representation of \(\preccurlyeq _\textbf{H}\) and \(\vec {O}''\) of \(\preccurlyeq _\textbf{D}\). By construction, \(\vec {O}''\) is precisely the digraph obtained from \(\vec {O}\) by identifying the pairs matched in M. Since \(\textbf{D}\) is a tree-ordered graph, \(\preccurlyeq _\textbf{D}\) is a partial order and thus \(\vec {O}''\) is a dag. In other words, the matching M is monotone with respect to \(\preccurlyeq _\textbf{H}\) and we conclude that \(\kappa \) is a defect map. \(\square \)

Let \({\tilde{V}}_1 := {\tilde{V}}\) and \({\tilde{V}}_2 := \kappa ({\tilde{V}})\) in the following. Define \(E^+ := E(\textbf{D}) \cap ((V_1 - {\tilde{V}}_1)\times (\phi (V_2) - {\tilde{V}}_1))\). Let \(H'\) be the graph obtained from H by identifying the pairs matched by M and adding \(E^+\) to it. Let further \(\preccurlyeq _M\) be the relation obtained from \(\preccurlyeq _{\textbf{H}}\) by identifying the pairs matched by M. It is left to show that there exists a linear order \(\le \) which is compatible with \(\preccurlyeq _M\) and satisfies \(D = {{\,\mathrm{{\textsf{relax}}}\,}}((H',\le ))\). Let \(\vec {O}'\) be the digraph representation of \(\preccurlyeq _M\) and let \(\vec {O}''\) be again the digraph representation of \(\preccurlyeq _\textbf{D}\). Note that the difference between \(\vec {O}'\) and \(\vec {O}''\) consists of arcs corresponding to an orientation \(\vec {E}^+\) of \(E^+\) and of transitive arcs resulting from the addition of \(\vec {E}^+\). Since all edges in \(E^+\) are between \(V_1 - {\tilde{V}}\) and \(\phi (V_2) - {\tilde{V}}\) and those two sets are disjoint, we can choose, for example, to orient \(\vec {E}^+\) by letting all arcs point towards \(\phi (V_2) - \tilde{V}\). Then \(\vec {O}' \cup \vec {E}^+\) is a dag and so is its transitive closure \({\tilde{O}}''\). Now note that every topological ordering \(\le \) of \(\vec {O}''\) is also a topological ordering of \(\vec {O}'\) and we conclude that \(\preccurlyeq _M\) is compatible with \(\preccurlyeq _D\). Since \(\textbf{D}\) is an etog, these orderings also all satisfy \({{\,\mathrm{{\textsf{relax}}}\,}}((H',\le )) = D\). We conclude that \(D \in {\mathcal {D}}'\) and therefore \({{\,\mathrm{{\mathcal {D}}}\,}}(\textbf{H}_1,\textbf{H}_2) \subseteq {\mathcal {D}}'\). \(\square \)

Lemma 9

\({\mathcal {D}}' \subseteq {{\,\mathrm{{\mathcal {D}}}\,}}(\textbf{H}_1,\textbf{H}_2)\).

Proof

Consider \(\textbf{D}\in {\mathcal {D}}'\) and let \({\tilde{V}}_1\), \({\tilde{V}}_2\), \(\kappa \), \(E^+\) and \(\le \) be those choices that generated \(\textbf{D}\). Let also \(H'\) be the graph generated by identifying the pairs matched by \(M := \{x\kappa (x) \mid x \in {\tilde{V}}_1\}\) in H and adding \(E^+\) to the resulting graph. We need to show that \(\textbf{D}\) is indeed a defect; note that by the last step of the construction it is necessarily an etog.

Fig. 2

Top: Patterns (0–6), defects (7–8) and pieces (9–24) needed to count a path on four vertices. The arrangements indicate the tree order, all pattern relaxations except 1,3,4,7 and 8 are linear. Bottom: Algebraic expressions to compute non-linear patterns, gray boxes indicate the stems. The graphs are understood as automorphism-corrected embedding counts (see note below Lemma 6). For example, in order to compute the number of embeddings of defect 8 which map its stem \({\bar{x}}\) onto a vertex pair \({\bar{y}}\) in the host graph, we first need to compute the number of embeddings of the pieces 24 and 10 which likewise map the first two vertices of their stem prefix onto \({\bar{y}}\)

First, let us convince ourselves that \(\textbf{H}\not \hookrightarrow \textbf{D}\). If \({\tilde{V}}_1 \ne \emptyset \), then \(\textbf{D}\) has fewer vertices than \(\textbf{H}\) and thus no embedding can exist. Otherwise, \(E^+\) is non-empty and therefore \(\textbf{D}\) has more edges than \(\textbf{H}\); again, no embedding can exist.

Next, we need to show that . We chose to label the vertices resulting from identifying the pairs in M by their respective endpoints in \({\tilde{V}}_1\). Furthermore, no edge in \(E^+\) has both its endpoints in \({\bar{x}} \cup V_1\), therefore \(\textbf{D}[{\bar{x}} \cup V_1] = \textbf{H}[{\bar{x}} \cup V_1]\) and therefore .

Similarly, we need to show that . Define \(\phi \) to be the identity on \({\bar{x}} \cup V_2 \setminus V_1\) and \(\kappa ^{-1}\) on \({\tilde{V}}_2\). Again, no edge in \(E^+\) has both its endpoints in \(\phi (\textbf{H}_2)\) and hence and therefore .

Finally, it follows directly from the construction of \(\textbf{D}\) that indeed \(V(\textbf{D}) = {\bar{x}} \cup V_1 \cup \phi (V_2) = V(\textbf{H}_1) \cup \phi (\textbf{H}_2)\), thus we conclude that \(\textbf{D}\) is indeed a defect. It follows that \({\mathcal {D}}' \subseteq {{\,\mathrm{{\mathcal {D}}}\,}}(\textbf{H}_1,\textbf{H}_2)\), as claimed (Fig. 2). \(\square \)

4 The Algorithms

In order to efficiently implement the counting algorithm we need a data structure \({\textsf{C}}\) which acts as a map from ordered vertex sets to integers; the idea being that for a fixed pattern relaxation \(\textbf{H}\in {\mathcal {H}}\) with stem \({\bar{x}}\) we store in \({\textsf{C}}[\bar{y}]\), \({\bar{y}} \subset \mathbb {G}\), how many embeddings map the stem \({\bar{x}}\) onto \({\bar{y}}\). We use Lemma 3 or 4 to populate these counters for all linear pieces of \(\textbf{H}\) and then use Lemma 6 to progressively compute counts for larger and larger pieces of \(\textbf{H}\) until we arrive at a count for \(\textbf{H}\) itself. We organize the progressive decompositions of \(\textbf{H}\) and the coefficients resulting from the application of Lemma 6 in a counting dag. Leaves of the counting dag correspond to linear pieces of \(\textbf{H}\), the single source to \(\textbf{H}\) itself. The computation then proceeds from the leaves upwards; a task can be completed as soon as all its out-neighbors have been completed (leaf nodes are completed by applying Lemma 3 or 4).

Repeating this procedure for all pattern relaxations in \({\mathcal {H}}\) and correcting the sum by the number of automorphisms of H then gives us the total number of times H appears as an induced subgraph of G. For convenience, we compute a joint counting dag for all relaxations \({\mathcal {H}}\) and read off the final value from all its source nodes; note that in practice this will save some computations since the counting dags likely have nodes in common.
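The bottom-up order over the counting dag can be made explicit in a few lines. The following Python sketch is illustrative only (the node identifiers and the `dependencies` map are placeholder representations of our choosing, not the notation used below): a task becomes ready once all nodes it points to have been completed.

```python
def processing_order(dependencies):
    """Return an order in which counting-dag nodes can be processed.
    `dependencies` maps every node to the nodes it points to (its
    out-neighbours), i.e. the smaller pieces and defects whose counts
    it needs; leaves map to an empty list."""
    completed, order = set(), []
    pending = set(dependencies)
    while pending:
        ready = [v for v in pending
                 if all(w in completed for w in dependencies[v])]
        if not ready:
            raise ValueError("the dependency relation contains a cycle")
        for v in ready:
            order.append(v)
            completed.add(v)
        pending -= set(ready)
    return order

# Leaves come first, the source(s) last, e.g.:
# processing_order({"H": ["P1", "P2"], "P1": [], "P2": []})
# may return ['P1', 'P2', 'H'] (the order among the leaves is arbitrary).
```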

We first outline the notation and the operations that \({\textsf{C}}\) must support and discuss how it can be implemented; we then describe the counting dag and finally present the algorithms. We will assume in the following that \(\mathbb {G}\) is a linear ordering of G. We present the algorithm with a dependence on \( {\text {wcol}}_{|H|} (\mathbb {G})\) and show what modifications have to be made to arrive at an algorithm depending on \( {\text {col}}_{|H|} (\mathbb {G})\) instead. At the end of the section we will use Propositions 3 and 5 to express the running times in terms of \( {\text {wcol}}_{|H|} (G)\) and \( {\text {col}}_{|H|} (G)\), respectively.

4.1 Counting Data Structure

The counting data structure \({\textsf{C}}\) of depth d is a map from ordered vertex sets \({\bar{y}} \subseteq \mathbb {G}\) of length d to non-negative integers \({\textsf{C}}[{\bar{y}}]\). Initially, the counting data structure contains a count of zero for every possible key. We write \(|{\textsf{C}}|\) to denote the number of keys stored in \({\textsf{C}}\) with non-zero counts. The data structure supports the following queries and modifications:

  • Increment count \({\textsf{C}}[{\bar{y}}]\) by any integer for tuples \({\bar{y}}\) of length d in time O(d);

  • Answer the prefix query

    $$\begin{aligned} {\textsf{C}}[{\bar{y}}] := \sum _{\begin{array}{c} {\bar{z}}: |{\bar{z}}| = d \\ \text {and}~ {\bar{z}}|_ r= {\bar{y}} \end{array}} {\textsf{C}}[{\bar{z}}] \end{aligned}$$

    for tuples \({\bar{y}}\) of length \(r \le d\) in time O(r);

  • for \(\gamma \in {\mathbb {R}}\) we can compute the scalar product \(\gamma {\textsf{C}}\) with

    $$\begin{aligned} (\gamma {\textsf{C}})[{\bar{y}}] := \gamma \cdot {\textsf{C}}[{\bar{y}}] \quad \forall {\bar{y}} \in V(\mathbb {G})^d \end{aligned}$$

    in time \(O(d |{\textsf{C}}|)\).

Given two counting data structures \({\textsf{C}}_1, {\textsf{C}}_2\) of depth \(\ge r\) the following two operations must be supported:

  • The r-depth difference \({\textsf{C}}_1 -_r {\textsf{C}}_2\) with

    $$\begin{aligned} ({\textsf{C}}_1 -_r {\textsf{C}}_2)[{\bar{y}}] := {\textsf{C}}_1[{\bar{y}}] - {\textsf{C}}_2[{\bar{y}}] \quad \forall {\bar{y}} \in V(\mathbb {G})^r \end{aligned}$$

    in time \(O(r \cdot \max (|{\textsf{C}}_1|,|{\textsf{C}}_2|))\);

  • the r-depth product \({\textsf{C}}_1 \mathbin {*}_r {\textsf{C}}_2\) with

    $$\begin{aligned} ({\textsf{C}}_1 \mathbin {*}_r {\textsf{C}}_2)[{\bar{y}}] := {\textsf{C}}_1[{\bar{y}}] \cdot {\textsf{C}}_2[{\bar{y}}] \quad \forall {\bar{y}} \in V(\mathbb {G})^r \end{aligned}$$

    in time \(O(r \cdot \max (|{\textsf{C}}_1|,|{\textsf{C}}_2|))\).

A convenient way to implement \({\textsf{C}}\) is as a prefix trie in which every node contains a counter (holding the sum of all values stored below it) and a dynamically sized hash map for its descendants. It is trivial to update the counters during an increment, and answering the prefix query \({\textsf{C}}[{\bar{y}}]\) amounts to locating the node with prefix \({\bar{y}}\) in \({\textsf{C}}\) and returning its counter in time \(O(|{\bar{y}}|)\).
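For illustration, a minimal Python sketch of such a prefix trie follows; the class and field names are ours and not part of the implementation discussed in Sect. 5. It supports increments and prefix queries within the stated time bounds.

```python
class CountingTrie:
    """Prefix trie sketch: every node stores the sum of all counts in its
    subtree plus a hash map from the next vertex on the key to a child."""

    def __init__(self, depth):
        self.depth = depth                       # length d of the stored keys
        self.root = {"count": 0, "children": {}}

    def increment(self, key, delta=1):
        """Add `delta` to C[key] for an ordered tuple of length d; O(d)."""
        assert len(key) == self.depth
        node = self.root
        node["count"] += delta
        for v in key:
            node = node["children"].setdefault(v, {"count": 0, "children": {}})
            node["count"] += delta

    def prefix_query(self, prefix):
        """Sum of C[z] over all stored keys z whose length-r prefix is
        `prefix`; runs in O(r) by walking down to the node for `prefix`."""
        node = self.root
        for v in prefix:
            node = node["children"].get(v)
            if node is None:
                return 0
        return node["count"]
```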

Since we can easily enumerate all keys contained in \({\textsf{C}}\) by a depth-first traversal, implementing the scalar product can be done by first creating an empty counting data structure \({\textsf{C}}'\) and inserting all keys \({\bar{x}}\) contained in \({\textsf{C}}\) by incrementing the value of \({\textsf{C}}'[{\bar{x}}]\) by \(\gamma {\textsf{C}}[{\bar{x}}]\). The DFS on \({\textsf{C}}\) takes time \(O(d \cdot |{\textsf{C}}|)\) and each insertion takes time O(d), hence the claimed running time follows (Fig. 3).

To perform the r-depth difference and product we traverse the two tries \({\textsf{C}}_1\) and \({\textsf{C}}_2\) in lockstep, meaning that we only descend in the DFS if the two currently active nodes \(x_1\) in \({\textsf{C}}_1\) and \(x_2\) in \({\textsf{C}}_2\) both have a child with the same respective key, and truncating the DFS at depth r. During this traversal, it is easy to populate a new trie to obtain the final result \(({\textsf{C}}_1 -_r {\textsf{C}}_2)\) or \(({\textsf{C}}_1 \mathbin {*}_r {\textsf{C}}_2)\). The lockstep DFS takes time \(O(\max (|{\textsf{C}}_1|,|{\textsf{C}}_2|))\), each insertion into the resultant trie takes time O(r) and the running time follows.
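Continuing the sketch above, the scalar product and the lockstep traversal might look as follows; the function names are again placeholders, and we use Python's `operator` module to pass the combining operation. The counters of the inner nodes of the result are recomputed as subtree sums so that subsequent prefix queries remain meaningful.

```python
import operator

def enumerate_keys(node, prefix=()):
    """DFS over a trie node, yielding (key, count) for every full-length key."""
    if not node["children"]:
        yield prefix, node["count"]
    for v, child in node["children"].items():
        yield from enumerate_keys(child, prefix + (v,))

def scale(trie, gamma):
    """Scalar product gamma * C: rebuild the trie with scaled counts."""
    result = CountingTrie(trie.depth)
    for key, count in enumerate_keys(trie.root):
        if count:
            result.increment(key, gamma * count)
    return result

def lockstep(node1, node2, r, op):
    """Combine two tries up to depth r (op is e.g. operator.mul for the
    r-depth product or operator.sub for the r-depth difference).  As in the
    text, we only descend into children present in both tries; at depth r
    the two counts are combined with `op`."""
    out = {"count": 0, "children": {}}
    if r == 0:
        out["count"] = op(node1["count"], node2["count"])
        return out
    for v, child1 in node1["children"].items():
        child2 = node2["children"].get(v)
        if child2 is not None:
            child = lockstep(child1, child2, r - 1, op)
            out["children"][v] = child
            out["count"] += child["count"]
    return out

# r-depth product and difference of two tries C1, C2 (of depth >= r):
# product    = lockstep(C1.root, C2.root, r, operator.mul)
# difference = lockstep(C1.root, C2.root, r, operator.sub)
```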

Fig. 3

Task-dag for counting a path on four vertices, as depicted in Fig. 2. Blue arcs belong to edges \(E^\times \), gray arcs to \(E^-\). Note that the coefficients are for the automorphism-corrected counts, therefore the hyperedges \(E^\times \) need to be imbued with a weight as well (Color figure online)

4.2 The Counting Dag

A counting dag is a directed hypergraph \(({\mathcal {V}}, E^\times , E^-)\) with two types of edges. \(E^\times \subseteq {\mathcal {V}}^3\) contains edges of the form \((\textbf{H},\textbf{H}_l,\textbf{H}_r)\) with \(\textbf{H}= \textbf{H}_r \oplus _{{\bar{x}}} \textbf{H}_l\) which indicate that in order to compute the count for \(\textbf{H}\) by application of Lemma 6, we need to compute the counts for \(\textbf{H}_l\) and \(\textbf{H}_r\) first because they appear in the product on the right hand side. Every node in \({\mathcal {V}}\) has at most one outgoing hyperedge in \(E^\times \). Also note that \(\textbf{H}_l = \textbf{H}_r\) is possible.

[Algorithm 1]

Similarly, \(E^- \subseteq {\mathcal {V}}^2 \times {\mathbb {R}}\) contains edges of the form \((\textbf{H}, \textbf{D}, \gamma )\) where \(\textbf{D}\) is a defect of \(\textbf{H}_l, \textbf{H}_r\); in order to compute the count for \(\textbf{H}\) from the product of the counts for \(\textbf{H}_l\) and \(\textbf{H}_r\), we need to subtract the count for \(\textbf{D}\), weighted by \(\gamma \), for all such edges.

For two counting dags \(\vec {C}, \vec {C}'\) we write \(\vec {C} \cup \vec {C'}\) to denote the union of their vertex and edge sets. With that notation in mind, Algorithm 1 shows how to compute a counting dag for a given etog \(\textbf{H}\). Note that the choice of decomposition \(\textbf{H}= \textbf{H}_1 \oplus _{{\bar{x}}} \textbf{H}_2\) is arbitrary, reasonable choices include either letting \(\textbf{H}_1\) be as small as possible or trying to balance the size of \(\textbf{H}_1\) and \(\textbf{H}_2\).
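A counting dag can be represented quite plainly; the following Python sketch (with placeholder field names of our choosing) records, for every node, its at most one outgoing product edge and its weighted defect edges, and implements the union of two counting dags by merging vertex and edge sets. It assumes, as memoization guarantees, that a node shared by two dags carries the same sub-dag in both.

```python
from dataclasses import dataclass, field

@dataclass
class CountingDag:
    """Sketch of a counting dag (V, E_x, E_-).  Nodes stand for pattern
    relaxations, pieces and defects; how they are encoded is left open."""
    nodes: set = field(default_factory=set)
    product_edges: dict = field(default_factory=dict)  # H -> (H_l, H_r)
    defect_edges: dict = field(default_factory=dict)   # H -> [(D, gamma), ...]

    def add_product_edge(self, h, h_left, h_right):
        """Record the decomposition of h; at most one such edge per node."""
        self.nodes.update((h, h_left, h_right))
        self.product_edges[h] = (h_left, h_right)

    def add_defect_edge(self, h, defect, gamma):
        """Record that the count for `defect`, weighted by gamma, has to be
        subtracted when computing the count for h."""
        self.nodes.update((h, defect))
        self.defect_edges.setdefault(h, []).append((defect, gamma))

    def union(self, other):
        """Union of vertex and edge sets; shared nodes are assumed to carry
        identical sub-dags, so either copy of their edges can be kept."""
        return CountingDag(self.nodes | other.nodes,
                           {**self.product_edges, **other.product_edges},
                           {**self.defect_edges, **other.defect_edges})
```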

Lemma 10

Given a graph H on h vertices, we can construct a counting dag \(\vec {C}(H)\) with \(\Vert \vec {C}\Vert = O(3^{h^2})\) using Algorithm 1 in time \(O(\Vert \vec {C}\Vert )\).

Proof

We enumerate the at most h! etogs of H and run Algorithm 1, then we take the union of all resulting counting dags to obtain \(\vec {C} := \vec {C}(H)\).

If we employ memoization across the calls we can upper-bound the running time and the size of the counting dag by the total number of togs on \(\le h\) vertices.

We arrive at a crude upper bound by noting that there are at most \(O(2.956^h)\) tree orders on h vertices [16, chap. 2.3.4.4], each of which can be completed into a tree-ordered graph by any of at most \(2^{{h \atopwithdelims ()2}}\) edge sets; hence

$$\begin{aligned} \Vert \vec {C}\Vert \le |\vec {C}|^2 = O\Big (\big (2.956^h \cdot 2^{\frac{1}{2}(h^2-h)} \big )^2 \Big ) = O\Big (2^{h^2+2.128h}\Big ) = O\big (3^{h^2}\big ). \end{aligned}$$

We obtain the same bound on the running time:

$$\begin{aligned} O(h! \cdot \Vert \vec {C}\Vert ) = O(2^{h \log h} \cdot 2^{h^2+2.128h}) = O\big (3^{h^2}\big ). \end{aligned}$$

\(\square \)

4.3 The Algorithms

[Algorithm 2]

[Algorithm 3]

The following proofs of the worst-case running time are not very indicative of the algorithms' performance as (a) the term \(O(3^{h^2})\) is a (crude) upper bound on the size of the counting dag and (b) not every pattern of size h needs to use the \((h-1)\)-reachable sets. As an extreme example, the graph \(K_h\) only needs 1-reachable sets, i.e. only a degeneracy ordering of the host graph. We include a table with sizes of counting dags and the necessary depth for various pattern graphs in the subsequent section.

Lemma 11

Algorithm 2 computes the number of induced embeddings of H into G in time \(O(\Vert \vec {C}\Vert \cdot h {\text {wcol}}_{h} (\mathbb {G})^{h-1} |G|) = O(3^{h^2} \cdot h {\text {wcol}}_{h} (\mathbb {G})^{h-1} |G|)\) where \(h := |H|\).

Proof

In part , for every one of the \(\ell := {\text {leaves}}(\vec {C})\) many sinks \(\textbf{H}_i\), \(i \in [t,\ell ]\), of \(\vec {C}\) we fill the counting data structure \(\textsf{C}_i\) in time \(O( {\text {wcol}}_{h} (\mathbb {G})^{h-1} |G|)\) by application of Lemma 3.

Since every counting data structure at the end of part contains at most \(O( {\text {wcol}}_{h} (\mathbb {G})^{h-1} |G|)\) many tuples, it follows that all operations in step on counting data structures (r-depth products, differences, scalar products) can be computed in time \(O(h {\text {wcol}}_{h} (\mathbb {G})^{h-1} |G|)\). The number of such operations is proportional to \(\Vert \vec {C}\Vert \), thus in total step takes time \( O(\Vert \vec {C}\Vert \cdot h {\text {wcol}}_{h} (\mathbb {G})^{h-1} |G|) \). The time taken in step is negligible compared to the previous steps and we conclude that the total running time is as claimed.

The correctness of the algorithm follows by induction over the counting dag: the leaf counts are correct by Lemma 3 and the counts at the internal nodes are correct by Lemma 6. \(\square \)

Combining the above lemma with Propositions 4 and 5, we immediately obtain the following:

Corollary 2

There exists an algorithm that given a graph G and a graph H on h vertices computes the number of times H appears as an induced subgraph in G in total time \(O((3h^2 {\text {wcol}}_{h} (G))^{h^2} |G|)\).

Note that this running time is linear in |G| if G is taken from a class with bounded expansion. Further, if we consider h to be constant, then the running time is \(O(|G|^{1+o(1)})\), i.e. almost linear, when G is taken from a nowhere dense class.

Exchanging Lemma 3 for Lemma 4 in the above proof shows a similar running time for the variants using strong reachability:

Lemma 12

Algorithm 3 computes the number of induced embeddings of H into G in time \(O(\Vert \vec {C}\Vert \cdot h {\text {col}}_{h} (\mathbb {G})^{h-1} |G|) = O(3^{h^2} \cdot h {\text {col}}_{h} (\mathbb {G})^{h-1} |G|)\) where \(h := |H|\).

Corollary 3

There exists an algorithm that given a graph G and a graph H on h vertices computes the number of times H appears as an induced subgraph in G in total time \(O((3h^2 {\text {col}}_{h} (G))^{h^2} |G|)\).

5 Discussion

The goal of our work is to design an algorithm for subgraph counting that has the potential for being useful in practice, which is why we chose to avoid certain tools from the bounded expansion toolkit which, as discussed in the introduction, would immediately render our algorithm unusable. Outside these questions of algorithmic feasibility, we then need to ask whether real-world networks exhibit the presumed properties, namely, whether they “have bounded expansion”. As cited in the Introduction, there is theoretical and empirical evidence supporting the hypothesis that some types of real-world networks can be treated as bounded expansion graphs. However, we still need to address the fact that the notion is ill-defined for single instances.

We could approach this problem by computing statistics like the weak r-coloring number for a network and then decide based on those numbers whether the network has “bounded expansion” or not. This is of course deeply unsatisfactory (though useful as a comparative approach) as we have to draw arbitrary boundaries around how large the statistics should be for any given r and which values of r we should consider.

We believe that the better approach is to side-step the more philosophical question of whether a single given network is better treated as “bounded expansion” or not and instead simply run the algorithm in question and see whether it appears to be useful/competitive. Algorithms designed for bounded expansion classes will, by construction, still work on dense inputs, unlike e.g. certain algorithms that assume that the input graph is planar. Grounding the experiments in known use-cases then also provides us with a natural limit on the ‘depth’ r at which we operate the algorithm. In our case, practical subgraph counting is usually limited to pattern graphs of size \(\le 5\) [23] which provides an upper bound on r for our purposes (see column ‘d’ in Table 1).

With these considerations in mind, let us now discuss our proof-of-concept implementationFootnote 7 along with preliminary experimental results. Several observations on natural extensions of this algorithm follow.

In practice, Algorithms 2 and 3 have a lot of engineering potential. In most cases, the search space for linear patterns is much smaller than the h-weak or strong neighborhoods since previously-fixed vertices will often have the sought vertices in their left neighborhood or in a weak/strong neighborhood at distance less than h. Furthermore, the task dag for a given pattern can be precomputed and optimized; in order to minimize memory use, we can process tasks in an order which enables us to delete counting data structures as soon as they have been propagated along all in-edges.

Since we view the counting dag computation as a form of pre-processing, we implemented this stage using Python and show results for several small pattern graphs in Table 1. We have not yet explored whether different decomposition strategies (i.e. which piece-sum decomposition to choose if there are multiple options) significantly impact the size of these dags. As expected, the counting dag is smaller for denser graphs—for complete graphs the algorithm essentially reduces to the well-known clique-counting algorithm for degenerate graphs.

Table 1 Size and number of leaves for counting dags for various small graphs. The final column gives the reachability-depth d necessary to count the specified pattern. Named graphs can be looked up under http://www.graphclasses.org/smallgraphs.html

For the subgraph counting algorithm, we chose to implement it in Rust. While we recognize that significant additional optimization and engineering is needed, it is notable that runtimes remain reasonable (see Table 2) on host graphs with tens of thousands of nodes even for relatively large patterns (all measurements were taken on a simple laptop with an Intel Core i5 CPU and 4 GB of RAM).

Additionally, we chose to use a very simple heuristic for the vertex ordering by sorting the vertices by descending degree. Nadara et al. showed that these orderings yield acceptable results for weak coloring numbers on real-world networks [17] and are, of course, very fast to compute. In the future, we will investigate the impact of computing better orderings (using e.g. the more involved heuristics discussed in [17]) on the subsequent counting step and how these two phases of the algorithm should be balanced.
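For completeness, the ordering heuristic itself amounts to a single sort; the sketch below assumes the host graph is given as an adjacency map from vertices to neighbour sets.

```python
def degree_descending_order(adjacency):
    """Order the vertices by descending degree, ties broken arbitrarily;
    `adjacency` maps every vertex to the set of its neighbours."""
    return sorted(adjacency, key=lambda v: len(adjacency[v]), reverse=True)

# Example: degree_descending_order({1: {2, 3}, 2: {1}, 3: {1}})  ->  [1, 2, 3]
```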

Table 2 Runtimes for counting several common patterns in five real-world networks

We plan to engineer these implementations further and compare this approach to other subgraph-counting algorithms on a larger corpus of host and pattern graphs in future work. Note that it is straightforward to extend our algorithm to edge- and vertex-labelled graphs by defining isomorphisms and embeddings appropriately. We chose not to include labels here as they add another layer of notation that would make the presentation less clear.

We also, for simplicity, assumed that the pattern graph H is connected. This is easily remedied by a labelled version of the algorithm: we add an apex vertex with a unique label to both H and G and make it the minimum in \(\mathbb {G}\). Alternatively, the presented algorithm can be modified by allowing piece-sums to work on connected components. This modification does not significantly change the algorithm, but adds additional cases in many proofs.
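A sketch of this apex construction, assuming the graph is stored as an adjacency map and that the unique label is handled elsewhere, could look as follows.

```python
def add_apex(adjacency, apex="apex"):
    """Return a copy of the graph with one extra vertex adjacent to every
    original vertex; placing it first in the ordering makes it the minimum."""
    assert apex not in adjacency
    augmented = {v: set(neigh) | {apex} for v, neigh in adjacency.items()}
    augmented[apex] = set(adjacency)
    return augmented
```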

Finally, we observe that the approach presented here can be modified to count non-induced subgraphs, subgraph homomorphisms or boolean queries instead by adjusting the notions of patterns and pattern decompositions appropriately.