1 Introduction

Labeled graphs can be used for modeling various kinds of objects, such as chemical compounds, images, molecular structures, and many more. Because of this flexibility, labeled graphs have received increasing attention over the past years. One task researchers have focused on is the following: Given a database \(\mathscr {G}\) that contains labeled graphs, find all graphs \(G\in \mathscr {G}\) that are sufficiently similar to a query graph H or find the k graphs from \(\mathscr {G}\) that are most similar to H [23, 29, 66]. Being able to quickly answer graph similarity queries of this kind is crucial for the development of performant pattern recognition techniques in various application domains [62], such as keyword spotting in handwritten documents [61] and cancer detection [48].

For answering graph similarity queries, a distance measure between two labeled graphs G and H has to be defined. A very flexible, sensitive, and therefore widely used measure is the graph edit distance (GED), which is defined as the minimum cost of an edit path between G and H [20, 59]. An edit path is a sequence of graphs starting at G and ending at a graph that is isomorphic to H such that each graph on the path can be obtained from its predecessor by applying one of the following edit operations: adding or deleting an isolated node or an edge, and relabeling an existing node or edge. Each edit operation comes with an associated non-negative edit cost, and the cost of an edit path is defined as the sum of the costs of its edit operations. GED inherits metric properties from the underlying edit costs [36]. For instance, if \(\mathbb {G}\) is the domain of graphs with real-valued node and edge labels and the edit costs are defined as the Euclidean distances between the labels, then GED is a metric on \(\mathbb {G}\).

Of course, computing GED is not the only way for assessing whether or not two graphs are similar. In particular, one popular approach is to embed the graphs into multidimensional vector spaces and then to compare their vector representations [19, 23, 29, 66]. The main advantage of this paradigm is that it allows for fast computations. On the other hand, a substantial part of the information encoded in the original graphs is lost when embedding them into vector spaces. If the graphs are large (e.g., social networks or street networks), this information loss is tolerable. However, there are application domains such as keyword spotting in handwritten documents, cancer detection, and drug discovery, where the graphs are quite small and where local information that would be lost by embedding them into vector spaces is crucial [62]. GED is mainly used for these application domains, i.e., in settings where we have to answer fine-grained similarity queries for (possibly very many) rather small graphs.

Computing GED is a very difficult problem. In fact, it has been shown that the problem of computing GED is NP-hard even for uniform edit costs [68] and APX-hard for metric edit costs [45]. Even worse: since, by definition of GED, it holds that \(\mathrm{GED} (G,H)=0\) just in case G and H are isomorphic, approximating GED within any approximation ratio is GI-hard. These theoretical complexities are mirrored by the fact that, in practice, no available exact algorithm can reliably compute GED on graphs with more than 16 nodes [10].

Because of the hardness of exactly computing GED or approximating it within provable approximation ratios, during the past years, a huge variety of heuristics have been proposed that approximate GED via lower or upper bounds, using techniques such as transformations to the linear sum assignment problem with error correction, linear programming, and local search. Since no theoretical guarantees can be provided for the produced bounds, the heuristics are always evaluated empirically. The most frequently used evaluation criteria are the following:

  1. C1: Runtime behavior of the heuristics.

  2. C2: Tightness of the produced lower or upper bounds.

  3. C3: Performance of pattern recognition frameworks that use the bounds produced by the heuristics as underlying distance measures.

In this paper, we provide a systematic overview of the most important heuristics for the computation of GED. Figure 1 shows the suggested taxonomy. Whenever possible, we model the compared heuristics as instantiations of one of the following three paradigms: LSAPE-GED, LP-GED, and LS-GED. Instantiations of LSAPE-GED use transformations to the linear sum assignment problem with error correction (LSAPE) for heuristically computing GED. All instantiations of LSAPE-GED produce upper bounds; some also yield lower bounds. Instantiations of LP-GED compute lower and upper bounds for GED by employing linear programming (LP) relaxations of mixed integer programming (MIP) formulations of GED. Finally, instantiations of the paradigm LS-GED improve initially computed or randomly generated upper bounds by using variants of local search.

Locating the presented heuristics for GED within the suggested taxonomy has two main advantages: Firstly, it allows us to clearly highlight differences and similarities between the presented heuristics. Secondly, the suggested taxonomy provides guidance for implementing the heuristics in a clean, code-efficient, and comparable way, as common constituents of all instantiations of one of the paradigms can be implemented within an interface representing the paradigm. For instance, all instantiations of LSAPE-GED must have access to a solver for LSAPE. This solver can be implemented or called from an interface that represents the paradigm LSAPE-GED.

Fig. 1 Suggested taxonomy for heuristics for GED computation

We carried out extensive experiments in order to test how the compared heuristics perform w.r.t. the evaluation criteria C1 to C3. For enhancing comparability, we reimplemented all heuristics within the C++ library GEDLIB and ensured that they use the same data structures and subroutines whenever possible. GEDLIB mirrors the taxonomy displayed in Fig. 1, i.e., we implemented the paradigms LSAPE-GED, LP-GED, and LS-GED as abstract classes and their instantiations as derived classes. GEDLIB is freely available on GitHub: https://github.com/dbblumenthal/gedlib/.

An alternative view of the upper and lower bounds produced by heuristic algorithms for GED is to not regard them as proxies for GED, but rather as independent distance measures for labeled graphs whose design is guided by GED. With this interpretation, two meta-questions naturally arise:

  1. Q1: Is it indeed beneficial to use GED as guidance for the design of graph distance measures, if these distance measures are to be used within pattern recognition frameworks?

  2. Q2: Do graph distance measures defined by upper bounds for GED or graph distance measures defined by lower bounds for GED perform better when used within pattern recognition frameworks?

To the best of our knowledge, these questions have not been explicitly discussed in the literature. In this paper, we intend to fill this gap. For addressing Q1, we evaluate if performing well w.r.t. the evaluation criterion C3 is positively correlated with performing well w.r.t. C2. For addressing Q2, we check if, globally, heuristics producing lower bounds or heuristics producing upper bounds perform better w.r.t. C3. In sum, our paper contains the following contributions:

  • We suggest a taxonomy for algorithms that heuristically compute GED.

  • We present the most important existing heuristics within this taxonomy.

  • We present the results of an extensive empirical evaluation of all compared heuristics. For carrying out the experiments, all heuristics were reimplemented in C++.

  • We empirically address the question whether or not GED constitutes good guidance for the design of graph distance measures to be used for pattern recognition tasks.

  • We empirically address the question whether lower or upper bounds for GED perform better when used for pattern recognition tasks.

The remainder of this paper is organized as follows: In Sect. 2, related work is discussed. In Sect. 3, important concepts and notations that are used throughout the paper are introduced. In Sect. 4, a first overview of the compared heuristics is provided and their most important properties are summarized in a comparative way. In Sects. 5–7, heuristics that instantiate the paradigms LSAPE-GED, LP-GED, and LS-GED are presented. In Sect. 8, miscellaneous heuristics that cannot be modeled as instantiations of one of the paradigms are presented. In Sect. 9, the outcomes of the experimental evaluation are presented. Section 10 concludes the paper. Appendix A contains short descriptions of the test datasets, and Appendix B contains further figures for visualizing the results of the experiments.

2 Related work

To the best of our knowledge, the present paper offers the first comprehensive comparative evaluation of algorithms for heuristically computing GED available in the literature. Nonetheless, there are similar works. In [10], some of the most important algorithms for exactly computing GED are evaluated. Structurally, [10] is very similar to the present paper, but its scope is different, as heuristics are not considered.

In [1], the results of a graph edit distance contest are reported. Authors of exact or heuristic algorithms for GED could submit binaries of their algorithms, which were run and compared w.r.t. a set of evaluation criteria. As only the submitted algorithms are described in [1], many important GED heuristics are not covered. Moreover, [1] differs from the present paper in that it empirically evaluates implementations rather than algorithms.

In another survey [31], the main focus is on methods that learn good edit costs for given datasets. Only a few algorithms that aim at computing GED for fixed edit costs are presented, and those that are presented are mostly designed for special graphs without node or edge labels. Moreover, the presented algorithms are not compared empirically.

The surveys [19, 23, 29, 66] provide broader overviews of graph-based methods for pattern recognition. In addition to GED, related topics such as exact graph matching and graph kernels are discussed. Like in [31], the presented methods are not compared experimentally in [19, 23, 29, 66].

Methods for GED are also discussed in the books [49, 52]. In [52], some algorithms for exactly and heuristically computing GED are described and empirically evaluated, but the main focus is on how to use GED for defining graph kernels and vector space embeddings that can be employed for clustering and classification. The book [49] exclusively treats GED heuristics, but has a very narrow scope: While a few heuristics are described and tested in great detail, many others are not covered.

Finally, in [67, 69, 71], it is discussed how to index graph databases for efficiently answering graph similarity queries when GED is used as the underlying distance measure. However, as these papers focus on indexing techniques rather than on the design of heuristics, we do not present their findings in this survey.

3 Preliminaries

Since the graphs for which GED-based methods are applied are mostly undirected [2, 50, 62], most heuristics for GED are presented for undirected labeled graphs, although they can usually be easily generalized to directed graphs. For ease of presentation, we also restrict ourselves to undirected graphs in this paper. For the generalizations of the presented heuristics to directed graphs, we refer to the original publications.

An undirected labeled graph G is a 4-tuple \(G=(V^G,E^G,\ell ^G_V,\ell ^G_E)\), where \(V^G\) and \(E^G\) are sets of nodes and edges, \(\varSigma _V\) and \(\varSigma _E\) are label alphabets, and \(\ell ^G_V:V^G\rightarrow \varSigma _V\), \(\ell ^G_E:E^G\rightarrow \varSigma _E\) are labeling functions. The dummy symbol \(\varepsilon \) denotes dummy nodes and edges as well as their labels. Throughout the paper, we denote the nodes of a graph G by \(V^G:=\{u_i\mid i\in [|V^G|]\}\) and the nodes of a graph H by \(V^H:=\{v_k\mid k\in [|V^H|]\}\). Furthermore, we use the notation \((u_i,u_j):=(u_j,u_i):=\{u_i,u_j\}\in E^G\) to denote that there is an undirected edge in G that connects the nodes \(u_i\) and \(u_j\).

Let \(\mathbb {G}:=\{G\mid {{\,\mathrm{img}\,}}(\ell ^G_V)\subseteq \varSigma _V\wedge {{\,\mathrm{img}\,}}(\ell ^G_E)\subseteq \varSigma _E\}\) be the domain of graphs with node labels from \(\varSigma _V\) and edge labels from \(\varSigma _E\). A function \(c_V:(\varSigma _V\cup \{\varepsilon \})\times (\varSigma _V\cup \{\varepsilon \})\rightarrow {\mathbb {R}} _{\ge 0}\) is a node edit cost function for \(\mathbb {G}\) just in case, for all \((\alpha ,\alpha ^\prime )\in (\varSigma _V\cup \{\varepsilon \})\times (\varSigma _V\cup \{\varepsilon \})\), \(c_V(\alpha ,\alpha ^\prime )=0\) holds if and only if \(\alpha =\alpha ^\prime \). Similarly, \(c_E:(\varSigma _E\cup \{\varepsilon \})\times (\varSigma _E\cup \{\varepsilon \})\rightarrow {\mathbb {R}} _{\ge 0}\) is an edge edit cost function for \(\mathbb {G}\) just in case, for all \((\beta ,\beta ^\prime )\in (\varSigma _E\cup \{\varepsilon \})\times (\varSigma _E\cup \{\varepsilon \})\), \(c_E(\beta ,\beta ^\prime )=0\) holds if and only if \(\beta =\beta ^\prime \). We say that a node edit cost function \(c_V\) is constant just in case there are constants \(c^{\mathrm{sub}}_V,c^{\mathrm{del}}_V,c^{\mathrm{ins}}_V\in {\mathbb {R}} \) such that \(c_V(\alpha ,\alpha ^\prime )=c^{\mathrm{sub}}_V\), \(c_V(\alpha ,\varepsilon )=c^{\mathrm{del}}_V\), and \(c_V(\varepsilon ,\alpha ^\prime )=c^{\mathrm{ins}}_V\) hold for all \((\alpha ,\alpha ^\prime )\in \varSigma _V\times \varSigma _V\) with \(\alpha \ne \alpha ^\prime \). Constant edge edit costs are defined analogously. We say that the edit cost functions \(c_V\) and \(c_E\) are uniform if and only if both of them are constant and, additionally, we have \(c^{\mathrm{sub}}_V=c^{\mathrm{del}}_V=c^{\mathrm{ins}}_V=c^{\mathrm{sub}}_E=c^{\mathrm{del}}_E=c^{\mathrm{ins}}_E\).

Given fixed edit cost functions \(c_V\) and \(c_E\), GED is defined in terms of the six edit operations and their associated edit costs, which are detailed in Table 1. An edit path P between two graphs \(G,H\in \mathbb {G}\) is a sequence \(P:=(o_i)^{r}_{i=1}\) of edit operations that satisfies \((o_{r}\circ \ldots \circ o_1)(G)\simeq H\), i.e., transforms G into a graph \(H^\prime \) which is isomorphic to H. Its edit cost c(P) is defined as the sum \(c(P):=\sum ^{r}_{i=1}c(o_i)\) of the costs of the contained edit operations. We are now in the position to give a first intuitive definition of GED.

Definition 1

(GED—first definition [10, 36]) The graph edit distance (GED) between two graphs \(G,H\in \mathbb {G}\) is defined as \(\mathrm{GED} (G,H):=\min \{c(P)\mid P\in \varPsi (G,H)\}\), where \(\varPsi (G,H)\) is the set of all edit paths between G and H.

Table 1 Edit operations and edit costs
Table 2 Edit operations and edit costs induced by node map \(\pi \in \varPi (G,H)\); \(u\in V^G\) and \(v\in V^H\) are nodes, \(e\in E^G\) and \(f\in E^H\) are edges

Definition 1 is very intuitive but useless for algorithmic purposes: Since the graph isomorphism problem is not known to be solvable in polynomial time [3], it is not even possible to polynomially recognize an edit path as such, let alone to optimize over the set of all edit paths. For this reason, all exact or approximate GED algorithms work with an alternative definition. This definition crucially uses the concept of a node map between two graphs G and H.

Definition 2

(Node map [10]) Let \(G,H\in \mathbb {G}\) be two graphs. A relation \(\pi \subseteq (V^G\cup \{\varepsilon \})\times (V^H\cup \{\varepsilon \})\) is called node map between G and H if and only if \(|\{v \mid v\in (V^H\cup \{\varepsilon \})\wedge (u,v)\in \pi \}|=1\) holds for all \(u\in V^G\) and \(|\{u \mid u\in (V^G\cup \{\varepsilon \})\wedge (u,v)\in \pi \}|=1\) holds for all \(v\in V^H\). We write \(\pi (u)=v\) just in case \((u,v)\in \pi \) and \(u\ne \varepsilon \), and \(\pi ^{-1}(v)=u\) just in case \((u,v)\in \pi \) and \(v\ne \varepsilon \). \(\varPi (G,H)\) denotes the set of all node maps between G and H. For edges \(e=(u,u^\prime )\in E^G\) and \(f=(v,v^\prime )\in E^H\), we introduce the shorthand notations \(\pi (e):=(\pi (u),\pi (u^\prime ))\) and \(\pi ^{-1}(f):=(\pi ^{-1}(v),\pi ^{-1}(v^\prime ))\).

A node map \(\pi \in \varPi (G,H)\) specifies for all nodes and edges of G and H whether they are substituted, deleted, or inserted. Table 2 details these edit operations.

Definition 3

(Induced edit path) Let \(G,H\in \mathbb {G}\) be graphs, \(\pi \in \varPi (G,H)\) be a node map between them, and O be the set of \(\pi \)’s induced edit operations as specified in Table 2. Then, an ordering \(P_\pi :=(o_r)^{|O|}_{r=1}\) of O is called induced edit path of the node map \(\pi \) if and only if edge deletions come first and edge insertions come last, i.e., if there are indices \(r_1\) and \(r_2\) such that \(o_r\) is an edge deletion just in case \(1\le r< r_1\) and \(o_r\) is an edge insertion just in case \(r_2< r\le |O|\).

It has been shown that induced edit paths are indeed edit paths, i.e., that \(P_\pi \in \varPsi (G,H)\) holds for all \(\pi \in \varPi (G,H)\) [14]. The cost \(c(P_\pi )\) of an edit path \(P_\pi \) induced by a node map \(\pi \in \varPi (G,H)\) is given as follows:

$$\begin{aligned} c(P_\pi )= & {} \underbrace{\sum _{\begin{array}{c} u\in V^G\\ \pi (u)\in V^H \end{array}}c_V(u,\pi (u))}_{\text {node substitutions}}+\underbrace{\sum _{\begin{array}{c} e\in E^G\\ \pi (e)\in E^H \end{array}}c_E(e,\pi (e))}_{\text {edge substitutions}}\\&+\>\underbrace{\sum _{\begin{array}{c} u\in V^G\\ \pi (u)\notin V^H \end{array}}c_V(u,\varepsilon )}_{\text {node deletions}}+\underbrace{\sum _{\begin{array}{c} e\in E^G\\ \pi (e)\notin E^H \end{array}}c_E(e,\varepsilon )}_{\text {edge deletions}}\nonumber \\&+\>\underbrace{\sum _{\begin{array}{c} v\in V^H\\ \pi ^{-1}(v)\notin V^G \end{array}}c_V(\varepsilon ,v)}_{\text {node insertions}}+\underbrace{\sum _{\begin{array}{c} f\in E^H\\ \pi ^{-1}(f)\notin E^G \end{array}}c_E(\varepsilon ,f)}_{\text {edge insertions}}\nonumber \end{aligned}$$
(1)

Note that, by Definition 3, a node map \(\pi \) generally induces more than one edit path. However, all of these edit paths are equivalent, as they differ only w.r.t. the ordering of the contained edit operations. In the following, we will therefore identify all edit paths induced by \(\pi \). We can now give an alternative definition of GED.
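To make Eq. (1) concrete, the following Python sketch evaluates the induced cost of a node map. The graph representation (label dictionaries keyed by nodes and by frozensets of endpoints) and the convention that None stands for \(\varepsilon \) are assumptions of this illustration, not part of the formal definitions; the cost functions \(c_V\) and \(c_E\) are passed in as callables on labels.

```python
# Minimal sketch: evaluating Eq. (1) for a node map pi. A graph is assumed
# to be a dict {"nodes": {u: label}, "edges": {frozenset({u, v}): label}};
# pi maps every node of G to a node of H or to None (deletion); None also
# stands for the dummy label epsilon in the cost functions.

def induced_cost(G, H, pi, c_V, c_E):
    cost = 0.0
    inv = {v: u for u, v in pi.items() if v is not None}
    for u, lu in G["nodes"].items():            # node substitutions/deletions
        v = pi.get(u)
        cost += c_V(lu, H["nodes"][v]) if v is not None else c_V(lu, None)
    for v, lv in H["nodes"].items():            # node insertions
        if v not in inv:
            cost += c_V(None, lv)
    for e, le in G["edges"].items():            # edge substitutions/deletions
        f = frozenset(pi.get(u) for u in e)
        if None not in f and f in H["edges"]:
            cost += c_E(le, H["edges"][f])
        else:
            cost += c_E(le, None)
    for f, lf in H["edges"].items():            # edge insertions
        e = frozenset(inv.get(v) for v in f)
        if None in e or e not in G["edges"]:
            cost += c_E(None, lf)
    return cost
```

For the letter (h) graphs of Example 1 below, c_V and c_E would be instantiated with the cost functions defined there, and the returned value \(c(P_\pi )\) is an upper bound for \(\mathrm{GED} (G,H)\).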

Definition 4

(GED—alternative definition [14, 20]) The graph edit distance (GED) between two graphs \(G,H\in \mathbb {G}\) is defined as \(\mathrm{GED} (G,H):=\min \{c(P_\pi )\mid \pi \in \varPi (G,H)\}\), where \(P_\pi \in \varPsi (G,H)\) is the edit path induced by the node map \(\pi \).

Fig. 2 Two graphs G and H from the letter (h) dataset and a node map \(\pi \) between them

In [36], it is shown that Definition 1 and Definition 4 are equivalent if the underlying edit cost functions \(c_V\) and \(c_E\) are metrics. In [10], this result is extended to general edit cost functions. The main advantage of using Definition 4 instead of Definition 1 is that, unlike edit paths, node maps can be constructed easily. Since each node map \(\pi \in \varPi (G,H)\) induces an upper bound \(c(P_\pi )\ge \mathrm{GED} (G,H)\), one can hence straightforwardly generate upper bounds.

Example 1

Consider the graphs G and H shown in Fig. 2. G and H are taken from the letter (h) dataset and represent distorted letter drawings [50]. Their nodes are labeled with two-dimensional, non-negative Euclidean coordinates. Edges are unlabeled. Hence, we have \(\varSigma _V={\mathbb {R}} _{\ge 0}\times {\mathbb {R}} _{\ge 0}\) and \(\varSigma _E=\{1\}\). In [52], it is suggested that the edit cost functions \(c_V\) and \(c_E\) for letter (h) should be defined as follows: \(c_E(1,\varepsilon ):=c_E(\varepsilon ,1):=0.425\), \(c_V(\alpha ,\alpha ^\prime ):=0.75\left||\alpha -\alpha ^\prime \right||\), and \(c_V(\alpha ,\varepsilon ):=c_V(\varepsilon ,\alpha ):=0.675\) for all node labels \(\alpha ,\alpha ^\prime \in \varSigma _V\), where \(\left||\cdot \right||\) is the Euclidean norm. Now consider the node map \(\pi \in \varPi (G,H)\) shown in Fig. 2. Its induced edit operations and edit costs are detailed in Table 3. By summing the induced edit costs, we obtain that \(\pi \)’s induced edit path \(P_\pi \in \varPsi (G,H)\) has cost \(c(P_\pi )=2.623179\), which implies \(\mathrm{GED} (G,H)\le 2.623179\).

Table 3 Edit operations and edit costs induced by node map \(\pi \) shown in Fig. 2, given the edit cost functions defined in Example 1
Table 4 Overview of compared heuristics

We conclude this section by recalling mathematical concepts and notations that are used throughout the paper.

Definition 5

(Miscellaneous definitions) In the remainder of this paper, we use the following definitions:

  • For all \(N\in \mathbb {N}\), we define \([N]:=\{n\in \mathbb {N}\mid 1\le n\wedge n\le N\}\).

  • Let \(G\in \mathbb {G}\) be a graph. An edge sequence \(((u_{i_1},u_{i_2}))^{k}_{i=1}\) with \((u_{i_1},u_{i_2})\in E^G\) for all \(i\in [k]\) is called walk of length k between the nodes \(u_{1_1},u_{k_2}\in V^G\) if and only if \(u_{i_2}=u_{(i+1)_1}\) holds for all \(i\in [k-1]\).

  • Let \(G\in \mathbb {G}\) be a graph. A walk between two nodes \(u,u^\prime \in V^G\) is called path between u and \(u^\prime \) if and only if no node is encountered more than once.

  • Let \(G\in \mathbb {G}\) be a graph. The distance between two nodes \(u,u^\prime \in V^G\) in G is defined as \(d^G(u,u^\prime ):=0\) if \(u=u^\prime \), as \(d^G(u,u^\prime ):=\min \{|P|\mid P\text { is a path between }u\text { and }u^\prime \}\) if \(u\ne u^\prime \) and u and \(u^\prime \) are in the same connected component of G, and as \(d^G(u,u^\prime ):=\infty \) if u and \(u^\prime \) are in different connected components of G. (A BFS-based sketch of this and the following two definitions is given after the list.)

  • The diameter of a graph \(G\in \mathbb {G}\) is defined as \({{\,\mathrm{diam}\,}}(G):=\max _{u\in V^G}\max _{u^\prime \in V^G}d^G(u,u^\prime )\).

  • Let \(G\in \mathbb {G}\) be a graph. The \(k^{\text {th}}\) neighborhood of a node \(u\in V^G\) in G is defined as \(N^G_k(u):=\{u^\prime \in V^G\mid d^G(u,u^\prime )=k\}\). The \(1^{\text {st}}\) neighborhood is called neighborhood of node u and abbreviated as \(N^G(u):=N^G_1(u)\).

  • Let \(G\in \mathbb {G}\) be a graph. The set of edges that are incident with a node \(u\in V^G\) is denoted by \(E^G(u)\).

  • Let \(G\in \mathbb {G}\) be a graph. The degree of a node \(u\in V^G\) in G is defined as \(\deg ^G(u):=|N^G(u)|\).

  • The maximum degree of a graph \(G\in \mathbb {G}\) is defined as \({{\,\mathrm{max\,deg}\,}}(G):=\max _{u\in V^G}\deg ^G(u)\).

  • Let \(f:X\rightarrow Y\) be a function and \(A\subseteq X\) be a subset of its domain. The image of A under f is denoted by \({f[A]}\), and the multiset image of A under f is denoted by \({f\llbracket A\rrbracket }\).

  • Let \(G\in \mathbb {G}\) be a graph and \(V\subseteq V^G\) be a subset of its nodes. Then, \({G[V]}:=(V,E^G\cap (V\times V),\ell ^G_V,\ell ^G_E)\) denotes the subgraph of G which is induced by the node set V.

  • The expression \(\delta _{{\texttt {true}}|{\texttt {false}}}\) is defined as \(\delta _{{\texttt {true}}}:=1\) and \(\delta _{{\texttt {false}}}:=0\).
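Several of these definitions (node distances, neighborhoods, and degrees) reappear in the heuristics below. As a concrete reference, here is a minimal BFS-based sketch; the adjacency-list representation is an assumption of the illustration.

```python
from collections import deque

# Minimal sketch: node distances d^G(u, u') and k-th neighborhoods N^G_k(u)
# via breadth-first search. adj is assumed to be an adjacency list
# (dict: node -> set of neighbors).

def bfs_distances(adj, u):
    dist, queue = {u: 0}, deque([u])
    while queue:
        w = queue.popleft()
        for x in adj[w]:
            if x not in dist:
                dist[x] = dist[w] + 1
                queue.append(x)
    return dist  # nodes in other connected components are absent (distance infinity)

def kth_neighborhood(adj, u, k):
    return {x for x, d in bfs_distances(adj, u).items() if d == k}

def diameter(adj):
    # assumes a connected graph; otherwise the diameter is infinite
    return max(max(bfs_distances(adj, u).values()) for u in adj)
```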

4 Overview of compared heuristics

Table 4 gives an overview of the heuristics that are compared in this survey. Each heuristic is denoted by a name written in typewriter font. Whenever possible, this name is taken from the original publication. If no original name is available, we invented a name which reflects the main technical ingredient of the heuristic.

Some heuristics are categorized as extensions rather than instantiations of the paradigms LSAPE-GED and LS-GED, respectively. Although, in the original publications, these heuristics are usually presented as improvements of a specific instantiation of the respective paradigm, they can in fact be used to improve all instantiations. For instance, in [26], it is suggested that the local search algorithm IPFP should be run from several initial solutions in parallel. As this technique does not depend on the choice of the local search algorithm, it can be generalized to the paradigm level and hence yields the extension MULTI-START of the paradigm LS-GED. In this survey, we generalize techniques to the paradigm level whenever possible.

Moreover, some of the heuristics are designed for special edit cost functions. For instance, BRANCH-CONST requires constant edge edit costs, i.e., expects that there are constants \(c^{\mathrm{sub}}_E,c^{\mathrm{del}}_E,c^{\mathrm{ins}}_E\in {\mathbb {R}} \) such that \(c_E(\beta ,\beta ^\prime )=c^{\mathrm{sub}}_E\), \(c_E(\beta ,\varepsilon )=c^{\mathrm{del}}_E\), and \(c_E(\varepsilon ,\beta ^\prime )=c^{\mathrm{ins}}_E\) holds for all \((\beta ,\beta ^\prime )\in \varSigma _E\times \varSigma _E\) with \(\beta \ne \beta ^\prime \). In many datasets, this constraint is not satisfied. In our implementation of BRANCH-CONST, we therefore set \(c^{\mathrm{sub}}_E:=\min \{c_E(\beta ,\beta ^\prime )\mid (\beta ,\beta ^\prime )\in {\ell ^G_E[E^G]}\times {\ell ^H_E[E^H]}\wedge \beta \ne \beta ^\prime \}\), \(c^{\mathrm{del}}_E:=\min \{c_E(\beta ,\varepsilon )\mid \beta \in {\ell ^G_E[E^G]}\}\), and \(c^{\mathrm{ins}}_E:=\min \{c_E(\varepsilon ,\beta ^\prime )\mid \beta ^\prime \in {\ell ^H_E[E^H]}\}\) when running BRANCH-CONST on graphs G and H that come with nonconstant edge edit costs. Since we use minima for defining the constants, this preprocessing leaves BRANCH-CONST ’s lower bound valid. Similar techniques are used for enforcing the cost constraints of the other methods that are not designed for general edit costs.
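The preprocessing just described amounts to taking minima over the edge labels that actually occur in G and H. A hedged sketch (the label representation and the use of None for \(\varepsilon \) are assumptions of the illustration):

```python
# Sketch of the preprocessing used to run BRANCH-CONST with nonconstant edge
# edit costs: derive the constants as minima over the occurring edge labels,
# which keeps the lower bound valid. c_E takes labels, None meaning epsilon.

def derive_edge_constants(edge_labels_G, edge_labels_H, c_E):
    lbls_G, lbls_H = set(edge_labels_G), set(edge_labels_H)
    c_sub = min((c_E(b, b2) for b in lbls_G for b2 in lbls_H if b != b2),
                default=0.0)
    c_del = min((c_E(b, None) for b in lbls_G), default=0.0)
    c_ins = min((c_E(None, b2) for b2 in lbls_H), default=0.0)
    return c_sub, c_del, c_ins
```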

5 Heuristics based on transformations to the linear sum assignment problem with error correction

In this section, we first introduce the paradigm LSAPE-GED, which generalizes heuristics that use transformations to the linear sum assignment problem with error correction (LSAPE) for upper and, possibly, lower bounding GED (Sect. 5.1). Subsequently, we present heuristics that can be modeled as instantiations of LSAPE-GED (Sect. 5.2) and summarize heuristics that can be modeled as extensions of LSAPE-GED (Sect. 5.3).

5.1 The paradigm LSAPE-GED

Definition 4 implies that each node map between G and H induces an upper bound for \(\mathrm{GED} (G,H)\). Instantiations of the paradigm LSAPE-GED use transformations to LSAPE for heuristically finding a node map that induces a tight upper bound. LSAPE is defined as follows:

Definition 6

(LSAPE—first definition [15]) Given a matrix \({\mathbf{C}} \in \mathbb {R}^{(n+1)\times (m+1)}\) with \(c_{n+1,m+1}=0\), the linear sum assignment problem with error correction (LSAPE) consists in the task to minimize \({\mathbf{C}} (\pi ):=\sum _{(i,k)\in \pi }c_{i,k}\) over all relations \(\pi \in \varPi (n,m)\), where

  • \(\varPi (n,m)\subseteq \mathscr {P}([n+1]\times [m+1])\) is defined as the set of all feasible LSAPE solutions for C, and

  • a relation \(\pi \subseteq [n+1]\times [m+1]\) is called feasible LSAPE solution for C if and only if \(|\{k \mid k\in [m+1]\wedge (i,k)\in \pi \}|=1\) holds for all \(i\in [n]\) and \(|\{i \mid i\in [n+1]\wedge (i,k)\in \pi \}|=1\) holds for all \(k\in [m]\).

We write \(\pi (i)=k\) if \((i,k)\in \pi \) and \(i\ne n+1\); and \(\pi ^{-1}(k)=i\) if \((i,k)\in \pi \) and \(k\ne m+1\). The set of all optimal LSAPE solutions for C is denoted as \(\varPi ^\star ({\mathbf{C}}):={{\,\mathrm{arg\,min}\,}}_{\pi \in \varPi (n,m)}{\mathbf{C}} (\pi )\). We write \(\mathrm{LSAPE} ({\mathbf{C}}):=\min _{\pi \in \varPi (n,m)}{\mathbf{C}} (\pi )\) for the cost of an optimal solution for C.

Given a matrix \({\mathbf{C}} \in \mathbb {R}^{(n+1)\times (m+1)}\), an optimal solution \(\pi \in \varPi ^\star ({\mathbf{C}})\) can be computed in \(O(\min \{n,m\}^2\max \{n,m\})\) time [15], using variants of the famous Hungarian Algorithm [38, 46]. Once one optimal solution has been found, for each \(s\in [|\varPi ^\star ({\mathbf{C}})|]\), a solution set \(\varPi ^\star _s({\mathbf{C}})\subseteq \varPi ^\star ({\mathbf{C}})\) of size s can be enumerated in \(O(nm\sqrt{n+m}+s\log {(n+m)})\) time [64, 65]. Greedy suboptimal solutions can be computed in O(nm) time [55].

Node maps and feasible solutions for LSAPE are closely related. Assume that we are given graphs G and H and an LSAPE instance \({\mathbf{C}} \in \mathbb {R}^{(|V^G|+1)\times (|V^H|+1)}\). Then it immediately follows from Definitions 2 and 6 that we can identify the set \(\varPi (G,H)\) of all node maps between G and H with the set \(\varPi (|V^G|,|V^H|)\) of all feasible LSAPE solutions for C: For all \(i\in [|V^G|]\) and all \(k\in [|V^H|]\), we associate C ’s \(i^{\text {th}}\) row with the node \(u_i\in V^G\) and C ’s \(k^{\text {th}}\) column with the node \(v_k\in V^H\). The last row and the last column of C are associated with the dummy node \(\varepsilon \). Therefore, each feasible LSAPE solution \(\pi \) for C yields an upper bound for \(\mathrm{GED} (G,H)\), namely the cost \(c(P_\pi )\) of the edit path induced by \(\pi \)’s interpretation as a node map.
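For readers who want to experiment, an LSAPE instance can be solved by a standard reduction to a balanced LSAP: each row receives a private deletion slot, each column a private insertion slot, and dummy-to-dummy cells cost 0. The sketch below uses SciPy's Hungarian-type solver; it is an illustration with cubic complexity in \(n+m\), not one of the dedicated LSAPE solvers of [15] used in GEDLIB.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Sketch: solve an LSAPE instance C of shape (n+1, m+1) with nonnegative
# costs by reduction to a balanced (n+m) x (n+m) LSAP. Row i gets a private
# deletion slot (column m+i), column k a private insertion slot (row n+k).

def solve_lsape(C):
    C = np.asarray(C, dtype=float)
    n, m = C.shape[0] - 1, C.shape[1] - 1
    big = C.sum() + 1.0  # large enough to forbid foreign deletion slots
    D = np.full((n + m, n + m), big)
    D[:n, :m] = C[:n, :m]                         # substitutions
    D[np.arange(n), m + np.arange(n)] = C[:n, m]  # deletions
    D[n + np.arange(m), np.arange(m)] = C[n, :m]  # insertions
    D[n:, m:] = 0.0                               # dummy-to-dummy
    rows, cols = linear_sum_assignment(D)
    pi = {i: (int(cols[i]) if cols[i] < m else None) for i in range(n)}
    return pi, float(D[rows, cols].sum())
```

In the returned dictionary, pi[i] is the column substituted for row i and None encodes a deletion; columns not covered by any row correspond to insertions.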

Sometimes, it is useful to view LSAPE not as an optimization problem over relations, but rather as an optimization problem over binary matrices. With this view, LSAPE can be equivalently defined as follows.

Definition 7

(LSAPE—alternative definition [15]) Given a matrix \({\mathbf{C}} \in {\mathbb {R}} ^{(n+1)\times (m+1)}\) with \(c_{n+1,m+1}=0\), the linear sum assignment problem with error correction (LSAPE) consists in the task to minimize \({\mathbf{C}} ({\mathbf{X}}):=\sum ^{n+1}_{i=1}\sum ^{m+1}_{k=1}c_{i,k}x_{i,k}\) over all binary matrices \({\mathbf{X}} \in \varPi (n,m)\), where

  • \(\varPi (n,m)\subseteq \{0,1\}^{(n+1)\times (m+1)}\) is defined as the set of all feasible LSAPE solutions for C, and

  • a binary matrix \({\mathbf{X}} \in \{0,1\}^{(n+1)\times (m+1)}\) is called feasible LSAPE solution for C if and only if \(\sum ^{m+1}_{k=1}x_{i,k}=1\) holds for all \(i\in [n]\) and \(\sum ^{n+1}_{i=1}x_{i,k}=1\) holds for all \(k\in [m]\).

Example 2

Let \({\mathbf{C}} \in {\mathbb {R}} ^{6\times 6}\) be a matrix and again consider the node map \(\pi \in \varPi (G,H)\) visualized in Fig. 2. The node map \(\pi \)’s interpretation as a feasible LSAPE solution for C in relational form is given by \(\{(i,i)\mid i\in [5]\}\). Its matrix representation X is shown in Fig. 3, along with another feasible LSAPE solution \({\mathbf{X}} ^\prime \). Instead of substituting row 3 by column 3 and row 4 by column 4, \({\mathbf{X}} ^\prime \) substitutes row 4 by column 3, deletes row 3, and inserts column 4.

Fig. 3 Two LSAPE solutions of size \(6\times 6\) in matrix representation. If viewed as node maps between graphs G and H with \(|V^G|=|V^H|=5\), the last row corresponds to the dummy node \(\varepsilon \) in G and the last column corresponds to the dummy node \(\varepsilon \) in H

Instantiations of the paradigm LSAPE-GED now proceed as described in Algorithm 1: In the first step, the input graphs G and H and the edit cost functions are used to construct an LSAPE instance C of size \((|V^G|+1)\times (|V^H|+1)\) such that optimal LSAPE solutions for C induce cheap edit paths between G and H (line 1). This construction phase is where the different instantiations of LSAPE-GED differ from one another. Subsequently, the LSAPE instance C is solved—either optimally or greedily—and the cost \(c(P_\pi )\) of the edit path induced by the obtained LSAPE solution \(\pi \) is interpreted as an upper bound for \(\mathrm{GED} (G,H)\) (line 2). If the protocol for constructing the LSAPE instance C ensures that one can define a scaling function \(\xi (G,H,c_V,c_E)\) such that

$$\begin{aligned} \xi (G,H,c_V,c_E)\cdot \mathrm{LSAPE}({\mathbf{C}})\le \mathrm{GED}(G,H) \end{aligned}$$
(2)

holds for all graphs \(G,H\in \mathbb {G}\) and all edit cost functions \(c_V\) and \(c_E\) and an optimal LSAPE solver was used to compute \(\pi \), \(\xi (G,H,c_V,c_E){\mathbf{C}} (\pi )\) is returned as a lower bound for GED along with the upper bound derived from the induced edit path (lines 4 to 7). Otherwise, only the upper bound is returned (lines 9 to 12).

Assume that an instantiation of LSAPE-GED constructs its LSAPE instance C in \(O(\omega )\) time. Optimally solving C requires \(O(\min \{|V^G|,|V^H|\}^2\max \{|V^G|,|V^H|\})\) time, while the complexity of greedily computing a cheap suboptimal solution is \(O(|V^G||V^H|)\). The induced cost of the obtained node map \(\pi \) can be computed in \(O(\max \{|E^G|,|E^H|\})\) time. The heuristic’s overall runtime complexity is hence \(O(\omega + \min \{|V^G|,|V^H|\}^2\max \{|V^G|,|V^H|\}+\max \{|E^G|,|E^H|\})\) if an optimal solver is used in line 2, and \(O(\omega + |V^G||V^H|+\max \{|E^G|,|E^H|\})\) if C is solved greedily.

Algorithm 1 The paradigm LSAPE-GED
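In code, the paradigm boils down to a few lines. The sketch below mirrors Algorithm 1; all names are placeholders of this illustration, not GEDLIB's API: build_instance is the construction phase in which the instantiations of Sect. 5.2 differ, solve_lsape is an optimal or greedy LSAPE solver as sketched in Sect. 5.1, induced_cost evaluates Eq. (1), and xi is the scaling function of Eq. (2), or None if the construction admits no valid lower bound.

```python
# Sketch of the paradigm LSAPE-GED (cf. Algorithm 1). All callbacks are
# placeholders: build_instance constructs C (line 1), solve_lsape solves it
# optimally or greedily (line 2), induced_cost evaluates Eq. (1), and xi is
# the scaling function of Eq. (2), if any.

def lsape_ged(G, H, c_V, c_E, build_instance, solve_lsape, induced_cost,
              xi=None, optimal=True):
    C = build_instance(G, H, c_V, c_E)     # construction phase
    pi, lsape_cost = solve_lsape(C)        # solve the LSAPE instance
    UB = induced_cost(G, H, pi, c_V, c_E)  # upper bound from the node map
    if xi is not None and optimal:         # lower bound only if Eq. (2) holds
        return xi(G, H, c_V, c_E) * lsape_cost, UB
    return None, UB
```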

5.2 Instantiations of the paradigm LSAPE-GED

Next, we present 11 algorithms for heuristically computing GED that can be modeled as instantiations of the paradigm LSAPE-GED. All heuristics compute upper bounds for GED, and some of them also compute lower bounds. Some of the heuristics require special edit costs, while others can be used with general edit costs.

5.2.1 The algorithm NODE

The algorithm NODE [36] is a very simple instantiation of LSAPE-GED: It completely ignores the edges of the input graphs G and H and just defines C as the node edit cost matrix between G and H. In other words, it sets

$$\begin{aligned}&c_{i,k}:=c_V(u_i,v_k)\\&c_{i,|V^H|+1}:=c_V(u_i,\varepsilon )\\&c_{|V^G|+1,k}:=c_V(\varepsilon ,v_k) \end{aligned}$$

for all \((i,k)\in [|V^G|]\times [|V^H|]\).

The time complexity of constructing C is \(O(|V^G||V^H|)\). As Eq. (2) with \(\xi :\equiv 1\) holds for all graphs \(G,H\in \mathbb {G}\) and all edit cost functions \(c_V\) and \(c_E\), NODE computes both an upper and a lower bound for GED.
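As an illustration, NODE's construction phase can be sketched as follows (lists of node labels and the use of None for \(\varepsilon \) are assumptions of the illustration):

```python
import numpy as np

# Sketch of NODE's construction phase: C is the plain node edit cost matrix.
# nodes_G and nodes_H are lists of node labels; c_V takes labels, with None
# standing for epsilon.

def build_node_instance(nodes_G, nodes_H, c_V):
    n, m = len(nodes_G), len(nodes_H)
    C = np.zeros((n + 1, m + 1))
    for i, a in enumerate(nodes_G):
        for k, b in enumerate(nodes_H):
            C[i, k] = c_V(a, b)   # substitution
        C[i, m] = c_V(a, None)    # deletion
    for k, b in enumerate(nodes_H):
        C[n, k] = c_V(None, b)    # insertion
    return C
```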

5.2.2 The algorithm BP

Unlike NODE, the algorithm BP [51] also considers edges. Informally, this is done by adding to \(c_{i,k}\) as defined by NODE the optimal cost of transforming the edges that are incident with \(u_i\) in G into the edges that are incident with \(v_k\) in H.

Formally, for each \((i,k)\in [|V^G|]\times [|V^H|]\), an auxiliary LSAPE instance \({\mathbf{C}} ^{i,k}\in {\mathbb {R}} ^{(\deg ^G(u_i)+1)\times (\deg ^H(v_k)+1)}\) is constructed. Let \((u_{i_j})^{\deg ^G(u_i)}_{j=1}\) be an enumeration of \(u_i\)’s neighborhood \(N^G(u_i)\) and \((v_{k_l})^{\deg ^H(v_k)}_{l=1}\) be an enumeration of \(v_k\)’s neighborhood \(N^H(v_k)\). BP sets

$$\begin{aligned}&c^{i,k}_{j,l}:=c_E((u_i,u_{i_j}),(v_k,v_{k_l}))\\&c^{i,k}_{j,\deg ^H(v_k)+1}:=c_E((u_i,u_{i_j}),\varepsilon )\\&c^{i,k}_{\deg ^G(u_i)+1,l} :=c_E(\varepsilon ,(v_k,v_{k_l})) \end{aligned}$$

for all \((j,l)\in [\deg ^G(u_i)]\times [\deg ^H(v_k)]\) and computes an optimal LSAPE solution \(\pi ^{i,k}\in \varPi (\deg ^G(u_i),\deg ^H(v_k))\). Once this has been done for all \((i,k)\in [|V^G|]\times [|V^H|]\), the final LSAPE instance C is constructed by setting

$$\begin{aligned}&c_{i,k}:=c_V(u_i,v_k)+{\mathbf{C}} ^{i,k}(\pi ^{i,k})\\&c_{i,|V^H|+1}:=c_V(u_i,\varepsilon )+\sum ^{\deg ^G(u_i)}_{j=1}c_E((u_i,u_{i_j}),\varepsilon )\\&c_{|V^G|+1,k}:=c_V(\varepsilon ,v_k)+\sum ^{\deg ^H(v_k)}_{l=1}c_E(\varepsilon ,(v_k,v_{k_l})) \end{aligned}$$

for all \((i,k)\in [|V^G|]\times [|V^H|]\).

BP requires \(O(|V^G||V^H|{\varDelta ^{G,H}_{\mathrm{min}}}^2\varDelta ^{G,H}_{\mathrm{max}})\) time for constructing C, where \(\varDelta ^{G,H}_{\mathrm{min}}:=\min \{{{\,\mathrm{max\,deg}\,}}(G),{{\,\mathrm{max\,deg}\,}}(H)\}\) and \(\varDelta ^{G,H}_{\mathrm{max}}:=\max \{{{\,\mathrm{max\,deg}\,}}(G),{{\,\mathrm{max\,deg}\,}}(H)\}\). This construction does not guarantee that Eq. (2) holds, which implies that BP only returns an upper bound for GED.
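A sketch of BP's construction phase, reusing the solve_lsape sketch from Sect. 5.1 for the auxiliary instances (the node label lists and the lists of incident edge labels are assumptions of the illustration):

```python
import numpy as np

# Sketch of BP's construction phase. incident_G[i] / incident_H[k] are lists
# of the labels of the edges incident with u_i / v_k; c_V and c_E take
# labels, with None standing for epsilon; solve_lsape is an optimal LSAPE
# solver as sketched in Sect. 5.1.

def build_bp_instance(node_lbls_G, node_lbls_H, incident_G, incident_H,
                      c_V, c_E, solve_lsape):
    n, m = len(node_lbls_G), len(node_lbls_H)
    C = np.zeros((n + 1, m + 1))
    for i in range(n):
        for k in range(m):
            # auxiliary LSAPE instance on the incident edge labels
            A = np.zeros((len(incident_G[i]) + 1, len(incident_H[k]) + 1))
            for j, e in enumerate(incident_G[i]):
                for l, f in enumerate(incident_H[k]):
                    A[j, l] = c_E(e, f)
                A[j, -1] = c_E(e, None)
            for l, f in enumerate(incident_H[k]):
                A[-1, l] = c_E(None, f)
            C[i, k] = c_V(node_lbls_G[i], node_lbls_H[k]) + solve_lsape(A)[1]
        C[i, m] = c_V(node_lbls_G[i], None) + sum(c_E(e, None)
                                                  for e in incident_G[i])
    for k in range(m):
        C[n, k] = c_V(None, node_lbls_H[k]) + sum(c_E(None, f)
                                                  for f in incident_H[k])
    return C
```

The instance constructed by BRANCH, presented next, is obtained from the same computation by multiplying all edge-related terms by 0.5.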

5.2.3 The algorithm BRANCH

The algorithm BRANCH [8, 9, 57] is a slight modification of BP that also allows for the computation of a lower bound. The only modification is that the edge costs in the construction of the LSAPE instance C are divided by 2, i.e., C is defined by setting

$$\begin{aligned}&c_{i,k}:=c_V(u_i,v_k)+0.5\cdot {\mathbf{C}} ^{i,k}(\pi ^{i,k})\\&c_{i,|V^H|+1} :=c_V(u_i,\varepsilon )+0.5\cdot \sum ^{\deg ^G(u_i)}_{j=1}c_E((u_i,u_{i_j}),\varepsilon )\\&c_{|V^G|+1,k}:=c_V(\varepsilon ,v_k)+0.5\cdot \sum ^{\deg ^H(v_k)}_{l=1}c_E(\varepsilon ,(v_k,v_{k_l})) \end{aligned}$$

for all \((i,k)\in [|V^G|]\times [|V^H|]\).

Like BP, BRANCH requires \(O(|V^G||V^H|{\varDelta ^{G,H}_{\mathrm{min}}}^2\varDelta ^{G,H}_{\mathrm{max}})\) time for constructing C. Unlike BP, the construction carried out by BRANCH ensures that Eq. (2) with \(\xi :\equiv 1\) holds for all graphs \(G,H\in \mathbb {G}\) and all edit cost functions \(c_V\) and \(c_E\), which implies that BRANCH also computes a lower bound for GED. Furthermore, it is shown that, given fixed metric edit cost functions \(c_V\) and \(c_E\), the lower bound returned by BRANCH is a pseudo-metric on \(\mathbb {G}\), i.e., is symmetric, nonnegative, respects the triangle inequality, and equals 0 if two graphs \(G,H\in \mathbb {G}\) are isomorphic.

5.2.4 The algorithm BRANCH-FAST

The algorithm BRANCH-FAST suggested in [8, 9] speeds up BRANCH at the cost of producing a looser lower bound. For all \((i,k)\in [|V^G|]\times [|V^H|]\), BRANCH-FAST computes \(\varGamma ^{i,k}_E\) as the size of the intersection of the multisets of edge labels that are incident to \(u_i\) in G and to \(v_k\) in H, i.e., sets \(\varGamma ^{i,k}_E:=|{\ell ^G_E\llbracket E^G(u_i)\rrbracket }\cap {\ell ^H_E\llbracket E^H(v_k)\rrbracket }|\). Moreover, BRANCH-FAST computes the minimal deletion cost \(c^i_{\mathrm{min}}:=\min \{c_E(e,\varepsilon )\mid e\in E^G(u_i)\}\), the minimal insertion cost \(c^k_{\mathrm{min}}:=\min \{c_E(\varepsilon ,f)\mid f\in E^H(v_k)\}\), as well as the minimal substitution cost \(c^{i,k}_{\mathrm{min}}:=\min \{c_E(e,f)\mid (e,f)\in E^G(u_i)\times E^H(v_k)\wedge \ell ^G_E(e)\ne \ell ^H_E(f)\}\) for the sets \(E^G(u_i)\) and \(E^H(v_k)\) of edges that are incident to \(u_i\) in G and to \(v_k\) in H, respectively. With these ingredients, BRANCH-FAST constructs its LSAPE instance C by setting

$$\begin{aligned} c_{i,k}:= & {} c_V(u_i,v_k)+0.5\cdot [(\varDelta ^{i,k}_{\mathrm{min}}-\varGamma ^{i,k}_E)c^{i,k}_{\mathrm{min}}\\&+\>\delta _{\deg ^G(u_i)>\deg ^H(v_k)}(\varDelta ^{i,k}_{\mathrm{max}}-\varDelta ^{i,k}_{\mathrm{min}})c^i_{\mathrm{min}}\\&+\>\delta _{\deg ^G(u_i)<\deg ^H(v_k)}(\varDelta ^{i,k}_{\mathrm{max}}-\varDelta ^{i,k}_{\mathrm{min}})c^k_{\mathrm{min}}]\\ c_{i,|V^H|+1}:= & {} c_V(u_i,\varepsilon )+0.5\cdot \deg ^G(u_i)c^i_{\mathrm{min}}\\ c_{|V^G|+1,k}:= & {} c_V(\varepsilon ,v_k)+0.5\cdot \deg ^H(v_k)c^k_{\mathrm{min}} \end{aligned}$$

for all \((i,k){\in }[|V^G|]\times [|V^H|]\), where \(\varDelta ^{i,k}_{\mathrm{min}}{:=}\min \{\deg ^G(u_i),\deg ^H(v_k)\}\) and \(\varDelta ^{i,k}_{\mathrm{max}}:=\max \{\deg ^G(u_i),\deg ^H(v_k)\}\).

By sorting all sets of incident edge labels before populating C, BRANCH-FAST can reduce the time complexity of constructing C to \(O(\max \{|V^G|,|V^H|\}\varDelta ^{G,H}_{\mathrm{max}}\log (\varDelta ^{G,H}_{\mathrm{max}})+|V^G||V^H|\varDelta ^{G,H}_{\mathrm{min}}\varDelta ^{G,H}_{\mathrm{max}})\). As Eq. (2) with \(\xi :\equiv 1\) holds for each input, BRANCH-FAST returns an upper and a lower bound for GED. Like the lower bound produced by BRANCH, the lower bound yielded by BRANCH-FAST is a pseudo-metric if the underlying edit costs are metric. Furthermore, it can be shown that BRANCH-FAST ’s lower bound is never tighter than the one computed by BRANCH. For constant edge edit costs, BRANCH and BRANCH-FAST are equivalent.
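The per-pair ingredients of BRANCH-FAST can be computed with Python's multiset type; a sketch (edge label lists and the use of None for \(\varepsilon \) are assumptions of the illustration):

```python
from collections import Counter

# Sketch of the per-pair quantities used by BRANCH-FAST: the multiset
# intersection size Gamma of incident edge labels and the minimal deletion,
# insertion, and substitution costs. labels_G / labels_H are the labels of
# the edges incident with u_i / v_k; c_E takes labels, None meaning epsilon.

def branch_fast_ingredients(labels_G, labels_H, c_E):
    gamma = sum((Counter(labels_G) & Counter(labels_H)).values())
    c_del = min((c_E(e, None) for e in labels_G), default=0.0)
    c_ins = min((c_E(None, f) for f in labels_H), default=0.0)
    c_sub = min((c_E(e, f) for e in labels_G for f in labels_H if e != f),
                default=0.0)
    return gamma, c_del, c_ins, c_sub
```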

5.2.5 The algorithm BRANCH-CONST

The algorithm BRANCH-CONST [70, 71] can be viewed as a speedup of BRANCH and BRANCH-FAST for constant edge edit costs \(c_E\). It uses the fact that if the edge edit costs are constant, the minimum edge deletion, insertion, and substitution costs employed by BRANCH-FAST do not have to be computed, as we have \(c^i_{\mathrm{min}}=c^{\mathrm{del}}_E\), \(c^{k}_{\mathrm{min}}=c^{\mathrm{ins}}_E\), and \(c^{i,k}_{\mathrm{min}}=c^{\mathrm{sub}}_E\). This implies that the LSAPE instance C can be constructed in \(O(\max \{|V^G|,|V^H|\}\varDelta ^{G,H}_{\mathrm{max}}\log (\varDelta ^{G,H}_{\mathrm{max}})+|V^G||V^H|\varDelta ^{G,H}_{\mathrm{min}})\) time. All other properties are inherited from BRANCH and BRANCH-FAST.

5.2.6 The algorithm STAR

The algorithm STAR [68] considers the neighbors of the nodes \(u_i\in V^G\) and \(v_k\in V^H\) when populating the cell \(c_{i,k}\) of its LSAPE instance C. It requires uniform edit cost functions \(c_V\) and \(c_E\) and ignores the edge labels of the input graphs. Let C be the constant such that \(c_V(\alpha ,\alpha ^\prime )=c_E(\beta ,\beta ^\prime )=C\) holds for all \((\alpha ,\alpha ^\prime )\in (\varSigma _V\cup \{\varepsilon \})\times (\varSigma _V\cup \{\varepsilon \})\) with \(\alpha \ne \alpha ^\prime \) and all \((\beta ,\beta ^\prime )\in (\varSigma _E\cup \{\varepsilon \})\times (\varSigma _E\cup \{\varepsilon \})\) with \(\beta \ne \beta ^\prime \). In the first step, STAR computes \(\varGamma ^{i,k}_V\) as the size of the intersection of the multisets of node labels that are adjacent to \(u_i\) in G and to \(v_k\) in H, i.e., sets \(\varGamma ^{i,k}_V:=|{\ell ^G_V\llbracket N^G(u_i)\rrbracket }\cap {\ell ^H_V\llbracket N^H(v_k)\rrbracket }|\). STAR then defines its LSAPE instance C by setting

$$\begin{aligned}&c_{i,k}:=C\cdot [\delta _{\ell ^G_V(u_i)\ne \ell ^H_V(v_k)}+2\varDelta ^{i,k}_{\mathrm{max}}-\varDelta ^{i,k}_{\mathrm{min}}-\varGamma ^{i,k}_V]\\&c_{i,|V^H|+1}:=C\cdot [1+2\deg ^G(u_i)]\\&c_{|V^G|+1,k}:=C\cdot [1+2\deg ^H(v_k)] \end{aligned}$$

for all \((i,k)\in [|V^G|]\times [|V^H|]\), where \(\varDelta ^{i,k}_{\mathrm{min}}\) and \(\varDelta ^{i,k}_{\mathrm{max}}\) are defined as in Sect. 5.2.4.

STAR has the same time complexity as BRANCH-CONST, that is, STAR requires \(O(\max \{|V^G|,|V^H|\}\varDelta ^{G,H}_{\mathrm{max}}\log (\varDelta ^{G,H}_{\mathrm{max}})+|V^G||V^H|\varDelta ^{G,H}_{\mathrm{min}})\) time for constructing its LSAPE instance C. Furthermore, Eq. (2) holds if the scaling function \(\xi \) is defined as \(\xi (G,H,c_V,c_E):=1/\max \{4,\varDelta ^{G,H}_{\mathrm{max}}+1\}\). STAR hence returns both a lower and an upper bound for GED.
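A sketch of STAR's cell costs under uniform edit costs (the label multisets passed in are assumptions of the illustration):

```python
from collections import Counter

# Sketch of STAR's cell costs for a uniform edit cost constant C_const.
# lbl_u, lbl_v are the labels of u_i and v_k; nbr_lbls_G / nbr_lbls_H are
# the multisets of labels of the nodes adjacent to u_i / v_k.

def star_cell(C_const, lbl_u, lbl_v, nbr_lbls_G, nbr_lbls_H):
    gamma = sum((Counter(nbr_lbls_G) & Counter(nbr_lbls_H)).values())
    d_min = min(len(nbr_lbls_G), len(nbr_lbls_H))
    d_max = max(len(nbr_lbls_G), len(nbr_lbls_H))
    delta = 1 if lbl_u != lbl_v else 0
    return C_const * (delta + 2 * d_max - d_min - gamma)

def star_deletion(C_const, deg_u):
    # cost of the deletion cells; insertion cells are analogous
    return C_const * (1 + 2 * deg_u)
```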

5.2.7 The algorithm SUBGRAPH

The algorithm SUBGRAPH [21] considers more global information than the previously presented heuristics for constructing its LSAPE instance C. Given a constant \(K\in \mathbb {N}_{\ge 1}\), SUBGRAPH constructs graphlets \(G_i:={G[\bigcup ^K_{s=0}N^G_s(u_i)]}\) and \(H_k:={H[\bigcup ^K_{s=0}N^H_s(v_k)]}\) for all \((i,k)\in [|V^G|]\times [|V^H|]\), i.e., associates each node of the input graphs with the subgraph induced by the set of all nodes at distance at most K from it. For graphlets \(G_i\) and \(H_k\), SUBGRAPH defines \(\mathrm{GED} _{i,k}(G_i,H_k):=\min \{c(P_\pi )\mid \pi \in \varPi (G_i,H_k)\wedge \pi (u_i)=v_k\}\) as the edit distance under the restriction that \(G_i\)’s root node \(u_i\) be mapped to \(H_k\)’s root node \(v_k\). SUBGRAPH then constructs its LSAPE instance C by setting

$$\begin{aligned}&c_{i,k}:=\mathrm{GED}_{i,k}(G_i,H_k)\\&c_{i,|V^H|+1}:=\mathrm{GED}(G_i,\mathscr {E})\\&c_{|V^G|+1,k}:=\mathrm{GED}(\mathscr {E},H_k) \end{aligned}$$

for all \((i,k)\in [|V^G|]\times [|V^H|]\), where \(\mathscr {E}\) denotes the empty graph.

The construction of SUBGRAPH ’s LSAPE instance C requires time exponential in \(\varDelta ^{G,H}_{\mathrm{max}}\). This implies that, unless \({{\,\mathrm{max\,deg}\,}}(G)\) and \({{\,\mathrm{max\,deg}\,}}(H)\) are constantly bounded, SUBGRAPH does not run in polynomial time. SUBGRAPH only computes an upper bound for GED.

5.2.8 The algorithm WALKS

The algorithm WALKS [32] requires constant and symmetric edit cost functions \(c_V\) and \(c_E\) and aims at computing a tight upper bound for GED by associating each node in the input graphs to the set of walks of size K that start at this node. Given a constant \(K\in \mathbb {N}_{\ge 1}\), a node \(u_i\in V^G\), and a node \(v_k\in V^H\), WALKS defines \(\mathscr {W}^G_i\) and \(\mathscr {W}^H_k\) as, respectively, the sets of walks of size K that start at \(u_i\) and \(v_k\). Walks \(W\in \mathscr {W}^G_i\) and \(W^\prime \in \mathscr {W}^H_k\) are called similar if they encode the same sequences of node and edge labels. Otherwise, W and \(W^\prime \) are called different.

WALKS now computes the matrix powers \(\mathbf{W}^K_G\), \(\mathbf{W}^K_H\), and \(\mathbf{W}^K_\times \), where \(\mathbf{W}_G\) is the adjacency matrix of G, \(\mathbf{W}_H\) is the adjacency matrix of H, and \(\mathbf{W}_\times \) is the adjacency matrix of the direct product graph \(G\times H\) of G and H. \(G\times H\) contains a node \((u_i,v_k)\) for each \((i,k)\in [|V^G|]\times [|V^H|]\) such that \(\ell ^G_V(u_i)=\ell ^H_V(v_k)\). Two nodes \((u_i,v_k)\) and \((u_j,v_l)\) of the product graph \(G\times H\) are connected by an edge if and only if \((u_i,u_j)\in E^G\), \((v_k,v_l)\in E^H\), and \(\ell ^G_E(u_i,u_j)=\ell ^H_E(v_k,v_l)\).

With the help of \(\mathbf{W}^K_G\), \(\mathbf{W}^K_H\), and \(\mathbf{W}^K_\times \), for each node label \(\alpha \in \varSigma _V\), WALKS computes an estimate \(\widehat{h}_{i{\setminus } k}(\alpha )\) of the number of walks \(W\in \mathscr {W}^G_i\) that end at a node with label \(\alpha \) and must be substituted by a different walk \(W^\prime \in \mathscr {W}^H_k\). Analogously, \(\widehat{h}_{k{\setminus } i}(\alpha )\) is computed as an estimate of the number of walks \(W^\prime \in \mathscr {W}^H_k\) that end at a node with label \(\alpha \) and must be substituted by a different walk \(W\in \mathscr {W}^G_i\). Moreover, WALKS computes an estimate \(\widehat{r}_{i{\setminus } k}:=\sum _{\alpha \in \varSigma _V}[\widehat{h}_{i{\setminus } k}(\alpha )-\min \{\widehat{h}_{i{\setminus } k}(\alpha ),\widehat{h}_{k{\setminus } i}(\alpha )\}]\) of the number of walks \(W\in \mathscr {W}^G_i\) that must be substituted by a different walk \(W^\prime \in \mathscr {W}^H_k\) that does not end at the same node label, and an estimate \(\widehat{r}_{k{\setminus } i}:=\sum _{\alpha \in \varSigma _V}[\widehat{h}_{k{\setminus } i}(\alpha )-\min \{\widehat{h}_{i{\setminus } k}(\alpha ),\widehat{h}_{k{\setminus } i}(\alpha )\}]\) of the number of walks \(W^\prime \in \mathscr {W}^H_k\) that must be substituted by a different walk \(W\in \mathscr {W}^G_i\) that does not end at the same node label. With these ingredients, WALKS constructs its LSAPE instance C by setting

$$\begin{aligned}&c_{i,k}:=[(\delta _{\ell ^G_V(u_i)\ne \ell ^H_V(v_k)}+K-1)c^{\mathrm{sub}}_V+Kc^{\mathrm{sub}}_E]\\&\qquad \qquad \cdot \textstyle \sum _{\alpha \in \varSigma _V}\min \{\widehat{h}_{i{\setminus } k}(\alpha ),\widehat{h}_{k{\setminus } i}(\alpha )\}\\&\qquad \qquad +\>[(\delta _{\ell ^G_V(u_i)\ne \ell ^H_V(v_k)}+K)c^{\mathrm{sub}}_V+Kc^{\mathrm{sub}}_E]\cdot \min \{\widehat{r}_{i{\setminus } k},\widehat{r}_{k{\setminus } i}\}\\&\qquad \qquad +\>[(1+K)c^{\mathrm{del}}_V+Kc^{\mathrm{del}}_E]\cdot |\widehat{r}_{i{\setminus } k}-\widehat{r}_{k{\setminus } i}|\\&c_{i,|V^H|+1}:=[(1+K)c^{\mathrm{del}}_V+Kc^{\mathrm{del}}_E]\cdot |\mathscr {W}^G_i|\\&c_{|V^G|+1,k}:=[(1+K)c^{\mathrm{ins}}_V+Kc^{\mathrm{ins}}_E]\cdot |\mathscr {W}^H_k| \end{aligned}$$

for all \((i,k)\in [|V^G|]\times [|V^H|]\).

WALKS requires \(O((|V^G||V^H|)^\omega )\) time for computing its LSAPE instance C, where \(O(n^\omega )\) is the complexity of multiplying two matrices with n rows and n columns. The asymptotically fastest matrix multiplication algorithms achieve \(\omega <2.38\) [40]; the fastest practically useful matrix multiplication algorithm runs in \(O(n^{\log _2(7)})\approx O(n^{2.81})\) time [63]. WALKS only computes an upper bound for GED.
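The matrix powers at WALKS's core are straightforward to compute; the sketch below builds the adjacency matrix of the direct product graph and the required K-th powers with NumPy. The graph representation (adjacency matrices plus label tables indexed like them) is an assumption of this illustration, and the derivation of the estimates \(\widehat{h}\) and \(\widehat{r}\) from the powers is omitted.

```python
import numpy as np

# Sketch: adjacency matrix of the direct product graph G x H and the K-th
# matrix powers used by WALKS. A_G, A_H are adjacency matrices; node_lbls_*
# give node labels and edge_lbls_*[i][j] the label of edge (u_i, u_j).

def product_graph_powers(A_G, A_H, node_lbls_G, node_lbls_H,
                         edge_lbls_G, edge_lbls_H, K):
    n, m = len(A_G), len(A_H)
    pairs = [(i, k) for i in range(n) for k in range(m)
             if node_lbls_G[i] == node_lbls_H[k]]
    idx = {p: s for s, p in enumerate(pairs)}
    A_x = np.zeros((len(pairs), len(pairs)))
    for (i, k) in pairs:
        for (j, l) in pairs:
            if A_G[i][j] and A_H[k][l] and \
               edge_lbls_G[i][j] == edge_lbls_H[k][l]:
                A_x[idx[(i, k)], idx[(j, l)]] = 1.0
    return (np.linalg.matrix_power(np.asarray(A_G, dtype=float), K),
            np.linalg.matrix_power(np.asarray(A_H, dtype=float), K),
            np.linalg.matrix_power(A_x, K))
```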

5.2.9 The algorithm RING

Like SUBGRAPH and WALKS, the algorithm RING [4, 5] aims at computing a tight upper bound for GED by considering enlarged local structures. Given a constant \(K\in \mathbb {N}_{\ge 1}\), a node \(u_i\in V^G\), and a node \(v_k\in V^H\), RING uses breadth-first search to construct rings \(\mathscr {R}^G_i:=(\mathscr {L}^G_l(u_i))^{K-1}_{l=0}\) and \(\mathscr {R}^H_k:=(\mathscr {L}^H_l(v_k))^{K-1}_{l=0}\) whose \(l^{\text {th}}\) layers are defined as the triplets \(\mathscr {L}^G_l(u_i):=(N^G_l(u_i),{\textit{OE}} ^G_l(u_i),{\textit{IE}} ^G_l(u_i))\) and \(\mathscr {L}^H_l(v_k):=(N^H_l(v_k),{\textit{OE}} ^H_l(v_k),{\textit{IE}} ^H_l(v_k))\), respectively. \({\textit{OE}} ^G_l(u_i):=E^G\cap (N^G_l(u_i)\times N^G_{l+1}(u_i))\) is defined as the set of edges from G that connect nodes at distance l from \(u_i\) to nodes at distance \(l+1\), while the set \({\textit{IE}} ^G_l(u_i):=E^G\cap (N^G_l(u_i)\times N^G_l(u_i))\) contains all edges that connect two nodes at distance l. The edge sets \({\textit{OE}} ^H_l(v_k)\) and \({\textit{IE}} ^H_l(v_k)\) are defined analogously.

In the next step, RING defines ring distances \(d_{\mathscr {R}}(\mathscr {R}^G_i,\mathscr {R}^H_k):=\sum ^{K-1}_{l=0}\lambda _ld_{\mathscr {L}}(\mathscr {L}^G_l(u_i),\mathscr {L}^H_l(v_k))\), where the layer distance \(d_{\mathscr {L}}\) is defined as \(d_{\mathscr {L}}(\mathscr {L}^G_l(u_i),\mathscr {L}^H_l(v_k)):=\alpha _0d_V(N^G_l(u_i),N^H_l(v_k))+\alpha _1d_E({\textit{OE}} ^G_l(u_i),{\textit{OE}} ^H_l(v_k))+\alpha _2d_E({\textit{IE}} ^G_l(u_i),{\textit{IE}} ^H_l(v_k))\), \((\lambda _l)_l\in {\mathbb {R}} ^K_{\ge 0}\) and \((\alpha _s)_s\in {\mathbb {R}} ^3_{\ge 0}\) are simplex vectors, \(d_V\) is a distance measure between node sets, and \(d_E\) is a distance measure between edge sets. RING suggests a strategy that uses blackbox optimization for intelligently choosing the meta-parameters \(\lambda _l\), \(\alpha _s\), and K.

Three different definitions of the node set distance \(d_V\) are suggested. The first proposal is to use the node edit cost function \(c_V\) to construct an auxiliary LSAPE instance \({\mathbf{C}} ^{i,k}\) for the node sets \(N^G_l(u_i)\) and \(N^H_l(v_k)\) and then to define \(d_V(N^G_l(u_i),N^H_l(v_k)):=\mathrm{LSAPE} ({\mathbf{C}} ^{i,k})\). Alternatively, it is proposed to define \(d_V(N^G_l(u_i),N^H_l(v_k)):=\overline{\mathrm{LSAPE} ({\mathbf{C}} ^{i,k})}\), where \(\overline{\mathrm{LSAPE} ({\mathbf{C}} ^{i,k})}\) is a proxy for \(\mathrm{LSAPE} ({\mathbf{C}} ^{i,k})\) which is efficiently computed via greedy LSAPE solvers or fast heuristics based on multiset intersection. The edge set distance measure \(d_E\) can be defined analogously. RING then constructs its LSAPE instance C by setting

$$\begin{aligned}&c_{i,k}:=d_{\mathscr {R}}(\mathscr {R}^G_i,\mathscr {R}^H_k)\\&c_{i,|V^H|+1}:=d_{\mathscr {R}}(\mathscr {R}^G_i,\mathscr {E})\\&c_{|V^G|+1,k}:=d_{\mathscr {R}}(\mathscr {E},\mathscr {R}^H_k) \end{aligned}$$

for all \((i,k)\in [|V^G|]\times [|V^H|]\), where \(\mathscr {E}:=(\emptyset ,\emptyset ,\emptyset )^{K-1}_{l=0}\) is the empty ring of size K.

Constructing all rings for a graph G requires \(O(|V^G|(|V^G|+|E^G|))\) time. Let \(\varOmega \) be the maximum size of a node or edge set that appears in one of the rings rooted at the nodes of G and H. Then, once all rings for G and H have been constructed, RING populates its LSAPE instance C in \(O(|V^G||V^H|\varOmega ^3)\) time if an optimal LSAPE solver is used for computing \(d_V\) and \(d_E\), and in \(O(|V^G||V^H|\varOmega ^2)\) time if greedy or multiset intersection-based heuristics are employed. RING only computes an upper bound for GED.
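The rings can be read off a single BFS per root node; a sketch (adjacency lists and edges represented as frozensets are assumptions of the illustration, and the layer and ring distances are omitted):

```python
from collections import deque

# Sketch of RING's ring construction: BFS from the root yields the layers
# N_l, from which the outer edge sets OE_l and the inner edge sets IE_l are
# read off. adj is an adjacency list; edges is a set of frozenset({u, v}).

def build_ring(adj, edges, root, K):
    dist, queue = {root: 0}, deque([root])
    while queue:
        w = queue.popleft()
        for x in adj[w]:
            if x not in dist:
                dist[x] = dist[w] + 1
                queue.append(x)
    ring = []
    for l in range(K):
        N_l = {u for u, d in dist.items() if d == l}
        OE_l = {e for e in edges
                if {dist.get(u) for u in e} == {l, l + 1}}
        IE_l = {e for e in edges if all(dist.get(u) == l for u in e)}
        ring.append((N_l, OE_l, IE_l))
    return ring
```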

5.2.10 The algorithm RING-ML

The algorithm RING-ML [5] is similar to RING in that it also decomposes the input graphs into rings rooted at their nodes. However, instead of computing distances between the rings, RING-ML constructs a feature vector \({\mathbf{x}} ^{i,k}:=({\mathbf{x}} ^{G,H},{\mathbf{x}} ^0,\ldots ,{\mathbf{x}} ^l,\ldots ,{\mathbf{x}} ^{K-1})\in {\mathbb {R}} ^{6K+10}\) for all \((i,k)\in [|V^G|+1]\times [|V^H|+1]\). The features contained in \({\mathbf{x}} ^l\in {\mathbb {R}} ^6\) express the dissimilarity of the \(l^{\text {th}}\) layers of the rings \(\mathscr {R}^G_i\) and \(\mathscr {R}^H_k\). They are defined as \(\mathbf{x}^l_0:=|N^G_l(u_i)|-|N^H_l(v_k)|\), \(\mathbf{x}^l_1:=|{\textit{OE}} ^G_l(u_i)|-|{\textit{OE}} ^H_l(v_k)|\), \(\mathbf{x}^l_2:=|{\textit{IE}} ^G_l(u_i)|-|{\textit{IE}} ^H_l(v_k)|\), \(\mathbf{x}^l_3:=d_V(N^G_l(u_i),N^H_l(v_k))\), \(\mathbf{x}^l_4:=d_E({\textit{OE}} ^G_l(u_i),{\textit{OE}} ^H_l(v_k))\), and \(\mathbf{x}^l_5:=d_E({\textit{IE}} ^G_l(u_i),{\textit{IE}} ^H_l(v_k))\), where \(u_{|V^G|+1}:=v_{|V^H|+1}:=\varepsilon \). The vector \({\mathbf{x}} ^{G,H}\) contains global features that are the same for all \((i,k)\in [|V^G|]\times [|V^H|]\): the number of nodes and edges of G and H, the average costs for deleting nodes and edges from G, the average costs for inserting nodes and edges into H, and the average costs for substituting nodes and edges in G by nodes and edges in H.

Given a training set, RING-ML defines a node assignment \((u,v)\in (V^G\cup \{\varepsilon \})\times (V^H\cup \{\varepsilon \})\) as good if and only if there is a node map \(\pi \in \varPi (G,H)\) with \(c(P_\pi )=\mathrm{GED} (G,H)\) and \((u,v)\in \pi \). Next, RING-ML learns a function \(p^\star \) that estimates the probability that a node assignment is good. This is done by computing optimal or close-to-optimal node maps between all graphs contained in a training set and then training a support vector classifier with probability estimates and RBF kernel, a one-class support vector machine with RBF kernel, or a fully connected, feedforward neural network on the generated training data. Once the probability estimate \(p^\star \) has been learned, RING-ML constructs its LSAPE instance C by setting

$$\begin{aligned}&c_{i,k}:=1-p^\star ({\mathbf{x}} ^{i,k})\\&c_{i,|V^H|+1}:=1-p^\star ({\mathbf{x}} ^{i,|V^H|+1})\\&c_{|V^G|+1,k}:=1-p^\star ({\mathbf{x}} ^{|V^G|+1,k}) \end{aligned}$$

for all \((i,k)\in [|V^G|]\times [|V^H|]\).

Depending on the choice of the node and edge set distances \(d_V\) and \(d_E\), RING-ML requires \(O(|V^G||V^H|\varOmega ^3)\) or \(O(|V^G||V^H|\varOmega ^2)\) time for constructing all feature vectors for the graphs G and H. The time complexity of populating C once the feature vectors have been constructed depends on the employed machine learning technique. RING-ML only computes an upper bound for GED.

5.2.11 The algorithm PREDICT

The algorithm PREDICT [5, 54] differs from RING-ML only in that it uses different feature vectors \({\mathbf{x}} ^{i,k}\). For computing its feature vectors, PREDICT first constructs an auxiliary LSAPE instance \({\mathbf{C}} ^{{\texttt {BP}}}\) as done by the algorithm BP presented in Sect. 5.2.2. Subsequently, for all \((i,k)\in [|V^G|+1]\times [|V^H|+1]\), PREDICT defines \({\mathbf{x}} ^{i,k}:=({\mathbf{x}} ^{G,H},{\mathbf{x}} ^i,{\mathbf{x}} ^k,c_V(u_i,v_k),c^{{\texttt {BP}}}_{i,k}-c_V(u_i,v_k))\in {\mathbb {R}} ^{24}\), where \(u_{|V^G|+1}:=v_{|V^H|+1}:=\varepsilon \). The vector \({\mathbf{x}} ^{G,H}\in {\mathbb {R}} ^4\) contains four global features: the maximum, the minimum, the average, and the deviation of \({\mathbf{C}} ^{{\texttt {BP}}}\). The vector \({\mathbf{x}} ^i\in {\mathbb {R}} ^9\) contains nine features associated with the \(i^{\text {th}}\) row of \({\mathbf{C}} ^{{\texttt {BP}}}\): its maximum, its minimum, its average, its deviation, its uniqueness, its divergence, its leader, its interval, and its outlierness. Analogously, \({\mathbf{x}} ^k\in {\mathbb {R}} ^9\) contains nine features associated with the \(k^{\text {th}}\) column of \({\mathbf{C}} ^{{\texttt {BP}}}\). Finally, \({\mathbf{x}} ^{i,k}\) contains features for the node and the edge edit costs which are induced by assigning \(u_i\) to \(v_k\). Once all feature vectors have been constructed, PREDICT proceeds exactly like RING-ML.

5.3 Extensions of the paradigm LSAPE-GED

Next, we present two extensions of the paradigm LSAPE-GED. Both of them can be used to improve all of the heuristics described in Sect. 5.2.

5.3.1 The extension CENTRALITIES

Assume that an LSAPE instance \({\mathbf{C}} \in {\mathbb {R}} ^{(|V^G|+1)\times (|V^H|+1)}\) has been constructed by one of the instantiations of LSAPE-GED presented in Sect. 5.2. In [25, 53], it is suggested to define a node centrality measure \(\phi \) that maps central nodes to large and non-central nodes to small non-negative reals. Suggested centrality measures are, for instance, the degrees, the eigenvector centralities [11], and the pagerank centralities [18] of the nodes of the input graphs.

With the help of \(\phi \), the upper bound for GED induced by C can be improved. To this purpose, a second LSAPE instance \({\mathbf{C}} ^\prime \in {\mathbb {R}} ^{(|V^G|+1)\times (|V^H|+1)}\) is constructed by setting

$$\begin{aligned}&c^\prime _{i,k}:=(1-\gamma )\cdot c_{i,k}+\gamma \cdot |\phi (u_i)-\phi (v_k)|\\&c^\prime _{i,|V^H|+1}:=(1-\gamma )\cdot c_{i,|V^H|+1}+\gamma \cdot \phi (u_i)\\&c^\prime _{|V^G|+1,k}:=(1-\gamma )\cdot c_{|V^G|+1,k}+\gamma \cdot \phi (v_k) \end{aligned}$$

for all \((i,k)\in [|V^G|]\times [|V^H|]\), where \(0\le \gamma \le 1\) is a meta-parameter. Subsequently, two cheap or optimal LSAPE solutions \(\pi ,\pi ^\prime \in \varPi (|V^G|,|V^H|)\) for C and \({\mathbf{C}} ^\prime \) are computed, and the returned upper bound for GED is improved from \({\textit{UB}} :=c(P_\pi )\) to \({\textit{UB}} :=\min \{c(P_\pi ),c(P_{\pi ^\prime })\}\).
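A minimal sketch of this construction, assuming that C is given as a numpy array and that the centralities \(\phi (u_i)\) and \(\phi (v_k)\) have already been computed (e.g., node degrees or pagerank values):

```python
import numpy as np

def centralities_instance(C, phi_G, phi_H, gamma):
    """Build C' from the LSAPE instance C, centrality vectors phi_G
    (one entry per node of G) and phi_H (one entry per node of H),
    and the meta-parameter gamma in [0, 1]."""
    phi_G = np.asarray(phi_G, dtype=float)
    phi_H = np.asarray(phi_H, dtype=float)
    nG, nH = len(phi_G), len(phi_H)
    Cp = np.array(C, dtype=float)
    # substitution cells: mix C with the centrality differences
    Cp[:nG, :nH] = (1 - gamma) * C[:nG, :nH] + \
        gamma * np.abs(phi_G[:, None] - phi_H[None, :])
    # deletion column and insertion row: mix C with the centralities
    Cp[:nG, nH] = (1 - gamma) * C[:nG, nH] + gamma * phi_G
    Cp[nG, :nH] = (1 - gamma) * C[nG, :nH] + gamma * phi_H
    return Cp
```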

5.3.2 The extension MULTI-SOL

Given a constant \(K\in {\mathbb {N}} _{\ge 1}\), the extension MULTI-SOL of the paradigm LSAPE-GED suggested in [5, 26] improves the upper bound returned by instantiations of LSAPE-GED by considering not only one, but rather up to K optimal LSAPE solutions. MULTI-SOL cannot be used in combination with greedy LSAPE solvers, as it requires that the LSAPE instance C is solved optimally in line 2 of Algorithm 1. Once the first optimal LSAPE solution \(\pi ^\star _0\in \varPi ^\star ({\mathbf{C}})\) has been computed, MULTI-SOL uses a variant of the algorithm suggested in [64] for enumerating \(K^\prime :=\min \{K,|\varPi ^\star ({\mathbf{C}})|\}-1\) optimal LSAPE solutions \(\{\pi ^\star _l\}^{K^\prime }_{l=1}\), all of which are pairwise different and different from \(\pi ^\star _0\). Since K is a constant, this enumeration requires only \(O(|V^G|+|V^H|)\) additional time. Subsequently, the upper bound for GED is improved from \({\textit{UB}} :=c(P_{\pi ^\star _0})\) to \({\textit{UB}} :=\min ^{K^\prime }_{l=0}c(P_{\pi ^\star _l})\).

6 Heuristics based on linear programming

In this section, we first introduce the paradigm LP-GED, which generalizes heuristics that use linear programming for lower and upper bounding GED (Sect. 6.1). Subsequently, we present heuristics that can be modeled as instantiations of LP-GED (Sect. 6.2).

6.1 The paradigm LP-GED

Recall the alternative Definition 4 of GED, which defines the problem of computing GED as a minimization problem over the set of all node maps between two graphs G and H. This definition can straightforwardly be transformed into the quadratic programming formulation of GED detailed in Fig. 4 [16, 49]. The binary decision variables \(x^{\mathrm{sub}}_{i,k}\), \(x^{\mathrm{del}}_i\), and \(x^{\mathrm{ins}}_k\) indicate, respectively, whether the node \(u_i\in V^G\) is to be substituted by the node \(v_k\in V^H\), whether \(u_i\) is to be deleted, and whether \(v_k\) is to be inserted. Analogously, the binary decision variables \(y^{\mathrm{sub}}_{i,j,k,l}\), \(y^{\mathrm{del}}_{i,j}\), and \(y^{\mathrm{ins}}_{k,l}\) indicate, respectively, whether the edge \((u_i,u_j)\in E^G\) is to be substituted by the edge \((v_k,v_l)\in E^H\), whether \((u_i,u_j)\) is to be deleted, and whether \((v_k,v_l)\) is to be inserted. The quadratic constraint \(y^{\mathrm{sub}}_{i,j,k,l}-x^{\mathrm{sub}}_{i,k}x^{\mathrm{sub}}_{j,l}-x^{\mathrm{sub}}_{i,l}x^{\mathrm{sub}}_{j,k}=0\) ensures that \((u_i,u_j)\) is substituted by \((v_k,v_l)\) if and only if the node map \(\pi \) encoded by \(\mathbf{x^\mathrm{sub}}\), \(\mathbf{x^\mathrm{del}}\), and \(\mathbf{x^\mathrm{ins}}\) satisfies \(\pi (u_i,u_j)=(v_k,v_l)\).

Fig. 4 A quadratic programming formulation of GED

Heuristics that use linear programs (LP) for upper and lower bounding GED proceed as described in Algorithm 2: In a first step, the quadratic program shown in Fig. 4 is linearized to obtain a (mixed) integer linear programming (MIP) formulation \(\widehat{F}\) (line 1). This linearization phase is where different instantiations of LP-GED vary from each other. Next, all integrality constraints contained in \(\widehat{F}\) are continuously relaxed, which yields an LP F (line 2). Subsequently, F is solved and the lower bound LB is set to the cost of an optimal solution of F.

In the literature, LP-based heuristics for GED are usually described as algorithms that only yield lower bounds. However, they can straightforwardly be extended to also compute upper bounds. To that purpose, after solving the LP F, an LSAPE instance \({\mathbf{C}} \in {\mathbb {R}} ^{(|V^G|+1)\times (|V^H|+1)}\) is constructed, whose optimal solutions \(\pi ^\star \in {{\,\mathrm{arg\,min}\,}}_{\pi \in \varPi (|V^G|,|V^H|)}{\mathbf{C}} (\pi )\) can be viewed as projections of the previously computed optimal and possibly continuous solution for F to the discrete domain (line 4). Subsequently, an optimal solution \(\pi ^\star \) for C is computed (line 5), the upper bound UB is set to its induced edit cost (line 6), and LB and UB are returned (line 7).

In theory, the LP F can be solved in \(O({{\,\mathrm{var}\,}}(F)^{3.5}{{\,\mathrm{enc}\,}}(F))\) time, where \({{\,\mathrm{var}\,}}(F)\) is the number of variables contained in F and \({{\,\mathrm{enc}\,}}(F)\) is the number of bits needed to encode F [37]. However, popular LP solvers such as IBM CPLEX or Gurobi Optimization often use asymptotically slower but practically faster algorithms. Solving the projection problem C requires \(O(\min \{|V^G|,|V^H|\}^2\max \{|V^G|,|V^H|\})\) time.

Algorithm 2 Pseudocode of the paradigm LP-GED

6.2 Instantiations of the paradigm LP-GED

We present four different linearizations of the quadratic program shown in Fig. 4. The first three linearizations, presented in Sects. 6.2.1–6.2.3, are designed for general graphs and edit costs. They hence yield not only upper and lower bounds for GED as detailed in Algorithm 2, but also exact algorithms if fed into exact MIP solvers. The last linearization, presented in Sect. 6.2.4, requires the edge edit costs to be constant and symmetric. Furthermore, it is designed for graphs without edge labels. If used with graphs whose edges are labeled, it hence does not yield an exact algorithm for GED, even if fed into an exact MIP solver.

6.2.1 The linearization F1

The integer linear program F1 [43, 44] displayed in Fig. 5 is a straightforward linearization of the quadratic programming formulation shown in Fig. 4. F1 has \(|V^G||V^H|+|V^G|+|V^H|+|E^G||E^H|+|E^G|+|E^H|=O(|E^G||E^H|)\) binary variables and \(|V^G|+|V^H|+|E^G|+|E^H|+2|E^G||E^H|=O(|E^G||E^H|)\) constraints. Given an optimal solution \((\mathbf{x}^{\mathrm{sub}},\mathbf{x}^{\mathrm{del}},\mathbf{x}^{\mathrm{ins}},\mathbf{y}^{\mathrm{sub}},\mathbf{y}^{\mathrm{del}},\mathbf{y}^{\mathrm{ins}})\) for the continuous relaxation of F1, the projection problem C is defined as

$$\begin{aligned}&c_{i,k}:=1-x^{\mathrm{sub}}_{i,k}\\&c_{i,|V^H|+1}:=1-x^{\mathrm{del}}_{i}\\&c_{|V^G|+1,k}:=1-x^{\mathrm{ins}}_{k} \end{aligned}$$

for all \((i,k)\in [|V^G|]\times [|V^H|]\).

Fig. 5 Linearization F1 suggested in [43, 44]

6.2.2 The linearization F2

The linearization F2 [44] displayed in Fig. 6 improves F1 by reducing the number of variables and constraints. It uses the fact that the node and edge substitution variables \(\mathbf{x}^{\mathrm{sub}}\) and \(\mathbf{y}^{\mathrm{sub}}\) implicitly encode the node and edge deletion and insertion variables \(\mathbf{x}^{\mathrm{del}}\), \(\mathbf{x}^{\mathrm{ins}}\), \(\mathbf{y}^{\mathrm{del}}\), and \(\mathbf{y}^{\mathrm{ins}}\). F2 has \(|V^G|+|V^H|+|V^H||E^G|=O(|V^H||E^G|)\) constraints and \(|V^G||V^H|+|E^G||E^H|=O(|E^G||E^H|)\) binary variables. Given an optimal solution \((\mathbf{x}^{\mathrm{sub}},\mathbf{y}^{\mathrm{sub}})\) for the continuous relaxation of F2, the projection problem C is defined as

$$\begin{aligned}&c_{i,k}:=1-x^{\mathrm{sub}}_{i,k}\\&c_{i,|V^H|+1}:=\sum _{v_k\in V^H}x^{\mathrm{sub}}_{i,k}\\&c_{|V^G|+1,k}:=\sum _{u_i\in V^G}x^{\mathrm{sub}}_{i,k} \end{aligned}$$

for all \((i,k)\in [|V^G|]\times [|V^H|]\).

Fig. 6 Linearization F2 suggested in [44]. The modified edit costs \(c^\prime _V\) and \(c^\prime _E\) and the constant C are defined as \(c^\prime _V(u_i,v_k):=c_V(u_i,v_k)-c_V(u_i,\varepsilon )-c_V(\varepsilon ,v_k)\), \(c^\prime _E((u_i,u_j),(v_k,v_l)):=c_E((u_i,u_j),(v_k,v_l))-c_E((u_i,u_j),\varepsilon )-c_E(\varepsilon ,(v_k,v_l))\), and \(C:=\sum _{u_i\in V^G}c_V(u_i,\varepsilon )+\sum _{v_k\in V^H}c_V(\varepsilon ,v_k)+\sum _{(u_i,u_j)\in E^G}c_E((u_i,u_j),\varepsilon )+\sum _{(v_k,v_l)\in E^H}c_E(\varepsilon ,(v_k,v_l))\)

6.2.3 The linearization COMPACT-MIP

The linearization COMPACT-MIP [10] displayed in Fig. 7 makes do without edge variables. Instead, it contains continuous variables \(\mathbf{z}^{\mathrm{sub}}\), \(\mathbf{z}^{\mathrm{del}}\), and \(\mathbf{z}^{\mathrm{ins}}\), which, at the optimum, equal the edit costs induced by the node assignment \(\pi \) encoded by the optimal binary node variables \(\mathbf{x^\mathrm{sub}}\), \(\mathbf{x^\mathrm{del}}\), and \(\mathbf{x^\mathrm{ins}}\). COMPACT-MIP has \(|V^G||V^H|+|V^G|+|V^H|=O(|V^G||V^H|)\) binary variables, \(|V^G||V^H|+|V^G|+|V^H|=O(|V^G||V^H|)\) continuous variables, and \(|V^G||V^H|+|V^G|+|V^H|=O(|V^G||V^H|)\) constraints. It is hence smaller than both F1 and F2. Given an optimal solution \((\mathbf{x}^{\mathrm{sub}},\mathbf{x}^{\mathrm{del}},\mathbf{x}^{\mathrm{ins}},\mathbf{z}^{\mathrm{sub}},\mathbf{z}^{\mathrm{del}},\mathbf{z}^{\mathrm{ins}})\) for the continuous relaxation of COMPACT-MIP, the projection problem C is defined as

$$\begin{aligned}&c_{i,k}:=1-x^{\mathrm{sub}}_{i,k}\\&c_{i,|V^H|+1}:=1-x^{\mathrm{del}}_{i}\\&c_{|V^G|+1,k}:=1-x^{\mathrm{ins}}_{k} \end{aligned}$$

for all \((i,k)\in [|V^G|]\times [|V^H|]\).

Fig. 7 Linearization COMPACT-MIP suggested in [10]. For all \((u_i,v_k)\in V^G\times V^H\), the constants \(a^{\mathrm{sub}}_{i,k}\), \(a^{\mathrm{del}}_i\), and \(a^{\mathrm{ins}}_k\) are defined as \(a^{\mathrm{sub}}_{i,k}:=0.5\cdot [\sum _{u_j\in N^G(u_i)}\sum _{v_l\in N^H(v_k)}c_E((u_i,u_j),(v_k,v_l))+\sum _{u_j\in N^G(u_i)}(|V^H|-\deg ^H(v_k)+1)c_E((u_i,u_j),\varepsilon )+\sum _{v_l\in N^H(v_k)}(|V^G|-\deg ^G(u_i)+1)c_E(\varepsilon ,(v_k,v_l))]\), \(a^{\mathrm{del}}_i:=0.5\cdot \sum _{u_j\in N^G(u_i)}(|V^H|+1)c_E((u_i,u_j),\varepsilon )\), and \(a^{\mathrm{ins}}_k:=0.5\cdot \sum _{v_l\in N^H(v_k)}(|V^G|+1)c_E(\varepsilon ,(v_k,v_l))\)

6.2.4 The linearization ADJ-IP

The linearization ADJ-IP [36] displayed in Fig. 8 requires the edge edit costs \(c_E\) to be constant and symmetric. Furthermore, it is designed for graphs without edge labels. ADJ-IP has \(3(|V^G|+|V^H|)^2=O((|V^G|+|V^H|)^2)\) binary variables and \(2|V^G|+2|V^H|+|V^G||V^H|+|V^G|^2+|V^H|^2=O((|V^G|+|V^H|)^2)\) constraints. Note that ADJ-IP sets all edge substitution costs to 0. If used with graphs whose edges are labeled, it hence ignores all edit costs induced by substituting an edge \((u_i,u_j)\in E^G\) by an edge \((v_k,v_l)\in E^H\) with \(\ell ^G_E(u_i,u_j)\ne \ell ^H_E(v_k,v_l)\). Given an optimal solution \((\mathbf{x},\mathbf{s},\mathbf{t})\) for the continuous relaxation of ADJ-IP, the projection problem C is defined as

$$\begin{aligned}&c_{i,k}:=1-x_{i,k}\\&c_{i,|V^H|+1}:=\sum _{v_k\in V^H}x_{i,k}\\&c_{|V^G|+1,k}:=\sum _{u_i\in V^G}x_{i,k} \end{aligned}$$

for all \((i,k)\in [|V^G|]\times [|V^H|]\).

Fig. 8 Linearization ADJ-IP suggested in [36]

7 Heuristics based on local search

In this section, we first introduce the paradigm LS-GED, which generalizes heuristics that use variants of local search for upper bounding GED (Sect. 7.1). Subsequently, we present heuristics that can be modeled as instantiations (Sect. 7.2) and extensions (Sect. 7.3) of LS-GED.

7.1 The paradigm LS-GED

Algorithm 3 shows how to compute an upper bound for GED via a variant of local search. In the first step, an initial node map \(\pi \in \varPi (G,H)\) is generated randomly or constructed, for instance, by calling an instantiation of LSAPE-GED (line 1). Subsequently, a variant of local search is run, which produces an improved node map \(\pi ^\prime \in \varPi (G,H)\) with \(c(P_{\pi ^\prime })\le c(P_\pi )\) (line 2). This refinement phase is where different instantiations of LS-GED vary from each other. Once \(\pi ^\prime \) has been computed, \({\textit{UB}} :=c(P_{\pi ^\prime })\) is returned (lines 3 to 4).

Algorithm 3 Pseudocode of the paradigm LS-GED

7.2 Instantiations of the paradigm LS-GED

In the sequel, we present five algorithms for heuristically computing GED that can be modeled as instantiations of the paradigm LS-GED. All of them yield upper but no lower bounds for GED and can be used with general edit costs.

7.2.1 The algorithm REFINE

Given an initial node map \(\pi \in \varPi (G,H)\), the algorithm REFINE [12, 68] proceeds as follows: Let \(((u_s,v_s))^{|\pi |}_{s=1}\) be an arbitrary ordering of the node assignments contained in \(\pi \), let \(u_{|\pi |+1}:=v_{|\pi |+1}:=\varepsilon \), and let \(G_\pi :=(V^G_\pi \cup V^H_\pi ,A_\pi )\) be an auxiliary directed bipartite graph, where \(V^G_\pi :=\{u_s\mid s\in [|\pi |+1]\}\), \(V^H_\pi :=\{v_s\mid s\in [|\pi |+1]\}\), and \(A_\pi :=\pi \cup \{(u_{|\pi |+1},v_{|\pi |+1})\}\cup \{(v_s,u_{s^\prime })\mid (s,s^\prime )\in [|\pi |+1]\times [|\pi |+1]\wedge s\ne s^\prime \}\). In other words, \(G_\pi \) contains a forward arc for each assignment contained in \(\pi \), a forward arc between the additionally added dummy nodes \(u_{|\pi |+1}\) and \(v_{|\pi |+1}\), and backward arcs between nodes in \(V^G_\pi \) and \(V^H_\pi \) that are not assigned to each other by \(\pi \). Note that, by definition of a node map, \(V^G_\pi \) contains each node \(u\in V^G\) exactly once, \(V^H_\pi \) contains each node \(v\in V^H\) exactly once, but both \(V^G_\pi \) and \(V^H_\pi \) might contain multiple copies of the dummy node \(\varepsilon \). A directed cycle \(C\subseteq A_\pi \) in \(G_\pi \) with \(|C|=4\) is called a swap. There are exactly \(\left( {\begin{array}{c}|\pi |+1\\ 2\end{array}}\right) =O((|V^G|+|V^H|)^2)\) swaps.

For each swap \(C{=}\{(u_s,v_s),(v_s,u_{s^\prime }),(u_{s^\prime },v_{s^\prime }),(v_{s^\prime },u_s)\}\), REFINE checks if the swapped node map \(\pi ^\prime :=(\pi {\setminus }\{(u_s,v_s),(u_{s^\prime },v_{s^\prime })\})\cup \{(u_s,v_{s^\prime }),(u_{s^\prime },v_s)\}\) induces a smaller upper bound than \(\pi \). If, at the end of the for-loop, a node map \(\pi ^\prime \) has been found that improves the upper bound, \(\pi \) is updated to the node map that yields the largest improvement and the process iterates. Otherwise, the output node map \(\pi ^\prime \) is set to \(\pi \) and REFINE terminates.

For checking if a swap C improves the induced upper bound, it suffices to consider the edges that are incident with the nodes involved in the swap. Therefore, one iteration of REFINE runs in \(O((|V^G|+|V^H|)^2\varDelta ^{G,H}_{\mathrm{max}})\) time, where \(\varDelta ^{G,H}_{\mathrm{max}}:=\max \{{{\,\mathrm{max\,deg}\,}}(G),{{\,\mathrm{max\,deg}\,}}(H)\}\). Since the induced upper bound improves in each iteration, this gives an overall runtime complexity of \(O({\textit{UB}} (|V^G|+|V^H|)^2\varDelta ^{G,H}_{\mathrm{max}})\) for integral edit costs, where \({\textit{UB}} \) is the initial upper bound.
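A minimal sketch of REFINE's main loop, assuming a user-supplied function `induced_cost` that evaluates \(c(P_\pi )\) for a node map given as a list of pairs (with None playing the role of \(\varepsilon \)); for brevity, the sketch re-evaluates each swapped node map from scratch instead of performing the \(O(\varDelta ^{G,H}_{\mathrm{max}})\) incremental update described above.

```python
def refine(node_map, induced_cost):
    """Best-improvement swap search over all pairs of assignments,
    including the dummy assignment (None, None)."""
    pi = list(node_map) + [(None, None)]
    cost = lambda m: induced_cost([a for a in m if a != (None, None)])
    best = cost(pi)
    while True:
        best_swap, best_cost = None, best
        for s in range(len(pi)):
            for t in range(s + 1, len(pi)):
                cand = list(pi)
                (us, vs), (ut, vt) = cand[s], cand[t]
                cand[s], cand[t] = (us, vt), (ut, vs)  # swap partners
                c = cost(cand)
                if c < best_cost:
                    best_swap, best_cost = cand, c
        if best_swap is None:  # no swap improves the upper bound
            return [a for a in pi if a != (None, None)], best
        pi, best = best_swap, best_cost
```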

7.2.2 The algorithm K-REFINE

The algorithm K-REFINE [12] is a straightforward extension of the algorithm REFINE presented in the previous section. Let \(\pi \in \varPi (G,H)\) be an initial node map and \(K\in {\mathbb {N}} _{\ge 2}\) be a constant. Furthermore, let the auxiliary directed bipartite graph \(G_\pi \) be defined as in the previous section. A directed cycle \(C\subseteq A_\pi \) in \(G_\pi \) with \(|C|=2K^\prime \) is called a swap of size \(K^\prime \), where \(K^\prime \in \{2,\ldots ,K\}\). There are exactly \(\left( {\begin{array}{c}|\pi |+1\\ K^\prime \end{array}}\right) (K^\prime -1)!=O((|V^G|+|V^H|)^{K^\prime })\) swaps of size \(K^\prime \).

Starting with \(K^\prime :=2\), K-REFINE checks if there is a swap of size \(K^\prime \) that improves the induced upper bound. If so, \(\pi \) is updated to the node map obtained by the swap of size \(K^\prime \) that yields the largest improvement, \(K^\prime \) is reset to 2, and the process iterates. If no swap of size \(K^\prime \) yields an improvement, K-REFINE checks whether \(K^\prime \) equals the maximal swap size K. If this is the case, the output node map \(\pi ^\prime \) is set to \(\pi \) and K-REFINE terminates. Otherwise, K-REFINE increments \(K^\prime \) and continues the search.

One iteration of K-REFINE runs in \(O((|V^G|+|V^H|)^K\varDelta ^{G,H}_{\mathrm{max}})\) time. For integral edit costs, K-REFINE ’s overall runtime complexity is hence \(O({\textit{UB}} (|V^G|+|V^H|)^K\varDelta ^{G,H}_{\mathrm{max}})\), where \({\textit{UB}} \) is the initial upper bound.

7.2.3 The algorithm BP-BEAM

Given an initial node map \(\pi \in \varPi (G,H)\) and a constant \(K\in {\mathbb {N}} _{\ge 1}\), the algorithm BP-BEAM [56] starts by producing a random ordering \(((u_s,v_s))^{|\pi |}_{s=1}\) of the node assignments contained in \(\pi \). BP-BEAM now constructs an output node map \(\pi ^\prime \) with \(c(P_{\pi ^\prime })\le c(P_{\pi })\) by partially traversing an implicitly constructed tree T via beam search with beam size K. The nodes of T are tuples \((\pi ^{\prime \prime },c(P_{\pi ^{\prime \prime }}),s)\), where \(\pi ^{\prime \prime }\in \varPi (G,H)\) is an ordered node map, \(c(P_{\pi ^{\prime \prime }})\) is its induced edit cost, and \(s\in [|\pi |]\) is the depth of the tree node in T. Tree nodes \((\pi ^{\prime \prime },c(P_{\pi ^{\prime \prime }}),s)\) with \(s=|\pi |\) are leaves, and the children of an inner node \((\pi ^{\prime \prime },c(P_{\pi ^{\prime \prime }}),s)\) are \(\{({{\,\mathrm{swap}\,}}(\pi ^{\prime \prime },s,s^\prime ),c(P_{{{\,\mathrm{swap}\,}}(\pi ^{\prime \prime },s,s^\prime )}),s+1)\mid s^\prime \in \{s,\ldots ,|\pi |\}\}\). Here, \({{\,\mathrm{swap}\,}}(\pi ^{\prime \prime },s,s^\prime )\) is the ordered node map obtained from \(\pi ^{\prime \prime }\) by swapping the assignments \((u_s,v_s)\) and \((u_{s^\prime },v_{s^\prime })\), i.e., by exchanging the assigned nodes \(v_s\) and \(v_{s^\prime }\).

At initialization, BP-BEAM sets the output node map \(\pi ^\prime \) to the initial node map \(\pi \). Furthermore, BP-BEAM maintains a priority queue q of tree nodes which is initialized as \(q:=\{(\pi ,c(P_\pi ),1)\}\) and sorted w.r.t. non-decreasing induced edit cost of the contained node maps. As long as q is non-empty, BP-BEAM extracts the top node \((\pi ^{\prime \prime },c(P_{\pi ^{\prime \prime }}),s)\) from q and updates the output node map \(\pi ^\prime \) to \(\pi ^{\prime \prime }\) if \(c(P_{\pi ^{\prime \prime }})<c(P_{\pi ^\prime })\). If \(s<|\pi |\), BP-BEAM adds all of its children to the priority queue q and subsequently discards all but the first K tree nodes contained in q. Once q is empty, the cheapest encountered node map \(\pi ^\prime \) is returned.

By construction of T, we know that at most \(1+K(|\pi |-1)=O(|V^G|+|V^H|)\) tree nodes are extracted from q. For each extracted inner node, BP-BEAM has to construct all children, which requires \(O((|V^G|+|V^H|)\varDelta ^{G,H}_{\mathrm{max}})\) time, and subsequently sort q, which requires \(O((|V^G|+|V^H|)\log (|V^G|+|V^H|))\) time. BP-BEAM hence runs in \(O((|V^G|+|V^H|)^2(\varDelta ^{G,H}_{\mathrm{max}}+\log (|V^G|+|V^H|)))\) time.

7.2.4 The algorithm IBP-BEAM

Since the size of the priority queue q is restricted to K, which parts of the search tree T are visited by BP-BEAM crucially depends on the ordering of the initial node map \(\pi \). Therefore, BP-BEAM can be improved by considering not one but several initial orderings. The algorithm IBP-BEAM suggested in [27] does exactly this. That is, given a constant number of iterations \(I\in {\mathbb {N}} _{\ge 1}\), IBP-BEAM runs BP-BEAM with I different randomly created orderings of the initial node map \(\pi \) and then returns the cheapest node map \(\pi ^\prime \) encountered in one of the iterations. Therefore, IBP-BEAM runs in \(O(I(|V^G|+|V^H|)^2(\varDelta ^{G,H}_{\mathrm{max}}+\log (|V^G|+|V^H|)))\) time.

7.2.5 The algorithm IPFP

The algorithm IPFP [42] can be seen as an adaptation of the seminal Frank–Wolfe algorithm [30] to cases where an integer solution is required. Its adaptation to the case of GED, first suggested in [16], implicitly constructs a matrix \({\mathbf{Q}} \in {\mathbb {R}} ^{((|V^G|+1)\cdot (|V^H|+1))\times ((|V^G|+1)\cdot (|V^H|+1))}\) such that \(\min _{{\mathbf{X}} \in \varPi (|V^G|,|V^H|)}{{{\,\mathrm{vec}\,}}({\mathbf{X}})}^{\textsf {T}}{\mathbf{Q}} {{\,\mathrm{vec}\,}}({\mathbf{X}})=\mathrm{GED} \). Recall that \(\varPi (|V^G|,|V^H|)\subseteq \{0,1\}^{(|V^G|+1)\times (|V^H|+1)}\) is the set of feasible LSAPE solutions of size \((|V^G|+1)\times (|V^H|+1)\) and that LSAPE solutions of size \((|V^G|+1)\times (|V^H|+1)\) are equivalent to node maps between G and H. \({\mathbf{Q}} \) can hence be viewed as a matrix representation of the quadratic program shown in Fig. 4. In this context, we define the cost function \(c:[0,1]^{(|V^G|+1)\cdot (|V^H|+1)}\rightarrow {\mathbb {R}} \) as \(c({\mathbf{X}}):={{{\,\mathrm{vec}\,}}({\mathbf{X}})}^{\textsf {T}}{\mathbf{Q}} {{\,\mathrm{vec}\,}}({\mathbf{X}})\).

Starting from an initial node map \({\mathbf{X}} _0\in \varPi (|V^G|,|V^H|)\) with induced upper bound \({\textit{UB}} :=c({\mathbf{X}} _0)\), the algorithm converges to a possibly fractional local minimum for GED by repeating the following five steps:

1. Populate the LSAPE instance \({\mathbf{C}} _k:={\mathbf{Q}} {{\,\mathrm{vec}\,}}({\mathbf{X}} _k)\).

2. Compute \({\mathbf{B}} _{k+1}\in {{\,\mathrm{arg\,min}\,}}_{{\mathbf{B}} \in \varPi (|V^G|,|V^H|)}{\mathbf{C}} _k({\mathbf{B}})\).

3. Set \({\textit{UB}} :=\min \{{\textit{UB}},c({\mathbf{B}} _{k+1})\}\).

4. Compute \(\alpha _{k+1}\in {{\,\mathrm{arg\,min}\,}}_{\alpha \in [0,1]}c({\mathbf{X}} _k+\alpha \cdot ({\mathbf{B}} _{k+1}-{\mathbf{X}} _k))\).

5. Set \({\mathbf{X}} _{k+1}:={\mathbf{X}} _k+\alpha _{k+1}({\mathbf{B}} _{k+1}-{\mathbf{X}} _k)\).

The algorithm iterates until \(|c({\mathbf{X}} _k)-{\mathbf{C}} _k({\mathbf{B}} _{k+1})|/c({\mathbf{X}} _k)\) is smaller than a convergence threshold \(\varepsilon \) or a maximal number of iterations I has been reached. Subsequently, the possibly fractional local optimum \({\mathbf{X}} _{k+1}\) is projected to the closest integral solution \(\widehat{{\mathbf{X}}}\), and the upper bound \({\textit{UB}} :=\min \{{\textit{UB}},c(\widehat{{\mathbf{X}}})\}\) is returned.

Populating the LSAPE instance \({\mathbf{C}} _k\) in step 1 requires \(O(k|V^G||V^H|\max \{|V^G|,|V^H|\})\) time. Solving the LSAPE instance in step 2 requires \(O(\min \{|V^G|,|V^H|\}^2\max \{|V^G|,|V^H|\})\) time. Updating the upper bound in step 3 requires \(O(\max \{|V^G|,|V^H|\}^2)\) time. Determining the optimal step width \(\alpha _{k+1}\) in step 4 can be done analytically in \(O(|V^G||V^H|)\) time. Finally, projecting the final fractional solution \({\mathbf{X}} _{k+1}\) to the integral solution \(\widehat{{\mathbf{X}}}\) requires \(O(\min \{|V^G|,|V^H|\}^2\max \{|V^G|,|V^H|\})\) time. IPFP ’s overall runtime complexity is hence \(O(I^2|V^G||V^H|\max \{|V^G|,|V^H|\})\).

Slightly different versions of IPFP that use LSAP instead of LSAPE as a linear model have been presented in [14] and [7]. The main advantage of these versions w.r.t. the one presented in this survey is that they are easier to implement: Unlike LSAPE, LSAP is a standard combinatorial optimization problem, and solvers are available for all major programming languages. The drawback of the version presented in [14] is that it uses a significantly larger quadratic matrix \(\mathbf {Q}\), while the drawback of the version presented in [7] is that it can be used only for quasimetric edit cost functions.
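The following numpy sketch mirrors the five steps above. Like the variants in [14] and [7], it uses plain LSAP as the linear model so that scipy's solver can serve as the minimization oracle in step 2; it assumes that Q acts on vectorized \(n\times n\) assignment matrices, and the exact line search in step 4 exploits that the cost is quadratic along the segment.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ipfp(Q, X0, max_iters=100, eps=1e-3):
    n = X0.shape[0]
    x = X0.flatten().astype(float)
    cost = lambda v: v @ Q @ v
    ub = cost(x)
    for _ in range(max_iters):
        grad = Q @ x                                   # step 1: C_k
        rows, cols = linear_sum_assignment(grad.reshape(n, n))
        b = np.zeros_like(x)                           # step 2: B_{k+1}
        b[rows * n + cols] = 1.0
        ub = min(ub, cost(b))                          # step 3
        if abs(cost(x) - grad @ b) <= eps * max(cost(x), 1e-12):
            break                                      # converged
        d = b - x
        a2, a1 = d @ Q @ d, x @ (Q + Q.T) @ d          # quadratic in alpha
        if a2 > 0:                                     # step 4: line search
            alpha = float(np.clip(-a1 / (2 * a2), 0.0, 1.0))
        else:                                          # optimum at an endpoint
            alpha = 1.0 if a1 + a2 < 0 else 0.0
        x = x + alpha * d                              # step 5
    rows, cols = linear_sum_assignment(-x.reshape(n, n))  # final projection
    Xhat = np.zeros((n, n))
    Xhat[rows, cols] = 1.0
    return min(ub, cost(Xhat.flatten())), Xhat
```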

7.3 Extensions of the paradigm LS-GED

Next, we present two extensions of the paradigm LS-GED. Both of them can be used to improve all of the heuristics described in Sect. 7.2.

7.3.1 The extension MULTI-START

MULTI-START was suggested in [26] as an extension of the IPFP algorithm. The general LS-GED framework computes a local optimum whose quality highly depends on the initialization, which is a general drawback of local search methods. The MULTI-START extension therefore simply proposes to use K different initial solutions, to run the LS-GED framework on each of them (possibly in parallel), and to return the best among the K computed local optima.

In order to further reduce the computing time of MULTI-START when parallelization is available, it was suggested in [13] to run in parallel more local searches than the number of desired local optima and to stop the whole process when the number of local searches that have converged has reached the number of desired local optima. In this context, the framework runs with two parameters: K is the number of initial solutions, and \(0<\rho \le 1\) is defined such that \(\lceil \rho \cdot K\rceil \) is the number of desired local optima.

7.3.2 The extension RANDPOST

The RANDPOST framework, initially proposed in [13] and refined in [12], extends MULTI-START by running it several times in a row and using the information contained in the local optima computed so far to produce better initializations. In addition to the two parameters K and \(\rho \) of MULTI-START, RANDPOST requires two further parameters: the number of iterations \(L\in {\mathbb {N}} \) and a penalty parameter \(\eta \in [0,1]\). RANDPOST maintains a score matrix \({\mathbf{M}} \in {\mathbb {R}} ^{(|V^G|+1)\times (|V^H|+1)}_{\ge 0}\) for all possible node assignments, which is initialized as \(\mathbf{0}_{(|V^G|+1)\times (|V^H|+1)}\). The score for each substitution \((u_i,v_k)\in V^G\times V^H\) is represented by the value \(m_{i,k}\) in the score matrix M, while the scores for the deletion \((u_i,\varepsilon )\) and the insertion \((\varepsilon ,v_k)\) are represented by the values \(m_{i,|V^H|+1}\) and \(m_{|V^G|+1,k}\), respectively. When the penalty parameter \(\eta \) equals 0, \(m_{i,k}\) always represents the number of converged local optima that contain the corresponding assignment. When \(\eta >0\), the score of each assignment depends on both the number and the cost of the converged local optima that contain it (assignments that appear in node maps with lower costs receive higher scores).

RANDPOST starts by running one iteration of MULTI-START. Next, RANDPOST carries out L iterations of its main for-loop. Inside this for-loop, RANDPOST starts by updating the score matrix M as \({\mathbf{M}} :={\mathbf{M}} +\sum _{\pi \in S}{\mathbf{X}} ^\pi [(1-\eta )+\eta \frac{{\textit{UB}}-{\textit{LB}}}{c(P_{\pi })-{\textit{LB}}}]\), where S is the set of converged node maps computed by the previous iteration of MULTI-START and, for each node map \(\pi \in S\), \({\mathbf{X}} ^\pi \) is the binary matrix that encodes \(\pi \). Subsequently, RANDPOST randomly generates new initial node maps such that assignments with higher scores are more likely to be part of the generated node maps: For each of the first \(|V^G|\) rows \({\mathbf{M}} _i\) of the score matrix M, RANDPOST draws a column \(k\in [|V^H|+1]\) from the distribution encoded by \({\mathbf{M}} _i\). If \(k=|V^H|+1\), the node deletion \((u_i,\varepsilon )\) is added to the node map \(\pi \) that is being constructed. Otherwise, the substitution \((u_i,v_k)\) is added to \(\pi \), the score \(m_{j,k}\) is temporarily set to 0 for all \(j\in [|V^G|]{\setminus }[i]\), and the column k is marked as covered. Once all nodes of G have been processed, node insertions \((\varepsilon ,v_k)\) are added to \(\pi \) for all uncovered columns \(k\in [|V^H|]\). This process is repeated until K different node maps have been created. Once all new initial node maps have been generated, RANDPOST carries out another iteration of MULTI-START and updates the upper bound if one of the newly computed node maps yields an improvement.
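A minimal sketch of the sampling step, assuming the score matrix M is a numpy array; how to treat rows whose remaining scores are all zero is not fixed by the description above, so the sketch falls back to a uniform choice among the free columns.

```python
import numpy as np

def draw_node_map(M, rng):
    """Draw one initial node map from the score matrix M of size
    (nG+1) x (nH+1); column nH encodes deletion, row nG insertion.
    `rng` is a numpy Generator, e.g. np.random.default_rng()."""
    nG, nH = M.shape[0] - 1, M.shape[1] - 1
    pi, covered = [], set()
    for i in range(nG):
        row = M[i].astype(float).copy()
        row[list(covered)] = 0.0          # block already covered columns
        if row.sum() == 0.0:              # degenerate row: uniform fallback
            free = [k for k in range(nH) if k not in covered] + [nH]
            row[free] = 1.0
        k = int(rng.choice(nH + 1, p=row / row.sum()))
        if k == nH:
            pi.append((i, None))          # deletion (u_i, epsilon)
        else:
            pi.append((i, k))             # substitution (u_i, v_k)
            covered.add(k)
    pi += [(None, k) for k in range(nH) if k not in covered]  # insertions
    return pi
```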

8 Miscellaneous heuristics

In this section, we describe six algorithms that do not instantiate any of the paradigms LSAPE-GED, LS-GED, and LP-GED discussed in Sects. 5–7. The first three algorithms, presented in Sects. 8.1–8.3, accept arbitrary edit cost functions, whereas the remaining three algorithms, presented in Sects. 8.4–8.6, are designed for uniform edit costs.

8.1 The algorithm HED

Given two input graphs G and H, the algorithm HED [28] starts by constructing the same LSAPE instance \({\mathbf{C}} \in {\mathbb {R}} ^{(|V^G|+1)\times (|V^H|+1)}\) as the algorithm BRANCH presented in Sect. 5.2.3 above. However, instead of feeding C into an LSAPE solver for obtaining upper and lower bounds for GED, HED computes a lower bound

$$\begin{aligned} {\textit{LB}} :=0.5\cdot \sum ^{|V^G|}_{i=1}\min _{k\in [|V^H|+1]}c_{i,k}+0.5\cdot \sum ^{|V^H|}_{k=1}\min _{i\in [|V^G|+1]}c_{i,k} \end{aligned}$$

for GED by summing the minima of C ’s rows and columns. Note that, in general, LB does not correspond to a feasible LSAPE solution, which is why HED does not compute an upper bound for GED. Furthermore, it holds that \({\textit{LB}} \le \mathrm{LSAPE} ({\mathbf{C}})\), which implies that HED ’s lower bound is never tighter than the lower bound computed by BRANCH.

As detailed in Sect. 5.2.3, the LSAPE instance C can be constructed in \(O(|V^G||V^H|{\varDelta ^{G,H}_{\mathrm{min}}}^2\varDelta ^{G,H}_{\mathrm{max}})\) time, where \(\varDelta ^{G,H}_{\mathrm{min}}:=\min \{{{\,\mathrm{max\,deg}\,}}(G),{{\,\mathrm{max\,deg}\,}}(H)\}\) and \(\varDelta ^{G,H}_{\mathrm{max}}:=\max \{{{\,\mathrm{max\,deg}\,}}(G),{{\,\mathrm{max\,deg}\,}}(H)\}\). This implies that the overall runtime complexity of HED is \(O(|V^G||V^H|{\varDelta ^{G,H}_{\mathrm{min}}}^2\varDelta ^{G,H}_{\mathrm{max}})\).
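Given C as a numpy array, the bound itself is a one-liner; this minimal sketch assumes that the dummy row and column of C are stored last, as in the definitions above.

```python
import numpy as np

def hed_lower_bound(C):
    """HED's bound: half the sum of the row minima (over all columns,
    skipping the dummy row) plus half the sum of the column minima
    (over all rows, skipping the dummy column)."""
    return 0.5 * C[:-1, :].min(axis=1).sum() + 0.5 * C[:, :-1].min(axis=0).sum()
```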

8.2 The algorithm BRANCH-TIGHT

Given two input graphs G and H, the algorithm BRANCH-TIGHT [9] starts by enforcing \(|V^G|=|V^H|=:N\). If the edit cost functions \(c_V\) and \(c_E\) are metric, this is done by adding \(\max \{|V^G|,|V^H|\}-|V^G|\) isolated dummy nodes to G and \(\max \{|V^G|,|V^H|\}-|V^H|\) isolated dummy nodes to H. Otherwise, \(|V^H|\) isolated dummy nodes are added to G and \(|V^G|\) isolated dummy nodes are added to H. Next, dummy edges are added to G and H to render both graphs d-regular with \(d=O(\varDelta ^{G,H}_{\mathrm{max}})\). Both of these preprocessing operations leave \(\mathrm{GED} (G,H)\) invariant.

After preprocessing the input graphs, BRANCH-TIGHT runs an anytime algorithm that, given a maximal number of iterations I, computes lower bounds \(({\textit{LB}} _r)^I_{r=1}\) and upper bounds \(({\textit{UB}} _r)^I_{r=1}\) for GED such that \({\textit{LB}} _1\) equals the lower bound computed by the algorithm BRANCH presented in Sect. 5.2.3 and \({\textit{LB}} _{r+1}\ge {\textit{LB}} _r\) holds for all \(r\in [I-1]\). Once I or a given time limit has been reached or the lower bound has converged, BRANCH-TIGHT returns the last computed lower bound \({\textit{LB}} :={\textit{LB}} _{I^\prime }\), where \(I^\prime \in [I]\) is the last completed iteration, and the best encountered upper bound.

BRANCH-TIGHT repeatedly solves instances of the linear sum assignment or minimum cost perfect bipartite matching problem (LSAP). LSAP is similar to LSAPE but does not allow deletions and insertions of rows and columns.

Definition 8

(LSAP) Given a square matrix \({\mathbf{C}} \in {\mathbb {R}} ^{n\times n}\), the linear sum assignment problem (LSAP) asks to minimize \({\mathbf{C}} ({\mathbf{X}}):=\sum ^{n}_{i=1}\sum ^{n}_{k=1}c_{i,k}x_{i,k}\) over all permutation matrices \({\mathbf{X}} \in \widehat{\varPi }(n)\), where \(\widehat{\varPi }(n):=\{{\mathbf{X}} \in \{0,1\}^{n\times n}\mid {\mathbf{1}}^{\textsf {T}}{\mathbf{X}} ={\mathbf{1}}^{\textsf {T}}\wedge {\mathbf{X}} \mathbf{1}=\mathbf{1}\}\) and \(\mathbf{1}\) is the n-sized vector of ones. \(\mathrm{LSAP} ({\mathbf{C}}):={\mathbf{C}} ({\mathbf{X}})\) denotes the cost of an optimal solution X.
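For illustration, scipy's `linear_sum_assignment` solves exactly this problem; the cost matrix below is an arbitrary example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

C = np.array([[4.0, 1.0, 3.0],
              [2.0, 0.0, 5.0],
              [3.0, 2.0, 2.0]])
rows, cols = linear_sum_assignment(C)   # Hungarian-style LSAP solver
print(list(zip(rows, cols)))            # optimal permutation
print(C[rows, cols].sum())              # LSAP(C) = 1 + 2 + 2 = 5
```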

For each \((u_i,v_k)\in V^G\times V^H\) and each iteration \(r\in [I]\), BRANCH-TIGHT constructs and solves LSAP instances \({\mathbf{C}} ^{i,k,r}\in {\mathbb {R}} ^{d \times d}\) defined as

$$\begin{aligned} c^{i,k,r}_{j,l}:={\left\{ \begin{array}{ll} 0.5\cdot c_E((u_i,u_j),(v_k,v_l))&{}\text { if }r=1\\ c^{i,k,r-1}_{j,l}-s^{i,k,r-1}_{j,l}-\frac{s^{r-1}_{i,k}}{d}+s^{j,l,r-1}_{i,k}+\frac{s^{r-1}_{j,l}}{d}&{}\text { else} \end{array}\right. } \end{aligned}$$

for all \((u_j,v_l)\in N^G(u_i)\times N^H(v_k)\). Here, \(s^{i,k,r}_{j,l}\) is the slack of the variable \(x_{j,l}\) in an optimal solution of the LSAP instance \({\mathbf{C}} ^{i,k,r}\), and \(s^{r}_{i,k}\) is the slack of the variable \(x_{i,k}\) in an optimal solution of the LSAP instance \({\mathbf{C}} ^{r}\in {\mathbb {R}} ^{N \times N}\), which, in turn, is constructed by setting

$$\begin{aligned} c^{r}_{i,k}:=c_V(u_i,v_k)+\mathrm{LSAP} ({\mathbf{C}} ^{i,k,r}) \end{aligned}$$

for all \((u_i,v_k)\in V^G\times V^H\). After constructing \({\mathbf{C}} ^r\), an optimal solution \({\mathbf{X}} ^r\) for \({\mathbf{C}} ^r\) is computed, \({\textit{LB}} _r\) is set to \(\mathrm{LSAP} ({\mathbf{C}} ^r)\), and \({\textit{UB}} _r\) is set to the cost of the edit path induced by \({\mathbf{X}} ^{r}\). Subsequently, r is incremented and the process iterates.

Preprocessing the input graphs G and H requires \(O(N^3{\varDelta ^{G,H}_{\mathrm{max}}}^2)\) time, and one iteration of the anytime algorithm runs in \(O(N^2{\varDelta ^{G,H}_{\mathrm{max}}}^3+N^3)\) time. This implies that BRANCH-TIGHT ’s overall runtime complexity is \(O(N^3{\varDelta ^{G,H}_{\mathrm{max}}}^2+I(N^2{\varDelta ^{G,H}_{\mathrm{max}}}^3+N^3))\). Recall that we have \(N=\max \{|V^G|,|V^H|\}\) if the edit cost functions are metric and \(N=|V^G|+|V^H|\) otherwise.

8.3 The algorithm SA

The algorithm SA [58] uses simulated annealing to improve the upper bound computed by an instantiation of the paradigm LSAPE-GED discussed in Sect. 5. SA is hence similar to the local search-based heuristics presented in Sect. 7. However, instead of varying an initial node map, SA varies the processing order for greedily computing a cheap solution for an initially computed LSAPE instance.

Assume w.l.o.g. that G and H are two input graphs with \(|V^G|\ge |V^H|\). SA starts by running an instantiation of LSAPE-GED to obtain an initial node map \(\pi \in \varPi (G,H)\), an LSAPE instance \({\mathbf{C}} \in {\mathbb {R}} ^{(|V^G|+1)\times (|V^H|+1)}\), and, possibly, a lower bound LB. If the employed LSAPE-GED instantiation does not yield a lower bound, LB can be computed with any other method that produces a lower bound.

Given a maximal number of iterations I and start and end probabilities \(p_1\) and \(p_I\) with \(1>p_1\ge p_I>0\) for accepting an unimproved node map, SA initializes an ordering \(\sigma :[|V^G|]\rightarrow [|V^G|]\) of the first \(|V^G|\) rows of C by setting \(\sigma (i):=i\) for all \(i\in [|V^G|]\), and computes a cooling factor \(a:=(\log (p_1)/\log (p_I))^{1/(I-1)}\) such that \(p_1^{a^{-(I-1)}}=p_I\). It then sets the current acceptance probability to \(p:=p_1\), initializes the best encountered node map \(\pi ^\prime \) and the current node map \(\pi ^{\prime \prime }\) as \(\pi ^\prime :=\pi ^{\prime \prime }:=\pi \), and sets the number r of consecutive iterations without improvement of the upper bound to \(r:=0\).

While the maximal number of iterations I has not been reached and the best upper bound \(c(P_{\pi ^\prime })\) is greater than LB, SA does the following: First, a candidate row ordering \(\sigma ^\prime \) is obtained from the current ordering \(\sigma \) by setting \(\sigma ^\prime (1):=\sigma (i)\), \(\sigma ^\prime (j):=\sigma (j-1)\) for all \(j\in [i]{\setminus }\{1\}\), and \(\sigma ^\prime (j):=\sigma (j)\) for all \(j\in [|V^G|]{\setminus }[i]\), where \(i\in [|V^G|]\) is a randomly selected row of C. Next, a candidate node map \(\pi ^{\prime \prime \prime }\) is computed by greedily assigning the \(\sigma ^\prime \)-ordered rows of C to the cheapest unassigned columns. If \(\pi ^{\prime \prime \prime }\)’s induced upper bound is cheaper than the upper bound of the current node map \(\pi ^{\prime \prime }\), \(\pi ^{\prime \prime }\) and \(\sigma \) are updated to \(\pi ^{\prime \prime \prime }\) and \(\sigma ^\prime \), respectively. Otherwise, they are updated with a probability that is proportional to the current acceptance probability p and inversely proportional to the deterioration \(c(P_{\pi ^{\prime \prime \prime }})-c(P_{\pi ^{\prime \prime }})\) of the induced upper bound.
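A minimal sketch of this greedy step, assuming C is a numpy array whose last column is the dummy (deletion) column, which remains available throughout:

```python
import numpy as np

def greedy_by_order(C, sigma):
    """Assign the sigma-ordered rows of the LSAPE instance C greedily
    to their cheapest still-unassigned columns; returns the partial
    node map and the columns of H left for insertion."""
    nG, nH = C.shape[0] - 1, C.shape[1] - 1
    free = np.ones(nH, dtype=bool)  # real columns of H
    pi = {}
    for i in sigma:
        choices = [k for k in range(nH) if free[k]] + [nH]  # dummy always free
        k = min(choices, key=lambda k: C[i, k])
        if k < nH:
            free[k] = False
            pi[i] = k       # substitution (u_i, v_k)
        else:
            pi[i] = None    # u_i is deleted
    return pi, [k for k in range(nH) if free[k]]
```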

After updating \(\pi ^{\prime \prime }\) and \(\sigma \), SA checks if \(\pi ^{\prime \prime }\)’s induced upper bound is tighter than the upper bound induced by the best encountered node map \(\pi ^\prime \). If this is the case, \(\pi ^\prime \) is updated to \(\pi ^{\prime \prime }\) and the number r of consecutive iterations without improvement is reset to 0. Otherwise, r is incremented and the current ordering \(\sigma \) is reshuffled randomly with probability \(r/I\). Finally, the current acceptance probability p is set to \(p_1^{a^{-s}}\), where s is the number of the current iteration, and SA iterates. After exiting the main loop, SA returns \({\textit{UB}} :=c(P_{\pi ^\prime })\).

The dominant operations in one iteration of SA are the greedy computation of the candidate node map \(\pi ^{\prime \prime \prime }\) and the computation of its induced upper bound. One iteration of SA hence runs in \(O(|V^G||V^H|+\max \{|E^G|,|E^H|\})\) time. This implies that SA ’s overall runtime complexity is \(O(\omega +I(|V^G||V^H|+\max \{|E^G|,|E^H|\}))\), where \(O(\omega )\) is the runtime required for computing the initial upper and lower bounds as well as the LSAPE instance C.

8.4 The algorithm BRANCH-COMPACT

The algorithm BRANCH-COMPACT [71] yields a lower bound for GED with uniform edit cost functions \(c_V\) and \(c_E\). Recall that \(c_V\) and \(c_E\) are uniform if there is a constant \(c\in {\mathbb {R}} _{>0}\) such that \(c_V(\alpha ,\alpha ^\prime )=c_E(\beta ,\beta ^\prime )=c\) holds for all node labels \((\alpha ,\alpha ^\prime )\in (\varSigma _V\cup \{\varepsilon \})\times (\varSigma _V\cup \{\varepsilon \})\) with \(\alpha \ne \alpha ^\prime \) and all edge labels \((\beta ,\beta ^\prime )\in (\varSigma _E\cup \{\varepsilon \})\times (\varSigma _E\cup \{\varepsilon \})\) with \(\beta \ne \beta ^\prime \).

Given input graphs G and H, BRANCH-COMPACT starts by constructing branches \(\mathscr {B}^G_i:=(\ell ^G_V(u_i),{\ell ^G_E\llbracket E^G(u_i)\rrbracket })\) and \(\mathscr {B}^H_k:=(\ell ^H_V(v_k),{\ell ^H_E\llbracket E^H(v_k)\rrbracket })\) for all \(u_i\in V^G\) and all \(v_k\in V^H\). Subsequently, BRANCH-COMPACT sorts the branches in non-decreasing lexicographical order, i.e., computes orderings \(\sigma ^G:[|V^G|]\rightarrow [|V^G|]\) and \(\sigma ^H:[|V^H|]\rightarrow [|V^H|]\) such that \(\mathscr {B}^G_{\sigma ^G(i)}\preceq _L\mathscr {B}^G_{\sigma ^G(i+1)}\) holds for all \(i\in [|V^G|-1]\) and \(\mathscr {B}^H_{\sigma ^H(k)}\preceq _L\mathscr {B}^H_{\sigma ^H(k+1)}\) holds for all \(k\in [|V^H|-1]\).

BRANCH-COMPACT now performs a first parallel linear scan over the sorted sequences of branches \((\mathscr {B}^G_{\sigma ^G(i)})^{|V^G|}_{i=1}\) and \((\mathscr {B}^H_{\sigma ^H(k)})^{|V^H|}_{k=1}\) to delete a maximal number of pairs of branches \((\mathscr {B}^G_{\sigma ^G(i)},\mathscr {B}^H_{\sigma ^H(k)})\) with \(\mathscr {B}^G_{\sigma ^G(i)}=\mathscr {B}^H_{\sigma ^H(k)}\). Subsequently, BRANCH-COMPACT initializes its lower bound as \({\textit{LB}} :=0\) and performs a second parallel linear scan over the remaining branches. In this scan, a maximal number of pairs of branches \((\mathscr {B}^G_{\sigma ^G(i)},\mathscr {B}^H_{\sigma ^H(k)})\) with \(\ell ^G_V(u_{\sigma ^G(i)})=\ell ^H_V(v_{\sigma ^H(k)})\) are deleted, and LB is incremented by \(c/2\) for each deleted pair of branches. Finally, LB is incremented by \(c(\max \{|V^G|,|V^H|\}-D)\), where D is the number of pairs of branches that have been deleted during the two scans.

BRANCH-COMPACT first sorts the branches in \(O(\max \{|V^G|,|V^H|\}\varDelta ^{G,H}_{\mathrm{max}}\log (\varDelta ^{G,H}_{\mathrm{max}}))\) time and then computes its lower bound in \(O(\max \{|V^G|,|V^H|\})\) time. BRANCH-COMPACT ’s overall runtime complexity is hence \(O(\max \{|V^G|,|V^H|\}(\varDelta ^{G,H}_{\mathrm{max}}\log (\varDelta ^{G,H}_{\mathrm{max}})+\log (\max \{|V^G|,|V^H|\})))\).
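The two scans can be mimicked with multiset (Counter) arithmetic, which matches the same numbers of pairs as the parallel linear scans over the sorted sequences. In this minimal sketch, a branch is assumed to be given as a hashable pair consisting of a node label and a sorted tuple of incident edge labels.

```python
from collections import Counter

def branch_compact_lb(branches_G, branches_H, c):
    """Sketch of BRANCH-COMPACT: branches_G and branches_H are lists
    of pairs (node_label, tuple(sorted(edge_labels)))."""
    n = max(len(branches_G), len(branches_H))
    bG, bH = Counter(branches_G), Counter(branches_H)
    common = bG & bH                 # first scan: identical branches
    deleted = sum(common.values())
    bG, bH = bG - common, bH - common
    # second scan: remaining branches with equal node labels
    nodesG, nodesH = Counter(), Counter()
    for (lbl, _), cnt in bG.items():
        nodesG[lbl] += cnt
    for (lbl, _), cnt in bH.items():
        nodesH[lbl] += cnt
    half = sum((nodesG & nodesH).values())
    deleted += half
    return 0.5 * c * half + c * (n - deleted)
```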

8.5 The algorithm PARTITION

The algorithm PARTITION [71] computes a lower bound for GED with uniform edit costs. Given input graphs G and H and a constant \(K\in {\mathbb {N}} _{\ge 1}\), PARTITION starts by initializing an empty collection \(\mathscr {S}:=\emptyset \), which will be filled with \(K^\prime \)-sized substructures of G that are not subgraph-isomorphic to H, where \(K^\prime \in [K]\). A \(K^\prime \)-sized substructure of G is a connected subgraph of G that is composed of \(K^\prime \) elements (nodes or edges). For instance, 1-sized substructures are single nodes or edges, 2-sized substructures are nodes together with an incident edge, and 3-sized substructures are nodes together with two incident edges or edges together with their terminal nodes.

Starting with \(K^\prime :=1\), PARTITION now consecutively checks for each \(K^\prime \)-sized substructure \(S^G\subseteq G\) of G if there is a \(K^\prime \)-sized substructure of H which is isomorphic to \(S^G\). If this is not the case, \(S^G\) is added to \(\mathscr {S}\) and deleted from G. Once all \(K^\prime \)-sized substructures have been considered, \(K^\prime \) is incremented and the process iterates if \(K^\prime \le K\). Otherwise, PARTITION returns the lower bound \({\textit{LB}} :=c|\mathscr {S}|\).

Since G and H have, respectively, \(O(|E^G|)\) and \(O(|E^H|)\) substructures of sizes 1, 2, and 3, PARTITION with \(K\le 3\) runs in \(O(|E^G||E^H|)\) time. Determining non-isomorphic substructures of size \(K>3\) cannot be done naively but requires calling subgraph isomorphism verification algorithms such as the one proposed in [24]. These algorithms run in super-polynomial time but are often fast in practice for small K.

8.6 The algorithm HYBRID

The algorithm HYBRID [71] improves the lower bounds of the algorithms BRANCH-CONST and PARTITION presented in Sects. 5.2.5 and 8.5. Given input graphs G and H and a constant \(K\in {\mathbb {N}} _{\ge 1}\), HYBRID first runs PARTITION with the maximal size of the considered substructures set to K and hence obtains a collection \(\mathscr {S}\) of substructures \(S^G\subseteq G\) of G that are not subgraph-isomorphic to H.


Let \(\mathscr {C}(\mathscr {S})\) be the set of all configurations \(a:=(a_s)^{|\mathscr {S}|}_{s=1}\) that select one node or edge \(a_s\) from each substructure contained in \(\mathscr {S}\). For each configuration \(a\in \mathscr {C}(\mathscr {S})\), HYBRID creates a modified graph \(G_a\), where all nodes or edges \(a_s\) contained in the configuration a get a special wildcard label \(\gamma \), and runs a variant of BRANCH-CONST on the graphs \(G_a\) and H, which edits \(\gamma \)-labeled nodes and edges for free. Finally, HYBRID returns the lower bound \({\textit{LB}} :=|\mathscr {S}|+\min \{{\textit{LB}} _a\mid a\in \mathscr {C}(\mathscr {S})\}\), where \({\textit{LB}} _a\) is the lower bound returned by the wildcard version of BRANCH-CONST if run on the graphs \(G_a\) and H. This lower bound is guaranteed to be at least as tight as the lower bounds computed by PARTITION and BRANCH-CONST.

Let \(O(\omega _1)\) be the runtime complexity of PARTITION with the maximal size of the considered substructures set to K and \(O(\omega _2)\) be the runtime complexity of BRANCH-CONST. Then, HYBRID runs in \(O(\omega _1+\omega _2|\mathscr {C}(\mathscr {S})|)\) time. Note that \(|\mathscr {C}(\mathscr {S})|\) can get huge. For instance, assume that PARTITION completely partitions G into substructures of size 2. Then, it holds that \(|\mathscr {C}(\mathscr {S})|=\prod _{S^G\in \mathscr {S}}|S^G|=2^{|V^G|}\). HYBRID ’s runtime complexity is hence not polynomially bounded.

9 Experimental evaluation

We carried out extensive experiments to empirically evaluate the presented heuristics and to address the two meta-questions Q1 and Q2 introduced in Sect. 1. We first describe the setup of our experiments (Sects. 9.1–9.4) and then report their results (Sects. 9.5–9.9).

9.1 Datasets and edit cost functions

We tested on the widely used benchmark datasets aids, muta, protein, letter (h), grec, and fp from the IAM Graph Database Repository [50, 52] and used the edit cost functions suggested in [2, 52]. Table 5 summarizes important statistics of the datasets. For details on the edit cost functions, cf. Appendix A. In order to be able to compare all heuristics on all datasets, we used the technique described in Sect. 4 to extend heuristics with cost constraints to general edit costs.

9.2 Choice of options and parameters

For all instantiations of the paradigms LSAPE-GED, LS-GED, and LP-GED and for all miscellaneous heuristics, we followed the original publications to determine their meta-parameters. In the remainder of this section, we give detailed descriptions for each heuristic and describe how we tested the extensions MULTI-SOL and CENTRALITIES of the paradigm LSAPE-GED, as well as the extensions MULTI-START and RANDPOST of the paradigm LS-GED.

  • Meta-parameters for SUBGRAPH and WALKS: As suggested in [21, 32], for each dataset, we determined the parameters K of SUBGRAPH and WALKS as the \(K\in [5]\) that yielded the tightest average upper bounds on a set of training graphs. To cope with SUBGRAPH ’s exponential runtime complexity, we set a time limit of \(1\,\hbox {ms}\) for the computation of each cell of its LSAPE instance C.

  • Options and meta-parameters for RING: As highlighted in [4, 5], RING performs best if the node and edge set distances are computed via optimal LSAPE solvers or multiset intersection-based proxies. We included both options in our experiments; the resulting heuristics are denoted as RINGOPT and RINGMS, respectively. For both variants and each dataset, the meta-parameters \(\lambda _l\), \(\alpha _s\), and K were determined by running a blackbox optimizer on a set of training graphs, as suggested in [4, 5].

  • Options for RING-ML and PREDICT: As highlighted in [4], the machine learning-based heuristics RING-ML and PREDICT perform best if one-class support vector machines with RBF kernel or fully connected feedforward deep neural networks are used for training. We included both variants in our experiments; the resulting heuristics are denoted as RING-ML1-SVM, RING-MLDNN, PREDICT1-SVM, and PREDICTDNN, respectively.

  • Meta-parameters for K-REFINE: We ran K-REFINE with swap size \(K:=3\). We hence followed the suggestion in [12], where it is highlighted that \(K>3\) leads to an enormous blowup of K-REFINE ’s runtime on larger graphs.

  • Meta-parameters for BP-BEAM and IBP-BEAM: As suggested in [27, 56], we set the beam size employed by BP-BEAM and IBP-BEAM to \(K:=5\) and the number of iterations employed by IBP-BEAM to \(I:=20\).

  • Options and meta-parameters for IPFP: As highlighted in [7], the best performing variant of IPFP that can cope with general edit costs is the one suggested in [16]. In our experiments, we therefore included only this variant. As in the experiments of the original publications, we set the maximal number of iterations to \(I:=100\) and the convergence threshold to \(\varepsilon :=10^{-3}\).

  • Meta-parameters for BRANCH-TIGHT: As suggested in [9], we set the number of iterations carried out by BRANCH-TIGHT to \(I:=20\).

  • Meta-parameters for SA: As suggested in [58], we set SA ’s number of iterations to \(I:=100\) and used start and end probabilities \(p_1:=0.8\) and \(p_I:=10^{-2}\). We used BRANCH for computing SA ’s initial LSAPE instance C.

  • Meta-parameters for PARTITION and HYBRID: In [71], it is suggested to set the maximal size of the substructures employed by PARTITION and HYBRID to \(K:=8\). However, how to implement these heuristics with \(K>3\) is not well documented in [71] and the authors did not reply to our request to share their implementation. We therefore used \(K:=3\) for our experiments. To cope with HYBRID ’s exponential runtime complexity, we set a time limit of \(1\,\hbox {s}\) and set up HYBRID to return the maximum of the lower bounds computed by PARTITION and BRANCH-CONST if it did not terminate within the time limit.

  • Configurations for the extensions MULTI-SOL and CENTRALITIES of the paradigm LSAPE-GED: In order to test MULTI-SOL and CENTRALITIES, we ran all instantiations of LSAPE-GED with all configurations \((K,\gamma )\in \{1,3,7,10\}\times \{0,0.7\}\), where K is the maximal number of solutions computed by MULTI-SOL and \(\gamma \) is the weight of the centralities used by CENTRALITIES. We used pagerank centralities with \(\gamma =0.7\), because in [53] this setup is reported to yield the best results among all variants of CENTRALITIES. MULTI-SOL is used just in case \(K\ne 1\) and CENTRALITIES is used just in case \(\gamma \ne 0\).

  • Configurations for the extensions MULTI-START and RANDPOST of the paradigm LS-GED: For testing MULTI-START and RANDPOST, we ran each LS-GED instantiation with all \((K,\rho ,L,\eta )\in (\{(40,\nicefrac {1}{2},1),(40,\nicefrac {1}{4},3),(40,\nicefrac {1}{8},7)\}\times \{0,1\})\cup (\{1,10,20,30,40\}\times \{(1,0,0)\})\). K is the number of initial node maps constructed by MULTI-START, \(\lceil \rho \cdot K\rceil \) is the number of completed runs from initial node maps, L is the number of RANDPOST loops, and \(\eta \) is the penalty for expensive converged node maps employed by RANDPOST. The initial node maps were constructed randomly under the constraint that they contain exactly \(\min \{|V^G|,|V^H|\}\) node substitutions. Note that each configuration that uses RANDPOST (i.e., has \(L>0\)) in total carries out exactly 40 runs from different initial node maps.

9.3 Test protocol and test metrics

For each test dataset \(\mathscr {D}\), we randomly selected a training set \(\mathscr {D}_{\text {train}}\subseteq \mathscr {D}\) and a testing set \(\mathscr {D}_{\text {test}}\subseteq \mathscr {D}{\setminus }\mathscr {D}_{\text {train}}\). We ensured that both sets are balanced w.r.t. the classes of the contained graphs and set their sizes to the largest integers not greater than, respectively, 50 (for training) and 100 (for testing) that allowed balancing. All algorithms that require training were trained on \(\mathscr {D}_{\text {train}}\). Subsequently, we ran all compared algorithms on all pairs of graphs \((G,H)\in \mathscr {D}_{\text {test}}\times \mathscr {D}_{\text {test}}\). Recall that we compared various configurations of the extensions MULTI-SOL and CENTRALITIES for the instantiations of LSAPE-GED and that we tested various configurations of the extensions MULTI-START and RANDPOST for the instantiations of LS-GED. In the following, the expression “algorithm” denotes a heuristic together with its configuration.

Algorithms for GED computation are typically evaluated w.r.t. their runtime behavior, the tightness of the produced bounds, and the performance of pattern recognition frameworks that use the produced bounds as underlying distance measures (cf. the criteria C1 to C3). For all compared algorithms ALG, we therefore recorded the average runtime \(t({{\texttt {ALG}}})\). Moreover, we recorded the average lower bound \(d_{{\textit{LB}}}({{\texttt {ALG}}})\) and the classification coefficient \(c_{{\textit{LB}}}({{\texttt {ALG}}})\) for all algorithms that yield lower bounds, and the average upper bound \(d_{{\textit{UB}}}({{\texttt {ALG}}})\) and the classification coefficient \(c_{{\textit{UB}}}({{\texttt {ALG}}})\) for all algorithms that yield upper bounds.

The coefficients \(c_{{\textit{LB}}}\) and \(c_{{\textit{UB}}}\) were computed as

$$\begin{aligned} c_{{\textit{LB}}}({{\texttt {ALG}}}):= & {} (d^{\text {inter}}_{{\textit{LB}}}({{\texttt {ALG}}})-d^{\text {intra}}_{{\textit{LB}}}({{\texttt {ALG}}}))/\max \,{\textit{LB}} ({{\texttt {ALG}}})\\ c_{{\textit{UB}}}({{\texttt {ALG}}}):= & {} (d^{\text {inter}}_{{\textit{UB}}}({{\texttt {ALG}}})-d^{\text {intra}}_{{\textit{UB}}}({{\texttt {ALG}}}))/\max \,{\textit{UB}} ({{\texttt {ALG}}})\text {,} \end{aligned}$$

where \(d^{\text {inter}}_{{\textit{LB}}}({{\texttt {ALG}}})\) and \(d^{\text {inter}}_{{\textit{UB}}}({{\texttt {ALG}}})\) are the average lower and upper bounds between graphs with different classes, \(d^{\text {intra}}_{{\textit{LB}}}({{\texttt {ALG}}})\) and \(d^{\text {intra}}_{{\textit{UB}}}({{\texttt {ALG}}})\) are the average lower and upper bounds between graphs with the same class, and \(\max \,{\textit{LB}} ({{\texttt {ALG}}})\) and \(\max \,{\textit{UB}} ({{\texttt {ALG}}})\) denote the maximal lower and upper bounds computed by ALG. The reason for defining the classification coefficients in this way is that pattern recognition frameworks based on distance measures perform well just in case the intra-class distances are significantly smaller than the inter-class distances. Hence, large classification coefficients \(c_{{\textit{LB}}}({{\texttt {ALG}}})\) and \(c_{{\textit{UB}}}({{\texttt {ALG}}})\) imply that the respective lower or upper bounds are fit for use within distance-based pattern recognition frameworks. We normalized by the maximal lower and upper bounds in order to ensure \(c_{{\textit{LB}}}({{\texttt {ALG}}}),c_{{\textit{UB}}}({{\texttt {ALG}}})\in [-1,1]\) and hence render the classification coefficients comparable across different datasets. We rounded \(t({{\texttt {ALG}}})\) to microseconds and \(d_{{\textit{LB}} |{\textit{UB}}}({{\texttt {ALG}}})\) as well as \(c_{{\textit{LB}} |{\textit{UB}}}({{\texttt {ALG}}})\) to two decimal places.
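A sketch of how \(c_{{\textit{LB}}}\) or \(c_{{\textit{UB}}}\) can be computed from a matrix D of pairwise bounds and the class labels of the test graphs; whether the (zero) self-distances enter the intra-class average is not fixed by the formula above, so the sketch excludes them.

```python
import numpy as np

def classification_coefficient(D, labels):
    """(mean inter-class bound - mean intra-class bound) / max bound,
    with self-distances excluded from the averages."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off = ~np.eye(len(labels), dtype=bool)  # mask out the diagonal
    intra = D[same & off].mean()
    inter = D[~same].mean()
    return (inter - intra) / D[off].max()
```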

After running all algorithms, we computed a joint score \(s_{{\textit{LB}}}({{\texttt {ALG}}})\in [0,1]\) for all algorithms that yield lower bounds and a joint score \(s_{{\textit{UB}}}({{\texttt {ALG}}})\in [0,1]\) for all algorithms that yield upper bounds. The joint scores are defined as

$$\begin{aligned} s_{{\textit{LB}}}({{\texttt {ALG}}}):= & {} \frac{d_{{\textit{LB}}}({{\texttt {ALG}}})}{3\cdot d^\star _{{\textit{LB}}}}+\frac{t^\star _{{\textit{LB}}}}{3\cdot t({{\texttt {ALG}}})}+\frac{c_{{\textit{LB}}}({{\texttt {ALG}}})}{3\cdot c^\star _{{\textit{LB}}}}\\ s_{{\textit{UB}}}({{\texttt {ALG}}}):= & {} \frac{d^\star _{{\textit{UB}}}}{3\cdot d_{{\textit{UB}}}({{\texttt {ALG}}})}+\frac{t^\star _{{\textit{UB}}}}{3\cdot t({{\texttt {ALG}}})}+\frac{c_{{\textit{UB}}}({{\texttt {ALG}}})}{3\cdot c^\star _{{\textit{UB}}}}\text {,} \end{aligned}$$

where \(d^\star _{{\textit{LB}}}\), \(t^\star _{{\textit{LB}}}\), and \(c^\star _{{\textit{LB}}}\) denote the best (i.e., largest) average lower bound, the best average runtime, and the best classification coefficient yielded by any algorithm that computes a lower bound. Analogously, \(d^\star _{{\textit{UB}}}\), \(t^\star _{{\textit{UB}}}\), and \(c^\star _{{\textit{UB}}}\) denote the best (i.e., smallest) average upper bound, the best average runtime, and the best classification coefficient yielded by any algorithm that computes an upper bound. With this definition, each evaluation criterion contributes a quantity between 0 and 1 / 3 to the joint score, and an algorithm has joint score 1 if it performs best w.r.t. all three criteria.

We partially ordered the compared algorithms w.r.t. the Pareto dominance relations \(\succ _{{\textit{LB}}}\) and \(\succ _{{\textit{UB}}}\). For two algorithms \({{\texttt {ALG}}} _1\) and \({{\texttt {ALG}}} _2\) that compute lower bounds, we say that the lower bound computed by \({{\texttt {ALG}}} _1\) dominates the one produced by \({{\texttt {ALG}}} _2\) on a given dataset (in symbols: \({{\texttt {ALG}}} _1\succ _{{\textit{LB}}}{{\texttt {ALG}}} _2\)) just in case \({{\texttt {ALG}}} _1\) performs at least as well as \({{\texttt {ALG}}} _2\) w.r.t. all three evaluation criteria \(d_{{\textit{LB}}}\), t, and \(c_{{\textit{LB}}}\), and strictly better than \({{\texttt {ALG}}} _2\) w.r.t. at least one of them. The dominance relation \(\succ _{{\textit{UB}}}\) for the upper bounds is defined analogously. Note that, with these definitions, \({{\texttt {ALG}}} _1\succ _{{\textit{LB}}}{{\texttt {ALG}}} _2\) implies \(s_{{\textit{LB}}}({{\texttt {ALG}}} _1)>s_{{\textit{LB}}}({{\texttt {ALG}}} _2)\) and \({{\texttt {ALG}}} _1\succ _{{\textit{UB}}}{{\texttt {ALG}}} _2\) implies \(s_{{\textit{UB}}}({{\texttt {ALG}}} _1)>s_{{\textit{UB}}}({{\texttt {ALG}}} _2)\), but the inverse implications do not hold. The joint scores \(s_{{\textit{LB}}}\) and \(s_{{\textit{UB}}}\) hence allow us to compare algorithms that are Pareto optimal.

Using the partial orders \(\succ _{{\textit{LB}}}\) and \(\succ _{{\textit{UB}}}\), we computed aggregated joint lower bound scores \(\widehat{s_{{\textit{LB}}}}({{\texttt {H}}})\) for all heuristics H that compute lower bounds, as well as aggregated joint upper bound scores \(\widehat{s_{{\textit{UB}}}}({{\texttt {H}}})\) and \(\widehat{s_{{\textit{UB}}}}({{\texttt {E}}})\) for all heuristics H that compute upper bounds and all extensions E of the paradigms LSAPE-GED and LS-GED. These scores were computed as

$$\begin{aligned} \widehat{s_{{\textit{LB}}}}({{\texttt {H}}}) &:= \delta _{\mathscr {C}({{\texttt {H}}})\cap \mathrm{MAX}_{\succ _{{\textit{LB}}}}\ne \emptyset }\max _{{{\texttt {ALG}}} \in \mathscr {C}({{\texttt {H}}})\cap \mathrm{MAX}_{\succ _{{\textit{LB}}}}}s_{{\textit{LB}}}({{\texttt {ALG}}})\\ \widehat{s_{{\textit{UB}}}}({{\texttt {H}}}) &:= \delta _{\mathscr {C}({{\texttt {H}}})\cap \mathrm{MAX}_{\succ _{{\textit{UB}}}}\ne \emptyset }\max _{{{\texttt {ALG}}} \in \mathscr {C}({{\texttt {H}}})\cap \mathrm{MAX}_{\succ _{{\textit{UB}}}}}s_{{\textit{UB}}}({{\texttt {ALG}}})\\ \widehat{s_{{\textit{UB}}}}({{\texttt {E}}}) &:= \frac{\delta _{\mathscr {C}({{\texttt {P}}} ({{\texttt {E}}}))\cap \mathrm{MAX}_{\succ _{{\textit{UB}}}}\ne \emptyset }\sum _{{{\texttt {ALG}}} \in \mathscr {C}({{\texttt {E}}})\cap \mathrm{MAX}_{\succ _{{\textit{UB}}}}}s_{{\textit{UB}}}({{\texttt {ALG}}})}{\sum _{{{\texttt {ALG}}} \in \mathscr {C}({{\texttt {P}}} ({{\texttt {E}}}))\cap \mathrm{MAX}_{\succ _{{\textit{UB}}}}}s_{{\textit{UB}}}({{\texttt {ALG}}})}\text {,} \end{aligned}$$

where \(\mathscr {C}({{\texttt {H}}})\) is the set of compared algorithms that are configurations of the heuristic H, \(\mathscr {C}({{\texttt {E}}})\) is the set of compared algorithms that use the extension E, \(\mathscr {C}({{\texttt {P}}} ({{\texttt {E}}}))\) is the set of compared algorithms that instantiate the paradigm extended by E, and \({\mathrm{MAX}_{\succ _{{\textit{LB}}}}} \) and \({\mathrm{MAX}_{\succ _{{\textit{UB}}}}} \) are the sets of maxima w.r.t. the partial orders \(\succ _{{\textit{LB}}}\) and \(\succ _{{\textit{UB}}}\), respectively. In other words, we set the aggregated joint scores \(\widehat{s_{{\textit{LB}}}}({{\texttt {H}}})\) and \(\widehat{s_{{\textit{UB}}}}({{\texttt {H}}})\) of a heuristic H to the maximal score of its Pareto optimal configurations, and to 0 if no configuration of H was Pareto optimal. The aggregated joint upper bound score \(\widehat{s_{{\textit{UB}}}}({{\texttt {E}}})\) of an extension E of the paradigms LSAPE-GED and LS-GED was set to the sum of the joint upper bound scores of Pareto optimal algorithms that use E divided by the sum of the joint upper bound scores of Pareto optimal algorithms that instantiate the paradigm extended by E, and to 0 if no algorithm that instantiates the paradigm extended by E was Pareto optimal. We also computed vectors \(\chi _{{\textit{LB}}}({{\texttt {H}}})\in \{0,1\}^3\) for all heuristics that yield lower bounds and vectors \(\chi _{{\textit{UB}}}({{\texttt {H}}}),\chi _{{\textit{UB}}}({{\texttt {E}}})\in \{0,1\}^3\) for all heuristics that yield upper bounds and all extensions of the paradigms LSAPE-GED and LS-GED. These vectors indicate whether a heuristic or an extension has a configuration that performed best w.r.t. one or several of the observed metrics \(f_{1_{{\textit{LB}} |{\textit{UB}}}}:=d_{{\textit{LB}} |{\textit{UB}}}\), \(f_{2_{{\textit{LB}} |{\textit{UB}}}}:=t_{{\textit{LB}} |{\textit{UB}}}\), and \(f_{3_{{\textit{LB}} |{\textit{UB}}}}:=c_{{\textit{LB}} |{\textit{UB}}}\). That is, the indicator vectors were computed as follows:

$$\begin{aligned} \chi _{{\textit{LB}}}({{\texttt {H}}}) &:= \bigl (\delta _{\exists {{\texttt {ALG}}} \in \mathscr {C}({{\texttt {H}}}):f_{r_{\textit{LB}}}({{\texttt {ALG}}})=f^\star _{r_{\textit{LB}}}}\bigr )^3_{r=1}\\ \chi _{{\textit{UB}}}({{\texttt {H}}}) &:= \bigl (\delta _{\exists {{\texttt {ALG}}} \in \mathscr {C}({{\texttt {H}}}):f_{r_{\textit{UB}}}({{\texttt {ALG}}})=f^\star _{r_{\textit{UB}}}}\bigr )^3_{r=1}\\ \chi _{{\textit{UB}}}({{\texttt {E}}}) &:= \bigl (\delta _{\exists {{\texttt {ALG}}} \in \mathscr {C}({{\texttt {E}}}):f_{r_{\textit{UB}}}({{\texttt {ALG}}})=f^\star _{r_{\textit{UB}}}}\bigr )^3_{r=1} \end{aligned}$$
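The following C++ sketch illustrates how the aggregated joint scores and the indicator vectors can be computed once the joint scores and the Pareto maxima are known; it is a simplified illustration with hypothetical names, shown for the lower bounds only:

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical joint lower bound scores s_LB(ALG), keyed by algorithm name.
using ScoreMap = std::unordered_map<std::string, double>;

// Aggregated joint score of a heuristic H: the maximal joint score over
// H's Pareto optimal configurations, or 0 if none is Pareto optimal.
double aggregated_score(const std::vector<std::string>& configs_of_h,
                        const std::unordered_set<std::string>& pareto_maxima,
                        const ScoreMap& s) {
  double best = 0.0;
  for (const auto& alg : configs_of_h) {
    if (pareto_maxima.count(alg) > 0) {
      best = std::max(best, s.at(alg));
    }
  }
  return best;
}

// Indicator vector chi(H): entry r equals 1 iff some configuration of H
// attains the best observed value of the r-th metric (d, t, or c). As all
// metrics are rounded as described above, exact comparison is meaningful.
std::array<int, 3> indicator(const std::vector<std::array<double, 3>>& metrics_of_h,
                             const std::array<double, 3>& best_metrics) {
  std::array<int, 3> chi{0, 0, 0};
  for (const auto& f : metrics_of_h) {
    for (int r = 0; r < 3; ++r) {
      if (f[r] == best_metrics[r]) chi[r] = 1;
    }
  }
  return chi;
}
```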
Table 6 Overview of test metrics

Finally, we trained linear regression models \(c_{\textit{LB}} \sim d_{\textit{LB}} :=(a_{\textit{LB}},m_{\textit{LB}})\) and \(c_{\textit{UB}} \sim d_{\textit{UB}} :=(a_{\textit{UB}},m_{\textit{UB}})\) defined as

$$\begin{aligned} (a_{{\textit{LB}}},m_{{\textit{LB}}}) &:= \mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{(a,m)\in {\mathbb {R}}\times {\mathbb {R}}}\sum _{{{\texttt {ALG}}}}[c_{\textit{LB}} ({{\texttt {ALG}}})-(a+m\cdot d_{\textit{LB}} ({{\texttt {ALG}}}))]^2\\ (a_{\textit{UB}},m_{\textit{UB}}) &:= \mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{(a,m)\in {\mathbb {R}}\times {\mathbb {R}}}\sum _{{{\texttt {ALG}}}}[c_{\textit{UB}} ({{\texttt {ALG}}})-(a+m\cdot d_{\textit{UB}} ({{\texttt {ALG}}}))]^2 \end{aligned}$$

for each dataset, which relate the tightnesses of the computed upper and lower bounds to the obtained classification coefficients: Tightness of lower bounds is positively correlated with high classification coefficients if the slope \(m_{\textit{LB}} \) is positive; tightness of upper bounds is positively correlated with high classification coefficients if the slope \(m_{\textit{UB}} \) is negative. We also computed the p values \(p_{\textit{LB}} \) and \(p_{\textit{UB}} \) of the regression models, to assess whether the correlations between bounds and classification coefficients are statistically significant. Table 6 gives an overview of all test metrics.
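Both models are ordinary simple linear regressions, so their minimizers have the standard closed form (a textbook least-squares identity rather than anything specific to our setup; shown for the lower bound model, the upper bound model being analogous):

$$\begin{aligned} m_{\textit{LB}} = \frac{\sum _{{{\texttt {ALG}}}}\bigl (d_{\textit{LB}} ({{\texttt {ALG}}})-\bar{d}_{\textit{LB}}\bigr )\bigl (c_{\textit{LB}} ({{\texttt {ALG}}})-\bar{c}_{\textit{LB}}\bigr )}{\sum _{{{\texttt {ALG}}}}\bigl (d_{\textit{LB}} ({{\texttt {ALG}}})-\bar{d}_{\textit{LB}}\bigr )^2},\qquad a_{\textit{LB}} = \bar{c}_{\textit{LB}}-m_{\textit{LB}}\cdot \bar{d}_{\textit{LB}}\text {,} \end{aligned}$$

where \(\bar{d}_{\textit{LB}}\) and \(\bar{c}_{\textit{LB}}\) denote the means of \(d_{\textit{LB}} ({{\texttt {ALG}}})\) and \(c_{\textit{LB}} ({{\texttt {ALG}}})\) over all compared algorithms that produce lower bounds.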

9.4 Implementation and hardware specifications

To ensure comparability, we reimplemented all compared heuristics in C++. Our implementation builds upon the Boost Graph Library [41] and Eigen [33] for efficiently managing graphs and matrices. For solving LSAPE, we used the solver suggested in [17], which is efficiently implemented in the LSAPE toolbox available at https://bougleux.users.greyc.fr/lsape/. We used the blackbox optimizer NOMAD [39] for training RINGOPT and RINGMS, the support vector machine library LIBSVM [22] for training RING-ML1-SVM and PREDICT1-SVM, the artificial neural network library FANN [47] for training RING-MLDNN and PREDICTDNN, and the mathematical programming library Gurobi [34] for implementing the instantiations of LP-GED.

All heuristics were run with six threads: Instantiations of LSAPE-GED were set up to construct their LSAPE instances C in parallel, instantiations of LS-GED were implemented to carry out runs from several initial solutions in parallel, and instantiations of LP-GED were allowed to use multithreading when solving their LPs via calls to Gurobi. For the miscellaneous heuristics, we used the following parallelization techniques: HED was set up to construct its LSAPE instance C in parallel, BRANCH-TIGHT was implemented to parallelize the construction phases of all of its LSAP instances \({\mathbf{C}} ^r\), and SA and HYBRID were set up to use the parallelized versions of, respectively, BRANCH and BRANCH-CONST as subroutines. BRANCH-COMPACT and PARTITION do not allow straightforward parallelization and were hence run in only one thread.
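To illustrate the first of these techniques, the following C++ sketch shows a standard way of filling the rows of an LSAPE instance with several threads. It is a pattern sketch under assumed types (in particular, the populate_row callback is hypothetical), not the actual GEDLIB code:

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// The rows of an LSAPE instance C are mutually independent, so several
// threads can fill disjoint subsets of rows. populate_row is a
// hypothetical callback that computes row i of C for the given graph pair.
void construct_lsape_instance(
    std::vector<std::vector<double>>& C,
    const std::function<void(std::vector<double>&, std::size_t)>& populate_row,
    std::size_t num_threads = 6) {
  std::vector<std::thread> workers;
  for (std::size_t w = 0; w < num_threads; ++w) {
    workers.emplace_back([&C, &populate_row, w, num_threads]() {
      // Worker w handles rows w, w + num_threads, w + 2 * num_threads, ...
      for (std::size_t i = w; i < C.size(); i += num_threads) {
        populate_row(C[i], i);
      }
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
}
```

Since the rows are written by exactly one worker each, no synchronization between the threads is required.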

Source code and datasets are distributed with GEDLIB: https://github.com/dbblumenthal/gedlib/ [6]. Tests were run on a machine with two Intel Xeon E5-2667 v3 processors with 8 cores each and 98 GB of main memory running GNU/Linux.

9.5 Lower bounds

Figure 9 shows the average lower bounds and runtimes of all heuristics that compute lower bounds. Globally, we see that instantiations of LP-GED (except COMPACT-MIP) yielded tight lower bounds, but were also relatively slow. Instantiations of LSAPE-GED were faster and only slightly less accurate. Among the miscellaneous heuristics, only BRANCH-TIGHT and HYBRID produced competitive lower bounds.

Fig. 9 Average lower bounds versus average runtime. Instantiations of LSAPE-GED and LP-GED are displayed as circles and squares, respectively; miscellaneous methods are displayed as triangles

Table 7 Overview of results for lower bounds
Fig. 10 Average aggregated joint lower bound scores and numbers of datasets where heuristics are optimal w.r.t. tightness of lower bound, runtime, and lower bound classification coefficient, respectively. Only non-zero statistics are displayed

Table 7 shows the aggregated joint lower bound scores \(\widehat{s_{\textit{LB}}}({{\texttt {H}}})\), as well as the indicator vectors \(\chi _{\textit{LB}} ({{\texttt {H}}})\). The fast but relatively imprecise LSAPE-GED instantiations BRANCH-CONST and NODE were Pareto optimal on, respectively, six (BRANCH-CONST) and five (NODE) out of six datasets. Among the more precise heuristics, BRANCH-TIGHT and the LP-GED instantiations ADJ-IP and F2 performed best: ADJ-IP was Pareto optimal on four out of six datasets, and F2 and BRANCH-TIGHT were Pareto optimal on three.

In Fig. 10, the results are further aggregated by averaging the scores and summing the indicator vectors over all datasets. Instantiations of LSAPE-GED are displayed in white, instantiations of LP-GED in light gray, and miscellaneous heuristics in black. We observe that, globally, BRANCH-CONST and NODE achieved the best aggregated joint lower bound scores, i.e., they exhibited the best trade-offs between tightness of the obtained lower bound, runtime, and classification coefficient (cf. Fig. 10a). NODE and BRANCH-CONST also performed best in terms of runtime (cf. Fig. 10c). In terms of tightness of the obtained lower bound, the LP-based heuristic ADJ-IP performed best, followed by BRANCH-TIGHT and F2 (cf. Fig. 10b). BRANCH-TIGHT and ADJ-IP were also the best performing heuristics w.r.t. the classification coefficient (cf. Fig. 10d).

Fig. 11 Average upper bounds versus average runtime. Instantiations of LSAPE-GED, LP-GED, and LS-GED are displayed as circles, squares, and diamonds, respectively; miscellaneous methods are displayed as triangles. For each heuristic H, the results are displayed only for the configuration that does not use any extensions

Figures 17, 18, 19, 20, 21, and 22 in Appendix B show the dominance graphs induced by the relation \(\succ _{\textit{LB}} \) for each dataset and hence visualize the results in more detail.

9.6 Upper bounds

Figure 11 shows the average upper bounds and runtimes of all heuristics that compute upper bounds. Instantiations of LS-GED usually provided the tightest upper bounds, at the price of large execution times. Instantiations of LP-GED (except COMPACT-MIP) also yielded tight upper bounds but were even slower. Finally, the paradigm LSAPE-GED represents the category with the largest variation. Its instantiations usually required little time (except RING, SUBGRAPH, and WALKS), but the quality of the provided upper bounds depends greatly on the intrinsic difficulty of the datasets.

Table 8 shows the aggregated joint upper bound scores \(\widehat{s_{\textit{UB}}}({{\texttt {H}}})\) and \(\widehat{s_{\textit{UB}}}({{\texttt {E}}})\), as well as the indicator vectors \(\chi _{\textit{UB}} ({{\texttt {H}}})\) and \(\chi _{\textit{UB}} ({{\texttt {E}}})\). The LS-GED instantiation IPFP was Pareto optimal on all datasets, as it always computed the tightest upper bound. The instantiation NODE of LSAPE-GED was Pareto optimal and the fastest heuristic on five out of six datasets and also achieved very high aggregated joint upper bound scores on these datasets. The instantiation REFINE of LS-GED performed well, too, as it was Pareto optimal on all datasets except for grec. On the negative side, we see that the instantiations of LP-GED and the miscellaneous heuristics performed very poorly, as they were almost always dominated by other heuristics.

In Fig. 12, the results are further aggregated by averaging the scores and summing the indicator vectors over all datasets. Instantiations and extensions of LSAPE-GED are displayed in white, instantiations of LP-GED in light gray, instantiations and extensions of LS-GED in dark gray, and miscellaneous heuristics in black. We observe that, globally, NODE, IPFP, and REFINE achieved the best aggregated joint upper bound scores, i.e., they exhibited the best trade-offs between tightness of the obtained upper bound, runtime, and classification coefficient (cf. Fig. 12a). In terms of runtime, NODE and BRANCH-CONST performed best (cf. Fig. 12c). In terms of classification coefficient and tightness, the instantiations of LS-GED performed best, with IPFP as the best performing heuristic among them (cf. Fig. 12b, d).

The average aggregated joint upper bound scores of both extensions CENTRALITIES and MULTI-SOL of the paradigm LSAPE-GED turned out to be smaller than 0.5 (cf. Fig. 12a). That is, on average, instantiations of LSAPE-GED did not benefit from the extensions CENTRALITIES and MULTI-SOL. However, on each dataset, some instantiations of LSAPE-GED did benefit from the extensions, as some algorithms using CENTRALITIES and MULTI-SOL were Pareto optimal on almost all datasets (cf. Table 8).

We also observe that the average aggregated joint upper bound scores of the extensions MULTI-START and RANDPOST of the paradigm LS-GED are, respectively, clearly larger and clearly smaller than 0.5 (cf. Fig. 12a). That is, on average, instantiations of LS-GED benefited from MULTI-START but not from RANDPOST. However, RANDPOST still turned out to be used by Pareto optimal algorithms on all datasets except for the datasets letter (h) and fp, which contain very small graphs. MULTI-START was used by Pareto optimal algorithms on all datasets (cf. Table 8). Moreover, we see that, on all six datasets, algorithms using MULTI-START and RANDPOST yielded the tightest upper bounds and the best classification coefficients (cf. Fig. 12b, d).

Fig. 12 Average aggregated joint upper bound scores and numbers of datasets where heuristics are optimal w.r.t. tightness of upper bound, runtime, and upper bound classification coefficient, respectively. Only non-zero statistics are displayed

Figures 23, 24, 25, 26, 27, and 28 in Appendix B show the dominance graphs induced by the relation \(\succ _{\textit{UB}} \) for each dataset and hence visualize the results in more detail.

9.7 Gaps between lower and upper bounds

Table 9 shows the tightest average lower and upper bounds \(d^\star _{\textit{LB}} \) and \(d^\star _{\textit{UB}} \) for all datasets and the gaps between them. We see that the best upper bounds overestimate the best lower bounds (and hence, a fortiori, the exact GED) by at most 4.23% and by only 1.99% on average. Given the hardness of exactly computing GED (cf. Sect. 1), this is a remarkable result. On all datasets, \(d^\star _{\textit{UB}} \) was computed by IPFP (cf. Sect. 9.6). On fp, \(d^\star _{\textit{LB}} \) was computed by BRANCH-TIGHT; on protein, it was computed by F2; and on all other datasets, it was computed by ADJ-IP (cf. Sect. 9.5).

Table 8 Overview of results for upper bounds

9.8 Effect of graph sizes

We also carried out experiments for evaluating the effect of the graph sizes on the compared methods. For this, we partitioned the datasets aids, muta, and protein that also contain larger graphs into subsets of graphs whose numbers of nodes are between 1 and 10, 11 and 20, and so forth. Subsequently, we randomly sampled ten graphs from each subset with at least ten graphs, and, for each sample \(\mathscr {D}_{\text {sample}}\), ran all compared methods on all pairs of graphs \((G,H)\in \mathscr {D}_{\text {sample}}\times \mathscr {D}_{\text {sample}}\).
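The following C++ sketch outlines this sampling scheme; the graph type and the size accessor are placeholders rather than types from our implementation:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <random>
#include <vector>

// Bucket graphs by size (1-10 nodes, 11-20 nodes, ...) and draw ten
// graphs from every bucket that contains at least ten graphs. Graph and
// SizeFn (returning the number of nodes) are assumed placeholders.
template <typename Graph, typename SizeFn>
std::vector<std::vector<Graph>> sample_by_size(std::vector<Graph> dataset,
                                               SizeFn num_nodes,
                                               std::mt19937& rng) {
  std::map<std::size_t, std::vector<Graph>> buckets;
  for (auto& g : dataset) {
    buckets[(num_nodes(g) - 1) / 10].push_back(std::move(g));
  }
  std::vector<std::vector<Graph>> samples;
  for (auto& [bucket, graphs] : buckets) {
    if (graphs.size() < 10) continue;
    std::shuffle(graphs.begin(), graphs.end(), rng);
    graphs.resize(10);  // keep ten random graphs from this bucket
    samples.push_back(std::move(graphs));
  }
  return samples;
}
```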

Figure 13 shows the observed trends for the average runtimes t and the average lower and upper bounds \(d_{{\textit{LB}}}\) and \(d_{{\textit{UB}}}\). In order not to overcrowd the plots, trends are displayed only for the five methods with the best average aggregated joint lower and upper bound scores: BRANCH-CONST, NODE, ADJ-IP, F2, and BRANCH-FAST for \(d_{{\textit{LB}}}\) (cf. Fig. 10a), and NODE, IPFP, REFINE, BRANCH-CONST, and BRANCH-FAST for \(d_{{\textit{UB}}}\) (cf. Fig. 12a). As expected, instantiations of LSAPE-GED were faster than instantiations of LS-GED, which, in turn, were faster than instantiations of LP-GED. We also see that, for all three datasets, there are hardly any crossing points between the trends for the lower and upper bounds. This is interesting, because it means that the graph sizes have little effect on which of two heuristics \({{\texttt {H}}} _1\) and \({{\texttt {H}}} _2\) yields the tighter lower or upper bound. Finally, we observe that, on protein, the gap between the tightest average lower and upper bounds is very narrow across all graph sizes. On aids and muta, the gaps grow with increasing graph sizes, but are still moderate also on the samples that contain the largest graphs (also cf. Table 9).

Table 9 Tightest average lower and upper bounds
Fig. 13 Effect of graph sizes on methods with best average aggregated joint lower and upper bound scores. For each heuristic H, the results are displayed only for the configuration that does not use any extensions

9.9 Classification coefficients versus tightness of lower and upper bounds

Figure 14 and Table 10 relate the lower bounds of the algorithms producing lower bounds to the obtained lower bound classification coefficients. Figure 14 contains plots for all datasets. In each of them, each black dot represents an algorithm that yields a lower bound and the gray line visualizes the obtained linear regression model \(c_{\textit{LB}} \sim d_{\textit{LB}} \). For each dataset, Table 10 summarizes the slopes and p values of the models, as well as the maximum and average lower bound classification coefficients.

We observe that, on muta, all obtained classification coefficients equal either 0.00 or 0.01. This can be explained by the fact that, for both of its classes, muta contains graphs of very different sizes, which leads to a small difference between intra- and inter-class distances. As \(c_{\textit{LB}} ({{\texttt {ALG}}})\in \{0.00,0.01\}\) holds for all algorithms ALG that produce lower bounds, the obtained linear regression model has a very high p value and is hence not statistically significant.

For all other datasets, the obtained linear regression models have p values smaller than \(10^{-3}\) and are hence highly significant. Furthermore, all linear regression models except the statistically insignificant model for muta have a positive slope. That is, tight lower bounds tend to go hand in hand with good classification coefficients.

Fig. 14 Average lower bounds versus lower bound classification coefficients. Each black dot represents one algorithm that computes a lower bound. The linear regression model \(c_{\textit{LB}} \sim d_{\textit{LB}} \) is displayed in gray

Figure 15 and Table 11 relate the upper bounds of the algorithms producing upper bounds to the obtained upper bound classification coefficients. Figure 15 contains plots for all datasets. In each of them, each black dot represents an algorithm that yields an upper bound and the gray line visualizes the obtained linear regression model \(c_{\textit{UB}} \sim d_{\textit{UB}} \). For each dataset, Table 11 summarizes the slopes and p values of the models, as well as the maximum and average upper bound classification coefficients.

We again note that, on muta, all obtained classification coefficients equal either 0.00 or 0.01. Since we tested many more algorithms that compute upper bounds than algorithms that yield lower bounds, the linear regression model \(c_{\textit{UB}} \sim d_{\textit{UB}} \) for muta nonetheless has a p value smaller than \(10^{-3}\) and is hence still highly statistically significant. However, its p value is much larger than the p values of the linear regression models \(c_{\textit{UB}} \sim d_{\textit{UB}} \) we obtained for the other datasets.

Table 10 Maximum and average lower bound classification coefficients for all datasets, and slopes and p values of the linear regression models \(c_{\textit{LB}} \sim d_{\textit{LB}} \)

We observe that, for all datasets, the slopes of the linear regression models \(c_{\textit{UB}} \sim d_{\textit{UB}} \) are negative. We can hence draw the same conclusion as for the lower bounds, namely, that tight upper bounds tend to go hand in hand with good classification coefficients. These findings allow us to answer the meta-question Q1 raised in the introduction in the affirmative: It is indeed beneficial to use GED as a guidance for the design of graph distance measures that are to be used within pattern recognition frameworks.

Fig. 15 Average upper bounds versus upper bound classification coefficients. Each black dot represents one algorithm that computes an upper bound. The linear regression model \(c_{\textit{UB}} \sim d_{\textit{UB}} \) is displayed in gray

Table 11 Maximum and average upper bound classification coefficients for all datasets, and slopes and p values of the linear regression models \(c_{\textit{UB}} \sim d_{\textit{UB}} \)
Fig. 16 Snapshot of the dominance graph induced by \(\succ _{\textit{UB}} \) on the dataset fp shown in Fig. 24

The second meta-question Q2 asked whether lower or upper bounds for GED are better suited for use as graph distance measures within classification frameworks. Since the classification coefficients induced by the lower and upper bounds turned out to be very similar, this question cannot be answered as straightforwardly as Q1. However, there is a tendency: While the average lower bound classification coefficients were slightly better than the average upper bound classification coefficients, the opposite can be observed for the maximum lower and upper bound classification coefficients. Moreover, on average, the slopes of the linear regression models \(c_{\textit{UB}} \sim d_{\textit{UB}} \) are slightly steeper (in absolute value) than the slopes of the linear regression models \(c_{\textit{LB}} \sim d_{\textit{LB}} \). Together, these observations suggest that the upper bound classification coefficients benefit more from tight upper bounds than the lower bound classification coefficients benefit from tight lower bounds. As a rule of thumb, we can hence conclude that a tight upper bound for GED (e.g., the upper bound computed by IPFP) should be used for classification purposes if one is willing to invest a lot of time in the computation of the graph distance measure. Otherwise, a quickly computable lower bound such as the one produced by BRANCH-CONST should be employed.

10 Conclusions and future work

In this paper, we provided a systematic overview of the state of the art for heuristically computing GED. In total, we presented 30 different heuristics that were initially suggested in 28 different articles published from 2006 on. Thirteen heuristics were modeled as instantiations or extensions of the paradigm LSAPE-GED, which generalizes algorithms that upper and, possibly, lower bound GED via transformations to LSAPE. Four heuristics were modeled as instantiations of the paradigm LP-GED, which uses linear programming for computing lower and upper bounds for GED. Seven heuristics were modeled as instantiations or extensions of the paradigm LS-GED, a local search-based approach for upper bounding GED. The remaining six heuristics do not fit within any of the suggested paradigms and were hence presented as miscellaneous heuristics.

We reimplemented all methods in C++ and empirically evaluated them by carrying out experiments on six different benchmark datasets. As the gap between the tightest average lower bounds and the tightest average upper bounds never exceeded 4.23%, the experiments showed that, despite the high theoretical complexity of approximating GED, GED can be bounded within tight margins on small- to medium-sized graphs such as the ones contained in the test datasets.

On average, the instantiation ADJ-IP of LP-GED suggested in [36] computed the tightest lower bounds, followed by the LP-GED instantiation F2 [44] and the miscellaneous heuristic BRANCH-TIGHT [9]. On all datasets, the tightest upper bounds were computed by the instantiation IPFP of LS-GED suggested in [7, 14, 16]. The instantiations NODE [36], BRANCH-CONST [70, 71], and BRANCH-FAST [8, 9] of LSAPE-GED achieved excellent trade-offs between tightness, runtime, and classification coefficient—both w.r.t. the produced lower and w.r.t. the produced upper bounds.

Fig. 17 Transitive reduction of dominance graph for lower bounds on the dataset letter (h)

Fig. 18 Transitive reduction of dominance graph for lower bounds on the dataset fp

Fig. 19 Transitive reduction of dominance graph for lower bounds on the dataset aids

Fig. 20 Transitive reduction of dominance graph for lower bounds on the dataset muta

Furthermore, we addressed a tacit assumption made in many publications on GED, which states that the tighter the lower or upper bound for GED, the better its performance when used as a graph distance measure within pattern recognition frameworks. Our experiments provided thorough evidence to support this assumption. They hence justify the ongoing competition for tight upper and lower bounds.

Fig. 21 Transitive reduction of dominance graph for lower bounds on the dataset grec

Given the small gaps between the tightest currently available lower and upper bounds for GED, we see little room for further tightening these bounds. Instead, we suggest that future work on the heuristic computation of GED should focus on the task of speeding up those existing heuristics that yield the tightest currently available bounds.

Fig. 22 Transitive reduction of dominance graph for lower bounds on the dataset protein

Fig. 23 Transitive reduction of dominance graph for upper bounds on the dataset letter (h)

Fig. 24 Transitive reduction of dominance graph for upper bounds on the dataset fp

Fig. 25 Transitive reduction of dominance graph for upper bounds on the dataset aids

Fig. 26 Transitive reduction of dominance graph for upper bounds on the dataset muta

Fig. 27 Transitive reduction of dominance graph for upper bounds on the dataset grec

Fig. 28 Transitive reduction of dominance graph for upper bounds on the dataset protein