1 Introduction

Subgraph Isomorphism (SI) is an NP-complete problem which involves finding a copy of a pattern graph in a target graph, i.e., finding a mapping that associates a different target node with each pattern node in such a way that edges are preserved. There are two main variants of SI: in the non-induced case, only pattern edges must be preserved (i.e., pattern nodes connected by an edge must be mapped to target nodes connected by an edge); in the induced case, target edges must also be preserved (i.e., pattern nodes not connected by an edge must be mapped to target nodes not connected by an edge).
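To make the two variants concrete, here is a minimal sketch (ours, not taken from any of the solvers discussed below) that checks whether a given injective mapping is a valid non-induced or induced subgraph isomorphism; graphs are given as edge sets and the helper names are ours.

```python
def adjacent(edges, x, y):
    """Undirected adjacency test on an edge set."""
    return (x, y) in edges or (y, x) in edges

def is_non_induced(pattern_edges, target_edges, mapping):
    """Non-induced SI: every pattern edge must be mapped onto a target edge."""
    return all(adjacent(target_edges, mapping[u], mapping[v])
               for (u, v) in pattern_edges)

def is_induced(pattern_nodes, pattern_edges, target_edges, mapping):
    """Induced SI: additionally, non-adjacent pattern nodes must be mapped
    to non-adjacent target nodes (i.e., target edges are preserved too)."""
    nodes = list(pattern_nodes)
    return (is_non_induced(pattern_edges, target_edges, mapping) and
            all(adjacent(pattern_edges, u, v) or
                not adjacent(target_edges, mapping[u], mapping[v])
                for i, u in enumerate(nodes) for v in nodes[i + 1:]))

# The path a-b-c maps into a triangle in the non-induced sense,
# but not in the induced sense (the target edge 0-2 is not preserved).
pattern_nodes, pattern_edges = {"a", "b", "c"}, {("a", "b"), ("b", "c")}
target_edges = {(0, 1), (1, 2), (0, 2)}
mapping = {"a": 0, "b": 1, "c": 2}
print(is_non_induced(pattern_edges, target_edges, mapping))             # True
print(is_induced(pattern_nodes, pattern_edges, target_edges, mapping))  # False
```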

SI is at the heart of many structural pattern recognition tasks in different application fields such as image analysis or biology, for example [7]. In the pattern recognition community, the most well-known algorithms used to solve SI are VF2 [8], VF3 [5], and RI [4]. These solvers will be referred to as PR solvers. PR solvers perform a depth-first search in a space of states: each state corresponds to a partial mapping where some pattern nodes have been mapped, and each state is recursively extended by adding to its partial mapping a new pair of mapped pattern/target nodes.
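The following minimal sketch (ours; VF2, VF3, and RI add carefully designed node orderings and feasibility rules on top of this scheme) illustrates such a state-space search for the non-induced case, with graphs given as adjacency dictionaries:

```python
def extend(pattern_adj, target_adj, mapping, used, order):
    """Depth-first extension of a partial mapping; returns a complete
    mapping if one exists from this state, or None to trigger backtracking."""
    if len(mapping) == len(order):                    # every pattern node is mapped
        return dict(mapping)
    u = order[len(mapping)]                           # next pattern node to map
    for w in target_adj:                              # candidate target nodes
        if w in used:                                 # injectivity
            continue
        # non-induced consistency: already-mapped neighbours of u must be
        # mapped to neighbours of w in the target graph
        if all(mapping[v] in target_adj[w] for v in pattern_adj[u] if v in mapping):
            mapping[u] = w
            used.add(w)
            solution = extend(pattern_adj, target_adj, mapping, used, order)
            if solution is not None:
                return solution
            del mapping[u]                            # undo and try the next candidate
            used.remove(w)
    return None

pattern_adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}   # path a-b-c
target_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}            # triangle
print(extend(pattern_adj, target_adj, {}, set(), list(pattern_adj)))
# -> a mapping such as {'a': 0, 'b': 1, 'c': 2}
```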

SI is also widely studied in the constraint programming community, as it can be modelled as a constraint satisfaction problem in a straightforward way. Many constraint-based solvers have been proposed for solving SI since Ullmann [20] such as, for example, nRF+ [15], ILF [21], LAD [18], SND [2], and Glasgow [1, 16]. These solvers will be referred to as CP solvers. Like VF2, VF3, and RI, CP solvers recursively extend partial mappings. However, a fundamental difference is that CP solvers maintain, for each non-mapped pattern node, the list of candidate target nodes that may be mapped to it, and they propagate constraints to reduce these lists. This constraint propagation mechanism is expensive, both in memory and time, but it reduces the number of states to explore.
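A much simplified sketch of this idea (ours; CP solvers such as LAD and Glasgow apply stronger filtering, e.g. based on matchings between neighbourhoods or on supplemental graphs): after deciding to map a pattern node u to a target node w, the candidate lists of the other pattern nodes can be pruned before recursing, and an empty list signals a dead end.

```python
def prune(domains, pattern_adj, target_adj, u, w):
    """Forward checking after the decision to map pattern node u to target
    node w: returns the reduced candidate lists, or None on a dead end."""
    reduced = {}
    for v, candidates in domains.items():
        if v == u:
            reduced[v] = {w}
            continue
        kept = {x for x in candidates if x != w}       # injectivity
        if v in pattern_adj[u]:                        # edge (u, v) must be preserved
            kept &= target_adj[w]
        if not kept:
            return None                                # empty candidate list: backtrack
        reduced[v] = kept
    return reduced

# Mapping "b" to 1 in the path/triangle example above restricts the
# candidates of its pattern neighbours "a" and "c" to the neighbours of 1.
domains = {v: {0, 1, 2} for v in "abc"}
print(prune(domains, {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}},
            {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}, "b", 1))
# -> {'a': {0, 2}, 'b': {1}, 'c': {0, 2}}
```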

Recent PR and CP solvers can very quickly solve rather large SI instances involving graphs with thousands of nodes. Indeed, NP-completeness does not mean that all instances are hard to solve, and some instances of NP-complete problems can be very easy. In particular, in [6], Cheeseman et al. show that NP-complete problems can be characterised by at least one “order parameter”, and that hard instances occur at a critical value of such a parameter. In [17], McCreesh et al. use this approach to generate “really hard” random SI instances according to three random graph models. For example, for Erdős-Rényi random graphs (where each edge is generated independently with a fixed probability [11]), instances of non-induced SI may be generated by fixing the numbers of pattern and target nodes, and varying the pattern and target edge probabilities from 0 to 1. In this case, a phase transition occurs between entirely satisfiable instances (when patterns are sparse and targets are dense) and entirely unsatisfiable instances (when patterns are dense and targets are sparse), and the location of this phase transition can be predicted by computing the expected number of solutions. Instances located within this phase transition are computationally challenging for all solvers even when graphs are small (e.g., 30 pattern nodes and 150 target nodes). However, the experimental study reported in [17] also shows that some small instances which are predicted as easy, and which are easily solved by CP solvers, appear to be challenging for PR solvers.
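The prediction itself follows a standard first-moment argument, which we sketch here for a fixed pattern with \(n_p\) nodes and \(e_p\) edges and an Erdős-Rényi target drawn from \(G(n_t,p_t)\) (see [17] for the exact models and formulas used there). Each of the \(n_t!/(n_t-n_p)!\) injective assignments of pattern nodes to target nodes maps all pattern edges onto target edges with probability \(p_t^{e_p}\), and, in the induced case, must additionally avoid mapping each of the \(\binom{n_p}{2}-e_p\) pattern non-edges onto a target edge, so

\[ \mathbb{E}[\#\text{solutions}] \;=\; \frac{n_t!}{(n_t-n_p)!}\,p_t^{e_p} \quad \text{(non-induced)}, \qquad \frac{n_t!}{(n_t-n_p)!}\,p_t^{e_p}\,(1-p_t)^{\binom{n_p}{2}-e_p} \quad \text{(induced)}. \]

When this expectation is much smaller than 1, instances are almost surely unsatisfiable (by Markov's inequality); when it is much larger than 1, they tend to be satisfiable, and the phase transition is expected where the expectation crosses 1.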

In this paper, we widen this experimental study by evaluating and comparing RI, VF2, VF3, Glasgow, and LAD on a large test suite of 14,621 instances coming from eight benchmarks. In Sect. 2, we describe our test suite. In Sect. 3, we show that, as expected for an NP-complete problem, the solving time of an instance does not depend on its size, and that some small instances (including instances coming from real applications) are not solved by any of the considered solvers. In Sect. 4, we identify easy and hard instances, and in Sect. 5 we show that, while PR solvers solve easy instances very quickly (CP solvers often need more time on them), they fail to solve some other instances that CP solvers solve rather quickly, and they are clearly outperformed by Glasgow on hard instances. Finally, in Sect. 6, we show that we can easily combine PR and CP solvers to benefit from their complementarity.

Table 1. For each class, we give the number of instances (#inst) and then describe pattern and target graph features: minimum and maximum numbers of nodes, edges, and densities.
Fig. 1. Number of pattern edges (x-axis), target edges (y-axis), and solving time (colour) for non-induced SI: top left = Glasgow; top right = LAD; bottom left = RI; bottom right = VF2. (Color figure online)

2 Experimental Set-Up

Test Suite. We consider 14,621 instances coming from eight benchmarks described in Table 1, and available at liris.cnrs.fr/christine.solnon/SIP.html. The images and meshes instances come from real applications, where both pattern and target graphs are extracted from segmented images and 3D meshes [9, 19].

LV is a benchmark described in [15]. It uses 113 graphs with various properties coming from the Stanford GraphBase described by Knuth in [13]. The benchmark is built by splitting the set of graphs into two parts: the first part contains the 50 smallest graphs; the second part contains the 63 remaining graphs. We consider all pairs of graphs such that the pattern graph belongs to the first part, the target graph belongs to the first or the second part, and the target graph has at least as many nodes as the pattern graph.

rand* (with \(*\in \{ERP, ER, BVG, M, SF\}\)) are randomly generated instances. randERP contains instances close to the phase transition (expected to be hard, as explained in [17]), where all graphs are Erdős-Rényi graphs. randER, randBVG, and randM come from the database described in [10], and their graphs are Erdős-Rényi graphs, (modified) bounded valence graphs, and 4D meshes, respectively. randSF, described in [22], contains scale-free graphs. All instances in randER, randBVG, randM, and randSF (except 20 instances in randSF) are feasible by construction, because each pattern has been extracted from its target.

All graphs have at least as many edges as nodes. Hence, the size of a graph is dominated by its number of edges.

Performance Measures. The experiments were performed on the EPCC Cirrus HPC facility, on systems with dual Intel Xeon E5-2695 v4 CPUs and 256 GB of RAM, running CentOS 7.3.1611, with GCC 7.2.0 as the compiler. Each run was limited to 1,000 s of CPU time. Some instances are not solved within this time limit (note that even when increasing the time limit to 100,000 s some instances remain unsolved). We consider two different performance measures: when all solvers have been able to solve all instances of a benchmark, we report the average solving time; when some instances have not been solved within the time limit, we report the number of instances solved within the time limit, and we plot the evolution of the cumulative number of solved instances with respect to time (i.e., the function \(f(t)=\#\{i\in I : t^s_i\le t\}\), where I is the set of instances, s a solver, and \(t_i^s\) the time spent by s to solve an instance \(i\in I\)).
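As a minimal illustration (a hypothetical helper, not part of our experimental tooling), this cumulative curve can be computed directly from the per-instance solving times of one solver, with unsolved instances simply left out:

```python
def cumulative_solved(solved_times, t):
    """f(t): number of instances solved within t seconds, given the solving
    times (in seconds) of the instances the solver managed to solve."""
    return sum(1 for ti in solved_times if ti <= t)

# Three instances solved in 0.4 s, 12 s, and 950 s; a fourth timed out.
times = [0.4, 12.0, 950.0]
print(cumulative_solved(times, 1.0))      # -> 1
print(cumulative_solved(times, 1000.0))   # -> 3
```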

We do not consider memory consumption as a performance measure because the solvers never ran out of memory, even on the largest instances (all solvers have polynomial memory complexity). However, CP solvers need more memory than PR solvers, as they maintain candidate lists of target nodes for each non-mapped pattern node.

Different variants of Glasgow are described in [1]. We consider the biased variant, which is the default setting.

3 Does the Solving Time Depend on Graph Sizes?

To study the relation between the solving time and the size of an instance, we plot in Figs. 1 and 2 the time spent by each solver on each instance. Each instance corresponds to a point (x, y) where x is the number of pattern edges, y the number of target edges, and the colour depends on the solving time: yellow if it is smaller than one second, and black if the instance has not been solved within 1,000 s (if several instances have the same size, the colour corresponds to the average solving time over all these instances).

Fig. 2. Number of pattern edges (x-axis), target edges (y-axis), and solving time (colour) for induced SI: top left = Glasgow; top right = LAD; bottom left = RI; bottom right = VF3. (Color figure online)

As expected for an NP-complete problem, these figures show that hardness does not depend on size. Let us first consider the non-induced case, displayed in Fig. 1. Unsolved instances (black points) are not especially concentrated in the top right area of the plots (which corresponds to the largest instances). The number of unsolved instances differs widely from one solver to another, but some black points are common to all solvers. Among the instances solved by none of the solvers, the smallest pattern (resp. target) graph has 62 edges and 30 nodes (resp. 400 edges and 86 nodes). Many much larger instances are solved in less than one second. The gray line separates instances that have more target edges than pattern edges (top left) from those that have fewer target edges than pattern edges (bottom right). All instances in the bottom right part are trivially infeasible. However, both VF2 and RI fail to solve some of them.

Fig. 3. Number of pattern edges (x-axis), target edges (y-axis), and classes (colour) of unsolved instances: left = non-induced SI; right = induced SI. (Color figure online)

Table 2. Number of feasible (yes), infeasible (no), easy (E), easy-or-hard (EH), hard (H), and unsolved (U) instances per class.

When looking at the induced case in Fig. 2, we also note that the unsolved instances are not necessarily those with the largest graphs, and that the number of unsolved instances differs from one solver to another. Among the instances solved by none of the solvers, the smallest pattern (resp. target) graph has 62 edges and 30 nodes (resp. 638 edges and 120 nodes). VF3 has much better results on induced SI than VF2 on non-induced SI, and it is always able to quickly solve instances that are trivially infeasible because they have fewer target edges than pattern edges.

4 Where Are the Hard Instances?

To gain better insight into where the hard instances are, we have partitioned each class of our benchmark into four separate groups, depending on instance hardness. As all instances except those of randERP were generated without a model that allows us to predict hardness with respect to the phase transition location, we consider an empirical definition of instance hardness (made explicit in the sketch after the list below):

  • an instance is easy if all four solvers are able to solve it within one second;

  • an instance is hard if no solver can solve it within one second, but at least one solver can solve it within the time limit of 1,000 s;

  • an instance is easy-or-hard if at least one solver solves it within one second whereas at least one solver cannot solve it within one second;

  • an instance is unsolved if no solver can solve it within the time limit of 1,000 s.
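The following small sketch (with hypothetical helper names; times maps each solver to its solving time in seconds, or None on a timeout) makes this classification explicit:

```python
def hardness(times, easy_limit=1.0, time_limit=1000.0):
    """Classify an instance from its per-solver solving times."""
    fast = [t for t in times.values() if t is not None and t <= easy_limit]
    solved = [t for t in times.values() if t is not None and t <= time_limit]
    if len(fast) == len(times):
        return "easy"            # every solver finishes within one second
    if not solved:
        return "unsolved"        # no solver finishes within the time limit
    if fast:
        return "easy-or-hard"    # at least one solver is fast, at least one is not
    return "hard"                # solved, but by no solver within one second

print(hardness({"Glasgow": 0.2, "LAD": 0.9, "RI": 0.01, "VF2": 0.4}))    # easy
print(hardness({"Glasgow": 0.3, "LAD": 54.0, "RI": None, "VF2": None}))  # easy-or-hard
```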

In Fig. 3, we display the numbers of edges of the pattern and target graphs of unsolved instances, and in Table 2, we give the number of instances in each group of each class. As expected, many randERP instances are unsolved or hard, and none of them is easy: these instances are close to the phase transition and they are expected to be challenging despite their small size. However, not all unsolved instances come from randERP. This shows that really hard instances may occur even if they have not been generated on purpose. For the non-induced case, LV and meshes respectively contain 120 and 11 unsolved instances, whereas for the induced case, LV contains 60 unsolved instances. In both cases, these instances are not the largest ones, and some of them are really small, as illustrated in Fig. 3.

Many instances are easy (7,787 instances for the non-induced case, and 9,109 for the induced case), and these easy instances come from all classes but randERP and randER for the non-induced case, and from all classes but randERP for the induced case.

In Table 2, we also give the number of feasible instances per class. Note that any instance feasible for the induced case is also feasible for the non-induced case. Three classes (i.e., randER, randBVG, and randM) only contain feasible instances, as they have been randomly generated in such a way that there always exists at least one solution. There is no obvious relation between feasibility and hardness: the hard and unsolved groups contain both feasible and infeasible instances.

Table 3. Results of Glasgow (G), LAD (L), VF2/VF3 (V), and RI (R) on non-induced (top) and induced (bottom) SI instances. #u is the number of instances not solved within 1,000 s (for easy instances, #u = 0). When all instances are solved, we report the average solving time in seconds.

5 Experimental Comparison of the Solvers

In Table 3, we display the results of the four solvers on the different classes, grouped with respect to hardness. For easy instances (which are solved by all solvers), RI is an order of magnitude faster than the other solvers for the non-induced case and, for the induced case, VF3 is twice as fast as RI, which is an order of magnitude faster than Glasgow and LAD. Hence, on easy instances, the fastest solvers clearly are RI for the non-induced case and VF3 for the induced case, and the CP solvers are an order of magnitude slower.

Fig. 4. Cumulative number of solved instances: top = easy instances; middle = easy-or-hard instances; bottom = hard instances; left = non-induced SI; right = induced SI.

Fig. 5. Comparison of the best PR and CP solvers. On the left (resp. right), each point (x, y) corresponds to an instance which is solved in x seconds by Glasgow and y seconds by RI for the non-induced case (resp. VF3 for the induced case). When an instance is not solved by Glasgow (resp. RI or VF3), it is displayed at \(x=1,000\) (resp. \(y=1,000\)).

However, on easy-or-hard and hard instances, the PR solvers solve fewer instances than LAD, and LAD solves fewer instances than Glasgow. More precisely, for the non-induced case, Glasgow (resp. LAD, VF2, and RI) fails to solve 51 (resp. 221, 1879, and 682) instances. For the induced case, Glasgow (resp. LAD, VF3, and RI) fails to solve 14 (resp. 295, 416, and 596) instances. Hence, on easy-or-hard and hard instances, the best solver clearly is Glasgow, for both the non-induced and the induced case. Actually, most easy-or-hard instances are trivially solved by Glasgow in less than one second, whereas the PR solvers fail to solve many of these instances.

For the non-induced case, although LAD is outperformed by Glasgow, it is able to solve many more instances than the PR solvers. For the induced case, LAD is also outperformed by Glasgow and, although it solves more instances than the PR solvers on many classes, it is clearly outperformed by them on randER instances. Actually, LAD is the only solver which solves fewer instances for the induced case than for the non-induced case. This comes from the fact that LAD has been designed for the non-induced case. It has been extended to handle the induced case in a very naive way (by checking a posteriori that target edges are preserved), without exploiting properties specific to the induced case.

In Fig. 4, we plot the evolution of the cumulative number of solved instances with respect to time. For easy instances, RI (resp. VF3) dominates all other solvers for the non-induced (resp. induced) case, and it is able to solve more than 5,000 (resp. 7,000) instances in less than 0.001 s. On these instances, CP solvers often need more time.

For easy-or-hard instances, RI (for the non-induced case) and VF3 (for the induced case) are able to solve more than 2,500 instances in less than 0.001 s. However, they fail to solve hundreds of instances which Glasgow easily solves in less than one second, and the cumulative number of instances solved by Glasgow becomes larger than those of RI and VF3 after 0.3 s.

For hard instances, Glasgow clearly outperforms all other solvers and is able to solve many more instances.

Fig. 6. Cumulative number of solved instances on the whole benchmark for RI (resp. VF3), Glasgow, and RI+Glasgow (resp. VF3+Glasgow), for non-induced SI (left) (resp. induced SI (right)).

In Fig. 5, we compare the best CP solver (i.e., Glasgow) with the best PR solver (i.e., RI for the non-induced case, and VF3 for the induced case) on a per-instance basis. Every point below the gray line corresponds to an instance which is solved more quickly by the PR solver than by Glasgow, and the vast majority of these points are to the left of the vertical line \(x=1\), corresponding to instances which are solved in less than one second by Glasgow. Every point above the gray line corresponds to an instance which is solved more quickly by Glasgow than by the PR solver, and many of these points are on the horizontal line \(y=1,000\), corresponding to instances which are not solved by the PR solver within the time limit of 1,000 s.

6 Combining Solvers to Take the Best of Them

Glasgow is complementary to the best PR solver (i.e., RI for the non-induced case and VF3 for the induced case): it needs more time on very easy instances, but it is able to solve more instances. We can benefit from this complementarity as follows: we run the best PR solver with a time limit of \(t_1\) seconds; if the instance has not been solved within this limit, we run Glasgow. The time limit \(t_1\) should be long enough to allow the PR solver to solve easy instances, but not so long that it penalises the total solving time when the PR solver is not able to solve the instance. In Fig. 6, we display the cumulative numbers of solved instances of the best PR solver, Glasgow, and the combined approach (denoted RI+Glasgow for the non-induced case, and VF3+Glasgow for the induced case) when the time limit \(t_1\) is set to 0.1 s. It shows that this simple combination takes the best of both solvers: before 0.1 s, the cumulative number of solved instances of RI+Glasgow (or VF3+Glasgow) is equal to that of RI (or VF3), which is much greater than that of Glasgow (not displayed because the y-axis starts at 8,000 and Glasgow solves fewer than 8,000 instances in 0.1 s); after 0.1 s, the cumulative number of solved instances of RI+Glasgow (or VF3+Glasgow) grows faster than that of RI (or VF3), because Glasgow is able to solve instances which are not solved by RI (or VF3); finally, after a few seconds, the cumulative number of solved instances of RI+Glasgow (or VF3+Glasgow) is very close to that of Glasgow, as the delay of 0.1 s due to the run of RI (or VF3) is negligible.
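A minimal sketch of this combination (ours; the solver command lines below are hypothetical placeholders, not the actual binaries and options used in our experiments):

```python
import subprocess

def combined_solve(instance, pr_cmd, cp_cmd, t1=0.1, time_limit=1000.0):
    """Run the PR solver for at most t1 seconds; on timeout, fall back to
    the CP solver for the remaining time budget."""
    try:
        # e.g. pr_cmd = ["ri", ...] or ["vf3", ...]  (hypothetical invocations)
        result = subprocess.run(pr_cmd + [instance], timeout=t1,
                                capture_output=True, text=True)
        return result.stdout
    except subprocess.TimeoutExpired:
        # e.g. cp_cmd = ["glasgow", ...]  (hypothetical invocation)
        result = subprocess.run(cp_cmd + [instance], timeout=time_limit - t1,
                                capture_output=True, text=True)
        return result.stdout
```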

Of course, this very simple approach could be enhanced by considering more solvers (including more variants of each solver, using different ordering heuristics, for example). In this case, we may gather all solvers in a portfolio, and use an algorithm selection approach to dynamically select from the portfolio the solver which is expected to perform best for each new SI instance to solve, as proposed by Kotthoff et al. in [14].

7 Conclusion

This study has shown that there are many very easy SI instances which are solved in a few milliseconds by modern solvers, and that some of these instances may involve very large graphs with thousands of nodes. However, there are still small instances which cannot be solved within a reasonable amount of time by any of these solvers. It is important to evaluate solvers on these hard instances too as they do appear in real applications, though they are less frequent than easy instances.

A promising research direction for solving hard instances is to exploit multiple cores, and parallel SI solvers have been introduced, for example, in [1, 3, 16]. Special attention should be paid to the performance measures used to evaluate these approaches. Indeed, measuring an average speed-up between a sequential and a parallel solver is not very meaningful when considering NP-complete problems, because speed-ups are very different from one instance to another and do not depend on instance sizes: for easy instances, speed-ups are usually very low, whereas for hard instances it is not rare to observe super-linear speed-ups. Also, really hard instances are not solved within a reasonable amount of time, and speed-ups cannot be computed in this case.

Let us illustrate this point with the parallel version of Glasgow (using 32 cores) described in [1]. On easy instances (solved in less than 1 s by sequential Glasgow), the speed-up varies between 0.1 and 32, and the average speed-up is close to 1. On hard instances (that are not solved by sequential Glasgow within 1 s, but are solved within 1,000 s), the speed-up varies between 1 and 583, and the average speed-up is 14. However, parallel Glasgow is able to solve instances which are not solved by sequential Glasgow within 1,000 s and, if we include these instances, the average speed-up becomes greater than 19 (this is a lower bound on the speed-up, as we only have a lower bound on the time of sequential Glasgow for unsolved instances). This shows that the average speed-up does not give a clear picture of solver performance. Better insights are given by scatter plots that compare times on a per-instance basis (as done in Fig. 5), or by the aggregate speed-up measure introduced in [12], which compares the timeouts needed to solve the same number of instances. For instance, sequential Glasgow solves 14,356 instances within 1,000 s, and the hardest of these instances is solved in 939 s; parallel Glasgow solves 14,356 instances within a timeout of 19 s, which gives an aggregate speed-up of \(939/19\approx 49\).
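As a small illustration of how we read this measure (a sketch of our understanding of [12], with hypothetical helper names), the aggregate speed-up can be computed from the per-instance solving times of the sequential and parallel solvers:

```python
def aggregate_speedup(seq_times, par_times, time_limit=1000.0):
    """seq_times and par_times hold per-instance solving times in seconds,
    with None for instances not solved within the time limit."""
    seq_solved = sorted(t for t in seq_times if t is not None and t <= time_limit)
    n = len(seq_solved)                  # e.g. 14,356 for sequential Glasgow
    par_solved = sorted(t for t in par_times if t is not None)
    if len(par_solved) < n:
        return None                      # the parallel solver solves fewer instances
    par_timeout = par_solved[n - 1]      # smallest timeout solving the same n instances
    return seq_solved[-1] / par_timeout  # e.g. 939 / 19, i.e. about 49, in the text
```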